The combination of flit-buffer flow-control methods and latency-insensitive protocols is an effective solution for networks-on-chip (NoC). Since they both rely on backpressure, the two techniques are easy to combine while offering complementary advantages: low complexity of the router design and the ability to cope with long communication channels via automatic wire pipelining. We study various alternative implementations of this idea by considering the combination of three different types of flit-buffer flow-control methods and two different classes of channel repeaters, based respectively on flip-flops and relay stations. We characterize the area and performance of the two most promising alternative implementations for NoCs by completing the RTL design and logic synthesis of the repeaters and routers for different channel parallelisms. Finally, we derive high-level abstractions of our circuit designs and use them to perform system-level simulations under various scenarios for two distinct NoC topologies and various applications. Based on our comparative analysis and experimental results, we propose a NoC design approach that combines the reduction of the router queues to minimum size with the distribution of flit buffering onto the channels. This approach provides precious flexibility during the physical design phase for many NoCs, particularly in those systems-on-chip that must be designed to meet a tight constraint on the target clock frequency.
We present an easy-to-use model that addresses the practical issues in designing bus-based shared-memory multiprocessor systems. The model relates the shared-bus width, bus cycle time, cache memory, the features of program execution, and the number of processors on the shared bus to a metric called request utilization. The request utilization is treated as the scaling factor for the effective average waiting processors in computing the queuing delay cycles. A simulation study shows that the model performs very well in estimating the shared-bus response time. Using the model, a system designer can quickly decide the number of processors that a shared bus is able to support effectively, the size of the cache memory the system should use, and the bus cycle time that the main memory system should provide. With the model, we show that the design favors caching the requests for the contention-based medium instead of speeding up the transfers, although the same performance can be achieved by either of the two techniques in a contention-free situation.
Software product lines (SPLs) are used to create tailor-made software products by managing and composing reusable assets. Generating a software product from the assets of an SPL is possible statically before runtime or dynamically at load time or runtime. Both approaches have benefits and drawbacks with respect to composition flexibility, performance, and resource consumption; which type of composition is preferable should be decided by taking the application scenario into account. Current tools and languages, however, force the programmer to decide between static and dynamic composition during development. In this paper, we present an approach that employs code generation to support static and dynamic composition of features from a single code base. We offer an implementation on top of FeatureC++, an extension of the C++ programming language that supports software composition based on features. To simplify dynamic composition and to avoid the creation of invalid products, we furthermore provide means to validate the correctness of a composition at runtime, automatically instantiate SPLs in the case of stand-alone applications, and automatically apply interaction code of crosscutting concerns.
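To make the static-versus-dynamic composition trade-off concrete, here is a minimal, generic Python sketch (not FeatureC++ and not the paper's code generator): the same feature is either baked into the product class before the program runs or applied to a base class at load time from a configuration. All class and feature names are hypothetical.

```python
# Illustrative sketch only: composing one feature either statically (when the
# product is generated) or dynamically (chosen from configuration at load time).

class BaseList:
    def __init__(self):
        self.items = []
    def add(self, x):
        self.items.append(x)

def logging_feature(cls):
    """A feature that refines `add`; usable at generation time or at runtime."""
    class Logged(cls):
        def add(self, x):
            print(f"add({x!r})")
            super().add(x)
    Logged.__name__ = cls.__name__ + "+Logging"
    return Logged

# Static composition: the product's class is fixed before the program runs.
StaticProduct = logging_feature(BaseList)

# Dynamic composition: the feature set is selected at load time.
def make_product(features, base=BaseList):
    cls = base
    for f in features:              # apply selected features in order
        cls = f(cls)
    return cls()

if __name__ == "__main__":
    StaticProduct().add(1)
    make_product([logging_feature]).add(2)
```

The static variant avoids runtime wrapping overhead, while the dynamic variant defers the feature selection, which is the flexibility/performance trade-off the abstract refers to.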
Experience has proved that interactive applications delivered through digital TV must provide personalized information to the viewers in order to be perceived as a valuable service. Due to the limited computational power of DTV receivers (either domestic set-top boxes or mobile devices), most of the existing systems have opted to place the personalization engines in dedicated servers, assuming that a return channel is always available for bidirectional communication. However, in a domain where most of the information is transmitted through broadcast, there are still many cases of intermittent, sporadic, or null access to a return channel. In such situations, it is impossible for the servers to learn who is watching TV at the moment, and so the personalization features become unavailable. To solve this problem without sacrificing much personalization quality, this paper introduces solutions to run a downsized semantic reasoning process in the DTV receivers, supported by a pre-selection of material driven by audience stereotypes in the head-end. Evaluation results are presented to prove the feasibility of this approach and also to assess the quality it achieves in comparison with previous ones.
Model-based testing techniques play a vital role in producing quality software. However, compared to the testing of functional requirements, these techniques are far less prevalent in testing software security. This paper presents a model-based approach to automatic testing of attack scenarios. An attack testing framework is proposed to model attack scenarios and test the system with respect to the modeled attack scenarios. The techniques adopted in the framework are applicable in general to systems where the potential attack scenarios can be modeled in a formalism based on extended abstract state machines. The attack events (i.e., attack test vectors), chosen from attacks happening in the real world, are converted to test-driver-specific events ready to be tested against the attack signatures. The proposed framework is implemented and evaluated using the most common attack scenarios. The framework is useful for testing software with respect to potential attacks, which can significantly reduce the risk of security vulnerabilities.
In this paper, we address the problem of cache replacement for transcoding proxy caching. A transcoding proxy is a proxy that has the functionality of transcoding a multimedia object into an appropriate format or resolution for each client. We first propose an effective cache replacement algorithm for transcoding proxies. In general, when a new object is to be cached, cache replacement algorithms evict some of the cached objects with the least profit to accommodate the new object. Our algorithm takes into account the inter-relationships among different versions of the same multimedia object and selects the versions to replace according to their aggregate profit, which usually differs from the simple summation of their individual profits as assumed in existing algorithms. It also considers cache consistency, which is not considered in existing algorithms. We then present a complexity analysis to show the efficiency of our algorithm. Finally, we give extensive simulation results to compare the performance of our algorithm with some existing algorithms. The results show that our algorithm outperforms the others in terms of various performance metrics.
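The following Python sketch is not the paper's algorithm; it only illustrates, under assumed costs and a hypothetical transcoding relation, why the aggregate profit of a set of cached versions of one object differs from the sum of individual profits: a cached higher-fidelity version can serve requests for lower-fidelity versions via transcoding.

```python
FETCH = 100          # assumed cost of fetching any version from the origin server
TRANSCODE = 10       # assumed cost of transcoding a cached higher version down

def profit(cached, access_rate, can_transcode):
    """Total cost saved per unit time by the set `cached` of versions."""
    total = 0.0
    for v, rate in access_rate.items():
        if v in cached:
            total += rate * FETCH                      # served directly from cache
        elif any(can_transcode(u, v) for u in cached):
            total += rate * (FETCH - TRANSCODE)        # served by transcoding
    return total

def victim(cached, access_rate, can_transcode):
    """Evict the version whose removal loses the least aggregate profit."""
    base = profit(cached, access_rate, can_transcode)
    return min(cached,
               key=lambda v: base - profit(cached - {v}, access_rate, can_transcode))

# Example: the 1080p version can be transcoded down to 480p.
rates = {"1080p": 0.2, "480p": 1.0}
down = lambda u, v: (u, v) == ("1080p", "480p")
print(victim({"1080p", "480p"}, rates, down))   # "480p": its loss is partly covered
```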
Distributed proof construction protocols have been shown to be valuable for reasoning about authorization decisions in open distributed environments such as pervasive computing spaces. Unfortunately, existing distributed proof protocols offer only limited support for protecting the confidentiality of sensitive facts, which limits their utility in many practical scenarios. In this paper, we propose a distributed proof construction protocol in which the release of a fact's truth value can be made contingent upon facts managed by other principals in the system. We formally prove that our protocol can safely prove conjunctions of facts without leaking the truth values of individual facts, even in the face of colluding adversaries and fact release policies with cyclical dependencies. This facilitates the definition of context-sensitive release policies that enable the conditional use of sensitive facts in distributed proofs.
In this paper, we introduce a novel approach to image completion, which we call structure propagation. In our system, the user manually specifies important missing structure information by extending a few curves or line segments from the known to the unknown regions. Our approach synthesizes image patches along these user-specified curves in the unknown region, using patches selected around the curves in the known region. Structure propagation is formulated as a global optimization problem by enforcing structure and consistency constraints. If only a single curve is specified, structure propagation is solved using dynamic programming; when multiple intersecting curves are specified, we adopt the belief propagation algorithm to find the optimal patches. After completing structure propagation, we fill in the remaining unknown regions using patch-based texture synthesis. We show that our approach works well on a number of examples that are challenging to state-of-the-art techniques.
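A schematic Viterbi-style dynamic program in the spirit of the single-curve case: anchor points along the user curve each pick one candidate source patch, minimizing a unary structure cost plus a pairwise consistency cost between neighbouring patches. The cost functions below are placeholders, not the paper's energy terms, and the multi-curve case (handled with belief propagation in the paper) is not covered.

```python
import math

def propagate(anchors, candidates, unary, pairwise):
    """anchors: anchor positions along the curve; candidates: source patches.
    Returns the minimum-energy assignment of one candidate index per anchor."""
    n, m = len(anchors), len(candidates)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = unary(anchors[0], candidates[j])
    for i in range(1, n):
        for j in range(m):
            for k in range(m):                       # best predecessor for candidate j
                c = (cost[i - 1][k] + pairwise(candidates[k], candidates[j])
                     + unary(anchors[i], candidates[j]))
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    j = min(range(m), key=lambda j: cost[n - 1][j])  # trace back the optimal labelling
    labels = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        labels.append(j)
    return list(reversed(labels))

# Toy demo: anchors are positions, candidates are gray levels standing in for patches.
print(propagate([0, 1, 2], [10, 20],
                unary=lambda a, c: abs(c - 15),
                pairwise=lambda c1, c2: abs(c1 - c2) / 10))
```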
As text documents are explosively increasing on the Internet, hierarchical document clustering has been proven useful for grouping similar documents for versatile applications. However, most document clustering methods still suffer from challenges in dealing with the problems of high dimensionality, scalability, accuracy, and meaningful cluster labels. In this paper, we present an effective fuzzy frequent itemset-based hierarchical clustering (IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the frequent itemset-based hierarchical clustering (FIHC) method. In our approach, the key terms are extracted from the document set, and each document is pre-processed into the designated representation for the subsequent mining process. Then, a fuzzy association rule mining algorithm for text is employed to discover a set of highly related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the candidate clusters. Finally, these documents are clustered into a hierarchical cluster tree by referring to these candidate clusters. We have conducted experiments to evaluate the performance on the Classic, Hitech, Re, Reuters, and WAP datasets. The experimental results show that our approach not only retains the merits of FIHC but also improves its clustering accuracy.
This paper presents a novel scheme for maintaining accurate information about distributed data in message-passing programs. We describe static single assignment (SSA) based algorithms to build an intermediate representation of a sequential program while targeting code generation for distributed-memory machines employing the single program multiple data (SPMD) model of programming. This SSA-based intermediate representation helps in a variety of optimizations performed by our automatic parallelizing compiler, PARADIGM, which generates message-passing programs and targets distributed-memory machines. In this paper, we concentrate on the semantics and implementation of this SSA form for message-passing programs, while giving some examples of the kinds of optimizations it enables. We describe in detail the need for various kinds of merge functions to maintain the single-assignment property of distributed data, give algorithms for the placement and semantics of these merge functions, and show how the requirements are substantially different owing to the presence of distributed data and arbitrary array addressing functions. This scheme has been incorporated in our compiler framework, which can use uniform methods to compile, parallelize, and optimize a sequential program irrespective of the subscripts used in array addressing functions. Experimental results for a number of benchmarks on an IBM SP show significant improvements in the total runtimes owing to some of the optimizations enabled by the SSA-based intermediate representation; we have observed substantial reductions in total runtimes with our SSA-based schemes compared to non-SSA-based schemes.
Super-resolution reconstruction of a face image is the problem of reconstructing a high-resolution face image from one or more low-resolution face images. Assuming that high- and low-resolution images share similar intrinsic geometries, various recent super-resolution methods reconstruct high-resolution images based on weights determined from nearest neighbors in the local embedding of low-resolution images. These methods suffer disadvantages from the finite number of samples and from the nature of manifold learning techniques, and hence yield unrealistic reconstructed images. To address the problem, we apply canonical correlation analysis (CCA), which maximizes the correlation between the local neighbor relationships of high- and low-resolution images. We use it separately for the reconstruction of the global face appearance and of the facial details. Experiments using a collection of frontal human faces show that the proposed algorithm improves reconstruction quality over existing state-of-the-art super-resolution algorithms, both visually and using a quantitative peak signal-to-noise ratio assessment.
We consider a substantial subset of C. We develop a mathematical specification for this subset by formalizing its abstract syntax, execution environment, well-typedness conditions, and operational evaluation semantics. Based on this specification, we prove that the subset is type safe by showing that the execution of programs preserves types up to the subtype relationship.
A web site presents a graph-like spatial structure composed of pages connected by hyperlinks. This structure may represent an environment in which situated agents associated with visitors of the web site (user agents) are positioned and moved in order to monitor their navigation. This paper presents a heterogeneous multi-agent system supporting the collection of information related to a user's behaviour in a web site by specific situated reactive user agents. The acquired information is then exploited by interface agents supporting advanced adaptive functionalities based on the history of the user's movement in the web site environment. Interface agents also interact with user agents to acquire information on other visitors of the web site and to support a context-aware form of interaction among web site visitors.
In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join, which combines similar points from multiple datasets. In this paper, we examine the problem of processing the k-nearest-neighbor similarity join (kNN join). A kNN join between two datasets returns, for each point in the first dataset, its k most similar points in the second. We propose a new index-based kNN join approach using the iDistance as the underlying index structure. We first present its basic algorithm and then propose two different enhancements. In the first enhancement, we optimize the original kNN join algorithm by using approximation bounding cubes. In the second enhancement, we exploit the reduced dimensions of the data space. We conducted an extensive experimental study using both synthetic and real datasets, and the results verify the performance advantage of our schemes over existing kNN join algorithms.
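For reference semantics only, here is a brute-force kNN join in Python; the paper's contribution is performing this join efficiently through the iDistance index plus bounding-cube and dimensionality-reduction enhancements, none of which this baseline implements.

```python
import heapq

def knn_join(R, S, k):
    """For every point r in R, return its k nearest points in S (Euclidean)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {tuple(r): heapq.nsmallest(k, S, key=lambda s: d2(r, s)) for r in R}

R = [(0, 0), (5, 5)]
S = [(1, 0), (4, 4), (9, 9)]
print(knn_join(R, S, k=2))
```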
Caches exploit locality of references to reduce memory access latencies and thereby improve processor performance. When an operating system switches application tasks or performs other kernel services, the assumption of locality may be violated, because the instructions and data may no longer be in the cache when the preempted operation is resumed. Thus, these operations have an additional cache interference cost that must be taken into account when calculating or estimating the performance and responsiveness of the system. In this paper, we present a simulation framework suitable for examining the cache interference cost in preemptive real-time systems. Using this framework, we measure the interference cost for operating system services and a set of embedded benchmarks. The simulations show that there is a significant performance gap between the best- and worst-case execution times, even for simple hardware architectures. Also, the worst-case performance of some software modules was found to be more or less independent of the cache configuration. These results can be used to gain a better understanding of the execution behavior of preemptive real-time systems and can serve as guidelines for choosing suitable cache configurations.
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that, for an interesting and general subclass that we term monotone bandits, a surprisingly simple and intuitive greedy policy yields a constant-factor approximation. Such greedy policies are termed index policies and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem and naturally models multi-project scheduling, where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel balance constraint to the dual of a well-known LP relaxation of the restless bandit problem. This is followed by a structural characterization of the optimal solution, using both the exact primal as well as dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions, from which we derive the index policy and the associated analysis.
We introduce a straightforward, robust, and efficient algorithm for rendering high-quality soft shadows in dynamic scenes. Each frame, points in the scene visible from the eye are inserted into a spatial acceleration structure. Shadow umbrae are computed by sampling the scene from the light at the image-plane coordinates given by the stored points. Penumbrae are computed at the same set of points per silhouette edge in two steps: first, the set of points affected by a given edge is estimated from the expected light-view (screen-space) bounds of the corresponding penumbra; second, the actual overlap between these points and the penumbra is computed analytically, directly from the occluding geometry. The umbral and penumbral sources of occlusion are then combined to determine the degree of shadow at the eye-view pixel corresponding to each sample point. An implementation of this algorithm for the Larrabee architecture runs at real-time frame rates in simulation for scenes from a modern game and produces significantly higher image quality than other recent methods in the real-time domain.
In this paper, we present a system that uses computational linguistic techniques to extract metadata for image access. We discuss the implementation, functionality, and evaluation of an image cataloger's toolkit developed in the Computational Linguistics for Metadata Building (CLiMB) research project. We have tested components of the system, including phrase finding for the art and architecture domain, functional semantic labeling using machine learning, and disambiguation of terms in domain-specific text vis-à-vis a rich thesaurus of subject terms, geographic names, and artist names. We present specific results on disambiguation techniques and on the nature of the ambiguity problem, given the thesaurus resources and a domain-specific text resource, with a comparison to domain-general resources and text. Our primary user group for evaluation has been catalogers with specific expertise in the fields of painting, sculpture, and vernacular and landscape architecture.
Data mining is a new technology that helps businesses to predict future trends and behaviours, allowing them to make proactive, knowledge-driven decisions. When data mining tools and techniques are applied to a data warehouse based on customer records, they search for hidden patterns and trends; these can be further used to improve customer understanding and acquisition. Customer relationship management (CRM) systems are adopted by organisations in order to achieve success in the business and also to formulate business strategies, which can be based on the predictions given by the data mining tools. Basically, three major areas of data mining research are identified: implementation of CRM systems, evaluation criteria for data mining software and CRM systems, and methods to improve data quality for data mining. The paper concludes with a proposed integrated model for CRM systems evaluation and implementation. This paper focuses on these areas where there is a need for more exploration, and provides a framework for the analysis of data mining research for CRM systems.
In this paper, we propose the novel concept of probabilistic design for multimedia embedded systems, motivated by the challenge of how to design, but not overdesign, such systems while systematically incorporating the performance requirements of the multimedia application, uncertainties in execution time, and tolerance for reasonable execution failures. Most present techniques are based on either worst- or average-case execution times of application tasks, where the former guarantees the completion of each execution but often leads to overdesigned systems, and the latter fails to provide any completion guarantees. In contrast, the proposed probabilistic design method takes advantage of the unique features of multimedia systems mentioned above to relax the rigid hardware requirements for the software implementation and to avoid overdesigning the system. In essence, this relaxation expands the design space, and we further develop an off-line/on-line minimum-effort algorithm for quick exploration of the enlarged design space at early design stages. This is the first step toward our goal of bridging the gap between real-time analysis and embedded software implementation for rapid and economic multimedia system design. It is our belief that the proposed method has great potential for reducing system resources while meeting performance requirements. The experimental results confirm this, as we achieve significant savings in the system's energy consumption while providing a statistical completion-ratio guarantee (i.e., the expected number of completions over a large number of iterations is greater than a given value).
Column statistics are an important element of cardinality estimation frameworks: more accurate estimates allow the optimizer of an RDBMS to generate better plans and improve the overall system's efficiency. This paper introduces filtered statistics, which model the value distribution over a set of rows restricted by a predicate. This feature, available in Microsoft SQL Server, can be used to handle column correlation as well as to focus on interesting data ranges; in particular, it fits well for scenarios with logical subtables, such as flexible-schema or multi-tenant applications. Integration with the existing cardinality estimation infrastructure is presented.
To keep up with explosive Internet packet processing demands, modern network processors (NPs) employ a highly parallel, multi-threaded, and multi-core architecture. In such a parallel paradigm, accesses to the shared variables in the external memory, and the associated memory latency, are contained in critical sections so that they can be executed atomically and sequentially by the different threads in the network processor. In this paper, we present a novel program transformation, used in the Intel auto-partitioning compiler for IXP, that exploits the inherent finer-grained parallelism of those critical sections using the software-controlled caching mechanism available in the NPs. Consequently, those critical sections can be executed in a pipelined fashion by different threads, thereby effectively hiding the memory latency and improving the performance of network applications. Experimental results show that the proposed transformation provides impressive speedup and scalability with the number of threads for a real-world network application (a 10Gbps Ethernet core/metro router).
We show that if a connected graph with n nodes has conductance phi, then rumour spreading, also known as randomized broadcast, successfully broadcasts a message within O((log n)/phi) many rounds, with high probability, regardless of the source, by using the push-pull strategy (the notation hides a polylogarithmic factor). This result is almost tight, since there exists a graph with n nodes and conductance phi whose diameter is of order (log n)/phi. If, in addition, the network satisfies some kind of uniformity condition on the degrees, our analysis implies that both push and pull by themselves successfully broadcast the message to every node in the same number of rounds.
Although database design tools have been developed that attempt to automate or semi-automate the design process, these tools do not have the capability to capture common-sense knowledge about business applications and store it in a context-specific manner. As a result, they rely on the user to provide a great deal of trivial detail and do not function as well as a human designer, who usually has some general knowledge of how an application might work based on his or her common-sense knowledge of the real world. Common-sense knowledge could be used by a database design system to validate and improve the quality of an existing design, or even to generate new designs. This requires that context-specific information about different database design applications be stored and generalized into information about specific application domains (e.g., pharmacy, daycare, hospital, university, manufacturing). Such information should be stored at the appropriate level of generality in a hierarchically structured knowledge base so that it can be inherited by the subdomains below. For this to occur, two types of learning must take place. First, knowledge about a particular application domain that is acquired from specific applications within that domain is generalized into a domain node; e.g., entities, relationships, and attributes from various hospital applications are generalized to a hospital node. This is referred to as within-domain learning. Second, the information common to two or more related application domain nodes is generalized to a higher-level node; for example, knowledge from the car rental and video rental domains may be generalized to a rental node. This is called across-domain learning. This paper presents a methodology for learning across different application domains based on a distance measure. The parameters used in this methodology were refined by testing on a set of representative cases, and empirical testing provided further validation.
In this paper, we address the problem of clustering object graphs in object-oriented databases. Unlike previous studies, which focused only on a workload consisting of a single operation, this study tackles the problem when the workload is a set of operations (methods and queries) that occur with certain probabilities. Thus, the goal is to minimize the expected cost of an operation in the workload while maintaining a similarly low cost for each individual operation class. To this end, we present a new clustering policy based on the nearest-neighbor graph partitioning algorithm. We then demonstrate that this policy provides considerable gains when compared to a suite of well-known clustering policies proposed in the literature. Our results are based on two widely referenced object-oriented database benchmarks, namely the Tektronix HyperModel and OO7.
Integrating several legacy software systems together is commonly performed with multiple applications of the adapter design pattern in OO languages such as Java. The integration is based on specifying bi-directional translations between pairs of APIs from the different systems. Yet manual development of wrappers to implement these translations is tedious, expensive, and error-prone. In this paper, we explore how models, aspects, and generative techniques can be used in conjunction to alleviate the implementation of multiple wrappers. Briefly, the steps are: the automatic reverse engineering of relevant concepts in APIs to high-level models; the manual definition of mapping relationships between concepts in different models of APIs using an ad hoc DSL; and the automatic generation of wrappers from these mapping specifications using AOP. This approach is weighed against manual development of wrappers using an industrial case study; the criteria are the relative code length and the increase in automation.
Higher-order abstract syntax is a simple technique for implementing languages with functional programming: object variables and binders are implemented by variables and binders in the host language. By using this technique, one can avoid implementing common and tricky routines dealing with variables, such as capture-avoiding substitution. However, despite the advantages this technique provides, it is not commonly used because it is difficult to write sound elimination forms (such as folds or catamorphisms) for higher-order abstract syntax. To fold over such a data type, one must either simultaneously define an inverse operation (which may not exist) or show that all functions embedded in the data type are parametric. In this paper, we show how first-class polymorphism can be used to guarantee the parametricity of functions embedded in higher-order abstract syntax. With this restriction, we implement a library of iteration operators over data structures containing functionals. From this implementation, we derive fusion laws that functional programmers may use to reason about the iteration operator. Finally, we show how this use of parametric polymorphism corresponds to the Schürmann, Despeyroux, and Pfenning method of enforcing parametricity through modal types. We do so by using this library to give a sound and complete encoding of their calculus into System F. This encoding can serve as a starting point for reasoning about higher-order structures in polymorphic languages.
We introduce a theoretical framework for discovering relationships between two database instances over distinct and unknown schemata. This framework is grounded in the context of data exchange. We formalize the problem of understanding the relationship between two instances as that of obtaining a schema mapping such that a minimum repair of this mapping provides a perfect description of the target instance given the source instance. We show that this definition yields intuitive results when applied to database instances derived from each other by basic operations. We study the complexity of decision problems related to this optimality notion in the context of different logical languages, and show that, even in very restricted cases, the problem is of high complexity.
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules; however, these measures fail to take the probabilistic properties of the mined data into account. We start this paper by presenting a simple probabilistic framework for transaction data, which can be used to simulate transaction data when no associations are present. We use such data, together with a real-world database from a grocery outlet, to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules, and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
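For readers unfamiliar with the baseline measures being critiqued, the short Python snippet below computes support, confidence, and lift for a rule A -> B over a toy transaction database; the paper's own hyper-lift and hyper-confidence measures (based on a probabilistic model of transactions) are not reproduced here.

```python
def measures(transactions, A, B):
    n = len(transactions)
    n_A  = sum(A <= t for t in transactions)          # transactions containing A
    n_B  = sum(B <= t for t in transactions)          # transactions containing B
    n_AB = sum((A | B) <= t for t in transactions)    # transactions containing A and B
    support    = n_AB / n
    confidence = n_AB / n_A
    lift       = confidence / (n_B / n)               # P(B|A) / P(B)
    return support, confidence, lift

db = [{"milk", "bread"}, {"milk"}, {"bread", "butter"}, {"milk", "bread", "butter"}]
print(measures(db, {"milk"}, {"bread"}))              # (0.5, 0.666..., 0.888...)
```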
In spite of impressive gains by PL/I, Fortran and COBOL remain the languages in which most of the world's production programs are written, and they will remain so into the foreseeable future. There is a great deal of theoretical interest in Algol and in extensible languages, but so far at least they have had little practical impact. Problem-oriented languages may very well become the most important language development area in the next five to ten years. In the operating system area, all major computer manufacturers set out to produce very ambitious multiprogramming systems, and they all ran into similar problems. A number of university projects, though not directly comparable to those of the manufacturers, have contributed greatly to a better understanding of operating system principles. Important trends include the increased interest in the development of system measurement and evaluation techniques, and the increased use of microprogramming for some programming system functions.
This paper describes an ambient intelligent prototype known as socio-ec(h)o. socio-ec(h)o explores the design and implementation of a system for sensing and display, user modeling, and interaction models based on a game structure. The game structure includes word puzzles, levels, body states, goals, and game skills; body states are body movements and positions that players must discover in order to complete a level, and they in turn represent a learned game skill. The paper provides an overview of background concepts and related research. We describe the prototype and game structure, provide a technical description of the prototype, and discuss technical issues related to sensing, reasoning, and display. The paper contributes a method for constructing group parameters from individual parameters with real-time motion capture data, and a model for mapping the trajectory of a participant's actions in order to determine an intensity level used to manage the experience flow of the game and its representation in audio and visual display. We conclude with a discussion of known and outstanding technical issues and future research.
We describe forward rasterization, a class of rendering algorithms designed for small polygonal primitives. The primitive is efficiently rasterized by interpolation between its vertices. The interpolation factors are chosen to guarantee that each pixel covered by the primitive receives at least one sample, which avoids holes. The location of the samples is recorded with subpixel accuracy using a pair of offsets, which are then used to reconstruct (resample) the output image. Offset reconstruction has good static and temporal antialiasing properties. We present two forward rasterization algorithms: one that renders quadrilaterals and is suitable for scenes modeled with depth images, as in image-based rendering by warping, and one that renders triangles and is suitable for scenes modeled conventionally. When compared to conventional rasterization, forward rasterization is more efficient for small primitives and has better temporal antialiasing properties.
There is a common misconception that the automobile industry is slow to adopt new technologies such as artificial intelligence (AI) and soft computing. The reality is that many new technologies are deployed and brought to the public through the vehicles that they drive. This paper provides an overview and sampling of the many ways the automotive industry has utilized AI, soft computing, and other intelligent system technologies in such diverse domains as manufacturing, diagnostics, on-board systems, warranty analysis, and design.
As the memory hierarchy becomes deeper and is shared by more processors, locality increasingly determines system performance. As a rigorous and precise locality model, reuse distance has been used in program optimizations, performance prediction, memory disambiguation, and locality phase prediction. However, the high cost of measurement has been severely impeding its use in scenarios requiring high efficiency, such as product compilers, performance debugging, and run-time optimizations. We recently discovered a statistical connection between time and reuse distance, which led to an efficient way to approximate reuse distance using time. However, that work did not expose some algorithmic and implementation techniques that are vital for the efficiency and scalability of the approximation model. This paper presents these techniques: it describes an algorithm that approximates reuse distance on arbitrary scales, explains a portable scheme that employs the memory controller to accelerate the measurement of time distance, and presents the algorithm and proof of a trace generator that can facilitate various locality studies.
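As background, the snippet below measures reuse distance exactly but naively: the reuse distance of an access is the number of distinct data elements referenced since the previous access to the same element. The paper's point is precisely that exact measurement of this kind is too costly, and that reuse distance can instead be approximated efficiently from time distance; this sketch does not implement that approximation.

```python
def reuse_distances(trace):
    last = {}                      # element -> index of its previous access
    out = []
    for i, x in enumerate(trace):
        if x in last:
            out.append(len(set(trace[last[x] + 1:i])))   # distinct elements in between
        else:
            out.append(None)       # cold miss: infinite reuse distance
        last[x] = i
    return out

print(reuse_distances(list("abcab")))   # [None, None, None, 2, 2]
```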
Retrieving images from a large image collection has been an active area of research, and most existing work has focused on content representation. In this paper, we address the issue of identifying relevant images quickly, which is important in order to meet users' performance requirements. We propose a framework for fast image retrieval based on object shapes extracted from objects within images. The framework builds a hierarchy of approximations on object shapes, such that a shape representation at a higher level is a coarser representation of a shape at the lower level; in other words, multiple shapes at a lower level can be mapped into a single shape at a higher level. In this way, the hierarchy serves to partition the database at various granularities. Given a query shape, by searching only the relevant paths in the hierarchy, a large portion of the database can thus be pruned away. We propose the angle mapping (AM) method to transform a shape from one level to another, higher level; AM essentially replaces some edges of a shape by a smaller number of edges, based on the angles between the edges, thus reducing the complexity of the original shape. Based on the framework, we also propose two hierarchical structures to facilitate speedy retrieval: the first, called hierarchical partitioning on shape representation (HPSR), uses the shape representation as the indexing key; the second, called hierarchical partitioning on angle vector (HPAV), captures the angle information from the shape representation. We conducted an extensive study of both methods to assess their quality and efficiency. Our experiments on sets of images, each containing several objects, showed that the framework can provide speedy image retrieval without sacrificing quality; both proposed schemes can improve efficiency by as much as hundreds of times compared to sequential scanning, and the improvement grows as the image database size, the number of objects per image, or the object dimension increases.
With the advent of extensive wireless networks that blanket physically compact urban enclaves such as office complexes, shopping centers, or university campuses, it is possible to create software applications that provide location-based mobile online services. One such application is CampusWiki, which integrates location information into a wiki structure. In the design science research reported in this paper, we employed a form of action research in which we engaged users as participants in an iterative process of designing and evaluating CampusWiki. Two qualitative studies were undertaken early in the design process, in which semi-structured interviews were used to assess potential users' reactions to CampusWiki. Through this research, the designers were able to assess whether their intentions matched the mental models of potential users of the application. The results showed that, although many of the perceived benefits were as designed by the developers, misunderstanding of the location-aware feature led users to unanticipated concerns and expectations. These findings are important in guiding designers and implementers on the desirable, and possibly undesirable, features of such systems.
Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies, attempting to do data record detection and attribute labeling in two separate phases. In this paper, we propose an integrated web data extraction paradigm with hierarchical models. The proposed model is called Dynamic Hierarchical Markov Random Fields (DHMRFs). DHMRFs take structural uncertainty into consideration and define a joint distribution of both model structure and class labels; the joint distribution is an exponential family distribution. As a conditional model, DHMRFs relax the independence assumption made in directed models. Since exact inference is intractable, a variational method is developed to learn the model's parameters and to find the MAP model structure and label assignments. We apply DHMRFs to a real-world web data extraction task. Experimental results show that integrated web data extraction models can achieve significant improvements on both record detection and attribute labeling compared to decoupled models. In diverse web data extraction, DHMRFs can potentially address the blocky-artifact issue suffered by fixed-structured hierarchical models.
As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users' privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables at each site, so as to determine quickly whether candidate patterns have ever occurred at the site or not. Extensive experiments with real-world network traffic data confirmed the correctness and the efficiency of the proposed method.
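A minimal sketch of the retention-replacement idea referenced above (a form of randomized response): each client keeps its true answer with probability p and otherwise substitutes a value drawn uniformly from the domain, so individual answers are perturbed while aggregate frequencies remain estimable. The domain, values, and p below are illustrative, not taken from the paper.

```python
import random

def retention_replace(value, domain, p):
    """Keep the true value with probability p, otherwise replace it uniformly."""
    return value if random.random() < p else random.choice(domain)

def estimate_frequency(perturbed, target, domain, p):
    """Unbiased estimate of the true frequency of `target` from perturbed data."""
    observed = sum(v == target for v in perturbed) / len(perturbed)
    return (observed - (1 - p) / len(domain)) / p

domain = ["http", "dns", "smtp", "ssh"]
truth = ["http"] * 700 + ["dns"] * 300
noisy = [retention_replace(v, domain, p=0.6) for v in truth]
print(estimate_frequency(noisy, "http", domain, p=0.6))   # typically close to 0.7
```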
This paper describes LLVM (Low Level Virtual Machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile time, link time, run time, and in idle time between runs. LLVM defines a common, low-level code representation in Static Single Assignment (SSA) form, with several novel features: a simple, language-independent type system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.
Caching frequently accessed data items on the client side is an effective technique for improving performance in a mobile environment. Classical cache invalidation strategies are not suitable for mobile environments due to frequent disconnections and the mobility of the clients. One attractive cache invalidation technique is based on invalidation reports (IRs). However, the IR-based cache invalidation solution has two major drawbacks, which have not been addressed in previous research. First, there is a long query latency associated with this solution, since a client cannot answer a query until the next IR interval. Second, when the server updates a hot data item, all clients have to query the server and get the data from the server separately, which wastes a large amount of bandwidth. In this paper, we propose an IR-based cache invalidation algorithm which can significantly reduce the query latency and efficiently utilize the broadcast bandwidth. Detailed analytical analysis and simulation experiments are carried out to evaluate the proposed methodology. Compared to previous IR-based schemes, our scheme can significantly improve the throughput and reduce the query latency, the number of uplink requests, and the broadcast bandwidth requirements.
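To clarify the baseline being improved, here is a simplified client-side view of invalidation-report processing in Python: the cache drops any item whose server update timestamp in the report is newer than the cached copy, forcing a later uplink fetch. The data structures and timestamps are hypothetical; the paper's actual contribution (reducing the latency and uplink/broadcast cost of this basic scheme) is not modelled.

```python
class Cache:
    def __init__(self):
        self.items = {}            # object id -> (value, timestamp of cached copy)

    def put(self, oid, value, ts):
        self.items[oid] = (value, ts)

    def apply_ir(self, report):
        """report: {oid: last_update_ts}, broadcast periodically by the server."""
        for oid, upd_ts in report.items():
            if oid in self.items and self.items[oid][1] < upd_ts:
                del self.items[oid]            # stale copy: invalidate it

    def query(self, oid):
        return self.items[oid][0] if oid in self.items else None   # None => go uplink

c = Cache()
c.put("x", 42, ts=10)
c.apply_ir({"x": 15, "y": 3})
print(c.query("x"))    # None: the client must fetch "x" again after the IR
```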
Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent nodes. We seek to automate the process of data partitioning: given a workload of SQL statements, we seek to determine automatically how to partition the base data across multiple nodes to achieve overall optimal, or close to optimal, performance for that workload. Previous attempts use heuristic rules to make those decisions; these approaches fail to consider all of the interdependent aspects of query performance typically modeled by today's sophisticated query optimizers. We present a comprehensive solution to the problem that has been tightly integrated with the optimizer of a commercial shared-nothing parallel database system. Our approach uses the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates. We compare a rank-based enumeration method with a random-based one; our experimental results show that the former is more effective.
Considering the constraints brought by mobility and resources, it is important for routing protocols to efficiently deliver data in an intermittently connected mobile network (ICMN). Different from previous works that use the knowledge of previous encounters to predict future contacts, in this paper we propose a storage-friendly, region-based protocol named RENA. Instead of using temporal information, RENA builds routing tables based on regional movement history, which avoids excessive storage for tracking encounter history. We validate the generality of RENA through a time-variant community mobility model with parameters extracted from the MIT WLAN trace, and through a vehicular network based on the bus routes of the city of Helsinki. The comprehensive simulation results show that RENA is not only storage-friendly but also more efficient than epidemic routing, the restricted replication protocol SnW, and the encounter-based protocol RAPID under various conditions.
We propose a distributed, on-demand power management protocol for collecting data in sensor networks. The protocol aims to reduce power consumption while supporting fluctuating demand in the network, and to provide local routing information and synchronicity without global control. Energy savings are achieved by powering down nodes during idle times identified through dynamic scheduling. We present a real implementation on wireless sensor nodes based on a novel two-level architecture. We evaluate our approach through measurements and simulation, and show how the protocol allows adaptive scheduling and enables a smooth trade-off between energy savings and latency. An example current measurement shows substantial energy savings on an intermediate node.
We study a new research problem in which an implicit information retrieval query is inferred from eye movements measured while the user is reading, and is used to retrieve new documents. In the training phase, the user's interest is known, and we learn a mapping from how the user looks at a term to the role of the term in the implicit query. Assuming the mapping is universal, that is, the same for all queries in a given domain, we can use it to construct queries even for new topics for which no learning data is available. We constructed a controlled experimental setting to show that, when the system has no prior information as to what the user is searching for, the eye movements help significantly in the search. This is the case in proactive search, for instance, where the system monitors the reading behaviour of the user in a new topic. In contrast, during a search or reading session where the set of inspected documents is biased towards being relevant, a stronger strategy is to search for content-wise similar documents than to use the eye movements.
In recent years, networks of workstations/PCs (so-called NOWs) have become appealing vehicles for cost-effective parallel computing. Due to the commodity nature of workstations and networking equipment, LAN environments are gradually becoming heterogeneous. The diverse sources of heterogeneity in NOW systems pose a challenge for the design of efficient communication algorithms for this class of systems. In this paper, we propose efficient algorithms for multiple multicast on heterogeneous NOW systems, focusing on heterogeneity in the processing speeds of workstations/PCs. Multiple multicast is an important operation in many scientific and industrial applications. Multicast on heterogeneous systems has not been investigated until recently, and our work distinguishes itself from others in two aspects. First, in contrast to the blocking communication model used in prior works, we model communication in a heterogeneous cluster more accurately with a non-blocking communication model and design multicast algorithms that can fully take advantage of non-blocking communication. Second, while prior works focus on the single multicast problem, we propose efficient algorithms for general multiple multicast (of which single multicast is a special case) on heterogeneous NOW systems. To our knowledge, our work is the earliest effort that addresses multiple multicast for heterogeneous NOW systems. These algorithms are evaluated using a network simulator for heterogeneous NOW systems. Our experimental results show that some of the algorithms outperform the others in many cases, and the best algorithm achieves a completion time that is within a small factor of the lower bound.
Supporting quality of service (QoS) in wireless networks has been a very rich and interesting area of research. Many significant advances have been made in supporting QoS in single wireless networks; however, support for QoS across multiple heterogeneous wireless networks will be required in future wireless networks. In connections spanning multiple wireless networks, the end-to-end QoS will depend on several factors, such as the mobility and connection patterns of users and the QoS policies in each of the wireless networks. The end-to-end QoS is also affected by multiple decisions that must be made by several different network entities for resource allocation. The paper has two objectives: one is to demonstrate the decision-making process for resource allocation in multiple heterogeneous wireless networks, and the second is to present a novel concept of composite QoS in such a wireless environment. More specifically, we present an architecture for multiple heterogeneous wireless networks, a decision-making process for resource request and allocation, a simulation model to study composite QoS, and several interesting results. We also present potential implications of composite QoS for users and network service providers, and we show how the QoS ideas presented in this paper can be used by wireless carriers for improved QoS support and management. The paper can form the basis for significant further research in DSS for emerging wireless networks supporting QoS for a range of sophisticated and resource-intensive mobile applications.
This work addresses the problem of optimizing the deployment of sensors in order to ensure the quality of the readings of the value of interest in a given critical geographic region. As usual, we assume that each sensor is capable of reading a particular physical phenomenon (e.g., the concentration of toxic materials in the air) and transmitting it to a server or a peer. However, the key assumptions considered in this work are that each sensor is capable of moving, where the motion may be remotely controlled, and that the spatial range for which the individual sensor's reading is guaranteed to be of the desired quality is limited. In scenarios like disaster management and homeland security, in case some of the sensors dispersed in a larger geographic area report a value higher than a certain threshold, one may want to ensure the quality of the readings for the affected region. This, in turn, implies that one may want to ensure that there are enough sensors there, and consequently guide a subset of the rest of the sensors towards the affected region. In this paper, we explore variants of the problem of optimizing the guidance of the mobile sensors towards the affected geographic region, and we present algorithms for their solutions.
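As a strawman for the guidance problem (not one of the paper's algorithms), the sketch below greedily dispatches the idle sensors closest to the affected region until a required count is reached; the coordinates, required count, and distance metric are placeholders.

```python
import math

def dispatch(sensors, region_center, already_inside, required):
    """sensors: {id: (x, y)} of mobile sensors currently outside the region.
    Returns the ids of the sensors to guide toward the region."""
    need = max(0, required - already_inside)
    ranked = sorted(sensors.items(),
                    key=lambda kv: math.dist(kv[1], region_center))
    return [sid for sid, _ in ranked[:need]]

outside = {"s1": (9, 9), "s2": (2, 1), "s3": (4, 4)}
print(dispatch(outside, region_center=(0, 0), already_inside=1, required=3))
# ['s2', 's3']: the two nearest sensors are sent toward the affected region
```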
We describe in-place reconfiguration (IPR) for LUT-based FPGAs, an algorithm that maximizes identical configuration bits for complementary inputs of a LUT, thereby reducing the propagation of faults seen at a pair of complementary inputs. Based on IPR, we develop a fault-tolerant logic resynthesis algorithm which decreases the circuit fault rate while preserving the functionality and topology of the LUT-based logic network. Since the topology is preserved, the resynthesis algorithm can be applied post-layout and without changes to the physical design. Compared to the state-of-the-art academic technology mapper Berkeley ABC, IPR reduces the relative fault rate and increases MTTF with the same area and performance, and IPR combined with the previous fault-tolerant logic resynthesis algorithm ROSE reduces the relative fault rate and increases MTTF with less area but the same performance. The above improvement assumes a stochastic single fault, and more improvement is expected for multi-fault models.
In this paper, an interactive and realistic virtual head oriented to human-computer interaction and social robotics is presented. It has been designed following a hybrid approach, taking robotic characteristics into account and searching for convergence between these characteristics, real facial actions, and animation techniques. An initial head model is first obtained from a real person using a laser scanner; then the model is animated using a hierarchical, skeleton-based procedure. The proposed rig structure is close to real facial muscular anatomy, and its behaviour follows the Facial Action Coding System. Speech synthesis and visual human face tracking capabilities are also integrated to provide the head with further interaction ability. Using the said hybrid approach, the head can be readily linked to a social robot architecture. The opinions of a number of persons interacting with this social avatar have been evaluated and are reported in the paper, as against their reactions when interacting with a social robot with a mechatronic face. Results show the suitability of the avatar for on-screen, real-time interfacing in human-computer interaction. The proposed technique could also be helpful in the future for designing and parameterizing mechatronic human-like heads for social robots.
The emerging paradigm of electronic services promises to bring to distributed computation and services the flexibility that the web has brought to the sharing of documents. An understanding of fundamental properties of service composition is required in order to take full advantage of the paradigm. This paper examines proposals and standards for services from the perspectives of XML, data management, workflow, and process models. Key areas for study are identified, including behavioral service signatures, verification and synthesis techniques for composite services, analysis of service data manipulation commands, and XML analysis applied to service specifications. We give a sample of the relevant results and techniques in each of these areas.
Generic database replication algorithms do not scale linearly in throughput, as all update, deletion, and insertion (UDI) queries must be applied to every database replica. The throughput is therefore limited to the point where the number of UDI queries alone is sufficient to overload one server. In such scenarios, partial replication of a database can help, as UDI queries are executed only by a subset of all servers. In this paper, we propose GlobeTP, a system that employs partial replication to improve database throughput. GlobeTP exploits the fact that a web application's query workload is composed of a small set of read and write templates. Using knowledge of these templates and their respective execution costs, GlobeTP provides database table placements that produce significant improvements in database throughput. We demonstrate the efficiency of this technique using two different industry-standard benchmarks. In our experiments, GlobeTP increases the throughput substantially compared to full replication while using an identical hardware configuration; furthermore, adding a single query cache improves the throughput even further.
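A toy illustration of template-aware partial replication, assuming a hypothetical placement of tables on two replicas (this is not GlobeTP's placement algorithm or API): read templates are routed to one replica hosting all the tables they touch, while write (UDI) templates must be applied at every replica hosting any touched table, which is what limits their cost under partial replication.

```python
PLACEMENT = {                      # hypothetical placement of tables on replicas
    "r1": {"users", "orders"},
    "r2": {"orders", "items"},
}

def route_read(tables):
    """Any replica holding all tables of the read template can answer alone."""
    return next(r for r, held in PLACEMENT.items() if tables <= held)

def route_write(tables):
    """A write must reach every replica holding any of the touched tables."""
    return [r for r, held in PLACEMENT.items() if held & tables]

print(route_read({"orders", "items"}))      # 'r2' can answer by itself
print(route_write({"orders"}))              # ['r1', 'r2'] must both apply the update
```

A placement is only valid if every read template in the workload can be answered by at least one replica, which is the constraint a template-driven placement has to satisfy while spreading the UDI load.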
many artificial intelligence tasks such as automated question answering reasoning or heterogeneous database integration involve verification of semantic category eg coffee is drink red is color while steak is not drink and big is not color we present novel algorithm to automatically validate semantic category contrary to the methods suggested earlier our approach does not rely on any manually codified knowledge but instead capitalizes on the diversity of topics and word usage on the world wide web we have tested our approach within our online fact seeking question answering environment when tested on the trec questions that expect the answer to belong to specific semantic category our approach has improved the accuracy by up to depending on the model and metrics used
in the rank join problem we are given set of relations and scoring function and the goal is to return the join results with the top scores it is often the case in practice that the inputs may be accessed in ranked order and the scoring function is monotonic these conditions allow for efficient algorithms that solve the rank join problem without reading all of the input in this article we present thorough analysis of such rank join algorithms strong point of our analysis is that it is based on more general problem statement than previous work making it more relevant to the execution model that is employed by database systems one of our results indicates that the well known hrjn algorithm has shortcomings because it does not stop reading its input as soon as possible we find that it is np hard to overcome this weakness in the general case but cases of limited query complexity are tractable we prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input in order to stop as soon as possible as result the algorithm achieves cost that is within constant factor of optimal
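As a concrete illustration of the threshold reasoning discussed above, the following is a minimal sketch of an HRJN-style rank join over two lists already sorted by descending score, assuming a monotonic additive scoring function; the pull strategy, field names and the stopping test shown here are illustrative simplifications rather than the algorithms analyzed in the article.

```python
import heapq

def push_topk(heap, score, pair, k):
    if len(heap) < k:
        heapq.heappush(heap, (score, id(pair), pair))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, id(pair), pair))

def rank_join(left, right, key_l, key_r, score_l, score_r, k):
    """HRJN-style rank join of two inputs sorted by descending score.

    left/right: lists of dicts; key_l/key_r: join-key fields;
    score_l/score_r: score fields. The combined score is the sum, which
    is monotonic, so a threshold bound tells us when to stop reading.
    """
    seen_l, seen_r, results = [], [], []
    i = j = 0
    top_l = left[0][score_l] if left else 0.0
    top_r = right[0][score_r] if right else 0.0
    while i < len(left) or j < len(right):
        # alternate inputs (a simple pull strategy; HRJN variants differ here)
        if i < len(left) and (j >= len(right) or i <= j):
            t = left[i]; i += 1
            seen_l.append(t)
            for s in seen_r:
                if t[key_l] == s[key_r]:
                    push_topk(results, t[score_l] + s[score_r], (t, s), k)
        else:
            s = right[j]; j += 1
            seen_r.append(s)
            for t in seen_l:
                if t[key_l] == s[key_r]:
                    push_topk(results, t[score_l] + s[score_r], (t, s), k)
        # threshold: best score any not-yet-seen join result could still reach
        last_l = seen_l[-1][score_l] if seen_l else top_l
        last_r = seen_r[-1][score_r] if seen_r else top_r
        threshold = max(last_l + top_r, top_l + last_r)
        if len(results) == k and results[0][0] >= threshold:
            break            # no deeper tuple can enter the top-k
    return sorted(results, reverse=True)

left  = [{"id": 1, "sl": 0.9}, {"id": 2, "sl": 0.8}, {"id": 3, "sl": 0.1}]
right = [{"id": 2, "sr": 0.9}, {"id": 1, "sr": 0.5}, {"id": 3, "sr": 0.4}]
print(rank_join(left, right, "id", "id", "sl", "sr", k=1))
```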
in this paper we present method for organizing and indexing logo digital libraries like the ones of the patent and trademark offices we propose an efficient queried by example retrieval system which is able to retrieve logos by similarity from large databases of logo images logos are compactly described by variant of the shape context descriptor these descriptors are then indexed by locality sensitive hashing data structure aiming to perform approximate nn search in high dimensional spaces in sub linear time the experiments demonstrate the effectiveness and efficiency of this system on realistic datasets as the tobacco logo database
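The sub-linear search step can be illustrated with one common locality sensitive hashing family; the sketch below uses random-hyperplane signatures for cosine similarity as a generic stand-in, since the exact descriptor dimensionality, hash family and table parameters used for the logo database are not reproduced here.

```python
import numpy as np

class HyperplaneLSH:
    """Random-hyperplane LSH: descriptors are mapped to short bit signatures.

    Vectors whose signatures collide in at least one table become candidate
    neighbours, so only a small fraction of a large descriptor database has
    to be compared exactly (sub-linear search in practice).
    """
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]
        self.data = []

    def _key(self, planes, v):
        return tuple((planes @ v) > 0)

    def index(self, vectors):
        for v in vectors:
            idx = len(self.data)
            self.data.append(np.asarray(v, dtype=float))
            for planes, table in zip(self.planes, self.tables):
                table.setdefault(self._key(planes, self.data[idx]), []).append(idx)

    def query(self, q, k=5):
        q = np.asarray(q, dtype=float)
        cand = set()
        for planes, table in zip(self.planes, self.tables):
            cand.update(table.get(self._key(planes, q), []))
        # exact re-ranking of the (small) candidate set by cosine similarity
        def cos(a, b):
            return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return sorted(cand, key=lambda i: -cos(q, self.data[i]))[:k]
```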
early applications of smart cards have focused on the area of personal security recently there has been an increasing demand for networked multi application cards in this new scenario enhanced application specific on card java applets and complex cryptographic services are executed through the smart card java virtual machine jvm in order to support such computation intensive applications contemporary smart cards are designed with built in microprocessors and memory as smart cards are highly area constrained environments with memory cpu and peripherals competing for very small die space the vm execution engine of choice is often small slow interpreter in addition support for multiple applications and cryptographic services demands high performance vm execution engine the above necessitates the optimization of the jvm for java cards in this paper we present the concept of an annotation aware interpreter that optimizes the interpreted execution of java code using java bytecode superoperators sos sos are groups of bytecode operations that are executed as specialized vm instruction simultaneous translation of all the bytecode operations in an so reduces the bytecode dispatch cost and the number of stack accesses data transfer to from the java operand stack and stack pointer updates furthermore sos help improve native code quality without hindering class file portability annotation attributes in the class files mark the occurrences of valuable sos thereby dispensing with the expensive task of searching and selecting sos at runtime moreover our annotation based approach incurs minimal memory overhead as opposed to just in time jit compilers we obtain an average speedup of using an interpreter customized with the top sos formed from operation folding patterns further we show that greater speedups could be achieved by statically adding to the interpreter application specific sos formed by top basic blocks the effectiveness of our approach is evidenced by performance improvements of up to obtained using sos formed from optimized basic blocks
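To make the superoperator idea concrete, here is a toy stack-machine interpreter in which one frequent bytecode sequence is dispatched as a single fused handler; the opcodes and the chosen pattern are invented for illustration and do not correspond to the actual Java Card bytecode or the SOs selected by the annotation tool.

```python
# Toy stack machine: ILOAD i, ICONST c, IADD, ISTORE i.
# The superoperator LOAD_CONST_ADD_STORE fuses the common
# "ILOAD a; ICONST c; IADD; ISTORE b" pattern into one dispatch.

def run(code, locals_):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == "ILOAD":
            stack.append(locals_[code[pc + 1]]); pc += 2
        elif op == "ICONST":
            stack.append(code[pc + 1]); pc += 2
        elif op == "IADD":
            b, a = stack.pop(), stack.pop(); stack.append(a + b); pc += 1
        elif op == "ISTORE":
            locals_[code[pc + 1]] = stack.pop(); pc += 2
        elif op == "LOAD_CONST_ADD_STORE":          # superoperator: 1 dispatch,
            src, c, dst = code[pc + 1:pc + 4]       # no operand-stack traffic
            locals_[dst] = locals_[src] + c
            pc += 4
        else:
            raise ValueError(op)
    return locals_

# plain bytecode (4 dispatches, 2 pushes, 2 pops) ...
plain = ["ILOAD", 0, "ICONST", 7, "IADD", "ISTORE", 1]
# ... and the annotated/rewritten form using the superoperator (1 dispatch)
fused = ["LOAD_CONST_ADD_STORE", 0, 7, 1]

assert run(plain, [5, 0]) == run(fused, [5, 0]) == [5, 12]
```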
engineering knowledge is specific kind of knowledge that is oriented to the production of particular classes of artifacts is typically related to disciplined design methods and takes place in tool intensive contexts as consequence representing engineering knowledge requires the elaboration of complex models that combine functional and structural representations of the resulting artifacts with process and methodological knowledge the different categories used in the engineering domain vary in their status and in the way they should be manipulated when building applications that support engineering processes these categories include artifacts activities methods and models this paper surveys existing models of engineering knowledge and discusses an upper ontology that abstracts the categories that crosscut different engineering domains such an upper model can be reused for particular engineering disciplines the process of creating such elaborations is reported on the particular case study of software engineering as concrete application example
recent works have shown the benefits of keyword proximity search in querying xml documents in addition to text documents for example given query keywords over shakespeare’s plays in xml the user might be interested in knowing how the keywords cooccur in this paper we focus on xml trees and define xml keyword proximity queries to return the possibly heterogeneous set of minimum connecting trees mcts of the matches to the individual keywords in the query we consider efficiently executing keyword proximity queries on labeled trees xml in various settings when the xml database has been preprocessed and when no indices are available on the xml database we perform detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches
many current research efforts address the problem of personalizing the web experience for each user with respect to user’s identity and or context in this paper we propose new high level model for the specification of web applications that takes into account the manner in which users interact with the application for supplying appropriate contents or gathering profile data we therefore consider entire behaviors rather than single properties as the smallest information units allowing for automatic restructuring of application components for this purpose high level event condition action eca paradigm is proposed which enables capturing arbitrary and timed clicking behaviors also the architecture and components of first prototype implementation are discussed
one reason that researchers may wish to demonstrate that an external software quality attribute can be measured consistently is so that they can validate prediction system for the attribute however attempts at validating prediction systems for external subjective quality attributes have tended to rely on experts indicating that the values provided by the prediction systems informally agree with the experts intuition about the attribute these attempts are undertaken without pre defined scale on which it is known that the attribute can be measured consistently consequently valid unbiased estimate of the predictive capability of the prediction system cannot be given because the experts measurement process is not independent of the prediction system’s values usually no justification is given for not checking to see if the experts can measure the attribute consistently it seems to be assumed that subjective measurement isn’t proper measurement or subjective measurement cannot be quantified or no one knows the true values of the attributes anyway and they cannot be estimated however even though the classification of software systems or software artefacts quality attributes is subjective it is possible to quantify experts measurements in terms of conditional probabilities it is then possible using statistical approach to assess formally whether the experts measurements can be considered consistent if the measurements are consistent it is also possible to identify estimates of the true values which are independent of the prediction system these values can then be used to assess the predictive capability of the prediction system in this paper we use bayesian inference markov chain monte carlo simulation and missing data imputation to develop statistical tests for consistent measurement of subjective ordinal scale attributes
for robots operating in real world environments the ability to deal with dynamic entities such as humans animals vehicles or other robots is of fundamental importance the variability of dynamic objects however is large in general which makes it hard to manually design suitable models for their appearance and dynamics in this paper we present an unsupervised learning approach to this model building problem we describe an exemplar based model for representing the time varying appearance of objects in planar laser scans as well as clustering procedure that builds set of object classes from given observation sequences extensive experiments in real environments demonstrate that our system is able to autonomously learn useful models for eg pedestrians skaters or cyclists without being provided with external class information
we consider the problem of establishing route and sending packets between source destination pair in ad hoc networks composed of rational selfish nodes whose purpose is to maximize their own utility in order to motivate nodes to follow the protocol specification we use side payments that are made to the forwarding nodes our goal is to design fully distributed algorithm such that node is always better off participating in the protocol execution individual rationality ii node is always better off behaving according to the protocol specification truthfulness iii messages are routed along the most energy efficient least cost path and iv the message complexity is reasonably low we introduce the commit protocol for individually rational truthful and energy efficient routing in ad hoc networks to the best of our knowledge this is the first ad hoc routing protocol with these features commit is based on the vcg payment scheme in conjunction with novel game theoretic technique to achieve truthfulness for the sender node by means of simulation we show that the inevitable economic inefficiency is small as an aside our work demonstrates the advantage of using cross layer approach to solving problems leveraging the existence of an underlying topology control protocol we are able to simplify the design and analysis of our routing protocol and to reduce its message complexity on the other hand our investigation of the routing problem in presence of selfish nodes disclosed new metric under which topology control protocols can be evaluated the cost of cooperation
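The VCG payment computation that underlies this kind of truthful routing can be sketched as follows: each forwarding node on the least-cost path is paid the cost of the best path that avoids it, minus the cost of the chosen path excluding its own declared cost. The graph, the node costs and the treatment of source and destination below are illustrative simplifications, not the full protocol.

```python
import heapq

def cheapest_path(adj, cost, src, dst, banned=frozenset()):
    """Least-cost src->dst path where the path cost is the sum of the
    declared forwarding costs of its intermediate nodes."""
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue
        for v in adj.get(u, ()):
            if v in banned:
                continue
            nd = d + (cost[v] if v != dst else 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return float("inf"), None

def vcg_payments(adj, cost, src, dst):
    """VCG payment to each forwarding node on the least-cost path:
    (best path cost without the node) - (best path cost - its own cost),
    which makes truthful cost declaration a dominant strategy."""
    best, path = cheapest_path(adj, cost, src, dst)
    payments = {}
    for node in path[1:-1]:
        alt, _ = cheapest_path(adj, cost, src, dst, banned={node})
        payments[node] = alt - (best - cost[node])
    return path, payments

# tiny example: two disjoint routes from s to t
adj = {"s": ["a", "c"], "a": ["b"], "b": ["t"], "c": ["t"], "t": []}
cost = {"s": 0, "a": 1, "b": 1, "c": 3, "t": 0}
print(vcg_payments(adj, cost, "s", "t"))
# chosen path s-a-b-t (cost 2); each of a and b is paid 3 - (2 - 1) = 2
```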
modern presentation software is still built around interaction metaphors adapted from traditional slide projectors we provide an analysis of the problems in this application genre that presentation authors face and present fly presentation tool that is based on the idea of planar information structures inspired by the natural human thought processes of data chunking association and spatial memory fly explores authoring of presentation documents evaluation of paper prototype showed that the planar ui is easily grasped by users and leads to presentations more closely resembling the information structure of the original content thus providing better authoring support than the slide metaphor our software prototype confirmed these results and outperformed powerpoint in second study for tasks such as prototyping presentations and generating meaningful overviews users reported that this interface helped them better to express their concepts and expressed significant preference for fly over the traditional slide model
we study generalization of the constraint satisfaction problem csp the periodic constraint satisfaction problem an input instance of the periodic csp is finite set of generating constraints over structured variable set that implicitly specifies larger possibly infinite set of constraints the problem is to decide whether or not the larger set of constraints has satisfying assignment this model is natural for studying constraint networks consisting of constraints obeying high degree of regularity or symmetry our main contribution is the identification of two broad polynomial time tractable subclasses of the periodic csp
suppose we are given graph and set of terminals we consider the problem of constructing graph eh that approximately preserves the congestion of every multicommodity flow with endpoints supported in we refer to such graph as flow sparsifier we prove that there exist flow sparsifiers that simultaneously preserve the congestion of all multicommodity flows within an log log log factor where this bound improves to if excludes any fixed minor this is strengthening of previous results which consider the problem of finding graph eh cut sparsifier that approximately preserves the value of minimum cuts separating any partition of the terminals indirectly our result also allows us to give construction for better quality cut sparsifiers and flow sparsifiers thereby we immediately improve all approximation ratios derived using vertex sparsification in we also prove an log log lower bound for how well flow sparsifier can simultaneously approximate the congestion of every multicommodity flow in the original graph the proof of this theorem relies on technique which we refer to as oblivious dual certificates for proving super constant congestion lower bounds against many multicommodity flows at once our result implies that approximation algorithms for multicommodity flow type problems designed by black box reduction to uniform case on nodes see for examples must incur super constant cost in the approximation ratio
similarity search is important in information retrieval applications where objects are usually represented as vectors of high dimensionality this paper proposes new dimensionality reduction technique and an indexing mechanism for high dimensional datasets the proposed technique reduces the dimensions for which coordinates are less than critical value with respect to each data vector this flexible datawise dimensionality reduction contributes to improving indexing mechanisms for high dimensional datasets that are in skewed distributions in all coordinates to apply the proposed technique to information retrieval cva file compact va file which is revised version of the va file is developed by using cva file the size of index files is reduced further while the tightness of the index bounds is held maximally the effectiveness is confirmed by synthetic and real data
many of today’s high level parallel languages support dynamic fine grained parallelism these languages allow the user to expose all the parallelism in the program which is typically of much higher degree than the number of processors hence an efficient scheduling algorithm is required to assign computations to processors at runtime besides having low overheads and good load balancing it is important for the scheduling algorithm to minimize the space usage of the parallel program this paper presents scheduling algorithm that is provably space efficient and time efficient for nested parallel languages in addition to proving the space and time bounds of the parallel schedule generated by the algorithm we demonstrate that it is efficient in practice we have implemented runtime system that uses our algorithm to schedule parallel threads the results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques without compromising performance
in this paper we study generalization of standard property testing where the algorithms are required to be more tolerant with respect to objects that do not have but are close to having the property specifically tolerant property testing algorithm is required to accept objects that are close to having given property and reject objects that are far from having for some parameters another related natural extension of standard property testing that we study is distance approximation here the algorithm should output an estimate of the distance of the object to where this estimate is sufficiently close to the true distance of the object to we first formalize the notions of tolerant property testing and distance approximation and discuss the relationship between the two tasks as well as their relationship to standard property testing we then apply these new notions to the study of two problems tolerant testing of clustering and distance approximation for monotonicity we present and analyze algorithms whose query complexity is either polylogarithmic or independent of the size of the input
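The two notions can be stated precisely; the following is the standard formalization, with generic parameter names rather than the paper's exact wording.

```latex
\begin{itemize}
\item An \emph{$(\varepsilon_1,\varepsilon_2)$-tolerant tester} for a property $P$
      (with $0 \le \varepsilon_1 < \varepsilon_2 \le 1$) is a randomized algorithm that,
      given query access to an object $O$,
      accepts with probability at least $2/3$ whenever $\mathrm{dist}(O,P) \le \varepsilon_1$,
      and rejects with probability at least $2/3$ whenever $\mathrm{dist}(O,P) \ge \varepsilon_2$.
      Standard property testing is the special case $\varepsilon_1 = 0$.
\item A \emph{distance approximation} algorithm for $P$ outputs an estimate $\hat{d}$
      such that $|\hat{d} - \mathrm{dist}(O,P)| \le \delta$ with probability at least $2/3$;
      such an algorithm immediately yields an $(\varepsilon_1,\varepsilon_2)$-tolerant tester
      whenever $\varepsilon_2 - \varepsilon_1 > 2\delta$.
\end{itemize}
```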
despite their popularity and importance pointer based programs remain major challenge for program verification in this paper we propose an automated verification system that is concise precise and expressive for ensuring the safety of pointer based programs our approach uses user definable shape predicates to allow programmers to describe wide range of data structures with their associated size properties to support automatic verification we design new entailment checking procedure that can handle well founded inductive predicates using unfold fold reasoning we have proven the soundness and termination of our verification system and have built prototype system
this paper introduces simple real time distributed computing model for message passing systems which reconciles the distributed computing and the real time systems perspective by just replacing instantaneous computing steps with computing steps of non zero duration we obtain model that both facilitates real time scheduling analysis and retains compatibility with classic distributed computing analysis techniques and results we provide general simulations and validity conditions for transforming algorithms from the classic synchronous model to our real time model and vice versa and investigate whether which properties of real systems are inaccurately or even wrongly captured when resorting to zero step time models we revisit the well studied problem of deterministic drift and failure free internal clock synchronization for this purpose and show that no clock synchronization algorithm with constant running time can achieve optimal precision in our real time model since such an algorithm is known for the classic model this is an instance of problem where the standard distributed computing analysis gives too optimistic results we prove that optimal precision is only achievable with algorithms that take time in our model and establish several additional algorithms and lower bounds
advances in microsensor and radio technology will enable small but smart sensors to be deployed for wide range of environmental monitoring applications the low per node cost will allow these wireless networks of sensors and actuators to be densely distributed the nodes in these dense networks will coordinate to perform the distributed sensing and actuation tasks moreover as described in this paper the nodes can also coordinate to exploit the redundancy provided by high density so as to extend overall system lifetime the large number of nodes deployed in these systems will preclude manual configuration and the environmental dynamics will preclude design time preconfiguration therefore nodes will have to self configure to establish topology that provides communication under stringent energy constraints ascent builds on the notion that as density increases only subset of the nodes are necessary to establish routing forwarding backbone in ascent each node assesses its connectivity and adapts its participation in the multihop network topology based on the measured operating region this paper motivates and describes the ascent algorithm and presents analysis simulation and experimental measurements we show that the system achieves linear increase in energy savings as function of the density and the convergence time required in case of node failures while still providing adequate connectivity
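The self-configuration rule can be sketched at a very high level: a passive node joins the forwarding backbone when it measures too few active neighbours, or high loss together with help requests. The thresholds, the omitted test/sleep states and the epoch handling below are placeholders, not the published ASCENT parameters.

```python
import random

# illustrative thresholds (placeholders, not values from the paper)
NEIGHBOR_THRESHOLD = 4      # desired number of active neighbours
LOSS_THRESHOLD = 0.2        # data-loss rate that triggers help requests

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.state = "passive"      # passive -> active (simplified; the full
                                    # protocol also has test and sleep states)

    def epoch(self, active_neighbors, loss_rate, help_requested):
        """One measurement epoch: decide whether to join the backbone."""
        if self.state == "passive":
            too_few = active_neighbors < NEIGHBOR_THRESHOLD
            lossy = loss_rate > LOSS_THRESHOLD
            if too_few or (lossy and help_requested):
                self.state = "active"   # start routing/forwarding
        # active nodes keep forwarding; redundant nodes could back off
        # to save energy in the full protocol
        return self.state

# toy run: a dense field in which only a subset of nodes ends up active
nodes = [Node(i) for i in range(20)]
for n in nodes:
    n.epoch(active_neighbors=random.randint(0, 8),
            loss_rate=random.random() * 0.4,
            help_requested=random.random() < 0.3)
print(sum(n.state == "active" for n in nodes), "of", len(nodes), "nodes active")
```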
an error occurs when software cannot complete requested action as result of some problem with its input configuration or environment high quality error report allows user to understand and correct the problem unfortunately the quality of error reports has been decreasing as software becomes more complex and layered end users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support websites developers cannot improve their error messages when they receive an ambiguous or otherwise insufficient error indicator from black box software component we introduce clarify system that improves error reporting by classifying application behavior clarify uses minimally invasive monitoring to generate behavior profile which is summary of the program’s execution history machine learning classifier uses the behavior profile to classify the application’s behavior thereby enabling more precise error report than the output of the application itself we evaluate prototype clarify system on ambiguous error messages generated by large modern applications like gcc latex and the linux kernel for performance cost of less than on user applications and on the linux kernel the prototype correctly disambiguates at least of application behaviors that result in ambiguous error reports this accuracy does not degrade significantly with more behaviors clarify classifier for latex error messages is at most less accurate than classifier for latex error messages finally we show that without any human effort to build classifier clarify can provide nearest neighbor software support where users who experience problem are told about other users who might have had the same problem on average of the users that clarify identifies have experienced the same problem
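The classification step can be illustrated in a few lines: summarize an execution as a feature vector of observed events and label a new run by its nearest previously diagnosed run. The event names and the cosine nearest-neighbour rule below are simplifications, not the tool's actual profile format or learning algorithm.

```python
from collections import Counter
import math

def profile(events):
    """Behaviour profile: counts of observed events (e.g. call sites)."""
    return Counter(events)

def cosine(p, q):
    dot = sum(p[k] * q.get(k, 0) for k in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def classify(run, labelled_runs):
    """Nearest neighbour over profiles: return the label of the most
    similar previously diagnosed run (a stand-in for the ML classifier)."""
    return max(labelled_runs, key=lambda lr: cosine(run, lr[0]))[1]

# labelled executions that ended in the same ambiguous error message
known = [
    (profile(["open", "parse", "expand_macro", "expand_macro", "abort"]),
     "undefined-macro"),
    (profile(["open", "parse", "include", "open", "abort"]),
     "missing-include"),
]
new_run = profile(["open", "parse", "expand_macro", "abort"])
print(classify(new_run, known))   # -> "undefined-macro"
```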
distributional measures of lexical similarity and kernel methods for classification are well known tools in natural language processing we bring these two methods together by introducing distributional kernels that compare co occurrence probability distributions we demonstrate the effectiveness of these kernels by presenting state of the art results on datasets for three semantic classification compound noun interpretation identification of semantic relations between nominals and semantic classification of verbs finally we consider explanations for the impressive performance of distributional kernels and sketch some promising generalisations
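One standard member of this kernel family is the Jensen-Shannon kernel sketched below, which compares co-occurrence count vectors after normalizing them to probability distributions; the paper studies several distributional kernels and feature sets, so this is only an indicative example.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum(); q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / (b[mask] + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_kernel(X, Y=None, gamma=1.0):
    """Kernel matrix K[i, j] = exp(-gamma * JSD(x_i, y_j)).

    Rows of X and Y are co-occurrence count vectors (e.g. of a target word
    with its context words); they are normalized to distributions in jsd.
    """
    Y = X if Y is None else Y
    K = np.zeros((len(X), len(Y)))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            K[i, j] = np.exp(-gamma * jsd(x, y))
    return K

# the precomputed matrix can be fed to any kernel classifier, e.g.
#   from sklearn.svm import SVC
#   SVC(kernel="precomputed").fit(js_kernel(X_train), y_train)
```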
we survey recent results on wireless networks that are based on analogies with various branches of physics we address among others the problems of optimally arranging the flow of traffic in wireless sensor networks finding minimum cost routes performing load balancing optimizing and analyzing cooperative transmissions calculating the capacity finding routes that avoid bottlenecks and developing distributed anycasting protocols the results are based on establishing analogies between wireless networks and settings from various branches of physics such as electrostatics optics percolation theory diffusion and others many of the results we present hinge on the assumption that the network is massive ie it consists of so many nodes that it can be described in terms of novel macroscopic view the macroscopic view is not as detailed as the standard microscopic one but nevertheless contains enough details to permit meaningful optimization
when meeting someone new the first impression is often influenced by someone’s physical appearance and other types of prejudice in this paper we present touchmedare an interactive canvas which aims to provide an experience when meeting new people while preventing visual prejudice and lowering potential thresholds the focus of the designed experience was to stimulate people to get acquainted through the interactive canvas touchmedare consists of flexible opaque canvas which plays music when touched simultaneously from both sides dynamic variation of this bodily contact is reflected through real time adaptations of the musical compositions two redesigns were qualitatively and quantitatively evaluated and final version was placed in the lowlands festival as case study evaluation results showed that some explanation was needed for the initial interaction with the installation on the other hand after this initial unfamiliarity passed results showed that making bodily contact through the installation did help people to get acquainted with each other and increased their social interaction
the standard formalism for explaining abstract types is existential quantification while it provides sufficient model for type abstraction in entirely statically typed languages it proves to be too weak for languages enriched with forms of dynamic typing where parametricity is violated as an alternative approach to type abstraction that addresses this shortcoming we present calculus for dynamic type generation it features an explicit construct for generating new type names and relies on coercions for managing abstraction boundaries between generated types and their designated representation sealing is represented as generalized form of these coercions the calculus maintains abstractions dynamically without restricting type analysis
many applications require randomized ordering of input data examples include algorithms for online aggregation data mining and various randomized algorithms most existing work seems to assume that accessing the records from large database in randomized order is not difficult problem however it turns out to be extremely difficult in practice using existing methods randomization is either extremely expensive at the front end as data are loaded or at the back end as data are queried this paper presents simple file structure which supports both efficient online random shuffling of large database as well as efficient online sampling or randomization of the database when it is queried the key innovation of our method is the introduction of small degree of carefully controlled rigorously monitored nonrandomness into the file
moments before the launch of every space vehicle engineering discipline specialists must make critical go no go decision the cost of false positive allowing launch in spite of fault or false negative stopping potentially successful launch can be measured in the tens of millions of dollars not including the cost in morale and other more intangible detriments the aerospace corporation is responsible for providing engineering assessments critical to the go no go decision for every department of defense space vehicle these assessments are made by constantly monitoring streaming telemetry data in the hours before launch we will introduce viztree novel time series visualization tool to aid the aerospace analysts who must make these engineering assessments viztree was developed at the university of california riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry the use of single tool for both aspects of the task allows natural and intuitive transfer of mined knowledge to the monitoring task our visualization approach works by transforming the time series into symbolic representation and encoding the data in modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties we demonstrate the utility of our system by comparing it with state of the art batch algorithms on several real and synthetic datasets
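The front end of such a system can be approximated as follows: discretize the series into a symbol string and count the frequency of every fixed-length subword, frequencies that a visualization would map onto colours or branch thickness. The quantile breakpoints, the word length and the plain dictionary standing in for the modified suffix tree are illustrative choices only.

```python
import numpy as np
from collections import defaultdict

def symbolize(series, n_segments=32, alphabet="abcd"):
    """SAX-style discretization: z-normalize, piecewise-aggregate into
    n_segments means, then map each mean to a symbol via quantile breakpoints."""
    x = np.asarray(series, float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    segments = np.array_split(x, n_segments)
    means = np.array([s.mean() for s in segments])
    breakpoints = np.quantile(means, np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(breakpoints, m)] for m in means)

def subword_counts(word, length=3):
    """Frequencies of all subwords of a given length; in the visualization
    these counts would drive the colour/thickness of tree branches."""
    counts = defaultdict(int)
    for i in range(len(word) - length + 1):
        counts[word[i:i + length]] += 1
    return counts

t = np.linspace(0, 8 * np.pi, 1024)
telemetry = np.sin(t) + 0.1 * np.random.randn(t.size)   # stand-in signal
word = symbolize(telemetry)
rare = min(subword_counts(word).items(), key=lambda kv: kv[1])
print(word, "rarest pattern:", rare)
```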
numerous context aware mobile communication systems have emerged for individuals and groups calling for the identification of critical success factors related to the design of such systems at different levels the effective system design cannot be achieved without the understanding of situated user behaviour in using context aware systems drawing on activity theory this article advances cross level but coherent conceptualisations of context awareness as enabled by emerging systems grounded in the activities of using context aware systems these conceptualisations provide implications for system design at individual and group levels in terms of critical success factors including contextualisation interactivity and personalisation
many materials including water plastic and metal have specular surface characteristics specular reflections have commonly been considered nuisance for the recovery of object shape however the way that reflections are distorted across the surface depends crucially on curvature suggesting that they could in fact be useful source of information indeed observers can have vivid impression of shape when an object is perfectly mirrored ie the image contains nothing but specular reflections this leads to the question what are the underlying mechanisms of our visual system to extract this shape information from perfectly mirrored object in this paper we propose biologically motivated recurrent model for the extraction of visual features relevant for the perception of shape information from images of mirrored objects we qualitatively and quantitatively analyze the results of computational model simulations and show that bidirectional recurrent information processing leads to better results than pure feedforward processing furthermore we utilize the model output to create rough nonphotorealistic sketch representation of mirrored object which emphasizes image features that are mandatory for shape perception eg occluding contour and regions of high curvature moreover this sketch illustrates that the model generates representation of object features independent of the surrounding scene reflected in the mirrored object
we address the problem of answering conjunctive queries over extended entity relationship schemata which we call eer extended er schemata with is among entities and relationships and cardinality constraints this is common setting in conceptual data modelling where reasoning over incomplete data with respect to knowledge base is required we adopt semantics for eer schemata based on their relational representation we identify wide class of eer schemata for which query answering is tractable in data complexity the crucial condition for tractability is the separability between maximum cardinality constraints represented as key constraints in relational form and the other constraints we provide by means of graph based representation syntactic condition for separability we show that our conditions is not only sufficient but also necessary thus precisely identifying the class of separable schemata we present an algorithm based on query rewriting that is capable of dealing with such eer schemata while achieving tractability we show that further negative constraints can be added to the eer formalism while still keeping query answering tractable we show that our formalism is general enough to properly generalise the most widely adopted knowledge representation languages
we present set of algorithms and an associated display system capable of producing correctly rendered eye contact between three dimensionally transmitted remote participant and group of observers in teleconferencing system the participant’s face is scanned in at hz and transmitted in real time to an autostereoscopic horizontal parallax display displaying him or her over more than deg field of view observable to multiple observers to render the geometry with correct perspective we create fast vertex shader based on lookup table for projecting scene vertices to range of subject angles heights and distances we generalize the projection mathematics to arbitrarily shaped display surfaces which allows us to employ curved concave display surface to focus the high speed imagery to individual observers to achieve two way eye contact we capture video from cross polarized camera reflected to the position of the virtual participant’s eyes and display this video feed on large screen in front of the real participant replicating the viewpoint of their virtual self to achieve correct vertical perspective we further leverage this image to track the position of each audience member’s eyes allowing the display to render correct vertical perspective for each of the viewers around the device the result is one to many teleconferencing system able to reproduce the effects of gaze attention and eye contact generally missing in traditional teleconferencing systems
common deficiency of discretized datasets is that detail beyond the resolution of the dataset has been irrecoverably lost this lack of detail becomes immediately apparent once one attempts to zoom into the dataset and only recovers blur here we describe method that generates the missing detail from any available and plausible high resolution data using texture synthesis since the detail generation process is guided by the underlying image or volume data and is designed to fill in plausible detail in accordance with the coarse structure and properties of the zoomed in neighborhood we refer to our method as constrained texture synthesis regular zooms become semantic zooms where each level of detail stems from data source attuned to that resolution we demonstrate our approach by medical application the visualization of human liver but its principles readily apply to any scenario as long as data at all resolutions are available we will first present viewing application called the virtual microscope and then extend our technique to volumetric viewing
we present new technique that employs support vector machines svms and gaussian mixture densities gmds to create generative discriminative object classification technique using local image features in the past several approaches to fuse the advantages of generative and discriminative approaches were presented often leading to improved robustness and recognition accuracy support vector machines are well known discriminative classification framework but similar to other discriminative approaches suffer from lack of robustness with respect to noise and overfitting gaussian mixtures on the contrary are widely used generative technique we present method to directly fuse both approaches effectively allowing to fully exploit the advantages of both the fusion of svms and gmds is done by representing svms in the framework of gmds without changing the training and without changing the decision boundary the new classifier is evaluated on the pascal voc data additionally we perform experiments on the usps dataset and on four tasks from the uci machine learning repository to obtain additional insights into the properties of the proposed approach it is shown that for the relatively rare cases where svms have problems the combined method outperforms both individual ones
in this paper processor scheduling policies that save processors are introduced and studied in multiprogrammed parallel system processor saving scheduling policy purposefully keeps some of the available processors idle in the presence of work to be done the conditions under which processor saving policies can be more effective than their greedy counterparts ie policies that never leave processors idle in the presence of work to be done are examined sensitivity analysis is performed with respect to application speedup system size coefficient of variation of the applications execution time variability in the arrival process and multiclass workloads analytical simulation and experimental results show that processor saving policies outperform their greedy counterparts under variety of system and workload characteristics
principles of the unitesk test development technology based on the use of formal models of target software are presented this technology was developed by the redverst group in the institute for system programming russian academy of sciences ispras which obtained rich experience in testing and verification of complex commercial software
recovering from malicious attacks in survival database systems is vital in mission critical information systems traditional rollback and re execute techniques are too time consuming and cannot be applied in survival environments in this paper two efficient approaches transaction dependency based and data dependency based are proposed compared to the transaction dependency based approach data dependency based recovery approaches need not undo innocent operations in malicious and affected transactions and even some benign blind writes on bad data items can speed up the recovery process
information system engineering has become under increasing pressure to come up with software solutions that endow systems with the agility that is required to evolve in continually changing business and technological environment in this paper we suggest that software engineering has contribution to make in terms of concepts and techniques that have been recently developed for parallel program design and software architectures we show how such mechanisms can be encapsulated in new modelling primitive coordination contract that can be used for extending component based development approaches in order to manage such levels of change
this article provides detailed implementation study on the behavior of web servers that serve static requests where the load fluctuates over time ie transient overload various external factors are considered including wan delays and losses and different client behavior models we find that performance can be dramatically improved via kernel level modification to the web server to change the scheduling policy at the server from the standard fair processor sharing scheduling to srpt shortest remaining processing time scheduling we find that srpt scheduling induces no penalties in particular throughput is not sacrificed and requests for long files experience only negligibly higher response times under srpt than they did under the original fair scheduling
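The policy difference can be seen in a toy single-link simulation in which the request with the smallest remaining size always receives the full link, preemptively; the request sizes and arrival times below are invented and the model ignores the kernel-level details of the actual modification.

```python
import heapq

def srpt_finish_times(jobs, bandwidth=1.0):
    """jobs: list of (arrival_time, size_in_bytes). Serve the request with
    the shortest remaining size first (preemptively); return finish times."""
    events = sorted((a, i, s) for i, (a, s) in enumerate(jobs))
    heap, finish, t, k = [], {}, 0.0, 0
    while heap or k < len(events):
        if not heap:                      # idle: jump to the next arrival
            t = max(t, events[k][0])
        while k < len(events) and events[k][0] <= t:
            a, i, s = events[k]; k += 1
            heapq.heappush(heap, (s, i))  # keyed by remaining size
        rem, i = heapq.heappop(heap)
        next_arrival = events[k][0] if k < len(events) else float("inf")
        run = min(rem / bandwidth, next_arrival - t)
        t += run
        rem -= run * bandwidth
        if rem <= 1e-12:
            finish[i] = t
        else:
            heapq.heappush(heap, (rem, i))
    return finish

# a long download arrives first, two short page requests arrive shortly after:
# under SRPT the short requests preempt the long one and finish almost at once
print(srpt_finish_times([(0.0, 100.0), (1.0, 2.0), (1.5, 2.0)]))
```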
recent work demonstrates the potential for extracting patterns from users behavior as detected by sensors since there is currently no generalized framework for reasoning about activity aware applications designers can only rely on the existing systems for guidance however these systems often use custom domain specific definition of activity pattern consequently the guidelines designers can extract from individual systems are limited to the specific application domains of those applications in this paper we introduce five high level guidelines or commandments for designing activity aware applications by considering the issues we outlined in this paper designers will be able to avoid common mistakes inherent in designing activity aware applications
cast shadows are an informative cue to the shape of objects they are particularly valuable for discovering object’s concavities which are not available from other cues such as occluding boundaries we propose new method for recovering shape from shadows which we call shadow carving given conservative estimate of the volume occupied by an object it is possible to identify and carve away regions of this volume that are inconsistent with the observed pattern of shadows we prove theorem that guarantees that when these regions are carved away from the shape the shape still remains conservative shadow carving overcomes limitations of previous studies on shape from shadows because it is robust with respect to errors in shadows detection and it allows the reconstruction of objects in the round rather than just bas reliefs we propose reconstruction system to recover shape from silhouettes and shadow carving the silhouettes are used to reconstruct the initial conservative estimate of the object’s shape and shadow carving is used to carve out the concavities we have simulated our reconstruction system with commercial rendering package to explore the design parameters and assess the accuracy of the reconstruction we have also implemented our reconstruction scheme in table top system and present the results of scanning of several objects
information from which knowledge can be discovered is frequently distributed due to having been recorded at different times or to having arisen from different sources such information is often subject to both imprecision and uncertainty the dempster shafer representation of evidence offers way of representing uncertainty in the presence of imprecision and may therefore be used to provide mechanism for storing imprecise and uncertain information in databases we consider an extended relational data model that allows the imprecision and uncertainty associated with attribute values to be quantified using mass function distribution when query is executed it may be necessary to combine imprecise and uncertain data from distributed sources in order to answer that query mechanism is therefore required both for combining the data and for generating measures of uncertainty to be attached to the imprecise combined data in this paper we provide such mechanism based on aggregation of evidence we show first how this mechanism can be used to resolve inconsistencies and hence provide an essential database capability to perform the operations necessary to respond to queries on imprecise and uncertain data we go on to exploit the aggregation operator in an attribute driven approach to provide information on properties of and patterns in the data this is fundamental to rule discovery and hence such an aggregation operator provides facility that is central requirement in providing distributed information system with the capability to perform the operations necessary for knowledge discovery
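For reference, the textbook combination operator for mass functions is Dempster's rule, sketched below; the aggregation operator developed in the paper for its extended relational model is not necessarily this exact rule, so this is only an illustration of how conflicting imprecise evidence can be merged and renormalized.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset focal elements to masses summing to 1.
    Conflicting mass (empty intersections) is discarded and the remainder
    renormalized; the conflict K is returned as a by-product.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}, conflict

# two distributed sources give imprecise, uncertain values for one attribute
src1 = {frozenset({"red"}): 0.6, frozenset({"red", "amber"}): 0.4}
src2 = {frozenset({"amber"}): 0.3, frozenset({"red", "amber"}): 0.7}
print(dempster_combine(src1, src2))
```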
one view of computational learning theory is that of learner acquiring the knowledge of teacher we introduce formal model of learning capturing the idea that teachers may have gaps in their knowledge the goal of the learner is still to acquire the knowledge of the teacher but now the learner must also identify the gaps this is the notion of learning from consistently ignorant teacher we consider the impact of knowledge gaps on learning for example monotone dnf and dimensional boxes and show that learning is still possible negatively we show that knowledge gaps make learning conjunctions of horn clauses as hard as learning dnf we also present general results describing when known learning algorithms can be used to obtain learning algorithms using consistently ignorant teacher
in this paper we study the challenges and evaluate the effectiveness of data collected from the web for recommendations we provide experimental results including user study showing that our methods produce good recommendations in realistic applications we propose new evaluation metric that takes into account the difficulty of prediction we show that the new metric aligns well with the results from user study
this paper proposes new clothing segmentation method using foreground clothing and background non clothing estimation based on the constrained delaunay triangulation cdt without any pre defined clothing model in our method the clothing is extracted by graph cuts where the foreground seeds and background seeds are determined automatically the foreground seeds are found by torso detection based on dominant colors determination and the background seeds are estimated based on cdt with the determined seeds the color distributions of the foreground and background are modeled by gaussian mixture models and filtered by cdt based noise suppression algorithm for more robust and accurate segmentation experimental results show that our clothing segmentation method is able to extract different clothing from static images with variations in backgrounds and lighting conditions
in this paper we address the issue of feeding future superscalar processor cores with enough instructions hardware techniques targeting an increase in the instruction fetch bandwidth have been proposed such as the trace cache microarchitecture we present microarchitecture solution based on register file holding basic blocks of instructions this solution places the instruction memory hierarchy out of the cycle determining path we call our approach instruction register file irf we estimate our approach with simplescalar based simulator run on the mediabench benchmark suite and compare to the trace cache performance on the same benchmarks we show that on this benchmark suite an irf based processor fetching up to three basic blocks per cycle outperforms trace cache based processor fetching instructions long traces by on the average
the authors propose method for personalizing the flexible widget layout fwl by adjusting the desirability of widgets with pairwise comparison method and show its implementation and that it actually works personalization of graphical user interfaces guis is important from perspective of usability and it is challenge in the field of model based user interface designs the fwl is model and optimization based layout framework of guis offering possibility for personalization but it has not actually realized it with any concrete method yet in this paper the authors implement method for personalization as dialog box and incorporate it into the existing system of the fwl thus users can personalize layouts generated by the fwl system at run time
within the context of the relational model general technique for establishing that the translation of view update defined by constant complement is independent of the choice of complement is presented in contrast to previous results the uniqueness is not limited to order based updates those constructed from insertions and deletions nor is it limited to those well behaved complements which define closed update strategies rather the approach is based upon optimizing the change of information in the main schema which the view update entails the only requirement is that the view and its complement together possess property called semantic bijectivity relative to the information measure it is furthermore established that very wide range of views have this property this results formalizes the intuition long observed in examples that it is difficult to find different complements which define distinct but reasonable update strategies
in wireless networks bandwidth is relatively scarce especially for supporting on demand media streaming in wired networks multicast stream merging is well known technique for scalable on demand streaming also caching proxies are widely used on the internet to offload servers and reduce network traffic this paper uses simulation to examine caching hierarchy for wireless streaming video distribution in combination with multicast stream merging the main purpose is to gain insight into the filtering effects caused by caching and merging using request frequencies entropy and inter reference times as metrics we illustrate how merging caching and traffic aggregation affect the traffic characteristics at each level the simulation results provide useful insights into caching performance in video streaming hierarchy
the power of high level languages lies in their abstraction over hardware and software complexity leading to greater security better reliability and lower development costs however opaque abstractions are often show stoppers for systems programmers forcing them to either break the abstraction or more often simply give up and use different language this paper addresses the challenge of opening up high level language to allow practical low level programming without forsaking integrity or performance the contribution of this paper is three fold we draw together common threads in diverse literature we identify framework for extending high level languages for low level programming and we show the power of this approach through concrete case studies our framework leverages just three core ideas extending semantics via intrinsic methods extending types via unboxing and architectural width primitives and controlling semantics via scoped semantic regimes we develop these ideas through the context of rich literature and substantial practical experience we show that they provide the power necessary to implement substantial artifacts such as high performance virtual machine while preserving the software engineering benefits of the host language the time has come for high level low level programming to be taken more seriously more projects now use high level languages for systems programming increasing architectural heterogeneity and parallelism heighten the need for abstraction and new generation of high level languages are under development and ripe to be influenced
collaboration and information sharing between organizations that share common goal is becoming increasingly important effective sharing promotes efficiency and productivity as well as enhances customer service with internet connectivity widely available sharing and access to information is relatively simple to implement however the abundance causes another problem the difficulty of determining where truly useful and relevant information is housed information resources such as data documents multimedia objects and services stored in different agencies need to be easily discovered and shared we propose collaborative semantic and pragmatic annotation environment where resources of each agency can be annotated by users in the government social network this collaborative annotation captures not only the semantics but also the pragmatics of the resources such as who when where how and why the resources are used the benefits of semantic and pragmatic annotation tags will include an ability to filter discover and search new and dynamic as well as hidden resources to navigate between resources in search by traversing semantic relationships and to recommend the most relevant government information distributed over different agencies distributed architecture of tagging system is shown and tag based search is illustrated
number of supervised learning methods have been introduced in the last decade unfortunately the last comprehensive empirical evaluation of supervised learning was the statlog project in the early we present large scale empirical comparison between ten supervised learning methods svms neural nets logistic regression naive bayes memory based learning random forests decision trees bagged trees boosted trees and boosted stumps we also examine the effect that calibrating the models via platt scaling and isotonic regression has on their performance an important aspect of our study is the use of variety of performance criteria to evaluate the learning methods
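Platt scaling can be reproduced by hand in a few lines: fit a logistic model to the raw decision values of a trained classifier on a held-out calibration split so that its outputs behave like probabilities (isotonic regression can be substituted at the same point). The dataset and the linear SVM below are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

# 1. train an uncalibrated margin classifier
svm = LinearSVC().fit(X_tr, y_tr)

# 2. Platt scaling: logistic regression on the decision values of a
#    held-out calibration set (maps raw margins to probabilities)
cal = LogisticRegression().fit(svm.decision_function(X_cal).reshape(-1, 1), y_cal)

# 3. calibrated probabilities for new data
proba = cal.predict_proba(svm.decision_function(X_te).reshape(-1, 1))[:, 1]
print("mean predicted prob:", proba.mean(), "positive rate:", y_te.mean())
```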
model driven development using languages such as uml and bon often makes use of multiple diagrams eg class and sequence diagrams when modeling systems these diagrams presenting different views of system of interest may be inconsistent metamodel provides unifying framework in which to ensure and check consistency while at the same time providing the means to distinguish between valid and invalid models that is conformance two formal specifications of the metamodel for an object oriented modeling language are presented and it is shown how to use these specifications for model conformance and multiview consistency checking comparisons are made in terms of completeness and the level of automation each provide for checking multiview consistency and model conformance the lessons learned from applying formal techniques to the problems of metamodeling model conformance and multiview consistency checking are summarized
disk caching algorithm is presented that uses an adaptive prefetching scheme to optimize the system performance in disk controllers for traces with different data localities the algorithm uses on line measurements of disk transfer times and of inter page fault rates to adjust the level of prefetching dynamically and its performance is evaluated through trace driven simulations using real workloads the results confirm the effectiveness and efficiency of the new adaptive prefetching algorithm
grids offer dramatic increase in the number of available processing and storing resources that can be delivered to applications however efficient job submission and management continue being far from accessible to ordinary scientists and engineers due to their dynamic and complex nature this paper describes new globus based framework that allows an easier and more efficient execution of jobs in submit and forget fashion the framework automatically performs the steps involved in job submission and also watches over its efficient execution in order to obtain reasonable degree of performance job execution is adapted to dynamic resource conditions and application demands adaptation is achieved by supporting automatic application migration following performance degradation better resource discovery requirement change owner decision or remote resource failure the framework is currently functional on any grid testbed based on globus because it does not require new system software to be installed in the resources the paper also includes practical experiences of the behavior of our framework on the trgp and ucm cab testbeds
debugging refers to the laborious process of finding causes of program failures often such failures are introduced when program undergoes changes and evolves from stable version to new modified version in this paper we propose an automated approach for debugging evolving programs given two programs reference stable program and new modified program and an input that fails on the modified program our approach uses concrete as well as symbolic execution to synthesize new inputs that differ marginally from the failing input in their control flow behavior comparison of the execution traces of the failing input and the new inputs provides critical clues to the root cause of the failure notable feature of our approach is that it handles hard to explain bugs like code missing errors by pointing to the relevant code in the reference program we have implemented our approach in tool called darwin we have conducted experiments with several real life case studies including real world web servers and the libpng library for manipulating png images our experience from these experiments points to the efficacy of darwin in pinpointing bugs moreover while localizing given observable error the new inputs synthesized by darwin can reveal other undiscovered errors
os kernels have been written in weakly typed or non typed programming languages for example therefore it is extremely hard to verify even simple memory safety of the kernels the difficulty could be resolved by writing os kernels in strictly typed programming languages but existing strictly typed languages are not flexible enough to implement important os facilities eg memory management and multi thread management facilities to address the problem we designed and implemented talk new strictly and statically typed assembly language which is flexible enough to implement os facilities and wrote an os kernel with talk in our approach the safety of the kernel can be verified automatically through static type checking at the level of binary executables without source code
existing research has most often relied on simulation and considered the uniform traffic distribution when investigating the performance properties of multicomputer networks eg the torus however there are numerous parallel applications that generate non uniform traffic patterns such as hot spot furthermore much more attention has been paid to capturing the impact of non uniform traffic on network performance resulting in the development of number of analytical models for predicting message latency in the presence of hot spots in the network for instance analytical models have been reported for the adaptively routed torus with uni directional as well as bi directional channels however models for the deterministically routed torus have considered uni directional channels only in an effort to fill in this gap this paper describes an analytical model for the deterministically routed torus with bi directional channels when subjected to hot spot traffic the modelling approach adopted for deterministic routing is totally different from that for adaptive routing due to the inherently different nature of the two types of routing the validity of the model is demonstrated by comparing analytical results against those obtained through extensive simulation experiments
the task of obtaining an optimal set of parameters to fit mixture model has many applications in science and engineering domains and is computationally challenging problem novel algorithm using convolution based smoothing approach to construct hierarchy or family of smoothed log likelihood surfaces is proposed this approach smooths the likelihood function and applies the em algorithm to obtain promising solution on the smoothed surface using the most promising solutions as initial guesses the em algorithm is applied again on the original likelihood though the results are demonstrated using only two levels the method can potentially be applied to any number of levels in the hierarchy theoretical insight demonstrates that the smoothing approach indeed reduces the overall gradient of modified version of the likelihood surface this optimization procedure effectively eliminates extensive searching in non promising regions of the parameter space results on some benchmark datasets demonstrate significant improvements of the proposed algorithm compared to other approaches empirical results on the reduction in the number of local maxima and improvements in the initialization procedures are provided
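A two-level sketch of the smoothed-likelihood idea for mixture fitting. It assumes scikit-learn's GaussianMixture as the EM engine and uses a large covariance floor (reg_covar) as a crude stand-in for the paper's convolution-based smoothing of the likelihood surface; the smoothed solution then initializes EM on the original likelihood.

```python
# Two-level sketch of EM on a smoothed likelihood followed by refinement.
# Assumption: inflating reg_covar approximates a smoothed likelihood surface.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic 1-D mixture data with three components
X = np.concatenate([rng.normal(-4, 0.5, 300),
                    rng.normal(0, 0.5, 300),
                    rng.normal(4, 0.5, 300)]).reshape(-1, 1)

# Level 1: EM on a heavily "smoothed" surface -- the inflated covariance floor
# flattens shallow local maxima, making the fit less initialization-sensitive.
smooth = GaussianMixture(n_components=3, reg_covar=1.0, random_state=0).fit(X)

# Level 2: EM on the original likelihood, initialized from the smoothed solution.
refined = GaussianMixture(
    n_components=3,
    weights_init=smooth.weights_,
    means_init=smooth.means_,
    precisions_init=np.linalg.inv(smooth.covariances_),
    random_state=0,
).fit(X)

print("smoothed-level means:", np.sort(smooth.means_.ravel()))
print("refined means:      ", np.sort(refined.means_.ravel()))
```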
an object detection method from line drawings is presented the method adopts the local neighborhood structure as the elementary descriptor which is formed by grouping several nearest neighbor lines curves around one reference with this representation both the appearance and the geometric structure of the line drawing are well described the detection algorithm is hypothesis test scheme the top most similar local structures in the drawing are firstly obtained for each local structure of the model and the transformation parameters are estimated for each of the candidates such as object center scale and rotation factors by treating each estimation result as point in the parameter space dense region around the ground truth is then formed provided that there exist model in the drawing the mean shift method is used to detect the dense regions and the significant modes are accepted as the occurrence of object instances
we describe our experiences of designing digital community display with members of rural community these experiences are highlighted by the development of printed and digital postcard features for the wray photo display public photosharing display designed with the community which was trialled during popular village fair where both local residents and visitors interacted with the system this trial allowed us to examine the relative popularity and differences in usage between printed and digital postcard and offer insights into the uses of these features with community generated content and potential problems encountered
the interconnection network considered in this paper is the generalized base hypercube that is an attractive variant of the well known hypercube the generalized base hypercube is superior to the hypercube in many criteria such as diameter connectivity and fault diameter in this paper we study the hamiltonian connectivity and pancyclicity of the generalized base hypercube by the algorithmic approach we show that generalized base hypercube is hamiltonian connected for that is there exists hamiltonian path joining each pair of vertices in generalized base hypercube for we also show that generalized base hypercube is pancyclic for that is it embeds cycles of all lengths ranging from to the order of the graph for
in this paper we approach the problem of constructing ensembles of classifiers from the point of view of instance selection instance selection is aimed at obtaining subset of the instances available for training capable of achieving at least the same performance as the whole training set in this way instance selection algorithms try to keep the performance of the classifiers while reducing the number of instances in the training set meanwhile boosting methods construct an ensemble of classifiers iteratively focusing each new member on the most difficult instances by means of biased distribution of the training instances in this work we show how these two methodologies can be combined advantageously we can use instance selection algorithms for boosting using as objective to optimize the training error weighted by the biased distribution of the instances given by the boosting method our method can be considered as boosting by instance selection instance selection has mostly been developed and used for nearest neighbor nn classifiers so as first step our methodology is suited to construct ensembles of nn classifiers constructing ensembles of classifiers by means of instance selection has the important feature of reducing the space complexity of the final ensemble as only subset of the instances is selected for each classifier however the methodology is not restricted to nn classifier other classifiers such as decision trees and support vector machines svms may also benefit from smaller training set as they produce simpler classifiers if an instance selection algorithm is performed before training in the experimental section we show that the proposed approach is able to produce better and simpler ensembles than random subspace method rsm method for nn and standard ensemble methods for and svms
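A compact sketch of boosting by instance selection for 1-NN ensembles, as described above. Weighted random sampling stands in for the instance-selection algorithm that the paper drives by the boosting distribution, and the AdaBoost-style reweighting is an assumption about the update rule; dataset and parameters are illustrative.

```python
# Sketch: each 1-NN member trains on a subset selected under the boosting
# distribution; misclassified instances get more weight for the next round.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

n_rounds, subset_size = 10, 80
w = np.full(len(X), 1.0 / len(X))          # boosting distribution
members, alphas = [], []

for _ in range(n_rounds):
    # instance selection biased by the boosting weights
    idx = rng.choice(len(X), size=subset_size, replace=False, p=w)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])

    pred = clf.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)
    err = min(max(err, 1e-10), 0.499)       # keep the update well defined
    alpha = 0.5 * np.log((1 - err) / err)

    # AdaBoost-style reweighting: focus later members on hard instances
    w *= np.exp(alpha * (pred != y))
    w /= w.sum()

    members.append(clf)
    alphas.append(alpha)

def ensemble_predict(Xq):
    votes = sum(a * (2 * m.predict(Xq) - 1) for m, a in zip(members, alphas))
    return (votes > 0).astype(int)

print("training accuracy:", np.mean(ensemble_predict(X) == y))
```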
we present an expectation maximization learning algorithm em for estimating the parameters of partially constrained bayesian trees the bayesian trees considered here consist of an unconstrained subtree and set of constrained subtrees in this tree structure constraints are imposed on some of the parameters of the parametrized conditional distributions such that all conditional distributions within the same subtree share the same constraint we propose learning method that uses the unconstrained subtree to guide the process of discovering set of relevant constrained substructures substructure discovery and constraint enforcement are simultaneously accomplished using an em algorithm we show how our tree substructure discovery method can be applied to the problem of learning representative pose models from set of unsegmented video sequences our experiments demonstrate the potential of the proposed method for human motion classification
in multimedia retrieval query is typically interactively refined towards the optimal answers by exploiting user feedback however in existing work in each iteration the refined query is re evaluated this is not only inefficient but fails to exploit the answers that may be common between iterations furthermore it may also take too many iterations to get the optimal answers in this paper we introduce new approach called optrfs optimizing relevance feedback search by query prediction for iterative relevance feedback search optrfs aims to take users to view the optimal results as fast as possible it optimizes relevance feedback search by both shortening the searching time during each iteration and reducing the number of iterations optrfs predicts the potential candidates for the next iteration and maintains this small set for efficient sequential scan by doing so repeated candidate accesses ie random accesses can be saved hence reducing the searching time for the next iteration in addition efficient scan on the overlap before the next search starts also tightens the search space with smaller pruning radius as step forward optrfs also predicts the optimal query which corresponds to optimal answers based on the early executed iterations queries by doing so some intermediate iterations can be saved hence reducing the total number of iterations by taking the correlations among the early executed iterations into consideration optrfs investigates linear regression exponential smoothing and linear exponential smoothing to predict the next refined query so as to decide the overlap of candidates between two consecutive iterations considering the special features of relevance feedback optrfs further introduces adaptive linear exponential smoothing to self adjust the parameters for more accurate prediction we implemented optrfs and our experimental study on real life data sets show that it can reduce the total cost of relevance feedback search significantly some interesting features of relevance feedback search are also discovered and discussed
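A sketch of the query-prediction step only: given the query vectors produced by earlier feedback iterations, forecast the next refined query with simple and linear (trend-adjusted) exponential smoothing. The alpha and beta values and the toy query vectors are illustrative, not OptRFS internals.

```python
# Forecasting the next refined query vector from earlier feedback iterations.
import numpy as np

def simple_exp_smoothing(queries, alpha=0.5):
    """Level-only forecast of the next query vector."""
    level = queries[0]
    for q in queries[1:]:
        level = alpha * q + (1 - alpha) * level
    return level

def linear_exp_smoothing(queries, alpha=0.5, beta=0.3):
    """Holt-style forecast: track a level and a per-iteration trend."""
    level, trend = queries[0], queries[1] - queries[0]
    for q in queries[1:]:
        prev = level
        level = alpha * q + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    return level + trend

# query vectors produced by three earlier relevance-feedback iterations
history = [np.array([0.10, 0.80, 0.10]),
           np.array([0.20, 0.70, 0.10]),
           np.array([0.30, 0.62, 0.08])]

print("simple forecast:", simple_exp_smoothing(history))
print("linear forecast:", linear_exp_smoothing(history))
```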
automated software engineering methods support the construction maintenance and analysis of both new and legacy systems their application is commonplace in desktop and enterprise class systems due to the productivity and reliability benefits they afford the contribution of this article is to present an applied foundation for extending the use of such methods to the flourishing domain of wireless sensor networks the objective is to enable developers to construct tools that aid in understanding both the static and dynamic properties of reactive event based systems we present static analysis and instrumentation toolkit for the nesc language the de facto standard for sensor network development we highlight the novel aspects of the toolkit analyze its performance and provide representative case studies that illustrate its use
real scale semantic web applications such as knowledge portals and marketplaces require the management of large volumes of metadata ie information describing the available web content and services better knowledge about their meaning usage accessibility or quality will considerably facilitate an automated processing of web resources the resource description framework rdf enables the creation and exchange of metadata as normal web data although voluminous rdf descriptions are already appearing sufficiently expressive declarative languages for querying both rdf descriptions and schemas are still missing in this paper we propose new rdf query language called rql it is typed functional language a la oql and relies on formal model for directed labeled graphs permitting the interpretation of superimposed resource descriptions by means of one or more rdf schemas rql adapts the functionality of semistructured xml query languages to the peculiarities of rdf but foremost it enables to uniformly query both resource descriptions and schemas we illustrate the rql syntax semantics and typing system by means of set of example queries and report on the performance of our persistent rdf store employed by the rql interpreter
the web is now being used as general platform for hosting distributed applications like wikis bulletin board messaging systems and collaborative editing environments data from multiple applications originating at multiple sources all intermix in single web browser making sensitive data stored in the browser subject to broad milieu of attacks cross site scripting cross site request forgery and others the fundamental problem is that existing web infrastructure provides no means for enforcing end to end security on data to solve this we design an architecture using mandatory access control mac enforcement we overcome the limitations of traditional mac systems implemented solely at the operating system layer by unifying mac enforcement across virtual machine operating system networking and application layers we implement our architecture using xen virtual machine management selinux at the operating system layer labeled ipsec for networking and our own label enforcing web browser called flowwolf we tested our implementation and find that it performs well supporting data intermixing while still providing end to end security guarantees
paper augmented digital documents padds are digital documents that can be manipulated either on computer screen or on paper padds and the infrastructure supporting them can be seen as bridge between the digital and the paper worlds as digital documents padds are easy to edit distribute and archive as paper documents padds are easy to navigate annotate and well accepted in social settings the chimeric nature of padds makes them well suited for many tasks such as proofreading editing and annotation of large format document like blueprints we are presenting an architecture which supports the seamless manipulation of padds using today’s technologies and report on the lessons we learned while implementing the first padd system
we present statistical method called covering topic score cts to predict query performance for information retrieval estimation is based on how well the topic of user’s query is covered by documents retrieved from certain retrieval system our approach is conceptually simple and intuitive and can be easily extended to incorporate features beyond bag of words such as phrases and proximity of terms experiments demonstrate that cts significantly correlates with query performance in variety of trec test collections and in particular cts gains more prediction power benefiting from features of phrases and proximity of terms we compare cts with previous state of the art methods for query performance prediction including clarity score and robustness score our experimental results show that cts consistently performs better than or at least as well as these other methods in addition to its high effectiveness cts is also shown to have very low computational complexity meaning that it can be practical for real applications
sensor networks consist of many small sensing devices that monitor an environment and communicate using wireless links the lifetime of these networks is severely curtailed by the limited battery power of the sensors one line of research in sensor network lifetime management has examined sensor selection techniques in which applications judiciously choose which sensors data should be retrieved and are worth the expended energy in the past many ad hoc approaches for sensor selection have been proposed in this paper we argue that sensor selection should be based upon tradeoff between application perceived benefit and energy consumption of the selected sensor set we propose framework wherein the application can specify the utility of measuring data nearly concurrently at each set of sensors the goal is then to select sequence of sets to measure whose total utility is maximized while not exceeding the available energy alternatively we may look for the most cost effective sensor set maximizing the product of utility and system lifetime this approach is very generic and permits us to model many applications of sensor networks we proceed to study two important classes of utility functions submodular and supermodular functions we show that the optimum solution for submodular functions can be found in polynomial time while optimizing the cost effectiveness of supermodular functions is np hard for practically important subclass of supermodular functions we present an lp based solution if nodes can send for different amounts of time and show that we can achieve a log n approximation ratio if each node has to send for the same amount of time finally we study scenarios in which the quality of measurements is naturally expressed in terms of distances from targets we show that the utility based approach is analogous to penalty based approach in those scenarios and present preliminary results on some practically important special cases
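A sketch of the kind of cost-benefit selection described above, using the standard greedy rule of maximizing marginal utility per unit energy under a budget. The coverage utility (which is submodular), the sensor positions, ranges, and energy costs are all made-up illustrations, not the paper's formulation.

```python
# Greedy cost-benefit selection for a submodular (coverage) utility.
import random

random.seed(0)
targets = [(random.random(), random.random()) for _ in range(50)]
sensors = {i: {"pos": (random.random(), random.random()),
               "cost": random.uniform(1.0, 3.0)} for i in range(15)}
RANGE, BUDGET = 0.3, 10.0

def covered(sensor_ids):
    """Submodular utility: number of targets within range of any chosen sensor."""
    hits = set()
    for t_idx, (tx, ty) in enumerate(targets):
        for s in sensor_ids:
            sx, sy = sensors[s]["pos"]
            if (tx - sx) ** 2 + (ty - sy) ** 2 <= RANGE ** 2:
                hits.add(t_idx)
                break
    return len(hits)

chosen, spent = set(), 0.0
while True:
    base = covered(chosen)
    best, best_ratio = None, 0.0
    for s in sensors:
        if s in chosen or spent + sensors[s]["cost"] > BUDGET:
            continue
        gain = covered(chosen | {s}) - base       # marginal utility
        ratio = gain / sensors[s]["cost"]         # benefit per unit energy
        if ratio > best_ratio:
            best, best_ratio = s, ratio
    if best is None:
        break
    chosen.add(best)
    spent += sensors[best]["cost"]

print("selected sensors:", sorted(chosen), "coverage:", covered(chosen))
```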
while total order broadcast or atomic broadcast primitives have received lot of attention this paper concentrates on total order multicast to multiple groups in the context of asynchronous distributed systems in which processes may suffer crash failures multicast to multiple groups means that each message is sent to subset of the process groups composing the system distinct messages possibly having distinct destination groups total order means that all message deliveries must be totally ordered this paper investigates consensus based approach to solve this problem and proposes corresponding protocol to implement this multicast primitive this protocol is based on two underlying building blocks namely uniform reliable multicast and uniform consensus its design characteristics lie in the two following properties the first one is minimality property more precisely only the sender of message and processes of its destination groups have to participate in the total order multicast of the message the second property is locality property no execution of consensus has to involve processes belonging to distinct groups ie consensus is executed on per group basis this locality property is particularly useful when one is interested in using the total order multicast primitive in large scale distributed systems in addition to correctness proof an improvement that reduces the cost of the protocol is also suggested
as collaboration in virtual environments becomes more object focused and closely coupled the frequency of conflicts in accessing shared objects can increase in addition two kinds of concurrency control surprises become more disruptive to the collaboration undo surprises can occur when previously visible change is undone because of an access conflict intention surprises can happen when concurrent action by remote session changes the structure of shared object at the same perceived time as local access of that object such that the local user might not get what they expect because they have not had time to visually process the change hierarchy of three concurrency control mechanisms is presented in descending order of collaborative surprises which allows the concurrency scheme to be tailored to the tolerance for such surprises one mechanism is semioptimistic the other two are pessimistic designed for peer to peer virtual environments in which several threads have access to the shared scene graph these algorithms are straightforward and relatively simple they can be implemented using and java under windows and unix on both desktop and immersive systems in series of usability experiments the average performance of the most conservative concurrency control mechanism on local lan was found to be quite acceptable
using moving parabolic approximations mpa we reconstruct an improved point based model of curve or surface represented as an unorganized point cloud while also estimating the differential properties of the underlying smooth manifold we present optimization algorithms to solve these mpa models and examples which show that our reconstructions of the curve or surface and estimates of the normals and curvature information are accurate for precise point clouds and robust in the presence of noise
transactional memory tm is promising paradigm for concurrent programming this paper is an overview of our recent theoretical work on defining theory of tm we first recall some tm correctness properties and then overview results on the inherent power and limitations of tms
time sequences which are ordered sets of observations have been studied in various database applications in this paper we introduce new class of time sequences where each observation is represented by an interval rather than number such sequences may arise in many situations for instance we may not be able to determine the exact value at time point due to uncertainty or aggregation such observation may be represented better by range of possible values similarity search with interval time sequences as both query and data sequences poses new challenge for research we first address the issue of dis similarity measures for interval time sequences we choose an norm based measure because it effectively quantifies the degree of overlapping and remoteness between two intervals and is invariant irrespective of the position of an interval when it is enclosed within another interval we next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases more specifically we propose to extract segment based feature vector for each sequence and to map each feature vector to either point or hyper rectangle in multi dimensional feature space we then show how we can use existing multi dimensional index structures such as the tree for efficient query processing the proposed method guarantees no false dismissals experimental results show that for synthetic and real stock data it is superior to sequential scanning in performance and scales well with the data size
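An illustrative dissimilarity with the properties described above: per timestamp the distance is zero when the two intervals overlap (including full enclosure) and equals the gap between them otherwise, and the per-step values are combined with an Lp norm over the sequence. This mirrors the stated behavior but is not necessarily the paper's exact measure.

```python
# Illustrative dissimilarity for interval time sequences.

def interval_gap(a, b):
    """a and b are (low, high) pairs; 0 if they overlap, else the gap size."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(0.0, max(a_lo, b_lo) - min(a_hi, b_hi))

def sequence_distance(seq_a, seq_b, p=2):
    """Lp-norm aggregation of per-timestamp interval gaps."""
    assert len(seq_a) == len(seq_b)
    return sum(interval_gap(a, b) ** p for a, b in zip(seq_a, seq_b)) ** (1.0 / p)

q = [(1.0, 2.0), (2.0, 3.5), (3.0, 4.0)]   # query sequence of value ranges
s = [(1.5, 1.8), (5.0, 6.0), (3.5, 5.0)]   # data sequence
print(sequence_distance(q, s))             # enclosed/overlapping steps contribute 0
```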
application integration can be carried out on three different levels the data source level the business logic level and the user interface level with ontologies based integration on the data source level dating back to the and semantic web services for integrating on the business logic level coming of age it is time for the next logical step employing ontologies for integration on the user interface level such an approach supports both the developer in terms of reduced development times and the user in terms of better usability of integrated applications in this paper we introduce framework employing ontologies for integrating applications on the user interface level
the handling of user preferences is becoming an increasingly important issue in present day information systems among others preferences are used for information filtering and extraction to reduce the volume of data presented to the user they are also used to keep track of user profiles and formulate policies to improve and automate decision making we propose here simple logical framework for formulating preferences as preference formulas the framework does not impose any restrictions on the preference relations and allows arbitrary operation and predicate signatures in preference formulas it also makes the composition of preference relations straightforward we propose simple natural embedding of preference formulas into relational algebra and sql through single winnow operator parameterized by preference formula the embedding makes possible the formulation of complex preference queries for example involving aggregation by piggybacking on existing sql constructs it also leads in natural way to the definition of further preference related concepts like ranking finally we present general algebraic laws governing the winnow operator and its interactions with other relational algebra operators the preconditions on the applicability of the laws are captured by logical formulas the laws provide formal foundation for the algebraic optimization of preference queries we demonstrate the usefulness of our approach through numerous examples
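A sketch of the winnow operator over plain Python tuples rather than SQL: it keeps the tuples to which no other tuple is preferred, with the preference relation supplied as a boolean formula prefers(t1, t2). The car relation and the concrete preference below are illustrative only.

```python
# Winnow: return the undominated tuples under a supplied preference formula.

def winnow(relation, prefers):
    """Keep tuples of `relation` that no other tuple is preferred to."""
    return [t for t in relation
            if not any(prefers(other, t) for other in relation if other is not t)]

cars = [
    {"make": "a", "price": 12000, "year": 2018},
    {"make": "b", "price": 9000,  "year": 2015},
    {"make": "c", "price": 9000,  "year": 2019},
    {"make": "d", "price": 15000, "year": 2019},
]

# preference formula: cheaper is better; on equal price, newer is better
def prefers(t1, t2):
    return (t1["price"] < t2["price"] or
            (t1["price"] == t2["price"] and t1["year"] > t2["year"]))

print(winnow(cars, prefers))   # only the undominated car(s) remain
```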
block correlations are common semantic patterns in storage systems these correlations can be exploited for improving the effectiveness of storage caching prefetching data layout and disk scheduling unfortunately information about block correlations is not available at the storage system level previous approaches for discovering file correlations in file systems do not scale well enough to be used for discovering block correlations in storage systems in this paper we propose miner an algorithm which uses data mining technique called frequent sequence mining to discover block correlations in storage systems miner runs reasonably fast with feasible space requirement indicating that it is practical tool for dynamically inferring correlations in storage system moreover we have also evaluated the benefits of block correlation directed prefetching and data layout through experiments our results using real system workloads show that correlation directed prefetching and data layout can reduce average response time by compared to the base case and compared to the commonly used sequential prefetching scheme
this paper describes onechip third generation reconfigurable processor architecture that integrates reconfigurable functional unit rfu into superscalar reduced instruction set computer risc processor’s pipeline the architecture allows dynamic scheduling and dynamic reconfiguration it also provides support for pre loading configurations and for least recently used lru configuration management to evaluate the performance of the onechip architecture several off the shelf software applications were compiled and executed on sim onechip an architecture simulator for onechip that includes software environment for programming the system the architecture is compared to similar one but without dynamic scheduling and without an rfu onechip achieves performance improvement and shows speedup range from up to for the different applications and data sizes used the results show that dynamic scheduling helps performance the most on average and that the rfu will always improve performance the best when most of the execution is in the rfu
most correlation clustering algorithms rely on principal component analysis pca as correlation analysis tool the correlation of each cluster is learned by applying pca to set of sample points since pca is rather sensitive to outliers if small fraction of these points does not correspond to the correct correlation of the cluster the algorithms are usually misled or even fail to detect the correct results in this paper we evaluate the influence of outliers on pca and propose general framework for increasing the robustness of pca in order to determine the correct correlation of each cluster we further show how our framework can be applied to pca based correlation clustering algorithms thorough experimental evaluation shows the benefit of our framework on several synthetic and real world data sets
existing dram controllers employ rigid non adaptive scheduling and buffer management policies when servicing prefetch requests some controllers treat prefetch requests the same as demand requests others always prioritize demand requests over prefetch requests however none of these rigid policies result in the best performance because they do not take into account the usefulness of prefetch requests if prefetch requests are useless treating prefetches and demands equally can lead to significant performance loss and extra bandwidth consumption in contrast if prefetch requests are useful prioritizing demands over prefetches can hurt performance by reducing dram throughput and delaying the service of useful requests this paper proposes new low cost memory controller called prefetch aware dram controller padc that aims to maximize the benefit of useful prefetches and minimize the harm caused by useless prefetches to accomplish this padc estimates the usefulness of prefetch requests and dynamically adapts its scheduling and buffer management policies based on the estimates the key idea is to adaptively prioritize between demand and prefetch requests and drop useless prefetches to free up memory system resources based on the accuracy of the prefetcher our evaluation shows that padc significantly outperforms previous memory controllers with rigid prefetch handling policies on both single and multi core systems with variety of prefetching algorithms across wide range of multiprogrammed spec cpu workloads it improves system performance by on core system and by on an core system while reducing dram bandwidth consumption by and respectively
most hardware predictors are table based eg two level branch predictors and have exponential size growth in the number of input bits or features eg previous branch outcomes this growth severely limits the amount of predictive information that such predictors can use to avoid exponential growth we introduce the idea of dynamic feature selection for building hardware predictors that can use large amount of predictive information based on this idea we design the dynamic decision tree ddt predictor which exhibits only linear size growth in the number of features our initial evaluation in branch prediction shows that the general purpose ddt using only branch history features is comparable on average to conventional branch predictors opening the door to practically using large numbers of additional features
referential integrity is an essential global constraint in relational database that maintains it in complete and consistent state in this work we assume the database may violate referential integrity and relations may be denormalized we propose set of quality metrics defined at four granularity levels database relation attribute and value that measure referential completeness and consistency quality metrics are efficiently computed with standard sql queries that incorporate two query optimizations left outer joins on foreign keys and early foreign key grouping experiments evaluate our proposed metrics and sql query optimizations on real and synthetic databases showing they can help in detecting and explaining referential errors
cluster based replication solutions are an attractive mechanism to provide both high availability and scalability for the database backend within the multi tier information systems of service oriented businesses an important issue that has not yet received sufficient attention is how database replicas that have failed can be reintegrated into the system or how completely new replicas can be added in order to increase the capacity of the system ideally recovery takes place online ie while transaction processing continues at the replicas that are already running in this paper we present complete online recovery solution for database clusters one important issue is to find an efficient way to transfer the data the joining replica needs in this paper we present two data transfer strategies the first transfers the latest copy of each data item the second transfers the updates rejoining replica has missed during its downtime second challenge is to coordinate this transfer with ongoing transaction processing such that the joining node does not miss any updates we present coordination protocol that can be used with postgres replication tool which uses group communication system for replica control we have implemented and compared our transfer solutions against set of parameters and present heuristics which allow an automatic selection of the optimal strategy for given configuration
influence of items on some other items might not be the same as the association between these sets of items many tasks of data analysis are based on expressing influence of items on other items in this paper we introduce the notion of an overall influence of set of items on another set of items we also propose an extension to the notion of overall association between two items in database using the notion of overall influence we have designed two algorithms for influence analysis involving specific items in database as the number of databases increases on yearly basis we have adopted incremental approach in these algorithms experimental results are reported for both synthetic and real world databases
the growing number of information security breaches in electronic and computing systems calls for new design paradigms that consider security as primary design objective this is particularly relevant in the embedded domain where the security solution should be customized to the needs of the target system while considering other design objectives such as cost performance and power due to the increasing complexity and shrinking design cycles of embedded software most embedded systems present host of software vulnerabilities that can be exploited by security attacks many attacks are initiated by causing violation in the properties of data eg integrity privacy access control rules etc associated with trusted program that is executing on the system leading to range of undesirable effects in this work we develop general framework that provides security assurance against wide class of security attacks our work is based on the observation that program’s permissible behavior with respect to data accesses can be characterized by certain properties we present hardware software approach wherein such properties can be encoded as data attributes and enforced as security policies during program execution these policies may be application specific eg access control for certain data structures compiler generated eg enforcing that variables are accessed only within their scope or universally applicable to all programs eg disallowing writes to unallocated memory we show how an embedded system architecture can support such policies by enhancing the memory hierarchy to represent the attributes of each datum as security tags that are linked to it through its lifetime and ii adding configurable hardware checker that interprets the semantics of the tags and enforces the desired security policies we evaluated the effectiveness of the proposed architecture in enforcing various security policies for several embedded benchmarks our experiments in the context of the simplescalar framework demonstrate that the proposed solution ensures run time validation of program data properties with minimal execution time overheads
given two sets of moving objects future timestamp tq and distance threshold spatio temporal join retrieves all pairs of objects that are within distance at tq the selectivity of join equals the number of retrieved pairs divided by the cardinality of the cartesian product this paper develops model for spatio temporal join selectivity estimation based on rigorous probabilistic analysis and reveals the factors that affect the selectivity initially we solve the problem for id point and rectangle objects whose location and velocities distribute uniformly and then extend the results to multi dimensional spaces finally we deal with non uniform distributions using specialized spatio temporal histogram extensive experiments confirm that the proposed formulae are highly accurate average error below
we give semantics to polymorphic effect analysis that tracks possibly thrown exceptions and possible non termination for higher order language the semantics is defined using partial equivalence relations over standard monadic domain theoretic model of the original language and establishes the correctness of both the analysis itself and of the contextual program transformations that it enables
the need to visualize large social networks is growing as hardware capabilities make analyzing large networks feasible and many new data sets become available unfortunately the visualizations in existing systems do not satisfactorily resolve the basic dilemma of being readable both for the global structure of the network and also for detailed analysis of local communities to address this problem we present nodetrix hybrid representation for networks that combines the advantages of two traditional representations node link diagrams are used to show the global structure of network while arbitrary portions of the network can be shown as adjacency matrices to better support the analysis of communities key contribution is set of interaction techniques these allow analysts to create nodetrix visualization by dragging selections to and from node link and matrix forms and to flexibly manipulate the nodetrix representation to explore the dataset and create meaningful summary visualizations of their findings finally we present case study applying nodetrix to the analysis of the infovis coauthorship dataset to illustrate the capabilities of nodetrix as both an exploration tool and an effective means of communicating results
the artifacts constituting software system often drift apart over time we have developed the software reflexion model technique to help engineers perform various software engineering tasks by exploiting rather than removing the drift between design and implementation more specifically the technique helps an engineer compare artifacts by summarizing where one artifact such as design is consistent with and inconsistent with another artifact such as source the technique can be applied to help software engineer evolve structural mental model of system to the point that it is good enough to be used for reasoning about task at hand the software reflexion model technique has been applied to support variety of tasks including design conformance change assessment and an experimental reengineering of the million lines of code microsoft excel product in this paper we provide formal characterization of the reflexion model technique discuss practical aspects of the approach relate experiences of applying the approach and tools and place the technique into the context of related work
abstraction and slicing are both techniques for reducing the size of the state space to be inspected during verification in this paper we present new model checking procedure for infinite state concurrent systems that interleaves automatic abstraction refinement which splits states according to new predicates obtained by craig interpolation with slicing which removes irrelevant states and transitions from the abstraction the effects of abstraction and slicing complement each other as the refinement progresses the increasing accuracy of the abstract model allows for more precise slice the resulting smaller representation gives room for additional predicates in the abstraction the procedure terminates when an error path in the abstraction can be concretized which proves that the system is erroneous or when the slice becomes empty which proves that the system is correct
compressing the instructions of an embedded program is important for cost sensitive low power control oriented embedded computing number of compression schemes have been proposed to reduce program size however the increased instruction density has an accompanying performance cost because the instructions must be decompressed before execution in this paper we investigate the performance penalty of hardware managed code compression algorithm recently introduced in ibm’s powerpc this scheme is the first to combine many previously proposed code compression techniques making it an ideal candidate for study we find that code compression with appropriate hardware optimizations does not have to incur much performance loss furthermore our studies show this holds for architectures with wide range of memory configurations and issue widths surprisingly we find that performance increase over native code is achievable in many situations
detecting code clones has many software engineering applications existing approaches either do not scale to large code bases or are not robust against minor code modifications in this paper we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code our algorithm is based on novel characterization of subtrees with numerical vectors in the euclidean space and an efficient algorithm to cluster these vectors wrt the euclidean distance metric subtrees with vectors in one cluster are considered similar we have implemented our tree similarity algorithm as clone detection tool called deckard and evaluated it on large code bases written in and java including the linux kernel and jdk our experiments show that deckard is both scalable and accurate it is also language independent applicable to any language with formally specified grammar
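An illustration of the characteristic-vector idea behind this kind of tree-based clone detection, using Python's ast module for convenience: each subtree is summarized by a vector of node-type counts, and subtrees whose vectors lie within a small Euclidean distance are flagged as similar. This is not the DECKARD tool, which parses via a formal grammar and clusters vectors with locality-sensitive techniques.

```python
# Characteristic vectors for subtrees + naive pairwise Euclidean comparison.
import ast
import math
from collections import Counter

SOURCE = """
def f(xs):
    total = 0
    for x in xs:
        total += x * x
    return total

def g(ys):
    acc = 0
    for y in ys:
        acc += y * y
    return acc
"""

def subtree_vectors(tree, min_nodes=8):
    """Yield (node, Counter of node-type names) for sufficiently large subtrees."""
    for node in ast.walk(tree):
        if not hasattr(node, "lineno"):
            continue                      # skip the module root and context nodes
        counts = Counter(type(n).__name__ for n in ast.walk(node))
        if sum(counts.values()) >= min_nodes:
            yield node, counts

def distance(c1, c2):
    keys = set(c1) | set(c2)
    return math.sqrt(sum((c1[k] - c2[k]) ** 2 for k in keys))

tree = ast.parse(SOURCE)
vecs = list(subtree_vectors(tree))
for i in range(len(vecs)):
    for j in range(i + 1, len(vecs)):
        (n1, v1), (n2, v2) = vecs[i], vecs[j]
        if distance(v1, v2) <= 1.0:
            print("similar subtrees at lines", n1.lineno, "and", n2.lineno)
```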
this paper presents an initial study of multimodal collaborative platform concerning user preferences and interaction technique adequacy towards task true collaborative interactions are missing aspect of the majority of nowadays multi user system on par with the lack of support towards impaired users in order to surpass these obstacles we provide an accessible platform for co located collaborative environments which aims at not only improving the ways users interact within them but also at exploring novel interaction patterns brief study regarding set of interaction techniques and tasks was conducted in order to assess the most suited modalities in certain settings we discuss the results drawn from this study detail some related conclusions and present future work directions
password authenticated key exchange pake protocols allow parties to share secret keys in an authentic manner based on an easily memorizable password recently lu and cao proposed three party password authenticated key exchange protocol so called pake based on ideas of the abdalla and pointcheval two party spake extended to three parties pake can be seen to have structure alternative to that of another three party pake protocol pake by abdalla and pointcheval furthermore simple improvement to pake was proposed very recently by chung and ku to resist the kind of attacks that applied to earlier versions of pake in this paper we show that pake falls to unknown key share attacks by any other client and undetectable online dictionary attacks by any adversary the latter attack equally applies to the recently improved pake indeed the provable security approach should be taken when designing pakes and furthermore our results highlight that extra cautions still need to be exercised when defining models and constructing proofs in this direction
efficient construction of inverted indexes is essential to provision of search over large collections of text data in this article we review the principal approaches to inversion analyze their theoretical cost and present experimental results we identify the drawbacks of existing inversion approaches and propose single pass inversion method that in contrast to previous approaches does not require the complete vocabulary of the indexed collection in main memory can operate within limited resources and does not sacrifice speed with high temporary storage requirements we show that the performance of the single pass approach can be improved by constructing inverted files in segments reducing the cost of disk accesses during inversion of large volumes of data
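A sketch of single-pass, segment-based inversion under a memory limit: postings accumulate in memory, are flushed to a sorted on-disk segment whenever a (deliberately tiny, illustrative) budget is exceeded, and the segments are merged at the end. The budget, file handling, and toy documents are assumptions for illustration, not the paper's implementation.

```python
# Single-pass inversion with on-disk segments and a final merge.
import heapq
import os
import tempfile
from collections import defaultdict

MEMORY_BUDGET = 8            # max in-memory postings before a flush (illustrative)

def flush_segment(postings, segment_paths):
    """Write the in-memory postings to a term-sorted segment file and clear them."""
    with tempfile.NamedTemporaryFile("w", suffix=".seg", delete=False) as f:
        for term in sorted(postings):
            f.write(term + "\t" + ",".join(map(str, postings[term])) + "\n")
        segment_paths.append(f.name)
    postings.clear()

def invert(documents):
    postings, segments, size = defaultdict(list), [], 0
    for doc_id, text in enumerate(documents):
        for term in text.split():
            postings[term].append(doc_id)
            size += 1
            if size >= MEMORY_BUDGET:
                flush_segment(postings, segments)
                size = 0
    if postings:
        flush_segment(postings, segments)

    # merge the term-sorted segments into the final index
    merged = defaultdict(list)
    streams = [open(p) for p in segments]
    for line in heapq.merge(*streams):
        term, ids = line.rstrip("\n").split("\t")
        merged[term].extend(int(i) for i in ids.split(","))
    for s in streams:
        s.close()
    for p in segments:
        os.remove(p)
    return {t: sorted(ids) for t, ids in merged.items()}

docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog", "mat and cat"]
index = invert(docs)
print(index["cat"], index["sat"])
```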
materialized xpath access control views are commonly used for enforcing access control when access control rules defining materialized xml access control view change the view must be adapted to reflect these changes the process of updating materialized view after its definition changes is referred to as view adaptation while xpath security views have been widely reported in literature the problem of view adaptation for xpath security views has not been addressed view adaptation results in view downtime during which users are denied access to security views to prevent unauthorized access thus efficient view adaptation is important for making xpath security views pragmatic in this work we show how to adapt an xpath access control view incrementally by re using the existing view which reduces computation and communication costs significantly and results in less downtime for the end user empirical evaluations confirm that the incremental view adaptation algorithms presented in this paper are efficient and scalable
class imbalance where the classes in dataset are not represented equally is common occurrence in machine learning classification models built with such datasets are often not practical since most machine learning algorithms would tend to perform poorly on the minority class instances we present unique evolutionary computing based data sampling approach as an effective solution for the class imbalance problem the genetic algorithm based approach evolutionary sampling works as majority undersampling technique where instances from the majority class are selectively removed this preserves the relative integrity of the majority class while maintaining the original minority class group our research prototype evann also implements genetic algorithm based optimization of modeling parameters for the machine learning algorithms considered in our study an extensive empirical investigation involving four real world datasets is performed comparing the proposed approach to other existing data sampling techniques that target the class imbalance problem our results demonstrate that evolutionary sampling both with and without learner optimization performs relatively better than other data sampling techniques detailed coverage of our case studies in this paper lends itself toward empirical replication
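A compact sketch of GA-based majority undersampling in the spirit described above: a chromosome is a binary mask over the majority-class instances and fitness is the validation balanced accuracy of a classifier trained on the sampled data. Population size, mutation rate, the decision-tree learner, and the truncation selection scheme are illustrative choices, not the EVANN prototype.

```python
# Genetic search over majority-class masks, scored by balanced accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

maj_idx = np.where(y_tr == 0)[0]
min_idx = np.where(y_tr == 1)[0]
keep_target = len(min_idx)                      # aim for a roughly balanced sample

def fitness(mask):
    sel = np.concatenate([maj_idx[mask.astype(bool)], min_idx])
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[sel], y_tr[sel])
    return balanced_accuracy_score(y_val, clf.predict(X_val))

def random_mask():
    m = np.zeros(len(maj_idx), dtype=int)
    m[rng.choice(len(maj_idx), keep_target, replace=False)] = 1
    return m

pop = [random_mask() for _ in range(12)]
for gen in range(15):
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:6]                        # truncation selection
    children = []
    while len(children) < 6:
        a, b = rng.choice(6, 2, replace=False)
        cut = rng.integers(1, len(maj_idx))     # one-point crossover
        child = np.concatenate([parents[a][:cut], parents[b][cut:]])
        flip = rng.random(len(child)) < 0.02    # light mutation
        child[flip] ^= 1
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
print("best balanced accuracy on validation:", round(fitness(best), 3))
```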
high level languages are growing in popularity however decades of software development have produced large libraries of fast time tested meritorious code that are impractical to recreate from scratch cross language bindings can expose low level code to high level languages unfortunately writing bindings by hand is tedious and error prone while mainstream binding generators require extensive manual annotation or fail to offer the language features that users of modern languages have come to expect we present an improved binding generation strategy based on static analysis of unannotated library source code we characterize three high level idioms that are not uniquely expressible in c’s low level type system array parameters resource managers and multiple return values we describe suite of interprocedural analyses that recover this high level information and we show how the results can be used in binding generator for the python programming language in experiments with four large libraries we find that our approach avoids the mistakes characteristic of hand written bindings while offering level of python integration unmatched by prior automated approaches among the thousands of functions in the public interfaces of these libraries roughly exhibit the behaviors detected by our static analyses
the ability to predict at compile time the likelihood of particular branch being taken provides valuable information for several optimizations including global instruction scheduling code layout function inlining interprocedural register allocation and many high level optimizations previous attempts at static branch prediction have either used simple heuristics which can be quite inaccurate or put the burden onto the programmer by using execution profiling data or source code hints this paper presents new approach to static branch prediction called value range propagation this method tracks the weighted value ranges of variables through program much like constant propagation these value ranges may be either numeric or symbolic in nature branch prediction is then performed by simply consulting the value range of the appropriate variable heuristics are used as fallback for cases where the value range of the variable cannot be determined statically in the process value range propagation subsumes both constant propagation and copy propagation experimental results indicate that this approach produces significantly more accurate predictions than the best existing heuristic techniques the value range propagation method can be implemented over any factored dataflow representation with static single assignment property such as ssa form or dependence flow graph where the variables have been renamed to achieve single assignment experimental results indicate that the technique maintains the linear runtime behavior of constant propagation experienced in practice
in this paper we study the problem of effective keyword search over xml documents we begin by introducing the notion of valuable lowest common ancestor vlca to accurately and effectively answer keyword queries over xml documents we then propose the concept of compact vlca cvlca and compute the meaningful compact connected trees rooted as cvlcas as the answers of keyword queries to efficiently compute cvlcas we devise an effective optimization strategy for speeding up the computation and exploit the key properties of cvlca in the design of the stack based algorithm for answering keyword queries we have conducted an extensive experimental study and the experimental results show that our proposed approach achieves both high efficiency and effectiveness when compared with existing proposals
decision support systems help the decision making process with the use of olap on line analytical processing and data warehouses these systems allow the analysis of corporate data as olap and data warehousing evolve more and more complex data is being used xml extensible markup language is flexible text format allowing the interchange and the representation of complex data finding an appropriate model for an xml data warehouse tends to become complicated as more and more solutions appear hence in this survey paper we present an overview of the different proposals that use xml within data warehousing technology these proposals range from using xml data sources for regular warehouses to those using full xml warehousing solutions some researches merely focus on document storage facilities while others present adaptations of xml technology for olap even though there are growing number of researches on the subject many issues still remain unsolved
in this paper we propose an extension algorithm to closet one of the most efficient algorithms for mining frequent closed itemsets in static transaction databases to allow it to mine frequent closed itemsets in dynamic transaction databases in dynamic transaction database transactions may be added deleted and modified with time based on two variant tree structures our algorithm retains the previous mined frequent closed itemsets and updates them by considering the changes in the transaction databases only hence the frequent closed itemsets in the current transaction database can be obtained without rescanning the entire changed transaction database the performance of the proposed algorithm is compared with closet showing performance improvements for dynamic transaction databases compared to using mining algorithms designed for static transaction databases
this paper explores the suitability of dense circulant graphs of degree four for the design of on chip interconnection networks networks based on these graphs reduce the torus diameter in factor which translates into significant performance gains for unicast traffic in addition they are clearly superior to tori when managing collective communications this paper introduces new two dimensional node’s labeling of the networks explored which simplifies their analysis and exploitation in particular it provides simple and optimal solutions to two important architectural issues routing and broadcasting other implementation issues such as network folding and scalability by using hierarchical networks are also explored in this work
wireless sensors are very small computers and understanding the timing and behavior of software written for them is crucial to ensuring that they perform correctly this paper outlines lightweight method for gathering behavioral and timing information from simulated executions of software written in the nesc tinyos environment the resulting data is used to generate both behavioral and timing profiles of the software using uml sequence diagrams to visualize the behavior and to present the timing information
repeated elements are ubiquitous and abundant in both manmade and natural scenes editing such images while preserving the repetitions and their relations is nontrivial due to overlap missing parts deformation across instances illumination variation etc manually enforcing such relations is laborious and error prone we propose novel framework where user scribbles are used to guide detection and extraction of such repeated elements our detection process which is based on novel boundary band method robustly extracts the repetitions along with their deformations the algorithm only considers the shape of the elements and ignores similarity based on color texture etc we then use topological sorting to establish partial depth ordering of overlapping repeated instances missing parts on occluded instances are completed using information from other instances the extracted repeated instances can then be seamlessly edited and manipulated for variety of high level tasks that are otherwise difficult to perform we demonstrate the versatility of our framework on large set of inputs of varying complexity showing applications to image rearrangement edit transfer deformation propagation and instance replacement
we investigate the problem of optimizing the routing performance of virtual network by adding extra random links our asynchronous and distributed algorithm ensures by adding single extra link per node that the resulting network is navigable small world ie in which greedy routing using the distance in the original network computes paths of polylogarithmic length between any pair of nodes with probability previously known small world augmentation processes require the global knowledge of the network and centralized computations which is unrealistic for large decentralized networks our algorithm based on careful multi layer sampling of the nodes and the construction of light overlay network bypasses these limitations for bounded growth graphs ie graphs where for any node and any radius the number of nodes within distance from is at most constant times the number of nodes within distance our augmentation process proceeds with high probability in log log communication rounds with log log messages of size log bits sent per node and requiring only log log bit space in each node where is the number of nodes and the diameter in particular with the only knowledge of original distances greedy routing computes between any pair of nodes in the augmented network path of length at most log log with probability and of expected length log log hence we provide distributed scheme to augment any bounded growth graph into small world with high probability in polylogarithmic time while requiring polylogarithmic memory we consider that the existence of such lightweight process might be first step towards the definition of more general construction process that would validate kleinberg’s model as plausible explanation for the small world phenomenon in large real interaction networks
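A toy demonstration of the effect being exploited above: a ring where every node receives one extra long-range link, after which greedy routing on the original distance finds short paths. The distance-harmonic (Kleinberg-style) choice of link lengths is only an illustrative assumption; the paper builds its links through a distributed, multi-layer sampling process rather than with global knowledge.

```python
# Greedy routing on a ring augmented with one extra long-range link per node.
import random

random.seed(1)
N = 1024

def ring_distance(a, b):
    d = abs(a - b)
    return min(d, N - d)

# one extra link per node, chosen with probability proportional to 1/distance
extra = {}
for u in range(N):
    candidates = [v for v in range(N) if v != u]
    weights = [1.0 / ring_distance(u, v) for v in candidates]
    extra[u] = random.choices(candidates, weights=weights)[0]

def greedy_route(src, dst):
    """Always move to the neighbour closest to dst in the original ring metric."""
    hops, cur = 0, src
    while cur != dst:
        neighbours = [(cur + 1) % N, (cur - 1) % N, extra[cur]]
        cur = min(neighbours, key=lambda v: ring_distance(v, dst))
        hops += 1
    return hops

trials = [greedy_route(random.randrange(N), random.randrange(N)) for _ in range(200)]
print("average greedy path length:", sum(trials) / len(trials))
```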
to achieve interoperability modern information systems and commerce applications use mappings to translate data from one representation to another in dynamic environments like the web data sources may change not only their data but also their schemas their semantics and their query capabilities such changes must be reflected in the mappings mappings left inconsistent by schema change have to be detected and updated as large complicated schemas become more prevalent and as data is reused in more applications manually maintaining mappings even simple mappings like view definitions is becoming impractical we present novel framework and tool tomas for automatically adapting mappings as schemas evolve our approach considers not only local changes to schema but also changes that may affect and transform many components of schema we consider comprehensive class of mappings for relational and xml schemas with choice types and nested constraints our algorithm detects mappings affected by structural or constraint change and generates all the rewritings that are consistent with the semantics of the mapped schemas our approach explicitly models mapping choices made by user and maintains these choices whenever possible as the schemas and mappings evolve we describe an implementation of mapping management and adaptation tool based on these ideas and compare it with mapping generation tool
in this paper we discuss the energy efficient multicast problem in ad hoc wireless networks each node in the network is assumed to have fixed level of transmission power the problem of our concern is given an ad hoc wireless network and multicast request how to find multicast tree such that the total energy cost of the multicast tree is minimized we first prove this problem is np hard and it is unlikely to have an approximation algorithm with constant performance ratio of the number of nodes in the network we then propose an algorithm based on the directed steiner tree method that has theoretically guaranteed approximation performance ratio we also propose two efficient heuristics node join tree njt and tree join tree tjt algorithms the njt algorithm can be easily implemented in distributed fashion extensive simulations have been conducted to compare with other methods and the results have shown significant improvement on energy efficiency of the proposed algorithms
the key notion in service oriented architecture is decoupling clients and providers of service based on an abstract service description which is used by the service broker to point clients to suitable service implementation client then sends service requests directly to the service implementation problem with the current architecture is that it does not provide trustworthy means for clients to specify service brokers to verify and service implementations to prove that certain desired non functional properties are satisfied during service request processing an example of such non functional property is access and persistence restrictions on the data received as part of the service requests in this work we propose an extension of the service oriented architecture that provides these facilities we also discuss prototype implementation of this architecture and report preliminary results that demonstrate the potential practical value of the proposed architecture in real world software applications
virtual humans are being increasingly used in different domains virtual human modeling requires considering aspects belonging to different levels of abstraction for example at lower levels one has to consider aspects concerning the geometric definition of the virtual human model and appearance while at higher levels one should be able to define how the virtual human behaves in an environment h anim the standard for representing humanoids in x3d vrml worlds is mainly concerned with low level modeling aspects as result the developer has to face the problem of defining the virtual human behavior and translating it into lower levels eg geometrical and kinematic aspects in this paper we propose vha virtual human architecture software architecture that allows one to easily manage an interactive h anim virtual human in x3d vrml worlds the proposed solution allows the developer to focus mainly on high level aspects of the modeling process such as the definition of the virtual human behavior
robotic tape libraries are popular for applications with very high storage requirements such as video servers here we study the throughput of tape library system we design new scheduling algorithm the so called relief and compare it against some older straightforward ones like fcfs maximum queue length mql and an unfair one bypass roughly equivalent to shortest job first the proposed algorithm incorporates an aging mechanism in order to attain fairness and we prove that under certain assumptions it minimizes the average start up latency extensive simulation experiments show that relief outperforms its competitors fair and unfair alike with up to improvement in throughput for the same rejection ratio
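To illustrate how an aging term can be folded into a queue-length-based choice of the next tape to load, here is a minimal sketch; the scoring formula, the weights, and all names are assumptions for illustration only and are not the RELIEF algorithm itself.

```python
from dataclasses import dataclass

@dataclass
class Request:
    tape_id: int
    arrival: float

def pick_next_tape(queues, now, alpha=1.0, beta=0.1):
    """Pick the tape whose pending queue maximizes a combined score.

    score = alpha * queue_length + beta * oldest_waiting_time
    The queue-length term approximates MQL-style throughput greed; the aging
    term keeps long-waiting requests from starving, which is the idea behind
    fairness mechanisms such as RELIEF. The exact formula is illustrative.
    """
    best_tape, best_score = None, float("-inf")
    for tape_id, reqs in queues.items():
        if not reqs:
            continue
        oldest_wait = now - min(r.arrival for r in reqs)
        score = alpha * len(reqs) + beta * oldest_wait
        if score > best_score:
            best_tape, best_score = tape_id, score
    return best_tape

if __name__ == "__main__":
    queues = {
        1: [Request(1, 8.0), Request(1, 9.0), Request(1, 9.5)],  # popular tape, recent requests
        2: [Request(2, 0.0)],                                    # single request, waiting long
    }
    print("throughput-greedy choice:", pick_next_tape(queues, now=10.0))           # tape 1
    print("with strong aging      :", pick_next_tape(queues, now=10.0, beta=5.0))  # tape 2
```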
this paper demonstrates the advantages of using controlled mobility in wireless sensor networks wsns for increasing their lifetime ie the period of time the network is able to provide its intended functionalities more specifically for wsns that comprise large number of statically placed sensor nodes transmitting data to collection point the sink we show that by controlling the sink movements we can obtain remarkable lifetime improvements in order to determine sink movements we first define mixed integer linear programming milp analytical model whose solution determines those sink routes that maximize network lifetime our contribution expands further by defining the first heuristics for controlled sink movements that are fully distributed and localized our greedy maximum residual energy gmre heuristic moves the sink from its current location to new site as if drawn toward the area where nodes have the highest residual energy we also introduce simple distributed mobility scheme random movement or rm according to which the sink moves uncontrolled and randomly throughout the network the different mobility schemes are compared through extensive ns based simulations in networks with different nodes deployment data routing protocols and constraints on the sink movements in all considered scenarios we observe that moving the sink always increases network lifetime in particular our experiments show that controlling the mobility of the sink leads to remarkable improvements which are as high as sixfold compared to having the sink statically and optimally placed and as high as twofold compared to uncontrolled mobility
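A minimal sketch of a GMRE-style move decision, assuming a set of candidate sink sites, each with known nearby sensors and their residual energies; the neighbourhood definition, the move threshold, and the names are illustrative assumptions, not the paper's protocol.

```python
def gmre_next_site(current, candidate_sites, residual_energy, neighbors_of_site,
                   min_gain=0.0):
    """Greedy Maximum Residual Energy (GMRE)-style move: relocate the sink to the
    adjacent candidate site surrounded by the most residual energy.

    neighbors_of_site[s] lists the sensor nodes close to site s; how sites and
    neighbourhoods are defined is a modelling assumption of this sketch.
    """
    def site_energy(site):
        return sum(residual_energy[n] for n in neighbors_of_site[site])

    here = site_energy(current)
    best = max(candidate_sites, key=site_energy)
    # move only if the gain over staying put is worthwhile
    return best if site_energy(best) > here + min_gain else current

if __name__ == "__main__":
    residual = {"a": 3.0, "b": 9.0, "c": 1.5, "d": 7.0}
    neigh = {"s0": ["a", "c"], "s1": ["b", "d"], "s2": ["c"]}
    print(gmre_next_site("s0", ["s1", "s2"], residual, neigh))  # -> "s1"
```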
the method described in this article evaluates case similarity in the retrieval stage of case based reasoning cbr it thus plays key role in deciding which case to select and therefore in deciding which solution will be eventually applied in cbr there are many retrieval techniques one feature shared by most is that case retrieval is based on attribute similarity and importance however there are other crucial factors that should be considered such as the possible consequences of given solution in other words its potential loss and gain as their name clearly implies these concepts are defined as functions measuring loss and gain when given retrieval case solution is applied moreover these functions help the user to choose the best solution so that when mistake is made the resulting loss is minimal in this way the highest benefit is always obtained
activity centric computing acc systems seek to address the fragmentation of office work across tools and documents by allowing users to organize work around the computational construct of an activity defining and structuring appropriate activities within system poses challenge for users that must be overcome in order to benefit from acc support we know little about how knowledge workers appropriate the activity construct to address this we studied users appropriation of production quality acc system lotus activities for everyday work by employees in large corporation we contribute to better understanding of how users articulate their individual and collaborative work in the system by providing empirical evidence of their patterns of appropriation we conclude by discussing how our findings can inform the design of other acc systems for the workplace
we present an interactive system for synthesizing urban layouts by example our method simultaneously performs both structure based synthesis and an image based synthesis to generate complete urban layout with plausible street network and with aerial view imagery our approach uses the structure and image data of real world urban areas and synthesis algorithm to provide several high level operations to easily and interactively generate complex layouts by example the user can create new urban layouts by sequence of operations such as join expand and blend without being concerned about low level structural details further the ability to blend example urban layout fragments provides powerful way to generate new synthetic content we demonstrate our system by creating urban layouts using example fragments from several real world cities each ranging from hundreds to thousands of city blocks and parcels
dimensionality reduction is an essential data preprocessing technique for large scale and streaming data classification tasks it can be used to improve both the efficiency and the effectiveness of classifiers traditional dimensionality reduction approaches fall into two categories feature extraction and feature selection techniques in the feature extraction category are typically more effective than those in feature selection category however they may break down when processing large scale data sets or data streams due to their high computational complexities similarly the solutions provided by the feature selection approaches are mostly solved by greedy strategies and hence are not ensured to be optimal according to optimized criteria in this paper we give an overview of the popularly used feature extraction and selection algorithms under unified framework moreover we propose two novel dimensionality reduction algorithms based on the orthogonal centroid algorithm oc the first is an incremental oc ioc algorithm for feature extraction the second algorithm is an orthogonal centroid feature selection ocfs method which can provide optimal solutions according to the oc criterion both are designed under the same optimization criterion experiments on reuters corpus volume data set and some public large scale text data sets indicate that the two algorithms are favorable in terms of their effectiveness and efficiency when compared with other state of the art algorithms
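A compact sketch of orthogonal-centroid-style feature selection: each feature is scored by the class-size-weighted squared deviation of the class centroids from the global centroid, and the top-k features are kept. This follows the usual statement of the OC criterion; the published IOC/OCFS algorithms may differ in details.

```python
import numpy as np

def ocfs_scores(X, y):
    """Orthogonal-centroid-style feature scores.

    score_j = sum_k (n_k / n) * (m_kj - m_j)^2
    where m_k is the centroid of class k and m the global centroid.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, _ = X.shape
    m = X.mean(axis=0)
    scores = np.zeros(X.shape[1])
    for cls in np.unique(y):
        Xk = X[y == cls]
        scores += (len(Xk) / n) * (Xk.mean(axis=0) - m) ** 2
    return scores

def select_features(X, y, k):
    """Return the indices of the k highest-scoring features."""
    return np.argsort(ocfs_scores(X, y))[::-1][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = rng.integers(0, 2, size=200)
    X[y == 1, 3] += 2.0  # make feature 3 discriminative
    print(select_features(X, y, 3))
```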
most large public displays have been used for providing information to passers by with the primary purpose of acting as one way information channels to individual users we have developed large public display to which users can send their own media content using mobile devices the display supports multi touch interaction thus enabling collaborative use of the display this display called citywall was set up in city center with the goal of showing information of events happening in the city we observed two user groups who used mobile phones with upload capability during two large scale events happening in the city our findings are that this kind of combined use of personal mobile devices and large public display as publishing forum used collaboratively with other users creates unique setting that extends the group’s feeling of participation in the events we substantiate this claim with examples from user data
almost all semantics for logic programs with negation identify set sem of models of program as the intended semantics of and any model in this class is considered possible meaning of with regard to the semantics the user has in mind thus for example in the case of stable models choice models answer sets etc different possible models correspond to different ways of completing the incomplete information in the logic program however different end users may have different ideas on which of these different models in sem is reasonable one from their point of view for instance given sem user may prefer model in sem to model in sem based on some evaluation criterion that she has in this paper we develop logic program semantics based on optimal models this semantics does not add yet another semantics to the logic programming arena it takes as input an existing semantics sem and user specified objective function obj and yields new semantics opt ⊆ sem that realizes the objective function within the framework of preferred models identified already by sem thus the user who may or may not know anything about logic programming has considerable flexibility in making the system reflect her own objectives by building on top of existing semantics known to the system in addition to the declarative semantics we provide complete complexity analysis and algorithms to compute optimal models under varied conditions when sem is the stable model semantics the minimal models semantics and the all models semantics
the growing nature of databases and the flexibility inherent in the sql query language that allows arbitrarily complex formulations can result in queries that take inordinate amount of time to complete to mitigate this problem strategies that are optimized to return the first few rows or top rows in case of sorted results are usually employed however both these strategies can lead to unpredictable query processing times thus in this paper we propose supporting time constrained sql queries specifically user issues sql query as before but additionally provides nature of constraint soft or hard an upper bound for query processing time and acceptable nature of results partial or approximate the dbms takes the criteria constraint type time limit quality of result into account in generating the query execution plan which is expected guaranteed to complete in the allocated time for soft hard time constraint if partial results are acceptable then the technique of reducing result set cardinality ie returning first few or top rows is used whereas if approximate results are acceptable then sampling is used to compute query results within the specified time limit for the latter case we argue that trading off quality of results for predictable response time is quite useful however for this case we provide additional aggregate functions to estimate the aggregate values and to compute the associated confidence interval this paper presents the notion of time constrained sql queries discusses the challenges in supporting such construct describes framework for supporting such queries and outlines its implementation in oracle database by exploiting oracle’s cost based optimizer and extensibility capabilities
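To make the approximate-result option concrete, here is a generic sampling sketch, not tied to Oracle or to the paper's framework, that estimates a SUM aggregate from a uniform random sample and reports a normal-approximation confidence interval; the sample fraction stands in for whatever sample size fits the time budget.

```python
import math
import random

def approximate_sum(values, sample_fraction=0.05, z=1.96, seed=0):
    """Estimate SUM(values) from a uniform random sample, with a confidence
    interval based on the normal approximation (finite-population correction
    is ignored for simplicity)."""
    n = len(values)
    k = max(1, int(n * sample_fraction))
    sample = random.Random(seed).sample(values, k)
    mean = sum(sample) / k
    var = sum((x - mean) ** 2 for x in sample) / (k - 1) if k > 1 else 0.0
    est = n * mean
    half_width = z * n * math.sqrt(var / k)
    return est, (est - half_width, est + half_width)

if __name__ == "__main__":
    data = [float(i % 100) for i in range(1_000_000)]
    est, ci = approximate_sum(data, 0.01)
    print(f"estimated sum ~ {est:,.0f}, 95% CI {ci}, true sum {sum(data):,.0f}")
```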
scenarios have been advocated as means of improving requirements engineering yet few methods or tools exist to support scenario based re the paper reports method and software assistant tool for scenario based re that integrates with use case approaches to object oriented development the method and operation of the tool are illustrated with financial system case study scenarios are used to represent paths of possible behavior through use case and these are investigated to elaborate requirements the method commences by acquisition and modeling of use case the use case is then compared with library of abstract models that represent different application classes each model is associated with set of generic requirements for its class hence by identifying the class es to which the use case belongs generic requirements can be reused scenario paths are automatically generated from use cases then exception types are applied to normal event sequences to suggest possible abnormal events resulting from human error generic requirements are also attached to exceptions to suggest possible ways of dealing with human error and other types of system failure scenarios are validated by rule based frames which detect problematic event patterns the tool suggests appropriate generic requirements to deal with the problems encountered the paper concludes with review of related work and discussion of the prospects for scenario based re methods and tools
the new hybrid clone detection tool nicad combines the strengths and overcomes the limitations of both text based and ast based clone detection techniques and exploits novel applications of source transformation system to yield highly accurate identification of cloned code in software systems in this paper we present an in depth study of near miss function clones in open source software using nicad we examine more than open source java and num systems of varying kinds and sizes including the entire linux kernel apache httpd jsdk swing and dbo and compare their use of cloned code in several different dimensions including language clone size clone similarity clone location and clone density both by proportion of cloned functions and lines of cloned code we manually verify all detected clones and provide complete catalogue of different clones in an online repository in variety of formats our studies show that there are large number of near miss function clones in those systems and these validated results can be used as cloning reference for these systems and as benchmark for evaluating other clone detection tools
mobile agents are becoming increasingly important in the highly distributed applications frameworks seen today their routing dispatching from node to node is very important issue as we need to safeguard application efficiency achieve better load balancing and resource utilization throughout the underlying network selecting the best target server for dispatching mobile agent is therefore multi faceted problem that needs to be carefully tackled in this paper we propose distributed adaptive routing schemes next node selection for mobile agents the proposed schemes overcome risks like load oscillations ie agents simultaneously abandoning congested node in search for other less saturated node we try to induce different routing decisions taken by agents to achieve load balancing and better utilization of network resources we consider five different algorithms and evaluate them through simulations our findings are quite promising both from the user application and the network infrastructure perspective
the evolution of geographic phenomena has been one of the concerns of spatiotemporal database research however in large spectrum of geographical applications users need more than mere representation of data evolution for instance in urban management applications eg cadastral evolution users often need to know why how and by whom certain changes have been performed as well as their possible impact on the environment answers to such queries are not possible unless supplementary information concerning real world events is associated with the corresponding changes in the database and is managed efficiently this paper proposes solution to this problem which is based on extending spatiotemporal database with mechanism for managing documentation on the evolution of geographic information this solution has been implemented in gis based prototype which is also discussed in the paper
we describe an efficient top down strategy for overlap removal and floorplan repair which repairs overlaps in floorplans produced by placement algorithms or rough floorplanning methodologies the algorithmic framework that we propose incorporates novel geometric shifting technique within top down flow the effect of our algorithm is quantified across broad range of floorplans produced by multiple tools our method succeeds in producing valid placements in almost all cases moreover compared to leading methods it requires only one fifth the run time and produces placements with to less hpwl and up to less cell movement
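As a toy illustration of the shifting idea, the sketch below removes horizontal overlaps within a single row by pushing blocks rightward in x-order; the paper's technique operates top-down on a full two-dimensional floorplan, so this shows only a one-dimensional core, and the displacement metric is a crude stand-in for cell movement.

```python
def shift_row(blocks):
    """Remove overlaps among (x, width) blocks in a single row by shifting right.

    Blocks are processed in order of their current x coordinate; each block is
    pushed just past the previous one when they overlap. Returns new x positions
    and the total displacement.
    """
    order = sorted(range(len(blocks)), key=lambda i: blocks[i][0])
    new_x = {}
    frontier = float("-inf")
    moved = 0.0
    for i in order:
        x, w = blocks[i]
        nx = max(x, frontier)
        new_x[i] = nx
        moved += nx - x
        frontier = nx + w
    return new_x, moved

if __name__ == "__main__":
    row = [(0, 4), (3, 2), (10, 5), (12, 3)]  # (x, width); some blocks overlap
    positions, total_move = shift_row(row)
    print(positions, "total displacement:", total_move)
```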
in this paper we describe conceptual framework and address the related issues and solutions in the identification of three major challenges for the development and evaluation of immersive digital educational games idegs these challenges are i advancing adaptive educational technologies to shape learning experience ensuring the individualization of learning experiences adaptation to personal aims needs abilities and prerequisites ii providing technological approaches to reduce the development costs for idegs by enabling the creation of entirely different stories and games for variety of different learning domains each based more or less on the same pool of story units patterns and structures iii developing robust evaluation methodologies for idegs by the extension of iso to include user satisfaction motivation and learning progress and other user experience ux attributes while our research and development is by no means concluded we believe that we have arrived at stage where conclusions may be drawn which will be of considerable use to other researchers in this domain
recently lot of work has been done on formalization of business process specification in particular using petri nets and process algebra however these efforts usually do not explicitly address complex business process development which necessitates the specification coordination and synchronization of large number of business steps it is imperative that these atomic tasks are associated correctly and monitored for countless dependencies moreover as these business processes grow they become critically reliant on large number of split and merge points which additionally increases modeling complexity therefore one of the central challenges in complex business process modeling is the composition of dependent business steps we address this challenge and introduce formally correct method for automated composition of algebraic expressions in complex business process modeling based on acyclic directed graph reductions we show that our method generates an equivalent algebraic expression from an appropriate acyclic directed graph if the graph is well formed and series parallel additionally we encapsulate the reductions in an algorithm that transforms business step dependencies described by users into digraphs recognizes structural conflicts identifies wheatstone bridges and finally generates algebraic expressions
to narrow the semantic gap in content based image retrieval cbir relevance feedback is utilized to explore knowledge about the user’s intention in finding target image or image category users provide feedback by marking images returned in response to query image as relevant or irrelevant existing research explores such feedback to refine querying process select features or learn image classifier however the vast amount of unlabeled images is ignored and often substantially limited examples are engaged into learning in this paper we address the two issues and propose novel effective method called relevance aggregation projections rap for learning potent subspace projections in semi supervised way given relevances and irrelevances specified in the feedback rap produces subspace within which the relevant examples are aggregated into single point and the irrelevant examples are simultaneously separated by large margin regarding the query plus its feedback samples as labeled data and the remainder as unlabeled data rap falls in special paradigm of imbalanced semi supervised learning through coupling the idea of relevance aggregation with semi supervised learning we formulate constrained quadratic optimization problem to learn the subspace projections which entail semantic mining and therefore make the underlying cbir system respond to the user’s interest accurately and promptly experiments conducted over large generic image database show that our subspace approach outperforms existing subspace methods for cbir even with few iterations of user feedback
applications that analyze mine and visualize large datasets are considered an important class of applications in many areas of science engineering and business queries commonly executed in data analysis applications often involve user defined processing of data and application specific data structures if data analysis is employed in collaborative environment the data server should execute multiple such queries simultaneously to minimize the response time to clients in this paper we present the design of runtime system for executing multiple query workloads on shared memory machine we describe experimental results using an application for browsing digitized microscopy images
in recent years classification learning for data streams has become an important and active research topic major challenge posed by data streams is that their underlying concepts can change over time which requires current classifiers to be revised accordingly and timely to detect concept change common methodology is to observe the online classification accuracy if accuracy drops below some threshold value concept change is deemed to have taken place an implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as symptom of concept change unfortunately however this assumption is often violated in the real world where data streams carry noise that can also introduce significant reduction in classification accuracy to compound this problem traditional noise cleansing methods are incompetent for data streams those methods normally need to scan data multiple times whereas learning for data streams can only afford one pass scan because of data’s high speed and huge volume another open problem in data stream classification is how to deal with missing values when new instances containing missing values arrive how learning model classifies them and how the learning model updates itself according to them is an issue whose solution is far from being explored to solve these problems this paper proposes novel classification algorithm flexible decision tree flexdt which extends fuzzy logic to data stream classification the advantages are three fold first flexdt offers flexible structure to effectively and efficiently handle concept change second flexdt is robust to noise hence it can prevent noise from interfering with classification accuracy and accuracy drop can be safely attributed to concept change third it deals with missing values in an elegant way extensive evaluations are conducted to compare flexdt with representative existing data stream classification algorithms using large suite of data streams and various statistical tests experimental results suggest that flexdt offers significant benefit to data stream classification in real world scenarios where concept change noise and missing values coexist
as more and more human motion data are becoming widely used to animate computer graphics figures in many applications the growing need for compact storage and fast transmission makes it imperative to compress motion data we propose data driven method for efficient compression of human motion sequences by exploiting both spatial and temporal coherences of the data we first segment motion sequence into subsequences such that the poses within subsequence lie near low dimensional linear space we then compress each segment using principal component analysis our method achieves further compression by storing only the key frames projections to the principal component space and interpolating the other frames in between via spline functions the experimental results show that our method can achieve significant compression rate with low reconstruction errors
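A minimal sketch of the compression pipeline for one pre-segmented motion clip, assuming frames are rows of joint-angle values: PCA via SVD, key-frame projections only are stored, and (for simplicity) linear rather than spline interpolation is used between key frames.

```python
import numpy as np

def compress_segment(frames, n_components, key_step=5):
    """Compress one motion segment: project frames onto the top principal
    components and keep only every key_step-th projected frame as a key frame."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]            # (k, dof) principal directions
    coeffs = centered @ basis.T          # (T, k) projections
    key_idx = np.arange(0, len(frames), key_step)
    if key_idx[-1] != len(frames) - 1:
        key_idx = np.append(key_idx, len(frames) - 1)
    return {"mean": mean, "basis": basis, "key_idx": key_idx,
            "key_coeffs": coeffs[key_idx]}

def decompress_segment(c, n_frames):
    """Reconstruct frames by (linearly) interpolating the key-frame projections."""
    t = np.arange(n_frames)
    coeffs = np.column_stack([
        np.interp(t, c["key_idx"], c["key_coeffs"][:, j])
        for j in range(c["key_coeffs"].shape[1])
    ])
    return coeffs @ c["basis"] + c["mean"]

if __name__ == "__main__":
    T, dof = 120, 60  # 120 frames, 60 joint angles
    t = np.linspace(0, 4 * np.pi, T)[:, None]
    motion = np.sin(t + np.linspace(0, 1, dof)) + 0.01 * np.random.randn(T, dof)
    c = compress_segment(motion, n_components=4, key_step=6)
    err = np.abs(decompress_segment(c, T) - motion).mean()
    print("mean reconstruction error:", round(float(err), 4))
```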
wireless mesh networks wmns can provide seamless broadband connectivity to network users with low setup and maintenance costs to support next generation applications with real time requirements however these networks must provide improved quality of service guarantees current mesh protocols use techniques that fail to accurately predict the performance of end to end paths and do not optimize performance based on knowledge of mesh network structures in this paper we propose quorum routing protocol optimized for wmns that provides accurate qos properties by correctly predicting delay and loss characteristics of data traffic quorum integrates novel end to end packet delay estimation mechanism with stability aware routing policies allowing it to more accurately follow qos requirements while minimizing misbehavior of selfish nodes
this paper presents method for acquiring concession strategy of an agent in multi issue negotiation this method learns how to make concession to an opponent for realizing win win negotiation to learn the concession strategy we adopt reinforcement learning first an agent receives proposal from an opponent the agent recognizes negotiation state using the difference between their proposals and the difference between their concessions according to the state the agent makes proposal by reinforcement learning reward of the learning is profit of an agreement and punishment of negotiation breakdown the experimental results showed that the agents could acquire the negotiation strategy that avoids negotiation breakdown and increases profits of an agreement as result agents can acquire the action policy that strikes balance between cooperation and competition
the capabilities of current mobile devices especially pdas are making it possible to design and develop mobile applications that employ visual techniques for using geographic data in the field these applications can be extremely useful in areas as diverse as tourism business natural resources management and homeland security in this paper we present system aimed at supporting users in the exploratory analysis of geographic data on pdas through highly interactive interface based on visual dynamic queries we propose alternative visualizations to display query results and present an experimental evaluation aimed at comparing their effectiveness on pda in tourist scenario our findings provide an experimental confirmation of the unsuitability of the typical visualization employed by classic dynamic query systems which displays only those results that fully satisfy query in those cases where only sub optimal results are obtainable for such cases the results of our study highlight the usefulness of visualizations that display all results and their degree of satisfaction of the query
the problem confronted in the content based image retrieval research is the semantic gap between the low level feature representation and high level semantics in the images this paper describes way to bridge such gap by learning from the similar images given by the user the system extracts the similar region pairs and classifies those similar region pairs either as object or non object semantics and either as object relation or non object relation semantics automatically which are obtained from comparing the distances and spatial relationships in the similar region pairs by themselves the system also extracts interesting parts of the features from the similar region pair and then adjusts each interesting feature and region pair weight dynamically using those object and object relation semantics as well as the dynamic weights adjustment from the similar images the semantics of those similar images can be mined and used for searching the similar images the experiments show that the proposed system can retrieve similar images well and efficiently
in this paper some studies have been made on the essence of fuzzy linear discriminant analysis lda algorithm and fuzzy support vector machine fsvm classifier respectively as kernel based learning machine fsvm is represented with the fuzzy membership function while realizing the same classification results with that of the conventional pair wise classification it outperforms other learning machines especially when unclassifiable regions still remain in those conventional classifiers however serious drawback of fsvm is that the computation requirement increases rapidly with the increase of the number of classes and training sample size to address this problem an improved fsvm method that combines the advantages of fsvm and decision tree called dt fsvm is proposed firstly furthermore in the process of feature extraction reformative lda algorithm based on the fuzzy nearest neighbors fknn is implemented to achieve the distribution information of each original sample represented with fuzzy membership grade which is incorporated into the redefinition of the scatter matrices in particular considering the fact that the outlier samples in the patterns may have some adverse influence on the classification result we developed novel lda algorithm using relaxed normalized condition in the definition of fuzzy membership function thus the classification limitation from the outlier samples is effectively alleviated finally by making full use of the fuzzy set theory complete lda cf lda framework is developed by combining the reformative lda rf lda feature extraction method and dt fsvm classifier this hybrid fuzzy algorithm is applied to the face recognition problem extensive experimental studies conducted on the orl and nust face images databases demonstrate the effectiveness of the proposed algorithm
the region analysis of tofte and talpin is an attempt to determine statically the life span of dynamically allocated objects but the calculus is at once intuitively simple yet deceptively subtle and previous theoretical analyses have been frustratingly complex no analysis has revealed and explained in simple terms the connection between the subtleties of the calculus and the imperative features it builds on we present novel approach for proving safety and correctness of simplified version of the region calculus we give stratified operational semantics composed of high level semantics dealing with the conceptual difficulties of effect annotations and low level one with explicit operations on region indexed store the main results of the paper are proof simpler than previous ones and modular approach to type safety and correctness the flexibility of this approach is demonstrated by the simplicity of the extension to the full calculus with type and region polymorphism
ontologies make it possible to directly encode domain knowledge in software applications so ontology based systems can exploit the meaning of information for providing advanced and intelligent functionalities one of the most interesting and promising applications of ontologies is information extraction from unstructured documents in this area the extraction of meaningful information from pdf documents has been recently recognized as an important and challenging problem this paper proposes an ontology based information extraction system for pdf documents founded on well suited knowledge representation approach named self populating ontology spo the spo approach combines object oriented logic based features with formal grammar capabilities and allows expressing knowledge in term of ontology schemas instances and extraction rules called descriptors aimed at extracting information having also tabular form the novel aspect of the spo approach is that it allows representing ontologies enriched by rules that enable them to populate themselves with instances extracted from unstructured pdf documents in the paper the tractability of the spo approach is proven moreover features and behavior of the prototypical implementation of the spo system are illustrated by means of running example
using wireless peer to peer interactions between portable devices it is possible to locally share information and maintain spatial temporal knowledge emanating from the surroundings we consider the prospects for unleashing ambient data from the surrounding environment for information provision using two biological phenomena human mobility and human social interaction this leads to analogies with epidemiology and is highly relevant to future technology rich environments here embedded devices in the physical environment such as sensors and wireless enabled appliances represent information sources that can provide extensive situated information in this paper we address candidate scenario where isolated sensors in the environment provide real time data from fixed locations using simulation we examine what happens when information is greedily acquired and shared by mobile participants through peer to peer interaction this is assessed taking into account availability of source nodes and the effects of mobility with respect to temporal accuracy of information the results reaffirm the need to consider range of mobility models in testing and validating protocols
we study adding aggregate operators such as summing up elements of column of relation to logics with counting mechanisms the primary motivation comes from database applications where aggregate operators are present in all real life query languages unlike other features of query languages aggregates are not adequately captured by the existing logical formalisms consequently all previous approaches to analyzing the expressive power of aggregation were only capable of producing partial results depending on the allowed class of aggregate and arithmetic operations we consider powerful counting logic and extend it with the set of all aggregate operators we show that the resulting logic satisfies analogs of hanf’s and gaifman’s theorems meaning that it can only express local properties we consider database query language that expresses all the standard aggregates found in commercial query languages and show how it can be translated into the aggregate logic thereby providing number of expressivity bounds that do not depend on particular class of arithmetic functions and that subsume all those previously known we consider restricted aggregate logic that gives us tighter capture of database languages and also use it to show that some questions on expressivity of aggregation cannot be answered without resolving some deep problems in complexity theory
while there have been advances in visualization systems particularly in multi view visualizations and visual exploration the process of building visualizations remains major bottleneck in data exploration we show that provenance metadata collected during the creation of pipelines can be reused to suggest similar content in related visualizations and guide semi automated changes we introduce the idea of query by example in the context of an ensemble of visualizations and the use of analogies as first class operations in system to guide scalable interactions we describe an implementation of these techniques in vistrails publicly available open source system
since the first results published in by liu and layland on the rate monotonic rm and earliest deadline first edf algorithms lot of progress has been made in the schedulability analysis of periodic task sets unfortunately many misconceptions still exist about the properties of these two scheduling methods which usually tend to favor rm more than edf typical wrong statements often heard in technical conferences and even in research papers claim that rm is easier to analyze than edf it introduces less runtime overhead it is more predictable in overload conditions and causes less jitter in task execution since the above statements are either wrong or not precise it is time to clarify these issues in systematic fashion because the use of edf allows better exploitation of the available resources and significantly improves system’s performance this paper compares rm against edf under several aspects using existing theoretical results specific simulation experiments or simple counterexamples to show that many common beliefs are either false or only restricted to specific situations
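For reference, the two classical utilization-based schedulability tests being compared here can be stated in a few lines for implicit-deadline periodic tasks; note that the RM bound is sufficient only, while the EDF bound is exact.

```python
def rm_utilization_test(tasks):
    """Sufficient (not necessary) RM schedulability test for implicit-deadline
    periodic tasks (Liu & Layland): U <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

def edf_utilization_test(tasks):
    """Exact EDF test for implicit-deadline periodic tasks: U <= 1."""
    return sum(c / t for c, t in tasks) <= 1.0

if __name__ == "__main__":
    # (execution time, period) pairs
    tasks = [(2, 5), (2, 7), (1, 10)]  # U ~ 0.786
    print("RM bound passes: ", rm_utilization_test(tasks))   # False: 0.786 > 0.780
    print("EDF bound passes:", edf_utilization_test(tasks))  # True
```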
we present methodology for data warehouse design and its application within the telecom italia information system the methodology is based on conceptual representation of the enterprise which is exploited both in the integration phase of the warehouse information sources and during the knowledge discovery activity on the information stored in the warehouse the application of the methodology in the telecom italia framework has been supported by prototype software tools both for conceptual modeling and for data integration and reconciliation
sprint is middleware infrastructure for high performance and high availability data management it extends the functionality of standalone in memory database imdb server to cluster of commodity shared nothing servers applications accessing an imdb are typically limited by the memory capacity of the machine running the imdb sprint partitions and replicates the database into segments and stores them in several data servers applications are then limited by the aggregated memory of the machines in the cluster transaction synchronization and commitment rely on total order multicast differently from previous approaches sprint does not require accurate failure detection to ensure strong consistency allowing fast reaction to failures experiments conducted on cluster with data servers using tpc and micro benchmark showed that sprint can provide very good performance and scalability
the effort in software process support has focused so far on modeling and enacting processes certain amount of work has been done but little has reached satisfactory level of maturity and acceptance in our opinion this is due to the difficulty for system to accommodate the very numerous aspects involved in software processes complete process support should cover topics ranging from low level tasks like compiling to organizational and strategic tasks this includes process enhancement resource management and control cooperative work etc the environment must also be convenient for software engineers team leaders managers and so on it must be able to describe details for efficient execution and be high level for capturing understanding etc as matter of fact the few tools that have reached sufficient maturity have focussed on single topic and addressed single class of users it is our claim that no single system can provide satisfactory solution except in clearly defined subdomain thus we shifted our attention from finding the universal system to finding ways to make many different systems cooperate with their associated formalisms and process engines this paper presents novel approach for software process support environments based on federation of heterogeneous and autonomous components the approach has been implemented and experimented in the apel environment it is shown which architecture and technology is involved how it works which interoperability paradigms have been used which problems we have solved and which issues are still under study
as the internet grows in size it becomes crucial to understand how the speeds of links in the network must improve in order to sustain the pressure of new end nodes being added each day although the speeds of links in the core and at the edges improve roughly according to moore’s law this improvement alone might not be enough indeed the structure of the internet graph and routing in the network might necessitate much faster improvements in the speeds of key links in the network in this paper using combination of analysis and extensive simulations we show that the worst congestion in the internet as level graph in fact scales poorly with the network size growing roughly as n^(1+Ω(1)) where n is the number of nodes when shortest path routing is used to route traffic between ases we also show somewhat surprisingly that policy based routing does not exacerbate the maximum congestion when compared to shortest path routing our results show that it is crucial to identify ways to alleviate this congestion to avoid some links from being perpetually congested to this end we show that the congestion scaling properties of internet like graphs can be improved dramatically by introducing moderate amounts of redundancy in the graph in terms of parallel edges between pairs of adjacent nodes
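A small self-contained sketch of how such congestion can be measured on a toy graph: route one unit of traffic between every node pair along a BFS shortest path and report the most loaded edge. The AS-level power-law structure and policy routing studied in the paper are not modelled here.

```python
from collections import defaultdict, deque

def bfs_parents(adj, src):
    """BFS from src, recording one parent per node (one shortest-path tree)."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return parent

def worst_edge_congestion(adj):
    """Route one unit of traffic between every ordered node pair along a BFS
    shortest path and return the load on the most congested edge."""
    load = defaultdict(int)
    for src in adj:
        parent = bfs_parents(adj, src)
        for dst in adj:
            if dst == src or dst not in parent:
                continue
            v = dst
            while parent[v] is not None:
                u = parent[v]
                load[frozenset((u, v))] += 1
                v = u
    return max(load.values()) if load else 0

if __name__ == "__main__":
    # a small hub-and-spokes graph: the hub's edges carry almost all pairs
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print("max edge load:", worst_edge_congestion(adj))
```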
challenge involved in applying density based clustering to categorical biomedical data is that the cube of attribute values has no ordering defined making the search for dense subspaces slow we propose the hierdenc algorithm for hierarchical density based clustering of categorical data and complementary index for searching for dense subspaces efficiently the hierdenc index is updated when new objects are introduced such that clustering does not need to be repeated on all objects the updating and cluster retrieval are efficient comparisons with several other clustering algorithms showed that on large datasets hierdenc achieved better runtime scalability on the number of objects as well as cluster quality by fast collapsing the bicliques in large networks we achieved an edge reduction of as much as hierdenc is suitable for large and quickly growing datasets since it is independent of object ordering does not require re clustering when new data emerges and requires no user specified input parameters
this paper surveys how the maximum adjacency ma ordering of the vertices in graph can be used to solve various graph problems we first explain that the minimum cut problem can be solved efficiently by utilizing the ma ordering the idea is then extended to fundamental operation of graph edge splitting based on this the edge connectivity augmentation problem for given and also for the entire range of can be solved efficiently by making use of the ma ordering where it is asked to add the smallest number of new edges to given graph so that its edge connectivity is increased to other related topics are also surveyed
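A compact illustration of how an MA ordering yields a global minimum cut (the Nagamochi-Ibaraki / Stoer-Wagner scheme): repeatedly compute an MA ordering, record the cut of the phase, and contract the last two vertices. This covers only the minimum-cut use of MA orderings, not edge splitting or connectivity augmentation.

```python
def minimum_cut(weights):
    """Global minimum cut of an undirected weighted graph via maximum-adjacency
    (MA) orderings. weights[u][v] == weights[v][u] is the edge weight.
    Returns the weight of a minimum cut."""
    graph = {u: dict(nbrs) for u, nbrs in weights.items()}  # contractible copy
    best = float("inf")
    while len(graph) > 1:
        # one MA ordering ("minimum cut phase")
        nodes = list(graph)
        start = nodes[0]
        in_order = [start]
        attach = {v: graph[start].get(v, 0) for v in graph if v != start}
        while attach:
            # next vertex: the one most strongly connected to the ordered set
            u = max(attach, key=attach.get)
            cut_of_phase = attach.pop(u)
            in_order.append(u)
            for v, w in graph[u].items():
                if v in attach:
                    attach[v] += w
        best = min(best, cut_of_phase)
        # contract the last two vertices of the ordering
        s, t = in_order[-2], in_order[-1]
        for v, w in graph[t].items():
            if v == s:
                continue
            graph[s][v] = graph[s].get(v, 0) + w
            graph[v][s] = graph[s][v]
            del graph[v][t]
        graph[s].pop(t, None)
        del graph[t]
    return best

if __name__ == "__main__":
    w = {
        1: {2: 2, 5: 3},
        2: {1: 2, 3: 3, 5: 2, 6: 2},
        3: {2: 3, 4: 4, 7: 2},
        4: {3: 4, 7: 2, 8: 2},
        5: {1: 3, 2: 2, 6: 3},
        6: {2: 2, 5: 3, 7: 1},
        7: {3: 2, 4: 2, 6: 1, 8: 3},
        8: {4: 2, 7: 3},
    }
    print("minimum cut weight:", minimum_cut(w))  # 4 for this classic example
```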
this paper examines algorithmic aspects of searching for approximate functional dependencies in database relation the goal is to avoid exploration of large parts of the space of potential rules this is accomplished by leveraging found rules to make finding other rules more efficient the overall strategy is an attribute at time iteration which uses local breadth first searches on lattices that increase in width and height in each iteration the resulting algorithm provides many opportunities to apply heuristics to tune the search for particular data sets and or search objectives the search can be tuned at both the global iteration level and the local search level number of heuristics are developed and compared experimentally
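As a concrete building block, the snippet below computes the standard g3 error of a candidate approximate functional dependency (the minimum fraction of rows that must be removed for the dependency to hold exactly). The paper's contribution is the search strategy over candidate dependencies, which is not implemented here, and the choice of g3 as the error measure is an assumption.

```python
from collections import defaultdict

def g3_error(rows, lhs, rhs):
    """g3 approximation error of the functional dependency lhs -> rhs.

    rows is a list of dicts; lhs is a tuple of attribute names, rhs a single
    attribute. Returns the minimum fraction of rows to delete so the FD holds.
    """
    groups = defaultdict(lambda: defaultdict(int))
    for r in rows:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1.0 - kept / len(rows)

if __name__ == "__main__":
    data = [
        {"city": "oslo", "country": "norway"},
        {"city": "oslo", "country": "norway"},
        {"city": "oslo", "country": "sweden"},   # one violating row
        {"city": "bergen", "country": "norway"},
    ]
    print(g3_error(data, ("city",), "country"))  # 0.25
```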
current recommender systems attempt to identify appealing items for user by applying syntactic matching techniques which suffer from significant limitations that reduce the quality of the offered suggestions to overcome this drawback we have developed domain independent personalization strategy that borrows reasoning techniques from the semantic web elaborating recommendations based on the semantic relationships inferred between the user’s preferences and the available items our reasoning based approach improves the quality of the suggestions offered by the current personalization approaches and greatly reduces their most severe limitations to validate these claims we have carried out case study in the digital tv field in which our strategy selects tv programs interesting for the viewers from among the myriad of contents available in the digital streams our experimental evaluation compares the traditional approaches with our proposal in terms of both the number of tv programs suggested and the users perception of the recommendations finally we discuss concerns related to computational feasibility and scalability of our approach
despite the effectiveness of search engines the persistently increasing amount of web data continuously obscures the search task efforts have thus concentrated on personalized search that takes account of user preferences new concept is introduced towards this direction search based on ranking of local set of categories that comprise user search profile new algorithms are presented that utilize web page categories to personalize search results series of user based experiments show that the proposed solutions are efficient finally we extend the application of our techniques in the design of topic focused crawlers which can be considered an alternative personalized search
we discuss some basic issues of interactive computations in the framework of rough granular computing among these issues are hierarchical modeling of granule structures and interactions between granules of different complexity interactions between granules on which computations are performed are among the fundamental concepts of wisdom technology wistech wistech is encompassing such areas as interactive computations multiagent systems cognitive computation natural computing complex adaptive and autonomous systems or knowledge representation and reasoning about knowledge
the tremendous growth of system memories has increased the capacities and capabilities of memory resident embedded databases yet current embedded databases need to be tuned in order to take advantage of new memory technologies in this paper we study the implications of hosting memory resident databases and propose hardware and software query driven techniques to improve their performance and energy consumption we exploit the structured organization of memories which enables selective mode of operation in which banks are accessed selectively unused banks are placed in lower power mode based on access pattern information we propose hardware techniques that dynamically control the memory by making the system adapt to the access patterns that arise from queries we also propose software query directed scheme that directly modifies the queries to reduce the energy consumption by ensuring uniform bank accesses our results show that these optimizations could lead to at the least reduction in memory energy we also show that query directed schemes better utilize the low power modes achieving up to improvement
modern database systems provide not only powerful data models but also complex query languages supporting powerful features such as the ability to create new database objects and invocation of arbitrary methods possibly written in third party programming language in this sense query languages have evolved into powerful programming languages surprisingly little work exists utilizing techniques from programming language research to specify and analyse these query languages this paper provides formal high level operational semantics for complex value oql like query language that can create fresh database objects and invoke external methods we define type system for our query language and prove an important soundness property we define simple effect typing discipline to delimit the computational effects within our queries we prove that this effect system is correct and show how it can be used to detect cases of non determinism and to define correct query optimizations
we present practical technique for pointing and selection using combination of eye gaze and keyboard triggers eyepoint uses two step progressive refinement process fluidly stitched together in look press look release action which makes it possible to compensate for the accuracy limitations of the current state of the art eye gaze trackers while research in gaze based pointing has traditionally focused on disabled users eyepoint makes gaze based pointing effective and simple enough for even able bodied users to use for their everyday computing tasks as the cost of eye gaze tracking devices decreases it will become possible for such gaze based techniques to be used as viable alternative for users who choose not to use mouse depending on their abilities tasks and preferences
there has been tremendous growth in the amount and range of information available on the internet the users requests for online information can be captured by long tail model few popular websites enjoy high number of visitations while the majority of the rest are less frequently requested in this study we use real world data to investigate this phenomenon and show that both users physical location and time of access affect the heterogeneity of website requests the effect can partially be explained by differences in demographic characteristics at locations and diverse user browsing behavior in weekdays and weekends these results can be used to design better online marketing strategies affiliate advertising models and internet caching algorithms with sensitivities to user location and time of access differences
leakage energy reduction for caches has been the target of many recent research efforts in this work we propose novel compiler directed approach to reduce the data cache leakage energy by exploiting the program behavior the proposed approach is based on the observation that only small portion of the data are active at runtime and the program spends lot of time in loops so large portion of data cache lines which are not accessed by the loop can be placed into the leakage control mode to reduce leakage energy consumption the compiler directed approach does not require hardware counters to monitor the access patterns of the cache lines and it is adaptive to the program behavior the experimental results show that the compiler directed approach is very competitive in terms of energy consumption and energy delay product compared to the recently proposed pure hardware based approach we also show that the utilization of loop transformations can increase the effectiveness of our strategy
with an increasing use of data mining tools and techniques we envision that knowledge discovery and data mining system kddms will have to support and optimize for the following scenarios sequence of queries user may analyze one or more datasets by issuing sequence of related complex mining queries and multiple simultaneous queries several users may be analyzing set of datasets concurrently and may issue related complex queries this paper presents systematic mechanism to optimize for the above cases targeting the class of mining queries involving frequent pattern mining on one or multiple datasets we present system architecture and propose new algorithms to simultaneously optimize multiple such queries and use knowledgeable cache to store and utilize the past query results we have implemented and evaluated our system with both real and synthetic datasets our experimental results show that our techniques can achieve speedup of up to factor of compared with the systems which do not support caching or optimize for multiple queries
collaborative brainstorming can be challenging but important part of creative group problem solving mind mapping has the potential to enhance the brainstorming process but has its own challenges when used in group we introduce groupmind collaborative mind mapping tool that addresses these challenges and opens new opportunities for creative teamwork including brainstorming we present semi controlled evaluation of groupmind and its impact on teamwork problem solving and collaboration for brainstorming activities groupmind performs better than using traditional whiteboard in both interaction group and nominal group settings for the task involving memory recall the hierarchical mind map structure also imposes important framing effects on group dynamics and idea organization during the brainstorming process we also present design ideas to assist in the development of future tools to support creative problem solving in groups
new paradigm for mobile service chain’s competitive and collaborative mechanism is proposed in this study the main idea of the proposed approach is based on multi agent system with optimal profit of the pull push and collaborative models among the portal access service provider pasp the product service provider psp and the mobile service provider msp to address the running mechanism for the multi agent system an integrated system framework is proposed based on the agent evolution algorithm aea which could resolve all these modes to examine the feasibility of the framework prototype system based on java repast is implemented the simulation experiments show that this system can help decision makers take the appropriate strategies with higher profits by analyzing the expectations and variances or risks of each player’s profit the interaction between and among entities in the chain is well understood it is found that in the situation where collaborative mechanism is applied the performance of players is better as compared to the other two situations where competitive mechanism is implemented if some constraints are applied the risk will be kept at low level
the common abstraction of xml schema by unranked regular tree languages is not entirely accurate to shed some light on the actual expressive power of xml schema intuitive semantical characterizations of the element declarations consistent edc rule are provided in particular it is obtained that schemas satisfying edc can only reason about regular properties of ancestors of nodes hence with respect to expressive power xml schema is closer to dtds than to tree automata these theoretical results are complemented with an investigation of the xml schema definitions xsds occurring in practice revealing that the extra expressiveness of xsds over dtds is only used to very limited extent as this might be due to the complexity of the xml schema specification and the difficulty of understanding the effect of constraints on typing and validation of schemas simpler formalism equivalent to xsds is proposed it is based on contextual patterns rather than on recursive types and it might serve as light weight front end for xml schema next the effect of edc on the way xml documents can be typed is discussed it is argued that cleaner more robust larger but equally feasible class is obtained by replacing edc with the notion of pass preorder typing ppt schemas that allow one to determine the type of an element of streaming document when its opening tag is met this notion can be defined in terms of grammars with restrained competition regular expressions and there is again an equivalent syntactical formalism based on contextual patterns finally algorithms for recognition simplification and inclusion of schemas for the various classes are given
we propose method to handle approximate searching by image content in medical image databases image content is represented by attributed relational graphs holding features of objects and relationships between objects the method relies on the assumption that fixed number of labeled or expected objects eg heart lungs etc are common in all images of given application domain in addition to variable number of unexpected or unlabeled objects eg tumor hematoma etc the method can answer queries by example such as find all x rays that are similar to smith’s x ray the stored images are mapped to points in multidimensional space and are indexed using state of the art database methods r trees the proposed method has several desirable properties database search is approximate so that all images up to prespecified degree of similarity tolerance are retrieved it has no false dismissals ie all images qualifying query selection criteria are retrieved it is much faster than sequential scanning for searching in the main memory and on the disk ie by up to an order of magnitude thus scaling up well for large databases
this paper proposes novel method using constant inter frame motion for self calibration from an image sequence of an object rotating around single axis with varying camera internal parameters our approach makes use of the facts that in many commercial systems rotation angles are often controlled by an electromechanical system and that the inter frame essential matrices are invariant if the rotation angles are constant but not necessary known therefore recovering camera internal parameters is possible by making use of the equivalence of essential matrices which relate the unknown calibration matrices to the fundamental matrices computed from the point correspondences we also describe linear method that works under restrictive conditions on camera internal parameters the solution of which can be used as the starting point of the iterative non linear method with looser constraints the results are refined by enforcing the global constraints that the projected trajectory of any point should be conic after compensating for the focusing and zooming effects finally using the bundle adjustment method tailored to the special case ie static camera and constant object rotation the structure of the object is recovered and the camera parameters are further refined simultaneously to determine the accuracy and the robustness of the proposed algorithm we present the results on both synthetic and real sequences
discriminative reranking is one method for constructing high performance statistical parsers collins discriminative reranker requires source of candidate parses for each sentence this paper describes simple yet novel method for constructing sets of best parses based on coarse to fine generative parser charniak this method generates best lists that are of substantially higher quality than previously obtainable we used these parses as the input to maxent reranker johnson et al riezler et al that selects the best parse from the set of parses for each sentence obtaining an score of on sentences of length or less
we have proposed the extent system for automated photograph annotation using image content and context analysis key component of extent is landmark recognition system called landmarker in this paper we present the architecture of landmarker the content of query photograph is analyzed and compared against database of sample landmark images to recognize any landmarks it contains an algorithm is presented for comparing query image with sample image context information may be used to assist landmark recognition also we show how landmarker deals with scalability to allow recognition of large number of landmarks we have implemented prototype of the system and present empirical results on large dataset
this article describes our research on spoken language translation aimed toward the application of computer aids for second language acquisition the translation framework is incorporated into multilingual dialogue system in which student is able to engage in natural spoken interaction with the system in the foreign language while speaking query in their native tongue at any time to obtain spoken translation for language assistance thus the quality of the translation must be extremely high but the domain is restricted experiments were conducted in the weather information domain with the scenario of native english speaker learning mandarin chinese we were able to utilize large corpus of english weather domain queries to explore and compare variety of translation strategies formal example based and statistical translation quality was manually evaluated on test set of spontaneous utterances the best speech translation performance percent correct percent incorrect and percent rejected is achieved by system which combines the formal and example based methods using parsability by domain specific chinese grammar as rejection criterion
preventive measures sometimes fail to detect malicious attacks with cyber attacks on data intensive applications becoming an ever more serious threat intrusion tolerant database systems are significant concern the main objective of intrusion tolerant database systems is to detect attacks and to assess and repair the damage caused by the attacks in timely manner such that the database will not be damaged to such degree that is unacceptable or useless this paper focuses on efficient damage assessment and repair in resilient distributed database systems the complexity of distributed database systems caused by data partition distributed transaction processing and failures makes damage assessment and repair much more challenging than in centralized database systems this paper identifies the key challenges and presents an efficient algorithm for distributed damage assessment and repair
this paper presents new algorithm that detects set of dominant points on the boundary of an eight connected shape to obtain polygonal approximation of the shape itself the set of dominant points is obtained from the original break points of the initial boundary where the integral square error is zero for this goal most of the original break points are deleted by suppressing those whose perpendicular distance to an approximating straight line is lower than variable threshold value the proposed algorithm iteratively deletes redundant break points until the required approximation which relies on decrease in the length of the contour and the highest error is achieved comparative experiment with another commonly used algorithm showed that the proposed method produced efficient and effective polygonal approximations for digital planar curves with features of several sizes
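A minimal sketch of the suppression step described above, assuming a fixed threshold rather than the adaptive threshold and stopping criterion of the actual algorithm: points are repeatedly removed while their perpendicular distance to the chord joining their current neighbours stays below the threshold.

# Minimal sketch of break-point suppression on a closed contour: repeatedly drop
# the point whose perpendicular distance to the chord joining its neighbours is
# smallest, while that distance is below a (here fixed) threshold.
def perp_dist(p, a, b):
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len = (dx * dx + dy * dy) ** 0.5
    if seg_len == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dx * (py - ay) - dy * (px - ax)) / seg_len

def suppress_break_points(points, threshold):
    pts = [tuple(map(float, p)) for p in points]
    changed = True
    while changed and len(pts) > 3:
        dists = [perp_dist(pts[i], pts[i - 1], pts[(i + 1) % len(pts)])
                 for i in range(len(pts))]
        i_min = min(range(len(pts)), key=dists.__getitem__)
        if dists[i_min] < threshold:
            del pts[i_min]              # redundant break point
        else:
            changed = False
    return pts

contour = [(0, 0), (1, 0.05), (2, 0), (2, 1), (2.02, 2), (2, 3), (0, 3)]
print(suppress_break_points(contour, threshold=0.1))   # keeps the four corners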
multisource data flow problems involve information which may enter nodes independently through different classes of edges in some cases dissimilar meet operations appear to be used for different types of nodes these problems include bidirectional and flow sensitive problems as well as many static analyses of concurrent programs with synchronization tuple frameworks type of standard data flow framework provide natural encoding for multisource problems using single meet operator previously the solution of these problems has been described as the fixed point of set of data flow equations using our tuple representation we can access the general results of standard data flow frameworks concerning convergence time and solution precision for these problems we demonstrate this for the bidirectional component of partial redundancy suppression and two problems on the program summary graph an interesting subclass of tuple frameworks the join of meets frameworks is useful for reachability problems especially those stemming from analyses of explicitly parallel programs we give results on function space properties for join of meets frameworks that indicate precise solutions for most of them will be difficult to obtain
in this paper we analyze the node spatial distribution of mobile wireless ad hoc networks characterizing this distribution is of fundamental importance in the analysis of many relevant properties of mobile ad hoc networks such as connectivity average route length and network capacity in particular we have investigated under what conditions the node spatial distribution resulting after large number of mobility steps resembles the uniform distribution this is motivated by the fact that the existing theoretical results concerning mobile ad hoc networks are based on this assumption in order to test this hypothesis we performed extensive simulations using two well known mobility models the random waypoint model which resembles intentional movement and brownian like model which resembles nonintentional movement our analysis has shown that in brownian like motion the uniformity assumption does hold and that the intensity of the concentration of nodes in the center of the deployment region that occurs in the random waypoint model heavily depends on the choice of some mobility parameters for extreme values of these parameters the uniformity assumption is impaired
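A minimal simulation sketch of the kind of experiment described above, assuming the random waypoint model on a unit square with constant speed and no pause time; binning the sampled positions exposes the concentration of nodes toward the centre of the deployment region.

# Sketch: empirical node spatial distribution under a simple random waypoint
# model (pause times omitted for brevity).
import numpy as np

rng = np.random.default_rng(0)

def random_waypoint(n_nodes=100, steps=5000, speed=0.01):
    pos = rng.random((n_nodes, 2))
    dest = rng.random((n_nodes, 2))
    samples = []
    for _ in range(steps):
        vec = dest - pos
        dist = np.linalg.norm(vec, axis=1, keepdims=True)
        arrived = dist[:, 0] < speed
        pos = np.where(arrived[:, None], dest,
                       pos + speed * vec / np.maximum(dist, 1e-12))
        dest[arrived] = rng.random((arrived.sum(), 2))   # pick a new waypoint
        samples.append(pos.copy())
    return np.concatenate(samples)

positions = random_waypoint()
# a coarse 2d histogram of visited positions: higher mass in the centre cells
# indicates deviation from the uniform distribution
hist, _, _ = np.histogram2d(positions[:, 0], positions[:, 1],
                            bins=10, range=[[0, 1], [0, 1]])
print((hist / hist.sum()).round(3))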
we present tool that predicts whether the software under development inside an ide has bug an ide plugin performs this prediction using the change classification technique to classify source code changes as buggy or clean during the editing session change classification uses support vector machines svm machine learning classifier algorithm to classify changes to projects mined from their configuration management repository this technique besides being language independent and relatively accurate can classify change immediately upon its completion and use features extracted solely from the change delta added deleted and the source code to predict buggy changes thus integrating change classification within an ide can predict potential bugs in the software as the developer edits the source code ideally reducing the amount of time spent on fixing bugs later to this end we have developed change classification plugin for eclipse based on client server architecture described in this paper
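A minimal sketch of change classification, assuming hypothetical toy change deltas, labels and simple bag-of-token features; the actual plugin mines labelled changes from the configuration management repository and uses richer features extracted from the delta and the source code.

# Sketch: classifying code changes as buggy/clean with an SVM over
# bag-of-token features of the change delta (toy data, hypothetical labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

changes = [
    ("if (ptr != NULL) free(ptr)", 0),
    ("int i = 0; i <= n; i++", 1),                      # off-by-one style change
    ("lock.acquire(); data += 1", 1),
    ("lock.acquire(); data += 1; lock.release()", 0),
]
texts, labels = zip(*changes)

clf = make_pipeline(CountVectorizer(token_pattern=r"[A-Za-z_]+|\S"),
                    SVC(kernel="linear"))
clf.fit(texts, labels)

# inside an IDE plugin this would run when an edit is completed
new_delta = "for (int i = 0; i <= len; i++) sum += a[i]"
print("predicted buggy" if clf.predict([new_delta])[0] == 1 else "predicted clean")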
weak pseudorandom function wprf is cryptographic primitive similar to but weaker than pseudorandom function for wprfs one only requires that the output is pseudorandom when queried on random inputs we show that unlike normal prfs wprfs are seed incompressible in the sense that the output of wprf is pseudorandom even if bounded amount of information about the key is leaked as an application of this result we construct simple mode of operation which when instantiated with any wprf gives leakage resilient stream cipher the implementation of such cipher is secure against every side channel attack as long as the amount of information leaked per round is bounded but overall can be arbitrarily large the construction is simpler than the previous one dziembowski pietrzak focs as it only uses single primitive wprf in straight forward manner
we describe framework for finding and tracking trails for autonomous outdoor robot navigation through combination of visual cues and ladar derived structural information the algorithm is able to follow paths which pass through multiple zones of terrain smoothness border vegetation tread material and illumination conditions our shape based visual trail tracker assumes that the approaching trail region is approximately triangular under perspective it generates region hypotheses from learned distribution of expected trail width and curvature variation and scores them using robust measure of color and brightness contrast with flanking regions the structural component analogously rewards hypotheses which correspond to empty or low density regions in groundstrike filtered ladar obstacle map our system’s performance is analyzed on several long sequences with diverse appearance and structural characteristics ground truth segmentations are used to quantify performance where available and several alternative algorithms are compared on the same data
after reviewing number of internet tools and technologies originating in the field of logic programming and discussing promising directions of ongoing research we describe logic programming based networking infrastructure which combines reasoning and knowledge processing with flexible coordination of dynamic state changes and computation mobility as well as its use for the design of intelligent mobile agent programs lightweight logic programming language jinni implemented in java is introduced as flexible scripting tool for gluing together knowledge processing components and java objects in networked client server applications and thin client environments as well as through applets over the web mobile threads implemented by capturing first order continuations in compact data structure sent over the network allow jinni to interoperate with remote high performance binprolog servers for cpu intensive knowledge processing controlled natural language to prolog translator with support of third party speech recognition and text to speech translation allows interaction with users not familiar with logic programming
science projects of various disciplines face fundamental challenge thousands of users want to obtain new scientific results by application specific and dynamic correlation of data from globally distributed sources considering the involved enormous and exponentially growing data volumes centralized data management reaches its limits since scientific data are often highly skewed and exploration tasks exhibit large degree of spatial locality we propose the locality aware allocation of data objects onto distributed network of interoperating databases hisbase is an approach to data management in scientific federated data grids that addresses the scalability issue by combining established techniques of database research in the field of spatial data structures quadtrees histograms and parallel databases with the scalable resource sharing and load balancing capabilities of decentralized peer to peer pp networks the proposed combination constitutes complementary science infrastructure enabling load balancing and increased query throughput
the web and especially major web search engines are essential tools in the quest to locate online information for many people this paper reports results from research that examines characteristics and changes in web searching from nine studies of five web search engines based in the us and europe we compare interactions occurring between users and web search engines from the perspectives of session length query length query complexity and content viewed among the web search engines the results of our research show users are viewing fewer result pages searchers on us based web search engines use more query operators than searchers on european based search engines there are statistically significant differences in the use of boolean operators and result pages viewed and one cannot necessarily apply results from studies of one particular web search engine to another web search engine the widespread use of web search engines employment of simple queries and decreased viewing of result pages may have resulted from algorithmic enhancements by web search engine companies we discuss the implications of the findings for the development of web search engines and design of online content
in this paper we formulate two classes of problems the colored range query problems and the colored point enclosure query problems to model multi dimensional range and point enclosure queries in the presence of categorical information many of these problems are difficult to solve using traditional data structural techniques based on new framework of combining sketching techniques and traditional data structures we obtain two sets of results in solving the problems approximately and efficiently in addition the framework can be employed to attack other related problems by finding the appropriate summary structures
in this paper we show how pattern matching can be seen to arise from proof term assignment for the focused sequent calculus this use of the curry howard correspondence allows us to give novel coverage checking algorithm and makes it possible to give rigorous correctness proof for the classical pattern compilation strategy of building decision trees via matrices of patterns
in this paper we propose complete model handling the physical simulation of deformable objects we formulate continuous expressions for stretching bending and twisting energies these expressions are mechanically rigorous and geometrically exact both elastic and plastic deformations are handled to simulate wide range of materials we validate the proposed model in several classical test configurations the use of geometrical exact energies with dynamic splines provides very accurate results as well as interactive simulation times which shows the suitability of the proposed model for constrained cad applications we illustrate the application potential of the proposed model by describing virtual system for cable positioning which can be used to test compatibility between planned fixing clip positions and mechanical cable properties
emerging applications in the area of emergency response and disaster management are increasingly demanding interactive capabilities to allow for the quick understanding of critical situation in particular in urban environments key component of these interactive simulations is how to recreate the behavior of crowd in real time while supporting individual behaviors crowds can often be unpredictable and present mixed behaviors such as panic or aggression that can very rapidly change based on unexpected new elements introduced into the environment we present preliminary research specifically oriented towards the simulation of large crowds for emergency response and rescue planning situations our approach uses highly scalable architecture integrated with an efficient rendering architecture and an immersive visualization environment for interaction in this environment users can specify complex scenarios plug in crowd behavior algorithms and interactively steer the simulation to analyze and evaluate multiple what if situations
we introduce an algorithm for space variant filtering of video based on spatio temporal laplacian pyramid and use this algorithm to render videos in order to visualize prerecorded eye movements spatio temporal contrast and colour saturation are reduced as function of distance to the nearest gaze point of regard ie non fixated distracting regions are filtered out whereas fixated image regions remain unchanged results of an experiment in which the eye movements of an expert on instructional videos are visualized with this algorithm so that the gaze of novices is guided to relevant image locations show that this visualization technique facilitates the novices perceptual learning
this paper presents an investigation into dynamic self adjustment of task deployment and other aspects of self management through the embedding of multiple policies non dedicated loosely coupled computing environments such as clusters and grids are increasingly popular platforms for parallel processing these abundant systems are highly dynamic environments in which many sources of variability affect the run time efficiency of tasks the dynamism is exacerbated by the incorporation of mobile devices and wireless communication this paper proposes an adaptive strategy for the flexible run time deployment of tasks to continuously maintain efficiency despite the environmental variability the strategy centres on policy based scheduling which is informed by contextual and environmental inputs such as variance in the round trip communication time between client and its workers and the effective processing performance of each worker a self management framework has been implemented for evaluation purposes the framework integrates several policy controlled adaptive services with the application code enabling the run time behaviour to be adapted to contextual and environmental conditions using this framework an exemplar self managing parallel application is implemented and used to investigate the extent of the benefits of the strategy
in the area of health care and sports in recent years variety of mobile applications have been established mobile devices are of emerging interest due to their high availability and increasing computing power in many different health scenarios in this paper we present scalable secure sensor monitoring platform ssmp which collects vital data of users vital parameters can be collected by just one single sensor or in multi sensor configuration nowadays wide spectrum of sensors is available which provide wireless connectivity eg bluetooth vital data can then easily be transmitted to mobile device which subsequently transmits these data to an ehealth portal there are already solutions implementing these capabilities however privacy aspects of users are very often neglected since health data may enable people to draw potentially compromising conclusions eg insurance companies it is absolutely necessary to design an enhanced security concept in this context to complicate matters further the trustworthiness of providers which are operating with user’s health data cannot be determined by users a priori this means that the security concept implemented by the provider may bear security flaws additionally there is no guarantee that the provider preserves the users privacy claims in this work we propose security concept incorporating privacy aspects using mobile devices for transferring and storing health data at portal in addition the concept guarantees anonymity in the transfer process as well as for stored data at service provider hence insider attacks based on stored data can be prevented
next generation system designs are challenged by multiple walls among them the inter related impediments offered by power dissipation limits and reliability are particularly difficult ones that all current chip system design teams are grappling with in this paper we first describe the attendant challenges in integrated multi dimensional pre silicon modeling and the solution approaches being pursued later we focus on leading edge solutions for power thermal and failure rate mitigation that have been proposed in our work over the past decade
the main goal of this paper is to provide routing table free online algorithms for wireless sensor networks wsns to select cost eg node residual energies and delay efficient paths as basic information to drive the routing process both node costs and hop count distances are considered particular emphasis is given to greedy routing schemes due to their suitability for resource constrained and highly dynamic networks as far as greedy forwarding is concerned we present the statistically assisted routing algorithm sara where forwarding decisions are driven by statistical information on the costs of the nodes within coverage and in the second order neighborhood by analysis we prove that an optimal online policy exists we derive its form and we exploit it as the core of sara besides greedy techniques sub optimal algorithms where node costs can be partially propagated through the network are also presented these techniques are based on real time learning lrta algorithms which through an initial exploratory phase converge to quasi globally optimal paths all the proposed schemes are then compared by simulation against globally optimal solutions discussing the involved trade offs and possible performance gains the results show that the exploitation of second order cost information in sara substantially increases the goodness of the selected paths with respect to fully localized greedy routing finally the path quality can be further increased by lrta schemes whose convergence can be considerably enhanced by properly setting real time search parameters however these solutions fail in highly dynamic scenarios as they are unable to adapt the search process to time varying costs
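A minimal sketch of a greedy, cost-aware forwarding decision of the general kind discussed above; the weighting of node cost against geographic advance is a hypothetical stand-in, not SARA's statistically derived optimal policy.

# Sketch: among neighbours that make progress toward the sink, pick the one
# minimising a (hypothetical) mix of node cost and remaining distance.
import math

def advance(node, neighbour, sink):
    return math.dist(node, sink) - math.dist(neighbour, sink)

def next_hop(node_pos, neighbours, sink, alpha=0.5):
    """neighbours: dict id -> (position, cost); returns chosen id or None."""
    candidates = {nid: (pos, cost) for nid, (pos, cost) in neighbours.items()
                  if advance(node_pos, pos, sink) > 0}
    if not candidates:
        return None                      # greedy dead end
    score = lambda nid: (alpha * candidates[nid][1]
                         + (1 - alpha) * math.dist(candidates[nid][0], sink))
    return min(candidates, key=score)

sink = (10.0, 10.0)
neighbours = {"a": ((2.0, 3.0), 0.9),    # costly (low residual energy)
              "b": ((3.0, 2.0), 0.2),
              "c": ((0.5, 0.5), 0.1)}    # cheap but moves away from the sink
print(next_hop((1.0, 1.0), neighbours, sink))   # -> "b"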
peer to peer pp networks are beginning to form the infrastructure of future applications computers are organized in pp overlay networks to facilitate search queries with reasonable cost so scalability is major aim in design of pp networks in this paper to obtain high factor of scalability we partition network search space using consistent static shared upper ontology we name our approach semantic partition tree spt all resources and queries are annotated using the upper ontology and queries are semantically routed in the overlay network also each node indexes addresses of other nodes that possess contents expressible by the concept it maintains so our approach can be conceived as an ontology based distributed hash table dht also we introduce lookup service for the network which is very scalable and independent of the network size and just depends on depth of the ontology tree further we introduce broadcast algorithm on the network we present worst case analysis of both lookup and broadcast algorithms and measure their performance using simulation the results show that our scheme is highly scalable and can be used in real pp applications
stacked wafer integration has the potential to improve multiprocessor system on chip mpsoc integration density performance and power efficiency however the power density of mpsocs increases with the number of active layers resulting in high chip temperatures this can reduce system reliability reduce performance and increase cooling cost thermal optimization for mpsocs imposes numerous challenges it is difficult to manage assignment and scheduling of heterogeneous workloads to maintain thermal safety in addition the thermal characteristics of mpsocs differ from those of mpsocs because each stacked layer has different thermal resistance to the ambient and vertically adjacent processors have strong temperature correlation we propose mpsoc thermal optimization algorithm that conducts task assignment scheduling and voltage scaling power balancing algorithm is initially used to distribute tasks among cores and active layers detailed thermal analysis is used to guide hotspot mitigation algorithm that incrementally reduces the peak mpsoc temperature by appropriately adjusting task execution times and voltage levels the proposed algorithm considers leakage power consumption and adapts to inter layer thermal heterogeneity performance evaluation on set of multiprogrammed and multithreaded benchmarks indicates that the proposed techniques can optimize dmpsoc power consumption power profile and chip peak temperature
the test scheduling problem is one of the major issues in the test integration of system on chip soc and test schedule is usually influenced by the test access mechanism tam in this paper we propose graph based approach to power constrained test scheduling with tam assignment and test conflicts also considered by mapping test schedule to subgraph of the test compatibility graph an interval graph recognition method can be used to determine the order of the core tests we then present heuristic algorithm that can effectively assign tam wires to the cores given the test order with the help of the tabu search method and the test compatibility graph the proposed algorithm allows rapid exploration of the solution space experimental results for the itc benchmarks show that short test length is achieved within reasonable computation time
commercial workloads form an important class of applications and have performance characteristics that are distinct from scientific and technical benchmarks such as spec cpu however due to the prohibitive simulation time of commercial workloads it is extremely difficult to use them in computer architecture research in this paper we study the efficacy of using statistical sampling based simulation methodology for two classes of commercial workloads java server benchmark specjbb and an online transaction processing oltp benchmark dbt our results show that although specjbb shows distinct garbage collection phases there are no large scale phases in the oltp benchmark we take advantage of this stationary behavior in steady phase and propose statistical sampling based simulation technique dynasim with two dynamic stopping rules in this approach the simulation terminates once the target accuracy has been met we apply dynasim to simulate commercial workloads and show that with the simulation of only few million total instructions the error can be within at confidence level of dynasim compares favorably with random sampling and representative sampling in terms of the total number of instructions simulated time cost and with representative sampling in terms of the number of checkpoints storage cost dynasim increases the usability of sampling based simulation approach for commercial workloads and will encourage the use of commercial workloads in computer architecture research
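A minimal sketch of a sampling simulation with a dynamic stopping rule, assuming a hypothetical per-sample measurement function and a normal-approximation confidence interval; sampling stops as soon as the target relative accuracy is met, which is the general idea behind terminating once the target accuracy has been reached.

# Sketch: keep adding sampled measurements (e.g. CPI of sampled chunks) until
# the relative confidence-interval half-width drops below a target.
import math
import random

def simulate_with_dynamic_stopping(measure, target_rel_error=0.02, z=1.96,
                                   min_samples=30, max_samples=100000):
    samples = []
    mean = half_width = float("nan")
    while len(samples) < max_samples:
        samples.append(measure())
        n = len(samples)
        if n < min_samples:
            continue
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / (n - 1)
        half_width = z * math.sqrt(var / n)
        if half_width <= target_rel_error * abs(mean):
            break                      # target accuracy met, stop simulating
    return mean, half_width, len(samples)

random.seed(1)
measure_cpi = lambda: random.gauss(1.8, 0.3)   # hypothetical stand-in for detailed simulation
print(simulate_with_dynamic_stopping(measure_cpi))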
we describe new data format for storing triangular symmetric and hermitian matrices called rectangular full packed format rfpf the standard two dimensional arrays of fortran and also known as full format that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of level blas standard packed format arrays fully utilize storage array space but provide low performance as there is no level packed blas we combine the good features of packed and full storage using rfpf to obtain high performance via using level blas as rfpf is standard full format representation also rfpf requires exactly the same minimal storage as packed format each lapack full and or packed triangular symmetric and hermitian routine becomes single new rfpf routine based on eight possible data layouts of rfpf this new rfpf routine usually consists of two calls to the corresponding lapack full format routine and two calls to level blas routines this means no new software is required as examples we present lapack routines for cholesky factorization cholesky solution and cholesky inverse computation in rfpf to illustrate this new work and to describe its performance on several commonly used computer platforms performance of lapack full routines using rfpf versus lapack full routines using the standard format for both serial and smp parallel processing is about the same while using half the storage performance gains are roughly one to factor of for serial and one to factor of for smp parallel times faster using vendor lapack full routines with rfpf than with using vendor and or reference packed routines
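An illustrative packing in the spirit of the format described above, assuming an even matrix order for brevity: the two triangular sub-blocks of a lower triangle are folded into one rectangle holding exactly n(n+1)/2 entries, so every sub-block stays addressable as a full-format block. The element order below is a simplified variant of the idea, not one of the eight LAPACK RFPF layouts.

# Lower triangle of an n x n matrix stored in an (n+1) x (n/2) rectangle with no
# wasted entries: L11 (lower tri), L21 (square) and L22 (lower tri, stored
# transposed) each remain contiguous rectangular blocks.
import numpy as np

def pack_lower(L):
    n = L.shape[0]
    assert n % 2 == 0
    k = n // 2
    rf = np.zeros((n + 1, k))
    for i in range(k):
        for j in range(i + 1):
            rf[1 + i, j] = L[i, j]               # L11 lower triangle
        for j in range(i, k):
            rf[i, j] = L[k + j, k + i]           # L22 transposed (upper triangle of the block)
    rf[k + 1:, :] = L[k:, :k]                    # L21 full block
    return rf

def unpack_lower(rf):
    k = rf.shape[1]
    n = 2 * k
    L = np.zeros((n, n))
    for i in range(k):
        for j in range(i + 1):
            L[i, j] = rf[1 + i, j]
        for j in range(i, k):
            L[k + j, k + i] = rf[i, j]
    L[k:, :k] = rf[k + 1:, :]
    return L

A = np.tril(np.arange(1, 37, dtype=float).reshape(6, 6))
rf = pack_lower(A)
assert rf.size == A.shape[0] * (A.shape[0] + 1) // 2     # minimal storage
assert np.allclose(unpack_lower(rf), A)                  # lossless round trip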
application resource usage models can be used in the decision making process for ensuring quality of service as well as for capacity planning apart from their general use in performance modeling optimization and systems management current solutions for modeling application resource usage tend to address parts of the problem by either focusing on specific application or specific platform or on small subset of system resources we propose simple and flexible approach for modeling application resource usage in platform independent manner that enables the prediction of application resource usage on unseen platforms the technique proposed is application agnostic requiring no modification to the application binary or source and no knowledge of application semantics we implement linux based prototype and evaluate it using four different workloads including real world applications and benchmarks our experiments reveal prediction errors that are bound within of the observed for these workloads when using the proposed approach
similarity search is core module of many data analysis tasks including search by example classification and clustering for time series data dynamic time warping dtw has been proven very effective similarity measure since it minimizes the effects of shifting and distortion in time however the quadratic cost of dtw computation to the length of the matched sequences makes its direct application on databases of long time series very expensive we propose technique that decomposes the sequences into number of segments and uses cheap approximations thereof to compute fast lower bounds for their warping distances we present several progressively tighter bounds relying on the existence or not of warping constraints finally we develop an index and multi step technique that uses the proposed bounds and performs two levels of filtering to efficiently process similarity queries thorough experimental study suggests that our method consistently outperforms state of the art methods for dtw similarity search
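A minimal sketch of the lower-bound-then-refine strategy, using a deliberately weak but provably valid bound based only on the aligned endpoints; the segment-based bounds of the paper are much tighter, and the index structure is omitted here.

# Sketch: a cheap lower bound prunes candidates before the quadratic DTW is run.
import numpy as np

def dtw(q, c):
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def lb_endpoints(q, c):
    # DTW always aligns first with first and last with last elements,
    # so each endpoint difference lower-bounds the full distance.
    return max(abs(q[0] - c[0]), abs(q[-1] - c[-1]))

def range_search(query, database, epsilon):
    results = []
    for i, cand in enumerate(database):
        if lb_endpoints(query, cand) > epsilon:
            continue                    # pruned without the exact computation
        if dtw(query, cand) <= epsilon:
            results.append(i)
    return results

rng = np.random.default_rng(0)
db = [np.cumsum(rng.normal(size=80)) for _ in range(50)]
print(range_search(db[0] + rng.normal(scale=0.1, size=80), db, epsilon=3.0))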
various known models of probabilistic xml can be represented as instantiations of the abstract notion of documents in addition to ordinary nodes documents have distributional nodes that specify the possible worlds and their probabilistic distribution particular families of documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in document some of the resulting families provide natural extensions and combinations of previously studied probabilistic xml models the focus of the paper is on the expressive power of families of documents in particular two main issues are studied the first is the ability to efficiently translate given document of one family into another family the second is closure under updates namely the ability to efficiently represent the result of updating the instances of document of given family as another document of that family for both issues we distinguish two variants corresponding to value based and object based semantics of documents
we discuss the parallelization of algorithms for solving polynomial systems symbolically by way of triangular decompositions we introduce component level parallelism for which the number of processors in use depends on the geometry of the solution set of the input system our long term goal is to achieve an efficient multi level parallelism coarse grained component level for tasks computing geometric objects in the solution sets and medium fine grained level for polynomial arithmetic such as gcd resultant computation within each task component level parallelization of triangular decompositions belongs to the class of dynamic irregular parallel applications which leads us to address the following question how to exploit geometrical information at an early stage of the solving process that would be favorable to parallelization we report on the effectiveness of the approaches that we have applied including modular methods solving by decreasing order of dimension task pool with dimension and rank guided scheduling we have extended the aldor programming language to support multiprocessed parallelism on smps and realized preliminary implementation our experimentation shows promising speedups for some well known problems and proves that our component level parallelization is practically efficient we expect that this speedup would add multiplicative factor to the speedup of medium fine grained level parallelization such as parallel gcd and resultant computations
transactional coherence and consistency tcc offers way to simplify parallel programming by executing all code within transactions in tcc systems transactions serve as the fundamental unit of parallel work communication and coherence as each transaction completes it writes all of its newly produced state to shared memory atomically while restarting other processors that have speculatively read stale data with this mechanism tcc based system automatically handles data synchronization correctly without programmer intervention to gain the benefits of tcc programs must be decomposed into transactions we describe two basic programming language constructs for decomposing programs into transactions loop conversion syntax and general transaction forking mechanism with these constructs writing correct parallel programs requires only small incremental changes to correct sequential programs the performance of these programs may then easily be optimized based on feedback from real program execution using few simple techniques
modern stream applications such as sensor monitoring systems and publish subscription services necessitate the handling of large numbers of continuous queries specified over high volume data streams efficient sharing of computations among multiple continuous queries especially for the memory and cpu intensive window based operations is critical novel challenge in this scenario is to allow resource sharing among similar queries even if they employ windows of different lengths this paper first reviews the existing sharing methods in the literature and then illustrates the significant performance shortcomings of these methods this paper then presents novel paradigm for the sharing of window join queries namely we slice window states of join operator into fine grained window slices and form chain of sliced window joins by using an elaborate pipelining methodology the number of joins after state slicing is reduced from quadratic to linear this novel sharing paradigm enables us to push selections down into the chain and flexibly select subsequences of such sliced window joins for computation sharing among queries with different window sizes based on the state slice sharing paradigm two algorithms are proposed for the chain buildup one minimizes the memory consumption while the other minimizes the cpu usage the algorithms are proven to find the optimal chain with respect to memory or cpu usage for given query workload we have implemented the slice share paradigm within the data stream management system cape the experimental results show that our strategy provides the best performance over diverse range of workload settings among all alternate solutions in the literature
in ad hoc networks the performance is significantly degraded as the size of the network grows the network clustering by which the nodes are hierarchically organized on the basis of the proximity relieves this performance degradation finding the weakly connected dominating set wcds is promising approach for clustering the wireless ad hoc networks finding the minimum wcds in the unit disk graph is an np hard problem and host of approximation algorithms has been proposed in this article we first proposed centralized approximation algorithm called dla cc based on distributed learning automata dla for finding near optimal solution to the minimum wcds problem then we propose dla based clustering algorithm called dla dc for clustering the wireless ad hoc networks the proposed cluster formation algorithm is distributed implementation of dla cc in which the dominator nodes and their closed neighbors assume the role of the cluster heads and cluster members respectively in this article we compute the worst case running time and message complexity of the clustering algorithm for finding near optimal cluster head set we argue that by proper choice of the learning rate of the clustering algorithm trade off between the running time and message complexity of algorithm with the cluster head set size clustering optimality can be made the simulation results show the superiority of the proposed algorithms over the existing methods
traditional compilers compile and optimize files separately making worst case assumptions about the program context in which file is to be linked more aggressive compilation architectures perform cross file interprocedural or whole program analyses potentially producing much faster programs but substantially increasing the cost of compilation even more radical are systems that perform all compilation and optimization at run time such systems can optimize programs based on run time program and system properties as well as static whole program properties however run time compilers also called dynamic compilers or just in time compilers suffer under severe constraints on allowable compilation time since any time spent compiling steals from time spent running the program none of these compilation models dominates the others each has unique strengths and weaknesses not present in the other models we are developing new staged compilation model which strives to combine high run time code quality with low compilation overhead compilation is organized as series of stages with stages corresponding to for example separate compilation library linking program linking and run time execution any given optimization can be performed at any of these stages to reduce compilation time while maintaining high effectiveness an optimization should be performed at the earliest stage that provides the necessary program context information to carry out the optimization effectively moreover single optimization can itself be spread across multiple stages with earlier stages performing preplanning work that enables the final stage to complete the optimization quickly in this way we hope to produce highly optimized programs nearly as good as what could be done with purely run time compiler that had an unconstrained compilation time budget but at much more practical compile time cost we are building the whirlwind optimizing compiler as the concrete embodiment of this staged compilation model initially targeting object oriented languages key component of whirlwind is set of techniques for automatically constructing staged compilers from traditional unstaged compilers including aggressive applications of specialization and other partial evaluation technology
many cache management schemes designed for mobile environments are based on invalidation reports irs however ir based approach suffers from long query latency and it cannot efficiently utilize the broadcast bandwidth in this paper we propose techniques to address these problems first by replicating small fraction of the essential information related to cache invalidation the query latency can be reduced then we propose techniques to efficiently utilize the broadcast bandwidth based on counters associated with each data item novel techniques are designed to maintain the accuracy of the counter in case of server failures client failures and disconnections extensive simulations are provided and used to evaluate the proposed methodology compared to previous ir based algorithms the proposed solution can significantly reduce the query latency improve the bandwidth utilization and effectively deal with disconnections and failures
recently many natural language processing nlp applications have improved the quality of their output by using various machine learning techniques to mine information extraction ie patterns for capturing information from the input text currently to mine ie patterns one should know in advance the type of the information that should be captured by these patterns in this work we propose novel methodology for corpus analysis based on cross examination of several document collections representing different instances of the same domain we show that this methodology can be used for automatic domain template creation as the problem of automatic domain template creation is rather new there is no well defined procedure for the evaluation of the domain template quality thus we propose methodology for identifying what information should be present in the template using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates
model transformations provide powerful capability to automate model refinements however the use of model transformation languages may present challenges to those who are unfamiliar with specific transformation language this paper presents an approach called model transformation by demonstration mtbd which allows an end user to demonstrate the exact transformation desired by actually editing source model and demonstrating the changes that evolve to target model an inference engine built into the underlying modeling tool records all editing operations and infers transformation pattern which can be reused in other models the paper motivates the need for the approach and discusses the technical contributions of mtbd case study with several sample inferred transformations serves as concrete example of the benefits of mtbd
spectral clustering refers to flexible class of clustering procedures that can produce high quality clusterings on small data sets but which has limited applicability to large scale problems due to its computational complexity of in general with the number of data points we extend the range of spectral clustering by developing general framework for fast approximate spectral clustering in which distortion minimizing local transformation is first applied to the data this framework is based on theoretical analysis that provides statistical characterization of the effect of local distortion on the mis clustering rate we develop two concrete instances of our general framework one based on local k means clustering kasp and one based on random projection trees rasp extensive experiments show that these algorithms can achieve significant speedups with little degradation in clustering accuracy specifically our algorithms outperform k means by large margin in terms of accuracy and run several times faster than approximate spectral clustering based on the nystrom method with comparable accuracy and significantly smaller memory footprint remarkably our algorithms make it possible for single machine to spectral cluster data sets with million observations within several minutes
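A minimal sketch of the k-means based instance of the framework, assuming scikit-learn and a toy two-moons data set: the data are first reduced to many local centroids, spectral clustering is run on the centroids only, and each point inherits the label of its centroid.

# Sketch of k-means-based approximate spectral clustering.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=2000, noise=0.05, random_state=0)

n_representatives = 100          # distortion-minimising local transformation
km = KMeans(n_clusters=n_representatives, n_init=10, random_state=0).fit(X)

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
rep_labels = sc.fit_predict(km.cluster_centers_)   # spectral step on centroids only

labels = rep_labels[km.labels_]  # each point inherits its centroid's cluster
print(np.bincount(labels))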
many software systems suffer from missing support for behavioral runtime composition and configuration of software components the concern behavioral composition and configuration is not treated as first class entity but instead it is hard coded in different programming styles leading to tangled composition and configuration code that is hard to understand and maintain we propose to embed dynamic language with tailorable object and class concept into the host language in which the components are written and use the tailorable language for behavioral composition and configuration tasks using this approach we can separate the concerns behavioral composition and configuration from the rest of the software system leading to more reusable understandable and maintainable composition and configuration of software components
deep packet inspection dpi has been widely adopted in detecting network threats such as intrusion viruses and spam it is challenging however to achieve high speed dpi due to the expanding rule sets and ever increasing line rates key issue is that the size of the finite automata falls beyond the capacity of on chip memory thus incurring expensive off chip accesses in this paper we present dpico hardware based dpi engine that utilizes novel techniques to minimize the storage requirements for finite automata the techniques proposed are modified content addressable memory mcam interleaved memory banks and data packing the experiment results show the scalable performance of dpico can achieve up to gbps throughput using contemporary fpga chip experiment data also show that dpico based accelerator can improve the pattern matching performance of dpi server by up to times
an accurate tractable analytic cache model for time shared systems is presented which estimates the overall cache miss rate of multiprocessing system with any cache size and time quanta the input to the model consists of the isolated miss rate curves for each process the time quanta for each of the executing processes and the total cache size the output is the overall miss rate trace driven simulations demonstrate that the estimated miss rate is very accurate since the model provides fast and accurate way to estimate the effect of context switching it is useful for both understanding the effect of context switching on caches and optimizing cache performance for time shared systems cache partitioning mechanism is also presented and is shown to improve the cache miss rate up to over the normal lru replacement policy
in this paper we investigate technique for fusing approximate knowledge obtained from distributed heterogeneous information sources this issue is substantial eg in modeling multiagent systems where group of loosely coupled heterogeneous agents cooperate in achieving common goal information exchange leading ultimately to knowledge fusion is natural and vital ingredient of this process we use generalization of rough sets and relations which depends on allowing arbitrary similarity relations the starting point of this research is where framework for knowledge fusion in multiagent systems is introduced agents individual perceptual capabilities are represented by similarity relations further aggregated to express joint capabilities of teams this aggregation expressing shift from individual to social level of agents activity has been formalized by means of dynamic logic the approach of doherty et al uses the full propositional dynamic logic which does not guarantee tractability of reasoning our idea is to adapt the techniques of nguyen to provide an engine for tractable approximate database querying restricted to horn fragment of serial dynamic logic we also show that the obtained formalism is quite powerful in applications
the problem of modeling memory locality of applications to guide compiler optimizations in systematic manner is an important unsolved problem made even more significant with the advent of multi core and many core architectures we describe an approach based on novel source level metric called static reuse distance to model the memory behavior of applications written in matlab we use matlab as representative language that lets end users express their algorithms precisely but at relatively high level matlab’s high level characteristics allow the static analysis to focus on large objects such as arrays without losing accuracy due to processor specific layout of scalar values in memory we present an efficient algorithm to compute static reuse distances using an extended version of dependence graphs our approach differs from earlier similar attempts in three important aspects it targets high level programming systems characterized by heavy use of libraries it works on full programs instead of being confined to loops and it integrates practical mechanisms to handle separately compiled procedures as well as pre compiled library procedures that are only available in binary form we study matlab code taken from real programs to demonstrate the effectiveness of our model finally we present some applications of our approach to program transformations that are known to be important in matlab but are expected to be relevant to other similar high level languages as well
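For context, a minimal sketch of the dynamic reuse distance that the static metric approximates: for each access, the number of distinct addresses touched since the previous access to the same address; the paper derives a static analogue from dependence graphs rather than from a trace.

# Sketch: per-access reuse (LRU stack) distance over a reference trace.
from collections import OrderedDict

def reuse_distances(trace):
    stack = OrderedDict()        # most recently used address last
    dists = []
    for addr in trace:
        if addr in stack:
            keys = list(stack.keys())
            dists.append(len(keys) - 1 - keys.index(addr))  # distinct addrs in between
            del stack[addr]
        else:
            dists.append(float("inf"))                      # cold access
        stack[addr] = True
    return dists

# e.g. repeating a column-major traversal of a 3x3 array
trace = ["A[%d][%d]" % (i, j) for j in range(3) for i in range(3)] * 2
print(reuse_distances(trace))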
the problem of frequent item discovery in streaming data has attracted lot of attention lately while the above problem has been studied extensively and several techniques have been proposed for its solution these approaches treat all the values of the data stream equally nevertheless not all values are of equal importance in several situations we are interested more in the new values that have appeared in the stream rather than in the older ones in this paper we address the problem of finding recent frequent items in data stream given small bounded memory and present novel algorithms to this direction we propose basic algorithm that extends the functionality of existing approaches by monitoring item frequencies in recent windows subsequently we present an improved version of the algorithm with significantly improved performance in terms of accuracy at no extra memory cost finally we perform an extensive experimental evaluation and show that the proposed algorithms can efficiently identify the frequent items in ad hoc recent windows of data stream
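A baseline sketch for comparison, assuming an exact count over a sliding window kept in a deque; its memory grows with the window size, which is precisely the cost that bounded-memory algorithms such as those in the paper avoid.

# Baseline: exact frequent-item counts over the most recent window of a stream.
from collections import Counter, deque

class RecentFrequentItems:
    def __init__(self, window):
        self.window = window
        self.items = deque()
        self.counts = Counter()

    def insert(self, item):
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window:          # expire the oldest item
            old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, min_fraction):
        threshold = min_fraction * len(self.items)
        return [x for x, c in self.counts.items() if c >= threshold]

rfi = RecentFrequentItems(window=1000)
for i in range(10000):
    rfi.insert("hot" if i % 3 == 0 else "item%d" % (i % 500))
print(rfi.frequent(min_fraction=0.2))   # only "hot" is frequent in the recent window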
the past two decades have presented significant technological developments of mobile information and communication technology ict such as portable technologies eg mobile phones notebook computers personal digital assistants and associated wireless infrastructures eg wireless local area networks mobile telecommunications infrastructures bluetooth personal area networks mobile ict offers range of technical opportunities for organisations and their members to implement enterprise mobility however the challenges of unlocking the opportunities of enterprise mobility are not well understood one of the key issues is to establish systems and associated working practices that are deemed usable by both individuals and the organisation the aim of this paper is to show that the concept of organisational usability can enrich the understanding of mobile ict in organisations as an addition to the traditional understanding of individual usability organisational usability emphasises the role of mobile ict beyond individual support large scale study of four different ways of organising foreign exchange trading in middle eastern bank serves as the concrete foundation for the discussion the empirical study showed how the final of the four attempts at establishing trading deployed mobile ict to enable mobile trading and by providing solution which was deemed usable for both the organisation and the traders the paper contributes to the understanding of how usability of mobile ict critically depends on carefully balancing individual and organisational requirements it also demonstrates the need for research in enterprise mobility to embrace both individual and organisational concerns in order to grasp the complexity of the phenomena
high performance clusters have been growing rapidly in scale most of these clusters deploy high speed interconnect such as infiniband to achieve higher performance most scientific applications executing on these clusters use the message passing interface mpi as the parallel programming model thus the mpi library has key role in achieving application performance by consuming as few resources as possible and enabling scalable performance state of the art mpi implementations over infiniband primarily use the reliable connection rc transport due to its good performance and attractive features however the rc transport requires connection between every pair of communicating processes with each requiring several kb of memory as clusters continue to scale memory requirements in rc based implementations increase the connection less unreliable datagram ud transport is an attractive alternative which eliminates the need to dedicate memory for each pair of processes in this paper we present high performance ud based mpi design we implement our design and compare the performance and resource usage with the rc based mvapich we evaluate npb smg sweepd and sppm up to processes on an core infiniband cluster for smg our prototype shows speedup and seven fold reduction in memory for processes additionally based on our model our design has an estimated times reduction in memory over mvapich at processes when all connections are created to the best of our knowledge this is the first research work that presents high performance mpi design over infiniband that is completely based on ud and can achieve near identical or better application performance than rc
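A back-of-the-envelope sketch of the memory argument, assuming a fully connected job and a hypothetical per-connection footprint (the abstract only states that each RC connection needs several KB of memory):

# RC needs per-pair connection state, UD does not; the per-connection figure
# below is a hypothetical placeholder, not a measured MVAPICH value.
def rc_connection_memory(n_procs, bytes_per_connection=16 * 1024):
    # fully connected: each process keeps state for the other n-1 processes
    return n_procs * (n_procs - 1) * bytes_per_connection

for n in (64, 1024, 4096):
    print(n, "processes -> %.1f GB of RC connection state cluster-wide"
          % (rc_connection_memory(n) / 2**30))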
outlier detection has been used for centuries to detect and where appropriate remove anomalous observations from data outliers arise due to mechanical faults changes in system behaviour fraudulent behaviour human error instrument error or simply through natural deviations in populations their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences it can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing the original outlier detection methods were arbitrary but now principled and systematic techniques are used drawn from the full gamut of computer science and statistics in this paper we introduce survey of contemporary techniques for outlier detection we identify their respective motivations and distinguish their advantages and disadvantages in comparative review
this paper presents novel image feature representation method called multi texton histogram mth for image retrieval mth integrates the advantages of co occurrence matrix and histogram by representing the attribute of co occurrence matrix using histogram it can be considered as generalized visual attribute descriptor but without any image segmentation or model training the proposed mth method is based on julesz’s textons theory and it works directly on natural images as shape descriptor meanwhile it can be used as color texture descriptor and leads to good performance the proposed mth method is extensively tested on the corel dataset with natural images the results demonstrate that it is much more efficient than representative image feature descriptors such as the edge orientation auto correlogram and the texton co occurrence matrix it has good discrimination power of color texture and shape features
it is widely believed that distributed software development is riskier and more challenging than collocated development prior literature on distributed development in software engineering and other fields discuss various challenges including cultural barriers expertise transfer difficulties and communication and coordination overhead we evaluate this conventional belief by examining the overall development of windows vista and comparing the post release failures of components that were developed in distributed fashion with those that were developed by collocated teams we found negligible difference in failures this difference becomes even less significant when controlling for the number of developers working on binary we also examine component characteristics such as code churn complexity dependency information and test code coverage and find very little difference between distributed and collocated components to investigate if less complex components are more distributed further we examine the software process and phenomena that occurred during the vista development cycle and present ways in which the development process utilized may be insensitive to geography by mitigating the difficulties introduced in prior work in this area
seed based framework for textual information extraction allows for weakly supervised extraction of named entities from anonymized web search queries the extraction is guided by small set of seed named entities without any need for handcrafted extraction patterns or domain specific knowledge allowing for the acquisition of named entities pertaining to various classes of interest to web search users inherently noisy search queries are shown to be highly valuable albeit little explored resource for web based named entity discovery
making the structure of software visible during system development helps build shared understanding of the context for each piece of work ii identify progress with implementation and iii highlight any conflict between individual development activities finding an adequate representation for such information is not straightforward especially for large applications this paper describes an implementation of such visualization system designed to explore some of the issues involved the approach is based on war room command console metaphor and uses bank of eight linked consoles to present information the tool was applied to several industrial software systems written in mixture of java and one of which was over million lines of code in size
this paper is concerned with the parallel evaluation of datalog rule programs mainly by processors that are interconnected by communication network we introduce paradigm called data reduction for the parallel evaluation of general datalog program several parallelization strategies discussed previously in cw gst ws are special cases of this paradigm the paradigm parallelizes the evaluation by partitioning among the processors the instantiations of the rules after presenting the paradigm we discuss the following issues that we see fundamental for parallelization strategies derived from the paradigm properties of the strategies that enable reduction in the communication overhead decomposability load balancing and application to programs with negation we prove that decomposability concept introduced previously in ws cw is undecidable
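to make the data reduction idea concrete, the following minimal sketch (in python) partitions the instantiations of a transitive closure rule among simulated workers by hashing the join variable; the rule, the hash partitioning and all names here are assumptions made for illustration, not the specific strategies analyzed in the paper.

# a minimal sketch, assuming hash partitioning on the join variable z of the rule
# t(x, y) :- t(x, z), e(z, y); each simulated worker only processes the
# instantiations whose hash falls into its partition, so the union of the workers'
# derivations equals the sequential result
def transitive_closure_partitioned(edges, num_workers):
    facts = set(edges)
    while True:
        new_facts = set()
        for w in range(num_workers):
            for (x, z) in facts:
                if hash(z) % num_workers != w:
                    continue  # this instantiation belongs to another worker
                for (z2, y) in edges:
                    if z2 == z and (x, y) not in facts:
                        new_facts.add((x, y))
        if not new_facts:
            return facts
        facts |= new_facts

if __name__ == "__main__":
    print(sorted(transitive_closure_partitioned({(1, 2), (2, 3), (3, 4)}, 2)))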
dynamic load balancing is key factor in achieving high performance for large scale distributed simulations on grid infrastructures in grid environment the available resources and the simulation’s computation and communication behavior may experience run time critical imbalances consequently an initial static partitioning should be combined with dynamic load balancing scheme to ensure the high performance of the distributed simulation many improved or novel dynamic load balancing designs have been proposed in recent years which aim to improve the distributed simulation performance such designs are in general non formalized and the realizations of the designs are highly time consuming and error prone practices in this paper we propose formal dynamic load balancing design approach using discrete event system specification devs we discuss the feasibility of using devs and as an additional step we consider studying recently proposed design through formalized devs model system our focus is how devs component based formalized design approach can predict some of the key design factors before the design is realized or can further validate and consolidate realized dynamic load balancing designs
the glasgow haskell compiler ghc has quite sophisticated support for concurrency in its runtime system which is written in low level code as ghc evolves the runtime system becomes increasingly complex error prone difficult to maintain and difficult to add new concurrency features this paper presents an alternative approach to implement concurrency in ghc rather than hard wiring all kinds of concurrency features the runtime system is thin substrate providing only small set of concurrency primitives and the remaining concurrency features are implemented in software libraries written in haskell this design improves the safety of concurrency support it also provides more customizability of concurrency features which can be developed as haskell library packages and deployed modularly
for large scale and residual software like network service reliability is critical requirement recent research has shown that most of network software still contains number of bugs methods for automated detection of bugs in software can be classified into static analysis based on formal verification and runtime checking based on fault injection in this paper framework for checking software security vulnerability is proposed the framework is based on automated bug detection technologies ie static analysis and fault injection which are complementary to each other the proposed framework provides new direction in which various kinds of software can be checked for vulnerability by making use of static analysis and fault injection technology in experiment on proposed framework we find unknown vulnerability as well as known vulnerability in windows network module
we present the design implementation and evaluation of promise novel peer to peer media streaming system encompassing the key functions of peer lookup peer based aggregated streaming and dynamic adaptations to network and peer conditions particularly promise is based on new application level pp service called collectcast collectcast performs three main functions inferring and leveraging the underlying network topology and performance information for the selection of senders monitoring the status of peers and connections and reacting to peer connection failure or degradation with low overhead dynamically switching active senders and standby senders so that the collective network performance out of the active senders remains satisfactory based on both real world measurement and simulation we evaluate the performance of promise and discuss lessons learned from our experience with respect to the practicality and further optimization of promise
several mesh like coarse grained reconfigurable architectures have been devised in the last few years accompanied with their corresponding mapping flows one of the major bottlenecks in mapping algorithms on these architectures is the limited memory access bandwidth only few mapping methodologies encountered the problem of the limited bandwidth while none has explored how the performance improvements are affected from the architectural characteristics we study in this paper the impact that the architectural parameters have on performance speedups achieved when the pes local rams are used for storing the variables with data reuse opportunities the data reuse values are transferred in the internal interconnection network instead of being fetched from external memories in order to reduce the data transfer burden on the bus network novel mapping algorithm is also proposed that uses list scheduling technique the experimental results quantified the trade offs that exist between the performance improvements and the memory access latency the interconnection network and the processing element’s local ram size for this reason our mapping methodology targets on flexible architecture template which permits such an exploration more specifically the experiments showed that the improvements increase with the memory access latency while richer interconnection topology can improve the operation parallelism by factor of on average finally for the considered set of benchmarks the operation parallelism has been improved from to from the application of our methodology and by having each pe’s local ram size of words
operational transformation ot is technique originally invented for supporting consistency maintenance in collaborative text editors word processors have much richer data types and more comprehensive operations than plain text editors among others the capability of updating attributes of any types of object is an essential feature of all word processors in this paper we report an extension of ot for supporting generic update operation in addition to insert and delete operations for collaborative word processing we focus on technical issues and solutions involved in transforming updates for both consistency maintenance and group undo novel technique called multi version single display mvsd has been devised to resolve conflict between concurrent updates and integrated into the framework of ot this work has been motivated by and conducted in the coword project which aims to convert ms word into real time collaborative word processor without changing its source code this ot extension is relevant not only to word processors but also to range of interactive applications that can be modelled as editors
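the classical inclusion transformation for concurrent inserts and deletes on a shared string, which the update extension described above builds on, can be sketched as follows; this is a textbook style illustration rather than the paper's own algorithm, and a full system would also need a site id tie break for equal positions.

# minimal sketch of classical inclusion transformation for concurrent character
# inserts and deletes; positions are shifted so both sites converge to the same text
def transform_ins_ins(pos_a, pos_b):
    # shift op a's position if op b inserted at or before it
    # (equal positions would need a site id tie break in a full system)
    return pos_a + 1 if pos_b <= pos_a else pos_a

def transform_ins_del(pos_a, pos_b):
    # shift op a's position left if op b deleted a character before it
    return pos_a - 1 if pos_b < pos_a else pos_a

def apply_ins(s, pos, ch):
    return s[:pos] + ch + s[pos:]

if __name__ == "__main__":
    doc = "abc"
    a, b = 1, 2  # site 1 inserts "x" at 1, site 2 concurrently inserts "y" at 2
    site1 = apply_ins(apply_ins(doc, a, "x"), transform_ins_ins(b, a), "y")
    site2 = apply_ins(apply_ins(doc, b, "y"), transform_ins_ins(a, b), "x")
    assert site1 == site2 == "axbyc"
    print(site1)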
we present fully automatic method for content selection evaluation in summarization that does not require the creation of human model summaries our work capitalizes on the assumption that the distribution of words in the input and an informative summary of that input should be similar to each other results on large scale evaluation from the text analysis conference show that input summary comparisons are very effective for the evaluation of content selection our automatic methods rank participating systems similarly to manual model based pyramid evaluation and to manual human judgments of responsiveness the best feature jensen shannon divergence leads to correlation as high as with manual pyramid and with responsiveness evaluations
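a minimal sketch of the input summary comparison, assuming simple whitespace tokenization and a small smoothing constant (neither taken from the evaluation itself): score a summary by the jensen shannon divergence between the word distributions of the input and of the summary, with lower divergence indicating better content selection.

# jensen shannon divergence between smoothed word distributions of two texts
import math
from collections import Counter

def js_divergence(text_a, text_b, eps=1e-9):
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = set(ca) | set(cb)
    pa = {w: (ca[w] + eps) / (sum(ca.values()) + eps * len(vocab)) for w in vocab}
    pb = {w: (cb[w] + eps) / (sum(cb.values()) + eps * len(vocab)) for w in vocab}
    m = {w: 0.5 * (pa[w] + pb[w]) for w in vocab}
    def kl(p, q):
        return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)
    return 0.5 * kl(pa, m) + 0.5 * kl(pb, m)

if __name__ == "__main__":
    doc = "the storm closed roads and cut power across the region"
    good = "storm closed roads and cut power"
    bad = "the committee met to discuss the budget"
    assert js_divergence(doc, good) < js_divergence(doc, bad)
    print("good summary has lower divergence")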
software comprehension understanding software structure and behavior is essential for developing maintaining and improving software this is particularly true of agent based systems in which the actions of autonomous agents are affected by numerous factors such as events in dynamic environment local uncertain beliefs and intentions of other agents existing comprehension tools are not suited to such large concurrent software and do not leverage concepts of the agent oriented paradigm to aid the user in understanding the software’s behavior to address the software comprehension of agent based systems this research proposes method and accompanying tool that automates some of the manual tasks performed by the human user during software comprehension such as explanation generation and knowledge verification
this paper focuses on the realizability problem of framework for modeling and specifying the global behavior of reactive electronic services services in this framework web accessible programs peers communicate by asynchronous message passing and virtual global watcher listens silently to the network the global behavior is characterized by conversation which is the infinite sequence of messages observed by the watcher we show that given büchi automaton specifying the desired set of conversations called conversation protocol it is possible to implement it using set of finite state peers if three realizability conditions are satisfied in particular the synthesized peers will conform to the protocol by generating only those conversations specified by the protocol our results enable top down verification strategy where conversation protocol is specified by realizable büchi automaton the properties of the protocol are verified on the büchi automaton specification the peer implementations are synthesized from the protocol via projection
traditional duplicate elimination techniques are not applicable to many data stream applications in general precisely eliminating duplicates in an unbounded data stream is not feasible in many streaming scenarios therefore we target at approximately eliminating duplicates in streaming environments given limited space based on well known bitmap sketch we introduce data structure stable bloom filter and novel and simple algorithm the basic idea is as follows since there is no way to store the whole history of the stream sbf continuously evicts the stale information so that sbf has room for those more recent elements after finding some properties of sbf analytically we show that tight upper bound of false positive rates is guaranteed in our empirical study we compare sbf to alternative methods the results show that our method is superior in terms of both accuracy and time efficiency when fixed small space and an acceptable false positive rate are given
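a minimal sketch of a stable bloom filter, with illustrative parameter values rather than the tuned settings of the paper: each arriving element is declared a duplicate if all of its hashed cells are nonzero, a few randomly chosen cells are then decremented to evict stale history, and the element's own cells are set to the maximum counter value.

# approximate duplicate detection over a stream with bounded memory
import random

class StableBloomFilter:
    def __init__(self, num_cells=10000, num_hashes=3, max_val=3, decrements=10, seed=0):
        self.cells = [0] * num_cells
        self.k, self.max_val, self.p = num_hashes, max_val, decrements
        self.rng = random.Random(seed)

    def _positions(self, item):
        return [hash((i, item)) % len(self.cells) for i in range(self.k)]

    def seen_and_add(self, item):
        pos = self._positions(item)
        duplicate = all(self.cells[j] > 0 for j in pos)
        # evict stale information so recent elements always find room
        for _ in range(self.p):
            j = self.rng.randrange(len(self.cells))
            if self.cells[j] > 0:
                self.cells[j] -= 1
        for j in pos:
            self.cells[j] = self.max_val
        return duplicate

if __name__ == "__main__":
    sbf = StableBloomFilter()
    print(sbf.seen_and_add("a"), sbf.seen_and_add("a"), sbf.seen_and_add("b"))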
modern techniques for distributed information retrieval use set of documents sampled from each server but these samples have been underutilised in server selection we describe new server selection algorithm sushi which unlike earlier algorithms can make full use of the text of each sampled document and which does not need training data sushi can directly optimise for many common cases including high precision retrieval and by including simple stopping condition can do so while reducing network traffic our experiments compare sushi with alternatives and show it achieves the same effectiveness as the best current methods while being substantially more efficient selecting as few as as many servers
design and control of vector fields is critical for many visualization and graphics tasks such as vector field visualization fluid simulation and texture synthesis the fundamental qualitative structures associated with vector fields are fixed points periodic orbits and separatrices in this paper we provide new technique that allows for the systematic creation and cancellation of fixed points and periodic orbits this technique enables vector field design and editing on the plane and surfaces with desired qualitative properties the technique is based on conley theory which provides unified framework that supports the cancellation of fixed points and periodic orbits we also introduce novel periodic orbit extraction and visualization algorithm that detects for the first time periodic orbits on surfaces furthermore we describe the application of our periodic orbit detection and vector field simplification algorithms to engine simulation data demonstrating the utility of the approach we apply our design system to vector field visualization by creating data sets containing periodic orbits this helps us understand the effectiveness of existing visualization techniques finally we propose new streamline based technique that allows vector field topology to be easily identified
sensor networks are very specific type of wireless networks where both security and performance issues need to be solved efficiently in order to avoid manipulations of the sensed data and at the same time minimize the battery energy consumption this paper proposes an efficient way to perform data collection by grouping the sensors in aggregation zones allowing the aggregators to process the sensed data inside the aggregation zone in order to minimize the amount of transmissions to the sink moreover the paper provides security mechanism based on hash chains to secure data transmissions in networks with low capability sensors and without the requirements of an instantaneous source authentication
grid resources are non dedicated and thus grid users are forced to compete with resource owners for idle cpu cycles as result the turnaround times of both the grid jobs and the owners jobs are invariably delayed to resolve this problem the current study proposes progressive multi layer resource reconfiguration framework designated as pmr in which intra and inter site reconfiguration strategies are employed to adapt grid users jobs dynamically to changes in the available cpu resources at each node the experimental results show that pmr enables the idle cpu cycles of resource to be fully exploited by grid users with minimum interference to the resource owner’s jobs
there exist emerging applications of data streams that require association rule mining such as network traffic monitoring and web click streams analysis different from data in traditional static databases data streams typically arrive continuously in high speed with huge amount and changing data distribution this raises new issues that need to be considered when developing association rule mining techniques for stream data this paper discusses those issues and how they are addressed in the existing literature
classification is an important problem in data mining given an example and class classifier usually works by estimating the probability of being member of ie membership probability well calibrated classifiers are those able to provide accurate estimates of class membership probabilities that is the estimated probability is close to which is the true empirical probability of being member of given that the probability estimated by the classifier is calibration is not necessary property for producing accurate classifiers and thus most of the research has focused on direct accuracy maximization strategies ie maximum margin rather than on calibration however non calibrated classifiers are problematic in applications where the reliability associated with prediction must be taken into account ie cost sensitive classification cautious classification etc in these applications sensible use of the classifier must be based on the reliability of its predictions and thus the classifier must be well calibrated in this paper we show that lazy associative classifiers lac are accurate and well calibrated using well known sound entropy minimization method we explore important applications where such characteristics ie accuracy and calibration are relevant and we demonstrate empirically that lac drastically outperforms other classifiers such as svms naive bayes and decision trees even after these classifiers are calibrated by specific methods additional highlights of lac include the ability to incorporate reliable predictions for improving training and the ability to refrain from doubtful predictions
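the notion of calibration used above can be checked operationally by binning predictions and comparing each bin's mean estimated probability with the empirical fraction of positives; the sketch below uses equal width bins as an illustrative choice and is not the entropy minimization method referenced in the abstract.

# reliability table: per bin, (mean predicted probability, empirical positive rate, count)
def calibration_table(probs, labels, num_bins=5):
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * num_bins), num_bins - 1)].append((p, y))
    table = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            table.append((round(mean_p, 2), round(frac_pos, 2), len(b)))
    return table

if __name__ == "__main__":
    probs = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85, 0.55, 0.6]
    labels = [0, 0, 1, 1, 1, 1, 0, 1]
    print(calibration_table(probs, labels))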
in recent years wireless sensor networking has shown great promise in applications ranging from industrial control environmental monitoring and inventory tracking given the resource constrained nature of sensor devices and the dynamic wireless channel used for communication sensor networking protocol needs to be compact energy efficient and highly adaptable in this paper we present sampl simple aggregation and message passing layer aimed at flexible aggregation of sensor information over long period of time and supporting sporadic messages from mobile devices sampl is compact network layer that operates on top of low power csma ca based mac protocol the protocol has been designed with extensibility in mind to support new transducer devices and unforeseen applications without requiring reprogramming of the entire network sampl uses highly adaptive tree based routing scheme to achieve highly robust operation in time varying environment the protocol supports peer to peer data transactions local storage of data similar to what many rfid systems provide as well as secure gateway to infrastructure communication sampl is built on top of the nano rk operating system that runs on the firefly sensor networking platform nano rk’s resource management primitives are used to create virtual energy budgets within sampl that enforce application lifetimes as of october sampl has been operating as part of the sensor andrew project at carnegie mellon university with battery powered sensor nodes for over seven months and continues to be actively used as research testbed we describe our deployment tools and network health monitoring strategies necessary for configuring and maintaining long term operation of sensor network our approach has led to sustainable average packet success rate of across the entire network
to execute mpi applications reliably fault tolerance mechanisms are needed message logging is well known solution to provide fault tolerance for mpi applications it has been proved that it can tolerate higher failure rate than coordinated checkpointing however pessimistic and causal message logging can induce high overhead on failure free execution in this paper we present op new optimistic message logging protocol based on active optimistic message logging contrary to existing optimistic message logging protocols that save dependency information on reliable storage periodically op logs dependency information as soon as possible to reduce the amount of data piggybacked on application messages thus it reduces the overhead of the protocol on failure free execution making it more scalable and simplifying recovery op is implemented as module of the open mpi library experiments show that active message logging is promising to improve scalability and performance of optimistic message logging
we introduce sensordcsp naturally distributed benchmark based on real world application that arises in the context of networked distributed systems in order to study the performance of distributed csp discsp algorithms in truly distributed setting we use discrete event network simulator which allows us to model the impact of different network traffic conditions on the performance of the algorithms we consider two complete discsp algorithms asynchronous backtracking abt and asynchronous weak commitment search awc and perform performance comparison for these algorithms on both satisfiable and unsatisfiable instances of sensordcsp we found that random delays due to network traffic or in some cases actively introduced by the agents combined with dynamic decentralized restart strategy can improve the performance of discsp algorithms in addition we introduce gsensordcsp plain embedded version of sensordcsp that is closely related to various real life dynamic tracking systems we perform both analytical and empirical study of this benchmark domain in particular this benchmark allows us to study the attractiveness of solution repairing for solving sequence of discsps that represent the dynamic tracking of set of moving objects
we analyze theoretically the subspace best approximating images of convex lambertian object taken from the same viewpoint but under different distant illumination conditions since the lighting is an arbitrary function the space of all possible images is formally infinite dimensional however previous empirical work has shown that images of largely diffuse objects actually lie very close to five dimensional subspace in this paper we analytically construct the principal component analysis for images of convex lambertian object explicitly taking attached shadows into account and find the principal eigenmodes and eigenvalues with respect to lighting variability our analysis makes use of an analytic formula for the irradiance in terms of spherical harmonic coefficients of the illumination and shows under appropriate assumptions that the principal components or eigenvectors are identical to the spherical harmonic basis functions evaluated at the surface normal vectors our main contribution is in extending these results to the single viewpoint case showing how the principal eigenmodes and eigenvalues are affected when only limited subset the upper hemisphere of normals is available and the spherical harmonics are no longer orthonormal over the restricted domain our results are very close both qualitatively and quantitatively to previous empirical observations and represent the first essentially complete theoretical explanation of these observations our analysis is also likely to be of interest in other areas of computer vision and image based rendering in particular our results indicate that using complex illumination for photometric problems in computer vision is not significantly more difficult than using directional sources
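for orientation, the standard spherical harmonic expansion of lambertian irradiance, which appears to be the analytic formula the analysis relies on, is

E(\mathbf{n}) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} \hat{A}_l \, L_{lm} \, Y_{lm}(\mathbf{n}), \qquad \hat{A}_0 = \pi, \quad \hat{A}_1 = \tfrac{2\pi}{3}, \quad \hat{A}_2 = \tfrac{\pi}{4},

where L_{lm} are the illumination coefficients and Y_{lm} the spherical harmonics; the \hat{A}_l quoted are the usual lambertian kernel coefficients, given here as background rather than taken from the paper, and they decay rapidly (vanishing for odd l greater than one), which is why a handful of low order modes, and hence a low dimensional subspace, capture most of the image variability.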
various programming languages allow the construction of structure shy programs such programs are defined generically for many different datatypes and only specify specific behavior for few relevant subtypes typical examples are xml query languages that allow selection of subdocuments without exhaustively specifying intermediate element tags other examples are languages and libraries for polytypic or strategic functional programming and for adaptive object oriented programming in this paper we present an algebraic approach to transformation of declarative structure shy programs in particular for strategic functions and xml queries we formulate rich set of algebraic laws not just for transformation of structure shy programs but also for their conversion into structure sensitive programs and vice versa we show how subsets of these laws can be used to construct effective rewrite systems for specialization generalization and optimization of structure shy programs we present type safe encoding of these rewrite systems in haskell which itself uses strategic functional programming techniques
programs which manipulate pointers are hard to debug pointer analysis algorithms originally aimed at optimizing compilers may provide some remedy by identifying potential errors such as dereferencing null pointers by statically analyzing the behavior of programs on all their input data our goal is to identify the core program analysis techniques that can be used when developing realistic tools which detect memory errors at compile time without generating too many false alarms our preliminary experience indicates that the following techniques are necessary finding aliases between pointers ii flow sensitive techniques that account for the program control flow constructs iii partial interpretation of conditional statements iv analysis of the relationships between pointers and sometimes analysis of the underlying data structures manipulated by the program we show that combination of these techniques can yield better results than those achieved by state of the art tools yet it is not clear to us whether our ideas are applicable to large programs
in this paper we introduce new approach for the embedding of linear elastic deformable models our technique results in significant improvements in the efficient physically based simulation of highly detailed objects first our embedding takes into account topological details that is disconnected parts that fall into the same coarse element are simulated independently second we account for the varying material properties by computing stiffness and interpolation functions for coarse elements which accurately approximate the behaviour of the embedded material finally we also take into account empty space in the coarse embeddings which provides better simulation of the boundary the result is straightforward approach to simulating complex deformable models with the ease and speed associated with coarse regular embedding and with quality of detail that would only be possible at much finer resolution
recently the increasing use of time series data has initiated various research and development attempts in the field of data and knowledge management time series data is characterized by large data size high dimensionality and continuous updates moreover the time series data is always considered as whole instead of individual numerical fields indeed large set of time series data is from stock market stock time series has its own characteristics over other time series moreover dimensionality reduction is an essential step before many time series analysis and mining tasks for these reasons research is prompted to augment existing technologies and build new representation to manage financial time series data in this paper financial time series is represented according to the importance of the data points with the concept of data point importance tree data structure which supports incremental updating is proposed to represent the time series and an access method for retrieving the time series data points from the tree according to their order of importance is introduced this technique is capable of presenting the time series in different levels of detail and facilitating multi resolution dimensionality reduction of the time series data in this paper different data point importance evaluation methods new updating method and two dimensionality reduction approaches are proposed and evaluated by series of experiments finally the application of the proposed representation on mobile environment is demonstrated
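one common data point importance measure for financial series (perceptually important points) can be sketched as below: starting from the two endpoints, repeatedly add the point with the largest vertical distance to the chord joining its already selected neighbours; the tree structure and the specific importance measures evaluated in the paper are not reproduced here, this is only an illustration of the underlying idea.

# greedy selection of the most important points of a series by vertical distance
def important_points(series, num_points):
    selected = [0, len(series) - 1]
    while len(selected) < min(num_points, len(series)):
        best = None
        ordered = sorted(selected)
        for left, right in zip(ordered, ordered[1:]):
            for i in range(left + 1, right):
                # vertical distance from series[i] to the chord between left and right
                t = (i - left) / (right - left)
                dist = abs(series[i] - ((1 - t) * series[left] + t * series[right]))
                if best is None or dist > best[0]:
                    best = (dist, i)
        if best is None:
            break
        selected.append(best[1])
    return sorted(selected)

if __name__ == "__main__":
    prices = [10, 10.5, 11, 15, 11, 10.8, 10.2, 13, 10, 9.5]
    print(important_points(prices, 4))  # endpoints plus the two most important interior points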
through analysis and experiments this paper investigates two phase waiting algorithms to minimize the cost of waiting for synchronization in large scale multiprocessors in two phase algorithm thread first waits by polling synchronization variable if the cost of polling reaches limit lpoll and further waiting is necessary the thread is blocked incurring an additional fixed cost the choice of lpoll is critical determinant of the performance of two phase algorithms we focus on methods for statically determining lpoll because the run time overhead of dynamically determining lpoll can be comparable to the cost of blocking in large scale multiprocessor systems with lightweight threads our experiments show that always block lpoll is good waiting algorithm with performance that is usually close to the best of the algorithms compared we show that even better performance can be achieved with static choice of lpoll based on knowledge of likely wait time distributions motivated by the observation that different synchronization types exhibit different wait time distributions we prove that static choice of lpoll can yield close to optimal on line performance against an adversary that is restricted to choosing wait times from fixed family of probability distributions this result allows us to make an optimal static choice of lpoll based on synchronization type for exponentially distributed wait times we prove that setting lpoll results in waiting cost that is no more than times the cost of an optimal off line algorithm for uniformly distributed wait times we prove that setting lpoll square root of results in waiting cost that is no more than square root of the golden ratio times the cost of an optimal off line algorithm experimental measurements of several parallel applications on the alewife multiprocessor simulator corroborate our theoretical findings
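a minimal sketch of a two phase waiting algorithm, assuming python threading primitives and an arbitrary polling budget purely for illustration: the thread first polls the synchronization flag until the budget lpoll is exhausted and only then pays the fixed cost of blocking.

import threading, time

def two_phase_wait(flag: threading.Event, l_poll_seconds: float) -> None:
    deadline = time.monotonic() + l_poll_seconds
    # phase 1: spin, hoping the wait is short enough to avoid the blocking cost
    while time.monotonic() < deadline:
        if flag.is_set():
            return
    # phase 2: block, paying the fixed context switch cost once
    flag.wait()

if __name__ == "__main__":
    ev = threading.Event()
    threading.Timer(0.05, ev.set).start()
    two_phase_wait(ev, l_poll_seconds=0.01)
    print("synchronized")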
code sandboxing is useful for many purposes but most sandboxing techniques require kernel modifications do not completely isolate guest code or incur substantial performance costs vx is multipurpose user level sandbox that enables any application to load and safely execute one or more guest plug ins confining each guest to system call api controlled by the host application and to restricted memory region within the host’s address space vx runs guest code efficiently on several widespread operating systems without kernel extensions or special privileges it protects the host program from both reads and writes by its guests and it allows the host to restrict the instruction set available to guests the key to vx combination of portability flexibility and efficiency is its use of segmentation hardware to sandbox the guest’s data accesses along with lightweight instruction translator to sandbox guest instructions we evaluate vx using microbenchmarks and whole system benchmarks and we examine four applications based on vx an archival storage system an extensible public key infrastructure an experimental user level operating system running atop another host os and linux system call jail the first three applications export custom apis independent of the host os to their guests making their plug ins binary portable across host systems compute intensive workloads for the first two applications exhibit between slowdown and speedup on vx relative to native execution speedups result from vx instruction translator improving the cache locality of guest code the experimental user level operating system allows the use of the guest os’s applications alongside the host’s native applications and runs faster than whole system virtual machine monitors such as vmware and qemu the linux system call jail incurs up to overhead but requires no kernel modifications and is delegation based avoiding concurrency vulnerabilities present in other interposition mechanisms
little is known about the strategies end user programmers use in debugging their programs and even less is known about gender differences that may exist in these strategies without this type of information designers of end user programming systems cannot know the target at which to aim if they are to support male and female end user programmers we present study investigating this issue we asked end user programmers to debug spreadsheets and to describe their debugging strategies using mixed methods we analyzed their strategies and looked for relationships among participants strategy choices gender and debugging success our results indicate that males and females debug in quite different ways that opportunities for improving support for end user debugging strategies for both genders are abundant and that tools currently available to end user debuggers may be especially deficient in supporting debugging strategies used by females
visual complexity is an apparent feature in website design yet its effects on cognitive and emotional processing are not well understood the current study examined website complexity within the framework of aesthetic theory and psychophysiological research on cognition and emotion we hypothesized that increasing the complexity of websites would have detrimental cognitive and emotional impact on users in passive viewing task pvt website screenshots differing in their degree of complexity operationalized by jpeg file size correlation with complexity ratings in preliminary study were presented to participants in randomized order additionally standardized visual search task vst assessing reaction times and one week delayed recognition task on these websites were conducted and participants rated all websites for arousal and valence psychophysiological responses were assessed during the pvt and vst visual complexity was related to increased experienced arousal more negative valence appraisal decreased heart rate and increased facial muscle tension musculus corrugator visual complexity resulted in increased reaction times in the vst and decreased recognition rates reaction times in the vst were related to increases in heart rate and electrodermal activity these findings demonstrate that visual complexity of websites has multiple effects on human cognition and emotion including experienced pleasure and arousal facial expression autonomic nervous system activation task performance and memory it should thus be considered an important factor in website design
the vast expansion of interconnectivity with the internet and the rapid evolution of highly capable but largely insecure mobile devices threatens cellular networks in this paper we characterize the impact of the large scale compromise and coordination of mobile phones in attacks against the core of these networks through combination of measurement simulation and analysis we demonstrate the ability of botnet composed of as few as compromised mobile phones to degrade service to area code sized regions by as such attacks are accomplished through the execution of network service requests and not constant stream of phone calls users are unlikely to be aware of their occurrence we then investigate number of significant network bottlenecks their impact on the density of compromised nodes per base station and how they can be avoided we conclude by discussing number of countermeasures that may help to partially mitigate the threats posed by such attacks
many enterprise applications require the use of object oriented middleware and message oriented middleware in combination middleware mediated transactions have been proposed as transaction model to address reliability of such applications they extend distributed object transactions to include message oriented transactions in this paper we present three message queuing patterns that we have found useful for implementing middleware mediated transactions we discuss and show how the patterns can be applied to support guaranteed compensation in the engineering of transactional enterprise applications
join is fundamental operator in data stream management system dsms it is more efficient to share execution of multiple windowed joins than separate execution of everyone because the former saves part of cost in common windows therefore shared window join is adopted widely in multi queries dsms when all tasks of queries exceed maximum system capacity the overloaded dsms fails to process all of its input data and keep up with the rates of data arrival especially in time critical environment queries should be completed not just timely but within certain deadlines in this paper we address load shedding approach for shared window join over real time data streams load shedding algorithm ls sjrt cw is proposed to handle queries shared window join in overloaded real time system effectively it would reduce load shedding overhead by adjusting sliding window size experiment results show that our algorithm would decrease average deadline miss ratio over some ranges of workloads
as interactive multimedia communications are developing rapidly on the internet they present stringent challenges on end to end ee performance on the other hand however the internet’s architecture ipv remains almost the same as it was originally designed for only data transmission purpose and has experienced big hurdle to actualize qos universally this paper designs cooperatively overlay routing service cors aiming to overcome the performance limit inherent in the internet’s ip layer routing service the key idea of cors is to efficiently compose number of eligible application layer paths with suitable relays in the overlay network besides the direct ip path cors can transfer data simultaneously through one or more application layer paths to adaptively satisfy the data’s application specific requirements on ee performance simulation results indicate the proposed schemes are scalable and effective practical experiments based on prototype implemented on planetlab show that cors is feasible to enhance the transmission reliability and the quality of multimedia communications
sensors have been increasingly used for many ubiquitous computing applications such as asset location monitoring visual surveillance and human motion tracking in such applications it is important to place sensors such that every point of the target area can be sensed by more than one sensor especially many practical applications require coverage for triangulation hull building and etc also in order to extract meaningful information from the data sensed by multiple sensors those sensors need to be placed not too close to each other minimum separation requirement to address the coverage problem with the minimum separation requirement our recent work kim et al proposes two heuristic methods so called overlaying method and tre based method which complement each other depending on the minimum separation requirement for these two methods we also provide mathematical analysis that can clearly guide us when to use the tre based method and when to use the overlaying method and also how many sensors are required to make it self contained in this paper we first revisit the two heuristic methods then as an extension we present an ilp based optimal solution targeting for grid coverage with this ilp based optimal solution we investigate how much close the two heuristic methods are to the optimal solution finally this paper discusses the impacts of the proposed methods on real deployed systems using two example sensor systems to the best of our knowledge this is the first work that systematically addresses the coverage problem with the minimum separation requirement
mobile location aware applications have become quite popular across range of new areas such as pervasive games and mobile edutainment applications however it is only recently that approaches have been presented which combine gaming and education with mobile augmented reality systems however they typically lack close crossmedia integration of the surroundings and often annotate or extend the environment rather than modifying and altering it in this paper we present mobile outdoor mixed reality game for exploring the history of city in the spatial and the temporal dimension we introduce the design and concept of the game and present universal mechanism to define and setup multi modal user interfaces for the game challenges finally we discuss the results of the user tests
surrogate is an object that stands for document and enables navigation to that document hypermedia is often represented with textual surrogates even though studies have shown that image and text surrogates facilitate the formation of mental models and overall understanding surrogates may be formed by breaking document down into set of smaller elements each of which is surrogate candidate while processing these surrogate candidates from an html document relevant information may appear together with less useful junk material such as navigation bars and advertisements this paper develops pattern recognition based approach for eliminating junk while building the set of surrogate candidates the approach defines features on candidate elements and uses classification algorithms to make selection decisions based on these features for the purpose of defining features in surrogate candidates we introduce the document surrogate model dsm streamlined document object model dom like representation of semantic structure using quadratic classifier we were able to eliminate junk surrogate candidates with an average classification rate of by using this technique semiautonomous agents can be developed to more effectively generate surrogate collections for users we end by describing new approach for hypermedia and the semantic web which uses the dsm to define value added surrogates for document
we consider generic garbled circuit gc based techniques for secure function evaluation sfe in the semi honest model we describe efficient gc constructions for addition subtraction multiplication and comparison functions our circuits for subtraction and comparison are approximately two times smaller in terms of garbled tables than previous constructions this implies corresponding computation and communication improvements in sfe of functions using our efficient building blocks the techniques rely on recently proposed free xor gc technique further we present concrete and detailed improved gc protocols for the problem of secure integer comparison and related problems of auctions minimum selection and minimal distance performance improvement comes both from building on our efficient basic blocks and several problem specific gc optimizations we provide precise cost evaluation of our constructions which serves as baseline for future protocols
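the flavor of a comparison building block can be seen in the following plaintext sketch, which is an illustration of the circuit structure rather than the paper's exact construction or the cryptographic garbling itself: scanning the operands from least to most significant bit, each position needs a single and gate plus xor gates, and xors are free under the free xor technique, so the garbled table count grows with one non free gate per bit.

# plaintext evaluation of a bitwise greater-than circuit (one and gate per bit)
def greater_than_circuit(x_bits, y_bits):
    # x_bits and y_bits are lists of 0/1, least significant bit first
    carry = 0
    for xi, yi in zip(x_bits, y_bits):
        carry = xi ^ ((xi ^ carry) & (yi ^ carry))  # the single and gate per bit
    return carry  # 1 iff x > y

def to_bits(v, n):
    return [(v >> i) & 1 for i in range(n)]

if __name__ == "__main__":
    n = 8
    for x, y in [(5, 3), (3, 5), (7, 7), (200, 199)]:
        assert greater_than_circuit(to_bits(x, n), to_bits(y, n)) == (1 if x > y else 0)
    print("comparison circuit ok")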
the restricted correspondence problem is the task of solving the classical stereo correspondence problem when the surface being observed is known to belong to family of surfaces that vary in known way with one or more parameters under this constraint the surface can be extracted far more robustly than by classical stereo applied to an arbitrary surface since the problem is solved semi globally rather than locally for each epipolar line here the restricted correspondence problem is solved for two examples the first being the extraction of the parameters of an ellipsoid from calibrated stereo pair the second example is the estimation of the osculating paraboloid at the frontier points of convex object
recently research on text mining has attracted lots of attention from both industrial and academic fields text mining concerns discovering unknown patterns or knowledge from large text repository the problem is not easy to tackle due to the semi structured or even unstructured nature of those texts under consideration many approaches have been devised for mining various kinds of knowledge from texts one important aspect of text mining is automatic text categorization which assigns text document to some predefined category if the document falls into the theme of the category traditionally the categories are arranged in hierarchical manner to achieve effective searching and indexing as well as easy comprehension for human beings the determination of category themes and their hierarchical structures was mostly done by human experts in this work we developed an approach to automatically generate category themes and reveal the hierarchical structure among them we also used the generated structure to categorize text documents the document collection was trained by self organizing map to form two feature maps these maps were then analyzed to obtain the category themes and their structure although the test corpus contains documents written in chinese the proposed approach can be applied to documents written in any language and such documents can be transformed into list of separated terms
unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints ie pairs of instances labeled as belonging to same or different clusters in recent years number of algorithms have been proposed for enhancing clustering quality by employing such supervision such methods use the constraints to either modify the objective function or to learn the distance measure we propose probabilistic model for semi supervised clustering based on hidden markov random fields hmrfs that provides principled framework for incorporating supervision into prototype based clustering the model generalizes previous approach that combines constraints and euclidean distance learning and allows the use of broad range of clustering distortion measures including bregman divergences eg euclidean distance and divergence and directional similarity measures eg cosine similarity we present an algorithm that performs partitional semi supervised clustering of data by minimizing an objective function derived from the posterior energy of the hmrf model experimental results on several text data sets demonstrate the advantages of the proposed framework
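a simplified sketch of the hmrf style objective, assuming squared euclidean distortion, a fixed penalty weight and a greedy icm like assignment step (all choices of this sketch, not the paper's full em procedure): k means assignment costs are augmented with penalties for violated must link and cannot link constraints.

# constrained k-means: distortion plus penalties for violated pairwise constraints
import numpy as np

def constrained_kmeans(X, k, must_link, cannot_link, w=10.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        for i in range(len(X)):  # greedy point-by-point assignment
            costs = ((X[i] - centers) ** 2).sum(axis=1)
            for (a, b) in must_link:
                j = b if a == i else a if b == i else None
                if j is not None:
                    costs += w * (np.arange(k) != labels[j])  # penalize splitting the pair
            for (a, b) in cannot_link:
                j = b if a == i else a if b == i else None
                if j is not None:
                    costs += w * (np.arange(k) == labels[j])  # penalize merging the pair
            labels[i] = int(np.argmin(costs))
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

if __name__ == "__main__":
    X = np.array([[0.0, 0], [0.1, 0], [5, 5], [5.1, 5]])
    print(constrained_kmeans(X, 2, must_link=[(0, 1)], cannot_link=[(1, 2)]))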
evaluative texts on the web have become valuable source of opinions on products services events individuals etc recently many researchers have studied such opinion sources as product reviews forum posts and blogs however existing research has been focused on classification and summarization of opinions using natural language processing and data mining techniques an important issue that has been neglected so far is opinion spam or trustworthiness of online opinions in this paper we study this issue in the context of product reviews which are opinion rich and are widely used by consumers and product manufacturers in the past two years several startup companies also appeared which aggregate opinions from product reviews it is thus high time to study spam in reviews to the best of our knowledge there is still no published study on this topic although web spam and email spam have been investigated extensively we will see that opinion spam is quite different from web spam and email spam and thus requires different detection techniques based on the analysis of million reviews and million reviewers from amazoncom we show that opinion spam in reviews is widespread this paper analyzes such spam activities and presents some novel techniques to detect them
given string and language the hamming distance of to is the minimum hamming distance of to any string in the edit distance of string to language is analogously defined first we prove that there is language in ac such that both hamming and edit distance to this language are hard to approximate they cannot be approximated with factor for any unless np denotes the length of the input string second we show the parameterized intractability of computing the hamming distance we prove that for every there exists language in ac for which computing the hamming distance is hard moreover there is language in for which computing the hamming distance is wp hard then we show that the problems of computing the hamming distance and of computing the edit distance are in some sense equivalent by presenting approximation ratio preserving reductions from the former to the latter and vice versa finally we define hamp to be the class of languages to which the hamming distance can efficiently ie in polynomial time be computed we show some properties of the class hamp on the other hand we give evidence that characterization in terms of automata or formal languages might be difficult
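for the easy finite case the distance notion is straightforward to state in code, as in the sketch below; the hardness results above concern succinctly described languages, where such brute force enumeration is of course unavailable.

# hamming distance of a string to a (finite) language: minimum over equal-length members
def hamming(u, v):
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

def hamming_distance_to_language(w, language):
    candidates = [v for v in language if len(v) == len(w)]
    return min(hamming(w, v) for v in candidates) if candidates else None

if __name__ == "__main__":
    L = {"0000", "1111", "0110"}
    print(hamming_distance_to_language("0100", L))  # 1, witnessed by 0000 or 0110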
in this paper new color space called the rgb color ratio space is proposed and defined according to reference color such that an image can be transformed from conventional color space to the rgb color ratio space because color in the rgb color ratio space is represented as three color ratios and intensity the chrominance can be completely preserved three color ratios and the luminance can be de correlated with the chrominance different from traditional distance measurement road color model is determined by an ellipse area in the rgb ratio space enclosed by the estimated boundaries proposed adaptive fuzzy logic in which fuzzy membership functions are defined according to estimated boundaries is introduced to implement clustering rules therefore each pixel will have its own fuzzy membership function corresponding to its intensity basic neural network is trained and used to achieve parameters optimization the low computation cost of the proposed segmentation method shows the feasibility for real time application experimental results for road detection demonstrate the robustness to intensity variation of the proposed approach
we show the decidability of model checking pa processes against several first order logics based upon the reachability predicate the main tool for this result is the recognizability by tree automata of the reachability relation the tree automata approach and the transition logics we use allow smooth and general treatment of parameterized model checking for pa this approach is extended to handle quite general notion of costs of pa steps in particular when costs are parikh images of traces we show decidability of transition logic extended by some form of first order reasoning over costs
hierarchical agent framework is proposed to construct monitoring layer towards self aware parallel systems on chip socs with monitoring services as new design dimension systems are capable of observing and reconfiguring themselves dynamically at all levels of granularity based on application requirements and platform conditions agents with hierarchical priorities work adaptively and cooperatively to maintain and improve system performance in the presence of variations and faults function partitioning of agents and hierarchical monitoring operations on parallel socs are analyzed applying the design approach on the network on chip noc platform demonstrates the design process and benefits using the novel approach
proliferation of portable wireless enabled laptop computers and pdas cost effective deployment of access points and availability of the license exempt bands and appropriate networking standards contribute to the conspicuous success of ieee wlans in the article we provide comprehensive overview of techniques for capacity improvement and qos provisioning in the ieee protocol family these techniques represent the efforts both in the research community and the ieee working groups specifically we summarize the operations of ieee legacy as well as its extension introduce several protocol modeling techniques and categorize the various approaches to improve protocol capacity to provide qos by either devising new mac protocol components or fine tuning protocol parameters in ieee and to judiciously arbitrate radio resources eg transmission rate and power to demonstrate how to adapt qos provisioning in newly emerging areas we use the wireless mesh network as an example discuss the role ieee plays in such network and outline research issues that arise
digital audio video data have become an integral part of multimedia information systems to reduce storage and bandwidth requirements they are commonly stored in compressed format such as mpeg increasing amounts of mpeg encoded audio and video documents are available online and in proprietary collections in order to effectively utilise them we need tools and techniques to automatically analyse segment and classify mpeg video content several techniques have been developed both in the audio and visual domain to analyse videos this paper presents survey of audio and visual analysis techniques on mpeg encoded media that are useful in supporting variety of video applications although audio and visual feature analyses have been carried out extensively they become useful to applications only when they convey semantic meaning of the video content therefore we also present survey of works that provide semantic analysis on mpeg encoded videos
bottom sketch is summary of set of items with nonnegative weights each such summary allows us to compute approximate aggregates over the set of items bottom sketches are obtained by associating with each item in ground set an independent random rank drawn from probability distribution that depends on the weight of the item for each subset of interest the bottom sketch is the set of the minimum ranked items and their ranks bottom sketches have numerous applications we develop and analyze data structures and estimators for bottom sketches to facilitate their deployment we develop novel estimators and algorithms that show that they are superior alternative to other sketching methods in both efficiency of obtaining the sketches and the accuracy of the estimates derived from the sketches
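a minimal sketch of a bottom k sketch with exponentially distributed ranks, using the classical (k-1)/r_k estimator of total weight as an illustration rather than the paper's specific estimators: each item receives rank -ln(u)/w for a uniform u, the sketch keeps the k smallest ranks, and the k-th smallest rank yields the weight estimate.

import heapq, math, random

def bottom_k_sketch(weighted_items, k, seed=0):
    rng = random.Random(seed)
    # rank of an item with weight w is an exponential variate with rate w
    ranked = [(-math.log(1.0 - rng.random()) / w, item, w) for item, w in weighted_items]
    return heapq.nsmallest(k, ranked)  # list of (rank, item, weight), ranks ascending

def estimate_total_weight(sketch, k):
    kth_rank = sketch[-1][0]
    return (k - 1) / kth_rank  # classical estimator for exponential ranks

if __name__ == "__main__":
    items = [("item%d" % i, 1.0 + (i % 5)) for i in range(10000)]
    true_total = sum(w for _, w in items)
    sketch = bottom_k_sketch(items, k=256)
    print(true_total, round(estimate_total_weight(sketch, k=256), 1))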
novel touch based interaction method by use of orientation information of touch region is proposed to capture higher dimensional information of touch including position and an orientation as well we develop robust algorithms to detect contact shape and to estimate its orientation angle also we suggest practical guidelines to use our method through experiments considering various conditions and show possible service scenarios of aligning documents and controlling media player
this paper offers theoretical study of constraint simplification fundamental issue for the designer of practical type inference system with subtyping in the simpler case where constraints are equations simple isomorphism between constrained type schemes and finite state automata yields complete constraint simplification method using it as guide for the intuition we move on to the case of subtyping and describe several simplification algorithms although no longer complete they are conceptually simple efficient and very effective in practice overall this paper gives concise theoretical account of the techniques found at the core of our type inference system our study is restricted to the case where constraints are interpreted in non structural lattice of regular terms nevertheless we highlight small number of general ideas which explain our algorithms at high level and may be applicable to variety of other systems
personalization of learning has become prominent issue in the educational field at various levels this article elaborates different view on personalisation than what usually occurs in this area its baseline is that personalisation occurs when learning turns out to become personal in the learner’s mind through literature survey we analyze constitutive dimensions of this inner sense of personalisation here we devote special attention to confronting learners with tracked information making their personal interaction footprints visible contrasts with the back office usage of this data by researchers instructors or adaptive systems we contribute prototype designed for the moodle platform according to the conceptual approach presented here
the fluid documents project has developed various research prototypes that show that powerful annotation techniques based on animated typographical changes can help readers utilize annotations more effectively our recently developed fluid open hypermedia prototype supports the authoring and browsing of fluid annotations on third party web pages this prototype is an extension of the arakne environment an open hypermedia application that can augment web pages with externally stored hypermedia structures this paper describes how various web standards including dom css xlink xpointer and rdf can be used and extended to support fluid annotations
wireless sensor networks have created new opportunities for data collection in variety of scenarios such as environmental and industrial where we expect data to be temporally and spatially correlated researchers may want to continuously collect all sensor data from the network for later analysis suppression both temporal and spatial provides opportunities for reducing the energy cost of sensor data collection we demonstrate how both types can be combined for maximal benefit we frame the problem as one of monitoring node and edge constraints monitored node triggers report if its value changes monitored edge triggers report if the difference between its nodes values changes the set of reports collected at the base station is used to derive all node values we fully exploit the potential of this global inference in our algorithm conch short for constraint chaining constraint chaining builds network of constraints that are maintained locally but allow global view of values to be maintained with minimal cost network failure complicates the use of suppression since either causes an absence of reports we add enhancements to conch to build in redundant constraints and provide method to interpret the resulting reports in case of uncertainty using simulation we experimentally evaluate conch’s effectiveness against competing schemes in number of interesting scenarios
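a minimal sketch of the suppression idea, with an illustrative chain topology and exact change detection standing in for conch's actual constraint selection: a monitored node reports only when its own value changes, a monitored edge reports only when the difference between its endpoints changes, and the base station re derives all values by chaining the last reported differences from a reported node.

# temporal plus spatial suppression over a chain of monitored nodes and edges
def collect_reports(readings, prev_readings, monitored_node, edges):
    reports = {}
    if readings[monitored_node] != prev_readings[monitored_node]:
        reports[("node", monitored_node)] = readings[monitored_node]
    for (a, b) in edges:
        diff = readings[b] - readings[a]
        prev_diff = prev_readings[b] - prev_readings[a]
        if diff != prev_diff:
            reports[("edge", (a, b))] = diff
    return reports

def infer_values(last_node_value, last_edge_diffs, edges, monitored_node):
    # edges are assumed ordered as a chain rooted at the monitored node
    values = {monitored_node: last_node_value}
    for (a, b) in edges:
        values[b] = values[a] + last_edge_diffs[(a, b)]
    return values

if __name__ == "__main__":
    edges = [("n0", "n1"), ("n1", "n2")]
    prev = {"n0": 20, "n1": 21, "n2": 22}
    cur = {"n0": 23, "n1": 24, "n2": 25}  # spatially correlated shift
    print(collect_reports(cur, prev, "n0", edges))  # only the root node reports
    print(infer_values(cur["n0"], {("n0", "n1"): 1, ("n1", "n2"): 1}, edges, "n0"))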
researchers have recently discovered several interesting self organized regularities from the world wide web ranging from the structure and growth of the web to the access patterns in web surfing what remains to be great challenge in web log mining is how to explain user behavior underlying observed web usage regularities in this paper we will address the issue of how to characterize the strong regularities in web surfing in terms of user navigation strategies and present an information foraging agent based approach to describing user behavior by experimenting with the agent based decision models of web surfing we aim to explain how some web design factors as well as user cognitive factors may affect the overall behavioral patterns in web usage
most video retrieval systems are multimodal commonly relying on textual information low and high level semantic features extracted from query visual examples in this work we study the impact of exploiting different knowledge sources in order to automatically retrieve query visual examples relevant to video retrieval task our hypothesis is that the exploitation of external knowledge sources can help in the identification of query semantics as well as in improving the understanding of video contents we propose set of techniques to automatically obtain additional query visual examples from different external knowledge sources such as dbpedia flickr and google images which have different coverage and structure characteristics the proposed strategies attempt to exploit the semantics underlying the above knowledge sources to reduce the ambiguity of the query and to focus the scope of the image searches in the repositories we assess and compare the quality of the images obtained from the different external knowledge sources when used as input of number of video retrieval tasks we also study how much they complement manually provided sets of examples such as those given by trecvid tasks based on our experimental results we report which external knowledge source is more likely to be suitable for the evaluated retrieval tasks results also demonstrate that the use of external knowledge can be a good complement to manually provided examples and when visual examples provided by the user are lacking our proposed approaches can retrieve visual examples to improve the user’s query
we introduce light weight scalable truthful routing protocol lstop for selfish nodes problem in mobile ad hoc networks where node may use different cost to send packets to different neighbours lstop encourages nodes cooperation by rewarding nodes for their forwarding service according to their cost it incurs low overhead of in the worst case and only on the average we show the truthfulness of lstop and present the result of an extensive simulation study to show that lstop approaches optimal cost routing and achieves significantly better network performance compared to ad hoc vcg
dynamic binary translators dbts provide powerful platforms for building dynamic program monitoring and adaptation tools dbts however have high memory demands because they cache translated code and auxiliary code to software code cache and must also maintain data structures to support the code cache the high memory demands make it difficult for memory constrained embedded systems to take advantage of dbt based tools previous research on dbt memory management focused on the translated code and auxiliary code only however we found that data structures are comparable to the code cache in size we show that the translated code size auxiliary code size and the data structure size interact in complex manner depending on the path selection trace selection and link formation strategy therefore holistic memory efficiency comprising translated code auxiliary code and data structures cannot be improved by focusing on the code cache only in this paper we use path selection for improving holistic memory efficiency which in turn impacts performance in memory constrained environments although there has been previous research on path selection such research only considered performance in memory unconstrained environments the challenge for holistic memory efficiency is that the path selection strategy results in complex interactions between the memory demand components also individual aspects of path selection and the holistic memory efficiency may impact performance in complex ways we explore these interactions to motivate path selection targeting holistic memory demand we enumerate all the aspects involved in path selection design and evaluate comprehensive set of approaches for each aspect finally we propose path selection strategy that reduces memory demands by and at the same time improves performance by compared to an industrial strength dbt
effective identification of coexpressed genes and coherent patterns in gene expression data is an important task in bioinformatics research and biomedical applications several clustering methods have recently been proposed to identify coexpressed genes that share similar coherent patterns however there is no objective standard for groups of coexpressed genes the interpretation of co expression heavily depends on domain knowledge furthermore groups of coexpressed genes in gene expression data are often highly connected through large number of intermediate genes there may be no clear boundaries to separate clusters clustering gene expression data also faces the challenges of satisfying biological domain requirements and addressing the high connectivity of the data sets in this paper we propose an interactive framework for exploring coherent patterns in gene expression data novel coherent pattern index is proposed to give users highly confident indications of the existence of coherent patterns to derive coherent pattern index and facilitate clustering we devise an attraction tree structure that summarizes the coherence information among genes in the data set we present efficient and scalable algorithms for constructing attraction trees and coherent pattern indices from gene expression data sets our experimental results show that our approach is effective in mining gene expression data and is scalable for mining large data sets
we address specific enterprise document search scenario where the information need is expressed in an elaborate manner in our scenario information needs are expressed using short query of few keywords together with examples of key reference pages given this setup we investigate how the examples can be utilized to improve the end to end performance on the document retrieval task our approach is based on language modeling framework where the query model is modified to resemble the example pages we compare several methods for sampling expansion terms from the example pages to support query dependent and query independent query expansion the latter is motivated by the wish to increase aspect recall and attempts to uncover aspects of the information need not captured by the query for evaluation purposes we use the csiro data set created for the trec enterprise track the best performance is achieved by query models based on query independent sampling of expansion terms from the example documents
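A small illustrative sketch of expanding a short keyword query with terms sampled from the example pages, in the spirit of the language-modeling approach described above; the mixing weight, top-k selection rule, and function names are assumptions rather than the authors' exact estimator.

```python
# Illustrative query expansion from example documents (not the paper's exact method).
from collections import Counter

def expand_query(query_terms, example_docs, k=10, mix=0.5):
    # empirical term distribution over the concatenated example pages
    counts = Counter(t for doc in example_docs for t in doc.lower().split())
    total = sum(counts.values()) or 1
    expansion = {t: c / total for t, c in counts.most_common(k) if t not in query_terms}

    # interpolate the original (uniform) query model with the sampled expansion terms
    model = {t: (1 - mix) / len(query_terms) for t in query_terms}
    norm = sum(expansion.values()) or 1.0
    for t, w in expansion.items():
        model[t] = model.get(t, 0.0) + mix * w / norm
    return model  # term -> probability, usable for query-likelihood retrieval

print(expand_query(["enterprise", "search"],
                   ["enterprise search over intranet pages", "intranet document retrieval"]))
```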
the view update problem is concerned with indirectly modifying those tuples that satisfy view or derived table by an appropriate update against the corresponding base tables the notion of deduction tree is defined and the relationship between such trees and the view update problem for indefinite deductive databases is considered it is shown that traversal of an appropriate deduction tree yields sufficient information to perform view updates at the propositional level to obtain similar result at the first order level it is necessary for theoretical and computational reasons to impose some weak stratification and definiteness constraints on the database
this paper describes new method for contouring signed grid whose edges are tagged by hermite data ie exact intersection points and normals this method avoids the need to explicitly identify and process features as required in previous hermite contouring methods using new numerically stable representation for quadratic error functions we develop an octree based method for simplifying contours produced by this method we next extend our contouring method to these simplified octrees this new method imposes no constraints on the octree such as being restricted octree and requires no crack patching we conclude with simple test for preserving the topology of the contour during simplification
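As background for the quadratic error functions mentioned above, here is a minimal sketch of placing a cell vertex by minimizing a QEF built from Hermite samples (intersection points p_i and unit normals n_i). It uses a plain least-squares solve; the paper's numerically stable QEF representation is not reproduced, and the sample data is hypothetical.

```python
# Minimal QEF minimization sketch: minimize sum_i (n_i . (x - p_i))^2 over x.
import numpy as np

def qef_vertex(points, normals):
    A = np.asarray(normals, dtype=float)                       # one row per Hermite sample
    b = np.einsum("ij,ij->i", A, np.asarray(points, dtype=float))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)                  # pseudo-inverse handles rank deficiency
    return x

# hypothetical cell with three tagged edges: the minimizer lands on the corner (1, 1, 1)
pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
nrm = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(qef_vertex(pts, nrm))
```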
benchmarking is critical when evaluating performance but is especially difficult for file and storage systems complex interactions between devices caches kernel daemons and other os components result in behavior that is rather difficult to analyze moreover systems have different features and optimizations so no single benchmark is always suitable the large variety of workloads that these systems experience in the real world also adds to this difficulty in this article we survey file system and storage benchmarks from recent papers we found that most popular benchmarks are flawed and many research papers do not provide clear indication of true performance we provide guidelines that we hope will improve future performance evaluations to show how some widely used benchmarks can conceal or overemphasize overheads we conducted set of experiments as specific example slowing down read operations on ext by factor of resulted in only percent wall clock slowdown in popular compile benchmark finally we discuss future work to improve file system and storage benchmarking
vector and matrix clocks are extensively used in asynchronous distributed systems this paper asks how does the clock abstraction generalize to address this problem the paper motivates and proposes logical clocks of arbitrary dimensions it then identifies and explores the conceptual link between such clocks and knowledge it establishes the necessary and sufficient conditions on the size and dimension of clocks required to attain any specified level of knowledge about the timestamp of the most recent system state for which this is possible without using any messages in the clock protocol the paper then gives algorithms to determine the time stamp of the latest system state about which specified level of knowledge is attainable in given system state and to compute the timestamp of the earliest system state in which specified level of knowledge about given system state is attainable the results are applicable to applications that deal with certain class of properties identified as monotonic properties
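As background for the clock abstraction discussed above, the sketch below shows a standard vector clock, the basic case that matrix and higher-dimensional clocks generalize. The API shape is illustrative.

```python
# Standard vector clock sketch (the one-dimensional case of the generalization above).
class VectorClock:
    def __init__(self, n, pid):
        self.v = [0] * n          # one entry per process
        self.pid = pid

    def local_event(self):
        self.v[self.pid] += 1     # tick own component on every local event

    def send(self):
        self.local_event()
        return list(self.v)       # timestamp piggybacked on the message

    def receive(self, ts):
        self.v = [max(a, b) for a, b in zip(self.v, ts)]   # component-wise max, then tick
        self.local_event()

def happened_before(a, b):
    """a -> b iff a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b
```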
we show an efficient secure two party protocol based on yao’s construction which provides security against malicious adversaries yao’s original protocol is only secure in the presence of semi honest adversaries security against malicious adversaries can be obtained by applying the compiler of goldreich micali and wigderson the gmw compiler however this approach does not seem to be very practical as it requires using generic zero knowledge proofs our construction is based on applying cut and choose techniques to the original circuit and inputs security is proved according to the ideal real simulation paradigm and the proof is in the standard model with no random oracle model or common reference string assumptions the resulting protocol is computationally efficient the only usage of asymmetric cryptography is for running oblivious transfers for each input bit or for each bit of statistical security parameter whichever is larger our protocol combines techniques from folklore like cut and choose along with new techniques for efficiently proving consistency of inputs we remark that naive implementation of the cut and choose technique with yao’s protocol does not yield secure protocol this is the first paper to show how to properly implement these techniques and to provide full proof of security our protocol can also be interpreted as constant round black box reduction of secure two party computation to oblivious transfer and perfectly hiding commitments or black box reduction of secure two party computation to oblivious transfer alone with number of rounds which is linear in statistical security parameter these two reductions are comparable to kilian’s reduction which uses ot alone but incurs number of rounds which is linear in the depth of the circuit
reuse signature or reuse distance pattern is an accurate model for program memory accessing behaviors it has been studied and shown to be effective in program analysis and optimizations by many recent works however the high overhead associated with reuse distance measurement restricts the scope of its application this paper explores applying sampling in reuse signature collection to reduce the time overhead we compare different sampling strategies and show that an enhanced systematic sampling with uniform coverage of all distance ranges can be used to extrapolate the reuse distance distribution based on that analysis we present novel sampling method with measurement accuracy of more than our average speedup of reuse signature collection is while the best improvement observed is this is the first attempt to utilize sampling in measuring reuse signatures experiments with varied programs and instrumentation tools show that sampling has great potential in promoting the practical uses of reuse signatures and enabling more optimization opportunities
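A simple sketch of reuse-distance measurement plus sampling: the stack-distance computation is the standard one, while the sampling policy shown here (keep every k-th access in the histogram) is a simplification of the enhanced systematic sampling described above, not the paper's scheme.

```python
# Reuse-distance histogram with naive systematic sampling (illustrative only).
from collections import Counter

def reuse_distances(trace, sample_every=1):
    stack, hist = [], Counter()                       # LRU stack; histogram of sampled distances
    for i, addr in enumerate(trace):
        if addr in stack:
            d = len(stack) - 1 - stack.index(addr)    # distinct addresses since the last use
            stack.remove(addr)
        else:
            d = None                                  # first touch (infinite distance)
        stack.append(addr)
        if i % sample_every == 0:                     # systematic sampling of accesses
            hist[d] += 1
    return hist                                       # extrapolate by scaling counts by sample_every

print(reuse_distances(list("abcabca"), sample_every=2))
```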
embedded system designers face unique set of challenges in making their systems more secure as these systems often have stringent resource constraints or must operate in harsh or physically insecure environments one of the security issues that have recently drawn attention is software integrity which ensures that the programs in the system have not been changed either by an accident or an attack in this paper we propose an efficient hardware mechanism for runtime verification of software integrity using encrypted instruction block signatures we introduce several variations of the basic mechanism and give details of three techniques that are most suitable for embedded systems performance evaluation using selected mibench mediabench and basicrypt benchmarks indicates that the considered techniques impose relatively small performance overhead the best overall technique has performance overhead in the range when protecting byte instruction blocks with byte signatures with byte instruction blocks the overhead is in the range the average overhead with kb cache is with additional investment in signature cache this overhead can be almost completely eliminated
clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources the organization of the data cache is key factor in these processors due to its effect on cache miss rate and inter cluster communications this paper investigates alternative designs of the data cache centralized distributed replicated and physically distributed cache architectures are analyzed results show similar average performance but significant performance variations depending on the application features especially cache miss ratio and communications in addition we also propose novel instruction steering scheme in order to reduce communications this scheme conditionally stalls the dispatch of instructions depending on the occupancy of the clusters whenever the current instruction cannot be steered to the cluster holding most of the inputs this new steering outperforms traditional schemes results show an average speedup of and up to for some applications
age specific human computer interaction ashci has vast potential applications in daily life however automatic age estimation technique is still underdeveloped one of the main reasons is that the aging effects on human faces present several unique characteristics which make age estimation challenging task that requires non standard classification approaches according to the speciality of the facial aging effects this paper proposes the ages aging pattern subspace method for automatic age estimation the basic idea is to model the aging pattern which is defined as sequence of personal aging face images by learning representative subspace the proper aging pattern for an unseen face image is then determined by the projection in the subspace that can best reconstruct the face image while the position of the face image in that aging pattern will indicate its age the ages method has shown encouraging performance in the comparative experiments either as an age estimator or as an age range estimator
this paper presents an empirical study that evaluates oo method function points oomfp functional size measurement procedure for object oriented systems that are specified using the oo method approach laboratory experiment with students was conducted to compare oomfp with the ifpug function point analysis fpa procedure on range of variables including efficiency reproducibility accuracy perceived ease of use perceived usefulness and intention to use the results show that oomfp is more time consuming than fpa but the measurement results are more reproducible and accurate the results also indicate that oomfp is perceived to be more useful and more likely to be adopted in practice than fpa in the context of oo method systems development we also report lessons learned and suggest improvements to the experimental procedure employed and replications of this study using samples of industry practitioners
as an approach that applies not only to support user navigation on the web recommender systems have been built to assist and augment the natural social process of asking for recommendations from other people in typical recommender system people provide suggestions as inputs which the system aggregates and directs to appropriate recipients in some cases the primary computation is in the aggregation in others the value of the system lies in its ability to make good matches between the recommenders and those seeking recommendations in this paper we discuss the architectural and design features of webmemex system that provides recommended information based on the captured history of navigation from list of people well known to the users including the users themselves allows users to have access from any networked machine demands user authentication to access the repository of recommendations and allows users to specify when the capture of their history should be performed
as the world uses more digital video that requires greater storage space grid computing is becoming indispensable for urgent problems in multimedia content analysis parallel horus support tool for applications in multimedia grid computing lets users implement multimedia applications as sequential programs for efficient execution on clusters and grids based on wide area multimedia services
traditionally software pipelining is applied either to the innermost loop of given loop nest or from the innermost loop to outer loops in this paper we propose three step approach called single dimension software pipelining ssp to software pipeline loop nest at an arbitrary loop level the first step identifies the most profitable loop level for software pipelining in terms of initiation rate or data reuse potential the second step simplifies the multi dimensional data dependence graph ddg into dimensional ddg and constructs dimensional schedule for the selected loop level the third step derives simple mapping function which specifies the schedule time for the operations of the multi dimensional loop based on the dimensional schedule we prove that the ssp method is correct and at least as efficient as other modulo scheduling methods we establish the feasibility and correctness of our approach by implementing it on the ia architecture experimental results on small number of loops show significant performance improvements over existing modulo scheduling methods that software pipeline loop nest from the innermost loop
the problem of performing tasks on asynchronous or undependable processors is basic problem in distributed computing this paper considers an abstraction of this problem called write all using processors write into all locations of an array of size in this problem writing abstracts the notion of performing simple task despite substantial research there is dearth of efficient deterministic asynchronous algorithms for write all efficiency of algorithms is measured in terms of work that accounts for all local steps performed by the processors in solving the problem thus an optimal algorithm would have work however it is known that optimality cannot be achieved when the quest then is to obtain work optimal solutions for this problem using non trivial compared to number of processors recently it was shown that optimality can be achieved using non trivial number of processors where log the new result in this paper significantly extends the range of processors for which optimality is achieved the result shows that optimality can be achieved using close to processors more precisely using log processors for any additionally the new result uses only the atomic read write memory without resorting to using the test and set primitive that was necessary in the previous solution this paper presents the algorithm and gives its analysis showing that the work complexity of the algorithm is which is optimal when while all prior deterministic algorithms require super linear work when
given sequence of symbols over some alphabet sigma of size sigma we develop new compression methods that are very simple to implement ii provide time random access to any symbol or short substring of the original sequence our simplest solution uses at most bits of space where and is the zeroth order empirical entropy of we discuss number of improvements and trade offs over the basic method for example we can achieve bits of space for log sigma several applications are discussed including text compression compressed full text indexing and string matching
most search engines display some document metadata such as title snippet and url in conjunction with the returned hits to aid users in determining documents however metadata is usually fragmented pieces of information that even when combined does not provide an overview of returned document in this paper we propose mechanism of enriching metadata of the returned results by incorporating automatically extracted document keyphrases with each returned hit we hypothesize that keyphrases of document can better represent the major theme in that document therefore by examining the keyphrases in each returned hit users can better predict the content of documents and the time spent on downloading and examining the irrelevant documents will be reduced substantially
when the mobile environment consists of light weight devices the energy consumption of location based services lbss and the limited bandwidth of the wireless network become important issues motivated by this we propose new spatial query processing algorithms to support mobile continuous nearest neighbor query mcnnq in wireless broadcast environments our solution provides general client server architecture for answering mcnnq on objects with unknown and possibly variable movement types our solution enables the application of spatio temporal access methods specifically designed for particular type to arbitrary movements without any false misses our algorithm does not require any conventional spatial index for mcnnq processing it can be adapted to static or moving objects and does not require additional knowledge eg direction of moving objects beyond the maximum speed and the location of each object extensive experiments demonstrate that our location based data dissemination algorithm significantly outperforms index based solutions
creating maintaining or using digital library requires the manipulation of digital documents information workspaces provide visual representation allowing users to collect organize annotate and author information the visual knowledge builder vkb helps users access collect annotate and combine materials from digital libraries and other sources into personal information workspace vkb has been enhanced to include direct search interfaces for nsdl and google users create visualization of search results while selecting and organizing materials for their current activity additionally metadata applicators have been added to vkb this interface allows the rapid addition of metadata to documents and aids the user in the extraction of existing metadata for application to other documents study was performed to compare the selection and organization of documents in vkb to the commonly used tools of a web browser and word processor this study shows the value of visual workspaces for such effort but points to the need for subdocument level objects ephemeral visualizations and support for moving from visual representations to metadata
growing attention is being paid to application security at requirements engineering time confidentiality is particular subclass of security concerns that requires sensitive information to never be disclosed to unauthorized agents disclosure refers to undesired knowledge states of such agents in previous work we have extended our requirements specification framework with epistemic constructs for capturing what agents may or may not know about the application roughly an agent knows some property if the latter is found in the agent’s memory this paper makes the semantics of such constructs further precise through formal model of how sensitive information may appear or disappear in an agent’s memory based on this extended framework catalog of specification patterns is proposed to codify families of confidentiality requirements proof of concept tool is presented for early checking of requirements models against such confidentiality patterns in case of violation the counterexample scenarios generated by the tool show how an unauthorized agent may acquire confidential knowledge counter measures should then be devised to produce further confidentiality requirements
constructing multipath routes in manets is important for providing reliable delivery load balancing and bandwidth aggregation however popular multipath routing approaches fail to produce spatially disjoint routes in simple and cost effective manner and existing single path approaches cannot be easily modified to produce multiple disjoint routes in this paper we propose electric field based routing efr as reliable framework for routing in manets by applying the concept of electric field lines our location based protocol naturally provides spatially disjoint routes based on the shapes of these lines the computation is highly localized and requires no explicit coordination among routes efr can also be easily extended to offer load balancing bandwidth aggregation and power management through simulation efr shows higher delivery ratio and lower overhead under high mobility high network loads and network failures compared to popular multipath and location based schemes efr also demonstrates high resiliency to dos attacks
simulation is widely used for developing evaluating and analyzing sensor network applications especially when deploying large scale sensor network remains expensive and labor intensive however due to its computation intensive nature existing simulation tools have to make trade offs between fidelity and scalability and thus offer limited capabilities as design and analysis tools in this paper we introduce disens distributed sensor network simulation highly scalable distributed simulation system for sensor networks disens not only faithfully emulates an extensive set of sensor hardware and supports extensible radio power models so that sensor network applications can be simulated transparently with high fidelity but also employs distributed memory parallel cluster system to attack the complex simulation problem combining an efficient distributed synchronization protocol and sophisticated node partitioning algorithm based on existing research disens achieves greater scalability than even many discrete event simulators on small to medium size cluster nodes disens is able to simulate hundreds of motes in realtime speed and scale to thousands in sub realtime speed to our knowledge disens is the first full system sensor network simulator with such scalability
this paper provides an extensive survey of the different methods of addressing security issues in the grid computing environment and specifically contributes to the research environment by developing comprehensive framework for classification of these research endeavors the framework presented classifies security literature into system solutions behavioral solutions hybrid solutions and related technologies each one of these categories is explained in detail in the paper to provide insight as to their unique methods of accomplishing grid security the types of grid and security situations they apply best to and the pros and cons for each type of solution further several areas of research were identified in the course of the literature survey where more study is warranted these avenues for future research are also discussed in this paper several types of grid systems exist currently and the security needs and solutions to address those needs for each type vary as much as the types of systems themselves this research framework will aid in future research efforts to define analyze and address grid security problems for the many varied types of grid setups as well as the many security situations that each grid may face
in this paper we extend the scope of mining association rules from traditional single dimensional intratransaction associations to multidimensional intertransaction associations intratransaction associations are the associations among items within the same transaction where the notion of the transaction could be the items bought by the same customer the events happened on the same day and so on however an intertransaction association describes the association relationships among different transactions such as "if company a’s stock goes up on day b’s stock will go down on day but go up on day" in this case whether we treat company or day as the unit of transaction the associated items belong to different transactions moreover such an intertransaction association can be extended to associate multiple contextual properties in the same rule so that multidimensional intertransaction associations can be defined and discovered two dimensional intertransaction association rule example is "after mcdonald and burger king open branches kfc will open branch two months later and one mile away" which involves two dimensions time and space mining intertransaction associations poses more challenges on efficient processing than mining intratransaction associations interestingly intratransaction association can be treated as special case of intertransaction association from both conceptual and algorithmic point of view in this study we introduce the notion of multidimensional intertransaction association rules study their measurements support and confidence and develop algorithms for mining intertransaction associations by extension of apriori we overview our experience using the algorithms on both real life and synthetic data sets further extensions of multidimensional intertransaction association rules and potential applications are also discussed
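A sketch of one way to reduce one-dimensional intertransaction mining to an ordinary frequent-itemset problem: within a sliding window of width maxspan, items are tagged with their offset from the reference transaction, and the resulting extended transactions can be fed to any Apriori-style miner. The window width, the pair-counting shortcut, and the toy data are illustrative choices, not the algorithms proposed above.

```python
# Illustrative extended-transaction construction for intertransaction rules.
from collections import Counter

def extended_transactions(transactions, maxspan):
    """transactions: list of item sets ordered along the dimension (e.g., days)."""
    for t in range(len(transactions)):
        ext = set()
        for offset in range(maxspan + 1):
            if t + offset < len(transactions):
                ext |= {(item, offset) for item in transactions[t + offset]}
        yield ext

def frequent_extended_pairs(transactions, maxspan, minsup):
    counts = Counter()
    n = 0
    for ext in extended_transactions(transactions, maxspan):
        n += 1
        anchors = {e for e in ext if e[1] == 0}   # rules are anchored at offset 0
        others = ext - anchors
        for a in anchors:
            for b in others:
                counts[(a, b)] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= minsup}

days = [{"A_up"}, {"B_down"}, {"B_up"}, {"A_up"}, {"B_down"}]
print(frequent_extended_pairs(days, maxspan=2, minsup=0.4))   # A_up today -> B_down next day
```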
modern data mining settings involve combination of attribute valued descriptors over entities as well as specified relationships between these entities we present an approach to cluster such non homogeneous datasets by using the relationships to impose either dependent clustering or disparate clustering constraints unlike prior work that views constraints as boolean criteria we present formulation that allows constraints to be satisfied or violated in smooth manner this enables us to achieve dependent clustering and disparate clustering using the same optimization framework by merely maximizing versus minimizing the objective function we present results on both synthetic data as well as several real world datasets
one of the main challenges faced by content based publish subscribe systems is handling large amount of dynamic subscriptions and publications in multidimensional content space to reduce subscription forwarding load and speed up content matching subscription covering subsumption and merging techniques have been proposed in this paper we propose mics multidimensional indexing for content space that provides an efficient representation and processing model for large number of subscriptions and publications mics creates one dimensional representation for publications and subscriptions using hilbert space filling curve based on this representation we propose novel content matching and subscription management covering subsumption and merging algorithms our experimental evaluation indicates that the proposed approach significantly speeds up subscription management operations compared to the naive linear approach
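The sketch below shows the kind of one-dimensional representation MICS relies on: mapping a point in a two-dimensional content space onto a Hilbert space-filling curve index. This is the textbook iterative conversion for a grid of side n (a power of two); quantizing attribute values onto that grid, and the example resolution, are assumptions.

```python
# Textbook Hilbert curve index for a point on an n-by-n grid (n a power of two).
def hilbert_index(n, x, y):
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                          # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# nearby points in content space tend to map to nearby curve positions
print(hilbert_index(8, 0, 0), hilbert_index(8, 1, 0), hilbert_index(8, 7, 7))
```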
an overall sensornet architecture would help tame the increasingly complex structure of wireless sensornet software and help foster greater interoperability between different codebases previous step in this direction is the sensornet protocol sp unifying link abstraction layer this paper takes the natural next step by proposing modular network layer for sensornets that sits atop sp this modularity eases implementation of new protocols by increasing code reuse and enables co existing protocols to share and reduce code and resources consumed at run time we demonstrate how current protocols can be decomposed into this modular structure and show that the costs in performance and code footprint are minimal relative to their monolithic counterparts
die stacking is an exciting new technology that increases transistor density by vertically integrating two or more die with dense high speed interface the result of die stacking is significant reduction of interconnect both within die and across dies in system for instance blocks within microprocessor can be placed vertically on multiple die to reduce block to block wire distance latency and power disparate si technologies can also be combined in die stack such as dram stacked on cpu resulting in lower power higher bw and lower latency interfaces without concern for technology integration into single process flow has the potential to change processor design constraints by providing substantial power and performance benefits despite the promising advantages of there is significant concern for thermal impact in this research we study the performance advantages and thermal challenges of two forms of die stacking stacking large dram or sram cache on microprocessor and dividing traditional microarchitecture between two die in stack results it is shown that mb stacked dram cache can reduce the cycles per memory access of two threaded rms benchmark on average by and as much as while increasing the peak temperature by negligible °c off die bw and power are also reduced by on average it is also shown that floorplan of high performance microprocessor can simultaneously reduce power and increase performance with small °c increase in peak temperature voltage scaling can reach neutral thermals with simultaneous power reduction and performance improvement
as gml is becoming the de facto standard for geographic data storage transmission and exchange more and more geographic data exists in gml format in applications gml documents are usually very large in size because they contain large number of verbose markup tags and large amount of spatial coordinate data in order to speedup data transmission and reduce network cost it is essential to develop effective and efficient gml compression tools although gml is special case of xml current xml compressors are not effective if directly applied to gml because these compressors have been designed for general xml data in this paper we propose gpress compressor for effectively compressing gml documents to the best of our knowledge gpress is the first compressor specifically for gml documents compression gpress exploits the unique characteristics of gml documents to achieve good performance extensive experiments over real world gml documents show that gpress evidently outperforms xmill one of the best existing xml compressors in compression ratio while its compression efficiency is comparable to the existing xml compressors
aggregation and cube are important operations for online analytical processing olap many efficient algorithms to compute aggregation and cube for relational olap have been developed some work has been done on efficiently computing cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables however to our knowledge there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional olap this paper presents set of aggregation algorithms on compressed data warehouses for multidimensional olap these algorithms operate directly on compressed data sets which are compressed by the mapping complete compression methods without the need to first decompress them the algorithms have different performance behaviors as function of the data set parameters sizes of outputs and main memory availability the algorithms are described and the and cpu cost functions are presented in this paper decision procedure to select the most efficient algorithm for given aggregation request is also proposed the analysis and experimental results show that the algorithms have better performance on sparse data than the previous aggregation algorithms
emerging grid computing infrastructures such as cloud computing can only become viable alternatives for the enterprise if they can provide stable service levels for business processes and sla based costing in this paper we describe and apply three step approach to map sla and qos requirements of business processes to such infrastructures we start with formalization of service capabilities and business process requirements we compare them and if we detect performance or reliability gap we dynamically improve performance of individual services deployed in grid and cloud computing environments here we employ translucent replication of services an experimental evaluation in amazon ec verified our approach
in this paper we present new end to end protocol namely scalable streaming video protocol ssvp which operates on top of udp and is optimized for unicast video streaming applications ssvp employs additive increase multiplicative decrease aimd based congestion control and adapts the sending rate by properly adjusting the inter packet gap ipg the smoothness oriented modulation of aimd parameters and ipg adjustments reduce the magnitude of aimd oscillation and allow for smooth transmission patterns while tcp friendliness is maintained our experimental results demonstrate that ssvp eventually adapts to the vagaries of the network and achieves remarkable performance on real time video delivery in the event where awkward network conditions impair the perceptual video quality we investigate the potential improvement via layered adaptation mechanism that utilizes receiver buffering and adapts video quality along with long term variations in the available bandwidth the adaptation mechanism sends new layer based on explicit criteria that consider both the available bandwidth and the amount of buffering at the receiver preventing wasteful layer changes that have an adverse effect on user perceived quality quantifying the interactions of ssvp with the specific adaptation scheme we identify notable gains in terms of video delivery especially in the presence of limited bandwidth
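A minimal sketch of AIMD rate control paced by the inter-packet gap, in the spirit of the mechanism described above: the rate grows additively when feedback reports no loss and backs off multiplicatively on loss, and the sender spaces packets by packet_size divided by the current rate. The class name, constants, and the omission of SSVP's smoothness-oriented parameter modulation are all assumptions.

```python
# Illustrative AIMD controller that exposes its decision as an inter-packet gap (IPG).
class AimdIpgController:
    def __init__(self, packet_size=1000, rate=50_000.0,
                 min_rate=1_000.0, max_rate=10_000_000.0):
        self.packet_size = packet_size     # bytes
        self.rate = rate                   # bytes per second
        self.min_rate, self.max_rate = min_rate, max_rate

    def on_feedback(self, loss_detected, alpha=5_000.0, beta=0.75):
        if loss_detected:
            self.rate = max(self.rate * beta, self.min_rate)   # multiplicative decrease
        else:
            self.rate = min(self.rate + alpha, self.max_rate)  # additive increase
        return self.ipg()

    def ipg(self):
        return self.packet_size / self.rate   # seconds to wait between packets

ctl = AimdIpgController()
print(ctl.on_feedback(loss_detected=False), ctl.on_feedback(loss_detected=True))
```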
this work examines how awareness systems class of technologies that support sustained and effortless communication between individuals and groups can support family communication going beyond the evaluation of specific design concepts this paper reports on three studies that aimed to answer the following research questions do families want to be aware of each other through the day or would they perhaps rather not know more about each other’s activities and whereabouts than they already do if they do wish to have some awareness what should they be aware of the research involved in depth interviews with participants field trial of an awareness system connecting five busy parents with their children and survey of participants conducted over the web triangulation of the results of the three studies leads to the following conclusions some busy parents want to automatically exchange awareness information during the day while others do not availability of partner for coordinating family activities daily activities in new family situations activity and location information of dependent children are salient awareness information needs for this group awareness information needs to vary with contexts suggesting the need for flexible mechanisms to manage the sharing of such information
we present novel framework based on continuous fluid simulator for general simulation of realistic bubbles with which we can handle as many significant dynamic bubble effects as possible to capture very thin liquid film of bubbles we have developed regional level set method allowing multi manifold interface tracking based on the definitions of regional distance and its five operators the implementation of the regional level set method is very easy an implicit surface of liquid film with arbitrary thickness can be reconstructed from the regional level set function to overcome the numerical instability problem we exploit new semi implicit surface tension model which is unconditionally stable and makes the simulation of surface tension dominated phenomena much more efficient an approximated film thickness evolution model is proposed to control the bubble’s lifecycle all these new techniques combine into general framework that can produce various realistic dynamic effects of bubbles
shortest path queries spq are essential in many graph analysis and mining tasks however answering shortest path queries on the fly on large graphs is costly to online answer shortest path queries we may materialize and index shortest paths however straightforward index of all shortest paths in graph of vertices takes space in this paper we tackle the problem of indexing shortest paths and online answering shortest path queries as many large real graphs are shown richly symmetric the central idea of our approach is to use graph symmetry to reduce the index size while retaining the correctness and the efficiency of shortest path query answering technically we develop framework to index large graph at the orbit level instead of the vertex level so that the number of breadth first search trees materialized is reduced from to delta where delta le is the number of orbits in the graph we explore orbit adjacency and local symmetry to obtain compact breadth first search trees compact bfs trees an extensive empirical study using both synthetic data and real data shows that compact bfs trees can be built efficiently and the space cost can be reduced substantially moreover online shortest path query answering can be achieved using compact bfs trees
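The baseline that the compact BFS trees improve on can be sketched as follows: materialize one BFS distance map per chosen root and answer distance queries by lookup. The orbit selection and the use of symmetry to cover non-root vertices are not attempted here; the graph and root choice are hypothetical.

```python
# Baseline shortest-path-distance index: one BFS tree (distance map) per root.
from collections import deque

def bfs_distances(graph, root):
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_index(graph, roots):
    return {r: bfs_distances(graph, r) for r in roots}   # one materialized tree per root

def query(index, u, v):
    return index[u].get(v)        # exact only when u was indexed as a root

g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # a 4-cycle
idx = build_index(g, roots=[0])
print(query(idx, 0, 3))           # 2
```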
many large organizations have multiple large databases as they transact from multiple branches most of the previous pieces of work are based on single database thus it is necessary to study data mining on multiple databases in this paper we propose two measures of similarity between pair of databases also we propose an algorithm for clustering set of databases efficiency of the clustering process has been improved using the following strategies reducing execution time of clustering algorithm using more appropriate similarity measure and storing frequent itemsets space efficiently
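As an illustration of the kind of similarity-based clustering described above, the sketch below scores two databases by the Jaccard overlap of their frequent itemsets and greedily groups databases whose pairwise similarity exceeds a threshold. Both the measure and the greedy rule are plausible stand-ins, not the two measures or the algorithm proposed in the paper.

```python
# Illustrative database similarity (Jaccard over frequent itemsets) and threshold clustering.
def jaccard(fa, fb):
    fa, fb = set(fa), set(fb)
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0

def cluster_databases(freq_itemsets, threshold):
    """freq_itemsets: {db_name: iterable of frequent itemsets (as frozensets)}."""
    clusters = []
    for db, items in freq_itemsets.items():
        for c in clusters:
            if all(jaccard(items, freq_itemsets[other]) >= threshold for other in c):
                c.append(db)
                break
        else:
            clusters.append([db])
    return clusters

dbs = {
    "branch1": [frozenset({"a", "b"}), frozenset({"a"})],
    "branch2": [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"c"})],
    "branch3": [frozenset({"x", "y"})],
}
print(cluster_databases(dbs, threshold=0.5))   # branch1 and branch2 grouped, branch3 alone
```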
knowing that two numerical variables always hold different values at some point of program can be very useful especially for analyzing aliases if then and are not aliased and this knowledge is of great help for many other program analyses surprisingly disequalities are seldom considered in abstract interpretation most of the proposed numerical domains being restricted to convex sets in this paper we propose to combine simple ordering properties with disequalities difference bound matrices or dbms is domain proposed by david dill for expressing relations of the form or we define ddbms disequalities dbms as conjunctions of dbms with simple disequalities of the form or we give algorithms on ddbms for deciding the emptiness computing normal form and performing the usual operations of an abstract domain these algorithms have the same complexity where is the number of variables than those for classical dbms if the variables are considered to be valued in dense set or in the arithmetic case the emptiness decision is np complete and other operations run in
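A compact sketch of a DBM extended with disequalities under the dense interpretation mentioned above: m[i][j] bounds x_i - x_j (index 0 stands for the constant 0), closure is plain Floyd-Warshall, and a disequality only causes emptiness when the closed DBM pins the difference to exactly 0. The integer case, normal forms, and the other abstract-domain operations are not handled; this is an assumption-laden illustration, not the paper's algorithms.

```python
# Difference bound matrix with disequalities, dense interpretation (illustrative).
INF = float("inf")

def close(m):
    n = len(m)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], m[i][k] + m[k][j])   # tighten via intermediate variable k
    return m

def is_empty(m, disequalities):
    m = close([row[:] for row in m])
    if any(m[i][i] < 0 for i in range(len(m))):             # negative cycle: no valuation at all
        return True
    # x_i != x_j is violated only if the DBM forces x_i - x_j = 0
    return any(m[i][j] == 0 and m[j][i] == 0 for i, j in disequalities)

# variables x1, x2 with x1 - x2 <= 0 and x2 - x1 <= 0, plus x1 != x2  ->  empty
m = [[0, INF, INF],
     [INF, 0, 0],
     [INF, 0, 0]]
print(is_empty(m, [(1, 2)]))   # True
```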
as organizations implement information strategies that call for sharing access to resources in the networked environment mechanisms must be provided to protect the resources from adversaries the proposed delegation framework addresses the issue of how to advocate selective information sharing in role based systems while minimizing the risks of unauthorized access we introduce systematic approach to specify delegation and revocation policies using set of rules we demonstrate the feasibility of our framework through policy specification enforcement and proof of concept implementation on specific domains eg the healthcare environment we believe that our work can be applied to organizations that rely heavily on collaborative tasks
as new communications media foster international collaborations we would be remiss in overlooking cultural differences when assessing them in this study pairs in three cultural groupings american american aa chinese chinese cc and american chinese ac worked on two decision making tasks one face to face and the other via im drawing upon prior research we predicted differences in conversational efficiency conversational content interaction quality persuasion and performance the quantitative results combined with conversation analysis suggest that the groups viewed the task differently aa pairs as an exercise in situation specific compromise cc as consensus reaching cultural differences were reduced but not eliminated in the im condition
automatic annotation of medical images is an increasingly important tool for physicians in their daily activity hospitals nowadays produce an increasing amount of data manual annotation is very costly and prone to human mistakes this paper proposes multi cue approach to automatic medical image annotation we represent images using global and local features these cues are then combined using three alternative approaches all based on the support vector machine algorithm we tested our methods on the irma database and with two of the three approaches proposed here we participated in the imageclefmed benchmark evaluation in the medical image annotation track these algorithms ranked first and fifth respectively among all submissions experiments using the third approach also confirm the power of cue integration for this task
we evaluate various heuristics for hierarchical spectral clustering in large telephone call and web graphs spectral clustering without additional heuristics often produces very uneven cluster sizes or low quality clusters that may consist of several disconnected components fact that appears to be common for several data sources but to our knowledge no general solution provided so far divide and merge recently described postfiltering procedure may be used to eliminate bad quality branches in binary tree hierarchy we propose an alternate solution that enables way cuts in each step by immediately filtering unbalanced or low quality clusters before splitting them further our experiments are performed on graphs with various weight and normalization built based on call detail records and web crawls we measure clustering quality both by modularity as well as by the geographic and topical homogeneity of the clusters compared to divide and merge we give more homogeneous clusters with more desirable distribution of the cluster sizes
enforcing strong replica consistency among set of replicas of service deployed across an asynchronous distributed system in the presence of crash failures is real practical challenge if each replica runs the consistency protocol bundled with the actual service implementation this target cannot be achieved as replicas need to be located over partially synchronous distributed system to solve the distributed agreement problems underlying strong replica consistency a three tier architecture for software replication enables the separation of the replication logic ie protocols and mechanisms necessary for managing software replication from both clients and server replicas the replication logic is embedded in middle tier that confines the need of partial synchrony and thus frees replica deployment in this paper we first introduce the basic concepts underlying three tier replication then we present the interoperable replication logic irl architecture fault tolerant corba compliant infrastructure irl exploits three tier approach to replicate stateful deterministic corba objects and allows object replicas to run on object request brokers from different vendors description of an irl prototype developed in our department is proposed along with an extensive performance analysis
this paper presents cognitive vision system capable of autonomously learning protocols from perceptual observations of dynamic scenes the work is motivated by the aim of creating synthetic agent that can observe scene containing interactions between unknown objects and agents and learn models of these sufficient to act in accordance with the implicit protocols present in the scene discrete concepts utterances and object properties and temporal protocols involving these concepts are learned in an unsupervised manner from continuous sensor input alone crucial to this learning process are methods for spatio temporal attention applied to the audio and visual sensor data these identify subsets of the sensor data relating to discrete concepts clustering within continuous feature spaces is used to learn object property and utterance models from processed sensor data forming symbolic description the progol inductive logic programming system is subsequently used to learn symbolic models of the temporal protocols presented in the presence of noise and over representation in the symbolic data input to it the models learned are used to drive synthetic agent that can interact with the world in semi natural way the system has been evaluated in the domain of table top game playing and has been shown to be successful at learning protocol behaviours in such real world audio visual environments
the modeling analysis and design of systems is generally based on many formalisms to describe discrete and or continuous behaviors and to map these descriptions into specific platform in this context the article proposes the concept of functional metamodeling to capture then to integrate modeling languages the concept offers an alternative to standard model driven engineering mde and is well adapted to mathematical descriptions such as the ones found in system modeling as an application set of functional metamodels is proposed for dataflows usable to model continuous behaviors state transition systems usable to model discrete behaviors and metamodel for actions to model interactions with target platform and concurrent execution model of control architecture for legged robot is proposed as an application of these modeling languages
we propose simple obstacle model to be used while simulating wireless sensor networks to the best of our knowledge this is the first time such an integrated and systematic obstacle model for these networks has been proposed we define several types of obstacles that can be found inside the deployment area of wireless sensor network and provide categorization of these obstacles based on their nature physical and communication obstacles ie obstacles that are formed out of node distribution patterns or have physical presence respectively their shape and their change of nature over time we make an extension to custom made sensor network simulator simdust and conduct number of simulations in order to study the effect of obstacles on the performance of some representative in terms of their logic data propagation protocols for wireless sensor networks our findings confirm that obstacle presence has significant impact on protocol performance and also that different obstacle shapes and sizes may affect each protocol in different ways this provides an insight into how routing protocol will perform in the presence of obstacles and highlights possible protocol shortcomings moreover our results show that the effect of obstacles is not directly related to the density of sensor network and cannot be emulated only by changing the network density
technological achievements have made it possible to fabricate cmos circuits with over billion transistors implement boolean operations using quantum devices and or the spin of an electron implement transformations using bio and molecular based cells problems with many of these technologies are due to such factors as process variations defects and impurities in materials and solutions and noise consequently many systems built from these technologies operate imperfectly luckily there are many complex and large market systems applications that tolerate acceptable though not always correct results in addition there is emerging body of mathematical analysis related to imperfect computation in this paper we first introduce the concepts of acceptable error tolerance and acceptable performance degradation and demonstrate how important attributes of these concepts can be quantified we interlace this discussion with several examples of systems that can effectively employ these two concepts next we mention several emerging technologies that motivate the need to study these concepts as well as related mathematical paradigms finally we will list few cad issues that are needed to support this new form of technological revolution
in mainstream oo languages inheritance can be used to add new methods or to override existing methods virtual classes and feature oriented programming are techniques which extend the mechanism of inheritance so that it is possible to refine nested classes as well these techniques are attractive for programming in the large because inheritance becomes tool for manipulating whole class hierarchies rather than individual classes nevertheless it has proved difficult to design static type systems for virtual classes because virtual classes introduce dependent types the compile time type of an expression may depend on the run time values of objects in that expression we present formal object calculus which implements virtual classes in type safe manner our type system uses novel technique based on prototypes which blur the distinction between compile time and run time at run time prototypes act as objects and they can be used in ordinary computations at compile time they act as types prototypes are similar in power to dependent types and subtyping is shown to be form of partial evaluation we prove that prototypes are type safe but undecidable and briefly outline decidable semi algorithm for dealing with them
distributed sparing is method to improve the performance of raid disk arrays with respect to dedicated sparing system with disks including the spare disk since it utilizes the bandwidth of all disks we analyze the performance of raid with distributed sparing in normal mode degraded mode and rebuild mode in an oltp environment which implies small reads and writes the analysis in normal mode uses an queuing model which takes into account the components of disk service time in degraded mode low cost approximate method is developed to estimate the mean response time of fork join requests resulting from accesses to recreate lost data on the failed disk rebuild mode performance is analyzed by considering an vacationing server model with multiple vacations of different types to take into account differences in processing requirements for reading the first and subsequent tracks an iterative solution method is used to estimate the mean response time of disk requests as well as the time to read each disk which is shown to be quite accurate through validation against simulation results we next compare raid performance in system without cache with cache and with nonvolatile storage nvs cache the last configuration in addition to improved read response time due to cache hits provides fast write capability such that dirty blocks can be destaged asynchronously and at lower priority than read requests resulting in an improvement in read response time the small write penalty is also reduced due to the possibility of repeated writes to dirty blocks in the cache and by taking advantage of disk geometry to efficiently destage multiple blocks at time
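The abstract elides the name of the queuing model it uses; purely as an illustration of how a mean disk response time can be estimated from utilization and service-time moments, the sketch below applies the Pollaczek-Khinchine formula for an M/G/1 queue. The numbers and the choice of model are assumptions, not the paper's analysis.

```python
# Illustrative M/G/1 mean response time via the Pollaczek-Khinchine formula.
def mg1_response_time(arrival_rate, mean_service, second_moment_service):
    rho = arrival_rate * mean_service            # utilization, must be < 1
    if rho >= 1:
        raise ValueError("queue is unstable")
    wait = arrival_rate * second_moment_service / (2 * (1 - rho))   # mean queueing delay
    return mean_service + wait                                      # response = service + wait

# e.g. 50 req/s, 10 ms mean service, exponential-like variability (E[S^2] = 2 E[S]^2)
print(round(mg1_response_time(50.0, 0.010, 2 * 0.010**2) * 1000, 2), "ms")
```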
this work investigates the problem of privacy preserving mining of frequent itemsets we propose procedure to protect the privacy of data by adding noisy items to each transaction then an algorithm is proposed to reconstruct frequent itemsets from these noise added transactions the experimental results indicate that this method can achieve rather high level of accuracy our method utilizes existing algorithms for frequent itemset mining and thereby takes full advantage of their progress to mine frequent itemset efficiently
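a minimal sketch of the general idea described above, assuming noise items are inserted independently with a fixed probability rho; the item names, the rho value and the naive support estimate below are illustrative only and do not reproduce the paper's reconstruction algorithm

import random

# Hypothetical illustration: perturb transactions by inserting random "noise"
# items, then measure itemset supports on the noise-added data. The paper's
# actual frequent-itemset reconstruction procedure is not reproduced here.

def add_noise(transactions, universe, rho, seed=0):
    """Insert each item not already in a transaction with probability rho."""
    rng = random.Random(seed)
    noisy = []
    for t in transactions:
        extra = {i for i in universe if i not in t and rng.random() < rho}
        noisy.append(set(t) | extra)
    return noisy

def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
universe = {"a", "b", "c", "d", "e"}
noisy = add_noise(transactions, universe, rho=0.2)
print(support(noisy, {"a", "b"}))   # support measured on the noise-added data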
in mobile computing environments as result of the reduced capacity of local storage it is commonly not feasible to replicate entire datasets on each mobile unit in addition reliable secure and economical access to central servers is not always possible moreover since mobile computers are designed to be portable they are also physically small and thus often unable to hold or process the large amounts of data held in centralised databases as many systems are only as useful as the data they can process the support provided by database and system management middleware for applications in mobile environments is an important driver for the uptake of this technology by application providers and thus also for the wider use of the technology one of the approaches to maximize the available storage is through the use of database summarisation to date most strategies for reducing data volumes have used compression techniques that ignore the semantics of the data those that do not use data compression techniques adopt structural ie data and use independent methods in this paper we outline the special constraints imposed on storing information in mobile databases and provide flexible data summarisation policy the method works by assigning level of priority to each data item through the setting of number of parameters the paper discusses some policies for setting these parameters and some implementation strategies
with concurrent and garbage collected languages like java and becoming popular the need for suitable non intrusive efficient and concurrent multiprocessor garbage collector has become acute we propose novel mark and sweep on the fly algorithm based on the sliding views mechanism of levanoni and petrank we have implemented our collector on the jikes java virtual machine running on netfinity multiprocessor and compared it to the concurrent algorithm and to the stop the world collector supplied with jikes jvm the maximum pause time that we measured with our benchmarks over all runs was ms in all runs the pause times were smaller than those of the stop the world collector by two orders of magnitude and they were also always shorter than the pauses of the jikes concurrent collector throughput measurements of the new garbage collector show that it outperforms the jikes concurrent collector by up to as expected the stop the world does better than the on the fly collectors with results showing about difference on top of being an effective mark and sweep on the fly collector standing on its own our collector may also be used as backup collector collecting cyclic data structures for the levanoni petrank reference counting collector these two algorithms perfectly fit sharing the same allocator similar data structure and similar jvm interface
dynamic energy performance scaling deps framework is proposed to save energy in fixed priority hard real time embedded systems in this generalized framework two existing technologies ie dynamic hardware resource configuration dhrc and dynamic voltage frequency scaling dvfs can be combined for energy performance tradeoff the problem of selecting the optimal hardware configuration and voltage frequency parameters is formulated to achieve maximal energy savings and meet the deadline constraint simultaneously through case study the effectiveness of deps has been validated
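a hedged sketch of the selection step such a framework implies: among candidate pairs of hardware configuration and voltage frequency setting, keep those that meet the deadline and pick the one with the lowest estimated energy; all option names, cycle counts and energy figures below are hypothetical

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    cycles: float        # worst-case cycles under this hardware configuration
    freq_mhz: float      # clock frequency of this V/f setting
    energy_mj: float     # estimated task energy at this setting

def pick(options, deadline_ms):
    """Return the lowest-energy option whose execution time meets the deadline."""
    feasible = [o for o in options if o.cycles / (o.freq_mhz * 1e3) <= deadline_ms]
    return min(feasible, key=lambda o: o.energy_mj) if feasible else None

options = [
    Option("full-cache @ 400MHz", cycles=2.0e6, freq_mhz=400, energy_mj=9.0),
    Option("full-cache @ 200MHz", cycles=2.0e6, freq_mhz=200, energy_mj=5.5),
    Option("half-cache @ 200MHz", cycles=2.6e6, freq_mhz=200, energy_mj=4.8),
]
print(pick(options, deadline_ms=12.0))   # cheapest feasible configuration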
phrase based statistical machine translation approach the alignment template approach is described this translation approach allows for general many to many relations between words thereby the context of words is taken into account in the translation model and local changes in word order from source to target language can be learned explicitly the model is described using log linear modeling approach which is generalization of the often used source channel approach thereby the model is easier to extend than classical statistical machine translation systems we describe in detail the process for learning phrasal translations the feature functions used and the search algorithm the evaluation of this approach is performed on three different tasks for the german english speech verbmobil task we analyze the effect of various system components on the french english canadian hansards task the alignment template system obtains significantly better results than single word based translation model in the chinese english national institute of standards and technology nist machine translation evaluation it yields statistically significantly better nist scores than all competing research and commercial translation systems
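for reference, the log linear formulation mentioned above is usually written as follows, with feature functions h_m and weights lambda_m; the exact feature set of the alignment template system is not reproduced here

\hat{e} \;=\; \arg\max_{e} \Pr(e \mid f),
\qquad
\Pr(e \mid f) \;=\;
\frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m\, h_m(e, f)\right)}
     {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m\, h_m(e', f)\right)}

the source channel model is recovered as the special case with two features h_1 = \log p(f \mid e) and h_2 = \log p(e) and unit weights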
we introduce the concept of administrative scope in role hierarchy and demonstrate that it can be used as basis for role based administration we then develop family of models for role hierarchy administration rha employing administrative scope as the central concept we then extend rha the most complex model in the family to complete decentralized model for role based administration we show that sarbac the resulting role based administrative model has significant practical and theoretical advantages over arbac we also discuss how administrative scope might be applied to the administration of general hierarchical structures how our model can be used to reduce inheritance in the role hierarchy and how it can be configured to support discretionary access control features
broadcast data dissemination is well suited for mobile wireless environments where bandwidth is scarce and mutual interference must be minimised however broadcasting monopolises the medium precluding clients from performing any other communication we address this problem in two ways firstly we segment the server broadcast with intervening periods of silence during which the wireless devices may communicate secondly we reduce the average access delay for clients using novel cooperative caching scheme our scheme is fully decentralised and uses information available locally at the client our results show that our model prevents the server from monopolising the medium and that our caching strategy reduces client access delays significantly
recently planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems in this paper we present the language κc which extends the declarative planning language by action costs κc provides the notion of admissible and optimal plans which are plans whose overall action costs are within given limit resp minimum over all plans ie cheapest plans as we demonstrate this novel language allows for expressing some nontrivial planning tasks in declarative way furthermore it can be utilized for representing planning problems under other optimality criteria such as computing shortest plans with the least number of steps and refinement combinations of cheapest and fastest plans we study complexity aspects of the language κc and provide transformation to logic programs such that planning problems are solved via answer set programming furthermore we report experimental results on selected problems our experience is encouraging that answer set planning may be valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved
in this contribution we present new paradigm and methodology for the network on chip noc based design of complex hardware software systems while classical industrial design platforms represent dedicated fixed architectures for specific applications flexible noc architectures open new degrees of system reconfigurability after giving an overview on required demands for noc hyper platforms we describe the realisation of these prerequisites within the hinoc platform we introduce new dynamic hardware software co design methodology for pre and post manufacturing design finally we will summarize the concept combined with an outlook on further investigations
mobile services operate on hosts with diverse capabilities in heterogeneous networks where the usage of resources such as processor memory and network is constantly changing in order to maintain efficiency in terms of performance and resource utilization such services should be able to adapt to changes in their environment this paper proposes and empirically evaluates an application transparent adaptation strategy for service oriented systems the strategy is based upon the solution of an optimization model derived from an existing suite of metrics for services which maps system services to network nodes the strategy is evaluated empirically using number of distinct scenarios involving runtime changes in processor memory and network utilization in order to maintain execution efficiency in response to these changing operating conditions the strategy rearranges the service topology of the system dynamically by moving services between network nodes the results show that the negative impact of environmental changes on runtime efficiency can be reduced after adaptation from to depending on the selected parameters
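a hedged illustration, not the paper's optimization model: a greedy rearrangement that moves the heaviest service off any node whose estimated utilization exceeds a threshold onto the least loaded node; service names, loads and capacities are made up

def rebalance(placement, load, capacity, threshold=0.8):
    """placement: {service: node}; load: {service: cpu share}; capacity: {node: cpu}."""
    def util(node):
        used = sum(load[s] for s, n in placement.items() if n == node)
        return used / capacity[node]
    moves = []
    for node in list(capacity):
        while util(node) > threshold:
            victims = [s for s, n in placement.items() if n == node]
            if not victims:
                break
            s = max(victims, key=lambda s: load[s])   # heaviest service on the hot node
            target = min(capacity, key=util)          # currently least-loaded node
            if target == node:
                break
            placement[s] = target
            moves.append((s, node, target))
    return moves

placement = {"auth": "n1", "search": "n1", "billing": "n2"}
load = {"auth": 0.3, "search": 0.6, "billing": 0.2}
capacity = {"n1": 1.0, "n2": 1.0, "n3": 1.0}
print(rebalance(placement, load, capacity))   # e.g. move "search" from n1 to n3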
we present an optimization framework for exploring gradient domain solutions for image and video processing the proposed framework unifies many of the key ideas in the gradient domain literature under single optimization formulation our hope is that this generalized framework will allow the reader to quickly gain general understanding of the field and contribute new ideas of their own we propose novel metric for measuring local gradient saliency that identifies salient gradients that give rise to long coherent edges even when the individual gradients are faint we present general weighting scheme for gradient constraints that improves the visual appearance of results we also provide solution for applying gradient domain filters to videos and video streams in coherent manner finally we demonstrate the utility of our formulation in creating effective yet simple to implement solutions for various image processing tasks to exercise our formulation we have created new saliency based sharpen filter and pseudo image relighting application we also revisit and improve upon previously defined filters such as nonphotorealistic rendering image deblocking and sparse data interpolation over images eg colorization using optimization
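one common way to write such a unified gradient domain objective, sketched here under the assumption of a quadratic data term and a weighted gradient term (I^0 is the input image, G the target gradient field, and w_d, w_g the data and gradient weights that could carry the saliency based weighting); this is consistent with the description above but not necessarily the authors' exact formulation

I \;=\; \arg\min_{I} \int_{\Omega}
w_d(\mathbf{x})\,\big(I(\mathbf{x}) - I^{0}(\mathbf{x})\big)^{2}
\;+\;
w_g(\mathbf{x})\,\big\lVert \nabla I(\mathbf{x}) - \mathbf{G}(\mathbf{x}) \big\rVert^{2}
\, d\mathbf{x}

the minimizer satisfies a screened poisson equation and can be computed with standard sparse linear solvers, which is what makes the formulation attractive for interactive image and video filters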
upcoming multi media compression applications will require high memory bandwidth in this paper we estimate that software reference implementation of an mpeg video decoder typically requires mtransfers to memory to decode cif times video object plane vop at frames this imposes high penalty in terms of power but also performance however we also show that we can heavily improve on the memory transfers without sacrificing speed even gaining about on cache misses and cycles for dec alpha by aggressive code transformations for this purpose we have manually applied an extended version of our data transfer and storage exploration dtse methodology which was originally developed for custom hardware implementations
although hierarchical pp systems have been found to outperform flat systems in many respects current pp research does not focus on strategies to build and maintain such systems available solutions assume either no or little coordination between peers that could lead the system toward satisfying globally defined goal eg minimizing traffic in this paper we focus on hierarchical dhts and provide full set of algorithms to build and maintain such systems that mitigate this problem in particular given the goal state of minimizing the total traffic without overloading any peer our algorithms dynamically adjust the system state as to keep the goal met at any time the algorithms are fully decentralized and probabilistic all decisions taken by the peers are based on their partial view on set of system wide parameters thus they demonstrate the main principle of self organization the system behavior emerges from local interactions our simulations run in range of realistic settings confirm good performance of the algorithms
programmers have traditionally used locks to synchronize concurrent access to shared data lock based synchronization however has well known pitfalls using locks for fine grain synchronization and composing code that already uses locks are both difficult and prone to deadlock transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications opening the door for new compiler optimizations that target the overheads of transactional memory this paper presents compiler and runtime optimizations for transactional memory language constructs we present high performance software transactional memory system stm integrated into managed runtime environment our system efficiently implements nested transactions that support both composition of transactions and partial roll back our jit compiler is the first to optimize the overheads of stm and we show novel techniques for enabling jit optimizations on stm operations we measure the performance of our optimizations on way smp running multi threaded transactional workloads our results show that these techniques enable transactional memory’s performance to compete with that of well tuned synchronization
this paper proposes neural network based approach for solving the resource discovery problem in peer to peer pp networks and an adaptive global local memetic algorithm aglma for performing the training of the neural network this training is very challenging due to the large number of weights and noise caused by the dynamic neural network testing the aglma is memetic algorithm consisting of an evolutionary framework which adaptively employs two local searchers having different exploration logic and pivot rules furthermore the aglma makes an adaptive noise compensation by means of explicit averaging on the fitness values and dynamic population sizing which aims to follow the necessity of the optimization process the numerical results demonstrate that the proposed computational intelligence approach leads to an efficient resource discovery strategy and that the aglma outperforms two classical resource discovery strategies as well as popular neural network training algorithm
this paper introduces the prophet critic hybrid conditional branch predictor which has two component predictors that play the role of either prophet or critic the prophet is conventional predictor that uses branch history to predict the direction of the current branch further accesses of the prophet yield predictions for the branches following the current one predictions for the current branch and the ones that follow are collectively known as the branch’s future they are actually prophecy or predicted branch future the critic uses both the branch’s history and future to give critique of the prophet’s prediction for the current branch the critique either agree or disagree is used to generate the final prediction for the branch our results show an byte prophet critic hybrid has fewer mispredicts than byte bc gskew predictor predictor similar to that of the proposed compaq alpha ev processor across wide range of applications the distance between pipeline flushes due to mispredicts increases from one flush per micro operations uops to one per uops for gcc the percentage of mispredicted branches drops from to on machine based on the intel pentium processor this improves upc uops per cycle by for gcc and reduces the number of uops fetched along both correct and incorrect paths by
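a toy software illustration of the mechanism described above, not the hardware design: a prophet table predicts from recent history, its speculative next predictions form the predicted future, and a critic table indexed by history and future learns whether to agree with or veto the prophet; the table sizes and training rule are arbitrary choices

from collections import defaultdict

class ProphetCritic:
    def __init__(self, future_bits=2):
        self.future_bits = future_bits
        self.prophet = defaultdict(int)   # history tuple -> saturating 2-bit counter
        self.critic = defaultdict(int)    # (history, predicted future) -> agree counter
        self.history = ()

    def _prophet_pred(self, history):
        return self.prophet[history] >= 2

    def predict(self):
        h = self.history
        pred = self._prophet_pred(h)
        # roll the prophet forward speculatively to obtain the predicted future
        future, fh = [], h
        for _ in range(self.future_bits):
            p = self._prophet_pred(fh)
            future.append(p)
            fh = (fh + (p,))[-4:]
        agree = self.critic[(h, tuple(future))] >= 0
        return (pred if agree else not pred), h, tuple(future), pred

    def update(self, taken):
        _, h, future, prophet_pred = self.predict()
        # train the prophet counter and the critic's agree/disagree counter
        self.prophet[h] = min(3, self.prophet[h] + 1) if taken else max(0, self.prophet[h] - 1)
        delta = 1 if prophet_pred == taken else -1
        self.critic[(h, future)] = max(-2, min(2, self.critic[(h, future)] + delta))
        self.history = (self.history + (taken,))[-4:]

bp = ProphetCritic()
for taken in [True, True, False, True, True, False] * 5:
    bp.update(taken)
print(bp.predict()[0])   # final (possibly critic-overridden) prediction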
in this paper we propose new dynamic and efficient bounding volume hierarchy for breakable objects undergoing structured and or unstructured motion our object space method is based on different ways to incrementally update the hierarchy during simulation by exploiting temporal coherence and lazy evaluation techniques this leads to significant advantages in terms of execution speed furthermore we also show how our method lends itself naturally for an adaptive low memory cost implementation which may be of critical importance in some applications finally we propose two different techniques for detecting self intersections one using our hierarchical data structure and the other is an improved sorting based method
in this article we present an experimental study of the properties of webgraphs we study large crawl from of pages and about billion edges made available by the webbase project at stanford as well as several synthetic ones generated according to various models proposed recently we investigate several topological properties of such graphs including the number of bipartite cores and strongly connected components the distribution of degrees and pagerank values and some correlations we present comparison study of the models against these measures our findings are that i the webbase sample differs slightly from the older samples studied in the literature and ii despite the fact that these models do not catch all of its properties they do exhibit some peculiar behaviors not found for example in the models from classical random graph theory moreover we developed software library able to generate and measure massive graphs in secondary memory this library is publicly available under the gpl licence we discuss its implementation and some computational issues related to secondary memory graph algorithms
process algebraic techniques for distributed systems are increasingly being targeted at identifying abstractions that are adequate for both high level programming and specification and security analysis and verification drawing on our earlier work in bugliesi and focardi we investigate the expressive power of core set of security and network abstractions that provide high level primitives for specifying the honest principals in network while at the same time enabling an analysis of the network level adversarial attacks that may be mounted by an intruder we analyse various bisimulation equivalences for security that arise from endowing the intruder with i different adversarial capabilities and ii increasingly powerful control over the interaction among the distributed principals of network by comparing the relative strength of the bisimulation equivalences we obtain direct measure of the intruder’s discriminating power and hence of the expressiveness of the corresponding intruder model
we present aspect oriented programming in jiazzi jiazzi enhances java with separately compiled externally linked code modules called units units can act as effective aspect constructs with the ability to separate crosscutting concern code in non invasive and safe way unit linking provides convenient way for programmers to explicitly control the inclusion and configuration of code that implements concern while separate compilation of units enhances the independent development and deployment of the concern the expressiveness of concern separation is enhanced by units in two ways first classes can be made open to the addition of new behavior fields and methods after they are initially defined which enables the direct modularization of concerns whose code crosscut object boundaries second the signatures of methods and classes can also be made open to refinement which permits more aggressive modularization by isolating the naming and calling requirements of concern implementation
end user programming has become ubiquitous so much so that there are more end user programmers today than there are professional programmers end user programming empowers but to do what make really bad decisions based on really bad programs enter software engineering’s focus on quality considering software quality is necessary because there is ample evidence that the programs end users create are filled with expensive errors in this paper consider what happens when we add to end user programming environments considerations of software quality going beyond the create program aspect of end user programming describe philosophy to software engineering for end users and then survey several projects in this area basic premise is that end user software engineering can only succeed to the extent that it respects the fact that the user probably has little expertise or even interest in software engineering
motivated by the optimality of shortest remaining processing time srpt for mean response time in recent years many computer systems have used the heuristic of favoring small jobs in order to dramatically reduce user response times however rarely do computer systems have knowledge of exact remaining sizes in this paper we introduce the class of smart policies which formalizes the heuristic of favoring small jobs in way that includes wide range of policies that schedule using inexact job size information examples of smart policies include policies that use exact size information eg srpt and psjf ii policies that use job size estimates and iii policies that use finite number of size based priority levels for many smart policies eg srpt with inexact job size information there are no analytic results available in the literature in this work we prove four main results we derive upper and lower bounds on the mean response time the mean slowdown the response time tail and the conditional response time of smart policies in each case the results explicitly characterize the tradeoff between the accuracy of the job size information used to prioritize and the performance of the resulting policy thus the results provide designers insight into how accurate job size information must be in order to achieve desired performance guarantees
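a hedged toy simulation of a smart like policy, not the paper's analysis: a preemptive unit rate server always runs the job with the smallest estimated remaining size, where estimates are the true sizes perturbed by a random factor; the arrival pattern and noise model are made up

import random

def simulate(jobs, dt=0.01):
    """jobs: list of (arrival, true_size, estimated_size); unit-rate preemptive server."""
    t, remaining, est_rem, resp = 0.0, {}, {}, {}
    pending = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    while len(resp) < len(jobs):
        while pending and jobs[pending[0]][0] <= t:
            i = pending.pop(0)
            remaining[i], est_rem[i] = jobs[i][1], jobs[i][2]
        active = [i for i in remaining if i not in resp]
        if active:
            i = min(active, key=lambda j: est_rem[j])   # favor the (estimated) smallest job
            remaining[i] -= dt
            est_rem[i] -= dt
            if remaining[i] <= 1e-9:
                resp[i] = t + dt - jobs[i][0]           # response time of job i
        t += dt
    return sum(resp.values()) / len(resp)

random.seed(1)
jobs = []
for k in range(20):
    size = random.expovariate(1.0)
    jobs.append((k * 0.5, size, size * random.uniform(0.5, 2.0)))  # noisy size estimate
print("mean response time:", round(simulate(jobs), 2))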
in this paper we show how to augment object oriented application interfaces with enhanced specifications that include sequencing constraints called protocols protocols make explicit the relationship between messages methods supported by the application these relationships are usually only given implicitly either in the code or in textual comments we define notions of interface compatibility based upon protocols and show how compatibility can be checked discovering class of errors that cannot be discovered via the type system alone we then define software adaptors that can be used to bridge the difference between object oriented applications that have functionally compatible but type incompatible interfaces we discuss what it means for an adaptor to be well formed leveraging the information provided by protocols we show how adaptors can be automatically generated from high level description called an interface mapping
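a minimal sketch of the idea, under an assumed encoding of protocols as finite state machines over message names (the paper's formalism is not reproduced): a conformance check for call sequences and a trivial adaptor that renames client messages into the provider's vocabulary

PROTOCOL = {                      # state -> {message: next state}
    "closed": {"open": "open"},
    "open":   {"read": "open", "write": "open", "close": "closed"},
}

def conforms(calls, start="closed"):
    """Check whether a sequence of messages is allowed by the protocol."""
    state = start
    for msg in calls:
        if msg not in PROTOCOL.get(state, {}):
            return False          # message not permitted in the current state
        state = PROTOCOL[state][msg]
    return True

ADAPTOR = {"openFile": "open", "fetch": "read", "closeFile": "close"}

def adapt(calls):
    """Map client messages onto the provider's vocabulary."""
    return [ADAPTOR.get(msg, msg) for msg in calls]

client_calls = ["openFile", "fetch", "closeFile"]
print(conforms(adapt(client_calls)))        # True: adapted sequence respects the protocol
print(conforms(["read", "open"]))           # False: read before open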
location based and personalized services are the key factors for promoting user satisfaction however most service providers did not consider the needs of mobile user in terms of their location and event participation consequently the service provider may lose the chance for better service and profit in this paper we present multi stage collaborative filtering mscf process to provide event recommendation based on mobile user’s location to achieve this purpose the collaborative filtering cf technique is employed and the adaptive resonance theory art network is applied to cluster mobile users according to their personal profile sequential pattern mining is then used to discover the correlations between events for recommendation the mscf is designed not only to recommend for the old registered mobile user ormu but also to handle the cold start problem for new registered mobile user nrmu this research is designed to achieve the following to present personalized event recommendation system for mobile users to discover mobile users moving patterns to provide recommendations based on mobile users preferences to overcome the cold start problem for new registered mobile user the experimental results of this research show that the mscf is able to accomplish the above purposes and shows better outcome for cold start problem when compared with user based cf and item based cf
in this research we aim to identify factors that significantly affect the clickthrough of web searchers our underlying goal is to determine more efficient methods to optimize the clickthrough rate we devise clickthrough metric for measuring customer satisfaction of search engine results using the number of links visited number of queries user submits and rank of clicked links we use neural network to detect the significant influence of searching characteristics on future user clickthrough our results show that high occurrences of query reformulation lengthy searching duration longer query length and the higher ranking of prior clicked links correlate positively with future clickthrough we provide recommendations for leveraging these findings for improving the performance of search engine retrieval and result ranking along with implications for search engine marketing
various code certification systems allow the certification and static verification of important safety properties such as memory and control flow safety these systems are valuable tools for verifying that untrusted and potentially malicious code is safe before execution however one important safety property that is not usually included is that programs adhere to specific bounds on resource consumption such as running time we present decidable type system capable of specifying and certifying bounds on resource consumption our system makes two advances over previous resource bound certification systems both of which are necessary for practical system we allow the execution time of programs and their subroutines to vary depending on their arguments and we provide fully automatic compiler generating certified executables from source level programs the principal device in our approach is strategy for simulating dependent types using sum and inductive kinds
assessing mobility in thorough fashion is crucial step toward more efficient mobile network design recent research on mobility has focused on two main points analyzing models and studying their impact on data transport these works investigate the consequences of mobility in this paper instead we focus on the causes of mobility starting from established research in sociology we propose simps mobility model of human crowds with pedestrian motion this model defines process called sociostation rendered by two complementary behaviors namely socialize and isolate that regulate an individual with regard to her his own sociability level simps leads to results that agree with scaling laws observed both in small scale and large scale human motion although our model defines only two simple individual behaviors we observe many emerging collective behaviors group formation splitting path formation and evolution
data domain description techniques aim at deriving concise descriptions of objects belonging to category of interest for instance the support vector domain description svdd learns hypersphere enclosing the bulk of provided unlabeled data such that points lying outside of the ball are considered anomalous however relevant information such as expert and background knowledge remain unused in the unsupervised setting in this paper we rephrase data domain description as semi supervised learning task that is we propose semi supervised generalization of data domain description sssvdd to process unlabeled and labeled examples the corresponding optimization problem is non convex we translate it into an unconstrained continuous problem that can be optimized accurately by gradient based techniques furthermore we devise an effective active learning strategy to query low confidence observations our empirical evaluation on network intrusion detection and object recognition tasks shows that our sssvdds consistently outperform baseline methods in relevant learning settings
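for reference, the standard unsupervised svdd primal problem that the above work generalizes learns a center c and radius R enclosing most of the data; the semi supervised non convex extension described above adds terms for labeled examples and is not reproduced here

\min_{R,\;\mathbf{c},\;\boldsymbol{\xi}} \;\; R^{2} + C \sum_{i} \xi_{i}
\quad \text{s.t.} \quad
\lVert \mathbf{x}_{i} - \mathbf{c} \rVert^{2} \;\le\; R^{2} + \xi_{i},
\qquad \xi_{i} \ge 0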
hair simulation remains one of the most challenging aspects of creating virtual characters most research focuses on handling the massive geometric complexity of hundreds of thousands of interacting hairs this is accomplished either by using brute force simulation or by reducing degrees of freedom with guide hairs this paper presents hybrid eulerian lagrangian approach to handling both self and body collisions with hair efficiently while still maintaining detail bulk interactions and hair volume preservation is handled efficiently and effectively with flip based fluid solver while intricate hair hair interaction is handled with lagrangian self collisions thus the method has the efficiency of continuum guide based hair models with the high detail of lagrangian self collision approaches
we present an adaptive work stealing thread scheduler steal for fork join multithreaded jobs like those written using the cilk multithreaded language or the hood work stealing library the steal algorithm is appropriate for large parallel servers where many jobs share common multiprocessor resource and in which the number of processors available to particular job may vary during the job’s execution steal provides continual parallelism feedback to job scheduler in the form of processor requests and the job must adapt its execution to the processors allotted to it assuming that the job scheduler never allots any job more processors than requested by the job’s thread scheduler steal guarantees that the job completes in near optimal time while utilizing at least constant fraction of the allotted processors our analysis models the job scheduler as the thread scheduler’s adversary challenging the thread scheduler to be robust to the system environment and the job scheduler’s administrative policies we analyze the performance of steal using trim analysis which allows us to prove that our thread scheduler performs poorly on at most small number of time steps while exhibiting near optimal behavior on the vast majority to be precise suppose that job has work and span critical path length on machine with processors steal completes the job in expected lg time steps where is the length of scheduling quantum and denotes the lg trimmed availability this quantity is the average of the processor availability over all but the lg time steps having the highest processor availability when the job’s parallelism dominates the trimmed availability that is the job achieves nearly perfect linear speedup conversely when the trimmed mean dominates the parallelism the asymptotic running time of the job is nearly its span
mobile storage devices such as usb flash drives offer flexible solution for the transport and exchange of data nevertheless in order to prevent unauthorized access to sensitive data many enterprises require strict security policies for the use of such devices with the effect of rendering their advantages rather unfruitful trusted virtual domains tvds provide secure it infrastructure offering homogeneous and transparent enforcement of access control policies on data and network resources however the current model does not specifically deal with mobile storage devices in this paper we present an extension of the tvd architecture to incorporate the usage of mobile storage devices our proposal addresses three major issues coherent extension of tvd policy enforcement by introducing architectural components that feature identification and management of transitory devices transparent mandatory encryption of sensitive data stored on mobile devices and highly dynamic centralized key management service in particular we address offline scenarios allowing users to access and modify data while being temporarily disconnected from the domain we also present prototype implementation based on the turaya security kernel
we consider the problem of clustering web image search results generally the image search results returned by an image search engine contain multiple topics organizing the results into different semantic clusters facilitates users browsing in this paper we propose hierarchical clustering method using visual textual and link analysis by using vision based page segmentation algorithm web page is partitioned into blocks and the textual and link information of an image can be accurately extracted from the block containing that image by using block level link analysis techniques an image graph can be constructed we then apply spectral techniques to find euclidean embedding of the images which respects the graph structure thus for each image we have three kinds of representations ie visual feature based representation textual feature based representation and graph based representation using spectral clustering techniques we can cluster the search results into different semantic clusters an image search example illustrates the potential of these techniques
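a hedged sketch with hypothetical stand in features, not the paper's pipeline: build one affinity matrix from visual, textual and link based similarities and partition the image search results with spectral clustering

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, cosine_similarity
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n = 60
visual = rng.random((n, 64))        # stand-ins for visual features
textual = rng.random((n, 200))      # stand-ins for block-level text features
link = rng.random((n, 30))          # stand-ins for graph/link embeddings

# Combine the three similarity views into one non-negative, symmetric affinity.
affinity = (rbf_kernel(visual, gamma=0.1)
            + np.clip(cosine_similarity(textual), 0, None)
            + np.clip(cosine_similarity(link), 0, None)) / 3.0

labels = SpectralClustering(n_clusters=5, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(np.bincount(labels))          # cluster sizes of the search results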
several supervised learning algorithms are suited to classify instances into multiclass value space multinomial logit mnl is recognized as robust classifier and is commonly applied within the crm customer relationship management domain unfortunately to date it is unable to handle huge feature spaces typical of crm applications hence the analyst is forced to immerse himself into feature selection surprisingly in sharp contrast with binary logit current software packages lack any feature selection algorithm for multinomial logit conversely random forests another algorithm learning multiclass problems is just like mnl robust but unlike mnl it easily handles high dimensional feature spaces this paper investigates the potential of applying the random forests principles to the mnl framework we propose the random multinomial logit rmnl ie random forest of mnls and compare its predictive performance to that of mnl with expert feature selection random forests of classification trees we illustrate the random multinomial logit on cross sell crm problem within the home appliances industry the results indicate substantial increase in model accuracy of the rmnl model to that of the mnl model with expert feature selection
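a hedged sketch of one plausible reading of the random multinomial logit idea: bootstrap samples plus random feature subsets with multinomial logit base learners, aggregated by averaging class probabilities; the published rmnl may differ in its sampling and aggregation details

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def fit_rmnl(X, y, n_estimators=25, n_features=2, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        rows = rng.integers(0, len(X), len(X))                     # bootstrap sample
        cols = rng.choice(X.shape[1], n_features, replace=False)   # random feature subset
        clf = LogisticRegression(max_iter=500).fit(X[rows][:, cols], y[rows])
        models.append((cols, clf))
    return models

def predict_rmnl(models, X):
    # average class probabilities over the ensemble, then take the arg max
    proba = np.mean([m.predict_proba(X[:, cols]) for cols, m in models], axis=0)
    return proba.argmax(axis=1)

X, y = load_iris(return_X_y=True)
models = fit_rmnl(X, y)
print("training accuracy:", (predict_rmnl(models, X) == y).mean())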
dynamic information flow tracking dift is an important tool for detecting common security attacks and memory bugs dift tool tracks the flow of information through monitored program’s registers and memory locations as the program executes detecting and containing fixing problems on the fly unfortunately sequential dift tools are quite slow and dift is quite challenging to parallelize in this paper we present new approach to parallelizing dift like functionality extending our recent work on accelerating sequential dift we consider variant of dift that tracks the information flow only through unary operations relaxed dift and yet makes sense for detecting security attacks and memory bugs we present parallel algorithm for relaxed dift based on symbolic inheritance tracking which achieves linear speed up asymptotically moreover we describe techniques for reducing the constant factors so that speed ups can be obtained even with just few processors we implemented the algorithm in the context of log based architectures lba system which provides hardware support for logging program trace and delivering it to other monitoring processors our simulation results on spec benchmarks and video player show that our parallel relaxed dift reduces the overhead to as low as using monitoring cores on core chip multiprocessor
this article presents resolution matched shadow maps rmsm modified adaptive shadow map asm algorithm that is practical for interactive rendering of dynamic scenes adaptive shadow maps which build quadtree of shadow samples to match the projected resolution of each shadow texel in eye space offer robust solution to projective and perspective aliasing in shadow maps however their use for interactive dynamic scenes is plagued by an expensive iterative edge finding algorithm that takes highly variable amount of time per frame and is not guaranteed to converge to correct solution this article introduces simplified algorithm that is up to ten times faster than asms has more predictable performance and delivers more accurate shadows our main contribution is the observation that it is more efficient to forgo the iterative refinement analysis in favor of generating all shadow texels requested by the pixels in the eye space image the practicality of this approach is based on the insight that for surfaces continuously visible from the eye adjacent eye space pixels map to adjacent shadow texels in quadtree shadow space this means that the number of contiguous regions of shadow texels which can be efficiently generated with rasterizer is proportional to the number of continuously visible surfaces in the scene moreover these regions can be coalesced to further reduce the number of render passes required to shadow an image the secondary contribution of this paper is demonstrating the design and use of data parallel algorithms inseparably mixed with traditional graphics programming to implement novel interactive rendering algorithm for the scenes described in this paper we achieve frames per second on static scenes and frames per second on dynamic scenes for and images with maximum effective shadow resolution of texels
escape analysis is static analysis that determines whether the lifetime of data may exceed its static scope this paper first presents the design and correctness proof of an escape analysis for java this analysis is interprocedural context sensitive and as flow sensitive as the static single assignment form so assignments to object fields are analyzed in flow insensitive manner since java is an imperative language the effect of assignments must be precisely determined this goal is achieved thanks to our technique using two interdependent analyses one forward one backward we introduce new method to prove the correctness of this analysis using aliases as an intermediate step we use integers to represent the escaping parts of values which leads to fast and precise analysis our implementation blanchet which applies to the whole java language is then presented escape analysis is applied to stack allocation and synchronization elimination in our benchmarks we stack allocate percent to percent of data eliminate more than percent of synchronizations on most programs percent and percent on two examples and get up to percent runtime decrease percent on average our detailed experimental study on large programs shows that the improvement comes more from the decrease of the garbage collection and allocation times than from improvements on data locality contrary to what happened for ml this comes from the difference in the garbage collectors
regular path queries are way of declaratively expressing queries on graphs as regular expression like patterns that are matched against paths in the graph there are two kinds of queries existential queries which specify properties about individual paths and universal queries which specify properties about all paths they provide simple and convenient framework for expressing program analyses as queries on graph representations of programs for expressing verification model checking problems as queries on transition systems for querying semi structured data etc parametric regular path queries extend the patterns with variables called parameters which significantly increase the expressiveness by allowing additional information along single or multiple paths to be captured and related this paper shows how variety of program analysis and model checking problems can be expressed easily and succinctly using parametric regular path queries the paper describes the specification design analysis and implementation of algorithms and data structures for efficiently solving existential and universal parametric regular path queries major contributions include the first complete algorithms and data structures for directly and efficiently solving existential and universal parametric regular path queries detailed complexity analysis of the algorithms detailed analytical and experimental performance comparison of variations of the algorithms and data structures and investigation of efficiency tradeoffs between different formulations of queries
mobile agent technology has emerged as promising programming paradigm for developing highly dynamic and large scale service oriented computing middlewares due to its desirable features for this purpose first of all scalable location transparent agent communication issue should be addressed in mobile agent systems despite agent mobility although there were proposed several directory service and message delivery mechanisms their disadvantages force them not to be appropriate to both low overhead location management and fast delivery of messages to agents migrating frequently to mitigate their limitations this paper presents scalable distributed directory service and message delivery mechanism the proposed mechanism enables each mobile agent to autonomously leave tails of forwarding pointers on some few of its visiting nodes depending on its preferences this feature results in low message forwarding overhead and low storage and maintenance cost of increasing chains of pointers per host also keeping mobile agent location information in the effective binding cache of each sending agent the sending agent can communicate with mobile agents much faster compared with the existing ones
idle resources can be exploited not only to run important local tasks such as data replication and virus checking but also to make contributions to society by participating in open computing projects like seti home when executing background processes to utilize such valuable idle resources we need to explicitly control them so that the user will not be discouraged from exploiting idle resources by foreground performance degradation unfortunately common priority based schedulers lack such explicit execution control in addition to encourage active use of idle resources mechanism for controlling background processes should not require modifications to the underlying operating system or user applications if such modifications are required the user may be reluctant to employ the mechanism in this paper we argue that we can reasonably detect resource contention between foreground and background processes and properly control background process execution at the user level we infer the existence of resource contention from the approximated resource shares of background processes our approach takes advantage of dynamically instrumented probes which are becoming increasingly popular in estimating the resource shares also it considers different resource types in combination and can handle varied workloads including multiple background processes we show that our system effectively avoids the performance degradation of foreground activities by suspending background processes in an appropriate fashion our system keeps the increase in foreground execution time due to background processes below or much lower in most of our experiments also we extend our approach to address undesirable resource allocations to cpu intensive processes that can occur in multiprocessor environments
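a simplified user level illustration using psutil rather than the dynamically instrumented probes described above: poll os counters, suspend the background processes when overall cpu pressure and their share suggest contention with foreground work, and resume them once the system has headroom again; the thresholds and the pid are hypothetical

import psutil

def throttle(background_pids, busy_threshold=80.0, share_threshold=10.0, period=1.0):
    procs = [psutil.Process(pid) for pid in background_pids]
    for p in procs:
        p.cpu_percent()                               # prime per-process cpu counters
    suspended = False
    while True:
        total = psutil.cpu_percent(interval=period)   # system-wide cpu utilization
        if not suspended:
            bg_share = sum(p.cpu_percent() for p in procs if p.is_running())
            if total >= busy_threshold and bg_share >= share_threshold:
                for p in procs:
                    p.suspend()                       # back off to protect foreground work
                suspended = True
        else:
            if total < busy_threshold - 10.0:         # resume once there is headroom again
                for p in procs:
                    p.resume()
                suspended = False

# throttle([12345])                                   # hypothetical background pid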
we practices lack an impact on industry partly due to we field that is not quality aware in fact it is difficult to find we methodologies that pay explicit attention to quality aspects however the use of systematic process that includes quality concerns from the earliest stages of development can contribute to easing the building up of quality guaranteed web applications without drastically increasing development costs and time to market in this kind of process quality issues should be taken into account while developing each outgoing artifact from the requirements model to the final application also quality models should be defined to evaluate the quality of intermediate we artifacts and how it contributes to improving the quality of the deployed application in order to tackle its construction while avoiding some of the most common problems that existing quality models suffer from in this paper we propose number of we quality models to address the idiosyncrasies of the different stakeholders and we software artifacts involved additionally we propose that these we quality models are supported by an ontology based we measurement meta model that provides set of concepts with clear semantics and relationships this we quality metamodel is one of the main contributions of this paper furthermore we provide an example that illustrates how such metamodel may drive the definition of particular we quality model
in this work we present rendering method with guaranteed interactive frame rates in complex scenes the algorithm is based on a new data structure determined in preprocessing to avoid frozen displays in large simulative visualizations like industrial plants typically described as cad models within preprocessing polygons are grouped by size and within these groups core clusters are calculated based on similarity and locality the clusters and polygons are building up hierarchy including weights ascertained within repetitive stages of re grouping and re clustering this additional information allows to choose subset over all primitives to reduce scene complexity depending on the viewer’s position sight and the determined weights within the hierarchy to guarantee specific frame rate the number of rendered primitives is limited by constant and typically constrained by hardware this reduction is controlled by the pre calculated weights and the viewer’s position and is not done arbitrarily at least the rendered section is suitable scene approximation that includes the viewer’s interests combining all this constant frame rate including million polygons at fps is obtainable practical results indicate that our approach leads to good scene approximations and realtime rendering of very large environments at the same time
because of the high volume and unpredictable arrival rate stream processing systems may not always be able to keep up with the input data streams resulting in buffer overflow and uncontrolled loss of data load shedding the prevalent strategy for solving this overflow problem has so far only been considered for relational stream processing but not for xml shedding applied to xml stream processing brings new opportunities and challenges due to complex nested nature of xml structures in this paper we tackle this unsolved xml shedding problem using three pronged approach first we develop an xquery preference model that enables users to specify the relative importance of preserving different subpatterns in the xml result structure this transforms shedding into the problem of rewriting the user query into shed queries that return approximate query answers with utility as measured by the given user preference model second we develop cost model to compare the performance of alternate shed queries third we develop two shedding algorithms optshed and fastshed optshed guarantees to find an optimal solution however at the cost of exponential complexity fastshed as confirmed by our experiments achieves close to optimal result in wide range of test cases finally we describe the in automaton shedding mechanism for xquery stream engines the experiments show that our proposed utility driven shedding solutions consistently achieve higher utility results compared to the existing relational shedding techniques
in the recent years the web has been rapidly deepened with the prevalence of databases online on this deep web many sources are structured by providing structured query interfaces and results organizing such structured sources into domain hierarchy is one of the critical steps toward the integration of heterogeneous web sources we observe that for structured web sources query schemas ie attributes in query interfaces are discriminative representatives of the sources and thus can be exploited for source characterization in particular by viewing query schemas as type of categorical data we abstract the problem of source organization into the clustering of categorical data our approach hypothesizes that homogeneous sources are characterized by the same hidden generative models for their schemas to find clusters governed by such statistical distributions we propose new objective function model differentiation which employs principled hypothesis testing to maximize statistical heterogeneity among clusters our evaluation over hundreds of real sources indicates that the schema based clustering accurately organizes sources by object domains eg books movies and on clustering web query schemas the model differentiation function outperforms existing ones such as likelihood entropy and context linkages with the hierarchical agglomerative clustering algorithm
nearest neighbor search is an important and widely used technique in number of important application domains in many of these domains the dimensionality of the data representation is often very high recent theoretical results have shown that the concept of proximity or nearest neighbors may not be very meaningful for the high dimensional case therefore it is often complex problem to find good quality nearest neighbors in such data sets furthermore it is also difficult to judge the value and relevance of the returned results in fact it is hard for any fully automated system to satisfy user about the quality of the nearest neighbors found unless he is directly involved in the process this is especially the case for high dimensional data in which the meaningfulness of the nearest neighbors found is questionable in this paper we address the complex problem of high dimensional nearest neighbor search from the user perspective by designing system which uses effective cooperation between the human and the computer the system provides the user with visual representations of carefully chosen subspaces of the data in order to repeatedly elicit his preferences about the data patterns which are most closely related to the query point these preferences are used in order to determine and quantify the meaningfulness of the nearest neighbors our system is not only able to find and quantify the meaningfulness of the nearest neighbors but is also able to diagnose situations in which the nearest neighbors found are truly not meaningful
we propose finite element simulation method that addresses the full range of material behavior from purely elastic to highly plastic for physical domains that are substantially reshaped by plastic flow fracture or large elastic deformations to mitigate artificial plasticity we maintain simulation mesh in both the current state and the rest shape and store plastic offsets only to represent the non embeddable portion of the plastic deformation to maintain high element quality in tetrahedral mesh undergoing gross changes we use dynamic meshing algorithm that attempts to replace as few tetrahedra as possible and thereby limits the visual artifacts and artificial diffusion that would otherwise be introduced by repeatedly remeshing the domain from scratch our dynamic mesher also locally refines and coarsens mesh and even creates anisotropic tetrahedra wherever simulation requests it we illustrate these features with animations of elastic and plastic behavior extreme deformations and fracture
shopping lists play central role in grocery shopping among other things shopping lists serve as memory aids and as tool for budgeting more interestingly shopping lists serve as an expression and indication of customer needs and interests accordingly shopping lists can be used as an input for recommendation techniques in this paper we describe methodology for making recommendations about additional products to purchase using items on the user’s shopping list as shopping list entries seldom correspond to products we first use information retrieval techniques to map the shopping list entries into candidate products association rules are used to generate recommendations based on the candidate products we evaluate the usefulness and interestingness of the recommendations in user study
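a hedged sketch of the two stages described above on toy data: map free text shopping list entries to catalog products with tf idf character n grams, then recommend co purchased products using a simple association rule style confidence mined from past baskets; the catalog, baskets and thresholds are made up

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from collections import Counter
from itertools import combinations

catalog = ["whole milk 1l", "dark chocolate bar", "spaghetti 500g",
           "tomato sauce", "ground coffee"]
baskets = [{"whole milk 1l", "ground coffee"},
           {"spaghetti 500g", "tomato sauce"},
           {"spaghetti 500g", "tomato sauce", "dark chocolate bar"}]

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4)).fit(catalog)

def match(entry):
    """Map a free-text list entry to the best-matching catalog product."""
    sims = cosine_similarity(vec.transform([entry]), vec.transform(catalog))[0]
    return catalog[sims.argmax()]

def recommend(products, min_conf=0.5):
    """Recommend co-purchased products by pairwise confidence from past baskets."""
    pair_counts, item_counts = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    recs = {}
    for p in products:
        for (a, b), n in pair_counts.items():
            other = b if a == p else a if b == p else None
            if other and other not in products:
                recs[other] = max(recs.get(other, 0), n / item_counts[p])
    return [r for r, conf in recs.items() if conf >= min_conf]

shopping_list = ["milk", "spagetti"]              # misspelled, free-text entries
candidates = [match(e) for e in shopping_list]
print(candidates, recommend(candidates))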
the emergence of wireless and mobile networks has made possible the introduction of new research area commerce or mobile commerce mobile payment is natural successor to web centric payments which has emerged as one of the sub domains of mobile commerce applications study reveals that there are wide ranges of mobile payment solutions and models which are available with the aid of various services such as short message service sms but there is no specific mobile payment system for educational institutions to collect the fees as well as for student community to pay the fees without huge investment this paper proposes secured framework for mobile payment consortia system mpcs to carry out the transactions from the bank to the academic institutions for the payment of fees by students through mobile phone mobile payment consortia system provides an end to end security using public key infrastructure pki through mobile information device profile midp enabled mobile device this framework provides an efficient reliable and secured system to perform mobile payment transactions and reduces transactional cost for both students and educational institutions mobile payment consortia system is designed with strong authentication and non repudiation by employing digital signatures confidentiality and message integrity are also provided by encrypting the messages at application level and by using public key certificates and digital signature envelops
the advance of multi core architectures provides significant benefits for parallel and throughput oriented computing but the performance of individual computation threads does not improve and may even suffer penalty because of the increased contention for shared resources this paper explores the idea of using available general purpose cores in cmp as helper engines for individual threads running on the active cores we propose lightweight architectural framework for efficient event driven software emulation of complex hardware accelerators and describe how this framework can be applied to implement variety of prefetching techniques we demonstrate the viability and effectiveness of our framework on wide range of applications from the spec cpu and olden benchmark suites on average our mechanism provides performance benefits within of pure hardware implementations furthermore we demonstrate that running event driven prefetching threads on top of baseline with hardware stride prefetcher yields significant speedups for many programs finally we show that our approach provides competitive performance improvements over other hardware approaches for multi core execution while executing fewer instructions and requiring considerably less hardware support
texture is an essential component of computer generated models for texture mapping procedure to be effective it has to generate continuous textures and cause only small mapping distortion the angle based flattening abf parameterization method is guaranteed to provide continuous no foldovers mapping it also minimizes the angular distortion of the parameterization including locating the optimal planar domain boundary however since it concentrates on minimizing the angular distortion of the mapping it can introduce relatively large linear distortion in this paper we introduce new procedure for reducing length distortion of an existing parameterization and apply it to abf results the linear distortion reduction is added as second step in texture mapping computation the new method is based on computing mapping from the plane to itself which has length distortion very similar to that of the abf parameterization by applying the inverse mapping to the result of the initial parameterization we obtain new parameterization with low length distortion we notice that the procedure for computing the inverse mapping can be applied to any other convenient mapping from the three dimensional surface to the plane in order to improve it the mapping in the plane is computed by applying weighted laplacian smoothing to cartesian grid covering the planar domain of the initial mapping both the mapping and its inverse are provably continuous since angle preserving conformal mappings such as abf locally preserve distances as well the planar mapping has small local deformation as result the inverse mapping does not significantly increase the angular distortion the combined texture mapping procedure provides mapping with low distance and angular distortion which is guaranteed to be continuous
retrieval speed and precision ultimately determine the success of any database system this article outlines the challenges posed by distributed and heterogeneous database systems including those that store unstructured data and surveys recent work much work remains to help users retrieve information with ease and efficiency from heterogeneous environment in which relational object oriented textual and pictorial databases coexist the article outlines the progress that has been made in query processing in distributed relational database systems and heterogeneous and multidatabase systems
the ultimate challenges of system modeling concern designing accurate yet highly transparent and user centric models we have witnessed plethora of neurofuzzy architectures which are aimed at addressing these two highly conflicting requirements this study is concerned with the design and the development of transparent logic networks realized with the aid of fuzzy neurons and fuzzy unineurons the construction of networks of this form requires formation of efficient interfaces that constitute conceptually appealing bridge between the model and the real world experimental environment in which the model is to be used in general the interfaces are constructed by invoking some form of granulation of information and binary boolean discretization in particular we introduce new discretization environment that is realized by means of particle swarm optimization pso and data clustering implemented by the means algorithm the underlying structure of the network is optimized by invoking combination of the pso and the mechanisms of conventional gradient based learning we discuss various optimization strategies by considering boolean as well as fuzzy data coming as the result of discretization of original experimental data and then involving several learning strategies we elaborate on the interpretation aspects of the network and show how those could be strengthened through efficient pruning we also show how the interpreted network leads to simpler and more accurate logic description of the experimental data number of experimental studies are included
this paper shows that even very small data caches when split to serve data streams exhibiting temporal and spatial localities can improve performance of embedded applications without consuming excessive silicon real estate or power it also shows that large block sizes or higher set associativities are unnecessary with split cache organizations we use benchmark programs from mibench to show that our cache organization outperforms other organizations in terms of miss rates access times energy consumption and silicon area
in this paper we address the task of crosslingual semantic relatedness we introduce method that relies on the information extracted from wikipedia by exploiting the interlanguage links available between wikipedia versions in multiple languages through experiments performed on several language pairs we show that the method performs well with performance comparable to monolingual measures of relatedness
the fair evaluation and comparison of side channel attacks and countermeasures has been long standing open question limiting further developments in the field motivated by this challenge this work makes step in this direction and proposes framework for the analysis of cryptographic implementations that includes theoretical model and an application methodology the model is based on commonly accepted hypotheses about side channels that computations give rise to it allows quantifying the effect of practically relevant leakage functions with combination of information theoretic and security metrics measuring the quality of an implementation and the strength of an adversary respectively from theoretical point of view we demonstrate formal connections between these metrics and discuss their intuitive meaning from practical point of view the model implies unified methodology for the analysis of side channel key recovery attacks the proposed solution allows getting rid of most of the subjective parameters that were limiting previous specialized and often ad hoc approaches in the evaluation of physically observable devices it typically determines the extent to which basic but practically essential questions such as how to compare two implementations or how to compare two side channel adversaries can be answered in sound fashion
with the significant advances in mobile computing technology there is an increasing demand for various mobile applications to process transactions in real time fashion when remote data access is considered in mobile environment data access delay becomes one of the most serious problems in meeting transaction deadlines in this paper we propose multi version data model and adopt the relative consistency as the correctness criterion for processing of real time transactions in mobile environment the purpose is to reduce the impacts of unpredictable and unreliable mobile network on processing of the real time transactions under the proposed model the overheads for concurrency control can be significantly reduced and the data availability is much enhanced even under network failures real time transaction may access stale data provided that they are relatively consistent with the data accessed by the transaction and the staleness of the data is within the requirements an image transaction model which pre fetches multiple data versions at fixed hosts is proposed to reduce the data access delay and to simplify the management of the real time transactions in mobile environment the image transaction model also helps in reducing the transaction restart overheads and minimizing the impacts of the unpredictable performance of mobile network on transaction executions
bisimulation semantics are very pleasant way to define the semantics of systems mainly because of the simplicity of their definitions and their nice coalgebraic properties however they also have some disadvantages they are based on sequential operational semantics defined by means of an ordinary transition system and in order to be bisimilar two systems have to be too similar in this work we will present several natural proposals to define weaker bisimulation semantics that we think properly capture the desired behaviour of distributed systems the main virtue of all these semantics is that they are real bisimulation semantics thus inheriting most of the good properties of bisimulation semantics this is so because they can be defined as particular instances of jacobs and hughes categorical definition of simulation which they have already proved to satisfy all those properties
new "range space" approach is described for synergistic resolution of both stereovision and reflectance visual modeling problems simultaneously this synergistic approach can be applied to arbitrary camera arrangements with different intrinsic and extrinsic parameters image types image resolutions and image number these images are analyzed in step wise manner to extract range measurements and also to render customized perspective view the entire process is fully automatic an extensive and detailed experimental validation phase supports the basic feasibility and generality of the range space approach
category level object recognition segmentation and tracking in videos becomes highly challenging when applied to sequences from hand held camera that features extensive motion and zooming an additional challenge is then to develop fully automatic video analysis system that works without manual initialization of tracker or other human intervention both during training and during recognition despite background clutter and other distracting objects moreover our working hypothesis states that category level recognition is possible based only on an erratic flickering pattern of interest point locations without extracting additional features compositions of these points are then tracked individually by estimating parametric motion model groups of compositions segment video frame into the various objects that are present and into background clutter objects can then be recognized and tracked based on the motion of their compositions and on the shape they form finally the combination of this flow based representation with an appearance based one is investigated besides evaluating the approach on challenging video categorization database with significant camera motion and clutter we also demonstrate that it generalizes to action recognition in natural way
development of new embedded systems requires tuning of the software applications to specific hardware blocks and platforms as well as to the relevant input data instances the behaviour of these applications heavily relies on the nature of the input data samples thus making them strongly data dependent for this reason it is necessary to extensively profile them with representative samples of the actual input data an important aspect of this profiling is done at the dynamic data type level which actually steers the designers choice of implementation of these data types the behaviour of the applications is then characterized through an analysis phase as collection of software metadata that can be used to optimize the system as whole in this paper we propose to represent the behaviour of data dependent applications to enable optimizations rather than to analyze their structure or to define the engineering process behind them moreover we specifically limit ourselves to the scope of applications dominated by dynamically allocated data types running on embedded systems we characterize the software metadata that these optimizations require and we present methodology as well as appropriate techniques to obtain this information from the original application the optimizations performed on complete case study utilizing the extracted software metadata achieve overall improvements of up to in the number of cycles spent accessing memory when compared to code optimized only with the static techniques applied by gnu
this work addresses the issue of answering spatio temporal range queries when there is uncertainty associated with the model of the moving objects uncertainty is inherent in moving objects database mod applications and capturing it in the data model has twofold impact the number of updates when the actual trajectory deviates from its mod representation the linguistic constructs and the processing algorithms for querying the mod the paper presents both spatial and temporal uncertainty aspects which are combined into one model of uncertain trajectories given the model the methodology is presented which enables processing of queries such as what is the probability that given moving object was/will be inside given region sometimes/always during given time interval where the regions are bounded by arbitrary polygons
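A minimal sketch of the kind of query described above: a Monte Carlo estimate of the probability that an uncertain trajectory was inside a region sometime during an interval. The disk-shaped uncertainty model, the rectangular region, and all parameter values are simplifying assumptions; the paper's model supports arbitrary polygons and defines a dedicated processing methodology rather than sampling.

```python
import math
import random

def sometime_inside_probability(route, radius, region, t_start, t_end,
                                n_samples=5000, steps_per_leg=5, seed=42):
    """Monte Carlo estimate of P(object is inside `region` at least once
    during [t_start, t_end]) for an uncertain trajectory.

    route  : list of (t, x, y) samples of the expected trajectory
    radius : uncertainty radius around each interpolated position
    region : axis-aligned rectangle (xmin, ymin, xmax, ymax)
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = region
    hits = 0
    for _ in range(n_samples):
        inside_once = False
        for (t0, x0, y0), (t1, x1, y1) in zip(route, route[1:]):
            if inside_once or t1 < t_start or t0 > t_end:
                continue
            for k in range(steps_per_leg + 1):
                a = k / steps_per_leg
                t = t0 + a * (t1 - t0)
                if not (t_start <= t <= t_end):
                    continue
                # Perturb the expected position uniformly within the
                # uncertainty disk around the interpolated point.
                r = radius * math.sqrt(rng.random())
                ang = rng.uniform(0.0, 2.0 * math.pi)
                x = x0 + a * (x1 - x0) + r * math.cos(ang)
                y = y0 + a * (y1 - y0) + r * math.sin(ang)
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    inside_once = True
                    break
        hits += inside_once
    return hits / n_samples

route = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 10.0)]
print(sometime_inside_probability(route, radius=1.5,
                                  region=(4.0, -1.0, 6.0, 1.0),
                                  t_start=0, t_end=12))
```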
in recent years grid systems and peer to peer networks are the most commonly used solutions to achieve the same goal the sharing of resources and services in heterogeneous dynamic distributed environments many studies have proposed hybrid approaches that try to conjugate the advantages of the two models this paper proposes an architecture that integrates the p2p interaction model in grid environments so as to build an open cooperative model wherein grid entities are composed in decentralized way in particular this paper focuses on qos aware discovery algorithm for p2p grid systems analyzing protocol and explaining techniques used to improve its performance
technological success has ushered in massive amounts of data for scientific analysis to enable effective utilization of these data sets for all classes of users supporting intuitive data access and manipulation interfaces is crucial this paper describes an autonomous scientific workflow system that enables high level natural language based queries over low level data sets our technique involves combination of natural language processing metadata indexing and semantically aware workflow composition engine which dynamically constructs workflows for answering queries based on service and data availability specific contribution of this work is metadata registration scheme that allows for unified index of heterogeneous metadata formats and service annotations our approach thus avoids standardized format for storing all data sets or the implementation of federated mediator based querying framework we have evaluated our system using case study from the geospatial domain to show functional results our evaluation supports the potential benefits which our approach can offer to scientific workflow systems and other domain specific data intensive applications
this paper evaluates pointer tainting an incarnation of dynamic information flow tracking dift which has recently become an important technique in system security pointer tainting has been used for two main purposes detection of privacy breaching malware eg trojan keyloggers obtaining the characters typed by user and detection of memory corruption attacks against non control data eg buffer overflow that modifies user’s privilege level in both of these cases the attacker does not modify control data such as stored branch targets so the control flow of the target program does not change phrased differently in terms of instructions executed the program behaves normally as result these attacks are exceedingly difficult to detect pointer tainting is considered one of the only methods for detecting them in unmodified binaries unfortunately almost all of the incarnations of pointer tainting are flawed in particular we demonstrate that the application of pointer tainting to the detection of keyloggers and other privacy breaching malware is problematic we also discuss whether pointer tainting is able to reliably detect memory corruption attacks against non control data pointer tainting itself generates the conditions for false positives we analyse the problems in detail and investigate various ways to improve the technique most have serious drawbacks in that they are either impractical incur many false positives still and/or cripple the technique’s ability to detect attacks in conclusion we argue that depending on architecture and operating system pointer tainting may have some value in detecting memory corruption attacks albeit with false negatives and not on the popular architecture but it is fundamentally not suitable for automated detection of privacy breaching malware such as keyloggers
common belief in the scientific community is that traffic classifiers based on deep packet inspection dpi are far more expensive in terms of computational complexity compared to statistical classifiers in this paper we counter this notion by defining accurate models for deep packet inspection classifier and statistical one based on support vector machines and by evaluating their actual processing costs through experimental analysis the results suggest that contrary to the common belief dpi classifier and an svm based one can have comparable computational costs although much work is left to prove that our results apply in more general cases this preliminary analysis is first indication of how dpi classifiers might not be as computationally complex compared to other approaches as we previously thought
in recent years both performance and power have become key factors in efficient memory design in this paper we propose systematic approach to reduce the energy consumption of the entire memory hierarchy we first evaluate an existing power aware memory system where memory modules can exist in different power modes and then propose on chip memory module buffers called energy saver buffers esb which reside in between the cache and main memory esbs reduce the additional overhead incurred due to frequent resynchronization of the memory modules in low power state an additional improvement is attained by using model that dynamically resizes the active cache based on the varying needs of program our experimental results demonstrate that an integrated approach can reduce the energy delay product by as much as when compared to traditional non power aware memory hierarchy
we show how properties of an interesting class of imperative programs can be calculated by means of relational modeling and symbolic computation the ideas of are implemented using symbolic computations based on maple
focus on organizing and implementing workflows in government from standpoint of data awareness or even data centricity might provide opportunities to address several of the challenges facing governments efforts to improve government effectiveness and efficiency as well as interoperation between government entities the notion of data aware as opposed to data unaware or just process centric workflows is based on taking into account and using the particular enactments and instance specific data in the workflow itself and beyond its single instance in other words on the one hand workflows process data however through their enactments and instances on the other hand workflows are data ready as inputs for other workflows including cross process enactment mining and analyses to be useful as strategy in government we need to explore how data centric approaches can tackle the specific challenges of government such as the ill structured or semi structured workflows found in emergency and disaster response management and we need to better understand the specific constraints under which intra and cross agency data centric workflows can be designed and implemented this paper lays out research agenda that will inform questions about the issues with and the potential of the data centric approach in the context of government
many electronic cash systems have been proposed with the proliferation of the internet and the activation of electronic commerce e cash enables the exchange of digital coins with value assured by the bank’s signature and with concealed user identity in an electronic cash system user can withdraw coins from the bank and then spends each coin anonymously and unlinkably in this paper we design an efficient anonymous mobile payment system based on bilinear pairings in which the anonymity of coins is revocable by trustee in case of dispute the message transfer from the customer to the merchant occurs only once during the payment protocol also the amount of communication between customer and merchant is about bits therefore our mobile payment system can be used in the wireless networks with the limited bandwidth the security of the new system rests on the computational diffie hellman problem in the random oracle model
in this paper we present formal model named pobsam policy based self adaptive model for developing and modeling self adaptive systems in this model policies are used as mechanism to direct and adapt the behavior of self adaptive systems pobsam model consists of set of self managed modules smm an smm is collection of autonomous managers and managed actors managed actors are dedicated to functional behavior while autonomous managers govern the behavior of managed actors by enforcing suitable policies to adapt smm behavior in response to changes policies governing an smm are adjusted ie dynamic policies are used to govern and adapt system behavior we employ the combination of an algebraic formalism and an actor based model to specify this model formally managers are modeled as meta actors whose policies are described using an algebra managed actors are expressed by an actor model furthermore we provide an operational semantics for pobsam described using labeled transition systems
embedded systems designers are free to choose the most suitable configuration of cache in modern processor based socs choosing the appropriate cache configuration necessitates the simulation of long memory access traces to accurately obtain hit miss rates the long execution time taken to simulate these traces particularly separate simulation for each configuration is major drawback researchers have proposed techniques to speed up the simulation of caches with lru replacement policy these techniques are of little use in the majority of embedded processors as these processors utilize round robin policy based caches in this paper we propose fast cache simulation approach called scud sorted collection of unique data for caches with the round robin policy scud is single pass cache simulator that can simulate multiple cache configurations with varying set sizes and associativities by reading the application trace once utilizing fast binary searches in novel data structure scud simulates an application trace significantly faster than widely used single configuration cache simulator dinero iv we show scud can simulate set of cache configurations up to times faster than dinero iv scud shows an average speed up of times over dinero iv for mediabench applications and an average speed up of over times for spec cpu applications
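For context, a minimal per-configuration simulator of the round robin (FIFO) replacement policy that the abstract above targets is sketched below. This is only a baseline illustration of the policy, not SCUD's single-pass, multi-configuration algorithm, and the block size and cache geometry are arbitrary assumptions.

```python
from collections import deque

def simulate_round_robin_cache(trace, num_sets, assoc, block_bytes=32):
    """Count hits/misses for one set-associative cache with round-robin
    (FIFO) replacement, given a trace of byte addresses."""
    sets = [deque() for _ in range(num_sets)]   # each deque holds tags in FIFO order
    hits = misses = 0
    for addr in trace:
        block = addr // block_bytes
        idx, tag = block % num_sets, block // num_sets
        s = sets[idx]
        if tag in s:
            hits += 1                            # FIFO: no reordering on a hit
        else:
            misses += 1
            if len(s) == assoc:
                s.popleft()                      # evict the oldest line (round robin)
            s.append(tag)
    return hits, misses

# Toy trace: two passes over an array larger than the cache capacity.
trace = [i * 4 for i in range(0, 4096)] * 2
print(simulate_round_robin_cache(trace, num_sets=64, assoc=2))
```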
the frequent items problem is to process stream of items and find all items occurring more than given fraction of the time it is one of the most heavily studied problems in data stream mining dating back to the many applications rely directly or indirectly on finding the frequent items and implementations are in use in large scale industrial systems however there has not been much comparison of the different methods under uniform experimental conditions it is common to find papers touching on this topic in which important related work is mischaracterized overlooked or reinvented in this paper we aim to present the most important algorithms for this problem in common framework we have created baseline implementations of the algorithms and used these to perform thorough experimental study of their properties we give empirical evidence that there is considerable variation in the performance of frequent items algorithms the best methods can be implemented to find frequent items with high accuracy using only tens of kilobytes of memory at rates of millions of items per second on cheap modern hardware
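As one concrete example of the algorithms such a survey covers, below is a small implementation of the classic Misra-Gries summary for the frequent items problem. The choice of this particular algorithm and the toy stream are illustrative only, not a claim about which methods the paper found most accurate or fastest.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with k-1 counters: every item occurring more than
    a 1/k fraction of the stream is guaranteed to survive in the summary
    (counts are underestimates, so surviving candidates are verified in a
    second pass or accepted with that known error bound)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = list("abracadabra" * 100) + list("xyz")
# 'a' exceeds the 1/4 frequency threshold and is guaranteed to appear.
print(misra_gries(stream, k=4))
```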
constructing correct concurrent garbage collection algorithms is notoriously hard numerous such algorithms have been proposed implemented and deployed and yet the relationship among them in terms of speed and precision is poorly understood and the validation of one algorithm does not carry over to others as programs with low latency requirements written in garbage collected languages become part of society’s mission critical infrastructure it is imperative that we raise the level of confidence in the correctness of the underlying system and that we understand the trade offs inherent in our algorithmic choice in this paper we present correctness preserving transformations that can be applied to an initial abstract concurrent garbage collection algorithm which is simpler more precise and easier to prove correct than algorithms used in practice but also more expensive and with less concurrency we then show how both pre existing and new algorithms can be synthesized from the abstract algorithm by series of our transformations we relate the algorithms formally using new definition of precision and informally with respect to overhead and concurrency this provides many insights about the nature of concurrent collection allows the direct synthesis of new and useful algorithms reduces the burden of proof to single simple algorithm and lays the groundwork for the automated synthesis of correct concurrent collectors
moving and resizing desktop windows are frequently performed but largely unexplored interaction tasks the standard title bar and border dragging techniques used for window manipulation have not changed much over the years we studied three new methods to move and resize windows the new methods are based on proxy and goal crossing techniques to eliminate the need of long cursor movements and acquiring narrow window borders instead moving and resizing actions are performed by manipulating proxy objects close to the cursor and by sweeping cursor motions across window borders we compared these techniques with the standard techniques the results indicate that further investigations and redesigns of window manipulation techniques are worthwhile all new techniques were faster than the standard techniques with task completion times improving more than in some cases also the new resizing techniques were found to be less error prone than the traditional click and drag method
permissive nominal logic pnl is an extension of first order logic where term formers can bind names in their arguments this allows for direct axiomatisations with binders such as the quantifier of first order logic itself and the binder of the lambda calculus this also allows us to finitely axiomatise arithmetic like first and higher order logic and unlike other nominal logics equality reasoning is not necessary to alpha rename all this gives pnl much of the expressive power of higher order logic but terms derivations and models of pnl are first order in character and the logic seems to strike good balance between expressivity and simplicity
computing environments on cellphones especially smartphones are becoming more open and general purpose thus they also become attractive targets of malware cellphone malware not only causes privacy leakage extra charges and depletion of battery power but also generates malicious traffic and drains down mobile network and service capacity in this work we devise novel behavior based malware detection system named pbmds which adopts probabilistic approach through correlating user inputs with system calls to detect anomalous activities in cellphones pbmds observes unique behaviors of the mobile phone applications and the operating users on input and output constrained devices and leverages hidden markov model hmm to learn application and user behaviors from two major aspects process state transitions and user operational patterns built on these pbmds identifies behavioral differences between malware and human users through extensive experiments on major smartphone platforms we show that pbmds can be easily deployed to existing smartphone hardware and it achieves high detection accuracy and low false positive rates in protecting major applications in smartphones
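The abstract above says PBMDS learns application and user behaviour with a hidden Markov model; the sketch below only illustrates the generic forward-algorithm scoring step such a detector could use to flag low-likelihood behaviour. The two-state model, the observation alphabet, and all probabilities are invented for illustration and are not taken from the paper.

```python
import math

def forward_loglik(obs, start_p, trans_p, emit_p):
    """Log-likelihood of a discrete observation sequence under an HMM
    (plain forward algorithm).  Sequences scoring far below what is seen
    for benign behaviour would be flagged as anomalous."""
    states = list(start_p)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return math.log(sum(alpha.values()))

# Hypothetical 2-state model: user-driven vs. background activity emitting
# coarse event classes (key press, network call, file access).
start = {"user": 0.7, "bg": 0.3}
trans = {"user": {"user": 0.8, "bg": 0.2}, "bg": {"user": 0.3, "bg": 0.7}}
emit  = {"user": {"key": 0.5, "net": 0.2, "file": 0.3},
         "bg":   {"key": 0.05, "net": 0.6, "file": 0.35}}

benign  = ["key", "file", "key", "net", "key"]
suspect = ["net", "net", "net", "net", "net"]   # e.g. traffic without user input
print(forward_loglik(benign, start, trans, emit),
      forward_loglik(suspect, start, trans, emit))
```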
context the technology acceptance model tam was proposed in as means of predicting technology usage however it is usually validated by using measure of behavioural intention to use bi rather than actual usage objective this review examines the evidence that the tam predicts actual usage using both subjective and objective measures of actual usage method we performed systematic literature review based on search of six digital libraries along with vote counting meta analysis to analyse the overall results results the search identified relevant empirical studies in articles the results show that bi is likely to be correlated with actual usage however the tam variables perceived ease of use peu and perceived usefulness pu are less likely to be correlated with actual usage conclusion care should be taken using the tam outside the context in which it has been validated
recent years have transformed the web from web of content to web of applications and social content thus it has become crucial to be able to tap on this social aspect of the web whenever possible in addition to its content particularly for focused crawling in this paper we present novel profile based focused crawling system for dealing with the increasingly popular social media sharing web sites without assuming any privileged access to the internal private databases of such websites nor any requirement for the existence of apis for the extraction of social data our experiments prove the robustness of our profile based focused crawler as well as significant improvement in harvest ratio compared to breadth first and opic crawlers when crawling the flickr web site for two different topics
the combination of sgml and database technology allows one to refine both declarative and navigational access mechanisms for structured document collection with regard to declarative access the user can formulate complex information needs without knowing query language the respective document type definition dtd or the underlying modelling navigational access is eased by hyperlink rendition mechanisms going beyond plain link integrity checking with our approach the database internal representation of documents is configurable it allows for an efficient implementation of operations because dtd knowledge is not needed for document structure recognition we show how the number of method invocations and the cost of parsing can be significantly reduced
information shown on tabletop display can appear distorted when viewed by seated user even worse the impact of this distortion is different depending on the location of the information on the display in this paper we examine how this distortion affects the perception of the basic graphical elements of information visualization shown on displays at various angles we first examine perception of these elements on single display and then compare this to perception across displays in order to evaluate the effectiveness of various elements for use in tabletop and multi display environment we found that the perception of some graphical elements is more robust to distortion than others we then develop recommendations for building data visualizations for these environments
the paper proposes new knowledge representation language called dlp which extends disjunctive logic programming with strong negation by inheritance the addition of inheritance enhances the knowledge modeling features of the language providing natural representation of default reasoning with exceptions declarative model theoretic semantics of dlp is provided which is shown to generalize the answer set semantics of disjunctive logic programs the knowledge modeling features of the language are illustrated by encoding classical nonmonotonic problems in dlp the complexity of dlp is analyzed proving that inheritance does not cause any computational overhead as reasoning in dlp has exactly the same complexity as reasoning in disjunctive logic programming this is confirmed by the existence of an efficient translation from dlp to plain disjunctive logic programming using this translation an advanced kr system supporting the dlp language has been implemented on top of the dlv system and has subsequently been integrated into dlv
the event service of the common object request broker architecture corba is useful in supporting decoupled and asynchronous communication between distributed object components however the specification of the event service standard does not require implementation to provide facilities to guarantee efficient event data delivery consequently applications in which large number of objects need to communicate via an event service channel may suffer from poor performance in this paper generic corba based framework is proposed to tackle this scalability problem two techniques are applied namely event channel federation and load balancing the solution is transparent in the sense that it exports the same idl interface as the original event service we explore three critical dimensions underlying the design of the load balancing algorithm and conduct experiments to evaluate their impact on the overall performance of the framework the results provide some useful insights into the improvement of the scalability of the event service
the development of efficient techniques for transforming massive volumes of remotely sensed hyperspectral data into scientific understanding is critical for space based earth science and planetary exploration although most available parallel processing strategies for information extraction and mining from hyperspectral imagery assume homogeneity in the underlying computing platform heterogeneous networks of computers hnocs have become promising cost effective solution expected to play major role in many on going and planned remote sensing missions in this paper we develop new morphological parallel algorithm for hyperspectral image classification using heterompi an extension of mpi for programming high performance computations on hnocs the main idea of heterompi is to automate and optimize the selection of group of processes that executes heterogeneous algorithm faster than any other possible group in heterogeneous environment in order to analyze the impact of many to one gather communication operations introduced by our proposed algorithm we resort to recently proposed collective communication model the parallel algorithm is validated using two heterogeneous clusters at university college dublin and massively parallel beowulf cluster at nasa’s goddard space flight center
this paper presents many core heterogeneous computational platform that employs gals compatible circuit switched on chip network the platform targets streaming dsp and embedded applications that have high degree of task level parallelism among computational kernels the test chip was fabricated in nm cmos consisting of simple small programmable cores three dedicated purpose accelerators and three shared memory modules all processors are clocked by their own local oscillators and communication is achieved through simple yet effective source synchronous communication technique that allows each interconnection link between any two processors to sustain peak throughput of one data word per cycle complete wlan baseband receiver was implemented on this platform it has real time throughput of mbps with all processors running at mhz and and consumes an average mw with mw or dissipated by its interconnection links we can fully utilize the benefit of the gals architecture and by adjusting each processor’s oscillator to run at workload based optimal clock frequency with the chip’s dual supply voltages set at and the receiver consumes only mw in power reduction measured results of its power consumption on the real chip come within the difference of only compared with the estimated results showing our design to be highly reliable and efficient
despite the fact that global software development gsd is steadily becoming the standard engineering mode in the software industry commercial projects still struggle with how to effectively manage it recent research and our own experiences from numerous gsd projects at capgemini sd indicate that staging the development process with handover checkpoints is promising practice in order to tackle many of the encountered problems in practice in this paper we discuss typical management problems in gsd we describe how handover checkpoints are used at capgemini sd to control and safely manage large gsd projects we show how these handover checkpoints and the use of cohesive and self contained work packages effectively mitigate the discussed management problems we are continuously refining and improving our handover checkpoint approach by applying it within large scale commercial gsd projects we thus believe that the presented results can serve the practitioner as fundament for implementing and customizing handover checkpoints within his own organisation
play on demand is usually regarded as feasible access mode for web content including streaming video web pages and so on web services and some software as service saas applications but not for common desktop applications this paper presents such solution for windows desktop applications based on lightweight virtualization and network transportation technologies which allows user to run her personalized software on any compatible computer across the internet even though they do not exist on local disks of the host in our approach the user’s data and their configurations are stored on portable usb device at run time the desktop applications are downloaded from the internet and run in lightweight virtualization environment in which some resource accessing apis such as registry files directories environment variables and the like are intercepted and redirected to the portable device or network as needed because applications are played without installation like streaming media they can be called streaming software moreover to protect software vendors rights access control technologies are used to block any illegal access in the current implementation p2p transportation is used as the transport method however our design actually does not rely on p2p and another data delivery mechanism like dedicated file server could be employed instead to make the system more predictable this paper describes the design and technical details for this system presents demo application and evaluates its performance the proposed solution is shown to be more efficient in performance and storage capacity than some of the existing solutions based on vm techniques
in an application where sparse matching of feature points is used towards fast scene reconstruction the choice of the type of features to be matched has an important impact on the quality of the resulting model in this work method is presented for quickly and reliably selecting and matching points from three views of scene the selected points are based on epipolar gradients and consist of stable image features relevant to reconstruction then the selected points are matched using edge transfer measure of geometric consistency for point triplets and the edges on which they lie this matching scheme is tolerant to image deformations due to changes in viewpoint models drawn from matches obtained by the proposed technique are shown to demonstrate its usefulness
several approaches to collaborative filtering have been studied but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings in this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of google news we generate recommendations using three approaches collaborative filtering using minhash clustering probabilistic latent semantic indexing plsi and covisitation counts we combine recommendations from different algorithms using linear model our approach is content agnostic and consequently domain independent making it easily adaptable for other applications and languages with minimal effort this paper will describe our algorithms and system setup in detail and report results of running the recommendations engine on google news
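To illustrate one of the three ingredients named above, here is a small MinHash sketch over user click histories; signature agreement approximates Jaccard similarity between users, which is the basis for MinHash clustering. The hash construction, signature length, and toy data are assumptions for illustration, not the production setup described in the paper.

```python
import hashlib

def minhash_signature(items, num_hashes=16):
    """MinHash signature of a set of item ids: for each of num_hashes hash
    functions keep the minimum hash value over the set.  Users whose
    signatures agree on many components have highly overlapping click
    histories and can be placed in the same cluster."""
    sig = []
    for i in range(num_hashes):
        best = None
        for item in items:
            h = int(hashlib.md5(f"{i}:{item}".encode()).hexdigest(), 16)
            best = h if best is None or h < best else best
        sig.append(best)
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

user_a = {"story1", "story2", "story3", "story4"}
user_b = {"story2", "story3", "story4", "story5"}
sa, sb = minhash_signature(user_a), minhash_signature(user_b)
print(estimated_jaccard(sa, sb))   # rough estimate of the true Jaccard 3/5
```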
this article presents an approach to identify abstract data types adt and abstract state encapsulations ase also called abstract objects in source code this approach named similarity clustering groups together functions types and variables into adt and ase candidates according to the proportion of features they share the set of features considered includes the context of these elements the relationships to their environment and informal information prototype tool has been implemented to support this approach it has been applied to three systems each between and kloc the adts and ases identified by the approach are compared to those identified by software engineers who did not know the proposed approach or other automatic approaches within this case study this approach has been shown to have higher detection quality and to identify in most of the cases more adts and ases than the other techniques in all other cases its detection quality is second best nb this article reports on work in progress on this approach which has evolved since it was presented in the original ase conference paper
competitive native solvers for answer set programming asp perform backtracking search by assuming the truth of literals the choice of literals the heuristic is fundamental for the performance of these systems most of the efficient asp systems employ heuristic based on look ahead that is literal is tentatively assumed and its heuristic value is based on its deterministic consequences however looking ahead is costly operation and indeed look ahead often accounts for the majority of time taken by asp solvers for satisfiability sat radically different approach called look back heuristic proved to be quite successful instead of looking ahead one uses information gathered during the computation performed so far thus looking back in this approach atoms which have been frequently involved in inconsistencies are preferred in this paper we carry over this approach to the framework of disjunctive asp we design number of look back heuristics exploiting peculiarities of asp and implement them in the asp system dlv we compare their performance on collection of hard asp programs both structured and randomly generated these experiments indicate that very basic approach works well outperforming all of the prominent disjunctive asp systems dlv with its traditional heuristic gnt and cmodels on many of the instances considered
the widespread presence of simd devices in today’s microprocessors has made compiler techniques for these devices tremendously important one of the most important and difficult issues that must be addressed by these techniques is the generation of the data permutation instructions needed for non contiguous and misaligned memory references these instructions are expensive and therefore it is of crucial importance to minimize their number to improve performance and in many cases enable speedups over scalar code although it is often difficult to optimize an isolated data reorganization operation collection of related data permutations can often be manipulated to reduce the number of operations this paper presents strategy to optimize all forms of data permutations the strategy is organized into three steps first all data permutations in the source program are converted into generic representation these permutations can originate from vector accesses to non contiguous and misaligned memory locations or result from compiler transformations second an optimization algorithm is applied to reduce the number of data permutations in basic block by propagating permutations across statements and merging consecutive permutations whenever possible the algorithm can significantly reduce the number of data permutations finally code generation algorithm translates generic permutation operations into native permutation instructions for the target platform experiments were conducted on various kinds of applications the results show that up to of the permutation instructions are eliminated and as result the average performance improvement is on vmx and on sse for several applications near perfect speedups have been achieved on both platforms
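A tiny illustration of the merging step described above: two consecutive data permutations, each represented as an index vector, compose into a single permutation, so back-to-back reorganisations can be replaced by one. The index-vector representation and the example permutations are illustrative assumptions, not the paper's internal representation.

```python
def compose(p, q):
    """Single permutation equivalent to applying p first, then q, where a
    permutation is an index vector: apply_perm(perm, data)[i] = data[perm[i]].
    Applying p then q equals applying r with r[i] = p[q[i]]."""
    return [p[qi] for qi in q]

def apply_perm(perm, data):
    return [data[i] for i in perm]

# Two back-to-back reorganisations (e.g. an unaligned-load fix-up followed by
# an interleave) collapse into one permutation's worth of work.
p = [1, 2, 3, 0]          # rotate lanes left by one
q = [0, 2, 1, 3]          # swap the middle lanes
r = compose(p, q)

data = ["a", "b", "c", "d"]
assert apply_perm(q, apply_perm(p, data)) == apply_perm(r, data)
print(r, apply_perm(r, data))
```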
this paper presents an approach that uses special purpose rbac constraints to base certain access control decisions on context information in our approach context constraint is defined as dynamic rbac constraint that checks the actual values of one or more contextual attributes for predefined conditions if these conditions are satisfied the corresponding access request can be permitted accordingly conditional permission is an rbac permission which is constrained by one or more context constraints we present an engineering process for context constraints that is based on goal oriented requirements engineering techniques and describe how we extended the design and implementation of an existing rbac service to enable the enforcement of context constraints with our approach we aim to preserve the advantages of rbac and offer an additional means for the definition and enforcement of fine grained context dependent access control policies
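A minimal sketch of the idea of context constraints and conditional permissions described above, assuming a simple attribute dictionary as the context; the constraint names, attributes, and policy are hypothetical, and the actual approach is defined as an extension of an RBAC service rather than the toy classes used here.

```python
from datetime import time

class ContextConstraint:
    """Checks the actual values of one or more contextual attributes
    against a predefined condition."""
    def __init__(self, name, predicate):
        self.name, self.predicate = name, predicate

    def holds(self, context):
        return self.predicate(context)

class ConditionalPermission:
    """A permission granted only if all of its context constraints are
    satisfied at access time."""
    def __init__(self, operation, obj, constraints):
        self.operation, self.obj, self.constraints = operation, obj, constraints

    def permits(self, operation, obj, context):
        return (operation == self.operation and obj == self.obj and
                all(c.holds(context) for c in self.constraints))

# Hypothetical policy: records may be read only during office hours and on site.
office_hours = ContextConstraint(
    "office-hours", lambda ctx: time(8, 0) <= ctx["time"] <= time(18, 0))
on_site = ContextConstraint(
    "on-site", lambda ctx: ctx["location"] == "clinic")

read_record = ConditionalPermission("read", "patient-record",
                                    [office_hours, on_site])
print(read_record.permits("read", "patient-record",
                          {"time": time(9, 30), "location": "clinic"}))   # True
print(read_record.permits("read", "patient-record",
                          {"time": time(22, 0), "location": "clinic"}))   # False
```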
publish subscribe systems are used increasingly often as communication mechanism in loosely coupled distributed applications with their gradual adoption in mission critical areas it is essential that systems are subjected to rigorous performance analysis before they are put into production however existing approaches to performance modeling and analysis of publish subscribe systems suffer from many limitations that seriously constrain their practical applicability in this paper we present set of generalized and comprehensive analytical models of publish subscribe systems employing different peer to peer and hierarchical routing schemes the proposed analytical models address the major limitations underlying existing work in this area and are the first to consider all major performance relevant system metrics including the expected broker and link utilization the expected notification delay the expected time required for new subscriptions to become fully active as well as the expected routing table sizes and message rates to illustrate our approach and demonstrate its effectiveness and practicality we present case study showing how our models can be exploited for capacity planning and performance prediction in realistic scenario
speed up techniques that exploit given node coordinates have proven useful for shortest path computations in transportation networks and geographic information systems to facilitate the use of such techniques when coordinates are missing from some or even all of the nodes in network we generate artificial coordinates using methods from graph drawing experiments on large set of german train timetables indicate that the speed up achieved with coordinates from our drawings is close to that achieved with the true coordinates and in some special cases even better
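For illustration, the sketch below shows the standard way node coordinates are exploited by such speed-up techniques: an A* search whose heuristic is the Euclidean distance between coordinates. The toy graph is made up, and with artificial, drawing-derived coordinates the heuristic is only an estimate rather than a guaranteed lower bound, which is exactly the trade-off the experiments above measure.

```python
import heapq
import math

def astar(graph, coords, source, target):
    """A* over a weighted graph using Euclidean distance between node
    coordinates as the heuristic.  With true geographic coordinates and
    metric edge weights the heuristic is admissible; with artificial
    coordinates it is only approximate unless it is damped."""
    def h(u):
        (x1, y1), (x2, y2) = coords[u], coords[target]
        return math.hypot(x1 - x2, y1 - y2)

    dist = {source: 0.0}
    pq = [(h(source), source)]
    while pq:
        _, u = heapq.heappop(pq)
        if u == target:
            return dist[u]
        for v, w in graph.get(u, []):
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd + h(v), v))
    return float("inf")

coords = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1)}
graph = {"a": [("b", 1.0), ("d", 1.6)], "b": [("c", 1.0)], "d": [("c", 1.5)]}
print(astar(graph, coords, "a", "c"))   # 2.0 via b
```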
this paper proposes novel method for phrase based statistical machine translation based on the use of pivot language to translate between languages and with limited bilingual resources we bring in third language called the pivot language for the language pairs and there exist large bilingual corpora using only and bilingual corpora we can build translation model for the advantage of this method lies in the fact that we can perform translation between and even if there is no bilingual corpus available for this language pair using bleu as metric our pivot language approach significantly outperforms the standard model trained on small bilingual corpus moreover with small bilingual corpus available our method can further improve translation quality by using the additional and bilingual corpora
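A simplified sketch of the core pivot idea above: induce source-to-target phrase translation probabilities by marginalising over pivot phrases, p(t|s) = Σ_p p(t|p)·p(p|s). The toy phrase tables are invented, and a full phrase-based system would also induce lexical weights, prune low-probability pairs, and tune feature weights, all of which this sketch omits.

```python
from collections import defaultdict

def induce_pivot_table(src_to_piv, piv_to_tgt):
    """Induce a source->target phrase table through a pivot language by
    marginalising over pivot phrases: p(t|s) = sum_p p(t|p) * p(p|s)."""
    induced = defaultdict(lambda: defaultdict(float))
    for s, piv_probs in src_to_piv.items():
        for p, p_ps in piv_probs.items():
            for t, p_tp in piv_to_tgt.get(p, {}).items():
                induced[s][t] += p_tp * p_ps
    return {s: dict(ts) for s, ts in induced.items()}

# Hypothetical toy tables (source -> pivot, pivot -> target).
fr_en = {"maison bleue": {"blue house": 0.8, "house blue": 0.2}}
en_de = {"blue house": {"blaues haus": 0.9}, "house blue": {"haus blau": 0.6}}
print(induce_pivot_table(fr_en, en_de))
# approximately {'maison bleue': {'blaues haus': 0.72, 'haus blau': 0.12}}
```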
media spaces and videoconference systems are beneficial for connecting separated co workers and providing rich contextual information however image sharing communication tools may also touch on sensitive spots of the human psyche related to personal perceived image issues eg appearance self image self presentation and vanity we conducted two user studies to examine the impact of self image concerns on the use of media spaces and videoconference systems our results suggest that personal perceived image concerns have considerable impact on the comfort level of users and may hinder effective communication we also found that image filtering techniques can help users feel more comfortable our results revealed that distortion filters which are frequently cited to help preserve privacy do not tend to be the ones preferred by users instead users seemed to favor filters that make subtle changes to their appearance or in some instances they preferred to use surrogate instead
xml documents have recently become ubiquitous because of their varied applicability in number of applications classification is an important problem in the data mining domain but current classification methods for xml documents use ir based methods in which each document is treated as bag of words such techniques ignore significant amount of information hidden inside the documents in this paper we discuss the problem of rule based classification of xml data by using frequent discriminatory substructures within xml documents such technique is more capable of finding the classification characteristics of documents in addition the technique can also be extended to cost sensitive classification we show the effectiveness of the method with respect to other classifiers we note that the methodology discussed in this paper is applicable to any kind of semi structured data
in this paper we present new algorithm named turbosyn for fpga synthesis with retiming and pipelining to minimize the clock period for sequential circuits for target clock period since pipelining can eliminate all critical paths but not critical loops we concentrate on fpga synthesis to eliminate the critical loops we combine the combinational functional decomposition technique with retiming to perform the sequential functional decomposition and incorporate it in the label computation of turbomap to eliminate all critical loops the results show significant improvement over the state of the art fpga mapping and resynthesis algorithms times reduction on the clock period moreover we develop novel approach for positive loop detection which leads to over times speedup of the algorithm as result turbosyn can optimize sequential circuits of over gates and flipflops in reasonable time
real time database management systems rt dbms have the necessary characteristics for providing efficient support to develop applications in which both data and transactions have temporal constraints however in the last decade new applications were identified and are characterized by large geographic distribution high heterogeneity lack of global control partial failures and lack of safety besides they need to manage large data volumes with real time constraints scheduling algorithms should consider transactions with soft deadlines and the concurrency control protocols should allow conflicting transactions to execute in parallel the last ones should be based in their requirements which are specified through both quality of services functions and performance metrics in this work method to model and develop applications that execute in open and unpredictable environments is proposed based on this model it is possible to perform analysis and simulations of systems to guide the decision making process and to identify solutions for improving it for validating the model case study considering the application domain of sensors network is discussed
in this paper we describe new via configurable routing architecture which shows much better throughput and performance than the previous structures we demonstrate how to construct single via mask fabric to reduce the mask cost further and we analyze the penalties which it incurs to solve the routability problem commonly existing in fabric based designs an efficient white space allocation and an incremental cell movement scheme are suggested which help to provide fast design convergence and early prediction of circuit’s mappability to given fabric
array redistribution is usually needed for more efficiently executing data parallel program on distributed memory multicomputers to minimize the redistribution data transfer cost processor mapping techniques were proposed to reduce the amount of redistributed data elements these techniques demand that the beginning data elements on processor not be redistributed in the redistribution on the other hand for satisfying practical computation needs programmer may require other data elements to be un redistributed localized in the redistribution in this paper we propose flexible processor mapping technique for the block cyclic redistribution to allow the programmer to localize the required data elements in the redistribution we also present an efficient redistribution method for the redistribution employing our proposed technique the data transfer cost reduction and system performance improvement for the redistributions with data localization are analyzed and presented in our experimental results
consider data warehouses as large data repositories queried for analysis and data mining in variety of application contexts query over such data may take large amount of time to be processed in regular pc consider partitioning the data into set of pcs nodes with either parallel database server or any database server at each node and an engine independent middleware nodes and network may even not be fully dedicated to the data warehouse in such scenario care must be taken for handling processing heterogeneity and availability so we study and propose efficient solutions for this we concentrate on three main contributions performance wise index measuring relative performance replication degree flexible chunk wise organization with on demand processing these contributions extend the previous work on de clustering and replication and are generic in the sense that they can be applied in very different contexts and with different data partitioning approaches we evaluate their merits with prototype implementation of the system
some of the most difficult questions to answer when designing distributed application are related to mobility what information to transfer between sites and when and how to transfer it network transparent distribution the property that program’s behavior is independent of how it is partitioned among sites does not directly address these questions therefore we propose to extend all language entities with network behavior that enables efficient distributed programming by giving the programmer simple and predictable control over network communication patterns in particular we show how to give objects an arbitrary mobility behavior that is independent of the objects definition in this way the syntax and semantics of objects are the same regardless of whether they are used as stationary servers mobile agents or simply as caches these ideas have been implemented in distributed oz concurrent object oriented language that is state aware and has dataflow synchronization we prove that the implementation of objects in distributed oz is network transparent to satisfy the predictability condition the implementation avoids forwarding chains through intermediate sites the implementation is an extension to the publicly available dfki oz system
many spatiotemporal applications store moving object data in the form of trajectories various recent works have addressed interesting queries on trajectorial data mainly focusing on range queries and nearest neighbor queries here we examine another interesting query the time relaxed spatiotemporal trajectory join trstj which effectively finds groups of moving objects that have followed similar movements in different times we first attempt to address the trstj problem using symbolic representation algorithm which we have recently proposed for trajectory joins however we show experimentally that this solution produces false positives that grow rapidly with the increase of the problem size as result it is inefficient for trstj queries as it leads to large query time overhead in order to improve query performance we propose two important heuristics that turn the symbolic representation approach effective for trstj queries our first improvement allows the use of multiple origins when processing strings representing trajectories the experimental evaluation shows that the multiple origin approach drastically reduces query performance we then present divide and conquer approach to further reduce false positives through symbolic class separation the proposed solutions can be combined together which leads to even better query performance we present an experimental study revealing the advantages of using these approaches for solving time relaxed spatiotemporal trajectory join queries
divide and conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs to achieve efficient program execution the generated work load has to be balanced evenly among the available cpus for single cluster systems random stealing rs is known to achieve optimal load balancing however rs is inefficient when applied to hierarchical wide area systems where multiple clusters are connected via wide area networks wans with high latency and low bandwidth in this paper we experimentally compare rs with existing load balancing strategies that are believed to be efficient for multi cluster systems random pushing and two variants of hierarchical stealing we demonstrate that in practice they obtain less than optimal results we introduce novel load balancing algorithm cluster aware random stealing crs which is highly efficient and easy to implement crs adapts itself to network conditions and job granularities and does not require manually tuned parameters although crs sends more data across the wans it is faster than its competitors for out of test applications with various wan configurations it has at most overhead in run time compared to rs on single large cluster even with high wide area latencies and low wide area bandwidths these strong results suggest that divide and conquer parallelism is useful model for writing distributed supercomputing applications on hierarchical wide area systems
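A schematic of the cluster-aware random stealing policy described above: issue at most one asynchronous wide-area steal request at a time and keep stealing synchronously inside the local cluster while it is outstanding. The class structure and the toy cluster layout are illustrative assumptions, not the actual runtime system implementation.

```python
import random

class ClusterAwareRandomStealer:
    """Victim selection for cluster-aware random stealing: an idle worker
    sends at most one asynchronous wide-area steal request at a time and
    keeps stealing synchronously from random nodes in its own cluster
    while that request is outstanding."""
    def __init__(self, my_id, my_cluster, clusters, rng=None):
        self.my_id = my_id
        self.local_peers = [n for n in clusters[my_cluster] if n != my_id]
        self.remote_peers = [n for c, nodes in clusters.items()
                             if c != my_cluster for n in nodes]
        self.wide_area_steal_pending = False
        self.rng = rng or random.Random(0)

    def next_steal_targets(self):
        targets = []
        if not self.wide_area_steal_pending and self.remote_peers:
            # Fire-and-forget wide-area steal; do not wait for the reply.
            targets.append(("remote-async", self.rng.choice(self.remote_peers)))
            self.wide_area_steal_pending = True
        if self.local_peers:
            # Synchronous local steals hide the wide-area latency.
            targets.append(("local-sync", self.rng.choice(self.local_peers)))
        return targets

    def wide_area_reply_received(self):
        self.wide_area_steal_pending = False

clusters = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"]}
stealer = ClusterAwareRandomStealer("a0", "A", clusters)
print(stealer.next_steal_targets())   # one async remote pick plus one sync local pick
print(stealer.next_steal_targets())   # only local picks until the remote reply arrives
```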
counterexample guided abstraction refinement cegar is key technique for the verification of computer programs grumberg et al developed cegar based algorithm for the modal calculus there every abstract state is split in refinement step in this paper the work of grumberg et al is generalized by presenting new cegar based algorithm for the calculus it is based on more expressive abstract model and applies refinement only locally at single abstract state ie the lazy abstraction technique for safety properties is adapted to the calculus furthermore it separates refinement determination from the valued based model checking three different heuristics for refinement determination are presented and illustrated
trust management systems are frameworks for authorization in modern distributed systems they allow remotely accessible resources to be protected by providers who specify policy and require access requesters to possess certain access rights trust management automates the process of determining whether access should be allowed on the basis of policy rights and an authorization semantics in this paper we survey the modern state of the art in trust management authorization focusing on features of policy and rights languages that provide the necessary expressiveness for modern practice we characterize systems in light of generic structure that takes into account components of practical implementations we emphasize systems that have formal foundation since their security properties can be rigorously guaranteed underlying formalisms are reviewed to provide necessary background
we show how range of role based access control rbac models may be usefully represented as constraint logic programs executable logical specifications the rbac models that we define extend the standard rbac models that are described by sandhu et al and enable security administrators to define range of access policies that may include features like denials of access and temporal authorizations that are often useful in practice but which are not widely supported in existing access control models representing access policies as constraint logic programs makes it possible to support certain policy options constraint checks and administrator queries that cannot be represented by using related methods like logic programs representing an access control policy as constraint logic program also enables access requests and constraint checks to be efficiently evaluated
this paper contributes to growing body of design patterns in interaction design for cooperative work while also describing how to go from field studies to design patterns it focuses on sociable face to face situations the patterns are based on field studies and design work in three sociable settings where desirable use qualities were identified and translated into forces in three design patterns for controlling information visibility on the basis of the patterns the design of multiple device multimedia platform is described it is shown that desirable qualities of systems in use can be utilised as forces in patterns which means that traditional qualitative research is highly valuable when documenting design knowledge in patterns three classes of interaction design patterns are identified environments for interactions means for interaction and interfaces for interaction these classes describe types of patterns within hierarchical model of interaction design
the wireless network community has become increasingly aware of the benefits of data driven link estimation and routing as compared with beacon based approaches but the issue of biased link sampling bls has not been well studied even though it affects routing convergence in the presence of network and environment dynamics focusing on traffic induced dynamics we examine the open unexplored question of how serious the bls issue is and how to effectively address it when the routing metric etx is used for wide range of traffic patterns and network topologies and using both node oriented and network wide analysis and experimentation we discover that the optimal routing structure remains quite stable even though the properties of individual links and routes vary significantly as traffic pattern changes in cases where the optimal routing structure does change data driven link estimation and routing is either guaranteed to converge to the optimal structure or empirically shown to converge to close to optimal structure these findings provide the foundation for addressing the bls issue in the presence of traffic induced dynamics and suggest approaches other than existing ones these findings also demonstrate that it is possible to maintain an optimal stable routing structure despite the fact that the properties of individual links and paths vary in response to network dynamics
mobile computation in which executing computations can move from one physical computing device to another is recurring theme from os process migration to language level mobility to virtual machine migration this article reports on the design implementation and verification of overlay networks to support reliable communication between migrating computations in the nomadic pict project we define two levels of abstraction as calculi with precise semantics low level nomadic pi calculus with migration and location dependent communication and high level calculus that adds location independent communication implementations of location independent communication as overlay networks that track migrations and forward messages can be expressed as translations of the high level calculus into the low level calculus we discuss the design space of such overlay network algorithms and define three precisely as such translations based on the calculi we design and implement the nomadic pict distributed programming language to let such algorithms and simple applications above them be quickly prototyped we go on to develop the semantic theory of the nomadic pi calculi proving correctness of one example overlay network this requires novel equivalences and congruence results that take migration into account and reasoning principles for agents that are temporarily immobile eg waiting on lock elsewhere in the system the whole stands as demonstration of the use of principled semantics to address challenging system design problems
choosing good variable order is crucial for making symbolic state space generation algorithms truly efficient one such algorithm is the mdd based saturation algorithm for petri nets implemented in smart whose efficiency relies on exploiting event locality this paper presents novel static ordering heuristic that considers place invariants of petri nets in contrast to related work we use the functional dependencies encoded by invariants to merge decision diagram variables rather than to eliminate them we prove that merging variables always yields smaller mdds and improves event locality while eliminating variables may increase mdd sizes and break locality combining this idea of merging with heuristics for maximizing event locality we obtain an algorithm for static variable order which outperforms competing approaches regarding both time efficiency and memory efficiency as we demonstrate by extensive benchmarking
in this paper we consider the problem of processor allocation on mesh based multiprocessor systems we employ the idea of using migration to minimize fragmentation and the overall processing time of the tasks in our schemes we consider the use of task migration whenever required to improve the problem of fragmentation to this end we propose three efficient schemes to improve the performance of first fit allocation strategies commonly used in practice the first scheme called the first fit mesh bifurcation ffmb scheme attempts to start the search for free submesh from either the bottom left corner or the top left corner of the mesh so as to reduce the amount of fragmentation in the mesh the next two schemes called the online dynamic compaction single corner odc sc and online dynamic compaction four corners odc fc schemes use task migration to improve the performance of existing submesh allocation strategies we perform rigorous simulation experiments based on practical workloads as reported in the literature to quantify all our proposed schemes and compare them against standard schemes existing in the literature based on the results we make clear recommendations on the choice of the strategies
data exchange and virtual data integration have been the subject of several investigations in the recent literature at the same time the notion of peer data management has emerged as powerful abstraction of many forms of flexible and dynamic data centered distributed systems although research on the above issues has progressed considerably in the last years clear understanding of how to combine data exchange and data integration in peer data management is still missing this is the subject of the present paper we start our investigation by first proposing novel framework for peer data exchange showing that it is generalization of the classical data exchange setting we also present algorithms for all the relevant data exchange tasks and show that they can all be done in polynomial time with respect to data complexity based on the motivation that typical mappings and integrity constraints found in data integration are not captured by peer data exchange we extend the framework to incorporate these features one of the main difficulties is that the constraints of this new class are not amenable to materialization we address this issue by resorting to suitable combination of virtual and materialized data exchange showing that the resulting framework is generalization of both classical data exchange and classical data integration and that the new setting incorporates the most expressive types of mapping and constraints considered in the two contexts finally we present algorithms for all the relevant data management tasks also in the new setting and show that again their data complexity is polynomial
radiance transfer represents how generic source lighting is shadowed and scattered by an object to produce view dependent appearance we generalize by rendering transfer at two scales macro scale is coarsely sampled over an object’s surface providing global effects like shadows cast from an arm onto body meso scale is finely sampled over small patch to provide local texture low order spherical harmonics represent low frequency lighting dependence for both scales to render coefficient vector representing distant source lighting is first transformed at the macro scale by matrix at each vertex of coarse mesh the resulting vectors represent spatially varying hemisphere of lighting incident to the meso scale function called radiance transfer texture rtt then specifies the surface’s meso scale response to each lighting basis component as function of spatial index and view direction finally dot product of the macro scale result vector with the vector looked up from the rtt performs the correct shading integral we use an id map to place rtt samples from small patch over the entire object only two scalars are specified at high spatial resolution results show that bi scale decomposition makes preprocessing practical and efficiently renders self shadowing and interreflection effects from dynamic low frequency light sources at both scales
we present multiple pass streaming algorithms for basic clustering problem for massive data sets if our algorithm is allotted passes it will produce an approximation with error at most epsilon using otilde epsilon bits of memory the most critical resource for streaming computation we demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any pass randomized algorithm that are nearly matched by our upper bounds to the best of our knowledge this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation in this problem we are given set of points drawn randomly according to mixture of uniform distributions and wish to approximate the density function of the mixture the points are placed in datastream possibly in adversarial order which may only be read sequentially by the algorithm we argue that this models among others the datastream produced by national census of the incomes of all citizens
we present unified feature representation of pointclouds and apply it to face recognition the representation integrates local and global geometrical cues in single compact representation which makes matching probe to large database computationally efficient the global cues provide geometrical coherence for the local cues resulting in better descriptiveness of the unified representation multiple rank tensors scalar features are computed at each point from its local neighborhood and from the global structure of the pointcloud forming multiple rank tensor fields the pointcloud is then represented by the multiple rank tensor fields which are invariant to rigid transformations each local tensor field is integrated with every global field in histogram which is indexed by local field in one dimension and global field in the other dimension finally pca coefficients of the histograms are concatenated into single feature vector the representation was tested on frgc data set and achieved identification and verification rate at far
we present method for learning model of human body shape variation from corpus of range scans our model is the first to capture both identity dependent and pose dependent shape variation in correlated fashion enabling creation of variety of virtual human characters with realistic and non linear body deformations that are customized to the individual our learning method is robust to irregular sampling in pose space and identity space and also to missing surface data in the examples our synthesized character models are based on standard skinning techniques and can be rendered in real time
unconstrained consumer photos pose great challenge for content based image retrieval unlike professional images or domain specific images consumer photos vary significantly more often than not the objects in the photos are ill posed occluded and cluttered with poor lighting focus and exposure in this paper we propose cascading framework for combining intra image and inter class similarities in image retrieval motivated from probabilistic bayesian principles support vector machines are employed to learn local view based semantics based on just in time fusion of color and texture features new detection driven block based segmentation algorithm is designed to extract semantic features from images the detection based indexes also serve as input for support vector learning of image classifiers to generate class relative indexes during image retrieval both intra image and inter class similarities are combined to rank images experiments using query by example on genuine heterogeneous consumer photos with semantic queries show that the combined matching approach is better than matching with single index it also outperformed the method of combining color and texture features by in average precision
compiler based auto parallelization is much studied area yet has still not found wide spread application this is largely due to the poor exploitation of application parallelism subsequently resulting in performance levels far below those which skilled expert programmer could achieve we have identified two weaknesses in traditional parallelizing compilers and propose novel integrated approach resulting in significant performance improvements of the generated parallel code using profile driven parallelism detection we overcome the limitations of static analysis enabling us to identify more application parallelism and only rely on the user for final approval in addition we replace the traditional target specific and inflexible mapping heuristics with machine learning based prediction mechanism resulting in better mapping decisions while providing more scope for adaptation to different target architectures we have evaluated our parallelization strategy against the nas and spec omp benchmarks and two different multi core platforms dual quad core intel xeon smp and dual socket qs cell blade we demonstrate that our approach not only yields significant improvements when compared with state of the art parallelizing compilers but comes close to and sometimes exceeds the performance of manually parallelized codes on average our methodology achieves of the performance of the hand tuned openmp nas and spec parallel benchmarks on the intel xeon platform and gains significant speedup for the ibm cell platform demonstrating the potential of profile guided and machine learning based parallelization for complex multi core platforms
we present new variational method for multi view stereovision and non rigid three dimensional motion estimation from multiple video sequences our method minimizes the prediction error of the shape and motion estimates both problems then translate into generic image registration task the latter is entrusted to global measure of image similarity chosen depending on imaging conditions and scene properties rather than integrating matching measure computed independently at each surface point our approach computes global image based matching score between the input images and the predicted images the matching process fully handles projective distortion and partial occlusions neighborhood as well as global intensity information can be exploited to improve the robustness to appearance changes due to non lambertian materials and illumination changes without any approximation of shape motion or visibility moreover our approach results in simpler more flexible and more efficient implementation than in existing methods the computation time on large datasets does not exceed thirty minutes on standard workstation finally our method is compliant with hardware implementation with graphics processor units our stereovision algorithm yields very good results on variety of datasets including specularities and translucency we have successfully tested our motion estimation algorithm on very challenging multi view video sequence of non rigid scene
interfaces based on recognition technologies are used extensively in both the commercial and research worlds but recognizers are still error prone and this results in human performance problems brittle dialogues and other barriers to acceptance and utility of recognition systems interface techniques specialized to recognition systems can help reduce the burden of recognition errors but building these interfaces depends on knowledge about the ambiguity inherent in recognition we have extended user interface toolkit in order to model and to provide structured support for ambiguity at the input event level this makes it possible to build re usable interface components for resolving ambiguity and dealing with recognition errors these interfaces can help to reduce the negative effects of recognition errors by providing these components at toolkit level we make it easier for application writers to provide good support for error handling further with this robust support we are able to explore new types of interfaces for resolving more varied range of ambiguity
we present an efficient approach for end to end out of core construction and interactive inspection of very large arbitrary surface models the method tightly integrates visibility culling and out of core data management with level of detail framework at preprocessing time we generate coarse volume hierarchy by binary space partitioning the input triangle soup leaf nodes partition the original data into chunks of fixed maximum number of triangles while inner nodes are discretized into fixed number of cubical voxels each voxel contains compact direction dependent approximation of the appearance of the associated volumetric sub part of the model when viewed from distance the approximation is constructed by visibility aware algorithm that fits parametric shaders to samples obtained by casting rays against the full resolution dataset at rendering time the volumetric structure maintained off core is refined and rendered in front to back order exploiting vertex programs for gpu evaluation of view dependent voxel representations hardware occlusion queries for culling occluded subtrees and asynchronous i/o for detecting and avoiding data access latencies since the granularity of the multiresolution structure is coarse data management traversal and occlusion culling cost is amortized over many graphics primitives the efficiency and generality of the approach is demonstrated with the interactive rendering of extremely complex heterogeneous surface models on current commodity graphics platforms
paraphrasing van rijsbergen the time is ripe for another attempt at using natural language processing nlp for information retrieval ir this paper introduces my dissertation study which will explore methods for integrating modern nlp with state of the art ir techniques in addition to text i will also apply retrieval to conversational speech data which poses unique set of considerations in comparison to text greater use of nlp has potential to improve both text and speech retrieval
on line analytical processing olap is technology basically created to provide users with tools in order to explore and navigate into data cubes unfortunately in huge and sparse data exploration becomes tedious task and the simple user’s intuition or experience does not lead to efficient results in this paper we propose to exploit the results of the multiple correspondence analysis mca in order to enhance data cube representations and make them more suitable for visualization and thus easier to analyze our approach addresses the issues of organizing data in an interesting way and detects relevant facts our purpose is to help the interpretation of multidimensional data by efficient and simple visual effects to validate our approach we compute its efficiency by measuring the quality of resulting multidimensional data representations in order to do so we propose an homogeneity criterion to measure the visual relevance of data representations this criterion is based on the concept of geometric neighborhood and similarity between cells experimental results on real data have shown the interest of using our approach on sparse data cubes
there is growing demand for provisioning of different levels of quality of service qos on scalable web servers to meet changing resource availability and to satisfy different client requirements in this paper we investigate the problem of providing proportional qos differentiation with respect to response time on web servers we first present processing rate allocation scheme based on the foundations of queueing theory it provides different processing rates to requests of different client classes so as to achieve the differentiation objective at application level process is used as the resource allocation principal for achieving processing rates on apache web servers we design and implement an adaptive process allocation approach guided by the queueing theoretical rate allocation scheme on an apache server this application level implementation however shows weak qos predictability because it does not have fine grained control over the consumption of resources that the kernel consumes and hence the processing rate is not strictly proportional to the number of processes allocated we then design feedback controller and integrate it with the queueing theoretical approach it adjusts process allocations according to the difference between the target response time and the achieved response time using proportional integral derivative controller experimental results demonstrate that this integrated approach can enable web servers to provide robust proportional response time differentiation
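the feedback half of the scheme can be pictured with a few lines of python: a pid loop that nudges a class's process allocation until its measured response time tracks its target. the gains, the sampling interface and the class abstraction are invented for the sketch; the paper's controller is tuned for apache and works together with the queueing-theoretic rate allocation, which is not reproduced here.

    class PIDAllocator:
        """Toy PID loop: adjust the number of server processes granted to a
        client class so its measured response time approaches the target."""
        def __init__(self, kp=0.5, ki=0.1, kd=0.05):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, target_rt, measured_rt, current_procs):
            error = measured_rt - target_rt        # positive: class is too slow
            self.integral += error
            derivative = error - self.prev_error
            self.prev_error = error
            delta = self.kp * error + self.ki * self.integral + self.kd * derivative
            return max(1, round(current_procs + delta))

    # e.g. PIDAllocator().update(target_rt=0.2, measured_rt=1.2, current_procs=8) -> 9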
we present hybridpointing technique that lets users easily switch between absolute and relative pointing with direct input device such as pen our design includes new graphical element the trailing widget which remains close at hand but does not interfere with normal cursor operation the use of visual feedback to aid the user’s understanding of input state is discussed and several novel visual aids are presented an experiment conducted on large wall sized display validates the benefits of hybridpointing under certain conditions we also discuss other situations in which hybridpointing may be useful finally we present an extension to our technique that allows for switching between absolute and relative input in the middle of single drag operation
many adaptive routing algorithms have been proposed for wormhole routed interconnection networks comparatively little work however has been done on determining how the selection function routing policy affects the performance of an adaptive routing algorithm in this paper we present detailed simulation study of various selection functions for fully adaptive wormhole routing on two dimensional meshes the simulation results show that the choice of selection function has significant effect on the average message latency in addition it is possible to find single selection function that exhibits excellent performance across wide range of traffic patterns network sizes and number of virtual channels thus well chosen selection function for an adaptive routing algorithm can lead to consistently better performance than an arbitrary selection function one of the selection functions considered is the theoretically optimal selection function proposed in ieee trans comput october we show that although theoretically optimal the actual performance of the optimal selection function is not always best an explanation and interpretation of the results is provided
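to make the distinction concrete, here is a small python sketch separating the routing function (which outputs the set of admissible channels on a 2d mesh) from the selection function (which picks one of them). the prefer-the-dimension-with-the-larger-remaining-offset heuristic is just one example policy, and virtual channels, faults and flow control are ignored.

    def admissible_outputs(cur, dst):
        """Minimal fully adaptive routing function on a 2D mesh: every output
        channel that moves the packet closer to the destination."""
        (x, y), (dx, dy) = cur, dst
        outs = []
        if dx != x:
            outs.append(('X', 1 if dx > x else -1))
        if dy != y:
            outs.append(('Y', 1 if dy > y else -1))
        return outs

    def select_channel(cur, dst, free_channels):
        """Example selection function: among the free admissible channels,
        prefer the dimension with the larger remaining offset."""
        candidates = [c for c in admissible_outputs(cur, dst) if c in free_channels]
        if not candidates:
            return None                     # block and wait on a virtual channel
        remaining = {'X': abs(dst[0] - cur[0]), 'Y': abs(dst[1] - cur[1])}
        return max(candidates, key=lambda ch: remaining[ch[0]])

    # select_channel((2, 3), (7, 4), {('X', 1), ('Y', 1)}) -> ('X', 1)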
imperative and object oriented programs make ubiquitous use of shared mutable objects updating shared object can and often does transgress boundary that was supposed to be established using static constructs such as class with private fields this paper shows how auxiliary fields can be used to express two state dependent encapsulation disciplines ownership kind of separation and friendship kind of sharing methodology is given for specification and modular verification of encapsulated object invariants and shown sound for class based language as an example the methodology is used to specify iterators which are problematic for previous ownership systems
energy consumption is of significant concern in battery operated embedded systems in the processors of such systems the instruction cache consumes significant fraction of the total energy one of the most popular methods to reduce the energy consumption is to shut down idle cache banks however we observe that operating idle cache banks at reduced voltage frequency level along with the active banks in pipelined manner can potentially achieve even better energy savings in this paper we propose novel dvs based pipelined reconfigurable instruction memory hierarchy called prim canonical example of our proposed prim consists of four cache banks two of these cache banks can be configured at runtime to operate at lower voltage and frequency levels than that of the normal cache instruction fetch throughput is maintained by pipelining the accesses to the low voltage banks we developed profile driven compilation framework that analyzes applications and inserts the appropriate cache reconfiguration points our experimental results show that prim can significantly reduce the energy consumption for popular embedded benchmarks with minimal performance overhead we obtained and energy savings for aggressive and conservative vdd settings respectively at the expense of performance overhead
software defect prediction is important for reducing test times by allocating testing resources effectively in terms of predicting the defects in software naive bayes outperforms wide range of other methods however naive bayes assumes the independence and equal importance of attributes in this work we analyze these assumptions of naive bayes using public software defect data from nasa our analysis shows that independence assumption is not harmful for software defect data with pca pre processing our results also indicate that assigning weights to static code attributes may increase the prediction performance significantly while removing the need for feature subset selection
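a small python sketch of the modelling idea discussed above: a gaussian naive bayes classifier in which each static code attribute contributes a weighted log-likelihood term, so equal weights recover the standard independence/equal-importance model. the pca preprocessing the analysis relies on is assumed to have been applied to X beforehand, and weighting by scaling the per-attribute log-likelihood is one plausible reading rather than the paper's exact scheme.

    import math
    from collections import defaultdict

    def train_gnb(X, y):
        """Fit per-class priors and per-attribute (mean, variance) pairs."""
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        stats, priors, n = {}, {}, len(X)
        for c, rows in by_class.items():
            priors[c] = len(rows) / n
            params = []
            for col in zip(*rows):
                mu = sum(col) / len(col)
                var = max(sum((v - mu) ** 2 for v in col) / len(col), 1e-9)
                params.append((mu, var))
            stats[c] = params
        return stats, priors

    def predict(x, stats, priors, weights=None):
        """Weighted naive Bayes: attribute j contributes w_j * log p(x_j | c)."""
        best, best_score = None, -math.inf
        for c, params in stats.items():
            score = math.log(priors[c])
            for j, (mu, var) in enumerate(params):
                w = 1.0 if weights is None else weights[j]
                score += w * (-0.5 * math.log(2 * math.pi * var)
                              - (x[j] - mu) ** 2 / (2 * var))
            if score > best_score:
                best, best_score = c, score
        return best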
this paper presents fundamentally new approach to integrating local decisions from various nodes and efficiently routing data in sensor networks by classifying the nodes in the sensor field as hot or cold in accordance with whether or not they sense the target we are able to concentrate on smaller set of nodes and gear the routing of data to and from the sink to fraction of the nodes that exist in the network the introduction of this intermediary step is fundamentally new and allows for efficient and meaningful fusion and routing this is made possible through the use of novel markov random field mrf approach which to the best of our knowledge has never been applied to sensor networks in combination with maximum a posteriori probability map stochastic relaxation tools to flag out the hot nodes in the network and to optimally combine their data and decisions towards an integrated and collaborative global decision fusion this global decision supersedes all local decisions and provides the basis for efficient use of the sensed data because of the mrf local nature nodes need not see or interact with other nodes in the sensor network beyond their immediate neighborhood which can either be defined in terms of distance between nodes or communication connectivity hence adding to the flexibility of dealing with irregular and varying sensor topologies and also minimizing node power usage and providing for easy scalability the routing of the hot nodes data is confined to cone of nodes and power constraints are taken into account we also use the found location of the centroid of the hot nodes over time to track the movement of the target this is achieved by using the segmentation at time t as an initial state in the stochastic map relaxation at time t + dt
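the flavour of the mrf/map step can be seen in a toy python sketch that labels grid nodes hot or cold by iterated conditional modes: each node picks the label that best balances its own reading against agreement with its neighbours. icm is a deterministic stand-in for the stochastic relaxation used in the paper, and the quadratic data term, the 4-neighbourhood and beta are arbitrary choices.

    def icm_hot_cold(readings, beta=1.0, thresh=0.5, sweeps=5):
        """Label each node hot (1) or cold (0) on a grid of sensor readings in
        [0, 1] by minimizing a data cost plus an MRF smoothness cost."""
        rows, cols = len(readings), len(readings[0])
        labels = [[1 if readings[r][c] > thresh else 0 for c in range(cols)]
                  for r in range(rows)]
        for _ in range(sweeps):
            for r in range(rows):
                for c in range(cols):
                    nbrs = [labels[rr][cc]
                            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                            if 0 <= rr < rows and 0 <= cc < cols]
                    def cost(lab):
                        data = (readings[r][c] - lab) ** 2             # fit to the reading
                        smooth = beta * sum(lab != nb for nb in nbrs)  # MRF prior
                        return data + smooth
                    labels[r][c] = min((0, 1), key=cost)
        return labels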
pen gesture interfaces have difficulty supporting arbitrary multiple stroke selections because lifting the pen introduces ambiguity as to whether the next stroke should add to the existing selection or begin new one we explore and evaluate techniques that use non preferred hand button or touchpad to phrase together one or more independent pen strokes into unitary multi stroke gesture we then illustrate how such phrasing techniques can support multiple stroke selection gestures with tapping crossing lassoing disjoint selection circles of exclusion selection decorations and implicit grouping operations these capabilities extend the expressiveness of pen gesture interfaces and suggest new directions for multiple stroke pen input techniques
in this paper we propose an image completion algorithm which takes advantage of the countless number of images available on internet photo sharing sites to replace occlusions in an input image the algorithm automatically selects the most suitable images from database of downloaded images and seamlessly completes the input image using the selected images with minimal user intervention experimental results on input images captured at various locations and scene conditions demonstrate the effectiveness of the proposed technique in seamlessly reconstructing user defined occlusions
we extend distributed database query optimization techniques to support database programming languages which are much richer than relational query languages with this richness come difficulties eg how to recognize joins and how to handle aliases in this paper we describe our techniques dataflow analysis abstract evaluation partial evaluation and rewriting also we overview the algorithm that uses these techniques
this paper considers new security protocol paradigm whereby principals negotiate and on the fly generate security protocols according to their needs when principals wish to interact then rather than offering each other fixed menu of known protocols they negotiate and possibly with the collaboration of other principals synthesise new protocol that is tailored specifically to their current security environment and requirements this approach provides basis for autonomic security protocols such protocols are self configuring since only principal assumptions and protocol goals need to be a priori configured the approach has the potential to survive security compromises that can be modelled as changes in the beliefs of the principals compromise of key or change in the trust relationships between principals can result in principal self healing and synthesising new protocol to survive the event
bilingual documentation has become common phenomenon in official institutions and private companies in this scenario the categorization of bilingual text is useful tool in this paper different approaches will be proposed to tackle this bilingual classification task on the one hand three finite state transducer algorithms from the grammatical inference framework will be presented on the other hand naive combination of smoothed n gram models will be introduced to evaluate the performance of bilingual classifiers two categorized bilingual corpora of different complexity were considered experiments in limited domain task show that all the models obtain similar results however results on more open domain task denote the supremacy of the naive approach
code generation for embedded processors opens up the possibility for several performance optimization techniques that have been ignored by traditional compilers due to compilation time constraints we present techniques that take into account the parameters of the data caches for organizing scalar and array variables declared in embedded code into memory with the objective of improving data cache performance we present techniques for clustering variables to minimize compulsory cache misses and for solving the memory assignment problem to minimize conflict cache misses our experiments with benchmark code kernels from dsp and other domains on the cw embedded processor from lsi logic indicate significant improvements in data cache performance by the application of our memory organization technique
one purpose of software metrics is to measure the quality of programs the results can be for example used to predict maintenance costs or improve code quality an emerging view is that if software metrics are going to be used to improve quality they must help in finding code that should be refactored often refactoring or applying design pattern is related to the role of the class to be refactored in client based metrics project gives the class context these metrics measure how class is used by other classes in the context we present new client based metric lcic lack of coherence in clients which analyses if the class being measured has coherent set of roles in the program interfaces represent the roles of classes if class does not have coherent set of roles it should be refactored or new interface should be defined for the class we have implemented tool for measuring the metric lcic for java projects in the eclipse environment we calculated lcic values for classes of several open source projects we compare these results with results of other related metrics and inspect the measured classes to find out what kind of refactorings are needed we also analyse the relation of different design patterns and refactorings to our metric our experiments reveal the usefulness of client based metrics to improve the quality of code
ranked queries return the top objects of database according to preference function we present and evaluate experimentally and theoretically core algorithm that answers ranked queries in an efficient pipelined manner using materialized ranked views we use and extend the core algorithm in the described prefer and merge systems prefer precomputes set of materialized views that provide guaranteed query performance we present an algorithm that selects near optimal set of views under space constraints we also describe multiple optimizations and implementation aspects of the downloadable version of prefer then we discuss merge which operates at metabroker and answers ranked queries by retrieving minimal number of objects from sources that offer ranked queries speculative version of the pipelining algorithm is described
the geographic routing is an ideal approach to realize point to point routing in wireless sensor networks because packets can be delivered by only maintaining small set of neighbors physical positions the geographic routing assumes that packet can be moved closer to the destination in the network topology if it is moved geographically closer to the destination in the physical space this assumption however only holds in an ideal model where uniformly distributed nodes communicate with neighbors through wireless channels with perfect reception because this model oversimplifies the spatial complexity of wireless sensor network the geographic routing may often lead packet to the local minimum or low quality route unlike the geographic forwarding the etx embedding proposed in this paper can accurately encode both network’s topological structure and channel quality to small size nodes virtual coordinates which makes it possible for greedy forwarding to guide packet along an optimal routing path our performance evaluation based on both the mica sensor platform and tossim simulator shows that the greedy forwarding based on etx embedding outperforms previous geographic routing approaches
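once every node carries a virtual coordinate vector (for example one produced by an etx-based embedding), greedy forwarding reduces to a distance comparison; a minimal python sketch is below. plain tuples and euclidean distance are assumed, so this illustrates the forwarding rule only, not how the embedding itself is computed.

    import math

    def vdist(a, b):
        """Euclidean distance between two virtual coordinate vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def greedy_next_hop(my_coord, neighbor_coords, dst_coord):
        """Hand the packet to the neighbor whose virtual coordinates are closest
        to the destination; return None at a local minimum of the embedding."""
        best, best_d = None, vdist(my_coord, dst_coord)
        for nbr, coord in neighbor_coords.items():
            d = vdist(coord, dst_coord)
            if d < best_d:
                best, best_d = nbr, d
        return best

    # greedy_next_hop((0.0, 0.0), {'a': (1.0, 0.0), 'b': (0.0, 2.0)}, (3.0, 0.0)) -> 'a'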
soft real time systems can tolerate some occasional deadline misses this feature provides unique opportunity to reduce system’s energy consumption in this paper we study the system with firm deadline popular model for soft real time systems it basically requires at least m successful completions in any k consecutive executions our goal is to design such system with dual supply voltages for energy efficiency to reach this goal we first propose an on line greedy deterministic scheduler that provides the firm guarantee with the provably minimum energy consumption we then develop novel exact method to compute the scheduler’s average energy consumption per iteration this leads us to the numerical solution to the voltage set up problem which seeks for the values of the two supply voltages to achieve the most energy efficiency with firm guarantee simulation shows that dual voltage system can reduce significant amount of energy over single voltage system our numerical method finds the best voltage set ups in seconds while it takes hours to obtain almost identical solutions by simulation
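the spirit of the on-line decision can be illustrated with a toy python policy for an m-out-of-k firm constraint: a job is run at the high, guaranteed voltage only when counting it as a miss could leave the window ending at this job with fewer than m met deadlines, and at the low voltage otherwise. this is only a plausible greedy illustration under the assumption that the high setting always meets the deadline; the paper's scheduler and its optimality proof are not reproduced here.

    from collections import deque

    class FirmDualVoltageScheduler:
        """Toy policy for an (m, k)-firm constraint: at least m deadlines met
        in any k consecutive jobs, using two supply voltages."""
        def __init__(self, m, k):
            assert 0 < m <= k
            self.m = m
            self.history = deque(maxlen=k - 1)   # outcomes of the previous k-1 jobs

        def choose_voltage(self):
            met_recently = sum(self.history)
            # if this job were to miss, the window ending here would contain only
            # met_recently hits, so run HIGH whenever that would fall below m
            return 'HIGH' if met_recently < self.m else 'LOW'

        def record(self, deadline_met):
            self.history.append(1 if deadline_met else 0)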
in this paper we describe how the memory management mechanisms of the intel iapx are used to implement the visibility rules of ada at any point in the execution of an ada program on the iapx the program has protected address space that corresponds exactly to the program’s accessibility at the corresponding point in the program’s source this close match of architecture and language did not occur because the iapx was designed to execute ada it was not rather both ada and the iapx are the result of very similar design goals to illustrate this point we compare in their support for ada the memory management mechanisms of the iapx to those of traditional computers the most notable differences occur in heap space management and multitasking with respect to the former we describe degree of hardware software cooperation that is not typical of other systems in the latter area we show how ada’s view of sharing is the same as that of the iapx but differs totally from the sharing permitted by traditional systems description of these differences provides some insight into the problems of implementing an ada compiler for traditional architecture
the diffusion of mobile devices in the working landscape is promoting collaboration across time and space following through this development we investigate opportunities for improving awareness in mobile environments with view to enable collaboration under power constraints and transitory network disconnections we elaborate in particular on synchronous cscw and expose with it significant details of group awareness while we contribute protocol for awareness support over large areas that strikes balance between energy consumption and notification time to avoid user disruption this protocol notifies awareness information in multicast fashion while the bandwidth is allocated dynamically among notifications and data requests thus minimizing the time needed by each one of them and ensuring the isochronous delivery of information to all clients the efficiency and scalability of our protocol are evaluated with simulation experiments whereby we compare various notification schemes and finally choose one that changes dynamically over time
the advances in wireless and mobile computing allow mobile user to perform wide range of applications once limited to non mobile hard wired computing environments as the geographical position of mobile user is becoming more trackable users need to pull data which are related to their location perhaps seeking information about unfamiliar places or local lifestyle data in these requests location attribute has to be identified in order to provide more efficient access to location dependent data whose value is determined by the location to which it is related local yellow pages local events and weather information are some of the examples of these data in this paper we give formalization of location relatedness in queries we differentiate location dependence and location awareness and provide thorough examples to support our approach
power consumption is major factor that limits the performance of computers we survey the state of the art in techniques that reduce the total power consumed by microprocessor system over time these techniques are applied at various levels ranging from circuits to architectures architectures to system software and system software to applications they also include holistic approaches that will become more important over the next decade we conclude that power management is multifaceted discipline that is continually expanding with new techniques being developed at every level these techniques may eventually allow computers to break through the power wall and achieve unprecedented levels of performance versatility and reliability yet it remains too early to tell which techniques will ultimately solve the power problem
automatic localisation of correspondences for the construction of statistical shape models from examples has been the focus of intense research during the last decade several algorithms are available and benchmarking is needed to rank the different algorithms prior work has argued that the quality of the models produced by the algorithms can be evaluated by measuring compactness generality and specificity in this paper severe problems with these standard measures are analysed both theoretically and experimentally both on natural and synthetic datasets we also propose that ground truth correspondence measure gcm is used for benchmarking and in this paper benchmarking is performed on several state of the art algorithms using seven real and one synthetic dataset
we present core calculus with two of the key constructs for parallelism namely async and finish our calculus forms convenient basis for type systems and static analyses for languages with async finish parallelism and for tractable proofs of correctness for example we give short proof of the deadlock freedom theorem of saraswat and jagadeesan our main contribution is type system that solves the open problem of context sensitive may happen in parallel analysis for languages with async finish parallelism we prove the correctness of our type system and we report experimental results of performing type inference on lines of code our analysis runs in polynomial time takes total of seconds on our benchmarks and produces low number of false positives which suggests that our analysis is good basis for other analyses such as race detectors
we present efficient fixed parameter algorithms for the np complete edge modification problems cluster editing and cluster deletion here the goal is to make the fewest changes to the edge set of an input graph such that the new graph is vertex disjoint union of cliques allowing up to k edge additions and deletions cluster editing we solve this problem in time allowing only up to k edge deletions cluster deletion we solve this problem in time the key ingredients of our algorithms are two easy to implement bounded search tree algorithms and reduction to problem kernel of size this improves and complements previous work
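the bounded search tree part is compact enough to sketch directly in python: find a conflict triple u, v, w with edges uv and uw but no edge vw, and branch on the three ways of resolving it while decrementing the budget k. the kernelization and the refined running-time analysis are omitted; dropping the edge-insertion branch gives the analogous search tree for cluster deletion.

    def cluster_editing(vertices, edges, k):
        """Decide whether at most k edge additions/deletions turn the graph into
        a disjoint union of cliques (bounded search tree sketch, O(3^k) branches)."""
        def find_conflict(E):
            for u in vertices:
                nbrs = [v for v in vertices if v != u and frozenset((u, v)) in E]
                for i, v in enumerate(nbrs):
                    for w in nbrs[i + 1:]:
                        if frozenset((v, w)) not in E:
                            return u, v, w          # uv, uw present but vw absent
            return None

        def solve(E, budget):
            tri = find_conflict(E)
            if tri is None:
                return True                         # already a union of cliques
            if budget == 0:
                return False
            u, v, w = tri
            return (solve(E - {frozenset((u, v))}, budget - 1)     # delete uv
                    or solve(E - {frozenset((u, w))}, budget - 1)  # delete uw
                    or solve(E | {frozenset((v, w))}, budget - 1)) # insert vw

        return solve({frozenset(e) for e in edges}, k)

    # cluster_editing(['a', 'b', 'c'], [('a', 'b'), ('a', 'c')], 1) -> True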
data integration is the problem of combining data residing at different autonomous heterogeneous sources and providing the client with unified reconciled global view of the data we discuss data integration systems taking the abstract viewpoint that the global view is an ontology expressed in class based formalism we resort to an expressive description logic alcqi that fully captures class based representation formalisms and we show that query answering in data integration as well as all other relevant reasoning tasks is decidable however when we have to deal with large amounts of data the high computational complexity in the size of the data makes the use of full fledged expressive description logic infeasible in practice this leads us to consider dl lite specifically tailored restriction of alcqi that ensures tractability of query answering in data integration while keeping enough expressive power to capture the most relevant features of class based formalisms
novel technique is proposed for the management of reconfigurable device in order to get true hardware multitasking we use vertex list set to keep track of the free area boundary this structure contains the best candidate locations for the task and several heuristics are proposed to select one of them based on fragmentation and adjacency look ahead heuristic that anticipates the next known event is also proposed metric is used to estimate the fragmentation status of the fpga based on the number of holes and their shape defragmentation measures are taken when needed
labeling text data is quite time consuming but essential for automatic text classification especially manually creating multiple labels for each document may become impractical when very large amount of data is needed for training multi label text classifiers to minimize the human labeling efforts we propose novel multi label active learning approach which can reduce the required labeled data without sacrificing the classification accuracy traditional active learning algorithms can only handle single label problems that is each data is restricted to have one label our approach takes into account the multi label information and select the unlabeled data which can lead to the largest reduction of the expected model loss specifically the model loss is approximated by the size of version space and the reduction rate of the size of version space is optimized with support vector machines svm an effective label prediction method is designed to predict possible labels for each unlabeled data point and the expected loss for multi label data is approximated by summing up losses on all labels according to the most confident result of label prediction experiments on several real world data sets all are publicly available demonstrate that our approach can obtain promising classification result with much fewer labeled data than state of the art methods
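one way to picture the selection step in python: for each unlabeled point, sum a margin-based loss over all per-label svm decision scores and query the point with the largest sum, since small margins on many labels suggest a large expected reduction of the version space. the hinge-style surrogate below is a rough stand-in for the loss approximation and label-prediction machinery described in the paper.

    def select_query(unlabeled, decision_fns):
        """Pick the unlabeled point to send to the human annotator.
        decision_fns: one signed SVM decision function per label."""
        def expected_loss(x):
            # small |f_l(x)| on many labels -> large expected version-space reduction
            return sum(max(0.0, 1.0 - abs(f(x))) for f in decision_fns)
        return max(unlabeled, key=expected_loss)

    # with two label classifiers, the point sitting near both margins wins:
    # select_query([(0.1, 2.0), (3.0, 3.0)],
    #              [lambda x: x[0] - 0.2, lambda x: x[1] - 2.1]) -> (0.1, 2.0)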
this paper introduces novel technique for joint surface reconstruction and registration given set of roughly aligned noisy point clouds it outputs noise free and watertight solid model the basic idea of the new technique is to reconstruct prototype surface at increasing resolution levels according to the registration accuracy obtained so far and to register all parts with this surface we derive non linear optimization problem from bayesian formulation of the joint estimation problem the prototype surface is represented as partition of unity implicit surface which is constructed from piecewise quadratic functions defined on octree cells and blended together using spline basis functions allowing the representation of objects with arbitrary topology with high accuracy we apply the new technique to set of standard data sets as well as especially challenging real world cases in practice the novel prototype surface based joint reconstruction registration algorithm avoids typical convergence problems in registering noisy range scans and substantially improves the accuracy of the final output
we study traffic measurement issue for active queue management and dynamic bandwidth allocation at single network node under the constraint of cell loss probability clp or buffer overflow probability using the concept of measurement based virtual queue vq and frequency domain traffic filtering we propose an online algorithm to estimate the real time bandwidth demand under both short and long term cell loss constraint the algorithm is adaptive and robust to the piece wise stationary traffic dynamics the vq runs in parallel to the real queueing system and monitors the latter in non intrusive way it captures proper traffic sampling interval tc simulation and analysis show its critical role in achieving higher utilization of bandwidth and buffer resource without violating qos requirement given an appropriate tc we argue that network controls such as the call admission control cac and dynamic bandwidth allocation are facilitated with accurate information about traffic loading and qos status at the smallest timescale
two methods have been used extensively to model resting contact for rigid body simulation the first approach the penalty method applies virtual springs to surfaces in contact to minimize interpenetration this method as typically implemented results in oscillatory behavior and considerable penetration the second approach based on formulating resting contact as linear complementarity problem determines the resting contact forces analytically to prevent interpenetration the analytical method exhibits expected case polynomial complexity in the number of contact points and may fail to find solution in polynomial time when friction is modeled we present fast penalty method that minimizes oscillatory behavior and leads to little penetration during resting contact our method compares favorably to the analytical method with regard to these two measures while exhibiting much faster performance both asymptotically and empirically
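for reference, the classic penalty force that the paper's fast variant refines looks roughly like the python sketch below: a spring-damper along the contact normal, applied only while the point penetrates. the stiffness and damping constants are arbitrary demo values; naive choices of them are exactly what produces the oscillation and penetration discussed above.

    def penalty_contact_force(pen_depth, pen_velocity, k=1e4, c=50.0):
        """Spring-damper penalty contact: pen_depth > 0 is the penetration along
        the contact normal, pen_velocity < 0 means the point is moving deeper."""
        if pen_depth <= 0.0:
            return 0.0                       # no contact, no force
        spring = k * pen_depth               # push back proportionally to penetration
        damper = -c * pen_velocity           # resist the penetrating velocity
        return max(0.0, spring + damper)     # never pull the bodies together

    # penalty_contact_force(0.01, -0.5) -> 100.0 + 25.0 = 125.0 along the normal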
this paper proposes cluster based peer to peer system called peercluster for sharing data over the internet in peercluster all participant computers are grouped into various interest clusters each of which contains computers that have the same interests the intuition behind the system design is that by logically grouping users interested in similar topics together we can improve query efficiency to efficiently route and broadcast messages across within interest clusters hypercube topology is employed in addition to ensure that the structure of the interest clusters is not altered by arbitrary node insertions deletions we have devised corresponding join and leave protocols the complexities of these protocols are analyzed moreover we augment peercluster with system recovery mechanism to make it robust against unpredictable computer network failures using an event driven simulation we evaluate the performance of our approach by varying several system parameters the experimental results show that peercluster outperforms previous approaches in terms of query efficiency while still providing the desired functionality of keyword based search
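the hypercube part is easy to make concrete: cluster ids are d-bit strings, and a message is routed by fixing one differing bit per hop, so any of the 2^d interest clusters is reached in at most d hops. the python sketch below shows only this routing rule, not peercluster's join/leave protocols or its broadcast and recovery mechanisms.

    def hypercube_route(src, dst, dim):
        """Return the list of node ids visited when routing from src to dst in a
        dim-dimensional hypercube, correcting the lowest differing bit first."""
        assert 0 <= src < 2 ** dim and 0 <= dst < 2 ** dim
        path, cur = [src], src
        while cur != dst:
            diff = cur ^ dst
            cur ^= diff & -diff        # flip the lowest bit on which they differ
            path.append(cur)
        return path

    # hypercube_route(0b0101, 0b0011, 4) -> [5, 7, 3], i.e. two hops in a 4-cube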
grid resource management has been traditionally limited to just two levels of hierarchy namely local resource managers and metaschedulers this results in non manageable and thus not scalable architecture where each metascheduler has to be able to access thousands of resources which also implies having detailed knowledge about their interfaces and configuration this paper presents recursive architecture allowing an arbitrary number of levels in the hierarchy this way resources can be arranged in different ways for example following organizational boundaries or aggregating them by similarity while hiding the access details an implementation of this architecture is shown as well as its benefits in terms of autonomy scalability deployment and security the proposed implementation is based on existing interfaces allowing for standardization
tail calls are expected not to consume stack space in most functional languages however there is no support for tail calls in some environments even in such environments proper tail calls can be implemented with technique called trampoline to reduce the overhead of trampolining while preserving stack space asymptotically we propose selective tail call elimination based on an effect system the effect system infers the number of successive tail calls generated by the execution of an expression and trampolines are introduced only when they are necessary
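the trampoline trick itself fits in a few lines; python is a convenient stand-in because cpython, like the environments the paper targets, does not eliminate tail calls. the sketch below rewrites every tail call as a returned thunk, whereas the paper's selective elimination would rewrite only the calls its effect system flags, and it assumes the final result is not itself callable.

    def trampoline(thunk):
        """Repeatedly invoke returned thunks so trampolined tail calls run in a
        loop instead of growing the call stack."""
        result = thunk
        while callable(result):
            result = result()
        return result

    def even(n):
        # tail call in trampolined style: return a thunk instead of recursing
        return True if n == 0 else (lambda: odd(n - 1))

    def odd(n):
        return False if n == 0 else (lambda: even(n - 1))

    print(trampoline(lambda: even(100000)))   # True, with constant stack depth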
automatic processing of medical dictations poses significant challenge we approach the problem by introducing statistical framework capable of identifying types and boundaries of sections lists and other structures occurring in dictation thereby gaining explicit knowledge about the function of such elements training data is created semi automatically by aligning parallel corpus of corrected medical reports and corresponding transcripts generated via automatic speech recognition we highlight the properties of our statistical framework which is based on conditional random fields crfs and implemented as an efficient publicly available toolkit finally we show that our approach is effective both under ideal conditions and for real life dictation involving speech recognition errors and speech related phenomena such as hesitation and repetitions
this paper concerns construction of additive stretched spanners with few edges for vertex graphs having tree decomposition into bags of diameter at most ie the tree length graphs for such graphs we construct additive spanners with dn nlogn edges and additive spanners with dn edges this provides new upper bounds for chordal graphs for which we also show lower bound and prove that there are graphs of tree length for which every multiplicative spanner and thus every additive spanner requires edges
sequential sat solver satori was recently proposed as an alternative to combinational sat in verification applications this paper describes the design of seq sat an efficient sequential sat solver with improved search strategies over satori the major improvements include new and better heuristic for minimizing the set of assignments to state variables new priority based search strategy and flexible sequential search framework which integrates different search strategies and decision variable selection heuristic more suitable for solving the sequential problems we present experimental results to demonstrate that our sequential sat solver can achieve orders of magnitude speedup over satori we plan to release the source code of seq sat along with this paper
finding frequent patterns in continuous stream of transactions is critical for many applications such as retail market data analysis network monitoring web usage mining and stock market prediction even though numerous frequent pattern mining algorithms have been developed over the past decade new solutions for handling stream data are still required due to the continuous unbounded and ordered sequence of data elements generated at rapid rate in data stream therefore extracting frequent patterns from more recent data can enhance the analysis of stream data in this paper we propose an efficient technique to discover the complete set of recent frequent patterns from high speed data stream over sliding window we develop compact pattern stream tree cps tree to capture the recent stream data content and efficiently remove the obsolete old stream data content we also introduce the concept of dynamic tree restructuring in our cps tree to produce highly compact frequency descending tree structure at runtime the complete set of recent frequent patterns is obtained from the cps tree of the current window using an fp growth mining technique extensive experimental analyses show that our cps tree is highly efficient in terms of memory and time complexity when finding recent frequent patterns from high speed data stream
initial algebra semantics is cornerstone of the theory of modern functional programming languages for each inductive data type it provides fold combinator encapsulating structured recursion over data of that type church encoding build combinator which constructs data of that type and fold build rule which optimises modular programs by eliminating intermediate data of that type it has long been thought that initial algebra semantics is not expressive enough to provide similar foundation for programming with nested types specifically the folds have been considered too weak to capture commonly occurring patterns of recursion and no church encodings build combinators or fold build rules have been given for nested types this paper overturns this conventional wisdom by solving all of these problems
existing test suite reduction techniques employed for testing web applications have either used traditional program coverage based requirements or usage based requirements in this paper we explore three different strategies to integrate the use of program coverage based requirements and usage based requirements in relation to test suite reduction for web applications we investigate the use of usage based test requirements for comparison of test suites that have been reduced based on program coverage based test requirements we examine the effectiveness of test suite reduction process based on combination of both usage based and program coverage based requirements finally we modify popular test suite reduction algorithm to replace part of its test selection process with selection based on usage based test requirements our case study suggests that integrating program coverage based and usage based test requirements has positive impact on the effectiveness of the resulting test suites
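a hedged sketch of one way program-coverage-based and usage-based requirements could be combined in a greedy reduction: repeatedly pick the test covering the most still-uncovered coverage requirements, breaking ties by a usage weight; the test names and weights are hypothetical, not the paper's strategies

# Greedy set-cover style reduction with a usage-based tie-break.
def reduce_suite(tests, coverage, usage_weight):
    """tests: list of test ids; coverage: test -> set of requirements;
    usage_weight: test -> float (higher = exercised more by real users)."""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(tests,
                   key=lambda t: (len(coverage[t] & uncovered), usage_weight[t]))
        if not coverage[best] & uncovered:
            break
        selected.append(best)
        uncovered -= coverage[best]
    return selected

coverage = {"t1": {"r1", "r2"}, "t2": {"r2", "r3"}, "t3": {"r3"}}
usage = {"t1": 0.9, "t2": 0.5, "t3": 0.8}
print(reduce_suite(["t1", "t2", "t3"], coverage, usage))   # ['t1', 't3']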
this paper demonstrates the use of model based evaluation approach for instrumentation systems iss the overall objective of this study is to provide early feedback to tool developers regarding is overhead and performance such feedback helps developers make appropriate design decisions about alternative system configurations and task scheduling policies we consider three types of system architectures network of workstations now symmetric multiprocessors smp and massively parallel processing mpp systems we develop resource occupancy rocc model for an on line is for an existing tool and parameterize it for an ibm sp platform this model is simulated to answer several what if questions regarding two policies to schedule instrumentation data forwarding collect and forward cf and batch and forward bf in addition this study investigates two alternatives for forwarding the instrumentation data direct and binary tree forwarding for an mpp system simulation results indicate that the bf policy can significantly reduce the overhead and that the tree forwarding configuration exhibits desirable scalability characteristics for mpp systems initial measurement based testing results indicate more than percent reduction in the direct is overhead when the bf policy was added to paradyn parallel performance measurement tool
database optimizers require statistical information about data distributions in order to evaluate result sizes and access plan costs for processing user queries in this context we consider the problem of estimating the size of the projections of database relation when measures on attribute domain cardinalities are maintained in the system our main theoretical contribution is new formal model ad valid under the hypotheses of attribute independence and uniform distribution of attribute values derived considering the difference between time invariant domain the set of values that an attribute can assume and time dependent active domain the set of values that are actually assumed at certain time early models developed under the same assumptions are shown to be formally incorrect since the ad model is computationally highly demanding we also introduce an approximate easy to compute model ad that unlike previous approximations yields low errors on all the parameter space of the active domain cardinalities finally we extend the ad model to the case of nonuniform distributions and present experimental results confirming the good behavior of the model
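for intuition, the standard estimate under the same attribute-independence and uniformity hypotheses (not the ad model itself) gives the expected number of distinct values, and hence a projection-size estimate, as D(1 - (1 - 1/D)^n) for n tuples drawn over a domain of cardinality D; a small sketch

# Expected number of distinct attribute values seen when n tuples draw values
# uniformly and independently from a domain of cardinality D.
def expected_distinct(n_tuples, domain_size):
    D = float(domain_size)
    return D * (1.0 - (1.0 - 1.0 / D) ** n_tuples)

# With 1000 tuples over a domain of 500 values, most of the domain is "active":
print(round(expected_distinct(1000, 500), 1))   # ~432.5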
in the past years research on inductive inference has developed along different lines eg in the formalizations used and in the classes of target concepts considered one common root of many of these formalizations is gold’s model of identification in the limit this model has been studied for learning recursive functions recursively enumerable languages and recursive languages reflecting different aspects of machine learning artificial intelligence complexity theory and recursion theory one line of research focuses on indexed families of recursive languages classes of recursive languages described in representation scheme for which the question of membership for any string in any of the given languages is effectively decidable with uniform procedure such language classes are of interest because of their naturalness the survey at hand picks out important studies on learning indexed families including basic as well as recent research summarizes and illustrates the corresponding results and points out links to related fields such as grammatical inference machine learning and artificial intelligence in general
two complications frequently arise in real world applications motion and the contamination of data by outliers we consider fundamental clustering problem the center problem within the context of these two issues we are given finite point set of size and an integer in the standard center problem the objective is to compute set of center points to minimize the maximum distance from any point of to its closest center or equivalently the smallest radius such that can be covered by disks of this radius in the discrete center problem the disk centers are drawn from the points of and in the absolute center problem the disk centers are unrestricted we generalize this problem in two ways first we assume that points are in continuous motion and the objective is to maintain solution over time second we assume that some given robustness parameter
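a sketch of the classic farthest-point greedy (gonzalez) 2-approximation for the static discrete center problem, the kind of primitive the kinetic and outlier-robust variants above build on; the points and k below are illustrative

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def k_center(points, k):
    centers = [points[0]]                      # arbitrary first center
    while len(centers) < k:
        # next center = point farthest from its closest current center
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    radius = max(min(dist(p, c) for c in centers) for p in points)
    return centers, radius

pts = [(0, 0), (1, 0), (10, 0), (10, 1), (5, 5)]
print(k_center(pts, 2))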
existing works on processing of extensible markup language xml documents have been concentrated on query optimisation storage problems documents transformation compressing methods and normalisation there are only few papers on concurrency control in accessing and modifying xml documents which are stored in xml database systems the aim of this paper is to analyse and compare the quantity of concurrency control methods for xml database systems based on dom api
we compare the performance of three usual allocations namely max min fairness proportional fairness and balanced fairness in communication network whose resources are shared by random number of data flows the model consists of network of processor sharing queues the vector of service rates which is constrained by some compact convex capacity set representing the network resources is function of the number of customers in each queue this function determines the way network resources are allocated we show that this model is representative of rich class of wired and wireless networks we give in this general framework the stability condition of max min fairness proportional fairness and balanced fairness and compare their performance on number of toy networks
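a toy progressive-filling sketch of max min fairness on a single shared resource (not the general convex capacity sets or the processor-sharing network of the paper); capacity and demands are illustrative

# Repeatedly give every unsatisfied flow an equal share of what is left.
def max_min_fair(capacity, demands):
    alloc = {f: 0.0 for f in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / len(active)
        for f in sorted(active, key=lambda f: demands[f]):
            give = min(share, demands[f] - alloc[f])
            alloc[f] += give
            remaining -= give
            if alloc[f] >= demands[f] - 1e-12:
                active.discard(f)
    return alloc

print(max_min_fair(10, {"a": 2, "b": 4, "c": 8}))   # approx {'a': 2.0, 'b': 4.0, 'c': 4.0}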
context uncertainty is an unavoidable issue in software engineering and an important area of investigation this paper studies the impact of uncertainty on total duration ie make span for implementing all features in operational release planning objective the uncertainty factors under investigation are the number of new features arriving during release construction the estimated effort needed to implement features the availability of developers and the productivity of developers method an integrated method is presented combining monte carlo simulation to model uncertainty in the operational release planning orp process with process simulation to model the orp process steps and their dependencies as well as an associated optimization heuristic representing an organization specific staffing policy for make span minimization the method allows for evaluating the impact of uncertainty on make span the impact of uncertainty factors both in isolation and in combination is studied in three different pessimism levels through comparison with baseline plan initial evaluation of the method is done by an explorative case study at chartwell technology inc to demonstrate its applicability and its usefulness results the impact of uncertainty on release make span increases both in terms of magnitude and variance with an increase of pessimism level as well as with an increase of the number of uncertainty factors among the four uncertainty factors we found that the strongest impact stems from the number of new features arriving during release construction we have also demonstrated that for any combination of uncertainty factors their combined ie simultaneous impact is bigger than the addition of their individual impacts conclusion the added value of the presented method is that managers are able to study the impact of uncertainty on existing ie baseline operational release plans proactively
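a hedged toy sketch of the monte carlo idea: sample uncertain feature efforts, assign features to developers with a simple longest-processing-time heuristic, and read off the spread of the resulting make-span; the feature list, effort ranges and team size are hypothetical and the heuristic is only a stand-in for the paper's staffing policy

import random

def simulate_makespan(efforts, n_devs):
    loads = [0.0] * n_devs
    for e in sorted(efforts, reverse=True):      # longest-processing-time heuristic
        loads[loads.index(min(loads))] += e
    return max(loads)

def monte_carlo(features, n_devs, runs=1000, seed=1):
    random.seed(seed)
    makespans = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode)
        efforts = [random.triangular(lo, hi, mode) for lo, mode, hi in features]
        makespans.append(simulate_makespan(efforts, n_devs))
    makespans.sort()
    return makespans[len(makespans) // 2], makespans[int(0.9 * len(makespans))]

features = [(2, 3, 6), (1, 2, 4), (5, 8, 13), (3, 4, 7), (2, 5, 9)]  # (low, mode, high) days
print(monte_carlo(features, n_devs=2))   # (median, 90th percentile) make-span in days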
in this paper an approach for the implementation of quality based web search engine is proposed quality retrieval is introduced and an overview on previous efforts to implement such service is given machine learning approaches are identified as the most promising methods to determine the quality of web pages features for the most appropriate characterization of web pages are determined quality model is developed based on human judgments this model is integrated into meta search engine which assesses the quality of all results at run time the evaluation results show that quality based ranking does lead to better results concerning the perceived quality of web pages presented in the result set the quality models are exploited to identify potentially important features and characteristics for the quality of web pages
in this work we present new in network techniques for communication efficient approximate query processing in wireless sensornets we use model based approach that constructs and maintains spanning tree within the network rooted at the basestation the tree maintains compressed summary information for each link that is used to stub out traversal during query processing our work is based on formal model of the in network tree construction task framed as an optimization problem we demonstrate hardness results for that problem and develop efficient approximation algorithms for subtasks that are too expensive to compute exactly we also propose efficient heuristics to accommodate wider set of workloads and empirically evaluate their performance and sensitivity to model changes
in practice any database management system sometimes needs reorganization that is change in some aspect of the logical and or physical arrangement of database in traditional practice many types of reorganization have required denying access to database taking the database offline during reorganization taking database offline can be unacceptable for highly available hour database for example database serving electronic commerce or armed forces or for very large database solution is to reorganize online concurrently with usage of the database incrementally during users activities or interpretively this article is tutorial and survey on requirements issues and strategies for online reorganization it analyzes the issues and then presents the strategies which use the issues the issues most of which involve design trade offs include use of partitions the locus of control for the process that reorganizes background process or users activities reorganization by copying to newly allocated storage as opposed to reorganizing in place use of differential files references to data that has moved performance and activation of reorganization the article surveys online strategies in three categories of reorganization the first category maintenance involves restoring the physical arrangement of data instances without changing the database definition this category includes restoration of clustering reorganization of an index rebalancing of parallel or distributed data garbage collection for persistent storage and cleaning reclamation of space in log structured file system the second category involves changing the physical database definition topics include construction of indexes conversion between trees and linear hash files and redefinition eg splitting of partitions the third category involves changing the logical database definition some examples are changing column’s data type changing the inheritance hierarchy of object classes and changing relationship from one to many to many to many the survey encompasses both research and commercial implementations and this article points out several open research topics as highly available or very large databases continue to become more common and more important in the world economy the importance of online reorganization is likely to continue growing
we have devised an algorithm for minimal placement of bank selections in partitioned memory architectures this algorithm is parameterizable for chosen metric such as speed space or energy bank switching is technique that increases the code and data memory in microcontrollers without extending the address buses given program in which variables have been assigned to data banks we present novel optimization technique that minimizes the overhead of bank switching through cost effective placement of bank selection instructions the placement is controlled by number of different objectives such as runtime low power small code size or combination of these parameters we have formulated the minimal placement of bank selection instructions as discrete optimization problem that is mapped to partitioned boolean quadratic programming pbqp problem we implemented the optimization as part of pic microchip backend and evaluated the approach for several optimization objectives our benchmark suite comprises programs from mibench and dspstone plus microcontroller real time kernel and drivers for microcontroller hardware devices our optimization achieved reduction in program memory space of between and percent and an overall improvement with respect to instruction cycles between and percent our optimization achieved the minimal solution for all benchmark programs we investigated the scalability of our approach toward the requirements of future generations of microcontrollers this study was conducted as worst case analysis on the entire mibench suite our results show that our optimization scales well to larger numbers of memory banks scales well to the larger problem sizes that will become feasible with future microcontrollers and achieves minimal placement for more than percent of all functions from mibench
we present new metric for routing in multi radio multi hop wireless networks we focus on wireless networks with stationary nodes such as community wireless networks the goal of the metric is to choose high throughput path between source and destination our metric assigns weights to individual links based on the expected transmission time ett of packet over the link the ett is function of the loss rate and the bandwidth of the link the individual link weights are combined into path metric called weighted cumulative ett wcett that explicitly accounts for the interference among links that use the same channel the wcett metric is incorporated into routing protocol that we call multi radio link quality source routing we studied the performance of our metric by implementing it in wireless testbed consisting of nodes each equipped with two wireless cards we find that in multi radio environment our metric significantly outperforms previously proposed routing metrics by making judicious use of the second radio
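a sketch following the usual published definitions of ett and wcett (expected transmission time per link, mixed with the busiest channel's share of the path); the packet size and the beta trade-off below are illustrative

PACKET_BITS = 8192      # 1 KB packet, illustrative
BETA = 0.5              # trade-off between total delay and channel diversity

def ett(loss_fwd, loss_rev, bandwidth_bps):
    etx = 1.0 / ((1.0 - loss_fwd) * (1.0 - loss_rev))   # expected transmissions
    return etx * PACKET_BITS / bandwidth_bps            # seconds per packet

def wcett(links):
    """links: list of (loss_fwd, loss_rev, bandwidth_bps, channel)."""
    etts = [ett(pf, pr, bw) for pf, pr, bw, _ in links]
    per_channel = {}
    for (_, _, _, ch), t in zip(links, etts):
        per_channel[ch] = per_channel.get(ch, 0.0) + t
    return (1 - BETA) * sum(etts) + BETA * max(per_channel.values())

path = [(0.1, 0.1, 11e6, 1), (0.2, 0.1, 54e6, 6), (0.0, 0.0, 11e6, 1)]
print(round(wcett(path) * 1000, 3), "ms")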
semantic annotations of web services can support the effective and efficient discovery of services and guide their composition into workflows at present however the practical utility of such annotations is limited by the small number of service annotations available for general use manual annotation of services is time consuming and thus expensive task so some means are required by which services can be automatically or semi automatically annotated in this paper we show how information can be inferred about the semantics of operation parameters based on their connections to other annotated operation parameters within tried and tested workflows because the data links in the workflows do not necessarily contain every possible connection of compatible parameters we can infer only constraints on the semantics of parameters we show that despite their imprecise nature these so called loose annotations are still of value in supporting the manual annotation task inspecting workflows and discovering services we also show that derived annotations for already annotated parameters are useful by comparing existing and newly derived annotations of operation parameters we can support the detection of errors in existing annotations the ontology used for annotation and in workflows the derivation mechanism has been implemented and its practical applicability for inferring new annotations has been established through an experimental evaluation the usefulness of the derived annotations is also demonstrated
we describe an ethnographic study that explores how low tech and new tech surfaces support participation and collaboration during workshop breakout session the low tech surfaces were post it notes and large sheets of paper the new tech surfaces were writeable walls and multi touch tabletop four groups used the different surfaces during three phases i brief presentation of position papers and discussion of themes ii the creation of group presentation and iii report back session participation and collaboration varied depending on the physical technological and social factors at play when using the different surfaces we discuss why this is the case noting how new shareable surfaces may need to be constrained to invite participation in ways that are simply taken for granted because of their familiarity when using low tech materials
this paper addresses the pragmatics of web information systems wis by analysing their usage starting from classification of intentions we first present life cases which capture observations of user behaviour in reality we discuss the facets of life cases and present semi formal way for their documentation life cases can be used in pragmatic way to specify story space which is an important component of storyboard in second step we complement life cases by user models that are specified by various facets of actor profiles that are needed for them we analyse actor profiles and present semi formal way for their documentation we outline how these profiles can be used to specify actors which are an important component of storyboard finally we analyse contexts and the way they impact on life cases user models and the storyboard
we present method to align words in bitext that combines elements of traditional statistical approach with linguistic knowledge we demonstrate this approach for arabic english using an alignment lexicon produced by statistical word aligner as well as linguistic resources ranging from an english parser to heuristic alignment rules for function words these linguistic heuristics have been generalized from development corpus of parallel sentences our aligner ualign outperforms both the commonly used giza aligner and the state of the art leaf aligner on measure and produces superior scores in end to end statistical machine translation bleu points over giza and over leaf
this paper investigates efficient evaluation of database updates and presents procedural semantics for stratified update programs that extend stratified logic programs with bulk updates and hypothetical reasoning bulk rules with universal quantification in the body allow an arbitrary update to be applied simultaneously for every answer of an arbitrary query hypothetical reasoning is supported by testing the success or failure of an update the procedural semantics offers efficient goal oriented tabled evaluation of database updates it guarantees termination for function free stratified update programs and avoids repeated computation of identical subgoals
we consider computationally efficient incentive compatible mechanisms that use the vcg payment scheme and study how well they can approximate the social welfare in auction settings we present novel technique for setting lower bounds on the approximation ratio of this type of mechanisms specifically for combinatorial auctions among submodular and thus also subadditive bidders we prove an lower bound which is close to the known upper bound of and qualitatively higher than the constant factor approximation possible from purely computational point of view
embedded systems have been traditional area of strength in the research agenda of the university of california at berkeley in parallel to this effort pattern of graduate and undergraduate classes has emerged that is the result of distillation process of the research results in this paper we present the considerations that are driving our curriculum development and we review our undergraduate and graduate program in particular we describe in detail graduate class eecs design of embedded systems modeling validation and synthesis that has been taught for six years common feature of our education agenda is the search for fundamentals of embedded system science rather than embedded system design techniques an approach that today is rather unique
pointer analysis classic problem in software program analysis has emerged as an important problem to solve in design automation at time when complex designs specified in the form of code need to be synthesized or verified however precise pointer analysis algorithms that are both context and flow sensitive fscs have not been shown to scale in this paper we report new solution for fscs analysis which can evaluate the program states of all program points under billions of different calling paths our solution extends the recently proposed symbolic pointer analysis spa technology which exploits the efficiency of binary decision diagrams bdds with our new strategy of problem solving called superposed symbolic computation and its application on our generic pointer analysis framework we are able to report the first result on all spec benchmarks that completes context sensitive flow insensitive analysis in seconds and context sensitive flow sensitive analysis in minutes
we present novel technique called radiance scaling for the depiction of surface shape through shading it adjusts reflected light intensities in way dependent on both surface curvature and material characteristics as result diffuse shading or highlight variations become correlated to surface feature variations enhancing surface concavities and convexities this approach is more versatile compared to previous methods first it produces satisfying results with any kind of material we demonstrate results obtained with phong and ashikmin brdfs cartoon shading sub lambertian materials and perfectly reflective or refractive objects second it imposes no restriction on lighting environment it does not require dense sampling of lighting directions and works even with single light third it makes it possible to enhance surface shape through the use of precomputed radiance data such as ambient occlusion prefiltered environment maps or lit spheres our novel approach works in real time on modern graphics hardware
mobile nodes in some challenging network scenarios eg battlefield and disaster recovery scenarios suffer from intermittent connectivity and frequent partitions disruption tolerant network dtn technologies are designed to enable communications in such environments several dtn routing schemes have been proposed however not much work has been done on designing schemes that provide efficient information access in such challenging network scenarios in this paper we explore how content based information retrieval system can be designed for dtns there are three important design issues namely how data should be replicated and stored at multiple nodes how query is disseminated in sparsely connected networks and how query response is routed back to the issuing node we first describe how to select nodes for storing the replicated copies of data items we consider the random and the intelligent caching schemes in the random caching scheme nodes that are encountered first by data generating node are selected to cache the extra copies while in the intelligent caching scheme nodes that can potentially meet more nodes eg faster nodes are selected to cache the extra data copies the number of replicated data copies can be the same for all data items or varied depending on the access frequencies of the data items in this work we consider fixed proportional and square root replication schemes then we describe two query dissemination schemes copy selective query spraying wss scheme and hop neighborhood spraying lns scheme in the wss scheme nodes that can move faster are selected to cache the queries while in the lns scheme nodes that are within hops of querying node will cache the queries for message routing we use an enhanced prophet scheme where next hop node is selected only if its predicted delivery probability to the destination is higher than certain threshold we conduct extensive simulation studies to evaluate different combinations of the replication and query dissemination algorithms our results reveal that the scheme that performs the best is the one that uses the wss scheme combined with binary spread of replicated data copies the wss scheme can achieve higher query success ratio when compared to scheme that does not use any data and query replication furthermore the square root and proportional replication schemes provide higher query success ratio than the fixed copy approach with varying node density in addition the intelligent caching approach can further improve the query success ratio by with varying node density our results using different mobility models reveal that the query success ratio degrades at most when the community based model is used compared to the random waypoint rwp model broch et al performance comparison of multihop wireless ad hoc network routing protocols acm mobicom pp compared to the rwp and the community based mobility models the umassbusnet model from the dieselnet project zhang et al modeling of bus based disruption tolerant network trace proceedings of acm mobihoc achieves much lower query success ratio because of the longer inter node encounter time
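a sketch of the prophet-style delivery-predictability bookkeeping that the enhanced routing step above relies on: boost on encounter, age over time, propagate transitively, and forward only when the value exceeds a threshold; the constants are the commonly quoted ones and the threshold is illustrative, not the paper's tuned values

P_INIT, GAMMA, BETA_T, THRESHOLD = 0.75, 0.98, 0.25, 0.3

class Node:
    def __init__(self, name):
        self.name = name
        self.pred = {}                     # destination -> delivery predictability

    def age(self, elapsed_units):
        for d in self.pred:
            self.pred[d] *= GAMMA ** elapsed_units

    def encounter(self, other):
        # direct update for the node we just met
        p_old = self.pred.get(other.name, 0.0)
        self.pred[other.name] = p_old + (1 - p_old) * P_INIT
        # transitive update through the encountered node's own table
        for dest, p_bc in other.pred.items():
            if dest == self.name:
                continue
            p_ab = self.pred[other.name]
            old = self.pred.get(dest, 0.0)
            self.pred[dest] = max(old, old + (1 - old) * p_ab * p_bc * BETA_T)

    def should_forward(self, dest):
        return self.pred.get(dest, 0.0) > THRESHOLD

a, b = Node("a"), Node("b")
b.pred["base_station"] = 0.9
a.encounter(b)
print(a.pred, a.should_forward("base_station"))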
we propose an automatic instrumentation method for embedded software annotation to enable performance modeling in high level hardware software co simulation environments the proposed cross annotation technique consists of extending retargetable compiler infrastructure to allow the automatic instrumentation of embedded software at the basic block level thus target and annotated native binaries are guaranteed to have isomorphic control flow graphs cfg the proposed method takes into account the processor specific optimizations at the compiler level and proves to be accurate with low simulation overhead
one approach to prolong the lifetime of wireless sensor network wsn is to deploy some relay nodes to communicate with the sensor nodes other relay nodes and the base stations the relay node placement problem for wireless sensor networks is concerned with placing minimum number of relay nodes into wireless sensor network to meet certain connectivity or survivability requirements previous studies have concentrated on the unconstrained version of the problem in the sense that relay nodes can be placed anywhere in practice there may be some physical constraints on the placement of relay nodes to address this issue we study constrained versions of the relay node placement problem where relay nodes can only be placed at set of candidate locations in the connected relay node placement problem we want to place minimum number of relay nodes to ensure that each sensor node is connected with base station through bidirectional path in the survivable relay node placement problem we want to place minimum number of relay nodes to ensure that each sensor node is connected with two base stations or the only base station in case there is only one base station through two node disjoint bidirectional paths for each of the two problems we discuss its computational complexity and present framework of polynomial time approximation algorithms with small approximation ratios extensive numerical results show that our approximation algorithms can produce solutions very close to optimal solutions
we propose low leakage cache architecture based on the observation of the spatio temporal properties of data caches in particular we exploit the fact that during the program lifetime few data values tend to exhibit both spatial and temporal locality in cache ie values that are simultaneously stored by several lines at the same time leakage energy can be reduced by turning off those lines and storing these values in smaller separate memory in this work we introduce an architecture that implements such scheme as well as an algorithm to detect these special values we show that by using as few as four values we can achieve leakage energy savings with an additional reduction of dynamic energy as consequence of reduced average cache access cost
in november the fcc ruled that the digital tv whitespaces be used for unlicensed access this is an exciting development because dtv whitespaces are in the low frequency range mhz compared to typical cellular and ism bands thus resulting in much better propagation characteristics and much higher spectral efficiencies the fcc has also mandated certain guidelines for short range unlicensed access so as to avoid any interference to dtv receivers we consider the problem of wifi like access popularly referred to as wifi for enterprises we assume that the access points and client devices are equipped with cognitive radios ie they can adaptively choose the center frequency bandwidth and power of operation the access points can be equipped with one or more radios our goal is to design complete system which i does not violate the fcc mandate ii dynamically assigns center frequency and bandwidth to each access point based on their demands and iii squeezes the maximum efficiency from the available spectrum this problem is far more general than prior work that investigated dynamic spectrum allocation in cellular and ism bands due to the non homogenous nature of the whitespaces ie different whitespace widths in different parts of the spectrum and the large range of frequency bands with different propagation characteristics this calls for more holistic approach to system design that also accounts for frequency dependent propagation characteristics and radio frontend characteristics in this paper we first propose design rules for holistic system design we then describe an architecture derived from our design rules finally we propose demand based dynamic spectrum allocation algorithms with provable worst case guarantees we provide extensive simulation results showing that i the performance of our algorithm is within of the optimal in typical settings and ii the dtv whitespaces can provide significantly higher data rates compared to the ghz ism band our approach is general enough for designing any system with access to wide range of spectrum
opinion mining is the task of extracting from set of documents opinions expressed by source on specified target this article presents comparative study on the methods and resources that can be employed for mining opinions from quotations reported speech in newspaper articles we show the difficulty of this task motivated by the presence of different possible targets and the large variety of affect phenomena that quotes contain we evaluate our approaches using annotated quotations extracted from news provided by the emm news gathering engine we conclude that generic opinion mining system requires both the use of large lexicons as well as specialised training and testing data
we consider the setting of multiprocessor where the speeds of the processors can be individually scaled jobs arrive over time and have varying degrees of parallelizability nonclairvoyant scheduler must assign the jobs to processors and scale the speeds of the processors we consider the objective of energy plus flow time for jobs that may have side effects or that are not checkpointable we show an bound on the competitive ratio of any deterministic algorithm here is the number of processors and is the exponent of the power function for checkpointable jobs without side effects we give an log competitive algorithm thus for jobs that may have side effects or that are not checkpointable the achievable competitive ratio grows quickly with the number of processors but for checkpointable jobs without side effects the achievable competitive ratio grows slowly with the number of processors we then show lower bound of log on the competitive ratio of any algorithm for checkpointable jobs without side effects finally we slightly improve the upper bound on the competitive ratio for the single processor case which is equivalent to the case that all jobs are fully parallelizable by giving an improved analysis of previously proposed algorithm
model to query document databases by both their content and structure is presented the goal is to obtain query language that is expressive in practice while being efficiently implementable features not present at the same time in previous work the key ideas of the model are set oriented query language based on operations on nearby structure elements of one or more hierarchies together with content and structural indexing and bottom up evaluation the model is evaluated in regard to expressiveness and efficiency showing that it provides good trade off between both goals finally it is shown how to include in the model other media different from text
the state explosion problem of formal verification has obstructed its application to large scale software systems in this article we introduce set of new condensation theories iot failure equivalence iot state equivalence and firing dependence theory to cope with this problem our condensation theories are much weaker than current theories used for the compositional verification of petri nets more significantly our new condensation theories can eliminate the interleaved behaviors caused by asynchronously sending actions therefore our technique provides much more powerful means for the compositional verification of asynchronous processes our technique can efficiently analyze several state based properties boundedness reachable markings reachable submarkings and deadlock states based on the notion of our new theories we develop set of condensation rules for efficient verification of large scale software systems the experimental results show significant improvement in the analysis of large scale concurrent systems
we consider the distribution of channels of live multimedia content eg radio or tv broadcasts via multiple content aggregators in our work an aggregator receives channels from content sources and redistributes them to potentially large number of mobile hosts each aggregator can offer channel in various configurations to cater for different wireless links mobile hosts and user preferences as result mobile host can generally choose from different configurations of the same channel offered by multiple alternative aggregators which may be available through different interfaces eg in hotspot mobile host may need to handoff to another aggregator once it receives channel to prevent service disruption mobile host may for instance need to handoff to another aggregator when it leaves the subnets that make up its current aggregator’s service area eg hotspot or cellular network in this paper we present the design of system that enables multi homed mobile hosts to seamlessly handoff from one aggregator to another so that they can continue to receive channel wherever they go we concentrate on handoffs between aggregators as result of mobile host crossing subnet boundary as part of the system we discuss lightweight application level protocol that enables mobile hosts to select the aggregator that provides the best configuration of channel the protocol comes into play when mobile host begins to receive channel and when it crosses subnet boundary while receiving the channel we show how our protocol can be implemented using the standard ietf session control and description protocols sip and sdp the implementation combines sip and sdp’s offer answer model in novel way
in this paper we describe the design implementation and evaluation of software framework that supports the development of mobile context aware trails based applications trail is contextually scheduled collection of activities and represents generic model that can be used to satisfy the activity management requirements of wide range of context based time management applications trails overcome limitations with traditional time management techniques based on static to do lists by dynamically reordering activities based on emergent context
in multiuser multimedia information systems eg movie on demand digital editing scheduling the retrievals of continuous media objects becomes challenging task this is because of both intraobject and interobject time dependencies intraobject time dependency refers to the real time display requirement of continuous media object interobject time dependency is the temporal relationships defined among multiple continuous media objects in order to compose tailored multimedia presentations user might define complex time dependencies among multiple continuous media objects with various lengths and display bandwidths scheduling the retrieval tasks corresponding to the components of such presentation in order to respect both inter and intra task time dependencies is the focus of this study to tackle this task scheduling problem crs we start with simpler scheduling problem ars where there is no inter task time dependency eg movie on demand next we investigate an augmented version of ars termed ars where requests reserve displays in advance eg reservation based movie on demand finally we extend our techniques proposed for ars and ars to address the crs problem we also provide formal definition of these scheduling problems and proof of their np hardness
in automatic software verification we have observed theoretical convergence of model checking and program analysis in practice however model checkers are still mostly concerned with precision eg the removal of spurious counterexamples for this purpose they build and refine reachability trees lattice based program analyzers on the other hand are primarily concerned with efficiency we designed an algorithm and built tool that can be configured to perform not only purely tree based or purely lattice based analysis but offers many intermediate settings that have not been evaluated before the algorithm and tool take one or more abstract interpreters such as predicate abstraction and shape analysis and configure their execution and interaction using several parameters our experiments show that such customization may lead to dramatic improvements in the precision efficiency spectrum
in web database that dynamically provides information in response to user queries two distinct schemas interface schema the schema users can query and result schema the schema users can browse are presented to users each partially reflects the actual schema of the web database most previous work only studied the problem of schema matching across query interfaces of web databases in this paper we propose novel schema model that distinguishes the interface and the result schema of web database in specific domain in this model we address two significant web database schema matching problems intra site and inter site the first problem is crucial in automatically extracting data from web databases while the second problem plays significant role in meta retrieving and integrating data from different web databases we also investigate unified solution to the two problems based on query probing and instance based schema matching techniques using the model cross validation technique is also proposed to improve the accuracy of the schema matching our experiments on real web databases demonstrate that the two problems can be solved simultaneously with high precision and recall
in this paper we propose run time strategy for allocating application tasks to embedded multiprocessor systems on chip platforms where communication happens via the network on chip approach as novel contribution we incorporate the user behavior information in the resource allocation process this allows the system to better respond to real time changes and to adapt dynamically to different user needs several algorithms are proposed for solving the task allocation problem while minimizing the communication energy consumption and network contention when the user behavior is taken into consideration we observe more than communication energy savings with negligible energy and run time overhead compared to an arbitrary contiguous task allocation strategy
automatically recognising which html documents on the web contain items of interest for user is non trivial as step toward solving this problem we propose an approach based on information extraction ontologies given html documents tables and forms our document recognition system extracts expected ontological vocabulary keywords and keyword phrases and expected ontological instance data particular values for ontological concepts we then use machine learned rules over this extracted information to determine whether an html document contains items of interest experimental results show that our ontological approach to categorisation works well having achieved measures above for all applications we tried
we present new approach to accelerate collision detection for deformable models our formulation applies to all triangulated models and significantly reduces the number of elementary tests between features of the mesh ie vertices edges and faces we introduce the notion of representative triangles standard geometric triangles augmented with mesh feature information and use this representation to achieve better collision query performance the resulting approach can be combined with bounding volume hierarchies and works well for both inter object and self collision detection we demonstrate the benefit of representative triangles on continuous collision detection for cloth simulation and body collision scenarios we observe up to one order of magnitude reduction in feature pair tests and up to improvement in query time
graph theory has been shown to provide powerful tool for representing and tackling machine learning problems such as clustering semi supervised learning and feature ranking this paper proposes graph based discrete differential operator for detecting and eliminating competence critical instances and class label noise from training set in order to improve classification performance results of extensive experiments on artificial and real life classification problems substantiate the effectiveness of the proposed approach
there has been little research into how end users might be able to communicate advice to machine learning systems if this resource the users themselves could somehow work hand in hand with machine learning systems the accuracy of learning systems could be improved and the users understanding and trust of the system could improve as well we conducted think aloud study to see how willing users were to provide feedback and to understand what kinds of feedback users could give users were shown explanations of machine learning predictions and asked to provide feedback to improve the predictions we found that users had no difficulty providing generous amounts of feedback the kinds of feedback ranged from suggestions for reweighting of features to proposals for new features feature combinations relational features and wholesale changes to the learning algorithm the results show that user feedback has the potential to significantly improve machine learning systems but that learning algorithms need to be extended in several ways to be able to assimilate this feedback
information visualisation has become increasingly important in science engineering and commerce as tool to convey and explore complex sets of information this paper introduces visualisation schema which uses visual attributes as the principal components of visualisation we present new classification of visual attributes according to information accuracy information dimension and spatial requirements and obtain values for the information content and information density of each attribute the classification applies only to the perception of quantitative information and initial results of experiments suggest that it cannot be extended to other visual processing tasks such as preattentive target detection the classification in combination with additional guidelines given in this paper provide the reader with useful tool for creating visualisations which convey complex sets of information more effectively
in this article we examine how clausal resolution can be applied to specific but widely used nonclassical logic namely discrete linear temporal logic thus we first define normal form for temporal formulae and show how arbitrary temporal formulae can be translated into the normal form while preserving satisfiability we then introduce novel resolution rules that can be applied to formulae in this normal form provide range of examples and examine the correctness and complexity of this approach finally we describe related work and future developments concerning this work
bitwise operations are commonly used in low level systems code to access multiple data fields that have been packed into single word program analysis tools that reason about such programs must model the semantics of bitwise operations precisely in order to capture program control and data flow through these operations we present type system for subword data structures that explicitly tracks the flow of bit values in the program and identifies consecutive sections of bits as logical entities manipulated atomically by the programmer our type inference algorithm tags each integer value of the program with bitvector type that identifies the data layout at the subword level these types are used in translation phase to remove bitwise operations from the program thereby allowing verification engines to avoid the expensive low level reasoning required for analyzing bitvector operations we have used software model checker to check properties of translated versions of linux device driver and memory protection system the resulting verification runs could prove many more properties than the naive model checker that did not reason about bitvectors and could prove properties much faster than model checker that did reason about bitvectors we have also applied our bitvector type inference algorithm to generate program documentation for virtual memory subsystem of an os kernel while we have applied the type system mainly for program understanding and verification bitvector types also have applications to better variable ordering heuristics in boolean model checking and memory optimizations in compilers for embedded software
in art grouping plays major role to convey relationships of objects and the organization of scenes it is separated from style which only determines how groups are rendered to achieve visual abstraction of the depicted scene we present an approach to interactively derive grouping information in dynamic scene our solution is simple and general the resulting grouping information can be used as an input to any rendering style we provide an efficient solution based on an extended mean shift algorithm customized by user defined criteria the resulting system is temporally coherent and real time the computational cost is largely determined by the scene’s structure rather than by its geometric complexity
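a minimal one-dimensional mean-shift sketch for intuition: each point repeatedly moves to the average of its neighbours within a bandwidth, and points converging to the same mode fall into the same group; the extended, user-criteria-driven variant of the paper would replace the plain distance test used here

def mean_shift_1d(points, bandwidth=1.0, iters=50):
    modes = list(points)
    for _ in range(iters):
        modes = [sum(q for q in points if abs(q - m) <= bandwidth) /
                 max(1, sum(1 for q in points if abs(q - m) <= bandwidth))
                 for m in modes]
    # group points whose modes coincide (up to a small tolerance)
    groups = {}
    for p, m in zip(points, modes):
        groups.setdefault(round(m, 2), []).append(p)
    return groups

print(mean_shift_1d([1.0, 1.2, 0.9, 5.0, 5.1, 5.3], bandwidth=1.0))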
this paper describes an approach spy to recovering the specification of software component from the observation of its run time behavior it focuses on components that behave as data abstractions components are assumed to be black boxes that do not allow any implementation inspection the inferred description may help understand what the component does when no formal specification is available spy works in two main stages first it builds deterministic finite state machine that models the partial behavior of instances of the data abstraction this is then generalized via graph transformation rules the rules can generate possibly infinite number of behavior models which generalize the description of the data abstraction under an assumption of regularity with respect to the observed behavior the rules can be viewed as likely specification of the data abstraction we illustrate how spy works on relevant examples and we compare it with competing methods
this paper presents concept hierarchy based approach to privacy preserving data collection for data mining called the level model the level model allows data providers to divulge information at any chosen privacy level level on any attribute data collected at high level signifies divulgence at higher conceptual level and thus ensures more privacy providing guarantees prior to release such as satisfying anonymity samarati sweeney can further protect the collected data set from privacy breaches due to linking the released data set with external data sets however the data mining process which involves the integration of various data values can constitute privacy breach if combinations of attributes at certain levels result in the inference of knowledge that exists at lower level this paper describes the level reduction phenomenon and proposes methods to identify and control the occurrence of this privacy breach
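a minimal sketch of per-attribute level-based generalization: a provider picks a privacy level and the stored value is the ancestor that many steps up a concept hierarchy; the hierarchy below is hypothetical

CITY_HIERARCHY = {
    # value: parent (the level increases toward the root)
    "seattle": "washington", "spokane": "washington",
    "washington": "usa", "oregon": "usa", "usa": "north_america",
}

def generalize(value, level):
    """Return the ancestor of `value` that sits `level` steps up the hierarchy."""
    for _ in range(level):
        if value not in CITY_HIERARCHY:
            break                     # already at the root of the hierarchy
        value = CITY_HIERARCHY[value]
    return value

print(generalize("seattle", 0))   # seattle    (no privacy)
print(generalize("seattle", 1))   # washington
print(generalize("seattle", 2))   # usa        (higher level, more privacy)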
replay is an important technique in program analysis allowing to reproduce bugs to track changes and to repeat executions for better understanding of the results unfortunately since re executing concurrent program does not necessarily produce the same ordering of events replay of such programs becomes difficult task the most common approach to replay of concurrent programs is based on analyzing the logical dependencies among concurrent events and requires complete recording of the execution we are trying to replay as well as complete control over the program’s scheduler in realistic settings we usually have only partial recording of the execution and only partial control over the scheduling decisions thus such an analysis is often impossible in this paper we present an approach for replay in the presence of partial information and partial control our approach is based on novel application of the cross entropy method and it does not require any logical analysis of dependencies among concurrent events roughly speaking given partial recording of an execution we define performance function on executions which reaches its maximum on or any other execution that coincides with on the recorded events then the program is executed many times in iterations on each iteration adjusting the probabilistic scheduling decisions so that the performance function is maximized our method is also applicable to debugging of concurrent programs in which the program is changed before it is replayed in order to increase the information from its execution we implemented our replay method on concurrent java programs and we show that it consistently achieves close replay in presence of incomplete information and incomplete control as well as when the program is changed before it is replayed
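a generic cross-entropy iteration sketch over binary scheduling choices (not the paper's replay infrastructure): sample candidate schedules from per-decision probabilities, score agreement with the partial recording, and refit the probabilities to the elite samples; the match function here stands in for the paper's performance function

import random

def cross_entropy(n_decisions, match, iters=30, samples=200, elite_frac=0.1, seed=0):
    random.seed(seed)
    probs = [0.5] * n_decisions
    for _ in range(iters):
        batch = [[1 if random.random() < p else 0 for p in probs]
                 for _ in range(samples)]
        batch.sort(key=match, reverse=True)            # best-matching schedules first
        elite = batch[:max(1, int(elite_frac * samples))]
        probs = [sum(s[i] for s in elite) / len(elite) for i in range(n_decisions)]
    return probs

recorded = [1, 0, 1, 1, 0, 0, 1, 0]                    # toy partial recording
match = lambda s: sum(a == b for a, b in zip(s, recorded))
print([round(p, 2) for p in cross_entropy(len(recorded), match)])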
the work in this paper is motivated by the real world problems such as mining frequent traversal path patterns from very large web logs generalized suffix trees over very large alphabet can be used to solve such problems however traditional algorithms such as the weiner ukkonen and mccreight algorithms are not sufficient assurance of practicality because of large magnitudes of the alphabet and the set of strings in those real world problems two new algorithms are designed for fast construction of generalized suffix trees over very large alphabet and their performance is analyzed in comparison with the well known ukkonen algorithm it is shown that these two algorithms have better performance and can deal with large alphabets and large string sets well
we present texture synthesis scheme based on neighborhood matching with contributions in two areas parallelism and control our scheme defines an infinite deterministic aperiodic texture from which windows can be computed in real time on gpu we attain high quality synthesis using new analysis structure called the gaussian stack together with coordinate upsampling step and subpass correction approach texture variation is achieved by multiresolution jittering of exemplar coordinates combined with the local support of parallel synthesis the jitter enables intuitive user controls including multiscale randomness spatial modulation over both exemplar and output feature drag and drop and periodicity constraints we also introduce synthesis magnification fast method for amplifying coarse synthesis results to higher resolution
traditional behavior based worm detection can’t eliminate the influence of the worm like pp traffic effectively as well as detect slow worms to try to address these problems this paper first presents user habit model to describe the factors which influent the generation of network traffic then design of hpbrwd host packet behavior ranking based worm detection and some key issues about it are introduced this paper has three contributions to the worm detection presenting hierarchical user habit model using normal software and time profile to eliminate the worm like pp traffic and accelerate the detection of worms presenting hpbrwd to effectively detect worms experiments results show that hpbrwd is effective to detect worms
we present the design and implementation of new garbage collection framework that significantly generalizes existing copying collectors the beltway framework exploits and separates object age and incrementality it groups objects in one or more increments on queues called belts collects belts independently and collects increments on belt in first in first out order we show that beltway configurations selected by command line options act and perform the same as semi space generational and older first collectors and encompass all previous copying collectors of which we are aware the increasing reliance on garbage collected languages such as java requires that the collector perform well we show that the generality of beltway enables us to design and implement new collectors that are robust to variations in heap size and improve total execution time over the best generational copying collectors of which we are aware by up to and on average by to for small to moderate heap sizes new garbage collection algorithms are rare and yet we define not just one but new family of collectors that subsumes previous work this generality enables us to explore larger design space and build better collectors
comprehending and modifying software is at the heart of many software engineering tasks and this explains the growing interest that software reverse engineering has gained in the last years broadly speaking reverse engineering is the process of analyzing subject system to create representations of the system at higher level of abstraction this paper briefly presents an overview of the field of reverse engineering reviews main achievements and areas of application and highlights key open research issues for the future
we present shape retrieval methodology based on the theory of spherical harmonics using properties of spherical harmonics scaling and axial flipping invariance is achieved rotation normalization is performed by employing the continuous principal component analysis along with novel approach which applies pca on the face normals of the model the model is decomposed into set of spherical functions which represents not only the intersections of the corresponding surface with rays emanating from the origin but also points in the direction of each ray which are closer to the origin than the furthest intersection point the superior performance of the proposed methodology is demonstrated through comparison against state of the art approaches on standard databases
how to allocate computing and communication resources in way that maximizes the effectiveness of control and signal processing has been an important area of research the characteristic of multi hop real time wireless sensor network raises new challenges first the constraints are more complicated and new solution method is needed second distributed solution is needed to achieve scalability this article presents solutions to both of the new challenges the first solution to the optimal rate allocation is centralized solution that can handle the more general form of constraints as compared with prior research the second solution is distributed version for large sensor networks using pricing scheme it is capable of incremental adjustment when utility functions change this article also presents new sensor device network backbone architecture real time independent channels rich which can easily realize multi hop real time wireless sensor networking
data mining mechanisms have widely been applied in various businesses and manufacturing companies across many industry sectors sharing data or sharing mined rules has become trend among business partnerships as it is perceived to be mutually beneficial way of increasing productivity for all parties involved nevertheless this has also increased the risk of unexpected information leaks when releasing data to conceal restrictive itemsets patterns contained in the source database sanitization process transforms the source database into released database from which the counterpart cannot extract sensitive rules the transformed result also conceals non restrictive information as an unwanted event called side effect or the misses cost the problem of finding an optimal sanitization method which conceals all restrictive itemsets but minimizes the misses cost is np hard to address this challenging problem this study proposes the maximum item conflict first micf algorithm experimental results demonstrate that the proposed method is effective has low sanitization rate and can generally achieve significantly lower misses cost than those achieved by the minfia maxfia iga and algob methods in several real and artificial datasets
networked games can provide groupware developers with important lessons in how to deal with real world networking issues such as latency limited bandwidth and packet loss games have similar demands and characteristics to groupware but unlike the applications studied by academics games have provided production quality real time interaction for many years the techniques used by games have not traditionally been made public but several game networking libraries have recently been released as open source providing the opportunity to learn how games achieve network performance we examined five game libraries to find networking techniques that could benefit groupware this paper presents the concepts most valuable to groupware developers including techniques to deal with limited bandwidth reliability and latency some of the techniques have been previously reported in the networking literature therefore the contribution of this paper is to survey which techniques have been shown to work over several years and then to link these techniques to quality requirements specific to groupware by adopting these techniques groupware designers can dramatically improve network performance on the real world internet
recent trend in interface design for classrooms in developing regions has many students interacting on the same display using mice text entry has emerged as an important problem preventing such mouse based single display groupware systems from offering compelling interactive activities we explore the design space of mouse based text entry and develop techniques with novel characteristics suited to the multiple mouse scenario we evaluated these in phase study over days with students in developing region schools the results show that one technique effectively balanced all of our design dimensions another was most preferred by students and both could benefit from augmentation to support collaborative interaction our results also provide insights into the factors that create an optimal text entry technique for single display groupware systems
search engines need to evaluate queries extremely fast challenging task given the quantities of data being indexed significant proportion of the queries posed to search engines involve phrases in this article we consider how phrase queries can be efficiently supported with low disk overheads our previous research has shown that phrase queries can be rapidly evaluated using nextword indexes but these indexes are twice as large as conventional inverted files alternatively special purpose phrase indexes can be used but it is not feasible to index all phrases we propose combinations of nextword indexes and phrase indexes with inverted files as solution to this problem our experiments show that combined use of partial nextword partial phrase and conventional inverted index allows evaluation of phrase queries in quarter the time required to evaluate such queries with an inverted file alone the additional space overhead is only percent of the size of the inverted file
the concept of privacy preserving has recently been proposed in response to the concerns of preserving personal or sensitive information derived from data mining algorithms for example through data mining sensitive information such as private information or patterns may be inferred from non sensitive information or unclassified data there have been two types of privacy concerning data mining output privacy tries to hide the mining results by minimally altering the data input privacy tries to manipulate the data so that the mining result is not affected or minimally affected for output privacy in hiding association rules current approaches require hidden rules or patterns to be given in advance this selection of rules would require data mining process to be executed first based on the discovered rules and privacy requirements hidden rules or patterns are then selected manually however for some applications we are interested in hiding certain constrained classes of association rules such as collaborative recommendation association rules to hide such rules the pre process of finding these hidden rules can be integrated into the hiding process as long as the recommended items are given in this work we propose two algorithms dcis decrease confidence by increase support and dcds decrease confidence by decrease support to automatically hide collaborative recommendation association rules without pre mining and selection of hidden rules examples illustrating the proposed algorithms are given numerical simulations are performed to show the various effects of the algorithms recommendations of appropriate usage of the proposed algorithms based on the characteristics of databases are reported
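Both hiding strategies named above work on the identity confidence(X -> Y) = supp(X u Y) / supp(X), so confidence can be pushed below a threshold either by increasing supp(X) or by decreasing supp(X u Y). The sketch below computes support and confidence over a toy transaction set and shows the "decrease support" knob; it illustrates the relationship the algorithms manipulate and is not the authors' DCIS/DCDS code, and the transactions are hypothetical.

```python
# Support and confidence of an association rule over a toy transaction set.
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, lhs, rhs):
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical toy data: each transaction is a set of purchased items.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
print(confidence(transactions, {"a"}, {"b"}))  # before hiding: 0.75

# "Decrease confidence by decreasing support": remove the consequent item "b"
# from one supporting transaction so supp({a, b}) drops while supp({a}) stays.
transactions[0] = {"a", "c"}
print(confidence(transactions, {"a"}, {"b"}))  # after hiding: 0.5
```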
in recent years peer to peer pp file sharing systems have evolved to accommodate growing numbers of participating peers in particular new features have changed the properties of the unstructured overlay topologies formed by these peers little is known about the characteristics of these topologies and their dynamics in modern file sharing applications despite their importance this paper presents detailed characterization of pp overlay topologies and their dynamics focusing on the modern gnutella network we present cruiser fast and accurate pp crawler which can capture complete snapshot of the gnutella network of more than one million peers in just few minutes and show how inaccuracy in snapshots can lead to erroneous conclusions such as power law degree distribution leveraging recent overlay snapshots captured with cruiser we characterize the graph related properties of individual overlay snapshots and overlay dynamics across slices of back to back snapshots our results reveal that while the gnutella network has dramatically grown and changed in many ways it still exhibits the clustering and short path lengths of small world network furthermore its overlay topology is highly resilient to random peer departure and even systematic attacks more interestingly overlay dynamics lead to an onion like biased connectivity among peers where each peer is more likely connected to peers with higher uptime therefore long lived peers form stable core that ensures reachability among peers despite overlay dynamics
while soft processor cores provided by fpga vendors offer designers increased flexibility such processors typically incur penalties in performance and energy consumption compared to hard processor core alternatives the recently developed technology of warp processing can help reduce those penalties warp processing is the dynamic and transparent transformation of critical software regions from microprocessor execution to much faster circuit execution on an fpga in this article we describe an implementation of warp processor on xilinx virtex ii pro and spartan fpgas incorporating one or more microblaze soft processor cores we further provide detailed analysis of the energy overhead of dynamically partitioning an application’s kernels to hardware executing within an fpga considering an implementation that periodically partitions the executing application once every minute microblaze based warp processor implemented on spartan fpga achieves average speedups of times and energy reductions of percent compared to the microblaze soft processor core alone providing competitive performance and energy consumption compared to existing hard processor cores
this paper proposes and evaluates single isa heterogeneous multi core architectures as mechanism to reduce processor power dissipation our design incorporates heterogeneous cores representing different points in the power performance design space during an application’s execution system software dynamically chooses the most appropriate core to meet specific performance and power requirements our evaluation of this architecture shows significant energy benefits for an objective function that optimizes for energy efficiency with tight performance threshold for spec benchmarks our results indicate average energy reduction while only sacrificing in performance an objective function that optimizes for energy delay with looser performance bounds achieves on average nearly a factor of three improvement in energy delay product while sacrificing only in performance energy savings are substantially more than chip wide voltage frequency scaling
skip graphs are novel distributed data structure based on skip lists that provide the full functionality of balanced tree in distributed system where elements are stored in separate nodes that may fail at any time they are designed for use in searching peer to peer networks and by providing the ability to perform queries based on key ordering they improve on existing search tools that provide only hash table functionality unlike skip lists or other tree data structures skip graphs are highly resilient tolerating large fraction of failed nodes without losing connectivity in addition constructing skip graph inserting new elements into it searching it and detecting and repairing errors in the data structure introduced by node failures can be done using simple and straightforward algorithms
algorithmic solutions can help reduce energy consumption in computing environs
collaborative filtering has become an established method to measure users similarity and to make predictions about their interests however prediction accuracy comes at the cost of user’s privacy in order to derive accurate similarity measures users are required to share their rating history with each other in this work we propose new measure of similarity which achieves comparable prediction accuracy to the pearson correlation coefficient and that can successfully be estimated without breaking users privacy this novel method works by estimating the number of concordant discordant and tied pairs of ratings between two users with respect to shared random set of ratings in doing so neither the items rated nor the ratings themselves are disclosed thus achieving strictly private collaborative filtering the technique has been evaluated using the recently released netflix prize dataset
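The quantity being estimated above is a Kendall-tau-like similarity built from concordant, discordant, and tied rating pairs. The sketch below computes it directly (non-privately) on commonly rated items, just to show what the private protocol estimates against its shared random set; the users and ratings in the example are hypothetical.

```python
# Kendall-tau-like similarity from concordant/discordant/tied rating pairs.
from itertools import combinations

def concordance_similarity(ratings_u, ratings_v):
    """ratings_u, ratings_v: dicts item -> rating; uses commonly rated items."""
    common = sorted(set(ratings_u) & set(ratings_v))
    concordant = discordant = tied = 0
    for i, j in combinations(common, 2):
        du = ratings_u[i] - ratings_u[j]
        dv = ratings_v[i] - ratings_v[j]
        if du * dv > 0:
            concordant += 1
        elif du * dv < 0:
            discordant += 1
        else:
            tied += 1
    total = concordant + discordant + tied
    return (concordant - discordant) / total if total else 0.0

# Hypothetical example users
u = {"m1": 5, "m2": 3, "m3": 1}
v = {"m1": 4, "m2": 2, "m3": 5}
print(concordance_similarity(u, v))
```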
this article introduces novel representation for three dimensional objects in terms of local affine invariant descriptors of their images and the spatial relationships between the corresponding surface patches geometric constraints associated with different views of the same patches under affine projection are combined with normalized representation of their appearance to guide matching and reconstruction allowing the acquisition of true affine and euclidean models from multiple unregistered images as well as their recognition in photographs taken from arbitrary viewpoints the proposed approach does not require separate segmentation stage and it is applicable to highly cluttered scenes modeling and recognition results are presented
we study an approach to text categorization that combines distributional clustering of words and support vector machine svm classifier this word cluster representation is computed using the recently introduced information bottleneck method which generates compact and efficient representation of documents when combined with the classification power of the svm this method yields high performance in text categorization this novel combination of svm with word cluster representation is compared with svm based categorization using the simpler bag of words bow representation the comparison is performed over three known datasets on one of these datasets the newsgroups the method based on word clusters significantly outperforms the word based representation in terms of categorization accuracy or representation efficiency on the two other sets reuters and webkb the word based representation slightly outperforms the word cluster representation we investigate the potential reasons for this behavior and relate it to structural differences between the datasets
we consider the problem of finding highly correlated pairs in large data set that is given threshold not too small we wish to report all the pairs of items or binary attributes whose pearson correlation coefficients are greater than the threshold correlation analysis is an important step in many statistical and knowledge discovery tasks normally the number of highly correlated pairs is quite small compared to the total number of pairs identifying highly correlated pairs in naive way by computing the correlation coefficients for all the pairs is wasteful with massive data sets where the total number of pairs may exceed the main memory capacity the computational cost of the naive method is prohibitive in their kdd paper hui xiong et al address this problem by proposing the taper algorithm the algorithm goes through the data set in two passes it uses the first pass to generate set of candidate pairs whose correlation coefficients are then computed directly in the second pass the efficiency of the algorithm depends greatly on the selectivity pruning power of its candidate generating stage in this work we adopt the general framework of the taper algorithm but propose different candidate generation method for pair of items taper’s candidate generation method considers only the frequencies supports of individual items our method also considers the frequency support of the pair but does not explicitly count this frequency support we give simple randomized algorithm whose false negative probability is negligible the space and time complexities of generating the candidate set in our algorithm are asymptotically the same as taper’s we conduct experiments on synthesized and real data the results show that our algorithm produces greatly reduced candidate set one that can be several orders of magnitude smaller than that generated by taper because of this our algorithm uses much less memory and can be faster the former is critical for dealing with massive data
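The two-pass pattern described above can be illustrated with the support-based upper bound on the phi (Pearson) correlation of two binary items, which is the kind of bound TAPER's candidate generation relies on: for supports sa >= sb, phi(A,B) <= sqrt(sb/sa) * sqrt((1-sa)/(1-sb)). The randomized candidate generation proposed above is different and is not reproduced here; the function names below are illustrative.

```python
# Two-pass correlated-pair search: prune by a support-based upper bound on phi,
# then compute exact correlations only for the surviving candidate pairs.
import math

def phi_upper_bound(sa, sb):
    sa, sb = max(sa, sb), min(sa, sb)
    return math.sqrt(sb / sa) * math.sqrt((1 - sa) / (1 - sb))

def exact_phi(sa, sb, sab):
    return (sab - sa * sb) / math.sqrt(sa * (1 - sa) * sb * (1 - sb))

def correlated_pairs(supports, pair_support, theta):
    """supports: item -> support; pair_support(a, b) -> support of {a, b}."""
    items = sorted(supports)
    # pass 1: candidate generation using only individual supports
    candidates = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]
                  if phi_upper_bound(supports[a], supports[b]) >= theta]
    # pass 2: exact correlation only for surviving candidates
    return [(a, b) for a, b in candidates
            if exact_phi(supports[a], supports[b], pair_support(a, b)) >= theta]
```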
this study uses neural field model to investigate computational aspects of population coding and decoding when the stimulus is single variable general prototype model for the encoding process is proposed in which neural responses are correlated with strength specified by gaussian function of their difference in preferred stimuli based on the model we study the effect of correlation on the fisher information compare the performances of three decoding methods that differ in the amount of encoding information being used and investigate the implementation of the three methods by using recurrent network this study not only rediscovers main results in the existing literature in unified way but also reveals important new features especially when the neural correlation is strong as the neural correlation of firing becomes larger the fisher information decreases drastically we confirm that as the width of correlation increases the fisher information saturates and no longer increases in proportion to the number of neurons however we prove that as the width increases further wider than times the effective width of the tuning function the fisher information increases again and it increases without limit in proportion to the number of neurons furthermore we clarify the asymptotic efficiency of the maximum likelihood inference mli type of decoding methods for correlated neural signals it shows that when the correlation covers nonlocal range of population excepting the uniform correlation and when the noise is extremely small the mli type of method whose decoding error satisfies the cauchy type distribution is not asymptotically efficient this implies that the variance is no longer adequate to measure decoding accuracy
recently there has been growing interest in gossip based protocols that employ randomized communication to ensure robust information dissemination in this paper we present novel gossip based scheme using which all the nodes in an node overlay network can compute the common aggregates of min max sum average and rank of their values using log log messages within log log log rounds of communication to the best of our knowledge ours is the first result that shows how to compute these aggregates with high probability using only log log messages in contrast the best known gossip based algorithm for computing these aggregates requires nlog messages and log rounds thus our algorithm allows system designers to trade off small increase in round complexity with significant reduction in message complexity this can lead to dramatically lower network congestion and longer node lifetimes in wireless and sensor networks where channel bandwidth and battery life are severely constrained
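As a point of reference for the gossip setting described above, here is a runnable sketch of the classic push-sum protocol for computing the average; the algorithm proposed above achieves a lower message complexity, so this is only a baseline illustration and not that algorithm.

```python
# Push-sum gossip: every node keeps a (sum, weight) pair, halves it each round,
# and pushes half to a uniformly random node; sum/weight converges to the mean.
import random

def push_sum_average(values, rounds=30, seed=0):
    rng = random.Random(seed)
    n = len(values)
    s = list(values)          # per-node running sums
    w = [1.0] * n             # per-node running weights
    for _ in range(rounds):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            target = rng.randrange(n)               # random gossip partner
            for j, frac in ((i, 0.5), (target, 0.5)):
                new_s[j] += frac * s[i]
                new_w[j] += frac * w[i]
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]        # each node's estimate of the mean

print(push_sum_average([3, 7, 10, 20])[0])  # converges toward 10.0
```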
partitioning an application among software running on microprocessor and hardware co processors in on chip configurable logic has been shown to improve performance and energy consumption in embedded systems meanwhile dynamic software optimization methods have shown the usefulness and feasibility of runtime program optimization but those optimizations do not achieve as much as partitioning we introduce first approach to dynamic hardware software partitioning we describe our system architecture and initial on chip tools including profiler decompiler synthesis and placement and routing tools for simplified configurable logic fabric able to perform dynamic partitioning of real benchmarks we show speedups averaging for five benchmarks taken from powerstone netbench and our own benchmarks
developing desirable framework for handling inconsistencies in software requirements specifications is challenging problem it has been widely recognized that the relative priority of requirements can help developers to make some necessary trade off decisions for resolving conflicts however for most distributed development such as viewpoints based approaches different stakeholders may assign different levels of priority to the same shared requirements statement from their own perspectives the disagreement in the local levels of priority assigned to the same shared requirements statement often puts developers into dilemma during the inconsistency handling process the main contribution of this paper is to present prioritized merging based framework for handling inconsistency in distributed software requirements specifications given set of distributed inconsistent requirements collections with the local prioritization we first construct requirements specification with prioritization from an overall perspective we provide two approaches to constructing requirements specification with the global prioritization including merging based construction and priority vector based construction following this we derive proposals for handling inconsistencies from the globally prioritized requirements specification in terms of prioritized merging moreover from the overall perspective these proposals may be viewed as the most appropriate to modifying the given inconsistent requirements specification in the sense of the ordering relation over all the consistent subsets of the requirements specification finally we consider applying negotiation based techniques to viewpoints so as to identify an acceptable common proposal from these proposals
even the best laid plans can fail and robot plans executed in real world domains tend to do so often the ability of robot to reliably monitor the execution of plans and detect failures is essential to its performance and its autonomy in this paper we propose technique to increase the reliability of monitoring symbolic robot plans we use semantic domain knowledge to derive implicit expectations of the execution of actions in the plan and then match these expectations against observations we present two realizations of this approach crisp one which assumes deterministic actions and reliable sensing and uses standard knowledge representation system loom and probabilistic one which takes into account uncertainty in action effects in sensing and in world states we perform an extensive validation of these realizations through experiments performed both in simulation and on real robots
database queries are often exploratory and users often find their queries return too many answers many of them irrelevant existing work either categorizes or ranks the results to help users locate interesting results the success of both approaches depends on the utilization of user preferences however most existing work assumes that all users have the same user preferences but in real life different users often have different preferences this paper proposes two step solution to address the diversity issue of user preferences for the categorization approach the proposed solution does not require explicit user involvement the first step analyzes query history of all users in the system offline and generates set of clusters over the data each corresponding to one type of user preferences when user asks query the second step presents to the user navigational tree over clusters generated in the first step such that the user can easily select the subset of clusters matching his needs the user then can browse rank or categorize the results in selected clusters the navigational tree is automatically constructed using cost based algorithm which considers the cost of visiting both intermediate nodes and leaf nodes in the tree an empirical study demonstrates the benefits of our approach
database applications often need to evaluate queries containing quantifiers or disjunctions eg for handling general integrity constraints existing efficient methods for processing quantifiers depart from the relational model as they rely on non algebraic procedures looking at quantified query evaluation from new angle we propose an approach to process quantifiers that makes use of relational algebra operators only our approach performs in two phases the first phase normalizes the queries producing canonical form this form permits improving the translation into relational algebra performed during the second phase the improved translation relies on new operator the complement join that generalizes the set difference on algebraic expressions of universal quantifiers that avoid the expensive division operator in many cases and on special processing of disjunctions by means of constrained outer joins our method achieves an efficiency at least comparable with that of previous proposals better in most cases furthermore it is considerably simpler to implement as it completely relies on relational data structures and operators
peer to peer pp model being widely adopted in today’s internet computing suffers from the problem of topology mismatch between the overlay networks and the underlying physical network traditional topology optimization techniques identify physically closer nodes to connect as overlay neighbors but could significantly shrink the search scope recent efforts have been made to address the mismatch problem without sacrificing search scope but they either need time synchronization among peers or have low convergence speed in this paper we propose scalable bipartite overlay sbo scheme to optimize the overlay topology by identifying and replacing the mismatched connections in sbo we employ an efficient strategy for distributing optimization tasks in peers with different colors we conducted comprehensive simulations to evaluate this design the results show that sbo achieves approximately reduction in traffic cost and about reduction in query response time our comparisons with previous approaches to address the topology mismatch problem have shown that sbo can achieve fast convergence speed without the need of time synchronization among peers
bytecodes and virtual machines vm are prevailing programming facilities in contemporary software industry due to their ease of portability across various platforms thus it is critical to improve their trustworthiness this paper addresses the interesting and challenging problem of certifying bytecode programs over certified vms our solutions to this problem include logical systems cbp for bytecode machine is built to modularly certify bytecode programs with abstract control stacks and unstructured control flows and the corresponding stack based virtual machine is implemented and certified simulation relation between bytecode program and vm implementation is developed and proved to achieve the objective that once some safety property of bytecode program is certified in cbp system the property will be preserved on any certified vm we prove the soundness and demonstrate its power by certifying some example programs with the coq proof assistant this work not only provides solid theoretical foundation for reasoning about bytecode programs but also gains insight into building proof preserving compilers
multiple clock domain mcd processor addresses the challenges of clock distribution and power dissipation by dividing chip into several coarse grained clock domains allowing frequency and voltage to be reduced in domains that are not currently on the application’s critical path given reconfiguration mechanism capable of choosing appropriate times and values for voltage frequency scaling an mcd processor has the potential to achieve significant energy savings with low performance degradation early work on mcd processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications or by employing an oracle driven by off line analysis of identical prior program runs subsequent work developed hardware based on line mechanism that averages of the energy delay improvement achieved via off line analysis in this paper we consider the automatic insertion of reconfiguration instructions into applications using profile driven binary rewriting profile based reconfiguration introduces the need for training runs prior to production use of given application but avoids the hardware complexity of on line reconfiguration it also has the potential to yield significantly greater energy savings experimental results training on small data sets and then running on larger alternative data sets indicate that the profile driven approach is more stable than hardware based reconfiguration and yields virtually all of the energy delay improvement achieved via off line analysis
an accurate and rapid method is required to retrieve the overwhelming majority of digital images to date image retrieval methods include content based retrieval and keyword based retrieval the former utilizing visual features such as color and brightness and the latter utilizing keywords that describe the image however the effectiveness of these methods in providing the exact images the user wants has been under scrutiny hence many researchers have been working on relevance feedback process in which responses from the user are given as feedback during the retrieval session in order to define user’s need and provide an improved result methods that employ relevance feedback however do have drawbacks because several pieces of feedback are necessary to produce an appropriate result and the feedback information cannot be reused in this paper novel retrieval model is proposed which annotates an image with keywords and modifies the confidence level of the keywords in response to the user’s feedback in the proposed model not only the images that have been given feedback but also other images with visual features similar to the features used to distinguish the positive images are subjected to confidence modification this allows for modification of large number of images with relatively little feedback ultimately leading to faster and more accurate retrieval results an experiment was performed to verify the effectiveness of the proposed model and the result demonstrated rapid increase in recall and precision using the same amount of feedback
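A minimal sketch of the feedback-driven confidence update described above: keyword confidences of judged images are raised or lowered, and the adjustment is propagated to unjudged images in proportion to their visual similarity to the judged ones. The update rule, the parameter alpha, and the similarity function are illustrative assumptions rather than the authors' exact model.

```python
# Propagate relevance feedback for one keyword from judged to unjudged images,
# attenuated by visual similarity, and clamp confidences to [0, 1].
def update_confidences(annotations, keyword, feedback, similarity, alpha=0.2):
    """
    annotations: dict image_id -> dict keyword -> confidence in [0, 1]
    feedback:    dict image_id -> +1 (relevant) or -1 (not relevant)
    similarity:  function (image_a, image_b) -> visual similarity in [0, 1]
    """
    for img, conf in annotations.items():
        if keyword not in conf:
            continue
        if img in feedback:
            delta = alpha * feedback[img]
        else:
            # unjudged image: inherit feedback weighted by visual similarity
            delta = alpha * sum(similarity(img, j) * f for j, f in feedback.items())
        conf[keyword] = min(1.0, max(0.0, conf[keyword] + delta))
```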
we present framework for designing efficient distributed data structures for multi dimensional data our structures which we call skip webs extend and improve previous randomized distributed data structures including skipnets and skip graphs our framework applies to general class of data querying scenarios which include linear one dimensional data such as sorted sets as well as multi dimensional data such as dimensional octrees and digital tries of character strings defined over fixed alphabet we show how to perform query over such set of items spread among hosts using log log log messages for one dimensional data or log messages for fixed dimensional data while using only log space per host we also show how to make such structures dynamic so as to allow for insertions and deletions in log messages for quadtrees octrees and digital tries and log log log messages for one dimensional data finally we show how to apply blocking strategy to skip webs to further improve message complexity for one dimensional data when hosts can store more data
by using elliptic curve cryptography ecc it has been recently shown that public key cryptography pkc is indeed feasible on resource constrained nodes this feasibility however does not necessarily mean attractiveness as the obtained results are still not satisfactory enough in this paper we present results on implementing ecc as well as the related emerging field of pairing based cryptography pbc on two of the most popular sensor nodes by doing that we show that pkc is not only viable but in fact attractive for wsns as far as we know pairing computations presented in this paper are the most efficient results on the mica bit mhz atmegal and tmote sky bit mhz msp nodes
animation techniques for controlling passive simulation are commonly based on an optimization paradigm the user provides goals priori and sophisticated numerical methods minimize cost function that represents these goals unfortunately for multibody systems with discontinuous contact events these optimization problems can be highly nontrivial to solve and many hour offline optimizations unintuitive parameters and convergence failures can frustrate end users and limit usage on the other hand users are quite adaptable and systems which provide interactive feedback via an intuitive interface can leverage the user’s own abilities to quickly produce interesting animations however the online computation necessary for interactivity limits scene complexity in practice we introduce many worlds browsing method which circumvents these limits by exploiting the speed of multibody simulators to compute numerous example simulations in parallel offline and online and allow the user to browse and modify them interactively we demonstrate intuitive interfaces through which the user can select among the examples and interactively adjust those parts of the scene that do not match his requirements we show that using combination of our techniques unusual and interesting results can be generated for moderately sized scenes with under an hour of user time scalability is demonstrated by sampling much larger scenes using modest offline computations
current capacity planning practices based on heavy over provisioning of power infrastructure hurt both the operational costs of data centers and the computational work they can support we explore combination of statistical multiplexing techniques to improve the utilization of the power hierarchy within data center at the highest level of the power hierarchy we employ controlled underprovisioning and over booking of power needs of hosted workloads at the lower levels we introduce the novel notion of soft fuses to flexibly distribute provisioned power among hosted workloads based on their needs our techniques are built upon measurement driven profiling and prediction framework to characterize key statistical properties of the power needs of hosted workloads and their aggregates we characterize the gains in terms of the amount of computational work cpu cycles per provisioned unit of power computation per provisioned watt cpw our technique is able to double the cpw offered by power distribution unit pdu running the commerce benchmark tpc compared to conventional provisioning practices over booking the pdu by based on tails of power profiles yields further improvement of reactive techniques implemented on our xen vmm based servers dynamically modulate cpu dvfs states to ensure power draw below the limits imposed by soft fuses finally information captured in our profiles also provides ways of controlling application performance degradation despite overbooking the th percentile of tpc session response time only grew from sec to sec degradation of
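A small sketch of the hierarchical soft-fuse idea described above: every level of the power hierarchy carries a flexible budget, the child budgets may deliberately over-book the parent, and a placement is checked by aggregating loads bottom-up. The node names, numbers, and structure below are hypothetical.

```python
# Hierarchical power budgets ("soft fuses"): report every node whose aggregate
# load exceeds its fuse under the current workload placement.
class PowerNode:
    def __init__(self, name, soft_fuse_watts, children=None):
        self.name = name
        self.soft_fuse = soft_fuse_watts
        self.children = children or []   # empty for leaf PDUs/servers
        self.local_load = 0.0            # watts assigned directly to this node

    def aggregate_load(self):
        return self.local_load + sum(c.aggregate_load() for c in self.children)

    def violations(self):
        bad = [self] if self.aggregate_load() > self.soft_fuse else []
        for c in self.children:
            bad.extend(c.violations())
        return bad

# Hypothetical, deliberately over-booked hierarchy: child fuses sum to more
# than the parent fuse, relying on statistical multiplexing of power needs.
pdu_a = PowerNode("pdu-a", soft_fuse_watts=600)
pdu_b = PowerNode("pdu-b", soft_fuse_watts=600)
root = PowerNode("datacenter-feed", soft_fuse_watts=1000, children=[pdu_a, pdu_b])
pdu_a.local_load, pdu_b.local_load = 550, 500
print([n.name for n in root.violations()])  # the root feed is over its fuse
```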
resource aware random key predistribution schemes have been proposed to overcome the limitations of energy constrained wireless sensor networks wsns in most of these schemes each sensor node is loaded with key ring neighbouring nodes are considered to be connected through secure link if they share common key nodes which are not directly connected establish secure path which is then used to negotiate symmetric key however since different symmetric keys are used for different links along the secure path each intermediate node must first decrypt the message received from the upstream node notice that during this process the negotiated key is revealed to each node along the secure path the objective of this paper is to address this shortcoming to this end we propose an end to end pairwise key establishment scheme which uses properly selected set of node disjoint paths to securely negotiate symmetric keys between sensor nodes we show through analysis and simulation that our scheme is highly secure against node captures in wsns
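One common way to realize the end-to-end idea described above is secret sharing across node-disjoint paths: the key is split into random XOR shares, one per path, so no single intermediate node ever sees the whole key. The sketch below shows only that split-and-reconstruct step; it is an illustration of the principle, not the authors' exact protocol.

```python
# Split a key into XOR shares (one per node-disjoint path); the destination,
# which receives all shares, XORs them back together to recover the key.
import os

def xor_all(chunks):
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out

def split_key(key: bytes, num_paths: int):
    shares = [os.urandom(len(key)) for _ in range(num_paths - 1)]
    last = bytes(a ^ b for a, b in zip(key, xor_all(shares)))  # key = XOR of all shares
    return shares + [last]

key = os.urandom(16)
shares = split_key(key, num_paths=3)   # one share per node-disjoint secure path
assert xor_all(shares) == key          # destination reconstructs the key
```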
silicon technology advances have made it possible to pack millions of transistors switching at high clock speeds on single chip while these advances bring unprecedented performance to electronic products they also pose difficult power energy consumption problems for example large number of transistors in dense on chip cache memories consume significant static leakage power even if the cache is not used by the current computation while previous compiler research studied code and data restructuring for improving data cache performance to our knowledge there exists no compiler based study that targets data cache leakage power consumption in this paper we present code restructuring techniques for array based and pointer intensive applications for reducing data cache leakage energy consumption the idea is to let the compiler analyze the application code and insert instructions that turn off cache lines that keep variables not used by the current computation this turning off does not destroy contents of cache line and waking up the cache line when it is accessed later does not incur much overhead due to inherent data locality in applications we find that at given time only small portion of the data cache needs to be active the remaining part can be placed into leakage saving mode state ie they can be turned off our experimental results indicate that the proposed compiler based strategy reduces the cache energy consumption significantly we also demonstrate how different compiler optimizations can increase the effectiveness of our strategy
modern source control systems such as subversion preserve change sets of files as atomic commits however the specific ordering information in which files were changed is typically not found in these source code repositories in this paper set of heuristics for grouping change sets ie log entries found in source code repositories is presented given such groups of change sets sequences of files that frequently change together are uncovered this approach not only gives the unordered sets of files but supplements them with partial temporal ordering information the technique is demonstrated on subset of kde source code repository the results show that the approach is able to find sequences of changed files
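One plausible grouping heuristic of the kind described above is to merge log entries by the same author whose timestamps fall within a small window, keeping the order in which the files appear. The window size and grouping key below are assumptions for illustration, not necessarily the heuristics used in the work above.

```python
# Group commit log entries into ordered change sets by (author, time window).
from datetime import datetime, timedelta

def group_change_sets(log_entries, window=timedelta(minutes=5)):
    """log_entries: list of (timestamp: datetime, author: str, path: str),
    assumed sorted by timestamp. Returns a list of ordered file sequences."""
    groups = []
    current, last_time, last_author = [], None, None
    for ts, author, path in log_entries:
        same_group = (last_author == author and last_time is not None
                      and ts - last_time <= window)
        if not same_group and current:
            groups.append(current)
            current = []
        current.append(path)
        last_time, last_author = ts, author
    if current:
        groups.append(current)
    return groups

# Hypothetical log entries
entries = [
    (datetime(2008, 1, 1, 10, 0), "alice", "kdecore/job.cpp"),
    (datetime(2008, 1, 1, 10, 2), "alice", "kdecore/job.h"),
    (datetime(2008, 1, 1, 11, 0), "bob",   "kio/slave.cpp"),
]
print(group_change_sets(entries))  # [['kdecore/job.cpp', 'kdecore/job.h'], ['kio/slave.cpp']]
```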
it is crucial to maximize targeting efficiency and customer satisfaction in personalized marketing state of the art techniques for targeting focus on the optimization of individual campaigns our motivation is the belief that the effectiveness of campaign with respect to customer is affected by how many preceding campaigns have been recently delivered to the customer we raise the multiple recommendation problem which occurs when performing several personalized campaigns simultaneously we formulate the multicampaign assignment problem to solve this issue and propose algorithms for the problem the algorithms include dynamic programming and efficient heuristic methods we verify by experiments the effectiveness of the problem formulation and the proposed algorithms
given an incorrect value produced during failed program run eg wrong output value or value that causes the program to crash the backward dynamic slice of the value very frequently captures the faulty code responsible for producing the incorrect value although the dynamic slice often contains only small percentage of the statements executed during the failed program run the dynamic slice can still be large and thus considerable effort may be required by the programmer to locate the faulty code in this paper we develop strategy for pruning the dynamic slice to identify subset of statements in the dynamic slice that are likely responsible for producing the incorrect value we observe that some of the statements used in computing the incorrect value may also have been involved in computing correct values eg value produced by statement in the dynamic slice of the incorrect value may also have been used in computing correct output value prior to the incorrect value for each such executed statement in the dynamic slice using the value profiles of the executed statements we compute confidence value ranging from to higher confidence value corresponds to greater likelihood that the execution of the statement produced correct value given failed run involving execution of single error we demonstrate that the pruning of dynamic slice by excluding only the statements with the confidence value of is highly effective in reducing the size of the dynamic slice while retaining the faulty code in the slice our experiments show that the number of distinct statements in pruned dynamic slice is to times less than the full dynamic slice confidence values also prioritize the statements in the dynamic slice according to the likelihood of them being faulty we show that examining the statements in the order of increasing confidence values is an effective strategy for reducing the effort of fault location
dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding test suite evaluation and improvement and specification generation to date however dynamic inference has only been used effectively on small programs under controlled conditions in this paper we identify reasons why scaling dynamic inference techniques has proven difficult and introduce solutions that enable dynamic inference technique to scale to large programs and work effectively with the imperfect traces typically available in industrial scenarios we describe our approximate inference algorithm present and evaluate heuristics for winnowing the large number of inferred properties to manageable set of interesting properties and report on experiments using inferred properties we evaluate our techniques on jboss and the windows kernel our tool is able to infer many of the properties checked by the static driver verifier and leads us to discover previously unknown bug in windows
the last fifteen years have seen vast proliferation of middleboxes to solve all manner of persistent limitations in the internet protocol suite examples include firewalls nats load balancers traffic shapers deep packet intrusion detection virtual private networks network monitors transparent web caches content delivery networks and the list goes on and on however most smaller networks in homes small businesses and the developing world are left without this level of support further the management burden and limitations of middleboxes are apparent even in enterprise networks we argue for shift from using proprietary middlebox hardware as the dominant tool for managing networks toward using open software running on end hosts we show that functionality that seemingly must be in the network such as nats and traffic prioritization can be more cheaply flexibly and securely provided by distributed software running on end hosts working in concert with vastly simplified physical network hardware
set valued ordered information systems can be classified into two categories disjunctive and conjunctive systems through introducing two new dominance relations to set valued information systems we first introduce the conjunctive disjunctive set valued ordered information systems and develop an approach to queuing problems for objects in presence of multiple attributes and criteria then we present dominance based rough set approach for these two types of set valued ordered information systems which is mainly based on substitution of the indiscernibility relation by dominance relation through the lower and upper approximations of decision certain and possible decision rules can be extracted from so called set valued ordered decision table finally we present attribute reduction also called criteria reduction in ordered information systems approaches to these two types of ordered information systems and ordered decision tables which can be used to simplify set valued ordered information system and find decision rules directly from set valued ordered decision table these criteria reduction approaches can eliminate those criteria that are not essential from the viewpoint of the ordering of objects or decision rules
nowadays embedded systems are growing at an impressive rate and provide more and more sophisticated applications characterized by having complex array index manipulation and large number of data accesses those applications require high performance specific computation that general purpose processors can not deliver at reasonable energy consumption very long instruction word architectures seem good solution providing enough computational performance at low power with the required programmability to speed up the time to market those architectures rely on compiler effort to exploit the available instruction and data parallelism to keep the data path busy all the time with the density of transistors doubling each months more and more sophisticated architectures with high number of computational resources running in parallel are emerging with this increasing parallel computation the access to data is becoming the main bottleneck that limits the available parallelism to alleviate this problem in current embedded architectures special unit works in parallel with the main computing elements to ensure efficient feed and storage of the data the address generator unit which comes in many flavors future architectures will have to deal with enormous memory bandwidth in distributed memories and the development of address generators units will be crucial for effective next generation of embedded processors where global trade offs between reaction time bandwidth energy and area must be achieved this paper provides survey of methods and techniques that optimize the address generation process for embedded systems explaining current research trends and needs for future
students studying topics in cyber security benefit from working with realistic training labs that test their knowledge of network security cost space time and reproducibility are major factors that prevent instructors from building realistic networks for their students this paper explores the ways that existing virtualization technologies could be packaged to provide more accessible comprehensive and realistic training and education environment the paper focuses on ways to leverage technologies such as operating system virtualization and other virtualization techniques to recreate an entire network environment consisting of dozens of nodes on moderately equipped hardware
in this paper we consider the problem of web page usage prediction in web site by modeling users navigation history with weighted suffix trees this user’s navigation prediction can be exploited either in an on line recommendation system in website or in web page cache system the method proposed has the advantage that it demands constant amount of computational effort per user action and consumes relatively small amount of extra memory space these features make the method ideal for an on line working environment finally we have performed an evaluation of the proposed scheme with experiments on various website logfiles and we have found that its prediction quality is fairly good in many cases outperforming existing solutions
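A compact way to see the prediction step described above: keep counts of the next page for every bounded-length suffix of the navigation history and predict with the longest matching suffix. A plain dictionary stands in here for the weighted suffix tree of the actual method, and the pages and sessions are hypothetical.

```python
# Suffix-based next-page prediction: train per-suffix next-page counters and
# predict from the longest suffix of the current history that has been seen.
from collections import defaultdict, Counter

class SuffixPredictor:
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(Counter)   # suffix tuple -> Counter of next pages

    def train(self, session):
        for i in range(1, len(session)):
            for k in range(1, self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[tuple(session[i - k:i])][session[i]] += 1

    def predict(self, history):
        for k in range(self.max_order, 0, -1):       # longest matching suffix first
            suffix = tuple(history[-k:])
            if len(suffix) == k and suffix in self.counts:
                return self.counts[suffix].most_common(1)[0][0]
        return None

p = SuffixPredictor()
p.train(["home", "products", "cart", "checkout"])
p.train(["home", "products", "specs"])
print(p.predict(["home", "products"]))  # a most frequent next page for this suffix
```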
current systems on chip soc execute applications that demand extensive parallel processing networks on chip noc provide structured way of realizing interconnections on silicon and obviate the limitations of bus based solution nocs can have regular or ad hoc topologies and functional validation is essential to assess their correctness and performance in this paper we present flexible emulation environment implemented on an fpga that is suitable to explore evaluate and compare wide range of noc solutions with very limited effort our experimental results show speed up of four orders of magnitude with respect to cycle accurate hdl simulation while retaining cycle accuracy with our emulation framework designers can explore and optimize various range of solutions as well as characterize quickly performance figures
approaches for indexing proteins and for fast and scalable searching for structures similar to query structure have important applications such as protein structure and function prediction protein classification and drug discovery in this paper we develop new method for extracting local structural or geometric features from protein structures these feature vectors are in turn converted into set of symbols which are then indexed using suffix tree for given query the suffix tree index can be used effectively to retrieve the maximal matches which are then chained to obtain the local alignments finally similar proteins are retrieved by their alignment score against the query our results show classification accuracy up to and at the topology and class level according to the cath classification these results outperform the best previous methods we also show that psist is highly scalable due to the external suffix tree indexing approach it uses it is able to index about domains from scop in under an hour
we introduce rich language of descriptions for semistructured tree like data and we explain how such descriptions relate to the data they describe various query languages and data schemas can be based on such descriptions
internet routing is mostly based on static information its dynamicity is limited to reacting to changes in topology adaptive performance based routing decisions would improve not only the performance of the internet but also its security and availability however previous approaches for making internet routing adaptive based on optimizing network wide objectives are not suited for an environment in which autonomous and possibly malicious entities interact in this paper we propose different framework for adaptive routing decisions based on regret minimizing online learning algorithms these algorithms as applied to routing are appealing because adopters can independently improve their own performance while being robust to adversarial behavior however in contrast to approaches based on optimization theory that provide guarantees from the outset about network wide behavior the network wide behavior if online learning algorithms were to interact with each other is less understood in this paper we study this interaction in realistic internet environment and find that the outcome is stable state and that the optimality gap with respect to the network wide optimum is small our findings suggest that online learning may be suitable framework for adaptive routing decisions in the internet
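To make the learning framework above concrete, here is a generic multiplicative-weights (Hedge-style) learner applied to choosing among a few candidate routes from observed per-round losses. It assumes full-information loss feedback and is a standard regret-minimizing learner used for illustration, not the specific algorithms or evaluation of the work above; the path names and losses are hypothetical.

```python
# Multiplicative-weights route selection: sample a path in proportion to its
# weight, then shrink every path's weight according to its observed loss.
import random

def choose_and_update(weights, losses, eta=0.1, rng=random.Random(0)):
    """weights: dict path -> weight; losses: dict path -> observed loss in [0, 1]."""
    paths = list(weights)
    chosen = rng.choices(paths, weights=[weights[p] for p in paths])[0]
    for p in paths:
        weights[p] *= (1 - eta) ** losses[p]   # Hedge-style update
    return chosen

weights = {"path-A": 1.0, "path-B": 1.0, "path-C": 1.0}
for _ in range(50):
    losses = {"path-A": 0.8, "path-B": 0.2, "path-C": 0.6}   # hypothetical latencies
    choose_and_update(weights, losses)
print(max(weights, key=weights.get))  # weight concentrates on low-latency path-B
```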
real time embedded systems are typically constrained in terms of three system performance criteria space time and energy the performance requirements are directly translated into constraints imposed on the system’s resources such as code size execution time and energy consumption these resource constraints often interact or even conflict with each other in complex manner making it difficult for system developer to apply well defined design methodology in developing real time embedded system motivated by this observation we propose design framework that can flexibly balance the tradeoff involving the system’s code size execution time and energy consumption given system specification and an optimization criteria the proposed technique generates set of design parameters in such way that system cost function is minimized while the given resource constraints are satisfied specifically the technique derives code generation decision for each task so that specific version of code is selected among number of different ones that have distinct characteristics in terms of code size and execution time in addition the design framework determines the voltage frequency setting for variable voltage processor whose supply voltage can be adjusted at runtime in order to minimize the energy consumption while execution performance is degraded accordingly the proposed technique formulates this design process as constrained optimization problem we show that this optimization problem is np hard and then provide heuristic solution to it we show that these seemingly conflicting design goals can be pursued by using simple optimization algorithm that works with single optimization criteria moreover the optimization is driven by an abstract system specification given by the system developer so that the system development process can be automated the results from our simulation show that the proposed algorithm finds solution that is close to the optimal one with the average error smaller than percent
the goal of the research described here is to develop multistrategy classifier system that can be used for document categorization the system automatically discovers classification patterns by applying several empirical learning methods to different representations for preclassified documents belonging to an imbalanced sample the learners work in parallel manner where each learner carries out its own feature selection based on evolutionary techniques and then obtains classification model in classifying documents the system combines the predictions of the learners by applying evolutionary techniques as well the system relies on modular flexible architecture that makes no assumptions about the design of learners or the number of learners available and guarantees the independence of the thematic domain
we show that for several natural classes of structured matrices including symmetric circulant hankel and toeplitz matrices approximating the permanent modulo prime is as hard as computing its exact value results of this kind are well known for arbitrary matrices however the techniques used do not seem to apply to structured matrices our approach is based on recent advances in the hidden number problem introduced by boneh and venkatesan in combined with some bounds of exponential sums motivated by the waring problem in finite fields
nowadays it is widely accepted that the data warehouse design task should be largely automated furthermore the data warehouse conceptual schema must be structured according to the multidimensional model and as consequence the most common way to automatically look for subjects and dimensions of analysis is by discovering functional dependencies as dimensions functionally depend on the fact over the data sources most advanced methods for automating the design of the data warehouse carry out this process from relational oltp systems assuming that rdbms is the most common kind of data source we may find and taking as starting point relational schema in contrast in our approach we propose to rely instead on conceptual representation of the domain of interest formalized through domain ontology expressed in the dl lite description logic we propose an algorithm to discover functional dependencies from the domain ontology that exploits the inference capabilities of dl lite thus fully taking into account the semantics of the domain we also provide an evaluation of our approach in real world scenario
while set associative caches incur fewer misses than direct mapped caches they typically have slower hit times and higher power consumption when multiple tag and data banks are probed in parallel this paper presents the location cache structure which significantly reduces the power consumption for large set associative caches we propose to use small cache called location cache to store the location of future cache references if there is hit in the location cache the supported cache is accessed as direct mapped cache otherwise the supported cache is referenced as conventional set associative cache the worst case access latency of the location cache system is the same as that of conventional cache the location cache is virtually indexed so that operations on it can be performed in parallel with the tlb address translation these advantages make it ideal for cache systems where traditional way prediction strategies perform poorly we used the cacti cache model to evaluate the power consumption and access latency of proposed cache architecture simplescalar cpu simulator was used to produce final results it is shown that the proposed location cache architecture is power efficient in the simulated cache configurations up to of cache accessing energy and of average cache access latency can be reduced
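A behavioral sketch of the access path described above: when the location (way) of the referenced line is already known, only one way of the large cache is probed; otherwise all ways are probed as in a conventional set-associative access. The geometry, the trivial fill policy, and the unbounded location table are toy assumptions; the sketch models probe counts as a rough energy proxy, not timing or the actual design.

```python
# Toy model of a set-associative cache fronted by a "location" table that
# records which way a block lives in, so known blocks need a single-way probe.
class LocationCachedL2:
    def __init__(self, num_sets=256, ways=8):
        self.num_sets, self.ways = num_sets, ways
        self.tags = [[None] * ways for _ in range(num_sets)]  # the big L2 tag array
        self.location = {}        # block address -> way where that block resides

    def access(self, addr):
        block = addr // 64
        s = block % self.num_sets
        tag = block // self.num_sets
        way = self.location.get(block)            # location lookup for this block
        if way is not None and self.tags[s][way] == tag:
            probed = 1                            # location hit: probe a single way
        else:
            probed = self.ways                    # conventional set-associative probe
            way = next((w for w in range(self.ways) if self.tags[s][w] == tag), None)
            if way is None:                       # L2 miss: fill some way (toy policy)
                way = hash(tag) % self.ways
                self.tags[s][way] = tag
        self.location[block] = way                # remember where this block now lives
        return probed                             # probed ways ~ dynamic energy proxy
```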
middleware is often built using layered architectural style layered design provides good separation of the different concerns of middleware such as communication marshaling request dispatching thread management etc layered architecture helps in the development and evolution of the middleware it also provides tactical side benefits layers provide convenient protection boundaries for enforcing security policies however the benefits of this layered structure come at cost layered designs can hinder performance related optimizations and actually make it more difficult to adapt systems to conveniently address late bound requirements such as dependability access control virus protection and so on we present some examples of this issue and outline new approach under investigation at uc davis which includes ideas in middleware architectures and programming models
symmetric multiprocessor smp servers provide superior performance for the commercial workloads that dominate the internet our simulation results show that over one third of cache misses by these applications result in cache to cache transfers where the data is found in another processor’s cache rather than in memory smps are optimized for this case by using snooping protocols that broadcast address transactions to all processors conversely directory based shared memory systems must indirectly locate the owner and sharers through directory resulting in larger average miss latencies this paper proposes timestamp snooping technique that allows smps to utilize high speed switched interconnection networks and ii exploit physical locality by delivering address transactions to processors and memories without regard to order traditional snooping requires physical ordering of transactions timestamp snooping works by processing address transactions in logical order logical time is maintained by adding few bits per address transaction and having network switches perform handshake to ensure on time delivery processors and memories then reorder transactions based on their timestamps to establish total order we evaluate timestamp snooping with commercial workloads on processor sparc system using the simics full system simulator we simulate both an indirect butterfly and direct torus network design for oltp dss web serving web searching and one scientific application timestamp snooping with the butterfly network runs faster than directories at cost of more link traffic similarly with the torus network timestamp snooping runs faster for more link traffic thus timestamp snooping is worth considering when buying more interconnect bandwidth is easier than reducing interconnect latency
scientific research and practical applications of solar physics require data and computational services to be integrated seamlessly and efficiently the european grid for solar observations egso leverages grid oriented concepts and technology to provide high performance infrastructure for solar applications in this paper an architecture for data brokerage service is proposed brokers interact with providers and consumers in order to build profile of both parties in particular broker interacts with providers in order to gather information on the data potentially available to consumers and with the consumers in order to identify the set of providers that are most likely to satisfy specific data needs the brokerage technique is based on multi tier management of metadata
superimposition is composition technique that has been applied successfully in many areas of software development although superimposition is general purpose concept it has been re invented and implemented individually for various kinds of software artifacts we unify languages and tools that rely on superimposition by using the language independent model of feature structure trees fsts on the basis of the fst model we propose general approach to the composition of software artifacts written in different languages furthermore we offer supporting framework and tool chain called featurehouse we use attribute grammars to automate the integration of additional languages in particular we have integrated java haskell javacc and xml several case studies demonstrate the practicality and scalability of our approach and reveal insights into the properties language must have in order to be ready for superimposition
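A minimal sketch of superimposition on feature structure trees, under stated assumptions: nodes with the same name and type are merged recursively, and the composition rule for terminal nodes is language specific (here a simple "later feature overrides earlier" placeholder). Class and field names are illustrative, not the FeatureHouse API.

class FSTNode:
    def __init__(self, name, node_type, children=None, content=None):
        self.name, self.type = name, node_type
        self.children = children or []
        self.content = content                     # only set for terminal nodes

def superimpose(a, b):
    """Merge two feature structure trees whose roots agree on name and type."""
    assert (a.name, a.type) == (b.name, b.type)
    if not a.children and not b.children:
        # terminal composition is language specific; "b overrides a" is an assumption
        return FSTNode(a.name, a.type,
                       content=b.content if b.content is not None else a.content)
    merged = {(c.name, c.type): c for c in a.children}
    for c in b.children:
        key = (c.name, c.type)
        merged[key] = superimpose(merged[key], c) if key in merged else c
    return FSTNode(a.name, a.type, children=list(merged.values()))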
probabilistic top ranking queries have been extensively studied due to the fact that data obtained can be uncertain in many real applications probabilistic top ranking query ranks objects by the interplay of score and probability with an implicit assumption that both scores based on which objects are ranked and probabilities of the existence of the objects are stored in the same relation we observe that in general scores and probabilities are highly possible to be stored in different relations for example in column oriented dbmss and in data warehouses in this paper we study probabilistic top ranking queries when scores and probabilities are stored in different relations we focus on reducing the join cost in probabilistic top ranking we investigate two probabilistic score functions discuss the upper lower bounds in random access and sequential access and provide insights on the advantages and disadvantages of random sequential access in terms of upper lower bounds we also propose random sequential and hybrid algorithms to conduct probabilistic top ranking we conducted extensive performance studies using real and synthetic datasets and report our findings in this paper
we describe polynomial time algorithm for global value numbering which is the problem of discovering equivalences among program sub expressions we treat all conditionals as non deterministic and all program operators as uninterpreted we show that there are programs for which the set of all equivalences contains terms whose value graph representation requires exponential size our algorithm discovers all equivalences among terms of size at most in time that grows linearly with for global value numbering it suffices to choose to be the size of the program earlier deterministic algorithms for the same problem are either incomplete or take exponential time we provide detailed analytical comparison of some of these algorithms
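The polynomial-time algorithm of the paper is considerably more involved; the sketch below only illustrates the underlying notion of value numbering with uninterpreted operators: two expressions receive the same number exactly when they apply the same operator to operands whose numbers are already equal. Conditionals, phi handling, and the completeness machinery are omitted.

class ValueNumbering:
    def __init__(self):
        self.table = {}      # (operator, operand numbers) -> value number
        self.var_num = {}    # variable -> current value number
        self.next = 0

    def _fresh(self):
        self.next += 1
        return self.next

    def number(self, op, *operands):
        key = (op,) + tuple(self.var_num.get(x, x) for x in operands)
        if key not in self.table:
            self.table[key] = self._fresh()
        return self.table[key]

    def assign(self, var, op, *operands):
        self.var_num[var] = self.number(op, *operands)

vn = ValueNumbering()
vn.assign("x", "+", "a", "b")
vn.assign("y", "+", "a", "b")
assert vn.var_num["x"] == vn.var_num["y"]   # x and y detected as equivalent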
level one cache normally resides on processor’s critical path which determines the clock frequency direct mapped caches exhibit fast access time but poor hit rates compared with same sized set associative caches due to nonuniform accesses to the cache sets which generate more conflict misses in some sets while other sets are underutilized we propose technique to reduce the miss rate of direct mapped caches through balancing the accesses to cache sets we increase the decoder length and thus reduce the accesses to heavily used sets without dynamically detecting the cache set usage information we introduce replacement policy to direct mapped cache design and increase the access to the underutilized cache sets with the help of programmable decoders on average the proposed balanced cache or bcache achieves and miss rate reductions on all speck benchmarks for the instruction and data caches respectively this translates into an average ipc improvement of the cache consumes more power per access but exhibits total memory access related energy saving due to the miss rate reductions and hence the reduction to applications execution time compared with previous techniques that aim at reducing the miss rate of direct mapped caches our technique requires only one cycle to access all cache hits and has the same access time of direct mapped cache
the problem of results merging in distributed information retrieval environments has been approached by two different directions in research estimation approaches attempt to calculate the relevance of the returned documents through ad hoc methodologies weighted score merging regression etc while download approaches download all the documents locally partially or completely in order to estimate first hand their relevance both have their advantages and disadvantages it is assumed that download algorithms are more effective but they are very expensive in terms of time and bandwidth estimation approaches on the other hand usually rely on document relevance scores being returned by the remote collections in order to achieve maximum performance in addition to that regression algorithms which have proved to be more effective than weighted scores merging rely on significant number of overlap documents in order to function effectively practically requiring multiple interactions with the remote collections the new algorithm that is introduced reconciles the above two approaches combining their strengths while minimizing their weaknesses it is based on downloading limited selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies the proposed algorithm is tested in variety of settings and its performance is found to be better than estimation approaches while approximating that of download
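A minimal sketch of the hybrid strategy described above, assuming each remote collection returns (document id, remote score) pairs and that a local scoring function is available for the few documents actually downloaded: fit a simple linear map from remote scores to local scores on the downloaded sample and apply it to the rest. The linear model and the sample size are illustrative assumptions.

import numpy as np

def merge(collections, local_score, sample_size=5):
    """collections: {name: [(doc_id, remote_score), ...]} sorted by remote score.
    local_score(doc_id) downloads and scores one document centrally (assumed given).
    Needs at least two distinct remote scores per collection to fit the line."""
    merged = []
    for name, results in collections.items():
        sample = results[:sample_size]
        xs = np.array([s for _, s in sample], dtype=float)
        ys = np.array([local_score(d) for d, _ in sample], dtype=float)  # the only downloads
        slope, intercept = np.polyfit(xs, ys, 1)      # least-squares regression
        for doc_id, s in results:
            merged.append((slope * s + intercept, name, doc_id))
    return sorted(merged, reverse=True)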
it is well known that grid technology has the ability to coordinate shared resources and scheduled tasks however the problem of resource management and task scheduling has always been one of the main challenges in this paper we present performance effective pre scheduling strategy for dispatching tasks onto heterogeneous processors the main extension of this study is the consideration of heterogeneous communication overheads in grid systems one significant improvement of our approach is that average turnaround time could be minimized by selecting the processor that has the smallest communication ratio first the other advantage of the proposed method is that system throughput can be increased by dispersing processor idle time our proposed technique can be applied on heterogeneous cluster systems as well as computational grid environments in which the communication costs vary in different clusters to evaluate performance of the proposed techniques we have implemented the proposed algorithms along with previous methods the experimental results show that our techniques outperform other algorithms in terms of lower average turnaround time higher average throughput less processor idle time and higher processors utilization
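The sketch below is only one plausible reading of "smallest communication ratio first" as a list-scheduling heuristic, with illustrative cost definitions rather than the paper's exact model: each task goes to the cluster that would finish it earliest, and ties are broken toward the cluster whose communication overhead is smallest relative to its compute time.

def schedule(tasks, clusters):
    """tasks: list of task sizes; clusters: {name: (speed, comm_overhead)} where
    comm_overhead is the per-task communication cost of that cluster (assumed)."""
    free_at = {name: 0.0 for name in clusters}
    plan = []
    for size in sorted(tasks, reverse=True):          # larger tasks first (assumption)
        def finish(name):
            speed, comm = clusters[name]
            return free_at[name] + comm + size / speed
        def comm_ratio(name):
            speed, comm = clusters[name]
            return comm / (size / speed)              # overhead relative to compute time
        best = min(clusters, key=lambda n: (finish(n), comm_ratio(n)))
        free_at[best] = finish(best)
        plan.append((size, best, free_at[best]))
    return plan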
recent years have witnessed large body of research work on mining concept drifting data streams where primary assumption is that the up to date data chunk and the yet to come data chunk share identical distributions so classifiers with good performance on the up to date chunk would also have good prediction accuracy on the yet to come data chunk this stationary assumption however does not capture the concept drifting reality in data streams more recently learnable assumption has been proposed and allows the distribution of each data chunk to evolve randomly although this assumption is capable of describing the concept drifting in data streams it is still inadequate to represent real world data streams which usually suffer from noisy data as well as the drifting concepts in this paper we propose realistic assumption which asserts that the difficulties of mining data streams are mainly caused by both concept drifting and noisy data chunks consequently we present new aggregate ensemble ae framework which trains base classifiers using different learning algorithms on different data chunks all the base classifiers are then combined to form classifier ensemble through model averaging experimental results on synthetic and real world data show that ae is superior to other ensemble methods under our new realistic assumption for noisy data streams
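A minimal sketch of the aggregate-ensemble idea using scikit-learn classifiers: each recent data chunk is used to train one model per learning algorithm, and predictions are combined by simple model averaging of class probabilities. The particular algorithms, chunk handling, and the assumption that every chunk contains every class are illustrative, not the paper's exact configuration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

def train_aggregate_ensemble(chunks):
    """chunks: list of (X, y) arrays from the stream; returns a predict function."""
    algorithms = [DecisionTreeClassifier, GaussianNB,
                  lambda: LogisticRegression(max_iter=1000)]
    models = [make().fit(X, y) for X, y in chunks for make in algorithms]

    def predict(X_new):
        # model averaging over all base classifiers from all chunks
        avg = np.mean([m.predict_proba(X_new) for m in models], axis=0)
        return models[0].classes_[avg.argmax(axis=1)]
    return predict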
it is becoming apparent that the next generation ip route lookup architecture needs to achieve speeds of gbps and beyond while supporting both ipv and ipv with fast real time updates to accommodate ever growing routing tables some of the proposed multibit trie based schemes such as tree bitmap have been used in today’s high end routers however their large data structure often requires multiple external memory accesses for each route lookup pipelining technique is widely used to achieve high speed lookup with cost of using many external memory chips pipelining also often leads to poor memory load balancing in this paper we propose new ip route lookup architecture called flashtrie that overcomes the shortcomings of the multibit trie based approach we use hash based membership query to limit off chip memory accesses per lookup to one and to balance memory utilization among the memory modules we also develop new data structure called prefix compressed trie that reduces the size of bitmap by more than our simulation and implementation results show that flashtrie can achieve gbps worst case throughput while simultaneously supporting prefixes for ipv and prefixes for ipv using one fpga chip and four ddr sdram chips flashtrie also supports incremental real time updates
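The following is not the flashtrie data structure itself, only a simplified illustration of hash-based longest prefix matching, where one hash table per prefix length stands in for the membership query and the single off-chip bucket read; the real design compresses prefixes and bounds off-chip accesses far more aggressively.

def build(prefix_table):
    """prefix_table: {(prefix_int, length): next_hop}; one hash table per length."""
    by_len = {}
    for (prefix, length), nh in prefix_table.items():
        by_len.setdefault(length, {})[prefix] = nh
    return by_len

def lookup(by_len, addr, addr_bits=32):
    """Longest prefix match: try membership from the longest length downwards."""
    for length in sorted(by_len, reverse=True):
        key = addr >> (addr_bits - length)
        if key in by_len[length]:
            return by_len[length][key]
    return None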
we present an adaptive fault tolerant wormhole routing algorithm for hypercubes by using virtual networks the routing algorithm can tolerate at least faulty nodes and can route message via path of length no more than the shortest path plus four previous algorithms which achieve the same fault tolerant ability need virtual networks simulation results are also given in this paper
the advent of strong multi level partitioners has made topdown min cut placers favored choice for modern placer implementations we examine terminal propagation an important step in min cut placers because it is responsible for translating partitioning results into global placement wirelength assumptions in this work we identify previously overlooked problem ambiguous terminal propagation and propose solution based on the concept of feedback from automatic control systems implementing our approach in capo version and applying it to standard benchmark circuits yields up to wirelength reductions for the ibm benchmarks and reductions for peko instances experiments also show consistent improvements for routed wirelength yielding up to wirelength reductions with practical increase in placement runtime in addition our method significantly improves routability without building congestion maps and reduces the number of vias
efficient fine grain synchronization is extremely important to effectively harness the computational power of many core architectures however designing and implementing fine grain synchronization in such architectures presents several challenges including issues of synchronization induced overhead storage cost scalability and the level of granularity to which synchronization is applicable this paper proposes the synchronization state buffer ssb scalable architectural design for fine grain synchronization that efficiently performs synchronizations between concurrent threads the design of ssb is motivated by the following observation at any instance during the parallel execution only small fraction of memory locations are actively participating in synchronization based on this observation we present fine grain synchronization design that records and manages the states of frequently synchronized data using modest hardware support we have implemented the ssb design in the context of the core ibm cyclops architecture using detailed simulation we present our experience for set of benchmarks with different workload characteristics
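A small software sketch of the SSB observation, under stated assumptions: synchronization state is tracked only for the few addresses actively being synchronized, in a table of fixed capacity like a hardware buffer, with a fallback path when the buffer is full. The interface and capacity are illustrative, not the hardware design.

import threading

class SyncStateBuffer:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.states = {}                        # address -> owner thread id
        self.lock = threading.Lock()

    def try_acquire(self, addr):
        """Word-level lock acquire; True if this thread now (or already) holds it."""
        me = threading.get_ident()
        with self.lock:
            if addr in self.states:
                return self.states[addr] == me
            if len(self.states) >= self.capacity:
                raise RuntimeError("SSB full: take software fallback path")
            self.states[addr] = me
            return True

    def release(self, addr):
        with self.lock:
            self.states.pop(addr, None)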
pattern matching in concrete syntax is very useful in program manipulation tools in particular user defined extensions to such tools are written much easier using concrete syntax patterns few advanced frameworks for language development implement support for concrete syntax patterns but mainstream frameworks used today still do not support them this prevents most existing program manipulation tools from using concrete syntax matching which in particular severely limits the writing of tool extensions to few language experts this paper argues that the major implementation obstacle to the pervasive use of concrete syntax patterns is the pattern parser we propose an alternative approach based on unparsed patterns which are concrete syntax patterns that can be efficiently matched without being parsed this lighter approach gives up static checks that parsed patterns usually do in turn it can be integrated within any existing parser based software tool almost for free one possible consequence is enabling widespread adoption of extensible program manipulation tools by the majority of programmers unparsed patterns can be used in any programing language including multi lingual environments to demonstrate our approach we implemented it both as minimal patch for the gcc compiler allowing to scan source code for user defined patterns and as stand alone prototype called matchbox
depth of field refers to the swath through scene that is imaged in acceptable focus through an optics system such as camera lens control over depth of field is an important artistic tool that can be used to emphasize the subject of photograph in real camera the control over depth of field is limited by the nature of the image formation process and by physical constraints the depth of field effect has been simulated in computer graphics but with the same limited control as found in real camera lenses in this paper we use diffusion in non homogeneous medium to generalize depth of field in computer graphics by enabling the user to independently specify the degree of blur at each point in three dimensional space generalized depth of field provides novel tool to emphasize an area of interest within scene to pick objects out of crowd and to render busy complex picture more understandable by focusing only on relevant details that may be scattered throughout the scene our algorithm operates by blurring sequence of nonplanar layers that form the scene choosing suitable blur algorithm for the layers is critical thus we develop appropriate blur semantics such that the blur algorithm will properly generalize depth of field we found that diffusion in non homogeneous medium is the process that best suits these semantics
this paper introduces the expander new object oriented oo programming language construct designed to support object adaptation expanders allow existing classes to be noninvasively updated with new methods fields and superinterfaces each client can customize its view of class by explicitly importing any number of expanders this view then applies to all instances of that class including objects passed to the client from other components form of expander overriding allows expanders to interact naturally with oo style inheritance we describe the design implementation and evaluation of ejava an extension to java supporting expanders we illustrate ejava’s syntax and semantics through several examples the statically scoped nature of expander usage allows for modular static type system that prevents several important classes of errors we describe this modular static type system informally formalize ejava and its type system in an extension to featherweight java and prove type soundness theorem for the formalization we also describe modular compilation strategy for ejava which we have implemented using the polyglot extensible compiler framework finally we illustrate the practical benefits of ejava by using this compiler in two experiments
methods for triangle mesh decimation are common however most existing techniques operate only on static geometry in this paper we present view and pose independent method for the automatic simplification of skeletally articulated meshes such meshes have associated kinematic skeletons that are used to control their deformation with the position of each vertex influenced by linear combination of bone transformations our method extends the commonly used quadric error metric by incorporating knowledge of potential poses into probability function we minimize the average error of the deforming mesh over all possible configurations weighted by the probability this is possible by transforming the quadrics from each configuration into common coordinate system our simplification algorithm runs as preprocess and the resulting meshes can be seamlessly integrated into existing systems we demonstrate the effectiveness of this approach for generating highly simplified models while preserving necessary detail in deforming regions near joints
reliable broadband communication is becoming increasingly important during disaster recovery and emergency response operations in situations where infrastructure based communication is not available or has been disrupted an incident area network needs to be dynamically deployed ie temporary network that provides communication services for efficient crisis management at an incident site wireless mesh networks wmns are multi hop wireless networks with self healing and self configuring capabilities these features combined with the ability to provide wireless broadband connectivity at comparably low cost make wmns promising technology for incident management communications this paper specifically focuses on hybrid wmns which allow both mobile client devices as well as dedicated infrastructure nodes to form the network and provide routing and forwarding functionality hybrid wmns are the most generic and most flexible type of mesh networks and are ideally suited to meet the requirements of incident area communications however current wireless mesh and ad hoc routing protocols do not perform well in hybrid wmn and are not able to establish stable and high throughput communication paths one of the key reasons for this is their inability to exploit the typical high degree of heterogeneity in hybrid wmns safemesh the routing protocol presented in this paper addresses the limitations of current mesh and ad hoc routing protocols in the context of hybrid wmns safemesh is based on the well known aodv routing protocol and implements number of modifications and extensions that significantly improve its performance in hybrid wmns this is demonstrated via an extensive set of simulation results we further show the practicality of the protocol through prototype implementation and provide performance results obtained from small scale testbed deployment
topic modeling has been key problem for document analysis one of the canonical approaches for topic modeling is probabilistic latent semantic indexing which maximizes the joint probability of documents and terms in the corpus the major disadvantage of plsi is that it estimates the probability distribution of each document on the hidden topics independently and the number of parameters in the model grows linearly with the size of the corpus which leads to serious problems with overfitting latent dirichlet allocation lda is proposed to overcome this problem by treating the probability distribution of each document over topics as hidden random variable both of these two methods discover the hidden topics in the euclidean space however there is no convincing evidence that the document space is euclidean or flat therefore it is more natural and reasonable to assume that the document space is manifold either linear or nonlinear in this paper we consider the problem of topic modeling on intrinsic document manifold specifically we propose novel algorithm called laplacian probabilistic latent semantic indexing lapplsi for topic modeling lapplsi models the document space as submanifold embedded in the ambient space and directly performs the topic modeling on this document manifold in question we compare the proposed lapplsi approach with plsi and lda on three text data sets experimental results show that lapplsi provides better representation in the sense of semantic structure
the system and network architecture for static sensor nets is largely solved today with many stable commercial solutions now available and standardization efforts underway at the ieee ietf isa and within many industry groups as result many researchers have begun to explore new domains like mobile sensor networks or mobiscopes since they enable new applications in the home and office for health and safety and in transportation and asset management this paper argues that mobility invalidates many assumptions implicit in low power static designs so the architecture for micropower mobiscopes is still very much an open research question in this paper we explore several mobile sensing applications identify research challenges to their realization and explore how emerging technologies and real time motion data could help ease these challenges
software maintenance tools for program analysis and refactoring rely on meta model capturing the relevant properties of programs however what is considered relevant may change when the tools are extended with new analyses and refactorings and new programming languages this paper proposes a language independent meta model and an architecture to construct instances thereof which is extensible for new analyses refactorings and new front ends of programming languages due to the loose coupling between analysis refactoring and front end components new components can be added independently and reuse existing ones two maintenance tools implementing the meta model and the architecture vizzanalyzer and xdevelop serve as proof of concept
this article explores the architectural challenges introduced by emerging bottom up fabrication of nanoelectronic circuits the specific nanotechnology we explore proposes patterned dna nanostructures as scaffold for the placement and interconnection of carbon nanotube or silicon nanorod fets to create limited size circuit node three characteristics of this technology that significantly impact architecture are limited node size random node interconnection and high defect rates we present and evaluate an accumulator based active network architecture that is compatible with any technology that presents these three challenges this architecture represents an initial unoptimized solution for understanding the implications of dna guide self assembly
structured peer to peer pp overlays have been successfully employed in many applications to locate content however they have been less effective in handling massive amounts of data because of the high overhead of maintaining indexes in this paper we propose pisces peer based system that indexes selected content for efficient search unlike traditional approaches that index all data pisces identifies subset of tuples to index based on some criteria such as query frequency update frequency index cost etc in addition coarse grained range index is built to facilitate the processing of queries that cannot be fully answered by the tuple level index more importantly pisces can adaptively self tune to optimize the subset of tuples to be indexed that is the partial index in pisces is built in just in time jit manner beneficial tuples for current users are pulled for indexing while indexed tuples with infrequent access and high maintenance cost are discarded we also introduce light weight monitoring scheme for structured networks to collect the necessary statistics we have conducted an extensive experimental study on planetlab to illustrate the feasibility practicality and efficiency of pisces the results show that pisces incurs lower maintenance cost and offers better search and query efficiency compared to existing methods
vast amount of valuable information produced and consumed by people and institutions is currently stored in relational databases for many purposes there is an ever increasing demand for having these databases published on the web so that users can query the data available in them an important requirement for this to happen is that query interfaces must be as simple and intuitive as possible in this paper we present labrador system for efficiently publishing relational databases on the web by using simple text box query interface the system operates by taking an unstructured keyword based query posed by user and automatically deriving an equivalent sql query that fits the user’s information needs as expressed by the original query the sql query is then sent to dbms and its results are processed by labrador to create relevance based ranking of the answers experiments we present show that labrador can automatically find the most suitable sql query in more than of the cases and that the overhead introduced by the system in the overall query processing time is almost insignificant furthermore the system operates in non intrusive way since it requires no modifications to the target database schema
discriminative probabilistic models are very popular in nlp because of the latitude they afford in designing features but training involves complex trade offs among weights which can be dangerous few highly indicative features can swamp the contribution of many individually weaker features causing their weights to be undertrained such model is less robust for the highly indicative features may be noisy or missing in the test data to ameliorate this weight undertraining we introduce several new feature bagging methods in which separate models are trained on subsets of the original features and combined using mixture model or product of experts these methods include the logarithmic opinion pools used by smith et al we evaluate feature bagging on linear chain conditional random fields for two natural language tasks on both tasks the feature bagged crf performs better than simply training single crf on all the features
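A minimal sketch of the combination step only, not a conditional random field: separate models are trained on subsets of the features and combined as a product of experts, i.e. by summing their per-class log scores. The model interface (a log_prob method) is an assumption made for illustration.

import numpy as np

def product_of_experts(models, feature_sets, x, classes):
    """models[i] scores only the features in feature_sets[i] and exposes
    log_prob(class_label, features) (assumed interface). Multiplying the experts
    means no single strong feature can dominate the combined decision."""
    total = np.zeros(len(classes))
    for model, feats in zip(models, feature_sets):
        total += np.array([model.log_prob(c, {f: x[f] for f in feats}) for c in classes])
    return classes[int(np.argmax(total))]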
java programmers can document that the relationship between two objects is unchanging by declaring the field that encodes that relationship to be final this information can be used in program understanding and detection of errors in new code additions unfortunately few fields in programs are actually declared final programs often contain fields that could be final but are not declared so moreover the definition of final has restrictions on initialization that limit its applicability we introduce stationary fields as generalization of final field in program is stationary if for every object that contains it all writes to the field occur before all the reads unlike the definition of final fields there can be multiple writes during initialization and initialization can span multiple methods we have developed an efficient algorithm for inferring which fields are stationary in program based on the observation that many fields acquire their value very close to object creation we presume that an object’s initialization phase has concluded when its reference is saved in some heap object we perform precise analysis only regarding recently created objects applying our algorithm to real world java programs demonstrates that stationary fields are more common than final fields vs respectively in our benchmarks these surprising results have several significant implications first substantial portions of java programs appear to be written in functional style second initialization of these fields occurs very close to object creation when very good alias information is available these results open the door for more accurate and efficient pointer alias analysis
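The paper's analysis is static; the sketch below only makes the stationary-field definition executable on an observed trace of field accesses, as a way to see what the criterion means: a field is disqualified as soon as any object sees a write after a read.

class StationaryFieldMonitor:
    """Dynamic check of the definition: for every object, all writes to a
    stationary field happen before all reads of it."""
    def __init__(self):
        self.read_seen = set()          # (object id, field) pairs already read
        self.non_stationary = set()     # fields with a write observed after a read

    def on_read(self, obj_id, field):
        self.read_seen.add((obj_id, field))

    def on_write(self, obj_id, field):
        if (obj_id, field) in self.read_seen:
            self.non_stationary.add(field)

    def stationary(self, all_fields):
        return set(all_fields) - self.non_stationary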
this article proposes statistical approach for fast articulated body tracking similar to the loose limbed model but using the factor graph representation and fast estimation algorithm fast nonparametric belief propagation on factor graphs is used to estimate the current marginal for each limb all belief propagation messages are represented as sums of weighted samples the resulting algorithm corresponds to set of particle filters one for each limb where an extra step recomputes the weight of each sample by taking into account the links between limbs applied to upper body tracking with stereo and colour images the resulting algorithm estimates the body pose in quasi real time hz results on sequences illustrate the effectiveness of this approach
we focus on collaborative filtering dealing with self organizing communities host mobility wireless access and ad hoc communications in such domain knowledge representation and users profiling can be hard remote servers can be often unreachable due to client mobility and feedback ratings collected during random connections to other users ad hoc devices can be useless because of natural differences between human beings our approach is based on so called affinity networks and on novel system called mobhinter that epidemically spreads recommendations through spontaneous similarities between users main results of our study are two fold firstly we show how to reach comparable recommendation accuracies in the mobile domain as well as in complete knowledge scenario secondly we propose epidemic collaborative strategies that can reduce rapidly and realistically the cold start problem
different notions of provenance for database queries have been proposed and studied in the past few years in this article we detail three main notions of database provenance some of their applications and compare and contrast amongst them specifically we review why how and where provenance describe the relationships among these notions of provenance and describe some of their applications in confidence computation view maintenance and update debugging and annotation propagation
we present novel approach for classifying documents that combines different pieces of evidence eg textual features of documents links and citations transparently through data mining technique which generates rules associating these pieces of evidence to predefined classes these rules can contain any number and mixture of the available evidence and are associated with several quality criteria which can be used in conjunction to choose the best rule to be applied at classification time our method is able to perform evidence enhancement by link forwarding backwarding ie navigating among documents related through citation so that new pieces of link based evidence are derived when necessary furthermore instead of inducing single model or rule set that is good on average for all predictions the proposed approach employs lazy method which delays the inductive process until document is given for classification therefore taking advantage of better qualitative evidence coming from the document we conducted systematic evaluation of the proposed approach using documents from the acm digital library and from brazilian web directory our approach was able to outperform in both collections all classifiers based on the best available evidence in isolation as well as state of the art multi evidence classifiers we also evaluated our approach using the standard webkb collection where our approach showed gains of in accuracy being times faster further our approach is extremely efficient in terms of computational performance showing gains of more than one order of magnitude when compared against other multi evidence classifiers
distributed digital libraries dls integration is significant for the enforcement of novel searching mechanisms in the internet the great heterogeneity of systems storing and providing digital content requires the introduction of interoperability aspects in order to resolve integration problems in flexible and dynamic way our approach introduces an innovative service oriented pp system which initialises distributed ontology schema semantically describing and indexing the digital content stored in distributed dls the proposed architecture enforces the distributed semantic index by defining virtual clusters consisting of nodes peers with similar or related content in order to provide efficient searching and recommendation mechanisms furthermore use case example is presented in order to demonstrate the functionalities of the proposed architecture in pp network with dls containing cultural material
the maintenance of an existing database depends on the depth of understanding of its characteristics such an understanding is easily lost when the developers disperse the situation becomes worse when the related documentation is missing this paper addresses this issue by extracting the extended entity relationship schema from the relational schema we developed algorithms that investigate characteristics of an existing legacy database in order to identify candidate keys of all relations in the relational schema to locate foreign keys and to decide on the appropriate links between the given relations based on this analysis graph consistent with the entity relationship diagram is derived to contain all possible unary and binary relationships between the given relations the minimum and maximum cardinalities of each link in the mentioned graph are determined and extra links within the graph are identified and categorized if any the latter information is necessary to optimize foreign keys related information finally the last steps in the process involve when applicable suggesting improvements on the original conceptual design deciding on relationships with attributes many to many and ary relationships and identifying is links user involvement in the process is minimized to the case of having multiple choices where the system does not have the semantic knowledge required to decide on certain choice
in this paper we propose new learning method for extracting bilingual word pairs from parallel corpora in various languages in cross language information retrieval the system must deal with various languages therefore automatic extraction of bilingual word pairs from parallel corpora with various languages is important however previous works based on statistical methods are insufficient because of the sparse data problem our learning method automatically acquires rules which are effective to solve the sparse data problem only from parallel corpora without any prior preparation of bilingual resource eg bilingual dictionary machine translation system we call this learning method inductive chain learning icl moreover the system using icl can extract bilingual word pairs even from bilingual sentence pairs for which the grammatical structures of the source language differ from the grammatical structures of the target language because the acquired rules have the information to cope with the different word orders of source language and target language in local parts of bilingual sentence pairs evaluation experiments demonstrated that the recalls of systems based on several statistical approaches were improved through the use of icl
distributed shared memory dsm is an abstraction of shared memory on distributed memory machine hardware dsm systems support this abstraction at the architecture level software dsm systems support the abstraction within the runtime system one of the key problems in building an efficient software dsm system is to reduce the amount of communication needed to keep the distributed memories consistent in this article we present four techniques for doing so software release consistency multiple consistency protocols write shared protocols and an update with timeout mechanism these techniques have been implemented in the munin dsm system we compare the performance of seven munin application programs first to their performance when implemented using message passing and then to their performance when running on conventional software dsm system that does not embody the preceding techniques on processor cluster of workstations munin’s performance is within of message passing for four out of the seven applications for the other three performance is within to detailed analysis of two of these three applications indicates that the addition of function shipping capability would bring their performance to within of the message passing performance compared to conventional dsm system munin achieves performance improvements ranging from few to several hundred percent depending on the application
in this paper we present our experiences concerning the enforcement of access rights extracted from odrl based digital contracts we introduce the generalized contract schema cosa which is an approach to provide generic representation of contract information on top of rights expression languages we give an overview of the design and implementation of the xorelinterpreter software component in particular the xorelinterpreter interprets digital contracts that are based on rights expression languages eg odrl or xrml and builds runtime cosa object model we describe how the xorbac access control component and the xorelinterpreter component are used to enforce access rights that we extract from odrl based digital contracts thus our approach describes how odrl based contracts can be used as means to disseminate certain types of access control information in distributed systems
we propose framework and methodology for quantifying the effect of denial of service dos attacks on distributed system we present systematic study of the resistance of gossip based multicast protocols to dos attacks we show that even distributed and randomized gossip based protocols which eliminate single points of failure do not necessarily eliminate vulnerabilities to dos attacks we propose drum simple gossip based multicast protocol that eliminates such vulnerabilities drum was implemented in java and tested on large cluster we show using closed form mathematical analysis simulations and empirical tests that drum survives severe dos attacks
we propose general approach for frequency based string mining which has many applications eg in contrast data mining our contribution is novel algorithm based on deferred data structure despite its simplicity our approach is up to times faster and uses about half the memory compared to the best known algorithm of fischer et al applications in various string domains eg natural language dna or protein sequences demonstrate the improvement of our algorithm
volumetric displays which provide view of imagery illuminated in true space are promising platform for interactive applications however presenting text in volumetric displays can be challenge as the text may not be oriented towards the user this is especially problematic with multiple viewers as the text could for example appear forwards to one user and backwards to another in first experiment we determined the effects of rotations on text readability based on the results we developed and evaluated new technique which optimizes text orientation for multiple viewers this technique provided faster group reading times in collaborative experimental task
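One plausible formulation of the multi-viewer orientation problem, offered only as an illustrative assumption rather than the paper's actual technique: pick the text heading that minimizes the largest angular deviation any viewer sees (a sum of deviations would be another reasonable objective).

import math

def best_orientation(viewer_angles, step_deg=1):
    """viewer_angles: headings (radians) from the display centre to each viewer;
    returns the candidate text orientation minimizing the worst-case deviation."""
    def deviation(a, b):
        d = abs(a - b) % (2 * math.pi)
        return min(d, 2 * math.pi - d)
    candidates = [math.radians(d) for d in range(0, 360, step_deg)]
    return min(candidates, key=lambda c: max(deviation(c, v) for v in viewer_angles))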
although data broadcast has been shown to be an efficient method for disseminating data items in mobile computing systems the issue on how to ensure consistency and currency of data items provided to mobile transactions mt which are generated by mobile clients has not been examined adequately while data items are being broadcast update transactions may install new values for them if the executions of update transactions and the broadcast of data items are interleaved without any control mobile transactions may observe inconsistent data values the problem will be more complex if the mobile clients maintain some cached data items for their mobile transactions in this paper we propose concurrency control method called ordered update first with order oufo for the mobile computing systems where mobile transaction consists of sequence of read operations and each mt is associated with time constraint on its completion time besides ensuring data consistency and maximizing currency of data to mobile transactions oufo also aims at reducing data access delay of mobile transactions using client caches hybrid re broadcast invalidation report ir mechanism is designed in oufo for checking the validity of cached data items so as to improve cache consistency and minimize the overhead of transaction restarts due to data conflicts this is highly important to the performance of the mobile computing systems where the mobile transactions are associated with deadline constraint on their completion times extensive simulation experiments have been performed to compare the performance of oufo with two other efficient schemes the multi version broadcast method and the periodic ir method the performance results show that oufo offers better performance in most aspects even when network disconnection is common
at present the search for specific information on the world wide web is faced with several problems which arise on the one hand from the vast number of information sources available and on the other hand from their intrinsic heterogeneity since standards are missing promising approach for solving the complex problems emerging in this context is the use of multi agent systems of information agents which cooperatively solve advanced information retrieval problems this requires advanced capabilities to address complex tasks such as search and assessment of information sources query planning information merging and fusion dealing with incomplete information and handling of inconsistency in this paper our interest lies in the role which some methods from the field of declarative logic programming can play in the realization of reasoning capabilities for information agents in particular we are interested to see how they can be used extended and further developed for the specific needs of this application domain we review some existing systems and current projects which typically address information integration problems we then focus on declarative knowledge representation methods and review and evaluate approaches and methods from logic programming and nonmonotonic reasoning for information agents we discuss advantages and drawbacks and point out the possible extensions and open issues
overall performance of the data mining process depends not just on the value of the induced knowledge but also on various costs of the process itself such as the cost of acquiring and pre processing training examples the cpu cost of model induction and the cost of committed errors recently several progressive sampling strategies for maximizing the overall data mining utility have been proposed all these strategies are based on repeated acquisitions of additional training examples until utility decrease is observed in this paper we present an alternative projective sampling strategy which fits functions to partial learning curve and partial run time curve obtained from small subset of potentially available data and then uses these projected functions to analytically estimate the optimal training set size the proposed approach is evaluated on variety of benchmark datasets using the rapidminer environment for machine learning and data mining processes the results show that the learning and run time curves projected from only several data points can lead to cheaper data mining process than the common progressive sampling methods
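A minimal numpy sketch of projective sampling under stated assumptions: fit a power-law learning curve and a power-law run-time curve to a few measured points, then analytically pick the training-set size with the highest projected utility. The curve families, cost weights, and candidate range are placeholders, not the paper's calibrated values.

import numpy as np

def projected_optimal_size(sizes, accuracies, runtimes,
                           benefit_per_acc=1000.0, cost_per_example=0.1,
                           cost_per_sec=1.0, candidates=range(100, 100001, 100)):
    """Fit acc(n) ~ 1 - b*n**c and time(n) ~ d*n**e in log space, then maximize
    projected utility = accuracy benefit - acquisition cost - cpu cost."""
    la = np.log(np.asarray(sizes, dtype=float))
    err = np.log(np.maximum(1e-9, 1.0 - np.asarray(accuracies, dtype=float)))
    c, logb = np.polyfit(la, err, 1)                 # log(1-acc) = log b + c*log n
    e, logd = np.polyfit(la, np.log(runtimes), 1)    # log time  = log d + e*log n
    acc = lambda n: 1.0 - np.exp(logb) * n ** c
    rt = lambda n: np.exp(logd) * n ** e
    utility = lambda n: benefit_per_acc * acc(n) - cost_per_example * n - cost_per_sec * rt(n)
    return max(candidates, key=utility)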
as most current query processing architectures are already pipelined it seems logical to apply them to data streams however two classes of query operators are impractical for processing long or infinite data streams unbounded stateful operators maintain state with no upper bound in size and so run out of memory blocking operators read an entire input before emitting single output and so might never produce result we believe that priori knowledge of data stream can permit the use of such operators in some cases we discuss kind of stream semantics called punctuated streams punctuations in stream mark the end of substreams allowing us to view an infinite stream as mixture of finite streams we introduce three kinds of invariants to specify the proper behavior of operators in the presence of punctuation pass invariants define when results can be passed on keep invariants define what must be kept in local state to continue successful operation propagation invariants define when punctuation can be passed on we report on our initial implementation and show strategy for proving implementations of these invariants are faithful to their relational counterparts
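A tiny sketch of a normally blocking operator (group-by count) made stream-friendly by punctuations, in the spirit of the pass and keep invariants: a result for a group is emitted once a punctuation declares that group's substream finished, and the corresponding state is discarded. The stream encoding used here is an assumption for illustration.

def grouped_count(stream):
    """stream yields ('data', key) or ('punct', key); a punctuation for key
    promises no further tuples with that key will arrive."""
    counts = {}
    for kind, key in stream:
        if kind == 'data':
            counts[key] = counts.get(key, 0) + 1
        else:                               # punctuation: substream for key has ended
            yield key, counts.pop(key, 0)   # pass the result, release the state

# e.g. an unbounded stream of per-minute events punctuated at minute boundaries
# lets this count run indefinitely in bounded memory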
with the explosive growth of the world wide web the public is gaining access to massive amounts of information however locating needed and relevant information remains difficult task whether the information is textual or visual text search engines have existed for some years now and have achieved certain degree of success however despite the large number of images available on the web image search engines are still rare in this article we show that in order to allow people to profit from all this visual information there is need to develop tools that help them to locate the needed images with good precision in reasonable time and that such tools are useful for many applications and purposes the article surveys the main characteristics of the existing systems most often cited in the literature such as imagerover webseek diogenes and atlas wise it then examines the various issues related to the design and implementation of web image search engine such as data gathering and digestion indexing query specification retrieval and similarity web coverage and performance evaluation general discussion is given for each of these issues with examples of the ways they are addressed by existing engines and related references are given some concluding remarks and directions for future research are also presented
load sensitive faults cause program to fail when it is executed under heavy load or over long period of time but may have no detrimental effect under small loads or short executions in addition to testing the functionality of these programs testing how well they perform under stress is very important current approaches to stress or load testing treat the system as black box generating test data based on parameters specified by the tester within an operational profile in this paper we advocate structural approach to load testing there exist many structural testing methods however their main goal is generating test data for executing all statements branches definition use pairs or paths of program at least once without consideration for executing any particular path extensively our initial work has focused on the identification of potentially load sensitive modules based on static analysis of the module’s code and then limiting the stress testing to the regions of the modules that could be the potential causes of the load sensitivity this analysis will be incorporated into testing tool for structural load testing which takes program as input and automatically determines whether that program needs to be load tested and if so automatically generates test data for structural load testing of the program
developing countries face significant challenges in network access making even simple network tasks unpleasant many standard techniques caching and predictive prefetching help somewhat but provide little or no assistance for personal data that is needed only by single user sulula addresses this problem by leveraging the near ubiquity of cellular phones able to send and receive simple sms messages rather than visit kiosk and fetch data on demand tiresome process at best users request future visit if capacity exists the kiosk can schedule secure retrieval of that user’s data saving time and more efficiently utilizing the kiosk’s limited connectivity when the user arrives at provisioned kiosk she need only obtain the session key on demand and thereafter has instant access in addition sulula allows users to schedule data uploads experimental results show significant gains for the end user saving tens of minutes of time for typical email news reading session we also describe small ongoing deployment in country for proof of concept lessons learned from that experience and provide discussion on pricing and marketplace issues that remain to be addressed to make the system viable for developing world access
process support systems psss are software systems supporting the modeling enactment monitoring and analysis of business processes process automation technology can be fully exploited when predictable and repetitive processes are executed unfortunately many processes are faced with the need of managing exceptional situations that may occur during their execution and possibly even more exceptions and failures can occur when the process execution is supported by pss exceptional situations may be caused by system hardware or software failures or may be related to the semantics of the business process in this paper we introduce taxonomy of failures and exceptions and discuss the effect that they can have on pss and on its ability to support business processes then we present the main approaches that commercial psss and research prototypes offer in order to capture and react to exceptional situations and we show which classes of failure or exception can be managed by each approach
we report on model based approach to system software co engineering which is tailored to the specific characteristics of critical on board systems for the aerospace domain the approach is supported by system level integrated modeling slim language by which engineers are provided with convenient ways to describe nominal hardware and software operation probabilistic faults and their propagation error recovery and degraded modes of operation correctness properties safety guarantees and performance and dependability requirements are given using property patterns which act as parameterized templates to the engineers and thus offer comprehensible and easy to use framework for requirement specification instantiated properties are checked on the slim specification using state of the art formal analysis techniques such as bounded sat based and symbolic model checking and probabilistic variants thereof the precise nature of these techniques together with the formal slim semantics yield trustworthy modeling and analysis framework for system and software engineers supporting among others automated derivation of dynamic ie randomly timed fault trees fmea tables assessment of fdir and automated derivation of observability requirements
there is tremendous amount of web content available today but it is not always in form that supports end users needs in many cases all of the data and services needed to accomplish goal already exist but are not in form amenable to an end user to address this problem we have developed an end user programming tool called marmite which lets end users create so called mashups that re purpose and combine existing web content and services in this paper we present the design implementation and evaluation of marmite an informal user study found that programmers and some spreadsheet users had little difficulty using the system
we present system that composes realistic picture from simple freehand sketch annotated with text labels the composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels these are found by searching the internet although online image search generates many inappropriate results our system is able to automatically select suitable photographs to generate high quality composition using filtering scheme to exclude undesirable images we also provide novel image blending algorithm to allow seamless image composition each blending result is given numeric score allowing us to find an optimal combination of discovered images experimental results show the method is very successful we also evaluate our system using the results from two user studies
in large scale sensor networks sensor nodes are at high risk of being captured and compromised once sensor node is compromised all the secret keys data and code stored on it are exposed to the attacker the attacker can insert arbitrary malicious code in the compromised node moreover he can easily replicate it in large number of clones and deploy them on the network this node replication attack can form the basis of variety of attacks such as dos attacks and sybil attacks previous studies of node replication attacks have had some drawbacks they need central trusted entity or they become vulnerable when many nodes are compromised therefore we propose distributed protocol for detecting node replication attacks that is resilient to many compromised nodes our method does not need any reliable entities and has high detection rate of replicated nodes our analysis and simulations demonstrate our protocol is effective even when there are large number of compromised nodes
developing and maintaining open source software has become an important source of profit for many companies change prone classes in open source products increase project costs by requiring developers to spend effort and time identifying and characterizing change prone classes can enable developers to focus timely preventive actions for example peer reviews and inspections on the classes with similar characteristics in the future releases or products in this study we collected set of static metrics and change data at class level from two open source projects koffice and mozilla using these data we first tested and validated pareto’s law which implies that great majority around of change is rooted in small proportion around of classes then we identified and characterized the change prone classes in the two products by producing tree based models in addition using tree based models we suggested prioritization strategy to use project resources for focused preventive actions in an efficient manner our empirical results showed that this strategy was effective for prioritization purposes this study should provide useful guidance to practitioners involved in development and maintenance of large scale open source products
we study auctions whose bidders are embedded in social or economic network as result even bidders who do not win the auction themselves might derive utility from the auction namely when friend wins on the other hand when an enemy or competitor wins bidder might derive negative utility such spite and altruism will alter the bidding strategies simple and natural model for bidders utilities in these settings posits that the utility of losing bidder as result of bidder winning is constant positive or negative fraction of bidder j’s utility we study such auctions under bayesian model in which all valuations are distributed independently according to known distribution but the actual valuations are private we describe and analyze nash equilibrium bidding strategies in two broad classes regular friendship networks with arbitrary valuation distributions and arbitrary friendship networks with identical uniform valuation distributions
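A minimal formalization of the bidder-utility model sketched above, in hypothetical notation not taken from the paper (v_j is bidder j's valuation, p_j the price j pays, and alpha_ij the constant fraction by which losing bidder i shares in j's utility):

```latex
% Hedged sketch of the spite/altruism utility model described above.
% \alpha_{ij} > 0 models altruism (a friend wins), \alpha_{ij} < 0 models spite.
u_j = v_j - p_j \qquad \text{(winning bidder } j\text{)}
\qquad
u_i = \alpha_{ij}\,(v_j - p_j) \qquad \text{(losing bidder } i \neq j\text{)}
```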
the process of network debugging is commonly guided by decision trees that describe and attempt to address the most common failure modes we show that troubleshooting can be made more effective by converting decision trees into suites of convergent troubleshooting scripts that do not change network attributes unless these are out of compliance with accepted norms maelstrom is tool for managing and coordinating execution of these scripts maelstrom exploits convergence of individual scripts to dynamically infer an appropriate execution order for the scripts it accomplishes this in procedure trials where is the number of troubleshooting scripts this greatly eases adding scripts to troubleshooting scheme and thus makes it easier for people to cooperate in producing more exhaustive and effective troubleshooting schemes
recent studies have indicated an increase in customer profiling techniques used by commerce businesses commerce businesses are creating maintaining and utilising customer profiles to assist in personalisation personalisation can help improve customers satisfaction levels purchasing behaviour loyalty and subsequently improve sales the continuously changing customer needs and preferences pose challenge to commerce businesses on how to maintain and update individual customer profiles to reflect any changes in customers needs and preferences this research set out to investigate how dynamic customer profile for on line customers can be updated and maintained taking into consideration individual web visitors activities the research designed and implemented decision model that analysed on line customers activities during interaction sessions and determined whether to update customers profiles or not evaluation results indicated that the model was able to analyse the on line customers activities from log file and successfully updated the customers profiles based on the customer activities undertaken during the interaction session
unlike conventional rule based knowledge bases kbs that support monotonic reasoning key correctness issue ie the correctness of sub kb with respect to the full kb arises when using kb represented by non monotonic reasoning languages such as answer set programming asp since user may have rights to access only subset of kb the non monotonic nature of asp may cause the occurrence of consequences which are erroneous in the sense that the consequences are not reasonable in the full kb this paper proposes an approach dealing with the problem the main idea is to let the usage of closed world assumptions cwas for literals in kb satisfy certain constraints two kinds of access right propositions are created rule retrieval right propositions to control the access to rules and cwa right propositions to control the usage of cwas for literals based on these right propositions this paper first defines an algorithm for translating an original kb into kb tagged by right propositions and then discusses the right dependency in kb and proposes methods for checking and obtaining set of rights that is closed under set of dependency rules finally several results on the correctness of set of rights in kb are presented which serve as guidelines for the correct use of kb as an example kb of illness related financial support for teachers of university is presented to illustrate the application of our approach
the minimum energy broadcast routing problem was extensively studied during the last years given sample space where wireless devices are distributed the aim is to perform the broadcast communication pattern from given source while minimising the total energy consumption while many papers deal with the dimensional case where the sample space is given by plain area few results are known about the more interesting and practical dimensional case in this paper we study this case and we present tighter analysis of the minimum spanning tree heuristic in order to considerably decrease its approximation factor from the known to roughly this decreases the gap with the known lower bound of given by the so called dimensional kissing number
the collection of digital information by governments corporations and individuals has created tremendous opportunities for knowledge and information based decision making driven by mutual benefits or by regulations that require certain data to be published there is demand for the exchange and publication of data among various parties data in its original form however typically contains sensitive information about individuals and publishing such data will violate individual privacy the current practice in data publishing relies mainly on policies and guidelines as to what types of data can be published and on agreements on the use of published data this approach alone may lead to excessive data distortion or insufficient protection privacy preserving data publishing ppdp provides methods and tools for publishing useful information while preserving data privacy recently ppdp has received considerable attention in research communities and many approaches have been proposed for different data publishing scenarios in this survey we will systematically summarize and evaluate different approaches to ppdp study the challenges in practical data publishing clarify the differences and requirements that distinguish ppdp from other related problems and propose future research directions
customized processors use compiler analysis and design automation techniques to take generalized architectural model and create specific instance of it which is optimized to given application or set of applications these processors offer the promise of satisfying the high performance needs of the embedded community while simultaneously shrinking design times finite state machines fsm are fundamental building block in computer architecture and are used to control and optimize all types of prediction and speculation now even in the embedded space they are used for branch prediction cache replacement policies and confidence estimation and accuracy counters for variety of optimizations in this paper we present framework for automated design of small fsm predictors for customized processors our approach can be used to automatically generate small fsm predictors to perform well over suite of applications tailored to specific application or even specific instruction we evaluate the use of these customized fsm predictors for branch prediction over set of benchmarks
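As an illustration of the kind of small FSM predictor discussed above, here is the classic 2-bit saturating-counter branch predictor in Python; this is a textbook baseline, not one of the paper's automatically generated, application-tailored predictors.

```python
# Illustrative only: the classic 2-bit saturating-counter FSM used for
# branch prediction, not one of the paper's generated predictors.
class TwoBitPredictor:
    """States 0-1 predict not-taken, states 2-3 predict taken."""

    def __init__(self):
        self.state = 2  # start in "weakly taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # saturate at the strongly-taken / strongly-not-taken states
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)

# toy outcome stream: a mostly-taken loop branch with occasional exits
pred, correct = TwoBitPredictor(), 0
outcomes = [True, True, False, True, True, True, False, True]
for taken in outcomes:
    correct += pred.predict() == taken
    pred.update(taken)
print(f"accuracy: {correct / len(outcomes):.2f}")
```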
scaling of interconnects exacerbates the already challenging reliability of on chip networks although many researchers have provided various fault handling techniques in chip multi processors cmps the fault tolerance of the interconnection network is yet to adequately evolve as an end to end recovery approach delays fault detection and complicates recovery to consistent global state in such system link level retransmission is endorsed for recovery making higher level protocol simple in this paper we introduce fault tolerant flow control scheme for soft error handling in on chip networks the fault tolerant flow control recovers errors at link level by requesting retransmission and ensures an error free transmission on flit basis with incorporation of dynamic packet fragmentation dynamic packet fragmentation is adopted as part of fault tolerant flow control to disengage flits from the fault containment and recover the faulty flit transmission thus the proposed router provides high level of dependability at the link level for both datapath and control planes in simulation with injected faults the proposed router is observed to perform well gracefully degrading while exhibiting error coverage in datapath elements the proposed router has been implemented using tsmc nm standard cell library as compared to router which employs triple modular redundancy tmr in datapath elements the proposed router takes less area and consumes less energy per packet on average
systems providing only exact match answers without allowing any kind of preference or approximate queries are not sufficient in many contexts many different approaches have been introduced often incompatible in their setup or proposed implementation this work shows how different kinds of preference queries prefer preference sql and skyline can be combined and answered efficiently using bit sliced index bsi arithmetic this approach has been implemented in dbms and performance results are included showing that the bit sliced index approach is efficient not only in prototype system but in real system
the management of hierarchically organized data is starting to play key role in the knowledge management community due to the proliferation of topic hierarchies for text documents the creation and maintenance of such organized repositories of information requires great deal of human intervention the machine learning community has partially addressed this problem by developing hierarchical supervised classifiers that help people categorize new resources within given hierarchies the worst problem of hierarchical supervised classifiers however is their high demand in terms of labeled examples the number of examples required is related to the number of topics in the taxonomy bootstrapping huge hierarchy with proper set of labeled examples is therefore critical issue this paper proposes some solutions for the bootstrapping problem that implicitly or explicitly use taxonomy definition baseline approach that classifies documents according to the class terms and two clustering approaches whose training is constrained by the a priori knowledge encoded in the taxonomy structure which consists of both terminological and relational aspects in particular we propose the tax som model that clusters set of documents in predefined hierarchy of classes directly exploiting the knowledge of both their topological organization and their lexical description experimental evaluation was performed on set of taxonomies taken from the google and looksmart web directories obtaining good results
the traditional interaction mechanism with database system is through the use of query language the most widely used one being sql however when one is facing situation where he or she has to make minor modification to previously issued sql query either the whole query has to be written from scratch or one has to invoke an editor to edit the query this however is not the way we converse with each other as humans during the course of conversation the preceding interaction is used as context within which many incomplete and or incremental phrases are uniquely and unambiguously interpreted sparing the need to repeat the same things again and again in this paper we present an effective mechanism that allows user to interact with database system in way similar to the way humans converse more specifically incomplete sql queries are accepted as input which are then matched to identified parts of previously issued queries disambiguation is achieved by using various types of semantic information the overall method works independently of the domain under which it is used ie independently of the database schema several algorithms that are variations of the same basic mechanism are proposed they are mutually compared with respect to efficiency and accuracy through limited set of experiments on human subjects the results have been encouraging especially when semantic knowledge from the schema is exploited laying potential foundation for conversational querying in databases
learned activity specific motion models are useful for human pose and motion estimation nevertheless while the use of activity specific models simplifies monocular tracking it leaves open the larger issues of how one learns models for multiple activities or stylistic variations and how such models can be combined with natural transitions between activities this paper extends the gaussian process latent variable model gp lvm to address some of these issues we introduce new approach to constraining the latent space that we refer to as the locally linear gaussian process latent variable model ll gplvm the ll gplvm allows for an explicit prior over the latent configurations that aims to preserve local topological structure in the training data we reduce the computational complexity of the gplvm by adapting sparse gaussian process regression methods to the gp lvm by incorporating sparsification dynamics and back constraints within the ll gplvm we develop general framework for learning smooth latent models of different activities within shared latent space allowing the learning of specific topologies and transitions between different activities
effective system verification requires good specifications the lack of sufficient specifications can lead to misses of critical bugs design re spins and time to market slips in this paper we present new technique for mining temporal specifications from simulation or execution traces of digital hardware design given an execution trace we mine recurring temporal behaviors in the trace that match set of pattern templates subsequently we synthesize them into complex patterns by merging events in time and chaining the patterns using inference rules we specifically designed our algorithm to make it highly efficient and meaningful for digital circuits in addition we propose pattern mining diagnosis framework where specifications mined from correct and erroneous traces are used to automatically localize an error we demonstrate the effectiveness of our approach on industrial size examples by mining specifications from traces of over million cycles in few minutes and use them to successfully localize errors of different types to within module boundaries
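A minimal sketch of mining one simple temporal template ("event a is followed by a different event b within w cycles") from an execution trace; the template set, inference rules, and efficiency techniques of the actual approach are not reproduced here, and the trace format is an assumption.

```python
from collections import defaultdict

def mine_followed_by(trace, window, min_support):
    """Count matches of the hypothetical template 'event a is followed by a
    different event b within `window` cycles' over a time-ordered trace of
    (cycle, event) pairs, keeping pairs with enough support."""
    counts = defaultdict(int)
    for i, (t_a, a) in enumerate(trace):
        for t_b, b in trace[i + 1:]:
            if t_b - t_a > window:
                break          # trace is assumed sorted by cycle
            if b != a:
                counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

trace = [(0, "req"), (2, "gnt"), (5, "req"), (6, "gnt"), (9, "req"), (11, "gnt")]
print(mine_followed_by(trace, window=3, min_support=3))   # {('req', 'gnt'): 3}
```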
mobile robots that interact with humans in an intuitive way must be able to follow directions provided by humans in unconstrained natural language in this work we investigate how statistical machine translation techniques can be used to bridge the gap between natural language route instructions and map of an environment built by robot our approach uses training data to learn to translate from natural language instructions to an automatically labeled map the complexity of the translation process is controlled by taking advantage of physical constraints imposed by the map as result our technique can efficiently handle uncertainty in both map labeling and parsing our experiments demonstrate the promising capabilities achieved by our approach
haskell programmers often use multi parameter type class in which one or more type parameters are functionally dependent on the first although such functional dependencies have proved quite popular in practice they express the programmer’s intent somewhat indirectly developing earlier work on associated data types we propose to add functionally dependent types as type synonyms to type class bodies these associated type synonyms constitute an interesting new alternative to explicit functional dependencies
software monitoring is technique that is well suited to supporting the development of dependable system it has been widely applied not only for this purpose but also for other purposes such as debugging security performance evaluation and enhancement etc however there is an inherent gap between the levels of abstraction of the information that is collected during software monitoring the implementation level and that of the software architecture level where many design decisions are made unless an immediate structural one to one architecture to implementation mapping takes place we need specification language to describe how low level events are related to higher level ones although some specification languages for monitoring have been proposed in the literature they do not provide support up to the software architecture level in addition these languages make it harder to link to and reuse information from other event based models often employed for reliability analysis in this paper we discuss the importance of event description as an integration element for architecting dependable systems
there is growing interest in algorithms for processing and querying continuous data streams ie data seen only once in fixed order with limited memory resources in its most general form data stream is actually an update stream ie comprising data item deletions as well as insertions such massive update streams arise naturally in several application domains eg monitoring of large ip network installations or processing of retail chain transactions estimating the cardinality of set expressions defined over several possibly distributed update streams is perhaps one of the most fundamental query classes of interest as an example such query may ask what is the number of distinct ip source addresses seen in passing packets from both of two given routers but not third router earlier work only addressed very restricted forms of this problem focusing solely on the special case of insert only streams and specific operators eg union in this paper we propose the first space efficient algorithmic solution for estimating the cardinality of full fledged set expressions over general update streams our estimation algorithms are probabilistic in nature and rely on novel hash based synopsis data structure termed the level hash sketch we demonstrate how our level hash sketch synopses can be used to provide low error high confidence estimates for the cardinality of set expressions including operators such as set union intersection and difference over continuous update streams using only space that is significantly sublinear in the sizes of the streaming input multi sets furthermore our estimators never require rescanning or resampling of past stream items regardless of the number of deletions in the stream we also present lower bounds for the problem demonstrating that the space usage of our estimation algorithms is within small factors of the optimal finally we propose an optimized time efficient stream synopsis based on level hash sketches that provides similar strong accuracy space guarantees while requiring only guaranteed logarithmic maintenance time per update thus making our methods applicable for truly rapid rate data streams our results from an empirical study of our synopsis and estimation techniques verify the effectiveness of our approach
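For intuition about hash-based synopses for distinct counting, here is a basic Flajolet-Martin-style sketch in Python; it handles only insert-only streams and a single set, unlike the level hash sketches above, which also support deletions and general set expressions.

```python
import hashlib

def _hash(x, seed):
    # deterministic 64-bit hash per (seed, item) pair
    h = hashlib.sha1(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(h[:8], "big")

def trailing_zeros(n):
    return (n & -n).bit_length() - 1 if n else 64

class FMSketch:
    """Flajolet-Martin style distinct counter: remember, per hash function,
    the maximum number of trailing zero bits seen in any hashed item."""
    def __init__(self, num_hashes=32):
        self.num_hashes = num_hashes
        self.max_tz = [0] * num_hashes

    def add(self, item):
        for s in range(self.num_hashes):
            self.max_tz[s] = max(self.max_tz[s], trailing_zeros(_hash(item, s)))

    def estimate(self):
        # average the per-hash observations and apply the FM correction constant
        avg = sum(self.max_tz) / self.num_hashes
        return (2 ** avg) / 0.77351

sk = FMSketch()
for ip in (f"10.0.0.{i % 50}" for i in range(1000)):
    sk.add(ip)
print(round(sk.estimate()))   # an estimate near the true count of 50
```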
internet hosting centers serve multiple service sites from common hardware base this paper presents the design and implementation of an architecture for resource management in hosting center operating system with an emphasis on energy as driving resource management issue for large server clusters the goals are to provision server resources for co hosted services in way that automatically adapts to offered load improve the energy efficiency of server clusters by dynamically resizing the active server set and respond to power supply disruptions or thermal events by degrading service in accordance with negotiated service level agreements slas our system is based on an economic approach to managing shared server resources in which services bid for resources as function of delivered performance the system continuously monitors load and plans resource allotments by estimating the value of their effects on service performance greedy resource allocation algorithm adjusts resource prices to balance supply and demand allocating resources to their most efficient use reconfigurable server switching infrastructure directs request traffic to the servers assigned to each service experimental results from prototype confirm that the system adapts to offered load and resource availability and can reduce server energy usage by or more for typical web workload
user behavior information analysis has been shown important for optimization and evaluation of web search and has become one of the major areas in both information retrieval and knowledge management researches this paper focuses on users searching behavior reliability study based on large scale query and click through logs collected from commercial search engines the concept of reliability is defined in probabilistic notion the context of user click behavior on search results is analyzed in terms of relevance five features namely query number click entropy first click ratio last click ratio and rank position are proposed and studied to separate reliable user clicks from the others experimental results show that the proposed method evaluates the reliability of user behavior effectively the auc value of the roc curve is and the algorithm maintains relevant clicks when filtering out low quality clicks
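A small sketch of one of the features listed above, click entropy per query, computed from a (query, clicked result) log; the log format and thresholds here are illustrative assumptions, not the paper's exact definitions.

```python
import math
from collections import Counter, defaultdict

def click_entropy(log):
    """log: iterable of (query, clicked_url) pairs. Returns per-query click
    entropy; low entropy suggests users agree on the relevant result, high
    entropy suggests ambiguous or less reliable click behavior."""
    clicks = defaultdict(Counter)
    for query, url in log:
        clicks[query][url] += 1
    entropy = {}
    for query, counter in clicks.items():
        total = sum(counter.values())
        entropy[query] = -sum((c / total) * math.log2(c / total)
                              for c in counter.values())
    return entropy

log = [("python", "docs.python.org"), ("python", "docs.python.org"),
       ("python", "wikipedia.org"), ("jaguar", "cars.example"),
       ("jaguar", "zoo.example"), ("jaguar", "wikipedia.org")]
print(click_entropy(log))
```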
we present our experience in implementing group communication toolkit in objective caml dialect of the ml family of programming languages we compare the toolkit both quantitatively and qualitatively to predecessor toolkit which was implemented in our experience shows that using the high level abstraction features of ml gives substantial advantages some of these features such as automatic memory management and message marshalling allowed us to concentrate on those pieces of the implementation which required careful attention in order to achieve good performance we conclude with set of suggested changes to ml implementations
soft state is an often cited yet vague concept in network protocol design in which two or more network entities intercommunicate in loosely coupled often anonymous fashion researchers often define this concept operationally if at all rather than analytically source of soft state transmits periodic refresh messages over lossy communication channel to one or more receivers that maintain copy of that state which in turn expires if the periodic updates cease though number of crucial internet protocol building blocks are rooted in soft state based designs eg rsvp refresh messages pim membership updates various routing protocol updates rtcp control messages directory services like sap and so forth controversy is building as to whether the performance overhead of soft state refresh messages justify their qualitative benefit of enhanced system robustness we believe that this controversy has risen not from fundamental performance tradeoffs but rather from our lack of comprehensive understanding of soft state to better understand these tradeoffs we propose herein formal model for soft state communication based on probabilistic delivery model with relaxed reliability using this model we conduct queueing analysis and simulation to characterize the data consistency and performance tradeoffs under range of workloads and network loss rates we then extend our model with feedback and show through simulation that adding feedback dramatically improves data consistency by up to without increasing network resource consumption our model not only provides foundation for understanding soft state but also induces new fundamental transport protocol based on probabilistic delivery toward this end we sketch our design of the soft state transport protocol sstp which enjoys the robustness of soft state while retaining the performance benefit of hard state protocols like tcp through its judicious use of feedback
the development of the semantic web will require agents to use common domain ontologies to facilitate communication of conceptual knowledge however the proliferation of domain ontologies may also result in conflicts between the meanings assigned to the various terms that is agents with diverse ontologies may use different terms to refer to the same meaning or the same term to refer to different meanings agents will need method for learning and translating similar semantic concepts between diverse ontologies only recently have researchers diverged from the last decade’s common ontology paradigm to paradigm involving agents that can share knowledge using diverse ontologies this paper describes how we address this agent knowledge sharing problem of how agents deal with diverse ontologies by introducing methodology and algorithms for multi agent knowledge sharing and learning in peer to peer setting we demonstrate how this approach will enable multi agent systems to assist groups of people in locating translating and sharing knowledge using our distributed ontology gathering group integration environment doggie and describe our proof of concept experiments doggie synthesizes agent communication machine learning and reasoning for information sharing in the web domain
we propose an energy efficient framework called saf for approximate querying and clustering of nodes in sensor network saf uses simple time series forecasting models to predict sensor readings the idea is to build these local models at each node transmit them to the root of the network the sink and use them to approximately answer user queries our approach dramatically reduces communication relative to previous approaches for querying sensor networks by exploiting properties of these local models since each sensor communicates with the sink only when its local model varies due to changes in the underlying data distribution in our experimental results performed on trace of real data we observed on average about message transmissions from each sensor over week including the learning phase to correctly predict temperatures to within the target error bound saf also provides mechanism to detect data similarities between nodes and organize nodes into clusters at the sink at no additional communication cost this is again achieved by exploiting properties of our local time series models and by means of novel definition of data similarity between nodes that is based not on raw data but on the prediction values our clustering algorithm is both very efficient and provably optimal in the number of clusters our clusters have several interesting features first they can capture similarity between far away nodes that are not geographically adjacent second cluster membership adapts to variations in sensors local models third nodes within cluster are not required to track the membership of other nodes in the cluster we present number of simulation based experimental results that demonstrate these properties of saf
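An illustrative sketch of the node-side idea (keep a local model, contact the sink only when the model's prediction error exceeds a threshold); the constant "last reported value" model used here is deliberately simpler than the time-series forecasting models of SAF.

```python
class SuppressingSensor:
    """Node-side sketch: maintain a trivial local model (the last reported
    value) and transmit an update to the sink only when the observed reading
    drifts from the model by more than `threshold`."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.model_value = None
        self.messages_sent = 0

    def observe(self, reading):
        if self.model_value is None or abs(reading - self.model_value) > self.threshold:
            self.model_value = reading      # update local model ...
            self.messages_sent += 1         # ... and push it to the sink
        return self.model_value             # what the sink would predict

readings = [20.0, 20.2, 20.4, 21.5, 21.6, 23.0, 22.9]
node = SuppressingSensor(threshold=1.0)
for r in readings:
    node.observe(r)
print(node.messages_sent, "messages for", len(readings), "readings")  # 3 for 7
```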
good spatial locality alleviates both the latency and bandwidth problem of memory by boosting the effect of prefetching and improving the utilization of cache however conventional definitions of spatial locality are inadequate for programmer to precisely quantify the quality of program to identify causes of poor locality and to estimate the potential by which spatial locality can be improved this paper describes new component based model for spatial locality it is based on measuring the change of reuse distances as function of the data block size it divides spatial locality into components at program and behavior levels while the base model is costly because it requires the tracking of the locality of every memory access the overhead can be reduced by using small inputs and by extending sampling based tool the paper presents the result of the analysis for large set of benchmarks the cost of the analysis and the experience of user study in which the analysis helped to locate data layout problem and improve performance by with line change in an application with over lines
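A naive sketch of the measurement the model above builds on: reuse distance per data block, recomputed for different block sizes; the quadratic implementation and the byte-address trace format are illustrative assumptions, not the sampling-based tool described above.

```python
def reuse_distances(addresses, block_size):
    """Return, for a trace of byte addresses, the list of reuse distances:
    the number of distinct blocks touched between two consecutive accesses
    to the same block. Naive O(n*m) version for clarity, not performance."""
    last_seen = {}           # block id -> index of its most recent access
    blocks_in_order = []     # access i -> block id
    distances = []
    for i, addr in enumerate(addresses):
        block = addr // block_size
        if block in last_seen:
            window = blocks_in_order[last_seen[block] + 1:i]
            distances.append(len(set(window)))
        last_seen[block] = i
        blocks_in_order.append(block)
    return distances

trace = [0, 64, 128, 0, 192, 64, 0]
for bs in (32, 64, 256):
    # larger blocks fold nearby addresses together, shrinking reuse distances
    print(bs, reuse_distances(trace, bs))
```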
this paper presents novel technique anatomy for publishing sensitive data anatomy releases all the quasi identifier and sensitive values directly in two separate tables combined with grouping mechanism this approach protects privacy and captures large amount of correlation in the microdata we develop linear time algorithm for computing anatomized tables that obey the diversity privacy requirement and minimize the error of reconstructing the microdata extensive experiments confirm that our technique allows significantly more effective data analysis than the conventional publication method based on generalization specifically anatomy permits aggregate reasoning with average error below which is lower than the error obtained from generalized table by orders of magnitude
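A greedy sketch of an anatomy-style release: records are bucketized by sensitive value, each group draws from distinct sensitive values, and the output is a quasi-identifier table and a sensitive table linked only by group id. The parameter l and the greedy strategy are illustrative; this is not the paper's linear-time algorithm and omits its reconstruction-error minimization.

```python
from collections import defaultdict

def anatomize(records, l):
    """Greedy anatomy-style sketch. `records` is a list of
    (quasi_identifier_tuple, sensitive_value) pairs; each emitted group holds
    l records with l distinct sensitive values, and the two output tables
    share only the group id. Leftover records are simply dropped here."""
    buckets = defaultdict(list)
    for qi, s in records:
        buckets[s].append(qi)
    qi_table, sens_table, gid = [], [], 0
    while len(buckets) >= l:
        gid += 1
        # take one record from each of the l currently largest buckets
        largest = sorted(buckets, key=lambda k: len(buckets[k]), reverse=True)[:l]
        for s in largest:
            qi_table.append((gid, buckets[s].pop()))
            sens_table.append((gid, s))
            if not buckets[s]:
                del buckets[s]
    return qi_table, sens_table

records = [(("23", "m", "10001"), "flu"), (("27", "f", "10002"), "cold"),
           (("31", "m", "10003"), "flu"), (("35", "f", "10004"), "hiv"),
           (("41", "m", "10005"), "cold"), (("45", "f", "10006"), "hiv")]
qi_table, sens_table = anatomize(records, l=2)
print(qi_table)
print(sens_table)
```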
in response to long lasting anticipation by the java community version of the java platform referred to as java introduced generic types and methods to the java language the java generics are significant enhancement to the language expressivity because they allow straightforward composition of new generic classes from existing ones while reducing the need for plethora of type casts while the java generics are expressive the chosen implementation method type erasure has triggered undesirable orthogonality violations this paper identifies six cases of orthogonality violations in the java generics and demonstrates how these violations are mandated by the use of type erasure the paper also compares the java cases of orthogonality violations to compatible cases in and nextgen and analyzes the tradeoffs in the three approaches the conclusion is that java users face new challenges number of generic type expressions are forbidden while others that are allowed are left unchecked by the compiler
compared to traditional text classification with flat category set or small hierarchy of categories classifying web pages to large scale hierarchy such as open directory project odp and yahoo directory is challenging while recently proposed deep classification method makes the problem tractable it still suffers from low classification performance major problem is the lack of training data which is unavoidable with such huge hierarchy training pages associated with the category nodes are short and their distributions are skewed to alleviate the problem we propose new training data selection strategy and naive bayes combination model which utilize both local and global information we conducted series of experiments with the odp hierarchy containing more than categories to show that the proposed method of using both local and global information indeed helps avoiding the training data sparseness problem outperforming the state of the art method
without proper simplification techniques database integrity checking can be prohibitively time consuming several methods have been developed for producing simplified incremental checks for each update but none until now of sufficient quality and generality for providing true practical impact and the present paper is an attempt to fill this gap on the theoretical side general characterization is introduced of the problem of simplification of integrity constraints and natural definition is given of what it means for simplification procedure to be ideal we prove that ideality of simplification is strictly related to query containment in fact an ideal simplification procedure can only exist in database languages for which query containment is decidable however simplifications that do not qualify as ideal may also be relevant for practical purposes we present concrete approach based on transformation operators that apply to integrity constraints written in rich datalog like language with negation the resulting procedure produces at design time simplified constraints for parametric transaction patterns which can then be instantiated and checked for consistency at run time these tests take place before the execution of the update so that only consistency preserving updates are eventually given to the database the extension to more expressive languages and the application of the framework to other contexts such as data integration and concurrent database systems are also discussed our experiments show that the simplifications obtained with our method may give rise to much better performance than with previous methods and that further improvements are achieved by checking consistency before executing the update
what distinguishes commerce from ordinary commerce what distinguishes it from distributed computation in this paper we propose performative theory of commerce drawing on speech act theory in which commerce exchanges are promises of future commercial actions whose real world meanings are constructed jointly and incrementally we then define computational model for this theory called posit spaces along with the syntax and semantics for an agent interaction protocol the posit spaces protocol or psp this protocol enables participants in multi agent commercial interaction to propose accept modify and revoke joint commitments our work integrates three strands of prior research the theory of tuple spaces in distributed computation formal dialogue games from argumentation theory and the study of commitments in multi agent systems
traffic anomalies such as failures and attacks are increasing in frequency and severity and thus identifying them rapidly and accurately is critical for large network operators the detection typically treats the traffic as collection of flows and looks for heavy changes in traffic patterns eg volume number of connections however as link speeds and the number of flows increase keeping per flow state is not scalable the recently proposed sketch based schemes are among the very few that can detect heavy changes and anomalies over massive data streams at network traffic speeds however sketches do not preserve the key eg source ip address of the flows hence even if anomalies are detected it is difficult to infer the culprit flows making it big practical hurdle for online deployment meanwhile the number of keys is too large to record to address this challenge we propose efficient reversible hashing algorithms to infer the keys of culprit flows from sketches without storing any explicit key information no extra memory or memory accesses are needed for recording the streaming data meanwhile the heavy change detection daemon runs in the background with space complexity and computational time sublinear to the key space size this short paper describes the conceptual framework of the reversible sketches as well as some initial approaches for implementation see the referenced work for the optimized algorithms in detail we further apply various ip mangling algorithms and bucket classification methods to reduce the false positives and false negatives evaluated with netflow traffic traces of large edge router we demonstrate that the reverse hashing can quickly infer the keys of culprit flows even for many changes with high accuracy
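To illustrate the sketch-based change-detection setting, here is a plain count-min sketch whose per-epoch summaries can be compared to estimate how much a given key's traffic changed; note that this baseline still needs the key in hand to query, which is exactly the limitation the reversible sketches above remove.

```python
import hashlib

class CountMinSketch:
    """Plain count-min sketch; two per-epoch sketches can be queried and
    compared to estimate per-key traffic change between intervals."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        h = hashlib.sha1(f"{row}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

def change_estimate(before, after, key):
    # approximate change in this key's count between epochs (collisions add noise)
    return after.estimate(key) - before.estimate(key)

epoch1, epoch2 = CountMinSketch(), CountMinSketch()
for src in ["10.0.0.1"] * 100 + ["10.0.0.2"] * 5:
    epoch1.add(src)
for src in ["10.0.0.1"] * 100 + ["10.0.0.2"] * 500:
    epoch2.add(src)
print(change_estimate(epoch1, epoch2, "10.0.0.2"))   # large change detected
```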
software bugs in routers lead to network outages security vulnerabilities and other unexpected behavior rather than simply crashing the router bugs can violate protocol semantics rendering traditional failure detection and recovery techniques ineffective handling router bugs is an increasingly important problem as new applications demand higher availability and networks become better at dealing with traditional failures in this paper we tailor software and data diversity sdd to the unique properties of routing protocols so as to avoid buggy behavior at run time our bug tolerant router executes multiple diverse instances of routing software and uses voting to determine the output to publish to the forwarding table or to advertise to neighbors we design and implement router hypervisor that makes this parallelism transparent to other routers handles fault detection and booting of new router instances and performs voting in the presence of routing protocol dynamics without needing to modify software of the diverse instances experiments with bgp message traces and open source software running on our linux based router hypervisor demonstrate that our solution scales to large networks and efficiently masks buggy behavior
commercially available database systems do not meet the information and processing needs of design and manufacturing environments new generation of systems engineering information systems must be built to meet these needs the architectural and computational aspects of such systems are addressed and solutions are proposed the authors argue that mainframe workstation architecture is needed to provide distributed functionality while ensuring high availability and low communication overhead that explicit control of metaknowledge is needed to support extendibility and evolution that large rule bases are needed to make the knowledge of the systems active and that incremental computation models are needed to achieve the required performance of such engineering information systems
large number of call graph construction algorithms for object oriented and functional languages have been proposed each embodying different tradeoffs between analysis cost and call graph precision in this article we present unifying framework for understanding call graph construction algorithms and an empirical comparison of representative set of algorithms we first present general parameterized algorithm that encompasses many well known and novel call graph construction algorithms we have implemented this general algorithm in the vortex compiler infrastructure mature multilanguage optimizing compiler the vortex implementation provides level playing field for meaningful cross algorithm performance comparisons the costs and benefits of number of call graph construction algorithms are empirically assessed by applying their vortex implementation to suite of sizeable to lines of code cecil and java programs for many of these applications interprocedural analysis enabled substantial speed ups over an already highly optimized baseline furthermore significant fraction of these speed ups can be obtained through the use of scalable near linear time call graph construction algorithm
in this paper we present the java aspect components jac framework for building aspect oriented distributed applications in java this paper describes the aspect oriented programming model and the architectural details of the framework implementation the framework enables extension of application semantics for handling well separated concerns this is achieved with software entity called an aspect component ac acs provide distributed pointcuts dynamic wrappers and metamodel annotations distributed pointcuts are key feature of our framework they enable the definition of crosscutting structures that do not need to be located on single host acs are dynamic they can be added removed and controlled at runtime this enables our framework to be used in highly dynamic environments where adaptable software is needed
scheduling algorithms used in compilers traditionally focus on goals such as reducing schedule length and register pressure or producing compact code in the context of hardware synthesis system where the schedule is used to determine various components of the hardware including datapath storage and interconnect the goals of scheduler change drastically in addition to achieving the traditional goals the scheduler must proactively make decisions to ensure efficient hardware is produced this paper proposes two exact solutions for cost sensitive modulo scheduling one based on an integer linear programming formulation and another based on branch and bound search to achieve reasonable compilation times decomposition techniques to break down the complex scheduling problem into phase ordered sub problems are proposed the decomposition techniques work either by partitioning the dataflow graph into smaller subgraphs and optimally scheduling the subgraphs or by splitting the scheduling problem into two phases time slot and resource assignment the effectiveness of cost sensitive modulo scheduling in minimizing the costs of function units register structures and interconnection wires are evaluated within fully automatic synthesis system for loop accelerators the cost sensitive modulo scheduler increases the efficiency of the resulting hardware significantly compared to both traditional cost unaware and greedy cost aware modulo schedulers
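For context, the standard lower bound on the initiation interval that any modulo scheduler, cost-aware or not, starts from, written in textbook notation (N_r operations competing for C_r units of resource r; delay and distance taken over dependence cycles c); these symbols are not taken from the paper.

```latex
% Standard minimum initiation interval bound used in modulo scheduling
% (textbook notation, not specific to the cost-sensitive formulation above):
\mathrm{ResMII} = \max_{r \in \mathrm{resources}} \left\lceil \frac{N_r}{C_r} \right\rceil,
\qquad
\mathrm{RecMII} = \max_{c \in \mathrm{cycles}} \left\lceil \frac{\mathrm{delay}(c)}{\mathrm{distance}(c)} \right\rceil,
\qquad
\mathrm{MII} = \max(\mathrm{ResMII}, \mathrm{RecMII})
```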
this study develops theory of personal values and trust in mobile commerce commerce service systems research hypotheses involve the satisfaction formation processes and usage continuance intentions in commerce service context and participants were selected from three private wireless telecommunication service providers in taiwan results showed that consumers might not use the commerce service in the future without being totally satisfied with the system that their customer values and mobile technology trusting expectations were very important in continued commerce service usage behaviour and that providers might not yet fulfil consumers commerce service needs even when consumers are satisfied with the commerce service delivered
files classes or methods have frequently been investigated in recent research on co change in this paper we present first study at the level of lines to identify line changes across several versions we define the annotation graph which captures how lines evolve over time the annotation graph provides more fine grained software evolution information such as life cycles of each line and related changes whenever developer changed line of version.txt she also changed line of library.java
we consider lock free synchronization for dynamic embedded real time systems that are subject to resource overloads and arbitrary activity arrivals we model activity arrival behaviors using the unimodal arbitrary arrival model or uam uam embodies stronger adversary than most traditional arrival models we derive an upper bound on lock free retries under the uam with utility accrual scheduling the first such result we establish the tradeoffs between lock free and lock based sharing under uam these include conditions under which activities accrued timeliness utility is greater under lock free than lock based and the consequent lower and upper bound on the total accrued utility that is possible with lock free and lock based sharing we confirm our analytical results with posix rtos implementation
in this paper study is described which investigates differences in game experience between the use of iconic and symbolic tangibles in digital tabletop interaction to enable this study new game together with two sets of play pieces iconic and symbolic was developed and used in an experiment with participants in this experiment the understanding of the game the understanding of the play pieces and the fun experience were tested both the group who played with iconic play pieces and the group who played with symbolic play pieces were shown to have comparable fun experience and understanding of the game however the understanding of the play pieces was higher in the iconic group and large majority of both groups preferred to play with iconic play pieces rather than symbolic play pieces
packet based on chip networks are increasingly being adopted in complex system on chip soc designs supporting numerous homogeneous and heterogeneous functional blocks these network on chip noc architectures are required to not only provide ultra low latency but also occupy small footprint and consume as little energy as possible further reliability is rapidly becoming major challenge in deep sub micron technologies due to the increased prominence of permanent faults resulting from accelerated aging effects and manufacturing testing challenges towards the goal of designing low latency energyefficient and reliable on chip communication networks we propose novel fine grained modular router architecture the proposed architecture employs decoupled parallel arbiters and uses smaller crossbars for row and column connections to reduce output port contention probabilities as compared to existing designs furthermore the router employs new switch allocation technique known as mirroring effect to reduce arbitration depth and increase concurrency in addition the modular design permits graceful degradation of the network in the event of permanent faults and also helps to reduce the dynamic power consumption our simulation results indicate that in an mesh network the proposed architecture reduces packet latency by and power consumption by as compared to two existing router architectures evaluation using combined performance energy and fault tolerance metric indicates that the proposed architecture provides overall improvement compared to the two earlier routers
the world wide web has undergone major changes in recent years the idea to see the web as platform for services instead of one way source of information has come along with number of new applications such as photo and video sharing portals and wikisin this paper we study how these changes affect the nature of the data distributed over the world wide web to do so we compare two data traces collected at the web proxy server of the rwth aachen the first trace was recorded in the other more than seven years later in we show the major differences and the similarities between the two traces and compare our observations with other work the results indicate that traditional proxy caching is no longer effective in typical university networks
transactions have been around since the seventies to provide reliable information processing in automated information systems originally developed for simple debit credit style database operations in centralized systems they have moved into much more complex application domains including aspects like distribution process orientation and loose coupling the amount of published research work on transactions is huge and number of overview papers and books already exist concise historic analysis providing an overview of the various phases of development of transaction models and mechanisms in the context of growing complexity of application domains is still missing however to fill this gap this paper presents historic overview of transaction models organized in several transaction management eras thereby investigating numerous transaction models ranging from the classical flat transactions via advanced and workflow transactions to the web services and grid transaction models the key concepts and techniques with respect to transaction management are investigated placing well known research efforts in historical perspective reveals specific trends and developments in the area of transaction management as such this paper provides comprehensive structured overview of developments in the area
we describe compression model for semistructured documents called structural contexts model scm which takes advantage of the context information usually implicit in the structure of the text the idea is to use separate model to compress the text that lies inside each different structure type eg different xml tag the intuition behind scm is that the distribution of all the texts that belong to given structure type should be similar and different from that of other structure types we mainly focus on semistatic models and test our idea using word based huffman method this is the standard for compressing large natural language text databases because random access partial decompression and direct search of the compressed collection is possible this variant dubbed scmhuff retains those features and improves huffman’s compression ratios we consider the possibility that storing separate models may not pay off if the distribution of different structure types is not different enough and present heuristic to merge models with the aim of minimizing the total size of the compressed database this gives an additional improvement over the plain technique the comparison against existing prototypes shows that among the methods that permit random access to the collection scmhuff achieves the best compression ratios better than the closest alternative from purely compression aimed perspective we combine scm with ppm modeling separate ppm model is used to compress the text that lies inside each different structure type the result scmppm does not permit random access nor direct search in the compressed text but it gives better compression ratios than other techniques for texts longer than mb
this paper presents general framework for determining average program execution times and their variance based on the program’s interval structure and control dependence graph average execution times and variance values are computed using frequency information from an optimized counter based execution profile of the program
using analysis simulation and experimentation we examine the threat against anonymous communications posed by passive logging attacks in previous work we analyzed the success of such attacks under various assumptions here we evaluate the effects of these assumptions more closely first we analyze the onion routing based model used in prior work in which fixed set of nodes remains in the system indefinitely we show that for this model by removing the assumption of uniformly random selection of nodes for placement in the path initiators can greatly improve their anonymity second we show by simulation that attack times are significantly lower in practice than bounds given by analytical results from prior work third we analyze the effects of dynamic membership model in which nodes are allowed to join and leave the system we show that all known defenses fail more quickly when the assumption of static node set is relaxed fourth intersection attacks against peer to peer systems are shown to be an additional danger either on their own or in conjunction with the predecessor attack finally we address the question of whether the regular communication patterns required by the attacks exist in real traffic we collected and analyzed the web requests of users to determine the extent to which basic patterns can be found we show that for our study frequent and repeated communication to the same web site is common
methods of top search with no random access can be used to find best objects using sorted access to the sources of attribute values in this paper we present new heuristics over the nra algorithm that can be used for fast search of top objects using wide range of user preferences nra algorithm usually needs periodical scan of large number of candidates during the computation in this paper we propose methods of no random access top search that optimize the candidate list maintenance during the computation to speed up the search the proposed methods are compared to table scan method typically used in databases we present results of experiments showing speed improvement depending on number of object attributes expressed in user preferences or selectivity of user preferences
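A compact sketch of the core NRA bookkeeping (lower and upper bounds maintained from sorted access only, stopping when the k-th best lower bound dominates everything else), assuming sum aggregation and scores in [0, 1]; the candidate-list optimizations proposed in the paper are not shown.

```python
def nra_topk(lists, k):
    """No-random-access top-k sketch: `lists` are m score lists, each sorted
    descending as (object_id, score) pairs with scores in [0, 1]; the
    aggregate score of an object is the sum of its per-list scores."""
    m = len(lists)
    seen = {}                                  # obj -> {list index: score}
    thresholds = [lst[0][1] for lst in lists]  # last score read per list

    def lower(obj):        # worst case: unseen attributes count as 0
        return sum(seen[obj].values())

    def upper(obj):        # best case: unseen attributes at the list threshold
        return sum(seen[obj].get(i, thresholds[i]) for i in range(m))

    for depth in range(max(len(lst) for lst in lists)):
        for i, lst in enumerate(lists):        # one round of sorted accesses
            if depth < len(lst):
                obj, score = lst[depth]
                thresholds[i] = score
                seen.setdefault(obj, {})[i] = score
        ranked = sorted(seen, key=lower, reverse=True)
        if len(ranked) >= k:
            kth = lower(ranked[k - 1])
            # stop when nothing outside the current top-k can still beat it
            if kth >= sum(thresholds) and all(upper(o) <= kth for o in ranked[k:]):
                return ranked[:k]
    return sorted(seen, key=lower, reverse=True)[:k]

l1 = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.2)]
l2 = [("b", 0.95), ("a", 0.7), ("d", 0.6), ("c", 0.1)]
print(nra_topk([l1, l2], k=2))   # ['b', 'a']
```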
the mapreduce framework is increasingly being used to analyze large volumes of data one important type of data analysis done with mapreduce is log processing in which click stream or an event log is filtered aggregated or mined for patterns as part of this analysis the log often needs to be joined with reference data such as information about users although there have been many studies examining join algorithms in parallel and distributed dbmss the mapreduce framework is cumbersome for joins mapreduce programmers often use simple but inefficient algorithms to perform joins in this paper we describe crucial implementation details of number of well known join strategies in mapreduce and present comprehensive experimental comparison of these join techniques on node hadoop cluster our results provide insights that are unique to the mapreduce platform and offer guidance on when to use particular join algorithm on this platform
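A toy repartition (reduce-side) join expressed as plain Python map/reduce functions, with the shuffle simulated by grouping on the join key; the relation tags, record layouts, and in-memory shuffle are illustrative simplifications rather than Hadoop API code.

```python
from collections import defaultdict
from itertools import product

def map_log(record):
    # log record: (user_id, url); tag each value with its source relation
    user_id, url = record
    yield user_id, ("L", url)

def map_users(record):
    # reference record: (user_id, country)
    user_id, country = record
    yield user_id, ("R", country)

def reduce_join(key, tagged_values):
    # buffer both sides for this join key and emit their cross product
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for url, country in product(left, right):
        yield key, url, country

# simulate the shuffle phase: group all map output by join key
logs = [(1, "/a"), (2, "/b"), (1, "/c")]
users = [(1, "DE"), (2, "US")]
groups = defaultdict(list)
for rec in logs:
    for k, v in map_log(rec):
        groups[k].append(v)
for rec in users:
    for k, v in map_users(rec):
        groups[k].append(v)
for k, vals in groups.items():
    for row in reduce_join(k, vals):
        print(row)
```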
we propose dialogue game protocol for purchase negotiation dialogues which identifies appropriate speech acts defines constraints on their utterances and specifies the different sub tasks agents need to perform in order to engage in dialogues according to this protocol our formalism combines dialogue game similar to those in the philosophy of argumentation with model of rational consumer purchase decision behaviour adopted from marketing theory in addition to the dialogue game protocol we present portfolio of decision mechanisms for the participating agents engaged in the dialogue and use these to provide our formalism with an operational semantics we show that these decision mechanisms are sufficient to generate automated purchase decision dialogues between autonomous software agents interacting according to our proposed dialogue game protocol
the increasing size and complexity of many software systems demand greater emphasis on capturing and maintaining knowledge at many different levels within the software development process this knowledge includes descriptions of the hardware and software components and their behavior external and internal design specifications and support for system testing the knowledge based software engineering kbse research paradigm is concerned with systems that use formally represented knowledge with associated inference procedures to support the various subactivities of software development as they grow in scale kbse systems must balance expressivity and inferential power with the real demands of knowledge base construction maintenance performance and comprehensibility description logics dls possess several features terminological orientation formal semantics and efficient reasoning procedures which offer an effective tradeoff of these factors we discuss three kbse systems in which dls capture some of the requisite knowledge needed to support design coding and testing activities we then survey some alternative approaches to dls in kbse systems we close with discussion of the benefits of dls and ways to address some of their limitations
this paper shows an asymptotically tight analysis of the certified write all algorithm called awt that was introduced by anderson and woll siam comput and method for creating near optimal instances of the algorithm this algorithm is the best known deterministic algorithm that can be used to simulate synchronous parallel processors on asynchronous processors the algorithm is instantiated with permutations on where can be chosen from wide range of values when implementing simulation on specific parallel system with processors one would like to select the best possible value of and the best possible permutations in order to maximize the efficiency of the simulationthis paper shows that work complexity of any instance of awt is logq where is the number of permutations selected and is value related to their combinatorial properties the choice of turns out to be critical for obtaining an instance of the awt algorithm with near optimal work for any and any large enough work of any instance of the algorithm must be at least ln ln ln under certain conditions however that is about ln ln ln and for infinitely many large enough this lower bound can be nearly attained by instances of the algorithm that use certain permutations and have work at most ln ln ln the paper also shows penalty for not selecting well when is significantly away from ln ln ln then work of any instance of the algorithm with this displaced must be considerably higher than otherwise
two or more components eg objects modules or programs interoperate when they exchange data such as xml data using application programming interface api calls exported by xml parsers remains primary mode of accessing and manipulating xml and these api calls lead to various run time errors in components that exchange xml data currently no tool checks the source code of interoperating components for potential flaws caused by third party api calls that lead to incorrect xml data exchanges and runtime errors even when components are located within the same application our solution combines program abstraction and symbolic execution in order to reengineer the approximate schema of xml data that would be output by component this schema is compared using bisimulation with the schema of xml data that is expected by some other components we describe our approach and give our error checking algorithm we implemented our approach in tool that we used on open source and commercial systems and discovered errors that were not detected during their design and testing
outsourcing the training of support vector machines svm to external service providers benefits the data owner who is not familiar with the techniques of the svm or has limited computing resources in outsourcing the data privacy is critical issue for some legal or commercial reasons since there may be sensitive information contained in the data existing privacy preserving svm works are either not applicable to outsourcing or weak in security in this paper we propose scheme for privacy preserving outsourcing the training of the svm without disclosing the actual content of the data to the service provider in the proposed scheme the data sent to the service provider is perturbed by random transformation and the service provider trains the svm for the data owner from the perturbed data the proposed scheme is stronger in security than existing techniques and incurs very little redundant communication and computation cost
representation exposure is well known problem in the object oriented realm object encapsulation mechanisms have established tradition for solving this problem based on principle of reference containment this paper proposes novel type system which is based on different principle we call effect encapsulation which confines side effects rather than object references according to an ownership structure compared to object encapsulation effect encapsulation liberates us from the restriction on object referenceability and offers more flexibility in this paper we show that effect encapsulation can be statically type checked
set based program analysis establishes constraints between sets of abstract values for all expressions in program solving the system of constraints produces conservative approximation to the program’s runtime flow of values some practical set based analyses use explicit selectors to extract the relevant values from an approximation set for example if the analysis needs to determine the possible return values of procedure it uses the appropriate selector to extract the relevant component from the abstract representation of the procedure in this paper we show that this selector based approach complicates the constraint solving phase of the analysis too much and thus fails to scale up to realistic programming languages we demonstrate this claim with full fledged value flow analysis for case lambda multi branched version of lambda we show how both the theoretical underpinnings and the practical implementation become too complex in response we present variant of set based closure analysis that computes equivalent results in much more efficient manner
we argue that runtime program transformation partial evaluation and dynamic compilation are essential tools for automated generation of flexible highly interactive graphical interfaces in particular these techniques help bridge the gap between high level functional description and an efficient implementation to support our claim we describe our application of these techniques to functional implementation of vision real time visualization system that represents multivariate relations as nested interactors and to auto visual rule based system that designs vision visualizations from high level task specifications vision visualizations are specified using simple functional language these programs are transformed into cached dataflow graph partial evaluator is used on particular computation intensive function applications and the results are compiled to native code the functional representation simplifies generation of correct code and the program transformations ensure good performance we demonstrate why these transformations improve performance and why they cannot be done at compile time
building rules on top of ontologies is the ultimate goal of the logical layer of the semantic web to this aim an ad hoc markup language for this layer is currently under discussion it is intended to follow the tradition of hybrid knowledge representation and reasoning systems such as AL-log that integrates the description logic ALC and the function free horn clausal language datalog in this paper we consider the problem of automating the acquisition of these rules for the semantic web we propose general framework for rule induction that adopts the methodological apparatus of inductive logic programming and relies on the expressive and deductive power of AL-log the framework is valid whatever the scope of induction description versus prediction is yet for illustrative purposes we also discuss an instantiation of the framework which aims at description and turns out to be useful in ontology refinement
we present set of techniques for reducing the memory consumption of object oriented programs these techniques include analysis algorithms and optimizations that use the results of these analyses to eliminate fields with constant values reduce the sizes of fields based on the range of values that can appear in each field and eliminate fields with common default values or usage patterns we apply these optimizations both to fields declared by the programmer and to implicit fields in the runtime object header although it is possible to apply these techniques to any object oriented program we expect they will be particularly appropriate for memory limited embedded systems we have implemented these techniques in the mit flex compiler system and applied them to the programs in the specjvm benchmark suite our experimental results show that our combined techniques can reduce the maximum live heap size required for the programs in our benchmark suite by as much as some of the optimizations reduce the overall execution time others may impose modest performance penalties
next generation decision support applications besides being capable of processing huge amounts of data require the ability to integrate and reason over data from multiple heterogeneous data sources often these data sources differ in variety of aspects such as their data models the query languages they support and their network protocols also typically they are spread over wide geographical area the cost of processing decision support queries in such setting is quite high however processing these queries often involves redundancies such as repeated access of same data source and multiple execution of similar processing sequences minimizing these redundancies would significantly reduce the query processing cost in this paper we propose an architecture for processing complex decision support queries involving multiple heterogeneous data sources introduce the notion of transient views materialized views that exist only in the context of execution of query that is useful for minimizing the redundancies involved in the execution of these queries develop cost based algorithm that takes query plan as input and generates an optimal covering plan by minimizing redundancies in the original plan validate our approach by means of an implementation of the algorithms and detailed performance study based on tpc benchmark queries on commercial database system and finally compare and contrast our approach with work in related areas in particular the areas of answering queries using views and optimization using common sub expressions our experiments demonstrate the practicality and usefulness of transient views in significantly improving the performance of decision support queries
editor’s note this article describes web based energy estimation tool for embedded systems an interesting feature of this tool is that it performs real time cycle accurate energy measurements on hardware prototype of the processor the authors describe the various steps involved in using the tool and present case studies to illustrate its utility anand raghunathan nec laboratories
warping the pointer across monitor bezels has previously been demonstrated to be both significantly faster and preferred to the standard mouse behavior when interacting across displays in homogeneous multi monitor configurations complementing this work we present user study that compares the performance of four pointer warping strategies including previously untested frame memory placement strategy in heterogeneous multi monitor environments where displays vary in size resolution and orientation our results show that new frame memory pointer warping strategy significantly improved targeting performance up to in some cases in addition our study showed that when transitioning across screens the mismatch between the visual and the device space has significantly bigger impact on performance than the mismatch in orientation and visual size alone for mouse operation in highly heterogeneous multi monitor environment all our participants strongly preferred using pointer warping over the regular mouse behavior
cloud computing is disruptive trend that is changing the way we use computers the key underlying technology in cloud infrastructures is virtualization so much so that many consider virtualization to be one of the key features rather than simply an implementation detail unfortunately the use of virtualization is the source of significant security concern because multiple virtual machines run on the same server and since the virtualization layer plays considerable role in the operation of virtual machine malicious party has the opportunity to attack the virtualization layer successful attack would give the malicious party control over the all powerful virtualization layer potentially compromising the confidentiality and integrity of the software and data of any virtual machine in this paper we propose removing the virtualization layer while retaining the key features enabled by virtualization our nohype architecture named to indicate the removal of the hypervisor addresses each of the key roles of the virtualization layer arbitrating access to cpu memory and devices acting as network device eg ethernet switch and managing the starting and stopping of guest virtual machines additionally we show that our nohype architecture may indeed be no hype since nearly all of the needed features to realize the nohype architecture are currently available as hardware extensions to processors and devices
effort prediction is very important issue for software project management historical project data sets are frequently used to support such prediction but missing data are often contained in these data sets and this makes prediction more difficult one common practice is to ignore the cases with missing data but this makes the originally small software project database even smaller and can further decrease the accuracy of prediction the alternative is missing data imputation there are many imputation methods software data sets are frequently characterised by their small size but unfortunately sophisticated imputation methods prefer larger data sets for this reason we explore using simple methods to impute missing data in small project effort data sets we propose class mean imputation cmi method based on the nn hot deck imputation method mini to impute both continuous and nominal missing data in small data sets we use an incremental approach to increase the variance of population to evaluate mini and nn and cmi methods as benchmarks we use data sets with cases and cases sampled from larger industrial data set with and missing data percentages respectively we also simulate missing completely at random mcar and missing at random mar missingness mechanisms the results suggest that the mini method outperforms both cmi and the nn methods we conclude that this new imputation technique can be used to impute missing values in small data sets
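As a minimal sketch of the class mean imputation step described above (not the authors' MINI implementation), the following Python fragment replaces a missing continuous value with the mean of the observed values in the same class; the DataFrame, column names and classes are hypothetical.

```python
# Minimal sketch of class mean imputation (CMI): a missing continuous value
# is replaced by the mean of the observed values in the same class.
import pandas as pd

def class_mean_impute(df, class_col, target_col):
    df = df.copy()
    class_means = df.groupby(class_col)[target_col].transform("mean")
    df[target_col] = df[target_col].fillna(class_means)
    # fall back to the overall mean if a class has no observed values
    df[target_col] = df[target_col].fillna(df[target_col].mean())
    return df

projects = pd.DataFrame({
    "team_size_class": ["small", "small", "large", "large", "large"],
    "effort_pm":       [12.0,    None,    80.0,    95.0,    None],
})
print(class_mean_impute(projects, "team_size_class", "effort_pm"))
```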
the number of mobile phone users has been steadily increasing due to the development of microtechnology and human needs for ubiquitous communication menu design features play significant role in cell phone design from the perspective of customer satisfaction moreover small screens of the type used on mobile phones are limited in the amount of available space therefore it is important to obtain good menu design review of previous menu design studies for human computer interaction suggests that design guidelines for mobile phones need to be reappraised especially display features we propose conceptual model for cell phone menu design with displays the three main factors included in the model are the number of items task complexity and task type
an ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available such data must be queried transformed and displayed by systems administrators computational biologists financial analysts and hosts of others on regular basis in this paper we demonstrate that it is possible to generate suite of useful data processing tools including semi structured query engine several format converters statistical analyzer and data visualization routines directly from the ad hoc data itself without any human intervention the key technical contribution of the work is multi phase algorithm that automatically infers the structure of an ad hoc data source and produces format specification in the pads data description language programmers wishing to implement custom data analysis tools can use such descriptions to generate printing and parsing libraries for the data alternatively our software infrastructure will push these descriptions through the pads compiler creating format dependent modules that when linked with format independent algorithms for analysis and transformation result in fully functional tools we evaluate the performance of our inference algorithm showing it scales linearly in the size of the training data completing in seconds as opposed to the hours or days it takes to write description by hand we also evaluate the correctness of the algorithm demonstrating that generating accurate descriptions often requires less than of the available data
to manage the evolution of software systems effectively software developers must understand software systems identify and evaluate alternative modification strategies implement appropriate modifications and validate the correctness of the modifications one analysis technique that assists in many of these activities is program slicing to facilitate the application of slicing to large software systems we adapted control flow based interprocedural slicing algorithm so that it accounts for interprocedural control dependencies not recognized by other slicing algorithms and reuses slicing information for improved efficiency our initial studies suggest that additional slice accuracy and slicing efficiency may be achieved with our algorithm
this paper addresses necessary modification and extensions to existing grid computing approaches in order to meet modern business demands grid computing has been traditionally used to solve large scientific problems focussing more on accumulative use of computing power and processing large input and output files typical for many scientific problems nowadays businesses have increasing computational demands such that grid technologies are of interest however the existing business requirements introduce new constraints on the design configuration and operation of the underlying systems including availability of resources performance monitoring aspects security and isolation issues this paper addresses the existing grid computing capabilities discussing the additional demands in detail this results in suggestion of problem areas that must be investigated and corresponding technologies that should be used within future business grid systems
existing work on scheduling with energy concern has focused on minimizing the energy for completing all jobs or achieving maximum throughput that is energy usage is secondary concern when compared to throughput and the schedules targeted may be very poor in energy efficiency in this paper we attempt to put energy efficiency as the primary concern and study how to maximize throughput subject to user defined threshold of energy efficiency we first show that all deterministic online algorithms have competitive ratio at least where is the max min ratio of job size nevertheless allowing the online algorithm to have slightly poorer energy efficiency leads to constant ie independent of competitive online algorithm on the other hand using randomization we can reduce the competitive ratio to logδ without relaxing the efficiency threshold finally we consider special case where no jobs are demanding and give deterministic online algorithm with constant competitive ratio for this case
mapreduce and similar systems significantly ease the task of writing data parallel code however many real world computations require pipeline of mapreduces and programming and managing such pipelines can be difficult we present flumejava java library that makes it easy to develop test and run efficient data parallel pipelines at the core of the flumejava library are couple of classes that represent immutable parallel collections each supporting modest number of operations for processing them in parallel parallel collections and their operations present simple high level uniform abstraction over different data representations and execution strategies to enable parallel operations to run efficiently flumejava defers their evaluation instead internally constructing an execution plan dataflow graph when the final results of the parallel operations are eventually needed flumejava first optimizes the execution plan and then executes the optimized operations on appropriate underlying primitives eg mapreduces the combination of high level abstractions for parallel data and computation deferred evaluation and optimization and efficient parallel primitives yields an easy to use system that approaches the efficiency of hand optimized pipelines flumejava is in active use by hundreds of pipeline developers within google
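FlumeJava itself is a Java library; purely as a language-neutral illustration of the deferred-evaluation idea described above, the Python sketch below records operations in a plan, fuses adjacent element-wise operations, and only executes when results are demanded. All class and method names here are invented and are not FlumeJava's API.

```python
# Toy deferred parallel collection: operations only record nodes in a plan;
# run() fuses the chained per-element operations and then executes them
# (a stand-in for launching an optimized MapReduce).
class PCollection:
    def __init__(self, data=None, op=None, parent=None):
        self.data, self.op, self.parent = data, op, parent

    def parallel_do(self, fn):                 # defer: just record the op
        return PCollection(op=fn, parent=self)

    def _chain(self):                          # walk back to the source data
        ops, node = [], self
        while node.parent is not None:
            ops.append(node.op)
            node = node.parent
        return node.data, list(reversed(ops))

    def run(self):
        data, ops = self._chain()
        def fused(x):                          # "optimization": fuse the maps
            for op in ops:
                x = op(x)
            return x
        return [fused(x) for x in data]

words = PCollection(["ab", "cd", "ef"])
result = words.parallel_do(str.upper).parallel_do(lambda s: s + "!")
print(result.run())   # ['AB!', 'CD!', 'EF!']
```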
in clustering based wireless sensor network wsn cluster heads play an important role by serving as data forwarders amongst other network organisation functions thus malfunctioning and or compromised sensor nodes that serve as cluster head ch can lead to unreliable data delivery in this paper we propose scheme called secure low energy clustering seclec that incorporates secure cluster head selection and distributed cluster head monitoring to achieve reliable data delivery the seclec framework is flexible and can accommodate various tracking and monitoring mechanisms our goal in this paper is to understand the performance impact of adding such security mechanisms to the clustering architecture in terms of energy consumed and prevented data losses our experiments show that with seclec bs detects such anomalous nodes with the latency of one network operation round and the data loss due to malicious nodes is up to with reasonable communication overhead
we present new technique failure oblivious computing that enables servers to execute through memory errors without memory corruption our safe compiler for inserts checks that dynamically detect invalid memory accesses instead of terminating or throwing an exception the generated code simply discards invalid writes and manufactures values to return for invalid reads enabling the server to continue its normal execution path we have applied failure oblivious computing to set of widely used servers from the linux based open source computing environment our results show that our techniques make these servers invulnerable to known security attacks that exploit memory errors and enable the servers to continue to operate successfully to service legitimate requests and satisfy the needs of their users even after attacks trigger their memory errors we observed several reasons for this successful continued execution when the memory errors occur in irrelevant computations failure oblivious computing enables the server to execute through the memory errors to continue on to execute the relevant computation even when the memory errors occur in relevant computations failure oblivious computing converts requests that trigger unanticipated and dangerous execution paths into anticipated invalid inputs which the error handling logic in the server rejects because servers tend to have small error propagation distances localized errors in the computation for one request tend to have little or no effect on the computations for subsequent requests redirecting reads that would otherwise cause addressing errors and discarding writes that would otherwise corrupt critical data structures such as the call stack localizes the effect of the memory errors prevents addressing exceptions from terminating the computation and enables the server to continue on to successfully process subsequent requests the overall result is substantial extension of the range of requests that the server can successfully process
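The actual system is a safe compiler for C; the toy Python class below only illustrates the policy the abstract describes, with invalid writes silently discarded and invalid reads returning a manufactured value so execution can continue.

```python
# Toy illustration of the failure-oblivious policy: out-of-bounds writes are
# discarded and out-of-bounds reads return a manufactured value, so the
# program continues along its normal path instead of crashing.
class FailureObliviousBuffer:
    def __init__(self, size, manufactured=0):
        self._data = [0] * size
        self._manufactured = manufactured

    def write(self, index, value):
        if 0 <= index < len(self._data):
            self._data[index] = value
        # else: invalid write silently discarded

    def read(self, index):
        if 0 <= index < len(self._data):
            return self._data[index]
        return self._manufactured      # manufactured value for invalid read

buf = FailureObliviousBuffer(4)
buf.write(2, 42)
buf.write(100, 7)                     # would corrupt memory in C; discarded
print(buf.read(2), buf.read(100))     # 42 0
```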
multidatabase system mdbs integrates information from autonomous local databases managed by heterogeneous database management systems dbms in distributed environment for query involving more than one database global query optimization should be performed to achieve good overall system performance the significant differences between an mdbs and traditional distributed database system ddbs make query optimization in the former more challenging than in the latter challenges for query optimization in an mdbs are discussed in this paper two phase optimization approach for processing query in an mdbs is proposed several global query optimization techniques suitable for an mdbs such as semantic query optimization query optimization via probing queries parametric query optimization and adaptive query optimization are suggested the architecture of global query optimizer incorporating these techniques is designed
we present the design implementation and evaluation of beepbeep high accuracy acoustic based ranging system it operates in spontaneous ad hoc and device to device context without leveraging any pre planned infrastructure it is pure software based solution and uses only the most basic set of commodity hardware speaker microphone and some form of device to device communication so that it is readily applicable to many low cost sensor platforms and to most commercial off the shelf mobile devices like cell phones and pdas it achieves high accuracy through combination of three techniques two way sensing self recording and sample counting the basic idea is the following to estimate the range between two devices each will emit specially designed sound signal beep and collect simultaneous recording from its microphone each recording should contain two such beeps one from its own speaker and the other from its peer by counting the number of samples between these two beeps and exchanging the time duration information with its peer each device can derive the two way time of flight of the beeps at the granularity of sound sampling rate this technique cleverly avoids many sources of inaccuracy found in other typical time of arrival schemes such as clock synchronization non real time handling software delays etc our experiments on two common cell phone models have shown that we can achieve around one or two centimeters accuracy within range of more than ten meters despite series of technical challenges in implementing the idea
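A hedged sketch of the two-way sample-counting arithmetic suggested by this description (not necessarily the authors' exact formulation): each device counts the samples between the two beeps in its own recording, the counts are exchanged, and the difference of the elapsed times gives twice the time of flight; per-device speaker-to-microphone offsets are folded into a constant here.

```python
# Two-way ranging from exchanged sample counts, assuming both devices hear
# both beeps; k_const stands for the (calibratable) speaker-to-microphone
# offsets of the two devices.
SPEED_OF_SOUND = 343.0  # m/s, roughly, at room temperature

def beepbeep_range(n_a, fs_a, n_b, fs_b, k_const=0.0):
    """n_a: samples device A counted between its own beep and B's beep,
    recorded at sampling rate fs_a (Hz); n_b, fs_b likewise for device B."""
    t_a = n_a / fs_a            # elapsed time seen by A
    t_b = n_b / fs_b            # elapsed time seen by B
    return SPEED_OF_SOUND * (t_a - t_b) / 2.0 + k_const

# e.g. a ~5 m separation at 44.1 kHz shows up as ~1286 extra samples at A
print(round(beepbeep_range(n_a=30643, fs_a=44100, n_b=29357, fs_b=44100), 2))
```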
the ability of reconfiguring software architectures in order to adapt them to new requirements or changing environment has been of growing interest we propose uniform algebraic approach that improves on previous formal work in the area due to the following characteristics first components are written in high level program design language with the usual notion of state second the approach deals with typical problems such as guaranteeing that new components are introduced in the correct state possibly transferred from the old components they replace and that the resulting architecture conforms to certain structural constraints third reconfigurations and computations are explicitly related by keeping them separate this is because the approach provides semantics to given architecture through the algebraic construction of an equivalent program whose computations can be mirrored at the architectural level
developers and designers always strive for quality software quality software tends to be robust reliable and easy to maintain and thus reduces the cost of software development and maintenance several methods have been applied to improve software quality refactoring is one of those methods the goal of this paper is to validate or invalidate the claims that refactoring improves software quality we focused this study on different external quality attributes which are adaptability maintainability understandability reusability and testability we found that refactoring does not necessarily improve these quality attributes
we investigate new approach to editing spatially and temporally varying measured materials that adopts stroke based workflow in our system user specifies small number of editing constraints with painting interface which are smoothly propagated to the entire dataset through an optimization that enforces similar edits are applied to areas with similar appearance the sparse nature of this appearance driven optimization permits the use of efficient solvers allowing the designer to interactively refine the constraints we have found this approach supports specifying wide range of complex edits that would not be easy with existing techniques which present the user with fixed segmentation of the data furthermore it is independent of the underlying reflectance model and we show edits to both analytic and non parametric representations in examples from several material databases
the circular sensing model has been widely used to estimate performance of sensing applications in existing analysis and simulations while this model provides valuable high level guidelines the quantitative results obtained may not reflect the true performance of these applications due to the existence of obstacles and sensing irregularity introduced by insufficient hardware calibration in this project we design and implement two sensing area modeling sam techniques useful in the real world they complement each other in the design space sam provides accurate sensing area models for individual nodes using controlled or monitored events while sam provides continuous sensing similarity models using natural events in an environment with these two models we pioneer an investigation of the impact of sensing irregularity on application performance such as coverage scheduling we evaluate sam extensively in real world settings using three testbeds consisting of micaz motes and xsm motes to study the performance at scale we also provide an extensive node simulation evaluation results reveal several serious issues concerning circular models and demonstrate significant improvements
the successful design and implementation of secure systems must occur from the beginning component that must process data at multiple security levels is very critical and must go through additional evaluation to ensure the processing is secure it is common practice to isolate and separate the processing of data at different levels into different components in this paper we present architecture based refinement techniques for the design of multilevel secure systems we discuss what security requirements must be satisfied through the refinement process including when separation works and when it does not the process oriented approach will lead to verified engineering techniques for secure systems which should greatly reduce the cost of certification of those systems
this paper presents an analysis of the performance effects of burstiness in multi tiered systems we introduce compact characterization of burstiness based on autocorrelation that can be used in capacity planning performance prediction and admission control we show that if autocorrelation exists either in the arrival or the service process of any of the tiers in multi tiered system then autocorrelation propagates to all tiers of the system we also observe the surprising result that in spite of the fact that the bottleneck resource in the system is far from saturation and that the measured throughput and utilizations of other resources are also modest user response times are very high when autocorrelation is not considered this underutilization of resources falsely indicates that the system can sustain higher capacities we examine the behavior of small queuing system that helps us understand this counter intuitive behavior and quantify the performance degradation that originates from autocorrelated flows we present case study in an experimental multi tiered internet server and devise model to capture the observed behavior our evaluation indicates that the model is in excellent agreement with experimental results and captures the propagation of autocorrelation in the multi tiered system and resulting performance trends finally we analyze an admission control algorithm that takes autocorrelation into account and improves performance by reducing the long tail of the response time distribution
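One ingredient such a characterization builds on is the lag-k autocorrelation of a measured service-time or inter-arrival sequence; the illustrative Python fragment below (not the paper's model) shows how values well above zero at small lags flag autocorrelated, bursty flows.

```python
# Lag-k autocorrelation of a measured sequence; a bursty trace with long
# runs of similar values shows strong positive autocorrelation at lag 1,
# while an i.i.d. trace does not.
import numpy as np

def autocorrelation(x, lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.dot(x[:-lag], x[lag:]) / denom if lag > 0 else 1.0

rng = np.random.default_rng(0)
iid = rng.exponential(1.0, 10_000)                    # no burstiness
bursty = np.repeat(rng.exponential(1.0, 1_000), 10)   # long runs of similar values
print(round(autocorrelation(iid, 1), 3), round(autocorrelation(bursty, 1), 3))
```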
in this article we propose techniques that enable efficient exploration of the design space where each logical block can span more than one silicon layer fine grain integration provides reduced intrablock wire delay as well as improved power consumption however the corresponding power and performance advantage is usually underutilized since various implementations of multilayer blocks require novel physical design and microarchitecture infrastructure to explore microarchitecture design space we develop cubic packing engine which can simultaneously optimize physical and architectural design for efficient vertical integration this technique selects the individual unit designs from set of single layer or multilayer implementations to get the best microarchitectural design in terms of performance temperature or both our experimental results using design driver of high performance superscalar processor show % performance improvement over traditional for layers and % over with single layer unit implementations since thermal characteristics of integrated circuits are among the main challenges thermal aware floorplanning and thermal via insertion techniques are employed to keep the peak temperatures below threshold
large scheduling windows are an effective mechanism for increasing microprocessor performance through the extraction of instruction level parallelism current techniques do not scale effectively for very large windows leading to slow wakeup and select logic as well as large complicated bypass networks this paper introduces new instruction scheduler implementation referred to as hierarchical scheduling windows or hsw which exploits latency tolerant instructions in order to reduce implementation complexity hsw yields very large instruction window that tolerates wakeup select and bypass latency while extracting significant far flung ilp results it is shown that hsw loses performance per additional cycle of bypass select wakeup latency as compared to monolithic window that loses per additional cycle also hsw achieves the performance of traditional implementations with only to the number of entries in the critical timing path
bubba is highly parallel computer system for data intensive applications the basis of the bubba design is scalable shared nothing architecture which can scale up to thousands of nodes data are declustered across the nodes ie horizontally partitioned via hashing or range partitioning and operations are executed at those nodes containing relevant data in this way parallelism can be exploited within individual transactions as well as among multiple concurrent transactions to improve throughput and response times for data intensive applications the current bubba prototype runs on commercial node multicomputer and includes parallelizing compiler distributed transaction management object management and customized version of unix the current prototype is described and the major design decisions that went into its construction are discussed the lessons learned from this prototype and its predecessors are presented
in this article we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads it is known that under broad conditions nearest neighbors workloads become unstable distances between data points become indistinguishable from each other we complement this earlier result by showing that if the workload for an application is unstable you are not likely to be able to index it efficiently using almost all known multidimensional index structures for broad class of data distributions we prove that these index structures will do no better than linear scan of the data as dimensionality increases our result has implications for how experiments should be designed on index structures such as trees trees and sr trees simply put experiments trying to establish that these index structures scale with dimensionality should be designed to establish crossover points rather than to show that the methods scale to an arbitrary number of dimensions in other words experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan for each data distribution of interest that linear scan will eventually dominate is given an important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality because the dimensionality of real data set may well be in the range that particular method can handle the results in this article can be regarded as step toward solving this problem although we do not characterize the rate at which structure degrades our techniques allow us to reason directly about broad class of index structures rather than the geometry of the nearest neighbors problem in contrast to earlier work
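A quick numeric illustration of the instability phenomenon behind this result (a simulation written for this note, not taken from the article): as dimensionality grows, the gap between the nearest and farthest neighbour of a query, relative to the nearest distance, shrinks, which is why index structures eventually degrade to a linear scan.

```python
# Distance concentration under a uniform distribution: the relative contrast
# (d_max - d_min) / d_min for a random query shrinks as dimension grows.
import numpy as np

rng = np.random.default_rng(0)
for dim in (2, 10, 100, 1000):
    data = rng.uniform(size=(10_000, dim))
    query = rng.uniform(size=dim)
    d = np.linalg.norm(data - query, axis=1)
    print(dim, round((d.max() - d.min()) / d.min(), 3))
```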
modeling is core software engineering practice conceptual models are constructed to establish an abstract understanding of the domain among stakeholders these are then refined into computational models that aim to realize conceptual specification the refinement process yields sets of models that are initially incomplete and inconsistent by nature the aim of the engineering process is to negotiate consistency and completeness toward stable state sufficient for deployment implementation this paper presents the notion of model ecosystem which permits the capability to guide analyst edits toward stability by computing consistency and completeness equilibria for conceptual models during periods of model change
several forms of reasoning in ai like abduction closed world reasoning circumscription and disjunctive logic programming are well known to be intractable in fact many of the relevant problems are on the second or third level of the polynomial hierarchy in this paper we show how the notion of treewidth can be fruitfully applied to this area in particular we show that all these problems become tractable actually even solvable in linear time if the treewidth of the involved formulae or programs is bounded by some constant clearly these theoretical tractability results as such do not immediately yield feasible algorithms however we have recently established new method based on monadic datalog which allowed us to design an efficient algorithm for related problem in the database area in this work we exploit the monadic datalog approach to construct new algorithms for logic based abduction
we propose different implementations of the sparse matrix dense vector multiplication spmv for finite fields and rings we take advantage of graphic card processors gpu and multi core architectures our aim is to improve the speed of spmv in the linbox library and henceforth the speed of its black box algorithms besides we use this library and new parallelisation of the sigma basis algorithm in parallel block wiedemann rank implementation over finite fields
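To make the operation concrete, here is a serial Python reference for a CSR sparse matrix-vector product over a prime field GF(p); it is only an illustration of what the GPU and multi-core kernels parallelize (one dot product per row), not the LinBox implementation itself.

```python
# y = A @ x (mod p) with A stored in compressed sparse row (CSR) form.
import numpy as np

def spmv_mod_p(values, col_idx, row_ptr, x, p):
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=np.int64)
    for i in range(n_rows):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc = (acc + values[k] * x[col_idx[k]]) % p
        y[i] = acc
    return y

# 3x3 example: rows [[2,0,1],[0,3,0],[4,0,5]] over GF(7)
values  = np.array([2, 1, 3, 4, 5])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1, 2, 3])
print(spmv_mod_p(values, col_idx, row_ptr, x, 7))   # [5 6 5]
```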
fairground thrill laboratory was series of live events that augmented the experience of amusement rides wearable telemetry system captured video audio heart rate and acceleration data streaming them live to spectator interfaces and watching audience in this paper we present study of this event which draws on video recordings and post event interviews and which highlights the experiences of riders spectators and ride operators our study shows how the telemetry system transformed riders into performers spectators into an audience and how the role of ride operator began to include aspects of orchestration with the relationship between all three roles also transformed critically the introduction of telemetry system seems to have had the potential to re connect riders performers back to operators orchestrators and spectators audience re introducing closer relationship that used to be available with smaller rides introducing telemetry to real world situation also creates significant complexity which we illustrate by focussing on moment of perceived crisis
in order to achieve good performance in object classification problems it is necessary to combine information from various image features because the large margin classifiers are constructed based on similarity measures between samples called kernels finding appropriate feature combinations boils down to designing good kernels among set of candidates for example positive mixtures of predetermined base kernels there are couple of ways to determine the mixing weights of multiple kernels uniform weights brute force search over validation set and multiple kernel learning mkl mkl is theoretically and technically very attractive because it learns the kernel weights and the classifier simultaneously based on the margin criterion however we often observe that the support vector machine svm with the average kernel works at least as good as mkl in this paper we propose as an alternative two step approach at first the kernel weights are determined by optimizing the kernel target alignment score and then the combined kernel is used by the standard svm with single kernel the experimental results with the voc data set show that our simple procedure outperforms the average kernel and mkl
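The sketch below is a simplified variant of the two-step idea: each base kernel is weighted by its alignment with the target yy^T and the combined kernel is fed to a standard precomputed-kernel SVM. The paper optimizes the alignment of the mixture; using the individual alignments as weights and the toy data here are simplifications for illustration only.

```python
# Two-step kernel combination sketch: alignment-based weights, then one SVM.
import numpy as np
from sklearn.svm import SVC

def alignment(K, y):
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def combine_kernels(kernels, y):
    w = np.array([max(alignment(K, y), 0.0) for K in kernels])
    w = w / (w.sum() + 1e-12)
    return sum(wi * K for wi, K in zip(w, kernels)), w

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=60))

K_lin = X @ X.T
K_rbf = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
K_mix, weights = combine_kernels([K_lin, K_rbf], y)

clf = SVC(kernel="precomputed").fit(K_mix, y)
print(weights, clf.score(K_mix, y))
```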
reinhard wilhelm’s career in computer science spans more than third of century during this time he has made numerous research contributions in the areas of programming languages compilers and compiler generators static program analysis program transformation algorithm animation and real time systems co founded company to transfer some of these ideas to industry held the chair for programming languages and compiler construction at saarland university and served since its inception as the scientific director of the international conference and research center for computer science at schloss dagstuhl
in this paper we evaluate the atomic region compiler abstraction by incorporating it into commercial system we find that atomic regions are simple and intuitive to integrate into an binary translation system furthermore doing so trivially enables additional optimization opportunities beyond that achievable by high performance dynamic optimizer which already implements superblocks we show that atomic regions can suffer from severe performance penalties if misspeculations are left uncontrolled but that simple software control mechanism is sufficient to rein in all detrimental side effects we evaluate using full reference runs of the spec cpu integer benchmarks and find that atomic regions enable up to on average improvement beyond the performance of tuned product these performance improvements are achieved without any negative side effects performance side effects such as code bloat are absent with atomic regions in fact static code size is reduced the hardware necessary is synergistic with other needs and was already available on the commercial product used in our evaluation finally the software complexity is minimal as single developer was able to incorporate atomic regions into sophisticated line code base in three months despite never having seen the translator source code beforehand
twin page storage method which is an alternative to the twist twin slot approach by reuter
the primary business model behind web search is based on textual advertising where contextually relevant ads are displayed alongside search results we address the problem of selecting these ads so that they are both relevant to the queries and profitable to the search engine showing that optimizing ad relevance and revenue is not equivalent selecting the best ads that satisfy these constraints also naturally incurs high computational costs and time constraints can lead to reduced relevance and profitability we propose novel two stage approach which conducts most of the analysis ahead of time an offline preprocessing phase leverages additional knowledge that is impractical to use in real time and rewrites frequent queries in way that subsequently facilitates fast and accurate online matching empirical evaluation shows that our method optimized for relevance matches state of the art method while improving expected revenue when optimizing for revenue we see even more substantial improvements in expected revenue
modern web search engines use different strategies to improve the overall quality of their document rankings usually the strategy adopted involves the combination of multiple sources of relevance into single ranking this work proposes the use of evolutionary techniques to derive good evidence combination functions using three different sources of evidence of relevance the textual content of documents the reputation of documents extracted from the connectivity information available in the processed collection and the anchor text concatenation the combination functions discovered by our evolutionary strategies were tested using collection containing queries extracted from real nation wide search engine query log with over million documents the experiments performed indicate that our proposal is an effective and practical alternative for combining sources of evidence into single ranking we also show that different types of queries submitted to search engine can require different combination functions and that our proposal is useful for coping with such differences
we propose framework for integrating data from multiple relational sources into an xml document that both conforms to given dtd and satisfies predefined xml constraints the framework is based on specification language aig that extends dtd by associating element types with semantic attributes inherited and synthesized inspired by the corresponding notions from attribute grammars computing these attributes via parameterized sql queries over multiple data sources and incorporating xml keys and inclusion constraints the novelty of aig consists in semantic attributes and their dependency relations for controlling context dependent dtd directed construction of xml documents as well as for checking xml constraints in parallel with document generation we also present cost based optimization techniques for efficiently evaluating aigs including algorithms for merging queries and for scheduling queries on multiple data sources this provides new grammar based approach for data integration under both syntactic and semantic constraints
technology trends present new challenges for processor architectures and their instruction schedulers growing transistor density will increase the number of execution units on single chip and decreasing wire transmission speeds will cause long and variable on chip latencies these trends will severely limit the two dominant conventional architectures dynamic issue superscalars and static placement and issue vliws we present new execution model in which the hardware and static scheduler instead work cooperatively called static placement dynamic issue spdi this paper focuses on the static instruction scheduler for spdi we identify and explore three issues spdi schedulers must consider locality contention and depth of speculation we evaluate range of spdi scheduling algorithms executing on an explicit data graph execution edge architecture we find that surprisingly simple one achieves an average of instructions per cycle ipc for spec wide issue machine and is within of the performance without on chip latencies these results suggest that the compiler is effective at balancing on chip latency and parallelism and that the division of responsibilities between the compiler and the architecture is well suited to future systems
the operations and management activities of enterprises are mainly task based and knowledge intensive accordingly an important issue in deploying knowledge management systems is the provision of task relevant information codified knowledge to meet the information needs of knowledge workers during the execution of task codified knowledge extracted from previously executed tasks can provide valuable knowledge about conducting the task at hand current task and is valuable information source for constructing task profile that models worker’s task needs ie information needs for the current task in this paper we propose novel task relevance assessment approach that evaluates the relevance of previous tasks in order to construct task profile for the current task the approach helps knowledge workers assess the relevance of previous tasks through linguistic evaluation and the collaboration of knowledge workers in addition applying relevance assessment to large number of tasks may create an excessive burden for workers thus we propose novel two phase relevance assessment method to help workers conduct relevance assessment effectively furthermore modified relevance feedback technique which is integrated with the task relevance assessment method is employed to derive the task profile for the task at hand consequently task based knowledge support can be enabled to provide knowledge workers with task relevant information based on task profiles empirical experiments demonstrate that the proposed approach models workers task needs effectively and helps provide task relevant knowledge
some significant progress related to multidimensional data analysis has been achieved in the past few years including the design of fast algorithms for computing datacubes selecting some precomputed group bys to materialize and designing efficient storage structures for multidimensional data however little work has been carried out on multidimensional query optimization issues particularly the response time or evaluation cost for answering several related dimensional queries simultaneously is crucial to the olap applications recently zhao et al first exploited this problem by presenting three heuristic algorithms in this paper we first consider in detail two cases of the problem in which all the queries are either hash based star joins or index based star joins only in the case of the hash based star join we devise polynomial approximation algorithm which delivers plan whose evaluation cost is epsilon times the optimal where is the number of queries and epsilon is fixed constant with epsilon leq we also present an exponential algorithm which delivers plan with the optimal evaluation cost in the case of the index based star join we present heuristic algorithm which delivers plan whose evaluation cost is times the optimal and an exponential algorithm which delivers plan with the optimal evaluation cost we then consider general case in which both hash based star join and index based star join queries are included for this case we give possible improvement on the work of zhao et al based on an analysis of their solutions we also develop another heuristic and an exact algorithm for the problem we finally conduct performance study by implementing our algorithms the experimental results demonstrate that the solutions delivered for the restricted cases are always within two times of the optimal which confirms our theoretical upper bounds actually these experiments produce much better results than our theoretical estimates to the best of our knowledge this is the only development of polynomial algorithms for the first two cases which are able to deliver plans with deterministic performance guarantees in terms of the qualities of the plans generated the previous approaches including that of zdns may generate feasible plan for the problem in these two cases but they do not provide any performance guarantee ie the plans generated by their algorithms can be arbitrarily far from the optimal one
recently method for removing shadows from colour images was developed finlayson et al in ieee trans pattern anal mach intell that relies upon finding special direction in chromaticity feature space this invariant direction is that for which particular colour features when projected into produce greyscale image which is approximately invariant to intensity and colour of scene illumination thus shadows which are in essence particular type of lighting are greatly attenuated the main approach to finding this special angle is camera calibration colour target is imaged under many different lights and the direction that best makes colour patch images equal across illuminants is the invariant direction here we take different approach in this work instead of camera calibration we aim at finding the invariant direction from evidence in the colour image itself specifically we recognize that producing projection in the correct invariant direction will result in distribution of pixel values that have smaller entropy than projecting in the wrong direction the reason is that the correct projection results in probability distribution spike for pixels all the same except differing by the lighting that produced their observed rgb values and therefore lying along line with orientation equal to the invariant direction hence we seek that projection which produces type of intrinsic independent of lighting reflectance information only image by minimizing entropy and from there go on to remove shadows as previously to be able to develop an effective description of the entropy minimization task we go over to the quadratic entropy rather than shannon’s definition replacing the observed pixels with kernel density probability distribution the quadratic entropy can be written as very simple formulation and can be evaluated using the efficient fast gauss transform the entropy written in this embodiment has the advantage that it is more insensitive to quantization than is the usual definition the resulting algorithm is quite reliable and the shadow removal step produces good shadow free colour image results whenever strong shadow edges are present in the image in most cases studied entropy has strong minimum for the invariant direction revealing new property of image formation
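A hedged sketch of the angle search this entropy-minimization approach implies: project the 2-D log-chromaticity features onto each candidate direction, estimate the entropy of the resulting 1-D distribution, and keep the angle with minimum entropy. Plain Shannon histogram entropy is used below in place of the paper's quadratic entropy and fast Gauss transform, and the synthetic data and names are illustrative only.

```python
# Minimum-entropy search over projection angles on 2-D log-chromaticity data.
import numpy as np

def min_entropy_direction(log_chroma, n_angles=180, bins=64):
    """log_chroma: (N, 2) array of 2-D log-chromaticity values per pixel."""
    best_angle, best_entropy = None, np.inf
    for angle in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        d = np.array([np.cos(angle), np.sin(angle)])
        proj = log_chroma @ d                       # candidate 1-D greyscale
        hist, _ = np.histogram(proj, bins=bins, density=True)
        p = hist[hist > 0]
        p = p / p.sum()
        entropy = -np.sum(p * np.log(p))
        if entropy < best_entropy:
            best_angle, best_entropy = angle, entropy
    return best_angle

# synthetic pixels: each surface colour forms a line whose orientation is the
# lighting-induced direction of change (0.7 rad here)
rng = np.random.default_rng(1)
light_dir = np.array([np.cos(0.7), np.sin(0.7)])
offsets = rng.normal(size=(5, 2))
pts = np.vstack([o + np.outer(rng.uniform(-2, 2, 200), light_dir) for o in offsets])
print(min_entropy_direction(pts))   # near 0.7 + pi/2: the axis that collapses lighting variation
```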
we investigate the relationship between symmetry reduction and inductive reasoning when applied to model checking networks of featured components popular reduction techniques for combatting state space explosion in model checking like abstraction and symmetry reduction can only be applied effectively when the natural symmetry of system is not destroyed during specification we introduce property which ensures this is preserved open symmetry we describe template based approach for the construction of open symmetric promela specifications of featured systems for certain systems safely featured parameterised systems our generated specifications are suitable for conversion to abstract specifications representing any size of network this enables feature interaction analysis to be carried out via model checking and induction for systems of any number of featured components in addition we show how for any balanced network of components by using graphical representation of the features and the process communication structure group of permutations of the underlying state space of the generated specification can be determined easily due to the open symmetry of our promela specifications this group of permutations can be used directly for symmetry reduced model checking the main contributions of this paper are an automatic method for developing open symmetric specifications which can be used for generic feature interaction analysis and the novel application of symmetry detection and reduction in the context of model checking featured networks we apply our techniques to well known example of featured network an email system
log polar imaging consists of type of methods that represent visual information with space variant resolution inspired by the visual system of mammals it has been studied for about three decades and has surpassed conventional approaches in robotics applications mainly the ones where real time constraints make it necessary to utilize resource economic image representations and processing methodologies this paper surveys the application of log polar imaging in robotic vision particularly in visual attention target tracking egomotion estimation and perception the concise yet comprehensive review offered in this paper is intended to provide novel and experienced roboticists with quick and gentle overview of log polar vision and to motivate vision researchers to investigate the many open problems that still need solving to help readers identify promising research directions possible research agenda is outlined finally since log polar vision is not restricted to robotics couple of other areas of application are discussed
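To make the mapping concrete, here is a minimal nearest-neighbour log-polar sampling sketch; the grid sizes, minimum radius and sampling scheme are illustrative assumptions, not a description of any specific system covered by the survey.

```python
# Log-polar (retino-cortical) sampling: each cortical sample (u, v) reads the
# image at radius r = r0 * exp(u * log(rmax/r0) / U) and angle 2*pi*v / V,
# giving fine resolution near the fixation point and a coarse periphery.
import numpy as np

def log_polar_transform(image, center, U=64, V=128, r0=1.0):
    h, w = image.shape[:2]
    cx, cy = center
    rmax = min(cx, cy, w - 1 - cx, h - 1 - cy)
    out = np.zeros((U, V), dtype=image.dtype)
    for u in range(U):
        r = r0 * np.exp(u * np.log(rmax / r0) / U)
        for v in range(V):
            theta = 2.0 * np.pi * v / V
            x = int(round(cx + r * np.cos(theta)))
            y = int(round(cy + r * np.sin(theta)))
            out[u, v] = image[y, x]
    return out

img = np.arange(256 * 256, dtype=np.float32).reshape(256, 256)
print(log_polar_transform(img, center=(128, 128)).shape)   # (64, 128)
```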
to understand how and why individuals make use of emerging information assimilation services on the web as part of their daily routine we combined video recordings of online activity with targeted interviews of eleven experienced web users from these observations we describe their choice of systems the goals they are trying to achieve their information diets the basic process they use for assimilating information and the impact of user interface speed
in this article we explore the syntactic and semantic properties of prepositions in the context of the semantic interpretation of nominal phrases and compounds we investigate the problem based on cross linguistic evidence from set of six languages english spanish italian french portuguese and romanian the focus on english and romance languages is well motivated most of the time english nominal phrases and compounds translate into constructions of the form in romance languages where the preposition may vary in ways that correlate with the semantics thus we present empirical observations on the distribution of nominal phrases and compounds and the distribution of their meanings on two different corpora based on two state of the art classification tag sets lauer’s set of eight prepositions and our list of semantic relations mapping between the two tag sets is also provided furthermore given training set of english nominal phrases and compounds along with their translations in the five romance languages our algorithm automatically learns classification rules and applies them to unseen test instances for semantic interpretation experimental results are compared against two state of the art models reported in the literature
this paper investigates the data exchange problem among distributed independent sources it is based on previous works in in which declarative semantics for pp systems is proposed in this semantics only facts not making the local databases inconsistent are imported weak models and the preferred weak models are those in which peers import maximal sets of facts not violating integrity constraints the framework proposed in does not provide any mechanism to set priorities among mapping rules anyhow while collecting data it is quite natural for source peer to associate different degrees of reliability to the portion of data provided by its neighbor peers starting from this observation this paper enhances previous semantics by using priority levels among mapping rules in order to select the weak models containing maximum number of mapping atoms according to their importance we will call these weak models trusted weak models and we will show they can be computed as stable models of logic program with weak constraints
significant effort has been invested in developing expressive and flexible access control languages and systems however little has been done to evaluate these systems in practical situations with real users and few attempts have been made to discover and analyze the access control policies that users actually want to implement we report on user study in which we derive the ideal access policies desired by group of users for physical security in an office environment we compare these ideal policies to the policies the users actually implemented with keys and with smartphone based distributed access control system we develop methodology that allows us to show quantitatively that the smartphone system allowed our users to implement their ideal policies more accurately and securely than they could with keys and we describe where each system fell short
location based routing lbr is one of the most widely used routing strategies in large scale wireless sensor networks with lbr small cheap and resource constrained nodes can perform the routing function without the need of complex computations and large amounts of memory space further nodes do not need to send energy consuming periodic advertisements because routing tables in the traditional sense are not needed one important assumption made by most lbr protocols is the availability of location service or mechanism to find other nodes positions although several mechanisms exist most of them rely on some sort of flooding procedure unsuitable for large scale wireless sensor networks especially with multiple and moving sinks and sources in this paper we introduce the anchor location service als protocol grid based protocol that provides sink location information in scalable and efficient manner and therefore supports location based routing in large scale wireless sensor networks the location service is evaluated mathematically and by simulations and also compared with well known grid based routing protocol our results demonstrate that als not only provides an efficient and scalable location service but also reduces the message overhead and the state complexity in scenarios with multiple and moving sinks and sources which are not usually included in the literature
ontology mapping is mandatory requirement for enabling semantic interoperability among different agents and services relying on different ontologies this aspect becomes more critical in peer to peer pp networks for several reasons i the number of different ontologies can dramatically increase ii mappings among peer ontologies have to be discovered on the fly and only on the parts of ontologies contextual to specific interaction in which peers are involved iii complex mapping strategies eg structural mapping based on graph matching cannot be exploited since peers are not aware of one another’s ontologies in order to address these issues we developed new ontology mapping algorithm called semantic coordinator secco secco is composed of three individual matchers syntactic lexical and contextual the syntactic matcher in order to discover mappings exploits different kinds of linguistic information eg comments labels encoded in ontology entities the lexical matcher enables discovering mappings in semantic way since it interprets the semantic meaning of concepts to be compared the contextual matcher relies on a how it fits strategy inspired by the contextual theory of meaning and refines similarity values by taking into account the contexts in which the concepts to be compared are used we show through experimental results that secco fulfills two important requirements fastness and accuracy ie quality of mappings secco differently from other semantic pp applications eg piazza gridvine that assume the preexistence of mappings for achieving semantic interoperability focuses on the problem of finding mappings therefore if coupled with pp platform it paves the way towards comprehensive semantic pp solution for content sharing and retrieval semantic query answering and query routing we report on the advantages of integrating secco in the link system
in this paper we develop an automatic wrapper for the extraction of multiple sections data records from search engine results pages in the information extraction world less attention has been focused on the development of wrappers for the extraction of multiple sections data records this is evidenced by the fact that there is only one automatic wrapper mse developed for this purpose using the separation distance of data records and sections mse is able to distinguish sections and data records and extract them from search engine results pages in this study our approach is the use of dom tree properties to develop an adaptive search method which is able to detect differentiate and partition sections and data records the multiple sections data records labeled are used to pass through few filtering stages each filter is designed to filter out particular group of irrelevant data until one data region containing the relevant records is found our filtering rules are designed based on visual cue such as text and image size obtained from the browser rendering engine experimental results show that our wrapper is able to obtain better results than the currently available mse wrapper
discovery of sequential patterns is an essential data mining task with broad applications among several variations of sequential patterns closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it unfortunately there is no parallel closed sequential pattern mining method proposed yet in this paper we develop an algorithm called par csp parallel closed sequential pattern mining to conduct parallel mining of closed sequential patterns on distributed memory system par csp partitions the work among the processors by exploiting the divide and conquer property so that the overhead of interprocessor communication is minimized par csp applies dynamic scheduling to avoid processor idling moreover it employs technique called selective sampling to address the load imbalance problem we implement par csp using mpi on node linux cluster our experimental results show that par csp attains good parallelization efficiencies on various input datasets
spearman’s footrule and kendall’s tau are two well established distances between rankings they however fail to take into account concepts crucial to evaluating result set in information retrieval element relevance and positional information that is changing the rank of highly relevant document should result in higher penalty than changing the rank of an irrelevant document similar logic holds for the top versus the bottom of the result ordering in this work we extend both of these metrics to those with position and element weights and show that variant of the diaconis graham inequality still holds the two generalized measures remain within constant factor of each other for all permutations we continue by extending the element weights into distance metric between elements for example in search evaluation swapping the order of two nearly duplicate results should result in little penalty even if these two are highly relevant and appear at the top of the list we extend the distance measures to this more general case and show that they remain within constant factor of each other we conclude by conducting simple experiments on web search data with the proposed measures our experiments show that the weighted generalizations are more robust and consistent with each other than their unweighted counterparts
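as a rough illustration of the kind of element-weighted generalization discussed above, the sketch below computes a kendall-tau-style distance in which each discordant pair is charged the product of the two items' importance weights; the function and the weights are our own simplified reading, not the paper's exact measures or its diaconis graham analysis.

from itertools import combinations

def weighted_kendall_tau(rank_a, rank_b, weight=None):
    # element-weighted kendall tau distance between two rankings of the same
    # items (most relevant first); weight maps item -> importance, default 1.0,
    # so swapping two relevant items costs more than swapping two irrelevant ones
    weight = weight or {}
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    dist = 0.0
    for x, y in combinations(rank_a, 2):
        # a pair is discordant if the two rankings order it differently
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            dist += weight.get(x, 1.0) * weight.get(y, 1.0)
    return dist

# toy check: swapping the two most relevant documents is penalized more
ideal = ["d1", "d2", "d3", "d4"]
w = {"d1": 3.0, "d2": 3.0, "d3": 1.0, "d4": 1.0}
print(weighted_kendall_tau(ideal, ["d2", "d1", "d3", "d4"], w))  # 9.0
print(weighted_kendall_tau(ideal, ["d1", "d2", "d4", "d3"], w))  # 1.0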
we propose new approach for constructing pp networks based on dynamic decomposition of continuous space into cells corresponding to servers we demonstrate the power of this approach by suggesting two new pp architectures and various algorithms for them the first serves as dht distributed hash table and the other is dynamic expander network the dht network which we call distance halving allows logarithmic routing and load while preserving constant degrees it offers an optimal tradeoff between degree and path length in the sense that degree guarantees path length of logd another advantage over previous constructions is its relative simplicity major new contribution of this construction is dynamic caching technique that maintains low load and storage even under the occurrence of hot spots our second construction builds network that is guaranteed to be an expander the resulting topologies are simple to maintain and implement their simplicity makes it easy to modify and add protocols small variation yields dht which is robust against random byzantine faults finally we show that using our approach it is possible to construct any family of constant degree graphs in dynamic environment though with worse parameters therefore we expect that more distributed data structures could be designed and implemented in dynamic environment
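a minimal sketch of the underlying idea of decomposing a continuous space into cells owned by servers: the unit interval is split as servers join, and keys are hashed to points and resolved to the owning cell. the class and method names are invented, and the actual distance-halving routing, caching and expander constructions are not modeled here.

import bisect, hashlib

class ContinuousSpaceDHT:
    # toy decomposition of [0, 1) into cells, one per server; joining splits an
    # existing cell in half, and a key is stored at the owner of the enclosing cell

    def __init__(self, first_server):
        self.starts = [0.0]           # left endpoints of cells, kept sorted
        self.owners = [first_server]  # owners[i] owns [starts[i], next start)

    def join(self, server):
        # split the widest current cell and hand the right half to the newcomer
        _, i = max((self._width(i), i) for i in range(len(self.starts)))
        mid = self.starts[i] + self._width(i) / 2.0
        self.starts.insert(i + 1, mid)
        self.owners.insert(i + 1, server)

    def lookup(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        point = (h % 10**8) / 10**8   # pseudo-uniform point in [0, 1)
        i = bisect.bisect_right(self.starts, point) - 1
        return self.owners[i]

    def _width(self, i):
        end = self.starts[i + 1] if i + 1 < len(self.starts) else 1.0
        return end - self.starts[i]

dht = ContinuousSpaceDHT("s0")
for name in ["s1", "s2", "s3"]:
    dht.join(name)
print(dht.lookup("some-document-id"))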
wormhole attack is particularly harmful against routing in sensor networks where an attacker receives packets at one location in the network tunnels and then replays them at another remote location in the network wormhole attack can be easily launched by an attacker without compromising any sensor nodes since most of the routing protocols do not have mechanisms to defend the network against wormhole attacks the route request can be tunneled to the target area by the attacker through wormholes thus the sensor nodes in the target area build the route through the attacker later the attacker can tamper the data messages or selectively forward data messages to disrupt the functions of the sensor network researchers have used some special hardware such as the directional antenna and the precise synchronized clock to defend the sensor network against wormhole attacks during the neighbor discovery process in this paper we propose secure routing protocol against wormhole attacks in sensor networks serwa serwa protocol avoids using any special hardware such as the directional antenna and the precise synchronized clock to detect wormhole moreover it provides real secure route against the wormhole attack simulation results show that serwa protocol only has very small false positives for wormhole detection during the neighbor discovery process less than the average energy usage at each node for serwa protocol during the neighbor discovery and route discovery is below mj which is much lower than the available energy kj at each node the cost analysis shows that serwa protocol only needs small memory usage at each node below kb if each node has neighbors which is suitable for the sensor network
exception handling mechanisms are intended to support the development of robust software however the implementation of such mechanisms with aspect oriented ao programming might lead to error prone scenarios as aspects extend or replace existing functionality at specific join points in the code execution aspects behavior may bring new exceptions which can flow through the program execution in unexpected ways this paper presents systematic study that assesses the error proneness of aop mechanisms on exception flows of evolving programs the analysis was based on the object oriented and the aspect oriented versions of three medium sized systems from different application domains our findings show that exception handling code in ao systems is error prone since all versions analyzed presented an increase in the number of uncaught exceptions and exceptions caught by the wrong handler the causes of such problems are characterized and presented as catalogue of bug patterns
service oriented systems are constructed using web services as first class programmable units and subsystems and there have been many successful applications of such systems however there is major unresolved problem with the software development and subsequent management of these applications and systems web service interfaces and implementations may be developed and changed autonomously which makes traditional configuration management practices inadequate for web services checking the compatibility of these programmable units turns out to be difficult task in this paper we present technique for checking compatibility of web service interfaces and implementations based on categorizing domain ontology instances of service description documents this technique is capable of both assessing the compatibility and identifying incompatibility factors of service interfaces and implementations the design details of system model for web service compatibility checking and the key operator for evaluating compatibility within the model are discussed we present simulation experiments and analyze the results to show the effectiveness and performance variations of our technique with different data source patterns
in this paper we are interested in minimizing the delay and maximizing the lifetime of event driven wireless sensor networks for which events occur infrequently in such systems most of the energy is consumed when the radios are on waiting for packet to arrive sleep wake scheduling is an effective mechanism to prolong the lifetime of these energy constrained wireless sensor networks however sleep wake scheduling could result in substantial delays because transmitting node needs to wait for its next hop relay node to wake up an interesting line of work attempts to reduce these delays by developing anycast based packet forwarding schemes where each node opportunistically forwards packet to the first neighboring node that wakes up among multiple candidate nodes in this paper we first study how to optimize the anycast forwarding schemes for minimizing the expected packet delivery delays from the sensor nodes to the sink based on this result we then provide solution to the joint control problem of how to optimally control the system parameters of the sleep wake scheduling protocol and the anycast packet forwarding protocol to maximize the network lifetime subject to constraint on the expected end to end packet delivery delay our numerical results indicate that the proposed solution can outperform prior heuristic solutions in the literature especially under practical scenarios where there are obstructions eg lake or mountain in the coverage area of the wireless sensor network
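the sketch below is a small monte carlo illustration, a toy model of ours rather than the paper's optimization framework, of why anycast forwarding reduces delay under sleep-wake scheduling: if each candidate relay's next wake-up is assumed uniform over a period, forwarding to whichever candidate wakes first gives an expected wait of roughly period/(n+1).

import random

def mean_one_hop_delay(n_candidates, period, trials=100_000, seed=0):
    # assumed model: each candidate relay's next wake-up is uniform in
    # [0, period), and the sender forwards to the first candidate that wakes
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(0, period) for _ in range(n_candidates))
    return total / trials

for n in (1, 2, 4, 8):
    # analytically the mean is period / (n + 1) under this uniform model
    print(n, round(mean_one_hop_delay(n, period=1.0), 3), round(1.0 / (n + 1), 3))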
in this paper we propose sharpness dependent filter design based on the fairing of surface normal whereby the filtering algorithm automatically selects filter this may be mean filter min filter or filter ranked between these two depending on the local sharpness value and the sharpness dependent weighting function selected to recover the original shape of noisy model the algorithm selects mean filter for flat regions and min filter for distinguished sharp regions the selected sharpness dependent weighting function has gaussian laplacian or el fallah ford form that approximately fits the sharpness distribution found in all tested noisy models we use sharpness factor in the weighting function to control the degree of feature preserving the appropriate sharpness factor can be obtained by sharpness analysis based on the bayesian classification our experiment results demonstrate that the proposed sharpness dependent filter is superior to other approaches for smoothing polygon mesh as well as for preserving its sharp features
singleton kinds provide an elegant device for expressing type equality information resulting from modern module languages but they can complicate the metatheory of languages in which they appear we present translation from language with singleton kinds to one without and prove this translation to be sound and complete this translation is useful for type preserving compilers generating typed target languages the proof of soundness and completeness is done by normalizing type equivalence derivations using stone and harper’s type equivalence decision procedure
the paper investigates geometric properties of quasi perspective projection model in one and two view geometry the main results are as follows quasi perspective projection matrix has nine degrees of freedom dof and the parallelism along and directions in world system are preserved in images ii quasi fundamental matrix can be simplified to special form with only six dofs the fundamental matrix is invariant to any non singular projective transformation iii plane induced homography under quasi perspective model can be simplified to special form defined by six dofs the quasi homography may be recovered from two pairs of corresponding points with known fundamental matrix iv any two reconstructions in quasi perspective space are defined up to non singular quasi perspective transformation the results are validated experimentally on both synthetic and real images
this paper presents an end user oriented programming environment called mashroom major contributions herein include an end user programming model with an expressive data structure as well as set of formally defined mashup operators the data structure takes advantage of nested table and maintains the intuitiveness while allowing users to express complex data objects the mashup operators are visualized with contextual menu and formula bar and can be directly applied on the data experiments and case studies reveal that end users have little difficulty in effectively and efficiently using mashroom to build mashup applications
how to save energy is critical issue for the life time of sensor networks under continuously changing environments sensor nodes have varying sampling rates in this paper we present an online algorithm to minimize the total energy consumption while satisfying sampling rate with guaranteed probability we model the sampling rate as random variable which is estimated over finite time window an efficient algorithm eosp energy aware online algorithm to satisfy sampling rates with guaranteed probability is proposed our approach can adapt the architecture accordingly to save energy experimental results demonstrate the effectiveness of our approach
policies in modern systems and applications play an essential role we argue that decisions based on policy rules should take into account the possibility for the users to enable specific policy rules by performing actions at the time when decisions are being rendered and or by promising to perform other actions in the future decisions should also consider preferences among different sets of actions enabling different rules we adopt formalism and mechanism devised for policy rule management in this context and investigate in detail the notion of obligations which are those actions users promise to perform in the future upon firing of specific policy rule we also investigate how obligations can be monitored and how the policy rules should be affected when obligations are either fulfilled or defaulted
vision based human action recognition provides an advanced interface and research in the field of human action recognition has been actively carried out however an environment from dynamic viewpoint where we can be in any position any direction etc must be considered in our living space in order to overcome the viewpoint dependency we propose volume motion template vmt and projected motion template pmt the proposed vmt method is an extension of the motion history image mhi method to space the pmt is generated by projecting the vmt into plane that is orthogonal to an optimal virtual viewpoint where the optimal virtual viewpoint is viewpoint from which an action can be described in greatest detail in space from the proposed method any actions taken from different viewpoints can be recognized independent of the viewpoints the experimental results demonstrate the accuracies and effectiveness of the proposed vmt method for view independent human action recognition
in blaze bleumer and strauss bbs proposed an application called atomic proxy re encryption in which semitrusted proxy converts ciphertext for alice into ciphertext for bob without seeing the underlying plaintext we predict that fast and secure re encryption will become increasingly popular as method for managing encrypted file systems although efficiently computable the widespread adoption of bbs re encryption has been hindered by considerable security risks following recent work of dodis and ivan we present new re encryption schemes that realize stronger notion of security and demonstrate the usefulness of proxy re encryption as method of adding access control to secure file system performance measurements of our experimental file system demonstrate that proxy re encryption can work effectively in practice
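as a purely illustrative sketch of the style of proxy re-encryption discussed above, the toy code below implements an elgamal-like scheme over a small prime group in which a re-encryption key derived from both secret keys lets a proxy convert alice's ciphertext into one decryptable by bob without seeing the plaintext; the parameters are far too small to be secure and this is not a faithful implementation of the bbs scheme or of the paper's constructions.

import random
from math import gcd

# toy elgamal-style proxy re-encryption; bare prime field, illustration only
p = 2**64 - 59          # a small prime; exponents live modulo p - 1
g = 5

def keygen():
    while True:
        sk = random.randrange(2, p - 1)
        if gcd(sk, p - 1) == 1:          # keep sk invertible mod p - 1
            return sk, pow(g, sk, p)

def encrypt(pk, m):
    k = random.randrange(2, p - 1)
    return pow(g, k, p), (m * pow(pk, k, p)) % p      # (g^k, m * pk^k)

def rekey(sk_a, sk_b):
    # re-encryption key a / b mod (p - 1); the proxy only ever sees this value
    return (sk_a * pow(sk_b, -1, p - 1)) % (p - 1)

def reencrypt(rk, ct):
    c1, c2 = ct
    return pow(c1, rk, p), c2            # (g^(k a / b), m * g^(a k))

def decrypt(sk, ct):
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, p), -1, p)) % p

a_sk, a_pk = keygen()
b_sk, b_pk = keygen()
ct_for_alice = encrypt(a_pk, 42)
ct_for_bob = reencrypt(rekey(a_sk, b_sk), ct_for_alice)
print(decrypt(b_sk, ct_for_bob), decrypt(a_sk, ct_for_alice))   # 42 42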
advances in biological experiments such as dna microarrays have produced large multidimensional data sets for examination and retrospective analysis scientists however heavily rely on existing biomedical knowledge in order to fully analyze and comprehend such datasets our proposed framework relies on the gene ontology for integrating a priori biomedical knowledge into traditional data analysis approaches we explore the impact of considering each aspect of the gene ontology individually for quantifying the biological relatedness between gene products we discuss two figure of merit scores for quantifying the pairwise biological relatedness between gene products and the intra cluster biological coherency of groups of gene products finally we perform cluster deterioration simulation experiments on well scrutinized saccharomyces cerevisiae data set consisting of hybridization measurements the results presented illustrate strong correlation between the devised cluster coherency figure of merit and the randomization of cluster membership
modern business process management expands to cover the partner organisations business processes across organisational boundaries and thereby supports organisations to coordinate the flow of information among organisations and link their business processes with collaborative business processes organisations can create dynamic and flexible collaborations to synergically adapt to the changing conditions and stay competitive in the global market due to its significant potential and value collaborative business processes are now turning to be an important issue of contemporary business process management and attracts lots of attention and efforts from both academic and industry sides in this paper we review the development of bb collaboration and collaborative business processes provide an overview of related issues in managing collaborative business processes and discuss some emerging technologies and their relationships to collaborative business processes finally we introduce the papers that are published in this special issue
we apply an extension of the nelson oppen combination method to develop decision procedure for the non disjoint union of theories modeling data structures with counting operator and fragments of arithmetic we present some data structures and some fragments of arithmetic for which the combination method is complete and effective to achieve effectiveness the combination method relies on particular procedures to compute sets that are representative of all the consequences over the shared theory we show how to compute these sets by using superposition calculus for the theories of the considered data structures and various solving and reduction techniques for the fragments of arithmetic we are interested in including gauss elimination fourier motzkin elimination and groebner bases computation
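fourier motzkin elimination is one of the reduction techniques for arithmetic fragments named above; the following self-contained sketch, our own with an invented example system, eliminates one variable from a set of linear inequalities by pairing every lower bound with every upper bound.

from fractions import Fraction

def fourier_motzkin(rows, var):
    # eliminate variable index var from a system of linear inequalities;
    # each row is (coeffs, bound) meaning sum(coeffs[j] * x[j]) <= bound,
    # with integer or Fraction entries
    lower, upper, rest = [], [], []
    for coeffs, b in rows:
        coeffs = [Fraction(c) for c in coeffs]
        b = Fraction(b)
        c = coeffs[var]
        if c == 0:
            rest.append((coeffs, b))
            continue
        norm = ([x / c for x in coeffs], b / c)
        # dividing by a negative coefficient flips the inequality,
        # turning the row into a lower bound on x[var]
        (upper if c > 0 else lower).append(norm)
    for lc, lb in lower:
        for uc, ub in upper:
            new = [u - l for u, l in zip(uc, lc)]   # x[var] cancels (1 - 1)
            rest.append((new, ub - lb))
    return rest

# x0 >= 1, x0 <= 3, x0 + x1 <= 4  --> eliminating x0 gives x1 <= 3 and 0 <= 2
system = [([-1, 0], -1), ([1, 0], 3), ([1, 1], 4)]
for coeffs, b in fourier_motzkin(system, 0):
    print([str(c) for c in coeffs], "<=", str(b))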
we present the first to our knowledge approximation algorithm for tensor clustering powerful generalization to basic clustering tensors are increasingly common in modern applications dealing with complex heterogeneous data and clustering them is fundamental tool for data analysis and pattern discovery akin to their cousins common tensor clustering formulations are np hard to optimize but unlike the case no approximation algorithms seem to be known we address this imbalance and build on recent co clustering work to derive tensor clustering algorithm with approximation guarantees allowing metrics and divergences eg bregman as objective functions therewith we answer two open questions by anagnostopoulos et al our analysis yields constant approximation factor independent of data size worst case example shows this factor to be tight for euclidean co clustering however empirically the approximation factor is observed to be conservative so our method can also be used in practice
many multicast overlay networks maintain application specific performance goals by dynamically adapting the overlay structure when the monitored performance becomes inadequate this adaptation results in an unstructured overlay where no neighbor selection constraints are imposed although such networks provide resilience to benign failures they are susceptible to attacks conducted by adversaries that compromise overlay nodes previous defense solutions proposed to address attacks against overlay networks rely on strong organizational constraints and are not effective for unstructured overlays in this work we identify demonstrate and mitigate insider attacks against measurement based adaptation mechanisms in unstructured multicast overlay networks we propose techniques to decrease the number of incorrect adaptations by using outlier detection and limit the impact of malicious nodes by aggregating local information to derive global reputation for each node we demonstrate the attacks and mitigation techniques through real life deployments of mature overlay multicast system
due to the rapid development in mobile communication technologies the usage of mobile devices such as cell phone or pda has increased significantly as different devices require different applications various new services are being developed to satisfy the needs one of the popular services under heavy demand is the location based service lbs that exploits the spatial information of moving objects per temporal changes in order to support lbs well in this paper we investigate how spatio temporal information of moving objects can be efficiently stored and indexed in particular we propose novel location encoding method based on hierarchical administrative district information our proposal is different from conventional approaches where moving objects are often expressed as geometric points in two dimensional space instead in ours moving objects are encoded as one dimensional points by both administrative district as well as road information our method becomes especially useful for monitoring traffic situation or tracing location of moving objects through approximate spatial queries
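a minimal sketch of the flavor of hierarchical location encoding described above: an administrative district path plus a road identifier is packed into a single one-dimensional key so that a whole district maps to a contiguous key range; the field layout and codes are invented for illustration and are not the paper's scheme.

# field widths and district codes below are assumptions for the example only;
# because the district path occupies the high-order bits, "all objects inside
# district X" becomes a contiguous key range suitable for a range index
FIELDS = [("province", 8), ("city", 8), ("district", 8), ("road", 16)]

def encode(province, city, district, road):
    key = 0
    for value, (_, bits) in zip((province, city, district, road), FIELDS):
        key = (key << bits) | (value & ((1 << bits) - 1))
    return key

def district_range(province, city, district):
    # key range covering every road inside the given district
    road_bits = FIELDS[-1][1]
    base = encode(province, city, district, 0)
    return base, base | ((1 << road_bits) - 1)

k = encode(3, 12, 7, 4021)
lo, hi = district_range(3, 12, 7)
print(hex(k), lo <= k <= hi)   # the object's key falls inside its district's range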
as current trends in software development move toward more complex object oriented programming inlining has become vital optimization that provides substantial performance improvements to and java programs yet the aggressiveness of the inlining algorithm must be carefully monitored to effectively balance performance and code size the state of the art is to use profile information associated with call edges to guide inlining decisions in the presence of virtual method calls profile information for one call edge may not be sufficient for making effectual inlining decisions therefore we explore the use of profiling data with additional levels of context sensitivity in addition to exploring fixed levels of context sensitivity we explore several adaptive schemes that attempt to find the ideal degree of context sensitivity for each call site our techniques are evaluated on the basis of runtime performance code size and dynamic compilation time on average we found that with minimal impact on performance context sensitivity can enable reductions in compiled code space and compile time performance on individual programs varied from minus to while reductions in compile time and code space of up to and respectively were obtained
suffix trees are by far the most important data structure in stringology with myriads of applications in fields like bioinformatics and information retrieval classical representations of suffix trees require n log n bits of space for string of size n this is considerably more than the n log sigma bits needed for the string itself where sigma is the alphabet size the size of suffix trees has been barrier to their wider adoption in practice recent compressed suffix tree representations require just the space of the compressed string plus extra bits this is already spectacular but still unsatisfactory when the alphabet is small as in dna sequences in this paper we introduce the first compressed suffix tree representation that breaks this linear space barrier our representation requires sublinear extra space and supports large set of navigational operations in logarithmic time an essential ingredient of our representation is the lowest common ancestor lca query we reveal important connections between lca queries and suffix tree navigation
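to make the lca ingredient concrete, the sketch below uses standard textbook machinery, not the paper's compressed representation: the string depth of the lowest common ancestor of two suffix tree leaves equals a minimum over a range of the lcp array built from a plain suffix array.

def suffix_array(s):
    # quadratic-ish construction, fine for a small illustration
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    # kasai's algorithm: lcp[r] = longest common prefix of suffixes sa[r-1], sa[r]
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            h = max(h - 1, 0)
        else:
            h = 0
    return lcp

def lca_string_depth(s, sa, lcp, i, j):
    # string depth of the lowest common ancestor, in the suffix tree of s, of
    # the leaves for suffixes i and j: a range-minimum query over the lcp array
    rank = {p: r for r, p in enumerate(sa)}
    lo, hi = sorted((rank[i], rank[j]))
    if lo == hi:
        return len(s) - i
    return min(lcp[lo + 1:hi + 1])

s = "mississippi"
sa = suffix_array(s)
lcp = lcp_array(s, sa)
print(lca_string_depth(s, sa, lcp, 1, 4))   # suffixes "ississippi", "issippi" -> 4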
approximation algorithms for clustering points in metric spaces is flourishing area of research with much research effort spent on getting better understanding of the approximation guarantees possible for many objective functions such as median means and min sum clustering this quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield more accurate clusterings eg for many problems such as clustering proteins by function or clustering images by subject there is some unknown correct target clustering and the implicit hope is that approximately optimizing these objective functions will in fact produce clustering that is close pointwise to the truth in this paper we show that if we make this implicit assumption explicit that is if we assume that any approximation to the given clustering objective phi is epsilon close to the target then we can produce clusterings that are epsilon close to the target even for values for which obtaining approximation is np hard in particular for median and means objectives we show that we can achieve this guarantee for any constant and for the min sum objective we can do this for any constant our results also highlight surprising conceptual difference between assuming that the optimal solution to say the median objective is epsilon close to the target and assuming that any approximately optimal solution is epsilon close to the target even for approximation factor say in the former case the problem of finding solution that is epsilon close to the target remains computationally hard and yet for the latter we have an efficient algorithm
decoupled software pipelining dswp is one approach to automatically extract threads from loops it partitions loops into long running threads that communicate in pipelined manner via inter core queues this work recognizes that dswp can also be an enabling transformation for other loop parallelization techniques this use of dswp called dswp splits loop into new loops with dependence patterns amenable to parallelization using techniques that were originally either inapplicable or poorly performing by parallelizing each stage of the dswp pipeline using potentially different techniques not only is the benefit of dswp increased but the applicability and performance of other parallelization techniques are enhanced this paper evaluates dswp as an enabling framework for other transformations by applying it in conjunction with doall localwrite and specdoall to individual stages of the pipeline this paper demonstrates significant performance gains on commodity core multicore machine running variety of codes transformed with dswp
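a minimal sketch of the pipelined execution pattern that dswp produces: the loop is split into long-running stages connected by inter-thread queues, so each stage could later be parallelized with a different technique; the stage bodies and queue sizes here are placeholders, not the paper's benchmarks or compiler transformation.

import threading, queue

DONE = object()

def stage(consume, produce, work):
    # a long-running pipeline stage: pull items, apply this stage's slice of
    # the original loop body, push results downstream, and forward the sentinel
    while True:
        item = consume.get()
        if item is DONE:
            if produce is not None:
                produce.put(DONE)
            return
        out = work(item)
        if produce is not None:
            produce.put(out)

q1, q2 = queue.Queue(maxsize=64), queue.Queue(maxsize=64)
results = []
t1 = threading.Thread(target=stage, args=(q1, q2, lambda x: x * x))
t2 = threading.Thread(target=stage, args=(q2, None, results.append))
t1.start(); t2.start()
for i in range(10):          # the original loop now only feeds the first stage
    q1.put(i)
q1.put(DONE)
t1.join(); t2.join()
print(sum(results))          # 285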
insufficiency of labeled training data is major obstacle for automatic video annotation semi supervised learning is an effective approach to this problem by leveraging large amount of unlabeled data however existing semi supervised learning algorithms have not demonstrated promising results in large scale video annotation due to several difficulties such as large variation of video content and intractable computational cost in this paper we propose novel semi supervised learning algorithm named semi supervised kernel density estimation sskde which is developed based on kernel density estimation kde approach while only labeled data are utilized in classical kde in sskde both labeled and unlabeled data are leveraged to estimate class conditional probability densities based on an extended form of kde it is non parametric method and it thus naturally avoids the model assumption problem that exists in many parametric semi supervised methods meanwhile it can be implemented with an efficient iterative solution process so this method is appropriate for video annotation furthermore motivated by existing adaptive kde approach we propose an improved algorithm named semi supervised adaptive kernel density estimation ssakde it employs local adaptive kernels rather than fixed kernel such that broader kernels can be applied in the regions with low density in this way more accurate density estimates can be obtained extensive experiments have demonstrated the effectiveness of the proposed methods
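the following numpy sketch is a simplified reading of the semi-supervised kernel density estimation idea: unlabeled points receive soft class weights from a kernel vote and then contribute to later estimates; the bandwidth, iteration count and synthetic data are our own choices and the adaptive-kernel variant is not shown.

import numpy as np

def gaussian_kernel(x, centers, h):
    d2 = np.sum((x - centers) ** 2, axis=1)
    return np.exp(-d2 / (2 * h * h))

def sskde(labeled_x, labeled_y, unlabeled_x, n_classes, h=1.0, iters=10):
    # soft class-membership weights: one-hot for labeled, uniform for unlabeled;
    # each iteration re-estimates unlabeled weights from a kernel-weighted vote
    all_x = np.vstack([labeled_x, unlabeled_x])
    w = np.zeros((len(all_x), n_classes))
    w[np.arange(len(labeled_x)), labeled_y] = 1.0
    w[len(labeled_x):] = 1.0 / n_classes
    for _ in range(iters):
        new_w = w.copy()
        for i in range(len(labeled_x), len(all_x)):
            k = gaussian_kernel(all_x[i], all_x, h)
            k[i] = 0.0                       # exclude the point itself
            scores = k @ w                   # weighted kernel vote per class
            new_w[i] = scores / scores.sum()
        w = new_w
    return w[len(labeled_x):]                # soft labels for the unlabeled points

rng = np.random.default_rng(0)
lx = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(6, 1, (5, 2))])
ly = np.array([0] * 5 + [1] * 5)
ux = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
print(np.round(sskde(lx, ly, ux, 2), 2)[:3])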
acquiring models of intricate objects like tree branches bicycles and insects is challenging task due to severe self occlusions repeated thin structures and surface discontinuities in theory shape from silhouettes sfs approach can overcome these difficulties and reconstruct visual hulls that are close to the actual shapes regardless of the complexity of the object in practice however sfs is highly sensitive to errors in silhouette contours and the calibration of the imaging system and has therefore not been used for obtaining accurate shapes with large number of views in this work we present practical approach to sfs using novel technique called coplanar shadowgram imaging that allows us to use dozens to even hundreds of views for visual hull reconstruction point light source is moved around an object and the shadows silhouettes cast onto single background plane are imaged we characterize this imaging system in terms of image projection reconstruction ambiguity epipolar geometry and shape and source recovery the coplanarity of the shadowgrams yields unique geometric properties that are not possible in traditional multi view camera based imaging systems these properties allow us to derive robust and automatic algorithm to recover the visual hull of an object and the positions of the light source simultaneously regardless of the complexity of the object we demonstrate the acquisition of several intricate shapes with severe occlusions and thin structures using to views
the unix fast file system ffs is probably the most widely used file system for performance comparisons however such comparisons frequently overlook many of the performance enhancements that have been added over the past decade in this paper we explore the two most commonly used approaches for improving the performance of meta data operations and recovery journaling and soft updates journaling systems use an auxiliary log to record meta data operations and soft updates uses ordered writes to ensure metadata consistency the commercial sector has moved en masse to journaling file systems as evidenced by their presence on nearly every server platform available today solaris aix digital unix hp ux irix and windows nt on all but solaris the default file system uses journaling in the meantime soft updates holds the promise of providing stronger reliability guarantees than journaling with faster recovery and superior performance in certain boundary cases in this paper we explore the benefits of soft updates and journaling comparing their behavior on both microbenchmarks and workload based macrobenchmarks we find that journaling alone is not sufficient to solve the meta data update problem if synchronous semantics are required ie meta data operations are durable once the system call returns then the journaling systems cannot realize their full potential only when this synchronicity requirement is relaxed can journaling systems approach the performance of systems like soft updates which also relaxes this requirement our asynchronous journaling and soft updates systems perform comparably in most cases while soft updates excels in some meta data intensive microbenchmarks the macrobenchmark results are more ambiguous in three cases soft updates and journaling are comparable in file intensive news workload journaling prevails and in small isp workload soft updates prevails
currently there are no known explicit algorithms for the great majority of graph problems in the dynamic distributed message passing model instead most state of the art dynamic distributed algorithms are constructed by composing static algorithm for the problem at hand with simulation technique that converts static algorithms to dynamic ones we argue that this powerful methodology does not provide satisfactory solutions for many important dynamic distributed problems and this necessitates developing algorithms for these problems from scratch in this paper we develop fully dynamic distributed algorithm for maintaining sparse spanners our algorithm improves drastically the quiescence time of the state of the art algorithm for the problem moreover we show that the quiescence time of our algorithm is optimal up to small constant factor in addition our algorithm improves significantly upon the state of the art algorithm in all efficiency parameters specifically it has smaller quiescence message and space complexities and smaller local processing time finally our algorithm is self contained and fairly simple and is consequently amenable to implementation on unsophisticated network devices
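for context, the sketch below builds the classic static greedy spanner, examining edges by increasing weight and keeping an edge only if the current spanner distance between its endpoints exceeds the stretch times the edge weight; it is meant only to show the kind of structure the dynamic distributed algorithm above maintains, not the algorithm itself, and the example graph is invented.

import heapq
from collections import defaultdict

def greedy_spanner(nodes, edges, stretch):
    # edges given as (weight, u, v); returns the kept spanner edges as (u, v, w)
    adj = defaultdict(list)
    kept = []
    for w, u, v in sorted(edges):
        if spanner_dist(adj, u, v, cutoff=stretch * w) > stretch * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept

def spanner_dist(adj, src, dst, cutoff):
    # dijkstra that gives up once distances exceed the cutoff
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")) or d > cutoff:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

nodes = ["a", "b", "c", "d"]
edges = [(1, "a", "b"), (1, "b", "c"), (1, "c", "d"), (1.1, "a", "d"), (2.5, "a", "c")]
print(greedy_spanner(nodes, edges, stretch=2))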
modeling spatiotemporal data in particular fuzzy and complex spatial objects representing geographic entities and relations is topic of great importance in geographic information systems computer vision environmental data management systems etc because of complex requirements it is challenging to represent spatiotemporal data and its features in databases and to effectively query them this article presents new approach to model and query the spatiotemporal data of fuzzy spatial and complex objects and or spatial relations in our case study we use meteorological database application in an intelligent database architecture which combines an object oriented database with knowledgebase for modeling and querying spatiotemporal objects
we present synchroscalar tile based architecture for embedded processing that is designed to provide the flexibility of dsps while approaching the power efficiency of asics we achieve this goal by providing high parallelism and voltage scaling while minimizing control and communication costs specifically synchroscalar uses columns of processor tiles organized into statically assigned frequency voltage domains to minimize power consumption furthermore while columns use simd control to minimize overhead data dependent computations can be supported by extremely flexible statically scheduled communication between columns we provide detailed evaluation of synchroscalar including spice simulation wire and device models synthesis of key components cycle level simulation and compiler and hand optimized signal processing applications we find that the goal of meeting not exceeding performance targets with data parallel applications leads to designs that depart significantly from our intuitions derived from general purpose microprocessor design in particular synchronous design and substantial global interconnect are desirable in the low frequency low power domain this global interconnect supports parallelization and reduces processor idle time which are critical to energy efficient implementations of high bandwidth signal processing overall synchroscalar provides programmability while achieving power efficiencies within of known asic implementations which is better than conventional dsps in addition frequency voltage scaling in synchroscalar provides between power savings in our application suite
adaptive object models aom are sophisticated way of building object oriented systems that let non programmers customize the behavior of the system and that are most useful for businesses that are rapidly changing although systems based on an aom are often much smaller than competitors they can be difficult to build and to learn we believe that the problems with aom are due in part to mismatch between their design and the languages that are used to build them this paper describes how to avoid this mismatch by using implicit and explicit metaclasses
this paper proposes using user level memory thread ulmt for correlation prefetching in this approach user thread runs on general purpose processor in main memory either in the memory controller chip or in dram chip the thread performs correlation prefetching in software sending the prefetched data into the cache of the main processor this approach requires minimal hardware beyond the memory processor the correlation table is software data structure that resides in main memory while the main processor only needs few modifications to its cache so that it can accept incoming prefetches in addition the approach has wide applicability as it can effectively prefetch even for irregular applications finally it is very flexible as the prefetching algorithm can be customized by the user on an application basis our simulation results show that through new design of the correlation table and prefetching algorithm our scheme delivers good results specifically nine mostly irregular applications show an average speedup of furthermore our scheme works well in combination with conventional processor side sequential prefetcher in which case the average speedup increases to finally by exploiting the customization of the prefetching algorithm we increase the average speedup to
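the sketch below shows only the correlation table as a software data structure, a small markov-style map from a miss address to recently observed successor addresses; the memory-side thread, cache interface and prefetch injection described above are not modeled, and the table parameters are invented.

from collections import defaultdict, deque

class CorrelationTable:
    # for each miss address, remember the next few miss addresses that followed
    # it, and on a new miss suggest those successors as prefetch candidates

    def __init__(self, successors_per_entry=4):
        self.table = defaultdict(lambda: deque(maxlen=successors_per_entry))
        self.last_miss = None

    def record_miss(self, addr):
        if self.last_miss is not None and addr not in self.table[self.last_miss]:
            self.table[self.last_miss].appendleft(addr)
        self.last_miss = addr

    def prefetch_candidates(self, addr):
        return list(self.table.get(addr, []))

ct = CorrelationTable()
trace = [0x100, 0x200, 0x180, 0x100, 0x200, 0x180, 0x100]
for a in trace:
    print(hex(a), "->", [hex(p) for p in ct.prefetch_candidates(a)])
    ct.record_miss(a)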
this paper describes enhanced subquery optimizations in oracle relational database system it discusses several techniques subquery coalescing subquery removal using window functions and view elimination for group by queries these techniques recognize and remove redundancies in query structures and convert queries into potentially more optimal forms the paper also discusses novel parallel execution techniques which have general applicability and are used to improve the scalability of queries that have undergone some of these transformations it describes new variant of antijoin for optimizing subqueries involved in the universal quantifier with columns that may have nulls it then presents performance results of these optimizations which show significant execution time improvements
the replica placement problem rpp aims at creating set of duplicated data objects across the nodes of distributed system in order to optimize certain criteria typically rpp formulations fall into two categories static and dynamic the first assumes that access statistics are estimated in advance and remain static and therefore one time replica distribution is sufficient irpp in contrast dynamic methods change the replicas in the network potentially upon every request this paper proposes an alternative technique named continuous replica placement problem crpp which falls between the two extreme approaches crpp can be defined as given an already implemented replication scheme and estimated access statistics for the next time period define new replication scheme subject to optimization criteria and constraints as we show in the problem formulation crpp is different in that the existing heuristics in the literature cannot be used either statically or dynamically to solve the problem in fact even with the most careful design their performance will be inferior since crpp embeds scheduling problem to facilitate the proposed mechanism we provide insight on the intricacies of crpp and propose various heuristics
software classification models have been regarded as an essential support tool in performing measurement and analysis processes most of the established models are single cycled in the model usage stage and thus require the measurement data of all the model’s variables to be simultaneously collected and utilized for classifying an unseen case within only single decision cycle conversely the multi cycled model allows the measurement data of all the model’s variables to be gradually collected and utilized for such classification within more than one decision cycle and thus intuitively seems to have better classification efficiency but poorer classification accuracy software project managers often have difficulties in choosing an appropriate classification model that is better suited to their specific environments and needs however this important topic is not adequately explored in software measurement and analysis literature by using an industrial software measurement dataset of nasa kc this paper explores the quantitative performance comparisons of the classification accuracy and efficiency of the discriminant analysis da and logistic regression lr based single cycled models and the decision tree dt based and echaid algorithms multi cycled models the experimental results suggest that the re appraisal cost of the type mr the software failure cost of type ii mr and the data collection cost of software measurements should be considered simultaneously when choosing an appropriate classification model
cyber physical systems increasingly rely on dynamically adaptive programs to respond to changes in their physical environment examples include ecosystem monitoring and disaster relief systems these systems are considered high assurance since errors during execution could result in injury loss of life environmental impact and or financial loss in order to facilitate the development and verification of dynamically adaptive systems we separate functional concerns from adaptive concerns specifically we model dynamically adaptive program as collection of non adaptive steady state programs and set of adaptations that realize transitions among steady state programs in response to environmental changes we use linear temporal logic ltl to specify properties of the non adaptive portions of the system and we use ltl an adapt operator extension to ltl to concisely specify properties that hold during the adaptation process model checking offers an attractive approach to automatically analyzing models for adherence to formal properties and thus providing assurance however currently model checkers are unable to verify properties specified using ltl moreover as the number of steady state programs and adaptations increase the verification costs in terms of space and time potentially become unwieldy to address these issues we propose modular model checking approach to verifying that formal model of an adaptive program satisfies its requirements specified in ltl and ltl respectively
as the size of an rfid tag becomes smaller and the price of the tag gets lower rfid technology has been applied to wide range of areas recently rfid has been adopted in the business area such as supply chain management since companies can get movement information for products easily using the rfid technology it is expected to revolutionize supply chain management however the amount of rfid data in supply chain management is huge therefore it requires much time to extract valuable information from rfid data for supply chain management in this paper we define query templates for tracking queries and path oriented queries to analyze the supply chain we then propose an effective path encoding scheme to encode the flow information for products to retrieve the time information for products efficiently we utilize numbering scheme used in the xml area based on the path encoding scheme and the numbering scheme we devise storage scheme to process tracking queries and path oriented queries efficiently finally we propose method which translates the queries to sql queries experimental results show that our approach can process the queries efficiently on the average our approach is about times better than recent technique in terms of query performance
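a minimal sketch of the general idea of encoding product flows as strings of fixed-width location codes so that a path-oriented query becomes an indexable string-matching condition; the codes, widths and example locations are invented and this is not the paper's exact encoding or its sql translation.

# the location codes and width below are assumptions for the example only
WIDTH = 4
LOCATION_CODE = {"factory": "0001", "dc_east": "0002",
                 "store_12": "0003", "store_47": "0004"}

def encode_path(locations):
    return "".join(LOCATION_CODE[l] for l in locations)

def matches_subpath(path_code, subpath):
    needle = encode_path(subpath)
    # only test positions aligned on code boundaries so a match cannot
    # straddle two different location codes
    return any(path_code[i:i + len(needle)] == needle
               for i in range(0, len(path_code) - len(needle) + 1, WIDTH))

epc_paths = {
    "epc-1": encode_path(["factory", "dc_east", "store_12"]),
    "epc-2": encode_path(["factory", "store_47"]),
}
query = ["factory", "dc_east"]
print([epc for epc, p in epc_paths.items() if matches_subpath(p, query)])  # ['epc-1']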
we review two areas of recent research linking proportional fairness with product form networks the areas concern respectively the heavy traffic and the large deviations limiting regimes for the stationary distribution of flow model where the flow model is stochastic process representing the randomly varying number of document transfers present in network sharing capacity according to the proportional fairness criterion in these two regimes we postulate the limiting form of the stationary distribution by comparison with several variants of the fairness criterion we outline how product form results can help provide insight into the performance consequences of resource pooling
large scale supercomputing is revolutionizing the way science is conducted growing challenge however is understanding the massive quantities of data produced by large scale simulations the data typically time varying multivariate and volumetric can occupy from hundreds of gigabytes to several terabytes of storage space transferring and processing volume data of such sizes is prohibitively expensive and resource intensive although it may not be possible to entirely alleviate these problems data compression should be considered as part of viable solution especially when the primary means of data analysis is volume rendering in this paper we present our study of multivariate compression which exploits correlations among related variables for volume rendering two configurations for multidimensional compression based on vector quantization are examined we emphasize quality reconstruction and interactive rendering which leads us to solution using graphics hardware to perform on the fly decompression during rendering
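the following numpy sketch shows one plain vector-quantization configuration in the spirit of the approach above: the volume is cut into small blocks, a codebook is learned with a few lloyd iterations, and one code index is stored per block; the block size, codebook size and random volume are illustrative only, and hardware decompression during rendering is not shown.

import numpy as np

def vq_compress(volume, block=4, n_codes=64, iters=10, seed=0):
    # cut a 3-d scalar volume into block^3 vectors, build a codebook with a few
    # rounds of k-means, and return (codebook, one code index per block)
    b = block
    z, y, x = (s // b * b for s in volume.shape)
    v = volume[:z, :y, :x].reshape(z // b, b, y // b, b, x // b, b)
    vectors = v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, b ** 3).astype(np.float32)

    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_codes, replace=False)].copy()
    for _ in range(iters):
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for c in range(n_codes):
            members = vectors[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, assign   # decompression: codebook[assign] reshaped back

vol = np.random.default_rng(1).random((32, 32, 32)).astype(np.float32)
codebook, assign = vq_compress(vol)
ratio = vol.size / (assign.size + codebook.size)
print("approximate compression ratio: %.1fx" % ratio)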
in this paper we extend the standard for object oriented databases odmg with reactive features by proposing language for specifying triggers and defining its semantics this extension has several implications thus this work makes three different specific contributions first the definition of declarative data manipulation language for odmg which is missing in the current version of the standard such definition requires revisiting data manipulation in odmg and also addressing issues related to set oriented versus instance oriented computation then the definition of trigger language for odmg unifying also the sql proposal and providing support for trigger inheritance and overriding finally the development of formal semantics for the proposed data manipulation and trigger languages
most future large scale sensor networks are expected to follow two tier architecture which consists of resource rich master nodes at the upper tier and resource poor sensor nodes at the lower tier sensor nodes submit data to nearby master nodes which then answer the queries from the network owner on behalf of sensor nodes relying on master nodes for data storage and query processing raises severe concerns about data confidentiality and query result correctness when the sensor network is deployed in hostile environments in particular compromised master node may leak hosted sensitive data to the adversary it may also return juggled or incomplete query results to the network owner this paper for the first time in the literature presents suite of novel schemes to secure multidimensional range queries in tiered sensor networks the proposed schemes can ensure data confidentiality against master nodes and also enable the network owner to verify with very high probability the authenticity and completeness of any query result by inspecting the spatial and temporal relationships among the returned data detailed performance evaluations confirm the high efficacy and efficiency of the proposed schemes
applications ranging from location based services to multi player online gaming require continuous query support to monitor track and detect events of interest among sets of moving objects examples are alerting capabilities for detecting whether the distance the travel cost or the travel time among set of moving objects exceeds threshold these types of queries are driven by continuous streams of location updates simultaneously evaluated over many queries in this paper we define three types of proximity relations that induce location constraints to model continuous spatio temporal queries among sets of moving objects in road networks our focus lies on evaluating large number of continuous queries simultaneously we introduce novel moving object indexing technique that together with novel road network partitioning scheme restricts computations within the partial road network these techniques reduce query processing overhead by more than experiments over real world data sets show that our approach is twenty times faster than baseline algorithm
wireless sensor networks produce large amount of data that needs to be processed delivered and assessed according to the application objectives the way these data are manipulated by the sensor nodes is fundamental issue information fusion arises as response to process data gathered by sensor nodes and benefits from their processing capability by exploiting the synergy among the available data information fusion techniques can reduce the amount of data traffic filter noisy measurements and make predictions and inferences about monitored entity in this work we survey the current state of the art of information fusion by presenting the known methods algorithms architectures and models of information fusion and discuss their applicability in the context of wireless sensor networks
the technique of latent semantic indexing is used in wide variety of commercial applications in these applications the processing time and ram required for svd computation and the processing time and ram required during lsi retrieval operations are all roughly linear in the number of dimensions chosen for the lsi representation space in large scale commercial lsi applications reducing values could be of significant value in reducing server costs this paper explores the effects of varying dimensionality the approach taken here focuses on term comparisons pairs of terms are considered which have strong real world associations the proximities of members of these pairs in the lsi space are compared at multiple values of the testing is carried out for collections of from one to five million documents for the five million document collection value of provides the best performance the results suggest that there is something of an island of stability in the to range the results also indicate that there is relatively little room to employ values outside of this range without incurring significant distortions in at least some term term correlations
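the sketch below mirrors the style of experiment described above on a tiny invented corpus: build a term-document matrix, take a truncated svd, and compare the cosine proximity of a related term pair at several dimensionalities k; at this scale the absolute numbers are meaningless and only the procedure is illustrative.

import numpy as np

# a tiny made-up corpus; in the study above the collections hold millions of documents
docs = ["heart attack cardiac arrest",
        "cardiac surgery heart valve",
        "stock market crash",
        "market crash attack on prices"]
terms = sorted({w for d in docs for w in d.split()})
t_index = {t: i for i, t in enumerate(terms)}

A = np.zeros((len(terms), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[t_index[w], j] += 1.0

U, S, Vt = np.linalg.svd(A, full_matrices=False)

def term_cosine(t1, t2, k):
    # term vectors in the k-dimensional lsi space, scaled by the singular values
    v1 = U[t_index[t1], :k] * S[:k]
    v2 = U[t_index[t2], :k] * S[:k]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

for k in (1, 2, 3, 4):
    print(k, round(term_cosine("heart", "cardiac", k), 3))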
traditional content based image retrieval systems typically compute single descriptor per image based for example on color histograms the result of query is in general the images from the database whose descriptors are the closest to the descriptor of the query image systems built this way are able to return images that are globally similar to the query image but can not return images that contain some of the objects that are in the query as opposed to this traditional coarse grain recognition scheme recent advances in image processing make fine grain image recognition possible notably by computing local descriptors that can detect similar objects in different images obviously powerful fine grain recognition in images also changes the retrieval process instead of submitting single query to retrieve similar images multiple queries must be submitted and their partial results must be post processed before delivering the answer this paper first presents family of local descriptors supporting fine grain image recognition these descriptors enforce robust recognition despite image rotations and translations illumination variations and partial occlusions many multi dimensional indexes have been proposed to speed up the retrieval process these indexes however have been mostly designed for and evaluated against databases where each image is described by single descriptor while this paper does not present any new indexing scheme it shows that the three most efficient indexing techniques known today are still too slow to be used in practice with local descriptors because of the changes in the retrieval process
the advance of the web has significantly and rapidly changed the way of information organization sharing and distribution the next generation of the web the semantic web seeks to make information more usable by machines by introducing more rigorous structure based on ontologies in this context we try to propose novel and integrated approach for semi automated extraction of ontology based semantic web from data intensive web application and thus make the web content machine understandable our approach is based on the idea that semantics can be extracted by applying reverse engineering technique on the structures and the instances of html forms which are the most convenient interface to communicate with relational databases on the current data intensive web application this semantics is exploited to produce over several steps personalised ontology
we consider the problem of flow coordination in distributed multimedia applications most transport level protocols are designed to operate independently and lack mechanisms for sharing information with other flows and coordinating data transport in various ways this limitation becomes problematic in distributed applications that employ numerous flows between two computing clusters sharing the same intermediary forwarding path across the internet in this article we propose an open architecture that supports the sharing of network state information peer flow information and application specific information called simply the coordination protocol cp the scheme facilitates coordination of network resource usage across flows belonging to the same application as well as aiding other types of coordination the effectiveness of our approach is illustrated in the context of multistreaming in tele immersion where consistency of network information across flows both greatly improves frame transport synchrony and minimizes buffering delay
nodes in the hexagonal mesh and torus network are placed at the vertices of regular triangular tessellation so that each node has up to six neighbors the routing algorithm for the hexagonal torus is very complicated and has remained an open problem until now hexagonal mesh and torus are known to belong to the class of cayley digraphs in this paper we use cayley formulations for the hexagonal torus along with some results on subgraphs and coset graphs to develop the optimal routing algorithm for the hexagonal torus and then we draw conclusions about the network diameter of the hexagonal torus

deformable isosurfaces implemented with level set methods have demonstrated great potential in visualization and computer graphics for applications such as segmentation surface processing and physically based modeling their usefulness has been limited however by their high computational cost and reliance on significant parameter tuning this paper presents solution to these challenges by describing graphics processor gpu based algorithms for solving and visualizing level set solutions at interactive rates the proposed solution is based on new streaming implementation of the narrow band algorithm the new algorithm packs the level set isosurface data into texture memory via multi dimensional virtual memory system as the level set moves this texture based representation is dynamically updated via novel gpu to cpu message passing scheme by integrating the level set solver with real time volume renderer user can visualize and intuitively steer the level set surface as it evolves we demonstrate the capabilities of this technology for interactive volume segmentation and visualization
usage data captured and logged by computers has long been an essential source of information for software developers support services personnel usability designers and learning researchers whether from mainframes file servers network devices or workstations the user event data logged in its many forms has served as an essential source of information for those who need to improve software analyze problems monitor security track workflow report on resource usage evaluate learning activities etc with today’s generation of open and community source web based frameworks however new challenges arise as to how where and when user activity gets captured and analyzed these frameworks flexibility in allowing easy integration of different applications presentation technologies middleware and data sources has side effects on usage data fragmented logs in wide range of formats often bestrewn across many locations this paper focuses on common issues faced especially by academic computing support personnel who need to gather and analyze user activity information within heterogeneous distributed open source web frameworks like sakai and uportal as described in this paper these kinds of challenges can be met by drawing upon techniques for coordinated distributed event monitoring along with some basic data mining and data visualization approaches in particular this paper describes work in progress to develop an approach towards building distributed capture and analysis systems for large production deployment of the sakai collaboration and learning environment in order to meet wide range of tracking monitoring and reporting log analysis in one university setting
in the sponsored search model search engines are paid by businesses that are interested in displaying ads for their site alongside the search results businesses bid for keywords and their ad is displayed when the keyword is queried to the search engine an important problem in this process is keyword generation given business that is interested in launching campaign suggest keywords that are related to that campaign we address this problem by making use of the query logs of the search engine we identify queries related to campaign by exploiting the associations between queries and urls as they are captured by the user’s clicks these queries form good keyword suggestions since they capture the wisdom of the crowd as to what is related to site we formulate the problem as semi supervised learning problem and propose algorithms within the markov random field model we perform experiments with real query logs and we demonstrate that our algorithms scale to large query logs and produce meaningful results
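the following sketch illustrates, under stated assumptions, the general idea of exploiting query url click associations for keyword suggestion: relatedness is propagated from a few seed queries to other queries through shared clicked urls; the toy click log, the seed set and the fixed point iteration are hypothetical and are not the markov random field algorithms of the paper

```python
# hedged sketch: propagate campaign relatedness over a query-URL click bipartite graph
from collections import defaultdict

clicks = [                      # (query, clicked url) pairs from a hypothetical log
    ("cheap flights", "travelsite.example"),
    ("flight deals", "travelsite.example"),
    ("flight deals", "airfare.example"),
    ("last minute flights", "airfare.example"),
    ("garden tools", "hardware.example"),
]
seeds = {"cheap flights": 1.0}  # queries known to belong to the campaign

q2u, u2q = defaultdict(set), defaultdict(set)
for q, u in clicks:
    q2u[q].add(u)
    u2q[u].add(q)

scores = {q: seeds.get(q, 0.0) for q in q2u}
for _ in range(20):                                   # simple fixed-point iteration
    url_scores = {u: sum(scores[q] for q in qs) / len(qs) for u, qs in u2q.items()}
    for q, us in q2u.items():
        if q not in seeds:                            # keep seed labels clamped
            scores[q] = sum(url_scores[u] for u in us) / len(us)

# highest-scoring non-seed queries are the suggested keywords
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```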
tracked objects rarely move alone they are often temporarily accompanied by other objects undergoing similar motion we propose novel tracking algorithm called sputnik tracker it is capable of identifying which image regions move coherently with the tracked object this information is used to stabilize tracking in the presence of occlusions or fluctuations in the appearance of the tracked object without the need to model its dynamics in addition sputnik tracker is based on novel template tracker integrating foreground and background appearance cues the time varying shape of the target is also estimated in each video frame together with the target position the time varying shape is used as another cue when estimating the target position in the next frame
in this position paper we discuss number of issues relating to model metrics with particular emphasis on metrics for uml models our discussion is presented as series of nine observations where we examine some of the existing work on applying metrics to uml models present some of our own work in this area and specify some topics for future research that we regard as important furthermore we identify three categories of challenges for model metrics and describe how our nine observations can be partitioned into these categories
relatively small set of static instructions has significant leverage on program execution performance these problem instructions contribute disproportionate number of cache misses and branch mispredictions because their behavior cannot be accurately anticipated using existing prefetching or branch prediction mechanisms the behavior of many problem instructions can be predicted by executing small code fragment called speculative slice if speculative slice is executed before the corresponding problem instructions are fetched then the problem instructions can move smoothly through the pipeline because the slice has tolerated the latency of the memory hierarchy for loads or the pipeline for branches this technique results in speedups up to percent over an aggressive baseline machine to benefit from branch predictions generated by speculative slices the predictions must be bound to specific dynamic branch instances we present technique that invalidates predictions when it can be determined by monitoring the program’s execution path that they will not be used this enables the remaining predictions to be correctly correlated
for any outsourcing service privacy is major concern this paper focuses on outsourcing frequent itemset mining and examines the issue on how to protect privacy against the case where the attackers have precise knowledge on the supports of some items we propose new approach referred to as support anonymity to protect each sensitive item with other items of similar support to achieve support anonymity we introduce pseudo taxonomy tree and have the third party mine the generalized frequent itemsets under the corresponding generalized association rules instead of association rules the pseudo taxonomy is constructed to facilitate hiding of the original items where each original item can map to either leaf node or an internal node in the taxonomy tree the rationale for this approach is that with taxonomy tree the nodes to satisfy the support anonymity may be any nodes in the taxonomy tree with the appropriate supports so this approach can provide more candidates for support anonymity with limited fake items as only the leaf nodes not the internal nodes of the taxonomy tree need to appear in the transactions otherwise for the association rule mining the nodes to satisfy the support anonymity have to correspond to the leaf nodes in the taxonomy tree this is far more restricted the challenge is thus on how to generate the pseudo taxonomy tree to facilitate support anonymity and to ensure the conservation of original frequent itemsets the experimental results showed that our methods of support anonymity can achieve very good privacy protection with moderate storage overhead
topical crawlers are increasingly seen as way to address the scalability limitations of universal search engines by distributing the crawling process across users queries or even client computers the context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages we developed framework to fairly evaluate topical crawling algorithms under number of performance metrics such framework is employed here to evaluate different algorithms that have proven highly competitive among those proposed in the literature and in our own previous research in particular we focus on the tradeoff between exploration and exploitation of the cues available to crawler and on adaptive crawlers that use machine learning techniques to guide their search we find that the best performance is achieved by novel combination of explorative and exploitative bias and introduce an evolutionary crawler that surpasses the performance of the best nonadaptive crawler after sufficiently long crawls we also analyze the computational complexity of the various crawlers and discuss how performance and complexity scale with available resources evolutionary crawlers achieve high efficiency and scalability by distributing the work across concurrent agents resulting in the best performance cost ratio
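as a minimal sketch of the exploration versus exploitation tradeoff discussed above, the toy crawler below usually follows the highest scored frontier link but with a small probability picks a random one; the miniature web graph, the relevance scores and the epsilon parameter are hypothetical, and the adaptive and evolutionary mechanisms of the paper are not modeled

```python
# hedged sketch: best-first topical crawler with an explorative bias
import heapq, random

web = {   # page -> (relevance to the topic, outlinks); a toy stand-in for the web
    "seed": (0.2, ["a", "b"]),
    "a":    (0.9, ["c"]),
    "b":    (0.1, ["d"]),
    "c":    (0.8, []),
    "d":    (0.7, []),
}

def crawl(seed, budget, epsilon=0.2, rng=random.Random(0)):
    frontier = [(-web[seed][0], seed)]        # max-heap via negated scores
    visited, harvest = set(), 0.0
    while frontier and len(visited) < budget:
        if rng.random() < epsilon:            # explore: a random frontier entry
            _, page = frontier.pop(rng.randrange(len(frontier)))
            heapq.heapify(frontier)
        else:                                 # exploit: the best-scored entry
            _, page = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)
        relevance, links = web[page]
        harvest += relevance
        for link in links:
            # in a real crawler the link score comes from anchor text and page context
            if link not in visited:
                heapq.heappush(frontier, (-web[link][0], link))
    return visited, harvest

print(crawl("seed", budget=4))
```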
this paper argues for set of requirements that an architectural style for self healing systems should satisfy adaptability dynamicity awareness autonomy robustness distributability mobility and traceability support for these requirements is discussed along five dimensions we have identified as distinguishing characteristics of architectural styles external structure topology rules behavior interaction and data flow as an illustration these requirements are used to assess an existing architectural style while this initial formulation of the requirements appears to have utility much further work remains to be done in order to apply it in evaluating and comparing architectural styles for self healing systems
leakage energy will be the major energy consumer in future deep sub micron designs especially the memory sub system of future socs will be negatively affected by this trend in order to reduce the leakage energy memory banks are transitioned to low energy state when possible this transition itself costs some energy which is termed as the transition energy in this paper we present as the first approach of its kind novel energy saving replacement policy called lru seq for instruction caches evaluation of the policy on various architectures in system level environment has shown that up to energy savings can be obtained considering the negligible hardware impact lru seq offers viable choice for an energy saving policy
in this work we propose method to reduce the impact of process variations by adapting the application’s algorithm at the software layer we introduce the concept of hardware signatures as the measured post manufacturing hardware characteristics that can be used to drive software adaptation across different die using encoding as an example we demonstrate significant yield improvements as much as points at over design reduction in over design by as much as points at yield as well as application quality improvements about db increase in average psnr at yield further we investigate implications of limited information exchange ie signature measurement granularity on yield and quality we show that our proposed technique for determining optimal signature measurement points results in an improvement in psnr of about db over naive sampling for the encoder we conclude that hardware signature based application adaptation is an easy and inexpensive to implement better informed by actual application requirements and effective way to manage yield cost quality tradeoffs in application implementation design flows
modern embedded consumer devices execute complex network and multimedia applications that require high performance and low energy consumption for implementing complex applications on network on chips nocs design methodology is needed for performing exploration at noc system level in order to select the optimal application specific noc architecture serving the application requirements in the best way the design methodology we present in this paper is based on the exploration of different noc characteristics and is supported by flexible noc simulator which provides the essential evaluation metrics in order to select the optimal communication parameters of the noc architectures we illustrated that it is possible with the evaluation metrics provided by the simulator we present to perform exploration of several noc aspects and select the optimal communication characteristics for noc platforms having network and multimedia applications as the target domains with our methodology we can achieve gain of in the energy times delay product on average
in order to manage service to meet the agreed upon sla it is important to design service of the required capacity and to monitor the service thereafter for violations at runtime this objective can be achieved by translating slos specified in the sla into lower level policies that can then be used for design and enforcement purposes such design and operational policies are often constraints on thresholds of lower level metrics in this paper we propose systematic and practical approach that combines fine grained performance modeling with regression analysis to translate service level objectives into design and operational policies for multi tier applications we demonstrate that our approach can handle both request based and session based workloads and deal with workload changes in terms of both request volume and transaction mix we validate our approach using both the rubis commerce benchmark and trace driven simulation of business critical enterprise application these results show the effectiveness of our approach
classical compiler optimizations assume fixed cache architecture and modify the program to take best advantage of it in some cases this may not be the best strategy because each nest might work best with different cache configuration and transforming nest for given fixed cache configuration may not be possible due to data and control dependences working with fixed cache configuration can also increase energy consumption in loops where the best required configuration is smaller than the default fixed one in this paper we take an alternate approach and modify the cache configuration for each nest depending on the access pattern exhibited by the nest we call this technique compiler directed cache polymorphism cdcp more specifically in this paper we make the following contributions first we present an approach for analyzing data reuse properties of loop nests second we give algorithms to simulate the footprints of array references in their reuse space third based on our reuse analysis we present an optimization algorithm to compute the cache configurations for each loop nest our experimental results show that cdcp is very effective in finding the near optimal data cache configurations for different nests in array intensive applications
it is envisaged that the application of the multilevel security mls scheme will enhance flexibility and effectiveness of authorization policies in shared enterprise databases and will replace cumbersome authorization enforcement practices through complicated view definitions on per user basis however as advances in this area are being made and ideas crystallized the concomitant weaknesses of the mls databases are also surfacing we insist that the critical problem with the current model is that the belief at higher security level is cluttered with irrelevant or inconsistent data as no mechanism for attenuation is supported critics also argue that it is imperative for mls database users to theorize about the belief of others perhaps at different security levels an apparatus that is currently missing and the absence of which is seriously felt the impetus for our current research is this need to provide an adequate framework for belief reasoning in mls databases we demonstrate that prudent application of the concept of inheritance in deductive database setting will help capture the notion of declarative belief and belief reasoning in mls databases in an elegant way to this end we develop function to compute belief in multiple modes which can be used to reason about the beliefs of other users we strive to develop poised and practical logical characterization of mls databases for the first time based on the inherently difficult concept of non monotonic inheritance we present an extension of the acclaimed datalog language called the multilog and show that datalog is special case of our language we also suggest an implementation scheme for multilog as front end for coral
in this paper we propose formal analysis approach to estimate the expected average data cache access time of an application across all possible program inputs towards this goal we introduce the notion of probabilistic access history that intuitively summarizes the history of data memory accesses along different program paths to reach particular program point and their associated probabilities an efficient static program analysis technique has been developed to compute the access history at all program points we estimate the cache hit miss probabilities and hence the expected access time of each data memory reference from the access history our experimental evaluation confirms the accuracy and viability of the probabilistic data cache modeling approach
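a minimal worked form of the estimate implied above, with hypothetical notation: if the access history analysis yields a hit probability p_i for each data reference r_i, then the expected data cache access time over n references can be written as

```latex
\mathbb{E}[T] \;=\; \sum_{i=1}^{n} \Big( p_i \, t_{\mathrm{hit}} + (1 - p_i)\, t_{\mathrm{miss}} \Big)
```

where t_hit and t_miss are the cache hit and miss latencies; these symbols are ours, introduced only to make the expectation explicit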
this paper presents new static type system for multithreaded programs well typed programs in our system are guaranteed to be free of data races and deadlocks our type system allows programmers to partition the locks into fixed number of equivalence classes and specify partial order among the equivalence classes the type checker then statically verifies that whenever thread holds more than one lock the thread acquires the locks in the descending order our system also allows programmers to use recursive tree based data structures to describe the partial order for example programmers can specify that nodes in tree must be locked in the tree order our system allows mutations to the data structure that change the partial order at runtime the type checker statically verifies that the mutations do not introduce cycles in the partial order and that the changing of the partial order does not lead to deadlocks we do not know of any other sound static system for preventing deadlocks that allows changes to the partial order at runtime our system uses variant of ownership types to prevent data races and deadlocks ownership types provide statically enforceable way of specifying object encapsulation ownership types are useful for preventing data races and deadlocks because the lock that protects an object can also protect its encapsulated objects this paper describes how to use our type system to statically enforce object encapsulation as well as prevent data races and deadlocks the paper also contains detailed discussion of different ownership type systems and the encapsulation guarantees they provide
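the lock level discipline above is enforced statically by the type checker and cannot be reproduced directly in a dynamically typed sketch, so the following is only a hedged runtime illustration of the same rule, namely that a thread may acquire a lock only if its level is strictly below every level it already holds; the OrderedLock class and the level numbers are hypothetical

```python
# hedged runtime illustration of descending-order lock acquisition
import threading

class OrderedLock:
    _held = threading.local()                 # per-thread stack of held levels

    def __init__(self, level):
        self.level = level
        self._lock = threading.Lock()

    def __enter__(self):
        held = getattr(OrderedLock._held, "stack", [])
        if held and held[-1] <= self.level:   # would violate the descending order
            raise RuntimeError(f"lock order violation: {held[-1]} then {self.level}")
        self._lock.acquire()
        OrderedLock._held.stack = held + [self.level]
        return self

    def __exit__(self, *exc):
        OrderedLock._held.stack.pop()
        self._lock.release()

account, ledger = OrderedLock(2), OrderedLock(1)
with account:
    with ledger:                              # 2 then 1: allowed (strictly descending)
        pass
```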
distributed information brokering system dibs is peer to peer overlay network that comprises diverse data servers and brokering components helping client queries locate the data server many existing information brokering systems adopt server side access control deployment and honest assumptions on brokers however little attention has been drawn on privacy of data and metadata stored and exchanged within dibs in this paper we address privacy preserving information sharing via on demand information access we propose flexible and scalable system using broker coordinator overlay network through an innovative automaton segmentation scheme distributed access control enforcement and query segment encryption our system integrates security enforcement and query forwarding while preserving system wide privacy we present the automaton segmentation approach analyze privacy preservation in details and finally examine the end to end performance and scalability through experiments and analysis
we study the problem of finding in given word all maximal gapped palindromes verifying two types of constraints that we call long armed and length constrained palindromes for each of the two classes we propose an algorithm that runs in time for constant size alphabet where is the number of output palindromes both algorithms can be extended to compute biological gapped palindromes within the same time bound
many of the existing approaches in software comprehension focus on program structure or external documentation however by analyzing formal information the informal semantics contained in the vocabulary of source code are overlooked to understand software as whole we need to enrich software analysis with the developer knowledge hidden in the code naming this paper proposes the use of information retrieval to exploit linguistic information found in source code such as identifier names and comments we introduce semantic clustering technique based on latent semantic indexing and clustering to group source artifacts that use similar vocabulary we call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code we compare the topics to each other identify links between them provide automatically retrieved labels and use visualization to illustrate how they are distributed over the system our approach is language independent as it works at the level of identifier names to validate our approach we applied it on several case studies two of which we present in this paper note some of the visualizations presented make heavy use of colors please obtain color copy of the article for better understanding
existing work on software connectors shows significant disagreement on both their definition and their relationships with components coordinators and adaptors we propose precise characterisation of connectors discuss how they relate to the other three classes and contradict the suggestion that connectors and components are disjoint we discuss the relationship between connectors and coupling and argue the inseparability of connection models from component programming models finally we identify the class of configuration languages show how it relates to primitive connectors and outline relevant areas for future work
the area under the roc receiver operating characteristics curve or simply auc has been traditionally used in medical diagnosis since the it has recently been proposed as an alternative single number measure for evaluating the predictive ability of learning algorithms however no formal arguments were given as to why auc should be preferred over accuracy in this paper we establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that auc is better measure defined precisely than accuracy we then reevaluate well established claims in machine learning based on accuracy using auc and obtain interesting and surprising new results for example it has been well established and accepted that naive bayes and decision trees are very similar in predictive accuracy we show however that naive bayes is significantly better than decision trees in auc the conclusions drawn in this paper may make significant impact on machine learning and data mining applications
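to make the compared measures concrete, the sketch below computes auc in its rank based mann whitney form next to plain accuracy at a fixed threshold; the scores, labels and threshold are hypothetical

```python
# small sketch: AUC (Mann-Whitney form) versus accuracy on toy scores
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    return sum((s >= threshold) == bool(l) for l, s in zip(labels, scores)) / len(labels)

y      = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.6, 0.3, 0.1]
print("auc", round(auc(y, scores), 3), "accuracy", round(accuracy(y, scores), 3))
```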
this paper presents search engine architecture retin aiming at retrieving complex categories in large image databases for indexing scheme based on two step quantization process is presented to compute visual codebooks the similarity between images is represented in kernel framework such similarity is combined with online learning strategies motivated by recent machine learning developments such as active learning additionally an offline supervised learning is embedded in the kernel framework offering real opportunity to learn semantic categories experiments with real scenario carried out from the corel photo database demonstrate the efficiency and the relevance of the retin strategy and its outstanding performances in comparison to up to date strategies
automated verification is one of the most successful applications of automated reasoning in computer science in automated verification one uses algorithmic techniques to establish the correctness of the design with respect to given property automated verification is based on small number of key algorithmic ideas tying together graph theory automata theory and logic in this self contained talk will describe how this holy trinity gave rise to automated verification tools and mention some applications to planning
many data warehouse systems have been developed recently yet data warehouse practice is not sufficiently sophisticated for practical usage most data warehouse systems have some limitations in terms of flexibility efficiency and scalability in particular the sizes of these data warehouses are forever growing and becoming overloaded with data scenario that leads to difficulties in data maintenance and data analysis this research focuses on data information integration between data cubes this research might contribute to the resolution of two concerns the problem of redundancy and the problem of data cubes independent information this work presents semantic cube model which extends object oriented technology to data warehouses and which enables users to design the generalization relationship between different cubes in this regard this work’s objectives are to improve the performance of query integrity and to reduce data duplication in data warehouse to deal with the handling of increasing data volume in data warehouses we discovered important inter relationships that hold among data cubes that facilitate information integration and that prevent the loss of data semantics
the vision of future electronic marketplaces markets is that of markets being populated by autonomous intelligent entities software trading agents representing their users or owners and conducting business on their behalf for this vision to materialize one fundamental issue that needs to be addressed is that of trust first users need to be able to trust that the agents will do what they say they do second they need to be confident that their privacy is protected and that the security risks involved in entrusting agents to perform transactions on their behalf are minimized finally users need to be assured that any legal issues relating to agents trading electronically are fully covered as they are in traditional trading practices in this paper we consider the barriers for the adoption of agent technology in electronic commerce commerce which pertain to trust security and legal issues we discuss the perceived risks of the use of agents in commerce and the fundamental issue of trust in this context issues regarding security and how some of these can be addressed through the use of cryptography are described the impact of the use of agent technology on the users privacy and how it can be both protected as well as hindered by it is also examined finally we discuss the legal issues that arise in agent mediated commerce and discuss the idea of attributing to software agents the status of legal persons or persons and the various implications
wireless sensor networks have been proposed for multitude of location dependent applications for such systems the cost and limitations of the hardware on sensing nodes prevent the use of range based localization schemes that depend on absolute point to point distance estimates because coarse accuracy is sufficient for most sensor network applications solutions in range free localization are being pursued as cost effective alternative to more expensive range based approaches in this paper we present apit novel localization algorithm that is range free we show that our apit scheme performs best when an irregular radio pattern and random node placement are considered and low communication overhead is desired we compare our work via extensive simulation with three state of the art range free localization schemes to identify the preferable system configurations of each in addition we study the effect of location error on routing and tracking performance we show that routing performance and tracking accuracy are not significantly affected by localization error when the error is less than times the communication radio radius
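as a hedged sketch of the area based aggregation step in this style of range free localization, the code below judges for every triangle of anchors whether the node lies inside it and then scans a grid, keeping the cells that agree with the most judgements; the real scheme makes the inside outside decision from neighbour signal strength comparisons whereas the sketch uses exact geometry, and the anchor layout, grid and node position are hypothetical

```python
# hedged sketch: triangle membership judgements plus a grid scan for localization
from itertools import combinations

def inside(p, a, b, c):
    """Exact point-in-triangle test via consistent cross-product signs."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

anchors = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 12)]
node = (4.0, 6.0)                     # the position we pretend not to know

# one inside/outside judgement per anchor triangle (exact here, approximate in practice)
judgements = [(tri, inside(node, *tri)) for tri in combinations(anchors, 3)]

# grid scan: keep the cells that agree with the largest number of judgements
cells = [(x + 0.5, y + 0.5) for x in range(12) for y in range(13)]
best, best_cells = -1, []
for c in cells:
    score = sum(inside(c, *tri) == j for tri, j in judgements)
    if score > best:
        best, best_cells = score, [c]
    elif score == best:
        best_cells.append(c)

estimate = (sum(x for x, _ in best_cells) / len(best_cells),
            sum(y for _, y in best_cells) / len(best_cells))
print(estimate)
```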
xml data can be represented by tree or graph structure and xml query processing requires the information of structural relationships among nodes the basic structural relationships are parent child and ancestor descendant and finding all occurrences of these basic structural relationships in an xml data is clearly core operation in xml query processing several node labeling schemes have been suggested to support the determination of ancestor descendant or parent child structural relationships simply by comparing the labels of nodes however the previous node labeling schemes have some disadvantages such as large number of nodes that need to be relabeled in the case of an insertion of xml data huge space requirements for node labels and inefficient processing of structural joins in this paper we propose the nested tree structure that eliminates the disadvantages and takes advantage of the previous node labeling schemes the nested tree structure makes it possible to use the dynamic interval based labeling scheme which supports xml data updates with almost no node relabeling as well as efficient structural join processing experimental results show that our approach is efficient in handling updates with the interval based labeling scheme and also significantly improves the performance of the structural join processing compared with recent methods
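a minimal sketch of the interval based labeling idea referred to above: each node receives a start end pair from a depth first pass and ancestor descendant tests reduce to containment checks; the example tree is hypothetical and the sketch omits the gap management and nested tree structure that make updates cheap in the proposed scheme

```python
# minimal sketch: interval labels from a depth-first pass, containment = ancestry
def label(tree, counter=None, labels=None):
    """tree = (name, [children]); fills labels[name] = (start, end)."""
    counter = counter if counter is not None else [0]
    labels = labels if labels is not None else {}
    name, children = tree
    start = counter[0]; counter[0] += 1
    for child in children:
        label(child, counter, labels)
    labels[name] = (start, counter[0]); counter[0] += 1
    return labels

def is_ancestor(labels, a, d):
    (s1, e1), (s2, e2) = labels[a], labels[d]
    return s1 < s2 and e2 < e1            # descendant interval nested in ancestor interval

doc = ("book", [("chapter", [("section", []), ("section2", [])]), ("index", [])])
labels = label(doc)
print(is_ancestor(labels, "book", "section"), is_ancestor(labels, "index", "section"))
```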
component programming techniques encourage abstraction and reuse through external linking some parts of program however must use concrete internally specified references so pure component system is not sufficient mechanism for structuring programs we present the combination of static internally linked module system and purely abstractive component system the latter extends our previous model of typed units to properly account for translucency and sharing we also show how units and modules can express an sml style system of structures and functors and we explore the consequences for recursive structures and functors
in this paper we kernelize conventional clustering algorithms from novel point of view based on the fully mathematical proof we first demonstrate that kernel kmeans kkmeans is equivalent to kernel principal component analysis kpca prior to the conventional kmeans algorithm by using kpca as preprocessing step we also generalize gaussian mixture model gmm to its kernel version the kernel gmm kgmm consequently conventional clustering algorithms can be easily kernelized in the linear feature space instead of nonlinear one to evaluate the newly established kkmeans and kgmm algorithms we utilized them to the problem of semantic object extraction segmentation of color images based on series of experiments carried out on set of color images we indicate that both kkmeans and kgmm can offer more elaborate output than the conventional kmeans and gmm respectively
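a hedged sketch of the equivalence exploited above, assuming scikit-learn is available: kernel pca is run as a preprocessing step and plain kmeans then clusters in the resulting feature space; the concentric ring data, the rbf kernel and the parameter values are illustrative only

```python
# hedged sketch: kernel PCA preprocessing followed by ordinary k-means
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two concentric rings: not linearly separable, so plain k-means on raw points struggles
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.r_[np.full(100, 1.0), np.full(100, 4.0)] + rng.normal(0, 0.1, 200)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]

features = KernelPCA(n_components=2, kernel="rbf", gamma=0.5).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
# labels within each ring should be mostly uniform if the kernel separates the rings
print(labels[:100].mean(), labels[100:].mean())
```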
various index structures have been proposed to speed up the evaluation of xml path expressions however existing xml path indices suffer from at least one of three limitations they focus only on indexing the structure relying on separate index for node content they are useful only for simple path expressions such as root to leaf paths or they cannot be tightly integrated with relational query processor moreover there is no unified framework to compare these index structures in this paper we present framework defining family of index structures that includes most existing xml path indices we also propose two novel index structures in this family with different space time tradeoffs that are effective for the evaluation of xml branching path expressions ie twigs with value conditions we also show how this family of index structures can be implemented using the access methods of the underlying relational database system finally we present an experimental evaluation that shows the performance tradeoff between index space and matching time the experimental results show that our novel indices achieve orders of magnitude improvement in performance for evaluating twig queries albeit at higher space cost over the use of previously proposed xml path indices that can be tightly integrated with relational query processor
this paper addresses performance bottleneck in time series subsequence matching first we analyze the disk access and cpu processing times required during the index searching and post processing steps of subsequence matching through preliminary experiments based on their results we show that the post processing step is main performance bottleneck in subsequence matching in order to resolve the performance bottleneck we propose simple yet quite effective method that processes the post processing step by rearranging the order of candidate subsequences to be compared with query sequence our method completely eliminates the redundancies of disk accesses and cpu processing occurring in the post processing step we show that our method is optimal and also does not incur any false dismissal also we justify the effectiveness of our method by extensive experiments
software maintenance is expensive and difficult because software is complex and maintenance requires the understanding of code written by someone else prerequisite to maintainability is program understanding specifically understanding the control flows between software components this is especially problematic for emerging software technologies such as the world wide web because of the lack of formal development practices and because web applications comprise mix of static and dynamic content adequate representations are therefore necessary to facilitate program understanding this research proposes an approach called readable readable executable augmentable database linked environment that generates executable tabular representations that can be used to both understand and manipulate software applications controlled laboratory experiment carried out to test the efficacy of the approach demonstrates that the representations significantly enhance program understanding the results suggest that the approach and the corresponding environment may be useful to alleviate problems associated with the software maintainability of new web applications
structured texts for example dictionaries and user manuals typically have hierarchical tree like structure we describe query language for retrieving information from collections of hierarchical text the language is based on tree pattern matching notion called tree inclusion tree inclusion allows easy expression of queries that use the structure and the content of the document in using it user need not be aware of the whole structure of the database thus language based on tree inclusion is data independent property made necessary because of the great variance in the structure of the texts
this paper presents an approach to matching parts of deformable shapes multiscale salient parts of the two shapes are first identified then these parts are matched if their immediate properties are similar the same holds recursively for their subparts and the same holds for their neighbor parts the shapes are represented by hierarchical attributed graphs whose node attributes encode the photometric and geometric properties of corresponding parts and edge attributes capture the strength of neighbor and part of interactions between the parts their matching is formulated as finding the subgraph isomorphism that minimizes quadratic cost the dimensionality of the matching space is dramatically reduced by convexifying the cost experimental evaluation on the benchmark mpeg and brown datasets demonstrates that the proposed approach is robust
tree automata completion is technique for the verification of infinite state systems it has already been used for the verification of cryptographic protocols and the prototyping of java static analyzers however as for many other verification techniques the correctness of the associated tool becomes more and more difficult to guarantee it is due to the size of the implementation that constantly grows and due to optimizations which are necessary to scale up the efficiency of the tool to verify real size systems in this paper we define and develop checker for tree automata produced by completion the checker is defined using coq and its implementation is automatically extracted from its formal specification using extraction gives checker that can be run independently of the coq environment specific algorithm for tree automata inclusion checking has been defined so as to avoid the exponential blow up the obtained checker is certified in coq independent of the implementation of completion usable with any approximation performed during completion small and fast some benchmarks are given to show how efficient the tool is
shape skeletons are fundamental concepts for describing the shape of geometric objects and have found variety of applications in number of areas where geometry plays an important role two types of skeletons commonly used in geometric computations are the straight skeleton of linear polygon and the medial axis of bounded set of points in the dimensional euclidean space however exact computation of these skeletons of even fairly simple planar shapes remains an open problem in this paper we propose novel approach to construct exact or approximate continuous distance functions and the associated skeletal representations skeleton and the corresponding radius function for solid semi analytic sets that can be either rigid or undergoing topological deformations our approach relies on computing constructive representations of shapes with functions that operate on real valued halfspaces as logic operations we use our approximate distance functions to define new type of skeleton ie the skeleton which is piecewise linear for polygonal domains generalizes naturally to planar and spatial domains with curved boundaries and has attractive properties we also show that the exact distance functions allow us to compute the medial axis of any closed bounded and regular planar domain importantly our approach can generate the medial axis the straight skeleton and the skeleton of possibly deformable shapes within the same formulation extends naturally to and can be used in variety of applications such as skeleton based shape editing and adaptive motion planning
this paper offers an exploration of the attitudes of older adults to keeping in touch with people who are important to them we present findings from three focus groups with people from to years of age themes emerging from the findings suggest that older adults view the act of keeping in touch as being worthy of time and dedication but also as being something that needs to be carefully managed within the context of daily life communication is seen as means through which skill should be demonstrated and personality expressed and is understood in very different context to the lightweight interaction that is increasingly afforded by new technologies the themes that emerged are used to elicit number of design implications and to promote some illustrative design concepts for new communication devices
software has spent the bounty of moore’s law by solving harder problems and exploiting abstractions such as high level languages virtual machine technology binary rewriting and dynamic analysis abstractions make programmers more productive and programs more portable but usually slow them down since moore’s law is now delivering multiple cores instead of faster processors future systems must either bear relatively higher cost for abstractions or use some cores to help tolerate abstraction costs this paper presents the design implementation and evaluation of novel concurrent configurable dynamic analysis framework that efficiently utilizes multicore cache architectures it introduces cache friendly asymmetric buffering cab lock free ring buffer that implements efficient communication between application and analysis threads we guide the design and implementation of our framework with model of dynamic analysis overheads the framework implements exhaustive and sampling event processing and is analysis neutral we evaluate the framework with five popular and diverse analyses and show performance improvements even for lightweight low overhead analyses efficient inter core communication is central to high performance parallel systems and we believe the cab design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and other parallel software
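the cab buffer above is a lock free cache conscious construct, which a python sketch cannot reproduce, so the following is only a hedged illustration of the single producer single consumer ring it builds on, with the application thread writing events and the analysis thread draining them; the capacity, the busy waiting and the integer events are hypothetical simplifications

```python
# hedged illustration: single-producer single-consumer ring between an
# application thread (producer) and an analysis thread (consumer)
import threading

class SpscRing:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0          # next slot to read  (advanced only by the consumer)
        self.tail = 0          # next slot to write (advanced only by the producer)

    def put(self, item):       # spins while full (a sketch; real code would back off)
        while self.tail - self.head == self.capacity:
            pass
        self.buf[self.tail % self.capacity] = item
        self.tail += 1

    def get(self):             # spins while empty
        while self.tail == self.head:
            pass
        item = self.buf[self.head % self.capacity]
        self.head += 1
        return item

ring = SpscRing(8)
out = []
consumer = threading.Thread(target=lambda: out.extend(ring.get() for _ in range(100)))
consumer.start()
for event in range(100):       # the instrumented application emitting events
    ring.put(event)
consumer.join()
print(out == list(range(100)))
```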
recent work has shown that the physical connectivity of the internet exhibits small world behavior characterizing such behavior is important not only for generating realistic internet topology but also for the proper evaluation of large scale content delivery mechanisms along this line this paper tries to understand how small world behavior arises in the internet topologies and how it impacts the performance of multicast techniques first we attribute small world behavior to two possible causes namely the variability of vertex degree and the preference for local connections for vertices we have found that both factors contribute with different relative degrees to the small world behavior of autonomous system as level and router level internet topologies for as level topology we observe that high variability of vertex degree is sufficient to cause small world behavior but for router level topology preference for local connectivity plays more important role second we propose better models to generate small world internet topologies our models incorporate both causes of small world behavior and generate graphs closely resemble real internet graphs third using simulation we demonstrate the importance of our work by studying the scaling behavior of multicast techniques we show that multicast tree size largely depends on network topology if topology generators capture only the variability of vertex degree they are likely to underestimate the benefit of multicast techniques
there is currently an abundance of social network services available on the internet in addition examples of location aware social network services are emerging the use of such services presents interesting consequences for users privacy and behaviour and ultimately the adoption of such services yet not lot of explicit knowledge is available that addresses these issues the work presented here tries to answer this by investigating the willingness to use location aware service in population of students during three week festival the main findings show that most users are willing to use such systems also on larger scale however some reservations particularly with regard to privacy are uncovered
cache memories in embedded systems play an important role in reducing the execution time of the applications various kinds of extensions have been added to cache hardware to enable software involvement in replacement decisions thus improving the run time over purely hardware managed cache novel embedded systems like intel’s xscale and arm cortex processors provide the facility of locking one or more lines in cache this feature is called cache locking this paper presents the first method in the literature for instruction cache locking that is able to reduce the average case run time of the program we devise cost benefit model to discover the memory addresses which should be locked in the cache we implement our scheme inside binary rewriter thus widening the applicability of our scheme to binaries compiled using any compiler results obtained on suite of mibench and mediabench benchmarks show up to improvement in the instruction cache miss rate on average and up to improvement in the execution time on average for applications having instruction accesses as bottleneck depending on the cache configuration the improvement in execution time is as high as for some benchmarks
regular path expressions are essential for formulating queries over the semistructured data without specifying the exact structure the query pruning is an important optimization technique to avoid useless traversals in evaluating regular path expressions while the previous query pruning optimizes single regular path expression well it often fails to fully optimize multiple regular path expressions nevertheless multiple regular path expressions are very frequently used in nontrivial queries and so an effective optimization technique for them is required in this paper we present new technique called the two phase query pruning that consists of the preprocessing phase and the pruning phase our two phase query pruning is effective in optimizing multiple regular path expressions and is more scalable and efficient than the combination of the previous query pruning and post processing in that it never deals with exponentially many combinations of sub results produced from all the regular path expressions
web search engines typically provide search results without considering user interests or context we propose personalized search approach that can easily extend conventional search engine on the client side our mapping framework automatically maps set of known user interests onto group of categories in the open directory project odp and takes advantage of manually edited data available in odp for training text classifiers that correspond to and therefore categorize and personalize search results according to user interests in two sets of controlled experiments we compare our personalized categorization system pcat with list interface system list that mimics typical search engine and with nonpersonalized categorization system cat in both experiments we analyze system performances on the basis of the type of task and query length we find that pcat is preferable to list for information gathering types of tasks and for searches with short queries and pcat outperforms cat in both information gathering and finding types of tasks and for searches associated with free form queries from the subjects answers to questionnaire we find that pcat is perceived as system that can find relevant web pages quicker and easier than list and cat
the discovery of biclusters which denote groups of items that show coherent values across subset of all the transactions in data set is an important type of analysis performed on real valued data sets in various domains such as biology several algorithms have been proposed to find different types of biclusters in such data sets however these algorithms are unable to search the space of all possible biclusters exhaustively pattern mining algorithms in association analysis also essentially produce biclusters as their result since the patterns consist of items that are supported by subset of all the transactions however major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and or categorical variables and their application to real valued data sets often involves some lossy transformation such as discretization or binarization of the attributes in this paper we propose novel association analysis framework for exhaustively and efficiently mining range support patterns from such data set on one hand this framework reduces the loss of information incurred by the binarization and discretization based approaches and on the other it enables the exhaustive discovery of coherent biclusters we compared the performance of our framework with two standard biclustering algorithms through the evaluation of the similarity of the cellular functions of the genes constituting the patterns biclusters derived by these algorithms from microarray data these experiments show that the real valued patterns discovered by our framework are better enriched by small biologically interesting functional classes also through specific examples we demonstrate the ability of the rap framework to discover functionally enriched patterns that are not found by the commonly used biclustering algorithm isa the source code and data sets used in this paper as well as the supplementary material are available at http wwwcsumnedu vk gaurav rap
in building large scale video server it is highly desirable to use heterogeneous disk subsystems for the following reasons first existing disks may fail especially in an environment with large number of disks enforcing the use of new disks second for scalable server to cope with the increasing demand of customers new disks may be needed to increase the server’s storage capacity and throughput with rapid advances in the performance of disks the newly added disks generally have higher data transfer rate and larger storage capacity than the disks originally in the system in this paper we propose novel striping scheme termed as resource based striping rbs for video servers built on heterogeneous disks rbs combines the techniques of wide striping and narrow striping so that it can obtain the optimal stripe allocation and efficiently utilize both the bandwidth and storage capacity of all disks rbs is suitable for applications whose files are not updated frequently such as course on demand and movie on demand we examine the performance of rbs via simulation experiments our results show that rbs greatly outperforms the conventional striping schemes proposed for video servers with heterogeneous or homogeneous disks in terms of the number of simultaneous streams supported and the number of files that can be stored
this paper describes the moveme interaction prototype developed in conjunction with vlab in rotterdam moveme proposes scenario for social interaction and the notion of social intimacy interaction with sensory enhanced soft pliable tactile throw able cushions afford new approaches to pleasure movement and play somatics approach to touch and kinaesthesia provides an underlying design framework the technology developed for moveme uses the surface of the cushion as an intelligent tactile interface making use of movement analysis system called laban effort shape we have developed model that provides high level interpretation of varying qualities of touch and motion trajectory we describe the notion of social intimacy and how we model it through techniques in somatics and performance practice we describe the underlying concepts of moveme and its motivations we illustrate the structural layers of interaction and related technical detail finally we discuss the related body of work in the context of evaluating our approach and conclude with plans for future work
this paper proposes set of new software test diversity measures based on control oscillations of test suites oscillation diversity uses conversion inversion and phase transformation to vary test suite amplitudes frequencies and phases resistance and inductance are defined as measures of diversification difficulty the experimental results show correlation between some oscillation diversity measures and fault detection effectiveness
this paper presents techniques and tools to transform spreadsheets into relational databases and back set of data refinement rules is introduced to map tabular datatype into relational database schema having expressed the transformation of the two data models as data refinements we obtain for free the functions that migrate the data we use well known relational database techniques to optimize and query the data because data refinements define bi directional transformations we can map such database back to an optimized spreadsheet we have implemented the data refinement rules and we constructed haskell based tools to manipulate optimize and refactor excel like spreadsheets
we introduce new acceleration to the standard splatting volume rendering algorithm our method achieves full colour bit depth sorted and shaded volume rendering significantly faster than standard splatting the speedup is due to dimensional adjacency data structure that efficiently skips transparent parts of the data and stores only the voxels that are potentially visible our algorithm is robust and flexible allowing for depth sorting of the data including correct back to front ordering for perspective projections this makes interactive splatting possible for applications such as medical visualizations that rely on structure and depth information
in this paper new method to improve the utilization of main memory systems is presented the new method is based on prestoring in main memory number of query answers each evaluated out of single memory page to this end the ideas of page answers and page traces are formally described and their properties analyzed the query model used here allows for selection projection join recursive queries as well as arbitrary combinations we also show how to apply the approach under update traffic this concept is especially useful in managing the main memories of an important class of applications this class includes the evaluation of triggers and alerters performance improvement of rule based systems integrity constraint checking and materialized views these applications are characterized by the existence at compile time of predetermined set of queries by slow but persistent update traffic and by their need to repetitively reevaluate the query set the new approach represents new type of intelligent database caching which contrasts with traditional caching primarily in that the cache elements are derived data and as consequence they overlap arbitrarily and do not have fixed length the contents of the main memory cache are selected based on the data distribution within the database the set of fixed queries to preprocess and the paging characteristics page answers and page traces are used as the smallest indivisible units in the cache an efficient heuristic to select near optimal set of page answers and page traces to populate the main memory has been developed implemented and tested finally quantitative measurements of performance benefits are reported
all frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning this principle allows for excluding candidate itemsets from the expensive counting phase in this paper we present sound and complete deduction rules to derive bounds on the support of an itemset based on these deduction rules we construct condensed representation of all frequent itemsets by removing those itemsets for which the support can be derived resulting in the so called non derivable itemsets ndi representation we also present connections between our proposal and recent other proposals for condensed representations of frequent itemsets experiments on real life datasets show the effectiveness of the ndi representation making the search for frequent non derivable itemsets useful and tractable alternative to mining all frequent itemsets
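as a hedged sketch of the deduction rules referred to above, the code below derives a lower and an upper bound on the support of an itemset by inclusion exclusion over the supports of its proper subsets; when the bounds coincide the itemset is derivable and need not be counted; the toy transactions are hypothetical and the formulation follows the standard non derivable itemset bound, which may differ in detail from the paper's presentation

```python
# hedged sketch: inclusion-exclusion bounds on the support of an itemset
from itertools import combinations

def subsets(items):
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def support_bounds(itemset, supp):
    """Lower/upper bounds on supp(itemset) from the supports of its proper subsets."""
    I = frozenset(itemset)
    lowers, uppers = [0], []
    for X in subsets(I):
        if X == I:
            continue
        delta = sum((-1) ** (len(I - J) + 1) * supp[J]
                    for J in subsets(I) if X <= J and J != I)
        # odd-sized complement gives an upper bound, even-sized a lower bound
        (uppers if len(I - X) % 2 == 1 else lowers).append(delta)
    return max(lowers), min(uppers)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}, {"b"}]
target = frozenset({"a", "b", "c"})
supp = {s: sum(s <= t for t in transactions) for s in subsets(target) if s != target}

lo, hi = support_bounds(target, supp)
print(lo, hi)   # when the bounds meet, the itemset is derivable and can be pruned
```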
programming web applications in direct style with the help of continuations is much simpler safer modular and better performing technology than the current dominating page centric technology combining cgi scripts active pages or servlets this paper discusses the use of continuations in the context of web applications the problems they solve as well as some new problems they introduce
in degree pagerank number of visits and other measures of web page popularity significantly influence the ranking of search results by modern search engines the assumption is that popularity is closely correlated with quality more elusive concept that is difficult to measure directly unfortunately the correlation between popularity and quality is very weak for newly created pages that have yet to receive many visits and or in links worse since discovery of new content is largely done by querying search engines and because users usually focus their attention on the top few results newly created but high quality pages are effectively shut out and it can take very long time before they become popular we propose simple and elegant solution to this problem the introduction of controlled amount of randomness into search result ranking methods doing so offers new pages chance to prove their worth although clearly using too much randomness will degrade result quality and annul any benefits achieved hence there is tradeoff between exploration to estimate the quality of new pages and exploitation of pages already known to be of high quality we study this tradeoff both analytically and via simulation in the context of an economic objective function based on aggregate result quality amortized over time we show that modest amount of randomness leads to improved search results
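a minimal sketch of injecting a controlled amount of randomness into ranking: most result slots follow the popularity order while a small fraction are filled by pages sampled from the little known tail; the candidate lists, the randomness budget epsilon and the slot by slot policy are hypothetical and are not the paper's analyzed mechanism

```python
# minimal sketch: popularity-ordered ranking with occasional slots for fresh pages
import random

def rank(popular, fresh, k, epsilon=0.2, rng=random.Random(0)):
    """Return k results: mostly popularity order, with occasional fresh-page slots."""
    popular, fresh, results = list(popular), list(fresh), []
    while len(results) < k and (popular or fresh):
        if fresh and (not popular or rng.random() < epsilon):
            results.append(fresh.pop(rng.randrange(len(fresh))))   # explore a new page
        else:
            results.append(popular.pop(0))                         # exploit popularity
    return results

popular = ["old1", "old2", "old3", "old4", "old5"]
fresh = ["new1", "new2", "new3"]
print(rank(popular, fresh, k=5))
```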
the ever growing needs of large multimedia systems cannot be met by magnetic disks due to their high cost and low storage density consequently cheaper and denser tertiary storage systems are being integrated into the storage hierarchies of these applications although tertiary storage is cheaper the access latency is very high due to the need to load and unload media on the drives this high latency and the bursty nature of traffic result in the accumulation of requests for tertiary storage we study the problem of scheduling these requests to improve performance in particular we address the issues of scheduling across multiple tapes or disks as opposed to most other studies which consider only one or two media we focus on algorithms that minimize the number of switches and show through simulation that these result in near optimal schedules for single drive libraries an efficient algorithm that produces optimal schedules is developed for multiple drives the problem is shown to be np complete efficient and effective heuristics are presented for both single and multiple drives the scheduling policies developed achieve significant performance gains over naive policies the algorithms are simple to implement and are not restrictive the study encompasses all types of storage libraries handling removable media such as tapes and optical disks
botnets are networks of compromised computers infected with malicious code that can be controlled remotely under common command and control channel recognized as one of the most serious security threats on current internet infrastructure advanced botnets are hidden not only in existing well known network applications eg irc http or peer to peer but also in some unknown or novel creative applications which makes the botnet detection challenging problem most current attempts for detecting botnets are to examine traffic content for bot signatures on selected network links or by setting up honeypots in this paper we propose new hierarchical framework to automatically discover botnets on large scale wifi isp network in which we first classify the network traffic into different application communities by using payload signatures and novel cross association clustering algorithm and then on each obtained application community we analyze the temporal frequent characteristics of flows that lead to the differentiation of malicious channels created by bots from normal traffic generated by human beings we evaluate our approach with about million flows collected over three consecutive days on large scale wifi isp network and results show the proposed approach successfully detects two types of botnet application flows ie blackenergy http bot and kaiten irc bot from about million flows with high detection rate and an acceptable low false alarm rate
in this paper we present multimodal approach for the recognition of eight emotions our approach integrates information from facial expressions body movement and gestures and speech we trained and tested model with bayesian classifier using multimodal corpus with eight emotions and ten subjects firstly individual classifiers were trained for each modality next data were fused at the feature level and the decision level fusing the multimodal data resulted in large increase in the recognition rates in comparison with the unimodal systems the multimodal approach gave an improvement of more than when compared to the most successful unimodal system further the fusion performed at the feature level provided better results than the one performed at the decision level
this study investigates level set multiphase image segmentation by kernel mapping and piecewise constant modeling of the image data thereof kernel function maps implicitly the original data into data of higher dimension so that the piecewise constant model becomes applicable this leads to flexible and effective alternative to complex modeling of the image data the method uses an active curve objective functional with two terms an original term which evaluates the deviation of the mapped image data within each segmentation region from the piecewise constant model and classic length regularization term for smooth region boundaries functional minimization is carried out by iterations of two consecutive steps minimization with respect to the segmentation by curve evolution via euler lagrange descent equations and minimization with respect to the regions parameters via fixed point iterations using common kernel function this step amounts to mean shift parameter update we verified the effectiveness of the method by quantitative and comparative performance evaluation over large number of experiments on synthetic images as well as experiments with variety of real images such as medical satellite and natural images as well as motion maps
in main memory databases the number of processor cache misses has critical impact on the performance of the system cache conscious indices are designed to improve performance by reducing the number of processor cache misses that are incurred during search operation conventional wisdom suggests that the index’s node size should be equal to the cache line size in order to minimize the number of cache misses and improve performance as we show in this paper this design choice ignores additional effects such as the number of instructions executed and the number of tlb misses which play significant role in determining the overall performance to capture the impact of node size on the performance of cache conscious tree csb tree we first develop an analytical model based on the fundamental components of the search process this model is then validated with an actual implementation demonstrating that the model is accurate both the analytical model and experiments confirm that using node sizes much larger than the cache line size can result in better search performance for the csb tree
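To make the tradeoff concrete, here is a toy analytical cost sketch (not the paper's validated model): search cost is split into cache-miss, TLB-miss and instruction components, each a function of node size, so that sweeping the node size shows the minimum need not sit at one cache line. All constants (miss penalties, instructions per comparison) are illustrative assumptions:

```python
import math

def csb_search_cost(node_bytes, n_keys, key_bytes=4, line_bytes=64,
                    t_miss=150, t_tlb=100, instr_per_cmp=4):
    """Rough cycle count for one index search as a function of node size:
    larger nodes mean fewer tree levels (fewer node switches and TLB misses)
    but more cache lines touched per node."""
    keys_per_node = max(2, node_bytes // key_bytes)
    levels = max(1, math.ceil(math.log(n_keys, keys_per_node)))
    lines_per_node = math.ceil(node_bytes / line_bytes)
    # binary search within a node touches roughly log2 of its cache lines
    lines_touched = min(lines_per_node, max(1, math.ceil(math.log2(lines_per_node + 1))))
    cache_cycles = levels * lines_touched * t_miss
    tlb_cycles = levels * t_tlb                         # pessimistic: one TLB miss per node
    instr_cycles = levels * math.ceil(math.log2(keys_per_node + 1)) * instr_per_cmp
    return cache_cycles + tlb_cycles + instr_cycles

# sweep node sizes: the minimum is often well above one cache line
costs = {s: csb_search_cost(s, n_keys=10_000_000) for s in (64, 128, 256, 512, 1024, 2048)}
```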
adaptive user interface composition is the ability of software system to compose its user interface at runtime according to given deployment profile and to possibly drop running components and activate better alternatives in their place in response to deployment profile modifications while adaptive behavior has gained interest for wide range of software products and services its support is very demanding requiring adoption of user interface architectural patterns from the early software design stages while previous research addressed the issue of engineering adaptive systems from scratch there is an important methodological gap since we lack processes to reform existing non adaptive systems towards adaptive behavior we present stepwise transformation process of user interface software by incrementally upgrading relevant class structures towards adaptive composition by treating adaptive behavior as cross cutting concern all our refactoring examples have emerged from real practice
with the recent dramatic increase in electronic access to documents text categorization the task of assigning topics to given document has moved to the center of the information sciences and knowledge management this article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization according to their meaning topics can be grouped together into meta topics eg gold silver and copper are all metals the proposed architecture matches the hierarchical structure of the topic space as opposed to flat model that ignores the structure it accommodates both single and multiple topic assignments for each document its probabilistic interpretation allows its predictions to be combined in principled way with information from other sources the first level of the architecture predicts the probabilities of the meta topic groups this allows the individual models for each topic on the second level to focus on finer discriminations within the group evaluating the performance of two level implementation on the reuters testbed of newswire articles shows the most significant improvement for rare classes
unanticipated connection of independently developed components is one of the key issues in component oriented programming while variety of component oriented languages have been proposed none of them has achieved breakthrough yet in this paper we present scl simple language dedicated to component oriented programming scl integrates well known features such as component class component interface port or service all these well known features are presented discussed and compared to existing approaches because they vary quite widely from one language to another but these features are not enough to build component language indeed most approaches use language primitives and shared interfaces to connect components but shared interfaces are in contradiction with the philosophy of independently developed components to this issue scl provides new features such as uniform component composition model based on connectors connectors represent interactions between independently developed components scl also integrates component properties which enable connections based on component state changes with no requirements of specific code in components
we provide type system inspired by affine intuitionistic logic for the calculus of higher order mobile embedded resources homer resulting in the first process calculus combining affine linear non copyable and non linear copyable higher order mobile processes nested locations and local names the type system guarantees that linear resources are neither copied nor embedded in non linear resources during computation we exemplify the use of the calculus by modelling simplistic cash smart card system the security of which depends on the interplay between linear mobile hardware embedded non linear mobile processes and local names purely linear calculus would not be able to express that embedded software processes may be copied conversely purely non linear calculus would not be able to express that mobile hardware processes cannot be copied
cryptographic operations are essential for many security critical systems reasoning about information flow in such systems is challenging because typical noninterference based information flow definitions allow no flow from secret to public data unfortunately this implies that programs with encryption are ruled out because encrypted output depends on secret inputs the plaintext and the key however it is desirable to allow flows arising from encryption with secret keys provided that the underlying cryptographic algorithm is strong enough in this article we conservatively extend the noninterference definition to allow safe encryption decryption and key generation to illustrate the usefulness of this approach we propose and implement type system that guarantees noninterference for small imperative language with primitive cryptographic operations the type system prevents dangerous program behavior eg giving away secret key or confusing keys and nonkeys which we exemplify with secure implementations of cryptographic protocols because the model is based on standard noninterference property it allows us to develop some natural extensions in particular we consider public key cryptography and integrity which accommodate reasoning about primitives that are vulnerable to chosen ciphertext attacks
with the multiplication of xml data sources many xml data warehouse models have been proposed to handle data heterogeneity and complexity in way relational data warehouses fail to achieve however xml native database systems currently suffer from limited performances both in terms of manageable data volume and response time fragmentation helps address both these issues derived horizontal fragmentation is typically used in relational data warehouses and can definitely be adapted to the xml context however the number of fragments produced by classical algorithms is difficult to control in this paper we propose the use of means based fragmentation approach that allows to master the number of fragments through its parameter we experimentally compare its efficiency to classical derived horizontal fragmentation algorithms adapted to xml data warehouses and show its superiority
with over us science and mathematics education standards and rapid proliferation of web enabled curriculum retrieving curriculum that aligns with the standards to which teachers must teach is key objective for educational digital libraries however previous studies of such alignment use single dimensional and binary measures of the alignment concept as consequence they suffer from low inter rater reliability irr with experts agreeing about alignments only some of the time we present the results of an experiment in which the alignment variable was operationalized using the saracevic model of relevance clues taken from the everyday practice of teaching results show high irr across all clues with irr on several specific alignment dimensions significantly higher than on overall alignment in addition model of overall alignment is derived and estimated the structure and explanatory power of the model as well as the relationships between alignment clues differ significantly between alignments of curriculum found by users themselves and curriculum found by others these results illustrate the usefulness of clue based relevance measures for information retrieval and have important consequences for both the formulation of automated retrieval mechanisms and the construction of gold standard or benchmark set of standard curriculum alignments
open distributed systems are becoming increasingly popular such systems include components that may be obtained from number of different sources for example java allows run time loading of software components residing on remote machines one unfortunate side effect of this openness is the possibility that hostile software components may compromise the security of both the program and the system on which it runs java offers built in security mechanism using which programmers can give permissions to distributed components and check these permissions at run time this security model is flexible but using it is not straightforward which may lead to insufficiently tight permission checking and therefore breaches of security in this paper we propose data flow algorithm for automated analysis of the flow of permissions in java programs our algorithm produces for given instruction in the program set of permissions that are checked on all possible executions up to this instruction this information can be used in program understanding tools or directly for checking properties that assert what permissions must always be checked before access to certain functionality is allowed the worst case complexity of our algorithm is low order polynomial in the number of program statements and permission types while comparable previous approaches have exponential costs
in this paper we present microsearch search system suitable for small devices used in ubiquitous computing environments akin to desktop search engine microsearch indexes the information inside small device and accurately resolves user queries given the very limited hardware resources conventional search engine designs and algorithms cannot be used we adopt information retrieval techniques for query resolution and propose space efficient algorithm to perform top query on limited hardware resources finally we present theoretical model of microsearch to better understand the tradeoffs in system design parameters by implementing microsearch on actual hardware for evaluation we demonstrate the feasibility of scaling down information retrieval systems onto very small devices
data cube construction has been the focus of much research due to its importance in improving efficiency of olap significant fraction of this work has been on rolap techniques which are based on relational technology existing rolap cubing solutions mainly focus on flat datasets which do not include hierarchies in their dimensions nevertheless the nature of hierarchies introduces several complications into cube construction making existing techniques essentially inapplicable in significant number of real world applications in particular hierarchies raise three main challenges the number of nodes in cube lattice increases dramatically and its shape is more involved these require new forms of lattice traversal for efficient execution the number of unique values in the higher levels of dimension hierarchy may be very small hence partitioning data into fragments that fit in memory and include all entries of particular value may often be impossible this requires new partitioning schemes the number of tuples that need to be materialized in the final cube increases dramatically this requires new storage schemes that remove all forms of redundancy for efficient space utilization in this paper we propose cure novel rolap cubing method that addresses these issues and constructs complete data cubes over very large datasets with arbitrary hierarchies cure contributes novel lattice traversal scheme an optimized partitioning method and suite of relational storage schemes for all forms of redundancy we demonstrate the effectiveness of cure through experiments on both real world and synthetic datasets among the experimental results we distinguish those that have made cure the first rolap technique to complete the construction of the cube of the highest density dataset in the apb benchmark gb cure was in fact quite efficient on this showing great promise with respect to the potential of the technique overall
in this article we present the integration of shape knowledge into variational model for level set based image segmentation and contour based pose tracking given the surface model of an object that is visible in the image of one or multiple cameras calibrated to the same world coordinate system the object contour extracted by the segmentation method is applied to estimate the pose parameters of the object vice versa the surface model projected to the image plane helps in top down manner to improve the extraction of the contour while common alternative segmentation approaches which integrate shape knowledge face the problem that an object can look very differently from various viewpoints free form model ensures that for each view the model can fit the data in the image very well moreover one additionally solves the problem of determining the object’s pose in space the performance is demonstrated by numerous experiments with monocular and stereo camera system
we introduce novel representation for random access rendering of antialiased vector graphics on the gpu along with efficient encoding and rendering algorithms the representation supports broad class of vector primitives including multiple layers of semitransparent filled and stroked shapes with quadratic outlines and color gradients our approach is to create coarse lattice in which each cell contains variable length encoding of the graphics primitives it overlaps these cell specialized encodings are interpreted at runtime within pixel shader advantages include localized memory access and the ability to map vector graphics onto arbitrary surfaces or under arbitrary deformations most importantly we perform both prefiltering and supersampling within single pixel shader invocation achieving inter primitive antialiasing at no added memory bandwidth cost we present an efficient encoding algorithm and demonstrate high quality real time rendering of complex real world examples
transaction management on mobile database systems mds has to cope with number of constraints such as limited bandwidth low processing power unreliable communication and mobility etc as result of these constraints traditional concurrency control mechanisms are unable to manage transactional activities to maintain availability innovative transaction execution schemes and concurrency control mechanisms are therefore required to exploit the full potential of mds in this paper we report our investigation on multi version transaction processing approach and deadlock free concurrency control mechanism based on multiversion two phase locking scheme integrated with timestamp approach we study the behavior of the proposed model with simulation study in mds environment we have compared our schemes using reference model to argue that such performance comparison helps to show the superiority of our model over others experimental results demonstrate that our model provides significantly higher throughput by improving degree of concurrency by reducing transaction wait time and by minimizing restarts and aborts
the cade atp system competition casc is an annual evaluation of fully automatic first order automated theorem proving atp systems casc was the thirteenth competition in the casc series twenty six atp systems and system variants competed in the various competition and demonstration divisions an outline of the competition design and commentated summary of the results are presented
this paper presents theoretical approach that has been developed to capture the computational intensity and computing resource requirements of geographical data and analysis methods these requirements are then transformed into common framework grid based representation of spatial computational domain which supports the efficient use of emerging cyberinfrastructure environments two key types of transformational functions data centric and operation centric are identified and their relationships are explained the application of the approach is illustrated using two geographical analysis methods inverse distance weighted interpolation and the spatial statistic we describe the underpinnings of these two methods present their conventional sequential algorithms and then address their latent parallelism based on spatial computational domain representation through the application of this theoretical approach the development of domain decomposition methods is decoupled from specific high performance computer architectures and task scheduling implementations which makes the design of generic parallel processing solutions feasible for geographical analyses
we present and evaluate simple yet efficient optimization technique that improves memory hierarchy performance for pointer centric applications by up to and reduces cache misses by up to this is achieved by selecting an improved ordering for the data members of pointer based data structures our optimization is applicable to all type safe programming languages that completely abstract from physical storage layout examples of such languages are java and oberon our technique does not involve programmers in the optimization process but runs fully automatically guided by dynamic profiling information that captures which paths through the program are taken with what frequency the algorithm first strives to cluster data members that are accessed closely after one another onto the same cache line increasing spatial locality then the data members that have been mapped to particular cache line are ordered to minimize load latency in case of cache miss
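A minimal sketch of the clustering step, assuming a profiler has already produced symmetric pairwise co-access counts for the fields of a class (the coaccess map and line size are illustrative assumptions, not the paper's exact algorithm):

```python
def order_fields(field_sizes, coaccess, line_size=64):
    """Greedily pack the fields of one class onto cache lines so that fields
    accessed closely together (high pairwise co-access count) share a line.
    `coaccess[(a, b)]` is an affinity from profiling, stored for both key orders."""
    remaining = set(field_sizes)
    layout = []
    while remaining:
        seed = max(remaining,                                  # hottest remaining field first
                   key=lambda f: sum(coaccess.get((f, g), 0) for g in remaining))
        line, used = [seed], field_sizes[seed]
        remaining.remove(seed)
        while True:
            fits = [f for f in remaining if used + field_sizes[f] <= line_size]
            if not fits:
                break
            nxt = max(fits, key=lambda f: sum(coaccess.get((f, g), 0) for g in line))
            line.append(nxt); used += field_sizes[nxt]; remaining.remove(nxt)
        layout.extend(line)                                    # fields of this line, in order
    return layout
```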
we propose method for discovering the dependency relationships between the topics of documents shared in social networks using the latent social interactions attempting to answer the question given seemingly new topic from where does this topic evolve in particular we seek to discover the pair wise probabilistic dependency in topics of documents which associate social actors from latent social network where these documents are being shared by viewing the evolution of topics as markov chain we estimate markov transition matrix of topics by leveraging social interactions and topic semantics metastable states in markov chain are applied to the clustering of topics applied to the citeseer dataset collection of documents in academia we show the trends of research topics how research topics are related and which are stable we also show how certain social actors authors impact these topics and propose new ways for evaluating author impact
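A small sketch of the transition-matrix estimation, assuming topic labels have already been assigned to documents and ordered in time per actor (the smoothing constant is an illustrative choice):

```python
import numpy as np

def estimate_topic_transitions(topic_sequences, n_topics, smoothing=1e-3):
    """Estimate a row-stochastic Markov transition matrix over topics from
    time-ordered topic sequences (e.g. the topics of documents shared by an actor)."""
    counts = np.full((n_topics, n_topics), smoothing)
    for seq in topic_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(P, iters=1000):
    """Power iteration; stable (high-mass) topics hint at metastable states."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi
```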
munin is distributed shared memory dsm system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors munin is unique among existing dsm systems in its use of multiple consistency protocols and in its use of release consistency in munin shared program variables are annotated with their expected access pattern and these annotations are then used by the runtime system to choose consistency protocol best suited to that access pattern release consistency allows munin to mask network latency and reduce the number of messages required to keep memory consistent munin’s multiprotocol release consistency is implemented in software using delayed update queue that buffers and merges pending outgoing writes sixteen processor prototype of munin is currently operational we evaluate its implementation and describe the execution of two munin programs that achieve performance within ten percent of message passing implementations of the same programs munin achieves this level of performance with only minor annotations to the shared memory programs
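The delayed update queue can be pictured roughly as follows; this is a schematic sketch of buffering and merging writes until a release point, not Munin's actual software-DSM implementation:

```python
class DelayedUpdateQueue:
    """Buffer outgoing writes to shared pages until a release, merging multiple
    writes to the same page into one diff-style message."""
    def __init__(self, send):
        self.send = send          # send(page_id, {offset: value}) -- network layer, assumed
        self.pending = {}         # page_id -> merged dirty offsets

    def write(self, page_id, offset, value):
        self.pending.setdefault(page_id, {})[offset] = value   # later writes overwrite earlier ones

    def release(self):
        """At a lock release or barrier, flush all merged updates at once."""
        for page_id, diff in self.pending.items():
            self.send(page_id, diff)
        self.pending.clear()
```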
this paper presents generative model for textures that uses local sparse description of the image content this model enforces the sparsity of the expansion of local texture patches on adapted atomic elements the analysis of given texture within this framework performs the sparse coding of all the patches of the texture into the dictionary of atoms conversely the synthesis of new texture is performed by solving an optimization problem that seeks for texture whose patches are sparse in the dictionary this paper explores several strategies to choose this dictionary set of hand crafted dictionaries composed of edges oscillations lines or crossings elements allows to synthesize synthetic images with geometric features another option is to define the dictionary as the set of all the patches of an input exemplar this leads to computer graphics methods for synthesis and shares some similarities with non local means filtering the last method we explore learns the dictionary by an optimization process that maximizes the sparsity of set of exemplar patches applications of all these methods to texture synthesis inpainting and classification shows the efficiency of the proposed texture model
optical flow computation is well known technique and there are important fields in which the application of this visual modality commands high interest nevertheless most real world applications require real time processing an issue which has only recently been addressed most real time systems described to date use basic models which limit their applicability to generic tasks especially when fast motion is presented or when subpixel motion resolution is required therefore instead of implementing complex optical flow approach we describe here very high frame rate optical flow processing system recent advances in image sensor technology make it possible nowadays to use high frame rate sensors to properly sample fast motion ie as low motion scene which makes gradient based approach one of the best options in terms of accuracy and consumption of resources for any real time implementation taking advantage of the regular data flow of this kind of algorithm our approach implements novel superpipelined fully parallelized architecture for optical flow processing the system is fully working and is organized into more than pipeline stages which achieve data throughput of one pixel per clock cycle this computing scheme is well suited to fpga technology and vlsi implementation the developed customized dsp architecture is capable of processing up to frames per second at resolution of pixels we discuss the advantages of high frame rate processing and justify the optical flow model chosen for the implementation we analyze this architecture measure the system resource requirements using fpga devices and finally evaluate the system’s performance and compare it with other approaches described in the literature
hardware systems and reactive software systems can be described as the composition of several concurrently active processes automated reasoning based on model checking algorithms can substantially increase confidence in the overall reliability of system direct methods for model checking concurrent composition however usually suffer from the explosion in the number of program states that arises from concurrency reasoning compositionally about individual processes helps mitigate this problem number of rules have been proposed for compositional reasoning typically based on an assume guarantee reasoning paradigm reasoning with these rules can be delicate as some are syntactically circular in nature in that assumptions and guarantees are mutually dependent this is known to be source of unsoundness in this article we investigate rules for compositional reasoning from the viewpoint of completeness we show that several rules are incomplete that is there are properties whose validity cannot be established using only these rules we derive new circular reasoning rule and show it to be sound and complete we show that the auxiliary assertions needed for completeness need be defined only on the interface of the component processes we also show that the two main paradigms of circular and noncircular reasoning are closely related in that proof of one type can be transformed in straightforward manner to one of the other type these results give some insight into the applicability of compositional reasoning methods
the development of new techniques and the emergence of new high throughput tools have led to new information revolution the amount and the diversity of the information that need to be stored and processed have led to the adoption of data integration systems in order to deal with information extraction from disparate sources the mediation between traditional databases and ontologies has been recognized as cornerstone issue in bringing in legacy data with formal semantic meaning however our knowledge evolves due to the rapid scientific development so ontologies and schemata need to change in order to capture and accommodate such an evolution when ontologies change these changes should somehow be rendered and used by the pre existing data integration systems problem that most of the integration systems seem to ignore in this paper we review existing approaches for ontology schema evolution and examine their applicability in state of the art ontology based data integration setting then we show that changes in schemata differ significantly from changes in ontologies this strengthens our position that current state of the art systems are not adequate for ontology based data integration so we give the requirements for an ideal data integration system that will enable and exploit ontology evolution
abstract recurrent neural networks readily process recognize and generate temporal sequences by encoding grammatical strings as temporal sequences recurrent neural networks can be trained to behave like deterministic sequential finite state automata algorithms have been developed for extracting grammatical rules from trained networks using simple method for inserting prior knowledge or rules into recurrent neural networks we show that recurrent neural networks are able to perform rule revision rule revision is performed by comparing the inserted rules with the rules in the finite state automata extracted from trained networks the results from training recurrent neural network to recognize known non trivial randomly generated regular grammar show that not only do the networks preserve correct rules but that they are able to correct through training inserted rules which were initially incorrect by incorrect we mean that the rules were not the ones in the randomly generated grammar
recently mining from data streams has become an important and challenging task for many real world applications such as credit card fraud protection and sensor networking one popular solution is to separate stream data into chunks learn base classifier from each chunk and then integrate all base classifiers for effective classification in this paper we propose new dynamic classifier selection dcs mechanism to integrate base classifiers for effective mining from data streams the proposed algorithm dynamically selects single best classifier to classify each test instance at run time our scheme uses statistical information from attribute values and uses each attribute to partition the evaluation set into disjoint subsets followed by procedure that evaluates the classification accuracy of each base classifier on these subsets given test instance its attribute values determine the subsets that the similar instances in the evaluation set have constructed and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance experimental results and comparative studies demonstrate the efficiency and efficacy of our method such dcs scheme appears to be promising in mining data streams with dramatic concept drifting or with significant amount of noise where the base classifiers are likely conflictive or have low confidence
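A rough sketch of the selection mechanism, assuming scikit-learn-style base classifiers and discrete attribute values (class and method names are illustrative):

```python
from collections import defaultdict

class DynamicClassifierSelector:
    """Pick, per test instance, the single base classifier that was most accurate
    on the evaluation-set instances sharing that instance's attribute values
    (one partition of the evaluation set per attribute)."""
    def __init__(self, classifiers):
        self.classifiers = classifiers
        # stats[(attr_index, attr_value)][clf_index] -> [correct, total]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def fit_evaluation(self, X_eval, y_eval):
        for x, y in zip(X_eval, y_eval):
            for i, clf in enumerate(self.classifiers):
                pred = clf.predict([x])[0]
                for a, v in enumerate(x):
                    s = self.stats[(a, v)][i]
                    s[0] += int(pred == y); s[1] += 1

    def predict_one(self, x):
        scores = []
        for i in range(len(self.classifiers)):
            correct = total = 0
            for a, v in enumerate(x):
                c, t = self.stats[(a, v)][i]
                correct += c; total += t
            scores.append(correct / total if total else 0.0)
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.classifiers[best].predict([x])[0]
```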
in this work we address the issue of efficient processing of range queries in dht based pp data networks the novelty of the proposed approach lies on architectures algorithms and mechanisms for identifying and appropriately exploiting powerful nodes in such networks the existence of such nodes has been well documented in the literature and plays key role in the architecture of most successful real world pp applications however till now this heterogeneity has not been taken into account when architecting solutions for complex query processing especially in dht networks with this work we attempt to fill this gap for optimizing the processing of range queries significant performance improvements are achieved due to ensuring much smaller hop count performance for range queries and ii avoiding the dangers and inefficiencies of relying for range query processing on weak nodes with respect to processing storage and communication capacities and with intermittent connectivity we present detailed experimental results validating our performance claims
we present the auckland layout model alm constraint based technique for specifying layout as it is used for arranging the controls in graphical user interface gui most gui frameworks offer layout managers that are basically adjustable tables often adjacent table cells can be merged in the alm the focus switches from the table cells to vertical and horizontal tabulators between the cells on the lowest level of abstraction the model applies linear constraints and an optimal layout is calculated using linear programming however bare linear programming makes layout specification cumbersome and unintuitive especially for gui domain experts who are often not used to such mathematical formalisms in order to improve the usability of the model alm offers several other layers of abstraction that make it possible to define common gui layout more easily in the domain of user interfaces it is important that specifications are not over constrained therefore alm introduces soft constraints which are automatically translated to appropriate hard linear constraints and terms in the objective function guis are usually composed of rectangular areas containing controls therefore alm offers an abstraction for such areas dynamic resizing behavior is very important for guis hence areas have domain specific parameters specifying their minimum maximum and preferred sizes from such definitions hard and soft constraints are automatically derived third level of abstraction allows designers to arrange guis in tabular fashion using abstractions for columns and rows which offer additional parameters for ordering and alignment row and column definitions are used to automatically generate definitions from lower levels of abstraction such as hard and soft constraints and areas specifications from all levels of abstraction can be consistently combined offering gui developers rich set of tools that is much closer to their needs than pure linear constraints incremental computation of solutions makes constraint solving fast enough for near real time use
in the area of image retrieval from data bases and for copyright protection of large image collections there is growing demand for unique but easily computable fingerprints for images these fingerprints can be used to quickly identify every image within larger set of possibly similar images this paper introduces novel method to automatically obtain such fingerprints from an image it is based on reinterpretation of an image as riemannian manifold this representation is feasible for gray value images and color images we discuss the use of the spectrum of eigenvalues of different variants of the laplace operator as fingerprint and show the usability of this approach in several use cases contrary to existing works in this area we do not only use the discrete laplacian but also with particular emphasis the underlying continuous operator this allows better results in comparing the resulting spectra and deeper insights in the problems arising we show how the well known discrete laplacian is related to the continuous laplace beltrami operator furthermore we introduce the new concept of solid height functions to overcome some potential limitations of the method
liveness analysis is an important analysis in optimizing compilers liveness information is used in several optimizations and is mandatory during the code generation phase two drawbacks of conventional liveness analyses are that their computations are fairly expensive and their results are easily invalidated by program transformations we present method to check liveness of variables that overcomes both obstacles the major advantage of the proposed method is that the analysis result survives all program transformations except for changes in the control flow graph for common program sizes our technique is faster and consumes less memory than conventional data flow approaches thereby we heavily make use of ssa form properties which allow us to completely circumvent data flow equation solving we evaluate the competitiveness of our approach in an industrial strength compiler our measurements use the integer part of the spec benchmarks and investigate the liveness analysis used by the ssa destruction pass we compare the net time spent in liveness computations of our implementation against the one provided by that compiler the results show that in the vast majority of cases our algorithm while providing the same quality of information needs less time an average speed up of
we have developed self healing key distribution scheme for secure multicast group communications for wireless sensor network environment we present strategy for securely distributing rekeying messages and specify techniques for joining and leaving group access control in multicast system is usually achieved by encrypting the content using an encryption key known as the group key session key that is only known by the group controller and all legitimate group members in our scheme all rekeying messages except for unicast of an individual key are transmitted without any encryption using one way hash function and xor operation in our proposed scheme nodes are capable of recovering lost session keys on their own without requesting additional transmission from the group controller the proposed scheme provides both backward and forward secrecy we analyze the proposed scheme to verify that it satisfies the security and performance requirements for secure group communication
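As a hedged illustration of how unencrypted rekeying with a one-way hash function and XOR can work, here is a generic dual hash-chain sketch, not the paper's exact scheme; each session key combines a forward-chain element (computable by members from their join time) with a backward-chain element carried in the clear by the broadcast, so a missed key can be recovered from a later broadcast without contacting the controller:

```python
import hashlib
import os

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def hash_chain(seed, n):
    """chain[i] = H applied i times to the seed."""
    chain = [seed]
    for _ in range(n):
        chain.append(H(chain[-1]))
    return chain

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Group controller: session key j is fwd[j] XOR bwd[m - j]; the rekey broadcast
# for session j only has to carry the backward element, unencrypted.
m = 10
fwd = hash_chain(os.urandom(32), m)      # a member learns fwd[join_session] on joining
bwd = hash_chain(os.urandom(32), m)
session_key = [xor(fwd[j], bwd[m - j]) for j in range(1, m + 1)]
broadcast   = [bwd[m - j] for j in range(1, m + 1)]

# Member: hash the forward element ahead as needed, then XOR in the broadcast value.
def recover_key(join_fwd, join_session, j, bcast_j):
    f = join_fwd
    for _ in range(j - join_session):
        f = H(f)
    return xor(f, bcast_j)

assert recover_key(fwd[3], 3, 7, broadcast[6]) == session_key[6]   # self-heals session 7
```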
name ambiguity problem has raised an urgent demand for efficient high quality named entity disambiguation methods the key problem of named entity disambiguation is to measure the similarity between occurrences of names the traditional methods measure the similarity using the bag of words bow model the bow however ignores all the semantic relations such as social relatedness between named entities associative relatedness between concepts polysemy and synonymy between key terms so the bow cannot reflect the actual similarity some research has investigated social networks as background knowledge for disambiguation social networks however can only capture the social relatedness between named entities and often suffer the limited coverage problem to overcome the previous methods deficiencies this paper proposes to use wikipedia as the background knowledge for disambiguation which surpasses other knowledge bases by the coverage of concepts rich semantic information and up to date content by leveraging wikipedia’s semantic knowledge like social relatedness between named entities and associative relatedness between concepts we can measure the similarity between occurrences of names more accurately in particular we construct large scale semantic network from wikipedia in order that the semantic knowledge can be used efficiently and effectively based on the constructed semantic network novel similarity measure is proposed to leverage wikipedia semantic knowledge for disambiguation the proposed method has been tested on the standard weps data sets empirical results show that the disambiguation performance of our method gets improvement over the traditional bow based methods and improvement over the traditional social network based methods
physically motivated method for surface reconstruction is proposed that can recover smooth surfaces from noisy and sparse data sets no orientation information is required by new technique based on regularized membrane potentials the input sample points are aggregated leading to improved noise tolerability and outlier removal without sacrificing much with respect to detail feature recovery after aggregating the sample points on volumetric grid novel iterative algorithm is used to classify grid points as exterior or interior to the surface this algorithm relies on intrinsic properties of the smooth scalar field on the grid which emerges after the aggregation step second mesh smoothing paradigm based on mass spring system is introduced by enhancing this system with bending energy minimizing term we ensure that the final triangulated surface is smoother than piecewise linear in terms of speed and flexibility the method compares favorably with respect to previous approaches most parts of the method are implemented on modern graphics processing units gpus results in wide variety of settings are presented ranging from surface reconstruction on noise free point clouds to grayscale image segmentation
we present memory management scheme for java based on thread local heaps assuming most objects are created and used by single thread it is desirable to free the memory manager from redundant synchronization for thread local objects therefore in our scheme each thread receives partition of the heap in which it allocates its objects and in which it does local garbage collection without synchronization with other threads we dynamically monitor to determine which objects are local and which are global furthermore we suggest using profiling to identify allocation sites that almost exclusively allocate global objects and allocate objects at these sites directly in global area we have implemented the thread local heap memory manager and preliminary mechanism for direct global allocation on an ibm prototype of jdk for windows our measurements of thread local heaps with direct global allocation on way multiprocessor ibm netfinity server show that the overall garbage collection times have been substantially reduced and that most long pauses have been eliminated
the recent proliferation of location based services lbss has necessitated the development of effective indoor positioning solutions in such context wireless local area network wlan positioning is particularly viable solution in terms of hardware and installation costs due to the ubiquity of wlan infrastructures this paper examines three aspects of the problem of indoor wlan positioning using received signal strength rss first we show that due to the variability of rss features over space spatially localized positioning method leads to improved positioning results second we explore the problem of access point ap selection for positioning and demonstrate the need for further research in this area third we present kernelized distance calculation algorithm for comparing rss observations to rss training records experimental results indicate that the proposed system leads to percent improvement over the widely used nearest neighbor and histogram based methods
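A minimal sketch of a kernelized comparison between an RSS observation and training records, combined with a weighted nearest-neighbour position estimate; the Gaussian kernel, the sigma value and the missing-reading convention are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def kernel_similarity(rss_obs, rss_train, sigma=4.0):
    """Gaussian-kernel similarity between an RSS observation vector and one
    training record (both indexed by the same ordered AP list; missing
    readings can be encoded as a floor value such as -100 dBm)."""
    d = np.asarray(rss_obs, float) - np.asarray(rss_train, float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2 * len(d)))

def estimate_position(rss_obs, train_rss, train_pos, k=5):
    """Weighted k-nearest-neighbour position estimate with kernel values as weights."""
    sims = np.array([kernel_similarity(rss_obs, r) for r in train_rss])
    top = np.argsort(sims)[-k:]
    w = sims[top] / (sims[top].sum() + 1e-12)
    return (w[:, None] * np.asarray(train_pos, float)[top]).sum(axis=0)
```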
grids involve coordinated resource sharing and problem solving in heterogeneous dynamic environments to meet the needs of generation of researchers requiring large amounts of bandwidth and more powerful computational resources the lack of resource ownership by grid schedulers and fluctuations in resource availability require mechanisms which will enable grids to adjust themselves to cope with fluctuations the lack of central controller implies need for self adaptation grids must thus be enabled with the ability to discover monitor and manage the use of resources so they can operate autonomously two different approaches have been conceived to match the resource demands of grid applications to resource availability dynamic scheduling and adaptive scheduling however these two approaches fail to address at least one of three important issues the production of feasible schedules in reasonable amount of time in relation to that required for the execution of an application ii the impact of network link availability on the execution time of an application and iii the necessity of migrating codes to decrease the execution time of an application to overcome these challenges this paper proposes procedure for enabling grid applications composed of various dependent tasks to deal with the availability of hosts and links bandwidth this procedure involves task scheduling resource monitoring and task migration with the goal of decreasing the execution time of grid applications the procedure differs from other approaches in the literature because it constantly considers changes in resource availability especially network bandwidth availability to trigger task migration the proposed procedure is illustrated via simulation using various scenarios involving fluctuation of resource availability an additional contribution of this paper is the introduction of set of schedulers offering solutions which differ in terms of both schedule length and computational complexity the distinguishing aspect of this set of schedulers is the consideration of time requirements in the production of feasible schedules performance is then evaluated considering various network topologies and task dependencies
this paper proposes weighted power series model for face verification scores fusion essentially linear parametric power series model is adopted to directly minimize an approximated total error rate for fusion of multi modal face verification scores unlike the conventional least squares error minimization approach which involves fitting of learning model to data density and then perform threshold process for error counting this work directly formulates the required target error count rate in terms of design model parameters with closed form solution the solution is found to belong to specific setting of the weighted least squares our experiments on fusing scores from visual and infra red face images as well as on public data sets show promising results
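A hedged sketch of a linear power-series fusion with a closed-form weighted least-squares fit; the per-sample weights stand in for the approximated error-rate objective and are an assumption, not the paper's exact derivation:

```python
import numpy as np

def power_series_features(scores, order=2):
    """Expand an (n_samples, n_modalities) score matrix into [1, s, s^2, ...] terms."""
    S = np.asarray(scores, float)
    feats = [np.ones((S.shape[0], 1))]
    for p in range(1, order + 1):
        feats.append(S ** p)
    return np.hstack(feats)

def fit_weighted_lsq(scores, labels, weights=None, order=2, ridge=1e-6):
    """Closed-form weighted least squares for the fusion coefficients; weights
    can be chosen to balance client and impostor classes."""
    X, y = power_series_features(scores, order), np.asarray(labels, float)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, float)
    XtW = X.T * w                                         # X^T W
    return np.linalg.solve(XtW @ X + ridge * np.eye(X.shape[1]), XtW @ y)

def fuse(scores, alpha, order=2, threshold=0.5):
    """Accept when the fused score crosses the decision threshold."""
    return power_series_features(scores, order) @ alpha >= threshold
```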
the increasing amount of communication between individuals in formats eg email instant messaging and the web has motivated computational research in social network analysis sna previous work in sna has emphasized the social network sn topology measured by communication frequencies while ignoring the semantic information in sns in this paper we propose two generative bayesian models for semantic community discovery in sns combining probabilistic modeling with community detection in sns to simulate the generative models an enf gibbs sampling algorithm is proposed to address the efficiency and performance problems of traditional methods experimental studies on enron email corpus show that our approach successfully detects the communities of individuals and in addition provides semantic topic descriptions of these communities
how many pages are there on the web more less big bets on clusters in the clouds could be wiped out if small cache of few million urls could capture much of the value language modeling techniques are applied to msn’s search logs to estimate entropy the perplexity is surprisingly small millions not billions entropy is powerful tool for sizing challenges and opportunities how hard is search how hard are query suggestion mechanisms like auto complete how much does personalization help all these difficult questions can be answered by estimation of entropy from search logs what is the potential opportunity for personalization in this paper we propose new way to personalize search personalization with backoff if we have relevant data for particular user we should use it but if we don’t back off to larger and larger classes of similar users as proof of concept we use the first few bytes of the ip address to define classes the coefficients of each backoff class are estimated with an em algorithm ideally classes would be defined by market segments demographics and surrogate variables such as time and geography
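A small sketch of personalization with backoff over IP-prefix classes; the interpolation weights are fixed here for brevity, whereas the paper estimates per-class coefficients with EM (all names and the weight values are illustrative assumptions):

```python
from collections import defaultdict

class BackoffClickModel:
    """P(url | query, user) estimated by backing off from the full IP address to
    shorter and shorter prefixes (classes of similar users), then to a global model."""
    def __init__(self, lambdas=(0.4, 0.3, 0.2, 0.1)):
        self.lambdas = lambdas                 # one interpolation weight per backoff level
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in lambdas]

    @staticmethod
    def levels(query, ip):
        octets = ip.split(".")
        # full IP, first 3 octets, first 2 octets, global (empty prefix)
        return [(query, ".".join(octets[:k])) for k in (4, 3, 2, 0)]

    def observe(self, query, ip, url):
        for lvl, key in enumerate(self.levels(query, ip)):
            self.counts[lvl][key][url] += 1

    def prob(self, query, ip, url):
        p = 0.0
        for lam, lvl_counts, key in zip(self.lambdas, self.counts, self.levels(query, ip)):
            total = sum(lvl_counts[key].values())
            if total:
                p += lam * lvl_counts[key][url] / total
        return p
```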
we address the problem of finding best deterministic query answer to query over probabilistic database for this purpose we propose the notion of consensus world or consensus answer which is deterministic world answer that minimizes the expected distance to the possible worlds answers this problem can be seen as generalization of the well studied inconsistent information aggregation problems eg rank aggregation to probabilistic databases we consider this problem for various types of queries including spj queries top ranking queries group by aggregate queries and clustering for different distance metrics we obtain polynomial time optimal or approximation algorithms for computing the consensus answers or prove np hardness most of our results are for general probabilistic database model called and xor tree model which significantly generalizes previous probabilistic database models like tuples and block independent disjoint models and is of independent interest
this paper presents novel architectural solution to address the problem of scalable routing in very large sensor networks the control complexities of the existing sensor routing protocols both flat and with traditional hierarchy do not scale very well for large networks with potentially hundreds of thousands of embedded sensor devices this paper develops novel routing solution off network control processing oncp that achieves control scalability in large sensor networks by shifting certain amount of routing functions off network this routing approach consisting of coarse grain global routing and distributed fine grain local routing is proposed for achieving scalability by avoiding network wide control message dissemination we present the oncp architectural concepts and analytically characterize its performance in relation to both flat and traditional hierarchical sensor routing architectures we also present ns based experimental results which indicate that for very large networks the packet drop latency and energy performance of oncp can be significantly better than those for flat sensor routing protocols such as directed diffusion and cluster based traditional hierarchical protocols such as cbrp
constrained clustering has recently become an active research topic this type of clustering methods takes advantage of partial knowledge in the form of pairwise constraints and acquires significant improvement beyond the traditional unsupervised clustering however most of the existing constrained clustering methods use constraints which are selected at random recently active constrained clustering algorithms utilizing active constraints have proved themselves to be more effective and efficient in this paper we propose an improved algorithm which introduces multiple representatives into constrained clustering to make further use of the active constraints experiments on several benchmark data sets and public image data sets demonstrate the advantages of our algorithm over the referenced competitors
this paper analyzes memory access scheduling and virtual channels as mechanisms to reduce the latency of main memory accesses by the cpu and peripherals in web servers despite the address filtering effects of the cpu’s cache hierarchy there is significant locality and bank parallelism in the dram access stream of web server which includes traffic from the operating system application and peripherals however sequential memory controller leaves much of this locality and parallelism unexploited as serialization and bank conflicts affect the realizable latency aggressive scheduling within the memory controller to exploit the available parallelism and locality can reduce the average read latency of the sdram however bank conflicts and the limited ability of the sdram’s internal row buffers to act as cache hinder further latency reduction virtual channel sdram overcomes these limitations by providing set of channel buffers that can hold segments from rows of any internal sdram bank this paper presents memory controller policies that can make effective use of these channel buffers to further reduce the average read latency of the sdram
locating faults in program can be very time consuming and arduous and therefore there is an increased demand for automated techniques that can assist in the fault localization process in this paper code coverage based method with family of heuristics is proposed in order to prioritize suspicious code according to its likelihood of containing program bugs highly suspicious code ie code that is more likely to contain bug should be examined before code that is relatively less suspicious and in this manner programmers can identify and repair faulty code more efficiently and effectively we also address two important issues first how can each additional failed test case aid in locating program faults and second how can each additional successful test case help in locating program faults we propose that with respect to piece of code the contribution of the first failed test case that executes it in computing its likelihood of containing bug is larger than or equal to that of the second failed test case that executes it which in turn is larger than or equal to that of the third failed test case that executes it and so on this principle is also applied to the contribution provided by successful test cases that execute the piece of code tool gdebug was implemented to automate the computation of the suspiciousness of the code and the subsequent prioritization of suspicious code for locating program faults to validate our method case studies were performed on six sets of programs siemens suite unix suite space grep gzip and make data collected from the studies are supportive of the above claim and also suggest heuristics iii and of our method can effectively reduce the effort spent on fault localization
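A toy version of the diminishing-contribution idea (not the paper's actual family of heuristics): each further failed or passed test that executes a statement contributes geometrically less than the previous one, and the passed-test total is down-weighted and subtracted:

```python
def suspiciousness(n_failed, n_passed, alpha=0.9, beta=0.5):
    """Coverage-based suspiciousness of one statement with decaying per-test
    contributions (geometric decay with ratio alpha); alpha and beta are
    illustrative constants."""
    fail_part = sum(alpha ** i for i in range(n_failed))   # 1 + a + a^2 + ...
    pass_part = sum(alpha ** i for i in range(n_passed))
    return fail_part - beta * pass_part

def rank_statements(coverage, verdicts):
    """coverage[s] = list of test ids executing statement s; verdicts[t] is True
    when test t failed. Returns statements ordered most-suspicious first."""
    scores = {}
    for stmt, tests in coverage.items():
        nf = sum(1 for t in tests if verdicts[t])
        scores[stmt] = suspiciousness(nf, len(tests) - nf)
    return sorted(scores, key=scores.get, reverse=True)
```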
in order to achieve high performance wide issue superscalar processors have to fetch large number of instructions per cycle conditional branches are the primary impediment to increasing the fetch bandwidth because they can potentially alter the flow of control and are very frequent to overcome this problem these processors need to predict the outcome of multiple branches in cycle this paper investigates two control flow prediction schemes that predict the effective outcome of multiple branches with the help of single prediction instead of considering branches as the basic units of prediction these schemes consider subgraphs of the control flow graph of the executed program as the basic units of prediction and predict the target of an entire subgraph at time thereby allowing the superscalar fetch mechanism to go past multiple branches in cycle the first control flow prediction scheme investigated considers sequential block like subgraphs and the second scheme considers tree like subgraphs to make the control flow predictions both schemes do out of prediction as opposed to the out of prediction done by branch level prediction schemes these two schemes are evaluated using mips isa based way superscalar microarchitecture an improvement in effective fetch size of approximately percent and percent respectively is observed over identical microprocessors that use branch level prediction no appreciable difference in the prediction accuracy was observed although the control flow prediction schemes predicted out of outcomes
wireless mesh networks wmns are considered as cost effective easily deployable and capable of extending internet connectivity however one of the major challenges in deploying reliable wmns is protecting their nodes from malicious attacks which is of particular concern as attacks can severely degrade network performance when dos attack is targeted over an entire communication path it is called path based dos attack we study the performance impact of path based dos attacks by considering attack intensity medium errors physical diversity collusion and hop count we set up wireless mesh testbed and configure set of experiments to gather realistic measurements and assess the effects of different factors we find that medium errors have significant impact on the performance of wmns when path based dos attack is carried out and the impact is exacerbated by the mac layer retransmissions we show that due to physical diversity far attacker can cause greater performance degradation than close by attacker additionally we demonstrate that the joint impact of two colluding attackers is not as severe as the joint result of individual attacks we also discuss strategy to counter path based dos attacks which can potentially alleviate the impact of the attack significantly
current cscw applications support one or more modes of cooperative work the selection of and transition between these modes is usually placed on the users at ipsi we built the sepia cooperative hypermedia authoring environment supporting whole range of situations arising during collaborative work and the smooth transitions between them while early use of the system shows the benefits of supporting smooth transitions between different collaborative modes it also reveals some deficits regarding parallel work management of alternative documents or reuse of document parts we propose to integrate version support to overcome these limitations this leads to versioned data management and an extended user interface enabling concurrent users to select certain state of their work to be aware of related changes and to cooperate with others either asynchronously or synchronously
trust between users is an important piece of knowledge that can be exploited in search and recommendation given that user supplied trust relationships are usually very sparse we study the prediction of trust relationships using user interaction features in an online user generated review application context we show that trust relationship prediction can achieve better accuracy when one adopts personalized and cluster based classification methods the former trains one classifier for each user using user specific training data the cluster based method first constructs user clusters before training one classifier for each user cluster our proposed methods have been evaluated in series of experiments using two datasets from epinionscom it is shown that the personalized and cluster based classification methods outperform the global classification method particularly for the active users
due to increasing system decentralization component heterogeneity and interface complexities many trustworthiness challenges become more and more complicated and intertwined moreover there is lack of common understanding of software trustworthiness and its related development methodology this paper reports preliminary results from an ongoing collaborative research project among international research units which aims at exploring theories and methods for enhancing existing software process techniques for trustworthy software development the results consist in two parts the proposal of new concept of process trustworthiness as capability indicator to measure the relative degree of confidence for certain software processes to deliver trustworthy software and the introduction of the architecture of trustworthy process management framework tpmf toolkit for process runtime support in measuring and improving process trustworthiness in order to assess and assure software trustworthiness
the research issue of broadcasting has attracted considerable amount of attention in mobile computing system by utilizing broadcast channels server is able to continuously and repeatedly broadcast data to mobile users from these broadcast channels mobile users obtain the data of interest efficiently and only need to wait for the required data to be present on the broadcast channel given the access frequencies of data items one can design proper data allocation in the broadcast channels to reduce the average expected delay of data items in practice the data access frequencies may vary with time we explore in this paper the problem of adjusting broadcast programs to effectively respond to the changes of data access frequencies and develop an efficient algorithm dl to address this problem performance of algorithm dl is analyzed and system simulator is developed to validate our results sensitivity analysis on several parameters including the number of data items the number of broadcast disks and the variation of access frequencies is conducted it is shown by our results that the broadcast programs adjusted by algorithm dl are of very high quality and are in fact very close to the optimal ones
contemporary database applications often perform queries in hybrid data spaces hds where vectors can have mix of continuous valued and non ordered discrete valued dimensions to support efficient query processing for an hds robust indexing method is required existing indexing techniques to process queries efficiently either apply to continuous data spaces eg the tree or non ordered discrete data spaces eg the nd tree no techniques directly indexing vectors in hdss have been reported in the literature in this paper we propose new multidimensional indexing technique called the nd tree to directly index vectors in an hds to build such an index we first introduce some essential geometric concepts eg hybrid bounding rectangle in hdss the nd tree structure and the relevant tree building and query processing algorithms based on these geometric concepts in hdss are then presented strategies have been suggested to make the values in continuous dimensions and non ordered discrete dimensions comparable and controllable novel node splitting heuristics which exploit characteristics of both continuous and discrete dimensions are proposed performance of the nd tree is compared with that of linear scan tree and nd tree using range queries on hybrid data experimental results demonstrate that the nd tree is quite promising in supporting range queries in hdss
it is now well admitted that formal methods are helpful for many issues raised in the web service area in this paper we present framework for the design and the verification of wss using process algebras and their tools we define two way mapping between abstract specifications written using these calculi and executable web services written in bpelws the translation includes also compensation event and fault handlers the following choices are available design and verification in bpelws using process algebra tools or design and verification in process algebra and automatically obtaining the corresponding bpelws code the approaches can be combined process algebras are not useful only for temporal logic verification we remark the use of simulation bisimulation for verification for the hierarchical refinement design method for the service redundancy analysis in community and for replacing service with another one in composition
with the increased use of virtual machines vms as vehicles that isolate applications running on the same host it is necessary to devise techniques that enable multiple vms to share underlying resources both fairly and efficiently to that end one common approach is to deploy complex resource management techniques in the hosting infrastructure alternately in this paper we advocate the use of self adaptation in the vms themselves based on feedback about resource usage and availability consequently we define friendly vm fvm to be virtual machine that adjusts its demand for system resources so that they are both efficiently and fairly allocated to competing fvms such properties are ensured using one of many provably convergent control rules such as additive increase multiplicative decrease aimd by adopting this distributed application based approach to resource management it is not necessary to make assumptions about the underlying resources nor about the requirements of fvms competing for these resources to demonstrate the elegance and simplicity of our approach we present prototype implementation of our fvm framework in user mode linux uml an implementation that consists of less than lines of code changes to uml we present an analytic control theoretic model of fvm adaptation which establishes convergence and fairness properties these properties are also backed up with experimental results using our prototype fvm implementation
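A minimal sketch of the AIMD rule a friendly VM might apply to its own demand, assuming only a boolean contention signal; the increment, decrease factor, and shared capacity are illustrative, not the paper's controller.

# Minimal AIMD sketch: a friendly VM adjusts its own resource demand,
# increasing additively while resources appear available and backing off
# multiplicatively when it detects contention. Parameters are assumptions.

class FriendlyVM:
    def __init__(self, increment=1.0, decrease=0.5, floor=1.0):
        self.demand = floor
        self.increment = increment   # additive increase step
        self.decrease = decrease     # multiplicative decrease factor
        self.floor = floor

    def adapt(self, contention_detected):
        if contention_detected:
            self.demand = max(self.floor, self.demand * self.decrease)
        else:
            self.demand += self.increment
        return self.demand

# two FVMs sharing a capacity of 10 units settle around a fair share
vms = [FriendlyVM(), FriendlyVM()]
capacity = 10.0
for step in range(20):
    overloaded = sum(vm.demand for vm in vms) > capacity
    for vm in vms:
        vm.adapt(overloaded)
print([round(vm.demand, 2) for vm in vms])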
with power consumption becoming increasingly critical in interconnected systems power aware networks will become part and parcel of many single chip and multichip systems as communication links consume significant power regardless of utilization mechanism to realize such power aware networks is on off links network links that can be turned on off as function of traffic in this paper we investigate and propose self regulating power aware interconnection networks that turn their links on off in response to bursts and dips in traffic in distributed fashion we explore the design space of such on off networks outlining step design methodology along with various building block solutions at each step that can be effectively assembled to develop various on off network designs we applied our methodology to the design of two classes of on off networks with links that possess substantially different on off delays an on chip network as well as chip to chip network and show that our designs are able to adapt dynamically to variations in network traffic three specific network designs are then constructed presented and evaluated our simulations show that link power consumption can be reduced by up to percent with modest increase in network latency
fraud is serious problem that costs the worldwide economy billions of dollars annually however fraud detection is difficult as perpetrators actively attempt to masquerade their actions among typically overwhelming large volumes of legitimate activity in this paper we investigate the fraud detection problem and examine how learning classifier systems can be applied to it we describe the common properties of fraud introducing an abstract problem which can be tuned to exhibit those characteristics we report experiments on this abstract problem with popular real time learning classifier system algorithm results from our experiments demonstrating that this approach can overcome the difficulties inherent to the fraud detection problem finally we apply the algorithm to real world problem and show that it can achieve good performance in this domain
the predictability of data values is studied at fundamental level two basic predictor models are defined computational predictors perform an operation on previous values to yield predicted next values examples we study are stride value prediction which adds delta to previous value and last value prediction which performs the trivial identity operation on the previous value context based predictors match recent value history context with previous value history and predict values based entirely on previously observed patterns to understand the potential of value prediction we perform simulations with unbounded prediction tables that are immediately updated using correct data values simulations of integer spec benchmarks show that data values can be highly predictable best performance is obtained with context based predictors overall prediction accuracies are between and the context based predictor typically has an accuracy about better than the computational predictors last value and stride comparison of context based prediction and stride prediction shows that the higher accuracy of context based prediction is due to relatively few static instructions giving large improvements this suggests the usefulness of hybrid predictors among different instruction types predictability varies significantly in general load and shift instructions are more difficult to predict correctly whereas add instructions are more predictable
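A small sketch of the two computational predictors discussed above, last-value and stride, with unbounded per-instruction tables that are updated immediately with correct values, as in the simulations described; the trace and the accuracy harness are illustrative, and the context-based predictor is omitted.

# Sketch of two computational value predictors: last-value prediction
# (identity on the previous value) and stride prediction (previous value
# plus the last observed delta). Per-instruction tables are unbounded.

class LastValuePredictor:
    def __init__(self):
        self.last = {}                     # pc -> last value

    def predict(self, pc):
        return self.last.get(pc)

    def update(self, pc, value):
        self.last[pc] = value

class StridePredictor:
    def __init__(self):
        self.table = {}                    # pc -> (last value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)

def accuracy(predictor, trace):
    correct = total = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            correct += 1
        total += 1
        predictor.update(pc, value)   # immediate update with correct value
    return correct / total

# a loop induction variable (stride 4) and a loop-invariant load
trace = [("add1", 4 * i) for i in range(50)] + [("load1", 7)] * 50
print(accuracy(LastValuePredictor(), trace), accuracy(StridePredictor(), trace))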
data clustering has been discussed extensively but almost all known conventional clustering algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the data points existing subspace clustering algorithms for handling high dimensional data focus on numerical dimensions in this paper we designed an iterative algorithm called subcad for clustering high dimensional categorical data sets based on the minimization of an objective function for clustering we deduced some cluster memberships changing rules using the objective function we also designed an objective function to determine the subspace associated with each cluster we proved various properties of this objective function that are essential for us to design fast algorithm to find the subspace associated with each cluster finally we carried out some experiments to show the effectiveness of the proposed method and the algorithm
with more and more large networks becoming available mining and querying such networks are increasingly important tasks which are not being supported by database models and querying languages this paper wants to alleviate this situation by proposing data model and query language for facilitating the analysis of networks key features include support for executing external tools on the networks flexible contexts on the network each resulting in different graph primitives for querying subgraphs including paths and transforming graphs the data model provides for closure property in which the output of every query can be stored in the database and used for further querying
large real time software systems such as real time java virtual machines often use barrier protocols which work for dynamically varying number of threads without using centralized locking such barrier protocols however still suffer from priority inversion similar to centralized locking we introduce gang priority management as generic solution for avoiding unbounded priority inversion in barrier protocols our approach is either kernel assisted for efficiency or library based for portability but involves cooperation from the protocol designer for generality we implemented gang priority management in the linux kernel and rewrote the garbage collection safe point barrier protocol in ibm’s websphere real time java virtual machine to exploit it we run experiments on an way smp machine in multi user and multi process environment and show that by avoiding unbounded priority inversion the maximum latency to reach barrier point is reduced by factor of and the application jitter is reduced by factor of
achieving intuitive control of animated surface deformation while observing specific style is an important but challenging task in computer graphics solutions to this task can find many applications in data driven skin animation computer puppetry and computer games in this paper we present an intuitive and powerful animation interface to simultaneously control the deformation of large number of local regions on deformable surface with minimal number of control points our method learns suitable deformation subspaces from training examples and generates new deformations on the fly according to the movements of the control points our contributions include novel deformation regression method based on kernel canonical correlation analysis cca and poisson based translation solving technique for easy and fast deformation control based on examples our run time algorithm can be implemented on gpus and can achieve few hundred frames per second even for large datasets with hundreds of training examples
the specification of schema mappings has proved to be time and resource consuming and has been recognized as critical bottleneck to the large scale deployment of data integration systems in an attempt to address this issue dataspaces have been proposed as data management abstraction that aims to reduce the up front cost required to setup data integration system by gradually specifying schema mappings through interaction with end users in pay as you go fashion as step in this direction we explore an approach for incrementally annotating schema mappings using feedback obtained from end users in doing so we do not expect users to examine mapping specifications rather they comment on results to queries evaluated using the mappings using annotations computed on the basis of user feedback we present method for selecting from the set of candidate mappings those to be used for query evaluation considering user requirements in terms of precision and recall in doing so we cast mapping selection as an optimization problem mapping annotations may reveal that the quality of schema mappings is poor we also show how feedback can be used to support the derivation of better quality mappings from existing mappings through refinement an evolutionary algorithm is used to efficiently and effectively explore the large space of mappings that can be obtained through refinement the results of evaluation exercises show the effectiveness of our solution for annotating selecting and refining schema mappings
watermarking algorithms provide way of hiding or embedding some bits of information in watermark in the case of watermarking model many algorithms employ so called indexed localization scheme in this paper we propose an optimization framework with two new steps for such watermarking algorithms to improve their capacity and invisibility the first step is to find an optimal layout of invariant units to improve capacity the second step is to rearrange the correspondence between the watermark units and the invariant units to improve invisibility experimental tests show that by using this framework the capacity and invisibility of watermarking algorithms can be greatly improved
many virtual environments and games must be populated with synthetic characters to create the desired experience these characters must move with sufficient realism so as not to destroy the visual quality of the experience yet be responsive controllable and efficient to simulate in this paper we present an approach to character motion called snap together motion that addresses the unique demands of virtual environments snap together motion stm preprocesses corpus of motion capture examples into set of short clips that can be concatenated to make continuous streams of motion the result process is simple graph structure that facilitates efficient planning of character motions user guided process selects common character poses and the system automatically synthesizes multi way transitions that connect through these poses in this manner well connected graphs can be constructed to suit particular application allowing for practical interactive control without the effort of manually specifying all transitions
as more and more documents become electronically available finding documents in large databases that fit users needs is becoming increasingly important in the past the document search problem was dealt with using the database query approach or the text based search approach in this paper we investigate this problem focusing on the sci ssci databases from isi specifically we design our search methodology based on the four fields commonly seen in scientific research document abstract title keywords and reference list of these four only the abstract field can be viewed as normal text while the other three have their own characteristics to differentiate them from texts therefore we first develop method to compute the similarity value for each field our next problem is combining the four similarity values into final value one approach is to assign weights to each and compute the weighted sum we have not adopted this simple weighting method however because it is difficult to determine appropriate weights instead we use the back propagation neural network to combine them finally extensive experiments have been carried out using real documents drawn from tkde journal and the results indicate that in all situations our method has much higher accuracy than the traditional text based search approach
denial of service dos attacks are arguably one of the most cumbersome problems in the internet this paper presents distributed information system over set of completely connected servers called chameleon which is robust to dos attacks on the nodes as well as the operations of the system in particular it allows nodes to efficiently look up and insert data items at any time despite powerful past insider adversary which has complete knowledge of the system up to some time point and can use that knowledge in order to block constant fraction of the nodes and inject lookup and insert requests to selected data this is achieved with smart randomized replication policy requiring polylogarithmic overhead only and the interplay of permanent and temporary distributed hash table all requests in chameleon can be processed in polylogarithmic time and work at every node
energy consumption is major concern in many embedded computing systems several studies have shown that cache memories account for about of the total energy consumed in these systems the performance of given cache architecture is largely determined by the behavior of the application using that cache desktop systems have to accommodate very wide range of applications and therefore the manufacturer usually sets the cache architecture as compromise given current applications technology and cost unlike desktop systems embedded systems are designed to run small range of well defined applications in this context cache architecture that is tuned for that narrow range of applications can have both increased performance as well as lower energy consumption we introduce novel cache architecture intended for embedded microprocessor platforms the cache can be configured by software to be direct mapped two way or four way set associative using technique we call way concatenation having very little size or performance overhead we show that the proposed cache architecture reduces energy caused by dynamic power compared to way shutdown cache furthermore we extend the cache architecture to also support way shutdown method designed to reduce the energy from static power that is increasing in importance in newer cmos technologies our study of programs drawn from powerstone mediabench and spec show that tuning the cache’s configuration saves energy for every program compared to conventional four way set associative as well as direct mapped caches with average savings of compared to four way conventional cache
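A sketch of the way-concatenation idea as it could be modeled in software: the same physical lines are viewed as four-way, two-way, or direct-mapped, so the number of sets and index bits changes with the configuration; the cache size, line size, and example address are assumptions.

# Sketch of a configurable cache indexed by "way concatenation": the same
# physical lines are grouped as 4-way, 2-way or direct-mapped, so the number
# of index bits grows as associativity shrinks. Sizes are assumptions.

TOTAL_LINES = 256        # physical cache lines
LINE_SIZE = 32           # bytes per line

def cache_geometry(associativity):
    """Number of sets and index bits for a given configuration."""
    sets = TOTAL_LINES // associativity
    index_bits = sets.bit_length() - 1      # sets is a power of two
    return sets, index_bits

def split_address(addr, associativity):
    """Return (tag, set index) for a byte address under this configuration."""
    sets, index_bits = cache_geometry(associativity)
    block = addr // LINE_SIZE
    return block >> index_bits, block & (sets - 1)

for assoc in (4, 2, 1):          # four-way, two-way, direct-mapped
    sets, bits = cache_geometry(assoc)
    tag, idx = split_address(0x1F4C, assoc)
    print(f"{assoc}-way: {sets} sets, {bits} index bits, set {idx}, tag {tag:#x}")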
this paper describes the generation of model capturing information on how placenames co occur together the advantages of the co occurrence model over traditional gazetteers are discussed and the problem of placename disambiguation is presented as case study we begin by outlining the problem of ambiguous placenames we demonstrate how analysis of wikipedia can be used in the generation of co occurrence model the accuracy of our model is compared to handcrafted ground truth then we evaluate alternative methods of applying this model to the disambiguation of placenames in free text using the geoclef evaluation forum we conclude by showing how the inclusion of placenames in both the text and geographic parts of query provides the maximum mean average precision and outline the benefits of co occurrence model as data source for the wider field of geographic information retrieval gir
with the proliferation of mobile streaming multimedia available battery capacity constrains the end user experience since streaming applications are expected to be long running wireless network interface card’s wnic energy consumption is particularly an acute problem in this work we explore various mechanisms to conserve client wnic energy consumption for popular streaming formats such as microsoft windows media real and apple quicktime first we investigate the wnic energy consumption characteristics for these popular multimedia streaming formats under varying stream bandwidth and network loss rates we show that even for high bandwidth kbps stream the wnic unnecessarily spent over of the time in idle state illustrating the potential for significant energy savings based on these observations we explore two mechanisms to conserve the client wnic energy consumption first we show the limitations of ieee power saving mode for multimedia streams without an understanding of the stream requirements these scheduled rendezvous mechanisms do not offer any energy savings for multimedia streams over kbps we also develop history based client side strategies to reduce the energy consumed by transitioning the wnics to lower power consuming sleep state we show that streams optimized for kbps can save over in energy consumption with data loss high bandwidth stream kbps can still save in energy consumption with less than data loss we also show that real and quicktime packets are harder to predict at the network level without understanding the packet semantics as the amount of cross traffic generated by other clients that share the same wireless segment increases the potential energy savings from our client side policies deteriorate further our work enables multimedia proxy and server developers to suitably customize the stream to lower client energy consumption
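A simplified sketch of a history-based client-side policy of the kind described: predict the next inter-packet gap from a window of recent gaps and put the WNIC into the sleep state when the predicted idle time comfortably exceeds the sleep/wake transition cost; the window size, transition cost, and safety margin are assumptions.

# Simplified history-based sleep policy for a wireless NIC: predict the next
# packet inter-arrival gap from recent history and sleep through the idle
# period when it is long enough to amortize the wake-up cost.

from collections import deque

class HistorySleepPolicy:
    def __init__(self, window=8, transition_cost_ms=10.0, margin=2.0):
        self.gaps = deque(maxlen=window)   # recent inter-arrival gaps (ms)
        self.transition_cost_ms = transition_cost_ms
        self.margin = margin               # safety factor against data loss

    def on_packet(self, gap_ms):
        """Record the gap since the previous packet and decide whether the
        card can sleep until the next packet is expected."""
        self.gaps.append(gap_ms)
        predicted = sum(self.gaps) / len(self.gaps)
        if predicted > self.margin * self.transition_cost_ms:
            # sleep for the predicted gap minus the wake-up cost
            return ("sleep", predicted - self.transition_cost_ms)
        return ("stay_awake", 0.0)

policy = HistorySleepPolicy()
for gap in [100, 110, 95, 105, 12, 100]:     # ms between stream packets
    print(policy.on_packet(gap))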
this paper describes an approach of representing shape by using set of invariant spherical harmonic sh coefficients after conformal mapping specifically genus zero mesh object is first conformally mapped onto the unit sphere by using modified discrete conformal mapping where the modification is based on mobius factorization and aims at obtaining canonical conformal mapping then sh analysis is applied to the resulting conformal spherical mesh the obtained sh coefficients are further made invariant to translation and rotation while at the same time retain the completeness thanks to which the original shape information has been faithfully preserved
active harmony is an automated runtime performance tuning system in this paper we describe parameter prioritizing tool to help focus on those parameters that are performance critical historical data is also utilized to further speed up the tuning process we first verify our proposed approaches with synthetic data and finally we verify all the improvements on real cluster based web service system taken together these changes allow the active harmony system to reduce the time spent tuning from up to and at the same time reduce the variation in performance while tuning
an overwhelming volume of news videos from different channels and languages is available today which demands automatic management of this abundant information to effectively search retrieve browse and track cross lingual news stories news story similarity measure plays critical role in assessing the novelty and redundancy among them in this paper we explore the novelty and redundancy detection with visual duplicates and speech transcripts for cross lingual news stories news stories are represented by sequence of keyframes in the visual track and set of words extracted from speech transcript in the audio track major difference to pure text documents is that the number of keyframes in one story is relatively small compared to the number of words and there exist large number of non near duplicate keyframes these features make the behavior of similarity measures different compared to traditional textual collections furthermore the textual features and visual features complement each other for news stories they can be further combined to boost the performance experiments on the trecvid cross lingual news video corpus show that approaches on textual features and visual features demonstrate different performance and measures on visual features are quite effective overall the cosine distance on keyframes is still robust measure language models built on visual features demonstrate promising performance the fusion of textual and visual features improves overall performance
the ready availability of online source code examples has fundamentally changed programming practices however current search tools are not designed to assist with programming tasks and are wholly separate from editing tools this paper proposes that embedding task specific search engine in the development environment can significantly reduce the cost of finding information and thus enable programmers to write better code more easily this paper describes the design implementation and evaluation of blueprint web search interface integrated into the adobe flex builder development environment that helps users locate example code blueprint automatically augments queries with code context presents code centric view of search results embeds the search experience into the editor and retains link between copied code and its source comparative laboratory study found that blueprint enables participants to write significantly better code and find example code significantly faster than with standard web browser analysis of three months of usage logs with users suggests that task specific search interfaces can significantly change how and when people search the web
efforts to improve application reliability can be irrelevant if the reliability of the underlying operating system on which the application resides is not seriously considered an important first step in improving the reliability of an operating system is to gain insights into why and how the bugs originate contributions of the different modules to the bugs their distribution across severities the different ways in which the bugs may be resolved and the impact of bug severities on their resolution times to acquire this insight we conducted an extensive analysis of the publicly available bug data on the linux kernel over period of seven years we also justify and explain the statistical bug occurrence trends observed from the data using the architecture of the linux kernel as an anchor the statistical analysis of the linux bug data suggests that the linux kernel may draw significant benefits from the continual reliability improvement efforts of its developers these efforts however are disproportionately targeted towards popular configurations and hardware platforms due to which the reliability of these configurations may be better than those that are not commonly used thus key finding of our study is that it may be prudent to restrict to using common configurations and platforms when using open source systems such as linux in applications with stringent reliability expectations finally our study of the architectural properties of the bugs suggests that the dependence among the modules rather than the unreliabilities of the individual modules is the primary cause of the bugs and their impact on system reliability
major obstacle in the technology transfer agenda of behavioral analysis and design methods is the need for logics or automata to express properties for control intensive systems interaction modeling notations may offer replacement or complement with practitioner appealing and lightweight flavor due partly to the subspecification of intended behavior by means of scenarios we propose novel approach consisting of engineering new formal notation of this sort based on simple compact declarative semantics vts visual timed event scenarios scenarios represent event patterns graphically depicting conditions over traces they predicate general system events and provide features to describe complex properties not expressible with msc like notations the underlying formalism supports partial orders and real time constraints the problem of checking whether timed automaton model has matching trace is proven decidable on top of this kernel we introduce notation to state properties over all system traces conditional scenarios allowing engineers to describe uniquely rich connections between antecedent and consequent portions of the scenario an undecidability result is presented for the general case of the model checking problem over dense time domains to later identify decidable yet practically relevant subclass where verification is solvable by generating antiscenarios expressed in the vts kernel notation
data generalization is widely used to protect identities and prevent inference of sensitive information during the public release of microdata the anonymity model has been extensively applied in this context the model seeks generalization scheme such that every individual becomes indistinguishable from at least other individuals and the loss in information while doing so is kept at minimum the search is performed on domain hierarchy lattice where every node is vector signifying the level of generalization for each attribute an effort to understand privacy and data utility trade offs will require knowing the minimum possible information losses of every possible value of however this can easily lead to an exhaustive evaluation of all nodes in the hierarchy lattice in this paper we propose using the concept of pareto optimality to obtain the desired trade off information pareto optimal generalization is one in which no other generalization can provide higher value of without increasing the information loss we introduce the pareto optimal anonymization poka algorithm to traverse the hierarchy lattice and show that the number of node evaluations required to find the pareto optimal generalizations can be significantly reduced results on benchmark data set show that the algorithm is capable of identifying all pareto optimal nodes by evaluating only of nodes in the lattice
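A sketch of the Pareto-optimality test itself (not the poka lattice traversal): among candidate generalizations summarized as (k, information loss) pairs, keep only those not dominated by another candidate that offers at least the same k at strictly lower loss; the example values are made up.

# Sketch of extracting Pareto-optimal generalizations from candidate lattice
# nodes, each summarized as (k achieved, information loss). A node is kept if
# no other node is at least as good in both objectives and strictly better in
# one. The example values are made up.

def dominates(a, b):
    """a = (k, loss) dominates b if it is at least as good in both
    objectives and strictly better in one of them."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_optimal(nodes):
    return [n for n in nodes
            if not any(dominates(m, n) for m in nodes if m is not n)]

# (k, information loss) for a handful of generalization nodes
candidates = [(2, 0.10), (2, 0.15), (5, 0.25), (5, 0.20), (10, 0.40), (3, 0.30)]
print(sorted(pareto_optimal(candidates)))   # [(2, 0.1), (5, 0.2), (10, 0.4)]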
distributed simulation techniques are commonly used to improve the speed and scalability of wireless sensor network simulators however accurate simulations of dynamic interactions of sensor network applications incur large synchronization overheads and severely limit the performance of existing distributed simulators in this paper we present two novel techniques that significantly reduce such overheads by minimizing the number of sensor node synchronizations during simulations these techniques work by exploiting radio and mac specific characteristics without reducing simulation accuracy in addition we present new probing mechanism that makes it possible to exploit any potential application specific characteristics for synchronization reductions we implement and evaluate these techniques in cycle accurate distributed simulation framework that we developed based on avrora in our experiments the radio level technique achieves speedup of to times in simulating hop networks with to nodes with default backoffs themac level technique achieves speedup of to times in the best case scenarios of simulating and nodes in our multi hop flooding tests together they achieve speedup of to times in simulating networks with to nodes the experiments also demonstrate that the speedups can be significantly larger as the techniques scale with the number of processors and radio off mac backoff time
an important problem in software engineering is the automated discovery of noncrashing occasional bugs in this work we address this problem and show that mining of weighted call graphs of program executions is promising technique we mine weighted graphs with combination of structural and numerical techniques more specifically we propose novel reduction technique for call graphs which introduces edge weights then we present an analysis technique for such weighted call graphs based on graph mining and on traditional feature selection schemes the technique generalises previous graph mining approaches as it allows for an analysis of weights our evaluation shows that our approach finds bugs which previous approaches cannot detect so far our technique also doubles the precision of finding bugs which existing techniques can already localise in principle
this article presents new technique adaptive replication for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object it is often possible to eliminate synchronization bottlenecks by replicating objects each thread can then update its own local replica without synchronization and without interacting with other threads when the computation needs to access the original object it combines the replicas to produce the correct values in the original object one potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption adaptive replication eliminates unnecessary replication by dynamically detecting contention at each object to find and replicate only those objects that would otherwise cause synchronization bottlenecks we have implemented adaptive replication in the context of parallelizing compiler for subset of given an unannotated sequential program written in the compiler automatically extracts the concurrency determines when it is legal to apply adaptive replication and generates parallel code that uses adaptive replication to efficiently eliminate synchronization bottlenecks in addition to automatic parallelization and adaptive replication our compiler also implements lock coarsening transformation that increases the granularity at which the computation locks objects the advantage is reduction in the frequency with which the computation acquires and releases locks the potential disadvantage is the introduction of new synchronization bottlenecks caused by increases in the sizes of the critical sections because the adaptive replication transformation takes place at lock acquisition sites there is synergistic interaction between lock coarsening and adaptive replication lock coarsening drives down the overhead of using adaptive replication and adaptive replication eliminates synchronization bottlenecks associated with the overaggressive use of lock coarsening our experimental results show that for our set of benchmark programs the combination of lock coarsening and adaptive replication can eliminate synchronization bottlenecks and significantly reduce the synchronization and replication overhead as compared to versions that use none or only one of the transformations
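A small sketch of the replication idea for a commutative update, written in Python rather than in the compiler setting of the article: a counter is updated under a lock until contention is detected, after which each thread updates a private replica that is combined back into the original object on read; the contention threshold is an assumption.

# Sketch of adaptive replication for a commutative update (here: a counter).
# Updates normally synchronize on a lock; once enough contention is detected,
# each thread switches to a private replica, and the replicas are combined
# when the original object is read. The contention threshold is an assumption.

import threading

class AdaptiveCounter:
    def __init__(self, contention_threshold=16):
        self.value = 0
        self.lock = threading.Lock()
        self.contended = 0                       # approximate heuristic count
        self.threshold = contention_threshold
        self.replicas = {}                       # thread id -> local count
        self.replicated = False

    def add(self, amount):
        tid = threading.get_ident()
        if self.replicated:
            # no synchronization needed: each thread owns its replica
            self.replicas[tid] = self.replicas.get(tid, 0) + amount
            return
        if not self.lock.acquire(blocking=False):
            self.contended += 1                  # detected contention
            self.lock.acquire()
        try:
            self.value += amount
            if self.contended >= self.threshold:
                self.replicated = True           # start replicating
        finally:
            self.lock.release()

    def read(self):
        with self.lock:
            # combine replicas back into the original object
            self.value += sum(self.replicas.values())
            self.replicas.clear()
            return self.value

counter = AdaptiveCounter()
threads = [threading.Thread(target=lambda: [counter.add(1) for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.read())     # 4000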
in this paper we investigate reduced representations for the emerging cube we use the borders classical in data mining for the emerging cube these borders can support classification tasks to know whether trend is emerging or not however the borders do not make possible to retrieve the measure values this is why we introduce two new and reduced representations without measure loss the emerging closed cube and emerging quotient cube we state the relationship between the introduced representations experiments performed on various data sets are intended to measure the size of the three reduced representations
even though great efforts have been made for decades the recognition of human activities is still an immature technology that has attracted plenty of interest in computer vision in this paper system framework is presented to recognize multiple kinds of activities from videos by an svm multi class classifier with binary tree architecture the framework is composed of three functionally cascaded modules detecting and locating people by non parametric background subtraction approach extracting various features such as local ones from the minimum bounding boxes of human blobs in each frame and newly defined global one contour coding of the motion energy image ccmei and recognizing activities of people by svm multi class classifier whose structure is determined by clustering process the idea of hierarchical classification is introduced and multiple svms are aggregated to accomplish the recognition of actions each svm in the multi class classifier is trained separately to achieve its best classification performance by choosing proper features before they are aggregated experimental results both on home brewed activity data set and the public schuldt’s data set show the perfect identification performance and high robustness of the system
the ability to walk up to any computer personalize it and use it as one’s own has long been goal of mobile computing research we present soulpad new approach based on carrying an auto configuring operating system along with suspended virtual machine on small portable device with this approach the computer boots from the device and resumes the virtual machine thus giving the user access to his personal environment including previously running computations soulpad has minimal infrastructure requirements and is therefore applicable to wide range of conditions particularly in developing countries we report our experience implementing soulpad and using it on variety of hardware configurations we address challenges common to systems similar to soulpad and show that the soulpad model has significant potential as mobility solution
data races occur when multiple threads are about to access the same piece of memory and at least one of those accesses is write such races can lead to hard to reproduce bugs that are time consuming to debug and fix we present relay static and scalable race detection analysis in which unsoundness is modularized to few sources we describe the analysis and results from our experiments using relay to find data races in the linux kernel which includes about million lines of code
one popular approach to object design proposes to identify responsibilities from software contracts apply number of principles to assign them to objects and finally construct an object interaction that realizes the contract this three step activity is currently manual process that is time consuming and error prone and is among the most challenging activities in object oriented development in this paper we present model transformation that partially automates this activity such transformation is modularized in three stages the first stage automatically transforms software contract to trace of state modification actions in the second stage the designer manually extends the trace with design decisions finally the extended trace is automatically transformed to an object interaction in the third stage prototype of the whole transformation was developed and successfully applied to case study from the literature our technique allows the extraction of valuable information from software contracts provides bridge between analysis and design artifacts and significantly reduces the effort of interaction design
operational transformation ot is an optimistic concurrency control method that has been well established in realtime group editors and has drawn significant research attention in the past decade it is generally believed that the use of ot automatically achieves high local responsiveness in group editors however no performance study has been reported previously on ot algorithms to the best of our knowledge this paper extends recent ot algorithm and studies its performance by theoretical analyses and performance experiments this paper proves that the worst case execution time of ot only appears in rare cases and shows that local responsiveness of ot based group editors in fact depends on number of factors such as the size of the operation log the paper also reveals that these two results have general implications on ot algorithms and hence the design of ot based group editors must pay attention to performance issues
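For readers unfamiliar with OT, the sketch below shows a character-wise inclusion transformation, the unit of work whose repetition over the operation log drives the costs analyzed here; it handles insert/insert and insert/delete with a site-id tie-break, omits the delete/delete corner case, and is not the extended algorithm studied in the paper.

# Minimal character-wise operational transformation sketch: transform one
# operation against another generated concurrently so that both sites
# converge. Delete/delete at the same position is not handled; the site-id
# tie-break is an assumption.

from dataclasses import dataclass

@dataclass
class Op:
    kind: str      # "ins" or "del"
    pos: int
    char: str = ""
    site: int = 0

def transform(op, other):
    """Inclusion transformation: adjust `op` so it applies after `other`."""
    if other.kind == "ins":
        if other.pos < op.pos:
            return Op(op.kind, op.pos + 1, op.char, op.site)
        if other.pos == op.pos and (op.kind == "del" or other.site < op.site):
            return Op(op.kind, op.pos + 1, op.char, op.site)
    elif other.kind == "del":
        if other.pos < op.pos:
            return Op(op.kind, op.pos - 1, op.char, op.site)
    return op

def apply(doc, op):
    if op.kind == "ins":
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]

doc = "abc"
o1 = Op("ins", 1, "X", site=1)     # generated at site 1
o2 = Op("del", 2, site=2)          # generated concurrently at site 2
site1 = apply(apply(doc, o1), transform(o2, o1))
site2 = apply(apply(doc, o2), transform(o1, o2))
print(site1, site2)                # both sites converge to the same document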
addressed in this paper is the issue of email data cleaning for text mining many text mining applications need to take emails as input email data is usually noisy and thus it is necessary to clean it before mining several products offer email cleaning features however the types of noise that can be eliminated are restricted despite the importance of the problem email cleaning has received little attention in the research community thorough and systematic investigation on the issue is thus needed in this paper email cleaning is formalized as problem of non text filtering and text normalization in this way email cleaning becomes independent from any specific text mining processing cascaded approach is proposed which cleans up an email in four passes including non text filtering paragraph normalization sentence normalization and word normalization as far as we know non text filtering and paragraph normalization have not been investigated previously methods for performing the tasks on the basis of support vector machines svm have also been proposed in this paper features in the models have been defined experimental results indicate that the proposed svm based methods can significantly outperform the baseline methods for email cleaning the proposed method has been applied to term extraction typical text mining processing experimental results show that the accuracy of term extraction can be significantly improved by using the data cleaning method
due to the reliance on the textual information associated with an image image search engines on the web lack the discriminative power to deliver visually diverse search results the textual descriptions are key to retrieve relevant results for given user query but at the same time provide little information about the rich image content in this paper we investigate three methods for visual diversification of image search results the methods deploy lightweight clustering techniques in combination with dynamic weighting function of the visual features to best capture the discriminative aspects of the resulting set of images that is retrieved representative image is selected from each cluster which together form diverse result set based on performance evaluation we find that the outcome of the methods closely resembles human perception of diversity which was established in an extensive clustering experiment carried out by human assessors
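A lightweight sketch of the diversification step described above: cluster the retrieved images on weighted visual feature vectors and return one representative per cluster, namely the image closest to its centroid; the plain k-means loop, the feature weights, and the number of clusters are assumptions standing in for the paper's lightweight clustering and dynamic weighting.

# Lightweight sketch of visual diversification: cluster result images on
# weighted visual feature vectors and pick the image closest to each cluster
# centroid as its representative. Weights and k are assumptions.

import random

def weighted(v, w):
    return [x * wi for x, wi in zip(v, w)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def diversify(images, features, weights, k=3):
    points = [weighted(f, weights) for f in features]
    centroids, clusters = kmeans(points, k)
    result = []
    for centroid, cluster in zip(centroids, clusters):
        if not cluster:
            continue
        best = min(cluster, key=lambda p: dist2(p, centroid))
        result.append(images[points.index(best)])
    return result

features = [[0.1, 0.9], [0.15, 0.85], [0.8, 0.2], [0.82, 0.25], [0.5, 0.5]]
images = ["img%d.jpg" % i for i in range(len(features))]
print(diversify(images, features, [1.0, 0.5], k=3))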
inspired by the surprising discovery of several recurring structures in various complex networks in recent years number of related works treated software systems as complex network and found that software systems might expose the small world effects and follow scale free degree distributions different from the research perspectives adopted in these works the work presented in this paper treats software execution processes as an evolving complex network for the first time the concept of software mirror graph is introduced as new model of complex networks to incorporate the dynamic information of software behavior the experimentation paradigm with statistical repeatability was applied to three distinct subject programs to conduct several software experiments the corresponding experimental results are analyzed by treating the software execution processes as an evolving directed topological graph as well as an evolving software mirror graph this results in several new findings while the software execution processes may demonstrate as small world complex network in the topological sense they no longer expose the small world effects in the temporal sense further the degree distributions of the software execution processes may follow power law however they may also follow an exponential function or piecewise power law
we overview the development of first order automated reasoning systems starting from their early years based on the analysis of current and potential applications of such systems we also try to predict new trends in first order automated reasoning our presentation will be centered around two main motives efficiency and usefulness for existing and future potential applications
dynamic slicing algorithms have been considered to aid in debugging for many years however as far as we know no detailed studies on evaluating the benefits of using dynamic slicing for locating real faults present in programs have been carried out in this paper we study the effectiveness of fault location using dynamic slicing for set of real bugs reported in some widely used software programs our results show that of the faults studied faults were captured by data slices required the use of full slices and none of them required the use of relevant slices moreover it was observed that dynamic slicing considerably reduced the subset of program statements that needed to be examined to locate faulty statements interestingly we observed that all of the memory bugs in the faulty versions were captured by data slices the dynamic slices that captured faulty code included to of statements that were executed at least once
video on demand vod service offers large selection of videos from which customers can choose designers of vod systems strive to achieve low access latency for customers one approach that has been investigated by several researchers allows the server to batch clients requesting the same video and to serve clients in the same batch with one multicast video stream this approach has the advantage that it can save server resources as well as server access and network bandwidth thus allowing the server to handle large number of customers without sacrificing access latency vod server replication is another approach that can allow vod service to handle large number of clients albeit at the additional cost of providing more servers while replication is an effective way to increase the service capacity it needs to be coupled with appropriate selection techniques in order to make efficient use of the increased capacity in this paper we investigate the design of server selection techniques for system of replicated batching vod servers we design and evaluate range of selection algorithms as they would be applied to three batching approaches batching with persistent channel allocation patching and hierarchical multicast stream merging hmsm we demonstrate that server replication combined with appropriate server selection scheme can indeed be used to increase the capacity of the service leading to improved performance
this work addresses data warehouse maintenance ie how changes to autonomous heterogeneous and distributed sources should be detected and propagated to warehouse the research community has mainly addressed issues relating to the internal operation of data warehouse servers work related to data warehouse maintenance has received less attention and only limited set of maintenance alternatives are considered while ignoring the autonomy and heterogeneity of sources in this paper we extend work on single source view maintenance to views with multiple heterogeneous sources we present tool pam which allows for comparison of large number of relevant maintenance policies under different configurations based on such analysis and previous studies we propose set of heuristics to guide in policy selection the quality of these heuristics is evaluated empirically using test bed developed for this purpose this is done for number of different criteria and for different data sources and computer systems the performance gained using the policy selected through the heuristics is compared with the performance of all identified policies based on these experiments we claim that heuristic based selections are good
massive data streams are now fundamental to many data processing applications for example internet routers produce large scale diagnostic data streams such streams are rarely stored in traditional databases and instead must be processed on the fly as they are produced similarly sensor networks produce multiple data streams of observations from their sensors there is growing focus on manipulating data streams and hence there is need to identify basic operations of interest in managing data streams and to support them efficiently we propose computation of the hamming norm as basic operation of interest the hamming norm formalises ideas that are used throughout data processing when applied to single stream the hamming norm gives the number of distinct items that are present in that data stream which is statistic of great interest in databases when applied to pair of streams the hamming norm gives an important measure of dissimilarity the number of unequal item counts in the two streams hamming norms have many uses in comparing data streams we present novel approximation technique for estimating the hamming norm for massive data streams this relies on what we call the sketch and we prove its accuracy we test our approximation method on large quantity of synthetic and real stream data and show that the estimation is accurate to within few percentage points
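The sketch below computes exactly (with a dictionary of counters, so without the small-space guarantees of the streaming sketch) the two quantities the approximation targets: the hamming norm of a single stream with insertions and deletions, and the hamming norm of the difference of two streams; the example streams are made up.

# Exact computation of the quantities the streaming sketch approximates:
# the hamming norm of one stream (distinct items with non-zero net count)
# and of the difference of two streams (items whose counts disagree).
# A dictionary of counters stands in for the small-space sketch itself.

from collections import Counter

def hamming_norm(stream):
    """Number of items whose net count in the stream is non-zero.
    Items may appear with negative multiplicities (deletions)."""
    counts = Counter()
    for item, mult in stream:
        counts[item] += mult
    return sum(1 for c in counts.values() if c != 0)

def hamming_distance(stream_a, stream_b):
    """Number of item counts on which two streams disagree: the hamming
    norm of the difference stream a - b."""
    negated_b = [(item, -mult) for item, mult in stream_b]
    return hamming_norm(list(stream_a) + negated_b)

router_a = [("10.0.0.1", 3), ("10.0.0.2", 1), ("10.0.0.3", 2)]
router_b = [("10.0.0.1", 3), ("10.0.0.2", 4)]
print(hamming_norm(router_a))                 # 3 distinct source addresses
print(hamming_distance(router_a, router_b))   # counts differ on 2 items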
we present framework for specification and security analysis of communication protocols for mobile wireless networks this setting introduces new challenges which are not being addressed by classical protocol analysis techniques the main complication stems from the fact that the actions of intermediate nodes and their connectivity can no longer be abstracted into single unstructured adversarial environment as they form an inherent part of the system’s security in order to model this scenario faithfully we present broadcast calculus which makes clear distinction between the protocol processes and the network’s connectivity graph which may change independently from protocol actions we identify property characterising an important aspect of security in this setting and express it using behavioural equivalences of the calculus we complement this approach with control flow analysis which enables us to automatically check this property on given network and attacker specification
characterizing the communication behavior of large scale applications is difficult and costly task due to code system complexity and long execution times while many tools to study this behavior have been developed these approaches either aggregate information in lossy way through high level statistics or produce huge trace files that are hard to handle we contribute an approach that provides orders of magnitude smaller if not near constant size communication traces regardless of the number of nodes while preserving structural information we introduce intra and inter node compression techniques of mpi events that are capable of extracting an application’s communication structure we further present replay mechanism for the traces generated by our approach and discuss results of our implementation for bluegene given this novel capability we discuss its impact on communication tuning and beyond to the best of our knowledge such concise representation of mpi traces in scalable manner combined with deterministic mpi call replay is without any precedent
over the last decade feature creep and the convergence of multiple devices have increased the complexity of both design and use one way to reduce the complexity of device without sacrificing its features is to design the ui consistently however designing consistent user interface of multifunction device often becomes formidable task especially when the logical interaction is concerned this paper presents systematic method for consistent design of user interaction called cuid consistent user interaction design and validates its usefulness through case study cuid focusing on ensuring consistency of logical interaction rather than physical or visual interfaces employs constraint based interactive approach it strives for consistency as the main goal but also considers efficiency and safety of use cuid will reduce the cognitive complexity of the task of interaction design to help produce devices that are easier to learn and use
the web graph is giant social network whose properties have been measured and modeled extensively in recent years most such studies concentrate on the graph structure alone and do not consider textual properties of the nodes consequently web communities have been characterized purely in terms of graph structure and not on page content we propose that topic taxonomy such as yahoo or the open directory provides useful framework for understanding the structure of content based clusters and communities in particular using topic taxonomy and an automatic classifier we can measure the background distribution of broad topics on the web and analyze the capability of recent random walk algorithms to draw samples which follow such distributions in addition we can measure the probability that page about one broad topic will link to another broad topic extending this experiment we can measure how quickly topic context is lost while walking randomly on the web graph estimates of this topic mixing distance may explain why global pagerank is still meaningful in the context of broad queries in general our measurements may prove valuable in the design of community specific crawlers and link based ranking systems
current techniques and tools for automated termination analysis of term rewrite systems trss are already very powerful however they fail for algorithms whose termination is essentially due to an inductive argument therefore we show how to couple the dependency pair method for trs termination with inductive theorem proving as confirmed by the implementation of our new approach in the tool aprove now trs termination techniques are also successful on this important class of algorithms
we present freeform modeling framework for unstructured triangle meshes which is based on constraint shape optimization the goal is to simplify the user interaction even for quite complex freeform or multiresolution modifications the user first sets various boundary constraints to define custom tailored abstract basis function which is adjusted to given design task the actual modification is then controlled by moving one single dof manipulator object the technique can handle arbitrary support regions and piecewise boundary conditions with smoothness ranging continuously from to to more naturally adapt the modification to the shape of the support region the deformed surface can be tuned to bend with anisotropic stiffness we are able to achieve real time response in an interactive design session even for complex meshes by precomputing set of scalar valued basis functions that correspond to the degrees of freedom of the manipulator by which the user controls the modification
balancing peer to peer graphs including zone size distributions has recently become an important topic of peer to peer p2p research to bring analytical understanding into the various peer join mechanisms we study how zone balancing decisions made during the initial sampling of the peer space affect the resulting zone sizes and derive several asymptotic results for the maximum and minimum zone sizes that hold with high probability
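the asymptotic statements about maximum and minimum zone sizes can be sanity-checked empirically; the sketch below (an illustration, not the paper's derivation or join mechanism) samples peer identifiers uniformly on a unit ring and reports how far the largest and smallest zones stray from the average zone size as the number of peers grows.

```python
import random

def zone_sizes(n, seed=0):
    """Sample n peer ids uniformly on a unit ring and return the arc (zone) owned by each peer."""
    rng = random.Random(seed)
    ids = sorted(rng.random() for _ in range(n))
    return [(ids[(i + 1) % n] - ids[i]) % 1.0 for i in range(n)]

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        sizes = zone_sizes(n)
        # ratios of the extreme zones to the average zone size 1/n
        print(f"n={n:>6}  max/avg={max(sizes) * n:6.1f}  min/avg={min(sizes) * n:.4f}")
```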
we explore making virtual desktops behave in more physically realistic manner by adding physics simulation and using piling instead of filing as the fundamental organizational structure objects can be casually dragged and tossed around influenced by physical characteristics such as friction and mass much like we would manipulate lightweight objects in the real world we present prototype called bumptop that coherently integrates variety of interaction and visualization techniques optimized for pen input we have developed to support this new style of desktop organization
in distributed shared memory multiprocessors remote memory references generate processor to memory traffic which may result in bottleneck it is therefore important to design algorithms that minimize the number of remote memory references we establish lower bound of three on remote reference time complexity for mutual exclusion algorithms in model where processes communicate by means of general read modify write primitive that accesses at most one shared variable in one instruction since the general read modify write primitive is generalization of variety of atomic primitives that have been implemented in multiprocessor systems our lower bound holds for all mutual exclusion algorithms that use such primitives furthermore this lower bound is shown to be tight by presenting an algorithm with the matching upper bound
in this experience report we present an evaluation of different techniques to manage concurrency in the context of application servers traditionally using entity beans is considered as the only way to synchronize concurrent access to data in java ee and using mechanism such as synchronized blocks within ejbs is strongly not recommended in our evaluation we consider the use of software transactional memory to enable concurrent accesses to shared data across different session beans we are also comparing our approach with using entity beans and session beans synchronized by global lock
in this paper the problem of optimizing svr automatically for time series forecasting is considered which involves introducing auto adaptive parameters and to depict the non uniform distribution of the information offered by the training data developing multiple kernel function to rescale different attributes of input space optimizing all the parameters involved simultaneously with genetic algorithm and performing feature selection to reduce the redundant information experimental results assess the feasibility of our approach called model optimizing svr or briefly mo svr and demonstrate that our method is promising alternative for time series forecasting
information dissemination is powerful mechanism for finding information in wide area environments an information dissemination server accepts long term user queries collects new documents from information sources matches the documents against the queries and continuously updates the users with relevant information this paper is retrospective of the stanford information filtering service sift system that as of april was processing over worldwide subscriptions and over daily documents the paper describes some of the indexing mechanisms that were developed for sift as well as the evaluations that were conducted to select scheme to implement it also describes the implementation of sift and experimental results for the actual system finally it also discusses and experimentally evaluates techniques for distributing service such as sift for added performance and availability
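a toy sketch of the core matching loop of such a dissemination service (an illustration, not SIFT's actual indexing scheme): long-term subscriptions are indexed by their terms so that each incoming document is matched only against the queries it could possibly satisfy; all class and query names are hypothetical.

```python
from collections import defaultdict

class DisseminationServer:
    def __init__(self):
        self.queries = {}              # query id -> set of required terms
        self.index = defaultdict(set)  # term -> ids of queries containing that term

    def subscribe(self, qid, terms):
        self.queries[qid] = set(terms)
        for t in terms:
            self.index[t].add(qid)

    def match(self, doc_terms):
        """Return subscriptions whose terms all appear in the incoming document."""
        doc = set(doc_terms)
        candidates = set()
        for t in doc:
            candidates |= self.index.get(t, set())
        return [qid for qid in candidates if self.queries[qid] <= doc]

server = DisseminationServer()
server.subscribe("q1", ["information", "filtering"])
server.subscribe("q2", ["database", "replication"])
print(server.match("a survey of information filtering and dissemination".split()))
```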
with the diverse new capabilities that sensor and ad hoc networks can provide applicability of data aggregation is growing data aggregation is useful in dealing with multi value domain information which often requires approximate agreement decisions among nodes in contrast to fully connected networks the research on data aggregation for partially connected networks is very limited this is due to the complexity of formal proofs and the fact that node may not have global view of the entire network which makes it difficult to attain the convergence properties the complexity of the problem is compounded in the presence of message dropouts faults and orchestrated attacks by exploiting the properties of discrete markov chains this study investigates the data aggregation problem for partially connected networks to obtain the number of rounds of message exchanges needed to reach network convergence the average convergence rate in round of message exchange and the number of rounds required to reach stationary convergence
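as an illustration of the kind of convergence question studied here (not the paper's Markov-chain analysis itself), the sketch below runs iterative neighbour averaging on a small partially connected topology and counts the rounds of message exchange needed before all node values agree within a tolerance; topology and tolerance are made up.

```python
def aggregation_rounds(adjacency, values, tol=1e-3, max_rounds=10_000):
    """Each round, every node replaces its value with the mean of its own and its
    neighbours' values; return the number of rounds until the spread falls below tol."""
    v = list(values)
    for r in range(1, max_rounds + 1):
        v = [(v[i] + sum(v[j] for j in adjacency[i])) / (1 + len(adjacency[i]))
             for i in range(len(v))]
        if max(v) - min(v) < tol:
            return r
    return max_rounds

# a 6-node partially connected ring with one chord
ring = {0: [1, 5], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5], 5: [4, 0, 2]}
print(aggregation_rounds(ring, [0, 10, 20, 30, 40, 50]))
```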
the purpose of this talk is to provide comprehensive state of the art concerning the evolution of query optimization methods from centralized database systems to data grid systems through parallel distributed and data integration systems for each environment we try to describe synthetically some methods and point out their main characteristics
recently the network attached secure disk nasd model has become more widely used technique for constructing large scale storage systems however the security system proposed for nasd assumes that each client will contact the server to get capability to access one object on server while this approach works well in smaller scale systems in which each file is composed of few objects it fails for large scale systems in which thousands of clients make accesses to single file composed of thousands of objects spread across thousands of disks the file system we are building ceph distributes files across many objects and disks to distribute load and improve reliability in such system the metadata server cluster will sometimes see thousands of open requests for the same file within seconds to address this bottleneck we propose new authentication protocols for object based storage systems in which sequence of fixed size objects comprise file and flash crowds are likely we qualitatively evaluated the security and risks of each protocol and using traces of scientific application compared the overhead of each protocol we found that surprisingly protocol using public key cryptography incurred little extra cost while providing greater security than protocol using only symmetric key cryptography
this paper presents flickr distance which is novel measurement of the relationship between semantic concepts objects scenes in visual domain for each concept collection of images are obtained from flickr based on which the improved latent topic based visual language model is built to capture the visual characteristic of this concept then flickr distance between different concepts is measured by the square root of jensen shannon js divergence between the corresponding visual language models comparing with wordnet flickr distance is able to handle far more concepts existing on the web and it can scale up with the increase of concept vocabularies comparing with google distance which is generated in textual domain flickr distance is more precise for visual domain concepts as it captures the visual relationship between the concepts instead of their co occurrence in text search results besides unlike google distance flickr distance satisfies triangular inequality which makes it more reasonable distance metric both subjective user study and objective evaluation show that flickr distance is more coherent to human perception than google distance we also design several application scenarios such as concept clustering and image annotation to demonstrate the effectiveness of this proposed distance in image related applications
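the distance itself reduces to the square root of the jensen shannon divergence between two concept models; the sketch below computes that quantity for hypothetical visual-word distributions (the visual language models require the image corpus and are not reproduced here). taking the square root of the js divergence is what gives the measure the triangular-inequality property mentioned above.

```python
from math import log, sqrt

def kl(p, q):
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def flickr_style_distance(p, q):
    """Square root of the JS divergence between two concept models (a proper metric)."""
    return sqrt(js_divergence(p, q))

cat   = [0.50, 0.30, 0.15, 0.05]   # hypothetical visual-word distributions
tiger = [0.45, 0.35, 0.15, 0.05]
car   = [0.05, 0.10, 0.25, 0.60]
print(flickr_style_distance(cat, tiger), flickr_style_distance(cat, car))
```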
lifetime is very important to wireless sensor networks since most sensors are equipped with non rechargeable batteries therefore energy and delay are critical issues for the research of sensor networks that have limited lifetime due to the uncertainties in execution time of some tasks this paper models each varied execution time as probabilistic random variable with the consideration of applications performance requirements to solve the map mode assignment with probability problem using probabilistic design we propose an optimal algorithm to minimize the total energy consumption while satisfying the timing constraint with guaranteed confidence probability the experimental results show that our approach achieves significant energy saving than previous work for example our algorithm achieves an average improvement of on total energy consumption
several current support systems for travel and tourism are aimed at providing information in personalized manner taking users interests and preferences into account in this vein personalized systems observe users behavior and based thereon make generalizations and predictions about them this article describes user modeling server that offers services to personalized systems with regard to the analysis of user actions the representation of assumptions about the user and the inference of additional assumptions based on domain knowledge and characteristics of similar users the system is open and compliant with major standards allowing it to be easily accessed by clients that need personalization services
we present novel rendering system for defocus blur and lens effects it supports physically based rendering and outperforms previous approaches by involving novel gpu based tracing method our solution achieves more precision than competing real time solutions and our results are mostly indistinguishable from offline rendering our method is also more general and can integrate advanced simulations such as simple geometric lens models enabling various lens aberration effects the latter are crucial for realism but are often employed in artistic contexts too we show that available artistic lenses can be simulated by our method in this spirit our work introduces an intuitive control over depth of field effects the physical basis is crucial as starting point to enable new artistic renderings based on generalized focal surface to emphasize particular elements in the scene while retaining realistic look our real time solution provides realistic as well as plausible expressive results
miniaturization of devices and the ensuing decrease in the threshold voltage has led to substantial increase in the leakage component of the total processor energy consumption relatively simpler issue logic and the presence of large number of function units in the vliw and the clustered vliw architectures attribute large fraction of this leakage energy consumption in the functional units however functional units are not fully utilized in the vliw architectures because of the inherent variations in the ilp of the programs this underutilization is even more pronounced in the context of clustered vliw architectures because of the contentions for the limited number of slow intercluster communication channels which lead to many short idle cycles in the past some architectural schemes have been proposed to obtain leakage energy benefits by aggressively exploiting the idleness of functional units however presence of many short idle cycles cause frequent transitions from the active mode to the sleep mode and vice versa and adversely affects the energy benefits of purely hardware based scheme in this paper we propose and evaluate compiler instruction scheduling algorithm that assists such hardware based scheme in the context of vliw and clustered vliw architectures the proposed scheme exploits the scheduling slacks of instructions to orchestrate the functional unit mapping with the objective of reducing the number of transitions in functional units thereby keeping them off for longer duration the proposed compiler assisted scheme obtains further reduction of energy consumption of functional units with negligible performance degradation over hardware only scheme for vliw architecture the benefits are and in the context of clustered and clustered vliw architecture respectively our test bed uses the trimaran compiler infrastructure
despite well documented advantages attempts to go truly paperless seldom succeed this is principally because computer based paperless systems typically do not support all of the affordances of paper nor the work process that have evolved with paper based systems we suggest that attention to users work environments activities and practices are critical to the success of paperless systems this paper describes the development and effective utilization of software tool for the paperless marking of student assignments which does not require users to compromise on established best practice it includes significant advance in the task management support
clustering decisions frequently arise in business applications such as recommendations concerning products markets human resources etc currently decision makers must analyze diverse algorithms and parameters on an individual basis in order to establish preferences on the decision making issues they face because there is no supportive model or tool which enables comparing different result clusters generated by these algorithms and parameters combinations the multi algorithm voting mav methodology enables not only visualization of results produced by diverse clustering algorithms but also provides quantitative analysis of the results the current research applies mav methodology to the case of recommending new car pricing the findings illustrate the impact and the benefits of such decision support system
master worker paradigm for executing large scale parallel discrete event simulation programs over network enabled computational resources is proposed and evaluated in contrast to conventional approaches to parallel simulation client server architecture is proposed where clients workers repeatedly download state vectors of logical processes and associated message data from server master perform simulation computations locally at the client and then return the results back to the server this process offers several potential advantages over conventional parallel discrete event simulation systems including support for execution over heterogeneous distributed computing platforms load balancing efficient execution on shared platforms easy addition or removal of client machines during program execution simpler fault tolerance and improved portability prototype implementation called the aurora parallel and distributed simulation system aurora is described the structure and interaction of the aurora components is described results of an experimental performance evaluation are presented detailing primitive timings and application performance on both dedicated and shared computing platforms
most real world databases contain substantial amounts of time referenced or temporal data recent advances in temporal query languages show that such database applications may benefit substantially from built in temporal support in the dbms to achieve this temporal query representation optimization and processing mechanisms must be provided this paper presents foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional dbms architectures and ones where the temporal support is obtained via layer on top of conventional dbms this foundation captures duplicates and ordering for all queries as well as coalescing for temporal queries thus generalizing all existing approaches known to the authors it includes temporally extended relational algebra to which sql and temporal sql queries may be mapped six types of algebraic equivalences concrete query transformation rules that obey different equivalences procedure for determining which types of transformation rules are applicable for optimizing query and query plan enumeration algorithm the presented approach partitions the work required by the database implementor to develop provably correct query optimizer into four stages the database implementor has to specify operations formally design and prove correct appropriate transformation rules that satisfy any of the six equivalence types augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly and ensure that the mapping generates correct initial query plan
queries over sets of complex elements are performed extracting features from each element which are used in place of the real ones during the processing extracting large number of significant features increases the representative power of the feature vector and improves the query precision however each feature is dimension in the representation space consequently handling more features worsen the dimensionality curse the problem derives from the fact that the elements tends to distribute all over the space and large dimensionality allows them to spread over much broader spaces therefore in high dimensional spaces elements are frequently farther from each other so the distance differences among pairs of elements tends to homogenize when searching for nearest neighbors the first one is usually not close but as long as one is found small increases in the query radius tend to include several others this effect increases the overlap between nodes in access methods indexing the dataset both spatial and metric access methods are sensitive to the problem this paper presents general strategy applicable to metric access methods in general improving the performance of similarity queries in high dimensional spaces our technique applies function that stretches the distances thus close objects become closer and far ones become even farther experiments using the metric access method slim tree show that similarity queries performed in the transformed spaces demands up to less distance calculations less disk access and reduces up to in total time when comparing with the original spaces
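the abstract does not give the concrete stretching function, so the sketch below uses a hypothetical monotone stretch d**p on normalized distances purely to illustrate the intended effect: the ordering of neighbors is unchanged, but the contrast between near and far objects grows, which is what reduces overlap between index nodes during similarity search.

```python
def stretch(d, p=3.0):
    """Monotone stretching of a normalized distance in [0, 1]; small distances shrink
    far more than large ones, so the contrast between near and far objects grows."""
    return d ** p

pairs = [0.05, 0.10, 0.60, 0.70, 0.90]          # normalized distances to a query
print([round(stretch(d), 4) for d in pairs])    # [0.0001, 0.001, 0.216, 0.343, 0.729]
print(stretch(0.10) / stretch(0.05),            # near/near ratio grows from 2x to 8x
      stretch(0.90) / stretch(0.10))            # far/near ratio grows from 9x to 729x
```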
differential calculus for first order logic is developed to enforce database integrity formal differentiation of first order sentences is useful in maintaining database integrity since once database constraint is expressed as first order sentence its derivative with respect to transaction provides the necessary and sufficient condition for maintaining integrity the derivative is often much simpler to test than the original constraint since it maintains integrity differentially by assuming integrity before the transaction and testing only for new violations the formal differentiation requires no resolution search but only substitution it is more efficient than resolution based approaches and it provides considerably more general solution than previous substitution based methods since it is valid for all first order sentences and with all transactions involving arbitrary collections of atomic changes to the database it also produces large number of sufficient conditions that are often less strict than those of the previous approaches and it can be extended to accommodate many dynamic constraints
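a toy rendition of the differential idea in Python (the relation, constraint and field names are made up): assuming the constraint held before the transaction, an insert only requires re-checking the conditions that mention the new tuple rather than re-evaluating the whole constraint over the database.

```python
# hypothetical constraint: every employee earns less than their manager
employees = {"ann": {"salary": 90, "manager": None},
             "bob": {"salary": 60, "manager": "ann"}}

def full_check(db):
    return all(rec["manager"] is None or rec["salary"] < db[rec["manager"]]["salary"]
               for rec in db.values())

def differential_check(db, name, new_rec):
    """Derivative of the constraint w.r.t. an insert: assuming integrity held before,
    only conditions touching the new tuple can introduce a violation."""
    ok_as_subordinate = (new_rec["manager"] is None
                         or new_rec["salary"] < db[new_rec["manager"]]["salary"])
    ok_as_manager = all(rec["salary"] < new_rec["salary"]
                        for rec in db.values() if rec["manager"] == name)
    return ok_as_subordinate and ok_as_manager

new = {"salary": 70, "manager": "ann"}
print(differential_check(employees, "carol", new))  # cheaper than re-checking everything
employees["carol"] = new
print(full_check(employees))                         # the full check agrees
```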
in this paper we propose two languages called future temporal logic ftl and past temporal logic ptl for specifying temporal triggers some examples of trigger conditions that can be specified in our language are the following the value of certain attribute increases by more than in minutes tuple that satisfies certain predicate is added to the database at least minutes before another tuple satisfying different condition is added to the database such triggers are important for monitor and control applicationsin addition to the languages we present algorithms for processing the trigger conditions specified in these languages namely procedures for determining when the trigger conditions are satisfied these methods can be added as temporal component to an existing database management systems preliminary prototype of the temporal component that uses the ftl language has been built on top of sybase running on sun workstations
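a minimal sketch of evaluating one trigger condition of the kind quoted above, "the value of a certain attribute increases by more than delta within a window of minutes", over a timestamped stream; the threshold values are hypothetical and the actual ftl and ptl languages are far more general than this single condition.

```python
from collections import deque

def make_increase_trigger(delta, window_minutes):
    """Fire when the monitored attribute rises by more than `delta` within `window_minutes`."""
    history = deque()                      # (timestamp_minutes, value) pairs inside the window

    def on_update(t, value):
        history.append((t, value))
        while history and history[0][0] < t - window_minutes:
            history.popleft()
        lowest = min(v for _, v in history)
        return value - lowest > delta      # True means the trigger condition is satisfied

    return on_update

trigger = make_increase_trigger(delta=10, window_minutes=30)
for t, v in [(0, 100), (10, 104), (25, 109), (35, 118)]:
    print(t, trigger(t, v))                # fires only at t=35
```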
theoretical and technological progress has revived the interest in the design of services for the support of co located human human communication and collaboration witnessing the start of several large scale projects over the last few years most of these projects focus on meetings and or lecture situations however user centred design and evaluation frameworks for co located communication and collaboration are major concern in this paper we summarise the prevalent approaches towards user centred design and evaluation and we develop two different services in one service participants in small group meeting receive real time feedback about observable properties of the meeting that are directly related to the social dynamics such as individual amount of speaking time or eye gaze patterns in the other service teachers in classroom receive real time feedback about the activities and attention level of participants in the lecture we also propose ways to address the different dimensions that are relevant to the design and evaluation of these services the individual the social and the organisational dimension bringing together methods from different disciplines
the specification of an information system should include description of structural system aspects as well as description of the system behavior in this article we show how this can be achieved by high level petri nets namely the so called nr nets nested relation transition nets in nr nets the structural part is modeled by nested relations and the behavioral part is modeled by novel petri net formalism each place of net represents nested relation scheme and the marking of each place is given as nested relation of the respective type insert and delete operations in nested relational database nf database are expressed by transitions in net these operations may operate not only on whole tuples of given relation but also on subtuples of existing tuples the arcs of net are inscribed with so called filter tables which allow together with an optional logical expression as transition inscription conditions to be formulated on the specified sub tuples the occurrence rule for nr net transitions is defined by the operations union intersection and negative in lattices of nested relations the structure of an nr net together with the occurrence rule defines classes of possible information system procedures ie sequences of possibly concurrent operations in an information system
tabletop systems are currently being focused on and many applications using these systems are being developed in such tabletop systems how to recognize real objects on the table is an essential and important issue in existing tabletop systems markers have been often used however their black and white pattern which means nothing to humans spoils the appearance of the object we developed transparent markers on liquid crystal display lcd tabletop system by using the polarization features of the lcd and optical films in particular through experiments with various kinds of optical films we found that two halfwave plates make the markers rotation invariant by using the transparent markers tangible transparent magic lenses tm applications were developed
text sentiment analysis also referred to as emotional polarity computation has become flourishing frontier in the text mining community this paper studies online forums hotspot detection and forecast using sentiment analysis and text mining approaches first we create an algorithm to automatically analyze the emotional polarity of text and to obtain value for each piece of text second this algorithm is combined with means clustering and support vector machine svm to develop unsupervised text mining approach we use the proposed text mining approach to group the forums into various clusters with the center of each representing hotspot forum within the current time span the data sets used in our empirical studies are acquired and formatted from sina sports forums which spans range of different topic forums and posts experimental results demonstrate that svm forecasting achieves highly consistent results with means clustering the top hotspot forums listed by svm forecasting resembles of means clustering results both svm and means achieve the same results for the top hotspot forums of the year
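a compressed sketch of the described pipeline using scikit-learn, under heavy assumptions: per-forum sentiment and activity features are synthetic and taken as given, k-means labels the most active cluster as the hotspot group, and an SVM is then fit to forecast that label for the next window; this outlines the approach only and is not the paper's data, features, or parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # [mean polarity, post volume, reply rate]
X[:40] += np.array([2.5, 3.0, 2.0])           # a block of unusually "hot" forums

# unsupervised step: cluster forums, call the most active cluster the hotspot group
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
hot_cluster = int(np.argmax([X[km.labels_ == k, 1].mean() for k in range(2)]))
y = (km.labels_ == hot_cluster).astype(int)

# supervised step: train an SVM on this window to forecast hotspot membership
clf = SVC(kernel="rbf").fit(X, y)
X_next = rng.normal(size=(20, 3)) + np.array([2.5, 3.0, 2.0])
print(clf.predict(X_next))                    # mostly 1s: forecast as hotspot forums
```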
semi supervised support vector machine svm attempts to learn decision boundary that traverses through low data density regions by maximizing the margin over labeled and unlabeled examples traditionally svm is formulated as non convex integer programming problem and is thus difficult to solve in this paper we propose the cutting plane semi supervised support vector machine cutsvm algorithm to solve the svm problem specifically we construct nested sequence of successively tighter relaxations of the original svm problem and each optimization problem in this sequence could be efficiently solved using the constrained concave convex procedure cccp moreover we prove theoretically that the cutsvm algorithm takes time sn to converge with guaranteed accuracy where is the total number of samples in the dataset and is the average number of non zero features ie the sparsity experimental evaluations on several real world datasets show that cutsvm performs better than existing svm methods both in efficiency and accuracy
this paper explores the embodied interactional ways in which people naturally collaborate around and share collections of photographs we employ ethnographic studies of paper based photograph use to consider requirements for distributed collaboration around digital photographs distributed sharing is currently limited to the passing on of photographs to others by email webpages or mobile phones to move beyond this fundamental challenge for photoware consists of developing support for the practical achievement of sharing at distance specifically this entails augmenting the natural production of accounts or photo talk to support the distributed achievement of sharing
in this paper we consider the problem of maximizing wireless network capacity aka one shot scheduling in both the protocol and physical models we give the first distributed algorithms with provable guarantees in the physical model and show how they can be generalized to more complicated metrics and settings in which the physical assumptions are slightly violated we also give the first algorithms in the protocol model that do not assume transmitters can coordinate with their neighbors in the interference graph so every transmitter chooses whether to broadcast based purely on local events our techniques draw heavily from algorithmic game theory and machine learning theory even though our goal is distributed algorithm indeed our main results allow every transmitter to run any algorithm it wants so long as its algorithm has learning theoretic property known as no regret in game theoretic setting
in previous paper we introduced hybrid scalable approach for data gathering and dissemination in sensor networks we called on board data dissemination obdd the approach was validated and implemented on well structured network topologies where virtual boards carry the queries are constructed guided by well defined trajectories in this paper we extend our previous work to include networks with irregular topologies in such networks boards guided by well defined trajectories are no longer enough to construct the dissemination structure we have adapted and modified an underlying topology detection algorithm to fit our protocol the resulting topology based on board data dissemination tobdd efficiently works with any network topology it also maintains the desirable features of obdd although the approach adapts both pull and push strategies it synchronizes both phases to maintain the communications mainly on demand and to prohibit problems that could be inherited from the push strategy both analysis and simulation show that tobdd promises an efficient and scalable paradigm for data gathering and dissemination in sensor networks that reduces the communication cost and leads to load balancing
several recent studies have introduced lightweight versions of java reduced languages in which complex features like threads and reflection are dropped to enable rigorous arguments about key properties such as type safety we carry this process step further omitting almost all features of the full language including interfaces and even assignment to obtain small calculus featherweight java for which rigorous proofs are not only possible but easy featherweight java bears similar relation to java as the lambda calculus does to languages such as ml and haskell it offers similar computational feel providing classes methods fields inheritance and dynamic typecasts with semantics closely following java’s proof of type safety for featherweight java thus illustrates many of the interesting features of safety proof for the full language while remaining pleasingly compact the minimal syntax typing rules and operational semantics of featherweight java make it handy tool for studying the consequences of extensions and variations as an illustration of its utility in this regard we extend featherweight java with generic classes in the style of gj bracha odersky stoutamire and wadler and give detailed proof of type safety the extended system formalizes for the first time some of the key features of gj
memory analysis techniques have become sophisticated enough to model with high degree of accuracy the manipulation of simple memory structures finite structures single double linked lists and trees however modern programming languages provide extensive library support including wide range of generic collection objects that make use of complex internal data structures while these data structures ensure that the collections are efficient often these representations cannot be effectively modeled by existing methods either due to excessive analysis runtime or due to the inability to represent the required information this paper presents method to represent collections using an abstraction of their semantics the construction of the abstract semantics for the collection objects is done in manner that allows individual elements in the collections to be identified our construction also supports iterators over the collections and is able to model the position of the iterators with respect to the elements in the collection by ordering the contents of the collection based on the iterator position the model can represent notion of progress when iteratively manipulating the contents of collection these features allow strong updates to the individual elements in the collection as well as strong updates over the collections themselves
the paper presents problems pertaining to spatial data mining based on the existing solutions new method of knowledge extraction in the form of spatial association rules and collocations has been worked out and is proposed herein delaunay diagram is used for determining neighborhoods based on the neighborhood notion spatial association rules and collocations are defined novel algorithm for finding spatial rules and collocations has been presented the approach allows eliminating the parameters defining neighborhood of objects thus avoiding multiple test and trial repetitions of the process of mining for various parameter values the presented method has been implemented and tested the results of the experiments have been discussed
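a small sketch of the neighborhood step described above: the Delaunay triangulation (here via scipy) induces a parameter-free neighbor relation between spatial objects, which is what removes the distance-threshold parameters from the subsequent association-rule and collocation mining; the mining itself and the sample coordinates are omitted or made up.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_neighbors(points):
    """Neighborhood relation induced by the Delaunay triangulation: two objects are
    neighbors iff they share an edge of some triangle, with no distance threshold."""
    tri = Delaunay(points)
    neighbors = {i: set() for i in range(len(points))}
    for simplex in tri.simplices:
        for a in simplex:
            for b in simplex:
                if a != b:
                    neighbors[int(a)].add(int(b))
    return neighbors

pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [3, 3]])
print(delaunay_neighbors(pts))
```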
given that contiguous reads and writes between cache and disk outperform fragmented reads and writes fragmented reads and writes are forcefully transformed into contiguous reads and writes via proposed matrix stripe cache based contiguity transform msc ct method which employs rule of consistency for data integrity at the block level and rule of performance that ensures no performance degradation msc ct performs for reads and writes both of which are produced by write requests from host as write request from host employs reads for parity update and writes to disks in redundant array of independent disks raid msc ct is compatible with existing disk technologies the proposed implementation in linux kernel delivers peak throughput that is times higher than case without msc ct on representative workloads the results demonstrate that msc ct is extremely simple to implement has low overhead and is ideally suited for raid controllers not only for random writes but also for sequential writes in various realistic scenarios
although unit tests are recognized as an important tool in software development programmers prefer to write code rather than unit tests despite the emergence of tools like junit which automate part of the process unit testing remains time consuming resource intensive and not particularly appealing activity this paper introduces new development method called contract driven development this development method is based on novel mechanism that extracts test cases from failure producing runs that the programmers trigger it exploits actions that developers perform anyway as part of their normal process of writing code thus it takes the task of writing unit tests off the developers shoulders while still taking advantage of their knowledge of the intended semantics and structure of the code the approach is based on the presence of contracts in code which act as the oracle of the test cases the test cases are extracted completely automatically are run in the background and can easily be maintained over versions the tool implementing this methodology is called cdd and is available both in binary and in source form
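a toy Python rendition of the mechanism (cdd itself works with contract-equipped code in another language and tooling): contracts serve as the oracle, and any failure-producing call the developer happens to trigger is captured so it can be replayed later as a unit test; every name below is hypothetical.

```python
import functools, json

captured_cases = []                 # failure-producing runs, kept as future test cases

def contract(pre, post):
    """Wrap a function with a precondition/postcondition oracle; record violating calls."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if not pre(*args):
                captured_cases.append({"function": fn.__name__, "args": args, "kind": "precondition"})
                raise AssertionError(f"precondition of {fn.__name__} violated for {args}")
            result = fn(*args)
            if not post(result, *args):
                captured_cases.append({"function": fn.__name__, "args": args, "kind": "postcondition"})
                raise AssertionError(f"postcondition of {fn.__name__} violated for {args}")
            return result
        return wrapper
    return decorate

@contract(pre=lambda x: x >= 0, post=lambda r, x: abs(r * r - x) < 1e-6)
def my_sqrt(x):
    return x ** 0.5

try:
    my_sqrt(-4)                     # a failure the developer triggers while coding
except AssertionError:
    pass
print(json.dumps(captured_cases))   # replayable later as regression tests
```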
as the width of the processor grows complexity of register file rf with multiple ports grows more than linearly and leads to larger register access time and higher power consumption analysis of spec programs reveals that only small portion of the instructions in program in integer and in floating point require both the source operands also when the programs are executed in an wide processor only very few two or less two source instructions are executed in cycle for significant portion of time more than for integer and for floating point leading to significant under utilization of register port bandwidth in this paper we propose novel technique to significantly reduce the number of register ports with very minor modification in the select logic to issue only limited number of two source instructions each cycle this is achieved with no significant impact on processor’s overall performance the novelty of the technique is that it is easy to implement and succeeds in reducing the access time power and area of the register file without aggravating these factors in any other logic on the chip with this technique in an wide processor as compared to conventional entry rf with read ports for integer programs register file can be designed with or read ports as these configurations result in instructions per cycle ipc degradation of only and respectively this significantly low degradation in ipc is achieved while reducing the register access time by and respectively and reducing power by and respectively for fp programs register file can be designed with read ports ipc loss less access time and less power or with read ports ipc loss less access time and less power the paper analyzes the performance of all the possible flavors of the proposed technique for register file in both wide and wide processors and presents choice of the performance and register port complexity combination to the designer
in this paper we propose new and general preprocessor algorithm called csroulette which converts any cost insensitive classification algorithms into cost sensitive ones csroulette is based on cost proportional roulette sampling technique called cprs in short csroulette is closely related to costing another cost sensitive meta learning algorithm which is based on rejection sampling unlike rejection sampling which produces smaller samples cprs can generate different size samples to further improve its performance we apply ensemble bagging on cprs the resulting algorithm is called csroulette our experiments show that csroulette outperforms costing and other meta learning methods in most datasets tested in addition we investigate the effect of various sample sizes and conclude that reduced sample sizes as in rejection sampling cannot be compensated by increasing the number of bagging iterations
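a minimal sketch of cost-proportional roulette sampling as described above: each example enters a sample with probability proportional to its misclassification cost, drawn with replacement so that, unlike rejection sampling, the sample size is a free parameter; bagging then repeats the draw over several such samples. the data and cost values below are made up.

```python
import random
from collections import Counter

def cprs_sample(examples, costs, size, rng):
    """Cost-proportional roulette sampling: draw `size` examples with replacement,
    each picked with probability proportional to its misclassification cost."""
    return rng.choices(examples, weights=costs, k=size)

rng = random.Random(0)
examples = [("x%d" % i, "rare" if i < 10 else "common") for i in range(100)]
costs = [10.0 if label == "rare" else 1.0 for _, label in examples]

# unlike rejection sampling, the sample size is chosen freely here
bags = [cprs_sample(examples, costs, size=200, rng=rng) for _ in range(5)]
for bag in bags:
    print(Counter(label for _, label in bag))   # the costly rare class is over-represented
```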
this paper presents physics based method for creating complex multi character motions from short single character sequences we represent multi character motion synthesis as spacetime optimization problem where constraints represent the desired character interactions we extend standard spacetime optimization with novel timewarp parameterization in order to jointly optimize the motion and the interaction constraints in addition we present an optimization algorithm based on block coordinate descent and continuations that can be used to solve large problems multiple characters usually generate this framework allows us to synthesize multi character motion drastically different from the input motion consequently small set of input motion dataset is sufficient to express wide variety of multi character motions
when web user’s underlying information need is not clearly specified from the initial query an effective approach is to diversify the results retrieved for this query in this paper we introduce novel probabilistic framework for web search result diversification which explicitly accounts for the various aspects associated to an underspecified query in particular we diversify document ranking by estimating how well given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as whole we thoroughly evaluate our framework in the context of the diversity task of the trec web track moreover we exploit query reformulations provided by three major web search engines wses as means to uncover different query aspects the results attest the effectiveness of our framework when compared to state of the art diversification approaches in the literature additionally by simulating an upper bound query reformulation mechanism from official trec data we draw useful insights regarding the effectiveness of the query reformulations generated by the different wses in promoting diversity
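a sketch of the general shape of explicit aspect-based diversification, consistent with the description above but not the paper's exact probability estimates: documents are greedily added to the ranking according to a mix of their relevance and how well they cover query aspects that the ranking so far has left uncovered; the documents, aspects, and weights are hypothetical.

```python
def diversify(candidates, aspect_weights, k, trade_off=0.5):
    """Greedy aspect-aware re-ranking: at each step pick the document that mixes its
    relevance with its coverage of aspects not yet satisfied by the current ranking.
    candidates: {doc: {"rel": float, "aspects": {aspect: P(doc satisfies aspect)}}}"""
    selected, not_covered, pool = [], {a: 1.0 for a in aspect_weights}, dict(candidates)
    while pool and len(selected) < k:
        def score(doc):
            diversity = sum(aspect_weights[a] * pool[doc]["aspects"].get(a, 0.0) * not_covered[a]
                            for a in aspect_weights)
            return (1 - trade_off) * pool[doc]["rel"] + trade_off * diversity
        best = max(pool, key=score)
        for a in aspect_weights:
            not_covered[a] *= 1.0 - pool[best]["aspects"].get(a, 0.0)
        selected.append(best)
        del pool[best]
    return selected

docs = {
    "d1": {"rel": 0.9, "aspects": {"python language": 0.9, "python snake": 0.0}},
    "d2": {"rel": 0.8, "aspects": {"python language": 0.8, "python snake": 0.1}},
    "d3": {"rel": 0.5, "aspects": {"python language": 0.0, "python snake": 0.9}},
}
print(diversify(docs, {"python language": 0.6, "python snake": 0.4}, k=2, trade_off=0.7))
# ['d1', 'd3']: the uncovered snake aspect displaces the second language page
```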
we consider the problem to infer concise document type definition dtd for given set of xml documents problem which basically reduces to learning of concise regular expressions from positive example strings we identify two such classes single occurrence regular expressions sores and chain regular expressions chares both classes capture the far majority of the regular expressions occurring in practical dtds and are succinct by definition we present the algorithm idtd infer dtd that learns sores from strings by first inferring an automaton by known techniques and then translating that automaton to corresponding sore possibly by repairing the automaton when no equivalent sore can be found in the process we introduce novel automaton to regular expression rewrite technique which is of independent interest we show that idtd outperforms existing systems in accuracy conciseness and speed in scenario where only very small amount of xml data is available for instance when generated by web service requests or by answers to queries idtd produces regular expressions which are too specific therefore we introduce novel learning algorithm crx that directly infers chares which form subclass of sores without going through an automaton representation we show that crx performs very well within its target class on very small data sets finally we discuss incremental computation noise numerical predicates and the generation of xml schemas
in this paper we investigate future topics and challenges of interaction and user experience in multimedia we bring together different perspectives from overlapping fields of research such as multimedia human computer interaction information retrieval networked multimedia and creative arts based on potential intersections we define three application domains to be investigated further as they create high demand and good prospect for long lasting developments in the future these application domains are media working environments media enter edutainment and social media engagement each application domain is analyzed along five dimensions namely information quality presentation quality ambience interactivity and user expectations based on this analysis we identify the most pressing research questions and key challenges for each area finally we advocate user centered approach to tackle these challenges and questions in order to develop relevant multimedia applications that best meet the users expectations
this paper presents the design and implementation of compiler that translates programs written in type safe subset of the programming language into highly optimized dec alpha assembly language programs and certifier that automatically checks the type safety and memory safety of any assembly language program produced by the compiler the result of the certifier is either formal proof of type safety or counterexample pointing to potential violation of the type system by the target program the ensemble of the compiler and the certifier is called certifying compiler several advantages of certifying compilation over previous approaches can be claimed the notion of certifying compiler is significantly easier to employ than formal compiler verification in part because it is generally easier to verify the correctness of the result of computation than to prove the correctness of the computation itself also the approach can be applied even to highly optimizing compilers as demonstrated by the fact that our compiler generates target code for range of realistic programs which is competitive with both the cc and gcc compilers with all optimizations enabled the certifier also drastically improves the effectiveness of compiler testing because for each test case it statically signals compilation errors that might otherwise require many executions to detect finally this approach is practical way to produce the safety proofs for proof carrying code system and thus may be useful in system for safe mobile code
this paper describes the development and use of set of design representations of moving bodies in the design of bystander multi user interactive immersive artwork built on video based motion sensing technology we extended the traditional user centred design tools of personas and scenarios to explicitly address human movement characteristics embedded in social interaction set of corresponding movement schemas in labanotation was constructed to visually represent the spatial and social interaction of multiple users over time together these three design representations of moving bodies were used to enable the design team to work with the aspects of human movement relevant to bystander and to ensure that the system could respond in coherent and robust manner to the shifting configurations of visitors in the space they also supported two experiential methods of design reflection in action enactment and immersion that were vital for grounding designers understandings of the specific interactive nature of the work in their own sensing feeling and moving bodies
establishing pairwise keys for each pair of neighboring sensors is the first concern in securing communication in sensor networks this task is challenging because resources are limited several random key predistribution schemes have been proposed but they are appropriate only when sensors are uniformly distributed with high density these schemes also suffer from dramatic degradation of security when the number of compromised sensors exceeds threshold in this paper we present group based key predistribution scheme gke which enables any pair of neighboring sensors to establish unique pairwise key regardless of sensor density or distribution since pairwise keys are unique security in gke degrades gracefully as the number of compromised nodes increases in addition gke is very efficient since it requires only localized communication to establish pairwise keys thus significantly reducing the communication overhead our security analysis and performance evaluation illustrate the superiority of gke in terms of resilience connectivity communication overhead and memory requirement
we present particle based method for viscoelastic fluids simulation in the method based on the traditional navier stokes equation an additional elastic stress term is introduced to achieve viscoelastic flow behaviors which have both fluid and solid features benefiting from the lagrangian nature of smoothed particle hydrodynamics large flow deformation can be handled more easily and naturally and also by changing the viscosity and elastic stress coefficient of the particles according to the temperature variation the melting and flowing phenomena such as lava flow and wax melting are achieved the temperature evolution is determined with the heat diffusion equation the method is effective and efficient and has good controllability different kinds of viscoelastic fluid behaviors can be obtained easily by adjusting the very few experimental parameters
to address business requirements and to survive in competing markets companies or open source organizations often have to release different versions of their projects in different languages manually migrating projects from one language to another such as from java to is tedious and error prone task to reduce manual effort or human errors tools can be developed for automatic migration of projects from one language to another however these tools require the knowledge of how application programming interfaces apis of one language are mapped to apis of the other language referred to as api mapping relations in this paper we propose novel approach called mam mining api mapping that mines api mapping relations from one language to another using api client code mam accepts set of projects each with two versions in two languages and mines api mapping relations between those two languages based on how apis are used by the two versions these mined api mapping relations assist in migration of projects from one language to another we implemented tool and conducted two evaluations to show the effectiveness of mam the results show that our tool mines unique mapping relations of apis between java and with more than accuracy the results also show that mined api mapping relations help reduce compilation errors and defects during migration of projects with an existing migration tool called javacsharp the reduction in compilation errors and defects is due to our new mined mapping relations that are not available with the existing migration tool
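a deliberately simplified sketch of the mining idea: given paired versions of the same client methods in the two languages, count which target-language APIs co-occur with each source-language API and propose the most frequent one as its mapping; the real tool performs far richer alignment, and the API names below are only examples.

```python
from collections import defaultdict, Counter

def mine_api_mappings(paired_methods):
    """For each Java API used in a method, count which C# APIs appear in the counterpart
    version of the same method; the most frequent co-occurring API is the proposed mapping."""
    co = defaultdict(Counter)
    for java_apis, csharp_apis in paired_methods:
        for ja in set(java_apis):
            co[ja].update(set(csharp_apis))
    return {ja: counts.most_common(1)[0][0] for ja, counts in co.items()}

pairs = [
    (["java.io.FileReader", "java.util.ArrayList.add"],
     ["System.IO.StreamReader", "System.Collections.Generic.List.Add"]),
    (["java.util.ArrayList.add"], ["System.Collections.Generic.List.Add"]),
    (["java.io.FileReader"], ["System.IO.StreamReader", "System.IO.Path.Combine"]),
]
print(mine_api_mappings(pairs))
```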
this research reports on study of the interplay between multi tasking and collaborative work we conducted an ethnographic study in two different companies where we observed the experiences and practices of thirty six information workers we observed that people continually switch between different collaborative contexts throughout their day we refer to activities that are thematically connected as working spheres we discovered that to multi task and cope with the resulting fragmentation of their work individuals constantly renew overviews of their working spheres they strategize how to manage transitions between contexts and they maintain flexible foci among their different working spheres we argue that system design to support collaborative work should include the notion that people are involved in multiple collaborations with contexts that change continually system design must take into account these continual changes people switch between local and global perspectives of their working spheres have varying states of awareness of their different working spheres and are continually managing transitions between contexts due to interruptions
the growing cost of tuning and managing computer systems is leading to out sourcing of commercial services to hosting centers these centers provision thousands of dense servers within relatively small real estate in order to host the applications services of different customers who may have been assured by service level agreement sla power consumption of these servers is becoming serious concern in the design and operation of the hosting centers the effects of high power consumption manifest not only in the costs spent in designing effective cooling systems to ward off the generated heat but in the cost of electricity consumption itself it is crucial to deploy power management strategies in these hosting centers to lower these costs towards enhancing profitability at the same time techniques for power management that include shutting down these servers and or modulating their operational speed can impact the ability of the hosting center to meet slas in addition repeated on off cycles can increase the wear and tear of server components incurring costs for their procurement and replacement this paper presents formalism to this problem and proposes three new online solution strategies based on steady state queuing analysis feedback control theory and hybrid mechanism borrowing ideas from these two using real web server traces we show that these solutions are more adaptive to workload behavior when performing server provisioning and speed control than earlier heuristics towards minimizing operational costs while meeting the slas
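one of the three strategies is based on steady state queuing analysis; a toy M/M/1 version of that idea (an assumption-laden illustration, not the paper's model) decides how many servers must stay powered on so that the per-server mean response time 1/(service_rate - load) stays within the SLA; all rates below are hypothetical.

```python
from math import ceil

def servers_needed(arrival_rate, service_rate, sla_response_time):
    """Toy M/M/1 provisioning: each powered-on server can absorb at most
    service_rate - 1/sla_response_time requests/sec while keeping its mean
    response time within the SLA."""
    per_server_capacity = service_rate - 1.0 / sla_response_time
    if per_server_capacity <= 0:
        raise ValueError("SLA is unattainable even on an idle server")
    return ceil(arrival_rate / per_server_capacity)

# e.g. 900 req/s arriving, each server serves 100 req/s, SLA of 50 ms mean response time
print(servers_needed(arrival_rate=900, service_rate=100, sla_response_time=0.05))  # 12
```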
in this article we present synthesis technique for generating schedulers for real time systems the aim of the scheduler is to ensure via restricting the general behaviour that the real time system satisfies the specification the real time system and the specification are described as alur dill timed automata while the synthesised scheduler is type of timed trajectory automaton this allows us to perform the synthesis without incurring the cost of constructing timed regions we also note simple constraint that the specification has to satisfy for this technique to be useful
one important trend in today’s microprocessor architectures is the increase in size of the processor caches these caches also tend to be set associative as technology scales process variations are expected to increase the fault rates of the sram cells that compose such caches as an important component of the processor the parametric yield of sram cells is crucial to the overall performance and yield of the microchip in this article we propose microarchitectural solution called the buddy cache that permits large set associative caches to tolerate faults in sram cells due to process variations in essence instead of disabling faulty cache block in set as is the current practice it is paired with another faulty cache block in the same set mdash the buddy although both cache blocks are faulty if the faults of the two blocks do not overlap then instead of losing two blocks buddying will yield functional block from the nonfaulty portions of the two blocks we found that with buddying caches can better mitigate the negative impacts of process variations on performance and yield gracefully downgrading performance as opposed to catastrophic failure we will describe the details of the buddy cache and give insights as to why it is both more performance and yield resilient to faults
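a small sketch of the pairing decision at the heart of the scheme: two faulty blocks in the same set can act as buddies whenever their fault bitmaps do not overlap, yielding one functional block assembled from the healthy portions, while blocks that find no buddy remain disabled as before; the bitmap layout and way names below are hypothetical.

```python
def pair_buddies(faulty_blocks):
    """Greedily pair faulty blocks in one cache set whose per-word fault bitmaps are
    disjoint; each such pair yields one usable block built from the healthy words."""
    unpaired, buddies = list(faulty_blocks.items()), []
    while unpaired:
        block, faults = unpaired.pop(0)
        for i, (other, other_faults) in enumerate(unpaired):
            if not (faults & other_faults):          # disjoint fault bitmaps
                buddies.append((block, other))
                unpaired.pop(i)
                break
    return buddies

# fault bitmaps of four faulty blocks in the same set (bit i set = word i is faulty)
faulty = {"way0": 0b0001, "way3": 0b1000, "way5": 0b0001, "way6": 0b0110}
print(pair_buddies(faulty))   # [('way0', 'way3'), ('way5', 'way6')]
```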
in this paper we propose novel application specific demand paging mechanism for low end embedded systems with flash memory as secondary storage these systems are not equipped with virtual memory small memory space called an execution buffer is allocated to page an application an application specific page manager manages the buffer the manager is generated by compiler post pass and combined with the application image our compiler post pass analyzes the elf executable image of an application and transforms function call return instructions into calls to the page manager as result each function of the code can be loaded into memory on demand at run time to minimize the overhead of demand paging code clustering algorithms are also presented we evaluate our techniques with five embedded applications we show that our approach can reduce the code memory size by on average with reasonable performance degradation and energy consumption more on average for low end embedded systems
overlapping split phase large latency operations with computations is standard technique for improving performance on modern architectures in this paper we present general interprocedural technique for overlapping such accesses with computation we have developed an interprocedural balanced code placement ibcp framework which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous operations with balanced pair of asynchronous operations we have evaluated this scheme in the context of overlapping operations with computation we demonstrate how this analysis is useful for applications which perform frequent and large accesses to disks including applications which snapshot or checkpoint their computations or out of core applications
in an effort to make robust traffic classification more accessible to human operators we present visualization techniques for network traffic our techniques are based solely on network information that remains intact after application layer encryption and so offer way to visualize traffic in the dark our visualizations clearly illustrate the differences between common application protocols both in their transient ie time dependent and steady state behavior we show how these visualizations can be used to assist human operator to recognize application protocols in unidentified traffic and to verify the results of an automated classifier via visual inspection in particular our preliminary results show that we can visually scan almost connections in less than one hour and correctly identify known application behaviors moreover using visualizations together with an automated comparison technique based on dynamic time warping of the motifs we can rapidly develop accurate recognizers for new or previously unknown applications
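a compact sketch of the automated comparison step mentioned above: dynamic time warping over per-interval traffic motifs scores how similar two connections' behaviors are while tolerating small shifts in time; the motif features used in the paper are not reproduced here, and the example sequences are made up.

```python
def dtw(a, b):
    """Dynamic time warping distance between two motif sequences (e.g. per-interval
    packet counts of two connections); smaller means more similar behaviour."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

http_like  = [3, 1, 0, 0, 4, 1, 0]   # bursty request/response pattern
http_like2 = [3, 1, 0, 4, 1, 0, 0]   # same motif, slightly shifted in time
bulk_like  = [9, 9, 9, 9, 9, 9, 9]   # steady bulk transfer
print(dtw(http_like, http_like2), dtw(http_like, bulk_like))
```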
mobile devices used in educational settings are usually employed within collaborative learning activity in which learning takes place in the form of social interactions between team members while performing shared task we introduce mobitop mobile tagging of objects and people geospatial digital library system which allows users to contribute and share multimedia annotations via mobile devices key feature of mobitop that is well suited for collaborative learning is that annotations are hierarchical allowing annotations to be annotated by other users to an arbitrary depth group of student teachers involved in an inquiry based learning activity in geography were instructed to identify rock types and associated landforms by collaborating with each other using the mobitop system the outcome of the study and its implications are reported in this paper
we consider the setting of device that obtains its energy from battery and some regenerative source such as solar cell we consider the speed scaling problem of scheduling collection of tasks with release times deadlines and sizes so as to minimize the energy recharge rate of the regenerative source this is the first theoretical investigation of speed scaling for devices with regenerative energy source we show that the problem can be expressed as polynomial sized convex program we show that using the kkt conditions one can obtain an efficient algorithm to verify the optimality of schedule we show that the energy optimal yds schedule is approximate with respect to the recharge rate we show that the online algorithm bkp is competitive with respect to recharge rate
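the abstract refers to the classical yds schedule a minimal python sketch of that algorithm repeatedly scheduling the maximum density interval and collapsing it is given below the job tuples are hypothetical and the recharge rate convex program and kkt based verification of the paper are not modeled

def yds_schedule(jobs):
    """Classical YDS speed scaling: repeatedly find the maximum-density
    interval, run its jobs at that constant speed, then collapse the interval.
    jobs: list of (release, deadline, work) with release < deadline."""
    schedule = []
    jobs = list(jobs)
    while jobs:
        times = sorted({t for r, d, _ in jobs for t in (r, d)})
        best = None
        for i, t1 in enumerate(times):
            for t2 in times[i + 1:]:
                work = sum(w for r, d, w in jobs if t1 <= r and d <= t2)
                density = work / (t2 - t1)
                if best is None or density > best[0]:
                    best = (density, t1, t2)
        speed, t1, t2 = best
        schedule.append((t1, t2, speed))           # later intervals are in collapsed time
        remaining = []
        for r, d, w in jobs:
            if t1 <= r and d <= t2:
                continue                           # job is served inside this interval
            shift = lambda t: t if t <= t1 else (t1 if t <= t2 else t - (t2 - t1))
            remaining.append((shift(r), shift(d), w))
        jobs = remaining
    return schedule

# three jobs, each (release, deadline, work); time and work units are arbitrary
print(yds_schedule([(0, 4, 8), (1, 3, 4), (5, 9, 2)]))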
the pie segment slider is novel parameter control interface combining the advantages of tangible input with the customizability of graphical interface representation the physical part of the interface consists of round touchpad which serves as an appropriate sensor for manipulating ring shaped sliders arranged around virtual object the novel interface concept allows to shift substantial amount of interaction task time from task preparation to its exploratory execution our user study compared the task performance of the novel interface to common touchpad operated gui and examined the task sequences of both solutions the results confirm the benefits of exploiting tangible input and proprioception for operating graphical user interface elements
time series estimation techniques are usually employed in biomedical research to derive variables less accessible from set of related and more accessible variables these techniques are traditionally built from systems modeling approaches including simulation blind deconvolution and state estimation in this work we define target time series tts and its related time series rts as the output and input of time series estimation process respectively we then propose novel data mining framework for time series estimation when tts and rts represent different sets of observed variables from the same dynamic system this is made possible by mining database of instances of tts its simultaneously recorded rts and the input output dynamic models between them the key mining strategy is to formulate mapping function for each tts rts pair in the database that translates feature vector extracted from rts to the dissimilarity between true tts and its estimate from the dynamic model associated with the same tts rts pair at run time feature vector is extracted from an inquiry rts and supplied to the mapping function associated with each tts rts pair to calculate dissimilarity measure an optimal tts rts pair is then selected by analyzing these dissimilarity measures the associated input output model of the selected tts rts pair is then used to simulate the tts given the inquiry rts as an input an exemplary implementation was built to address biomedical problem of noninvasive intracranial pressure assessment the performance of the proposed method was superior to that of simple training free approach of finding the optimal tts rts pair by conventional similarity based search on rts features
for specific set of features chosen for representing images the performance of content based image retrieval cbir system depends critically on the similarity or dissimilarity measure used instead of manually choosing distance function in advance more promising approach is to learn good distance function from data automatically in this paper we propose kernel approach to improve the retrieval performance of cbir systems by learning distance metric based on pairwise constraints between images as supervisory information unlike most existing metric learning methods which learn mahalanobis metric corresponding to performing linear transformation in the original image space we define the transformation in the kernel induced feature space which is nonlinearly related to the image space experiments performed on two real world image databases show that our method not only improves the retrieval performance of euclidean distance without distance learning but it also outperforms other distance learning methods significantly due to its higher flexibility in metric learning
large amounts of information can be overwhelming and costly process especially when transmitting data over network typical modern geographical information system gis brings all types of data together based on the geographic component of the data and provides simple point and click query capabilities as well as complex analysis tools querying geographical information system however can be prohibitively expensive due to the large amounts of data which may need to be processed since the use of gis technology has grown dramatically in the past few years there is now need more than ever to provide users with the fastest and least expensive query capabilities especially since an approximated of data stored in corporate databases has geographical component however not every application requires the same high quality data for its processing in this paper we address the issues of reducing the cost and response time of gis queries by pre aggregating data by compromising the data accuracy and precision we present computational issues in generation of multi level resolutions of spatial data and show that the problem of finding the best approximation for the given region and real value function on this region under predictable error in general is np complete
transactional memory aims to provide programming model that makes parallel programming easier hardware implementations of transactional memory htm suffer from fewer overheads than implementations in software and refinements in conflict management strategies for htm allow for even larger improvements in particular lazy conflict management has been shown to deliver better performance but it has hitherto required complex protocols and implementations in this paper we show new scalable htm architecture that performs comparably to the state of the art and can be implemented by minor modifications to the mesi protocol rather than re engineering it from the ground up our approach detects conflicts eagerly while transaction is running but defers the resolution lazily until commit time we evaluate this eager lazy system eazyhtm by comparing it with the scalable tcc like approach and system employing ideal lazy conflict management with zero cycle transaction validation and fully parallel commits we show that eazyhtm performs on average faster than scalable tcc in addition eazyhtm has fast commits and aborts can commit in parallel even if there is only one directory present and does not suffer from cascading waits
single user interactive computer applications are pervasive in our daily lives and work leveraging single user applications for multi user collaboration has the potential to significantly increase the availability and improve the usability of collaborative applications in this paper we report an innovative transparent adaptation approach for this purpose the basic idea is to adapt the single user application programming interface to the data and operational models of the underlying collaboration supporting technique namely operational transformation distinctive features of this approach include application transparency it does not require access to the source code of the single user application unconstrained collaboration it supports concurrent and free interaction and collaboration among multiple users and reusable collaborative software components collaborative software components developed with this approach can be reused in adapting wide range of single user applications this approach has been applied to transparently convert ms word into real time collaborative word processor called coword which supports multiple users to view and edit any objects in the same word document at the same time over the internet the generality of this approach has been tested by re applying it to convert ms powerpoint into copowerpoint
we show that naming the existence of distinct ids known to all is hidden but necessary assumption of herlihy’s universality result for consensus we then show in very precise sense that naming is harder than consensus and bring to the surface some relevant differences existing between popular shared memory models
we propose novel partition path based ppb grouping strategy to store compressed xml data in stream of blocks in addition we employ minimal indexing scheme called block statistic signature bss on the compressed data which is simple but effective technique to support evaluation of selection and aggregate xpath queries of the compressed data we present formal analysis and empirical study of these techniques the bss indexing is first extended into effective cluster statistic signature css and multiple cluster statistic signature mss indexing by establishing more layers of indexes we analyze how the response time is affected by various parameters involved in our compression strategy such as the data stream block size the number of cluster layers and the query selectivity we also gain further insight about the compression and querying performance by studying the optimal block size in stream which leads to the minimum processing cost for queries the cost model analysis provides solid foundation for predicting the querying performance finally we demonstrate that our ppb grouping and indexing strategies are not only efficient enough to support path based selection and aggregate queries of the compressed xml data but they also require relatively low computation time and storage space when compared with other state of the art compression strategies
logic programs under answer set semantics constitute an important tool for declarative problem solving in recent years two research issues received growing attention on the one hand concepts like loops and elementary sets have been proposed in order to extend clark’s completion for computing answer sets of logic programs by means of propositional logic on the other hand different concepts of program equivalence like strong and uniform equivalence have been studied in the context of program optimization and modular programming in this paper we bring these two lines of research together and provide alternative characterizations for different conceptions of equivalence in terms of unfounded sets along with the related concepts of loops and elementary sets our results yield new insights into the model theory of equivalence checking we further exploit these characterizations to develop novel encodings of program equivalence in terms of standard and quantified propositional logic respectively
the acoustic ensbox is an embedded platform which enables practical distributed acoustic sensing by providing integrated hardware and software support in single platform it provides highly accurate acoustic self calibration system which eliminates the need for manual surveying of node reference positions in this paper we present an acoustic laptop that enables distributed acoustic research through the use of less resource constrained and more readily available platform it runs exactly the same software and uses the same sensor hardware as the acoustic ensbox but replaces the embedded computing platform with standard laptop we describe the advantages of using the acoustic laptop as rich prototyping platform for acoustic source localization and mote class node localization applications the acoustic laptop is not intended to replace the acoustic ensbox but to complement it by providing an easily replicated prototyping platform that is extensible and resource rich and suitable for attended pilot deployments we show that the benefits gained by laptop’s extra resources enable intensive signal processing in real time without optimization this enables on line interactive experimentation with algorithms such as approximated maximum likelihood applications developed using the acoustic laptop can subsequently be run on the more deployable acoustic ensbox platform unmodified apart from performance optimizations
in this paper we try to conclude what kind of computer architecture is efficient for executing sequential problems and what kind of an architecture is efficient for executing parallel problems from the processor architect’s point of view for that purpose we analytically evaluate the performance of eight general purpose processor architectures representing widely both commercial and scientific processor designs in both single processor and multiprocessor setups the results are interesting the most efficient architecture for sequential problems is two level pipelined vliw very long instruction word architecture with few parallel functional units the most efficient architecture for parallel problems is deeply inter thread superpipelined architecture in which functional units are chained thus designing computer for efficient sequential computation leads to very different architecture than designing one for efficient parallel computation and there exists no single optimal architecture for general purpose computation
visitors enter website through variety of means including web searches links from other sites and personal bookmarks in some cases the first page loaded satisfies the visitor’s needs and no additional navigation is necessary in other cases however the visitor is better served by content located elsewhere on the site found by navigating links if the path between user’s current location and his eventual goal is circuitous then the user may never reach that goal or will have to exert considerable effort to reach it by mining site access logs we can draw conclusions of the form users who load page are likely to later load page if there is no direct link from to then it is advantageous to provide one the process of providing links to users eventual goals while skipping over the in between pages is called shortcutting existing algorithms for shortcutting require substantial offline training which make them unable to adapt when access patterns change between training sessions we present improved online algorithms for shortcut link selection that are based on novel analogy drawn between shortcutting and caching in the same way that cache algorithms predict which memory pages will be accessed in the future our algorithms predict which web pages will be accessed in the future our algorithms are very efficient and are able to consider accesses over long period of time but give extra weight to recent accesses our experiments show significant improvement in the utility of shortcut links selected by our algorithm as compared to those selected by existing algorithms
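a minimal python sketch of the caching flavored idea assuming hypothetical page names and a simple exponential decay so that recent sessions weigh more heavily it is not the authors algorithm only an illustration of online shortcut selection with recency bias

from collections import defaultdict

class ShortcutRecommender:
    """Online shortcut selection sketch: every page A in a session credits the
    pages B visited later in that session; older evidence decays each session,
    so recent access patterns dominate (the caching-style recency bias)."""
    def __init__(self, decay=0.95):
        self.decay = decay
        self.score = defaultdict(lambda: defaultdict(float))  # score[A][B]

    def observe_session(self, pages):
        for a in self.score:                       # decay all existing evidence
            for b in self.score[a]:
                self.score[a][b] *= self.decay
        for i, a in enumerate(pages):
            for b in pages[i + 1:]:
                if b != a:
                    self.score[a][b] += 1.0

    def shortcuts(self, page, k=2):
        ranked = sorted(self.score[page].items(), key=lambda kv: -kv[1])
        return [b for b, _ in ranked[:k]]

rec = ShortcutRecommender()
rec.observe_session(["home", "catalog", "item42", "checkout"])
rec.observe_session(["home", "search", "item42"])
print(rec.shortcuts("home"))   # item42 is the strongest shortcut candidate from home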
mobile computing systems should be self managed to simplify operation and maintenance plus meet user expectation with respect to quality of service qos when architecting self managed mobile computing systems one must take holistic view on both qos management and the entities in the mobile environment this paper presents novel model that includes both resources and context elements to illustrate the usefulness of the model it is applied to video streaming application by modelling context elements and resources in the environment specifying context dependencies and qos characteristics of the application and designing weakly integrated resource and context managers we describe middleware that uses the developed managers when evaluating context dependencies and predict offered qos of alternative implementations of the application in order to select the one that can operate in the current environment and that best satisfies given user preferences
the construction of taxonomies is considered as the first step for structuring domain knowledge many methodologies have been developed in the past for building taxonomies from classical information repositories such as dictionaries databases or domain text however in the last years scientists have started to consider the web as valuable repository of knowledge in this paper we present novel approach especially adapted to the web environment for composing taxonomies in an automatic and unsupervised way it uses combination of different types of linguistic patterns for hyponymy extraction and carefully designed statistical measures to infer information relevance the learning performance of the different linguistic patterns and statistical scores considered is carefully studied and evaluated in order to design method that maximizes the quality of the results our proposal is also evaluated for several well distinguished domains offering in all cases reliable taxonomies considering precision and recall
applications like multimedia retrieval require efficient support for similarity search on large data collections yet nearest neighbor search is difficult problem in high dimensional spaces rendering efficient applications hard to realize index structures degrade rapidly with increasing dimensionality while sequential search is not an attractive solution for repositories with millions of objects this paper approaches the problem from different angle solution is sought in an unconventional storage scheme that opens up new range of techniques for processing nn queries especially suited for high dimensional spaces the suggested physical database design accommodates well novel variant of branch and bound search that reduces the high dimensional space quickly to small candidate set the paper provides insight in applying this idea to nn search using two similarity metrics commonly encountered in image database applications and discusses techniques for its implementation in relational database systems the effectiveness of the proposed method is evaluated empirically on both real and synthetic data sets reporting the significant improvements in response time yielded
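a minimal python sketch of the filter and refine flavor of branch and bound nn search using a partial distance over a dimension prefix as a lower bound to prune candidates the storage scheme and relational mapping of the paper are not modeled and the data and parameters are hypothetical

import numpy as np

def knn_filter_refine(data, query, k=5, prefix_dims=8):
    """Two-phase NN sketch: a cheap lower bound on the Euclidean distance,
    computed from the first `prefix_dims` dimensions only, orders and prunes
    the collection; exact distances are evaluated only while the bound can
    still beat the current k-th best candidate."""
    partial = np.linalg.norm(data[:, :prefix_dims] - query[:prefix_dims], axis=1)
    order = np.argsort(partial)                  # scan in increasing lower-bound order
    best = []                                    # list of (exact_distance, index)
    for idx in order:
        if len(best) == k and partial[idx] >= max(d for d, _ in best):
            break                                # bound exceeds the k-th best: safe to stop
        exact = np.linalg.norm(data[idx] - query)
        best.append((exact, idx))
        best = sorted(best)[:k]
    return best

rng = np.random.default_rng(0)
data = rng.normal(size=(10000, 64))
query = rng.normal(size=64)
print(knn_filter_refine(data, query))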
clock network power in field programmable gate arrays fpgas is considered and two complementary approaches for clock power reduction in the xilinx virtex fpga are presented the approaches are unique in that they leverage specific architectural aspects of virtex to achieve reductions in dynamic power consumed by the clock network the first approach comprises placement based technique to reduce interconnect resource usage on the clock network thereby reducing capacitance and power up to the second approach borrows the clock gating notion from the asic domain and applies it to fpgas clock enable signals on flip flops are selectively migrated to use the dedicated clock enable available on the fpga’s built in clock network leading to reduced toggling on the clock interconnect and lower power up to power reductions are achieved without any performance penalty on average
the creation of realistic virtual human heads is challenging problem in computer graphics due to the diversity and individuality of the human being in this paper an easy to use geometry optimized virtual human head is presented by using uniform two step refinement operator the virtual model has natural multiresolution structure in the coarsest level the model is represented by control mesh which serves as the anatomical structure of the head to achieve fine geometric details over multi level detail set is assigned to the corresponding control vertices in with the aid of uniquely defined local frame field we show in this paper that with carefully designed control mesh the presented virtual model not only captures the physical characteristics of the head but also is optimal in the geometric sense the contributions of this paper also include that diverse applications of the presented virtual model are presented showing that the model deformation is smooth and natural and the presented virtual model is easy to use
while simulationists devise ever more efficient simulation algorithms for specific applications and infrastructures the problem of automatically selecting the most appropriate one for given problem has received little attention so far one reason for this is the overwhelming amount of performance data that has to be analyzed for deriving suitable selection mechanisms we address this problem with framework for data mining on simulation performance data which enables the evaluation of various data mining methods in this context such an evaluation is essential as there is no best data mining algorithm for all kinds of simulation performance data once an effective data mining approach has been identified for specific class of problems its results can be used to select efficient algorithms for future simulation problems this paper covers the components of the framework the integration of external tools and the re formulation of the algorithm selection problem from data mining perspective basic data mining strategies for algorithm selection are outlined and sample algorithm selection problem from computational biology is presented
major performance bottleneck for database systems is the memory hierarchy the performance of the memory hierarchy is directly related to how the content of disk pages maps to the cache lines ie to the organization of data within disk page called the page layout the prevalent page layout in database systems is the n ary storage model nsm as demonstrated in this paper using nsm for temporal data deteriorates memory hierarchy performance for query intensive workloads this paper proposes two cache conscious read optimized page layouts for temporal data experiments show that the proposed page layouts are substantially faster than nsm
in modern software development regression tests are used to confirm the fundamental functionalities of an edited program and to assure the code quality difficulties occur when testing reveals unexpected behaviors which indicate potential defects introduced by the edit however the changes that caused the failure are not always easy to find we propose heuristic that ranks method changes that might have affected failed test indicating the likelihood that they may have contributed to test failure our heuristic is based on the calling structure of the failed test eg the number of ancestors and descendents of method in the test’s call graph whether the caller or callee was changed etc we evaluated the effectiveness of the heuristic in pairs of edited versions in the eclipse jdt core plug in using the test suite from its compiler tests plug in our results indicate that when failure is caused by single method change our heuristic ranked the failure inducing change as number or number of all the method changes in of the delegate tests ie representatives of all failing tests even when the failure is caused by some combination of the changes rather than single change our heuristic still helps
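a minimal python sketch of one plausible calling structure heuristic ranking changed methods by their distance from the failing test in its call graph the call graph and method names are hypothetical and the actual scoring features of the paper eg caller versus callee changes are not reproduced

from collections import deque

def rank_changes(call_graph, failed_test, changed_methods):
    """Heuristic sketch: methods changed by the edit are ranked by how close
    they sit to the failing test in its call graph, so a changed callee two
    hops away outranks one that is five hops away or unreachable."""
    dist = {failed_test: 0}
    frontier = deque([failed_test])
    while frontier:                              # BFS over the test's call graph
        caller = frontier.popleft()
        for callee in call_graph.get(caller, []):
            if callee not in dist:
                dist[callee] = dist[caller] + 1
                frontier.append(callee)
    scored = [(dist.get(m, float("inf")), m) for m in changed_methods]
    return [m for _, m in sorted(scored)]

call_graph = {
    "testFoo": ["Parser.parse", "Util.log"],
    "Parser.parse": ["Lexer.next", "Ast.build"],
    "Ast.build": ["Ast.fold"],
}
print(rank_changes(call_graph, "testFoo", ["Ast.fold", "Util.log", "Renderer.draw"]))
# -> ['Util.log', 'Ast.fold', 'Renderer.draw']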
within the area of general purpose fine grained subjectivity analysis opinion topic identification has to date received little attention due to both the difficulty of the task and the lack of appropriately annotated resources in this paper we provide an operational definition of opinion topic and present an algorithm for opinion topic identification that following our new definition treats the task as problem in topic coreference resolution we develop methodology for the manual annotation of opinion topics and use it to annotate topic information for portion of an existing general purpose opinion corpus in experiments using the corpus our topic identification approach statistically significantly outperforms several non trivial baselines according to three evaluation measures
we study the representation and manipulation of geospatial information in database management system dbms the geospatial data model that we use as basis hinges on complex object model whose set and tuple constructors make it efficient for defining not only collections of geographic objects but also relationships among these objects in addition it allows easy manipulation of nonbasic types such as spatial data types we investigate the mapping of our reference model onto major commercial dbms models namely relational model extended to abstract data types adt and an object oriented model our analysis shows the strengths and limits of the two model types for handling highly structured data with spatial components
data integration is the problem of combining data residing at different sources and providing the user with unified view of these data one of the critical issues of data integration is the detection of similar entities based on the content this complexity is due to three factors the data type of the databases are heterogeneous the schema of databases are unfamiliar and heterogeneous as well and the amount of records is voluminous and time consuming to analyze as solution to these problems we extend our work in another of our papers by introducing new measure to handle heterogeneous textual and numerical data type for coincident meaning extraction firstly in order to accommodate the heterogeneous data types we propose new weight called bin frequency inverse document bin frequency bf idbf for effective heterogeneous data pre processing and classification by unified vectorization secondly in order to handle the unfamiliar data structure we use the unsupervised algorithm self organizing map finally to help the user to explore and browse the semantically similar entities among the copious amount of data we use som based visualization tool to map the database tables based on their semantical content
this paper proposes and studies distributed cache management approach through page level data to cache slice mapping in future processor chip comprising many cores cache management is crucial multicore processor design aspect to overcome non uniform cache access latency for high program performance and to reduce on chip network traffic and related power consumption unlike previously studied pure hardware based private and shared cache designs the proposed os microarchitecture approach allows mimicking wide spectrum of caching policies without complex hardware support moreover processors and cache slices can be isolated from each other without hardware modifications resulting in improved chip reliability characteristics we discuss the key design issues and implementation strategies of the proposed approach and present an experimental result showing the promise of it
the one dimensional decomposition of nonuniform workload arrays with optimal load balancing is investigated the problem has been studied in the literature as the chains on chains partitioning problem despite the rich literature on exact algorithms heuristics are still used in parallel computing community with the hope of good decompositions and the myth of exact algorithms being hard to implement and not runtime efficient we show that exact algorithms yield significant improvements in load balance over heuristics with negligible overhead detailed pseudocodes of the proposed algorithms are provided for reproducibility we start with literature review and propose improvements and efficient implementation tips for these algorithms we also introduce novel algorithms that are asymptotically and runtime efficient our experiments on sparse matrix and direct volume rendering datasets verify that balance can be significantly improved by using exact algorithms the proposed exact algorithms are times faster than single sparse matrix vector multiplication for way decompositions on the average we conclude that exact algorithms with proposed efficient implementations can effectively replace heuristics
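a minimal python sketch of an exact chains on chains partitioning algorithm based on the standard probe plus binary search over candidate subchain sums the weights are hypothetical and the asymptotically faster algorithms of the paper are not reproduced

import itertools

def probe(weights, parts, bottleneck):
    """Greedy feasibility test: can the chain be cut into at most `parts`
    consecutive pieces whose sums all stay within `bottleneck`?"""
    used, current = 1, 0.0
    for w in weights:
        if w > bottleneck:
            return False
        if current + w > bottleneck:
            used, current = used + 1, 0.0
        current += w
    return used <= parts

def exact_ccp(weights, parts):
    """Exact partitioning sketch: the optimal bottleneck equals some subchain
    sum, so binary-search the sorted candidate sums with the probe."""
    prefix = list(itertools.accumulate([0.0] + list(weights)))
    candidates = sorted({prefix[j] - prefix[i]
                         for i in range(len(weights))
                         for j in range(i + 1, len(weights) + 1)})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, parts, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

print(exact_ccp([3, 1, 4, 1, 5, 9, 2, 6], parts=3))   # optimal bottleneck load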
an efficient index structure for complex multi dimensional objects is one of the most challenging requirements in non traditional applications such as geographic information systems computer aided design and multimedia databases in this paper we first propose main memory data structure for complex multi dimensional objects then we present an extension of the existing multi dimensional index structure among existing multi dimensional index structures the popular ast tree is selected the ast tree is coupled with the main memory data structure to improve the performance of spatial query processing an analytical model is developed for our index structure experimental results show that the analytical model is accurate the relative error being below the performance of our index structure is compared with that of state of the art index structure by experimental measurements our index structure outperforms the state of the art index structure due to its ability to reduce large amount of storage
in this paper we propose measures for compressed data structures in which space usage is measured in data aware manner in particular we consider the fundamental dictionary problem on set data where the task is to construct data structure for representing set of items out of universe and supporting various queries on we use well known data aware measure for set data called gap to bound the space of our data structures we describe novel dictionary structure that requires gap nlog logn nloglog bits under the ram model our dictionary supports membership rank and predecessor queries in nearly optimal time matching the time bound of andersson and thorup’s predecessor structure andersson thorup tighter worst case bounds on dynamic searching and priority queues in acm symposium on theory of computing stoc while simultaneously improving upon their space usage we support select queries even faster in loglogn time our dictionary structure uses exactly gap bits in the leading term ie the constant factor is and answers queries in near optimal time when seen from the worst case perspective we present the first nlog bit dictionary structure that supports these queries in near optimal time under the ram model we also build dictionary which requires the same space and supports membership select and partial rank queries even more quickly in loglogn time we go on to show that for many real world datasets data aware methods lead to worthwhile compression over combinatorial methods to the best of our knowledge these are the first results that achieve data aware space usage and retain near optimal time
topological relationships like overlap inside meet and disjoint uniquely characterize the relative position between objects in space for long time they have been focus of interdisciplinary research as in artificial intelligence cognitive science linguistics robotics and spatial reasoning especially as predicates they support the design of suitable query languages for spatial data retrieval and analysis in spatial database systems and geographical information systems while to large extent conceptual aspects of topological predicates like their definition and reasoning with them as well as strategies for avoiding unnecessary or repetitive predicate executions like predicate migration and spatial index structures have been emphasized the development of robust and efficient implementation techniques for them has been largely neglected especially the recent design of topological predicates for all combinations of complex spatial data types has resulted in large increase of their numbers and stressed the importance of their efficient implementation the goal of this article is to develop correct and efficient implementation techniques of topological predicates for all combinations of complex spatial data types including two dimensional point line and region objects as they have been specified by different authors and in different commercial and public domain software packages our solution consists of two phases in the exploration phase for given scene of two spatial objects all topological events like intersection and meeting situations are summarized in two precisely defined topological feature vectors one for each argument object of topological predicate whose specifications are characteristic and unique for each combination of spatial data types these vectors serve as input for the evaluation phase which analyzes the topological events and determines the boolean result of topological predicate predicate verification or the kind of topological predicate predicate determination by formally defined method called nine intersection matrix characterization besides this general evaluation method the article presents an optimized method for predicate verification called matrix thinning and an optimized method for predicate determination called minimum cost decision tree the methods presented in this article are applicable to all known complete collections of mutually exclusive topological predicates that are formally based on the well known nine intersection model
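a minimal python sketch of predicate verification by nine intersection matrix characterization assuming the shapely library is installed its relate method returns the de 9im matrix as a string the exploration phase topological feature vectors matrix thinning and decision tree optimizations of the article are not modeled

from shapely.geometry import Polygon

def entry_matches(p, m):
    # 'T' = any non-empty intersection, 'F' = empty, '*' = don't care,
    # '0'/'1'/'2' = intersection of exactly that dimension
    if p == '*':
        return True
    if p == 'T':
        return m != 'F'
    return p == m

def verify(a, b, pattern):
    """Predicate verification: compute the scene's nine-intersection (DE-9IM)
    matrix and match it against the predicate's characteristic pattern."""
    return all(entry_matches(p, m) for p, m in zip(pattern, a.relate(b)))

OVERLAPS_REGIONS = 'T*T***T**'   # interiors meet and each region has a part outside the other

a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])
print(a.relate(b))                     # the raw nine-intersection matrix, e.g. '212101212'
print(verify(a, b, OVERLAPS_REGIONS))  # True: the two regions overlap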
voltage islanding technique in network on chip noc can significantly reduce the computational energy consumption by scaling down the voltage levels of the processing elements pes this reduction in energy consumption comes at the cost of the energy consumption of the level shifters between voltage islands moreover from physical design perspective it is desirable to have limited number of voltage islands considering voltage islanding during mapping of the pes to the noc routers can significantly reduce both the computational and the level shifter energy consumptions and the communication energy consumption on the noc links in this paper we formulate the problem as an optimization problem with an objective of minimizing the overall energy consumption constrained by the performance in terms of delay and the maximum number of voltage islands we provide the optimal solution to our problem using mixed integer linear program milp formulation we also propose heuristic based on random greedy selection to solve the problem experimental results using es benchmark applications and some real applications show that the heuristic finds near optimal solution in almost all cases in very small fraction of the time required to achieve the optimal solution
we extend controlled query evaluation cqe an inference control method to enforce confidentiality in static information systems under queries to updatable databases within the framework of the lying approach to cqe we study user update requests that have to be translated into new database state in order to avoid dangerous inferences some such updates have to be denied even though the new database instance would be compatible with set of integrity constraints in contrast some other updates leading to an incompatible instance should not be denied we design control method to resolve this seemingly paradoxical situation and then prove that the general security definitions of cqe and other properties linked to user updates hold
the ambient calculus is calculus of computation that allows active processes to move between sites we present an analysis inspired by state of the art pointer analyses that safely and accurately predicts which processes may turn up at what sites during the execution of composite system the analysis models sets of processes by sets of regular tree grammars enhanced with context dependent counts and it obtains its precision by combining powerful redex materialisation with strong redex reduction in the manner of the strong updates performed in pointer analyses the underlying ideas are flexible and scale up to general tree structures admitting powerful restructuring operations
the web fosters the creation of communities by offering users wide array of social software tools while the success of these tools is based on their ability to support different interaction patterns among users by imposing as few limitations as possible the communities they support are not free of rules just think about the posting rules in community forum or the editing rules in thematic wiki in this paper we propose framework for the sharing of best community practices in the form of potentially rule based annotation layer that can be integrated with existing web community tools with specific focus on wikis this solution is characterized by minimal intrusiveness and plays nicely within the open spirit of the web by providing users with behavioral hints rather than by enforcing the strict adherence to set of rules
estimating the global data distribution in peer to peer pp networks is an important issue and has not yet been well addressed it can benefit many pp applications such as load balancing analysis query processing data mining and so on in this paper we propose novel algorithm which is based on compact multi dimensional histogram information to achieve high estimation accuracy with low estimation cost maintaining data distribution in multi dimensional histogram which is spread among peers without overlapping and each part of which is further condensed by set of discrete cosine transform coefficients each peer is capable to hierarchically accumulate the compact information to the entire histogram by information exchange and consequently estimates the global data density with accuracy and efficiency algorithms on discrete cosine transform coefficients hierarchically accumulating as well as density estimation error are introduced with detailed theoretical analysis and proof our extensive performance study confirms the effectiveness and efficiency of our methods on density estimation in dynamic pp networks
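a minimal python sketch of the compression step condensing a multi dimensional histogram into a small block of discrete cosine transform coefficients and reconstructing an approximate density from them the peer to peer exchange and hierarchical accumulation are not modeled and the data bin counts and number of retained coefficients are hypothetical

import numpy as np

def dct_matrix(n):
    # orthonormal DCT-II basis: C @ x is the transform, C.T @ y inverts it
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2)
    return c

def compress_histogram(hist, keep):
    """Keep only the low-frequency keep x keep block of DCT coefficients;
    this small block is what a peer would exchange instead of the full histogram."""
    c = dct_matrix(hist.shape[0])
    return (c @ hist @ c.T)[:keep, :keep]

def estimate_histogram(coeffs, n):
    """Reconstruct an approximate histogram from the exchanged coefficients."""
    c = dct_matrix(n)
    full = np.zeros((n, n))
    k = coeffs.shape[0]
    full[:k, :k] = coeffs
    return c.T @ full @ c

rng = np.random.default_rng(1)
points = rng.normal(loc=[0.3, 0.7], scale=0.1, size=(5000, 2))
hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=32, range=[[0, 1], [0, 1]])
small = compress_histogram(hist, keep=6)            # 36 numbers instead of 1024
approx = estimate_histogram(small, n=32)
print(np.abs(hist - approx).sum() / hist.sum())     # relative reconstruction error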
we demonstrate real time simulation system capable of automatically balancing standing character while at the same time tracking reference motion and responding to external perturbations the system is general to non human morphologies and results in natural balancing motions employing the entire body for example wind milling our novel balance routine seeks to control the linear and angular momenta of the character we demonstrate how momentum is related to the center of mass and center of pressure of the character and derive control rules to change these centers for balance the desired momentum changes are reconciled with the objective of tracking the reference motion through an optimization routine which produces target joint accelerations hybrid inverse forward dynamics algorithm determines joint torques based on these joint accelerations and the ground reaction forces finally the joint torques are applied to the free standing character simulation we demonstrate results for following both motion capture and keyframe data as well as both human and non human morphologies in presence of variety of conditions and disturbances
static program checking tools can find many serious bugs in software but due to analysis limitations they also frequently emit false error reports such false positives can easily render the error checker useless by hiding real errors amidst the false effective error report ranking schemes mitigate the problem of false positives by suppressing them during the report inspection process in this way ranking techniques provide complementary method to increasing the precision of the analysis results of checking tool weakness of previous ranking schemes however is that they produce static rankings that do not adapt as reports are inspected ignoring useful correlations amongst reports this paper addresses this weakness with two main contributions first we observe that both bugs and false positives frequently cluster by code locality we analyze clustering behavior in historical bug data from two large systems and show how clustering can be exploited to greatly improve error report ranking second we present general probabilistic technique for error ranking that exploits correlation behavior amongst reports and incorporates user feedback into the ranking process in our results we observe factor of improvement over randomized ranking for error reports emitted by both intra procedural and inter procedural analysis tools
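a minimal python sketch of the adaptive ranking idea reports sharing a file are treated as correlated so confirming or refuting one report re weights its neighbours the prior boost and damping constants are hypothetical and the probabilistic model of the paper is not reproduced

from collections import defaultdict

class AdaptiveRanker:
    """Ranking sketch: reports start from a prior probability of being a true
    bug; feedback on one report shifts the belief for the remaining reports in
    the same file, exploiting clustering by code locality."""
    def __init__(self, reports, prior=0.3, boost=2.0, damp=0.5):
        self.reports = dict(reports)             # report id -> file
        self.weight = {r: prior for r in self.reports}
        self.boost, self.damp = boost, damp

    def feedback(self, report_id, is_true_bug):
        file_ = self.reports.pop(report_id)
        self.weight.pop(report_id)
        factor = self.boost if is_true_bug else self.damp
        for r, f in self.reports.items():
            if f == file_:                       # same-file reports are correlated
                self.weight[r] = min(0.99, self.weight[r] * factor)

    def ranking(self):
        return sorted(self.reports, key=lambda r: -self.weight[r])

ranker = AdaptiveRanker([("r1", "alloc.c"), ("r2", "alloc.c"), ("r3", "ui.c")])
ranker.feedback("r1", is_true_bug=True)          # inspecting r1 confirmed a real bug
print(ranker.ranking())                          # r2 (same file) now outranks r3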
we present framework for the formal verification of abstract state machine asm designs using the multiway decision graphs mdg tool asm is state based language for describing transition systems mdg provides symbolic representation of transition systems with support of abstract sorts and functions we implemented transformation tool that automatically generates mdg models from asm specifications then formal verification techniques provided by the mdg tool such as model checking or equivalence checking can be applied on the generated models we illustrate this work with the case study of an atm switch controller in which behavior and structure were specified in asm and using our asm mdg facility are successfully verified with the mdg tool
standard pattern discovery techniques such as association rules suffer an extreme risk of finding very large numbers of spurious patterns for many knowledge discovery tasks the direct adjustment approach to controlling this risk applies statistical test during the discovery process using critical value adjusted to take account of the size of the search space however problem with the direct adjustment strategy is that it may discard numerous true patterns this paper investigates the assignment of different critical values to different areas of the search space as an approach to alleviating this problem using variant of technique originally developed for other purposes this approach is shown to be effective at increasing the number of discoveries while still maintaining strict control over the risk of false discoveries
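a minimal python sketch of assigning different critical values to different areas of the search space here a geometric split of the significance budget over pattern lengths followed by a bonferroni style correction within each layer the item count pattern lengths and split are hypothetical not the variant used in the paper

from math import comb

def layered_critical_values(n_items, max_len, alpha=0.05):
    """Layer-wise alpha allocation sketch: the family-wise budget `alpha` is
    split geometrically over pattern lengths (the layer budgets sum to less
    than alpha), then each layer's budget is Bonferroni-corrected by the
    number of candidate patterns of that length, so sparse layers get a less
    punishing critical value than a single global correction would give."""
    critical = {}
    for length in range(1, max_len + 1):
        layer_budget = alpha / (2 ** length)     # geometric split of the budget
        layer_size = comb(n_items, length)       # candidate itemsets in this layer
        critical[length] = layer_budget / layer_size
    return critical

for k, c in layered_critical_values(n_items=50, max_len=4).items():
    print(f"patterns of length {k}: test at p < {c:.2e}")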
multi stage programming msp provides disciplined approach to run time code generation in the purely functional setting it has been shown how msp can be used to reduce the overhead of abstractions allowing clean maintainable code without paying performance penalties unfortunately msp is difficult to combine with imperative features which are prevalent in mainstream languages the central difficulty is scope extrusion wherein free variables can inadvertently be moved outside the scopes of their binders this paper proposes new approach to combining msp with imperative features that occupies sweet spot in the design space in terms of how well useful msp applications can be expressed and how easy it is for programmers to understand the key insight is that escapes or anti quotes must be weakly separable from the rest of the code ie the computational effects occurring inside an escape that are visible outside the escape are guaranteed to not contain code to demonstrate the feasibility of this approach we formalize type system based on lightweight java which we prove sound and we also provide an implementation called mint to validate both the expressivity of the type system and the effect of staging on the performance of java programs
scheduling in large scale parallel systems has been and continues to be an important and challenging research problem several key factors including the increasing use of off the shelf clusters of workstations to build such parallel systems have resulted in the emergence of new class of scheduling strategies broadly referred to as dynamic coscheduling unfortunately the size of both the design and performance spaces of these emerging scheduling strategies is quite large due in part to the numerous dynamic interactions among the different components of the parallel computing environment as well as the wide range of applications and systems that can comprise the parallel environment this in turn makes it difficult to fully explore the benefits and limitations of the various proposed dynamic coscheduling approaches for large scale systems solely with the use of simulation and or experimentation to gain better understanding of the fundamental properties of different dynamic coscheduling methods we formulate general mathematical model of this class of scheduling strategies within unified framework that allows us to investigate wide range of parallel environments we derive matrix analytic analysis based on stochastic decomposition and fixed point iteration large number of numerical experiments are performed in part to examine the accuracy of our approach these numerical results are in excellent agreement with detailed simulation results our mathematical model and analysis is then used to explore several fundamental design and performance tradeoffs associated with the class of dynamic coscheduling policies across broad spectrum of parallel computing environments
in this paper we propose new index structure for object oriented databases the main idea of this is graph structure called signature graph which is constructed over signature file generated for class and improves the search of signature file dramatically in addition the signature files accordingly the signature graphs can be organized into hierarchy according to the nested structure called the aggregation hierarchy of classes in an object oriented database which leads to further significant improvements
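a minimal python sketch of the underlying signature file idea superimposed bit signatures filter candidate objects before exact verification the signature graph and aggregation hierarchy organization of the paper are not modeled and the hash width and example records are hypothetical

import hashlib

def signature(values, bits=64, k=3):
    """Superimposed coding: each attribute value sets k bit positions; the
    object signature is the OR of its value signatures."""
    sig = 0
    for v in values:
        h = hashlib.sha1(str(v).encode()).digest()
        for j in range(k):
            sig |= 1 << (h[j] % bits)
    return sig

class SignatureFile:
    """Signature-file sketch: a query signature is matched by every object
    whose signature contains all query bits; survivors (including possible
    false drops) are then verified against the actual objects."""
    def __init__(self, objects):
        self.objects = objects
        self.sigs = [signature(o.values()) for o in objects]

    def query(self, **conditions):
        qsig = signature(conditions.values())
        candidates = [o for o, s in zip(self.objects, self.sigs) if s & qsig == qsig]
        return [o for o in candidates
                if all(o.get(a) == v for a, v in conditions.items())]

db = SignatureFile([{"name": "ada", "dept": "cs"}, {"name": "bob", "dept": "ee"}])
print(db.query(dept="cs"))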
schema mapping is specification that describes how data structured under one schema the source schema is to be transformed into data structured under different schema the target schema although the notion of an inverse of schema mapping is important the exact definition of an inverse mapping is somewhat elusive this is because schema mapping may associate many target instances with each source instance and many source instances with each target instance based on the notion that the composition of mapping and its inverse is the identity we give formal definition for what it means for schema mapping to be an inverse of schema mapping for class of source instances we call such an inverse an inverse particular case of interest arises when is the class of all instances in which case an inverse is global inverse we focus on the important and practical case of schema mappings defined by source to target tuple generating dependencies and uncover rich theory when is defined by set of dependencies with finite chase we show how to construct an inverse when one exists in particular we show how to construct global inverse when one exists given and we show how to define the largest class such that is an inverse of
we propose group membership service for dynamic ad hoc networks it maintains as long as possible the existing groups and ensures that each group diameter is always smaller than constant fixed according to the application using the groups the proposed protocol is self stabilizing and works in dynamic distributed systems moreover it ensures kind of continuity in the service offer to the application while the system is converging except if too strong topology changes happen such best effort behavior allows applications to rely on the groups while the stabilization has not been reached which is very useful in dynamic ad hoc networks
new approach for supporting reactive capability is described in the context of an advanced object oriented database system called adome ii besides having rich set of pre defined composite event expressions and well defined execution model adome ii supports an extensible approach to reactive processing so as to be able to gracefully accommodate dynamic applications requirements in this approach production rules combined with methods are used as unifying mechanism to process rules to enable incremental detection of composite events and to allow new composite event expressions to be introduced into the system declaratively this allows the definition of new production rules each time an extension of the model takes place methods of supporting new composite event expressions are described and comparisons with other relevant approaches are also conducted prototype of adome ii has been constructed which has as its implementation base an ordinary passive oodbms and production rule base system
human societies have long used the capability of argumentation and dialogue to overcome and resolve conflicts that may arise within their communities today there is an increasing level of interest in the application of such dialogue games within artificial agent societies in particular within the field of multi agent systems this theory of argumentation and dialogue games has become instrumental in designing rich interaction protocols and in providing agents with means to manage and resolve conflicts however to date much of the existing literature focuses on formulating theoretically sound and complete models for multi agent systems nonetheless in so doing it has tended to overlook the computational implications of applying such models in agent societies especially ones with complex social structures furthermore the systemic impact of using argumentation in multi agent societies and its interplay with other forms of social influences such as those that emanate from the roles and relationships of society within such contexts has also received comparatively little attention to this end this paper presents significant step towards bridging these gaps for one of the most important dialogue game types namely argumentation based negotiation abn the contributions are three fold first we present both theoretically grounded and computationally tractable abn framework that allows agents to argue negotiate and resolve conflicts relating to their social influences within multi agent society in particular the model encapsulates four fundamental elements scheme that captures the stereotypical pattern of reasoning about rights and obligations in an agent society ii mechanism to use this scheme to systematically identify social arguments to use in such contexts iii language and protocol to govern the agent interactions and iv set of decision functions to enable agents to participate in such dialogues second we use this framework to devise series of concrete algorithms that give agents set of abn strategies to argue and resolve conflicts in multi agent task allocation scenario in so doing we exemplify the versatility of our framework and its ability to facilitate complex argumentation dialogues within artificial agent societies finally we carry out series of experiments to identify how and when argumentation can be useful for agent societies in particular our results show clear inverse correlation between the benefit of arguing and the resources available within the context that when agents operate with imperfect knowledge an arguing approach allows them to perform more effectively than non arguing one that arguing earlier in an abn interaction presents more efficient method than arguing later in the interaction and that allowing agents to negotiate their social influences presents both an effective and an efficient method that enhances their performance within society
sampling conditions for recovering the homology of set using topological persistence are much weaker than sampling conditions required by any known polynomial time algorithm for producing topologically correct reconstruction under the former sampling conditions which we call weak sampling conditions we give an algorithm that outputs topologically correct reconstruction unfortunately even though the algorithm terminates its time complexity is unbounded motivated by the question of knowing if polynomial time algorithm for reconstruction exists under the weak sampling conditions we identify at the heart of our algorithm test which requires answering the following question given two dimensional simplicial complexes does there exist simplicial complex containing and contained in which realizes the persistent homology of into we call this problem the homological simplification of the pair and prove that this problem is np complete using reduction from sat
the global spread of business is introducing new trend in the organization of work this dynamics in business has led to distribution of activities in different locations affecting also the way people develop software developers software engineers quality inspectors and also users and customers are distributed around the world since stakeholders are distributed the software development cannot be efficiently performed without support for articulating these people in consistent way issues like distance communication and different time zones introduce additional difficulties to the stakeholders involved in the process this paper explores the use of an agents architecture designed to support the requirements engineering specifically the verification and validation activities in the distributed development process goal driven approach is used to define high level goals that after refined makes possible to derive requirements and assigning responsibilities to the actors humans software agents devices and programs
the growing use of digital signal processors dsps in embedded systems necessitates the use of optimizing compilers supporting special hardware features in this paper we present compiler optimizations with the aim of minimizing energy consumption of embedded applications this comprises loop optimizations for exploitation of simd instructions and zero overhead hardware loops in order to increase performance and decrease the energy consumption in addition we use phase coupled code generator based on genetic algorithm gcg which is capable of performing energy aware instruction selection and scheduling energy aware compilation is done with respect to an instruction level energy cost model which is integrated into our code generator and simulator experimental results for several benchmarks show the effectiveness of our approach
memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone several promising software techniques have been shown to address this problem successfully in specific situations however the generality of these software approaches has been limited because current architectures do not provide fine grained low overhead mechanism for observing and reacting to memory behavior directly to fill this need this article proposes new class of memory operations called informing memory operations which essentially consist of memory operation combined either implicitly or explicitly with conditional branch and link operation that is taken only if the reference suffers cache miss this article describes two different implementations of informing memory operations one is based on cache outcome condition code and the other is based on low overhead traps we find that modern in order issue and out of order issue superscalar processors already contain the bulk of the necessary hardware support we describe how number of software based memory optimizations can exploit informing memory operations to enhance performance and we look at cache coherence with fine grained access control as case study our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the alpha and mips processors is generally small enough to provide considerable flexibility to hardware and software designers and that the cache coherence application has improved performance compared to other current solutions we believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations
this paper presents the creation deployment and evaluation of large scale spatially stable paper based visualization of software system the visualization was created for single team who were involved systematically in its initial design and subsequent design iterations the evaluation indicates that the visualization supported the onboarding scenario but otherwise failed to realize the research team’s expectations we present several lessons learned and cautions to future research into large scale spatially stable visualizations of software systems
frequent subtree mining has attracted great deal of interest among the researchers due to its application in wide variety of domains some of the domains include bio informatics xml processing computational linguistics and web usage mining despite the advances in frequent subtree mining mining for the entire frequent subtrees is infeasible due to the combinatorial explosion of the frequent subtrees with the size of the datasets in order to provide reduced and concise representation without information loss we propose novel algorithm pcitminer prefix based closed induced tree miner pcitminer adopts the prefix based pattern growth strategy to provide the closed induced frequent subtrees efficiently the empirical analysis reveals that our algorithm significantly outperforms the current state of the art algorithm prefixtreeispan zou lu zhang hu and zhou
data on the web in html tables is mostly structured but we usually do not know the structure in advance thus we cannot directly query for data of interest we propose solution to this problem based on document independent extraction ontologies our solution entails elements of table understanding data integration and wrapper creation table understanding allows us to find tables of interest within web page recognize attributes and values within the table pair attributes with values and form records data integration techniques allow us to match source records with target schema ontologically specified wrappers allow us to extract data from source records into target schema experimental results show that we can successfully locate data of interest in tables and map the data from source html tables with unknown structure to given target database schema we can thus directly query source data with unknown structure through known target schema
recent proposals for chip multiprocessors cmps advocate speculative or implicit threading in which the hardware employs prediction to peel off instruction sequences ie implicit threads from the sequential execution stream and speculatively executes them in parallel on multiple processor cores these proposals augment conventional multiprocessor which employs explicit threading with the ability to handle implicit threads current proposals focus on only implicitly threaded code sections this paper identifies for the first time the issues in combining explicit and implicit threading we present the multiplex architecture to combine the two threading models multiplex exploits the similarities between implicit and explicit threading and provides unified support for the two threading models without additional hardware multiplex groups subset of protocol states in an implicitly threaded cmp to provide write invalidate protocol for explicit threads using fully integrated compiler infrastructure for automatic generation of multiplex code this paper presents detailed performance analysis for entire benchmarks instead of just implicitly threaded sections as done in previous papers we show that neither threading model alone performs consistently better than the other across the benchmarks cmp with four dual issue cpus achieves speedup of and over one dual issue cpu using implicit only and explicit only threading respectively multiplex matches or outperforms the better of the two threading models for every benchmark and four cpu multiplex achieves speedup of our detailed analysis indicates that the dominant overheads in an implicitly threaded cmp are speculation state overflow due to limited cache capacity and load imbalance and data dependences in fine grain threads
task scheduling is an essential aspect of parallel programming most heuristics for this np hard problem are based on simple system model that assumes fully connected processors and concurrent interprocessor communication hence contention for communication resources is not considered in task scheduling yet it has strong influence on the execution time of parallel program this paper investigates the incorporation of contention awareness into task scheduling new system model for task scheduling is proposed allowing us to capture both end point and network contention to achieve this the communication network is reflected by topology graph for the representation of arbitrary static and dynamic networks the contention awareness is accomplished by scheduling the communications represented by the edges in the task graph onto the links of the topology graph edge scheduling is theoretically analyzed including aspects like heterogeneity routing and causality the proposed contention aware scheduling preserves the theoretical basis of task scheduling it is shown how classic list scheduling is easily extended to this more accurate system model experimental results show the significantly improved accuracy and efficiency of the produced schedules
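the list scheduling extension described above can be illustrated with a small python sketch this is not the paper's algorithm it assumes a single shared link instead of an arbitrary topology graph and a crude weight based priority but it shows the key step of reserving time on the link for every inter processor edge so that contention delays the data ready time of the receiving task all names are hypothetical

def earliest_slot(busy, ready, length):
    """first start time >= ready such that [start, start + length) hits no busy interval"""
    start = ready
    for b, e in sorted(busy):
        if start + length <= b:
            break                         # the communication fits before this reservation
        if start < e:
            start = e                     # otherwise push the start past this reservation
    return start

def contention_aware_schedule(tasks, deps, comm, n_procs):
    """tasks: {task: compute time}; deps: {task: [preds]}; comm: {(pred, task): transfer time}"""
    proc_free = [0.0] * n_procs           # when each processor becomes idle
    link_busy = []                        # reserved intervals on the single shared link
    finish, placed, done = {}, {}, set()
    priority = sorted(tasks, key=lambda t: -tasks[t])    # crude priority: heaviest task first
    while len(done) < len(tasks):
        t = next(x for x in priority
                 if x not in done and all(p in done for p in deps.get(x, [])))
        best = None
        for p in range(n_procs):          # try every processor, keep the earliest start time
            data_ready, reservations = 0.0, []
            for pred in deps.get(t, []):
                if placed[pred] == p:
                    data_ready = max(data_ready, finish[pred])
                else:                     # remote predecessor: schedule the edge on the link
                    c = comm.get((pred, t), 0.0)
                    s = earliest_slot(link_busy + reservations, finish[pred], c)
                    reservations.append((s, s + c))
                    data_ready = max(data_ready, s + c)
            start = max(proc_free[p], data_ready)
            if best is None or start < best[0]:
                best = (start, p, reservations)
        start, p, reservations = best
        link_busy += reservations
        proc_free[p] = start + tasks[t]
        finish[t], placed[t] = proc_free[p], p
        done.add(t)
    return placed, finish

tasks = {'a': 2.0, 'b': 3.0, 'c': 3.0, 'd': 1.0}
deps = {'b': ['a'], 'c': ['a'], 'd': ['b', 'c']}
comm = {('a', 'b'): 1.0, ('a', 'c'): 1.0, ('b', 'd'): 2.0, ('c', 'd'): 2.0}
print(contention_aware_schedule(tasks, deps, comm, n_procs=2))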
we revisit the shortest path problem in asynchronous duty cycled wireless sensor networks which exhibit time dependent features we model the time varying link cost and distance from each node to the sink as periodic functions we show that the time cost function satisfies the fifo property which makes the time dependent shortest path problem solvable in polynomial time using the synchronizer we propose fast distributed algorithm to build all to one shortest paths with polynomial message complexity and time complexity the algorithm determines the shortest paths for all discrete times with single execution in contrast with multiple executions needed by previous solutions we further propose an efficient distributed algorithm for time dependent shortest path maintenance the proposed algorithm is loop free with low message complexity and low space complexity of maxdeg where maxdeg is the maximum degree for all nodes the performance of our solution is evaluated under diverse network configurations the results suggest that our algorithm is more efficient than previous solutions in terms of message complexity and space complexity
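a minimal centralized python sketch of the time dependent relaxation implied above is given below it is not the distributed algorithm of the paper it assumes integer link delays that are periodic with period P and computes the travel delay to the sink for every node and every discrete start time in one fixed point iteration

import math

def time_dependent_delays(nodes, w, sink, P):
    """w[(u, v)] = list of length P: integer delay of link (u, v) when leaving u at time t mod P
    returns D[v][t] = shortest travel delay from v to the sink when starting at time t"""
    D = {v: [0 if v == sink else math.inf for _ in range(P)] for v in nodes}
    changed = True
    while changed:                        # bellman-ford style relaxation to a fixed point
        changed = False
        for (u, v), delays in w.items():
            for t in range(P):
                d = delays[t]
                cand = d + D[v][(t + d) % P]    # delay of the hop plus delay from the arrival time
                if cand < D[u][t]:
                    D[u][t] = cand
                    changed = True
    return D

# toy network: 3 nodes, sink 'c', link delays vary with period P = 2
w = {('a', 'b'): [1, 3], ('b', 'c'): [2, 1], ('a', 'c'): [5, 5]}
print(time_dependent_delays(['a', 'b', 'c'], w, sink='c', P=2))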
the semantic web not only contains resources but also includes the heterogeneous relationships among them which is sharply distinguished from the current web as the growth of the semantic web specialized search techniques are of significance in this paper we present rss framework for enabling ranked semantic search on the semantic web in this framework the heterogeneity of relationships is fully exploited to determine the global importance of resources in addition the search results can be greatly expanded with entities most semantically related to the query thus able to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query the proposed semantic search model which supports inference is very different from traditional keyword based search methods moreover rss also distinguishes from many current methods of accessing the semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder the experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard pagerank algorithm on the semantic web
when assessing the quality and maintainability of large code bases tools are needed for extracting several facts from the source code such as architecture structure code smells and quality metrics moreover these facts should be presented in such ways so that one can correlate them and find outliers and anomalies we present solidfx an integrated reverse engineering environment ire for and solidfx was specifically designed to support code parsing fact extraction metric computation and interactive visual analysis of the results in much the same way ides and design tools offer for the forward engineering pipeline in the design of solidfx we adapted and extended several existing code analysis and data visualization techniques to render them scalable for handling code bases of millions of lines in this paper we detail several design decisions taken to construct solidfx we also illustrate the application of our tool and our lessons learnt in using it in several types of analyses of real world industrial code bases including maintainability and modularity assessments detection of coding patterns and complexity analyses
the growing aging population faces number of challenges including rising medical cost inadequate number of medical doctors and healthcare professionals as well as higher incidence of misdiagnosis there is an increasing demand for better healthcare support for the elderly and one promising solution is the development of context aware middleware infrastructure for pervasive health wellness care this allows the accurate and timely delivery of health medical information among the patients doctors and healthcare workers through widespread deployment of wireless sensor networks and mobile devices in this paper we present our design and implementation of such context aware middleware for pervasive homecare camph the middleware offers several key enabling system services that consist of pp based context query processing context reasoning for activity recognition and context aware service management it can be used to support the development and deployment of various homecare services for the elderly such as patient monitoring location based emergency response anomalous daily activity detection pervasive access to medical data and social networking we have developed prototype of the middleware and demonstrated the concept of providing continuing care to an elderly with the collaborative interactions spanning multiple physical spaces person home office and clinic the results of the prototype show that our middleware approach achieves good efficiency of context query processing and good accuracy of activity recognition
the simulation of computer networks requires accurate models of user behavior to this end we present empirical models of end user network traffic derived from the analysis of neti home data there are two forms of models presented the first models traffic for specific tcp or udp port the second models all tcp or udp traffic for an end user these models are meant to be network independent and contain aspects such as bytes sent bytes received and user think time the empirical models derived in this study can then be used to enable more realistic simulations of computer networks
energy usage has been an important concern in recent research on online scheduling in this paper we extend the study of the tradeoff between flow time and energy from the single processor setting to the multi processor setting our main result is an analysis of simple non migratory online algorithm called crr classified round robin on processors showing that its flow time plus energy is within times of the optimal non migratory offline algorithm when the maximum allowable speed is slightly relaxed this result still holds even if the comparison is made against the optimal migratory offline algorithm the competitive ratio increases by factor of as special case our work also contributes to the traditional online flow time scheduling specifically for minimizing flow time only crr can yield competitive ratio one or even arbitrarily smaller than one when using sufficiently faster processors prior to our work similar result is only known for online algorithms that needs migration while the best non migratory result can achieve an competitive ratio the above result stems from an interesting observation that there always exists some optimal migratory schedule that can be converted in an offline sense to non migratory schedule with moderate increase in flow time plus energy more importantly this non migratory schedule always dispatches jobs in the same way as crr
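the dispatching rule suggested by the name classified round robin can be sketched as follows jobs are grouped into geometric size classes and jobs of the same class are sent to processors in round robin order this python fragment only illustrates that dispatch step the speed scaling and the competitive analysis are not modeled and the class parameter alpha is an assumption

import math
from collections import defaultdict

def crr_dispatch(jobs, m, alpha=2.0):
    """jobs: list of (job_id, size) in release order -> {job_id: processor index}"""
    next_proc = defaultdict(int)              # round robin pointer kept per size class
    assignment = {}
    for job_id, size in jobs:
        k = math.floor(math.log(size, alpha))  # geometric size class of the job
        assignment[job_id] = next_proc[k] % m
        next_proc[k] += 1
    return assignment

print(crr_dispatch([('j1', 1.0), ('j2', 1.5), ('j3', 4.0), ('j4', 1.2)], m=2))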
in this paper we describe legal core ontology that is part of the legal knowledge interchange format knowledge representation formalism that enables the translation of legal knowledge bases written in different representation formats and formalisms legal core ontology can play an important role in the translation of existing legal knowledge bases to other representation formats in particular as the basis for articulate knowledge serving this requires that the ontology has firm grounding in commonsense and is developed in principled manner we describe the theory and methodology underlying the lkif core ontology compare it with other ontologies introduce the concepts it defines and discuss its use in the formalisation of an eu directive
after some general remarks about program verification we introduce separation logic novel extension of hoare logic that can strengthen the applicability and scalability of program verification for imperative programs that use shared mutable data structures or shared memory concurrency
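as an illustration of why separation logic scales to shared mutable data the frame rule from the standard separation logic literature is reproduced below in latex notation it is not quoted from this text

\[
\frac{\{P\}\; C \;\{Q\}}{\{P \ast R\}\; C \;\{Q \ast R\}}
\qquad \text{provided } C \text{ does not modify any free variable of } R
\]

it states that a proof about the footprint of C can be reused unchanged in any larger heap described by R which is what makes local reasoning about shared mutable data structures possible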
we present probabilistic model of user affect designed to allow an intelligent agent to recognise multiple user emotions during the interaction with an educational computer game our model is based on probabilistic framework that deals with the high level of uncertainty involved in recognizing variety of user emotions by combining in dynamic bayesian network information on both the causes and effects of emotional reactions the part of the framework that reasons from causes to emotions diagnostic model implements theoretical model of affect the occ model which accounts for how emotions are caused by one’s appraisal of the current context in terms of one’s goals and preferences the advantage of using the occ model is that it provides an affective agent with explicit information not only on which emotions user feels but also why thus increasing the agent’s capability to effectively respond to the users emotions the challenge is that building the model requires having mechanisms to assess user goals and how the environment fits them form of plan recognition in this paper we illustrate how we built the predictive part of the affective model by combining general theories with empirical studies to adapt the theories to our target application domain we then present results on the model’s accuracy showing that the model achieves good accuracy on several of the target emotions we also discuss the model’s limitations to open the ground for the next stage of the work ie complementing the model with diagnostic information
this paper evaluates the combination of two methods for adapting bipedal locomotion to explore virtual environments displayed on head mounted displays hmds within the confines of limited tracking spaces we combine method of changing the optic flow of locomotion effectively scaling the translational gain with method of intervening and manipulating user’s locations in physical space while preserving their spatial awareness of the virtual space this latter technique is called resetting in two experiments we evaluate both scaling the translational gain and resetting while subject locomotes along path and then turns to face remembered object we find that the two techniques can be effectively combined although there is cognitive cost to resetting
an authoring methodology and set of checking tools let authors specify the spatial and temporal features of an application and verify the application prior to its execution the checking tools include an animation tool spatial and temporal layouts and the execution table
business process management is tightly coupled with service oriented architecture as business processes orchestrate services for business collaboration at logical level given the complexity of business processes and the variety of users it is sought after feature to show business process with different views so as to cater for the diverse interests authority levels etc of users this paper presents framework named flexview to support process abstraction and concretisation novel model is proposed to characterise the structural components of business process and describe the relations between these components two algorithms are developed to formally illustrate the realisation of process abstraction and concretisation in compliance with the defined consistency rules prototype is also implemented with ws bpel to prove the applicability of the approach
as usual the sigops workshop provided great platform for interesting discussion among other things controversy arose around the usefulness of causal ordering in distributed system in this paper explain causality in non technical terms and enumerate some of the most prevalent misconceptions that surrounded causality next present some important examples where causal delivery is necessary and sufficient ordering of events
dynamic optimizers modify the binary code of programs at runtime by profiling and optimizing certain aspects of the execution we present completely software based framework that dynamically optimizes programs for object based distributed shared memory dsm systems in dsm systems reducing the number of messages between nodes is crucial prefetching transfers data in advance from the storage node to the local node so that communication is minimized our framework uses profiler and dynamic binary rewriter that monitors the access behavior of the application and places prefetches where they are beneficial to speed up the application in addition we adapt the number of prefetches per request to best fit the application’s behavior evaluation shows that the performance of our system is better than manual prefetching the number of messages sent decreases by up to performance gains of up to can be observed on the benchmarks
we present novel approach to parameterize mesh with disk topology to the plane in shape preserving manner our key contribution is local global algorithm which combines local mapping of each triangle to the plane using transformations taken from restricted set with global stitch operation of all triangles involving sparse linear system the local transformations can be taken from variety of families eg similarities or rotations generating different types of parameterizations in the first case the parameterization tries to force each triangle to be an as similar as possible version of its counterpart this is shown to yield results identical to those of the lscm algorithm in the second case the parameterization tries to force each triangle to be an as rigid as possible version of its counterpart this approach preserves shape as much as possible it is simple effective and fast due to pre factoring of the linear system involved in the global phase experimental results show that our approach provides almost isometric parameterizations and obtains more shape preserving results than other state of the art approaches we present also more general hybrid parameterization model which provides continuous spectrum of possibilities controlled by single parameter the two cases described above lie at the two ends of the spectrum we generalize our local global algorithm to compute these parameterizations the local phase may also be accelerated by parallelizing the independent computations per triangle
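the local phase of such a local global scheme can be sketched with numpy as below the only ingredient shown is the projection of a triangle's 2x2 jacobian onto the closest rotation which corresponds to the as rigid as possible case the global stitching solve and the similarity variant are omitted and the example jacobian is made up

import numpy as np

def closest_rotation_2x2(J):
    """nearest rotation matrix to J in the frobenius sense, via the svd (polar projection)"""
    U, _, Vt = np.linalg.svd(J)
    R = U @ Vt
    if np.linalg.det(R) < 0:             # avoid reflections: flip the weakest singular direction
        U[:, -1] *= -1
        R = U @ Vt
    return R

J = np.array([[1.2, 0.1],
              [-0.05, 0.9]])             # jacobian of a slightly non-rigidly flattened triangle
print(closest_rotation_2x2(J))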
inserting virtual objects in real camera images with correct lighting is an active area of research current methods use high dynamic range camera with fish eye lens to capture the incoming illumination the main problem with this approach is the limitation to distant illumination therefore the focus of our work is real time description of both near and far field illumination for interactive movement of virtual objects in the camera image of real room the daylight which is coming in through the windows produces spatially varying distribution of indirect light in the room therefore near field description of incoming light is necessary our approach is to measure the daylight from outside and to simulate the resulting indirect light in the room to accomplish this we develop special dynamic form of the irradiance volume for real time updates of indirect light in the room and combine this with importance sampling and shadow maps for light from outside this separation allows object movements with interactive frame rates fps to verify the correctness of our approach we compare images of synthetic objects with real objects
denial of service dos attacks constitute one of the major threats and among the hardest security problems in today’s internet of particular concern are distributed denial of service ddos attacks whose impact can be proportionally severe with little or no advance warning ddos attack can easily exhaust the computing and communication resources of its victim within short period of time because of the seriousness of the problem many defense mechanisms have been proposed to combat these attacks this paper presents structural approach to the ddos problem by developing classification of ddos attacks and ddos defense mechanisms furthermore important features of each attack and defense system category are described and advantages and disadvantages of each proposed scheme are outlined the goal of the paper is to place some order into the existing attack and defense mechanisms so that better understanding of ddos attacks can be achieved and subsequently more efficient and effective algorithms techniques and procedures to combat these attacks may be developed
novel space time representation str of iterative algorithms for their systematic mapping onto regular processor arrays is proposed timing information is introduced in the dependence graph dg by the definition and the construction of the space time dg stdg any variable instance of the loop body independently of the number of the loop indices is characterized by an integer vector composed by its indices as space part and an additional time index representing its execution time according to preliminary timing the main advantage of the str is that the need for the uniformization of the algorithm is avoided moreover it is proven that in the stdg dependence vectors having opposite directions do not exist and therefore linear mapping of the stdg onto the desired processor array can always be derived efficient and regular architectures are produced by applying the str to the original description of the warshall floyd algorithm for the algebraic path problem
we propose an algorithm to improve the quality of depth maps used for multi view stereo mvs many existing mvs techniques make use of two stage approach which estimates depth maps from neighbouring images and then merges them to extract final surface often the depth maps used for the merging stage will contain outliers due to errors in the matching process traditional systems exploit redundancy in the image sequence the surface is seen in many views in order to make the final surface estimate robust to these outliers in the case of sparse data sets there is often insufficient redundancy and thus performance degrades as the number of images decreases in order to improve performance in these circumstances it is necessary to remove the outliers from the depth maps we identify the two main sources of outliers in top performing algorithm spurious matches due to repeated texture and matching failure due to occlusion distortion and lack of texture we propose two contributions to tackle these failure modes firstly we store multiple depth hypotheses and use spatial consistency constraint to extract the true depth secondly we allow the algorithm to return an unknown state when the true depth estimate cannot be found by combining these in discrete label mrf optimisation we are able to obtain high accuracy depth maps with low numbers of outliers we evaluate our algorithm in multi view stereo framework and find it to confer state of the art performance with the leading techniques in particular on the standard evaluation sparse data sets
in this paper we describe new technique called pipeline spectroscopy and use it to measure the cost of each cache miss the cost of miss is displayed graphed as histogram which represents precise readout showing detailed visualization of the cost of each cache miss throughout all levels of the memory hierarchy we call the graphs spectrograms because they reveal certain signature features of the processor’s memory hierarchy the pipeline and the miss pattern itself next we provide two examples that use spectroscopy to optimize the processor’s hardware or application’s software the first example demonstrates how miss spectrogram can aid software designers in analyzing the performance of an application the second example uses miss spectrogram to analyze bus queueing our experiments show that performance gains of up to are possible detailed analysis of spectrogram leads to much greater insight in pipeline dynamics including effects due to miss cluster miss overlap prefetching and miss queueing delays
in this paper we present an observational case study at major teaching hospital which both inspired and gave us valuable feedback on the design and development of lwoad lwoad is denotational language we propose to support users of an electronic document system in declaratively expressing specifying and implementing computational mechanisms that fulfill coordinative requirements our focus addresses the user friendly and formal expression of local coordinative practices the agile mocking up of corresponding functionalities the full deployment of coordination oriented and context aware behaviors into legacy electronic document systems we give examples of lwoad mechanisms taken from the case study and discuss their impact for the eud of coordinative functionalities
the rapid increase in ic design complexity and wide spread use of intellectual property ip blocks have made the so called mixed size placement very important topic in recent years although several algorithms have been proposed for mixed sized placements most of them primarily focus on the global placement aspect in this paper we propose three step approach named xdp for mixed size detailed placement first combination of constraint graph and linear programming is used to legalize macros then an enhanced greedy method is used to legalize the standard cells finally sliding window based cell swapping is applied to further reduce wirelength the impact of individual techniques is analyzed and quantified experiments show that when applied to the set of global placement results generated by aplace xdp can produce wirelength comparable to the native detailed placement of aplace and shorter wirelength compared to fengshui when applied to the set of global placements generated by mpl xdp is the only detailed placement that successfully produces legal placement for all the examples while aplace and fengshui fail for and of the examples for cases where legal placements can be compared the wirelength produced by xdp is shorter by on average compared to aplace and fengshui furthermore xdp displays higher robustness than the other tools by covering broader spectrum of examples by different global placement tools
efficient access to web content remains elusive for individuals accessing the web using assistive technology previous efforts to improve web accessibility have focused on developer awareness technological improvement and legislation but these approaches have left remaining concerns first while many tools can help produce accessible content these tools are generally difficult to integrate into existing developer workflows and rarely offer specific suggestions that developers can implement second tools that automatically improve web content for users generally solve specific problems and are difficult to combine and use on diversity of existing assistive technology finally although blind web users have proven adept at overcoming the shortcomings of the web and existing tools they have been only marginally involved in improving the accessibility of their own web experience as first step toward addressing these concerns we introduce accessmonkey common scripting framework that web users web developers and web researchers can use to collaboratively improve accessibility this framework advances the idea that javascript and dynamic web content can be used to improve inaccessible content instead of being cause of it using accessmonkey web users and developers on different platforms with potentially different goals can collaboratively make the web more accessible in this paper we first present the accessmonkey framework describe three implementations of it that we have created and offer several example scripts that demonstrate its utility we conclude by discussing future extensions of this work that will provide efficient access to scripts as users browse the web and allow non technical users to be involved in creating scripts
this paper motivated by functional brain imaging applications is interested in the discovery of stable spatio temporal patterns this problem is formalized as multi objective multi modal optimization problem on one hand the target patterns must show good stability in wide spatio temporal region antagonistic objectives on the other hand experts are interested in finding all such patterns global and local optima the proposed algorithm termed miner is empirically validated on artificial and real world datasets it shows good performances and scalability detecting target spatiotemporal patterns within minutes from mo datasets
recently number of hybrid systems have been proposed to combine the advantages of shared nothing and shared everything concepts for computing relational join operations most of these proposed systems however presented few analytical results and have produced limited or no implementations on actual multiprocessors in this paper we present parallel join algorithm with load balancing for hybrid system that combines both shared nothing and shared everything architectures we derive an analytical model for the join algorithm on this architecture and validate it using both hardware software simulations and actual experimentations we study the performance of the join on the hybrid system for wide range of system parameter values we conclude that the hybrid system outperforms both shared nothing and shared everything architectures
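a sequential python sketch of the partition assign join structure described above is given below it is not the paper's algorithm and it ignores the shared everything side of the hybrid design it only shows hash partitioning both relations balancing partitions across nodes by a greedy heaviest first rule and joining each partition with an in memory hash table

from collections import defaultdict

def partitioned_join(R, S, key_r, key_s, n_nodes, n_parts=8):
    parts_r, parts_s = defaultdict(list), defaultdict(list)
    for t in R:
        parts_r[hash(t[key_r]) % n_parts].append(t)
    for t in S:
        parts_s[hash(t[key_s]) % n_parts].append(t)
    # load balancing: assign the heaviest partition to the least loaded node first
    load, owner = [0] * n_nodes, {}
    for p in sorted(range(n_parts), key=lambda p: -(len(parts_r[p]) + len(parts_s[p]))):
        node = min(range(n_nodes), key=lambda n: load[n])
        owner[p] = node
        load[node] += len(parts_r[p]) + len(parts_s[p])
    # each node joins its own partitions (simulated here one partition after another)
    result = []
    for p in range(n_parts):
        index = defaultdict(list)
        for r in parts_r[p]:
            index[r[key_r]].append(r)
        for s in parts_s[p]:
            for r in index.get(s[key_s], []):
                result.append((r, s))
    return result, owner, load

R = [{'id': i, 'x': i % 3} for i in range(10)]
S = [{'rid': i % 4, 'y': i} for i in range(6)]
print(len(partitioned_join(R, S, 'id', 'rid', n_nodes=2)[0]))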
we present new non blocking array based shared stack and queue implementations we sketch proofs of correctness and amortized time analyses for the algorithms to the best of our knowledge our stack algorithm is the first practical array based one and it is the first time that bounded counter values are employed to implement shared stack and queue we verify the correctness of our algorithms by the spin model checker and compare our algorithms to other algorithms experimentally
while significant amount of research efforts has been reported on developing algorithms based on joins and semijoins to tackle distributed query processing there is relatively little progress made toward exploring the complexity of the problems studied as result proving np hardness of or devising polynomial time algorithms for certain distributed query optimization problems has been elaborated upon by many researchers however due to its inherent difficulty the complexity of the majority of problems on distributed query optimization remains unknown in this paper we generally characterize the distributed query optimization problems and provide framework to explore their complexity as it will be shown most distributed query optimization problems can be transformed into an optimization problem comprising set of binary decisions termed sum product optimization spo problem we first prove spo is np hard in light of the np completeness of well known problem knapsack knap then using this result as basis we prove that five classes of distributed query optimization problems which cover the majority of distributed query optimization problems previously studied in the literature are np hard by polynomially reducing spo to each of them the detail for each problem transformation is derived we not only prove the conjecture that many prior studies relied upon but also provide framework for future related studies
it is often argued that usability problems should be identified as early as possible during software development but many usability evaluation methods do not fit well in early development activities we propose method for usability evaluation of use cases widely used representation of design ideas produced early in software development processes the method proceeds by systematic inspection of use cases with reference to set of guidelines for usable design to validate the method four evaluators inspected set of use cases for health care application the usability problems predicted by the evaluators were compared to the result of conventional think aloud test about one fourth of the problems were identified by both think aloud testing and use case inspection about half of the predicted problems not found by think aloud testing were assessed as providing useful input to early development qualitative data on the evaluators experience using the method are also presented on this background we argue that use case inspection has promising potential and discuss its limitations
compilers are critical for embedded systems and high performance computing compiler infrastructure provides an infrastructure for rapid development of high quality compilers based on main components of compiler infrastructures this paper reviews representative compiler infrastructure products and summarizes their features it focuses on an insight analysis of the key techniques for building the compiler back ends and presents our probes into compiler infrastructures for typical issues
intrusion prevention system ips has been an effective tool to detect and prevent unwanted attempts which are mainly through network and system vulnerabilities at accessing and manipulating computer systems intrusion detection and prevention are two main functions of ips as attacks are becoming massive and complex the traditional centralized ipses are incapable of detecting all those attempts the existing distributed ipses mainly based on mobile agent have some serious problems such as weak security of mobile agents response latency large code size in this paper we propose customized intrusion prevention system vmfence in distributed virtual computing environment to simplify the complexity of the management in vmfence the states of detection processes vary with those of virtual machines vms which are described by deterministic finite automata dfa the detection processes each of which detects one virtual machine reside in privileged virtual machine the processes run synchronously and outside of vms in order to achieve high performance and security the experimental results also show vmfence has higher detection efficiency than traditional intrusion detection systems and little impact on the performance of the monitored vms
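the state handling described above can be pictured with a tiny python dfa the states events and transitions below are illustrative they are not taken from the paper

# a per-vm detection process driven by a deterministic finite automaton on vm lifecycle events
DFA = {
    ('idle',       'vm_created'):   'monitoring',
    ('monitoring', 'vm_paused'):    'suspended',
    ('suspended',  'vm_resumed'):   'monitoring',
    ('monitoring', 'vm_destroyed'): 'idle',
    ('suspended',  'vm_destroyed'): 'idle',
}

def run_detector(events, state='idle'):
    for e in events:
        state = DFA.get((state, e), state)   # events undefined in the current state are ignored
        print(f"event {e:13s} -> detector state {state}")
    return state

run_detector(['vm_created', 'vm_paused', 'vm_resumed', 'vm_destroyed'])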
the rewrite based approach provides executable specifications for security policies which can be independently designed verified and then anchored on programs using modular discipline in this paper we describe how to perform queries over these rule based policies in order to increase the trust of the policy author on the correct behavior of the policy the analysis we provide is founded on the strategic narrowing process which provides both the necessary abstraction for simulating executions of the policy over access requests and the mechanism for solving what if queries from the security administrator we illustrate this general approach by the analysis of firewall system policy
power is an important design constraint in embedded computing systems to meet the power constraint microarchitecture and hardware designed to achieve high performance need to be revisited from both performance and power angles this paper studies one of them branch predictor as well known branch prediction is critical to exploit instruction level parallelism effectively but may incur additional power consumption due to the hardware resource dedicated for branch prediction and the extra power consumed on mispredicted branches this paper explores the design space of branch prediction mechanisms and tries to find the most beneficial one to realize low power embedded processor the sample processor studied is godson like processor which is dual issue out of order processor with deep pipeline supporting mips instruction set
we propose novel critiquing based recommender interface the hybrid critiquing interface that integrates the user self motivated critiquing facility to compensate for the limitations of system proposed critiques the results from our user study show that the integration of such self motivated critiquing support enables users to achieve higher level of decision accuracy while consuming less cognitive effort in addition users expressed higher subjective opinions of the hybrid critiquing interface than the interface simply providing system proposed critiques and they would more likely return to it for future use
many activities in business process management such as process retrieval process mining and process integration need to determine the similarity or the distance between two processes although several approaches have recently been proposed to measure the similarity between business processes neither the definitions of the similarity notion between processes nor the measure methods have gained wide recognition in this paper we define the similarity and the distance based on firing sequences in the context of workflow nets wf nets as the unified reference concepts however to many wf nets either the number of full firing sequences or the length of single firing sequence is infinite since transition adjacency relations tars can be seen as the genes of the firing sequences which describe transition orders appearing in all possible firing sequences we propose practical similarity definition based on the tar sets of two processes it is formally shown that the corresponding distance measure between processes is metric an algorithm using model reduction techniques for the efficient computation of the measure is also presented experimental results involving comparison of different measures on artificial processes and evaluations on clustering real life processes validate our approach
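a small python sketch of the tar idea follows the tar set is computed here by scanning explicitly enumerated firing sequences which sidesteps the reduction techniques of the paper and the similarity is the ratio of shared to combined adjacency pairs whose complement is the jaccard distance and hence a metric

def tar_set(firing_sequences):
    """ordered pairs of transitions that occur directly next to each other in some firing sequence"""
    return {(s[i], s[i + 1]) for s in firing_sequences for i in range(len(s) - 1)}

def tar_similarity(seqs_a, seqs_b):
    a, b = tar_set(seqs_a), tar_set(seqs_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# distance = 1 - similarity is the jaccard distance on tar sets, which is a metric
p1 = [['a', 'b', 'c'], ['a', 'c', 'b']]
p2 = [['a', 'b', 'c']]
print(tar_similarity(p1, p2))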
frequent pattern mining has been studied extensively and has many useful applications however frequent pattern mining often generates too many patterns to be truly efficient or effective in many applications it is sufficient to generate and examine frequent patterns with sufficiently good approximation of the support frequency instead of in full precision such compact but close enough frequent pattern base is called condensed frequent pattern base in this paper we propose and examine several alternatives for the design representation and implementation of such condensed frequent pattern bases several algorithms for computing such pattern bases are proposed their effectiveness at pattern compression and methods for efficiently computing them are investigated systematic performance study is conducted on different kinds of databases and demonstrates the effectiveness and efficiency of our approach in handling frequent pattern mining in large databases
ultra low voltage operation has recently drawn significant attention due to its large potential energy savings however typical design practices used for super threshold operation are not necessarily compatible with the low voltage regime here radically different guidelines may be needed since existing process technologies have been optimized for super threshold operation we therefore study the selection of the optimal technology in ultra low voltage designs to achieve minimum energy and minimum variability which are among foremost concerns we investigate five industrial technologies from nm to nm we demonstrate that mature technologies are often the best choice in very low voltage applications saving as much as in total energy consumption compared to poorly selected technology in parallel the effect of technology choice on variability is investigated when operating at the energy optimal design point the results show up to improvement in delay variation due to global process shift and mismatch when using the most advanced technologies despite their large variability at nominal vdd
self adjusting computation offers language centric approach to writing programs that can automatically respond to modifications to their data eg inputs except for several domain specific implementations however all previous implementations of self adjusting computation assume mostly functional higher order languages such as standard ml prior to this work it was not known if self adjusting computation can be made to work with low level imperative languages such as without placing undue burden on the programmer we describe the design and implementation of ceal based language for self adjusting computation the language is fully general and extends with small number of primitives to enable writing self adjusting programs in style similar to conventional programs we present efficient compilation techniques for translating ceal programs into that can be compiled with existing compilers using primitives supplied by run time library for self adjusting computation we implement the proposed compiler and evaluate its effectiveness our experiments show that ceal is effective in practice compiled self adjusting programs respond to small modifications to their data by orders of magnitude faster than recomputing from scratch while slowing down from scratch run by moderate constant factor compared to previous work we measure significant space and time improvements
in recent years privacy model called anonymity has gained popularity in the microdata releasing as the microdata may contain multiple sensitive attributes about an individual the protection of multiple sensitive attributes has become an important problem different from the existing models of single sensitive attribute extra associations among multiple sensitive attributes should be invested two kinds of disclosure scenarios may happen because of logical associations the diversity is checked to prevent the foregoing disclosure risks with an requirement definition used to ensure the diversity requirement at last two step greedy generalization algorithm is used to carry out the multiple sensitive attributes processing which deal with quasi identifiers and sensitive attributes respectively we reduce the overall distortion by the measure of masking sa
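the per class diversity check described above can be sketched in python as follows records are grouped by their generalized quasi identifier values and every group must contain at least l distinct values for each sensitive attribute the concrete requirement in the paper may be stronger and the attribute names and data are invented

from collections import defaultdict

def satisfies_l_diversity(records, qi_attrs, sensitive_attrs, l):
    classes = defaultdict(list)
    for r in records:                     # equivalence classes share generalized quasi identifiers
        classes[tuple(r[a] for a in qi_attrs)].append(r)
    for group in classes.values():
        for s in sensitive_attrs:         # every sensitive attribute must be diverse in every class
            if len({r[s] for r in group}) < l:
                return False
    return True

data = [
    {'zip': '130**', 'age': '2*', 'disease': 'flu',    'income': 'low'},
    {'zip': '130**', 'age': '2*', 'disease': 'cancer', 'income': 'high'},
    {'zip': '130**', 'age': '2*', 'disease': 'flu',    'income': 'mid'},
]
print(satisfies_l_diversity(data, ['zip', 'age'], ['disease', 'income'], l=2))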
ragnarok is an experimental software development environment that focuses on enhanced support for managerial activities in large scale software development taking the daily work of the software developer as its point of departure the main emphasis is support in three areas management navigation and collaboration the leitmotif is the software architecture which is extended to handle managerial data in addition to source code this extended software architecture is put under tight version and configuration management control and furthermore used as basis for visualisation preliminary results of using the ragnarok prototype in number of projects are outlined
many web information services utilize techniques of information extraction ie to collect important facts from the web to create more advanced services one possible method is to discover thematic information from the collected facts through text classification however most conventional text classification techniques rely on manual labelled corpora and are thus ill suited to cooperate with web information services with open domains in this work we present system named liveclassifier that can automatically train classifiers through web corpora based on user defined topic hierarchies due to its flexibility and convenience liveclassifier can be easily adapted for various purposes new web information services can be created to fully exploit it human users can use it to create classifiers for their personal applications the effectiveness of classifiers created by liveclassifier is well supported by empirical evidence
the amount and variety of data available electronically have dramatically increased in the last decade however data and documents are stored in different ways and do not usually show their internal structure in order to take full advantage of the topological structure of digital documents and particularly web sites their hierarchical organization should be exploited by introducing notion of query similar to the one used in database systems good approach in that respect is the one provided by graphical query languages originally designed to model object bases and later proposed for semistructured data like log the aim of this paper is to provide suitable graph based semantics to this language supporting both data structure variability and topological similarities between queries and document structures suite of operational semantics based on the notion of bisimulation is introduced both at the concrete level instances and at the abstract level schemata giving rise to semantic framework that benefits from the cross fertilization of tools originally designed in quite different research areas databases concurrency logics static analysis
multiprocessor interconnection networks may reach congestion with high traffic loads which prevents reaching the wished performance unfortunately many of the mechanisms proposed in the literature for congestion control either suffer from lack of robustness being unable to work properly with different traffic patterns or message lengths or detect congestion relying on global information that wastes some network bandwidth this paper presents family of mechanisms to avoid network congestion in wormhole networks all of them need only local information applying message throttling when it is required the proposed mechanisms use different strategies to detect network congestion and also apply different corrective actions the mechanisms are evaluated and compared for several network loads and topologies noticeably improving network performance with high loads but without penalizing network behavior for low and medium traffic rates where no congestion control is required
beyond certain number of cores multi core processing chips will require network on chip noc to interconnect the cores and overcome the limitations of bus nocs must be carefully designed to meet constraints like power consumption area and ultra low latencies although meshes with dor dimension order routing meet these constraints the need for partitioning eg virtual machines coherency domains and traffic isolation may prevent the use of dor routing also core heterogeneity and manufacturing and run time faults may lead to partially irregular topologies routing in these topologies is complex and previously proposed solutions required routing tables which drastically increase power consumption area and latency the exception is lbdr logic based distributed routing flexible routing method for irregular topologies that removes the need for using routing tables both at end nodes and switches thus achieving large savings in chip area and power consumption but lbdr lacks support for multicast and broadcast which are required to efficiently support cache coherence protocols both for single and multiple coherence domains
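for reference the baseline that lbdr generalizes is plain dimension order routing which needs no tables at all the python sketch below walks a packet across a 2d mesh with xy routing the lbdr connectivity and routing bits themselves are not reproduced here

def xy_route(cur, dst):
    """cur, dst: (x, y) switch coordinates -> output port chosen at the current switch"""
    cx, cy = cur
    dx, dy = dst
    if cx < dx: return 'east'             # correct the x dimension first
    if cx > dx: return 'west'
    if cy < dy: return 'north'            # then the y dimension
    if cy > dy: return 'south'
    return 'local'                        # the packet has reached its destination core

path, node = [], (0, 0)
while node != (2, 3):                     # walk a packet from (0, 0) to (2, 3)
    port = xy_route(node, (2, 3))
    path.append(port)
    node = {'east': (node[0] + 1, node[1]), 'west': (node[0] - 1, node[1]),
            'north': (node[0], node[1] + 1), 'south': (node[0], node[1] - 1)}[port]
print(path)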
geography is becoming increasingly important in web search search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries this is also of great commercial value as it enables location specific advertising and improved search for local businesses as result major search companies have invested significant resources into geographic search technologies also often called local search this paper studies geographic search queries ie text queries such as hotel new york that employ geographical terms in an attempt to restrict results to particular region or location our main motivation is to identify opportunities for improving geographical search and related technologies and we perform an analysis of million queries of the recently released aol query trace first we identify typical properties of geographic search geo queries based on manual examination of several thousand queries based on these observations we build classifier that separates the trace into geo and non geo queries we then investigate the properties of geo queries in more detail and relate them to web sites and users associated with such queries we also propose new taxonomy for geographic search queries
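a deliberately simple python sketch of separating geo from non geo queries is shown below it flags a query as geographic when it contains a term from a small gazetteer the paper's classifier uses a much richer feature set and the gazetteer entries here are invented

GAZETTEER = {'new york', 'london', 'paris', 'texas', 'seattle'}

def is_geo_query(query):
    q = ' ' + query.lower() + ' '         # pad so that gazetteer terms match whole words only
    return any(' ' + place + ' ' in q for place in GAZETTEER)

for q in ['hotel new york', 'python list comprehension', 'weather seattle']:
    print(q, '->', 'geo' if is_geo_query(q) else 'non-geo')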
electronic voting support systems should not focus only on ballot casting and recording instead user centered perspective should be adopted for the design of system that supports information gathering organizing and sharing deliberation decision making and voting relevant social science literature on political decision making and voting is used to develop requirements design concept is presented that supports extended information browsing using combined filtering from ballot materials and voter profiles the system supports information sharing and participation in electronic dialogues voters may interweave information browsing annotation contextualized discussion and ballot markup over extended time periods
we present statistical method that uses prediction modeling to decrease the temporally redundant data transmitted back to the sink the major novelties are fourfold first prediction model is fit to the sensor data second prediction error is utilized to adaptively update the model parameters using hypothesis testing third data transformation is proposed to bring the sensor sample series closer to weak stationarity finally an efficient implementation is presented we show that our proposed prediction error based hypothesis testing drastig method achieves low energy dissipation while keeping the prediction errors at user defined tolerable magnitudes based on real data experiments
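the suppression idea can be reduced to a few lines of python a sample is transmitted only when it deviates from the value the sink already has by more than a tolerance this constant value predictor is the simplest possible stand in for the fitted prediction model the hypothesis test and the stationarizing transformation used in the paper

def reduce_transmissions(samples, tolerance=0.5):
    last_sent, sent = None, []
    for t, x in enumerate(samples):
        if last_sent is None or abs(x - last_sent) > tolerance:
            sent.append((t, x))           # transmit; otherwise the sink keeps using last_sent
            last_sent = x
    return sent

data = [20.0, 20.1, 20.2, 20.1, 23.0, 23.1, 23.0, 20.5]
print(reduce_transmissions(data))         # only the samples that break the tolerance are reported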
when we attempted to introduce an extractive approach to company we were faced with challenging project situation where legacy applications did not have many commonalities among their implementations as they were developed independently by different teams without sharing common code base although there were not many structural similarities we expected to find similarities if we view them from the domain model perspective as they were in the same domain and were developed with the object oriented paradigm therefore we decided to place the domain model at the center of extraction and reengineering thus developing domain model based extractive method the method has been successfully applied to introduce software product line to set top box manufacturing company
previous research on mining sequential patterns mainly focused on discovering patterns from point based event data little effort has been put toward mining patterns from interval based event data where pair of time values is associated with each event kam and fu’s work in identified temporal relationships between two intervals according to these temporal relationships new variant of temporal patterns was defined for interval based event data unfortunately the patterns defined in this manner are ambiguous which means that the temporal relationships among events cannot be correctly represented in temporal patterns to resolve this problem we first define new kind of nonambiguous temporal pattern for interval based event data then the tprefixspan algorithm is developed to mine the new temporal patterns from interval based events the completeness and accuracy of the results are also proven the experimental results show that the efficiency and scalability of the tprefixspan algorithm are satisfactory furthermore to show the applicability and effectiveness of temporal pattern mining we execute experiments to discover temporal patterns from historical nasdaq data
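the building block for unambiguous interval patterns is a function that names the temporal relation between two interval events a python sketch follows the relation vocabulary is the usual allen style one and is coarser than whatever encoding the paper uses

def interval_relation(a, b):
    """a, b: (start, end) with start < end -> name of the temporal relation of a with respect to b"""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return 'before'
    if e2 < s1:  return 'after'
    if e1 == s2: return 'meets'
    if e2 == s1: return 'met-by'
    if (s1, e1) == (s2, e2): return 'equal'
    if s1 <= s2 and e2 <= e1: return 'contains'
    if s2 <= s1 and e1 <= e2: return 'during'
    return 'overlaps' if s1 < s2 else 'overlapped-by'

print(interval_relation((1, 4), (3, 6)), interval_relation((2, 5), (1, 8)))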
we present functional object calculus which solves the traditional conflict between matching based functional programming and object oriented programming by treating uniformly method invocations and functional constructor applications the key feature of the calculus is that each object remembers its history that is the series of method calls that created it histories allow us to classify objects in finer way than classes do the resulting calculus has simple syntax and is very expressive we give examples finally we define type system for the calculus and show its soundness typable programs do not produce matching or method invocation errors
we present simple local protocol pcover which provides partial but high coverage in sensor networks through pcover we demonstrate that it is feasible to maintain high coverage while significantly increasing coverage duration when compared with protocols that provide full coverage in particular we show that we are able to maintain coverage for duration that is times the duration for which existing protocols maintain full coverage through simulations we show that our protocol provides load balancing ie the desired level of coverage is maintained almost until the point where all sensors deplete their batteries we also show that pcover handles failure of sensors different coverage areas different node densities and different topologies and can be used for dynamically changing the level of coverage
virtual networks provide applications with the illusion of having their own dedicated high performance networks although network interfaces possess limited shared resources we present the design of large scale virtual network system and examine the integration of communication programming interface system resource management and network interface operation our implementation on cluster of workstations quantifies the impact of virtualization on small message latencies and throughputs shows full hardware performance is delivered to dedicated applications and time shared workloads and shows robust performance under demanding workloads that overcommit interface resources
fault free path in the dimensional hypercube with faulty vertices is said to be long if it has length at least similarly fault free cycle in is long if it has length at least if all faulty vertices are from the same bipartite class of such length is the best possible we show that for every set of at most faulty vertices in and every two fault free vertices and satisfying simple necessary condition on neighbors of and there exists long fault free path between and this number of faulty vertices is tight and improves the previously known results furthermore we show for every set of at most faulty vertices in where that has long fault free cycle this is first quadratic bound which is known to be asymptotically optimal
collaborative web search cws is community based approach to web search that supports the sharing of past result selections among group of related searchers so as to personalize result lists to reflect the preferences of the community as whole in this paper we present the results of recent live user trial which demonstrates how cws elicits high levels of participation and how the search activities of community of related users form type of social search network
existing methods place data or code in scratchpad memory ie spm by either relying on heuristics or resorting to integer programming or mapping it to graph coloring problem in this work the spm allocation problem is formulated as an interval coloring problem the key observation is that in many embedded applications arrays including structs as special case are often related in the following way for any two arrays their live ranges are often such that one is either disjoint from or contains the other as result array interference graphs are often superperfect graphs and optimal interval colorings for such array interference graphs are possible this has led to the development of two new spm allocation algorithms while differing in whether live range splits and spills are done sequentially or together both algorithms place arrays in spm based on examining the cliques in an interference graph in both cases we guarantee optimally that all arrays in an interference graph can be placed in spm if its size is no smaller than the clique number of the graph in the case that the spm is not large enough we rely on heuristics to split or spill live range until the graph is colorable our experiment results using embedded benchmarks show that our algorithms can outperform graph coloring when their interference graphs are superperfect or nearly so although graph coloring is admittedly more general and may also be effective to applications with arbitrary interference graphs
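two ingredients mentioned above can be sketched directly in python a sweep that computes the clique number of the array interference graph that is the peak total size of simultaneously live arrays and a placement of arrays at scratchpad offsets the first fit placement below is only a heuristic stand in for the paper's optimal algorithms and the example live ranges are invented

def clique_weight(arrays):
    """arrays: list of (name, size, start, end) with live range [start, end)"""
    events = []
    for _, size, s, e in arrays:
        events += [(s, size), (e, -size)]
    peak = cur = 0
    for _, delta in sorted(events):       # ends sort before starts at the same time point
        cur += delta
        peak = max(peak, cur)
    return peak

def first_fit_spm(arrays, spm_size):
    placed, layout = [], {}               # placed: (offset, size, start, end)
    for name, size, s, e in sorted(arrays, key=lambda a: a[2]):
        overlapping = [(o, sz) for o, sz, s2, e2 in placed if s < e2 and s2 < e]
        offset = 0
        for o, sz in sorted(overlapping): # slide past addresses used by live neighbours
            if offset + size <= o:
                break
            offset = max(offset, o + sz)
        if offset + size > spm_size:
            layout[name] = None           # spill: no room while its neighbours are live
        else:
            placed.append((offset, size, s, e))
            layout[name] = offset
    return layout

arrays = [('A', 4, 0, 10), ('B', 2, 2, 6), ('C', 2, 7, 9), ('D', 3, 12, 20)]
print(clique_weight(arrays), first_fit_spm(arrays, spm_size=6))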
this paper describes method for studying idioms based implementations of crosscutting concerns and our experiences with it in the context of real world large scale embedded software system in particular we analyse seemingly simple concern tracing and show that it exhibits significant variability despite the use of prescribed idiom we discuss the consequences of this variability in terms of how aspect oriented software development techniques could help prevent it how it paralyses automated migration efforts and which aspect language features are required in order to obtain precise and concise aspects additionally we elaborate on the representativeness of our results and on the usefulness of our proposed method
world wide web intelligent agent technology has provided researchers and practitioners such as those involved in information technology innovation knowledge management and technical collaboration with the ability to examine the design principles and performance characteristics of the various approaches to intelligent agent technology and to increase the cross fertilization of ideas on the development of autonomous agents and multi agent systems among different domains this study investigates the employment of intelligent agents in web based auction process with particular reference to the appropriateness of the agent software for the online auction task consumers value perception of the agent the effect of this consumer perception on their intention to use the tool and measure of consumer acceptance in the initial case study both consumers and web operators thought the use of software agents enhanced online auction efficiency and timeliness the second phase of the investigation established that consumer familiarity with the agent functionality was positively associated with seven dimensions online auction site’s task agent’s technology task technology fit perceived ease of use perceived usefulness perceived playfulness intention to use tool and negatively associated with perceived risk intelligent agents have the potential to release skilled operator time for the use of value adding tasks in the planning and expansion of online auctions
internet exchange points ixps are an important ingredient of the internet as level ecosystem logical fabric of the internet made up of about ases and their mutual business relationships whose primary purpose is to control and manage the flow of traffic despite the ixps critical role in this fabric little is known about them in terms of their peering matrices ie who peers with whom at which ixp and corresponding traffic matrices ie how much traffic do the different ases that peer at an ixp exchange with one another in this paper we report on an internet wide traceroute study that was specifically designed to shed light on the unknown ixp specific peering matrices and involves targeted traceroutes from publicly available and geographically dispersed vantage points based on our method we were able to discover and validate the existence of about ixp specific peering links nearly more links than were previously known in the process we also classified all known ixps depending on the type of information required to detect them moreover in view of the currently used inferred as level maps of the internet that are known to miss significant portion of the actual as relationships of the peer to peer type our study provides new method for augmenting these maps with ixp related peering links in systematic and informed manner
the problem of optimizing joins between two fragmented relations on broadcast local network is analyzed data redundancy is considered semantic information associated with fragments is used to eliminate unnecessary processing more than one physical copy of fragment is allowed to be used in strategy to achieve more parallelism join analysis graphs are introduced to represent joins on two fragmented relations the problem of optimizing join is mapped into an equivalent problem of finding minimum weight vertex cover for the corresponding join analysis graph this problem is proved to be np hard four phase approach for processing joins is proposed
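The paper proves the mapped problem NP-hard and proposes its own four phase approach; purely as an illustration of the target problem, the sketch below shows the classic pricing (local-ratio) 2-approximation for minimum weight vertex cover, with fragments as vertices and join-analysis edges as input. This is not the paper's algorithm.

```python
# Illustrative only: the classic pricing (local-ratio) 2-approximation for
# minimum weight vertex cover, the problem the join analysis graph is mapped
# to. This is not the four phase approach proposed in the paper.

def weighted_vertex_cover(weights, edges):
    """weights: {vertex: weight}; edges: iterable of (u, v) pairs."""
    residual = dict(weights)          # remaining "price" each vertex can absorb
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue                  # edge already covered
        pay = min(residual[u], residual[v])
        residual[u] -= pay
        residual[v] -= pay
        # any vertex whose residual reaches zero joins the cover
        if residual[u] == 0:
            cover.add(u)
        if residual[v] == 0:
            cover.add(v)
    return cover

if __name__ == "__main__":
    w = {"f1": 3, "f2": 2, "f3": 4}
    e = [("f1", "f2"), ("f2", "f3")]
    print(weighted_vertex_cover(w, e))   # {'f2'} covers both edges
```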
we introduce computational framework for discovering regular or repeated geometric structures in shapes we describe and classify possible regular structures and present an effective algorithm for detecting such repeated geometric patterns in point or mesh based models our method assumes no prior knowledge of the geometry or spatial location of the individual elements that define the pattern structure discovery is made possible by careful analysis of pairwise similarity transformations that reveals prominent lattice structures in suitable model of transformation space we introduce an optimization method for detecting such uniform grids specifically designed to deal with outliers and missing elements this yields robust algorithm that successfully discovers complex regular structures amidst clutter noise and missing geometry the accuracy of the extracted generating transformations is further improved using novel simultaneous registration method in the spatial domain we demonstrate the effectiveness of our algorithm on variety of examples and show applications to compression model repair and geometry synthesis
we present unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within parallelizing compiler using this framework we show how to find good storage mapping for given schedule good schedule for given storage mapping and good storage mapping that is valid for all legal one dimensional affine schedules we consider storage mappings that collapse one dimension of multidimensional array and programs that are in single assignment form and accept one dimensional affine schedule our method combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests we formulate the constraints imposed by the data dependences and storage mappings as set of linear inequalities and apply numerical programming techniques to solve for the shortest occupancy vector we consider our method to be first step towards automating procedure that finds the optimal tradeoff between parallelism and storage space
high on chip temperature impairs the processor’s reliability and reduces its lifetime hardware level dynamic thermal management dtm techniques can effectively constrain the chip temperature but degrade the performance we propose an os level technique that performs thermal aware job scheduling to reduce dtms the algorithm is based on the observation that hot and cool jobs executed in different order can make difference in resulting temperature real system implementation in linux shows that our scheduler can remove % to % of the hardware dtms in medium thermal environment the cpu throughput is improved by up to % % on average in severe thermal environment
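A toy sketch of the underlying observation, assuming a first-order exponential thermal model and illustrative constants (this is not the paper's Linux scheduler): interleaving hot and cool jobs keeps the core below the DTM trigger more often than running the hot jobs back to back.

```python
# Toy sketch of thermal-aware ordering: interleave hot and cool jobs so the
# core temperature stays below the DTM trigger more often. The exponential
# heating/cooling model and all constants are illustrative assumptions.

def simulate(order, t_ambient=45.0, t_trigger=80.0, alpha=0.3):
    temp, dtm_events = t_ambient, 0
    for steady in order:                  # each job's steady-state temperature
        temp += alpha * (steady - temp)   # first-order thermal response
        if temp >= t_trigger:
            dtm_events += 1               # hardware DTM would throttle here
    return dtm_events

hot = [95.0] * 4     # steady-state temperatures of "hot" jobs
cool = [55.0] * 4    # steady-state temperatures of "cool" jobs

back_to_back = hot + cool
interleaved = [j for pair in zip(hot, cool) for j in pair]

print(simulate(back_to_back), simulate(interleaved))  # interleaving triggers fewer DTMs
```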
texture representation should corroborate various functions of texture in this paper we present novel approach that incorporates texture features for retrieval in an exemplar based texture compaction and synthesis algorithm the original texture is compacted and compressed in the encoder to obtain thumbnail texture which the decoder then synthesizes to obtain perceptually high quality texture we propose using probabilistic framework based on the generalized em algorithm to analyze the solutions of the approach our experimental results show that high quality synthesized texture can be generated in the decoder from compressed thumbnail texture the number of bits in the compressed thumbnail is times lower than that in the original texture and times lower than that needed to compress the original texture using jpeg we also show that in terms of retrieval and synthesization our compressed and compacted textures perform better than compressed cropped textures and compressed compacted textures derived by the patchwork algorithm
there is growing interest in studying dynamic graphs or graphs that evolve with time in this work we investigate new type of dynamic graph analysis finding regions of graph that are evolving in similar manner and are topologically similar over period of time for example these regions can be used to group set of changes having common cause in event detection and fault diagnosis prior work has proposed greedy framework called cstag to find these regions it was accurate in datasets where the regions are temporally and spatially well separated however in cases where the regions are not well separated cstag produces incorrect groupings in this paper we propose new algorithm called reghunter it treats the region discovery problem as multi objective optimisation problem and it uses multi level graph partitioning algorithm to discover the regions of correlated change in addition we propose an external clustering validation technique and use several existing internal measures to evaluate the accuracy of reghunter using synthetic datasets we found reghunter is significantly more accurate than cstag in dynamic graphs that have regions with small separation using two real datasets the access graph of the world cup website and the bgp connectivity graph during the landfall of hurricane katrina we found reghunter obtained more accurate results than cstag furthermore reghunter was able to discover two interesting regions for the world cup access graph that cstag was not able to find
in this work we present spice based rtl subthreshold leakage model analyzing components built in nm technology we present separation approach regarding inter and intra die threshold variations temperature supply voltage and state dependence the body effect and differences between nmos and pmos introduce leakage state dependence of one order of magnitude we show that the leakage of rt components still shows state dependencies between and leakage model not regarding the state can never be more accurate than this the proposed state aware model has an average error of for the rt components analyzed
in this paper we investigate efficient strategies for supporting on demand information dissemination and gathering in large scale wireless sensor networks in particular we propose comb needle discovery support model resembling an ancient method of using comb to help find needle in sand or haystack the model combines push and pull for information dissemination and gathering the push component features data duplication in linear neighborhood of each node the pull component features dynamic formation of an on demand routing structure resembling comb the comb needle model enables us to investigate the cost of spectrum of push and pull combinations for supporting discovery and query in large scale sensor networks our result shows that the optimal routing structure depends on the frequency of query occurrence and the spatial temporal frequency of related events in the network the benefits of balancing push and pull for discovery in large scale geometric networks are demonstrated we also raise the issue of query coverage in unreliable networks and investigate how redundancy can improve the coverage via both theoretical analysis and simulation last we study adaptive strategies for the case where the frequencies of query and events are unknown a priori and time varying
traditional high performance computing systems require extensive management and suffer from security and configuration problems this paper presents two generations of cluster management system that aims at making clusters as secure and self managing as possible the goal of the system is minimality all nodes in cluster are configured with minimal software base consisting of virtual machine monitor and remote bootstrapping mechanism and customers then buy access using simple pre paid token scheme all necessary application software including the operating system is provided by the customer as full virtual machine and boot strapped or migrated into the cluster we have explored two different models for cluster control the first decentralized push model evil man requires direct network access to cluster nodes each of which is running truly minimal control plane implementation consisting of only few hundred lines of code in the second centralized pull model evil twin nodes may be running behind nats or firewalls and are controlled by centralized web service specially developed cache invalidation protocol is used for telling nodes when to reload their workload description from the centralized service
there is an increasing interest in techniques that support analysis and measurement of fielded software systems these techniques typically deploy numerous instrumented instances of software system collect execution data when the instances run in the field and analyze the remotely collected data to better understand the system’s in the field behavior one common need for these techniques is the ability to distinguish execution outcomes eg to collect only data corresponding to some behavior or to determine how often and under which condition specific behavior occurs most current approaches however do not perform any kind of classification of remote executions and either focus on easily observable behaviors eg crashes or assume that outcomes classifications are externally provided eg by the users to address the limitations of existing approaches we have developed three techniques for automatically classifying execution data as belonging to one of several classes in this paper we introduce our techniques and apply them to the binary classification of passing and failing behaviors our three techniques impose different overheads on program instances and thus each is appropriate for different application scenarios we performed several empirical studies to evaluate and refine our techniques and to investigate the trade offs among them our results show that the first technique can build very accurate models but requires complete set of execution data the second technique produces slightly less accurate models but needs only small fraction of the total execution data and the third technique allows for even further cost reductions by building the models incrementally but requires some sequential ordering of the software instances instrumentation
two general techniques for implementing domain specific language dsl with less overhead are the finally tagless embedding of object programs and the direct style representation of side effects we use these techniques to build dsl for probabilistic programming for expressing countable probabilistic models and performing exact inference and importance sampling on them our language is embedded as an ordinary ocaml library and represents probability distributions as ordinary ocaml programs we use delimited continuations to reify probabilistic programs as lazy search trees which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of model we thus take advantage of the existing ocaml implementation to achieve competitive performance and ease of use inference algorithms can easily be embedded in probabilistic programs themselves
this paper discusses semantic interoperability issues in agent based commerce systems the literature reports various techniques to enable agents to understand the meanings of the messages exchanged we will argue how these different techniques can be combined in one agent communication protocol to obtain the best of each world the resulting communication protocol enables agents to sufficiently understand each other to participate in successful collaboration
several wireless sensor network applications are currently appearing in various domains their goal is often to monitor geographical area when sensor detects monitored event it informs sink node using alarm messages the area surveillance application needs to react to such an event with finite bounded and known delay these are real time constraints this work proposes real time mac protocol with realistic assumptions for random linear network where sensors are deployed randomly along line we present formal validation of this protocol both for initialization and run time and present simulation results on realistic scenario
sculpting various facial expressions from static face model is process with intensive manual tuning efforts in this paper we present an interactive facial expression posing system through portrait manipulation where manipulated portrait serves metaphor for automatically inferring its corresponding facial expression with fine details users either rapidly assemble face portrait through pre designed portrait component library or intuitively modify an initial portrait during the editing procedure when the users move one or group of control points on the portrait other portrait control points are adjusted in order to automatically maintain the faceness of the edited portrait if the automated propagation function switch is optionally turned on finally the portrait is used as query input to search for and reconstruct its corresponding facial expression from pre recorded facial motion capture database we showed that this system is effective for rapid facial expression sculpting through comparative user study
software frameworks and libraries are indispensable to today’s software systems as they evolve it is often time consuming for developers to keep their code up to date so approaches have been proposed to facilitate this usually these approaches cannot automatically identify change rules for one replaced by many and many replaced by one methods and they trade off recall for higher precision using one or more experimentally evaluated thresholds we introduce aura novel hybrid approach that combines call dependency and text similarity analyses to overcome these limitations we implement it in java system and compare it on five frameworks with three previous approaches by dagenais and robillard kim et al and schäfer et al the comparison shows that on average the recall of aura is higher while its precision is similar eg lower
as the popularity of areas including document storage and distributed systems continues to grow the demand for high performance xml databases is increasingly evident this has led to number of research efforts aimed at exploiting the maturity of relational database systems in order to increase xml query performance in our approach we use an index structure based on metamodel for xml databases combined with relational database technology to facilitate fast access to xml document elements the query process involves transforming xpath expressions to sql which can be executed over our optimised query engine as there are many different types of xpath queries varying processing logic may be applied to boost performance not only to individual xpath axes but across multiple axes simultaneously this paper describes pattern based approach to xpath query processing which permits the execution of group of xpath location steps in parallel
in yen defines class of formulas for paths in petri nets and claims that its satisfiability problem is expspace complete in this paper we show that in fact the satisfiability problem for this class of formulas is as hard as the reachability problem for petri nets moreover we salvage almost all of yen’s results by defining fragment of this class of formulas for which the satisfiability problem is expspace complete by adapting his proof
we present declarative and visual debugging environment for eclipse called jive traditional debugging is procedural in that programmer must proceed step by step and object by object in order to uncover the cause of an error in contrast we present declarative approach to debugging consisting of flexible set of queries over program’s execution history as well as over individual runtime states this runtime information is depicted in visual manner during program execution in order to aid the debugging process the current state of execution is depicted through an enhanced object diagram and the history of execution is depicted by sequence diagram our methodology makes use of these diagrams as means of formulating queries and reporting results in visual manner it also supports revisiting past runtime states either through reverse stepping of the program or through queries that report information from past states eclipse serves as an ideal framework for implementing jive since like the jive architecture it makes crucial use of the java platform debugging architecture jpda this paper presents details of the jive architecture and its integration into eclipse
the wide adoption of xml has increased the interest of the database community in tree structured data management techniques querying capabilities are provided through tree pattern queries the need for querying tree structured data sources when their structure is not fully known and the need to integrate multiple data sources with different tree structures have driven recently the suggestion of query languages that relax the complete specification of tree pattern in this paper we use query language which allows partial tree pattern queries ptpqs the structure in ptpq can be flexibly specified fully partially or not at all to evaluate ptpq we exploit index graphs which generate an equivalent set of complete tree pattern queries in order to process ptpqs we need to efficiently solve the ptpq satisfiability and containment problems these problems become more complex in the context of ptpqs because the partial specification of the structure allows new non trivial structural expressions to be derived from those explicitly specified in ptpq we address the problem of ptpq satisfiability and containment in the absence and in the presence of index graphs and we provide necessary and sufficient conditions for each case to cope with the high complexity of ptpq containment in the presence of index graphs we study family of heuristic approaches for ptpq containment based on structural information extracted from the index graph in advance and on the fly we implement our approaches and we report on their extensive experimental evaluation and comparison
modeling and simulating pipelined processors in procedural languages such as requires lots of cost in handling concurrent events which hinders fast simulation number of researches on simulation have devised speed up techniques to reduce the number of events this paper presents new simulation approach developed to enhance the simulation of pipelined processors the proposed approach is based on early pipeline evaluation that all the intermediate values of an instruction are computed in advance creating future state for the next instructions the future state allows the next instructions to be computed without considering data dependencies between nearby instructions we apply this concept to building cycle accurate simulator for pipelined risc processor and achieve almost the same speed as the instruction level simulator
modern languages like java and rely on dynamic optimizations in virtual machines for better performance current dynamic optimizations are reactive their performance is constrained by the dependence on runtime sampling and the partial knowledge of the execution this work tackles the problems by developing set of techniques that make virtual machine evolve across production runs the virtual machine incrementally learns the relation between program inputs and optimization strategies so that it proactively predicts the optimizations suitable for new run the prediction is discriminative guarded by confidence measurement through dynamic self evaluation we employ an enriched extensible specification language to resolve the complexities in program inputs these techniques implemented in jikes rvm produce significant performance improvement on set of java applications
supporting dynamic reconfiguration is required even in highly constrained embedded systems to allow software patches and updates and to allow adaptations to changes in environmental and operating conditions without service interruption dynamic reconfiguration however is complex and error prone process in this paper we report our experience in implementing safe dynamic reconfigurations in embedded devices with limited resources our approach relies on component based framework for building reconfigurable operating systems and the use of domain specific language dsl for reconfiguration
event correlation has become the cornerstone of many reactive applications particularly in distributed systems however support for programming with complex events is still rather specific and rudimentary this paper presents eventjava an extension of java with generic support for event based distributed programming eventjava seamlessly integrates events with methods and broadcasting with unicasting of events it supports reactions to combinations of events and predicates guarding those reactions eventjava is implemented as framework to allow for customization of event semantics matching and dispatching we present its implementation based on compiler transforming specific primitives to java along with reference implementation of the framework we discuss ordering properties of eventjava through formalization of its core as an extension of featherweight java in performance evaluation we show that eventjava compares favorably to highly tuned database backed event correlation engine as well as to comparably lightweight concurrency mechanism
deriving local cost models for query optimization in dynamic multidatabase system mdbs is challenging issue in this paper we study how to evolve query cost model to capture slowly changing dynamic mdbs environment so that the cost model is kept up to date all the time two novel evolutionary techniques ie the shifting method and the block moving method are proposed the former updates cost model by taking up to date information from new sample query into consideration at each step while the latter considers block batch of new sample queries at each step the relevant issues including derivation of recurrence updating formulas development of efficient algorithms analysis and comparison of complexities and design of an integrated scheme to apply the two methods adaptively are studied our theoretical and experimental results demonstrate that the proposed techniques are quite promising in maintaining accurate cost models efficiently for slowly changing dynamic mdbs environment besides the application to mdbss the proposed techniques can also be applied to the automatic maintenance of cost models in self managing database systems
this study develops knowledge navigator model knm tm to navigate knowledge management km implementation journey the knm comprises two frameworks evaluation and calculation framework qualitative research methods including literature review in depth interviews focus groups and content analysis are conducted to construct the evaluation framework of knm an algorithm model is proposed in the calculation framework and cases survey was employed to obtain the initial version of the score ranges used to differentiate maturity levels several propositions and the corresponding knm were constructed we define the km maturity level into five stages knowledge chaotic stage knowledge conscientious stage km stage km advanced stage and km integration stage the evaluation framework of knm consists of three aspects three target management objects culture km process and information technology km activities and key areas kas the initial version of the score ranges was identified the study results can be referenced and the methodology can be applied to other countries although the sample is confined to industries in taiwan
the importance and benefits of expertise sharing for organizations in knowledge economy are well recognized however the potential cost of expertise sharing is less well understood this paper proposes conceptual framework called collective attention economy to identify the costs associated with expertise sharing and provide the basis for analyzing and understanding the cost benefit structure of different communication mechanisms to demonstrate the analytical power of the conceptual framework the paper describes new communication mechanism dynamic mailing list dml that is developed by adjusting certain cost factors
programs written in managed languages are compiled to platform independent intermediate representation such as java bytecode the relative high level of java bytecode has engendered widespread practice of changing the bytecode directly without modifying the maintained version of the source code this practice called bytecode engineering or enhancement has become indispensable in introducing various concerns including persistence distribution and security transparently for example transparent persistence architectures help avoid the entanglement of business and persistence logic in the source code by changing the bytecode directly to synchronize objects with stable storage with functionality added directly at the bytecode level the source code reflects only partial semantics of the program specifically the programmer can neither ascertain the program’s runtime behavior by browsing its source code nor map the runtime behavior back to the original source code this paper presents an approach that improves the utility of source level programming tools by providing enhancement specifications written in domain specific language by interpreting the specifications source level programming tool can gain an awareness of the bytecode enhancements and improve its precision and usability we demonstrate the applicability of our approach by making source code editor and symbolic debugger enhancements aware
software changes during their life cycle software systems experience wide spectrum of changes from minor modifications to major architectural shifts small scale changes are usually performed with text editing and refactorings while large scale transformations require dedicated program transformation languages for medium scale transformations both approaches have disadvantages manual modifications may require myriad of similar yet not identical edits leading to errors and omissions while program transformation languages have steep learning curve and thus only pay off for large scale transformations we present system supporting example based program transformation to define transformation programmer performs an example change manually feeds it into our system and generalizes it to other application contexts with time developer can build palette of reusable medium sized code transformations we provide detailed description of our approach and illustrate it with examples
future multicore processors will be more susceptible to variety of hardware failures in particular intermittent faults caused in part by manufacturing thermal and voltage variations can cause bursts of frequent faults that last from several cycles to several seconds or more due to practical limitations of circuit techniques cost effective reliability will likely require the ability to temporarily suspend execution on core during periods of intermittent faults we investigate three of the most obvious techniques for adapting to the dynamically changing resource availability caused by intermittent faults and demonstrate their different system level implications we show that system software reconfiguration has very high overhead that temporarily pausing execution on faulty core can lead to cascading livelock and that using spare cores has high fault free cost to remedy these and other drawbacks of the three baseline techniques we propose using thin hardware firmware layer to manage an overcommitted system one where the os is configured to use more virtual processors than the number of currently available physical cores we show that this proposed technique can gracefully degrade performance during intermittent faults of various duration with low overhead without involving system software and without requiring spare cores
refactorings are usually proposed in an ad hoc way because it is difficult to prove that they are sound with respect to formal semantics not guaranteeing the absence of type errors or semantic changes consequently developers using refactoring tools must rely on compilation and tests to ensure type correctness and semantics preservation respectively which may not be satisfactory to critical software development in this paper we formalize static semantics for alloy which is formal object oriented modeling language and encode it in prototype verification system pvs the static semantics formalization can be useful for specifying and proving that transformations in general not only refactorings do not introduce type errors for instance as we show here
entity matching is the problem of deciding if two given mentions in the data such as helen hunt and hunt refer to the same real world entity numerous solutions have been developed but they have not considered in depth the problem of exploiting integrity constraints that frequently exist in the domains examples of such constraints include mention with age two cannot match mention with salary and if two paper citations match then their authors are likely to match in the same order in this paper we describe probabilistic solution to entity matching that exploits such constraints to improve matching accuracy at the heart of the solution is generative model that takes into account the constraints during the generation process and provides well defined interpretations of the constraints we describe novel combination of em and relaxation labeling algorithms that efficiently learns the model thereby matching mentions in an unsupervised way without the need for annotated training data experiments on several real world domains show that our solution can exploit constraints to significantly improve matching accuracy by and that the solution scales up to large data sets
unifying terminology usages which captures more term semantics is useful for event clustering this paper proposes metric of normalized chain edit distance to mine incrementally controlled vocabulary from cross document coreference chains controlled vocabulary is employed to unify terms among different co reference chains novel threshold model that incorporates both time decay function and spanning window uses the controlled vocabulary for event clustering on streaming news under correct co reference chains the proposed system has performance increase compared to the baseline system and performance increase compared to the system without introducing controlled vocabulary furthermore chinese co reference resolution system with chain filtering mechanism is used to experiment on the robustness of the proposed event clustering system the clustering system using noisy co reference chains still achieves performance increase compared to the baseline system the above shows that our approach is promising
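As a rough sketch of the distance underlying the mined controlled vocabulary, one plausible reading treats a co-reference chain as a sequence of mention strings and normalizes Levenshtein distance by the longer chain length; the exact normalization used in the paper may differ.

```python
# Sketch of a normalized edit distance between two co-reference chains, each
# given as a sequence of mention strings. Normalizing by the longer chain
# length is an assumption for illustration, not necessarily the paper's metric.

def chain_edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + cost) # substitute
    return d[m][n]

def normalized_chain_edit_distance(a, b):
    if not a and not b:
        return 0.0
    return chain_edit_distance(a, b) / max(len(a), len(b))

chain1 = ["hurricane katrina", "katrina", "the storm"]
chain2 = ["katrina", "the storm", "it"]
print(normalized_chain_edit_distance(chain1, chain2))  # 0.666...
```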
in this paper assuming that each node is incident with two or more fault free links we show that an dimensional alternating group graph can tolerate up to link faults where while retaining fault free hamiltonian cycle the proof is computer assisted the result is optimal with respect to the number of link faults tolerated previously without the assumption at most link faults can be tolerated for the same problem and the same graph
under the tuple level uncertainty paradigm we formalize the use of novel graphical model generator recognizer network grn as model of probabilistic databases the grn modeling framework is capable of representing much wider range of tuple dependency structure we show that grn representation of probabilistic database may undergo transitions induced by imposing constraints or evaluating queries we formalize procedures for these two types of transitions such that the resulting graphical models after transitions remain as grns this formalism makes grn self contained modeling framework and closed representation system for probabilistic databases property that is lacking in most existing models in addition we show that exploiting the transitional mechanisms allows systematic approach to constructing grns for arbitrary probabilistic data at arbitrary stages advantages of grns in query evaluation are also demonstrated
past research on probabilistic databases has studied the problem of answering queries on static database application scenarios of probabilistic databases however often involve the conditioning of database using additional information in the form of new evidence the conditioning problem is thus to transform probabilistic database of priors into posterior probabilistic database which is materialized for subsequent query processing or further refinement it turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values it is known that exact confidence computation is an np hard problem this has led researchers to consider approximation techniques for confidence computation however neither conditioning nor exact confidence computation can be solved using such techniques in this paper we present efficient techniques for both problems we study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction such as the davis putnam algorithm we complement this with thorough experimental evaluation of the algorithms proposed our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms
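A minimal sketch of the exact computation that conditioning and confidence evaluation both rest on: the confidence of a tuple with a DNF lineage over independent boolean variables can be computed by recursive conditioning (Shannon expansion), which is the Davis-Putnam style recursion referred to above. Real engines layer decomposition, caching and variable-selection heuristics on top; none of that is shown here.

```python
# Minimal sketch: exact tuple confidence by recursive conditioning (Shannon
# expansion) on the lineage of a tuple, given as a DNF over independent
# boolean variables. Illustrative only.

def confidence(dnf, prob):
    """dnf: list of clauses, each a dict {var: required_bool};
       prob: {var: P(var = True)}."""
    if any(len(clause) == 0 for clause in dnf):
        return 1.0                       # an empty clause is already satisfied
    if not dnf:
        return 0.0                       # no clause can be satisfied any more
    var = next(iter(dnf[0]))             # branch on some variable
    p = prob[var]

    def condition(value):
        reduced = []
        for clause in dnf:
            if var in clause:
                if clause[var] != value:
                    continue             # clause falsified, drop it
                clause = {v: b for v, b in clause.items() if v != var}
            reduced.append(clause)
        return reduced

    return p * confidence(condition(True), prob) + \
           (1 - p) * confidence(condition(False), prob)

# P(x1 or (x2 and x3)) with independent variables
lineage = [{"x1": True}, {"x2": True, "x3": True}]
probs = {"x1": 0.5, "x2": 0.6, "x3": 0.5}
print(confidence(lineage, probs))        # 0.5 + 0.5 * 0.3 = 0.65
```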
server pages also called dynamic pages render generic web page into many similar ones the technique is commonly used for implementing web application user interfaces uis yet our previous study found high rate of repetitions also called clones in web applications particularly in uis the finding raised the question as to why such repetitions had not been averted with the use of server pages for an answer we conducted an experiment using php server pages to explore how far server pages can be pushed to achieve generic web applications our initial findings suggested that generic representation obtained using server pages sometimes compromises certain important system qualities such as run time performance it may also complicate the use of wysiwyg editors we have analysed the nature of these trade offs and now propose mixed strategy approach to obtain optimum generic representation of web applications without unnecessary compromise to critical system qualities and user experience the mixed strategy approach applies the generative technique of xvcl to achieve genericity at the meta level representation of web application leaving repetitions to the actual web application our experiments show that the mixed strategy approach can achieve good level of genericity without conflicting with other system qualities our findings should open the way for others to make better informed decisions regarding generic design solutions which should in turn lead to simpler more maintainable and more reusable web applications
web data extraction systems in use today mainly focus on the generation of extraction rules ie wrapper induction thus they appear ad hoc and are difficult to integrate when holistic view is taken each phase in the data extraction process is disconnected and does not share common foundation to make the building of complete system straightforward in this paper we demonstrate holistic approach to web data extraction the principal component of our proposal is the notion of document schema document schemata are patterns of structures embedded in documents once the document schemata are obtained the various phases eg training set preparation wrapper induction and document classification can be easily integrated the implication of this is improved efficiency and better control over the extraction procedure our experimental results confirmed this more importantly because document can be represented as a vector of schema it can be easily incorporated into existing systems as the fabric for integration
in this paper we propose the integration of services into social networks soaf service of friend to leverage the creation of the internet of services vision we show how to integrate services and humans into common network structure and discuss design and implementation issues in particular we discuss the required extensions to existing social network vocabulary with regard to services we illustrate scenario where this network structures can be applied in the context of service discovery and highlight the benefit of service enriched social network structure
recent spates of cyber attacks and frequent emergence of applications affecting internet traffic dynamics have made it imperative to develop effective techniques that can extract and make sense of significant communication patterns from internet traffic data for use in network operations and security management in this paper we present general methodology for building comprehensive behavior profiles of internet backbone traffic in terms of communication patterns of end hosts and services relying on data mining and information theoretic techniques the methodology consists of significant cluster extraction automatic behavior classification and structural modeling for in depth interpretive analyses we validate the methodology using data sets from the core of the internet the results demonstrate that it indeed can identify common traffic profiles as well as anomalous behavior patterns that are of interest to network operators and security analysts
tiling is crucial loop transformation for generating high performance code on modern architectures efficient generation of multi level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies tiled loops with parametric tile sizes not compile time constants facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tuning previous parametric multi level tiling approaches have been restricted to perfectly nested loops where all assignment statements are contained inside the innermost loop of loop nest previous solutions to tiling for imperfect loop nests have only handled fixed tile sizes in this paper we present an approach to parametric multi level tiling of imperfectly nested loops the tiling technique generates loops that iterate over full rectangular tiles making them amenable to compiler optimizations such as register tiling experimental results using number of computational benchmarks demonstrate the effectiveness of the developed tiling approach
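A small illustration of what parametric tiling buys: the tile sizes below are ordinary runtime values, so the same tiled code can be re-executed with different sizes during autotuning. For brevity the sketch tiles a perfectly nested 2D loop; the paper's contribution is generating such parametric rectangular tiles for imperfect nests.

```python
# Illustration of parametric tiling: tile sizes T1, T2 are runtime values,
# so the same tiled code can be re-run with different tile sizes without
# recompilation. A 2D perfect nest is shown for brevity.

def matvec_tiled(A, x, T1, T2):
    n, m = len(A), len(A[0])
    y = [0.0] * n
    for ii in range(0, n, T1):                       # loop over tiles of rows
        for jj in range(0, m, T2):                   # loop over tiles of columns
            for i in range(ii, min(ii + T1, n)):     # full rectangular tile,
                for j in range(jj, min(jj + T2, m)): # boundaries clipped
                    y[i] += A[i][j] * x[j]
    return y

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 1.0, 1.0]
print(matvec_tiled(A, x, T1=2, T2=2))   # [6.0, 15.0], same result for any tile sizes
```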
popular web sites are expected to handle huge number of requests concurrently within reasonable time frame the performance of these web sites is largely dependent on effective thread management of their web servers although the implementation of static and dynamic thread policies is common practice remarkably little is known about the implications on performance moreover the commonly used policies do not take into account the complex interaction between the threads that compete for access to shared resource we propose new dynamic thread assignment policies that minimize the average response time of web servers the web server is modeled as two layered tandem of multi threading queues where the active threads compete for access to common resource this type of two layered queueing model which occurs naturally in the performance modeling of systems with intensive software hardware interaction are on the one hand appealing from an application point of view but on the other hand are challenging from methodological point of view our results show that the optimal dynamic thread assignment policies yield strong reductions in the response times validation on an apache web server shows that our dynamic thread policies confirm our analytical results
in constructive type theory recursive and corecursive definitions are subject to syntactic restrictions which guarantee termination for recursive functions and productivity for corecursive functions however many terminating and productive functions do not pass the syntactic tests bove proposed in her thesis an elegant reformulation of the method of accessibility predicates that widens the range of terminative recursive functions formalisable in constructive type theory in this paper we pursue the same goal for productive corecursive functions notably our method of formalisation of coinductive definitions of productive functions in coq requires not only the use of ad hoc predicates but also systematic algorithm that separates the inductive and coinductive parts of functions
probabilistic or randomized algorithms are fast becoming as commonplace as conventional deterministic algorithms this survey presents five techniques that have been widely used in the design of randomized algorithms these techniques are illustrated using randomized algorithms both sequential and distributed that span wide range of applications including primality testing classical problem in number theory interactive probabilistic proof systems new method of program testing dining philosophers classical problem in distributed computing and byzantine agreement reaching agreement in the presence of malicious processors included with each algorithm is discussion of its correctness and its computational complexity several related topics of interest are also addressed including the theory of probabilistic automata probabilistic analysis of conventional algorithms deterministic amplification and derandomization of randomized algorithms finally comprehensive annotated bibliography is given
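As a concrete instance of the primality-testing example from the survey, here is a standard Miller-Rabin test: for a composite input, each independent round reports "probably prime" with probability at most 1/4, so the error shrinks geometrically with the number of rounds.

```python
# A standard randomized primality test (Miller-Rabin), one of the classic
# examples in this class of algorithms: a composite number is declared
# "probably prime" with probability at most 4**-rounds.

import random

def is_probable_prime(n, rounds=20):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2**r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a is a witness that n is composite
    return True

print(is_probable_prime(2**61 - 1), is_probable_prime(2**61 + 1))  # True False
```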
approximate query answering has recently emerged as an effective method for generating viable answer among various techniques for approximate query answering wavelets have received lot of attention however wavelet techniques minimizing the root squared error ie the norm error have several problems such as the poor quality of reconstructed data when the original data is biased in this paper we present amid approximation of multi measured data using svd for multi measured data in amid we adapt the singular value decomposition svd to compress multi measured data we show that svd guarantees the root squared error and also derive an error bound of svd for an individual data value using mathematical analyses in addition in order to improve the accuracy of approximated data we combine svd and wavelets in amid since svd is applied to fixed matrix we use various properties of matrices to adapt svd to the incremental update environment we devise two variants of amid for the incremental update environment incremental amid and local amid to the best of our knowledge our work is the first to extend svd to incremental update environments
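A small numpy sketch of the core compression step, assuming the multi-measured data is a rows-by-measures matrix: keeping the top k singular triples minimizes the root squared (Frobenius) error, and the discarded singular values give the exact value of that error. The paper's per-value bound and the incremental variants are not reproduced.

```python
# Sketch of SVD-based compression of a multi-measured data matrix: keep the
# top-k singular triples; the discarded singular values bound the overall
# root squared (Frobenius) error.

import numpy as np

def svd_compress(data, k):
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def reconstruct(U, s, Vt):
    return (U * s) @ Vt

rng = np.random.default_rng(0)
data = rng.random((100, 4))              # 100 rows, 4 measures per row
U, s, Vt = svd_compress(data, k=2)
approx = reconstruct(U, s, Vt)

err = np.linalg.norm(data - approx)      # actual root squared error
bound = np.sqrt(np.sum(np.linalg.svd(data, compute_uv=False)[2:] ** 2))
print(err, bound, np.isclose(err, bound))   # truncation error equals the bound
```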
in this paper we propose two simple and efficient schemes for establishing anonymity in clustered wireless sensor networks cwsns the first scheme simple anonymity scheme sas uses range of pseudonyms as identifiers for node to ensure concealment of its true identifier id after deployment neighbouring nodes share their individual pseudonyms and use them for anonymous communication the second scheme cryptographic anonymity scheme cas uses keyed cryptographic one way hash function kchf for id concealment in cas neighbouring nodes securely share the information used by the kchf to generate pseudonyms even when many nodes in neighbourhood are compromised and are colluding our schemes guarantee complete anonymity to non compromised nodes during their mutual communication our schemes have reasonably low memory and computation costs they can be embedded into any wireless sensor network routing protocol to ensure anonymity and privacy during node discovery routing and data delivery
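A minimal sketch of the keyed one-way hash idea behind CAS, assuming a secret already shared within the neighbourhood and a per-message counter; key distribution, the pseudonym ranges of SAS and collision handling are the paper's details and are not shown. The function and constant names are illustrative.

```python
# Sketch of keyed one-way-hash pseudonym generation in the spirit of the
# cryptographic anonymity scheme: a node derives pseudonyms from a secret
# shared with its neighbours and a per-message counter, so outsiders cannot
# link a pseudonym to the node's true id.

import hmac
import hashlib

def pseudonym(shared_key: bytes, true_id: bytes, counter: int) -> str:
    msg = true_id + counter.to_bytes(8, "big")
    return hmac.new(shared_key, msg, hashlib.sha256).hexdigest()[:16]

# neighbours sharing the key can regenerate and recognize the pseudonym
key = b"neighbourhood-secret"
node = b"node-42"
print(pseudonym(key, node, 0))
print(pseudonym(key, node, 1))   # unlinkable to the previous one without the key
```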
crucial step in volume rendering is the design of transfer functions that will highlight those aspects of the volume data that are of interest to the user for many applications boundaries carry most of the relevant information reliable detection of boundaries is often hampered by limitations of the imaging process such as blurring and noise we present method to identify the materials that form the boundaries these materials are then used in new domain that facilitates interactive and semiautomatic design of appropriate transfer functions we also show how the obtained boundary information can be used in region growing based segmentation
sensitivity analysis sa is novel compiler technique that complements and integrates with static automatic parallelization analysis for the cases when program behavior is input sensitive sa can extract all the input dependent statically unavailable conditions for which loops can be dynamically parallelized sa generates sequence of sufficient conditions which when evaluated dynamically in order of their complexity can each validate the dynamic parallel execution of the corresponding loop while sa’s principles are fairly simple implementing it in real compiler and obtaining good experimental results on benchmark codes is difficult task in this paper we present some of the most important implementation issues that we had to overcome in order to achieve fairly successful automatic parallelizer we present techniques related to validating dependence removing transformations eg privatization or pushback parallelization and static and dynamic evaluation of complex conditions for loop parallelization we concern ourselves with multi version and parallel code generation as well as the use of speculative parallelization when other less costly options fail we present summary table of the contributions of our techniques to the successful parallelization of industry benchmark codes we also report speedups and parallel coverage of these codes on two multicore based systems and compare them to results obtained by the ifort compiler
the last years have seen the definition of many languages models and standards tailored to specify and enforce access control policies but such frameworks do not provide methodological support during the policy specification process in particular they do not provide facilities for the analysis of the social context where the system operates in this paper we propose model driven approach for the specification and analysis of access control policies we build this framework on top of si modeling language tailored to capture and analyze functional and security requirements of socio technical systems the framework also provides formal mechanisms to assist policy writers and system administrators in the verification of access control policies and of the actual user permission assignment
we propose hybrid method for simulating multiphase fluids such as bubbly water the appearance of subgrid visual details is improved by incorporating new bubble model based on smoothed particle hydrodynamics sph into an eulerian grid based simulation that handles background flows of large bodies of water and air to overcome the difficulty in simulating small bubbles in the context of the multiphase flows on coarse grid we heuristically model the interphase properties of water and air by means of the interactions between bubble particles as result we can animate lively motion of bubbly water with small scale details efficiently
knowing patterns of relationship in social network is very useful for law enforcement agencies to investigate collaborations among criminals for businesses to exploit relationships to sell products or for individuals who wish to network with others after all it is not just what you know but also whom you know that matters however finding out who is related to whom on large scale is complex problem asking every single individual would be impractical given the huge number of individuals and the changing dynamics of relationships recent advancement in technology has allowed more data about activities of individuals to be collected such data may be mined to reveal associations between these individuals specifically we focus on data having space and time elements such as logs of people’s movement over various locations or of their internet activities at various cyber locations reasoning that individuals who are frequently found together are likely to be associated with each other we mine from the data instances where several actors co occur in space and time presumably due to an underlying interaction we call these spatio temporal co occurrences events which we use to establish relationships between pairs of individuals in this paper we propose model for constructing social network from events and provide an algorithm that mines these events from the data experiments on real life data tracking people’s accesses to cyber locations have also yielded encouraging results
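A compact sketch of the event-based construction, under the simplifying assumption that a co-occurrence event is a pair of individuals seen at the same location within the same fixed time window; pairs with enough events become weighted edges of the inferred network. Window size and threshold are illustrative parameters, not the paper's model.

```python
# Sketch of building a social network from spatio-temporal co-occurrences:
# records (person, location, timestamp) are bucketed by (location, time
# window); every pair seen in the same bucket counts as one co-occurrence
# event, and pairs with enough events become weighted edges.

from collections import defaultdict
from itertools import combinations

def co_occurrence_network(records, window=3600, min_events=2):
    buckets = defaultdict(set)
    for person, location, ts in records:
        buckets[(location, ts // window)].add(person)
    edge_weight = defaultdict(int)
    for people in buckets.values():
        for u, v in combinations(sorted(people), 2):
            edge_weight[(u, v)] += 1
    return {pair: w for pair, w in edge_weight.items() if w >= min_events}

records = [
    ("alice", "cafe", 100), ("bob", "cafe", 200),      # same hour, same place
    ("alice", "gym", 8000), ("bob", "gym", 8100),      # together again
    ("carol", "cafe", 90000),                          # alone, no edge
]
print(co_occurrence_network(records))   # {('alice', 'bob'): 2}
```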
understanding bgp routing dynamics is critical to the solid growth and maintenance of the internet routing infrastructure however while the most extensive study on bgp dynamics is nearly decade old many factors that could affect bgp dynamics have changed considerably we revisit this important topic in this paper focusing on not only comparing with the previous results but also issues not well explored before we have found that compared to almost decade ago although certain characteristics remain unchanged such as some temporal properties bgp dynamics are now busier and more importantly now have much less pathological behavior and are healthier for example forwarding dynamics are now not only dominant but also more consistent across different days contributions to bgp dynamics by different bgp peers which are not proportional to the size of the peer’s as are also more stable and dynamics due to policy changes or duplicate announcements are usually from specific peers
the paper proposes variation of simulation for checking and proving contextual equivalence in non deterministic call by need lambda calculus with constructors case seq and letrec with cyclic dependencies it also proposes novel method to prove its correctness the calculus semantics is based on small step rewrite semantics and on may convergence the cyclic nature of letrec bindings as well as non determinism makes known approaches to prove that simulation implies contextual preorder such as howe’s proof technique inapplicable in this setting the basic technique for the simulation as well as the correctness proof is called pre evaluation which computes set of answers for every closed expression if simulation succeeds in finite computation depth then it is guaranteed to show contextual preorder of expressions
automatic emotion sensing in textual data is crucial for the development of intelligent interfaces in many interactive computer applications this paper describes high precision knowledgebase independent approach for automatic emotion sensing for the subjects of events embedded within sentences the proposed approach is based on the probability distribution of common mutual actions between the subject and the object of an event we have incorporated web based text mining and semantic role labeling techniques together with number of reference entity pairs and hand crafted emotion generation rules to realize an event emotion detection system the evaluation outcome reveals satisfactory result with about accuracy for detecting the positive negative and neutral emotions
this chapter discusses usability engineering approach for the design and the evaluation of adaptive web based systems focusing on practical issues list of methods will be presented considering user centered approach after having introduced the peculiarities that characterize the evaluation of adaptive web based systems the chapter describes the evaluation methodologies following the temporal phases of evaluation according to user centered approach three phases are distinguished requirement phase preliminary evaluation phase and final evaluation phase moreover every technique is classified according to set of parameters that highlight the practical exploitation of that technique for every phase the appropriate techniques are described by giving practical examples of their application in the adaptive web number of issues that arise when evaluating an adaptive system are described and potential solutions and workarounds are sketched
we investigate the problem of refining sql queries to satisfy cardinality constraints on the query result this has applications to the many few answers problems often faced by database users we formalize the problem of query refinement and propose framework to support it in database system we introduce an interactive model of refinement that incorporates user feedback to best capture user preferences our techniques are designed to handle queries having range and equality predicates on numerical and categorical attributes we present an experimental evaluation of our framework implemented in an open source data manager and demonstrate the feasibility and practical utility of our approach
using an underlying role based model for the administration of roles has proved itself to be successful approach this paper sets out to describe the enterprise role based access control model erbac in the context of sam jupiter commercial enterprise security management software we provide an overview of the role based conceptual model underlying sam jupiter having established this basis we describe how the model is used to facilitate role based administration approach in particular we discuss our notion of scopes which describe the objects over which an administrator has authority the second part provides case study based on our real world experiences in the implementation of role based administrative infrastructures finally critical evaluation and comparison with current approaches to administrative role based access control is provided
inductive logic programming or relational learning is powerful paradigm for machine learning or data mining however in order for ilp to become practically useful the efficiency of ilp systems must improve substantially to this end the notion of query pack is introduced it structures sets of similar queries furthermore mechanism is described for executing such query packs complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism this claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems
distributed data stream processing applications are often characterized by data flow graphs consisting of large number of built in and user defined operators connected via streams these flow graphs are typically deployed on large set of nodes the data processing is carried out on the fly as tuples arrive at possibly very high rates with minimum latency it is well known that developing and debugging distributed multi threaded and asynchronous applications such as stream processing applications can be challenging thus without domain specific debugging support developers struggle when debugging distributed applications in this paper we describe tools and language support to support debugging distributed stream processing applications our key insight is to view debugging of stream processing applications from four different but related perspectives first debugging the semantics of the application involves verifying the operator level composition and inspecting the flows at the logical level second debugging the user defined operators involves traditional source code debugging but strongly tied to the stream level interactions third debugging the deployment details of the application requires understanding the runtime physical layout and configuration of the application fourth debugging the performance of the application requires inspecting various performance metrics such as communication rates cpu utilization etc associated with streams operators and nodes in the system in light of this characterization we developed several tools such as debugger aware compiler and an associated stream debugger composition and deployment visualizers and performance visualizers as well as language support such as configuration knobs for logging and tracing deployment configurations such as operator to process and process to node mappings monitoring directives to inspect streams and special sink adapters to intercept and dump streaming data to files and sockets to name few we describe these tools in the context of spade language for creating distributed stream processing applications and system distributed stream processing middleware under development at the ibm watson research center
most existing pp networks route requests in o(log_k n) hops where n is the number of participating nodes and k is an adjustable parameter although some can achieve hop routing for constant by tuning the parameter the neighbor locations however become function of causing considerable maintenance overhead if the user base is highly dynamic as witnessed by the deployed systems this paper explores the design space using the simple uniformly random neighbor selection strategy and proposes random peer to peer network that is the first of its kind to resolve requests in hops with chosen probability of where is constant the number of neighbors per node is within constant factor from the optimal complexity for any network whose routing paths are bounded by hops
mails concerning the development issues of system constitute an important source of information about high level design decisions low level implementation concerns and the social structure of developers establishing links between mails and the software artifacts they discuss is non trivial problem due to the inherently informal nature of human communication different approaches can be brought into play to tackle this traceability issue but the question of how they can be evaluated remains unaddressed as there is no recognized benchmark against which they can be compared in this article we present such benchmark which we created through the manual inspection of statistically significant number of mails pertaining to six unrelated software systems we then use our benchmark to measure the effectiveness of number of approaches ranging from lightweight approaches based on regular expressions to full fledged information retrieval approaches
in this paper we propose an automated system able to detect volume based anomalies in network traffic caused by denial of service dos attacks we designed system with two stage architecture that combines more traditional change point detection approaches adaptive threshold and cumulative sum with novel one based on the continuous wavelet transform the presented anomaly detection system is able to achieve good results in terms of the trade off between correct detections and false alarms estimation of anomaly duration and ability to distinguish between subsequent anomalies we test our system using set of publicly available attack free traffic traces to which we superimpose anomaly profiles obtained both as time series of known common behaviors and by generating traffic with real tools for dos attacks extensive test results show how the proposed system accurately detects wide range of dos anomalies and how the performance indicators are affected by anomalies characteristics ie amplitude and duration moreover we separately consider and evaluate some special test cases
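a rough sketch of the change point side of such a detector follows combining an adaptive threshold test with a one sided cumulative sum test over a traffic volume series the wavelet stage is omitted and the window drift and threshold constants are illustrative values of ours not the paper's settings

# minimal two-stage volume anomaly detector: adaptive threshold + CUSUM
import numpy as np

def adaptive_threshold(x, window=10, factor=2.0):
    # flag points that exceed a multiple of the recent moving average
    flags = np.zeros(len(x), dtype=bool)
    for t in range(window, len(x)):
        flags[t] = x[t] > factor * x[t - window:t].mean()
    return flags

def cusum(x, k=0.5, h=5.0):
    # one-sided cumulative sum over standardized samples
    z = (x - x.mean()) / (x.std() + 1e-9)
    s, flags = 0.0, np.zeros(len(x), dtype=bool)
    for t, zt in enumerate(z):
        s = max(0.0, s + zt - k)
        flags[t] = s > h
    return flags

# a point is reported only if both stages agree (simplistic fusion)
x = np.random.poisson(100, 300).astype(float)
x[200:220] += 400          # injected volume anomaly
report = adaptive_threshold(x) & cusum(x)
print(report.nonzero()[0])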
frequent pattern mining is popular topic in data mining with the advance of this technique privacy issues attract more and more attention in recent years in this field previous works hide sensitive information based on uniform support threshold or disclosure threshold however in practical applications we probably need to apply different support thresholds to different itemsets for reflecting their significance in this paper we propose new hiding strategy to protect sensitive frequent patterns with multiple sensitive thresholds based on different sensitive thresholds the sanitized dataset is able to highly fulfill user requirements in real applications while preserving more information of the original dataset empirical studies show that our approach can protect sensitive knowledge well not only under multiple thresholds but also under uniform threshold moreover the quality of the sanitized dataset can be maintained
web page classification is one of the essential techniques for web mining specifically classifying web pages of user interesting class is the first step of mining interesting information from the web however constructing classifier for an interesting class requires laborious pre processing such as collecting positive and negative training examples for instance in order to construct homepage classifier one needs to collect sample of homepages positive examples and sample of non homepages negative examples in particular collecting negative training examples requires arduous work and special caution to avoid biasing them we introduce in this paper the positive example based learning pebl framework for web page classification which eliminates the need for manually collecting negative training examples in pre processing we present an algorithm called mapping convergence that achieves classification accuracy with positive and unlabeled data as high as that of traditional svm with positive and negative data our experiments show that when the algorithm uses the same amount of positive examples as that of traditional svm the algorithm performs as well as traditional svm
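the following sketch shows the general two step shape of learning from positive and unlabeled data first guessing a set of reliable negatives then iteratively retraining an svm until the negative set stops growing the centroid distance heuristic and the linear svm are simplifications of ours and not the exact mapping convergence procedure

# two-step positive/unlabeled learning sketch (simplified, not the exact algorithm)
import numpy as np
from sklearn.svm import LinearSVC

def pu_learn(X_pos, X_unl, frac_rn=0.2, iters=10):
    # step 1: take the unlabeled points farthest from the positive centroid as reliable negatives
    dist = np.linalg.norm(X_unl - X_pos.mean(axis=0), axis=1)
    rn = np.argsort(dist)[-max(1, int(frac_rn * len(X_unl))):]
    negatives, remaining = X_unl[rn], np.delete(X_unl, rn, axis=0)

    # step 2: retrain an svm, moving newly predicted negatives out of the unlabeled pool
    clf = None
    for _ in range(iters):
        X = np.vstack([X_pos, negatives])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(negatives))]
        clf = LinearSVC().fit(X, y)
        if len(remaining) == 0:
            break
        pred = clf.predict(remaining)
        if not (pred == 0).any():
            break
        negatives = np.vstack([negatives, remaining[pred == 0]])
        remaining = remaining[pred == 1]
    return clf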
since software systems need to be continuously available under varying conditions their ability to evolve at runtime is increasingly seen as one key issue modern programming frameworks already provide support for dynamic adaptations however the high variability of features in dynamic adaptive systems das introduces an explosion of possible runtime system configurations often called modes and mode transitions designing these configurations and their transitions is tedious and error prone making the system feature evolution difficult while aspect oriented modeling aom was introduced to improve the modularity of software this paper presents how an aom approach can be used to tame the combinatorial explosion of das modes using aom techniques we derive wide range of modes by weaving aspects into an explicit model reflecting the runtime system we use these generated modes to automatically adapt the system we validate our approach on an adaptive middleware for home automation currently deployed in rennes metropolis
interactive multi user internet games require frequent state updates between players to accommodate the great demand for reality and interactivity the large latency and limited bandwidth on the internet greatly affects the games scalability the high level architecture hla is the ieee standard for distributed simulation with its data distribution management ddm service group assuming the functionalities of interest management with its support for reuse and interoperability and its ddm support for communication optimization the hla is promising at supporting multi user gaming on the internet however this usually requires particular prior security setup across administrative domains according to the specific run time infrastructure rti used we have previously developed service oriented hla rti sohr which enables distributed simulations to be conducted across administrated domains on the grid this paper discusses multi user gaming on the grid using sohr specifically maze game is used to illustrate how sohr enables users to join game conveniently experiments have been carried out to show how ddm can improve the communication efficiency
the world is changing and so must the data that describes its history not surprisingly considerable research effort has been spent in databases along this direction covering topics such as temporal models and schema evolution topic that has not received much attention however is that of concept evolution for example germany instance level concept has evolved several times in the last century as it went through different governance structures then split into two national entities that eventually joined again likewise caterpillar is transformed into butterfly while mother becomes two maternally related entities as well the concept of whale class level concept changed over the past two centuries thanks to scientific discoveries that led to better understanding of what the concept entails in this work we present formal framework for modeling querying and managing such evolution in particular we describe how to model the evolution of concept and how this modeling can be used to answer historical queries of the form how has concept evolved over period our proposal extends an rdf like model with temporal features and evolution operators then we provide query language that exploits these extensions and supports historical queries
blog post opinion retrieval aims at finding blog posts that are relevant and opinionated about user’s query in this paper we propose simple probabilistic model for assigning relevant opinion scores to documents the key problem is how to capture opinion expressions in the document that are related to the query topic current solutions enrich general opinion lexicons by finding query specific opinion lexicons using pseudo relevance feedback on external corpora or the collection itself in this paper we use general opinion lexicon and propose using proximity information in order to capture opinion term relatedness to the query we propose proximity based opinion propagation method to calculate the opinion density at each point in document the opinion density at the position of query term in the document can then be considered as the probability of opinion about the query term at that position the effect of different kernels for capturing the proximity is also discussed experimental results on the blog dataset show that the proposed method provides significant improvement over standard trec baselines and achieves increase in map over the best performing run in the trec blog track
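one simple way to realize such proximity based propagation is to let every opinion word spread weight to nearby positions through a kernel and to read the accumulated density off at the query term positions the gaussian kernel and its width below are placeholder choices rather than the paper's configuration

# opinion density at query-term positions via a gaussian proximity kernel
import math

def opinion_density(tokens, opinion_lexicon, query_terms, sigma=10.0):
    opinion_pos = [i for i, t in enumerate(tokens) if t in opinion_lexicon]
    def density(pos):
        # weight contributed by every opinion word, decaying with distance
        return sum(math.exp(-((pos - j) ** 2) / (2 * sigma ** 2)) for j in opinion_pos)
    q_pos = [i for i, t in enumerate(tokens) if t in query_terms]
    # average density over query-term occurrences as a simple document score
    return sum(density(p) for p in q_pos) / max(1, len(q_pos))

doc = "the camera is great but the battery life is terrible".split()
print(opinion_density(doc, {"great", "terrible"}, {"camera", "battery"}))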
with the increase in the size of data sets data mining has recently become an important research topic and is receiving substantial interest from both academia and industry at the same time interest in temporal databases has been increasing and growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time varying nature of the universe this paper investigates the confluence of these two areas surveys the work to date and explores the issues involved and the outstanding problems in temporal data mining
this paper examines some issues that affect the efficiency and fairness of the transmission control protocol tcp the backbone of internet protocol communication in multihop satellite network systems it proposes scheme that allows satellite systems to automatically adapt to any change in the number of active tcp flows due to handover occurrence the free buffer size and the bandwidth delay product of the network the proposed scheme has two major design goals increasing the system efficiency and improving its fairness the system efficiency is controlled by matching the aggregate traffic rate to the sum of the link capacity and total buffer size on the other hand the system min max fairness is achieved by allocating bandwidth among individual flows in proportion with their rtts the proposed scheme is dubbed recursive explicit and fair window adjustment refwa simulation results elucidate that the refwa scheme substantially improves the system fairness reduces the number of packet drops and makes better utilization of the bottleneck link the results demonstrate also that the proposed scheme works properly in more complicated environments where connections traverse multiple bottlenecks and the available bandwidth may change over data transmission time
propositional satisfiability solving or sat is an important reasoning task arising in numerous applications such as circuit design formal verification planning scheduling or probabilistic reasoning the depth first search dpll procedure is in practice the most efficient complete algorithm to date previous studies have shown the theoretical and experimental advantages of decomposing propositional formulas to guide the ordering of variable instantiation in dpll however in practice the computation of tree decomposition may require considerable amount of time and space on large formulas existing decomposition tools are unable to handle most currently challenging sat instances because of their size in this paper we introduce simple fast and scalable method to quickly produce tree decompositions of large sat problems we show experimentally the efficiency of orderings derived from these decompositions on the solving of challenging benchmarks
understanding web document and the sections inside the document is very important for web transformation and information retrieval from web pages detecting pagelets which are small features located inside web page in order to understand web document’s structure is difficult problem current work on pagelet detection focuses only on finding the location of the pagelet without regard to its functionality we describe method to detect both the location and functionality of pagelets using html element patterns for each pagelet type an html element pattern is created and matched to web page sections of the web page that match the patterns are marked as pagelet candidates we test this technique on multiple popular web pages from the news and commerce genres we find that this method adequately recalls various pagelets from the web page
the assumption of routing symmetry is often embedded into traffic analysis and classification tools this paper uses passively captured network data to estimate the amount of traffic actually routed symmetrically on specific link we propose flow based symmetry estimator fse set of metrics to assess symmetry in terms of flows packets and bytes which disregards inherently asymmetrical traffic such as udp icmp and tcp background radiation this normalized metric allows fair comparison of symmetry across different links we evaluate our method on large heterogeneous dataset and confirm anecdotal reports that routing symmetry typically does not hold for non edge internet links and decreases as one moves toward core backbone links due to routing policy complexity our proposed metric for traffic asymmetry induced by routing policies will help the community improve traffic characterization techniques and formats but also support quantitative formalization of routing policy effects on links in the wild
we present specification language called action language for model checking software specifications action language forms an interface between transition system models that model checker generates and high level specification languages such as statecharts rsml and scr similar to an assembly language between microprocessor and programming language we show that action language translations of statecharts and scr specifications are compact and they preserve the structure of the original specification action language allows specification of both synchronous and asynchronous systems it also supports modular specifications to enable compositional model checking
in previous work we have designed tracking protocol stalk for wireless sensor networks and proved it to be self stabilizing at the pseudo code automata level however it is very challenging to achieve and verify self stabilization of the same protocol at the implementation tinyos level due to the size of the corresponding program at the implementation level in this paper we present lightweight and practical method for specification based design of stabilization and illustrate this method on the stalk protocol as our case study
the analysis of large execution traces is almost impossible without efficient tool support lately there has been an increase in the number of tools for analyzing traces generated from object oriented systems this interest has been driven by the fact that polymorphism and dynamic binding pose serious limitations to static analysis however most of the techniques supported by existing tools are found in the context of very specific visualization schemes which makes them hard to reuse it is also very common to have two different tools implement the same techniques using different terminology this appears to result from the absence of common framework for trace analysis approaches this paper presents the state of the art in the area of trace analysis we do this by analyzing the techniques that are supported by eight trace exploration tools we also discuss their advantages and limitations and how they can be improved
context aware application in the pervasive computing environment provides intuitive user centric services using implicit context cues personalization and control are important issues for this class of application as they enable end users to understand and configure the behavior of an application however most development efforts for building context aware applications focus on the sensor fusion and machine learning algorithms to generate and distribute context cues that drive the application with little emphasis on user centric issues we argue that to elevate user experiences with context aware applications it is very important to address these personalization and control issues at the system interface level in parallel to context centric design towards this direction we present persona toolkit that provides support for extending context aware applications with end user personalization and control features specifically persona exposes few application programming interfaces that abstract end user customization and control mechanisms and enables developers to integrate these user centric aspects with rest of the application seamlessly there are two primary advantages of persona first it can be used with various existing middlewares as ready to use plug in to build customizable and controllable context aware applications second existing context aware applications can easily be augmented to provide end user personalization and control support in this paper we discuss the design and implementation of persona and demonstrate its usefulness through the development and augmentation of range of common context aware applications
most file systems attempt to predict which disk blocks will be needed in the near future and prefetch them into memory this technique can improve application throughput as much as but why the reasons include that the disk cache comes into play the device driver amortizes the fixed cost of an operation over larger amount of data total disk seek time can be decreased and that programs can overlap computation and however intuition does not tell us the relative benefit of each of these causes or techniques for increasing the effectiveness of prefetching to answer these questions we constructed an analytic performance model for file system reads the model is based on bsd derived file system and parameterized by the access patterns of the files layout of files on disk and the design characteristics of the file system and of the underlying disk we then validated the model against several simple workloads the predictions of our model were typically within of measured values and differed at most by from measured values using the model and experiments we explain why and when prefetching works and make proposals for how to tune file system and disk parameters to improve overall system throughput
in order to provide general access control methodology for parts of xml documents we propose combining role based access control as found in the role graph model with methodology originally designed for object oriented databases we give description of the methodology showing how different access modes xpath expressions and roles can be combined and how propagation of permissions is handled given this general approach system developer can design complex authorization model for collections of xml documents
we present new window based method for correspondence search using varying support weights we adjust the support weights of the pixels in given support window based on color similarity and geometric proximity to reduce the image ambiguity our method outperforms other local methods on standard stereo benchmarks
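a bare bones version of the weighting step might look as follows with the weight of each window pixel decaying with its color difference and spatial distance from the window center the exponential form and the two gamma constants are assumptions of ours and the aggregation of matching costs over both windows is omitted

# per-pixel support weights inside a square window (illustrative constants)
import math
import numpy as np

def support_weights(img, cx, cy, radius=4, gamma_c=10.0, gamma_g=7.0):
    # img is assumed to be an (H, W, 3) color image
    h, w, _ = img.shape
    center = img[cy, cx].astype(float)
    weights = {}
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            dc = np.linalg.norm(img[y, x].astype(float) - center)  # color similarity
            dg = math.hypot(x - cx, y - cy)                        # geometric proximity
            weights[(y, x)] = math.exp(-(dc / gamma_c + dg / gamma_g))
    return weights

in a full correspondence search the per pixel matching costs would then be averaged with these weights typically combining the weights from the windows in both images before comparing candidate disparities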
increasing risks of spoof attacks and other common problems of unimodal biometric systems such as intra class variations non universality and noisy data necessitate the use of multimodal biometrics the face and the ear are highly attractive biometric traits for combination because of their physiological structure and location besides both of them can be acquired non intrusively however changes of facial expressions variations in pose scale and illumination and the presence of hair and ornaments present some genuine challenges in this paper local feature based approach is proposed to fuse ear and face biometrics at the score level experiments with frgc and the university of notre dame biometric databases show that the technique achieves an identification rate of and verification rate of at far for fusion of the ear with neutral face biometrics it is also found to be fast and robust to facial expressions achieving and identification and verification rates respectively
synchronous data flow languages such as scade lustre manage infinite sequences or streams as primitive values making them naturally adapted to the description of dominated systems their conservative extension with means to define control structures or modes has been long term research topic through which several solutions have emerged in this paper we pursue this effort and generalize existing solutions by providing two constructs general form of state machines called parameterized state machines and valued signals as can be found in esterel parameterized state machines greatly reduce the reliance on error prone mechanisms such as shared memory in automaton based programming signals provide new way of programming with multi rate data in synchronous dataflow languages together they allow for much more direct and natural programming of systems that combine dataflow and state machines the proposed extension is fully implemented in the new lucid synchrone compiler
this paper describes new hierarchical approach to content based image retrieval called the customized queries approach cqa contrary to the single feature vector approach which tries to classify the query and retrieve similar images in one step cqa uses multiple feature sets and two step approach to retrieval the first step classifies the query according to the class labels of the images using the features that best discriminate the classes the second step then retrieves the most similar images within the predicted class using the features customized to distinguish subclasses within that class needing to find the customized feature subset for each class led us to investigate feature selection for unsupervised learning as result we developed new algorithm called fssem feature subset selection using expectation maximization clustering we applied our approach to database of high resolution computed tomography lung images and show that cqa radically improves the retrieval precision over the single feature vector approach to determine whether our cbir system is helpful to physicians we conducted an evaluation trial with eight radiologists the results show that our system using cqa retrieval doubled the doctors diagnostic accuracy
blind scheduling policies schedule tasks without knowledge of the tasks remaining processing times existing blind policies such as fcfs ps and las have proven useful in network and operating system applications but each policy has separate vastly differing description leading to separate and distinct implementations this paper presents the design and implementation of configurable blind scheduler that contains continuous tunable parameter by merely changing the value of this parameter the scheduler’s policy exactly emulates or closely approximates several existing standard policies other settings enable policies whose behavior is hybrid of these standards we demonstrate the practical benefits of such configurable scheduler by implementing it into the linux operating system we show that we can emulate the behavior of linux’s existing more complex scheduler with single hybrid setting of the parameter we also show using synthetic workloads that the best value for the tunable parameter is not unique but depends on distribution of the size of tasks arriving to the system finally we use our formulation of the configurable scheduler to contrast the behavior of various blind schedulers by exploring how various properties of the scheduler change as we vary our scheduler’s tunable parameter
in this paper we investigate generic methods for placing photos uploaded to flickr on the world map as primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken central to our approach is language model based entirely on the annotations provided by users we define extensions to improve over the language model using tag based smoothing and cell based smoothing and leveraging spatial ambiguity further we demonstrate how to incorporate geonames www.geonames.org large external database of locations for varying levels of granularity we are able to place images on map with at least twice the precision of the state of the art reported in the literature
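a toy rendering of the basic language model component follows assuming the map is cut into fixed size grid cells each holding a smoothed multinomial over training tags the cell size the smoothing constant and the two example photos are illustrative only and the tag based and cell based smoothing extensions are not modeled

# grid-cell language models over photo tags; a photo is placed in the best-scoring cell
import math
from collections import defaultdict, Counter

def cell_of(lat, lon, size=1.0):
    return (int(lat // size), int(lon // size))

def train(photos):                     # photos: list of (lat, lon, [tags])
    models, vocab = defaultdict(Counter), set()
    for lat, lon, tags in photos:
        models[cell_of(lat, lon)].update(tags)
        vocab.update(tags)
    return models, vocab

def place(tags, models, vocab, alpha=0.1):
    def loglik(counts):
        # laplace-smoothed multinomial log likelihood of the photo's tags
        total = sum(counts.values()) + alpha * len(vocab)
        return sum(math.log((counts[t] + alpha) / total) for t in tags)
    return max(models, key=lambda c: loglik(models[c]))

models, vocab = train([(48.8, 2.3, ["eiffel", "paris"]), (40.7, -74.0, ["nyc", "broadway"])])
print(place(["paris", "eiffel"], models, vocab))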
as massively parallel computers proliferate there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted this problem arises in diverse contexts such as parallelizing compilers parallel performance monitoring and parallel algorithm development in this paper we describe one solution where one directly executes the application code but uses discrete event simulator to model details of the presumed parallel machine such as operating system and communication network behavior because this approach is computationally expensive we are interested in its own parallelization specifically the parallelization of the discrete event simulator we describe methods suitable for parallelized direct execution simulation of message passing parallel programs and report on the performance of such system lapse large application parallel simulation environment we have built on the intel paragon on all codes measured to date lapse predicts performance well typically within relative error depending on the nature of the application code we have observed low slowdowns relative to natively executing code and high relative speedups using up to processors
formal verification techniques are not yet widely used in the software industry perhaps because software tends to be more complex than hardware and the penalty for bugs is often lower software can be patched after the release instead large amount of time and money is being spent on software testing which misses many subtle errors especially in concurrent programs increased use of concurrency eg due to the popularity of web services and the surge of complex viruses which exploit security vulnerabilities of software make the problem of creating verifying compiler for production quality code essential and urgent
multimedia and network processing applications make extensive use of subword data since registers are capable of holding full data word when subword variable is assigned register only part of the register is used new embedded processors have started supporting instruction sets that allow direct referencing of bit sections within registers and therefore multiple subword variables can be made to simultaneously reside in the same register without hindering accesses to these variables however new register allocation algorithm is needed that is aware of the bitwidths of program variables and is capable of packing multiple subword variables into single register this paper presents one such algorithm the algorithm we propose has two key steps first combination of forward and backward data flow analyses are developed to determine the bitwidths of program variables throughout the program this analysis is required because the declared bitwidths of variables are often larger than their true bitwidths and moreover the minimal bitwidths of program variable can vary from one program point to another second novel interference graph representation is designed to enable support for fast and highly accurate algorithm for packing of subword variables into single register packing is carried out by node coalescing phase that precedes the conventional graph coloring phase of register allocation in contrast to traditional node coalescing packing coalesces set of interfering nodes our experiments show that our bitwidth aware register allocation algorithm reduces the register requirements by to over traditional register allocation algorithm that assigns separate registers to simultaneously live subword variables
the core task of sponsored search is to retrieve relevant ads for the user’s query ads can be retrieved either by exact match when their bid term is identical to the query or by advanced match which indexes ads as documents and is similar to standard information retrieval ir recently there has been great deal of research into developing advanced match ranking algorithms however no previous research has addressed the ad indexing problem unlike most traditional search problems the ad corpus is defined hierarchically in terms of advertiser accounts campaigns and ad groups which further consist of creatives and bid terms this hierarchical structure makes indexing highly non trivial as naively indexing all possible displayable ads leads to prohibitively large and ineffective index we show that ad retrieval using such an index is not only slow but its precision is suboptimal as well we investigate various strategies for compact hierarchy aware indexing of sponsored search ads through adaptation of standard ir indexing techniques we also propose new ad retrieval method that yields more relevant ads by exploiting the structured nature of the ad corpus experiments carried out over large ad test collection from commercial search engine show that our proposed methods are highly effective and efficient compared to more standard indexing and retrieval approaches
existing methods for measuring the quality of search algorithms use static collection of documents set of queries and mapping from the queries to the relevant documents allow the experimenter to see how well different search engines or engine configurations retrieve the correct answers this methodology assumes that the document set and thus the set of relevant documents are unchanging in this paper we abandon the static collection requirement we begin with recent trec collection created from web crawl and analyze how the documents in that collection have changed over time we determine how decay of the document collection affects trec systems and present the results of an experiment using the decayed collection to measure live web search system we employ novel measures of search effectiveness that are robust despite incomplete relevance information lastly we propose methodology of collection maintenance which supports measuring search performance both for single system and between systems run at different points in time
xml has tree structured data model which is used to uniformly represent structured as well as semi structured data and also enable concise query specification in xquery via the use of its xpath twig patterns this in turn can leverage the recently developed technology of structural join algorithms to evaluate the query efficiently in this paper we identify fundamental tension in xml data modeling data represented as deep trees which can make effective use of twig patterns are often un normalized leading to update anomalies while normalized data tends to be shallow resulting in heavy use of expensive value based joins in queries our solution to this data modeling problem is novel multi colored trees mct logical data model which is an evolutionary extension of the xml data model and permits trees with multi colored nodes to signify their participation in multiple hierarchies this adds significant semantic structure to individual data nodes we extend xquery expressions to navigate between structurally related nodes taking color into account and also to create new colored trees as restructurings of an mct database while mct serves as significant evolutionary extension to xml as logical data model one of the key roles of xml is for information exchange to enable exchange of mct information we develop algorithms for optimally serializing an mct database as xml we discuss alternative physical representations for mct databases using relational and native xml databases and describe an implementation on top of the timber native xml database experimental evaluation using our prototype implementation shows that not only are mct queries updates more succinct and easier to express than equivalent shallow tree xml queries but they can also be significantly more efficient to evaluate than equivalent deep and shallow tree xml queries updates
the large address space needs of many current applications have pushed processor designs toward bit word widths although full bit addresses and operations are indeed sometimes needed arithmetic operations on much smaller quantities are still more common in fact another instruction set trend has been the introduction of instructions geared toward subword operations on bit quantities for example most major processors now include instruction set support for multimedia operations allowing parallel execution of several subword operations in the same alu this article presents our observations demonstrating that operations on narrow width quantities are common not only in multimedia codes but also in more general workloads in fact across the specint benchmarks over half the integer operation executions require bits or less based on this data we propose two hardware mechanisms that dynamically recognize and capitalize on these narrow width operations the first power oriented optimization reduces processor power consumption by using operand value based clock gating to turn off portions of arithmetic units that will be unused by narrow width operations this optimization results in reduction in the integer unit’s power consumption for the specint and mediabench benchmark suites applying this optimization to specfp benchmarks results in slightly smaller power reductions but still seems warranted these reductions in integer unit power consumption equate to full chip power savings our second performance oriented optimization improves processor performance by packing together narrow width operations so that they share single arithmetic unit conceptually similar to dynamic form of mmx this optimization offers speedups of for specint and for mediabench overall these optimizations highlight an increasing opportunity for value based optimizations to improve both power and performance in current microprocessors
we describe novel method for controlling physics based fluid simulations through gradient based nonlinear optimization using technique known as the adjoint method derivatives can be computed efficiently even for large simulations with millions of control parameters in addition we introduce the first method for the full control of free surface liquids we show how to compute adjoint derivatives through each step of the simulation including the fast marching algorithm and describe new set of control parameters specifically designed for liquids
context sensitive points to analysis maintains separate points to relationships for each possible abstract calling context of method previous work has shown that large number of equivalence classes exists in the representation of calling contexts such equivalent contexts provide opportunities for context sensitive analyses based on binary decision diagrams bdds in which bdds automatically merge equivalent points to relationships however the use of bdd black box introduces additional overhead for analysis running time furthermore with heap cloning ie using context sensitive object allocation sites bdds are not as effective because the number of equivalence classes increases significantly further step must be taken to look inside the bdd black box to investigate where the equivalence comes from and what tradeoffs can be employed to enable practical large scale heap cloning this paper presents an analysis for java that exploits equivalence classes in context representation for particular pointer variable or heap object all abstract contexts within an equivalence class can be merged this technique naturally results in new non bdd context sensitive points to analysis based on these equivalence classes the analysis employs last substring merging approach to define scalability and precision tradeoffs we show that small values for can enable scalable heap cloning for large java programs the proposed analysis has been implemented and evaluated on large set of java programs the experimental results show improvements over an existing object sensitive analysis with heap cloning which is the most precise scalable analysis implemented in the state of the art paddle analysis framework for computing points to solution for an entire program our approach is an order of magnitude faster compared to this bdd based analysis and to related non bdd refinement based analysis
consequence of ilp systems being implemented in prolog or using prolog libraries is that usually these systems use prolog internal database to store and manipulate data however in real world problems the original data is rarely in prolog format in fact the data is often kept in relational database management systems rdbms and then converted to format acceptable by the ilp system therefore more interesting approach is to link the ilp system to the rdbms and manipulate the data without converting it this scheme has the advantage of being more scalable since the whole data does not need to be loaded into memory by the ilp system in this paper we study several approaches of coupling ilp systems with rdbms systems and evaluate their impact on performance we propose to use deductive database ddb system to transparently translate the hypotheses to relational algebra expressions the empirical evaluation performed shows that the execution time of ilp algorithms can be effectively reduced using ddb and that the size of the problems can be increased due to non memory storage of the data
deployments of wireless lans consisting of hundreds of access points with large number of users have been reported in enterprises as well as college campuses however due to the unreliable nature of wireless links users frequently encounter degraded performance and lack of coverage this problem is even worse in unplanned networks such as the numerous access points deployed by homeowners existing approaches that aim to diagnose these problems are inefficient because they troubleshoot at too high level and are unable to distinguish among the root causes of degradation this paper designs implements and tests fine grained detection algorithms that are capable of distinguishing between root causes of wireless anomalies at the depth of the physical layer an important property that emerges from our system is that diagnostic observations are combined from multiple sources over multiple time instances for improved accuracy and efficiency
the objective of this research is to apply markerless augmented reality ar techniques to aid in the visualisation of robotic helicopter related tasks conventional robotic ar applications work well with markers in prepared environments but are infeasible in outdoor settings in this paper we present preliminary results from real time markerless ar system for tracking natural features in an agricultural scene by constructing virtual marker under known initial configuration of the robotic helicopter camera and the ground plane the camera pose can be continuously tracked using the natural features from the image sequence to perform augmentation of virtual objects the experiments are simulated on mock up model of an agricultural farm and the results show that the current ar system is capable of tracking the camera pose accurately for translational motions and roll rotations future work includes reducing jitter in the virtual marker vertices to improve camera pose estimation accuracy for pitch and yaw rotations and implementing feature recovery algorithms
two key steps in the compilation of strict functional languages are the conversion of higher order functions to data structures closures and the transformation to tail recursive style we show how to perform both steps at once by applying first order offline partial evaluation to suitable interpreter the resulting code is easy to transliterate to low level or native code we have implemented the compilation to it yields performance comparable to that of other modern scheme to compilers in addition we have integrated various optimizations such as constant propagation higher order removal and arity raising simply by modifying the underlying interpreter purely first order methods suffice to achieve the transformations our approach is an instance of semantics directed compiler generation
this paper gives short introduction into parameterized complexity theory aimed towards database theorists interested in this area the main results presented here classify the evaluation of first order queries and conjunctive queries as hard parameterized problems
image restoration is an important and widely studied problem in computer vision and image processing various image filtering strategies have been effective but invariably make strong assumptions about the properties of the signal and or degradation hence these methods lack the generality to be easily applied to new applications or diverse image collections this paper describes novel unsupervised information theoretic adaptive filter uinta that improves the predictability of pixel intensities from their neighborhoods by decreasing their joint entropy in this way uinta automatically discovers the statistical properties of the signal and can thereby restore wide spectrum of images the paper describes the formulation to minimize the joint entropy measure and presents several important practical considerations in estimating neighborhood statistics it presents series of results on both real and synthetic data along with comparisons with current state of the art techniques including novel applications to medical image processing
given terabyte click log can we build an efficient and effective click model it is commonly believed that web search click logs are gold mine for search business because they reflect users preference over web documents presented by the search engine click models provide principled approach to inferring user perceived relevance of web documents which can be leveraged in numerous applications in search businesses due to the huge volume of click data scalability is must we present the click chain model ccm which is based on solid bayesian framework it is both scalable and incremental perfectly meeting the computational challenges imposed by the voluminous click logs that constantly grow we conduct an extensive experimental study on data set containing million query sessions obtained in july from commercial search engine ccm consistently outperforms two state of the art competitors in number of metrics with over better log likelihood over better click perplexity and much more robust up to prediction of the first and the last clicked position
in nutshell duality for constraint satisfaction problem equates the existence of one homomorphism to the non existence of other homomorphisms in this survey paper we give an overview of logical combinatorial and algebraic aspects of the following forms of duality for constraint satisfaction problems finite duality bounded pathwidth duality and bounded treewidth duality
novel face recognition algorithm based on gabor texture information is proposed in this paper two kinds of strategies to capture it are introduced gabor magnitude based texture representation gmtr and gabor phase based texture representation gptr specifically gmtr is characterized by using the gamma density to model the gabor magnitude distribution while gptr is characterized by using the generalized gaussian density ggd to model the gabor phase distribution the estimated model parameters serve as texture representation experiments are performed on yale orl and feret databases to validate the feasibility of the proposed method the results show that the proposed gmtr based and gptr based nlda both significantly outperform the widely used gabor features based nlda and other existing subspace methods in addition the feature level fusion of these two kinds of texture representations performs better than them individually
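a small sketch of the gmtr side follows assuming a bank of gabor filters whose magnitude responses are summarized by fitted gamma parameters the filter settings are placeholders rather than the paper's configuration and the gptr counterpart fitting a generalized gaussian to the phase is omitted

# gabor magnitude texture representation sketch: fitted gamma parameters as features
import numpy as np
from scipy.signal import convolve2d
from scipy.stats import gamma

def gabor_magnitude(image, freq=0.2, theta=0.0, sigma=3.0, size=15):
    # image is assumed to be a 2d grayscale array
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    real = convolve2d(image, env * np.cos(2 * np.pi * freq * xr), mode="same")
    imag = convolve2d(image, env * np.sin(2 * np.pi * freq * xr), mode="same")
    return np.abs(real + 1j * imag)

def gmtr_descriptor(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    feats = []
    for th in thetas:
        mag = gabor_magnitude(image, theta=th).ravel() + 1e-6
        shape, _, scale = gamma.fit(mag, floc=0)   # fit a gamma density to the magnitudes
        feats.extend([shape, scale])
    return np.array(feats)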
we propose novel privacy preserving distributed infrastructure in which data resides only with the publishers owning it the infrastructure disseminates user queries to publishers who answer them at their own discretion the infrastructure enforces publisher anonymity guarantee which prevents leakage of information about which publishers are capable of answering certain query given the virtual nature of the global data collection we study the challenging problem of efficiently locating publishers in the community that contain data items matching specified query we propose distributed index structure uqdt that is organized as union of query dissemination trees qdts and realized on an overlay ie logical network infrastructure each qdt has data publishers as its leaf nodes and overlay network nodes as its internal nodes each internal node routes queries to publishers based on summary of the data advertised by publishers in its subtrees we experimentally evaluate design tradeoffs and demonstrate that uqdt can maximize throughput by preventing any overlay network node from becoming bottleneck
web pages often contain objects created at different times the information about the age of such objects may provide useful context for understanding page content and may serve many potential uses in this paper we describe novel concept for detecting approximate creation dates of content elements in web pages our approach is based on dynamically reconstructing page histories using data extracted from external sources web archives and efficiently searching inside them to detect insertion dates of content elements we discuss various issues involving the proposed approach and demonstrate the example of an application that enhances browsing the web by inserting annotations with temporal metadata into page content on user request
program slicing systematically identifies parts of program relevant to seed statement unfortunately slices of modern programs often grow too large for human consumption we argue that unwieldy slices arise primarily from an overly broad definition of relevance rather than from analysis imprecision while traditional slice includes all statements that may affect point of interest not all such statements appear equally relevant to human as an improved method of finding relevant statements we propose thin slicing thin slice consists only of producer statements for the seed ie those statements that help compute and copy a value to the seed statements that explain why producers affect the seed are excluded for example for seed that reads value from container object thin slice includes statements that store the value into the container but excludes statements that manipulate pointers to the container itself thin slices can also be hierarchically expanded to include statements explaining how producers affect the seed yielding traditional slice in the limit we evaluated thin slicing for set of debugging and program understanding tasks the evaluation showed that thin slices usually included the desired statements for the tasks eg the buggy statement for debugging task furthermore in simulated use of slicing tool thin slices revealed desired statements after inspecting times fewer statements than traditional slicing for our debugging tasks and times fewer statements for our program understanding tasks finally our thin slicing algorithm scales well to relatively large java benchmarks suggesting that thin slicing represents an attractive option for practical tools
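the container example above can be made concrete with a few annotated lines the inclusion comments reflect our reading of the producer definition and are not output of the authors' tool

# which statements a thin slice keeps for the seed on the last line (annotations ours)
def compute():
    return 42

def log(x):
    pass

def example():
    box = {}           # excluded: only creates the container
    alias = box        # excluded: copies a pointer to the container, not the value
    v = compute()      # included: produces the value that reaches the seed
    alias["key"] = v   # included: stores (copies) the value into the container
    log(alias)         # excluded: does not help compute the value read at the seed
    return box["key"]  # seed statement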
reconfigurable hardware is ideal for use in systems on chip soc as it provides both hardware level performance and post fabrication flexibility however any one architecture is rarely equally optimized for all applications socs targeting specific set of applications can greatly benefit from incorporating customized reconfigurable logic instead of generic field programmable gate array fpga logic unfortunately manually designing domain specific architecture for every soc would require significant design time instead this paper discusses our initial efforts towards creating reconfigurable hardware generator capable of automatically creating flexible yet domain specific designs our tests indicate that our generated architectures are more than smaller than equivalent fpga implementations and nearly as area efficient as standard cell designs we also use novel technique employing synthetic circuit generation to demonstrate the flexibility of our architecture generation techniques
from personal software to advanced systems caching mechanisms have steadfastly been ubiquitous means for reducing workloads it is no surprise then that under the grid and cluster paradigms middlewares and other large scale applications often seek caching solutions among these distributed applications scientific workflow management systems have gained ground towards mitigating the often painstaking process of composing sequences of scientific data sets and services to derive virtual data in the past workflow managers have relied on low level system cache for reuse support but in distributed query intensive environments where high volumes of intermediate virtual data can potentially be stored anywhere on the grid novel cache structure is needed to efficiently facilitate workflow planning in this paper we describe an approach to combat the challenges of maintaining large fast virtual data caches for workflow composition hierarchical structure is proposed for indexing scientific data with spatiotemporal annotations across grid nodes our experimental results show that our hierarchical index is scalable and outperforms centralized indexing scheme by an exponential factor in query intensive environments
at the highest level of formal certification the current research trend consists in providing evaluators with formal checkable proof produced by automatic verification tools the aim is to reduce the certification process to verifying the provided proof using proof checker however to date no certified proof checker has emerged in addition checkable proofs do not eliminate the need to validate the formalization of the verification problem in this paper we consider the point of view of evaluators we elaborate criteria that must be fulfilled by formal proof in order to convince skeptical evaluators then we present methodology based on this notion of convincing proofs that requires simple formalizations to reach the level of confidence of formal certification the key idea is to build certified proof checker in collaboration with the evaluators which is finally used to validate the proof provided by developers we illustrate our approach on the correctness proof of buffering protocol written in that manages the data exchanges between concurrent tasks in avionics control systems
nowadays grid computing has been widely recognized as the next big thing in distributed software development grid technologies allow developers to implement massively distributed applications with enormous demands for resources such as processing power data and network bandwidth despite the important benefits grid computing offers contemporary approaches for grid enabling applications still force developers to invest much effort into manually providing code to discover and access grid resources and services moreover the outcome of this task is usually software that is polluted by grid aware code as result of which the maintainability suffers in previous article we presented jgrim novel approach to easily gridify java applications in this paper we report detailed evaluation of jgrim that was conducted by comparing it with ibis and proactive two platforms for grid development specifically we used these three platforms for gridifying the nearest neighbor algorithm and an application for restoring panoramic images the results show that jgrim simplifies gridification without resigning performance for these applications
while touch screen displays are becoming increasingly popular many factors affect user experience and performance surface quality parallax input resolution and robustness for instance can vary with sensing technology hardware configurations and environmental conditions we have developed framework for exploring how we could overcome some of these dependencies by leveraging the higher visual and input resolution of small coarsely tracked mobile devices for direct precise and rapid interaction on large digital displays the results from formal user study show no significant differences in performance when comparing four techniques we developed for tracked mobile device where two existing touch screen techniques served as baselines the mobile techniques however had more consistent performance and smaller variations among participants and an overall higher user preference in our setup our results show the potential of spatially aware handhelds as an interesting complement or substitute for direct touch interaction on large displays
we present set of time efficient approaches to index objects moving on the plane to efficiently answer range queries about their future positions our algorithms are based on previously described solutions as well as on the employment of efficient access methods finally an experimental evaluation is included that shows the performance scalability and efficiency of our methods
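to pin down the query semantics being indexed here is a brute force reference of our own each object is a reference position plus a velocity vector and a future range query asks which objects fall inside a rectangle at time t none of the paper's index structures or access methods appear in this sketch

# brute-force future range query over linearly moving objects
from dataclasses import dataclass

@dataclass
class MovingObject:
    oid: int
    x: float
    y: float
    vx: float
    vy: float
    t0: float = 0.0

def future_range_query(objects, xmin, xmax, ymin, ymax, t):
    hits = []
    for o in objects:
        dt = t - o.t0
        px, py = o.x + o.vx * dt, o.y + o.vy * dt    # predicted position at time t
        if xmin <= px <= xmax and ymin <= py <= ymax:
            hits.append(o.oid)
    return hits

objs = [MovingObject(1, 0, 0, 1, 0), MovingObject(2, 5, 5, -1, -1)]
print(future_range_query(objs, 2, 4, -1, 1, t=3.0))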
the design of complex systems requires powerful mechanisms for modeling data state communication and real time behaviour as well as for structuring and decomposing systems in order to control local complexity timed communicating object z tcoz builds on object z’s strengths in modeling complex data and state and on timed csp’s strengths in modeling process control and real time interactions in this paper we demonstrate the tcoz approach to the design and verification of the teleservices and remote medical care system
recent user interface concepts such as multimedia multimodal wearable ubiquitous tangible or augmented reality based ar interfaces each cover different approaches that are all needed to support complex human computer interaction increasingly an overarching approach towards building what we call ubiquitous augmented reality uar user interfaces that include all of the just mentioned concepts will be required to this end we present user interface architecture that can form sound basis for combining several of these concepts into complex systems we explain in this paper the fundamentals of dwarf user interface framework dwarf standing for distributed wearable augmented reality framework and an implementation of this architecture finally we present several examples that show how the framework can form the basis of prototypical applications
in wireless sensor networks one of the main design challenges is to save severely constrained energy resources and obtain long system lifetime low cost of sensors enables us to randomly deploy large number of sensor nodes thus potential approach to solve lifetime problem arises that is to let sensors work alternately by identifying redundant nodes in high density networks and assigning them an off duty operation mode that has lower energy consumption than the normal on duty mode in single wireless sensor network sensors are performing two operations sensing and communication therefore there might exist two kinds of redundancy in the network most of the previous work addressed only one kind of redundancy sensing or communication alone wang et al integrated coverage and connectivity configuration in wireless sensor networks in proceedings of the first acm conference on embedded networked sensor systems sensys los angeles november and zhang and hou maintaining sensing coverage and connectivity in large sensor networks technical report uiucdcs june first discussed how to combine consideration of coverage and connectivity maintenance in single activity scheduling they provided sufficient condition for safe scheduling integration in those fully covered networks however random node deployment often makes initial sensing holes inside the deployed area inevitable even in an extremely high density network therefore in this paper we enhance their work to support general wireless sensor networks by proving another conclusion that the communication range being twice the sensing range is the sufficient condition and the tight lower bound to ensure that complete coverage preservation implies connectivity among active nodes if the original network topology consisting of all the deployed nodes is connected also we extend the result to degree network connectivity and degree coverage preservation
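writing r_c for the communication range and r_s for the sensing range our notation not the paper's the proved condition can be stated compactly as

$$ R_c \ge 2\,R_s $$

under this condition preserving complete coverage among the active nodes implies that they stay connected whenever the original topology of all deployed nodes is connected and the factor of two is a tight lower bound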
geospatial data play key role in wide spectrum of critical data management applications such as disaster and emergency management environmental monitoring land and city planning and military operations often requiring the coordination among diverse organizations their data repositories and users with different responsibilities although variety of models and techniques are available to manage access and share geospatial data very little attention has been paid to addressing security concerns such as access control security and privacy policies and the development of secure and in particular interoperable gis applications the objective of this paper is to discuss the technical challenges raised by the unique requirements of secure geospatial data management and to suggest comprehensive framework for security and privacy for geospatial data and gis such framework is the first coherent architectural approach to the problem of security and privacy for geospatial data
in virtual machines for embedded devices that use just in time compilation the management of the code cache can significantly impact performance in terms of both memory usage and start up time although improving memory usage has been common focus for system designers start up time is often overlooked in systems with constrained resources however these two performance metrics are often at odds and must be considered together in this paper we present an adaptive self adjusting code cache manager to improve performance with respect to both start up time and memory usage it balances these concerns by detecting changes in method compilation rates resizing the cache after each pitching event we conduct experiments to validate our proposed system and quantify the impacts that different code cache management techniques have on memory usage and start up time through two oracle systems our results show that the proposed algorithm yields nearly the same start up times as hand tuned oracle and shorter execution times than those of the sscli in eight out of ten applications it also has lower memory usage over time in all but one application
structural testing techniques such as statement and branch coverage play an important role in improving dependability of software systems however finding set of tests which guarantees high coverage is time consuming task in this paper we present technique for structural testing based on kernel computation kernel satisfies the property that any set of tests which executes all vertices edges of the kernel executes all vertices edges of the program’s flowgraph we present linear time algorithm for computing minimum kernels based on pre and post dominator relations of flowgraph
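a rough python sketch of the covering property behind kernels, under the assumption that the flowgraph is given as adjacency lists with designated entry and exit nodes; it computes dominator and post dominator sets by a simple fixpoint and then picks a covering set greedily, so it only illustrates the property rather than reproducing the paper's linear time minimum kernel algorithm

# sketch: a vertex w is "implied" by v when executing v forces execution of w,
# i.e. w dominates v or w post-dominates v; a kernel is a set of vertices whose
# implied sets jointly cover the whole flowgraph.

def dominators(succ, entry):
    """Iterative dataflow computation of dominator sets (dom[v] always contains v)."""
    nodes = set(succ)
    pred = {v: set() for v in nodes}
    for v, ws in succ.items():
        for w in ws:
            pred[w].add(v)
    dom = {v: set(nodes) for v in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for v in nodes - {entry}:
            new = {v} | (set.intersection(*(dom[p] for p in pred[v])) if pred[v] else set())
            if new != dom[v]:
                dom[v], changed = new, True
    return dom

def greedy_kernel(succ, entry, exit_):
    # reversed graph: dominators from the exit are post-dominators of the original
    rev = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            rev[w].append(v)
    dom = dominators(succ, entry)        # w in dom[v]  <=> w dominates v
    pdom = dominators(rev, exit_)        # w in pdom[v] <=> w post-dominates v
    implied = {v: dom[v] | pdom[v] for v in succ}   # executing v implies all of these
    uncovered, kernel = set(succ), []
    while uncovered:
        best = max(uncovered, key=lambda v: len(implied[v] & uncovered))
        kernel.append(best)
        uncovered -= implied[best]
    return kernel

# toy flowgraph: entry -> a -> (b | c) -> d -> exit
g = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["exit"], "exit": []}
print(greedy_kernel(g, "entry", "exit"))   # typically ['b', 'c']: covering both branches covers everything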
the user interfaces of the most popular search engines are largely the same typically users are presented with an ordered list of documents which provide limited help if users are having trouble finding the information they need this article presents an interface called the venn diagram interface vdi that offers users improved search transparency the vdi allows users to see how each term or group of terms in query contributes to the entire result set of search furthermore it allows users to browse the result sets generated by each of these terms in test with participants the vdi was compared against standard web search interface google with the vdi users were able to find more documents of higher relevance and were more inclined to continue searching their level of interactivity was higher the quality of the answers they found was perceived to be better eight out of users preferred the vdi
suffix trees are indexing structures that enhance the performance of numerous string processing algorithms in this paper we propose cache conscious suffix tree construction algorithms that are tailored to cmp architectures the proposed algorithms utilize novel sample based cache partitioning algorithm to improve cache performance and exploit on chip parallelism on cmps furthermore several compression techniques are applied to effectively trade space for cache performance through an extensive experimental evaluation using real text data from different domains we demonstrate that the algorithms proposed herein exhibit better cache performance than their cache unaware counterparts and effectively utilize all processing elements achieving satisfactory speedup
online transaction processing oltp databases include suite of features disk resident trees and heap files locking based concurrency control support for multi threading that were optimized for computer technology of the late advances in modern processors memories and networks mean that today’s computers are vastly different from those of years ago such that many oltp databases will now fit in main memory and most oltp transactions can be processed in milliseconds or less yet database architecture has changed little based on this observation we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends and speculate on their performance through detailed instruction level breakdown of the major components involved in transaction processing database system shore running subset of tpc rather than simply profiling shore we progressively modified it so that after every feature removal or optimization we had faster working system that fully ran our workload overall we identify overheads and optimizations that explain total difference of about factor of in raw performance we also show that there is no single high pole in the tent in modern memory resident database systems but that substantial time is spent in logging latching locking tree and buffer management operations
traditionally clock network layout is performed after cell placement such methodology is facing serious problem in nanometer ic designs where people tend to use huge clock buffers for robustness against variations that is clock buffers are often placed far from ideal locations to avoid overlap with logic cells as result both power dissipation and timing are degraded in order to solve this problem we propose low power clock buffer planning methodology which is integrated with cell placement bin divided grouping algorithm is developed to construct virtual buffer tree which can explicitly model the clock buffers in placement the virtual buffer tree is dynamically updated during the placement to reflect the changes of latch locations to reduce power dissipation latch clumping is incorporated with the clock buffer planning the experimental results show that our method can reduce clock power significantly by on average
with recent advances in computing and communication technologies making mobile devices more powerful the scope of grid computing has been broadened to include mobile and pervasive devices energy has become critical resource in such devices so battery energy limitation is the main challenge towards enabling persistent mobile grid computing in this paper we address the problem of energy constrained scheduling for the grid environment where there is limited energy budget for grid applications the paper investigates both energy minimization for mobile devices and grid utility optimization problem we formalize energy aware scheduling using nonlinear optimization theory under constraints of energy budget and deadline the paper also proposes distributed pricing based algorithm that is used to trade off energy and deadline to achieve system wide optimization based on the preference of the grid user the simulations reveal that the proposed energy constrained scheduling algorithms can obtain better performance than the previous approach that considers both energy consumption and deadline
the increasing use of computers for saving valuable data imposes stringent reliability constraints on storage systems reliability improvement via use of redundancy is common practice as the disk capacity improves advanced techniques such as disk scrubbing are being employed to proactively fix latent sector errors these techniques utilize the disk idle time for reliability improvement however the idle time is key to dynamic energy management that detects such idle periods and turns off the disks to save energy in this paper we are concerned with the distribution of the disk idle periods between reliability and energy management tasks for this purpose we define new metric energy reliability product erp to capture the effect of one technique on the other our initial investigation using trace driven simulations of typical enterprise applications shows that the erp is suitable metric for identifying efficient idle period utilization thus erp can facilitate development of systems that provide both reliability and energy managements
this article deals with the design of on chip architectures for testing large system chips socs for manufacturing defects in modular fashion these architectures consist of wrappers and test access mechanisms tams for an soc with specified parameters of modules and their tests we design an architecture that minimizes the required tester vector memory depth and test application time in this article we formulate the test architecture design problems for both modules with fixed and flexible length scan chains assuming the relevant module parameters and maximal soc tam width are given subsequently we derive formulation for an architecture independent lower bound for the soc test time we analyze three types of tam under utilization that make the theoretical lower bound unachievable in most practical architecture instances we present novel architecture independent heuristic algorithm that effectively optimizes the test architecture for given soc the algorithm efficiently determines the number of tams and their widths the assignment of modules to tams and the wrapper design per module we show how this algorithm can be used for optimizing both test bus and testrail architectures with either serial or parallel test schedules experimental results for the itc soc test benchmarks show that compared to manual best effort engineering approaches we can save up to percent in test times while compared to previously published algorithms we obtain comparable or better test times at negligible compute time
in publish subscribe paradigm user service discovery requires matching user preferences to available published services eg user may want to find if there is chinese restaurant close by this is difficult problem when users are mobile wirelessly connected to network and dynamically roaming in different environments the magnitude of the problem increases with respect to the number of attributes for each user's preference criteria as matches must be done in real time we present an algorithm that uses singular value decomposition to encode each service's properties in few values users preference criteria are matched by using the same encoding to produce value that can be rapidly compared to those of the services we show that reasonable matches can be found in time that grows only logarithmically with the number of publications and linearly with the number of attributes in the preference criteria subscription this is in contrast to approximate nearest neighbor techniques which require either time or storage exponential in the number of attributes
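a small numpy sketch of the general idea of encoding service properties with a singular value decomposition and matching an encoded preference vector against the encoded services; the attribute matrix, the number of retained values k and the cosine comparison are illustrative assumptions rather than the paper's exact construction

import numpy as np

# rows = published services, columns = attributes (e.g. cuisine, distance, price, rating)
services = np.array([
    [1.0, 0.2, 0.3, 0.9],
    [0.0, 0.9, 0.8, 0.4],
    [1.0, 0.3, 0.2, 0.8],
    [0.1, 0.8, 0.9, 0.3],
])

k = 2                                          # number of retained singular directions
U, S, Vt = np.linalg.svd(services, full_matrices=False)

def encode(x):
    """Project an attribute vector (or matrix of vectors) onto the top-k singular directions."""
    return x @ Vt[:k].T

service_codes = encode(services)               # precomputed once per publication

def match(preference, top=2):
    """Return indices of the services whose codes are closest (cosine) to the encoded preference."""
    q = encode(np.asarray(preference, dtype=float))
    sims = service_codes @ q / (np.linalg.norm(service_codes, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:top]

print(match([1.0, 0.2, 0.2, 0.9]))             # should prefer the first and third services (indices 0 and 2)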
the growing popularity of augmented reality ar games in both research and more recently commercial context has led to a need to take a closer look at design related issues which impact on player experience while issues relating to this area have been considered to date most of the emphasis has been on the technology aspects furthermore it is almost always assumed that the augmented reality element in itself will provide sufficient experience for the player this has led to a need to evaluate what makes a successful augmented reality game in this paper we present set of design guidelines which are drawn from experiences of three mixed reality games the guidelines provide specific guidance on relationships between real and virtual space social interaction use of ar technologies maintaining consistent themes and implicitly address higher level aspects such as presence within particular augmented reality place
content based retrieval of the similar motions for the human joints has significant impact in the fields of physical medicine biomedicine rehabilitation and motion therapy in this paper we propose an efficient indexing approach for human motion capture data supporting queries involving both subbody motions as well as whole body motions
abductive logic programming offers formalism to declaratively express and solve problems in areas such as diagnosis planning belief revision and hypothetical reasoning tabled logic programming offers computational mechanism that provides level of declarativity superior to that of prolog and which has supported successful applications in fields such as parsing program analysis and model checking in this paper we show how to use tabled logic programming to evaluate queries to abductive frameworks with integrity constraints when these frameworks contain both default and explicit negation the result is the ability to compute abduction over well founded semantics with explicit negation and answer sets our approach consists of transformation and an evaluation method the transformation adjoins to each objective literal in a program a dual objective literal along with rules that ensure that the dual literal will be true if and only if the original literal is false we call the resulting program the dual program the evaluation method abdual then operates on the dual program abdual is sound and complete for evaluating queries to abductive frameworks whose entailment method is based on either the well founded semantics with explicit negation or on answer sets further abdual is asymptotically as efficient as any known method for either class of problems in addition when abduction is not desired abdual operating on dual program provides novel tabling method for evaluating queries to ground extended programs whose complexity and termination properties are similar to those of the best tabling methods for the well founded semantics publicly available meta interpreter has been developed for abdual using the xsb system
in molecular biology dna sequence matching is one of the most crucial operations since dna databases contain huge volume of sequences fast indexes are essential for efficient processing of dna sequence matching in this paper we first point out the problems of the suffix tree an index structure widely used for dna sequence matching in respect of storage overhead search performance and difficulty in seamless integration with dbms then we propose new index structure that resolves such problems the proposed index structure consists of two parts the primary part realizes the trie as binary bit string representation without any pointers and the secondary part helps fast access to the trie’s leaf nodes that need to be accessed for post processing we also suggest efficient algorithms based on that index for dna sequence matching to verify the superiority of the proposed approach we conduct performance evaluation via series of experiments the results reveal that the proposed approach which requires smaller storage space can be few orders of magnitude faster than the suffix tree
we study the effects of node mobility on the wireless links and protocol performance in mobile ad hoc networks manets first we examine the behavior of links through an analytical framework and develop statistical models to accurately characterize the distribution of lifetime of such wireless links in manets we compute the lifetimes of links through two state markov model and use these results to model multi hop paths and topology changes we show that the analytical solution follows closely the results obtained through discrete event simulations for two mobility models namely random direction and random waypoint mobility models finally we present comprehensive simulation study that combines the results from the findings in simulations with the analytical results to bring further insight on how different types of mobility translate into protocol performance
workflow views abstract groups of tasks in workflow into high level composite tasks in order to reuse sub workflows and facilitate provenance analysis however unless view is carefully designed it may not preserve the dataflow between tasks in the workflow ie it may not be sound unsound views can be misleading and cause incorrect provenance analysis this paper studies the problem of efficiently identifying and correcting unsound workflow views with minimal changes in particular given workflow view we wish to split each unsound composite task into the minimal number of tasks such that the resulting view is sound we prove that this problem is np hard by reduction from independent set we then propose two local optimality conditions weak and strong and design polynomial time algorithms for correcting unsound views to meet these conditions experiments show that our proposed algorithms are effective and efficient and that the strong local optimality algorithm produces better solutions than the weak local optimality algorithm with little processing overhead
wireless sensor networks consist of many nodes that collect real world data process them and transmit the data by radio wireless sensor networks represent new rapidly developing direction in the field of organization of computer networks of free configuration sensor networks are used for monitoring parameter field where it is often required to fix time of an event with high accuracy high accuracy of local clocks is also necessary for operation of network protocols for energy saving purposes the nodes spend most of the time in the sleeping mode and communicate only occasionally in the paper base techniques used in the existing time synchronization schemes are analyzed models of local clock behavior and models of interaction of the network devices are described classification of the synchronization problems is presented and survey of the existing approaches to synchronization of time in sensor networks is given
in this paper we introduce new branch predictor that predicts the outcome of branches by predicting the value of their inputs and performing an early computation of their results according to the predicted values the design of hybrid predictor comprising the above branch predictor and correlating branch predictor is presented we also propose new selector that chooses the most reliable prediction for each branch this selector is based on the path followed to reach the branch results for immediate updates show significant misprediction rate reductions with respect to conventional hybrid predictor for different size configurations in addition the proposed hybrid predictor with size of kb achieves the same accuracy as conventional one of kb performance evaluation for dynamically scheduled superscalar processor with realistic updates shows speed up of percent despite its higher latency up to four cycles
databases are increasingly being used to store multi media objects such as maps images audio and video storage and retrieval of these objects is accomplished using multi dimensional index structures such as trees and ss trees as dimensionality increases query performance in these index structures degrades this phenomenon generally referred to as the dimensionality curse can be circumvented by reducing the dimensionality of the data such reduction is however accompanied by loss of precision of query results current techniques such as qbic use svd transform based dimensionality reduction to ensure high query precision the drawback of this approach is that svd is expensive to compute and therefore not readily applicable to dynamic databases in this paper we propose novel techniques for performing svd based dimensionality reduction in dynamic databases when the data distribution changes considerably so as to degrade query precision we recompute the svd transform and incorporate it in the existing index structure for recomputing the svd transform we propose novel technique that uses aggregate data from the existing index rather than the entire data this technique reduces the svd computation time without compromising query precision we then explore efficient ways to incorporate the recomputed svd transform in the existing index structure without degrading subsequent query response times these techniques reduce the computation time by factor of in experiments on color and texture image vectors the error due to approximate computation of svd is less than
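a rough numpy illustration of the contrast drawn above between recomputing an svd transform from all data points and recomputing it from aggregate data; here the aggregates are cluster centroids weighted by their counts, which is an assumption standing in for whatever aggregate information the index actually stores

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 16)) @ rng.normal(size=(16, 16))    # correlated feature vectors

def svd_transform(points, k, weights=None):
    """Top-k right singular vectors of the (weighted, centered) point set."""
    w = np.ones(len(points)) if weights is None else np.asarray(weights, float)
    mean = np.average(points, axis=0, weights=w)
    centered = (points - mean) * np.sqrt(w)[:, None]
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k]

# exact transform from the full data set
mean_full, basis_full = svd_transform(data, k=4)

# approximate transform from aggregates: centroids of a coarse sign-pattern bucketing of the data
labels = sum(((data[:, i] > 0).astype(int) << i) for i in range(6))   # stand-in for index regions
ids = np.unique(labels)
centroids = np.array([data[labels == c].mean(axis=0) for c in ids])
counts = np.array([(labels == c).sum() for c in ids])
mean_agg, basis_agg = svd_transform(centroids, k=4, weights=counts)

def recon_error(points, mean, basis):
    """Mean squared reconstruction error after projecting onto the reduced basis."""
    proj = (points - mean) @ basis.T @ basis + mean
    return np.mean(np.sum((points - proj) ** 2, axis=1))

print("full-data transform error:", recon_error(data, mean_full, basis_full))
print("aggregate transform error:", recon_error(data, mean_agg, basis_agg))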
this paper presents case study of globally distributed work group’s use of an online environment called loops loops is web based persistent chat system whose aim is to support collaboration amongst corporate work groups we describe the ways in which the group turned the system’s features to its own ends and the unusual usage rhythm that corresponded with the team’s varying needs for communication as it moved through its work cycle we conclude with discussion of design implications and suggestion that community may not always be the best way to think about groups use of online systems
we present simple image based method of generating novel visual appearance in which new image is synthesized by stitching together small patches of existing images we call this process image quilting first we use quilting as fast and very simple texture synthesis algorithm which produces surprisingly good results for wide range of textures second we extend the algorithm to perform texture transfer ie rendering an object with texture taken from different object more generally we demonstrate how an image can be re rendered in the style of different image the method works directly on the images and does not require 3d information
organizations increasingly coordinate their product and service development processes to deliver their products and services as fast as possible and to involve employees customers suppliers and business partners seamlessly in different stages of the processes these processes have to consider that their participants are increasingly on the move or distributed while they are working expertise needs to be shared across locations and different mobile devices this paper describes framework for distributed and mobile collaboration defines set of requirements for virtual communities and discusses mobile teamwork support software architecture that has been developed in the eu project motion the framework together with the architecture enables to enhance current collaboration approaches to include the dimension of mobile participants and virtual communities for distributed product development this is achieved by integrating process and workspace management requirements with peer to peer middleware publish subscribe and community and user management components
schapire and singer’s improved version of adaboost for handling weak hypotheses with confidence rated predictions represents an important advance in the theory and practice of boosting its success results from more efficient use of information in weak hypotheses during updating instead of simple binary voting weak hypothesis is allowed to vote for or against classification with variable strength or confidence the pool adjacent violators pav algorithm is method for converting score into probability we show how pav may be applied to weak hypothesis to yield new weak hypothesis which is in sense an ideal confidence rated prediction and that this leads to an optimal updating for adaboost the result is new algorithm which we term pav adaboost we give several examples illustrating problems for which this new algorithm provides advantages in performance
predicting the worst case execution time wcet and best case execution time bcet of real time program is challenging task though much progress has been made in obtaining tighter timing predictions by using techniques that model the architectural features of machine significant overestimations of wcet and underestimations of bcet can still occur even with perfect architectural modeling dependencies on data values can constrain the outcome of conditional branches and the corresponding set of paths that can be taken in program while branch constraint information has been used in the past by some timing analyzers it has typically been specified manually which is both tedious and error prone this paper describes efficient techniques for automatically detecting branch constraints by compiler and automatically exploiting these constraints within timing analyzer the result is significantly tighter timing analysis predictions without requiring additional interaction with user
previous schemes for implementing full tail recursion when compiling into c have required some form of trampoline to pop the stack we propose solving the tail recursion problem in the same manner as standard ml of new jersey by allocating all frames in the garbage collected heap the scheme program is translated into continuation passing style so the target functions never return the stack pointer then becomes the allocation pointer for cheney style copying garbage collection scheme our scheme can use function calls arguments variable arity functions and separate compilation without requiring complex block compilation of entire programs our version of the boyer benchmark is available at ftp ftpnetcomcom pub hb hbaker cboyer
we study whether when restricted to using polylogarithmic memory and polylogarithmic passes we can achieve qualitatively better data compression with multiple read write streams than we can with only one we first show how we can achieve universal compression using only one pass over one stream we then show that one stream is not sufficient for us to achieve good grammar based compression finally we show that two streams are necessary and sufficient for us to achieve entropy only bounds
this paper describes new framework of register allocation based on chaitin style coloring our focus is on maximizing the chances for live ranges to be allocated to the most preferred registers while not destroying the colorability obtained by graph simplification our coloring algorithm uses graph representation of preferences called register preference graph which helps find good register selection we then try to relax the register selection order created by the graph simplification the relaxed order is defined as partial order represented using graph called coloring precedence graph our algorithm utilizes such partial order for the register selection instead of using the traditional simplification driven order so that the chances of honoring the preferences are effectively increased experimental results show that our coloring algorithm is powerful to simultaneously handle spill decisions register coalescing and preference resolutions
statistical language models have been successfully applied to many information retrieval tasks including expert finding the process of identifying experts given particular topic in this paper we introduce and detail language modeling approaches that integrate the representation association and search of experts using various textual data sources into generative probabilistic framework this provides simple intuitive and extensible theoretical framework to underpin research into expertise search to demonstrate the flexibility of the framework two search strategies to find experts are modeled that incorporate different types of evidence extracted from the data before being extended to also incorporate co occurrence information the models proposed are evaluated in the context of enterprise search systems within an intranet environment where it is reasonable to assume that the list of experts is known and that data to be mined is publicly accessible our experiments show that excellent performance can be achieved by using these models in such environments and that this theoretical and empirical work paves the way for future principled extensions
bag of tasks applications are parallel applications composed of independent tasks examples of bag of tasks bot applications include monte carlo simulations massive searches such as key breaking image manipulation applications and data mining algorithms this paper analyzes the scalability of bag of tasks applications running on master slave platforms and proposes scalability related measure dubbed input file affinity in this work we also illustrate how the input file affinity which is characteristic of an application can be used to improve the scalability of bag of tasks applications running on master slave platforms the input file affinity was considered in new scheduling algorithm dubbed dynamic clustering which is oblivious to task execution times we compare the scalability of the dynamic clustering algorithm to several other algorithms oblivious and non oblivious to task execution times proposed in the literature we show in this paper that in several situations the oblivious algorithm dynamic clustering has scalability performance comparable to non oblivious algorithms which is remarkable considering that our oblivious algorithm uses much less information to schedule tasks
flow directed inlining strategy uses information derived from control flow analysis to specialize and inline procedures for functional and object oriented languages since it uses control flow analysis to identify candidate call sites flow directed inlining can inline procedures whose relationships to their call sites are not apparent for instance procedures defined in other modules passed as arguments returned as values or extracted from data structures can all be inlined flow directed inlining specializes procedures for particular call sites and can selectively inline particular procedure at some call sites but not at others finally flow directed inlining encourages modular implementations control flow analysis inlining and post inlining optimizations are all orthogonal components results from prototype implementation indicate that this strategy effectively reduces procedure call overhead and leads to significant reduction in execution time
this position paper addresses the question of integrating grid and mas multi agent systems models by means of service oriented approach service oriented computing soc tries to address many challenges in the world of computing with services the concept of service is clearly at the intersection of grid and mas and their integration allows to address one of these key challenges the implementation of dynamically generated services based on conversations in our approach services are exchanged ie provided and used by agents through grid mechanisms and infrastructure integration goes beyond the simple interoperation of applications and standards it has to be intrinsic to the underpinning model we introduce here a quite unique integration model for grid and mas this model is formalized and represented by graphical description language called agent grid integration language agil this integration is based on two main ideas i the representation of agent capabilities as grid services in service containers ii the assimilation of the service instantiation mechanism from grid with the creation of new conversation context from mas the integrated model may be seen as formalization of agent interaction for service exchange
major difficulty of text categorization problems is the high dimensionality of the feature space thus feature selection is often performed in order to increase both the efficiency and effectiveness of the classification in this paper we propose feature selection method based on testor theory this criterion takes into account inter feature relationships we experimentally compared our method with the widely used information gain using two well known classification algorithms nearest neighbour and support vector machine two benchmark text collections were chosen as the testbeds reuters and reuters corpus version rcv we found that our method consistently outperformed information gain for both classifiers and both data collections especially when aggressive feature selection is carried out
when trying to apply recently developed approaches for updating description logic aboxes in the context of an action programming language one encounters two problems first updates generate so called boolean aboxes which cannot be handled by traditional description logic reasoners second iterated update operations result in very large boolean aboxes which however contain huge amount of redundant information in this paper we address both issues from practical point of view
job scheduling in data centers can be considered from cyber physical point of view as it affects the data center’s computing performance ie the cyber aspect and energy efficiency the physical aspect driven by the growing needs to green contemporary data centers this paper uses recent technological advances in data center virtualization and proposes cyber physical spatio temporal ie start time and servers assigned thermal aware job scheduling algorithms that minimize the energy consumption of the data center under performance constraints ie deadlines savings are possible by being able to temporally spread the workload assign it to energy efficient computing equipment and further reduce the heat recirculation and therefore the load on the cooling systems this paper provides three categories of thermal aware energy saving scheduling techniques fcfs backfill xint and fcfs backfill lrh thermal aware job placement enhancements to the popular first come first serve with back filling fcfs backfill scheduling policy edf lrh an online earliest deadline first scheduling algorithm with thermal aware placement and an offline genetic algorithm for scheduling to minimize thermal cross interference scint which is suited for batch scheduling of backlogs simulation results based on real job logs from the asu fulton hpc data center show that the thermal aware enhancements to fcfs backfill achieve up to savings compared to fcfs backfill with first fit placement depending on the intensity of the incoming workload while scint achieves up to savings the performance of edf lrh nears that of the offline scint for low loads and it degrades to the performance of fcfs backfill for high loads however edf lrh requires milliseconds of operation which is significantly faster than scint the latter requiring up to hours of runtime depending upon the number and size of submitted jobs similarly fcfs backfill lrh is much faster than fcfs backfill xint but it achieves only part of fcfs backfill xint’s savings
structured overlay networks have recently received much attention due to their self properties under dynamic and decentralized settings the number of nodes in an overlay fluctuates all the time due to churn since knowledge of the size of the overlay is core requirement for many systems estimating the size in decentralized manner is challenge taken up by recent research activities gossip based aggregation has been shown to give accurate estimates for the network size but previous work done is highly sensitive to node failures in this paper we present gossip based aggregation style network size estimation algorithm we discuss shortcomings of existing aggregation based size estimation algorithms and give solution that is highly robust to node failures and is adaptive to network delays we examine our solution in various scenarios to demonstrate its effectiveness
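a toy simulation of gossip based aggregation for size estimation, assuming a synchronous round model and a fully connected overlay for simplicity; pairwise averaging of a vector that starts as 1 at a single node and 0 elsewhere converges to 1/n, so each node can invert its local value to estimate n, while the failure handling that is the focus above is omitted

import random

def estimate_size(n_nodes, rounds=30, seed=1):
    random.seed(seed)
    value = [0.0] * n_nodes
    value[0] = 1.0                               # a single initiator holds mass 1
    for _ in range(rounds):
        for i in range(n_nodes):
            j = random.randrange(n_nodes)        # pick a random gossip partner
            avg = (value[i] + value[j]) / 2.0    # pairwise averaging preserves total mass
            value[i] = value[j] = avg
    return [1.0 / v if v > 0 else float("inf") for v in value]

estimates = estimate_size(200)
print(min(estimates), max(estimates))            # both should be close to 200 after enough rounds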
the scopedmemory class of the rtsj enables the organization of objects into regions this ensures time predictable management of dynamic memory using scopes forces the programmer to reason in terms of locality to comply with rtsj restrictions the programmer is also faced with the problem of providing upper bounds for regions without appropriate compile time support scoped memory management may lead to unexpected runtime errors this work presents the integration of series of compile time analysis techniques to help identifying memory regions their sizes and overall memory usage first the tool synthesizes scoped based memory organization where regions are associated with methods second it infers their sizes in parametric forms in terms of relevant program variables third it exhibits parametric upper bound on the total amount of memory required to run method we present some preliminary results showing that semi automatic tool assisted generation of scoped based code is both helpful and doable
there is large consensus on the need for middleware to efficiently support adaptation in pervasive and mobile computing advanced forms of adaptation require the aggregation of context data and the evaluation of policy rules that are typically provided by multiple sources this paper addresses the problem of designing the reasoning core of middleware that supports these tasks while guaranteeing very low response times as required by mobile applications technically the paper presents strategies to deal with conflicting rules algorithms that implement the strategies and algorithms that detect and solve potential rule cycles detailed experimental analysis supports the theoretical results and shows the applicability of the resulting middleware in large scale applications
we report an exploratory study of the impacts of design planning on end users asked to develop simple interactive web application some participants were asked to create conceptual map to plan their projects and others to write scenarios third group was asked to do whatever they found useful we describe the planning that each group underwent how they approached the web development task and their reactions to the experience afterwards we also discuss how participants gender and experience was related to their web development activities
the ability to understand and manage social signals of person we are communicating with is the core of social intelligence social intelligence is facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life this paper argues that next generation computing needs to include the essence of social intelligence the ability to recognize human social signals and social behaviours like politeness and disagreement in order to become more effective and more efficient although each one of us understands the importance of social signals in everyday life situations and in spite of recent advances in machine analysis of relevant behavioural cues like blinks smiles crossed arms laughter and similar design and development of automated systems for social signal processing ssp are rather difficult this paper surveys the past efforts in solving these problems by computer it summarizes the relevant findings in social psychology and it proposes set of recommendations for enabling the development of the next generation of socially aware computing
system area networks sans which usually accept arbitrary topologies have been used to connect hosts in pc clusters although deadlock free routing is often employed for low latency communications using wormhole or virtual cut through switching the interconnection adaptivity introduces difficulties in establishing deadlock free paths an up down routing algorithm which has been widely used to avoid deadlocks in irregular networks tends to make unbalanced paths as it employs one dimensional directed graph the current study introduces two dimensional directed graph on which adaptive routings called left up first turn routings and right down last turn routings are proposed to make the paths as uniformly distributed as possible this scheme guarantees deadlock freedom because it uses the turn model approach and the extra degree of freedom in the two dimensional graph helps to ensure that the prohibited turns are well distributed simulation results show that better throughput and latency results from uniformly distributing the prohibited turns by which the traffic would be more distributed toward the leaf nodes the turn routings which meet this condition improve throughput by up to percent compared with two up down based routings and also reduce latency
optimal data placement on clv constant linear velocity format optical discs has an objective the minimization of the expected access cost of data retrievals from the disc when the probabilities of access of data items may be different the problem of optimal data placement for optical discs is both important and more difficult than the corresponding problem on magnetic discs good data placement on optical discs is more important because data sets on optical discs such as worm and cd rom cannot be modified or moved once they are placed on disc currently even rewritable optical discs are best suited for applications that are archival in nature the problem of optimal data placement on clv format optical discs is more difficult mainly because the useful storage space is not uniformly distributed across the disc surface along the radius this leads to complicated positional performance trade off not present for magnetic disks we present model that encompasses all the important aspects of the placement problem on clv format optical discs the model takes into account the nonuniform distribution of useful storage the dependency of the rotational delay on disc position parameterized seek cost function for optical discs and the varying access probabilities of data items we show that the optimal placement of high probability blocks satisfies unimodality property based on this observation we solve the optimal placement problem we then study the impact of the relative weights of the problem parameters and show that the optimal data placement may be very different from the optimal data placement on magnetic disks we also validate our model and analysis and give an algorithm for computing the placement of disc sectors
in this paper we propose notion of question utility for studying usefulness of questions and show how question utility can be integrated into question search as static ranking to measure question utility we examine three methods a method of employing the language model to estimate the probability that a question is generated from the question collection and then using the probability as question utility a method of using the lexrank algorithm to evaluate centrality of questions and then using the centrality as question utility and the combination of the two methods to use question utility in question search we employ log linear model for combining relevance score in question search and utility score regarding question utility our experimental results with the questions about travel from yahoo answers show that question utility can be effective in boosting up ranks of generally useful questions
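a small sketch of the two ingredients named above, a unigram language model of the question collection used as a utility score and a log linear combination of that score with a relevance score from question search; the smoothing and the interpolation weight are illustrative assumptions

import math
from collections import Counter

questions = [
    "what is the best hotel in paris",
    "cheap hotel near the eiffel tower",
    "how do i get a visa for france",
]

# unigram language model of the whole collection, with add-one smoothing
tokens = [w for q in questions for w in q.split()]
counts = Counter(tokens)
vocab = len(counts)
total = len(tokens)

def log_p_collection(question):
    """Log probability that the collection's unigram model generates this question (utility score)."""
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in question.split())

def combined_score(relevance_logscore, question, lam=0.8):
    """Log-linear mix of the relevance score from question search and the utility score."""
    return lam * relevance_logscore + (1 - lam) * log_p_collection(question)

for q in questions:
    print(round(combined_score(-2.0, q), 3), q)   # same relevance for all, so ranking follows utility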
this paper presents an approach to multi sensory and multi modal fusion in which computer vision information obtained from calibrated cameras is integrated with large scale sentient computing system known as spirit the spirit system employs an ultrasonic location infrastructure to track people and devices in an office building and model their state vision techniques include background and object appearance modelling face detection segmentation and tracking modules integration is achieved at the system level through the metaphor of shared perceptions in the sense that the different modalities are guided by and provide updates to shared world model this model incorporates aspects of both the static eg positions of office walls and doors and the dynamic eg location and appearance of devices and people environment fusion and inference are performed by bayesian networks that model the probabilistic dependencies and reliabilities of different sources of information over time it is shown that the fusion process significantly enhances the capabilities and robustness of both sensory modalities thus enabling the system to maintain richer and more accurate world model
we present angie system that can answer user queries by combining knowledge from local database with knowledge retrieved from web services if user poses query that cannot be answered by the local database alone angie calls the appropriate web services to retrieve the missing information this information is integrated seamlessly and transparently into the local database so that the user can query and browse the knowledge base while appropriate web services are called automatically in the background
filter method of feature selection based on mutual information called normalized mutual information feature selection nmifs is presented nmifs is an enhancement over battiti's mifs mifs-u and mrmr methods the average normalized mutual information is proposed as measure of redundancy among features nmifs outperformed mifs mifs-u and mrmr on several artificial and benchmark data sets without requiring user defined parameter in addition nmifs is combined with genetic algorithm to form hybrid filter wrapper method called gamifs this includes an initialization procedure and mutation operator based on nmifs to speed up the convergence of the genetic algorithm gamifs overcomes the limitations of incremental search algorithms that are unable to find dependencies between groups of features
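a minimal numpy implementation of the greedy selection rule described above for discrete features, where candidate relevance is penalized by the average normalized mutual information with already selected features and the normalization divides by the smaller of the two entropies; binning of continuous features and the stopping criterion are left out and the toy data are assumptions

import numpy as np

def entropy(x):
    _, c = np.unique(x, return_counts=True)
    p = c / c.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), with H(X,Y) estimated from joint value counts
    _, c = np.unique(np.stack([x, y]), axis=1, return_counts=True)
    p = c / c.sum()
    return entropy(x) + entropy(y) + np.sum(p * np.log2(p))

def nmifs_select(X, y, k):
    """Greedy NMIFS-style selection of k feature columns of X for target y."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def score(f):
            relevance = mutual_info(X[:, f], y)
            if not selected:
                return relevance
            redundancy = np.mean([
                mutual_info(X[:, f], X[:, s]) /
                max(min(entropy(X[:, f]), entropy(X[:, s])), 1e-12)
                for s in selected
            ])
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# toy data: features 0 and 1 are duplicates of one class bit, feature 2 carries the other bit
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 300)
b = rng.integers(0, 2, 300)
X = np.column_stack([a, a, b])
y = 2 * a + b
print(nmifs_select(X, y, k=2))   # one of the duplicates plus the complementary feature, never both duplicates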
the serious bugs and security vulnerabilities facilitated by lack of bounds checking are well known yet c and c++ remain in widespread use unfortunately c's arbitrary pointer arithmetic conflation of pointers and arrays and programmer visible memory layout make retrofitting c with spatial safety guarantees extremely challenging existing approaches suffer from incompleteness have high runtime overhead or require non trivial changes to the source code thus far these deficiencies have prevented widespread adoption of such techniques this paper proposes softbound a compile time transformation for enforcing spatial safety of c inspired by hardbound a previously proposed hardware assisted approach softbound similarly records base and bound information for every pointer as disjoint metadata this decoupling enables softbound to provide spatial safety without requiring changes to source code unlike hardbound softbound is software only approach and performs metadata manipulation only when loading or storing pointer values formal proof shows that this is sufficient to provide spatial safety even in the presence of arbitrary casts softbound's full checking mode provides complete spatial violation detection with runtime overhead on average to further reduce overheads softbound has store only checking mode that successfully detects all the security vulnerabilities in test suite at the cost of only runtime overhead on average
the design issues related to routing in wireless sensor networks wsns are inherently different from those encountered in traditional mobile ad hoc networks routing protocols for ad hoc networks usually impose prohibitive demands on scarce resources of a sensor node such as memory bandwidth and energy therefore they are not suitable for wsns in this paper we present novel energy adaptive data forwarding protocol referred to as ring band based energy adaptive protocol reap in which nodes self organise into virtual ring bands centred at the base station bs packets are automatically delivered to the bs along path with decreasing ring band number furthermore the proposed probabilistic forwarding mechanism also balances the workload among neighbouring nodes within the same ring band a simulation study showed that reap exhibits good performance in various network settings even when nodes are in rapid motion
the most expressive way humans display emotions is through facial expressions in this work we report on several advances we have made in building system for classification of facial expressions from continuous video input we introduce and test different bayesian network classifiers for classifying expressions from video focusing on changes in distribution assumptions and feature dependency structures in particular we use naive bayes classifiers and change the distribution from gaussian to cauchy and use gaussian tree augmented naive bayes tan classifiers to learn the dependencies among different facial motion features we also introduce facial expression recognition from live video input using temporal cues we exploit the existing methods and propose new architecture of hidden markov models hmms for automatically segmenting and recognizing human facial expression from video sequences the architecture performs both segmentation and recognition of the facial expressions automatically using multi level architecture composed of an hmm layer and markov model layer we explore both person dependent and person independent recognition of expressions and compare the different methods
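a compact sketch of the modelling choice highlighted above, a naive bayes classifier whose per feature likelihood can be either gaussian or cauchy; the cauchy parameters are set from the median and half the interquartile range, which is a simple heuristic assumption rather than the paper's estimation procedure, and synthetic data stand in for facial motion features

import numpy as np

def fit_naive_bayes(X, y, dist="gaussian"):
    """Per-class, per-feature location/scale parameters plus log class priors."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        if dist == "gaussian":
            params = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)
        else:  # cauchy: location = median, scale = half the interquartile range
            q1, med, q3 = np.percentile(Xc, [25, 50, 75], axis=0)
            params = (med, (q3 - q1) / 2 + 1e-6)
        model[c] = (np.log(len(Xc) / len(X)), params)
    return model, dist

def log_pdf(x, loc, scale, dist):
    if dist == "gaussian":
        return -0.5 * np.log(2 * np.pi * scale ** 2) - (x - loc) ** 2 / (2 * scale ** 2)
    return -np.log(np.pi * scale * (1 + ((x - loc) / scale) ** 2))

def predict(model_dist, X):
    model, dist = model_dist
    scores = {c: prior + log_pdf(X, loc, scale, dist).sum(axis=1)
              for c, (prior, (loc, scale)) in model.items()}
    classes = list(scores)
    return np.array(classes)[np.argmax(np.column_stack([scores[c] for c in classes]), axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(2, 1, (200, 4))])
y = np.array([0] * 200 + [1] * 200)
for dist in ("gaussian", "cauchy"):
    pred = predict(fit_naive_bayes(X, y, dist), X)
    print(dist, "training accuracy:", (pred == y).mean())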
we study the size of memory of mobile agents that permits to solve deterministically the rendezvous problem ie the task of meeting at some node for two identical agents moving from node to node along the edges of an unknown anonymous connected graph the rendezvous problem is unsolvable in the class of arbitrary connected graphs as witnessed by the example of the cycle hence we restrict attention to rendezvous in trees where rendezvous is feasible if and only if the initial positions of the agents are not symmetric we prove that the minimum memory size guaranteeing rendezvous in all trees of size at most n is of order log n bits the upper bound is provided by an algorithm for abstract state machines accomplishing rendezvous in all trees and using log n bits of memory in trees of size at most n the lower bound is consequence of the need to distinguish between up to n links incident to node thus in the second part of the paper we focus on the potential existence of pairs of finite agents ie finite automata capable of accomplishing rendezvous in all bounded degree trees we show that as opposed to what has been proved for the graph exploration problem there are no finite agents capable of accomplishing rendezvous in all bounded degree trees
the compare and swap register cas is synchronization primitive for lock free algorithms most uses of it however suffer from the so called aba problem the simplest and most efficient solution to the aba problem is to include tag with the memory location such that the tag is incremented with each update of the target location this solution however is theoretically unsound and has limited applicability this paper presents general lock free pattern that is based on the synchronization primitive cas without causing aba problem or problems with wrap around it can be used to provide lock free functionality for any data type our algorithm is cas variation of herlihy's ll sc methodology for lock free transformation the basis of our techniques is to poll different locations on reading and writing objects in such way that the consistency of an object can be checked by its location instead of its tag it consists of simple code that can be easily implemented using c like languages a true problem of lock free algorithms is that they are hard to design correctly which even holds for apparently straightforward algorithms we therefore develop reduction theorem that enables us to reason about the general lock free algorithm to be designed on higher level than the synchronization primitives the reduction theorem is based on lamport's refinement mappings and has been verified with the higher order interactive theorem prover pvs using the reduction theorem fewer invariants are required and some invariants are easier to discover and formulate without considering the internal structure of the final implementation
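a toy python illustration of the tag based variant of compare and swap mentioned above, where the tag is incremented on every update so that a value that reappears is still detected as changed; python has no hardware cas, so atomicity is simulated with a lock, and as noted above this tagging scheme is only a partial fix because the counter can wrap around

import threading

class TaggedCell:
    """A (value, tag) cell with a simulated compare-and-swap; the tag defeats the ABA pattern."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._tag = 0

    def load(self):
        with self._lock:
            return self._value, self._tag

    def compare_and_swap(self, expected_value, expected_tag, new_value):
        with self._lock:
            if self._value == expected_value and self._tag == expected_tag:
                self._value = new_value
                self._tag += 1          # wrap-around of this counter is the known weakness
                return True
            return False

cell = TaggedCell("A")
value, tag = cell.load()                        # a slow thread reads ("A", 0) ...
cell.compare_and_swap("A", 0, "B")              # ... meanwhile the cell becomes "B" ...
cell.compare_and_swap("B", 1, "A")              # ... and then "A" again (the ABA pattern)
print(cell.compare_and_swap(value, tag, "C"))   # False: the tag reveals the intervening updates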
sentence extraction is widely adopted text summarization technique where the most important sentences are extracted from document and presented as summary the first step towards sentence extraction is to rank sentences in order of importance as in the summary this paper proposes novel graph based ranking method ispreadrank to perform this task ispreadrank models set of topic related documents into sentence similarity network based on such network model ispreadrank exploits the spreading activation theory to formulate general concept from social network analysis the importance of node in network ie sentence in this paper is determined not only by the number of nodes to which it connects but also by the importance of its connected nodes the algorithm recursively re weights the importance of sentences by spreading their sentence specific feature scores throughout the network to adjust the importance of other sentences consequently ranking of sentences indicating the relative importance of sentences is reasoned this paper also develops an approach to produce generic extractive summary according to the inferred sentence ranking the proposed summarization method is evaluated using the duc data set and found to perform well experimental results show that the proposed method obtains rouge score of which represents slight difference of when compared with the best participant in the duc evaluation
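a small numpy sketch in the spirit of the described re weighting, where sentences are nodes of a similarity network, each node starts from a feature based score and scores are repeatedly spread along the normalized similarity edges until they stabilize; the damping factor, the cosine similarity over raw term counts and the convergence test are assumptions rather than the exact ispreadrank formulation

import numpy as np
from collections import Counter

sentences = [
    "the storm closed the coastal highway for two days",
    "officials said the highway closure affected freight traffic",
    "a local bakery won a regional award",
    "freight companies rerouted trucks during the storm closure",
]
feature_scores = np.array([0.9, 0.6, 0.1, 0.5])   # e.g. position / keyword features, illustrative

# cosine similarity network over raw term counts
vocab = sorted({w for s in sentences for w in s.split()})
bags = [Counter(s.split()) for s in sentences]
tf = np.array([[b[w] for w in vocab] for b in bags], dtype=float)
sim = tf @ tf.T
norms = np.linalg.norm(tf, axis=1)
sim = sim / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)

W = sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)   # row-normalized spreading weights

def spread_rank(W, base, damping=0.85, tol=1e-8):
    """Iteratively spread feature-based scores over the similarity network until convergence."""
    scores = base / base.sum()
    while True:
        new = (1 - damping) * base / base.sum() + damping * (W.T @ scores)
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new

ranking = np.argsort(-spread_rank(W, feature_scores))
print(ranking)   # well-connected, high-feature sentences float to the top; the off-topic one sinks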
embedded system security is often compromised when trusted software is subverted to result in unintended behavior such as leakage of sensitive data or execution of malicious code several countermeasures have been proposed in the literature to counteract these intrusions common underlying theme in most of them is to define security policies at the system level in an application independent manner and check for security violations either statically or at run time in this paper we present methodology that addresses this issue from different perspective it defines correct execution as synonymous with the way the program was intended to run and employs dedicated hardware monitor to detect and prevent unintended program behavior specifically we extract properties of an embedded program through static program analysis and use them as the bases for enforcing permissible program behavior at run time the processor architecture is augmented with hardware monitor that observes the program’s dynamic execution trace checks whether it falls within the allowed program behavior and flags any deviations from expected behavior to trigger appropriate response mechanisms we present properties that capture permissible program behavior at different levels of granularity namely inter procedural control flow intra procedural control flow and instruction stream integrity we outline systematic methodology to design application specific hardware monitors for any given embedded program hardware implementations using commercial design flow and cycle accurate performance simulations indicate that the proposed technique can thwart several common software and physical attacks facilitating secure program execution with minimal overheads
we present progressive encoding technique specifically designed for complex isosurfaces it achieves better rate distortion performance than all standard mesh coders and even improves on all previous single rate isosurface coders our novel algorithm handles isosurfaces with or without sharp features and deals gracefully with high topologic and geometric complexity the inside outside function of the volume data is progressively transmitted through the use of an adaptive octree while local frame based encoding is used for the fine level placement of surface samples local patterns in topology and local smoothness in geometry are exploited by context based arithmetic encoding allowing us to achieve an average of bits per vertex at very low distortion of this rate only are dedicated to connectivity data this improves by over the best previous single rate isosurface encoder
we explore how the placement of control widgets such as menus affects collaboration and usability for co located tabletop groupware applications we evaluated two design alternatives centralized set of controls shared by all users and separate per user controls replicated around the borders of the shared tabletop we conducted this evaluation in the context of teamtag system for collective annotation of digital photos our comparison of the two design alternatives found that users preferred replicated over shared controls we discuss the cause of this preference and also present data on the impact of these interface design variants on collaboration as well as the role that orientation co touching and the use of different regions of the table played in shaping users behavior and preferences
bpql is novel query language for querying business process specifications introduced recently it is based on an intuitive model of business processes as rewriting systems an abstraction of the emerging bpel business process execution language standard bpql allows users to query business processes visually in manner very analogous to the language used to specify the processes the goal of the present paper is to study the formal model underlying bpql and investigate its properties as well as the complexity of query evaluation we also study its relationship to previously suggested formalisms for process modeling and querying in particular we propose query evaluation algorithm of polynomial data complexity that can be applied uniformly to queries on the structure of the process specification as well as on the potential behavior of the defined process we show that unless p equals np the efficiency of our algorithm is asymptotically optimal
objects often define usage protocols that clients must follow in order for these objects to work properly aliasing makes it notoriously difficult to check whether clients and implementations are compliant with such protocols accordingly existing approaches either operate globally or severely restrict aliasing we have developed sound modular protocol checking approach based on typestates that allows great deal of flexibility in aliasing while guaranteeing the absence of protocol violations at runtime the main technical contribution is novel abstraction access permissions that combines typestate and object aliasing information in our methodology developers express their protocol design intent through annotations based on access permissions our checking approach then tracks permissions through method implementations for each object reference the checker keeps track of the degree of possible aliasing and is appropriately conservative in reasoning about that reference this helps developers account for object manipulations that may occur through aliases the checking approach handles inheritance in novel way giving subclasses more flexibility in method overriding case studies on java iterators and streams provide evidence that access permissions can model realistic protocols and protocol checking based on access permissions can be used to reason precisely about the protocols that arise in practice
the goal of data integration is to provide uniform access to set of heterogeneous data sources freeing the user from the knowledge about where the data are how they are stored and how they can be accessed one of the outcomes of the research work carried out on data integration in the last years is clear architecture comprising global schema the source schema and the mapping between the source and the global schema although in many research works and commercial tools the global schema is simply data structure integrating the data at the sources we argue that the global schema should represent instead the conceptual model of the domain however to fully pursue such an approach several challenging issues are to be addressed the main goal of this paper is to analyze one of them namely how to express the conceptual model representing the global schema we start our analysis with the case where such schema is expressed in terms of uml class diagram and we end up with proposal of particular description logic called dl-lite_id we show that the data integration framework based on such logic has several interesting properties including the fact that both reasoning at design time and answering queries at run time can be done efficiently
with business emerging as key enabler to drive supply chains the focus of supply chain management has been shifted from production efficiency to customer driven and partnership synchronization approaches this strategic shift depends on the match between the demands and offerings that deliver the services to achieve this we need to coordinate the flow of information among the services and link their business processes under various constraints existing approaches to this problem have relied on complete information of services and resources and have failed to adequately address the dynamics and uncertainties of the operating environments the real world situation is complicated as result of undetermined requirements of services involved in the chain unpredictable solutions contributed by service providers and dynamic selection and aggregation of solutions to services this paper examines an agent mediated approach to on demand business supply chain integration each agent works as service broker exploring individual service decisions as well as interacting with each other for achieving compatibility and coherence among the decisions of all services based on the framework prototype has been implemented with simulated experiments highlighting the effectiveness of the approach
wireless network wlan provides unique challenges to system design wlan uses shared and highly unreliable medium where protocols must rely on precise timing of requests and responses to detect submission errors and priorities among network nodes in addition wlan stations are often embedded and have tight constraints on power costs and performance to design wlan nodes precise estimations on performance and resource usage are needed for the complete network system to explore the design space and assess the quality of solutions our systemclick framework combines systemc with resource models and performance annotations derived from actual implementations based on click model generation is automated and the performance of systemclick model is obtained depending on actual activation patterns of different functional blocks in the model case study demonstrates our approach for the analysis of real time critical systems
unexpected rules are interesting because they are either previously unknown or deviate from what prior user knowledge would suggest in this paper we study three important issues that have been previously ignored in mining unexpected rules first the unexpectedness of rule depends on how the user prefers to apply the prior knowledge to given scenario in addition to the knowledge itself second the prior knowledge should be considered right from the start to focus the search on unexpected rules third the unexpectedness of rule depends on what other rules the user has seen so far thus only rules that remain unexpected given what the user has seen should be considered interesting we develop an approach that addresses all three problems above and evaluate it by means of experiments focusing on finding interesting rules
developing profiles to describe user or system behaviour is useful technique employed in computer forensic investigations information found in data obtained by investigators can often be used to establish view of regular usage patterns which can then be examined for unusual occurrences this paper describes one such method based on details provided by events found within computer forensic evidence events compiled from potentially numerous sources are grouped according to some criteria and frequently occurring event sequences are established the methodology and techniques to extract and contrast these sequences are then described and discussed along with similar prior work in the same domain
on line analytical processing olap has become an important component in most data warehouse systems and decision support systems in recent years in order to deal with the huge amount of data highly complex queries and increasingly strict response time requirements approximate query processing has been deemed viable solution most works in this area however focus on the space efficiency and are unable to provide quality guaranteed answers to queries to remedy this in this paper we propose an efficient framework of dct for data with error estimation called dawn which focuses on answering range sum queries from compressed op cubes transformed by dct specifically utilizing the techniques of geometric series and euler’s formula we devise robust summation function called the ge function to answer range queries in constant time regardless of the number of data cells involved note that the ge function can estimate the summation of cosine functions precisely thus the quality of the answers is superior to that of previous works furthermore an estimator of errors based on the brown noise assumption bna is devised to provide tight bounds for answering range sum queries our experiment results show that the dawn framework is scalable to the selectivity of queries and the available storage space with ge functions and the bna method the dawn framework not only delivers high quality answers for range sum queries but also leads to shorter query response time due to its effectiveness in error estimation
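The following is a minimal numpy sketch (not the authors' DAWN implementation) of the core trick the abstract describes: once a data array is stored as DCT coefficients, a range sum can be answered in constant time per retained coefficient by collapsing the sum of cosines with the standard geometric-series/Euler identity. The function names and the use of the full coefficient set are illustrative assumptions.

```python
import numpy as np

def dct2_coeffs(x):
    # naive DCT-II of a 1-d data array (fine for a sketch; a real system
    # would keep only the first few coefficients of a compressed cube)
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (n + 0.5) / N))
                     for k in range(N)])

def range_sum_from_dct(X, a, b):
    # sum of cells a..b (inclusive) reconstructed from DCT-II coefficients;
    # each coefficient contributes in O(1) time via the identity
    #   sum_{n=a}^{b} cos(alpha + n*beta)
    #     = sin(m*beta/2)/sin(beta/2) * cos(alpha + (a+b)*beta/2),  m = b-a+1
    N = len(X)
    m = b - a + 1
    total = X[0] * m / N                      # DC term of the inverse DCT-II
    for k in range(1, N):
        beta = np.pi * k / N
        alpha = beta / 2                      # phase of cell n = 0
        s = (np.sin(m * beta / 2) / np.sin(beta / 2)
             * np.cos(alpha + (a + b) * beta / 2))
        total += (2.0 / N) * X[k] * s
    return total

data = np.random.rand(64)
X = dct2_coeffs(data)
print(range_sum_from_dct(X, 10, 25), data[10:26].sum())   # should agree
```

Keeping only the leading coefficients of X turns this into an approximate answer, which is where the error estimation described in the abstract would come in.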
temperature is becoming first rate design criterion in asics due to its negative impact on leakage power reliability performance and packaging cost incorporating awareness of such lower level physical phenomenon in high level synthesis algorithms will help to achieve better designs in this work we developed temperature aware binding algorithm switching power of module correlates with its operating temperature the goal of our binding algorithm is to distribute the activity evenly across functional units this approach avoids steep temperature differences between modules on chip hence the occurrence of hot spots starting with switching optimal binding solution our algorithm iteratively minimizes the maximum temperature reached by the hottest functional unit our algorithm does not change the number of resources used in the original binding we have used hotspot temperature modeling tool to simulate temperature of number asic designs our binding algorithm reduces temperature reached by the hottest resource by on average reducing the peak temperature has positive impact on leakage as well our binding technique improves leakage power by and overall power by on average at nm technology node compared to switching optimal binding
this research investigates the work practices of system administrators using semi structured interviews and an analysis of existing system administrator literature we theorize that system administrators act as technical brokers who bridge two communities the end users they support and their own technical community we also show that system administrators like other technical workers rely on contextual knowledge this knowledge is largely acquired through practice and less through formal education and certification through discussion of common reactive and proactive system administrator duties we present system administrators as broker technicians who must mediate between the end users they support and their technical community we end with discussion of the changing role of sysadmins as their tools and users get more sophisticated
asymmetric broadband connections in the home provide limited upstream pipe to the internet this limitation makes various applications such as remote backup and sharing high definition video impractical however homes in neighborhood often have high bandwidth wireless networks whose bandwidth exceeds that of single wired uplink moreover most wired and wireless connections are idle most of the time in this paper we examine the fundamental requirements of system that aggregates upstream broadband connections in neighborhood using wireless communication between homes scheme addressing this problem must operate efficiently in an environment that is highly lossy ii broadcast in nature and iii half duplex we propose novel scheme link alike that addresses those three challenges using opportunistic wireless reception novel wireless broadcast rate control scheme and preferential use of the wired downlink through analytical and experimental evaluation we demonstrate that our approach provides significantly better throughput than previous solutions based on tcp or udp unicast
entity resolution er is the problem of identifying which records in database refer to the same real world entity an exhaustive er process involves computing the similarities between pairs of records which can be very expensive for large datasets various blocking techniques can be used to enhance the performance of er by dividing the records into blocks in multiple ways and only comparing records within the same block however most blocking techniques process blocks separately and do not exploit the results of other blocks in this paper we propose an iterative blocking framework where the er results of blocks are reflected to subsequently processed blocks blocks are now iteratively processed until no block contains any more matching records compared to simple blocking iterative blocking may achieve higher accuracy because reflecting the er results of blocks to other blocks may generate additional record matches iterative blocking may also be more efficient because processing block now saves the processing time for other blocks we implement scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than blocking for large datasets
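A minimal sketch of the iterative-blocking idea, assuming a dict of records, a set of blocking-key functions, and a placeholder records_match predicate (none of these names come from the paper): matches found in one block are merged, the merged record flows into the other blocks it belongs to, and processing repeats until no block yields a new match.

```python
from itertools import combinations

def records_match(r1, r2):
    # placeholder pairwise matcher; a real ER system would use a
    # similarity function over the record attributes
    return r1["name"].lower() == r2["name"].lower()

def iterative_blocking(records, blocking_keys):
    # records: dict id -> record; blocking_keys: functions mapping a record
    # to a block key (multiple, possibly overlapping blockings)
    merged = dict(records)
    changed = True
    while changed:                    # iterate until a fixpoint is reached
        changed = False
        blocks = {}
        for rid, rec in merged.items():
            for key_fn in blocking_keys:
                blocks.setdefault(key_fn(rec), set()).add(rid)
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                if a in merged and b in merged and records_match(merged[a], merged[b]):
                    merged[a] = {**merged[b], **merged[a]}   # merge b into a
                    del merged[b]
                    changed = True     # re-block so the merge reaches other blocks
    return merged

recs = {1: {"name": "Ann Smith", "zip": "10001"},
        2: {"name": "ann smith", "zip": "10002"},
        3: {"name": "Bob Lee",   "zip": "10001"}}
result = iterative_blocking(recs, [lambda r: r["zip"], lambda r: r["name"][:1].lower()])
```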
wikipedia has grown to be the world largest and busiest free encyclopedia in which articles are collaboratively written and maintained by volunteers online despite its success as means of knowledge sharing and collaboration the public has never stopped criticizing the quality of wikipedia articles edited by non experts and inexperienced contributors in this paper we investigate the problem of assessing the quality of articles in collaborative authoring of wikipedia we propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history our basic model is designed based on the mutual dependency between article quality and their author authority the peerreview model introduces the review behavior into measuring article quality finally our probreview models extend peerreview with partial reviewership of contributors as they edit various portions of the articles we conduct experiments on set of well labeled wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement
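A small power-iteration sketch of the mutual dependency the abstract describes for the basic model: article quality is accumulated from contributor authority weighted by contribution, and authority from the quality of the contributed articles. The contribution matrix, normalisation, and iteration count are my assumptions, not the paper's exact formulation.

```python
import numpy as np

def basic_quality(contrib, n_iter=50):
    # contrib[i, j] = amount contributed by author j to article i
    # (e.g. text surviving in the latest revision)
    n_articles, n_authors = contrib.shape
    quality = np.ones(n_articles)
    authority = np.ones(n_authors)
    for _ in range(n_iter):
        quality = contrib @ authority            # quality from contributor authority
        quality /= np.linalg.norm(quality)       # normalise to keep values bounded
        authority = contrib.T @ quality          # authority from article quality
        authority /= np.linalg.norm(authority)
    return quality, authority

contrib = np.array([[120.0, 0.0, 30.0],
                    [ 10.0, 80.0,  0.0]])
q, a = basic_quality(contrib)
```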
on account of the enormous amounts of rules that can be produced by data mining algorithms knowledge post processing is difficult stage in an association rule discovery process in order to find relevant knowledge for decision making the user decision maker specialized in the data studied needs to rummage through the rules to assist him her in this task we here propose the rule focusing methodology an interactive methodology for the visual post processing of association rules it allows the user to explore large sets of rules freely by focusing his her attention on limited subsets this new approach relies on rule interestingness measures on visual representation and on interactive navigation among the rules we have implemented the rule focusing methodology in prototype system called arvis it exploits the user’s focus to guide the generation of the rules by means of specific constraint based rule mining algorithm
for the last few years considerable number of efforts have been devoted into integrating security issues into information systems development practices this has led to number of languages methods methodologies and techniques for considering security issues during the developmental stages of an information system however these approaches mainly focus on security requirements elicitation analysis and design issues and neglect testing this paper presents the security attack testing sat approach novel scenario based approach that tests the security of an information system at the design time the approach is illustrated with the aid of real life case study involving the development of health and social care information system
how to exploit application semantics to improve the performance of real time data intensive application has been an active research topic in the past few years weaker correctness criteria and semantics based concurrency control algorithms were proposed to provide more flexibility in reordering read and write events distinct from past work this paper exploits the trade off between data consistency and system workload the definition of similarity is combined with the idea of transaction skipping to provide theoretical foundation for reducing the workload of transaction system we also propose guidelines to adjust the execution frequencies of static set of transactions and prove their correctness the strengths of this work were verified by simulation experiments on an air traffic control example
we investigate the feasibility of reconstructing an arbitrarily shaped specular scene refractive or mirror like from one or more viewpoints by reducing shape recovery to the problem of reconstructing individual light paths that cross the image plane we obtain three key results first we show how to compute the depth map of specular scene from single viewpoint when the scene redirects incoming light just once second for scenes where incoming light undergoes two refractions or reflections we show that three viewpoints are sufficient to enable reconstruction in the general case third we show that it is impossible to reconstruct individual light paths when light is redirected more than twice our analysis assumes that for every point on the image plane we know at least one point on its light path this leads to reconstruction algorithms that rely on an environment matting procedure to establish pixel to point correspondences along light path preliminary results for variety of scenes mirror glass etc are also presented
the well accepted wisdom is that tcp’s exponential backoff mechanism introduced by jacobson years ago is essential for preserving the stability of the internet in this paper we show that removing exponential backoff from tcp altogether can be done without inducing any stability side effects we introduce the implicit packet conservation principle and show that as long as the endpoints uphold this principle they can only improve their end to end performance relative to the exponential backoff case by conducting large scale simulations modeling and network experiments in emulab and the internet using kernel level freebsd tcp implementation realistic traffic distributions and complex network topologies we demonstrate that tcp’s binary exponential backoff mechanism can be safely removed moreover we show that unsuitability of tcp’s exponential backoff is fundamental ie independent from the currently dominant internet traffic properties or bottleneck capacities surprisingly our results indicate that path to incrementally deploying the change does exist
in this paper we investigate whether it is possible to develop measure that quantifies the naturalness of human motion as defined by large database such measure might prove useful in verifying that motion editing operation had not destroyed the naturalness of motion capture clip or that synthetic motion transition was within the space of those seen in natural human motion we explore the performance of mixture of gaussians mog hidden markov models hmm and switching linear dynamic systems slds on this problem we use each of these statistical models alone and as part of an ensemble of smaller statistical models we also implement naive bayes nb model for baseline comparison we test these techniques on motion capture data held out from database keyframed motions edited motions motions with noise added and synthetic motion transitions we present the results as receiver operating characteristic roc curves and compare the results to the judgments made by subjects in user study
in these lecture notes we present the itasks system set of combinators to specify work flows in pure functional language at very high level of abstraction work flow systems are automated systems in which tasks are coordinated that have to be executed by either humans or computers the combinators that we propose support work flow patterns commonly found in commercial work flow systems in addition we introduce novel work flow patterns that capture real world requirements but that can not be dealt with by current systems compared with most of these commercial systems the itasks system offers several further advantages tasks are statically typed tasks can be higher order the combinators are fully compositional dynamic and recursive work flows can be specified and last but not least the specification is used to generate an executable web based multi user work flow application with the itasks system useful work flows can be defined which cannot be expressed in other systems work can be interrupted and subsequently directed to other workers for further processing the itasks system has been constructed in the programming language clean making use of its generic programming facilities and its idata toolkit with which interactive thin client form based web applications can be created in all itasks are an excellent case of the expressive power of functional and generic programming
in this paper we present perceptual experiment whose results aid the creation of non photorealistic rendering npr styles which can positively affect user task performance in real time scenes in visual search task designed to test people’s perception of abstracted scenes different types of stylisation are used to investigate how reaction times can be affected we show how npr techniques compare against non stylised renderings compare the effectiveness of different styles determine how varying each style can affect performance and investigate how these styles perform with objects of varying complexity the results show that npr can be useful tool for increasing the saliency of target objects while reducing the visual impact of the rest of the scene however it is also shown that the success of each style depends largely on the scene context and also on the level of stylisation used we believe the results from this study can help in the creation of effective npr styles in the future supplementary material can be found at http isgcstcdie skrbal redmond
applications written in unsafe languages like and are vulnerable to memory errors such as buffer overflows dangling pointers and reads of uninitialized data such errors can lead to program crashes security vulnerabilities and unpredictable behavior we present diehard runtime system that tolerates these errors while probabilistically maintaining soundness diehard uses randomization and replication to achieve probabilistic memory safety by approximating an infinite sized heap diehard’s memory manager randomizes the location of objects in heap that is at least twice as large as required this algorithm prevents heap corruption and provides probabilistic guarantee of avoiding memory errors for additional safety diehard can operate in replicated mode where multiple replicas of the same application are run simultaneously by initializing each replica with different random seed and requiring agreement on output the replicated version of die hard increases the likelihood of correct execution because errors are unlikely to have the same effect across all replicas we present analytical and experimental results that show diehard’s resilience to wide range of memory errors including heap based buffer overflow in an actual application
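A toy model (not the actual DieHard allocator) of the randomized, over-provisioned heap idea: objects are scattered across a heap at least twice the required size, so a small overflow or dangling write is unlikely to land on another live object; the class and parameter names are illustrative.

```python
import random

class RandomizedHeap:
    # toy model of probabilistic memory safety: allocate into random slots
    # of a heap expanded by a constant factor over the live data size
    def __init__(self, max_objects, expansion=2, seed=None):
        self.slots = [None] * (expansion * max_objects)
        self.rng = random.Random(seed)

    def malloc(self, obj):
        while True:                        # probe random slots until an empty one is found
            i = self.rng.randrange(len(self.slots))
            if self.slots[i] is None:
                self.slots[i] = obj
                return i                   # "address" of the object

    def free(self, i):
        self.slots[i] = None               # freeing an already-empty slot is harmless here

heap = RandomizedHeap(max_objects=4, seed=1)
addrs = [heap.malloc(f"obj{k}") for k in range(4)]
```

The replicated mode described in the abstract would run several such heaps with different seeds and compare outputs, so an error is unlikely to corrupt every replica in the same way.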
in this paper we show how to reduce downtime of jee applications by rapidly and automatically recovering from transient and intermittent software failures without requiring application modifications our prototype combines three application agnostic techniques macroanalysis for fault detection and localization microrebooting for rapid recovery and external management of recovery actions the individual techniques are autonomous and work across wide range of componentized internet applications making them well suited to the rapidly changing software of internet services the proposed framework has been integrated with jboss an open source jee application server our prototype provides an execution platform that can automatically recover jee applications within seconds of the manifestation of fault our system can provide subset of system’s active end users with the illusion of continuous uptime in spite of failures occurring behind the scenes even when there is no functional redundancy in the system
we present method for browsing videos by directly dragging their content this method brings the benefits of direct manipulation to an activity typically mediated by widgets we support this new type of interactivity by automatically extracting motion data from videos and new technique called relative flow dragging that lets users control video playback by moving objects of interest along their visual trajectory we show that this method can outperform the traditional seeker bar in video browsing tasks that focus on visual content rather than time
mobile agents are promising technology to face the problems raised by the increasing complexity and size of today’s networks in particular in the area of network management mobile agents can lead to fully distributed paradigm to overcome the limits of traditional centralized approaches basic requirement for the management of complex network is the definition of high level and flexible models to coordinate the accesses to the resources data and services provided by the network nodes on this basis this paper describes the mars coordination architecture for mobile agents mars is based on the definition of programmable tuple spaces associated with the network nodes mobile agents can access the local resources and services via the tuple space thus adopting standard and high level interface the network administrator via mobile agents can dynamically program the behavior of the tuple space in reaction to the agents access to the tuple space thus leading to flexible network model several examples show the effectiveness of the mars approach in supporting network management activities
this paper studies the performance and security aspects of the iscsi protocol in network storage based system ethernet speeds have been improving rapidly and network throughput is no longer considered bottleneck when compared to fibre channel based storage area networks however when security of the data traffic is taken into consideration existing protocols like ipsec prove to be major hindrance to the overall throughput in this paper we evaluate the performance of iscsi when deployed over standard security protocols and suggest lazy crypto approaches to alleviate the processing needs at the server the testbed consists of cluster of linux machines directly connected to the server through gigabit ethernet network micro and application benchmarks like btio and dbench were used to analyze the performance and scalability of the different approaches our proposed lazy approaches improved throughput by as much as for microbenchmarks and for application benchmarks in comparison to the ipsec based approaches
this paper presents novel independent component analysis ica color space method for pattern recognition the novelty of the ica color space method is twofold deriving effective color image representation based on ica and implementing efficient color image classification using the independent color image representation and an enhanced fisher model efm first the ica color space method assumes that each color image is defined by three independent source images which can be derived by means of blind source separation procedure such as ica unlike the color rgb space where the and component images are correlated the new ica color space method derives three component images and that are independent and hence uncorrelated second the three independent color component images are concatenated to form an augmented pattern vector whose dimensionality is reduced by principal component analysis pca an efm then derives the discriminating features of the reduced pattern vector for pattern recognition the effectiveness of the proposed ica color space method is demonstrated using complex grand challenge pattern recognition problem and large scale database in particular the face recognition grand challenge frgc and the biometric experimentation environment bee reveal that for the most challenging frgc version experiment which contains training images controlled target images and uncontrolled query images the ica color space method achieves the face verification rate roc iii of at the false accept rate far of compared to the face verification rate fvr of of the rgb color space using the same efm and of the frgc baseline algorithm at the same far
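A short scikit-learn sketch of the first step described above, using FastICA as the blind source separation procedure to turn the three correlated RGB channels into three approximately independent component images and concatenating them into an augmented pattern vector; the PCA reduction and EFM discriminant steps are only indicated in comments, and all names are mine.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_color_components(rgb_image):
    # rgb_image: H x W x 3 array; treat every pixel as a 3-d sample and
    # separate it into three (approximately) independent source images
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(float)
    sources = FastICA(n_components=3, random_state=0).fit_transform(pixels)
    component_images = sources.reshape(h, w, 3)
    # concatenate the three component images into one augmented pattern vector;
    # PCA (and then a discriminant step such as EFM) would be applied to a
    # training set of such vectors
    augmented = component_images.transpose(2, 0, 1).reshape(-1)
    return component_images, augmented
```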
in this paper we introduce the method tagging substitute complement attributes on miscellaneous recommending relations and elaborate how this step contributes to electronic merchandising there are already decades of works in building recommender systems steadily outperforming previous algorithms is difficult under the conventional framework however in real merchandising scenarios we find describing the weight of recommendation simply as scalar number is hardly expressive which hinders the further progress of recommender systems we study large log of user browsing data revealing the typical substitute complement relations among items that can further extend recommender systems in enriching the presentation and improving the practical quality finally we provide an experimental analysis and sketch an online prototype to show that tagging attributes can grant more intelligence to recommender systems by differentiating recommended candidates to fit respective scenarios
despite the importance of pointing device movement to efficiency in interfaces little is known on how target shape impacts speed acceleration and other kinematic properties of motion in this paper we examine which kinematic characteristics of motion are impacted by amplitude and directional target constraints in fitts style pointing tasks our results show that instantaneous speed acceleration and jerk are most affected by target constraint results also show that the effects of target constraint are concentrated in the first of movement distance we demonstrate that we can discriminate between the two classes of target constraint using machine learning with accuracy greater than chance finally we highlight future work in designing techniques that make use of target constraint to improve pointing efficiency in computer interfaces
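A small numpy sketch of the kinematic measures named in the abstract, computed from sampled cursor positions by finite differences; the sampling format and the helper for restricting a profile to the first portion of movement distance are assumptions, not the authors' analysis code.

```python
import numpy as np

def kinematics(t, x, y):
    # t: timestamps in seconds; x, y: cursor positions; returns instantaneous
    # speed, acceleration magnitude and jerk magnitude via finite differences
    vx, vy = np.gradient(x, t), np.gradient(y, t)
    speed = np.hypot(vx, vy)
    ax, ay = np.gradient(vx, t), np.gradient(vy, t)
    accel = np.hypot(ax, ay)
    jx, jy = np.gradient(ax, t), np.gradient(ay, t)
    jerk = np.hypot(jx, jy)
    return speed, accel, jerk

def first_portion(values, x, y, fraction=0.5):
    # restrict a kinematic profile to the first `fraction` of movement distance,
    # e.g. to look at the early part of the movement only
    d = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))])
    return values[d <= fraction * d[-1]]
```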
this paper presents an innovative approach to solve the problem of missing transparency of competencies within virtual organizations we based our work on empirical studies to cope with the problem of competence finding in distributed organizations former studies have shown that central storage of expertise profiles is inappropriate due to missing flexibility and high costs of maintenance the focus of our approach is to support peripheral awareness to become aware of the available competences in organizations our approach runs along two lines making expertise related communication visible for all members of an organization and visualizing competence indicating events in collaboration infrastructures we verified this approach by the evaluation of prototypical implementation
with the advent of home networking and widespread deployment of broadband connectivity to homes wealth of new services with real time quality of service qos requirements have emerged eg video on demand vod ip telephony which have to co exist with traditional non real time services such as web browsing and file downloading over the transmission control protocol tcp the co existence of such real time and non real time services demands the residential gateway rg to employ bandwidth management algorithms to control the amount of non real time tcp traffic on the broadband access link from the internet service provider isp to the rg so that the bandwidth requirements of the real time traffic are satisfied in this paper we propose an algorithm to control the aggregate bandwidth of the incoming non real time tcp traffic at the rg so that qos requirements of the real time traffic can be guaranteed the idea is to limit the maximum data rates of active tcp connections by dynamically manipulating their flow control window sizes based on the total available bandwidth for the non real time traffic we show by simulation results that our algorithm limits the aggregate bandwidth of the non real time tcp traffic thus granting the real time traffic the required bandwidth
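A minimal sketch of the arithmetic behind such window-based rate control, assuming an even split of the leftover bandwidth among active connections (the paper's algorithm may divide it differently): since a TCP connection's throughput is bounded by window/RTT, capping the advertised flow-control window caps its rate.

```python
def advertised_window_bytes(link_capacity_bps, realtime_reserved_bps,
                            n_active_tcp, rtt_seconds, mss_bytes=1460):
    # bandwidth left for non-real-time TCP traffic on the access link
    available_bps = max(link_capacity_bps - realtime_reserved_bps, 0)
    if n_active_tcp == 0:
        return 0
    per_connection_bps = available_bps / n_active_tcp      # simple even split
    # a TCP connection's rate is roughly window / RTT, so cap the window
    window = per_connection_bps * rtt_seconds / 8           # bits -> bytes
    return max(int(window), mss_bytes)                      # keep at least one segment

# e.g. 8 Mb/s downlink, 3 Mb/s reserved for real-time traffic, 5 TCP flows, 100 ms RTT
w = advertised_window_bytes(8e6, 3e6, 5, 0.1)
```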
we give generic construction for universal designated verifier signature schemes from large class of signature schemes the resulting schemes are efficient and have two important properties firstly they are provably dv unforgeable non transferable and also non delegatable secondly the signer and the designated verifier can independently choose their cryptographic settings we also propose generic construction for identity based signature schemes from any signature scheme in and prove that the construction is secure against adaptive chosen message and identity attacks we discuss possible extensions of our constructions to universal multi designated verifier signatures hierarchical identity based signatures identity based universal designated verifier signatures and identity based ring signatures from any signature in
encrypting data in unprotected memory has gained much interest lately for digital rights protection and security reasons counter mode is well known encryption scheme it is symmetric key encryption scheme based on any block cipher eg aes the scheme’s encryption algorithm uses block cipher secret key and counter or sequence number to generate an encryption pad which is xored with the data stored in memory like other memory encryption schemes this method suffers from the inherent latency of decrypting encrypted data when loading them into the on chip cache one solution that parallelizes data fetching and encryption pad generation requires the sequence numbers of evicted cache lines to be cached on chip on chip sequence number caching can be successful in reducing the latency at the cost of large area overhead in this paper we present novel technique to hide the latency overhead of decrypting counter mode encrypted memory by predicting the sequence number and pre computing the encryption pad that we call one time pad or otp in contrast to the prior techniques of sequence number caching our mechanism solves the latency issue by using idle decryption engine cycles to speculatively predict and pre compute otps before the corresponding sequence number is loaded this technique incurs very little area overhead in addition novel adaptive otp prediction technique is also presented to further improve our regular otp prediction and precomputation mechanism this adaptive scheme is not only able to predict encryption pads associated with static and infrequently updated cache lines but also those frequently updated ones as well experimental results using spec benchmark show an prediction rate moreover we also explore several optimization techniques for improving the prediction accuracy two specific techniques two level prediction and context based prediction are presented and evaluated for the two level prediction the prediction rate was improved from to with the context based prediction the prediction rate approaches context based otp prediction outperforms very large kb sequence number cache for many memory bound spec programs ipc results show an overall to performance improvement using our prediction and precomputation and another improvement when context based prediction techniques are used
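A toy functional model of the prediction idea, assuming the pad is derived from the line address and its sequence number; SHA-256 stands in for the block cipher purely so the sketch runs without a crypto library, and the predictor class, its naming, and the simple seq+1 guess are illustrative rather than the paper's mechanism.

```python
import hashlib

def pad(key: bytes, address: int, seq: int, length: int = 64) -> bytes:
    # stand-in for the counter-mode encryption pad E_K(address || seq);
    # a real design would use a block cipher such as AES, not a hash
    material, counter = b"", 0
    while len(material) < length:
        material += hashlib.sha256(
            key + address.to_bytes(8, "big") + seq.to_bytes(8, "big")
            + counter.to_bytes(4, "big")).digest()
        counter += 1
    return material[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class OTPPredictor:
    # speculatively precompute the pad for the predicted next sequence number,
    # so the XOR can happen as soon as the memory data arrives
    def __init__(self, key):
        self.key = key
        self.last_seq = {}            # address -> last observed sequence number

    def prefetch(self, address):
        guess = self.last_seq.get(address, 0) + 1
        return guess, pad(self.key, address, guess)

    def decrypt(self, address, seq, ciphertext):
        guess, precomputed = self.prefetch(address)
        otp = precomputed if guess == seq else pad(self.key, address, seq)  # hit vs. miss
        self.last_seq[address] = seq
        return xor(otp, ciphertext)
```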
in order to fulfill the complex resource requirements of some users in grid environments support for coallocation between different resource providers is needed here it is quite difficult to coordinate these different services from different resource providers because grid scheduler has to cope with different policies and objectives of the different resource providers and of the users agreement based resource management is considered feasible solution to solve many of these problems as it supports the reliable interaction between different providers and users however most current models do not well support co allocation here negotiation is needed to create such bi lateral agreements between several grid parties such negotiation process should be automated with no or minimal human interaction considering the potential scale of grid systems and the amount of necessary transactions therefore strategic negotiation models play an important role in this paper negotiation models which supports the co allocation between different resource providers are proposed and examined first simulations have been conducted to evaluate the presented system the results demonstrate that the proposed negotiation model are suitable and effective for grid environments
syndromic surveillance can play an important role in protecting the public’s health against infectious diseases infectious disease outbreaks can have devastating effect on society as well as the economy and global awareness is therefore critical to protecting against major outbreaks by monitoring online news sources and developing an accurate news classification system for syndromic surveillance public health personnel can be apprised of outbreaks and potential outbreak situations in this study we have developed framework for automatic online news monitoring and classification for syndromic surveillance the framework is unique and none of the techniques adopted in this study have been previously used in the context of syndromic surveillance on infectious diseases in recent classification experiments we compared the performance of different feature subsets on different machine learning algorithms the results showed that the combined feature subsets including bag of words noun phrases and named entities features outperformed the bag of words feature subsets furthermore feature selection improved the performance of feature subsets in online news classification the highest classification performance was achieved when using svm upon the selected combination feature subset
the self organizing knowledge representation aspects in heterogeneous information environments involving object oriented databases relational databases and rulebases are investigated the authors consider facet of self organizability which sustains the structural semantic integrity of an integrated schema regardless of the dynamic nature of local schemata to achieve this objective they propose an overall scheme for schema translation and schema integration with an object oriented data model as common data model and it is shown that integrated schemata can be maintained effortlessly by propagating updates in local schemata to integrated schemata unambiguously
the increasing number of functionally similar services requires the existence of non functional properties selection process based on the quality of service qos thus in this article authors focus on the provision of qos model an architecture and an implementation which enhance the selection process by the annotation of service level agreement sla templates with semantic qos metrics this qos model is composed by specification for annotating sla templates files qos conceptual model formed as qos ontology and selection algorithm this approach which is backward compatible provides interoperability among customer providers and lightweight alternative finally its applicability and benefits are shown by using examples of infrastructure services
we present new technique for verifying correspondences in security protocols in particular correspondences can be used to formalize authentication our technique is fully automatic it can handle an unbounded number of sessions of the protocol and it is efficient in practice it significantly extends previous technique for the verification of secrecy the protocol is represented in an extension of the pi calculus with fairly arbitrary cryptographic primitives this protocol representation includes the specification of the correspondence to be verified but no other annotation this representation is then translated into an abstract representation by horn clauses which is used to prove the desired correspondence our technique has been proved correct and implemented we have tested it on various protocols from the literature the experimental results show that these protocols can be verified by our technique in less than
wireless sensor networks are resource constrained self organizing systems that are often deployed in inaccessible and inhospitable environments in order to collect data about some outside world phenomenon for most sensor network applications point to point reliability is not the main objective instead reliable event of interest delivery to the server needs to be guaranteed possibly with certain probability the nature of communication in sensor networks is unpredictable and failure prone even more so than in regular wireless ad hoc networks therefore it is essential to provide fault tolerant techniques for distributed sensor applications many recent studies in this area take drastically different approaches to addressing the fault tolerance issue in routing transport and or application layers in this paper we summarize and compare existing fault tolerant techniques to support sensor applications we also discuss several interesting open research directions
this paper presents an algorithm for maximizing the lifetime of sensor network while guaranteeing an upper bound on the end to end delay we prove that the proposed algorithm is optimal and requires simple computing operations that can be implemented by simple devices to the best of our knowledge this is the first paper to propose sensor wake up frequency that depends on the sensor’s location in the routing paths using simulations we show that the proposed algorithm significantly increases the lifetime of the network while guaranteeing maximum on the end to end delay
real time eyegaze selection interface was implemented using tobii eyegaze tracking monitor hierarchical button menu was displayed on the screen and specified selections were made by eyegaze fixations and glances on the menu widgets the initial version tested three different spatial layouts of the menu widgets and employed dwell glance method of selection results from the pilot interface led to usability improvements in the second version of the interface selections were activated using glance dwell method the usability of the second study interface received positive response from all participants each selection gained more than speed increase using the revised interface more intuitive selection interface in the second study allowed us to test users selection accuracy at faster dwell selection thresholds users quickly learned to achieve accurate selections in ms but made errors when selections occurred in ms
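A minimal sketch of dwell-based activation of the kind used in such an interface: a widget is selected once consecutive gaze samples stay inside its bounds for the dwell threshold. The data layout, rectangle hit test, and threshold value are assumptions, not the study's implementation.

```python
def dwell_select(samples, widgets, dwell_ms=600):
    # samples: list of (timestamp_ms, x, y) gaze points
    # widgets: dict name -> (x0, y0, x1, y1) screen rectangles
    def hit(x, y):
        for name, (x0, y0, x1, y1) in widgets.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return name
        return None

    selections = []
    current, since = None, None
    for t, x, y in samples:
        w = hit(x, y)
        if w != current:                       # gaze moved to a different widget (or none)
            current, since = w, t
        elif w is not None and t - since >= dwell_ms:
            selections.append((t, w))          # dwell threshold reached: select
            since = t                          # restart so the widget is not re-selected immediately
    return selections
```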
the turn model routing algorithms for mesh interconnection network achieve partial adaptivity without any virtual channels however the routing performance measured by simulations is worse than with the simple deterministic routing algorithm authors have explained these results simply by uneven dynamic load through the network however this phenomenon has not been studied further this paper investigates performance degradation with turn model and drawbacks of partially adaptive routing in comparison with the deterministic routing and it introduces some new concepts our simulations deal with individual channels and results are presented by graphs rather than by commonly used averages an additional parameter channel occupation which is consistent with queuing theory commonly used in many proposed analytical models is introduced we also propose new structure the channel directions dependency graph cddg it provides new approach in analysis helps in understanding of dynamic routing behaviour and it can be generalized in other routing algorithms
although tremendous success has been achieved for interactive object cutout in still images accurately extracting dynamic objects in video remains very challenging problem previous video cutout systems present two major limitations reliance on global statistics thus lacking the ability to deal with complex and diverse scenes and treating segmentation as global optimization thus lacking practical workflow that can guarantee the convergence of the systems to the desired results we present video snapcut robust video object cutout system that significantly advances the state of the art in our system segmentation is achieved by the collaboration of set of local classifiers each adaptively integrating multiple local image features we show how this segmentation paradigm naturally supports local user editing and propagates them across time the object cutout system is completed with novel coherent video matting technique comprehensive evaluation and comparison is presented demonstrating the effectiveness of the proposed system at achieving high quality results as well as the robustness of the system against various types of inputs
to efficiently deliver streaming media researchers have developed technical solutions that fall into three categories each of which has its merits and limitations infrastructure based cdns with dedicated network bandwidths and hardware supports can provide high quality streaming services but at high cost server based proxies are cost effective but not scalable due to the limited proxy capacity in storage and bandwidth and its centralized control also brings single point of failure client based pp networks are scalable but do not guarantee high quality streaming service due to the transient nature of peers to address these limitations we present novel and efficient design of scalable and reliable media proxy system assisted by pp networks called prop in the prop system the clients machines in an intranet are self organized into structured pp system to provide large media storage and to actively participate in the streaming media delivery where the proxy is also embedded as an important member to ensure the quality of streaming service the coordination and collaboration in the system are efficiently conducted by our pp management structure and replacement policies our system has the following merits it addresses both the scalability problem in centralized proxy systems and the unreliable service concern by only relying on the pp sharing of clients the proposed content locating scheme can timely serve the demanded media data and fairly dispatch media streaming tasks in appropriate granularity across the system based on the modeling and analysis we propose global replacement policies for proxy and clients which well balance the demand and supply of streaming data in the system achieving high utilization of peers cache we have comparatively evaluated our system through trace driven simulations with synthetic workloads and with real life workload extracted from the media server logs in an enterprise network which shows our design significantly improves the quality of media streaming and the system scalability
message passing interface mpi is widely used standard for managing coarse grained concurrency on distributed computers debugging parallel mpi applications however has always been particularly challenging task due to their high degree of concurrent execution and non deterministic behavior deterministic replay is potentially powerful technique for addressing these challenges with existing mpi replay tools adopting either data replay or order replay approaches unfortunately each approach has its tradeoffs data replay generates substantial log sizes by recording every communication message order replay generates small logs but requires all processes to be replayed together we believe that these drawbacks are the primary reasons that inhibit the wide adoption of deterministic replay as the critical enabler of cyclic debugging of mpi applications this paper describes subgroup reproducible replay srr hybrid deterministic replay method that provides the benefits of both data replay and order replay while balancing their trade offs srr divides all processes into disjoint groups it records the contents of messages crossing group boundaries as in data replay but records just message orderings for communication within group as in order replay in this way srr can exploit the communication locality of traffic patterns in mpi applications during replay developers can then replay each group individually srr reduces recording overhead by not recording intra group communication and reduces replay overhead by limiting the size of each replay group exposing these tradeoffs gives the user the necessary control for making deterministic replay practical for mpi applications we have implemented prototype mpiwiz to demonstrate and evaluate srr mpiwiz employs replay framework that allows transparent binary instrumentation of both library and system calls as result mpiwiz replays mpi applications with no source code modification and relinking and handles non determinism in both mpi and os system calls our preliminary results show that mpiwiz can reduce recording overhead by over factor of four relative to data replay yet without requiring the entire application to be replayed as in order replay recording increases execution time by while the application can be replayed in just of its base execution time
the smoke and mirrors file system smfs mirrors files at geographically remote datacenter locations with negligible impact on file system performance at the primary site and minimal degradation as function of link latency it accomplishes this goal using wide area links that run at extremely high speeds but have long round trip time latencies combination of properties that poses problems for traditional mirroring solutions in addition to its raw speed smfs maintains good synchronization should the primary site become completely unavailable the system minimizes loss of work even for applications that simultaneously update groups of files we present the smfs design then evaluate the system on emulab and the cornell national lambda rail nlr ring testbed intended applications include wide area file sharing and remote backup for disaster recovery
we discuss approaches to incrementally construct an ensemble the first constructs an ensemble of classifiers choosing subset from larger set and the second constructs an ensemble of discriminants where classifier is used for some classes only we investigate criteria including accuracy significant improvement diversity correlation and the role of search direction for discriminant ensembles we test subset selection and trees fusion is by voting or by linear model using classifiers on data sets incremental search finds small accurate ensembles in polynomial time the discriminant ensemble uses subset of discriminants and is simpler interpretable and accurate we see that an incremental ensemble has higher accuracy than bagging and random subspace method and it has comparable accuracy to adaboost but fewer classifiers
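A compact scikit-learn sketch of incremental (greedy forward) ensemble construction with accuracy as the selection criterion, one of the criteria mentioned above; voting is used for fusion and integer class labels are assumed. This is an illustrative baseline, not the paper's exact procedure.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def vote(predictions):
    # per-sample majority vote over integer class labels
    stacked = np.vstack(predictions).astype(int)
    return np.array([np.bincount(col).argmax() for col in stacked.T])

def incremental_ensemble(candidates, X_train, y_train, X_val, y_val):
    # greedily add the classifier whose inclusion most improves the
    # validation accuracy of the voted ensemble; stop when nothing helps
    fitted = [clone(c).fit(X_train, y_train) for c in candidates]
    preds = [c.predict(X_val) for c in fitted]
    chosen, chosen_preds, best_acc = [], [], 0.0
    improved = True
    while improved and len(chosen) < len(fitted):
        improved, best_i = False, None
        for i, p in enumerate(preds):
            if i in chosen:
                continue
            acc = accuracy_score(y_val, vote(chosen_preds + [p]))
            if acc > best_acc:
                best_i, best_acc, improved = i, acc, True
        if improved:
            chosen.append(best_i)
            chosen_preds.append(preds[best_i])
    return [fitted[i] for i in chosen], best_acc
```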
using automated reasoning techniques we tackle the niche activity of proving that program is free from run time exceptions such property is particularly valuable in high integrity software for example safety or security critical applications the context for our work is the spark approach for the development of high integrity software the spark approach provides significant degree of automation in proving exception freedom where this automation fails however the programmer is burdened with the task of interactively constructing proof and possibly also having to supply auxiliary program annotations we minimize this burden by increasing the automation through an integration of proof planning and program analysis oracle we advocate cooperative integration where proof failure analysis directly constrains the search for auxiliary program annotations the approach has been successfully tested on industrial data
to provide scalable communication infrastructure for systems on chips socs networks on chips nocs communication centric design paradigm is needed to be cost effective socs are often programmable and integrate several different applications or use cases on to the same chip for the soc platform to support the different use cases the noc architecture should satisfy the performance constraints of each individual use case in this work we motivate the need to consider multiple use cases during the noc design process we present method to efficiently map the applications on to the noc architecture satisfying the design constraints of each individual use case we also present novel ways to dynamically reconfigure the network across the different use cases and explore the possibility of integrating dynamic voltage and frequency scaling dvs dfs techniques with the use case centric noc design methodology we validate the performance of the design methodology on several soc applications the dynamic reconfiguration of the noc integrated with dvs dfs schemes results in large power savings for the resulting noc systems
tools and analyses that find bugs in software are becoming increasingly prevalent however even after the potential false alarms raised by such tools are dealt with many real reported errors may go unfixed in such cases the programmers have judged the benefit of fixing the bug to be less than the time cost of understanding and fixing it the true utility of bug finding tool lies not in the number of bugs it finds but in the number of bugs it causes to be fixed analyses that find safety policy violations typically give error reports as annotated backtraces or counterexamples we propose that bug reports additionally contain specially constructed patch describing an example way in which the program could be modified to avoid the reported policy violation programmers viewing the analysis output can use such patches as guides starting points or as an additional way of understanding what went wrong we present an algorithm for automatically constructing such patches given model checking and policy information typically already produced by most such analyses we are not aware of any previous automatic techniques for generating patches in response to safety policy violations our patches can suggest additional code not present in the original program and can thus help to explain bugs related to missing program elements in addition our patches do not introduce any new violations of the given safety policy to evaluate our method we performed software engineering experiment applying our algorithm to over bug reports produced by two off the shelf bug finding tools running on large java programs bug reports also accompanied by patches were three times as likely to be addressed as standard bug reports this work represents an early step toward developing new ways to report bugs and to make it easier for programmers to fix them even minor increase in our ability to fix bugs would be great increase for the quality of software
randomized response techniques have been investigated in privacy preserving categorical data analysis however the released distortion parameters can be exploited by attackers to breach privacy in this paper we investigate whether data mining or statistical analysis tasks can still be conducted on randomized data when distortion parameters are not disclosed to data miners we first examine how various objective association measures between two variables may be affected by randomization we then extend to multiple variables by examining the feasibility of hierarchical loglinear modeling finally we show some classic data mining tasks that cannot be applied on the randomized data directly
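For context, a minimal sketch of the standard randomized response mechanism for a binary attribute, including the de-biasing step that requires the distortion parameter p; the point of the abstract is precisely what analysis remains possible when p is withheld from the miner, so the estimator below is what becomes unavailable in that setting.

```python
import random

def randomize(value: int, p: float, rng=random) -> int:
    # Warner-style randomized response for a 0/1 attribute:
    # report the true value with probability p, the flipped value otherwise
    return value if rng.random() < p else 1 - value

def estimate_true_proportion(reported, p):
    # unbiased estimate of P(value = 1) when p is known:
    # E[reported] = pi*(2p - 1) + (1 - p)  =>  pi = (mean + p - 1) / (2p - 1)
    mean = sum(reported) / len(reported)
    return (mean + p - 1) / (2 * p - 1)

rng = random.Random(0)
truth = [1 if rng.random() < 0.3 else 0 for _ in range(100000)]
released = [randomize(v, p=0.8, rng=rng) for v in truth]
print(estimate_true_proportion(released, 0.8))   # close to 0.3
```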
digital television will bring significant increase in the amount of channels and programs available to end users with many more difficulties to find contents appealing to them among myriad of irrelevant information thus automatic content recommenders should receive special attention in the following years to improve their assistance to users the current content recommenders have important deficiencies that hamper their wide acceptance in this paper we present new approach for automatic content recommendation that significantly reduces those deficiencies this approach based on semantic web technologies has been implemented in the advanced telematic search of audiovisual contents by semantic reasoning tool hybrid content recommender that makes extensive use of well known standards such as multimedia home platform tv anytime and owl also we have carried out an experimental evaluation the results of which show that our proposal performs better than other existing approaches
distributing single global clock across chip while meeting the power requirements of the design is troublesome task due to shrinking technology nodes associated with high clock frequencies to deal with this network on chip noc architectures partitioned into several voltage frequency islands vfis have been proposed to interface the islands on chip operating at different frequencies complex bi synchronous fifo design is inevitable however these fifos are not needed if adjacent switches belong to the same clock domain in this paper reconfigurable synchronous bi synchronous rsbs fifo is proposed which can adapt its operation to either synchronous or bi synchronous mode the fifo is presented by three different scalable and synthesizable design styles and in addition some techniques are suggested to show how the fifo could be utilized in vfi based noc our analysis reveal that the rsbs fifos can help to achieve up to savings in the average power consumption of noc switches and improvement in the total average packet latency in the case of mpeg encoder application when compared to non reconfigurable architecture
in this paper we discuss some of our recent research work designing tabletop interfaces for co located photo sharing we draw particular attention to specific feature of an interface design which we have observed over an extensive number of uses as facilitating an under reported but none the less intriguing aspect of the photo sharing experience namely the process of getting sidetracked through series of vignettes of interaction during photo sharing sessions we demonstrate how users of our tabletop photoware system used peripheral presentation of topically incoherent photos to artfully initiate new photo talk sequences in on going discourse from this we draw implications for the design of tabletop photo applications and for the experiential analysis of such devices
The high-performance computing domain is being enriched with the inclusion of networks-on-chip (NoCs) as a key component of many-core CMP or MPSoC architectures. NoCs face the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before: defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity, and thus efficient routing becomes a challenge. In this paper uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from meshes, being an alternative to the use of routing tables, either at routers or at end nodes. uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented, highlighting the trade-off between routing cost and coverage. The alternatives span from the previously proposed LBDR approach, with partial coverage, to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the trade-off between fault tolerance and performance.
automatic facial expression analysis is an interesting and challenging problem and impacts important applications in many areas such as human computer interaction and data driven animation deriving an effective facial representation from original face images is vital step for successful facial expression recognition in this paper we empirically evaluate facial representation based on statistical local features local binary patterns for person independent facial expression recognition different machine learning methods are systematically examined on several databases extensive experiments illustrate that lbp features are effective and efficient for facial expression recognition we further formulate boosted lbp to extract the most discriminant lbp features and the best recognition performance is obtained by using support vector machine classifiers with boosted lbp features moreover we investigate lbp features for low resolution facial expression recognition which is critical problem but seldom addressed in the existing work we observe in our experiments that lbp features perform stably and robustly over useful range of low resolutions of face images and yield promising performance in compressed low resolution video sequences captured in real world environments
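For readers unfamiliar with the representation used above, the following sketch computes the plain 8-neighbour LBP operator and the normalized code histogram used as a feature vector. It covers the basic descriptor only, not the boosted LBP selection or the SVM classification described in the work; the window handling is an assumption.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: compare each pixel with its 8
    neighbours (clockwise from top-left) and pack the bits into one code."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = gray[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(int) << bit
    return codes

def lbp_histogram(gray, bins=256):
    """Feature vector: normalized histogram of LBP codes over the image."""
    codes = lbp_image(gray)
    hist, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return hist / hist.sum()

face = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in for a face crop
print(lbp_histogram(face)[:8])
```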
web personalization is the process of customizing web site to the needs of specific users taking advantage of the knowledge acquired from the analysis of the user’s navigational behavior usage data in correlation with other information collected in the web context namely structure content and user profile data due to the explosive growth of the web the domain of web personalization has gained great momentum both in the research and commercial areas in this article we present survey of the use of web mining for web personalization more specifically we introduce the modules that comprise web personalization system emphasizing the web usage mining module review of the most common methods that are used as well as technical issues that occur is given along with brief overview of the most popular tools and applications available from software vendors moreover the most important research initiatives in the web usage mining and personalization areas are presented
an abstract business process contains description the protocol that business process engages in without revealing the internal computation of the process this description provides the information necessary to compose the process with other web services bpel supports this by providing distinct dialects for specifying abstract and executable processes unfortunately bpel does not prevent complex computations from being included in an abstract process this complicates the protocol description unnecessarily reveals implementation details and makes it difficult to analyze correctness we propose some restrictions on the data manipulation constructs that can be used in an abstract bpel process the restrictions permit full description of protocol while hiding computation restricted abstract process can easily be converted into an abstract bpel process or expanded into an executable bpel process based on these restrictions we propose formal model for business process and use it as the basis of an algorithm for demonstrating the correctness of protocol described by restricted abstract process we then sketch an algorithm for synthesizing protocol based on formal specification of its outcome and the tasks available for its construction
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover they now run increasingly sophisticated control-intensive applications. As a result, developing architecture-specific compiler optimizations is becoming both increasingly critical and complex, while time-to-market constraints remain very tight. In this article we present a novel program optimization approach, called the Virtual Hardware Compiler (VHC), that can perform as well as static compiler optimizations but requires far less compiler development effort, even for complex VLIW architectures and complex target applications. The principle is to augment the target processor simulator with superscalar-like features, observe how the target program is dynamically optimized during execution, and deduce an optimized binary for the static VLIW architecture. Developing an architecture-specific optimizer then amounts to modifying the processor simulator, which is very fast compared to adapting static compiler optimizations to an architecture. We also show that a VHC-optimized binary trained on a number of data sets performs as well as a statically optimized binary on other test data sets. The only drawback of the approach is a largely increased compilation time, which is often acceptable for embedded applications and devices. Using the Texas Instruments VLIW processor and the associated compiler, we experimentally show that this approach performs as well as static compiler optimizations for much lower research and development effort. Using single-core and dual-core clustered processors, we also show that the same approach can be used for efficiently retargeting binary programs within a family of processors.
in this paper we present the results of research project concerning the temporal management of normative texts in xml format in particular four temporal dimensions publication validity efficacy and transaction times are used to correctly represent the evolution of norms in time and their resulting versioning hence we introduce multiversion data model based on xml schema and define basic mechanisms for the maintenance and retrieval of multiversion norm texts finally we describe prototype management system which has been implemented and evaluated
With increasing connectivity between computers, the need to keep networks secure progressively becomes more vital. Intrusion detection systems (IDS) have become an essential component of computer security to supplement existing defenses. This paper proposes a multiple-level hybrid classifier, a novel intrusion detection system which combines supervised tree classifiers and unsupervised Bayesian clustering to detect intrusions. Performance of this new approach is measured using the KDD Cup dataset and is shown to have high detection and low false alarm rates.
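A rough sketch of the two-level idea is given below: a supervised decision tree handles confident cases and an unsupervised clustering pass re-examines the uncertain ones. It uses scikit-learn with k-means standing in for the Bayesian clustering stage and synthetic data standing in for the KDD Cup records, so the thresholds and the small-cluster rule are illustrative assumptions rather than the paper's method.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Synthetic stand-in for connection records: 500 samples, 10 features,
# labels 0 = normal, 1 = attack (the real system uses the KDD Cup data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Level 1: supervised tree classifier.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
proba = tree.predict_proba(X)[:, 1]

# Level 2: records the tree is unsure about are re-examined by an
# unsupervised clustering stage; unusually small clusters are flagged.
unsure = (proba > 0.3) & (proba < 0.7)
if unsure.any():
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X[unsure])
    sizes = np.bincount(labels)
    flagged = np.isin(labels, np.where(sizes < 0.1 * unsure.sum())[0])
    print("suspicious records among the unsure ones:", int(flagged.sum()))
```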
Fault tolerance is an essential requirement for real-time systems, due to the potentially catastrophic consequences of faults. In this paper we investigate an efficient off-line scheduling algorithm that generates schedules in which real-time tasks with precedence constraints can tolerate one processor's permanent failure in a heterogeneous system with a fully connected network. The tasks are assumed to be non-preemptable, and each task has two copies scheduled on different processors and mutually excluded in time. In recent years, the quality of schedules has been improved by allowing a backup copy to overlap with other backup copies on the same processor; however, this approach assumes that tasks are independent of one another. To meet the needs of real-time systems where tasks have precedence constraints, a new overlapping scheme is proposed. We show that, given two tasks, the necessary conditions for their backup copies to safely overlap in time with each other are that their corresponding primary copies are scheduled on two different processors, that they are independent tasks, and that the execution of their backup copies implies the failures of the processors on which their primary copies are scheduled. For tasks with precedence constraints, the new overlapping scheme allows the backup copy of a task to overlap with its successors' primary copies, thereby further reducing schedule length. Based on a proposed reliability model, tasks are judiciously allocated to processors so as to maximize the reliability of heterogeneous systems. Additionally, times for detecting and handling a permanent fault are incorporated into the scheduling scheme. We have performed experiments using synthetic workloads as well as a real-world application. Simulation results show that, compared with existing scheduling algorithms in the literature, our scheduling algorithm improves reliability and achieves an improvement in a performability measure that combines reliability and schedulability.
network throughput can be increased by allowing multipath adaptive routing adaptive routing allows more freedom in the paths taken by messages spreading load over physical channels more evenly the flexibility of adaptive routing introduces new possibilities of deadlock previous deadlock avoidance schemes in ary cubes require an exponential number of virtual channels independent of network size and dimension planar adaptive routing algorithms reduce the complexity of deadlock prevention by reducing the number of choices at each routing step in the fault free case planar adaptive networks are guaranteed to be deadlock free in the presence of network faults the planar adaptive router can be extended with misrouting to produce working network which remains provably deadlock free and is provably livelock free in addition planar adaptive networks can simultaneously support both in order and adaptive out of order packet delivery planar adaptive routing is of practical significance it provides the simplest known support for deadlock free adaptive routing in ary cubes of more than two dimensions with restricting adaptivity reduces the hardware complexity improving router speed or allowing additional performance enhancing network features the structure of planar adaptive routers is amenable to efficient implementation
in the distributed test architecture system with multiple ports is tested using tester at each port interface these testers cannot communicate with one another and there is no global clock recent work has defined an implementation relation for testing against an input output transition system in the distributed test architecture however this framework placed no restrictions on the test cases and in particular allowed them to produce some kind of nondeterminism in addition it did not consider the test generation problem this paper explores the class of controllable test cases for the distributed test architecture defining new implementation relation and test generation algorithm
emulators have long been valuable tool in teaching particularly in the os course emulators have allowed students to experiment meaningfully with different machine architectures furthermore many such tools run in user mode allowing students to operate as system administrators without the concomitant security risks virtual distributed ethernet vde is system which emulates in user mode all aspects of an internet including switches routers communication lines etc in completely realistic manner consistent with the operation of such artifacts in the real world vde’s can be implemented on single computer spread over several machines on the same lan or scattered across the internet vde can interoperate with both real systems via standard virtual interface connectivity tools and several virtual machine environments support encryption and actually run fast enough to support real applications furthermore vde can interface interoperate with real networks vdn’s have proven highly effective in supporting both undergraduate and graduate networking courses and wide range of student experiments and projects
ownership is relationship that pervades many aspects of our lives from the personal to the economic and is particularly important in the realm of the emerging electronic economy as it is understood on an intuitive level ownership exhibits great deal of complexity and carries rich semantics with respect both to the owner and the possession formal model of an ownership relationship that inherently captures varied ownership semantics is presented this ownership relationship expands the repertoire of available conceptual data modeling primitives it is built up from set of characteristic dimensions namely exclusiveness dependency documentation transferability and inheritance each of which focuses on specific aspect of ownership semantics the data modeler has the ability to make variety of choices along these five dimensions and thus has access to wide range of available ownership features in declarative fashion these choices ultimately impose various constraints specified in ocl on the states of data objects and their respective ownership activities including transactions such as acquiring and relinquishing ownership to complement the formal aspects of the ownership model and enhance its usability we present graphical ownership notation that augments the unified modeling language uml class diagram formalism an implementation of the ownership relationship in commercial object oriented database system is discussed
in this paper we present hybrid placer called ntuplace which integrates both the partitioning and the analytical quadratic programming placement techniques for large scale mixed size designs unlike most existing placers that minimize wirelength alone we also control the cell density to optimize routability while minimizing the total wirelength ntuplace consists of three major stages multilevel global placement legalization and detailed placement to handle mixed size designs in particular we present linear programming based legalization algorithm to remove overlaps between macros during global placement various other techniques are integrated to improve the solution quality at every stage
multimedia applications are fast becoming one of the dominating workloads for modern computer systems since these applications normally have large data sets and little data reuse many researchers believe that they have poor memory behavior compared to traditional programs and that current cache architectures cannot handle them well it is therefore important to quantitatively characterize the memory behavior of these applications in order to provide insights for future design and research of memory systems however very few results on this topic have been published this paper presents comprehensive research on the memory requirements of group of programs that are representative of multimedia applications these programs include subset of the popular mediabench suite and several large multimedia programs running on the linux windows nt and tru unix operating systems we performed extensive measurement and trace driven simulation experiments we then compared the memory utilization of these programs to that of specint applications we found that multimedia applications actually have better memory behavior than specint programs the high cache hit rates of multimedia applications can be contributed to the following three factors most multimedia applications apply block partitioning algorithms to the input data and work on small blocks of data that easily fit into the cache secondly within these blocks there is significant data reuse as well as spatial locality the third reason is that large number of references generated by multimedia applications are to their internal data structures which are relatively small and can also easily fit into reasonably sized caches
this paper studies how to enable an effective ranked retrieval over data with categorical attributes in particular by supporting personalized ranked retrieval of highly relevant data while ranked retrieval has been actively studied lately existing efforts have focused only on supporting ranking over numerical or text data however many real life data contain large amount of categorical attributes in combination with numerical and text attributes which cannot be efficiently supported unlike numerical attributes where natural ordering is inherent the existence of categorical attributes with no such ordering complicates both the formulation and processing of ranking this paper studies the efficient and effective support of ranking over categorical data as well as uniform support with other types of attributes
global routing for modern large scale circuit designs has attracted much attention in the recent literature most of the state of the art academic global routers just work on simplified routing congestion model that ignores the essential via capacity for routing through multiple metal layers such simplified model would easily cause fatal routability problems in subsequent detailed routing to remedy this deficiency we present in this paper more effective congestion metric that considers both the in tile nets and the residual via capacity for global routing with this congestion metric we develop new global router that features two novel routing algorithms for congestion optimization namely least flexibility first routing and multi source multi sink escaping point routing the least flexibility first routing processes the nets with the least flexibility first facilitating quick prediction of congestion hot spots for the subsequent nets enjoying lower time complexity than traditional maze and search routing in particular the linear time escaping point routing guarantees to find the optimal solution and achieves the theoretical lower bound time complexity experimental results show that our global router can achieve very high quality routing solutions with more reasonable via usage which can benefit and correctly guide subsequent detailed routing
In a large-scale multimedia storage system (LMSS), where the user requests for different multimedia objects may have different demands, placement and replication of the objects is an important factor, as it may result in an imbalance in loading across the system. Since replica management and load balancing is a crucial issue in multimedia systems, this problem is normally handled by centralized servers, e.g., metadata servers (MDS) in distributed file systems. Each object-based storage device (OSD) responds to the requests coming from the centralized servers independently and has no communication with other OSDs in the system. In this paper we design a novel distributed architecture for an LMSS in which the OSDs have some kind of intelligence and can cooperate to achieve high performance. Such an OSD, named an autonomous object-based storage device (AOSD), can replicate objects to and balance requests among other AOSDs, and can handle fail-over and recovery autonomously. In the proposed architecture we move the request balancing from the centralized MDS to the AOSDs, making the system more scalable, flexible, and robust. Based on the proposed architecture, we propose two different object replication and load balancing algorithms, named minimum average waiting time (MAWT) and one of the best two choices (OBTC), respectively. We validate the performance of the algorithms via rigorous simulations with respect to several influencing factors. Our findings conclusively demonstrate that the proposed architecture minimizes the average waiting time and at the same time carries out load balancing across servers.
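The OBTC policy is described above only by name; the sketch below shows the generic "best of two random choices" dispatch rule it presumably builds on, with queue length as the load signal. The class and parameter names are hypothetical, not taken from the paper.

```python
import random

class StorageDevice:
    """Toy stand-in for an autonomous object-based storage device (AOSD)."""
    def __init__(self, name):
        self.name = name
        self.queue = []          # pending requests

    def enqueue(self, request):
        self.queue.append(request)

def dispatch_best_of_two(devices, request):
    """Pick two replicas at random and send the request to the one with
    the shorter queue -- the classic 'power of two choices' rule."""
    a, b = random.sample(devices, 2)
    target = a if len(a.queue) <= len(b.queue) else b
    target.enqueue(request)
    return target

replicas = [StorageDevice(f"aosd{i}") for i in range(4)]
for req in range(20):
    dispatch_best_of_two(replicas, req)
print({d.name: len(d.queue) for d in replicas})
```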
we present an internal language with equivalent expressive power to standard ml and discuss its formalization in lf and the machine checked verification of its type safety in twelf the internal language is intended to serve as the target of elaboration in an elaborative semantics for standard ml in the style of harper and stone therefore it includes all the programming mechanisms necessary to implement standard ml including translucent modules abstraction polymorphism higher kinds references exceptions recursive types and recursive functions our successful formalization of the proof involved careful interplay between the precise formulations of the various mechanisms and required the invention of new representation and proof techniques of general interest
desktop client applications interact with both local and remote resources this is both benefit in terms of the rich features desktop clients can provide but also security risk due to their high connectivity desktop clients can leave user’s machine vulnerable to viruses malicious plug ins and scripts aspect oriented software development can be used to address security concerns in software in modular fashion however most existing research focuses on the protection of server side resources in this paper we introduce an aspect oriented mechanism authority aspects to enforce the principle of least privilege on desktop clients this helps to ensure that legitimate resource access is allowed and illegitimate access is blocked we present case study applying our approach on two desktop applications an rss feed aggregator and web browser
In this paper we present a multi-stream approach for off-line handwritten word recognition. The proposed approach combines low-level feature streams, namely density-based features extracted from different sliding windows with different widths, and contour-based features extracted from upper and lower contours. The multi-stream paradigm provides an interesting framework for the integration of multiple sources of information and is compared to the standard combination strategies, namely fusion of representations and fusion of decisions. We investigate the extension of the stream approach to additional streams and analyze the improvement in recognition performance; the computational cost of this extension is discussed. Significant experiments have been carried out on two publicly available word databases: the IFN/ENIT benchmark database (Arabic script) and the IRONOFF database (Latin script). The multi-stream framework improves the recognition performance in both cases. The best recognition performance is obtained in the case of the Arabic script, on a word lexicon consisting of Tunisian town and village names; in the case of the Latin script, the proposed approach achieves a high recognition rate on a word lexicon.
This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related to the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to the existing density-based clustering algorithms, our algorithm has the ability to discover clusters according to non-spatial, spatial, and temporal values of the objects. In this paper we also present a spatial-temporal data warehouse system designed for storing and clustering a wide range of spatial-temporal data. We show an implementation of our algorithm using this data warehouse and present the data mining results.
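As a sketch of how the three kinds of values can enter the density test, the code below computes a neighbourhood that requires closeness in space, in time, and in a non-spatial attribute, each with its own threshold. This is only the neighbourhood predicate of a DBSCAN-style algorithm; the record layout and thresholds are assumptions, not the paper's definitions.

```python
from math import hypot

def st_neighbours(points, i, eps_spatial, eps_temporal, eps_value):
    """Indices of points close to point i in space, in time, and in the
    non-spatial attribute -- the neighbourhood used to decide whether
    point i is a core object."""
    xi, yi, ti, vi = points[i]
    result = []
    for j, (x, y, t, v) in enumerate(points):
        if j == i:
            continue
        if (hypot(x - xi, y - yi) <= eps_spatial
                and abs(t - ti) <= eps_temporal
                and abs(v - vi) <= eps_value):
            result.append(j)
    return result

# (x, y, timestamp, measured value) records, e.g. sensor readings.
data = [(0, 0, 0, 10.0), (1, 0, 1, 10.5), (0, 1, 2, 10.2), (10, 10, 0, 30.0)]
print(st_neighbours(data, 0, eps_spatial=2.0, eps_temporal=3, eps_value=1.0))
```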
the semantic web augments the current www by giving information well defined meaning better enabling computers and people to work in cooperation this is done by adding machine understandable content to web resources such added content is called metadata whose semantics is provided by referring to an ontology domain’s conceptualization agreed upon by community the semantic web relies on the complex interaction of several technologies involving ontologies therefore sophisticated semantic web applications typically comprise more than one software module instead of coming up with proprietary solutions developers should be able to rely on generic infrastructure for application development in this context we call such an infrastructure application server for the semantic web whose design and development are based on existing application servers however we apply and augment their underlying concepts for use in the semantic web and integrate semantic technology within the server itself the article discusses requirements and design issues of such server presents our implementation kaon server and demonstrates its usefulness by detailed scenario
this short paper argues that multi relational data mining has key role to play in the growth of kdd and briefly surveys some of the main drivers research problems and opportunities in this emerging field
for almost decade we have been working at developing and using template based models for coarse grained parallel computing our initial system frameworks was positively received but had number of shortcomings the enterprise parallel programming environment evolved out of this work and now after several years of experience with the system its shortcomings are becoming evident this paper outlines our experiences in developing and using the two parallel programming systems many of our observations are relevant to other parallel programming systems even though they may be based on different assumptions although template base models have the potential for simplifying the complexities of parallel programming they have yet to realize these expectations for high performance applications
jockey is an execution record replay tool for debugging linux programs it records invocations of system calls and cpu instructions with timing dependent effects and later replays them deterministically it supports process checkpointing to diagnose long running programs efficiently jockey is implemented as shared object file that runs as part of the target process while this design is the key for achieving jockey’s goal of safety and ease of use it also poses challenges this paper discusses some of the practical issues we needed to overcome in such environments including low overhead system call interception techniques for segregating resource usage between jockey and the target process and an interface for fine grain control of jockey’s behavior
this paper presents the results of an empirical study on the subjective evaluation of code smells that identify poorly evolvable structures in software we propose use of the term software evolvability to describe the ease of further developing piece of software and outline the research area based on four different viewpoints furthermore we describe the differences between human evaluations and automatic program analysis based on software evolvability metrics the empirical component is based on case study in finnish software product company in which we studied two topics first we looked at the effect of the evaluator when subjectively evaluating the existence of smells in code modules we found that the use of smells for code evaluation purposes can be difficult due to conflicting perceptions of different evaluators however the demographics of the evaluators partly explain the variation second we applied selected source code metrics for identifying four smells and compared these results to the subjective evaluations the metrics based on automatic program analysis and the human based smell evaluations did not fully correlate based upon our results we suggest that organizations should make decisions regarding software evolvability improvement based on combination of subjective evaluations and code metrics due to the limitations of the study we also recognize the need for conducting more refined studies and experiments in the area of software evolvability
We define and study bisimulation for proving contextual equivalence in an aspect extension of the untyped lambda calculus. To our knowledge this is the first study of coinductive reasoning principles aimed at proving equality of aspect programs. The language we study is very small, yet powerful enough to encode mutable references and a range of temporal pointcuts, including cflow and regular event patterns. Examples suggest that our bisimulation principle is useful: for an encoding of higher-order programs with state, our methods suffice to establish well-known and well-studied subtle examples involving higher-order functions with state, even in the presence of first-class dynamic advice and expressive pointcuts. Our reasoning principles show that aspect-aware interfaces can aid in ensuring that clients of a component are unaffected by changes to an implementation. Our paper generalizes existing results given for open modules to also include a variety of history-sensitive pointcuts, such as cflow and regular event patterns. Our formal techniques and results suggest that aspects are amenable to the formal techniques developed for stateful higher-order programs.
we explore the ways in which interfaces can be designed to deceive users so as to create the illusion of magic we present study of an experimental performance in which magician used computer vision system to conduct series of illusions based on the well known three cups magic trick we explain our findings in terms of the two broad strategies of misdirecting attention and setting false expectations articulating specific tactics that were employed in each case we draw on existing theories of collaborative and spectator interfaces ambiguity and interpretation and trajectories through experiences to explain our findings in broader hci terms we also extend and integrate current theory to provide refined sensitising concepts for analysing deceptive interactions
Animators today have started using motion captured (mocap) sequences to drive characters. Mocap allows rapid acquisition of highly realistic animation data; consequently, animators have at their disposal an enormous amount of mocap sequences, which, ironically, has created a new retrieval problem. Thus, while working with mocap databases, an animator often needs to work with a subset of useful clips. Once the animator selects a candidate working set of motion clips, she then needs to identify appropriate transition points amongst these clips for maximal reuse. In this paper we describe methods for querying mocap databases and identifying transitions for a given set of clips. We preprocess clips and clip subsequences and precompute frame locations to allow interactive stitching. In contrast with existing methods that view each individual clip as a node, for optimal reuse we reduce the granularity.
there has been significant increase in the number of cctv cameras in public and private places worldwide the cost of monitoring these cameras manually and of reviewing recorded video is prohibitive and therefore manual systems tend to be used mainly reactively with only small fraction of the cameras being monitored at any given time there is need to automate at least simple observation tasks through computer vision functionality that has become known popularly as video analytics the large size of cctv systems and the requirement of high detection rates and low false alarms are major challenges this paper illustrates some of the recent efforts reported in the literature highlighting advances and pointing out important limitations
articulated hand tracking systems have been widely used in virtual reality but are rarely deployed in consumer applications due to their price and complexity in this paper we propose an easy to use and inexpensive system that facilitates articulated user input using the hands our approach uses single camera to track hand wearing an ordinary cloth glove that is imprinted with custom pattern the pattern is designed to simplify the pose estimation problem allowing us to employ nearest neighbor approach to track hands at interactive rates we describe several proof of concept applications enabled by our system that we hope will provide foundation for new interactions in modeling animation control and augmented reality
pervasive devices are becoming popular and smaller those mobile systems should be able to adapt to changing requirements and execution environments but it requires the ability to reconfigure deployed codes which is considerably simplified if applications are component oriented rather than monolithic blocks of codes so we propose middleware called wcomp which federates an event driven component oriented approach to compose distributed services for devices this approach is coupled with adaptation mechanisms dealing with separation of concerns in such mechanisms aspects called aspects of assembly are selected either by the user or by self adaptive process and composed by weaver with logical merging of high level specifications the result of the weaver is then projected in terms of pure elementary modifications of components assemblies with respect to blackbox properties of cots components our approach is validated by analyzing the results of different experiments drawn from sets of application configurations randomly generated and by showing its advantages while evaluating the additional costs on the reaction time to context changing
We introduce a method for segmenting the materials in volumetric models of mechanical parts created by X-ray CT scanning, for the purpose of generating their boundary surfaces. When the volumetric model is composed of two materials, one for the object and the other for the background (air), these boundary surfaces can be extracted as isosurfaces using a surface contouring method. For a volumetric model composed of more than two materials, we need to classify the voxel types into segments by material and then use a surface contouring method that can deal with both CT values and material types. Here we propose a method for precisely classifying the volumetric model into its component materials using a modified and combined method of two well-known algorithms in image segmentation: region growing and graph cut. We then apply our non-manifold iso-contouring method to generate triangulated mesh surfaces. In addition, we demonstrate the effectiveness of our method by constructing high-quality triangular mesh models of the segmented parts.
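To make the region-growing half of the combination concrete, here is a minimal seeded region-growing pass on a 2-D slice, assuming 4-connectivity and a fixed tolerance on CT values; the modified combination with graph cut described above is not reproduced.

```python
from collections import deque
import numpy as np

def region_grow(volume_slice, seed, tol):
    """Grow a region from `seed`, absorbing 4-connected voxels whose CT
    value differs from the seed value by at most `tol`."""
    h, w = volume_slice.shape
    seed_value = volume_slice[seed]
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(int(volume_slice[ny, nx]) - int(seed_value)) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

slice_ = np.array([[100, 102,  10],
                   [101,  99,  12],
                   [ 11,  10,  11]])
print(region_grow(slice_, seed=(0, 0), tol=5).astype(int))
```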
users often want to find entities instead of just documents ie finding documents entirely about specific real world entities rather than general documents where the entities are merely mentioned searching for entities on web scale repositories is still an open challenge as the effectiveness of ranking is usually not satisfactory semantics can be used in this context to improve the results leveraging on entity driven ontologies in this paper we propose three categories of algorithms for query adaptation using semantic information nlp techniques and link structure to rank entities in wikipedia our approaches focus on constructing queries using not only keywords but also additional syntactic information while semantically relaxing the query relying on highly accurate ontology the results show that our approaches perform effectively and that the combination of simple nlp link analysis and semantic techniques improves the retrieval performance of entity search
this paper presents tangible interaction techniques for fine tuning one to one scale nurbs curves on large display for automotive design we developed new graspable handle with transparent groove that allows designers to manipulate virtual curves on display screen directly the use of the proposed handle leads naturally to rich vocabulary of terms describing interaction techniques that reflect existing shape styling methods user test raised various issues related to the graspable user interface two handed input and large display interaction
as fundamental problem in distributed hash table dht based systems load balancing is important to avoid performance degradation and guarantee system fairness among existing migration based load balancing strategies there are two main categories rendezvous directory strategy rds and independent searching strategy iss however none of them can achieve resilience and efficiency at the same time in this paper we propose group multicast strategy gms for load balancing in dht systems which attempts to achieve the benefits of both rds and iss gms does not rely on few static rendezvous directories to perform load balancing instead load information is disseminated within the formed groups via multicast protocol thus each peer has enough information to act as the rendezvous directory and perform load balancing within its group besides intra group load balancing inter group load balancing and emergent load balancing are also supported by gms in gms the position of the rendezvous directory is randomized in each round which further improves system resilience in order to have better understanding of gms we also perform analytical studies on gms in terms of its scalability and efficiency under churn finally the effectiveness of gms is evaluated by extensive simulation under different workload and churn levels
the family of trees is suitable for indexing various kinds of multidimensional objects tpr trees are tree based structures that have been proposed for indexing moving object database eg data base of moving boats region quadtrees are suitable for indexing dimensional regional data and their linear variant linear region quadtrees is used in many geographical information systems gis for this purpose eg for the representation of stormy or sunny regions although both are tree structures the organization of data space the types of spatial data stored and the search algorithms applied on them are different in trees and region quadtrees in this paper we examine spatio temporal problem that appears in many practical applications processing of predictive joins between moving objects and regions eg discovering the boats that will enter storm using these two families of data structures as storage and indexing mechanisms and taking into account their similarities and differences with thorough experimental study we show that the use of synchronous depth first traversal order has the best performance balance on average taking into account the activity and response time as performance measurements
the development of grid and workflow technologies has enabled complex loosely coupled scientific applications to be executed on distributed resources many of these applications consist of large numbers of short duration tasks whose runtimes are heavily influenced by delays in the execution environment such applications often perform poorly on the grid because of the large scheduling overheads commonly found in grids in this paper we present provisioning system based on multi level scheduling that improves workflow runtime by reducing scheduling overheads the system reserves resources for the exclusive use of the application and gives applications control over scheduling policies we describe our experiences with the system when running suite of real workflow based applications including in astronomy earthquake science and genomics provisioning resources with corral ahead of the workflow execution reduced the runtime of the astronomy application by up to on average and of genome mapping application by an order of magnitude when compared to traditional methods we also show how provisioning can benefit applications both on small local cluster as well as large scale campus resource
quality aspects become increasingly important when business process modeling is used in large scale enterprise setting in order to facilitate storage without redundancy and an efficient retrieval of relevant process models in model databases it is required to develop theoretical understanding of how degree of behavioral similarity can be defined in this paper we address this challenge in novel way we use causal footprints as an abstract representation of the behavior captured by process model since they allow us to compare models defined in both formal modeling languages like petri nets and informal ones like epcs based on the causal footprint derived from two models we calculate their similarity based on the established vector space model from information retrieval we validate this concept with an experiment using the sap reference model and an implementation in the prom framework
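Once two causal footprints have been encoded as weighted feature vectors, the comparison itself is the standard vector space model. The sketch below shows that final step as cosine similarity over sparse dictionaries; the footprint encoding is out of scope here, and the feature names are invented for illustration.

```python
from math import sqrt

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse feature vectors given as
    {feature: weight} dictionaries (the vector space model)."""
    dot = sum(w * vec_b.get(f, 0.0) for f, w in vec_a.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical footprint features: ordering links between tasks.
model_a = {("check order", "ship goods"): 1.0, ("ship goods", "send invoice"): 0.5}
model_b = {("check order", "ship goods"): 1.0, ("ship goods", "archive"): 0.5}
print(round(cosine_similarity(model_a, model_b), 3))
```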
in this article we propose deniable electronic voting authentication protocol for mobile ad hoc networks which meets the essential requirements of secure voting system due to the characteristics constraints and security requirements of mobile ad hoc networks our protocol does not require the aid of any centralized administration and mobile nodes could cooperate with each other to securely facilitate voting finally the proposed protocol provides the ability to deniable authentication when legal voter casts vote with this voting system he she can deny that he she has voted for third party and the third party cannot judge who votes
this paper presents two new hardware assisted rendering techniques developed for interactive visualization of the terascale data generated from numerical modeling of next generation accelerator designs the first technique based on hybrid rendering approach makes possible interactive exploration of large scale particle data from particle beam dynamics modeling the second technique based on compact texture enhanced representation exploits the advanced features of commodity graphics cards to achieve perceptually effective visualization of the very dense and complex electromagnetic fields produced from the modeling of reflection and transmission properties of open structures in an accelerator design because of the collaborative nature of the overall accelerator modeling project the visualization technology developed is for both desktop and remote visualization settings we have tested the techniques using both time varying particle data sets containing up to one billion particles per time step and electromagnetic field data sets with millions of mesh elements
dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out of order execution we propose novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic the proposed method groups several instructions as single issue unit and reduces the required number of ports and the size of the structure for dispatch wakeup select and issue the present paper describes the microarchitecture mechanisms and shows evaluation results for energy savings and performance these results reveal that the proposed technique can greatly reduce energy with almost no performance degradation compared to the conventional dynamic instruction scheduling logic
an important feature of information systems is the ability to inform users about changes of the stored information therefore systems have to know what changes user wants to be informed about this is well known from the field of publish subscribe architectures in this paper we propose solution for information system designers of how to extend their information model in way that the notification mechanism can consider semantic knowledge when determining which parties to inform two different kinds of implementations are introduced and evaluated one based on aspect oriented programming aop the other one based on traditional database triggers the evaluation of both approaches leads to combined approach preserving the advantages of both techniques using model driven architecture mda to create the triggers from uml model enhanced with stereotypes
the problem of object recognition has been considered here color descriptions from distinct regions covering multiple segments are considered for object representation distinct multicolored regions are detected using edge maps and clustering performance of the proposed methodologies has been evaluated on three data sets and the results are found to be better than existing methods when small number of training views is considered
Dynamic class loading during program execution in the Java programming language is an impediment to generating code that is as efficient as code generated using static whole-program analysis and optimization. Whole-program analysis and optimization is possible for languages that do not allow new classes and/or methods to be loaded during program execution. One solution for performing whole-program analysis and avoiding incorrect execution after a new class is loaded is to invalidate and recompile affected methods. Runtime invalidation and recompilation mechanisms can be expensive in both space and time and therefore generally restrict optimization. To address these drawbacks, we propose a new framework, called the extant analysis framework, for interprocedural optimization of programs that support dynamic class or method loading. Given a set of classes comprising the closed world, we perform an offline static analysis which partitions references into two categories: unconditionally extant references, which point only to objects whose runtime type is guaranteed to be in the closed world, and conditionally extant references, which point to objects whose runtime type is not guaranteed to be in the closed world. Optimizations solely dependent on the first category can be statically performed and are guaranteed to be correct even with any future class or method loading. Optimizations dependent on the second category are guarded by dynamic tests, called extant safety tests, for correct execution behavior. We describe the properties of extant safety tests and provide algorithms for their generation and placement.
this paper proposes learning approach for the merging process in multilingual information retrieval mlir to conduct the learning approach we also present large number of features that may influence the mlir merging process these features are mainly extracted from three levels query document and translation after the feature extraction we then use the frank ranking algorithm to construct merge model to our knowledge this practice is the first attempt to use learning based ranking algorithm to construct merge model for mlir merging in our experiments three test collections for the task of crosslingual information retrieval clir in ntcir and are employed to assess the performance of our proposed method moreover several merging methods are also carried out for comparison including traditional merging methods the step merging strategy and the merging method based on logistic regression the experimental results show that our method can significantly improve merging quality on two different types of datasets in addition to the effectiveness through the merge model generated by frank our method can further identify key factors that influence the merging process this information might provide us more insight and understanding into mlir merging
in today’s multitiered application architectures clients do not access data stored in the databases directly instead they use applications which in turn invoke the dbms to generate the relevant content since executing application programs may require significant time and other resources it is more advantageous to cache application results in result cache various view materialization and update management techniques have been proposed to deal with updates to the underlying data these techniques guarantee that the cached results are always consistent with the underlying data several applications including commerce sites on the other hand do not require the caches be consistent all the time instead they require that all outdated pages in the caches are invalidated in timely fashion in this paper we show that invalidation is inherently different from view maintenance we develop algorithms that benefit from this difference in reducing the cost of update management in certain applications and we present an invalidation framework that benefits from these algorithms
in this paper we propose novel framework for the web based retrieval of chinese calligraphic manuscript images which includes two main components shape similarity ss based method which is to effectively support retrieval over large chinese calligraphic manuscript databases in this retrieval method shapes of calligraphic characters are represented by their approximate contour points extracted from the character images to speed up the retrieval efficiency we then propose composite distance tree cd tree based high dimensional indexing scheme for it comprehensive experiments are conducted to testify the effectiveness and efficiency of our proposed retrieval and indexing methods respectively
we propose in this paper an algorithm for off line selection of the contents of on chip memories the algorithm supports two types of on chip memories namely locked caches and scratchpad memories the contents of on chip memory although selected off line is changed at run time for the sake of scalability with respect to task size experimental results show that the algorithm yields to good ratios of on chip memory accesses on the worst case execution path with tolerable reload overhead for both types of on chip memories furthermore we highlight the circumstances under which one type of on chip memory is more appropriate than the other depending of architectural parameters cache block size and application characteristics basic block size
Most existing dynamic voltage scaling (DVS) schemes for multiple tasks assume an energy cost function (energy consumption versus execution time) that is independent of the task characteristics. In practice, the actual energy cost functions vary significantly from task to task: different tasks running on the same hardware platform can exhibit different memory and peripheral access patterns, cache miss rates, etc. These effects result in a distinct energy cost function for each task. We present a new formulation of and solution to the problem of minimizing the total dynamic and static system energy while executing a set of tasks under DVS. First, we demonstrate and quantify the dependence of the energy cost function on task characteristics by direct measurements on a real hardware platform, the TI OMAP processor, using real application programs. Next, we present simple analytical solutions to the problem of determining energy-optimal voltage scale factors for each task, while allowing each task to be preempted and to have its own energy cost function. Based on these solutions, we present simple and efficient algorithms for implementing DVS with multiple tasks. We consider two cases: all tasks have a single deadline, and each task has its own deadline. Experiments on a real hardware platform using real applications demonstrate additional savings in total system energy compared to previous leakage-aware DVS schemes.
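As an illustration of what a closed-form per-task solution can look like, the sketch below minimizes total energy for tasks sharing one deadline under an assumed convex model (time w/s, energy a·w·s², i.e., a cubic power law), which yields speeds proportional to a^(-1/3). This is a generic textbook-style model, not the measured cost functions or the exact formulation of the work above.

```python
def optimal_speeds(tasks, deadline):
    """Energy-optimal speed per task under the assumed model:
    time_i = w_i / s_i, energy_i = a_i * w_i * s_i**2.
    Minimizing total energy subject to sum(time_i) <= deadline gives
    s_i proportional to a_i**(-1/3), rescaled to meet the deadline exactly."""
    scale = sum(w * a ** (1.0 / 3.0) for w, a in tasks) / deadline
    return [scale * a ** (-1.0 / 3.0) for _, a in tasks]

# (work, energy coefficient) pairs for three tasks sharing one deadline.
tasks = [(2.0, 1.0), (1.0, 4.0), (3.0, 0.5)]
speeds = optimal_speeds(tasks, deadline=10.0)
total_time = sum(w / s for (w, _), s in zip(tasks, speeds))
total_energy = sum(a * w * s ** 2 for (w, a), s in zip(tasks, speeds))
print([round(s, 3) for s in speeds], round(total_time, 3), round(total_energy, 3))
```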
the order and arrangement of dimensions variates is crucial for the effectiveness of large number of visualization techniques such as parallel coordinates scatterplots recursive pattern and many others in this paper we describe systematic approach to arrange the dimensions according to their similarity the basic idea is to rearrange the data dimensions such that dimensions showing similar behavior are positioned next to each other for the similarity clustering of dimensions we need to define similarity measures which determine the partial or global similarity of dimensions we then consider the problem of finding an optimal one or two dimensional arrangement of the dimensions based on their similarity theoretical considerations show that both the one and the two dimensional arrangement problem are surprisingly hard problems ie they are np complete our solution of the problem is therefore based on heuristic algorithms an empirical evaluation using number of different visualization techniques shows the high impact of our similarity clustering of dimensions on the visualization results
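A small sketch of the overall idea described above: measure pairwise similarity between dimensions (absolute correlation here, one of many possible measures) and then apply a greedy heuristic for the one-dimensional arrangement, since the exact arrangement problem is NP-complete. Both the measure and the heuristic are illustrative assumptions, not the paper's specific choices.

```python
import numpy as np

def dimension_similarity(data):
    """Pairwise similarity between dimensions (columns): absolute Pearson
    correlation, so similarly behaving dimensions score near 1."""
    return np.abs(np.corrcoef(data, rowvar=False))

def greedy_arrangement(sim):
    """Heuristic 1-D arrangement: start with the most similar pair and
    repeatedly append the unplaced dimension most similar to the last one."""
    n = sim.shape[0]
    masked = sim - np.eye(n)                 # ignore self-similarity
    order = list(divmod(int(masked.argmax()), n))
    remaining = set(range(n)) - set(order)
    while remaining:
        nxt = max(remaining, key=lambda d: sim[order[-1], d])
        order.append(nxt)
        remaining.remove(nxt)
    return order

data = np.random.default_rng(1).normal(size=(200, 6))
data[:, 3] = data[:, 0] + 0.1 * data[:, 5]   # make two dimensions behave alike
print(greedy_arrangement(dimension_similarity(data)))
```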
Nowadays, enormous amounts of data are continuously generated, not only at massive scale but also from different, sometimes conflicting, views. Therefore, it is important to consolidate different concepts for intelligent decision making. For example, to predict the research areas of some people, the best results are usually achieved by combining and consolidating predictions obtained from the publication network, the co-authorship network, and the textual content of their publications. Multiple supervised and unsupervised hypotheses can be drawn from these information sources, and negotiating their differences and consolidating decisions usually yields a much more accurate model, due to the diversity and heterogeneity of these models. In this paper we address the problem of consensus learning among competing hypotheses, which rely either on outside knowledge (supervised learning) or on internal structure (unsupervised clustering). We argue that consensus learning is an NP-hard problem and thus propose to solve it with an efficient heuristic method. We construct a belief graph to first propagate predictions from the supervised models to the unsupervised ones, and then negotiate and reach consensus among them. The final decision is further consolidated by calculating each model's weight based on its degree of consistency with the other models. Experiments are conducted on the newsgroups data, Cora research papers, a DBLP author-conference network, and Yahoo Movies datasets, and the results show that the proposed method improves the classification accuracy and the clustering quality measure (NMI) over the best base model. Furthermore, it runs in time proportional to the number of instances, which is very efficient for large-scale data sets.
business process modeling is expensive and time consuming it largely depends on the elicitation method and the person in charge the model needs to be shared in order to promote multiple perspectives this paper describes group storytelling approach as an alternative to the traditional individual interviews to elicitate processes information gathering is proposed to be done through capturing the stories told by process performers who describe their work difficulties and suggestions process to abstract and transform stories into business process representations is also part of the method tool to support storytelling and this transformation is described as well
In this paper a new algorithm is proposed to improve the efficiency and robustness of random sample consensus (RANSAC) without prior information about the error scale. Three techniques are developed in an iterative hypothesis-and-evaluation framework. Firstly, we propose a consensus sampling technique to increase the probability of sampling inliers by exploiting the feedback information obtained from the evaluation procedure. Secondly, the preemptive multiple K-th order approximation (PMKA) is developed for efficient model evaluation with an unknown error scale. Furthermore, we propose a coarse-to-fine strategy for robust standard deviation estimation to determine the unknown error scale. Experimental results of fundamental matrix computation on both simulated and real data are shown to demonstrate the superiority of the proposed algorithm over the previous methods.
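For context, the hypothesise-and-evaluate loop being improved is plain RANSAC; the sketch below shows it for 2-D line fitting with a fixed inlier tolerance. The consensus sampling, PMKA evaluation, and scale estimation proposed above are not included, and the tolerance and iteration count are arbitrary.

```python
import random

def ransac_line(points, iterations=200, inlier_tol=1.0):
    """Plain RANSAC for a 2-D line: hypothesise from a minimal sample of
    two points, then evaluate by counting points within `inlier_tol`."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = random.sample(points, 2)
        # Line through the sample as ax + by + c = 0.
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) / norm <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers

pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 40), (7, -3), (12, 60)]
model, inliers = ransac_line(pts, iterations=100)
print(len(inliers), "inliers out of", len(pts))
```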
cooperative checkpointing increases the performance and robustness of system by allowing checkpoints requested by applications to be dynamically skipped at runtime robust system must be more than merely resilient to failures it must be adaptable and flexible in the face of new and evolving challenges simulation based experimental analysis using both probabilistic and harvested failure distributions reveals that cooperative checkpointing enables an application to make progress under wide variety of failure distributions that periodic checkpointing lacks the flexibility to handle cooperative checkpointing can be easily implemented on top of existing application initiated checkpointing mechanisms and may be used to enhance other reliability techniques like qos guarantees and fault aware job scheduling the simulations also support number of theoretical predictions related to cooperative checkpointing including the non competitiveness of periodic checkpointing
mallet multi agent logic language for encoding teamwork is intended to enable expression of teamwork emulating human teamwork allowing experimentation with different levels and forms of inferred team intelligence consequence of this goal is that the actual teamwork behavior is determined by the level of intelligence built into the underlying system as well as the semantics of the language in this paper we give the design objectives the syntax and an operational semantics for mallet in terms of transition system we show how the semantics can be used to reason about the behaviors of team based agents the semantics can also be used to guide the implementation of various mallet interpreters emulating different forms of team intelligence as well as formally study the properties of team based agents specified in mallet we have explored various forms of proactive information exchange behavior embodied in human teamwork using the cast system which implements built in mallet interpreter
superimposition is composition technique that has been applied successfully in several areas of software development in order to unify several languages and tools that rely on superimposition we present an underlying language independent model that is based on feature structure trees fsts furthermore we offer tool called fstcomposer that composes software components represented by fsts currently the tool supports the composition of components written in java jak xml and plain text three nontrivial case studies demonstrate the practicality of our approach
beaconless georouting algorithms are fully reactive and work without prior knowledge of their neighbors however existing approaches can either not guarantee delivery or they require the exchange of complete neighborhood information we describe two general methods for completely reactive face routing with guaranteed delivery the beaconless forwarder planarization bfp scheme determines correct edges of local planar subgraph without hearing from all neighbors face routing then continues properly angular relaying determines directly the next hop of face traversal both schemes are based on the select and protest principle neighbors respond according to delay function but only if they do not violate planar subgraph condition protest messages are used to remove falsely selected neighbors that are not in the planar subgraph we show that correct beaconless planar subgraph construction is not possible without protests we also show the impact of the chosen planar subgraph on the message complexity with the new circlunar neighborhood graph cng we can bound the worst case message complexity of bfp which is not possible when using the gabriel graph gg for planarization simulation results show similar message complexities in the average case when using cng and gg angular relaying uses delay function that is based on the angular distance to the previous hop we develop theoretical framework for delay functions and show both theoretically and in simulations that with function of angle and distance we can reduce the number of protests by factor of compared to simple angle based delay function
broadcasting is commonly used communication primitive needed by many applications and protocols in mobile ad hoc networks manet unfortunately most broadcast solutions are tailored to one class of manets with respect to node density and node mobility and are unlikely to operate well in other classes in this paper we introduce hypergossiping novel adaptive broadcast algorithm that combines two strategies hypergossiping uses adaptive gossiping to efficiently distribute messages within single network partitions and implements an efficient heuristic to distribute them across partitions simulation results in ns show that hypergossiping operates well for broad range of manets with respect to node densities mobility levels and network loads
in this paper we solve the classic problem of computing the average number of decomposed quadtree blocks quadrants nodes or pieces in quadtree decomposition for an arbitrary hyperrectangle aligned with the axes we derive closed form formula for general cases the previously known state of the art solution provided closed form solution for special cases and utilized these formulas to derive linearly interpolated formulas for general cases individually however there is no exact and uniform closed form formula that fits all cases contrary to the top down counting approach used by most prior solutions we employ bottom up enumeration approach to transform the problem into one that involves the cartesian product of multisets of successive powers classic combinatorial enumeration techniques are applied to obtain an exact and uniform closed form formula the result is of theoretical interest since it is the first exact closed form formula for arbitrary cases practically it is nice to have uniform formula for estimating the average number since simple program can be conveniently constructed taking side lengths as inputs
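To make the quantity being averaged concrete, the following brute-force routine counts the blocks in the quadtree decomposition of one axis-aligned rectangle inside a 2^k x 2^k space; the paper's contribution is a closed-form formula for the average of this count over placements, which is not reproduced here.

```python
def quadtree_blocks(rect, x0, y0, size):
    """Number of blocks in the quadtree decomposition of rect
    (given as (rx0, ry0, rx1, ry1), half-open, integer-aligned)
    inside the cell [x0, x0+size) x [y0, y0+size); size is a power of two."""
    rx0, ry0, rx1, ry1 = rect
    if rx1 <= x0 or ry1 <= y0 or rx0 >= x0 + size or ry0 >= y0 + size:
        return 0                                # disjoint from this cell
    if rx0 <= x0 and ry0 <= y0 and rx1 >= x0 + size and ry1 >= y0 + size:
        return 1                                # cell fully covered: one block
    h = size // 2                               # otherwise recurse into the 4 quadrants
    return sum(quadtree_blocks(rect, x0 + dx, y0 + dy, h)
               for dx in (0, h) for dy in (0, h))

# e.g. a 3x5 rectangle at position (1, 2) inside a 16x16 space
print(quadtree_blocks((1, 2, 4, 7), 0, 0, 16))
```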
the problem of performing tasks on asynchronous or undependable processors is basic problem in parallel and distributed computing we consider an abstraction of this problem called the write all problem using processors write into all locations of an array of size the most efficient known deterministic asynchronous algorithms for this problem are due to anderson and woll the first class of algorithms has work complexity of ogr egr for nle ty and any egr and they are the best known for the full range of processors to schedule the work of the processors the algorithms use sets of permutations on nle that have certain combinatorial properties instantiating such an algorithm for specific egr either requires substantial pre processing exponential in egr to find the requisite permutations or imposes prohibitive constant exponential in egr hidden by the asymptotic analysis the second class deals with the specific case of nu nle and these algorithms have work complexity of ogr log they also use sets of permutations with the same combinatorial properties however instantiating these algorithms requires exponential in preprocessing to find the permutations to alleviate this costly instantiation kanellakis and shvartsman proposed simple way of computing the permutation schedules they conjectured that their construction has the desired properties but they provided no analysis in this paper we show for the first time an analysis of the properties of the set of permutations proposed by kanellakis and shvartsman our result is hybrid as it includes analytical and empirical parts the analytical result covers subset of the possible adversarial patterns of asynchrony the empirical results provide strong evidence that our analysis covers the worst case scenario and we formally state it as conjecture we use these results to analyze an algorithm for nu gne tasks that takes advantage of processor slackness and that has work ogr log conditioned on our conjecture this algorithm requires only ogr log time to instantiate it next we study the case for the full range of processors nle we define family of deterministic asynchronous write all algorithms with work ogr egr contingent upon our conjecture we show that our method yields faster construction of ogr egr write all algorithms than the method developed by anderson and woll finally we show that our approach yields more efficient write all algorithms as compared to the algorithms induced by the constructions of naor and roth for the same asymptotic work complexity
microprocessor vendors have provided special purpose instructions such as psadbw and pdist to accelerate the sum of absolute differences sad similarity measurement the usefulness of these special purpose instructions is limited except for the motion estimation kernel this has several drawbacks first if the sad becomes obsolete because different similarity metric is going to be employed then those special purpose instructions are no longer useful second these special instructions process bit subwords only this precision is not sufficient for some kernels such as motion estimation in the transform domain in addition when employing other way parallel simd instructions to implement the sad and sum of squared differences ssd the obtained speedup is much less than this is because there is mismatch between the storage and the computational format in this paper we design and evaluate variety of simd instructions for different data types we synthesize special purpose instructions using few general purpose simd instructions in addition we employ the extended subwords technique to avoid conversion overhead and to increase parallelism in this technique there are four extra bits for every byte of register the results show that using different simd instructions and extended subwords achieve speedup ranging from to over performance for sad ssd with interpolation and ssd functions in the motion estimation kernel while mmx achieves speedup ranging from to additionally the proposed simd instructions improve the performance of similarity measurement for image histograms by factor ranging from way to way over c while for mmx speedup is between way and way
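For readers unfamiliar with the kernels, the snippet below gives the plain scalar/NumPy semantics of SAD and SSD that the proposed SIMD instructions accelerate; it says nothing about the instruction-set design itself, and the block sizes are illustrative.

```python
# Reference semantics of the SAD and SSD similarity measures.
import numpy as np

def sad(block_a, block_b):
    # sum of absolute differences; cast up to avoid 8-bit overflow
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def ssd(block_a, block_b):
    d = block_a.astype(np.int32) - block_b.astype(np.int32)
    return int((d * d).sum())                   # sum of squared differences

a = np.random.default_rng(0).integers(0, 256, (16, 16), dtype=np.uint8)
b = np.random.default_rng(1).integers(0, 256, (16, 16), dtype=np.uint8)
print(sad(a, b), ssd(a, b))
```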
component based software development cbsd is an attractive way to deliver generic executable pieces of program ready to be reused in many different contexts component reuse is based on black box model that frees component consumers from diving into implementation details adapting generic component to particular context of use is then based on parameterized interface that becomes specific component wrapper at runtime this shallow adaptation which keeps the component implementation unchanged is major source of inefficiency by building on top of well known specialization techniques it is possible to take advantage of the genericity of components and adapt their implementation to their usage context without breaking the black box model we illustrate these ideas on simple component model considering dual specialization techniques partial evaluation and slicing key to not breaking encapsulation is to use specialization scenarios extended with assumptions on the required services and to package components as component generators
web spam is behavior that attempts to deceive search engine ranking algorithms trustrank is recent algorithm that can combat web spam however trustrank is vulnerable in the sense that the seed set used by trustrank may not be sufficiently representative to cover well the different topics on the web also for given seed set trustrank has bias towards larger communities we propose the use of topical information to partition the seed set and calculate trust scores for each topic separately to address the above issues combination of these trust scores for page is used to determine its ranking experimental results on two large datasets show that our topical trustrank has better performance than trustrank in demoting spam sites or pages compared to trustrank our best technique can decrease spam from the top ranked sites by as much as
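A minimal sketch of the topical idea, under the assumption of a personalised-PageRank style propagation: trust is propagated separately from each topical seed set and the per-topic scores are then combined using per-page topic weights. The damping factor, iteration count and combination rule are illustrative choices, not the paper's exact formulation.

```python
# Topic-wise trust propagation followed by a weighted combination per page.
import numpy as np

def trust_scores(adj, seeds, d=0.85, n_iter=50):
    """adj: row-stochastic link matrix (n x n); seeds: 0/1 seed vector."""
    t = seeds / seeds.sum()
    base = t.copy()
    for _ in range(n_iter):
        t = (1 - d) * base + d * adj.T @ t      # propagate trust along out-links
    return t

def topical_trust(adj, topic_seeds, topic_weights):
    """topic_seeds: list of seed vectors; topic_weights: (n x n_topics) page-topic weights."""
    per_topic = np.stack([trust_scores(adj, s) for s in topic_seeds], axis=1)
    return (per_topic * topic_weights).sum(axis=1)   # combined trust per page
```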
building hardware prototypes for computer architecture research is challenging unfortunately development of the required software tools compilers debuggers runtime is even more challenging which means these systems rarely run real applications to overcome this issue when developing our prototype platform we used the tensilica processor generator to produce customized processor and corresponding software tools and libraries while this base processor was very different from the streamlined custom processor we initially imagined it allowed us to focus on our main objective the design of reconfigurable cmp memory system and to successfully tape out an core cmp chip with only small group of designers one person was able to handle processor configuration and hardware generation support of complete software tool chain as well as developing the custom runtime software to support three different programming models having sophisticated software tool chain not only allowed us to run more applications on our machine it once again pointed out the need to use optimized code to get an accurate evaluation of architectural features
interpreters are widely used to implement portable language runtime environments programs written in these languages may benefit from performance beyond that obtainable by optimizing interpretation alone modern high performance mixed mode virtual machine vm includes a method based just in time jit compiler method based jit however requires the up front development of complex compilation infrastructure before any performance benefits are realized ideally the architecture for mixed mode vm could detect and execute variety of shapes of hot regions of virtual program our vm architecture is based on context threading it supports powerful efficient instrumentation and simple framework for dynamic code generation it has the potential to directly support full spectrum of mixed mode execution from interpreted bytecode bodies to specialized bytecode generated at runtime to traces to compiled methods further it provides the necessary tools to detect these regions at runtime we extended two vms sablevm and the ocaml interpreter with our infrastructure on both the and ppc to demonstrate the power and flexibility of our infrastructure we compare the selection and dispatch effectiveness for three common region shapes whole methods partial methods and specl traces we report results for preliminary version of our code generator which compiles region into sequence of direct calls to bytecode bodies
this tutorial introduces dynamic web services as solution to cope with the dynamism and flexibility required by many modern software systems current technologies wsdl ws bpel etc have proven insufficient in addressing these issues however they remain good starting point for the analysis of the current situation and for building for the future the core part of the tutorial analyzes by looking at available technologies and prominent research proposals the deployment and execution of these applications within three separate phases composition phase to discover available services and implement the desired behavior monitoring phase to understand if given service is behaving correctly with respect to both functional and non functional requirements and recovery phase to react to anomalies by means of suitable replanning or recovery strategies in conclusion the tutorial summarizes the main topics presents list of still to be solved problems and highlights possible directions for future research
defeasible logic is promising representation for legal knowledge that appears to overcome many of the deficiencies of previous approaches to representing legal knowledge unfortunately an immediate application of technology to the challenges of generating theories in the legal domain is an expensive and computationally intractable problem so in light of the potential benefits we seek to find practical algorithm that uses heuristics to discover an approximate solution as an outcome of this work we have developed an algorithm that integrates defeasible logic into decision support system by automatically deriving its knowledge from databases of precedents experiments with the new algorithm are very promising delivering results comparable to and exceeding other approaches
mobile ad hoc networks are subject to some unique security issues that could delay their diffusion several solutions have already been proposed to enforce specific security properties however mobility pattern nodes obey to can on one hand severely affect the quality of the security solutions that have been tested over synthesized mobility pattern on the other hand specific mobility patterns could be leveraged to design specific protocols that could outperform existing solutions in this work we investigate the influence of realistic mobility scenario over benchmark mobility model random waypoint mobility model using as underlying protocol recent solution introduced for the detection of compromised nodes extensive simulations show the quality of the underlying protocol however the main contribution is to show the relevance of the mobility model over the achieved performances stressing out that in mobile ad hoc networks the quality of the solution provided is satisfactory only when it can be adapted to the nodes underlying mobility model
most existing transfer learning techniques are limited to problems of knowledge transfer across tasks sharing the same set of class labels in this paper however we relax this constraint and propose spectral based solution that aims at unveiling the intrinsic structure of the data and generating partition of the target data by transferring the eigenspace that well separates the source data furthermore clustering based kl divergence is proposed to automatically adjust how much to transfer we evaluate the proposed model on text and image datasets where class categories of the source and target data are explicitly different eg classes transfer to classes and show that the proposed approach improves other baselines by an average of in accuracy the source code and datasets are available from the authors
in this paper we propose and experimentally evaluate the use of the client server database paradigm for real time processing to date the study of transaction processing with time constraints has mostly been restricted to centralized or single node systems recently client server databases have exploited locality of data accesses in real world applications to successfully provide reduced transaction response times our objective is to investigate the feasibility of real time processing in data shipping client server database architecture we compare the efficiency of the proposed architecture with that of centralized real time database system we discuss transaction scheduling issues in the two architectures and propose new policy for scheduling transactions in the client server environment this policy assigns higher priorities to transactions that have greater potential for successful completion through the use of locally available data through detailed performance scalability study we investigate the effects of client data access locality and various updating workloads on transaction completion rates our experimental results show that real time client server databases can provide significant performance gains over their centralized counterparts these gains become evident when large numbers of clients more than are attached per server even in the presence of high data contention
we introduce novel measure called four points condition pc which assigns value to every metric space quantifying how close the metric is to tree metric data sets taken from real internet measurements indicate remarkable closeness of internet latencies to tree metrics based on this condition we study embeddings of pc metric spaces into trees and prove tight upper and lower bounds specifically we show that there are constants and such that every metric which satisfies the pc can be embedded into tree with distortion c log n and for every and any number of nodes there is metric space satisfying the pc that does not embed into tree with distortion less than c log n in addition we prove lower bound on approximate distance labelings of pc metrics and give tight bounds for tree embeddings with additive error guarantees
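For reference, the classical four points condition that the paper's measure quantifies is stated below; the exact form of the relaxed, quantitative version used in the paper is not reproduced here.

```latex
% Classical four points condition (4PC): d is a tree metric iff, for every
% four points w, x, y, z, the two largest of the three pairwise sums coincide;
% equivalently
\[
  d(w,x) + d(y,z) \;\le\; \max\bigl\{\, d(w,y) + d(x,z),\; d(w,z) + d(x,y) \,\bigr\}
  \qquad \text{for all } w, x, y, z .
\]
```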
the usage of interactive applications increases in handheld systems in this paper we describe system level dynamic power management scheme that considers interaction between the cpu and the wnic and interactive applications to reduce the energy consumption of handheld systems previous research efforts considered the cpu and the wnic separately to reduce energy consumption the proposed scheme reduces the energy consumption of handheld systems by using the information gathered from the wnic to control the cpu voltage and frequency when interactive applications are executed experimental results show that on average the proposed scheme reduces energy consumption by when compared to dvfs dynamic voltage and frequency scaling for the cpu and dpm dynamic power management for the wnic
program restructuring is key method for improving the quality of ill structured programs thereby increasing the understandability and reducing the maintenance cost it is challenging task and great deal of research is still ongoing this paper presents an approach to program restructuring inside of function based on clustering techniques with cohesion as the major concern clustering has been widely used to group related entities together the approach focuses on automated support for identifying ill structured or low cohesive functions and providing heuristic advice in both the development and evolution phases new similarity measure is defined and studied intensively specifically from the function perspective comparative study on three different hierarchical agglomerative clustering algorithms is also conducted the best algorithm is applied to restructuring of functions of real industrial system the empirical observations show that the heuristic advice provided by the approach can help software designers make better decision of why and how to restructure program specific source code level software metrics are presented to demonstrate the value of the approach
the total amount of stored information on disks has increased tremendously in recent years with storage devices getting cheaper and government agencies requiring strict data retention policies it is clear that this trend will continue for several years to come this progression creates challenge for system administrators who must determine several aspects of storage policy with respect to provisioning backups retention redundancy security performance etc these decisions are made for an entire file system logical volume or storage pool however this granularity is too large and can sacrifice storage efficiency and performance particularly since different files have different requirements in this paper we advocate that storage policy decisions be made on finer granularity we describe attest an extendable stackable storage architecture that allows storage policy decisions to be made at file granularity and at all levels of the storage stack through the use of attributes that enable plugin policy modules and application aware storage functionality we present an implementation of attest that shows minimal impact on overall performance
we present method to certify subset of the java bytecode with respect to security the method is based on abstract interpretation of the operational semantics of the language we define concrete small step enhanced semantics of the language able to keep information on the flow of data and control during execution main point of this semantics is the handling of the influence of the information flow on the operand stack we then define an abstract semantics keeping only the security information and forgetting the actual values this semantics can be used as static analysis tool to check security of programs the use of abstract interpretation allows on one side being semantics based to accept as secure wide class of programs and on the other side being rule based to be fully automated
advanced typing matching and evaluation strategy features as well as very general conditional rules are routinely used in equational programming languages such as for example asf sdf obj cafeobj maude and equational subsets of elan and casl proving termination of equational programs having such expressive features is important but nontrivial because some of those features may not be supported by standard termination methods and tools such as muterm cime aprove ttt termptation etc yet use of the features may be essential to ensure termination we present sequence of theory transformations that can be used to bridge the gap between expressive equational programs and termination tools prove the correctness of such transformations and discuss prototype tool performing the transformations on maude equational programs and sending the resulting transformed theories to some of the aforementioned tools
as greater volume of information becomes increasingly available across all disciplines many approaches such as document clustering and information visualization have been proposed to help users manage information easily however most of these methods do not directly extract key concepts and their semantic relationships from document corpora which could help better illuminate the conceptual structures within given information to address this issue we propose an approach called clonto to process document corpus identify the key concepts and automatically generate ontologies based on these concepts for the purpose of conceptualization for given document corpus clonto applies latent semantic analysis to identify key concepts allocates documents based on these concepts and utilizes wordnet to automatically generate corpus related ontology the documents are linked to the ontology through the key concepts based on two test collections the experimental results show that clonto is able to identify key concepts and outperforms four other clustering algorithms moreover the ontologies generated by clonto show significant informative conceptual structures
multithreaded programs are prone to errors caused by unintended interference between concurrent threads this paper focuses on verifying that deterministically parallel code is free of such thread interference errors deterministically parallel code may create and use new threads via fork and join and coordinate their behavior with synchronization primitives such as barriers and semaphores such code does not satisfy the traditional non interference property of atomicity or serializability however and so existing atomicity tools are inadequate for checking deterministically parallel code we introduce new non interference specification for deterministically parallel code and we present dynamic analysis to enforce it we also describe singletrack prototype implementation of this analysis singletrack’s performance is competitive with prior atomicity checkers but it produces many fewer spurious warnings because it enforces more general non interference property that is applicable to more software
with suitable algorithm for ranking the expertise of user in collaborative tagging system we will be able to identify experts and discover useful and relevant resources through them we propose that the level of expertise of user with respect to particular topic is mainly determined by two factors firstly an expert should possess high quality collection of resources while the quality of web resource depends on the expertise of the users who have assigned tags to it secondly an expert should be one who tends to identify interesting or useful resources before other users do we propose graph based algorithm spear spamming resistant expertise analysis and ranking which implements these ideas for ranking users in folksonomy we evaluate our method with experiments on data sets collected from delicious.com comprising over web documents million users and million shared bookmarks we also show that the algorithm is more resistant to spammers than other methods such as the original hits algorithm and simple statistical measures
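The sketch below illustrates the mutual-reinforcement flavour of such expertise ranking: users gain expertise by tagging good documents early, and documents gain quality by being tagged by experts. The credit function (one plus the number of later taggers) and the HITS-style iteration are assumptions made for this example, not the published SPEAR algorithm.

```python
# Mutual reinforcement between user expertise and resource quality,
# with earlier taggers of a resource earning more credit.
import numpy as np

def expertise_ranking(tag_times, n_iter=50):
    """tag_times: dict {(user, doc): timestamp}; returns (expertise, quality)."""
    users = sorted({u for u, _ in tag_times})
    docs = sorted({d for _, d in tag_times})
    A = np.zeros((len(users), len(docs)))
    for (u, d), t in tag_times.items():
        later = sum(1 for (u2, d2), t2 in tag_times.items() if d2 == d and t2 > t)
        A[users.index(u), docs.index(d)] = 1 + later   # earlier taggers earn more credit
    e = np.ones(len(users))
    q = np.ones(len(docs))
    for _ in range(n_iter):                            # HITS-style iteration
        q = A.T @ e
        e = A @ q
        e /= np.linalg.norm(e)
        q /= np.linalg.norm(q)
    return dict(zip(users, e)), dict(zip(docs, q))

times = {("alice", "doc1"): 1, ("bob", "doc1"): 5, ("bob", "doc2"): 2}
print(expertise_ranking(times))
```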
as computer systems are essential components of many critical commercial services the need for secure online transactions is now becoming evident the demand for such applications as the market grows exceeds the capacity of individual businesses to provide fast and reliable services making outsourcing technologies key player in alleviating issues of scale consider stock broker that needs to provide real time stock trading monitoring service to clients since the cost of multicasting this information to large audience might become prohibitive the broker could outsource the stock feed to third party providers who are in turn responsible for forwarding the appropriate sub feed to clients evidently in critical applications the integrity of the third party should not be taken for granted in this work we study variety of authentication algorithms for selection and aggregation queries over sliding windows our algorithms enable the end users to prove that the results provided by the third party are correct ie equal to the results that would have been computed by the original provider our solutions are based on merkle hash trees over forest of space partitioning data structures and try to leverage key features like update query signing and authentication costs we present detailed theoretical analysis for our solutions and empirically evaluate the proposed techniques
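As background for the authentication machinery, the sketch below builds a Merkle hash tree over a window of results and verifies one item against the root via its authentication path; the paper's combination with space-partitioning structures, signatures and its cost optimisations are not reproduced, and the duplicate-padding rule for odd levels is a simplifying choice.

```python
# Minimal Merkle tree: build, extract an authentication path, verify a leaf.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return list of levels, level 0 = hashed leaves, last level = [root]."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node on odd levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, idx):
    """Sibling hashes needed to verify leaf idx against the root."""
    path = []
    for level in levels[:-1]:
        sib = idx ^ 1
        path.append(level[sib] if sib < len(level) else level[idx])
        idx //= 2
    return path

def verify(leaf, idx, path, root):
    node = h(leaf)
    for sib in path:
        node = h(node + sib) if idx % 2 == 0 else h(sib + node)
        idx //= 2
    return node == root

items = [f"result-{i}".encode() for i in range(7)]
levels = build_tree(items)
root = levels[-1][0]
print(verify(items[3], 3, auth_path(levels, 3), root))   # -> True
```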
we study the incorporation of generic types in aspect languages since advice acts like method update such study has to accommodate the subtleties of the interaction of classes polymorphism and aspects indeed simple examples demonstrate that current aspect compiling techniques do not avoid runtime type errors we explore type systems with polymorphism for two models of parametric polymorphism the type erasure semantics of generic java and the type carrying semantics of designs such as generic our main contribution is the design and exploration of source level type system for parametric oo language with aspects we prove progress and preservation properties we believe our work is the first source level typing scheme for an aspect based extension of parametric object oriented language
the scaling of microchip technologies has enabled large scale systems on chip soc network on chip noc research addresses global communication in soc involving i move from computation centric to communication centric design and ii the implementation of scalable communication structures this survey presents perspective on existing noc research we define the following abstractions system network adapter network and link to explain and structure the fundamental concepts first research relating to the actual network design is reviewed then system level design and modeling are discussed we also evaluate performance analysis techniques the research shows that noc constitutes unification of current trends of intrachip communication rather than an explicit new alternative
online social media draws heavily on active reader participation such as voting or rating of news stories articles or responses to question this user feedback is invaluable for ranking filtering and retrieving high quality content tasks that are crucial with the explosive amount of social content on the web unfortunately as social media moves into the mainstream and gains in popularity the quality of the user feedback degrades some of this is due to noise but increasingly small fraction of malicious users are trying to game the system by selectively promoting or demoting content for profit or fun hence an effective ranking of social media content must be robust to noise in the user interactions and in particular to vote spam we describe machine learning based ranking framework for social media that integrates user interactions and content relevance and demonstrate its effectiveness for answer retrieval in popular community question answering portal we consider several vote spam attacks and introduce method of training our ranker to increase its robustness to some common forms of vote spam attacks the results of our large scale experimental evaluation show that our ranker is significantly more robust to vote spam compared to state of the art baseline as well as the ranker not explicitly trained to handle malicious interactions
the star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network in this paper we present multipath based multicast routing model for wormhole routed star graph networks propose two efficient multipath routing schemes and contrast the performance of the proposed schemes with the performance of the scheme presented in our previous work both of the two proposed schemes have been proven to be deadlock free the first scheme simple multipath routing uses multiple independent paths for concurrent multicasting the second scheme two phase multipath routing includes two phases source to relay and relay to destination for each phase multicasting is carried out using simple multipath routing experimental results show that for short and medium messages with small message startup latencies the proposed schemes reduce multicast latency more efficiently than other schemes
we show that there exist translations between polymorphic calculus and subsystem of minimal logic with existential types which form galois insertion embedding the translation from polymorphic calculus into the existential type system is the so called call by name cps translation that can be expounded as an adjoint from the neat connection the construction of an inverse translation is investigated from viewpoint of residuated mappings the duality appears not only in the reduction relations but also in the proof structures such as paths between the source and the target calculi from programming point of view this result means that abstract data types can interpret polymorphic functions under the cps translation we may regard abstract data types as dual notion of polymorphic functions
place names are often used to describe and to enquire about geographical information it is common for users to employ vernacular names that have vague spatial extent and which do not correspond to the official and administrative place name terminology recorded within typical gazetteers there is need therefore to enrich gazetteers with knowledge of such vague places and hence improve the quality of place name based information retrieval here we describe method for modelling vague places using knowledge harvested from web pages it is found that vague place names are frequently accompanied in text by the names of more precise co located places that lie within the extent of the target vague place density surface modelling of the frequency of co occurrence of such names provides an effective method of representing the inherent uncertainty of the extent of the vague place while also enabling approximate crisp boundaries to be derived from contours if required the method is evaluated using both precise and vague places the use of the resulting approximate boundaries is demonstrated using an experimental geographical search engine
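A minimal sketch of the density-surface idea, assuming a simple Gaussian kernel: coordinates of precise places that co-occur with the vague name are smoothed into a surface, and an approximate crisp region can be obtained by thresholding (or contouring) that surface. The bandwidth, grid resolution and threshold are illustrative, not the paper's settings.

```python
# Kernel density surface over co-occurring precise place locations,
# with a crisp region taken as a thresholded level set.
import numpy as np

def density_surface(coords, grid_size=100, bandwidth=0.05):
    """coords: (n, 2) array of (x, y) locations of co-occurring place names."""
    xs = np.linspace(coords[:, 0].min() - 0.2, coords[:, 0].max() + 0.2, grid_size)
    ys = np.linspace(coords[:, 1].min() - 0.2, coords[:, 1].max() + 0.2, grid_size)
    gx, gy = np.meshgrid(xs, ys)
    surface = np.zeros_like(gx)
    for x, y in coords:                         # one Gaussian kernel per observation
        surface += np.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
    return xs, ys, surface / surface.max()

def crisp_region(surface, level=0.5):
    return surface >= level                     # approximate boundary by thresholding
```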
this paper outlines the history of the programming language from the early days of its iso standardization through the iso standard to the later stages of the revision of that standard the emphasis is on the ideals constraints programming techniques and people that shaped the language rather than the minutiae of language features among the major themes are the emergence of generic programming and the stl the standard library’s algorithms and containers specific topics include separate compilation of templates exception handling and support for embedded systems programming during most of the period covered here was mature language with millions of users consequently this paper discusses various uses of and the technical and commercial pressures that provided the background for its continuing evolution
the analysis of data usage in large set of real traces from high energy physics collaboration revealed the existence of an emergent grouping of files that we coined filecules this paper presents the benefits of using this file grouping for prestaging data and compares it with previously proposed file grouping techniques along range of performance metrics our experiments with real workloads demonstrate that filecule grouping is reliable and useful abstraction for data management in science grids that preserving time locality for data prestaging is highly recommended that job reordering with respect to data availability has significant impact on throughput and finally that relatively short history of traces is good predictor for filecule grouping our experimental results provide lessons for workload modeling and suggest design guidelines for data management in data intensive resource sharing environments
underwater sensor networks are attracting increasing interest from researchers in terrestrial radio based sensor networks there are important physical technological and economic differences between terrestrial and underwater sensor networks in this survey we highlight number of important practical issues that have not been emphasized in recent surveys of underwater networks with an intended audience of researchers who are moving from radio based terrestrial networks into underwater networks
over the last years game theory has provided great insights into the behavior of distributed systems by modeling the players as utility maximizing agents in particular it has been shown that selfishness causes many systems to perform in globally suboptimal fashion such systems are said to have large price of anarchy in this paper we extend this active field of research by allowing some players to be malicious or byzantine rather than selfish we ask what is the impact of byzantine players on the system’s efficiency compared to purely selfish environments or compared to the social optimum in particular we introduce the price of malice which captures this efficiency degradation as an example we analyze the price of malice of game which models the containment of the spread of viruses in this game each node can choose whether or not to install anti virus software then virus starts from random node and iteratively infects all neighboring nodes which are not inoculated we establish various results about this game for instance we quantify how much the presence of byzantine players can deteriorate or in case of highly risk averse selfish players improve the social welfare of the distributed system
networks on chip noc have emerged as the design paradigm for scalable system on chip soc communication infrastructure due to convergence growing number of applications are integrated on the same chip when combined these applications result in use cases with different communication requirements the noc is configured per use case and traditionally all running applications are disrupted during use case transitions even those continuing operation in this paper we present model that enables partial reconfiguration of nocs and mapping algorithm that uses the model to map multiple applications onto noc with undisrupted quality of service during reconfiguration the performance of the methodology is verified by comparison with existing solutions for several soc designs we apply the algorithm to mobile phone soc with telecom multimedia and gaming applications reducing noc area by more than and power consumption by compared to state of the art approach
the realization of semantic web reasoning is central to substantiating the semantic web vision however current mainstream research on this topic faces serious challenges which force us to question established lines of research and to rethink the underlying approaches
persistently saturated links are abnormal conditions that indicate bottlenecks in internet traffic network operators are interested in detecting such links for troubleshooting to improve capacity planning and traffic estimation and to detect denial of service attacks currently bottleneck links can be detected either locally through snmp information or remotely through active probing or passive flow based analysis however local snmp information may not be available due to administrative restrictions and existing remote approaches are not used systematically because of their network or computation overhead this paper proposes new approach to remotely detect the presence of bottleneck links using spectral and statistical analysis of traffic our approach is passive operates on aggregate traffic without flow separation and supports remote detection of bottlenecks addressing some of the major limitations of existing approaches our technique assumes that traffic through the bottleneck is dominated by packets with common size typically the maximum transfer unit for reasons discussed in section with this assumption we observe that bottlenecks imprint periodicities on packet transmissions based on the packet size and link bandwidth such periodicities manifest themselves as strong frequencies in the spectral representation of the aggregate traffic observed at downstream monitoring point we propose detection algorithm based on rigorous statistical methods to detect the presence of bottleneck links by examining strong frequencies in aggregate traffic we use data from live internet traces to evaluate the performance of our algorithm under various network conditions results show that with proper parameters our algorithm can provide excellent accuracy up to even if the traffic through the bottleneck link accounts for less than of the aggregate traffic
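The sketch below shows the spectral idea in its simplest form: bin aggregate packet arrivals at a fine time resolution, take a periodogram, and compare peaks against the frequency implied by a candidate bottleneck bandwidth and the dominant packet size. The bin width and the absence of the paper's statistical detection test are simplifying assumptions.

```python
# Periodogram of binned aggregate packet arrivals, plus the frequency a
# saturated link of a given bandwidth would imprint for a dominant packet size.
import numpy as np

def bottleneck_spectrum(arrival_times, bin_width=1e-5):
    """arrival_times: numpy array of packet timestamps in seconds."""
    duration = arrival_times.max() - arrival_times.min()
    n_bins = int(np.ceil(duration / bin_width)) + 1
    series, _ = np.histogram(arrival_times, bins=n_bins)
    series = series - series.mean()             # remove the DC component
    power = np.abs(np.fft.rfft(series)) ** 2    # periodogram of the arrival series
    freqs = np.fft.rfftfreq(n_bins, d=bin_width)
    return freqs, power

def expected_frequency(bandwidth_bps, packet_size_bytes=1500):
    # packets per second a fully utilised link emits for one dominant packet size
    return bandwidth_bps / (8 * packet_size_bytes)
```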
interaction patterns with handheld mobile devices are constantly evolving researchers observed that users prefer to interact with mobile device using one hand however only few interaction techniques support this mode of operation we show that one handed operations can be enhanced with coordinated interaction using for input the front and back of mobile device which we term as dual surface interaction we present some of the design rationale for introducing coordinated dual surface interactions we demonstrate that several tasks including target selection benefit from dual surface input which allows users to rapidly select small targets in locations that are less accessible when interacting using the thumb with one handed input furthermore we demonstrate the benefits of virtual enhancements that are possible with behind the display relative input to perform complex tasks such as steering our results show that dual surface interactions offer numerous benefits that are not available with input on the front or the back alone
this paper formulates set of minimal requirements for the platform independent model pim of the model driven architecture mda it then defines the use case responsibility driven analysis and design methodology urdad which provides simple algorithmic design methodology generating pim satisfying the specified pim requirements
energy consumption can be reduced by scaling down frequency when peak performance is not needed lower frequency permits slower circuits and hence lower supply voltage energy reduction comes from voltage reduction technique called dynamic voltage scaling dvs this paper makes the case that the useful frequency range of dvs is limited because there is lower bound on voltage lowering frequency permits voltage reduction until the lowest voltage is reached beyond that point lowering frequency further does not save energy because voltage is constant however there is still opportunity for energy reduction outside the influence of dvs if frequency is lowered enough pairs of pipeline stages can be merged to form shallower pipeline the shallow pipeline has better instructions per cycle ipc than the deep pipeline since energy also depends on ipc energy is reduced for given frequency accordingly we propose dynamic pipeline scaling dps dps enabled deep pipeline can merge adjacent pairs of stages by making the intermediate latches transparent and disabling corresponding feedback paths thus dps enabled pipeline has deep mode for higher frequencies within the influence of dvs and shallow mode for lower frequencies shallow mode extends the frequency range for which energy reduction is possible for frequencies outside the influence of dvs dps enabled deep pipeline consumes from to less energy than rigid deep pipeline
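A back-of-the-envelope model of the argument, with purely illustrative constants: dynamic energy per cycle scales roughly with C·V², and once frequency scaling has driven the supply voltage down to its minimum, further frequency reduction no longer lowers energy per cycle, which is the regime where pipeline-depth scaling can still help through better IPC.

```python
# Illustrative dynamic-energy model showing where DVS stops helping.
def energy_per_cycle(freq_ghz, c_eff=1.0, v_min=0.8, v_max=1.2, f_max=2.0):
    # assume voltage scales roughly linearly with frequency until it hits v_min
    v = max(v_min, v_max * freq_ghz / f_max)
    return c_eff * v ** 2                       # dynamic energy per cycle ~ C * V^2

for f in (2.0, 1.5, 1.0, 0.5):
    print(f"{f:.1f} GHz -> {energy_per_cycle(f):.3f} energy units per cycle")
# energy keeps dropping until V reaches v_min (around 1.33 GHz with these
# illustrative constants), then flattens: beyond that point only IPC
# improvements such as merging pipeline stages can reduce energy further
```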
the popularity of object oriented programming has led to the wide use of container libraries it is important for the reliability of these containers that they are tested adequately we describe techniques for automated test input generation of java container classes test inputs are sequences of method calls from the container interface the techniques rely on state matching to avoid generation of redundant tests exhaustive techniques use model checking with explicit or symbolic execution to explore all the possible test sequences up to predefined input sizes lossy techniques rely on abstraction mappings to compute and store abstract versions of the concrete states they explore underapproximations of all the possible test sequences we have implemented the techniques on top of the java pathfinder model checker and we evaluate them using four java container classes we compare state matching based techniques and random selection for generating test inputs in terms of testing coverage we consider basic block coverage and form of predicate coverage that measures whether all combinations of predetermined set of predicates are covered at each basic block the exhaustive techniques can easily obtain basic block coverage but cannot obtain good predicate coverage before running out of memory on the other hand abstract matching turns out to be powerful approach for generating test inputs to obtain high predicate coverage random selection performed well except on the examples that contained complex input spaces where the lossy abstraction techniques performed better
hashing schemes are common technique to improve the performance in mining not only association rules but also sequential patterns or traversal patterns however the collision problem in hash schemes may result in severe performance degradation in this paper we propose perfect hashing schemes for mining traversal patterns to avoid collisions in the hash table the main idea is to transform each large itemsets into one large itemset by employing delicate encoding scheme then perfect hash schemes designed only for itemsets of length two rather than varied lengths are applied the experimental results show that our method is more than twice as faster than fs algorithm the results also show our method is scalable to database sizes one variant of our perfect hash scheme called partial hash is proposed to cope with the enormous memory space required by typical perfect hash functions we also give comparison of the performances of different perfect hash variants and investigate their properties
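The sketch below illustrates the encoding idea in its simplest form: a pair of item identifiers from a known universe is mapped collision-free to a single integer index, so the table needs no collision handling. The paper's actual encoding, its hash functions for traversal patterns and the partial-hash variant for reducing memory are not reproduced; the class and method names are made up for this example.

```python
# Collision-free indexing of item pairs via an injective pairing encode.
class PairCounter:
    def __init__(self, n_items):
        self.n = n_items
        self.counts = [0] * (n_items * n_items)  # one slot per ordered pair

    def encode(self, i, j):
        return i * self.n + j                    # injective for 0 <= i, j < n_items

    def add(self, i, j):
        self.counts[self.encode(i, j)] += 1

    def count(self, i, j):
        return self.counts[self.encode(i, j)]

pc = PairCounter(1000)
pc.add(3, 7)
pc.add(3, 7)
print(pc.count(3, 7))   # -> 2
```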
recent web searching and mining tools are combining text and link analysis to improve ranking and crawling algorithms the central assumption behind such approaches is that there is correlation between the graph structure of the web and the text and meaning of pages here formalize and empirically evaluate two general conjectures drawing connections from link information to lexical and semantic web content the link content conjecture states that page is similar to the pages that link to it and the link cluster conjecture that pages about the same topic are clustered together these conjectures are often simply assumed to hold and web search tools are built on such assumptions the present quantitative confirmation sheds light on the connection between the success of the latest web mining techniques and the small world topology of the web with encouraging implications for the design of better crawling algorithms
nested transactional memory tm facilitates software composition by letting one module invoke another without either knowing whether the other uses transactions closed nested transactions extend isolation of an inner transaction until the toplevel transaction commits implementations may flatten nested transactions into the top level one resulting in complete abort on conflict or allow partial abort of inner transactions open nested transactions allow committing inner transaction to immediately release isolation which increases parallelism and expressiveness at the cost of both software and hardware complexity this paper extends the recently proposed flat log based transactional memory logtm with nested transactions flat logtm saves pre transaction values in log detects conflicts with read and write bits per cache block and on abort invokes software handler to unroll the log nested logtm supports nesting by segmenting the log into stack of activation records and modestly replicating bits to facilitate composition with nontransactional code such as language runtime and operating system services we propose escape actions that allow trusted code to run outside the confines of the transactional memory system
the valued constraint satisfaction problem vcsp is generic optimization problem defined by network of local cost functions defined over discrete variables it has applications in artificial intelligence operations research bioinformatics and has been used to tackle optimization problems in other graphical models including discrete markov random fields and bayesian networks the incremental lower bounds produced by local consistency filtering are used for pruning inside branch and bound search in this paper we extend the notion of arc consistency by allowing fractional weights and by allowing several arc consistency operations to be applied simultaneously over the rationals and allowing simultaneous operations we show that an optimal arc consistency closure can theoretically be determined in polynomial time by reduction to linear programming this defines optimal soft arc consistency osac to reach more practical algorithm we show that the existence of sequence of arc consistency operations which increases the lower bound can be detected by establishing arc consistency in classical constraint satisfaction problem csp derived from the original cost function network this leads to new soft arc consistency method called virtual arc consistency which produces improved lower bounds compared with previous techniques and which can solve submodular cost functions these algorithms have been implemented and evaluated on variety of problems including two difficult frequency assignment problems which are solved to optimality for the first time our implementation is available in the open source toulbar platform
gossip based communication protocols are often touted as being robust not surprisingly such claim relies on assumptions under which gossip protocols are supposed to operate in this paper we discuss and in some cases expose some of these assumptions and discuss how sensitive the robustness of gossip is to these assumptions this analysis gives rise to collection of new research challenges
electronic business business enables employees in organizations to complete jobs more efficiently if they can rely on the dependable functions and services delivered by business unfortunately only limited amount of research has explored the topic of employee perceived dependability of business and the concept lacks measurement tools hence this research develops validated instrument for measuring business dependability ebd from an employee perspective based on survey of employees in six large semiconductor manufacturing companies in the hsinchu science based industrial park in taiwan this research conceptualizes and operationalizes the construct of ebd and then creates and refines an business dependability instrument ebdi after rigorous purification procedures the item ebd instrument with good reliability and validity is presented finally this paper concludes with discussion of potential applications of this ebd instrument the business dependability instrument and its potential implications will not only aid researchers interested in designing and testing related business theories but also provide direction for managers in improving the quality and performance of business
flooding based querying and broadcasting schemes have low hop delays of to reach any node that is unit distance away where is the transmission range of any sensor node however in sensor networks with large radio ranges flooding based broadcasting schemes cause many redundant transmissions leading to broadcast storm problem in this paper we study the role of geographic information and state information ie memory of previous messages or transmissions in reducing the redundant transmissions in the network we consider three broadcasting schemes with varying levels of local information where nodes have no geographic or state information ii coarse geographic information about the origin of the broadcast and iii no geographic information but remember previously received messages for each of these network models we demonstrate localized forwarding algorithms for broadcast based on geography or state information that achieve significant reductions in the transmission overheads while maintaining hop delays comparable to flooding based schemes we also consider the related problem of broadcasting to set of spatially uniform points in the network lattice points in the regime where all nodes have only local sense of direction and demonstrate an efficient sparse broadcast scheme based on branching random walk that has low number of packet transmissions thus our results show that even with very little local information it is possible to make broadcast schemes significantly more efficient
hot potato routing is form of synchronous routing which makes no use of buffers at intermediate nodes packets must move at every time step until they reach their destination if contention prevents packet from taking its preferred outgoing edge it is deflected on different edge two simple design principles for hot potato routing algorithms are minimum advance that advances at least one packet towards its destination from every nonempty node and possibly deflects all other packets and maximum advance that advances the maximum possible number of packets livelock is situation in which packets keep moving indefinitely in the network without any packet ever reaching its destination it is known that even maximum advance algorithms might livelock on some networks we show that minimum advance algorithms never livelock on tree networks and that maximum advance algorithms never livelock on triangulated networks
autonomic computing self configuring self healing self managing applications systems and networks is promising solution to ever increasing system complexity and the spiraling costs of human management as systems scale to global proportions most results to date however suggest ways to architect new software designed from the ground up as autonomic systems whereas in the real world organizations continue to use stovepipe legacy systems and or build systems of systems that draw from gamut of disparate technologies from numerous vendors our goal is to retrofit autonomic computing onto such systems externally without any need to understand modify or even recompile the target system’s code we present an autonomic infrastructure that operates similarly to active middleware to explicitly add autonomic services to pre existing systems via continual monitoring and feedback loop that performs reconfiguration and or repair as needed our lightweight design and separation of concerns enables easy adoption of individual components for use with variety of target systems independent of the rest of the full infrastructure this work has been validated by several case studies spanning multiple real world application domains
the windows and doorways that connect offices to public spaces are site for people to gather awareness information and initiate interaction however these portals often reveal more information to the public area than the office occupant would like as result people often keep doors and window blinds closed which means that nobody can gather awareness information even those with whom the occupant would be willing to share one solution to this problem is co present media space computer mediated video connection at the boundary between an office and public area these systems can provide both greater privacy control to the occupant and greater overall awareness information to observers to see how co present media spaces would work in real world settings we built what we believe are the first ever co present media spaces and deployed them in two offices from observations gathered over fifteen months it is clear that the systems can do better job of balancing the occupant’s need for privacy and the observers need for awareness better than standard window however we also identified number of issues that affected the use and the success of the systems the existence of alternate information sources confusion with existing social norms disparities between effort and need and reduced interactional subtlety for observers in the public area our work contributes both novel arrangement of media space for co present collaborators and the first investigation into the design factors that affect the use and acceptance of these systems
due to the uncertainty of software processes statistic based schedule estimation and stochastic project scheduling both play significant roles in software project management however most current work investigates them independently without an integrated process to achieve on time delivery for software development organisations for such an issue this paper proposes two stage probabilistic scheduling strategy which aims to decrease schedule overruns specifically probability based temporal consistency model is employed at the first pre scheduling stage to support negotiation between customers and project managers for setting balanced deadlines of individual software processes at the second scheduling stage an innovative genetic algorithm based scheduling strategy is proposed to minimise the overall completion time of multiple software processes with individual deadlines the effectiveness of our strategy in achieving on time delivery is verified with large scale simulation experiments
creating user profiles is an important step in personalization many methods for user profile creation have been developed to date using different representations such as term vectors and concepts from an ontology like dmoz in this paper we propose and evaluate different methods for creating user profiles using wikipedia as the representation the key idea in our approach is to map documents to wikipedia concepts at different levels of resolution words key phrases sentences paragraphs the document summary and the entire document itself we suggest method for evaluating profile recall by pooling the relevant results from the different methods and evaluate our results for both precision and recall we also suggest novel method for profile evaluation by assessing the recall over known ontological profile drawn from dmoz
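a rough sketch of the profile construction idea, assuming a caller supplied concepts_for(text) lookup that maps a piece of text to wikipedia concepts (the actual mapping service is outside the sketch); the resolution levels shown are a simplification of the word, key phrase, sentence, paragraph, summary and document levels above

    from collections import Counter
    import re

    def wikipedia_profile(documents, concepts_for, level="sentence"):
        # concepts_for: assumed function text -> iterable of wikipedia concept titles
        profile = Counter()
        for doc in documents:
            if level == "document":
                units = [doc]
            elif level == "sentence":
                units = re.split(r"[.!?]+", doc)
            else:                                  # word level
                units = doc.split()
            for unit in units:
                if unit.strip():
                    profile.update(concepts_for(unit))
        total = sum(profile.values()) or 1
        return {c: n / total for c, n in profile.items()}   # normalized concept weights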
we present new system for robustly performing boolean operations on linear polyhedra our system is exact meaning that all internal numeric predicates are exactly decided in the sense of exact geometric computation our bsp tree based system is faster at performing iterative computations than cgal’s nef polyhedra based system the current best practice in robust boolean operations while being only twice as slow as the non robust modeler maya meanwhile we achieve much smaller substrate of geometric subroutines than previous work comprised of only predicates convex polygon constructor and convex polygon splitting routine the use of bsp tree based boolean algorithm atop this substrate allows us to explicitly handle all geometric degeneracies without treating large number of cases
in distributed system broadcasting is an efficient way to dispense data in certain highly dynamic environments while there are several well known on line broadcast scheduling strategies that minimize wait time there has been little research that considers on demand broadcasting with timing constraints one application which could benefit from strategy for on demand broadcast with timing constraints is real time database system scheduling strategies are needed in real time databases that identify which data item to broadcast next in order to minimize missed deadlines the scheduling decisions required in real time broadcast system allow the system to be modeled as markov decision process mdp in this paper we analyze the mdp model and determine that finding an optimal solution is hard problem in pspace we propose scheduling approach called aggregated critical requests acr which is based on the mdp formulation and present two algorithms based on this approach acr is designed for timely delivery of data to clients in order to maximize the reward by minimizing the deadlines missed results from trace driven experiments indicate the acr approach provides flexible strategy that can outperform existing strategies under variety of factors
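an illustrative heuristic sketch of the aggregated critical requests idea, not the mdp formulation itself: score each item by how many of its pending requests are close to their deadlines and broadcast the highest scoring item; the horizon parameter is hypothetical

    def pick_next_item(requests, now, broadcast_time, horizon=5.0):
        # requests: list of (item, deadline); one broadcast of an item serves
        # every pending request for that item
        scores = {}
        for item, deadline in requests:
            slack = deadline - (now + broadcast_time)
            if slack < 0:
                continue                           # request already unservable
            # nearly critical requests contribute more to the item's score
            weight = 1.0 if slack <= horizon else horizon / slack
            scores[item] = scores.get(item, 0.0) + weight
        return max(scores, key=scores.get) if scores else None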
multi label learning deals with the problem where each instance is associated with multiple labels simultaneously the task of this learning paradigm is to predict the label set for each unseen instance through analyzing training instances with known label sets in this paper neural network based multi label learning algorithm named ml rbf is proposed which is derived from the traditional radial basis function rbf methods briefly the first layer of an ml rbf neural network is formed by conducting clustering analysis on instances of each possible class where the centroid of each clustered group is regarded as the prototype vector of basis function after that second layer weights of the ml rbf neural network are learned by minimizing sum of squares error function specifically information encoded in the prototype vectors corresponding to all classes is fully exploited to optimize the weights corresponding to each specific class experiments on three real world multi label data sets show that ml rbf achieves highly competitive performance to other well established multi label learning algorithms
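a rough numpy/scikit-learn sketch of the two layer construction described above: k means clustering per class to obtain prototype centers for the basis functions, then a least squares fit of the second layer weights; k and sigma are assumed hyperparameters

    import numpy as np
    from sklearn.cluster import KMeans

    def fit_ml_rbf(X, Y, k=4, sigma=1.0):
        # X: (n, d) features, Y: (n, q) binary label matrix
        centers = []
        for c in range(Y.shape[1]):
            Xc = X[Y[:, c] == 1]
            if len(Xc) >= k:
                centers.append(KMeans(n_clusters=k, n_init=10).fit(Xc).cluster_centers_)
            elif len(Xc):
                centers.append(Xc)                 # too few positives: use them directly
        centers = np.vstack(centers)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        H = np.exp(-d2 / (2 * sigma ** 2))         # hidden layer activations
        H = np.hstack([H, np.ones((len(X), 1))])   # bias unit
        W, *_ = np.linalg.lstsq(H, Y, rcond=None)  # sum of squares fit
        return centers, W

    def predict_ml_rbf(X, centers, W, sigma=1.0):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        H = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((len(X), 1))])
        return H @ W                               # threshold (e.g. at 0.5) for label sets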
privacy is an important issue in data publishing many organizations distribute non aggregate personal data for research and they must take steps to ensure that an adversary cannot predict sensitive information pertaining to individuals with high confidence this problem is further complicated by the fact that in addition to the published data the adversary may also have access to other resources eg public records and social networks relating individuals which we call adversarial knowledge robust privacy framework should allow publishing organizations to analyze data privacy by means of not only data dimensions data that publishing organization has but also adversarial knowledge dimensions information not in the data in this paper we first describe general framework for reasoning about privacy in the presence of adversarial knowledge within this framework we propose novel multidimensional approach to quantifying adversarial knowledge this approach allows the publishing organization to investigate privacy threats and enforce privacy requirements in the presence of various types and amounts of adversarial knowledge our main technical contributions include multidimensional privacy criterion that is more intuitive and flexible than previous approaches to modeling background knowledge in addition we identify an important congregation property of the adversarial knowledge dimensions based on this property we provide algorithms for measuring disclosure and sanitizing data that improve computational efficiency several orders of magnitude over the best known techniques
while symmetry reduction has been established to be an important technique for reducing the search space in model checking its application in concurrent software verification is still limited due to the difficulty of specifying symmetry in realistic software we propose an algorithm for automatically discovering and applying transition symmetry in multithreaded programs during dynamic model checking our main idea is using dynamic program analysis to identify permutation of variables and labels of the program that entails syntactic equivalence among the residual code of threads and to check whether the local states of threads are equivalent under the permutation the new transition symmetry discovery algorithm can bring substantial state space savings during dynamic verification of concurrent programs we have implemented the new algorithm in the dynamic model checker inspect our preliminary experiments show that this algorithm can successfully discover transition symmetries that are hard or otherwise cumbersome to identify manually and can significantly reduce the model checking time while using inspect to examine realistic multithreaded applications
spatio temporal geo referenced datasets are growing rapidly and will grow even more in the near future due to both technological and social commercial reasons from the data mining viewpoint spatio temporal trajectory data introduce new dimensions and correspondingly novel issues in performing the analysis tasks in this paper we consider the clustering problem applied to the trajectory data domain in particular we propose an adaptation of density based clustering algorithm to trajectory data based on simple notion of distance between trajectories then set of experiments on synthesized data is performed in order to test the algorithm and to compare it with other standard clustering approaches finally new approach to the trajectory clustering problem called temporal focussing is sketched having the aim of exploiting the intrinsic semantics of the temporal dimension to improve the quality of trajectory clustering
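a small sketch, assuming trajectories are time aligned equal length sequences of (x, y) points, of a simple trajectory distance plus an off the shelf density based clusterer; DBSCAN here only stands in for the adapted algorithm described above, and eps/min_samples are illustrative

    import numpy as np
    from sklearn.cluster import DBSCAN

    def trajectory_distance(t1, t2):
        # average euclidean distance between time aligned points
        return float(np.mean(np.linalg.norm(np.asarray(t1) - np.asarray(t2), axis=1)))

    def cluster_trajectories(trajs, eps=0.5, min_samples=3):
        n = len(trajs)
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = trajectory_distance(trajs[i], trajs[j])
        return DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed").fit_predict(D)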
this paper evaluates managing the processor’s datapath width at the compiler level by means of exploiting dynamic narrow width operands we capitalize on the large occurrence of these operands in multimedia programs to build static narrow width regions that may be directly exposed to the compiler we propose to augment the isa with instructions directly exposing the datapath and the register widths to the compiler simple exception management allows this exposition to be only speculative in this way we permit the software to speculatively accommodate the execution of program on narrower datapath width in order to save energy for this purpose we introduce novel register file organization the byte slice register file which allows the width of the register file to be dynamically reconfigured providing both static and dynamic energy savings we show that by combining the advantages of the byte slice register file with the advantages provided by clock gating the datapath on per region basis up to of the datapath dynamic energy can be saved while reduction of the register file static energy is achieved
networks on chip nocs emerge as the solution for the problem of interconnecting cores or ips in systems on chip socs which require reusable and scalable communication architectures the building block of noc is its router or switch whose architecture has great impact on the costs and on the performance of the network this work presents parameterizable router architecture for nocs which is based on canonical template and on library of building components offering different alternatives and implementations for the circuits used for packet forwarding in noc such features allow to explore the noc design space in order to obtain router configuration which best fits the performance requirements of target application at lower silicon costs we describe the router architecture and present some synthesis results which demonstrate the feasibility of this new router
today many workers spend too much of their time translating their co workers requests into structures that information systems can understand this paper presents the novel interaction design and evaluation of vio an agent that helps workers translate requests vio monitors requests and makes suggestions to speed up the translation vio allows users to quickly correct agent errors these corrections are used to improve agent performance as it learns to automate work our evaluations demonstrate that this type of agent can significantly reduce task completion time freeing workers from mundane tasks
this paper presents boosting based algorithm for learning bipartite ranking function brf with partially labeled data until now different attempts had been made to build brf in transductive setting in which the test points are given to the methods in advance as unlabeled data the proposed approach is semi supervised inductive ranking algorithm which as opposed to transductive algorithms is able to infer an ordering on new examples that were not used for its training we evaluate our approach using the trec ohsumed and the reuters data collections comparing against two semi supervised classification algorithms for rocarea auc uninterpolated average precision aup mean precision tp and precision recall pr curves in the most interesting cases where there are an unbalanced number of irrelevant examples over relevant ones we show our method to produce statistically significant improvements with respect to these ranking measures
the mpi programming model hides network type and topology from developers but also allows them to seamlessly distribute computational job across multiple cores in both an intra and inter node fashion this provides for high locality performance when the cores are either on the same node or on nodes closely connected by the same network type the streaming model splits computational job into linear chain of decoupled units this decoupling allows the placement of job units on optimal nodes according to network topology furthermore the links between these units can be of varying protocols when the application is distributed across heterogeneous network in this paper we study how to integrate the mpi and stream programming models in order to exploit network locality and topology we present hybrid mpi stream framework that aims to take advantage of each model’s strengths we test our framework with financial application this application simulates an electronic market for single financial instrument stream of buy and sell orders is fed into price matching engine the matching engine creates stream of order confirmations trade confirmations and quotes based on its attempts to match buyers with sellers our results show that the hybrid mpi stream framework can deliver performance improvement at certain order transmission rates
information extraction ie is the task of extracting knowledge from unstructured text we present novel unsupervised approach for information extraction based on graph mutual reinforcement the proposed approach does not require any seed patterns or examples instead it depends on redundancy in large data sets and graph based mutual reinforcement to induce generalized extraction patterns the proposed approach has been used to acquire extraction patterns for the ace automatic content extraction relation detection and characterization rdc task ace rdc is considered hard task in information extraction due to the absence of large amounts of training data and inconsistencies in the available data the proposed approach achieves superior performance which could be compared to supervised techniques with reasonable training data
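a compact sketch of graph based mutual reinforcement between candidate patterns and the instances they match, in the spirit of a HITS style power iteration: pattern scores and instance scores reinforce each other through the bipartite match graph; the matches input is assumed to come from the redundancy in a large corpus

    import numpy as np

    def mutual_reinforcement(matches, n_patterns, n_instances, iters=50):
        # matches: list of (pattern_idx, instance_idx) pairs observed in the corpus
        A = np.zeros((n_patterns, n_instances))
        for p, i in matches:
            A[p, i] = 1.0
        pat = np.ones(n_patterns)
        inst = np.ones(n_instances)
        for _ in range(iters):
            pat = A @ inst                 # a pattern is good if it matches good instances
            inst = A.T @ pat               # an instance is good if good patterns match it
            pat /= np.linalg.norm(pat) or 1.0
            inst /= np.linalg.norm(inst) or 1.0
        return pat, inst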
constraints on the memory size of embedded systems require reducing the image size of executing programs common techniques include code compression and reduced instruction sets we propose novel technique that eliminates large portions of the executable image without compromising execution time due to decompression or code generation due to reduced instruction sets frozen code and data portions are identified using profiling techniques and removed from the loadable image they are replaced with branches to code stubs that load them in the unlikely case that they are accessed the executable is sustained in runnable mode analysis of the frozen portions reveals that most are error and uncommon input handlers only minority of the code less than that was identified as frozen during training run is also accessed with production datasets the technique was applied on three benchmark suites spec cint spec cfp and mediabench and results in image size reductions of up to and per suite the average reductions are and per suite
in this paper we propose and evaluate cluster based network server called press the server relies on locality conscious request distribution and standard for user level communication to achieve high performance and portability we evaluate press by first isolating the performance benefits of three key features of user level communication low processor overhead remote memory accesses and zero copy transfers next we compare press to servers that involve less intercluster communication but are not as easily portable our results for an node server cluster and five www traces demonstrate that user level communication can improve performance by as much as percent compared to kernel level protocol low processor overhead remote memory writes and zero copy all make nontrivial contributions toward this overall gain our results also show that portability in press causes no throughput degradation when we exploit user level communication extensively
citations to publication venues in the form of journal conference and workshop contain spelling variants acronyms abbreviated forms and misspellings all of which make it more difficult to retrieve the item of interest the task of discovering and reconciling these variant forms of bibliographic references is known as authority work the key goal is to create the so called authority files which maintain for any given bibliographic item list of variant labels ie variant strings used as reference to it in this paper we propose to use information available on the web to create high quality publication venue authority files our idea is to recognize and extract references to publication venues in the text snippets of the answers returned by search engine references to same publication venue are then reconciled in an authority file each entry in this file is composed of canonical name for the venue an acronym the venue type ie journal conference or workshop and mapping to various forms of writing its name in bibliographic citations experimental results show that our web based method for creating authority files is superior to previous work based on straight string matching techniques considering the average precision in finding correct venue canonical names we observe gains up to
in this paper we present new program analysis method which we call storage use analysis this analysis deduces how objects are used by the program and allows the optimization of their allocation this analysis can be applied to both statically typed languages eg ml and latently typed languages eg scheme it handles side effects higher order functions separate compilation and does not require cps transformation we show the application of our analysis to two important optimizations stack allocation and unboxing the first optimization replaces some heap allocations by stack allocations for user and system data storage eg lists vectors procedures the second optimization avoids boxing some objects this analysis and associated optimizations have been implemented in the bigloo scheme ml compiler experimental results show that for many allocation intensive programs we get significant speedup in particular numerically intensive programs are almost times faster because floating point numbers are unboxed and no longer heap allocated
extensive research efforts have been devoted to implement group of type safe mutually recursive classes recently proposals for separating each member of the group as reusable and composable programming unit have also been presented one problem of these proposals is verbosity of the source programs we have to declare recursive type parameter to parameterize each mutually recursive class within each class declaration and we have to declare fixed point class with empty class body for each parameterized class therefore even though the underlying type system is simple programs written in these languages tend to be rather complex and hard to understand in this paper we propose language with lightweight dependent classes that forms simple type system built on top of generic java in this language we can implement each member of type safe mutually recursive classes in separate source file without writing lot of complex boilerplate code to carefully investigate type soundness of our proposal we develop xfgj simple extension of fgj supporting lightweight dependent classes this type system is proved to be sound
as data streams are gaining prominence in growing number of emerging application domains classification on data streams is becoming an active research area currently the typical approach to this problem is based on ensemble learning which learns basic classifiers from training data stream and forms the global predictor by organizing these basic ones while this approach seems successful to some extent its performance usually suffers from two contradictory elements existing naturally within many application scenarios firstly the need for gathering sufficient training data for basic classifiers and engaging enough basic learners in voting for bias variance reduction and secondly the requirement for significant sensitivity to concept drifts which places emphasis on using recent training data and up to date individual classifiers it results in such dilemma that some algorithms are not sensitive enough to concept drifts while others although sensitive enough suffer from unsatisfactory classification accuracy in this paper we propose an ensemble learning algorithm which furnishes training data for basic classifiers starting from the up to date data chunk and searching for complement from past chunks while ruling out the data inconsistent with current concept provides effective voting by adaptively distinguishing sensible classifiers from the rest and engaging sensible ones as voters experimental results justify the superiority of this strategy in terms of both accuracy and sensitivity especially in severe circumstances where training data is extremely insufficient or concepts are evolving frequently and significantly
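an illustrative sketch of the two ingredients above, under the assumption that data arrive as labelled numpy chunks: complement the newest chunk with past examples that a model trained on the new chunk still labels correctly, ruling out data inconsistent with the current concept, and let only classifiers that remain accurate on the new chunk vote; the threshold and the choice of decision trees are hypothetical

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def build_training_set(new_X, new_y, past_chunks):
        probe = DecisionTreeClassifier().fit(new_X, new_y)   # proxy for the current concept
        X, y = [new_X], [new_y]
        for pX, py in past_chunks:
            keep = probe.predict(pX) == py                   # drop inconsistent past examples
            X.append(pX[keep])
            y.append(py[keep])
        return np.vstack(X), np.concatenate(y)

    def sensible_voters(ensemble, new_X, new_y, threshold=0.6):
        # keep only classifiers that are accurate enough on the latest chunk
        return [clf for clf in ensemble if clf.score(new_X, new_y) >= threshold]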
this paper presents the design implementation and performance evaluation of suite of resource policing mechanisms that allow guest processes to efficiently and unobtrusively exploit otherwise idle workstation resources unlike traditional policies that harvest cycles only from unused machines we employ fine grained cycle stealing to exploit resources even from machines that have active users we developed suite of kernel extensions that enable these policies to operate without significantly impacting host processes new starvation level cpu priority for guest jobs new page replacement policy that imposes hard bounds on physical memory usage by guest processes and new scheduling mechanism called rate windows that throttle guest processes usage of and network bandwidth we evaluate both the individual impacts of each mechanism and their utility for our fine grain cycle stealing
the reconfigurable devices such as cpld and fpga become more popular for its great potential on accelerating applications they are widely used as an application specified hardware accelerator many run time reconfigurable platforms are introduced such as the intel quickassist technology however it’s time consuming to design hardware accelerator while the performance is hard to determine because of the extra overheads it involved in order to estimate the efficiency of the accelerator theoretical analysis of such platforms was done in our paper three factors which impact the performance of the accelerator were concluded as well speed up ratio reconfiguration overhead and communication overhead furthermore performance model was established and an experiment on bzip was done to verify the model the results showed that the model’s estimation is very close to the real world and the average error on the efficiency’s threshold is less than
software engineer can use meaning preserving program restructuring tool during maintenance to change program’s structure to ease modification one common restructuring action is to create new abstract data type by encapsulating an existing data structure data encapsulation simplifies modification by isolating changes to the implementation and behavior of an abstract data type to perform encapsulation programmer must understand how the data structure is used in the code identify abstract operations performed on the data structure and choose concrete expressions to be made into functions we provide manipulable program visualization called the star diagram that both highlights information pertinent to encapsulation and supports the application of meaning preserving restructuring transformations on the program through direct manipulation user interface the visualization graphically and compactly presents all statements in the program that use the given global data structure helping the programmer to choose the functions that completely encapsulate it additionally the visualization elides code unrelated to the data structure and to the task and collapses similar expressions to allow the programmer to identify frequently occurring code fragments and manipulate them together the visualization is mapped directly to the program text so manipulation of the visualization also restructures the program we describe the design implementation and application of the star diagram and evaluate its ability to assist data encapsulation in large programs
in this paper we address stereo matching in the presence of class of non lambertian effects where image formation can be modeled as the additive superposition of layers at different depths the presence of such effects makes it impossible for traditional stereo vision algorithms to recover depths using direct color matching based methods we develop several techniques to estimate both depths and colors of the component layers depth hypotheses are enumerated in pairs one from each layer in nested plane sweep for each pair of depth hypotheses matching is accomplished using spatial temporal differencing we then use graph cut optimization to solve for the depths of both layers this is followed by an iterative color update algorithm which we proved to be convergent our algorithm recovers depth and color estimates for both synthetic and real image sequences
this research investigates ways of predicting which files would be most likely to contain large numbers of faults in the next release of large industrial software system previous work involved making predictions using several different models ranging from simple fully automatable model the loc model to several different variants of negative binomial regression model that were customized for the particular software system under study not surprisingly the custom models invariably predicted faults more accurately than the simple model however development of customized models requires substantial time and analytic effort as well as statistical expertise we now introduce new more sophisticated models that yield more accurate predictions than the earlier loc model but which nonetheless can be fully automated we also extend our earlier research by presenting another large scale empirical study of the value of these prediction models using new industrial software system over nine year period
this paper presents new approach to dynamically monitoring operating system kernel integrity based on property called state based control flow integrity sbcfi violations of sbcfi signal persistent unexpected modification of the kernel’s control flow graph we performed thorough analysis of linux rootkits and found that employ persistent control flow modifications an informal study of windows rootkits yielded similar results we have implemented sbcfi enforcement as part of the xen and vmware virtual machine monitors our implementation detected all the control flow modifying rootkits we could install while imposing unnoticeable overhead for both typical web server workload and cpu intensive workloads when operating at second intervals
per core local scratchpad memories allow direct inter core communication with latency and energy advantages over coherent cache based communication especially as cmp architectures become more distributed we have designed cache integrated network interfaces nis appropriate for scalable multicores that combine the best of two worlds the flexibility of caches and the efficiency of scratchpad memories on chip sram is configurably shared among caching scratchpad and virtualized ni functions this paper presents our architecture which provides local and remote scratchpad access to either individual words or multi word blocks through rdma copy furthermore we introduce event responses as mechanism for software configurable synchronization primitives we present three event response mechanisms that expose ni functionality to software for multiword transfer initiation memory barriers for explicitly selected accesses of arbitrary size and multi party synchronization queues we implemented these mechanisms in four core fpga prototype and evaluated the on chip communication performance on the prototype as well as on cmp simulator with up to cores we demonstrate efficient synchronization low overhead communication and amortized overhead bulk transfers which allow parallelization gains for fine grain tasks and efficient exploitation of the hardware bandwidth
query expansion of named entities can be employed in order to increase the retrieval effectiveness peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance and synonym relationships between terms change with time in this paper we present an approach to extracting synonyms of named entities over time from the whole history of wikipedia in addition we will use their temporal patterns as feature in ranking and classifying them into two types ie time independent or time dependent time independent synonyms are invariant to time while time dependent synonyms are relevant to particular time period ie the synonym relationships change over time further we describe how to make use of both types of synonyms to increase the retrieval effectiveness ie query expansion with time independent synonyms for an ordinary search and query expansion with time dependent synonyms for search wrt temporal criteria finally through an evaluation based on trec collections we demonstrate how retrieval performance of queries consisting of named entities can be improved using our approach
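a sketch of the time dependence test, assuming per year association strengths (e.g. co occurrence counts between the entity and the candidate synonym extracted from wikipedia history) are already available; the concentration heuristic and its threshold are illustrative stand ins for the temporal pattern features used above

    import numpy as np

    def classify_synonym(yearly_counts, concentration_threshold=0.6):
        counts = np.asarray(yearly_counts, dtype=float)
        if counts.sum() == 0:
            return "unknown"
        p = counts / counts.sum()
        top3 = np.sort(p)[-3:].sum()       # mass of the three strongest years
        return "time-dependent" if top3 >= concentration_threshold else "time-independent"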
load instructions diminish processor performance in two ways first due to the continuously widening gap between cpu and memory speed the relative latency of load instructions grows constantly and already slows program execution second memory reads limit the available instruction level parallelism because instructions that use the result of load must wait for the memory access to complete before they can start executing load value predictors alleviate both problems by allowing the cpu to speculatively continue processing without having to wait for load instructions which can significantly improve the execution speed while several hybrid load value predictors have been proposed and found to work well no systematic study of such predictors exists in this paper we investigate the performance of all hybrids that can be built out of register value last value stride delta last four value and finite context method predictor our analysis shows that hybrids can deliver percent more speedup than the best single component predictors an examination of the individual components of hybrids revealed that predictors with poor standalone performance sometimes make excellent components in hybrid while combining well performing individual predictors often does not result in an effective hybrid our hybridization study identified the register value stride delta predictor as one of the best two component hybrids it matches or exceeds the speedup of two component hybrids from the literature in spite of its substantially smaller and simpler design of all the predictors we studied the register value stride delta last four value hybrid performs best it yields harmonic mean speedup over the eight specint programs of percent
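a behavioural python sketch (not rtl, and much simpler than the hybrids studied above) of a two component last value/stride hybrid with per component confidence counters that select which component supplies the prediction; the table organisation and counter limits are illustrative

    class HybridLoadValuePredictor:
        def __init__(self):
            self.table = {}                # pc -> [last_value, stride, conf_last, conf_stride]

        def predict(self, pc):
            e = self.table.get(pc)
            if e is None:
                return None                # no prediction for unseen loads
            last, stride, c_last, c_stride = e
            return last + stride if c_stride >= c_last else last

        def update(self, pc, actual):
            last, stride, c_last, c_stride = self.table.get(pc, [actual, 0, 0, 0])
            # reward or penalise each component according to what it would have predicted
            c_last = min(c_last + 1, 15) if actual == last else max(c_last - 1, 0)
            c_stride = min(c_stride + 1, 15) if actual == last + stride else max(c_stride - 1, 0)
            self.table[pc] = [actual, actual - last, c_last, c_stride]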
this paper presents lambda language for dynamic tracking of information flow across multiple interdependent dimensions of information typical dimensions of interest are integrity and confidentiality lambda supports arbitrary domain specific policies that can be developed independently lambda treats information flow metadata as first class entity and tracks information flow on the metadata itself integrity on integrity integrity on confidentiality etc this paper also introduces impolite novel class of information flow policies for lambda unlike many systems which only allow for absolute security relations impolite can model more realistic security policies based on relative security relations impolite demonstrates how policies on interdependent dimensions of information can be simultaneously enforced within lambda i’s unified framework
reputation systems have been popular in estimating the trustworthiness and predicting the future behavior of nodes in large scale distributed system where nodes may transact with one another without prior knowledge or experience one of the fundamental challenges in distributed reputation management is to understand vulnerabilities and develop mechanisms that can minimize the potential damages to system by malicious nodes in this paper we identify three vulnerabilities that are detrimental to decentralized reputation management and propose trustguard safeguard framework for providing highly dependable and yet efficient reputation system first we provide dependable trust model and set of formal methods to handle strategic malicious nodes that continuously change their behavior to gain unfair advantages in the system second transaction based reputation system must cope with the vulnerability that malicious nodes may misuse the system by flooding feedbacks with fake transactions third but not least we identify the importance of filtering out dishonest feedbacks when computing reputation based trust of node including the feedbacks filed by malicious nodes through collusion our experiments show that comparing with existing reputation systems our framework is highly dependable and effective in countering malicious nodes regarding strategic oscillating behavior flooding malevolent feedbacks with fake transactions and dishonest feedbacks
question classification is an important step in factual question answering qa and other dialog systems several attempts have been made to apply statistical machine learning approaches including support vector machines svms with sophisticated features and kernels curiously the payoff beyond simple bag of words representation has been small we show that most questions reveal their class through short contiguous token subsequence which we call its informer span perfect knowledge of informer spans can enhance accuracy from to using linear svms on standard benchmarks in contrast standard heuristics based on shallow pattern matching give only improvement showing that the notion of an informer is non trivial using novel multi resolution encoding of the question’s parse tree we induce conditional random field crf to identify informer spans with about accuracy then we build meta classifier using linear svm on the crf output enhancing accuracy to which is better than all published numbers
content filtering based intrusion detection systems have been widely deployed in enterprise networks and have become standard measure to protect networks and network users from cyber attacks although several solutions have been proposed recently finding an efficient solution is considered as difficult problem due to the limitations in resources such as small memory size as well as the growing link speed in this paper we present novel content filtering technique called table driven bottom up tree tbt which was designed to fully exploit hardware parallelism to achieve real time packet inspection ii to require small memory for storing signatures iii to be flexible in modifying the signature database and iv to support complex signature representation such as regular expressions we configured tbt considering the hardware specifications and limitations and implemented it using fpga simulation based performance evaluations showed that the proposed technique used only kilobytes of memory for storing the latest version of snort rule consisting of signatures in addition unlike many other hardware based solutions modification to signature database does not require hardware re compilation in tbt
the goal of this article is to review the state of the art tracking methods classify them into different categories and identify new trends object tracking in general is challenging problem difficulties in tracking objects can arise due to abrupt object motion changing appearance patterns of both the object and the scene nonrigid object structures object to object and object to scene occlusions and camera motion tracking is usually performed in the context of higher level applications that require the location and or shape of the object in every frame typically assumptions are made to constrain the tracking problem in the context of particular application in this survey we categorize the tracking methods on the basis of the object and motion representations used provide detailed descriptions of representative methods in each category and examine their pros and cons moreover we discuss the important issues related to tracking including the use of appropriate image features selection of motion models and detection of objects
ensuring model quality is key success factor in many computer science areas and becomes crucial in recent software engineering paradigms like the one proposed by model driven software development tool support for measurements and redesigns becomes essential to help developers improve the quality of their models however developing such helper tools for the wide variety of frequently domain specific visual notations used by software engineers is hard and repetitive task that does not take advantage from previous developments thus being frequently forgotten in this paper we present our approach for the visual specification of measurements and redesigns for domain specific visual languages dsvls with this purpose we introduce novel dsvl called slammer that contains generalisations of some of the more used types of internal product measurements and redesigns the goal is to facilitate the task of defining measurements and redesigns for any dsvl as well as the generation of tools from such specification reducing or eliminating the necessity of coding we rely on the use of visual patterns for the specification of the relevant elements for each measurement and redesign type in addition slammer allows the specification of redesigns either procedurally or by means of graph transformation rules these redesigns can be triggered when the measurements reach certain threshold these concepts have been implemented in the meta modelling tool atom in this way when dsvl is designed it is possible to specify measurements and redesigns that will become available in the final modelling environment generated for the language as an example we show case study in the web modelling domain
low latency anonymity systems are susceptive to traffic analysis attacks in this paper we propose dependent link padding scheme to protect anonymity systems from traffic analysis attacks while providing strict delay bound the covering traffic generated by our scheme uses the minimum sending rate to provide full anonymity for given set of flows the relationship between user anonymity and the minimum covering traffic rate is then studied via analysis and simulation when user flows are poisson processes with the same sending rate the minimum covering traffic rate to provide full anonymity to users is log for pareto traffic we show that the rate of the covering traffic converges to constant when the number of flows goes to infinity finally we use real internet trace files to study the behavior of our algorithm when user flows have different rates
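a sketch of a greedy dependent link padding schedule in the spirit described above, assuming per flow lists of packet arrival times and a common delay bound: whenever some flow's oldest pending packet reaches its deadline, a transmission slot is emitted in which every flow sends one packet (a real one if pending, a dummy otherwise), so all output flows look identical to an observer

    def padding_schedule(flows, delay_bound):
        # flows: one sorted list of packet arrival times per flow
        heads = [0] * len(flows)           # index of the next unserved packet per flow
        slots = []
        while True:
            deadlines = [flows[f][heads[f]] + delay_bound
                         for f in range(len(flows)) if heads[f] < len(flows[f])]
            if not deadlines:
                break
            s = min(deadlines)             # most urgent head of line packet
            slots.append(s)
            for f in range(len(flows)):    # every flow transmits in this slot
                if heads[f] < len(flows[f]) and flows[f][heads[f]] <= s:
                    heads[f] += 1          # real packet served; other flows send dummies
        return slots                       # sending rate per flow ~ len(slots) / trace length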
virtual machine monitors vmms have been hailed as the basis for an increasing number of reliable or trusted computing systems the xen vmm is relatively small piece of software hypervisor that runs at lower level than conventional operating system in order to provide isolation between virtual machines its size is offered as an argument for its trustworthiness however the management of xen based system requires privileged full blown operating system to be included in the trusted computing base tcb in this paper we introduce our work to disaggregate the management virtual machine in xen based system we begin by analysing the xen architecture and explaining why the status quo results in large tcb we then describe our implementation which moves the domain builder the most important privileged component into minimal trusted compartment we illustrate how this approach may be used to implement trusted virtualisation and improve the security of virtual tpm implementations finally we evaluate our approach in terms of the reduction in tcb size and by performing security analysis of the disaggregated system
java is high productivity object oriented programming language that is rapidly gaining popularity in high performance application development one major obstacle to its broad acceptance is its mediocre performance when compared with fortran or especially if the developers use object oriented features of the language extensively previous work in improving the performance of object oriented high performance scientific java applications consisted of high level compiler optimization and analysis strategies such as class specialization and object inlining this paper extends prior work on object inlining by improving the analysis and developing new code transformation techniques to further improve the performance of high performance applications written in high productivity object oriented style two major impediments to effective object inlining are object and array aliasing and binary method invocations this paper implements object and array alias strategies to address the aliasing problem while utilizing an idea from telescoping languages to address the binary method invocation problem application runtime gains of up to result from employing these techniques these improvements should further increase the scientific community’s acceptance of the java programming language in the development of high performance high productivity scientific applications
as part of an architectural modeling project this paper investigates the problem of understanding and manipulating images of buildings our primary motivation is to automatically detect and seamlessly remove unwanted foreground elements from urban scenes without explicit handling these objects will appear pasted as artifacts on the model recovering the building facade in video sequence is relatively simple because parallax induces foreground background depth layers but here we consider static images only we develop series of methods that enable foreground removal from images of buildings or brick walls the key insight is to use a priori knowledge about grid patterns on building facades that can be modeled as near regular textures nrt we describe markov random field mrf model for such textures and introduce markov chain monte carlo mcmc optimization procedure for discovering them this simple spatial rule is then used as starting point for inference of missing windows facade segmentation outlier identification and foreground removal
large scientific parallel applications demand large amounts of memory space current parallel computing platforms schedule jobs without fully knowing their memory requirements this leads to uneven memory allocation in which some nodes are overloaded this in turn leads to disk paging which is extremely expensive in the context of scientific parallel computing to solve this problem we propose new peer to peer solution called parallel network ram this approach avoids the use of disk better utilizes available ram resources and will allow larger problems to be solved while reducing the computational communication and synchronization overhead typically involved in parallel applications we proposed several different parallel network ram designs and evaluated the performance of each under different conditions we discovered that different designs are appropriate in different situations
we show the practical feasibility of monitoring complex security properties using runtime monitoring approach for metric first order temporal logic in particular we show how wide variety of security policies can be naturally formalized in this expressive logic ranging from traditional policies like chinese wall and separation of duty to more specialized usage control and compliance requirements we also explain how these formalizations can be directly used for monitoring and experimentally evaluate the performance of the resulting monitors
the continuing trend toward greater processing power larger storage and in particular increased display surface by using multiple monitor supports increased multi tasking by the computer user the concomitant increase in desktop complexity has the potential to push the overhead of window management to frustrating and counterproductive new levels it is difficult to adequately design for multiple monitor systems without understanding how multiple monitor users differ from or are similar to single monitor users therefore we deployed tool to group of single monitor and multiple monitor users to log window management activity analysis of the data collected from this tool revealed that usage of interaction components may change with an increase in number of monitors and window visibility can be useful measure of user display space management activity especially for multiple monitor users the results from this analysis begin to fill gap in research about real world window management practices
graph data are subject to uncertainties in many applications due to incompleteness and imprecision of data mining uncertain graph data is semantically different from and computationally more challenging than mining exact graph data this paper investigates the problem of mining frequent subgraph patterns from uncertain graph data the frequent subgraph pattern mining problem is formalized by designing new measure called expected support an approximate mining algorithm is proposed to find an approximate set of frequent subgraph patterns by allowing an error tolerance on the expected supports of the discovered subgraph patterns the algorithm uses an efficient approximation algorithm to determine whether subgraph pattern can be output or not the analytical and experimental results show that the algorithm is very efficient accurate and scalable for large uncertain graph databases
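a simple monte carlo sketch that makes the expected support measure concrete, not the paper's approximate mining algorithm: each uncertain graph is a list of edges with independent existence probabilities, and the expected support of a pattern is the sum over database graphs of the probability that the pattern occurs

    import random
    import networkx as nx
    from networkx.algorithms import isomorphism

    def contains_pattern(g, pattern):
        return isomorphism.GraphMatcher(g, pattern).subgraph_is_isomorphic()

    def expected_support(uncertain_graphs, pattern, samples=200):
        # uncertain_graphs: list of graphs, each given as a list of (u, v, prob) edges
        total = 0.0
        for edges in uncertain_graphs:
            hits = 0
            for _ in range(samples):
                g = nx.Graph()
                g.add_edges_from((u, v) for u, v, p in edges if random.random() < p)
                hits += contains_pattern(g, pattern)
            total += hits / samples        # estimated probability this graph contains the pattern
        return total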
hierarchical multi label classification hmc is variant of classification where instances may belong to multiple classes at the same time and these classes are organized in hierarchy this article presents several approaches to the induction of decision trees for hmc as well as an empirical study of their use in functional genomics we compare learning single hmc tree which makes predictions for all classes together to two approaches that learn set of regular classification trees one for each class the first approach defines an independent single label classification task for each class sc obviously the hierarchy introduces dependencies between the classes while they are ignored by the first approach they are exploited by the second approach named hierarchical single label classification hsc depending on the application at hand the hierarchy of classes can be such that each class has at most one parent tree structure or such that classes may have multiple parents dag structure the latter case has not been considered before and we show how the hmc and hsc approaches can be modified to support this setting we compare the three approaches on yeast data sets using as classification schemes mips’s funcat tree structure and the gene ontology dag structure we show that hmc trees outperform hsc and sc trees along three dimensions predictive accuracy model size and induction time we conclude that hmc trees should definitely be considered in hmc tasks where interpretable models are desired
time dependent models have been intensively studied for many reasons among others because of their applications in software verification and due to the development of embedded platforms where reliability and safety depend to large extent on the time features many of the time dependent models were suggested as real time extensions of several well known untimed models the most studied formalisms include networks of timed automata which extend the model of communicating finite state machines with finite number of real valued clocks and timed extensions of petri nets where the added time constructs include eg time intervals that are assigned to the transitions time petri nets or to the arcs timed arc petri nets in this paper we shall semi formally introduce these models discuss their strengths and weaknesses and provide an overview of the known results about the relationships among the models
spline joints are novel class of joints that can model general scleronomic constraints for multibody dynamics based on the minimal coordinates formulation the main idea is to introduce spline curves and surfaces in the modeling of joints we model dof joints using splines on se and construct multi dof joints as the product of exponentials of splines in euclidean space we present efficient recursive algorithms to compute the derivatives of the spline joint as well as geometric algorithms to determine optimal parameters in order to achieve the desired joint motion our spline joints can be used to create interesting new simulated mechanisms for computer animation and they can more accurately model complex biomechanical joints such as the knee and shoulder
the iceberg cube mining computes all cells corresponding to group by partitions that satisfy given constraint on aggregated behaviors of the tuples in group by partition the number of cells often is so large that the result cannot be realistically searched without pushing the constraint into the search previous works have pushed antimonotone and monotone constraints however many useful constraints are neither antimonotone nor monotone we consider general class of aggregate constraints of the form f theta sigma where f is an arithmetic function of sql like aggregates and theta is a comparison operator we propose novel pushing technique called divide and approximate to push such constraints the idea is to recursively divide the search space and approximate the given constraint using antimonotone or monotone constraints in subspaces this technique applies to class called separable constraints which properly contains all constraints built by an arithmetic function of all sql aggregates
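a toy illustration of approximating a constraint that is neither antimonotone nor monotone by a weaker antimonotone test, for the concrete case avg(x) >= sigma: no subset of the tuples in a branch can average above the branch maximum, so the whole branch can be pruned when that optimistic bound already fails; this shows only the flavour of the approximation step, not the full divide and approximate algorithm

    def can_prune_avg_branch(x_values, sigma):
        # x_values: measure values of the tuples falling in the current branch
        # prune if even the most optimistic subset average cannot reach sigma
        return (not x_values) or max(x_values) < sigma

    # example: sigma = 10 and branch values [3, 7, 9] can be pruned, since even
    # the best possible subset average (9) cannot reach 10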
finite state verification eg model checking provides powerful means to detect errors that are often subtle and difficult to reproduce nevertheless the transition of this technology from research to practice has been slow while there are number of potential causes for reluctance in adopting such formal methods in practice we believe that primary cause rests with the fact that practitioners are unfamiliar with specification processes notations and strategies recent years have seen growing success in leveraging experience with design and coding patterns we propose pattern based approach to the presentation codification and reuse of property specifications for finite state verification
structured documents eg sgml can benefit lot from database support and more specifically from object oriented database oodb management systems this paper describes natural mapping from sgml documents into oodb’s and formal extension of two oodb query languages one sql like and the other calculus in order to deal with sgml document retrieval although motivated by structured documents the extensions of query languages that we present are general and useful for variety of other oodb applications key element is the introduction of paths as first class citizens the new features allow querying data and to some extent schema without exact knowledge of the schema in simple and homogeneous fashion
devices are increasingly vulnerable to soft errors as their feature sizes shrink previously soft error rates were significant primarily in space and high atmospheric computing modern architectures now use features so small at sufficiently low voltages that soft errors are becoming important even at terrestrial altitudes due to their large number of components supercomputers are particularly susceptible to soft errors since many large scale parallel scientific applications use iterative linear algebra methods the soft error vulnerability of these methods constitutes large fraction of the applications overall vulnerability many users consider these methods invulnerable to most soft errors since they converge from an imprecise solution to precise one however we show in this paper that iterative methods are vulnerable to soft errors exhibiting both silent data corruptions and poor ability to detect errors further we evaluate variety of soft error detection and tolerance techniques including checkpointing linear matrix encodings and residual tracking techniques
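a small sketch of residual tracking as an error detector, using a plain jacobi iteration for A x = b that recomputes the true residual every few steps and rolls back to the last checkpoint when the residual jumps instead of shrinking; the checking interval and jump threshold are illustrative

    import numpy as np

    def jacobi_with_residual_check(A, b, iters=1000, check_every=20, tol=1e-8):
        D = np.diag(A)
        R = A - np.diagflat(D)
        x = np.zeros_like(b, dtype=float)
        checkpoint, last_res = x.copy(), np.linalg.norm(b - A @ x)
        for k in range(iters):
            x = (b - R @ x) / D                      # one jacobi sweep
            if (k + 1) % check_every == 0:
                res = np.linalg.norm(b - A @ x)
                if res > 10 * last_res:              # suspicious jump: possible silent corruption
                    x = checkpoint.copy()            # roll back and redo the block
                else:
                    checkpoint, last_res = x.copy(), res
                    if res < tol:
                        break
        return x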
the world wide web has now become humongous archive of various contents the inordinate amount of information found on the web presents challenge to deliver right information to the right users on one hand the abundant information is freely accessible to all web denizens on the other hand much of such information may be irrelevant or even deleterious to some users for example some control and filtering mechanisms are desired to prevent inappropriate or offensive materials such as pornographic websites from reaching children ways of accessing websites are termed as access scenarios an access scenario can include using search engines eg image search that has very little textual content url redirection to some websites or directly typing porn website urls in this paper we propose framework to analyze website from several different aspects or information sources and generate classification model aiming to accurately classify such content irrespective of access scenarios extensive experiments are performed to evaluate the resulting system which illustrates the promise of the proposed approach
the possibility for spontaneous ad hoc networks between mobile devices has been increasing as small devices become more capable of hosting useful networked applications these applications face the challenges of frequent disconnections highly dynamic network topologies and varying communication patterns combination unique to mobile ad hoc networks this is the first survey to examine current manet programming approaches including tuple spaces remote objects publish subscribe and code migration through analysis and experimental results we suggest that these approaches are essentially extensions to existing distributed and parallel computing concepts and new abstractions may be necessary to fully handle the programming issues presented by manets
many techniques have been developed to perform indoor location each strategy has its own advantages and drawbacks with the application demanding location information the main determinant of the system to be used in this paper system is presented that serves location to innovative services for elderly and disabled people ranging from alarm and monitoring to support for navigation and leisure the system uses zigbee and ultrasound to fulfill the application requirements differing in this respect from all other existing systems zups zigbee and ultrasound positioning system provides wide multicell coverage easy extension robustness even in crowded scenarios different levels of precision depending on the user’s profile and service requirements from few centimeters to meters limited infrastructure requirements simple calibration and cost effectiveness the system has been evaluated from the technical functional and usability standpoints with satisfactory results and its suitability has also been demonstrated in residence for people with disabilities located in zaragoza spain
we present technique for implementing visual language compilers through standard compiler generation platforms the technique exploits extended positional grammars xpgs for short for modeling the visual languages in natural way and uses set of mapping rules to translate an xpg specification into translation schema this lets us generate visual language parsers through standard compiler compiler techniques and tools like yacc the generated parser accepts exactly the same set of visual sentences derivable through the application of xpg productions the technique represents an important achievement since it enables us to perform visual language compiler construction through standard compiler compilers rather than specific compiler generation tools this makes our approach particularly appealing since compiler compilers are widely used and rely on well founded theory moreover the approach provides the basis for the unification of traditional textual language technologies and visual language compiler technologies
program analysis is at the heart of modern compilers most control flow analyses are reduced to the problem of finding fixed point in certain transition system and such fixed point is commonly computed through an iterative procedure that repeats tracing until convergence this paper proposes new method to analyze programs through recursive graph traversals instead of iterative procedures based on the fact that most programs without spaghetti goto have well structured control flow graphs graphs with bounded tree width our main techniques are an algebraic construction of control flow graph called sp term which enables control flow analysis to be defined in natural recursive form and the optimization theorem which enables us to compute optimal solution by dynamic programming we illustrate our method with two examples dead code detection and register allocation different from the traditional standard iterative solution our dead code detection is described as simple combination of bottom up and top down traversals on sp term register allocation is more interesting as it further requires optimality of the result we show how the optimization theorem on sp terms works to find an optimal register allocation as certain dynamic programming
data dominated signal processing applications are typically described using large and multi dimensional arrays and loop nests the order of production and consumption of array elements in these loop nests has huge impact on the amount of memory required during execution this is essential since the size and complexity of the memory hierarchy is the dominating factor for power performance and chip size in these applications this paper presents number of guiding principles for the ordering of the dimensions in the loop nests they enable the designer or design tools to find the optimal ordering of loop nest dimensions for individual data dependencies in the code we prove the validity of the guiding principles when no prior restrictions are given regarding fixation of dimensions if some dimensions are already fixed at given nest levels this is taken into account when fixing the remaining dimensions in most cases an optimal ordering is found for this situation as well the guiding principles can be used in the early design phases in order to enable minimization of the memory requirement through in place mapping we use real life examples to show how they can be applied to reach cost optimized end product the results show orders of magnitude improvement in memory requirement compared to using the declared array sizes and similar penalties for choosing the suboptimal ordering of loops when in place mapping is exploited
in this paper we depart from tcp probing tsaoussidis and badr and propose an experimental transport protocol that achieves energy and throughput performance gains in mixed wired and wireless environments our approach (i) decouples error recovery from contention estimation and focuses on the way these two mechanisms can feed the probing decision process and (ii) implements the protocol strategy by shaping traffic according to detected conditions we use validation mechanism that uncovers previous possibly wrong estimations our analysis matches well with our simulation results which are very promising
the medial axis is classical representation of digital objects widely used in many applications however such set of balls may not be optimal since subsets of the medial axis may exist without changing the reversibility of the input shape representation in this article we first prove that finding minimum medial axis is an np hard problem for the euclidean distance then we compare two algorithms which compute an approximation of the minimum medial axis one of them providing bounded approximation results
we present graph based intermediate representation ir with simple semantics and low memory cost implementation the ir uses directed graph with labeled vertices and ordered inputs but unordered outputs vertices are labeled with opcodes edges are unlabeled we represent the cfg and basic blocks with the same vertex and edge structures each opcode is defined by class that encapsulates opcode specific data and behavior we use inheritance to abstract common opcode behavior allowing new opcodes to be easily defined from old ones the resulting ir is simple fast and easy to use
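as an illustration of this kind of ir, the sketch below (python, with names of our own choosing rather than anything from the paper) models vertices labeled by opcode classes, ordered inputs, unordered outputs and behaviour shared through inheritance, with a tiny constant folding step as a usage example

```python
# a minimal sketch (not the authors' implementation) of a graph based ir where
# each vertex carries an opcode class, inputs are ordered and outputs are a set
class Node:
    def __init__(self, *inputs):
        self.inputs = list(inputs)      # ordered use edges
        self.outputs = set()            # unordered def -> use back edges
        for i in inputs:
            i.outputs.add(self)

    def opcode(self):
        return type(self).__name__.lower()

class Const(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

class BinOp(Node):                      # common behaviour shared via inheritance
    op = None
    def fold(self):
        a, b = self.inputs
        if isinstance(a, Const) and isinstance(b, Const):
            return Const(self.op(a.value, b.value))
        return self

class Add(BinOp):
    op = staticmethod(lambda a, b: a + b)

class Mul(BinOp):
    op = staticmethod(lambda a, b: a * b)

# usage: fold (2 + 3) then (5 * 4) down to a single constant vertex
a = Add(Const(2), Const(3)).fold()      # -> Const(5)
m = Mul(a, Const(4)).fold()             # -> Const(20)
print(m.opcode(), m.value)
```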
the emergence of location aware services calls for new real time spatio temporal query processing algorithms that deal with large numbers of mobile objects and queries online query response is an important characterization of location aware services delay in the answer to query gives invalid and obsolete results simply because moving objects can change their locations before the query responds to handle large numbers of spatio temporal queries efficiently we propose the idea of sharing as means to achieve scalability in this paper we introduce several types of sharing in the context of continuous spatio temporal queries examples of sharing in the context of real time spatio temporal database systems include sharing the execution sharing the underlying space sharing the sliding time windows and sharing the objects of interest we demonstrate how sharing can be integrated into query predicates eg selection and spatial join processing the goal of this paper is to outline research directions and approaches that will lead to scalable and efficient location aware services
we discuss information retrieval methods that aim at serving diverse stream of user queries such as those submitted to commercial search engines we propose methods that emphasize the importance of taking into consideration of query difference in learning effective retrieval functions we formulate the problem as multi task learning problem using risk minimization framework in particular we show how to calibrate the empirical risk to incorporate query difference in terms of introducing nuisance parameters in the statistical models and we also propose an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters we work out the details for both and regularization cases and provide convergence analysis for the alternating optimization method for the special case when the retrieval functions belong to reproducing kernel hilbert space we illustrate the effectiveness of the proposed methods using modeling data extracted from commercial search engine we also point out how the current framework can be extended in future research
the current research progress and the existing problems of uncertain or imprecise knowledge representation and reasoning in description logics are analyzed in this paper approximate concepts are introduced to description logics based on rough set theory and kind of new rough description logic rdl rough description logic based on approximate concepts is proposed based on approximate concepts the syntax semantics and properties of the rdl are given it is proved that the approximate concept satisfiability definitely satisfiability and possibly satisfiability reasoning problem and approximate concepts rough subsumption reasoning problem wrt rough tbox in rdl may be reduced to the concept satisfiability reasoning problem in almost standard alc the description logic that provides the boolean concept constructors plus the existential and universal restriction constructors the works of this paper provide logic foundations for approximate ontologies and theoretical foundations for reasoning algorithms of more expressive rough description logics including approximate concepts number restrictions nominals inverse roles and role hierarchies
the main contribution of this paper is compiler based cache topology aware code optimization scheme for emerging multicore systems this scheme distributes the iterations of loop to be executed in parallel across the cores of target multicore machine and schedules the iterations assigned to each core our goal is to improve the utilization of the on chip multi layer cache hierarchy and to maximize overall application performance we evaluate our cache topology aware approach using set of twelve applications and three different commercial multicore machines in addition to study some of our experimental parameters in detail and to explore future multicore machines with higher core counts and deeper on chip cache hierarchies we also conduct simulation based study the results collected from our experiments with three intel multicore machines show that the proposed compiler based approach is very effective in enhancing performance in addition our simulation results indicate that optimizing for the on chip cache hierarchy will be even more important in future multicores with increasing numbers of cores and cache levels
design patterns have become widely acknowledged software engineering practice and therefore have been incorporated in the curricula of most computer science departments this paper presents an observational study on students ability to understand and apply design patterns within the context of postgraduate software engineering course students had to deliver two versions of software system one without and one with design patterns the former served as poorly designed system suffering from architectural problems while the latter served as an improved system where design problems had been solved by appropriate patterns the experiment allowed the quantitative evaluation of students preference to patterns moreover it was possible to assess students ability in relating design problems with patterns and interpreting the impact of patterns on software metrics the overall goal was to empirically identify ways in which course on design patterns could be improved
biometric based personal authentication is an effective method for automatically recognizing with high confidence person’s identity by observing that the texture pattern produced by bending the finger knuckle is highly distinctive in this paper we present new biometric authentication system using finger knuckle print fkp imaging specific data acquisition device is constructed to capture the fkp images and then an efficient fkp recognition algorithm is presented to process the acquired data in real time the local convex direction map of the fkp image is extracted based on which local coordinate system is established to align the images and region of interest is cropped for feature extraction for matching two fkps feature extraction scheme which combines orientation and magnitude information extracted by gabor filtering is proposed an fkp database which consists of images from different fingers is established to verify the efficacy of the proposed system and promising results are obtained compared with the other existing finger back surface based biometric systems the proposed fkp system achieves much higher recognition rate and it works in real time it provides practical solution to finger back surface based biometric systems and has great potentials for commercial applications
whereas the remote procedure call rpc abstraction including its derivates such as remote method invocation has proven to be an adequate programming paradigm for client server applications over lans type based publish subscribe tps is an appealing candidate programming abstraction for decoupled and completely decentralized applications that run over large scale and mobile networks tps enforces type safety and encapsulation just like rpc while providing decoupling and scalability properties unlike rpc two tps implementations in java demonstrate this approach’s potential the first is seminal approach relying on specific primitives added to the java language the second is library implementation based on more general recent java mechanisms avoiding any specific compilation
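a minimal sketch of the type based publish subscribe idea, assuming only that subscriptions are keyed by an event type and that delivery is filtered by isinstance checks; it is not either of the java implementations discussed above

```python
# a small illustrative sketch of type based publish subscribe: subscribers
# register on an event type and receive only instances of that type (or of
# one of its subtypes)
from collections import defaultdict

class TpsBus:
    def __init__(self):
        self._subs = defaultdict(list)          # event type -> callbacks

    def subscribe(self, event_type, callback):
        self._subs[event_type].append(callback)

    def publish(self, event):
        for etype, callbacks in self._subs.items():
            if isinstance(event, etype):        # type safety: dispatch by type
                for cb in callbacks:
                    cb(event)

class StockQuote:
    def __init__(self, symbol, price):
        self.symbol, self.price = symbol, price

bus = TpsBus()
bus.subscribe(StockQuote, lambda q: print(q.symbol, q.price))
bus.publish(StockQuote("ACME", 42.0))           # delivered
bus.publish("unrelated event")                  # silently ignored
```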
in this paper we present task assignment policy suited to environments such as high volume web serving clusters where local centralised dispatchers are utilised to distribute tasks amongst back end hosts offering mirrored services with negligible cost work conserving migration available between hosts the taptf wc task assignment based on prioritising traffic flows with work conserving migration policy was specifically created to exploit such environments as such taptf wc exhibits consistently good performance over wide range of task distribution scenarios due to its flexible nature spreading the work over multiple hosts when prudent and separating short task flows from large task flows via the use of dual queues tasks are migrated in work conserving manner reducing the penalty associated with task migration found in many existing policies such as tags and taptf which restart tasks upon migration we find that the taptf wc policy is well suited for load distribution under wide range of different workloads in environments where task sizes are not known priori and negligible cost work conserving migration is available
previous research has shown that rotation and orientation of items plays three major roles during collaboration comprehension coordination and communication based on these roles of orientation and advice from kinesiology research we have designed the rotate’n translate rnt interaction mechanism which provides integrated control of rotation and translation using only single touch point for input we present an empirical evaluation comparing rnt to common rotation mechanism that separates control of rotation and translation results of this study indicate rnt is more efficient than the separate mechanism and better supports the comprehension coordination and communication roles of orientation
the integration of knowledge from multiple sources is an important aspect in several areas such as data warehousing database integration automated reasoning systems active reactive databases and others thus central topic in databases is the construction of integration systems designed for retrieving and querying uniform data stored in multiple information sources this chapter illustrates recent techniques for computing repairs as well as consistent answers over inconsistent databases often databases may be inconsistent with respect to set of integrity constraints that is one or more integrity constraints are not satisfied most of the techniques for computing repairs and queries over inconsistent databases work for restricted cases and only recently there have been proposals to consider more general constraints in this chapter we give an informal description of the main techniques proposed in the literature
although graphical user interfaces guis constitute large part of the software being developed today and are typically created using rapid prototyping there are no effective regression testing techniques for guis the needs of gui regression testing differ from those of traditional software when the structure of gui is modified test cases from the original gui are either reusable or unusable on the modified gui since gui test case generation is expensive our goal is to make the unusable test cases usable the idea of reusing these unusable aka obsolete test cases has not been explored before in this paper we show that for guis the unusability of large number of test cases is serious problem we present novel gui regression testing technique that first automatically determines the usable and unusable test cases from test suite after gui modification it then determines which of the unusable test cases can be repaired so they can execute on the modified gui the last step is to repair the test cases our technique is integrated into gui testing framework that given test case automatically executes it on the gui we implemented our regression testing technique and demonstrate for two case studies that our approach is effective in that many of the test cases can be repaired and is practical in terms of its time performance
existing methods for white balancing photographs tend to rely on skilled interaction from the user which is prohibitive for most amateur photographers we propose minimal interaction system for white balancing photographs that contain humans many of the pictures taken by amateur photographers fall into this category our system matches user selected patch of skin in photograph to an entry in skin reflectance function database the estimate of the illuminant that emerges from the skin matching can be used to white balance the photograph allowing users to compensate for biased illumination in an image with single click we compare the quality of our results to output from three other low interaction methods including commercial approaches such as google picasa’s one click relighting whitepoint based algorithm and ebner’s localized gray world algorithm the comparisons indicate that our approach offers several advantages for amateur photographers
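a hedged sketch of the single click correction step, assuming a diagonal (von kries style) gain computed from a user selected skin patch; the canonical skin colour below is a made up placeholder, not a value from the skin reflectance database used in the paper

```python
# estimate the illuminant from a user selected skin patch and divide out the
# resulting colour cast channel by channel
import numpy as np

CANONICAL_SKIN_RGB = np.array([0.75, 0.57, 0.48])   # hypothetical reference colour

def white_balance(image, patch_box):
    """image: float array (h, w, 3) in [0, 1]; patch_box: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = patch_box
    patch_mean = image[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    illuminant = patch_mean / CANONICAL_SKIN_RGB    # per channel cast estimate
    illuminant /= illuminant.max()                  # dominant channel stays fixed
    corrected = image / illuminant                  # divide out the colour cast
    return np.clip(corrected, 0.0, 1.0)

# usage on a synthetic warm tinted image (the selected patch is arbitrary here)
img = np.random.rand(100, 100, 3) * np.array([1.0, 0.8, 0.6])
balanced = white_balance(img, (40, 60, 40, 60))
```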
most prolog implementations are implemented in low level languages such as and are based on variation of the wam instruction set which enhances their performance but makes them hard to write in addition many of the more dynamic features of prolog like assert despite their popularity are not well supported we present high level continuation based prolog interpreter based on the pypy project the pypy project makes it possible to easily and efficiently implement dynamic languages it provides tools that automatically generate just in time compiler for given interpreter of the target language by using partial evaluation techniques the resulting prolog implementation is surprisingly efficient it clearly outperforms existing interpreters of prolog in high level languages such as java moreover on some benchmarks our system outperforms state of the art wam based prolog implementations our paper aims to show that declarative languages such as prolog can indeed benefit from having just in time compiler and that pypy can form the basis for implementing programming languages other than python
prefetching is an important technique to reduce the average web access latency existing prefetching methods are based mostly on url graphs they use the graphical nature of http links to determine the possible paths through hypertext system although the url graph based approaches are effective in prefetching of frequently accessed documents few of them can prefetch those urls that are rarely visited this paper presents keyword based semantic prefetching approach to overcome the limitation it predicts future requests based on semantic preferences of past retrieved web documents we apply this technique to internet news services and implement client side personalized prefetching system newsagent the system exploits semantic preferences by analyzing keywords in url anchor text of previously accessed documents in different news categories it employs neural network model over the keyword set to predict future requests the system features self learning capability and good adaptability to the change of client surfing interest newsagent does not exploit keyword synonymy for conservativeness in prefetching however it alleviates the impact of keyword polysemy by taking into account server provided categorical information in decision making and hence captures more semantic knowledge than term document literal matching methods experimental results from daily browsing of abc news cnn and msnbc news sites for period of three months show an achievement of up to percent hit ratio due to prefetching
this paper studies the problem of pruning an ensemble of classifiers from reinforcement learning perspective it contributes new pruning approach that uses the learning algorithm in order to approximate an optimal policy of choosing whether to include or exclude each classifier from the ensemble extensive experimental comparisons of the proposed approach against state of the art pruning and combination methods show very promising results additionally we present an extension that allows the improvement of the solutions returned by the proposed approach over time which is very useful in certain performance critical domains
we introduce modular framework for distributed abstract argumentation where the argumentation context that is information about preferences among arguments values validity reasoning mode skeptical vs credulous and even the chosen semantics can be explicitly represented the framework consists of collection of abstract argument systems connected via mediators each mediator integrates information coming from connected argument systems thereby handling conflicts within this information and provides the context used in particular argumentation module the framework can be used in different directions eg for hierarchic argumentation as typically found in legal reasoning or to model group argumentation processes
in this paper we present an improved model of ambient intelligence based on non traditional grids we study the fault tolerance of the proposed model taking into account the redundancy at link level
we propose to exploit three valued abstraction to stochastic systems in compositional way this combines the strengths of an aggressive state based abstraction technique with compositional modeling applying this principle to interactive markov chains yields abstract models that combine interval markov chains and modal transition systems in natural and orthogonal way we prove the correctness of our technique for parallel and symmetric composition and show that it yields lower bounds for minimal and upper bounds for maximal timed reachability probabilities
innovation in the fields of wireless data communications mobile devices and biosensor technology enables the development of new types of monitoring systems that provide people with assistance anywhere and at any time in this paper we present an architecture useful to build those kind of systems that monitor data streams generated by biological sensors attached to mobile users we pay special attention to three aspects related to the system efficiency selection of the optimal granularity that is the selection of the size of the input data stream package that has to be acquired in order to start new processing cycle the possible use of compression techniques to store and send the acquired input data stream and finally the performance of local analysis versus remote one moreover we introduce two particular real systems to illustrate the suitability and applicability of our proposal an anywhere and at any time monitoring system of heart arrhythmias and an apnea monitoring system
to carry out work assignments small groups distributed within larger enterprise often need to share documents among themselves while shielding those documents from others eyes in this situation users need an indexing facility that can quickly locate relevant documents that they are allowed to access without leaking information about the remaining documents imposing large management burden as users groups and documents evolve or requiring users to agree on central completely trusted authority to address this problem we propose the concept of confidentiality which captures the degree of information leakage from an index about the terms contained in inaccessible documents then we propose the confidential zerber indexing facility for sensitive documents which uses secret splitting and term merging to provide tunable limits on information leakage even under statistical attacks requires only limited trust in central indexing authority and is extremely easy to use and administer experiments with real world data show that zerber offers excellent performance for index insertions and lookups while requiring only modest amount of storage space and network bandwidth
this research aims to study the development of augmented reality of rare books or manuscripts of special collections in the libraries augmented reality has the ability to enhance users' perception of and interaction with the real world libraries have to ensure that this special collection is well handled since these rare books and manuscripts are priceless as they represent the inheritance of each nation the use of augmented reality will be able to model these valuable manuscripts and rare books and appear as augmented reality to ensure that the collection can be better maintained users will be able to open the augmented rare book and flip the pages as well as read the contents of the rare books and manuscripts using peripheral equipment such as the hmd or the marker the ar rare bm developed is modeled as an augmented reality that allows users to put the augmented rare book on their palm or table and manipulate it while reading users can also leave bookmark in the ar rare bm after reading so that they can read their favourite sections again at later date
requirements related scenarios capture typical examples of system behaviors through sequences of desired interactions between the software to be and its environment their concrete narrative style of expression makes them very effective for eliciting software requirements and for validating behavior models however scenarios raise coverage problems as they only capture partial histories of interaction among system component instances moreover they often leave the actual requirements implicit numerous efforts have therefore been made recently to synthesize requirements or behavior models inductively from scenarios two problems arise from those efforts on the one hand the scenarios must be complemented with additional input such as state assertions along episodes or flowcharts on such episodes this makes such techniques difficult to use by the nonexpert end users who provide the scenarios on the other hand the generated state machines may be hard to understand as their nodes generally convey no domain specific properties their validation by analysts complementary to model checking and animation by tools may therefore be quite difficult this paper describes tool supported techniques that overcome those two problems our tool generates labeled transition system lts for each system component from simple forms of message sequence charts msc taken as examples or counterexamples of desired behavior no additional input is required global lts for the entire system is synthesized first this lts covers all scenario examples and excludes all counterexamples it is inductively generated through an interactive procedure that extends known learning techniques for grammar induction the procedure is incremental on training examples it interactively produces additional scenarios that the end user has to classify as examples or counterexamples of desired behavior the lts synthesis procedure may thus also be used independently for requirements elicitation through scenario questions generated by the tool the synthesized system lts is then projected on local lts for each system component for model validation by analysts the tool generates state invariants that decorate the nodes of the local lts
some tasks in dataspace loose collection of heterogeneous data sources require integration of fine grained data from diverse sources this work is often done by end users knowledgeable about the domain who copy and paste data into spreadsheet or other existing application inspired by this kind of work in this paper we define data curation setting characterized by data that are explicitly selected copied and then pasted into target dataset where they can be confirmed or replaced rows and columns in the target may also be combined for example when redundant each of these actions is an integration decision often of high quality that when taken together comprise the provenance of data value in the target in this paper we define conceptual model for data and provenance for these user actions and we show how questions about data provenance can be answered we note that our model can be used in automated data curation as well as in setting with the manual activity we emphasize in our examples
in this paper we present novel algorithm for reconstructing scenes from set of images the user defines set of polygonal regions with corresponding labels in each image using familiar photo editing tools our reconstruction algorithm computes the model with maximum volume that is consistent with the set of regions in the input images the algorithm is fast uses only intersection operations and directly computes polygonal model we implemented user assisted system for scene reconstruction and show results on scenes that are difficult or impossible to reconstruct with other methods
in this paper we present the infocious web search engine our goal in creating infocious is to improve the way people find information on the web by resolving ambiguities present in natural language text this is achieved by performing linguistic analysis on the content of the web pages we index which is departure from existing web search engines that return results mainly based on keyword matching this additional step of linguistic processing gives infocious two main advantages first infocious gains deeper understanding of the content of web pages so it can better match users queries with indexed documents and therefore can improve relevancy of the returned results second based on its linguistic processing infocious can organize and present the results to the user in more intuitive ways in this paper we present the linguistic processing technologies that we incorporated in infocious and how they are applied in helping users find information on the web more efficiently we discuss the various components in the architecture of infocious and how each of them benefits from the added linguistic processing finally we experimentally evaluate the performance of component which leverages linguistic information in order to categorize web pages
the problem of building scalable shared memory multiprocessor can be reduced to that of building scalable memory hierarchy assuming interprocessor communication is handled by the memory system in this paper we describe the vmp mc design distributed parallel multi computer based on the vmp multiprocessor design that is intended to provide set of building blocks for configuring machines from one to several thousand processors vmp mc uses memory hierarchy based on shared caches ranging from on chip caches to board level caches connected by busses to high speed fiber optic ring at the bottom in addition to describing the building block components of this architecture we identify the key performance issues associated with the design and provide performance evaluation of these issues using trace driven simulation and measurements from the vmp
we propose in this paper very fast feature selection technique based on conditional mutual information by picking features which maximize their mutual information with the class to predict conditional to any feature already picked it ensures the selection of features which are both individually informative and two by two weakly dependent we show that this feature selection method outperforms other classical algorithms and that naive bayesian classifier built with features selected that way achieves error rates similar to those of state of the art methods such as boosting or svms the implementation we propose selects features among based on training set of examples in tenth of second on standard ghz pc
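a compact sketch of this conditional mutual information criterion for binary features, written from the description above rather than taken from the authors' optimized implementation

```python
# greedy cmim style selection: a feature stays attractive only if it remains
# informative about the class conditionally on every feature already picked
import numpy as np

def cond_mutual_info(x, y, z):
    """I(X;Y|Z) for small discrete arrays, computed from empirical counts."""
    mi = 0.0
    for zv in np.unique(z):
        m = z == zv
        pz = m.mean()
        for xv in np.unique(x[m]):
            for yv in np.unique(y[m]):
                pxyz = np.mean(m & (x == xv) & (y == yv))
                pxz = np.mean(m & (x == xv))
                pyz = np.mean(m & (y == yv))
                if pxyz > 0:
                    mi += pxyz * np.log2(pxyz * pz / (pxz * pyz))
    return mi

def mutual_info(x, y):
    return cond_mutual_info(x, y, np.zeros_like(x))   # I(X;Y) = I(X;Y|const)

def cmim_select(X, y, k):
    n_feat = X.shape[1]
    picked = []
    score = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    for _ in range(k):
        best = int(np.argmax(score))
        picked.append(best)
        score[best] = -np.inf
        # keep, for every remaining feature, the minimum of its current score
        # and its mutual information with the class given the new pick
        for j in range(n_feat):
            if np.isfinite(score[j]):
                score[j] = min(score[j], cond_mutual_info(X[:, j], y, X[:, best]))
    return picked

# usage: feature 1 duplicates feature 0, so only one of them should be kept
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 5))
X[:, 1] = X[:, 0]
y = X[:, 0] | X[:, 2]                 # label driven by features 0 and 2
print(cmim_select(X, y, 2))           # expected: feature 0 (or 1) together with 2
```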
this paper examines the usability issues involved in ticketless travelling with an airport train the main contribution of this paper is that it describes actual use situations in detail we show how users intentions are difficult to anticipate unless in explicit communication eg with people whose job it is to help out with using the system being conspicuously assisted however only aggravates situation where users usually prefer anonymity given private in public type of design users had little chance of learning from watching others moreover users were quickly annoyed when they struggled with the machine they seemed to treat it as an agent for the provider rather than an assistant or tool for themselves at the end of this paper we outline and illustrate some new design ideas which we think ought to be considered for future designs of it in public spaces
heterogeneous architectures that integrate mix of big and small cores are very attractive because they can achieve high single threaded performance while enabling high performance thread level parallelism with lower energy costs despite their benefits they pose significant challenges to the operating system software thread scheduling is one of the most critical challenges in this paper we propose bias scheduling for heterogeneous systems with cores that have different microarchitectures and performance we identify key metrics that characterize an application bias namely the core type that best suits its resource needs by dynamically monitoring application bias the operating system is able to match threads to the core type that can maximize system throughput bias scheduling takes advantage of this by influencing the existing scheduler to select the core type that best suits the application when performing load balancing operations bias scheduling can be implemented on top of most existing schedulers since its impact is limited to changes in the load balancing code in particular we implemented it over the linux scheduler on real system that models microarchitectural differences accurately and found that it can improve system performance significantly and in proportion to the application bias diversity present in the workload unlike previous work bias scheduling does not require sampling of cpi on all core types or offline profiling we also expose the limits of dynamic voltage frequency scaling as an evaluation vehicle for heterogeneous systems
simulation of an application is popular and reliable approach to find the optimal configuration of level one cache memory for an application specific embedded system processor however long simulation time is one of the main disadvantages of simulation based approaches in this paper we propose new and fast simulation method super set simulator susesim while previous methods use top down searching strategy susesim utilizes bottom up search strategy along with new elaborate data structure to reduce the search space to determine cache hit or miss susesim can simulate hundreds of cache configurations simultaneously by reading an application’s memory request trace just once total number of cache hits and misses are accurately recorded depending on different cache block sizes and benchmark applications susesim can reduce the number of tags to be checked by up to compared to the existing fastest simulation approach the crcb algorithm with the help of faster search and an easy to maintain data structure susesim can be up to faster in simulating memory requests compared to the crcb algorithm
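for orientation, the naive baseline below reads an address trace once and updates hit and miss counters for several direct mapped configurations in parallel; susesim's bottom up search strategy and elaborate data structure, which make this much faster, are not reproduced here

```python
# single pass hit/miss counting for a list of direct mapped cache configurations
def simulate_configs(trace, configs):
    """configs: list of (num_sets, block_size_bytes); returns {config: (hits, misses)}."""
    tags = {c: [None] * c[0] for c in configs}      # one tag array per configuration
    stats = {c: [0, 0] for c in configs}            # [hits, misses]
    for addr in trace:                              # the trace is read only once
        for (num_sets, block) in configs:
            block_addr = addr // block
            idx = block_addr % num_sets
            tag = block_addr // num_sets
            if tags[(num_sets, block)][idx] == tag:
                stats[(num_sets, block)][0] += 1
            else:
                stats[(num_sets, block)][1] += 1
                tags[(num_sets, block)][idx] = tag
    return {c: tuple(s) for c, s in stats.items()}

# usage with a tiny synthetic trace of byte addresses
trace = [0, 4, 8, 64, 0, 4, 128, 0]
print(simulate_configs(trace, [(16, 16), (32, 32), (64, 64)]))
```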
we address the development of normalization theory for object oriented data models that have common features to support objects we first provide an extension of functional dependencies to cope with the richer semantics of relationships between objects called path dependency local dependency and global dependency constraints using these dependency constraints we provide normal forms for object oriented data models based on the notions of user interpretation user specified dependency constraints and object model in contrast to conventional data models in which normalized object has unique interpretation in object oriented data models an object may have multiple interpretations that form the model for that object an object will then be in normal form if and only if the user’s interpretation is derivable from the model of the object our normalization process is by nature iterative in which objects are restructured until their models reflect the user’s interpretation
in this paper we propose an efficient text classification method using term projection firstly we use modified statistic to project terms into predefined categories which is more efficient compared to other clustering methods afterwards we utilize the generated clusters as features to represent the documents the classification is then performed in rule based manner or via svm experiment results show that our modified statistic feature selection method outperforms traditional statistic especially at lower dimensionalities and our method is also more efficient than latent semantic analysis lsa on homogeneous dataset meanwhile we can reduce the feature dimensionality by three orders of magnitude to save training and testing cost and maintain comparable accuracy moreover we could use small training set to gain an approximately improvement on heterogeneous dataset as compared to traditional method which indicates that our method has better generalization capability
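the sketch below illustrates the term projection pipeline with a signed chi square style association score standing in for the paper's modified statistic, followed by a rule based vote over the projected clusters

```python
# project each term onto the category it is most associated with, then classify
# a document by majority vote over the categories of its terms
from collections import Counter, defaultdict

def assoc(n11, n10, n01, n00):
    """signed chi square style association, positive when term and category co-occur."""
    n = n11 + n10 + n01 + n00
    diff = n11 * n00 - n10 * n01
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return n * diff * abs(diff) / den if den else 0.0

def project_terms(docs, labels):
    """docs: list of token lists; returns {term: category it is projected onto}."""
    cats = sorted(set(labels))
    doc_freq = defaultdict(Counter)                  # category -> term document counts
    for toks, lab in zip(docs, labels):
        for t in set(toks):
            doc_freq[lab][t] += 1
    n_docs = Counter(labels)
    projection = {}
    for term in {t for toks in docs for t in toks}:
        projection[term] = max(cats, key=lambda c: assoc(
            doc_freq[c][term],
            n_docs[c] - doc_freq[c][term],
            sum(doc_freq[o][term] for o in cats if o != c),
            sum(n_docs[o] - doc_freq[o][term] for o in cats if o != c)))
    return projection

def classify(toks, projection):
    votes = Counter(projection[t] for t in toks if t in projection)
    return votes.most_common(1)[0][0] if votes else None

docs = [["goal", "match", "team"], ["election", "vote", "party"],
        ["team", "coach"], ["party", "campaign", "vote"]]
labels = ["sport", "politics", "sport", "politics"]
projection = project_terms(docs, labels)
print(classify(["match", "coach", "vote"], projection))   # expected: sport
```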
rough sets are widely used in feature subset selection and attribute reduction in most of the existing algorithms the dependency function is employed to evaluate the quality of feature subset the disadvantages of using dependency are discussed in this paper and the problem of forward greedy search algorithm based on dependency is presented we introduce the consistency measure to deal with the problems the relationship between dependency and consistency is analyzed it is shown that consistency measure can reflect not only the size of decision positive region like dependency but also the sample distribution in the boundary region therefore it can more finely describe the distinguishing power of an attribute set based on consistency we redefine the redundancy and reduct of decision system we construct forward greedy search algorithm to find reducts based on consistency what’s more we employ cross validation to test the selected features and reduce the overfitting features in reduct the experimental results with uci data show that the proposed algorithm is effective and efficient
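a minimal sketch of forward greedy selection driven by such a consistency measure (fraction of samples covered by the majority class of their equivalence class under the selected attributes); illustrative only, not the authors' algorithm or its rough set machinery

```python
# greedily add the attribute that most improves consistency until no gain is left
from collections import Counter, defaultdict

def consistency(X, y, features):
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row[f] for f in features)].append(label)
    return sum(max(Counter(g).values()) for g in groups.values()) / len(y)

def greedy_reduct(X, y):
    remaining = set(range(len(X[0])))
    selected, best = [], consistency(X, y, [])
    while remaining:
        gains = {f: consistency(X, y, selected + [f]) for f in remaining}
        f, score = max(gains.items(), key=lambda kv: kv[1])
        if score <= best:                 # no attribute improves consistency
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# usage on a toy decision table where attributes 0 and 2 jointly determine the label
X = [[0, 1, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
y = [0, 0, 1, 1, 0, 0]
print(greedy_reduct(X, y))                # expected: ([0, 2], 1.0)
```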
aligning structures often referred to as docking or registration is frequently required in fields such as computer science robotics and structural biology the task of aligning the structures is usually automated but due to noise and imprecision the user often needs to evaluate the results before final decision can be made the solutions involved are of multidimensional nature and normally densely populated therefore some form of visualization is necessary especially if users want to achieve higher level understanding such as solution symmetry or clustering from the data we have developed system that provides two views of the data one view places focus on the orientation of the solutions and the other focuses on translations solutions within the views are crosslinked using various visual cues users are also able to apply various filters intelligently reducing the solution set we applied the visualization to data generated by the automated cryo em process of docking molecular structures into electron density maps current systems in this field only allow for visual representation of single solution or numerical list of the data we evaluated the system through multi phase user study and found that the users were able to gain better high level understanding of the data even in cases of relatively small solution sets
in collaborative software development projects tasks are often used as mechanism to coordinate and track shared development work modern development environments provide explicit support for task management where tasks are typically organized and managed through predefined categories although there have been many studies that analyze data available from task management systems there has been relatively little work on the design of task management tools in this paper we explore how tagging with freely assigned keywords provides developers with lightweight mechanism to further categorize and annotate development tasks we investigate how tags that are frequently used over long period of time reveal the need for additional predefined categories of keywords in task management tool support finally we suggest future work to explore how integrated lightweight tool features in development environment may improve software development practices
let be set of points in dimensional lp metric space let epsilon and let be any real number an bounded leg path from to is an ordered set of points which connects to such that the leg between any two consecutive points in the set is at most the minimal path among all these paths is the bounded leg shortest path from to in the bounded leg shortest path stblsp problem we are given two points and and real number and are required to compute an bounded leg shortest path from to in the all pairs bounded leg shortest path apblsp problem we are required to build data structure that given any two query points from and any real number outputs the length of the bounded leg shortest path distance query or the path itself path query in this paper we present first an algorithm for the apblsp problem in any lp metric which for any fixed epsilon computes in log · epsilon time data structure which approximates any bounded leg shortest path within multiplicative error of epsilon it requires nlog space and distance queries are answered in log log time this improves on an algorithm with running time given by bose et al we present also an algorithm for the stblsp problem that given ∈ and real number computes in · polylog the exact bounded leg shortest path from to this algorithm works in and ∞ metrics in the euclidean metric we also obtain an exact algorithm but with running time of epsilon for any epsilon we end by showing that for any weighted directed graph there is data structure of size nlog which is capable of answering path queries with multiplicative error of epsilon in log log ℓ time where ℓ is the length of the reported path our results improve upon the results given by bose et al our algorithms incorporate several new ideas along with an interesting observation made on geometric spanners which is of independent interest
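to make the problem statement concrete, the sketch below answers a single bounded leg query exactly by running dijkstra over the complete geometric graph restricted to legs of length at most L; the paper's data structures exist precisely to avoid this quadratic per query cost

```python
# naive exact single pair bounded leg shortest path over a euclidean point set
import heapq, math

def bounded_leg_shortest_path(points, s, t, L):
    """points: list of (x, y); s, t: indices; L: maximum allowed leg length."""
    n = len(points)
    dist = [math.inf] * n
    prev = [None] * n
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        if u == t:
            break
        for v in range(n):
            leg = math.dist(points[u], points[v])
            if v != u and leg <= L and d + leg < dist[v]:   # only legs of length <= L
                dist[v] = d + leg
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path, node = [], t
    while node is not None:
        path.append(node)
        node = prev[node]
    return dist[t], path[::-1]

pts = [(0, 0), (1, 0), (2, 0), (4, 0)]
print(bounded_leg_shortest_path(pts, 0, 3, L=2.5))   # (4.0, [0, 2, 3])
```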
we propose processor virtualization architecture virtus to provide dedicated domain for preinstalled applications and virtualized domains for downloaded native applications with it security oriented next generation mobile terminals can provide any number of domains for native applications virtus features three new technologies namely vmm asymmetrization dynamic interdomain communication idc and virtualization assist logic and it is first in the world to virtualize an arm based multiprocessor evaluations have shown that vmm asymmetrization results in significantly less performance degradation and loc increase than do other vmms further dynamic idc overhead is low enough and virtualization assist logic can be implemented in sufficiently small area
we present minimum bayes risk mbr decoding over translation lattices that compactly encode huge number of translation hypotheses we describe conditions on the loss function that will enable efficient implementation of mbr decoders on lattices we introduce an approximation to the bleu score papineni et al that satisfies these conditions the mbr decoding under this approximate bleu is realized using weighted finite state automata our experiments show that the lattice mbr decoder yields moderate consistent gains in translation performance over best mbr decoding on arabic to english chinese to english and english to chinese translation tasks we conduct range of experiments to understand why lattice mbr improves upon best mbr and study the impact of various parameters on mbr performance
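the sketch below shows plain n best mbr decoding with a simple ngram overlap gain standing in for the linearised bleu approximation; the lattice encoding and weighted finite state machinery of the paper are deliberately left out

```python
# n best mbr: pick the hypothesis with the highest expected gain under the
# model's posterior over the hypothesis list
import math
from collections import Counter

def ngram_gain(hyp, ref, max_n=2):
    """clipped ngram matches of hyp against ref, summed over ngram orders."""
    score = 0.0
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        score += sum(min(c, r[g]) for g, c in h.items())
    return score

def mbr_decode(hypotheses, scores):
    """hypotheses: list of token lists; scores: unnormalised log probabilities."""
    m = max(scores)
    probs = [math.exp(s - m) for s in scores]
    z = sum(probs)
    probs = [p / z for p in probs]
    expected_gain = [
        sum(p * ngram_gain(h, h2) for h2, p in zip(hypotheses, probs))
        for h in hypotheses
    ]
    return hypotheses[max(range(len(hypotheses)), key=expected_gain.__getitem__)]

# usage: the consensus-like hypothesis wins even if it is not the model's best
nbest = [["the", "cat", "sat"], ["a", "cat", "sat"], ["the", "cat", "slept"]]
print(mbr_decode(nbest, [-1.0, -1.1, -1.2]))
```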
concurrent programs are notorious for containing errors that are difficult to reproduce and diagnose two common kinds of concurrency errors are data races and atomicity violations informally atomicity means that executing methods concurrently is equivalent to executing them serially several static and dynamic run time analysis techniques exist to detect potential races and atomicity violations run time checking may miss errors in unexecuted code and incurs significant run time overhead on the other hand run time checking generally produces fewer false alarms than static analysis this is significant practical advantage since diagnosing all of the warnings from static analysis of large codebases may be prohibitively expensive this paper explores the use of static analysis to significantly decrease the overhead of run time checking our approach is based on type system for analyzing data races and atomicity type discovery algorithm is used to obtain types for as much of the program as possible complete type inference for this type system is np hard and parts of the program might be untypable warnings from the typechecker are used to identify parts of the program from which run time checking can safely be omitted the approach is completely automatic scalable to very large programs and significantly reduces the overhead of run time checking for data races and atomicity violations
there are three common challenges in real world classification applications ie how to use domain knowledge how to resist noisy samples and how to use unlabeled data to address these problems novel classification framework called mutually beneficial learning mbl is proposed in this paper mbl integrates two learning steps together in the first step the underlying local structures of feature space are discovered through learning process the result provides necessary capability to resist noisy samples and prepare better input for the second step where consecutive classification process is further applied to the result these two steps are iteratively performed until stop condition is met different from traditional classifiers the output of mbl consists of two components common classifier and set of rules corresponding to local structures in application test sample is first matched with the discovered rules if matched rule is found the label of the rule is assigned to the sample otherwise the common classifier will be utilized to classify the sample we applied the mbl to online news classification and our experimental results showed that mbl is significantly better than naïve bayes and svm even when the data is noisy or partially labeled
in state based testing it is common to include verdicts within test cases the result of the test case being the verdict reached by the test run in addition approaches that reason about test effectiveness or produce tests that are guaranteed to find certain classes of faults are often based on either fault domain or set of test hypotheses this article considers how the presence of fault domain or test hypotheses affects our notion of test verdict the analysis reveals the need for new verdicts that provide more information than the current verdicts and for verdict functions that return verdict based on set of test runs rather than single test run the concepts are illustrated in the contexts of testing from nondeterministic finite state machine and the testing of datatype specified using an algebraic specification language but are potentially relevant whenever fault domains or test hypotheses are used
program dependence information is useful for variety of software testing and maintenance tasks properly defined control and data dependencies can be used to identify semantic dependencies to function effectively on whole programs tools that utilize dependence information require information about interprocedural dependencies dependencies that exist because of interactions among procedures many techniques for computing data and control dependencies exist however in our search of the literature we find only one attempt to define and compute interprocedural control dependencies unfortunately that approach can omit important control dependencies and incorrectly identifies control dependencies for large class of programs this paper presents definition of interprocedural control dependence that supports the relationship of control and data dependence to semantic dependence an efficient algorithm for calculating interprocedural control dependencies and empirical results obtained by our implementation of the algorithm
there has been much recent interest in on line data mining existing mining algorithms designed for stored data are either not applicable or not effective on data streams where real time response is often needed and data characteristics change frequently therefore researchers have been focusing on designing new and improved algorithms for on line mining tasks such as classification clustering frequent itemsets mining pattern matching etc relatively little attention has been paid to designing dsmss which facilitate and integrate the task of mining data streams ie stream systems that provide inductive functionalities analogous to those provided by weka and ms ole db for stored data in this paper we propose the notion of an inductive dsms system that besides providing rich library of inter operable functions to support the whole mining process also supports the essentials of dsms including optimization of continuous queries load shedding synoptic constructs and non stop computing ease of use and extensibility are additional desiderata for the proposed inductive dsms we first review the many challenges involved in realizing such system and then present our approach of extending the stream mill dsms toward that goal our system features (i) powerful query language where mining methods are expressed via aggregates for generic streams and arbitrary windows (ii) library of fast and light mining algorithms and (iii) an architecture that makes it easy to customize and extend existing mining methods and introduce new ones
we study selectivity estimation techniques for set similarity queries wide variety of similarity measures for sets have been proposed in the past in this work we concentrate on the class of weighted similarity measures eg tf idf and bm cosine similarity and variants and design selectivity estimators based on priori constructed samples first we study the pitfalls associated with straightforward applications of random sampling and argue that care needs to be taken in how the samples are constructed uniform random sampling yields very low accuracy while query sensitive realtime sampling is more expensive than exact solutions both in cpu and cost we show how to build robust samples priori based on existing synopses for distinct value estimation we prove the accuracy of our technique theoretically and verify its performance experimentally our algorithm is orders of magnitude faster than exact solutions and has very small space overhead
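to fix ideas, the sketch below computes a tf idf weighted cosine set similarity and estimates the selectivity of a threshold query from a uniform random sample, ie the naive baseline the paper argues is inaccurate and improves upon with a priori constructed samples

```python
# weighted set similarity plus a naive sample based selectivity estimate
import math, random
from collections import Counter

def idf_weights(collection):
    n = len(collection)
    df = Counter(tok for s in collection for tok in set(s))
    return {tok: math.log(1 + n / d) for tok, d in df.items()}

def weighted_cosine(a, b, w):
    common = set(a) & set(b)
    num = sum(w[t] ** 2 for t in common)
    na = math.sqrt(sum(w[t] ** 2 for t in set(a)))
    nb = math.sqrt(sum(w[t] ** 2 for t in set(b)))
    return num / (na * nb) if na and nb else 0.0

def estimate_selectivity(query, collection, w, threshold, sample_size, seed=0):
    random.seed(seed)
    sample = random.sample(collection, min(sample_size, len(collection)))
    hits = sum(weighted_cosine(query, s, w) >= threshold for s in sample)
    return hits / len(sample)            # fraction of sampled sets above threshold

sets = [{"data", "stream", "mining"}, {"query", "optimizer"},
        {"stream", "query"}, {"data", "mining"}, {"sensor", "stream"}]
w = idf_weights(sets)
print(estimate_selectivity({"data", "stream"}, sets, w, 0.5, sample_size=3))
```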
web applications support many of our daily activities but they often have security problems and their accessibility makes them easy to exploit in cross site scripting xss an attacker exploits the trust web client browser has for trusted server and executes injected script on the browser with the server’s privileges in xss constituted the largest class of newly reported vulnerabilities making it the most prevalent class of attacks today web applications have xss vulnerabilities because the validation they perform on untrusted input does not suffice to prevent that input from invoking browser’s javascript interpreter and this validation is particularly difficult to get right if it must admit some html mark up most existing approaches to finding xss vulnerabilities are taint based and assume input validation functions to be adequate so they either miss real vulnerabilities or report many false positives this paper presents static analysis for finding xss vulnerabilities that directly addresses weak or absent input validation our approach combines work on tainted information flow with string analysis proper input validation is difficult largely because of the many ways to invoke the javascript interpreter we face the same obstacle checking for vulnerabilities statically and we address it by formalizing policy based on the wc recommendation the firefox source code and online tutorials about closed source browsers we provide effective checking algorithms based on our policy we implement our approach and provide an extensive evaluation that finds both known and unknown vulnerabilities in real world web applications
an important but very expensive primitive operation of high dimensional databases is the nearest neighbor knn similarity join the operation combines each point of one dataset with its knns in the other dataset and it provides more meaningful query results than the range similarity join such an operation is useful for data mining and similarity search in this paper we propose novel knn join algorithm called the gorder or the ordering knn join method gorder is block nested loop join method that exploits sorting join scheduling and distance computation filtering and reduction to reduce both and cpu costs it sorts input datasets into the order and applied the scheduled block nested loop join on the ordered data the distance computation reduction is employed to further reduce cpu cost it is simple and yet efficient and handles high dimensional data efficiently extensive experiments on both synthetic cluster and real life datasets were conducted and the results illustrate that gorder is an efficient knn join method and outperforms existing methods by wide margin
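a plain block nested loop knn join is sketched below to illustrate the operation itself; gorder's sorting, scheduling and distance computation filtering are deliberately omitted

```python
# for every point of r, keep the k closest points of s seen so far in a max heap
import heapq, math

def knn_join(R, S, k, block=64):
    """returns {index_in_R: [(dist, index_in_S), ...]} with the k closest points."""
    heaps = {i: [] for i in range(len(R))}          # max heaps via negated distances
    for s0 in range(0, len(S), block):              # outer loop over blocks of s
        s_block = S[s0:s0 + block]
        for i, r in enumerate(R):                   # inner loop over all of r
            h = heaps[i]
            for j, s in enumerate(s_block, start=s0):
                d = math.dist(r, s)
                if len(h) < k:
                    heapq.heappush(h, (-d, j))
                elif d < -h[0][0]:                  # closer than the current kth
                    heapq.heapreplace(h, (-d, j))
    return {i: sorted((-d, j) for d, j in h) for i, h in heaps.items()}

R = [(0.0, 0.0), (5.0, 5.0)]
S = [(0.0, 1.0), (1.0, 1.0), (4.0, 5.0), (9.0, 9.0)]
print(knn_join(R, S, k=2))
```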
social tagging systems have become increasingly popular for sharing and organizing web resources tag prediction is common feature of social tagging systems social tagging by nature is an incremental process meaning that once user has saved web page with tags the tagging system can provide more accurate predictions for the user based on user’s incremental behaviors however existing tag prediction methods do not consider this important factor in which their training and test datasets are either split by fixed time stamp or randomly sampled from larger corpus in our temporal experiments we perform time sensitive sampling on an existing public dataset resulting in new scenario which is much closer to real world in this paper we address the problem of tag prediction by proposing probabilistic model for personalized tag prediction the model is bayesian approach and integrates three factors ego centric effect environmental effects and web page content two methods both intuitive calculation and learning optimization are provided for parameter estimation pure graph based methods which may have significant constraints such as every user every item and every tag has to occur in at least posts cannot make prediction in most of real world cases while our model improves the measure by over compared to leading algorithm in our real world use case
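a toy sketch of combining the three named factors (the user's own tagging history, tags other users gave the page, and page content) as a product of smoothed scores, naive bayes style; the paper's actual model and its two parameter estimation methods are not reproduced here

```python
# score every candidate tag by multiplying three smoothed factor scores
from collections import Counter

def predict_tags(user_history, page_tags_by_others, page_words, top_n=3, alpha=0.1):
    vocab = set(user_history) | set(page_tags_by_others) | set(page_words)
    user_c = Counter(user_history)
    env_c = Counter(page_tags_by_others)
    word_set = set(page_words)
    scores = {}
    for tag in vocab:
        ego = (user_c[tag] + alpha) / (sum(user_c.values()) + alpha * len(vocab))
        env = (env_c[tag] + alpha) / (sum(env_c.values()) + alpha * len(vocab))
        content = 1.0 if tag in word_set else alpha    # crude content match factor
        scores[tag] = ego * env * content
    return [t for t, _ in Counter(scores).most_common(top_n)]

# usage: the user's habits and the page both point toward "python" and "web"
history = ["python", "testing", "web", "python"]
others = ["web", "javascript", "web", "python"]
words = ["web", "framework", "python", "tutorial"]
print(predict_tags(history, others, words))
```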
we envision new better together mobile application paradigm where multiple mobile devices are placed in close proximity and study specific together viewing video application in which higher resolution video is played back across screens of two mobile devices placed side by side this new scenario imposes real time synchronous decoding and rendering requirements which are difficult to achieve because of the intrinsic complexity of video and the resource constraints such as processing power and battery life of mobile devices we develop novel efficient collaborative half frame decoding scheme and design tightly coupled collaborative system architecture that aggregates resources of both devices to achieve the task we have implemented the system and conducted experimental evaluation results confirm that our proposed collaborative and resource aggregation techniques can achieve our vision of better together mobile experiences
exceptions in safety critical systems must be addressed during conceptual design and risk analysis we developed conceptual model of exceptions methodology for eliciting and modeling exceptions and templates for modeling them in an extension of the object process methodology opm system analysis and design methodology and language that uses single graphical model for describing systems including their timing exceptions which has been shown to be an effective modeling methodology using an antibiotics treatment guideline as case study we demonstrate the value of our approach in eliciting and modeling exceptions that occur in clinical care systems
we consider the problem of sparse interpolation of an approximate multivariate black box polynomial in floating point arithmetic that is both the inputs and outputs of the black box polynomial have some error and all numbers are represented in standard fixed precision floating point arithmetic by interpolating the black box evaluated at random primitive roots of unity we give efficient and numerically robust solutions we note the similarity between the exact ben or tiwari sparse interpolation algorithm and the classical prony’s method for interpolating sum of exponential functions and exploit the generalized eigenvalue reformulation of prony’s method we analyse the numerical stability of our algorithms and the sensitivity of the solutions as well as the expected conditioning achieved through randomization finally we demonstrate the effectiveness of our techniques in practice through numerical experiments and applications
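a small numerical sketch of the prony / generalised eigenvalue route described above, recovering nodes and coefficients of a univariate exponential sum from 2t noisy samples; it is an illustration, not the paper's full sparse multivariate black box interpolation algorithm

```python
# recover z_j and c_j from samples of f(x) = sum_j c_j * z_j^x via the
# generalised eigenvalues of two shifted hankel matrices
import numpy as np
from scipy.linalg import eig, hankel

def prony_recover(samples, t):
    """samples: f(0), f(1), ..., f(2t-1); returns (nodes z_j, coefficients c_j)."""
    a = np.asarray(samples, dtype=complex)
    H0 = hankel(a[:t], a[t - 1:2 * t - 1])      # H0[i, j] = f(i + j)
    H1 = hankel(a[1:t + 1], a[t:2 * t])         # H1[i, j] = f(i + j + 1)
    nodes = eig(H1, H0, right=False)            # generalised eigenvalues give z_j
    V = np.vander(nodes, N=2 * t, increasing=True).T
    coeffs, *_ = np.linalg.lstsq(V, a, rcond=None)   # solve the vandermonde system
    return nodes, coeffs

# usage: f(x) = 3 * 2^x + 5 * 0.5^x, sampled with a little noise
t = 2
xs = np.arange(2 * t)
f = 3 * 2.0 ** xs + 5 * 0.5 ** xs + 1e-6 * np.random.randn(2 * t)
nodes, coeffs = prony_recover(f, t)
print(np.round(nodes.real, 3), np.round(coeffs.real, 3))
```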
wireless sensor networks consist of system of distributed sensors embedded in the physical world and promise to allow observation of previously unobservable phenomena since they are exposed to unpredictable environments sensor network applications must handle wide variety of faults software errors node and link failures and network partitions the code to manually detect and recover from faults crosscuts the entire application is tedious to implement correctly and efficiently and is fragile in the face of program modifications we investigate language support for modularly managing faults our insight is that such support can be naturally provided as an extension to existing macroprogramming systems for sensor networks in such system programmer describes sensor network application as centralized program compiler then produces equivalent node level programs we describe simple checkpoint api for macroprograms which can be automatically implemented in distributed fashion across the network we also describe declarative annotations that allow programmers to specify checkpointing strategies at higher level of abstraction we have implemented our approach in the kairos macroprogramming system experiments show it to improve application availability by an order of magnitude and incur low messaging overhead
many authors have proposed power management techniques for general purpose processors at the cost of degraded performance such as lower ipc or longer delay some proposals have focused on cache memories because they consume significant fraction of total microprocessor power we propose reconfigurable and adaptive cache microarchitecture based on field programmable technology that is intended to deliver high performance at low energy consumption in this paper we evaluate the performance and energy consumption of run time algorithm when used to manage field programmable data cache the adaptation strategy is based on two techniques learning process provides the best cache configuration for each program phase and recognition process detects program phase changes by using data working set signatures to activate low overhead reconfiguration mechanism our proposals achieve performance improvement and cache energy saving at the same time considering design scenario driven by performance constraints we show that processor execution time and cache energy consumption can be reduced on average by and compared to non adaptive high performance microarchitecture alternatively when energy saving is prioritized and considering non adaptive energy efficient microarchitecture as baseline cache energy and processor execution time are reduced on average by and respectively in addition to comparing to conventional microarchitectures we show that the proposed microarchitecture achieves better performance and more cache energy reduction than other configurable caches
this paper investigates how the vision of the semantic web can be carried over to the realm of email we introduce general notion of semantic email in which an email message consists of structured query or update coupled with corresponding explanatory text semantic email opens the door to wide range of automated email mediated applications with formally guaranteed properties in particular this paper introduces broad class of semantic email processes for example consider the process of sending an email to program committee asking who will attend the pc dinner automatically collecting the responses and tallying them up we define both logical and decision theoretic models where an email process is modeled as set of updates to data set on which we specify goals via certain constraints or utilities we then describe set of inference problems that arise while trying to satisfy these goals and analyze their computational tractability in particular we show that for the logical model it is possible to automatically infer which email responses are acceptable wrt set of constraints in polynomial time and for the decision theoretic model it is possible to compute the optimal message handling policy in polynomial time in addition we show how to automatically generate explanations for process’s actions and identify cases where such explanations can be generated in polynomial time finally we discuss our publicly available implementation of semantic email and outline research challenges in this realm
exception handling in workflow management systems wfmss is very important problem since it is not possible to specify all possible outcomes and alternatives effective reuse of existing exception handlers can greatly help in dealing with workflow exceptions on the other hand cooperative support for user driven resolution of unexpected exceptions and workflow evolution at run time is vital for an adaptive wfms we have developed adome wfms via meta modeling approach as comprehensive framework in which the problem of workflow exception handling can be adequately addressed in this chapter we present an overview of exception handling in adome wfms with procedures for supporting the following reuse of exception handlers thorough and automated resolution of expected exceptions effective management of problem solving agents cooperative exception handling user driven computer supported resolution of unexpected exceptions and workflow evolution
constraint programming holds many promises for model driven software development mdsd up to now constraints have only started to appear in mdsd modeling languages but have not been properly reflected in model transformation this paper introduces constraint programming in model transformation shows how constraint programming integrates with qvt relations as pathway to wide spread use of our approach and describes the corresponding model transformation engine in particular the paper will illustrate the use of constraint programming for the specification of attribute values in target models and provide qualitative evaluation of the benefit drawn from constraints integrated with qvt relations
we investigate generalizations of the all subtrees dop approach to unsupervised parsing unsupervised dop models assign all possible binary trees to set of sentences and next use large random subset of all subtrees from these binary trees to compute the most probable parse trees we will test both relative frequency estimator for unsupervised dop and maximum likelihood estimator which is known to be statistically consistent we report state of the art results on english wsj german negra and chinese ctb data to the best of our knowledge this is the first paper which tests maximum likelihood estimator for dop on the wall street journal leading to the surprising result that an unsupervised parsing model beats widely used supervised model treebank pcfg
drawback of structured prediction methods is that parameter estimation requires repeated inference which is intractable for general structures in this paper we present an approximate training algorithm called piecewise training pw that divides the factors into tractable subgraphs which we call pieces that are trained independently piecewise training can be interpreted as approximating the exact likelihood using belief propagation and different ways of making this interpretation yield different insights into the method we also present an extension to piecewise training called piecewise pseudolikelihood pwpl designed for when variables have large cardinality on several real world natural language processing tasks piecewise training performs superior to besag’s pseudolikelihood and sometimes comparably to exact maximum likelihood in addition pwpl performs similarly to pw and superior to standard pseudolikelihood but is five to ten times more computationally efficient than batch maximum likelihood training
this paper describes randomized algorithm for approximate counting that preserves the same modest memory requirements of log log bits per counter as the approximate counting algorithm introduced in the seminal paper of morris and in addition is characterized by lower expected number of memory accesses and ii lower standard error on more than percent of its counting range an exact analysis of the relevant statistical properties of the algorithm is carried out performance evaluation via simulations is also provided to validate the presented theory given its properties the presented algorithm is suitable as basic building block of data streaming applications having large number of simultaneous counters and or operating at very high speeds as such it is applicable to wide range of measurement and monitoring operations including performance monitoring of communication hardware measurements for optimization in large database systems and gathering statistics for data compression
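for orientation, here is a minimal sketch of the classical morris counter that this abstract takes as its starting point; the paper's own randomized variant, with fewer expected memory accesses and lower standard error, is not reproduced, and the base-2 update shown is the textbook version.

# a minimal sketch of the classical morris approximate counter: a counter c of
# only O(log log n) bits is incremented with probability 2**-c, and the true
# count is estimated as 2**c - 1 (an unbiased estimate).
import random

class MorrisCounter:
    def __init__(self):
        self.c = 0                      # exponent, needs about log2(log2(n)) bits

    def increment(self):
        if random.random() < 2.0 ** (-self.c):
            self.c += 1

    def estimate(self):
        return 2 ** self.c - 1

counter = MorrisCounter()
for _ in range(100_000):
    counter.increment()
print(counter.estimate())               # roughly 100000, with noticeable variance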
path queries have been extensively used to query semistructured data such as the web and xml documents in this paper we introduce weighted path queries an extension of path queries enabling several classes of optimization problems such as the computation of shortest paths to be easily expressed weighted path queries are based on the notion of weighted regular expression ie regular expression whose symbols are associated to weight we characterize the problem of answering weighted path queries and provide an algorithm for computing their answer we also show how weighted path queries can be effectively embedded into query languages for xml data to express in simple and compact form several meaningful research problems
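a common way to answer such queries, and a reasonable reading of the abstract, is a shortest path search over the product of the data graph and an automaton obtained from the weighted regular expression. the sketch below assumes that compilation step has already produced a deterministic transition table; all identifiers and the per-symbol weights are illustrative.

# a minimal sketch: answer a weighted path query by running Dijkstra on the
# product of the data graph and a weighted automaton. the compilation of the
# weighted regular expression into the automaton is assumed to have happened.
import heapq

def weighted_path_query(graph, start, automaton, init_state, final_states, weights):
    """graph: {node: [(label, node), ...]}; automaton: {(state, label): state};
    weights: {label: cost}. Returns the cheapest cost to reach each node in an
    accepting automaton state."""
    dist = {(start, init_state): 0}
    heap = [(0, start, init_state)]
    answers = {}
    while heap:
        d, node, state = heapq.heappop(heap)
        if d > dist.get((node, state), float("inf")):
            continue
        if state in final_states:
            answers.setdefault(node, d)        # first pop is the cheapest
        for label, nxt in graph.get(node, []):
            nxt_state = automaton.get((state, label))
            if nxt_state is None:
                continue
            nd = d + weights.get(label, 1)
            if nd < dist.get((nxt, nxt_state), float("inf")):
                dist[(nxt, nxt_state)] = nd
                heapq.heappush(heap, (nd, nxt, nxt_state))
    return answers

# example: expression (a b)* with weight 1 per symbol over a toy labeled graph
graph = {"x": [("a", "y")], "y": [("b", "z")], "z": [("a", "y")]}
automaton = {(0, "a"): 1, (1, "b"): 0}
print(weighted_path_query(graph, "x", automaton, 0, {0}, {"a": 1, "b": 1}))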
database security research aims to protect database from unintended activities such as authenticated misuse malicious attacks in recent years surviving dbms from an attack is becoming even more crucial because networks have become more open and the increasingly critical role that database servers are playing nowadays unlike the traditional database failure attack recovery mechanisms in this paper we propose light weight dynamic data damage tracking quarantine and recovery dtqr solution we built the dtqr scheme into the kernel of postgresql we comprehensively study this approach from few aspects eg system overhead impact of the intrusion detection system and the experimental results demonstrated that our dtqr can sustain an excellent data service while healing the database server when it is under malicious attack
this paper presents type based solution to the long standing problem of object initialization constructors the conventional mechanism for object initialization have semantics that are surprising to programmers and that lead to bugs they also contribute to the problem of null pointer exceptions which make software less reliable masked types are new type state mechanism that explicitly tracks the initialization state of objects and prevents reading from uninitialized fields in the resulting language constructors are ordinary methods that operate on uninitialized objects and no special default value null is needed in the language initialization of cyclic data structures is achieved with the use of conditionally masked types masked types are modular and compatible with data abstraction the type system is presented in simplified object calculus and is proved to soundly prevent reading from uninitialized fields masked types have been implemented as an extension to java in which compilation simply erases extra type information experience using the extended language suggests that masked types work well on real code
dynamic peer group dpg applications such as video conferencing and multi user gaming environments have proliferated the internet and group key agreement protocols are critical in securing dpg communication identity based cryptosystems avoid certificate management complications inherent to traditional public key systems but introduce key escrow which is highly undesirable in most applications we design scalable technique to eliminate key escrow and develop distributed escrow less identity based group key agreement protocol for securing resource constrained dpg communication the proposed protocol is provably secure against passive adversaries resilient to active attacks scalable and more cost effective than any such protocol we provide detailed theoretical analysis to support the claims
we show that the accessibility problem the common descendant problem the termination problem and the uniform termination problem are undecidable for rules semi thue systems as corollary we obtain the undecidability of the post correspondence problem for rules
this paper introduces the concept of virtual access points vaps for wireless vehicular ad hoc networks vanets this new technique allows data dissemination among vehicles thus extending the reach of roadside access points to uncovered road areas each vehicle that receives message from an access point ap stores this message and rebroadcasts it into non covered areas this extends the network coverage for non time critical messages the vap role is transparent to the connected nodes and designed to avoid interference since each operates on bounded region outside any ap the experiments show that the presented mechanism of store and forward at specific positions presents gain in terms of all the evaluated parameters
we show new camera based interaction solution where an ordinary camera can detect small optical tags from relatively large distance current optical tags such as barcodes must be read within short range and the codes occupy valuable physical space on products we present new low cost optical design so that the tags can be shrunk to mm visible diameter and unmodified ordinary cameras several meters away can be set up to decode the identity plus the relative distance and angle the design exploits the bokeh effect of ordinary cameras lenses which maps rays exiting from an out of focus scene point into disk like blur on the camera sensor this bokeh code or bokode is barcode design with simple lenslet over the pattern we show that code with mu features can be read using an off the shelf camera from distances of up to meters we use intelligent binary coding to estimate the relative distance and angle to the camera and show potential for applications in augmented reality and motion capture we analyze the constraints and performance of the optical system and discuss several plausible application scenarios
informal interactions an important subject of study in cscw are an essential resource in hospital work they are used as means to collaborate and to coordinate the way in which the work is performed as well as to locate and gather the artifacts and human resources necessary for patient care among others results from an observational study of work in public hospital show that significant amount of informal interactions happen face to face due to opportunistic encounters that is due to hospital work being mainly characterized by intense mobility task fragmentation collaboration and coordination this encouraged us to develop an architecture and system tool aimed at supporting mobile co located collaboration based on the findings of our study this paper presents set of design insights for developing collaborative applications that support co located interactions in hospital work as well as the implementation of these design insights in collaborative tool additionally we generalized the characteristic that must be fulfilled by tools that support mobile informal co located collaboration through the design of generic architecture that includes the characteristics of this type of tools
the multiple expected sources of traffic skewness in next generation sensornets ngsn will trigger the need for load balanced point to point routing protocols driven by this fact we present in this paper load balancing primitive namely traffic oblivious load balancing tolb to be used on top of any point to point routing protocol tolb obliviously load balances traffic by pushing the decision making responsibility to the source of any packet without depending on the energy status of the network sensors or on previously taken decisions for similar packets we present theoretical bounds on tolb’s performance for special network types such as mesh networks additionally we ran simulations to evaluate tolb’s performance on general networks our experimental results show the high benefit in terms of network lifetime and throughput of applying tolb on top of routing schemes to deal with various traffic skewness levels in different sensor deployment scenarios
in this paper we propose an object oriented model for designing hypermedia applications as the object oriented paradigm allows complex and user defined types nonconventional and nonatomic attributes we can take advantage of these capabilities not only for information modelling but also for providing alternative ways for accessing information a query language is then presented it is based on an object oriented database system query language it combines features of object oriented databases queries and primitives for hypermedia navigation the language offers the possibility of querying both the application domain information and allowing the designers to obtain information about the schema of the application we present some examples of the use of the object oriented model and the query language
this paper reports on study of professional web designers and developers we provide detailed characterization of their knowledge of fundamental programming concepts elicited through card sorting additionally we present qualitative findings regarding their motivation to learn new concepts and the learning strategies they employ we find high level of recognition of basic concepts but we identify number of concepts that they do not fully understand consider difficult to learn and use infrequently we also note that their learning process is motivated by work projects and often follows pattern of trial and error we conclude with implications for end user programming researchers
although researchers have begun to explicitly support end user programmers debugging by providing information to help them find bugs there is little research addressing the right content to communicate to these users the specific semantic content of these debugging communications matters because if the users are not actually seeking the information the system is providing they are not likely to attend to it this paper reports formative empirical study that sheds light on what end users actually want to know in the course of debugging spreadsheet given the availability of set of interactive visual testing and debugging features our results provide insights into end user debuggers information gaps and further suggest opportunities to improve end user debugging systems support for the things end user debuggers actually want to know
debugging long running multithreaded programs is very challenging problem when using tracing based analyses since such programs are non deterministic reproducing the bug is non trivial and generating and inspecting traces for long running programs can be prohibitively expensive we propose framework in which to overcome the problem of bug reproducibility lightweight logging technique is used to log the events during the original execution when bug is encountered it is reproduced using the generated log and during the replay fine grained tracing technique is employed to collect control flow dependence traces that are then used to locate the root cause of the bug in this paper we address the key challenges resulting due to tracing that is the prohibitively high expense of collecting traces and the significant burden on the user who must examine the large amount of trace information to locate the bug in long running multithreaded program these challenges are addressed through execution reduction that realizes combination of logging and tracing such that traces collected contain only the execution information from those regions of threads that are relevant to the fault this approach is highly effective because we observe that for long running multithreaded programs many threads that execute are irrelevant to the fault hence these threads need not be replayed and traced when trying to reproduce the bug we develop novel lightweight scheme that identifies such threads by observing all the interthread data dependences and removes their execution footprint in the replay run in addition we identify regions of thread executions that need not be replayed or if they must be replayed we determine if they need not be traced following execution reduction the replayed execution takes lesser time to run and it produces much smaller trace than the original execution thus the cost of collecting traces and the effort of examining the traces to locate the fault are greatly reduced
current approaches to modeling the structure and semantics of video recordings restrict its reuse this is because these approaches are either too rigidly structured or too generally structured and so do not represent the structural and semantic regularities of classes of video recordings this paper proposes framework which tackles the problem of reuse by supporting the definition of wide range of models of video recordings and supporting reuse between them examples of the framework’s use are presented and examined with respect to different kinds of reuse of video current research and the development of toolset to support the framework
adaptive programs compute with objects just like object oriented programs each task to be accomplished is specified by so called propagation pattern which traverses the receiver object the object traversal is recursive descent via the instance variables where information is collected or propagated along the way propagation pattern consists of name for the task succinct specification of the parts of the receiver object that should be traversed and code fragments to be executed when specific object types are encountered the propagation patterns need to be complemented by class graph which defines the detailed object structure the separation of structure and behavior yields degree of flexibility and understandability not present in traditional object oriented languages for example the class graph can be changed without changing the adaptive program at all we present an efficient implementation of adaptive programs given an adaptive program and class graph we generate an efficient object oriented program for example in moreover we prove the correctness of the core of this translation key assumption in the theorem is that the traversal specifications are consistent with the class graph we prove the soundness of proof system for conservatively checking consistency and we show how to implement it efficiently
we study online social networks in which relationships can be either positive indicating relations such as friendship or negative indicating relations such as opposition or antagonism such mix of positive and negative links arise in variety of online settings we study datasets from epinions slashdot and wikipedia we find that the signs of links in the underlying social networks can be predicted with high accuracy using models that generalize across this diverse range of sites these models provide insight into some of the fundamental principles that drive the formation of signed links in networks shedding light on theories of balance and status from social psychology they also suggest social computing applications by which the attitude of one user toward another can be estimated from evidence provided by their relationships with other members of the surrounding social network
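a hedged sketch of the general recipe described above, not the paper's exact feature set: extract simple features of the signed neighborhood of an edge and fit a logistic regression to predict its sign. the three features and the toy data are illustrative, and sklearn is assumed to be available.

# a minimal sketch of sign prediction on a signed network: build a few simple
# degree and common-neighbor features for an edge (u, v) and train a logistic
# regression on labeled edges. feature choice here is illustrative only.
from collections import defaultdict
from sklearn.linear_model import LogisticRegression

def edge_features(u, v, signed_edges):
    pos_out = defaultdict(int)
    neg_out = defaultdict(int)
    neighbors = defaultdict(set)
    for (a, b), s in signed_edges.items():
        (pos_out if s > 0 else neg_out)[a] += 1
        neighbors[a].add(b)
        neighbors[b].add(a)
    common = len(neighbors[u] & neighbors[v])   # embeddedness of the edge
    return [pos_out[u], neg_out[u], common]

def train_sign_predictor(signed_edges):
    X = [edge_features(u, v, signed_edges) for (u, v) in signed_edges]
    y = [1 if s > 0 else 0 for s in signed_edges.values()]
    return LogisticRegression().fit(X, y)

# toy usage with a handful of labeled edges (+1 trust / -1 distrust)
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1, ("d", "c"): -1, ("d", "b"): -1}
model = train_sign_predictor(edges)
print(model.predict([edge_features("a", "d", edges)]))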
memory based methods for collaborative filtering predict new ratings by averaging weighted ratings between respectively pairs of similar users or items in practice large number of ratings from similar users or similar items are not available due to the sparsity inherent to rating data consequently prediction quality can be poor this paper re formulates the memory based collaborative filtering problem in generative probabilistic framework treating individual user item ratings as predictors of missing ratings the final rating is estimated by fusing predictions from three sources predictions based on ratings of the same item by other users predictions based on different item ratings made by the same user and third ratings predicted based on data from other but similar users rating other but similar items existing user based and item based approaches correspond to the two simple cases of our framework the complete model is however more robust to data sparsity because the different types of ratings are used in concert while additional ratings from similar users towards similar items are employed as background model to smooth the predictions experiments demonstrate that the proposed methods are indeed more robust against data sparsity and give better recommendations
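the fusion idea can be sketched as follows, with the caveat that this is a simplified deterministic stand-in: the real model is probabilistic and learns how to weight and smooth the three sources, whereas the interpolation weights here are fixed and illustrative.

# a minimal sketch of fusing three rating sources: other users on the same item,
# the same user on other items, and similar users on similar items as a
# background estimate. interpolation weights w are placeholders.
import numpy as np

def predict(R, u, i, user_sim, item_sim, w=(0.4, 0.4, 0.2)):
    """R: users x items rating matrix with np.nan for missing entries;
    user_sim / item_sim: similarity matrices as numpy arrays."""
    def weighted_avg(values, weights):
        mask = ~np.isnan(values)
        if not mask.any() or weights[mask].sum() == 0:
            return np.nan
        return np.average(values[mask], weights=weights[mask])

    user_based = weighted_avg(R[:, i], user_sim[u])            # other users, same item
    item_based = weighted_avg(R[u, :], item_sim[i])            # same user, other items
    cross = weighted_avg(R.flatten(),
                         np.outer(user_sim[u], item_sim[i]).flatten())  # background
    parts = np.array([user_based, item_based, cross])
    weights = np.array(w)
    ok = ~np.isnan(parts)
    return np.average(parts[ok], weights=weights[ok]) if ok.any() else np.nan

R = np.array([[5, 4, np.nan], [4, np.nan, 2], [np.nan, 5, 1.0]])
user_sim = np.array([[1, .9, .2], [.9, 1, .3], [.2, .3, 1.0]])
item_sim = np.array([[1, .8, .1], [.8, 1, .2], [.1, .2, 1.0]])
print(predict(R, 0, 2, user_sim, item_sim))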
ontologies have proven to be powerful tool for many tasks such as natural language processing and information filtering and retrieval however their development is an error prone and expensive task one approach for this problem is to provide automatic or semi automatic support for ontology construction this work presents the probabilistic relational hierarchy extraction prehe technique an approach for extracting concept hierarchies from text that uses statistical relational learning and natural language processing for combining cues from many state of the art techniques markov logic network has been developed for this task and is described here preliminary evaluation of the proposed approach is also outlined
in color spatial retrieval technique the color information is integrated with the knowledge of the colors spatial distribution to facilitate content based image retrieval several techniques have been proposed in the literature but these works have been developed independently without much comparison in this paper we present an experimental evaluation of three color spatial retrieval techniques the signature based technique the partition based algorithm and the cluster based method we implemented these techniques and compare them on their retrieval effectiveness and retrieval efficiency the experimental study is performed on an image database consisting of images with the proliferation of image retrieval mechanisms and the lack of extensive performance study the experimental results can serve as guidelines in selecting suitable technique and designing new technique
existing studies on time series are based on two categories of distance functions the first category consists of the lp norms they are metric distance functions but cannot support local time shifting the second category consists of distance functions which are capable of handling local time shifting but are nonmetric the first contribution of this paper is the proposal of new distance function which we call erp edit distance with real penalty representing marriage of norm and the edit distance erp can support local time shifting and is metric the second contribution of the paper is the development of pruning strategies for large time series databases given that erp is metric one way to prune is to apply the triangle inequality another way to prune is to develop lower bound on the erp distance we propose such lower bound which has the nice computational property that it can be efficiently indexed with standard tree moreover we show that these two ways of pruning can be used simultaneously for erp distances specifically the false positives obtained from the tree can be further minimized by applying the triangle inequality based on extensive experimentation with existing benchmarks and techniques we show that this combination delivers superb pruning power and search time performance and dominates all existing strategies
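for reference, a minimal implementation of the erp recurrence itself (with gap constant g and l1 penalties) is sketched below; the paper's contributions around lower bounding, tree indexing and triangle inequality pruning are not attempted here.

# a minimal sketch of ERP (edit distance with real penalty): gaps are penalized
# by the distance to a fixed constant g, so the result is a metric while local
# time shifting is still supported. this is only the quadratic dynamic program.
def erp(x, y, g=0.0):
    n, m = len(x), len(y)
    # dp[i][j] = ERP distance between x[:i] and y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + abs(x[i - 1] - g)       # x[i-1] matched to a gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + abs(y[j - 1] - g)       # y[j-1] matched to a gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),  # match the two elements
                dp[i - 1][j] + abs(x[i - 1] - g),             # gap in y
                dp[i][j - 1] + abs(y[j - 1] - g),             # gap in x
            )
    return dp[n][m]

print(erp([1, 2, 3, 4], [1, 2, 2, 3, 4]))   # 2.0: one extra element matched to the gap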
people are thirsty for medical information existing web search engines often cannot handle medical search well because they do not consider its special requirements often medical information searcher is uncertain about his exact questions and unfamiliar with medical terminology therefore he sometimes prefers to pose long queries describing his symptoms and situation in plain english and receive comprehensive relevant information from search results this paper presents medsearch specialized medical web search engine to address these challenges medsearch uses several key techniques to improve its usability and the quality of search results first it accepts queries of extended length and reforms long queries into shorter queries by extracting subset of important and representative words this not only significantly increases the query processing speed but also improves the quality of search results second it provides diversified search results lastly it suggests related medical phrases to help the user quickly digest search results and refine the query we evaluated medsearch using medical questions posted on medical discussion forums the results show that medsearch can handle various medical queries effectively and efficiently
information extraction by text segmentation iets applies to cases in which data values of interest are organized in implicit semi structured records available in textual sources eg postal addresses bibliographic information ads it is an important practical problem that has been frequently addressed in the recent literature in this paper we introduce ondux on demand unsupervised information extraction new unsupervised probabilistic approach for iets as other unsupervised iets approaches ondux relies on information available on pre existing data to associate segments in the input string with attributes of given domain unlike other approaches we rely on very effective matching strategies instead of explicit learning strategies the effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through reinforcement step that explores sequencing and positioning of attribute values directly learned on demand from test data with no previous human driven training feature unique to ondux this assigns to ondux high degree of flexibility and results in superior effectiveness as demonstrated by the experimental evaluation we report with textual sources from different domains in which ondux is compared with state of art iets approach
this paper presents technique called register value prediction rvp which uses type of locality called register value reuse by predicting that an instruction will produce the value that is already stored in the destination register we eliminate the need for large value buffers to enable value prediction even without the large buffers register value prediction can be made as or more effective than last value prediction particularly with the aid of compiler management of values in the register file both static and dynamic register value prediction techniques are demonstrated to exploit register value reuse the former requiring minimal instruction set architecture changes and the latter requiring set of small confidence counters we show an average gain of with dynamic rvp and moderate compiler assistance on next generation processor and on wide processor
we continue to investigate the direct style transformation by extending it to programs requiring call with current continuation aka call cc the direct style ds and the continuation passing style cps transformations form galois connection this pair of functions has place in the programmer’s toolbox yet we are not aware of the existence of any other ds transformer starting from our ds transformer towards pure call by value functional terms scheme we extend it with counting analysis to detect non canonical occurrences of continuation the declaration of such continuation is translated into call cc and its application into the application of the corresponding first class continuation we also present staged versions of the ds and of the cps transformations where administrative reductions are separated from the actual translation and where the actual translations are carried out by local structure preserving rewriting rules these staged transformations are used to prove the galois connection together the cps and the ds transformations enlarge the class of programs that can be manipulated on semantic basis we illustrate this point with partial evaluation by specializing scheme program with respect to static part of its input the program uses coroutines this illustration achieves first static coroutine is executed statically and its computational content is inlined in the residual program
this paper describes the results of research project aimed at implementing realistic embodied agent that can be animated in real time and is believable and expressive that is able to coherently communicate complex information through the combination and the tight synchronisation of verbal and nonverbal signals we describe in particular how we animate this agent that we called greta so as to enable her to manifest the affective states that are dynamically activated and de activated in her mind during the dialog with the user the system is made up of three tightly interrelated components representation of the agent mind this includes long and short term affective components personality and emotions and simulates how emotions are triggered and decay over time according to the agent’s personality and to the context and how several emotions may overlap dynamic belief networks with weighting of goals is the formalism we employ to this purpose mark up language to denote the communicative meanings that may be associated with dialog moves performed by the agent translation of the agent’s tagged move into face expression that combines appropriately the available channels gaze direction eyebrow shape head direction and movement etc the final output is facial model that respects the mpeg standard and uses mpeg facial animation parameters to produce facial expressions throughout the paper we illustrate the results obtained with an example of dialog in the domain of advice about eating disorders the paper concludes with an analysis of advantages of our cognitive model of emotion triggering and of the problems found in testing it although we did not yet complete formal evaluation of our system we briefly describe how we plan to assess the agent’s believability in terms of consistency of its communicative behaviour
the increasing complexity of applications on handheld devices requires the development of rich new interaction methods specifically designed for resource limited mobile use contexts one appealingly convenient approach to this problem is to use device motions as input paradigm in which the currently dominant interaction metaphors are gesture recognition and visually mediated scrolling however neither is ideal the former suffers from fundamental problems in the learning and communication of gestural patterns while the latter requires continual visual monitoring of the mobile device task that is undesirable in many mobile contexts and also inherently in conflict with the act of moving device to control it this paper proposes an alternate approach gestural menu technique inspired by marking menus and designed specifically for the characteristics of motion input it uses rotations between targets occupying large portions of angular space and emphasizes kinesthetic eyes free interaction three evaluations are presented two featuring an abstract user interface ui and focusing on how user performance changes when the basic system parameters of number size and depth of targets are manipulated these studies show that version of the menu system containing commands yields optimal performance compares well against data from the previous literature and can be used effectively eyes free without graphical feedback the final study uses full graphical ui and untrained users to demonstrate that the system can be rapidly learnt together these three studies rigorously validate the system design and suggest promising new directions for handheld motion based uis
efficiently building and maintaining resilient regular graphs is important for many applications such graphs must be easy to build and maintain in the presence of node additions and deletions they must also have high resilience connectivity typically algorithms use offline techniques to build regular graphs with strict bounds on resilience and such techniques are not designed to maintain these properties in the presence of online additions deletions and failures on the other hand random regular graphs are easy to construct and maintain and provide good properties with high probability but without strict guarantees in this paper we introduce new class of graphs that we call resilient random regular graphs and present technique to create and maintain graphs the graphs meld the desirable properties of random regular graphs and regular graphs with strict structural properties they are efficient to create and maintain and additionally are highly connected ie node and edge connected in the worst case we present the graph building and maintenance techniques present proofs for graph connectedness and various properties of graphs we believe that graphs will be useful in many communication applications
this paper proposes novel interaction techniques for parallel search tasks the system displays multiple search results returned by search engine side by side for each search query the system enables the user to rerank search results using reranking algorithm based on vertical and horizontal propagation of his her intention method of recommending operations for specific keywords is also proposed supporting operations such as shift to parallel search with an alternate term upgrading or downgrading results in terms of specific viewpoint and so on applications for the proposed system are also discussed
an important correctness criterion for software running on embedded microcontrollers is stack safety guarantee that the call stack does not overflow our first contribution is method for statically guaranteeing stack safety of interrupt driven embedded software using an approach based on context sensitive dataflow analysis of object code we have implemented prototype stack analysis tool that targets software for atmel avr microcontrollers and tested it on embedded applications compiled from up to lines of we experimentally validate the accuracy of the tool which runs in under sec on the largest programs that we tested the second contribution of this paper is the development of two novel ways to reduce stack memory requirements of embedded software
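the core computation behind such an analysis can be illustrated with a small sketch: the worst case stack depth of an acyclic call graph given per function frame sizes. the real tool works context sensitively on avr object code and must model interrupt preemption and detect recursion; none of that is shown, and the frame sizes and call graph below are made up.

# a minimal sketch of worst-case stack depth over an acyclic call graph, given
# each function's stack frame size. all data here is illustrative.
from functools import lru_cache

frame_size = {"main": 32, "read_sensor": 16, "log": 24, "format": 40}
calls = {"main": ["read_sensor", "log"], "log": ["format"], "read_sensor": [], "format": []}

@lru_cache(maxsize=None)
def max_stack(func):
    """Deepest stack reachable while `func` is active (its own frame included)."""
    callees = calls.get(func, [])
    return frame_size[func] + (max(map(max_stack, callees)) if callees else 0)

print(max_stack("main"))   # 32 + max(16, 24 + 40) = 96 bytes in this toy example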
with more computing platforms connected to the internet each day computer system security has become critical issue one of the major security problems is execution of malicious injected code in this paper we propose new processor extensions that allow execution of trusted instructions only the proposed extensions verify instruction block signatures in run time signatures are generated during trusted installation process using multiple input signature register misr and stored in an encrypted form the coefficients of the misr and the key used for signature encryption are based on hidden processor key signature verification is done in the background concurrently with program execution thus reducing negative impact on performance the preliminary results indicate that the proposed processor extensions will prevent execution of any unauthorized code at relatively small increase in system complexity and execution time
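a simplified software model of a multiple input signature register is sketched below to make the signature step concrete: an lfsr whose state absorbs one instruction word per cycle. the width, taps and instruction words are arbitrary placeholders; in the scheme above the coefficients would be derived from the hidden processor key and verification would happen in hardware concurrently with execution.

# a simplified software model of a MISR: shift the LFSR state with feedback from
# the tapped bits, then XOR in the next input word, compacting a whole
# instruction block into one short signature. width and taps are arbitrary here.
def misr_signature(words, width=32, taps=0x80200003, seed=0):
    mask = (1 << width) - 1
    state = seed
    for w in words:
        feedback = bin(state & taps).count("1") & 1       # parity of tapped bits
        state = ((state << 1) | feedback) & mask          # shift with LFSR feedback
        state ^= w & mask                                 # fold in the next input word
    return state

block = [0x940C0068, 0x2411E0A0, 0x0E94003C]              # made-up instruction words
print(hex(misr_signature(block)))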
radio frequency identification is gaining broader adoption in many areas one of the challenges in implementing an rfid based system is dealing with anomalies in rfid reads small number of anomalies can translate into large errors in analytical results conventional eager approaches cleanse all data upfront and then apply queries on cleaned data however this approach is not feasible when several applications define anomalies and corrections on the same data set differently and not all anomalies can be defined beforehand this necessitates anomaly handling at query time we introduce deferred approach for detecting and correcting rfid data anomalies each application specifies the detection and the correction of relevant anomalies using declarative sequence based rules an application query is then automatically rewritten based on the cleansing rules that the application has specified to provide answers over cleaned data we show that naive approach to deferred cleansing that applies rules without leveraging query information can be prohibitive we develop two novel rewrite methods both of which reduce the amount of data to be cleaned by exploiting predicates in application queries while guaranteeing correct answers we leverage standardized sql olap functionality to implement rules specified in declarative sequence based language this allows efficient evaluation of cleansing rules using existing query processing capabilities of dbms our experimental results show that deferred cleansing is affordable for typical analytic queries over rfid data
modern window based user interface systems generate user interface events as natural products of their normal operation because such events can be automatically captured and because they indicate user behavior with respect to an application’s user interface they have long been regarded as potentially fruitful source of information regarding application usage and usability however because user interface events are typically voluminous and rich in detail automated support is generally required to extract information at level of abstraction that is useful to investigators interested in analyzing application usage or evaluating usability this survey examines computer aided techniques used by hci practitioners and researchers to extract usability related information from user interface events framework is presented to help hci practitioners and researchers categorize and compare the approaches that have been or might fruitfully be applied to this problem because many of the techniques in the research literature have not been evaluated in practice this survey provides conceptual evaluation to help identify some of the relative merits and drawbacks of the various classes of approaches ideas for future research in this area are also presented this survey addresses the following questions how might user interface events be used in evaluating usability how are user interface events related to other forms of usability data what are the key challenges faced by investigators wishing to exploit this data what approaches have been brought to bear on this problem and how do they compare to one another what are some of the important open research questions in this area
in recent years there has been tremendous growth of online text information related to the explosive growth of the web which provides very useful information resource to all types of users who access the internet for various purposes the major demand of these users is to get required information within the stipulated time humans can recognise subjects of document fields by reading only some relevant specific words called field association words in the field this paper presents method of relevant estimation among fields by using field association words two methods are proposed in this paper first is method of extraction of co occurrence among fields and the second is method of judgment of similarity among fields as the methods of relevant estimation among fields from experimental results precision of the first method is high when relevance among fields is very high and considering direction of fields preferable results are obtained in the second method
in order to obtain machine understandable semantics for web resources research on the semantic web tries to annotate web resources with concepts and relations from explicitly defined formal ontologies this kind of formal annotation is usually done manually or semi automatically in this paper we explore complement approach that focuses on the social annotations of the web which are annotations manually made by normal web users without pre defined formal ontology compared to the formal annotations although social annotations are coarse grained informal and vague they are also more accessible to more people and better reflect the web resources meaning from the users point of views during their actual usage of the web resources using social bookmark service as an example we show how emergent semantics can be statistically derived from the social annotations furthermore we apply the derived emergent semantics to discover and search shared web bookmarks the initial evaluation on our implementation shows that our method can effectively discover semantically related web bookmarks that current social bookmark service can not discover easily
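one simple statistical derivation in this spirit, shown purely as an illustrative stand-in for the method used in the paper, is cosine similarity over tag co-occurrence across bookmarks:

# a minimal sketch of deriving relatedness between tags from social annotations:
# count how often two tags co-occur on the same bookmark and normalize by tag
# frequencies (a cosine-style score). data and scoring are illustrative.
from collections import Counter
from math import sqrt

def tag_similarity(bookmarks):
    """bookmarks: list of tag sets, one per (user, url) annotation."""
    cooc = Counter()
    freq = Counter()
    for tags in bookmarks:
        for t in tags:
            freq[t] += 1
        for a in tags:
            for b in tags:
                if a < b:
                    cooc[(a, b)] += 1
    return {pair: n / sqrt(freq[pair[0]] * freq[pair[1]]) for pair, n in cooc.items()}

bookmarks = [{"python", "programming"}, {"python", "snake"}, {"programming", "python", "web"}]
sims = tag_similarity(bookmarks)
print(max(sims, key=sims.get))   # ('programming', 'python') in this toy example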
miss rate curves mrcs are useful in number of contexts in our research online cache mrcs enable us to dynamically identify optimal cache sizes when cache partitioning shared cache multicore processor obtaining mrcs has generally been assumed to be expensive when done in software and consequently their usage for online optimizations has been limited to address these problems and opportunities we have developed low overhead software technique to obtain mrcs online on current processors exploiting features available in their performance monitoring units so that no changes to the application source code or binaries are required our technique called rapidmrc requires single probing period of roughly million processor cycles ms and subsequently million cycles ms to process the data we demonstrate its accuracy by comparing the obtained mrcs to the actual mrcs of applications taken from speccpu speccpu and specjbb we show that rapidmrc can be applied to sizing cache partitions helping to achieve performance improvements of up to
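for context, the sketch below computes an lru miss rate curve the classical offline way, from a full reference trace via stack distances; rapidmrc approximates the same curve online from hardware sampled traces, which is not attempted here.

# a minimal sketch of an offline LRU miss rate curve: track the stack (reuse)
# distance of every access, then read off the miss rate for each cache size.
from collections import Counter

def lru_miss_rate_curve(trace, max_size):
    stack = []                     # most recently used cache line at the front
    distances = Counter()          # stack distance -> number of accesses
    cold = 0
    for line in trace:
        if line in stack:
            d = stack.index(line)  # 0 == most recently used
            distances[d] += 1
            stack.pop(d)
        else:
            cold += 1
        stack.insert(0, line)
    total = len(trace)
    # miss rate for a cache of c lines = accesses with distance >= c, plus cold misses
    curve = []
    for c in range(1, max_size + 1):
        misses = cold + sum(n for d, n in distances.items() if d >= c)
        curve.append(misses / total)
    return curve

trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4]
print(lru_miss_rate_curve(trace, max_size=4))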
in this paper we present framework for mining diverging patterns new type of contrast patterns whose frequency changes significantly differently in two data sets eg it changes from relatively low to relatively high value in one dataset but from high to low in the other in this framework measure called diverging ratio is defined and used to discover diverging patterns we use four dimensional vector to represent pattern and define the pattern’s diverging ratio based on the angular difference between its vectors in two datasets an algorithm is proposed to mine diverging patterns from pair of datasets which makes use of standard frequent pattern mining algorithm to compute vector components efficiently we demonstrate the effectiveness of our approach on real world datasets showing that the method can reveal novel knowledge from large databases
we present formal framework to specify and test systems presenting both soft and hard deadlines while hard deadlines must be always met on time soft deadlines can be sometimes met in different time usually higher than the specified one it is this characteristic to formally define sometimes that produces several reasonable alternatives to define appropriate implementation relations that is relations to decide whether an implementation is correct with respect to specification in addition to introduce these relations we define testing framework to test implementations
information extraction ie systems are prone to false hits for variety of reasons and we observed that many of these false hits occur in sentences that contain subjective language eg opinions emotions and sentiments motivated by these observations we explore the idea of using subjectivity analysis to improve the precision of information extraction systems in this paper we describe an ie system that uses subjective sentence classifier to filter its extractions we experimented with several different strategies for using the subjectivity classifications including an aggressive strategy that discards all extractions found in subjective sentences and more complex strategies that selectively discard extractions we evaluated the performance of these different approaches on the muc terrorism data set we found that indiscriminately filtering extractions from subjective sentences was overly aggressive but more selective filtering strategies improved ie precision with minimal recall loss
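the filtering step itself is simple and can be sketched as follows; is_subjective stands in for the trained sentence classifier, the cue list is a made up placeholder, and the aggressive discard-all strategy shown is only one of the strategies compared in the paper.

# a minimal sketch of filtering IE extractions by sentence subjectivity.
def filter_extractions(extractions, is_subjective):
    """extractions: list of (sentence, extracted_fact) pairs."""
    return [(sent, fact) for sent, fact in extractions if not is_subjective(sent)]

OPINION_CUES = {"believe", "terrible", "outrageous", "wonderful", "feel"}

def naive_is_subjective(sentence):
    # stand-in lexicon lookup; the real classifier is trained, not rule based
    return any(cue in sentence.lower() for cue in OPINION_CUES)

extractions = [("a bomb exploded near the embassy", "attack: bombing"),
               ("officials believe the attack was outrageous", "attack: ?")]
print(filter_extractions(extractions, naive_is_subjective))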
we present novel counting network construction where the number of input wires is smaller than or equal to the number of output wires the depth of our network is lg which depends only on in contrast the amortized contention of the network depends on the number of concurrent processes and the parameters and this offers more flexibility than all previously known networks with the same number of input and output wires whose contention depends only on two parameters and in case wlgw by choosing wlgw the contention of our network is nlgw which improves by logarithmic factor of over all previously known networks with wires
lot of recent work has focussed on bulk loading of data into multidimensional index structures in order to efficiently construct such structures for large datasets in this paper we address this problem with particular focus on trees which are an important class of index structures used widely in commercial database systems we propose new technique which as opposed to the current technique of inserting data one by one bulk inserts entire new datasets into an active tree this technique called stlt for small tree large tree considers the new dataset as an tree itself small tree identifies and prepares suitable location in the original tree large tree for insertion and lastly performs the insert of the small tree into the large tree besides an analytical cost model of stlt extensive experimental studies both on synthetic and real gis data sets are also reported these experiments not only compare stlt against the conventional technique but also evaluate the suitability and limitations of stlt under different conditions such as varying buffer sizes ratio between existing and new data sizes and skewness of new data with respect to the whole spatial region we find that stlt does much better on average about than the existing technique for skewed datasets as well for large sizes of both the large tree and the small tree in terms of insertion time while keeping comparable query tree quality stlt consistently outperforms the alternate technique in all other circumstances in terms of bulk insertion time especially even up to for the cases when the area of new data sets covers up to of the global region covered by the existing index tree however at the cost of deteriorating resulting tree quality
the advent of the internet and the web and their subsequent ubiquity have brought forth opportunities to connect information sources across all types of boundaries local regional organizational etc examples of such information sources include databases xml documents and other unstructured sources uniformly querying those information sources has been extensively investigated major challenge relates to query optimization indeed querying multiple information sources scattered on the web raises several barriers for achieving efficiency this is due to the characteristics of web information sources that include volatility heterogeneity and autonomy those characteristics impede straightforward application of classical query optimization techniques they add new dimensions to the optimization problem such as the choice of objective function selection of relevant information sources limited query capabilities and unpredictable events in this paper we survey the current research on fundamental problems to efficiently process queries over web data integration systems we also outline classification for optimization techniques and framework for evaluating them
garbage first is server style garbage collector targeted for multi processors with large memories that meets soft real time goal with high probability while achieving high throughput whole heap operations such as global marking are performed concurrently with mutation to prevent interruptions proportional to heap or live data size concurrent marking both provides collection completeness and identifies regions ripe for reclamation via compacting evacuation this evacuation is performed in parallel on multiprocessors to increase throughput
this article discusses potential application of radio frequency identification rfid and collaborative filtering for targeted advertising in grocery stores every day hundreds of items in grocery stores are marked down for promotional purposes whether these promotions are effective or not depends primarily on whether the customers are aware of them or not and secondarily whether the customers are interested in the products or not currently the companies are incapable of influencing the customers decision making process while they are shopping however the capabilities of rfid technology enable us to transfer the recommendation systems of commerce to grocery stores in our model using rfid technology we get real time information about the products placed in the cart during the shopping process based on that information we inform the customer about those promotions in which the customer is likely to be interested in the selection of the product advertised is dynamic decision making process since it is based on the information of the products placed inside the cart while customer is shopping collaborative filtering will be used for the identification of the advertised product and bayesian networks will be used for the application of collaborative filtering we are assuming scenario where all products have rfid tags and grocery carts are equipped with rfid readers and screens that would display the relevant promotions
modern computer systems permit users to access protected information from remote locations in certain secure environments it would be desirable to restrict this access to particular computer or set of computers existing solutions of machine level authentication are undesirable for two reasons first they do not allow fine grained application layer access decisions second they are vulnerable to insider attacks in which trusted administrator acts maliciously in this work we describe novel approach using secure hardware that solves these problems in our design multiple administrators are required for installation of system after installation the authentication privileges are physically linked to that machine and no administrator can bypass these controls we define an administrative model and detail the requirements for an authentication protocol to be compatible with our methodology our design presents some challenges for large scale systems in addition to the benefit of reduced maintenance
wi fi clients can obtain much better performance at some commercial hotspots than at others unfortunately there is currently no way for users to determine which hotspot access points aps will be sufficient to run their applications before purchasing access to address this problem this paper presents wifi reports collaborative service that provides wi fi clients with historical information about ap performance and application support the key research challenge in wifi reports is to obtain accurate user submitted reports this is challenging because two conflicting goals must be addressed in practical system preserving the privacy of users reports and limiting fraudulent reports we introduce practical cryptographic protocol that achieves both goals and we address the important engineering challenges in building wifi reports using measurement study of commercial aps in seattle we show that wifi reports would improve performance over previous ap selection approaches in of locations
in this paper we present an approach to design of command tables in aircraft cockpits to date there is no common standard for designing this kind of command tables command tables impose high load on human visual senses for displaying flight information such as altitude attitude vertical speed airspeed heading and engine power heavy visual workload and physical conditions significantly influence cognitive processes of an operator in an aircraft cockpit proposed solution formalizes the design process describing instruments in terms of estimated effects they produce on flight operators in this way we can predict effects and constraints of particular type of flight instrument and avoid unexpected effects early in the design process
there has been flurry of recent work on the design of high performance software and hybrid hardware software transactional memories stms and hytms this paper reexamines the design decisions behind several of these state of the art algorithms adopting some ideas rejecting others all in an attempt to make stms faster we created the transactional locking tl framework of stm algorithms and used it to conduct range of comparisons of the performance of non blocking lock based and hybrid stm algorithms versus fine grained hand crafted ones we were able to make several illuminating observations regarding lock acquisition order the interaction of stms with memory management schemes and the role of overheads and abort rates in stm performance
the analysis of movement of people vehicles and other objects is important for carrying out research in social and scientific domains the study of movement behavior of spatiotemporal entities helps enhance the quality of service in decision making in real applications however the spread of certain entities such as diseases or rumor is difficult to observe compared to the movement of people vehicles or animals we can only infer their locations in certain region of space time on the basis of observable events in this paper we propose new model called moving phenomenon to represent time varying phenomena over geotime tagged contents on the web the most important feature of this model is the integration of thematic dimension into an event based spatiotemporal data model by using the proposed model user can aggregate relevant contents relating to an interesting phenomenon and perceive its movement behavior further the model also enables user to navigate the spatial temporal and thematic information of the contents along all the three dimensions finally we present an example of typhoons to illustrate moving phenomena and draw comparison between the movement of the moving phenomenon created using information from news articles on the web and that of the actual typhoon
we present detailed study of network evolution by analyzing four large online social networks with full temporal information about node and edge arrivals for the first time at such large scale we study individual node arrival and edge creation processes that collectively lead to macroscopic properties of networks using methodology based on the maximum likelihood principle we investigate wide variety of network formation strategies and show that edge locality plays critical role in evolution of networks our findings supplement earlier network models based on the inherently non local preferential attachment based on our observations we develop complete model of network evolution where nodes arrive at prespecified rate and select their lifetimes each node then independently initiates edges according to gap process selecting destination for each edge according to simple triangle closing model free of any parameters we show analytically that the combination of the gap distribution with the node lifetime leads to power law out degree distribution that accurately reflects the true network in all four cases finally we give model parameter settings that allow automatic evolution and generation of realistic synthetic networks of arbitrary scale
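the arrival, lifetime, gap and triangle-closing process described above lends itself to a small simulation; the sketch below is a simplified reading of that model with illustrative parameter values and helper names, not the authors' calibrated generator.

```python
# simplified simulator in the spirit of the evolution model above: nodes arrive
# at a fixed rate, draw a lifetime, and while alive create edges separated by
# random "gaps", closing triangles by linking to a neighbor of a neighbor;
# all parameter values and helper names here are illustrative only.
import random
from collections import defaultdict

random.seed(0)

def pick_target(u, adj, nodes):
    """First edge: uniform random node; later edges: close a triangle when possible."""
    if adj[u]:
        w = random.choice(list(adj[u]))              # step to a neighbor...
        candidates = list(adj[w] - adj[u] - {u})     # ...then to a neighbor of a neighbor
        if candidates:
            return random.choice(candidates)
    return random.choice(nodes)

def simulate(steps=2000, arrivals_per_step=1, mean_lifetime=300.0, mean_gap=20.0):
    adj = defaultdict(set)
    death, next_edge, nodes = {}, {}, []
    for step in range(steps):
        t = float(step)
        for _ in range(arrivals_per_step):           # node arrival process
            u = len(nodes)
            nodes.append(u)
            death[u] = t + random.expovariate(1.0 / mean_lifetime)
            next_edge[u] = t + random.expovariate(1.0 / mean_gap)
        for u in nodes:
            if t < death[u] and t >= next_edge[u]:   # node is alive and its gap elapsed
                v = pick_target(u, adj, nodes)
                if v is not None and v != u:
                    adj[u].add(v)
                    adj[v].add(u)
                next_edge[u] = t + random.expovariate(1.0 / mean_gap)
    return adj

adj = simulate()
degrees = sorted((len(v) for v in adj.values()), reverse=True)
print("nodes:", len(adj), "edges:", sum(degrees) // 2, "max degree:", degrees[0])
```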
we study complexity issues for interaction systems general model for component based systems that allows for very flexible interaction mechanism we present complexity results for important properties of interaction systems such as local global deadlock freedom progress and availability of components
many distributed monitoring applications of wireless sensor networks wsns require the location information of sensor node in this article we address the problem of enabling nodes of wireless sensor networks to determine their location in an untrusted environment known as the secure localization problem we propose novel range independent localization algorithm called serloc that is well suited to resource constrained environment such as wsn serloc is distributed algorithm based on two tier network architecture that allows sensors to passively determine their location without interacting with other sensors we show that serloc is robust against known attacks on wsns such as the wormhole attack the sybil attack and compromise of network entities and analytically compute the probability of success for each attack we also compare the performance of serloc with state of the art range independent localization schemes and show that serloc has better performance
in recent years network coding has emerged as new communication paradigm that can significantly improve the efficiency of network protocols by requiring intermediate nodes to mix packets before forwarding them recently several real world systems have been proposed to leverage network coding in wireless networks although the theoretical foundations of network coding are well understood real world system needs to solve plethora of practical aspects before network coding can meet its promised potential these practical design choices expose network coding systems to wide range of attacks we identify two general frameworks inter flow and intra flow that encompass several network coding based systems proposed in wireless networks our systematic analysis of the components of these frameworks reveals vulnerabilities to wide range of attacks which may severely degrade system performance then we identify security goals and design challenges in achieving security for network coding systems adequate understanding of both the threats and challenges is essential to effectively design secure practical network coding systems our paper should be viewed as cautionary note pointing out the frailty of current network coding based wireless systems and general guideline in the effort of achieving security for network coding systems
the real world we live in is mostly perceived through an incredibly large collection of views generated by humans machines and other systems this is the view reality the opsis project concentrates its efforts on dealing with the multifaceted form and complexity of data views including data projection views aggregate views summary views synopses and finally web views in particular opsis deals with the generation the storage organization cubetrees the efficient run time management dynamat of materialized views for data warehouse systems and for web servers with dynamic content webviews
the visual vocabulary is an intermediate level representation which has been proved to be very powerful for addressing object categorization problems it is generally built by vector quantizing set of local image descriptors independently of the object model used for categorizing images we propose here to embed the visual vocabulary creation within the object model construction allowing us to make it more suited for object class discrimination and therefore for object categorization we also show that the model can be adapted to perform object level segmentation task without needing any shape model making the approach well adapted to high intra class varying objects
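for contrast with the joint construction proposed above, the following sketch shows only the conventional baseline the abstract starts from: k-means vector quantization of local descriptors into a visual vocabulary followed by a bag-of-visual-words histogram; the descriptors here are synthetic random vectors and numpy is assumed.

```python
# baseline only: the conventional vocabulary construction the abstract improves on,
# i.e. k-means vector quantization of local descriptors followed by a
# bag-of-visual-words histogram; descriptors here are synthetic random vectors.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # squared distances
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bag_of_words(descriptors, vocabulary):
    d = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)                       # assign each descriptor to a visual word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                       # normalized word histogram

# synthetic stand-ins for local image descriptors
training_descriptors = rng.normal(size=(500, 16))
vocabulary = kmeans(training_descriptors, k=8)
image_descriptors = rng.normal(size=(60, 16))
print(bag_of_words(image_descriptors, vocabulary))
```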
traditional web service discovery is strongly related to the use of service directories especially in the case of mobile web services where both service requestors and providers are mobile the dynamics impose the need for directory based discovery context plays an eminent role with mobility as filtering mechanism that enhances service discovery through the selection of the most appropriate service however current service directory specifications do not focus on mobility of services or context awareness in this paper we propose casd context aware service directory envisioned as context based index for services on top of any traditional service directory and design algorithms for construction search update and merge of such directories furthermore we describe the architecture of our system for context aware service discovery we present prototype implementation and discuss the experimental results as well as the overall evaluation the contribution of this work is the proposal for novel enhanced representation model for context aware service directory
we address the pose mismatch problem which can occur in face verification systems that have only single frontal face image available for training in the framework of bayesian classifier based on mixtures of gaussians the problem is tackled through extending each frontal face model with artificially synthesized models for non frontal views the synthesis methods are based on several implementations of maximum likelihood linear regression mllr as well as standard multi variate linear regression linreg all synthesis techniques rely on prior information and learn how face models for the frontal view are related to face models for non frontal views the synthesis and extension approach is evaluated by applying it to two face verification systems holistic system based on pca derived features and local feature system based on dct derived features experiments on the feret database suggest that for the holistic system the linreg based technique is more suited than the mllr based techniques for the local feature system the results show that synthesis via new mllr implementation obtains better performance than synthesis based on traditional mllr the results further suggest that extending frontal models considerably reduces errors it is also shown that the local feature system is less affected by view changes than the holistic system this can be attributed to the parts based representation of the face and due to the classifier based on mixtures of gaussians the lack of constraints on spatial relations between the face parts allowing for deformations and movements of face areas
views over distributed information sources such as data warehouses rely on the stability of the schemas of underlying databases in the event of meta data changes in the sources such as the deletion of table or column such views may become undefined using meta data about information redundancy views can be evolved as necessary to remain well defined after source meta data changes previous work in view synchronization focused only on deletions of schema elements we now offer an approach that makes use of additions also our algorithm returns view definitions to previous versions by using knowledge about the history of views and meta data this technology enables us to adapt views to temporary meta data changes by canceling out opposite changes it also allows undo redo operations on meta data last in many cases the resulting evolved views even have an improved information quality in this paper we give formal taxonomy of schema and constraint changes and full description of the proposed history driven view synchronization algorithm for this taxonomy we also prove the history driven view synchronization algorithm to be correct our approach falls in the global as view category of data integration solutions but unlike prior solutions in this category it now also deals with changes in the information space rather than requiring source schemas to remain constant over time
this work addresses the problem of trading off the latency in delivering the answer to the sink at the benefit of balancing the spatial dispersion of the energy consumption among the nodes and consequently prolonging the lifetime in sensor networks typically in response to query that pertains to the data from some geographic region tree structure is constructed and when possible some in network aggregation is performed on the other hand in order to increase the robustness and or balance the load multipath routing is employed motivated by earlier work that combined trees and multipaths in this paper we explore the possibility and the impact of combining multiple trees and multiple multipaths for routing when processing query with respect to given region of interest we present and evaluate two approaches that enable load balancing in terms of alternating among collection of routing structures
short tcp flows may suffer significant response time performance degradations during network congestion unfortunately this creates an incentive for misbehavior by clients of interactive applications eg gaming telnet web to send dummy packets into the network at tcp fair rate even when they have no data to send thus improving their performance in moments when they do have data to send even though no law is violated in this way large scale deployment of such an approach has the potential to seriously jeopardize one of the core internet’s principles statistical multiplexing we quantify by means of analytical modeling and simulation gains achievable by the above misbehavior our research indicates that easy to implement application level techniques are capable of dramatically reducing incentives for conducting the above transgressions still without compromising the idea of statistical multiplexing
this paper presents the architecture of an asynchronous array of simple processors asap and evaluates its key architectural features as well as its performance and energy efficiency the asap processor calculates dsp applications with high energy efficiency is capable of high performance is easily scalable and is well suited to future fabrication technologies it is composed of two dimensional array of simple single issue programmable processors interconnected by reconfigurable mesh network processors are designed to capture the kernels of many dsp algorithms with very little additional overhead each processor contains its own tunable and haltable clock oscillator and processors operate completely asynchronously with respect to each other in globally asynchronous locally synchronous gals fashion asap array has been designed and fabricated in μm cmos technology each processor occupies mm is fully functional at clock rate of mhz at and dissipates an average of mw per processor at mhz under typical conditions while executing applications such as jpeg encoder core and complete ieee wireless lan baseband transmitter most processors operate at over mhz at processors dissipate mw at mhz and single asap processor occupies or less area than single processing element in other multi processor chips compared to several risc processors single issue mips and arm asap achieves performance times greater energy efficiency times greater while using far less area compared to the ti cx high end dsp processor asap achieves performance times greater energy efficiency times greater with an area times smaller compared to asic implementations asap achieves performance within factor of energy efficiency within factor of with area within factor of these data are for varying numbers of asap processors per benchmark
there is growing wealth of data describing networks of various types including social networks physical networks such as transportation or communication networks and biological networks at the same time there is growing interest in analyzing these networks in order to uncover general laws that govern their structure and evolution and patterns and predictive models to develop better policies and practices however fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality it is noisy and incomplete and before any analysis method can be applied the data must be cleaned and missing information inferred in this paper we introduce the notion of graph identification which explicitly models the inference of cleaned output network from noisy input graph it is this output network that is appropriate for further analysis we present an illustrative example and use the example to explore the types of inferences involved in graph identification as well as the challenges and issues involved in combining those inferences we then present simple general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter dependencies among these inferences
this paper studies the memory system behavior of java programs by analyzing memory reference traces of several specjvm applications running with just in time jit compiler trace information is collected by an exception based tracing tool called jtrace without any instrumentation to the java programs or the jit compiler first we find that the overall cache miss ratio is increased due to garbage collection which suffers from higher cache misses compared to the application we also note that going beyond way cache associativity improves the cache miss ratio marginally second we observe that java programs generate substantial amount of short lived objects however the size of frequently referenced long lived objects is more important to the cache performance because it tends to determine the application’s working set size finally we note that the default heap configuration which starts from small initial heap size is very inefficient since it invokes garbage collector frequently although the direct costs of garbage collection decrease as we increase the available heap size there exists an optimal heap size which minimizes the total execution time due to the interaction with the virtual memory performance
previous studies of non parametric kernel npk learning usually reduce to solving some semi definite programming sdp problem by standard sdp solver however time complexity of standard interior point sdp solvers could be as high as such intensive computation cost prohibits npk learning applicable to real applications even for data sets of moderate size in this paper we propose an efficient approach to npk learning from side information referred to as simplenpkl which can efficiently learn non parametric kernels from large sets of pairwise constraints in particular we show that the proposed simplenpkl with linear loss has closed form solution that can be simply computed by the lanczos algorithm moreover we show that the simplenpkl with square hinge loss can be re formulated as saddle point optimization task which can be further solved by fast iterative algorithm in contrast to the previous approaches our empirical results show that our new technique achieves the same accuracy but is significantly more efficient and scalable
discovering and unlocking the full potential of complex pervasive environments is still approached in application centric ways set of statically deployed applications often defines the possible interactions within the environment however the increasing dynamics of such environments require more versatile and generic approach which allows the end user to inspect configure and control the overall behavior of such an environment meta ui addresses these needs by providing the end user with an interactive view on physical or virtual environment which can then be observed and manipulated at runtime the meta ui bridges the gap between the resource providers and the end users by abstracting resource’s features as executable activities that can be assembled at runtime to reach common goal in order to allow software services to automatically integrate with pervasive computing environment the minimal requirements of the environment’s meta ui must be identified and agreed on in this paper we present meta stud goal and service oriented reference framework that supports the creation of meta uis for usage in pervasive environments the framework is validated using two independent implementation approaches designed with different technologies and focuses
case based reasoning cbr was firstly introduced into the area of business failure prediction bfp in the conclusion drawn out in its first application in this area is that cbr is not more applicable than multiple discriminant analysis mda and logit on the contrary there are some arguments which claim that cbr with nearest neighbor nn as its heart is not surely outranked by those machine learning techniques in this research we attempt to investigate whether or not cbr is sensitive to the so called optimal feature subsets in bfp since feature subset is an important factor that accounts for cbr’s performance when cbr is used to solve such classification problem the retrieval process of its life cycle is mainly used we use the classical euclidean metric technique to calculate case similarity empirical data two years prior to failure are collected from shanghai stock exchange and shenzhen stock exchange in china four filters ie mda stepwise method logit stepwise method one way anova independent samples test and the wrapper approach of genetic algorithm are employed to generate five optimal feature subsets after data normalization thirty times hold out method is used as assessment of predictive performances by combining leave one out cross validation and hold out method the two statistical baseline models ie mda and logit and the new model of support vector machine are employed as comparative models empirical results indicate that cbr is truly sensitive to optimal feature subsets with data for medium term bfp the stepwise method of mda filter approach is the first choice for cbr to select optimal feature subsets followed by the stepwise method of logit and the wrapper the two filter approaches of anova and test are the fourth choice if mda stepwise method is employed to select optimal feature subset for the cbr system there are no significant difference on predictive performance of medium term bfp between cbr and the other three models ie mda logit svm on the contrary cbr is outperformed by the three models at the significant level of if anova or test is used as feature selection method for cbr
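the retrieval step described above, euclidean nearest neighbours over a selected feature subset, can be sketched as follows; the tiny case base, the feature indices and k are purely illustrative and are not the paper's data or chosen subsets.

```python
# minimal sketch of the cbr retrieval step: normalize the data, keep only a
# selected feature subset, and classify a query case by the majority label of
# its k nearest neighbours under the euclidean metric; the case base and the
# chosen feature indices are purely illustrative.
import math

case_base = [  # (financial ratios, label) -- hypothetical values, 1 = failed
    ([0.12, 1.8, 0.35, 0.05], 0),
    ([0.02, 0.9, 0.80, -0.10], 1),
    ([0.15, 2.1, 0.30, 0.08], 0),
    ([0.01, 0.7, 0.95, -0.20], 1),
]

def min_max_normalize(rows):
    lo = [min(r[i] for r in rows) for i in range(len(rows[0]))]
    hi = [max(r[i] for r in rows) for i in range(len(rows[0]))]
    norm = [[(r[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
             for i in range(len(r))] for r in rows]
    return norm, lo, hi

def euclidean(a, b, features):
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))

def cbr_predict(query, cases, labels, features, k=3):
    ranked = sorted(range(len(cases)), key=lambda j: euclidean(query, cases[j], features))
    votes = [labels[j] for j in ranked[:k]]
    return max(set(votes), key=votes.count)      # majority vote of retrieved cases

rows, labels = [c[0] for c in case_base], [c[1] for c in case_base]
norm_rows, lo, hi = min_max_normalize(rows)
query = [(v - l) / (h - l) if h > l else 0.0
         for v, l, h in zip([0.03, 1.0, 0.75, -0.05], lo, hi)]
selected_features = [0, 2, 3]     # stand-in for a filter/wrapper-selected subset
print("predicted class:", cbr_predict(query, norm_rows, labels, selected_features))
```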
understanding the nature of the workloads and system demands created by users of the world wide web is crucial to properly designing and provisioning web services previous measurements of web client workloads have been shown to exhibit number of characteristic features however it is not clear how those features may be changing with time in this study we compare two measurements of web client workloads separated in time by three years both captured from the same computing facility at boston university the older dataset obtained in is well known in the research literature and has been the basis for wide variety of studies the newer dataset was captured in and is comparable in size to the older dataset the new dataset has the drawback that the collection of users measured may no longer be representative of general web users however using it has the advantage that many comparisons can be drawn more clearly than would be possible using new different source of measurement our results fall into two categories first we compare the statistical and distributional properties of web requests across the two datasets this serves to reinforce and deepen our understanding of the characteristic statistical properties of web client requests we find that the kinds of distributions that best describe document sizes have not changed between and although specific values of the distributional parameters are different second we explore the question of how the observed differences in the properties of web client requests particularly the popularity and temporal locality properties affect the potential for web file caching in the network we find that for the computing facility represented by our traces between and the benefits of using size based caching policies have diminished and the potential for caching requested files in the network has declined
component based development is promising way to promote the productivity of large workflow systems development this paper proposes component based workflow systems development approach by investigating the following notions mechanisms and methods workflow component workflow component composition reuse association relationship between workflow components and workflow component repository the proposed approach is supported by set of development strategies and development platform through application and comparison we show the advantages of the component based workflow systems and the effectiveness of the proposed approach
given nodes in social network say authorship network how can we find the node author that is the center piece and has direct or indirect connections to all or most of them for example this node could be the common advisor or someone who started the research area that the nodes belong to isomorphic scenarios appear in law enforcement find the master mind criminal connected to all current suspects gene regulatory networks find the protein that participates in pathways with all or most of the given proteins viral marketing and many more connection subgraphs is an important first step handling the case of query nodes then the connection subgraph algorithm finds the intermediate nodes that provide good connection between the two original query nodes here we generalize the challenge in multiple dimensions first we allow more than two query nodes second we allow whole family of queries ranging from or to and with softand in between finally we design and compare fast approximation and study the quality speed trade off we also present experiments on the dblp dataset the experiments confirm that our proposed method naturally deals with multi source queries and that the resulting subgraphs agree with our intuition wall clock timing results on the dblp dataset show that our proposed approximation achieves good accuracy for about speedup
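the abstract does not spell out the proximity measure; a common instantiation for this kind of center-piece scoring is random walk with restart from each query node, combined multiplicatively for the and semantics and additively for or, as in the sketch below (toy graph, hypothetical restart probability, numpy assumed).

```python
# illustrative scoring only: random-walk-with-restart (rwr) proximity from each
# query node, combined with a product for the "and" semantics and a sum for
# "or"; the toy graph, restart probability and node indices are hypothetical.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (5, 3), (3, 6)]
n = 7
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
W = A / A.sum(axis=0, keepdims=True)         # column-normalized transition matrix

def rwr(seed, c=0.15, iters=100):
    """Steady-state visiting probabilities of a walk that restarts at `seed`."""
    e = np.zeros(n); e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - c) * W @ r + c * e
    return r

query_nodes = [0, 6]                          # the given source nodes
scores = np.vstack([rwr(q) for q in query_nodes])
and_score = scores.prod(axis=0)               # must be close to *all* query nodes
or_score = scores.sum(axis=0)                 # close to *any* query node
and_score[query_nodes] = 0                    # report intermediate nodes only
print("best center-piece candidate (and):", int(and_score.argmax()))
```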
animating crowd of characters is an important problem in computer graphics the latest techniques enable highly realistic group motions to be produced in feature animation films and video games however interactive methods have not emerged yet for editing the existing group motion of multiple characters we present an approach to editing group motion as whole while maintaining its neighborhood formation and individual moving trajectories in the original animation as much as possible the user can deform group motion by pinning or dragging individuals multiple group motions can be stitched or merged to form longer or larger group motion while avoiding collisions these editing operations rely on novel graph structure in which vertices represent positions of individuals at specific frames and edges encode neighborhood formations and moving trajectories we employ shape manipulation technique to minimize the distortion of relative arrangements among adjacent vertices while editing the graph structure the usefulness and flexibility of our approach is demonstrated through examples in which the user creates and edits complex crowd animations interactively using collection of group motion clips
many testing and analysis techniques have been developed for in house use although they are effective at discovering defects before program is deployed these techniques are often limited due to the complexity of real world code and thus miss program faults it will be the users of the program who eventually experience failures caused by the undetected faults to take advantage of the large number of program runs carried by the users recent work has proposed techniques to collect execution profiles from the users for developers to perform post deployment failure analysis however in order to protect users privacy and to reduce run time overhead such profiles are usually not detailed enough for the developers to identify or fix the root causes of the failures in this paper we propose novel approach to utilize user execution profiles for more effective in house testing and analysis our key insight is that execution profiles for program failures can be used to simplify program while preserving its erroneous behavior by simplifying program and scaling down its complexity according to its profiles in house testing and analysis techniques can be performed more accurately and efficiently and pragmatically program defects that occur more often and are arguably more relevant to users will be given preference during failure analysis specifically we adapt statistical debugging on execution profiles to predict likely failure related code and use syntax directed algorithm to trim failure irrelevant code from program while preserving its erroneous behavior as much as possible we conducted case studies on testing engine cute and software model checker blast to evaluate our technique we used subject programs from the aristotle analysis system and the software artifact infrastructure repository sir our empirical results show that using simplified programs cute and blast find more bugs with improved accuracy and performance they were able to detect and out of more bugs respectively in about half of the time that they took on the original test programs
to offset the effect of read miss penalties on processor utilization in shared memory multiprocessors several software and hardware based data prefetching schemes have been proposed major advantage of hardware techniques is that they need no support from the programmer or compilersequential prefetching is simple hardware controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache thus exploiting spatial locality in its simplest form the number of prefetched blocks on each miss is fixed throughout the execution however since the prefetching efficiency varies during the execution of program we propose to adapt the number of prefetched blocks according to dynamic measure of prefetching effectiveness simulations of this adaptive scheme show reductions of the number of read misses the read penalty and of the execution time by up to and respectively
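the adaptation rule proposed above, growing or shrinking the number of prefetched blocks according to how many prefetches turn out useful, can be illustrated with a toy cache simulation; the epoch length, thresholds and synthetic reference trace below are illustrative choices, not the protocol details of the proposal.

```python
# toy illustration of the adaptation idea: on every miss, prefetch `degree`
# consecutive blocks; periodically compare how many prefetched blocks were
# actually referenced and raise or lower the degree; thresholds, the epoch
# length and the synthetic reference trace are all illustrative choices.
import random

random.seed(1)

def run(trace, max_degree=8, epoch=64, high=0.75, low=0.25):
    cache, prefetched = set(), set()
    degree, useful, issued, misses = 1, 0, 0, 0
    for i, block in enumerate(trace):
        if block in cache:
            if block in prefetched:
                useful += 1                      # a prefetched block was actually used
                prefetched.discard(block)
        else:
            misses += 1
            cache.add(block)
            for k in range(1, degree + 1):       # sequential prefetch of the next blocks
                nxt = block + k
                if nxt not in cache:
                    cache.add(nxt)
                    prefetched.add(nxt)
                    issued += 1
        if (i + 1) % epoch == 0 and issued:      # adapt the prefetch degree per epoch
            ratio = useful / issued
            if ratio > high:
                degree = min(max_degree, degree + 1)
            elif ratio < low:
                degree = max(0, degree - 1)
            useful = issued = 0
    return misses, degree

# synthetic trace: sequential runs (prefetch-friendly) mixed with random blocks
trace = [b for start in range(0, 4000, 40) for b in range(start, start + 20)]
trace += [random.randrange(100000, 200000) for _ in range(2000)]
misses, final_degree = run(trace)
print("misses:", misses, "final degree:", final_degree)
```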
change detection is an important issue for modern geospatial information systems in this paper we address change detection of areal objects ie objects with closed curve outlines we specifically focus on the detection of movement translation and rotation and or deformation of such objects using aerial imagery the innovative approach we present in this paper combines geometric analysis with our model of differential snakes to support change detection geometric analysis proceeds by comparing the first moments of the two outlines describing the same object in different instances to estimate translation moment information allows us to determine the principal axes and eigenvectors of these outlines and with this we can determine object rotation as the angle between these principal axes next we apply polygon clipping techniques to calculate the intersection and difference of these two outlines we use this result to estimate the radial deformation of the object expansion and contraction the results are further refined through the use of our differential snakes model to distinguish true change from the effects of inaccuracy in object determination the aggregation of these tools defines powerful approach for change detection in the paper we present the theoretical background behind these components and experimental results that demonstrate the performance of our approach
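a minimal sketch of the moment-based step above, assuming the outline is given as a list of vertices (point moments are used instead of true area moments, which is an approximation): translation is estimated from the centroid difference and rotation from the angle between principal axes.

```python
# sketch of the moment-based geometry step: estimate translation from the
# difference of centroids (first moments) and rotation from the angle between
# principal axes (second central moments); for brevity this uses the outline's
# vertex points rather than true area moments, which is an approximation.
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def principal_axis_angle(points):
    cx, cy = centroid(points)
    mu20 = sum((x - cx) ** 2 for x, y in points)
    mu02 = sum((y - cy) ** 2 for x, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)   # orientation of the major axis

def rigid_change(old_outline, new_outline):
    (ox, oy), (nx, ny) = centroid(old_outline), centroid(new_outline)
    rotation = principal_axis_angle(new_outline) - principal_axis_angle(old_outline)
    return (nx - ox, ny - oy), rotation

# hypothetical building outline and the same outline shifted and rotated by 30 degrees
old = [(0, 0), (4, 0), (4, 2), (0, 2)]
theta = math.radians(30)
new = [(x * math.cos(theta) - y * math.sin(theta) + 10,
        x * math.sin(theta) + y * math.cos(theta) + 5) for x, y in old]
translation, rotation = rigid_change(old, new)
print("translation:", translation, "rotation (deg):", round(math.degrees(rotation), 2))
```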
coupled transformation occurs when multiple software artifacts must be transformed in such way that they remain consistent with each other for instance when database schema is adapted in the context of system maintenance the persistent data residing in the system’s database needs to be migrated to conform to the adapted schema also queries embedded in the application code and any declared referential constraints must be adapted to take the schema changes into account as another example in xml to relational data mapping hierarchical xml schema is mapped to relational sql schema with appropriate referential constraints and the xml documents and queries are converted into relational data and relational queries the lt project is aimed at providing formal basis for coupled transformation this formal basis is found in data refinement theory point free program calculation and strategic term rewriting we formalize the coupled transformation of data type by an algebra of information preserving data refinement steps each witnessed by appropriate data conversion functions refinement steps are modeled by so called two level rewrite rules on type expressions that synthesize conversion functions between redex and reduct while rewriting strategy combinators are used to compose two level rewrite rules into complete rewrite systems point free program calculation is applied to optimize synthesized conversion functions to migrate queries and to normalize data type constraints in this paper we provide an overview of the challenges met by the lt project and we give sketch of the solutions offered
in this paper we develop fault tolerant job scheduling strategy in order to tolerate faults gracefully in an economy based grid environment we propose novel adaptive task checkpointing based fault tolerant job scheduling strategy for an economy based grid the proposed strategy maintains fault index of grid resources it dynamically updates the fault index based on successful or unsuccessful completion of an assigned task whenever grid resource broker has tasks to schedule on grid resources it makes use of the fault index from the fault tolerant schedule manager in addition to using time optimization heuristic while scheduling grid job on grid resource the resource broker uses fault index to apply different intensity of task checkpointing inserting checkpoints in task at different intervals to simulate and evaluate the performance of the proposed strategy this paper enhances the gridsim toolkit to exhibit fault tolerance related behavior we also compare checkpointing fault tolerant job scheduling strategy with the well known time optimization heuristic in an economy based grid environment from the measured results we conclude that even in the presence of faults the proposed strategy effectively schedules grid jobs tolerating faults gracefully and executes more jobs successfully within the specified deadline and allotted budget it also improves the overall execution time and minimizes the execution cost of grid jobs
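the core bookkeeping of the strategy above, a per-resource fault index that is updated on success or failure and mapped to a checkpointing intensity, might look like the standalone sketch below; the constants, the update rule and the resource-selection stand-in are illustrative, and the paper's actual evaluation is done inside the gridsim toolkit rather than in plain python.

```python
# standalone sketch of the strategy's core bookkeeping: keep a fault index per
# resource, bump it on task failure, decay it on success, and choose a
# checkpointing interval that becomes shorter (more frequent checkpoints) as
# the index rises; all constants and the failure probabilities are illustrative.
import random

random.seed(3)

class FaultTolerantScheduler:
    def __init__(self, resources):
        self.fault_index = {r: 0 for r in resources}

    def checkpoint_interval(self, resource, task_length):
        """More observed faults -> more checkpoints, i.e. a shorter interval between them."""
        return task_length / (1 + self.fault_index[resource])

    def pick_resource(self):
        # stand-in for combining the fault index with the time-optimization heuristic
        return min(self.fault_index, key=self.fault_index.get)

    def report(self, resource, succeeded):
        if succeeded:
            self.fault_index[resource] = max(0, self.fault_index[resource] - 1)
        else:
            self.fault_index[resource] += 1

sched = FaultTolerantScheduler(["r1", "r2", "r3"])
failure_prob = {"r1": 0.05, "r2": 0.4, "r3": 0.1}    # hidden ground truth
for job in range(200):
    r = sched.pick_resource()
    ok = random.random() > failure_prob[r]
    sched.report(r, ok)
for r in sched.fault_index:
    print(r, "fault index:", sched.fault_index[r],
          "checkpoint interval for a 100s task:", sched.checkpoint_interval(r, 100.0))
```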
the microprocessor industry is currently struggling with higher development costs and longer design times that arise from exceedingly complex processors that are pushing the limits of instruction level parallelism meanwhile such designs are especially ill suited for important commercial applications such as on line transaction processing oltp which suffer from large memory stall times and exhibit little instruction level parallelism given that commercial applications constitute by far the most important market for high performance servers the above trends emphasize the need to consider alternative processor designs that specifically target such workloads the abundance of explicit thread level parallelism in commercial workloads along with advances in semiconductor integration density identify chip multiprocessing cmp as potentially the most promising approach for designing processors targeted at commercial servers this paper describes the piranha system research prototype being developed at compaq that aggressively exploits chip multi processing by integrating eight simple alpha processor cores along with two level cache hierarchy onto single chip piranha also integrates further on chip functionality to allow for scalable multiprocessor configurations to be built in glueless and modular fashion the use of simple processor cores combined with an industry standard asic design methodology allow us to complete our prototype within short time frame with team size and investment that are an order of magnitude smaller than that of commercial microprocessor our detailed simulation results show that while each piranha processor core is substantially slower than an aggressive next generation processor the integration of eight cores onto single chip allows piranha to outperform next generation processors by up to times on per chip basis on important workloads such as oltp this performance advantage can approach factor of five by using full custom instead of asic logic in addition to exploiting chip multiprocessing the piranha prototype incorporates several other unique design choices including shared second level cache with no inclusion highly optimized cache coherence protocol and novel architecture
we present generalisation of first order rewriting which allows us to deal with terms involving binding operations in an elegant and practical way we use nominal approach to binding in which bound entities are explicitly named rather than using nameless syntax such as de bruijn indices yet we get rewriting formalism which respects conversion and can be directly implemented this is achieved by adapting to the rewriting framework the powerful techniques developed by pitts et al in the freshml project nominal rewriting can be seen as higher order rewriting with first order syntax and built in conversion we show that standard first order rewriting is particular case of nominal rewriting and that very expressive higher order systems such as klop’s combinatory reduction systems can be easily defined as nominal rewriting systems finally we study confluence properties of nominal rewriting
we present novel approach for retrieval of object categories based on novel type of image representation the generalized correlogram gc in our image representation the object is described as constellation of gcs where each one encodes information about some local part and the spatial relations from this part to others ie the part’s context we show how such representation can be used with fast procedures that learn the object category with weak supervision and efficiently match the model of the object against large collections of images in the learning stage we show that by integrating our representation with boosting the system is able to obtain compact model that is represented by very few features where each feature conveys key properties about the object’s parts and their spatial arrangement in the matching step we propose direct procedures that exploit our representation for efficiently considering spatial coherence between the matching of local parts combined with an appropriate data organization such as inverted files we show that thousands of images can be evaluated efficiently the framework has been applied to different standard databases and we show that our results are favorably compared against state of the art methods in both computational cost and accuracy
we consider an extension of integer linear arithmetic with star operator that takes the closure under vector addition of the solution set of linear arithmetic subformula we show that the satisfiability problem for this extended language remains in np and therefore np complete our proof uses semilinear set characterization of solutions of integer linear arithmetic formulas as well as generalization of recent result on sparse solutions of integer linear programming problems as consequence of our result we present worst case optimal decision procedures for two np hard problems that were previously not known to be in np the first is the satisfiability problem for logic of sets multisets bags and cardinality constraints which has applications in verification interactive theorem proving and description logics the second is the reachability problem for class of transition systems whose transitions increment the state vector by solutions of integer linear arithmetic formulas
we present compositional program logic for call by value imperative higher order functions with general forms of aliasing which can arise from the use of reference names as function parameters return values content of references and parts of data structures the program logic extends our earlier logic for alias free imperative higher order functions with new modal operators which serve as building blocks for clean structural reasoning about programs and data structures in the presence of aliasing this has been an open issue since the pioneering work by cartwright oppen and morris twenty five years ago we illustrate usage of the logic for description and reasoning through concrete examples including higher order polymorphic quicksort the logical status of the new operators is clarified by translating them into in equalities of reference names the logic is observationally complete in the sense that two programs are observationally indistinguishable if they satisfy the same set of assertions
the input vocabulary for touch screen interaction on handhelds is dramatically limited especially when the thumb must be used to enrich that vocabulary we propose to discriminate among thumb gestures those we call microrolls characterized by zero tangential velocity of the skin relative to the screen surface combining four categories of thumb gestures drags swipes rubbings and microrolls with other classification dimensions we show that at least elemental gestures can be automatically recognized we also report the results of two experiments showing that the roll vs slide distinction facilitates thumb input in realistic copy and paste task relative to existing interaction techniques
the increasing availability and accuracy of eye gaze detection equipment has encouraged its use for both investigation and control in this paper we present novel methods for navigating and inspecting extremely large images solely or primarily using eye gaze control we investigate the relative advantages and comparative properties of four related methods stare to zoom stz in which control of the image position and resolution level is determined solely by the user’s gaze position on the screen head to zoom htz and dual to zoom dtz in which gaze control is augmented by head or mouse actions and mouse to zoom mtz using conventional mouse input as an experimental control the need to inspect large images occurs in many disciplines such as mapping medicine astronomy and surveillance here we consider the inspection of very large aerial images of which google earth is both an example and the one employed in our study we perform comparative search and navigation tasks with each of the methods described and record user opinions using the swedish user viewer presence questionnaire we conclude that while gaze methods are effective for image navigation they as yet lag behind more conventional methods and interaction designers may well consider combining these techniques for greatest effect
previous research in real time concurrency control mainly focuses on the schedulability guarantee of hard real time transactions and the reduction of the miss rate of soft real time transactions although many new database applications have significant response time requirements not much work has been done in the joint scheduling of traditional nonreal time transactions and soft real time transactions in this paper we study the concurrency control problems in mixed soft real time database systems in which both non real time and soft real time transactions exist simultaneously the objectives are to identify the cost and the performance tradeoff in the design of cost effective and practical real time concurrency control protocols and to evaluate their performance under different real time and non real time supports in particular we are interested in studying the impacts of different scheduling approaches for soft real time transactions on the performance of non real time transactions instead of proposing yet another completely new real time concurrency control protocol our objective is to design an efficient integrated concurrency control method based on existing techniques we propose several methods to integrate the well known two phase locking and optimistic concurrency control with the aims to meet the deadline requirements of soft real time transactions and at the same time to minimize the impact on the performance of non real time transactions we have conducted series of experiments based on sanitized version of stock trading systems to evaluate the performance of both soft real time and non real time transactions under different real time supports in the system
social networks play important roles in the semantic web knowledge management information retrieval ubiquitous computing and so on we propose social network extraction system called polyphonet which employs several advanced techniques to extract relations of persons detect groups of persons and obtain keywords for person search engines especially google are used to measure co occurrence of information and obtain web documents several studies have used search engines to extract social networks from the web but our research advances the following points first we reduce the related methods into simple pseudocodes using google so that we can build up integrated systems second we develop several new algorithms for social networking mining such as those to classify relations into categories to make extraction scalable and to obtain and utilize person to word relations third every module is implemented in polyphonet which has been used at four academic conferences each with more than participants we overview that system finally novel architecture called super social network mining is proposed it utilizes simple modules using google and is characterized by scalability and relate identify processes identification of each entity and extraction of relations are repeated to obtain more precise social network
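the co-occurrence step described above can be sketched as follows; in the real system the counts come from search-engine queries, whereas here they are a hand-made dictionary, and the names, counts and threshold are hypothetical.

```python
# sketch of the co-occurrence step: in the real system hit counts come from
# search-engine queries, here they are a hand-made dictionary; an edge is added
# when a jaccard-style coefficient exceeds a threshold (all values hypothetical).
from itertools import combinations

hits = {                    # fake single-name hit counts
    "alice tanaka": 12000, "bob suzuki": 8000, "carol sato": 500,
}
pair_hits = {               # fake hit counts for the conjunctive query "X" and "Y"
    ("alice tanaka", "bob suzuki"): 900,
    ("alice tanaka", "carol sato"): 10,
    ("bob suzuki", "carol sato"): 150,
}

def jaccard(a, b):
    co = pair_hits.get((a, b)) or pair_hits.get((b, a)) or 0
    return co / (hits[a] + hits[b] - co)

def extract_network(people, threshold=0.01):
    return [(a, b, round(jaccard(a, b), 4))
            for a, b in combinations(people, 2) if jaccard(a, b) >= threshold]

print(extract_network(list(hits)))
```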
for synchronous distributed system of processes with up to potential and actual crash failures where
the specification of constraint languages for access control models has proven to be difficult but remains necessary for safety and for mandatory access control policies while the authorisation relation subjects × objects → P(rights) defines the authorised permissions an authorisation schema defines how the various concepts such as subjects users roles labels are combined to form complete access control model using examples drawn from common access control models in the literature we extend the authorisation schema of dtac to define general formalism for describing authorisation schema for any access control model based on our generic authorisation schema we define new simpler constraint specification language which is as expressive as our previous graphical constraint languages and no more complex to verify
currently there is an increasing interest in data mining and educational systems making educational data mining new growing research community this paper surveys the application of data mining to traditional educational systems in particular web based courses well known learning content management systems and adaptive and intelligent web based educational systems each of these systems has different data source and objectives for knowledge discovery after preprocessing the available data in each case data mining techniques can be applied statistics and visualization clustering classification and outlier detection association rule mining and pattern mining and text mining the success of the plentiful work needs much more specialized work in order for educational data mining to become mature area
we present formal approach to implement fault tolerance in real time embedded systems the initial fault intolerant system consists of set of independent periodic tasks scheduled onto set of fail silent processors connected by reliable communication network we transform the tasks such that assuming the availability of an additional spare processor the system tolerates one failure at time transient or permanent failure detection is implemented using heartbeating and failure masking using checkpointing and rollback these techniques are described and implemented by automatic program transformations on the tasks programs the proposed formal approach to fault tolerance by program transformations highlights the benefits of separation of concerns it allows us to establish correctness properties and to compute optimal values of parameters to minimize fault tolerance overhead we also present an implementation of our method to demonstrate its feasibility and its efficiency
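a tiny thread-based sketch of the heartbeating side only of the scheme above (the paper obtains this, together with checkpointing and rollback, by automatic program transformation rather than by hand-written code); the period, timeout and the simulated crash are illustrative.

```python
# tiny thread-based sketch of heartbeating failure detection only; the period,
# timeout and the simulated fail-silent crash are illustrative values.
import threading
import time

HEARTBEAT_PERIOD = 0.05
TIMEOUT = 0.2
last_beat = {}

def periodic_task(name, crash_after):
    start = time.monotonic()
    while time.monotonic() - start < 1.0:
        if time.monotonic() - start > crash_after:
            return                              # simulate a fail-silent crash
        last_beat[name] = time.monotonic()      # heartbeat
        time.sleep(HEARTBEAT_PERIOD)

def monitor(names, duration=0.9):
    deadline = time.monotonic() + duration
    failed = set()
    while time.monotonic() < deadline:
        now = time.monotonic()
        for n in names:
            if n in last_beat and now - last_beat[n] > TIMEOUT and n not in failed:
                failed.add(n)
                print(n, "declared failed -> reassign its tasks to the spare processor")
        time.sleep(HEARTBEAT_PERIOD)
    return failed

threads = [threading.Thread(target=periodic_task, args=("p1", 10.0)),
           threading.Thread(target=periodic_task, args=("p2", 0.3))]
for t in threads:
    t.start()
print("failed processors:", monitor(["p1", "p2"]))
for t in threads:
    t.join()
```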
this paper describes kernel interface that provides an untrusted user level process an executive with protected access to memory management functions including the ability to create manipulate and execute within subservient contexts address spaces page motion callbacks not only give the executive limited control over physical memory management but also shift certain responsibilities out of the kernel greatly reducing kernel state and complexity the executive interface was motivated by the requirements of the wisconsin wind tunnel wwt system for evaluating cache coherent shared memory parallel architectures wwt uses the executive interface to implement fine grain user level extension of li’s shared virtual memory on thinking machines cm message passing multicomputer however the interface is sufficiently general that an executive could act as multiprogrammed operating system exporting an alternative interface to the threads running in its subservient contexts the executive interface is currently implemented as an extension to cmost the standard operating system for the cm in cmost policy decisions are made on central distinct control processor cp and broadcast to the processing nodes pns the pns execute minimal kernel sufficient only to implement the cp’s policy while this structure efficiently supports some parallel application models the lack of autonomy on the pns restricts its generality adding the executive interface provides limited autonomy to the pns creating structure that supports multiple models of application parallelism this structure with autonomy on top of centralization is in stark contrast to most microkernel based parallel operating systems in which the nodes are fundamentally autonomous
honeypot has been an invaluable tool for the detection and analysis of network based attacks by either human intruders or automated malware in the wild the insights obtained by deploying honeypots especially high interaction ones largely rely on the monitoring capability on the honeypots in practice based on the location of sensors honeypots can be monitored either internally or externally being deployed inside the monitored honeypots internal sensors are able to provide semantic rich view on various aspects of system dynamics eg system calls however their very internal existence makes them visible tangible and even subvertible to attackers after break ins from another perspective existing external honeypot sensors eg network sniffers could be made invisible to the monitored honeypot however they are not able to capture any internal system events such as system calls executed it is desirable to have honeypot monitoring system that is invisible tamper resistant and yet is capable of recording and understanding the honeypot’s system internal events such as system calls in this paper we present virtualization based system called vmscope which allows us to view the system internal events of virtual machine vm based honeypots from outside the honeypots particularly by observing and interpreting vm internal system call events at the virtual machine monitor vmm layer vmscope is able to provide the same deep inspection capability as that of traditional inside the honeypot monitoring tools eg sebek while still obtaining similar tamper resistance and invisibility as other external monitoring tools we have built proof of concept prototype by leveraging and extending one key virtualization technique called binary translation our experiments with real world honeypots show that vmscope is robust against advanced countermeasures that can defeat existing internally deployed honeypot monitors and it only incurs moderate run time overhead
recently grids and pervasive systems have been drawing increasing attention in order to coordinate large scale resources enabling access to small and smart devices in this paper we propose caching approach enabling to improve querying on pervasive grids our proposal called semantic pervasive dual cache follows semantics oriented approach it is based on the one hand on clear separation between the analysis and the evaluation process and on the other hand on the cooperation between client caches considering light analysis and proxy caches including evaluation capabilities such an approach helps load balancing making the system more scalable we have validated semantic pervasive dual cache using analytic models and simulations results obtained show that our approach is quite promising
the algorithm selection problem rice seeks to answer the question which algorithm is likely to perform best for my problem recognizing the problem as learning task in the early the machine learning community has developed the field of meta learning focused on learning about learning algorithm performance on classification problems but there has been only limited generalization of these ideas beyond classification and many related attempts have been made in other disciplines such as ai and operations research to tackle the algorithm selection problem in different ways introducing different terminology and overlooking the similarities of approaches in this sense there is much to be gained from greater awareness of developments in meta learning and how these ideas can be generalized to learn about the behaviors of other nonlearning algorithms in this article we present unified framework for considering the algorithm selection problem as learning problem and use this framework to tie together the crossdisciplinary developments in tackling the algorithm selection problem we discuss the generalization of meta learning concepts to algorithms focused on tasks including sorting forecasting constraint satisfaction and optimization and the extension of these ideas to bioinformatics cryptography and other fields
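the framework's core idea, describe problem instances by features, label them with the best-performing algorithm, and learn a mapping from features to algorithms, can be illustrated with a 1-nearest-neighbour meta-learner over made-up sorting instances; the features, labels and query instances below are invented for the example.

```python
# minimal illustration of algorithm selection as a learning problem: problem
# instances are described by features, the best-performing algorithm on past
# instances is the label, and a 1-nearest-neighbour "meta-learner" predicts the
# algorithm for a new instance; the features and labels below are made up.
import math

# (problem features: [size, sortedness, duplicate ratio], best algorithm)
meta_examples = [
    ([10,    0.95, 0.1], "insertion_sort"),
    ([20,    0.90, 0.2], "insertion_sort"),
    ([10000, 0.10, 0.1], "quicksort"),
    ([50000, 0.20, 0.0], "quicksort"),
    ([80000, 0.15, 0.9], "counting_sort"),
    ([60000, 0.05, 0.8], "counting_sort"),
]

def make_normalizer(rows):
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return lambda r: [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]

norm = make_normalizer([f for f, _ in meta_examples])

def select_algorithm(features):
    q = norm(features)
    return min(meta_examples, key=lambda ex: math.dist(q, norm(ex[0])))[1]

print(select_algorithm([30000, 0.12, 0.85]))   # expected to pick counting_sort
print(select_algorithm([15, 0.97, 0.0]))       # expected to pick insertion_sort
```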
behavior analysis of complex distributed systems has led to the search for enhanced reachability analysis techniques which support modularity and which control the state explosion problem while modularity has been achieved state explosion is still problem indeed this problem may even be exacerbated as locally minimized subsystem may contain many states and transitions forbidden by its environment or context context constraints specified as interface processes are restrictions imposed by the environment on subsystem behavior recent research has suggested that the state explosion problem can be effectively controlled if context constraints are incorporated in compositional reachability analysis cra although theoretically very promising the approach has rarely been used in practice because it generally requires more complex computational model and does not contain mechanism to derive context constraints automatically this article presents technique to automate the approach while using similar computational model to that of cra context constraints are derived automatically based on set of sufficient conditions for these constraints to be transparently included when building reachability graphs as result the global reachability graph generated using the derived constraints is shown to be observationally equivalent to that generated by cra without the inclusion of context constraints constraints can also be specified explicitly by users based on their application knowledge erroneous constraints which contravene transparency can be identified together with an indication of the error sources user specified constraints can be combined with those generated automatically the technique is illustrated using client server system and other examples
developing efficient and automatic testing techniques is one of the major challenges facing software validation community in this paper we show how uniform random generation process of finite automata developed in recent work by bassino and nicaud is relevant for many faces of automatic testing the main contribution is to show how to combine two major testing approaches model based testing and random testing this leads to new testing technique successfully experimented on realistic case study we also illustrate how the power of random testing applied on chinese postman problem implementation points out an error in well known algorithm finally we provide some statistics on model based testing algorithms
we study the problem of querying xml data sources that accept only limited set of queries such as sources accessible by web services which can implement very large potentially infinite families of xpath queries to compactly specify such families of queries we adopt the query set specifications formalism close to context free grammars we say that query is expressible by the specification if it is equivalent to some expansion of is supported by if it has an equivalent rewriting using some finite set of p’s expansions we study the complexity of expressibility and support and identify large classes of xpath queries for which there are efficient ptime algorithms our study considers both the case in which the xml nodes in the results of the queries lose their original identity and the one in which the source exposes persistent node ids
industry globalization brings with it inevitable changes to traditional organizational structures the notion of global virtual teams working together across geographical cultural and functional borders is becoming increasingly appealing this paper presents observations of how team of designers negotiate shared understanding in the collaborative design of virtual pedals for volvo car corporation although the team was globally distributed during most of the development process examples are drawn from collocated design sessions since this enables careful examination of the multifaceted ways in which collocated designers use wide variety of artifacts and techniques to create common ground the findings highlight the situational and interactional characteristics of design collaboration and suggest that the addition of shared objects to think with in distributed design environments could greatly facilitate global design teams in their collaborative process of thinking together apart
we consider the class of database programs and address the problem of minimizing the cost of their exchanges with the database server this cost partly consists of query execution at the server side and partly of query submission and network exchanges between the program and the server the natural organization of database programs leads to submitting an intensive flow of elementary sql queries to the server and exploits its optimization power only locally in this paper we develop global optimization approach we base this approach on an execution model where queries can be executed asynchronously with respect to the flow of the application program our method aims at choosing an efficient query scheduling which limits the penalty of client server interactions our results show that the technique can improve the execution time of database programs by several orders of magnitude
both integer programming models and heuristic algorithms have been proposed for finding minimum energy broadcast and multicast trees in wireless ad hoc networks among heuristic algorithms the broadcast multicast incremental power bip mip algorithm is best known the theoretical performance of bip mip has been quantified in several studies to assess the empirical performance of bip mip and other heuristic algorithms it is necessary to compute an optimal tree or very good lower bound of the optimum in this paper we present an integer programming approach as well as improved heuristic algorithms our integer programming approach comprises novel integer model and relaxation scheme unlike previously proposed models the continuous relaxation of our model leads to very sharp lower bound of the optimum our relaxation scheme allows for performance evaluation of heuristics without having to compute optimal trees our contributions to heuristic algorithms consist of the power improving algorithm successive power adjustment spa and improved time complexity of some previously suggested algorithms we report extensive numerical experiments algorithm spa finds better solutions in comparison to host of other algorithms moreover the integer programming approach shows that trees found by algorithm spa are optimal or near optimal
the widespread deployment of sensor networks is on the horizon one of the main challenges in sensor networks is to process and aggregate data in the network rather than wasting energy by sending large amounts of raw data to reply to query some efficient data dissemination methods particularly data centric storage and information aggregation rely on efficient routing from one node to another in this paper we introduce gem graph embedding for sensor networks an infrastructure for node to node routing and data centric storage and information processing in sensor networks unlike previous approaches it does not depend on geographic information and it works well even in the face of physical obstacles in gem we construct labeled graph that can be embedded in the original network topology in an efficient and distributed fashion in that graph each node is given label that encodes its position in the original network topology this allows messages to be efficiently routed through the network while each node only needs to know the labels of its neighbors to demonstrate how gem can be applied we have developed concrete graph embedding method vpcs virtual polar coordinate space in vpcs we embed ringed tree into the network topology and label the nodes in such manner as to create virtual polar coordinate space we have also developed vpcr an efficient routing algorithm that uses vpcs vpcr is the first algorithm for node to node routing that guarantees reachability requires each node to keep state only about its immediate neighbors and requires no geographic information our simulation results show that vpcr is robust on dynamic networks works well in the face of voids and obstacles and scales well with network size and density
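a minimal sketch of the tree label routing idea the abstract describes for gem the label scheme used here (the path of child indices from the root of the embedded tree) and the climb to the longest common prefix then descend rule are illustrative assumptions rather than the paper's exact vpcr algorithm

```python
# Illustrative tree-labelled routing in the spirit of GEM/VPCS: each node
# is labelled with its path from the root of an embedded tree, and a
# message travels up to the longest common prefix of the source and
# destination labels, then down.  (The paper's VPCR routing is more
# refined; this only shows the basic tree-routing idea.)

def route(src_label, dst_label):
    """Return the sequence of labels visited from src to dst."""
    lcp = 0                                   # longest common prefix length
    while (lcp < len(src_label) and lcp < len(dst_label)
           and src_label[lcp] == dst_label[lcp]):
        lcp += 1
    path = []
    for i in range(len(src_label), lcp, -1):  # climb towards the ancestor
        path.append(src_label[:i])
    path.append(src_label[:lcp])              # the common ancestor itself
    for i in range(lcp + 1, len(dst_label) + 1):   # descend to destination
        path.append(dst_label[:i])
    return path

if __name__ == "__main__":
    # labels are tuples of child indices; () is the root of the embedded tree
    print(route((0, 2, 1), (0, 3)))
    # [(0, 2, 1), (0, 2), (0,), (0, 3)]
```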
jxta is an open source initiative that allows to specify set of collaboration and communication protocols which enable the creation and deployment of peer to peer pp applications this paper provides survey on its current state regarding the topic of security the study focuses on the security evaluation of standard peer operations within the jxta network highlighting which issues must be seriously taken into account in those applications sensitive to security
role based access control rbac is supported directly or in closely related form by number of products this article presents formalization of rbac using graph transformations that is graphical specification technique based on generalization of classical string grammars to nonlinear structures the proposed formalization provides an intuitive description for the manipulation of graph structures as they occur in information systems access control and precise specification of static and dynamic consistency conditions on graphs and graph transformations the formalism captures the rbac models published in the literature and also allows uniform treatment of user roles and administrative roles and detailed analysis of the decentralization of administrative roles
the convergence of cs and biology will serve both disciplines providing each with greater power and relevance
while static typing is widely accepted as being necessary for secure program execution dynamic typing is also viewed as being essential in some applications particularly for distributed programming environments dynamics have been proposed as language construct for dynamic typing based on experience with languages such as clu cedar mesa and modula however proposals for incorporating dynamic typing into languages with parametric polymorphism have serious shortcomings new approach is presented to extending polymorphic languages with dynamic typing at the heart of the approach is the use of dynamic type dispatch where polymorphic functions may analyze the structure of their type arguments this approach solves several open problems with the traditional approach to adding dynamic typing to polymorphic languages an explicitly typed language xmldyn is presented this language uses refinement kinds to ensure that dynamic type dispatch does not fail at run time safe dynamics are new form of dynamics that use refinement kinds to statically check the use of run time dynamic typing run time errors are isolated to separate construct for performing run time type checks
in this paper we pose novel research problem for machine learning that involves constructing process model from continuous data we claim that casting learned knowledge in terms of processes with associated equations is desirable for scientific and engineering domains where such notations are commonly used we also argue that existing induction methods are not well suited to this task although some techniques hold partial solutions in response we describe an approach to learning process models from time series data and illustrate its behavior in three domains in closing we describe open issues in process model induction and encourage other researchers to tackle this important problem
the internet today offers primarily best effort service research and technology development efforts are currently underway to allow provisioning of better than best effort quality of service qos assurances the residual uncertainty in qos can be managed using pricing strategies in this article we develop spot pricing framework for intra domain expected bandwidth contracts with loss based qos guarantees the framework builds on nonlinear pricing scheme for cost recovery from earlier work and extends it to price risk utility based options pricing approach is developed to account for the uncertainties in delivering loss guarantees application of options pricing techniques in internet services provides mechanism for fair risk sharing between the provider and the customer and may be extended to price other uncertainties in qos guarantees
the widespread adoption of xml holds out the promise that document structure can be exploited to specify precise database queries however the user may have only limited knowledge of the xml structure and hence may be unable to produce correct xquery especially in the context of heterogeneous information collection the default is to use keyword based search and we are all too familiar with how difficult it is to obtain precise answers by these means we seek to address these problems by introducing the notion of meaningful lowest common ancestor structure mlcas for finding related nodes within an xml document by automatically computing mlcas and expanding ambiguous tag names we add new functionality to xquery and enable users to take full advantage of xquery in querying xml data precisely and efficiently without requiring perfect knowledge of the document structure such schema free xquery is potentially of value not just to casual users with partial knowledge of schema but also to experts working in data integration or data evolution context in such context schema free query once written can be applied universally to multiple data sources that supply similar content under different schemas and applied forever as these schemas evolve our experimental evaluation found that it was possible to express wide variety of queries in schema free manner and have them return correct results over broad diversity of schemas furthermore the evaluation of schema free query is not expensive using novel stack based algorithm we develop for computing mlcas from to times the execution time of an equivalent schema aware query
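a small sketch of the dewey label lowest common ancestor computation that this kind of schema free querying builds on the labels and the lca function below are illustrative and the meaningfulness filter that turns an lca into an mlcas as well as the paper's stack based algorithm are omitted

```python
# Each XML node carries a Dewey id (its path of child positions from the
# root); the lowest common ancestor of two nodes is then simply the longest
# common prefix of their ids.  MLCAS additionally keeps only "meaningful"
# ancestors; that filtering is not shown here.

def lca(dewey_a, dewey_b):
    prefix = []
    for x, y in zip(dewey_a, dewey_b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

if __name__ == "__main__":
    # hypothetical ids: /bib[1]/book[2]/title[1] and /bib[1]/book[2]/author[3]
    print(lca((1, 2, 1), (1, 2, 3)))   # (1, 2)  -> the enclosing book
    # nodes in different books only share the bib root
    print(lca((1, 2, 1), (1, 4, 3)))   # (1,)
```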
farsite is secure scalable file system that logically functions as centralized file server but is physically distributed among set of untrusted computers farsite provides file availability and reliability through randomized replicated storage it ensures the secrecy of file contents with cryptographic techniques it maintains the integrity of file and directory data with byzantine fault tolerant protocol it is designed to be scalable by using distributed hint mechanism and delegation certificates for pathname translations and it achieves good performance by locally caching file data lazily propagating file updates and varying the duration and granularity of content leases we report on the design of farsite and the lessons we have learned by implementing much of that design
as open source development has evolved differentiation of roles and increased sophistication of collaborative processes has occurred recently we described coordination issues in software development and an interactive visualization tool called the social health overview sho developed to address them this paper presents an empirical evaluation of sho intended to identify its strengths and weaknesses eleven informants in various open source roles were interviewed about their work practices eight of these participated in an evaluation comparing three change management tasks in sho and bugzilla results are discussed with respect to task strategy with each tool and participants roles
we propose definition for frequent approximate patterns in order to model important subgraphs in graph database with incomplete or inaccurate information by our definition frequent approximate patterns possess three main properties possible absence of exact match maximal representation and the apriori property since approximation increases the number of frequent patterns we present novel randomized algorithm called ram using feature retrieval large number of real and synthetic data sets are used to demonstrate the effectiveness and efficiency of the frequent approximate graph pattern model and the ram algorithm
we study the problem of finding efficient equivalent view based rewritings of relational queries focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples lot of work in the literature addresses the problems of answering queries using views and query optimization however most of it proposes solutions for special cases such as for conjunctive queries cqs or for aggregate queries only in addition most of it addresses the problems separately under set or bag set semantics for query evaluation and some of it proposes heuristics without formal proofs for completeness or soundness in this paper we look at the two problems by considering cq queries that is both pure conjunctive and aggregate queries with aggregation functions sum count min and max the distinct keyword in sql versions of our queries is also allowed we build on past work to provide algorithms that handle this general setting this is possible because recent results on rewritings of cq queries show that there are sound and complete algorithms based on containment tests of cqs our focus is that our algorithms are efficient as well as sound and complete besides the contribution we make in putting and addressing the problems in this general setting we make two additional contributions for bag set and set semantics first we propose efficient sound and complete tests for equivalence of cq queries to rewritings that use overlapping views the algorithms are complete with respect to the language of rewritings these results apply not only to query optimization but to all areas where the goal is to obtain efficient equivalent view based query rewritings second based on these results we propose two sound algorithms bdpv and cdpv that find efficient execution plans for cq queries in terms of materialized views both algorithms extend the cost based query optimization approach of system the efficient sound algorithm bdpv is also complete in some cases whereas cdpv is sound and complete for all cq queries we consider we present study of the completeness efficiency tradeoff in the algorithms and provide experimental results that show the viability of our approach and test the limits of query optimization using overlapping views
this paper addresses the problem of resolving virtual method and interface calls in java bytecode the main focus is on new practical technique that can be used to analyze large applications our fundamental design goal was to develop technique that can be solved with only one iteration and thus scales linearly with the size of the program while at the same time providing more accurate results than two popular existing linear techniques class hierarchy analysis and rapid type analysis we present two variations of our new technique variable type analysis and coarser grain version called declared type analysis both of these analyses are inexpensive easy to implement and our experimental results show that they scale linearly in the size of the program we have implemented our new analyses using the soot framework and we report on empirical results for seven benchmarks we have used our techniques to build accurate call graphs for complete applications including libraries and we show that compared to conservative call graph built using class hierarchy analysis our new variable type analysis can remove significant number of nodes methods and call edges further our results show that we can improve upon the compression obtained using rapid type analysis we also provide dynamic measurements of monomorphic call sites focusing on the benchmark code excluding libraries we demonstrate that when considering only the benchmark code both rapid type analysis and our new declared type analysis do not add much precision over class hierarchy analysis however our finer grained variable type analysis does resolve significantly more call sites particularly for programs with more complex uses of objects
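a toy contrast between class hierarchy analysis and an instantiation aware analysis in the spirit of rapid type analysis the hierarchy the method table and the call below are invented examples meant only to show why the coarser analyses are conservative they are not the paper's soot based variable type analysis

```python
# Class hierarchy analysis (CHA): a virtual call x.m() with declared type A
# may dispatch to any subtype of A that provides m.  Rapid type analysis
# (RTA) additionally keeps only classes that are instantiated somewhere in
# the program, which removes spurious targets.  Toy data, toy rules.

subclasses = {"A": {"A", "B", "C"}, "B": {"B"}, "C": {"C"}}
provides_m = {"A", "B", "C"}         # classes providing method m
instantiated = {"B"}                 # classes with a 'new' in the program

def cha_targets(declared_type):
    return {c for c in subclasses[declared_type] if c in provides_m}

def rta_targets(declared_type):
    return cha_targets(declared_type) & instantiated

print(cha_targets("A"))   # all of A, B, C -- conservative
print(rta_targets("A"))   # only B -- restricted to instantiated classes
```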
software model checkers are typically language specific require substantial development efforts and are hard to reuse for other languages adding partial order reduction por capabilities to such tools typically requires sophisticated changes to the tool’s model checking algorithms this paper proposes new method to make software model checkers language independent and to improve their performance through por getting the por capabilities does not require making any changes to the underlying model checking algorithms for each language they are instead achieved through theory transformation of l’s formal semantics rewrite theory under very minimal assumptions this can be done for any language with relatively little effort our experiments with the jvm promela like language and maude indicate that significant state space reductions and time speedups can be gained for tools generated this way
database sharing db sharing refers to general approach for building distributed high performance transaction system the nodes of db sharing system are locally coupled via high speed interconnect and share common database at the disk level this is also known as shared disk approach we compare database sharing with the database partitioning shared nothing approach and discuss the functional dbms components that require new and coordinated solutions for db sharing the performance of db sharing systems critically depends on the protocols used for concurrency and coherency control the frequency of communication required for these functions has to be kept as low as possible in order to achieve high transaction rates and short response times trace driven simulation system for db sharing complexes has been developed that allows realistic performance comparison of four different concurrency and coherency control protocols we consider two locking and two optimistic schemes which operate either under central or distributed control for coherency control we investigate so called on request and broadcast invalidation schemes and employ buffer to buffer communication to exchange modified pages directly between different nodes the performance impact of random routing versus affinity based load distribution and different communication costs is also examined in addition we analyze potential performance bottlenecks created by hot spot pages
an important goal of an operating system is to make computing and communication resources available in fair and efficient way to the applications that will run on top of it to achieve this result the operating system implements number of policies for allocating resources to and sharing resources among applications and it implements safety mechanisms to guard against misbehaving applications however for most of these allocation and sharing tasks no single optimal policy exists different applications may prefer different operating system policies to achieve their goals in the best possible way customizable or adaptable operating system is an operating system that allows for flexible modification of important system policies over the past decade wide range of approaches for achieving customizability has been explored in the operating systems research community in this survey an overview of these approaches structured around taxonomy is presented
this paper explores the use of resolution as meta framework for developing various different deduction calculi in this work the focus is on developing deduction calculi for modal dynamic logics dynamic modal logics are pdl like extended modal logics which are closely related to description logics we show how tableau systems modal resolution systems and rasiowa sikorski systems can be developed and studied by using standard principles and methods of first order theorem proving the approach is based on the translation of reasoning problems in modal logic to first order clausal form and using suitable refinement of resolution to construct and mimic derivations of the desired proof method the inference rules of the calculus can then be read off from the clausal form we show how this approach can be used to generate new proof calculi and prove soundness completeness and decidability results this slightly unusual approach allows us to gain new insights and results for familiar and less familiar logics for different proof methods and compare them not only theoretically but also empirically in uniform framework
geometry processing algorithms have traditionally assumed that the input data is entirely in main memory and available for random access this assumption does not scale to large data sets as exhausting the physical memory typically leads to io inefficient thrashing recent works advocate processing geometry in streaming manner where computation and output begin as soon as possible streaming is suitable for tasks that require only local neighbor information and batch process an entire data set we describe streaming compression scheme for tetrahedral volume meshes that encodes vertices and tetrahedra in the order they are written to keep the memory footprint low the compressor is informed when vertices are referenced for the last time ie are finalized the compression achieved depends on how coherent the input order is and how many tetrahedra are buffered for local reordering for reasonably coherent orderings and buffer of tetrahedra we achieve compression rates that are only to percent above the state of the art while requiring drastically less memory resources and less than half the processing time
it is well known that query is an approximate representation of the user’s information needs since it does not provide sufficient specification of the attended results numerous studies addressed this issue using techniques for better eliciting either document or query representations more recent studies investigated the use of search context to better understand the user intent driven by the query in order to deliver personalized information results in this article we propose personalized information retrieval model that leverages the information relevance by its usefulness to both the query and the user’s profile expressed by his main topics of interest the model is based on the influence diagram formalism which is an extension of bayesian networks dedicated to decision problems this graphical model offers an intuitive way to represent in the same framework all the basic information terms documents user interests surrounding the user’s information need and also quantify their mutual influence on the relevance estimation experimental results demonstrate that our model was successful at eliciting user queries according to dynamic changes of the user interests
this paper presents pioneer vliw architecture of native java processor we show that thanks to the specific stack architecture and to the use of the vliw technique one is able to obtain meaningful reduction of power dissipation with small area overhead when compared to other ways of executing java in hardware the underlying technique is based on the reuse of memory access instructions hence reducing power during memory or cache accesses the architecture is validated for some complex embedded applications like imdct computation and other data processing benchmarks
in the maximum constraint satisfaction problem max csp one is given finite collection of possibly weighted constraints on overlapping sets of variables and the goal is to assign values from given finite domain to the variables so as to maximize the number or the total weight for the weighted case of satisfied constraints this problem is np hard in general and therefore it is natural to study how restricting the allowed types of constraints affects the approximability of the problem in this article we show that any max csp problem with finite set of allowed constraint types which includes all fixed value constraints ie constraints of the form equals is either solvable exactly in polynomial time or else is apx complete even if the number of occurrences of variables in instances is bounded moreover we present simple description of all polynomial time solvable cases of our problem this description relies on the well known algebraic combinatorial property of supermodularity
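for reference the usual formalization of the weighted max csp objective is stated below in latex v is the variable set d the finite domain and the i th constraint has weight w_i a scope of variables and relation r_i this is the standard textbook form and may differ in notation from the article

```latex
% weighted Max CSP: choose an assignment f of domain values to variables
% maximizing the total weight of satisfied constraints
\max_{f : V \to D} \;\; \sum_{i=1}^{m} w_i \cdot
  \mathbf{1}\!\left[\, \bigl(f(v_{i,1}), \dots, f(v_{i,k_i})\bigr) \in R_i \,\right]
```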
test suite minimization techniques aim to eliminate redundant test cases from test suite based on some criteria such as coverage or fault detection capability most existing test suite minimization techniques have two main limitations they perform minimization based on single criterion and produce suboptimal solutions in this paper we propose test suite minimization framework that overcomes these limitations by allowing testers to easily encode wide spectrum of test suite minimization problems handle problems that involve any number of criteria and compute optimal solutions by leveraging modern integer linear programming solvers we implemented our framework in tool called mints that is freely available and can be interfaced with number of different state of the art solvers our empirical evaluation shows that mints can be used to instantiate number of different test suite minimization problems and efficiently find an optimal solution for such problems using different solvers
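a minimal sketch of the single criterion encoding behind this kind of minimization binary variables select tests and every coverage requirement must stay covered mints hands such encodings to ilp solvers whereas the tiny brute force search and the coverage table below are only illustrative

```python
# 0-1 encoding of single-criterion test-suite minimization: variable x_t
# decides whether test t is kept, each requirement r needs at least one
# kept test covering it, and the objective minimizes the number of kept
# tests.  A real tool passes this to an ILP solver; here a brute-force
# search over subsets (smallest first) plays the role of the solver.

from itertools import combinations

coverage = {            # hypothetical test -> covered requirements
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r1", "r3", "r4"},
    "t4": {"r4"},
}
requirements = set().union(*coverage.values())

def minimize(coverage, requirements):
    tests = list(coverage)
    for k in range(1, len(tests) + 1):
        for subset in combinations(tests, k):
            if set().union(*(coverage[t] for t in subset)) >= requirements:
                return subset
    return None

print(minimize(coverage, requirements))   # e.g. ('t1', 't3')
```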
in this paper the authors share their experiences gathered during the design and implementation of the corba persistent object service there are two problems related to design and implementation of the persistence service first omg intentionally leaves the functionality core of the persistence service unspecified second omg encourages reuse of other object services without being specific enough in this respect the paper identifies the key design issues implied both by the intentional lack of omg specification and the limits of the implementation environment characteristics at the same time the paper discusses the benefits and drawbacks of reusing other object services particularly the relationship and externalization services to support the persistence service surprisingly the key lesson learned is that direct reuse of these object services is impossible
human communication involves not only speech but also wide variety of gestures and body motions interactions in virtual environments often lack this multi modal aspect of communication we present method for automatically synthesizing body language animations directly from the participants speech signals without the need for additional input our system generates appropriate body language animations by selecting segments from motion capture data of real people in conversation the synthesis can be performed progressively with no advance knowledge of the utterance making the system suitable for animating characters from live human speech the selection is driven by hidden markov model and uses prosody based features extracted from speech the training phase is fully automatic and does not require hand labeling of input data and the synthesis phase is efficient enough to run in real time on live microphone input user studies confirm that our method is able to produce realistic and compelling body language
software change resulting from new requirements environmental modifications and error detection creates numerous challenges for the maintenance of software products while many software evolution strategies focus on code to modeling language analysis few address software evolution at higher abstraction levels most lack the flexibility to incorporate multiple modeling languages not many consider the integration and reuse of domain knowledge with design knowledge we address these challenges by combining ontologies and model weaving to assist in software evolution of abstract artifacts our goals are to recover high level artifacts such as requirements and design models defined using variety of software modeling languages simplify modification of those models reuse software design and domain knowledge contained within models and integrate those models with enhancements via novel combination of ontological and model weaving concepts additional benefits to design recovery and software evolution include detecting high level dependencies and identifying differences between evolved software and initial specifications
collaborative filtering recommender systems are typically unable to generate adequate recommendations for newcomers empirical evidence suggests that the incorporation of trust network among the users of recommender system can significantly help to alleviate this problem hence users are highly encouraged to connect to other users to expand the trust network but choosing whom to connect to is often difficult task given the impact this choice has on the delivered recommendations it is critical to guide newcomers through this early stage connection process in this paper we identify several classes of key figures in the trust network namely mavens frequent raters and connectors furthermore we introduce measures to assess the influence of these users on the amount and the quality of the recommendations delivered by trust enhanced collaborative filtering recommender system experiments on dataset from epinions.com support the claim that generated recommendations for new users are more beneficial if they connect to an identified key figure compared to random user
fault tolerance in distributed computing is wide area with significant body of literature that is vastly diverse in methodology and terminology this paper aims at structuring the area and thus guiding readers into this interesting field we use formal approach to define important terms like fault fault tolerance and redundancy this leads to four distinct forms of fault tolerance and to two main phases in achieving them detection and correction we show that this can help to reveal inherently fundamental structures that contribute to understanding and unifying methods and terminology by doing this we survey many existing methodologies and discuss their relations the underlying system model is the close to reality asynchronous message passing model of distributed computing
cache misses due to coherence actions are often the major source for performance degradation in cache coherent multiprocessors it is often difficult for the programmer to take cache coherence into account when writing the program since the resulting access pattern is not apparent until the program is executed sm prof is performance analysis tool that addresses this problem by visualising the shared data access pattern in diagram with links to the source code lines causing performance degrading access patterns the execution of program is divided into time slots and each data block is classified based on the accesses made to the block during time slot this enables the programmer to follow the execution over time and it is possible to track the exact position responsible for accesses causing many cache misses related to coherence actions matrix multiplication and the mpd application from splash are used to illustrate the use of sm prof for mpd sm prof revealed performance limitations that resulted in performance improvement of over the current implementation is based on program driven simulation in order to achieve non intrusive profiling if small perturbation of the program execution is acceptable it is also possible to use software tracing techniques given that data address can be related to the originating instruction
many web sites have begun allowing users to submit items to collection and tag them with keywords the folksonomies built from these tags are an interesting topic that has seen little empirical research this study compared the search information retrieval ir performance of folksonomies from social bookmarking web sites against search engines and subject directories thirty four participants created queries for various information needs results from each ir system were collected and participants judged relevance folksonomy search results overlapped with those from the other systems and documents found by both search engines and folksonomies were significantly more likely to be judged relevant than those returned by any single ir system type the search engines in the study had the highest precision and recall but the folksonomies fared surprisingly well delicious was statistically indistinguishable from the directories in many cases overall the directories were more precise than the folksonomies but they had similar recall scores better query handling may enhance folksonomy ir performance further the folksonomies studied were promising and may be able to improve web search performance
the emergence of mobile computing provides the ability to access information at any time and place however as mobile computing environments have inherent factors like power storage asymmetric communication cost and bandwidth limitations efficient query processing and minimum query response time are definitely of great interest this survey groups variety of query optimization and processing mechanisms in mobile databases into two main categories namely query processing strategy and caching management strategy query processing includes both pull and push operations broadcast mechanisms we further classify push operation into on demand broadcast and periodic broadcast push operation on demand broadcast relates to designing techniques that enable the server to accommodate multiple requests so that the request can be processed efficiently push operation periodic broadcast corresponds to data dissemination strategies in this scheme several techniques to improve the query performance by broadcasting data to population of mobile users are described caching management strategy defines number of methods for maintaining cached data items in clients local storage this strategy considers critical caching issues such as caching granularity caching coherence strategy and caching replacement policy finally this survey concludes with several open issues relating to mobile query optimization and processing strategy
kernel functions have become an extremely popular tool in machine learning with an attractive theory as well this theory views kernel as implicitly mapping data points into possibly very high dimensional space and describes kernel function as being good for given learning problem if data is separable by large margin in that implicit space however while quite elegant this theory does not necessarily correspond to the intuition of good kernel as good measure of similarity and the underlying margin in the implicit space usually is not apparent in natural representations of the data therefore it may be difficult for domain expert to use the theory to help design an appropriate kernel for the learning task at hand moreover the requirement of positive semi definiteness may rule out the most natural pairwise similarity functions for the given problem domain in this work we develop an alternative more general theory of learning with similarity functions ie sufficient conditions for similarity function to allow one to learn well that does not require reference to implicit spaces and does not require the function to be positive semi definite or even symmetric instead our theory talks in terms of more direct properties of how the function behaves as similarity measure our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be good similarity function under our definition though with some loss in the parameters in this way we provide the first steps towards theory of kernels and more general similarity functions that describes the effectiveness of given function in terms of natural similarity based properties
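one of the simpler sufficient conditions in this line of work is paraphrased from memory in latex below and may differ in details from the article's exact definitions a similarity function k is called (epsilon gamma) good when most examples are on average gamma more similar to examples of their own label than to examples of the other label

```latex
% paraphrased sufficient condition (details may differ from the article):
% K is an (\epsilon,\gamma)-good similarity function for a distribution of
% labelled examples (x,\ell(x)) if
\Pr_{x}\!\left[\;
  \mathbb{E}_{x'}\bigl[K(x,x') \mid \ell(x') = \ell(x)\bigr]
  \;\ge\;
  \mathbb{E}_{x'}\bigl[K(x,x') \mid \ell(x') \ne \ell(x)\bigr] + \gamma
\;\right] \;\ge\; 1 - \epsilon
```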
safety and security are top concerns in maritime navigation particularly as maritime traffic continues to grow and as crew sizes are reduced the automatic identification system ais plays key role in regard to these concerns this system whose objective is in part to identify and locate vessels transmits location related information from vessels to ground stations that are part of so called vessel traffic service vts thus enabling these to track the movements of the vessels this paper presents techniques that improve the existing ais by offering better and guaranteed tracking accuracies at lower communication costs the techniques employ movement predictions that are shared between vessels and the vts empirical studies with prototype implementation and real vessel data demonstrate that the techniques are capable of significantly improving the ais
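a minimal sketch of threshold based tracking with a shared movement prediction which is the general idea behind reporting less while bounding the tracking error the linear prediction model the threshold policy and the sample trajectory are assumptions for illustration not the paper's exact protocol

```python
# Vessel and VTS both extrapolate the last shared report (position and
# velocity); the vessel sends a new report only when its true position
# drifts more than `threshold` from that shared prediction, so the
# VTS-side error stays bounded by the threshold between reports.

import math

def track(samples, threshold):
    """samples: list of (t, x, y, vx, vy) true states; returns sent reports."""
    reports = []
    last = None                               # last report shared with the VTS
    for t, x, y, vx, vy in samples:
        if last is None:
            last = (t, x, y, vx, vy)
            reports.append(last)
            continue
        t0, x0, y0, vx0, vy0 = last
        px = x0 + vx0 * (t - t0)              # shared linear prediction
        py = y0 + vy0 * (t - t0)
        if math.hypot(x - px, y - py) > threshold:
            last = (t, x, y, vx, vy)
            reports.append(last)
    return reports

if __name__ == "__main__":
    # a vessel going straight, then turning: only the turn triggers a report
    straight = [(t, 10.0 * t, 0.0, 10.0, 0.0) for t in range(10)]
    turning = [(10 + t, 100.0, 10.0 * t, 0.0, 10.0) for t in range(10)]
    print(len(track(straight + turning, threshold=5.0)))   # 2 reports in total
```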
mature knowledge allows engineering disciplines the achievement of predictable results unfortunately the type of knowledge used in software engineering can be considered to be of relatively low maturity and developers are guided by intuition fashion or market speak rather than by facts or undisputed statements proper to an engineering discipline testing techniques determine different criteria for selecting the test cases that will be used as input to the system under examination which means that an effective and efficient selection of test cases conditions the success of the tests the knowledge for selecting testing techniques should come from studies that empirically justify the benefits and application conditions of the different techniques this paper analyzes the maturity level of the knowledge about testing techniques by examining existing empirical studies about these techniques we have analyzed their results and obtained testing technique knowledge classification based on their factuality and objectivity according to four parameters
designing and implementing system software so that it scales well on shared memory multiprocessors smmps has proven to be surprisingly challenging to improve scalability most designers to date have focused on concurrency by iteratively eliminating the need for locks and reducing lock contention however our experience indicates that locality is just as if not more important and that focusing on locality ultimately leads to more scalable system in this paper we describe methodology and framework for constructing system software structured for locality exploiting techniques similar to those used in distributed systems specifically we found two techniques to be effective in improving scalability of smmp operating systems an object oriented structure that minimizes sharing by providing natural mapping from independent requests to independent code paths and data structures and ii the selective partitioning distribution and replication of object implementations in order to improve locality we describe concrete examples of distributed objects and our experience implementing them we demonstrate that the distributed implementations improve the scalability of operating system intensive parallel workloads
memory based collaborative filtering cf makes recommendations based on collection of user preferences for items the idea underlying this approach is that the interests of an active user will more likely coincide with those of users who share similar preferences to the active user hence the choice and computation of similarity measure between users is critical to rating items this work proposes similarity update method that uses an iterative message passing procedure additionally this work deals with drawback of using the popular mean absolute error mae for performance evaluation namely that it ignores ratings distribution novel modulation method and an accuracy metric are presented in order to minimize the predictive accuracy error and to evenly distribute predicted ratings over true rating scales preliminary results show that the proposed similarity update and prediction modulation techniques significantly improve the predicted rankings
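a minimal memory based cf baseline pearson similarity on co rated items plus a similarity weighted deviation from mean prediction this is the standard starting point the abstract refers to not the proposed iterative message passing similarity update and the rating matrix is invented

```python
# Memory-based CF: similarity between users from Pearson correlation on
# their co-rated items, then a prediction as the active user's mean plus a
# similarity-weighted average of the neighbours' deviations from their means.

from math import sqrt

ratings = {                      # hypothetical user -> {item: rating}
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "u3": {"i1": 1, "i2": 5, "i4": 2},
}

def mean(u):
    return sum(ratings[u].values()) / len(ratings[u])

def pearson(u, v):
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu, mv = mean(u), mean(v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def predict(u, item):
    neigh = [(pearson(u, v), v) for v in ratings
             if v != u and item in ratings[v]]
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in neigh)
    den = sum(abs(s) for s, _ in neigh)
    return mean(u) + num / den if den else mean(u)

print(round(predict("u1", "i4"), 2))   # about 4.5
```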
many empirical studies have found that software metrics can predict class error proneness and the prediction can be used to accurately group error prone classes recent empirical studies have used open source systems these studies however focused on the relationship between software metrics and class error proneness during the development phase of software projects whether software metrics can still predict class error proneness in system’s post release evolution is still question to be answered this study examined three releases of the eclipse project and found that although some metrics can still predict class error proneness in three error severity categories the accuracy of the prediction decreased from release to release furthermore we found that the prediction cannot be used to build metrics model to identify error prone classes with acceptable accuracy these findings suggest that as system evolves the use of some commonly used metrics to identify which classes are more prone to errors becomes increasingly difficult and we should seek alternative methods to the metric prediction models to locate error prone classes if we want high accuracy
recently with the broad usage of location aware devices applications with moving object management become very popular in order to manage moving objects efficiently many spatial spatial temporal data access methods have been proposed however most of these data access methods are designed for single user environments in multiple user systems frequent updates may cause significant number of read write conflicts using these data access methods in this paper we propose an efficient framework concurrent location management clam for managing moving objects in multiple user environments the proposed concurrency control protocol integrates the efficiency of the link based approach and the flexibility of the lock coupling mechanism based on this protocol concurrent location update and search algorithms are provided we formally analyze and prove the correctness of the proposed concurrent operations experiment results on real datasets validate the efficiency and scalability of the proposed concurrent location management framework
novel adaptive trust based anonymous network atan is proposed the distributed and decentralised network management in atan does not require central authority so that atan alleviates the problem of single point of failure in some existing anonymous networks packets are routed onto intermediate nodes anonymously without knowing whether these nodes are trustworthy on the other hand an intermediate node should ensure that packets which it forwards are not malicious and it will not be allegedly accused of involving in the attack to meet these objectives the intermediate node only forwards packets received from the trusted predecessor which can be either the source or another intermediate node in atan our trust and reputation model aims to enhance anonymity by establishing trust and reputation relationship between the source and the forwarding members the trust and reputation relationship of any two nodes is adaptive to new information learned by these two nodes or recommended from other trust nodes therefore packets are anonymously routed from the trusted source to the destination through trusted intermediate nodes thereby improving anonymity of communications
it is time honored fashion to implement domain specific language dsl by translation to general purpose language such an implementation is more portable but an unidiomatic translation jeopardizes performance because in practice language implementations favor the common cases this tension arises especially when the domain calls for complex control structures we illustrate this tension by revisiting landin’s original correspondence between algol and church’s lambda notation we translate domain specific programs with lexically scoped jumps to javascript our translation produces the same block structure and binding structure as in the source program a la abdali the target code uses control operator in direct style a la landin in fact the control operator used is almost landin’s hence our title our translation thus complements continuation passing translation a la steele these two extreme translations require javascript implementations to cater either for first class continuations as rhino does or for proper tail recursion less extreme translations should emit more idiomatic control flow instructions such as for break and throw the present experiment leads us to conclude that translations should preserve not just the data structures and the block structure of source program but also its control structure we thus identify new class of use cases for control structures in javascript namely the idiomatic translation of control structures from dsls
reconfigurable place transition systems are petri nets with initial markings and set of rules which allow the modification of the net during runtime in order to adapt the net to new requirements of the environment in this paper we use transformation rules for place transition systems in the sense of the double pushout approach for graph transformation the main problem in this context is to analyze under which conditions net transformations and token firing can be executed in arbitrary order this problem is solved in the main theorems of this paper reconfigurable place transition systems then are applied in mobile network scenario
the development of complex products such as automobiles involves engineering changes that frequently require redesigning or altering the products although it has been found that efficient management of knowledge and collaboration in engineering changes is crucial for the success of new product development extant systems for engineering changes focus mainly on storing documents related to the engineering changes or simply automating the approval processes while the knowledge that is generated from collaboration and decision making processes may not be captured and managed easily this consequently limits the use of the systems by the participants in engineering change processes this paper describes model for knowledge management and collaboration in engineering change processes and based on the model builds prototype system that demonstrates the model’s strengths we studied major korean automobile company to analyze the automobile industry’s unique requirements regarding engineering changes we also developed domain ontologies from the case to facilitate knowledge sharing in the design process for achieving efficient retrieval and reuse of past engineering changes we used case based reasoning cbr with concept based similarity measure
in this paper we introduce cacti significant enhancement of cacti cacti adds support for modeling of commodity dram technology and support for main memory dram chip organization cacti enables modeling of the complete memory hierarchy with consistent models all the way from sram based caches through main memory drams on dimms we illustrate the potential applicability of cacti in the design and analysis of future memory hierarchies by carrying out last level cache study for multicore multithreaded architecture at the nm technology node in this study we use cacti to model all components of the memory hierarchy including last level sram logic process based dram or commodity dram caches and main memory dram chips we carry out architectural simulation using benchmarks with large data sets and present results of their execution time breakdown of power in the memory hierarchy and system energy delay product for the different system configurations we find that commodity dram technology is most attractive for stacked last level caches with significantly lower energy delay products
the nosq microarchitecture performs store load communication without store queue and without executing stores in the out of order engine it uses speculative memory bypassing for all in flight store load communication enabled by percent accurate store load communication predictor the result is simple fast core data path containing no dedicated store load forwarding structures
customer relationship management crm is an important concept to maintain competitiveness at commerce thus many organizations hastily implement ecrm and fail to achieve its goal crm concept consists of number of compound components on product designs marketing attributes and consumer behaviors this requires different approaches from traditional ones in developing ecrm requirements engineering is one of the important steps in software development without well defined requirements specification developers do not know how to proceed with requirements analysis this research proposes strategy based process for requirements elicitation this framework contains three steps define customer strategies identify consumer and marketing characteristics and determine system requirements prior literature lacks discussing the important role of customer strategies in ecrm development empirical findings reveal that this strategy based view positively improves the performance of requirements elicitation
we present new rigging and skinning method which uses database of partial rigs extracted from set of source characters given target mesh and set of joint locations our system can automatically scan through the database to find the best fitting body parts tailor them to match the target mesh and transfer their skinning information onto the new character for the cases where our automatic procedure fails we provide an intuitive set of tools to fix the problems when used fully automatically the system can generate results of much higher quality than standard smooth bind and with some user interaction it can create rigs approaching the quality of artist created manual rigs in small fraction of the time
group comparison per se is fundamental task in many scientific endeavours but is also the basis of any classifier contrast sets and emerging patterns contrast between groups of categorical data comparing groups of sequence data is relevant task in many applications we define emerging sequences ess as subsequences that are frequent in sequences of one group and less frequent in the sequences of another and thus distinguishing or contrasting sequences of different classes there are two challenges to distinguish sequence classes the extraction of ess is not trivially efficient and only exact matches of sequences are considered in our work we address those problems by suffix tree based framework and similar matching mechanism we propose classifier based on emerging sequences evaluating against two learning algorithms based on frequent subsequences and exact matching subsequences the experiments on two datasets show that our model outperforms the baseline approaches by up to in prediction accuracy
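a toy version of the emerging sequence idea a pattern counts as emerging when it is frequent as a non contiguous subsequence in one class of sequences and infrequent in the other the brute force pattern enumeration the thresholds and the two tiny classes of strings are illustrative the paper instead uses a suffix tree based framework with similar rather than exact matching

```python
# A pattern is a subsequence of a sequence if its symbols occur in order
# (not necessarily contiguously).  We enumerate short candidate patterns,
# measure their support in each class, and keep those frequent in the
# positive class but rare in the negative one.

from itertools import product

def is_subseq(pattern, seq):
    it = iter(seq)
    return all(ch in it for ch in pattern)   # consumes `it`, preserving order

def support(pattern, seqs):
    return sum(is_subseq(pattern, s) for s in seqs) / len(seqs)

def emerging(pos, neg, max_len=2, min_pos=0.6, max_neg=0.2):
    alphabet = sorted({c for s in pos + neg for c in s})
    patterns = [p for n in range(1, max_len + 1)
                for p in product(alphabet, repeat=n)]
    return [("".join(p), support(p, pos), support(p, neg))
            for p in patterns
            if support(p, pos) >= min_pos and support(p, neg) <= max_neg]

pos = ["abcd", "axbd", "abd"]      # class 1
neg = ["cba", "dcb", "bca"]        # class 2
print(emerging(pos, neg))          # e.g. ('bd', 1.0, 0.0) is emerging
```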
raid robust and adaptable distributed database system for transaction processing is described raid is message passing system with server processes on each site the servers manage concurrent processing consistent replicated copies during site failures and atomic distributed commitment high level layered communications package provides clean location independent interface between servers the latest design of the communications package delivers messages via shared memory in high performance configuration in which several servers are linked into single process raid provides the infrastructure to experimentally investigate various methods for supporting reliable distributed transaction processing measurements on transaction processing time and server cpu time are presented data and conclusions of experiments in three categories are also presented communications software consistent replicated copy control during site failures and concurrent distributed checkpointing software tool for the evaluation of transaction processing algorithms in an operating system kernel is proposed
this paper introduces framework for long distance face recognition using both dense and sparse stereo reconstruction two methods to determine correspondences of the stereo pair are used in this paper dense global stereo matching using maximum posteriori markov random fields map mrf algorithms and active appearance model aam fitting of both images of the stereo pair and using the fitted aam mesh as the sparse correspondences experiments are performed regarding the use of different features extracted from these vertices for face recognition comparisons between the two approaches are carried out in this paper the cumulative rank curves cmc which are generated using the proposed framework confirm the feasibility of the proposed work for long distance recognition of human faces
chip multi processor cmp architectures have become mainstream for designing processors with large number of cores networks on chip nocs provide scalable communication method for cmp architectures nocs must be carefully designed to meet constraints of power consumption and area and provide ultra low latencies existing nocs mostly use dimension order routing dor to determine the route taken by packet in unicast traffic however with the development of diverse applications in cmps one to many multicast and one to all broadcast traffic are becoming more common current unicast routing cannot support multicast and broadcast traffic efficiently in this paper we propose recursive partitioning multicast rpm routing and detailed multicast wormhole router design for nocs rpm allows routers to select intermediate replication nodes based on the global distribution of destination nodes this provides more path diversities thus achieves more bandwidth efficiency and finally improves the performance of the whole network our simulation results using detailed cycle accurate simulator show that compared with the most recent multicast scheme rpm saves of crossbar and link power and of link utilization with network performance improvement also rpm is more scalable to large networks than the recently proposed vctm
in this paper we proposed an efficient and accurate text chunking system using linear svm kernel and new technique called masked method previous researches indicated that systems combination or external parsers can enhance the chunking performance however the cost of constructing multi classifiers is even higher than developing single processor moreover the use of external resources will complicate the original tagging process to remedy these problems we employ richer features and propose masked based method to solve unknown word problem to enhance system performance in this way no external resources or complex heuristics are required for the chunking system the experiments show that when training with the conll chunking dataset our system achieves in rate with linear furthermore our chunker is quite efficient since it adopts linear kernel svm the turn around tagging time on conll testing data is less than which is about times than polynomial kernel svm
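a minimal per token chunking setup with a linear svm each token is turned into a small feature dictionary and classified into a bio chunk tag the toy sentences and the feature template are assumptions and the paper's system uses much richer features plus its masked method for unknown words on the conll data

```python
# Per-token chunking with a linear SVM: vectorize simple word/POS context
# features and classify each token into a BIO chunk tag.

from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# (word, POS, chunk-tag) triples for two tiny training sentences
train = [[("He", "PRP", "B-NP"), ("eats", "VBZ", "B-VP"), ("apples", "NNS", "B-NP")],
         [("She", "PRP", "B-NP"), ("reads", "VBZ", "B-VP"), ("books", "NNS", "B-NP")]]

def features(sent, i):
    word, pos, _ = sent[i]
    return {"w": word.lower(), "pos": pos,
            "prev_pos": sent[i - 1][1] if i > 0 else "BOS",
            "next_pos": sent[i + 1][1] if i + 1 < len(sent) else "EOS"}

X = [features(s, i) for s in train for i in range(len(s))]
y = [tok[2] for s in train for tok in s]

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X), y)

test = [("They", "PRP", "?"), ("buy", "VBP", "?"), ("cars", "NNS", "?")]
print(clf.predict(vec.transform([features(test, i) for i in range(len(test))])))
```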
type systems for secure information flow are useful for efficiently checking that programs have secure information flow they are however conservative so that they often reject safe programs as ill typed accordingly users have to check whether the rejected programs indeed have insecure flows to remedy this problem we propose method for automatically finding counterexample of secure information flow input states that actually lead to leakage of secret information our method is novel combination of type based analysis and model checking suspicious execution paths that may cause insecure information flow are first found by using the result of type based information flow analysis and then model checker is used to check whether the paths are indeed unsafe we have formalized and implemented the method the result of preliminary experiments shows that our method can often find counterexamples faster than method using model checker alone
linear constraint databases and query languages are appropriate for spatial database applications not only is the data model suitable for representing large portion of spatial data such as in gis systems but there also exist efficient algorithms for the core operations in the query languages an important limitation of linear constraints however is that they cannot model constructs such as euclidean distance extending such languages to include such constructs without obtaining the full power of polynomial constraints has proven to be quite difficult one approach to this problem by kuijpers kuper paredaens and vandeurzen used the notion of euclidean constructions with ruler and compass as the basis for first order query language while their language had the desired expressive power the semantics are not really natural due to its use of an ad hoc encoding in this paper we define language over similar class of databases with more natural semantics we show that this language captures natural subclass the representation independent queries of the first order language of kuijpers kuper paredaens and vandeurzen
this paper presents mobicausal new protocol to implement causal ordering in mobile computing systems the implementation of causal ordering proposed in this paper uses the new timestamping mechanisms proposed by prakash and singhal for mobile environments dependency sequences and hierarchical clocks our protocol compared with previous proposals is characterized by the elimination of unnecessary inhibition delay in delivering messages while maintaining low message overhead our protocol requires minimal resources on mobile hosts and wireless links the proposed protocol is also scalable and can easily handle dynamic change in the number of participating mobile hosts in the system
many tasks require attention switching for example searching for information on one sheet of paper and then entering this information onto another one with paper we see that people use fingers or objects as placeholders using these simple aids the process of switching attention between displays can be simplified and speeded up with large or multiple visual displays we have many tasks where both attention areas are on the screen and where using finger as placeholder is not suitable one way users deal with this is to use the mouse and highlight their current focus however this also has its limitations in particular in environments where there is no pointing device our approach is to utilize the user’s gaze position to provide visual placeholder the last area where user fixated on the screen before moving their attention away is highlighted we call this visual reminder gazemark gazemarks ease orientation and the resumption of the interrupted task when coming back to this display in this paper we report on study where the effectiveness of using gazemarks was investigated in particular we show how they can ease attention switching our results show faster completion times for resumed simple visual search task when using this technique the paper analyzes relevant parameters for the implementation of gazemarks and discusses some further application areas for this approach
method for the automatic generation of test scenarios from the behavioral requirements of system is presented in this paper the generated suite of test scenarios validates the system design or implementation against the requirements the approach proposed here uses requirements model and set of four algorithms the requirements model is an executable model of the proposed system defined in deterministic state based modeling formalism each action in the requirements model that changes the state of the model is identified with unique requirement identifier the scenario generation algorithms perform controlled simulations of the requirements model in order to generate suite of test scenarios applicable for black box testing measurements of several metrics on the scenario generation algorithms have been collected using prototype tools
we consider ad hoc radio networks in which each node knows only its own identity but is unaware of the topology of the network or of any bound on its size or diameter acknowledged broadcasting ab is communication task consisting in transmitting message from distinguished source to all other nodes of the network and making this fact common knowledge among all nodes to do this the underlying directed graph must be strongly connected working in model allowing all nodes to transmit spontaneously even before getting the source message chlebus et al chlebus gasieniec gibbons pelc rytter deterministic broadcasting in unknown radio networks distrib comput proved that ab is impossible if collision detection is not available and gave an ab algorithm using collision detection that works in time O(nD) where n is the number of nodes and D is the eccentricity of the source uchida et al uchida chen wada acknowledged broadcasting and gossiping in ad hoc radio networks theoret comput sci showed an ab algorithm without collision detection working in time log for all strongly connected networks of size at least in particular it follows that the impossibility result from chlebus gasieniec gibbons pelc rytter deterministic broadcasting in unknown radio networks distrib comput is really caused by the singleton network for which ab amounts to realizing that the source is alone we improve those two results by presenting two generic ab algorithms using broadcasting algorithm without acknowledgement as procedure for large class of broadcasting algorithms the resulting ab algorithm has the same time complexity using the currently best known broadcasting algorithms we obtain an ab algorithm with collision detection working in time min nlog nlognloglogn for arbitrary strongly connected networks and an ab algorithm without collision detection working in time nlognloglogn for all strongly connected networks of size moreover we show that in the model in which only nodes that already got the source message can transmit ab is infeasible in strong sense for any ab algorithm there exists an infinite family of networks for which this algorithm is incorrect
spoken language generation for dialogue systems requires dictionary of mappings between the semantic representations of concepts that the system wants to express and the realizations of those concepts dictionary creation is costly process it is currently done by hand for each dialogue domain we propose novel unsupervised method for learning such mappings from user reviews in the target domain and test it in the restaurant and hotel domains experimental results show that the acquired mappings achieve high consistency between the semantic representation and the realization and that the naturalness of the realization is significantly higher than the baseline
in this paper we propose mobile terminal mt location registration update model in this model the registration decision is based on two factors the time elapsed since last call arrival and the distance the mt has traveled since last registration it is established that the optimal registration strategy can be represented by curve only when the state of the system reaches this curve is registration performed in order for an mt to calculate its traveled distance an interactive implementation scheme and distance calculation algorithm are developed when the call interarrival times are independent and geometrically distributed the proposed model becomes distance based model and in this case the optimal registration strategy is of threshold structure for the distance based model single sample path based ordinal optimization algorithm is devised in this algorithm without any knowledge about the system parameters the mt observes the system state transitions estimates the ordinal of set of strategies and updates the registration strategy adaptively since only single sample path is used this algorithm can be implemented online several numerical examples are provided to compare the proposed model and the existing ones
in mobile and wireless environments mobile clients can access information with respect to their locations by submitting location dependent spatial queries ldsqs to location based service lbs servers owing to scarce wireless channel bandwidth and limited client battery life frequent ldsq submission from clients must be avoided observing that ldsqs issued from similar client positions would normally return the same results we explore the idea of valid scope that represents spatial area in which set of ldsqs will retrieve exactly the same query results with valid scope derived and an ldsq result cached at the client side client can assert whether the new ldsqs can be answered with the maintained ldsq result thus eliminating the ldsqs sent to the server as such contention on wireless channel and client energy consumed for data transmission can be considerably reduced in this paper we design efficient algorithms to compute the valid scope for common types of ldsqs including nearest neighbor queries and range queries through an extensive set of experiments our proposed valid scope computation algorithms are shown to significantly outperform existing approaches
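As a rough illustration of the valid-scope idea for a cached nearest-neighbor result (not the paper's actual algorithms), the sketch below treats the cache as valid while the new query point remains closer to the cached object than to every other candidate the server shipped; the function name and the assumption that the server supplies a sufficient candidate set are both hypothetical.

```python
import math

def in_valid_scope(query, cached_nn, candidates):
    """Conservative client-side reuse check for a cached 1-NN answer: reuse it
    only if the new query is still closer to the cached object than to every
    other candidate known to the client (i.e., it lies in the cached object's
    Voronoi cell with respect to the candidate set)."""
    return all(math.dist(query, cached_nn) <= math.dist(query, c)
               for c in candidates if c != cached_nn)

# toy usage: cached NN for the old position plus two other nearby objects
cands = [(0.0, 0.0), (4.0, 0.0), (0.0, 5.0)]
print(in_valid_scope((0.5, 0.5), cached_nn=(0.0, 0.0), candidates=cands))  # True: answer from cache
print(in_valid_scope((3.0, 0.2), cached_nn=(0.0, 0.0), candidates=cands))  # False: resubmit the LDSQ
```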
we present framework for real time animation of explosions that runs completely on the gpu the simulation allows for arbitrary internal boundaries and is governed by combustion process stable fluid solver which includes thermal expansion and turbulence modeling the simulation results are visualised by two particle systems rendered using animated textures the results are physically based non repeating and dynamic real time explosions with high visual quality
in this paper we propose memory reduction as new approach to data locality enhancement under this approach we use the compiler to reduce the size of the data repeatedly referenced in collection of nested loops between their reuses the data will more likely remain in higher speed memory devices such as the cache specifically we present an optimal algorithm to combine loop shifting loop fusion and array contraction to reduce the temporary array storage required to execute collection of loops when applied to benchmark programs our technique reduces the memory requirement counting both the data and the code by on average the transformed programs gain speedup of on average due to the reduced footprint and consequently the improved data locality
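To make the transformation concrete, here is a minimal before/after sketch (not taken from the paper) in which fusing a producer loop with its consumer lets a full temporary array be contracted to a scalar, shrinking the data touched between reuses.

```python
def before(a):
    n = len(a)
    t = [0.0] * n               # temporary array of size n lives between the loops
    for i in range(n):
        t[i] = 2.0 * a[i]       # producer loop
    b = [0.0] * n
    for i in range(n):
        b[i] = t[i] + 1.0       # consumer loop reads each t[i] exactly once
    return b

def after(a):
    b = [0.0] * len(a)
    for i, x in enumerate(a):   # fused loop
        t = 2.0 * x             # contracted temporary: a single scalar
        b[i] = t + 1.0
    return b

assert before([1.0, 2.0]) == after([1.0, 2.0])
```

Loop shifting (not shown) handles cases where the consumer reads a neighboring element such as t[i-1], realigning iterations so that fusion and contraction remain legal.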
domain model which captures the common knowledge and the possible variability allowed among applications in domain may assist in the creation of other valid applications in that domain however to create such domain models is not trivial task it requires expertise in the domain reaching very high level of abstraction and providing flexible yet formal artifacts in this paper an approach called semi automated domain modeling sdm to create draft domain models from applications in those domains is presented sdm takes repository of application models in domain and matches merges and generalizes them into sound draft domain models that include the commonality and variability allowed in these domains the similarity of the different elements is measured with consideration of syntactic semantic and structural aspects unlike ontology and schema integration these models capture both structural and behavioral aspects of the domain running sdm on small repositories of project management applications and scheduling systems we found that the approach may provide reasonable draft domain models whose comprehensibility correctness completeness and consistency levels are satisfactory
this paper tackles the problem of structural integration testing of stateful classes previous work on structural testing of object oriented software exploits data flow analysis to derive test requirements for class testing and defines contextual def use associations to characterize inter method relations non contextual data flow testing of classes works well for unit testing but not for integration testing since it misses definitions and uses when properly encapsulated contextual data flow analysis approaches investigated so far either do not focus on state dependent behavior or have limited applicability due to high complexity this paper proposes an efficient structural technique based on contextual data flow analysis to test state dependent behavior of classes that aggregate other classes as part of their state
the augmented graph model as introduced in kleinberg stoc is an appealing model for analyzing navigability in social networks informally this model is defined by pair where is graph in which inter node distances are supposed to be easy to compute or at least easy to estimate this graph is augmented by links called long range links that are selected according to the probability distribution the augmented graph model enables the analysis of greedy routing in augmented graphs in greedy routing each intermediate node handling message for target selects among all its neighbors in the one that is the closest to in and forwards the message to it this paper addresses the problem of checking whether given graph is an augmented graph it answers part of the questions raised by kleinberg in his problem int congress of math more precisely given we aim at extracting the base graph and the long range links out of we prove that if has high clustering coefficient and has bounded doubling dimension then simple local maximum likelihood algorithm enables us to partition the edges of into two sets and such that and the edges in are of small stretch ie the map is not perturbed too greatly by undetected long range links remaining in the perturbation is actually so small that we can prove that the expected performances of greedy routing in using the distances in are close to the expected performances of greedy routing using the distances in although this latter result may appear intuitively straightforward since it is not as we also show that routing with map more precise than may actually damage greedy routing significantly finally we show that in the absence of hypothesis regarding the high clustering coefficient any local maximum likelihood algorithm extracting the long range links can miss the detection of logn long range links of stretch for any
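The following sketch (a generic illustration, not the paper's extraction algorithm) shows greedy routing itself: each node forwards to the neighbor closest to the target under the base distance, here taken to be Euclidean distance on given coordinates.

```python
import math

def greedy_route(adj, pos, source, target, max_hops=1000):
    """Forward to the neighbor closest to the target in the base metric;
    stop when the target is reached or no neighbor makes progress."""
    dist = lambda u, v: math.dist(pos[u], pos[v])
    path, current = [source], source
    for _ in range(max_hops):
        if current == target:
            return path
        nxt = min(adj[current], key=lambda v: dist(v, target))
        if dist(nxt, target) >= dist(current, target):
            return None          # stuck in a local minimum
        path.append(nxt)
        current = nxt
    return None

# toy 4-node path augmented with one long-range link 0 -> 3
pos = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)}
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_route(adj, pos, 0, 3))   # [0, 3]: the long-range link is taken immediately
```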
we propose novel algorithm terminated ramp support vector machines tr svm for classification and feature ranking purposes in the family of support vector machines the main improvement relies on the fact that the kernel is automatically determined by the training examples it is built as function of simple classifiers generalized terminated ramp functions obtained by separating oppositely labeled pairs of training points the algorithm has meaningful geometrical interpretation and it is derived in the framework of tikhonov regularization theory its unique free parameter is the regularization one representing trade off between empirical error and solution complexity employing the equivalence between the proposed algorithm and two layer networks theoretical bound on the generalization error is also derived together with vapnik chervonenkis dimension performances are tested on number of synthetic and real data sets
verifying properties of large real world programs requires vast quantities of information on aspects such as procedural contexts loop invariants or pointer aliasing it is unimaginable to have all these properties provided to verification tool by annotations from the user static analysis will clearly play key role in the design of future verification engines by automatically discovering the bulk of this information the body of research in static program analysis can be split up in two major areas one probably the larger in terms of publications is concerned with discovering properties of data structures shape analysis pointer analysis the other addresses the inference of numerical invariants for integer or floating point algorithms range analysis propagation of round off errors in numerical algorithms we will call the former symbolic static analysis and the latter numerical static analysis both areas were successful in effectively analyzing large applications however symbolic and numerical static analysis are commonly regarded as entirely orthogonal problems for example pointer analysis usually abstracts away all numerical values that appear in the program whereas the floating point analysis tool astree does not abstract memory at all
in mobile computing environment database servers disseminate information to multiple mobile clients via wireless channels due to the low bandwidth and low reliability of wireless channels it is important for mobile client to cache its frequently accessed database items into its local storage this improves performance of database queries and improves availability of database items for query processing during disconnection in this paper we investigate issues on caching granularity coherence strategy and replacement policy of caching mechanisms for mobile environment utilizing point to point communication paradigm we first illustrate that page based caching is not suitable in the mobile context due to the lack of locality among database items we propose three different levels of caching granularity attribute caching object caching and hybrid caching hybrid approach of attribute and object caching next we show that existing coherence strategies are inappropriate due to frequent disconnection in mobile environment and propose cache coherence strategy based on the update patterns of database items via detailed simulation model we examine the performance of various levels of caching granularity with our cache coherence strategy we observe in general that hybrid caching could achieve better performance finally we propose several cache replacement policies that can adapt to the access patterns of database items for each given caching granularity we discover that our replacement policies outperform conventional ones in most situations
prior work including our own shows that application performance in garbage collected languages is highly dependent upon the application behavior and on underlying resource availability we show that given wide range of diverse garbage collection gc algorithms no single system performs best across programs and heap sizes we present java virtual machine extension for dynamic and automatic switching between diverse widely used gcs for application specific garbage collection selection we describe annotation guided and automatic gc switching we also describe novel extension to extant on stack replacement osr mechanisms for aggressive gc specialization that is readily amenable to compiler optimization
the paper considers the consensus problem in partially synchronous system with byzantine processes in this context the literature distinguishes authenticated byzantine faults where messages can be signed by the sending process with the assumption that the signature cannot be forged by any other process and byzantine faults where there is no mechanism for signatures but the receiver of message knows the identity of the sender the paper proposes an abstraction called weak interactive consistency wic that unifies consensus algorithms with and without signed messages wic can be implemented with and without signatures the power of wic is illustrated on two seminal byzantine consensus algorithms the castro liskov pbft algorithm no signatures and the martin alvisi fab paxos algorithms signatures wic allows very concise expression of these two algorithms
we present novel algorithms that optimize the order in which triangles are rendered to improve post transform vertex cache efficiency as well as for view independent overdraw reduction the resulting triangle orders perform on par with previous methods but are orders of magnitude faster to compute the improvements in processing speed allow us to perform the optimization right after model is loaded when more information on the host hardware is available this allows our vertex cache optimization to often outperform other methods in fact our algorithms can even be executed interactively allowing for re optimization in case of changes to geometry or topology which happen often in cad cam applications we believe that most real time rendering applications will immediately benefit from these new results
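For readers who want to reproduce the cost model behind such orderings, here is a minimal sketch (our own, using an idealized FIFO cache rather than real hardware behavior) of the average cache miss ratio for a given triangle order:

```python
from collections import deque

def acmr(triangles, cache_size=32):
    """Average cache miss ratio: vertices that miss an idealized FIFO
    post-transform cache, divided by the number of triangles."""
    cache, in_cache, misses = deque(), set(), 0
    for tri in triangles:
        for v in tri:
            if v not in in_cache:
                misses += 1
                cache.append(v)
                in_cache.add(v)
                if len(cache) > cache_size:
                    in_cache.discard(cache.popleft())
    return misses / len(triangles)

tris = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(acmr(tris, cache_size=4))   # 5 misses over 3 triangles, about 1.67
```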
this paper presents simple but effective method to reduce on chip access latency and improve core isolation in cmp non uniform cache architectures nuca the paper introduces feasible way to allocate cache blocks according to the access pattern each bank is dynamically partitioned at set level in private and shared content simply by adjusting the replacement algorithm we can place private data closer to its owner processor in contrast independently of the accessing processor shared data is always placed in the same position this approach is capable of reducing on chip latency without significantly sacrificing hit rates or increasing implementation cost of conventional static nuca additionally most of the unnecessary interference between cores in private accesses is removed to support the architectural decisions adopted and provide comparative study comprehensive evaluation framework is employed the workbench is composed of full system simulator and representative set of multithreaded and multiprogrammed workloads with this infrastructure different alternatives for the coherence protocol replacement policies and cache utilization are analyzed to find the optimal proposal we conclude that the cost for feasible implementation should be closer to conventional static nuca and significantly less than dynamic nuca finally comparison with static and dynamic nuca is presented the simulation results suggest that on average the mechanism proposed could improve system performance of static nuca and idealized dynamic nuca by and respectively
artists often need to import and embellish models coming from cad cam into vector graphics software to produce eg brochures or manuals current automatic solutions tend to result at best in triangle soup and artists often have to trace over renderings we describe method to convert models into layered vector illustrations that respect visibility and facilitate further editing our core contribution is visibility method that can partition mesh into large components that can be layered according to visibility because self occluding objects and objects forming occlusion cycles cannot be represented by layers without being cut we introduce new cut algorithm that uses graph representation of the mesh and curvature aware geodesic distances
class sharing is new language mechanism for building extensible software systems recent work has separately explored two different kinds of extensibility first family inheritance in which an entire family of related classes can be inherited and second adaptation in which existing objects are extended in place with new behavior and state class sharing integrates these two kinds of extensibility mechanisms with little programmer effort objects of one family can be used as members of another while preserving relationships among objects therefore family of classes can be adapted in place with new functionality spanning multiple classes object graphs can evolve from one family to another adding or removing functionality even at run time several new mechanisms support this flexibility while ensuring type safety class sharing has been implemented as an extension to java and its utility for evolving and extending software is demonstrated with realistic systems
spurred by range of potential applications there has been growing body of research in computational models of human emotion to advance the development of these models it is critical that we evaluate them against the phenomena they purport to model in this paper we present one method to evaluate an emotion model that compares the behavior of the model against human behavior using standard clinical instrument for assessing human emotion and coping we use this method to evaluate the emotion and adaptation ema model of emotion gratch and marsella the evaluation highlights strengths of the approach and identifies where the model needs further development
for set of points in ℝ^d spanner is sparse graph on the points of such that between any pair of points there is path in the spanner whose total length is at most times the euclidean distance between the points in this paper we show how to construct epsilon spanner with epsilon edges and maximum degree epsilon in time log spanner with similar properties was previously presented in however using our new construction coupled with several other innovations we obtain new results for two fundamental problems for constant doubling dimension metrics the first result is an essentially optimal compact routing scheme in particular we show how to perform routing with stretch of epsilon where the label size is log and the size of the table stored at each point is only log epsilon this routing problem was first considered by peleg and hassin who presented routing scheme in the plane later chan et al and abraham et al considered this problem for doubling dimension metric spaces abraham et al were the first to present epsilon routing scheme where the label size depends solely on the number of points in their scheme labels are of size of log and each point stores table of size log epsilon in our routing scheme we achieve routing tables of size log epsilon which is essentially the same size as label up to the factor of epsilon the second and main result of this paper is the first fully dynamic geometric spanner with poly logarithmic update time for both insertions and deletions we present an algorithm that allows points to be inserted into and deleted from with an amortized update time of log
this paper describes the security framework that is to be developed for the generic grid platform created for the project gredia this platform is composed of several components that need to be secured the platform uses the ogsa standards so that the security framework will follow gsi the portion of globus that implements security thus we will show the security features that gsi already provides and we will outline which others need to be created or enhanced
in parallel adaptive applications the computational structure of the applications changes over time leading to load imbalances even though the initial load distributions were balanced to restore balance and to keep communication volume low in further iterations of the applications dynamic load balancing repartitioning of the changed computational structure is required repartitioning differs from static load balancing partitioning due to the additional requirement of minimizing migration cost to move data from an existing partition to new partition in this paper we present novel repartitioning hypergraph model for dynamic load balancing that accounts for both communication volume in the application and migration cost to move data in order to minimize the overall cost the use of hypergraph based model allows us to accurately model communication costs rather than approximate them with graph based models we show that the new model can be realized using hypergraph partitioning with fixed vertices and describe our parallel multilevel implementation within the zoltan load balancing toolkit to the best of our knowledge this is the first implementation for dynamic load balancing based on hypergraph partitioning to demonstrate the effectiveness of our approach we conducted experiments on linux cluster with processors the results show that in terms of reducing total cost our new model compares favorably to the graph based dynamic load balancing approaches and multilevel approaches improve the repartitioning quality significantly
secure broadcasting of web documents is becoming crucial requirement for many web based applications under the broadcast document dissemination strategy web document source periodically broadcasts portions of its documents to potentially large community of users without the need for explicit requests by secure broadcasting we mean that the delivery of information to users must obey the access control policies of the document source traditional access control mechanisms that have been adapted for xml documents however do not address the performance issues inherent in access control in this paper labeling scheme is proposed to support rapid reconstruction of xml documents in the context of well known method called xml pool encryption the proposed labeling scheme supports the speedy inference of structure information in all portions of the document the binary representation of the proposed labeling scheme is also investigated in the experimental results the proposed labeling scheme is efficient in searching for the location of decrypted information
this paper presents the alpha ev conditional branch predictor the alpha ev microprocessor project canceled in june in late phase of development envisioned an aggressive wide issue out of order superscalar microarchitecture featuring very deep pipeline and simultaneous multithreading performance of such processor is highly dependent on the accuracy of its branch predictor and consequently very large silicon area was devoted to branch prediction on ev the alpha ev branch predictor relies on global history and features total of kbits the focus of this paper is on the different trade offs performed to overcome various implementation constraints for the ev branch predictor one such instance is the pipelining of the predictor on two cycles to facilitate the prediction of up to branches per cycle from any two dynamically successive instruction fetch blocks this resulted in the use of three fetch block old compressed branch history information for accessing the predictor implementation constraints also restricted the composition of the index functions for the predictor and forced the usage of only single ported memory cells nevertheless we show that the alpha ev branch predictor achieves prediction accuracy in the same range as the state of the art academic global history branch predictors that do not consider implementation constraints in great detail
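As background for readers unfamiliar with global-history prediction, here is a toy gshare-style predictor: a table of 2-bit saturating counters indexed by the branch PC xored with a global history register. It only illustrates the principle; the predictor described above is a far larger and more constrained design.

```python
class GsharePredictor:
    """Toy global-history branch predictor with 2-bit saturating counters."""
    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)     # counters start weakly not-taken
        self.history = 0

    def predict(self, pc):
        return self.table[(pc ^ self.history) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc ^ self.history) & self.mask
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask

bp = GsharePredictor()
for _ in range(3):
    print(bp.predict(0x40c), end=" ")    # early predictions miss until the counters
    bp.update(0x40c, taken=True)         # for each history pattern have trained
```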
the widespread use of internet based services is increasing the amount of information such as user profiles that clients are required to disclose this information demand is necessary for regulating access to services and functionally convenient eg to support service customization but it has raised privacy related concerns which if not addressed may affect the users disposition to use network services at the same time servers need to regulate service access without disclosing entirely the details of their access control policy there is therefore pressing need for privacy aware techniques to regulate access to services open to the network we propose an approach for regulating service access and information disclosure on the web the approach consists of uniform formal framework to formulate and reason about both service access and information disclosure constraints it also provides means for parties to communicate their requirements while ensuring that no private information be disclosed and that the communicated requirements are correct with respect to the constraints
mobile is an extension of the net common intermediate language that supports certified in lined reference monitoring mobile programs have the useful property that if they are well typed with respect to declared security policy then they are guaranteed not to violate that security policy when executed thus when an in lined reference monitor irm is expressed in mobile it can be certified by simple type checker to eliminate the need to trust the producer of the irm security policies in mobile are declarative can involve unbounded collections of objects allocated at runtime and can regard infinite length histories of security events exhibited by those objects the prototype mobile implementation enforces properties expressed by finite state security automata one automaton for each security relevant object and can type check mobile programs in the presence of exceptions finalizers concurrency and non termination executing mobile programs requires no change to existing net virtual machine implementations since mobile programs consist of normal managed cil code with extra typing annotations stored in net attributes
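To illustrate the kind of per-object property such a monitor enforces, the sketch below runs a finite-state security automaton for one object; the states, events, and file-usage policy are hypothetical examples, and the actual system checks conformance statically rather than raising at run time.

```python
class SecurityAutomaton:
    """One finite-state security automaton per security-relevant object."""
    TRANSITIONS = {
        ("closed", "open"): "opened",
        ("opened", "read"): "opened",
        ("opened", "close"): "closed",
    }

    def __init__(self):
        self.state = "closed"

    def event(self, evt):
        nxt = self.TRANSITIONS.get((self.state, evt))
        if nxt is None:
            raise RuntimeError(f"policy violation: {evt!r} in state {self.state!r}")
        self.state = nxt

f = SecurityAutomaton()
f.event("open"); f.event("read"); f.event("close")
# f.event("read") would raise: reading after close violates this example policy
```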
the fields of machine learning and mathematical programming are increasingly intertwined optimization problems lie at the heart of most machine learning approaches the special topic on machine learning and large scale optimization examines this interplay machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued the special topic includes models using quadratic linear second order cone semi definite and semi infinite programs we observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different mathematical programming puts premium on accuracy speed and robustness since generalization is the bottom line in machine learning and training is normally done off line accuracy and small speed improvements are of little concern in machine learning machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems reducing machine learning problems to well explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques in turn machine learning presents new challenges to mathematical programming the special issue includes papers from two primary themes novel machine learning models and novel optimization approaches for existing models many papers blend both themes making small changes in the underlying core mathematical program that enable the development of effective new algorithms
we propose xrpc minimal xquery extension that enables distributed yet efficient querying of heterogeneous xquery data sources xrpc enhances the existing concept of xquery functions with the remote procedure call rpc paradigm by calling out of an xquery for loop to multiple destinations and by calling functions that themselves perform xrpc calls complex pp communication patterns can be achieved the xrpc extension is orthogonal to all xquery features including the xquery update facility xquf we provide formal semantics for xrpc that encompasses execution of both read only and update queries xrpc is also network soap sub protocol that integrates seamlessly with web services and service oriented architectures soa and ajax based guis crucial feature of the protocol is bulk rpc that allows remote execution of many different calls to the same procedure using possibly single network round trip the efficiency potential of xrpc is demonstrated via an open source implementation in monetdb xquery we show however that xrpc is not system specific every xquery data source can service xrpc calls using wrapper since xquery is pure functional language we can leverage techniques developed for functional query decomposition to rewrite data shipping queries into xrpc based function shipping queries powerful distributed database techniques such as semi join optimizations directly map on bulk rpc opening up interesting future work opportunities
haptic gestures and sensations through the sense of touch are currently unavailable in remote communication there are two main reasons for this good quality haptic technology has not been widely available and knowledge on the use of this technology is limited to address these challenges we studied how users would like to and managed to create spatial haptic information by gesturing two separate scenario based experiments were carried out an observation study without technological limitations and study on gesturing with functional prototype with haptic actuators the first study found three different use strategies for the device the most common gestures were shaking smoothing and tapping multimodality was requested to create the context for the communication and to aid the interpretation of haptic stimuli the second study showed that users were able to utilize spatiality in haptic messages eg forward backward gesture for agreement however challenges remain in presenting more complex information via remote haptic communication the results give guidance for communication activities that are usable in spatial haptic communication and how to make it possible to enable this form of communication in reality
as parallelism in microprocessors becomes mainstream new programming languages and environments are emerging to meet the challenges of parallel programming to support research on these languages we are developing low level language infrastructure called pillar derived from parallel implementation language although pillar programs are intended to be automatically generated from source programs in each parallel language pillar programs can also be written by expert programmers the language is defined as small set of extensions to as result pillar is familiar to programmers but more importantly it is practical to reuse an existing optimizing compiler like gcc or open to implement pillar compiler pillar’s concurrency features include constructs for threading synchronization and explicit data parallel operations the threading constructs focus on creating new threads only when hardware resources are idle and otherwise executing parallel work within existing threads thus minimizing thread creation overhead in addition to the usual synchronization constructs pillar includes transactional memory its sequential features include stack walking second class continuations support for precise garbage collection tail calls and seamless integration of pillar and legacy code this paper describes the design and implementation of the pillar software stack including the language compiler runtime and high level converters that translate high level language programs into pillar programs it also reports on early experience with three high level languages that target pillar
real time java is quickly emerging as platform for building safety critical embedded systems the real time variants of java including are attractive alternatives to ada and since they provide cleaner simpler and safer programming model unfortunately current real time java implementations have trouble scaling down to very hard real time embedded settings where memory is scarce and processing power is limited in this paper we describe the architecture of the fiji vm which enables vanilla java applications to run in very hard environments including booting on bare hardware with only very rudimentary operating system support we also show that our minimalistic approach delivers comparable performance to that of server class production java virtual machine implementations
research on ontologies is becoming widespread in the biomedical informatics community at the same time it has become apparent that the challenges of properly constructing and maintaining ontologies have proven more difficult than many workers in the field initially expected discovering general feasible methods has thus become central activity for many of those hoping to reap the benefits of ontologies this paper reviews current methods in the construction maintenance alignment and evaluation of ontologies
the classification performance of nearest prototype classifiers largely relies on the prototype learning algorithm the minimum classification error mce method and the soft nearest prototype classifier snpc method are two important algorithms using misclassification loss this paper proposes new prototype learning algorithm based on the conditional log likelihood loss cll which is based on the discriminative model called log likelihood of margin logm regularization term is added to avoid over fitting in training as well as to maximize the hypothesis margin the cll in the logm algorithm is convex function of margin and so shows better convergence than the mce in addition we show the effects of distance metric learning with both prototype dependent weighting and prototype independent weighting our empirical study on the benchmark datasets demonstrates that the logm algorithm yields higher classification accuracies than the mce generalized learning vector quantization glvq soft nearest prototype classifier snpc and the robust soft learning vector quantization rslvq and moreover the logm with prototype dependent weighting achieves comparable accuracies to the support vector machine svm classifier
the evolution of multimedia technology and the internet boost the multimedia sharing and searching activities among social networks the requirements of semantic multimedia retrieval go far beyond those provided by the text based search engines technology here we present a collaborative approach that enables the semantic search of the multimedia objects by the collective discovery and meaningful indexing of their semantic concepts through the successive use of our model semantic concepts can be discovered and incorporated by analyzing the users search queries relevance feedback and selection patterns eventually through the growth and evolution of the index hierarchy the semantic index can be dynamically constructed validated and naturally built up towards the expectation of the social network
in this study we describe methodology to exploit specific type of domain knowledge in order to find tighter error bounds on the performance of classification via support vector machines the domain knowledge we consider is that the input space lies inside of specified convex polytope first we consider prior knowledge about the domain by incorporating upper and lower bounds of attributes we then consider more general framework that allows us to encode prior knowledge in the form of linear constraints formed by attributes by using the ellipsoid method from optimization literature we show that this can be exploited to upper bound the radius of the hyper sphere that contains the input space and enables us to tighten generalization error bounds we provide comparative numerical analysis and show the effectiveness of our approach
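For the simplest case mentioned above, per-attribute upper and lower bounds, the enclosing ball of the input box has radius equal to half its diagonal; the ellipsoid-method machinery is only needed for general linear constraints. A small sketch of the box case (our own illustration, not the paper's procedure):

```python
import math

def box_radius(lower, upper):
    """Radius of the smallest ball containing the axis-aligned box defined by
    per-attribute lower and upper bounds: half the length of the box diagonal."""
    return 0.5 * math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))

# attributes known to lie in [0, 1]^4 give radius sqrt(4)/2 = 1.0, which can be
# much tighter than a radius estimated from raw data norms alone
print(box_radius([0, 0, 0, 0], [1, 1, 1, 1]))
```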
common geographic information systems gis require high degree of expertise from their users making them difficult for laymen to operate this paper describes novel approaches to easily perform typical basic spatial tasks within gis eg pan zoom and selection operations by using multi touch gestures in combination with foot gestures we are interested in understanding how non expert users interact with such multi touch surfaces we provide categorization and framework of multi touch hand gestures for interacting with gis this framework is based on an initial evaluation we present results of more detailed in situ study mainly focusing on multi user multi touch interaction with geospatial data furthermore we extend our framework using combination of multi touch gestures with small set of foot gestures to solve geospatial tasks
we study the problem of continuous monitoring of top queries over multiple non synchronized streams assuming sliding window model this general problem has been well addressed research topic in recent years most approaches however assume synchronized streams where all attributes of an object are known simultaneously to the query processing engine in many streaming scenarios though different attributes of an item are reported in separate non synchronized streams which do not allow for exact score calculations we present how the traditional notion of object dominance changes in this case such that the dominance set still includes all and only those objects which have chance of being among the top results in their life time based on this we propose an exact algorithm which builds on generating multiple instances of the same object in way that enables efficient object pruning we show that even with object pruning the necessary storage for exact evaluation of top queries is linear in the size of the sliding window as data should reside in main memory to provide fast answers in an online fashion and cope with high stream rates storing all this data may not be possible with limited resources we present an approximate algorithm which leverages correlation statistics of pairs of streams to evict more objects while maintaining accuracy we evaluate the efficiency of our proposed algorithms with extensive experiments
this paper studies inductive definitions involving binders in which aliasing between free and bound names is permitted such aliasing occurs in informal specifications of operational semantics but is excluded by the common representation of binding as meta level abstraction drawing upon ideas from functional logic programming we represent such definitions with aliasing as recursively defined functions in higher order typed functional programming language that extends core ml with types for name binding type of semi decidable propositions and existential quantification for types with decidable equality we show that the representation is sound and complete with respect to the language’s operational semantics which combines the use of evaluation contexts with constraint programming we also give new and simple proof that the associated constraint problem is np complete
the automated categorization or classification of texts into predefined categories has witnessed booming interest in the last years due to the increased availability of documents in digital form and the ensuing need to organize them in the research community the dominant approach to this problem is based on machine learning techniques general inductive process automatically builds classifier by learning from set of preclassified documents the characteristics of the categories the advantages of this approach over the knowledge engineering approach consisting in the manual definition of classifier by domain experts are very good effectiveness considerable savings in terms of expert labor power and straightforward portability to different domains this survey discusses the main approaches to text categorization that fall within the machine learning paradigm we will discuss in detail issues pertaining to three different problems namely document representation classifier construction and classifier evaluation
this paper describes new approach to indexing time series indexing time series is more problematic than indexing text since in extreme we need to find all possible subsequences in the time series sequence we propose to use signature files to index the time series and we also propose new method to index the signature files to speed up the search of large time series we propose novel index structure the signature tree for time series indexing we implemented the signature tree and we discuss its performance
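A minimal sketch of the signature-file filtering step (our own simplification; the paper's signature tree additionally organizes these signatures hierarchically): each window of the series gets a superimposed-coding bitmask of the discretized values it contains, and a window can match the query only if its bitmask covers the query's bitmask.

```python
def symbol(x, n_bins=8, lo=-1.0, hi=1.0):
    """Discretize a value into one of n_bins symbols."""
    return max(0, min(n_bins - 1, int((x - lo) / (hi - lo) * n_bins)))

def signature(values, n_bins=8):
    """Superimposed-coding signature: one bit per symbol present."""
    sig = 0
    for x in values:
        sig |= 1 << symbol(x, n_bins)
    return sig

def candidate_windows(series, query, w):
    """Indices of windows passing the signature filter; candidates may be
    false positives and must still be verified against the raw values."""
    qsig = signature(query)
    return [i for i in range(len(series) - w + 1)
            if qsig & ~signature(series[i:i + w]) == 0]

series = [0.1, 0.5, -0.7, 0.2, 0.6, -0.6, 0.9]
print(candidate_windows(series, query=[0.2, 0.6, -0.6], w=3))   # the last window is pruned
```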
rapid technological change has had an impact on the nature of software this has led to new exigencies and to demands for software engineering paradigms that pay particular attention to meeting them we advocate that such demands can be met at least in large part through the adoption of software engineering processes that are founded on reflective stance to this end we turn our attention to the field of design rationale we analyze and characterize design rationale approaches and show that despite surface differences between different approaches they all tend to be variants of relatively small set of static and dynamic affinities we use the synthesis of static and dynamic affinities to develop generic model for reflective design the model is nonprescriptive and affects minimally the design process it is context independent and is intended to be used as facilitator in participative design supporting group communication and deliberation the potential utility of the model is demonstrated through two examples one from the world of business design and the other from programming language design
publish subscribe systems have demonstrated the ability to scale to large numbers of users and high data rates when providing content based data dissemination services on the internet however their services are limited by the data semantics and query expressiveness that they support on the other hand the recent work on selective dissemination of xml data has made significant progress in moving from xml filtering to the richer functionality of transformation for result customization but in general has ignored the challenges of deploying such xml based services on an internet scale in this paper we address these challenges in the context of incorporating the rich functionality of xml data dissemination in highly scalable system we present the architectural design of onyx system based on an overlay network we identify the salient technical challenges in supporting xml filtering and transformation in this environment and propose techniques for solving them
it is difficult to write programs that behave correctly in the presence of run time errors existing programming language features often provide poor support for executing clean up code and for restoring invariants in such exceptional situations we present dataflow analysis for finding certain class of error handling mistakes those that arise from failure to release resources or to clean up properly along all paths many real world programs violate such resource safety policies because of incorrect error handling our flow sensitive analysis keeps track of outstanding obligations along program paths and does precise modeling of control flow in the presence of exceptions using it we have found over error handling mistakes almost million lines of java code the analysis is unsound and produces false positives but few simple filtering rules suffice to remove them in practice the remaining mistakes were manually verified these mistakes cause sockets files and database handles to be leaked along some paths we present characterization of the most common causes of those errors and discuss the limitations of exception handling finalizers and destructors in addressing them based on those errors we propose programming language feature that keeps track of obligations at run time and ensures that they are discharged finally we present case studies to demonstrate that this feature is natural efficient and can improve reliability for example retrofitting kloc program with it resulted in code size decrease surprising speed increase from correctly deallocating resources in the presence of exceptions and more consistent behavior
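The failure mode being analyzed is easy to reproduce; the Python sketch below (the paper itself targets Java) shows a resource obligation that leaks on the exceptional path and the conventional fix that discharges it on every path.

```python
import sqlite3

def risky(path):
    conn = sqlite3.connect(path)
    cur = conn.cursor()          # if this or the next statement raises,
    cur.execute("SELECT 1")      # conn.close() below is never reached
    conn.close()

def safe(path):
    conn = sqlite3.connect(path)
    try:                         # the "close conn" obligation is now
        cur = conn.cursor()      # discharged on every path, including
        cur.execute("SELECT 1")  # the exceptional ones
    finally:
        conn.close()

safe(":memory:")
```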
increasing the level of spacecraft autonomy is essential for broadening the reach of solar system exploration computer vision has and will continue to play an important role in increasing autonomy of both spacecraft and earth based robotic vehicles this article addresses progress on computer vision for planetary rovers and landers and has four main parts first we review major milestones in the development of computer vision for robotic vehicles over the last four decades since research on applications for earth and space has often been closely intertwined the review includes elements of both second we summarize the design and performance of computer vision algorithms used on mars in the nasa jpl mars exploration rover mer mission which was major step forward in the use of computer vision in space these algorithms did stereo vision and visual odometry for rover navigation and feature tracking for horizontal velocity estimation for the landers third we summarize ongoing research to improve vision systems for planetary rovers which includes various aspects of noise reduction fpga implementation and vision based slip perception finally we briefly survey other opportunities for computer vision to impact rovers landers and orbiters in future solar system exploration missions
improving the precision of information retrieval has been challenging issue on chinese web as exemplified by chinese recipes on the web it is not easy or natural for people to use keywords eg recipe names to search recipes since the names can be literally so abstract that they do not bear much if any information on the underlying ingredients or cooking methods in this paper we investigate the underlying features of chinese recipes and based on workflow like cooking procedures we model recipes as graphs we further propose novel similarity measurement based on the frequent patterns and devise an effective filtering algorithm to prune unrelated data so as to support efficient on line searching benefiting from the characteristics of graphs frequent common patterns can be mined from cooking graph database so in our prototype system called recipeview we extend the subgraph mining algorithm fsg to cooking graphs and combine it with our proposed similarity measurement resulting in an approach that well caters for specific users needs our initial experimental studies show that the filtering algorithm can efficiently prune unrelated cooking graphs without affecting the retrieval performance and the similarity measurement achieves relatively higher precision and recall than its counterparts
we study epidemic schemes in the context of collaborative data delivery in this context multiple chunks of data reside at different nodes and the challenge is to simultaneously deliver all chunks to all nodes here we explore the inter operation between the gossip of multiple simultaneous message chunks in this setting interacting nodes must select which chunk among many to exchange in every communication round we provide an efficient solution that possesses the inherent robustness and scalability of gossip our approach maintains the simplicity of gossip and has low message connections and computation overhead because our approach differs from solutions proposed by network coding we are able to provide insight into the tradeoffs and analysis of the problem of collaborative content distribution we formally analyze the performance of the algorithm demonstrating its efficiency with high probability
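A toy round of such a protocol, under assumptions of our own (synchronous rounds, push-only exchange, and a rarest-chunk selection rule that is merely one plausible policy, not necessarily the paper's):

```python
import random

def gossip_round(holdings, all_chunks):
    """Every node contacts one random peer and pushes a single chunk the peer
    is missing, preferring the chunk that is currently rarest overall."""
    nodes = list(holdings)
    count = {c: sum(c in holdings[n] for n in nodes) for c in all_chunks}
    for u in nodes:
        v = random.choice([n for n in nodes if n != u])
        missing = holdings[u] - holdings[v]
        if missing:
            holdings[v].add(min(missing, key=lambda c: count[c]))

holdings = {0: {"a"}, 1: {"b"}, 2: {"c"}, 3: set()}   # chunks start scattered
rounds = 0
while any(h != {"a", "b", "c"} for h in holdings.values()):
    gossip_round(holdings, {"a", "b", "c"})
    rounds += 1
print("all chunks everywhere after", rounds, "rounds")
```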
in field study conducted at leading fortune company we examined how having development teams reside in their own large room an arrangement called radical collocation affected system development the collocated projects had significantly higher productivity and shorter schedules than both the industry benchmarks and the performance of past similar projects within the firm the teams reported high satisfaction about their process and both customers and project sponsors were similarly highly satisfied the analysis of questionnaire interview and observational data from these teams showed that being at hand both visible and available helped them coordinate their work better and learn from each other radical collocation seems to be one of the factors leading to high productivity in these teams
in this paper we examine the issue of mining association rules among items in large database of sales transactions mining association rules means that given database of sales transactions to discover all associations among items such that the presence of some items in transaction will imply the presence of other items in the same transaction the mining of association rules can be mapped into the problem of discovering large itemsets where large itemset is group of items that appear in sufficient number of transactions the problem of discovering large itemsets can be solved by constructing candidate set of itemsets first and then identifying within this candidate set those itemsets that meet the large itemset requirement generally this is done iteratively for each large k itemset in increasing order of k where large k itemset is large itemset with k items to determine large itemsets from huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance to address this issue we develop an effective algorithm for the candidate set generation it is hash based algorithm and is especially effective for the generation of candidate set for large itemsets explicitly the number of candidate itemsets generated by the proposed algorithm is in orders of magnitude smaller than that by previous methods thus resolving the performance bottleneck note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at much earlier stage of the iterations thereby reducing the computational cost for later iterations significantly the advantage of the proposed algorithm also provides us the opportunity of reducing the amount of disk required extensive simulation study is conducted to evaluate performance of the proposed algorithm
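The core hashing idea can be sketched in a few lines (a simplification in the spirit of the description above, not the full algorithm): while counting single items, every 2-itemset of each transaction is hashed into a small table of bucket counters, and a candidate pair is kept for the second pass only if both its items are frequent and its bucket count reaches the support threshold.

```python
from itertools import combinations

def hash_filtered_pairs(transactions, min_support, n_buckets=97):
    """Prune candidate 2-itemsets using bucket counts gathered in pass one."""
    item_count, bucket = {}, [0] * n_buckets
    h = lambda pair: hash(pair) % n_buckets
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            bucket[h(pair)] += 1          # bucket count upper-bounds the pair's support
    frequent_items = {i for i, c in item_count.items() if c >= min_support}
    return [p for p in combinations(sorted(frequent_items), 2)
            if bucket[h(p)] >= min_support]

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(hash_filtered_pairs(txns, min_support=3))
```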
new principles framework is presented for retrieval evaluation of ranked outputs it applies decision theory to model relevance decision preferences and shows that the probability ranking principle prp specifies optimal ranking it has two new components namely probabilistic evaluation model and general measure of retrieval effectiveness its probabilities may be interpreted as subjective or objective ones its performance measure is the expected weighted rank which is the weighted average rank of retrieval list starting from this measure the expected forward rank and some existing retrieval effectiveness measures eg top precision and discounted cumulative gain are instantiated using suitable weighting schemes after making certain assumptions the significance of these instantiations is that the ranking prescribed by prp is shown to be optimal simultaneously for all these existing performance measures in addition the optimal expected weighted rank may be used to normalize the expected weighted rank of retrieval systems for summary performance comparison across different topics between systems the framework also extends prp and our evaluation model to handle graded relevance thereby generalizing the discussed existing measures eg top precision and probabilistic retrieval models for graded relevance
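One way to read the weighting idea, shown here only as an illustrative instantiation rather than the framework's formal definition: a rank-position weighting function applied to a ranked list's relevance values recovers familiar measures such as top-k precision and DCG.

```python
import math

def weighted_rank_score(relevances, weight):
    """Sum of weight(rank) * relevance over the ranked list (ranks start at 1)."""
    return sum(weight(i) * rel for i, rel in enumerate(relevances, start=1))

rels = [1, 0, 1, 1, 0]                      # binary relevance down the ranking
k = 3
p_at_k = weighted_rank_score(rels, lambda i: 1 / k if i <= k else 0)   # precision at k
dcg    = weighted_rank_score(rels, lambda i: 1 / math.log2(i + 1))     # discounted cumulative gain
print(round(p_at_k, 3), round(dcg, 3))      # 0.667 1.931
```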
vision based technology such as motion detection has long been limited to the domain of powerful processor intensive systems such as desktop pc’s and specialist hardware solutions with the advent of much faster mobile phone processors and memory we are now seeing plethora of feature rich software being deployed onto the mobile platform since these high powered smart phones are now equipped with cameras it has become feasible to combine their powerful processors and the camera to support new ways of interacting with the phone however it is not clear whether or not these processor intensive visual interactions can in fact be run at an acceptable speed on current mobile handsets in this paper we look at one of the most popular and widespread mobile smart phone systems the symbian and benchmark the speed accuracy and deployability of the three popular mobile languages we test pixel thresholding algorithm in python and java and rank them based on their speed within the context of intensive image based processing
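The benchmarked kernel itself is tiny; a plain-Python version of pixel thresholding (our own sketch of the kind of per-pixel loop being timed, not the paper's exact code) looks like this:

```python
def threshold(pixels, t=128):
    """Binarize a grayscale image given as a list of rows of 0-255 values."""
    return [[255 if p >= t else 0 for p in row] for row in pixels]

img = [[10, 200, 130], [255, 0, 90]]
print(threshold(img))   # [[0, 255, 255], [255, 0, 0]]
```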
this paper makes two contributions to architectural support for software debugging first it proposes novel statistics based on the fly bug detection method called pc based invariant detection the idea is based on the observation that in most programs given memory location is typically accessed by only few instructions therefore by capturing the invariant of the set of pcs that normally access given variable we can detect accesses by outlier instructions which are often caused by memory corruption buffer overflow stack smashing or other memory related bugs since this method is statistics based it can detect bugs that do not violate any programming rules and that therefore are likely to be missed by many existing tools the second contribution is novel architectural extension called the check look aside buffer clb the clb uses bloom filter to reduce monitoring overheads in the recently proposed iwatcher architectural framework for software debugging the clb significantly reduces the overhead of pc based invariant debugging we demonstrate pc based invariant detection tool called accmon that leverages architectural run time system and compiler support our experimental results with seven buggy applications and total of ten bugs show that accmon can detect all ten bugs with few false alarms for five applications and for two applications and with low overhead times several existing tools evaluated including purify ccured and value based invariant detection tools fail to detect some of the bugs in addition purify’s overhead is one order of magnitude higher than accmon’s finally we show that the clb is very effective at reducing overhead
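A minimal sketch of the pc-based invariant idea follows, with a tiny Bloom filter standing in for the check look-aside buffer; the filter parameters, the trace format, and the training/detection split are illustrative assumptions rather than the paper's design.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter standing in for the check look-aside buffer (CLB)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits)

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array[pos] = 1

    def might_contain(self, key):
        return all(self.array[pos] for pos in self._positions(key))

# training phase: record which instruction addresses (PCs) touch each variable
pc_sets = {}                       # variable address -> Bloom filter of observed PCs
training_trace = [("0x1000", "buf"), ("0x1004", "buf"), ("0x2000", "len")]
for pc, addr in training_trace:
    pc_sets.setdefault(addr, BloomFilter()).add(pc)

# detection phase: an access from a PC outside the learned set is suspicious
for pc, addr in [("0x1000", "buf"), ("0x3fff", "buf")]:
    if not pc_sets[addr].might_contain(pc):
        print(f"possible memory bug: unexpected access to {addr} from PC {pc}")
```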
texture synthesis is important for many applications in computer graphics vision and image processing however it remains difficult to design an algorithm that is both efficient and capable of generating high quality results in this paper we present an efficient algorithm for realistic texture synthesis the algorithm is easy to use and requires only sample texture as input it generates textures with perceived quality equal to or better than those produced by previous techniques but runs two orders of magnitude faster this permits us to apply texture synthesis to problems where it has traditionally been considered impractical in particular we have applied it to constrained synthesis for image editing and temporal texture generation our algorithm is derived from markov random field texture models and generates textures through deterministic searching process we accelerate this synthesis process using tree structured vector quantization
icooolps was the first edition of ecoop icooolps workshop it intended to bring researchers and practitioners both from academia and industry together with spirit of openness to try and identify and begin to address the numerous and very varied issues of optimization this succeeded as can be seen from the papers the attendance and the liveliness of the discussions that took place during and after the workshop not to mention few new cooperations or postdoctoral contracts the talented people from different groups who participated were unanimous to appreciate this first edition and recommend that icooolps be continued next year community is thus beginning to form and should be reinforced by second edition next year with all the improvements this first edition made emerge
the distance or similarity metric plays an important role in many natural language processing nlp tasks previous studies have demonstrated the effectiveness of number of metrics such as the jaccard coefficient especially in synonym acquisition while the existing metrics perform quite well to further improve performance we propose the use of supervised machine learning algorithm that fine tunes them given the known instances of similar or dissimilar words we estimated the parameters of the mahalanobis distance we compared number of metrics in our experiments and the results show that the proposed metric has higher mean average precision than other metrics
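For concreteness, a small sketch of scoring word pairs with a Mahalanobis distance; in the described approach the matrix would be fine-tuned from labeled similar/dissimilar pairs, whereas here it is simply the inverse sample covariance, used as a stand-in for the learned parameters.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance (x - y)^T M (x - y) for a learned positive (semi)definite M."""
    d = x - y
    return float(d @ M @ d)

# toy context-feature vectors for words; in a supervised setting M would be
# estimated from known similar/dissimilar pairs rather than set like this
rng = np.random.default_rng(0)
vectors = rng.random((6, 4))
M = np.linalg.inv(np.cov(vectors, rowvar=False) + 1e-6 * np.eye(4))

print(mahalanobis(vectors[0], vectors[1], M))
print(mahalanobis(vectors[0], vectors[0], M))   # identical vectors -> 0.0
```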
schema mapping is high level declarative specification of the relationship between two schemas it specifies how data structured under one schema called the source schema is to be converted into data structured under possibly different schema called the target schema schema mappings are fundamental components for both data exchange and data integration to date language for specifying or programming schema mappings exists however developmental support for programming schema mappings is still lacking in particular tool for debugging schema mappings has not yet been developed in this paper we propose to build debugger for understanding and exploring schema mappings we present primary feature of our debugger called routes that describes the relationship between source and target data with the schema mapping we present two algorithms for computing all routes or one route for selected target data both algorithms execute in polynomial time in the size of the input in computing all routes our algorithm produces concise representation that factors common steps in the routes furthermore every minimal route for the selected data can essentially be found in this representation our second algorithm is able to produce one route fast if there is one and alternative routes as needed we demonstrate the feasibility of our route algorithms through set of experimental results on both synthetic and real datasets
we generalize the method of constructing windows in subsequence matching by this generalization we can explain earlier subsequence matching methods as special cases of common framework based on the generalization we propose new subsequence matching method general match the earlier work by faloutsos et al called frm for convenience causes lot of false alarms due to lack of point filtering effect dual match recently proposed as dual approach of frm improves performance significantly over frm by exploiting point filtering effect however it has the problem of having smaller allowable window size half that of frm given the minimum query length smaller window increases false alarms due to window size effect general match offers advantages of both methods it can reduce window size effect by using large windows like frm and at the same time can exploit point filtering effect like dual match general match divides data sequences into generalized sliding windows sliding windows and the query sequence into generalized disjoint windows disjoint windows we formally prove that general match is correct ie it incurs no false dismissal we then propose method of estimating the optimal value of the sliding factor that minimizes the number of page accesses experimental results for real stock data show that for low selectivities general match improves average performance by over dual match and by over frm for high selectivities by over dual match and by over frm the proposed generalization provides an excellent theoretical basis for understanding the underlying mechanisms of subsequence matching
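A minimal sketch of the two window divisions mentioned above: overlapping (sliding) windows for data sequences and non-overlapping (disjoint) windows for the query sequence; the window size and sliding factor are arbitrary example values.

```python
def sliding_windows(seq, window, step):
    """Generalized sliding windows over a data sequence (step = sliding factor)."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]

def disjoint_windows(seq, window):
    """Generalized disjoint windows over a query sequence."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]

data = list(range(12))
print(sliding_windows(data, window=4, step=2))   # overlapping data windows
print(disjoint_windows(data, window=4))          # non-overlapping query windows
```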
database reverse engineering dbre methods recover conceptual data models from physical databases the bottom up nature of these methods imposes two major limitations first they do not provide an initial high level abstract schema suitable for use as basis for reasoning about the application domain single detailed schema is only produced at the very end of the project second they provide no support for divide and conquer approach the entire database schema must be analysed and processed as unit this paper presents simple solution to overcome both limitations in our proposal relations are grouped based on their primary keys each group can be perceived in two ways as relational schema that can be reversed engineered as standalone dbre project and as an element either an entity or relationship of high level abstract schema that provides initial insight about the application domain we also present examples from actual large database systems
gene expression data are numerical and describe the level of expression of genes in different situations thus featuring behaviour of the genes two methods based on fca formal concept analysis are considered for clustering gene expression data the first one is based on interordinal scaling and can be realized using standard fca algorithms the second method is based on pattern structures and needs adaptations of standard algorithms to computing with interval algebra the two methods are described in details and discussed the second method is shown to be more computationally efficient and providing more readable results experiments with gene expression data are discussed
due to an increasing volume of xml data it is considered prudent to store xml data on an industry strength database system instead of relying on domain specific application or file system for shredded xml data stored in the relational tables however it may not be straightforward to apply existing algorithms for twig query processing because most of the algorithms require xml data to be accessed in form of streams of elements grouped by their tags and sorted in particular order in order to support xml query processing within the common framework of relational database systems we first propose several bitmap indexes for supporting holistic twig joins on xml data stored in the relational tables since bitmap indexes are well supported in most of the commercial and open source database systems the proposed bitmap indexes and twig query processing algorithms can be incorporated into the relational query processing framework with more ease the proposed query processing algorithms are efficient in terms of both time and space since the compressed bitmap indexes stay compressed during query processing in addition we propose hybrid index which computes twig query solutions with only bit vectors without accessing labeled xml elements stored in the relational tables
incompleteness due to missing attribute values aka null values is very common in autonomous web databases on which user accesses are usually supported through mediators traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes even if they wind up being relevant to user query ideally we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query the autonomous nature of web databases poses several challenges in realizing this objective such challenges include the restricted access privileges imposed on the data the limited support for query patterns and the bounded pool of database and network resources in the web environment we introduce novel query rewriting and optimization framework qpiad that tackles these challenges our technique involves reformulating the user query based on mined correlations among the database attributes the reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers qpiad is able to gauge the relevance of such queries allowing tradeoffs in reducing the costs of database query processing and answer transmission to support this framework we develop methods for mining attribute correlations in terms of approximate functional dependencies value distributions in the form of naïve bayes classifiers and selectivity estimates we present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possible answers with high precision high recall and manageable cost
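A toy sketch of the mined-correlation idea: estimate the distribution of a missing attribute conditioned on a correlated attribute from the complete tuples, then use it to rank incomplete tuples as possible answers; the relation, attribute names, and data are invented for illustration and only gesture at the described framework.

```python
from collections import Counter, defaultdict

# toy car tuples; None marks a missing attribute value
tuples = [
    {"model": "civic", "body": "sedan"},
    {"model": "civic", "body": "sedan"},
    {"model": "civic", "body": "coupe"},
    {"model": "odyssey", "body": "van"},
    {"model": "civic", "body": None},      # incomplete tuple
]

# mine a simple value distribution P(body | model) from complete tuples,
# standing in for the correlation statistics learned offline
cond = defaultdict(Counter)
for t in tuples:
    if t["body"] is not None:
        cond[t["model"]][t["body"]] += 1

def relevance(model, body):
    """Estimated probability that an incomplete tuple with this model
    actually has the queried body type."""
    counts = cond[model]
    return counts[body] / sum(counts.values()) if counts else 0.0

# query: body = 'sedan'; incomplete tuples are returned as possible answers,
# ranked by their estimated likelihood of being relevant
for t in tuples:
    if t["body"] is None:
        print(t, "possible answer, relevance", round(relevance(t["model"], "sedan"), 2))
```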
it is envisaged that the application of the multilevel security mls scheme will enhance flexibility and effectiveness of authorization policies in shared enterprise databases and will replace cumbersome authorization enforcement practices through complicated view definitions on per user basis however the critical problem with the current model is that the belief at higher security levels is cluttered with irrelevant or inconsistent data as no mechanism for attenuation is supported critics also argue that it is imperative for mls database users to theorize about the belief of others perhaps at different security levels an apparatus that is currently missing and the absence of which is seriously felt the impetus for our current research is the need to provide an adequate framework for belief reasoning in mls databases in this paper we show that these concepts can be captured in logic style declarative query language called multilog for mls deductive databases for which proof theoretic model theoretic and fixpoint semantics exist this development is significant from database perspective as it now enables us to compute the semantics of multilog databases in bottom up fashion we also define bottom up procedure to compute unique models of stratified multilog databases finally we establish the equivalence of multilog’s three logical characterizations model theory fixpoint theory and proof theory
microprocessor trends are moving towards wider architectures and more aggressive speculation with the increasing transistor budgets energy consumption has become critical design constraint to address this problem several researchers have proposed and evaluated energy efficient variants of speculation mechanisms however such hardware is typically evaluated in isolation and its impact on the energy consumption of the rest of the processor for example due to wrong path executions is ignored moreover the available metrics that would provide thorough evaluation of an architectural optimization employ somewhat complicated formulas with hard to measure parameters in this paper we introduce simple method to accurately compare the energy efficiency of speculative architectures our metric is based on runtime analysis of the entire processor chip and thus captures the energy consumption due to the positive as well as the negative activities that arise from the speculation activities we demonstrate the usefulness of our metric on the example of value speculation where we found some proposed value predictors including low power designs not to be energy efficient
middle domain mobility management provides an efficient routing low registration cost and handoff latency for layer ip layer based mobile network environment in the middle domain the base station bs acts as an agent or proxy to manage mobile networks to achieve this goal the bs could only address external traffic but without internal case management in order to complement this defect an enhanced version for the middle domain mobility management is designed in this paper moreover we research and design the multicast extension for the middle domain by applying the idea of the enhancement which is called hmp hierarchical multicast protocol associated handoff scheme is also proposed in this paper since it is complicated case for designing the multicast service in network environment we need characteristic method to address this case in order to fulfill this achievement of designing hmp scheme we introduce reduction process rp in this paper by using the rp complicated based network environment can be actually reduced to simpler network environment the mathematical analysis and simulation study are presented for performance evaluation simulation results have demonstrated that the enhanced middle domain mobility management has the better network performance in terms of registration cost handoff latency and routing cost in comparing with conventional mobility management schemes moreover the proposed multicast extension for hmp scheme is simple and has scalability and network performance advantages over other approaches in mobile multicasting
there are many design challenges in the hardware software co design approach for performance improvement of data intensive streaming applications with general purpose microprocessor and hardware accelerator these design challenges are mainly to prevent hardware area fragmentation to increase resource utilization to reduce hardware reconfiguration cost and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications in this paper modular and block based hardware configuration architecture named memory aware run time reconfigurable embedded system martres is proposed for efficient resource management and performance improvement of streaming applications subsequently we design task placement algorithm named hierarchical best fit ascending hbfa algorithm to prove that martres configuration architecture is very efficient in increased resource utilization and flexible in task mapping and power savings the time complexity of hbfa algorithm is reduced to compared to traditional best fit bf algorithm’s time complexity of when the quality of the placement solution by hbfa is better than that of bf algorithm finally we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement aware partitioning and scheduling algorithm bpasa in bpasa we exploit the temporal parallelism in streaming applications to reduce reconfiguration cost of the hardware while keeping in mind the required throughput of the output data we balance the exploitation of spatial parallelism and temporal parallelism in streaming applications by considering the reconfiguration cost vs the data transfer cost the scheduler refers to the hbfa placement algorithm to check whether contiguous area on fpga is available before scheduling the task for hw or for sw
search on pcs has become less efficient than searching the web due to the increasing amount of stored data in this paper we present an innovative desktop search solution which relies on extracted metadata context information as well as additional background information for improving desktop search results we also present practical application of this approach the extensible beagle toolbox to prove the validity of our approach we conducted series of experiments by comparing our results against the ones of regular desktop search solution beagle we show an improved quality in search and overall performance
the precision of many type based analyses can be significantly increased given additional information about the programs execution for this reason it is not uncommon for such analyses to integrate supporting analyses computing for instance nil pointer or alias information such integration is problematic for number of reasons it obscures the original intention of the type system especially if multiple additional analyses are added it makes use of already available analyses difficult since they have to be rephrased as type systems and it is non modular changing the supporting analyses implies changing the entire type system using ideas from abstract interpretation we present method for parameterizing type systems over the results of abstract analyses in such way that one modular correctness proof can be obtained this is achieved by defining general format for information transferal and use of the information provided by the abstract analyses the key gain from this method is clear separation between the correctness of the analyses and the type system both in the implementation and correctness proof which leads to comparatively easy way of changing the parameterized analysis and making use of precise and hence complicated analyses in addition we exemplify the use of the framework by presenting parameterized type system that uses additional information to improve the precision of exception types in small imperative language with arrays
the recent trend in web application development is moving towards exposing the service functionality through application programming interface api many web based services start to offer apis to support application to application integration with their consumers on top of the web the trend has raised the demand for average web developers to know how to design apis for web based services in this paper we summarise list of inherent issues in the web that developers should pay attention to describe how web architecture may help to resolve these issues and suggest design considerations for web api design in addition we demonstrate an experimental design process through case study to design web api for social bookmarking service
co located collaborators can see the artifacts that others are working on which in turn enables casual interactions to help distributed collaborators maintain mutual awareness of people’s electronic work artifacts we designed and implemented an awareness tool that leverages screen sharing methods people see portions of others screens in miniature can selectively raise larger views of screen to get more detail and can engage in remote pointing people balance awareness with privacy by using several privacy protection strategies built into the system preliminary evaluation with two groups using this system shows that people use it to maintain awareness of what others are doing project certain image of themselves monitor progress coordinate joint tasks determine others availability and engage in serendipitous conversation and collaboration while privacy was not large concern for these groups theoretical analysis suggests that privacy risks may differ for other user communities
combining fine grained opinion information to produce opinion summaries is important for sentiment analysis applications toward that end we tackle the problem of source coreference resolution linking together source mentions that refer to the same entity the partially supervised nature of the problem leads us to define and approach it as the novel problem of partially supervised clustering we propose and evaluate new algorithm for the task of source coreference resolution that outperforms competitive baselines
this paper introduces method for converting an image or volume sampled on regular grid into space efficient irregular point hierarchy the conversion process retains the original frequency characteristics of the dataset by matching the spatial distribution of sample points with the required frequency to achieve good blending the spherical points commonly used in volume rendering are generalized to ellipsoidal point primitives family of multiresolution oriented gabor wavelets provide the frequency space analysis of the dataset the outcome of this frequency analysis is the reduced set of points in which the sampling rate is decreased in originally oversampled areas during rendering the traversal of the hierarchy can be controlled by any suitable error metric or quality criteria the local level of refinement is also sensitive to the transfer function areas with density ranges mapped to high transfer function variability are rendered at higher point resolution than others our decomposition is flexible and can be used for iso surface rendering alpha compositing and ray rendering of volumes we demonstrate our hierarchy with an interactive splatting volume renderer in which the traversal of the point hierarchy for rendering is modulated by user specified frame rate
in this paper we describe design orientated field study in which we deploy novel digital display device to explore the potential integration of teenage and family photo displays at home as well as the value of situated photo display technologies for intergenerational expression this exploration is deemed timely given the contemporary take up of digital capture devices by teenagers and the unprecedented volume of photographic content that teens generate findings support integration and the display of photos on standalone device as well as demonstrating the interventional efficacy of the design as resource for provoking reflection on the research subject we also draw upon the theoretical concept of dialogism to understand how our design mediates intergenerational relationships and interaction aesthetics relating to the notion of constructive conflict
key dependent message kdm security was introduced by black rogaway and shrimpton to address the case where key cycles occur among encryptions eg key is encrypted with itself it was mainly motivated by key cycles in dolev yao models ie symbolic abstractions of cryptography by term algebras and corresponding computational soundness result was later shown by adão et al however both the kdm definition and this soundness result do not allow the general active attacks typical for dolev yao models or for security protocols in general we extend these definitions to obtain soundness result under active attacks we first present definition akdm adaptive kdm as kdm equivalent of authenticated symmetric encryption ie it provides chosen ciphertext security and integrity of ciphertexts for key cycles however this is not yet sufficient for the desired computational soundness result and thus we define dkdm dynamic kdm that additionally allows limited dynamic revelation of keys we show that dkdm is sufficient for computational soundness even in the strong sense of blackbox reactive simulatability brsim uc and in cases with joint terms with other operators we also build on current kdm secure schemes to construct schemes secure under the new definitions moreover we prove implications or construct separating examples respectively for new definitions and existing ones for symmetric encryption
spatial joins find all pairs of objects that satisfy given spatial relationship in spatial joins using indexes original space indexes such as the tree are widely used an original space index is the one that indexes objects as represented in the original space since original space indexes deal with extents of objects it is relatively complex to optimize join algorithms using these indexes on the other hand transform space indexes which transform objects in the original space into points in the transform space and index them deal only with points but no extents thus optimization of join algorithms using these indexes can be relatively simple however the disadvantage of these join algorithms is that they cannot be applied to original space indexes such as the tree in this paper we present novel mechanism for achieving the best of these two types of algorithms specifically we propose the new notion of the transform space view and present the transform space view join algorithm the transform space view is virtual transform space index based on an original space index it allows us to interpret or view an existing original space index as transform space index with no space and negligible time overhead and without actually modifying the structure of the original space index or changing object representation the transform space view join algorithm joins two original space indexes in the transform space through the notion of the transform space view through analysis and experiments we verify the excellence of the transform space view join algorithm the transform space view join algorithm always outperforms existing ones for all the data sets tested in terms of all three measures used the one pass buffer size the minimum buffer size required for guaranteeing one disk access per page the number of disk accesses for given buffer size and the wall clock time thus it constitutes lower bound algorithm we believe that the proposed transform space view can be applied to developing various new spatial query processing algorithms in the transform space
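The transform commonly used in such schemes is the corner transformation, sketched below under that assumption: a d-dimensional rectangle becomes a 2d-dimensional point, and an overlap test in the original space becomes a simple coordinate comparison on those points. This is a generic illustration of transform-space indexing, not the paper's transform space view code.

```python
def to_transform_space(mbr):
    """Map a d-dimensional rectangle [(lo1, hi1), ..., (lod, hid)]
    to a single 2d-dimensional point (lo1, ..., lod, hi1, ..., hid)."""
    los = [lo for lo, _ in mbr]
    his = [hi for _, hi in mbr]
    return tuple(los + his)

def intersects_in_transform_space(point_a, point_b, d):
    """Two original-space rectangles overlap iff, in every dimension,
    each one's low end is <= the other's high end."""
    return all(
        point_a[i] <= point_b[d + i] and point_b[i] <= point_a[d + i]
        for i in range(d)
    )

a = to_transform_space([(0, 2), (0, 2)])
b = to_transform_space([(1, 3), (1, 3)])
c = to_transform_space([(5, 6), (5, 6)])
print(intersects_in_transform_space(a, b, 2))   # True: rectangles overlap
print(intersects_in_transform_space(a, c, 2))   # False: disjoint
```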
form comparison is fundamental part of many anthropometric biological anthropological archaeological and botanical researches etc in traditional anthropometric form comparison methods geometry characteristics and internal structure of surface points are not adequately considered form comparison of anthropometric data can make up the deficiency of traditional methods in this paper methods for analyzing other than objects are highlighted we summarize the advance of form comparison techniques in the last decades according to whether they are based upon anatomical landmarks we partition them into two main categories landmark based methods and landmark free methods the former methods are further sub divided into deformation methods superimposition methods and methods based on linear distances while the latter methods are sub divided into shape statistics based methods methods based on function analysis view based methods topology based methods and hybrid methods examples for each method are presented the discussion about their advantages and disadvantages are also introduced
simple implementation of an sml like module system is presented as module parameterized by base language and its type checker this implementation is useful both as detailed tutorial on the harper lillibridge leroy module system and its implementation and as constructive demonstration of the applicability of that module system to wide range of programming languages
as the issue width of superscalar processors is increased instruction fetch bandwidth requirements will also increase it will become necessary to fetch multiple basic blocks per cycle conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations we propose supplementing the conventional instruction cache with trace cache this structure caches traces of the dynamic instruction stream so instructions that are otherwise noncontiguous appear contiguous for the instruction benchmark suite ibs and spec integer benchmarks kilobyte trace cache improves performance on average by over conventional sequential fetching further it is shown that the trace cache’s efficient low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache
interactive tabletop and wall surfaces support collaboration and interactivity in novel ways apart from keyboards and mice such systems can also incorporate other input devices namely laser pointers marker pens with screen location sensors or touch sensitive surfaces similarly instead of vertically positioned desktop monitor collaborative setups typically use much larger displays which are oriented either vertically wall or horizontally tabletop or combine both kinds of surfaces in this paper we describe an empirical study that investigates how technical system constraints can affect group performance in high pace collaborative tasks for this we compare various input and output modalities in system that consists of several interactive tabletop and wall surface we observed that the performance of group of people scaled almost linearly with the number of participants on an almost perfectly parallel task we also found that mice were significantly faster than laser pointers but only by also interaction on walls was significantly faster than on the tabletop by
the widespread deployment of recommender systems has led to user feedback of varying quality while some users faithfully express their true opinion many provide noisy ratings which can be detrimental to the quality of the generated recommendations the presence of noise can violate modeling assumptions and may thus lead to instabilities in estimation and prediction even worse malicious users can deliberately insert attack profiles in an attempt to bias the recommender system to their benefit while previous research has attempted to study the robustness of various existing collaborative filtering cf approaches this remains an unsolved problem approaches such as neighbor selection algorithms association rules and robust matrix factorization have produced unsatisfactory results this work describes new collaborative algorithm based on svd which is accurate as well as highly stable to shilling this algorithm exploits previously established svd based shilling detection algorithms and combines it with svd based cf experimental results show much diminished effect of all kinds of shilling attacks this work also offers significant improvement over previous robust collaborative filtering frameworks
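A minimal sketch of the svd-based cf building block: predict unknown ratings from a low-rank reconstruction of a mean-filled rating matrix. The rating data, rank, and fill strategy are illustrative assumptions, and the combined shilling-detection step described above is not reproduced here.

```python
import numpy as np

# toy user-item rating matrix; 0 marks an unknown rating
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

mask = R > 0
filled = np.where(mask, R, R.sum() / mask.sum())   # fill unknowns with the global mean

# rank-k truncated SVD gives a smoothed, low-rank reconstruction of the ratings
k = 2
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("predicted rating of user 0 for item 2:", round(approx[0, 2], 2))
```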
distributed multimedia applications require support from the underlying operating system to achieve and maintain their desired quality of service qos this has led to the creation of novel task and message schedulers and to the development of qos mechanisms that allow applications to explicitly interact with relevant operating system services however the task scheduling techniques developed to date are not well equipped to take advantage of such interactions as result important events such as position update messages in virtual environments may be ignored if cpu scheduler ignores these events players will experience lack of responsiveness or even inconsistencies in the virtual world this paper argues that real time and multimedia applications can benefit from coordinated event delivery mechanism termed ecalls that supports such coordination we then show ecalls’s ability to reduce variations in inter frame times for media streams
we have conducted an extensive experimental study on algorithms for fully dynamic transitive closure we have implemented the recent fully dynamic algorithms by king roditty roditty and zwick and demetrescu and italiano along with several variants and compared them to pseudo fully dynamic and simple minded algorithms developed in previous study frigioni et al we tested and compared these implementations on random inputs synthetic worst case inputs and on inputs motivated by real world graphs our experiments reveal that some of the dynamic algorithms can really be of practical value in many situations
this paper presents probabilistic mixture modeling framework for the hierarchic organisation of document collections it is demonstrated that the probabilistic corpus model which emerges from the automatic or unsupervised hierarchical organisation of document collection can be further exploited to create kernel which boosts the performance of state of the art support vector machine document classifiers it is shown that the performance of such classifier is further enhanced when employing the kernel derived from an appropriate hierarchic mixture model used for partitioning document corpus rather than the kernel associated with flat non hierarchic mixture model this has important implications for document classification when hierarchic ordering of topics exists this can be considered as the effective combination of documents with no topic or class labels unlabeled data labeled documents and prior domain knowledge in the form of the known hierarchic structure in providing enhanced document classification performance
we describe the design implementation and performance of new parallel sparse cholesky factorization code the code uses multifrontal factorization strategy operations on small dense submatrices are performed using new dense matrix subroutines that are part of the code although the code can also use the blas and lapack the new code is recursive at both the sparse and the dense levels it uses novel recursive data layout for dense submatrices and it is parallelized using cilk an extension of c specifically designed to parallelize recursive codes we demonstrate that the new code performs well and scales well on smps in particular on up to processors the code outperforms two state of the art message passing codes the scalability and high performance that the code achieves imply that recursive schedules blocked data layouts and dynamic scheduling are effective in the implementation of sparse factorization codes
several researchers have analysed the performance of k ary n cubes taking into account channel bandwidth constraints imposed by implementation technology namely the constant wiring density and pin out constraints for vlsi and multiple chip technology respectively for instance dally ieee trans comput abraham issues in the architecture of direct interconnection networks schemes for multiprocessors phd thesis university of illinois at urbana champaign and agrawal ieee trans parallel distributed syst have shown that low dimensional k ary n cubes known as tori outperform their high dimensional counterparts known as hypercubes under the constant wiring density constraint however abraham and agrawal have arrived at an opposite conclusion when they considered the constant pin out constraint most of these analyses have assumed deterministic routing where message always uses the same network path between given pair of nodes more recent multicomputers have incorporated adaptive routing to improve performance this paper re examines the relative performance merits of the torus and hypercube in the context of adaptive routing our analysis reveals that the torus manages to exploit its wider channels under light traffic as traffic increases however the hypercube can provide better performance than the torus our conclusion under the constant wiring density constraint is different from that of the works mentioned above because adaptive routing enables the hypercube to exploit its richer connectivity to reduce message blocking
with the increasing use of multi core microprocessors and hardware accelerators in embedded media processing systems there is an increasing need to discover coarse grained parallelism in media applications written in and common versions of these codes use pointer heavy sequential programming model to implement algorithms with high levels of inherent parallelism the lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application specific hardware designers as well as inhibited the development of automatic parallelizing compilers automatic discovery is challenging due to shifts in the prevalent programming languages scalability problems of analysis techniques and the lack of experimental research in combining the numerous analyses necessary to achieve clear view of the relations among memory accesses in complex programs this paper is based on coherent prototype system designed to automatically find multiple levels of coarse grained parallelism it visits several of the key analyses that are necessary to discover parallelism in contemporary media applications distinguishing those that perform satisfactorily at this time from those that do not yet have practical scalable solutions we show that contrary to common belief compiler with strong synergistic portfolio of modern analysis capabilities can automatically discover very substantial amount of coarse grained parallelism in complex media applications such as an mpeg encoder these results suggest that an automatic coarse grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems
wireless sensor networks offer the potential to span and monitor large geographical areas inexpensively sensors however have significant power constraint battery life making communication very expensive another important issue in the context of sensor based information systems is that individual sensor readings are inherently unreliable in order to address these two aspects sensor database systems like tinydb and cougar enable in network data aggregation to reduce the communication cost and improve reliability the existing data aggregation techniques however are limited to relatively simple types of queries such as sum count avg and min max in this paper we propose data aggregation scheme that significantly extends the class of queries that can be answered using sensor networks these queries include approximate quantiles such as the median the most frequent data values such as the consensus value histogram of the data distribution as well as range queries in our scheme each sensor aggregates the data it has received from other sensors into fixed user specified size message we provide strict theoretical guarantees on the approximation quality of the queries in terms of the message size we evaluate the performance of our aggregation scheme by simulation and demonstrate its accuracy scalability and low resource utilization for highly variable input data sets
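A rough illustration of answering an approximate median query with fixed-size messages: each sensor forwards a bounded random-sample summary, and summaries are merged up an aggregation tree. The message size, reading distribution, and tree shape are assumptions, and the paper's actual summary structure may differ.

```python
import random
import statistics

MESSAGE_SIZE = 16   # fixed number of values a sensor may forward

def merge(summary_a, summary_b):
    """Combine two child summaries into one fixed-size summary."""
    combined = summary_a + summary_b
    if len(combined) <= MESSAGE_SIZE:
        return combined
    return random.sample(combined, MESSAGE_SIZE)   # downsample to fit the message

random.seed(1)
readings = [[random.gauss(20, 5) for _ in range(100)] for _ in range(8)]  # 8 sensors

# leaves summarize their own readings, then summaries are merged pairwise up a tree
summaries = [merge(r, []) for r in readings]
while len(summaries) > 1:
    summaries = [merge(summaries[i], summaries[i + 1]) for i in range(0, len(summaries), 2)]

approx = statistics.median(summaries[0])
exact = statistics.median(v for r in readings for v in r)
print(f"approximate median {approx:.2f} vs exact {exact:.2f}")
```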
finding correlated sequential patterns in large sequence databases is one of the essential tasks in data mining since huge number of sequential patterns are usually mined but it is hard to find sequential patterns with the correlation according to the requirement of real applications the needed data analysis should be different in previous mining approaches after mining the sequential patterns sequential patterns with the weak affinity are found even with high minimum support in this paper new framework is suggested for mining weighted support affinity patterns in which an objective measure sequential ws confidence is developed to detect correlated sequential patterns with weighted support affinity patterns to efficiently prune the weak affinity patterns it is proved that ws confidence measure satisfies the anti monotone and cross weighted support properties which can be applied to eliminate sequential patterns with dissimilar weighted support levels based on the framework weighted support affinity pattern mining algorithm wsminer is suggested the performance study shows that wsminer is efficient and scalable for mining weighted support affinity patterns
the implementation of rfid leads to improved visibility in supply chains however as consequence of the increased data collection and enhanced data granularity supply chain participants have to deal with new data management challenges in this paper we give an overview of the current challenges and solution proposals in the area of data collection and transformation data organisation and data security we also identify further research requirements
recent studies of internet traffic have shown that flow size distributions often exhibit high variability property in the sense that most of the flows are short and more than half of the total load is constituted by small percentage of the largest flows in the light of this observation it is interesting to revisit scheduling policies that are known to favor small jobs in order to quantify the benefit for small and the penalty for large jobs among all scheduling policies that do not require knowledge of job size the least attained service las scheduling policy is known to favor small jobs the most we investigate the las queue for both load and our analysis shows that for job size distributions with high variability property las favors short jobs with negligible penalty to the few largest jobs and that las achieves mean response time over all jobs that is close to the mean response time achieved by srpt finally we implement las in the ns network simulator to study its performance benefits for tcp flows when las is used to schedule packets over the bottleneck link more than of the shortest flows experience smaller mean response times under las than under fifo and only the largest jobs observe negligible increase in response time the benefit of using las as compared to fifo is most pronounced at high load
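A minimal sketch of the las policy itself: always serve the job with the least attained service. Arrivals are not modeled and the job sizes and service quantum are arbitrary, so this only illustrates why short jobs complete early under las.

```python
import heapq

def las_completion_order(job_sizes, quantum=1.0):
    """Serve, at every step, the job with the least attained service (LAS).

    Returns jobs in completion order together with their completion times.
    Ties are broken by job id; arrivals are not modeled in this sketch."""
    heap = [(0.0, jid, size) for jid, size in enumerate(job_sizes)]  # (attained, id, size)
    heapq.heapify(heap)
    clock, finished = 0.0, []
    while heap:
        attained, jid, size = heapq.heappop(heap)
        slice_ = min(quantum, size - attained)
        clock += slice_
        attained += slice_
        if attained >= size:
            finished.append((jid, clock))
        else:
            heapq.heappush(heap, (attained, jid, size))
    return finished

# the short jobs finish well before the single large job
print(las_completion_order([1, 2, 10, 1]))
```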
virtual immersive environments or telepresence setups often consist of multiple cameras that have to be calibrated we present convenient method for doing this the minimum is three cameras but there is no upper limit the method is fully automatic and freely moving bright spot is the only calibration object set of virtual points is made by waving the bright spot through the working volume its projections are found with subpixel precision and verified by robust ransac analysis the cameras do not have to see all points only reasonable overlap between camera subgroups is necessary projective structures are computed via rank factorization and the euclidean stratification is done by imposing geometric constraints this linear estimate initializes postprocessing computation of nonlinear distortion which is also fully automatic we suggest trick on how to use very ordinary laser pointer as the calibration object we show that it is possible to calibrate an immersive virtual environment with cameras in less than minutes reaching about pixel reprojection error the method has been successfully tested on numerous multicamera environments using varying numbers of cameras of varying quality
this letter analyzes the fisher kernel from statistical point of view the fisher kernel is particularly interesting method for constructing model of the posterior probability that makes intelligent use of unlabeled data ie of the underlying data density it is important to analyze and ultimately understand the statistical properties of the fisher kernel to this end we first establish sufficient conditions that the constructed posterior model is realizable ie it contains the true distribution realizability immediately leads to consistency results subsequently we focus on an asymptotic analysis of the generalization error which elucidates the learning curves of the fisher kernel and how unlabeled data contribute to learning we also point out that the squared or log loss is theoretically preferable to other losses such as the exponential loss because both yield consistent estimators when linear classifier is used together with the fisher kernel therefore this letter underlines that the fisher kernel should be viewed not as heuristics but as powerful statistical tool with well controlled statistical properties
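For reference, the standard fisher kernel construction that such an analysis concerns (as commonly defined, with notation chosen here and not quoted from the letter) is:

```latex
% Fisher score of an input x under a generative model p(x \mid \theta),
% and the Fisher kernel built from it (standard definition):
U_x = \nabla_\theta \log p(x \mid \theta), \qquad
K(x, y) = U_x^{\top} \, I(\theta)^{-1} \, U_y,
\quad \text{where } I(\theta) = \mathbb{E}_x\!\left[ U_x U_x^{\top} \right].
```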
this paper presents an approach to full body human pose recognition inputs to the proposed approach are pairs of silhouette images obtained from wide baseline binocular cameras through multilinear analysis low dimensional view invariant pose coefficient vectors can be extracted from these stereo silhouette pairs taking these pose coefficient vectors as features the universum method is trained and used for pose recognition experiment results obtained using real image data showed the efficacy of the proposed approach
information integration for distributed and heterogeneous data sources is still an open challenging and schema matching is critical in this process this paper presents an approach to automatic elements matching between xml application schemas using similarity measure and relaxation labeling the semantic modeling of xml application schema has also been presented the similarity measure method considers element categories and their properties in an effort to achieve an optimal matching contextual constraints are used in the relaxation labeling method based on the semantic modeling of xml application schemas the compatible constraint coefficients are devised in terms of the structures and semantic relationships as defined in the semantic model to examine the effectiveness of the proposed methods an algorithm for xml schema matching has been developed and corresponding computational experiments show that the proposed approach has high degree of accuracy
quality of service qos support in local and cluster area environments has become an issue of great interest in recent years most current high performance interconnection solutions for these environments have been designed to enhance conventional best effort traffic performance but are not well suited to the special requirements of the new multimedia applications the multimedia router mmr aims at offering hardware based qos support within compact interconnection component one of the key elements in the mmr architecture are the algorithms used in traffic scheduling these algorithms are responsible for the order in which information is forwarded through the internal switch thus they are closely related to the qos provisioning mechanisms in this paper several traffic scheduling algorithms developed for the mmr architecture are described their general organization is motivated by chances for parallelization and pipelining while providing the necessary support both to multimedia flows and to best effort traffic performance evaluation results show that the qos requirements of different connections are met in spite of the presence of best effort traffic while achieving high link utilizations
aggregating statistical representations of classes is an important task for current trends in scaling up learning and recognition or for addressing them in distributed infrastructures in this perspective we address the problem of merging probabilistic gaussian mixture models in an efficient way through the search for suitable combination of components from mixtures to be merged we propose new bayesian modelling of this combination problem in association to variational estimation technique that handles efficiently the model complexity issue main feature of the present scheme is that it merely resorts to the parameters of the original mixture ensuring low computational cost and possibly communication should we operate on distributed system experimental results are reported on real data
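The elementary operation of combining two gaussian components can be illustrated with a moment-matching merge, sketched below; this is a standard formula and stands in for, rather than reproduces, the bayesian/variational selection of which components to combine.

```python
import numpy as np

def merge_components(w1, mu1, cov1, w2, mu2, cov2):
    """Moment-matching merge of two Gaussian mixture components.

    The merged component preserves the total weight, the mean, and the
    covariance (including the spread between the two original means)."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    cov = (
        w1 * (cov1 + np.outer(mu1 - mu, mu1 - mu))
        + w2 * (cov2 + np.outer(mu2 - mu, mu2 - mu))
    ) / w
    return w, mu, cov

w, mu, cov = merge_components(
    0.4, np.array([0.0, 0.0]), np.eye(2),
    0.6, np.array([2.0, 0.0]), np.eye(2),
)
print(w, mu, cov)
```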
conceptually the common approach to manipulating probabilistic data is to evaluate relational queries and then calculate the probability of each tuple in the result this approach ignores the possibility that the probabilities of complete answers are too low and hence partial answers with sufficiently high probabilities become important therefore we consider the semantics in which answers are maximal ie have the smallest degree of incompleteness subject to the constraint that the probability is still above given threshold we investigate the complexity of joining relations under the above semantics in contrast to the deterministic case this approach gives rise to two different enumeration problems the first is finding all maximal sets of tuples that are join consistent connected and have joint probability above the threshold the second is computing all maximal tuples that are answers of partial joins and have probability above the threshold both problems are tractable under data complexity we also consider query and data complexity which rules out as efficient the following naive algorithm compute all partial answers and then choose the maximal ones among those with probabilities above the threshold we give efficient algorithms for several important special cases we also show that in general the first problem is np hard whereas the second is hard
as microprocessor designs become increasingly power and complexity conscious future microarchitectures must decrease their reliance on expensive dynamic scheduling structures while compilers have generally proven adept at planning useful static instruction level parallelism relying solely on the compiler instruction execution arrangement performs poorly when cache misses occur because variable latency is not well tolerated this paper proposes new microarchitectural model multipass pipelining that exploits meticulous compile time scheduling on simple in order hardware while achieving excellent cache miss tolerance through persistent advance preexecution beyond otherwise stalled instructions the pipeline systematically makes multiple passes through instructions that follow stalled instruction each pass increases the speed and energy efficiency of the subsequent ones by preserving computed results the concept of multiple passes and successive improvement of efficiency across passes in single pipeline distinguishes multipass pipelining from other runahead schemes simulation results show that the multipass technique achieves of the cycle reduction of aggressive out of order execution relative to in order execution in addition microarchitectural level power simulation indicates that benefits of multipass are achieved at fraction of the power overhead of full dynamic scheduling
an order dependent query is one whose result interpreted as multiset changes if the order of the input records is changed in stock quotes database for instance retrieving all quotes concerning given stock for given day does not depend on order because the collection of quotes does not depend on order by contrast finding stock’s five price moving average in trades table gives result that depends on the order of the table query languages based on the relational data model can handle order dependent queries only through add ons sql for instance has new window mechanism which can sort data in limited parts of query add ons make order dependent queries difficult to write and to optimize in this paper we show that order can be natural property of the underlying data model and algebra we introduce new query language and algebra called aquery that supports order from the ground up new order related query transformations arise in this setting we show by experiment that this framework language plus optimization techniques brings orders of magnitude improvement over sql systems on many natural order dependent queries
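A small illustration of order dependence using the moving-average example above, written in plain Python rather than aquery (whose syntax is not shown here): reordering the input rows changes the result.

```python
def moving_average(prices, window=5):
    """Five-price moving average: the result depends on the order of the rows."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

trades = [10, 11, 12, 11, 13, 14, 12]          # quotes in arrival (time) order
print(moving_average(trades))                   # order-dependent result
print(moving_average(sorted(trades)))           # reordering the rows changes it
```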
datalog database query language based on the logic programming paradigm is described the syntax and semantics of datalog and its use for querying relational database are presented optimization methods for achieving efficient evaluations of datalog queries are classified and the most relevant methods are presented various improvements of datalog currently under study are discussed and what is still needed in order to extend datalog’s applicability to the solution of real life problems is indicated
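As a concrete flavor of datalog evaluation, a naive bottom-up fixpoint computation of the classic ancestor program is sketched below in Python; this illustrates the semantics only and is not one of the surveyed optimization methods.

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    Iterate the rules until no new facts are derived (a fixpoint)."""
    ancestor = set(parent_facts)
    while True:
        new = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        } - ancestor
        if not new:
            return ancestor
        ancestor |= new

parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dan")}
print(sorted(ancestors(parent)))
```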
cardinality estimation is the problem of estimating the number of tuples returned by query it is fundamentally important task in data management used in query optimization progress estimation and resource provisioning we study cardinality estimation in principled framework given set of statistical assertions about the number of tuples returned by fixed set of queries predict the number of tuples returned by new query we model this problem using the probability space over possible worlds that satisfies all provided statistical assertions and maximizes entropy we call this the entropy maximization model for statistics maxent in this paper we develop the mathematical techniques needed to use the maxent model for predicting the cardinality of conjunctive queries
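Based on the description above, the maxent model can be written (with notation assumed here, not taken from the paper) as the entropy-maximizing distribution over possible worlds consistent with the given cardinality assertions:

```latex
% Given statistical assertions d_1, ..., d_k for views (queries) v_1, ..., v_k,
% choose the distribution P over possible worlds W that solves
\max_{P} \; - \sum_{W} P(W) \log P(W)
\quad \text{s.t.} \quad
\sum_{W} P(W) = 1, \qquad
\mathbb{E}_{P}\big[\, |v_i(W)| \,\big] = d_i \quad (i = 1, \dots, k),
% and estimate the cardinality of a new query q as E_P[ |q(W)| ].
```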
we present our current research on the implementation of gaze as an efficient and usable pointing modality supplementary to speech for interacting with augmented objects in our daily environment or large displays especially immersive virtual reality environments such as reality centres and caves we are also addressing issues relating to the use of gaze as the main interaction input modality we have designed and developed two operational user interfaces one for providing motor disabled users with easy gaze based access to map applications and graphical software the other for iteratively testing and improving the usability of gaze contingent displays
attackers exploit software vulnerabilities to control or crash programs bouncer uses existing software instrumentation techniques to detect attacks and it generates filters automatically to block exploits of the target vulnerabilities the filters are deployed automatically by instrumenting system calls to drop exploit messages these filters introduce low overhead and they allow programs to keep running correctly under attack previous work computes filters using symbolic execution along the path taken by sample exploit but attackers can bypass these filters by generating exploits that follow different execution path bouncer introduces three techniques to generalize filters so that they are harder to bypass new form of program slicing that uses combination of static and dynamic analysis to remove unnecessary conditions from the filter symbolic summaries for common library functions that characterize their behavior succinctly as set of conditions on the input and generation of alternative exploits guided by symbolic execution bouncer filters have low overhead they do not have false positives by design and our results show that bouncer can generate filters that block all exploits of some real world vulnerabilities
true multi user multimodal interaction over digital table lets co located people simultaneously gesture and speak commands to control an application we explore this design space through case study where we implemented an application that supports the kj creativity method as used by industrial designers four key design issues emerged that have significant impact on how people would use such multi user multimodal system first parallel work is affected by the design of multimodal commands second individual mode switches can be confusing to collaborators especially if speech commands are used third establishing personal and group territories can hinder particular tasks that require artefact neutrality finally timing needs to be considered when designing joint multimodal commands we also describe our model view controller architecture for true multi user multimodal interaction
as mobility permeates into today's computing and communication arena we envision application infrastructures that will increasingly rely on mobile technologies traditional database applications and information service applications will need to integrate mobile entities people and computers in this paper we develop distributed database framework for mobile environments key requirement in such an environment is to support frequent connection and disconnection of database sites we present algorithms that implement this framework in an asynchronous system
all large scale projects contain degree of risk and uncertainty software projects are particularly vulnerable to overruns due to this uncertainty and the inherent difficulty of software project cost estimation in this paper we introduce search based approach to software project robustness the approach is to formulate this problem as multi objective search based software engineering problem in which robustness and completion time are treated as two competing objectives the paper presents the results of the application of this new approach to four large real world software projects using two different models of uncertainty
for given text which has been encoded by static huffman code the possibility of locating given pattern directly in the compressed text is investigated the main problem is one of synchronization as an occurrence of the encoded pattern in the encoded text does not necessarily correspond to an occurrence of the pattern in the text simple algorithm is suggested which reduces the number of erroneously declared matches the probability of such false matches is analyzed and empirically tested
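a minimal sketch of why false matches arise when searching a huffman-encoded text directly; the tiny prefix code below is invented for illustration and is not the code from the paper, and a real compressed-domain search cannot afford to store every codeword boundary as this sketch does -- the point is only to show the synchronization problem

    CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}   # illustrative prefix code

    def encode(s):
        return "".join(CODE[ch] for ch in s)

    def compressed_search(text, pattern):
        etext, epat = encode(text), encode(pattern)
        boundaries, pos = set(), 0               # bit offsets where a codeword starts
        for ch in text:
            boundaries.add(pos)
            pos += len(CODE[ch])
        hits = [i for i in range(len(etext) - len(epat) + 1)
                if etext.startswith(epat, i)]
        # a bit-level hit is a true match only if it starts on a codeword boundary
        return hits, [i for i in hits if i in boundaries]

    print(compressed_search("dcab", "ba"))       # ([4], []) -- the only bit match is false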
this paper studies text mining problem comparative sentence mining csm comparative sentence expresses an ordering relation between two sets of entities with respect to some common features for example the comparative sentence canon’s optics are better than those of sony and nikon expresses the comparative relation better optics canon sony nikon given set of evaluative texts on the web eg reviews forum postings and news articles the task of comparative sentence mining is to identify comparative sentences from the texts and to extract comparative relations from the identified comparative sentences this problem has many applications for example product manufacturer wants to know customer opinions of its products in comparison with those of its competitors in this paper we propose two novel techniques based on two new types of sequential rules to perform the tasks experimental evaluation has been conducted using different types of evaluative texts from the web results show that our techniques are very promising
one significant problem in tile based texture synthesis is the presence of conspicuous seams in the tiles the reason is that the sample patches employed as primary patterns of the tile set may not be well stitched if carelessly picked in this paper we introduce an optimized approach that can stably generate an omega tile set of high pattern diversity and high quality firstly an extendable rule is introduced to increase the number of sample patches to vary the patterns in an omega tile set secondly in contrast to the other concurrent techniques that randomly choose sample patches for tile construction our technique uses genetic algorithm to select the feasible patches from the input example this operation ensures the quality of the whole tile set experimental results verify the high quality and efficiency of the proposed algorithm
in this paper we focus on the minimal deterministic finite automaton that recognizes the set of suffixes of word up to errors as first result we give characterization of the nerode’s right invariant congruence that is associated with this result generalizes the classical characterization described in blumer blumer haussler ehrenfeucht chen seiferas the smallest automaton recognizing the subwords of text theoretical computer science as second result we present an algorithm that makes use of to accept in an efficient way the language of all suffixes of up to errors in every window of size of text where is the repetition index of moreover we give some experimental results on some well known words like prefixes of fibonacci and thue morse words finally we state conjecture and an open problem on the size and the construction of the suffix automaton with mismatches
routing overlays have become viable approach to working around slow bgp convergence and sub optimal path selection as well as to deploy novel forwarding architectures common sub component of routing overlay is routing mesh the route selection algorithm considers only those virtual links inter node links in an overlay in the routing mesh rather than all virtual links connecting an node overlay doing so reduces routing overhead thereby improving the scalability of the overlay as long as the process of constructing the mesh doesn’t itself introduce overhead this paper proposes and evaluates low cost approach to building topology aware routing mesh that eliminates virtual links that contain duplicate physical segments in the underlying network an evaluation of our method on planetlab shows that conservative link pruning algorithm reduces routing overhead by factor of two without negatively impacting route selection additional analysis quantifies the impact on route selection of defining an even sparser mesh on top of the topology aware routing mesh documenting the cost benefit tradeoff that is intrinsic to routing it also shows that constructing sparser routing mesh on the topology aware routing mesh rather than directly on the internet itself benefits from having the reduced number of duplicate physical segments in the underlying network which improves the resilience of the resulting routing mesh
independent parties that produce perfectly complementary components may form alliances or coalitions or groups to better coordinate their pricing decisions when they sell their products to downstream buyers this paper studies how market demand conditions ie the form of the demand function demand uncertainty and price sensitive demand drive coalition formation among complementary suppliers in deterministic demand model we show that for an exponential or isoelastic demand function suppliers always prefer selling in groups for linear power demand function suppliers may all choose to sell independently in equilibrium these results are interpreted through the pass through rate associated with the demand function in an uncertain demand model we show that in general the introduction of multiplicative stochastic element in demand has an insignificant impact on stable coalitions and that an endogenous retail price ie demand is price sensitive increases suppliers incentives to form alliances relative to the case with fixed retail price we also consider the impact of various other factors on stable outcomes in equilibrium eg sequential decision making by coalitions of different sizes the cost effect due to alliance formation either cost savings or additional costs and system without an assembler
user contributed wireless mesh networks are disruptive technology that may fundamentally change the economics of edge network access and bring the benefits of computer network infrastructure to local communities at low cost anywhere in the world to achieve high throughput despite highly unpredictable and lossy wireless channels it is essential that such networks take advantage of transmission opportunities wherever they emerge however as opportunistic routing departs from the traditional but less effective deterministic shortest path based routing user nodes in such networks may have less incentive to follow protocols and contribute in this paper we present the first routing protocols in which it is incentive compatible for each user node to honestly participate in the routing despite opportunistic transmissions we not only rigorously prove the properties of our protocols but also thoroughly evaluate complete implementation of our protocols experiments show that there is gain in throughput when compared with an opportunistic routing protocol that does not provide incentives and users can act selfishly
the proactive desk is new digital desk with haptic feedback the concept of digital desk was proposed by wellner in for the first time typical digital desk enables user to seamlessly handle both digital and physical objects on the desk with common gui standard the user however handles them as virtual gui objects our proactive desk allows the user to handle both digital and physical objects on digital desk with realistic feeling in the proactive desk two linear induction motors are equipped to generate an omnidirectional translational force on the user’s hand or on physical object on the desk without any mechanical links or wires thereby preserving the advantages of the digital desk in this article we first discuss applications of digital desk with haptic feedback then we mention the design and structure of the first trial proactive desk and its performance
the design development and deployment of interactive systems can substantively impact individuals society and the natural environment now and potentially well into the future yet scarcity of methods exists to support long term emergent systemic thinking in interactive design practice toward addressing this gap we propose four envisioning criteria stakeholders time values and pervasiveness distilled from prior work in urban planning design noir and value sensitive design we characterize how the criteria can support systemic thinking illustrate the integration of the envisioning criteria into established design practice scenario based design and provide strategic activities to serve as generative envisioning tools we conclude with suggestions for use and future work key contributions include four envisioning criteria to support systemic thinking value scenarios extending scenario based design and strategic activities for engaging the envisioning criteria in interactive system design practice
number of vertical mining algorithms have been proposed recently for association mining which have shown to be very effective and usually outperform horizontal approaches the main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids tids and automatic pruning of irrelevant data the main problem with these approaches is when intermediate results of vertical tid lists become too large for memory thus affecting the algorithm scalability in this paper we present novel vertical data representation called diffset that only keeps track of differences in the tids of candidate pattern from its generating frequent patterns we show that diffsets drastically cut down the size of memory required to store intermediate results we show how diffsets when incorporated into previous vertical mining methods increase the performance significantly
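a minimal sketch of the diffset idea stated above, namely keeping only the difference between a candidate's tids and those of its generating prefix; the toy transaction database and itemset names are illustrative

    def tidlist(db, item):                        # vertical view: item -> set of tids
        return {tid for tid, items in db.items() if item in items}

    db = {1: "ab", 2: "abc", 3: "ac", 4: "b", 5: "abc"}   # tid -> items (toy data)

    tA, tB, tC = (tidlist(db, x) for x in "abc")
    # classic vertical mining stores the tid list t(AB) = tA & tB; the diffset
    # keeps only d(AB) = tA - tB, and support(AB) = support(A) - |d(AB)|
    dAB, dAC = tA - tB, tA - tC
    print(len(tA) - len(dAB), len(tA & tB))                   # 3 3
    # extending with C reuses diffsets only: d(ABC) = d(AC) - d(AB)
    dABC = dAC - dAB
    print(len(tA) - len(dAB) - len(dABC), len(tA & tB & tC))  # 2 2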
we introduce comprehensive biomechanical model of the human upper body our model confronts the combined challenge of modeling and controlling more or less all of the relevant articular bones and muscles as well as simulating the physics based deformations of the soft tissues its dynamic skeleton comprises bones with jointed degrees of freedom including those of each vertebra and most of the ribs to be properly actuated and controlled the skeletal submodel requires comparable attention to detail with respect to muscle modeling we incorporate muscles each of which is modeled as piecewise uniaxial hill type force actuator to simulate biomechanically realistic flesh deformations we also develop coupled finite element model with the appropriate constitutive behavior in which are embedded the detailed anatomical geometries of the hard and soft tissues finally we develop an associated physics based animation controller that computes the muscle activation signals necessary to drive the elaborate musculoskeletal system in accordance with sequence of target poses specified by an animator
real time production systems and other dynamic environments often generate tremendous potentially infinite amount of stream data the volume of data is too huge to be stored on disks or scanned multiple times can we perform on line multi dimensional analysis and data mining of such data to alert people about dramatic changes of situations and to initiate timely high quality responses this is challenging task in this paper we investigate methods for on line multi dimensional regression analysis of time series stream data with the following contributions our analysis shows that only small number of compressed regression measures instead of the complete stream of data need to be registered for multi dimensional linear regression analysis to facilitate on line stream data analysis partially materialized data cube model with regression as measure and tilt time frame as its time dimension is proposed to minimize the amount of data to be retained in memory or stored on disks and an exception guided drilling approach is developed for on line multi dimensional exception based regression analysis based on this design algorithms are proposed for efficient analysis of time series data streams our performance study compares the proposed algorithms and identifies the most memory and time efficient one for multi dimensional stream data analysis
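a minimal sketch of what a compressed set of regression measures can look like for simple linear regression: the five running sums below suffice to recover slope and intercept and can be merged across sub-streams, which is the property cube-style aggregation over a tilt time frame relies on; the exact measures registered in the paper may differ

    def update(s, x, y):                  # fold one stream point into the summary
        n, sx, sy, sxx, sxy = s
        return (n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y)

    def merge(a, b):                      # summaries of two sub-streams add componentwise
        return tuple(u + v for u, v in zip(a, b))

    def fit(s):                           # slope and intercept from the sums alone
        n, sx, sy, sxx, sxy = s
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        return slope, (sy - slope * sx) / n

    a = b = (0, 0.0, 0.0, 0.0, 0.0)
    for x in range(5):
        a = update(a, x, 2 * x + 1)       # first half of a stream following y = 2x + 1
    for x in range(5, 10):
        b = update(b, x, 2 * x + 1)       # second half of the same stream
    print(fit(merge(a, b)))               # (2.0, 1.0) up to floating point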
dynamic load balancing is key problem for the efficient use of parallel systems when solving applications with unpredictable load estimates however depending on the underlying programming paradigm single program multiple data spmd or multiple program multiple data mpmd the balancing requirements vary in spmd scenarios perfect load balance is desired whereas in mpmd scenarios it might be better to quickly obtain large reduction in load imbalance in short period of time we propose extending the local domain of given processor in the load balancing algorithms to find better scope for each paradigm for that purpose we present generalised version of the diffusion algorithm searching unbalanced domains called ds dasud which extends the local domain of each processor beyond its immediate neighbour ds dasud belongs to the iterative distributed load balancing idlb class and in its original formulation operates in diffusion scheme where processor balances its load with all its immediate neighbours ds we evaluate this algorithm for the two programming paradigms varying the domain size the evaluation was carried out using two simulators load balancing and network simulators for large set of load distributions that exhibit different degrees of initial workload unbalancing these distributions were applied to torus and hypercube topologies and the number of processors ranged from to from these experiments we conclude that the dasud fits well for spmd scenarios whereas for mpmd dasud and dasud for hypercube and torus topologies respectively obtain the best trade off between the imbalance reduction up to and the cost incurred in reaching it
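for orientation, the sketch below runs one plain diffusion balancer on a ring, where each processor exchanges a fraction of its imbalance with its immediate neighbours only; ds-dasud extends the domain each node considers beyond immediate neighbours, which this toy version deliberately does not model

    def diffusion_step(load, alpha=0.5):
        n = len(load)
        new = list(load)
        for i in range(n):
            for j in ((i - 1) % n, (i + 1) % n):   # immediate ring neighbours only
                new[i] += alpha / 2 * (load[j] - load[i])
        return new

    load = [16.0, 0.0, 0.0, 0.0]                   # one overloaded processor
    for _ in range(20):
        load = diffusion_step(load)
    print([round(x, 2) for x in load])             # [4.0, 4.0, 4.0, 4.0]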
we propose general methodology for analysing the behaviour of open systems modelled as coordinators ie open terms of suitable process calculi coordinator is understood as process with holes or placeholders where other coordinators and components ie closed terms can be plugged in thus influencing its behaviour the operational semantics of coordinators is given by means of symbolic transition system where states are coordinators and transitions are labeled by spatial modal formulae expressing the potential interaction that plugged components may enable behavioural equivalences for coordinators like strong and weak bisimilarities can be straightforwardly defined over such transition system different from other approaches based on universal closures ie where two coordinators are considered equivalent when all their closed instances are equivalent our semantics preserves the openness of the system during its evolution thus allowing dynamic instantiation to be accounted for in the semantics to further support the adequacy of the construction we show that our symbolic equivalences provide correct approximations of their universally closed counterparts coinciding with them over closed components for process calculi in suitable formats we show how tractable symbolic semantics can be defined constructively using unification
in decision support systems having knowledge on the top values is more informative and crucial than the maximum value unfortunately the naive method involves high computational cost and the existing methods for range max query are inefficient if applied directly in this paper we propose pre computed partition top method ppt to partition the data cube and pre store number of top values for improving query performance the main focus of this study is to find the optimum values for two parameters ie the partition factor and the number of pre stored values through analytical approach cost function based on poisson distribution is used for the analysis the analytical results obtained are verified against simulation results it is shown that the ppt method outperforms other alternative methods significantly when proper and are used
in this paper we concentrate on aspects related to modeling and formal verification of embedded systems first we define formal model of computation for embedded systems based on petri nets that can capture important features of such systems and allows their representation at different levels of granularity our modeling formalism has well defined semantics so that it supports precise representation of the system the use of formal methods to verify its correctness and the automation of different tasks along the design process second we propose an approach to the problem of formal verification of embedded systems represented in our modeling formalism we make use of model checking to prove whether certain properties expressed as temporal logic formulas hold with respect to the system model we introduce systematic procedure to translate our model into timed automata so that it is possible to use available model checking tools we propose two strategies for improving the verification efficiency the first by applying correctness preserving transformations and the second by exploring the degree of parallelism characteristic to the system some examples including realistic industrial case demonstrate the efficiency of our approach on practical applications
under frequent node arrival and departure churn in an overlay network structure the problem of preserving accessibility is addressed by maintaining valid entries in the routing tables towards nodes that are alive however if the system fails to replace the entries of dead nodes with entries of live nodes in the routing tables soon enough requests may fail in such cases mechanisms to route around failures are required to increase the tolerance to node failures existing distributed hash tables dhts overlays include extensions to provide fault tolerance when looking up keys however these are often insufficient we analyze the case of greedy routing often preferred for its simplicity but with limited dependability even when extensions are applied the main idea is that fault tolerance aspects need to be dealt with already at design time of the overlay we thus propose simple overlay that offers support for alternative paths and we create routing strategy which takes advantage of all these paths to route the requests while keeping maintenance cost low experimental evaluation demonstrates that our approach provides an excellent resilience to failures
in this paper we discuss the efforts underway at the pacific northwest national laboratory in understanding the dynamics of multi party discourse across number of communication modalities such as email instant messaging traffic and meeting data two prototype systems are discussed the conversation analysis tool chat is an experimental test bed for the development of computational linguistic components and enables users to easily identify topics or persons of interest within multi party conversations including who talked to whom when the entities that were discussed etc the retrospective analysis of communication events race prototype leveraging many of the chat components is an application built specifically for knowledge workers and focuses on merging different types of communication data so that the underlying message can be discovered in an efficient timely fashion
we present the rd redundancy detector rd identifies redundant code fragments in large software systems written in lisp for each pair of code fragments rd uses combination of techniques ranging from syntax based analysis to semantics based analysis that detects positive and negative evidences regarding the redundancy of the analyzed code fragments these evidences are combined according to well defined model and sufficiently redundant fragments are reported to the user rd explores several techniques and heuristics to operate within reasonable time and space bounds and is designed to be extensible
in the origin detection problem an algorithm is given set of documents ordered by creation time and query document it needs to output for every consecutive sequence of alphanumeric terms in the earliest document in in which the sequence appeared if such document exists algorithms for the origin detection problem can for example be used to detect the origin of text segments in and thus to detect novel content in they can also find the document from which the author of has copied the most or show that is mostly original we concentrate on solutions that use only fixed amount of memory we propose novel algorithms for this problem and evaluate them together with large number of previously published algorithms our results show that detecting the origin of text segments efficiently can be done with very high accuracy even when the space used is less than of the size of the documents in the precision degrades smoothly with the amount of available space various estimation techniques can be used to increase the performance of the algorithms
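a minimal sketch of the underlying idea: fingerprint every sequence of k consecutive terms and remember the earliest document that contained it; the paper's algorithms work within a fixed memory budget, which this unbounded dictionary deliberately ignores, and the shingle length and toy documents are invented

    import re

    def shingles(text, k=3):
        terms = re.findall(r"\w+", text.lower())
        return [" ".join(terms[i:i + k]) for i in range(len(terms) - k + 1)]

    def build_index(docs_in_creation_order, k=3):
        earliest = {}
        for doc_id, text in docs_in_creation_order:
            for s in shingles(text, k):
                earliest.setdefault(s, doc_id)     # keep only the first occurrence
        return earliest

    def origins(query_text, earliest, k=3):
        return {s: earliest.get(s) for s in shingles(query_text, k)}

    index = build_index([("d1", "the quick brown fox jumps"),
                         ("d2", "a quick brown fox sleeps")])
    print(origins("quick brown fox jumps high", index))
    # maps the first two query shingles to d1 and the last, novel one to None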
although deforming surfaces are frequently used in numerous domains only few works have been proposed until now for simplifying such data in this paper we propose new method for generating progressive deforming meshes based on shape feature analysis and deformation area preservation by computing the curvature and torsion of each vertex in the original model we add the shape feature factor to its quadric error metric when calculating each qem edge collapse cost in order to preserve the areas with large deformation we add deformation degree weight to the aggregated quadric errors when computing the unified edge contraction sequence finally the edge contraction order is slightly adjusted to further reduce the geometric distortion for each frame our approach is fast easy to implement and as result good quality dynamic approximations with well preserved fine details can be generated at any given frame
this paper presents reachout chat based tool for peer support collaboration and community building we describe the philosophy behind the tool and explain how posting questions in the open directly benefits the creation distribution and use of organizational knowledge in addition to enhancing the cohesion of the community involved reachout proposes new methods of handling problems that include locating selecting and approaching the right set of potential advisers we discuss the advantages of public discussions over private one on one sessions and how this is enhanced by our unique combination of synchronous and asynchronous communication we present and analyze results from pilot of reachout and conclude with plans for future research and development
software architecture has been key area of concern in software industry due to its profound impact on the productivity and quality of software products this is even more crucial in case of software product line because it deals with the development of line of products sharing common architecture and having controlled variability the main contribution of this paper is to increase the understanding of the influence of key software product line architecture process activities on the overall performance of software product line by conducting comprehensive empirical investigation covering broad range of organizations currently involved in the business of software product lines this is the first study to empirically investigate and demonstrate the relationships between some of the software product line architecture process activities and the overall software product line performance of an organization to the best of our knowledge the results of this investigation provide empirical evidence that software product line architecture process activities play significant role in successfully developing and managing software product line
software system interacts with third party libraries through various apis using these library apis often needs to follow certain usage patterns furthermore ordering rules specifications exist between apis and these rules govern the secure and robust operation of the system using these apis but these patterns and rules may not be well documented by the api developers previous approaches mine frequent association rules itemsets or subsequences that capture api call patterns shared by api client code however these frequent api patterns cannot completely capture some useful orderings shared by apis especially when multiple apis are involved across different procedures in this paper we present framework to automatically extract usage scenarios among user specified apis as partial orders directly from the source code api client code we adapt model checker to generate interprocedural control flow sensitive static traces related to the apis of interest different api usage scenarios are extracted from the static traces by our scenario extraction algorithm and fed to miner the miner summarizes different usage scenarios as compact partial orders specifications are extracted from the frequent partial orders using our specification extraction algorithm our experience of applying the framework on clients with loc in total has shown that the extracted api partial orders are useful in assisting effective api reuse and checking
reverse nearest neighbor rnn search is very crucial in many real applications in particular given database and query object an rnn query retrieves all the data objects in the database that have the query object as their nearest neighbors often due to limitation of measurement devices environmental disturbance or characteristics of applications for example monitoring moving objects data obtained from the real world are uncertain imprecise therefore previous approaches proposed for answering an rnn query over exact precise database cannot be directly applied to the uncertain scenario in this paper we re define the rnn query in the context of uncertain databases namely probabilistic reverse nearest neighbor prnn query which obtains data objects with probabilities of being rnns greater than or equal to user specified threshold since the retrieval of prnn query requires accessing all the objects in the database which is quite costly we also propose an effective pruning method called geometric pruning gp that significantly reduces the prnn search space yet without introducing any false dismissals furthermore we present an efficient prnn query procedure that seamlessly integrates our pruning method extensive experiments have demonstrated the efficiency and effectiveness of our proposed gp based prnn query processing approach under various experimental settings
in this paper we consider how discrete time quantum walks can be applied to graph matching problems the matching problem is abstracted using an auxiliary graph that connects pairs of vertices from the graphs to be matched by way of auxiliary vertices discrete time quantum walk is simulated on this auxiliary graph and the quantum interference on the auxiliary vertices indicates possible matches when dealing with graphs for which there is no exact match the interference amplitudes together with edge consistencies are used to define consistency measure we also explore the use of the method for inexact graph matching problems we have tested the algorithm on graphs derived from the nci molecule database and found it to significantly reduce the space of possible permutation matchings typically by factor of thereby allowing the graphs to be matched directly an analysis of the quantum walk in the presence of structural errors between graphs is used as the basis of the consistency measure we test the performance of this measure on graphs derived from images in the coil database
this paper presents novel software pipelining approach which is called swing modulo scheduling sms it generates schedules that are near optimal in terms of initiation interval register requirements and stage count swing modulo scheduling is heuristic approach that has low computational cost this paper first describes the technique and evaluates it for the perfect club benchmark suite on generic vliw architecture sms is compared with other heuristic methods showing that it outperforms them in terms of the quality of the obtained schedules and compilation time to further explore the effectiveness of sms the experience of incorporating it into production quality compiler for the equator map processor is described implementation issues are discussed as well as modifications and improvements to the original algorithm finally experimental results from using set of industrial multimedia applications are presented
content based image similarity search plays key role in multimedia retrieval each image is usually represented as point in high dimensional feature space the key challenge of searching similar images from large database is the high computational overhead due to the curse of dimensionality reducing the dimensionality is an important means to tackle the problem in this paper we study dimensionality reduction for top image retrieval intuitively an effective dimensionality reduction method should not only preserve the close locations of similar images or points but also separate those dissimilar ones far apart in the reduced subspace existing dimensionality reduction methods mainly focused on the former we propose novel idea called locality condensation lc to not only preserve localities determined by neighborhood information and their global similarity relationship but also ensure that different localities will not invade each other in the low dimensional subspace to generate non overlapping localities in the subspace lc first performs an elliptical condensation which condenses each locality with an elliptical shape into more compact hypersphere to enlarge the margins among different localities and estimate the projection in the subspace for overlap analysis through convex optimization lc further performs scaling condensation on the obtained hyperspheres based on their projections in the subspace with minimal condensation degrees by condensing the localities effectively the potential overlaps among different localities in the low dimensional subspace are prevented consequently for similarity search in the subspace the number of false hits ie distant points that are falsely retrieved will be reduced extensive experimental comparisons with existing methods demonstrate the superiority of our proposal
we evaluate the energy efficiency and performance of number of synchronization mechanisms adapted for embedded devices we focus on simple hardware accelerators for common software synchronization patterns we compare the energy efficiency of range of shared memory benchmarks using both spin locks and simple hardware transactional memory in most cases transactional memory provides both significantly reduced energy consumption and increased throughput we also consider applications that employ concurrency patterns based on semaphores such as pipelines and barriers we propose and evaluate novel energy efficient hardware semaphore construction in which cores spin on local scratchpad memory reducing the load on the shared bus
this paper explores the use of multi instance enrollment as means to improve the performance of face recognition experiments are performed using the nd face data set which contains scans of subjects this is the largest face data set currently available and contains substantial amount of varied facial expression results indicate that the multi instance enrollment approach outperforms state of the art component based recognition approach in which the face to be recognized is considered as an independent set of regions
media consumption is an inherently social activity serving to communicate ideas and emotions across both small and large scale communities the migration of the media experience to personal computers retains social viewing but typically only via non social strictly personal interface this paper presents an architecture and implementation for media content selection content re organization and content sharing within user community that is heterogeneous in terms of both participants and devices in addition our application allows the user to enrich the content as differentiated personalization activity targeted to his her peer group we describe the goals architecture and implementation of our system in this paper in order to validate our results we also present results from two user studies involving disjoint sets of test participants
in spatial database outsourcing data owner delegates its data management tasks to location based service lbs which indexes the data with an authenticated data structure ads the lbs receives queries ranges nearest neighbors originating from several clients subscribers each query initiates the computation of verification object vo based on the ads the vo is returned to the client that can verify the result correctness using the public key of the owner our first contribution is the mr tree space efficient ads that supports fast query processing and verification our second contribution is the mr tree modified version of the mr tree which significantly reduces the vo size through novel embedding technique finally whereas most adss must be constructed and maintained by the owner we outsource the mr and mr tree construction and maintenance to the lbs thus relieving the owner from this computationally intensive task
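a minimal sketch of the kind of check a client performs with a verification object: recompute a merkle-style root digest from the returned entries plus the digests of the pruned siblings and compare it with the digest the owner signed; a real mr-tree hashes r-tree nodes (mbrs together with child digests) and handles range pruning, all of which is simplified away here

    import hashlib

    def h(*parts):                                   # digest of one or more byte strings
        return hashlib.sha256(b"|".join(parts)).hexdigest().encode()

    leaves = [b"p1", b"p2", b"p3", b"p4"]
    l = [h(x) for x in leaves]
    root = h(h(l[0], l[1]), h(l[2], l[3]))           # digest the owner signs and publishes

    # suppose the query result is {p3, p4}; the vo carries the pruned sibling digest
    vo_sibling = h(l[0], l[1])
    recomputed = h(vo_sibling, h(h(b"p3"), h(b"p4")))
    print(recomputed == root)                        # True: the result verifies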
in question answering qa system the fundamental problem is how to measure the distance between question and an answer hence ranking different answers we demonstrate that such distance can be precisely and mathematically defined not only such definition is possible it is actually provably better than any other feasible definitions not only such an ultimate definition is possible but also it can be conveniently and fruitfully applied to construct qa system we have built such system quanta extensive experiments are conducted to justify the new theory
recent advances in information technology demand handling complex data types such as images video audio time series and genetic sequences distinctly from traditional data such as numbers short strings and dates complex data do not possess the total ordering property yielding relational comparison operators useless even equality comparisons are of little help as it is very unlikely to have two complex elements exactly equal therefore the similarity among elements has emerged as the most important property for comparisons in such domains leading to the growing relevance of metric spaces to data search regardless of the data domain properties the systems need to track evolution of data over time when handling multidimensional data temporal information is commonly treated as just one or more dimensions however metric data do not have the concept of dimensions thus adding plain temporal dimension does not make sense in this paper we propose novel metric temporal data representation and exploit its properties to compare elements by similarity taking into account time related evolution we also present experimental evaluation which confirms that our technique effectively takes into account the contributions of both the metric and temporal data components moreover the experiments showed that the temporal information always improves the precision of the answer
to conquer the weakness of existing integrity measurement and verification mechanisms based on trusted computing technology an integrity assurance mechanism for run time programs is proposed in this paper based on dynamic integrity measuring module the proposed integrity assurance mechanism solves the difficulties that may be encountered when attesting to the integrity of running programs the paper also describes the design and implementation details of the proposed module an example of applying the proposed mechanism to protect the vtpm instances in xen hypervisor is presented at last
abduction is fundamental form of nonmonotonic reasoning that aims at finding explanations for observed manifestations this process underlies many applications from car configuration to medical diagnosis we study here the computational complexity of deciding whether an explanation exists in the case when the application domain is described by propositional knowledge base building on previous results we classify the complexity for local restrictions on the knowledge base and under various restrictions on hypotheses and manifestations in comparison to the many previous studies on the complexity of abduction we are able to give much more detailed picture for the complexity of the basic problem of deciding the existence of an explanation it turns out that depending on the restrictions the problem in this framework is always polynomial time solvable np complete conp complete or complete based on these results we give an a posteriori justification of what makes propositional abduction hard even for some classes of knowledge bases which allow for efficient satisfiability testing and deduction this justification is very simple and intuitive but it reveals that no nontrivial class of abduction problems is tractable indeed tractability essentially requires that the language for knowledge bases is unable to express both causal links and conflicts between hypotheses this generalizes similar observation by bylander et al for set covering abduction
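for reference, one standard formalization of the existence-of-explanation problem studied above (the paper's precise definition may differ in minor details): given a propositional knowledge base $KB$, a set $H$ of hypotheses and a set $M$ of manifestations, decide whether there exists $E \subseteq H$ such that

    \[
    KB \wedge \bigwedge E \ \text{is satisfiable}
    \qquad\text{and}\qquad
    KB \wedge \bigwedge E \models \bigwedge M .
    \]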
it is time for us to focus on sound analyses for our critical systems software that is we must focus on analyses that ensure the absence of defects of particular known types rather than best effort bug finding tools this paper presents three sample analyses for linux that are aimed at eliminating bugs relating to type safety deallocation and blocking these analyses rely on lightweight programmer annotations and run time checks in order to make them practical and scalable sound analyses of this sort can check wide variety of properties and will ultimately yield more reliable code than bug finding alone
this paper presents new technique compiler directed page coloring that eliminates conflict misses in multiprocessor applications it enables applications to make better use of the increased aggregate cache size available in multiprocessor this technique uses the compiler’s knowledge of the access patterns of the parallelized applications to direct the operating system’s virtual memory page mapping strategy we demonstrate that this technique can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct mapped or two way set associative caches we also show that it is complementary to latency hiding techniques such as prefetching we implemented compiler directed page coloring in the suif parallelizing compiler and on two commercial operating systems we applied the technique to the specfp benchmark suite representative set of numeric programs we used the simos machine simulator to analyze the applications and isolate their performance bottlenecks we also validated these results on real machine an eight processor mhz digital alphaserver compiler directed page coloring leads to significant performance improvements for several applications overall our technique improves the specfp rating for eight processors by over digital unix’s page mapping policy and by over page coloring standard page mapping policy the suif compiler achieves specfp ratio of the highest ratio to date
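a minimal sketch of the page-colouring arithmetic behind such a policy: a physically indexed cache of a given size and associativity has cache_size / (assoc * page_size) colours, and two pages conflict only if their physical frames share a colour; the machine parameters and frame numbers below are invented for illustration and are not those of the systems evaluated in the paper

    PAGE_SIZE, CACHE_SIZE, ASSOC = 4096, 2 * 1024 * 1024, 2   # invented machine parameters

    def num_colors():
        return CACHE_SIZE // (ASSOC * PAGE_SIZE)

    def color(frame):                      # cache colour of a physical page frame
        return frame % num_colors()

    # a compiler-guided mapping policy would pick frames so that pages accessed
    # together in a parallel loop receive distinct colours; frames 7 and 263
    # below share colour 7 and would therefore conflict in the cache
    frames = [7, 263, 8]
    print(num_colors(), [color(f) for f in frames])           # 256 [7, 7, 8]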
requirements specification defines the requirements for the future system at conceptual level ie class or type level in contrast scenario represents concrete example of current or future system usage in early re phases scenarios are used to support the definition of high level requirements goals to be achieved by the new system in many cases those goals can to large degree be elicited by observing documenting and analyzing scenarios about current system usage ie the new system must often fulfill many of the functional and nonfunctional goals of the existing system to support the elicitation and validation of the goals achieved by the existing system and to illustrate problems of the old system we propose to capture current system usage using rich media eg video speech pictures etc and to interrelate those observations with the goal definitions thus we particularly aim at making the abstraction process which leads to the definition of the conceptual models more transparent and traceable more precisely we relate the parts of the observations which have caused the definition of goal or against which goal was validated with the corresponding goal these interrelations provide the basis for explaining and illustrating goal model to eg untrained stakeholders and or new team members and thereby improving common understanding of the goal model detecting analyzing and resolving different interpretation of the observations comparing different observations using computed goal annotations and refining or detailing goal model during later process phases using the prime implementation framework we have implemented the prime crews environment which supports the interrelation of conceptual models and captured system usage observations we report on our experiences with prime crews gained in first experimental case study
given set of keywords conventional keyword search ks returns set of tuples each of which is obtained from single relation or by joining multiple relations and ii contains all the keywords in this paper proposes relevant problem called frequent co occurring term fct retrieval specifically given keyword set and an integer fct query reports the terms that are not in but appear most frequently in the result of ks query with the same fct search is able to discover the concepts that are closely related to furthermore it is also an effective tool for refining the keyword set of traditional keyword search while fct query can be trivially supported by solving the corresponding ks query we provide faster algorithm that extracts the correct results without evaluating any ks query at all the effectiveness and efficiency of our techniques are verified with extensive experiments on real data
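a minimal sketch of the brute-force baseline the paper improves on: run the keyword-search query, count the terms co-occurring in its result tuples, and report the most frequent ones outside the query itself; the toy rows below are invented, and the paper's contribution is precisely an algorithm that avoids materialising the ks result at all

    from collections import Counter

    def fct(ks_result_tuples, query_terms, k):
        counts = Counter(t for row in ks_result_tuples for t in row
                         if t not in query_terms)
        return [term for term, _ in counts.most_common(k)]

    rows = [("canon", "camera", "optics"),
            ("canon", "camera", "zoom"),
            ("nikon", "camera", "optics")]
    print(fct(rows, {"camera"}, 2))        # ['canon', 'optics']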
this paper presents an intermediate language level optimization framework for dynamic binary translation performance is important to dynamic binary translation system so there has been growing interest in exploring new optimization algorithms the framework proposed in this paper includes efficient profiling hot code recognition and smart code cache management policies profiling is responsible for collecting runtime information which will be used by hot code recognition and code cache management algorithms we only focus on recognizing the hottest code and assign priorities to basic blocks according to their hotness to facilitate code cache management
modern software applications require internationalization to be distributed to different regions of the world in various situations many software applications are not internationalized at early stages of development to internationalize such an existing application developers need to externalize some hard coded constant strings to resource files so that translators can easily translate the application into local language without modifying its source code since not all the constant strings require externalization locating those need to translate constant strings is necessary task that developers must complete for internationalization in this paper we present an approach to automatically locating need to translate constant strings our approach first collects list of api methods related to the graphical user interface gui and then searches for need to translate strings from the invocations of these api methods based on string taint analysis we evaluated our approach on four real world open source applications rtext risk artofillusion and megamek the results show that our approach effectively locates most of the need to translate constant strings in all the four applications
we investigate design principles for placing striped delay sensitive data on number of disks in distributed environment the cost formulas of our performance model allow us to calculate the maximum number of users that can be supported by disks as well as to study the impact of other performance tuning options we show that for fixed probabilities of accessing the delay sensitive objects partitioning the set of disks is always better than striping in all of the disks then given number of disks and distinct delay sensitive objects with probabilities of access pr that must be striped across different disk partitions ie nonoverlapping subsets of the disks we use the theory of schur functions in order to find what is the optimal number of disks that must be allocated to each partition for objects with different consumption rates we provide an analytic solution to the problem of disk partitioning we analyze the problem of grouping the more and less popular delay sensitive objects together in partitions when the partitions are less than the objects so that the number of supported users is maximized finally we analyze the trade off of striping on all the disks versus partitioning the set of the disks when the access probabilities of the delay sensitive objects change with time
the cost of electricity for datacenters is substantial operational cost that can and should be managed not only for saving energy but also due to the ecologic commitment inherent to power consumption often pursuing this goal results in chronic underutilization of resources luxury most resource providers do not have in light of their corporate commitments this work proposes formalizes and numerically evaluates deep sam for clearing provisioning markets based on the maximization of welfare subject to utility level dependant energy costs and customer satisfaction levels we focus specifically on linear power models and the implications of the inherent fixed costs related to energy consumption of modern datacenters and cloud environments we rigorously test the model by running multiple simulation scenarios and evaluate the results critically we conclude with positive results and implications for long term sustainable management of modern datacenters
virtual try on vto applications are still under development even if few simplified applications start to be available true vto should let the user specify its measurements so that realistic avatar can be generated also the avatar should be animatable so that the worn cloth can be seen in motion this later statement requires two technologies motion adaptation and real time cloth simulation both have been extensively studied during the past decade and state of the art techniques may now enable the creation of high quality vto allowing user to virtually try on garments while shopping online this paper reviews the pieces that should be put together to build such an application
we consider the natural extension of the well known single disk caching problem to the parallel disk model pdm the main challenge is to achieve as much parallelism as possible and avoid bottlenecks we are given fast memory cache of size memory blocks along with request sequence bn where each block bi resides on one of disks in each parallel step at most one block from each disk can be fetched the task is to serve in the minimum number of parallel os thus each is analogous to page fault the difference here is that during each page fault up to blocks can be brought into memory as long as all of the new blocks entering the memory reside on different disks the problem has long history note that this problem is non trivial even if all requests in are unique this restricted version is called read once despite the progress in the offline version and read once version the general online problem still remained open here we provide comprehensive results with full general solution for the problem with asymptotically tight competitive ratios to exploit parallelism any parallel disk algorithm needs certain amount of lookahead into future requests to provide effective caching an online algorithm must achieve competitive ratio we show lower bound that states for lookahead any online algorithm must be competitive for lookahead greater than where is constant the tight upper bound of md on competitive ratio is achieved by our algorithm skew the previous algorithm tlru was md competitive and this was also shown to be tight for an lru based strategy we achieve the tight ratio using fairly different strategy than lru we also show tight results for randomized algorithms against oblivious adversary and give an algorithm achieving better bounds in the resource augmentation model
it is approximately years since the first computational experiments were conducted in what has become known today as the field of genetic programming gp twenty years since john koza named and popularised the method and ten years since the first issue appeared of the genetic programming evolvable machines journal in particular during the past two decades there has been significant range and volume of development in the theory and application of gp and in recent years the field has become increasingly applied there remain number of significant open issues despite the successful application of gp to number of challenging real world problem domains and progress in the development of theory explaining the behavior and dynamics of gp these issues must be addressed for gp to realise its full potential and to become trusted mainstream member of the computational problem solving toolkit in this paper we outline some of the challenges and open issues that face researchers and practitioners of gp we hope this overview will stimulate debate focus the direction of future research to deepen our understanding of gp and further the development of more powerful problem solving algorithms
configuration problems are thriving application area for declarative knowledge representation that currently experiences constant increase in size and complexity of knowledge bases automated support of the debugging process of such knowledge bases is necessary prerequisite for effective development of configurators we show that this task can be achieved by consistency based diagnosis techniques based on the formal definition of consistency based configuration we develop framework suitable for diagnosing configuration knowledge bases during the test phase of configurators valid and invalid examples are used to test the correctness of the system in case such examples lead to unintended results debugging of the knowledge base is initiated starting from clear definition of diagnosis in the configuration domain we develop an algorithm based on conflicts our framework is general enough for its adaptation to diagnosing customer requirements to identify unachievable conditions during configuration sessions a prototype implementation using commercial constraint based configurator libraries shows the feasibility of diagnosis within the tight time bounds of interactive debugging sessions finally we discuss the usefulness of the outcomes of the diagnostic process in different scenarios
we present method for visualizing short video clips in single static image using the visual language of storyboards these schematic storyboards are composed from multiple input frames and annotated using outlines arrows and text describing the motion in the scene the principal advantage of this storyboard representation over standard representations of video generally either static thumbnail image or playback of the video clip in its entirety is that it requires only moment to observe and comprehend but at the same time retains much of the detail of the source video our system renders schematic storyboard layout based on small amount of user interaction we also demonstrate an interaction technique to scrub through time using the natural spatial dimensions of the storyboard potential applications include video editing surveillance summarization assembly instructions composition of graphic novels and illustration of camera technique for film studies
web cache replacement algorithms have received lot of attention during the past years though none of the proposed algorithms deals efficiently with all the particularities of the web environment namely relatively weak temporal locality due to filtering effects of caching hierarchies heterogeneity in size and origin of request streams in this paper we present the crf replacement policy whose development is mainly motivated by two factors the first is the filtering effects of web caching hierarchies and the second is the intention of achieving balance between hit and byte hit rates crf’s decisions for replacement are based on combination of the recency and frequency criteria in way that requires no tunable parameters
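a minimal sketch of a replacement policy that, like the crf policy described above, scores objects by combining recency and frequency; the weighting, the scoring formula and the omission of object sizes (which matter for byte hit rate) are simplifications of mine, not details from the paper

    class RecencyFrequencyCache:
        def __init__(self, capacity, w=0.5):
            self.capacity, self.w = capacity, w
            self.last_ref, self.count, self.clock = {}, {}, 0

        def _score(self, obj):            # higher score means more worth keeping
            recency = 1.0 / (self.clock - self.last_ref[obj] + 1)
            return self.w * recency + (1 - self.w) * self.count[obj]

        def access(self, obj):
            self.clock += 1
            if obj not in self.last_ref and len(self.last_ref) >= self.capacity:
                victim = min(self.last_ref, key=self._score)   # evict lowest score
                for d in (self.last_ref, self.count):
                    d.pop(victim, None)
            self.last_ref[obj] = self.clock
            self.count[obj] = self.count.get(obj, 0) + 1

    cache = RecencyFrequencyCache(2)
    for obj in ["a", "b", "a", "c", "a"]:
        cache.access(obj)
    print(sorted(cache.last_ref))          # ['a', 'c'] -- 'b' was evicted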
data replication is key technique for ensuring data availability traditionally researchers have focused on the availability of individual objects even though user level tasks called operations typically request multiple objects our recent experimental study has shown that the assignment of object replicas to machines results in subtle yet dramatic effects on the availability of these operations even though the availability of individual objects remains the same this paper is the first to approach the assignment problem from theoretical perspective and obtains series of results regarding assignments that provide the best and the worst availability for user level operations we use range of techniques to obtain our results from standard combinatorial techniques and hill climbing methods to janson’s inequality strong probabilistic tool some of the results demonstrate that even quite simple versions of the assignment problem can have surprising answers
in this paper we focus on classifying documents according to opinion and value judgment they contain the main originality of our approach is to combine linguistic pre processing classification and voting system using several classification methods in this context the relevant representation of the documents allows to determine the features for storing textual data in data warehouses the conducted experiments on very large corpora from french challenge on text mining deft show the efficiency of our approach
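a minimal sketch of the voting step only: several classifiers label the same document and the majority label wins; the trivial keyword rules below stand in for the real learned models and linguistic pre-processing described above

    from collections import Counter

    def majority_vote(predictions):
        return Counter(predictions).most_common(1)[0][0]

    classifiers = [
        lambda doc: "positive" if "good" in doc else "negative",
        lambda doc: "negative" if "bad" in doc else "positive",
        lambda doc: "positive" if len(doc.split()) > 3 else "negative",
    ]

    doc = "this product is really good"
    print(majority_vote([clf(doc) for clf in classifiers]))   # positive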
in recent years design patterns gain more interest in software engineering communities for both software development and maintenance as template to solve certain recurring problem design pattern documents successful experiences of software experts and gradually becomes the design guidelines of software development applying design patterns correctly can improve the efficiency of software design in terms of reusability and enhance maintainability during reverse engineering software can be evolved when developers modify their initial designs as requirements change for instance developer may add delete set of design elements such as classes and methods modifications on software artifacts can introduce conflicts and inconsistencies in the previously applied design patterns which are difficult to find and time consuming to correct this paper presents graph transformation approach to pattern level design validation and evolution based on well founded formalism we validate given design by graph grammar parser and automatically evolve the design at pattern level using graph transformation system rules for potential pattern evolutions are predefined the graph transformation approach preserves the integrity and consistency of design patterns in the system when designs change prototype system is built and case study on the strategy pattern demonstrates the feasibility of pattern based design validation and evolution using graph transformation techniques
project design involves an initial selection of technologies which has strong consequences for later stages of design in this paper we describe an ethnographic based field work study of complex organization and how it addressed the issue of front end project and technology selection formal procedures were designed for the organization to perform repeatable definable and measurable actions yet formal procedures obscured much about the processes actually being applied in selecting technologies and projects in actuality the formal procedures were interwoven with sensemaking activities so that technologies could be understood compared and decision consensus could be reached we expect that the insights from this study can benefit design teams in complex organizations facing similar selection and requirements issues
trust management represents today promising approach for supporting access control in open environments while several approaches have been proposed for trust management and significant steps have been made in this direction major obstacle that still exists in the realization of the benefits of this paradigm is represented by the lack of adequate support in the dbms in this paper we present design that can be used to implement trust management within current relational dbmss we propose trust model with sql syntax and illustrate the main issues arising in the implementation of the model in relational dbms specific attention is paid to the efficient verification of delegation path for certificates this effort permits relatively inexpensive realization of the services of an advanced trust management model within current relational dbmss
constraints are important not just for maintaining data integrity but also because they capture natural probabilistic dependencies among data items probabilistic xml database pxdb is the probability sub space comprising the instances of document that satisfy set of constraints in contrast to existing models that can express probabilistic dependencies it is shown that query evaluation is tractable in pxdbs the problems of sampling and determining well definedness ie whether the above subspace is nonempty are also tractable furthermore queries and constraints can include the aggregate functions count max min and ratio finally this approach can be easily extended to allow probabilistic interpretation of constraints
engineering of complex distributed real time applications is one of the hardest tasks faced by the software profession today all aspects of the process from design to implementation are made more difficult by the interaction of behavioral and platform constraints providing tools for this task is likewise not without major challenges in this paper we discuss tool suite which supports the development of complex distributed real time applications in suitable high level language crl the suite’s component tools include compiler transformer optimizer an allocator migrator schedulability analyzer debugger monitor kernel and simulated network manager the overall engineering approach supported by the suite is to provide as simple and natural an integrated development paradigm as possible the suite tools address complexity due to distribution scheduling allocation and other sources in an integrated manner largely transparent to the developer to reflect the needs of propagation of functional and nonfunctional requirements throughout the development process number of robust code transformation and communication mechanisms have been incorporated into the suite to facilitate practical use of the suite the developed programs compile transform to safe subset of with appropriate libraries and runtime support in this safe subset the use of pointers is minimized aliases are not allowed unconstrained storage and time allocation is not allowed and constructions which lead to arbitrarily long executions or delays due to time or other resource allocation use are not permitted other safe features include strong typing including constrained variants only no side effects in expressions or functions etc
we address the problem of proving the total correctness of transformations of definite logic programs we consider general transformation rule called clause replacement which consists in transforming program p into new program q by replacing set of clauses occurring in p by new set of clauses provided that the two sets are equivalent in the least herbrand model of the program p we propose general method for proving that transformations based on clause replacement are totally correct that is our method consists in showing that the transformation of p into q can be performed by i adding extra arguments to predicates thereby deriving from the given program p an annotated program p̄ ii applying variant of the clause replacement rule and transforming the annotated program p̄ into terminating annotated program q̄ and iii erasing the annotations from q̄ thereby getting q our method does not require that either p or q are terminating and it is parametric with respect to the annotations by providing different annotations we can easily prove the total correctness of program transformations based on various versions of the popular unfolding folding and goal replacement rules which can all be viewed as particular cases of our clause replacement rule
fundamental natural interaction concept is not yet fully exploited in most of the existing human computer interfaces recent technological advances have created the possibility to naturally and significantly enhance the interface perception by means of visual inputs the so called vision based interaction vbi in this paper we present gesture recognition algorithm where the user’s movements are obtained through real time vision based motion capture system specifically we focus on recognizing users motions with particular mean that is gesture defining an appropriate representation of the user’s motions based on temporal posture parameterization we apply non parametric techniques to learn and recognize the user’s gestures in real time this scheme of recognition has been tested for controlling classical computer videogame the results obtained show an excellent performance in on line classification and it allows the possibility to achieve learning phase in real time due to its computational simplicity
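the abstract does not name the concrete non parametric classifier, so the sketch below uses a common stand in for recognizing temporally parameterized posture sequences: 1 nearest neighbour classification under dynamic time warping; the function names and the one dimensional distance are illustrative assumptions, not the authors' method

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def classify_gesture(query, training_set):
    """1-nearest-neighbour over DTW; training_set is a list of (sequence, label) pairs."""
    return min(training_set, key=lambda ex: dtw_distance(query, ex[0]))[1]

for multi dimensional posture vectors the per frame distance would simply be replaced by a euclidean distance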
in an attempt to cope with time varying workload traditional adaptive time warp protocols are designed to react in response to performance changes by altering control parameter configurations like the amount of available memory the size of the checkpointing interval the frequency of gvt computation fossil collection invocations etc we call those schemes reactive because all control decisions are undertaken based on historical performance information collected at runtime and come into effect in future system states what must be considered drawback of this class of approaches is that time warp logical processes lps have most likely reached state different from the one for which the control action was established thus inducing performance control activities which are always outdated this paper develops environment aware self adaptive time warp lps implementing pro active performance control scheme addressing the timeliness of control decisions opposed to reactive tw schemes our pro active control mechanism based on statistical analysis of the state history periodically collected in real time intervals of size delta forecasts future lp state performance control decision is established that is most appropriate for the expected future lp state ie the state when the corresponding control activity would become effective depending on the forecast quality pro active scheme will presumably exhibit performance superior to re active schemes at least for cases where state changes in the time frame delta are very likely in this paper we study the ability of pro active tw lps to adapt to sudden load changes especially to abruptly occurring background workloads injected by other applications executing concurrently with the tw simulation on network of workstations experimental results show that the protocol is able to capture abrupt changes in both computational and communication resource availability justifying the title shock resistant time warp
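a pro active controller needs some forecast of the future lp state; as a minimal hypothetical illustration (the paper's statistical analysis is not reproduced) the sketch below applies double exponential smoothing to the periodically sampled state history and predicts the value one control interval ahead

def holt_forecast(history, alpha=0.5, beta=0.3, horizon=1):
    """Double exponential smoothing (Holt) forecast of a sampled state metric.

    history: periodic samples of some LP state indicator (e.g. rollback rate per interval).
    Returns the predicted value `horizon` intervals ahead; smoothing constants are arbitrary.
    """
    if len(history) < 2:
        return history[-1] if history else 0.0
    level, trend = history[0], history[1] - history[0]
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

a controller could then, for instance, throttle optimism when the forecast indicator rather than the observed one crosses a threshold, which is the timeliness idea the abstract describes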
multiresolution meshes enable us to build representations of geometric objects at different levels of detail lods we introduce multiresolution scheme whose data structure allows us to separately restore the geometry and topology of mesh during the refinement process additionally we use topological criterion not geometric criterion as usual in the literature to quickly simplify mesh what seems to make the corresponding simplification algorithm adequate for real time applications such as for example on line computer games
recent progress in energy harvesting technologies made it possible to build sensor networks with rechargeable nodes which target an indefinitely long operation in these networks the goal of energy management is to allocate the available energy such that the important performance metrics such as the number of detected threats are maximized as the harvested energy is not sufficient for continuous operation the scheduling of the active and inactive time is one of the main components of energy management the active time scheduling protocols need to maintain the energy equilibrium of the nodes while considering the uncertainties of the energy income which is strongly influenced by the weather and the energy expenditures which are dependent on the behavior of the targets in this paper we describe and experimentally compare three active time scheduling protocols static active time dynamic active time based on multi parameter heuristic and utility based uniform sensing we show that protocols which take into consideration the probabilistic models of the energy income and expenditure and can dynamically adapt to changes in the environment can provide significant performance advantage
so far virtual machine vm migration has focused on transferring the run time memory state of the vms in local area networks lan however for wide area network wan migration it is crucial to not just transfer the vms image but also transfer its local persistent state its file system and its on going network connections in this paper we address both by combining block level solution with pre copying and write throttling we show that we can transfer an entire running web server including its local persistent state with minimal disruption three seconds in the lan and seconds in the wan by combining dyndns with tunneling existing connections can continue transparently while new ones are redirected to the new network location thus we show experimentally that by combining well known techniques in novel manner we can provide system support for migrating virtual execution environments in the wide area
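the pre copying idea can be summarized by the toy loop below; the dirty page tracker, the round limit and the stop and copy threshold are placeholders rather than the system's actual mechanisms

def precopy_migrate(pages, get_dirty_pages, send, max_rounds=5, stop_threshold=64):
    """Iterative pre-copy of a VM's memory or disk blocks while it keeps running.

    pages: dict page_id -> bytes (the state to move).
    get_dirty_pages(): set of page_ids written since the previous call (assumed tracker).
    send(page_id, data): ships one page to the destination host.
    max_rounds / stop_threshold: illustrative cut-offs, not the system's real values.
    """
    for pid, data in pages.items():        # round 0: push everything, source still running
        send(pid, data)
    dirty = get_dirty_pages()

    rounds = 1
    while len(dirty) > stop_threshold and rounds < max_rounds:
        for pid in dirty:                  # re-send only what was dirtied meanwhile
            send(pid, pages[pid])
        dirty = get_dirty_pages()
        rounds += 1

    for pid in dirty:                      # short stop-and-copy: the VM is paused here in a real system
        send(pid, pages[pid])
    return rounds

write throttling, as mentioned above, keeps the dirty set between rounds small enough for the final pause to stay in the seconds range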
communication set generation significantly influences the performance of parallel programs however studies seldom give attention to the problem of communication set generation for irregular applications in this paper we propose communication optimization techniques for the situation of irregular array references in nested loops in our methods the local array distribution schemes are determined so that the total number of communication messages is minimized then we explain how to support communication set generation at compile time by introducing some symbolic analysis techniques in our symbolic analysis symbolic solutions of set of symbolic expression are obtained by using certain restrictions we introduce symbolic analysis algorithms to obtain the solutions in terms of set of equalities and inequalities finally experimental results on parallel computer cm are presented to validate our approach
the successful deployment of security policy is closely related not only to the complexity of the security requirements but also to the capabilities functionalities of the security devices the complexity of the security requirements is additionally increased when contextual constraints are taken into account such situations appear when addressing the dynamism of some security requirements or when searching finer granularity for the security rules the context denotes those specific conditions in which the security requirements are to be met re deploying contextual security policy depends on the security device functionalities either the devices include all functionalities necessary to deal with context and the policy is consequently deployed for ensuring its automatic changes or the devices do not have the right functionalities to entirely interpret contextual requirement we present solution to cope with this issue the re deployment of access control policies in system that lacks the necessary functionalities to deal with contexts
memory management is critical issue in stream processing involving stateful operators such as join traditionally the memory requirement for stream join is query driven query has to explicitly define window for each potentially unbounded input the window essentially bounds the size of the buffer allocated for that stream however output produced this way may not be desirable if the window size is not part of the intended query semantic due to the volatile input characteristics we discover that when streams are ordered or partially ordered it is possible to use data driven memory management scheme to improve the performance in this work we present novel data driven memory management scheme called window oblivious join wo join which adaptively adjusts the state buffer size according to the input characteristics our performance study shows that compared to traditional window join join wo join is more robust with respect to the dynamic input and therefore produces higher quality results with lower memory costs
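as a generic illustration of data driven state management (not the wo join algorithm itself), the sketch below joins two streams that are ordered on the join key and purges buffered tuples as soon as the opposite stream has advanced past their key

import collections, itertools

def interleave(stream_r, stream_s):
    """Round-robin arrival order used only for this sketch; a real engine is event driven."""
    r = (("R", t) for t in stream_r)
    s = (("S", t) for t in stream_s)
    for pair in itertools.zip_longest(r, s):
        for item in pair:
            if item is not None:
                yield item

def streaming_equi_join(stream_r, stream_s, key=lambda t: t[0]):
    """Equi-join of two streams ordered (non-decreasing) on the join key.

    Buffered tuples are dropped as soon as the other stream has advanced strictly
    past their key, so state size follows the data instead of a fixed window.
    Yields joined pairs (r_tuple, s_tuple).
    """
    buf = {"R": collections.defaultdict(list), "S": collections.defaultdict(list)}
    for side, tup in interleave(stream_r, stream_s):
        other = "S" if side == "R" else "R"
        k = key(tup)
        for match in buf[other].get(k, []):            # probe the opposite buffer
            yield (tup, match) if side == "R" else (match, tup)
        buf[side][k].append(tup)                       # remember for future matches
        for dead in [kk for kk in buf[other] if kk < k]:
            del buf[other][dead]                       # data-driven purge: can never match again

for example list(streaming_equi_join([(1, "a"), (3, "b")], [(1, "x"), (2, "y"), (3, "z")])) yields the pairs with matching keys while never holding more state than the order of the inputs requires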
we investigate the differences in terms of both quantitative performance and subjective preference between direct touch and mouse input for unimanual and bimanual tasks on tabletop displays the results of two experiments show that for bimanual tasks performed on tabletops users benefit from direct touch input however our results also indicate that mouse input may be more appropriate for single user working on tabletop tasks requiring only single point interaction
implementing first class continuations can pose challenge if the target machine makes no provisions for accessing and re installing the run time stack in this paper we present novel translation that overcomes this problem in the first half of the paper we introduce theoretical model that shows how to eliminate the capture and the use of first class continuations in the presence of generalized stack inspection mechanism the second half of the paper explains how to translate this model into practice in two different contexts first we reformulate the servlet interaction language in the plt web server which heavily relies on first class continuations using our technique servlet programs can be run directly under the control of non cooperative web servers such as apache second we show how to use our new technique to copy and reconstitute the stack on msil .net using exception handlers this establishes that scheme’s first class continuations can exist on non cooperative virtual machines
automatic text classification is an important task for many natural language processing applications this paper presents neural approach to develop text classifier based on the learning vector quantization lvq algorithm the lvq model is classification method that uses competitive supervised learning algorithm the proposed method has been applied to two specific tasks text categorization and word sense disambiguation experiments were carried out using the reuters text collection for text categorization and the senseval corpus for word sense disambiguation the results obtained are very promising and show that our neural approach based on the lvq algorithm is an alternative to other classification systems
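the heart of lvq is the prototype update rule; the following plain lvq1 sketch over dense feature vectors (e.g. tf idf weighted term vectors) shows it, with the decaying learning rate schedule chosen arbitrarily for illustration rather than taken from the paper

import numpy as np

def train_lvq1(X, y, prototypes, proto_labels, epochs=20, lr0=0.1):
    """LVQ1 training: the winning prototype is pulled toward a sample of its own
    class and pushed away from samples of other classes."""
    P = np.asarray(prototypes, dtype=float).copy()
    proto_labels = list(proto_labels)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)                       # decaying learning rate (assumed schedule)
        for x, label in zip(np.asarray(X, dtype=float), y):
            w = int(np.argmin(np.linalg.norm(P - x, axis=1)))   # best matching prototype
            sign = 1.0 if proto_labels[w] == label else -1.0
            P[w] += sign * lr * (x - P[w])
    return P

def predict_lvq(x, prototypes, proto_labels):
    d = np.linalg.norm(np.asarray(prototypes, dtype=float) - np.asarray(x, dtype=float), axis=1)
    return proto_labels[int(np.argmin(d))]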
multicast is one of the most frequently used collective communication operations in multi core soc platforms bus as the traditional interconnect architecture for soc development has been highly efficient in delivering multicast messages since the bus is non scalable it can not address the bandwidth requirements of the large socs the networks on chip nocs emerged as scalable alternative to address the increasing communication demands of such systems however due to its hop to hop communication the nocs may not be able to deliver multicast operations as efficiently as buses do adopting multi port routers has been an approach to improve the performance of the multicast operations in interconnection networks this paper presents novel analytical model to compute communication latency of the multicast operation in wormhole routed interconnection networks employing asynchronous multi port routers scheme the model is applied to the quarc noc and its validity is verified by comparing the model predictions against the results obtained from discrete event simulator developed using omnet
queries that return list of frequently occurring items are important in the analysis of real time internet packet streams while several results exist for computing top queries using limited memory in the infinite stream model eg limited memory sliding windows to compute the statistics over sliding window synopsis data structure can be maintained for the stream to compute the statistics rapidly usually top query is always processed over an equal synopsis but it’s very hard to implement over an unequal synopsis because of the resulting inaccurate approximate answers therefore in this paper we focus on periodically refreshed top queries over sliding windows on internet traffic streams we present deterministic dsw dynamic sub window algorithm to support the processing of top aggregate queries over an unequal synopsis and guarantee the accuracy of the approximation results
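a common way to keep top k answers refreshable over a sliding window is to maintain exact counts per sub window and merge them on demand, expiring whole sub windows as the window slides; the sketch below follows that generic idea and does not reproduce the dsw specifics such as unequal sub window sizes

import collections

class SubWindowTopK:
    """Top-k frequent items over a sliding window built from small sub-windows."""

    def __init__(self, num_subwindows):
        self.num_subwindows = num_subwindows
        self.subwindows = collections.deque()     # one Counter per closed sub-window
        self.current = collections.Counter()      # the sub-window currently being filled

    def insert(self, item):
        self.current[item] += 1

    def roll(self):
        """Close the current sub-window; drop the oldest one once the window is full."""
        self.subwindows.append(self.current)
        self.current = collections.Counter()
        if len(self.subwindows) > self.num_subwindows:
            self.subwindows.popleft()

    def topk(self, k):
        total = collections.Counter()
        for c in self.subwindows:
            total.update(c)
        total.update(self.current)
        return total.most_common(k)

periodically refreshed answers then amount to calling topk after each roll, which is cheap because only per sub window counters are merged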
concurrent ml cml is statically typed higher order concurrent language that is embedded in standard ml its most notable feature is its support for first class synchronous operations this mechanism allows programmers to encapsulate complicated communication and synchronization protocols as first class abstractions which encourages modular style of programming where the underlying channels used to communicate with given thread are hidden behind data and type abstraction while cml has been in active use for well over decade little attention has been paid to optimizing cml programs in this paper we present new program analysis for statically typed higher order concurrent languages that enables the compile time specialization of communication operations this specialization is particularly important in multiprocessor or multicore setting where the synchronization overhead for general purpose operations are high preliminary results from prototype that we have built demonstrate that specialized channel operations are much faster than the general purpose operations our analysis technique is modular ie it analyzes and optimizes single unit of abstraction at time which plays to the modular style of many cml programs the analysis consists of three steps the first is type sensitive control flow analysis that uses the program’s type abstractions to compute more precise results the second is the construction of an extended control flow graph using the results of the cfa the last step is an iterative analysis over the graph that approximates the usage patterns of known channels our analysis is designed to detect special patterns of use such as one shot channels fan in channels and fan out channels we have proven the safety of our analysis and state those results
an (α,β) spanner of graph g is subgraph h that approximates distances in g within multiplicative factor α and an additive error β ensuring that for any two nodes u and v d_h(u,v) ≤ α·d_g(u,v) + β this paper concerns algorithms for the distributed deterministic construction of sparse (α,β) spanner for given graph and distortion parameters α and β it first presents generic distributed algorithm that in constant number of rounds constructs for every node graph and integer an spanner of βn edges where and are constants depending on for suitable parameters this algorithm provides spanner of at most kn edges in rounds matching the performances of the best known distributed algorithm by derbel et al podc for and constant it can also produce spanner of edges in constant time more interestingly for every integer it can construct in constant time spanner of edges such deterministic construction was not previously known the paper also presents second generic deterministic and distributed algorithm based on the construction of small dominating sets and maximal independent sets after computing such sets in sub polynomial time it constructs at its best spanner with βn edges where klog log for it provides spanner with edges the additive terms in the stretch of our constructions yield the best trade off currently known between and due to elkin and peleg stoc our distributed algorithms are rather short and can be viewed as unification and simplification of previous constructions
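written out in standard notation the guarantee behind the definition above is

d_H(u,v) \;\le\; \alpha \cdot d_G(u,v) + \beta \qquad \text{for all } u, v \in V(G)

for the purely multiplicative case (β = 0) the classical size/stretch trade off is that every n node graph admits a (2k−1)-spanner with O(n^{1+1/k}) edges, which is the kind of balance the constructions above target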
we introduce spidercast distributed protocol for constructing scalable churn resistant overlay topologies for supporting decentralized topic based pub sub communication spidercast is designed to effectively tread the balance between average overlay degree and communication cost of event dissemination it employs novel coverage optimizing heuristic in which the nodes utilize partial subscription views provided by decentralized membership service to reduce the average node degree while guaranteeing with high probability that the events posted on each topic can be routed solely through the nodes interested in this topic in other words the overlay is topic connected spidercast is unique in maintaining an overlay topology that scales well with the average number of topics node is subscribed to assuming the subscriptions are correlated insofar as found in most typical workloads furthermore the degree grows logarithmically in the total number of topics and slowly decreases as the number of nodes increases we show experimentally that for many practical work loads the spidercast overlays are both topic connected and have low per topic diameter while requiring each node to maintain low average number of connections these properties are satisfied even in very large settings involving up to nodes topics and subscriptions per node and under high churn rates in addition our results demonstrate that in large setting the average node degree in spidercast is at least smaller than in other overlays typically used to support decentralized pub sub communication such as eg similarity based rings based and random overlays
this chapter surveys the topic of active rules and active databases we analyze the state of the art of active databases and active rules their properties and applications in particular we describe the case of triggers following the sql standard committee point of view then we consider the case of dynamic constraints for which we use temporal logic formalism finally we discuss the applicability limitations and partial solutions found when attempting to ensure the satisfaction of dynamic constraints
the accuracy of methods for the assessment of mammographic risk analysis is heavily related to breast tissue characteristics previous work has demonstrated considerable success in developing an automatic breast tissue classification methodology which overcomes this difficulty this paper proposes unified approach for the application of number of rough and fuzzy rough set methods to the analysis of mammographic data indeed this is the first time that fuzzy rough approaches have been applied to this particular problem domain in the unified approach detailed here feature selection methods are employed for dimensionality reduction developed using rough sets and fuzzy rough sets number of classifiers are then used to examine the data reduced by the feature selection approaches and assess the positive impact of these methods on classification accuracy additionally this paper also employs new fuzzy rough classifier based on the nearest neighbour classification algorithm the novel use of such an approach demonstrates its efficiency in improving classification accuracy for mammographic data as well as considerably removing redundant irrelevant and noisy features this is supported with experimental application to two well known datasets the overall result of employing the proposed unified approach is that feature selection can identify only those features which require extraction this can have the positive effect of increasing the risk assessment accuracy rate whilst additionally reducing the time required for expert scrutiny which in turn means the risk analysis process is potentially quicker and involves less screening
many complex analysis problems can be most clearly and easily specified as logic rules and queries where rules specify how given facts can be combined to infer new facts and queries select facts of interest to the analysis problem at hand however it has been extremely challenging to obtain efficient implementations from logic rules and to understand their time and space complexities especially for on demand analysis driven by queries this paper describes powerful method for generating specialized rules and programs for demand driven analysis from datalog rules and queries and further for providing time and space complexity guarantees the method combines recursion conversion with specialization of rules and then uses method for program generation and complexity calculation from rules we compare carefully with the best prior methods by examining many variants of rules and queries for the same graph reachability problems and show the application of our method in implementing graph query languages in general
understanding the intent underlying user’s queries may help personalize search results and therefore improve user satisfaction we develop methodology for using the content of search engine result pages serps along with the information obtained from query strings to study characteristics of query intent with particular focus on sponsored search this work represents an initial step towards the development and evaluation of an ontology for commercial search considering queries that reference specific products brands and retailers the characteristics of query categories are studied with respect to aggregated user’s clickthrough behavior on advertising links we present model for clickthrough behavior that considers the influence of such factors as the location of ads and the rank of ads along with query category we evaluate our work using large corpus of clickthrough data obtained from major commercial search engine our findings suggest that query based features along with the content of serps are effective in detecting query intent the clickthrough behavior is found to be consistent with the classification for the general categories of query intent while for product brand and retailer categories all is true to lesser extent
advances in wireless and mobile computing environments allow mobile user to access wide range of applications for example mobile users may want to retrieve data about unfamiliar places or local life styles related to their location these queries are called location dependent queries furthermore mobile user may be interested in getting the query results repeatedly which is called location dependent continuous querying this continuous query emanating from mobile user may retrieve information from single zone single zq or from multiple neighbouring zones multiple zq we consider the problem of handling location dependent continuous queries with the main emphasis on reducing communication costs and making sure that the user gets correct current query result the key contributions of this paper include proposing hierarchical database framework tree architecture and supporting continuous query algorithm for handling location dependent continuous queries analysing the flexibility of this framework for handling queries related to single zq or multiple zq and propose intelligent selective placement of location dependent databases proposing an intelligent selective replication algorithm to facilitate time and space efficient processing of location dependent continuous queries retrieving single zq information demonstrating using simulation the significance of our intelligent selective placement and selective replication model in terms of communication cost and storage constraints considering various types of queries
studies have shown that for significant fraction of the time workstations are idle in this paper we present new scheduling policy called linger longer that exploits the fine grained availability of workstations to run sequential and parallel jobs we present two level workload characterization study and use it to simulate cluster of workstations running our new policy we compare two variations of our policy to two previous policies immediate eviction and pause and migrate our study shows that the linger longer policy can improve the throughput of foreign jobs on cluster by percent with only percent slowdown of local jobs for parallel computing we show that the linger longer policy outperforms reconfiguration strategies when the processor utilization by the local process is percent or less in both synthetic bulk synchronous and real data parallel applications
protection of one’s intellectual property is topic with important technological and legal facets the significance of this issue is amplified nowadays due to the ease of data dissemination through the internet here we provide technological mechanisms for establishing the ownership of dataset consisting of multiple objects the objects that we consider in this work are shapes ie two dimensional contours which abound in disciplines such as medicine biology anthropology and natural sciences the protection of the dataset is achieved through means of embedding of an imperceptible ownership seal that imparts only minute visual distortions this seal needs to be embedded in the proper data space so that its removal or destruction is particularly difficult our technique is robust to many common transformations such as data rotation translation scaling noise addition and resampling in addition to that the proposed scheme also guarantees that important distances between the dataset shapes objects are not distorted we achieve this by preserving the geodesic distances between the dataset objects geodesic distances capture significant part of the dataset structure and their usefulness is recognized in many machine learning visualization and clustering algorithms therefore if practitioner uses the protected dataset as input to variety of mining machine learning or database operations the output will be the same as on the original dataset we illustrate and validate the applicability of our methods on image shapes extracted from anthropological and natural science data
we describe parallel real time garbage collector and present experimental results that demonstrate good scalability and good real time bounds the collector is designed for shared memory multiprocessors and is based on an earlier collector algorithm which provided fixed bounds on the time any thread must pause for collection however since our earlier algorithm was designed for simple analysis it had some impractical features this paper presents the extensions necessary for practical implementation reducing excessive interleaving handling stacks and global variables reducing double allocation and special treatment of large and small objects an implementation based on the modified algorithm is evaluated on set of sml benchmarks on sun enterprise way ultrasparc ii multiprocessor to the best of our knowledge this is the first implementation of parallel real time garbage collector the average collector speedup is at processors and at processors maximum pause times range from ms to ms in contrast non incremental collector whether generational or not has maximum pause times from ms to ms compared to non parallel stop copy collector parallelism has overhead while real time behavior adds an additional overhead since the collector takes about of total execution time these features have an overall time costs of and
distributed applications have become core component of the internet’s infrastructure however many undergraduate curriculums especially at small colleges do not offer courses that focus on the design and implementation of distributed systems the courses that are offered address the theoretical aspects of system design but often fail to provide students with the opportunity to develop and evaluate distributed applications in real world environments as result undergraduate students are not as prepared as they should be for graduate study or careers in industry this paper describes an undergraduate course in distributed systems that not only studies the key design principles of distributed systems but also has unique emphasis on giving students hands on access to distributed systems through the use of shared computing testbeds such as planetlab and geni and open source technologies such as xen and hadoop using these platforms students can perform large scale distributed experimentation even at small colleges
network architectures for collaborative virtual reality have traditionally been dominated by client server and peer to peer approaches with peer to peer strategies typically being favored where minimizing latency is priority and client server where consistency is key with increasingly sophisticated behavior models and the demand for better support for haptics we argue that neither approach provides sufficient support for these scenarios and thus hybrid architecture is required we discuss the relative performance of different distribution strategies in the face of real network conditions and illustrate the problems they face finally we present an architecture that successfully meets many of these challenges and demonstrate its use in distributed virtual prototyping application which supports simultaneous collaboration for assembly maintenance and training applications utilizing haptics
we describe new application controlled file persistence model in which applications select the desired stability from range of persistence guarantees this new abstraction extends conventional abstractions by allowing applications to specify file’s volatility and methods for automatic reconstruction in case of loss the model allows applications particularly ones with weak persistence requirements to leverage the memory space of other machines to improve their performance an automated filename matching interface permits legacy applications to take advantage of the variable persistence guarantees without being modified our prototype implementation shows significant speed ups in some cases more than an order of magnitude over conventional network file systems such as nfs version
data caches are key hardware means to bridge the gap between processor and memory speeds but only for programs that exhibit sufficient data locality in their memory accesses thus method for evaluating cache performance is required to both determine quantitatively cache misses and to guide data cache optimizations existing analytical models for data cache optimizations target mainly isolated perfect loop nests we present an analytical model that is capable of statically analyzing not only loop nest fragments but also complete numerical programs with regular and compile time predictable memory accesses central to the whole program approach are abstract call inlining memory access vectors and parametric reuse analysis which allow the reuse and interference both within and across loop nests to be quantified precisely in unified framework based on the framework the cache misses of program are specified using mathematical formulas and the miss ratio is predicted from these formulas based on statistical sampling techniques our experimental results using kernels and whole programs indicate accurate cache miss estimates in substantially shorter amount of time typically several orders of magnitude faster than simulation
indexes are commonly employed to retrieve portion of file or to retrieve its records in particular order an accurate performance model of indexes is essential to the design analysis and tuning of file management and database systems and particularly to database query optimization many previous studies have addressed the problem of estimating the number of disk page fetches when randomly accessing records out of given records stored on disk pages this paper generalizes these results relaxing two assumptions that usually do not hold in practice unlimited buffer and unique records for each key value experiments show that the performance of an index scan is very sensitive to buffer size limitations and multiple records per key value model for these more practical situations is presented and formula derived for estimating the performance of an index scan we also give closed form approximation that is easy to compute the theoretical results are validated using the distributed relational database system although we use database terminology throughout the paper the model is more generally applicable whenever random accesses are made using keys
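the classical starting point for such estimates is yao's formula for the expected number of distinct pages touched when k of n records, spread evenly over m pages, are accessed at random; the paper's generalizations (limited buffer, duplicate key values) are not attempted in the sketch below

from math import comb

def yao_pages(n, m, k):
    """Expected page fetches when k of n records (m pages, n/m records each) are accessed.

    Yao's formula: m * (1 - C(n - n/m, k) / C(n, k)).  Assumes uniform placement
    and no buffering of previously read pages.
    """
    per_page = n // m
    return m * (1.0 - comb(n - per_page, k) / comb(n, k))

def cardenas_pages(m, k):
    """Cardenas' cheaper closed-form approximation: m * (1 - (1 - 1/m)**k)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** k)

# e.g. yao_pages(10_000, 100, 500) and cardenas_pages(100, 500) both come out
# close to 99 of the 100 pages, i.e. a 5% selection already touches nearly every page

the buffer size and duplicate key effects studied in the paper matter precisely because these classical estimates ignore them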
visibility problems are central to many computer graphics applications the most common examples include hidden part removal for view computation shadow boundaries mutual visibility of objects for lighting simulation in this paper we present theoretical study of visibility properties for scenes of smooth convex objects we work in the space of light rays or more precisely of maximal free segments we group segments that see the same object this defines the visibility complex the boundaries of these groups of segments correspond to the visual events of the scene limits of shadows disappearance of an object when the viewpoint is moved etc we provide worst case analysis of the complexity of the visibility complex of scenes as well as probabilistic study under simple assumption for normal scenes we extend the visibility complex to handle temporal visibility we give an output sensitive construction algorithm and present applications of our approach
wireless sensor network wsn design is complicated by the interaction of three elements localization medium access control and routing the design of wireless sensor networks necessarily focuses on the network’s intended purpose as well as an understanding of the synergies and tradeoffs between each of these three elements we propose five guidelines for the development of wireless sensor network and the performance and suitability of several protocols and algorithms are examined with respect to those guidelines we also suggest several performance metrics to provide fair comparisons between various localization medium access control and routing protocols
while the past research discussed several advantages of multiprocessor system on chip mpsoc architectures from both area utilization and design verification perspectives over complex single core based systems compilation issues for these architectures have relatively received less attention programming mpsocs can be challenging as several potentially conflicting issues such as data locality parallelism and load balance across processors should be considered simultaneously most of the compilation techniques discussed in the literature for parallel architectures not necessarily for mpsocs are loop based ie they consider each loop nest in isolation however one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application this paper takes more global approach to the problem and proposes compiler driven data locality optimization strategy in the context of embedded mpsocs an important characteristic of the proposed approach is that in deciding the workloads of the processors ie in parallelizing the application it considers all the loop nests in the application simultaneously our experimental evaluation with eight embedded applications shows that the global scheme brings significant power performance benefits over the conventional loop based scheme
we propose new real time authentication scheme for memory as in previous proposals the scheme uses merkle tree to guarantee dynamic protection of memory we use the universal hash function family nh for speed and couple it with an aes encryption in order to achieve high level of security the proposed scheme is much faster compared to similar schemes achieved by cryptographic hash functions such as sha due to the finer grain incremental hashing ability provided by nh this advantage in speed becomes more vivid when the frequency of integrity checks becomes much lower than the frequency of memory updating this feature is mainly due to the incremental nature of nh moreover we show that with small variation in the universal hash function family used we can achieve fast and simple software implementation
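the speed of the scheme comes from the nh family known from umac; a word oriented version over 32 bit words can be written as below, with the subsequent aes encryption of the digest omitted; the word size and key handling are assumptions of the sketch

W = 32                         # word size in bits (assumption for this sketch)
MASK_W = (1 << W) - 1
MASK_2W = (1 << (2 * W)) - 1

def nh_hash(message_words, key_words):
    """NH universal hash over pairs of w-bit words:

        NH_K(M) = sum_i ((m_{2i} + k_{2i}) mod 2^w) * ((m_{2i+1} + k_{2i+1}) mod 2^w)  mod 2^{2w}

    message_words must have even length and len(key_words) >= len(message_words).
    """
    assert len(message_words) % 2 == 0
    acc = 0
    for i in range(0, len(message_words), 2):
        a = (message_words[i] + key_words[i]) & MASK_W
        b = (message_words[i + 1] + key_words[i + 1]) & MASK_W
        acc = (acc + a * b) & MASK_2W
    return acc

because nh is a modular sum of per pair products, updating one word pair only requires replacing that pair's contribution, which is the fine grain incremental property the abstract refers to; in the described scheme the digest of a memory block would then be encrypted, e.g. with aes, before being stored in the merkle tree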
query optimizer compares alternative plans in its search space to find the best plan for given query depending on the search space and the enumeration algorithm optimizers vary in their compilation time and the quality of the execution plan they can generate this paper describes compilation time estimator that provides quantified estimate of the optimizer compilation time for given query such an estimator is useful for automatically choosing the right level of optimization in commercial database systems in addition compilation time estimates can be quite helpful for mid query reoptimization for monitoring the progress of workload analysis tools where large number queries need to be compiled but not executed and for judicious design and tuning of an optimizerprevious attempts to estimate optimizer compilation complexity used the number of possible binary joins as the metric and overlooked the fact that each join often translates into different number of join plans because of the presence of physical properties we use the number of plans instead of joins to estimate query compilation time and employ two novel ideas reusing an optimizer’s join enumerator to obtain actual number of joins but bypassing plan generation to save estimation overhead maintaining small number of interesting properties to facilitate plan counting we prototyped our approach in commercial database system and our experimental results show that we can achieve good compilation time estimates less than error on average for complex real queries using small fraction within of the actual compilation time
iterative compilation is an efficient approach to optimize programs on rapidly evolving hardware but it is still only scarcely used in practice due to necessity to gather large number of runs often with the same data set and on the same environment in order to test many different optimizations and to select the most appropriate ones naturally in many cases users cannot afford training phase will run each data set once develop new programs which are not yet known and may regularly change the environment the programs are run on in this article we propose to overcome that practical obstacle using collective optimization where the task of optimizing program leverages the experience of many other users rather than being performed in isolation and often redundantly by each user collective optimization is an unobtrusive approach where performance information obtained after each run is sent back to central database which is then queried for optimizations suggestions and the program is then recompiled accordingly we show that it is possible to learn across data sets programs and architectures in non dynamic environments using static function cloning and run time adaptation without even reference run to compute speedups over the baseline optimization we also show that it is possible to simultaneously learn and improve performance since there are no longer two separate training and test phases as in most studies we demonstrate that extensively relying on competition among pairs of optimizations program reaction to optimizations provides robust and efficient method for capturing the impact of optimizations and for reusing this knowledge across data sets programs and environments we implemented our approach in gcc and will publicly disseminate it in the near future
this paper describes memory discipline that combines region based memory management and copying garbage collection by extending cheney’s copying garbage collection algorithm to work with regions the paper presents empirical evidence that region inference very significantly reduces the number of garbage collections and evidence that the fastest execution is obtained by using regions alone without garbage collection the memory discipline is implemented for standard ml in the ml kit compiler and measurements show that for variety of benchmark programs code generated by the compiler is as efficient both with respect to execution time and memory usage as programs compiled with standard ml of new jersey another state of the art standard ml compiler
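for readers unfamiliar with cheney's algorithm the toy two semispace collector below shows the scan/free pointer discipline that the paper extends with regions; the object model (python dicts holding heap indices) is purely illustrative

def cheney_collect(from_space, roots):
    """Toy Cheney copying collection.

    from_space: list of objects; each object is {"data": ..., "fields": [from-space indices]}.
    roots: list of from-space indices reachable from the program.
    Returns (to_space, new_roots).
    """
    to_space = []
    forward = {}                            # from-space index -> to-space index (forwarding pointers)

    def copy(idx):
        if idx in forward:                  # already evacuated: follow the forwarding pointer
            return forward[idx]
        obj = from_space[idx]
        to_space.append({"data": obj["data"], "fields": list(obj["fields"])})
        forward[idx] = len(to_space) - 1
        return forward[idx]

    new_roots = [copy(r) for r in roots]    # evacuate the roots first

    scan = 0                                # Cheney scan pointer; len(to_space) acts as the free pointer
    while scan < len(to_space):
        obj = to_space[scan]
        obj["fields"] = [copy(f) for f in obj["fields"]]   # evacuate children, redirect references
        scan += 1
    return to_space, new_roots

in the region extended discipline described above, whole regions can be reclaimed without ever being traced, which is why region inference reduces how often a pass like this is needed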
this paper introduces unifi tool that attempts to automatically detect dimension errors in java programs unifi infers dimensional relationships across primitive type and string variables in program using an inter procedural context sensitive analysis it then monitors these dimensional relationships as the program evolves flagging inconsistencies that may be errors unifi requires no programmer annotations and supports arbitrary program specific dimensions thus providing fine grained dimensional consistency checking unifi exploits features of object oriented languages but can be used for other languages as well we have run unifi on real life java code and found that it is useful in exposing dimension errors we present case study of using unifi on nightly builds of line code base as it evolved over months
embedded digital signal processors for software defined radio have stringent design constraints including high computational bandwidth low power consumption and low interrupt latency furthermore due to rapidly evolving communication standards with increasing code complexity these processors must be compiler friendly so that code for them can quickly be developed in high level language in this paper we present the design of the sandblaster processor low power multithreaded digital signal processor for software defined radio the processor uses unique combination of token triggered threading powerful compound instructions and simd vector operations to provide real time baseband processing capabilities with very low power consumption we describe the processor’s architecture and microarchitecture along with various techniques for achieving high performance and low power dissipation we also describe the processor’s programming environment and the sb platform complete system on chip solution for software defined radio using super computer class vectorizing compiler the sb achieves real time performance in software on variety of communication protocols including gps am fm radio bluetooth gprs and wcdma in addition to providing programmable platform for sdr the processor also provides efficient support for wide variety of digital signal processing and multimedia applications
the logic programming paradigm provides the basis for new intensional view of higher order notions this view is realized primarily by employing the terms of typed lambda calculus as representational devices and by using richer form of unification for probing their structures these additions have important meta programming applications but they also pose non trivial implementation problems one issue concerns the machine representation of lambda terms suitable to their intended use an adequate encoding must facilitate comparison operations over terms in addition to supporting the usual reduction computation another aspect relates to the treatment of unification operation that has branching character and that sometimes calls for the delaying of the solution of unification problems final issue concerns the execution of goals whose structures become apparent only in the course of computation these various problems are exposed in this paper and solutions to them are described satisfactory representation for lambda terms is developed by exploiting the nameless notation of de bruijn as well as explicit encodings of substitutions special mechanisms are molded into the structure of traditional prolog implementations to support branching in unification and carrying of unification problems over other computation steps premium is placed in this context on exploiting determinism and on emulating usual first order behaviour an extended compilation model is presented that treats higher order unification and also handles dynamically emergent goals the ideas described here have been employed in the teyjus implementation of the lambda prolog language fact that is used to obtain preliminary assessment of their efficacy
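the nameless notation of de bruijn mentioned above can be illustrated with the textbook shift and substitute operations on de bruijn terms; this is the plain version, not the explicit substitution encoding used in the teyjus implementation

# de Bruijn terms: ("var", k) | ("lam", body) | ("app", f, a)

def shift(t, d, cutoff=0):
    """Add d to every free variable index in t (indices >= cutoff are free)."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Substitute term s for variable index j inside t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def beta_reduce(app):
    """One beta step: (lam body) a  -->  body[0 := a], with the usual index bookkeeping."""
    _, (_, body), arg = app
    return shift(subst(body, 0, shift(arg, 1)), -1)

# example: (\x.\y. x) a  -->  \y. a   (a represented as free variable index 0)
# beta_reduce(("app", ("lam", ("lam", ("var", 1))), ("var", 0)))  ==  ("lam", ("var", 1))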
new directions in the provision of end user computing experiences mean that we need to determine the best way to share data between small mobile computing devices partitioning large structures so that they can be shared efficiently provides basis for data intensive applications on such platforms in conjunction with such an approach dictionary based compression techniques provide additional benefits and help to prolong battery life
there has been wide interest recently in managing probabilistic data but in order to follow the rich literature on probabilistic databases one is often required to take detour into probability theory correlations conditionals monte carlo simulations error bounds topics that have been studied extensively in several areas of computer science and mathematics because of that it is often difficult to get to the algorithmic and systems level aspects of probabilistic data management in this tutorial we will distill these aspects from the often theory heavy literature on probabilistic databases we will start by describing real application at the university of washington using the rfid ecosystem we will show how probabilities arise naturally and why we need to cope with them we will then describe what an implementor needs to know to process sql queries on probabilistic databases in the second half of the tutorial we will discuss more advanced issues such as event processing over probabilistic streams and views over probabilistic data
specifications that are used in detailed design and in the documentation of existing code are primarily written and read by programmers however most formal specification languages either make heavy use of symbolic mathematical operators which discourages use by programmers or limit assertions to expressions of the underlying programming language which makes it difficult to write exact specifications moreover using assertions that are expressions in the underlying programming language can cause problems both in runtime assertion checking and in formal verification because such expressions can potentially contain side effects the java modeling language jml avoids these problems it uses side effect free subset of java’s expressions to which are added few mathematical operators such as the quantifiers forall and exists jml also hides mathematical abstractions such as sets and sequences within library of java classes the goal is to allow jml to serve as common notation for both formal verification and runtime assertion checking this gives users the benefit of several tools without the cost of changing notations
data caching on mobile clients is widely seen as an effective solution to improve system performance in particular cooperative caching based on the idea of sharing and coordination of cache data among multiple users can be particularly effective for information access in mobile ad hoc networks where mobile clients are moving frequently and network topology is changing dynamically most existing cache strategies perform replacement independently and they seldom consider coordinated replacement and energy saving issues in the context of mobile ad hoc network in this paper we analyse the impact of energy on designing cache replacement policy and formulate the energy efficient coordinated cache replacement problem ecorp as knapsack problem dynamic programming algorithm called ecorp dp and heuristic algorithm called ecorp greedy are presented to solve the problem simulations using both synthetic workload traces and real workload traces in our experiments show that the proposed policies can significantly reduce energy consumption and access latency when compared to other replacement policies
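the knapsack formulation can be made concrete with a standard 0/1 dynamic program; in the sketch below items are cached objects, weight is their size and value is a placeholder for the expected energy and latency saved by keeping them, which is a generic stand in rather than the exact ecorp cost model

def knapsack_keep_set(items, capacity):
    """0/1 knapsack: choose which cached items to keep under a cache-size budget.

    items: list of (item_id, size, benefit) where benefit models expected energy
           and latency saved by retaining the item (placeholder cost model).
    Returns (total_benefit, item_ids_to_keep); everything else is the eviction set.
    """
    n = len(items)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, benefit) in enumerate(items, start=1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]
            if size <= w and dp[i - 1][w - size] + benefit > dp[i][w]:
                dp[i][w] = dp[i - 1][w - size] + benefit
    keep, w = [], capacity                      # backtrack to recover the kept set
    for i in range(n, 0, -1):
        item_id, size, _ = items[i - 1]
        if dp[i][w] != dp[i - 1][w]:
            keep.append(item_id)
            w -= size
    return dp[n][capacity], keep

the exact dynamic program corresponds in spirit to ecorp dp, while a value-per-byte greedy pass over the same items would correspond to the cheaper heuristic flavour of ecorp greedy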
human participation in business processes needs to be addressed in process modeling bpelpeople with ws humantask covers this concern in the context of bpel bound to specific workflow technology this leads to number of problems firstly maintaining and migrating processes to new or similar technologies is expensive secondly the low level technical standards make it hard to communicate the process models to human domain experts model driven approaches can help to easier cope with technology changes and present the process models at higher level of abstraction than offered by the technology standards in this paper we extend the model driven approach with view based framework for business process modeling in which models can be viewed at different abstraction levels and different concerns of model can be viewed separately our approach enables developers to work with meta models that represent technical view on the human participation whereas human domain experts can have an abstract view on human participation in business process in order to validate our work mapping to bpelpeople technology will be demonstrated
large datasets on the order of gb and tb are increasingly common as abundant computational resources allow practitioners to collect produce and store data at higher rates as dataset sizes grow it becomes more challenging to interactively manipulate and analyze these datasets due to the large amounts of data that need to be moved and processed application independent caches such as operating system page caches and database buffer caches are present throughout the memory hierarchy to reduce data access times and alleviate transfer overheads we claim that an application aware cache with relatively modest memory requirements can effectively exploit dataset structure and application information to speed access to large datasets we demonstrate this idea in the context of system named the tree cache to reduce query latency to large octree datasets by an order of magnitude
mobile devices have become indispensable in daily life and hence how to take advantage of these portable and powerful facilities to share resources and information begins to emerge as an interesting problem in this paper we investigate the problem of information retrieval in mobile peer to peer network the prevailing approach to information retrieval is to apply flooding methods because of its quick response and easy maintenance obviously this kind of approach wastes huge amount of communication bandwidth which greatly affects the availability of the network and the battery power which significantly shortens the serving time of mobile devices in the network to tackle this problem we propose novel approach by mimicking different human behaviors of social networks which takes advantages of intelligence accuracy ia mechanism that evaluates the distance from node to certain resources in the network extensive experimental results show the efficiency and effectiveness of our approach as well as its scalability in volatile environment
search computing is novel discipline whose goal is to answer complex multi domain queries such queries typically require combining in their results domain knowledge extracted from multiple web resources therefore conventional crawling and indexing techniques which look at individual web pages are not adequate for them in this paper we sketch the main characteristics of search computing and we highlight how various classical computer science disciplines including software engineering web engineering service oriented architectures data management and human computing interaction are challenged by the search computing approach
the formation of secure transportation corridors where cargoes and shipments from points of entry can be dispatched safely to highly sensitive and secure locations is high national priority one of the key tasks of the program is the detection of anomalous cargo based on sensor readings in truck weigh stations due to the high variability dimensionality and or noise content of sensor data in transportation corridors appropriate feature representation is crucial to the success of anomaly detection methods in this domain in this paper we empirically investigate the usefulness of manifold embedding methods for feature representation in anomaly detection problems in the domain of transportation corridors we focus on both linear methods such as multi dimensional scaling mds as well as nonlinear methods such as locally linear embedding lle and isometric feature mapping isomap our study indicates that such embedding methods provide natural mechanism for keeping anomalous points away from the dense normal regions in the embedding of the data we illustrate the efficacy of manifold embedding methods for anomaly detection through experiments on simulated data as well as real truck data from weigh stations
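a minimal sketch of the embedding plus anomaly scoring idea above, using off the shelf mds lle and isomap implementations on synthetic data that stands in for the weigh station sensor readings, the embedding dimension neighbour counts and distance based score are illustrative assumptions rather than the settings used in the paper

```python
# embed the data with MDS / LLE / Isomap, then flag points whose embedded
# neighbourhood is sparse; synthetic data replaces the real sensor readings.
import numpy as np
from sklearn.manifold import MDS, LocallyLinearEmbedding, Isomap
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(300, 20))        # dense "normal" region
anomalies = rng.normal(6.0, 1.0, size=(10, 20))      # a few far-away points
X = np.vstack([normal, anomalies])

embedders = {
    "mds": MDS(n_components=2, random_state=0),
    "lle": LocallyLinearEmbedding(n_components=2, n_neighbors=10),
    "isomap": Isomap(n_components=2, n_neighbors=10),
}

for name, embedder in embedders.items():
    Y = embedder.fit_transform(X)
    # anomaly score: mean distance to the nearest neighbours in the embedding
    dist, _ = NearestNeighbors(n_neighbors=6).fit(Y).kneighbors(Y)
    score = dist[:, 1:].mean(axis=1)                  # drop the self-distance
    flagged = np.argsort(score)[-10:]
    print(name, "flags points", sorted(int(i) for i in flagged))
```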
advances in service oriented architecture soa have brought us close to the once imaginary vision of establishing and running virtual business ie business in which most or all of its business functions are outsourced to online services cloud computing offers realization of soa in which it resources are offered as services that are more affordable flexible and attractive to businesses in this paper we briefly study advances in cloud computing and discuss the benefits of using cloud services for businesses and trade offs that they have to consider we then present layered architecture for the virtual business and conceptual architecture for virtual business operating environment we discuss the opportunities and research challenges that are ahead of us in realizing the technical components of this conceptual architecture we conclude by giving the outlook and impact of cloud services on both large and small businesses
with advances in process technology soft errors se are becoming an increasingly critical design concern due to their large area and high density caches are worst hit by soft errors although error correction code based mechanisms protect the data in caches they have high performance and power overheads since multimedia applications are increasingly being used in mission critical embedded systems where both reliability and energy are major concern there is definite need to improve reliability in embedded systems without too much energy overhead we observe that while soft error in multimedia data may only result in minor loss in qos soft error in a variable that controls the execution flow of the program may be fatal consequently we propose to partition the data space into failure critical and failure non critical data and provide high degree of soft error protection only to the failure critical data in horizontally partitioned caches experimental results demonstrate that our selective data protection can achieve the failure rate close to that of soft error protected cache system while retaining the performance and energy consumption similar to those of traditional cache system with some degradation in qos for example for conventional configuration as in intel xscale our approach achieves the same failure rate while improving performance by and reducing energy consumption by in comparison with soft error protected cache
when ranking texts retrieved for query semantics of each term in the texts is fundamental basis the semantics often depends on locality context neighboring terms of in the texts in this paper we present technique ctfatr that improves text rankers by encoding the term locality contexts to the assessment of term frequency tf of each term in the texts results of the tf assessment may be directly used to improve various kinds of text rankers without calling for any revisions to algorithms and development processes of the rankers moreover ctfatr is efficient to conduct the tf assessment online and neither training process nor training data is required empirical evaluation shows that ctfatr significantly improves various kinds of text rankers the contributions are of practical significance since many text rankers were developed and if they consider tf in ranking ctfatr may be used to enhance their performance without incurring any cost to them
this paper presents capriccio scalable thread package for use with high concurrency servers while recent work has advocated event based systems we believe that thread based systems can provide simpler programming model that achieves equivalent or superior performance by implementing capriccio as user level thread package we have decoupled the thread package implementation from the underlying operating system as result we can take advantage of cooperative threading new asynchronous mechanisms and compiler support using this approach we are able to provide three key features scalability to threads efficient stack management and resource aware scheduling we introduce linked stack management which minimizes the amount of wasted stack space by providing safe small and non contiguous stacks that can grow or shrink at run time compiler analysis makes our stack implementation efficient and sound we also present resource aware scheduling which allows thread scheduling and admission control to adapt to the system’s current resource usage this technique uses blocking graph that is automatically derived from the application to describe the flow of control between blocking points in cooperative thread package we have applied our techniques to the apache web server demonstrating that we can achieve high performance and scalability despite using simple threaded programming model
we present novel framework for the query performance prediction task that is estimating the effectiveness of search performed in response to query in lack of relevance judgments our approach is based on using statistical decision theory for estimating the utility that document ranking provides with respect to an information need expressed by the query to address the uncertainty in inferring the information need we estimate utility by the expected similarity between the given ranking and those induced by relevance models the impact of relevance model is based on its presumed representativeness of the information need specific query performance predictors instantiated from the framework substantially outperform state of the art predictors over five trec corpora
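a toy sketch of the expected similarity estimate described above, documents and relevance models are random unigram distributions, overlap at k stands in for the similarity measure and the model weights standing for presumed representativeness are arbitrary, none of this reproduces the paper's actual predictors

```python
# predicted quality of a ranking = weighted average, over relevance models, of
# the similarity between the given ranking and the ranking each model induces.
import numpy as np

def induced_ranking(doc_vectors, relevance_model):
    # rank documents by their similarity (dot product) to the relevance model
    return list(np.argsort(-(doc_vectors @ relevance_model)))

def overlap_at_k(r1, r2, k=5):
    return len(set(r1[:k]) & set(r2[:k])) / k

rng = np.random.default_rng(1)
docs = rng.random((20, 8))                       # 20 docs over an 8-term vocab
docs /= docs.sum(axis=1, keepdims=True)
given_ranking = list(range(20))                  # the ranking to be evaluated

models = [rng.dirichlet(np.ones(8)) for _ in range(3)]   # relevance models
weights = np.array([0.5, 0.3, 0.2])              # presumed representativeness

expected_similarity = sum(
    w * overlap_at_k(given_ranking, induced_ranking(docs, m))
    for w, m in zip(weights, models)
)
print("predicted effectiveness:", round(expected_similarity, 3))
```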
existing sensor network architectures are based on the assumption that data will be polled therefore they are not adequate for long term battery powered use in applications that must sense or react to events that occur at unpredictable times in response and motivated by structural autonomous crack monitoring acm application from civil engineering that requires bursts of high resolution sampling in response to aperiodic vibrations in buildings and bridges we have designed implemented and evaluated lucid dreaming hardware software technique to dramatically decrease sensor node power consumption in this and other event driven sensing applications this work makes the following main contributions we have identified the key mismatches between existing polling based sensor network architectures and event driven applications we have proposed hardware software technique to permit the power efficient use of sensor networks in event driven applications we have analytically characterized the situations in which the proposed technique is appropriate and we have designed implemented and tested hardware software solution for standard crossbow motes that embodies the proposed technique in the building and bridge structural integrity monitoring application the proposed technique achieves the power consumption of existing sensor network architectures thereby dramatically increasing battery lifespan or permitting operation based on energy scavenging we believe that the proposed technique will yield similar benefits in wide range of applications printed circuit board specification files permitting reproduction of the current implementation are available for free use in research and education
content addressable storage cas system is valuable tool for building storage solutions providing efficiency by automatically detecting and eliminating duplicate blocks it can also be capable of high throughput at least for streaming access however the absence of standardized api is barrier to the use of cas for existing applications additionally applications would have to deal with the unique characteristics of cas such as immutability of blocks and high latency of operations an attractive alternative is to build file system on top of cas since applications can use its interface without modification mapping file system onto cas system efficiently so as to obtain high duplicate elimination and high throughput requires very different design than for traditional disk subsystem in this paper we present the design implementation and evaluation of hydrafs file system built on top of hydrastor scalable distributed content addressable block storage system hydrafs provides high performance reads and writes for streaming access achieving of the hydrastor throughput while maintaining high duplicate elimination
we present griffin hybrid storage device that uses hard disk drive hdd as write cache for solid state device ssd griffin is motivated by two observations first hdds can match the sequential write bandwidth of mid range ssds second both server and desktop workloads contain significant fraction of block overwrites by maintaining log structured hdd cache and migrating cached data periodically griffin reduces writes to the ssd while retaining its excellent performance we evaluate griffin using variety of traces from windows systems and show that it extends ssd lifetime by factor of two and reduces average latency by
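a small sketch of the caching idea above, writes are appended to a log structured hdd cache and periodically migrated so that overwrites of the same logical block collapse into a single ssd write, the dict based devices and the fixed migration trigger are illustrative assumptions rather than griffin's actual policies

```python
# minimal hybrid store: sequential HDD log in front of an SSD, with periodic
# migration that collapses repeated overwrites of the same logical block.
class HybridStore:
    def __init__(self, migrate_every=4):
        self.hdd_log = []             # append-only (block, data) records
        self.ssd = {}                 # logical block -> data
        self.migrate_every = migrate_every
        self.ssd_writes = 0

    def write(self, block, data):
        self.hdd_log.append((block, data))           # sequential HDD append
        if len(self.hdd_log) >= self.migrate_every:
            self.migrate()

    def migrate(self):
        latest = {}
        for block, data in self.hdd_log:              # overwrites collapse here
            latest[block] = data
        for block, data in latest.items():
            self.ssd[block] = data
            self.ssd_writes += 1
        self.hdd_log.clear()

    def read(self, block):
        for b, d in reversed(self.hdd_log):           # newest cached copy wins
            if b == block:
                return d
        return self.ssd.get(block)

store = HybridStore()
for i in range(8):
    store.write(0, f"v{i}")                           # 8 overwrites of block 0
print(store.read(0), "ssd writes:", store.ssd_writes) # latest value, 2 SSD writes
```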
as most applications in wireless sensor networks wsn are location sensitive in this paper we explore the problem of location aided multicast for wsn we present four strategies to construct the geomulticast routing tree namely sarf sam cofam and msam especially we discuss cofam in detail and give the algorithm for setting up multicast tree in cone based forwarding area this algorithm is distributed and energy efficient extensive simulations have been conducted to evaluate the performance of the proposed routing schemes simulation results have shown that when constructing multicast tree fewer messages must be transmitted in our schemes
on the basis of case study we demonstrate the usefulness of topology invariants for model driven systems development considering graph grammar semantics for relevant fragment of uml where graph represents an object diagram allows us to apply topology analysis particular abstract interpretation of graph grammars the outcome of this analysis is finite and concise over approximation of all possible reachable object diagrams the so called topology invariant we discuss how topology invariants can be used to verify that constraints on given model are respected by the behaviour and how they can be viewed as synthesised constraints providing insight into the dynamic behaviour of the model
as the scale and complexity of parallel systems continue to grow failures become more and more an inevitable fact for solving large scale applications in this research we present an analytical study to estimate execution time in the presence of failures of directed acyclic graph dag based scientific applications and provide guideline for performance optimization the study is four fold we first introduce performance model to predict individual subtask computation time under failures next layered iterative approach is adopted to transform dag into layered dag which reflects full dependencies among all the subtasks then the expected execution time under failures of the dag is derived based on stochastic analysis unlike existing models this newly proposed performance model provides both the variance and distribution it is practical and can be put to real use finally based on the model performance optimization weak point identification and enhancement are proposed intensive simulations with real system traces are conducted to verify the analytical findings they show that the newly proposed model and weak point enhancement mechanism work well
bounded treewidth and monadic second order mso logic have proved to be key concepts in establishing fixed parameter tractability results indeed by courcelle’s theorem we know any property of finite structures which is expressible by an mso sentence can be decided in linear time data complexity if the structures have bounded treewidth in principle courcelle’s theorem can be applied directly to construct concrete algorithms by transforming the mso evaluation problem into tree language recognition problem the latter can then be solved via finite tree automaton fta however this approach has turned out to be problematical since even relatively simple mso formulae may lead to state explosion of the fta in this work we propose monadic datalog ie datalog where all intensional predicate symbols are unary as an alternative method to tackle this class of fixed parameter tractable problems we show that if some property of finite structures is expressible in mso then this property can also be expressed by means of monadic datalog program over the structure plus the tree decomposition moreover we show that the resulting fragment of datalog can be evaluated in linear time both wrt the program size and wrt the data size this new approach is put to work by devising new algorithm for the primality problem ie testing if some attribute in relational schema is part of key we also report on experimental results with prototype implementation
in this article we present the findings of two ethnographic studies embedded into two broader projects on interactive television in the home environment based on previous research on the home context and inspired by ongoing trends around interactive television we explored basic concepts such as the extended home and new interaction techniques in particular those related to future developments of the remote control for the two studies we also developed two variations of the cultural probes method creative probing and playful probing this methodological approach proved to be appropriate for gathering in depth data on participants opinions attitudes and ideas in way favorable to the participants overall our results support existing research data on user media behavior and expectations and show trends in and beyond the living room concerned with personalization privacy and security as well as communication
in our prior work we presented highly effective local search based heuristic algorithm called the largest expanding sweep search less to solve the minimum energy broadcast meb problem over wireless ad hoc or sensor networks in this paper the performance is further strengthened by using iterated local optimization ilo techniques at the cost of additional computational complexity to the best of our knowledge this implementation constitutes currently the best performing algorithm among the known heuristics for meb we support this claim through extensive simulation study comparing with globally optimal solutions obtained by an integer programming ip solver for small network size up to nodes which is imposed by practical limitation of the ip solver the ilo based algorithm produces globally optimal solutions with very high frequency and average performance is within of the optimal solution
classification of head models based on their shape attributes for subsequent indexing and retrieval are important in many applications as in hierarchical content based retrieval of these head models for virtual scene composition and the automatic annotation of these characters in such scenes while simple feature representations are preferred for more efficient classification operations these features may not be adequate for distinguishing between the subtly different head model classes in view of these we propose an optimization approach based on genetic algorithm ga where the original model representation is transformed in such way that the classification rate is significantly enhanced while retaining the efficiency and simplicity of the original representation specifically based on the extended gaussian image egi representation for models which summarizes the surface normal orientation statistics we consider these orientations as random variables and proceed to search for an optimal transformation for these variables based on genetic optimization the resulting transformed distributions for these random variables are then used as the modified classifier inputs experiments have shown that the optimized transformation results in significant improvement in classification results for large variety of class structures more importantly the transformation can be indirectly realized by bin removal and bin count merging in the original histogram thus retaining the advantage of the original egi representation
many distributed applications have to meet their performance or quality of service goals in environments where available resources change constantly important classes of distributed applications including distributed multimedia codes applications for mobile devices and computational grid codes use runtime adaptation in order to achieve their goals the adaptation behavior in these applications is usually programmed in ad hoc code that is directly incorporated into the base application resulting in systems that are complex to develop maintain modify and debug furthermore it is virtually impossible to extract high level information about adaptive behaviour using program analysis even if there were compiler and runtime systems that could exploit such information the goal of our research is to develop compiler and programming language support to simplify the development and improve the performance of adaptive distributed applications we describe simple set of language extensions for adaptive distributed applications and discuss potential compiler techniques to support such applications we also propose task graph based framework that can be used to formalize the description of wide range of adaptation operations
shore scalable heterogeneous object repository is persistent object system under development at the university of wisconsin shore represents merger of object oriented database and file system technologies in this paper we give the goals and motivation for shore and describe how shore provides features of both technologies we also describe some novel aspects of the shore architecture including symmetric peer to peer server architecture server customization through an extensible value added server facility and support for scalability on multiprocessor systems an initial version of shore is already operational and we expect release of version in mid
in teaching operating systems at an undergraduate level we believe that it is important to provide project that is realistic enough to show how real operating systems work yet is simple enough that the students can understand and modify it in significant ways number of these instructional systems have been created over the last two decades but recent advances in hardware and software design along with the increasing power of available computational resources have changed the basis for many of the tradeoffs made by these systems we have implemented an instructional operating system called nachos and designed series of assignments to go with it our system includes cpu and device simulators and it runs as regular unix process nachos illustrates and takes advantage of modern operating systems technology such as threads and remote procedure calls recent hardware advances such as risc’s and the prevalence of memory hierarchies and modern software design techniques such as protocol layering and object oriented programming nachos has been used to teach undergraduate operating systems classes at several universities with positive results
the huge number of images on the web gives rise to the content based image retrieval cbir as the text based search techniques cannot cater to the needs of precisely retrieving web images however cbir comes with fundamental flaw the semantic gap between high level semantic concepts and low level visual features consequently relevance feedback is introduced into cbir to learn the subjective needs of users however in practical applications the limited number of user feedbacks is usually overwhelmed by the large number of dimensionalities of the visual feature space to address this issue novel semi supervised learning method for dimensionality reduction namely kernel maximum margin projection kmmp is proposed in this paper based on our previous work of maximum margin projection mmp unlike traditional dimensionality reduction algorithms such as principal component analysis pca and linear discriminant analysis lda which only see the global euclidean structure kmmp is designed for discovering the local manifold structure after projecting the images into lower dimensional subspace kmmp significantly improves the performance of image retrieval the experimental results on corel image database demonstrate the effectiveness of our proposed nonlinear algorithm
mobile users of computation and communication services have been rapidly adopting battery powered mobile handhelds such as pocketpcs and smartphones for their work however the limited battery lifetime of these devices restricts their portability and applicability and this weakness can be exacerbated by mobile malware targeting depletion of battery energy such malware are usually difficult to detect and prevent and frequent outbreaks of new malware variants also reduce the effectiveness of commonly seen signature based detection to alleviate these problems we propose power aware malware detection framework that monitors detects and analyzes previously unknown energy depletion threats the framework is composed of power monitor which collects power samples and builds power consumption history from the collected samples and data analyzer which generates power signature from the constructed history to generate power signature simple and effective noise filtering and data compression are applied thus reducing the detection overhead similarities between power signatures are measured by the distance reducing both false positive and false negative detection rates according to our experimental results on an hp ipaq running windows mobile os the proposed framework achieves significant up to storage savings without losing the detection accuracy and true positive rate in classifying mobile malware
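a compact sketch of the signature pipeline described above, a moving average filter stands in for the noise filtering, plain downsampling for the data compression and euclidean distance for the signature distance, all parameter values and the synthetic traces are assumptions

```python
# build power signatures (filter + compress) and compare them by distance.
import numpy as np

def power_signature(trace, filter_width=5, keep_every=4):
    kernel = np.ones(filter_width) / filter_width
    smoothed = np.convolve(trace, kernel, mode="valid")    # noise filtering
    return smoothed[::keep_every]                          # data compression

def signature_distance(sig_a, sig_b):
    n = min(len(sig_a), len(sig_b))
    return float(np.linalg.norm(sig_a[:n] - sig_b[:n]))

rng = np.random.default_rng(0)
benign = np.sin(np.linspace(0, 6, 400)) + rng.normal(0, 0.05, 400)
malware = benign + 0.8                                     # constant extra drain
known_malware_sig = power_signature(malware + rng.normal(0, 0.05, 400))

for name, trace in [("benign", benign), ("suspect", malware)]:
    d = signature_distance(power_signature(trace), known_malware_sig)
    print(name, "distance to malware signature:", round(d, 2))
```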
we present an algorithm for the layered segmentation of video data in multiple views the approach is based on computing the parameters of layered representation of the scene in which each layer is modelled by its motion appearance and occupancy where occupancy describes probabilistically the layer’s spatial extent and not simply its segmentation in particular view the problem is formulated as the map estimation of all layer parameters conditioned on those at the previous time step ie sequential estimation problem that is equivalent to tracking multiple objects in given number views expectation maximisation is used to establish layer posterior probabilities for both occupancy and visibility which are represented distinctly evidence from areas in each view which are described poorly under the model is used to propose new layers automatically since these potential new layers often occur at the fringes of images the algorithm is able to segment and track these in single view until such time as suitable candidate match is discovered in the other views the algorithm is shown to be very effective at segmenting and tracking non rigid objects and can cope with extreme occlusion we demonstrate an application of this representation to dynamic novel view synthesis
one of the critiques on program slicing is that slices presented to the user are hard to understand this is mainly related to the problem that slicing dumps the results onto the user without any explanation this work will present an approach that can be used to filter slices this approach basically introduces barriers which are not allowed to be passed during slice computation an earlier filtering approach is chopping which is also extended to obey such barrier the barrier variants of slicing and chopping provide filtering possibilities for smaller slices and better comprehensibility the concept of barriers is then applied to path conditions which provide necessary conditions under which an influence between the source and target criterion exists barriers make those conditions more precise
language supported synchronization is source of serious performance problems in many java programs even single threaded applications may spend up to half their time performing useless synchronization due to the thread safe nature of the java libraries we solve this performance problem with new algorithm that allows lock and unlock operations to be performed with only few machine instructions in the most common cases our locks only require partial word per object and were implemented without increasing object size we present measurements from our implementation in the jdk for aix demonstrating speedups of up to factor of in micro benchmarks and up to factor of in real programs
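the following python sketch only illustrates the general thin lock style fast path that such schemes rely on, a small lock word holding owner and recursion count updated by what would be a single compare and swap, it is not the paper's algorithm and the encoding of owner times 256 plus count is an invented placeholder

```python
# illustrative lock word with a cheap uncontended fast path; the CAS is
# simulated by a plain check, and contention falls back to a heavier path.
UNLOCKED = 0

class ThinLock:
    def __init__(self):
        self.word = UNLOCKED          # packed as owner_id * 256 + count
        self.waiters = None           # created only under contention

    def lock(self, thread_id):        # thread ids are assumed to be positive
        if self.word == UNLOCKED:                 # fast path: one "CAS"
            self.word = thread_id * 256 + 1
        elif self.word // 256 == thread_id:       # recursive acquisition
            self.word += 1
        else:                                     # contention: fall back
            self.waiters = self.waiters or []
            self.waiters.append(thread_id)        # stand-in for real blocking

    def unlock(self, thread_id):
        assert self.word // 256 == thread_id, "unlock by non-owner"
        self.word -= 1
        if self.word % 256 == 0:
            self.word = UNLOCKED

lock = ThinLock()
lock.lock(7); lock.lock(7)            # nested locking by the same thread
lock.unlock(7); lock.unlock(7)
print("lock word after release:", lock.word)     # back to UNLOCKED
```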
the task of linking databases is an important step in an increasing number of data mining projects because linked data can contain information that is not available otherwise or that would require time consuming and expensive collection of specific data the aim of linking is to match and aggregate all records that refer to the same entity one of the major challenges when linking large databases is the efficient and accurate classification of record pairs into matches and non matches while traditionally classification was based on manually set thresholds or on statistical procedures many of the more recently developed classification methods are based on supervised learning techniques they therefore require training data which is often not available in real world situations or has to be prepared manually an expensive cumbersome and time consuming process the author has previously presented novel two step approach to automatic record pair classification in the first step of this approach training examples of high quality are automatically selected from the compared record pairs and used in the second step to train support vector machine svm classifier initial experiments showed the feasibility of the approach achieving results that outperformed means clustering in this paper two variations of this approach are presented the first is based on nearest neighbour classifier while the second improves svm classifier by iteratively adding more examples into the training sets experimental results show that this two step approach can achieve better classification results than other unsupervised approaches
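a sketch of the two step idea on synthetic similarity vectors, pairs with very high or very low total similarity are selected automatically as seed matches and non matches and an svm trained on those seeds classifies the rest, the thresholds and field similarities are invented for illustration

```python
# step 1: pick confident seeds from unlabelled record-pair similarity vectors;
# step 2: train an SVM on the seeds and classify all pairs.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# each row: similarity scores for a few compared fields (name, address, ...)
matches = rng.uniform(0.7, 1.0, size=(100, 4))
non_matches = rng.uniform(0.0, 0.5, size=(300, 4))
pairs = np.vstack([matches, non_matches])
rng.shuffle(pairs)

total = pairs.sum(axis=1)
seed_match = pairs[total > 3.6]          # step 1: confident match examples
seed_non = pairs[total < 0.8]            #         confident non-match examples

X_train = np.vstack([seed_match, seed_non])
y_train = np.array([1] * len(seed_match) + [0] * len(seed_non))

clf = SVC(kernel="rbf").fit(X_train, y_train)   # step 2: train the classifier
labels = clf.predict(pairs)
print("pairs classified as matches:", int(labels.sum()), "of", len(pairs))
```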
the web of data has emerged as way of exposing structured linked data on the web it builds on the central building blocks of the web uris http and benefits from its simplicity and wide spread adoption it does however also inherit the unresolved issues such as the broken link problem broken links constitute major challenge for actors consuming linked data as they require them to deal with reduced accessibility of data we believe that the broken link problem is major threat to the whole web of data idea and that both linked data consumers and providers will require solutions that deal with this problem since no general solutions for fixing such links in the web of data have emerged we make three contributions into this direction first we provide concise definition of the broken link problem and comprehensive analysis of existing approaches second we present dsnotify generic framework able to assist human and machine actors in fixing broken links it uses heuristic feature comparison and employs time interval based blocking technique for the underlying instance matching problem third we derived benchmark datasets from knowledge bases such as dbpedia and evaluated the effectiveness of our approach with respect to the broken link problem our results show the feasibility of time interval based blocking approach for systems that aim at detecting and fixing broken links in the web of data
as xml has become an emerging standard for information exchange on the world wide web it has gained great attention among database communities with respect to extraction of information from xml which is considered as database model xml queries enable users to issue many kinds of complex queries using regular path expressions however they usually require large search space during query processing so the problem of xml query processing has received significant attention this paper surveys the state of the art on the problem of xml query evaluation we consider the problem in three dimensions xml instance storage xml query languages and xml views and xml query language processing we describe the problem definition algorithms proposed to solve it and the relevant research issues
effective load distribution is of great importance at grids which are complex heterogeneous distributed systems in this paper we study site allocation scheduling of nonclairvoyant jobs in level heterogeneous grid architecture three scheduling policies at grid level which utilize site load information are examined the aim is the reduction of site load information traffic while at the same time mean response time of jobs and fairness in utilization between the heterogeneous sites are of great interest simulation model is used to evaluate performance under various conditions simulation results show that considerable decrement in site load information traffic and utilization fairness can be achieved at the expense of slight increase in response time
traditional coherence protocols present set of difficult tradeoffs the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability while directory protocols incur performance penalty on sharing misses due to indirection this work introduces patch predictive adaptive token counting hybrid coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency patch extends standard directory protocol to track tokens and use token counting rules for enforcing coherence permissions token counting allows patch to support direct requests on an unordered interconnect while mechanism called token tenure uses local processor timeouts and the directorys per block point of ordering at the home node to guarantee forward progress without relying on broadcast patch makes three main contributions first patch introduces token tenure which provides broadcast free forward progress for token counting protocols second patch deprioritizes best effort direct requests to match or exceed the performance of directory protocols without restricting scalability finally patch provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests overall patch is one size fits all coherence protocol that dynamically adapts to work well for small systems large systems and anywhere in between
this paper describes new software based registration and fusion of visible and thermal infrared ir image data for face recognition in challenging operating environments that involve illumination variations the combined use of visible and thermal ir imaging sensors offers viable means for improving the performance of face recognition techniques based on single imaging modality despite successes in indoor access control applications imaging in the visible spectrum demonstrates difficulties in recognizing the faces in varying illumination conditions thermal ir sensors measure energy radiations from the object which is less sensitive to illumination changes and are even operable in darkness however thermal images do not provide high resolution data data fusion of visible and thermal images can produce face images robust to illumination variations however thermal face images with eyeglasses may fail to provide useful information around the eyes since glass blocks large portion of thermal energy in this paper eyeglass regions are detected using an ellipse fitting method and replaced with eye template patterns to preserve the details useful for face recognition in the fused image software registration of images replaces special purpose imaging sensor assembly and produces co registered image pairs at reasonable cost for large scale deployment face recognition techniques using visible thermal ir and data fused visible thermal images are compared using commercial face recognition software faceit and two visible thermal face image databases the nist equinox and the utk iris databases the proposed multiscale data fusion technique improved the recognition accuracy under wide range of illumination changes experimental results showed that the eyeglass replacement increased the number of correct first match subjects by nist equinox and utk iris
we introduce feature based method to detect unusual patterns the property of normality allows us to devise framework to quickly prune the normal observations observations that can not be combined into any significant pattern are considered unusual rules that are learned from the dataset are used to construct the patterns for which we compute score function to measure the interestingness of the unusual patterns experiments using the kdd cup dataset show that our approach can discover most of the attack patterns those attacks are in the top set of unusual patterns and have higher score than the patterns of normal connections the experiments also show that the algorithm can run very fast
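a toy sketch of the prune then score idea, frequent feature values are treated as normal and pruned while rare surviving values raise an interestingness score, the threshold the additive score and the tiny record set are assumptions rather than the paper's rule based pattern construction

```python
# prune common feature values, then score records by the rarity of what remains.
from collections import Counter

records = [
    {"proto": "tcp", "service": "http", "flag": "SF"},
    {"proto": "tcp", "service": "http", "flag": "SF"},
    {"proto": "tcp", "service": "http", "flag": "SF"},
    {"proto": "udp", "service": "dns",  "flag": "SF"},
    {"proto": "tcp", "service": "telnet", "flag": "REJ"},   # unusual-looking
]

support = {f: Counter(r[f] for r in records) for f in records[0]}
n = len(records)

def score(record, common_threshold=0.5):
    s = 0.0
    for field, value in record.items():
        freq = support[field][value] / n
        if freq >= common_threshold:      # prune common ("normal") values
            continue
        s += 1.0 - freq                   # rare surviving values raise the score
    return s

for record in sorted(records, key=score, reverse=True):
    print(round(score(record), 2), record)
```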
numerous qos routing strategies focus on end to end delays to provide time constrained routing protocols in wireless sensor networks wsns with the arrival of wireless multimedia sensor networks traffic can be composed of time sensitive packets and reliability demanding packets in such situations some works also take into account link reliability to provide probabilistic qos the trade off between the guarantee of the qos requirements and the network lifetime remains an open issue especially in large scale wsns this paper proposes promising multipath qos routing protocol based on separation of the nodes into two sub networks the first part includes specific nodes that are occasionally involved in routing decisions while the remaining nodes in the second sub network fully take part in them the qos routing is formulated as an optimization problem that aims to extend the network lifetime subject to qos constraints using the percolation theory routing algorithm is designed to solve the problem on the respective sub networks simulation results show the efficiency of this novel approach in terms of average end to end delays on time packet delivery ratio and network lifetime
to resolve some of lexical disagreement problems between queries and faqs we propose reliable faq retrieval system using query log clustering on indexing time the proposed system clusters the logs of users queries into predefined faq categories to increase the precision and the recall rate of clustering the proposed system adopts new similarity measure using machine readable dictionary on searching time the proposed system calculates the similarities between users queries and each cluster in order to smooth faqs by virtue of the cluster based retrieval technique the proposed system could partially bridge lexical chasms between queries and faqs in addition the proposed system outperforms the traditional information retrieval systems in faq retrieval
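a minimal sketch of the cluster based matching step, logged queries grouped under their faq categories form one pseudo document per cluster and a new query is scored against the clusters, tf idf with cosine similarity stands in for the dictionary based similarity measure used in the paper

```python
# match a user query against FAQ clusters built from past query logs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq_clusters = {
    "reset password": ["forgot my password", "cannot log in", "password reset link"],
    "refund policy":  ["money back", "cancel order refund", "return item get refund"],
}

# one pseudo-document per cluster: the FAQ name plus its logged queries
names = list(faq_clusters)
cluster_docs = [name + " " + " ".join(queries)
                for name, queries in faq_clusters.items()]

vectorizer = TfidfVectorizer()
cluster_vecs = vectorizer.fit_transform(cluster_docs)

query = "i cannot log in to my account"
scores = cosine_similarity(vectorizer.transform([query]), cluster_vecs)[0]
best = max(range(len(names)), key=lambda i: scores[i])
print("best FAQ cluster:", names[best], "score:", round(float(scores[best]), 3))
```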
we present new method for converting photo or image to synthesized painting following the painting style of an example painting treating painting styles of brush strokes as sample textures we reduce the problem of learning an example painting to texture synthesis problem the proposed method uses hierarchical patch based approach to the synthesis of directional textures the key features of our method are painting styles are represented as one or more blocks of sample textures selected by the user from the example painting image segmentation and brush stroke directions defined by the medial axis are used to better represent and communicate shapes and objects present in the synthesized painting image masks and hierarchy of texture patches are used to efficiently synthesize high quality directional textures the synthesis process is further accelerated through texture direction quantization and the use of gaussian pyramids our method has the following advantages first the synthesized stroke textures can follow direction field determined by the shapes of regions to be painted second the method is very efficient the generation time of synthesized painting ranges from few seconds to about one minute rather than hours as required by other existing methods on commodity pc furthermore the technique presented here provides new and efficient solution to the problem of synthesizing directional texture we use number of test examples to demonstrate the efficiency of the proposed method and the high quality of results produced by the method
for efficient image retrieval the image database should be processed to extract representing feature vector for each member image in the database reliable and robust statistical image indexing technique based on stochastic model of an image color content has been developed based on the developed stochastic model compact dimensional feature vector was defined to tag images in the database system the entries of the defined feature vector are the mean variance and skewness of the image color histogram distributions as well as correlation factors between color components of the rgb color space it was shown using statistical analysis that the feature vector provides sufficient knowledge about the histogram distribution the reliability and robustness of the proposed technique against common intensity artifacts and noise was validated through several experiments conducted for that purpose the proposed technique outperforms traditional and other histogram based techniques in terms of feature vector size and properties as well as performance
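a direct reading of the described feature vector, per channel mean variance and skewness of the colour distribution plus the three pairwise correlations between rgb channels, which gives a compact 12 entry tag under this reading, the random image is a placeholder for real database images

```python
# compute the per-channel statistics and inter-channel correlations of an image.
import numpy as np

def color_feature_vector(image):
    # image: H x W x 3 array of RGB values
    channels = image.reshape(-1, 3).astype(float)
    features = []
    for c in range(3):
        x = channels[:, c]
        mean = x.mean()
        var = x.var()
        skew = ((x - mean) ** 3).mean() / (var ** 1.5 + 1e-12)
        features += [mean, var, skew]
    for a, b in [(0, 1), (0, 2), (1, 2)]:          # R-G, R-B, G-B correlations
        features.append(float(np.corrcoef(channels[:, a], channels[:, b])[0, 1]))
    return np.array(features)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))
print(color_feature_vector(image).round(2))
```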
release consistency is widely accepted memory model for distributed shared memory systems eager release consistency represents the state of the art in release consistent protocols for hardware coherent multiprocessors while lazy release consistency has been shown to provide better performance for software distributed shared memory dsm several of the optimizations performed by lazy protocols have the potential to improve the performance of hardware coherent multiprocessors as well but their complexity has precluded hardware implementation with the advent of programmable protocol processors it may become possible to use them after all we present and evaluate lazy release consistent protocol suitable for machines with dedicated protocol processors this protocol admits multiple concurrent writers sends write notices concurrently with computation and delays invalidations until acquire operations we also consider lazier protocol that delays sending write notices until release operations our results indicate that the first protocol outperforms eager release consistency by as much as across variety of applications the lazier protocol on the other hand is unable to recoup its high synchronization overhead this represents qualitative shift from the dsm world where lazier protocols always yield performance improvements based on our results we conclude that machines with flexible hardware support for coherence should use protocols based on lazy release consistency but in less aggressively lazy form than is appropriate for dsm
in order to establish consolidated standards in novel data mining areas newly proposed algorithms need to be evaluated thoroughly many publications compare new proposition if at all with one or two competitors or even with so called naïve ad hoc solution for the prolific field of subspace clustering we propose software framework implementing many prominent algorithms and thus allowing for fair and thorough evaluation furthermore we describe how new algorithms for new applications can be incorporated in the framework easily
distributed coalition supports distributed mandatory access controls for resources whose security policies differ for each group of components over nodes and provides secure information operations and exchanges with nodes that handle information over which conflicts of interest may occur many projects have proposed distributed coalitions using virtual machine monitor but this approach for strong confinement tends to hinder successful deployments in real world scenarios that involve complicated operations and management for applications because such access control is coarse grained for the resources in this paper we propose chinese wall process confinement cwpc for practical application level distributed coalitions that provide fine grained access controls for resources and that emphasize minimizing the impact on the usability using program transparent reference monitor we implemented prototype system named aldc for standard office applications on microsoft windows that are used on daily basis for business purposes and that may involve conflicts of interests evaluated its performance and influence on usability and show that our approach is practical
dynamic software optimization methods are becoming increasingly popular for improving software performance and power the first step in dynamic optimization consists of detecting frequently executed code or critical regions most previous critical region detectors have been targeted to desktop processors we introduce critical region detector targeted to embedded processors with the unique features of being very size and power efficient and being completely nonintrusive to the software’s execution features needed in timing sensitive embedded systems our detector not only finds the critical regions but also determines their relative frequencies potentially important feature for selecting among alternative dynamic optimization methods our detector uses tiny cache like structure coupled with small amount of logic we provide results of extensive explorations across embedded system benchmarks we show that highly accurate results can be achieved with only percent power overhead acceptable size overhead and zero runtime overhead our detector is currently being used as part of dynamic hardware software partitioning approach but is applicable to wide variety of situations
web applications routinely handle sensitive data and many people rely on them to support various daily activities so errors can have severe and broad reaching consequences unlike most desktop applications many web applications are written in scripting languages such as php the dynamic features commonly supported by these languages significantly inhibit static analysis and existing static analysis of these languages can fail to produce meaningful results on real world web applications automated test input generation using the concolic testing framework has proven useful for finding bugs and improving test coverage on and java programs which generally emphasize numeric values and pointer based data structures however scripting languages such as php promote style of programming for developing web applications that emphasizes string values objects and arrays in this paper we propose an automated input test generation algorithm that uses runtime values to analyze dynamic code models the semantics of string operations and handles operations whose argument and return values may not share common type as in the standard concolic testing framework our algorithm gathers constraints during symbolic execution our algorithm resolves constraints over multiple types by considering each variable instance individually so that it only needs to invert each operation by recording constraints selectively our implementation successfully finds bugs in real world web applications which state of the art static analysis tools fail to analyze
understanding large grid platform configurations and generating representative synthetic configurations is critical for grid computing research this paper presents an analysis of existing resource configurations and proposes grid platform generator that synthesizes realistic configurations of both computing and communication resources our key contributions include the development of statistical models for currently deployed resources and using these estimates for modeling the characteristics of future systems through the analysis of the configurations of clusters and over processors we identify appropriate distributions for resource configuration parameters in many typical clusters using well established statistical tests we validate our models against second resource collection of clusters and over processors and show that our models effectively capture the resource characteristics found in real world resource infrastructures these models are realized in resource generator which can be easily recalibrated by running it on training sample set
mobile communication devices may be used for spreading multimedia data without support of an infrastructure such scheme where the data is carried by people walking around and relayed from device to device by means of short range radio could potentially form public content distribution system that spans vast urban areas the transport mechanism is the flow of people and it can be studied but not engineered the question addressed in this paper is how well pedestrian content distribution may work we answer this question by modeling the mobility of people moving around in city constrained by given topology our contributions are both the queuing analytic model that captures the flow of people and the results on the feasibility of pedestrian content distribution furthermore we discuss possible extensions to the mobility model to capture speed distance relations that emerge in dense crowds
the past few years have witnessed significant interest in probabilistic logic learning ie in research lying at the intersection of probabilistic reasoning logical representations and machine learning rich variety of different formalisms and learning techniques have been developed this paper provides an introductory survey and overview of the state of the art in probabilistic logic learning through the identification of number of important probabilistic logical and learning concepts
one may wish to use computer graphic images to carry out road visibility studies unfortunately most display devices still have limited luminance dynamic range especially in driving simulators in this paper we propose tone mapping operator tmo to compress the luminance dynamic range while preserving the driver’s performance for visual task relevant for driving situation we address three display issues of some consequences for road image display luminance dynamics image quantization and high minimum displayable luminance our tmo characterizes the effects of local adaptation with bandpass decomposition of the image using laplacian pyramid and processes the levels separately in order to mimic the human visual system the contrast perception model uses the visibility level usual index in road visibility engineering applications to assess our algorithm psychophysical experiment devoted to target detection task was designed using landolt ring the visual performances of observers were measured they stared first at high dynamic range image and then at the same image processed by tmo and displayed on low dynamic range monitor for comparison the evaluation was completed with visual appearance evaluation our operator gives good performances for three typical road situations one in daylight and two at night after comparison with four standard tmos from the literature the psychovisual assessment of our tmo is limited to these driving situations
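a rough sketch of the band pass idea, a laplacian style pyramid splits log luminance into detail bands and a low frequency base, and compressing only the base reduces dynamic range while local contrast is kept, the pyramid depth compression factor and synthetic scene are illustrative choices and not the operator's actual parameters or its visibility level model

```python
# compress the low-frequency base of a log-luminance pyramid, keep the details.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(img, levels=4):
    pyramid, current = [], img
    for _ in range(levels):
        low = gaussian_filter(current, sigma=2)
        pyramid.append(current - low)            # band-pass detail level
        current = low[::2, ::2]                  # downsample for the next level
    pyramid.append(current)                      # low-frequency base
    return pyramid

def tone_map(luminance, base_gain=0.4):
    pyr = laplacian_pyramid(np.log1p(luminance))
    pyr[-1] = pyr[-1] * base_gain                # compress only the base band
    out = pyr[-1]
    for detail in reversed(pyr[:-1]):            # rebuild, reinserting details
        out = zoom(out, 2, order=1)[:detail.shape[0], :detail.shape[1]] + detail
    return np.expm1(out)

rng = np.random.default_rng(0)
ramp = np.exp(np.tile(np.linspace(0.0, 8.0, 64), (64, 1)))   # smooth HDR illumination
texture = np.abs(1.0 + 0.2 * rng.standard_normal((64, 64)))  # fine detail to preserve
hdr = ramp * texture
ldr = tone_map(hdr)
print("peak luminance before:", round(float(hdr.max()), 1),
      "after:", round(float(ldr.max()), 1))
```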
many mobile phones integrate services such as personal calendars given the social nature of the stored data however users often need to access such information as part of phone conversation in typical non headset use this requires users to interrupt their conversations to look at the screen we investigate counter intuitive solution to avoid the need for interruption we replace the visual interface with one based on auditory feedback surprisingly this can be done without interfering with the phone conversation we present blindsight prototype application that replaces the traditionally visual in call menu of mobile phone users interact using the phone keypad without looking at the screen blindsight responds with auditory feedback this feedback is heard only by the user not by the person on the other end of the line we present the results of two user studies of our prototype the first study verifies that useful keypress accuracy can be obtained for the phone at ear position the second study compares the blindsight system against visual baseline condition and finds preference for blindsight
whenever an array element is accessed java virtual machines execute compare instruction to ensure that the index value is within the valid bounds this reduces the execution speed of java programs array bounds check elimination identifies situations in which such checks are redundant and can be removed we present an array bounds check elimination algorithm for the java hotspot tm vm based on static analysis in the just in time compiler the algorithm works on an intermediate representation in static single assignment form and maintains conditions for index expressions it fully removes bounds checks if it can be proven that they never fail whenever possible it moves bounds checks out of loops the static number of checks remains the same but check inside loop is likely to be executed more often if such check fails the executing program falls back to interpreted mode avoiding the problem that an exception is thrown at the wrong place the evaluation shows speedup near to the theoretical maximum for the scientific scimark benchmark suite and also significant improvements for some java grande benchmarks the algorithm slightly increases the execution speed for the specjvm benchmark suite the evaluation of the dacapo benchmarks shows that array bounds checks do not have significant impact on the performance of object oriented applications
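a toy illustration of the loop case only, if a counted loop provably keeps its index inside the array bounds the per access checks can be removed and replaced by one hoisted guard, the tiny loop record and the string comparison against the array length are an invented mini analysis, far simpler than the ssa based conditions the compiler pass maintains

```python
# decide whether a counted loop's index range is provably inside array bounds.
from dataclasses import dataclass

@dataclass
class Loop:
    start: int          # first index value (known constant)
    end_expr: str       # loop bound, e.g. "len(a)" or "len(a) + 1"
    array: str          # array indexed by the loop variable

def checks_removable(loop: Loop) -> bool:
    # provable iff index >= 0 from the start and index < len(array) from the bound
    lower_ok = loop.start >= 0
    upper_ok = loop.end_expr == f"len({loop.array})"   # i < len(a) on every iteration
    return lower_ok and upper_ok

for loop in [Loop(0, "len(a)", "a"),        # for i in 0..len(a): a[i]   -> safe
             Loop(0, "len(a) + 1", "a"),    # off-by-one: checks must stay
             Loop(-1, "len(a)", "a")]:      # negative start: checks must stay
    print(loop, "-> eliminate bounds checks:", checks_removable(loop))
```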
with the increasing use of research paper search engines such as citeseer for both literature search and hiring decisions the accuracy of such systems is of paramount importance this article employs conditional random fields crfs for the task of extracting various common fields from the headers and citation of research papers crfs provide principled way for incorporating various local features external lexicon features and global layout features the basic theory of crfs is becoming well understood but best practices for applying them to real world data requires additional exploration we make an empirical exploration of several factors including variations on gaussian laplace and hyperbolic priors for improved regularization and several classes of features based on crfs we further present novel approach for constraint co reference information extraction ie improving extraction performance given that we know some citations refer to the same publication on standard benchmark dataset we achieve new state of the art performance reducing error in average by and word error rate by in comparison with the previous best svm results accuracy compares even more favorably against hmms on four co reference ie datasets our system significantly improves extraction performance with an error rate reduction of
software reuse is regarded as key software development objective leading to reduction of costs associated with software development and maintenance however there is mounting evidence that software reuse is difficult to achieve in practice and that software development approaches such as component based development and more recently service oriented computing have failed to achieve anticipated levels of reuse in this paper we identify the determinants of service reusability and argue that the design of services plays an important role in achieving high levels of reuse we examine the relationship between service granularity and reuse and note that extensive use of coarse grained document centric services by soa practitioners makes achieving reuse particularly challenging
detection of malicious software malware using machine learning methods has been explored extensively to enable fast detection of newly released malware the performance of these classifiers depends on the induction algorithms being used in order to benefit from multiple different classifiers and exploit their strengths we suggest using an ensemble method that will combine the results of the individual classifiers into one final result to achieve overall higher detection accuracy in this paper we evaluate several combining methods using five different base inducers decision tree naive bayes knn vfi and oner on five malware datasets the main goal is to find the best combining method for the task of detecting malicious files in terms of accuracy auc and execution time
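a small sketch of the combining step, a hard majority vote over decision tree naive bayes and knn base inducers, vfi and oner have no off the shelf scikit learn equivalent so they are omitted, and the features and labels are synthetic placeholders for real static features of executables

```python
# combine three base classifiers with a simple majority vote.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 16))                       # e.g. byte n-gram frequencies
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)       # 1 = malicious (toy labelling)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(max_depth=5)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="hard",                              # majority-vote combining rule
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", round(ensemble.score(X_test, y_test), 3))
```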
this article addresses the problem of recognizing the behavior of person suffering from alzheimer’s disease at early intermediate stages we present keyhole plan recognition model based on lattice theory and action description logic which transforms the recognition problem into classification issue this approach allows us to formalize the plausible incoherent intentions of the patient resulting from the symptoms of his cognitive impairment such as disorientation memory lapse etc an implementation of this model was tested in our smart home laboratory by simulating set of real case scenarios
in this paper we present novel approach to three dimensional human motion estimation from monocular video data we employ particle filter to perform the motion estimation the novelty of the method lies in the choice of state space for the particle filter using non linear inverse kinematics solver allows us to perform the filtering in end effector space this effectively reduces the dimensionality of the state space while still allowing for the estimation of large set of motions preliminary experiments with the strategy show good results compared to full pose tracker
xml is one of the primary encoding schemes for data and knowledge we investigate incremental physical data clustering in systems that store xml documents using native format we formulate the xml clustering problem as partitioning problem over tree augmented with sibling edges and propose the pixsar practical incremental xml sibling augmented reclustering algorithm for incrementally clustering xml documents we show the fundamental importance of workload driven dynamic rearrangement of storage pixsar incrementally executes reclustering operations on selected subgraphs of the global augmented document tree the subgraphs are implied by significant changes in the workload as the workload changes pixsar incrementally adjusts the xml data layout so as to better fit the workload pixsar’s main parameters are the radius in pages of the augmented portion to be reclustered and the way reclustering is triggered we briefly explore some of the effects of indexes full treatment of indexes is the subject of another paper we use an experimental data clustering system that includes fast disk simulator and file system simulator for storing native xml data we use novel method for exporting the saxon query processor into our setting experimental results indicate that using pixsar significantly reduces the number of page faults counting all page faults incurred while querying the document as well as maintenance operations thereby resulting in improved query performance
this article presents new method to model fast volume preservation of mass spring system to achieve realistic and efficient deformable object animation without using internal volumetric meshing with this method the simulated behavior is comparable to finite element method based model at fraction of the computational cost
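one simple way to realise surface only volume preservation, sketched under clear assumptions, the enclosed volume is computed from signed tetrahedra via the divergence theorem and vertices are pushed along their normals by a displacement proportional to the volume deficit, the proportional correction and the tetrahedron test mesh are illustrative and not the paper's formulation

```python
# restore the rest volume of a closed triangle mesh by displacing vertices
# along their normals, without any interior (volumetric) mesh.
import numpy as np

def mesh_volume(vertices, faces):
    v = vertices[faces]                                   # (F, 3, 3)
    return float(np.einsum("ij,ij->i", v[:, 0], np.cross(v[:, 1], v[:, 2])).sum() / 6.0)

def surface_area(vertices, faces):
    v = vertices[faces]
    return float(0.5 * np.linalg.norm(
        np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0]), axis=1).sum())

def vertex_normals(vertices, faces):
    normals = np.zeros_like(vertices)
    v = vertices[faces]
    face_n = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    for i in range(3):                                    # accumulate per vertex
        np.add.at(normals, faces[:, i], face_n)
    return normals / np.maximum(np.linalg.norm(normals, axis=1, keepdims=True), 1e-12)

def preserve_volume(vertices, faces, rest_volume):
    deficit = rest_volume - mesh_volume(vertices, faces)
    step = deficit / surface_area(vertices, faces)        # proportional correction
    return vertices + step * vertex_normals(vertices, faces)

# a unit tetrahedron with outward-oriented faces serves as a tiny closed mesh
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
rest = mesh_volume(verts, faces)

deformed = verts * np.array([1.0, 1.0, 0.6])              # squashed along z
for _ in range(25):
    deformed = preserve_volume(deformed, faces, rest)
print("rest volume:", round(rest, 4),
      "after correction:", round(mesh_volume(deformed, faces), 4))
```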
in recent years the growth of the internet has facilitated the rapid emergence of online communities in this paper we survey key research issues on online communities from the perspectives of both social science and computing technologies we also sample several major online community applications and propose some directions for future research
we present novel approach to interdomain traffic engineering based on the concepts of nash bargaining and dual decomposition under this scheme isps use an iterative procedure to jointly optimize social cost function referred to as the nash product we show that the global optimization problem can be separated into subproblems by introducing appropriate shadow prices on the interdomain flows these subproblems can then be solved independently and in decentralized manner by the individual isps our approach does not require the isps to share any sensitive internal information such as network topology or link weights more importantly our approach is provably pareto efficient and fair therefore we believe that our approach is highly amenable to adoption by isps when compared to past approaches we also conduct simulation studies of our approach over several real isp topologies our evaluation shows that the approach converges quickly offers equitable performance improvements to isps is significantly better than unilateral approaches eg hot potato routing and offers the same performance as centralized solution with full knowledge
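The following LaTeX sketch shows the generic shape of a Nash-product objective decomposed with shadow prices; the symbols are illustrative placeholders, not the paper's notation.

```latex
% Illustrative shape of a Nash-bargaining objective with dual decomposition;
% generic placeholder symbols, not the paper's notation.
\begin{align*}
  \max_{x}\; \prod_{i}\bigl(d_i - c_i(x_i)\bigr)
  \;\Longleftrightarrow\;
  \max_{x}\; \sum_{i}\log\bigl(d_i - c_i(x_i)\bigr)
  \quad\text{s.t.}\quad \textstyle\sum_{i} A_i x_i = f .
\end{align*}
% Relaxing the coupling constraint with shadow prices \lambda gives one
% subproblem per ISP i, solvable locally without sharing internal data:
\begin{align*}
  \min_{x_i}\; -\log\bigl(d_i - c_i(x_i)\bigr) + \lambda^{\top} A_i x_i ,
\end{align*}
% with \lambda updated iteratively (e.g. by subgradient steps) until the
% interdomain flows agreed on by the ISPs are consistent.
```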
we propose deterministic fault tolerant and deadlock free routing protocol in two dimensional meshes based on dimension order routing and the odd even turn model the proposed protocol called extended routing does not use any virtual channels by prohibiting certain locations of faults and destinations faults are contained in set of disjointed rectangular regions called faulty blocks the number of faults to be tolerated is unbounded as long as nodes outside faulty blocks are connected in the mesh network the extended routing can also be used under special convex fault region called an orthogonal faulty block which can be derived from given faulty block by activating some nonfaulty nodes in the block extensions to partially adaptive routing traffic and adaptivity balanced using virtual networks and routing without constraints using virtual channels and virtual networks are also discussed
java application servers are gaining popularity as way for businesses to conduct day to day operations while strong emphasis has been placed on how to obtain peak performance only few research efforts have focused on these servers ability to sustain top performance in spite of the ever changing demands from users as preliminary study we conducted an experiment to observe the throughput degradation behavior of widely used java application server running standardized benchmark and found that throughput performance degrades ungracefully thus the goal of this work is three fold i to identify the primary factors that cause poor throughput degradation ii to investigate how these factors affect throughput degradation and iii to observe how changes in algorithms and policies governing these factors affect throughput degradation
formal semantics for xquery with side effects have been proposed in prior work we propose different semantics which is better suited for database compilation we substantiate this claim by formalizing the compilation of xquery extended with updates into database algebra we prove the correctness of the proposed compilation by mapping both the source language and the algebra to common core language with list comprehensions and extensible tuples
in this paper we describe an investigation into the requirements for and the use of in situ authoring in the creation of location based pervasive and ubicomp experiences we will focus on the co design process with users that resulted in novel visitor experience to historic country estate this has informed the design of new in situ authoring tools supplemented with tools for retrospective revisiting and reorganization of content an initial trial of these new tools will be discussed and conclusions drawn as to the appropriateness of such tools further enhancements as part of future trials will also be described
we consider data clustering problems where partial grouping is known a priori we formulate such biased grouping problems as constrained optimization problem where structural properties of the data define the goodness of grouping and partial grouping cues define the feasibility of grouping we enforce grouping smoothness and fairness on labeled data points so that sparse partial grouping information can be effectively propagated to the unlabeled data considering the normalized cuts criterion in particular our formulation leads to constrained eigenvalue problem by generalizing the rayleigh ritz theorem to projected matrices we find the global optimum in the relaxed continuous domain by eigendecomposition from which near global optimum to the discrete labeling problem can be obtained effectively we apply our method to real image segmentation problems where partial grouping priors can often be derived based on crude spatial attentional map that binds places with common salient features or focuses on expected object locations we demonstrate not only that it is possible to integrate both image structures and priors in single grouping process but also that objects can be segregated from the background without specific object knowledge
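A hedged sketch of how partial grouping cues turn the relaxed criterion into a constrained eigenvalue problem; the symbols below are generic placeholders rather than the article's exact formulation.

```latex
% Placeholder formulation of grouping with partial constraints as a
% constrained (generalized) eigenvalue problem; symbols are illustrative.
\begin{align*}
  \min_{y}\; \frac{y^{\top} L\, y}{y^{\top} D\, y}
  \qquad\text{s.t.}\qquad U^{\top} y = 0 ,
\end{align*}
% where L is a graph Laplacian built from pairwise affinities, D the degree
% matrix, and the columns of U encode the partial grouping cues. Writing
% y = P z for a basis P of the null space of U^{\top} reduces the problem to an
% unconstrained generalized eigenvalue problem in z, which is where the
% generalized Rayleigh-Ritz argument mentioned above comes in.
```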
computing world distances of scene features from the captured images is common task in image analysis and scene understanding projective geometry based methods focus on measuring distance from one single image the scope of measurable scene is limited by the field of view fov of one single camera with full view panorama the scope of measurable scene is no longer limited by fov however the scope of measurable scene is limited by the fixed capture location of single panorama in this paper we propose one method of measuring distances of line segments in real world scene using panoramic video representation the scope of measurable scene is largely extended without the limitation of fov and fixed capture location prototype system called pv measure is developed to allow user to interactively measure the distances of line segments in panoramic video experiment results verify that the method offers good accuracy
cost based xml query optimization calls for accurate estimation of the selectivity of path expressions some other interactive and internet applications can also benefit from such estimations while there are number of estimation techniques proposed in the literature almost none of them has any guarantee on the estimation accuracy within given space limit in addition most of them assume that the xml data are more or less static ie with few updates in this paper we present framework for xml path selectivity estimation in dynamic context specifically we propose novel data structure bloom histogram to approximate xml path frequency distribution within small space budget and to estimate the path selectivity accurately with the bloom histogram we obtain the upper bound of its estimation error and discuss the trade offs between the accuracy and the space limit to support updates of bloom histograms efficiently when underlying xml data change dynamic summary layer is used to keep exact or more detailed xml path information we demonstrate through our extensive experiments that the new solution can achieve significantly higher accuracy with an even smaller space than the previous methods in both static and dynamic environments
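The snippet below is a minimal sketch of the bloom-histogram idea as described above: paths with similar frequencies are grouped into buckets, and each bucket keeps a Bloom filter over its paths plus one representative frequency. The bucketing rule, filter parameters and data are illustrative assumptions, not the paper's exact construction.

```python
# Minimal bloom-histogram sketch: bucket paths by frequency, summarize each
# bucket with a Bloom filter and a representative frequency.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0
    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits
    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p
    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

def build_bloom_histogram(path_frequencies, num_buckets=4):
    """Group paths of similar frequency and summarize each bucket."""
    ranked = sorted(path_frequencies.items(), key=lambda kv: kv[1])
    size = max(1, len(ranked) // num_buckets)
    histogram = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        bf = BloomFilter()
        for path, _ in chunk:
            bf.add(path)
        avg_freq = sum(f for _, f in chunk) / len(chunk)
        histogram.append((bf, avg_freq))
    return histogram

def estimate_selectivity(histogram, path):
    for bf, avg_freq in histogram:
        if path in bf:
            return avg_freq          # representative frequency of the bucket
    return 0                         # unseen path

freqs = {"/a/b": 120, "/a/c": 110, "/a/b/d": 8, "/a/e": 5}
hist = build_bloom_histogram(freqs, num_buckets=2)
print(estimate_selectivity(hist, "/a/b"))
```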
we formulate new approach for evaluating prefetching algorithm we first carry out profiling run of program to identify all of the misses and corresponding locations in the program where prefetches for the misses can be initiated we then systematically control the number of misses that are prefetched the timeliness of these prefetches and the number of unused prefetches we validate the accuracy of our approach by comparing it to one based on markov prefetch algorithm this allows us to measure the potential benefit that any application can receive from prefetching and to analyze application behavior under conditions that cannot be explored with any known prefetching algorithm next we analyze system parameter that is vital to prefetching performance the line transfer interval which is the number of processor cycles required to transfer cache line this interval is determined by technology and bandwidth we show that under ideal conditions prefetching can remove nearly all of the stalls associated with cache misses unfortunately real processor implementations are less than ideal in particular the trend in processor frequency is outrunning on chip and off chip bandwidths today it is not uncommon for processor frequency to be three or four times bus frequency under these conditions we show that nearly all of the performance benefits derived from prefetching are eroded and in many cases prefetching actually degrades performance we carry out quantitative and qualitative analyses of these tradeoffs and show that there is linear relationship between overall performance and three metrics percentage of misses prefetched percentage of unused prefetches and bandwidth
we discuss general techniques centered around the layerwise separation property lsp of planar graph problem that allow to develop algorithms with running time given an instance of problem on planar graphs with parameter problems having lsp include planar vertex cover planar independent set and planar dominating set extensions of our speed up technique to basically all fixed parameter tractable planar graph problems are also exhibited moreover we relate eg the domination number or the vertex cover number with the treewidth of plane graph
we discuss alternative heap architectures for languages that rely on automatic memory management and implement concurrency through asynchronous message passing we describe how interprocess communication and garbage collection happens in each architecture and extensively discuss the tradeoffs that are involved in an implementation setting the erlang otp system where the rest of the runtime system is unchanged we present detailed experimental comparison between these architectures using both synthetic programs and large commercial products as benchmarks
user interface dui design and development requires practitioners designers and developers to represent their ideas in representations designed for machine execution rather than natural representations hampering development of effective duis as such concept oriented design cod was created as theory of software development for both natural and executable design and development instantiated in the toolkit chasm chasm is natural tiered executable user interface description language uidl for duis resulting in improved understandability as well as reduced complexity and reuse chasm’s utility is shown through evaluations by domain experts case studies of long term use and an analysis of spaces
in the recent past the recognition and localization of objects based on local point features has become widely accepted and utilized method among the most popular features are currently the sift features the more recent surf features and region based features such as the mser for time critical application of object recognition and localization systems operating on such features the sift features are too slow ms for images of size on ghz cpu the faster surf achieve computation time of ms which is still too slow for active tracking of objects or visual servoing applications in this paper we present combination of the harris corner detector and the sift descriptor which computes features with high repeatability and very good matching properties within approx ms while just computing the sift descriptors for computed harris interest points would lead to an approach that is not scale invariant we will show how scale invariance can be achieved without time consuming scale space analysis furthermore we will present results of successful application of the proposed features within our system for recognition and localization of textured objects an extensive experimental evaluation proves the practical applicability of our approach
this paper presents novel algorithm for computing graph edit distance ged in image categorization this algorithm is purely structural ie it needs only connectivity structure of the graph and does not draw on node or edge attributes there are two major contributions introducing edge direction histogram edh to characterize shape features of images it is shown that ged can be employed as distance of edhs this algorithm is completely independent of cost function which is difficult to define exactly computing distance of edhs with earth mover distance emd which takes neighborhood bins into account so as to compute distance of edhs correctly set of experiments demonstrate that the newly presented algorithm is suitable for classifying and clustering images and is immune to the planar rotation of images compared with ged from spectral seriation our algorithm can capture the structure change of graphs better and consume time used by the former one the average classification rate is and average clustering rate is higher than the spectral seriation method
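A small sketch of the edge-direction-histogram idea: quantize the direction of each edge of a plane graph into bins and compare two graphs by a transportation distance between the histograms. scipy's one-dimensional Wasserstein distance stands in for EMD here and ignores the circular wrap-around of angles and the neighborhood-bin refinement mentioned above; the bin count, normalization and toy graphs are illustrative choices.

```python
# Edge direction histogram (EDH) of an embedded graph, compared via a 1-D
# Wasserstein distance as a stand-in for EMD.
import math
import numpy as np
from scipy.stats import wasserstein_distance

def edge_direction_histogram(points, edges, bins=16):
    """points: {node: (x, y)}, edges: iterable of (u, v) pairs."""
    hist = np.zeros(bins)
    for u, v in edges:
        (x1, y1), (x2, y2) = points[u], points[v]
        angle = math.atan2(y2 - y1, x2 - x1) % math.pi   # undirected edges
        hist[int(angle / math.pi * bins) % bins] += 1
    return hist / hist.sum()

def edh_distance(h1, h2):
    centers = np.arange(len(h1)) + 0.5
    return wasserstein_distance(centers, centers, h1, h2)

pts_a = {0: (0, 0), 1: (1, 0), 2: (1, 1)}
pts_b = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
edges = [(0, 1), (1, 2), (0, 2)]
print(edh_distance(edge_direction_histogram(pts_a, edges),
                   edge_direction_histogram(pts_b, edges)))
```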
type system with linearity is useful for checking software protocols and resource management at compile time linearity provides powerful reasoning about state changes but at the price of restrictions on aliasing the hard division between linear and nonlinear types forces the programmer to make trade off between checking protocol on an object and aliasing the object most onerous is the restriction that any type with linear component must itself be linear because of this checking protocol on an object imposes aliasing restrictions on any data structure that directly or indirectly points to the object we propose new type system that reduces these restrictions with the adoption and focus constructs adoption safely allows programmer to alias objects on which she is checking protocols and focus allows the reverse programmer can alias data structures that point to linear objects and use focus for safe access to those objects we discuss how we implemented these ideas in the vault programming language
in information retrieval cluster based retrieval is well known attempt in resolving the problem of term mismatch clustering requires similarity information between the documents which is difficult to calculate in feasible time the adaptive document clustering scheme has been investigated by researchers to resolve this problem however its theoretical viewpoint has not been fully explored in this regard we provide conceptual viewpoint of the adaptive document clustering based on query based similarities by regarding the user’s query as concept as result adaptive document clustering scheme can be viewed as an approximation of this similarity based on this idea we derive three new query based similarity measures in language modeling framework and evaluate them in the context of cluster based retrieval comparing with means clustering and full document expansion evaluation result shows that retrievals based on query based similarities significantly improve the baseline while being comparable to other methods this implies that the newly developed query based similarities become feasible criteria for adaptive document clustering
past studies have shown that objects are created and then die in phases thus one way to sustain good garbage collection efficiency is to have large enough heap to allow many allocation phases to complete and most of the objects to die before invoking garbage collection however such an operating environment is hard to maintain in large multithreaded applications because most typical time sharing schedulers are not allocation phase cognizant ie they often schedule threads in way that prevents them from completing their allocation phases quickly thus when garbage collection is invoked most allocation phases have yet to be completed resulting in poor collection efficiency we introduce two new scheduling strategies larf lower allocation rate first and mqrr memory quantum round robin designed to be allocation phase aware by assigning higher execution priority to threads in computation oriented phases the simulation results show that all the reductions of the garbage collection time in generational collector can range from when compared to round robin scheduler the reductions of the overall execution time and the average thread turnaround time range from and respectively
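As a toy illustration of the LARF policy named above (lower allocation rate first), the sketch below orders runnable threads by a recent allocation-rate estimate so that computation-oriented threads run first; the thread records and rates are invented for illustration and are not the paper's simulator.

```python
# Toy LARF ordering: lower recent allocation rate -> higher execution priority.
import heapq

threads = [
    {"id": "worker-1", "alloc_rate_mb_s": 0.2},   # computation-heavy phase
    {"id": "worker-2", "alloc_rate_mb_s": 8.5},   # allocation-heavy phase
    {"id": "worker-3", "alloc_rate_mb_s": 1.1},
]

def larf_order(runnable):
    """Yield thread ids in LARF priority order."""
    heap = [(t["alloc_rate_mb_s"], t["id"]) for t in runnable]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]

print(list(larf_order(threads)))   # ['worker-1', 'worker-3', 'worker-2']
```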
in this paper we propose an improved method of an efficient secure access control labeling under dynamic xml data streams environment the proposed method enables an efficient secure real time processing of query in mobile terminal and supports insertion of new nodes at arbitrary positions in the xml tree without re labeling and without conflicting although some research has been done to maintain the document order in updating the main drawback in most of these works is that if deletion and or insertion occur regularly then expensive re computing of affected labels is needed therefore we focus on how to design an efficient secure labeling scheme for xml trees which are frequently updated under dynamic xml data streams
fundamental to data cleaning is the need to account for multiple data representations we propose formal framework that can be used to reason about and manipulate data representations the framework is declarative and combines elements of generative grammar with database querying it also incorporates actions in the spirit of programming language compilers this framework has multiple applications such as parsing and data normalization data normalization is interesting in its own right in preparing data for analysis as well as in pre processing data for further cleansing we empirically study the utility of the framework over several real world data cleaning scenarios and find that with the right normalization often the need for further cleansing is minimized
customers purchase behavior may vary over time traditional collaborative filtering cf methods make recommendations to target customer based on the purchase behavior of customers whose preferences are similar to those of the target customer however the methods do not consider how the customers purchase behavior may vary over time in contrast the sequential rule based recommendation method analyzes customers purchase behavior over time to extract sequential rules in the form purchase behavior in previous periods implies purchase behavior in the current period if target customer’s purchase behavior history is similar to the conditional part of the rule then his her purchase behavior in the current period is deemed to be the consequent part of the rule although the sequential rule method considers the sequence of customers purchase behavior over time it does not utilize the target customer’s purchase data for the current period to resolve the above problems this work proposes novel hybrid recommendation method that combines the segmentation based sequential rule method with the segmentation based knn cf method the proposed method uses customers rfm recency frequency and monetary values to cluster customers into groups with similar rfm values for each group of customers sequential rules are extracted from the purchase sequences of that group to make recommendations meanwhile the segmentation based knn cf method provides recommendations based on the target customer’s purchase data for the current period then the results of the two methods are combined to make final recommendations experiment results show that the hybrid method outperforms traditional cf methods
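A condensed, hedged sketch of the hybrid scheme: segment customers by RFM with k-means, then blend a sequential-rule score with a kNN-CF score. The two scorers are stubs, and the data and equal-weight blend are illustrative assumptions rather than the paper's method.

```python
# Hybrid recommendation sketch: RFM segmentation plus a weighted blend of two
# (stubbed) recommendation scores.
import numpy as np
from sklearn.cluster import KMeans

rfm = np.array([[5, 20, 900.0],   # recency (days), frequency, monetary
                [40, 2, 35.0],
                [7, 15, 610.0],
                [60, 1, 20.0]])
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rfm)

def sequential_rule_scores(customer, segment):
    # Stub: would match the customer's purchase history against sequential
    # rules mined from this segment's purchase sequences.
    return {"item_a": 0.8, "item_b": 0.1}

def knn_cf_scores(customer, segment):
    # Stub: would score items from current-period purchases of the customer's
    # nearest neighbours within the same segment.
    return {"item_a": 0.3, "item_b": 0.6}

def hybrid_recommend(customer, segment, weight=0.5, top_n=1):
    rule = sequential_rule_scores(customer, segment)
    cf = knn_cf_scores(customer, segment)
    combined = {i: weight * rule.get(i, 0) + (1 - weight) * cf.get(i, 0)
                for i in set(rule) | set(cf)}
    return sorted(combined, key=combined.get, reverse=True)[:top_n]

print(hybrid_recommend(customer=0, segment=segments[0]))
```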
this paper defines the basic notions of local and non local tasks and determines the minimum information about failures that is necessary to solve any non local task in message passing systems it also introduces natural weakening of the well known set agreement task and shows that in some precise sense it is the weakest non local task in message passing systems
in several applications such as databases planning and sensor networks parameters such as selectivity load or sensed values are known only with some associated uncertainty the performance of such system as captured by some objective function over the parameters is significantly improved if some of these parameters can be probed or observed in resource constrained situation deciding which parameters to observe in order to optimize system performance itself becomes an interesting and important optimization problem this problem is the focus of this paper unfortunately designing optimal observation schemes is np hard even for the simplest objective functions leading to the study of approximation algorithms in this paper we present general techniques for designing non adaptive probing algorithms which are at most constant factor worse than optimal adaptive probing schemes interestingly this shows that for several problems of interest while probing yields significant improvement in the objective function being adaptive about the probing is not beneficial beyond constant factors
reps formerly called dsls are multiscale medial means for modeling and rendering solid geometry they are particularly well suited to model anatomic objects and in particular to capture prior geometric information effectively in deformable models segmentation approaches the representation is based on figural models which define objects at coarse scale by hierarchy of figures each figure generally slab representing solid region and its boundary simultaneously this paper focuses on the use of single figure models to segment objects of relatively simple structure a single figure is sheet of medial atoms which is interpolated from the model formed by net ie mesh or chain of medial atoms hence the name reps each atom modeling solid region via not only position and width but also local figural frame giving figural directions and an object angle between opposing corresponding positions on the boundary implied by the rep the special capability of an rep is to provide spatial and orientational correspondence between an object in two different states of deformation this ability is central to effective measurement of both geometric typicality and geometry to image match the two terms of the objective function optimized in segmentation by deformable models the other ability of reps central to effective segmentation is their ability to support segmentation at multiple levels of scale with successively finer precision objects modeled by single figures are segmented first by similarity transform augmented by object elongation then by adjustment of each medial atom and finally by displacing dense sampling of the rep implied boundary while these models and approaches also exist in we focus on objects the segmentation of the kidney from ct and the hippocampus from mri serve as the major examples in this paper the accuracy of segmentation as compared to manual slice by slice segmentation is reported
in this paper new strategy is proposed to defend against colluding malicious nodes in sensor network the strategy is based on new relaxation labelling algorithm to classify nodes into benign or malicious ones only reports from benign nodes can then be used to perform localisation and obtain accurate results experimental results based on simulations and field experiments illustrate the performance of the algorithm
with the explosive growth of web and the recent development in digital media technology the number of images on the web has grown tremendously consequently web image clustering has emerged as an important application some of the initial efforts along this direction revolved around clustering web images based on the visual features of images or textual features by making use of the text surrounding the images however not much work has been done in using multimodal information for clustering web images in this paper we propose graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering specifically we model visual features images and words from surrounding text using tripartite graph partitioning this graph leads to clustering of the web images although graph partitioning approach has been adopted before the main contribution of this work lies in new algorithm that we propose consistent isoperimetric high order co clustering cihc for partitioning the tripartite graph computationally cihc is very quick as it requires simple solution to sparse system of linear equations our theoretical analysis and extensive experiments performed on real web images demonstrate the performance of cihc in terms of the quality efficiency and scalability in partitioning the visual feature image word tripartite graph
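The sketch below illustrates, on a toy tripartite graph of visual features, images and words, the kind of sparse-linear-system partitioning the abstract alludes to: build the graph Laplacian, ground one vertex, solve against the degree vector, and threshold the solution. The block weights and the thresholding rule are invented for illustration and do not reproduce CIHC itself.

```python
# Isoperimetric-style bipartition of a small tripartite graph via one sparse
# linear solve; all weights below are toy values.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# Block weights: features x images (F), images x words (W).
F = np.array([[1.0, 0.2], [0.2, 1.0]])             # 2 features, 2 images
W = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])   # 2 images, 3 words
nf, ni, nw = F.shape[0], F.shape[1], W.shape[1]
n = nf + ni + nw

A = np.zeros((n, n))
A[:nf, nf:nf + ni] = F
A[nf:nf + ni, nf + ni:] = W
A = A + A.T                                        # symmetric tripartite adjacency

d = A.sum(axis=1)
L = csr_matrix(np.diag(d) - A)                     # graph Laplacian

ground = 0                                         # ground an arbitrary vertex
keep = [i for i in range(n) if i != ground]
x = np.zeros(n)
x[keep] = spsolve(L[keep][:, keep], d[keep])       # solve L0 x0 = d0

labels = (x > np.median(x[keep])).astype(int)      # threshold into two clusters
print(labels)                                      # joint labels for features, images, words
```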
in this paper we propose prima privacy manager privacy protection mechanism which supports semi automated generation of access rules for users profile information prima access rules are tailored by the users privacy preferences for their profile data the sensitivity of the data itself and the objective risk of disclosing this data to other users the resulting rules are simple yet powerful specifications indicating the adequate level of protection for each user and are dynamically adapted to the ever changing setting of the users preferences and sn configuration
munisocket multiple network interface socket provides mechanisms to enhance the communication performance properties such as throughput transfer time and reliability by utilizing the existing multiple network interface cards on communicating hosts although the munisocket model has some communication performance advantages over the regular socket it also has number of usability and manageability drawbacks including the complexity of establishing multiple channels and configuring them for good communication performance this paper discusses some enhancements for munisocket using autonomic computing techniques these techniques include self discovery for discovering the existence of network interfaces and their performance properties self configuration for establishing channels over the interfaces and self optimization for selecting the best channels combinations for efficiently sending messages of varying sizes while these techniques enhance the communication performance among computers they also reduce the complexity of configuring munisocket and make its interface compatible with the regular tcp socket interface which in turn allows for transparent use of munisocket by the applications
bounded semantics of ltl with existential interpretation and that of ectl the existential fragment of ctl and the characterization of these existentially interpreted properties have been studied and used as the theoretical basis for sat based bounded model checking this has led to lot of successful work with respect to error detection in the checking of ltl and actl the universal fragment of ctl properties by satisfiability testing bounded semantics of ltl with the universal interpretation and that of actl and the characterization of such properties by propositional formulas have not been successfully established and this hinders practical verification of valid universal properties by satisfiability checking this paper studies this problem and the contribution is bounded semantics for actl and characterization of actl properties by propositional formulas firstly we provide simple bounded semantics for actl without considering the practical aspect of the semantics based on converting kripke model to model called model in which the transition relation is captured by set of paths each path with transitions this bounded semantics is not practically useful for the evaluation of formula since it involves too many paths in the model then the technique is to divide the model into submodels with limited number of paths which depends on and the actl property to be verified such that if an actl property is true in every such model then it is true in the model as well this characterization can then be used as the basis for practical verification of valid actl properties by satisfiability checking simple case study is provided to show the use of this approach for both verification and error detection of an abstract two process program written as first order transition system
energy efficiency is one of the main concerns in the wireless information dissemination system this paper presents wireless broadcast stream organization scheme which enables complex queries eg aggregation queries to be processed in an energy efficient way for efficient processing of complex queries we propose an approach of broadcasting their pre computed results with the data stream wherein the way of replication of index and pre computation results are investigated through analysis and experiments we show that the new approach can achieve significant performance enhancement for complex queries with respect to the access time and tuning time
weblogs have become prevalent source of information for people to express themselves in general there are two genres of contents in weblogs the first kind is about the webloggers personal feelings thoughts or emotions we call this kind of weblogs affective articles the second kind of weblogs is about technologies and different kinds of informative news in this paper we present machine learning method for classifying informative and affective articles among weblogs we consider this problem as binary classification problem by using machine learning approaches we achieve about on information retrieval performance measures including precision recall and we set up three studies on the applications of above classification approach in both research and industrial fields the above classification approach is used to improve the performance of classification of emotions from weblog articles we also develop an intent driven weblog search engine based on the classification techniques to improve the satisfaction of web users finally our approach is applied to search for weblogs with great deal of informative articles
main memory cache performance continues to play an important role in determining the overall performance of object oriented object relational and xml databases an effective method of improving main memory cache performance is to prefetch or pre load pages in advance of their usage in anticipation of main memory cache misses in this paper we describe framework for creating prefetching algorithms with the novel features of path and cache consciousness path consciousness refers to the use of short sequences of object references at key points in the reference trace to identify paths of navigation cache consciousness refers to the use of historical page access knowledge to guess which pages are likely to be main memory cache resident most of the time and then assumes these pages do not exist in the context of prefetching we have conducted number of experiments comparing our approach against four highly competitive prefetching algorithms the results show our approach outperforms existing prefetching techniques in some situations while performing worse in others we provide guidelines as to when our algorithm should be used and when others may be more desirable
in this paper we propose means to enhance an architecture description language with description of component behavior notation used for this purpose should be able to express the interplay on the component’s interfaces and reflect step by step refinement of the component’s specification during its design in addition the notation should be easy to comprehend and allow for formal reasoning about the correctness of the specification refinement and also about the correctness of an implementation in terms of whether it adheres to the specification targeting all these requirements together the paper proposes employing behavior protocols which are based on notation similar to regular expressions as proof of the concept the behavior protocols are used in the sofa architecture description language at three levels interface frame and architecture key achievements of this paper include the definitions of bounded component behavior and protocol conformance relation using these concepts the designer can verify the adherence of component’s implementation to its specification at runtime while the correctness of refining the specification can be verified at design time
the problem of scheduling resources for tasks with variable requirements over time can be stated as follows we are given two sequences of vectors one sequence represents resource availability during successive time intervals and the other sequence represents the resource requirements of task during its intervals we wish to find the earliest start interval termed latency such that in every subsequent interval each element of the availability vector is at least the corresponding element of the requirement vector one application of this problem is scheduling for multimedia presentations the fastest known algorithm to compute the optimal solution of this problem has computation time of order sqrt log amir and farach in proceedings of the acm siam symposium on discrete algorithms soda san francisco ca inf comput we propose technique that approximates the optimal solution in linear time we evaluated the performance of our algorithm when used for multimedia scheduling our results show that of the time our solution is within of the optimal
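A direct quadratic-time check of the feasibility condition stated above is easy to write down; the linear-time method of the paper approximates this exact search. The variable names below are illustrative, since the original symbols were lost.

```python
# Exhaustive search for the earliest start interval at which availability
# dominates the requirements componentwise in every subsequent interval.
def earliest_feasible_start(availability, requirements):
    """availability, requirements: lists of equal-length resource vectors."""
    n, m = len(availability), len(requirements)
    for start in range(n - m + 1):
        if all(availability[start + j][k] >= requirements[j][k]
               for j in range(m)
               for k in range(len(requirements[j]))):
            return start        # the latency of the task
    return None                 # no feasible placement

avail = [[2, 1], [3, 2], [1, 1], [4, 4]]
need = [[1, 1], [1, 1]]
print(earliest_feasible_start(avail, need))   # -> 0
```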
aspects are now commonly used to add functionality that otherwise would cut across the structure of object systems in this survey both directions in the connection between aspects and formal methods are examined on the one hand the use of aspects to facilitate general software verification and especially model checking is demonstrated on the other hand the new challenges to formal specification and verification posed by aspects are defined and several existing solutions are described
to advance research and improve the scientific return on data collection and interpretation efforts in the geosciences we have developed methods of interactive visualization with special focus on immersive virtual reality vr environments earth sciences employ strongly visual approach to the measurement and analysis of geologic data due to the spatial and temporal scales over which such data ranges as observations and simulations increase in size and complexity the earth sciences are challenged to manage and interpret increasing amounts of data reaping the full intellectual benefits of immersive vr requires us to tailor exploratory approaches to scientific problems these applications build on the visualization method’s strengths using both perception and interaction with data and models to take advantage of the skills and training of the geological scientists exploring their data in the vr environment this interactive approach has enabled us to develop suite of tools that are adaptable to range of problems in the geosciences and beyond
nested datatypes are families of datatypes that are indexed over all types such that the constructors may relate different family members moreover the argument types of the constructors refer to indices given by expressions where the family name may occur especially in this case of true nesting there is no direct support by theorem provers to guarantee termination of functions that traverse these data structures a joint article with abel and uustalu tcs pp proposes iteration schemes that guarantee termination not by structural requirements but just by polymorphic typing they are generic in the sense that no specific syntactic form of the underlying datatype functor is required in subsequent work accepted for the journal of functional programming the author introduced an induction principle for the verification of programs obtained from mendler style iteration of rank which is one of those schemes and justified it in the calculus of inductive constructions through an implementation in the theorem prover coq the new contribution is an extension of this work to generalized mendler iteration introduced in abel et al cited above leading to map fusion theorem for the obtained iterative functions the results and their implementation in coq are used for case study on representation of untyped lambda calculus with explicit flattening substitution is proven to fulfill two of the three monad laws the third only for hereditarily canonical terms but this is rectified by relativisation of the whole construction to those terms
we study the problem of tree pattern query rewriting using multiple views for the class of tree patterns previous work has considered the rewriting problem using single view we consider two different ways of combining multiple views define rewritings of tree pattern using these combinations and study the relationship between them we show that when rewritings using single views do not exist we may use such combinations of multiple views to rewrite query and even if rewritings using single views do exist the rewritings using combinations of multiple views may provide more answers than those provided by the union of the rewritings using the individual views we also study properties of intersections of tree patterns and present algorithms for finding rewritings using intersections of views
the development of shape repositories and databases raises the need for online visualization of objects the main issue with the remote visualization of large meshes is the transfer latency of the geometric information the remote viewer requires the transfer of all polygons before allowing object’s manipulation to avoid this latency problem an approach is to send several levels of details of the same object so that lighter versions can be displayed sooner and replaced with more detailed version later on this strategy requires more bandwidth implies abrupt changes in object aspect as the geometry refines as well as non negligible precomputing time since the appearance of model is more influenced by its normal field than its geometry we propose framework in which the object’s lod is replaced with single simplified mesh with lod of appearance by using appearance preserving octree textures apo this appearance lod is encoded in unique texture and the details are progressively downloaded when they are needed our apo based framework achieves nearly immediate object rendering while details are transmitted and smoothly added to the texture scenes keep low geometry complexity while being displayed at interactive framerate with maximum of visual details leading to better visual quality over bandwidth ratio than pure geometric lod schemes our implementation is platform independent as it uses jogl and runs on simple web browser furthermore the framework doesn’t require processing on the server side during the client rendering
transactional memory tm is concurrent programming paradigm that aims to make concurrent programming easier than fine grain locking whilst providing similar performance and scalability several tm systems have been made available for research purposes however there is lack of wide range of non trivial benchmarks with which to thoroughly evaluate these tm systems this paper introduces lee tm non trivial and realistic tm benchmark suite based on lee’s routing algorithm the benchmark suite provides sequential lock based and transactional implementations to enable direct performance comparison lee’s routing algorithm has several of the desirable properties of non trivial tm benchmark such as large amounts of parallelism complex contention characteristics and wide range of transaction durations and lengths sample evaluation shows unfavourable transactional performance and scalability compared to lock based execution in contrast to much of the published tm evaluations and highlights the need for non trivial tm benchmarks
transactional memory offers significant advantages for concurrency control compared to locks this paper presents the design and implementation of transactional memory constructs in an unmanaged language unmanaged languages pose unique set of challenges to transactional memory constructs for example lack of type and memory safety use of function pointers aliasing of local variables and others this paper describes novel compiler and runtime mechanisms that address these challenges and optimize the performance of transactions in an unmanaged environment we have implemented these mechanisms in production quality compiler and high performance software transactional memory runtime we measure the effectiveness of these optimizations and compare the performance of lock based versus transaction based programming on set of concurrent data structures and the splash benchmark suite on processor smp system the transaction based version of the splash benchmarks scales much better than the coarse grain locking version and performs comparably to the fine grain locking version compiler optimizations significantly reduce the overheads of transactional memory so that on single thread the transaction based version incurs only about overhead compared to the lock based version for the splash benchmark suite thus our system is the first to demonstrate that transactions integrate well with an unmanaged language and can perform as well as fine grain locking while providing the programming ease of coarse grain locking even on an unmanaged environment
in single second modern processor can execute billions of instructions obtaining bird’s eye view of the behavior of program at these speeds can be difficult task when all that is available is cycle by cycle examination in many programs behavior is anything but steady state and understanding the patterns of behavior at run time can unlock multitude of optimization opportunities in this paper we present unified profiling architecture that can efficiently capture classify and predict phase based program behavior on the largest of time scales by examining the proportion of instructions that were executed from different sections of code we can find generic phases that correspond to changes in behavior across many metrics by classifying phases generically we avoid the need to identify phases for each optimization and enable unified prediction scheme that can forecast future behavior our analysis shows that our design can capture phases that account for over of execution using less than bytes of on chip memory
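A software sketch of the classification step described above: summarize each interval by the proportion of instructions executed from each code region and match it against existing phase signatures by Manhattan distance, opening a new phase when no signature is close. The threshold, region granularity and trace data are illustrative assumptions; the paper realizes this in dedicated on-chip profiling hardware.

```python
# Phase classification from per-interval code-region proportions.
import numpy as np

def interval_vector(region_counts, num_regions):
    v = np.bincount(region_counts, minlength=num_regions).astype(float)
    return v / v.sum()

def classify_intervals(intervals, num_regions, threshold=0.4):
    phases, labels = [], []
    for counts in intervals:
        v = interval_vector(counts, num_regions)
        dists = [np.abs(v - p).sum() for p in phases]   # Manhattan distance
        if dists and min(dists) < threshold:
            labels.append(int(np.argmin(dists)))        # existing phase
        else:
            phases.append(v)                            # new phase signature
            labels.append(len(phases) - 1)
    return labels

# Each interval is a list of "region ids" of executed instructions (toy data).
trace = [[0, 0, 1, 0], [0, 1, 0, 0], [2, 2, 3, 2], [2, 3, 2, 2]]
print(classify_intervals(trace, num_regions=4))   # e.g. [0, 0, 1, 1]
```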
every human being reading short report concerning road accident gets an idea of its causes the work reported here attempts to enable computer to do the same ie to determine the causes of an event from textual description of it it relies heavily on the notion of norm for two reasons the notion of cause has often been debated but remains poorly understood we postulate that what people tend to take as the cause of an abnormal event like an accident is the fact that specific norm has been violated natural language processing has given prominent place to deduction and for what concerns semantics to truth based inference however norm based inference is much more powerful technique to get the conclusions that human readers derive from text the paper describes complete chain of treatments from the text to the determination of the cause the focus is set on what is called linguistic and semantico pragmatic reasoning the former extracts so called semantic literals from the result of the parse and the latter reduces the description of the accident to small number of kernel literals which are sufficient to determine its cause both of them use non monotonic reasoning system viz lparse and smodels several issues concerning the representation of modalities and time are discussed and illustrated by examples taken from corpus of reports obtained from an insurance company
link analysis algorithms have been extensively used in web information retrieval however current link analysis algorithms generally work on flat link graph ignoring the hierarchical structure of the web graph they often suffer from two problems the sparsity of link graph and biased ranking of newly emerging pages in this paper we propose novel ranking algorithm called hierarchical rank as solution to these two problems which considers both the hierarchical structure and the link structure of the web in this algorithm web pages are first aggregated based on their hierarchical structure at directory host or domain level and link analysis is performed on the aggregated graph then the importance of each node on the aggregated graph is distributed to individual pages belonging to the node based on the hierarchical structure this algorithm allows the importance of linked web pages to be distributed in the web page space even when the space is sparse and contains new pages experimental results on the gov collection of trec and show that hierarchical ranking algorithm consistently outperforms other well known ranking algorithms including the pagerank blockrank and layerrank in addition experimental results show that link aggregation at the host level is much better than link aggregation at either the domain or directory levels
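A compact sketch of the aggregate-then-distribute idea: collapse page-level links into a host-level graph, run a standard link analysis (PageRank via networkx here) on the aggregated graph, and spread each host's score over its pages. The uniform distribution rule and the toy links are assumptions, not the paper's exact weighting.

```python
# Host-level aggregation, link analysis, then distribution back to pages.
from collections import defaultdict
from urllib.parse import urlparse
import networkx as nx

page_links = [
    ("http://a.com/1", "http://b.com/1"),
    ("http://a.com/2", "http://b.com/2"),
    ("http://b.com/1", "http://a.com/1"),
]

def host(url):
    return urlparse(url).netloc

# 1. Aggregate page links into a host-level graph.
host_graph = nx.DiGraph()
pages_by_host = defaultdict(set)
for src, dst in page_links:
    pages_by_host[host(src)].add(src)
    pages_by_host[host(dst)].add(dst)
    if host(src) != host(dst):
        host_graph.add_edge(host(src), host(dst))

# 2. Link analysis on the aggregated graph.
host_rank = nx.pagerank(host_graph)

# 3. Distribute each host's importance to its pages (uniformly here).
page_rank = {p: host_rank[h] / len(pages)
             for h, pages in pages_by_host.items() for p in pages}
print(page_rank)
```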
dynamic tree data structures maintain forests that change over time through edge insertions and deletions besides maintaining connectivity information in logarithmic time they can support aggregation of information over paths trees or both we perform an experimental comparison of several versions of dynamic trees st trees et trees rc trees and two variants of top trees self adjusting and worst case we quantify their strengths and weaknesses through tests with various workloads most stemming from practical applications we observe that simple linear time implementation is remarkably fast for graphs of small diameter and that worst case and randomized data structures are best when queries are very frequent the best overall performance however is achieved by self adjusting st trees
this paper revisits the general hypermedia architecture based on perspective of peer to peer pp networking and pervasive computing and argues that pp has much to offer open hypermedia
virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software applications in this paper we consider common resource consolidation scenario in which several database management system instances each running in virtual machine are sharing common pool of physical computing resources we address the problem of optimizing the performance of these database management systems by controlling the configurations of the virtual machines in which they run these virtual machine configurations determine how the shared physical resources will be allocated to the different database instances we introduce virtualization design advisor that uses information about the anticipated workloads of each of the database systems to recommend workload specific configurations offline furthermore runtime information collected after the deployment of the recommended configurations can be used to refine the recommendation to estimate the effect of particular resource allocation on workload performance we use the query optimizer in new what if mode we have implemented our approach using both postgresql and db and we have experimentally evaluated its effectiveness using dss and oltp workloads
discrete transforms are of primary importance and fundamental kernels in many computationally intensive scientific applications in this paper we investigate the performance of two such algorithms fast fourier transform fft and discrete wavelet transform dwt on the sony toshiba ibm cell broadband engine cell be heterogeneous multicore chip architected for intensive gaming applications and high performance computing we design an efficient parallel implementation of fast fourier transform fft to fully exploit the architectural features of the cell be our fft algorithm uses an iterative out of place approach and for to complex input samples outperforms all other parallel implementations of fft on the cell be including fftw our fft implementation obtains single precision performance of gflop on the cell be outperforming intel duo core woodcrest for inputs of greater than samples we also optimize discrete wavelet transform dwt in the context of jpeg for the cell be dwt has an abundant parallelism however due to the low temporal locality of the algorithm memory bandwidth becomes significant bottleneck in achieving high performance we introduce novel data decomposition scheme to achieve highly efficient dma data transfer and vectorization with low programming complexity also we merge the multiple steps in the algorithm to reduce the bandwidth requirement this leads to significant enhancement in the scalability of the implementation our optimized implementation of dwt demonstrates and times speedup using one cell be chip to the baseline code for the lossless and lossy transforms respectively we also provide the performance comparison with the amd barcelona quad core opteron processor and the cell be excels the amd barcelona processor this highlights the advantage of the cell be over general purpose multicore processors in processing regular and bandwidth intensive scientific applications
software development is cooperative activity that heavily relies on the quality and effectiveness of the communication channels established within the development team and with the end user in the software engineering field several software engineering environments see have been developed to support and facilitate software development the most recent generation of these environments called process centered see psee supports the definition and the execution of various phases of the software process this is achieved by explicitly defining cooperation procedures and by supporting synchronization and data sharing among its users actually cooperation support is theme of general interest and applies to all domains where computers can be exploited to support human intensive activities this has generated variety of research initiatives and support technology that is usually denoted by the acronym cscw computer supported cooperative work psee and cscw technologies have been developed rather independently from each other leading to large amount of research results tools and environments and practical experiences we argue that we have reached stage in technology development where it is necessary to assess and evaluate the effectiveness of the research efforts carried out so far moreover it is important to understand how to integrate and exploit the results of these different efforts the goal of the paper is to understand which kind of basic functionalities psee can and should offer and how these environments can be integrated with other tools to effectively support cooperation in software development in particular the paper introduces process model we have built to support cooperative activity related to anomaly management in an industrial software factory the core of the paper is then constituted by the presentation and discussion of the experiences and results that we have derived from this modeling activity and how they related to the general problem of supporting cooperation in software development the project was carried out using the spade psee and the imaginedesk cscw toolkit both developed at politecnico di milano and cefriel during the past four years
this paper introduces gophers social game for mobile devices that utilises task oriented gameplay to create novel entertainment experience the study combines number of key research themes mobile social gaming acquiring useful data through gameplay and content sharing in mobile settings the experience of trialling the game in the real world is discussed and the findings from the study are presented
in this paper we compare and contrast two techniques to improve capacity conflict miss traffic in cc numa dsm clusters page migration replication optimizes read write accesses to page used by single processor by migrating the page to that processor and replicates all read shared pages in the sharers local memories numa optimizes read write accesses to any page by allowing processor to cache that page in its main memory page migration replication requires less hardware complexity as compared to numa but has limited applicability and incurs much higher overheads even with tuned hardware software support in this paper we compare and contrast page migration replication and numa on simulated clusters of symmetric multiprocessors executing shared memory applications our results show that both page migration replication and numa significantly improve the system performance over first touch migration in many applications page migration replication has limited opportunity and can not eliminate all the capacity conflict misses even with fast hardware support and unlimited amount of memory numa always performs best given page cache large enough to fit an application’s primary working set and subsumes page migration replication numa benefits more from hardware support to accelerate page operations than page migration replication and integrating page migration replication into numa to help reduce the hardware cost requires sophisticated mechanisms and policies to select candidates for page migration replication
the problem of implementing shared object of one type from shared objects of other types has been extensively researched recent focus has mostly been on wait free implementations which permit every process to complete its operations on implemented objects regardless of the speeds of other processes it is known that shared objects of different types have differing abilities to support wait free implementations it is therefore natural to want to arrange types in hierarchy that reflects their relative abilities to support wait free implementations in this paper we formally define robustness and other desirable properties of hierarchies roughly speaking hierarchy is robust if each type is stronger than any combination of lower level types we study two specific hierarchies one that we call hrm in which the level of type is based on the ability of an unbounded number of objects of that type and another hierarchy that we call hr in which type’s level is based on the ability of fixed number of objects of that type we prove that resource bounded hierarchies such as hr and its variants are not robust we also establish the unique importance of hrm every nontrivial robust hierarchy if one exists is necessarily coarsening of hrm
lazy dfa deterministic finite automata approach has been recently proposed for efficient xml stream data processing this paper discusses the drawbacks of the approach suggests several optimizations as solutions and presents detailed analysis for the processing model the experiments show that our proposed approach is indeed effective and scalable
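As a hedged illustration of lazy DFA construction for stream processing, the sketch below determinizes the NFA of a tiny linear path query on demand, memoizing one DFA transition per state set and element name actually seen in the stream; it is far simpler than a full XPath engine and only meant to show the lazy construction.

```python
# Lazy subset construction for a linear path query with '/' and '//' axes.
import re

def compile_query(query):
    """'/a//b/c' -> list of (name, is_descendant) steps."""
    return [(name, axis == "//") for axis, name in re.findall(r"(//|/)(\w+)", query)]

def lazy_transition(states, tag, steps, dfa_cache):
    key = (states, tag)
    if key not in dfa_cache:                       # build the transition lazily
        nxt = set()
        for s in states:
            if s < len(steps):
                name, descendant = steps[s]
                if name == tag:
                    nxt.add(s + 1)                 # step matched
                if descendant:
                    nxt.add(s)                     # '//' may skip this element
        dfa_cache[key] = frozenset(nxt)
    return dfa_cache[key]

def matches(branch, query):
    """branch: list of element names along one root-to-leaf path."""
    steps, cache = compile_query(query), {}
    states = frozenset({0})
    for tag in branch:
        states = lazy_transition(states, tag, steps, cache)
    return len(steps) in states

print(matches(["a", "x", "b", "c"], "/a//b/c"))    # True
print(matches(["a", "b"], "/a//b/c"))              # False
```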
formal methods are helpful for many issues raised in the web services area in this article we advocate the use of process algebra as first step in the design and development of executable web services verification tools can be used to validate the correct execution of these formal descriptions we define some guidelines to encode abstract specifications of services to be written using these calculi into executable web services as back end language we consider bpel as the orchestration language we illustrate our approach through the development of simple business application
the limited built in configurability of linux can lead to expensive code size overhead when it is used in the embedded market to overcome this problem we propose the application of link time compaction and specialization techniques that exploit the a priori known fixed runtime environment of many embedded systems in experimental setups based on the arm xscale and platforms the proposed techniques are able to reduce the kernel memory footprint by over percent we also show how relatively simple additions to existing binary rewriters can implement the proposed techniques for complex very unconventional program such as the linux kernel we note that even after specialization lot of seemingly unnecessary code remains in the kernel and propose to reduce the footprint of this code by applying code compression techniques this technique combined with the previous ones reduces the memory footprint by over percent for the platform and percent for the arm platform finally we pinpoint an important code size growth problem when compaction and compression techniques are combined on the arm platform
modern intrusion detection systems are comprised of three basically different approaches host based network based and third relatively recent addition called procedural based detection the first two have been extremely popular in the commercial market for number of years now because they are relatively simple to use understand and maintain however they fall prey to number of shortcomings such as scaling with increased traffic requirements use of complex and false positive prone signature databases and their inability to detect novel intrusive attempts this intrusion detection system interacts with the access control system to deny further access when detection occurs and represents practical implementation addressing these and other concerns this paper presents an overview of our work in creating practical database intrusion detection system based on many years of database security research the proposed solution detects wide range of specific and general forms of misuse provides detailed reports and has low false alarm rate traditional commercial implementations of database security mechanisms are very limited in defending successful data attacks authorized but malicious transactions can make database useless by impairing its integrity and availability the proposed solution offers the ability to detect misuse and subversion through the direct monitoring of database operations inside the database host providing an important complement to host based and network based surveillance suites of the proposed solution may be deployed throughout network and their alarms managed correlated and acted on by remote or local subscribing security services thus helping to address issues of decentralized management
rule bases are increasingly being used as repositories of knowledge content on the semantic web as the size and complexity of these rule bases increases developers and end users need methods of rule abstraction to facilitate rule management in this paper we describe rule abstraction method for semantic web rule language swrl rules that is based on lexical analysis and set of heuristics our method results in tree data structure that we exploit in creating techniques to visualize paraphrase and categorize swrl rules we evaluate our approach by applying it to several biomedical ontologies that contain swrl rules and show how the results reveal rule patterns within the rule base we have implemented our method as plug in tool for protégé owl the most widely used ontology modeling software for the semantic web our tool can allow users to rapidly explore content and patterns in swrl rule bases enabling their acquisition and management
this paper describes trap software tool that enables new adaptable behavior to be added to existing programs in transparent fashion in previous investigations we used an aspect oriented approach to manually define aspects for adaptation infrastructure which were woven into the original application code at compile time in follow on work we developed trap transparent shaping technique for automatically generating adaptation aspects where trap is specific instantiation of trap this paper presents our work into building trap which was intended to be port of trap into designing trap required us to overcome two major hurdles lack of reflection in and the incompatibility between the management of objects in and the aspect weaving technique used in trap we used generative programming methods to produce two tools trapgen and trapcc that work together to produce the desired trap functionality details of the trap architecture and operation are presented which we illustrate with description of case study that adds dynamic auditing capabilities to an existing distributed application
schema matching is the task of matching between concepts describing the meaning of data in various heterogeneous distributed data sources with many heuristics to choose from several tools have enabled the use of schema matcher ensembles combining principles by which different schema matchers judge the similarity between concepts in this work we investigate means of estimating the uncertainty involved in schema matching and harnessing it to improve an ensemble outcome we propose model for schema matching based on simple probabilistic principles we then propose the use of machine learning in determining the best mapping and discuss its pros and cons finally we provide thorough empirical analysis using both real world and synthetic data to test the proposed technique we conclude that the proposed heuristic performs well given an accurate modeling of uncertainty in matcher decision making
many database applications that need to disseminate dynamic information from server to various clients can suffer from heavy communication costs data caching at client can help mitigate these costs particularly when individual push pull decisions are made for the different semantic regions in the data space the server is responsible for notifying the client about updates in the push regions the client needs to contact the server for queries that ask for data in the pull regions we call the idea of partitioning the data space into push pull regions to minimize communication cost data gerrymandering in this paper we present solutions to technical challenges in adopting this simple but powerful idea we give provably optimal cost dynamic programming algorithm for gerrymandering on single query attribute we propose family of efficient heuristics for gerrymandering on multiple query attributes we handle the dynamic case in which the workloads of queries and updates evolve over time we validate our methods through extensive experiments on real and synthetic data sets
we introduce the bounded deformation tree or bd tree which can perform collision detection with reduced deformable models at costs comparable to collision detection with rigid objects reduced deformable models represent complex deformations as linear superpositions of arbitrary displacement fields and are used in variety of applications of interactive computer graphics the bd tree is bounding sphere hierarchy for output sensitive collision detection with such models its bounding spheres can be updated after deformation in any order and at cost independent of the geometric complexity of the model in fact the cost can be as low as one multiplication and addition per tested sphere and at most linear in the number of reduced deformation coordinates we show that the bd tree is also extremely simple to implement and performs well in practice for variety of real time and complex off line deformable simulation examples
first come first served fcfs mutual exclusion me is the problem of ensuring that processes attempting to concurrently access shared resource do so one by one in fair order in this paper we close the complexity gap between fcfs me and me in the asynchronous shared memory model where processes communicate using atomic reads and writes only and do not fail our main result is the first known fcfs me algorithm that makes o(log n) remote memory references rmrs per passage and uses only atomic reads and writes our algorithm is also adaptive to point contention more precisely the number of rmrs process makes per passage in our algorithm is min(k, log n) where k is the point contention our algorithm matches known rmr complexity lower bounds for the class of me algorithms that use reads and writes only and beats the rmr complexity of prior algorithms in this class that have the fcfs property
collaborative prediction refers to the task of predicting user preferences on the basis of ratings by other users collaborative prediction suffers from the cold start problem where predictions of ratings for new items or predictions of new users preferences are required various methods have been developed to overcome this limitation exploiting side information such as content information and demographic user data in this paper we present matrix factorization method for incorporating side information into collaborative prediction we develop weighted nonnegative matrix co tri factorization wnmctf where we jointly minimize weighted residuals each of which involves nonnegative factor decomposition of target or side information matrix numerical experiments on movielens data confirm the useful behavior of wnmctf when operating from cold start
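A minimal numpy sketch of the weighted-residual idea behind such factorizations is given below. It implements plain two-factor weighted NMF with multiplicative updates, not the co-tri-factorization of the abstract, and all names and defaults are illustrative assumptions; side-information matrices would add further weighted terms to the objective.

```python
import numpy as np

def weighted_nmf(X, W, rank=10, iters=200, eps=1e-9, seed=0):
    """Minimize || W * (X - U V) ||_F^2 with U, V >= 0 via multiplicative updates.

    X : (m, n) ratings matrix (missing entries may be left as 0)
    W : (m, n) nonnegative weights (e.g. 1 for observed entries, 0 for missing)
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((m, rank))
    V = rng.random((rank, n))
    for _ in range(iters):
        WX = W * X
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)   # update user factors
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)   # update item factors
    return U, V
```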
results clustering in web searching is useful for providing users with overviews of the results and thus allowing them to restrict their focus to the desired parts however the task of deriving single word or multiple word names for the clusters usually referred to as cluster labeling is difficult because they have to be syntactically correct and predictive moreover efficiency is an important requirement since results clustering is an online task suffix tree clustering stc is clustering technique where search results mainly snippets can be clustered fast in linear time incrementally and each cluster is labeled with phrase in this paper we introduce variation of the stc called stc with scoring formula that favors phrases that occur in document titles and differs in the way base clusters are merged and novel non merging algorithm called nm stc that results in hierarchically organized clusters the comparative user evaluation showed that both stc and nm stc are significantly more preferred than stc and that nm stc is about two times faster than stc and stc
slg resolution uses tabling to evaluate nonfloundering normal logic programs according to the well founded semantics the slg wam which forms the engine of the xsb system can compute in memory recursive queries an order of magnitude faster than current deductive databases at the same time the slg wam tightly integrates prolog code with tabled slg code and executes prolog code with minimal overhead compared to the wam as result the slg wam brings to logic programming important termination and complexity properties of deductive databases this article describes the architecture of the slg wam for powerful class of programs the class of fixed order dynamically stratified programs we offer detailed description of the algorithms data structures and instructions that the slg wam adds to the wam and performance analysis of engine overhead due to the extensions
we report on an automated runtime anomaly detection method at the application layer of multi node computer systems although several network management systems are available in the market none of them have sufficient capabilities to detect faults in multi tier web based systems with redundancy we model web based system as weighted graph where each node represents service and each edge represents dependency between services since the edge weights vary greatly over time the problem we address is that of anomaly detection from time sequence of graphs in our method we first extract feature vector from the adjacency matrix that represents the activities of all of the services the heart of our method is to use the principal eigenvector of the eigenclusters of the graph then we derive probability distribution for an anomaly measure defined for time series of directional data derived from the graph sequence given critical probability the threshold value is adaptively updated using novel online algorithm we demonstrate that fault in web application can be automatically detected and the faulty services are identified without using detailed knowledge of the behavior of the system
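A rough numpy sketch of the eigenvector-based anomaly measure described above, under the assumption that the typical activity pattern is taken as the principal left singular vector of a sliding window of past activity vectors; the adaptive thresholding via a probability model for directional data is omitted, and all parameter choices are illustrative.

```python
import numpy as np

def activity_vector(A):
    """Principal eigenvector of a symmetrized nonnegative dependency matrix."""
    M = (A + A.T) / 2.0
    _, vecs = np.linalg.eigh(M)
    u = np.abs(vecs[:, -1])                    # eigenvector of the largest eigenvalue
    return u / (np.linalg.norm(u) + 1e-12)

def anomaly_scores(adjacency_series, window=10):
    """Score each time step by 1 - cos(angle) between the current activity
    vector and the typical pattern of the previous `window` steps."""
    history, scores = [], []
    for A in adjacency_series:
        u = activity_vector(A)
        if len(history) >= window:
            H = np.column_stack(history[-window:])
            r = np.abs(np.linalg.svd(H)[0][:, 0])   # typical pattern: principal left singular vector
            r /= (np.linalg.norm(r) + 1e-12)
            scores.append(1.0 - float(r @ u))
        else:
            scores.append(0.0)                      # warm-up period
        history.append(u)
    return scores
```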
experience sampling has been employed for decades to collect assessments of subjects intentions needs and affective states in recent years investigators have employed automated experience sampling to collect data to build predictive user models to date most procedures have relied on random sampling or simple heuristics we perform comparative analysis of several automated strategies for guiding experience sampling spanning spectrum of sophistication from random sampling procedure to increasingly sophisticated active learning the more sophisticated methods take decision theoretic approach centering on the computation of the expected value of information of probe weighing the cost of the short term disruptiveness of probes with their benefits in enhancing the long term performance of predictive models we test the different approaches in field study focused on the task of learning predictive models of the cost of interruption
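A toy sketch of the decision-theoretic probe criterion described above: probe only when the expected value of the information gained exceeds the estimated interruption cost. All quantities are assumed inputs supplied by a predictive model; the function names are hypothetical.

```python
def should_probe(p_outcomes, utility_if_probed, utility_if_not_probed, interruption_cost):
    """Return (evoi, decision): probe only if the expected value of the
    information gained exceeds the estimated cost of interrupting the user.

    p_outcomes           : dict outcome -> probability under the current model
    utility_if_probed    : dict outcome -> long-term model utility if that label is obtained
    utility_if_not_probed: expected long-term model utility without probing
    interruption_cost    : estimated short-term cost of the probe right now
    """
    expected_if_probed = sum(p * utility_if_probed[o] for o, p in p_outcomes.items())
    evoi = expected_if_probed - utility_if_not_probed
    return evoi, evoi > interruption_cost
```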
most of the complexity of common data mining tasks is due to the unknown amount of information contained in the data being mined the more patterns and correlations are contained in such data the more resources are needed to extract them this is confirmed by the fact that in general there is not single best algorithm for given data mining task on any possible kind of input dataset rather in order to achieve good performances strategies and optimizations have to be adopted according to the dataset specific characteristics for example one typical distinction in transactional databases is between sparse and dense datasets in this paper we consider frequent set counting as case study for data mining algorithms we propose statistical analysis of the properties of transactional datasets that allows for characterization of the dataset complexity we show how such characterization can be used in many fields from performance prediction to optimization
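As an illustration of dataset characterization, the sketch below computes a few crude statistics (average transaction length, density, item-frequency entropy) that are commonly used to distinguish sparse from dense transactional datasets; these are generic measures, not the specific statistical analysis proposed in the abstract.

```python
from collections import Counter
import math

def dataset_stats(transactions):
    """Crude sparseness/denseness indicators for a transactional dataset."""
    item_counts = Counter(item for t in transactions for item in t)
    n_items = len(item_counts)
    n_trans = len(transactions)
    avg_len = sum(len(t) for t in transactions) / n_trans
    density = avg_len / n_items                      # fraction of distinct items per transaction
    total = sum(item_counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in item_counts.values())
    return {"transactions": n_trans, "distinct_items": n_items,
            "avg_transaction_length": avg_len, "density": density,
            "item_frequency_entropy": entropy}
```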
this paper presents the http request global distribution using the fuzzy neural decision making mechanism two efficient algorithms gardib and gardib are proposed to support http request routing to the websites the algorithms use the fuzzy neural decision making method to assign each incoming request to the website with the least expected response time the response time includes the transmission time over the network as well as the time elapsed on the responding website server simulation experiments showed that gardib performed slightly better than gardib and both proposed algorithms outperformed other competitive distribution algorithms in all simulated workload scenarios
chip multiprocessors cmps are promising candidates for the next generation computing platforms to utilize large numbers of gates and reduce the effects of high interconnect delays one of the key challenges in cmp design is to balance out the often conflicting demands specifically for today’s image video applications and systems power consumption memory space occupancy area cost and reliability are as important as performance therefore compilation framework for cmps should consider multiple factors during the optimization process motivated by this observation this paper addresses the energy aware reliability support for the cmp architectures targeting in particular at array intensive image video applications there are two main goals behind our compiler approach first we want to minimize the energy wasted in executing replicas when there is no error during execution which should be the most frequent case in practice second we want to minimize the time to recover through the replicas from an error when it occurs this approach has been implemented and tested using four parallel array based applications from the image video processing domain our experimental evaluation indicates that the proposed approach saves significant energy over the case when all the replicas are run under the highest voltage frequency level without sacrificing any reliability over the latter
the speed project addresses the problem of computing symbolic computational complexity bounds of procedures in terms of their inputs we discuss some of the challenges that arise and present various orthogonal complementary techniques recently developed in the speed project for addressing these challenges
the question of determining which sets of constraints give rise to np complete problems and which give rise to tractable problems is an important open problem in the theory of constraint satisfaction it has been shown in previous papers that certain sufficient conditions for tractability and np completeness can be identified using algebraic properties of relations and that these conditions can be tested by solving particular form of constraint satisfaction problem the so called indicator problem this paper describes program which can solve the relevant indicator problems for arbitrary sets of constraints over small domains and for some sets of constraints over larger domains the main innovation in the program is its ability to deal with the many symmetries present in the problem it also has the ability to preserve symmetries in cases where this speeds up the solution using this program we have systematically investigated the complexity of all individual binary relations over domain of size four or less and of all individual ternary relations over domain of size three or less this automated analysis includes the derivation of more than new np completeness results and precisely identifies the small set of individual relations which cannot be classified as either tractable or np complete using the algebraic conditions presented in previous papers
the application of the tolerance paradigm to security intrusion tolerance has been raising reasonable amount of attention in the dependability and security communities in this paper we present novel approach to intrusion tolerance the idea is to use privileged components generically designated by wormholes to support the execution of intrusion tolerant protocols often called byzantine resilient in the literature the paper introduces the design of wormhole aware intrusion tolerant protocols using classical distributed systems problem consensus the system where the consensus protocol runs is mostly asynchronous and can fail in an arbitrary way except for the wormhole which is secure and synchronous using the wormhole to execute few critical steps the protocol manages to have low time complexity in the best case it runs in two rounds even if some processes are malicious the protocol also shows how often theoretical partial synchrony assumptions can be substantiated in practical distributed systems the paper shows the significance of the ttcb as an engineering paradigm since the protocol manages to be simple when compared with other protocols in the literature
this paper describes the design implementation and evaluation of a replication scheme to handle byzantine faults in transaction processing database systems the scheme compares answers from queries and updates on multiple replicas which are unmodified off the shelf systems to provide single database that is byzantine fault tolerant the scheme works when the replicas are homogeneous but it also allows heterogeneous replication in which replicas come from different vendors heterogeneous replicas reduce the impact of bugs and security compromises because they are implemented independently and are thus less likely to suffer correlated failures the main challenge in designing replication scheme for transaction processing systems is ensuring that the different replicas execute transactions in equivalent serial orders while allowing high degree of concurrency our scheme meets this goal using novel concurrency control protocol commit barrier scheduling cbs we have implemented cbs in the context of replicated sql database hrdb heterogeneous replicated db which has been tested with unmodified production versions of several commercial and open source databases as replicas our experiments show an hrdb configuration that can tolerate one faulty replica has only modest performance overhead about for the tpc benchmark hrdb successfully masks several byzantine faults observed in practice and we have used it to find new bug in mysql
in large software development projects when programmer is assigned bug to fix she typically spends lot of time searching in an ad hoc manner for instances from the past where similar bugs have been debugged analyzed and resolved systematic search tools that allow the programmer to express the context of the current bug and search through diverse data repositories associated with large projects can greatly improve the productivity of debugging this paper presents the design implementation and experience from such search tool called debugadvisor the context of bug includes all the information programmer has about the bug including natural language text textual rendering of core dumps debugger output etc our key insight is to allow the programmer to collate this entire context as query to search for related information thus debugadvisor allows the programmer to search using fat query which could be kilobytes of structured and unstructured data describing the contextual information for the current bug information retrieval in the presence of fat queries and variegated data repositories all of which contain mix of structured and unstructured data is challenging problem we present novel ideas to solve this problem we have deployed debugadvisor to over users inside microsoft in addition to standard metrics such as precision and recall we present extensive qualitative and quantitative feedback from our users
graph has become increasingly important in modelling complicated structures and schemaless data such as proteins chemical compounds and xml documents given graph query it is desirable to retrieve graphs quickly from large database via graph based indices in this paper we investigate the issues of indexing graphs and propose novel solution by applying graph mining technique different from the existing path based methods our approach called gindex makes use of frequent substructure as the basic indexing feature frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates to reduce the size of index structure two techniques size increasing support constraint and discriminative fragments are introduced our performance study shows that gindex has times smaller index size but achieves times better performance in comparison with typical path based method graphgrep the gindex approach not only provides an elegant solution to the graph indexing problem but also demonstrates how database indexing and query processing can benefit from data mining especially frequent pattern mining furthermore the concepts developed here can be applied to indexing sequences trees and other complicated structures as well
in this paper we introduce family of expressive extensions of datalog called datalog as new paradigm for query answering over ontologies the datalog family admits existentially quantified variables in rule heads and has suitable restrictions to ensure highly efficient ontology querying we show in particular that datalog generalizes the dl lite family of tractable description logics which are the most common tractable ontology languages in the context of the semantic web and databases we also show how stratified negation can be added to datalog while keeping ontology querying tractable furthermore the datalog family is of interest in its own right and can moreover be used in various contexts such as data integration and data exchange
the current paper presents work in progress in integrated management of artefacts in automotive electronics with focus on requirements case study on industrial requirements management was performed in swedish automotive supplier of software intensive systems based on this case study frame concept for integrated model management is proposed one core element in implementing the frame concept is an ontology based domain repository concepts architecture and realization of domain repository are introduced the domain repository is based on rdf rdfs and triple and includes extensions supporting navigation in the artefact net and derivation of new knowledge modlets actlets and spot views
high performance computing applications with data driven communication and computation characteristics require synchronization routines in the form of eureka barrier or termination synchronization in this paper we consider termination synchronization for two different execution models the ap and the aps model in the ap model processors are either active or passive and passive processor can be made active by another active processor in the aps model processors can also be in server state passive processor entering the server state does not become active again in addition server processor cannot change the status of other processors we describe and analyze solutions for both models and present experimental work highlighting the differences between the models we show that in almost all situations the use of an ap algorithm to detect termination in an aps environment will result in loss of performance our experimental work on the cray te provides insight into where and why this performance loss occurs
wireless sensor networks are increasingly being used in applications where the communication between nodes needs to be protected from eavesdropping and tampering such protection is typically provided using techniques from symmetric key cryptography the protocols in this domain suffer from one or more of the following problems weak security guarantees if some nodes are compromised lack of scalability high energy overhead for key management and increased end to end data latency in this paper we propose protocol called secos that mitigates these problems in static sensor networks secos divides the sensor field into control groups each with control node data exchange between nodes within control group happens through the mediation of the control head which provides the common key the keys are refreshed periodically and the control nodes are changed periodically to enhance security secos enhances the survivability of the network by handling compromise and failures of control nodes it provides the guarantee that the communication between any two sensor nodes remains secure despite the compromise of any number of other nodes in the network the experiments based on simulation model show seven time reduction in energy overhead and reduction in latency compared to spins which is one of the state of the art protocols for key management in sensor networks
we describe computational cognitive architecture for robots which we call act act embodied act is based on act but uses different visual auditory and movement modules we describe model that uses act to integrate visual and auditory information to perform conversation tracking in dynamic environment we also performed an empirical evaluation study which shows that people see our conversational tracking system as extremely natural
major challenge in frequent pattern mining is the sheer size of its mining results in many cases high minsup threshold may discover only commonsense patterns but low one may generate an explosive number of output patterns which severely restricts its usage in this paper we study the problem of compressing frequent pattern sets typically frequent patterns can be clustered with tightness measure delta called delta cluster and representative pattern can be selected for each cluster unfortunately finding minimum set of representative patterns is np hard we develop two greedy methods rpglobal and rplocal the former has the guaranteed compression bound but higher computational complexity the latter sacrifices the theoretical bounds but is far more efficient our performance study shows that the compression quality using rplocal is very close to rpglobal and both can reduce the number of closed frequent patterns by almost two orders of magnitude furthermore rplocal mines even faster than fpclose very fast closed frequent pattern mining method we also show that rpglobal and rplocal can be combined together to balance the quality and efficiency
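The sketch below shows a set-cover-style greedy selection of representative patterns in the spirit of the global method; the delta-cluster test used here (subset containment plus a Jaccard distance on supporting transaction sets) is an assumption for illustration, not the paper's exact definition.

```python
def jaccard_distance(tids_a, tids_b):
    """Distance between two patterns based on their supporting transaction sets."""
    union = len(tids_a | tids_b)
    return 1.0 - (len(tids_a & tids_b) / union if union else 0.0)

def compress_patterns(patterns, delta):
    """Greedy (set-cover style) selection of representative patterns.

    patterns : list of (itemset frozenset, set of supporting transaction ids)
    delta    : tightness threshold of the delta-cluster
    """
    def covered_by(rep, pat):
        (rep_items, rep_tids), (p_items, p_tids) = rep, pat
        return p_items <= rep_items and jaccard_distance(rep_tids, p_tids) <= delta

    uncovered = set(range(len(patterns)))
    representatives = []
    while uncovered:
        best_idx, best_cover = None, set()
        for i, cand in enumerate(patterns):
            cover = {j for j in uncovered if covered_by(cand, patterns[j])}
            if len(cover) > len(best_cover):
                best_idx, best_cover = i, cover
        representatives.append(patterns[best_idx][0])
        uncovered -= best_cover
    return representatives
```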
we investigate the complexity of finding nash equilibria in which the strategy of each player is uniform on its support set we show that even for restricted class of win lose bimatrix games deciding the existence of such uniform equilibria is an np complete problem our proof is graph theoretical motivated by this result we also give np completeness results for the problems of finding regular induced subgraphs of large size or regularity which can be of independent interest
objects with mirroring optical characteristics are left out of the scope of most scanning methods we present here new automatic acquisition approach shape from distortion that focuses on that category of objects requires only still camera and color monitor and produces range scans plus normal and reflectance map of the target our technique consists of two steps first an improved environment matte is captured for the mirroring object using the interference of patterns with different frequencies to obtain sub pixel accuracy then the matte is converted into normal and depth map by exploiting the self coherence of surface when integrating the normal map along different paths the results show very high accuracy capturing even smallest surface details the acquired depth maps can be further processed using standard techniques to produce complete mesh of the object
separable assignment problem sap is defined by set of bins and set of items to pack in each bin value fij for assigning item to bin and separate packing constraint for each bin ie for bin family li of subsets of items that fit in bin the goal is to pack items into bins to maximize the aggregate value this class of problems includes the maximum generalized assignment problem gap and distributed caching problem dcp described in this paper given beta approximation algorithm for finding the highest value packing of single bin we give polynomial time lp rounding based minus beta approximation algorithm simple polynomial time local search beta beta epsilon approximation algorithm for any epsilon therefore for all examples of sap that admit an approximation scheme for the single bin problem we obtain an lp based algorithm with epsilon approximation and local search algorithm with epsilon approximation guarantee furthermore for cases in which the subproblem admits fully polynomial approximation scheme such as for gap the lp based algorithm analysis can be strengthened to give guarantee of the best previously known approximation algorithm for gap is approximation by shmoys and tardos and chekuri and khanna our lp algorithm is based on rounding new linear programming relaxation with provably better integrality gap to complement these results we show that sap and dcp cannot be approximated within factor better than unless np ⊆ dtime(n^o(log log n)) even if there exists polynomial time exact algorithm for the single bin problem we extend the approximation algorithm to nonseparable assignment problem with applications in maximizing revenue for budget constrained combinatorial auctions and the adwords assignment problem we generalize the local search algorithm to yield epsilon approximation algorithm for the median problem with hard capacities finally we study naturally defined game theoretic versions of these problems and show that they have price of anarchy of we also prove the existence of cycles of best response moves and exponentially long best response paths to pure or sink equilibria
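For reference, the separable assignment problem described above can be written as the following integer program, where x_{i,b} indicates that bin i is packed with a feasible item set b from its family I_i of packable sets; the notation is ours, introduced only to restate the abstract's definition.

```latex
\[
\max \;\; \sum_{i}\sum_{b \in \mathcal{I}_i}\Big(\sum_{j \in b} f_{ij}\Big)\,x_{i,b}
\qquad \text{s.t.} \qquad
\sum_{b \in \mathcal{I}_i} x_{i,b} \le 1 \;\;\forall i, \qquad
\sum_{i}\;\sum_{b \in \mathcal{I}_i:\; j \in b} x_{i,b} \le 1 \;\;\forall j, \qquad
x_{i,b} \in \{0,1\}.
\]
```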
the ability of reconfiguring software architectures in order to adapt them to new requirements or changing environment has been of growing interest but there is still not much formal work in the area most existing approaches deal with run time changes in deficient way the language to express computations is often at very low level of specification and the integration of two different formalisms for the computations and reconfigurations require sometimes substantial changes to address these problems we propose uniform algebraic approach with the following characteristics components are written in high level program design language with the usual notion of state the approach combines two existing frameworks one to specify architectures the other to rewrite labelled graphs just through small additions to either of them it deals with certain typical problems such as guaranteeing that new components are introduced in the correct state possibly transferred from the old components they replace it shows the relationships between reconfigurations and computations while keeping them separate because the approach provides semantics to given architecture through the algebraic construction of an equivalent program whose computations can be mirrored at the architectural level
deadlocks are possibly the best known bug pattern in computer systems in general certainly they are the best known in concurrent programming numerous articles some dating back more than years have been dedicated to the questions of how to design deadlock free programs how to statically or dynamically detect possible deadlocks how to avoid deadlocks at runtime and how to resolve deadlocks once they happen we start the paper with an investigation on how to exhibit potential deadlocks exhibiting deadlocks is very useful in testing as verifying if potential deadlock can actually happen is time consuming debugging activity there was recently some very interesting research in this direction however we believe our approach is more practical has no scaling issues and in fact is already industry ready the second contribution of our paper is in the area of healing multi threaded programs so they do not get into deadlocks this is an entirely new approach which is very different from the approaches in the literature that were meant for multi process scenarios and are not suitable and indeed not used in multithreaded programming while the basic ideas are fairly simple the details here are very important as any mistake is liable to actually create new deadlocks the paper describes the basic healing idea and its limitations the pitfalls and how to overcome them and experimental results
bursts in data center workloads are real problem for storage subsystems data volumes can experience peak request rates that are over an order of magnitude higher than average load this requires significant overprovisioning and often still results in significant request latency during peaks in order to address this problem we propose everest which allows data written to an overloaded volume to be temporarily off loaded into short term virtual store everest creates the short term store by opportunistically pooling underutilized storage resources either on server or across servers within the data center writes are temporarily off loaded from overloaded volumes to lightly loaded volumes thereby reducing the load on the former everest is transparent to and usable by unmodified applications and does not change the persistence or consistency of the storage system we evaluate everest using traces from production exchange mail server as well as other benchmarks our results show times reduction in mean response times during peaks
mutation testing measures the adequacy of test suite by seeding artificial defects mutations into program if mutation is not detected by the test suite this usually means that the test suite is not adequate however it may also be that the mutant keeps the program’s semantics unchanged and thus cannot be detected by any test such equivalent mutants have to be eliminated manually which is tedious we assess the impact of mutations by checking dynamic invariants in an evaluation of our javalanche framework on seven industrial size programs we found that mutations that violate invariants are significantly more likely to be detectable by test suite as consequence mutations with impact on invariants should be focused upon when improving test suites with less than of equivalent mutants our approach provides an efficient precise and fully automatic measure of the adequacy of test suite
in this paper we present improvements to recursive bisection based placement in contrast to prior work our horizontal cut lines are not restricted to row boundaries this avoids narrow region problem to support these new cut line positions dynamic programming based legalization algorithm has been developed the combination of these has improved the stability and lowered the wire lengths produced by our feng shui placement tool on benchmarks derived from industry partitioning examples our results are close to those of the annealing based tool dragon while taking only fraction of the run time on synthetic benchmarks our wire lengths are nearly better than those of dragon for both benchmark suites our results are substantially better than those of the recursive bisection based tool capo and the analytic placement tool kraftwerk
metamodels are increasingly being used in software engineering particularly in standards from both the omg and iso it is therefore critical that these metamodels be used correctly in this paper we investigate some of the pitfalls observed in the use of metamodelling ideas in software engineering and from these observations deduce some rules of thumb to help increase the quality of usage of metamodels in software engineering in the future
wireless mesh networks wmns consist of mesh routers and mesh clients where mesh routers have minimal mobility and form the backbone of wmns they provide network access for both mesh and conventional clients the integration of wmns with other networks such as the internet cellular ieee ieee ieee sensor networks etc can be accomplished through the gateway and bridging functions in the mesh routers mesh clients can be either stationary or mobile and can form client mesh network among themselves and with mesh routers wmns are anticipated to resolve the limitations and to significantly improve the performance of ad hoc networks wireless local area networks wlans wireless personal area networks wpans and wireless metropolitan area networks wmans they are undergoing rapid progress and inspiring numerous deployments wmns will deliver wireless services for large variety of applications in personal local campus and metropolitan areas despite recent advances in wireless mesh networking many research challenges remain in all protocol layers this paper presents detailed study on recent advances and open research issues in wmns system architectures and applications of wmns are described followed by discussing the critical factors influencing protocol design theoretical network capacity and the state of the art protocols for wmns are explored with an objective to point out number of open research issues finally testbeds industrial practice and current standard activities related to wmns are highlighted
regression test suite prioritization techniques reorder test cases so that on average more faults will be revealed earlier in the test suite’s execution than would otherwise be possible this paper presents genetic algorithm based test prioritization method that employs wide variety of mutation crossover selection and transformation operators to reorder test suite leveraging statistical analysis techniques such as tree model construction through binary recursive partitioning and kernel density estimation the paper’s empirical results highlight the unique role that the selection operators play in identifying an effective ordering of test suite the study also reveals that while truncation selection consistently outperformed the tournament and roulette operators in terms of test suite effectiveness increasing selection pressure consistently produces the best results within each class of operator after further explicating the relationship between selection intensity termination condition fitness landscape and the quality of the resulting test suite this paper demonstrates that the genetic algorithm based prioritizer is superior to random search and hill climbing and thus suitable for many regression testing environments
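A compact sketch of such a genetic prioritizer with permutation encoding, order crossover, swap mutation, and truncation selection. The fitness here is an APFD-style score over a known fault matrix, used purely for illustration; practical prioritizers substitute coverage-based surrogates, and all defaults are assumptions.

```python
import random

def apfd(order, fault_matrix):
    """APFD-style fitness; fault_matrix[t][f] is True if test t detects fault f.
    Undetected faults are penalized with position n+1 (a simplification)."""
    n_tests, n_faults = len(order), len(fault_matrix[0])
    first_detect = [
        next((pos + 1 for pos, t in enumerate(order) if fault_matrix[t][f]), n_tests + 1)
        for f in range(n_faults)
    ]
    return 1.0 - sum(first_detect) / (n_tests * n_faults) + 1.0 / (2 * n_tests)

def order_crossover(p1, p2):
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [g for g in p2 if g not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def prioritize(fault_matrix, pop_size=40, gens=100, trunc=0.25, mut_rate=0.2):
    n = len(fault_matrix)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda o: apfd(o, fault_matrix), reverse=True)
        parents = pop[: max(2, int(trunc * pop_size))]   # truncation selection
        children = [pop[0]]                              # keep the current best (elitism)
        while len(children) < pop_size:
            child = order_crossover(*random.sample(parents, 2))
            if random.random() < mut_rate:               # swap mutation
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = children
    return max(pop, key=lambda o: apfd(o, fault_matrix))
```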
information search and retrieval interactions usually involve information content in the form of document collections information retrieval systems and interfaces and the user to fully understand information search and retrieval interactions between users cognitive space and the information space researchers need to turn to cognitive models and theories in this article the authors use one of these theories the basic level theory use of the basic level theory to understand human categorization is both appropriate and essential to user centered design of taxonomies ontologies browsing interfaces and other indexing tools and systems analyses of data from two studies involving free sorting by participants of images were conducted the types of categories formed and category labels were examined results of the analyses indicate that image category labels generally belong to levels superordinate to the basic level and are generic and interpretive implications for research on theories of cognition and categorization and design of image indexing retrieval and browsing systems are discussed
garbage collectors are notoriously hard to verify due to their low level interaction with the underlying system and the general difficulty in reasoning about reachability in graphs several papers have presented verified collectors but either the proofs were hand written or the collectors were too simplistic to use on practical applications in this work we present two mechanically verified garbage collectors both practical enough to use for real world benchmarks the collectors and their associated allocators consist of assembly language instructions and macro instructions annotated with preconditions postconditions invariants and assertions we used the boogie verification generator and the automated theorem prover to verify this assembly language code mechanically we provide measurements comparing the performance of the verified collector with that of the standard bartok collectors on off the shelf benchmarks demonstrating their competitiveness
reasoning about the termination of equational programs in sophisticated equational languages such as elan maude obj cafeobj haskell and so on requires support for advanced features such as evaluation strategies rewriting modulo use of extra variables in conditions partiality and expressive type systems possibly including polymorphism and higher order however many of those features are at best only partially supported by current term rewriting termination tools for instance mu term me aprove ttt termptation etc while they may be essential to ensure termination we present sequence of theory transformations that can be used to bridge the gap between expressive membership equational programs and such termination tools and prove the correctness of such transformations we also discuss prototype tool performing the transformations on maude equational programs and sending the resulting transformed theories to some of the aforementioned standard termination tools
as practical opportunity for educating japanese young developers in the field of embedded software development software design contest involving the design of software to automatically control line trace robot and conduct running performance tests was held in this paper we give the results of the contest from the viewpoint of software quality evaluation we create framework for evaluating the software quality which integrated design model quality and the final system performance and conduct analysis using the framework as result of analysis it is found that the quantitative measurement of the structural complexity of the design models bears strong relationship to qualitative evaluation of the design conducted by judges it is also found that there is no strong correlation between design model quality evaluated by the judges and the final system performance for embedded software development it is particularly important to estimate and verify reliability and performance in the early stages using the model based on the analysis result we consider possible remedies with respect to the models submitted the evaluation methods used and the contest specifications in order to adequately measure several non functional quality characteristics including performance on the model it is necessary to improve the way of developing robot software such as applying model driven development and reexamine the evaluation methods
semantic web data exhibits very skewed frequency distributions among terms efficient large scale distributed reasoning methods should maintain load balance in the face of such highly skewed distribution of input data we show that term based partitioning used by most distributed reasoning approaches has limited scalability due to load balancing problems we address this problem with method for data distribution based on clustering in elastic regions instead of assigning data to fixed peers data flows semi randomly in the network data items speed date while being temporarily collocated in the same peer we introduce bias in the routing to allow semantically clustered neighborhoods to emerge our approach is self organising efficient and does not require any central coordination we have implemented this method on the marvin platform and have performed experiments on large real world datasets using cluster of up to nodes we compute the rdfs closure over different datasets and show that our clustering algorithm drastically reduces computation time calculating the rdfs closure of million triples in minutes
many end users wish to customize their applications automating common tasks and routines unfortunately this automation is difficult today users must choose between brittle macros and complex scripting languages programming by demonstration pbd offers middle ground allowing users to demonstrate procedure multiple times and generalizing the requisite behavior with machine learning unfortunately many pbd systems are almost as brittle as macro recorders offering few ways for user to control the learning process or correct the demonstrations used as training examples this paper presents chinle system which automatically constructs pbd systems for applications based on their interface specification the resulting pbd systems have novel interaction and visualization methods which allow the user to easily monitor and guide the learning process facilitating error recovery during training chinle constructed pbd systems learn procedures with conditionals and perform partial learning if the procedure is too complex to learn completely
review is provided of some database and representation issues involved in the implementation of geographic information systems gis
the automated and semi automated analysis of source code has remained topic of intense research for more than thirty years during this period algorithms and techniques for source code analysis have changed sometimes dramatically the abilities of the tools that implement them have also expanded to meet new and diverse challenges this paper surveys current work on source code analysis it also provides road map for future work over the next five year period and speculates on the development of source code analysis applications techniques and challenges over the next and years
successful application of data mining to bioinformatics is protein classification number of techniques have been developed to classify proteins according to important features in their sequences secondary structures or three dimensional structures in this paper we introduce novel approach to protein classification based on significant patterns discovered on the surface of protein we define notion called alpha surface we discuss the geometric properties of alpha surface and present an algorithm that calculates the alpha surface from finite set of points in we apply the algorithm to extracting the alpha surface of protein and use pattern discovery algorithm to discover frequently occurring patterns on the surfaces the pattern discovery algorithm utilizes new index structure called the delta tree we use these patterns to classify the proteins while most existing techniques focus on the binary classification problem we apply our approach to classifying three families of proteins experimental results show the good performance of the proposed approach
web pages like people are often known by others in variety of contexts when those contexts are sufficiently distinct page’s importance may be better represented by multiple domains of authority rather than by one that indiscriminately mixes reputations in this work we determine domains of authority by examining the contexts in which page is cited however we find that it is not enough to determine separate domains of authority our model additionally determines the local flow of authority based upon the relative similarity of the source and target authority domains in this way we differentiate both incoming and outgoing hyperlinks by topicality and importance rather than treating them indiscriminately we find that this approach compares favorably to other topical ranking methods on two real world datasets and produces an approximately improvement in precision and quality of the top ten results over pagerank
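A generic sketch of topically weighted link analysis: edges are down-weighted by the dissimilarity of source and target topic vectors before a standard power iteration. This is not the paper's model of authority domains, only an illustration of differentiating links by topicality; all parameters are assumptions.

```python
import numpy as np

def topic_weighted_rank(adj, topics, damping=0.85, iters=100):
    """Power iteration over links weighted by source/target topical similarity.

    adj    : (n, n) array, adj[i, j] = 1 if page i links to page j
    topics : (n, k) nonnegative topic vectors, one row per page
    """
    n = adj.shape[0]
    T = topics / (np.linalg.norm(topics, axis=1, keepdims=True) + 1e-12)
    W = adj * (T @ T.T)                                  # edge weight = cosine similarity
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.full_like(W, 1.0 / n), where=row_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (r @ P)        # standard PageRank-style iteration
    return r
```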
this paper presents an automatic deforestation system stream fusion based on equational transformations that fuses wider range of functions than existing short cut fusion systems in particular stream fusion is able to fuse zips left folds and functions over nested lists including list comprehensions distinguishing feature of the framework is its simplicity by transforming list functions to expose their structure intermediate values are eliminated by general purpose compiler optimisations we have reimplemented the haskell standard list library on top of our framework providing stream fusion for haskell lists by allowing wider range of functions to fuse we see an increase in the number of occurrences of fusion in typical haskell programs we present benchmarks documenting time and space improvements
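Stream fusion itself is a compile-time equational transformation over Haskell list functions; the Python generator analogy below only illustrates the effect being aimed at, namely that a pipeline of list operations runs without materializing intermediate lists. The function names and the particular pipeline are invented for illustration.

```python
# building every intermediate list materializes each stage in full
def pipeline_with_lists(xs):
    doubled = [x * 2 for x in xs]                  # intermediate list no. 1
    kept = [x for x in doubled if x % 3 == 0]      # intermediate list no. 2
    return sum(kept)

# a generator pipeline behaves like the fused program: elements flow one at a
# time through all stages and no intermediate structure is ever allocated
def pipeline_fused(xs):
    doubled = (x * 2 for x in xs)
    kept = (x for x in doubled if x % 3 == 0)
    return sum(kept)
```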
in today’s fast paced world it’s increasingly difficult to understand and act promptly upon the content from the many available information streams temporal pooling addresses this problem by producing visual summary of recent stream content this summary is often in motion to incorporate newly arrived content this article reviews the perception of motion and change blindness offering several guidelines for the use of motion in visualization it then describes textpool tool for visualizing live text streams such as newswires blogs and closed captioning textpool’s temporal pooling summarization is dynamic textual collage that clusters related terms textpool was tested with the content of several rss newswire feeds streaming stories per day textpool handled this bandwidth well producing useful summarizations of stream content
in semi supervised learning number of labeled examples are usually required for training an initial weakly useful predictor which is in turn used for exploiting the unlabeled examples however in many real world applications there may exist very few labeled training examples which makes the weakly useful predictor difficult to generate and therefore these semi supervised learning methods cannot be applied this paper proposes method working under two view setting by taking advantage of the correlations between the views using canonical correlation analysis the proposed method can perform semi supervised learning with only one labeled training example experiments and an application to content based image retrieval validate the effectiveness of the proposed method
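A rough sketch of the two-view idea using scikit-learn's CCA: project both views into a correlated space (which requires no labels), then rank unlabeled examples by distance to the single labeled example. The actual method's details differ; this is only an assumption-laden illustration and the function name is hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def rank_by_single_labeled_example(view1, view2, labeled_idx, n_components=2):
    """Project both views into a correlated space, then rank all examples
    by distance to the single labeled example in that space."""
    cca = CCA(n_components=n_components)
    z1, z2 = cca.fit_transform(view1, view2)       # fitting uses no class labels
    z = np.hstack([z1, z2])                        # combined correlated representation
    dists = np.linalg.norm(z - z[labeled_idx], axis=1)
    return np.argsort(dists)                       # nearest (most confident) first
```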
touch is compelling input modality for interactive devices however touch input on the small screen of mobile device is problematic because user’s fingers occlude the graphical elements he wishes to work with in this paper we present lucidtouch mobile device that addresses this limitation by allowing the user to control the application by touching the back of the device the key to making this usable is what we call pseudo transparency by overlaying an image of the user’s hands onto the screen we create the illusion of the mobile device itself being semi transparent this pseudo transparency allows users to accurately acquire targets while not occluding the screen with their fingers and hand lucid touch also supports multi touch input allowing users to operate the device simultaneously with all fingers we present initial study results that indicate that many users found touching on the back to be preferable to touching on the front due to reduced occlusion higher precision and the ability to make multi finger input
this paper evaluates five supervised learning methods in the context of statistical spam filtering we study the impact of different feature pruning methods and feature set sizes on each learner’s performance using cost sensitive measures it is observed that the significance of feature selection varies greatly from classifier to classifier in particular we found support vector machine adaboost and maximum entropy model are top performers in this evaluation sharing similar characteristics not sensitive to feature selection strategy easily scalable to very high feature dimension and good performances across different datasets in contrast naive bayes commonly used classifier in spam filtering is found to be sensitive to feature selection methods on small feature set and fails to function well in scenarios where false positives are penalized heavily the experiments also suggest that aggressive feature pruning should be avoided when building filters to be used in applications where legitimate mails are assigned cost much higher than spams such as lambda so as to maintain better than baseline performance an interesting finding is the effect of mail headers on spam filtering which is often ignored in previous studies experiments show that classifiers using features from message header alone can achieve comparable or better performance than filters utilizing body features only this implies that message headers can be reliable and powerfully discriminative feature sources for spam filtering
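A generic scikit-learn sketch of the experimental pipeline shape described above (bag-of-words features, chi-square feature pruning, then a classifier such as multinomial naive Bayes or a linear SVM). It is not the paper's setup, and the cost-sensitive lambda-weighted evaluation is omitted; variable names in the commented usage are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

def spam_pipeline(classifier, k_features=1000):
    """Bag-of-words -> chi-square feature pruning -> classifier.
    Choose k_features below the vocabulary size of the corpus."""
    return Pipeline([
        ("bow", CountVectorizer(lowercase=True)),
        ("prune", SelectKBest(chi2, k=k_features)),
        ("clf", classifier),
    ])

# messages: list of raw e-mail strings, labels: 1 = spam, 0 = legitimate
# for clf in (MultinomialNB(), LinearSVC()):
#     scores = cross_val_score(spam_pipeline(clf), messages, labels, cv=5, scoring="f1")
#     print(type(clf).__name__, scores.mean())
```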
scientific data are available through an increasing number of heterogeneous independently evolving sources although the sources themselves are independently evolving the data stored in them are not there exist inherent and intricate relationships between the distributed data sets and scientists are routinely required to write distributed queries in this setting being non experts in computer science the scientists are faced with two major challenges how to express such distributed queries this is non trivial task even if we assume that scientists are familiar with query languages like sql such queries can get arbitrarily complex as more sources are considered ii how to efficiently evaluate such distributed queries an efficient evaluation must account for batches of hundreds or even thousands of submitted queries and must optimize all of them as whole in this demo we focus on the biological domain for illustration purposes our solutions are applicable to other scientific domains and we present system called bioscout that offers solutions in both of the above challenges in more detail we demonstrate the following functionality in bioscout scientists draw their queries graphically resulting in query graph the scientist is unaware of the query language used or of any optimization issues given the query graph the system is able to generate as first step an optimal query plan for the submitted query ii bioscout uses four different strategies to combine the optimal query plans of individual queries to generate global query plan for all the submitted queries in the demo we illustrate graphically how each of the four strategies works
reusing software artifacts for system development is showing increasing promise as an approach to reducing the time and effort involved in building new systems and to improving the software development process and the quality of its outcome however software reuse has an associated steep learning curve since practitioners must become familiar with third party rationale for representing and implementing reusable assets for this reason enabling systematic approach to the reuse process by making software reuse tasks explicit allowing software frameworks to be instantiated using pre defined primitive and complex reuse operations and supporting the reuse process in semi automated way become crucial goals in this paper we present systematic reuse approach and the reuse description language rdl language designed to specify object oriented framework instantiation processes and an rdl execution environment which is the tool support for definition and execution of reuse processes and framework instantiations that lead to domain specific applications we illustrate our approach using dtframe framework for creating drawing editors
offering online personalized recommendation services helps improve customer satisfaction conventionally recommendation system is considered as success if clients purchase the recommended products however the act of purchasing itself does not guarantee satisfaction and truly successful recommendation system should be one that maximizes the customer’s after use gratification by employing an innovative associative classification method we are able to predict customer’s ultimate pleasure based on customer’s characteristics product will be recommended to the potential buyer if our model predicts his her satisfaction level will be high the feasibility of the proposed recommendation system is validated through laptop inspiron
This paper draws several observations from our experiences in building support for object groups. These observations actually go beyond our experiences and may apply to many other developments of object-based distributed systems. Our first experience aimed at building support for Smalltalk object replication using the Isis process group toolkit. It was quite easy to achieve group transparency, but we were confronted with a strong mismatch between the rigidity of the process group model and the flexible nature of object interactions. Consequently, we decided to build our own object-oriented protocol framework specifically dedicated to supporting object groups, instead of using a process group toolkit. We built our framework in such a way that basic distributed protocols, such as failure detection and multicasts, are considered first-class entities directly accessible to programmers; to achieve flexible and dynamic protocol composition, we had to go beyond inheritance and objectify distributed algorithms. Our second experience consisted in building a CORBA service aimed at managing groups of objects written in different languages and running on different platforms. This experience revealed a mismatch between the asynchrony of group protocols and the synchrony of standard CORBA interaction mechanisms, which limited the portability of our CORBA object group service. We restricted the impact of this mismatch by encapsulating asynchrony issues inside a specific messaging sub-service. We dissect the cost of object group transparency in our various implementations and point out the recurrent sources of overhead, namely message indirection, marshaling/unmarshaling, and strong consistency.
several parallel architectures such as gpus and the cell processor have fast explicitly managed on chip memories in addition to slow off chip memory they also have very high computational power with multiple levels of parallelism significant challenge in programming these architectures is to effectively exploit the parallelism available in the architecture and manage the fast memories to maximize performance in this paper we develop an approach to effective automatic data management for on chip memories including creation of buffers in on chip local memories for holding portions of data accessed in computational block automatic determination of array access functions of local buffer references and generation of code that moves data between slow off chip memory and fast local memories we also address the problem of mapping computation in regular programs to multi level parallel architectures using multi level tiling approach and study the impact of on chip memory availability on the selection of tile sizes at various levels experimental results on gpu demonstrate the effectiveness of the proposed approach
Adaptation (online learning) by autonomous virtual characters, due to interaction with a human user in a virtual environment, is a difficult and important problem in computer animation. In this article we present a novel multi-level technique for fast character adaptation. We specifically target environments where there is a cooperative or competitive relationship between the character and the human that interacts with that character. In our technique, a distinct learning method is applied to each layer of the character's behavioral or cognitive model. This allows us to efficiently leverage the character's observations and experiences in each layer. It also provides a convenient temporal distinction of which observations and experiences provide pertinent lessons for each layer. Thus the character can quickly and robustly learn how to better interact with any given unique human user, relying only on observations and natural performance feedback from the environment, with no explicit feedback from the human. Our technique is designed to be general and can be easily integrated into most existing behavioral animation systems. It is also fast and memory efficient.
we compress storage and accelerate performance of precomputed radiance transfer prt which captures the way an object shadows scatters and reflects light prt records over many surface points transfer matrix at run time this matrix transforms vector of spherical harmonic coefficients representing distant low frequency source lighting into exiting radiance per point transfer matrices form high dimensional surface signal that we compress using clustered principal component analysis cpca which partitions many samples into fewer clusters each approximating the signal as an affine subspace cpca thus reduces the high dimensional transfer signal to low dimensional set of per point weights on per cluster set of representative matrices rather than computing weighted sum of representatives and applying this result to the lighting we apply the representatives to the lighting per cluster on the cpu and weight these results per point on the gpu since the output of the matrix is lower dimensional than the matrix itself this reduces computation we also increase the accuracy of encoded radiance functions with new least squares optimal projection of spherical harmonics onto the hemisphere we describe an implementation on graphics hardware that performs real time rendering of glossy objects with dynamic self shadowing and interreflection without fixing the view or light as in previous work our approach also allows significantly increased lighting frequency when rendering diffuse objects and includes subsurface scattering
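A hedged numpy/scikit-learn sketch of the CPCA idea described above: partition the per-point transfer signal into clusters, then approximate each cluster with a low-dimensional affine subspace (a per-cluster mean plus a few principal directions and per-point weights). Matrix sizes, cluster counts, and component counts are made-up parameters, not the paper's.

```python
# Sketch of clustered principal component analysis (CPCA): cluster the
# high-dimensional per-point signal, then fit a small affine subspace per
# cluster. The random "transfer" data below is a placeholder example.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
signal = rng.normal(size=(10000, 25 * 25))   # e.g. flattened 25x25 transfer matrices per point

def cpca(signal, n_clusters=8, n_components=4):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(signal)
    clusters = []
    for c in range(n_clusters):
        members = signal[labels == c]
        pca = PCA(n_components=n_components).fit(members)
        weights = pca.transform(members)          # low-dimensional per-point weights
        clusters.append((pca.mean_, pca.components_, weights))
    return labels, clusters

labels, clusters = cpca(signal)
mean, basis, weights = clusters[0]
reconstructed = weights @ basis + mean            # affine-subspace approximation of cluster 0
```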
Structured documents are commonly edited using a free-form editor. Even though every string is an acceptable input, it makes sense to maintain a structured representation of the edited document. The structured representation has a number of uses: structural navigation (and optional structural editing), structure highlighting, etc. The construction of the structure must be done incrementally to be efficient: the time to process an edit operation should be proportional to the size of the change, and ideally independent of the total size of the document. We show that combining lazy evaluation and caching of intermediate (partial) results enables incremental parsing. We build a complete incremental parsing library for interactive systems with support for error correction.
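As a toy illustration of how caching of partial results keeps re-parsing proportional to the change, here is a sketch with a hypothetical chunk-level "parser" (this is not the library described above; the chunking scheme and the toy parser are assumptions):

```python
# Minimal sketch of the caching idea: parse results are memoized per
# (incoming state, chunk) pair, so re-parsing after an edit only recomputes
# chunks whose text or incoming parser state actually changed.
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_chunk(state, chunk):
    # Toy "parser": tracks unmatched parentheses; returns (new_state, result).
    depth = state
    for ch in chunk:
        depth += (ch == "(") - (ch == ")")
    return depth, f"chunk parsed at incoming depth {state}"

def parse_document(chunks):
    state, results = 0, []
    for chunk in chunks:              # cached chunks are returned without rework
        state, res = parse_chunk(state, chunk)
        results.append(res)
    return results

doc = ["(a b ", "(c d) ", "e) f"]
parse_document(tuple(doc))            # initial parse fills the cache
doc[1] = "(c d e) "                   # local edit
parse_document(tuple(doc))            # only chunks affected by the edit are re-parsed
```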
we present novel semi supervised boosting algorithms that incrementally build linear combinations of weak classifiers through generic functional gradient descent using both labeled and unlabeled training data our approach is based on extending information regularization framework to boosting bearing loss functions that combine log loss on labeled data with the information theoretic measures to encode unlabeled data even though the information theoretic regularization terms make the optimization non convex we propose simple sequential gradient descent optimization algorithms and obtain impressively improved results on synthetic benchmark and real world tasks over supervised boosting algorithms which use the labeled data alone and state of the art semi supervised boosting algorithm
A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link-based ranking algorithms like PageRank, HITS, and OPIC have been proposed. In this paper we propose a novel recursive method based on reinforcement learning, called DistanceRank, which considers the distance between pages as a punishment in order to compute the ranks of web pages. The distance is defined as the average number of clicks between two pages. The objective is to minimize the punishment (distance), so that a page with a smaller distance obtains a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawl scheduling; furthermore, the complexity of DistanceRank is low. We have used the University of California at Berkeley's web site for our experiments.
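For intuition, here is an illustrative distance-based iteration in the same spirit (this is not DistanceRank's actual update rule; the graph, the damping factor, and the treatment of unreferenced pages are invented for the example):

```python
# Illustrative click-distance ranking sketch: iteratively estimate each page's
# "distance" from its in-links and rank pages by increasing distance.
def click_distance_rank(graph, gamma=0.85, iters=50):
    # graph: page -> list of pages it links to
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    in_links = {p: [] for p in pages}
    for p, targets in graph.items():
        for q in targets:
            in_links[q].append(p)
    dist = {p: 1.0 for p in pages}                  # initial distance estimate
    for _ in range(iters):
        new = {}
        for p in pages:
            if in_links[p]:
                # one click from the "closest" in-link, discounted by gamma
                new[p] = 1.0 + gamma * min(dist[q] for q in in_links[p])
            else:
                new[p] = 1.0 / gamma                # unreferenced pages stay far (arbitrary choice)
        dist = new
    return sorted(pages, key=lambda p: dist[p])     # smaller distance = higher rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(click_distance_rank(web))
```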
ecommerce personalization can help web sites build and retain relationships with customers but it also raises number of privacy concerns this paper outlines the privacy risks associated with personalization and describes number of approaches to personalization system design that can reduce these risks this paper also provides an overview of the fair information practice principles and discusses how they may be applied to the design of personalization systems and introduces privacy laws and self regulatory guidelines relevant to personalization privacy risks can be reduced when personalization system designs allow for pseudonymous interactions client side data stores and task based personalization in addition interfaces that allow users to control the collection and use of their profile information can further ease privacy concerns
Radio frequency identification (RFID) technologies are used in many applications for data collection. However, raw RFID readings are usually of low quality and may contain many anomalies. An ideal solution for RFID data cleansing should address the following issues. First, in many applications duplicate readings of the same object (by multiple readers simultaneously, or by a single reader over a period of time) are very common; the solution should take advantage of the resulting data redundancy for data cleaning. Second, prior knowledge about the readers and the environment (e.g., prior data distributions, false negative rates of readers) may help improve data quality and remove data anomalies, and a desired solution must be able to quantify the degree of uncertainty based on such knowledge. Third, the solution should take advantage of given constraints in target applications (e.g., the number of objects in the same location cannot exceed a given value) to elevate the accuracy of data cleansing. There are a number of existing RFID data cleansing techniques; however, none of them supports all the aforementioned features. In this paper we propose a Bayesian inference based approach for cleaning RFID raw data. Our approach takes full advantage of data redundancy. To capture the likelihood, we design a state detection model and formally prove that the state model can maximize the system performance. Moreover, in order to sample from the posterior, we devise a Metropolis-Hastings sampler with constraints (MH), which incorporates constraint management to clean RFID raw data with high efficiency and accuracy. We validate our solution with a common RFID application and demonstrate the advantages of our approach through extensive simulations.
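A hedged sketch of the constrained sampling idea (illustrative only, not the paper's detection model or sampler): propose a new location for one tagged object at a time, reject any proposal that violates an assumed capacity constraint, and otherwise accept with the usual Metropolis-Hastings ratio computed from a simple read/false-read likelihood. All locations, capacities, and error rates below are made-up parameters.

```python
# Hedged sketch of a Metropolis-Hastings sampler with constraint management.
import random

LOCATIONS = ["dock", "shelf", "truck"]
CAPACITY = {"dock": 2, "shelf": 5, "truck": 3}   # assumed application constraint
P_DETECT, P_FALSE = 0.7, 0.05                    # assumed reader error rates

def likelihood(loc, detected_at):
    # probability of the observed detections if the object is truly at `loc`
    p = 1.0
    for l in LOCATIONS:
        seen = l in detected_at
        p *= (P_DETECT if seen else 1 - P_DETECT) if l == loc else (P_FALSE if seen else 1 - P_FALSE)
    return p

def violates(state):
    counts = {l: 0 for l in LOCATIONS}
    for loc in state.values():
        counts[loc] += 1
    return any(counts[l] > CAPACITY[l] for l in LOCATIONS)

def mh_clean(observations, steps=5000):
    # observations: object id -> set of locations whose readers reported the tag
    state = {obj: random.choice(LOCATIONS) for obj in observations}
    while violates(state):
        state = {obj: random.choice(LOCATIONS) for obj in observations}
    for _ in range(steps):
        obj = random.choice(list(state))
        proposal = random.choice(LOCATIONS)
        if violates({**state, obj: proposal}):
            continue                              # constraint management: reject
        ratio = likelihood(proposal, observations[obj]) / max(likelihood(state[obj], observations[obj]), 1e-12)
        if random.random() < min(1.0, ratio):
            state[obj] = proposal
    return state

reads = {"tag1": {"dock"}, "tag2": {"dock", "shelf"}, "tag3": set()}
print(mh_clean(reads))
```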
Searchable symmetric encryption (SSE) allows a party to outsource the storage of its data to another party (a server) in a private manner, while maintaining the ability to selectively search over it. This problem has been the focus of active research in recent years. In this paper we show two solutions to SSE that simultaneously enjoy the following properties. Both solutions are more efficient than all previous constant-round schemes; in particular, the work performed by the server per returned document is constant, as opposed to linear in the size of the data. Both solutions enjoy stronger security guarantees than previous constant-round schemes; in fact, we point out subtle but serious problems with previous notions of security for SSE and show how to design constructions that avoid these pitfalls. Further, our second solution also achieves what we call adaptive SSE security, where queries to the server can be chosen adaptively by the adversary during the execution of the search; this notion is both important in practice and has not been previously considered. Surprisingly, despite being more secure and more efficient, our SSE schemes are remarkably simple. We consider the simplicity of both solutions an important step towards the deployment of SSE technologies. As an additional contribution, we also consider multi-user SSE. All prior work on SSE studied the setting where only the owner of the data is capable of submitting search queries. We consider the natural extension where an arbitrary group of parties other than the owner can submit search queries. We formally define SSE in the multi-user setting and present an efficient construction that achieves better performance than simply using access control mechanisms.
in this work we analyze the behavior on company internal social network site to determine which interaction patterns signal closeness between colleagues regression analysis suggests that employee behavior on social network sites snss reveals information about both professional and personal closeness while some factors are predictive of general closeness eg content recommendations other factors signal that employees feel personal closeness towards their colleagues but not professional closeness eg mutual profile commenting this analysis contributes to our understanding of how sns behavior reflects relationship multiplexity the multiple facets of our relationships with sns connections
in this paper we present new approach that incorporates semantic structure of sentences in form of verb argument structure to measure semantic similarity between sentences the variability of natural language expression makes it difficult for existing text similarity measures to accurately identify semantically similar sentences since sentences conveying the same fact or concept may be composed lexically and syntactically different inversely sentences which are lexically common may not necessarily convey the same meaning this poses significant impact on many text mining applications performance where sentence level judgment is involved the evaluation has shown that by processing sentence at its semantic level the performance of similarity measures is significantly improved
the aim in this paper is to develop method for clustering together image views of the same object class local invariant feature methods such as sift have been proven effective for image clustering however they have made either relatively little use or too complex use of geometric constraints and are confounded when the detected features are superabundant here we make two contributions aimed at overcoming these problems first we rank the sift points sift using visual saliency second we use the reduced set of sift features to construct specific hyper graph cshg model of holistic structure based on the cshg model two stage clustering method is proposed in which images are clustered according to the pairwise similarity of the graphs which is combination of the traditional similarity of local invariant feature vectors and the geometric similarity between two graphs this method comprehensively utilizes both sift and geometric constraints and hence combines both global and local information experiments reveal that the method gives excellent clustering performance
the development of wireless communication technologies offers the possibility to provide new services to the users other than the web surfing futur is wireless telecommunication company interested in designing implementing and managing wmans wireless metropolitan area networks that has launched project called luna large unwired network applications with the objective of covering main locations in the province of trento trentino alto adige italy with wireless mesh network the futur business model is based on very low cost access to the luna network offering users services with tailor made advertising in this paper we present the luna ads client side application able to provide contents together with advertising considering the usability and accessibility requirements
The file system API of contemporary systems makes programs vulnerable to TOCTTOU (time-of-check-to-time-of-use) race conditions. Existing solutions either help users detect these problems by pinpointing their locations in the code, or prevent the problem altogether by modifying the kernel or its API. The latter alternative is not prevalent, and the former is just a first step: programmers must still address TOCTTOU flaws within the limits of the existing API, with which several important tasks cannot be accomplished in a portable, straightforward manner. Recently, Dean and Hu addressed this problem and suggested a probabilistic hardness amplification approach that alleviated the matter. Alas, shortly after, Borisov et al. responded with an attack termed "filesystem maze" that defeated the new approach. We begin by noting that mazes constitute a generic way to deterministically win many TOCTTOU races; gone are the days when the probability was small. In the face of this threat, we develop a new user-level defense that can withstand mazes, and show that our method is undefeated even by much stronger hypothetical attacks that provide the adversary program with ideal conditions to win the race (enjoying complete and instantaneous knowledge about the defending program's actions, and being able to perfectly synchronize accordingly). The fact that our approach is immune to these unrealistic attacks suggests it can be used as a simple and portable solution to a large class of TOCTTOU vulnerabilities, without requiring modifications to the underlying operating system.
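The vulnerability class is easy to see in code. The sketch below shows the classic check/use gap (os.access followed by open) and a commonly recommended file-descriptor-based idiom; it illustrates the problem the paper targets, not the defense it proposes. The path is a hypothetical example in an attacker-writable directory.

```python
# Classic TOCTTOU pattern: the file checked and the file used may differ if an
# attacker swaps in a symlink between the two calls.
import os

path = "/tmp/report.txt"   # hypothetical path in an attacker-writable directory

# Vulnerable: "check" ...
if os.access(path, os.R_OK):
    # ... window in which `path` can be replaced with a symlink ...
    with open(path) as f:                  # "use" may open a different file
        data = f.read()

# Safer idiom: skip the separate check, open once without following symlinks,
# and validate the already-opened file descriptor instead of the path.
fd = os.open(path, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))
try:
    if os.fstat(fd).st_uid == os.getuid():     # check the object actually opened
        data = os.read(fd, 1 << 20)
finally:
    os.close(fd)
```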
distributed model management aims to support the wide spread sharing and usage of decision support models web services is promising technology for supporting distributed model management activities such as model creation and delivery model composition model execution and model maintenance to fulfill dynamic decision support and problem solving requests we propose web services based framework for model management called mm ws to support various activities of the model management life cycle the framework is based on the recently proposed integrated service planning and execution isp approach for web services integration we discuss encoding of domain knowledge as individual models and utilize the mm ws framework to interleave synthesis of composite models with their execution prototypical implementation with an example is used to illustrate the utility of the framework to enable distributed model management and knowledge integration benefits and issues of using the framework to support model based decision making in organizational contexts are outlined
coordinations in noun phrases often pose the problem that elliptified parts have to be reconstructed for proper semantic interpretation unfortunately the detection of coordinated heads and identification of elliptified elements notoriously lead to ambiguous reconstruction alternatives while linguistic intuition suggests that semantic criteria might play an important if not superior role in disambiguating resolution alternatives our experiments on the reannotated wsj part of the penn treebank indicate that solely morpho syntactic criteria are more predictive than solely lexico semantic ones we also found that the combination of both criteria does not yield any substantial improvement
this paper describes an approach to modeling the evolution of non secure applications into secure applications in terms of the software requirements model and software architecture model the requirements for security services are captured separately from application requirements and the security services are encapsulated in connectors in the software architecture separately from the components providing functional services the enterprise architecture is described in terms of use case models static models and dynamic models the software architecture is described in terms of components and connectors which can be deployed to distributed configurations by separating application concerns from security concerns the evolution from non secure application to secure application can be achieved with less impact on the application an electronic commerce system is described to illustrate the approach
this article reports on the development of utility based mechanism for managing sensing and communication in cooperative multisensor networks the specific application on which we illustrate our mechanism is that of glacsweb this is deployed system that uses battery powered sensors to collect environmental data related to glaciers which it transmits back to base station so that it can be made available world wide to researchers in this context we first develop sensing protocol in which each sensor locally adjusts its sensing rate based on the value of the data it believes it will observe the sensors employ bayesian linear model to decide their sampling rate and exploit the properties of the kullback leibler divergence to place an appropriate value on the data then we detail communication protocol that finds optimal routes for relaying this data back to the base station based on the cost of communicating it derived from the opportunity cost of using the battery power for relaying data finally we empirically evaluate our protocol by examining the impact on efficiency of static network topology dynamic network topology the size of the network the degree of dynamism of the environment and the mobility of the nodes in so doing we demonstrate that the efficiency gains of our new protocol over the currently implemented method over month period are percnt percnt percnt and percnt respectively furthermore we show that our system performs at percnt percnt percnt and percnt of the theoretical optimal respectively despite being distributed protocol that operates with incomplete knowledge of the environment
information systems support data privacy by constraining user’s access to public views and thereby hiding the non public underlying data the privacy problem is to prove that none of the private data can be inferred from the information which is made public we present formal definition of the privacy problem which is based on the notion of certain answer then we investigate the privacy problem in the contexts of relational databases and ontology based information systems
Object Petri nets (OPNs) provide a natural and modular method for modelling many real-world systems. We give a structure-preserving translation of OPNs to Prolog by encoding the OPN semantics, avoiding the need for an unfolding to a flat Petri net. The translation provides support for reference and value semantics, and even allows different objects to be treated as copyable or non-copyable. The method is developed for OPNs with arbitrary nesting. We then apply logic programming tools to animate, compile, and model check OPNs. In particular, we use the partial evaluation system LOGEN to produce an OPN compiler, and we use the model checker XTL to verify CTL formulae; we also use LOGEN to produce special-purpose model checkers. We present two case studies along with experimental results. A comparison of OPN translations to Maude specifications and model checking is given, showing that our approach is roughly twice as fast for larger systems. We also tackle infinite-state model checking using the ECCE system.
part decomposition and conversely the construction of composite objects out of individual parts have long been recognized as ubiquitous and essential mechanisms involving abstraction this applies in particular in areas such as cad manufacturing software development and computer graphics although the part of relationship is distinguished in object oriented modeling techniques it ranks far behind the concept of generalization specialization and rigorous definition of its semantics is still missing in this paper we first show in which ways shift in emphasis on the part of relationship leads to analysis and design models that are easier to understand and to maintain we then investigate the properties of part of relationships in order to define their semantics this is achieved by means of categorization of part of relationships and by associating semantic constraints with individual categories we further suggest precise and compared with existing techniques less redundant specification of constraints accompanying part of categories based on the degree of exclusiveness and dependence of parts on composite objects although the approach appears generally applicable the object oriented unified modeling language uml is used to present our findings several examples demonstrate the applicability of the categories introduced
distributed denial of service ddos is major threat to the availability of internet services the anonymity allowed by ip networking together with the distributed large scale nature of the internet makes ddos attacks stealthy and difficult to counter to make the problem worse attack traffic is often indistinguishable from normal traffic as various attack tools become widely available and require minimum knowledge to operate automated anti ddos systems become increasingly important many current solutions are either excessively expensive or require universal deployment across many administrative domains this paper proposes two perimeter based defense mechanisms for internet service providers isps to provide the anti ddos service to their customers these mechanisms rely completely on the edge routers to cooperatively identify the flooding sources and establish rate limit filters to block the attack traffic the system does not require any support from routers outside or inside of the isp which not only makes it locally deployable but also avoids the stress on the isp core routers we also study new problem of perimeter based ip traceback and provide three solutions we demonstrate analytically and by simulations that the proposed defense mechanisms react quickly in blocking attack traffic while achieving high survival ratio for legitimate traffic even when percent of all customer networks attack the survival ratio for traffic from the other customer networks is still close to percent
query suggestion aims to suggest relevant queries for given query which help users better specify their information needs previously the suggested terms are mostly in the same language of the input query in this paper we extend it to cross lingual query suggestion clqs for query in one language we suggest similar or relevant queries in other languages this is very important to scenarios of cross language information retrieval clir and cross lingual keyword bidding for search engine advertisement instead of relying on existing query translation technologies for clqs we present an effective means to map the input query of one language to queries of the other language in the query log important monolingual and cross lingual information such as word translation relations and word co occurrence statistics etc are used to estimate the cross lingual query similarity with discriminative model benchmarks show that the resulting clqs system significantly out performs baseline system based on dictionary based query translation besides the resulting clqs is tested with french to english clir tasks on trec collections the results demonstrate higher effectiveness than the traditional query translation methods
Data mining research typically assumes that the data to be analyzed has been identified, gathered, cleaned, and processed into a convenient form. While data mining tools greatly enhance the ability of the analyst to make data-driven discoveries, most of the time spent in performing an analysis is spent in data identification, gathering, cleaning, and processing. Similarly, schema mapping tools have been developed to help automate the task of using legacy or federated data sources for a new purpose, but they assume that the structure of the data sources is well understood. However, the data sets to be federated may come from dozens of databases containing thousands of tables and tens of thousands of fields, with little reliable documentation about primary keys or foreign keys. We are developing a system, Bellman, which performs data mining on the structure of the database. In this paper we present techniques for quickly identifying which fields have similar values, identifying join paths, estimating join directions and sizes, and identifying structures in the database. The results of the database structure mining allow the analyst to make sense of the database content. This information can be used, e.g., to prepare data for data mining, find foreign key joins for schema mapping, or identify steps to be taken to prevent the database from collapsing under the weight of its complexity.
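To give a flavor of how field-content similarity can be estimated without scanning full columns, here is a hedged min-hash sketch; it is illustrative and not necessarily the summaries Bellman computes. The field contents and the signature size are made-up.

```python
# Estimate which database fields hold similar value sets by comparing small
# min-hash signatures instead of the full columns.
import hashlib

def minhash_signature(values, num_hashes=64):
    sig = []
    for i in range(num_hashes):
        salt = str(i).encode()
        sig.append(min(int(hashlib.md5(salt + str(v).encode()).hexdigest(), 16)
                       for v in values))
    return sig

def estimated_resemblance(sig_a, sig_b):
    # fraction of matching minima estimates |A ∩ B| / |A ∪ B|
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

fields = {
    "orders.cust_id": ["C01", "C02", "C03", "C07", "C09"],
    "customers.id":   ["C01", "C02", "C03", "C04", "C07", "C09"],
    "products.sku":   ["P11", "P12", "P13"],
}
sigs = {name: minhash_signature(vals) for name, vals in fields.items()}
names = list(fields)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} ~ {b}: estimated resemblance {estimated_resemblance(sigs[a], sigs[b]):.2f}")
```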
Bansal and Sviridenko ("New approximability and inapproximability results for 2-dimensional bin packing", Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, SODA) proved that there is no asymptotic PTAS for two-dimensional orthogonal bin packing without rotations unless P = NP. We show that similar approximation hardness results hold for several two- and three-dimensional rectangle packing and covering problems, even if rotations by ninety degrees are allowed. Moreover, for some of these problems we provide explicit lower bounds on the asymptotic approximation ratio of any polynomial-time approximation algorithm. Our hardness results apply to the most studied cases of two-dimensional problems with unit square bins and of three-dimensional strip packing and covering problems with a strip of unit square base.
An understanding of the topological structure of the internet is needed for quite a number of networking tasks: making decisions about peering relationships, choice of upstream providers, and inter-domain traffic engineering. One essential component of these tasks is the ability to predict routes in the internet. However, the internet is composed of a large number of independent autonomous systems (ASes), resulting in complex interactions, and until now no model of the internet has succeeded in producing predictions of acceptable accuracy. We demonstrate that there are two limitations of prior models: (i) they have all assumed that an autonomous system (AS) is an atomic structure, which it is not; and (ii) models have tended to oversimplify the relationships between ASes. Our approach uses multiple quasi-routers to capture route diversity within the ASes, and is deliberately agnostic regarding the types of relationships between ASes. The resulting model ensures that its routing is consistent with the observed routes. Exploiting a large number of observation points, we show that our model provides accurate predictions for unobserved routes, a first step towards developing structural models of the internet that enable real applications.
in standard text retrieval systems the documents are gathered and indexed on single server in distributed information retrieval dir the documents are held in multiple collections answers to queries are produced by selecting the collections to query and then merging results from these collections however in most prior research in the area collections are assumed to be disjoint in this paper we investigate the effectiveness of different combinations of server selection and result merging algorithms in the presence of duplicates we also test our hash based method for efficiently detecting duplicates and near duplicates in the lists of documents returned by collections our results based on two different designs of test data indicate that some dir methods are more likely to return duplicate documents and show that removing such redundant documents can have significant impact on the final search effectiveness
Distributed hash tables (DHTs) excel at exact-match lookups, but they do not directly support complex queries such as semantic search, which is based on content. In this paper we propose a novel approach to efficient semantic search on DHT overlays. The basic idea is to place the indexes of semantically close files onto the same peer nodes with high probability, by exploiting information retrieval algorithms and locality sensitive hashing. A query for retrieving semantically close files is answered with high recall by consulting only a small number of nodes, namely those that store the indexes of the files semantically close to the query. Our approach adds only index information to peer nodes, imposing only a small storage overhead. Via detailed simulations we show that our approach achieves high recall for queries at very low cost, i.e., the number of nodes visited per query is small and essentially independent of the overlay size.
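A minimal sketch of the placement idea, assuming random-hyperplane LSH over term vectors: similar content tends to produce the same hash bits and therefore lands on the same node. The vocabulary, node count, and bit width are invented parameters, not the paper's.

```python
# Hash each file's term vector with random-hyperplane LSH so that semantically
# similar files map to the same DHT node with high probability.
import random

random.seed(1)
VOCAB = ["cache", "memory", "gpu", "kernel", "protein", "genome", "cell"]
NUM_BITS, NUM_NODES = 8, 16
HYPERPLANES = [[random.gauss(0, 1) for _ in VOCAB] for _ in range(NUM_BITS)]

def term_vector(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def lsh_key(vec):
    # one bit per hyperplane: which side of the hyperplane the vector falls on
    bits = 0
    for plane in HYPERPLANES:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (dot >= 0)
    return bits

def responsible_node(text):
    return lsh_key(term_vector(text)) % NUM_NODES    # DHT node storing the index

print(responsible_node("gpu kernel memory cache"))
print(responsible_node("memory cache gpu"))          # similar content, likely same node
print(responsible_node("protein genome cell"))       # different topic, likely elsewhere
```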
we propose notion of deterministic association rules for ordered data we prove that our proposed rules can be formally justified by purely logical characterization namely natural notion of empirical horn approximation for ordered data which involves background horn conditions these ensure the consistency of the propositional theory obtained with the ordered context the whole framework resorts to concept lattice models from formal concept analysis but adapted to ordered contexts we also discuss general method to mine these rules that can be easily incorporated into any algorithm for mining closed sequences of which there are already some in the literature
this paper presents novel architectural technique to hide fetch latency overhead of hardware encrypted and authenticated memory number of recent secure processor designs have used memory block encryption and authentication to protect un trusted external memory however the latency overhead of certain encryption modes or authentication schemes can be intolerably high this paper proposes novel techniques called frequent value ciphertext speculation and frequent value mac speculation that synergistically combine value prediction and the inherently pipelined cryptography hardware to address the issue of latency for memory decryption and authentication without sacrificing security frequent value ciphertext speculation can speed up memory decryption or mac integrity verification by speculatively encrypting predictable memory values and comparing the result ciphertext with the fetched ciphertext in mac speculation secure processor pre computes mac for speculated frequent values and compares the mac result with the fetched mac from memory using spec benchmark suite and detailed architecture simulator our results show that ciphertext speculation and mac speculation can significantly improve performance for direct memory encryption modes based on only most frequent bit values for eight benchmark programs the speedup is over and some benchmark programs achieve more than speedup for counter mode encrypted memory mac speculation can also significantly reduce the authentication overhead
an abstract type groups variables that are used for related purposes in program we describe dynamic unification based analysis for inferring abstract types initially each run time value gets unique abstract type run time interaction among values indicates that they have the same abstract type so their abstract types are unified also at run time abstract types for variables are accumulated from abstract types for values the notion of interaction may be customized permitting the analysis to compute finer or coarser abstract types these different notions of abstract type are useful for different tasks we have implemented the analysis for compiled binaries and for java bytecodes our experiments indicate that the inferred abstract types are useful for program comprehension improve both the results and the run time of follow on program analysis and are more precise than the output of comparable static analysis without suffering from overfitting
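The unification step described above is naturally expressed with a union-find structure. The sketch below is illustrative (not the authors' tool): every value gets a fresh abstract type, observed run-time interactions unify types, and variables end up grouped by the abstract type of the values they hold.

```python
# Illustrative union-find sketch of unification-based abstract type inference.
class AbstractTypes:
    def __init__(self):
        self.parent = {}

    def fresh(self, value_id):
        self.parent.setdefault(value_id, value_id)   # each value starts alone
        return value_id

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path compression
            x = self.parent[x]
        return x

    def interact(self, a, b):                # observed run-time interaction
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb             # unify the two abstract types

types = AbstractTypes()
for v in ["price", "tax", "total", "year"]:
    types.fresh(v)
types.interact("price", "tax")               # e.g. price + tax
types.interact("total", "price")             # e.g. total = price + tax
groups = {}
for v in ["price", "tax", "total", "year"]:
    groups.setdefault(types.find(v), []).append(v)
print(list(groups.values()))                 # [['price', 'tax', 'total'], ['year']]
```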
Context: the workshop was held to explore the potential for adapting the ideas of evidence-based practices, as used in medicine and other disciplines, for use in software engineering. Objectives: to devise ways of developing suitable evidence-based practices and procedures, especially the use of structured literature reviews, and introducing these into software engineering research and practice. Method: three sessions were dedicated to a mix of presentations based on position papers and interactive discussion, while the fourth focused upon the key issues as decided by the participants. Results: an initial scoping of the major issues, identification of useful parallels, and some plans for future development of an evidence-based software engineering community. Conclusions: while there are substantial challenges to introducing evidence-based practices, there are useful experiences to be drawn from a variety of other domains.
in multi hop wireless networks the mobile nodes usually act as routers to relay packets generated from other nodes however selfish nodes do not cooperate but make use of the honest ones to relay their packets which has negative effect on fairness security and performance of the network in this paper we propose novel incentive mechanism to stimulate cooperation in multi hop wireless networks fairness can be achieved by using credits to reward the cooperative nodes the overhead can be significantly reduced by using cheating detection system cds to secure the payment extensive security analysis demonstrates that the cds can identify the cheating nodes effectively under different cheating strategies simulation results show that the overhead of the proposed incentive mechanism is incomparable with the existing ones
A finite automaton, simply referred to as a robot, has to explore a graph whose nodes are unlabeled and whose edge ports are locally labeled at each node. The robot has no a priori knowledge of the topology of the graph or of its size; its task is to traverse all the edges of the graph. We first show that, for any K-state robot and any degree bound d at least 3, there exists a planar graph of maximum degree d with at most K + 1 nodes that the robot cannot explore; this bound improves all previous bounds in the literature. More interestingly, we show that in order to explore all graphs of diameter D and maximum degree d, a robot needs Omega(D log d) memory bits, even if we restrict the exploration to planar graphs. This latter bound is tight: indeed, a simple DFS up to depth D + 1 enables a robot to explore any graph of diameter D and maximum degree d using memory of size O(D log d) bits. We thus prove that the worst-case space complexity of graph exploration is Theta(D log d) bits.
access control models are usually static ie permissions are granted based on policy that only changes seldom especially for scenarios in health care and disaster management more flexible support of access control ie the underlying policy is needed break glass is one approach for such flexible support of policies which helps to prevent system stagnation that could harm lives or otherwise result in losses today break glass techniques are usually added on top of standard access control solutions in an ad hoc manner and therefore lack an integration into the underlying access control paradigm and the systems access control enforcement architecture we present an approach for integrating in fine grained manner break glass strategies into standard access control models and their accompanying enforcement architecture this integration provides means for specifying break glass policies precisely and supporting model driven development techniques based on such policies
We present a new linear-time algorithm to compute a good order for the point set of a Delaunay triangulation in the plane. Such a good order makes reconstruction in linear time, with a simple algorithm, possible. Similarly to the algorithm of Snoeyink and van Kreveld (ESA), our algorithm constructs such orders in O(log n) phases by repeatedly removing a constant fraction of vertices from the current triangulation. Compared to that algorithm, we improve the guarantee on the number of removed vertices in each such phase: if we bound the degree of the points at the time they are removed, our algorithm removes a larger fraction of the points than the guarantee given by Snoeyink and van Kreveld. We achieve this improvement by removing the points sequentially, using a breadth-first search (BFS) based procedure that, in contrast to the earlier algorithm, does not necessarily remove an independent set. Besides speeding up the algorithm, removing more points in a single phase has the advantage that two consecutive points in the computed order are usually closer to each other; for this reason we believe that our approach is better suited for vertex coordinate compression. We implemented prototypes of both algorithms and compared their running time on point sets uniformly distributed in the unit cube; our algorithm is slightly faster. To compare the vertex coordinate compression capabilities of both algorithms, we round the resulting sequences of vertex coordinates to fixed-width integers and compress them with a simple variable-length code. Our algorithm achieves noticeably better vertex data compression than the algorithm of Snoeyink and van Kreveld.
we investigate proof rules for information hiding using the recent formalism of separation logic in essence we use the separating conjunction to partition the internal resources of module from those accessed by the module’s clients the use of logical connective gives rise to form of dynamic partitioning where we track the transfer of ownership of portions of heap storage between program components it also enables us to enforce separation in the presence of mutable data structures with embedded addresses that may be aliased
in order to utilize the potential advantages of replicated collaborative cad system to support natural free fast and less constrained multi user human to human interaction local locking mechanism which can provide fast modeling response is adopted as concurrency control mechanism for replicated on line collaborative cad system human human interactive modeling is achieved by immediate local execution of modeling operations and exchange of modeling operations across all collaborative sites real time in particular an approach to achieve topological entity correspondence across collaborative sites during modeling procedure which is critical to guarantee the correctness and consistency of collaborative modeling result is proposed prototype system based on the acis geometric modeling kernel is implemented to verify availability of the proposed solution
in any collaborative system there are both symmetries and asymmetries present in the design of the technology and in the ways that technology is appropriated yet media space research tends to focus more on supporting and fostering the symmetries than the asymmetries throughout more than years of media space research the pursuit of increased symmetry whether achieved through technical or social means has been recurrent theme the research literature on the use of contemporary awareness systems in contrast displays little if any of this emphasis on symmetrical use indeed this body of research occasionally highlights the perceived value of asymmetry in this paper we unpack the different forms of asymmetry present in both media spaces and contemporary awareness systems we argue that just as asymmetry has been demonstrated to have value in contemporary awareness systems so might asymmetry have value in media spaces and in other cscw systems more generally to illustrate we present media space that emphasizes and embodies multiple forms of asymmetry and does so in response to the needs of particular work context
Compilers for polymorphic languages can use runtime type inspection to support advanced implementation techniques such as tagless garbage collection, polymorphic marshalling, and flattened data structures. Intensional type analysis is a type-theoretic framework for expressing and certifying such type-analyzing computations. Unfortunately, existing approaches to intensional analysis do not work well on types with universal, existential, or fixpoint quantifiers. This makes it impossible to code applications such as garbage collection, persistence, or marshalling, which must be able to examine the type of any runtime value. We present a typed intermediate language that supports fully reflexive intensional type analysis. By fully reflexive, we mean that type-analyzing operations are applicable to the type of any runtime value in the language. In particular, we provide both type-level and term-level constructs for analyzing quantified types. Our system supports structural induction on quantified types, yet type checking remains decidable. We show how to use reflexive type analysis to support type-safe marshalling and how to generate certified type-analyzing object code.
Efficient subsumption checking, deciding whether a subscription or publication is covered by a set of previously defined subscriptions, is of paramount importance for publish/subscribe systems. It provides the core system functionality (matching of publications to subscriber needs expressed as subscriptions) and additionally reduces the overall system load and generated traffic, since covered subscriptions are not propagated in distributed environments. As the subsumption problem was shown previously to be co-NP complete and existing solutions typically apply pairwise comparisons to detect the subsumption relationship, we propose a Monte Carlo type probabilistic algorithm for the general subsumption problem. It determines whether a publication or subscription is covered by a disjunction of subscriptions in time proportional to the number of subscriptions, the number of distinct attributes in the subscriptions, and the number of tests performed to answer the subsumption question. The probability of error is problem-specific and typically very small, and is controlled by the number of tests performed. Our experimental results show significant gains in terms of subscription set reduction, which has a favorable impact on the overall system performance as it reduces the total computational costs and networking traffic. Furthermore, the expected theoretical bounds underestimate algorithm performance, because it performs much better in practice due to the introduced optimizations, and it is adequate for fast forwarding of subscriptions in case of high subscription rates.
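A hedged Monte Carlo sketch of a coverage test in the same spirit (this is not the paper's algorithm; the interval-constraint subscription model and all values are assumptions): sample random publications that satisfy the candidate and check each against the disjunction. An unmatched sample is a definite witness of non-coverage; if all k samples match, report "covered" with a small, k-dependent chance of error.

```python
# Monte Carlo coverage test over interval subscriptions.
import random

def matches(sub, point):
    # sub maps attribute name -> (low, high); a point matches if every
    # constrained attribute is present and inside its interval
    return all(attr in point and lo <= point[attr] <= hi
               for attr, (lo, hi) in sub.items())

def probably_covered(candidate, subscriptions, k=1000):
    for _ in range(k):
        point = {attr: random.uniform(lo, hi) for attr, (lo, hi) in candidate.items()}
        if not any(matches(s, point) for s in subscriptions):
            return False                 # witness found: definitely not covered
    return True                          # no witness in k trials: covered w.h.p.

existing = [{"price": (0, 50)}, {"price": (40, 100)}]
print(probably_covered({"price": (10, 90)}, existing))    # True: covered by the union
print(probably_covered({"price": (10, 120)}, existing))   # False: samples above 100 are witnesses
```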
previous research has shown that staged execution se ie dividing program into segments and executing each segment at the core that has the data and or functionality to best run that segment can improve performance and save power however se’s benefit is limited because most segments access inter segment data ie data generated by the previous segment when consecutive segments run on different cores accesses to inter segment data incur cache misses thereby reducing performance this paper proposes data marshaling dm new technique to eliminate cache misses to inter segment data dm uses profiling to identify instructions that generate inter segment data and adds only bytes core of storage overhead we show that dm significantly improves the performance of two promising staged execution models accelerated critical sections and producer consumer pipeline parallelism on both homogeneous and heterogeneous multi core systems in both models dm can achieve almost all of the potential of ideally eliminating cache misses to inter segment data dm’s performance benefit increases with the number of cores
power dissipation limits have emerged as major constraint in the design of microprocessors this is true not only at the low end where cost and battery life are the primary drivers but also now at the midrange and high end system server level thus the ability to estimate power consumption at the high level during the early stage definition and trade off studies is key new methodology enhancement sought by design and performance architects we first review the fundamentals in terms of power estimation and power performance trade offs at the microarchitecture level we then discuss the opportunities of saving power that can be exposed via microarchitecture level modeling in particular the potential savings that can be effected through straightforward clock gating techniques is cited as an example we also describe some future ideas and trends in power efficient processor design examples of how microarchitectural observations can be used toward power saving circuit design optimizations are described the design and modeling challenges are in the context of work in progress within ibm research this research is in support of future high end processor development within ibm
federated system is popular paradigm for multiple organizations to collaborate and share resources for common benefits in practice however these organizations are often within different administrative domains and thus demand autonomous resource management as opposed to blindly exporting their resources for efficient search to address these challenges we present new resource discovery middleware called roads that can facilitate voluntary sharing in roads the participants can associate with each other at their own will and dictate the extent of sharing by properly exporting summaries condensed representation of their resource records to enable efficient search these summaries are aggregated and replicated along an overlay assisted server hierarchy and the queries are routed to those servers that are likely to hold the desired resources our experimental results show that roads provides not only flexible resource sharing for federated systems but also efficient resource discovery with performance comparable to centrally managed system
in this paper we give brief history on conceptual modeling in computer science and we discuss state of the art approaches it is claimed that number of problems remain to be solved schema first is no longer viable approach to meet the data needs in the dynamic world of the internet and web services this is also true for the schema later approach simply because data and data sources constantly change in depth and breadth and can neither wait for the schema to be completed nor for the data to be collected as solution we advocate for new schema during approach in which the process of conceptual modeling is in continuum with the operations in the database
this paper is proposing new platform for implementing services in future service oriented architectures the basic premise of our proposal is that by combining the large volume of uncontracted resources with small clusters of dedicated resources we can dramatically reduce the amount of dedicated resources while the goodput provided by the overall system remains at high level this paper presents particular strategies for implementing this idea for particular class of applications we performed very detailed simulations on synthetic and real traces to evaluate the performance of the proposed strategies our findings on compute intensive applications show that preemptive reallocation of resources is necessary for assured services the proposed preemption based scheduling heuristic can significantly improve utilization of the dedicated resources by opportunistically offloading the peak loads on uncontracted resources while keeping the service quality virtually unaffected
we describe work in progress on tools and infrastructure to support adaptive component based software for mobile devices in our case apple iphones our high level aim is design for appropriation ie system design for uses and contexts that designers may not be able to fully predict or model in advance logs of users system operation are streamed back in real time to evaluators data visualisation tools so that they can assess design problems and opportunities evaluators and developers can then create new software components that are sent to the mobile devices these components are either integrated automatically on the fly or offered as recommendations for users to accept or reject by connecting developers users and evaluators we aim to quicken the pace of iterative design so as to improve the process of creating and sustaining contextually fitting software
Feature subset selection has become an important challenge in the areas of pattern recognition, machine learning, and data mining. As different semantics are hidden in numerical and categorical features, there are two strategies for selecting hybrid attributes: discretizing numerical variables or numericalizing categorical features. In this paper we introduce a simple and efficient hybrid attribute reduction algorithm based on a generalized fuzzy-rough model. A theoretic framework of the fuzzy-rough model based on fuzzy relations is presented, which lays the foundation for the algorithm's construction. We derive several attribute significance measures based on the proposed fuzzy-rough model and construct a forward greedy algorithm for hybrid attribute reduction. The experiments show that the technique of variable-precision fuzzy inclusion in computing the decision positive region achieves the best classification performance: the number of selected features is the smallest while the accuracy is the best.
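A sketch of the forward greedy loop, using a deliberately simple consistency score as a stand-in for the fuzzy-rough significance measure derived in the paper; the data set below is a made-up example.

```python
# Forward greedy attribute reduction with a simple surrogate significance
# measure (classification consistency of the selected columns).
def consistency(rows, labels, attrs):
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(y)
    ok = sum(1 for row, y in zip(rows, labels)
             if len(groups[tuple(row[a] for a in attrs)]) == 1)
    return ok / len(rows)

def forward_greedy(rows, labels, all_attrs, eps=1e-6):
    selected, best = [], 0.0
    while True:
        gains = [(consistency(rows, labels, selected + [a]), a)
                 for a in all_attrs if a not in selected]
        if not gains:
            break
        score, attr = max(gains)
        if score - best < eps:          # no significant attribute left: stop
            break
        selected.append(attr)
        best = score
    return selected

rows = [{"temp": "high", "humid": "low",  "wind": "y"},
        {"temp": "high", "humid": "high", "wind": "n"},
        {"temp": "low",  "humid": "low",  "wind": "y"},
        {"temp": "low",  "humid": "high", "wind": "n"}]
labels = ["play", "stay", "play", "stay"]
print(forward_greedy(rows, labels, ["temp", "humid", "wind"]))   # ['wind'] on this toy data
```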
recent work has focused on creating models for generating traveler behavior for micro simulations with the increase in hand held computers and gps devices there is likely to be an increasing demand for extending this idea to predicting an individual’s future travel plans for devices such as smart traveler’s assistant in this work we introduce technique based on sequential data mining for predicting multiple aspects of an individual’s next activity using combination of user history and their similarity to other travelers the proposed technique is empirically shown to perform better than more traditional approaches to this problem
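A simplified sketch of the prediction idea (not the paper's sequential mining technique): predict the next activity from the most frequent successor of the current activity in the user's own history, backing off to other travelers' pooled histories when the user's data is too sparse. The activity sequences are invented examples.

```python
# Next-activity prediction from successor statistics with a population back-off.
from collections import Counter, defaultdict

def successor_counts(history):
    counts = defaultdict(Counter)
    for a, b in zip(history, history[1:]):
        counts[a][b] += 1
    return counts

def predict_next(user_history, other_histories, current):
    own = successor_counts(user_history)[current]
    if own:
        return own.most_common(1)[0][0]
    pooled = Counter()
    for h in other_histories:                       # back off to other travelers
        pooled.update(successor_counts(h)[current])
    return pooled.most_common(1)[0][0] if pooled else None

alice = ["home", "work", "lunch", "work", "gym", "home", "work", "lunch"]
others = [["home", "work", "lunch", "work", "home"],
          ["home", "gym", "shop", "home"]]
print(predict_next(alice, others, "work"))   # 'lunch' from Alice's own history
print(predict_next(alice, others, "shop"))   # 'home' via back-off to others
```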
We propose a precomputation-based approach for the real-time rendering of scenes that include a number of complex illumination phenomena, such as radiosity and subsurface scattering, and that allows interactive modification of camera and lighting parameters. At the heart of our approach lies a novel parameterization of the rendering equation that is inherently supported by the modern GPU. During the precomputation phase we build a set of offset transfer maps based on the proposed parameterization, which approximate the complete radiance transfer function for the scene. The rendering phase is then reduced to a set of texture blending and mapping operations that execute in real time on the GPU. In contrast to the current state of the art, which employs environment maps to produce global illumination, our approach uses arbitrary first-order lighting to compute the final lighting solution and fully supports point and spot lights. To discretize the transfer maps, we develop an efficient method for generating and sampling continuous probability density functions from unordered data points. We believe that the contributions of this paper offer a significantly different approach to precomputed radiance transfer from those previously proposed.
highly competitive and open environments should encompass mechanisms that will assist service providers sps in accounting for their interests ie offering at given period of time the adequate quality services in cost efficient manner assuming that user wishes to access specific service composed of distinct set of service tasks which can be served by various candidate service nodes csns problem that should be addressed is the assignment of service tasks to the most appropriate service nodes the pertinent problem is concisely defined optimally formulated and evaluated through simulation experiments on real network test bed
the postures character adopts over time are key expressive aspect of its movement while ik tools help character achieve positioning constraints there are few tools that help an animator with the expressive aspects of character’s poses three aspects are required in good pose design achieving set of world space constraints finding body shape that reflects the character’s inner state and personality and making adjustments to balance that act to strengthen the pose and also maintain realism this is routinely done in the performing arts but is uncommon in computer graphics our system combines all three components within single body shape solver the system combines feedback based balance control with hybrid ik system that utilizes optimization based and analytic ik components the ik system has been carefully designed to allow direct control over various aesthetically important aspects of body shape such as the type of curve in the spine and the relationship between the collar bones the system allows for both low level control and for higher level shape sets to be defined and used shape sets allow an animator to use single scalar to vary character’s pose within specified shape class providing an intuitive parameterization of posture changing shape set allows an animator to quickly experiment with different posture options for movement sequence supporting rapid exploration of the aesthetic space
we introduce new sublinear space data structure the count min sketch for summarizing data streams our sketch allows fundamental queries in data stream summarization such as point range and inner product queries to be approximately answered very quickly in addition it can be applied to solve several important problems in data streams such as finding quantiles frequent items etc the time and space bounds we show for using the cm sketch to solve these problems significantly improve those previously known typically from to in factor
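a minimal count min sketch in python for readers unfamiliar with the structure; the width, depth and hashing scheme below are illustrative defaults rather than the parameters analyzed in the paper:

```python
import random

class CountMinSketch:
    """Minimal count-min sketch: d rows of w counters, each row with its
    own salted hash; point queries return the minimum over rows, which
    overestimates the true count with bounded error."""

    def __init__(self, width=2000, depth=5, seed=42):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(64) for _ in range(depth)]

    def _index(self, item, row):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.tables[row][self._index(item, row)] += count

    def query(self, item):
        return min(self.tables[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    cms.update(token)
print(cms.query("a"))  # >= 3, typically exactly 3
```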
previous object code compression schemes have employed static and semiadaptive compression algorithms to reduce the size of instruction memory in embedded systems the suggestion by number of researchers that adaptive compression techniques are unlikely to yield satisfactory results for code compression has resulted in virtually no investigation of their application to that domain this paper presents new adaptive approach to code compression which operates at the granularity of program’s cache lines where the context for compression is determined by an analysis of control flow in the code being compressed we introduce novel data structure the compulsory miss tree that is used to identify partial order in which compulsory misses will have occurred in an instruction cache whenever cache miss occurs this tree is used as basis for dynamically building and maintaining an lzw dictionary for compression decompression of individual instruction cache lines we applied our technique to eight benchmarks taken from the mibench and mediabench suites which were compiled with size optimization and subsequently compacted using link time optimizer prior to compression results from our experiments demonstrate object code size elimination averaging between and of the original linked code size depending on the cache line length under inspection
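as a rough illustration of the adaptive dictionary idea the abstract builds on, a plain lzw encoder over a byte string; the compulsory miss tree and the per cache line dictionary maintenance described above are not reproduced here:

```python
def lzw_compress(data: bytes):
    """Plain LZW over a byte string: the dictionary starts with all single
    bytes and grows adaptively as longer phrases are seen. The paper builds
    its dictionary per cache line guided by a compulsory miss tree; this is
    only the underlying LZW mechanism."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    phrase = b""
    output = []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate
        else:
            output.append(dictionary[phrase])
            dictionary[candidate] = next_code
            next_code += 1
            phrase = bytes([byte])
    if phrase:
        output.append(dictionary[phrase])
    return output

print(lzw_compress(b"abababab"))  # repeated phrases collapse to few codes
```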
it is generally accepted that the management of imprecision and vagueness will yield more intelligent and realistic knowledge based applications in this paper we present fuzzy description logics framework based on certainty lattices our main feature is that an assertion is not just true or false like in classical description logics but certain to some degree where the certainty value is taken from certainty lattice we extend the well known fuzzy description logic based on fuzzy set theory shin to the fuzzy description logic based on certainty lattices theory shin the syntax semantics and logical properties of the shin are given and sound complete and terminating tableaux algorithm for deciding fuzzy abox consistency wrt rbox for the shin is presented in this paper various extensions of fuzzy description logics over lattices are also discussed
we extend fagin’s result on the equivalence between functional dependencies in relational databases and propositional horn clauses it is shown that this equivalence still holds for functional dependencies in databases that support complex values via nesting of records lists sets and multisets the equivalence has several implications firstly it extends well known result from relational databases to databases which are not in first normal form secondly it characterises the implication of functional dependencies in complex value databases in purely logical terms the database designer can take advantage of this equivalence to reduce database design problems to simpler problems in propositional logic an algorithm is presented for such an application furthermore relational database design tools can be reused to solve problems for complex value databases
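the relational core of fagin's equivalence, which the paper extends to complex values, can be stated compactly by reading attributes as propositional variables; the notation below is generic and not taken from the paper:

```latex
% Fagin's correspondence for the relational case (attributes read as
% propositional variables); the paper shows it survives nesting:
\Sigma \models \{A_1,\dots,A_k\} \rightarrow B
\quad\Longleftrightarrow\quad
\{\, X_1 \wedge \dots \wedge X_m \rightarrow Y \;\mid\; (\{X_1,\dots,X_m\} \rightarrow Y) \in \Sigma \,\}
\;\models\; A_1 \wedge \dots \wedge A_k \rightarrow B
```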
web service selection enables user to find the most desirable service based on his her preferences however user preferences in real world can be either incomplete or inconsistent such that service selection cannot be conducted properly this paper presents system to facilitate web service selection in face of incomplete or inconsistent user preferences the system utilizes the information of historical users to amend the active user’s preference so as to improve the results of service selection we present detailed design of the system and verify its efficiency through extensive experiments
this paper presents scalable framework for real time raycasting of large unstructured volumes that employs hybrid bricking approach it adaptively combines original unstructured bricks in important focus regions with structured bricks that are resampled on demand in less important context regions the basis of this focus context approach is interactive specification of scalar degree of interest doi function thus rendering always considers two volumes simultaneously scalar data volume and the current doi volume the crucial problem of visibility sorting is solved by raycasting individual bricks and compositing in visibility order from front to back in order to minimize visual errors at the grid boundary it is always rendered accurately even for resampled bricks variety of different rendering modes can be combined including contour enhancement very important property of our approach is that it supports variety of cell types natively ie it is not constrained to tetrahedral grids even when interpolation within cells is used moreover our framework can handle multi variate data eg multiple scalar channels such as temperature or pressure as well as time dependent data the combination of unstructured and structured bricks with different quality characteristics such as the type of interpolation or resampling resolution in conjunction with custom texture memory management yields very scalable system
recently there is growing interest in the design development and deployment of sensor systems for applications of high level inference which leads to an increasing demand on connecting internet protocol ip network users to wireless sensor networks and accessing the available services and applications in this paper we first identify key requirements of designing an efficient and flexible architecture for integrating ip and sensor networks based on the requirements we outline an ip and sensor network integration architecture ipsa which provides ip mobility support and universal interface for ip mobile users to access sensor networks while considering application specific requirements with on demand processing code assigned to the middleware layer of an ip mobile user ipsa provides the flexibility for enabling diverse applications and manages network resources efficiently without additional requirements on sensor nodes except for the limited additional hardware requirement at ip mobile nodes
mining data streams of changing class distributions is important for real time business decision support the stream classifier must evolve to reflect the current class distribution this poses serious challenge on the one hand relying on historical data may increase the chances of learning obsolete models on the other hand learning only from the latest data may lead to biased classifiers as the latest data is often an unrepresentative sample of the current class distribution the problem is particularly acute in classifying rare events when for example instances of the rare class do not even show up in the most recent training data in this paper we use stochastic model to describe the concept shifting patterns and formulate this problem as an optimization one from the historical and the current training data that we have observed find the most likely current distribution and learn classifier based on the most likely distribution we derive an analytic solution and approximate this solution with an efficient algorithm which calibrates the influence of historical data carefully to create an accurate classifier we evaluate our algorithm with both synthetic and real world datasets our results show that our algorithm produces accurate and efficient classification
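purely as a generic illustration of calibrating the influence of historical data, the sketch below weights classifiers trained on past chunks by their accuracy on the most recent labelled data, so stale models fade out; this is a common baseline for evolving streams, not the analytic solution derived in the paper:

```python
def weighted_vote(chunk_classifiers, recent_data, recent_labels, x):
    """Each classifier was trained on one historical chunk and is weighted
    by its accuracy on the most recent labelled data; the prediction for x
    is the label with the largest total weight. Illustrative baseline only."""
    weights = []
    for clf in chunk_classifiers:
        correct = sum(1 for xi, yi in zip(recent_data, recent_labels)
                      if clf(xi) == yi)
        weights.append(correct / max(len(recent_data), 1))
    votes = {}
    for clf, w in zip(chunk_classifiers, weights):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```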
recent work is beginning to reveal the potential of numerical optimization as an approach to generating interfaces and displays optimization based approaches can often allow mix of independent goals and constraints to be blended in ways that would be difficult to describe algorithmically while optimization based techniques appear to offer several potential advantages further research in this area is hampered by the lack of appropriate tools this paper presents gadget an experimental toolkit to support optimization for interface and display generation gadget provides convenient abstractions of many optimization concepts gadget also provides mechanisms to help programmers quickly create optimizations including an efficient lazy evaluation framework powerful and configurable optimization structure and library of reusable components together these facilities provide an appropriate tool to enable exploration of new class of interface and display generation techniques
real time garbage collection rtgc has recently advanced to the point where it is being used in production for financial trading military command and control and telecommunications however among potential users of rtgc there is enormous diversity in both application requirements and deployment environments previously described rtgcs tend to work well in narrow band of possible environments leading to fragile systems and limiting adoption of real time garbage collection technology this paper introduces collector scheduling methodology called tax and spend and the collector design revisions needed to support it tax and spend provides general mechanism which works well across variety of application machine and operating system configurations tax and spend subsumes the predominant pre existing rtgc scheduling techniques it allows different policies to be applied in different contexts depending on the needs of the application virtual machines can co exist compositionally on single machine we describe the implementation of our system metronome ts as an extension of the metronome collector in ibm’s real time virtual machine product and we evaluate it running on an way smp blade with real time linux kernel compared to the state of the art metronome system on which it is based implemented in the identical infrastructure it achieves almost shorter latencies comparable utilization at shorter time window and mean throughput improvements of
object oriented and object relational dbms support set valued attributes which are natural and concise way to model complex information however there has been limited research to date on the evaluation of query operators that apply on sets in this paper we study the join of two relations on their set valued attributes various join types are considered namely the set containment set equality and set overlap joins we show that the inverted file powerful index for selection queries can also facilitate the efficient evaluation of most join predicates we propose join algorithms that utilize inverted files and compare them with signature based methods for several set comparison predicates
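a small sketch of how an inverted file can drive a set containment join, one of the predicates discussed above; relation names, identifiers and the dictionary based representation are assumptions for illustration:

```python
from collections import defaultdict

def set_containment_join(r_sets, s_sets):
    """Inverted-file based set containment join: build an inverted file on
    the right-hand relation S mapping each element to the S-tuples containing
    it, then for each r-set intersect the postings lists of its elements;
    the survivors are exactly the s-sets that contain r."""
    inverted = defaultdict(set)
    for sid, s in s_sets.items():
        for elem in s:
            inverted[elem].add(sid)

    result = []
    for rid, r in r_sets.items():
        if not r:
            # the empty set is contained in every s-set
            result.extend((rid, sid) for sid in s_sets)
            continue
        candidates = None
        for elem in r:
            postings = inverted.get(elem, set())
            candidates = postings if candidates is None else candidates & postings
            if not candidates:
                break
        result.extend((rid, sid) for sid in (candidates or ()))
    return result

R = {1: {"a", "b"}, 2: {"c"}}
S = {10: {"a", "b", "c"}, 20: {"b", "c"}}
print(set_containment_join(R, S))  # pairs (1,10), (2,10), (2,20) in some order
```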
power management is an important problem in battery powered sensor networks as the sensors are required to operate for long time usually several weeks to several months one of the challenges in developing power management protocols for sensor networks is prototyping specifically existing programming platforms for sensor networks eg nesc tinyos use an event driven programming model and hence require the designers to be responsible for stack management buffer management flow control etc therefore the designers simplify prototyping their solutions either by implementing their own discrete event simulators or by modeling them in specialized simulators to enable the designers to prototype power management protocols in target platform eg nesc tinyos in this paper we use prose programming tool for sensor networks prose enables the designers to specify their programs in simple abstract models while hiding low level challenges of sensor networks and programming level challenges as case study in this paper we specify power management protocol with prose automatically generate the corresponding nesc tinyos code and evaluate its performance based on the performance results we expect that prose enables the designers to rapidly prototype quickly deploy and easily evaluate their protocols
this paper describes simple and effective quadratic placement algorithm called rql we show that good quadratic placement followed by local wirelength driven spreading can produce excellent results on large scale industrial asic designs as opposed to the current top performing academic placers rql does not embed linearization technique within the solver instead it only requires simpler pure quadratic objective function in the spirit of experimental results show that rql outperforms all available academic placers on the ispd placement contest benchmarks in particular rql obtains an average wire length improvement of and versus mpl ntuplace kraftwerk aplace and capo respectively in addition rql is three seven and ten times faster than mpl capo and aplace respectively on the ispd placement contest benchmarks on average rql obtains the best scaled wirelength among all available academic placers
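for context, a generic statement of the pure quadratic wirelength objective that placers of this kind minimize; the notation is standard and not taken from the paper:

```latex
% generic quadratic wirelength objective in one dimension
% (the y-direction is handled analogously):
\Phi(x) \;=\; \sum_{(i,j)\in E} w_{ij}\,\bigl(x_i - x_j\bigr)^2
\;=\; \tfrac{1}{2}\,x^{\mathsf T} Q\,x + c^{\mathsf T} x + \text{const},
\qquad
\nabla\Phi(x) = 0 \;\Longleftrightarrow\; Q\,x = -\,c .
```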
real time content based access to live video data requires content analysis applications that are able to process the video data at least as fast as the video data is made available to the application and with an acceptable error rate statements such as this express quality of service qos requirements to the application in order to provide some level of control of the qos provided the video content analysis application must be scalable and resource aware so that requirements of timeliness and accuracy can be met by allocating additional processing resources in this paper we present general architecture of video content analysis applications including model for specifying requirements of timeliness and accuracy the salient features of the architecture include its combination of probabilistic knowledge based media content analysis with qos and distributed resource management to handle qos requirements and its independent scalability at multiple logical levels of distribution we also present experimental results with an algorithm for qos aware selection of configurations of feature extractor and classification algorithms that can be used to balance requirements of timeliness and accuracy against available processing resources experiments with an implementation of real time motion vector based object tracking application demonstrate the scalability of the architecture
we present an efficient algorithm to approximate the swept volume sv of complex polyhedron along given trajectory given the boundary description of the polyhedron and path specified as parametric curve our algorithm enumerates superset of the boundary surfaces of sv it consists of ruled and developable surface primitives and the sv corresponds to the outer boundary of their arrangement we approximate this boundary by using five stage pipeline this includes computing bounded error approximation of each surface primitive computing unsigned distance fields on uniform grid classifying all grid points using fast marching front propagation iso surface reconstruction and topological refinement we also present novel and fast algorithm for computing the signed distance of surface primitives as well as number of techniques based on surface culling fast marching level set methods and rasterization hardware to improve the performance of the overall algorithm we analyze different sources of error in our approximation algorithm and highlight its performance on complex models composed of thousands of polygons in practice it is able to compute bounded error approximation in tens of seconds for models composed of thousands of polygons sweeping along complex trajectory
accurate feature detection and localization is fundamentally important to computer vision and feature locations act as input to many algorithms including camera calibration structure recovery and motion estimation unfortunately feature localizers in common use are typically not projectively invariant even in the idealized case of continuous image this results in feature location estimates that contain bias which can influence the higher level algorithms that make use of them while this behavior has been studied in the case of ellipse centroids and then used in practical calibration algorithm those results do not trivially generalize to the center of mass of radially symmetric intensity distribution this paper introduces the generalized result of feature location bias with respect to perspective distortion and applies it to several specific radially symmetric intensity distributions the impact on calibration is then evaluated finally an initial study is conducted comparing calibration results obtained using center of mass to those obtained with an ellipse detector results demonstrate that feature localization error over range of increasingly large projective distortions can be stabilized at less than tenth of pixel versus errors that can grow to larger than pixel in the uncorrected case
in order to maintain performance per watt in microprocessors there is shift towards the chip level multiprocessing paradigm microprocessor manufacturers are experimenting with tens of cores forecasting the arrival of hundreds of cores per single processor die in the near future with such large scale integration and increasing power densities thermal management continues to be significant design effort to maintain performance and reliability in modern process technologies in this paper we present two mechanisms to perform frequency scaling as part of dynamic frequency and voltage scaling dvfs to assist dynamic thermal management dtm our frequency selection algorithms incorporate the physical interaction of the cores on large scale system onto the emergency intervention mechanisms for temperature reduction of the hotspot while aiming to minimize the performance impact of frequency scaling on the core that is in thermal emergency our results show that our algorithm consistently succeeds in maximizing the operating frequency of the most critical core while successfully relieving the thermal emergency of the core comparison of our two alternative techniques reveals that our physical aware criticality based algorithm results in faster clock frequencies compared to our aggressive scaling algorithm we also show that our technique is extremely fast and is suited for real time thermal management
as means of transmitting not only data but also code encapsulated within functions higher order channels provide an advanced form of task parallelism in parallel computations in the presence of mutable references however they pose safety problem because references may be transmitted to remote threads where they are no longer valid this paper presents an ml like parallel language with type safe higher order channels by type safety we mean that no value written to channel contains references or equivalently that no reference escapes via channel from the thread where it is created the type system uses typing judgment that is capable of deciding whether the value to which term evaluates contains references or not the use of such typing judgment also makes it easy to achieve another desirable feature of channels channel locality that associates every channel with unique thread for serving all values addressed to it our type system permits mutable references in sequential computations and also ensures that mutable references never interfere with parallel computations thus it provides both flexibility in sequential programming and ease of implementing parallel computations
microarchitectures increasingly rely on dynamic optimization to improve performance in ways that are difficult or impossible for ahead of time compilers dynamic optimizers in turn require continuous portable low cost and accurate control flow profiles to inform their decisions but prior approaches have struggled to meet these goals simultaneously this paper presents pep hybrid instrumentation and sampling approach for continuous path and edge profiling that is efficient accurate and portable pep uses subset of ball larus path profiling to identify paths with low overhead and uses sampling to mitigate the expense of storing paths pep further reduces overhead by using profiling to guide instrumentation placement pep improves profile accuracy with modified version of arnold grove sampling the resulting system has average and maximum overhead path profile accuracy and edge profile accuracy on set of java benchmarks
query rewriting using views is technique that allows query to be answered efficiently by using pre computed materialized views it has many applications such as data caching query optimization schema integration etc this issue has been studied extensively for relational databases and as result the technology is maturing for xml data however the work is inadequate recently several frameworks have been proposed for query rewriting using views for xpath queries with the requirement that rewriting must be complete in this paper we study the problem of query rewriting using views for xpath queries without requiring that the rewriting be complete this will increase its applicability since in many cases complete rewritings using views do not exist we give formal definitions for various concepts to formulate the problem and then propose solutions our solutions are built under the framework for query containment we look into the problem from both theoretic perspectives and algorithmic approaches two methods to generate rewritings using views are proposed with different characteristics in terms of generalities and efficiencies the maximality properties of the rewritings generated by these methods are discussed
the combination of the cornea of an eye and camera viewing the eye form catadioptric mirror lens imaging system with very wide field of view we present detailed analysis of the characteristics of this corneal imaging system anatomical studies have shown that the shape of normal cornea without major defects can be approximated with an ellipsoid of fixed eccentricity and size using this shape model we can determine the geometric parameters of the corneal imaging system from the image then an environment map of the scene with large field of view can be computed from the image the environment map represents the illumination of the scene with respect to the eye this use of an eye as natural light probe is advantageous in many relighting scenarios for instance it enables us to insert virtual objects into an image such that they appear consistent with the illumination of the scene the eye is particularly useful probe when relighting faces it allows us to reconstruct the geometry of face by simply waving light source in front of the face finally in the case of an already captured image eyes could be the only direct means for obtaining illumination information we show how illumination computed from eyes can be used to replace face in an image with another one we believe that the eye not only serves as useful tool for relighting but also makes relighting possible in situations where current approaches are hard to use
new tool for shape decomposition is presented it is function defined on the shape domain and computed using linear system of equations it is demonstrated that the level curves of the new function provide hierarchical partitioning of the shape domain into visual parts without requiring any features to be estimated the new tool is an unconventional distance transform where the minimum distance to the union of the shape boundary and an unknown critical curve is computed this curve divides the shape domain into two parts one corresponding to the coarse scale structure and the other one corresponding to the fine scale structure the connection of the new function to variety of morphological concepts skeleton by influence zone aslan skeleton and weighted distance transforms is discussed
knowing which method parameters may be mutated during method’s execution is useful for many software engineering tasks parameter reference is immutable if it cannot be used to modify the state of its referent object during the method’s execution we formally define this notion in core object oriented language having the formal definition enables determining correctness and accuracy of tools approximating this definition and unbiased comparison of analyses and tools that approximate similar definitions we present pidasa tool for classifying parameter reference immutability pidasa combines several lightweight scalable analyses in stages with each stage refining the overall result the resulting analysis is scalable and combines the strengths of its component analyses as one of the component analyses we present novel dynamic mutability analysis and show how its results can be improved by random input generation experimental results on programs of up to kloc show that compared to previous approaches pidasa increases both run time performance and overall accuracy of immutability inference
in this paper we compare the simulated performance of family of multiprocessor architectures based on global shared memory the processors are connected to the memory through caches that snoop one or more shared buses in crossbar arrangement we have simulated number of configurations in order to assess the relative performance of multiple versus wide bus machines with varying amounts of prefetch four programs with widely differing characteristics were run on each configuration the configurations that gave the best all round results were multiple narrow buses with words of prefetch
the design of workflows is complicated task in those cases where the control flow between activities cannot be modeled in advance but simply occurs during enactment time run time we speak of ad hoc processes ad hoc processes allow for the flexibility needed in real life business processes since ad hoc processes are highly dynamic they represent one of the most difficult challenges both technically and conceptually caramba is one of the few process aware collaboration systems allowing for ad hoc processes unlike in classical workflow systems the users are no longer restricted by the system therefore it is interesting to study the actual way people and organizations work in this paper we propose process mining techniques and tools to analyze ad hoc processes we introduce process mining discuss the concept of mining in the context of ad hoc processes and demonstrate concrete application of the concept using caramba process mining tools such as emit and minson and newly developed extraction tool named teamlog
the proliferation of the world wide web has brought information retrieval ir techniques to the forefront of search technology to the average computer user searching now means using ir based systems for finding information on the www or in other document collections ir query evaluation methods and workloads differ significantly from those found in database systems in this paper we focus on three such differences first due to the inherent fuzziness of the natural language used in ir queries and documents an additional degree of flexibility is permitted in evaluating queries second ir query evaluation algorithms tend to have access patterns that cause problems for traditional buffer replacement policies third ir search is often an iterative process in which query is repeatedly refined and resubmitted by the user based on these differences we develop two complementary techniques to improve the efficiency of ir queries buffer aware query evaluation which alters the query evaluation process based on the current contents of buffers and ranking aware buffer replacement which incorporates knowledge of the query processing strategy into replacement decisions in detailed performance study we show that using either of these techniques yields significant performance benefits and that in many cases combining them produces even further improvements
novel junction detector is presented that fits the neighborhood around point to junction model the junction model segments the neighborhood into wedges by determining set of radial edges the radial edges are invariant to affine transforms creating an affine invariant junction detector the radial edges are evaluated based upon the pixels along the edge the angle between the pixel gradient and the vector to the potential junction point forms the initial basis for the measurement an initial set of radial edges is selected based upon identifying local maximums within given arc distance an energy function is applied to the resulting radial segmentation and greedy optimization routine is used to construct the minimal set of radial edges to identify the final junctions second energy function is used that combines the components of the first energy function with the resulting change in standard deviation by separation into radial segments the junctions with the most energy in their local neighborhoods are selected as potential junctions the neighborhoods about the potential junctions are analyzed to determine if they represent single line or multiple non parallel lines if the neighborhood represents multiple non parallel lines the point is classified as junction point the junction detector is tested on several images including both synthetic and real images highlights of radially segmented junction points are displayed for the real images
there are many human activities for which information about the geographical location where they take place is of paramount importance in the last years there has been increasing interest in the combination of computer supported collaborative work cscw and geographical information in this paper we analyze the concepts and elements of cscw that are most relevant to geocollaboration we define model facilitating the design of shared artifacts capable to build shared awareness of the geographical context the paper also describes two case studies using the model to design geocollaborative applications
many studies have shown the limits of support confidence framework used in apriori like algorithms to mine association rules there are lot of efficient implementations based on the antimonotony property of the support but candidate set generation is still costly in addition many rules are uninteresting or redundant and one can miss interesting rules like nuggets one solution is to get rid of frequent itemset mining and to focus as soon as possible on interesting rules for that purpose algorithmic properties were first studied especially for the confidence they allow all confidence rules to be found without preliminary support pruning more recently in the case of class association rules the concept of optimal rules gave pruning strategy compatible with more measures however all these properties have been demonstrated for limited number of interestingness measures we present new formal framework which allows us to make the link between analytic and algorithmic properties of the measures we apply this framework to optimal rules and we demonstrate necessary and sufficient condition of existence for this pruning strategy which can be applied to any measure
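since all confidence is the running example of a measure that permits support free pruning, a tiny helper showing how it is computed, assuming transactions are given as sets of items; the data and function names are illustrative:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def all_confidence(itemset, transactions):
    """all-confidence(X) = supp(X) / max_{i in X} supp({i}); its
    antimonotonicity is what makes support-free pruning possible."""
    return support(itemset, transactions) / max(
        support({item}, transactions) for item in itemset)

T = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
print(all_confidence({"a", "b"}, T))  # 0.5 / 0.75
```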
we present method for end to end out of core simplification and view dependent visualization of large surfaces the method consists of three phases memory insensitive simplification memory insensitive construction of multiresolution hierarchy and run time output sensitive view dependent rendering and navigation of the mesh the first two off line phases are performed entirely on disk and use only small constant amount of memory whereas the run time system pages in only the rendered parts of the mesh in cache coherent manner as result we are able to process and visualize arbitrarily large meshes given sufficient amount of disk space constant multiple of the size of the input mesh similar to recent work on out of core simplification our memory insensitive method uses vertex clustering on rectilinear octree grid to coarsen and create hierarchy for the mesh and quadric error metric to choose vertex positions at all levels of resolution we show how the quadric information can be used to concisely represent vertex position surface normal error and curvature information for anisotropic view dependent coarsening and silhouette preservation the run time component of our system uses asynchronous rendering and view dependent refinement driven by screen space error and visibility the system exploits frame to frame coherence and has been designed to allow preemptive refinement at the granularity of individual vertices to support refinement on time budget our results indicate significant improvement in processing speed over previous methods for out of core multiresolution surface construction meanwhile all phases of the method are disk and memory efficient and are fairly straightforward to implement
as internet usage continues to expand rapidly careful attention needs to be paid to the design of internet servers for achieving high performance and end user satisfaction currently the memory system continues to remain significant performance bottleneck for internet servers employing multi ghz processors in this paper our aim is two fold to characterize the cache memory performance of web server workloads and to propose and evaluate cache design alternatives for future web servers we chose specweb as the representative web server workload and our entire characterization and evaluation methodology is based on our casper simulation framework we begin by exploring the processor cache design space for single and dual processor servers based on our observations we then evaluate other cache hierarchy alternatives such as chipset caches coherence filters and decompressed page stores we show the sensitivity of these components to basic organization parameters such as cache size line size and degree of associativity we also present the performance implications of routing memory requests initiated by devices through these caches based on detailed simulation data and its implications on system level performance this paper shows that chipset caches have significant potential for improving future web server performance
the preservation of digital artifacts represents an unanswered challenge for the modern information society xml and its query languages provide an effective environment to address this challenge because of their ability to support temporal information and queries and make it easy to publish database history to the web in this paper we focus on the problem of preserving publishing and querying efficiently the history of relational database past research on temporal databases revealed the difficulty of achieving satisfactory solutions using flat relational tables and sql here we show that the problem can be solved using xml to support temporally grouped representations of the database history and xquery to express powerful temporal queries on such representations furthermore the approach is quite general and it can be used to preserve and query the history of multi version xml documents then we turn to the problem of efficient implementation and we investigate alternative approaches including i xml dbms ii shredding xml into relational tables and using sql xml on these tables iii sql nested tables and iv or dbms extended with xml support these experiments suggest that combination of temporal xml views and physical relational tables provides the best approach for managing temporal database information
the nature of distributed systems is constantly and steadily changing as the hardware and software landscape evolves porting applications and adapting existing middleware systems to ever changing computational platforms has become increasingly complex and expensive therefore the design of applications as well as the design of next generation middleware systems must follow set of guiding principles in order to insure long term survivability without costly re engineering from our practical experience the key determinants to success in this endeavor are adherence to the following principles design for change provide for storage subsystem coordination employ workload partitioning and load balancing techniques employ caching schedule the workload and understand the workload in order to support these principles we have collected extensive experimental results comparing three middleware systems targeted at data and compute intensive applications implemented by our research group during the course of the last decade on single data and compute intensive application the main contribution of this work is the analysis of level playing field where we discuss and quantify how adherence to these guiding principles impacts overall system throughput and response time
optimization of complex xqueries combining many xpath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for xquery additionally the state of the art of even relational query optimization still struggles to cope with cost model estimation errors that increase with plan size as well as with the effect of correlated joins and selections in this research we propose to radically depart from the traditional path of separating the query compilation and query execution phases by having the optimizer execute materialize partial results and use sampling based estimation techniques to observe the characteristics of intermediates the proposed technique takes as input join graph where the edges are either equi joins or xpath steps and the execution environment provides value and structural join algorithms as well as structural and value based indices while run time optimization with sampling removes many of the vulnerabilities of classical optimizers it brings its own challenges with respect to keeping resource usage under control both with respect to the materialization of intermediates as well as the cost of plan exploration using sampling our approach deals with these issues by limiting the run time search space to so called zero investment algorithms for which sampling can be guaranteed to be strictly linear in sample size all operators and xml value indices used by rox for sampling have the zero investment property we perform extensive experimental evaluation on large xml datasets that shows that our run time query optimizer finds good query plans in robust fashion and has limited run time overhead
with the delta between processor clock frequency and memory latency ever increasing and with the standard locality improving transformations maturing compilers increasingly seek to modify an application’s data layout to improve spatial and temporal locality and to reduce cache miss and page fault penalties in this paper we describe practical implementation of the data layout optimizations structure splitting structure peeling structure field reordering and dead field removal both for profile and non profile based compilations we demonstrate significant performance gains but find that automatic transformations fail for relatively high number of record types because of legality violations or profitability constraints additionally we find class of desirable transformations for which the framework cannot provide satisfying results to address this issue we complement the automatic transformations with an advisory tool we reuse the compiler analysis done for automatic transformation and correlate its results with performance data collected during runtime for structure fields such as data cache misses and latencies we then use the compiler as performance analysis and reporting tool and provide insight into how to lay out structure types more efficiently
this paper presents novel approach to image segmentation based on hypergraph cut techniques natural images contain more components edge homogeneous region noise so to facilitate the natural image analysis we introduce an image neighborhood hypergraph representation inh this representation extracts all features and their consistencies in the image data and its mode of use is close to the perceptual grouping then we formulate an image segmentation problem as hypergraph partitioning problem and we use the recent way hypergraph techniques to find the partitions of the image into regions of coherent brightness color experimental results of image segmentation on wide range of images from berkeley database show that the proposed method provides significant performance improvement compared with the state of the art graph partitioning strategy based on normalized cut ncut criteria
we explore computational approach to proving intractability of certain counting problems more specifically we study the complexity of holant of regular graphs these problems include concrete problems such as counting the number of vertex covers or independent sets for regular graphs the high level principle of our approach is algebraic which provides sufficient conditions for interpolation to succeed another algebraic component is holographic reductions we then analyze in detail polynomial maps on induced by some combinatorial constructions these maps define sufficiently complicated dynamics of that we can only analyze them computationally we use both numerical computation as intuitive guidance and symbolic computation as proof theoretic verification to derive that certain collection of combinatorial constructions in myriad combinations fulfills the algebraic requirements of proving hardness the final result is dichotomy theorem for class of counting problems
pointer aliasing analysis is used to determine if two object names containing dereferences and or field selectors eg may refer to the same location during execution such information is necessary for applications such as data flow based testers program understanding tools and debuggers but is expensive to calculate with acceptable precision incremental algorithms update data flow information after program change rather than recomputing it from scratch under the assumption that the change impact will be limited two versions of practical incremental pointer aliasing algorithm have been developed based on landi ryder flow and context sensitive alias analysis empirical results attest to the time savings over exhaustive analysis six fold speedup on average and the precision of the approximate solution obtained on average same solution as exhaustive algorithm for of the tests
peer to peer storage systems assume that their users consume resources in proportion to their contribution unfortunately users are unlikely to do this without some enforcement mechanism prior solutions to this problem require centralized infrastructure constraints on data placement or ongoing administrative costs all of these run counter to the design philosophy of peer to peer systems samsara enforces fairness in peer to peer storage systems without requiring trusted third parties symmetric storage relationships monetary payment or certified identities each peer that requests storage of another must agree to hold claim in return placeholder that accounts for available space after an exchange each partner checks the other to ensure faithfulness samsara punishes unresponsive nodes probabilistically because objects are replicated nodes with transient failures are unlikely to suffer data loss unlike those that are dishonest or chronically unavailable claim storage overhead can be reduced when necessary by forwarding among chains of nodes and eliminated when cycles are created forwarding chains increase the risk of exposure to failure but such risk is modest under reasonable assumptions of utilization and simultaneous persistent failure
we present mathematical framework for enforcing energy conservation in bidirectional reflectance distribution function brdf by specifying halfway vector distributions in simple two dimensional domains energy conserving brdfs can produce plausible rendered images with accurate reflectance behavior especially near grazing angles using our framework we create an empirical brdf that allows easy specification of diffuse specular and retroreflective materials we also present second brdf model that is useful for data fitting although it does not preserve energy it uses the same halfway vector domain as the first model we show that this data fitting brdf can be used to match measured data extremely well using only small set of parameters we believe that this is an improvement over table based lookups and factored versions of brdf data
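the constraint such a framework enforces is the usual energy conservation condition on a brdf, written here in generic notation not taken from the paper:

```latex
% energy conservation for a BRDF f_r: for every incident direction \omega_i
% the directional-hemispherical reflectance may not exceed one:
\forall\,\omega_i:\qquad
\int_{\Omega} f_r(\omega_i,\omega_o)\,\cos\theta_o\;\mathrm{d}\omega_o \;\le\; 1 .
```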
rossnet brings together the four major areas of networking research network modeling simulation measurement and protocol design rossnet is tool for computing large scale design of experiments through components such as discrete event simulation engine default and extensible model designs and state of the art xml interface rossnet reads in predefined descriptions of network topologies and traffic scenarios which allows for in depth analysis and insight into emerging feature interactions cascading failures and protocol stability in variety of situations developers will be able to design and implement their own protocol designs network topologies and modeling scenarios as well as implement existing platforms within the rossnet platform also using rossnet designers are able to create experiments with varying levels of granularity allowing for the highest degree of scalability
the widespread interest in program slicing within the source code analysis and manipulation community has led to the introduction of large number of different forms of slicing each preserves some aspect of program’s behaviour and simplifies the program to focus exclusively upon this behaviour in order to understand the similarities and differences between forms of slicing formal mechanism is required this paper further develops formal framework for comparing forms of slicing using theory of program projection this framework is used to reveal the ordering relationship between various static dynamic simultaneous and conditioned forms of slicing
property testing algorithms are ultra efficient algorithms that decide whether given object eg graph has certain property eg bipartiteness or is significantly different from any object that has the property to this end property testing algorithms are given the ability to perform local queries to the input though the decision they need to make usually concerns properties with global nature in the last two decades property testing algorithms have been designed for many types of objects and properties amongst them graph properties algebraic properties geometric properties and more in this monograph we survey results in property testing where our emphasis is on common analysis and algorithmic techniques among the techniques surveyed are the following the self correcting approach which was mainly applied in the study of property testing of algebraic properties the enforce and test approach which was applied quite extensively in the analysis of algorithms for testing graph properties in the dense graphs model as well as in other contexts szemerédi’s regularity lemma which plays very important role in the analysis of algorithms for testing graph properties in the dense graphs model the approach of testing by implicit learning which implies efficient testability of membership in many functions classes and algorithmic techniques for testing properties of sparse graphs which include local search and random walks
this paper presents swing closed loop network responsive traffic generator that accurately captures the packet interactions of range of applications using simple structural model starting from observed traffic at single point in the network swing automatically extracts distributions for user application and network behavior it then generates live traffic corresponding to the underlying models in network emulation environment running commodity network protocol stacks we find that the generated traces are statistically similar to the original traces further to the best of our knowledge we are the first to reproduce burstiness in traffic across range of timescales using model applicable to variety of network settings an initial sensitivity analysis reveals the importance of capturing and recreating user application and network characteristics to accurately reproduce such burstiness finally we explore swing’s ability to vary user characteristics application properties and wide area network conditions to project traffic characteristics into alternate scenarios
in this paper we investigate the extent to which knowledge compilation can be used to circumvent the complexity of skeptical inference from stratified belief base sbb we first analyze the compilability of skeptical inference from an sbb under various requirements concerning both the selection policy under consideration the possibility to make the stratification vary at the on line query answering stage and the expected complexity of inference from the compiled form not surprisingly the results are mainly negative however since they concern the worst case situation only they do not prevent compilation based approach from being practically useful for some families of instances while many approaches to compile an sbb can be designed we are primarily interested in those which take advantage of existing knowledge compilation techniques for classical inference specifically we present general framework for compiling sbbs into so called normal sbbs where is any tractable class for clausal entailment which is the target class of compilation function another major advantage of the proposed approach lies in the flexibility of the normal belief bases obtained which means that changing the stratification does not require to re compile the sbb for several families of compiled sbbs and several selection policies the complexity of skeptical inference is identified some tractable restrictions are exhibited for each policy finally some empirical results are presented
crucial role in the microsoft net framework common language runtime clr security model is played by type safety of the common intermediate language cil in this paper we formally prove type safety of large subset of cil to do so we begin by specifying the static and dynamic semantics of cil by providing an abstract interpreter for cil programs we then formalize the bytecode verification algorithm whose job it is to compute well typing for given method we then prove type safety of well typed methods ie the execution according to the semantics model of legal and well typed methods does not lead to any run time type violations finally to prove cil’s type safety we show that the verification algorithm is sound ie the typings it produces are well typings and complete ie if well typing exists then the algorithm computes one
exhaustive model checking search techniques are ineffective for error discovery in large and complex multi threaded software systems distance estimate heuristics guide the concrete execution of the program toward possible error location the estimate is lower bound computed on statically generated abstract model of the program that ignores all data values and only considers control flow in this paper we describe new distance estimate heuristic that efficiently computes tighter lower bound in programs with polymorphism when compared to the state of the art distance heuristic we statically generate conservative distance estimates and refine the estimates when the targets of dynamic method invocations are resolved in our empirical analysis the state of the art approach is computationally infeasible for large programs with polymorphism while our new distance heuristic can quickly detect the errors
we present new multiphase method for efficiently simplifying polygonal surface models of arbitrary size it operates by combining an initial out of core uniform clustering phase with subsequent in core iterative edge contraction phase these two phases are both driven by quadric error metrics and quadrics are used to pass information about the original surface between phases the result is method that produces approximations of quality comparable to quadric based iterative edge contraction but at fraction of the cost in terms of running time and memory consumption
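a minimal sketch of the quadric error metric that drives both phases, assuming unit plane normals and generic helper names; the multiphase clustering and edge contraction machinery itself is not reproduced:

```python
import numpy as np

def plane_quadric(a, b, c, d):
    """Quadric K_p = p p^T for the plane ax + by + cz + d = 0 (unit normal)."""
    p = np.array([a, b, c, d], dtype=float)
    return np.outer(p, p)

def vertex_quadric(planes):
    """A vertex's quadric is the sum of the quadrics of its incident planes."""
    return sum(plane_quadric(*pl) for pl in planes)

def quadric_error(Q, v):
    """Error of placing the vertex at position v = (x, y, z): v'^T Q v',
    i.e. the sum of squared distances to the accumulated planes."""
    vh = np.append(np.asarray(v, dtype=float), 1.0)
    return float(vh @ Q @ vh)

# Example: a vertex shared by the planes z = 0 and x = 0.
Q = vertex_quadric([(0, 0, 1, 0), (1, 0, 0, 0)])
print(quadric_error(Q, (0.0, 2.0, 0.0)))  # 0.0: lies on both planes
print(quadric_error(Q, (0.5, 0.0, 0.5)))  # 0.5: squared distances sum
```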
we introduce new visual search interface for search engines the interface is user friendly and informative graphical front end for organizing and presenting search results in the form of topic groups such semantics oriented search result presentation is in contrast with conventional search interfaces which present search results according to the physical structures of the information given user query our interface first retrieves relevant online materials via third party search engine and then we analyze the semantics of search results to detect latent topics in the result set once the topics are detected we map the search result pages into topic clusters according to the topic clustering result we divide the available screen space for our visual interface into multiple topic displaying regions one for each topic for each topic’s displaying region we summarize the information contained in the search results under the corresponding topic so that only key messages will be displayed with this new visual search interface users are conveyed the key information in the search results expediently with the key information users can navigate to the final desired results with less effort and time than conventional searching supplementary materials for this paper are available at http wwwcshkuhk songhua visualsearch
this paper explores interposed request routing in slice new storage system architecture for high speed networks incorporating network attached block storage slice interposes request switching filter called μproxy along each client’s network path to the storage service eg in network adapter or switch the μproxy intercepts request traffic and distributes it across server ensemble we propose request routing schemes for and file service traffic and explore their effect on service structure the slice prototype uses packet filter μproxy to virtualize the standard network file system nfs protocol presenting to nfs clients unified shared file volume with scalable bandwidth and capacity experimental results from the industry standard specsfs workload demonstrate that the architecture enables construction of powerful network attached storage services by aggregating cost effective components on switched gigabit ethernet lan
performance and power are the first order design metrics for network on chips nocs that have become the de facto standard in providing scalable communication backbones for multicores cmps however nocs can be plagued by higher power consumption and degraded throughput if the network and router are not designed properly towards this end this paper proposes novel router architecture where we tune the frequency of router in response to network load to manage both performance and power we propose three dynamic frequency tuning techniques freqboost freqthrtl and freqtune targeted at congestion and power management in nocs as enablers for these techniques we exploit dynamic voltage and frequency scaling dvfs and the imbalance in generic router pipeline through time stealing experiments using synthetic workloads on wormhole switched mesh interconnect show that freqboost is better choice for reducing average latency maximum while freqthrtl provides the maximum benefits in terms of power saving and energy delay product edp the freqtune scheme is better candidate for optimizing both performance and power achieving on an average reduction in latency savings in power up to at high load and savings up to at high load in edp with application benchmarks we observe ipc improvement up to using our design the performance and power benefits also scale for larger nocs
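as a heavily simplified illustration of load driven frequency tuning, the toy controller below steps a router up or down a discrete frequency ladder based on buffer occupancy; the thresholds, frequency levels and function name are invented for illustration and do not correspond to freqboost, freqthrtl or freqtune:

```python
def tune_frequency(buffer_occupancy, capacity, freq_levels,
                   current_idx, high=0.75, low=0.25):
    """Toy congestion-driven controller: step the router up one frequency
    level when its buffers are mostly full and down one level when they are
    mostly empty. The paper's DVFS and time-stealing enablers are not
    reproduced here; this only shows the load-to-frequency feedback idea."""
    utilization = buffer_occupancy / capacity
    if utilization > high and current_idx < len(freq_levels) - 1:
        current_idx += 1      # congested: boost to relieve backpressure
    elif utilization < low and current_idx > 0:
        current_idx -= 1      # lightly loaded: throttle to save power
    return freq_levels[current_idx], current_idx

levels = [1.0, 1.5, 2.0, 2.5]  # GHz, illustrative
freq, idx = tune_frequency(buffer_occupancy=14, capacity=16,
                           freq_levels=levels, current_idx=1)
print(freq)  # 2.0
```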
the signature quadratic form distance is an adaptive similarity measure for flexible content based feature representations of multimedia data in this paper we present deep survey of the mathematical foundation of this similarity measure which encompasses the classic quadratic form distance defined only for the comparison between two feature histograms of the same length and structure moreover we give the benefits of the signature quadratic form distance and experimental evaluation on numerous real world databases
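the classic quadratic form distance that the signature variant generalizes can be written compactly, with x and y feature histograms and a bin similarity matrix relating their entries; the notation is generic rather than taken from the paper:

```latex
% classic quadratic form distance between equal-length feature histograms
% x and y under a bin-similarity matrix A:
\mathrm{QFD}_A(x,y) \;=\; \sqrt{(x-y)^{\mathsf T} A\,(x-y)}
% the signature variant applies the same quadratic form to the concatenated
% signature weights (w_x \,\|\, -w_y), with A built from a similarity
% function on the signatures' representatives.
```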
frequent pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets sequences and graphs however the bottleneck of frequent pattern mining is not at the efficiency but at the interpretability due to the huge number of patterns generated by the mining process in this paper we examine how to summarize collection of itemset patterns using only representatives small number of patterns that user can handle easily the representatives should not only cover most of the frequent patterns but also approximate their supports generative model is built to extract and profile these representatives under which the supports of the patterns can be easily recovered without consulting the original dataset based on the restoration error we propose quality measure function to determine the optimal value of parameter polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement empirical studies indicate that we can obtain compact summarization in real datasets
this paper presents vision framework which enables feature oriented appearance based navigation in large outdoor environments containing other moving objects the framework is based on hybrid topological geometrical environment representation constructed from learning sequence acquired during robot motion under human control at the higher topological layer the representation contains graph of key images such that incident nodes share many natural landmarks the lower geometrical layer makes it possible to predict the projections of the mapped landmarks onto the current image in order to be able to start or resume their tracking on the fly the desired navigation functionality is achieved without requiring global geometrical consistency of the underlying environment representation the framework has been experimentally validated in demanding and cluttered outdoor environments under different imaging conditions the experiments have been performed on many long sequences acquired from moving cars as well as in large scale real time navigation experiments relying exclusively on single perspective vision sensor the obtained results confirm the viability of the proposed hybrid approach and indicate interesting directions for future work
power dissipation and thermal issues are increasingly significant in modern processors as result it is crucial that power performance tradeoffs be made more visible to chip architects and even compiler writers in addition to circuit designers most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete in addition to being available only late in the design process such tools are often quite slow which compounds the difficulty of running them for large space of design possibilities this paper presents wattch framework for analyzing and optimizing microprocessor power dissipation at the architecture level wattch is or more faster than existing layout level power tools and yet maintains accuracy within of their estimates as verified using industry tools on leading edge designs this paper presents several validations of wattch’s accuracy in addition we present three examples that demonstrate how architects or compiler writers might use wattch to evaluate power consumption in their design process we see wattch as complement to existing lower level tools it allows architects to explore and cull the design space early on using faster higher level tools it also opens up the field of power efficient computing to wider range of researchers by providing power evaluation methodology within the portable and familiar simplescalar framework
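architecture level power estimation of this kind ultimately scales an activity factor against per unit capacitance; a minimal sketch of the standard dynamic power relation p = a c v^2 f with invented unit capacitances, not the wattch models themselves

    # generic activity-factor dynamic power model, P = a * C * V^2 * f (values are illustrative only)
    UNIT_CAPACITANCE_NF = {"icache": 1.8, "dcache": 2.1, "alu": 0.6, "regfile": 0.9}  # assumed, nanofarads

    def dynamic_power(activity, vdd=1.0, freq_ghz=2.0):
        """sum per-unit dynamic power given per-cycle activity factors in [0, 1]."""
        total = 0.0
        for unit, a in activity.items():
            c = UNIT_CAPACITANCE_NF[unit] * 1e-9          # farads
            total += a * c * vdd ** 2 * freq_ghz * 1e9    # watts
        return total

    print(f"{dynamic_power({'icache': 0.9, 'dcache': 0.4, 'alu': 0.7, 'regfile': 0.8}):.2f} W")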
in this paper we present compass an xml based agent model for supporting user in his web activities compass is the result of our attempt of synthesizing in unique context important guidelines currently characterizing the research in various computer science sectors indeed it constructs and handles rather rich even if light user profile this latter is exploited for supporting the user in an efficient search of information of his interest in this way it behaves as content based recommender system moreover it is particularly suited for constructing multi agent systems and therefore for implementing collaborative filtering recommendation techniques in addition since it widely uses xml technology it is particularly light and capable of operating on various hardware and software platforms the adoption of xml also facilitates the information exchange among compass agents and consequently makes the management and the exploitation of compass based multi agent systems easier
this paper brings up new concern regarding efficient re keying of large groups with dynamic membership minimizing the overall time it takes for the key server and the group members to process the re keying message specifically we concentrate on re keying algorithms based on the logical key hierarchy lkh and minimize the longest sequence of encryptions and decryptions that need to be done in re keying operation we first prove lower bound on the time required to perform re keying operation in this model then we provide an optimal schedule of re keying messages matching the above lower bound in particular we show that the optimal schedule can be found only when the arity of the lkh key graph is chosen according to the available communication bandwidth and the users processing power our results show that key trees of arity commonly assumed to be optimal are not optimal when used in high bandwidth networks or networks of devices with low computational power like sensor networks
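the lkh cost structure behind this scheduling problem can be sketched with a toy model that counts the encryptions triggered by a single leave on a full d-ary key tree; the arity trade off shows up directly in the count, but the schedule itself and the bandwidth or processing constraints studied in the paper are not modeled here

    # toy logical key hierarchy (lkh) cost model: encryptions needed to re-key after one member leaves
    def tree_depth(num_members, arity):
        depth, capacity = 0, 1
        while capacity < num_members:
            capacity *= arity
            depth += 1
        return depth

    def rekey_cost(num_members, arity):
        """a leave refreshes one key per level on the leaf-to-root path; the leaf-level key is
        encrypted for the remaining arity-1 siblings, every higher key for all arity children
        (one of which is itself a freshly refreshed key)."""
        depth = tree_depth(num_members, arity)
        return (arity - 1) + (depth - 1) * arity

    for d in (2, 4, 8):
        print(f"arity {d}: ~{rekey_cost(1024, d)} encryptions per leave for 1024 members")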
there are major trends to advance the functionality of search engines to more expressive semantic level this is enabled by the advent of knowledge sharing communities such as wikipedia and the progress in automatically extracting entities and relationships from semistructured as well as natural language web sources recent endeavors of this kind include dbpedia entitycube knowitall readtheweb and our own yago naga project and others the goal is to automatically construct and maintain comprehensive knowledge base of facts about named entities their semantic classes and their mutual relations as well as temporal contexts with high precision and high recall this tutorial discusses state of the art methods research opportunities and open challenges along this avenue of knowledge harvesting
we address performance maximization of independent task sets under energy constraint on chip multi processor cmp architectures that support multiple voltage frequency operating states for each core we prove that the problem is strongly np hard we propose polynomial time approximation algorithms for homogeneous and heterogeneous cmps to the best of our knowledge our techniques offer the tightest bounds for energy constrained design on cmp architectures experimental results demonstrate that our techniques are effective and efficient under various workloads on several cmp architectures
space constrained optimization problems arise in variety of applications ranging from databases to ubiquitous computing typically these problems involve selecting set of items of interest subject to space constraint we show that in many important applications one faces variants of this basic problem in which the individual items are sets themselves and each set is associated with benefit value since there are no known approximation algorithms for these problems we explore the use of greedy and randomized techniques we present detailed performance and theoretical evaluation of the algorithms highlighting the efficiency of the proposed solutions
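a minimal greedy sketch for selecting benefit weighted sets under a space budget, using the familiar benefit per unit of incremental space heuristic and accounting for overlap between sets; a generic baseline in the spirit of the greedy techniques explored, not necessarily the paper's exact algorithms

    # greedy benefit-density heuristic for picking sets of items under a space budget (illustrative)
    def greedy_select(candidates, budget):
        """candidates: list of (name, items, benefit); items already stored cost no extra space."""
        chosen, stored = [], set()
        remaining = list(candidates)
        while remaining:
            def density(c):
                extra = len(set(c[1]) - stored)
                return c[2] / extra if extra else float("inf")
            best = max(remaining, key=density)
            extra_space = len(set(best[1]) - stored)
            if extra_space > budget:
                remaining.remove(best)     # does not fit, discard and keep trying the rest
                continue
            chosen.append(best[0])
            stored |= set(best[1])
            budget -= extra_space
            remaining.remove(best)
        return chosen

    print(greedy_select([("s1", {1, 2}, 5), ("s2", {2, 3, 4}, 6), ("s3", {9}, 1)], budget=4))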
type inclusion test determines whether one type is subtype of another efficient type testing techniques exist for single subtyping but not for languages with multiple subtyping to date the fast constant time technique relies on binary matrix encoding of the subtype relation with quadratic space requirements in this paper we present three new encodings of the subtype relation the packed encoding the bit packed encoding and the compact encoding these encodings have different characteristics the bit packed encoding delivers the best compression rates on average for real life programs the packed encoding performs type inclusion tests in only machine instructions we present fast algorithm for computing these encodings which runs in less than milliseconds for pe and bpe and milliseconds for ce on an alpha processor finally we compare our results with other constant time type inclusion tests on suite of large benchmark hierarchies
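a baseline for such constant time tests is the ancestor set (binary matrix) view: precompute for each type the set of all its supertypes, after which a subtype check is a single membership lookup; a minimal sketch, not the packed, bit packed or compact encodings of the paper

    # naive binary-matrix style type inclusion test for multiple subtyping (baseline sketch)
    def build_ancestor_sets(direct_supertypes):
        """direct_supertypes: {type: [immediate supertypes]}; returns the reflexive-transitive closure."""
        closure = {}
        def ancestors(t):
            if t not in closure:
                closure[t] = {t}
                for s in direct_supertypes.get(t, []):
                    closure[t] |= ancestors(s)
            return closure[t]
        for t in direct_supertypes:
            ancestors(t)
        return closure

    def is_subtype(closure, sub, sup):
        """constant-time membership test once the per-type ancestor sets are built."""
        return sup in closure.get(sub, {sub})

    hierarchy = {"object": [], "readable": ["object"], "writable": ["object"], "file": ["readable", "writable"]}
    closure = build_ancestor_sets(hierarchy)
    print(is_subtype(closure, "file", "readable"), is_subtype(closure, "readable", "writable"))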
the success of pp file sharing network highly depends on the scalability and versatility of its search mechanism two particularly desirable search features are scope ability to find infrequent items and support for partial match queries queries that contain typos or include subset of keywords while centralized index architectures such as napster can support both these features existing decentralized architectures seem to support at most one prevailing unstructured pp protocols such as gnutella and fasttrack deploy blind search mechanism where the set of peers probed is unrelated to the query thus they support partial match queries but have limited scope on the other extreme the recently proposed distributed hash tables dhts such as can and chord couple index location with the item’s hash value and thus have good scope but can not effectively support partial match queries another hurdle to dhts deployment is their tight control of the overlay structure and the information part of the index each peer maintains which makes them more sensitive to failures and frequent joins and disconnects we develop new class of decentralized pp architectures our design is based on unstructured architectures such as gnutella and fasttrack and retains many of their appealing properties including support for partial match queries and relative resilience to peer failures yet we obtain orders of magnitude improvement in the efficiency of locating rare items our approach exploits associations inherent in human selections to steer the search process to peers that are more likely to have an answer to the query we demonstrate the potential of associative search using models analysis and simulations
we present clustering technique addressing redundancy for bounded distance clusters which means being able to determine the minimum number of cluster heads per node and the maximum distance from nodes to their cluster heads this problem is similar to computing dominating set ds of the network ds is defined as the problem of selecting minimum cardinality vertex set of the network such that every vertex not in is at distance smaller than or equal to from at least vertices in in mobile ad hoc networks manets clusters should be computed distributively because the topology may change frequently we present the first centralized and distributed solutions to the ds problem for arbitrary topologies the centralized algorithm computes kln approximation where is the largest cardinality among all hop neighborhoods in the network the distributed approach is extended for clustering applications while the centralized is used as lower bound for comparison purposes extensive simulations are used to compare the distributed solution with the centralized one as case study we propose novel multi core multicast protocol that applies the distributed solution for the election of cores the new protocol is compared against puma one of the best performing multicast protocols for manets simulation results show that the new protocol outperforms puma on the context of static networks
in order to provide certified security services we must provide indicators that can measure the level of assurance that complex business process can offer unfortunately the formulation of security indicators is not amenable to efficient algorithms able to evaluate the level of assurance of complex process from its components in this paper we show an algorithm based on fd graphs variant of directed hypergraphs that can be used to compute in polynomial time i the overall assurance indicator of complex business process from its components for arbitrary monotone composition functions and ii the subpart of the business process that is responsible for such assurance indicator ie the best security alternative
write buffering is one of many successful mechanisms that improves the performance and scalability of multiprocessors however it leads to more complex memory system behavior which cannot be described using intuitive consistency models such as sequential consistency it is crucial to provide programmers with specification of the exact behavior of such complex memories this article presents uniform framework for describing systems at different levels of abstraction and proving their equivalence the framework is used to derive and prove correct simple specifications in terms of program level instructions of the sparc total store order and partial store order memories the framework is also used to examine the sparc relaxed memory order we show that it is not memory consistency model that corresponds to any implementation on multiprocessor that uses write buffers even though we suspect that the sparc version specification of relaxed memory order was intended to capture general write buffer architecture the same technique is used to show that coherence does not correspond to write buffer architecture corollary which follows from the relationship between coherence and alpha is that any implementation of alpha consistency using write buffers cannot produce all possible alpha computations that is there are some computations that satisfy the alpha specification but cannot occur in the given write buffer implementation
restricted by wireless communication technology and mobile computing environment it is difficult to improve the efficiency of the programs that reside on the mobile end to solve this problem the cache mechanism is major and effective method in this paper we propose an application oriented semantic cache model it establishes semantic associated rules base according to the knowledge of application domains makes use of the semantic locality for data pre fetching and adopts two level lru algorithm for cache replacement several experiments demonstrate that the semantic driven cache model can achieve higher hit ratio than traditional cache models
materialized views rdbms silver bullet demonstrate their efficacy in many applications especially as data warehousing decision support system tool the key to exploiting materialized views efficiently is view selection though studied for over thirty years in rdbms the selection is hard to make in the context of xml databases where both the semi structured data and the expressiveness of xml query languages add challenges to the view selection problem we start our discussion on producing minimal xml views in terms of size as candidates for given workload query set to facilitate intuitionistic view selection we present view graph called vcube to structurally maintain all generated views by basing our selection on vcube for materialization we propose two view selection strategies targeting at space optimized and space time tradeoff respectively we built our implementation on top of berkeley db xml demonstrating that significant performance improvement could be obtained using our proposed approaches
peer to peer overlay networks have proven to be good support for storing and retrieving data in fully decentralized way sound approach is to structure them in such way that they reflect the structure of the application peers represent objects of the application so that neighbours in the peer to peer network are objects having similar characteristics from the application’s point of view such structured peer to peer overlay networks provide natural support for range queries while some complex structures such as voronoï tessellation where each peer is associated to cell in the space are clearly relevant to structure the objects the associated cost to compute and maintain these structures is usually extremely high for dimensions larger than we argue that an approximation of complex structure is enough to provide native support of range queries this stems from the fact that neighbours are important while the exact space partitioning associated to given peer is not as crucial in this paper we present the design analysis and evaluation of raynet loosely structured voronoï based overlay network raynet organizes peers in an approximation of voronoï tessellation in fully decentralized way it relies on monte carlo algorithm to estimate the size of cell and on an epidemic protocol to discover neighbours in order to ensure efficient polylogarithmic routing raynet is inspired from the kleinberg’s small world model where each peer gets connected to close neighbours its approximate voronoï neighbours in raynet and shortcuts long range neighbours implemented using an existing kleinberg like peer sampling
we design an adaptive admission control mechanism network early warning system news to protect servers and networks from flash crowds and maintain high performance for end users news detects flash crowds from performance degradation in responses and mitigates flash crowds by admitting incoming requests adaptively we evaluate news performance with both simulations and testbed experiments we first investigate network limited scenario in simulations we find that news detects flash crowds within seconds by discarding percent of incoming requests news protects the target server and networks from overloading reducing the response packet drop rate from percent to percent for admitted requests news increases their response rate by two times this performance is similar to the best static rate limiter deployed in the same scenario we also investigate the impact of detection intervals on news performance showing it affects both detection delay and false alarm rate we further consider server memory limited scenario in testbed experiments confirming that news is also effective in this case we also examine the runtime cost of news traffic monitoring in practice and find that it consumes little cpu time and relatively small memory finally we show news effectively protects bystander traffic from flash crowds
users enter queries that are short as well as long the aim of this work is to evaluate techniques that can enable information retrieval ir systems to automatically adapt to perform better on such queries by adaptation we refer to modifications to the queries via user interaction and detecting that the original query is not good candidate for modification we show that the former has the potential to improve mean average precision map of long and short queries by and respectively and that simple user interaction can help towards this goal we observed that after inspecting the options presented to them users frequently did not select any we present techniques in this paper to determine beforehand the utility of user interaction to avoid this waste of time and effort we show that our techniques can provide ir systems with the ability to detect and avoid interaction for unpromising queries without significant drop in overall performance
the proliferation of multimedia applications over mobile resource constrained wireless networks has raised the need for techniques that adapt these applications both to clients quality of service qos requirements and to network resource constraints this article investigates the upper layer adaptation mechanisms to achieve end to end delay control for multimedia applications the proposed adaptation approach spans application layer middleware layer and network layer in application layer the requirement adaptor dynamically changes the requirement levels according to end to end delay measurement and acceptable qos requirements for the end users in middleware layer the priority adaptor is used to dynamically adjust the service classes for applications using feedback control theory in network layer the service differentiation scheduler assigns different network resources eg bandwidth to different service classes with the coordination of these three layers our approach can adaptively assign resources to multimedia applications to evaluate the impact of our adaptation scheme we built real ieee ad hoc network testbed the test bed experiments show that the proposed upper layer adaptation for end to end delay control successfully adjusts multimedia applications to meet delay requirements in many scenarios
in this paper we introduce clean slate architecture for improving the delivery of data packets in ieee wireless mesh networks opposed to the rigid tcp ip layer architecture which exhibits serious deficiencies in such networks we propose unitary layer approach that combines both routing and transport functionalities in single layer the new mesh transmission layer mtl incorporates cross interacting routing and transport modules for reliable data delivery based on the loss probabilities of wireless links due to the significant drawbacks of standard tcp over ieee we particularly focus on the transport module proposing pure rate based approach for transmitting data packets according to the current contention in the network by considering the ieee spatial reuse constraint and employing novel acknowledgment scheme the new transport module improves both goodput and fairness in wireless mesh networks in comparative performance study we show that mtl achieves up to more goodput and up to less packet drops than tcp ip while maintaining excellent fairness results
several attempts have been made to analyze customer behavior on online commerce sites some studies particularly emphasize the social networks of customers users reviews and ratings of product exert effects on other consumers purchasing behavior whether user refers to other users ratings depends on the trust accorded by user to the reviewer on the other hand the trust that is felt by user for another user correlates with the similarity of two users ratings this bidirectional interaction that involves trust and rating is an important aspect of understanding consumer behavior in online communities because it suggests clustering of similar users and the evolution of strong communities this paper presents theoretical model along with analyses of an actual online commerce site we analyzed large community site in japan cosme the noteworthy characteristics of cosme are that users can bookmark their trusted users in addition they can post their own ratings of products which facilitates our analyses of the ratings bidirectional effects on trust and ratings we describe an overview of the data in cosme analyses of effects from trust to rating and vice versa and our proposition of measure of community gravity which measures how strongly user might be attracted to community our study is based on the cosme dataset in addition to the epinions dataset it elucidates important insights and proposes potentially important measure for mining online social networks
wireless sensor nodes are increasingly being tasked with computation and communication intensive functions while still subject to constraints related to energy availability on these embedded platforms once all low power design techniques have been explored duty cycling the various subsystems remains the primary option to meet the energy and power constraints this requires the ability to provide spurts of high mips and high bandwidth connections however due to the large overheads associated with duty cycling the computation and communication subsystems existing high performance sensor platforms are not efficient in supporting such an option in this article we present the design and optimizations taken in wireless gateway node wgn that bridges data from wireless sensor networks to wi fi networks in an on demand basis we discuss our strategies to reduce duty cycling related costs by partitioning the system and by reducing the amount of time required to activate or deactivate the high powered components we compare the design choices and performance parameters with those made in the intel stargate platform to show the effectiveness of duty cycling on our platform we have built working prototype and the experimental results with two different power management schemes show significant reductions in latency and average power consumption compared to the stargate the wgn running our power gating scheme performs about six times better in terms of average system power consumption than the stargate running the suspend system scheme for large working periods where the active power dominates for short working periods where the transition enable disable power becomes dominant we perform up to seven times better the comparative performance of our system is even greater when the sleep power dominates
separation logic is popular approach for specifying properties of recursive mutable data structures several existing systems verify subclass of separation logic specifications using static analysis techniques checking data structure specifications during program execution is an alternative to static verification it can enforce the sophisticated specifications for which static verification fails and it can help debug incorrect specifications and code by detecting concrete counterexamples to their validity this paper presents separation logic invariant checker slick runtime checker for separation logic specifications we show that although the recursive style of separation logic predicates is well suited for runtime execution the implicit footprint and existential quantification make efficient runtime checking challenging to address these challenges we introduce coloring technique for efficiently checking method footprints and describe techniques for inferring values of existentially quantified variables we have implemented our runtime checker in the context of tool for enforcing specifications of java programs our experience suggests that our runtime checker is useful companion to static verifier for separation logic specifications
this paper discusses the applications of rough ensemble classifier in two emerging problems of web mining the categorization of web services and the topic specific web crawling both applications discussed here consist of two major steps split of feature space based on internal tag structure of web services and hypertext to represent in tensor space model and combining classifications obtained on different tensor components using rough ensemble classifier in the first application we have discussed the classification of web services two step improvement on the existing classification results of web services has been shown here in the first step we achieve better classification results over existing by using tensor space model in the second step further improvement of the results has been obtained by using rough set based ensemble classifier in the second application we have discussed the focused crawling using rough ensemble prediction our experiment regarding this application has provided better harvest rate and better target recall for focused crawling
in this paper we explore the relationship between preference elicitation learning style problem that arises in combinatorial auctions and the problem of learning via queries studied in computational learning theory preference elicitation is the process of asking questions about the preferences of bidders so as to best divide some set of goods as learning problem it can be thought of as setting in which there are multiple target concepts that can each be queried separately but where the goal is not so much to learn each concept as it is to produce an optimal example in this work we prove number of similarities and differences between two bidder preference elicitation and query learning giving both separation results and proving some connections between these problems
caching and content delivery are important for content intensive publish subscribe applications this paper proposes several content distribution approaches that combine match based pushing and access based caching based on users subscription information and access patterns to study the performance of the proposed approaches we built simulator and developed workload to mimic the content and access dynamics of busy news site using purely access based caching approach as the baseline our best approaches yield over and relative gains for two request traces in terms of the hit ratio in local caches while keeping the traffic overhead comparable even when the subscription information is assumed not to reflect users accesses perfectly our best approaches still have about and relative improvement for the two traces to our knowledge this work is the first effort to investigate content distribution under the publish subscribe paradigm
the contribution of this work is the design and evaluation of programming language model that unifies aspects and classes as they appear in aspectj like languages we show that our model preserves the capabilities of aspectj like languages while improving the conceptual integrity of the language model and the compositionality of modules the improvement in conceptual integrity is manifested by the reduction of specialized constructs in favor of uniform orthogonal constructs the enhancement in compositionality is demonstrated by better modularization of integration and higher order crosscutting concerns
despite constant improvements in fabrication technology hardware components are consuming more power than ever with the ever increasing demand for higher performance in highly integrated systems and as battery technology falls further behind managing energy is becoming critically important to various embedded and mobile systems in this paper we propose and implement power aware virtual memory to reduce the energy consumed by the memory in response to workloads becoming increasingly data centric we can use the power management features in current memory technology to put individual memory devices into low power modes dynamically under software control to reduce the power dissipation however it is imperative that any techniques employed weigh memory energy savings against any potential energy increases in other system components due to performance degradation of the memory using novel power aware virtual memory implementation we estimate significant reduction in memory power dissipation from to based on rambus memory specifications while running various real world applications in working linux system unfortunately due to hardware bug in the chipset direct power measurement is currently not possible applying more advanced techniques we can reduce power dissipation further to depending on the actual workload with negligible effects on performance we also show this work is applicable to other memory architectures and is orthogonal to previously proposed hardware controlled power management techniques so it can be applied simultaneously to further enhance energy conservation in variety of platforms
in large scale networked computing systems component failures become norms instead of exceptions failure aware resource management is crucial for enhancing system availability and achieving high performance in this paper we study how to efficiently utilize system resources for high availability computing with the support of virtual machine vm technology we design reconfigurable distributed virtual machine rdvm infrastructure for networked computing systems we propose failure aware node selection strategies for the construction and reconfiguration of rdvms we leverage the proactive failure management techniques in calculating nodes reliability states we consider both the performance and reliability status of compute nodes in making selection decisions we define capacity reliability metric to combine the effects of both factors in node selection and propose best fit algorithms with optimistic and pessimistic selection strategies to find the best qualified nodes on which to instantiate vms to run user jobs we have conducted experiments using failure traces from production systems and the nas parallel benchmark programs on real world cluster system the results show the enhancement of system productivity by using the proposed strategies with practically achievable accuracy of failure prediction with the best fit strategies the job completion rate is increased by compared with that achieved in the current lanl hpc cluster the task completion rate reaches with utilization of relatively unreliable nodes
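a minimal sketch of folding performance and predicted reliability into one node score for vm placement, assuming a hypothetical spare-capacity times survival-probability metric with an exponential failure model; the paper's actual capacity reliability metric and failure predictor may differ

    # hypothetical capacity-reliability ranking of nodes for placing a virtual machine (illustrative)
    import math

    def capacity_reliability(node, job_runtime_h):
        """score = spare capacity weighted by the probability of surviving the job's runtime."""
        spare = max(0.0, 1.0 - node["load"])
        survival = math.exp(-job_runtime_h / node["mtbf_h"])  # exponential failure model, assumed
        return spare * survival

    def best_fit(nodes, job_runtime_h):
        return max(nodes, key=lambda n: capacity_reliability(n, job_runtime_h))

    nodes = [{"name": "n1", "load": 0.2, "mtbf_h": 200.0},
             {"name": "n2", "load": 0.1, "mtbf_h": 20.0}]
    print(best_fit(nodes, job_runtime_h=24.0)["name"])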
from design point of view coordination is radically undertheorized and under explored arguably playground games are the universal cross cultural venue in which people learn about and explore coordination between one another and between the worlds of articulated rules and the worlds of experience and action they can therefore teach us about the processes inherent in human coordination provide model of desirable coordinative possibilities and act as design framework from which to explore the relationship between game and game play or to put it in terms of an inherent tension in human computer interaction between plans and situated actions when brought together with computer language for coordination that helps us pare down coordinative complexity to essential components we can create systems that have highly distributed control structures in this paper we present the design of four such student created collaborative distributed interactive systems for face to face use these take their inspiration from playground games with respect to who can play plurality how appropriability and to what ends acompetitiveness as it happens our sample systems are themselves games however taking playground games as our model helps us create systems that support game play featuring not enforcement of plans but emergence of rules roles and turn taking
if conversion transforms control dependencies to data dependencies by using predication mechanism it is useful to eliminate hard to predict branches and to reduce the severe performance impact of branch mispredictions however the use of predicated execution in out of order processors has to deal with two problems there can be multiple definitions for single destination register at rename time and instructions with false predicates consume unnecessary resources predicting predicates is an effective approach to address both problems however predicting predicates that come from hard to predict branches is not beneficial in general because this approach reverses the if conversion transformation losing its potential benefits in this paper we propose new scheme that dynamically selects which predicates are worth predicting and which ones are more effective in their if converted form we show that our approach significantly outperforms previously proposed schemes moreover it performs within of an ideal scheme with perfect predicate prediction
the paper describes an application of artificial intelligence technology to the implementation of rapid prototyping method in object oriented performance design oopd for real time systems oopd consists of two prototyping phases for real time systems each of these phases consists of three steps prototype construction prototype execution and prototype evaluation we present artificial intelligence based methods and tools to be applied to the individual steps in the prototype construction step rapid construction mechanism using reusable software components is implemented based on planning in the prototype execution step hybrid inference mechanism is used to execute the constructed prototype described in declarative knowledge representation mendel which is prolog based concurrent object oriented language can be used as prototype construction tool and prototype execution tool in the prototype evaluation step an expert system which is based on qualitative reasoning is implemented to detect and diagnose bottlenecks and generate an improvement plan for them
rdf schema rdfs as lightweight ontology language is gaining popularity and consequently tools for scalable rdfs inference and querying are needed sparql has become recently wc standard for querying rdf data but it mostly provides means for querying simple rdf graphs only whereas querying with respect to rdfs or other entailment regimes is left outside the current specification in this paper we show that sparql faces certain unwanted ramifications when querying ontologies in conjunction with rdf datasets that comprise multiple named graphs and we provide an extension for sparql that remedies these effects moreover since rdfs inference has close relationship with logic rules we generalize our approach to select custom ruleset for specifying inferences to be taken into account in sparql query we show that our extensions are technically feasible by providing benchmark results for rdfs querying in our prototype system giabata which uses datalog coupled with persistent relational database as back end for implementing sparql with dynamic rule based inference by employing different optimization techniques like magic set rewriting our system remains competitive with state of the art rdfs querying systems
the promise of rule based computing was to allow end users to create modify and maintain applications without the need to engage programmers but experience has shown that rule sets often interact in subtle ways making them difficult to understand and reason about this has impeded the widespread adoption of rule based computing this paper describes the design and implementation of xcellog user centered deductive spreadsheet system to empower non programmers to specify and manipulate rule based systems the driving idea underlying the system is to treat sets as the fundamental data type and rules as specifying relationships among sets and use the spreadsheet metaphor to create and view the materialized sets the fundamental feature that makes xcellog suitable for non programmers is that the user mainly sees the effect of the rules when rules or basic facts change the user sees the impact of the change immediately this enables the user to gain confidence in the rules and their modification and also experiment with what if scenarios without any programming preliminary experience with using xcellog indicates that it is indeed feasible to put the power of deductive spreadsheets for doing rule based computing into the hands of end users and do so without the requirement of programming or the constraints of canned application packages
many reputation management systems have been developed under the assumption that each entity in the system will use variant of the same scoring function much of the previous work in reputation management has focused on providing robustness and improving performance for given reputation scheme in this paper we present reputation based trust management framework that supports the synthesis of trust related feedback from many different entities while also providing each entity with the flexibility to apply different scoring functions over the same feedback data for customized trust evaluations we also propose novel scheme to cache trust values based on recent client activity to evaluate our approach we implemented our trust management service and tested it on realistic application scenario in both lan and wan distributed environments our results indicate that our trust management service can effectively support multiple scoring functions with low overhead and high availability
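the core idea of sharing feedback while letting each entity plug in its own scoring function can be sketched as follows; the class and function names are hypothetical and the caching of trust values is omitted

    # shared feedback store with pluggable, per-client scoring functions (hypothetical interfaces)
    from collections import defaultdict

    class FeedbackStore:
        def __init__(self):
            self.ratings = defaultdict(list)   # target -> list of (rater, score in [0, 1])
        def add(self, rater, target, score):
            self.ratings[target].append((rater, score))
        def trust(self, target, scoring_fn):
            """each caller supplies its own scoring function over the same raw feedback."""
            return scoring_fn(self.ratings[target])

    def mean_score(feedback):
        return sum(s for _, s in feedback) / len(feedback) if feedback else 0.5

    def pessimistic_score(feedback):
        return min((s for _, s in feedback), default=0.5)

    store = FeedbackStore()
    store.add("alice", "seller42", 0.9)
    store.add("bob", "seller42", 0.4)
    print(store.trust("seller42", mean_score), store.trust("seller42", pessimistic_score))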
we present novel translation model based on tree to string alignment template tat which describes the alignment between source parse tree and target string tat is capable of generating both terminals and non terminals and performing reordering at both low and high levels the model is linguistically syntax based because tats are extracted automatically from word aligned source side parsed parallel texts to translate source sentence we first employ parser to produce source parse tree and then apply tats to transform the tree into target string our experiments show that the tat based model significantly outperforms pharaoh state of the art decoder for phrase based models
the event based paradigm has gained interest as solution for integrating large scale distributed and loosely coupled systems it also has been propelled by the growth of event based applications like ambient intelligence utility monitoring event driven supply chain management escm multi player games etc the pub sub paradigm is particularly relevant for implementing these types of applications the core of pub sub system is the notification service and there are several commercial products open source and research projects that implement it the nature of notification services distribution scale large amount of concurrent messages processed as well as the diversity of their implementations make their run time analysis difficult the few existing solutions that address this problem are mainly proprietary and focus on single aspects of the behavior general solution would significantly support research development and tuning of these services this paper introduces notification service independent analysis framework that enables online analysis of their behavior based on streamed observations flexibly defined metrics and visual representations thereof
in this paper we describe an off line unconstrained handwritten arabic word recognition system based on segmentation free approach and semi continuous hidden markov models schmms with explicit state duration character durations play significant part in the recognition of cursive handwriting the duration information is still mostly disregarded in hmm based automatic cursive handwriting recognizers due to the fact that hmms are deficient in modeling character durations properly we will show experimentally that explicit state duration modeling in the schmm framework can significantly improve the discriminating capacity of the schmms to deal with very difficult pattern recognition tasks such as unconstrained handwritten arabic recognition in order to carry out the letter and word model training and recognition more efficiently we propose new version of the viterbi algorithm taking into account explicit state duration modeling three distributions gamma gauss and poisson for the explicit state duration modeling have been used and comparison between them has been reported to perform word recognition the described system uses an original sliding window approach based on vertical projection histogram analysis of the word and extracts new pertinent set of statistical and structural features from the word image several experiments have been performed using the ifn enit benchmark database and the best recognition performances achieved by our system outperform those reported recently on the same database
growing system sizes together with increasing performance variability are making globally synchronous operation hard to realize mesochronous clocking constitutes possible solution to the problems faced the most fundamental of problems faced when communicating between mesochronously clocked regions concerns the possibility of data corruption caused by metastability this paper presents an integrated communication and mesochronous clocking strategy which avoids timing related errors while maintaining globally synchronous system perspective the architecture is scalable as timing integrity is based purely on local observations it is demonstrated with nm cmos standard cell network on chip design which implements completely timing safe global communication in modular system
getting the right software requirements under the right environment assumptions is critical precondition for developing the right software this task is intrinsically difficult we need to produce complete adequate consistent and well structured set of measurable requirements and assumptions from incomplete imprecise and sparse material originating from multiple often conflicting sources the system we need to consider comprises software and environment components including people and devices rich system model may significantly help us in this task such model must integrate the intentional structural functional and behavioral facets of the system being conceived rigorous techniques are needed for model construction analysis exploitation and evolution such techniques should support early and incremental reasoning about partial models for variety of purposes including satisfaction arguments property checks animations the evaluation of alternative options the analysis of risks threats and conflicts and traceability management the tension between technical precision and practical applicability calls for suitable mix of heuristic deductive and inductive forms of reasoning on suitable mix of declarative and operational specifications formal techniques should be deployed only when and where needed and kept hidden wherever possible the paper provides retrospective account of our research efforts and practical experience along this route problem oriented abstractions analyzable models and constructive techniques were permanent concerns
recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs as result it is of special interest to process graph containment queries effectively on large graph databases given graph database and query graph the graph containment query is to retrieve all graphs in which contain as subgraph due to the vast number of graphs in and the nature of complexity for subgraph isomorphism testing it is desirable to make use of high quality graph indexing mechanisms to reduce the overall query processing cost in this paper we propose new cost effective graph indexing method based on frequent tree features of the graph database we analyze the effectiveness and efficiency of tree as indexing feature from three critical aspects feature size feature selection cost and pruning power in order to achieve better pruning ability than existing graph based indexing methods we select in addition to frequent tree features tree small number of discriminative graphs delta on demand without costly graph mining process beforehand our study verifies that tree delta is better choice than graph for indexing purpose denoted tree delta ge graph to address the graph containment query problem it has two implications the index construction by tree delta is efficient and the graph containment query processing by tree delta is efficient our experimental studies demonstrate that tree delta has compact index structure achieves an order of magnitude better performance in index construction and most importantly outperforms up to date graph based indexing methods gindex and tree in graph containment query processing
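the filtering side of feature based graph indexing can be sketched with plain set containment: a database graph survives only if it contains every indexed feature of the query, and the survivors still go through exact subgraph isomorphism verification; this shows the generic filter step, not the tree plus delta feature selection itself

    # feature-based candidate filtering for graph containment queries (filtering step only, illustrative)
    def build_index(db_features):
        """db_features: {graph_id: set of indexed features (e.g. canonical tree codes)}."""
        inverted = {}
        for gid, feats in db_features.items():
            for f in feats:
                inverted.setdefault(f, set()).add(gid)
        return inverted

    def candidates(inverted, all_graphs, query_features):
        """only graphs containing every query feature can possibly contain the query graph."""
        result = set(all_graphs)
        for f in query_features:
            result &= inverted.get(f, set())
        return result   # still requires exact subgraph isomorphism verification

    db = {"g1": {"t1", "t2"}, "g2": {"t1"}, "g3": {"t1", "t2", "t3"}}
    idx = build_index(db)
    print(sorted(candidates(idx, db.keys(), {"t1", "t2"})))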
in this paper we describe restartable atomic sequences an optimistic mechanism for implementing simple atomic operations such as test and set on uniprocessor thread that is suspended within restartable atomic sequence is resumed by the operating system at the beginning of the sequence rather than at the point of suspension this guarantees that the thread eventually executes the sequence atomically restartable atomic sequence has significantly less overhead than other software based synchronization mechanisms such as kernel emulation or software reservation consequently it is an attractive alternative for use on uniprocessors that do not support atomic operations even on processors that do support atomic operations in hardware restartable atomic sequences can have lower overhead we describe different implementations of restartable atomic sequences for the mach and taos operating systems these systems thread management packages rely on atomic operations to implement higher level mutual exclusion facilities we show that improving the performance of low level atomic operations and therefore mutual exclusion mechanisms improves application performance
java is becoming viable platform for hard real time computing there are production and research real time java vms as well as applications in both military and civil sector technological advances and increased adoption of real time java contrast significantly with the lack of real time benchmarks the few benchmarks that exist are either low level synthetic micro benchmarks or benchmarks used internally by companies making it difficult to independently verify and repeat reported results this paper presents the collision detector benchmark suite an open source application benchmark suite that targets different hard and soft real time virtual machines is at its core real time benchmark with single periodic task which implements aircraft collision detection based on simulated radar frames the benchmark can be configured to use different sets of real time features and comes with number of workloads we describe the architecture of the benchmark and characterize the workload based on input parameters
this paper defines an extended polymorphic type system for an ml style programming language and develops sound and complete type inference algorithm different from the conventional ml type discipline the proposed type system allows full rank polymorphism where polymorphic types can appear in other types such as product types disjoint union types and range types of function types because of this feature the proposed type system significantly reduces the value only restriction of polymorphism which is currently adopted in most of ml style impure languages it also serves as basis for efficient implementation of type directed compilation of polymorphism the extended type system achieves more efficient type inference algorithm and it also contributes to developing more efficient type passing implementation of polymorphism we show that the conventional ml polymorphism sometimes introduces exponential overhead both at compile time elaboration and run time type passing execution and that these problems can be eliminated by our type inference system compared with more powerful rank type inference systems based on semi unification the proposed type inference algorithm infers most general type for any typable expression by using the conventional first order unification and it is therefore easily adopted in existing implementation of ml family of languages
in this paper we propose method for job migration policies by considering effective usage of global memory in addition to cpu load sharing in distributed systems the objective of this paper is to reduce the number of page faults caused by unbalanced memory allocations for jobs among distributed nodes which improves the overall performance of distributed system the proposed method which uses the high performance and high throughput approach with remote execution strategy performs the best for both cpu bound and memory bound jobs in homogeneous as well as in the heterogeneous networks in distributed system
this paper presents deterministic and efficient algorithm for online facility location the algorithm is based on simple hierarchical partitioning and is extremely simple to implement it also applies to variety of models ie models where the facilities can be placed anywhere in the region or only at customer sites or only at fixed locations the paper shows that the algorithm is log competitive under these various models where is the total number of customers it also shows that the algorithm is competitive with high probability and for any arrival order when customers are uniformly distributed or when they follow distribution satisfying smoothness property experimental results for variety of scenarios indicate that the algorithm behaves extremely well in practice
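for contrast with the deterministic algorithm described here, a well known randomized baseline for online facility location opens a facility at an arriving customer with probability proportional to its distance to the nearest open facility; a sketch under a uniform facility cost, not the paper's hierarchical partitioning scheme

    # randomized online facility location baseline (not the paper's deterministic algorithm)
    import math, random

    def online_facility_location(customers, facility_cost):
        facilities, assignment_cost = [], 0.0
        for x, y in customers:
            if not facilities:
                facilities.append((x, y))
                continue
            d = min(math.hypot(x - fx, y - fy) for fx, fy in facilities)
            if random.random() < min(1.0, d / facility_cost):
                facilities.append((x, y))      # open a new facility at the customer
            else:
                assignment_cost += d           # serve from the nearest open facility
        return facilities, assignment_cost

    random.seed(0)
    pts = [(random.random() * 10, random.random() * 10) for _ in range(200)]
    fac, cost = online_facility_location(pts, facility_cost=5.0)
    print(len(fac), round(cost, 1))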
major trend in modern system on chip design is growing system complexity which results in sharp increase of communication traffic on the on chip communication bus architectures in real time embedded system task arrival rate inter task arrival time and data size to be transferred are not uniform over time this is due to the partial re configuration of an embedded system to cope with dynamic workload in this context the traditional application specific bus architectures may fail to meet the real time constraints thus to incorporate the random behavior of on chip communication this work proposes an approach to synthesize an on chip bus architecture which is robust for given distributions of random tasks the randomness of communication tasks is characterized by three main parameters which are the average task arrival rate the average inter task arrival time and the data size for synthesis an on chip bus requirement is guided by the worst case performance need while the dynamic voltage scaling technique is used to save energy when the workload is low or timing slack is high this in turn results in an effective utilization of communication resources under variable workload
web designers usually ignore how to model real user expectations and goals mainly due to the large and heterogeneous audience of the web this fact leads to websites which are difficult to comprehend by visitors and complex to maintain by designers in order to ameliorate this scenario an approach for using the modeling framework in web engineering has been developed in this paper furthermore we also present traceability approach for obtaining different kind of design artifacts tailored to specific web modeling method finally we include sample of our approach in order to show its applicability and we describe prototype tool as proof of concept of our research
interoperability and cross fertilization of multiple hypermedia domains are relatively new concerns to the open hypermedia and structural computing community recent work in this area has explored integrated data models and component based structural service architectures this paper focuses on the user interface aspects relating to client applications of the structural services it presents an integrative design of graphical hypermedia user interface for hypermedia based process centric enterprise models an enterprise model covers various important perspectives of an enterprise such as processes information organization and systems in this paper examples are given to show that these perspectives are presented better using mixture of hypermedia structures found in several hypermedia domains the visualization and interaction design for such hypermedia based enterprise models integrates many features found in navigational spatial taxonomic workflow and cooperative hypertext domains use case is provided to show that client applications based on this design allow users to see and interact with an enterprise model from multiple perspectives in addition some initial user experience is also reported
this paper describes fast html web page change detection approach that saves computation time by limiting the similarity computations between two versions of web page to nodes having the same html tag type and by hashing the web page in order to provide direct access to node information this efficient approach is suitable as client application and for implementing server applications that could serve the needs of users in monitoring modifications to html web pages made over time and that allow for reporting and visualizing changes and trends in order to gain insight about the significance and types of such changes the detection of changes across two versions of page is accomplished by performing similarity computations after transforming the web page into an xml like structure in which node corresponds to an open close html tag performance and detection reliability results were obtained and showed speed improvements when compared to the results of previous approach
we describe new algorithm for fast global register allocation called linear scan this algorithm is not based on graph coloring but allocates registers to variables in single linear time scan of the variables live ranges the linear scan algorithm is considerably faster than algorithms based on graph coloring is simple to implement and results in code that is almost as efficient as that obtained using more complex and time consuming register allocators based on graph coloring the algorithm is of interest in applications where compile time is concern such as dynamic compilation systems just in time compilers and interactive development environments
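the core loop of linear scan is short enough to sketch directly: walk intervals by increasing start point, expire intervals that have ended, and when no register is free spill the interval with the furthest end point; a textbook style sketch, not the authors' implementation

    # textbook linear scan register allocation over live intervals (illustrative sketch)
    def linear_scan(intervals, num_regs):
        """intervals: list of (name, start, end); returns {name: register number or 'spill'}."""
        free = list(range(num_regs))
        active, alloc = [], {}
        for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
            # expire old intervals whose live range ended before this start point
            for old in [iv for iv in active if iv[2] < start]:
                active.remove(old)
                free.append(alloc[old[0]])
            if free:
                alloc[name] = free.pop()
                active.append((name, start, end))
                active.sort(key=lambda iv: iv[2])
            else:
                spill = active[-1]                       # interval with the furthest end point
                if spill[2] > end:                       # hand its register to the current interval
                    alloc[name] = alloc[spill[0]]
                    alloc[spill[0]] = "spill"
                    active[-1] = (name, start, end)
                    active.sort(key=lambda iv: iv[2])
                else:
                    alloc[name] = "spill"
        return alloc

    print(linear_scan([("a", 0, 8), ("b", 1, 3), ("c", 2, 9), ("d", 4, 6)], num_regs=2))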
number of linear classification methods such as the linear least squares fit llsf logistic regression and support vector machines svm’s have been applied to text categorization problems these methods are similar in that they find hyperplanes that approximately separate class of document vectors from its complement however support vector machines are so far considered special in that they have been demonstrated to achieve the state of the art performance it is therefore worthwhile to understand whether such good performance is unique to the svm design or if it can also be achieved by other linear classification methods in this paper we compare number of known linear classification methods as well as some variants in the framework of regularized linear systems we will discuss the statistical and numerical properties of these algorithms with focus on text categorization we will also provide some numerical experiments to illustrate these algorithms on number of datasets
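the regularized linear system view can be made concrete with a tiny ridge style least squares classifier that solves (X^T X + lambda I) w = X^T y in closed form; a generic numpy sketch with made up term count data, not the experimental setup of the paper

    # regularized least-squares (llsf-style) text classifier in closed form (illustrative)
    import numpy as np

    def train_rls(X, y, lam=1.0):
        """solve (X^T X + lam I) w = X^T y, i.e. a ridge-regularized linear least squares fit."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    def predict(X, w):
        return np.sign(X @ w)          # +1 / -1 class decisions from the linear score

    # toy bag-of-words matrix: rows are documents, columns are term counts (invented data)
    X = np.array([[2.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w = train_rls(X, y, lam=0.1)
    print(predict(X, w))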
computer systems security management can be largely improved by the installation of mechanisms for automatic discovery of complex attacks due to system vulnerabilities and middleware misconfigurations the aim of this paper is to propose new formal technique for modeling computer system vulnerabilities and automatic generation of attack scenarios exploiting these vulnerabilities
risk relief services rrss as complementary to online trust promoting services are becoming versatile options for risk reduction in online consumer to consumer auctions in this paper we identify factors that affect the behavior of buyers in an online auction market who had to either adopt or not adopt online escrow services oes an experimental cc auction system with embedded decision support features was used to collect data results show that market factors such as fraud rate product price and seller’s reputation are important in determining buyers oes adoption this study also finds that sellers reputation has significant effect on buyer’s risk perception which influences his oes adoption decision furthermore the buyers oes adoption decisions were found to be congruent with the implied recommendations that were based on expected utility calculations
functional validation of processor design through execution of suite of test programs is common industrial practice in this paper we develop high level architectural specification driven methodology for systematic test suite generation our primary contribution is an automated test suite generation methodology that covers all possible processor pipeline interactions to accomplish this automation we develop fully formal processor model based on communicating extended finite state machines and traverse the processor model for on the fly generation of short test programs covering all reachable states and transitions our test generation method achieves several orders of magnitude reduction in test suite size compared to the previously proposed formal approaches for test generation leading to drastic reduction in validation effort
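The traversal idea can be illustrated on a plain finite state machine: a breadth-first pass yields a shortest input sequence to every reachable state, and then one short test per reachable transition. The toy pipeline states and input symbols below are invented for illustration; the paper's processor model uses communicating extended FSMs and is considerably richer.

```python
from collections import deque

def cover_all_transitions(start, transitions):
    """transitions: dict state -> {input_symbol: next_state}.
    Returns one short input sequence (test) per reachable transition,
    each driving the machine to the transition's source via a BFS shortest path."""
    reach = {start: []}                 # shortest input sequence reaching each state
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for sym, t in transitions.get(s, {}).items():
            if t not in reach:
                reach[t] = reach[s] + [sym]
                queue.append(t)
    tests = []
    for s, out in transitions.items():
        if s in reach:
            for sym in out:
                tests.append(reach[s] + [sym])   # reach the source, then fire the transition
    return tests

# Toy pipeline model: fetch -> decode -> execute, with a stall self-loop.
fsm = {
    "fetch":   {"issue": "decode"},
    "decode":  {"ready": "execute", "stall": "decode"},
    "execute": {"retire": "fetch"},
}
for t in cover_all_transitions("fetch", fsm):
    print(t)
```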
searching information through the internet often requires users to separately contact several digital libraries use each library interface to author the query analyze retrieval results and merge them with results returned by other libraries such solution could be simplified by using centralized server that acts as gateway between the user and several distributed repositories the centralized server receives the user query forwards the user query to federated repositories possibly translating the query into the specific format required by each repository and fuses retrieved documents for presentation to the user to accomplish these tasks efficiently the centralized server should perform some major operations such as resource selection query transformation and data fusion in this paper we report on some aspects of mind system for managing distributed heterogeneous multimedia libraries mind http://www.mind-project.org in particular this paper focuses on the issue of fusing results returned by different image repositories the proposed approach is based on normalization of matching scores assigned to retrieved images by individual libraries experimental results on prototype system show the potential of the proposed approach with respect to traditional solutions
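A minimal sketch of the fusion step under a simple min-max score normalization; the MIND system's normalization of matching scores is more elaborate, and the result format and the keep-the-best-score merge rule here are assumptions made for illustration.

```python
def normalize(results):
    """Min-max normalize matching scores from one library into [0, 1].
    results: list of (doc_id, raw_score)."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(doc, (s - lo) / span) for doc, s in results]

def fuse(result_lists, k=10):
    """Merge several libraries' result lists by normalized score,
    keeping the best score when the same image is returned more than once."""
    best = {}
    for results in result_lists:
        for doc, s in normalize(results):
            best[doc] = max(s, best.get(doc, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]

lib_a = [("img1", 0.91), ("img2", 0.40), ("img3", 0.10)]
lib_b = [("img3", 120.0), ("img4", 80.0), ("img1", 5.0)]   # different score scale
print(fuse([lib_a, lib_b]))
```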
this paper explores speculative precomputation technique that uses idle thread context in multithreaded architecture to improve performance of single threaded applications it attacks program stalls from data cache misses by pre computing future memory accesses in available thread contexts and prefetching these data this technique is evaluated by simulating the performance of research processor based on the itanium isa supporting simultaneous multithreading two primary forms of speculative precomputation are evaluated if only the non speculative thread spawns speculative threads performance gains of up to are achieved when assuming ideal hardware however this speedup drops considerably with more realistic hardware assumptions permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation overcoming this limitation even with realistic costs for spawning threads speedups as high as are achieved with an average speedup of
this article is proposal for database index structure the xpath accelerator that has been specifically designed to support the evaluation of xpath path expressions as such the index is capable of supporting all xpath axes including ancestor following preceding sibling descendant or self etc this feature lets the index stand out among related work on xml indexing structures which had focus on the child and descendant axes only the index has been designed with close eye on the xpath semantics as well as the desire to engineer its internals so that it can be supported well by existing relational database query processing technology the index permits set oriented or rather sequence oriented path evaluation and can be implemented and queried using well established relational index structures notably b trees and r trees we discuss the implementation of the xpath accelerator on top of different database backends and show that the index performs well on all levels of the memory hierarchy including disk based and main memory based database systems
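The core encoding can be sketched with a pre/post numbering of document nodes, under which the descendant axis (and, symmetrically, ancestor) becomes a region predicate that a relational index can evaluate; the tuple layout below is a simplification of the accelerator's actual relational encoding.

```python
def number_nodes(tree, counter=None, table=None):
    """Assign (pre, post) ranks to every node of a nested-tuple XML tree.
    tree: (tag, [children]); returns rows of [tag, pre, post]."""
    if counter is None:
        counter, table = {"pre": 0, "post": 0}, []
    tag, children = tree
    pre = counter["pre"]; counter["pre"] += 1
    row = [tag, pre, None]
    table.append(row)
    for child in children:
        number_nodes(child, counter, table)
    row[2] = counter["post"]; counter["post"] += 1
    return table

def descendants(table, ctx):
    """XPath descendant axis as a region query: pre > pre(ctx) and post < post(ctx)."""
    _, p, q = next(r for r in table if r[0] == ctx)
    return [tag for tag, pre, post in table if pre > p and post < q]

doc = ("book", [("title", []), ("chapter", [("section", []), ("section", [])])])
table = number_nodes(doc)
print(descendants(table, "chapter"))   # ['section', 'section']
```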
the design of survivable mesh based communication networks has received considerable attention in recent years one task is to route backup paths and allocate spare capacity in the network to guarantee seamless communications services survivable to set of failure scenarios this is complex multi constraint optimization problem called the spare capacity allocation sca problem this paper unravels the sca problem structure using matrix based model and develops fast and efficient approximation algorithm termed successive survivable routing ssr first per flow spare capacity sharing is captured by spare provision matrix spm method the spm matrix has dimension the number of failure scenarios by the number of links it is used by each demand to route the backup path and share spare capacity with other backup paths next based on special link metric calculated from spm ssr iteratively routes updates backup paths in order to minimize the cost of total spare capacity backup path can be further updated as long as it is not carrying any traffic furthermore the spm method and ssr algorithm are generalized from protecting all single link failures to any arbitrary link failures such as those generated by shared risk link groups or all single node failures numerical results comparing several sca algorithms show that ssr has the best trade off between solution optimality and computation speed
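A small sketch of the spare provision idea under stated assumptions: for each failure scenario, sum the demand that every disrupted flow would push onto each surviving backup link, and take the per-link maximum over scenarios as the required spare capacity. The flow and scenario structures are illustrative and do not reproduce the paper's matrix formulation or the SSR iteration.

```python
def spare_provision(flows, scenarios):
    """flows: dicts with 'demand', 'working' (set of links), 'backup' (set of links).
    scenarios: list of sets of links that fail together.
    Returns per-link spare capacity = max over scenarios of rerouted backup traffic."""
    spare = {}
    for failed in scenarios:
        load = {}                                      # traffic pushed onto each link
        for f in flows:
            if f["working"] & failed:                  # working path broken by this scenario
                for link in f["backup"] - failed:      # rerouted on surviving backup links
                    load[link] = load.get(link, 0) + f["demand"]
        for link, v in load.items():
            spare[link] = max(spare.get(link, 0), v)
    return spare

flows = [
    {"demand": 5, "working": {"e1"}, "backup": {"e3", "e4"}},
    {"demand": 3, "working": {"e2"}, "backup": {"e3"}},
]
single_link_failures = [{"e1"}, {"e2"}, {"e3"}, {"e4"}]
print(spare_provision(flows, single_link_failures))
# spare capacity on e3 is max(5, 3) = 5 because the two backups never fire together
```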
the most important features of string matching algorithm are its efficiency and its flexibility efficiency has traditionally received more attention while flexibility in the search pattern is becoming more and more important issue most classical string matching algorithms are aimed at quickly finding an exact pattern in text being knuth morris pratt kmp and the boyer moore bm family the most famous ones recent development uses deterministic suffix automata to design new optimal string matching algorithms eg bdm and turbobdm flexibility has been addressed quite separately by the use of bit parallelism which simulates automata in their nondeterministic form by using bits and exploiting the intrinsic parallelism inside the computer word eg the shift or algorithm those algorithms are extended to handle classes of characters and errors in the pattern and or in the text their drawback being their inability to skip text characters in this paper we merge bit parallelism and suffix automata so that nondeterministic suffix automaton is simulated using bit parallelism the resulting algorithm called bndm obtains the best from both worlds it is much simpler to implement than bdm and nearly as simple as shift or it inherits from shift or the ability to handle flexible patterns and from bdm the ability to skip characters bndm is faster than bdm and up to times faster than shift or when compared to the fastest existing algorithms on exact patterns which belong to the bm family bndm is from slower to times faster depending on the alphabet size with respect to flexible pattern searching bndm is by far the fastest technique to deal with classes of characters and is competitive to search allowing errors in particular bndm seems very adequate for computational biology applications since it is the fastest algorithm to search on dna sequences and flexible searching is an important problem in that area as theoretical development related to flexible pattern matching we introduce new automaton to recognize suffixes of patterns with classes of characters to the best of our knowledge this automaton has not been studied before
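A compact BNDM for exact matching only (no character classes or errors) may help make the bit-parallel simulation of the nondeterministic suffix automaton concrete. In the original formulation the pattern must fit in a machine word; Python integers lift that limit but also forfeit the speed advantage, so this is purely illustrative.

```python
def bndm(pattern, text):
    """Backward Nondeterministic DAWG Matching; returns start offsets of matches."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    mask = (1 << m) - 1
    B = {}                                  # bit i of B[c] set iff pattern[m-1-i] == c
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    out = []
    pos = 0
    while pos <= n - m:
        j, last, D = m, m, mask
        while D:
            D &= B.get(text[pos + j - 1], 0)   # read the window right to left
            j -= 1
            if D & (1 << (m - 1)):             # a pattern prefix was recognized
                if j > 0:
                    last = j                   # remember it for the window shift
                else:
                    out.append(pos)            # whole window matches the pattern
            D = (D << 1) & mask
        pos += last
    return out

print(bndm("abc", "zabcabc"))   # [1, 4]
print(bndm("aa", "aaa"))        # [0, 1] -- overlapping occurrences are found
```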
balancing the competing goals of collaboration and security is difficult multidimensional problem collaborative systems often focus on building useful connections among people tools and information while security seeks to ensure the availability confidentiality and integrity of these same elements in this article we focus on one important dimension of this problem access control the article examines existing access control models as applied to collaboration highlighting not only the benefits but also the weaknesses of these models
computer software must dynamically adapt to changing conditions in order to fully realize the benefit of dynamic adaptation it must be performed correctly the correctness of adaptation cannot be properly addressed without precisely specifying the requirements for adaptation this paper introduces an approach to formally specifying adaptation requirements in temporal logic we introduce ltl an adaptation based extension to linear temporal logic and use this logic to specify three commonly used adaptation semantics neighborhood composition and sequential composition techniques are developed and applied to ltl to construct the specification of an adaptive system we introduce adaptation semantics graphs to visually present the adaptation semantics specifications for adaptive systems can be automatically generated from adaptation semantics graphs
real time database systems support data processing needs of real time systems where transactions have time constraints here we consider repetitively executed transactions and assume that execution histories are logged well known priority assignment technique called earliest deadline first is biased towards short transactions in that short transactions have better chances of completing their executions within their deadlines we introduce the notion of fair scheduling in which the goal is to have similar completion ratios for all transaction classes short to long in sizes we propose priority assignment techniques that overcome the biased scheduling and show that they work via extensive simulation experiments
online analytical processing olap systems allow one to quickly provide answers to analytical queries on multidimensional data these systems are typically based on visualisations characterised by limited interaction and exploration capabilities in this paper we propose innovative visual and interaction techniques for the analysis of multidimensional data the proposed solution is based on three dimensional hypercube representations that can be explored using dynamic queries and that combines colour coding detail on demand cutting planes and viewpoint control techniques we demonstrate the effectiveness of our visualisation tool by providing some practical examples showing how it has been used for extracting information from real multidimensional data
in this paper we introduce method for web page ranking based on computational geometry to evaluate and test by examples order relationships among web pages belonging to different knowledge domains the goal is through an organising procedure to learn from these examples real valued ranking function that induces ranking via convexity feature we consider the problem of self organising learning from numerical data to be represented by well fitted convex polygon procedure in which the vertices correspond to descriptors representing domains of web pages results and statistical evaluation of procedure show that the proposed method may be characterised as accurate
navigation has added interactivity to today's multimedia applications which support effective access to objects of various formats and presentation requirements storage issues need to be reconsidered for the new type of navigational multimedia applications in order to improve system’s performance this paper addresses the problem of multimedia data storage towards improving data accessibility and request servicing under navigational applications navigational graph based model for the multimedia data representation is proposed to guide the data placement under hierarchical storage topology the multimedia data dependencies access frequencies and timing constraints are used to characterize the graph nodes which correspond to multimedia objects allocated at the tertiary storage level based on certain defined popularity criteria data are elevated and placed on secondary level towards improving both the request servicing and data accessibility the proposed multimedia data elevation is prefetching approach since it is performed a priori not on demand based on previously extracted user access patterns appropriate data placement policies are also employed at the secondary level and simulation model has been developed based on current commercial tertiary and secondary storage devices this model is used to evaluate the proposed popularity based data elevation approach as employed under hierarchical storage subsystem experimentation is performed under artificial data workloads and it is shown that the proposed hierarchical data placement approach considerably improves data accessing and request servicing in navigational multimedia applications the iterative improvement placement is proven to outperform earlier related multimedia data placement policies with respect to commonly used performance metrics
while hypertext is often claimed to be tool that especially aids associative thinking intellectual work involves more than association so questions arise about the usefulness of hypertext tools in the more disciplined aspects of scholarly and argumentative writing examining the phases of scholarly writing reveals that different hypertext tools can aid different phases of intellectual work in ways other than associative thinking spatial hypertext is relevant at all phases while page and link hypertext is more appropriate to some phases than others
one of the biggest challenges in future application development is device heterogeneity in the future we expect to see rich variety of computing devices that can run applications these devices have different capabilities in processors memory networking screen sizes input methods and software libraries we also expect that future users are likely to own many types of devices depending on users changing situations and environments they may choose to switch from one type of device to another that brings the best combination of application functionality and device mobility size weight etc based on this scenario we have designed and implemented seamless application framework called the roam system that can both assist developers to build multiplatform applications that can run on heterogeneous devices and allow user to move migrate running application among heterogeneous devices in an effortless manner the roam system is based on partitioning of an application into components and it automatically selects the most appropriate adaptation strategy at the component level for target platform to evaluate our system we have created several multi platform roam applications including chess game connect game and shopping aid application we also provide measurements on application performance and describe our experience with application development in the roam system our experience shows that it is relatively easy to port existing applications to the roam system and runtime application migration latency is within few seconds and acceptable to most non real time applications
in this paper we focus on the design of bivariate edas for discrete optimization problems and propose new approach named hsmiec while the current edas require much time in the statistical learning process as the relationships among the variables are too complicated we employ the selfish gene theory sg in this approach as well as mutual information and entropy based cluster miec model is also set to optimize the probability distribution of the virtual population this model uses hybrid sampling method by considering both the clustering accuracy and clustering diversity and an incremental learning and resample scheme is also set to optimize the parameters of the correlations of the variables compared with several benchmark problems our experimental results demonstrate that hsmiec often performs better than some other edas such as bmda comit mimic and ecga
rare objects are often of great interest and great value until recently however rarity has not received much attention in the context of data mining now as increasingly complex real world problems are addressed rarity and the related problem of imbalanced data are taking center stage this article discusses the role that rare classes and rare cases play in data mining the problems that can result from these two forms of rarity are described in detail as are methods for addressing these problems these descriptions utilize examples from existing research so that this article provides good survey of the literature on rarity in data mining this article also demonstrates that rare classes and rare cases are very similar phenomena both forms of rarity are shown to cause similar problems during data mining and benefit from the same remediation methods
we present an algorithm for translating xslt programs into sql our context is that of virtual xml publishing in which single xml view is defined from relational database and subsequently queried with xslt programs each xslt program is translated into single sql query and run entirely in the database engine our translation works for large fragment of xslt which we define that includes descendant ancestor axis recursive templates modes parameters and aggregates we put considerable effort in generating correct and efficient sql queries and describe several optimization techniques to achieve this efficiency we have tested our system on all sql queries of the tpc database benchmark which we represented in xslt and then translated back to sql using our translator
we present new phrase based conditional exponential family translation model for statistical machine translation the model operates on feature representation in which sentence level translations are represented by enumerating all the known phrase level translations that occur inside them this makes the model good match with the commonly used phrase extraction heuristics the model’s predictions are properly normalized probabilities in addition the model automatically takes into account information provided by phrase overlaps and does not suffer from reference translation reachability problems we have implemented an open source translation system sinuhe based on the proposed translation model our experiments on europarl and gigafren corpora demonstrate that finding the unique map parameters for the model on large scale data is feasible with simple stochastic gradient methods sinuhe is fast and memory efficient and the bleu scores obtained by it are only slightly inferior to those of moses
the objective of this article is to investigate the problem of generating both positive and negative exact association rules when formal context of positive attributes is provided straightforward solution to this problem consists of conducting an apposition of the initial context with its complementary context construct the concept lattice of apposed contexts and then extract rules more challenging problem consists of exploiting rules generated from each one of the contexts and to get the whole set of rules for the context in this paper we analyze set of identified situations based on distinct types of input and come out with set of properties obviously the global set of positive and negative rules is superset of purely positive rules ie rules with positive attributes only and purely negative ones since it generally contains mixed rules ie rules in which at least positive attribute and negative attribute coexist the paper also presents set of inference rules to generate subset of all mixed rules from positive negative and mixed ones finally two key conclusions can be drawn from our analysis i the generic basis containing negative rules cannot be completely and directly inferred from the set of positive rules or from the concept lattice and ii the whole set of mixed rules may not be completely generated from the positive rules alone the negative rules alone or the mixed rules alone
the frequent pattern tree fp tree is an efficient data structure for association rule mining without generation of candidate itemsets it was used to compress database into tree structure which stored only large items it however needed to process all transactions in batch way in real world applications new transactions are usually incrementally inserted into databases in the past we proposed fast updated fp tree fufp tree structure to efficiently handle new transactions and to make the tree update process become easier in this paper we attempt to modify the fufp tree construction based on the concept of pre large itemsets pre large itemsets are defined by lower support threshold and an upper support threshold it does not need to rescan the original database until number of new transactions have been inserted the proposed approach can thus achieve good execution time for tree construction especially when each time small number of transactions are inserted experimental results also show that the proposed pre fufp maintenance algorithm has good performance for incrementally handling new transactions
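A sketch of the bookkeeping around the two thresholds, assuming the safety bound commonly used in the pre-large itemset literature, f = floor((Su - Sl) * d / (1 - Su)), for the number of new transactions that can be absorbed before a rescan; the function names and example thresholds are illustrative, and the bound itself is an assumption of this sketch rather than something stated in the abstract.

```python
import math

def rescan_threshold(d, s_lower, s_upper):
    """Number of new transactions that can be inserted before the original
    database must be rescanned. d: size of the original database;
    thresholds are fractions with 0 < s_lower < s_upper < 1."""
    return math.floor((s_upper - s_lower) * d / (1.0 - s_upper))

def classify_itemset(count, total, s_lower, s_upper):
    """Large, pre-large, or small with respect to the two support thresholds."""
    support = count / total
    if support >= s_upper:
        return "large"
    if support >= s_lower:
        return "pre-large"
    return "small"

d = 10000
print(rescan_threshold(d, s_lower=0.04, s_upper=0.06))   # 212 insertions before a rescan
print(classify_itemset(520, d, 0.04, 0.06))              # 'pre-large'
```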
in this paper we argue the need for effective resource management mechanisms for sharing resources in commodity clusters to address this issue we present the design of sharc system that enables resource sharing among applications in such clusters sharc depends on single node resource management mechanisms such as reservations or shares and extends the benefits of such mechanisms to clustered environments we present techniques for managing two important resources cpu and network interface bandwidth on cluster wide basis our techniques allow sharc to support reservation of cpu and network interface bandwidth for distributed applications dynamically allocate resources based on past usage and provide performance isolation to applications our experimental evaluation has shown that sharc can scale to node clusters running applications these results demonstrate that sharc can be an effective approach for sharing resources among competing applications in moderate size clusters
an automatic guided vehicle agv transportation system is fully automated system that provides logistic services in an industrial environment such as warehouse or factory traditionally the agvs that execute the transportation tasks are controlled by central server via wireless communication in joint effort between egemin an industrial manufacturer of agv transportation systems and distrinet labs research at the katholieke universiteit leuven we developed an innovative decentralized architecture for controlling agvs the driving motivations behind decentralizing the control of agvs were new and future quality requirements such as flexibility and openness at the software architectural level the agv control system is structured as multi agent system the detailed design and implementation is object oriented in this paper we report our experiences with developing the agent based control system for agvs starting from system requirements we give an overview of the software architecture and we zoom in on number of concrete functionalities we reflect on our experiences and report lessons learned from applying multi agent systems for real world agv control
there has been growing interest in mining frequent itemsets in relational data with multiple attributes key step in this approach is to select set of attributes that group data into transactions and separate set of attributes that labels data into items unsupervised and unrestricted mining however is stymied by the combinatorial complexity and the quantity of patterns as the number of attributes grows in this paper we focus on leveraging the semantics of the underlying data for mining frequent itemsets for instance there are usually taxonomies in the data schema and functional dependencies among the attributes domain knowledge and user preferences often have the potential to significantly reduce the exponentially growing mining space these observations motivate the design of user directed data mining framework that allows such domain knowledge to guide the mining process and control the mining strategy we show examples of tremendous reduction in computation by using domain knowledge in mining relational data with multiple attributes
this paper presents multiple model real time tracking technique for video sequences based on the mean shift algorithm the proposed approach incorporates spatial information from several connected regions into the histogram based representation model of the target and enables multiple models to be used to represent the same object the use of several regions to capture the color spatial information into single combined model allow us to increase the object tracking efficiency by using multiple models we can make the tracking scheme more robust in order to work with sequences with illumination and pose changes we define model selection function that takes into account both the similarity of the model with the information present in the image and the target dynamics in the tracking experiments presented our method successfully coped with lighting changes occlusion and clutter
multicore hardware is making concurrent programs pervasive unfortunately concurrent programs are prone to bugs among different types of concurrency bugs atomicity violation bugs are common and important existing techniques to detect atomicity violation bugs suffer from one limitation requiring bugs to manifest during monitored runs which is an open problem in concurrent program testing this paper makes two contributions first it studies the interleaving characteristics of the common practice in concurrent program testing ie running program over and over to understand why atomicity violation bugs are hard to expose second it proposes ctrigger to effectively and efficiently expose atomicity violation bugs in large programs ctrigger focuses on special type of interleavings ie unserializable interleavings that are inherently correlated to atomicity violation bugs and uses trace analysis to systematically identify likely feasible unserializable interleavings with low occurrence probability ctrigger then uses minimum execution perturbation to exercise low probability interleavings and expose difficult to catch atomicity violations we evaluate ctrigger with real world atomicity violation bugs from four server desktop applications apache mysql mozilla and pbzip and three splash applications on core machines ctrigger efficiently exposes the tested bugs within seconds two to four orders of magnitude faster than stress testing without ctrigger some of these bugs do not manifest even after full days of stress testing in addition without deterministic replay support once bug is exposed ctrigger can help programmers reliably reproduce it for diagnosis our tested bugs are reproduced by ctrigger mostly within seconds to over times faster than stress testing
we study oblivious routing in which the packet paths are constructed independently of each other we give simple oblivious routing algorithm for geometric networks in which the nodes are embedded in the euclidean plane in our algorithm packet path is constructed by first choosing random intermediate node in the space between the source and destination and then the packet is sent to its destination through the intermediate node we analyze the performance of the algorithm in terms of the stretch and congestion of the resulting paths we show that the stretch is constant and the congestion is near optimal when the network paths can be chosen to be close to the geodesic lines that connect the end points of the paths we give applications of our general result to the mesh topology and uniformly distributed disc graphs previous oblivious routing algorithms with near optimal congestion use many intermediate nodes and do not control the stretch
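A sketch of the path construction step under stated assumptions: the intermediate node is drawn uniformly from the disk whose diameter is the source-destination segment, and a user-supplied greedy next-hop rule stands in for routing along near-geodesic paths; the mesh example and hop rule are invented for illustration and are not the paper's analysis setting.

```python
import math
import random

def nearest_node(nodes, point):
    return min(nodes, key=lambda v: math.dist(v, point))

def oblivious_path(nodes, src, dst, hop):
    """Route src -> random intermediate node -> dst on a geometric network.
    hop(u, target) must return u's neighbor chosen greedily toward target."""
    cx, cy = (src[0] + dst[0]) / 2, (src[1] + dst[1]) / 2
    r = math.dist(src, dst) / 2
    ang, rad = random.uniform(0, 2 * math.pi), r * math.sqrt(random.random())
    mid = nearest_node(nodes, (cx + rad * math.cos(ang), cy + rad * math.sin(ang)))
    path, cur = [src], src
    for target in (mid, dst):
        while cur != target:
            cur = hop(cur, target)
            path.append(cur)
    return path

# Example on a 5x5 mesh: the greedy next hop moves one step toward the target.
mesh = [(x, y) for x in range(5) for y in range(5)]
def mesh_hop(u, target):
    x, y = u
    if x != target[0]:
        return (x + (1 if target[0] > x else -1), y)
    return (x, y + (1 if target[1] > y else -1))

print(oblivious_path(mesh, (0, 0), (4, 4), mesh_hop))
```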
hardly predictable data addresses in many irregular applications have rendered prefetching ineffective in many cases the only accurate way to predict these addresses is to directly execute the code that generates them as multithreaded architectures become increasingly popular one attractive approach is to use idle threads on these machines to perform pre execution essentially combined act of speculative address generation and prefetching to accelerate the main thread in this paper we propose such pre execution technique for simultaneous multithreading smt processors by using software to control pre execution we are able to handle some of the most important access patterns that are typically difficult to prefetch compared with existing work on pre execution our technique is significantly simpler to implement eg no integration of pre execution results no need of shortening programs for pre execution and no need of special hardware to copy register values upon thread spawns consequently only minimal extensions to smt machines are required to support our technique despite its simplicity our technique offers an average speedup of in set of irregular applications which is speedup over state of the art software controlled prefetching
we propose texture function for realistic modeling and efficient rendering of materials that exhibit surface mesostructures translucency and volumetric texture variations the appearance of such complex materials for dynamic lighting and viewing directions is expensive to calculate and requires an impractical amount of storage to precompute to handle this problem our method models an object as shell layer formed by texture synthesis of volumetric material sample and homogeneous inner core to facilitate computation of surface radiance from the shell layer we introduce the shell texture function stf which describes voxel irradiance fields based on precomputed fine level light interactions such as shadowing by surface mesostructures and scattering of photons inside the object together with diffusion approximation of homogeneous inner core radiance the stf leads to fast and detailed raytraced renderings of complex materials
in this paper we present new method for alignment of models this approach is based on two types of symmetries of the models the reflective symmetry and the local translational symmetry along direction inspired by the work on the principal component analysis pca we select the best optimal alignment axes within the pca axes the plane reflection symmetry being used as selection criterion this pre processing transforms the alignment problem into an indexing scheme based on the number of the retained pca axes in order to capture the local translational symmetry of shape along direction we introduce new measure we call the local translational invariance cost ltic the mirror planes of model are also used to reduce the number of candidate coordinate frames when looking for the one which corresponds to the user’s perception experimental results show that the proposed method finds the rotation that best aligns mesh
we present new approach for efficient collision handling of meshless objects undergoing geometric deformation the presented technique is based on bounding sphere hierarchies we show that information of the geometric deformation model can be used to significantly accelerate the hierarchy update the cost of the presented hierarchy update depends on the number of primitives in close proximity but not on the total number of primitives further the hierarchical collision detection is combined with level of detail response scheme since the collision response can be performed on any level of the hierarchy it allows for balancing accuracy and efficiency thus the collision handling scheme is particularly useful for time critical applications
consider set of servers and set of users where each server has coverage region ie an area of service and capacity ie maximum number of users it can serve our task is to assign every user to one server subject to the coverage and capacity constraints to offer the highest quality of service we wish to minimize the average distance between users and their assigned server this is an instance of well studied problem in operations research termed optimal assignment even though there exist several solutions for the static case where user locations are fixed there is currently no method for dynamic settings in this paper we consider the continuous assignment problem cap where an optimal assignment must be constantly maintained between mobile users and set of servers the fact that the users are mobile necessitates real time reassignment so that the quality of service remains high ie their distance from their assigned servers is minimized the large scale and the time critical nature of targeted applications require fast cap solutions we propose an algorithm that utilizes the geometric characteristics of the problem and significantly accelerates the initial assignment computation and its subsequent maintenance our method applies to different cost functions eg average squared distance and to any minkowski distance metric eg euclidean norm etc
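For the static instance, one workable sketch replicates each server into as many slots as its capacity and solves the resulting assignment with scipy's Hungarian-style solver; it assumes total capacity covers all users and ignores coverage regions (which could be modeled by assigning a prohibitive cost outside a server's region). This is not the paper's continuous maintenance algorithm, only the baseline optimal assignment it maintains.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_users(users, servers):
    """users: list of (x, y); servers: list of dicts {'pos': (x, y), 'capacity': int}.
    Returns user index -> server index, minimizing total (hence average) distance."""
    # Expand every server into 'capacity' identical slots.
    slots = [(j, np.asarray(s["pos"])) for j, s in enumerate(servers)
             for _ in range(s["capacity"])]
    cost = np.array([[np.linalg.norm(np.asarray(u) - pos) for _, pos in slots]
                     for u in users])
    rows, cols = linear_sum_assignment(cost)       # optimal matching of users to slots
    return {int(r): slots[c][0] for r, c in zip(rows, cols)}

users = [(0, 0), (1, 0), (9, 9), (10, 9)]
servers = [{"pos": (0, 1), "capacity": 2}, {"pos": (10, 10), "capacity": 2}]
print(assign_users(users, servers))   # {0: 0, 1: 0, 2: 1, 3: 1}
```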
demand driven spectrum allocation can drastically improve performance for wifi access points struggling under increasing user demands while their frequency agility makes cognitive radios ideal for this challenge performing adaptive spectrum allocation is complex and difficult process in this work we propose flex an efficient spectrum allocation architecture that efficiently adapts to dynamic traffic demands flex tunes network wide spectrum allocation by access points coordinating with peers minimizing network resets through local adaptations through detailed analysis and experimental evaluation we show that flex converges quickly provides users with proportional fair spectrum usage and significantly outperforms existing spectrum allocation proposals
this paper describes the techniques used to optimize relational queries in the sdd distributed database system queries are submitted to sdd in high level procedural language called datalanguage optimization begins by translating each datalanguage query into relational calculus form called an envelope which is essentially an aggregate free quel query this paper is primarily concerned with the optimization of envelopes envelopes are processed in two phases the first phase executes relational operations at various sites of the distributed database in order to delimit subset of the database that contains all data relevant to the envelope this subset is called reduction of the database the second phase transmits the reduction to one designated site and the query is executed locally at that site the critical optimization problem is to perform the reduction phase efficiently success depends on designing good repertoire of operators to use during this phase and an effective algorithm for deciding which of these operators to use in processing given envelope against given database the principal reduction operator that we employ is called semijoin in this paper we define the semijoin operator explain why semijoin is an effective reduction operator and present an algorithm that constructs cost effective program of semijoins given an envelope and database
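A toy illustration of the semijoin operator on in-memory relations: R semijoin S keeps exactly the R tuples that join with S, and only the join-column values of S need to cross the network, which is why it works as a reduction operator. The relation and attribute names are invented for the example.

```python
def semijoin(r, s, attr):
    """R semijoin S on attr: keep R tuples whose attr value appears in S.
    Only the projection of S on attr has to be transmitted to R's site."""
    s_keys = {t[attr] for t in s}          # the only data shipped across the network
    return [t for t in r if t[attr] in s_keys]

# Orders are stored at site A, customers with open accounts at site B.
orders = [{"cust": 1, "item": "disk"}, {"cust": 2, "item": "tape"}, {"cust": 3, "item": "cpu"}]
open_accounts = [{"cust": 1}, {"cust": 3}]

reduced = semijoin(orders, open_accounts, "cust")
print(reduced)   # orders irrelevant to the query never move to the assembly site
```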
the effectiveness of information retrieval ir systems is influenced by the degree of term overlap between user queries and relevant documents query document term mismatch whether partial or total is fact that must be dealt with by ir systems query expansion qe is one method for dealing with term mismatch ir systems implementing query expansion are typically evaluated by executing each query twice with and without query expansion and then comparing the two result sets while this measures an overall change in performance it does not directly measure the effectiveness of ir systems in overcoming the inherent issue of term mismatch between the query and relevant documents nor does it provide any insight into how such systems would behave in the presence of query document term mismatch in this paper we propose new approach for evaluating query expansion techniques the proposed approach is attractive because it provides an estimate of system performance under varying degrees of query document term mismatch it makes use of readily available test collections and it does not require any additional relevance judgments or any form of manual processing
this paper considers the problem of self diagnosis of wireless and mobile ad hoc networks manets using the comparison approach in this approach network manet consists of collection of independent heterogeneous mobile or stationary hosts interconnected via wireless links and it is assumed that at most of these hosts are faulty in order to diagnose the state of the manet tasks are assigned to pairs of hosts and the outcomes of these tasks are compared the agreements and disagreements between the hosts are the basis for identifying the faulty ones the comparison approach is believed to be one of the most practical fault identification approaches for diagnosing hard and soft faults we develop new distributed self diagnosis protocol called dynamic dsdp for manets that identifies both hard and soft faults in finite amount of time the protocol is constructed on top of reliable multi hop architecture correctness and complexity proofs are provided and they show that our dynamic dsdp performs better from communication complexity viewpoint than the existing protocols we have also developed simulator that is scalable to large number of nodes using the simulator we carried out simulation study to analyze the effectiveness of the self diagnosis protocol and its performance with regards to the number of faulty hosts the simulation results show that the proposed approach is an attractive and viable alternative or addition to present fault diagnosis techniques in manet environments
number of recent technological trends have made data intensive applications such as continuous media audio and video servers reality these servers are expected to play an important role in applications such as video on demand digital library news on demand distance learning etc continuous media applications are data intensive and might require storage subsystems that consist of hundreds of multi zone disk drives with the current technological trends homogeneous disk subsystem might evolve to consist of heterogeneous collection of disk drives given such storage subsystem the system must continue to support hiccup free display of audio and video clips this study describes extensions of four continuous display techniques for multi zone disk drives to heterogeneous platform these techniques include ibm’s logical track hp’s track pairing and usc’s fixb and deadline driven techniques we quantify the performance tradeoff associated with these techniques using analytical models and simulation studies the obtained results demonstrate tradeoffs between the cost per simultaneous stream supported by technique the wasted disk space and the incurred startup latency
packet classification categorizes incoming packets into multiple forwarding classes in router based on predefined filters it is important in fulfilling the requirements of differentiated services to achieve fast packet classification new approach namely filter rephrasing is proposed to encode the original filters by exploiting the hierarchical property of the filters filter rephrasing could dramatically reduce the search and storage complexity incurred in packet classification we incorporate well known scheme rectangle search with filter rephrasing to improve the lookup speed by at least factor of and decreases of the storage expenses as compared with other existing schemes the proposed scheme exhibits better balance between speed storage and computation complexity consequently the scalable effect of filter rephrasing is suitable for backbone routers with great number of filters
recent work on incremental crawling has enabled the indexed document collection of search engine to be more synchronized with the changing world wide web however this synchronized collection is not immediately searchable because the keyword index is rebuilt from scratch less frequently than the collection can be refreshed an inverted index is usually used to index documents crawled from the web complete index rebuild at high frequency is expensive previous work on incremental inverted index updates have been restricted to adding and removing documents updating the inverted index for previously indexed documents that have changed has not been addressed in this paper we propose an efficient method to update the inverted index for previously indexed documents whose contents have changed our method uses the idea of landmarks together with the diff algorithm to significantly reduce the number of postings in the inverted index that need to be updated our experiments verify that our landmark diff method results in significant savings in the number of update operations on the inverted index
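A simplified illustration of the selective-update bookkeeping, using Python's difflib in place of the diff algorithm: only terms occurring in changed regions have their postings rebuilt. Position stability for the untouched terms is assumed here (the same-length edit in the example guarantees it); in general that stability is exactly what the paper's landmark indirection provides, which this sketch does not implement.

```python
import difflib
from collections import defaultdict

def postings(tokens, doc_id):
    idx = defaultdict(list)
    for pos, term in enumerate(tokens):
        idx[term].append((doc_id, pos))
    return idx

def update_postings(index, doc_id, old_tokens, new_tokens):
    """Update an inverted index for a changed document, touching only terms
    that occur in regions reported as changed by the diff."""
    sm = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    changed_terms = set()
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            changed_terms.update(old_tokens[i1:i2])
            changed_terms.update(new_tokens[j1:j2])
    # Drop stale postings for the affected terms only, then re-add them.
    for term in changed_terms:
        index[term] = [p for p in index.get(term, []) if p[0] != doc_id]
    for pos, term in enumerate(new_tokens):
        if term in changed_terms:
            index[term].append((doc_id, pos))
    return changed_terms

old = "the quick brown fox jumps over the lazy dog".split()
new = "the quick red fox jumps over the lazy dog".split()
index = postings(old, doc_id=7)
print(update_postings(index, 7, old, new))   # only 'brown' and 'red' are touched
```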
despite improvements in network interfaces and software messaging layers software communication overhead still dominates the hardware routing cost in most systems in this study we identify the sources of this overhead by analyzing software costs of typical communication protocols built atop the active messages layer on the cm we show that up to of the software messaging costs are direct consequence of the gap between specific network features such as arbitrary delivery order finite buffering and limited fault handling and the user communication requirements of in order delivery end to end flow control and reliable transmission however virtually all of these costs can be eliminated if routing networks provide higher level services such as in order delivery end to end flow control and packet level fault tolerance we conclude that significant cost reductions require changing the constraints on messaging layers we propose designing networks and network interfaces which simplify or replace software for implementing user communication requirements
the aim of this study was to empirically evaluate an embodied conversational agent called greta in an effort to answer two main questions what are the benefits and costs of presenting information via an animated agent with certain characteristics in persuasion task compared to other forms of display how important is it that emotional expressions are added in way that is consistent with the content of the message in animated agents to address these questions positively framed healthy eating message was created which was variously presented via greta matched human actor greta’s voice only no face or as text only furthermore versions of greta were created which displayed additional emotional facial expressions in way that was either consistent or inconsistent with the content of the message overall it was found that although greta received significantly higher ratings for helpfulness and likability presenting the message via greta led to the poorest memory performance among users importantly however when greta’s additional emotional expressions were consistent with the content of the verbal message the negative effect on memory performance disappeared overall the findings point to the importance of achieving consistency in animated agents
scalable busy wait synchronization algorithms are essential for achieving good parallel program performance on large scale multiprocessors such algorithms include mutual exclusion locks reader writer locks and barrier synchronization unfortunately scalable synchronization algorithms are particularly sensitive to the effects of multiprogramming their performance degrades sharply when processors are shared among different applications or even among processes of the same application in this paper we describe the design and evaluation of scalable scheduler conscious mutual exclusion locks reader writer locks and barriers and show that by sharing information across the kernel application interface we can improve the performance of scheduler oblivious implementations by more than an order of magnitude
in shared memory multiprocessor system it may be more efficient to schedule task on one processor than on another if relevant data already reside in particular processor’s cache the effects of this type of processor affinity are examined it is observed that tasks continuously alternate between executing at processor and releasing this processor due to synchronization quantum expiration or preemption queuing network models of different abstract scheduling policies are formulated spanning the range from ignoring affinity to fixing tasks on processors these models are solved via mean value analysis where possible and by simulation otherwise an analytic cache model is developed and used in these scheduling models to include the effects of an initial burst of cache misses experienced by tasks when they return to processor for execution mean value technique is also developed and used in the scheduling models to include the effects of increased bus traffic due to these bursts of cache misses only a small amount of affinity information needs to be maintained for each task the importance of having policy that adapts its behavior to changes in system load is demonstrated
many modern applications have significant operating system os component the os execution affects various architectural states including the dynamic branch predictions which are widely used in today’s high performance microprocessor designs to improve performance this impact tends to become more significant as the designs become more deeply pipelined and more speculative in this paper we focus on the issues of understanding the os effects on the branch predictions and designing architectural support to alleviate the bottlenecks that are created by misprediction in this work we characterize the control flow transfer of several emerging applications on commercial os it was observed that the exception driven intermittent invocation of os code and user os branch history interference increased misprediction in both user and kernel code we propose two simple os aware control flow prediction techniques to alleviate the destructive impact of user os branch interference the first consists of capturing separate branch correlation information for user and kernel code the second involves using separate branch prediction tables for user and kernel code we demonstrate in this paper that os aware branch predictions require minimal hardware modifications and additions moreover the os aware branch predictions can be integrated with many existing schemes to further improve their performance we studied the improvement contributed by os aware techniques to various branch prediction schemes ranging from the simple gshare to the more advanced agree multi hybrid and bi mode predictors on the entry predictors incorporating the os aware techniques yields up to percent percent percent and percent prediction accuracy improvement on the gshare multi hybrid agree and bi mode predictors respectively
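A toy simulation of the second proposed technique, keeping a separate global history register and pattern table for user and kernel branches so that OS-induced branches do not pollute user-mode state; the table size, 2-bit counters, and the synthetic branch stream are generic choices for illustration, not the paper's configuration or results.

```python
class SplitGshare:
    """Gshare with separate history and 2-bit counter tables per privilege mode,
    so kernel branches do not interfere with user-mode prediction state."""
    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.tables = {m: [1] * (1 << bits) for m in ("user", "kernel")}  # weakly not-taken
        self.history = {m: 0 for m in ("user", "kernel")}

    def _index(self, pc, mode):
        return (pc ^ self.history[mode]) & self.mask

    def predict(self, pc, mode):
        return self.tables[mode][self._index(pc, mode)] >= 2   # True = predict taken

    def update(self, pc, mode, taken):
        i = self._index(pc, mode)
        ctr = self.tables[mode][i]
        self.tables[mode][i] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        self.history[mode] = ((self.history[mode] << 1) | int(taken)) & self.mask

# Interleaved user/kernel branch stream: kernel activity no longer disturbs user history.
bp = SplitGshare()
stream = [(0x400100, "user", True), (0xFFFF80, "kernel", False)] * 200
correct = sum(bp.predict(pc, mode) == taken or bp.update(pc, mode, taken) or False
              for pc, mode, taken in stream if (bp.update(pc, mode, taken) is None) or True)
# the one-liner above is hard to read; the straightforward loop is clearer:
bp = SplitGshare()
correct = 0
for pc, mode, taken in stream:
    correct += bp.predict(pc, mode) == taken
    bp.update(pc, mode, taken)
print(f"{correct}/{len(stream)} correct")
```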
many recent studies have convincingly demonstrated that network traffic exhibits noticeable self similar nature which has considerable impact on queuing performance however the networks used in current multicomputers have been primarily designed and analyzed under the assumption of the traditional poisson arrival process which is inherently unable to capture traffic self similarity consequently it is crucial to reexamine the performance properties of multicomputer networks in the context of more realistic traffic models before practical implementations show their potential faults in an effort toward this end this paper proposes the first analytical model for wormhole switched k ary n cubes in the presence of self similar traffic simulation experiments demonstrate that the proposed model exhibits good degree of accuracy for various system sizes and under different operating conditions the analytical model is then used to investigate the implications of traffic self similarity on network performance this study reveals that the network suffers considerable performance degradation when subjected to self similar traffic stressing the great need for improving network performance to ensure efficient support for this type of traffic
actions usually taken to prevent processors from overheating such as decreasing the frequency or stopping the execution flow also degrade performance multiprocessor systems however offer the possibility of moving the task that caused cpu to overheat away to some other cooler cpu so throttling becomes only last resort taken if all of system’s processors are hot additionally the scheduler can take advantage of the energy characteristics of individual tasks and distribute hot tasks as well as cool tasks evenly among all cpus this work presents mechanism for determining the energy characteristics of tasks by means of event monitoring counters and an energy aware scheduling policy that strives to assign tasks to cpus in way that avoids overheating individual cpus our evaluations show that the benefit of avoiding throttling outweighs the overhead of additional task migrations and that energy aware scheduling in many cases increases the system’s throughput
in recent years computer and internet technologies have broadened the ways that people can stay in touch through interviews with parents and grandparents we examined how people use existing technologies to communicate and share with their extended family while most of our participants expressed desire for more communication and sharing with their extended family many felt that an increase would realistically be difficult to achieve due to challenges such as busy schedules or extended family members lack of technology use our results also highlight the complexity of factors that researchers and designers must understand when attempting to design technology to support and enhance relationships including trade offs between facilitating interaction while minimizing new obligations reducing effort without trivializing communication and balancing awareness with privacy
we use realistic interdomain routing experiment platform to conduct real time attack and defense exercises for training purposes our interdomain routing experiment platform integrates open source router software real time network simulation and light weight machine virtualization technologies and is capable of supporting realistic large scale routing experiments the network model used consists of major autonomous systems connecting swedish internet users with realistic routing configurations derived from the routing registry we conduct series of real time security exercises on this routing system to study the consequence of intentionally propagating false routing information on interdomain routing and the effectiveness of corresponding defensive measures we describe three kinds of simplistic bgp attacks in the context of security exercises designed specifically for training purposes while an attacker can launch attacks from compromised router by changing its routing policies administrators will be able to observe the adverse effect of these attacks and subsequently apply appropriate defensive measures to mitigate their impact such as installing filtering rules these exercises all carried out in real time demonstrate the feasibility of routing experiments using the real time routing experiment platform
reprogramming of sensor networks is an important and challenging problem as it is often necessary to reprogram the sensors in place in this article we propose mnp multihop reprogramming service designed for sensor networks one of the problems in reprogramming is the issue of message collision to reduce the problem of collision we propose sender selection algorithm that attempts to guarantee that in given neighborhood there is at most one source transmitting the program at time furthermore our sender selection is greedy in that it tries to select the sender that is expected to have the most impact we use pipelining to enable fast data propagation mnp is energy efficient because it reduces the active radio time of sensor node by putting the node into sleep state when its neighbors are transmitting segment that is not of interest we call this type of sleep contention sleep to further reduce the energy consumption we add noreq sleep where sensor node goes to sleep if none of its neighbors is interested in receiving the segment it is advertising we also introduce an optional init sleep to reduce the energy consumption in the initial phase of reprogramming finally we investigate the performance of mnp in different network settings
this paper presents scuba secure code update by attestation for detecting and recovering compromised nodes in sensor networks the scuba protocol enables the design of sensor network that can detect compromised nodes without false negatives and either repair them through code updates or revoke the compromised nodes the scuba protocol represents promising approach for designing secure sensor networks by proposing first approach for automatic recovery of compromised sensor nodes the scuba protocol is based on ice indisputable code execution primitive we introduce to dynamically establish trusted code base on remote untrusted sensor node
recently ubiquitous location based services lbs has been utilized in variety of practical and mission critical applications such as security services personalization services location based entertainment and location based commerce the essence of lbs is actually the concept of location awareness where location aware devices perform more intelligent services for users by utilizing their locations in order to realize this concept mobile device should continuously monitor the real time contextual changes of user this is what we call location based monitoring in this paper we discuss the research and technical issues on designing location based real time monitoring systems in lbs along with three major subjects high performance spatial index monitoring query processing engine and distributed monitoring system with dynamic load balancing capability and energy efficient management
when program violates its specification model checker produces counterexample that shows an example of undesirable behavior it is up to the user to understand the error locate it and fix the problem previous work introduced technique for explaining and localizing errors based on finding the closest execution to counterexample with respect to distance metric that approach was applied only to concrete executions of programs this paper extends and generalizes the approach by combining it with predicate abstraction using an abstract state space increases scalability and makes explanations more informative differences between executions are presented in terms of predicates derived from the specification and program rather than specific changes to variable values reasoning to the cause of an error from the fact that in the failing run but in the successful execution is easier than reasoning from the information that in the failing run but in the successful execution an abstract explanation is automatically generalized predicate abstraction has previously been used in model checking purely as state space reduction technique however an abstraction good enough to enable model checking tool to find an error is also likely to be useful as an automatically generated high level description of state space suitable for use by programmers results demonstrating the effectiveness of abstract explanations support this claim
we address some crucial problem associated with text categorization local feature selection it seems that intuitionistic fuzzy sets can be an effective and efficient tool making it possible to assess each term from feature set for each category from point of view of both its indicative and non indicative ability it is important especially for high dimensional problems to improve text filtering via confident rejection of non relevant documents moreover we indicate that intuitionistic fuzzy sets are good tool for the classification of imbalanced and overlapping classes commonly encountered case in text categorization
steady progress in the development of optical disc technology over the past decade has brought it to the point where it is beginning to compete directly with magnetic disc technology worm optical discs in particular which permanently register information on the disc surface have significant advantages over magnetic technology for applications that are mainly archival in nature but require the ability to do frequent on line insertions in this paper we propose class of access methods that use rewritable storage for the temporary buffering of insertions to data sets stored on worm optical discs and we examine the relationship between the retrieval performance from worm optical discs and the utilization of disc storage space when one of these organizations is employed we describe the performance trade off as one of fast sequential retrieval of the contents of block versus wasted space owing to data replication model of specific instance of such an organization buffered hash file scheme is described that allows for the specification of retrieval performance objectives alternative strategies for managing data replication that allow trade offs between higher consumption rates and better average retrieval performance are also described we then provide an expected value analysis of the amount of disc space that must be consumed on worm disc to meet specified performance limits the analysis is general enough to allow easy extension to other types of buffered file systems for worm optical discs
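a toy sketch of the buffering idea: insertions accumulate in rewritable storage and are flushed as append-only blocks to the worm volume, so lookups trade a longer sequential scan against consumed write-once space; bucket count, flush threshold and the flush policy are invented for illustration and are not the paper's buffered hash file scheme

```python
# toy sketch of buffering insertions in rewritable storage and flushing them as
# appended write-once blocks on a WORM volume (parameters are assumptions)
class BufferedWormHashFile:
    def __init__(self, n_buckets=8, flush_threshold=4):
        self.n_buckets = n_buckets
        self.flush_threshold = flush_threshold
        self.buffer = {b: [] for b in range(n_buckets)}        # rewritable buffer
        self.worm_blocks = {b: [] for b in range(n_buckets)}   # immutable appended blocks

    def _bucket(self, key):
        return hash(key) % self.n_buckets

    def insert(self, key, value):
        b = self._bucket(key)
        self.buffer[b].append((key, value))
        if len(self.buffer[b]) >= self.flush_threshold:
            # write one new block to the WORM disc; existing blocks are never rewritten
            self.worm_blocks[b].append(tuple(self.buffer[b]))
            self.buffer[b] = []

    def lookup(self, key):
        b = self._bucket(key)
        # one sequential scan of the bucket's WORM blocks plus the buffer; more blocks
        # per bucket means slower retrieval but less space spent on reorganization
        for block in self.worm_blocks[b]:
            for k, v in block:
                if k == key:
                    return v
        for k, v in self.buffer[b]:
            if k == key:
                return v
        return None

f = BufferedWormHashFile()
for i in range(10):
    f.insert(f"key{i}", i)
print(f.lookup("key3"), len(f.worm_blocks[f._bucket("key3")]))
```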
there are many applications for which it is necessary to illustrate motion in static image using visual cues which do not represent physical entity in the scene yet are widely understood to convey motion for example consider the task of illustrating the desired movements for exercising dancing or given sport technique traditional artists have developed techniques to specify desired movements precisely as technical illustrators do and to suggest motion in an image as cartoonists do in this paper we present an interactive system to synthesize image of an animated character by generating artist inspired motion cues derived from skeletal motion capture data the primary cues include directed arrows noise waves and stroboscopic motion first the user decomposes the animation into short sequences containing individual motions which can be represented by visual cues the system then allows the user to determine suitable viewpoint for illustrating the movement to select the proper level in the joint hierarchy as well as to fine tune various controls for the depiction of the cues themselves while the system does provide adapted default values for each control extracted from the motion capture data it allows fine tuning for greater expressiveness moreover these cues are drawn in real time and maintain coherent display with changing viewpoints we demonstrate the benefit of our interactive system on various motion capture sequences
in many data mining and machine learning problems the data items that need to be clustered or classified are not arbitrary points in high dimensional space but are distributions that is points on high dimensional simplex for distributions natural measures are not ell distances but information theoretic measures such as the kullback leibler and hellinger divergences similarly quantities such as the entropy of distribution are more natural than frequency moments efficient estimation of these quantities is key component in algorithms for manipulating distributions since the datasets involved are typically massive these algorithms need to have only sublinear complexity in order to be feasible in practice we present range of sublinear time algorithms in various oracle models in which the algorithm accesses the data via an oracle that supports various queries in particular we answer question posed by batu et al on testing whether two distributions are close in an information theoretic sense given independent samples we then present optimal algorithms for estimating various information divergences and entropy with more powerful oracle called the combined oracle that was also considered by batu et al finally we consider sublinear space algorithms for these quantities in the data stream model in the course of doing so we explore the relationship between the aforementioned oracle models and the data stream model this continues work initiated by feigenbaum et al an important additional component to the study is considering data streams that are ordered randomly rather than just those which are ordered adversarially
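for reference, the quantities being estimated can be computed naively from empirical distributions as below; this plug-in baseline only illustrates what entropy and kl divergence measure and is not the sublinear-time oracle or streaming algorithms the paper develops

```python
# plug-in estimates of entropy and kl divergence from independent samples;
# a correctness baseline only, not a sublinear algorithm
import math
import random
from collections import Counter

def empirical(samples):
    c = Counter(samples)
    n = len(samples)
    return {x: k / n for x, k in c.items()}

def entropy(p):
    return -sum(px * math.log(px) for px in p.values() if px > 0)

def kl(p, q, eps=1e-12):
    support = set(p) | set(q)
    return sum(p.get(x, 0) * math.log((p.get(x, 0) + eps) / (q.get(x, 0) + eps))
               for x in support if p.get(x, 0) > 0)

random.seed(0)
a = [random.choice("aab") for _ in range(5000)]   # roughly (2/3, 1/3)
b = [random.choice("abb") for _ in range(5000)]   # roughly (1/3, 2/3)
pa, pb = empirical(a), empirical(b)
print(round(entropy(pa), 3), round(kl(pa, pb), 3))
```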
secure evaluation of private functions pf sfe allows two parties to compute private function which is known by one party only on private data of both it is known that pf sfe can be reduced to secure function evaluation sfe of universal circuit uc previous uc constructions only simulated circuits with gates of inputs while gates with inputs were decomposed into many gates with inputs which is inefficient for large as the size of uc heavily depends on the number of gates we present generalized uc constructions to efficiently simulate any circuit with gates of inputs having efficient circuit representation our constructions are non trivial generalizations of previously known uc constructions as an application we show how to securely evaluate private functions such as neural networks nn which are increasingly used in commercial applications our provably secure pf sfe protocol needs only one round in the semi honest model or even no online communication at all using non interactive oblivious transfer and evaluates generalized uc that entirely hides the structure of the private nn this enables applications like privacy preserving data classification based on private nns without trusted third party while simultaneously protecting user’s data and nn owner’s intellectual property
the recognition of program constructs that are frequently used by software developers is powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code the development of techniques for automatic recognition of computational kernels such as inductions reductions and array recurrences has been an intensive research area in the scope of compiler technology this article presents new compiler framework that unlike previous techniques that focus on specific and isolated kernels recognizes comprehensive collection of computational kernels that appear frequently in full scale real applications the xark compiler operates on top of the gated single assignment gsa form of high level intermediate representation ir of the source code recognition is carried out through demand driven analysis of this high level ir at two different levels first the dependences between the statements that compose the strongly connected components sccs of the data dependence graph of the gsa form are analyzed as result of this intra scc analysis the computational kernels corresponding to the execution of the statements of the sccs are recognized second the dependences between statements of different sccs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code overall the xark compiler builds hierarchical representation of the source code as kernels and dependence relationships between those kernels this article describes in detail the collection of computational kernels recognized by the xark compiler besides the internals of the recognition algorithms are presented the design of the algorithms makes it possible to extend the recognition capabilities of xark to cope with new kernels and provides an advanced symbolic analysis framework to run other compiler techniques on demand finally extensive experiments showing the effectiveness of xark for collection of benchmarks from different application domains are presented in particular the sparskit ii library for the manipulation of sparse matrices the perfect benchmarks the spec cpu collection and the pltmg package for solving elliptic partial differential equations are analyzed in detail
in this paper we initiate the study of the approximability of the facility location problem in distributed setting in particular we explore trade off between the amount of communication and the resulting approximation ratio we give distributed algorithm that for every constant achieves an m k log approximation in communication rounds where message size is bounded to log bits the number of facilities and clients are and respectively and is coefficient that depends on the cost values of the instance our technique is based on distributed primal dual approach for approximating linear program that does not form covering or packing program
risps reconfigurable instruction set processors are increasingly becoming popular as they can be customized to meet design constraints however existing instruction set customization methodologies do not lend themselves well to mapping custom instructions onto commercial fpga architectures in this paper we propose design exploration framework that provides for rapid identification of reduced set of profitable custom instructions and their area costs on commercial architectures without the need for time consuming hardware synthesis process novel clustering strategy is used to estimate the utilization of the lut look up table based fpgas for the chosen custom instructions our investigations show that the area costs computed for custom instructions using the proposed hardware estimation technique are within of those obtained using hardware synthesis systematic approach has been adopted to select the most profitable custom instruction candidates our investigations show that this leads to notable reduction in the number of custom instructions with only marginal degradation in performance simulations based on domain specific application sets from the mibench and mediabench benchmark suites show that on average more than area utilization efficiency performance per area can be achieved with the proposed technique
the web has been rapidly deepened by myriad searchable databases online where data are hidden behind query forms helping users query alternative deep web sources in the same domain eg books airfares is an important task with broad applications as core component of those applications dynamic query translation ie translating user’s query across dynamically selected sources has not been extensively explored while existing works focus on isolated subproblems eg schema matching query rewriting we target building a complete query translator and thus face new challenges to complete the translator we need to solve the predicate mapping problem ie map source predicate to target predicates which is largely unexplored by existing works to satisfy our application requirements we need to design customizable system architecture to assemble various components addressing respective subproblems ie schema matching predicate mapping query rewriting tackling these challenges we develop light weight domain based form assistant which can generally handle alternative sources in the same domain and is easily customizable to new domains our experiment shows the effectiveness of our form assistant in translating queries for real web sources
we present two flexible internet protocol ip router hardware hw architectures that enable router to readily expand in capacity according to network traffic volume growth and reconfigure its functionalities according to the hierarchical network layer in which it is placed reconfigurability is effectuated by novel method called methodology for hardware unity and an associated functional unit special processing agent which can be built using state of the art technology more specifically reconfiguration between an edge and hub or backbone router can be done rapidly via simple open close connection approach such architectures may among other benefits significantly extend the intervals between router hw upgrades for internet services providers they can serve as the basis for development of the next generation ip routers and are directly applicable to the emerging concept of single layer ip network architecture
interactive cross language information retrieval clir process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other this paper describes an approach that employs user assisted query translation to help searchers better understand the system’s operation supporting interaction and interface designs are introduced and results from three user studies are presented the results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques and that reported satisfaction with support for cross language searching increased the paper concludes with description of freely available interactive clir system that incorporates lessons learned from this research
using mixture of random variables to model data is tried and tested method common in data mining machine learning and statistics by using mixture modeling it is often possible to accurately model even complex multimodal data via very simple components however the classical mixture model assumes that data point is generated by single component in the model lot of datasets can be modeled closer to the underlying reality if we drop this restriction we propose probabilistic framework the mixture of subsets mos model by making two fundamental changes to the classical mixture model first we allow data point to be generated by set of components rather than just single component next we limit the number of data attributes that each component can influence we also propose an em framework to learn the mos model from dataset and experimentally evaluate it on real high dimensional datasets our results show that the mos model learned from the data represents the underlying nature of the data accurately
as the number and size of large timestamped collections eg sequences of digitized newspapers periodicals blogs increase the problem of efficiently indexing and searching such data becomes more important term burstiness has been extensively researched as mechanism to address event detection in the context of such collections in this paper we explore how burstiness information can be further utilized to enhance the search process we present novel approach to model the burstiness of term using discrepancy theory concepts this allows us to build parameter free linear time approach to identify the time intervals of maximum burstiness for given term finally we describe the first burstiness driven search framework and thoroughly evaluate our approach in the context of different scenarios
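as a correctness baseline, the time interval of maximum average term frequency with a minimum length can be found by brute force over prefix sums; the paper's contribution is obtaining such intervals in linear time and parameter free, which this quadratic sketch does not attempt

```python
# reference (quadratic) computation of the interval of length >= min_len with
# maximum average term frequency; a correctness baseline, not the linear-time method
def max_avg_interval(counts, min_len):
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    best = (float("-inf"), 0, 0)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            avg = (prefix[j] - prefix[i]) / (j - i)
            if avg > best[0]:
                best = (avg, i, j)
    return best   # (average, start, end) with end exclusive

# daily mentions of a term; the burst sits around days 4-6
counts = [1, 0, 2, 1, 9, 12, 10, 1, 0, 2]
print(max_avg_interval(counts, min_len=3))   # -> (10.33..., 4, 7)
```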
wireless local area networks wlans have become commonplace addition to the normal environments surrounding us based on ieee technology wlans can now be found in the working place at homes and in many cities central district area as open or commercial services these access points in the public areas are called hotspots they provide internet access in various types of public places such as shopping districts cafés airports and shops as the hotspots are being used by growing user base that is also quite heterogeneous their usability is becoming evermore important as hotspots can be accessed by number of devices differing in their capabilities size and user interfaces achieving good usability in accessing the services is not straightforward this paper reports user study and usability analysis on wlan access to discover user’s needs and suggest enhancements to fight the usability problems in wlan access
many organizations rely on web applications that use back end databases to store important data testing such applications requires significant effort manual testing alone is often impractical so testers also rely on automated testing techniques however current automated testing techniques may produce false positives or false negatives even in perfectly working system because the outcome of test case depends on the state of the database which changes over time as data is inserted and deleted the automatic database tester autodbt generates functional test cases that account for database updates autodbt takes as input model of the application and set of testing criteria the model consists of state transition diagram that shows how users navigate pages data specification that captures how data flows and an update specification that shows how the database is updated autodbt generates guard queries to determine whether the database is in state conducive to performing and evaluating tests autodbt also generates partial oracles to help validate whether back end database is updated correctly during testing this paper describes the design of autodbt prototype implementation several experiments with the prototype and four case studies
hyperlinks are an essential feature of the world wide web highly responsible for its success xlink improves on html’s linking capabilities in several ways in particular links after xlink can be out of line ie not defined at link source and collected in possibly several linkbases which considerably ease building complex link structures regarding its architecture as distributed and open system the web differs significantly from traditional hypermedia systems modeling of link structures and processing of linkbases under the web’s open world linking require rethinking the traditional approaches this unfortunately has been rather neglected in the design of xlink adding notion of interface to xlink as suggested in this work can considerably improve modeling of link structures when link structure is traversed the relevant linkbase might become ambiguous we suggest three linkbase management modes governing the binding of linkbase to document to resolve this ambiguity
this paper presents discriminative alignment model for extracting abbreviations and their full forms appearing in actual text the task of abbreviation recognition is formalized as sequential alignment problem which finds the optimal alignment origins of abbreviation letters between two strings abbreviation and full form we design large number of fine grained features that directly express the events where letters produce or do not produce abbreviations we obtain the optimal combination of features on an aligned abbreviation corpus by using the maximum entropy framework the experimental results show the usefulness of the alignment model and corpus for improving abbreviation recognition
in wormhole meshes reliable routing is supposed to be deadlock free and fault tolerant many routing algorithms are able to tolerate large number of faults enclosed by rectangular blocks or special convex none of them however is capable of handling two convex fault regions with distance two by using only two virtual networks in this paper fault tolerant wormhole routing algorithm is presented to tolerate the disjointed convex faulty regions with distance two or no less which do not contain any nonfaulty nodes and do not prohibit any routing as long as nodes outside faulty regions are connected in the mesh network the processors overlapping along the boundaries of different fault regions is allowed the proposed algorithm which routes the messages by routing algorithm in fault free region can tolerate convex fault connected regions with only two virtual channels per physical channel and is deadlock and livelock free the proposed algorithm can be easily extended to adaptive routing
natural time dependent similarity measure for two trajectories is their average distance at corresponding times we give algorithms for computing the most similar subtrajectories under this measure assuming the two trajectories are given as two polygonal possibly self intersecting lines when minimum duration is specified for the subtrajectories and they must start at exactly corresponding times in the input trajectories we give linear time algorithm for computing the starting time and duration of the most similar subtrajectories the algorithm is based on result of independent interest we present linear time algorithm to find for piece wise monotone function an interval of at least given length that has minimum average value when the two subtrajectories can start at different times in the two input trajectories it appears difficult to give an exact algorithm for the most similar subtrajectories problem even if the duration of the desired two subtrajectories is fixed to some length we show that the problem can be solved approximately and with performance guarantee more precisely we present epsilon approximation algorithms for computing the most similar subtrajectories of two input trajectories for the case where the duration is specified and also for the case where only minimum on the duration is specified
large information displays are common in public and semi public spaces but still require rapid and lightweight ways for users to interact with them we present bluetone framework for developing large display applications which will interpret and react to dual tone multi frequency sounds transmitted from mobile phones paired with the display using the bluetooth headset profile bluetone enables text entry cursor manipulation and menu selection without requiring the installation of any special software on user’s mobile phone
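the dual tone detection itself is classically done with the goertzel algorithm, sketched below; the sampling rate, frame length and the bluetooth headset transport are outside the sketch and are assumptions rather than bluetone's implementation details

```python
# minimal dtmf decoding sketch using the goertzel algorithm to detect the two
# tones of a keypress (parameters are assumptions)
import math

LOW = [697, 770, 852, 941]
HIGH = [1209, 1336, 1477]
KEYS = [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["*", "0", "#"]]

def goertzel_power(samples, freq, rate):
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def decode_key(samples, rate=8000):
    low = max(LOW, key=lambda f: goertzel_power(samples, f, rate))
    high = max(HIGH, key=lambda f: goertzel_power(samples, f, rate))
    return KEYS[LOW.index(low)][HIGH.index(high)]

rate, n = 8000, 400
tone5 = [math.sin(2 * math.pi * 770 * t / rate) + math.sin(2 * math.pi * 1336 * t / rate)
         for t in range(n)]
print(decode_key(tone5, rate))   # -> '5'
```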
as the computing industry enters the multicore era exponential growth in the number of transistors on chip continues to present challenges and opportunities for computer architects and system designers we examine one emerging issue in particular that of dynamic heterogeneity which can arise even among physically homogeneous cores from changing reliability power or thermal conditions different cache and tlb contents or changing resource configurations this heterogeneity results in constantly varying pool of hardware resources which greatly complicates software’s traditional task of assigning computation to cores in part to address dynamic heterogeneity we argue that hardware should take more active role in the management of its computation resources we propose hardware techniques to virtualize the cores of multicore processor allowing hardware to flexibly reassign the virtual processors that are exposed even to single operating system to any subset of the physical cores we show that multicore virtualization operates with minimal overhead and that it enables several novel resource management applications for improving both performance and reliability
this work presents general mechanism for executing specifications that comply with given invariants which may be expressed in different formalisms and logics we exploit maude’s reflective capabilities and its properties as general semantic framework to provide generic strategy that allows us to execute maude specifications taking into account user defined invariants the strategy is parameterized by the invariants and by the logic in which such invariants are expressed we experiment with different logics providing examples for propositional logic finite future time linear temporal logic and metric temporal logic
over the past five years large scale storage installations have required fault protection beyond raid leading to flurry of research on and development of erasure codes for multiple disk failures numerous open source implementations of various coding techniques are available to the general public in this paper we perform head to head comparison of these implementations in encoding and decoding scenarios our goals are to compare codes and implementations to discern whether theory matches practice and to demonstrate how parameter selection especially as it concerns memory has significant impact on code’s performance additional benefits are to give storage system designers an idea of what to expect in terms of coding performance when designing their storage systems and to identify the places where further erasure coding research can have the most impact
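the simplest member of the design space being benchmarked is a single xor parity block, raid-4 style, which tolerates one lost block per stripe; reed-solomon and the other multi-failure codes compared in the paper are not shown here

```python
# single-parity erasure coding: encode a stripe with one xor parity block and
# recover any one missing block by xor-ing the survivors
def encode(data_blocks):
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = bytes(a ^ b for a, b in zip(parity, block))
    return data_blocks + [parity]

def decode(blocks, lost_index):
    """recover the single missing block by xor-ing all surviving blocks"""
    size = len(next(b for b in blocks if b is not None))
    recovered = bytes(size)
    for i, block in enumerate(blocks):
        if i != lost_index:
            recovered = bytes(a ^ b for a, b in zip(recovered, block))
    return recovered

data = [b"abcd", b"efgh", b"ijkl"]
stripe = encode(data)
damaged = stripe[:1] + [None] + stripe[2:]   # block 1 is lost
print(decode(damaged, lost_index=1))         # -> b'efgh'
```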
in this paper we propose global visibility algorithm which computes from region visibility for all view cells simultaneously in progressive manner we cast rays to sample visibility interactions and use the information carried by ray for all view cells it intersects the main contribution of the paper is set of adaptive sampling strategies based on ray mutations that exploit the spatial coherence of visibility our method achieves more than an order of magnitude speedup compared to per view cell sampling this provides practical solution to visibility preprocessing and also enables new type of interactive visibility analysis application where it is possible to quickly inspect and modify coarse global visibility solution that is constantly refined
in this paper we describe study that explored the implications of the social translucence framework for designing systems that support communications at work two systems designed for communicating availability status were empirically evaluated to understand what constitutes successful way to achieve visibility of people’s communicative state some aspects of the social translucence constructs visibility awareness and accountability were further operationalized into questionnaire and tested relationships between these constructs through path modeling techniques we found that to improve visibility systems should support people in presenting their status in contextualized yet abstract manner visibility was also found to have an impact on awareness and accountability but no significant relationship was seen between awareness and accountability we argue that to design socially translucent systems it is insufficient to visualize people’s availability status it is also necessary to introduce mechanisms stimulating mutual awareness that allow for maintaining shared reciprocal knowledge about communicators availability state which then can encourage them to act in socially responsible way
pervasive games have become popular field of investigation in recent years in which natural human computer interaction hci plays key role in this paper vision based approach for human hand motion gesture recognition is proposed for natural hci in pervasive games led light pen is used to indicate the user’s hand position while web camera is used to capture the hand motion rule based approach is used to design set of hand gestures which are classified into two categories linear gestures and arc shaped gestures determinate finite state automaton is developed to segment the captured hand motion trajectories the proposed interaction method has been applied to the traditional game tetris on pc with the hand held led light pen being used to drive the game instead of traditional key strokes experimental results show that the vision based interactions are natural and effective
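an illustrative stand-in for the segmentation automaton and the linear versus arc-shaped rule: a small state machine cuts the tracked led positions into moving segments and labels each by accumulated turning angle; the thresholds and the labelling rule are assumptions, not the paper's exact automaton

```python
# tiny state machine that segments a tracked trajectory into gestures separated by
# pauses, then labels each segment as linear or arc-shaped by total turning angle
import math

def segment_and_classify(points, move_thresh=2.0, turn_thresh=math.radians(60)):
    gestures, current, state = [], [], "IDLE"
    for prev, cur in zip(points, points[1:]):
        moving = math.dist(prev, cur) > move_thresh
        if state == "IDLE" and moving:
            state, current = "MOVING", [prev, cur]
        elif state == "MOVING" and moving:
            current.append(cur)
        elif state == "MOVING" and not moving:   # gesture ends when the hand pauses
            gestures.append(current)
            state, current = "IDLE", []
    if current:
        gestures.append(current)

    def label(seg):
        turn = 0.0
        for a, b, c in zip(seg, seg[1:], seg[2:]):
            h1 = math.atan2(b[1] - a[1], b[0] - a[0])
            h2 = math.atan2(c[1] - b[1], c[0] - b[0])
            turn += abs(math.atan2(math.sin(h2 - h1), math.cos(h2 - h1)))
        return "arc" if turn > turn_thresh else "linear"

    return [(label(g), len(g)) for g in gestures]

line = [(x, 0) for x in range(0, 50, 5)]
pause = [(45, 0)] * 3
arc = [(45 + 20 * math.cos(t / 5), 20 * math.sin(t / 5)) for t in range(15)]
print(segment_and_classify(line + pause + arc))   # -> [('linear', 10), ('arc', 16)]
```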
we introduce green coordinates for closed polyhedral cages the coordinates are motivated by green’s third integral identity and respect both the vertices position and faces orientation of the cage we show that green coordinates lead to space deformations with shape preserving property in particular in they induce conformal mappings and extend naturally to quasi conformal mappings in in both cases we derive closed form expressions for the coordinates yielding simple and fast algorithm for cage based space deformation we compare the performance of green coordinates with those of mean value coordinates and harmonic coordinates and show that the advantage of the shape preserving property is not achieved at the expense of speed or simplicity we also show that the new coordinates extend the mapping in natural analytic manner to the exterior of the cage allowing the employment of partial cages
data transfer in grid environment has become one critical activity in large number of applications that require access to huge volumes of data in these scenarios characterized by large latencies poor performance and complex dependencies the use of approaches such as multiagents or parallel can provide great benefit however all the attempts to improve the performance of data transfer in grids should achieve the interoperability with already established data transfer schemes gridftp is one of the most known and used data grid transfer protocols this paper describes mapfs dsi modification of the gridftp server based on multiagent parallel file system mapfs dsi increases the performance of data transfers but keeping the interoperability with existing gridftp servers
in this paper we propose conceptual architecture for personal semantic web information retrieval system it incorporates semantic web web service pp and multi agent technologies to enable not only precise location of web resources but also the automatic or semi automatic integration of web resources delivered through web contents and web services in this architecture the semantic issues concerning the whole lifecycle of information retrieval were considered consistently and the integration of web contents and web services is enabled seamlessly the architecture consists of three main components consumer provider and mediator all providers and consumers are constructed as semantic myportal which provides gateway to all the information relevant to user each provider describes its capabilities in what we call wscd web site capability description and each consumer will submit relevant queries based on user requirements when web search is necessary the mediator is composed of agents assigned to the consumer and providers using an agent community based pp information retrieval acpp method to fulfill the information sharing among semantic myportals some preliminary experimental results are presented to show the efficiency of the acpp method and the usefulness of two query response histories for looking up new information sources and for reducing communication loads
possibilistic defeasible logic programming delp is logic programming language which combines features from argumentation theory and logic programming incorporating the treatment of possibilistic uncertainty at the object language level in spite of its expressive power an important limitation in delp is that imprecise fuzzy information cannot be expressed in the object language one interesting alternative for solving this limitation is the use of pgl possibilistic logic over godel logic extended with fuzzy constants fuzzy constants in pgl allow expressing disjunctive information about the unknown value of variable in the sense of magnitude modelled as unary predicate the aim of this article is twofold firstly we formalize depgl possibilistic defeasible logic programming language that extends delp through the use of pgl in order to incorporate fuzzy constants and fuzzy unification mechanism for them secondly we propose way to handle conflicting arguments in the context of the extended framework
the system administrators of large organizations often receive large number of mails from their users and substantial amount of effort is devoted to reading and responding to these mails the content of these messages can range from trivial technical questions to complex problem reports often these queries can be classified into specific categories for example reports of file system that is full or requests to change the toner in particular printer in this project we have experimented with text mining techniques and developed tool for automatically classifying user mail queries in real time and pseudo automatically responding to these requests our experimental evaluations suggest that one cannot completely rely on totally automatic tool for sorting and responding to incoming mail however it can be resource saving complement to an existing toolset that can increase the support efficiency and quality
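a minimal bag-of-words naive bayes classifier of the kind such a tool could be built on; the categories and training messages below are made up for illustration and this is not the authors' pipeline

```python
# tiny multinomial naive bayes over bag-of-words features with laplace smoothing
import math
from collections import Counter, defaultdict

def train(messages, labels):
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for text, y in zip(messages, labels):
        word_counts[y].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab

def classify(text, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for y, ny in label_counts.items():
        score = math.log(ny / total)
        denom = sum(word_counts[y].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[y][w] + 1) / denom)   # laplace smoothing
        if score > best_score:
            best, best_score = y, score
    return best

msgs = ["the file system is full on server3", "please change the toner in printer 2",
        "disk quota exceeded home partition full", "printer out of toner again"]
labels = ["disk", "printer", "disk", "printer"]
model = train(msgs, labels)
print(classify("our shared disk is completely full", model))   # -> 'disk'
```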
fast instruction decoding is challenge for the design of cisc microprocessors well known solution to overcome this problem is using trace cache it stores and fetches already decoded instructions avoiding the need for decoding them again however implementing trace cache involves an important increase in the fetch architecture complexity in this paper we propose novel decoding architecture that reduces the fetch engine implementation cost instead of using special purpose buffer like the trace cache our proposal stores frequently decoded instructions in the memory hierarchy the address where the decoded instructions are stored is kept in the branch prediction mechanism enabling it to guide our decoding architecture this makes it possible for the processor front end to fetch already decoded instructions from memory instead of the original nondecoded instructions our results show that an wide superscalar processor achieves an average performance improvement by using our decoding architecture this improvement is comparable to the one achieved by using the more complex trace cache while requiring less chip area and less energy consumption in the fetch architecture
this article describes general purpose program analysis that computes global control flow and data flow information for higher order call by value languages the analysis employs novel form of polyvariance called polymorhic splitting that uses let expressions as syntactic clues to gain precision the information derived from the analysis is used both to eliminate run time checks and to inline procedure the analysis and optimizations have been applied to suite of scheme programs experimental results obtained from the prototype implementation indicate that the analysis is extremely precise and has reasonable cost compared to monovariant flow analyses such as cfa or analyses based on type inference such as soft typing the analysis eliminates significantly more run time checks run time check elimination and inlining together typically yield to performance improvement for the benchmark suite with some programs running four times as fast
organising large scale web information retrieval systems into hierarchies of topic specific search resources can improve both the quality of results and the efficient use of computing resources promising way to build such systems involves federations of topic specific search engines in decentralised search environments most of the previous research concentrated on various technical aspects of such environments eg routing of search queries or merging of results from multiple sources we focus on organisational dynamics what happens to topical specialisation of search engines in the absence of centralised control when each engine makes individual and self interested decisions on its service parameters we investigate this question in computational economics framework where search providers compete for user queries by choosing what topics to index we provide formalisation of the competition problem and then analyse theoretically and empirically the specialisation dynamics of such systems
adaptive body biasing abb is popularly used technique to mitigate the increasing impact of manufacturing process variations on leakage power dissipation the efficacy of the abb technique can be improved by partitioning design into number of body bias islands each with its individual body bias voltage in this paper we propose system level leakage variability mitigation framework to partition multiprocessor system into body bias islands at the processing element pe granularity at design time and to optimally assign body bias voltages to each island post fabrication as opposed to prior gate and circuit level partitioning techniques that constrain the global clock frequency of the system we allow each island to run at different speed and constrain only the relevant system performance metrics in our case the execution deadlines experimental results show the efficacy of the proposed framework in reducing the mean and standard deviation of leakage power dissipation compared to baseline system without abb at the same time the proposed techniques provide significant runtime improvements over previously proposed monte carlo based technique while providing similar reductions in leakage power dissipation
it is well known that web page classification can be enhanced by using hyperlinks that provide linkages between web pages however in the web space hyperlinks are usually sparse noisy and thus in many situations can only provide limited help in classification in this paper we extend the concept of linkages from explicit hyperlinks to implicit links built between web pages by observing that people who search the web with the same queries often click on different but related documents together we draw implicit links between web pages that are clicked after the same queries those pages are implicitly linked we provide an approach for automatically building the implicit links between web pages using web query logs together with thorough comparison between the uses of implicit and explicit links in web page classification our experimental results on large dataset confirm that the use of the implicit links is better than using explicit links in classification performance with an increase of more than in terms of the macro measurement
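a sketch of how implicit links can be derived from a query log: pages clicked after the same query are linked, weighted by the number of shared queries; the log format and the weighting are assumptions, and the resulting graph would then feed a classifier the way explicit hyperlinks do

```python
# build implicit links between pages that are co-clicked for the same query
from collections import defaultdict
from itertools import combinations

click_log = [                       # (query, clicked_url) pairs, illustrative only
    ("machine learning course", "u1"), ("machine learning course", "u2"),
    ("ml lecture notes", "u2"), ("ml lecture notes", "u3"),
    ("cheap flights", "u4"), ("machine learning course", "u3"),
]

def implicit_links(log):
    by_query = defaultdict(set)
    for query, url in log:
        by_query[query].add(url)
    links = defaultdict(int)
    for urls in by_query.values():
        for a, b in combinations(sorted(urls), 2):   # co-clicked for the same query
            links[(a, b)] += 1
    return dict(links)

print(implicit_links(click_log))
# -> {('u1', 'u2'): 1, ('u1', 'u3'): 1, ('u2', 'u3'): 2}
```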
in march sigcse members contributed to mailing list discussion on the question of whether programming should be taught objects first or imperative first we analyse that discussion exploring how the cs community debates the issue and whether contributors positions are supported by the research literature on novice programmers we applied four distinct research methods to the discussion cognitive science rhetorical analysis in the critical tradition phenomenography and biography we identify the cognitive claims made in the email discussion and find there is not consensus in the research literature as to whether the objects first approach or the imperative approach is harder to learn from the rhetorical analysis we find that the discussion was not so much debate between oo first versus imperative first but instead was more for and against oo first our phenomenographic analysis identified and categorized the underlying complexity of the discussion we also applied biographical method to explore the extent to which the participants views are shaped by their own prior experience the paper concludes with some reflections upon paradigms and the manner in which the cs discipline community defines itself
the well definedness problem for database query language consists of checking given an expression and an input type that the expression never yields runtime error on any input adhering to the input type in this article we study the well definedness problem for query languages on trees that are built from finite set of partially defined base operations by adding variables constants conditionals let bindings and iteration we identify properties of base operations that can make the problem undecidable and give restrictions that are sufficient to ensure decidability as direct result we obtain large fragment of xquery for which well definedness is decidable
one promising approach for adding object oriented oo facilities to functional languages like ml is to generalize the existing datatype and function constructs to be hierarchical and extensible so that datatype variants simulate classes and function cases simulate methods this approach allows existing datatypes to be easily extended with both new operations and new variants resolving longstanding conflict between the functional and oo styles however previous designs based on this approach have been forced to give up modular typechecking requiring whole program checks to ensure type safety we describe extensible ml eml an ml like language that supports hierarchical extensible datatypes and functions while preserving purely modular typechecking to achieve this result eml’s type system imposes few requirements on datatype and function extensibility but eml is still able to express both traditional functional and oo idioms we have formalized core version of eml and proven the associated type system sound and we have developed prototype interpreter for the language
in this paper we propose deferred workload based inter task dvs dynamic voltage scaling algorithm dwdvs which has two features for portable multimedia devices the first is that we reserve time interval for each task to execute and its workload can be completed in this time interval even in the worst case condition which means that the actual workload execution time of each task is equal to its worst case execution time in this way we can estimate the slack time from lower priority tasks more aggressively the second is that we defer these reserved time intervals which means that reserved time interval will be shifted to the deadline of its corresponding task as close as possible thus the operating frequency can be reduced even without slack time simulation results show that the proposed dwdvs reduces the energy consumption by and compared with the static voltage scaling static laedf and dra algorithms respectively and approaches theoretical low bound bound by margin of at most
during the iterative development of interactive software formative evaluation is often performed to find and fix usability problems early on the output of formative evaluation usually takes the form of prioritised list of usability findings each finding typically consisting of description of the problem how often it occurred and sometimes recommendation for possible solution unfortunately the valuable results of formative evaluations are usually collected into written document this makes it extremely difficult to automate the handling of usability findings more formalised electronic format for the handover of usability findings would make much more sense usabml is formalised structure for reporting usability findings expressed in xml it allows usability experts and software engineers to import usability findings into bug issue tracking systems to associate usability issues with parts of source code and to track progress in fixing them
almost all actions on computer are mediated by windows yet we know surprisingly little about how people coordinate their activities using these windows studies of window use are difficult for two reasons gathering longitudinal data is problematic and it is unclear how to extract meaningful characterisations from the data in this paper we present visualisation tool called window watcher that helps researchers understand and interpret low level event logs of window switching activities generated by our tool pylogger we describe its design objectives and demonstrate ways that it summarises and elucidates window use
many interactive systems require users to navigate through large sets of data and commands using constrained input devices such as scroll rings rocker switches or specialized keypads that provide less power and flexibility than traditional input devices like mice or touch screens while performance with more traditional devices has been extensively studied in human computer interaction there has been relatively little investigation of human performance with constrained input as result there is little understanding of what factors govern performance in these situations and how interfaces should be designed to optimize interface actions such as navigation and selection since constrained input is now common in wide variety of interactive systems such as mobile phones audio players in car navigation systems and kiosk displays it is important for designers to understand what factors affect performance to aid in this understanding we present the constrained input navigation cin model predictive model that allows accurate determination of human navigation and selection performance in constrained input scenarios cin identifies three factors that underlie user efficiency the performance of the interface type for single level item selection where interface type depends on the input and output devices the interactive behavior and the data organization the hierarchical structure of the information space and the user’s experience with the items to be selected we show through experiments that after empirical calibration the model’s predictions fit empirical data well and discuss why and how each of the factors affects performance models like cin can provide valuable theoretical and practical benefits to designers of constrained input systems allowing them to explore and compare much wider variety of alternate interface designs without the need for extensive user studies
this paper provides logical framework of negotiating agents who have capabilities of evaluating and building proposals given proposal an agent decides whether it is acceptable or not if the proposal is unacceptable as it is the agent seeks conditions to accept it this attitude is captured as process of making hypotheses by induction if an agent fails to find hypothesis it would concede by giving up some of its current belief this attitude is characterized using default reasoning we provide logical framework of such think act cycle of an agent and develop method for computing proposals using answer set programming
training statistical machine translation starts with tokenizing parallel corpus some languages such as chinese do not incorporate spacing in their writing system which creates challenge for tokenization moreover morphologically rich languages such as korean present an even bigger challenge since optimal token boundaries for machine translation in these languages are often unclear both rule based solutions and statistical solutions are currently used in this paper we present unsupervised methods to solve tokenization problem our methods incorporate information available from parallel corpus to determine good tokenization for machine translation
we present the first verification of full functional correctness for range of linked data structure implementations including mutable lists trees graphs and hash tables specifically we present the use of the jahob verification system to verify formal specifications written in classical higher order logic that completely capture the desired behavior of the java data structure implementations with the exception of properties involving execution time and or memory consumption given that the desired correctness properties include intractable constructs such as quantifiers transitive closure and lambda abstraction it is challenge to successfully prove the generated verification conditions our jahob verification system uses integrated reasoning to split each verification condition into conjunction of simpler subformulas then apply diverse collection of specialized decision procedures first order theorem provers and in the worst case interactive theorem provers to prove each subformula techniques such as replacing complex subformulas with stronger but simpler alternatives exploiting structure inherently present in the verification conditions and when necessary inserting verified lemmas and proof hints into the imperative source code make it possible to seamlessly integrate all of the specialized decision procedures and theorem provers into single powerful integrated reasoning system by appropriately applying multiple proof techniques to discharge different subformulas this reasoning system can effectively prove the complex and challenging verification conditions that arise in this context
today the world wide web is undergoing subtle but profound shift to web and semantic web technologies due to the increasing interest in the semantic web more and more semantic web applications are being developed one of the current main issues facing the development of semantic web applications is the simplicity and user friendliness for the end users especially for people with non it background this paper presents the feedrank system for the extraction aggregation semantic management and querying of web feeds the proposed system overcomes many of the limitations of conventional and passive web feed readers such as providing only simple presentations of what is received poor integration of correlated data from different sources and overwhelming the user with large traffic of feeds that are of no or low interest to them
in search engines ranking algorithms measure the importance and relevance of documents mainly based on the contents and relationships between documents user attributes are usually not considered in ranking this user neutral approach however may not meet the diverse interests of users who may demand different documents even with the same queries to satisfy this need for more personalized ranking we propose ranking framework social network document rank sndocrank that considers both document contents and the relationship between the searcher and document owners in social network this method combines the traditional tf idf ranking for document contents with our multi level actor similarity mas algorithm to measure to what extent document owners and the searcher are structurally similar in social network we implemented our ranking method in simulated video social network based on data extracted from youtube and tested its effectiveness on video search the results show that compared with the traditional ranking method like tf idf the sndocrank algorithm returns more relevant documents more specifically searcher can get significantly better results by being in larger social network having more friends and being associated with larger local communities in social network
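a hedged sketch of the combination: a content score is blended with a searcher-to-owner similarity computed on the social graph; the jaccard-over-friends similarity and the weight alpha below are simplifications standing in for the multi level actor similarity, not the paper's formula

```python
# blend a tf-idf content score with a social similarity between searcher and owner
def social_similarity(friends, searcher, owner):
    a, b = friends.get(searcher, set()), friends.get(owner, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0   # jaccard over direct friends

def sndocrank_score(tfidf_score, friends, searcher, owner, alpha=0.7):
    return alpha * tfidf_score + (1 - alpha) * social_similarity(friends, searcher, owner)

friends = {"alice": {"bob", "carol"}, "bob": {"alice", "carol"}, "dave": {"erin"}}
docs = [("v1", 0.62, "bob"), ("v2", 0.65, "dave")]     # (doc, tf-idf score, owner)
ranked = sorted(docs, key=lambda d: sndocrank_score(d[1], friends, "alice", d[2]),
                reverse=True)
print(ranked)   # v1 overtakes v2 because its owner is socially close to the searcher
```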
set sharing is an abstract domain in which each concrete object is represented by the set of local variables from which it might be reachable it is useful abstraction to detect parallelism opportunities since it contains definite information about which variables do not share in memory ie about when the memory regions reachable from those variables are disjoint set sharing is more precise alternative to pair sharing in which each domain element is set of all pairs of local variables from which common object may be reachable however the exponential complexity of some set sharing operations has limited its wider application this work introduces an efficient implementation of the set sharing domain using zero suppressed binary decision diagrams zbdds because zbdds were designed to represent sets of combinations ie sets of sets they naturally represent elements of the set sharing domain we show how to synthesize the operations needed in the set sharing transfer functions from basic zbdd operations for some of the operations we devise custom zbdd algorithms that perform better in practice we also compare our implementation of the abstract domain with an efficient compact bitset based alternative and show that the zbdd version scales better in terms of both memory usage and running time
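independent of the zbdd encoding, the domain itself is just a set of sets of variables, which makes the parallelism query easy to state; the operations below are simplified illustrations of the abstraction, not the full abstract-unification transfer functions, and the zbdd is only a compact encoding of this set-of-sets structure

```python
# a set-sharing abstract state as a set of sharing groups (sets of variables);
# two variables may share only if some group contains both
Sharing = frozenset   # one sharing group = variables that may reach a common object

state = {Sharing({"x", "y"}), Sharing({"z"}), Sharing({"y", "w"})}

def may_share(state, a, b):
    return any(a in g and b in g for g in state)

def independent(state, a, b):     # safe to operate on a and b in parallel
    return not may_share(state, a, b)

def project(state, keep):         # restrict the abstraction to a set of variables
    return {Sharing(g & keep) for g in state if g & keep}

print(independent(state, "x", "z"))      # True: their memory regions are disjoint
print(independent(state, "x", "y"))      # False: they may reach a common object
print(project(state, {"x", "y", "z"}))   # groups restricted to {'x', 'y', 'z'}
```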
tiling is well known loop transformation to improve temporal locality of nested loops current compiler algorithms for tiling are limited to loops which are perfectly nested or can be transformed in trivial ways into perfect nest this paper presents number of program transformations to enable tiling for class of nontrivial imperfectly nested loops such that cache locality is improved we define program model for such loops and develop compiler algorithms for their tiling we propose to adopt odd even variable duplication to break anti and output dependences without unduly increasing the working set size and to adopt speculative execution to enable tiling of loops which may terminate prematurely due to eg convergence tests in iterative algorithms we have implemented these techniques in research compiler panorama initial experiments with several benchmark programs are performed on sgi workstations based on mips rk and rk processors overall the transformed programs run faster by to
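the underlying transformation is easiest to see on a perfectly nested loop, as below; the paper's actual contribution, handling imperfect nests with odd-even duplication and speculative execution, is not reproduced here

```python
# basic loop tiling for temporal locality, shown on a 2-d transpose-style loop nest
import numpy as np

def transpose_naive(a, out):
    n, m = a.shape
    for i in range(n):
        for j in range(m):
            out[j, i] = a[i, j]

def transpose_tiled(a, out, tile=64):
    n, m = a.shape
    for ii in range(0, n, tile):          # iterate over tiles so each tile of `a`
        for jj in range(0, m, tile):      # and `out` stays resident in cache
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    out[j, i] = a[i, j]

a = np.arange(256 * 256).reshape(256, 256)
out1 = np.empty((256, 256), dtype=a.dtype)
out2 = np.empty((256, 256), dtype=a.dtype)
transpose_naive(a, out1)
transpose_tiled(a, out2)
print(np.array_equal(out1, out2))   # True: same result, better temporal locality
```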
recently progressive retrieval has been advocated as an alternate solution to multidimensional indexes or approximate techniques in order to accelerate similarity search of points in multidimensional spaces the principle of progressive search is to offer first subset of the answers to the user during retrieval if this subset satisfies the user’s needs retrieval stops otherwise search resumes and after number of steps the exact answer set is returned to the user such process is justified by the fact that in large number of applications it is more interesting to rapidly bring first approximate answer sets rather than waiting for long time the exact answer set the contribution of this paper is first typology of existing techniques for progressive retrieval we survey variety of methods designed for image retrieval although some of them apply to general database browsing context which goes beyond cbir we also include techniques not designed for but that can easily be adapted to progressive retrieval
in large data warehousing environments it is often advantageous to provide fast approximate answers to complex aggregate queries based on statistical summaries of the full data in this paper we demonstrate the difficulty of providing good approximate answers for join queries using only statistics in particular samples from the base relations we propose join synopses as an effective solution for this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins we present optimal strategies for allocating the available space among the various join synopses when the query work load is known and identify heuristics for the common case when the work load is not known we also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations our extensive set of experiments on the tpc benchmark database show the effectiveness of join synopses and various other techniques proposed in this paper
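a toy join synopsis: a uniform sample of the fact table is pre-joined with its foreign-key dimension, and an aggregate query is answered approximately by scaling the sample result; the schema, sample size and query below are invented for illustration

```python
# precompute one foreign-key join synopsis and answer an aggregate approximately
import random
random.seed(7)

orders = [{"id": i, "cust": random.randrange(100), "price": random.uniform(5, 50)}
          for i in range(100_000)]
customers = {c: {"region": "EU" if c < 40 else "US"} for c in range(100)}

SAMPLE = 2_000
join_synopsis = [dict(o, **customers[o["cust"]]) for o in random.sample(orders, SAMPLE)]

def approx_sum_price(region):
    scale = len(orders) / SAMPLE          # scale the sample aggregate up
    return scale * sum(r["price"] for r in join_synopsis if r["region"] == region)

exact = sum(o["price"] for o in orders if customers[o["cust"]]["region"] == "EU")
print(round(exact), round(approx_sum_price("EU")))   # close, using 2% of the data
```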
software or hardware data cache prefetching is an efficient way to hide cache miss latency however effectiveness of the issued prefetches have to be monitored in order to maximize their positive impact while minimizing their negative impact on performance in previous proposed dynamic frameworks the monitoring scheme is either achieved using processor performance counters or using specific hardware in this work we propose prefetching strategy which does not use any specific hardware component or processor performance counter our dynamic framework wants to be portable on any modern processor architecture providing at least prefetch instruction opportunity and effectiveness of prefetching loads is simply guided by the time spent to effectively obtain the data every load of program is monitored periodically and can be either associated to dynamically inserted prefetch instruction or not it can be associated to prefetch instruction at some disjoint periods of the whole program run as soon as it is efficient our framework has been implemented for itanium machines it involves several dynamic instrumentations of the binary code whose overhead is limited to only on average on large set of benchmarks our system is able to speed up some programs by
mitra is scalable storage manager that supports the display of continuous media data types eg audio and video clips it is software based system that employs off the shelf hardware components its present hardware platform is cluster of multi disk workstations connected using an atm switch mitra supports the display of mix of media types to reduce the cost of storage it supports hierarchical organization of storage devices and stages the frequently accessed objects on the magnetic disks for the number of displays to scale as function of additional disks mitra employs staggered striping it implements three strategies to maximize the number of simultaneous displays supported by each disk first the everest file system allows different files corresponding to objects of different media types to be retrieved at different block size granularities second the fixb algorithm recognizes the different zones of disk and guarantees continuous display while harnessing the average disk transfer rate third mitra implements the grouped sweeping scheme gss to minimize the impact of disk seeks on the available disk bandwidth in addition to reporting on implementation details of mitra we present performance results that demonstrate the scalability characteristics of the system we compare the obtained results with theoretical expectations based on the bandwidth of participating disks mitra attains between to of the theoretical expectations
this paper aims to match two sets of non rigid feature points using random sampling methods by exploiting the principal eigenvector of a correspondence model linkage an adaptive sampling method is devised to efficiently deal with non rigid matching problems
despite the success of instruction level parallelism ilp optimizations in increasing the performance of microprocessors certain codes remain elusive in particular codes containing recursive data structure rds traversal loops have been largely immune to ilp optimizations due to the fundamental serialization and variable latency of the loop carried dependence through pointer chasing load to address these and other situations we introduce decoupled software pipelining dswp technique that statically splits single threaded sequential loop into multiple non speculative threads each of which performs useful computation essential for overall program correctness the resulting threads execute on thread parallel architectures such as simultaneous multithreaded smt cores or chip multiprocessors cmp expose additional instruction level parallelism and tolerate latency better than the original single threaded rds loop to reduce overhead these threads communicate using synchronization array dedicated hardware structure for pipelined inter thread communication dswp used in conjunction with the synchronization array achieves an to speedup in the optimized functions on both statically and dynamically scheduled processors
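a toy sketch of the decoupling idea is given below, with a python linked list and a bounded queue standing in for the paper's synchronization array; the thread split and loop body are illustrative assumptions:

```python
# toy illustration of decoupled software pipelining: one thread performs the
# pointer-chasing traversal, the other consumes node values and does the work;
# the bounded queue stands in for the dedicated synchronization array
import threading, queue

class Node:
    def __init__(self, value, nxt=None):
        self.value, self.next = value, nxt

def traverse(head, q):
    node = head
    while node is not None:          # the loop-carried dependence stays in this thread
        q.put(node.value)
        node = node.next
    q.put(None)                      # sentinel: end of stream

def compute(q, results):
    while True:
        v = q.get()
        if v is None:
            break
        results.append(v * v)        # placeholder for the real loop body

head = None
for i in range(10, 0, -1):
    head = Node(i, head)

q, results = queue.Queue(maxsize=32), []
t1 = threading.Thread(target=traverse, args=(head, q))
t2 = threading.Thread(target=compute, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)
```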
in many applications of wireless sensor networks sensor nodes are manually deployed in hostile environments where an attacker can disrupt the localization service and tamper with legitimate in network communication in this paper we introduce secure walking gps secure localization and key distribution solution for manual deployments of wsns using the location information provided by the gps and inertial guidance modules on special master node secure walking gps achieves accurate node localization and location based key distribution at the same time our analysis and simulation results indicate that the secure walking gps scheme makes deployed wsn resistant to the dolev yao the wormhole and the gps denial attacks has good localization and key distribution performance and is practical for large scale wsn deployments
we address the problem of energy efficient reliable wireless communication in the presence of unreliable or lossy wireless link layers in multi hop wireless networks prior work has provided an optimal energy efficient solution to this problem for the case where link layers implement perfect reliability however a more common scenario a link layer that is not perfectly reliable was left as an open problem in this paper we first present two centralized algorithms bamer and gamer that optimally solve the minimum energy reliable communication problem in the presence of unreliable links subsequently we present a distributed algorithm damer that approximates the performance of the centralized algorithm and leads to significant performance improvement over existing single path or multi path based techniques
a shared variable is an abstraction of persistent interprocess communication processors execute operations often concurrently on shared variables to exchange information among themselves the behavior of operation executions is required to be "consistent" for effective interprocess communication consequently a consistency specification of a shared variable describes some guarantees on the behavior of the operation executions a read write shared variable has two operations write stores a specified value in the variable and read returns a value from the variable for read write variables a consistency specification describes what values reads may return using an intuitive notion of illegality of reads we propose a framework that facilitates specifying a large variety of read write variables
recently an elegant routing protocol the zone routing protocol zrp was proposed to provide hybrid routing framework that is locally proactive and globally reactive with the goal of minimizing the sum of the proactive and reactive control overhead the key idea of zrp is that each node proactively advertises its link state over fixed number of hops called the zone radius these local advertisements provide each node with an updated view of its routing zone the collection of all nodes and links that are reachable within the zone radius the nodes on the boundary of the routing zone are called peripheral nodes and play an important role in the reactive zone based route discovery the main contribution of this work is to propose novel hybrid routing protocol the two zone routing protocol tzrp as nontrivial extension of zrp in contrast with the original zrp where single zone serves dual purpose tzrp aims to decouple the protocol’s ability to adapt to traffic characteristics from its ability to adapt to mobility in support of this goal in tzrp each node maintains two zones crisp zone and fuzzy zone by adjusting the sizes of these two zones independently lower total routing control overhead can be achieved extensive simulation results show that tzrp is general manet routing framework that can balance the trade offs between various routing control overheads more effectively than zrp in wide range of network conditions
we present model for delegation that is based on our decentralized administrative role graph model we use combination of user group assignment and user role assignment to support user to user permission to user and role to role delegation powerful source dependent revocation algorithm is described we separate our delegation model into static and dynamic models then discuss the static model and its operations we provide detailed partial revocation algorithms we also give details concerning changes to the role hierarchy user group structure and rbac operations that are affected by delegation
the visual exploration of large databases raises number of unresolved inference problems and calls for new interaction patterns between multiple disciplines both at the conceptual and technical level we present an approach that is based on the interaction of four disciplines database systems statistical analyses perceptual and cognitive psychology and scientific visualization at the conceptual level we offer perceptual and cognitive insights to guide the information visualization process we then choose cluster surfaces to exemplify the data mining process to discuss the tasks involved and to work out the interaction patterns
in this paper an improved differential cryptanalysis framework for finding collisions in hash functions is provided its principle is based on linearization of compression functions in order to find low weight differential characteristics as initiated by chabaud and joux this is formalized and refined however in several ways for the problem of finding a conforming message pair whose differential trail follows a linear trail a condition function is introduced so that finding a collision is equivalent to finding a preimage of the zero vector under the condition function then the dependency table concept shows how much influence every input bit of the condition function has on each output bit a careful analysis of the dependency table reveals degrees of freedom that can be exploited in accelerated preimage reconstruction under the condition function these concepts are applied to an in depth collision analysis of reduced round versions of the two sha candidates cubehash and md and are demonstrated to give by far the best currently known collision attacks on these sha candidates
this paper describes technique for improving separation of concerns at the level of domain modeling contribution of this new approach is the construction of support tools that facilitate the elevation of crosscutting modeling concerns to first class constructs in type system the key idea is the application of variant of the omg object constraint language to models that are stored persistently in xml with this approach weavers are generated from domain specific descriptions to assist modeler in exploring various alternative modeling scenarios the paper examines several facets of aspect oriented domain modeling aodm including domain specific model weavers language to support the concern separation an overview of code generation issues within meta weaver framework and comparison between aodm and aop an example of the approach is provided as well as description of several future concepts for extending the flexibility within aodm
the content user gap is the difference between the limited range of content relevant preferences that may be expressed using the mpeg user interaction tools and the much wider range of metadata that may be represented using the mpeg content tools one approach for closing this gap is to make the user and content metadata isomorphic by using the existing mpeg content tools to represent user as well as content metadata agius and angelides subsequently user preferences may be specified for all content without omission since there is wealth of user preference and history metadata within the mpeg user interaction tools that can usefully complement these specific content preferences in this paper we develop method by which all user and content metadata may be bridged
this paper presents case study of the organisational control principles present in credit application process at the branch level of bank the case study has been performed in the context of an earlier suggested formal framework for organisational control principles based on the alloy predicate logic and its facilities for automated formal analysis and exploration in particular we establish and validate the novel concepts of specific and general obligations the delegation of these two kinds of obligations must be controlled by means of review and supervision controls the example of credit application process is used to discuss these organisational controls
cooperation among computational nodes to solve a common parallel application is one of the most outstanding features of grid environments however from the performance point of view it is very important that this cooperation is carried out in a balanced way because otherwise some nodes can be overloaded whereas other nodes can be underused this paper presents camble a cooperative awareness model for balancing the load in grid environments which applies some theoretical principles of awareness models to promote an efficient autonomous balanced and cooperative task delivery in grid environments this cooperative task management has been implemented and tested in a real and heterogeneous grid infrastructure composed of several vos with very successful results this paper presents some of these outcomes while emphasizing the overhead and efficacy of the system using this model
in this paper several enhanced sufficient conditions are given for minimal routing in dimensional meshes with faulty nodes contained in set of disjoint faulty blocks it is based on an early work of wu’s minimal routing in meshes with faulty blocks unlike many traditional models that assume all the nodes know global fault distribution our approach is based on the notion of limited global fault information first fault model called faulty block is reviewed in which all faulty nodes in the system are contained in set of disjoint faulty blocks fault information is coded in tuple called extended safety level associated with each node of mesh to determine the feasibility of minimal routing specifically we study the existence of minimal route at given source node based on the associated extended safety level limited distribution of faulty block information and minimal routing an analytical model for the number of rows and columns that receive faulty block information is also given extensions to wang’s minimal connected components mccs are also considered mccs are rectilinear monotone polygonal shaped fault blocks and are refinement of faulty blocks our simulation results show substantial improvement in terms of higher percentage of minimal routing in meshes under both fault models
this paper proposes and evaluates a new mechanism rate windows for i/o and network rate policing the goal of the proposed system is to provide a simple yet effective way to enforce resource limits on target classes of jobs in a system this work was motivated by our linger longer infrastructure which harvests idle cycles in networks of workstations network and i/o throttling is crucial because linger longer can leave guest jobs on non idle nodes and machine owners should not be adversely affected our approach is quite simple we use a sliding window of recent events to compute the average rate for a target resource the assigned limit is enforced by the simple expedient of putting application processes to sleep when they issue requests that would bring their resource utilization out of the allowable profile our system call intercept model makes the rate windows mechanism light weight and highly portable our experimental results show that we are able to limit resource usage to within a few percent of target usages
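a minimal sketch of the rate window idea, assuming a per-process window of (timestamp, bytes) events and a bytes-per-second limit; the class and parameter names are illustrative, not the authors' code:

```python
# sliding-window rate limiter: before admitting a new request, sleep until the
# windowed usage plus the new request fits under the assigned rate limit
import time
from collections import deque

class RateWindow:
    def __init__(self, limit_bytes_per_sec, window_sec=1.0):
        self.limit = limit_bytes_per_sec
        self.window = window_sec
        self.events = deque()                # (timestamp, nbytes)

    def _current_bytes(self, now):
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()            # drop events outside the window
        return sum(nbytes for _, nbytes in self.events)

    def admit(self, nbytes):
        budget = self.limit * self.window
        while True:
            now = time.monotonic()
            used = self._current_bytes(now)
            if used + nbytes <= budget or not self.events:
                break
            # sleep until the oldest event slides out of the window
            wait = self.events[0][0] + self.window - now
            time.sleep(max(wait, 0.0))
        self.events.append((time.monotonic(), nbytes))

rw = RateWindow(limit_bytes_per_sec=64 * 1024)
for _ in range(10):
    rw.admit(16 * 1024)                      # each call may block to enforce the profile
```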
two oft cited file systems the fast file system ffs and the log structured file system lfs adopt two sharply different update strategies update in place and update out of place this paper introduces the design and implementation of a hybrid file system called hfs which combines the strengths of ffs and lfs while avoiding their weaknesses this is accomplished by distributing file system data into two partitions based on their size and type in hfs data blocks of large regular files are stored in a data partition arranged in an ffs like fashion while metadata and small files are stored in a separate log partition organized in the spirit of lfs but without incurring any cleaning overhead this segregation makes it possible to use more appropriate layouts for different data than would otherwise be possible in particular hfs has the ability to perform clustered i/o on all kinds of data including small files metadata and large files we have implemented a prototype of hfs on freebsd and have compared its performance against three file systems including ffs with soft updates a port of netbsd’s lfs and our lightweight journaling file system called yfs results on a number of benchmarks show that hfs has excellent small file and metadata performance for example hfs beats ffs with soft updates in the range from to in the postmark benchmark
key scalability challenge for interprocedural dataflow analysis comes from large libraries our work addresses this challenge for the general category of interprocedural distributive environment ide dataflow problems using pre computed library summary information the proposed approach reduces significantly the cost of whole program ide analyses without any loss of precision we define an approach for library summary generation by using graph representation of dataflow summary functions and by abstracting away redundant dataflow facts that are internal to the library our approach also handles object oriented features by employing an ide type analysis as well as special handling of polymorphic library call sites whose target methods depend on the future unknown client code experimental results show that dramatic cost savings can be achieved with the help of these techniques
person name queries often bring up web pages that correspond to individuals sharing the same name the web people search weps task consists of organizing search results for ambiguous person name queries into meaningful clusters with each cluster referring to one individual this paper presents a fuzzy ant based clustering approach for this multi document person name disambiguation problem the main advantage of a fuzzy ant based clustering technique inspired by the behavior of ants clustering dead nestmates into piles is that no specification of the number of output clusters is required this makes the algorithm very well suited for the web person disambiguation task where we do not know in advance how many individuals each person name refers to we compare our results with state of the art partitional and hierarchical clustering approaches k means and agnes and demonstrate favorable results this is particularly interesting as the latter involve manual setting of a similarity threshold or estimating the number of clusters in advance while the fuzzy ant based clustering algorithm does not
in data stream applications data arrive continuously and can only be scanned once as the query processor has very limited memory relative to the size of the stream to work with hence queries on data streams do not have access to the entire data set and query answers are typically approximate while there have been many studies on the k nearest neighbors knn problem in conventional multi dimensional databases the solutions cannot be directly applied to data streams for the above reasons in this paper we investigate the knn problem over data streams we first introduce the approximate knn eknn problem that finds the approximate knn answers of a query point such that the absolute error of the kth nearest neighbor distance is bounded by a given error threshold to support eknn queries over streams we propose a technique called disc adaptive indexing on streams by space filling curves disc can adapt to different data distributions to either optimize memory utilization to answer eknn queries under certain accuracy requirements or achieve the best accuracy under a given memory constraint at the same time disc provides efficient updates and query processing which are important requirements in data stream applications extensive experiments were conducted using both synthetic and real data sets and the results confirm the effectiveness and efficiency of disc
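a small sketch of indexing points on a space filling curve and answering an approximate nearest neighbor query by scanning curve neighbors; the bit width, neighborhood size and two dimensional data are assumptions for illustration and do not reproduce the disc design:

```python
# index 2-d points by their z-order (bit-interleaved) position and answer an
# approximate kNN query by ranking a window of curve neighbors by true distance
import bisect

def z_order(x, y, bits=16):
    """interleave the bits of x and y to get the point's position on the z curve."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

class ZIndex:
    def __init__(self, points):
        self.entries = sorted((z_order(x, y), (x, y)) for x, y in points)
        self.keys = [k for k, _ in self.entries]

    def approx_knn(self, q, k, scan=64):
        """scan a window of curve neighbors around q and rank them by true distance."""
        pos = bisect.bisect_left(self.keys, z_order(*q))
        lo, hi = max(0, pos - scan), min(len(self.entries), pos + scan)
        cand = [p for _, p in self.entries[lo:hi]]
        cand.sort(key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
        return cand[:k]

idx = ZIndex([(i * 7 % 100, i * 13 % 100) for i in range(1000)])
print(idx.approx_knn((50, 50), k=3))
```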
the main drawbacks of handheld devices small storage space small size of the display screen discontinuance of the connection to the wlan etc are often incompatible with the need of querying and browsing information extracted from enormous amounts of data which are accessible through the network in this application scenario data compression and summarization have a leading role data in a lossy compressed format can be transmitted more efficiently than the original ones and can be effectively stored in handheld devices setting the compression ratio accordingly in this paper we introduce a very effective compression technique for multidimensional data cubes and the system hand olap which exploits this technique to allow handheld devices to extract and browse compressed two dimensional olap views coming from multidimensional data cubes stored on a remote olap server located on the wired network hand olap effectively and efficiently enables olap in mobile environments and also enlarges the potentialities of decision support systems by taking advantage of the naturally decentralized nature of such environments the idea the system is based on is that rather than querying the original multidimensional data cubes it may be more convenient to generate a compressed olap view of them store such a view on the handheld device and query it locally off line thus obtaining approximate answers that are suitable for olap applications
wikis have demonstrated how it is possible to convert a community of strangers into a community of collaborators semantic wikis have opened an interesting way to mix web advantages with the semantic web approach p2p wikis have illustrated how wikis can be deployed on p2p networks and take advantage of their intrinsic qualities fault tolerance scalability and infrastructure cost sharing in this paper we present the first p2p semantic wiki that combines advantages of semantic wikis and p2p wikis building a p2p semantic wiki is challenging it requires building an optimistic replication algorithm that is compatible with p2p constraints ensures an acceptable level of consistency and is generic enough to handle semantic wiki pages the contribution of this paper is the definition of a clear model for building p2p semantic wikis we define the data model operations on this model intentions of these operations algorithms to ensure consistency and finally we implement the swooki prototype based on these algorithms
we discuss the parallelization of arithmetic operations on polynomials modulo triangular set we focus on parallel normal form computations since this is core subroutine in many high level algorithms such as triangular decompositions of polynomial systems when computing modulo triangular set multivariate polynomials are regarded recursively as univariate ones which leads to several implementation challenges when one targets highly efficient code we rely on an algorithm proposed in which addresses some of these issues we propose two level parallel scheme first we make use of parallel multidimensional fast fourier transform in order to perform multivariate polynomial multiplication secondly we extract parallelism from the structure of the sequential normal form algorithm of we have realized multithreaded implementation we report on different strategies for the management of tasks and threads
the invention of the hyperlink and the http transmission protocol caused an amazing new structure to appear on the internet the world wide web with the web there came spiders robots and web crawlers which go from one link to the next checking web health ferreting out information and resources and imposing organization on the huge collection of information and dross residing on the net this paper reports on the use of one such crawler to synthesize document collections on various topics in science mathematics engineering and technology such collections could be part of digital library
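a small sketch of the kind of topical crawler the paragraph above refers to follows; the breadth first traversal, keyword test, seed url and page limit are assumptions for illustration, not the crawler actually used:

```python
# follow links breadth-first from a seed and keep pages whose text matches
# topic keywords, accumulating a small topical document collection
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)
    def handle_data(self, data):
        self.text.append(data)

def crawl(seed, keywords, max_pages=20):
    seen, frontier, collection = {seed}, deque([seed]), []
    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue
        parser = LinkParser(); parser.feed(html)
        page_text = " ".join(parser.text).lower()
        if any(kw in page_text for kw in keywords):
            collection.append(url)           # page is on topic, add it to the collection
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute); frontier.append(absolute)
    return collection

# example usage with a hypothetical seed and topic:
# print(crawl("https://example.org/", ["mathematics", "engineering"]))
```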
interacting with mobile applications and services remains difficult for users because of the impact of mobility on both device capabilities and the cognitive resources of users in this paper we explore the idea that interaction with mobile services can be made faster and more convenient if users are allowed to speculatively prepare in advance the services they wish to use and the data they will need essentially we propose to enable users to offset difficult interactions while mobile with larger number of interactions conducted in the desktop environment to illustrate the concept of offsetting we have developed prototype we call the context clipboard early feedback suggests that majority of the participants are in favour of the concept and that they may be prepared to use explicit preparation behaviour to simplify future interactions we close with our reflections on key implications for systems design and challenges for taking this work forward
this paper presents a novel approach to automatic image annotation which combines global regional and contextual features by an extended cross media relevance model unlike typical image annotation methods which use either global or regional features exclusively as well as neglect the textual context information among the annotated words the proposed approach incorporates the three kinds of information which are helpful to describe image semantics to annotate images by estimating their joint probability specifically we describe the global features as a distribution vector of visual topics and model the textual context as a multinomial distribution the global features provide the global distribution of visual topics over an image while the textual context relaxes the assumption of mutual independence among annotated words which is commonly adopted in most existing methods both the global features and textual context are learned by a probabilistic latent semantic analysis approach from the training data the experiments over corel images have shown that combining these three kinds of information is beneficial in image annotation
in diverse applications ranging from stock trading to traffic monitoring popular data streams are typically monitored by multiple analysts for patterns of interest these analysts may submit similar pattern mining requests such as cluster detection queries yet customized with different parameter settings in this work we present an efficient shared execution strategy for processing a large number of density based cluster detection queries with arbitrary parameter settings given the high algorithmic complexity of the clustering process and the real time responsiveness required by streaming applications serving multiple such queries in a single system is extremely resource intensive the naive method of detecting and maintaining clusters for different queries independently is often infeasible in practice as its demands on system resources increase dramatically with the cardinality of the query workload to overcome this we analyze the interrelations between the cluster sets identified by queries with different parameter settings including both pattern specific and window specific parameters we introduce the notion of the growth property among the cluster sets identified by different queries and characterize the conditions under which it holds by exploiting this growth property we propose a uniform solution called chandi which represents identified cluster sets as one single compact structure and performs integrated maintenance on them resulting in significant sharing of computational and memory resources our comprehensive experimental study using real data streams from domains of stock trades and moving object monitoring demonstrates that chandi is on average four times faster than the best alternative methods while using less memory space in our test cases it also shows that chandi scales in handling large numbers of queries on the order of hundreds or even thousands under high input data rates
when building enterprise applications that need to be accessed through variety of client devices developers usually strive to implement most of the business logic device independently while using web browser to display the user interface however when those web based front ends shall be rendered on different devices their differing capabilities may require device specific interaction patterns that still need to be specified and implemented efficiently we present an approach for specifying the dialog flows in multi channel web interfaces with very low redundancy and introduce framework that controls web interfaces device specific dialog flows according to those specifications while keeping the enterprise application logic completely device independent
besides the steady growth in size complexity and distribution of present day information systems business volatility with rapid changes in users wishes and technological upgrading is stressing an overwhelming need for more advanced conceptual modeling approaches such advanced conceptual models should coherently and soundly reflect these three crucial dimensions namely the size space and evolution over time dimensions in a contribution towards such advanced conceptual approaches we presented in data and knowledge engineering a new form of integration of object orientation with emphasis on componentization into a variety of algebraic petri nets we referred to as co nets the purpose of the present paper is to soundly extend this proposal for coping with dynamic changes of structural and behavioral aspects of co nets components to this aim we are proposing an adequate petri net based meta level that may be sketched as follows first we construct two meta nets for each component one concerns the modification of behavioral aspects and the other is for dealing with structural aspects while the meta net for behavioral dynamics enables the dynamic of any transition in a given component to be modified at runtime the meta net for structural aspects completes and enhances these capabilities by allowing involved messages and object signatures ie structure to be dynamically manipulated in addition to a rigorous description of this meta level and its illustration using a medium complexity banking system example we also discuss how this level brings a satisfactory solution to the well known inheritance anomaly problem
aspect oriented programming aop is attracting attention from both research and industry as illustrated by the ever growing popularity of aspectj the de facto standard aop extension of java from a compiler construction perspective aspectj is interesting as it is a typical example of a compositional language ie a language composed of a number of separate languages with different syntactical styles in addition to plain java aspectj includes a language for defining pointcuts and one for defining advices language composition represents a non trivial challenge for conventional parsing techniques first combining several languages with different lexical syntax leads to considerable complexity in the lexical states to be processed second as new language features for aop are being explored many research proposals are concerned with further extending the aspectj language resulting in a need for an extensible syntax definition this paper shows how scannerless parsing elegantly addresses the issues encountered by conventional techniques when parsing aspectj we present the design of a modular extensible and formal definition of the lexical and context free aspects of the aspectj syntax in the syntax definition formalism sdf which is implemented by a scannerless generalized lr parser sglr we introduce grammar mixins as a novel application of sdf’s modularity features which allows the declarative definition of different keyword policies and combination of extensions we illustrate the modular extensibility of our definition with syntax extensions taken from current research on aspect languages finally benchmarks show the reasonable performance of scannerless generalized lr parsing for this grammar
most contemporary database systems perform cost based join enumeration using some variant of system r’s bottom up dynamic programming method the notable exceptions are systems based on the top down transformational search of volcano cascades as recent work has demonstrated bottom up dynamic programming can attain optimality with respect to the shape of the join graph no comparable results have been published for transformational search however transformational systems leverage benefits of top down search not available to bottom up methods in this paper we describe top down join enumeration algorithm that is optimal with respect to the join graph we present performance results demonstrating that combination of optimal enumeration with search strategies such as branch and bound yields an algorithm significantly faster than those previously described in the literature although our algorithm enumerates the search space top down it does not rely on transformations and thus retains much of the architecture of traditional dynamic programming as such this work provides migration path for existing bottom up optimizers to exploit top down search without drastically changing to the transformational paradigm
we present the empathic painting an interactive painterly rendering whose appearance adapts in real time to reflect the perceived emotional state of the viewer the empathic painting is an experiment into the feasibility of using high level control parameters namely emotional state to replace the plethora of low level constraints users must typically set to affect the output of artistic rendering algorithms we describe suite of computer vision algorithms capable of recognising users facial expressions through the detection of facial action units derived from the facs scheme action units are mapped to vectors within continuous space representing emotional state from which we in turn derive continuous mapping to the style parameters of simple but fast segmentation based painterly rendering algorithm the result is digital canvas capable of smoothly varying its painterly style at approximately frames per second providing novel user interactive experience using only commodity hardware
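a highly simplified sketch of the pipeline described above, in which action unit activations are mapped into an emotional state and the emotional state is interpolated into painterly style parameters; the mapping matrix, the choice of a two dimensional valence arousal space and the style presets are hypothetical placeholders, not the paper's calibrated values:

```python
import numpy as np

# rows: action units (e.g. brow raiser, lip corner puller, ...); columns: valence, arousal
AU_TO_EMOTION = np.array([
    [ 0.1,  0.6],
    [ 0.8,  0.2],
    [-0.7,  0.4],
    [-0.2, -0.5],
])

STYLE_PRESETS = {
    # emotion-space anchor -> (brush size, colour saturation, stroke jitter)
    (1.0, 0.0): np.array([6.0, 1.0, 0.1]),    # pleasant and calm
    (-1.0, 1.0): np.array([14.0, 1.3, 0.8]),  # negative and aroused
    (0.0, -1.0): np.array([10.0, 0.6, 0.2]),  # low arousal
}

def emotion_state(au_activations):
    """project detected action unit activations into the emotion space."""
    return np.asarray(au_activations) @ AU_TO_EMOTION

def style_parameters(emotion):
    """inverse-distance interpolation between the style anchors."""
    anchors = np.array(list(STYLE_PRESETS.keys()))
    styles = np.array(list(STYLE_PRESETS.values()))
    d = np.linalg.norm(anchors - emotion, axis=1) + 1e-6
    w = (1.0 / d) / np.sum(1.0 / d)
    return w @ styles

e = emotion_state([0.2, 0.9, 0.0, 0.1])
print(e, style_parameters(e))
```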
xml has become widely accepted standard for modelling storing and exchanging structured documents taking advantage of the document structure can result in improving the retrieval performance of xml documents notably growing number of these documents are stored in peer to peer networks which are promising self organizing infrastructures documents are distributed over the peer to peer network by either being stored locally on individual peers or by being assigned to collections such as digital libraries current search methods for xml documents in peer to peer networks lack the use of information retrieval techniques for vague queries and relevance detection our work aims for the development of search engine for xml documents where information retrieval methods are enhanced by using structural information documents and global index are distributed over peer to peer network building virtually unlimited storage space in this paper conceptual architecture for xml information retrieval in peer to peer networks is proposed based on this general architecture component structured architecture for concrete search engine is presented which uses an extension of the vector space model to compute relevance for dynamic xml documents
in this paper we propose a new framework to perform motion compression for time dependent geometric data temporal coherence in dynamic geometric models can be used to achieve significant compression thereby leading to efficient storage and transmission of large volumes of data the displacement of the vertices in the geometric models is computed using the iterative closest point icp algorithm this forms the core of our motion prediction technique and is used to estimate the transformation between two successive data sets the motion between frames is coded in terms of a few affine parameters with some added residues our motion segmentation approach separates the vertices into two groups within the first group motion can be encoded with a few affine parameters without the need of residues in the second group the vertices need further encoding of residual errors also in this group for those vertices associated with large residual errors under affine mapping we encode their motion effectively using newtonian motion estimates this automatic segmentation enables our algorithm to be very effective in compressing time dependent geometric data dynamic range data captured from the real world as well as complex animations created using commercial tools can be compressed efficiently using this scheme
general technique combining model checking and abstraction is presented that allows property based analysis of systems consisting of an arbitrary number of featured components we show how parameterised systems can be specified in guarded command form with constraints placed on variables which occur in guards we prove that results that hold for small number of components can be shown to scale up we then show how featured systems can be specified in similar way by relaxing constraints on guards the main result is generalisation theorem for featured systems which we apply to two well known examples
sensor networks have been widely used to collect data about the environment when analyzing data from these systems people tend to ask exploratory questions they want to find subsets of data namely signal reflecting some characteristics of the environment in this paper we study the problem of searching for drops in sensor data specifically the search is to find periods in history when certain amount of drop over threshold occurs in data within time span we propose framework segdiff for extracting features compressing them and transforming the search into standard database queries approximate results are returned from the framework with the guarantee that no true events are missed and false positives are within user specified tolerance the framework efficiently utilizes space and provides fast response to users search experimental results with real world data demonstrate the efficiency of our framework with respect to feature size and search time
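a small sketch of the kind of search described above: find periods in a sensor series where the value drops by at least a threshold within a time span; the data layout and parameter names are assumptions, and the exhaustive scan stands in for the paper's feature based index:

```python
def find_drops(series, threshold, span):
    """series: list of (timestamp, value); returns (start_ts, end_ts, drop) tuples."""
    events = []
    for i, (t_start, v_start) in enumerate(series):
        # look ahead within the time span for a sufficiently large drop
        for t_end, v_end in series[i + 1:]:
            if t_end - t_start > span:
                break
            drop = v_start - v_end
            if drop >= threshold:
                events.append((t_start, t_end, drop))
                break                     # report the first qualifying drop per start
    return events

readings = [(0, 20.0), (1, 19.8), (2, 15.1), (3, 14.9), (4, 20.2), (5, 12.0)]
print(find_drops(readings, threshold=4.0, span=2))
```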
this paper describes a new approach to finding performance bottlenecks in shared memory parallel programs and its embodiment in the paradyn parallel performance tools running with the blizzard fine grain distributed shared memory system this approach exploits the underlying system’s cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data centric manner as a demonstration paradyn helped us improve the performance of a new shared memory application program by a factor of four
type checking and type inference are fundamentally similar problems however the algorithms for performing the two operations on the same type system often differ significantly the type checker is typically a straightforward encoding of the original type rules for many systems type inference is performed using a two phase constraint based algorithm we present an approach that given the original type rules written as clauses in a logic programming language automatically generates an efficient two phase constraint based type inference algorithm our approach works by partially evaluating the type checking rules with respect to the target program to yield a set of constraints suitable for input to an external constraint solver this approach avoids the need to manually develop and verify a separate type inference algorithm and is ideal for experimentation with and rapid prototyping of novel type systems
billions of devices on a chip are around the corner and the trend of deep submicron dsm technology scaling will continue for at least another decade meanwhile designers also face severe on chip parameter variations soft hard errors and high leakage power how to use these billions of devices to deliver power efficient high performance and yet error resilient computation is a challenging task in this paper we attempt to demonstrate some of our perspectives to address these critical issues we elaborate on variation aware synthesis holistic error modeling reliable multicore and synthesis for application specific multicore we also present some of our insights for future reliable computing
growing size and complexity of many websites have made navigation through these sites increasingly difficult attempting to automatically predict the next page for website user to visit has many potential benefits for example in site navigation automatic tour generation adaptive web applications recommendation systems web server optimisation web search and web pre fetching this paper describes an approach to link prediction using markov chain model based on an exponentially smoothed transition probability matrix which incorporates site usage statistics collected over multiple time periods the improved performance of this approach compared to earlier methods is also discussed
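an illustrative sketch of link prediction with a markov chain whose transition matrix is exponentially smoothed over successive time periods; the smoothing factor and the toy click data are assumptions, not the paper's evaluation setup:

```python
import numpy as np

def transition_matrix(clicks, n_pages):
    """clicks: list of (from_page, to_page) pairs observed in one time period."""
    counts = np.zeros((n_pages, n_pages))
    for src, dst in clicks:
        counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        probs = np.where(row_sums > 0, counts / row_sums, 1.0 / n_pages)
    return probs

def smoothed_matrix(period_clicks, n_pages, alpha=0.5):
    """exponential smoothing: newer periods get geometrically larger weight."""
    smoothed = transition_matrix(period_clicks[0], n_pages)
    for clicks in period_clicks[1:]:
        smoothed = alpha * transition_matrix(clicks, n_pages) + (1 - alpha) * smoothed
    return smoothed

def predict_next(smoothed, current_page):
    return int(np.argmax(smoothed[current_page]))

periods = [
    [(0, 1), (0, 1), (1, 2), (2, 0)],     # older usage log
    [(0, 2), (0, 2), (1, 2), (2, 0)],     # more recent usage log
]
m = smoothed_matrix(periods, n_pages=3)
print(predict_next(m, current_page=0))    # most likely next page from page 0
```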
novel approach to real time string filtering of large databases is presented the proposed approach is based on combination of artificial neural networks and operates in two stages the first stage employs self organizing map for performing approximate string matching and retrieving those strings of the database which are similar to ie assigned to the same som node as the query string the second stage employs harmony theory network for comparing the previously retrieved strings in parallel with the query string and determining whether an exact match exists the experimental results demonstrate accurate fast and database size independent string filtering which is robust to database modifications the proposed approach is put forward for general purpose directory catalogue and glossary search and internet mail blocking intrusion detection systems url and username classification applications
many problems in distributed computing are impossible when no information about process failures is available it is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility eg consensus atomic commit mutual exclusion etc this paper asks what information about failures is needed to circumvent any impossibility and sufficient to circumvent some impossibility in other words what is the minimal yet non trivial failure information we present an abstraction denoted that provides very little failure information in every run of the distributed system eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run although seemingly weak for it might provide random information for an arbitrarily long period of time and it only excludes one possibility of correct set among many still captures non trivial failure information we show that is sufficient to circumvent the fundamental wait free set agreement impossibility while doing so we disprove previous conjectures about the weakest failure detector to solve set agreement and we prove that solving set agreement with registers is strictly weaker than solving process consensus using process consensus we prove that is in a precise sense minimal to circumvent any wait free impossibility roughly we show that is the weakest eventually stable failure detector to circumvent any wait free impossibility our results are generalized through an abstraction that we introduce and prove necessary to solve any problem that cannot be solved in an resilient manner and yet sufficient to solve resilient set agreement
data preparation is an important and critical step in neural network modeling for complex data analysis and it has a huge impact on the success of a wide variety of complex data analysis tasks such as data mining and knowledge discovery although data preparation in neural network data analysis is important the existing literature about neural network data preparation is scattered and there is no systematic study about data preparation for neural network data analysis in this study we first propose an integrated data preparation scheme as a systematic study for neural network data analysis in the integrated scheme a survey of data preparation focusing on problems with the data and corresponding processing techniques is then provided meanwhile some intelligent data preparation solutions to important issues and dilemmas with the integrated scheme are discussed in detail subsequently a cost benefit analysis framework for this integrated scheme is presented to analyze the effect of data preparation on complex data analysis finally a typical example of complex data analysis from the financial domain is provided in order to show the application of data preparation techniques and to demonstrate the impact of data preparation on complex data analysis
the key to increasing performance without a commensurate increase in power consumption in modern processors lies in increasing both parallelism and core specialization core specialization has been employed in the embedded space and is likely to play an important role in future heterogeneous multi core architectures as well in this paper the face recognition application domain is employed as a case study to showcase an architectural design methodology which generates a specialized core with high performance and very low power characteristics specifically we create asic like execution flows to sustain the high memory parallelism generated within the core the price of this benefit is a significant increase in compilation complexity the crux of the problem is the need to co schedule the often conflicting constraints of data access data movement and computation a modular compiler approach that employs integer linear programming ilp based interconnect aware instruction and data scheduling techniques to solve this problem is then described the resulting core running the compiled code delivers a throughput improvement over a high performance processor pentium while simultaneously achieving an energy delay improvement over an energy efficient processor xscale and performs real time face recognition at embedded power budgets
in the ubiquitous computing environment people will interact with everyday objects or computers embedded in them in ways different from the usual and familiar desktop user interface one such typical situation is interacting with applications through large displays such as televisions mirror displays and public kiosks with these applications the use of the usual keyboard and mouse input is not usually viable for practical reasons in this setting the mobile phone has emerged as an excellent device for novel interaction this article introduces user interaction techniques using camera equipped hand held device such as mobile phone or pda for large shared displays in particular we consider two specific but typical situations sharing the display from distance and interacting with touch screen display at close distance using two basic computer vision techniques motion flow and marker recognition we show how camera equipped hand held device can effectively be used to replace mouse and share select and manipulate and objects and navigate within the environment presented through the large display
similarity search has been proved suitable for searching in large collections of unstructured data objects number of practical index data structures for this purpose have been proposed all of them have been devised to process single queries sequentially however in large scale systems such as web search engines indexing multi media content it is critical to deal efficiently with streams of queries rather than with single queries in this paper we show how to achieve efficient and scalable performance in this context to this end we transform sequential index based on clustering into distributed one and devise algorithms and optimizations specially tailored to support high performance parallel query processing
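a simplified sketch of turning a clustering based metric index into a distributed one: clusters are spread over nodes, every node answers the query over its local clusters while pruning whole clusters with the triangle inequality, and the partial results are merged; the node count, cluster count and euclidean metric are assumptions, not the paper's system:

```python
import heapq, math, random

def dist(a, b):
    return math.dist(a, b)

def build_clusters(points, n_clusters):
    centers = random.sample(points, n_clusters)
    clusters = [{"center": c, "members": [], "radius": 0.0} for c in centers]
    for p in points:
        c = min(clusters, key=lambda cl: dist(p, cl["center"]))
        c["members"].append(p)
        c["radius"] = max(c["radius"], dist(p, c["center"]))
    return clusters

def local_search(clusters, q, k):
    best = []                                  # max-heap of (-distance, point)
    for cl in clusters:
        # triangle inequality: a cluster whose closest possible member is
        # farther than the current kth best cannot improve the answer
        if len(best) == k and dist(q, cl["center"]) - cl["radius"] > -best[0][0]:
            continue
        for p in cl["members"]:
            heapq.heappush(best, (-dist(q, p), p))
            if len(best) > k:
                heapq.heappop(best)
    return [(-negd, p) for negd, p in best]

random.seed(0)
data = [(random.random(), random.random()) for _ in range(2000)]
clusters = build_clusters(data, 16)
nodes = [clusters[i::4] for i in range(4)]            # distribute clusters over 4 nodes
partials = [local_search(node, (0.5, 0.5), k=3) for node in nodes]
print(sorted((d, p) for part in partials for d, p in part)[:3])   # global merge
```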
data integration systems provide access to set of heterogeneous autonomous data sources through so called global schema there are basically two approaches for designing data integration system in the global as view approach one defines the elements of the global schema as views over the sources whereas in the local as view approach one characterizes the sources as views over the global schema it is well known that processing queries in the latter approach is similar to query answering with incomplete information and therefore is complex task on the other hand it is common opinion that query processing is much easier in the former approach in this paper we show the surprising result that when the global schema is expressed in the relational model with integrity constraints even of simple types the problem of incomplete information implicitly arises making query processing difficult in the global as view approach as well we then focus on global schemas with key and foreign key constraints which represents situation which is very common in practice and we illustrate techniques for effectively answering queries posed to the data integration system in this case
we consider the view selection problem for xml content based routing given a network in which a stream of xml documents is routed and the routing decisions are taken based on results of evaluating xpath predicates on these documents select a set of views that maximize the throughput of the network while in view selection for relational queries the speedup comes from eliminating joins here the speedup is obtained from gaining direct access to data values in an xml packet without parsing that packet the views in our context can be seen as a binary representation of the xml document tailored for the network’s workload in this paper we define formally the view selection problem in the context of xml content based routing and provide a practical solution for it first we formalize the problem while the exact formulation is too complex to admit practical solutions we show that it can be simplified to a manageable optimization problem without loss in precision second we show that the simplified problem can be reduced to the integer cover problem the integer cover problem is known to be np hard and to admit a log greedy approximation algorithm third we show that the same greedy approximation algorithm performs much better on a class of workloads called hierarchical workloads which are typical in xml stream processing namely it returns an optimal solution for hierarchical workloads and degrades gracefully to the log general bound as the workload becomes less hierarchical
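the logarithmic factor greedy approximation mentioned above is in the same family as the classic greedy heuristic for weighted set cover, sketched below as a generic illustration; it does not reproduce the paper's actual reduction, and the workload and view sets are made up:

```python
def greedy_cover(universe, candidates):
    """universe: set of elements to cover; candidates: dict name -> (cost, covered set)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the candidate with the best cost per newly covered element
        name, (cost, elems) = min(
            ((n, c) for n, c in candidates.items() if c[1] & uncovered),
            key=lambda item: item[1][0] / len(item[1][1] & uncovered),
        )
        chosen.append(name)
        uncovered -= elems
    return chosen

workload = {"q1", "q2", "q3", "q4", "q5"}
views = {
    "v1": (3.0, {"q1", "q2", "q3"}),
    "v2": (1.0, {"q3", "q4"}),
    "v3": (2.0, {"q4", "q5"}),
    "v4": (1.0, {"q1"}),
}
print(greedy_cover(workload, views))
```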
this paper describes the denali isolation kernel an operating system architecture that safely multiplexes large number of untrusted internet services on shared hardware denali’s goal is to allow new internet services to be pushed into third party infrastructure relieving internet service authors from the burden of acquiring and maintaining physical infrastructure our isolation kernel exposes virtual machine abstraction but unlike conventional virtual machine monitors denali does not attempt to emulate the underlying physical architecture precisely and instead modifies the virtual architecture to gain scale performance and simplicity of implementation in this paper we first discuss design principles of isolation kernels and then we describe the design and implementation of denali following this we present detailed evaluation of denali demonstrating that the overhead of virtualization is small that our architectural choices are warranted and that we can successfully scale to more than virtual machines on commodity hardware
security typed languages such as jif require the programmer to label variables with information flow security policies as part of application development the compiler then flags errors wherever information leaks may occur resolving these information leaks is critical task in security typed language application development unfortunately because information flows can be quite subtle simple error messages tend to be insufficient for finding and resolving the source of information leaks more sophisticated development tools are needed for this task to this end we provide set of principles to guide the development of such tools furthermore we implement subset of these principles in an integrated development environment ide for jif called jifclipse which is built on the eclipse extensible development platform our plug in provides jif programmer with additional tools to view hidden information generated by jif compilation to suggest fixes for errors and to get more specific information behind an error message better development tools are essential for making security typed application development practical jifclipse is first step in this process
requests for communication via mobile devices can be disruptive to the current task or social situation to reduce the frequency of disruptive requests one promising approach is to provide callers with cues of receiver’s context through an awareness display allowing informed decisions of when to call existing displays typically provide cues based on what can be readily sensed which may not match what is needed during the call decision process in this paper we report results of four week diary study of mobile phone usage where users recorded what context information they considered when making call and what information they wished others had considered when receiving call our results were distilled into lessons that can be used to improve the design of awareness displays for mobile devices eg show frequency of receiver’s recent communication and distance from receiver to her phone we discuss technologies that can enable cues indicated in these lessons to be realized within awareness displays as well as discuss limitations of such displays and issues of privacy
this paper considers the problem of synthesizing application specific network on chip noc architectures we propose two heuristic algorithms called cluster and decompose that can systematically examine different set partitions of communication flows and we propose rectilinear steiner tree rst based algorithms for generating an efficient network topology for each group in the partition different evaluation functions in fitting with the implementation backend and the corresponding implementation technology can be incorporated into our solution framework to evaluate the implementation cost of the set partitions and rst topologies generated in particular we experimented with an implementation cost model based on the power consumption parameters of nm process technology where leakage power is major source of energy consumption experimental results on variety of noc benchmarks showed that our synthesis results can on average achieve reduction in power consumption over the best standard mesh implementation to further gauge the effectiveness of our heuristic algorithms we also implemented an exact algorithm that enumerates all distinct set partitions for the benchmarks where exact results could be obtained our cluster and decompose algorithms on average can achieve results within and of exact results with execution times all under second whereas the exact algorithms took as much as hours
we show that a large class of data flow analyses for imperative languages is describable as type systems in the following technical sense possible results of an analysis can be described in a language of types so that a program checks with a type if and only if this type is a supertype of the result of applying the analysis type checking is easy with the help of a certificate that records the eureka bits of a typing derivation certificate assisted type checking amounts to a form of lightweight analysis a la rose for secure information flow we obtain a type system that is considerably more precise than that of volpano et al but not more sophisticated importantly our type systems are compositional
there has been much work in recent years on extending ml with recursive modules one of the most difficult problems in the development of such an extension is the double vision problem which concerns the interaction of recursion and data abstraction in previous work we defined a type system called rtg which solves the double vision problem at the level of a system style core calculus in this paper we scale the ideas and techniques of rtg to the level of a recursive ml style module calculus called rmc thus establishing that no tradeoff between data abstraction and recursive modules is necessary first we describe rmc’s typing rules for recursive modules informally and discuss some of the design questions that arose in developing them we then present the formal semantics of rmc which is interesting in its own right the formalization synthesizes aspects of both the definition and the harper stone interpretation of standard ml and includes a novel two pass algorithm for recursive module typechecking in which the coherence of the two passes is emphasized by their representation in terms of the same set of inference rules
we present feasibility study for performing virtual address translation without specialized translation hardware removing address translation hardware and instead managing address translation in software has the potential to make the processor design simpler smaller and more energy efficient at little or no cost in performance the purpose of this study is to describe the design and quantify its performance impact trace driven simulations show that software managed address translation is just as efficient as hardware managed address translation moreover mechanisms to support such features as shared memory superpages fine grained protection and sparse address spaces can be defined completely in software allowing for more flexibility than in hardware defined mechanisms
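a toy sketch of software managed address translation follows: a virtual address is split into page number and offset, a software maintained page table maps virtual pages to physical frames, and a miss handler fills the mapping on demand; the page size, table layout and allocation policy are illustrative assumptions:

```python
PAGE_SHIFT = 12                       # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

page_table = {}                       # virtual page number -> physical frame number
next_free_frame = 0

def handle_miss(vpn):
    """software miss handler: allocate a frame for an unmapped virtual page."""
    global next_free_frame
    frame = next_free_frame
    next_free_frame += 1
    page_table[vpn] = frame
    return frame

def translate(vaddr):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    frame = page_table.get(vpn)
    if frame is None:                 # no hardware walker: the miss is handled in software
        frame = handle_miss(vpn)
    return (frame << PAGE_SHIFT) | offset

print(hex(translate(0x1234)), hex(translate(0x5678)), hex(translate(0x1FFF)))
```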
this paper provides an extensive overview of existing research in the field of software refactoring this research is compared and discussed based on a number of different criteria the refactoring activities that are supported the specific techniques and formalisms that are used for supporting these activities the types of software artifacts that are being refactored the important issues that need to be taken into account when building refactoring tool support and the effect of refactoring on the software process a running example is used throughout the paper to explain and illustrate the main concepts
we study oblivious routing in fat tree based system area networks with deterministic routing under the assumption that the traffic demand is uncertain the performance of routing algorithm under uncertain traffic demands is characterized by the oblivious performance ratio that bounds the relative performance of the routing algorithm with respect to the optimal algorithm for any given traffic demand we consider both single path routing where only one path is used to carry the traffic between each source destination pair and multipath routing where multiple paths are allowed for single path routing we derive lower bounds of the oblivious performance ratio for different fat trees and develop routing schemes that achieve the optimal oblivious performance ratios for commonly used topologies our evaluation results indicate that the proposed oblivious routing schemes not only provide the optimal worst case performance guarantees but also outperform existing schemes in average cases for multipath routing we show that it is possible to obtain an optimal scheme for all traffic demands an oblivious performance ratio of these results quantitatively demonstrate the performance difference between single path routing and multipath routing in fat trees
programming paradigms are designed to express algorithms elegantly and efficiently there are many parallel programming paradigms each suited to certain class of problems selecting the best parallel programming paradigm for problem minimizes programming effort and maximizes performance given the increasing complexity of parallel applications no one paradigm may be suitable for all components of an application today most parallel scientific applications are programmed with single paradigm and the challenge of multi paradigm parallel programming remains unmet in the broader community we believe that each component of parallel program should be programmed using the most suitable paradigm furthermore it is not sufficient to simply bolt modules together programmers should be able to switch between paradigms easily and resource management across paradigms should be automatic we present pre existing adaptive runtime system arts and show how it can be used to meet these challenges by allowing the simultaneous use of multiple parallel programming paradigms and supporting resource management across all of them we discuss the implementation of some common paradigms within the arts and demonstrate the use of multiple paradigms within our feature rich unstructured mesh framework we show how this approach boosts performance and productivity for an application developed using this framework
this paper presents the design principles implementation and evaluation of the retos operating system which is specifically developed for micro sensor nodes retos has four distinct objectives which are to provide multithreaded programming interface system resiliency kernel extensibility with dynamic reconfiguration and wsn oriented network abstraction retos is multithreaded operating system hence it provides the commonly used thread model of programming interface to developers we have used various implementation techniques to optimize the performance and resource usage of multithreading retos also provides software solutions to separate kernel from user applications and supports their robust execution on mmu less hardware the retos kernel can be dynamically reconfigured via loadable kernel framework so application optimized and resource efficient kernel is constructed finally the networking architecture in retos is designed with layering concept to provide wsn specific network abstraction retos currently supports atmel atmega ti msp and chipcon cc family of microcontrollers several real world wsn applications are developed for retos and the overall evaluation of the systems is described in the paper
some software defects trigger failures only when certain complex information flows occur within the software profiling and analyzing such flows therefore provides potentially important basis for filtering test cases we report the results of an empirical evaluation of several test case filtering techniques that are based on exercising complex information flows both coverage based and profile distribution based filtering techniques are considered they are compared to filtering techniques based on exercising basic blocks branches function calls and def use pairs with respect to their effectiveness for revealing defects
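As one concrete instance of coverage-based filtering (a sketch under the assumption that each test case's execution profile is recorded as a set of exercised entities; this is not the exact set of techniques evaluated in the paper), a greedy pass can keep only tests that add new coverage:

```python
# Greedy "additional coverage" filter: keep a test only if it exercises
# entities (blocks, branches, def-use pairs, information-flow pairs) that the
# already selected tests do not.
def greedy_filter(profiles):
    """profiles: dict test_id -> set of covered entities."""
    selected, covered = [], set()
    remaining = dict(profiles)
    while remaining:
        # pick the test adding the most not-yet-covered entities
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        gain = remaining[best] - covered
        if not gain:
            break                      # nothing new left to cover
        selected.append(best)
        covered |= gain
        del remaining[best]
    return selected

profiles = {
    "t1": {"b1", "b2", "du1"},
    "t2": {"b2", "b3"},
    "t3": {"b1", "du1"},               # adds nothing once t1 is chosen
}
print(greedy_filter(profiles))          # ['t1', 't2']
```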
if kolmogorov complexity measures information in one object and information distance measures information shared by two objects how do we measure information shared by many objects this paper provides an initial pragmatic study of this fundamental data mining question firstly em(x1, ..., xn) is defined to be the minimum amount of thermodynamic energy needed to convert from any xi to any xj with this definition several theoretical problems have been solved second our newly proposed theory is applied to select comprehensive review and specialized review from many reviews core feature words expanded words and dependent words are extracted respectively comprehensive and specialized reviews are selected according to the information among them this method of selecting single review can be extended to select multiple reviews as well finally experiments show that this comprehensive and specialized review mining method based on our new theory can do the job efficiently
this paper proposes statistical tree to tree model for producing translations two main contributions are as follows method for the extraction of syntactic structures with alignment information from parallel corpus of translations and use of discriminative feature based model for prediction of these target language syntactic structures which we call aligned extended projections or aeps an evaluation of the method on translation from german to english shows similar performance to the phrase based model of koehn et al
this paper presents and validates performance models for variety of high performance collective communication algorithms for systems with cell processors the systems modeled include single cell processor two cell chips on cell blade and cluster of cell blades the models extend plogp the well known point to point performance model by accounting for the unique hardware characteristics of the cell eg heterogeneous interconnects and dma engines and by applying the model to collective communication this paper also presents micro benchmark suite to accurately measure the extended plogp parameters on the cell blade and then uses these parameters to model different algorithms for the barrier broadcast reduce all reduce and all gather collective operations out of total performance predictions of them see less than error compared to the actual execution time and all of them see less than
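The sketch below is not the paper's extended model; it only illustrates the flavor of LogP/PlogP-style reasoning: estimating a binomial-tree broadcast from per-message send overhead, latency, per-byte cost, and receive overhead. All parameter values are placeholders, not measured Cell numbers.

```python
import math

# Placeholder parameters (seconds / seconds-per-byte); a real study would
# measure these with a micro benchmark suite as described above.
L, O_S, O_R, G = 1.0e-6, 0.3e-6, 0.3e-6, 1.0e-9

def p2p_cost(m):
    """One point-to-point message of m bytes: send overhead + latency with a
    per-byte term + receive overhead."""
    return O_S + L + m * G + O_R

def binomial_broadcast_time(p, m):
    """Root reaches all p ranks in ceil(log2 p) rounds, each round being one
    point-to-point message along every active tree edge."""
    rounds = math.ceil(math.log2(p)) if p > 1 else 0
    return rounds * p2p_cost(m)

for p in (2, 8, 16):
    print(p, binomial_broadcast_time(p, m=64 * 1024))
```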
while association rule mining is one of the most popular data mining techniques it usually results in many rules some of which are not considered as interesting or significant for the application at hand in this paper we conduct systematic approach to ascertain the discovered rules and provide rigorous statistical approach supporting this framework the strategy proposed combines data mining and statistical measurement techniques including redundancy analysis sampling and multivariate statistical analysis to discard the non significant rules real world dataset is used to demonstrate how the proposed unified framework can discard many of the redundant or non significant rules and still preserve high accuracy of the rule set as whole
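One simple statistical check in this spirit (a hedged illustration, not the paper's exact framework) is a chi-square test on the 2x2 contingency table of a rule A -> B, discarding rules whose antecedent and consequent are not significantly associated. The counts and threshold below are illustrative.

```python
# Chi-square statistic for the 2x2 table of rule A -> B:
#   a = |A and B|, b = |A and not B|, c = |not A and B|, d = |not A and not B|
def chi_square_rule(a, b, c, d):
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def significant(counts, critical=3.841):   # 95% level, 1 degree of freedom
    return chi_square_rule(*counts) > critical

# rule holds in 60 of 80 transactions containing A; B appears in 70 of 200
print(significant((60, 20, 10, 110)))      # True: strong association
print(significant((12, 38, 48, 102)))      # False: close to independence
```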
the implementation and maintenance of industrial applications have continuously become more and more difficult in this context one problem is the evaluation of complex systems the ieee defines prototyping as development approach promoting the implementation of pilot version of the intended product this approach is potential solution to the early evaluation of system it can also be used to avoid the shift between the description specification of system and its implementation this brief introduction to the special section on rapid system prototyping illustrates current picture of prototyping
in this paper we investigate the difference between wikipedia and web link structure with respect to their value as indicators of the relevance of page for given topic of request our experimental evidence is from two ir test collections the gov collection used at the trec web tracks and the wikipedia xml corpus used at inex we first perform comparative analysis of wikipedia and gov link structure and then investigate the value of link evidence for improving search on wikipedia and on the gov domain our main findings are first wikipedia link structure is similar to the web but more densely linked second wikipedia’s outlinks behave similar to inlinks and both are good indicators of relevance whereas on the web the inlinks are more important third when incorporating link evidence in the retrieval model for wikipedia the global link evidence fails and we have to take the local context into account
we study the problem of updating xml repository through security views users are provided with the view of the repository schema they are entitled to see they write update requests over their view using the xupdate language each request is processed in two rewriting steps first the xpath expression selecting the nodes to update from the view is rewritten to another expression that only selects nodes the user is permitted to see second the xupdate query is refined according to the write privileges held by the user
high end computing is suffering data deluge from experiments simulations and apparatus that creates overwhelming application dataset sizes end user workstations despite more processing power than ever before are ill equipped to cope with such data demands due to insufficient secondary storage space and rates meanwhile large portion of desktop storage is unused we present the freeloader framework which aggregates unused desktop storage space and bandwidth into shared cache scratch space for hosting large immutable datasets and exploiting data access locality our experiments show that freeloader is an appealing low cost solution to storing massive datasets by delivering higher data access rates than traditional storage facilities in particular we present novel data striping techniques that allow freeloader to efficiently aggregate workstation’s network communication bandwidth and local bandwidth in addition the performance impact on the native workload of donor machines is small and can be effectively controlled
specialization of heap objects is critical for pointer analysis to effectively analyze complex memory activity this paper discusses heap specialization with respect to call chains due to the sheer number of distinct call chains exhaustive specialization can be cumbersome on the other hand insufficient specialization can miss valuable opportunities to prevent spurious data flow which results in not only reduced accuracy but also increased overhead in determining whether further specialization will be fruitful an object’s escape information can be exploited from empirical study we found that restriction based on escape information is often but not always sufficient at prohibiting the explosive nature of specialization for in depth case study four representative benchmarks are selected for each benchmark we vary the degree of heap specialization and examine its impact on analysis results and time to provide better visibility into the impact we present the points to set and pointed to by set sizes in the form of histograms
we introduce method for extracting boundary surfaces from volumetric models of mechanical parts by x ray ct scanning when the volumetric model is composed of two materials one for the object and the other for the background air these boundary surfaces can be extracted as isosurfaces using contouring method such as marching cubes lorensen and cline for volumetric model composed of more than two materials we need to classify the voxel types into segments by material and use generalized marching cubes algorithm that can deal with both ct values and material types here we propose method for precisely classifying the volumetric model into its component materials using modified and combined method of two well known algorithms in image segmentation region growing and graph cut we then apply the generalized marching cubes algorithm to generate triangulated mesh surfaces in addition we demonstrate the effectiveness of our method by constructing high quality triangular mesh models of the segmented parts
this study proposes an extended geographical database based on photo shooting history to enable the suggestion of candidate captions to newly shot photos the extended geographical database consists of not only subject positions but also the likely shooting positions and directions estimated using the histories of the shooting positions and directions of the subjects user can add caption to photo by selecting an appropriate one from the candidate captions the candidate captions are acquired using the shooting position and direction as key to the extended geographical database in this paper we present the results of experiments for constructing the extended geographical database using prototype system
this paper explores using information about program branch probabilities to optimize the results of hardware compilation the basic premise is to promote utilization by dedicating more resources to branches which execute more frequently new hardware compilation and flow control scheme are presented which enable the computation rate of different branches to be matched to the observed branch probabilities we propose an analytical queuing network performance model to determine the optimal settings for basic block computation rates given set of observed branch probabilities an experimental hardware compilation system has been developed to evaluate this approach the branch optimization design space is characterized in an experimental study for xilinx virtex fpgas of two complex applications video feature extraction and progressive refinement radiosity for designs of equal performance branch optimized designs require percent and percent less area for designs of equal area branch optimized designs run up to three times faster our analytical performance model is shown to be highly accurate with relative error between and times
we present an architecture designed for alert verification ie to reduce false positives in network intrusion detection systems our technique is based on systematic and automatic anomaly based analysis of the system output which provides useful context information regarding the network services the false positives raised by the nids analyzing the incoming traffic which can be either signature or anomaly based are reduced by correlating them with the output anomalies we designed our architecture for tcp based network services which have client server architecture such as http benchmarks show substantial reduction of false positives between and
xml has become the lingua franca for data exchange and integration across administrative and enterprise boundaries nearly all data providers are adding xml import or export capabilities and standard xml schemas and dtds are being promoted for all types of data sharing the ubiquity of xml has removed one of the major obstacles to integrating data from widely disparate sources namely the heterogeneity of data formats however general purpose integration of data across the wide area also requires query processor that can query data sources on demand receive streamed xml data from them and combine and restructure the data into new xml output while providing good performance for both batch oriented and ad hoc interactive queries this is the goal of the tukwila data integration system the first system that focuses on network bound dynamic xml data sources in contrast to previous approaches which must read parse and often store entire xml objects before querying them tukwila can return query results even as the data is streaming into the system tukwila is built with new system architecture that extends adaptive query processing and relational engine techniques into the xml realm as facilitated by pair of operators that incrementally evaluate query’s input path expressions as data is read in this paper we describe the tukwila architecture and its novel aspects and we experimentally demonstrate that tukwila provides better overall query performance and faster initial answers than existing systems and has excellent scalability
new dynamic cache resizing scheme for low power cam tag caches is introduced control algorithm that is only activated on cache misses uses duplicate set of tags the miss tags to minimize active cache size while sustaining close to the same hit rate as full size cache the cache partitioning mechanism saves both switching and leakage energy in unused partitions with little impact on cycle time simulation results show that the scheme saves of data cache energy and of instruction cache energy with minimal performance impact
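A software model (not the circuit itself) can make the control idea explicit: the duplicate miss tags reveal which misses are caused purely by downsizing, and the controller grows or shrinks the active portion accordingly. The window size, tolerance, and way granularity below are invented for illustration.

```python
# Simplified control loop driven by "miss tags": a miss that would have hit at
# full size counts as avoidable; too many avoidable misses trigger growth.
class ResizingCacheModel:
    def __init__(self, full_ways=8, min_ways=2, window=1000, tolerance=0.01):
        self.active_ways = min_ways
        self.full_ways, self.min_ways = full_ways, min_ways
        self.window, self.tolerance = window, tolerance
        self.avoidable_misses = self.accesses = 0

    def record(self, hit_active, hit_full):
        """hit_active: hit with the current size; hit_full: would have hit at
        full size (known only from the duplicate miss tags)."""
        self.accesses += 1
        if not hit_active and hit_full:
            self.avoidable_misses += 1   # extra miss caused by downsizing
        if self.accesses == self.window:
            self._adjust()

    def _adjust(self):
        extra_rate = self.avoidable_misses / self.window
        if extra_rate > self.tolerance and self.active_ways < self.full_ways:
            self.active_ways += 1        # hit rate suffering: grow
        elif extra_rate == 0 and self.active_ways > self.min_ways:
            self.active_ways -= 1        # no avoidable misses: shrink
        self.avoidable_misses = self.accesses = 0

cache = ResizingCacheModel()
for _ in range(3000):
    cache.record(hit_active=False, hit_full=True)   # every miss is avoidable
print(cache.active_ways)                            # grows toward full size
```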
we present constructs that create manage and verify digital audit trails for versioning file systems based upon small amount of data published to third party file system commits to version history at later date an auditor uses the published data to verify the contents of the file system at any point in time audit trails create an analog of the paper audit process for file data helping to meet the requirements of electronic record legislation such as sarbanes oxley our techniques address the and computational efficiency of generating and verifying audit trails the aggregation of audit information in directory hierarchies and constructing verifiable audit trails in the presence of lost data
the performance of modern machines is increasingly limited by insufficient memory bandwidth one way to alleviate this bandwidth limitation for given program is to minimize the aggregate data volume the program transfers from memory in this article we present compiler strategies for accomplishing this minimization following discussion of the underlying causes of bandwidth limitations we present two step strategy to exploit global cache reuse the temporal reuse across the whole program and the spatial reuse across the entire data set used in that program in the first step we fuse computation on the same data using technique called reuse based loop fusion to integrate loops with different control structures we prove that optimal fusion for bandwidth is np hard and we explore the limitations of computation fusion using perfect program information in the second step we group data used by the same computation through the technique of affinity based data regrouping which intermixes the storage assignments of program data elements at different granularities we show that the method is compile time optimal and can be used on array and structure data we prove that two extensions partial and dynamic data regrouping are np hard problems finally we describe our compiler implementation and experiments demonstrating that the new global strategy on average reduces memory traffic by over and improves execution speed by over on two high end workstations
it has been proposed that email clients could be improved if they presented messages grouped into conversations an email conversation is the tree of related messages that arises from the use of the reply operation we propose two models of conversation the first model characterizes conversation as chronological sequence of messages the second as tree based on the reply relationship we show how existing email clients and prior research projects implicitly support each model to greater or lesser degree depending on their design but none fully supports both models simultaneously we present mixed model visualization that simultaneously presents sequence and reply relationships among the messages of conversation making both visible at glance we describe the integration of the visualization into working prototype email client usability study indicates that the system meets our usability goals and verifies that the visualization fully conveys both types of relationships within the messages of an email conversation
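A toy sketch of the two models, with invented message fields (id, parent, ts): the same four messages viewed first as a chronological sequence and then as a reply tree.

```python
from collections import defaultdict

messages = [
    {"id": "m1", "parent": None, "ts": 1},
    {"id": "m2", "parent": "m1", "ts": 2},
    {"id": "m3", "parent": "m1", "ts": 3},
    {"id": "m4", "parent": "m3", "ts": 4},
]

# model 1: chronological sequence of messages
sequence = sorted(messages, key=lambda m: m["ts"])

# model 2: tree induced by the reply relationship
children = defaultdict(list)
for m in messages:
    children[m["parent"]].append(m["id"])

def print_tree(node, depth=0):
    for child in children.get(node, []):
        print("  " * depth + child)
        print_tree(child, depth + 1)

print([m["id"] for m in sequence])   # ['m1', 'm2', 'm3', 'm4']
print_tree(None)                     # m1, then m2 and m3, then m4 under m3
```

A mixed-model display as described above would render both structures at once rather than choosing one.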
we collected mobility traces of avatars spanning multiple regions in second life popular user created virtual world we analyzed the traces to characterize the dynamics of the avatars mobility and behavior both temporally and spatially we discuss the implications of our findings on the design of peer to peer architecture interest management mobility modeling of avatars server load balancing and zone partitioning caching and prefetching for user created virtual worlds
requirement for rendering realistic images interactively is efficiently simulating material properties recent techniques have improved the quality for interactively rendering dielectric materials but have mostly neglected phenomenon associated with refraction namely total internal reflection we present an algorithm to approximate total internal reflection on commodity graphics hardware using ray depth map intersection technique that is interactive and requires no precomputation our results compare favorably with ray traced images and improve upon approaches that avoid total internal reflection
during the last two decades wide variety of advanced methods for the visual exploration of large data sets have been proposed for most of these techniques user interaction has become crucial element since there are many situations in which users or analysts have to select the right parameter settings from among many in order to construct insightful visualizations the right choice of input parameters is essential since suboptimal parameter settings or the investigation of irrelevant data dimensions make the exploration process more time consuming and may result in wrong conclusions but finding the right parameters is often tedious process and it becomes almost impossible for an analyst to find an optimal parameter setting manually because of the volume and complexity of today’s data sets therefore we propose novel approach for automatically determining meaningful parameter and attribute settings based on the combined analysis of the data space and the resulting visualizations with respect to given task our technique automatically analyzes pixel images resulting from visualizations created from diverse parameter mappings and ranks them according to the potential value for the user this allows more effective and more efficient visual data analysis process since the attribute parameter space is reduced to meaningful selections and thus the analyst obtains faster insight into the data real world applications are provided to show the benefit of the proposed approach
procrastination scheduling has gained importance for energy efficiency due to the rapid increase in the leakage power consumption under procrastination scheduling task executions are delayed to extend processor shutdown intervals thereby reducing the idle energy consumption we propose algorithms to compute the maximum procrastination intervals for tasks scheduled by either the fixed priority or the dual priority scheduling policy we show that dual priority scheduling always guarantees longer shutdown intervals than fixed priority scheduling we further combine procrastination scheduling with dynamic voltage scaling to minimize the total static and dynamic energy consumption of the system our simulation experiments show that the proposed algorithms can extend the sleep intervals up to times while meeting the timing requirements the results show up to energy gains over dynamic voltage scaling
this paper studies virus inoculation game on social networks framework is presented which allows the measuring of the windfall of friendship ie how much players benefit if they care about the welfare of their direct neighbors in the social network graph compared to purely selfish environments we analyze the corresponding equilibria and show that the computation of the worst and best nash equilibrium is np hard intriguingly even though the windfall of friendship can never be negative the social welfare does not increase monotonically with the extent to which players care for each other while these phenomena are known on an anecdotal level our framework allows us to quantify these effects analytically
in this paper we present nocee fast and accurate method for extracting energy models for packet switched network on chip noc routers linear regression is used to model the relationship between events occurring in the noc and energy consumption the resulting models are cycle accurate and can be applied to different technology libraries we verify the individual router estimation models with many different synthetically generated traffic patterns and data inputs characterization of small library takes about two hours the mean absolute energy estimation error of the resultant models is max against complete gate level simulation we also apply this method to number of complete nocs with inputs extracted from synthetic application traces and compare our estimated results to the gate level power simulations mean absolute error is our estimation methodology has been integrated with commercial logic synthesis flow and power estimation tools synopsys design compiler and primepower allowing application across different designs the extracted models show the different trends across various parameterizations of network on chip routers and have been integrated into an architecture exploration framework
data cube has been playing an essential role in fast olap online analytical processing in many multi dimensional data warehouses however there exist data sets in applications like bioinformatics statistics and text processing that are characterized by high dimensionality eg over dimensions and moderate size eg around tuples no feasible data cube can be constructed with such data sets in this paper we will address the problem of developing an efficient algorithm to perform olap on such data sets experience tells us that although data analysis tasks may involve high dimensional space most olap operations are performed only on small number of dimensions at time based on this observation we propose novel method that computes thin layer of the data cube together with associated value list indices this layer while being manageable in size will be capable of supporting flexible and fast olap operations in the original high dimensional space through experiments we will show that the method has costs that scale nicely with dimensionality furthermore the costs are comparable to that of accessing an existing data cube when full materialization is possible
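The following sketch is a simplification, not the paper's algorithm, but it captures the spirit of keeping only a thin layer plus value-list indices: one inverted tuple-id list per dimension, with low-dimensional aggregates assembled on demand by intersecting those lists. The toy rows and dimension names are invented.

```python
from collections import defaultdict

rows = [
    {"gene": "g1", "tissue": "liver", "lab": "A", "expr": 3.2},
    {"gene": "g1", "tissue": "brain", "lab": "A", "expr": 1.1},
    {"gene": "g2", "tissue": "liver", "lab": "B", "expr": 2.7},
    {"gene": "g2", "tissue": "liver", "lab": "A", "expr": 0.4},
]

# "thin layer": one inverted value list (tuple-id index) per dimension
index = defaultdict(lambda: defaultdict(set))
for tid, r in enumerate(rows):
    for dim in ("gene", "tissue", "lab"):
        index[dim][r[dim]].add(tid)

def query(measure="expr", **selections):
    """Sum the measure over tuples matching the selected dimension values,
    e.g. query(tissue='liver', lab='A'); only the few queried dimensions are
    touched, never a full high-dimensional cube."""
    tids = None
    for dim, val in selections.items():
        ids = index[dim][val]
        tids = ids if tids is None else tids & ids
    return sum(rows[t][measure] for t in (tids or set()))

print(query(tissue="liver", lab="A"))   # 3.2 + 0.4 = 3.6
```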
this paper presents domain specific dependency constraint language that allows software architects to restrict the spectrum of structural dependencies which can be established in object oriented systems the ultimate goal is to provide architects with means to define acceptable and unacceptable dependencies according to the planned architecture of their systems once defined such restrictions are statically enforced by tool thus avoiding silent erosions in the architecture the paper also presents results from applying the proposed approach to different versions of real world human resource management system
the processing of xml queries can result in evaluation of various structural relationships efficient algorithms for evaluating ancestor descendant and parent child relationships have been proposed whereas the problems of evaluating preceding sibling following sibling and preceding following relationships are still open in this paper we studied the structural join and staircase join for sibling relationship first the idea of how to filter out and minimize unnecessary reads of elements using parent’s structural information is introduced which can be used to accelerate structural joins of parent child and preceding sibling following sibling relationships second two efficient structural join algorithms of sibling relationship are proposed these algorithms lead to optimal join performance nodes that do not participate in the join can be judged beforehand and then skipped using tree index besides each element list joined is scanned sequentially once at most furthermore output of join results is sorted in document order we also discussed the staircase join algorithm for sibling axes studies show that staircase join for sibling axes is close to the structural join for sibling axes and shares the same characteristic of high efficiency our experimental results not only demonstrate the effectiveness of our optimizing techniques for sibling axes but also validate the efficiency of our algorithms as far as we know this is the first work addressing this problem specially
computing clusters cc consisting of several connected machines could provide high performance multiuser time sharing environment for executing parallel and sequential jobs in order to achieve good performance in such an environment it is necessary to assign processes to machines in manner that ensures efficient allocation of resources among the jobs this paper presents opportunity cost algorithms for online assignment of jobs to machines in cc these algorithms are designed to improve the overall cpu utilization of the cluster and to reduce the and the interprocess communication ipc overhead our approach is based on known theoretical results on competitive algorithms the main contribution of the paper is how to adapt this theory into working algorithms that can assign jobs to machines in manner that guarantees near optimal utilization of the cpu resource for jobs that perform and ipc operations the developed algorithms are easy to implement we tested the algorithms by means of simulations and executions in real system and show that they outperform existing methods for process allocation that are based on ad hoc heuristics
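One way to realize the opportunity-cost idea (a hedged sketch, not the paper's exact cost function) is to price each machine with a convex function of its normalized resource usage and place a new job where the marginal price increase is smallest; the base constant and resource names are illustrative.

```python
def price(loads, capacities, base=64):
    """Convex 'opportunity cost' of one machine: cheap while lightly used,
    very expensive as any resource approaches saturation."""
    return sum(base ** min(l / c, 1.0) for l, c in zip(loads, capacities))

def assign(job, machines):
    """job and machine loads are (cpu, io, ipc)-style demand/usage tuples."""
    def marginal(m):
        before = price(m["load"], m["cap"])
        after = price([l + d for l, d in zip(m["load"], job)], m["cap"])
        return after - before
    best = min(machines, key=marginal)
    best["load"] = [l + d for l, d in zip(best["load"], job)]
    return best["name"]

machines = [
    {"name": "n1", "load": [0.7, 0.2, 0.1], "cap": [1.0, 1.0, 1.0]},
    {"name": "n2", "load": [0.3, 0.5, 0.2], "cap": [1.0, 1.0, 1.0]},
]
print(assign((0.2, 0.1, 0.0), machines))   # 'n2': smaller marginal cost
```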
we present novel algorithm called clicks that finds clusters in categorical datasets based on search for partite maximal cliques unlike previous methods clicks mines subspace clusters it uses selective vertical method to guarantee complete search clicks outperforms previous approaches by over an order of magnitude and scales better than any of the existing method for high dimensional datasets these results are demonstrated in comprehensive performance study on real and synthetic datasets
we propose new instruction branch on random that is like standard conditional branch except rather than specifying the condition on which the branch should be taken it specifies frequency at which the branch should be taken we show that branch on random is useful for reducing the overhead of program instrumentation via sampling specifically branch on random provides an order of magnitude reduction in execution time overhead compared to previously proposed software only frameworks for instrumentation sampling furthermore we demonstrate that branch on random can be cleanly architected and implemented simply and efficiently for simple processors we estimate that branch on random can be implemented with bits of state and less than gates for aggressive superscalars this grows to less than bits of state and at most few hundred gates
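A software analogue helps show what the instruction would replace: today a sampling framework must evaluate a counter or random test at every potential instrumentation point, as in the toy sketch below, whereas branch on random would make that decision in hardware at negligible cost. The sampling rate and names are invented.

```python
import random

SAMPLE_RATE = 1 / 1000          # take the instrumented path ~0.1% of the time
samples = []

def maybe_instrument(pc, value):
    # a branch-on-random instruction would perform this test for free;
    # here it is an explicit software check on every call
    if random.random() < SAMPLE_RATE:
        samples.append((pc, value))   # slow, instrumented path
    # fast path: fall through with no bookkeeping

for i in range(100_000):
    maybe_instrument(pc=0x401000, value=i)
print(len(samples), "sampled events")     # roughly 100
```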
sharir and welzl introduced an abstract framework for optimization problems called lp type problems or also generalized linear programming problems which proved useful in algorithm design we define new and as we believe simpler and more natural framework violator spaces which constitute proper generalization of lp type problems we show that clarkson’s randomized algorithms for low dimensional linear programming work in the context of violator spaces for example in this way we obtain the fastest known algorithm for the matrix generalized linear complementarity problem with constant number of blocks we also give two new characterizations of lp type problems they are equivalent to acyclic violator spaces as well as to concrete lp type problems informally the constraints in concrete lp type problem are subsets of linearly ordered ground set and the value of set of constraints is the minimum of its intersection
sql extensions that allow queries to explicitly specify data quality requirements in terms of currency and consistency were proposed in an earlier paper this paper develops data quality aware finer grained cache model and studies cache design in terms of four fundamental properties presence consistency completeness and currency the model provides an abstract view of the cache to the query processing layer and opens the door for adaptive cache management we describe an implementation approach that builds on the mtcache framework for partially materialized views the optimizer checks most consistency constraints and generates dynamic plan that includes currency checks and inexpensive checks for dynamic consistency constraints that cannot be validated during optimization our solution not only supports transparent caching but also provides fine grained data currency and consistency guarantees
significant fraction of the software and resource usage of modern handheld computer is devoted to its graphical user interface gui moreover guis are direct users of the display and also determine how users interact with software given that displays consume significant fraction of system energy it is very important to optimize guis for energy consumption this work presents the first gui energy characterization methodology energy consumption is characterized for three popular gui platforms windows window system and qt from the hardware software and application perspectives based on this characterization insights are offered for improving gui platforms and designing guis in an energy efficient and aware fashion such characterization also provides firm basis for further research on gui energy optimization
in this work we analyze the complexity of local broadcasting in the physical interference model we present two distributed randomized algorithms one that assumes that each node knows how many nodes there are in its geographical proximity and another which makes no assumptions about topology knowledge we show that if the transmission probability of each node meets certain characteristics the analysis can be decoupled from the global nature of the physical interference model and each node performs successful local broadcast in time proportional to the number of neighbors in its physical proximity we also provide worst case optimality guarantees for both algorithms and demonstrate their behavior in average scenarios through simulations
we consider the problem of whether given program satisfies specified safety property interesting programs have infinite state spaces with inputs ranging over infinite domains and for these programs the property checking problem is undecidable two broad approaches to property checking are testing and verification testing tries to find inputs and executions which demonstrate violations of the property verification tries to construct formal proof which shows that all executions of the program satisfy the property testing works best when errors are easy to find but it is often difficult to achieve sufficient coverage for correct programs on the other hand verification methods are most successful when proofs are easy to find but they are often inefficient at discovering errors we propose new algorithm synergy which combines testing and verification synergy unifies several ideas from the literature including counterexample guided model checking directed testing and partition refinement this paper presents description of the synergy algorithm its theoretical properties comparison with related algorithms and prototype implementation called yogi
this paper presents our toolkit for developing java bytecode translator bytecode translation is getting important in various domains such as generative programming and aspect oriented programming to help the users easily develop translator the design of our toolkit is based on the reflective architecture however the previous implementations of this architecture involved serious runtime penalties to address this problem our toolkit uses custom compiler so that the runtime penalties are minimized since the previous version of our toolkit named javassist has been presented in another paper this paper focuses on this new compiler support for performance improvement this feature was not included in the previous version
click fraud is jeopardizing the industry of internet advertising internet advertising is crucial for the thriving of the entire internet since it allows producers to advertise their products and hence contributes to the well being of commerce moreover advertising supports the intellectual value of the internet by covering the running expenses of publishing content some content publishers are dishonest and use automation to generate traffic to defraud the advertisers similarly some advertisers automate clicks on the advertisements of their competitors to deplete their competitors advertising budgets this paper describes the advertising network model and focuses on the most sophisticated type of fraud which involves coalitions among fraudsters we build on several published theoretical results to devise the similarity seeker algorithm that discovers coalitions made by pairs of fraudsters we then generalize the solution to coalitions of arbitrary sizes before deploying our system on real network we conducted comprehensive experiments on data samples for proof of concept the results were very accurate we detected several coalitions formed using various techniques and spanning numerous sites this reveals the generality of our model and approach
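The sketch below is not the similarity seeker algorithm itself; it only illustrates the underlying intuition for pairwise coalitions: publishers whose sets of clicking sources overlap far more than independent honest traffic would are suspicious. The data and threshold are made up.

```python
from itertools import combinations

clicks = {                       # publisher -> set of click source identifiers
    "siteA": {"ip1", "ip2", "ip3", "ip4"},
    "siteB": {"ip2", "ip3", "ip4", "ip5"},
    "siteC": {"ip9", "ip10"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def suspicious_pairs(clicks, threshold=0.5):
    """Flag publisher pairs whose click sources overlap suspiciously much."""
    return [(p, q, round(jaccard(clicks[p], clicks[q]), 2))
            for p, q in combinations(clicks, 2)
            if jaccard(clicks[p], clicks[q]) >= threshold]

print(suspicious_pairs(clicks))  # [('siteA', 'siteB', 0.6)]
```

Generalizing from flagged pairs to coalitions of arbitrary size is the harder step the abstract refers to.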
regular path queries are the building blocks of almost any mechanism for querying semistructured data despite the fact that the main applications of such data are distributed there are only few works dealing with distributed evaluation of regular path queries in this paper we present message efficient and truly distributed algorithm for computing the answer to regular path queries in multi source semistructured database setting our algorithm is general as it works for the larger class of weighted regular path queries on weighted semistructured databases also we show how to make our algorithm fault tolerant to smoothly work in environments prone to process or machine failures this is very desirable in grid setting which is today’s new paradigm of distributed computing and where one does not have full control over machines that can unexpectedly leave in the middle of computation
in this paper we address the problem of garbage collection in single failure fault tolerant home based lazy release consistency hlrc distributed shared memory dsm system based on independent checkpointing and logging our solution uses laziness in garbage collection and exploits consistency constraints of the hlrc memory model for low overhead and scalability we prove safe bounds on the state that must be retained in the system to guarantee correct recovery after failure we devise two algorithms for garbage collection of checkpoints and logs checkpoint garbage collection cgc and lazy log trimming llt the proposed approach targets large scale distributed shared memory computing on local area clusters of computers in such systems using global synchronization or extra communication for garbage collection is inefficient or simply impractical due to system scale and temporary disconnections in communication the challenge lies in controlling the size of the logs and the number of checkpoints without global synchronization while tolerating transient disruptions in communication our garbage collection scheme is completely distributed does not force processes to synchronize does not add extra messages to the base dsm protocol and uses only the available dsm protocol information evaluation results for real applications show that it effectively bounds the number of past checkpoints to be retained and the size of the logs in stable storage
in on line analytical processing olap users explore database cube with roll up and drill down operations in order to find interesting results most approaches rely on simple aggregations and value comparisons in order to validate findings in this work we propose to combine olap dimension lattice traversal and statistical tests to discover significant metric differences between highly similar groups parametric statistical test allows pair wise comparison of neighboring cells in cuboids providing statistical evidence about the validity of findings we introduce two dimensional checkerboard visualization of the cube that allows interactive exploration to understand significant measure differences between two cuboids differing in one dimension along with associated image data our system is tightly integrated into relational dbms by dynamically generating sql code which incorporates several optimizations to efficiently explore the cube to visualize discovered cell pairs and to view associated images we present an experimental evaluation with medical data sets focusing on finding significant relationships between risk factors and disease
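As an example of the kind of pair-wise parametric test described above (a sketch, not the system's implementation), Welch's t statistic can compare the measure values behind two neighboring cells; the sample data and the large-sample 1.96 threshold are simplifying assumptions.

```python
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's t statistic for two samples with unequal variances."""
    vx, vy = variance(xs), variance(ys)
    return (mean(xs) - mean(ys)) / ((vx / len(xs) + vy / len(ys)) ** 0.5)

def significantly_different(xs, ys, z=1.96):
    # a real system would use the t distribution with Welch's degrees of
    # freedom instead of this rough large-sample cutoff
    return abs(welch_t(xs, ys)) > z

smokers    = [0.82, 0.91, 0.78, 0.88, 0.84, 0.90]   # measure behind one cell
nonsmokers = [0.55, 0.61, 0.47, 0.59, 0.52, 0.58]   # neighboring cell
print(significantly_different(smokers, nonsmokers))  # True
```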
gaifman shapiro style architecture of program modules is introduced in the case of normal logic programs under stable model semantics the composition of program modules is suitably limited by module conditions which ensure the compatibility of the module system with stable models the resulting module theorem properly strengthens lifschitz and turner’s splitting set theorem for normal logic programs consequently the respective notion of equivalence between modules ie modular equivalence proves to be congruence relation moreover it is shown how our translation based verification method is accommodated to the case of modular equivalence and how the verification of weak visible equivalence can be optimized as sequence of module level tests
time series graphs are often used to visualize phenomena that change over time common tasks include comparing values at different points in time and searching for specified patterns either exact or approximate however tools that support time series graphs typically separate query specification from the actual search process allowing users to adapt the level of similarity only after specifying the pattern we introduce relaxed selection techniques in which users implicitly define level of similarity that can vary across the search pattern while creating search query with single gesture interaction users sketch over part of the graph establishing the level of similarity through either spatial deviations from the graph or the speed at which they sketch temporal deviations in user study participants were significantly faster when using our temporally relaxed selection technique than when using traditional techniques in addition they achieved significantly higher precision and recall with our spatially relaxed selection technique compared to traditional techniques
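A rough sketch of spatially relaxed matching, under the assumption that the user's gesture has already been converted into a query pattern with a per-point tolerance (wider where the sketch deviated more from the graph):

```python
def matches(window, pattern, tolerances):
    """A window matches if every point lies inside its tolerance band."""
    return all(abs(w - p) <= t
               for w, p, t in zip(window, pattern, tolerances))

def search(series, pattern, tolerances):
    n = len(pattern)
    return [i for i in range(len(series) - n + 1)
            if matches(series[i:i + n], pattern, tolerances)]

series     = [1.0, 1.2, 2.9, 4.1, 3.0, 1.1, 3.1, 4.0, 2.8, 1.0]
pattern    = [3.0, 4.0, 3.0]          # a peak the user sketched
tolerances = [0.3, 0.2, 0.3]          # tighter where the sketch hugged the graph
print(search(series, pattern, tolerances))   # [2, 6]
```

Temporal relaxation would analogously allow each pattern point to shift within a small time window rather than a value band.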
the computation time of scalable tasks depends on the number of processors allocated to them in multiprocessor systems as more processors are allocated to scalable task the overall computation time of the task decreases but the total amount of processors time devoted to the execution of the task called workload increases due to parallel execution overhead in this paper we propose task scheduling algorithm that utilizes the property of scalable tasks for on line and real time scheduling in the proposed algorithm the total workload of all scheduled tasks is reduced by managing processors allocated to the tasks as few as possible without missing their deadlines as result the processors in the system have less load to execute the scheduled tasks and can execute more newly arriving tasks before their deadlines simulation results show that the proposed algorithm performs significantly better than the conventional algorithm based on fixed number of processors to execute each task
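A minimal illustration of the allocation rule, assuming an invented execution-time model T(p) = work/p + overhead * (p - 1): give each scalable task the fewest processors that still fit its slack, keeping the remaining processors free for newly arriving tasks.

```python
def exec_time(work, p, overhead=0.05):
    """Assumed model: ideal work/p plus a per-processor coordination cost."""
    return work / p + overhead * (p - 1)

def min_processors(work, slack, max_p=64):
    """Smallest processor count whose predicted finish time fits the slack
    (deadline minus release time); None if even max_p cannot make it."""
    for p in range(1, max_p + 1):
        if exec_time(work, p) <= slack:
            return p
    return None

print(min_processors(work=10.0, slack=3.0))   # 4 processors suffice
print(min_processors(work=10.0, slack=0.1))   # None: infeasible
```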
rough set theory is useful tool for dealing with inexact uncertain or vague knowledge in information systems the classical rough set theory is based on equivalence relations and has been extended to covering based generalized rough set theory this paper investigates three types of covering generalized rough sets within an axiomatic approach concepts and basic properties of each type of covering based approximation operators are first reviewed axiomatic systems of the covering based approximation operators are then established the independence of axiom set for characterizing each type of covering based approximation operators is also examined as result two open problems about axiomatic characterizations of covering based approximation operators proposed by zhu and wang in ieee transactions on knowledge and data engineering proceedings of the third ieee international conference on intelligent systems pp are solved
in the last several years large multidimensional databases have become common in variety of applications such as data warehousing and scientific computing analysis and exploration tasks place significant demands on the interfaces to these databases because of the size of the data sets dense graphical representations are more effective for exploration than spreadsheets and charts furthermore because of the exploratory nature of the analysis it must be possible for the analysts to change visualizations rapidly as they pursue cycle involving first hypothesis and then experimentation in this paper we present polaris an interface for exploring large multidimensional databases that extends the well known pivot table interface the novel features of polaris include an interface for constructing visual specifications of table based graphical displays and the ability to generate precise set of relational queries from the visual specifications the visual specifications can be rapidly and incrementally developed giving the analyst visual feedback as they construct complex queries and visualizations
mobile devices have been used as tools for navigation and geographic information retrieval with some success however screen size glare and the cognitive demands of the interface are often cited as weaknesses when compared with traditional tools such as paper maps and guidebooks in this paper simple mixed media approach is presented which tries to address some of these concerns by combining paper maps with electronic guide resources information about landmark or region is accessed by waving handheld computer equipped with a radio frequency identification rfid reader above the region of interest on paper map we discuss our prototyping efforts including lessons learned about using rfid for mixed media interfaces we then present and discuss evaluations conducted in the field and in comparative exploratory study results indicate that the method is promising for tourism and other activities requiring mobile geographically related information access
most existing reranking approaches to image search focus solely on mining visual cues within the initial search results however the visual information cannot always provide enough guidance to the reranking process for example different images with similar appearance may not always present the same relevant information to the query observing that multi modality cues carry complementary relevant information we propose the idea of co reranking for image search by jointly exploring the visual and textual information co reranking couples two random walks while reinforcing the mutual exchange and propagation of information relevancy across different modalities the mutual reinforcement is iteratively updated to constrain information exchange during random walk as result the visual and textual reranking can take advantage of more reliable information from each other after every iteration experiment results on real world dataset msra mm collected from bing image search engine shows that co reranking outperforms several existing approaches which do not or weakly consider multi modality interaction
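A simplified version of the coupling idea (the exact update rules in the paper differ): two random walks, one over a visual similarity graph and one over a textual one, each restarting from the other's current scores so the modalities reinforce each other. The similarity matrices and initial scores are toy values.

```python
import numpy as np

def normalize_rows(S):
    return S / S.sum(axis=1, keepdims=True)

def co_rerank(S_vis, S_txt, init, alpha=0.8, iters=50):
    Pv, Pt = normalize_rows(S_vis), normalize_rows(S_txt)
    v = t = np.asarray(init, dtype=float)
    for _ in range(iters):
        v_new = alpha * Pv.T @ v + (1 - alpha) * t   # visual walk, textual restart
        t_new = alpha * Pt.T @ t + (1 - alpha) * v   # textual walk, visual restart
        v, t = v_new / v_new.sum(), t_new / t_new.sum()
    return (v + t) / 2                               # fused reranking score

S_vis = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S_txt = np.array([[1.0, 0.2, 0.7], [0.2, 1.0, 0.1], [0.7, 0.1, 1.0]])
init  = [0.5, 0.3, 0.2]           # initial (text-based) search scores
print(co_rerank(S_vis, S_txt, init))
```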
to operate reliably in environments where interaction with an operator is infrequent or undesirable an autonomous system should be capable of both determining how to achieve its objectives and adapting to novel circumstances on its own we have developed an approach to constructing autonomous systems that synthesise tasks from high level goals and adapt their software architecture to perform these tasks reliably in changing environment this paper presents our approach through detailed case study highlighting the challenges involved
in this article we present the parallelisation of an explicit state ctl model checking algorithm for virtual shared memory high performance parallel machine architecture the algorithm uses combination of private and shared data structures for implicit and dynamic load balancing with minimal synchronisation overhead the performance of the algorithm and the impact that different design decisions have on the performance are analysed using both mathematical cost models and experimental results the analysis shows not only the practicality and effective speedup of the algorithm but also the main pitfalls of parallelising model checking for shared memory architectures
due to the huge increase in the number and dimension of available databases efficient solutions for counting frequent sets are nowadays very important within the data mining community several sequential and parallel algorithms were proposed which in many cases exhibit excellent scalability in this paper we present pardci distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases pardci is parallel version of dci direct count intersect multi strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform eg available memory but also to the features of the dataset being processed eg sparse or dense datasets pardci enhances previous proposals by exploiting the highly optimized counting and intersection techniques of dci and by relying on multi level parallelization approach which explicitly targets clusters of smps an emerging computing platform we focused our work on the efficient exploitation of the underlying architecture intra node multithreading effectively exploits the memory hierarchies of each smp node while inter node parallelism exploits smart partitioning techniques aimed at reducing communication overheads in depth experimental evaluations demonstrate that pardci reaches nearly optimal performances under variety of conditions
describing and capturing significant differences between two classes of data is an important data mining and classification research topic in this paper we use emerging patterns to describe these significant differences such pattern occurs in one class of samples its home class with high frequency but does not exist in the other class so it can be considered as characteristic property of its home class we call the collection of all such patterns space beyond the space there are patterns that occur in both of the classes or that do not occur in any of the two classes within the space the most general and most specific patterns bound the other patterns in lossless convex way we decompose the space into terrace of pattern plateaus based on their frequency we use the most general patterns to construct accurate classifiers we also use these patterns in the bio medical domain to suggest treatment plans for adjusting the expression levels of certain genes so that patients can be cured
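A brute-force sketch of the core notion, for tiny data only and not a practical miner: itemsets that are frequent in their home class and absent from the other class, i.e. jumping emerging patterns. The gene-expression style items are invented.

```python
from itertools import combinations

def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def emerging_patterns(home, other, min_support=0.5, max_size=2):
    """Itemsets frequent in the home class with zero support in the other."""
    items = sorted(set().union(*home))
    patterns = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, home) >= min_support and support(s, other) == 0:
                patterns.append((set(s), support(s, home)))
    return patterns

cured     = [{"geneA_high", "geneB_low"}, {"geneA_high", "geneC_low"},
             {"geneA_high", "geneB_low"}]
not_cured = [{"geneA_low", "geneB_low"}, {"geneA_low", "geneC_high"}]
print(emerging_patterns(cured, not_cured))
# e.g. {'geneA_high'} and {'geneA_high', 'geneB_low'} characterize the home class
```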
we present an algorithm for polygonizing closed implicit surfaces which produces meshes adapted to the local curvature of the surface our method is similar to but not based on marching triangles in that we start from point on the surface and develop mesh from that point using surface tracking approach however our approach works by managing fronts or sets of points on the border of the current polygonization fronts can subdivide to form new fronts or merge if they become adjacent in marked departure from previous approaches our meshes approximate the surface through heuristics relying on curvature furthermore our method works completely on the fly resolving cracks as it proceeds without the need for any post remeshing step to correct failures we have tested the algorithm with three different representations of implicit surfaces variational analytical and mpu using non trivial data sets yielding results that illustrate the flexibility and scalability of our technique performance comparisons with variants of marching cubes show that our approach is capable of good accuracy and meshing quality without sacrificing computing resources
we study fractional scheduling problems in sensor networks in particular sleep scheduling generalisation of fractional domatic partition and activity scheduling generalisation of fractional graph colouring the problems are hard to solve in general even in centralised setting however we show that there are practically relevant families of graphs where these problems admit local distributed approximation algorithm in local algorithm each node utilises information from its constant size neighbourhood only our algorithm does not need the spatial coordinates of the nodes it suffices that subset of nodes is designated as markers during network deployment our algorithm can be applied in any marked graph satisfying certain bounds on the marker density if the bounds are met guaranteed near optimal solutions can be found in constant time space and communication per node we also show that auxiliary information is necessary no local algorithm can achieve satisfactory approximation guarantee on unmarked graphs
as the number of cores on chip increases power consumed by the communication structures takes significant portion of the overall power budget in this article we first propose circuit switched interconnection architecture which uses crossroad switches to construct dedicated channels dynamically between any pairs of cores for nonhuge application specific socs the structure of the crossroad switch is simple which can be regarded as noc lite router and we can easily construct low power on chip network with these switches by system level design methodology we also present the design methodology to tailor the proposed interconnection architecture to low power structures by two proposed optimization schemes with profiled communication characteristics the first scheme is power aware topology construction which can build low power application specific interconnection topologies to further reduce the power consumption we propose the second optimization scheme to predetermine the operating mode of dual mode switches in the noc at runtime we evaluate several interconnection techniques and the results show that the proposed architecture is more low power and high performance than others under some constraints and scale boundaries we take multimedia applications as case studies and experimental results show the power savings of power aware topology approximate to percent of the interconnection architecture the power consumption can be further reduced approximately percent by applying partially dedicated path mechanism
peer to peer pp networks are vulnerable from malicious attacks by anonymous users by populating unprotected peers with poisoned file indices the attacker can launch poisoning ddos distributed denial of service attacks on any host in the network we solve this security problem with identity based signatures contained in file indexes to establish peer accountability we prove that index accountability can effectively block index poisoning ddos attacks in any open pp environment new accountable indexing protocol aip is proposed to enforce peer accountability this protocol is applicable to all pp file sharing networks either structured or unstructured the system allows gradual transition of peers to become aip enabled we develop an analytical model to characterize the poison propagation patterns the poisoning model is validated by simulated aip experiments on large scale pp networks over one million of peer nodes
recent practical experience with description logics dls has revealed that their expressivity is often insufficient to accurately describe structured objects objects whose parts are interconnected in arbitrary rather than tree like ways to address this problem we propose an extension of dl languages with description graphs modeling construct that can accurately describe objects whose parts are connected in arbitrary ways furthermore to enable modeling the conditional aspects of structured objects we also incorporate rules into our formalism we present an in depth study of the computational properties of such formalism in particular we first identify the sources of undecidability of the general unrestricted formalism and then present restriction that makes reasoning decidable finally we present tight complexity bounds
with increasing complexity of modern embedded systems the availability of highly optimizing compilers becomes more and more important at the same time application specific instruction set processors asips are used to fine tune hardware platforms to the intended application demanding the availability of retargetable components throughout the whole tool chain very promising approach is to model the target architecture using dedicated description language that is rich enough to generate hardware components and the required tool chain eg assembler linker simulator and compiler in this work we present new structural architecture description language adl that is used to derive the architecture dependent components of compiler backend most notably an instruction selector based on tree pattern matching we combine our backend with gcc thereby opening up the way for large number of readily available high level optimizations experimental results show that the automatically derived code generator is competitive in comparison to handcrafted compiler backend
association rules represent promising technique to find hidden patterns in medical data set the main issue about mining association rules in medical data set is the large number of rules that are discovered most of which are irrelevant such number of rules makes search slow and interpretation by the domain expert difficult in this work search constraints are introduced to find only medically significant association rules and make search more efficient in medical terms association rules relate heart perfusion measurements and patient risk factors to the degree of stenosis in four specific arteries association rule medical significance is evaluated with the usual support and confidence metrics but also lift association rules are compared to predictive rules mined with decision trees well known machine learning technique decision trees are shown to be not as adequate for artery disease prediction as association rules experiments show decision trees tend to find few simple rules most rules have somewhat low reliability most attribute splits are different from medically common splits and most rules refer to very small sets of patients in contrast association rules generally include simpler predictive rules they work well with user binned attributes rule reliability is higher and rules generally refer to larger sets of patients
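as an illustrative aside the following python sketch computes the support confidence and lift metrics mentioned above for one candidate rule over tiny boolean transaction table the attribute names and the rule itself are hypothetical and are not taken from the medical data set

# minimal sketch of the rule quality metrics discussed above:
# support, confidence and lift; data and names are made up
transactions = [
    {"high_risk": True,  "stenosis": True},
    {"high_risk": True,  "stenosis": False},
    {"high_risk": False, "stenosis": False},
    {"high_risk": True,  "stenosis": True},
]

def support(items, data):
    # fraction of transactions in which every item holds
    return sum(all(t[i] for i in items) for t in data) / len(data)

def confidence(antecedent, consequent, data):
    # support(A and C) / support(A)
    return support(antecedent + consequent, data) / support(antecedent, data)

def lift(antecedent, consequent, data):
    # confidence(A -> C) / support(C); values above 1 suggest association
    return confidence(antecedent, consequent, data) / support(consequent, data)

rule = (["high_risk"], ["stenosis"])
print(support(rule[0] + rule[1], transactions),
      confidence(rule[0], rule[1], transactions),
      lift(rule[0], rule[1], transactions))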
geometric flows are ubiquitous in mesh processing curve and surface evolutions based on functional minimization have been used in the context of surface diffusion denoising shape optimization minimal surfaces and geodesic paths to mention few such gradient flows are nearly always yet often implicitly based on the canonical l2 inner product of vector fields in this paper we point out that changing this inner product provides simple powerful and untapped approach to extend current flows we demonstrate the value of such norm alteration for regularization and volume preservation purposes and in the context of shape matching where deformation priors ranging from rigid motion to articulated motion can be incorporated into gradient flow to drastically improve results implementation details including differentiable approximation of the hausdorff distance between irregular meshes are presented
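a hedged latex sketch of the idea above the flow of an energy e under the canonical l2 inner product and under modified inner product induced by symmetric positive definite operator m the particular choice of m is illustrative only

% gradient flow of an energy E on an evolving surface X under the canonical L2 inner product
\[
\frac{\partial X}{\partial t} = -\,\nabla_{L^2} E(X),
\qquad
\langle u, v \rangle_{L^2} = \int_X u \cdot v \,\mathrm{d}A
\]
% replacing L2 by a modified inner product <u,v>_M = <u, M v>_{L2},
% with M symmetric positive definite, changes the descent path but not the minimizers
\[
\frac{\partial X}{\partial t} = -\,M^{-1}\,\nabla_{L^2} E(X)
\]
% e.g. an M that penalizes non-rigid displacement fields acts as a deformation prior for shape matching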
summarizing set of streaming time series is an important issue that reliably allows information to be monitored and stored in domains such as finance networks etc to date most existing algorithms have focused on this problem by summarizing the time series separately moreover the same amount of memory has been allocated to each time series yet memory management is an important subject in the data stream field but framework allocating equal amount of memory to each sequence is not appropriate we introduce an effective and efficient method which succeeds in responding to both challenges memory optimized framework along with fast novel sequence merging method experiments with real data show that this method is effective and efficient
in this paper we propose new algorithm for proving the validity or invalidity of pre postcondition pair for program the algorithm is motivated by the success of the algorithms for probabilistic inference developed in the machine learning community for reasoning in graphical models the validity or invalidity proof consists of providing an invariant at each program point that can be locally verified the algorithm works by iteratively randomly selecting program point and updating the current abstract state representation to make it more locally consistent with respect to the abstractions at the neighboring points we show that this simple algorithm has some interesting aspects it brings together the complementary powers of forward and backward analyses the algorithm has the ability to recover itself from excessive under approximation or over approximation that it may make because the algorithm does not distinguish between the forward and backward information the information could get both under approximated and over approximated at any step the randomness in the algorithm ensures that the correct choice of updates is eventually made as there is no single deterministic strategy that would provably work for any interesting class of programs in our experiments we use this algorithm to produce the proof of correctness of small but non trivial example in addition we empirically illustrate several important properties of the algorithm
the increasing heterogeneity dynamism and interconnectivity in software applications services and networks led to complex unmanageable and insecure systems coping with such complexity necessitates investigating new paradigm namely autonomic computing although academic and industry efforts are beginning to proliferate in this research area there are still lots of open issues that remain to be solved this paper proposes categorization of complexity in systems and presents an overview of autonomic computing research area the paper also discusses summary of the major autonomic computing systems that have been already developed both in academia and industry and finally outlines the underlying research issues and challenges from practical as well as theoretical point of view
researchers are using emerging technologies to develop novel play environments while established computer and console game markets continue to grow rapidly even so evaluating the success of interactive play environments is still an open research challenge both subjective and objective techniques fall short due to limited evaluative bandwidth there remains no corollary in play environments to task performance with productivity systems this paper presents method of modeling user emotional state based on user’s physiology for users interacting with play technologies modeled emotions are powerful because they capture usability and playability through metrics relevant to ludic experience account for user emotion are quantitative and objective and are represented continuously over session furthermore our modeled emotions show the same trends as reported emotions for fun boredom and excitement however the modeled emotions revealed differences between three play conditions while the differences between the subjective reports failed to reach significance
in the last years mainstream research in vlsi placement has been driven by formal optimization and the ad hoc requirement that downstream tools particularly routers work progress is currently measured by improving routed wirelength and place and route run time on large benchmarks however these results now appear questionable as i major placers were shown to be tuned to particular benchmark suites and ii some reported improvements could not be replicated on full fledged industrial circuits instead of blind wirelength minimization our work seeks better understanding of what good placer should produce and what existing placers actually produce we abstract away details from various circuit patterns into separate constructive benchmarks and perform detailed study of leading placers unlike the randomized peko benchmarks ours are highly structured and easy to visualize we know all of their wirelength optimal solutions and in many cases there is only one per benchmark by comparing actual solutions to optimal ones we reason about the underlying placer algorithms and their possible improvements in new development we show that the wirelength sub optimality ratio of several existing placers quickly grows with the size of the netlist some of the reasons for such poor performance are obvious from our visualizations while it seems easy to coerce given placer to improve wirelength on any particular constructive benchmark improving the overall performance is more difficult we improve the performance of capo placer on several constructive benchmarks and proprietary cell circuit from ibm without wirelength penalty on commonly used benchmarks
in this paper we present number of measures that compare rankings of search engine results we apply these measures to five queries that were monitored daily for two periods of or days each rankings of the different search engines google yahoo and teoma for text searches and google yahoo and picsearch for image searches are compared on daily basis in addition to longitudinal comparisons of the same engine for the same query over time the results and rankings of the two periods are compared as well
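purely as an illustration of ranking comparison measures and not necessarily the exact measures used in the paper the python sketch below computes the overlap and the spearman footrule distance between two ranked result lists the urls are placeholders

# toy comparison of two ranked result lists, e.g. two engines for one query
# or the same engine on two different days; urls are placeholders
a = ["u1", "u2", "u3", "u4", "u5"]
b = ["u2", "u1", "u6", "u3", "u7"]

def overlap(r1, r2):
    # size of the intersection relative to the union of the two result sets
    return len(set(r1) & set(r2)) / len(set(r1) | set(r2))

def footrule(r1, r2):
    # spearman footrule over the common documents: sum of rank differences
    common = set(r1) & set(r2)
    return sum(abs(r1.index(d) - r2.index(d)) for d in common)

print(overlap(a, b), footrule(a, b))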
the results of machine learning from user behavior can be thought of as program and like all programs it may need to be debugged providing ways for the user to debug it matters because without the ability to fix errors users may find that the learned program’s errors are too damaging for them to be able to trust such programs we present new approach to enable end users to debug learned program we then use an early prototype of our new approach to conduct formative study to determine where and when debugging issues arise both in general and also separately for males and females the results suggest opportunities to make machine learned programs more effective tools
we define standard of effectiveness for database calculus relative to query language effectiveness judges suitability to serve as processing framework for the query language and comprises aspects of coverage manipulability and efficient evaluation we present the monoid calculus and argue its effectiveness for object oriented query languages exemplified by oql of odmg the monoid calculus readily captures such features as multiple collection types aggregations arbitrary composition of type constructors and nested query expressions we also show how to extend the monoid calculus to deal with vectors and arrays in more expressive ways than current query languages do and illustrate how it can handle identity and updates
most solutions for introducing variability in software system are singular they support one particular point in the software life cycle at which variability can be resolved to select specific instance of the system the presence of significantly increased and dissimilar levels of variability in today’s software systems requires flexible approach that supports selection of system instance at any point in the life cycle from statically at design time to dynamically at run time this paper introduces our approach to supporting any time variability an approach based on the ubiquitous use of product line architecture as the organizing abstraction throughout the lifetime of software system the product line architecture specifies the variabilities in system both in terms of space captured as explicit variation points and time captured as explicit versions of architectural elements system instance can be selected at any point in time by providing set of desired features expressed as name value pairs to an automated selector tool we introduce our overall approach discuss our representation and tools for expressing and managing variability and demonstrate their use with three representative examples of any time variability
texture optimization is texture synthesis method that can efficiently reproduce various features of exemplar textures however its slow synthesis speed limits its usage in many interactive or real time applications in this paper we propose parallel texture optimization algorithm to run on gpus in our algorithm coherence search and principal component analysis pca are used for hardware acceleration and two acceleration techniques are further developed to speed up our gpu based texture optimization with reasonable precomputation cost the online synthesis speed of our algorithm is times faster than that of the original texture optimization algorithm and thus our algorithm is capable of interactive applications the advantages of the new scheme are demonstrated by applying it to interactive editing of flow guided synthesis
analyzing the quality of data prior to constructing data mining models is emerging as an important issue algorithms for identifying noise in given data set can provide good measure of data quality considerable attention has been devoted to detecting class noise or labeling errors in contrast limited research work has been devoted to detecting instances with attribute noise in part due to the difficulty of the problem we present novel approach for detecting instances with attribute noise and demonstrate its usefulness with case studies using two different real world software measurement data sets our approach called pairwise attribute noise detection algorithm panda is compared with nearest neighbor distance based outlier detection technique denoted dm investigated in related literature since what constitutes noise is domain specific our case studies use software engineering expert to inspect the instances identified by the two approaches to determine whether they actually contain noise it is shown that panda provides better noise detection performance than the dm algorithm
internet routers and ethernet switches contain packet buffers to hold packets during times of congestion packet buffers are at the heart of every packet switch and router which have combined annual market of tens of billions of dollars and equipment vendors spend hundreds of millions of dollars on memory each year designing packet buffers used to be easy dram was cheap low power and widely used but something happened at gb when packets started to arrive and depart faster than the access time of dram alternative memories were needed but sram is too expensive and power hungry caching solution is appealing with hierarchy of sram and dram as used by the computer industry however in switches and routers it is not acceptable to have miss rate as it reduces throughput and breaks pipelines in this paper we describe how to build caches with hit rate under all conditions by exploiting the fact that switches and routers always store data in fifo queues we describe number of different ways to do it with and without pipelining with static or dynamic allocation of memory in each case we prove lower bound on how big the cache needs to be and propose an algorithm that meets or comes close to the lower bound these techniques are practical and have been implemented in fast silicon as result we expect the techniques to fundamentally change the way switches and routers use external memory
motion editing often requires repetitive operations for modifying similar action units to give similar effect or impression this paper proposes system for efficiently and flexibly editing the sequence of iterative actions by few intuitive operations our system visualizes motion sequence on summary timeline with editable pose icons and drag and drop operations on the timeline enable intuitive controls of temporal properties of the motion such as timing duration and coordination this graphical interface is also suited to transfer kinematical and temporal features between two motions through simple interactions with quick preview of the resulting poses our method also integrates the concept of edit propagation by which the manual modification of one action unit is automatically transferred to the other units that are robustly detected by similarity search technique we demonstrate the efficiency of our pose timeline interface with propagation mechanism for the timing adjustment of mutual actions and for motion synchronization with music sequence
increasing the number of instruction queue iq entries in dynamically scheduled processor exposes more instruction level parallelism leading to higher performance however increasing conventional iq’s physical size leads to larger latencies and slower clock speeds we introduce new iq design that divides large queue into small segments which can be clocked at high frequencies we use dynamic dependence based scheduling to promote instructions from segment to segment until they reach small issue buffer our segmented iq is designed specifically to accommodate variable latency instructions such as loads despite its roughly similar circuit complexity simulation results indicate that our segmented instruction queue with entries and chains improves performance by up to over entry conventional instruction queue for specint benchmarks and up to for specfp benchmarks the segmented iq achieves from to of the performance of monolithic entry queue while providing the potential for much higher clock speeds
prior work has shown that reduced ordered binary decision diagrams bdds can be powerful tool for program trace analysis and visualization unfortunately it can take hours or days to encode large traces as bdds further techniques used to improve bdd performance are inapplicable to large dynamic program traces this paper explores the use of zdds for compressing dynamic trace data prior work has shown that zdds can represent sparse data sets with less memory compared to bdds this paper demonstrates that zdds do indeed provide greater compression for sets of dynamic traces smaller than bdds on average with proper tuning zdds encode sets of dynamic trace data over faster than bdds and zdds can be used for all prior applications of bdds for trace analysis and visualization
we present the design and evaluation of passport system that allows source addresses to be validated within the network passport uses efficient symmetric key cryptography to place tokens on packets that allow each autonomous system as along the network path to independently verify that source address is valid it leverages the routing system to efficiently distribute the symmetric keys used for verification and is incrementally deployable without upgrading hosts we have implemented passport with click and xorp and evaluated the design via micro benchmarking experiments on the deterlab security analysis and adoptability modeling we find that passport is plausible for gigabit links and can mitigate reflector attacks even without separate denial of service defenses our adoptability modeling shows that passport provides stronger security and deployment incentives than alternatives such as ingress filtering this is because the isps that adopt it protect their own addresses from being spoofed at each other’s networks even when the overall deployment is small
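as hedged illustration of per as symmetric key tokens in the spirit of the scheme above and not the actual passport packet format the python sketch below stamps packet with one mac per as on the path and lets each as verify only its own token key distribution is assumed to have already happened eg via the routing system

import hmac, hashlib

# hypothetical per-AS symmetric keys; in the real system keys are
# distributed through the routing system, here they are just constants
path_keys = {"AS100": b"key-as100", "AS200": b"key-as200"}

def stamp(src_addr, payload, path):
    # the source attaches one MAC per AS on the path over (src, payload)
    msg = src_addr.encode() + payload
    return {asn: hmac.new(path_keys[asn], msg, hashlib.sha256).digest()
            for asn in path}

def verify(asn, src_addr, payload, tokens):
    # each AS independently recomputes and checks only its own token
    msg = src_addr.encode() + payload
    expected = hmac.new(path_keys[asn], msg, hashlib.sha256).digest()
    return hmac.compare_digest(tokens.get(asn, b""), expected)

tokens = stamp("10.0.0.1", b"data", ["AS100", "AS200"])
print(verify("AS200", "10.0.0.1", b"data", tokens))   # valid source
print(verify("AS200", "10.9.9.9", b"data", tokens))   # spoofed source fails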
envisioning new generation of sensor network applications in healthcare and workplace safety we seek mechanisms that provide timely and reliable transmissions of mission critical data inspired by the physics in magnetism we propose simple diffusion based data dissemination mechanism referred to as the magnetic diffusion md in that the data sink functioning like the magnet propagates the magnetic charge to set up the magnetic field under the influence of the magnetic field the sensor data functioning like the metallic nails are attracted towards the sink we compare md to the state of the art mechanisms and find that md performs the best in timely delivery of data achieves high data reliability in the presence of network dynamics and yet works as energy efficiently as the state of the art these suggest that md is an effective data dissemination solution to the mission critical applications
in this paper we examine the issue of optimizing disk usage and scheduling large scale scientific workflows onto distributed resources where the workflows are data intensive requiring large amounts of data storage and the resources have limited storage resources our approach is two fold we minimize the amount of space workflow requires during execution by removing data files at runtime when they are no longer needed and we demonstrate that workflows may have to be restructured to reduce the overall data footprint of the workflow we show the results of our data management and workflow restructuring solutions using laser interferometer gravitational wave observatory ligo application and an astronomy application montage running on large scale production grid the open science grid we show that although reducing the data footprint of montage by can be achieved with dynamic data cleanup techniques ligo scientific collaboration workflows require additional restructuring to achieve reduction in data space usage we also examine the cost of the workflow restructuring in terms of the application’s runtime
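a minimal sketch of the dynamic data cleanup idea described above and not the actual workflow system each intermediate file keeps count of the tasks that still need it and is removed as soon as that count drops to zero the task and file names are invented

# toy reference-counting cleanup of workflow intermediates
consumes = {
    "task1": ["raw.dat"],
    "task2": ["raw.dat", "stage_a.tmp"],
    "task3": ["stage_a.tmp"],
}

# how many not-yet-finished tasks still need each file
remaining = {}
for files in consumes.values():
    for f in files:
        remaining[f] = remaining.get(f, 0) + 1

def task_finished(task):
    # once no downstream task needs a file, it can leave the scratch space
    for f in consumes[task]:
        remaining[f] -= 1
        if remaining[f] == 0:
            print("cleanup: removing", f)   # a real engine would call os.remove

for t in ["task1", "task2", "task3"]:
    task_finished(t)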
we introduce symspline symmetric dual mouse technique for the manipulation of spline curves in symspline two cursors control the positions of the ends of the tangent to an edit point by moving the tangent with both mice the tangent and the edit point can be translated while the curvature of the spline is adjusted simultaneously according to the length and angle of the tangent we compare the symspline technique to two asymmetric dual mouse spline manipulation techniques and to standard single mouse technique in spline matching experiment symspline outperformed the two asymmetric dual mouse techniques and all three dual mouse techniques proved to be faster than the single mouse technique additionally symspline was the technique most preferred by test participants
managing multiple versions of xml documents represents critical requirement for many applications recently there has been much work on supporting complex queries on xml data eg regular path expressions structural projections etc in this article we examine the problem of implementing efficiently such complex queries on multiversion xml documents our approach relies on numbering scheme whereby durable node numbers dnns are used to preserve the order among the nodes of the xml tree while remaining invariant with respect to updates using the document’s dnns we show that many complex queries are reduced to combinations of range version retrieval queries we thus examine three alternative storage organizations indexing schemes to efficiently evaluate range version retrieval queries in this environment thorough performance analysis is then presented to reveal the advantages of each scheme
variational interpolation in curved geometries has many applications so there has always been demand for geometrically meaningful and efficiently computable splines in manifolds we extend the definition of the familiar cubic spline curves and splines in tension and we show how to compute these on parametric surfaces level sets triangle meshes and point samples of surfaces this list is more comprehensive than it looks because it includes variational motion design for animation and allows the treatment of obstacles via barrier surfaces all these instances of the general concept are handled by the same geometric optimization algorithm which minimizes an energy of curves on surfaces of arbitrary dimension and codimension
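a hedged latex sketch of the kind of curve energies involved the exact functionals and constraints used in the paper may differ and the tension weight lambda is illustrative

% energy of a cubic-spline-like interpolating curve x(t) constrained to a surface S
\[
E[x] = \int \big\| \ddot{x}(t) \big\|^{2} \,\mathrm{d}t,
\qquad x(t_i) = p_i, \quad x(t) \in S
\]
% spline in tension: a first-order term weighted by lambda >= 0 is added
\[
E_{\lambda}[x] = \int \Big( \big\| \ddot{x}(t) \big\|^{2}
  + \lambda \,\big\| \dot{x}(t) \big\|^{2} \Big) \mathrm{d}t
\]
% minimizing such energies subject to the chosen geometry (parametric surface,
% level set, triangle mesh or point sample) is what the geometric optimization
% algorithm above does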
memory leaks compromise availability and security by crippling performance and crashing programs leaks are difficult to diagnose because they have no immediate symptoms online leak detection tools benefit from storing and reporting per object sites eg allocation sites for potentially leaking objects in programs with many small objects per object sites add high space overhead limiting their use in production environments this paper introduces bit encoding leak location bell statistical approach that encodes per object sites to single bit per object bit loses information about site but given sufficient objects that use the site and known finite set of possible sites bell uses brute force decoding to recover the site with high accuracy we use this approach to encode object allocation and last use sites in sleigh new leak detection tool sleigh detects stale objects objects unused for long time and uses bell decoding to report their allocation and last use sites our implementation steals four unused bits in the object header and thus incurs no per object space overhead sleigh’s instrumentation adds execution time overhead which adaptive profiling reduces to sleigh’s output is directly useful for finding and fixing leaks in spec jbb and eclipse although sufficiently many objects must leak before bell decoding can report sites with confidence bell is suitable for other leak detection approaches that store per object sites and for other problems amenable to statistical per object metadata
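an illustrative python sketch of the single bit site encoding idea described above and not sleigh itself the hash the candidate site list and the number of stale objects are all hypothetical

import hashlib

def site_bit(site, obj_id):
    # encode a per-object site as one bit keyed by the object identity
    return hashlib.sha256(f"{site}:{obj_id}".encode()).digest()[0] & 1

known_sites = ["alloc@Parser.java:10", "alloc@Cache.java:42", "alloc@Util.java:7"]

# pretend these stale objects were all allocated at the second site
true_site = "alloc@Cache.java:42"
observed = [(oid, site_bit(true_site, oid)) for oid in range(2000)]

def decode(observed_bits, candidates):
    # brute force decoding: the true site matches essentially all observed
    # bits, while a wrong site matches only about half of them
    scores = {s: sum(site_bit(s, oid) == bit for oid, bit in observed_bits)
              for s in candidates}
    return max(scores, key=scores.get), scores

best, scores = decode(observed, known_sites)
print(best, scores)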
do we always use the same name for the same concept usually not while misunderstandings are always troublesome they pose particularly critical problems in software projects requirements engineering deals intensively with reducing the number and scope of misunderstandings between software engineers and customers software maintenance is another important task where proper understanding of the application domain is vital in both cases it is necessary to gain or regain domain knowledge from existing documents that are usually inconsistent and imprecise this paper proposes to reduce the risk of misunderstandings by unifying the terminology of the different stakeholders with the help of an ontology the ontology is constructed by extracting terms and relations from existing documents applying text mining for ontology extraction has an unbeatable advantage compared to manual ontology extraction text mining detects terminology inconsistencies before they are absorbed in the ontology in addition to this the approach presented in this paper also introduces an explicit validation of ontology gained by text mining
connectivity products are finally available to provide the highways between computers containing data ibm has provided strong validation of the concept with their information warehouse dbms vendors are providing gateways into their products and sql is being retrofitted on many older dbmss to make it easier to access data from standard gl products and application development systems the next step needed for data integration is to provide common data dictionary with conceptual schema across the data to mask the many differences that occur when databases are developed independently and server that can access and integrate the databases using information from the data dictionary in this article we discuss interviso one of the first commercial federated database products interviso is based on mermaid which was developed at sdc and unisys templeton et al it provides value added layer above connectivity products to handle views across databases schema translation and transaction management
this paper investigates graph enumeration problem called the maximal p subgraphs problem where p is hereditary or connected hereditary graph property formally given graph g the maximal p subgraphs problem is to generate all maximal induced subgraphs of g that satisfy p this problem differs from the well known node deletion problem studied by yannakakis and lewis in the maximal p subgraphs problem the goal is to produce all locally maximal subgraphs of graph that have property p whereas in the node deletion problem the goal is to find single globally maximum size subgraph with property p algorithms are presented that reduce the maximal p subgraphs problem to an input restricted version of this problem these algorithms imply that when attempting to efficiently solve the maximal p subgraphs problem for specific p it is sufficient to solve the restricted case the main contributions of this paper are characterizations of when the maximal p subgraphs problem is in complexity class eg polynomial delay total polynomial time
monitoring of cognitive and physical function is central to the care of people with or at risk for various health conditions but existing solutions rely on intrusive methods that are inadequate for continuous tracking less intrusive techniques that facilitate more accurate and frequent monitoring of the status of cognitive or physical function become increasingly desirable as the population ages and lifespan increases since the number of seniors using computers continues to grow dramatically method that exploits normal daily computer interactions is attractive this research explores the possibility of detecting cognitive and physical stress by monitoring keyboard interactions with the eventual goal of detecting acute or gradual changes in cognitive and physical function researchers have already attributed certain amount of variability and drift in an individual’s typing pattern to situational factors as well as stress but this phenomenon has not been explored adequately in an attempt to detect changes in typing associated with stress this research analyzes keystroke and linguistic features of spontaneously generated text results show that it is possible to classify cognitive and physical stress conditions relative to non stress conditions based on keystroke and linguistic features with accuracy rates comparable to those currently obtained using affective computing methods the proposed approach is attractive because it requires no additional hardware is unobtrusive is adaptable to individual users and is of very low cost this research demonstrates the potential of exploiting continuous monitoring of keyboard interactions to support the early detection of changes in cognitive and physical function
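as an illustrative aside the python sketch below extracts few simple keystroke timing features of the kind such analysis could feed to classifier the event tuples and the feature set are made up and are not the actual features of the study

# toy keystroke feature extraction from (key, press_time, release_time) events
events = [("h", 0.00, 0.09), ("e", 0.15, 0.22), ("l", 0.30, 0.41),
          ("l", 0.52, 0.60), ("o", 0.70, 0.78)]   # times in seconds, made up

def keystroke_features(ev):
    # dwell = how long a key is held, flight = gap between one key's release
    # and the next key's press; both tend to shift under stress or fatigue
    dwells = [release - press for _, press, release in ev]
    flights = [ev[i + 1][1] - ev[i][2] for i in range(len(ev) - 1)]
    return {
        "mean_dwell": sum(dwells) / len(dwells),
        "mean_flight": sum(flights) / len(flights),
        "keys_per_second": len(ev) / (ev[-1][2] - ev[0][1]),
    }

print(keystroke_features(events))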
we present pert power efficient scheme to deliver real time data packets in sensor networks time sensitive sensor data is common in applications such as hazard monitoring systems traffic control systems and battlefield command systems such data are associated with end to end deadlines within which they must reach the base station bs we make two contributions in this work firstly we propose novel load balanced routing scheme that distributes data packets evenly among the nodes relaying data towards the bs avoiding bottlenecks and increasing the likelihood that packets will meet their deadlines secondly we propose method of grouping smaller packets into larger ones by delaying data transmissions at the relaying nodes whenever slack times are positive our packet grouping scheme significantly reduces packet transmissions reduces congestion and saves power in the sensor network we verify the effectiveness of our approach through extensive simulations of wireless network behaviour using the ns simulation package
we address the task of actively learning segmentation system given large number of unsegmented images and access to an oracle that can segment given image decide which images to provide to quickly produce segmenter here discriminative random field that is accurate over this distribution of images we extend the standard models for active learner to define system for this task that first selects the image whose expected label will reduce the uncertainty of the other unlabeled images the most and then after greedily selects from the pool of unsegmented images the most informative image the results of our experiments over two real world datasets segmenting brain tumors within magnetic resonance images and segmenting the sky in real images show that training on very few informative images here as few as can produce segmenter that is as good as training on the entire dataset
the focus of this paper is to investigate the possibility of predicting several user and message attributes in text based real time online messaging services for this purpose large collection of chat messages is examined the applicability of various supervised classification techniques for extracting information from the chat messages is evaluated two competing models are used for defining the chat mining problem term based approach is used to investigate the user and message attributes in the context of vocabulary use while style based approach is used to examine the chat messages according to the variations in the authors writing styles among authors the identity of an author is correctly predicted with accuracy moreover the reverse problem is exploited and the effect of author attributes on computer mediated communications is discussed
standard relation has two dimensions attributes and tuples temporal relation contains two additional orthogonal time dimensions valid time records when facts are true in the modeled reality and transaction time records when facts are stored in the temporal relation although there are no restrictions between the valid time and transaction time associated with each fact in many practical applications the valid and transaction times exhibit restricted interrelationships that define several types of specialized temporal relations this paper examines areas where different specialized temporal relations are present in application systems with multiple interconnected temporal relations multiple time dimensions may be associated with facts as they flow from one temporal relation to another the paper investigates several aspects of the resulting generalized temporal relations including the ability to query predecessor relation from successor relation the presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded the framework’s comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework the practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur the additional semantics of specialized relations are especially useful for improving the performance of query processing
an sql extension is formalized for the management of spatio temporal data ie of spatial data that evolves with respect to time the extension is dedicated to applications such as topography cartography and cadastral systems hence it considers discrete changes both in space and in time it is based on the rigid formalization of data types and of sql constructs data types are defined in terms of time and spatial quanta the sql constructs are defined in terms of kernel of few relational algebra operations composed of the well known operations of the nf model and of two more unfold and fold in conjunction with previous work it enables the uniform management of nf structures that may contain not only spatio temporal but also either purely temporal or purely spatial or conventional data the syntax and semantics of the extension is fully consistent with the sql standard
providing alert communication in emergency situations is vital to reduce the number of victims reaching this goal is challenging due to users diversity people with disabilities elderly and children and other vulnerable groups notifications are critical when an emergency scenario is going to happen eg typhoon approaching so the ability to transmit notifications to different kinds of users is crucial feature for such systems in this work an ontology was developed by investigating different sources accessibility guidelines emergency response systems communication devices and technologies taking into account the different abilities of people to react to different alarms eg mobile phone vibration as an alarm for deafblind people we think that the proposed ontology addresses the information needs for sharing and integrating emergency notification messages over distinct emergency response information systems providing accessibility under different conditions and for different kinds of users
the rapid emergence of xml as standard for data exchange over the web has led to considerable interest in the problem of securing xml documents in this context query evaluation engines need to ensure that user queries only use and return xml data the user is allowed to access these added access control checks can considerably increase query evaluation time in this paper we consider the problem of optimizing the secure evaluation of xml twig queries we focus on the simple but useful multi level access control model where security level can be either specified at an xml element or inherited from its parent for this model secure query evaluation is possible by rewriting the query to use recursive function that computes an element’s security level based on security information in the dtd we devise efficient algorithms that optimally determine when the recursive check can be eliminated and when it can be simplified to just local check on the element’s attributes without violating the access control policy finally we experimentally evaluate the performance benefits of our techniques using variety of xml data and queries
this paper presents tool altair that automatically generates api function cross references which emphasizes reliable structural measures and does not depend on specific client code altair ranks related api functions for given query according to pair wise overlap ie how they share state and clusters tightly related ones into meaningful modules experiments against several popular software packages show that altair recommends related api functions for given query with remarkably more precise and complete results than previous tools that it can extract modules from moderate sized software eg apache with functions at high precision and recall rates eg both exceeding for two modules in apache and that the computation can finish within few seconds
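a hedged sketch of ranking api functions by pair wise overlap of the state they touch the function names and accessed variable sets below are invented and the measure is plain jaccard overlap rather than the exact definition used by altair

# toy pairwise-overlap ranking: functions that access many of the same
# globals or fields are treated as related; all names are hypothetical
accessed_state = {
    "file_open":  {"pool", "file_handle", "flags"},
    "file_close": {"pool", "file_handle"},
    "file_read":  {"file_handle", "buffer"},
    "sock_send":  {"socket", "buffer"},
}

def overlap(f, g):
    # jaccard overlap of the state accessed by two functions
    a, b = accessed_state[f], accessed_state[g]
    return len(a & b) / len(a | b)

def related(query, k=3):
    # rank the other api functions for a given query function
    others = [f for f in accessed_state if f != query]
    return sorted(others, key=lambda f: overlap(query, f), reverse=True)[:k]

print(related("file_open"))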
join of two relations in real databases is usually much smaller than their cartesian product this means that most of the combinations of tuples in the crossproduct of the respective relations do not appear together in the join result we characterize these combinations as ranges of attributes that do not appear together we sketch an algorithm for finding such combinations and present experimental results from real data sets we then explore two potential applications of this knowledge in query processing in the first application we model empty joins as materialized views we show how they can be used for query optimization in the second application we propose strategy that uses information about empty joins for an improved join selectivity estimation
recent technology advances are pushing towards full integration of low capacity networked devices in pervasive embedded pp systems one of the challenges of such integration is to allow low capacity devices both to invoke and to provide services while featuring enhanced service discovery mechanisms that are necessary to automate service invocation in pervasive environments in this paper we present two tiered approach to enabling enhanced service discovery in embedded pp systems we first present super peer based overlay network featuring matching capability aware routing of messages and saving the resource consumption of low capacity devices while keeping the overall network traffic low we then present service discovery protocol that exploits such underlying overlay network to suitably distribute service contracts on devices capable of analysing them thus enabling enhanced service discovery even in nets mainly formed by low capacity devices finally we discuss some experimental results that confirm the viability of the proposed approach
this paper presents improved algorithms for the following problem given an unweighted directed graph and sequence of on line shortest path reachability queries interspersed with edge deletions develop data structure that can answer each query in optimal time and can be updated efficiently after each edge deletion the central idea underlying our algorithms is scheme for implicitly storing all pairs reachability shortest path information and an efficient way to maintain this information our algorithms are randomized and have one sided inverse polynomial error for query
verification by state space exploration also often referred to as model checking is an effective method for analyzing the correctness of concurrent reactive systems for instance communication protocols unfortunately traditional model checking is restricted to the verification of properties of models ie abstractions of concurrent systems we discuss in this paper how model checking can be extended to analyze arbitrary software such as implementations of communication protocols written in programming languages like or we then introduce search technique that is suitable for exploring the state spaces of such systems this algorithm has been implemented in verisoft tool for systematically exploring the state spaces of systems composed of several concurrent processes executing arbitrary code during the past five years verisoft has been applied successfully for analyzing several software products developed in lucent technologies and has also been licensed to hundreds of users in industry and academia we discuss applications strengths and limitations of verisoft and compare it to other approaches to software model checking analysis and testing
in xml retrieval two distinct approaches have been established and pursued without much cross fertilization taking place so far on the one hand native xml databases tailored to the semistructured data model have received considerable attention and wealth of index structures join algorithms tree encodings and query rewriting techniques for xml have been proposed on the other hand the question how to make xml fit the relational data model has been studied in great detail giving rise to multitude of storage schemes for xml in relational database systems rdbss in this paper we examine how native xml indexing techniques can boost the retrieval of xml stored in an rdbs we present the relational cadg rcadg an adaptation of several native indexing approaches to the relational model and show how it supports the evaluation of clean formal language of conjunctive xml queries unlike relational storage schemes for xml the rcadg largely preserves the underlying tree structure of the data in the rdbs thus addressing several open problems known from the literature experiments show that the rcadg accelerates retrieval by up to two or even three orders of magnitude compared to both native and relational approaches
we present probabilistic program transformation algorithm to render given program tamper resistant in addition we suggest model to estimate the required effort for an attack we make some engineering assumptions about local indistinguishability on the transformed program and model an attacker’s steps as making walk on the program flow graph the goal of the attacker is to learn what has been inserted by the transformation in which case he wins our heuristic estimate counts the number of steps of his walk on the graph our model is somewhat simplified but we believe both the constructions and models can be made more realistic in the future
we present the ohmu language unified object model which allows number of advanced techniques such as aspects mixin layers parametric polymorphism and generative components to be implemented cleanly using two basic concepts block structure and inheritance we argue that conventional ways of defining classes and objects have created artificial distinctions which limit their expressiveness the ohmu model unifies functions classes instances templates and even aspects into single construct the structure function calls instantiation aspect weaving and inheritance are likewise unified into single operation the structure transformation this simplification eliminates the distinction between classes and instances and between compile time and run time code instead of being compiled programs are reduced using partial evaluation during which the interpreter is invoked at compile time within this architecture standard oo inheritance becomes natural vehicle for creating meta programs and automatic code generators the key to number of recent domain driven programming methodologies
pure type systems make use of domain full lambda abstractions lambda x:d.m we present variant of pure type systems which we call domain free pure type systems with domain free lambda abstractions lambda x.m domain free pure type systems have number of advantages over both pure type systems and so called type assignment systems they also have some disadvantages and have been used in theoretical developments as well as in implementations of proof assistants we study the basic properties of domain free pure type systems establish their formal relationship with pure type systems and type assignment systems and give number of applications of these correspondences
large scale library digitization projects such as the open content alliance are producing vast quantities of text but little has been done to organize this data subject headings inherited from card catalogs are useful but limited while full text indexing is most appropriate for readers who already know exactly what they want statistical topic models provide complementary function these models can identify semantically coherent topics that are easily recognizable and meaningful to humans but they have been too computationally intensive to run on library scale corpora this paper presents dcm lda topic model based on dirichlet compound multinomial distributions this model is simultaneously better able to represent observed properties of text and more scalable to extremely large text collections we train individual topic models for each book based on the cooccurrence of words within pages we then cluster topics across books the resulting topical clusters can be interpreted as subject facets allowing readers to browse the topics of collection quickly find relevant books using topically expanded keyword searches and explore topical relationships between books we demonstrate this method finding topics on corpus of billion words from books in less than hours and it easily could scale well beyond this
wireless multihop communication is becoming more important due to the increasing popularity of wireless sensor networks wireless mesh networks and mobile social networks they are distinguished from conventional multihop networks in terms of scale traffic intensity and or node density being readily available in most radios multirate facility appears to be useful to address some of these issues and is particularly helpful in high density scenarios where inter node distance is short demanding prudent multirate adaptation algorithm however communication at high bit rates mandates large number of hops for given node pair and thus can easily be depreciated as per hop overhead at several layers of network protocol is aggregated over the increased number of hops this paper presents novel multihop multirate adaptation mechanism called multihop transmission opportunity mtop that allows frame to be forwarded number of hops consecutively but reduces the mac layer overhead between hops this seemingly collision prone multihop forwarding is proven to be safe via analysis and usrp gnu radio based experiment the idea of mtop is in clear contrast to but not mutually exclusive with the conventional opportunistic transmission mechanism referred to as txop where node transmits multiple frames back to back when it gets an opportunity we conducted an extensive simulation study via ns demonstrating the performance advantage of mtop under wide range of network scenarios
while it has been argued that application layer overlay protocols can enhance services in mobile ad hoc networks hardly any empirical data is available on the throughput and delay performance achievable in this fashion this paper presents performance measurements of an application layer overlay approach that ensures integrity and confidentiality of application data in an ad hoc environment key management and encryption scheme called neighborhood key method is presented where each node shares secrets only with authenticated neighbors in the ad hoc network thus avoiding global re keying operations all proposed solutions have been implemented and empirically evaluated in an existing software system for application layer overlay networking results from indoor and outdoor measurement experiments with mobile handheld devices provide insight into the performance and overhead of overlay networking and application layer security services in ad hoc networks
hard masking real time program is one that satisfies safety including timing constraints and liveness properties in the absence and presence of faults it has been shown that any hard masking program can be decomposed into fault intolerant version and set of fault tolerance components known as detectors and delta correctors in this paper we introduce set of sufficient conditions for interference freedom among fault tolerance components and real time programs we demonstrate that such conditions elegantly enable us to compositionally verify the correctness of hard masking programs preliminary model checking experiments show very encouraging results in both achieving speedups and reducing memory usage in verification of embedded systems
quality of service qos in web services encompasses various non functional issues such as performance dependability and security etc as more and more web services become available qos capability is becoming decisive factor in distinguishing services this study proposes an efficient service selection scheme to help service requesters select services by considering two different contexts single qos based service discovery and qos based optimization of service composition based on qos measurement metrics this study proposes multiple criteria decision making and integer programming approaches to select the optimal service experimental results show that the scheme is not only efficient but also works well for complicated scenarios
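as hedged illustration of single service qos based selection the python sketch below uses plain simple additive weighting over normalized attributes the candidate services attribute values and weights are invented and the mcdm and integer programming formulations of the paper are richer than this

# toy simple-additive-weighting selection over normalized qos attributes
candidates = {
    "serviceA": {"latency_ms": 120, "availability": 0.999, "cost": 0.05},
    "serviceB": {"latency_ms": 80,  "availability": 0.990, "cost": 0.09},
    "serviceC": {"latency_ms": 200, "availability": 0.995, "cost": 0.02},
}
weights = {"latency_ms": 0.4, "availability": 0.4, "cost": 0.2}
lower_is_better = {"latency_ms", "cost"}

def normalized(attr, value):
    # rescale each attribute to [0, 1] so that 1 is always the best value
    vals = [c[attr] for c in candidates.values()]
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return 1.0
    x = (value - lo) / (hi - lo)
    return 1.0 - x if attr in lower_is_better else x

def score(svc):
    return sum(weights[a] * normalized(a, v) for a, v in candidates[svc].items())

print(max(candidates, key=score), {s: round(score(s), 3) for s in candidates})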
this paper addresses the allocation of link capacities in the automated design process of network on chip based system communication resource costs are minimized under quality of service timing constraints first we introduce novel analytical delay model for virtual channeled wormhole networks with non uniform link capacities that eliminates costly simulations at the inner loop of the optimization process second we present an efficient capacity allocation algorithm that assigns link capacities such that packet delay requirements for each flow are satisfied we demonstrate the benefit of capacity allocation for typical system on chip where the traffic is heterogeneous and delay requirements may largely vary in comparison with the standard approach which assumes uniform capacity links
in software development developers often rely on testing to reveal bugs typically test suite should be prepared before initial testing and new test cases may be added to the test suite during the whole testing process this may usually cause the test suite to contain more or less redundancy in other words subset of the test suite called the representative set may still satisfy all the test objectives as the redundancy can increase the cost of executing the test suite quite few test suite reduction techniques have been brought out in spite of the np completeness of the general problem of finding the optimal representative set of test suite in the literature there have been some experimental studies of test suite reduction techniques but the limitations of these experimental studies are quite obvious recently proposed techniques are not experimentally compared against each other and reported experiments are mainly based on small programs or even simulation data this paper presents new experimental study of the four typical test suite reduction techniques including harrold et al heuristic and three other recently proposed techniques such as chen and lau’s gre heuristic mansour and el fakih’s genetic algorithm based approach and black et al ilp based approach based on the results of this experimental study we also provide guideline for choosing the appropriate test suite reduction technique and some insights into why the techniques vary in effectiveness and efficiency
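a minimal sketch of greedy test suite reduction in the spirit of the classical heuristic which repeatedly keeps the test that covers the most still unsatisfied objectives the test names and requirement sets are hypothetical and this is not claimed to reproduce any of the four compared techniques exactly

# toy greedy reduction: keep adding the test case that satisfies the most
# still-uncovered test objectives; names and coverage sets are invented
coverage = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3", "r4"},
    "t3": {"r1", "r4"},
    "t4": {"r5"},
}

def reduce_suite(cov):
    uncovered = set().union(*cov.values())
    representative = []
    while uncovered:
        # pick the test covering the largest number of uncovered objectives
        best = max(cov, key=lambda t: len(cov[t] & uncovered))
        representative.append(best)
        uncovered -= cov[best]
    return representative

print(reduce_suite(coverage))   # e.g. ['t2', 't1', 't4']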
delaunay refinement is widely used method for the construction of guaranteed quality triangular and tetrahedral meshes we present an algorithm and software for the parallel constrained delaunay mesh generation in two dimensions our approach is based on the decomposition of the original mesh generation problem into smaller subproblems which are meshed in parallel the parallel algorithm is asynchronous with small messages which can be aggregated and exhibits low communication costs on heterogeneous cluster of more than processors our implementation can generate over one billion triangles in less than minutes while the single node performance is comparable to that of the fastest to our knowledge sequential guaranteed quality delaunay meshing library the triangle
this paper presents novel image editing program emphasizing easy selection and manipulation of material found in informal casual documents such as sketches handwritten notes whiteboard images screen snapshots and scanned documents the program called scanscribe offers four significant advances first it presents new intuitive model for maintaining image objects and groups along with underlying logic for updating these in the course of an editing session second scanscribe takes advantage of newly developed image processing algorithms to separate foreground markings from white or light background and thus can automatically render the background transparent so that image material can be rearranged without occlusion by background pixels third scanscribe introduces new interface techniques for selecting image objects with pointing device without resorting to palette of tool modes fourth scanscribe presents platform for exploiting image analysis and recognition methods to make perceptually significant structure readily available to the user as research prototype scanscribe has proven useful in the work of members of our laboratory and has been released on limited basis for user testing and evaluation
we introduce robust moving least squares technique for reconstructing piecewise smooth surface from potentially noisy point cloud we use techniques from robust statistics to guide the creation of the neighborhoods used by the moving least squares mls computation this leads to conceptually simple approach that provides unified framework for not only dealing with noise but also for enabling the modeling of surfaces with sharp features our technique is based on new robust statistics method for outlier detection the forward search paradigm using this powerful technique we locally classify regions of point set to multiple outlier free smooth regions this classification allows us to project points on locally smooth region rather than surface that is smooth everywhere thus defining piecewise smooth surface and increasing the numerical stability of the projection operator furthermore by treating the points across the discontinuities as outliers we are able to define sharp features one of the nice features of our approach is that it automatically disregards outliers during the surface fitting phase
latent dirichlet allocation lda blei ng jordan is fully generative statistical language model on the content and topics of corpus of documents in this paper we apply an extension of lda for web spam classification our linked lda technique takes also linkage into account topics are propagated along links in such way that the linked document directly influences the words in the linking document the inferred lda model can be applied for classification as dimensionality reduction similarly to latent semantic indexing we test linked lda on the webspam uk corpus by using bayesnet classifier in terms of the auc of classification we achieve improvement over plain lda with bayesnet and over the public link features with the addition of this method to log odds based combination of strong link and content baseline classifiers results in improvement in auc our method even slightly improves over the best web spam challenge result
we propose incorporating production rules facility into relational database system such facility allows definition of database operations that are automatically executed whenever certain conditions are met in keeping with the set oriented approach of relational data manipulation languages our production rules are also set oriented they are triggered by sets of changes to the database and may perform sets of changes the condition and action parts of our production rules may refer to the current state of the database as well as to the sets of changes triggering the rules we define syntax for production rule definition as an extension to sql model of system behavior is used to give an exact semantics for production rule execution taking into account externally generated operations self triggering rules and simultaneous triggering of multiple rules
peer to peer pp networks are beginning to form the infrastructure of future applications heavy network traffic limits the scalability of pp networks indexing is method to reduce this traffic but indexes tend to become large with the growth of the network also limiting the size of these indexes causes loss of indexing information in this paper we introduce novel ontology based index oi which limits the size of the indexes without sacrificing indexing information we show that the method can be employed by many pp networks the oi sits on top of routing and maintenance modules of pp network and enhances it the oi prunes branches of search trees which have no chance to proceed to response also the oi guarantees that an enhanced routing algorithm and its basic version have the same result set for given search query this means that the oi reduces traffic without reducing quality of service to measure the performance of the oi we apply it on chord dht based and hypercup non dht based pp networks and show that it reduces the networks traffic significantly
quality experts often need to identify in software systems design defects which are recurring design problems that hinder development and maintenance consequently several defect detection approaches and tools have been proposed in the literature however we are not aware of any approach that defines and reifies the process of generating detection algorithms from the existing textual descriptions of defects in this paper we introduce an approach to automate the generation of detection algorithms from specifications written using domain specific language the domain specific language is defined from thorough domain analysis we specify several design defects generate automatically detection algorithms using templates and validate the generated detection algorithms in terms of precision and recall on xerces an open source object oriented system
higher power relay nodes can be used as cluster heads in two tiered sensor networks to achieve improved network lifetime the relay nodes may form network among themselves to route data towards the base station in this model the lifetime of network is determined mainly by the lifetimes of these relay nodes an energy aware communication strategy can greatly extend the lifetime of such networks however integer linear program ilp formulations for optimal energy aware routing quickly become computationally intractable and are not suitable for practical networks in this paper we have proposed an efficient solution based on genetic algorithm ga for scheduling the data gathering of relay nodes which can significantly extend the lifetime of relay node network for smaller networks where the global optimum can be determined our ga based approach is always able to find the optimal solution furthermore our algorithm can easily handle large networks where it leads to significant improvements compared to traditional routing schemes
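A minimal genetic-algorithm sketch of the kind of search loop described above, written only to make the idea concrete. The encoding (a binary on/off schedule per relay node per round) and the fitness function are hypothetical placeholders, not the paper's model or parameters.

# Hypothetical GA skeleton for relay-node scheduling; all constants are illustrative.
import random

NUM_RELAYS, NUM_ROUNDS = 8, 10
POP_SIZE, GENERATIONS, MUT_RATE = 30, 100, 0.02

def fitness(schedule):
    # Placeholder "lifetime" score: reward schedules that keep every round
    # covered by at least one active relay while activating few relays overall.
    coverage = sum(1 for r in range(NUM_ROUNDS)
                   if any(schedule[n][r] for n in range(NUM_RELAYS)))
    activations = sum(sum(row) for row in schedule)
    return coverage * 10 - activations

def random_schedule():
    return [[random.randint(0, 1) for _ in range(NUM_ROUNDS)]
            for _ in range(NUM_RELAYS)]

def crossover(a, b):
    cut = random.randrange(1, NUM_RELAYS)
    return a[:cut] + b[cut:]

def mutate(s):
    return [[1 - bit if random.random() < MUT_RATE else bit for bit in row]
            for row in s]

population = [random_schedule() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best fitness:", fitness(best))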
server storage systems use large number of disks to achieve high performance thereby consuming significant amount of power in this paper we propose to significantly reduce the power consumed by such storage systems via intra disk parallelism wherein disk drives can exploit parallelism in the request stream intra disk parallelism can facilitate replacing large disk array with smaller one using the minimum number of disk drives needed to satisfy the capacity requirements we show that the design space of intra disk parallelism is large and present taxonomy to formulate specific implementations within this space using set of commercial workloads we perform limit study to identify the key performance bottlenecks that arise when we replace storage array that is tuned to provide high performance with single high capacity disk drive we show that it is possible to match and even surpass the performance of storage array for these workloads by using single disk drive of sufficient capacity that exploits intra disk parallelism while significantly reducing the power consumed by the storage system we evaluate the performance and power consumption of disk arrays composed of intra disk parallel drives and discuss engineering and cost issues related to the implementation and deployment of such disk drives
yield management in semiconductor manufacturing companies requires accurate yield prediction and continual control however because many factors are complexly involved in the production of semiconductors manufacturers or engineers have hard time managing the yield precisely intelligent tools need to analyze the multiple process variables concerned and to predict the production yield effectively this paper devises hybrid method of incorporating machine learning techniques together to detect high and low yields in semiconductor manufacturing the hybrid method has strong applicative advantages in manufacturing situations where the control of variety of process variables is interrelated in real applications the hybrid method provides more accurate yield prediction than other methods that have been used with this method the company can achieve higher yield rate by preventing low yield lots in advance
with the advance of technology public key cryptography pkc will sooner or later be widely used in wireless sensor networks recently it has been shown that the performance of some public key algorithms such as elliptic curve cryptography ecc is already close to being practical on sensor nodes however the energy consumption of pkc is still expensive especially compared to symmetric key algorithms to maximize the lifetime of batteries we should minimize the use of pkc whenever possible in sensor networks this paper investigates how to replace one of the important pkc operations the public key authentication with symmetric key operations that are much more efficient public key authentication is to verify the authenticity of another party’s public key to make sure that the public key is really owned by the person it is claimed to belong to in pkc this operation involves an expensive signature verification on certificate we propose an efficient alternative that uses one way hash function only our scheme uses all sensors’ public keys to construct forest of merkle trees of different heights by optimally selecting the height of each tree we can minimize the computation and communication costs the performance of our scheme is evaluated in the paper
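A sketch of Merkle-tree-based public key authentication using only a one-way hash, in the spirit of the abstract. For simplicity it builds a single balanced tree over all public keys; the forest of trees with optimally chosen heights described above is not reproduced, and the placeholder keys are illustrative.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    # Returns a list of levels: level 0 = hashed leaves, last level = root.
    level = [h(pk) for pk in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]          # duplicate last node if odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    # Sibling hashes from leaf to root, enough to recompute the root.
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((index % 2, level[index ^ 1]))
        index //= 2
    return path

def verify(public_key, path, root):
    node = h(public_key)
    for is_right, sibling in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

keys = [f"public-key-{i}".encode() for i in range(8)]   # placeholder keys
levels = build_tree(keys)
root = levels[-1][0]                                    # assumed pre-loaded on sensors
path = auth_path(levels, 5)
print(verify(keys[5], path, root))                      # True
print(verify(b"forged-key", path, root))                # False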
service discovery is key activity to actually identify the web services wss to be invoked and composed since it is likely that more than one service fulfill set of user requirements some ranking mechanisms based on non functional properties nfps are needed to support automatic or semi automatic selection this paper introduces an approach to nfp based ranking of wss providing support for semantic mediation consideration of expressive nfp descriptions both on provider and client side and novel matching functions for handling either quantitative or qualitative nfps the approach has been implemented in ranker that integrates reasoning techniques with algorithmic ones in order to overcome current and intrinsic limitations of semantic web technologies and to provide algorithmic techniques with more flexibility moreover to the best of our knowledge this paper presents the first experimental results related to nfp based ranking of wss considering significant number of expressive nfp descriptions showing the effectiveness of the approach
in this paper we introduce new perceptual metric for efficient high quality global illumination rendering the metric is based on rendering by components framework in which the direct and indirect diffuse glossy and specular light transport paths are separately computed and then composited to produce an image the metric predicts the perceptual importances of the computationally expensive indirect illumination components with respect to image quality to develop the metric we conducted series of psychophysical experiments in which we measured and modeled the perceptual importances of the components an important property of this new metric is that it predicts component importances from inexpensive estimates of the reflectance properties of scene and therefore adds negligible overhead to the rendering process this perceptual metric should enable the development of an important new class of efficient global illumination rendering systems that can intelligently allocate limited computational resources to provide high quality images at interactive rates
in this paper we develop methods to sample small realistic graph from large internet topology despite recent activity modeling and generation of realistic graphs resembling the internet is still not resolved issue all previous work has attempted to grow such graphs from scratch we address the complementary problem of shrinking an existing topology in more detail this work has three parts first we propose number of reduction methods that can be categorized into three classes deletion methods contraction methods and exploration methods we prove that some of them maintain key properties of the initial graph we implement our methods and show that we can effectively reduce the nodes of an internet graph by as much as while maintaining its important properties second we show that our reduced graphs compare favorably against construction based generators finally we successfully validate the effectiveness of our best methods in an actual performance evaluation study of multicast routing apart from its practical applications the problem of graph sampling is of independent interest
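Illustrative node-deletion and edge-contraction reducers in the spirit of the "deletion" and "contraction" method classes mentioned above. The uniform random sampling rules and the random stand-in graph are placeholders; the paper studies several concrete variants and proves properties about them.

import random
import networkx as nx

def reduce_by_deletion(G, target_nodes):
    H = G.copy()
    while H.number_of_nodes() > target_nodes:
        H.remove_node(random.choice(list(H.nodes())))
    return H

def reduce_by_contraction(G, target_nodes):
    H = G.copy()
    while H.number_of_nodes() > target_nodes and H.number_of_edges() > 0:
        u, v = random.choice(list(H.edges()))
        H = nx.contracted_nodes(H, u, v, self_loops=False)
    return H

G = nx.gnm_random_graph(200, 600, seed=1)     # stand-in for an internet graph
small = reduce_by_contraction(G, 50)
print(small.number_of_nodes(), small.number_of_edges())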
in the recent years several research efforts have focused on the concept of time granularity and its applications first stream of research investigated the mathematical models behind the notion of granularity and the algorithms to manage temporal data based on those models second stream of research investigated symbolic formalisms providing set of algebraic operators to define granularities in compact and compositional way however only very limited manipulation algorithms have been proposed to operate directly on the algebraic representation making it unsuitable to use the symbolic formalisms in applications that need manipulation of granularities this paper aims at filling the gap between the results from these two streams of research by providing an efficient conversion from the algebraic representation to the equivalent low level representation based on the mathematical models in addition the conversion returns minimal representation in terms of period length our results have major practical impact users can more easily define arbitrary granularities in terms of algebraic operators and then access granularity reasoning and other services operating efficiently on the equivalent minimal low level representation as an example we illustrate the application to temporal constraint reasoning with multiple granularities from technical point of view we propose a hybrid algorithm that interleaves the conversion of calendar subexpressions into periodical sets with the minimization of the period length the algorithm returns set based granularity representations having minimal period length which is the most relevant parameter for the performance of the considered reasoning services extensive experimental work supports the techniques used in the algorithm and shows the efficiency and effectiveness of the algorithm
in this paper we give simple scheme for identifying approximate frequent items over sliding window of size our scheme is deterministic and does not make any assumption on the distribution of the item frequencies it supports update and query time and uses space it is very simple its main data structures are just few short queues whose entries store the position of some items in the sliding window we also extend our scheme for variable size window this extended scheme uses log εn space
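A simplified sliding-window frequency tracker whose only data structures are short queues of item positions, echoing the idea above. This version keeps one queue per distinct item in the window, so it is exact but not as space-efficient as the approximate deterministic scheme the abstract describes; the class and parameter names are illustrative.

from collections import defaultdict, deque

class SlidingWindowCounts:
    def __init__(self, window_size):
        self.n = window_size
        self.pos = defaultdict(deque)   # item -> positions inside the window
        self.t = 0                      # number of items seen so far

    def update(self, item):
        self.t += 1
        self.pos[item].append(self.t)
        self._expire()

    def _expire(self):
        cutoff = self.t - self.n
        for item in list(self.pos):
            q = self.pos[item]
            while q and q[0] <= cutoff:
                q.popleft()
            if not q:
                del self.pos[item]

    def frequent(self, threshold):
        # Items occurring at least `threshold` times in the last n updates.
        return {item: len(q) for item, q in self.pos.items()
                if len(q) >= threshold}

w = SlidingWindowCounts(window_size=10)
for x in "abacabadabacabaeabac":
    w.update(x)
print(w.frequent(threshold=3))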
this paper introduces new algorithm for clustering data in high dimensional feature spaces called gardenhd the algorithm is organized around the notion of data space reduction ie the process of detecting dense areas dense cells in the space it performs effective and efficient elimination of empty areas that characterize typical high dimensional spaces and an efficient adjacency connected agglomeration of dense cells into larger clusters it produces compact representation that can effectively capture the essence of data gardenhd is hybrid of cell based and density based clustering however unlike typical clustering methods in its class it applies recursive partition of sparse regions in the space using new space partitioning strategy the properties of this partitioning strategy greatly facilitate data space reduction the experiments on synthetic and real data sets reveal that gardenhd and its data space reduction are effective efficient and scalable
developing and testing parallel code is hard even for one given input parallel program can have many possible different thread interleavings which are hard for the programmer to foresee and for testing tool to cover using stress or random testing for this reason recent trend is to use systematic testing which methodically explores different thread interleavings while checking for various bugs data races are common bugs but unfortunately checking for races is often skipped in systematic testers because it introduces substantial runtime overhead if done purely in software recently several techniques for race detection in hardware have been proposed but they still require significant hardware support this paper presents light novel technique for data race detection during systematic testing that has both small runtime overhead and very lightweight hardware requirements light is based on the observation that two thread interleavings in which racing accesses are flipped will very likely exhibit some deviation in their program execution history light computes bit hash of the program execution history during systematic testing if the hashes of two interleavings with the same happens before graph differ then race has occurred light only needs bit register per core drastic improvement over previous hardware schemes in addition our experiments on splash applications show that light has no false positives detects of races and induces only small slowdown for race free executions on average and in two different modes
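A minimal illustration of hashing an execution history to flag interleavings whose behaviour diverges, which the abstract does at the hardware level. The event format (thread id, operation, address, observed value) and the use of a full SHA-256 digest instead of a small hardware register are simplifications introduced here.

import hashlib

def history_hash(events):
    h = hashlib.sha256()
    for thread_id, op, addr, value in events:
        h.update(f"{thread_id}:{op}:{addr}:{value};".encode())
    return h.hexdigest()

# Two interleavings of the same two accesses to shared address 0x10.
interleaving_a = [(0, "st", 0x10, 1), (1, "ld", 0x10, 1)]
interleaving_b = [(1, "ld", 0x10, 0), (0, "st", 0x10, 1)]

# If the happens-before graphs match but the hashes differ, flipping the racing
# accesses changed the execution, which is the signal used to report a race.
print(history_hash(interleaving_a) != history_hash(interleaving_b))  # True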
this paper introduces novel pairwise adaptive dissimilarity measure for large high dimensional document datasets that improves the unsupervised clustering quality and speed compared to the original cosine dissimilarity measure this measure dynamically selects number of important features of the compared pair of document vectors two approaches for selecting the number of features in the application of the measure are discussed the proposed feature selection process makes this dissimilarity measure especially applicable in large high dimensional document collections its performance is validated on several test sets originating from standardized datasets the dissimilarity measure is compared to the well known cosine dissimilarity measure using the average measures of the hierarchical agglomerative clustering result this new dissimilarity measure results in an improved clustering result obtained with lower required computational time
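A sketch of a pairwise adaptive dissimilarity: for each pair of document vectors, keep only the k features with the largest combined weight and compute a cosine-style dissimilarity on that subspace. The selection rule, the choice of k and the random sparse vectors are illustrative assumptions, not the exact criteria from the abstract.

import numpy as np

def pairwise_adaptive_dissimilarity(x, y, k):
    importance = np.abs(x) + np.abs(y)
    top = np.argsort(importance)[-k:]          # k most important features for this pair
    xs, ys = x[top], y[top]
    denom = np.linalg.norm(xs) * np.linalg.norm(ys)
    if denom == 0.0:
        return 1.0
    return 1.0 - float(np.dot(xs, ys) / denom)

rng = np.random.default_rng(0)
doc_a = rng.random(1000) * (rng.random(1000) < 0.05)   # sparse tf-idf-like vectors
doc_b = rng.random(1000) * (rng.random(1000) < 0.05)
print(pairwise_adaptive_dissimilarity(doc_a, doc_b, k=50))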
to be effective in the human world robots must respond to human emotional states this paper focuses on the recognition of the six universal human facial expressions in the last decade there has been successful research on facial expression recognition fer in controlled conditions suitable for human computer interaction however the human robot scenario presents additional challenges including lack of control over lighting conditions and over the relative poses and separation of the robot and human the inherent mobility of robots and stricter real time computational requirements dictated by the need for robots to respond in timely fashion our approach imposes lower computational requirements by specifically adapting model based techniques to the fer scenario it contains adaptive skin color extraction localization of the entire face and facial components and specifically learned objective functions for fitting deformable face model experimental evaluation reports recognition rate of on the cohn kanade facial expression database and in robot scenario which compare well to other fer systems
our aim is to solve the feature subset selection problem with thousands of variables using an incremental procedure the procedure combines incrementally the outputs of non scalable search and score bayesian network structure learning methods that are run on much smaller sets of variables we assess the scalability the performance and the stability of the procedure through several experiments on synthetic and real databases scaling up to variables our method is shown to be efficient in terms of both running time and accuracy
recent advance of mobile and interactive devices such as smart phones pdas and handheld computers enables to deliver multimodal contents based on users and their environments in pervasive computing multimodal contents are mainly composed of multiple components which are often delivered from distributed multiple sources therefore how appropriate contents can be provided to users and how computing resources can be effectively exploited are critical issues in this paper an analytical model for multimodal contents is developed based on queueing theory for the purpose of delivery evaluation of the contents the model can be applied to estimate how delivery parameters of multimodal contents such as arrival rates drop rates and the number of packets can impact overall the quality of services in terms of temporal aspects numerical example of weather information delivery is provided to illustrate the effectiveness of the proposed model
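A worked single-queue example of the kind of estimate such a queueing model builds on: for an M/M/1 queue the mean time a packet spends in the system is W = 1 / (mu - lambda). The rates below are made-up numbers; the model in the abstract covers multiple components, drop rates and packet counts rather than one queue.

def mm1_mean_response_time(arrival_rate, service_rate):
    # M/M/1 mean time in system: W = 1 / (mu - lambda), valid only when lambda < mu.
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

lam, mu = 40.0, 50.0                      # packets per second (hypothetical)
print(mm1_mean_response_time(lam, mu))    # 0.1 s mean delivery time per packet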
clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes recently number of methods have been proposed and demonstrated good performance based on matrix approximation despite significant research on these methods few attempts have been made to establish the connections between them while highlighting their differences in this paper we present unified view of these methods within general clustering framework where the problem of clustering is formulated as matrix approximations and the clustering objective is minimizing the approximation error between the original data matrix and the reconstructed matrix based on the cluster structures the general framework provides an elegant base to compare and understand various clustering methods we provide characterizations of different clustering methods within the general framework including traditional one side clustering subspace clustering and two side clustering we also establish the connections between our general clustering framework with existing frameworks
adaptively secure key exchange allows the establishment of secure channels even in the presence of an adversary that can corrupt parties adaptively and obtain their internal states in this paper we give formal definition of contributory protocols and define an ideal functionality for password based group key exchange with explicit authentication and contributiveness in the uc framework as with previous definitions in the same framework our definitions do not assume any particular distribution on passwords or independence between passwords of different parties we also provide the first steps toward realizing this functionality in the above strong adaptive setting by analyzing an efficient existing protocol and showing that it realizes the ideal functionality in the random oracle and ideal cipher models based on the cdh assumption
mining with streaming data is hot topic in data mining when performing classification on data streams traditional classification algorithms based on decision trees such as id and have relatively poor efficiency in both time and space due to the characteristics of streaming data there are some advantages in time and space when using random decision trees an incremental algorithm for mining data streams srmtds semi random multiple decision trees for data streams based on random decision trees is proposed in this paper srmtds uses the inequality of hoeffding bounds to choose the minimum number of split examples heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes and naïve bayes classifier to estimate the class labels of tree leaves our extensive experimental study shows that srmtds has an improved performance in time space accuracy and the anti noise capability in comparison with vfdtc state of the art decision tree algorithm for classifying data streams
this paper reports on the main results of specific action on mobile databases conducted by cnrs in france from october to december the objective of this action was to review the state of progress in mobile databases and identify major research directions for the french database community rather than provide survey of all important issues in mobile databases this paper gives an outline of the directions in which the action participants are now engaged namely copy synchronization in disconnected computing mobile transactions database embedded in ultra light devices data confidentiality pp dissemination models and middleware adaptability
shareability is design principle that refers to how system interface or device engages group of collocated co present users in shared interactions around the same content or the same object this is broken down in terms of set of components that facilitate or constrain the way an interface or product is made shareable central are the notions of access points and entry points entry points invite and entice people into engagement providing an advance overview minimal barriers and honeypot effect that draws observers into the activity access points enable users to join group’s activity allowing perceptual and manipulative access and fluidity of sharing we show how these terms can be useful for informing analysis and empirical research
mining closed frequent itemsets from data streams is of interest recently however it is not easy for users to determine proper minimum support threshold hence it is more reasonable to ask users to set bound on the result size therefore an interactive single pass algorithm called tkc ds top frequent closed itemsets of data streams is proposed for mining top closed itemsets from data streams efficiently novel data structure called cil closed itemset lattice is developed for maintaining the essential information of closed itemsets generated so far experimental results show that the proposed tkc ds algorithm is an efficient method for mining top frequent itemsets from data streams
this paper presents highly predictable low overhead and yet dynamic memory allocation strategy for embedded systems with scratch pad memory scratch pad is fast compiler managed sram memory that replaces the hardware managed cache it is motivated by its better real time guarantees vs cache and by its significantly lower overheads in energy consumption area and overall runtime even with simple allocation scheme existing scratch pad allocation methods are of two types first software caching schemes emulate the workings of hardware cache in software instructions are inserted before each load store to check the software maintained cache tags such methods incur large overheads in runtime code size energy consumption and sram space for tags and deliver poor real time guarantees just like hardware caches second category of algorithms partitions variables at compile time into the two banks for example our previous work in derives provably optimal static allocation for global and stack variables and achieves speedup over all earlier methods however drawback of such static allocation schemes is that they do not account for dynamic program behavior it is easy to see why data allocation that never changes at runtime cannot achieve the full locality benefits of cache in this paper we present dynamic allocation method for global and stack data that for the first time accounts for changing program requirements at runtime ii has no software caching tags iii requires no run time checks iv has extremely low overheads and yields predictable memory access times in this method data that is about to be accessed frequently is copied into the sram using compiler inserted code at fixed and infrequent points in the program earlier data is evicted if necessary when compared to provably optimal static allocation our results show runtime reductions ranging from to averaging using no additional hardware support with hardware support for pseudo dma and full dma which is already provided in some commercial systems the runtime reductions increase to and respectively
the innate dynamicity and complexity of mobile ad hoc networks manets has resulted in numerous ad hoc routing protocols being proposed furthermore numerous variants and hybrids continue to be reported in the literature this diversity appears to be inherent to the field it seems unlikely that there will ever be one size fits all solution to the ad hoc routing problem however typical deployment environments for ad hoc routing protocols still force the choice of single fixed protocol and the resultant compromise can easily lead to sub optimal performance depending on current operating conditions in this paper we address this problem by exploring framework approach to the construction and deployment of ad hoc routing protocols our framework supports the simultaneous deployment of multiple protocols so that manet nodes can switch protocols to optimise to current operating conditions the framework also supports finer grained dynamic reconfiguration in terms of protocol variation and hybridisation we evaluate our framework by using it to construct and simultaneously deploy two popular ad hoc routing protocols dymo and olsr and also to derive fine grained variants of these we measure the performance and resource overhead of these implementations compared to monolithic ones and find the comparison to be favourable to our approach
many embedded systems such as pdas require processing of the given applications with rigid power budget however they are able to tolerate occasional failures due to the imperfect human visual auditory systems the problem we address in this paper is how to utilize such tolerance to reduce multimedia system’s energy consumption for providing guaranteed quality of service at the user level in terms of completion ratio we explore range of offline and on line strategies that take this tolerance into account in conjunction with the modest non determinism in application’s execution time first we give simple best effort approach that achieves the maximum completion ratio then we propose an enhanced on line best effort energy minimization beem approach and hybrid offline on line minimum effort ome approach we prove that beem maintains the maximum completion ratio while consuming the provably least amount of energy and ome guarantees the required completion ratio statistically we apply both approaches to variety of benchmark task graphs most from popular dsp applications simulation results show that significant energy savings for beem and for ome both over the simple best effort approach can be achieved while meeting the required completion ratio requirements
in this paper we present an improvement to the minimum error boundary cut method of shaping texture patches for non parametric texture synthesis from example algorithms such as efros and freeman’s image quilting our method uses an alternate distance metric for dijkstra’s algorithm and as result we are able to prevent the path from taking short cuts through high cost areas as can sometimes be seen in traditional image quilting furthermore our method is able to reduce both the maximum error in the resulting texture and the visibility of the remaining defects by spreading them over longer path post process methods such as pixel re synthesis can easily be modified and applied to our minimum boundary cut to increase the quality of the results
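A sketch of a minimum error boundary cut through the overlap region of two texture patches, computed with Dijkstra's algorithm over an error matrix. This version keeps the ordinary additive path cost; the alternate distance metric that the abstract introduces to avoid short cuts through high-error areas is not reproduced, and the toy matrix is made up.

import heapq

def min_boundary_cut(error):
    rows, cols = len(error), len(error[0])
    # State: (cost so far, row, col, path of columns). Start anywhere in the top row.
    heap = [(error[0][c], 0, c, (c,)) for c in range(cols)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, r, c, path = heapq.heappop(heap)
        if (r, c) in settled:
            continue
        settled.add((r, c))
        if r == rows - 1:
            return cost, path            # first bottom-row pop is the cheapest cut
        for dc in (-1, 0, 1):            # move down, optionally one column sideways
            nc = c + dc
            if 0 <= nc < cols:
                heapq.heappush(heap, (cost + error[r + 1][nc], r + 1, nc, path + (nc,)))
    return None

overlap_error = [                        # toy 4x5 squared-difference matrix
    [3, 1, 4, 1, 5],
    [9, 2, 6, 5, 3],
    [5, 8, 9, 7, 9],
    [3, 2, 3, 8, 4],
]
print(min_boundary_cut(overlap_error))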
mobile devices such as mobile phones and personal digital assistants have gained wide spread popularity these devices will increasingly be networked thus enabling the construction of distributed mobile applications these have to adapt to changes in context such as variations in network bandwidth exhaustion of battery power or reachability of services on other devices we show how the construction of adaptive and context aware mobile applications can be supported using reflective middleware the middleware provides software engineers with primitives to describe how context changes are handled using policies these policies may conflict in this paper we classify the different types of conflicts that may arise in mobile computing we argue that conflicts cannot be resolved statically at the time applications are designed but rather need to be resolved at execution time we demonstrate method by which these policy conflicts can be treated this method uses micro economic approach that relies on particular type of sealed bid auction
we present novel technique for texture synthesis using optimization we define markov random field mrf based similarity metric for measuring the quality of synthesized texture with respect to given input sample this allows us to formulate the synthesis problem as minimization of an energy function which is optimized using an expectation maximization em like algorithm in contrast to most example based techniques that do region growing ours is joint optimization approach that progressively refines the entire texture additionally our approach is ideally suited to allow for controllable synthesis of textures specifically we demonstrate controllability by animating image textures using flow fields we allow for general two dimensional flow fields that may dynamically change over time applications of this technique include dynamic texturing of fluid animations and texture based flow visualization
we introduce new model of partial synchrony for read write shared memory systems this model is based on the notion of set timeliness natural and straightforward generalization of the seminal concept of timeliness in the partially synchrony model of dwork lynch and stockmeyer despite its simplicity the concept of set timeliness is powerful enough to describe the first partially synchronous system for read write shared memory that separates consensus and set agreement we show that this system has enough timeliness for solving set agreement but not enough for solving consensus set timeliness also allows us to define family of partially synchronous systems of processes denoted skn which closely matches the family of anti failure detectors that were recently shown to be the weakest failure detectors for the set agreement problem we prove that for skn is synchronous enough to implement anti but not enough to implement anti the results above show that set timeliness can be used to study and compare the partial synchrony requirements of problems that are strictly weaker than consensus
it has been observed that component based applications exhibit object churn the excessive creation of short lived objects often caused by trading performance for modularity because churned objects are short lived they appear to be good candidates for stack allocation unfortunately most churned objects escape their allocating function making escape analysis ineffective we reduce object churn with three contributions first we formalize two measures of churn capture and control second we develop lightweight dynamic analyses for measuring both capture and control third we develop an algorithm that uses capture and control to inline portions of the call graph to make churned objects non escaping enabling churn optimization via escape analysis jolt is lightweight dynamic churn optimizer that uses our algorithms we embedded jolt in the jit compiler of the ibm commercial jvm and evaluated jolt on large application frameworks including eclipse and jboss we found that jolt eliminates over times as many allocations as state of the art escape analysis alone
diagrammatic visual languages can increase the ability of engineers to model and understand complex systems however to effectively use visual models the syntax and semantics of these languages should be defined precisely since most diagrammatic visual models that are currently used to specify systems can be described as directed typed graphs graph grammars have been identified as suitable formalism to describe the abstract syntax of visual modeling languages in this article we investigate how advanced graph transformation techniques such as conditional structure generic and type generic graph transformation rules can help to improve and simplify the specification of the abstract syntax of visual modeling language to demonstrate the practicability of an approach that unifies these advanced graph transformation techniques we define the abstract syntax of behavior trees bts graphical specification language for functional requirements additionally we provide translational semantics of bts by formalizing translation scheme to the input language of the sal model checking tool for each of the graph transformation rules
field programmable dual vdd interconnects are effective in reducing fpga power we formulate the dual vdd aware slack budgeting problem as linear program lp and min cost network flow problem respectively both algorithms reduce interconnect power by percent on average compared to single vdd interconnects but the network flow based algorithm runs faster on mcnc benchmarks furthermore we develop simultaneous retiming and slack budgeting srsb with flip flop layout constraints in dual vdd fpgas based on mixed integer linear programming and speed up the algorithm by lp relaxation and local legalization compared to retiming followed by slack budgeting srsb reduces interconnect power by up to percent
mechanism design is now standard tool in computer science for aligning the incentives of self interested agents with the objectives of system designer there is however fundamental disconnect between the traditional application domains of mechanism design such as auctions and those arising in computer science such as networks while monetary transfers ie payments are essential for most of the known positive results in mechanism design they are undesirable or even technologically infeasible in many computer systems classical impossibility results imply that the reach of mechanisms without transfers is severely limited computer systems typically do have the ability to reduce service quality routing systems can drop or delay traffic scheduling protocols can delay the release of jobs and computational payment schemes can require computational payments from users eg in spam fighting systems service degradation is tantamount to requiring that users burn money and such payments can be used to influence the preferences of the agents at cost of degrading the social surplus we develop framework for the design and analysis of money burning mechanisms to maximize the residual surplus the total value of the chosen outcome minus the payments required our primary contributions are the following we define general template for prior free optimal mechanism design that explicitly connects bayesian optimal mechanism design the dominant paradigm in economics with worst case analysis in particular we establish general and principled way to identify appropriate performance benchmarks in prior free mechanism design for general single parameter agent settings we characterize the bayesian optimal money burning mechanism for multi unit auctions we design near optimal prior free money burning mechanism for every valuation profile its expected residual surplus is within constant factor of our benchmark the residual surplus of the best bayesian optimal mechanism for this profile for multi unit auctions we quantify the benefit of general transfers over money burning optimal money burning mechanisms always obtain logarithmic fraction of the full social surplus and this bound is tight
an increasing interest in systems of systems that is systems comprising varying number of interconnected sub systems raises the need for automated verification techniques for dynamic process creation and changing communication topology in previous work we developed verification approach that is based on finitary abstraction via data type reduction to be effective in practice the abstraction has to be complemented by non trivial assumptions about valid communication behaviour so called non interference lemmata in this paper we mechanise the generation and validation of these kind of non interference properties by integrating ideas from communication observation and counter abstraction we thereby provide fully automatic procedure to substantially increase the precision of the abstraction we explain our approach in terms of modelling language for dynamic communication systems and use running example of car platooning system to demonstrate the effectiveness of our extensions
we address the problem of porting parallel distributed applications from static homogeneous cluster environments to dynamic heterogeneous grid resources we introduce generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using case study application virtual reactor for simulation of plasma chemical vapour deposition this application has modular architecture with number of loosely coupled components suitable for distribution over the grid it requires large parameter space exploration that allows using grid resources for high throughput computing the virtual reactor contains number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the grid in this paper we study the performance of one of the parallel solvers apply the technique developed for adaptive load balancing evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous grid resources for high performance parallel computing
set of topological invariants for relations between lines embedded in the dimensional euclidean space is given the set of invariants is proven to be necessary and sufficient to characterize topological equivalence classes of binary relations between simple lines the topology of arbitrarily complex geometric scenes is described with variation of the same set of invariants polynomial time algorithms are given to assess topological equivalence of two scenes the relevance of identifying such set of invariants and efficient algorithms is due to application areas of spatial database systems where model for describing topological relations between planar features is sought
the distance to monotonicity of sequence is the minimum number of edit operations required to transform the sequence into an increasing order this measure is complementary to the length of the longest increasing subsequence lis we address the question of estimating these quantities in the one pass data stream model and present the first sub linear space algorithms for both problems we first present √ space deterministic algorithms that approximate the distance to monotonicity and the lis to within factor that is arbitrarily close to we also show lower bound of Ω on the space required by any randomized algorithm to compute the lis or alternatively the distance from monotonicity exactly demonstrating that approximation is necessary for sub linear space computation this bound improves upon the existing lower bound of Ω √ lnvz our main result is randomized algorithm that uses only log space and approximates the distance to monotonicity to within factor that is arbitrarily close to in contrast we believe that any significant reduction in the space complexity for approximating the length of the lis is considerably hard we conjecture that any deterministic ε approximation algorithm for lis requires Ω √ space and as step towards this conjecture prove space lower bound of Ω √ for restricted yet natural class of deterministic algorithms
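An offline reference computation relating the two quantities above: the distance to monotonicity of a sequence equals its length minus the length of its longest increasing subsequence (LIS), computed here exactly with patience sorting. The streaming, sub-linear-space approximations that the abstract contributes are not attempted in this sketch.

import bisect

def lis_length(seq):
    tails = []                       # tails[i] = smallest tail of an increasing subsequence of length i+1
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

def distance_to_monotonicity(seq):
    return len(seq) - lis_length(seq)

s = [3, 1, 2, 5, 4, 7, 6]
print(lis_length(s), distance_to_monotonicity(s))   # 4 3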
secure communications in wireless ad hoc networks require setting up end to end secret keys for communicating node pairs due to physical limitations and scalability requirements full key connectivity can not be achieved by key pre distribution in this paper we develop an analytical framework for the on demand key establishment approach we propose novel security metric called rem resilience vector to quantify the resilience of any key establishment schemes against revealing erasure and modification rem attacks our analysis shows that previous key establishment schemes are vulnerable under rem attacks relying on the new security metric we prove universal bound on achievable rem resilience vectors for any on demand key establishment scheme this bound that characterizes the optimal security performance analytically is shown to be tight as we propose rem resilient key establishment scheme which achieves any vector within this bound in addition we develop class of low complexity key establishment schemes which achieve nearly optimal rem attack resilience
with the rapid proliferation of xml data large scale online applications are emerging in this research we aim to enhance the xml query processors with the ability to process queries progressively and report partial results and query progress continually the methodology lays its foundation on sampling we shed light on how effective samples can be drawn from semi structured xml data as opposed to flat table relational data several innovative sampling schemes on xml data are designed the proposed methodology advances xml query processing to the next level being more flexible responsive user informed and user controllable to meet emerging needs and future challenges
commodity computer systems contain more and more processor cores and exhibit increasingly diverse architectural tradeoffs including memory hierarchies interconnects instruction sets and variants and io configurations previous high performance computing systems have scaled in specific cases but the dynamic nature of modern client and server workloads coupled with the impossibility of statically optimizing an os for all workloads and hardware variants pose serious challenges for operating system structures we argue that the challenge of future multicore hardware is best met by embracing the networked nature of the machine rethinking os architecture using ideas from distributed systems we investigate new os structure the multikernel that treats the machine as network of independent cores assumes no inter core sharing at the lowest level and moves traditional os functionality to distributed system of processes that communicate via message passing we have implemented multikernel os to show that the approach is promising and we describe how traditional scalability problems for operating systems such as memory management can be effectively recast using messages and can exploit insights from distributed systems and networking an evaluation of our prototype on multicore systems shows that even on present day machines the performance of multikernel is comparable with conventional os and can scale better to support future hardware
in this paper we discuss verification and validation of simulation models four different approaches to deciding model validity are described two different paradigms that relate verification and validation to the model development process are presented various validation techniques are defined conceptual model validity model verification operational validity and data validity are discussed way to document results is given recommended procedure for model validation is presented and accreditation is briefly discussed
privacy violation occurs when the association between an individual identity and data considered private by that individual is obtained by an unauthorized party uncertainty and indistinguishability are two independent aspects that characterize the degree of this association being revealed indistinguishability refers to the property that the attacker cannot see the difference among group of individuals while uncertainty refers to the property that the attacker cannot tell which private value among group of values an individual actually has this paper investigates the notion of indistinguishability as general form of anonymity applicable for example not only to generalized private tables but to relational views and to sets of views obtained by multiple queries over private database table it is shown how indistinguishability is highly influenced by certain symmetries among individuals in the released data with respect to their private values the paper provides both theoretical results and practical algorithms for checking if specific set of views over private table provide sufficient indistinguishability
we address the problem of cool blog classification using only positive and unlabeled examples we propose an algorithm called pub that exploits the information of unlabeled data together with the positive examples to predict whether the unseen blogs are cool or not the algorithm uses the weighting technique to assign weight to each unlabeled example which is assumed to be negative in the training set and the bagging technique to obtain several weak classifiers each of which is learned on small training set generated by randomly sampling some positive examples and some unlabeled examples which are assumed to be negative each of the weak classifiers must achieve admissible performance measure evaluated based on the whole labeled positive examples or has the best performance measure within iteration limit the majority voting function on all weak classifiers is employed to predict the class of test instance the experimental results show that pub can correctly predict the classes of unseen blogs where this situation cannot be handled by the traditional learning from positive and negative examples the results also show that pub outperforms other algorithms for learning from positive and unlabeled examples in the task of cool blog classification
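A bagging-style positive/unlabeled learner in the spirit of the abstract: each weak classifier is trained on the positive examples plus a random sample of unlabeled examples treated as negative, and the final label is a majority vote. The per-example weighting, the admissibility check and the iteration limit described above are omitted, and the synthetic feature vectors are placeholders.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_pu_bagging(X_pos, X_unlabeled, n_estimators=25, sample_size=None):
    rng = np.random.default_rng(0)
    sample_size = sample_size or len(X_pos)
    classifiers = []
    for _ in range(n_estimators):
        idx = rng.choice(len(X_unlabeled), size=sample_size, replace=True)
        X = np.vstack([X_pos, X_unlabeled[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(sample_size)])
        classifiers.append(DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y))
    return classifiers

def predict_majority(classifiers, X):
    votes = np.mean([clf.predict(X) for clf in classifiers], axis=0)
    return (votes >= 0.5).astype(int)

# Toy data standing in for blog feature vectors.
rng = np.random.default_rng(1)
X_pos = rng.normal(loc=1.0, size=(50, 5))
X_unl = np.vstack([rng.normal(loc=1.0, size=(30, 5)),     # hidden positives
                   rng.normal(loc=-1.0, size=(120, 5))])
models = train_pu_bagging(X_pos, X_unl)
print(predict_majority(models, rng.normal(loc=1.0, size=(5, 5))))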
learner centred design lcd is nebulous concept it can range from attempts to design with the needs of the learner at the forefront to involving the learner at various stages of the design process sometimes throughout the whole process in addition learner centred design involving children implies additional issues which do not present themselves when using an lcd approach with adults in this paper we argue that current lcd frameworks do not consider the full range of issues necessary for successful design we propose carss context activities roles stakeholders skills learner centred design framework specifically for child learners and describe two design case studies which demonstrate the framework in use
when faced with the need for documentation examples bug fixes error descriptions code snippets workarounds templates patterns or advice software developers frequently turn to their web browser web resources both organized and authoritative as well as informal and community driven are heavily used by developers the time and attention devoted to finding or re finding and navigating these sites is significant we present codetrail system that demonstrates how the developer’s use of web resources can be improved by connecting the eclipse integrated development environment ide and the firefox web browser codetrail uses communication channel and shared data model between these applications to implement variety of integrative tools by combining information previously available only to the ide or the web browser alone such as editing history code contents and recent browsing codetrail can automate previously manual tasks and enable new interactions that exploit the marriage of data and functionality from firefox and eclipse just as the ide will change the contents of peripheral views to focus on the particular code or task with which the developer is engaged so too the web browser can be focused on the developer’s current context and task
online communities produce rich behavioral datasets eg usenet news conversations wikipedia edits and facebook friend networks analysis of such datasets yields important insights like the long tail of user participation and suggests novel design interventions like targeting users with personalized opportunities and work requests however certain key user data typically are unavailable specifically viewing pre registration and non logged in activity the absence of data makes some questions hard to answer access to it can strengthen extend or cast doubt on previous results we report on analysis of user behavior in cyclopath geographic wiki and route finder for bicyclists with access to viewing and non logged in activity data we were able to replicate and extend prior work on user lifecycles in wikipedia bring to light some pre registration activity thus testing for the presence of educational lurking and demonstrate the locality of geographic activity and how editing and viewing are geographically correlated
we investigate the interaction of mobile robots relying on information provided by heterogeneous sensor nodes to accomplish mission cooperative adaptive and responsive monitoring in mixed mode environments mmes raises the need for multi disciplinary research initiatives to date such research initiatives are limited since each discipline focusses on its domain specific simulation or testbed environment existing evaluation environments do not respect the interdependencies occurring in mmes as consequence holistic validation for development debugging and performance analysis requires an evaluation tool incorporating multi disciplinary demands in the context of mmes we discuss existing solutions and highlight the synergetic benefits of common evaluation tool based on this analysis we present the concept of the mm ulator novel architecture for an evaluation tool incorporating the necessary diversity for multi agent hard software in the loop simulation in modular and scalable way
with the scaling of technology leakage energy will become the dominant source of energy consumption besides cache memories branch predictors are among the largest on chip array structures and consume nontrivial leakage energy this paper proposes two cost effective loop based strategies to reduce the branch predictor leakage without impacting prediction accuracy or performance the loop based approaches exploit the fact that loops usually only contain small number of instructions and hence even fewer branch instructions while taking significant fraction of the execution time consequently all the nonactive entries of branch predictors can be placed into the low leakage mode during the loop execution in order to reduce leakage energy compiler and circuit supports are discussed to implement the proposed leakage reduction strategies compared to the recently proposed decay based approach our experimental results show that the loop based approach can extract percent more dead time of the branch predictor on average leading to more leakage energy savings without impacting the branch prediction accuracy and performance
previous research has shown that faceted browsing is effective and enjoyable in searching and browsing large collections of data in this work we explore the efficacy of interactive visualization systems in supporting exploration and sensemaking within faceted datasets to do this we developed an interactive visualization system called facetlens which exposes trends and relationships within faceted datasets facetlens implements linear facets to enable users not only to identify trends but also to easily compare several trends simultaneously furthermore it offers pivot operations to allow users to navigate the faceted dataset using relationships between items we evaluate the utility of the system through description of insights gained while experts used the system to explore the chi publication repository as well as database of funding grant data and report formative user study that identified usability issues
we propose novel document clustering method which aims to cluster the documents into different semantic classes the document space is generally of high dimensionality and clustering in such high dimensional space is often infeasible due to the curse of dimensionality by using locality preserving indexing lpi the documents can be projected into lower dimensional semantic space in which the documents related to the same semantics are close to each other different from previous document clustering methods based on latent semantic indexing lsi or nonnegative matrix factorization nmf our method tries to discover both the geometric and discriminating structures of the document space theoretical analysis of our method shows that lpi is an unsupervised approximation of the supervised linear discriminant analysis lda method which gives the intuitive motivation of our method extensive experimental evaluations are performed on the reuters and tdt data sets
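A baseline pipeline for clustering documents in a reduced semantic space: tf-idf vectors, an LSI-style projection via truncated SVD, then k-means. This is only the kind of comparison baseline the abstract mentions; the LPI projection it proposes, which preserves local neighbourhood structure, is not available in scikit-learn and is not reproduced here, and the toy documents are made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = [
    "shared memory bus cache multiprocessor",
    "cache memory bus cycle time processors",
    "texture synthesis image quilting patches",
    "image texture patch synthesis flow",
]
X = TfidfVectorizer().fit_transform(docs)                              # term weights
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)   # LSI-style subspace
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)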
despite the recent advances in the theory underlying obfuscation there still is need to evaluate the quality of practical obfuscating transformations more quickly and easily this paper presents the first steps toward comprehensive evaluation suite consisting of number of deobfuscating transformations and complexity metrics that can be readily applied on existing and future transformations in the domain of binary obfuscation in particular framework based on software complexity metrics measuring four program properties code control flow data and data flow is suggested number of well known obfuscating and deobfuscating transformations are evaluated based upon their impact on set of complexity metrics this enables us to quantitatively evaluate the potency of the de obfuscating transformations
software systems are often released with missing functionality errors or incompatibilities that may result in failures in the field inferior performances or more generally user dissatisfaction in previous work some of the authors presented the gamma approach whose goal is to improve software quality by augmenting software engineering tasks with dynamic information collected from deployed software the gamma approach enables analyses that rely on actual field data instead of synthetic in house data and leverage the vast and heterogeneous resources of an entire user community instead of limited and often homogeneous in house resources when monitoring large number of deployed instances of software product however significant amount of data is collected such raw data are useless in the absence of suitable data mining and visualization techniques that support exploration and understanding of the data in this paper we present new technique for collecting storing and visualizing program execution data gathered from deployed instances of software product we also present prototype toolset gammatella that implements the technique finally we show how the visualization capabilities of gammatella facilitate effective investigation of several kinds of execution related information in an interactive fashion and discuss our initial experience with semi public display of gammatella
this paper considers various flavors of the following online problem preprocess text or collection of strings so that given query string all matches of with the text can be reported quickly in this paper we consider matches in which bounded number of mismatches are allowed or in which bounded number of don’t care characters are allowed the specific problems we look at are indexing in which there is single text and we seek locations where matches substring of dictionary queries in which collection of strings is given upfront and we seek those strings which match in their entirety and dictionary matching in which collection of strings is given upfront and we seek those substrings of long which match an original string in its entirety these are all instances of an all to all matching problem for which we provide single solutionthe performance bounds all have similar character for example for the indexing problem with and the query time for substitutions is log matches with data structure of size log and preprocessing time of log where are constants the deterministic preprocessing assumes weakly nonuniform ram model this assumption is not needed if randomization is used in the preprocessing
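A naive reference check for the matching notion used above: a pattern matches at a text position if it differs in at most k characters, with '?' standing in here for a don't-care character. The data structures in the abstract answer such queries far faster; this brute-force version only pins down the semantics.

def matches_with_mismatches(text, pattern, k, wildcard="?"):
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + len(pattern)], pattern)
                         if b != wildcard and a != b)
        if mismatches <= k:
            hits.append(i)
    return hits

print(matches_with_mismatches("abracadabra", "a?ra", k=1))   # [0, 7]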
in this paper we generalize the notion of self adapting one dimensional index structures to wide class of spatial index structures the resulting query responsive index structures can adapt their structure to the users query pattern and thus have the potential to improve the response time in practice we outline two general approaches to providing query responsiveness and present the results in terms of the well known tree our experiments show that depending on the query pattern significant improvements can be obtained in practice
we propose method to evaluate queries using last resort semantic cache in distributed web search engine the cache stores group of frequent queries and for each of these queries it keeps minimal data that is the list of machines that produced their answers the method for evaluating the queries uses the inverse frequency of the terms in the queries stored in the cache idf to determine when the results recovered from the cache are good approximation to the exact answer set experiments show that the method is effective and efficient
empirical validation of software metrics to predict quality using machine learning methods is important to ensure their practical relevance in the software organizations it would also be interesting to know the relationship between object oriented metrics and fault proneness in this paper we build support vector machine svm model to find the relationship between object oriented metrics given by chidamber and kemerer and fault proneness the proposed model is empirically evaluated using open source software the performance of the svm method was evaluated by receiver operating characteristic roc analysis based on these results it is reasonable to claim that such models could help in planning and performing testing by focusing resources on fault prone parts of the design and code thus the study shows that svm method may also be used in constructing software quality models however similar types of studies are required to be carried out in order to establish the acceptability of the model
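a minimal scikit learn sketch of such a model, assuming one row of chidamber and kemerer metrics per class and a binary fault label the random placeholder data, the rbf kernel and the split ratio are illustrative choices rather than the paper's setup

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import roc_auc_score

    # X: one row per class with CK metrics (e.g. WMC, DIT, NOC, CBO, RFC, LCOM)
    # y: 1 if the class was fault-prone, 0 otherwise -- placeholder data below
    rng = np.random.default_rng(0)
    X = rng.random((200, 6))
    y = (rng.random(200) < 0.3).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    model = SVC(kernel="rbf", probability=True).fit(scaler.transform(X_tr), y_tr)
    scores = model.predict_proba(scaler.transform(X_te))[:, 1]
    print("ROC AUC:", roc_auc_score(y_te, scores))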
application servers are subject to varying workloads which suggests an autonomic management to maintain optimal performance we propose to integrate in the component based programming model often used in current application servers the concept of service level adaptation allowing some components to dynamically degrade or upgrade their level of service our goal is to be able under heavy workloads to trade lower service level of the most resource intensive components for stable performance of the server as whole upgrading or degrading components is autonomously performed through runtime profiling which is used to estimate the application’s hot spots and target adaptations in addition to finding the best adaptations this performance profile allows our system to characterize the effects of past adaptations in particular given the current workload it is possible to estimate if service level upgrade might result in an overload as result by stabilizing the server at peak performance via component adaptations we are able to drastically improve both overall latency and throughput for instance on both the rubis and tpc benchmarks we are able to maintain peak performance in heavy load scenarios far exceeding the initial capacity of the system
this paper proposes nonintrusive encryption mechanism for protecting data confidentiality on the web the core idea is to encrypt confidential data before sending it to untrusted sites and use keystores on the web to manage encryption keys without intervention from users formal language based information flow model is used to prove the soundness of the mechanism
this paper addresses the maximal lifetime scheduling problem in sensor surveillance systems given set of sensors and targets in an area sensor can watch only one target at time our task is to schedule sensors to watch targets and forward the sensed data to the base station such that the lifetime of the surveillance system is maximized where the lifetime is the duration that all targets are watched and all active sensors are connected to the base station we propose an optimal solution to find the target watching schedule for sensors that achieves the maximal lifetime our solution consists of three steps computing the maximal lifetime of the surveillance system and workload matrix by using the linear programming technique decomposing the workload matrix into sequence of schedule matrices that can achieve the maximal lifetime and determining the sensor surveillance trees based on the above obtained schedule matrices which specify the active sensors and the routes to pass sensed data to the base station this is the first time in the literature that the problem of maximizing lifetime of sensor surveillance systems has been formulated and the optimal solution has been found
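a simplified sketch of the first step only, assuming each sensor s has a watching time budget energy[s] and a boolean can_watch matrix it maximises the common lifetime subject to full target coverage and one target at a time per sensor, and it omits the routing and connectivity constraints as well as the schedule matrix decomposition described in the abstract

    import numpy as np
    from scipy.optimize import linprog

    def max_lifetime(energy, can_watch):
        # energy[s]: watching-time budget of sensor s
        # can_watch: boolean S x T numpy array, True if sensor s can see target t
        # variables: x[s, t] (time s spends watching t) followed by L (lifetime)
        S, T = can_watch.shape
        n = S * T + 1
        c = np.zeros(n); c[-1] = -1.0            # maximise L  <=>  minimise -L

        A_eq, b_eq = [], []
        for t in range(T):                       # every target covered for the whole lifetime
            row = np.zeros(n)
            row[[s * T + t for s in range(S)]] = 1.0
            row[-1] = -1.0
            A_eq.append(row); b_eq.append(0.0)

        A_ub, b_ub = [], []
        for s in range(S):
            row = np.zeros(n); row[s * T:(s + 1) * T] = 1.0
            A_ub.append(row.copy()); b_ub.append(energy[s])   # energy budget
            row[-1] = -1.0
            A_ub.append(row); b_ub.append(0.0)                # one target at a time

        ub = [None if can_watch[s, t] else 0.0
              for s in range(S) for t in range(T)] + [None]
        bounds = [(0, u) for u in ub]
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
        return -res.fun, res.x[:-1].reshape(S, T)   # lifetime and workload matrix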
with the growth in the size of datasets data mining has recently become an important research topic and is receiving substantial interest from both academia and industry at the same time greater recognition of the value of temporal and spatial data has been evident and the first papers looking at the confluence of these two areas are starting to emerge this short paper provides few comments on this research and provides bibliography of relevant research papers investigating temporal spatial and spatio temporal data mining
the high heterogeneity and variability of mobile computing environments can adversely affect the performance of applications running in these environments to tackle this problem adaptation techniques can be exploited adaptation based on code mobility is possible solution as it allows to dynamically modify the load of the hosting nodes and the internode traffic to adapt to the changing characteristics of computing nodes and network links in this paper we propose modeling framework to analyze the performance effectiveness of code mobility based adaptation in mobile computing environment distinguishing feature of our framework is the modeling of both physical and logical mobility as something that can be plugged into pre existing architecture model to ease the analysis of the performance impact of both different physical mobility scenarios and of different adaptation strategies based on code mobility to enhance the framework usability we have adopted uml as modeling language remaining fully compliant with the latest uml specification and with the standard uml profile for schedulability performance and time specification
in an instance of the prize collecting steiner forest problem pcsf we are given an undirected graph with non negative edge costs terminal pairs (s_i, t_i) for 1 ≤ i ≤ k and penalties π_i feasible solution consists of forest and subset of terminal pairs such that each pair (s_i, t_i) is either connected by the forest or included in that subset the objective is to compute feasible solution that minimizes the cost of the forest edges plus the penalties π_i of the pairs left unconnected game theoretic version of the above problem has one player for each terminal pair player i’s ultimate goal is to connect s_i and t_i and the player derives privately held utility u_i ≥ 0 from being connected service provider can connect the terminals s_i and t_i of player i in two ways by buying the edges of an s_i t_i path in the graph or by buying an alternate connection between s_i and t_i maybe from some other provider at cost of π_i in this paper we present simple budget balanced and group strategyproof mechanism for the above problem we also show that our mechanism computes client sets whose social cost is at most log times the minimum social cost of any player set this matches lower bound that was recently given by roughgarden and sundararajan stoc
tags are an important information source in web they can be used to describe users topic preferences as well as the content of items to make personalized recommendations however since tags are arbitrary words given by users they contain lot of noise such as tag synonyms semantic ambiguities and personal tags such noise makes it difficult to improve the accuracy of item recommendations to eliminate the noise of tags in this paper we propose to use the multiple relationships among users items and tags to find the semantic meaning of each tag for each user individually with the proposed approach the relevant tags of each item and the tag preferences of each user are determined in addition the user and item based collaborative filtering combined with the content filtering approach are explored the effectiveness of the proposed approaches is demonstrated in the experiments conducted on real world datasets collected from the amazon.com and citeulike websites
heterogeneous information network is an information network composed of multiple types of objects clustering on such network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster however although clustering on homogeneous networks has been studied over decades clustering on heterogeneous networks has not been addressed until recently recent study proposed new algorithm rankclus for clustering on bi typed heterogeneous networks however real world network may consist of more than two types and the interactions among multi typed objects play key role at disclosing the rich semantics that network carries in this paper we study clustering of multi typed heterogeneous networks with star network schema and propose novel algorithm netclus that utilizes links across multityped objects to generate high quality net clusters an iterative enhancement method is developed that leads to effective ranking based clustering in such heterogeneous networks our experiments on dblp data show that netclus generates more accurate clustering results than the baseline topic model algorithm plsa and the recently proposed algorithm rankclus further netclus generates informative clusters presenting good ranking and cluster membership information for each attribute object in each net cluster
there is growing recognition within the visual analytics community that interaction and inquiry are inextricable it is through the interactive manipulation of visual interface the analytic discourse that knowledge is constructed tested refined and shared this article reflects on the interaction challenges raised in the visual analytics research and development agenda and further explores the relationship between interaction and cognition it identifies recent exemplars of visual analytics research that have made substantive progress toward the goals of true science of interaction which must include theories and testable premises about the most appropriate mechanisms for human information interaction seven areas for further work are highlighted as those among the highest priorities for the next years of visual analytics research ubiquitous embodied interaction capturing user intentionality knowledge based interfaces collaboration principles of design and perception interoperability and interaction evaluation ultimately the goal of science of interaction is to support the visual analytics and human computer interaction communities through the recognition and implementation of best practices in the representation and manipulation of visual displays
in this paper we define and explore proofs of retrievability pors por scheme enables an archive or back up service prover to produce concise proof that user verifier can retrieve target file that is that the archive retains and reliably transmits file data sufficient for the user to recover in its entirety por may be viewed as kind of cryptographic proof of knowledge pok but one specially designed to handle large file or bitstring we explore por protocols here in which the communication costs number of memory accesses for the prover and storage requirements of the user verifier are small parameters essentially independent of the length of in addition to proposing new practical por constructions we explore implementation considerations and optimizations that bear on previously explored related schemes in por unlike pok neither the prover nor the verifier need actually have knowledge of pors give rise to new and unusual security definition whose formulation is another contribution of our work we view pors as an important tool for semi trusted online archives existing cryptographic techniques help users ensure the privacy and integrity of files they retrieve it is also natural however for users to want to verify that archives do not delete or modify files prior to retrieval the goal of por is to accomplish these checks without users having to download the files themselves por can also provide quality of service guarantees ie show that file is retrievable within certain time bound
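a toy illustration of the spot checking idea behind por schemes the verifier remembers macs of a few randomly chosen blocks and later challenges the archive on some of them real constructions additionally encrypt, error correct and permute the file so that the checked positions are indistinguishable from data, none of which is shown in this sketch and all names are assumptions

    import hashlib, hmac, random

    def plant_sentinels(data: bytes, key: bytes, n_sentinels=16, block=64):
        # verifier-side setup: keep MACs of randomly chosen blocks as checkpoints
        n_blocks = len(data) // block
        rng = random.Random(int.from_bytes(key, "big"))
        positions = rng.sample(range(n_blocks), n_sentinels)
        return {p: hmac.new(key, data[p*block:(p+1)*block], hashlib.sha256).digest()
                for p in positions}

    def challenge(tags, k=4):
        # verifier picks a few remembered positions to query
        return random.sample(list(tags), k)

    def respond(data: bytes, positions, block=64):
        # prover returns the requested blocks
        return {p: data[p*block:(p+1)*block] for p in positions}

    def verify(tags, response, key: bytes):
        # accept only if every returned block matches its stored MAC
        return all(hmac.compare_digest(tags[p],
                   hmac.new(key, blk, hashlib.sha256).digest())
                   for p, blk in response.items())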
modern networks provide qos quality of service model to go beyond best effort services but current qos models are oriented towards low level network parameters eg bandwidth latency jitter application developers on the other hand are interested in quality models that are meaningful to the end user and therefore struggle to bridge the gap between network and application qos models examples of application quality models are response time predictability or budget for transmission costs applications that can deal with changes in the network environment are called network aware network aware application attempts to adjust its resource demands in response to network performance variations this paper presents framework based approach to the construction of network aware programs at the core of the framework is feedback loop that controls the adjustment of the application to network properties the framework provides the skeleton to address two fundamental challenges for the construction of network aware applications how to find out about dynamic changes in network service quality and how to map application centric quality measures eg predictability to network centric quality measures eg qos models that focus on bandwidth or latency our preliminary experience with prototype network aware image retrieval system demonstrates the feasibility of our approach the prototype illustrates that there is more to network awareness than just taking network resources and protocols into account and raises questions that need to be addressed from software engineering point of view to make general approach to network aware applications useful
discussion boards and online forums are important platforms for people to share information users post questions or problems onto discussion boards and rely on others to provide possible solutions and such question related content sometimes even dominates the whole discussion board however to retrieve this kind of information automatically and effectively is still non trivial task in addition the existence of other types of information eg announcements plans elaborations etc makes it difficult to assume that every thread in discussion board is about question we consider the problems of identifying question related threads and their potential answers as classification tasks experimental results across multiple datasets demonstrate that our method can significantly improve the performance in both question detection and answer finding subtasks we also do careful comparison of how different types of features contribute to the final result and show that non content features play key role in improving overall performance finally we show that ranking scheme based on our classification approach can yield much better performance than prior published methods
web based instruction wbi programs which have been increasingly developed in educational settings are used by diverse learners therefore individual differences are key factors for the development of wbi programs among various dimensions of individual differences the study presented in this article focuses on cognitive styles more specifically this study investigates how cognitive styles affect students learning patterns in wbi program with an integrated approach utilizing both traditional statistical and data mining techniques the former are applied to determine whether cognitive styles significantly affected students learning patterns the latter use clustering and classification methods in terms of clustering the k means algorithm has been employed to produce groups of students that share similar learning patterns and subsequently the corresponding cognitive style for each group is identified as far as classification is concerned the students learning patterns are analyzed using decision tree with which eight rules are produced for the automatic identification of students cognitive styles based on their learning patterns the results from these techniques appear to be consistent and the overall findings suggest that cognitive styles have important effects on students learning patterns within wbi the findings are applied to develop model that can support the development of wbi programs
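a small scikit learn sketch of the two mining steps mentioned above clustering learning patterns with k means and deriving readable if then rules for cognitive style with a decision tree the feature columns, the depth limit and the number of groups are placeholders rather than the study's actual settings

    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier, export_text

    # X: one row per student, columns are learning-pattern measures
    # (e.g. use of navigation maps, backward/forward moves, time per page)
    # styles: the cognitive style measured independently for each student
    def analyse(X, styles, n_groups=3):
        groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(X)  # step 1: cluster patterns
        tree = DecisionTreeClassifier(max_depth=3).fit(X, styles)       # step 2: style <- patterns rules
        print(export_text(tree))                                        # readable if-then rules
        return groups, tree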
small variety of methods and techniques are presented in the literature as solutions to manage requirements elicitation for web applications however the existing state of the art is lacking research regarding practical functioning solutions that would match web application characteristics the main concern for this paper is how requirements for web applications can be elicited the viewpoint oriented requirements definition method vord is chosen for eliciting and formulating web application requirements in an industrial case study vord is helpful because it allows structuring of requirements around viewpoints and formulating very detailed requirements specifications requirements were understandable to the client with minimal explanation but failed to capture the business vision strategy and daily business operations and could not anticipate the changes in the business process as consequence of introducing the web application within the organisation the paper concludes by discussion of how to adapt and extend vord to suit web applications
generalized temporal role based access control gtrbac model that allows specification of comprehensive set of temporal constraints for access control has recently been proposed the model constructs allow one to specify various temporal constraints on role user role assignments and role permission assignments however temporal constraints on role enablings and role activations can have various implications on role hierarchy in this paper we present an analysis of the effects of gtrbac temporal constraints on role hierarchy and introduce various kinds of temporal hierarchies in particular we show that there are certain distinctions that need to be made in permission inheritance and role activation semantics in order to capture all the effects of gtrbac constraints such as role enablings and role activations on role hierarchy
automated deduction uses computation to perform symbolic logical reasoning it has been core technology for program verification from the very beginning satisfiability solvers for propositional and first order logic significantly automate the task of deductive program verification we introduce some of the basic deduction techniques used in software and hardware verification and outline the theoretical and engineering issues in building deductive verification tools beyond verification deduction techniques can also be used to support variety of applications including planning program optimization and program synthesis
we propose biologically motivated computational model for learning task driven and object based visual attention control in interactive environments in this model top down attention is learned interactively and is used to search for desired object in the scene through biasing the bottom up attention in order to form need based and object driven state representation of the environment our model consists of three layers first in the early visual processing layer most salient location of scene is derived using the biased saliency based bottom up model of visual attention then cognitive component in the higher visual processing layer performs an application specific operation like object recognition at the focus of attention from this information state is derived in the decision making and learning layer top down attention is learned by the tree algorithm which successively grows an object based binary tree internal nodes in this tree check the existence of specific object in the scene by biasing the early vision and the object recognition parts its leaves point to states in the action value table motor actions are associated with the leaves after performing motor action the agent receives reinforcement signal from the critic this signal is alternately used for modifying the tree or updating the action selection policy the proposed model is evaluated on visual navigation tasks where obtained results lend support to the applicability and usefulness of the developed method for robotics
we present the design of high performance and energy efficient dynamic instruction schedulers in dimensional integration technology based on previous observation that the critical path latency of conventional dynamic scheduler is greatly affected by wire delay we propose integrated scheduler designs by partitioning conventional scheduler across multiple vertically stacked die the die stacked organization reduces the lengths of critical wires thus reducing both latency and energy our simulation results show that entry entry instruction scheduler implemented in die stack achieves reduction in latency with simultaneous energy reduction as compared to conventional planar design the benefits are even larger when the instruction scheduler is implemented on die stack with the corresponding latency reductions being
we propose novel local feature based face representation method based on two stage subset selection where the first stage finds the informative regions and the second stage finds the discriminative features in those locations the key motivation is to learn the most discriminative regions of human face and the features in there for person identification instead of assuming priori any regions of saliency we use the subset selection based formulation and compare three variants of feature selection and genetic algorithms for this purpose experiments on frontal face images taken from the feret dataset confirm the advantage of the proposed approach in terms of high accuracy and significantly reduced dimensionality
in our approach the event calculus is used to provide formalism that avoids the question of object timestamping by not applying time to objects rather temporal behavior is reflected in events which bring about changes in objects previous applications of the event calculus in databases are considered an extension of the formalism to fully bitemporal model is demonstrated these extensions and the object event calculus oec form framework for approaching temporal issues in object oriented systems practical application issues as well as formal theory are described current gises will support areal calculations on geographic objects and can also describe topological relations between them however they lack the ability to extrapolate from historical data the sufficiency of the temporal gis model to support inventory updates quality control and display is demonstrated follow up and further extensions and areas of exploration are presented at the conclusion
there is growing trend towards designing simpler cpu cores that have considerable area complexity and power advantages these cores are then leveraged in large scale multicore processors or in socs for hand held devices the most significant limitation of such simple cpu cores is their lower performance in this paper we propose technique to improve the performance of simple cores with minimal increase in complexity and area in particular we integrate reconfigurable hardware unit rhu that exploits loop level parallelism to increase the core’s overall performance the rhu is reconfigured to execute instructions with highly predictable operand values from the future iterations of loops our experiments show that the proposed architecture improves the performance by an average of about across wide range of applications while incurring area overhead of only about
the analysis of data using visual tool is rarely task done in isolation it tends to be part of wider goal that of making sense of the current situation often to support decision making user centred approach is needed in order to properly design interaction that supports sense making incorporating visual data analysis this paper reports the experience gained in medla project that aims to support knowledge management km sharing and reuse across different media in large enterprises we report the user centred design approach adopted and the design phases that led to the first prototype user evaluation was conducted to assess the design and how different levels of data information and knowledge were mapped using alternative visual tools the results show that clear separation of the visual data analysis from other sense making sub tasks helps users in focussing their attention users particularly appreciated the data analysis across different media and formats as well as the support for contextualising information within the broader perspective of km further work is needed to develop more fully intuitive visualisations that exploit the richer information in multimedia documents and make the multiple connections between data more easily accessible
multi class binary symbol classification requires the use of rich descriptors and robust classifiers shape representation is difficult task because of several symbol distortions such as occlusions elastic deformations gaps or noise in this paper we present the circular blurred shape model descriptor this descriptor encodes the arrangement information of object parts in correlogram structure prior blurring degree defines the level of distortion allowed to the symbol moreover we learn the new feature space using set of adaboost classifiers which are combined in the error correcting output codes framework to deal with the multi class categorization problem the presented work has been validated over different multi class data sets and compared to the state of the art descriptors showing significant performance improvements
web services are increasingly gaining acceptance as framework for facilitating application to application interactions within and across enterprises it is commonly accepted that service description should include not only the interface but also the business protocol supported by the service the present work focuses on the formalization of an important category of protocols that includes time related constraints called timed protocols and the impact of time on compatibility and replaceability analysis we formalized the following timing constraints one class of invoke constraints defines time windows within which service operation can be invoked while another class defines expiration deadlines we extended techniques for compatibility and replaceability analysis between timed protocols by using semantic preserving mapping between timed protocols and timed automata leading to the identification of novel class of timed automata called protocol timed automata pta pta exhibit particular kind of silent transition that strictly increases the expressiveness of the model yet they are closed under complementation making every type of compatibility or replaceability analysis decidable finally we implemented our approach in the context of larger project called servicemosaic model driven framework for web service life cycle management
this paper investigates supervised and unsupervised adaptation of stochastic grammars including gram language models and probabilistic context free grammars pcfgs to new domain it is shown that the commonly used approaches of count merging and model interpolation are special cases of more general maximum posteriori map framework which additionally allows for alternate adaptation approaches this paper investigates the effectiveness of different adaptation strategies and in particular focuses on the need for supervision in the adaptation process we show that gram models as well as pcfgs benefit from either supervised or unsupervised map adaptation in various tasks for gram models we compare the benefit from supervised adaptation with that of unsupervised adaptation on speech recognition task with an adaptation sample of limited size about and show that unsupervised adaptation can obtain of the adaptation gain obtained by supervised adaptation we also investigate the benefit of using multiple word hypotheses in the form of word lattice for unsupervised adaptation on speech recognition task for which there was much larger adaptation sample available the use of word lattices for adaptation required the derivation of generalization of the well known good turing estimate using this generalization we derive method that uses monte carlo sampling for building katz backoff models the adaptation results show that for adaptation samples of limited size several tens of hours unsupervised adaptation on lattices gives performance gain over using transcripts the experimental results also show that with very large adaptation sample the benefit from transcript based adaptation matches that of lattice based adaptation finally we show that pcfg domain adaptation using the map framework provides similar gains in measure accuracy on parsing task as was seen in asr accuracy improvements with gram adaptation experimental results show that unsupervised adaptation provides of the gain obtained by supervised adaptation
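an illustrative sketch of count merging viewed as map adaptation for a bigram model the background counts act as a prior whose weight tau controls how strongly the out of domain model is trusted the unsmoothed ratios and the single tau are simplifications, not the paper's exact formulation or its lattice based extension

    from collections import Counter

    def map_adapt_bigrams(in_domain_counts, background_counts, tau=5.0):
        # counts are dicts {(w1, w2): count}; the background model serves as a
        # Dirichlet-like prior scaled by tau (an assumed, simplified recipe)
        bg_ctx, in_ctx = Counter(), Counter()
        for (w1, _), c in background_counts.items():
            bg_ctx[w1] += c
        for (w1, _), c in in_domain_counts.items():
            in_ctx[w1] += c

        probs = {}
        for (w1, w2) in set(in_domain_counts) | set(background_counts):
            prior = background_counts.get((w1, w2), 0) / max(bg_ctx[w1], 1)
            num = in_domain_counts.get((w1, w2), 0) + tau * prior
            probs[(w1, w2)] = num / (in_ctx[w1] + tau)
        return probs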
near duplicate detection techniques are exploited to facilitate representative photo selection and region of interest roi determination which are important functionalities for efficient photo management and browsing to make the near duplicate detection module resistant to noisy features three filtering approaches ie point based region based and probabilistic latent semantic plsa are developed to categorize feature points for the photos taken in travels we construct support vector machine classifier to model matching patterns between photos and determine whether photos are near duplicate pairs relationships between photos are then described as graph and the most central photo that best represents photo cluster is selected according to centrality values because matched feature points are often located in the interior or at the contour of important objects the region that compactly covers the matched feature points is determined as the roi we compare the proposed approaches with conventional ones and demonstrate their effectiveness
the twisted cube is an important variation of the hypercube it possesses many desirable properties for interconnection networks in this paper we study fault tolerant embedding of paths in twisted cubes let tq denote the dimensional twisted cube we prove that path of length can be embedded between any two distinct nodes with dilation for any faulty set tq tq with this result is optimal in the sense that the embedding has the smallest dilation the result is also complete in the sense that the two bounds on path length and faulty set size for successful embedding are tight that is the result does not hold if we also extend the result on hamiltonian connectivity of tq in the literature
this paper reports the result of our experimental study on new method of applying an association rule miner to discover useful information from customer inquiry database in call center of company it has been claimed that association rule mining is not suited for text mining to overcome this problem we propose to generate sequential data set of words with dependency structure from the japanese text database and to employ new method for extracting meaningful association rules by applying new rule selection criterion each inquiry in the sequential data was represented as list of word pairs each of which consists of verb and its dependent noun the association rules were induced regarding each pair of words as an item the rule selection criterion comes from our principle that we put heavier weights to co occurrence of multiple items more than single item occurrence we regarded rule important if the existence of the items in the rule body significantly affects the occurrence of the item in the rule head the selected rules were then categorized to form meaningful information classes with this method we succeeded in extracting useful information classes from the text database which were not acquired by only simple keyword retrieval also inquiries with multiple aspects were properly classified into corresponding multiple categories
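a compact sketch of the mining step, treating each inquiry as a set of verb noun items and keeping a rule only when co occurrence with the body raises the head's frequency well above its base rate the support and lift thresholds are stand ins for the paper's own selection criterion and categorization step

    from collections import Counter
    from itertools import combinations

    def mine_rules(inquiries, min_support=3, min_lift=2.0):
        # inquiries: list of sets of (verb, noun) items extracted per inquiry
        n = len(inquiries)
        item_count = Counter(i for inq in inquiries for i in inq)
        pair_count = Counter(p for inq in inquiries
                               for p in combinations(sorted(inq), 2))
        rules = []
        for (a, b), c in pair_count.items():
            if c < min_support:
                continue
            for body, head in ((a, b), (b, a)):
                conf = c / item_count[body]                 # P(head | body)
                lift = conf / (item_count[head] / n)        # vs. base rate of head
                if lift >= min_lift:
                    rules.append((body, head, conf, lift))
        return rules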
in this paper we present reliable storage service called solarstore that adaptively trades off storage reliability versus energy consumption in solar powered sensor networks solarstore adopts predominantly disconnected network model where long running data collection experiments are conducted in the absence of continuous connection to the outside world solarstore i replicates data in the network until the next upload opportunity and ii adapts the degree of data replication dynamically depending on solar energy and storage availability the goal is to maximize the amount of data that can eventually be retrieved from the network subject to energy and storage constraints maximization of retrievable data implies minimizing sensing blackouts due to energy depletion as well as minimizing loss due to node damage in harsh environmental conditions we have deployed an outdoor solar powered sensor network on which solarstore is implemented and tested an indoor testbed is also set up for performance evaluation under environmental conditions not attained locally experiments show that solarstore is successful in dynamically responding to variations in the environment in manner that increases retrievable data under different node failure scenarios
we propose unifying framework for model based specification notations our framework captures the execution semantics that are common among model based notations and leaves the distinct elements to be defined by set of parameters the basic components of specification are non concurrent state transition machines which are combined by composition operators to form more complex concurrent specifications we define the step semantics of these basic components in terms of an operational semantics template whose parameters specialize both the enabling of transitions and transitions effects we also provide the operational semantics of seven composition operators defining each as the concurrent execution of components with changes to their shared variables and events to reflect inter component communication and synchronization the definitions of these operators use the template parameters to preserve in composition notation specific behaviour by separating notation’s step semantics from its composition and concurrency operators we simplify the definitions of both our framework is sufficient to capture the semantics of basic transition systems csp ccs basic lotos estelle subset of sdl and variety of statecharts notations we believe that description of notation’s semantics in our framework can be used as input to tool that automatically generates formal analysis tools
in this paper novel method of representing symbolic images in symbolic image database sid invariant to image transformations that is useful for exact match retrieval is presented the relative spatial relationships existing among the components present in an image are perceived with respect to the direction of reference and preserved by set of triples distinct and unique key is computed for each distinct triple the mean and standard deviation of the set of keys computed for symbolic image are stored along with the total number of keys as the representatives of the corresponding image the proposed exact match retrieval scheme is based on modified binary search technique and thus requires o(log n) search time in the worst case where n is the total number of symbolic images in the sid an extensive experimentation on large database of symbolic images is conducted to corroborate the superiority of the model the effectiveness of the proposed representation scheme is tested with standard testbed images
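a toy version of the representation and retrieval scheme compute one key per component triple, keep only the mean, standard deviation and count of the keys as the image signature, and answer exact match queries by binary search over the sorted signatures the pairwise relation and the key function used here are crude stand ins for the paper's direction of reference triples

    import bisect, hashlib, statistics

    def triple_key(triple):
        # deterministic integer key for one (label_a, label_b, relation) triple
        return int(hashlib.md5(repr(triple).encode()).hexdigest()[:8], 16)

    def signature(components):
        # components: list of (label, x, y); assumes at least two components
        keys = []
        for i, (la, xa, ya) in enumerate(components):
            for lb, xb, yb in components[i + 1:]:
                keys.append(triple_key((la, lb, (xb > xa, yb > ya))))
        return (round(statistics.mean(keys), 6),
                round(statistics.pstdev(keys), 6), len(keys))

    class SymbolicImageDB:
        def __init__(self, images):                  # images: {name: components}
            self.entries = sorted((signature(c), name) for name, c in images.items())
            self.keys = [sig for sig, _ in self.entries]

        def exact_match(self, components):           # binary search: O(log n)
            sig = signature(components)
            i = bisect.bisect_left(self.keys, sig)
            out = []
            while i < len(self.keys) and self.keys[i] == sig:
                out.append(self.entries[i][1]); i += 1
            return out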
this paper discusses dependable and widely applicable peer to peer p2p computing platform as the existing p2p computing platforms are limited due to the lack of support for various computational models this paper proposes workflow management mechanism to support task dependency in parallel programs while increasing computing efficiency in general task dependency leads to serious performance degradation for failed task re execution because of volatile peers therefore it results in low dependability here dependability is defined as comparison of the actual performance with task failures to the theoretical one without failure on p2p computing platform redundant task dispatch and runtime optimization method are proposed to guarantee high dependability even with highly volatile peers large scale simulation results indicate that the computing platform efficiently solves the problem of p2p computing due to volatile peers
most commercial database systems do or should exploit many sorting techniques that are publicly known but not readily available in the research literature these techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations this survey collects many of these techniques for easy reference by students researchers and product developers it covers in memory sorting disk based external sorting and considerations that apply specifically to sorting in database systems
the papers in this special issue originated at sat the fourth international symposium on the theory and applications of satisfiability testing this foreword reviews the current state of satisfiability testing and places the papers in this issue in context
kernel machines and rough sets are two classes of popular learning techniques kernel machines enhance traditional linear learning algorithms to deal with nonlinear domains by nonlinear mapping while rough sets introduce human like manner to deal with uncertainty in learning granulation and approximation play central role in rough sets based learning and reasoning fuzzy granulation and fuzzy approximation which are inspired by the ways in which humans granulate information and reason with it are widely discussed in the literature however how to generate effective fuzzy granules from data has not been fully studied so far in this work we integrate kernel functions with fuzzy rough set models and propose two types of kernelized fuzzy rough sets kernel functions are employed to compute the fuzzy equivalence relations between samples thus generate fuzzy information granules of the approximation space and then these fuzzy granules are used to approximate the classification based on the conception of fuzzy lower and upper approximations
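a small numpy sketch of one such construction a gaussian kernel supplies the fuzzy similarity relation between samples and the usual inf max and sup min operators give lower and upper approximations of each sample's own class this is only one of several kernelized fuzzy rough models and is shown for illustration

    import numpy as np

    def gaussian_kernel(X, sigma=1.0):
        # kernel value in [0, 1] used as a fuzzy similarity relation
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def fuzzy_approximations(X, labels, sigma=1.0):
        # lower/upper approximation of each sample's own decision class
        R = gaussian_kernel(X, sigma)
        same = (labels[:, None] == labels[None, :]).astype(float)  # crisp membership
        lower = np.min(np.maximum(1.0 - R, same), axis=1)  # inf_y max(1 - R(x,y), A(y))
        upper = np.max(np.minimum(R, same), axis=1)        # sup_y min(R(x,y), A(y))
        return lower, upper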
recent marketing research has suggested that in store environmental stimuli such as shelf space allocation and product display has great influence upon consumer buying behavior and may induce substantial demand prior work in this area however has not considered the effect of spatial relationships such as the shelf space adjacencies of distinct items on unit sales this paper motivated in great part by the prominent beer and diapers example uses data mining techniques to discover the implicit yet meaningful relationship between the relative spatial distance of displayed products and the items unit sales in retailer’s store the purpose of the developed mining scheme is to identify and classify the effects of such relationships the managerial implications of the discovered knowledge are crucial to the retailer’s strategic formation in merchandising goods this paper proposes novel representation scheme and develops robust algorithm based on association analysis to show its efficiency and effectiveness an intensive experimental study using self defined simulation data was conducted the authors believe that this is the first academically researched attempt at exploring this emerging area of the merchandising problem using data mining
in the information filtering paradigm clients subscribe to server with continuous queries or profiles that express their information needs clients can also publish documents to servers whenever document is published the continuous queries satisfying this document are found and notifications are sent to appropriate clients this article deals with the filtering problem that needs to be solved efficiently by each server given database of continuous queries db and document find all queries in db that match the document we present data structures and indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries expressed in the model awp awp is based on named attributes with values of type text and its query language includes boolean and word proximity operators
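a minimal sketch of the filtering data structure for the boolean conjunction fragment only an inverted index maps each word to the queries containing it and a document matches a query when it covers all of the query's words the word proximity operators of awp are not handled in this sketch

    from collections import defaultdict

    class QueryIndex:
        def __init__(self):
            self.word_to_queries = defaultdict(set)   # word -> ids of queries using it
            self.query_size = {}                       # query id -> number of words

        def subscribe(self, qid, words):
            words = set(words)
            for w in words:
                self.word_to_queries[w].add(qid)
            self.query_size[qid] = len(words)

        def filter(self, document_words):
            hits = defaultdict(int)
            for w in set(document_words):
                for qid in self.word_to_queries.get(w, ()):
                    hits[qid] += 1
            # a query matches when every one of its words occurs in the document
            return [q for q, n in hits.items() if n == self.query_size[q]]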
we describe method to order messages across groups in publish subscribe system without centralized control or large vector timestamps we show that our scheme is practical little state is required that it is scalable the maximum message load is limited by receivers and that it performs well the paths messages traverse to be ordered are not made much longer than necessary our insight is that only messages to groups that overlap in membership can be observed to arrive out of order sequencing messages to these groups is sufficient to provide consistent order and when publishers subscribe to the groups to which they send this message order is causal order
today’s complex world requires state of the art data analysis over truly massive data sets these data sets can be stored persistently in databases or flat files or can be generated in realtime in continuous manner an associated set is collection of data sets annotated by the values of domain these data sets are populated using data source according to condition and the annotated value an associated set asset query consists of repeated successive interrelated definitions of associated sets put together in column wise fashion resembling spreadsheet document we present datamingler powerful gui to express and manage asset queries data sources and aggregate functions and the asset query engine qe to efficiently evaluate asset queries we argue that asset queries constitute useful class of olap queries are suitable for distributed processing settings and extend the mapreduce paradigm in declarative way
networks of workstations are rapidly emerging as cost effective alternative to parallel computers switch based interconnects with irregular topology allow the wiring flexibility scalability and incremental expansion capability required in this environment however the irregularity also makes routing and deadlock avoidance on such systems quite complicated in current proposals many messages are routed following nonminimal paths increasing latency and wasting resources in this paper we propose two general methodologies for the design of adaptive routing algorithms for networks with irregular topology routing algorithms designed according to these methodologies allow messages to follow minimal paths in most cases reducing message latency and increasing network throughput as an example of application we propose two adaptive routing algorithms for an previously known as autonet they can be implemented either by duplicating physical channels or by splitting each physical channel into two virtual channels in the former case the implementation does not require new switch design it only requires changing the routing tables and adding links in parallel with existing ones taking advantage of spare switch ports in the latter case new switch design is required but the network topology is not changed evaluation results for several different topologies and message distributions show that the new routing algorithms are able to increase throughput for random traffic by factor of up to with respect to the original up*/down* algorithm also reducing latency significantly for other message distributions throughput is increased more than seven times we also show that most of the improvement comes from the use of minimal routing
user contributed tags have shown promise as means of indexing multimedia collections by harnessing the combined efforts and enthusiasm of online communities but tags are only one way of describing multimedia items in this study we compare the characteristics of public tags with other forms of descriptive metadata namely titles and narrative captions that users have assigned to collection of very similar images gathered from the photo sharing service flickr the study shows that tags converge on different descriptions than the other forms of metadata do and that narrative metadata may be more effective than tags for capturing certain aspects of images that may influence their subsequent retrieval and use the study also examines how photographers use peoples names to personalize the different types of metadata and how they tell stories across short sequences of images the study results are then brought to bear on design recommendations for user tagging tools and automated tagging algorithms and on using photo sharing sites as de facto art and architecture resources
many anti phishing mechanisms currently focus on helping users verify whether web site is genuine however usability studies have demonstrated that prevention based approaches alone fail to effectively suppress phishing attacks and protect internet users from revealing their credentials to phishing sites in this paper instead of preventing human users from biting the bait we propose new approach to protect against phishing attacks with bogus bites we develop bogusbiter unique client side anti phishing tool which transparently feeds relatively large number of bogus credentials into suspected phishing site bogusbiter conceals victim’s real credential among bogus credentials and moreover it enables legitimate web site to identify stolen credentials in timely manner leveraging the power of client side automatic phishing detection techniques bogusbiter is complementary to existing preventive anti phishing approaches we implemented bogusbiter as an extension to the firefox web browser and evaluated its efficacy through real experiments on both phishing and legitimate web sites our experimental results indicate that it is promising to use bogusbiter to transparently protect against phishing attacks
this work presents immucube scalable and efficient mechanism to improve dependability of interconnection networks for parallel and distributed computers immucube achieves better flexibility and scalability than any other previous fault tolerant mechanism in k ary n cubes the proposal inherits from immunet several advantages over other previous fault tolerant routing algorithms allowing any temporal and spatial fault combination permitting automatic and application transparent reconfiguration after any fault and requiring negligible overhead in the absence of faults immucube introduces new important features such as providing graceful performance degradation even in very large interconnection networks tolerating transparent resource utilization after transitory faults or partial repair of faulty resources being able to deal with intermittent faults and being able to dynamically recover the original network performance when all the failed components have been repaired
this paper compares and analyzes heuristics to solve the fine grained data replication problem over the internet in fine grained replication frequently accessed data objects as opposed to the entire website contents are replicated onto set of selected sites so as to minimize the average access time perceived by the end users the paper presents unified cost model that captures the minimization of the total object transfer cost in the system which in turn leads to effective utilization of storage space replica consistency fault tolerance and load balancing the set of heuristics include six star based algorithms two bin packing algorithms one greedy and one genetic algorithm the heuristics are extensively simulated and compared using an experimental test bed that closely mimics the internet infrastructure and user access patterns gt itm and inet topology generators are used to obtain well defined network topologies based on flat link distance power law and hierarchical transit stub models the user access patterns are derived from real access logs collected at the websites of soccer world cup and nasa kennedy space center the heuristics are evaluated by analyzing the communication cost incurred due to object transfers under the variance of server capacity object size read access write access number of objects and sites the main benefit of this study is to facilitate readers with the choice of algorithms that guarantee fast or optimal or both types of solutions this allows the selection of particular algorithm to be used in given scenario
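a bare bones sketch of the greedy flavour among the compared heuristics repeatedly place the replica with the largest remaining benefit while storage lasts the single benefit matrix is an assumption and ignores the update, consistency and nearest replica effects captured by the paper's unified cost model

    def greedy_placement(read_cost, capacity, obj_size):
        # read_cost[s][o]: access cost saved by hosting object o at site s
        # capacity[s]: remaining storage at site s; obj_size[o]: size of object o
        capacity = list(capacity)          # work on a copy
        placement = set()
        while True:
            best = None
            for s in range(len(read_cost)):
                for o in range(len(read_cost[0])):
                    if (s, o) in placement or obj_size[o] > capacity[s]:
                        continue
                    gain = read_cost[s][o]
                    if best is None or gain > best[0]:
                        best = (gain, s, o)
            if best is None or best[0] <= 0:
                return placement           # no further beneficial replica fits
            _, s, o = best
            placement.add((s, o))
            capacity[s] -= obj_size[o]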
this paper describes how the cmcc compiler reuses code both internally reuse between different modules and externally reuse between versions for different target machines the key to reuse are the application frameworks developed for global data flow analysis code generation instruction scheduling and register allocationthe code produced by cmcc is as good as the code produced by the native compilers for the mips and sparc although significantly less resources have been spent on cmcc overall about man years by persons cmcc is implemented in which allowed for compact expression of the frameworks as class hierarchies the results support the claim that suitable frameworks facilitate reuse and thereby significantly improve developer effectiveness
in the context of object databases we study the application of an update method to collection of receivers rather than to single one the obvious strategy of applying the update to the receivers one after the other in some arbitrary order brings up the problem of order independence on very general level we investigate how update behavior can be analyzed in terms of certain schema annotations called colorings we are able to characterize those colorings that always describe order independent updates we also consider more specific model of update methods implemented in the relational algebra order independence of such algebraic methods is undecidable in general but decidable if the expressions used are positive finally we consider an alternative parallel strategy for set oriented applications of algebraic update methods and compare and relate it to the sequential strategy
interaction and navigation in large geometric spaces typically require sequence of pan and zoom actions this strategy is often ineffective and cumbersome especially when trying to study several distant objects we propose new distortion technique that folds the intervening space to guarantee visibility of multiple focus regions the folds themselves show contextual information and support unfolding and paging interactions compared to previous work our method provides more context and distance awareness we conducted study comparing the space folding technique to existing approaches and found that participants performed significantly better with the new technique
frequent requests from users to search engines on the world wide web are to search for information about people using personal names current search engines only return sets of documents containing the name queried but as several people usually share personal name the resulting sets often contain documents relevant to several people it is necessary to disambiguate people in these result sets in order to help users find the person of interest more readily in the task of name disambiguation effective measurement of similarities in the documents is crucial step towards the final disambiguation we propose new method that uses web directories as knowledge base to find common contexts in documents and uses the common contexts measure to determine document similarities experiments conducted on documents mentioning real people on the web together with several famous web directory structures suggest that there are significant advantages in using web directories to disambiguate people compared with other conventional methods
interesting characteristics of large scale events are their spatial distribution their extended duration over days and the fact that they are set apart from daily life the increasing pervasiveness of computational media encourages us to investigate such unexplored domains especially when thinking of applications for spectator groups here we report of field study on two groups of rally spectators who were equipped with multimedia phones and we present novel mobile group media application called mgroup that supports groups in creating and sharing experiences particularly we look at the possibilities of and boundary conditions for computer applications posed by our findings on group identity and formation group awareness and coordination the meaningful construction of an event experience and its grounding in the event context the shared context and discourses protagonism and active spectatorship moreover we aim at providing new perspective on spectatorship at large scale events which can make research and development more aware of the socio cultural dimension
in this paper we present the ecell temporary collaborative niche for group work in school environments the ecell consists of private inner display and public outer display located in unused public spaces eg in corridors and libraries throughout the school premises the inner display is large touch sensitive screen connected to standard computer the outer display consists of projection on large semitransparent surface combined the two displays comprise an it supported collaborative environment especially suited for project based education through three iterations of design we describe the technological the spatial and the educational aspects of the ecell and outline its potential for supporting collaborative activities in temporary niche in which the architecture of the school itself reflects ongoing work thus the ecell stimulates knowledge sharing awareness and social interaction among pupils and teachers who are part of the school community
current day database applications with large numbers of users require fine grained access control mechanisms at the level of individual tuples not just entire relations views to control which parts of the data can be accessed by each user fine grained access control is often enforced in the application code which has numerous drawbacks these can be avoided by specifying enforcing access control at the database level we present novel fine grained access control model based on authorization views that allows authorization transparent querying that is user queries can be phrased in terms of the database relations and are valid if they can be answered using only the information contained in these authorization views we extend earlier work on authorization transparent querying by introducing new notion of validity conditional validity we give powerful set of inference rules to check for query validity we demonstrate the practicality of our techniques by describing how an existing query optimizer can be extended to perform access control checks by incorporating these inference rules
software prefetching is promising technique to hide cache miss latencies but it remains challenging to effectively prefetch pointer based data structures because obtaining the memory address to be prefetched requires pointer dereferences the recently proposed stride prefetching overcomes this problem but it only exploits inter iteration stride patterns and relies on an off line profiling method we propose new algorithm for stride prefetching which is intended for use in dynamic compiler we exploit both inter and intra iteration stride patterns which we discover using an ultra lightweight profiling technique called object inspection this is kind of partial interpretation that only dynamic compiler can perform during the compilation of method the dynamic compiler gathers the profile information by partially interpreting the method using the actual values of parameters and causing no side effects we evaluated an implementation of our prefetching algorithm in production level java just in time compiler the results show that the algorithm achieved up to an and speedup in industry standard benchmarks on the pentium and the athlon mp respectively while it increased the compilation time by less than
number of widely used programming languages use lexically included files as way to share and encapsulate declarations definitions code and data as the code evolves files included in compilation unit are often no longer required yet locating and removing them is haphazard operation which is therefore neglected the difficulty of reasoning about included files stems primarily from the fact that the definition and use of macros complicates the notions of scope and of identifier boundaries by defining four successively refined identifier equivalence classes we can accurately derive dependencies between identifiers mapping of those dependencies on relationship graph between included files can then be used to determine included files that are not required in given compilation unit and can be safely removed we validate our approach through number of experiments on numerous large production systems
xml transformations are very sensitive to types xml types describe the tags and attributes of xml elements as well as the number kind and order of their sub elements therefore operations even simple ones that modify these features may affect the types of documents operations on xml documents are performed by iterators that to be useful need to be typed by kind of polymorphism that goes beyond what currently exists for this reason these iterators are not programmed but rather hard coded in the languages however this approach soon reaches its limits as the hard coded iterators cannot cover fairly standard usage scenarios as solution to this problem we propose generic language to define iterators for xml data this language can either be used as compilation target eg for xpath or it can be grafted on any statically typed host programming language as long as this has product types to endow it with xml processing capabilities we show that our language mostly offers the required degree of polymorphism study its formal properties and show its expressiveness and practical impact by providing several usage examples and encodings
movie trailers or previews are an important method of advertising movies they are extensively shown before movies in cinemas as well as on television and increasingly over the internet making trailer is creative process in which number of shots from movie are selected in order to entice viewer in to paying to see the full movie thus the creation of these trailers is an integral part in the promotion of movie action movies in particular rely on trailers as form of advertising as it is possible to show short exciting portions of an action movie which are likely to appeal to the target audience this paper presents an approach which automatically selects shots from action movies in order to assist in the creation of trailers set of audiovisual features are extracted that aim to model the characteristics of shots typically present in trailers and support vector machine is utilised in order to select the relevant shots the approach taken is not particularly novel but the results show that the process may be used in order to ease the trailer creation process or to facilitate the creation of variable length or personalised trailers
we show that relational algebra calculations for incomplete databases probabilistic databases bag semantics and why provenance are particular cases of the same general algorithms involving semirings this further suggests comprehensive provenance representation that uses semirings of polynomials we extend these considerations to datalog and semirings of formal power series we give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases finally we show that for some semirings containment of conjunctive queries is the same as for standard set semantics
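as a concrete reading of the semiring view above, here is a minimal python sketch, not the paper's implementation and with names of our own choosing, that parameterizes relational union and a simplified join by a commutative semiring so that bag semantics, probabilities and provenance polynomials all fall out of the same code by swapping the semiring

```python
# Minimal sketch (not the paper's implementation): relational union and join
# over tuples annotated with elements of a commutative semiring (0, 1, +, *).
# Swapping the semiring specializes the same code to bag semantics
# (natural numbers), probabilities, or provenance polynomials.

class Semiring:
    def __init__(self, zero, one, add, mul):
        self.zero, self.one, self.add, self.mul = zero, one, add, mul

BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

def union(r, s, K):
    """r, s: dicts mapping tuples to annotations; annotations of duplicates are added."""
    out = dict(r)
    for t, a in s.items():
        out[t] = K.add(out.get(t, K.zero), a)
    return out

def join(r, s, K):
    """Simplified join (a cross product; matching on shared attributes omitted):
    annotations of combined tuples are multiplied, the core of the semiring view."""
    return {t1 + t2: K.mul(a1, a2) for t1, a1 in r.items() for t2, a2 in s.items()}

# Bag semantics: multiplicities multiply under join, add under union.
r = {("a",): 2}
s = {("b",): 3}
print(join(r, s, BAG))   # {('a', 'b'): 6}
```

replacing BAG with a semiring over probabilities, or over polynomials in tuple identifiers, gives the probabilistic and provenance interpretations respectively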
equality and subtyping of recursive types were studied in the by amadio and cardelli kozen palsberg and schwartzbach brandt and henglein and others potential applications include automatic generation of bridge code for multilanguage systems and type based retrieval of software modules from libraries in this paper we present an efficient decision procedure for notion of type equality that includes unfolding of recursive types and associativity and commutativity of product types advocated by auerbach barton and raghavachari these properties enable flexible matching of types for two types of size at most our algorithm takes iterations each of which takes time for total of time
current web search engines return result pages containing mostly text summary even though the matched web pages may contain informative pictures text excerpt ie snippet is generated by selecting keywords around the matched query terms for each returned page to provide context for user’s relevance judgment however in many scenarios we found that the pictures in web pages if selected properly could be added into search result pages and provide richer contextual description because picture is worth thousand words such new summary is named as image excerpts by well designed user study we demonstrate image excerpts can help users make much quicker relevance judgment of search results for wide range of query types to implement this idea we propose practicable approach to automatically generate image excerpts in the result pages by considering the dominance of each picture in each web page and the relevance of the picture to the query we also outline an efficient way to incorporate image excerpts in web search engines web search engines can adopt our approach by slightly modifying their index and inserting few low cost operations in their workflow our experiments on large web dataset indicate the performance of the proposed approach is very promising
new experimental methods allow researchers within molecular and systems biology to rapidly generate larger and larger amounts of data this data is often made publicly available on the internet and although this data is extremely useful we are not using its full capacity one important reason is that we still lack good ways to connect or integrate information from different resources one kind of resource is the over data sources freely available on the web as most data sources are developed and maintained independently they are highly heterogeneous information is also updated frequently other kinds of resources that are not so well known or commonly used yet are the ontologies and the standards ontologies aim to define common terminology for domain of interest standards provide way to exchange data between data sources and tools even if the internal representations of the data in the resources and tools are different in this chapter we argue that ontological knowledge and standards should be used for integration of data we describe properties of the different types of data sources ontological knowledge and standards that are available on the web and discuss how this knowledge can be used to support integrated access to multiple biological data sources further we present an integration approach that combines the identified ontological knowledge and standards with traditional information integration techniques current integration approaches only cover parts of the suggested approach we also discuss the components in the model on which much recent work has been done in more detail ontology based data source integration ontology alignment and integration using standards although many of our discussions in this chapter are general we exemplify mainly using work done within the rewerse working group on adding semantics to the bioinformatics web
within the university the introduction of computers is creating new criterion of differentiation between those who as matter of course become integrated in the technocratic trend deriving from the daily use of these machines and those who become isolated by not using them this difference increases when computer science and communications merge to introduce virtual educational areas where the conjunction of teacher and pupil in the space time dimension is no longer an essential requirement and where the written text is replaced or rather complemented by the digital text in this article historical defence is made of the presence of this new standard in the creation of digital educational resources such as the hyperdocument as well as the barriers and technological problems deriving from its use furthermore hyco an authoring tool is introduced which facilitates the composition of hypertexts which are stored as semantic learning objects the aim being that through simple and extremely intuitive interface and interaction model any teacher with minimum knowledge of computer science has the possibility of transforming his or her experience and knowledge into useful and quality hypermedia educational resources
current models of direction relations are not designed to describe direction information inside the minimum bounding rectangle mbr of reference region thus the direction relations between overlapping and contained regions cannot be effectively described and derived to resolve this problem new model of direction relations namely interior boundary direction relations is proposed in this study to describe direction concepts relative to the interior or boundary of region such as east part of region west border of region line goes through east part of region and etc by combining the interior and exterior direction relations three types of compositions of direction relations are investigated composing two interior direction relations which can be used to derive the interior or exterior direction relations between two regions with the same parent region composing an interior direction relation with an exterior direction relation and composing an interior with an exterior direction relation the results indicate that the new interior boundary direction relations and its compositions with exterior direction relations are powerful in describing and deriving direction relations between overlapped and contained regions
the periodic broadcasting of frequently requested data can reduce the workload of uplink channels and improve data access for users in wireless network since mobile devices have limited energy capacities associated with their reliance on battery power it is important to minimize the time and energy spent on accessing broadcast data the indexing and scheduling of broadcast data play key role in this problem in this paper we formulate the index and data allocation problem and propose solution that can adapt to any number of broadcast channels we first restrict the considered problem to scenario with no index data replication and introduce an optimal solution and heuristic solution to the single channel and multichannel cases respectively then we discuss how to replicate indexes on the allocation to further improve the performance the results from some experiments demonstrate the superiority of our proposed approach
how can we generate realistic networks in addition how can we do so with mathematically tractable model that allows for rigorous analysis of network properties real networks exhibit long list of surprising properties heavy tails for the in and out degree distribution heavy tails for the eigenvalues and eigenvectors small diameters and densification and shrinking diameters over time current network models and generators either fail to match several of the above properties are complicated to analyze mathematically or both here we propose generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties our main idea here is to use non standard matrix operation the kronecker product to generate graphs which we refer to as kronecker graphs first we show that kronecker graphs naturally obey common network properties in fact we rigorously prove that they do so we also provide empirical evidence showing that kronecker graphs can effectively model the structure of real networks we then present kronfit fast and scalable algorithm for fitting the kronecker graph generation model to large real networks naive approach to fitting would take super exponential time in contrast kronfit takes linear time by exploiting the structure of kronecker matrix multiplication and by using statistical simulation techniques experiments on wide range of large real and synthetic networks show that kronfit finds accurate parameters that very well mimic the properties of target networks in fact using just four parameters we can accurately model several aspects of global network structure once fitted the model parameters can be used to gain insights about the network structure and the resulting synthetic graphs can be used for null models anonymization extrapolations and graph summarization
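to make the generative side concrete, the following hedged sketch kronecker powers a small initiator matrix of edge probabilities and samples an adjacency matrix from it; the initiator values are illustrative placeholders rather than kronfit fitted parameters, and the naive per edge sampling shown here is quadratic whereas the paper exploits the kronecker structure for efficiency

```python
# Illustrative sketch: stochastic Kronecker graph generation.
# A small initiator matrix of edge probabilities is Kronecker-powered k times;
# each entry of the resulting matrix is the probability of the corresponding edge.
# The 2x2 initiator values below are placeholders, not fitted parameters.
import numpy as np

def kronecker_graph(initiator, k, rng=None):
    rng = rng or np.random.default_rng(0)
    p = initiator.copy()
    for _ in range(k - 1):
        p = np.kron(p, initiator)                 # probability matrix of size n0^k
    return (rng.random(p.shape) < p).astype(int)  # sample an adjacency matrix

init = np.array([[0.9, 0.5],
                 [0.5, 0.1]])
adj = kronecker_graph(init, k=8)                  # 256-node graph
print(adj.shape, adj.sum(), "edges")
```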
we present xpi core calculus for xml messaging xpi features asynchronous communication pattern matching name and code mobility integration of static and dynamic typing in xpi type system disciplines xml message handling at the level of channels patterns and processes run time safety theorem ensures that in well typed systems no service will ever receive documents it cannot understand and that the offered services will be consistent with the declared channel capacities an inference system is introduced which is proved to be in full agreement with type checking notion of barbed equivalence is defined that takes into account information about service interfaces flexibility and expressiveness of this calculus are illustrated by number of examples some concerning description and discovery of web services
in this paper we propose lightweight algorithm for constructing multi resolution data representations for sensor networks we compute at each sensor node log aggregates about exponentially enlarging neighborhoods centered at the ith aggregate is the aggregated data among nodes approximately within hops of we present scheme named the hierarchical spatial gossip algorithm to extract and construct these aggregates for all sensors simultaneously with total communication cost of polylog the hierarchical gossip algorithm adopts atomic communication steps with each node choosing to exchange information with node distance away with probability the attractiveness of the algorithm attributes to its simplicity low communication cost distributed nature and robustness to node failures and link failures besides the natural applications of multi resolution data summaries in data validation and information mining we also demonstrate the application of the pre computed spatial multi resolution data summaries in answering range queries efficiently
this paper addresses the challenging problem of verifying the safety of pointer dereferences in real java programs we provide an automatic approach to this problem based on sound interprocedural analysis we present staged expanding scope algorithm for interprocedural abstract interpretation which invokes sound analysis with partial programs of increasing scope this algorithm achieves many benefits typical of whole program interprocedural analysis but scales to large programs by limiting analysis to small program fragments to address cases where the static analysis of program fragments fails to prove safety the analysis also suggests possible annotations which if user accepts ensure the desired properties experimental evaluation on number of java programs shows that we are able to verify of all dereferences soundly and automatically and further reduce the number of remaining dereferences using non nullness annotations
program synthesis which is the task of discovering programs that realize user intent can be useful in several scenarios enabling people with no programming background to develop utility programs helping regular programmers automatically discover tricky mundane details program understanding discovery of new algorithms and even teaching this paper describes three key dimensions in program synthesis expression of user intent space of programs over which to search and the search technique these concepts are illustrated by brief description of various program synthesis projects that target synthesis of wide variety of programs such as standard undergraduate textbook algorithms eg sorting dynamic programming program inverses eg decoders deserializers bitvector manipulation routines deobfuscated programs graph algorithms text manipulating routines mutual exclusion algorithms etc
the tension between privacy and awareness has been persistent difficulty in distributed environments that support opportunistic and informal interaction for example many awareness systems that display always on video links or pc screen contents have been perceived as too invasive even though functional real world analogues like open plan offices may provide even less privacy than their online counterparts in this paper we explore the notion of privacy in open plan real world environments in order to learn more about how it might be supported in distributed systems from interviews and observations in four open plan offices we found that attention plays an important role in the management of both confidentiality and solitude the public nature of paying attention allows people to build understandings of what objects in space are legitimate targets for attention and allows people to advertise their interest in interaction our results add to what is known about how privacy works in real world spaces and suggest valuable design ideas that can help improve support for natural privacy control and interaction in distributed awareness systems
data archiving systems rely on replication to preserve information this paper discusses how network of autonomous archiving sites can trade data to achieve the most reliable replication series of binary trades among sites produces peer to peer archiving network two trading algorithms are examined one based on trading collections even if they are different sizes and another based on trading equal sized blocks of space which can then store collections the concept of deeds is introduced deeds track the blocks of space owned by one site at another policies for tuning these algorithms to provide the highest reliability for example by changing the order in which sites are contacted and offered trades are discussed finally simulation results are presented that reveal which policies are best the experiments indicate that digital archive can achieve the best reliability by trading blocks of space deeds and that following certain policies will allow that site to maximize its reliability
semantic similarity relates to computing the similarity between concepts which are not lexicographically similar we investigate approaches to computing semantic similarity by mapping terms concepts to an ontology and by examining their relationships in that ontology some of the most popular semantic similarity methods are implemented and evaluated using wordnet as the underlying reference ontology building upon the idea of semantic similarity novel information retrieval method is also proposed this method is capable of detecting similarities between documents containing semantically similar but not necessarily lexicographically similar terms the proposed method has been evaluated in retrieval of images and documents on the web the experimental results demonstrated very promising performance improvements over state of the art information retrieval methods
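as an illustration of ontology based similarity, not the paper's exact formulation, the sketch below uses nltk's wordnet interface to score two terms by the best wu palmer similarity over their synset pairs; it assumes the wordnet corpus has been downloaded

```python
# Illustrative sketch using NLTK's WordNet interface (requires the wordnet
# corpus: nltk.download('wordnet')). It computes one popular ontology-based
# measure, Wu-Palmer similarity, over the best-matching synset pair; this is
# an example of the family of measures discussed, not the paper's exact method.
from nltk.corpus import wordnet as wn

def semantic_similarity(term1, term2):
    best = 0.0
    for s1 in wn.synsets(term1):
        for s2 in wn.synsets(term2):
            score = s1.wup_similarity(s2)
            if score is not None and score > best:
                best = score
    return best

# related but not lexicographically similar terms still receive a high score
print(semantic_similarity("doctor", "nurse"))
print(semantic_similarity("car", "banana"))
```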
as hot research topic many search algorithms have been presented and studied for unstructured peer to peer pp systems during the past few years unfortunately current approaches either cannot yield good lookup performance or incur high search cost and system maintenance overhead the poor search efficiency of these approaches may seriously limit the scalability of current unstructured pp systems in this paper we propose to exploit two dimensional locality to improve pp system search efficiency we present locality aware pp system architecture called foreseer which explicitly exploits geographical locality and temporal locality by constructing neighbor overlay and friend overlay respectively each peer in foreseer maintains small number of neighbors and friends along with their content filters used as distributed indices by combining the advantages of distributed indices and the utilization of two dimensional locality our scheme significantly boosts pp search efficiency while introducing only modest overhead in addition several alternative forwarding policies of foreseer search algorithm are studied in depth on how to fully exploit the two dimensional locality
we study the predecessor existence problem for finite discrete dynamical systems given finite discrete dynamical system and configuration the predecessor existence or pre problem is to determine whether there is configuration such that has transition from to in addition to the decision version we also study the following variants the predecessor existence or pre problem counting the number of predecessors the unique predecessor existence or upre problem deciding whether there is unique predecessor and the ambiguous predecessor existence or apre problem given configuration and predecessor of deciding whether there is different predecessor of general techniques are presented for simultaneously characterizing the computational complexity of the pre problem and its three variants our hardness results are based on the concept of simultaneous reductions single transformations that can be used to simultaneously prove the hardness of the different variants of the pre problem for their respective complexity classes our easiness results are based on dynamic programming and they extend the previous results on pre problem for one dimensional cellular automata the hardness results together with the easiness results provide tight separation between easy and hard instances further the results imply similar bounds for other classes of finite discrete dynamical systems including discrete hopfield and recurrent neural networks concurrent state machines systolic networks and one and two dimensional cellular automata our results extend the earlier results of green sutner and orponen on the complexity of the predecessor existence problem and its variants
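to make the problem variants concrete, here is a brute force python illustration for one simple class of finite discrete dynamical systems, an elementary cellular automaton with periodic boundary; the paper's dynamic programming algorithms avoid this exponential enumeration, the sketch only spells out what the existence, counting and unique predecessor questions ask

```python
# Brute-force illustration of the predecessor existence (PRE) problem for an
# elementary cellular automaton with periodic boundary. The dynamic programming
# algorithms referred to above avoid this exponential enumeration; this sketch
# only makes the decision, counting and uniqueness variants concrete.
from itertools import product

def step(config, rule):
    """One synchronous update of an elementary CA given a rule number 0..255."""
    n = len(config)
    table = [(rule >> i) & 1 for i in range(8)]
    return tuple(table[(config[(i - 1) % n] << 2) |
                       (config[i] << 1) |
                       config[(i + 1) % n]] for i in range(n))

def predecessors(target, rule):
    n = len(target)
    return [c for c in product((0, 1), repeat=n) if step(c, rule) == target]

target = (1, 0, 1, 1, 0, 0)
preds = predecessors(target, rule=110)
print("PRE:", bool(preds), "| count:", len(preds), "| unique:", len(preds) == 1)
```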
this paper proposes new coherence method called multicast snooping that dynamically adapts between broadcast snooping and directory protocol multicast snooping is unique because processors predict which caches should snoop each coherence transaction by specifying multicast mask transactions are delivered with an ordered multicast network such as an isotach network which eliminates the need for acknowledgment messages processors handle transactions as they would with snooping protocol while simplified directory operates in parallel to check masks and gracefully handle incorrect ones eg previous owner missing preliminary performance numbers with mostly splash benchmarks running on processors show that we can limit multicasts to an average of destinations and we can deliver multicasts per network cycle broadcast snooping’s per cycle while these results do not include timing they do provide encouragement that multicast snooping can obtain data directly like broadcast snooping but apply to larger systems like directories
previous research has investigated how people either navigate the web as whole or find information on websites of which they have little previous knowledge however it is now common for people to make frequent use of one site eg their employer’s intranet this paper reports how participants recalled and navigated familiar website they had used for months sketch maps showed that participants memory for the site’s content and structure was very limited in extent but generally accurate navigation data showed that participants had much more difficulty finding the region of the site that contained piece of information than then finding the information itself these data highlight the need for directly accessed pages to be given greater prominence in browser history mechanisms and designers to make information regions memorable finally two navigational path metrics stratum and percentage of revisit actions that correlated with participants performance were identified
formal concept analysis fca was originally proposed by wille which is an important theory for data analysis and knowledge discovery afs axiomatic fuzzy set algebra was proposed by liu liu the fuzzy theory based on afs algebras and afs structure journal of mathematical analysis and applications liu the topology on afs algebra and afs structure journal of mathematical analysis and applications which is semantic methodology relating to the fuzzy theory combining above two theories we propose afs formal concept which can be viewed as the generalization and development of monotone concept proposed by deogun and saquer moreover we show that the set of all afs formal concepts forms complete lattice afs formal concept can be applied to represent the logic operations of queries in information retrieval furthermore we give an approach to find the afs formal concepts whose intents extents approximate any element of afs algebra by virtue of rough set theory
virtualization is increasingly being used in regular desktop pcs data centers and server farms one of the advantages of introducing this additional architectural layer is to increase overall system security in this paper we propose an architecture kvmsec that is an extension to the linux kernel virtual machine aimed at increasing the security of guest virtual machines kvmsec can protect guest virtual machines against attacks such as viruses and kernel rootkits kvmsec enjoys the following features it is transparent to guest machines it is hard to access even from compromised virtual machine it can collect data analyze them and act consequently on guest machines it can provide secure communication between each of the guests and the host and it can be deployed on linux hosts and at present supports linux guest machines these features are leveraged to implement real time monitoring and security management system further differences and advantages over previous solutions are highlighted as well as concrete roadmap for further development
most formal verification tools on the market convert high level register transfer level rtl design into bit level model algorithms that operate at the bit level are unable to exploit the structure provided by the higher abstraction levels and thus are less scalable this tutorial surveys recent advances in formal verification using high level models we present word level verification with predicate abstraction and satisfiability modulo theories smt solvers we then describe techniques for term level modeling and ways to combine word level and term level approaches for scalable verification
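as a small hedged illustration of word level reasoning with an smt solver, here using z3's python api and a toy bit vector identity rather than anything from the tutorial, the solver is asked for a counterexample over 32 bit words directly instead of over a bit blasted model

```python
# Word-level check with an SMT solver (Z3's Python API). The identity being
# verified is a toy example chosen for illustration; asserting its negation and
# obtaining unsat means no 32-bit counterexample exists.
from z3 import BitVec, Solver, unsat

x, y = BitVec('x', 32), BitVec('y', 32)
s = Solver()
# Claim: x ^ y == (x | y) - (x & y) for all 32-bit x, y.
s.add(x ^ y != (x | y) - (x & y))

if s.check() == unsat:
    print("property holds for all 32-bit words")
else:
    print("counterexample:", s.model())
```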
wc recommendations xml encryption and xml digital signature can be used to protect the confidentiality of and provide assurances about the integrity of xml documents transmitted over an insecure medium the focus of this paper is how to control access to xml documents once they have been received this is particularly important for services where updates are sent to subscribers we describe how certain access control policies for restricting access to xml documents can be enforced by encrypting specified regions of the document these regions are specified using xpath filters and the policies are based on the hierarchical structure of xml documents we also describe how techniques for assigning keys to security lattice can be adapted to minimize the number of keys that are distributed to users and compare our approach with two other access control frameworks finally we consider how role based access control can be used to enforce more complex access control policies
real world objects are usually composed of number of different materials that often show subtle changes even within single material photorealistic rendering of such objects requires accurate measurements of the reflection properties of each material as well as the spatially varying effects we present an image based measuring method that robustly detects the different materials of real objects and fits an average bidirectional reflectance distribution function brdf to each of them in order to model local changes as well we project the measured data for each surface point into basis formed by the recovered brdfs leading to truly spatially varying brdf representation real world objects often also have fine geometric detail that is not represented in an acquired mesh to increase the detail we derive normal maps even for non lambertian surfaces using our measured brdfs high quality model of real object can be generated with relatively little input data the generated model allows for rendering under arbitrary viewing and lighting conditions and realistically reproduces the appearance of the original object
the distributed routing protocols in use today promise to operate correctly only if all nodes implement the protocol faithfully small insignificant set of nodes have in the past brought an entire network to standstill by reporting incorrect route information the damage caused by these erroneous reports in some instances could have been contained since incorrect route reports sometimes reveal themselves as inconsistencies in the state information of correctly functioning nodes by checking for such inconsistencies and taking preventive action such as disregarding selected route reports correctly functioning node could have limited the damage caused by the malfunctioning nodes our theoretical study attempts to understand when correctly functioning node can by analysing its routing state detect that some node is misimplementing route selection we present methodology called strong detection that helps answer the question we then apply strong detection to three classes of routing protocols distance vector path vector and link state for each class we derive low complexity self monitoring algorithms that take as input the routing state and output whether any detectable anomalies exist we then use these algorithms to compare and contrast the self monitoring power of these different classes of protocols in relation to the complexity of their routing state
in the arena of automated negotiations we focus on the principal negotiation protocol in bilateral settings ie the alternating offers protocol in the scientific community it is common the idea that bargaining in the alternating offers protocol will play crucial role in the automation of electronic transactions notwithstanding its prominence literature does not present satisfactory solution to the alternating offers protocol in real world settings eg in presence of uncertainty in this paper we game theoretically analyze this negotiation problem with one sided uncertain deadlines and we provide an efficient solving algorithm specifically we analyze the situation where the values of the parameters of the buyer are uncertain to the seller whereas the parameters of the seller are common knowledge the analysis of the reverse situation is analogous in this particular situation the results present in literature are not satisfactory since they do not assure the existence of an equilibrium for every value of the parameters from our game theoretical analysis we find two choice rules that apply an action and probability distribution over the actions respectively to every time point and we find the conditions on the parameters such that each choice rule can be singularly employed to produce an equilibrium these conditions are mutually exclusive we show that it is always possible to produce an equilibrium where the actions at any single time point are those prescribed either by the first choice rule or by the second one we exploit this result for developing solving algorithm the proposed algorithm works backward by computing the equilibrium from the last possible deadline of the bargaining to the initial time point and by applying at each time point the actions prescribed by the choice rule whose conditions are satisfied the computational complexity of the proposed algorithm is asymptotically independent of the number of types of the player whose deadline is uncertain with linear utility functions it is where is the number of the issues and is the length of the bargaining
we present wipdash visualization for software development teams designed to increase group awareness of work items and code base activity wipdash was iteratively designed by working with two development teams using interviews observations and focus groups as well as sketches of the prototype based on those observations and feedback we prototyped wipdash and deployed it with two software teams for one week field study we summarize the lessons learned and include suggestions for future version
skyline queries which return the objects that are better than or equal in all dimensions and better in at least one dimension are useful in many decision making and real time monitor applications with the number of dimensions increasing and continuous large volume data arriving mining the thin skylines over data stream under control of losing quality is more meaningful problem in this paper firstly we propose novel concept called thin skyline which uses skyline object that represents its nearby skyline neighbors within distance acceptable difference then two algorithms are developed which prunes the skyline objects within the acceptable difference and adopts correlation coefficient to adjust adaptively thin skyline query quality furthermore our experimental performance study shows that the proposed methods are both efficient and effective
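a simplified non streaming sketch of the thin skyline idea, with helper names of our own: compute the skyline, here minimizing every dimension, and keep only representatives such that every pruned skyline object lies within an acceptable difference eps of some kept one; the algorithms described above additionally work incrementally over the stream and adjust the quality adaptively

```python
# Simplified, non-streaming sketch of the thin skyline idea: compute the
# skyline (minimize in every dimension), then keep only representative skyline
# objects such that each pruned skyline object is within distance eps of a
# kept representative. The streaming and adaptive aspects are omitted.
import math

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def thin_skyline(points, eps):
    reps = []
    for p in sorted(skyline(points)):
        if all(math.dist(p, r) > eps for r in reps):
            reps.append(p)          # p is not yet represented, keep it
    return reps

data = [(1, 9), (2, 8), (2.1, 7.9), (3, 5), (9, 1), (5, 5), (8, 8)]
print(skyline(data))                # full skyline
print(thin_skyline(data, eps=0.5))  # thinned: near-duplicates represented once
```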
in many web applications such as blog classification and newsgroup classification labeled data are in short supply it often happens that obtaining labeled data in new domain is expensive and time consuming while there may be plenty of labeled data in related but different domain traditional text classification approaches are not able to cope well with learning across different domains in this paper we propose novel cross domain text classification algorithm which extends the traditional probabilistic latent semantic analysis plsa algorithm to integrate labeled and unlabeled data which come from different but related domains into unified probabilistic model we call this new model topic bridged plsa or tplsa by exploiting the common topics between two domains we transfer knowledge across different domains through topic bridge to help the text classification in the target domain unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains resulting in superior performance when compared to other state of the art text classification approaches experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross domain text classification significantly
the limitations of bgp routing in the internet are often blamed for poor end to end performance and prolonged connectivity interruptions recent work advocates using overlays to effectively bypass bgp’s path selection in order to improve performance and fault tolerance in this paper we explore the possibility that intelligent control of bgp routes coupled with isp multihoming can provide competitive end to end performance and reliability using extensive measurements of paths between nodes in large content distribution network we compare the relative benefits of overlay routing and multihoming route control in terms of round trip latency tcp connection throughput and path availability we observe that the performance achieved by route control together with multihoming to three isps multihoming is within of overlay routing employed in conjunction multihoming in terms of both end to end rtt and throughput we also show that while multihoming cannot offer the nearly perfect resilience of overlays it can eliminate almost all failures experienced by singly homed end network our results demonstrate that by leveraging the capability of multihoming route control it is not necessary to circumvent bgp routing to extract good wide area performance and availability from the existing routing system
periodic broadcast and scheduled multicast have been shown to be very effective in reducing the demand on server bandwidth while periodic broadcast is better for popular videos scheduled multicast is more suitable for less popular ones work has also been done to show that hybrid of these techniques offer the best performance existing hybrid schemes however assume that the characteristic of the workload does not change with time this assumption is not true for many applications such as movie on demand digital video libraries or electronic commerce in this paper we show that existing scheduled multicast techniques are not suited for hybrid designs to address this issue we propose new approach and use it to design an adaptive hybrid strategy our technique adjusts itself to cope with changing workload we provide simulation results to demonstrate that the proposed technique is significantly better than the best static approach in terms of service latency throughput defection rate and unfairness
it is well known that pragmatic knowledge is useful and necessary in many difficult language processing tasks but because this knowledge is difficult to acquire and process automatically it is rarely used we present an open information extraction technique for automatically extracting particular kind of pragmatic knowledge from text and we show how to integrate the knowledge into markov logic network model for quantifier scope disambiguation our model improves quantifier scope judgments in experiments
dwarf is highly compressed structure for computing storing and querying data cubes dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their store prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas putting the two together fuses the exponential sizes of high dimensional full cubes into dramatically condensed data structure the elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided this effect is multiplied in the presence of correlation amongst attributes in the cube petabyte dimensional cube was shrunk this way to gb dwarf cube in less than minutes storage reduction ratio still dwarf provides precision on cube queries and is self sufficient structure which requires no access to the fact table what makes dwarf practical is the automatic discovery in single pass over the fact table of the prefix and suffix redundancies without user involvement or knowledge of the value distributions this paper describes the dwarf structure and the dwarf cube construction algorithm further optimizations are then introduced for improving clustering and query performance experiments with the current implementation include comparisons on detailed measurements with real and synthetic datasets against previously published techniques the comparisons show that dwarfs by far out perform these techniques on all counts storage space creation time query response time and updates of cubes
we consider while loop on some space and we are interested in deriving the function that this loop defines between its initial states and its final states when it terminates such capability is useful in wide range of applications including reverse engineering software maintenance program comprehension and program verification in the absence of general theoretical solution to the problem of deriving the function of loop we explore engineering solutions in this paper we use relational refinement calculus to approach this complex problem in systematic manner our approach has many drawbacks some surmountable and some not being inherent to the approach nevertheless it offers way to automatically derive the function of loops or an approximation thereof under some conditions
type casting allows program to access an object as if it had type different from its declared type this complicates the design of pointer analysis algorithm that treats structure fields as separate objects therefore some previous pointer analysis algorithms collapse structure into single variable the disadvantage of this approach is that it can lead to very imprecise points to information other algorithms treat each field as separate object based on its offset and size while this approach leads to more precise results the results are not portable because the memory layout of structures is implementation dependent this paper first describes the complications introduced by type casting then presents tunable pointer analysis framework for handling structures in the presence of casting different instances of this framework produce algorithms with different levels of precision portability and efficiency experimental results from running our implementations of four instances of this framework show that it is important to distinguish fields of structures in pointer analysis but ii making conservative approximations when casting is involved usually does not cost much in terms of time space or the precision of the results
in this work we propose scientific data exploration methodology and software environment that permits to obtain both data meta clustering and interactive visualizations our approach is based on an elaboration pipeline including data reading multiple clustering solution generation meta clustering and consensus clustering each stage is supported by dedicated visualization and interaction tools involved techniques include price based global optimization algorithm able to build set of solutions that are local minima of the means objective function different consensus methods aimed to reduce the set of solutions tools for the interactive hierarchical agglomeration of clusterings and for the exploration and visualization of the space of clustering solutions
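one common way to realize the meta clustering and consensus steps, shown here only as a hedged stand in for the price based optimizer and consensus methods described above, is evidence accumulation: run the base clusterer several times, accumulate a co association matrix and agglomerate it hierarchically

```python
# Sketch of an evidence-accumulation consensus: run k-means several times,
# accumulate a co-association matrix, then cluster that matrix hierarchically.
# This stands in for (and differs from) the paper's own pipeline; data and
# parameters below are synthetic and chosen only for illustration.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ((0, 0), (3, 3), (0, 3))])

n_runs, n = 20, len(X)
coassoc = np.zeros((n, n))
for seed in range(n_runs):                     # ensemble of base clusterings
    labels = KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs

dist = 1.0 - coassoc                           # co-association -> distance
np.fill_diagonal(dist, 0.0)
consensus = fcluster(linkage(squareform(dist), method="average"),
                     t=3, criterion="maxclust")
print(np.bincount(consensus))                  # sizes of the consensus clusters
```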
in this paper we provide examples of how thread level speculation tls simplifies manual parallelization and enhances its performance number of techniques for manual parallelization using tls are presented and results are provided that indicate the performance contribution of each technique on seven spec cpu benchmark applications we also provide indications of the programming effort required to parallelize each benchmark tls parallelization yielded speedup on our four floating point applications and speedup on our three integer applications while requiring only approximately programmer hours and lines of non template code per application these results support the idea that manual parallelization using tls is an efficient way to extract fine grain thread level parallelism
protecting individual privacy is an important problem in microdata distribution and publishing anonymization algorithms typically aim to satisfy certain privacy definitions with minimal impact on the quality of the resulting data while much of the previous literature has measured quality through simple one size fits all measures we argue that quality is best judged with respect to the workload for which the data will ultimately be used this article provides suite of anonymization algorithms that incorporate target class of workloads consisting of one or more data mining tasks as well as selection predicates an extensive empirical evaluation indicates that this approach is often more effective than previous techniques in addition we consider the problem of scalability the article describes two extensions that allow us to scale the anonymization algorithms to datasets much larger than main memory the first extension is based on ideas from scalable decision trees and the second is based on sampling thorough performance evaluation indicates that these techniques are viable in practice
energy in sensor networks is distributed non transferable resource over time differences in energy availability are likely to arise protocols like routing trees may concentrate energy usage at certain nodes differences in energy harvesting arising from environmental variations such as if one node is in the sun and another is in the shade can produce variations in charging rates and battery levels because many sensor network applications require nodes to collaborate to ensure complete sensor coverage or route data to the network’s edge small set of nodes whose continued operation is threatened by low batteries can have disproportionate impact on the fidelity provided by the network as whole in the most extreme case the loss of single sink node may render the remainder of the network unreachable while previous research has addressed reducing the energy usage of individual nodes the challenge of collaborative energy management has been largely ignored we present integrated distributed energy awareness idea sensor network service enabling effective network wide energy decision making idea integrates into the sensor network application by providing an api allowing components to evaluate their impact on other nodes idea distributes information about each node’s load rate charging rate and battery level to other nodes whose decisions affect it finally idea enables awareness of the connection between the behavior of each node and the application’s energy goals guiding the network toward states that improve performance this paper describes the idea architecture and demonstrates its use through three case studies using both simulation and testbed experiments we evaluate each idea application by comparing it to simpler approaches that do not integrate distributed energy awareness we show that using idea can significantly improve performance compared with solutions operating with purely local information
in this paper we describe the yeti information sharing system that has been designed to foster community building through informal digital content sharing the yeti system is general information parsing hosting and distribution infrastructure with interfaces designed for individual and public content reading in this paper we describe the yeti public display interface with particular focus on tools we have designed to provide lightweight awareness of others interactions with posted content our tools augment content with metadata that reflect people’s reading of content captured video clips of who’s reading and interacting with content tools to allow people to leave explicit freehand annotations about content and visualization of the content access history to show when content is interacted with results from an initial evaluation are presented and discussed
at every stage in physical design engineers are faced with many different objectives and tools to develop optimize and evaluate their design each choice of tool or an objective to optimize can potentially lead to completely different final physically designed circuit furthermore some of the objectives optimized by the tools are not necessarily the best or right objectives but rather compromised objectives for example placers optimize the half perimeter wirelength rather than the routed wirelength the contributions of this paper are twofold first we define and use metric to measure the consistency of optimizing wirelength during the different stages of physical design our main technique is based on tracing the relative lengths of two nets or more accurately pairs of nets as they progress through the physical design flow second we propose simple method to quantify the similarity between the results of different tools our empirical results point out to the physical design stages where vulnerability can occur from optimizing compromised objectives
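a minimal version of the pair tracing idea, with made up numbers and possibly a different exact formulation than the metric described above: measure the fraction of net pairs whose relative wirelength order is preserved between two stages, eg placement hpwl versus routed wirelength

```python
# Minimal pair-tracing consistency score: the fraction of net pairs whose
# relative wirelength order is preserved between two design stages. The exact
# metric in the paper may differ; the wirelengths below are invented.
from itertools import combinations

def pairwise_consistency(stage_a, stage_b):
    """stage_a, stage_b: dicts mapping net name -> wirelength at that stage."""
    agree = total = 0
    for n1, n2 in combinations(stage_a, 2):
        d1 = stage_a[n1] - stage_a[n2]
        d2 = stage_b[n1] - stage_b[n2]
        if d1 == 0 or d2 == 0:
            continue                     # skip ties
        total += 1
        agree += (d1 > 0) == (d2 > 0)    # same relative order at both stages
    return agree / total if total else 1.0

hpwl   = {"n1": 12.0, "n2": 30.5, "n3": 8.2, "n4": 19.0}
routed = {"n1": 14.1, "n2": 41.0, "n3": 9.0, "n4": 18.5}
print(pairwise_consistency(hpwl, routed))
```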
in real time software not only computation errors but also timing errors can cause system failures which eventually result in significant physical damages or threats to human life to efficiently guarantee the timely execution of expected functions it is necessary to clearly specify and formally verify timing requirements before performing detailed system design with the expected benefit of reusability and extensibility component technology has been gradually applied to developing industrial applications including real time systems however most of component based approaches applied to real time systems lack in systematic and rigorous approach to specifying and verifying timing requirements at an earlier development stage this paper proposes component based approach to specifying and verifying timing requirements for real time systems in systematic and compositional manner we first describe behaviors of the constituent components including timing requirements in uml diagrams and then translate the uml diagrams into mter nets an extension of ter nets to perform timing analysis in compositional way the merit of the proposed approach is that the specification and analysis results can be reused and independently maintained
this paper presents safechoice sc novel clustering algorithm for wirelength driven placement unlike all previous approaches sc is proposed based on fundamental theorem safe condition which guarantees that clustering would not degrade the placement wirelength to derive such theorem we first introduce the concept of safe clustering ie do clustering without degrading the placement quality to check the safe condition for pair wise clustering we propose selective enumeration technique sc maintains global priority queue pq based on the safeness and area of potential clusters iteratively the cluster at the top of the pq is formed sc automatically stops clustering when generating more clusters would degrade the placement wirelength to achieve other clustering objectives eg any target clustering ratio sc is able to perform under three different modes comprehensive experimental results show that the clusters produced by sc consistently help the placer to achieve the best wirelength among all other clustering algorithms
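the skeleton below shows only the priority queue driven clustering loop suggested above; the actual safe condition test and selective enumeration are the paper's contribution and are replaced here by a placeholder area bound, so this is purely illustrative

```python
# Skeleton of a priority-queue-driven pair clustering loop. The safe-condition
# check is replaced by a placeholder area bound; cell names and areas are
# invented for illustration.
import heapq
from itertools import combinations

def cluster(cells, area, max_cluster_area):
    """cells: list of cell names; area: dict name -> cell area."""
    area = dict(area)
    parent = {c: c for c in cells}

    def find(c):                                      # union-find root lookup
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    pq = [(area[a] + area[b], a, b) for a, b in combinations(cells, 2)]
    heapq.heapify(pq)
    while pq:
        _, a, b = heapq.heappop(pq)
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                                  # stale pair, already merged
        if area[ra] + area[rb] > max_cluster_area:
            continue                                  # placeholder for the safe-condition test
        merged = ra + "+" + rb                        # form the new cluster
        parent[merged] = merged
        parent[ra] = parent[rb] = merged
        area[merged] = area[ra] + area[rb]
        for other in set(map(find, cells)) - {merged}:
            heapq.heappush(pq, (area[merged] + area[other], merged, other))
    return {find(c) for c in cells}

print(cluster(["a", "b", "c", "d"], {"a": 1, "b": 2, "c": 3, "d": 8},
              max_cluster_area=7))
```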
understanding the characteristics of io traffic is increasingly important as the performance gap between the processor and disk based storage continues to widen moreover recent advances in technology coupled with market demands have led to new and exciting developments in storage systems particularly network storage storage utilities and intelligent self optimizing storage in this paper we empirically examine the physical io traffic of wide range of server and personal computer pc workloads focusing on how these workloads will be affected by the recent developments in storage systems as part of our analysis we compare our results with historical data and re examine some rules of thumb eg one bit of io per second for each instruction per second of processing power that have been widely used for designing computer systems we find that the traffic is bursty and appears to exhibit self similar characteristics our analysis also indicates that there is little cross correlation between traffic volumes of server workloads which suggests that aggregating these workloads will likely help to smooth out the traffic and enable more efficient utilization of resources we discover that there is significant potential for harnessing free system resources to perform background tasks such as optimization of disk block layout in general we observe that the characteristics of the traffic are relatively insensitive to the extent of upstream caching and thus our results still apply on qualitative level when the upstream cache is increased in size
continuous always on monitoring is beneficial for number of applications but potentially imposes high load in terms of communication storage and power consumption when large number of variables need to be monitored we introduce two new filtering techniques swing filters and slide filters that represent within prescribed precision time varying numerical signal by piecewise linear function consisting of connected line segments for swing filters and mostly disconnected line segments for slide filters we demonstrate the effectiveness of swing and slide filters in terms of their compression power by applying them to real life data set plus variety of synthetic data sets for nearly all combinations of signal behavior and precision requirements the proposed techniques outperform the earlier approaches for online filtering in terms of data reduction the slide filter in particular consistently dominates all other filters with up to twofold improvement over the best of the previous techniques
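a compact sketch in the spirit of the swing filter, not the authors' exact algorithm: approximate a series of samples by connected line segments while maintaining the interval of slopes that keeps every covered sample within plus or minus eps, and close the current segment when that interval becomes empty

```python
# Simplified one-pass "swing"-style filter: approximate a time series by
# connected line segments guaranteed to stay within +/- eps of every sample.
# A compact illustration of the idea, not the authors' exact algorithm.
def swing_filter(samples, eps):
    """samples: list of (t, v) with strictly increasing t (at least two points).
    Returns the endpoints of the connected line segments."""
    out = [samples[0]]
    t0, v0 = samples[0]
    lo, hi = float("-inf"), float("inf")      # admissible slope interval
    t_last = t0
    for t, v in samples[1:]:
        new_lo = (v - eps - v0) / (t - t0)
        new_hi = (v + eps - v0) / (t - t0)
        if max(lo, new_lo) <= min(hi, new_hi):
            lo, hi = max(lo, new_lo), min(hi, new_hi)
            t_last = t                        # point still covered by this segment
        else:
            slope = (lo + hi) / 2             # close the segment at the previous sample
            v0 = v0 + slope * (t_last - t0)
            t0 = t_last
            out.append((t0, v0))              # connected: next segment starts here
            lo = (v - eps - v0) / (t - t0)
            hi = (v + eps - v0) / (t - t0)
            t_last = t
    out.append((t_last, v0 + (lo + hi) / 2 * (t_last - t0)))
    return out

signal = [(i, 0.5 * i + (0.4 if i % 7 == 0 else 0.0)) for i in range(60)]
approx = swing_filter(signal, eps=0.5)
print(len(signal), "samples ->", len(approx), "segment endpoints")
```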
db universal database db udb divides the buffer area into number of independent buffer pools and each database object table or index is assigned to specific buffer pool buffer pool sizing which sets an appropriate size for each of the buffer pools is crucial for achieving optimal performance in this paper we investigate the buffer pool sizing problem two cost models which are based on page fault and data access time are examined greedy algorithm is proposed to search for the optimal solution we study the effectiveness of the above techniques using experiments with the tpc benchmark database the results show that the data access time based cost model is more effective for optimizing the buffer pool sizes than the page fault based cost model
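a hedged sketch of the greedy search mentioned above: repeatedly give the next chunk of buffer pages to the pool whose estimated cost drops the most, with made up cost curves standing in for the page fault or data access time models

```python
# Greedy sketch of the buffer pool sizing idea: repeatedly allocate the next
# chunk of buffer pages to the pool whose estimated cost (e.g., data access
# time) decreases the most. The pool names and cost curves are invented
# placeholders for the cost models discussed above.
import math

def greedy_sizing(pools, total_pages, step=64):
    """pools: dict name -> cost(pages) function (monotonically non-increasing)."""
    alloc = {name: 0 for name in pools}
    pages_left = total_pages
    while pages_left >= step:
        gains = {name: pools[name](alloc[name]) - pools[name](alloc[name] + step)
                 for name in pools}
        best = max(gains, key=gains.get)   # pool with the largest cost reduction
        alloc[best] += step
        pages_left -= step
    return alloc

pools = {
    "orders_tbl":  lambda p: 5000 / (1 + p / 100),    # hot table, caches well
    "lineitem_ix": lambda p: 3000 / (1 + p / 400),    # larger working set
    "customer":    lambda p: 800 * math.exp(-p / 50)  # small, saturates quickly
}
print(greedy_sizing(pools, total_pages=2048))
```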
spoken dialogue system performance can vary widely for different users as well for the same user during different dialogues this paper presents the design and evaluation of an adaptive version of toot spoken dialogue system for retrieving online train schedules based on rules learned from set of training dialogues adaptive toot constructs user model representing whether the user is having speech recognition problems as particular dialogue progresses adaptive toot then automatically adapts its dialogue strategies based on this dynamically changing user model an empirical evaluation of the system demonstrates the utility of the approach
several extensions of the web services framework wsf have been proposed the combination with semantic web technologies introduces notion of semantics which can enhance scalability through automation service composition to processes is an equally important issue ontology technology the core of the semantic web can be the central building block of an extension endeavour we present conceptual architecture for ontology based web service development and deployment the development of service based software systems within the wsf is gaining increasing importance we show how ontologies can integrate models languages infrastructure and activities within this architecture to support reuse and composition of semantic web services
we present signal processing framework for analyzing the reflected light field from homogeneous convex curved surface under distant illumination this analysis is of theoretical interest in both graphics and vision and is also of practical importance in many computer graphics problems for instance in determining lighting distributions and bidirectional reflectance distribution functions brdfs in rendering with environment maps and in image based rendering it is well known that under our assumptions the reflection operator behaves qualitatively like convolution in this paper we formalize these notions showing that the reflected light field can be thought of in precise quantitative way as obtained by convolving the lighting and brdf ie by filtering the incident illumination using the brdf mathematically we are able to express the frequency space coefficients of the reflected light field as product of the spherical harmonic coefficients of the illumination and the brdf these results are of practical importance in determining the well posedness and conditioning of problems in inverse rendering estimation of brdf and lighting parameters from real photographs furthermore we are able to derive analytic formulae for the spherical harmonic coefficients of many common brdf and lighting models from this formal analysis we are able to determine precise conditions under which estimation of brdfs and lighting distributions are well posed and well conditioned our mathematical analysis also has implications for forward rendering especially the efficient rendering of objects under complex lighting conditions specified by environment maps the results especially the analytic formulae derived for lambertian surfaces are also relevant in computer vision in the areas of recognition photometric stereo and structure from motion
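for reference, a hedged latex rendering of the frequency space product for the isotropic special case together with the commonly quoted lambertian clamped cosine coefficients; normalization conventions vary and the general anisotropic form is given in the paper itself

```latex
% isotropic / radially symmetric special case of the convolution result above
% (normalization conventions vary; the paper gives the general anisotropic form)
B_{lm} \;=\; \Lambda_l \,\hat{\rho}_l \, L_{lm},
\qquad \Lambda_l = \sqrt{\frac{4\pi}{2l+1}},
% L_{lm}: spherical harmonic coefficients of the distant illumination
% \hat{\rho}_l: coefficients of the radially symmetric BRDF acting as a filter
%
% commonly quoted clamped-cosine (Lambertian) filter coefficients:
\hat{A}_0 = \pi,\quad \hat{A}_1 = \tfrac{2\pi}{3},\quad \hat{A}_2 = \tfrac{\pi}{4},\quad
\hat{A}_l = 0 \ (\text{odd } l > 1),\quad \hat{A}_l = O(l^{-2}) \ (\text{even } l)
```

the rapid decay of the lambertian kernel is what limits which lighting terms can be recovered, which is one concrete instance of the well posedness conditions discussed above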
in this paper two novel methods suitable for blind mesh object watermarking applications are proposed the first method is robust against rotation translation and uniform scaling the second one is robust against both geometric and mesh simplification attacks pseudorandom watermarking signal is cast in the mesh object by deforming its vertices geometrically without altering the vertex topology prior to watermark embedding and detection the object is rotated and translated so that its center of mass and its principal component coincide with the origin and the axis of the cartesian coordinate system this geometrical transformation ensures watermark robustness to translation and rotation robustness to uniform scaling is achieved by restricting the vertex deformations to occur only along the coordinate of the corresponding theta phi spherical coordinate system in the first method set of vertices that correspond to specific angles theta is used for watermark embedding in the second method the samples of the watermark sequence are embedded in set of vertices that correspond to range of angles in the theta domain in order to achieve robustness against mesh simplifications experimental results indicate the ability of the proposed method to deal with the aforementioned attacks
sensor network mac protocols typically sacrifice packet latency to achieve energy efficiency such delays may well increase due to routing protocol operation for this reason it is imperative that we attempt to quantify the end to end delay and energy consumption when jointly using low duty cycle mac and routing protocols in this paper we present comprehensive evaluation of merlin mac and efficient routing integrated with support for localization cross layer protocol that integrates both mac and routing features in contrast to many sensor network protocols it employs multicast upstream and multicast downstream approach to relaying packets to and from the gateway simultaneous reception and transmission errors are notified by asynchronous burst ack and negative burst ack messages division of the network into timezones together with an appropriate scheduling policy enables the routing of packets to the closest gateway an evaluation of merlin has been conducted through simulation against both the smac and the esr routing protocols an improved version of the dsr algorithm the results illustrate that the joint usage of both smac and esr in low duty cycle scenarios causes extremely high end to end delays and prevents acceptable data delivery rate merlin as an integrated approach notably reduces latency resulting in nodes that can deliver data in very low duty cycle yielding significant extension to network lifetime
mems based storage is an emerging nonvolatile secondary storage technology it promises high performance high storage density and low power consumption with fundamentally different architectural designs from magnetic disk mems based storage exhibits unique two dimensional positioning behaviors and efficient power state transitions we model these low level device specific properties of mems based storage and present request scheduling algorithms and power management strategies that exploit the full potential of these devices our simulations show that mems specific device management policies can significantly improve system performance and reduce power consumption
mirror based systems are object oriented reflective architectures built around set of design principles that lead to reflective apis which foster high degree of reusability loose coupling with base level objects and whose structure and design corresponds to the system being mirrored however support for behavioral intercession has been limited in contemporary mirror based architectures in spite of its many interesting applications this is due to the fact that mirror based architectures only support explicit reflection while behavioral intercession requires implicit reflection this work reconciles mirrors with behavioral intercession we discuss the design of mirror based architecture with implicit mirrors that can be absorbed in the interpreter and mirages base objects whose semantics are defined by implicit mirrors we describe and illustrate the integration of this reflective architecture for the distributed object oriented programming language ambienttalk
increasingly tight energy design goals require processor architects to rethink the organizational structure of microarchitectural resources we examine new multilateral cache organization replacing conventional data cache with set of smaller region caches that significantly reduces energy consumption with little performance impact this is achieved by tailoring the cache resources to the specific reference characteristics of each application in applications with small heap footprints we save about of the total cache energy in the remaining applications we employ small cache for frequently accessed heap data and larger cache for low locality data achieving an energy savings of
making the interactions with digital user interface disappear into and become part of the human to human interaction and conversation is challenge conventional metaphor and underlying interface infrastructure for single user desktop systems have been traditionally geared towards single mouse and keyboard click and type based wimp interface design on the other hand people usually meet in social context around table facing each other table setting provides large interactive visual and tangible surface it affords and encourages collaboration coordination serendipity as well as simultaneous and parallel interaction among multiple people in this paper we examine and explore the opportunities challenges research issues pitfalls and plausible approaches for enabling direct touchable shared social interactions on multi touch multi user tabletops
several recent studies have demonstrated that the type of improvements in information retrieval system effectiveness reported in forums such as sigir and trec do not translate into benefit for users two of the studies used an instance recall task and third used question answering task so perhaps it is unsurprising that the precision based measures of ir system effectiveness on one shot query evaluation do not correlate with user performance on these tasks in this study we evaluate two different information retrieval tasks on trec web track data precision based user task measured by the length of time that users need to find single document that is relevant to trec topic and simple recall based task represented by the total number of relevant documents that users can identify within five minutes users employ search engines with controlled mean average precision map of between and our results show that there is no significant relationship between system effectiveness measured by map and the precision based task significant but weak relationship is present for the precision at one document returned metric weak relationship is present between map and the simple recall based task
dynamic binary optimizers store altered copies of original program instructions in software managed code caches in order to maximize reuse of transformed code code caches store code blocks that may vary in size reference other code blocks and carry high replacement overhead these unique constraints reduce the effectiveness of conventional cache management policies our work directly addresses these unique constraints and presents several contributions to the code cache management problem first we show that evicting more than the minimum number of code blocks from the code cache results in less run time overhead than the existing alternatives such granular evictions reduce overall execution time as the fixed costs of invoking the eviction mechanism are amortized across multiple cache insertions second study of the ideal lifetimes of dynamically generated code blocks illustrates the benefit of replacement algorithm based on generational heuristic we describe and evaluate generational approach to code cache management that makes it easy to identify long lived code blocks and simultaneously avoid any fragmentation because of the eviction of short lived blocks finally we present results from an implementation of our generational approach in the dynamorio framework and illustrate that as dynamic optimization systems become more prevalent effective code cache management policies will be essential for reliable scalable performance of modern applications
while work in recent years has demonstrated that wavelets can be efficiently used to compress large quantities of data and provide fast and fairly accurate answers to queries little emphasis has been placed on using wavelets in approximating datasets containing multiple measures existing decomposition approaches will either operate on each measure individually or treat all measures as vector of values and process them simultaneously we show in this paper that the resulting individual or combined storage approaches for the wavelet coefficients of different measures that stem from these existing algorithms may lead to suboptimal storage utilization which results to reduced accuracy to queries to alleviate this problem we introduce in this work the notion of an extended wavelet coefficient as flexible storage method for the wavelet coefficients and propose novel algorithms for selecting which extended wavelet coefficients to retain under given storage constraint experimental results with both real and synthetic datasets demonstrate that our approach achieves improved accuracy to queries when compared to existing techniques
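not the paper's selection algorithm, but a benefit per byte greedy gives the flavour of retaining (extended) wavelet coefficients under a storage constraint; benefit and size values are assumed inputs:

```python
def select_coefficients(candidates, budget_bytes):
    """candidates: iterable of (benefit, size_bytes, coeff_id), where benefit is
    the estimated reduction in query error if the coefficient is retained.
    Keeps the best benefit-per-byte coefficients that fit in the budget."""
    ranked = sorted(candidates, key=lambda c: c[0] / c[1], reverse=True)
    kept, used = [], 0
    for benefit, size, cid in ranked:
        if used + size <= budget_bytes:
            kept.append(cid)
            used += size
    return kept
```

the intuition behind the extended coefficient representation is that values of several measures at the same coefficient position can share stored bookkeeping, which changes the size term and hence the ranking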
real time and embedded systems have traditionally been designed for closed environments where operating conditions input workloads and resource availability are known priori and are subject to little or no change at runtime there is increasing demand however for adaptive capabilities in distributed real time and embedded dre systems that execute in open environments where system operational conditions input workload and resource availability cannot be characterized accurately priori challenging problem faced by researchers and developers of such systems is devising effective adaptive resource management strategies that can meet end to end quality of service qos requirements of applications to address key resource management challenges of open dre systems this paper presents the hierarchical distributed resource management architecture hidra which provides adaptive resource management using control techniques that adapt to workload fluctuations and resource availability for both bandwidth and processor utilization simultaneously this paper presents three contributions to research in adaptive resource management for dre systems first we describe the structure and functionality of hidra second we present an analytical model of hidra that formalizes its control theoretic behavior and presents analytical assurance of system performance third we evaluate the performance of hidra via experiments on representative dre system that performs real time distributed target tracking our analytical and empirical results indicate that hidra yields predictable stable and efficient system performance even in the face of changing workload and resource availability
tinyos applications are built with software components that communicate through narrow interfaces since components enable fine grained code reuse this approach has been successful in creating applications that make very efficient use of the limited code and data memory on sensor network nodes however the other important benefit of components rapid application development through black box reuse remains largely unrealized because in many cases interfaces have implied usage constraints that can be the source of frustrating program errors developers are commonly forced to read the source code for components partially defeating the purpose of using components in the first place our research helps solve these problems by allowing developers to explicitly specify and enforce component interface contracts due to the extensive reuse of the most common interfaces implementing contracts for small number of frequently reused interfaces permitted us to extensively check number of applications we uncovered some subtle and previously unknown bugs in applications that have been in common use for years
field programmable gate arrays fpgas provide designers with the ability to quickly create hardware circuits increases in fpga configurable logic capacity and decreasing fpga costs have enabled designers to more readily incorporate fpgas in their designs fpga vendors have begun providing configurable soft processor cores that can be synthesized onto their fpga products while fpgas with soft processor cores provide designers with increased flexibility such processors typically have degraded performance and energy consumption compared to hard core processors previously we proposed warp processing technique capable of optimizing software application by dynamically and transparently re implementing critical software kernels as custom circuits in on chip configurable logic in this paper we study the potential of microblaze soft core based warp processing system to eliminate the performance and energy overhead of soft core processor compared to hard core processor we demonstrate that the soft core based warp processor achieves average speedups of and energy reductions of compared to the soft core alone our data shows that soft core based warp processor yields performance and energy consumption competitive with existing hard core processors thus expanding the usefulness of soft processor cores on fpgas to broader range of applications
in this poster we investigate how to enhance web clustering by leveraging the tripartite network of social tagging systems we propose clustering method called tripartite clustering which cluster the three types of nodes resources users and tags simultaneously based on the links in the social tagging network the proposed method is experimented on real world social tagging dataset sampled from delicious we also compare the proposed clustering approach with means all the clustering results are evaluated against human maintained web directory the experimental results show that tripartite clustering significantly outperforms the content based means approach and achieves performance close to that of social annotation based means whereas generating much more useful information
abstract the analysis of the semantics of temporal data and queries plays central role in the area of temporal databases although many different algebræ and models have been proposed almost all of them are based on point based snapshot semantics for data on the other hand in the areas of linguistics philosophy and recently artificial intelligence an oft debated issue concerns the use of an interval based versus point based semantics in this paper we first show some problems inherent in the adoption of point based semantics for data then argue that these problems arise because there is no distinction drawn in the data between telic and atelic facts we then introduce three sorted temporal model and algebra including coercion functions for transforming relations of one sort into relations of the other at query time which properly copes with these issues
in this paper we compare four algorithms for the mapping of pipelined applications on heterogeneous multiprocessor platform implemented using field programmable gate arrays fpgas with customizable processors initially we describe the framework and the model of pipelined application we adopted then we focus on the problem of mapping set of pipelined applications onto heterogeneous multiprocessor platform and consider four search algorithms tabu search simulated annealing genetic algorithms and the bayesian optimization algorithm we compare the performance of these four algorithms on set of synthetic problems and on two real world applications the jpeg image encoding and the adpcm sound encoding our results show that on our framework the bayesian optimization algorithm outperforms all the other three methods for the mapping of pipelined applications
reorganization of objects in an object databases is an important component of several operations like compaction clustering and schema evolution the high availability requirements times operation of certain application domains requires reorganization to be performed on line with minimal interference to concurrently executing transactions in this paper we address the problem of on line reorganization in object databases where set of objects have to be migrated from one location to another specifically we consider the case where objects in the database may contain physical references to other objects relocating an object in this case involves finding the set of objects parents that refer to it and modifying the references in each parent we propose an algorithm called the incremental reorganization algorithm ira that achieves the above task with minimal interference to concurrently executing transactions the ira algorithm holds locks on at most two distinct objects at any point of time we have implemented ira on brahma storage manager developed at iit bombay and conducted an extensive performance study our experiments reveal that ira makes on line reorganization feasible with very little impact on the response times of concurrently executing transactions and on overall system throughput we also describe how the ira algorithm can handle system failures
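an illustrative sketch (helper names assumed, not the paper's pseudocode) of how migration can proceed while never holding locks on more than two distinct objects at a time:

```python
def migrate_object(obj_id, new_loc, store, locks):
    """store and locks are assumed helpers: store.copy/parents_of/update_ref/
    free_old_copy, and locks.acquire returning a context manager."""
    with locks.acquire(obj_id):                  # one lock: the object
        store.copy(obj_id, new_loc)              # create the relocated copy
        parents = list(store.parents_of(obj_id))
    for parent in parents:                       # patch parents incrementally
        with locks.acquire(parent):              # first lock: a parent
            with locks.acquire(obj_id):          # second lock: the object
                store.update_ref(parent, obj_id, new_loc)
    with locks.acquire(obj_id):
        store.free_old_copy(obj_id)              # reclaim the old location
```

because each parent is patched in its own short lock window, interference with concurrently executing transactions stays small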
we present uniform non monotonic solution to the problems of reasoning about action on the basis of an argumentation theoretic approach our theory is provably correct relative to sensible minimisation policy introduced on top of temporal propositional logic sophisticated problem domains can be formalised in our framework as much attention of researchers in the field has been paid to the traditional and basic problems in reasoning about actions such as the frame the qualification and the ramification problems approaches to these problems within our formalisation lie at heart of the expositions presented in this paper
heterogeneous data co clustering has attracted more and more attention in recent years due to its high impact on various applications while the co clustering algorithms for two types of heterogeneous data denoted by pair wise co clustering such as documents and terms have been well studied in the literature the work on more types of heterogeneous data denoted by high order co clustering is still very limited as an attempt in this direction in this paper we worked on specific case of high order co clustering in which there is central type of objects that connects the other types so as to form star structure of the inter relationships actually this case could be very good abstract for many real world applications such as the co clustering of categories documents and terms in text mining in our philosophy we treated such kind of problems as the fusion of multiple pair wise co clustering sub problems with the constraint of the star structure accordingly we proposed the concept of consistent bipartite graph co partitioning and developed an algorithm based on semi definite programming sdp for efficient computation of the clustering results experiments on toy problems and real data both verified the effectiveness of our proposed method
software model checking has been successful for sequential programs where predicate abstraction offers suitable models and counterexample guided abstraction refinement permits the automatic inference of models when checking concurrent programs we need to abstract threads as well as the contexts in which they execute stateless context models such as predicates on global variables prove insufficient for showing the absence of race conditions in many examples we therefore use richer context models which combine predicates for abstracting data state control flow quotients for abstracting control state and counters for abstracting an unbounded number of threads we infer suitable context models automatically by combination of counterexample guided abstraction refinement bisimulation minimization circular assume guarantee reasoning and parametric reasoning about an unbounded number of threads this algorithm called circ has been implemented in blast and succeeds in checking many examples of nesc code for data races in particular blast proves the absence of races in several cases where previous race checkers give false positives
adaptive groupware systems support changes in users locations devices roles and collaborative structure developing such systems is difficult due to the complex distributed systems programming involved in this paper we introduce fiia novel architectural style for groupware fiia is user centered in that it allows easy specification of groupware structured around users settings devices and applications and where adaptations are specified at high level similar to scenarios the fiianet toolkit automatically maps fiia architectures to wide range of possible distributed systems under control of an annotation language together these allow developers to work at high level while retaining control over distribution choices
exertion games are an emerging form of interactive games that require players to invest significant physical effort as part of the gameplay rather than just pressing buttons these exertion games have potential health benefits by promoting exercise it is also believed that they can facilitate social play between players and that social play can improve participation in exertion games however there is currently lack of understanding of how to design games to support these effects in this paper we present qualitative case study that illustrates how networked environments support social play in exertion games and how this can help to gain an understanding of existing games and support the design of future games this work offers preliminary analytical and descriptive account of the relationship between exertion and social play in such game and highlights the influence of design with the aim of utilizing the attributed benefits of exertion and social play
response time is key differentiation among electronic commerce applications for many commerce applications web pages are created dynamically based on the current state of business stored in database systems recently the topic of web acceleration for database driven web applications has drawn lot of attention in both the research community and commercial arena in this paper we analyze the factors that have impacts on the performance and scalability of web applications we discuss system architecture issues and describe approaches to deploying caching solutions for accelerating web applications we give the performance matrix measurement for network latency and various system architectures the paper is summarized with road map for creating high performance web applications
the uptake of digital photos vs print photos has altered the practice of photo sharing print photos are easy to share within the home but much harder to share outside of it the opposite is true of digital photos people easily share digital photos outside the home eg to family and friends by mail gift giving and to social networks and the broader public by web publishing yet within the home collocated digital photo sharing is harder primarily because digital photos are typically stored on personal accounts in desktop computers located in home offices this leads to several consequences the invisibility of digital photos implies few opportunities for serendipitous photo sharing access control and navigation issues inhibit family members from retrieving photo collections photo viewing is compromised as digital photos are displayed on small screens in an uncomfortable viewing setting to mitigate some of these difficulties we explore how physical memorabilia collected by family members can create opportunities that encourage social and collocated digital photo sharing first we studied via contextual interviews with households how families currently practice photo sharing and how they keep memorabilia we identified classes of memorabilia that can serve as memory triggers to family events trips and times when people took photos second we designed souvenirs photo viewing system that exploits memorabilia as social instrument using souvenirs family member can meaningfully associate physical memorabilia with particular photo sets later any family member can begin their story telling with others through the physical memento and then enrich the story by displaying its associated photos simply by moving the memento close to the home’s large format television screen third we re examined our design premises by evoking household reactions to an early version of souvenirs based on these interviews we redesigned souvenirs to better reflect the preferences and real practices of photo and memorabilia use in the home
we present novel level of detail selection method for real time rendering that works on hierarchies of discrete and continuous representations we integrate point rendered objects with polygonal geometry and demonstrate our approach in terrain flyover application where the digital elevation model is augmented with forests the vegetation is rendered as continuous sequence of splats which are organized in hierarchy further we discuss enhancements to our basic method to improve its scalability
reduced energy consumption is one of the most important design goals for embedded application domains like wireless multimedia and biomedical instruction memory hierarchy has been proven to be one of the most power hungry parts of the system this paper introduces an architectural enhancement for the instruction memory to reduce energy and improve performance the proposed distributed instruction memory organization requires minimal hardware overhead and allows execution of multiple loops in parallel in uni processor system this architecture enhancement can reduce the energy consumed in the instruction and data memory hierarchy by and improve the performance by compared to enhanced smt based architectures
an efficient and reliable file storage system is important to micro sensor nodes so that data can be logged for later asynchronous delivery across multi hop wireless sensor network designing and implementing such file system for sensor node faces various challenges sensor nodes are highly resource constrained in terms of limited runtime memory limited persistent storage and finite energy also the flash storage medium on sensor nodes differs in variety of ways from the traditional hard disk eg in terms of the limited number of writes for flash memory unit we present the design and implementation of elf an efficient log structured flash based file system tailored for sensor nodes elf is adapted to achieve memory efficiency low power operation and tailored support for common types of sensor file operations such as appending data to file elf’s log structured approach achieves wear levelling across flash memory pages with limited write lifetimes elf also uniquely provides garbage collection capability as well as reliability for micro sensor nodes performance evaluation of an implementation of elf based on tinyos and mica sensor motes is presented
space partitioned moving objects databases sp mods allow for the scalable distributed management of large sets of mobile objects trajectories by partitioning the trajectory data to network of database servers processing spatio temporal query therefore requires efficiently routing to the servers storing the affected trajectory segments with coordinate based query like spatio temporal range query the relevant servers are directly determined by the queried range however with trajectory based queries like retrieving the distance covered by certain object during given time interval the relevant servers depend on actual movement of the queried object therefore efficient routing mechanisms for trajectory based queries are an important challenge in sp mods in this paper we present the distributed trajectory index dti that allows for such efficient query routing by creating an overlay network for each trajectory we further present an enhanced index called dti it accelerates the processing of queries on aggregates of dynamic attributes like the maximum speed during time interval by augmenting dti with summaries of trajectory segments our simulations with network of database servers show that dti can reduce the overall processing time by more than
industrial designers make sketches and physical models to start and develop ideas and concept designs such representations have advantages that they support fast intuitive rich sensory exploration of solutions although existing tools and techniques provide adequate support where the shape of the product is concerned the exploration of surface qualities such as material and printed graphics is supported to much lesser extent moreover there are no tools that have the fluency of sketching that allow combined exploration of shape material and their interactions this paper evaluates skin an augmented reality tool designed to solve these two shortcomings by projecting computer generated images onto the shape model skin allows for sketchy tangible interaction where designers can explore surface qualities on three dimensional physical shape model the tool was evaluated in three design situations in the domain of ceramics design in each case we found that the joint exploration of shape and surface provided creative benefits in the form of new solutions in addition gain in efficiency was found in at least one case the results show that joint exploration of shape and surface can be effectively supported with tangible augmented reality techniques and suggest that this can be put to practical use in industry today
data summarization is an important data mining task which aims to find compact description of dataset emerging applications place special requirements on the data summarization techniques including the ability to find concise and informative summary from high dimensional data the ability to deal with different types of attributes such as binary categorical and numeric attributes end user comprehensibility of the summary insensitivity to noise and missing values and scalability with the data size and dimensionality in this work general framework that satisfies all of these requirements is proposed to summarize high dimensional data we formulate this problem in bipartite graph scheme mapping objects data records and values of attributes into two disjoint groups of nodes of graph in which set of representative objects is discovered as the summary of the original data further the capability of representativeness is measured using the mdl principle which helps to yield highly intuitive summary with the most informative objects of the input data while the problem of finding the optimal summary with minimal representation cost is computationally infeasible an approximate optimal summary is achieved by heuristic algorithm whose computation cost is quadratic to the size of data and linear to the dimensionality of data in addition several techniques are developed to improve both quality of the resultant summary and efficiency of the algorithm detailed study on both real and synthetic datasets shows the effectiveness and efficiency of our approach in summarizing high dimensional datasets with binary categorical and numeric attributes
when cooperating with each other enterprises must closely monitor internal processes and those of partners to streamline business to business bb workflows this work applies the process view model which extends beyond conventional activity based process models to design workflows across multiple enterprises process view is an abstraction of an implemented process an enterprise can design various process views for different partners based on diverse commercial relationships and in doing so establish an integrated process that consists of internal processes and process views that each partner provides participatory enterprises can obtain appropriate progress information from their own integrated processes allowing them to collaborate effectively furthermore bb workflows are coordinated through virtual states of process views this work develops uniform approach to manage state mappings between internal processes and process views the proposed approach enhances prevalent activity based process models adaptable to collaborative environments
we present motion magnification technique that acts like microscope for visual motion it can amplify subtle motions in video sequence allowing for visualization of deformations that would otherwise be invisible to achieve motion magnification we need to accurately measure visual motions and group the pixels to be modified after an initial image registration step we measure motion by robust analysis of feature point trajectories and segment pixels based on similarity of position color and motion novel measure of motion similarity groups even very small motions according to correlation over time which often relates to physical cause an outlier mask marks observations not explained by our layered motion model and those pixels are simply reproduced on the output from the original registered observations the motion of any selected layer may be magnified by user specified amount texture synthesis fills in unseen holes revealed by the amplified motions the resulting motion magnified images can reveal or emphasize small motions in the original sequence as we demonstrate with deformations in load bearing structures subtle motions or balancing corrections of people and rigid structures bending under hand pressure
xmage is introduced in this paper as method for partial similarity searching in image databases region based image retrieval is method of retrieving partially similar images it has been proposed as way to accurately process queries in an image database in region based image retrieval region matching is indispensable for computing the partial similarity between two images because the query processing is based upon regions instead of the entire image naive method of region matching is sequential comparison between regions which causes severe overhead and deteriorates the performance of query processing in this paper new image contents representation called condensed extended histogram cxhistogram is presented in conjunction with well defined distance function cxsim on the cx histogram the cxsim is new image to image similarity measure to compute the partial similarity between two images it achieves the effect of comparing regions of two images by simply comparing the two images the cxsim reduces query space by pruning irrelevant images and it is used as filtering function before sequential scanning extensive experiments were performed on real image data to evaluate xmage it provides significant pruning of irrelevant images with no false dismissals as consequence it achieves up to fold speed up in search over the tree search followed by sequential scanning
refactoring is hot and controversial issue supporters claim that it helps increasing the quality of the code making it easier to understand modify and maintain moreover there are also claims that refactoring yields higher development productivity however there is only limited empirical evidence of such assumption case study has been conducted to assess the impact of refactoring in close to industrial environment results indicate that refactoring not only increases aspects of software quality but also improves productivity our findings are applicable to small teams working in similar highly volatile domains ours is application development for mobile devices however additional research is needed to ensure that this is indeed true and to generalize it to other contexts
scaling of transistor feature sizes has provided remarkable advancement in silicon industry for last three decades however while the performance increases due to scaling the power density increases substantially every generation due to higher integration density furthermore the demand for power sensitive design has grown significantly in recent years due to tremendous growth in portable applications consequently the need for power efficient design techniques has grown considerably several efficient design techniques have been proposed to reduce both dynamic as well as static power in state of the art vlsi circuit applications in this paper we discuss different circuit techniques that are used to maintain the power consumption both static and dynamic within limit while achieving the highest possible performance
knowledge of low level control flow is essential for many compiler optimizations in systems with tail call optimization the determination of interprocedural control flow is complicated by the fact that because of tail call optimization control flow at procedure returns is not readily evident from the call graph of the program this article shows how interprocedural control flow analysis of first order programs can be carried out using well known concepts from parsing theory in particular we show that context insensitive or zeroth order control flow analysis corresponds to the notion of follow sets in context free grammars while context sensitive or first order control flow analysis corresponds to the notion of lr items the control flow information so obtained can be used to improve the precision of interprocedural dataflow analyses as well as to extend certain low level code optimizations across procedure boundaries
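a small fixpoint sketch of the zeroth order (context insensitive) idea: non tail calls return to their call site, while tail calls make the callee return wherever the caller would, much as follow sets propagate in a grammar; this is an illustration, not the article's algorithm:

```python
def return_points(procs):
    """procs: {proc: [(callee, return_site, is_tail), ...]}, every callee also
    being a key of procs.  Computes the set of return sites each procedure may
    return to, the analogue of FOLLOW sets under tail-call optimization."""
    follow = {p: set() for p in procs}
    changed = True
    while changed:
        changed = False
        for caller, calls in procs.items():
            for callee, site, is_tail in calls:
                new = (follow[caller] if is_tail else {site}) - follow[callee]
                if new:
                    follow[callee] |= new
                    changed = True
    return follow

# example: f calls g (non-tail) returning to site "f1"; g tail-calls h,
# so h inherits g's return sites {"f1"}:
# return_points({"f": [("g", "f1", False)], "g": [("h", None, True)], "h": []})
```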
this paper addresses the problem of streaming packetized media over lossy packet network through an intermediate proxy server to client in rate distortion optimized way the proxy located at the junction of the backbone network and the last hop to the client coordinates the communication between the media server and the client using hybrid receiver sender driven streaming in rate distortion optimization framework the framework enables the proxy to determine at every instant which packets if any it should either request from the media server or re transmit directly to the client in order to meet constraints on the average transmission rates on the backbone and the last hop while minimizing the average end to end distortion performance gains are observed over rate distortion optimized sender driven systems for streaming packetized video content the improvement in performance depends on the quality of the network path both in the backbone network and along the last hop
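the optimization the proxy performs is usually posed in the standard lagrangian form of rate distortion optimized streaming (a generic statement of the objective, not quoted from the paper): pick the transmission policy pi minimizing

```latex
J(\pi) \;=\; D(\pi) \;+\; \lambda_b\, R_b(\pi) \;+\; \lambda_l\, R_l(\pi)
```

where D is the expected end to end distortion, R_b and R_l are the expected average rates on the backbone and on the last hop, and the multipliers are chosen so that both rate constraints are met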
to model detailed expressive face based on the limited user constraints is challenge work in this paper we present the facial expression editing technique based on dynamic graph model the probabilistic relations between facial expressions and the complex combination of local facial features as well as the temporal behaviors of facial expressions are represented by the hierarchical dynamic bayesian network given limited user constraints on the sparse feature mesh the system can infer the basis expression probabilities which are used to locate the corresponding expressive mesh in the shape space spanned by the basis models the experiments demonstrate the dense facial meshes corresponding to the user constraints can be synthesized effectively
this paper designs routing and channel assignment game strong transmission game in non cooperative wireless mesh networks due to the nature of mesh routers relay nodes ie they are dedicated and have sufficient power supply this game consists of only service requestors our main contributions in this paper are as follows we prove that there always exists pure strategy nash equilibrium in the game and the optimal solution of our game is nash equilibrium as well the price of anarchy is proved to be furthermore our heuristic algorithms are introduced to approach the equilibrium state in the sense of the optimal routing and channel assignment response of every requestor while the decisions from other agents are fixed to evaluate our scheme substantial simulation results are presented and the conclusion is twofold our proposal is not far from the optimal even performance gains can be expected as compared with off the shelf techniques
network on chip noc is increasingly needed to interconnect the large number and variety of intellectual property ip cells that make up system on chip soc the network must be able to communicate between cells in different clock domains and do so with minimal space power and latency overhead in this paper we describe an asynchronous noc using an elastic flow protocol and methods of automatically generating topology and router placement we use the communication profile of the soc design to drive the binary tree topology creation and the physical placement of routers and force directed approach to determine router locations the nature of elastic flow removes the need for large router buffers and thus we gain significant power and space advantage compared to traditional nocs additionally our network is deadlock free and paths have bounded worst case communication latencies
to achieve effective distributed components we rely on an active object model from which we build asynchronous and distributed components that feature the capacity to exhibit various valuable properties as confluence and determinism and for which we can specify the behaviour we will emphasise how important it is to rely on precise and formal programming model and how practical component systems can benefit from theoretical inputs
schema mapping is declarative specification of the relationship between instances of source schema and target schema the data exchange or data translation problem asks given an instance over the source schema materialize an instance or solution over the target schema that satisfies the schema mapping in general given source instance may have numerous different solutions among all the solutions universal solutions and core universal solutions have been singled out and extensively studied universal solution is most general one and also represents the entire space of solutions while core universal solution is the smallest universal solution and is unique up to isomorphism hence we can talk about the core the problem of designing efficient algorithms for computing the core has attracted considerable attention in recent years in this paper we present method for directly computing the core by sql queries when schema mappings are specified by source to target tuple generating dependencies tgds unlike prior methods that given source instance first compute target instance and then recursively minimize that instance to the core our method avoids the construction of such intermediate instances this is done by rewriting the schema mapping into laconic schema mapping that is specified by first order tgds with linear order in the active domain of the source instances laconic schema mapping has the property that direct translation of the source instance according to the laconic schema mapping produces the core furthermore laconic schema mapping can be easily translated into sql hence it can be optimized and executed by database system to produce the core we also show that our results are optimal the use of the linear order is inevitable and in general schema mappings with constraints over the target schema cannot be rewritten to laconic schema mapping
combinatorial auctions provide suitable mechanisms for efficient allocation of resources to self interested agents considering ubiquitous computing scenarios the ability to complete an auction within fine grained time period without loss of allocation efficiency is in strong demand furthermore to achieve such scenarios it is very important to handle large number of bids in an auction recently we proposed an algorithm to obtain sufficient quality of winners in very short time however it is demanded to analyze which factor is mainly affected to obtain such good performance also it is demanded to clarify the actual implementation level performance of the algorithm compared to major commercial level generic problem solver in this paper we show our parallel greedy updating approach contributes its better performance furthermore we show our approach has certain advantage compared to latest commercial level implementation of generic lp solver through various experiments
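for context, the classic greedy allocation that such greedy updating schemes build on ranks bids by price over a size norm and accepts any bid that does not conflict with bids already accepted; this sketch is the textbook baseline, not the proposed algorithm:

```python
import math

def greedy_winners(bids):
    """bids: list of (bid_id, price, items) with items a set of goods.
    Rank by price / sqrt(|items|), then accept non-conflicting bids."""
    ranked = sorted(bids, key=lambda b: b[1] / math.sqrt(len(b[2])), reverse=True)
    allocated, winners = set(), []
    for bid_id, price, items in ranked:
        if allocated.isdisjoint(items):
            winners.append(bid_id)
            allocated |= set(items)
    return winners
```

greedy updating style methods then typically revisit the allocation with local swaps that raise total revenue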
the increasing of computational power requirements for dsp and multimedia application and the needs of easy to program development environment has driven recent programmable devices toward very long instruction word vliw architectures and hw sw co design environments vliw architecture allows generating optimized machine code from high level languages exploiting instruction level parallelism ilp furthermore applications requirements and time to market constraints are growing dramatically moving functionalities toward system on chip soc direction this paper presents vliw sim an application driven architecture design approach based on instruction set simulation vliw architectures and instruction set simulation were chosen to fulfill multimedia domain requirements and to implement an efficient hw sw co design environment the vliw sim simulation technology is based on pipeline status modeling simulation cache and simulation oriented hw description an effective support for hw sw co design requires high simulation performance in terms of simulated instruction per second sips flexibility the ability to represent number of different architectures and cycle accuracy there is strong trade off between these features cycle accurate or close to cycle accurate simulation have usually low performance good simulation performance can be obtained losing the simulator flexibility moreover soc simulation requires further degree of flexibility in simulating different components core co processors memories buses the proposed approach is focused on interpretative not compiled re configurable instruction set simulator iss in order to support both application design and architecture exploration vliw sim main features are efficient host resource allocation instruction set and architecture description flexibility instruction set dynamic generation and simulation oriented hardware description step by step pipeline status tracking simulation speed and accuracy performance of simulation test for three validation case studies ti tmscx ti tmscx and st are reported
data broadcasting is well known for its excellent scalability most geographical data such as weather and traffic is public information that has large amount of potential users which makes it very suitable for broadcast the query response time is greatly affected by the order in which data items are being broadcast this paper proposes an efficient method to place geographical data items over broadcast channel that reduces access time for spatial range queries on them this paper then performs evaluation studies comparing different ordering methods random orderings tree traversal ordering hilbert ordering and the optimized ordering based on the proposed method the results show that the optimized ordering is significantly better than the others
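one common way to realize the hilbert ordering baseline is to sort items by the hilbert index of their grid cell; the conversion below is the standard iterative one and is not code from the paper:

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve on an n x n grid,
    n a power of two (standard iterative xy-to-index conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# broadcast order: items.sort(key=lambda it: hilbert_index(1024, it.cx, it.cy))
```

sorting by this index keeps spatially close items close together in the broadcast cycle, which is what shortens access time for range queries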
exhibiting new features and likely related matching techniques to efficiently retrieve images from databases remains an open problem this paper is first devoted to such novel description of coloured textures by lrs local relational string so based on relative relations between neighbour pixels and on their distribution it is illumination invariant and able to capture some semantics local to objects through selected image stripes second we propose bi directional query candidate matching based on region ie potential object mutual preferences the remaining third of the paper reports on the experimentation over images providing results in the treceval format about the performance compared with common available techniques and about the influence of key parameters in the proposed segmentation and pairing processes
an increasing trend in mobile and pervasive computing is the augmentation of everyday public spaces with local computation leading to so called smart environments however there are no well accepted techniques for supporting spontaneous interaction between mobile users and these smart environments though wide range of techniques have been explored ranging from gesture recognition to downloading applications to user’s phone in this paper we explore an approach to supporting such interaction based on the use of bluetooth device user friendly names as control channel between users mobile phones and computational resources in their local environment such an approach has many advantages over existing techniques though it is not without limitations our work focuses specifically on the use of device names to control and customize applications on large public displays in campus environment this paper describes our basic approach number of applications that we have constructed using this technique and the results of our evaluation work which has included range of user studies and field trials the paper concludes with an assessment of the viability of using our approach for interaction scenarios involving mobile users and computationally rich environments
recommender systems have been subject to an enormous rise in popularity and research interest over the last ten years at the same time very large taxonomies for product classification are becoming increasingly prominent among commerce systems for diverse domains rendering detailed machine readable content descriptions feasible amazoncom makes use of an entire plethora of hand crafted taxonomies classifying books movies apparel and various other goods we exploit such taxonomic background knowledge for the computation of personalized recommendations hereby relationships between super concepts and sub concepts constitute an important cornerstone of our novel approach providing powerful inference opportunities for profile generation based upon the classification of products that customers have chosen ample empirical analysis both offline and online demonstrates our proposal’s superiority over common existing approaches when user information is sparse and implicit ratings prevail
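an illustrative sketch of the core profile generation step, propagating interest from a purchased product's leaf category up through its super concepts with decaying weight (helper names and the decay value are assumptions, not taken from the paper):

```python
def taxonomy_profile(purchased_categories, parent_of, decay=0.5):
    """purchased_categories: leaf category ids the user bought from.
    parent_of: category -> super concept (None at the root).
    Returns {category: accumulated interest score}."""
    profile = {}
    for leaf in purchased_categories:
        node, weight = leaf, 1.0
        while node is not None:
            profile[node] = profile.get(node, 0.0) + weight
            node = parent_of.get(node)
            weight *= decay               # super concepts receive less credit
    return profile
```

user to user similarity can then be computed over these taxonomy level profiles (for example with a cosine measure), which is what helps when explicit ratings are sparse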
many traditional information retrieval models such as bm and language modeling give good retrieval effectiveness but can be difficult to implement efficiently recently document centric impact models were developed in order to overcome some of these efficiency issues however such models have number of problems including poor effectiveness and heuristic term weighting schemes in this work we present statistical view of document centric impact models we describe how such models can be treated statistically and propose supervised parameter estimation technique we analyze various theoretical and practical aspects of the model and show that weights estimated using our new estimation technique are significantly better than the integer based weights used in previous studies
the context interchange strategy presents novel perspective for mediated data access in which semantic conflicts among heterogeneous systems are not identified priori but are detected and reconciled by context mediator through comparison of contexts axioms corresponding to the systems engaged in data exchange in this article we show that queries formulated on shared views export schema and shared ontologies can be mediated in the same way using the context interchange framework the proposed framework provides logic based object oriented formalism for representing and reasoning about data semantics in disparate systems and has been validated in prototype implementation providing mediated data access to both traditional and web based information sources
data flow testing is well known technique and it has proved to be better than the commercially used branch testing the problem with data flow testing is that apart from scalar variables only approximate information is available this paper presents an algorithm that precisely determines the definition use pairs for arrays within large domain there are numerous methods addressing the array data flow problem however these methods are only used in the optimization or parallelization of programs data flow testing however requires at least one real solution of the problem for which the necessary program path is executed contrary to former precise methods we avoid negation in formulae which seems to be the biggest problem in all previous methods
we present question answering qa system which learns how to detect and rank answer passages by analyzing questions and their answers qa pairs provided as training data we built our system in only few person months using off the shelf components part of speech tagger shallow parser lexical network and few well known supervised learning algorithms in contrast many of the top trec qa systems are large group efforts using customized ontologies question classifiers and highly tuned ranking functions our ease of deployment arises from using generic trainable algorithms that exploit simple feature extractors on qa pairs with trec qa data our system achieves mean reciprocal rank mrr that compares favorably with the best scores in recent years and generalizes from one corpus to another our key technique is to recover from the question fragments of what might have been posed as structured query had suitable schema been available one fragment comprises selectors tokens that are likely to appear almost unchanged in an answer passage the other fragment contains question tokens which give clues about the answer type and are expected to be replaced in the answer passage by tokens which specialize or instantiate the desired answer type selectors are like constants in where clauses in relational queries and answer types are like column names we present new algorithms for locating selectors and answer type clues and using them in scoring passages with respect to question
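a toy illustration (a simple heuristic, not the paper's trained extractors) of splitting a question into answer type clues and selectors:

```python
WH_WORDS = {"who", "what", "when", "where", "which", "how", "why", "whom"}
STOP = {"is", "are", "was", "were", "the", "a", "an", "of", "in", "do", "does", "did"}

def split_question(question):
    """Wh-words and the token right after them are treated as answer-type
    clues; remaining non-stop tokens are treated as selectors, i.e. tokens
    expected to reappear almost unchanged in an answer passage."""
    tokens = question.lower().rstrip("?").split()
    clues, selectors = [], []
    for i, tok in enumerate(tokens):
        if tok in WH_WORDS or (i > 0 and tokens[i - 1] in WH_WORDS):
            clues.append(tok)
        elif tok not in STOP:
            selectors.append(tok)
    return clues, selectors

# split_question("Which river flows through Vienna?")
# -> (['which', 'river'], ['flows', 'through', 'vienna'])
```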
while the use of mapreduce systems such as hadoop for large scale data analysis has been widely recognized and studied we have recently seen an explosion in the number of systems developed for cloud data serving these newer systems address cloud oltp applications though they typically do not support acid transactions examples of systems proposed for cloud serving use include bigtable pnuts cassandra hbase azure couchdb simpledb voldemort and many others further they are being applied to diverse range of applications that differ considerably from traditional eg tpc like serving workloads the number of emerging cloud serving systems and the wide range of proposed applications coupled with lack of apples to apples performance comparisons makes it difficult to understand the tradeoffs between systems and the workloads for which they are suited we present the yahoo cloud serving benchmark ycsb framework with the goal of facilitating performance comparisons of the new generation of cloud data serving systems we define core set of benchmarks and report results for four widely used systems cassandra hbase yahoo pnuts and simple sharded mysql implementation we also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source in this regard key feature of the ycsb framework tool is that it is extensible it supports easy definition of new workloads in addition to making it easy to benchmark new systems
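not the benchmark's own code, but a minimal sketch of the skewed key chooser at the heart of such serving workloads (a zipf like popularity distribution over keys; parameter values are illustrative):

```python
import random

def make_zipfian_chooser(num_keys, theta=0.99, rng=random):
    """Return a function drawing key ids 0..num_keys-1 with Zipf-like skew,
    so a few hot keys absorb most of the operations."""
    weights = [1.0 / (i + 1) ** theta for i in range(num_keys)]
    return lambda: rng.choices(range(num_keys), weights=weights, k=1)[0]

# choose = make_zipfian_chooser(10_000)
# ops = [("read" if random.random() < 0.95 else "update", choose())
#        for _ in range(1_000)]
```

a full workload definition also fixes the read/update/insert/scan mix and record sizes, which is the kind of knob an extensible benchmark framework exposes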
the paper focuses on motion based information extraction from cluttered video image sequences novel method is introduced which can reliably detect walking human figures contained in such images the method works with spatio temporal input information to detect and classify patterns typical of human movement our algorithm consists of real time operations which is an important factor in practical applications the paper presents new information extraction and temporal tracking method based on simplified version of the symmetry pattern extraction which pattern is characteristic for the moving legs of walking person these spatio temporal traces are labelled by kernel fisher discriminant analysis with the use of temporal tracking and non linear classification we have achieved pedestrian detection from cluttered image scenes with correct classification rate of from to step periods the detection rates of linear classifier and svm are also presented in the results hereby the necessity of non linear method and the power of kfda for this detection task is also demonstrated
we present an adaptive distributed query sampling framework that is quality conscious for extracting high quality text database samples the framework divides the query based sampling process into an initial seed sampling phase and quality aware iterative sampling phase in the second phase the sampling process is dynamically scheduled based on estimated database size and quality parameters derived during the previous sampling process the unique characteristic of our adaptive query based sampling framework is its self learning and self configuring ability based on the overall quality of all text databases under consideration we introduce three quality conscious sampling schemes for estimating database quality and our initial results show that the proposed framework supports higher quality document sampling than existing approaches
verified compilers such as leroy’s compcert are accompanied by fully checked correctness proof both the compiler and proof are often constructed with an interactive proof assistant this technique provides strong end to end correctness guarantee on top of small trusted computing base unfortunately these compilers are also challenging to extend since each additional transformation must be proven correct in full formal detail at the other end of the spectrum techniques for compiler correctness based on domain specific language for writing optimizations such as lerner’s rhodium and cobalt make the compiler easy to extend the correctness of additional transformations can be checked completely automatically unfortunately these systems provide weaker guarantee since their end to end correctness has not been proven fully formally we present an approach for compiler correctness that provides the best of both worlds by bridging the gap between compiler verification and compiler extensibility in particular we have extended leroy’s compcert compiler with an execution engine for optimizations written in domain specific language and proved that this execution engine preserves program semantics using the coq proof assistant we present our compcert extension xcert including the details of its execution engine and proof of correctness in coq furthermore we report on the important lessons learned for making the proof development manageable
embedded systems often include traditional processor capable of executing sequential code but both control and data dominated tasks are often more naturally expressed using one of the many domain specific concurrent specification languages this article surveys variety of techniques for translating these concurrent specifications into sequential code the techniques address compiling wide variety of languages ranging from dataflow to petri nets each uses different method to some degree chosen to match the semantics of concurrent language each technique is considered to consist of partial evaluator operating on an interpreter this combination provides clearer picture of how parts of each technique could be used in different setting
security is major target for today’s information systems is designers security modelling languages exist to reason on security in the early phases of is development when the most crucial design decisions are made reasoning on security involves analysing risk and effectively communicating risk related information however we think that current languages can be improved in this respect in this paper we discuss this issue for secure tropos the language supporting the eponymous agent based is development we analyse it and suggest improvements in the light of an existing reference model for is security risk management this allows for checking secure tropos concepts and terminology against those of current risk management standards thereby improving the conceptual appropriateness of the language the paper follows running example called esap located in the healthcare domain
barycentric coordinates can be used to express any point inside triangle as unique convex combination of the triangle’s vertices and they provide convenient way to linearly interpolate data that is given at the vertices of triangle in recent years the ideas of barycentric coordinates and barycentric interpolation have been extended to arbitrary polygons in the plane and general polytopes in higher dimensions which in turn has led to novel solutions in applications like mesh parameterization image warping and mesh deformation in this paper we introduce new generalization of barycentric coordinates that stems from the maximum entropy principle the coordinates are guaranteed to be positive inside any planar polygon can be evaluated efficiently by solving convex optimization problem with newton’s method and experimental evidence indicates that they are smooth inside the domain moreover the construction of these coordinates can be extended to arbitrary polyhedra and higher dimensional polytopes
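A hedged sketch of maximum-entropy-style coordinates for a convex planar polygon: minimise the dual log-partition function (here with a generic quasi-Newton optimiser rather than the paper's Newton solver, and with uniform priors), then read the coordinates off the optimal multipliers. The example polygon and point are arbitrary.

import numpy as np
from scipy.optimize import minimize

def max_entropy_coordinates(vertices, x):
    d = np.asarray(vertices, dtype=float) - np.asarray(x, dtype=float)  # v_i - x

    def dual(mu):                       # log-partition function, convex in mu
        return np.log(np.exp(-d @ mu).sum())

    def grad(mu):
        w = np.exp(-d @ mu)
        return -(w / w.sum()) @ d

    res = minimize(dual, np.zeros(2), jac=grad, method="BFGS")
    w = np.exp(-d @ res.x)
    return w / w.sum()                  # positive, sums to one, reproduces x

if __name__ == "__main__":
    poly = [(0, 0), (2, 0), (2, 1), (0, 1)]
    lam = max_entropy_coordinates(poly, (0.5, 0.5))
    print(lam, lam @ np.array(poly))    # second output should be close to (0.5, 0.5)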
recent research has shown how the formal modeling of concurrent systems can benefit from monadic structuring with this approach formal system model is really program in domain specific language defined by monad for shared state concurrency can these models be compiled into efficient implementations this paper addresses this question and presents an overview of techniques for compiling monadic concurrency models directly into reasonably efficient software and hardware implementations the implementation techniques described in this article form the basis of semantics directed approach to model driven engineering
it is important to consider fast and reliable detections of sensing events occurring in various parts of the sensor network we propose cluster based energy aware event detection scheme where events are reliably relayed to sink in the form of aggregated data packets the clustering scheme provides faster and better event detection and reliability control capabilities to the areas of the network where an event is occurring it also reduces network overhead latency and loss of event information due to cluster rotation the proposed scheme has the following new features new concept of energy level based cluster head ch selection event packet being capable of transmitting from the chs to the sink while the clusters are being formed the sink’s assigning dynamically adaptable reliability factor to clusters mechanism used to control the transmission rate of the sensors according to the assigned cluster reliability etc
case study in adaptive information filtering systems for the web is presented the described system comprises two main modules named humos and wifs humos is user modeling system based on stereotypes it builds and maintains long term models of individual internet users representing their information needs the user model is structured as frame containing informative words enhanced with semantic networks the proposed machine learning approach for the user modeling process is based on the use of an artificial neural network for stereotype assignments wifs is content based information filtering module capable of selecting html text documents on computer science collected from the web according to the interests of the user it has been created specifically to fit the structure of the user model utilized by humos currently this system acts as an adaptive interface to the web search engine alta vista an empirical evaluation of the system has been made in experimental settings the experiments focused on the evaluation by means of non parametric statistics approach of the added value in terms of system performance given by the user modeling component it also focused on the evaluation of the usability and user acceptance of the system the results of the experiments are satisfactory and support the choice of user model based approach to information filtering on the web
in modern computers program’s data locality can affect performance significantly this paper details full sparse tiling run time reordering transformation that improves the data locality for stationary iterative methods such as gauss seidel operating on sparse matrices in scientific applications such as finite element analysis these iterative methods dominate the execution time full sparse tiling chooses permutation of the rows and columns of the sparse matrix and then an order of execution that achieves better data locality we prove that full sparse tiled gauss seidel generates solution that is bitwise identical to traditional gauss seidel on the permuted matrix we also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids related technique developed by douglas et al
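The sketch below only shows the reference computation in the correctness claim above: a plain sequential Gauss-Seidel sweep over a symmetrically permuted sparse matrix. The tiling itself and the reordering heuristic are not reproduced; matrix, permutation and sweep count are made up for the example.

import numpy as np
import scipy.sparse as sp

def gauss_seidel_sweeps(A, b, x, sweeps=25):
    # in-place sequential Gauss-Seidel sweeps over a CSR matrix
    A = sp.csr_matrix(A)
    indptr, indices, data = A.indptr, A.indices, A.data
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            s, diag = 0.0, 0.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    s += data[k] * x[j]
            x[i] = (b[i] - s) / diag
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    A = sp.random(n, n, density=0.4, random_state=0).toarray()
    A = A + A.T + n * np.eye(n)          # symmetric, diagonally dominant
    perm = rng.permutation(n)
    Ap = A[np.ix_(perm, perm)]           # permute rows and columns together
    b = rng.standard_normal(n)
    x = gauss_seidel_sweeps(Ap, b, np.zeros(n))
    print(np.linalg.norm(Ap @ x - b))    # residual of the permuted sweep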
we generalise fiore et al’s account of variable binding for untyped cartesian contexts to give an account of binding for either variables or names that may be typed we do this in an enriched setting allowing the incorporation of recursion into the analysis extending earlier work by us we axiomatise the notion of context by defining and using the notion of an enriched pseudo monad on cat with leading examples of given by set and omega cpo the latter yielding an account of recursion fiore et al implicitly used the pseudo monad fp on cat for small categories with finite products given set of types our extension to typed binders and enrichment involves generalising from fiore et al’s use of set to sa op we define substitution monoidal structure on sa op allowing us to give definition of binding signature at this level of generality and extend initial algebra semantics to the typed enriched axiomatic setting this generalises and axiomatises previous work by fiore et al and later authors in particular cases in particular it includes the logic of bunched implications and variants infinitary examples and structures not previously considered such as those generated by finite limits
future mobile markets are expected to increasingly embrace location based services this paper presents new system architecture for location based services which consists of location database and distributed location anonymizers the service is privacy aware in the sense that the location database always maintains degree of anonymity the location database service permits three different levels of query and can thus be used to implement wide range of location based services furthermore the architecture is scalable and employs simple functions that are similar to those found in general database systems
while runahead execution is effective at parallelizing independent long latency cache misses it is unable to parallelize dependent long latency cache misses to overcome this limitation this paper proposes novel technique address value delta avd prediction an avd predictor keeps track of the address pointer load instructions for which the arithmetic difference ie delta between the effective address and the data value is stable if such load instruction incurs long latency cache miss during runahead execution its data value is predicted by subtracting the stable delta from its effective address this prediction enables the pre execution of dependent instructions including load instructions that incur long latency cache misses we describe how why and for what kind of loads avd prediction works and evaluate the design tradeoffs in an implementable avd predictor our analysis shows that stable avds exist because of patterns in the way data structures are allocated in memory our results show that augmenting runahead processor with simple entry avd predictor improves the average execution time of set of pointer intensive applications by
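A toy software model of the address-value delta idea: per static load, track the delta between effective address and loaded value, and once the delta has repeated a few times use it to predict the value of a miss during runahead. Table organisation, confidence threshold and the linked-list example are illustrative, not the evaluated hardware design.

class AVDPredictor:
    def __init__(self, confidence_threshold=2):
        self.table = {}                      # load PC -> (last_delta, confidence)
        self.threshold = confidence_threshold

    def train(self, pc, address, value):
        delta = address - value
        last, conf = self.table.get(pc, (None, 0))
        conf = conf + 1 if delta == last else 0
        self.table[pc] = (delta, conf)

    def predict(self, pc, address):
        delta, conf = self.table.get(pc, (None, 0))
        if conf >= self.threshold:
            return address - delta           # predicted value for a missing load
        return None

if __name__ == "__main__":
    p = AVDPredictor()
    # a linked list whose nodes were allocated at a fixed stride yields a stable delta
    for addr, val in [(0x1000, 0x0FC0), (0x1040, 0x1000), (0x1080, 0x1040)]:
        p.train(pc=0x400, address=addr, value=val)
    print(hex(p.predict(pc=0x400, address=0x10C0)))   # -> 0x1080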
skyline queries compute the set of pareto optimal tuples in relation that is those tuples that are not dominated by any other tuple in the same relation although several algorithms have been proposed for efficiently evaluating skyline queries they either necessitate the relation to have been indexed or have to perform the dominance tests on all the tuples in order to determine the result in this article we introduce salsa novel skyline algorithm that exploits the idea of presorting the input data so as to effectively limit the number of tuples to be read and compared this makes salsa also attractive when skyline queries are executed on top of systems that do not understand skyline semantics or when the skyline logic runs on clients with limited power and or bandwidth we prove that if one considers symmetric sorting functions the number of tuples to be read is minimized by sorting data according to minimum coordinate minc criterion and that performance can be further improved if data distribution is known and an asymmetric sorting function is used experimental results obtained on synthetic and real datasets show that salsa consistently outperforms state of the art sequential skyline algorithms and that its performance can be accurately predicted
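A rough sketch of a presort-based skyline scan in the spirit of the minC idea (minimisation convention): sort tuples by their minimum coordinate, maintain the skyline incrementally, and stop once some skyline point is guaranteed to dominate everything still unread. The tie-breaking and the sample data are arbitrary.

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_presorted(tuples):
    data = sorted(tuples, key=lambda t: (min(t), sum(t)))
    skyline, stop = [], float("inf")      # stop = smallest max-coordinate in the skyline
    for t in data:
        if min(t) > stop:                 # every remaining tuple is dominated as well
            break
        if not any(dominates(s, t) for s in skyline):
            skyline.append(t)
            stop = min(stop, max(t))
    return skyline

if __name__ == "__main__":
    pts = [(1, 9), (3, 3), (2, 7), (6, 2), (8, 8), (9, 1), (5, 5)]
    print(skyline_presorted(pts))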
how can we automatically spot all outstanding observations in data set this question arises in large variety of applications eg in economy biology and medicine existing approaches to outlier detection suffer from one or more of the following drawbacks the results of many methods strongly depend on suitable parameter settings being very difficult to estimate without background knowledge on the data eg the minimum cluster size or the number of desired outliers many methods implicitly assume gaussian or uniformly distributed data and or their result is difficult to interpret to cope with these problems we propose coco technique for parameter free outlier detection the basic idea of our technique relates outlier detection to data compression outliers are objects which can not be effectively compressed given the data set to avoid the assumption of certain data distribution coco relies on very general data model combining the exponential power distribution with independent components we define an intuitive outlier factor based on the principle of the minimum description length together with a novel algorithm for outlier detection an extensive experimental evaluation on synthetic and real world data demonstrates the benefits of our technique availability the source code of coco and the data sets used in the experiments are available at http wwwdbsifilmude forschung kdd boehm coco
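A simplified, compression-flavoured illustration of the idea above: fit an exponential power (generalised normal) distribution to each coordinate independently and score each point by its coding cost in bits. The real method learns independent components and an MDL-based factor; this sketch, with invented data, only shows the coding-cost intuition.

import numpy as np
from scipy.stats import gennorm

def coding_cost_scores(X):
    X = np.asarray(X, dtype=float)
    bits = np.zeros(len(X))
    for j in range(X.shape[1]):
        beta, loc, scale = gennorm.fit(X[:, j])
        # negative log-likelihood in bits, assuming independent coordinates
        bits += -gennorm.logpdf(X[:, j], beta, loc, scale) / np.log(2.0)
    return bits        # high coding cost = poorly compressible = outlier candidate

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    inliers = rng.normal(0.0, 1.0, size=(200, 2))
    outlier = np.array([[8.0, -7.0]])
    scores = coding_cost_scores(np.vstack([inliers, outlier]))
    print(int(np.argmax(scores)))   # expected: index of the injected outlier (200)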
traditional collision intensive multi body simulations are difficult to control due to extreme sensitivity to initial conditions or model parameters furthermore there may be multiple ways to achieve any one goal and it may be difficult to codify user’s preferences before they have seen the available solutions in this paper we extend simulation models to include plausible sources of uncertainty and then use markov chain monte carlo algorithm to sample multiple animations that satisfy constraints user can choose the animation they prefer or applications can take direct advantage of the multiple solutions our technique is applicable when probability can be attached to each animation with good animations having high probability and for such cases we provide definition of physical plausibility for animations we demonstrate our approach with examples of multi body rigid body simulations that satisfy constraints of various kinds for each case presenting animations that are true to physical model are significantly different from each other and yet still satisfy the constraints
we present data from detailed observation of information workers that shows that they experience work fragmentation as common practice we consider that work fragmentation has two components length of time spent in an activity and frequency of interruptions we examined work fragmentation along three dimensions effect of collocation type of interruption and resumption of work we found work to be highly fragmented people average little time in working spheres before switching and of their working spheres are interrupted collocated people work longer before switching but have more interruptions most internal interruptions are due to personal work whereas most external interruptions are due to central work though most interrupted work is resumed on the same day more than two intervening activities occur before it is we discuss implications for technology design how our results can be used to support people to maintain continuity within larger framework of their working spheres
prototyping is the pivotal activity that structures innovation collaboration and creativity in design prototypes embody design hypotheses and enable designers to test them framing design as thinking by doing activity foregrounds iteration as central concern this paper presents dtools toolkit that embodies an iterative design centered approach to prototyping information appliances this work offers contributions in three areas first dtools introduces statechart based visual design tool that provides low threshold for early stage prototyping extensible through code for higher fidelity prototypes second our research introduces three important types of hardware extensibility at the hardware to pc interface the intra hardware communication level and the circuit level third dtools integrates design test and analysis of information appliances we have evaluated dtools through three studies laboratory study with thirteen participants rebuilding prototypes of existing and emerging devices and by observing seven student teams who built prototypes with dtools
increasingly software is required to be ready to adapt itself to the changing environment caused by wide range of maintenance evolution and operation problems furthermore in large complex distributed systems and continuous running systems the traditional approaches to bringing about change require that the system be taken offline temporarily which is often undesirable due to requirements for high availability to address this new kind of capability dynamic software adaptation which refers to software changes in both structure and behavior without bringing it down is proposed in this paper we explore an architecture based mobile agent approach to dynamic software adaptation our goal is to automate the software adaptation on the fly on the basis of explicating and reasoning about architectural knowledge about the running system for that we introduce the dynamic software architecture which means the architecture itself can also be introspected and altered at runtime to guide and control the adaptation we use the architectural reflection to observe and control the system architecture while we use the architectural style to ensure the consistency and correctness of the architecture reconfiguration to handle the adaptation of the running system mobile agents which are well suited for complex management issues are employed mobile agents carry self contained mobile code and act upon running components the usage of meta architecture and the mobile agents not only forms an adaptation feedback loop onto the running system it also separates the concerns among the architectural model the target system and the facilities used for adaptation it will simplify the developing deploying and maintaining of the system while providing good basis for enabling the reuse of the adaptation facilities
during the last decade multimedia databases have become increasingly important in many application areas such as medicine cad geography and molecular biology an important research topic in multimedia databases is similarity search in large data sets most current approaches that address similarity search use the feature approach which transforms important properties of the stored objects into points of high dimensional space feature vectors thus similarity search is transformed into neighborhood search in feature space multidimensional index structures are usually applied when managing feature vectors query processing can be improved substantially with optimization techniques such as blocksize optimization data space quantization and dimension reduction to determine optimal parameters an accurate estimate of index based query processing performance is crucial in this paper we develop cost model for index structures for point databases such as the tree and the tree it provides accurate estimates of the number of data page accesses for range queries and nearest neighbor queries under euclidean metric and maximum metric the problems specific to high dimensional data spaces called boundary effects are considered the concept of the fractal dimension is used to take the effects of correlated data into account
this paper describes new hair rendering technique for anime characters the overall goal is to improve current cel shaders by introducing new hair model and hair shader the hair renderer is based on painterly rendering algorithm which uses large amount of particles the hair model is rendered twice first for generating the silhouettes and second for shading the hair strands in addition we also describe modified technique for specular highlighting most of the rendering steps except the specular highlighting are performed on the gpu and take advantage of recent graphics hardware however since the number of particles determines the quality of the hair shader large number of particles is used which reduces the performance accordingly
the ever increasing gap between processor and memory speeds has motivated the design of embedded systems with deeper cache hierarchies to avoid excessive miss rates instead of using bigger cache memories and more complex cache controllers program transformations have been proposed to reduce the amount of capacity and conflict misses this is achieved however by complicating the memory index arithmetic code which results in performance degradation when executing the code on programmable processors with limited address capabilities however when these are complemented by high level address code transformations the overhead introduced can be largely eliminated at compile time in this paper the clear benefits of the combined approach are illustrated on two real life applications of industrial relevance using popular programmable processor architectures and showing important gains in energy factor less with relatively small penalty in execution time instead of factors overhead without the address optimisation stage the results of this paper lead to systematic pareto optimal trade off supported by tools between memory power and cpu cycles which has up to now not been feasible for the targeted systems
we describe three generations of information integration systems developed at george mason university all three systems adopt virtual database design global integration schema mapping between this schema and the schemas of the participating information sources and automatic interpretation of global queries the focus of multiplex is rapid integration of very large evolving and heterogeneous collections of information sources fusionplex strengthens these capabilities with powerful tools for resolving data inconsistencies finally autoplex takes more proactive approach to integration by recruiting contributions to the global integration schema from available information sources using machine learning techniques it confronts major cost of integration that of mapping new sources into the global schema
session takes place between two parties after establishing connection each party interleaves local computations and communications sending or receiving with the other session types characterise such sessions in terms of the types of values communicated and the shape of protocols and have been developed for the calculus corba interfaces and functional languages we study the incorporation of session types into object oriented languages through moose multi threaded language with session types thread spawning iterative and higher order sessions our design aims to consistently integrate the object oriented programming style and sessions and to be able to treat various case studies from the literature we describe the design of moose its syntax operational semantics and type system and develop type inference system after proving subject reduction we establish the progress property once communication has been established well typed programs will never starve at communication points
extracting data on the web is an important information extraction task most existing approaches rely on wrappers which require human knowledge and user interaction during extraction this paper proposes the use of conditional models as an alternative solution to this task deriving the strength of conditional models like maximum entropy and maximum entropy markov models our method offers three major advantages the full automation the ability to incorporate various non independent overlapping features of different hypertext representations and the ability to deal with missing and disordered data fields the experimental results on wide range of commercial websites with different layouts show that our method can achieve satisfactory trade off between automation and accuracy and also provide practical application of automated data extraction from the web
the authors present an approach to acquiring knowledge from previously processed queries by using newly acquired knowledge together with given semantic knowledge it is possible to make the query processor and or optimizer more intelligent so that future queries can be processed more efficiently the acquired knowledge is in the form of constraints while some constraints are to be enforced for all database states others are known to be valid for the current state of the database the former constraints are static integrity constraints while the latter are called dynamic integrity constraints some situations in which certain dynamic semantic constraints can be automatically extracted are identified this automatic tool for knowledge acquisition can also be used as an interactive tool for identifying potential static integrity constraints the concept of minimal knowledge base is introduced and method to maintain the knowledge base is presented an algorithm to compute the restriction selection closure ie all deducible restrictions from given set of restrictions join predicates as given in query and constraints is given
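A toy fixpoint computation in the spirit of a restriction-selection closure: single-attribute restrictions are propagated across equi-join predicates until nothing new can be deduced. The attribute names, triple representation and example are invented for illustration.

def restriction_closure(restrictions, join_predicates):
    closure = set(restrictions)                 # e.g. ("emp.dept", "<", 10)
    changed = True
    while changed:
        changed = False
        for a, b in join_predicates:            # e.g. ("emp.dept", "dept.id")
            for attr, op, const in list(closure):
                if attr == a and (b, op, const) not in closure:
                    closure.add((b, op, const))
                    changed = True
                if attr == b and (a, op, const) not in closure:
                    closure.add((a, op, const))
                    changed = True
    return closure

if __name__ == "__main__":
    r = {("emp.dept", "<", 10)}
    j = [("emp.dept", "dept.id"), ("dept.id", "budget.dept")]
    for fact in sorted(restriction_closure(r, j)):
        print(fact)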
top queries are desired aggregation operations on data sets examples of queries on network data include the top source as’s top ports or top domain names over ip packets or over ip flow records since the complete dataset is often not available or not feasible to examine we are interested in processing top queries from samples if all records can be processed the top items can be obtained by counting the frequency of each item even when the full dataset is observed however resources are often insufficient for such counting and techniques were developed to overcome this issue when we can observe only random sample of the records an orthogonal complication arises the top frequencies in the sample are biased estimates of the actual top frequencies this bias depends on the distribution and must be accounted for when seeking the actual value we address this by designing and evaluating several schemes that derive rigorous confidence bounds for top estimates simulations on various data sets that include ip flows data show that schemes that exploit more of the structure of the sample distribution produce much tighter confidence intervals with an order of magnitude fewer samples than simpler schemes that utilize only the sampled top frequencies the simpler schemes however are more efficient in terms of computation our work is basic and is widely applicable to all applications that process top and heavy hitters queries over random sample of the actual records
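A basic illustration of deriving confidence bounds on actual item frequencies from a random sample, using per-item Clopper-Pearson binomial intervals. This is only the simplest baseline that uses the sampled top counts alone; the more elaborate schemes that exploit the full sample distribution are not shown, and the stream and sampling rate are made up.

from collections import Counter
from scipy.stats import beta

def clopper_pearson(count, n, alpha=0.05):
    lo = beta.ppf(alpha / 2, count, n - count + 1) if count > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, count + 1, n - count) if count < n else 1.0
    return lo, hi

def sampled_topk_bounds(sample, k=3, alpha=0.05):
    n = len(sample)
    counts = Counter(sample).most_common(k)
    return [(item, c / n, clopper_pearson(c, n, alpha)) for item, c in counts]

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    stream = [rng.choice("aaaaabbbccd") for _ in range(100000)]
    sample = [x for x in stream if rng.random() < 0.01]     # roughly a 1% sample
    for item, est, (lo, hi) in sampled_topk_bounds(sample):
        print(item, round(est, 3), round(lo, 3), round(hi, 3))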
the existence of high degree of free riding is serious threat to peer to peer pp networks in this paper we propose distributed framework to reduce the adverse effects of free riding on pp networks our solution primarily focuses on locating free riders and taking actions against them we propose framework in which each peer monitors its neighbors decides if they are free riders and takes appropriate actions unlike other proposals against free riding our framework does not require any permanent identification of peers or security infrastructures for maintaining global reputation system our simulation results show that the framework can reduce the effects of free riding and can therefore increase the performance of pp network
software performance is an important non functional quality attribute and software performance evaluation is an essential activity in the software development process especially in embedded real time systems software design and evaluation are driven by the needs to optimize the limited resources to respect time deadlines and at the same time to produce the best experience for end users software product family architectures add additional requirements to the evaluation process in this case the evaluation includes the analysis of the optimizations and tradeoffs for the whole products in the family performance evaluation of software product family architectures requires knowledge and clear understanding of different domains software architecture assessments software performance and software product family architecture we have used scenario driven approach to evaluate performance and dynamic memory management efficiency in one nokia software product family architecture in this paper we present two case studies furthermore we discuss the implications and tradeoffs of software performance against evolvability and maintainability in software product family architectures
portlet syndication is the next wave following the successful use of content syndication in current portals portlets can be regarded as web components and the portal as the component container where portlets are aggregated to provide higher order applications this perspective requires departure from how current web portals are envisaged the portal is no longer perceived as set of pages but as an integrated set of web components that are now delivered through the portal from this perspective the portal page now acts as mere conduit for portlets page and page navigation dilute in favor of portlet and portlet orchestration however the mapping from portlet orchestration design time to page navigation implementation time is too tedious and error prone for instance the fact that the same portlet can be placed in distinct pages produces code clones that are repeated along the pages that contain this portlet this redundancy substantiates in the first place the effort to move to model driven development this work uses the exo platform as the target psm and the pim is based on hypermedia model based on statecharts the paper shows how this approach accounts for portal validation verification to be conducted earlier at the pim level and streamlines both design and implementation of exo portals running example is used throughout the paper
to ensure sustainable operations of wireless sensor systems environmental energy harvesting has been regarded as the right solution for long term applications in energy dynamic environments energy conservation is no longer considered necessarily beneficial because energy storage units eg batteries or capacitors are limited in capacity and leakage prone in contrast to legacy energy conservation approaches we aim at energy synchronization for wireless sensor devices the starting point of this work is twinstar which uses ultra capacitor as the only energy storage unit to efficiently use the harvested energy we design and implement leakage aware feedback control techniques to match local and network wide activity of sensor nodes with the dynamic energy supply from environments we conduct system evaluation under three typical real world settings indoor outdoor and mobile backpack under wide range of system settings results indicate our leakage aware control can effectively utilize energy that could otherwise leak away nodes running leakage aware control can enjoy more energy than the ones running non leakage aware control and application performance eg event detection can be improved significantly
ml modules provide hierarchical namespace management as well as fine grained control over the propagation of type information but they do not allow modules to be broken up into mutually recursive separately compilable components mixin modules facilitate recursive linking of separately compiled components but they are not hierarchically composable and typically do not support type abstraction we synthesize the complementary advantages of these two mechanisms in novel module system design we call mixml mixml module is like an ml structure in which some of the components are specified but not defined in other words it unifies the ml structure and signature languages into one mixml seamlessly integrates hierarchical composition translucent ml style data abstraction and mixin style recursive linking moreover the design of mixml is clean and minimalist it emphasizes how all the salient semantically interesting features of the ml module system as well as several proposed extensions to it can be understood simply as stylized uses of small set of orthogonal underlying constructs with mixin composition playing central role
we present new approach to fluid simulation that balances the speed of model reduction with the flexibility of grid based methods we construct set of composable reduced models or tiles which capture spatially localized fluid behavior we then precompute coupling terms so that these models can be rearranged at runtime to enforce consistency between tiles we introduce constraint reduction this technique modifies reduced model so that given set of linear constraints can be fulfilled because dynamics and constraints can be solved entirely in the reduced space our method is extremely fast and scales to large domains
we present an interactive system for reconstructing surface normals from single image our approach has two complementary contributions first we introduce novel shape from shading algorithm sfs that produces faithful normal reconstruction for local image region high frequency component but it fails to faithfully recover the overall global structure low frequency component our second contribution consists of an approach that corrects low frequency error using simple markup procedure this approach aptly called rotation palette allows the user to specify large scale corrections of surface normals by drawing simple stroke correspondences between the normal map and sphere image which represents rotation directions combining these two approaches we can produce high quality surfaces quickly from single images
we propose new notion of declassification policy called linear declassification linear declassification controls not only which functions may be applied to declassify high security values but also how often the declassification functions may be applied we present linear type system which guarantees that well typed programs never violate linear declassification policies to state formal security property guaranteed by the linear declassification we also introduce linear relaxed non interference as an extension of li and zdancewic’s relaxed non interference an application of the linear relaxed non interference to quantitative information flow analysis is also discussed
information processing is often such large and complex artifact that it goes beyond human being’s capacity to conceive model and develop it with all of its aspects at time for this reason it is typical for one to focus on some aspects of it in one time and on other aspects in another time depending on the problem at hand for recurrent situations it is necessary to have structured and well defined set of perspectives which guide selections and shifts of focuses this paper presents light weight perspective ontology which provides set of well defined perspectives established on three dimensions to conceive issues in information processing in an organized manner the perspective ontology can be applied on different information processing layers such as information systems is information systems development isd and method engineering me to demonstrate the applicability of the ontology it is used to derive set of is perspectives with basic is concepts and constructs the is perspectives are then deployed as framework in comparative analysis of current perspectives in the is literature
the unification problem in algebras capable of describing sets has been tackled directly or indirectly by many researchers and it finds important applications in various research areas eg deductive databases theorem proving static analysis rapid software prototyping the various solutions proposed are spread across large literature in this paper we provide uniform presentation of unification of sets formalizing it at the level of set theory we address the problem of deciding existence of solutions at an abstract level this provides also the ability to classify different types of set unification problems unification algorithms are uniformly proposed to solve the unification problem in each of such classes the algorithms presented are partly drawn from the literature and properly revisited and analyzed and partly novel proposals in particular we present new goal driven algorithm for general aci unification and new simpler algorithm for general ab ell unification
we present scheme for optimal vlsi layout and packaging of butterfly networks under the thompson model the multilayer grid model and the hierarchical layout model we show that when layers of wires are available an node butterfly network can be laid out with area log log maximum wire length log log and volume log log under the multilayer grid model where only one active layer for network nodes is required and layers of wires are available our layout scheme allows us to partition an node butterfly network into thgr node clusters with an average of dic ap log log for any constant integer inter cluster links per node leading to optimal layout and packaging at the same time under the hierarchical layout model the scalability of our layouts are optimal in that we can allow each of log nodes to occupy an area as large as log and each of the remaining network nodes to occupy an area as large as log without increasing the leading constants of layout area volume or maximum wire length
new and emerging pp applications require sophisticated range query capability and also have strict requirements on query correctness system availability and item availability while there has been recent work on developing new pp range indices none of these indices guarantee correctness and availability in this paper we develop new techniques that can provably guarantee the correctness and availability of pp range indices we develop our techniques in the context of general pp indexing framework that can be instantiated with most pp index structures from the literature as specific instantiation we implement ring an existing pp range index and show how it can be extended to guarantee correctness and availability we quantitatively evaluate our techniques using real distributed implementation
this paper describes shasta system that supports shared address space in software on clusters of computers with physically distributed memory unique aspect of shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at fine granularity in addition the system allows the coherence granularity to vary across different shared data structures in single application shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores for each shared load or store the inserted code checks to see if the data is available locally and communicates with other processors if necessary the system uses numerous techniques to reduce the run time overhead of these checks since shasta is implemented entirely in software it also provides tremendous flexibility in supporting different types of cache coherence protocols we have implemented an efficient cache coherence protocol that incorporates number of optimizations including support for multiple communication granularities and use of relaxed memory models this system is fully functional and runs on cluster of alpha workstations the primary focus of this paper is to describe the techniques used in shasta to reduce the checking overhead for supporting fine granularity sharing in software these techniques include careful layout of the shared address space scheduling the checking code for efficient execution on modern processors using simple method that checks loads using only the value loaded reducing the extra cache misses caused by the checking code and combining the checks for multiple loads and stores to characterize the effect of these techniques we present detailed performance results for the splash applications running on an alpha processor without our optimizations the checking overheads are excessively high exceeding for several applications however our techniques are effective in reducing these overheads to range of to for almost all of the applications we also describe our coherence protocol and present some preliminary results on the parallel performance of several applications running on our workstation cluster our experience so far indicates that once the cost of checking memory accesses is reduced using our techniques the shasta approach is an attractive software solution for supporting shared address space with fine grain access to data
we propose an algorithm to effectively cluster specific type of text documents textual responses gathered through survey system due to the peculiar features exhibited in such responses eg short in length rich in outliers and diverse in categories traditional unsupervised and semi supervised clustering techniques are challenged to achieve satisfactory performance as demanded by survey task we address this issue by proposing semi supervised topic driven approach it first employs an unsupervised algorithm to generate preliminary clustering schema for all the answers to question human expert then uses this schema to identify the major topics in these answers finally topic driven clustering algorithm is adopted to obtain an improved clustering schema we evaluated this approach using five questions in survey we recently conducted in the us the results demonstrate that this approach can lead to significant improvement in clustering quality
systems on chip soc are becoming increasingly complex with large number of applications integrated on the same chip such system often supports large number of use cases and is dynamically reconfigured when platform conditions or user requirements change networks on chip noc offer the designer unsurpassed runtime flexibility this flexibility stems from the programmability of the individual routers and network interfaces when change in use case occurs the application task graph and the network connections change to mitigate the complexity in programming the many registers controlling the noc an abstraction in the form of configuration library is needed in addition such library must leave the modified system in consistent state from which normal operation can continue in this paper we present the facilities for controlling change in reconfigurable noc we show the architectural additions and the many trade offs in the design of run time library for noc reconfiguration we qualitatively and quantitatively evaluate the performance memory requirements predictability and reusability of the different implementations
self organizing structured peer to peer pp overlay networks like can chord pastry and tapestry offer novel platform for variety of scalable and decentralized distributed applications these systems provide efficient and fault tolerant routing object location and load balancing within self organizing overlay network one major problem with these systems is how to bootstrap them how do you decide which overlay to join how do you find contact node in the overlay to join how do you obtain the code that you should run current systems require that each node that participates in given overlay supports the same set of applications and that these applications are pre installed on each node in this position paper we sketch the design of an infrastructure that uses universal overlay to provide scalable infrastructure to bootstrap multiple service overlays providing different functionality it provides mechanisms to advertise services and to discover services contact nodes and service code
in this paper generalized adaptive ensemble generation and aggregation gaega method for the design of multiple classifier systems mcss is proposed gaega adopts an over generation and selection strategy to achieve good bias variance tradeoff in the training phase different ensembles of classifiers are adaptively generated by fitting the validation data globally with different degrees the test data are then classified by each of the generated ensembles the final decision is made by taking into consideration both the ability of each ensemble to fit the validation data locally and reducing the risk of overfitting in this paper the performance of gaega is assessed experimentally in comparison with other multiple classifier aggregation methods on data sets the experimental results demonstrate that gaega significantly outperforms the other methods in terms of average accuracy ranging from to
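A schematic over-generation-and-selection ensemble, loosely in the spirit of the strategy above: generate a pool of diverse classifiers, keep the ones that fit a validation set best, and aggregate their votes on test data. The pool construction, selection rule and data set are deliberately simple stand-ins, not the described method.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def over_generate_and_select(X_tr, y_tr, X_val, y_val, pool_size=30, keep=10):
    rng = np.random.default_rng(0)
    pool = []
    for i in range(pool_size):                       # over-generation: bootstrap + varied depth
        idx = rng.integers(0, len(X_tr), len(X_tr))
        clf = DecisionTreeClassifier(max_depth=int(rng.integers(2, 10)), random_state=i)
        clf.fit(X_tr[idx], y_tr[idx])
        pool.append((clf.score(X_val, y_val), clf))
    pool.sort(key=lambda p: p[0], reverse=True)      # selection on validation accuracy
    return [clf for _, clf in pool[:keep]]

def vote(ensemble, X):
    preds = np.array([clf.predict(X) for clf in ensemble])
    return np.round(preds.mean(axis=0)).astype(int)  # majority vote for binary labels

if __name__ == "__main__":
    X, y = make_classification(n_samples=600, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X_tr, y_tr, test_size=0.4, random_state=0)
    ens = over_generate_and_select(X_tr, y_tr, X_val, y_val)
    print((vote(ens, X_te) == y_te).mean())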
while there have been many recent proposals for hardware that supports thread level speculation tls there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically in this paper we focus on one important limitation of program performance under tls which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences we present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding in addition we contrast our compiler techniques with related hardware only approaches with our most aggressive compiler and hardware techniques we improve performance under tls by for of applications and by at least for half of the other applications
collaborative systems that automate the sharing of programmer defined user interfaces offer limited coupling flexibility typically forcing all users of an application to share all aspects of the user interfaces those that automatically support high coupling flexibility are tied to narrow set of predefined user interfaces we have developed framework that provides high level and flexible coupling support for arbitrary programmer defined user interfaces the framework refines an abstract layered model of collaboration with structured application layers and automatic acquisition transformation and processing of updates it has been used to easily provide flexible coupling in complex existing single user software and shown to support all known ways to share user interfaces coupling flexibility comes at the cost of small amount of additional programming we have carefully crafted the framework to ensure that this overhead is proportional to the degree of coupling flexibility desired
broadcast encryption be deals with secure transmission of message to group of receivers such that only an authorized subset of receivers can decrypt the message the transmission cost of be system can be reduced considerably if limited number of free riders can be tolerated in the system in this paper we study the problem of how to optimally place given number of free riders in subset difference sd based be system which is currently the most efficient be scheme in use and has also been incorporated in standards and we propose polynomial time optimal placement algorithm and three more efficient heuristics for this problem simulation experiments show that sd based be schemes can benefit significantly from the proposed algorithms
our work focuses on building tools to support collaborative software development we are building new programming environment with integrated software configuration management which provides variety of features to help programming teams coordinate their work in this paper we detail hierarchy based software configuration management system called coven which acts as collaborative medium for allowing teams of programmers to cooperate by providing family of inter related mechanisms our system provides powerful support for cooperation and coordination in manner which matches the structure of development teams
concurrent programs require high level abstractions in order to manage complexity and enable compositional reasoning in this paper we introduce novel concurrency abstraction dubbed transactional events which combines first class synchronous message passing events with all or nothing transactions this combination enables simple solutions to interesting problems in concurrent programming for example guarded synchronous receive can be implemented as an abstract transactional event whereas in other languages it requires non abstract non modular protocol as another example three way rendezvous can be implemented as an abstract transactional event which is impossible using first class events alone both solutions are easy to code and easy to reason about the expressive power of transactional events arises from sequencing combinator whose semantics enforces an all or nothing transactional property either both of the constituent events synchronize in sequence or neither of them synchronizes this sequencing combinator along with non deterministic choice combinator gives transactional events the compositional structure of monad with plus we provide formal semantics for transactional events and give detailed account of an implementation
in this paper we study the problem of topic level random walk which concerns the random walk at the topic level previously several related works such as topic sensitive page rank have been conducted however topics in these methods were predefined which makes the methods inapplicable to different domains in this paper we propose four step approach for topic level random walk we employ probabilistic topic model to automatically extract topics from documents then we perform the random walk at the topic level we also propose an approach to model topics of the query and then combine the random walk ranking score with the relevance score based on the modeling results experimental results on real world data set show that our proposed approach can significantly outperform the baseline methods of using language model and that of using traditional pagerank
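A hedged sketch of combining a topic-biased random walk with a relevance score. The document-topic distributions are taken as given (in the described approach they come from a probabilistic topic model); the walk below is a personalised PageRank whose restart vector is weighted by the query's topic mixture, and the graph, scores and mixing weight are invented.

import numpy as np

def topic_biased_rank(adj, doc_topic, query_topic, damping=0.85, iters=100):
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # row-stochastic transition matrix; dangling rows fall back to uniform
    P = np.divide(adj, deg, out=np.full_like(adj, 1.0 / n), where=deg > 0)
    restart = doc_topic @ query_topic
    restart = restart / restart.sum()
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = damping * (P.T @ r) + (1 - damping) * restart
    return r

if __name__ == "__main__":
    adj = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
    doc_topic = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
    query_topic = np.array([0.7, 0.3])           # topic mixture inferred for the query
    relevance = np.array([0.2, 0.6, 0.1, 0.3])   # e.g. language-model scores
    alpha = 0.5
    print(alpha * topic_biased_rank(adj, doc_topic, query_topic) + (1 - alpha) * relevance)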
organizations families institutions evolve shared culture and history in this work we describe system to facilitate conversation and storytelling about this collective past users explore digital archives of shared materials such as photographs video and text documents on tabletop interface both the software and the interface encourage natural conversation and reflection this work is an application of our ongoing research on systems for multiple co present users to explore digital collections in this paper we present case study of our own group history along with the software extensions developed for this scenario these extensions include methods for easily branching off from and returning to previous threads of the exploration incorporating background contexts that support variety of view points and flexible story sharing and supporting the active and passive discovery of relevant information
in distributed stream processing environments large numbers of continuous queries are distributed onto multiple servers when one or more of these servers become overloaded due to bursty data arrival excessive load needs to be shed in order to preserve low latency for the query results because of the load dependencies among the servers load shedding decisions on these servers must be well coordinated to achieve end to end control on the output quality in this paper we model the distributed load shedding problem as linear optimization problem for which we propose two alternative solution approaches solver based centralized approach and distributed approach based on metadata aggregation and propagation whose centralized implementation is also available both of our solutions are based on generating series of load shedding plans in advance to be used under certain input load conditions we have implemented our techniques as part of the borealis distributed stream processing system we present experimental results from our prototype implementation showing the performance of these techniques under different input and query workloads
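A toy formulation of coordinated load shedding as a linear program: choose a keep-fraction for each query so that every server stays within its CPU capacity while the total weighted output rate is maximised. Costs, rates, capacities and weights are made-up numbers, and precomputing plans for many load conditions is not shown.

import numpy as np
from scipy.optimize import linprog

def shed_plan(cost, rates, capacities, weights):
    # variables: keep fraction x_q in [0, 1] per query
    c = -(np.asarray(weights) * np.asarray(rates))    # maximise weighted kept rate
    A_ub = np.asarray(cost) * np.asarray(rates)       # per-server CPU usage per unit keep fraction
    res = linprog(c, A_ub=A_ub, b_ub=capacities, bounds=[(0, 1)] * len(rates))
    return res.x

if __name__ == "__main__":
    cost = [[0.4, 0.2, 0.5],      # cycles per tuple of each query on server 0
            [0.1, 0.6, 0.3]]      # ... and on server 1 (queries span both servers)
    rates = [100, 80, 60]         # input tuple rates
    capacities = [45, 40]         # CPU budget per server
    weights = [1.0, 2.0, 1.0]     # importance of each query's output
    print(np.round(shed_plan(cost, rates, capacities, weights), 3))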
the partial stable models of logic program form class of models that include the unique well founded model total stable models and other two interesting subclasses maximal stable models and least undefined stable models as stable models different from the well founded are not unique datalog queries do not in general correspond to functions the question is what are the expressive powers of the various types of stable models when they are restricted to the class of all functional queries the paper shows that this power does not go in practice beyond the one of stratified queries except for least undefined stable models which instead capture the whole boolean hierarchy bh finally it is illustrated how the latter result can be used to design functional language which by means of disciplined usage of negation allows to achieve the desired level of expressiveness up to bh so that exponential time resolution is eventually enabled only for hard problems
spinning is synchronization mechanism commonly used in applications and operating systems excessive spinning however often indicates performance or correctness eg livelock problems detecting if applications and operating systems are spinning is essential for achieving high performance especially in consolidated servers running virtual machines prior research has used source or binary instrumentation to detect spinning however these approaches place significant burden on programmers and may even be infeasible in certain situations in this paper we propose efficient hardware to detect spinning in unmodified applications and operating systems based on this hardware we develop scheduling and power policies that adaptively manage resources for spinning threads system support that helps detect when multithreaded program is livelocked and hardware performance counters that accurately reflect system performance using full system simulation with spec omp splash and wisconsin commercial workloads we demonstrate that our mechanisms effectively improve the management of multithreaded systems
clustering on large databases has been studied actively as an increasing number of applications involve huge amount of data in this paper we propose an efficient top down approach for density based clustering which is based on the density information stored in index codes of multidimensional index we first provide formal definition of the cluster based on the concept of region contrast partition based on this notion we propose novel top down clustering algorithm which improves the efficiency through branch and bound pruning for this pruning we present technique for determining the bounds based on sparse and dense internal regions and formally prove the correctness of the bounds experimental results show that the proposed method reduces the elapsed time by up to times compared with that of birch which is well known clustering method the results also show that the performance improvement becomes more marked as the size of the database increases
the architectures of embedded systems are often application specific containing multiple heterogeneous cores non uniform memory on chip networks and custom hardware elements eg dsp cores standard programming languages do not use many of these features natively because they assume traditional single processor and single logical address space abstraction that hides these architectural details this paper describes compile time virtualisation technique which uses virtualisation layer to map software onto the target architecture whilst allowing the programmer to control the virtualisation mappings in order to effectively exploit custom architectures
in recent years cp nets have emerged as useful tool for supporting preference elicitation reasoning and representation cp nets capture and support reasoning with qualitative conditional preference statements statements that are relatively natural for users to express in this paper we extend the cp nets formalism to handle another class of very natural qualitative statements one often uses in expressing preferences in daily life statements of relative importance of attributes the resulting formalism tcp nets maintains the spirit of cp nets in that it remains focused on using only simple and natural preference statements uses the ceteris paribus semantics and utilizes graphical representation of this information to reason about its consistency and to perform possibly constrained optimization using it the extra expressiveness it provides allows us to better model tradeoffs users would like to make more faithfully representing their preferences
the particle level set method has proven successful for the simulation of two separate regions such as water and air or fuel and products in this paper we propose novel approach to extend this method to the simulation of as many regions as desired the various regions can be liquids or gases of any type with differing viscosities densities viscoelastic properties etc we also propose techniques for simulating interactions between materials whether it be simple surface tension forces or more complex chemical reactions with one material converting to another or two materials combining to form third we use separate particle level set method for each region and propose novel projection algorithm that decodes the resulting vector of level set values providing dictionary that translates between them and the standard single valued level set representation an additional difficulty occurs since discretization stencils for interpolation tracing semi lagrangian rays etc cross region boundaries naively combining non smooth or even discontinuous data this has recently been addressed via ghost values eg for fire or bubbles we instead propose new paradigm that allows one to incorporate physical jump conditions in data on the fly which is significantly more efficient for multiple regions especially at triple points or near boundaries with solids
the duplicate elimination problem of detecting multiple tuples which describe the same real world entity is an important data cleaning problem previous domain independent solutions to this problem relied on standard textual similarity functions eg edit distance cosine metric between multi attribute tuples however such approaches result in large numbers of false positives if we want to identify domain specific abbreviations and conventions in this paper we develop an algorithm for eliminating duplicates in dimensional tables in data warehouse which are usually associated with hierarchies we exploit hierarchies to develop high quality scalable duplicate elimination algorithm and evaluate it on real datasets from an operational data warehouse
even with much activity over the past decade including organized efforts on both sides of the atlantic the representation of both space and time in digital databases is still problematic and functional space time systems have not gone beyond the limited prototype stage why is this the case why did it take twenty years from the first gis for the representation and analysis in the temporal as well as the spatial dimension to begin explore the answers to these questions by giving historical overview of the development of space time representation in the geographic information systems and database communities and review of the most recent research within the context of this perspective also question what seems to be spirit of self accusation in which the lack of functional space time systems has been discussed in the literature and in meetings of gis researchers close by offering my own interpretation of current research issues on space time data models and languages
we consider the following autocompletion search scenario imagine user of search engine typing query then with every keystroke display those completions of the last query word that would lead to the best hits and also display the best such hits the following problem is at the core of this feature for fixed document collection given set of documents and an alphabetical range of words compute the set of all word in document pairs from the collection such that the word falls within the given alphabetical range and the document belongs to the given set we present new data structure with the help of which such autocompletion queries can be processed on the average in time linear in the input plus output size independent of the size of the underlying document collection at the same time our data structure uses no more space than an inverted index actual query processing times on large test collection correlate almost perfectly with our theoretical bound
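To make the core query concrete, here is a deliberately naive baseline under assumed names (sorted_words, inverted_index, doc_set): it scans an ordinary inverted index and returns every word-in-document pair whose word lies in the requested alphabetical range and whose document is in the candidate set. The paper's contribution is a data structure that answers the same query in time linear in input plus output size; this sketch only pins down the semantics.

```python
# Naive baseline for the core autocompletion query: given an inverted index,
# a candidate document set, and an alphabetical word range [lo, hi), return all
# (word, document) pairs with the word in the range and the document in the set.
from bisect import bisect_left

def autocomplete_pairs(sorted_words, inverted_index, doc_set, lo, hi):
    pairs = []
    start = bisect_left(sorted_words, lo)          # first word >= lo
    for i in range(start, len(sorted_words)):
        word = sorted_words[i]
        if word >= hi:                             # past the end of the range
            break
        for doc in inverted_index[word]:
            if doc in doc_set:
                pairs.append((word, doc))
    return pairs

index = {"algebra": [1, 4], "algorithm": [2, 4, 7], "allocation": [3]}
words = sorted(index)
print(autocomplete_pairs(words, index, {2, 4}, "alg", "alh"))  # completions of the prefix "alg"
```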
history of user operations function helps make applications easier to use for example users may have access to an operation history list in an application to undo or redo past operation to provide an overview of long operation history and help users find target interactions or application states quickly visual representations of operation history have been proposed however most previous systems are tightly integrated with target applications and difficult to apply to new applications we propose an application independent method that can visualize the operation history of arbitrary gui applications by monitoring the input and output gui events from outside of the target application we implemented prototype system that visualizes operation sequences of generic java awt swing applications using an annotated comic strip metaphor we tested the system with various applications and present results from user study
although anonymizing peer to peer p2p networks often means extra cost in terms of transfer efficiency many systems try to mask the identities of their users for privacy consideration by comparison and analysis of existing approaches we investigate the properties of unstructured p2p anonymity and summarize current attack models on these designs most of these approaches are path based which require peers to pre construct anonymous paths before transmission thus suffering significant overhead and poor reliability we also discuss the open problems in this field and propose several future research directions
parallel and distributed languages specify computations on multiple processors and have computation language to describe the algorithm ie what to compute and coordination language to describe how to organise the computations across the processors haskell has been used as the computation language for wide variety of parallel and distributed languages and this paper is comprehensive survey of implemented languages we outline parallel and distributed language concepts and classify haskell extensions using them similar example programs are used to illustrate and contrast the coordination languages and the comparison is facilitated by the common computation language lazy language is not an obvious choice for parallel or distributed computation and we address the question of why haskell is common functional computation language
this paper examines the problem of distributed intrusion detection in mobile ad hoc networks manets utilizing ensemble methods three level hierarchical system for data collection processing and transmission is described local idss intrusion detection systems are attached to each node of the manet collecting raw data of network operation and computing local anomaly index measuring the mismatch between the current node operation and baseline of normal operation anomaly indexes from nodes belonging to cluster are periodically transmitted to cluster head which averages the node indexes producing cluster level anomaly index cluster heads periodically transmit these cluster level anomaly indexes to manager which averages them on the theoretical side we show that averaging improves detection rates under very mild conditions concerning the distributions of the anomaly indexes of the normal class and the anomalous class on the practical side the paper describes clustering algorithms to update cluster centers and machine learning algorithms for computing the local anomaly indexes the complete suite of algorithms was implemented and tested under two types of manet routing protocols and two types of attacks against the routing infrastructure performance evaluation was effected by determining the receiver operating characteristics roc curves and the corresponding area under the roc curve auc metrics for various operational conditions the overall results confirm the theoretical developments related with the benefits of averaging with detection accuracy improving as we move up in the node cluster manager hierarchy
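A minimal sketch of the three-level averaging described above, with hypothetical index values and no clustering or thresholding logic; it only shows how node-level anomaly indexes roll up to cluster heads and then to the manager.

```python
# Sketch of the node -> cluster head -> manager averaging described above.
# Anomaly indexes are plain floats; names and values are hypothetical.
def cluster_anomaly_index(node_indexes):
    """Cluster head averages the local anomaly indexes reported by its nodes."""
    return sum(node_indexes) / len(node_indexes)

def manager_anomaly_index(cluster_indexes):
    """Manager averages the cluster-level indexes it receives."""
    return sum(cluster_indexes) / len(cluster_indexes)

clusters = {"c1": [0.10, 0.15, 0.90], "c2": [0.05, 0.12, 0.08]}
cluster_level = {cid: cluster_anomaly_index(v) for cid, v in clusters.items()}
network_level = manager_anomaly_index(list(cluster_level.values()))
print(cluster_level, network_level)  # detection would compare these against thresholds
```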
the paper introduces the problem of system level thermal aware design of applications with uncertain run time on an embedded processor equipped with dynamic voltage frequency scaling features the problem takes as inputs task sequence cycle time distribution of each task and processor thermal model the solution specifies voltage frequency assignment to the tasks such that the expected latency is minimized subject to the requirement that the probability that the peak temperature constraint is not violated is no less than designer specified value we prove that the problem is at least np hard and present optimal and epsilon fully polynomial time approximation scheme as solutions to the best of our knowledge this paper is the first work that addresses the stochastic version of the system level thermal aware design problem we evaluate the effectiveness of our techniques by experimenting with realistic and synthetic benchmarks
prefetching into cpu caches has long been known to be effective in reducing the cache miss ratio but known implementations of prefetching have been unsuccessful in improving cpu performance the reasons for this are that prefetches interfere with normal cache operations by making cache address and data ports busy the memory bus busy the memory banks busy and by not necessarily being complete by the time that the prefetched data is actually referenced in this paper we present extensive quantitative results of detailed cycle by cycle trace driven simulation of uniprocessor memory system in which we vary most of the relevant parameters in order to determine when and if hardware prefetching is useful we find that in order for prefetching to actually improve performance the address array needs to be double ported and the data array needs to either be double ported or fully buffered it is also very helpful for the bus to be very wide eg bytes for bus transactions to be split and for main memory to be interleaved under the best circumstances ie with significant investment in extra hardware prefetching can significantly improve performance for implementations without adequate hardware prefetching often decreases performance
the gradient of function defined on manifold is perhaps one of the most important differential objects in data analysis most often in practice the input function is available only at discrete points sampled from the underlying manifold and the manifold is approximated by either mesh or simply point cloud while many methods exist for computing gradients of function defined over mesh computing and simplifying gradients and related quantities such as critical points of function from point cloud is non trivial in this paper we initiate the investigation of computing gradients under different metric on the manifold from the original natural metric induced from the ambient space specifically we map the input manifold to the eigenspace spanned by its laplacian eigenfunctions and consider the so called diffusion distance metric associated with it we show the relation of gradient under this metric with that under the original metric it turns out that once the laplace operator is constructed it is easier to approximate gradients in the eigenspace for discrete inputs especially point clouds and it is robust to noises in the input function and in the underlying manifold more importantly we can easily smooth the gradient field at different scales within this eigenspace framework we demonstrate the use of our new eigen gradients with two applications approximating simplifying the critical points of function and the jacobi sets of two input functions which describe the correlation between these two functions from point clouds data
checkpointing and rollback recovery are well known techniques for handling failures in distributed systems the issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood for example the necessary and sufficient conditions for set of checkpoints to be part of consistent global checkpoint has been established for distributed computations in this paper we address the analogous question for distributed database systems in distributed database systems transaction consistent global checkpoints are useful not only for recovery from failure but also for audit purposes if each data item of distributed database is checkpointed independently by separate transaction none of the checkpoints taken may be part of any transaction consistent global checkpoint however allowing individual data items to be checkpointed independently results in non intrusive checkpointing in this paper we establish the necessary and sufficient conditions for the checkpoints of set of data items to be part of transaction consistent global checkpoint of the distributed database such conditions can also help in the design and implementation of non intrusive checkpointing algorithms for distributed database systems
efficiently managing the history of time evolving system is one of the central problems in many database environments like database systems that incorporate versioning or object oriented databases that implicitly or explicitly maintain the history of persistent objects in this paper we propose algorithms that reconstruct past states of an evolving system for two general cases ie when the system state is represented by set or by hierarchy forest of trees sets are widely used as canonical form of representing information in databases or program states for more complex applications like schema evolution in object oriented databases it becomes necessary to manage the history of data structures that have the form of forests or even graphs the proposed algorithms use minimal space proportional to the number of changes occurring in the evolution and have the advantage of being on line in the amortized sense any past system state is reconstructed in time proportional to a loglogt where a is the size of the answer and t is the maximal evolution time for all practical cases the loglogt factor is constant therefore our algorithms provide almost random access to any past system state moreover we show that the presented algorithms are optimal among all algorithms that use space linear in the number of changes in the system evolution
in this paper we have proposed speculative locking sl protocols to improve the performance of distributed database systems ddbss by trading extra processing resources in sl transaction releases the lock on the data object whenever it produces corresponding after image during its execution by accessing both before and after images the waiting transaction carries out speculative executions and retains one execution based on the termination commit or abort mode of the preceding transactions by carrying out multiple executions for transaction sl increases parallelism without violating serializability criteria under the naive version of sl the number of speculative executions of the transaction explodes with data contention by exploiting the fact that submitted transaction is more likely to commit than abort we propose the sl variants that process transactions efficiently by significantly reducing the number of speculative executions the simulation results indicate that even with manageable extra resources these variants significantly improve the performance over two phase locking in the ddbs environments where transactions spend longer time for processing and transaction aborts occur frequently
with each technology generation delivering time varying current with reduced nominal supply voltage variation is becoming more difficult due to increasing current and power requirements the power delivery network design becomes much more complex and requires accurate analysis and optimizations at all levels of abstraction in order to meet the specifications in this paper we describe techniques for estimation of the supply voltage variations that can be used in the design of the power delivery network we also describe the decoupling capacitor hierarchy that provides low impedance to the increasing high frequency current demand and limits the supply voltage variations techniques for high level power estimation that can be used for performance vs power trade offs to reduce the current and power requirements of the circuit are also presented
anonymous credentials are widely used to certify properties of credential owner or to support the owner to demand valuable services while hiding the user’s identity at the same time credential system aka pseudonym system usually consists of multiple interactive procedures between users and organizations including generating pseudonyms issuing credentials and verifying credentials which are required to meet various security properties we propose general symbolic model based on the applied pi calculus for anonymous credential systems and give formal definitions of few important security properties including pseudonym and credential unforgeability credential safety pseudonym untraceability we specialize the general formalization and apply it to the verification of concrete anonymous credential system proposed by camenisch and lysyanskaya the analysis is done automatically with the tool proverif and several security properties have been verified
the research reported here integrates computational visual and cartographic methods to develop geovisual analytic approach for exploring and understanding spatio temporal and multivariate patterns the developed methodology and tools can help analysts investigate complex patterns across multivariate spatial and temporal dimensions via clustering sorting and visualization specifically the approach involves self organizing map parallel coordinate plot several forms of reorderable matrices including several ordering methods geographic small multiple display and dimensional cartographic color design method the coupling among these methods leverages their independent strengths and facilitates visual exploration of patterns that are difficult to discover otherwise the visualization system we developed supports overview of complex patterns and through variety of interactions enables users to focus on specific patterns and examine detailed views we demonstrate the system with an application to the ieee infovis contest data set which contains time varying geographically referenced and multivariate data for technology companies in the us
versioning file systems retain earlier versions of modified files allowing recovery from user mistakes or system corruption unfortunately conventional versioning systems do not efficiently record large numbers of versions in particular versioned metadata can consume as much space as versioned data this paper examines two space efficient metadata structures for versioning file systems and describes their integration into the comprehensive versioning file system cvfs which keeps all versions of all files journal based metadata encodes each metadata version into single journal entry cvfs uses this structure for inodes and indirect blocks reducing the associated space requirements by multiversion trees extend each entry’s key with timestamp and keep current and historical entries in single tree cvfs uses this structure for directories reducing the associated space requirements by similar space reductions are predicted via trace analysis for other versioning strategies eg on close versioning experiments with cvfs verify that its current version performance is similar to that of non versioning file systems while reducing overall space needed for history data by factor of two although access to historical versions is slower than conventional versioning systems checkpointing is shown to mitigate and bound this effect
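A toy, in-memory illustration of the multiversion-tree idea, assuming a simplified directory keyed by (name, timestamp); real CVFS keeps such entries in on-disk trees, so this only demonstrates the key discipline of extending each entry's key with a timestamp and answering lookups as of a given time.

```python
# Each directory entry's key is extended with the timestamp at which that version
# was written; a lookup "as of time t" returns the newest entry not later than t.
import bisect

class MultiversionDirectory:
    def __init__(self):
        self._versions = {}          # name -> sorted list of (timestamp, inode)

    def put(self, name, timestamp, inode):
        self._versions.setdefault(name, []).append((timestamp, inode))
        self._versions[name].sort()

    def lookup(self, name, as_of):
        history = self._versions.get(name, [])
        i = bisect.bisect_right(history, (as_of, float("inf")))
        return history[i - 1][1] if i > 0 else None

d = MultiversionDirectory()
d.put("report.txt", 10, inode=101)
d.put("report.txt", 25, inode=202)
print(d.lookup("report.txt", as_of=20))  # -> 101, the version current at time 20
```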
fundamental principle for reusing software assets is providing means to access them information retrieval mechanisms assisted by semantic initiatives play very important role in finding relevant reusable assets in this context this paper presents semantic search tool in order to improve the precision of search returns furthermore the requirements the decomposition of architectural module and aspects of implementation are presented
confronted with the generalization of monitoring in operational networks researchers have proposed placement algorithms that can help isps deploy their monitoring infrastructure in cost effective way while maximizing the benefits of their infrastructure however static placement of monitors cannot be optimal given the short term and long term variations in traffic due to re routing events anomalies and the normal network evolution in addition most isps already deploy router embedded monitoring functionalities despite some limitations inherent to being part of router these monitoring tools give greater visibility on the network traffic but raise the question on how to configure network wide monitoring infrastructure that may contain hundreds of monitoring points we reformulate the placement problem as follows given network where all links can be monitored which monitors should be activated and which sampling rate should be set on these monitors in order to achieve given measurement task with high accuracy and low resource consumption we provide formulation of the problem an optimal algorithm to solve it and we study its performance on real backbone network
we study two natural models of randomly generated constraint satisfaction problems we determine how quickly the domain size must grow with to ensure that these models are robust in the sense that they exhibit non trivial threshold of satisfiability and we determine the asymptotic order of that threshold we also provide resolution complexity lower bounds for these models one of our results immediately yields theorem regarding homomorphisms between two random graphs
this paper explores using non linguistic vocalization as an additional modality to augment digital pen input on tablet computer we investigated this through set of novel interaction techniques and feasibility study typically digital pen users control one or two parameters using stylus position and sometimes pen pressure however in many scenarios the user can benefit from the ability to continuously vary additional parameters non linguistic vocalizations such as vowel sounds variation of pitch or control of loudness have the potential to provide fluid continuous input concurrently with pen interaction we present set of interaction techniques that leverage the combination of voice and pen input when performing both creative drawing and object manipulation tasks our feasibility evaluation suggests that with little training people can use non linguistic vocalization to productively augment digital pen interaction
test factoring creates fast focused unit tests from slow system wide tests each new unit test exercises only subset of the functionality exercised by the system test augmenting test suite with factored unit tests should catch errors earlier in test run one way to factor test is to introduce mock objects if test exercises component which interacts with another component the environment the implementation of can be replaced by mock the mock checks that t’s calls to are as expected and it simulates e’s behavior in response we introduce an automatic technique for test factoring given system test for and and record of t’s and e’s behavior when the system test is run test factoring generates unit tests for in which is mocked the factored tests can isolate bugs in from bugs in and if is slow or expensive improve test performance or cost our implementation of automatic dynamic test factoring for the java language reduces the running time of system test suite by up to an order of magnitude
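The paper targets Java, but the mocking idea transfers to any language with a mock library. The sketch below uses Python's unittest.mock with entirely hypothetical names: the environment's responses, captured during the slow system test, are replayed from a mock in a fast factored unit test, while the component's calls to the environment are checked against expectations.

```python
# Record the environment's observed responses during a system test, then replay
# them from a mock in a fast factored unit test.  All names are hypothetical.
from unittest import mock

recorded_responses = {"GET /price/42": "19.99"}   # captured during the system test

def compute_total(environment, item_id):
    """Component under test: calls out to the (slow) environment."""
    return float(environment.request(f"GET /price/{item_id}")) * 1.2

def test_compute_total_with_mocked_environment():
    env = mock.Mock()
    env.request.side_effect = lambda call: recorded_responses[call]  # replay the environment
    assert abs(compute_total(env, 42) - 23.988) < 1e-9
    env.request.assert_called_once_with("GET /price/42")             # check calls to the environment

test_compute_total_with_mocked_environment()
```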
in many decision making applications users typically issue aggregate queries to evaluate these computationally expensive queries online aggregation has been developed to provide approximate answers with their respective confidence intervals quickly and to continuously refine the answers in this paper we extend the online aggregation technique to distributed context where sites are maintained in dht distributed hash table network our distributed online aggregation doa scheme iteratively and progressively produces approximate aggregate answers as follows in each iteration small set of random samples are retrieved from the data sites and distributed to the processing sites at each processing site local aggregate is computed based on the allocated samples at coordinator site these local aggregates are combined into global aggregate doa adaptively grows the number of processing nodes as the sample size increases to further reduce the sampling overhead the samples are retained as precomputed synopsis over the network to be used for processing future queries we also study how these synopsis can be maintained incrementally we have conducted extensive experiments on planetlab the results show that our doa scheme reduces the initial waiting time significantly and provides high quality approximate answers with running confidence intervals progressively
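One way to picture a single DOA iteration, under assumptions that are mine rather than the paper's: each processing site reports a (count, sum, sum of squares) triple over its allocated samples, and the coordinator merges them into a running mean with a textbook normal-approximation confidence interval that tightens as more samples arrive.

```python
# Sketch of one iteration: merge per-site sample aggregates into a global
# estimate with a running confidence interval.  The interval formula is the
# standard normal approximation, not necessarily the paper's estimator.
import math

def merge_and_estimate(local_aggregates, z=1.96):
    n  = sum(a[0] for a in local_aggregates)
    s  = sum(a[1] for a in local_aggregates)
    ss = sum(a[2] for a in local_aggregates)
    mean = s / n
    variance = max(ss / n - mean * mean, 0.0)
    half_width = z * math.sqrt(variance / n)
    return mean, (mean - half_width, mean + half_width)

site_a = (500, 24950.0, 1_300_000.0)   # (count, sum, sum of squares) -- hypothetical
site_b = (300, 15300.0, 820_000.0)
print(merge_and_estimate([site_a, site_b]))  # refined each iteration as more samples arrive
```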
constraint programming is method of problem solving that allows declarative specification of relations among objects it is important to allow preferences of constraints since it is often difficult for programmers to specify all constraints without conflicts in this paper we propose numerical method for solving nonlinear constraints with hierarchical preferences ie constraint hierarchies in least squares manner this method finds sufficiently precise local optimal solutions by appropriately processing hierarchical preferences of constraints to evaluate the effectiveness of our method we present experimental results obtained with prototype constraint solver
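A rough numeric sketch of solving soft nonlinear constraints in a least-squares manner. Hierarchical preference levels are approximated here by giving stronger levels much larger residual weights, which is a crude stand-in for a true lexicographic constraint hierarchy and not the paper's actual method; the constraints and weights are invented.

```python
# Soft nonlinear constraints solved in a least-squares sense, with stronger
# preference levels given much larger weights (a crude approximation of a
# lexicographic constraint hierarchy).
import numpy as np
from scipy.optimize import least_squares

def residuals(v):
    x, y = v
    return [
        1000.0 * (x**2 + y**2 - 4.0),   # strongest level: point lies on circle of radius 2
        10.0 * (y - x),                 # middle level: y == x
        1.0 * (x - 3.0),                # weakest level: x == 3 (conflicts with the rest)
    ]

sol = least_squares(residuals, x0=[1.0, 1.0])
print(sol.x)   # close to (sqrt(2), sqrt(2)): stronger levels win over the weak one
```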
personalized information access tools are frequently based on collaborative filtering recommendation algorithms collaborative filtering recommender systems typically suffer from data sparsity problem where systems do not have sufficient user data to generate accurate and reliable predictions prior research suggested using group based user data in the collaborative filtering recommendation process to generate group based predictions and partially resolve the sparsity problem although group recommendations are less accurate than personalized recommendations they are more accurate than general non personalized recommendations which are the natural fall back when personalized recommendations cannot be generated in this work we present initial results of study that exploits the browsing logs of real families of users gathered in an ehealth portal the browsing logs allowed us to experimentally compare the accuracy of two group based recommendation strategies aggregated group models and aggregated predictions our results showed that aggregating individual models into group models resulted in more accurate predictions than aggregating individual predictions into group predictions
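The two group-based strategies compared in the study, in miniature. The predict function below is a dummy stand-in for a collaborative-filtering predictor (it simply averages the rated items), and the family profiles are invented, so the sketch only illustrates the difference in where the aggregation happens.

```python
# `predict` is a dummy stand-in for a collaborative-filtering predictor: it just
# averages the rated items, so only the order of aggregation matters here.
def predict(profile):
    rated = [r for r in profile.values() if r is not None]
    return sum(rated) / len(rated)

family = [  # individual rating profiles (None = unrated), invented data
    {"a": 4, "b": None, "c": 2},
    {"a": 5, "b": 3, "c": None},
    {"a": 3, "b": 4, "c": 4},
]

def aggregate_models_then_predict(profiles):
    """Strategy 1: merge individual profiles into one group model, then predict."""
    group_model = {}
    for item in profiles[0]:
        ratings = [p[item] for p in profiles if p[item] is not None]
        group_model[item] = sum(ratings) / len(ratings) if ratings else None
    return predict(group_model)

def aggregate_predictions(profiles):
    """Strategy 2: predict for each member separately, then average the predictions."""
    return sum(predict(p) for p in profiles) / len(profiles)

print(aggregate_models_then_predict(family), aggregate_predictions(family))
```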
focused web crawling traverses the web to collect documents on specific topic this is not an easy task since focused crawlers need to identify the next most promising link to follow based on the topic and the content and links of previously crawled pages in this paper we present framework based on maximum entropy markov models memms for an enhanced focused web crawler to take advantage of richer representations of multiple features extracted from web pages such as anchor text and the keywords embedded in the link url to represent useful context the key idea of our approach is to treat the focused web crawling problem as sequential task and use combination of content analysis and link structure to capture sequential patterns leading to targets the experimental results showed that focused crawling using memms is very competitive crawler in general over best first crawling on web data in terms of two metrics precision and maximum average similarity
on line analytical processing olap enables analysts to gain insight about data through fast and interactive access to variety of possible views on information organized in dimensional model the demand for data integration is rapidly becoming larger as more and more information sources appear in modern enterprises in the data warehousing approach selected information is extracted in advance and stored in repository yielding good query performance however in many situations logical rather than physical integration of data is preferable previous web based data integration efforts have focused almost exclusively on the logical level of data models creating need for techniques focused on the conceptual level also previous integration techniques for web based data have not addressed the special needs of olap tools such as handling dimensions with hierarchies extensible markup language xml is fast becoming the new standard for data representation and exchange on the world wide web the rapid emergence of xml data on the web eg business to business b2b commerce is making it necessary for olap and other data analysis tools to handle xml data as well as traditional data formats based on real world case study this paper presents an approach to specification of olap dbs based on web data unlike previous work this approach takes special olap issues such as dimension hierarchies and correct aggregation of data into account also the approach works on the conceptual level using unified modeling language uml as basis for so called uml snowflake diagrams that precisely capture the multidimensional structure of the data an integration architecture that allows the logical integration of xml and relational data sources for use by olap tools is also presented
in previous research it has been shown that link based web page metrics can be used to predict experts’ assessment of quality we are interested in related question do expert rankings of real world entities correlate with search engine se rankings of corresponding web resources to answer this question we compared rankings of college football teams in the us with rankings of their associated web resources we looked at the weekly polls released by the associated press ap and usa today coaches poll both rank the top teams according to the aggregated expertise of sports writers and college football coaches for the entire season we compared the ranking of teams top and top according to the polls with the rankings of one to eight urls associated with each team in google live search and yahoo we found moderate to high correlations between the final rankings of and the se ranking in mid but the correlation between the polls and the ses steadily decreased as the season went on we believe this is because the rankings in the web graph as reported via ses have inertia and do not rapidly fluctuate as do the teams’ on the field fortunes
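A rank correlation such as Spearman's rho is one natural way to quantify the agreement between poll positions and search-engine positions; the data below is invented, and the paper's exact correlation measure may differ.

```python
# Rank correlation between the poll ranking of teams and the search-engine
# ranking of their associated web resources, on made-up data.
from scipy.stats import spearmanr

poll_rank = [1, 2, 3, 4, 5, 6, 7, 8]   # positions in the AP / coaches poll
se_rank   = [2, 1, 3, 6, 4, 5, 8, 7]   # rank of the team's URL in the search engine
rho, p_value = spearmanr(poll_rank, se_rank)
print(f"spearman rho = {rho:.2f} (p = {p_value:.3f})")
```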
we provide computer verified exact monadic functional implementation of the riemann integral in type theory together with previous work by o’connor this may be seen as the beginning of the realization of bishop’s vision to use constructive mathematics as programming language for exact analysis
this paper addresses the problem of organizing material in mixed digital and physical environments it presents empirical examples of how people use collectional artefacts and organize physical material such as paper samples models mock ups plans etc in the real world based on this material we propose concepts for collectional actions and meta data actions and present prototypes combining principles from augmented reality and hypermedia to support organising and managing mixtures of digital and physical materials the prototype of the tagging system is running on digital desks and walls utilizing radio frequency identifier rfid tags and tag readers it allows users to tag important physical materials and have these tracked by antennas that may become pervasive in our work environments we work with three categories of tags simple object tags collectional tags and tooltags invoking operations such as grouping and linking of physical material our primary application domain is architecture and design thus we discuss use of augmented collectional artefacts primarily for this domain
this article reviews selected set of location based services lbs that have been published in the research literature focussing on mobile guides transport support gaming assistive technology and health the research needs and opportunities in each area are evaluated and the connections between each category of lbs are discussed the review illustrates the enormous diversity of forms in which lbs are appearing and the wide range of application sectors that are represented however very few of these applications are implemented pervasively on commercial basis as this is still challenging technically and economically
during the software crisis of the dijkstra’s famous thesis goto considered harmful paved the way for structured programming this short communication suggests that many current difficulties of parallel programming based on message passing are caused by poorly structured communication which is consequence of using low level send receive primitives we argue that like goto in sequential programs send receive should be avoided as far as possible and replaced by collective operations in the setting of message passing we dispute some widely held opinions about the apparent superiority of pairwise communication over collective communication and present substantial theoretical and empirical evidence to the contrary in the context of mpi message passing interface
the development of tools to support synchronous communications between non collocated colleagues has received considerable attention in recent years much of the work has focused on increasing sense of co presence between interlocutors by supporting aspects of face to face conversations that go beyond mere words eg gaze postural shifts in this regard design goal for many environments is the provision of as much media richness as possible to support non collocated communication in this paper we present results from our most recent interviews studying the use of text based virtual environment to support work collaborations we describe how such an environment though lacking almost all the visual and auditory cues known to be important in face to face conversation has played an important role in day to day communication we offer set of characteristics we feel are important to the success of this text only tool and discuss issues emerging from its long term use
in spite of the many decades of progress in database research surprisingly scientists in the life sciences community still struggle with inefficient and awkward tools for querying biological data sets this work highlights specific problem involving searching large volumes of protein data sets based on their secondary structure in this paper we define an intuitive query language that can be used to express queries on secondary structure and develop several algorithms for evaluating these queries we implement these algorithms both in periscope native system that we have built and in commercial ordbms we show that the choice of algorithms can have significant impact on query performance as part of the periscope implementation we have also developed framework for optimizing these queries and for accurately estimating the costs of the various query evaluation plans our performance studies show that the proposed techniques are very efficient in the periscope system and can provide scientists with interactive secondary structure querying options even on large protein data sets
described here is study of how students actively read electronic journal papers to prepare for classroom discussions eighteen students enrolled in graduate course participated in this study half of them read the documents privately while the other half shared their readings these readers were digitally monitored as they read annotated and shared the electronic documents over course of several weeks during semester this monitoring yielded comprehensive data bank of documents with markings and computer logs using semi structured interviews the reading marking and navigational activities of the participating readers were analyzed in detail under scrutiny were range of activities that the subjects carried out analyses of the data revealed the types of markings that the users employ and the ways in which those marking were placed derivation of the user perceived functions of the marking structures was then carried out the findings then lead to several implications for informing the design of reading and marking applications in digital libraries
we report on recent advancements in the development of grounder gringo for logic programs under answer set semantics like its relatives dlv and lparse gringo has in the meantime reached maturity and offers rich modeling language to program developers the attractiveness of gringo is fostered by the fact that it significantly extends the input language of lparse while supporting compatible output format recognized by many state of the art asp solvers
although most wireless terrestrial networks are based on two dimensional design in reality such networks operate in three dimensions since most often the size ie the length and the width of such terrestrial networks is significantly larger than the differences in the third dimension ie the height of the nodes the assumption is somewhat justified and usually it does not lead to major inaccuracies however in some environments this is not the case the underwater atmospheric or space communications being such apparent examples in fact recent interest in underwater acoustic ad hoc and sensor networks hints at the need to understand how to design networks in unfortunately the design of networks is surprisingly more difficult than the design of networks for example proofs of kelvin’s conjecture and kepler’s conjecture required centuries of research to achieve breakthroughs whereas their counterparts are trivial to solve in this paper we consider the coverage and connectivity issues of networks where the goal is to find node placement strategy with sensing coverage of space while minimizing the number of nodes required for surveillance our results indicate that the use of the voronoi tessellation of space to create truncated octahedral cells results in the best strategy in this truncated octahedron placement strategy the transmission range must be at least times the sensing range in order to maintain connectivity among nodes if the transmission range is between and times the sensing range then hexagonal prism placement strategy or rhombic dodecahedron placement strategy should be used although the required number of nodes in the hexagonal prism and the rhombic dodecahedron placement strategies is the same this number is higher than the number of nodes required by the truncated octahedron placement strategy we verify by simulation that our placement strategies indeed guarantee ubiquitous coverage we believe that our approach and our results presented in this paper could be used for extending the processes of network design to networks
this paper proposes an approach to decision analysis for complex industrial process without enough knowledge of input output model which is based on the two class svm method it first proposes svm based decision analysis model to improve the accuracy of determining whether decision is acceptable or unacceptable by verifying the soft margin of svm this makes it allow misclassification of only one class in two class classification then granular based approach is presented for solving this model it is proved that this granular approach can reach an upper bound of the original svm model an algorithm then is presented to determine whether decision is acceptable according to our analysis and experiments the two types of svm have better accuracy on judging its target class than traditional svm and the granular based svm solving can reduce the running time
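One common way to let an SVM tolerate misclassification on only one of the two classes is to give the classes very different penalty weights. The sketch below uses scikit-learn's class_weight with made-up data purely for illustration; it is not necessarily the formulation used in the paper, and the granular solving step is not shown.

```python
# Asymmetric soft-margin SVM: errors on the target class are penalized far more
# heavily than errors on the other class, so the learned margin essentially
# forbids misclassifying the target class.  Data and labels are hypothetical.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.2, 0.1], [0.4, 0.3], [0.3, 0.5],
              [0.9, 0.8], [0.8, 0.9], [0.7, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])   # 1 = "acceptable" decisions

clf = SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 100.0})
clf.fit(X, y)
print(clf.predict([[0.75, 0.7]]))
```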
given widespread interest in rough sets as being applied to various tasks of data analysis it is not surprising at all that we have witnessed wave of further generalizations and algorithmic enhancements of this original concept in this study we investigate an idea of rough fuzzy random sets this construct provides us with certain generalization of rough sets by introducing the concept of inclusion degree the underlying objective behind this development is to address the problems which involve co existing factors of fuzziness and randomness thus giving rise to notion of the fuzzy random approximation space based on inclusion degree some essential properties of rough approximation operators of such rough fuzzy random sets are discussed further theoretical foundations for the formation of rules constructed on basis of available decision tables are offered as well
general class of program analyses are a combination of context free and regular language reachability we define regularly annotated set constraints constraint formalism that captures this class our results extend the class of reachability problems expressible naturally in single constraint formalism including such diverse applications as interprocedural dataflow analysis precise type based flow analysis and pushdown model checking
in this paper we analyze several issues involved in developing low latency adaptive wormhole routing schemes for two dimensional meshes it is observed that along with adaptivity balanced distribution of traffic has significant impact on the system performance motivated by this observation we develop new fully adaptive routing algorithm called positive first negative first for two dimensional meshes the algorithm uses only two virtual channels per physical channel creating two virtual networks the messages are routed positive first in one virtual network and negative first in the other because of this combination the algorithm distributes the system load uniformly throughout the network and is also fully adaptive it is shown that the proposed algorithm results in providing better performance in terms of the average network latency and throughput when compared with the previously proposed routing algorithms
we consider network congestion problems between tcp flows and define new game the window game which models the problem of network congestion caused by the competing flows analytical and experimental results show the relevance of the window game to the real tcp game and provide interesting insight on nash equilibria of the respective network games furthermore we propose new algorithmic queue mechanism called prince which at congestion makes scapegoat of the most greedy flow preliminary evidence shows that prince achieves efficient nash equilibria while requiring only limited computational resources
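A much-simplified sketch of the scapegoat idea, with hypothetical class and parameter names: when the buffer is full, the arriving packet displaces a packet of whichever flow currently occupies the most buffer space. A real implementation of prince would live inside a router queue discipline and interact with TCP dynamics, none of which is modeled here.

```python
# Toy queue that, at congestion, makes a scapegoat of the greediest flow by
# evicting one of its buffered packets.
from collections import deque, Counter

class ScapegoatQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()              # (flow_id, packet)
        self.occupancy = Counter()        # packets buffered per flow

    def enqueue(self, flow_id, packet):
        if len(self.queue) >= self.capacity:
            greediest, _ = self.occupancy.most_common(1)[0]
            for i, (f, _p) in enumerate(self.queue):   # evict one packet of the greediest flow
                if f == greediest:
                    del self.queue[i]
                    self.occupancy[greediest] -= 1
                    break
        self.queue.append((flow_id, packet))
        self.occupancy[flow_id] += 1

q = ScapegoatQueue(capacity=4)
for flow in ["a", "a", "a", "b", "c"]:    # "a" is the greedy flow and loses a packet
    q.enqueue(flow, packet=object())
print([f for f, _ in q.queue])            # -> ['a', 'a', 'b', 'c']
```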
this paper presents process for capturing spatially and directionally varying illumination from real world scene and using this lighting to illuminate computer generated objects we use two devices for capturing such illumination in the first we photograph an array of mirrored spheres in high dynamic range to capture the spatially varying illumination in the second we obtain higher resolution data by capturing images with a high dynamic range omnidirectional camera as it traverses across plane for both methods we apply the light field technique to extrapolate the incident illumination to volume we render computer generated objects as illuminated by this captured illumination using custom shader within an existing global illumination rendering system to demonstrate our technique we capture several spatially varying lighting environments with spotlights shadows and dappled lighting and use them to illuminate synthetic scenes we also show comparisons to real objects under the same illumination
this paper advocates the placement of architecturally visible communication avc buffers between adjacent cores in mpsocs to provide high throughput communication for streaming applications producer consumer relationships map poorly onto cache based mpsocs instead we instantiate application specific avc buffers on top of distributed consistent and coherent cache based system with shared main memory to provide the desired functionality using jpeg compression as case study we show that the use of avc buffers in conjunction with parallel execution via heterogeneous software pipelining provides speedup of as much as compared to baseline single processor system with an increase in estimated memory energy consumption of only additionally we describe method to integrate the avc buffers into the cache coherence protocol this allows the runtime system to guarantee memory safety and coherence in situations where the parallelization of the application may be unsafe due to pointers that could not be resolved at compile time
concurrent ml is an extension of standard ml with calculus like primitives for multithreaded programming cml has reduction semantics but to date there has been no labelled transition system semantics provided for the entire language in this paper we present labelled transition semantics for fragment of cml called μvcml which includes features not covered before dynamically generated local channels and thread identifiers we show that weak bisimilarity for μvcml is congruence and coincides with barbed bisimulation congruence we also provide variant of sangiorgi’s normal bisimulation for μvcml and show that this too coincides with bisimilarity
electronic system level esl modeling allows early hardware dependent software hds development due to broad cpu diversity and shrinking time to market hds development can neither rely on hand retargeting binary tools nor can it rely on pre existent tools within standard packages as consequence binary utilities which can be easily adapted to new cpu targets are of increasing interest we present in this article framework for automatic generation of binary utilities it relies on two innovative ideas platform aware modeling and more inclusive relocation handling generated assemblers linkers disassemblers and debuggers were validated for mips sparc powerpc and picf an open source prototype generator is available for download
concurrency is used in modern software systems as means of addressing performance availability and reliability requirements the collaboration of multiple independently executing components is fundamental to meeting such requirements and such collaboration is realized by synchronizing component execution using current technologies developers are faced with tension between correct synchronization and performance developers can be confident when simple forms of synchronization are used for example locking all accesses to shared data unfortunately such simple approaches can result in significant run time overhead and in fact there are many cases in which such simple approaches cannot implement required synchronization policies implementing more sophisticated and less constraining synchronization policies may improve run time performance and satisfy synchronization requirements but fundamental difficulties in reasoning about concurrency make it difficult to assess their correctness this paper describes an approach to automatically synthesizing complex synchronization implementations from formal high level specifications moreover the generated code is designed to be processed easily by software model checking tools such as bandera this enables the generated synchronization solutions to be verified for important system correctness properties we believe this is an effective approach because the tool support provided makes it simple to use it has solid semantic foundation it is language independent and we have demonstrated that it is powerful enough to solve numerous challenging synchronization problems
this work unifies two important threads of research in intelligent user interfaces which share the common element of explicit task modeling on the one hand longstanding research on task centered gui design sometimes called model based design has explored the benefits of explicitly modeling the task to be performed by an interface and using this task model as an integral part of the interface design process more recently research on collaborative interface agents has shown how an explicit task model can be used to control the behavior of software agent that helps user perform tasks using gui this paper describes collection of tools we have implemented which generate both gui and collaborative interface agent from the same task model our task centered gui design tool incorporates number of novel features which help the designer to integrate the task model into the design process without being unduly distracted our implementation of collaborative interface agents is built on top of the collagen middleware for collaborative interface agents
with the proliferation of embedded devices and systems there is renewed interest in the generation of compact binaries code compaction techniques identify code sequences that repeatedly appear in program and replace them by single copy of the recurring sequence in existing techniques such sequences are typically restricted to single entry single exit regions in the control flow graph we have observed that in many applications recurring code sequences form single entry multiple exit seme regions in this paper we propose generalized algorithm for code compaction that first decomposes control flow graph into hierarchy of seme regions computes signatures of seme regions and then uses the signatures to find pairs of matching seme regions maximal sized matching seme regions are found and transformed to achieve code compaction our transformation is able to compact matching seme regions whose exits may lead to combination of identical and differing targets our experiments show that this transformation can lead to substantial reduction in code size for many embedded applications
this article describes the context design and recent development of the lapack for clusters lfc project it has been developed in the framework of self adapting numerical software sans since we believe such an approach can deliver the convenience and ease of use of existing sequential environments bundled with the power and versatility of highly tuned parallel codes that execute on clusters accomplishing this task is far from trivial as we argue in the paper by presenting pertinent case studies and possible usage scenarios
many robotic and machine vision applications rely on the accurate results of stereo correspondence algorithms however difficult environmental conditions such as differentiations in illumination depending on the viewpoint heavily affect the stereo algorithms performance this work proposes new illumination invariant dissimilarity measure in order to substitute the established intensity based ones the proposed measure can be adopted by almost any of the existing stereo algorithms enhancing it with its robust features the performance of the dissimilarity measure is validated through experimentation with new adaptive support weight asw stereo correspondence algorithm experimental results for variety of lighting conditions are gathered and compared to those of intensity based algorithms the algorithm using the proposed dissimilarity measure outperforms all the other examined algorithms exhibiting tolerance to illumination differentiations and robust behavior
due to the emergence of the http standards persistent connections are increasingly being used in web retrieval this paper studies the caching performance of web clusters under persistent connections focusing on the difference between session grained and request grained allocation strategies adopted by the web switch it is shown that the content based algorithm considerably improves caching performance over the content blind algorithm at the request grained level however most of the performance gain is offset by the allocation dependency that arises when the content based algorithm is used at the session grained level the performance loss increases with cluster size and connection holding time an optimization problem is formulated to investigate the best achievable caching performance under session grained allocation based on heuristic approach session affinity aware algorithm is presented that makes use of the correlation between the requests in session experimental results show that while the session affinity aware algorithm outperforms the content based algorithm under session grained allocation this optimization cannot fully compensate for the performance loss caused by allocation dependency
the recent mpeg standard specifies semi structured meta data format for open interoperability of multimedia however the standard refrains from specifying how the meta data is to be used or how meta data inappropriate to user requirements may be filtered out consequently we propose cosmos which produces structured mpeg compliant meta data for digital video and enables content based hybrid filtering of that meta data
in modern process industry it is often difficult to analyze manufacture process due to its numerous time series data analysts wish to not only interpret the evolution of data over time in working procedure but also examine the changes in the whole production process through time to meet such analytic requirements we have developed processlines an interactive visualization tool for large amount of time series data in process industry the data are displayed in fisheye timeline processlines provides good overviews for the whole production process and details for the focused working procedure user study using beer industry production data has shown that the tool is effective
there has been extensive research in xml keyword based and loosely structured querying some frameworks work well for certain types of xml data models and fail in others the reason is that the proposed techniques are based on finding relationships between solely individual nodes while overlooking the context of these nodes the context of leaf node is determined by its parent node because it specifies one of the characteristics of its parent node building relationships between individual leaf nodes without consideration of their parents may result in relationships that are semantically disconnected since leaf nodes are nothing but characteristics of their parents we observe that we could treat each parent children set of nodes as one unified entity we then find semantic relationships between the different unified entities based on those observations we propose an xml semantic search engine called ooxsearch which answers loosely structured queries the recall and precision of the engine were evaluated experimentally and compared with two recent proposed systems and the results showed marked improvement
stretch free surface flattening has been requested by variety of applications at present the most difficult problem is how to segment given model into nearly developable atlases so that nearly stretch free flattening can be computed the criterion for segmentation is needed to evaluate the possibility of flattening given surface patch which should be fast computed in this paper we present method to compute the length preserved free boundary lpfb of mesh patch which speeds up the mesh parameterization the distortion on parameterization can then be employed as the criterion in trial and error algorithm for segmenting given model into nearly developable atlases the computation of lpfb is formulated as numerical optimization problem in the angle space where we are trying to optimize the angle excesses on the boundary while preserving the constraints derived from the closed path theorem and the length preservation
in modern designs the delay of net can vary significantly depending on its routing this large estimation error during the pre routing stage can often mislead the optimization of the netlist we extend state of the art interconnect driven physical synthesis by introducing new paradigm namely persistence that relies on guaranteed net routes for the most sensitive nets while performing circuit optimization in the pre route stage we implemented our proposed approach in cutting edge industrial physical synthesis flow this involved the automatic identification and routing of critical nets that were likely to be mispredicted the automatic update of their routes during the subsequent pre routing stage optimizations and the guaranteed retention of their routes across the routing stage our approach achieves significant performance improvements on suite of real world nm designs while ensuring that the impact on their routability remains negligible furthermore our experimental results scale very well with design size
this paper presents and studies distributed cache management approach through os level page allocation for future many core processors cache management is crucial multicore processor design aspect to overcome non uniform cache access latency for good program performance and to reduce on chip network traffic and related power consumption unlike previously studied hardware based private and shared cache designs implementing fixed caching policy the proposed os microarchitecture approach is flexible it can easily implement wide spectrum of caching policies without complex hardware support furthermore our approach can provide differentiated execution environment to running programs by dynamically controlling data placement and cache sharing degrees we discuss key design issues of the proposed approach and present preliminary experimental results showing the promise of our approach
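the flexibility argument is easiest to see in code; below is a minimal python sketch, under the assumption that the cache slice a page maps to is a simple function of its physical frame number (an illustrative mapping, not the paper's actual hardware or allocator)

    # minimal sketch: os-level page allocation that controls which cache slice
    # a physical page maps to; assumes slice = frame_number % num_slices,
    # which is an illustrative mapping, not a specific hardware design.

    class PageAllocator:
        def __init__(self, num_frames, num_slices):
            self.num_slices = num_slices
            # one free list per cache slice, filled with frame numbers
            self.free = {s: [] for s in range(num_slices)}
            for f in range(num_frames):
                self.free[f % num_slices].append(f)

        def alloc(self, preferred_slice=None):
            """allocate a frame, preferring a given cache slice (for example the
            slice local to the core running the requesting process)."""
            slices = ([preferred_slice] if preferred_slice is not None else []) \
                     + list(range(self.num_slices))
            for s in slices:
                if self.free[s]:
                    return self.free[s].pop()
            raise MemoryError("no free frames")

    alloc = PageAllocator(num_frames=1024, num_slices=16)
    frame = alloc.alloc(preferred_slice=3)   # place data near core 3's slice

changing the caching policy then amounts to changing how preferred slices are chosen, without touching the hardware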
today graphics processing units gpus are largely underexploited resource on existing desktops and possible cost effective enhancement to high performance systems to date most applications that exploit gpus are specialized scientific applications little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level this study starts from the hypothesis that generic middleware level techniques that improve distributed system reliability or performance such as content addressing erasure coding or data similarity detection can be significantly accelerated using gpu support we take first step towards validating this hypothesis and we design storegpu library that accelerates number of hashing based middleware primitives popular in distributed storage system implementations our evaluation shows that storegpu enables up to twenty five fold performance gains on synthetic benchmarks as well as on high level application the online similarity detection between large data files
database applications often impose temporal dependencies between transactions that must be satisfied to preserve data consistency the extant correctness criteria used to schedule the execution of concurrent transactions are either time independent or use strict difficult to satisfy real time constraints on one end of the spectrum serializability completely ignores time on the other end deadline scheduling approaches consider the outcome of each transaction execution correct only if the transaction meets its real time deadline in this article we explore new correctness criteria and scheduling methods that capture temporal transaction dependencies and belong to the broad area between these two extreme approaches we introduce the concepts of succession dependency and chronological dependency and define correctness criteria under which temporal dependencies between transactions are preserved even if the dependent transactions execute concurrently we also propose chronological scheduler that can guarantee that transaction executions satisfy their chronological constraints the advantages of chronological scheduling over traditional scheduling methods as well as the main issues in the implementation and performance of the proposed scheduler are discussed
to enable effective cross organizational collaborations process providers have to offer external views on their internal processes to their partners process view hides details of an internal process that are secret to or irrelevant for the partners this paper describes formal two step approach for constructing customized process views on structured process models first non customized process view is constructed from an internal structured process model by aggregating internal activities the provider wishes to hide second customized process view is constructed by hiding and omitting activities from the non customized view that are not requested by the process consumer the feasibility of the approach is shown by means of case study
despite the intense interest towards realizing the semantic web vision most existing rdf data management schemes are constrained in terms of efficiency and scalability still the growing popularity of the rdf format arguably calls for an effort to offset these drawbacks viewed from relational database perspective these constraints are derived from the very nature of the rdf data model which is based on triple format recent research has attempted to address these constraints using vertical partitioning approach in which separate two column tables are constructed for each property however as we show this approach suffers from similar scalability drawbacks on queries that are not bound by rdf property value in this paper we propose an rdf storage scheme that uses the triple nature of rdf as an asset this scheme enhances the vertical partitioning idea and takes it to its logical conclusion rdf data is indexed in six possible ways one for each possible ordering of the three rdf elements each instance of an rdf element is associated with two vectors each such vector gathers elements of one of the other types along with lists of the third type resources attached to each vector element hence sextuple indexing scheme emerges this format allows for quick and scalable general purpose query processing it confers significant advantages up to five orders of magnitude compared to previous approaches for rdf data management at the price of worst case five fold increase in index space we experimentally document the advantages of our approach on real world and synthetic data sets with practical queries
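a minimal python sketch of the sextuple idea, assuming plain nested dictionaries rather than the paper's vector based layout, shows why any triple pattern with one or two bound positions can be answered from a suitable ordering

    from collections import defaultdict
    from itertools import permutations

    # simplified sketch of sextuple indexing: one index per ordering of (s, p, o);
    # nested dicts stand in for the per-element vectors described in the paper.

    ORDERINGS = list(permutations("spo"))   # spo, sop, pso, pos, osp, ops

    def build_indexes(triples):
        indexes = {o: defaultdict(lambda: defaultdict(set)) for o in ORDERINGS}
        for t in triples:
            tri = dict(zip("spo", t))
            for order in ORDERINGS:
                a, b, c = (tri[k] for k in order)
                indexes[order][a][b].add(c)
        return indexes

    triples = [("alice", "knows", "bob"), ("alice", "likes", "rdf"),
               ("bob", "knows", "carol")]
    idx = build_indexes(triples)
    # pattern (?s, knows, ?o): use the index ordered p, s, o
    print(dict(idx[("p", "s", "o")]["knows"]))

the worst case space increase comes directly from keeping all six orderings, which is the trade off noted above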
as large data centers emerge which host multiple web applications it is critical to isolate different application environments for security reasons and to provision shared resources effectively and efficiently to meet different service quality targets at minimum operational cost to address this problem we developed novel architecture of resource management framework for multi tier applications based on virtualization mechanisms key techniques presented in this paper include establishment of the analytic performance model which employs probabilistic analysis and overload management to deal with non equilibrium states general formulation of the resource management problem which can be solved by incorporating both deterministic and stochastic optimizing algorithms deployment of virtual servers to partition resource at much finer level and investigation of the impact of the failure rate to examine the effect of application isolation simulation experiments comparing three resource allocation schemes demonstrate the advantage of our dynamic approach in providing differentiated service qualities preserving qos levels in failure scenarios and also improving the overall performance while reducing the resource usage cost
trends towards consolidation and higher density computing configurations make the problem of heat management one of the critical challenges in emerging data centers conventional approaches to addressing this problem have focused at the facilities level to develop new cooling technologies or optimize the delivery of cooling in contrast to these approaches our paper explores an alternate dimension to address this problem namely systems level solution to control the heat generation through temperature aware workload placement we first examine theoretic thermodynamic formulation that uses information about steady state hot spots and cold spots in the data center and develop real world scheduling algorithms based on the insights from these results we develop an alternate approach our new approach leverages the non intuitive observation that the source of cooling inefficiencies can often be in locations spatially uncorrelated with its manifested consequences this enables additional energy savings overall our results demonstrate up to factor of two reduction in annual data center cooling costs over location agnostic workload distribution purely through software optimizations without the need for any costly capital investment
in various situations mobile agents at different hosts must cooperate with one another by sharing information and making decisions collectively to ensure effective interagent communication communication protocols must track target agent locations and deliver messages reliably researchers have proposed wide range of schemes for agent tracking and reliable message delivery however each scheme has its own assumptions design goals and methodology as result no uniform or structured methods exist for characterizing current protocols making it difficult to evaluate their relative effectiveness and performance the authors propose mailbox based scheme for designing mobile agent communication protocols this scheme assigns each agent mailbox to buffer messages but decouples the agent and mailbox to let them reside at different hosts and migrate separately
spyware infections are becoming extremely pervasive posing grave threat to internet users privacy control of such an epidemic is increasingly difficult for the existing defense mechanisms which in many cases rely on detection alone in this paper we propose spyshield new containment technique to add another layer of defense against spyware our technique can automatically block the visions of untrusted programs in the presence of sensitive information which preserves users privacy even after spyware has managed to evade detection it also enables users to avoid the risks of using free software which could be bundled with surveillance code as first step our design of spyshield offers general protection against spy add ons an important type of spyware this is achieved through enforcing set of security policies to the channels an add on can use to monitor its host application such as com interfaces and shared memory so as to block unauthorized leakage of sensitive information we prototyped spyshield under windows xp to protect internet explorer and also evaluated it using real plug ins our experimental study shows that the technique can effectively disrupt spyware surveillance in accordance with security policies and introduce only small overhead
we show how the reachability analysis of rpps systems can be tackled with the tree automata techniques proposed by lugiez and schnoebelen for pa this approach requires that we express the states of rpps systems in rpa tailor made process rewrite system where reachability is relation recognizable by finite tree automata two outcomes of this study are an np algorithm for reachability in rpps systems and simple decision procedure for large class of reachability problems in rpa systems
in previous work we proposed an algebra whose operators allow one to specify the valid compound terms of faceted taxonomy in flexible manner by combining positive and negative statements in this paper we treat the same problem but in more general setting where the facet taxonomies are not independent but are possibly interrelated through narrower broader relationships between their terms the proposed algebra called interrelated facet composition algebra ifca is more powerful as the valid compound terms of faceted taxonomy can be derived through smaller set of declared valid and or invalid compound terms an algorithm that checks compound term validity according to well formed ifca expression optimized with respect to the naive approach is provided together with its worst case time complexity
in this paper we introduce the idea of organizing systems through number of examples from an ongoing ethnographic study of family life we suggest that organizing systems come about through the artful design and use of informational artifacts in the home such as calendars paper notes to do lists etc these systems are not only seen to organize household routines and schedules but also crucially to shape the social relations between family members drawing attention to the material properties of informational artifacts and how assemblies of these artifacts come to make up organizing systems we discuss some general implications for designing information technology for the home most importantly we suggest that technologies must be designed to accommodate the rich and diverse ways in which people organize their homes providing them with the resources to artfully construct their own systems rather than enforcing ones that are removed from their own experiences
most implementations of workstation based multimedia information systems cannot support continuous display of high resolution audio and video data and suffer from frequent disruptions and delays termed hiccups this is due to the low bandwidth of the current disk technology the high bandwidth requirement of multimedia objects and the large size of these objects which requires them to be almost always disk resident parallel multimedia information system and the key technical ideas that enable it to support real time display of multimedia objects are described in this system multimedia object is declustered across several disk drives enabling the system to utilize the aggregate bandwidth of multiple disks to retrieve an object in real time then the workload of an application is distributed evenly across the disk drives to maximize the processing capability of the system to support simultaneous display of several multimedia objects for different users two alternative approaches are described the first approach multitasks disk drive among several requests while the second replicates the data and dedicates resources to each individual request the trade offs associated with each approach are investigated using simulation model
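a rough sketch of the declustering idea, assuming simple round robin striping rather than the system's actual placement policy, is shown below; each display round fetches one block per disk so the aggregate bandwidth becomes available

    # illustrative round-robin declustering of a multimedia object across disks;
    # the block size and the striping policy are assumptions for this sketch.

    def decluster(obj_bytes, num_disks, block_size):
        """return a per-disk list of blocks for one object."""
        disks = [[] for _ in range(num_disks)]
        blocks = [obj_bytes[i:i + block_size]
                  for i in range(0, len(obj_bytes), block_size)]
        for i, blk in enumerate(blocks):
            disks[i % num_disks].append(blk)
        return disks

    def display_rounds(disks):
        """yield, round by round, the blocks fetched in parallel from all disks."""
        for r in range(max(len(d) for d in disks)):
            yield [d[r] for d in disks if r < len(d)]

    placement = decluster(b"x" * 10_000, num_disks=4, block_size=1_000)
    for rnd in display_rounds(placement):
        pass  # each round reads one block from each disk concurrently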
we present an approach to construct networked online education system among physically separated participants targeted at teaching handwritten characters the proposed system provides haptic channel to intuitively learn remote instructor’s fine motor skills through the sense of touch the instructor’s handwriting motions are sent to the learner’s system and replayed via the learner’s haptic device the learner can intuitively acquire the instructor’s writing skill through the haptic interactions additionally the instructor can check the learner’s writing exercise and give him or her some advice to improve the writing ability we also describe experiments to quantitatively analyse how the haptic interactions improve the learning quality
shared registers are basic objects used as communication mediums in asynchronous concurrent computation concurrent timestamp system is higher typed communication object and has been shown to be powerful tool to solve many concurrency control problems it has turned out to be possible to construct such higher typed objects from primitive lower typed ones the next step is to find efficient constructions we propose very efficient wait free construction of bounded concurrent timestamp systems from writer shared registers this finalizes corrects and extends preliminary bounded multiwriter construction proposed by the second author in that work partially initiated the current interest in wait free concurrent objects and introduced notion of discrete vector clocks in distributed algorithms
software maintainers are faced with the task of regression testing retesting modified program on an often large number of test cases the cost of regression testing can be reduced if the size of the program that must be retested is reduced and if old test cases and old test results can be reused two complementary algorithms for reducing the cost of regression testing are presented the first produces program called differences that captures the semantic change between certified previously tested program and modified changed version of certified it is more efficient to test differences because it omits unchanged computations the program differences is computed using combination of program slices the second algorithm identifies test cases for which certified and modified will produce the same output and existing test cases that will test components new in modified not rerunning test cases that produce the same output avoids unproductive testing testing new components with existing test cases avoids the costly construction of new test cases the second algorithm is based on the notion of common execution patterns which is the interprocedural extension of the notion introduced by bates and horwitz program components with common execution patterns have the same execution pattern during some call to their procedure they are computed using new type of interprocedural slice called calling context slice whereas an interprocedural slice includes the program components necessary to capture all possible executions of statement calling context slice includes only those program components necessary to capture the execution of statement in particular calling context ie particular call to the procedure together with differences it is possible to test modified by running the smaller program differences on smaller number of test cases this is more efficient than running modified on large number of test cases prototype implementation has been built to examine and illustrate these algorithms
mining frequent patterns such as frequent itemsets is core operation in many important data mining tasks such as in association rule mining mining frequent itemsets in high dimensional datasets is challenging since the search space is exponential in the number of dimensions and the volume of patterns can be huge many of the state of the art techniques rely upon the use of prefix trees eg fp trees which allow nodes to be shared among common prefix paths however the scalability of such techniques may be limited when handling high dimensional datasets the purpose of this paper is to analyse the behaviour of mining frequent itemsets when instead of tree data structure canonical directed acyclic graph namely zero suppressed binary decision diagram zbdd is used due to its compactness and ability to promote node reuse zbdd has proven very effective in other areas of computer science such as boolean sat solvers in this paper we show how zbdds can be used to mine frequent itemsets and their common varieties we also introduce weighted variant of zbdd which allows more efficient mining algorithm to be developed we provide an experimental study concentrating on high dimensional biological datasets and identify indicative situations where zbdd technology can be superior over the prefix tree based technique
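a minimal python sketch of a zbdd with its two reduction rules, node sharing through a unique table and zero suppression, is given below; it illustrates the data structure only, not the weighted variant or the mining algorithm itself

    # minimal zero suppressed binary decision diagram (zbdd) sketch: a node whose
    # "item present" branch is the empty family is dropped (zero suppression) and
    # identical nodes are shared through a unique table; itemsets are the paths
    # reaching the 1-terminal.

    ZERO, ONE = ("T", 0), ("T", 1)      # terminals: empty family / family {{}}
    _unique = {}                        # unique table enabling node reuse

    def node(var, lo, hi):
        if hi == ZERO:                  # zero suppression rule
            return lo
        key = (var, lo, hi)
        return _unique.setdefault(key, key)

    def top(z):                         # top variable; terminals sort last
        return z[0] if z not in (ZERO, ONE) else float("inf")

    def union(p, q):                    # family union of two zbdds
        if p == ZERO: return q
        if q == ZERO: return p
        if p == q:    return p
        if top(p) > top(q):
            p, q = q, p                 # p now has the smaller top variable
        if top(p) < top(q):             # q's sets cannot contain p's top item
            return node(p[0], union(p[1], q), p[2])
        return node(p[0], union(p[1], q[1]), union(p[2], q[2]))

    def single(itemset):                # zbdd holding exactly one itemset
        z = ONE
        for item in sorted(itemset, reverse=True):
            z = node(item, ZERO, z)
        return z

    def family(itemsets):
        z = ZERO
        for s in itemsets:
            z = union(z, single(s))
        return z

    f = family([{1, 2}, {1, 3}, {2}])
    print(len(_unique))                 # distinct internal nodes ever created

node reuse across itemsets sharing suffixes is what gives the structure its compactness on dense high dimensional data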
workflow technology has recently been employed in scientific applications because of their ever increasing complexities across multiple organizations institutes research labs or units over the internet and intranet in this paper we propose methodology for the decomposition of complex scientific process requirements into different types of elementary flows such as control data exception semantics and security based on that we can determine the subset of each type of flows ie flow views necessary and the related requirements for the interactions with each type of collaboration partners in the process integration these subsets collectively constitute process view based on which interactions can be systematically designed integrated and managed in scalable way we show with case study in scientific research environment to demonstrate our approach we further illustrate how these flows can be implemented with various contemporary web services technologies
we present graph clear novel pursuit evasion problem on graphs which models the detection of intruders in complex indoor environments by robot teams the environment is represented by graph and robot team can execute sweep and block actions on vertices and edges respectively sweep action detects intruders in vertex and represents the capability of the robot team to detect intruders in the region associated to the vertex similarly block action prevents intruders from crossing an edge and represents the capability to detect intruders as they move between regions both actions may require multiple robots to be executed strategy is sequence of block and sweep actions to detect all intruders when instances of graph clear are being solved the goal is to determine optimal strategies ie strategies that use the least number of robots we prove that for the general case of graphs the problem of computation of optimal strategies is np hard next for the special case of trees we provide polynomial time algorithm the algorithm ensures that throughout the execution of the strategy all cleared vertices form connected subtree and we show that it produces optimal strategies
recent work on ontology based information extraction ie has tried to make use of knowledge from the target ontology in order to improve semantic annotation results however very few approaches exploit the ontology structure itself and those that do so have some limitations this paper introduces hierarchical learning approach for ie which uses the target ontology as an essential part of the extraction process by taking into account the relations between concepts the approach is evaluated on the largest available semantically annotated corpus the results demonstrate clearly the benefits of using knowledge from the ontology as input to the information extraction process we also demonstrate the advantages of our approach over other state of the art learning systems on commonly used benchmark dataset
in pp vod streaming systems user behavior modeling is critical to help optimise user experience as well as system throughput however it still remains challenging task due to the dynamic characteristics of user viewing behavior in this paper we consider the problem of user seeking prediction which is to predict the user’s next seeking position so that the system can proactively make response we present novel method for solving this problem in our method frequent sequential patterns mining is first performed to extract abstract states which are not overlapped and cover the whole video file altogether after mapping the raw training dataset to state transitions according to the abstract states we use simple probabilistic contingency table to build the prediction model we design an experiment on the synthetic pp vod dataset the results demonstrate the effectiveness of our method
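a minimal sketch of this kind of predictor, assuming fixed width segments in place of the mined abstract states, is given below

    from collections import defaultdict

    # minimal sketch of seek prediction from state transitions: video positions
    # are mapped to abstract states (here fixed-width segments rather than mined
    # patterns, a simplifying assumption), a contingency table counts observed
    # state transitions, and the next seek target is the most frequent successor
    # of the current state.

    SEGMENT = 60.0                       # seconds per abstract state (assumed)

    def state(position):
        return int(position // SEGMENT)

    def train(seek_traces):
        table = defaultdict(lambda: defaultdict(int))
        for trace in seek_traces:        # each trace: seek positions in a session
            for a, b in zip(trace, trace[1:]):
                table[state(a)][state(b)] += 1
        return table

    def predict_next(table, position):
        successors = table.get(state(position))
        if not successors:
            return None
        best = max(successors, key=successors.get)
        return best * SEGMENT            # representative position of the state

    traces = [[10, 130, 250], [20, 140, 260], [15, 400]]
    model = train(traces)
    print(predict_next(model, 12))       # most sessions jump from segment 0 to 2

the system could then prefetch the segment around the predicted position before the seek is issued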
the drive for greater detail in scientific computing and digital photography is creating demand for ultra resolution images and visualizations such images are best viewed on large displays with enough resolution to show big picture relationships concurrently with fine grained details historically large scale displays have been rare due to the high costs of equipment space and maintenance however modern tiled displays of commodity lcd monitors offer large aggregate image resolution and can be constructed and maintained at low cost we present discussion of the factors to consider in constructing tiled lcd displays and an evaluation of current approaches used to drive them based on our experience constructing displays ranging from mpixels to mpixels we wish to capture current practices to inform the design and construction of future displays at current and larger scales
among existing grid middleware approaches one simple powerful and flexible approach consists of using servers available in different administrative domains through the classic client server or remote procedure call rpc paradigm network enabled servers nes implement this model also called gridrpc clients submit computation requests to scheduler whose goal is to find server available on the grid the aim of this paper is to give an overview of an nes middleware developed in the graal team called diet and to describe recent developments diet distributed interactive engineering toolbox is hierarchical set of components used for the development of applications based on computational servers on the grid
regression testing is an expensive testing procedure utilized to validate modified software regression test selection techniques attempt to reduce the cost of regression testing by selecting subset of program’s existing test suite safe regression test selection techniques select subsets that under certain well defined conditions exclude no tests from the original test suite that if executed would reveal faults in the modified software many regression test selection techniques including several safe techniques have been proposed but few have been subjected to empirical validation this paper reports empirical studies on particular safe regression test selection technique in which the technique is compared to the alternative regression testing strategy of running all tests the results indicate that safe regression test selection can be cost effective but that its costs and benefits vary widely based on number of factors in particular test suite design can significantly affect the effectiveness of test selection and coverage based test suites may provide test selection results superior to those provided by test suites that are not coverage based
we develop an availability solution called safetynet that uses unified lightweight checkpoint recovery mechanism to support multiple long latency fault detection schemes at an abstract level safetynet logically maintains multiple globally consistent checkpoints of the state of shared memory multiprocessor ie processors memory and coherence permissions and it recovers to pre fault checkpoint of the system and re executes if fault is detected safetynet efficiently coordinates checkpoints across the system in logical time and uses logically atomic coherence transactions to free checkpoints of transient coherence state safetynet minimizes performance overhead by pipelining checkpoint validation with subsequent parallel execution we illustrate safetynet avoiding system crashes due to either dropped coherence messages or the loss of an interconnection network switch and its buffered messages using full system simulation of way multiprocessor running commercial workloads we find that safetynet adds statistically insignificant runtime overhead in the common case of fault free execution and avoids crash when tolerated faults occur
deployment of multi agent system on network refers to the placement of one or more copies of each agent on network hosts in such manner that the memory constraints of each node are satisfied finding the deployment that is most likely to tolerate faults ie have at least one copy of each agent functioning and in communication with other agents is challenge in this paper we address the problem of finding the probability of survival of deployment ie the probability that deployment will tolerate faults under the assumption that node failures are independent we show that the problem of computing the survival probability of deployment is at least np hard moreover it is hard to approximate we produce two algorithms to accurately compute the probability of survival of deployment these algorithms are expectedly exponential we also produce five heuristic algorithms to estimate survival probabilities these algorithms work in acceptable time frames we report on detailed set of experiments to determine the conditions under which some of these algorithms perform better than the others
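as an illustration of the problem itself (and not one of the paper's algorithms), the following python sketch estimates the survival probability by monte carlo sampling under independent node failures, ignoring the communication connectivity requirement for simplicity

    import random

    # hedged illustration: monte carlo estimate of the probability that a
    # deployment survives independent host failures, where surviving means every
    # agent still has at least one copy on an up host. deployment maps each host
    # to the set of agent copies placed on it.

    def survives(deployment, up_hosts, agents):
        alive = set()
        for host in up_hosts:
            alive.update(deployment.get(host, ()))
        return agents <= alive

    def estimate_survival(deployment, fail_prob, trials=100_000, seed=0):
        rng = random.Random(seed)
        agents = set().union(*deployment.values())
        hosts = list(deployment)
        ok = 0
        for _ in range(trials):
            up = [h for h in hosts if rng.random() >= fail_prob[h]]
            ok += survives(deployment, up, agents)
        return ok / trials

    deployment = {"h1": {"a", "b"}, "h2": {"b", "c"}, "h3": {"a", "c"}}
    fail_prob = {"h1": 0.1, "h2": 0.2, "h3": 0.1}
    print(estimate_survival(deployment, fail_prob))

sampling sidesteps the exponential exact computation mentioned above at the cost of statistical error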
web based education and training provides new paradigm for imparting knowledge students can access the learning material anytime by operating remotely from any location webd open standards such as xd and vrml support web based delivery of educational virtual environments eves eves have great potential for learning and training purposes by allowing one to circumvent physical safety and cost constraints unfortunately eves often leave to the user the onus of taking the initiative both in exploring the environment and interacting with its parts possible solution to this problem is the exploitation of virtual humans acting as informal coaches or more formal instructors for example virtual humans can be employed to show and explain maintenance procedures allowing learners to receive more practical explanations which are easier to understand however virtual humans are rarely used in webd eves since the programming effort to develop and re use them in different environments can be considerable in this paper we present general architecture that allows content creators to easily integrate virtual humans into webd eves to test the generality of our solution we present two practical examples showing how the proposed architecture has been used in different educational contexts
today’s fast growth of both the number and complexity of digital models results in number of research challenges amongst others the efficient presentation of and interaction with such complex models is essential it therefore has become more and more important to provide the user with smart visual interface that presents all the information required in the context of the task at hand in comprehensive way in this paper we present two stage concept for the task oriented exploration of polygonal meshes an authoring tool uses combination of automatic mesh segmentation and manual enrichment with semantic information for association with specified exploration goals this information is then used at runtime to adapt the model’s presentation to the task at hand the exploration of the enriched model can further be supported by interactive tools lenses are discussed as an example
after more than years of research and practice in software configuration management scm constructing consistent configurations of versioned software products still remains challenge this article focuses on the version models underlying both commercial systems and research prototypes it provides an overview and classification of different versioning paradigms and defines and relates fundamental concepts such as revisions variants configurations and changes in particular we focus on intensional versioning that is construction of versions based on configuration rules finally we provide an overview of systems that have had significant impact on the development of the scm discipline and classify them according to detailed taxonomy
modern processors rely on memory dependence prediction to execute load instructions as early as possible speculating that they are not dependent on an earlier unissued store to date the most sophisticated dependence predictors such as store sets have been tightly coupled to the fetch and execution streams requiring global knowledge of the in flight stream of stores to synchronize loads with specific stores this paper proposes new dependence predictor design called counting dependence predictor cdp the key feature of cdps is that the prediction mechanism predicts some set of events for which particular dynamic load should wait which may include some number of matching stores by waiting for local events only this dependence predictor can work effectively in distributed microarchitecture where centralized fetch and execution streams are infeasible or undesirable we describe and evaluate distributed counting dependence predictor and protocol that achieves of the performance of perfect memory disambiguation it outperforms load wait table similar to the alpha by idealized centralized implementations of store sets and the exclusive collision predictor both of which would be difficult to implement in distributed microarchitecture achieve and of oracular performance respectively
process mining aims at extracting information from event logs to capture the business process as it is being executed process mining is particularly useful in situations where events are recorded but there is no system enforcing people to work in particular way consider for example hospital where the diagnosis and treatment activities are recorded in the hospital information system but where health care professionals determine the careflow many process mining approaches have been proposed in recent years however in spite of many researchers persistent efforts there are still several challenging problems to be solved in this paper we focus on mining non free choice constructs ie situations where there is mixture of choice and synchronization although most real life processes exhibit non free choice behavior existing algorithms are unable to adequately deal with such constructs using petri net based representation we will show that there are two kinds of causal dependencies between tasks ie explicit and implicit ones we propose an algorithm that is able to deal with both kinds of dependencies the algorithm has been implemented in the prom framework and experimental results show that the algorithm indeed significantly improves existing process mining techniques
model checking techniques have not been effective in important classes of software systems characterized by large or infinite input domains with interrelated linear and non linear constraints over the input variables various model abstraction techniques have been proposed to address this problem in this paper we wish to propose domain abstraction based on data equivalence and trajectory reduction as an alternative and complement to other abstraction techniques our technique applies the abstraction to the input domain environment instead of the model and is applicable to constraint free and deterministic constrained data transition system our technique is automatable with some minor restrictions
we have implemented an application independent collaboration manager called collagen based on the sharedplan theory of discourse and used it to build software interface agent for simple air travel application the software agent provides intelligent mixed initiative assistance without requiring natural language understanding key benefit of the collaboration manager is the automatic construction of an interaction history which is hierarchically structured according to the user's and agent's goals and intentions
the exponential growth of information available on the world wide web and retrievable by search engines has implied the necessity to develop efficient and effective methods for organizing relevant contents in this field document clustering plays an important role and remains an interesting and challenging problem in the field of web computing in this paper we present document clustering method which takes into account both contents information and hyperlink structure of web page collection where document is viewed as set of semantic units we exploit this representation to determine the strength of relation between two linked pages and to define relational clustering algorithm based on probabilistic graph representation the experimental results show that the proposed approach called red clustering outperforms two of the most well known clustering algorithms k means and expectation maximization
illumination changes cause serious problems in many computer vision applications we present new method for addressing robust depth estimation from stereo pair under varying illumination conditions first spatially varying multiplicative model is developed to account for brightness changes induced between left and right views the depth estimation problem based on this model is then formulated as constrained optimization problem in which an appropriate convex objective function is minimized under various convex constraints modelling prior knowledge and observed information the resulting multiconstrained optimization problem is finally solved via parallel block iterative algorithm which offers great flexibility in the incorporation of several constraints experimental results on both synthetic and real stereo pairs demonstrate the good performance of our method to efficiently recover depth and illumination variation fields simultaneously
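one plausible way to write the model (the exact data term and constraint sets are not given here, so the following is only an assumed sketch) is

    I_\ell(x,y) \;\approx\; w(x,y)\, I_r\bigl(x - d(x,y),\, y\bigr)

    \min_{(d,w)} \; J(d,w) \quad \text{subject to} \quad (d,w) \in \bigcap_{i=1}^{m} C_i

where d is the disparity (depth) field, w the spatially varying illumination field, J a convex data fidelity term obtained after suitable linearization (an assumption in this sketch) and each C_i a convex set encoding prior knowledge such as smoothness or range bounds; a parallel block iterative method then handles the intersection of constraints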
it has been shown that tcp and tcp like congestion control are highly problematic in wireless multihop networks in this paper we present novel hop by hop congestion control protocol that has been tailored to the specific properties of the shared medium in the proposed scheme backpressure towards the source node is established implicitly by passively observing the medium lightweight error detection and correction mechanism guarantees fast reaction to changing medium conditions and low overhead our approach is equally applicable to tcp and udp like data streams we demonstrate the performance of our approach by an in depth simulation study these findings are underlined by testbed results obtained using an implementation of our protocol on real hardware
interpolation based automatic abstraction is powerful and robust technique for the automated analysis of hardware and software systems its use has however been limited to control dominated applications because of lack of algorithms for computing interpolants for data structures used in software programs we present efficient procedures to construct interpolants for the theories of arrays sets and multisets using the reduction approach for obtaining decision procedures for complex data structures the approach taken is that of reducing the theories of such data structures to the theories of equality and linear arithmetic for which efficient interpolating decision procedures exist this enables interpolation based techniques to be applied to proving properties of programs that manipulate these data structures
sensor networks usually operate under very severe energy restrictions therefore sensor communications should consume the minimum possible amount of energy while broadcasting is very energy expensive protocol it is also widely used as building block for variety of other network layer protocols therefore reducing the energy consumption by optimizing broadcasting is major improvement in sensor networking in this paper we propose an optimized broadcast protocol for sensor networks bps the major novelty of bps is its adaptive geometric approach that enables considerable reduction of retransmissions by maximizing each hop length bps adapts itself and gets the best out of existing radio conditions in bps nodes do not need any neighborhood information which leads to low communication and memory overhead we analyze the worst case scenario for bps and show that the number of transmissions in such scenario is constant multiple of those required in the ideal case our simulation results show that bps is very scalable with respect to network density bps is also resilient to transmission errors
this paper presents an innovative taxonomy for the classification of different strategies for the integration of ip components the taxonomy defines three main approaches which can apply both to hardware and software components standard based design communication synthesis and ip derivation the proposed taxonomy helps the understanding of current problems in embedded systems design and their associated proposed solutions from the software side the proposed classification considers all layers application software os and device drivers the taxonomy is based on the separation between computation and communication of the components and shows alternatives for adapting both of these aspects of an ip component to be integrated into soc the present paper also identifies open issues and possible future research directions in the design of embedded systems
rootkits are now prevalent in the wild users affected by rootkits are subject to the abuse of their data and resources often unknowingly such malware becomes even more dangerous when it is persistent infected disk images allow the malware to exist across reboots and prevent patches or system repairs from being successfully applied in this paper we introduce rootkit resistant disks rrd that label all immutable system binaries and configuration files at installation time during normal operation the disk controller inspects all write operations received from the host operating system and denies those made for labeled blocks to upgrade the host is booted into safe state and system blocks can only be modified if security token is attached to the disk controller by enforcing immutability at the disk controller we prevent compromised operating system from infecting its on disk image we implement the rrd on linksys nslu network storage device by extending the processing on the embedded disk controller running the slugos linux distribution our performance evaluation shows that the rrd exhibits an overhead of less than for filesystem creation and less than during intensive postmark benchmarking we further demonstrate the viability of our approach by preventing rootkit collected from the wild from infecting the os image in this way we show that rrds not only prevent rootkit persistence but do so in an efficient way
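the write path policy can be sketched as follows; block labelling, token handling and the on disk layout here are assumptions for illustration, not the rrd implementation

    # illustrative sketch of the write-path check enforced at the disk controller:
    # writes to labelled (immutable) blocks are denied unless the security token
    # is attached.

    class RootkitResistantDisk:
        def __init__(self, immutable_blocks):
            self.immutable = set(immutable_blocks)   # labelled at install time
            self.token_present = False               # admin token attached?
            self.blocks = {}

        def write(self, block_no, data):
            if block_no in self.immutable and not self.token_present:
                raise PermissionError(f"write to labelled block {block_no} denied")
            self.blocks[block_no] = data

    disk = RootkitResistantDisk(immutable_blocks={0, 1, 2})  # e.g. system binaries
    disk.write(100, b"user data")          # allowed
    disk.token_present = True              # host booted into the safe state
    disk.write(1, b"patched binary")       # allowed only with the token attached

because the check lives below the operating system, a compromised kernel cannot bypass it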
motion sketching is an approach for creating realistic rigid body motion in this approach an animator sketches how objects should move and the system computes physically plausible motion that best fits the sketch the sketch is specified with mouse based interface or with hand gestures which move instrumented objects in the real world to act out the desired behaviors the sketches may be imprecise may be physically infeasible or may have incorrect timing multiple shooting optimization estimates the parameters of rigid body simulation needed to simulate an animation that matches the sketch with physically plausible timing and motion this technique applies to physical simulations of multiple colliding rigid bodies possibly connected with joints in tree open loop topology
fundamental problem in distributed computing is performing set of tasks despite failures and delays stated abstractly the problem is to perform tasks using failure prone processors this paper studies the efficiency of emulating shared memory task performing algorithms on asynchronous message passing processors with quantifiable message latency efficiency is measured in terms of work and communication and the challenge is to obtain subquadratic work and message complexity while prior solutions assumed synchrony and constant delays the solutions given here yield subquadratic efficiency with asynchronous processors when the delays and failures are suitably constrained the solutions replicate shared objects using quorum system provided it is not disabled one algorithm has subquadratic work and communication when the delays and the number of processors owning object replicas are it tolerates crashes it is also shown that there exists an algorithm that has subquadratic work and communication and that tolerates failures provided message delays are sublinear
this paper evaluates the ability of wireless mesh architecture to provide high performance internet access while demanding little deployment planning or operational management the architecture considered in this paper has unplanned node placement rather than planned topology omni directional antennas rather than directional links and multi hop routing rather than single hop base stations these design decisions contribute to ease of deployment an important requirement for community wireless networks however this architecture carries the risk that lack of planning might render the network’s performance unusably low for example it might be necessary to place nodes carefully to ensure connectivity the omni directional antennas might provide uselessly short radio ranges or the inefficiency of multi hop forwarding might leave some users effectively disconnected the paper evaluates this unplanned mesh architecture with case study of the roofnet mesh network roofnet consists of nodes spread over four square kilometers of an urban area the network provides users with usable performance despite lack of planning the average inter node throughput is kbits second even though the average route has three hops the paper evaluates multiple aspects of the architecture the effect of node density on connectivity and throughput the characteristics of the links that the routing protocol elects to use the usefulness of the highly connected mesh afforded by omni directional antennas for robustness and throughput and the potential performance of single hop network using the same nodes as roofnet
in this paper we present mixed mimd simd execution model for reconfigurable computer this model is adapted to the use of specialized associative coprocessor embedded in this host machine main characteristic of the model is that it uses four types of processes decoding calculus coprocessor communication and transaction manager and that in principle one process of each type is allowed on each processor time intervals are allocated to operations into partitions of the set of processors transfers are usually limited to identifiers logical addresses and locks simulations display high level of processors occupation therefore the machine yield may be very high and the operations should be very fast
many dimensionality reduction problems end up with trace quotient formulation since it is difficult to directly solve the trace quotient problem traditionally the trace quotient cost function is replaced by an approximation such that the generalized eigenvalue decomposition can be applied in contrast we directly optimize the trace quotient in this work it is reformulated as quasi linear semidefinite optimization problem which can be solved globally and efficiently using standard off the shelf semidefinite programming solvers also this optimization strategy allows one to enforce additional constraints for example sparseness constraints on the projection matrix we apply this optimization framework to novel dimensionality reduction algorithm the performance of the proposed algorithm is demonstrated in experiments by several uci machine learning benchmark examples usps handwritten digits as well as orl and yale face data
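for reference, the underlying objective is the classical trace quotient, which can be written as

    \max_{W \in \mathbb{R}^{d \times k},\; W^{\top} W = I_k} \;\; \frac{\operatorname{tr}\!\left(W^{\top} S_b\, W\right)}{\operatorname{tr}\!\left(W^{\top} S_w\, W\right)}

where S_b and S_w stand, for example, for between class and within class scatter matrices and k is the target dimension (the symbols here are chosen for illustration); the commonly used approximation instead maximizes the ratio trace \operatorname{tr}\bigl((W^{\top} S_w W)^{-1} W^{\top} S_b W\bigr) so that a generalized eigenvalue decomposition applies, whereas the approach described above reformulates and optimizes the quotient itself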
in this paper we propose new content based publish subscribe pub sub framework that enables pub sub system to accommodate richer content formats including multimedia publications with image and video content the pub sub system besides being responsible for matching and routing the published content is also responsible for converting the content into the suitable target format for each subscriber content conversion is achieved through set of content adaptation operators eg image transcoder document translator etc at different nodes in the overlay network we study algorithms for placement of such operators in the pub sub broker overlay in order to minimize the communication and computation resource consumption our experimental results show that careful placement of these operators in pub sub overlay network can lead to significant cost reduction
for hierarchy of properties of term rewriting systems related to confluence we prove relative undecidability ie for implications in the hierarchy the property is undecidable for term rewriting systems satisfying for some of the implications either or is semi decidable for others neither nor is semi decidable we prove most of these results for linear term rewrite systems
collections of already developed programs are important resources for efficient development of reliable software systems in this paper we propose novel graph representation model of software component library repository called component rank model this is based on analyzing actual usage relations of the components and propagating the significance through the usage relations using the component rank model we have developed java class retrieval system named spars and applied spars to various collections of java files the result shows that spars gives higher rank to components that are used more frequently as result software engineers looking for component have better chance of finding it quickly spars has been used by two companies and has produced promising results
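a hedged sketch of significance propagation over the use relation, in the spirit of component rank but using a plain damped power iteration whose damping factor and update rule are assumptions rather than the authors' exact weighting scheme, is given below

    # hedged sketch: damped power iteration propagating significance from users
    # to used components over a usage graph; not necessarily the exact component
    # rank formulation.

    def component_rank(uses, damping=0.85, iters=50):
        """uses[a] = set of components that a uses; rank flows from user to used."""
        comps = set(uses) | {c for tgts in uses.values() for c in tgts}
        rank = {c: 1.0 / len(comps) for c in comps}
        for _ in range(iters):
            new = {c: (1.0 - damping) / len(comps) for c in comps}
            for user, targets in uses.items():
                if targets:
                    share = damping * rank[user] / len(targets)
                    for t in targets:
                        new[t] += share
            # components that use nothing spread their rank uniformly
            dangling = damping * sum(rank[c] for c in comps if not uses.get(c))
            for c in comps:
                new[c] += dangling / len(comps)
            rank = new
        return rank

    uses = {"App": {"Util", "Parser"}, "Parser": {"Util"}, "Util": set()}
    ranks = component_rank(uses)
    print(max(ranks, key=ranks.get))   # heavily used components rank highest

ranking retrieval results by such a score is what lets frequently reused components surface first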
java virtual machines have historically employed either whole method or trace methodology for selecting regions of code for optimization adaptive whole method optimization primarily leverages intra procedural optimizations derived from classic static compilation techniques whereas trace optimization utilizes an interpreter to select manage and dispatch inter procedural fragments of frequently executed code in this paper we present our hybrid approach for supplementing the comprehensive strengths of whole method jit compiler with the inter procedural refinement of trace fragment selection and show that the two techniques would be mutually beneficial using the interpreterless jikes rvm as foundation we use our trace profiling subsystem to identify an application’s working set as collection of hot traces and show that there is significant margin for improvement in instruction ordering that can be addressed by trace execution our benchmark hot trace profiles indicate that of transitions between machine code basic blocks as laid out by the jit compiler are non contiguous many of which are transfers of control flow to locations outside of the current virtual memory page additionally the analyses performed by the adaptive whole method jit compiler allow for better identification of trace starting and stopping locations an improvement over the popular next executing tail net trace selection scheme we show minimal overhead for trace selection indicating that inter procedural trace execution provides an opportunity to improve both instruction locality as well as compiler directed branch prediction without significant run time cost
mining frequent patterns is an important component of many prediction systems one common usage in web applications is the mining of users access behavior for the purpose of predicting and hence pre fetching the web pages that the user is likely to visit in this paper we introduce an efficient strategy for discovering frequent patterns in sequence databases that requires only two scans of the database the first scan obtains support counts for subsequences of length two the second scan extracts potentially frequent sequences of any length and represents them as compressed frequent sequences tree structure fs tree frequent sequence patterns are then mined from the fs tree incremental and interactive mining functionalities are also facilitated by the fs tree as part of this work we developed the fs miner system that discovers frequent sequences from web log files the fs miner has the ability to adapt to changes in users behavior over time in the form of new input sequences and to respond incrementally without the need to perform full re computation our system also allows the user to change the input parameters eg minimum support and desired pattern size interactively without requiring full re computation in most cases we have tested our system comparing it against two other algorithms from the literature our experimental results show that our system scales up linearly with the size of the input database furthermore it exhibits excellent adaptability to support threshold decreases we also show that the incremental update capability of the system provides significant performance advantages over full re computation even for relatively large update sizes
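a simplified two scan sketch is given below; it counts adjacent pairs in the first scan and builds a prefix tree of segments whose links are frequent in the second scan, which is an assumption about how infrequent links are treated and which omits the fs tree header links and the incremental and interactive machinery

    from collections import defaultdict

    # simplified two-scan sketch in the spirit of fs-miner: scan 1 counts adjacent
    # pairs (links), scan 2 splits each sequence at infrequent links and inserts
    # the remaining segments into a prefix tree with counts.

    def scan1(sequences, min_sup):
        pair_count = defaultdict(int)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                pair_count[(a, b)] += 1
        return {p for p, c in pair_count.items() if c >= min_sup}

    class Node:
        def __init__(self):
            self.children = {}
            self.count = 0

    def insert(tree, segment):
        node = tree
        for item in segment:
            node = node.children.setdefault(item, Node())
            node.count += 1

    def scan2(sequences, frequent_links):
        root = Node()
        for seq in sequences:
            segment = seq[:1]
            for a, b in zip(seq, seq[1:]):
                if (a, b) in frequent_links:
                    segment.append(b)
                else:
                    insert(root, segment)
                    segment = [b]
            insert(root, segment)
        return root

    logs = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
    tree = scan2(logs, scan1(logs, min_sup=2))
    print(tree.children["a"].children["b"].count)   # support of sequence a b

frequent sequences are then read off the tree by traversal, and new log entries only touch the counts along their own paths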
we describe an extension of the spin model checker for use on multi core shared memory systems and report on its performance we show how with proper load balancing the time requirements of verification run can in some cases be reduced close to fold when processing cores are used we also analyze the types of verification problems for which multi core algorithms cannot provide relief the extensions discussed here require only relatively small changes in the spin source code and are compatible with most existing verification modes such as partial order reduction the verification of temporal logic formulae bitstate hashing and hash compact compression
in this research method is presented for generating deformed model satisfying given error criteria from an stl model in triangular mesh representation suitable for rapid prototyping rp processes deformed model is non uniformly modified shape from base stl model in developing family product with various sizes such as shoe sometimes prototypes for all sizes should be made using an rp machine although an stl model is generated from solid model it is well known that creating non uniformly modified solid model from base solid model is very difficult generally there are some gaps between surfaces after modification and stitching the gaps is very difficult to solve this problem the authors explored the possibility of generating deformed stl model directly from base stl model this research includes data structure for modifying the stl model checking the error of modified edge compared with the exact non uniformly scaled curve checking the error of modified facet compared with the exact non uniformly scaled surface and splitting facet with an error greater than the allowable tolerance using the results of this research the difficult work of creating solid models to build non uniformly deformed stl models could be avoided
developing distributed real time systems with high degrees of assurance on the system reliability is becoming increasingly important yet remains difficult and error prone the time triggered message triggered object tmo scheme is high level distributed object oriented programming approach that has proved to be effective in developing such systems the tmo programming scheme allows real time application developers to explicitly specify temporal constraints in terms of global time in simple and natural forms tmosm is middleware model that provides the execution support mechanisms for tmos and tmosl is class library that provides convenient application programming interface api for developing tmo applications the tmo scheme tmosm and tmosl have evolved during these years in order to support complex distributed real time applications more effectively this paper presents some recent additions on the tmosm api that resulted from this evolution
service oriented architectures soa is an emerging approach that addresses the requirements of loosely coupled standards based and protocol independent distributed computing typically business operations running in an soa comprise number of invocations of these different components often in an event driven or asynchronous fashion that reflects the underlying business process needs to build an soa highly distributable communications and integration backbone is required this functionality is provided by the enterprise service bus esb that is an integration platform that utilizes web services standards to support wide variety of communications patterns over multiple transport protocols and deliver value added capabilities for soa applications this paper reviews technologies and approaches that unify the principles and concepts of soa with those of event based programming the paper also focuses on the esb and describes range of functions that are designed to offer manageable standards based soa backbone that extends middleware functionality throughout by connecting heterogeneous components and systems and offers integration services finally the paper proposes an approach to extend the conventional soa to cater for essential esb requirements that include capabilities such as service orchestration intelligent routing provisioning integrity and security of message as well as service management the layers in this extended soa in short xsoa are used to classify research issues and current research activities
privacy preserving has become an essential process for any data mining task therefore data transformation to ensure privacy preservation is needed in this paper we address problem of privacy preserving on an incremental data scenario in which the data need to be transformed are not static but appended all the time our work is based on well known data privacy model ie k anonymity meanwhile the data mining task to be applied to the given dataset is associative classification as the problem of privacy preserving for data mining has been proven to be np hard we propose to study the characteristics of proven heuristic algorithm in the incremental scenarios theoretically subsequently we propose few observations which lead to the techniques to reduce the computational complexity for the problem setting in which the outputs remain the same in addition we propose simple algorithm which is at most as efficient as the polynomial time heuristic algorithm in the worst case for the problem
we describe an efficient algorithm for the simulation of large sets of non convex rigid bodies the algorithm finds simultaneous solution for multi body system that is linear in the total number of contacts detected in each iteration we employ novel contact model that uses mass location and velocity information from all contacts at the moment of maximum compression to constrain rigid body velocities we also develop new friction model in the configuration space of rigid bodies these models are used to compute the feasible velocity and the frictional response of each body implementation is simple and leads to fast rigid body simulator that computes steps on the order of seconds for simulations involving over one thousand non convex objects in high contact configurations
many business process analysis models have been proposed however there are few discussions for artifact usages in workflow specifications well structured business process with sufficient resources might fail or yield unexpected results dynamically due to inaccurate artifact specification eg an inconsistency between artifact and control flow or contradictions between artifact operations this paper based on our previous work presents model for describing the input output of workflow process and analyzes the artifact usages upon the model this work identifies and formulates thirteen cases of artifact usage anomalies affecting process execution and categorizes the cases into three types moreover the methods for detecting these anomalies with time complexities less than in previous methods are presented besides the paper uses an example to demonstrate the processing of them
the demand for flexible embedded solutions and short time to market has led to the development of extensible processors that allow for customization through user defined instruction set extensions ises these are usually identified from plain sources in this article we propose combined exploration of code transformations and ise identification the resulting performance of such combination has been measured on two benchmark suites our results demonstrate that combined code transformations and ises can yield average performance improvements this outperforms ises when applied in isolation and in extreme cases yields speed up of
great deal of research has been devoted to solving the problem of network congestion posed against the context of increasing internet traffic however there has been little concern regarding improvements in the performance of internet servers such as web web proxy servers in spite of the projections that the performance bottleneck will shift from networks to endhosts in this paper we propose new resource management scheme for web web proxy servers which manages their resources for tcp connections effectively and fairly in the proposed system we focus on the effective use of server resources by assigning dynamically the send receive socket buffer according to the required size of each tcp connection and terminating positively the idle persistent connection also we validate the effectiveness of our proposed scheme through simulation and implementation experiments and confirm conclusively that web web proxy server throughput can be improved by at maximum and document transfer delay perceived by client hosts can be decreased by up to
software management is critical task in the system administration of enterprise scale networks enterprise scale networks that have traditionally consisted of large clusters of workstations are expanding to include low power ad hoc wireless sensor networks wsn the existing tools for software updates in workstations cannot be used with the severely resource constrained sensor nodes in this article we survey the software update techniques in wsns we base our discussion around conceptual model for the software update tools in wsns three components of this model that we study are the execution environment at the sensor nodes the software distribution protocol in the network and optimization of transmitted updates we present the design space of each component and discuss in depth the trade offs that need to be considered in making particular design choice the discussion is interspersed with references to deployed systems that highlight the design choices
transaction processing leads to new challenges in mobile ad hoc networks which in comparison to fixed wired networks suffer from problems like node disconnection message loss and frequently appearing network partitioning as the atomic commit protocol is that part of transaction processing in which failures can lead to the most serious data blocking we have developed robust and failure tolerant distributed cross layer atomic commit protocol called clcp that uses multiple coordinators in order to reduce the number of both failures and messages our protocol makes use of acknowledgement messages for piggybacking information we have evaluated our protocol in mobile ad hoc networks by using several mobility models ie random waypoint manhattan and attraction point and compared clcp with other atomic commit protocols ie pc and paxos commit each implemented in versions ie without sending message acknowledgements with relay routing technique and with nearest forward progress routing special to our simulation environment is the use of the quasi unit disc model which assumes non binary message reception probability that captures real world behavior much better than the classical unit disc model often used in theory using the quasi unit disc model our evaluation shows the following results clcp and pc without acknowledgement messages have significantly lower energy consumption than the other protocols and clcp is able to commit significantly more distributed transactions than all the other atomic commit protocols for each of the mobility models
the introduction of social networking site inside of large enterprise enables new method of communication between colleagues encouraging both personal and professional sharing inside the protected walls of company intranet our analysis of user behavior and interviews presents the case that professionals use internal social networking to build stronger bonds with their weak ties and to reach out to employees they do not know their motivations in doing this include connecting on personal level with coworkers advancing their career with the company and campaigning for their projects
the notorious “dimensionality curse” is well known phenomenon for any multi dimensional index attempting to scale up to high dimensions one well known approach to overcome degradation in performance with respect to increasing dimensions is to reduce the dimensionality of the original dataset before constructing the index however identifying the correlation among the dimensions and effectively reducing them are challenging tasks in this paper we present an adaptive multi level mahalanobis based dimensionality reduction mmdr technique for high dimensional indexing our mmdr technique has four notable features compared to existing methods first it discovers elliptical clusters for more effective dimensionality reduction by using only the low dimensional subspaces second data points in the different axis systems are indexed using single tree third our technique is highly scalable in terms of data size and dimension finally it is also dynamic and adaptive to insertions an extensive performance study was conducted using both real and synthetic datasets and the results show that our technique not only achieves higher precision but also enables queries to be processed efficiently
xml web services and the semantic web have opened the door for new and exciting information integration applications information sources on the web are controlled by different organizations or people utilize different text formats and have varying inconsistencies therefore any system that integrates information from different data sources must identify common entities from these sources data from many data sources on the web does not contain enough information to link the records accurately using state of the art record linkage systems however it is possible to exploit secondary data sources on the web to improve the record linkage process we present an approach to accurately and automatically match entities from various data sources by utilizing state of the art record linkage system in conjunction with data integration system the data integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources in turn the record linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets
program dependence information is useful for variety of applications such as software testing and maintenance tasks and code optimization properly defined control and data dependences can be used to identify semantic dependences to function effectively on whole programs tools that utilize dependence information require information about interprocedural dependences dependences that are identified by analyzing the interactions among procedures many techniques for computing interprocedural data dependences exist however virtually no attention has been paid to interprocedural control dependence analysis techniques that fail to account for interprocedural control dependences can suffer unnecessary imprecision and loss of safety this article presents definition of interprocedural control dependence that supports the relationship of control and data dependence to semantic dependence the article presents two approaches for computing interprocedural control dependences and empirical results pertaining to the use of those approaches
many emerging applications such as video sensor monitoring can benefit from an on line video correlation system which can be used to discover linkages between different video streams in realtime however on line video correlations are often resource intensive where single host can be easily overloaded we present novel adaptive distributed on line video correlation system called vico unlike single stream processing correlations between different video streams require distributed execution system to observe new correlation constraint that any two correlated data must be distributed to the same host vico achieves three unique features correlation awareness that vico can guarantee the correlation accuracy while spreading excessive workload on multiple hosts adaptability that the system can adjust algorithm behaviors and switch between different algorithms to adapt to dynamic stream environments and fine granularity that the workload of one resource intensive correlation request can be divided and distributed among multiple hosts we have implemented and deployed prototype of vico on commercial cluster system our experiment results using both real videos and synthetic workloads show that vico outperforms existing techniques for scaling up the performance of video correlations
recent work has shown that moving least squares mls surfaces can be used effectively to reconstruct surfaces from possibly noisy point cloud data several variants of mls surfaces have been suggested some of which have been analyzed theoretically for guarantees these analyses so far have assumed uniform sampling density we propose new variant of the mls surface that for the first time incorporates local feature sizes in its formulation and we analyze it for reconstruction guarantees using non uniform sampling density the proposed variant of the mls surface has several computational advantages over existing mls methods
parallelism is one of the main sources for performance improvement in modern computing environment but the efficient exploitation of the available parallelism depends on number of parameters determining the optimum number of threads for given data parallel loop for example is difficult problem and dependent on the specific parallel platform this paper presents learning based approach to parallel workload allocation in cost aware manner this approach uses static program features to classify programs before deciding the best workload allocation scheme based on its prior experience with similar programs experimental results on java benchmarks test cases with different workloads in total show that it can efficiently allocate the parallel workload among java threads and achieve an efficiency of on average
dynamic binary translation is the process of translating and optimizing executable code for one machine to another at runtime while the program is “executing” on the target machine dynamic translation techniques have normally been limited to two particular machines competitor’s machine and the hardware manufacturer’s machine this research provides for more general framework for dynamic translations by providing framework based on specifications of machines that can be reused or adapted to new hardware architectures in this way developers of such techniques can isolate design issues from machine descriptions and reuse many components and analyses we describe our dynamic translation framework and provide some initial results obtained by using this system
the role based access control rbac model has garnered great interest in the security community due to the flexible and secure nature of its applicability to the complex and sophisticated information system one important aspect of rbac is the enforcing of security policy called constraint which controls the behavior of components in rbac much research has been conducted to specify constraints however more work is needed on the aspect of sharing information resources for providing better interoperability in the widely dispersed ubiquitous information system environment this paper provides visual modeling of rbac policy and specifies constraints of rbac by employing semantic web ontology language owl to enhance understanding of constraints for machines and people in ubiquitous computing environment using owl constraints were precisely formalized according to the constraint patterns and the effectiveness of owl specification was demonstrated by showing the reasoning process
mobile ad hoc networks manets have many well known applications in military settings as well as in emergency and rescue operations however lack of infrastructure and lack of centralized control make manets inherently insecure and therefore specialized security services are needed for their deployment self certification is an essential and fundamental security service in manets it is needed to securely cope with dynamic membership and topology and to bootstrap other important security primitives and services without the assistance of any centralized trusted authority an ideal protocol must involve minimal interaction among the manet nodes since connectivity can be unstable also since manets are often composed of weak or resource limited devices self certification protocol must be efficient in terms of computation and communication unfortunately previously proposed protocols are far from being ideal in this paper we propose fully non interactive self certification protocol based on bi variate polynomial secret sharing and threshold bls signature techniques in contrast with prior work our techniques do not require any interaction and do not involve any costly reliable broadcast communication among manet nodes we thoroughly analyze our proposal and show that it compares favorably to previous mechanisms
emergency trauma is major health problem worldwide to evaluate the potential of emerging telepresence technology for facilitating paramedic physician collaboration while providing emergency medical trauma care we conducted between subjects post test experimental lab study during simulated emergency situation paramedics diagnosed and treated trauma victim while working alone or in collaboration with physician via video or proxy analysis of paramedics task performance shows that the fewest harmful procedures occurred in the proxy condition paramedics in the proxy condition also reported higher levels of self efficacy these results indicate telepresence technology has potential to improve paramedics performance of complex emergency medical tasks and improve emergency trauma health care when designed appropriately
we consider the problem of pipelined filters where continuous stream of tuples is processed by set of commutative filters pipelined filters are common in stream applications and capture large class of multiway stream joins we focus on the problem of ordering the filters adaptively to minimize processing cost in an environment where stream and filter characteristics vary unpredictably over time our core algorithm greedy for adaptive greedy has strong theoretical guarantees if stream and filter characteristics were to stabilize greedy would converge to an ordering within small constant factor of optimal in experiments greedy usually converges to the optimal ordering one very important feature of greedy is that it monitors and responds to selectivities that are correlated across filters ie that are nonindependent which provides the strong quality guarantee but incurs run time overhead we identify three way tradeoff among provable convergence to good orderings run time overhead and speed of adaptivity we develop suite of variants of greedy that lie at different points on this tradeoff spectrum we have implemented all our algorithms in the stream prototype data stream management system and thorough performance evaluation is presented
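A minimal sketch of the adaptive-ordering idea described above, assuming a pipeline of commutative boolean filters: each filter's drop rate is profiled over a sliding window and the pipeline is kept sorted so that filters with the highest drop probability per unit cost run first. The class name, window size and per-tuple re-sorting are illustrative assumptions, not the paper's adaptive greedy algorithm or its correlation-aware profiling.

```python
# Illustrative adaptive ordering of commutative pipelined filters: profile each
# filter's drop rate over a sliding window and keep the pipeline sorted by
# drop probability per unit cost (most selective, cheapest filters first).
# Per-tuple re-sorting is used here only for brevity; real systems amortise it.
from collections import deque

class AdaptiveFilterPipeline:
    def __init__(self, filters, costs=None, window=1000):
        self.filters = list(filters)                 # predicates: tuple -> bool
        self.costs = list(costs) if costs else [1.0] * len(self.filters)
        self.stats = [deque(maxlen=window) for _ in self.filters]  # 1 = dropped

    def _rank(self, i):
        hist = self.stats[i]
        drop_rate = sum(hist) / len(hist) if hist else 0.0
        return drop_rate / self.costs[i]

    def _reorder(self):
        order = sorted(range(len(self.filters)), key=self._rank, reverse=True)
        self.filters = [self.filters[i] for i in order]
        self.costs = [self.costs[i] for i in order]
        self.stats = [self.stats[i] for i in order]

    def process(self, tup):
        for idx, f in enumerate(self.filters):
            passed = f(tup)
            self.stats[idx].append(0 if passed else 1)
            if not passed:
                self._reorder()
                return False
        self._reorder()
        return True

pipe = AdaptiveFilterPipeline([lambda t: t % 2 == 0, lambda t: t % 1000 != 0])
survivors = [t for t in range(10_000) if pipe.process(t)]   # filters self-order by selectivity
```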
critical database applications require safe replication between at least two sites for disaster tolerant services at the same time they must provide consistent and low latency results to their clients in normal cases in this paper we propose optimistic transactional active replication otar which replicates the transaction logs with low latency and provides consistent view to database applications the latency of our replication is lower than passive replication and guarantees the serializability of transaction isolation levels that cannot be supported by active replication for our replication each client sends transaction request to all replicas and all of the replicas execute the request and optimistically return the result of the transaction to the client each replica generates causality history of the transaction sent to the client with the result with the causality histories the client can make sure that the requested transaction was executed in the same order at all of the replicas and eventually commit it if the client cannot validate the order then the client waits for the pessimistic result of the transaction from the replicas this paper describes the algorithm and its properties
we present effective and robust algorithms to recognize isolated signs in signing exact english see the sign level recognition scheme comprises classifiers for handshape hand movement and hand location the see gesture data are acquired using cyberglove and magnetic trackers linear decision tree with fisher’s linear discriminant fld is used to classify see handshapes hand movement trajectory is classified using vector quantization principal component analysis vqpca both periodic and non periodic see sign gestures are recognized from isolated hand trajectories experiments yielded average handshape recognition accuracy of on unseen signers the average trajectory recognition rate with vqpca for non periodic and periodic gestures was and respectively these classifiers were combined with hand location classifier for sign level recognition yielding an accuracy of on sign see vocabulary
graphs provide powerful abstractions of relational data and are widely used in fields such as network management web page analysis and sociology while many graph representations of data describe dynamic and time evolving relationships most graph mining work treats graphs as static entities our focus in this paper is to discover regions of graph that are evolving in similar manner to discover regions of correlated spatio temporal change in graphs we propose an algorithm called cstag whereas most clustering techniques are designed to find clusters that optimise single distance measure cstag addresses the problem of finding clusters that optimise both temporal and spatial distance measures simultaneously we show the effectiveness of cstag using quantitative analysis of accuracy on synthetic data sets as well as demonstrating its utility on two large real life data sets where one is the routing topology of the internet and the other is the dynamic graph of files accessed together on the world cup official website
many algorithms have been proposed for the problem of time series classification however it is clear that one nearest neighbor with dynamic time warping dtw distance is exceptionally difficult to beat this approach has one weakness however it is computationally too demanding for many realtime applications one way to mitigate this problem is to speed up the dtw calculations nonetheless there is limit to how much this can help in this work we propose an additional technique numerosity reduction to speed up one nearest neighbor dtw while the idea of numerosity reduction for nearest neighbor classifiers has long history we show here that we can leverage off an original observation about the relationship between dataset size and dtw constraints to produce an extremely compact dataset with little or no loss in accuracy we test our ideas with comprehensive set of experiments and show that it can efficiently produce extremely fast accurate classifiers
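For orientation, a bare-bones one-nearest-neighbour classifier with a banded DTW distance is sketched below; the warping-window width and the unoptimised dynamic program are assumptions for illustration and deliberately leave out lower bounding, early abandoning and the paper's numerosity-reduction step.

```python
# 1-NN time-series classification with a (Sakoe-Chiba style) banded DTW distance.
import math

def dtw(a, b, band=None):
    n, m = len(a), len(b)
    band = max(band if band is not None else max(n, m), abs(n - m))
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

def classify_1nn(query, train):
    """train: iterable of (series, label) pairs; returns the label of the nearest series."""
    best_label, best_dist = None, float("inf")
    for series, label in train:
        d = dtw(query, series, band=max(1, len(series) // 10))   # ~10% warping window
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

train = [([0, 0, 1, 2, 1, 0], "bump"), ([0, 1, 0, 1, 0, 1], "zigzag")]
print(classify_1nn([0, 0, 0, 1, 2, 1], train))   # "bump"
```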
multi relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases several relational knowledge discovery systems have been developed employing various search strategies heuristics language pattern limitations and hypothesis evaluation criteria in order to cope with intractably large search space and to be able to generate high quality patterns in this work an ilp based concept discovery method namely confidence based concept discovery is described in which strong declarative biases and user defined specifications are relaxed moreover this new method directly works on relational databases in addition to this new confidence based pruning is used in this technique we also describe how to define and use aggregate predicates as background knowledge in the proposed method in order to use aggregate predicates we show how to handle numerical attributes by using comparison operators on them finally we analyze the effect of incorporating unrelated facts for generating transitive rules on the proposed method set of experiments are conducted on real world problems to test the performance of the proposed method
we propose static program analysis techniques for identifying the impact of relational database schema changes upon object oriented applications we use dataflow analysis to extract all possible database interactions that an application may make we then use this information to predict the effects of schema change we evaluate our approach with case study of commercially available content management system where we investigated versions of between loc and schema size of up to tables and stored procedures we demonstrate that the program analysis must be more precise in terms of context sensitivity than related work however increasing the precision of this analysis increases the computational cost we use program slicing to reduce the size of the program that needs to be analyzed using this approach we are able to analyse the case study in under minutes on standard desktop machine with no false negatives and low level of false positives
wireless sensor networks have been widely used in many applications such as soil temperature monitoring for plant growth and abnormal event detection of industrial parameters among these applications aggregate queries such as sum count average min and max are often used to collect statistical data due to the low quality sensing devices or random environmental disturbances sensor data are often noisy hence the idea of moving average which computes the average over consecutive aggregate data is introduced to offset the effect the high link loss rate however makes the result after averaging still inaccurate to address this issue we propose pcm based data transmission scheme to make up the possibly lost data specifically we focus on obtaining robust aggregate results under high link loss rate in order to reduce the communication traffic that dominates the energy consumption of the sensor network we also design an intelligent path selection algorithm for our scheme our extensive simulation results have shown that this technique outperforms its counterparts under various sensor network conditions
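As a point of reference, the sketch below computes a moving average over per-round aggregate readings and uses a simple linear-interpolation fill as a stand-in for recovering values lost on lossy links; the window size and the fill rule are illustrative assumptions, not the proposed pcm based transmission scheme or path selection algorithm.

```python
# Moving average over per-round aggregates with a simple interpolation step
# standing in for recovery of values lost on the radio links (assumption).
def fill_lost(rounds):
    """rounds: list of floats, with None for rounds whose aggregate was lost."""
    filled = list(rounds)
    for i, v in enumerate(filled):
        if v is None:
            prev = next((filled[j] for j in range(i - 1, -1, -1) if filled[j] is not None), None)
            nxt = next((rounds[j] for j in range(i + 1, len(rounds)) if rounds[j] is not None), None)
            if prev is not None and nxt is not None:
                filled[i] = (prev + nxt) / 2.0
            else:
                filled[i] = prev if prev is not None else nxt
    return filled

def moving_average(rounds, window=3):
    vals = fill_lost(rounds)
    out = []
    for i in range(len(vals)):
        win = [v for v in vals[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(win) / len(win) if win else None)
    return out

# the aggregates of rounds 3 and 4 were lost in transit
print(moving_average([20.1, 20.4, None, None, 21.0, 20.8]))
```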
in distributed computing system message logging is widely used for providing nodes with recoverability to reduce the piggyback overhead of traditional causal message logging we present zoning causal message logging approach in this paper the crux of the approach is to control the propagation of dependency information the nodes in the system are divided into zones and by message fragment mechanism the dependency information of node is only visible in the zone scope simulation results show that the piggyback overhead of the proposed approach is lower than that of traditional causal message logging
many existing access controls use node filtering or query rewriting techniques these techniques require rather time consuming processes such as parsing labeling pruning and or rewriting queries into safe ones each time user requests query or takes an action in this paper we propose fine grained access control model named securex which supports read and write privileges with our novel access control concept various access types are introduced including those for determining if user has the right to change xml structure furthermore securex can be integrated well with dynamic labeling scheme to eliminate repetitive labeling and pruning processes when determining user view this brings about advantages of speeding up searching and querying processes when compared to traditional node filtering technique our integrated access control model takes fewer processing steps experiments have shown effectiveness of our approach
technological and business changes influence the evolution of software systems when this happens the software artifacts may need to be adapted to the changes this need is rapidly increasing in systems built using the model driven engineering mde paradigm an mde system basically consists of metamodels terminal models and transformations the evolution of metamodel may render its related terminal models and transformations invalid this paper proposes three step solution that automatically adapts terminal models to their evolving metamodels the first step computes the equivalences and simple and complex changes between given metamodel and former version of the same metamodel the second step translates the equivalences and differences into an adaptation transformation this transformation can then be executed in third step to adapt to the new version any terminal model conforming to the former version we validate our ideas by implementing prototype based on the atlanmod model management architecture amma platform we present the accuracy and performance that the prototype delivers on two concrete examples petri net metamodel from the research literature and the netbeans java metamodel
psycotrace is system that integrates static and dynamic tools to protect process from attacks that alter the process self as specified by the program source code the static tools build context free grammar that describes the sequences of system calls the process may issue and set of assertions on the process state one for each invocation the dynamic tools parse the call trace of the process to check that it belongs to the grammar language and evaluate the assertions this paper describes the architecture of psycotrace which exploits virtualization to introduce two virtual machines the monitored and the monitoring virtual machines to increase both the robustness and the transparency of the monitoring because the machine that implements all the checks is strongly separated from the monitored one we discuss the modification to the kernel of the monitored machine to trace system call invocations the definition of the legal traces and the checks to prove the trace is valid we describe how psycotrace applies introspection to evaluate the assertions and analyze the state of the monitored machine and of its data structures finally we present the security and performance results of the dynamic tools and the implementation of the static tools
traffic information systems are among the most prominent real world applications of dijkstra’s algorithm for shortest paths we consider the scenario of central information server in the realm of public railroad transport on wide area networks such system has to process large number of on line queries for optimal travel connections in real time in practice this problem is usually solved by heuristic variations of dijkstra’s algorithm which do not guarantee an optimal result we report results from pilot study in which we focused on the travel time as the only optimization criterion in this study various speed up techniques for dijkstra’s algorithm were analysed empirically this analysis was based on the timetable data of all german trains and on snapshot of half million customer queries
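For context, a plain Dijkstra search over a travel-time graph, i.e. the baseline that such speed-up heuristics are measured against, is sketched below; the toy station graph and edge weights are invented for illustration.

```python
# Baseline Dijkstra over a travel-time graph (the only optimisation criterion
# considered in the study); heapq gives the usual priority-queue implementation.
import heapq

def dijkstra(graph, source, target):
    """graph: dict node -> list of (neighbour, travel_time_minutes)."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], target
    while node in prev or node == source:
        path.append(node)
        if node == source:
            break
        node = prev[node]
    return dist.get(target), list(reversed(path))

stations = {"A": [("B", 12), ("C", 30)], "B": [("C", 10), ("D", 25)], "C": [("D", 8)], "D": []}
print(dijkstra(stations, "A", "D"))       # (30, ['A', 'B', 'C', 'D'])
```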
text classification is still an important problem for unlabeled text citeseer computer science document search engine uses automatic text classification methods for document indexing text classification uses document’s original text words as the primary feature representation however such representation usually comes with high dimensionality and feature sparseness word clustering is an effective approach to reduce feature dimensionality and feature sparseness and improve text classification performance this paper introduces domain rule based word clustering method for cluster feature representation the clusters are formed from various domain databases and the word orthographic properties besides significant dimensionality reduction such cluster feature representations show absolute improvement on average on classification performance of document header lines and absolute improvement on the overall accuracy of bibliographic fields extraction in contrast to feature representation just based on the original text words our word clustering even outperforms the distributional word clustering in the context of document metadata extraction
to provide variety of new and advanced communications services computer networks are required to perform increasingly complex packet processing this processing typically takes place on network routers and their associated components an increasingly central component in router design is chip multiprocessor cmp referred to as network processor or np in addition to multiple processors nps have multiple forms of on chip memory various network and off chip memory interfaces and other specialized logic components such as cams content addressable memories the design space for nps eg number of processors caches cache sizes etc is large due to the diverse workload application requirements and system characteristics system design constraints relate to the maximum chip area and the power consumption that are permissible while achieving defined line rates and executing required packet functions in this paper an analytic performance model that captures the processing performance chip area and power consumption for prototypical np is developed and used to provide quantitative insights into system design trade offs the model parameterized with networking application benchmark provides the basis for the design of scalable high performance network processor and presents insights into how best to configure the numerous design elements associated with nps
wireless connectivity for vehicles is fast growing market with plethora of different network technologies already in use surveys of the numbers of ieee access points in cities point to hundreds to thousands of networks within each square kilometre with coverage areas that are not easily predicted due to the complexities of the urban environment in order to take advantage of the diversity in wireless networks available we need data concerning their coverage methods of generating such coverage maps that are accurate space efficient and easy to query are not well addressed in this paper we present and evaluate using large corpus of real world data novel algorithms for processing large quantities of signal strength values into coverage maps that satisfy such requirements
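A minimal sketch of the underlying idea, assuming a simple grid binning of (lat, lon, rssi) observations; the cell size, the averaging rule and the query interface are illustrative assumptions and far cruder than the space-efficient map structures the paper evaluates.

```python
# Grid-binned coverage map: fold raw signal-strength observations into cells
# and answer point queries with the mean observed RSSI of the containing cell.
import math
from collections import defaultdict

class CoverageMap:
    def __init__(self, cell_deg=0.001):                 # roughly 100 m cells at mid latitudes
        self.cell_deg = cell_deg
        self.cells = defaultdict(lambda: [0.0, 0])      # cell -> [rssi_sum, sample_count]

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def add_sample(self, lat, lon, rssi_dbm):
        cell = self.cells[self._cell(lat, lon)]
        cell[0] += rssi_dbm
        cell[1] += 1

    def query(self, lat, lon):
        """Mean observed RSSI (dBm) for the cell containing (lat, lon), or None if unmapped."""
        cell = self.cells.get(self._cell(lat, lon))
        return cell[0] / cell[1] if cell else None

cmap = CoverageMap()
cmap.add_sample(52.2053, 0.1218, -67)
cmap.add_sample(52.2054, 0.1219, -71)
print(cmap.query(52.2053, 0.1218))                      # -69.0
```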
this paper proposes combination of circuit and architectural techniques to maximize leakage power reduction in embedded processor on chip caches it targets cache peripheral circuits which according to recent studies account for considerable amount of cache leakage at circuit level we propose novel design with multiple sleep modes for cache peripherals each mode represents trade off between leakage reduction and wakeup delay architectural control is proposed to decide when and how to use these different low leakage modes using cache miss information to guide its action this control is based on simple state machines that do not impact area or power consumption and can thus be used even in the resource constrained processors experimental results indicate that proposed techniques can keep the cache peripherals in one of the low power modes for more than of total execution time on average this translates to an average leakage power reduction of for nm technology the dl cache energy delay product is reduced on average by
we introduce general framework that is based on distance semantics and investigate the main properties of the entailment relations that it induces it is shown that such entailments are particularly useful for non monotonic reasoning and for drawing rational conclusions from incomplete and inconsistent information some applications are considered in the context of belief revision information integration systems and consistent query answering for possibly inconsistent databases
this paper compares the energy efficiency of chip multiprocessing cmp and simultaneous multithreading smt on modern out of order processors for the increasingly important multimedia applications since performance is an important metric for real time multimedia applications we compare configurations at equal performance we perform this comparison for large number of performance points derived using different processor architectures and frequencies voltages we find that for the design space explored for each workload at each performance point cmp is more energy efficient than smt the difference is small for two thread systems but large to for four thread systems we also find that the best smt and the best cmp configuration for given performance target have different architecture and frequency voltage therefore their relative energy efficiency depends on subtle interplay between various factors such as capacitance voltage ipc frequency and the level of clock gating as well as workload features we perform detailed analysis considering these factors and develop mathematical model to explain these results although cmp shows clear energy advantage for four thread and higher workloads it comes at the cost of increased silicon area we therefore investigate hybrid solution where cmp is built out of smt cores and find it to be an effective compromise finally we find that we can reduce energy further for cmp with straightforward application of previously proposed techniques of adaptive architectures and dynamic voltage frequency scaling
droplet based microfluidic biochips have recently gained much attention and are expected to revolutionize the biological laboratory procedures as biochips are adopted for the complex procedures in molecular biology their complexity is expected to increase due to the need for multiple and concurrent assays on chip in this article we formulate the placement problem of digital microfluidic biochips with tree based topological representation called tree to the best knowledge of the authors this is the first work that adopts topological representation to solve the placement problem of digital microfluidic biochips we also consider the defect tolerant issue to avoid using defective cells due to fabrication experimental results demonstrate that our approach is more efficient and effective than the previous unified synthesis and placement framework
in this paper we study whether the need for efficient xml publishing brings any new requirements for relational query engines or if sorting query results in the relational engine and tagging them in middleware is sufficient we observe that the mismatch between the xml data model and the relational model requires relational engines to be enhanced for efficiency specifically they need to support relation valued variables we discuss how such support can be provided through the addition of an operator gapply with minimal extensions to existing relational engines we discuss how the operator may be exposed in sql syntax and provide comprehensive study of optimization rules that govern this operator we report the results of preliminary performance evaluation showing the speedup obtained through our approach and the effectiveness of our optimization rules
detecting anomalous traffic is crucial part of managing ip networks in recent years network wide anomaly detection based on principal component analysis pca has emerged as powerful method for detecting wide variety of anomalies we show that tuning pca to operate effectively in practice is difficult and requires more robust techniques than have been presented thus far we analyze week of network wide traffic measurements from two ip backbones abilene and geant across three different traffic aggregations ingress routers od flows and input links and conduct detailed inspection of the feature time series for each suspected anomaly our study identifies and evaluates four main challenges of using pca to detect traffic anomalies i the false positive rate is very sensitive to small differences in the number of principal components in the normal subspace ii the effectiveness of pca is sensitive to the level of aggregation of the traffic measurements iii large anomaly may inadvertently pollute the normal subspace iv correctly identifying which flow triggered the anomaly detector is an inherently challenging problem
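For readers unfamiliar with the subspace method, the sketch below retains the top-k principal components as the "normal" subspace and scores each measurement by its squared residual outside it; the synthetic data, the choice of k and the flagging threshold are assumptions, and the latter two are exactly the sensitivities the study highlights.

```python
# Subspace-method sketch: score each traffic measurement by its squared
# residual outside the subspace spanned by the top-k principal components.
import numpy as np

def pca_residual_scores(X, k):
    """X: (time intervals, links/flows) traffic matrix; returns one residual score per row."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                          # basis of the normal subspace (d x k)
    residual = Xc - Xc @ P @ P.T          # component unexplained by the top-k components
    return np.sum(residual ** 2, axis=1)

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))                     # low-rank "normal" traffic structure
loadings = rng.normal(size=(3, 20))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 20))
X[120, 5] += 5.0                                       # inject a spike on one link
scores = pca_residual_scores(X, k=3)
print(int(np.argmax(scores)))                          # 120: the spike dominates the residual
```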
millions of users retrieve information from the internet using search engines mining these user sessions can provide valuable information about the quality of user experience and the perceived quality of search results often search engines rely on accurate estimates of click through rate ctr to evaluate the quality of user experience the vast heterogeneity in the user population and presence of automated software programs bots can result in high variance in the estimates of ctr to improve the estimation accuracy of user experience metrics like ctr we argue that it is important to identify typical and atypical user sessions in clickstreams our approach to identify these sessions is based on detecting outliers using mahalanobis distance in the user session space our user session model incorporates several key clickstream characteristics including novel conformance score obtained by markov chain analysis editorial results show that our approach of identifying typical and atypical sessions has precision of about filtering out these atypical sessions reduces the uncertainty confidence interval of the mean ctr by about these results demonstrate that our approach of identifying typical and atypical user sessions is extremely valuable for cleaning noisy user session data for increased accuracy in evaluating user experience
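A minimal sketch of Mahalanobis-distance outlier flagging in a session-feature space is given below; the feature set, synthetic data and quantile cut-off are illustrative assumptions rather than the paper's session model or its Markov-chain conformance score.

```python
# Flag atypical sessions as those with a large squared Mahalanobis distance
# from the centroid of the session-feature cloud.
import numpy as np

def mahalanobis_outliers(X, quantile=0.98):
    """X: (sessions, features). Returns (boolean outlier mask, squared distances)."""
    mu = X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))      # pseudo-inverse for stability
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)     # squared Mahalanobis distances
    return d2 > np.quantile(d2, quantile), d2

# hypothetical features per session: [queries, clicks, mean dwell seconds, click-through rate]
rng = np.random.default_rng(1)
typical = rng.normal(loc=[4, 2, 45, 0.5], scale=[2, 1, 10, 0.1], size=(500, 4))
bot_like = np.tile([300.0, 300.0, 0.5, 1.0], (5, 1))       # bursty, bot-like sessions
mask, d2 = mahalanobis_outliers(np.vstack([typical, bot_like]))
print(np.where(mask)[0])   # the bot-like sessions (indices 500-504) appear among the flagged ones
```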
novel approach to the specification and verification of programs through an annotation language that is mixture between jml and the language of isabelle hol is proposed this yields three benefits specifications are concise and close to the underlying mathematical model existing isabelle theories can be reused and the leap of faith from specification language to encoding in logic is small this is of particular relevance for software certification and verification in application areas such as robotics
most real world database applications manage temporal data ie data with associated time references that capture temporal aspect of the data typically either when the data is valid or when the data is known such applications abound in eg the financial medical and scientific domains in contrast to this current database management systems offer preciously little built in query language support for temporal data management this situation persists although an active temporal database research community has demonstrated that application development can be simplified substantially by built in temporal support this paper’s contribution is motivated by the observation that existing temporal data models and query languages generally make the same rigid assumption about the semantics of the association of data and time namely that if subset of the time domain is associated with some data then this implies the association of any further subset with the data this paper offers comprehensive general framework where alternative semantics may co exist it supports so called malleable and atomic temporal associations in addition to the conventional ones mentioned above which are termed constant to demonstrate the utility of the framework the paper defines characteristics enabled temporal algebra termed ceta which defines the traditional relational operators in the new framework this contribution demonstrates that it is possible to provide built in temporal support while making less rigid assumptions about the data and without jeopardizing the degree of the support this moves temporal support closer to practical applications
this paper presents new content based image retrieval framework with relevance feedback this framework employs genetic programming to discover combination of descriptors that better characterizes the user perception of image similarity several experiments were conducted to validate the proposed framework these experiments employed three different image databases and color shape and texture descriptors to represent the content of database images the proposed framework was compared with three other relevance feedback methods regarding their efficiency and effectiveness in image retrieval tasks experiment results demonstrate the superiority of the proposed method
randomising set index functions can reduce the number of conflict misses in data caches by spreading the cache blocks uniformly over all sets typically the randomisation functions compute the exclusive ors of several address bits not all randomising set index functions perform equally well which calls for the evaluation of many set index functions this paper discusses and improves technique that tackles this problem by predicting the miss rate incurred by randomisation function based on profiling information new way of looking at randomisation functions is used namely the null space of the randomisation function the members of the null space describe pairs of cache blocks that are mapped to the same set this paper presents an analytical model of the error made by the technique and uses this to propose several optimisations to the technique the technique is then applied to generate conflict free randomisation function for the spec benchmarks
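To make the null-space view concrete, the toy code below implements an XOR (parity) based set index function as a bit matrix over address bits: each output bit is the parity of the address bits selected by one mask, and two addresses map to the same set exactly when their difference lies in the null space of that matrix. The bit widths and the particular masks are invented for illustration and are not the conflict-free functions generated in the paper.

```python
# XOR-based randomised set index: 6 masks -> 64 sets, each index bit is the
# parity of the masked address bits; the null space of the mask matrix
# describes exactly which address pairs collide in the same set.
MASKS = [
    0b00000011000001,
    0b00000100100010,
    0b00001001000100,
    0b00010010001000,
    0b00100100010000,
    0b01001000100000,
]
INDEX_BITS = 14               # address bits fed into the randomisation function

def parity(x):
    p = 0
    while x:
        p ^= 1
        x &= x - 1            # clear lowest set bit
    return p

def set_index(addr, block_bits=6):
    bits = (addr >> block_bits) & ((1 << INDEX_BITS) - 1)   # drop the block offset
    return sum(parity(bits & mask) << i for i, mask in enumerate(MASKS))

a = 0x12340
b = a ^ (0b11 << 6)           # differ in index bits 0 and 1
print(set_index(a), set_index(b))   # differ, since 0b11 is not in the null space of MASKS
```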
phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in system performance this imbalance has resulted in becoming significant bottleneck for many scientific applications one key to overcoming this bottleneck is improving the performance of multiprocessor file systems the design of high performance multiprocessor file system requires comprehensive understanding of the expected workload unfortunately until recently no general workload studies of multiprocessor file systems have been conducted the goal of the charisma project was to remedy this problem by characterizing the behavior of several production workloads on different machines at the level of individual reads and writes the first set of results from the charisma project describe the workloads observed on an intel ipsc and thinking machines cm this paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences isolating common trends and platform dependent variances using this comparison we are able to gain more insight into the general principles that should guide multiprocessor file system design
although this is talk about the design of predictive models to determine where faults are likely to be in the next release of large software system the primary focus of the talk is the process that was followed when doing this type of software engineering research we follow the project from problem inception cradle to productization grave describing each of the intermediate stages to try to give picture of why such research takes so long and also why it is necessary to perform each of the steps
modern multi tier application systems are generally based on high performance database systems in order to process and store business information containing valuable business information these systems are highly interesting to attackers and special care needs to be taken to prevent any malicious access to this database layer in this work we propose novel approach for modelling sql statements to apply machine learning techniques such as clustering or outlier detection in order to detect malicious behaviour at the database transaction level the approach incorporates the parse tree structure of sql queries as characteristic eg for correlating sql queries with applications and distinguishing benign and malicious queries we demonstrate the usefulness of our approach on real world data
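As a much-simplified stand-in for the parse-tree features, the sketch below normalises literals away so that statements issued by the same application code collapse to one skeleton and rare skeletons can be flagged; the regexes and the frequency threshold are assumptions, not the proposed model or its clustering machinery.

```python
# Reduce each SQL statement to a literal-free "skeleton" and flag statements
# whose skeleton is rare in the observed transaction log.
import re
from collections import Counter

def sql_skeleton(query):
    q = query.strip().lower()
    q = re.sub(r"'(?:[^']|'')*'", "?", q)     # string literals -> ?
    q = re.sub(r"\b\d+(\.\d+)?\b", "?", q)    # numeric literals -> ?
    q = re.sub(r"\s+", " ", q)                # collapse whitespace
    return q

def rare_statements(queries, min_count=2):
    counts = Counter(sql_skeleton(q) for q in queries)
    return [q for q in queries if counts[sql_skeleton(q)] < min_count]

log = [
    "SELECT name FROM users WHERE id = 17",
    "SELECT name FROM users WHERE id = 42",
    "SELECT name FROM users WHERE id = 17 OR '1'='1'",   # injection-style outlier
]
print(rare_statements(log))
```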
swarm robotics active self assembly and amorphous computing are fields that focus on designing systems of large numbers of small simple components that can cooperate to complete complex tasks many of these systems are inspired by biological systems and all attempt to use the simplest components and environments possible while still being capable of achieving their goals the canonical problems for such biologically inspired systems are shape assembly and path finding in this paper we demonstrate path finding in the well studied tile assembly model model of molecular self assembly that is strictly simpler than other biologically inspired models as in related work our systems function in the presence of obstacles and can be made fault tolerant the path finding systems use distinct components and find minimal length paths in time linear in the length of the path
when only small number of labeled samples are available supervised dimensionality reduction methods tend to perform poorly because of overfitting in such cases unlabeled samples could be useful in improving the performance in this paper we propose semi supervised dimensionality reduction method which preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other the proposed method which we call semi supervised local fisher discriminant analysis self has an analytic form of the globally optimal solution and it can be computed based on eigen decomposition we show the usefulness of self through experiments with benchmark and real world document classification datasets
in this paper novel hybrid algorithm is presented for the fast construction and high quality rendering of visual hulls we combine the strengths of two complementary hardware accelerated approaches direct constructive solid geometry csg rendering and texture mapping based visual cone trimming the former approach completely eliminates the aliasing artifacts inherent in the latter whereas the rapid speed of the latter approach compensates for the performance deficiency of the former additionally new view dependent texture mapping method is proposed this method makes efficient use of graphics hardware to perform per fragment blending weight computation which yields better rendering quality our rendering algorithm is integrated in distributed system that is capable of acquiring synchronized video streams and rendering visual hulls in real time or at interactive frame rates from up to eight reference views
navigation systems assist almost any kind of motion in the physical world including sailing flying hiking driving and cycling on the other hand traces supplied by global positioning systems gps can track actual time and absolute coordinates of the moving objects consequently this paper addresses efficient algorithms and data structures for the route planning problem based on gps data given set of traces and current location infer shortest path to the destination the algorithm of bentley and ottmann is shown to transform geometric gps information directly into combinatorial weighted and directed graph structure which in turn can be queried by applying classical and refined graph traversal algorithms like dijkstra’s single source shortest path algorithm or for high precision map inference especially in car navigation algorithms for road segmentation map matching and lane clustering are presented
successfully structuring information in databases olap cubes and xml is crucial element in managing data nowadays however this process brought new challenges to usability it is difficult for users to switch from common communication means using natural language to data models eg database schemas that are hard to work with and understand especially for occasional users this important issue is under intense scrutiny in the database community eg keyword search over databases and query relaxation techniques and the information extraction community eg linking structured and unstructured data however there is still no comprehensive solution that automatically generates an olap online analytical processing query and chooses visualization based on textual content with high precision we present such method we discuss how to dynamically generate interpretations of textual content as an olap query select the best visualization and retrieve on the fly corresponding data from data warehouse to provide the most relevant aggregation results we consider the user’s actual context described by document’s content moreover we provide prototypical implementation of our method the text to query system tq and show how tq can be successfully applied to an enterprise scenario as an extension for an office application
recursive programs may require large numbers of procedure calls and stack operations and many such recursive programs exhibit exponential time complexity due to the time spent re calculating already computed sub problems as result methods which transform given recursive program to an iterative one have been intensively studied we propose here new framework for transforming programs by removing recursion the framework includes unified method of deriving low time complexity programs by solving recurrences extracted from the program sources our prototype system aptsr is an initial implementation of the framework automatically finding simpler closed form versions of class of recursive programs though in general the solution of recurrences is easier if the functions have only single recursion parameter we show practical technique for solving those with multiple recursion parameters
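The flavour of the transformation can be seen on a textbook case: a doubly recursive definition whose recurrence, once solved or unfolded, yields a linear-time iterative program. This is a generic illustration of the idea, not output produced by the described prototype system.

```python
# An exponential-time doubly recursive definition and the linear-time iterative
# program obtained once the recurrence has been unfolded.
def fib_rec(n):
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)

def fib_iter(n):
    a, b = 0, 1                  # invariant: a = fib(i), b = fib(i + 1)
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fib_rec(n) == fib_iter(n) for n in range(15))
print(fib_iter(40))              # 102334155, computed without re-evaluating sub-problems
```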
sharing the architectural knowledge of architectural analysis among stakeholders proves to be troublesome this causes problems in and with architectural analysis which can have serious consequences for the quality of system being developed as this quality might be incompletely or wrongly assessed this paper presents domain model which can be used as common ground among analysts and architects to capture and explicitly share such knowledge this enables way to overcome some of the obstacles imposed by the multi disciplinary context in which architectural analysis takes place to apply the domain model in practice we have created tool implementing part of this domain model for capturing and using explicit architectural knowledge during analysis we validate the tool and domain model in the context of an industrial case study
the paper provides critical and evolutionary analysis of knowledge management km in the aec architecture engineering construction industry it spans large spectrum of km research published in the management information systems and information technology it disciplines an interpretive subjectivist stance is adopted so as to provide holistic understanding and interpretation of organizational km practice research and models three generations of km are identified and discussed to sum up to be effective organizations need not only to negotiate their migration from knowledge sharing first generation to knowledge nurturing second generation culture but also to create sustained organizational and societal values the latter form the third generation of km and represent key challenges faced by modern organizations in the aec industry an evolutionary km framework is provided that presents the three proposed generations of km in terms of three dimensions that factor in the capability of individuals teams and organizations in the sector ict evolution and adoption patterns and construction management philosophies the paper suggests that value creation third generation km is grounded in the appropriate combination of human networks social capital intellectual capital and technology assets facilitated by culture of change
thousands of users issue keyword queries to the web search engines to find information on number of topics since the users may have diverse backgrounds and may have different expectations for given query some search engines try to personalize their results to better match the overall interests of an individual user this task involves two great challenges first the search engines need to be able to effectively identify the user interests and build profile for every individual user second once such profile is available the search engines need to rank the results in way that matches the interests of given user in this article we present our work towards personalized web search engine and we discuss how we addressed each of these challenges since users are typically not willing to provide information on their personal preferences for the first challenge we attempt to determine such preferences by examining the click history of each user in particular we leverage topical ontology for estimating user’s topic preferences based on her past searches ie previously issued queries and pages visited for those queries we then explore the semantic similarity between the user’s current query and the query matching pages in order to identify the user’s current topic preference for the second challenge we have developed ranking function that uses the learned past and current topic preferences in order to rank the search results to better match the preferences of given user our experimental evaluation on the google query stream of human subjects over period of month shows that user preferences can be learned accurately through the use of our topical ontology and that our ranking function which takes into account the learned user preferences yields significant improvements in the quality of the search results
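A minimal sketch of one way such a ranking function could combine scores, assuming each result carries a topic vector and the user profile is a learned topic-weight map: the final score is a linear blend of the engine's base score and the cosine similarity to the profile. The blend weight, topic vocabulary and data are illustrative assumptions, not the ranking function evaluated in the article.

```python
# Personalised re-ranking: blend each result's base score with how well its
# topics match the user's learned topic preferences.
import math

def cosine(u, v):
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def personalized_rank(results, user_profile, alpha=0.6):
    """results: list of (url, base_score, topic_vector dict); returns urls, best first."""
    rescored = [(alpha * base + (1 - alpha) * cosine(user_profile, topics), url)
                for url, base, topics in results]
    return [url for _, url in sorted(rescored, reverse=True)]

profile = {"programming": 0.7, "databases": 0.5, "sports": 0.05}   # learned from click history
results = [("db-tuning.example",   0.62, {"databases": 0.9}),
           ("football.example",    0.70, {"sports": 0.95}),
           ("python-tips.example", 0.58, {"programming": 0.9})]
print(personalized_rank(results, profile))   # programming and database pages move up
```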
computing preference queries has received lot of attention in the database community it is common that the user is unsure of his her preference so care must be taken to elicit the preference of the user correctly in this paper we propose to elicit the preferred ordering of user by utilizing skyline objects as the representatives of the possible ordering we introduce the notion of order based representative skylines which selects representatives based on the orderings that they represent to further facilitate preference exploration hierarchical clustering algorithm is applied to compute dendrogram on the skyline objects by coupling the hierarchical clustering with visualization techniques we allow users to refine their preference weight settings by browsing the hierarchy extensive experiments were conducted and the results validate the feasibility and the efficiency of our approach
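For reference, a brute-force computation of the skyline (Pareto-optimal set) from which such representatives are drawn is sketched below, assuming smaller-is-better attributes; the hotel data and the quadratic scan are purely illustrative.

```python
# Skyline = points not dominated by any other point (all attributes "smaller is better").
def dominates(a, b):
    """a dominates b if a is no worse in every attribute and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# hotels as (price, distance_to_beach_km)
hotels = [(50, 8), (60, 3), (80, 1), (70, 4), (90, 0.5), (55, 9)]
print(skyline(hotels))   # [(50, 8), (60, 3), (80, 1), (90, 0.5)]
```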
motion estimation is usually based on the brightness constancy assumption this assumption holds well for rigid objects with lambertian surface but it is less appropriate for fluid and gaseous materials for these materials an alternative assumption is required this work examines three possible alternatives gradient constancy color constancy and brightness conservation under this assumption the brightness of an object can diffuse to its neighborhood brightness conservation and color constancy are found to be adequate models we propose method for detecting regions of dynamic texture in image sequences accurate segmentation into regions of static and dynamic texture is achieved using level set scheme the level set function separates each image into regions that obey brightness constancy and regions that obey the alternative assumption we show that the method can be simplified to obtain less robust but fast algorithm capable of real time performance experimental results demonstrate accurate segmentation by the full level set scheme as well as by the simplified method the experiments included challenging image sequences in which color or geometry cues by themselves would be insufficient
an understanding of how people allocate their visual attention when viewing web pages is very important for web authors interface designers advertisers and others such knowledge opens the door to variety of innovations ranging from improved web page design to the creation of compact yet recognizable visual representations of long pages we present an eye tracking study in which users viewed web pages while engaged in information foraging and page recognition tasks from this data we describe general location based characteristics of visual attention for web pages dependent on different tasks and demographics and generate model for predicting the visual attention that individual page elements may receive finally we introduce the concept of fixation impact new method for mapping gaze data to visual scenes that is motivated by findings in vision research
in this paper we extend the concept of shell pipes to incorporate forks joins cycles and key value aggregation these extensions enable the implementation of class of data flow computation with strong deterministic properties and provide simple yet powerful coordination layer for leveraging multi language and legacy components for large scale parallel computation concretely this paper describes the design and implementation of the language extensions in bourne again shell bash and examines the performance of the system using micro and macro benchmarks the implemented system is shown to scale to thousands of processors enabling high throughput performance for millions of processing tasks on large commodity compute clusters
hard disks contain data frequently an irreplaceable asset of high monetary and non monetary value at the same time hard disks are mechanical devices that consume power are noisy and fragile when their platters are rotating in this paper we demonstrate that hard disks cause different kinds of problems for different types of computer systems and demystify several common misconceptions we show that solutions developed to date are incapable of solving the power consumption noise and data reliability problems without sacrificing hard disk life time data reliability or user convenience we considered data reliability recovery performance user convenience and hard disk caused problems together at the enterprise scale we have designed greenfs fan out stackable file system that offers all time all data run time data protection improves performance under typical user workloads and allows hard disks to be kept off most of the time as result greenfs improves enterprise data protection minimizes disk drive related power consumption and noise and increases the chances of disk drive survivability in case of unexpected external impacts
data grids rely on the coordinated sharing of and interaction across multiple autonomous database management systems they provide transparent access to heterogeneous and autonomous data resources stored on grid nodes data sharing tools for grids must include both distributed query processing and data integration functionality this paper presents the implementation of data sharing system that i is tailored to data grids ii supports well established and widely spread relational dbmss and iii adopts hybrid architecture by relying on peer model for query reformulation to retrieve semantically equivalent expressions and on wrapper mediator integration model for accessing and querying distributed data sources the system builds upon the infrastructure provided by the ogsa dqp distributed query processor and the xmap query reformulation algorithm the paper discusses the implementation methodology and presents empirical evaluation results
in this paper we show an instance based reasoning mail filtering model that outperforms classical machine learning techniques and other successful lazy learners approaches in the domain of anti spam filtering the architecture of the learning based anti spam filter is based on tuneable enhanced instance retrieval network able to accurately generalize mail representations the reuse of similar messages is carried out by simple unanimous voting mechanism to determine whether the target case is spam or not previous to the final response of the system the revision stage is only performed when the assigned class is spam whereby the system employs general knowledge in the form of meta rules
today’s dbmss are unable to support the increasing demands of the various applications that would like to use dbms each kind of application poses new requirements for the dbms the starburst project at ibm’s almaden research center aims to extend relational dbms technology to bridge this gap between applications and the dbms while providing full function relational system to enable sharing across applications starburst will also allow sophisticated programmers to add many kinds of extensions to the base system’s capabilities including language extensions eg new datatypes and operations data management extensions eg new access and storage methods and internal processing extensions eg new join methods and new query transformations to support these features the database query language processor must be very powerful and highly extensible starburst’s language processor features powerful query language rule based optimization and query rewrite and an execution system based on an extended relational algebra in this paper we describe the design of starburst’s query language processor and discuss the ways in which the language processor can be extended to achieve starburst’s goals
modern computer systems are inherently nondeterministic due to variety of events that occur during an execution including interrupts and dma fills the lack of repeatability that arises from this nondeterminism can make it difficult to develop and maintain correct software furthermore it is likely that the impact of nondeterminism will only increase in the coming years as commodity systems are now shared memory multiprocessors such systems are not only impacted by the sources of nondeterminism in uniprocessors but also by the outcome of memory races among concurrent threads in an effort to help ease the pain of developing software in nondeterministic environment researchers have proposed adding deterministic replay capabilities to computer systems system with deterministic replay capability can record sufficient information during an execution to enable replayer to later create an equivalent execution despite the inherent sources of nondeterminism that exist with the ability to replay an execution verbatim many new applications may be possible debugging deterministic replay could be used to provide the illusion of time travel debugger that has the ability to selectively execute both forward and backward in time security deterministic replay could also be used to enhance the security of software by providing the means for an in depth analysis of an attack hopefully leading to rapid patch deployment and reduction in the economic impact of new threats fault tolerance with the ability to replay an execution it may also be possible to develop hot standby systems for critical service providers using commodity hardware virtual machine vm could for example be fed in real time the replay log of primary server running on physically separate machine the standby vm could use the replay log to mimic the primary’s execution so that in the event that the primary fails the backup can take over operation with almost zero downtime
an important precondition for the success of the semantic web is founded on the principle that the content of web pages will be semantically annotated this paper proposes method of automatically acquiring semantic annotations aasa in the aasa method we employ combination of data mining and optimization to acquire semantic annotations key features of aasa include combining association rules inference mechanism genetic algorithm and self organizing map to create semantic annotations and using the nearest neighbor query combined with simulated annealing to maintain semantic annotations
the complexity of multi agent systems behavior properties is studied the behavior properties are formulated using classical temporal logic languages and are checked relative to the transition system induced by the multi agent system definition we show that there are deterministic or nondeterministic polynomial time check algorithms under some realistic structural and semantic restrictions on agent programs and actions
this paper proposes novel approach to the formal definition of uml semantics we distinguish descriptive semantics from functional semantics of modelling languages the former defines which system is an instance of model while the latter defines the basic concepts underlying the models in this paper the descriptive semantics of class diagram interaction diagram and state machine diagram are defined by first order logic formulas translation tool is implemented and integrated with the theorem prover spass to enable automated reasoning about models the formalisation and reasoning of models is then applied to model consistency checking
multicore shared memory architectures are becoming prevalent and bring many programming challenges among the biggest are data races accesses to shared resources that make program’s behavior depend on scheduling decisions beyond its control to eliminate such races the shim concurrent programming language adopts deterministic message passing as its sole communication mechanism we demonstrate such language restrictions are practical by presenting shim to plus pthreads compiler that can produce efficient code for shared memory multiprocessors we present parallel jpeg decoder and fft exhibiting and speedups on four core processor
large image databases have emerged in various applications in recent years prime requisite of these databases is the means by which their contents can be indexed and retrieved multilevel signature file called the two signature multi level signature file smlsf is introduced as an efficient access structure for large image databases the smlsf encodes image information into binary signatures and creates tree structures that can be efficiently searched to satisfy user’s query two types of signatures are generated type i signatures are used at all tree levels except the leaf level and are based only on the domain objects included in the image type ii signatures on the other hand are stored at the leaf level and are based on the included domain objects and their spatial relationships the smlsf was compared analytically to existing signature file techniques the smlsf significantly reduces the storage requirements the index structure can answer more queries and the smlsf performance significantly improves over current techniques both storage reduction and performance improvement increase with the number of objects per image and the number of images in the database for an example large image database storage reduction of may be achieved while the performance improvement may reach
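to make the superimposed-coding idea behind such binary signatures concrete, here is a toy sketch in which each domain object hashes to a few bit positions that are or-ed into an image signature, and a query passes the filter only if all of its bits are set; the smlsf tree levels and the spatial-relationship (type ii) encoding are not modeled, and the signature width and hash choice are assumptions.

```python
# toy sketch of superimposed coding for image signatures; illustrative only.
import hashlib

SIG_BITS = 64
BITS_PER_OBJECT = 3  # number of bits each object sets in the signature

def object_bits(name):
    """derive a few pseudo-random bit positions for a domain object name."""
    digest = hashlib.sha1(name.encode()).digest()
    return {digest[i] % SIG_BITS for i in range(BITS_PER_OBJECT)}

def make_signature(objects):
    sig = 0
    for obj in objects:
        for b in object_bits(obj):
            sig |= 1 << b
    return sig

def may_contain(image_sig, query_objects):
    """signature test: every query bit must be set (false positives possible)."""
    q = make_signature(query_objects)
    return image_sig & q == q

if __name__ == "__main__":
    img = make_signature(["tree", "house", "car"])
    print(may_contain(img, ["house"]))   # True
    print(may_contain(img, ["boat"]))    # usually False (a false drop is possible)
```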
network on chip and memory controller become correlated with each other in case of high network congestion since the network port of memory controller can be blocked due to the back propagated network congestion we call such problem network congestion induced memory blocking in order to resolve the problem we present novel idea of network congestion aware memory controller based on the global information of network congestion the memory controller performs congestion aware memory access scheduling and congestion aware network entry control of read data the experimental results obtained from tile architecture show that the proposed memory controller presents up to improvement in memory utilization
distributed moving object database servers are feasible solution to the scalability problem of centralized database systems in this paper we propose distributed indexing method using the distributed hash table dht paradigm devised to efficiently support complex spatio temporal queries we assume setting in which there is large number of database servers that keep track of events associated with highly dynamic system of moving objects deployed in spatial area we present technique for properly keeping the index up to date and efficiently processing range and top queries for moving object databases we evaluated our system using event driven simulators with demanding spatio temporal workloads and the results show good performance in terms of response time and network traffic
power density continues to increase exponentially with each new technology generation posing major challenge for thermal management in modern processors much past work has examined microarchitectural policies for reducing total chip power but these techniques alone are insufficient if not aimed at mitigating individual hotspots the industry’s current trend has been toward multicore architectures which provide additional opportunities for dynamic thermal management this paper explores various thermal management techniques that exploit the distributed nature of multicore processors we classify these techniques in terms of core throttling policy whether that policy is applied locally to core or to the processor as whole and process migration policies we use turandot and hotspot based thermal simulator to simulate variety of workloads under thermal duress on core powerpc processor using benchmarks from the spec suite we characterize workloads in terms of instruction throughput as well as their effective duty cycles among variety of options we find that distributed control theoretic dvfs alone improves throughput by under our test conditions our final design involves pi based core thermal controller and an outer control loop to decide process migrations this policy avoids all thermal emergencies and yields an average of speedup over the baseline across all workloads
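to illustrate the pi-based per-core thermal control mentioned above, the following minimal sketch runs a discrete pi loop that steers a core's frequency toward a temperature setpoint; the gains, setpoint, frequency range and toy thermal response are all made-up assumptions and do not correspond to the paper's turandot/hotspot setup or results.

```python
# minimal sketch of a discrete pi controller for per-core dvfs; illustrative only.

class PIController:
    def __init__(self, kp, ki, setpoint):
        self.kp, self.ki, self.setpoint = kp, ki, setpoint
        self.integral = 0.0

    def update(self, temperature):
        error = self.setpoint - temperature      # positive when the core runs cool
        self.integral += error
        return self.kp * error + self.ki * self.integral

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

if __name__ == "__main__":
    ctrl = PIController(kp=0.05, ki=0.01, setpoint=80.0)   # degrees celsius
    freq, temp = 2.0, 75.0                                  # ghz, initial temperature
    for step in range(10):
        freq = clamp(freq + ctrl.update(temp), 0.8, 2.0)
        # toy thermal model: temperature follows frequency with some inertia
        temp += 0.5 * (40.0 + 25.0 * freq - temp)
        print(f"step {step}: freq={freq:.2f} ghz, temp={temp:.1f} c")
```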
with the increasing sophistication of circuits and specifically in the presence of ip blocks new estimation methods are needed in the design flow of large scale circuits up to now number of post placement congestion estimation techniques in the presence of ip blocks have been presented in this paper we present unified approach for predicting wirelength congestion and delay parameters early in the design flow we also propose methodology to integrate these prediction methods into the placement framework to handle the large complexity of the designs
to overcome the computational complexity of the asynchronous hidden markov model ahmm we present novel multidimensional dynamic time warping dtw algorithm for hybrid fusion of asynchronous data we show that our newly introduced multidimensional dtw concept requires significantly less decoding time while providing the same data fusion flexibility as the ahmm thus it can be applied in wide range of real time multimodal classification tasks optimally exploiting mutual information during decoding even if the input streams are not synchronous our algorithm outperforms late and early fusion techniques in challenging bimodal speech and gesture fusion experiment
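as background for the multidimensional dtw above, the sketch below is the standard two-sequence dynamic time warping recurrence that it generalizes; the asynchronous multi-stream fusion itself is not reproduced, and the distance function and example data are illustrative.

```python
# standard two-sequence dynamic time warping, shown as the baseline recurrence.

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """return the dtw alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]

if __name__ == "__main__":
    print(dtw([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0 -- the sequences warp onto each other
```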
clustered microarchitectures are an attractive alternative to large monolithic superscalar designs due to their potential for higher clock rates in the face of increasingly wire delay constrained process technologies as increasing transistor counts allow an increase in the number of clusters thereby allowing more aggressive use of instruction level parallelism ilp the inter cluster communication increases as data values get spread across wider area as result of the emergence of this trade off between communication and parallelism subset of the total on chip clusters is optimal for performance to match the hardware to the application’s needs we use robust algorithm to dynamically tune the clustered architecture the algorithm which is based on program metrics gathered at periodic intervals achieves an performance improvement on average over the best statically defined architecture we also show that the use of additional hardware and reconfiguration at basic block boundaries can achieve average improvements of our results demonstrate that reconfiguration provides an effective solution to the communication and parallelism trade off inherent in the communication bound processors of the future
today most document categorization in organizations is done manually we save at work hundreds of files and mail messages in folders every day while automatic document categorization has been widely studied much challenging research still remains to support user subjective categorization this study evaluates and compares the application of self organizing maps soms and learning vector quantization lvq with automatic document classification using set of documents from an organization in specific domain manually classified by domain expert after running the som and lvq we requested the user to reclassify documents that were misclassified by the system results show that despite the subjective nature of human categorization automatic document categorization methods correlate well with subjective personal categorization and the lvq method outperforms the som the reclassification process revealed an interesting pattern about of the documents were classified according to their original categorization about according to the system’s categorization the users changed the original categorization and the remainder received different new categorization based on these results we conclude that automatic support for subjective categorization is feasible however an exact match is probably impossible due to the users changing categorization behavior
as silicon technologies move into the nanometer regime transistor reliability is expected to wane as devices become subject to extreme process variation particle induced transient errors and transistor wear out unless these challenges are addressed computer vendors can expect low yields and short mean times to failure in this article we examine the challenges of designing complex computing systems in the presence of transient and permanent faults we select one small aspect of typical chip multiprocessor cmp system to study in detail single cmp router switch our goal is to design bulletproof cmp switch architecture capable of tolerating significant levels of various types of defects we first assess the vulnerability of the cmp switch to transient faults to better understand the impact of these faults we evaluate our cmp switch designs using circuit level timing on detailed physical layouts our infrastructure represents new level of fidelity in architectural level fault analysis as we can accurately track faults as they occur noting whether they manifest or not because of masking in the circuits logic or architecture our experimental results are quite illuminating we find that transient faults because of their fleeting nature are of little concern for our cmp switch even within large switch fabrics with fast clocks next we develop unified model of permanent faults based on the time tested bathtub curve using this convenient abstraction we analyze the reliability versus area tradeoff across wide spectrum of cmp switch designs ranging from unprotected designs to fully protected designs with on line repair and recovery capabilities protection is considered at multiple levels from the entire system down through arbitrary partitions of the design we find that designs are attainable that can tolerate larger number of defects with less overhead than naive triple modular redundancy using domain specific techniques such as end to end error detection resource sparing automatic circuit decomposition and iterative diagnosis and reconfiguration
being deluged by exploding volumes of structured and unstructured data contained in databases data warehouses and the global internet people have an increasing need for critical information that is expertly extracted and integrated in personalized views allowing for the collective efforts of many data and knowledge workers we offer in this paper framework for addressing the issues involved in our proposed framework we assume that target view is specified ontologically and independently of any of the sources and we model both the target and all the sources in the same modeling language then for given target and source we generate target to source mapping that has the necessary properties to enable us to load target facts from source facts the mapping generator raises specific issues for user’s consideration but is endowed with defaults to allow it to run to completion with or without user input the framework is based on formal foundation and we are able to prove that when source has valid interpretation the generated mapping produces valid interpretation for the part of the target loaded from the source
one of the most promising techniques to detect and thwart network attack in network intrusion detection system is to compare each incoming packet with pre defined attack patterns this comparison can be performed by pattern matching engine which has several key requirements including scalability to line rates of network traffic and easy updating of new attack patterns memory based deterministic finite automata meet these requirements however their storage requirement will grow exponentially with the number of patterns which makes it impractical for implementation in this paper we propose customized memory based pattern matching engine whose storage requirement linearly increases with the number of patterns the basic idea is to allocate one memory slot for each state instead of each edge of the deterministic finite automaton to demonstrate this idea we have developed two customized memory decoders we evaluate them by comparing with traditional approach in terms of programmability and resource requirements we also examine their effectiveness for different optimized deterministic finite automata experimental results are presented to demonstrate the validity of our proposed approach
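to ground the deterministic-automaton starting point of the abstract above, here is a sketch of multi-pattern matching with an aho-corasick style automaton; it illustrates the generic approach, not the authors' per-state memory-slot encoding or their customized memory decoders, and the patterns and text are illustrative.

```python
# sketch of deterministic multi-pattern matching (aho-corasick style automaton).
from collections import deque

def build_automaton(patterns):
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())
    while queue:                              # breadth-first failure-link construction
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(text, automaton):
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

if __name__ == "__main__":
    ac = build_automaton(["evil", "vile", "exploit"])
    print(search("an evil exploit", ac))  # [(3, 'evil'), (8, 'exploit')]
```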
solid models may be blended through filleting or rounding operations that typically replace the vicinity of concave or convex edges by blends that smoothly connect to the rest of the solid’s boundary circular blends which are popular in manufacturing are each the subset of canal surface that bounds the region swept by ball of constant or varying radius as it rolls on the solid while maintaining two tangential contacts we propose to use second solid to control the radius variation this new formulation supports global blending simultaneous rounding and filleting operations and yields simple set theoretic formulation of the relative blending of solid given control solid we propose user interface options describe practical implementations and show results in and dimensions
debugging concurrent programs is difficult this is primarily because the inherent non determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings the problem is exacerbated in multi core environments where there are multiple schedulers one for each core in this paper we propose reproduction technique for concurrent programs that execute on multi core platforms our technique performs lightweight analysis of failing execution that occurs in multi core environment and uses the result of the analysis to enable reproduction of the bug in single core system under the control of deterministic scheduler more specifically our approach automatically identifies the execution point in the re execution that corresponds to the failure point it does so by analyzing the failure core dump and leveraging technique called execution indexing that identifies related point in the re execution by generating core dump at this point and comparing the differences between the two dumps we are able to guide search algorithm to efficiently generate failure inducing schedule our experiments show that our technique is highly effective and has reasonable overhead
increasingly software should dynamically adapt its behavior at run time in response to changing conditions in the supporting computing and communication infrastructure and in the surrounding physical environment in order for an adaptive program to be trusted it is important to have mechanisms to ensure that the program functions correctly during and after adaptations adaptive programs are generally more difficult to specify verify and validate due to their high complexity particularly when involving multi threaded adaptations the program behavior is the result of the collaborative behavior of multiple threads and software components this paper introduces an approach to create formal models for the behavior of adaptive programs our approach separates the adaptation behavior and non adaptive behavior specifications of adaptive programs making the models easier to specify and more amenable to automated analysis and visual inspection we introduce process to construct adaptation models automatically generate adaptive programs from the models and verify and validate the models we illustrate our approach through the development of an adaptive gsm oriented audio streaming protocol for mobile computing application
as social networks are becoming ubiquitous on the web the semantic web goals indicate that it is critical to have standard model allowing exchange interoperability transformation and querying of social network data in this paper we show that rdf sparql meet this desiderata building on developments of social network analysis graph databases and semantic web we present social networks data model based on rdf and query and transformation language based on sparql meeting the above requirements we study its expressive power and complexity showing that it behaves well and present an illustrative prototype
horizontal microcoded architecture hma is paradigm for designing programmable high performance processing elements pes however it suffers from large code size which can be addressed by compression in this article we study the code size of one of the new hma based technologies called no instruction set computer nisc we show that nisc code size can be several times larger than typical risc processor and we propose several low overhead dictionary based code compression techniques to reduce its code size our compression algorithm leverages the knowledge of don’t care values in the control words and can reduce the code size by times on average despite such good results as shown in this article these compression techniques lead to poor fpga implementations because they require many on chip rams to address this issue we introduce an fpga aware dictionary based technique that uses the dual port feature of on chip rams to reduce the number of utilized block rams by half additionally we propose cascading two levels of dictionaries for code size and block ram reduction of large programs for an mp application merged cascaded three dictionary implementation reduces the number of utilized block rams by times compared to nisc without compression this corresponds to additional savings over the best single level dictionary based compression
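as a rough illustration of the dictionary-based compression idea above, this toy sketch keeps the most frequent control words in a small dictionary and replaces them with short indices, falling back to an escape index plus the literal word otherwise; the nisc-specific handling of don't-care bits, the cascaded dictionaries and the fpga block-ram layout are not modeled, and all parameters are assumptions.

```python
# toy dictionary-based compression of fixed-width control words; illustrative only.
from collections import Counter

def build_dictionary(control_words, size):
    """keep the `size` most common control words."""
    return [w for w, _ in Counter(control_words).most_common(size)]

def compress(control_words, dictionary):
    index = {w: i for i, w in enumerate(dictionary)}
    escape = len(dictionary)           # reserved index meaning "literal follows"
    out = []
    for w in control_words:
        out.append((index[w],) if w in index else (escape, w))
    return out

def decompress(stream, dictionary):
    escape = len(dictionary)
    return [dictionary[t[0]] if t[0] != escape else t[1] for t in stream]

if __name__ == "__main__":
    words = ["1010", "1010", "0001", "1111", "1010", "0001", "0110"]
    dico = build_dictionary(words, size=2)          # ['1010', '0001']
    packed = compress(words, dico)
    assert decompress(packed, dico) == words
    print(dico, packed)
```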
middleware for web service compositions such as bpel engines provides the execution environment for services as well as additional functionalities such as monitoring and self tuning given its role in service provisioning it is very important to assess the performance of middleware in the context of soa this paper presents soabench framework for the automatic generation and execution of testbeds for benchmarking middleware for composite web services and for assessing the performance of existing soa infrastructures soabench defines testbed model characterized by the composite services to execute the workload to generate the deployment configuration to use the performance metrics to gather the data analyses to perform on them and the reports to produce we have validated soabench by benchmarking the performance of different bpel engines
extracting frequently executed hot portions of the application and executing their corresponding data flow graph dfg on the hardware accelerator brings about more speedup and energy saving for embedded systems comprising base processor integrated with tightly coupled accelerator extending dfgs to support control instructions and using control dfgs cdfgs instead of dfgs results in more coverage of application code portions being accelerated hence more speedup and energy saving in this paper motivations for extending dfgs to cdfgs and handling control instructions are introduced in addition basic requirements for an accelerator with conditional execution support are proposed then two algorithms are presented for temporal partitioning of cdfgs considering the target accelerator architectural constraints to demonstrate effectiveness of the proposed ideas they are applied to the accelerator of reconfigurable processor called amber experimental results confirm the remarkable effectiveness of covering control instructions and using cdfgs versus dfgs in the aspects of performance and energy reduction
today wireless networks are becoming increasingly ubiquitous usually several complex multi threaded applications are mapped on single embedded system and each of them is triggered by different input stream in accordance with the run time behaviours of the user and the environment this dynamicity renders the task of fully analyzing at design time these systems very complex if not impossible therefore run time information has to be used in order to produce an efficient design this introduces new challenges especially for embedded system designers using direct memory access dma module who have to know in advance the memory transfer behaviour of the whole system in order to design and program their dma efficiently this is especially important in embedded systems with dram memories as the concurrent accesses from different processing elements can adversely affect the page based architecture of these memory elements even more the increasingly common usage of dynamic data types further complicates the problem because the exact location of data instances in the memory is unknown at design time in this paper we propose system level optimization methodology to adapt the dma usage parameters automatically at run time according to online information with our proposed optimization approach we manage to reduce the mean latency of the memory transfers by more than thus reducing the average number of cycles that processing elements or dmas have to waste waiting for data from the main memory while optimizing energy consumption and system responsiveness we evaluate our approach using set of real life applications and real wireless dynamic streams
model elimination is back chaining strategy to search for and construct resolution refutations recent extensions to model elimination implemented in modoc have made it practical tool for satisfiability checking particularly for problems with known goals many formulas can be refuted more succinctly by recording certain derived clauses called lemmas lemmas can be used where clause of the original formula would normally be required however recording too many lemmas overwhelms the proof search lemma management has significant effect on the performance of modoc earlier research studied pure persistent global strategies and pure unit lemma local strategies this paper describes and evaluates hybrid strategy to control the lifetime of lemmas as well as new technique for deriving certain lemmas efficiently using lazy strategy unit lemmas are recorded locally as in previous practice but certain lemmas that are considered valuable are asserted globally range of functions for estimating value is studied experimentally criteria are reported that appear to be suitable for wide range of application derived formulas
nowadays the grid is turning into service oriented environment in this context there exist solutions to the execution of workflows and most of them are web service based additionally services are considered to exist on fixed host limiting the resource alternatives when scheduling the workflow tasks in this paper we address the problem of dynamic instantiation of grid services to schedule workflow applications we propose an algorithm to select the best resources available to execute each task of the workflow on the already instantiated services or on services dynamically instantiated when necessary the algorithm relies on the existence of grid infrastructure which could provide dynamic service instantiation simulation results show that the scheduling algorithm associated with the dynamic service instantiation can bring more efficient workflow execution on the grid
we present simple and efficient method for reconstructing triangulated surfaces from massive oriented point sample datasets the method combines streaming and parallelization moving least squares mls projection adaptive space subdivision and regularized isosurface extraction besides presenting the overall design and evaluation of the system our contributions include methods for keeping in core data structures complexity purely locally output sensitive and for exploiting both the explicit and implicit data produced by mls projector to produce tightly fitting regularized triangulations using primal isosurface extractor our results show that the system is fast scalable and accurate we are able to process models with several hundred million points in about an hour and outperform current fast streaming reconstructors in terms of geometric accuracy
in this paper the problem of face recognition under variable illumination conditions is considered most of the works in the literature exhibit good performance under strictly controlled acquisition conditions but the performance drastically drops when changes in pose and illumination occur so that recently number of approaches have been proposed to deal with such variability the aim of this work is twofold first survey on the existing techniques proposed to obtain an illumination robust recognition is given and then new method based on the fusion of different classifiers is proposed the experiments carried out on different face databases confirm the effectiveness of the approach
in this work we propose system for automatic document segmentation to extract graphical elements from historical manuscripts and then to identify significant pictures from them removing floral and abstract decorations the system performs block based analysis by means of color and texture features the gradient spatial dependency matrix new texture operator particularly effective for this task is proposed the feature vectors are processed by an embedding procedure which allows increased performance in later svm classification results for both feature extraction and embedding based classification are reported supporting the effectiveness of the proposal
we study how best to schedule scans of large data files in the presence of many simultaneous requests to common set of files the objective is to maximize the overall rate of processing these files by sharing scans of the same file as aggressively as possible without imposing undue wait time on individual jobs this scheduling problem arises in batch data processing environments such as map reduce systems some of which handle tens of thousands of processing requests daily over shared set of files as we demonstrate conventional scheduling techniques such as shortest job first do not perform well in the presence of cross job sharing opportunities we derive new family of scheduling policies specifically targeted to sharable workloads our scheduling policies revolve around the notion that all else being equal it is good to schedule nonsharable scans ahead of ones that can share io work with future jobs if the arrival rate of sharable future jobs is expected to be high we evaluate our policies via simulation over varied synthetic and real workloads and demonstrate significant performance gains compared with conventional scheduling approaches
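the following toy sketch captures only the stated heuristic from the abstract above, namely preferring non-sharable scans when highly sharable jobs are expected to keep arriving; the job selection function, the arrival-rate estimates and the data are illustrative assumptions, not the paper's actual policy family or evaluation.

```python
# toy sharability-aware scheduling heuristic: run the job whose file is least
# likely to attract future sharers first, so highly sharable scans stay queued
# long enough to be amortized over more jobs. illustrative only.

def pick_next(pending_jobs, arrival_rate):
    """pending_jobs: list of (job_id, file); arrival_rate: file -> expected
    rate of future requests for that file."""
    return min(pending_jobs, key=lambda job: arrival_rate.get(job[1], 0.0))

if __name__ == "__main__":
    pending = [("j1", "logs_2009"), ("j2", "clicks"), ("j3", "clicks")]
    rates = {"clicks": 5.0, "logs_2009": 0.1}   # "clicks" is requested often
    print(pick_next(pending, rates))            # ('j1', 'logs_2009')
```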
we employ existing partial evaluation pe techniques developed for constraint logic programming clp in order to automatically generate test case generators for glass box testing of bytecode our approach consists of two independent clp pe phases first the bytecode is transformed into an equivalent decompiled clp program this is already well studied transformation which can be done either by using an ad hoc decompiler or by specialising bytecode interpreter by means of existing pe techniques second pe is performed in order to supervise the generation of test cases by execution of the clp decompiled program interestingly we employ control strategies previously defined in the context of clp pe in order to capture coverage criteria for glass box testing of bytecode unique feature of our approach is that this second pe phase allows generating not only test cases but also test case generators to the best of our knowledge this is the first time that clp pe techniques are applied for test case generation as well as to generate test case generators
spatial layout is frequently used for managing loosely organized information such as desktop icons and digital ink to help users organize this type of information efficiently we propose an interface for manipulating spatial aggregations of objects the aggregated objects are automatically recognized as group and the group structure is visualized as two dimensional bubble surface that surrounds the objects users can drag copy or delete group by operating on the bubble furthermore to help pick out individual objects in dense aggregation the system spreads the objects to avoid overlapping when requested this paper describes the design of this interface and its implementation we tested our technique in icon grouping and ink relocation tasks and observed improvements in user performance
linkage analysis is used to localize human disease genes on the genome and it can involve the exploration and interpretation of seven dimensional genetic likelihood space existing genetic likelihood exploration techniques are quite cumbersome and slow and do not help provide insight into the shape and features of the high dimensional likelihood surface the objective of our visualization is to provide an efficient visual exploration of the complex genetic likelihood space so that researchers can assimilate more information in the least possible time in this paper we present new visualization tools for interactive and efficient exploration of the multi dimensional likelihood space our tools provide interactive manipulation of active ranges of the six model parameters determining the dependent variable scaled genetic likelihood or hlod using filtering color and an approach inspired by worlds within worlds researchers can quickly obtain more informative and insightful visual interpretation of the space
service level agreements slas are used in service oriented computing to define the obligations of the parties involved in transaction slas define the service users quality of service qos requirements that the service provider should satisfy requirements defined once may not be satisfiable when the context of the web services changes eg when requirements or resource availability changes changes in the context can make slas obsolete making sla revision necessary we propose method to autonomously monitor the services context and adapt slas to avoid obsolescence thereof
common task in many database applications is the migration of legacy data from multiple sources into new one this requires identifying semantically related elements of the source and target systems and the creation of mapping expressions to transform instances of those elements from the source format to the target format currently data migration is typically done manually tedious and time consuming process which is difficult to scale to high number of data sources in this paper we describe quickmig new semi automatic approach to determining semantic correspondences between schema elements for data migration applications quickmig advances the state of the art with set of new techniques exploiting sample instances domain ontologies and reuse of existing mappings to detect not only element correspondences but also their mapping expressions quickmig further includes new mechanisms to effectively incorporate domain knowledge of users into the matching process the results from comprehensive evaluation using real world schemas and data indicate the high quality and practicability of the overall approach
this paper presents real time rendering system for generating chinese ink and wash cartoon the objective is to free the animators from laboriously designing traditional chinese painting appearance the system constitutes morphing animation framework and rendering process the whole rendering process is based on graphic process unit gpu including interior shading silhouette extracting and background shading moreover the morphing framework is created to automatically generate chinese painting cartoon from set of surface mesh models these techniques can be applied to real time chinese style entertainment application
consider large scale wireless sensor network measuring compressible data where distributed data values can be well approximated using only coefficients of some known transform we address the problem of recovering an approximation of the data values by querying any sensors so that the reconstruction error is comparable to the optimal term approximation to solve this problem we present novel distributed algorithm based on sparse random projections which requires no global coordination or knowledge the key idea is that the sparsity of the random projections greatly reduces the communication cost of pre processing the data our algorithm allows the collector to choose the number of sensors to query according to the desired approximation error the reconstruction quality depends only on the number of sensors queried enabling robust refinable approximation
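to make the sparse random projection step above concrete, this sketch builds a projection matrix in which each measurement touches only a small random subset of sensors, which is what keeps the in-network pre-processing cheap; the decoder that recovers a near-optimal sparse approximation from the measurements is omitted, and the dimensions and sparsity factor are assumptions.

```python
# sketch of the sparse random projection step; the recovery step is omitted.
import numpy as np

def sparse_projection_matrix(m, n, s, rng):
    """m x n matrix with roughly n/s nonzeros per row, entries +/- sqrt(s)."""
    mask = rng.random((m, n)) < 1.0 / s
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return np.sqrt(s) * mask * signs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, s = 1000, 50, 10.0            # sensors, measurements, sparsity factor
    data = np.zeros(n)
    data[rng.choice(n, size=5, replace=False)] = rng.normal(size=5)  # compressible signal
    phi = sparse_projection_matrix(m, n, s, rng)
    y = phi @ data                      # each y_i needs values from ~n/s sensors only
    print(y.shape, np.count_nonzero(phi[0]))
```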
this paper proposes and develops the basic theory for new approach to typing multi stage languages based on notion of environment classifiers this approach involves explicit but lightweight tracking at type checking time of the origination environment for future stage computations classification is less restrictive than the previously proposed notions of closedness and allows for both more expressive typing of the run construct and for unifying account of typed multi stage programming the proposed approach to typing requires making cross stage persistence csp explicit in the language at the same time it offers concrete new insights into the notion of levels and in turn into csp itself type safety is established in the simply typed setting as first step toward introducing classifiers to the hindley milner setting we propose an approach to integrating the two and prove type preservation in this setting
we present scalable temporal order analysis technique that supports debugging of large scale applications by classifying mpi tasks based on their logical program execution order our approach combines static analysis techniques with dynamic analysis to determine this temporal order scalably it uses scalable stack trace analysis techniques to guide selection of critical program execution points in anomalous application runs our novel temporal ordering engine then leverages this information along with the application’s static control structure to apply data flow analysis techniques to determine key application data such as loop control variables we then use lightweight techniques to gather the dynamic data that determines the temporal order of the mpi tasks our evaluation which extends the stack trace analysis tool stat demonstrates that this temporal order analysis technique can isolate bugs in benchmark codes with injected faults as well as real world hang case with amg
this paper presents method to integrate external knowledge sources such as dbpedia and opencyc into an ontology learning system that automatically suggests labels for unknown relations in domain ontologies based on large corpora of unstructured text the method extracts and aggregates verb vectors from semantic relations identified in the corpus it composes knowledge base which consists of i verb centroids for known relations between domain concepts ii mappings between concept pairs and the types of known relations and iii ontological knowledge retrieved from external sources applying semantic inference and validation to this knowledge base improves the quality of suggested relation labels formal evaluation compares the accuracy and average ranking precision of this hybrid method with the performance of methods that solely rely on corpus data and those that are only based on reasoning and external data sources
we show that the localization problem for multilevel wireless sensor networks wsns can be solved as pattern recognition with the use of the support vector machines svm method in this paper we propose novel hierarchical classification method that generalizes the svm learning and that is based on discriminant functions structured in such way that it contains the class hierarchy we study version of this solution which uses hierarchical svm classifier we present experimental results for the hierarchical svm classifier for localization in multilevel wsns
we introduce word segmentation approach to languages where word boundaries are not orthographically marked with application to phrase based statistical machine translation pb smt instead of using manually segmented monolingual domain specific corpora to train segmenters we make use of bilingual corpora and statistical word alignment techniques first of all our approach is adapted for the specific translation task at hand by taking the corresponding source target language into account secondly this approach does not rely on manually segmented training data so that it can be automatically adapted for different domains we evaluate the performance of our segmentation approach on pb smt tasks from two domains and demonstrate that our approach scores consistently among the best results across different data conditions
current optimizations for anonymity pursue reduction of data distortion unilaterally and rarely evaluate disclosure risk during process of anonymization we propose an optimal anonymity algorithm in which the balance of risk distortion rd can be equilibrated at each anonymity stage we first construct generalization space gs then we use the probability and entropy metric to measure rd for each node in gs and finally we introduce releaser’s rd preference to decide an optimal anonymity path our algorithm adequately considers the dual impact on rd and obtains an optimal anonymity with satisfaction of releaser the efficiency of our algorithm will be evaluated by extensive experiments
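as a loose illustration of combining risk and distortion with a releaser preference, the sketch below scores one generalization by using the entropy of sensitive values inside each equivalence class as a crude disclosure-risk proxy and blending it with a distortion value; the concrete metrics, the generalization space and the search procedure in the paper are not reproduced, and all numbers are illustrative.

```python
# illustrative risk/distortion scoring for one generalization; not the paper's metrics.
import math
from collections import Counter

def entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def risk(equivalence_classes):
    """average per-class disclosure risk proxy: 1 / (1 + entropy of sensitive values)."""
    return sum(1.0 / (1.0 + entropy(cls)) for cls in equivalence_classes) / len(equivalence_classes)

def score(equivalence_classes, distortion, preference):
    """preference in [0, 1]: 0 = only cares about distortion, 1 = only about risk."""
    return preference * risk(equivalence_classes) + (1 - preference) * distortion

if __name__ == "__main__":
    classes = [["flu", "flu", "cancer"], ["flu", "hiv", "cancer"]]
    print(score(classes, distortion=0.3, preference=0.6))
```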
geographically co located sensors tend to participate in the same environmental phenomena phenomenon aware stream query processing improves scalability by subscribing each query only to subset of sensors that participate in the phenomena of interest to that query in the case of sensors that generate readings with multi attribute schema phenomena may develop across the values of one or more attributes however tracking and detecting phenomena across all attributes does not scale well as the dimensions increase as the size of sensor network increases and as the number of attributes being tracked by sensor increases this becomes major bottleneck in this paper we present novel dimensional phenomenon detection and tracking mechanism termed as nd pdt over ary sensor readings we reduce the number of dimensions to be tracked by first dropping dimensions without any meaningful phenomena and then we further reduce the dimensionality by continuously detecting and updating various forms of functional dependencies amongst the phenomenon dimensions
in this paper we propose role based access control rbac method for grid database services in open grid services architecture data access and integration ogsa dai ogsa dai is an efficient grid enabled middleware implementation of interfaces and services to access and control data sources and sinks however in ogsa dai access control causes substantial administration overhead for resource providers in virtual organizations vos because each of them has to manage role map file containing authorization information for individual grid users to solve this problem we used the community authorization service cas provided by the globus toolkit to support the rbac within the ogsa dai framework the cas grants the membership on vo roles to users the resource providers then need to maintain only the mapping information from vo roles to local database roles in the role map files so that the number of entries in the role map file is reduced dramatically furthermore the resource providers control the granting of access privileges to the local roles thus our access control method provides increased manageability for large number of users and reduces day to day administration tasks of the resource providers while they maintain the ultimate authority over their resources performance analysis shows that our method adds very little overhead to the existing security infrastructure of ogsa dai
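the minimal sketch below shows only the role-mapping idea described above: the role map relates vo roles to local database roles, and a cas-style assertion of a user's vo role is checked against it before a local role is granted; the role names, the assertion format and the check function are illustrative assumptions, not the actual cas or ogsa dai interfaces.

```python
# minimal sketch of vo-role to local-database-role mapping; illustrative only.

ROLE_MAP = {                       # maintained by the resource provider
    "vo:biogrid/analyst": "db_reader",
    "vo:biogrid/curator": "db_writer",
}

def local_role_for(assertion):
    """assertion: dict issued by the vo's authorization service (hypothetical format)."""
    vo_role = assertion.get("vo_role")
    if vo_role not in ROLE_MAP:
        raise PermissionError(f"no local mapping for {vo_role!r}")
    return ROLE_MAP[vo_role]

if __name__ == "__main__":
    cas_assertion = {"subject": "alice", "vo_role": "vo:biogrid/analyst"}
    print(local_role_for(cas_assertion))   # db_reader
```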
newly deployed multi hop radio network is unstructured and lacks reliable and efficient communication scheme in this paper we take step towards analyzing the problems existing during the initialization phase of ad hoc and sensor networks particularly we model the network as multi hop quasi unit disk graph and allow nodes to wake up asynchronously at any time further nodes do not feature reliable collision detection mechanism and they have only limited knowledge about the network topology we show that even for this restricted model good clustering can be computed efficiently our algorithm efficiently computes an asymptotically optimal clustering based on this algorithm we describe protocol for quickly establishing synchronized sleep and listen schedule between nodes within cluster additionally we provide simulation results in variety of settings
new problems of updating views involving inter entity relationships or joins are identified beyond those reported previously in the literature general purpose method and set of algorithms are presented for correctly decomposing multilingual update requests on network of distributed heterogeneous databases the method and algorithm also apply to both homogeneous nondistributed and distributed database environments the method called prototype views and update rules applies to individual relationships in an entity relationship er view of the network database and gives floorplan for update decomposition the network database view represents unified conceptual view of all the individual databases in the heterogeneous network ie of the objects shared across the network the update request is decomposed into sequence of intermediate control language steps to subsequently guide the particular updates to each of the underlying databases in the network individual database updates are performed by each particular database management system dbms
there is challenging man machine interface issue in existing association analysis algorithms because they are apriori like and the apriori algorithm is based on the assumption that users can specify the threshold minimum support it is impossible that users give suitable minimum support for database to be mined if the users are without knowledge concerning the database in this paper we propose fuzzy mining strategy with database independent minimum support which provides good man machine interface that allows users to specify the minimum support threshold without any knowledge concerning their databases to be mined we have evaluated the proposed approach and the experimental results have demonstrated that our algorithm is promising and efficient
we describe the integration of smart digital objects with hebbian learning to create distributed real time scalable approach to adapting to community’s preferences we designed an experiment using popular music as the subject matter each digital object corresponded to music album and contained links to other music albums by dynamically generating links among digital objects according to user traversal patterns then hierarchically organizing these links according to shared metadata values we created network of digital objects that self organized in real time according to the preferences of the user community furthermore the similarity between user preferences and generated link structure was more pronounced between collections of objects aggregated by shared metadata values
there still exists an open question on how formal models can be fully realized in the system development phase the model driven development mdd approach has been recently introduced to deal with such critical issue for building high assurance software systems the mdd approach focuses on the transformation of high level design models to system implementation modules however this emerging development approach lacks an adequate procedure to address security issues derived from formal security models in this paper we propose an empirical framework to integrate security model representation security policy specification and systematic validation of security model and policy which would be eventually used for accommodating security concerns during the system development we also describe how our framework can minimize the gap between security models and the development of secure systems in addition we overview proof of concept prototype of our tool that facilitates existing software engineering mechanisms to achieve the above mentioned features of our framework
it is frequently remarked that designers of computer vision algorithms and systems cannot reliably predict how algorithms will respond to new problems variety of reasons have been given for this situation and variety of remedies prescribed in literature most of these involve in some way paying greater attention to the domain of the problem and to performing detailed empirical analysis the goal of this paper is to review what we see as current best practices in these areas and also suggest refinements that may benefit the field of computer vision distinction is made between the historical emphasis on algorithmic novelty and the increasing importance of validation on particular data sets and problems
the introduction of the semantic web vision and the shift toward machine understandable web resources has unearthed the importance of automatic semantic reconciliation consequently new tools for automating the process were proposed in this work we present formal model of semantic reconciliation and analyze in systematic manner the properties of the process outcome primarily the inherent uncertainty of the matching process and how it reflects on the resulting mappings an important feature of this research is the identification and analysis of factors that impact the effectiveness of algorithms for automatic semantic reconciliation leading it is hoped to the design of better algorithms by reducing the uncertainty of existing algorithms against this background we empirically study the aptitude of two algorithms to correctly match concepts this research is both timely and practical in light of recent attempts to develop and utilize methods for automatic semantic reconciliation
the problem of obtaining single consensus clustering solution from multitude or ensemble of clusterings of set of objects has attracted much interest recently because of its numerous practical applications while wide variety of approaches including graph partitioning maximum likelihood genetic algorithms and voting merging have been proposed so far to solve this problem virtually all of them work on hard partitionings ie where an object is member of exactly one cluster in any individual solution however many clustering algorithms such as fuzzy means naturally output soft partitionings of data and forcibly hardening these partitions before applying consensus method potentially involves loss of valuable information in this article we propose several consensus algorithms that can be applied directly to soft clusterings experimental results over variety of real life datasets are also provided to show that using soft clusterings as input does offer significant advantages especially when dealing with vertically partitioned data
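the following python fragment is only an illustration of operating directly on soft memberships; it builds a co association style consensus from fuzzy membership matrices and is not one of the specific consensus algorithms proposed in the article; the matrix shapes and the use of average linkage clustering on the co association matrix are assumptions made for the sketch

```python
# illustrative sketch only: a co-association style consensus over soft clusterings;
# each base clustering is an n x k membership matrix whose rows sum to 1
# (e.g. fuzzy c-means output)
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def soft_coassociation(memberships):
    """Average, over the ensemble, of the probability that two objects co-cluster."""
    n = memberships[0].shape[0]
    coassoc = np.zeros((n, n))
    for m in memberships:
        coassoc += m @ m.T          # soft co-cluster probability for one clustering
    return coassoc / len(memberships)

def consensus_labels(memberships, n_clusters):
    """Derive a consensus hard partition by clustering the co-association matrix."""
    coassoc = soft_coassociation(memberships)
    distance = 1.0 - coassoc                      # similar objects -> small distance
    condensed = distance[np.triu_indices_from(distance, k=1)]
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(0)
ensemble = [rng.dirichlet(np.ones(3), size=20) for _ in range(5)]
print(consensus_labels(ensemble, n_clusters=3))
```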
procedures have long been the basic units of compilation in conventional optimization frameworks however procedures are typically formed to serve software engineering rather than optimization goals arbitrarily constraining code transformations techniques such as aggressive inlining and interprocedural optimization have been developed to alleviate this problem but due to code growth and compile time issues these can be applied only sparingly this paper introduces the procedure boundary elimination pbe compilation framework which allows unrestricted whole program optimization pbe allows all intra procedural optimizations and analyses to operate on arbitrary subgraphs of the program regardless of the original procedure boundaries and without resorting to inlining in order to control compilation time pbe also introduces novel extensions of region formation and encapsulation pbe enables targeted code specialization which recovers the specialization benefits of inlining while keeping code growth in check this paper shows that pbe attains better performance than inlining with half the code growth
reducing memory space requirement is important to many applications for data intensive applications it may help avoid executing the program out of core for high performance computing memory space reduction may improve the cache hit rate as well as performance for embedded systems it can reduce the memory requirement the memory latency and the energy consumption this paper investigates program transformations which compiler can use to reduce the memory space required for storing program data in particular the paper uses integer programming to model the problem of combining loop shifting loop fusion and array contraction to minimize the data memory required to execute collection of multi level loop nests the integer programming problem is then reduced to an equivalent network flow problem which can be solved in polynomial time
we present enforceable component based realtime contracts the first extension of component based software engineering technology that comprehensively supports adaptive realtime systems from specification all the way to the running system to provide this support we have extended component based interface definition languages idls and component representations in repositories to express realtime requirements for components the final software which is assembled from the components is then executed on realtime operating system rtos with the help of component runtime system rtos resource managers and the idl extensions are based on the same mathematical foundation thus the component runtime system can use information expressed in component oriented manner in the extended idl to derive parameters for the task based admission and scheduling in the rtos once basic realtime properties can thus be guaranteed runtime support can be extended to more elaborate schemes that also support adaptive applications container managed quality assurance we claim that this study convincingly demonstrates how component based software engineering can be extended to build systems with non functional requirements
contemporary workflow management systems are driven by explicit process models ie completely specified workflow design is required in order to enact given workflow process creating workflow design is complicated time consuming process and typically there are discrepancies between the actual workflow processes and the processes as perceived by the management therefore we have developed techniques for discovering workflow models the starting point for such techniques is so called workflow log containing information about the workflow process as it is actually being executed we present new algorithm to extract process model from such log and represent it in terms of petri net however we will also demonstrate that it is not possible to discover arbitrary workflow processes in this paper we explore class of workflow processes that can be discovered we show that the alpha algorithm can successfully mine any workflow represented by so called swf net
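as a small concrete illustration of the first step of such log based discovery, the sketch below derives the basic ordering relations the alpha algorithm works from (direct succession, causality and parallelism); the construction of the resulting petri net is omitted and the example log is invented

```python
# minimal sketch of the ordering relations the alpha algorithm derives from a
# workflow log (direct succession >, causality ->, parallelism ||); building
# the petri net from these relations is not shown here
def ordering_relations(log):
    """log: list of traces, each trace a list of activity names."""
    succ = set()                          # a > b : a is directly followed by b somewhere
    for trace in log:
        succ.update(zip(trace, trace[1:]))
    activities = {a for trace in log for a in trace}
    causal, parallel = set(), set()
    for a in activities:
        for b in activities:
            if (a, b) in succ and (b, a) not in succ:
                causal.add((a, b))        # a -> b
            elif (a, b) in succ and (b, a) in succ:
                parallel.add((a, b))      # a || b
    return causal, parallel

log = [["register", "check", "pay", "ship"],
       ["register", "pay", "check", "ship"]]
causal, parallel = ordering_relations(log)
print(sorted(causal))
print(sorted(parallel))
```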
networked environments such as wikis are commonly used to support work including the collaborative authoring of information and fact building in networked environments the activity of fact building is mediated not only by the technological features of the interface but also by the social conventions of the community it supports this paper examines the social and technological features of wikipedia article in order to understand how these features help mediate the activity of fact building and highlights the need for communication designers to consider the goals and needs of the communities for which they design
this paper presents the design and evaluation of safeguard an intra domain routing system that can safely forward packets to their destinations even when routes are changing safeguard is based on the simple idea that packets carry destination address plus local estimate of the remaining path cost we show that this simple design enables routers to detect path inconsistencies during route changes and resolve on working path for anticipated failure and restoration scenarios this in turn means that route changes do not disrupt connectivity although routing tables are inconsistent over the network we evaluate the router performance of safeguard using prototype based on netfpga and quagga we show that safeguard is amenable to high speed hardware implementation with low overhead we evaluate the network performance of safeguard via simulation the results show that safeguard converges faster than state of the art ip fast restoration mechanism and reduces periods of disruption to minimal duration ie the failure detection time
in this paper we introduce new cooperative design and visualization environment called integrare which supports designers and developers in building dependable component based systems using new behavior oriented design method this method has advantages in terms of its abilities to manage complexity find defects and make checks of dependability the environment integrates and unifies several tools that support multiple phases of the design process allowing them to interact and exchange information as well as providing efficient editing capabilities it can help formalize individual natural language functional requirements as behavior trees these trees can be composed to create an integrated tree like view of all the formalized requirements the environment manages complexity by allowing multiple users to work independently on requirements translation and tree editing in collaborative mode once design is constructed from the requirements it can be visually simulated with respect to an underlying operational semantics and formally verified by way of model checker
while the sociality of software agents drives toward the definition of institutions for multi agent systems their autonomy requires that such institutions are ruled by appropriate norm mechanisms computational institutions represent useful abstractions in this paper we show how computational institutions can be built on top of the rolex infrastructure role based system with interesting features for our aim we achieve twofold goal on the one hand we give concreteness to the institution abstractions on the other hand we demonstrate the flexibility of the rolex infrastructure
program transformation is the mechanical manipulation of program in order to improve it relative to some cost function and is understood broadly as the domain of computation where programs are the data the natural basic building blocks of the domain of program transformation are transformation rules expressing one step transformation on fragment of program the ultimate perspective of research in this area is high level language parametric rule based program transformation system which supports wide range of transformations admitting efficient implementations that scale to large programs this situation has not yet been reached as trade offs between different goals need to be made this survey gives an overview of issues in rule based program transformation systems focusing on the expressivity of rule based program transformation systems and in particular on transformation strategies available in various approaches the survey covers term rewriting extensions of basic term rewriting tree parsing strategies systems with programmable strategies traversal strategies and context sensitive rules
objects and their spatial relationships are important features for human visual perception in most existing content based image retrieval systems however only global features extracted from the whole image are used while they are easy to implement they have limited power to model semantic level objects and spatial relationship to overcome this difficulty this paper proposes constraint based region matching approach to image retrieval unlike existing region based approaches where either individual regions are used or only first order constraints are modeled the proposed approach formulates the problem in probabilistic framework and simultaneously models both first order region properties and second order spatial relationships for all the regions in the image specifically in this paper we present complete system that includes image segmentation local feature extraction first and second order constraints and probabilistic region weight estimation extensive experiments have been carried out on large heterogeneous image collection with images the proposed approach achieves significantly better performance than the state of the art approaches
in this paper we apply semantic web technologies to the creation of an improved search engine over legal and public administration documents conventional search strategies based on syntactic matching of tokens offer little help when the users vocabulary and the documents vocabulary differ this is often the case in public administration documents we present semantic search tool that fills this gap using semantic web technologies in particular ontologies and controlled vocabularies and hybrid search approach avoiding the expensive tagging of documents
denial of service dos and distributed denial of service ddos are two of the most serious and destructive network threats on the internet hackers exploiting all kinds of malicious packages to attack and usurp network hosts servers and bandwidth have seriously damaged enterprise campus and government network systems many network administrators employ intrusion detection systems idss and or firewalls to protect their systems however some systems lose most of their detection and or protection capabilities when encountering huge volume of attack packets in addition some detection resources may fail due to hardware and or software faults in this paper we propose grid based platform named the dynamic grid based intrusion detection environment dgide which exploits grid’s abundant computing resources to detect massive amount of intrusion packets and to manage dynamic environment detector node that detects attacks can dynamically join or leave the dgide newly joined detector is tested so that we can obtain its key performance curves which are used to balance detection workload among detectors the dgide backs up network packets when for some reason detector cannot continue its detection thus leaving an unfinished detection task the dgide allocates another available detector to take over therefore the drawbacks of ordinary security systems as mentioned above can be avoided
in this paper an adaptively weighted sub pattern locality preserving projection aw splpp algorithm is proposed for face recognition unlike the traditional lpp algorithm which operates directly on the whole face image patterns and obtains global face features that best detects the essential face manifold structure the proposed aw splpp method operates on sub patterns partitioned from an original whole face image and separately extracts corresponding local sub features from them furthermore the contribution of each sub pattern can be adaptively computed by aw splpp in order to enhance the robustness to facial pose expression and illumination variations the efficiency of the proposed algorithm is demonstrated by extensive experiments on three standard face databases yale yaleb and pie experimental results show that aw splpp outperforms other holistic and sub pattern based methods
set of ultra high throughput more than one gigabit per second serial links used as processor memory network can lead to the starting up of shared memory massively parallel multiprocessor the bandwidth of the network is far beyond values found in present shared memory multiprocessor networks to feed this network the memory must be serially multiported such multiprocessor can actually be built with current technologies this paper analyzes the characteristics of such novel architecture presents the solutions that must be considered and the practical problems associated with close of experiments these results then show the way to effectively build this multiprocessor taking into account main topics such as data coherency latency time and scalability
this paper explores power consumption for destructive read embedded dram destructive read dram is based on conventional dram design but with sense amplifiers optimized for lower latency this speed increase is achieved by not conserving the content of the dram cell after read operation random access time to dram was reduced from ns to ns in prototype made by hwang et al write back buffer was used to conserve data we have proposed new scheme for write back using the usually smaller cache instead of large additional write back buffer write back is performed whenever cache line is replaced this increases bus and dram bank activity compared to conventional architecture which again increases power consumption on the other hand computational performance is improved through faster dram accesses simulation of cpu dram and kbytes cache show that the power consumption increased by while the performance increased by for the applications in the spec benchmark with kbytes cache the power consumption increased by while performance increased by
nonquadratic variational regularization is well known and powerful approach for the discontinuity preserving computation of optic flow in the present paper we consider an extension of flow driven spatial smoothness terms to spatio temporal regularizers our method leads to rotationally invariant and time symmetric convex optimization problem it has unique minimum that can be found in stable way by standard algorithms such as gradient descent since the convexity guarantees global convergence the result does not depend on the flow initialization two iterative algorithms are presented that are not difficult to implement qualitative and quantitative results for synthetic and real world scenes show that our spatio temporal approach i improves optic flow fields significantly ii smoothes out background noise efficiently and iii preserves true motion boundaries the computational costs are only higher than for pure spatial approach applied to all subsequent image pairs of the sequence
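for readers who want the shape of such an energy, one convex spatio temporal functional of this kind can be written as below; this is an illustrative formulation with a typical convex penalizer, and the exact penalizer and weighting used in the paper may differ

```latex
% illustrative only: a convex spatio-temporal optic flow energy of the kind discussed,
% with a typical nonquadratic penalizer; the paper's exact choice may differ
E(u,v) \;=\; \int_{\Omega\times[0,T]} \Big( (I_x u + I_y v + I_t)^2
      \;+\; \alpha\,\Psi\!\big(|\nabla_3 u|^2 + |\nabla_3 v|^2\big) \Big)\, dx\,dy\,dt,
\qquad
\Psi(s^2) \;=\; 2\lambda^2\sqrt{1 + s^2/\lambda^2},
\qquad
\nabla_3 := (\partial_x,\,\partial_y,\,\partial_t)^{\top}.
```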
real time garbage collection has been shown to be feasible but for programs with high allocation rates the utilization achievable is not sufficient for some systems since high allocation rate is often correlated with more high level abstract programming style the ability to provide good real time performance for such programs will help continue to raise the level of abstraction at which real time systems can be programmed we have developed techniques that allow generational collection to be used despite the problems caused by variance in program behavior over the short time scales in which nursery can be collected syncopation allows such behavior to be detected by the scheduler in time for allocation to by pass the nursery and allow real time bounds to be met we have provided an analysis of the costs of both generational and non generational techniques which allow the trade offs to be evaluated quantitatively we have also provided measurements of application behavior which show that while syncopation is necessary the need for it is rare enough that generational collection can provide major improvements in real time utilization an additional technique arraylet pre tenuring often significantly improves generational behavior
modern out of order processors with non blocking caches exploit memory level parallelism mlp by overlapping cache misses in wide instruction window the exploitation of mlp however can be limited due to long latency operations in producing the base address of cache miss load when the parent instruction is also cache miss load serialization of the two loads must be enforced to satisfy the load load data dependence in this paper we propose mechanism that dynamically captures the load load data dependences at runtime special preload is issued in place of the dependent load without waiting for the parent load thus effectively overlapping the two loads the preload provides necessary information for the memory controller to calculate the correct memory address upon the availability of the parent’s data to eliminate any interconnect delay between the two loads performance evaluations based on spec and olden applications show that significant speedups up to with an average of are achievable using the preload in conjunction with other aggressive mlp exploitation methods such as runahead execution the preload can make more significant improvement with an average of
we present novel approach for interactive view dependent rendering of massive models our algorithm combines view dependent simplification occlusion culling and out of core rendering we represent the model as clustered hierarchy of progressive meshes chpm we use the cluster hierarchy for coarse grained selective refinement and progressive meshes for fine grained local refinement we present an out of core algorithm for computation of chpm that includes cluster decomposition hierarchy generation and simplification we make use of novel cluster dependencies in the preprocess to generate crack free drastic simplifications at runtime the clusters are used for occlusion culling and out of core rendering we add frame of latency to the rendering pipeline to fetch newly visible clusters from the disk and to avoid stalls the chpm reduces the refinement cost for view dependent rendering by more than an order of magnitude as compared to vertex hierarchy we have implemented our algorithm on desktop pc we can render massive cad isosurface and scanned models consisting of tens or few hundreds of millions of triangles at frames per second with little loss in image quality
fair sharing of bandwidth remains an unresolved issue for distributed systems in this paper the users of distributed lan are modeled as selfish users with independence to choose their individual strategies with these selfish users the contention based distributed medium access scenario is modeled as complete information noncooperative game designated the access game novel mac strategy based on persistent csma is presented to achieve fairness in the access game it is proven that there are an infinite number of nash equilibria for the access game but they do not result in fairness therefore it may be beneficial for the selfish users to adhere to set of constraints that result in fairness in noncooperative fashion this leads to the formulation of constrained access game with fairness represented as set of algebraic constraints it is proven that the solution of the constrained game the constrained nash equilibrium is unique further it is shown that in addition to achieving fairness this solution also optimizes the throughput finally these results are extended to more realistic incomplete information scenario by approximating the incomplete information scenario as complete information scenario through information gathering and dissemination
service oriented architectures enable multitude of service providers to provide loosely coupled and interoperable services at different quality of service and cost levels this paper considers business processes composed of activities that are supported by service providers the structure of business process may be expressed by languages such as bpel and allows for constructs such as sequence switch while flow and pick this paper considers the problem of finding the set of service providers that minimizes the total execution time of the business process subject to cost and execution time constraints the problem is clearly np hard however the paper presents an optimized algorithm that finds the optimal solution without having to explore the entire solution space this algorithm can be used to find the optimal solution in problems of moderate size heuristic solution is also presented thorough experimental studies based on random business processes demonstrate that the heuristic algorithm was able to produce service provider allocations that result in execution times that are only few percentage points worse than the allocations obtained by the optimal algorithm while examining tiny fraction of the solution space tens of points versus millions of points
the pervasive computing environment will be composed of heterogeneous services in this work we have explored how domain specific language for service composition can be implemented to capture the common design patterns for service composition yet still retain comparable performance to other systems written in mainstream languages such as java in particular we have proposed the use of the method delegation design pattern the resolution of service bindings through the use of dynamically adjustable characteristics and the late binding of services as key features in simplifying the service composition task these are realised through the scooby language and the approach is compared to the use of apis to define adaptable services
the ability to generate repeatable realistic network traffic is critical in both simulation and testbed environments traffic generation capabilities to date have been limited to either simple sequenced packet streams typically aimed at throughput testing or to application specific tools focused on for example recreating representative http requests in this paper we describe harpoon new application independent tool for generating representative packet traffic at the ip flow level harpoon generates tcp and udp packet flows that have the same byte packet temporal and spatial characteristics as measured at routers in live environments harpoon is distinguished from other tools that generate statistically representative traffic in that it can self configure by automatically extracting parameters from standard netflow logs or packet traces we provide details on harpoon’s architecture and implementation and validate its capabilities in controlled laboratory experiments using configurations derived from flow and packet traces gathered in live environments we then demonstrate harpoon’s capabilities in router benchmarking experiment that compares harpoon with commonly used throughput test methods our results show that the router subsystem load generated by harpoon is significantly different suggesting that this kind of test can provide important insights into how routers might behave under actual operating conditions
the query optimizer is an important system component of relational database management system dbms it is the responsibility of this component to translate the user submitted query usually written in non procedural language into an efficient query evaluation plan qep which is then executed against the database the research literature describes wide variety of optimization strategies for different query languages and implementation environments however very little is known about how to design and structure the query optimization component to implement these strategies this paper proposes first step towards the design of modular query optimizer we describe its operations by transformation rules which generate different qeps from initial query specifications as we distinguish different aspects of the query optimization process our hope is that the approach taken in this paper will contribute to the more general goal of modular query optimizer as part of an extensible database management system
in the batch learning setting it suffices to take into account only reduced number of threshold candidates in discretizing the value range of numerical attribute for many commonly used attribute evaluation functions we show that the same techniques are also efficiently applicable in the on line learning scheme only constant time per example is needed for determining the changes on data grouping hence one can apply multi way splits eg in the standard approach to decision tree learning from data streams we also briefly consider modifications needed to cope with drifting concepts our empirical evaluation demonstrates that often the reduction in threshold candidates obtained is high for the important attributes in data stream logarithmic growth in the number of potential cut points and the reduced number of threshold candidates is observed
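the sketch below shows the classical boundary point reduction for single numerical attribute in batch form: only midpoints between adjacent value groups that are not pure in the same class are kept as threshold candidates; the incremental constant time per example maintenance that the paper describes is not shown, and the example data is invented

```python
# sketch of the boundary-point reduction for one numerical attribute: only
# midpoints between adjacent value groups whose class content differs are kept
# as threshold candidates (shown batch-style for clarity)
def threshold_candidates(values, labels):
    data = sorted(zip(values, labels))
    groups = []                                   # (value, set of classes seen at it)
    for v, y in data:
        if groups and groups[-1][0] == v:
            groups[-1][1].add(y)
        else:
            groups.append((v, {y}))
    cuts = []
    for (v1, c1), (v2, c2) in zip(groups, groups[1:]):
        # a midpoint is a candidate unless both neighbouring groups are pure
        # in the same single class (the classical boundary-point criterion)
        if not (len(c1) == 1 and c1 == c2):
            cuts.append((v1 + v2) / 2.0)
    return cuts

vals = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
labs = ["a", "a", "a", "b", "b", "a"]
print(threshold_candidates(vals, labs))   # cut points only around class changes
```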
sensor network troubleshooting is notoriously difficult task further exacerbated by resource constraints unreliable components unpredictable natural phenomena and experimental programming paradigms this paper presents snts sensor network troubleshooting suite tool that performs automated failure diagnosis in sensor networks snts can be used to monitor network conditions using simple visualization techniques as well as to troubleshoot deployed distributed sensor systems using data mining approaches it is composed of i data collection front end that records events internal to the network and ii data processing back end for subsequent analysis we use data mining techniques to automate failure diagnosis on the back end the assumption is that the occurrence of execution conditions that cause failures eg traversal of an execution path that contains bug or occurrence of sequence of events that protocol was not designed to handle will have measurable correlation by causality with the resulting failure itself hence by mining for network conditions that correlate with failure states the root causes of failure are revealed with high probability to evaluate the effectiveness of the tool we have used it to troubleshoot tracking system called envirotrack which although performs well most of the time occasionally fails to track targets correctly results show that snts can identify the major causes of the problem and give developers useful hints on improving the performance of the tracking system
the ability to position small subset of mesh vertices and produce meaningful overall deformation of the entire mesh is fundamental task in mesh editing and animation however the class of meaningful deformations varies from mesh to mesh and depends on mesh kinematics which prescribes valid mesh configurations and selection mechanism for choosing among them drawing an analogy to the traditional use of skeleton based inverse kinematics for posing skeletons we define mesh based inverse kinematics as the problem of finding meaningful mesh deformations that meet specified vertex constraints our solution relies on example meshes to indicate the class of meaningful deformations each example is represented with feature vector of deformation gradients that capture the affine transformations which individual triangles undergo relative to reference pose to pose mesh our algorithm efficiently searches among all meshes with specified vertex positions to find the one that is closest to some pose in nonlinear span of the example feature vectors since the search is not restricted to the span of example shapes this produces compelling deformations even when the constraints require poses that are different from those observed in the examples furthermore because the span is formed by nonlinear blend of the example feature vectors the blending component of our system may also be used independently to pose meshes by specifying blending weights or to compute multi way morph sequences
concert is new language for distributed programming that extends ansi to support distribution and process dynamics concert provides the ability to create and terminate processes connect them together and communicate among them it supports transparent remote function calls rpc and asynchronous messages interprocess communications interfaces are typed in concert and type correctness is checked at compile time wherever possible otherwise at runtime all data types including complex data structures containing pointers and aliases can be transmitted in rpcs concert programs run on heterogeneous set of machine architectures and operating systems and communicate over multiple rpc and messaging protocols the current concert implementation runs on aix sunos solaris and os and communicates over sun rpc osf dce and udp multicast several groups inside and outside ibm are actively using concert and it is available via anonymous ftp from software.watson.ibm.com/pub/concert
clustering is the process of grouping data objects into set of disjoint classes called clusters so that objects within class are highly similar with one another and dissimilar with the objects in other classes means km algorithm is one of the most popular clustering techniques because it is easy to implement and works fast in most situations however it is sensitive to initialization and is easily trapped in local optima harmonic means khm clustering solves the problem of initialization using built in boosting function but it also easily runs into local optima particle swarm optimization pso algorithm is stochastic global optimization technique hybrid data clustering algorithm based on pso and khm psokhm is proposed in this research which makes full use of the merits of both algorithms the psokhm algorithm not only helps the khm clustering escape from local optima but also overcomes the shortcoming of the slow convergence speed of the pso algorithm the performance of the psokhm algorithm is compared with those of the pso and the khm clustering on seven data sets experimental results indicate the superiority of the psokhm algorithm
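as rough illustration of the ingredients such hybrid combines, the python fragment below implements the harmonic means objective and one canonical pso velocity and position update over particles that encode full sets of cluster centers; the inertia and acceleration coefficients are common textbook values, not the settings used in the psokhm study

```python
# sketch of the two ingredients of a khm/pso hybrid: the k-harmonic-means
# objective and a canonical pso update (coefficients are placeholder values)
import numpy as np

def khm_objective(X, centers, p=3.5):
    """KHM(X, C) = sum over x of k / sum over c of (1 / ||x - c||^p)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    return float(np.sum(centers.shape[0] / np.sum(d ** (-p), axis=1)))

def pso_step(positions, velocities, personal_best, global_best,
             w=0.72, c1=1.49, c2=1.49, rng=None):
    """One canonical PSO update; each particle encodes a full set of k centers."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(positions.shape), rng.random(positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (personal_best - positions)
                  + c2 * r2 * (global_best - positions))
    return positions + velocities, velocities

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
particles = rng.normal(size=(8, 3, 2))          # 8 particles, k = 3 centers each
scores = [khm_objective(X, c) for c in particles]
print(min(scores))
```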
this paper describes the architecture of costa an abstract interpretation based cost and termination analyzer for java bytecode the system receives as input bytecode program choice of resource of interest and tries to obtain an upper bound of the resource consumption of the program costa provides several non trivial notions of cost as the consumption of the heap the number of bytecode instructions executed and the number of calls to specific method additionally costa tries to prove termination of the bytecode program which implies the boundedness of any resource consumption having cost and termination together is interesting as both analyses share most of the machinery to respectively infer cost upper bounds and to prove that the execution length is always finite ie the program terminates we report on experimental results which show that costa can deal with programs of realistic size and complexity including programs which use java libraries to the best of our knowledge this system provides for the first time evidence that resource usage analysis can be applied to realistic object oriented bytecode programming language
we propose new publish subscribe system called fleet that seamlessly combines novel subscription mapping scheme and structured overlay network address space partitioning technique to build an effective content based publish subscribe pub sub system over distributed hash table dht overlay fleet employs an explicit mechanism to deal with skewed popularity distributions in subscriptions and events which spreads the load generated by hot attributes across multitude of peers the address space partitioning event delivery mechanism expedites event delivery fleet strikes an ideal balance between subscription storage cost and event delivery cost and is more scalable in the number of events subscriptions schema attributes and number of system nodes
we present novel approach for computing and solving the poisson equation over the surface of mesh as in previous approaches we define the laplace beltrami operator by considering the derivatives of functions defined on the mesh however in this work we explore choice of functions that is decoupled from the tessellation specifically we use basis functions second order tensor product splines defined over space and then restrict them to the surface we show that in addition to being invariant to mesh topology this definition of the laplace beltrami operator allows natural multiresolution structure on the function space that is independent of the mesh structure enabling the use of simple multigrid implementation for solving the poisson equation
in our previous work we proposed systematic cross layer framework for dynamic multimedia systems which allows each layer to make autonomous and foresighted decisions that maximize the system’s long term performance while meeting the application’s real time delay constraints the proposed solution solved the cross layer optimization offline under the assumption that the multimedia system’s probabilistic dynamics were known priori by modeling the system as layered markov decision process in practice however these dynamics are unknown priori and therefore must be learned online in this paper we address this problem by allowing the multimedia system layers to learn through repeated interactions with each other to autonomously optimize the system’s long term performance at run time the two key challenges in this layered learning setting are each layer’s learning performance is directly impacted by not only its own dynamics but also by the learning processes of the other layers with which it interacts and ii selecting learning model that appropriately balances time complexity ie learning speed with the multimedia system’s limited memory and the multimedia application’s real time delay constraints we propose two reinforcement learning algorithms for optimizing the system under different design constraints the first algorithm solves the cross layer optimization in centralized manner and the second solves it in decentralized manner we analyze both algorithms in terms of their required computation memory and interlayer communication overheads after noting that the proposed reinforcement learning algorithms learn too slowly we introduce complementary accelerated learning algorithm that exploits partial knowledge about the system’s dynamics in order to dramatically improve the system’s performance in our experiments we demonstrate that decentralized learning can perform equally as well as centralized learning while enabling the layers to act autonomously additionally we show that existing application independent reinforcement learning algorithms and existing myopic learning algorithms deployed in multimedia systems perform significantly worse than our proposed application aware and foresighted learning methods
storage mapping optimization is flexible approach to folding array dimensions in numerical codes it is designed to reduce the memory footprint after wide spectrum of loop transformations whether based on uniform dependence vectors or more expressive polyhedral abstractions conversely few loop transformations have been proposed to facilitate register promotion namely loop fusion unroll and jam or tiling building on array data flow analysis and expansion we extend storage mapping optimization to improve opportunities for register promotion our work is motivated by the empirical study of computational biology benchmark the approximate string matching algorithm bpr from nr grep on wide issue micro architecture our experiments confirm the major benefit of register tiling even on non numerical benchmarks but also shed the light on two novel issues prior array expansion may be necessary to enable loop transformations that finally authorize profitable register promotion and more advanced scheduling techniques beyond tiling and unroll and jam may significantly improve performance in fine tuning register usage and instruction level parallelism
photo libraries are growing in quantity and size requiring better support for locating desired photographs mediaglow is an interactive visual workspace designed to address this concern it uses attributes such as visual appearance gps locations user assigned tags and dates to filter and group photos an automatic layout algorithm positions photos with similar attributes near each other to support users in serendipitously finding multiple relevant photos in addition the system can explicitly select photos similar to specified photos we conducted user evaluation to determine the benefit provided by similarity layout and the relative advantages offered by the different layout similarity criteria and attribute filters study participants had to locate photos matching probe statements in some tasks participants were restricted to single layout similarity criterion and filter option participants used multiple attributes to filter photos layout by similarity without additional filters turned out to be one of the most used strategies and was especially beneficial for geographical similarity lastly the relative appropriateness of the single similarity criterion to the probe significantly affected retrieval performance
the use of technology to access personal information in public places is increasingly common but can these interactions induce stress sixty eight participants were led to believe that extremely sensitive personal information would be displayed via either public or personal handheld device in isolated or crowded in the presence of strangers conditions stress responses were taken in terms of heart rate galvanic skin response and subjective ratings as anticipated participants showed stronger stress reactions in the crowded rather than the isolated conditions and also experienced greater stress when the information was presented on public screen in comparison to personal handheld device implications for the design of public private information systems are discussed
recently mining data streams with concept drifts for actionable insights has become an important and challenging task for wide range of applications including credit card fraud protection target marketing network intrusion detection etc conventional knowledge discovery tools are facing two challenges the overwhelming volume of the streaming data and the concept drifts in this paper we propose general framework for mining concept drifting data streams using weighted ensemble classifiers we train an ensemble of classification models such as ripper naive bayesian etc from sequential chunks of the data stream the classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time evolving environment thus the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification our empirical study shows that the proposed methods have substantial advantage over single classifier approaches in prediction accuracy and the ensemble framework is effective for variety of classification models
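a simplified sketch of chunk based ensemble weighting is given below: each classifier is weighted by how much lower its mean squared error on the newest chunk is than that of random predictor, which is one common instantiation of the idea; the sklearn base learner, the chunk handling and the assumption that every chunk contains all integer class labels are simplifications made for the sketch

```python
# simplified sketch of chunk-based ensemble weighting for drifting streams;
# assumes integer class labels 0..k-1 and that every chunk contains all classes
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def mse_of(clf, X, y):
    proba = clf.predict_proba(X)
    return float(np.mean((1.0 - proba[np.arange(len(y)), y]) ** 2))

class WeightedChunkEnsemble:
    def __init__(self, max_members=10, base=None):
        self.max_members = max_members
        self.base = base or DecisionTreeClassifier(max_depth=5)
        self.members = []                       # list of (weight, classifier)

    def update(self, X_chunk, y_chunk):
        new = clone(self.base).fit(X_chunk, y_chunk)
        # error of a random predictor that guesses classes by chunk frequencies
        p = np.bincount(y_chunk) / len(y_chunk)
        mse_r = float(np.sum(p * (1.0 - p) ** 2))
        candidates = [c for _, c in self.members] + [new]
        scored = [(max(1e-9, mse_r - mse_of(c, X_chunk, y_chunk)), c)
                  for c in candidates]
        # keep the best-weighted classifiers, letting stale ones fall out over time
        self.members = sorted(scored, key=lambda t: t[0], reverse=True)[:self.max_members]

    def predict(self, X):
        votes = sum(w * c.predict_proba(X) for w, c in self.members)
        return np.argmax(votes, axis=1)
```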
we present game theoretic study of hybrid communication networks in which mobile devices can connect in an ad hoc fashion to base station possibly via few hops using other mobile devices as intermediate nodes the maximal number of allowed hops might be bounded with the motivation to guarantee small latency we introduce hybrid connectivity games to study the impact of selfishness on this kind of infrastructure mobile devices are represented by selfish players each of which aims at establishing an uplink path to the base station minimizing its individual cost our model assumes that intermediate nodes on an uplink path are reimbursed for transmitting the packets of other devices the reimbursements can be paid either by benevolent network operator or by the senders of the packets using micropayments via clearing agency that possibly collects small percentage as commission these different ways to implement the payments lead to different variants of the hybrid connectivity game our main findings are if there is no constraint on the number of allowed hops on the path to the base station then the existence of equilibria is guaranteed regardless of whether the network operator or the senders pay for forwarding packets if the network operator pays then the existence of equilibria is guaranteed only if at most one intermediate node is allowed ie for at most two hops on the uplink path of device but not if the maximal number of allowed hops is three or larger in contrast if the senders pay for forwarding their packets then equilibria are guaranteed to exist given any bound on the number of allowed hops the equilibrium analysis presented in this paper gives first game theoretical motivation for the implementation of micropayment schemes in which senders pay for forwarding their packets we further support this evidence by giving an upper bound on the price of anarchy for this kind of hybrid connectivity games that is independent of the number of nodes but only depends on the number of hops and the power gradient
mammalian genomes are typically gbps gigabase pairs in size the largest public database ncbi national center for biotechnology information http://www.ncbi.nlm.nih.gov of dna contains more than gbps suffix trees are widely acknowledged as data structure to support exact approximate sequence matching queries as well as repetitive structure finding efficiently when they can reside in main memory but it has been shown as difficult to handle long dna sequences using suffix trees due to the so called memory bottleneck problems the most space efficient main memory suffix tree construction algorithm takes nine hours and gb memory space to index the human genome in this paper we show that suffix trees for long dna sequences can be efficiently constructed on disk using small bounded main memory space and therefore all existing algorithms based on suffix trees can be used to handle long dna sequences that cannot be held in main memory we adopt two phase strategy to construct suffix tree on disk to construct disk based suffix tree without suffix links and rebuild suffix links upon the suffix tree being constructed on disk if needed we propose new disk based suffix tree construction algorithm called dynacluster which shows log experimental behavior regarding cpu cost and linearity for cost dynacluster needs mb main memory only to construct more than mbps dna sequences and significantly outperforms the existing disk based suffix tree construction algorithms using prepartitioning techniques in terms of both construction cost and query processing cost we conducted extensive performance studies and report our findings in this paper
this paper extends our previous studies on the assimilation of internet based business innovations by firms in an international setting drawing upon theories on the process and contexts of technology diffusion we develop an integrative model to examine three assimilation stages initiation adoption routinization the model features technological organizational and environmental contexts as prominent antecedents of this three stage assimilation process based on this model we hypothesize how technology readiness technology integration firm size global scope managerial obstacles competition intensity and regulatory environment influence business assimilation at the firm level unique data set of firms from countries is used to test the conceptual model and hypotheses to probe deeper into the influence of the environmental context we compare two subsamples from developed and developing countries our empirical analysis leads to several key findings competition positively affects initiation and adoption but negatively impacts routinization suggesting that too much competition is not necessarily good for technology assimilation because it drives firms to chase the latest technologies without learning how to use existing ones effectively large firms tend to enjoy resource advantages at the initiation stage but have to overcome structural inertia in later stages we also find that economic environments shape innovation assimilation regulatory environment plays more important role in developing countries than in developed countries moreover while technology readiness is the strongest factor facilitating assimilation in developing countries technology integration turns out to be the strongest in developed countries implying that as business evolves the key determinant of its assimilation shifts from accumulation to integration of technologies together these findings offer insights into how innovation assimilation is influenced by contextual factors and how the effects may vary across different stages and in different environments
suffix trees and suffix arrays are important data structures for string processing providing efficient solutions for many applications involving pattern matching recent work by sinha et al sigmod addressed the problem of arranging suffix array on disk so that querying is fast and showed that the combination of small trie and suffix array like blocked data structure allows queries to be answered many times faster than alternative disk based suffix trees drawback of their lof sa structure and common to all current disk resident suffix tree array approaches is that the space requirement of the data structure though on disk is large relative to the text for the lof sa bytes including the underlying byte text in this paper we explore techniques for reducing the space required by the lof sa experiments show these methods cut the data structure to nearly half its original size without for large strings that necessitate on disk structures any impact on search times
model composition is common operation used in many software development activities for example reconciling models developed in parallel by different development teams or merging models of new features with existing model artifacts unfortunately both commercial and academic model composition tools suffer from the composition conflict problem that is models to be composed may conflict with each other and these conflicts must be resolved in practice detecting and resolving conflicts is highly intensive manual activity in this paper we investigate whether aspect orientation reduces conflict resolution effort as improved modularization may better localize conflicts the main goal of the paper is to conduct an exploratory study to analyze the impact of aspects on conflict resolution in particular model compositions are used to express the evolution of architectural models along six releases of software product line well known composition algorithms such as override merge and union are applied and compared on both ao and non ao models in terms of their conflict rate and effort to solve the identified conflicts our findings identify specific scenarios where aspect orientation properties such as obliviousness and quantification result in lower or higher composition effort
this paper presents framework for confirming deadlock potentials detected by runtime analysis of single run of multi threaded program the multi threaded program under examination is instrumented to emit lock and unlock events when the instrumented program is executed trace is generated consisting of the lock and unlock operations performed during that specific run lock graph is constructed which can reveal deadlock potentials in the form of cycles the effectiveness of this analysis is caused by the fact that successful non deadlocking runs yield as good and normally better information as deadlocking runs each cycle is then used to construct an observer that can detect the occurrence of the corresponding real deadlock should it occur during subsequent test runs and controller which when composed with the program determines the optimal scheduling strategy that will maximize the probability for the corresponding real deadlock to occur the framework is formalized in terms of transition systems and is implemented in java
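the core of the first phase described above fits in few lines: the sketch below builds lock graph from trace of lock and unlock events and reports cycles as deadlock potentials; the observer and controller synthesis and the java formalization are not reproduced here, and the event format is an assumption

```python
# minimal sketch: build a lock graph from a trace of (thread, op, lock) events
# and report cycles as deadlock potentials; observer/controller synthesis is
# outside the scope of this fragment
from collections import defaultdict

def lock_graph(trace):
    held = defaultdict(list)              # thread -> currently held locks
    edges = defaultdict(set)              # lock -> locks acquired while holding it
    for thread, op, lock in trace:
        if op == "lock":
            for outer in held[thread]:
                edges[outer].add(lock)
            held[thread].append(lock)
        elif op == "unlock":
            held[thread].remove(lock)
    return edges

def find_cycles(edges):
    cycles, path = [], []
    def dfs(node):
        if node in path:
            cycles.append(path[path.index(node):] + [node])
            return
        path.append(node)
        for nxt in edges.get(node, ()):
            dfs(nxt)
        path.pop()
    for start in list(edges):
        dfs(start)
    return cycles

trace = [("t1", "lock", "A"), ("t1", "lock", "B"), ("t1", "unlock", "B"),
         ("t1", "unlock", "A"), ("t2", "lock", "B"), ("t2", "lock", "A"),
         ("t2", "unlock", "A"), ("t2", "unlock", "B")]
print(find_cycles(lock_graph(trace)))     # cycles over locks A and B
```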
as technology scaling drives the number of processor cores upward current on chip routers consume substantial portions of chip area and power budgets since existing research has greatly reduced router latency overheads and capitalized on available on chip bandwidth power constraints dominate interconnection network design recently research has proposed bufferless routers as means to alleviate these constraints but to date all designs exhibit poor operational frequency throughput or latency in this paper we propose an efficient bufferless router which lowers average packet latency by and dynamic energy by over existing bufferless on chip network designs in order to maintain the energy and area benefit of bufferless routers while delivering ultra low latencies our router utilizes an opportunistic processor side buffering technique and an energy efficient circuit switched network for delivering negative acknowledgments for dropped packets
in our research on tangible user interaction we focus on the design of products that are dedicated to particular user task and context in doing so we are interested in strengthening the actions side of tangible interaction currently the actions required by electronic products are limited to pushing sliding and rotating yet humans are capable of far more complex actions human dexterity is highly refined this focus on actions requires reconsideration of the design process in this paper we propose two design methods that potentially boost the focus on skilled actions in the design of tangible user interaction the hands only scenario is close up version of the dramatised use scenario it helps focus effort on what we imagine the hands of the users doing the video action wall is technique of live post its on projected computer screen little snippets of action videos running simultaneously help designers understand user actions by the qualities they represent
we consider the problem of data mining with formal privacy guarantees given data access interface based on the differential privacy framework differential privacy requires that computations be insensitive to changes in any particular individual’s record thereby restricting data leaks through the results the privacy preserving interface ensures unconditionally safe access to the data and does not require from the data miner any expertise in privacy however as we show in the paper naive utilization of the interface to construct privacy preserving data mining algorithms could lead to inferior data mining results we address this problem by considering the privacy and the algorithmic requirements simultaneously focusing on decision tree induction as sample application the privacy mechanism has profound effect on the performance of the methods chosen by the data miner we demonstrate that this choice could make the difference between an accurate classifier and completely useless one moreover an improved algorithm can achieve the same level of accuracy and privacy as the naive implementation but with an order of magnitude fewer learning samples
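the fragment below illustrates the kind of naive interface based usage discussed above: class counts behind each candidate split are released through the laplace mechanism and the split is chosen from the noisy counts; it is deliberately the simple baseline rather than the improved algorithm the paper develops, and the split enumeration is simplified to single median threshold per feature

```python
# naive baseline only: laplace-noised class counts drive the split choice;
# a complete private learner would also privatize the thresholds and the
# child sizes used for weighting below
import numpy as np

def noisy_counts(labels, epsilon, n_classes, rng=None):
    rng = rng or np.random.default_rng()
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # counting queries have sensitivity 1, hence laplace noise of scale 1/epsilon
    return counts + rng.laplace(scale=1.0 / epsilon, size=n_classes)

def noisy_gini(labels, epsilon, n_classes):
    c = np.clip(noisy_counts(labels, epsilon, n_classes), 0.0, None)
    p = c / max(c.sum(), 1e-9)
    return 1.0 - float(np.sum(p ** 2))

def choose_split(X, y, epsilon, n_classes):
    """Pick the (feature, median threshold) with the lowest noisy child impurity."""
    best = None
    for f in range(X.shape[1]):
        t = np.median(X[:, f])
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        score = (len(left) * noisy_gini(left, epsilon, n_classes)
                 + len(right) * noisy_gini(right, epsilon, n_classes))
        if best is None or score < best[0]:
            best = (score, f, t)
    return best[1], best[2]
```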
general information retrieval systems are designed to serve all users without considering individual needs in this paper we propose novel approach to personalized search it can in unified way exploit and utilize implicit feedback information such as query logs and immediately viewed documents moreover our approach can implement result re ranking and query expansion simultaneously and collaboratively based on this approach we develop client side personalized web search agent pair personalized assistant for information retrieval which supports both english and chinese our experiments on trec and htrdp collections clearly show that the new approach is both effective and efficient
sensor networks novel paradigm in distributed wireless communication technology have been proposed for various applications including military surveillance and environmental monitoring these systems deploy heterogeneous collections of sensors capable of observing and reporting on various dynamic properties of their surroundings in time sensitive manner such systems suffer bandwidth energy and throughput constraints that limit the quantity of information transferred from end to end these factors coupled with unpredictable traffic patterns and dynamic network topologies make the task of designing optimal protocols for such networks difficult mechanisms to perform data centric aggregation utilizing application specific knowledge provide means to augmenting throughput but have limitations due to their lack of adaptation and reliance on application specific decisions we therefore propose novel aggregation scheme that adaptively performs application independent data aggregation in time sensitive manner our work isolates aggregation decisions into module that resides between the network and the data link layer and does not require any modifications to the currently existing mac and network layer protocols we take advantage of queuing delay and the broadcast nature of wireless communication to concatenate network units into an aggregate using novel adaptive feedback scheme to schedule the delivery of this aggregate to the mac layer for transmission in our evaluation we show that end to end transmission delay is reduced by as much as percent under heavy traffic loads additionally we show as much as percent reduction in transmission energy consumption with an overall reduction in header overhead theoretical analysis simulation and test bed implementation on berkeley’s mica motes are provided to validate our claims
we present scalable and precise context sensitive points to analysis with three key properties filtering out of unrealizable paths context sensitive heap abstraction and context sensitive call graph previous work has shown that all three properties are important for precisely analyzing large programs eg to show safety of downcasts existing analyses typically give up one or more of the properties for scalability we have developed refinement based analysis that succeeds by simultaneously refining handling of method calls and heap accesses allowing the analysis to precisely analyze important code while entirely skipping irrelevant code the analysis is demand driven and client driven facilitating refinement specific to each queried variable and increasing scalability in our experimental evaluation our analysis proved the safety of more casts than one of the most precise existing analyses across suite of large benchmarks the analysis checked the casts in under minutes per benchmark taking less than second per query and required only mb of memory far less than previous approaches
product data exchange is the precondition of business interoperation between web based firms however millions of small and medium sized enterprises smes encode their web product data in ad hoc formats for electronic product catalogues this prevents product data exchange between business partners for business interoperation to solve this problem this paper has proposed novel concept centric catalogue engineering approach for representing transforming and comparing semantic contexts in ad hoc product data exchange in this approach concepts and contexts of product data are specified along data exchange chain and are mapped onto several novel xml product map xpm documents by utilizing xml hierarchical structure and its syntax the designed xpm has overcome the semantic limitations of xml markup and has achieved the semantic interoperation for ad hoc product data exchange
traceable signatures ts suggested by kiayias tsiounis and yung extend group signatures to address various basic traceability issues beyond merely identifying the anonymous signer of rogue signature namely they enable the efficient tracing of all signatures produced by misbehaving party without opening the identity of other parties they also allow users to provably claim ownership of previously signed anonymous signature to date known ts systems all rely on the random oracle model in this work we present the first realization of the primitive that avoids resorting to the random oracle methodology in its security proofs furthermore our realization’s efficiency is comparable to that of nowadays fastest and shortest standard model group signatures
as organizations start to use data intensive cluster computing systems like hadoop and dryad for more applications there is growing need to share clusters between users however there is conflict between fairness in scheduling and data locality placing tasks on nodes that contain their input data we illustrate this problem through our experience designing fair scheduler for node hadoop cluster at facebook to address the conflict between locality and fairness we propose simple algorithm called delay scheduling when the job that should be scheduled next according to fairness cannot launch local task it waits for small amount of time letting other jobs launch tasks instead we find that delay scheduling achieves nearly optimal data locality in variety of workloads and can increase throughput by up to while preserving fairness in addition the simplicity of delay scheduling makes it applicable under wide variety of scheduling policies beyond fair sharing
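A minimal sketch of the delay scheduling rule described in the abstract above, under a simplified single-queue scheduler model; the Job class and the locality_wait threshold are illustrative assumptions, not the Hadoop fair scheduler's actual API.

    # delay scheduling sketch: when the job at the head of the fairness ordering
    # has no task local to the free node, skip it for a bounded number of
    # scheduling opportunities instead of immediately launching a non-local task.

    class Job:
        def __init__(self, name, pending_tasks):
            self.name = name
            self.pending_tasks = pending_tasks   # list of (task_id, preferred_node)
            self.skip_count = 0

        def pop_task(self, node=None):
            for i, (task_id, pref) in enumerate(self.pending_tasks):
                if node is None or pref == node:
                    return self.pending_tasks.pop(i)
            return None

    def assign_task(free_node, jobs_by_fairness, locality_wait):
        # jobs_by_fairness: jobs sorted by fair-share deficit (most starved first)
        for job in jobs_by_fairness:
            task = job.pop_task(node=free_node)  # try a node-local task first
            if task is not None:
                job.skip_count = 0
                return job, task
            if job.skip_count >= locality_wait:  # waited long enough: accept non-local
                job.skip_count = 0
                task = job.pop_task()
                if task is not None:
                    return job, task
            else:
                job.skip_count += 1              # skip this job for now, let others launch
        return None

The point of the bounded wait is that under typical workloads a local slot frees up within a small number of scheduling opportunities, so locality improves with only a marginal delay to the skipped job.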
recently we started to experience shift from physical communities to virtual communities which leads to missed social opportunities in our daily routine for instance we are not aware of neighbors with common interests or nearby events mobile social computing applications mscas promise to improve social connectivity in physical communities by leveraging information about people social relationships and places this article presents mobisoc middleware that enables msca development and provides common platform for capturing managing and sharing the social state of physical communities additionally it incorporates algorithms that discover previously unknown emergent geo social patterns to augment this state to demonstrate mobisoc’s feasibility we implemented and tested on smart phones two mscas for location based mobile social matching and place based ad hoc social collaboration experimental results showed that mobisoc can provide good response time for users we also demonstrated that an adaptive localization scheme and carefully chosen cryptographic methods can significantly reduce the resource consumption associated with the location engine and security on smart phones user study of the mobile social matching application proved that geo social patterns can double the quality of social matches and that people are willing to share their location with mobisoc in order to benefit from mscas
many state of the art join techniques require the input relations to be almost fully sorted before the actual join processing starts thus these techniques start producing first results only after considerable time period has passed this blocking behaviour is serious problem when consequent operators have to stop processing in order to wait for first results of the join furthermore this behaviour is not acceptable if the result of the join is visualized and/or requires user interaction these are typical scenarios for data mining applications the off time of existing techniques even increases with growing problem sizes in this paper we propose generic technique called progressive merge join pmj that eliminates the blocking behaviour of sort based join algorithms the basic idea behind pmj is to have the join produce results as early as the external mergesort generates initial runs hence it is possible for pmj to return first results very early this paper provides the basic algorithms and the generic framework of pmj as well as use cases for different types of joins moreover we provide generic online selectivity estimator with probabilistic quality guarantees for similarity joins in particular first non blocking join algorithms are derived from applying pmj to the state of the art techniques we have implemented pmj as part of an object relational cursor algebra set of experiments shows that substantial amount of results are produced even before the input relations would have been sorted we observed only moderate increase in the total runtime compared to the blocking counterparts
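A rough sketch of the early-result idea behind a progressive merge join, under a deliberately simplified memory model: while sorted runs are generated for an external sort-merge join, the pair of runs currently resident in memory (one per input) is joined immediately, so first results appear during run generation. The final merge phase, and the bookkeeping PMJ needs to avoid re-emitting these early results, are not shown; all names here are illustrative, not the paper's algorithm.

    def generate_runs_with_early_results(r_input, s_input, mem_tuples, key, emit):
        half = max(1, mem_tuples // 2)           # split memory between the two inputs
        runs_r, runs_s = [], []
        for i in range(0, max(len(r_input), len(s_input)), half):
            run_r = sorted(r_input[i:i + half], key=key)
            run_s = sorted(s_input[i:i + half], key=key)
            for r in run_r:                      # in-memory join of the two fresh runs;
                for s in run_s:                  # a nested loop is used only for brevity
                    if key(r) == key(s):
                        emit((r, s))             # early results, long before both inputs
                                                 # are fully sorted
            runs_r.append(run_r)                 # in a real system these runs are spilled
            runs_s.append(run_s)                 # to disk for the later merge phase
        return runs_r, runs_s

    results = []
    generate_runs_with_early_results(
        r_input=[3, 1, 4, 1, 5], s_input=[1, 5, 9, 2, 6],
        mem_tuples=4, key=lambda x: x, emit=results.append)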
with cloud and utility computing models gaining significant momentum data centers are increasingly employing virtualization and consolidation as means to support large number of disparate applications running simultaneously on chip multiprocessor cmp server in such environments contention for shared platform resources cpu cores shared cache space shared memory bandwidth etc can have significant effect on each virtual machine’s performance in this paper we investigate the shared resource contention problem for virtual machines by measuring the effects of shared platform resources on virtual machine performance proposing model for estimating shared resource contention effects and proposing transition from virtual machine vm to virtual platform architecture vpa that enables transparent shared resource management through architectural mechanisms for monitoring and enforcement our measurement and modeling experiments are based on consolidation benchmark vconsolidate running on state of the art cmp server our virtual platform architecture experiments are based on detailed simulations of consolidation scenarios through detailed measurements and simulations we show that shared resource contention affects virtual machine performance significantly and emphasize that virtual platform architectures is must for future virtualized datacenters
rapid advancement and more readily availability of grid technologies have encouraged many businesses and researchers to establish virtual organizations vo and make use of their available desktop resources to solve computing intensive problems these vos however work as disjointed and independent communities with no resource sharing between them we in previous work have proposed fully decentralized and reconfigurable inter grid framework for resource sharing among such distributed and autonomous grid systems rao et al in iccsa the specific problem that underlies in such collaborating grids system is scheduling of resources as there is very little knowledge about availability of the resources due to the distributed and autonomous nature of the underlying grid entities in this paper we propose probabilistic and adaptive scheduling algorithm using system generated predictions for inter grid resource sharing keeping collaborating grid systems autonomous and independent we first use system generated job runtime estimates without actually submitting jobs to the target grid system then this job execution estimate is used to predict the job scheduling feasibility on the target system furthermore our proposed algorithm adapted itself to the actual resource behavior and performance simulation results are presented to discuss the correctness and accuracy of our proposed algorithm
we define the value state dependence graph vsdg the vsdg is form of the value dependence graph vdg extended by the addition of state dependence edges to model sequentialised computation these express store dependencies and loop termination dependencies of the original program we also exploit them to express the additional serialization inherent in producing final object code the central idea is that this latter serialization can be done incrementally so that we have class of algorithms which effectively interleave register allocation and code motion thereby avoiding well known phase order problem in compilers this class operates by first normalizing the vsdg during construction to remove all duplicated computation and then repeatedly choosing between allocating value to register ii spilling value to memory iii moving loop invariant computation within loop to avoid register spillage and iv statically duplicating computation to avoid register spillage we show that the classical two phase approach code motion then register allocation in both chow and chaitin forms are examples of this class and propose new algorithm based on depth first cuts of the vsdg
it is considered good distributed computing practice to devise object implementations that tolerate contention periods of asynchrony and large number of failures but perform fast if few failures occur the system is synchronous and there is no contention this paper initiates the first study of quorum systems that help design such implementations by encompassing at the same time optimal resilience just like traditional quorum systems as well as optimal best case complexity unlike traditional quorum systems we introduce the notion of refined quorum system rqs of some set as set of three classes of subsets quorums of first class quorums are also second class quorums themselves being also third class quorums first class quorums have large intersections with all other quorums second class quorums typically have smaller intersections with those of the third class the latter simply correspond to traditional quorums intuitively under uncontended and synchronous conditions distributed object implementation would expedite an operation if quorum of the first class is accessed then degrade gracefully depending on whether quorum of the second or the third class is accessed our notion of refined quorum system is devised assuming general adversary structure and this basically allows relying on refined quorum systems to relax the assumption of independent process failures often questioned in practice we illustrate the power of refined quorums by introducing two new optimal byzantine resilient distributed object implementations an atomic storage and consensus algorithm both match previously established resilience and best case complexity lower bounds closing open gaps as well as new complexity bounds we establish here
we describe modular programming style that harnesses modern type systems to verify safety conditions in practical systems this style has three ingredients compact kernel of trust that is specific to the problem domain ii unique names capabilities that confer rights and certify properties so as to extend the trust from the kernel to the rest of the application iii static type proxies for dynamic values we illustrate our approach using examples from the dependent type literature but our programs are written in haskell and ocaml today so our techniques are compatible with imperative code native mutable arrays and general recursion the three ingredients of this programming style call for an expressive core language higher rank polymorphism and phantom types
this paper proposes method for image indexing that allows to retrieve related images under the query by example paradigm the proposed strategy exploits multimodal interactions between text annotations and visual contents in the image database to build semantic index we achieve this using non negative matrix factorization algorithm to construct latent semantic space in which visual features and text terms are represented together the proposed system was evaluated using standard benchmark dataset the experimental evaluation shows significant improvement on the system performance using the proposed multimodal indexing approach
virtual machine vm memory allocation and vm consolidation can benefit from the prediction of vm page miss rate at each candidate memory size such prediction is challenging for the hypervisor or vm monitor due to lack of knowledge on vm memory access pattern this paper explores the approach that the hypervisor takes over the management for part of the vm memory and thus all accesses that miss the remaining vm memory can be transparently traced by the hypervisor for online memory access tracing its overhead should be small compared to the case that all allocated memory is directly managed by the vm to save memory space the hypervisor manages its memory portion as an exclusive cache ie containing only data that is not in the remaining vm memory to minimize overhead evicted data from vm enters its cache directly from vm memory as opposed to entering from the secondary storage we guarantee the cache correctness by only caching memory pages whose current contents provably match those of corresponding storage locations based on our design we show that when the vm evicts pages in the lru order the employment of the hypervisor cache does not introduce any additional overhead in the system we implemented the proposed scheme on the xen para virtualization platform our experiments with microbenchmarks and four real data intensive services specweb index searching tpc and tpc illustrate the overhead of our hypervisor cache and the accuracy of cache driven vm page miss rate prediction we also present the results on adaptive vm memory allocation with performance assurance
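The hypervisor cache described above makes the miss trace observable; a standard, generic way to turn such a trace into a page miss rate at every candidate memory size in one pass is LRU stack-distance (Mattson) analysis, sketched below. This is background illustration, not the paper's implementation, and all names are assumptions.

    from collections import Counter

    # under lru, a reference hits in a memory of c pages iff its reuse (stack)
    # distance is <= c, so one histogram of stack distances yields the miss
    # ratio curve for all candidate sizes at once.

    def miss_curve(trace, sizes):
        stack = []                      # pages ordered most- to least-recently used
        dist_hist = Counter()           # histogram of observed stack distances
        cold = 0
        for page in trace:
            if page in stack:
                d = stack.index(page) + 1
                dist_hist[d] += 1
                stack.remove(page)
            else:
                cold += 1               # first touch: infinite stack distance
            stack.insert(0, page)
        total = len(trace)
        return {c: (cold + sum(n for d, n in dist_hist.items() if d > c)) / total
                for c in sizes}         # predicted miss ratio at each candidate size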
although tagging has become increasingly popular in online image and video sharing systems tags are known to be noisy ambiguous incomplete and subjective these factors can seriously affect the precision of social tag based web retrieval system therefore improving the precision performance of these social tag based web retrieval systems has become an increasingly important research topic to this end we propose shared subspace learning framework to leverage secondary source to improve retrieval performance from primary dataset this is achieved by learning shared subspace between the two sources under joint nonnegative matrix factorization in which the level of subspace sharing can be explicitly controlled we derive an efficient algorithm for learning the factorization analyze its complexity and provide proof of convergence we validate the framework on image and video retrieval tasks in which tags from the labelme dataset are used to improve image retrieval performance from flickr dataset and video retrieval performance from youtube dataset this has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system our shared subspace learning framework is applicable to range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets
this article presents an extensive characterization of spam infected mail workload the study aims at identifying and quantifying the characteristics that significantly distinguish spam from non spam ie legitimate traffic assessing the impact of spam on the aggregate traffic providing data for creating synthetic workload models and drawing insights into more effective spam detection techniques our analysis reveals significant differences in the spam and non spam workloads we conjecture that these differences are consequence of the inherently different mode of operation of the mail senders whereas legitimate mail transmissions are driven by social bilateral relationships spam transmissions are unilateral spammer driven action
for complex tasks such as parse selection the creation of labelled training sets can be extremely costly resource efficient schemes for creating informative labelled material must therefore be considered we investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models ensemble models and active learning we show that popular active learning methods for reducing annotation costs can be outperformed by instead using model class which uses the available labelled data more efficiently for this we use simple type of ensemble model called the logarithmic opinion pool lop we furthermore show that lops themselves can benefit from active learning as predicted by theoretical explanation of the predictive power of lops detailed analysis of active learning using lops shows that component model diversity is strong predictor of successful lop performance other contributions include novel active learning method justification of our simulation studies using timing information and cross domain verification of our main ideas using text classification
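The logarithmic opinion pool referred to above combines the component models multiplicatively rather than additively. A standard formulation, assuming per-model weights w_i (the exact parameterisation is an assumption, not taken from the abstract), is:

    p_{\mathrm{LOP}}(y \mid x) \;=\; \frac{1}{Z(x)} \prod_{i=1}^{M} p_i(y \mid x)^{w_i},
    \qquad Z(x) \;=\; \sum_{y'} \prod_{i=1}^{M} p_i(y' \mid x)^{w_i}

Because the product penalises any parse that some component assigns low probability, diverse components tend to cancel each other's errors, which is consistent with the abstract's finding that component diversity predicts LOP performance.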
for successful collaboration to occur fundamental requirement is the ability for participants to refer to artifacts within the shared environment this task is often straightforward in traditional collaborative desktop applications yet the spatial properties found in mixed reality environments greatly impact the complexity of generating and interpreting meaningful reference cues although awareness is very active area of research little focus has been given to the environmental and contextual factors that influence referencing or the costs associated with supporting it in mixed reality environments the work presented here consists of compilation of understanding we have gained through user observation participant feedback and system development we begin by summarizing our findings from several user studies in collaborative augmented reality ar to organize the complexity associated with referencing in ar we enumerate contextual and environmental factors that influence referential awareness integrating discussion about user preferences and the impact they have on the underlying technological requirements finally we discuss how these factors can impact the design space of collaborative systems and describe the cost associated with supporting references in collaborative ar
while most research papers on computer architectures include some performance measurements these performance numbers tend to be distrusted up to the point that after so many research articles on data cache architectures for instance few researchers have clear view of what are the best data cache mechanisms to illustrate the usefulness of fair quantitative comparison we have picked target architecture component for which lots of optimizations have been proposed data caches and we have implemented most of the performance oriented hardware data cache optimizations published in top conferences in the past years beyond the comparison of data cache ideas our goals are twofold to clearly and quantitatively evaluate the effect of methodology shortcomings such as model precision benchmark selection trace selection on assessing and comparing research ideas and to outline how strong is the methodology effect in many cases to outline that the lack of interoperable simulators and not disclosing simulators at publication time make it difficult if not impossible to fairly assess the benefit of research ideas this study is part of broader effort called microlib an open library of modular simulators aimed at promoting the disclosure and sharing of simulator models
if program does not fulfill given specification model checker delivers counterexample run which demonstrates the wrong behavior even with counterexample locating the actual fault in the source code is often difficult task for the verification engineer we present an automatic approach for fault localization in programs the method is based on model checking and reports only components that can be changed such that the difference between actual and intended behavior of the example is removed to identify these components we use the bounded model checker cbmc on an instrumented version of the program we present experimental data that supports the applicability of our approach
as the number of rfid applications grows concerns about their security and privacy become greatly amplified at the same time the acutely restricted and cost sensitive nature of rfid tags rules out simple reuse of traditional security privacy solutions and calls for new generation of extremely lightweight identification and authentication protocols this article describes universally composable security framework designed especially for rfid applications we adopt rfid specific setup communication and concurrency assumptions in model that guarantees strong security privacy and availability properties in particular the framework supports modular deployment which is most appropriate for ubiquitous applications we also describe set of simple efficient secure and anonymous untraceable rfid identification and authentication protocols that instantiate the proposed framework these protocols involve minimal interaction between tags and readers and place only small computational load on the tag and light computational burden on the back end server we show that our protocols are provably secure within the proposed framework
we address the problem of developing efficient cache coherence protocols for use in distributed systems implementing distributed shared memory dsm using message passing serious drawback of traditional approaches to this problem is that the users are required to state the desired coherence protocol at the level of asynchronous message interactions involving request acknowledge and negative acknowledge messages and handle unexpected messages by introducing intermediate states proofs of correctness of protocols described in terms of low level asynchronous messages are very involved often the proofs hold only for specific configurations and buffer allocations we propose method in which the users state the desired protocol directly in terms of the desired high level effect namely synchronization and coordination using the synchronous rendezvous construct these descriptions are much easier to understand and computationally more efficient to verify than asynchronous protocols due to their small state spaces the rendezvous protocol can also be synthesized into efficient asynchronous protocols in this paper we present our protocol refinement procedure prove its soundness and provide examples of its efficiency our synthesis procedure applies to large classes of dsm protocols
today ontologies are being used to model domain of knowledge in semantic web owl is considered to be the main language for developing such ontologies it is based on the xml model which inherently follows the hierarchical structure in this paper we demonstrate an automatic approach for emergent semantics modeling of ontologies we follow the collaborative ontology construction method without the direct interaction of domain users engineers or developers very important characteristic of an ontology is its hierarchical structure of concepts we consider large sets of domain specific hierarchical structures as trees and apply frequent sub tree mining for extracting common hierarchical patterns our experiments show that these hierarchical patterns are good enough to represent and describe the concepts for the domain ontology the technique further demonstrates the construction of the taxonomy of domain ontology in this regard we consider the largest frequent tree or tree created by merging the set of largest frequent sub trees as the taxonomy we argue in favour of the trustability for such taxonomy and related concepts since these have been extracted from the structures being used within the specified domain
coarse grained reconfigurable architectures have become increasingly important in recent years automatic design or compilation tools are essential to their success in this paper we present modulo scheduling algorithm to exploit loop level parallelism for coarse grained reconfigurable architectures this algorithm is key part of our dynamically reconfigurable embedded systems compiler dresc it is capable of solving placement scheduling and routing of operations simultaneously in modulo constrained space and uses an abstract architecture representation to model wide class of coarse grained architectures the experimental results show high performance and efficient resource utilization on tested kernels
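Some standard modulo-scheduling background may help read the abstract above (this is the conventional framing, not DRESC's specific algorithm): the scheduler searches for the smallest initiation interval II at which a valid modulo schedule exists, starting from a lower bound that combines resource and recurrence constraints. Names below are illustrative.

    from math import ceil

    def res_mii(op_counts, resource_counts):
        # op_counts[r]: operations in the loop body needing resource class r
        # resource_counts[r]: instances of resource class r in the architecture
        return max(ceil(op_counts[r] / resource_counts[r]) for r in op_counts)

    def rec_mii(cycles):
        # cycles: (total_latency, total_dependence_distance) for each elementary
        # cycle in the loop's dependence graph
        return max(ceil(lat / dist) for lat, dist in cycles)

    def minimum_ii(op_counts, resource_counts, cycles):
        # scheduling is attempted at this II and II is increased until a valid
        # placement/schedule/route is found
        lower = res_mii(op_counts, resource_counts)
        return max(lower, rec_mii(cycles)) if cycles else lower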
rapidly leveraging information analytics technologies to mine the mounting information in structured and unstructured forms derive business insights and improve decision making is becoming increasingly critical to today’s business successes one of the key enablers of the analytics technologies is an information warehouse management system iwms that processes different types and forms of information builds and maintains the information warehouse iw effectively although traditional multi dimensional data warehousing techniques coupled with the well known etl processes extract transform load may meet some of the requirements in an iwms in general they fall short on several major aspects they often lack comprehensive support for both structured and unstructured data processing they are database centric and require detailed database and data warehouse knowledge to perform iwms tasks and hence they are tedious and time consuming to operate and learn they are often inflexible and insufficient in coping with wide variety of on going iw maintenance tasks such as adding new dimensions and handling regular and lengthy data updates with potential failures and errors to cope with such issues this paper describes an iwms called biwtl business information warehouse toolkit and language that automates and simplifies iwms tasks by devising high level declarative information warehousing language giwl and building the runtime system components for such language biwtl hides system details eg databases full text indexers and data warehouse models from users by automatically generating appropriate runtime scripts and executing them based on the giwl language specification moreover biwtl supports structured and unstructured information processing by embedding flexible data extraction and transformation capabilities while ensuring high performance processing for large datasets in addition this paper systematically studied the core tasks around information warehousing and identified five key areas in particular we describe our technologies in three areas ie constructing an iw data loading and maintaining an iw we have implemented such technologies in biwtl and validated it in real world environments with number of customers our experience suggests that biwtl is light weight simple efficient and flexible
our long term objective is to develop general methodology for deploying web service aggregation and adaptation middleware capable of suitably overcoming syntactic and behavioral mismatches in view of application integration within and across organizational boundaries this article focuses on describing the core aggregation process which generates the workflow of composite service from set of service workflows to be aggregated and data flow mapping linking service parameters
we present tool perfcenter which can be used for performance oriented deployment and configuration of an application in hosting center or data center while there are number of tools which aid in the process of performance analysis during the software development cycle few tools are geared towards aiding data center architect in making appropriate decisions during the deployment of an application perfcenter facilitates this process by allowing specification in terms that are natural to data center architect thus perfcenter takes as input the number and specs of hosts available in data center the network architecture of geographically diverse data centers the deployment of software on hosts hosts on data centers and the usage information of the application scenarios resource consumption and provides various performance measures such as scenario response times and resource utilizations we describe the perfcenter specification and its performance analysis utilities in detail and illustrate its use in the deployment and configuration of webmail application
merging and integrating different conceptual models which have been collaboratively developed by domain experts and analysts with dissimilar perspectives on the same issue has been the subject of tremendous amount of research in this paper we focus on the fact that human analysts opinions possess degree of uncertainty which can be exploited while integrating such information we propose an underlying modeling construct which is the basis for transforming conceptual models into manipulatable format based on this construct methods for formally negotiating over and merging of conceptual models are proposed the approach presented in this paper focuses on the formalization of uncertainty and expert reliability through the employment of belief theory the proposed work has been evaluated for its effectiveness and usability the evaluators group of computer science graduate students believed that the proposed framework has the capability to fulfil its intended tasks the obtained results from the performance perspective are also promising
we describe and apply lightweight formal method for checking test results the method assumes that the software under test writes text log file this log file is then analyzed by program to see if it reveals failures we suggest state machine based formalism for specifying the log file analyzer programs and describe language and implementation based on that formalism we report on empirical studies of the application of log file analysis to random testing of units we describe the results of experiments done to compare the performance and effectiveness of random unit testing with coverage checking and log file analysis to other unit testing procedures the experiments suggest that writing formal log file analyzer and using random testing is competitive with other formal and informal methods for unit testing
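A minimal sketch of a state-machine log-file analyzer in the spirit described above: the expected behaviour of the unit under test is written as states with guarded transitions over log lines, and a line with no matching transition (or a run ending outside an accepting state) is reported as a failure. The transition table is a hypothetical example, not the paper's specification language.

    def analyze(log_lines, transitions, start="init", accepting=("closed",)):
        state = start
        for lineno, line in enumerate(log_lines, 1):
            for src, predicate, dst in transitions:
                if src == state and predicate(line):
                    state = dst
                    break
            else:   # no transition matched: the log reveals a failure
                return f"failure at line {lineno}: '{line.strip()}' unexpected in state {state}"
        return None if state in accepting else f"failure: run ended in state {state}"

    # hypothetical property: a resource must be opened before use and closed at the end
    transitions = [
        ("init", lambda l: l.startswith("open"),  "open"),
        ("open", lambda l: l.startswith("use"),   "open"),
        ("open", lambda l: l.startswith("close"), "closed"),
    ]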
the problem of combining join and semijoin reducers for distributed query processing is studied an approach based on interleaving join sequence with beneficial semijoins is proposed join sequence is mapped into join sequence tree first the join sequence tree provides an efficient way to identify for each semijoin its correlated semijoins as well as its reducible relations under the join sequence in light of these properties an algorithm for determining an effective sequence of join and semijoin reducers is developed examples are given to illustrate the results they show the advantage of using combination of joins and semijoins as reducers for distributed query processing
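For concreteness, a minimal sketch of a single semijoin reducer as used in distributed query processing: before shipping relation R to the site holding S, only the join-column values of S are shipped to R's site and R is reduced to the tuples that can possibly join, cutting the data transferred for the final join. This is illustrative background only; the paper's contribution is how such semijoins are interleaved with a join sequence.

    def semijoin_reduce(r_tuples, s_join_values, r_key):
        probe = set(s_join_values)                 # projection of S on the join column
        return [t for t in r_tuples if r_key(t) in probe]

    # the reduced R is then shipped and joined with S at S's site
    r_reduced = semijoin_reduce(
        r_tuples=[(1, "a"), (2, "b"), (3, "c")],
        s_join_values=[2, 3, 5],
        r_key=lambda t: t[0],
    )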
the emergence of location based computing promises new and compelling applications but raises very real privacy risks existing approaches to privacy generally treat people as the entity of interest often using fidelity tradeoff to manage the costs and benefits of revealing person’s location however these approaches cannot be applied in some applications as reduction in precision can render location information useless this is true of category of applications that use location data collected from multiple people to infer such information as whether there is traffic jam on bridge whether there are seats available in nearby coffee shop when the next bus will arrive or if particular conference room is currently empty we present hitchhiking new approach that treats locations as the primary entity of interest hitchhiking removes the fidelity tradeoff by preserving the anonymity of reports without reducing the precision of location disclosures we can therefore support the full functionality of an interesting class of location based applications without introducing the privacy concerns that would otherwise arise
in this paper we present novel approach to iso surface extraction which is based on multiresolution volume data representation and hierarchically approximates the iso surface with semi regular mesh after having generated hierarchy of volumes we extract the iso surface from the coarsest resolution with standard marching cubes algorithm apply simple mesh decimation strategy to improve the shape of the triangles and use the result as base mesh then we iteratively fit the mesh to the iso surface at the finer volume levels thereby subdividing it adaptively in order to be able to correctly reconstruct local features we also take care of generating an even vertex distribution over the iso surface so that the final result consists of triangles with good aspect ratio the advantage of this approach as opposed to the standard method of extracting the iso surface from the finest resolution with marching cubes is that it generates mesh with subdivision connectivity which can be utilized by several multiresolution algorithms as an application of our method we show how to reconstruct the surface of archaeological items
when peak performance is unnecessary dynamic voltage scaling dvs can be used to reduce the dynamic power consumption of embedded multiprocessors in future technologies however static power consumption due to leakage current is expected to increase significantly then it will be more effective to limit the number of processors employed ie turn some of them off or to use combination of dvs and processor shutdown in this paper leakage aware scheduling heuristics are presented that determine the best trade off between these three techniques dvs processor shutdown and finding the optimal number of processors experimental results obtained using public benchmark set of task graphs and real parallel applications show that our approach reduces the total energy consumption by up to for tight deadlines the critical path length and by up to for loose deadlines the critical path length compared to an approach that only employs dvs we also compare the energy consumed by our scheduling algorithms to two absolute lower bounds one for the case where all processors continuously run at the same frequency and one for the case where the processors can run at different frequencies and these frequencies may change over time the results show that the energy reduction achieved by our best approach is close to these theoretical limits
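As a toy illustration of the trade-off the abstract describes (not the paper's heuristics), the sketch below uses a conventional simplified power model in which dynamic power scales roughly cubically with frequency under DVS while static (leakage) power is paid for every processor that is switched on; all names and constants are assumptions.

    def energy(work_cycles, deadline, p, c_dyn, p_static):
        # run work_cycles evenly on p processors at the lowest frequency that
        # still meets the deadline, paying leakage on each active processor
        f = work_cycles / (p * deadline)
        return p * (c_dyn * f ** 3 + p_static) * deadline

    def best_processor_count(work_cycles, deadline, p_max, c_dyn, p_static):
        # with negligible leakage, more processors at a lower frequency always wins;
        # as p_static grows, shutting some processors down becomes preferable
        return min(range(1, p_max + 1),
                   key=lambda p: energy(work_cycles, deadline, p, c_dyn, p_static))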
in this paper we show novel method for modelling behaviours of security protocols using networks of communicating automata in order to verify them with sat based bounded model checking these automata correspond to executions of the participants as well as to their knowledge about letters given bounded number of sessions we can verify both correctness or incorrectness of security protocol proving either reachability or unreachability of an undesired state we exemplify all our notions on the needham schroeder public key authentication protocol nspk and show experimental results for checking authentication using the verification tool verics
partial evaluation is program transformation that automatically specializes program with respect to invariants despite successful application in areas such as graphics operating systems and software engineering partial evaluators have yet to achieve widespread use one reason is the difficulty of adequately describing specialization opportunities indeed underspecialization or overspecialization often occurs without any feedback as to the source of the problem we have developed high level module based language allowing the program developer to guide the choice of both the code to specialize and the invariants to exploit during the specialization process to ease the use of partial evaluation the syntax of this language is similar to the declaration syntax of the target language of the partial evaluator to provide feedback declarations are checked during the analyses performed by partial evaluation the language has been successfully used by variety of users including students having no previous experience with partial evaluation
the netmine framework allows the characterization of traffic data by means of data mining techniques netmine performs generalized association rule extraction to profile communications detect anomalies and identify recurrent patterns association rule extraction is widely used exploratory technique to discover hidden correlations among data however it is usually driven by frequency constraints on the extracted correlations hence it entails generating huge number of rules which are difficult to analyze or ii pruning rare itemsets even if their hidden knowledge might be relevant to overcome these issues netmine exploits novel algorithm to efficiently extract generalized association rules which provide high level abstraction of the network traffic and allows the discovery of unexpected and more interesting traffic rules the proposed technique exploits user provided taxonomies to drive the pruning phase of the extraction process extracted correlations are automatically aggregated in more general association rules according to frequency threshold eventually extracted rules are classified into groups according to their semantic meaning thus allowing domain expert to focus on the most relevant patterns experiments performed on different network dumps showed the efficiency and effectiveness of the netmine framework to characterize traffic data
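A minimal sketch of the taxonomy-driven generalization step described above: items whose support falls below the threshold are rolled up to their ancestor in a user-provided taxonomy, so that rare but related low-level items can still surface as a generalized frequent pattern. This is illustrative only; the full NetMine extraction and rule-grouping steps are not shown, and the taxonomy is a hypothetical example.

    from collections import Counter

    def generalize(transactions, taxonomy, min_support):
        threshold = min_support * len(transactions)
        support = Counter()
        for t in transactions:
            covered = set()
            for item in t:
                node = item
                while node is not None:      # an ancestor counts whenever any
                    covered.add(node)        # descendant occurs in the transaction
                    node = taxonomy.get(node)
            support.update(covered)

        def roll_up(item):
            while support[item] < threshold and item in taxonomy:
                item = taxonomy[item]        # replace a rare item with its parent
            return item

        return [sorted({roll_up(i) for i in t}) for t in transactions]

    # hypothetical taxonomy: specific ports roll up to a service class
    taxonomy = {"port_6881": "p2p_ports", "port_6882": "p2p_ports"}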
mobile and pervasive applications frequently rely on devices such as rfid antennas or sensors light temperature motion to provide them information about the physical world these devices however are unreliable they produce streams of information where portions of data may be missing duplicated or erroneous current state of the art is to correct errors locally eg range constraints for temperature readings or use spatial temporal correlations eg smoothing temperature readings however errors are often apparent only in global setting eg missed readings of objects that are known to be present or exit readings from parking garage without matching entry readings in this paper we present streamclean system for correcting input data errors automatically using application defined global integrity constraints because it is frequently impossible to make corrections with certainty we propose probabilistic approach where the system assigns to each input tuple the probability that it is correct we show that streamclean handles large class of input data errors and corrects them sufficiently fast to keep up with input rates of many mobile and pervasive applications we also show that the probabilities assigned by streamclean correspond to user’s intuitive notion of correctness
informal design tools can provide immense value during the creative stages of the design process eg by transforming sketches into interactive simulations two key limitations of informal and many other design tools are that they do not promote working with multiple design ideas in parallel or collaboration in this paper we present new interaction model that allows team of designers to work efficiently with multiple ideas in parallel the model is grounded in theories of creativity and collaboration and was further informed by observations of creative group work practice our interaction model is fully demonstrated within new system called team storm results from an initial evaluation indicate that design teams are able to effectively utilize our system to create organize and share multiple design ideas during creative group work the benefit of our model is that it demonstrates how many existing single user design tools can be extended to support working efficiently with multiple ideas in parallel and co located collaboration
we advocate novel programming approach we call slotted programming that not only addresses the specific hardware capabilities of sensor nodes but also facilitates coding through truly modular design the approach is based on the temporal decoupling of the different tasks of sensor node such that at any time at most one task is active in contrast to traditional sensor network programming slotted programming guarantees that each of these tasks can be implemented as an independent software module simplifying not only the coding and testing phase but also the code reuse in different context in addition we believe that the proposed approach is highly qualified for energy efficient and real time applications to substantiate our claims we have implemented slotos an extension to tinyos that supports slotted programming within this framework we demonstrate the advantages of the slotted programming paradigm
several important aspects of software systems can be expressed as dependencies between their components special class of dependencies concentrates on the program text and captures the technical structure and behavior of the target system the central characteristic making such program dependencies valuable in software engineering environments is that they can be automatically extracted from the program by applying well known methods of programming language implementation we present model of program dependencies by considering them as relations between program elements moreover we show how dependency relations form the basis of producing graph like hypertextual representation of programs for programming environment having general and well defined model of program dependencies as foundation makes it easier to systematically construct and integrate language based tools as an example application we present hypertextual tool which is founded on our relational dependency model and which can be used to maintain programs written in the programming language
fundamental challenge in the design of wireless sensor networks wsns is to maximize their lifetimes especially when they have limited and non replenishable energy supply to extend the network lifetime power management and energy efficient communication techniques at all layers become necessary in this paper we present solutions for the data gathering and routing problem with in network aggregation in wsns our objective is to maximize the network lifetime by utilizing data aggregation and in network processing techniques we particularly focus on the joint problem of optimal data routing with data aggregation en route such that the above mentioned objective is achieved we present grid based routing and aggregator selection scheme grass scheme for wsns that can achieve low energy dissipation and low latency without sacrificing quality grass embodies optimal exact as well as heuristic approaches to find the minimum number of aggregation points while routing data to the base station bs such that the network lifetime is maximized our results show that when compared to other schemes grass improves system lifetime with acceptable levels of latency in data aggregation and without sacrificing data quality
scavenged storage systems harness unused disk space from individual workstations the same way idle cpu cycles are harnessed by desktop grid applications like seti home these systems provide promising low cost high performance storage solution in certain high end computing scenarios however selecting the security level and designing the security mechanisms for such systems is challenging as scavenging idle storage opens the door for security threats absent in traditional storage systems that use dedicated nodes under single administrative domain moreover increased security often comes at the price of performance and scalability this paper develops general threat model for systems that use scavenged storage presents the design of protocol that addresses these threats and is optimized for throughput and evaluates the overheads brought by the new security protocol when configured to provide number of different security properties
recent interests on xml semantic web and web ontology among other topics have sparked renewed interest on graph structured databases fundamental query on graphs is the reachability test of nodes recently hop labeling has been proposed to index large collections of xml and or graphs for efficient reachability tests however there has been few work on updates of hop labeling this is compounded by the fact that web data changes over time in response to these this paper studies the incremental maintenance of hop labeling we identify the main reason for the inefficiency of updates of existing hop labels we propose two updatable hop labelings hybrids of hop labeling and their incremental maintenance algorithms the proposed hop labeling is derived from graph connectivities as opposed to set cover which is used by all previous work our experimental evaluation illustrates the space efficiency and update performance of various kinds of hop labeling the main conclusion is that there is natural way to spare some index size for update performance in hop labeling
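The query side of the 2-hop indexing discussed above is simple enough to sketch: every node u stores a label Lout(u) of hop nodes it can reach and a label Lin(u) of hop nodes that can reach it, chosen so that u reaches v iff the two labels share a hop node. Label construction and the paper's incremental maintenance are not shown; the tiny graph below is a hypothetical example.

    def reaches(u, v, l_out, l_in):
        # u reaches v iff some hop node lies on a path from u to v
        return not l_out[u].isdisjoint(l_in[v])

    # labels for the graph a -> b -> c (nodes include themselves as hops)
    l_out = {"a": {"a", "b"}, "b": {"b"}, "c": {"c"}}
    l_in  = {"a": {"a"}, "b": {"b"}, "c": {"b", "c"}}
    assert reaches("a", "c", l_out, l_in)       # a reaches c through hop node b
    assert not reaches("c", "a", l_out, l_in)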
large portion of the power budget in server environments goes into the subsystem the disk array in particular traditional approaches to disk power management involve completely stopping the disk rotation which can take considerable amount of time making them less useful in cases where idle times between disk requests may not be long enough to outweigh the overheads this paper presents new approach called drpm to modulate disk speed rpm dynamically and gives practical implementation to exploit this mechanism extensive simulations with different workload and hardware parameters show that drpm can provide significant energy savings without compromising much on performance this paper also discusses practical issues when implementing drpm on server disks
we present novel unsupervised learning scheme that simultaneously clusters variables of several types eg documents words and authors based on pairwise interactions between the types as observed in co occurrence data in this scheme multiple clustering systems are generated aiming at maximizing an objective function that measures multiple pairwise mutual information between cluster variables to implement this idea we propose an algorithm that interleaves top down clustering of some variables and bottom up clustering of the other variables with local optimization correction routine focusing on document clustering we present an extensive empirical study of two way three way and four way applications of our scheme using six real world datasets including the news groups ng and the enron email collection our multi way distributional clustering mdc algorithms consistently and significantly outperform previous state of the art information theoretic clustering algorithms
microarchitectural prediction based on neural learning has received increasing attention in recent years however neural prediction remains impractical because its superior accuracy over conventional predictors is not enough to offset the cost imposed by its high latency we present new neural branch predictor that solves the problem from both directions it is both more accurate and much faster than previous neural predictors our predictor improves accuracy by combining path and pattern history to overcome limitations inherent to previous predictors it also has much lower latency than previous neural predictors the result is predictor with accuracy far superior to conventional predictors but with latency comparable to predictors from industrial designs our simulations show that path based neural predictor improves the instructions per cycle ipc rate of an aggressively clocked microarchitecture by percent over the original perceptron predictor one reason for the improved accuracy is the ability of our new predictor to learn linearly inseparable branches we show that these branches account for percent of all branches and almost all branch mispredictions
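A minimal sketch of the perceptron-style prediction the abstract builds on: the prediction is the sign of a dot product between a weight vector selected by the branch address and the recent outcome history, and training only adjusts weights on a misprediction or when the output magnitude is below a threshold. The path-based part of the paper's predictor (folding the path of recent branch addresses into the computation) is omitted, and the threshold constant is a commonly used heuristic, not taken from the abstract.

    class PerceptronPredictor:
        def __init__(self, n_entries=256, hist_len=32, theta=None):
            self.hist_len = hist_len
            self.theta = theta if theta is not None else int(1.93 * hist_len + 14)
            self.tables = [[0] * (hist_len + 1) for _ in range(n_entries)]
            self.history = [1] * hist_len          # +1 = taken, -1 = not taken

        def predict(self, pc):
            w = self.tables[pc % len(self.tables)]
            y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))
            return y >= 0, y                       # prediction and its confidence

        def update(self, pc, taken, y):
            w = self.tables[pc % len(self.tables)]
            t = 1 if taken else -1
            if (y >= 0) != taken or abs(y) <= self.theta:
                w[0] += t
                for i, hi in enumerate(self.history):
                    w[i + 1] += t * hi             # reinforce history bits that agree
            self.history = [t] + self.history[:-1]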
the construction of efficient parallel programs usually requires expert knowledge in the application area and deep insight into the architecture of specific parallel machine often the resulting performance is not portable ie program that is efficient on one machine is not necessarily efficient on another machine with different architecture transformation systems provide more flexible solution they start with specification of the application problem and allow the generation of efficient programs for different parallel machines the programmer has to give an exact specification of the algorithm expressing the inherent degree of parallelism and is released from the low level details of the architecture in this article we propose such transformation system with an emphasis on the exploitation of the data parallelism combined with hierarchically organized structure of task parallelism starting with specification of the maximum degree of task and data parallelism the transformations generate specification of parallel program for specific parallel machine the transformations are based on cost model and are applied in predefined order fixing the most important design decisions like the scheduling of independent multitask activations data distributions pipelining of tasks and assignment of processors to task activations we demonstrate the usefulness of the approach with examples from scientific computing
we consider the problems of containment equivalence satisfiability and query reachability for datalog programs with negation these problems are important for optimizing datalog programs we show that both query reachability and satisfiability are decidable for programs with stratified negation provided that negation is applied only to edb predicates or that all edb predicates are unary in the latter case we show that equivalence is also decidable the algorithms we present can also be used to push constraints from given query to the edb predicates in showing our decidability results we describe powerful tool the query tree which is used for several optimization problems for datalog programs finally we show that satisfiability is undecidable for datalog programs with unary idb predicates stratified negation and the interpreted predicate ne
we study non uniform constraint satisfaction problems definable in monadic datalog stratified by the use of non linearity we show how such problems can be described in terms of homomorphism dualities involving trees of bounded pathwidth and in algebraic terms for this we introduce new parameter for trees that closely approximates pathwidth and can be characterised via hypergraph searching game
hypermedia systems are evolutionary in nature this paper presents sem hp model based on semantic systemic and evolutionary perspective which facilitates the management of hypermedia evolution during the whole lifecycle process following the model the architecture of hypermedia systems can be conceived as composed of three sub systems conceptual presentation and navigation and two abstraction levels the system level and the meta level the first used by the reader and the second by the designer the objective of this division is to achieve good separation of concerns both in the development and in the evolution processes and to obtain better understanding thus facilitating the further development of tools
nominal rewriting is based on the observation that if we add support for equivalence to first order syntax using the nominal set approach then systems with binding including higher order reduction schemes such as calculus beta reduction can be smoothly represented nominal rewriting maintains strict distinction between variables of the object language atoms and of the meta language variables or unknowns atoms may be bound by special abstraction operation but variables cannot be bound giving the framework pronounced first order character since substitution of terms for variables is not capture avoiding we show how good properties of first order rewriting survive the extension by giving an efficient rewriting algorithm critical pair lemma and confluence theorem for orthogonal systems
feature extraction is process that extracts salient features from observed variables it is considered promising alternative to overcome the problems of weight and structure optimization in artificial neural networks there were many nonlinear feature extraction methods using neural networks but they still have the same difficulties arisen from the fixed network topology in this paper we propose novel combination of genetic algorithm and feedforward neural networks for nonlinear feature extraction the genetic algorithm evolves the feature space by utilizing characteristics of hidden neurons it improved remarkably the performance of neural networks on number of real world regression and classification problems
query result clustering has recently attracted lot of attention to provide users with succinct overview of relevant results however little work has been done on organizing the query results for object level search object level search result clustering is challenging because we need to support diverse similarity notions over object specific features such as the price and weight of product of heterogeneous domains to address this challenge we propose hybrid subspace clustering algorithm called hydra algorithm hydra captures the user perception of diverse similarity notions from millions of web pages and disambiguates different senses using feature based subspace locality measures our proposed solution by combining wisdom of crowds and wisdom of data achieves robustness and efficiency over existing approaches we extensively evaluate our proposed framework and demonstrate how to enrich user experiences in object level search using real world product search scenarios
we present renderants the first system that enables interactive reyes rendering on gpus taking renderman scenes and shaders as input our system first compiles renderman shaders to gpu shaders then all stages of the basic reyes pipeline including bounding splitting dicing shading sampling compositing and filtering are executed on gpus using carefully designed data parallel algorithms advanced effects such as shadows motion blur and depth of field can also be rendered in order to avoid exhausting gpu memory we introduce novel dynamic scheduling algorithm to bound the memory consumption during rendering the algorithm automatically adjusts the amount of data being processed in parallel at each stage so that all data can be maintained in the available gpu memory this allows our system to maximize the parallelism in all individual stages of the pipeline and achieve superior performance we also propose multi gpu scheduling technique based on work stealing so that the system can support scalable rendering on multiple gpus the scheduler is designed to minimize inter gpu communication and balance workloads among gpus we demonstrate the potential of renderants using several complex renderman scenes and an open source movie entitled elephants dream compared to pixar’s prman our system can generate images of comparably high quality but is over one order of magnitude faster for moderately complex scenes the system allows the user to change the viewpoint lights and materials while producing photorealistic results at interactive speed
we introduce novel feature size for bounded planar domains endowed with an intrinsic metric given point in such domain the homotopy feature size of at or hfs for short measures half the length of the shortest loop through that is not null homotopic in the resort to an intrinsic metric makes hfs rather insensitive to the local geometry of in contrast with its predecessors local feature size weak feature size homology feature size this leads to reduced number of samples that still capture the topology of under reasonable sampling conditions involving hfs we show that the geodesic delaunay triangulation dx of finite sampling of is homotopy equivalent to moreover dx is sandwiched between the geodesic witness complex cwx and relaxed version cwx defined by parameter taking advantage of this fact we prove that the homology of dx and hence of can be retrieved by computing the persistent homology between cwx and cwx we propose algorithms for estimating hfs selecting landmark set of sufficient density building its geodesic delaunay triangulation and computing the homology of using cwx we also present some simulation results in the context of sensor networks that corroborate our theoretical statements
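The defining quantity can be written compactly. Writing X for the bounded planar domain with its intrinsic metric and x for a point of X (generic symbols standing in for the ones elided above), the definition stated in the abstract reads:

    \mathrm{hfs}(x) \;=\; \tfrac{1}{2}\,
    \inf\bigl\{\, \mathrm{length}(\gamma) \;:\; \gamma \text{ a loop through } x
    \text{ that is not null homotopic in } X \,\bigr\}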
in this paper we propose framework for analyzing the flow of values and their reuse in loop nests to minimize data traffic under the constraints of limited on chip memory capacity and dependences our analysis first undertakes fusion of possible loop nests intra procedurally and then performs loop distribution the analysis discovers the closeness factor of two statements which is quantitative measure of data traffic saved per unit memory occupied if the statements were under the same loop nest over the case where they are under different loop nests we then develop greedy algorithm which traverses the program dependence graph pdg to group statements together under the same loop nest legally to promote maximal reuse per unit of memory occupied we implemented our framework in petit tool for dependence analysis and loop transformations we compared our method with one based on tiling of fused loop nest and one based on greedy strategy to purely maximize reuse we show that our methods work better than both of these strategies in most cases for processors such as tmscxx which have very limited amount of on chip memory the improvements in data range from to percent over tiling and from to percent over maximal reuse for jpeg loops
crossed cubes are an important class of hypercube variants this paper addresses how to embed family of disjoint meshes into crossed cube two major contributions of this paper are for family of two disjoint meshes of size xx can be embedded in an crossed cube with unit dilation and unit expansion and for family of four disjoint meshes of size xx can be embedded in an crossed cube with unit dilation and unit expansion these results mean that family of two or four mesh structured parallel algorithms can be executed on same crossed cube efficiently and in parallel our work extends the results recently obtained by fan and jia fan jia embedding meshes into crossed cubes information sciences
mashup is new application development approach that allows users to aggregate multiple services to create service for new purpose even if the mashup approach opens new and broader opportunities for data service consumers the development process still requires the users to know not only how to write code using programming languages but also how to use the different web apis from different services in order to solve this problem there is increasing effort put into developing tools which are designed to support users with little programming knowledge in mashup applications development the objective of this study is to analyze the strengths and weaknesses of the mashup tools with respect to the data integration aspect
multidimensional semistructured data mssd are semistructured data that present different facets under different contexts context represents alternative worlds and is expressed by assigning values to set of user defined variables called dimensions the notion of context has been incorporated in the object exchange model oem and the extended model is called multidimensional oem moem graph model for mssd in this paper we explain in detail how moem can represent the history of an oem database we discuss how moem properties are applied in the case of representing oem histories and show that temporal oem snapshots can be obtained from moem we present system that implements the proposed ideas and we use an example scenario to demonstrate how an underlying moem database accommodates changes in an oem database furthermore we show that moem is capable of modeling changes occurring not only in oem databases but in multidimensional oem databases as well the use of multidimensional query language mql query language for mssd is proposed for querying the history of oem databases and moem databases
to satisfy potential customers of web site and to lead them to the goods offered by the site one should support them in the course of navigation they have embarked on this paper presents the tool stratdyn developed as an add on module to the web usage miner wum wum not only discovers frequent sequences but it also allows the inspection of the different paths through the site stratdyn extends these capabilities it tests differences between navigation patterns described by number of measures of success and strategy for statistical significance this can help to single out the relevant differences between users behaviors and it can determine whether change in the site’s design has had the desired effect stratdyn also exploits the site’s semantics in the classification of navigation behavior and in the visualization of results displaying navigation patterns as alternative paths through strategy space this helps to understand the web logs and to communicate analysis results to non experts two case studies investigate search in an online catalog and interaction with an electronic shopping agent in an online store they show how the results of analysis can lead to proposals for improving web site these highlight the importance of investigating measures not only of eventual success but also of process to help users navigate towards the site’s offers
sequential pattern mining from sequence databases has been recognized as an important data mining problem with various applications items in sequence database can be organized into concept hierarchy according to taxonomy based on the hierarchy sequential patterns can be found not only at the leaf nodes individual items of the hierarchy but also at higher levels of the hierarchy this is called multiple level sequential pattern mining in previous research taxonomies based on crisp relationships between any two disjointed levels however cannot handle the uncertainties and fuzziness in real life for example tomatoes could be classified into the fruit category but could be also regarded as the vegetable category to deal with the fuzzy nature of taxonomy chen and huang developed novel knowledge discovering model to mine fuzzy multi level sequential patterns where the relationships from one level to another can be represented by value between and in their work generalized sequential patterns gsp like algorithm was developed to find fuzzy multi level sequential patterns this algorithm however faces difficult problem since the mining process may have to generate and examine huge set of combinatorial subsequences and requires multiple scans of the database in this paper we propose new efficient algorithm to mine this type of pattern based on the divide and conquer strategy in addition another efficient algorithm is developed to discover fuzzy cross level sequential patterns since the proposed algorithm greatly reduces the candidate subsequence generation efforts the performance is improved significantly experiments show that the proposed algorithm is much more efficient and scalable than the previous one in mining real life databases our works enhance the model’s practicability and could promote more applications in business
in this paper we examine how singular value decomposition svd along with demographic information can enhance plain collaborative filtering cf algorithms after brief introduction to svd where some of its previous applications in recommender systems are revisited we proceed with full description of our proposed method which utilizes svd and demographic data at various points of the filtering procedure in order to improve the quality of the generated predictions we test the efficiency of the resulting approach on two commonly used cf approaches user based and item based cf the experimental part of this work involves number of variations of the proposed approach the results show that the combined utilization of svd with demographic data is promising since it does not only tackle some of the recorded problems of recommender systems but also assists in increasing the accuracy of systems employing it
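To make the combination concrete, here is a minimal, hypothetical sketch of SVD-based rating prediction with a demographic fallback for users who have no ratings yet; it is not the authors' exact pipeline, and the rank k, the mean-filling of missing entries, and the helper names are assumptions of this sketch.

```python
import numpy as np

def svd_predict(ratings, k=2):
    """Low-rank reconstruction of a user-item rating matrix.
    Missing entries (np.nan) are filled with the item mean before the SVD."""
    item_means = np.nanmean(ratings, axis=0)
    filled = np.where(np.isnan(ratings), item_means, ratings)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # keep only the k strongest latent factors
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def demographic_fallback(ratings, groups, user, item):
    """Average rating of the user's demographic group for the item,
    used when the target user has no rating history at all."""
    peers = [u for u in range(ratings.shape[0])
             if groups[u] == groups[user] and not np.isnan(ratings[u, item])]
    return (np.nanmean([ratings[u, item] for u in peers]) if peers
            else np.nanmean(ratings[:, item]))

# toy data: 4 users x 3 items, np.nan marks unknown ratings
R = np.array([[5, 3, np.nan],
              [4, np.nan, 1],
              [1, 1, 5],
              [np.nan, np.nan, np.nan]])   # cold-start user
groups = [0, 0, 1, 0]                       # e.g. age band / gender bucket

approx = svd_predict(R, k=2)
print("svd prediction for user 0, item 2:", round(approx[0, 2], 2))
print("demographic fallback for user 3, item 0:", demographic_fallback(R, groups, 3, 0))
```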
building verified compilers is difficult especially when complex analyses such as type checking or data flow analysis must be performed both the type checking and program optimization communities have developed methods for proving the correctness of these processes and developed tools for using respectively verified type systems and verified optimizations however it is difficult to use both of these analyses in single declarative framework since these processes work on different program representations type checking on abstract syntax trees and data flow analysis based optimization on control flow or program dependency graphs we present an attribute grammar specification language that has been extended with constructs for specifying attribute labelled control flow graphs and both ctl and ltl fv formulas that specify data flow analyses these formulas are model checked on these graphs to perform the specified analyses thus verified type rules and verified data flow analyses verified either by hand or with automated proof tools can both be transcribed into single declarative framework based on attribute grammars to build high confidence language implementations also the attribute grammar specification language is extensible so that it is relatively straightforward to add new constructs for different temporal logics so that alternative logics and model checkers can be used to specify data flow analyses in this framework
service based applications sbas increasingly have to become adaptive in order to operate and evolve in highly dynamic environments research on sbas thus has already produced range of adaptation techniques and strategies however adaptive sbas are prone to specific failures that would not occur in static applications examples are faulty adaptation behaviours due to changes not anticipated during design time or conflicting adaptations due to concurrently occurring events for adaptive sbas to become reliable and thus applicable in practice novel techniques that ensure the correctness of adaptations are needed to pave the way towards those novel techniques this paper identifies different kinds of adaptation specific failures based on classification of existing adaptation approaches and generic correctness assurance techniques we discuss how adaptation specific failures can be addressed and where new advanced techniques for correctness assurance of adaptations are required
the typical mode for querying in an image content based information system is query by example which allows the user to provide an image as query and to search for similar images ie the nearest neighbors based on one or combination of low level multidimensional features of the query example off line this requires the time consuming pre computing of the whole set of visual descriptors over the image database on line one major drawback is that multidimensional sequential nn search is usually exhaustive over the whole image set faced with a user who has very limited patience in this paper we propose technique for improving the performance of image query by example execution strategies over multiple visual features this includes first the pre clustering of the large image database and then the scheduling of the processing of the feature clusters before providing progressively the query results ie intermediate results are sent continuously before the end of the exhaustive scan over the whole database cluster eligibility criterion and two filtering rules are proposed to select the most relevant clusters to query by example experiments over more than images and five mpeg global features show that our approach significantly reduces the query time in two experimental cases the query time is divided by for clusters per descriptor type and by for clusters per descriptor type compared to blind sequential nn search while keeping the same final query result this constitutes promising perspective for optimizing image query by example execution
malware is at the root of large number of information security breaches despite widespread effort devoted to combating malware current techniques have proven to be insufficient in stemming the incessant growth in malware attacks in this paper we describe tool that exploits combination of virtualized isolated execution environments and dynamic binary instrumentation dbi to detect malicious software and prevent its execution we define two isolated environments testing environment wherein an untrusted program is traced during execution using dbi and subjected to rigorous checks against extensive security policies that express behavioral patterns of malicious software and ii real environment wherein program is subjected to run time monitoring using behavioral model in place of the security policies along with continuous learning process in order to prevent non permissible behavior we have evaluated the proposed methodology on both linux and windows xp operating systems using several virus benchmarks as well as obfuscated versions thereof experiments demonstrate that our approach achieves almost complete coverage for original and obfuscated viruses average execution times go up to and in the testing and real environments respectively the high overhead imposed in the testing environment does not create severe impediment since it occurs only once and is transparent to the user users are only affected by the overhead imposed in the real environment we believe that our approach has the potential to improve on the state of the art in malware detection offering improved accuracy with low performance penalty
we present an efficient method for modeling multi threaded concurrent systems with shared variables and locks in bounded model checking bmc and use it to improve the detection of safety properties such as data races previous approaches based on synchronous modeling of interleaving semantics do not scale up well due to the inherent asynchronism in those models instead in our approach we first create independent uncoupled models for each individual thread in the system then explicitly add additional synchronization variables and constraints incrementally and only where such synchronization is needed to guarantee the chosen concurrency semantics based on sequential consistency we describe our modeling in detail and report verification results to demonstrate the efficacy of our approach on complex case study
the aim of this research is to investigate the role of social networks in computer science education the internet shows great potential for enhancing collaboration between people and the role of social software has become increasingly relevant in recent years this research focuses on analyzing the role that social networks play in students learning experiences the construction of students social networks the evolution of these networks and their effects on the students learning experience in university environment are examined
number of concerns in multiagent systems mas have broadly scoped impact on the system architectural decomposition which in turn hinder the design of modular mas architectures typical examples of crosscutting concerns in mas architectures include learning mobility coordination and autonomy nowadays there are some architectural proposals that envisage an emerging aspect oriented architectural pattern as potential solution to address modularity shortcomings of conventional architectural patterns for mas designs however little effort has been dedicated to effectively assess when and which of these emerging and traditional architectural solutions promote in fact superior modularity in the presence of crosscutting mas concerns this paper presents quantitative comparison between aspect oriented and conventional mas architectures our analysis evaluates how the architectures under comparison support the promotion of enhanced modularity in the presence of architectural crosscutting concerns in mas design our evaluation used two medium sized mas applications and was centred on fundamental modularity attributes
database users can be frustrated by having an empty answer to query in this paper we propose framework to systematically relax queries involving joins and selections when considering relaxing query condition intuitively one seeks the minimal amount of relaxation that yields an answer we first characterize the types of answers that we return to relaxed queries we then propose lattice based framework in order to aid query relaxation nodes in the lattice correspond to different ways to relax queries we characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer we then discuss how to traverse this lattice in way that non empty query answer is obtained with the minimum amount of query condition relaxation we implemented this framework and we present our results of thorough performance evaluation using real and synthetic data our results indicate the practical utility of our framework
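A minimal sketch of the lattice idea, under the simplifying assumption that relaxing a condition means dropping it entirely: level k of the lattice relaxes exactly k of the original selection predicates, so a level-by-level traversal returns the first non-empty answer with the minimum amount of relaxation. The predicate encoding and the evaluate helper are illustrative, not the paper's.

```python
from itertools import combinations

def relax_search(predicates, evaluate):
    """Traverse the relaxation lattice bottom-up: level k drops exactly k
    predicates, so the first non-empty answer uses minimal relaxation."""
    n = len(predicates)
    for k in range(n + 1):                       # lattice levels
        for dropped in combinations(range(n), k):
            kept = [p for i, p in enumerate(predicates) if i not in dropped]
            rows = evaluate(kept)                # run the query with kept predicates
            if rows:
                return dropped, rows             # minimal relaxation with an answer
    return None, []

# toy relation and selection predicates
table = [{"price": 90, "rating": 4}, {"price": 150, "rating": 5}]
preds = [lambda r: r["price"] <= 80, lambda r: r["rating"] >= 4]
evaluate = lambda kept: [r for r in table if all(p(r) for p in kept)]

dropped, answer = relax_search(preds, evaluate)
print("relaxed predicate indices:", dropped, "answer:", answer)
```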
the experiment described in this paper shows test environment constructed with two information spaces one large with nodes ordered in semi structured groups in which participants performed search and browse tasks the other was smaller and designed for precision zooming where subjects performed target selection simulation tasks for both tasks modes of gaze and mouse controlled navigation were compared the results of the browse and search tasks showed that the performances of the most efficient mouse and gaze implementations were indistinguishable however in the target selection simulation tasks the most efficient gaze control proved to be about faster than the most efficient mouse control the results indicate that gaze controlled pan zoom navigation is viable alternative to mouse control in inspection and target exploration of large multi scale environments however supplementing mouse control with gaze navigation also holds interesting potential for interface and interaction design
modeling tasks such as surface deformation and editing can be analyzed by observing the local behavior of the surface we argue that defining modeling operation by asking for rigidity of the local transformations is useful in various settings such formulation leads to non linear yet conceptually simple energy formulation which is to be minimized by the deformed surface under particular modeling constraints we devise simple iterative mesh editing scheme based on this principle that leads to detail preserving and intuitive deformations our algorithm is effective and notably easy to implement making it attractive for practical modeling applications
private matching between datasets owned by distinct parties is challenging problem with several applications private matching allows two parties to identify the records that are close to each other according to some distance functions such that no additional information other than the join result is disclosed to any party private matching can be solved securely and accurately using secure multi party computation smc techniques but such an approach is prohibitively expensive in practice previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking ie filtering out sub sets of records that cannot be part of the join result this way smc is applied only to small fraction of record pairs reducing the matching cost to acceptable levels the blocking step is essential for the privacy accuracy and efficiency of matching however the state of the art focuses on sanitization based on anonymity which does not provide sufficient privacy we propose an alternative design centered on differential privacy novel paradigm that provides strong privacy guarantees the realization of the new model presents difficult challenges such as the evaluation of distance based matching conditions with the help of only statistical queries interface specialized versions of data indexing structures eg kd trees also need to be devised in order to comply with differential privacy experiments conducted on the real world census income dataset show that although our methods provide strong privacy their effectiveness in reducing matching cost is not far from that of anonymity based counterparts
the differential evolution de algorithm is simple and efficient evolutionary algorithm that has been applied to solve many optimization problems mainly in continuous search domains in the last few years many implementations of multi objective versions of de have been proposed in the literature combining the traditional differential mutation operator as the variation mechanism and some form of pareto ranking based fitness in this paper we propose the utilization of the differential mutation operator as an additional operator to be used within any multi objective evolutionary algorithm that employs an archive offline population the operator is applied for improving the high quality solutions stored in the archive working both as local search operator and diversity operator depending on the points selected to build the differential mutation in order to illustrate the use of the operator it is coupled with the nsga ii and the multi objective de mode showing promising results
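For reference, the differential mutation operator combines three distinct solutions as v = a + F(b - c); the sketch below applies it to members of an archive, with the scale factor F, the box bounds, and the random choice of the three points being assumptions of this illustration rather than the paper's settings.

```python
import random

def differential_mutation(archive, F=0.5, lo=0.0, hi=1.0):
    """Classic differential mutation v = a + F * (b - c) over three distinct
    archive members; the trial vector is clipped to the box bounds."""
    a, b, c = random.sample(archive, 3)
    return [max(lo, min(hi, ai + F * (bi - ci)))
            for ai, bi, ci in zip(a, b, c)]

# toy archive of 2-dimensional non-dominated solutions
archive = [[0.1, 0.9], [0.4, 0.5], [0.8, 0.2], [0.6, 0.7]]
print(differential_mutation(archive))
```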
in concurrent system with processes vector clocks of size are used for tracking dependencies between the events using vectors of size leads to scalability problems moreover association of components with processes makes vector clocks cumbersome and inefficient for systems with dynamic number of processes we present class of logical clock algorithms called chain clock for tracking dependencies between relevant events based on generalizing process to any chain in the computation poset chain clocks are generally able to track dependencies using fewer than components and also adapt automatically to systems with dynamic number of processes we compared the performance of dynamic chain clock dcc with vector clock for multithreaded programs in java with of total events being relevant events dcc requires times fewer components than vector clock and the timestamp traces are smaller by factor of for the same case although dcc requires shared data structures it is still times faster than vector clock in our experiments we also study the class of chain clocks which perform optimally for posets of small width and show that single algorithm cannot perform optimally for posets of small width as well as large width
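As background, the baseline the paper improves upon is the O(n) vector clock, one component per process; a minimal sketch is given below (chain clocks replace the per-process component with a component per chain of the computation poset, which this sketch does not implement).

```python
class VectorClock:
    """Baseline vector clock: one component per process, incremented on
    local relevant events and merged component-wise on message receipt."""
    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def local_event(self):
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        self.local_event()
        return list(self.v)          # timestamp piggybacked on the message

    def receive(self, msg_v):
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.local_event()

def happened_before(u, v):
    """u -> v iff u <= v component-wise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
e1 = p0.send()
p1.receive(e1)
e2 = p1.local_event()
print(happened_before(e1, e2))   # True: e1 causally precedes e2
```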
in large organizations the administration of access privileges such as the assignment of access rights to user in particular role is handled cooperatively through distributed administrators in various different capacities quorum may be necessary or veto may be possible for such decision in this paper we present two major contributions we develop role based access control rbac approach for specifying distributed administration requirements and procedures between administrators or administration teams extending earlier work on distributed modular authorization while comprehensive specification in such language is conceivable it would be quite tedious to evaluate or analyze their operational aspects and properties in practice for this reason we create new class of extended petri nets called administration nets adm nets such that any rbac specification of cooperative administration requirements given in terms of predicate logic formulas can be embedded into an adm net this net behaves within the constraints specified by the logical formulas and at the same time it explicitly exhibits all needed operational details such as allowing for an efficient and comprehensive formal analysis of administrative behavior we introduce the new concepts and illustrate their use in several examples while adm nets are much more refined and behaviorally explicit than workflow systems our work provides for constructive step towards novel workflow management tools as well we demonstrate the usefulness of adm nets by modeling typical examples of administration processes concerned with sets of distributed authorization rules
we propose frameworks and algorithms for identifying communities in social networks that change over time communities are intuitively characterized as unusually densely knit subsets of social network this notion becomes more problematic if the social interactions change over time aggregating social networks over time can radically misrepresent the existing and changing community structure instead we propose an optimization based approach for modeling dynamic community structure we prove that finding the most explanatory community structure is np hard and apx hard and propose algorithms based on dynamic programming exhaustive search maximum matching and greedy heuristics we demonstrate empirically that the heuristics trace developments of community structure accurately for several synthetic and real world examples
in this paper we describe the vision behind the unified activity management project at ibm research in particular we describe and discuss activities activity centered computing and activity patterns and illustrate the potential impact of this approach and its value to individuals teams and the enterprise we discuss business activities and their integration into the development of business processes we share insights from user studies and feedback from customers on the benefits of the activity model in variety of business settings
in this paper we argue that hci practitioners are facing new challenges in design and evaluation that can benefit from the establishment of commonly valued use qualities with associated strategies for producing and rigorously evaluating work we present particular use quality suppleness as an example we describe ways that use qualities can help shape design and evaluation process and propose tactics for the chi community to use to encourage the evolution of bodies of knowledge around use qualities
new approach for ensuring the security of mobile code is proposed our approach enables mobile code consumer to understand and formally reason about what piece of mobile code can do check if the actions of the code are compatible with his her security policies and if so execute the code the compatibility checking process is automated but if there are conflicts consumers have the opportunity to refine their policies taking into account the functionality provided by the mobile code finally when the code is executed our framework uses runtime monitoring techniques to ensure that the code does not violate the consumer’s refined policies at the heart of our method which we call model carrying code mcc is the idea that piece of mobile code comes equipped with an expressive yet concise model of the code’s security relevant behavior the generation of such models can be automated mcc enjoys several advantages over current approaches to mobile code security it protects consumers of mobile code from malicious or faulty code without unduly restricting the code’s functionality also it is applicable to the vast majority of code that exists today which is written in or this contrasts with previous approaches such as java security and proof carrying code which are either language specific or are limited to type safe languages finally mcc can be combined with existing techniques such as cryptographic signing and proof carrying code to yield additional benefits
the multidimensional md modeling which is the foundation of data warehouses dws md databases and on line analytical processing olap applications is based on several properties different from those in traditional database modeling in the past few years there have been some proposals providing their own formal and graphical notations for representing the main md properties at the conceptual level however unfortunately none of them has been accepted as standard for conceptual md modeling in this paper we present an extension of the unified modeling language uml using uml profile this profile is defined by set of stereotypes constraints and tagged values to elegantly represent main md properties at the conceptual level we make use of the object constraint language ocl to specify the constraints attached to the defined stereotypes thereby avoiding an arbitrary use of these stereotypes we have based our proposal in uml for two main reasons uml is well known standard modeling language known by most database designers thereby designers can avoid learning new notation and ii uml can be easily extended so that it can be tailored for specific domain with concrete peculiarities such as the multidimensional modeling for data warehouses moreover our proposal is model driven architecture mda compliant and we use the query view transformation qvt approach for an automatic generation of the implementation in target platform throughout the paper we will describe how to easily accomplish the md modeling of dws at the conceptual level finally we show how to use our extension in rational rose for md modeling
high performance superscalar architectures used to exploit instruction level parallelism in single thread applications have become too complex and too power hungry for the many core processors era we propose new architecture that uses multiple latency tolerant in order cores to improve single thread performance without requiring complex out of order execution hardware or large power hungry register files and instruction buffers using simple cores to provide improved single thread performance for conventional difficult to parallelize applications allows designers to place many more of these cores on the same die consequently emerging highly parallel applications can take full advantage of the many core parallel hardware without sacrificing performance of inherently serial applications our architecture splits single thread program execution into disjoint control and data threads that execute concurrently on multiple latency tolerant in order cores hence we call this style of execution disjoint out of order execution doe doe is novel implementation of speculative multithreading spmt it uses latency tolerance to overcome performance issues of spmt caused by load imbalance and inter thread data communication delays using control independence prediction hardware to spawn threads we simulate the potential performance of doe on subset of spec integer benchmarks under various parallelism scenarios and for doe configurations of and single issue latency tolerant cores
since data archiving in sensor networks is communication intensive application careful power management of communication is of critical importance for such networks an example is fps an adaptive power scheduling algorithm that combines slotted scheduling with csma mac in this paper we first propose new global power scheduling protocol called multi flow power scheduling mps that delivers more data and consumes less energy than existing power scheduling protocols mps sets up transmission schedule through standard data aggregation and dissemination operations however since it creates global schedule it does not scale to large networks we then present new power scheduling protocol called hybrid power scheduling hps that retains the scalability of fps while maintaining the energy efficiency and high data delivery rate of mps in thorough simulation study we compare hps and mps and our results show the efficacy of hps
the goal of this roadmap paper is to summarize the state of the art and to identify critical challenges for the systematic software engineering of self adaptive systems the paper is partitioned into four parts one for each of the identified essential views of self adaptation modelling dimensions requirements engineering and assurances for each view we present the state of the art and the challenges that our community must address this roadmap paper is result of the dagstuhl seminar on software engineering for self adaptive systems which took place in january
today the current state of the art in querying xml data is represented by xpath and xquery both of which rely on boolean conditions for node selection boolean selection is too restrictive when users do not use or even know the data structure precisely eg when queries are written based on summary rather than on schema in this paper we describe xml querying framework called fuzzyxpath based on fuzzy set theory which relies on fuzzy conditions for the definition of flexible constraints on stored data function called deep similar is introduced to replace xpath’s typical deep equal function the main goal is to provide degree of similarity between two xml trees assessing whether they are similar both structure wise and content wise several query examples are discussed in the field of xml based metadata for learning
shape from silhouettes algorithms use either surface or volumetric approach problem with the volumetric approach is that in general we know neither the accuracy of the reconstruction nor where to locate new viewpoints for improving the accuracy in this paper we present general approach to interactive object specific volumetric algorithms based on necessary condition for the best possible reconstruction to have been performed the outlined approach can be applied to any class of objects as an example of this approach an interactive algorithm has been implemented for convex polyhedra
this paper presents novel method for surface reconstruction that uses polarization and shading information from two views the method relies on polarization data acquired using standard digital camera and linear polarizer fresnel theory is used to process the raw images and to obtain initial estimates of surface normals assuming that the reflection type is diffuse based on this idea the paper presents two novel contributions to the problem of surface reconstruction the first is technique to enhance the surface normal estimates by incorporating shading information into the method this is done using robust statistics to estimate how the measured pixel brightnesses depend on the surface orientation this gives an estimate of the object material reflectance function which is used to refine the estimates of the surface normals the second contribution is to use the refined estimates to establish correspondence between two views of an object to do this set of patches are extracted from each view and are aligned by minimizing an energy functional based on the surface normal estimates and local topographic properties the optimum alignment parameters for different patch pairs are then used to establish stereo correspondence this process results in an unambiguous field of surface normals which can be integrated to recover the surface depth our technique is most suited to smooth non metallic surfaces it complements existing stereo algorithms since it does not require salient surface features to obtain correspondences an extensive set of experiments yielding reconstructed objects and reflectance functions are presented and compared to ground truth
most programming languages support call stack in the programming model and also in the runtime system we show that for applications targeting low power embedded microcontrollers mcus ram usage can be significantly decreased by partially or completely eliminating the runtime call stack we present flattening transformation that absorbs function into its caller replacing function invocations and returns with jumps unlike inlining flattening does not duplicate the bodies of functions that have multiple callsites applied aggressively flattening results in stack elimination flattening is most useful in conjunction with lifting transformation that moves global variables into local scope flattening and lifting can save ram however even more benefit can be obtained by adapting the compiler to cope with properties of flattened code first we show that flattening adds false paths that confuse standard live variables analysis the resulting problems can be mitigated by breaking spurious live range conflicts between variables using information from the unflattened callgraph second we show that the impact of high register pressure due to flattened and lifted code and consequent spills out of the register allocator can be mitigated by improving compiler’s stack layout optimizations we have implemented both of these improvements in gcc and have implemented flattening and lifting as source to source transformations on collection of applications for the avr family of bit mcus we show that total ram usage can be reduced by by compiling flattened and lifted programs with our improved gcc
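The flattening idea (absorbing a callee into its caller and replacing call and return with jumps, without duplicating the body per call site) can be illustrated conceptually; the Python state machine below is only a language-neutral illustration of ours, not the C-level, GCC-based transformation described above.

```python
# original structure: two call sites into the same helper, each call
# pushing a new frame on the runtime call stack
def scale(x, factor):
    return x * factor

def process(a, b):
    a2 = scale(a, 2)      # call site 1
    b2 = scale(b, 3)      # call site 2
    return a2 + b2

# "flattened" structure: the helper body is absorbed once into the caller;
# an explicit return-site label replaces call/return, so no extra frame is
# created and (unlike inlining) the body is not duplicated per call site
def process_flat(a, b):
    state = "call1"
    while True:
        if state == "call1":
            arg, factor, ret = a, 2, "after1"
            state = "scale"
        elif state == "call2":
            arg, factor, ret = b, 3, "after2"
            state = "scale"
        elif state == "scale":          # single copy of the helper body
            result = arg * factor
            state = ret                 # "return" is a jump to the saved label
        elif state == "after1":
            a2 = result
            state = "call2"
        elif state == "after2":
            b2 = result
            return a2 + b2

assert process(4, 5) == process_flat(4, 5) == 23
```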
supervised learning algorithms usually require high quality labeled training set of large volume it is often expensive to obtain such labeled examples in every domain of an application domain adaptation aims to help in such cases by utilizing data available in related domains however transferring knowledge from one domain to another is often non trivial due to different data distributions among the domains moreover it is usually very hard to measure and formulate these distribution differences hence we introduce new concept of label relation function to transfer knowledge among different domains without explicitly formulating the data distribution differences novel learning framework domain transfer risk minimization dtrm is proposed based on this concept dtrm simultaneously minimizes the empirical risk for the target and the regularized empirical risk for source domain under this framework we further derive generic algorithm called domain adaptation by label relation dalr that is applicable to various applications in both classification and regression settings dalr iteratively updates the target hypothesis function and outputs for the source domain until it converges we provide an in depth theoretical analysis of dtrm and establish fundamental error bounds we also experimentally evaluate dalr on the task of ranking search results using real world data our experimental results show that the proposed algorithm effectively and robustly utilizes data from source domains under various conditions different sizes for source domain data different noise levels for source domain data and different difficulty levels for target domain data
this paper addresses the problem of determining the node locations in ad hoc sensor networks when only connectivity information is available in previous work we showed that the localization algorithm mds map proposed by shang et al is able to localize sensors up to bounded error decreasing at rate inversely proportional to the radio range the main limitation of mds map is the assumption that the available connectivity information is processed in centralized way in this work we investigate practically important question whether similar performance guarantees can be obtained in distributed setting in particular we analyze the performance of the hop terrain algorithm proposed by savarese et al this algorithm can be seen as distributed version of the mds map algorithm more precisely assume that the radio range and that the network consists of sensors positioned randomly on dimensional unit cube and anchors in general positions we show that when only connectivity information is available for every unknown node the euclidean distance between the estimate xi and the correct position xi is bounded by xi xi where cd log for some constant cd which only depends on furthermore we illustrate that similar bound holds for the range based model when the approximate measurement for the distances is provided
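As background for this analysis, MDS-MAP-style localization applies classical multidimensional scaling to shortest-path hop counts computed from connectivity alone; the sketch below shows that centralized core step (anchor alignment and the distributed hop-terrain refinement analyzed in the paper are omitted, and the toy network is an assumption).

```python
import numpy as np

def hop_counts(adj):
    """All-pairs shortest-path hop counts via Floyd-Warshall on the
    connectivity graph (1 = within radio range)."""
    n = len(adj)
    d = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(n):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def classical_mds(d, dim=2):
    """Classical MDS: double-center the squared distance matrix and keep
    the top 'dim' eigenvectors as relative coordinates."""
    n = len(d)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (d ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# toy 4-node chain network 0-1-2-3 given only by connectivity
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
print(np.round(classical_mds(hop_counts(adj)), 2))
```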
reactive systems proposed by leifer and milner represent meta framework aimed at deriving behavioral congruences for those specification formalisms whose operational semantics is provided by rewriting rules despite its applicability reactive systems suffered so far from two main drawbacks first of all no technique was found for recovering set of inference rules eg in the so called sos style for describing the distilled observational semantics most importantly the efforts focused on strong bisimilarity tackling neither weak nor barbed semantics our paper addresses both issues instantiating them on calculus whose semantics is still in flux cardelli and gordon’s mobile ambients while the solution to the first issue is tailored over our case study we provide general framework for recasting weak barbed equivalence in the reactive systems formalism moreover we prove that our proposal captures the behavioural semantics for mobile ambients proposed by rathke and sobociński and by merro and zappa nardelli
this paper aims to reduce the difference between sketches and photos by synthesizing sketches from photos and vice versa and then performing sketch sketch photo photo recognition with subspace learning based methods pseudo sketch pseudo photo patches are synthesized with embedded hidden markov model because these patches are assembled by averaging their overlapping area in most of the local strategy based methods which leads to blurring effect in the resulting pseudo sketch pseudo photo we integrate the patches with image quilting experiments are carried out to demonstrate that the proposed method is effective in producing pseudo sketch pseudo photo with high quality and achieving promising recognition results
the increasing amount of traditional network services may still not fulfil the requirements of ever demanding applications and must therefore be enriched by some form of increased intelligence in the network this is where the promise of autonomous systems comes into play autonomous systems are capable of performing activities by taking into account the local environment and adapting to it no planning is required therefore autonomous systems must optimise the use of the resources at hand this paper clearly identifies the need for autonomous systems in networking research anticipated architectures characteristics and properties the path of evolution from traditional network elements and the future of such systems
in this paper we present new technique based on strain fields to carry out shape morphing for applications in computer graphics and related areas strain is an important geometric quantity used in mechanics to describe the deformation of objects we apply it in novel way to analyze and control deformation in morphing using position vector fields the strain field relating source and target shapes can be obtained by interpolating this strain field between zero and final desired value we can obtain the position field for intermediate shapes this method ensures that the morphing process is smooth locally volumes suffer minimal distortion and no shape jittering or wobbling happens other methods do not necessarily have these desirable properties we also show how to control the method so that changes of shape in particular size changes vary linearly with time
in this paper we present semi automatic approach to efficiently and robustly recover the characteristic feature curves of given free form surface the technique supports sketch based interface where the user just has to roughly sketch the location of feature by drawing stroke directly on the input mesh the system then snaps this initial curve to the correct position based on graph cut optimization scheme that takes various surface properties into account additional position constraints can be placed and modified manually which allows for an interactive feature curve editing functionality we demonstrate the usefulness of our technique by applying it to practical problem scenario in reverse engineering here we consider the problem of generating statistical pca shape model for car bodies the crucial step is to establish proper feature correspondences between large number of input models due to the significant shape variation fully automatic techniques are doomed to failure with our simple and effective feature curve recovery tool we can quickly sketch set of characteristic features on each input model which establishes the correspondence to pre defined template mesh and thus allows us to generate the shape model finally we can use the feature curves and the shape model to implement an intuitive modeling metaphor to explore the shape space spanned by the input models
general haptic interaction with solid models requires an underlying physically based model that can generate in real time the forces and deformations to be rendered as result of user interaction in order to allow for rich set of interactions the physical model must support real time topological modifications including the embedding of new elements in the model and the introduction of cuts in the geometry in this paper we describe and demonstrate physically based framework for real time interaction with solid models discretized by finite elements we present model formulation that allows for fast progressive updates to be used in modeling the addition of new elements as well as dynamic inter and intra element changes in model connectivity our motivating applications have been in the area of open suturing simulations where cutting through skin and tissue undermining skin to separate it from the underlying soft tissue addition of sutures to close wounds and manipulation using multiple surgical instruments simultaneously are all tasks that must be supported we show new surgical simulator we recently developed to demonstrate the framework
currently paradigms such as component based software development and service oriented software architectures promote modularization of software systems into highly decoupled and reusable software components and services in addition to improve manageability and evolvability software systems are extended with management capabilities and self managed behavior because of their very nature these self managed software systems often are mission critical and highly available in this paper we focus on the complexity of preserving correctness in modularized self managed systems we discuss the importance of consistent software compositions in the context of self managed systems and the need for correctness preserving adaptation process we also give flavor of possible approaches for preserving correctness and conclude with some remarks and open questions
classifying nodes in networks is task with wide range of applications it can be particularly useful in anomaly and fraud detection many resources are invested in the task of fraud detection due to the high cost of fraud and being able to automatically detect potential fraud quickly and precisely allows human investigators to work more efficiently many data analytic schemes have been put into use however schemes that bolster link analysis prove promising this work builds upon the belief propagation algorithm for use in detecting collusion and other fraud schemes we propose an algorithm called snare social network analysis for risk evaluation by allowing one to use domain knowledge as well as link knowledge the method was very successful for pinpointing misstated accounts in our sample of general ledger data with significant improvement over the default heuristic in true positive rates and lift factor of up to more than twice that of the default heuristic we also apply snare to the task of graph labeling in general on publicly available datasets we show that with only some information about the nodes themselves in network we get surprisingly high accuracy of labels not only is snare applicable in wide variety of domains but it is also robust to the choice of parameters and highly scalable linearly with the number of edges in graph
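Since the method builds on belief propagation for labeling nodes, a minimal sketch of loopy belief propagation with two labels (fraud, legitimate) may help fix ideas; the toy graph, the node priors, and the homophily-style compatibility matrix are illustrative assumptions, not the parameters used by snare.

```python
# undirected toy graph: node -> neighbors
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
# node priors over the labels (fraud, legit), e.g. derived from domain knowledge
prior = {"a": (0.9, 0.1), "b": (0.5, 0.5), "c": (0.5, 0.5), "d": (0.1, 0.9)}
# edge compatibility: neighboring nodes tend to share labels (homophily)
psi = [[0.8, 0.2],
       [0.2, 0.8]]

def belief_propagation(graph, prior, psi, iters=20):
    """Loopy sum-product belief propagation with binary labels."""
    # msg[(i, j)][x] is what node i currently tells node j about j having label x
    msg = {(i, j): [1.0, 1.0] for i in graph for j in graph[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = [0.0, 0.0]
            for xj in (0, 1):
                for xi in (0, 1):
                    prod = prior[i][xi] * psi[xi][xj]
                    for k in graph[i]:
                        if k != j:
                            prod *= msg[(k, i)][xi]
                    out[xj] += prod
            s = sum(out)
            new[(i, j)] = [v / s for v in out]   # normalize each message
        msg = new
    beliefs = {}
    for i in graph:
        b = []
        for xi in (0, 1):
            prod = prior[i][xi]
            for k in graph[i]:
                prod *= msg[(k, i)][xi]
            b.append(prod)
        s = sum(b)
        beliefs[i] = [v / s for v in b]
    return beliefs

for node, b in belief_propagation(graph, prior, psi).items():
    print(node, "P(fraud) =", round(b[0], 3))
```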
scalability is an important issue in object recognition as it reduces database storage and recognition time in this paper we propose new scalable object representation and learning method to recognize many everyday objects the key proposal for scalable object representation is to combine the concept of feature sharing with multi view clustering in part based object representation in particular common frame constellation model cfcm in this representation scheme we also propose fully automatic learning method appearance based automatic feature clustering and sequential construction of clustered cfcms from labeled multi views and multiple objects we evaluated the scalability of the proposed method to coil db and applied the learning scheme to objects with training views experimental results show the scalable learning results in almost constant recognition performance relative to the number of objects
new method of estimating some statistical characteristics of tcp flows in the internet is developed in this paper for this purpose new set of random variables referred to as observables is defined when dealing with sampled traffic these observables can easily be computed from sampled data by adopting convenient mouse elephant dichotomy also dependent on traffic it is shown how these variables give reliable statistical representation of the number of packets transmitted by large flows during successive time intervals with an appropriate duration mathematical framework is developed to estimate the accuracy of the method as an application it is shown how one can estimate the number of large tcp flows when only sampled traffic is available the algorithm proposed is tested against experimental data collected from different types of ip networks
no single encoding scheme or fault model is optimal for all data versatile storage system allows them to be matched to access patterns reliability requirements and cost goals on per data item basis ursa minor is cluster based storage system that allows data specific selection of and on line changes to encoding schemes and fault models thus different data types can share scalable storage infrastructure and still enjoy specialized choices rather than suffering from one size fits all experiments with ursa minor show performance benefits of when using specialized choices as opposed to single more general configuration experiments also show that single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use one size fits all configuration when using the specialized distributions aggregate cluster throughput nearly doubled
personalisation of web information systems wiss means customisation of the presented data content to the needs of users restricting the available functionality to the goals and preferences of users and tailoring the web presentation according to used devices and style options this paper primarily concentrates on the customisation of functionality by making all those operations available to user that are needed to achieve specified goal and by organising them in an action scheme called plot that is in accordance with the behavioural preferences of the user plots are formalised by algebraic expressions in kleene algebras with tests kats then personalisation can be formalised as an optimisation problem with equational preference rules for which term rewriting approach is proposed in second step the approach is extended to conditional term rewriting thereby dispensing with the particular need to associate preference rules with user profiles finally the approach is refined by taking content specifications via extended views and abstract programs on these views into account this leads us to reformulating the personalisation problem in higher order dynamic logic
the paper compares different approaches to estimate the reliability of individual predictions in regression we compare the sensitivity based reliability estimates developed in our previous work with four approaches found in the literature variance of bagged models local cross validation density estimation and local modeling by combining pairs of individual estimates we compose combined estimate that performs better than the individual estimates we tested the estimates by running data from domains through eight regression models regression trees linear regression neural networks bagging support vector machines locally weighted regression random forests and generalized additive model the results demonstrate the potential of sensitivity based estimate as well as the local modeling of prediction error with regression trees among the tested approaches the best average performance was achieved by estimation using the bagging variance approach which achieved the best performance with neural networks bagging and locally weighted regression
we study the problem of optimally partitioning two dimensional array of elements by cutting each coordinate axis into respectively intervals resulting in rectangular regions this problem arises in several applications in databases parallel computation and image processing our main contributions are new approximation algorithms for these np complete problems that improve significantly over previously known bounds the algorithms are fast and simple work for variety of measures of partitioning quality generalize to dimensions and achieve almost optimal approximation ratios we also extend previous np completeness results for this class of problems
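For intuition, the sketch below shows a simple baseline heuristic for the p x q partitioning problem: pick row cuts from row sums and column cuts from column sums so each slab has roughly equal weight, then report the heaviest resulting block, which is the quantity such partitioning schemes typically aim to minimize. This is only an illustrative baseline, not the approximation algorithms of the paper.

```python
import numpy as np

def balanced_cuts(weights, parts):
    """Place parts-1 cuts so the prefix sums of 'weights' are split into
    intervals of roughly equal total weight."""
    prefix = np.cumsum(weights)
    target = prefix[-1] / parts
    return [int(np.searchsorted(prefix, k * target)) for k in range(1, parts)]

def partition_2d(A, p, q):
    """Illustrative heuristic: row cuts from row sums, column cuts from
    column sums; returns the cuts and the heaviest rectangular block."""
    row_cuts = balanced_cuts(A.sum(axis=1), p)
    col_cuts = balanced_cuts(A.sum(axis=0), q)
    r_bounds = [0] + row_cuts + [A.shape[0]]
    c_bounds = [0] + col_cuts + [A.shape[1]]
    heaviest = max(A[r_bounds[i]:r_bounds[i + 1], c_bounds[j]:c_bounds[j + 1]].sum()
                   for i in range(p) for j in range(q))
    return row_cuts, col_cuts, heaviest

A = np.arange(36).reshape(6, 6)
print(partition_2d(A, p=2, q=3))
```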
we study similarity queries for time series data where similarity is defined in fairly general way in terms of distance function and set of affine transformations on the fourier series representation of sequence we identify safe set of transformations supporting wide variety of comparisons and show that this set is rich enough to formulate operations such as moving average and time scaling we also show that queries expressed using safe transformations can efficiently be computed without prior knowledge of the transformations we present query processing algorithm that uses the underlying multidimensional index built over the data set to efficiently answer similarity queries our experiments show that the performance of this algorithm is competitive to that of processing ordinary exact match queries using the index and much faster than sequential scanning we propose generalization of this algorithm for simultaneously handling multiple transformations at time and give experimental results on the performance of the generalized algorithm
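To illustrate why an operation like moving average fits a Fourier-based similarity framework, the sketch below checks that a circular moving average in the time domain equals a pointwise scaling of the Fourier coefficients, by the convolution theorem; the circular boundary handling and the window size are simplifying assumptions of this example.

```python
import numpy as np

n, w = 16, 4                          # sequence length and moving-average window
rng = np.random.default_rng(0)
x = rng.normal(size=n)

# circular moving average in the time domain: y[i] = mean(x[i-w+1 .. i])
y_time = np.array([np.mean(x[(i - np.arange(w)) % n]) for i in range(n)])

# the same operation expressed on the Fourier representation: multiply the
# sequence's coefficients pointwise by the transform of the averaging kernel
kernel = np.zeros(n)
kernel[:w] = 1.0 / w
y_freq = np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel)).real

print(np.allclose(y_time, y_freq))    # True: both computations agree
```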
we present spreadsheet debugger targeted at end users whenever the computed output of cell is incorrect the user can supply an expected value for cell which is employed by the system to generate list of change suggestions for formulas that when applied would result in the user specified output the change suggestions are ranked using set of heuristics in previous work we had presented the system as proof of concept in this paper we describe systematic evaluation of the effectiveness of inferred change suggestions and the employed ranking heuristics based on the results of the evaluation we have extended both the change inference process and the ranking of suggestions an evaluation of the improved system shows that change inference process and the ranking heuristics have both been substantially improved and that the system performs effectively
in this paper we propose new term dependence model for information retrieval which is based on theoretical framework using markov random fields we assume two types of dependencies of terms given in query long range dependencies that may appear for instance within passage or sentence in target document and ii short range dependencies that may appear for instance within compound word in target document based on this assumption our two stage term dependence model captures both long range and short range term dependencies differently when more than one compound word appear in query we also investigate how query structuring with term dependence can improve the performance of query expansion using relevance model the relevance model is constructed using the retrieval results of the structured query with term dependence to expand the query we show that our term dependence model works well particularly when using query structuring with compound words through experiments using gigabyte test collection of web documents mostly written in japanese we also show that the performance of the relevance model can be significantly improved by using the structured query with our term dependence model
specification of software for safety critical embedded computer systems has been widely addressed in literature to achieve the high level of confidence in specification’s correctness necessary in many applications manual inspections formal verification and simulation must be used in concert researchers have successfully addressed issues in inspection and verification however results in the areas of execution and simulation of specifications have not made as large an impact as desired in this paper we present an approach to specification based prototyping which addresses this issue it combines the advantages of rigorous formal specifications and rapid systems prototyping the approach lets us refine formal executable model of the system requirements to detailed model of the software requirements throughout this refinement process the specification is used as prototype of the proposed software thus we guarantee that the formal specification of the system is always consistent with the observed behavior of the prototype the approach is supported with the nimbus environment framework that allows the formal specification to execute while interacting with software models of its embedding environment or even the physical environment itself hardware in the loop simulation
supporting continuous media cm data such as video and audio imposes stringent demands on the retrieval performance of multimedia server in this paper we propose and evaluate set of data placement and retrieval algorithms to exploit the full capacity of the disks in multimedia server the data placement algorithm declusters every object over all of the disks in the server using time based declustering unit with the aim of balancing the disk load as for runtime retrieval the quintessence of the algorithm is to give each disk advance notification of the blocks that have to be fetched in the impending time periods so that the disk can optimize its service schedule accordingly moreover in processing block request for replicated object the server will dynamically channel the retrieval operation to the most lightly loaded disk that holds copy of the required block we have implemented multimedia server based on these algorithms performance tests reveal that the server achieves very high disk efficiency specifically each disk is able to support up to mpeg streams moreover experiments suggest that the aggregate retrieval capacity of the server scales almost linearly with the number of disks
an unattended wireless sensor network uwsn might collect valuable data representing an attractive target for the adversary since sink visits the network infrequently unattended sensors cannot immediately off load data to some safe external entity with sufficient time between sink visits powerful mobile adversary can easily compromise sensor collected data in this paper we propose two schemes comac and exco that leverage sensor co operation to achieve data authentication these schemes use standard and inexpensive symmetric cryptographic primitives coupled with key evolution and few messages exchange we provide security analysis for proposed schemes and assess their effectiveness via simulations we show that proposed schemes cope well with real wsn issues such as message loss and sensor failure we also compare the two schemes with respect to robustness and overhead which allows network designers to carefully select the right scheme and tune appropriate system parameters
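The CoMAC and ExCo schemes themselves are not reproduced here; the sketch below only illustrates, under assumed parameters, the generic building blocks the abstract names: one-way key evolution plus a symmetric MAC over the collected data, so that a key captured later cannot be rolled back to forge earlier rounds.

    # Illustrative primitives only, not the CoMAC/ExCo protocols.
    import hashlib
    import hmac

    def evolve_key(key: bytes) -> bytes:
        """One-way key evolution: the current key does not reveal past keys."""
        return hashlib.sha256(b"evolve" + key).digest()

    def authenticate(key: bytes, round_no: int, data: bytes) -> bytes:
        """MAC the collected data under the key of the current round."""
        msg = round_no.to_bytes(4, "big") + data
        return hmac.new(key, msg, hashlib.sha256).digest()

    # usage: each round, MAC the reading and then discard the old key
    key = b"\x00" * 32            # hypothetical initial key shared with the sink
    tag = authenticate(key, 0, b"reading-0")
    key = evolve_key(key)         # an adversary compromising this key cannot forge round 0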
this paper describes lattice based decoder for hierarchical phrase based translation the decoder is implemented with standard wfst operations as an alternative to the well known cube pruning procedure we find that the use of wfsts rather than best lists requires less pruning in translation search resulting in fewer search errors direct generation of translation lattices in the target language better parameter optimization and improved translation performance when rescoring with long span language models and mbr decoding we report translation experiments for the arabic to english and chinese to english nist translation tasks and contrast the wfst based hierarchical decoder with hierarchical translation under cube pruning
we make the case for developing web of concepts by starting with the current view of web comprised of hyperlinked pages or documents each seen as bag of words extracting concept centric metadata and stitching it together to create semantically rich aggregate view of all the information available on the web for each concept instance the goal of building and maintaining such web of concepts presents many challenges but also offers the promise of enabling many powerful applications including novel search and information discovery paradigms we present the goal motivate it with example usage scenarios and some analysis of yahoo logs and discuss the challenges in building and leveraging such web of concepts we place this ambitious research agenda in the context of the state of the art in the literature and describe various ongoing efforts at yahoo research that are related
we propose simple heuristic partition method hpm of classification tree to improve efficiency in the search for splitting points of numerical attributes the proposal is motivated by the idea that the selection process of candidates in the splitting point selection can be made more flexible as to achieve faster computation while retaining classification accuracy we compare the performance of the hpm against fayyad’s method as the latter is the improved version of the standard algorithm on the search of splitting points we demonstrate that hpm is more efficient in some cases by as much as while producing essentially the same classification for six different data sets our result supports the relaxation of instance boundaries rib as valid approach that can be explored to achieve more efficient computations
uncertainty pervades many domains in our lives current real life applications eg location tracking using gps devices or cell phones multimedia feature extraction and sensor data management deal with different kinds of uncertainty finding the nearest neighbor objects to given query point is an important query type in these applications in this paper we study the problem of finding objects with the highest marginal probability of being the nearest neighbors to query object we adopt general uncertainty model allowing for data and query uncertainty under this model we define new query semantics and provide several efficient evaluation algorithms we analyze the cost factors involved in query evaluation and present novel techniques to address the trade offs among these factors we give multiple extensions to our techniques including handling dependencies among data objects and answering threshold queries we conduct an extensive experimental study to evaluate our techniques on both real and synthetic data
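As a concrete reading of the query semantics above (the object with the highest probability of being the nearest neighbor under both data and query uncertainty), here is a brute-force Monte Carlo baseline; the discrete instance model is an assumption for illustration, and the paper's evaluation algorithms are presumably far more efficient than this.

    # Naive Monte Carlo estimate of each uncertain object's probability of
    # being the nearest neighbor of an uncertain query point.
    import math
    import random

    def sample(instances):
        """instances: list of ((x, y), probability) pairs summing to 1."""
        r, acc = random.random(), 0.0
        for point, p in instances:
            acc += p
            if r <= acc:
                return point
        return instances[-1][0]

    def nn_probabilities(query_instances, objects, trials=10000):
        wins = [0] * len(objects)
        for _ in range(trials):
            q = sample(query_instances)
            dists = [math.dist(q, sample(obj)) for obj in objects]
            wins[dists.index(min(dists))] += 1
        return [w / trials for w in wins]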
this paper reports an investigation into the connection of the workspace of physical libraries with digital library services using simple sensor technology we provide focused access to digital resources on the basis of the user’s physical context including the topic of the stacks they are next to and the content of books on their reading desks our research developed the technological infrastructure to support this fused interaction investigated current patron behavior in physical libraries and evaluated our system in user centred pilot study the outcome of this research demonstrates the potential utility of the fused library and provides starting point for future exploitation
in this paper we provide framework for analyzing network traffic traces through trace driven queueing we also introduce several queueing metrics together with the associated visualization tools some novel that provide insight into the traffic features and facilitate comparisons between traces some techniques for non stationary data are discussed applying our framework to both real and synthetic traces we illustrate how to compare traces using trace driven queueing and ii show that traces that look similar under various statistical measures such as the hurst index can exhibit rather different behavior under queueing simulation
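A minimal example of the trace-driven queueing idea described above: replay a trace of (arrival time, service demand) pairs through a single-server FIFO queue and collect per-packet delays. The paper's specific queueing metrics and visualization tools are not reproduced; this only shows the basic mechanism used to compare traces.

    # Trace-driven single-server FIFO queue; the trace format is an assumption.
    def trace_driven_queue(trace):
        delays = []
        server_free_at = 0.0
        for arrival, demand in sorted(trace):
            start = max(arrival, server_free_at)
            server_free_at = start + demand
            delays.append(server_free_at - arrival)   # sojourn time of this packet
        return delays

    # comparing two traces under the same queueing lens:
    # delays_a = trace_driven_queue(trace_a); delays_b = trace_driven_queue(trace_b)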
chip multiprocessors cmps or multi core processors have become common way of reducing chip complexity and power consumption while maintaining high performance speculative cmps use hardware to enforce dependence allowing parallelizing compiler to generate multithreaded code without needing to prove independence in these systems sequential program is decomposed into threads to be executed in parallel dependent threads cause performance degradation but do not affect correctness thread decomposition attempts to reduce the run time overheads of data dependence thread misprediction and load imbalance because these overheads depend on the runtimes of the threads that are being created by the decomposition reducing the overheads while creating the threads is circular problem static compile time decomposition handles this problem by estimating the run times of the candidate threads but is limited by the estimates inaccuracy dynamic execution time decomposition in hardware has better run time information but is limited by the decomposition hardware’s complexity and run time overhead we propose third approach where compiler instruments profile run of the application to search through candidate threads and pick the best threads as the profile run executes the resultant decomposition is compiled into the application so that production run of the application has no instrumentation and does not incur any decomposition overhead we avoid static decomposition’s estimation accuracy problem by using actual profile run execution times to pick threads and we avoid dynamic decomposition’s overhead by performing the decomposition at profile time because we allow candidate threads to span arbitrary sections of the application’s call graph and loop nests an exhaustive search of the decomposition space is prohibitive even in profile runs to address this issue we make the key observation that the run time overhead of thread depends to the first order only on threads that overlap with the thread in execution eg in four core cmp given thread can overlap with at most three preceding and three following threads this observation implies that given thread affects only few other threads allowing pruning of the space using cmp simulator we achieve an average speedup of on four cores for five of the spec cfp benchmarks which compares favorably to recent static techniques we also discuss experiments with cint
the software architecture of distributed program can be represented by hierarchical composition of subsystems with interacting processes at the leaves of the hierarchy compositional reachability analysis cra is promising state reduction technique which can be automated and used in stages to derive the overall behavior of distributed program based on its architecture cra is particularly suitable for the analysis of programs that are subject to evolutionary change when program evolves only the behaviors of those subsystems affected by the change need be reevaluated the technique however has limitation the properties available for analysis are constrained by the set of actions that remain globally observable properties involving actions encapsulated by subsystems may therefore not be analyzed in this article we enhance the cra technique to check safety properties which may contain actions that are not globally observable to achieve this the state machine model is augmented with special trap state labeled as pgr we propose scheme to transform in stages property that involves hidden actions to one that involves only globally observable actions the enhanced technique also includes mechanism aiming at reducing the debugging effort the technique is illustrated using gas station system example
as value flows across the boundary between interoperating languages it must be checked and converted to fit the types and representations of the target language for simple forms of data the checks and coercions can be immediate for higher order data such as functions and objects some must be delayed until the value is used in particular way typically these coercions and checks are implemented by an ad hoc mixture of wrappers reflection and dynamic predicates we observe that the wrapper and reflection operations fit the profile of mirrors the checks correspond to contracts and the timing and shape of mirror operations coincide with the timing and shape of contract operations based on these insights we present new model of interoperability that builds on the ideas of mirrors and contracts and we describe an interoperable implementation of java and scheme that is guided by the model
numerous routing protocols have been proposed for wireless sensor networks each of which is highly optimized for certain class of traffic like real time reliable sense and disseminate network reprogramming energy efficiency and so on however typical deployment demands an arbitrary communication pattern that generates multiple traffic types simultaneously arguably no single routing protocol can completely cater to deployment’s various flavors in this paper we propose dynamic routing framework that can replace the traditional routing layer with collection of routing decisions we allow application packets to carry two bit preamble that uniquely describes the nature of communication sought for the framework dynamically wires the appropriate routing component from set of well defined suite we conduct extensive simulation experiments that generates concurrent mix of different traffic types each having its own and often conflicting communication demands for such an application we show that we could meet each traffic types demands for reliability delay path distribution link losses and congestion losses we also show that service differentiation can indeed be met successfully and practical deployments can be an imminent reality
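A sketch of the dispatch mechanism implied above, with invented component names and placeholder behaviors: the two-bit preamble carried by each application packet indexes into a small suite of routing components.

    # Two-bit preamble selects one routing component from a well-defined suite.
    def route_realtime(packet):
        return "forward on the low-latency path"            # placeholder behavior

    def route_reliable(packet):
        return "forward with hop-by-hop acknowledgements"   # placeholder behavior

    def route_dissemination(packet):
        return "flood toward all subscribers"               # placeholder behavior

    def route_energy_efficient(packet):
        return "forward on the minimum-energy path"         # placeholder behavior

    ROUTING_SUITE = {
        0b00: route_realtime,
        0b01: route_reliable,
        0b10: route_dissemination,
        0b11: route_energy_efficient,
    }

    def dispatch(packet):
        preamble = packet["preamble"] & 0b11                # two-bit traffic descriptor
        return ROUTING_SUITE[preamble](packet)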
we describe framework for automatically selecting summary set of photos from large collection of geo referenced photographs such large collections are inherently difficult to browse and become excessively so as they grow in size making summaries an important tool in rendering these collections accessible our summary algorithm is based on spatial patterns in photo sets as well as textual topical patterns and user photographer identity cues the algorithm can be expanded to support social temporal and other factors the summary can thus be biased by the content of the query the user making the query and the context in which the query is made a modified version of our summarization algorithm serves as basis for new map based visualization of large collections of geo referenced photos called tag maps tag maps visualize the data by placing highly representative textual tags on relevant map locations in the viewed region effectively providing sense of the important concepts embodied in the collection an initial evaluation of our implementation on set of geo referenced photos shows that our algorithm and visualization perform well producing summaries and views that are highly rated by users
most systems that support visual interaction with models use shape representations based on triangle meshes the size of these representations imposes limits on applications for which complex models must be accessed remotely techniques for simplifying and compressing models reduce the transmission time multiresolution formats provide quick access to crude model and then refine it progressively unfortunately compared to the best nonprogressive compression methods previously proposed progressive refinement techniques impose significant overhead when the full resolution model must be downloaded the cpm compressed progressive meshes approach proposed here eliminates this overhead it uses new technique which refines the topology of the mesh in batches which each increase the number of vertices by up to percent less than an amortized total of bits per triangle encode where and how the topological refinements should be applied we estimate the position of new vertices from the positions of their topological neighbors in the less refined mesh using new estimator that leads to representations of vertex coordinates that are percent more compact than previously reported progressive geometry compression techniques
previous studies on multi instance learning typically treated instances in the bags as independently and identically distributed the instances in bag however are rarely independent in real tasks and better performance can be expected if the instances are treated in an non iid way that exploits relations among instances in this paper we propose two simple yet effective methods in the first method we explicitly map every bag to an undirected graph and design graph kernel for distinguishing the positive and negative bags in the second method we implicitly construct graphs by deriving affinity matrices and propose an efficient graph kernel considering the clique information the effectiveness of the proposed methods are validated by experiments
the miniaturization of hardware components has led to the development of wireless sensor networks wsn and networked applications over them meanwhile middleware systems have also been proposed in order to both facilitate the development of these applications and provide common application services the development of middleware for sensor networks however poses new challenges to middleware developers due to the low availability of resources and processing capacity of the sensor nodes in this context this paper presents middleware for wsn named mires mires incorporates characteristics of message oriented middleware by allowing applications to communicate in publish subscribe way in order to illustrate the proposed middleware we implement an aggregation middleware service for an environment monitoring application
submodular functions are key concept in combinatorial optimization algorithms that involve submodular functions usually assume that they are given by value oracle many interesting problems involving submodular functions can be solved using only polynomially many queries to the oracle eg exact minimization or approximate maximization in this paper we consider the problem of approximating non negative monotone submodular function on ground set of size everywhere after only poly oracle queries our main result is deterministic algorithm that makes poly oracle queries and derives function such that for every set approximates within factor alpha where alpha radic for rank functions of matroids and alpha radic log for general monotone submodular functions our result is based on approximately finding maximum volume inscribed ellipsoid in symmetrized polymatroid and the analysis involves various properties of submodular functions and polymatroids our algorithm is tight up to logarithmic factors indeed we show that no algorithm can achieve factor better than omega radic log even for rank functions of matroid
in petabyte scale distributed file systems that decouple read and write from metadata operations behavior of the metadata server cluster will be critical to overall system performance and scalability we present dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time we examine the relative merits of our approach in the context of traditional workload partitioning strategies and demonstrate the performance scalability and adaptability advantages in simulation environment
this paper describes the design of flexible load balancing framework and runtime software system for supporting the development of adaptive applications on distributed memory parallel computers the runtime system supports global namespace transparent object migration automatic message forwarding and routing and automatic load balancing these features can be used at the discretion of the application developer in order to simplify program development and to eliminate complex bookkeeping associated with mobile data objects an evaluation of this system in the context of three dimensional tetrahedral advancing front parallel mesh generator shows that overall runtime improvements of percent compared to common stop and repartition load balancing methods percent compared to explicit intrusive load balancing methods and percent compared to no load balancing are possible on large processor configurations at the same time the overheads attributable to the runtime system are fraction of percent of the total runtime the parallel advancing front method is coarse grained and highly adaptive application and therefore exercises all of the features of the runtime system
challenges in addressing the memory bottleneck have made it difficult to design packet processing platform that simultaneously achieves both ease of programming and high performance today’s commercial processors support two architectural mechanisms namely hardware multithreading and caching to overcome the memory bottleneck the configurations of these mechanisms eg cache capacity number of threads per processor core are fixed at processor design time the relative effectiveness of these mechanisms however varies significantly with application traffic and system characteristics thus programmers often struggle to achieve high performance from processor that is not well suited to particular deployment to address this challenge we first make case for and then develop malleable processor architecture that facilitates the dynamic reconfiguration of cache capacity and number of threads to best suit the needs of each deployment we then present an algorithm that can determine the optimal thread cache balance at run time the combination of these two allows us to simultaneously achieve the goals of ease of programming and high performance we demonstrate that our processor outperforms processor similar to intel’s ixp state of the art commercial network processor in about of the deployments we consider further in about of the deployments our platform improves the throughput by as much as
future heterogeneous single isa multicore processors will have an edge in potential performance per watt over comparable homogeneous processors to fully tap into that potential the os scheduler needs to be heterogeneity aware so it can match jobs to cores according to characteristics of both we propose heterogeneity aware signature supported scheduling algorithm that does the matching using per thread architectural signatures which are compact summaries of threads architectural properties collected offline the resulting algorithm does not rely on dynamic profiling and is comparatively simple and scalable we implemented hass in opensolaris and achieved average workload speedups of up to matching best static assignment achievable only by an oracle we have also implemented dynamic ipc driven algorithm proposed earlier that relies on online profiling we found that the complexity load imbalance and associated performance degradation resulting from dynamic profiling are significant challenges to using this algorithm successfully as result it failed to deliver expected performance gains and to outperform hass
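Illustrative sketch of signature-supported matching; the signature format used here (a predicted IPC per core type) and the exhaustive search are assumptions for illustration, not the paper's algorithm. The point is that the assignment uses only offline-collected summaries, with no online profiling.

    # Pick the thread-to-core assignment maximizing predicted aggregate IPC.
    from itertools import permutations

    def best_assignment(threads, cores):
        """threads: {tid: {core_type: predicted_ipc}}, cores: [core_type, ...]."""
        tids = list(threads)
        best, best_ipc = None, -1.0
        for perm in permutations(cores, len(tids)):     # fine for small core counts
            total = sum(threads[t][c] for t, c in zip(tids, perm))
            if total > best_ipc:
                best, best_ipc = dict(zip(tids, perm)), total
        return best

    cores = ["fast", "fast", "slow", "slow"]
    threads = {"A": {"fast": 2.0, "slow": 0.8}, "B": {"fast": 1.2, "slow": 1.1}}
    print(best_assignment(threads, cores))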
ensuring long processor lifetimes by limiting failures due to wear out related hard errors is critical requirement for all microprocessor manufacturers we observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet however current methodologies for qualifying lifetime reliability are overly conservative since they assume worst case operating conditions this paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance instead lifetime reliability awareness at the microarchitectural design stage can mitigate this problem by designing processors that dynamically adapt in response to the observed usage to meet a reliability target we make two specific contributions first we describe an architecture level model and its implementation called ramp that can dynamically track lifetime reliability responding to changes in application behavior ramp is based on state of the art device models for different wear out mechanisms second we propose dynamic reliability management drm technique where the processor can respond to changing application behavior to maintain its lifetime reliability target in contrast to current worst case behavior based reliability qualification methodologies drm allows processors to be qualified for reliability at lower but more likely operating points than the worst case using ramp we show that this can save cost and or improve performance that dynamic voltage scaling is an effective response technique for drm and that dynamic thermal management neither subsumes nor is subsumed by drm
in this paper we propose an autonomic management framework asgrid to address the requirements of emerging large scale applications in hybrid grid and sensor network systems to the best of our knowledge we are the first who proposed the notion of autonomic sensor grid systems in holistic manner aiming at non trivial large applications to bridge the gap between the physical world and the digital world and facilitate information analysis and decision making asgrid is designed to smooth the integration of sensor networks and grid systems and efficiently use both on demand under the blueprint of asgrid we present several building blocks that fulfill the following major features self configuration through content based aggregation and associative rendezvous mechanisms self optimisation through utility based sensor selection model driven hierarchical sensing task scheduling and auction based game theoretic approach for grid scheduling self protection through activekey dynamic key management and strust trust management mechanisms experimental and simulation results on these aspects are presented
clustering methods can be either data driven or need driven data driven methods intend to discover the true structure of the underlying data while need driven methods aims at organizing the true structure to meet certain application requirements thus need driven eg constrained clustering is able to find more useful and actionable clusters in applications such as energy aware sensor networks privacy preservation and market segmentation however the existing methods of constrained clustering require users to provide the number of clusters which is often unknown in advance but has crucial impact on the clustering result in this paper we argue that more natural way to generate actionable clusters is to let the application specific constraints decide the number of clusters for this purpose we introduce novel cluster model constraint driven clustering cdc which finds an priori unspecified number of compact clusters that satisfy all user provided constraints two general types of constraints are considered ie minimum significance constraints and minimum variance constraints as well as combinations of these two types we prove the np hardness of the cdc problem with different constraints we propose novel dynamic data structure the cd tree which organizes data points in leaf nodes such that each leaf node approximately satisfies the cdc constraints and minimizes the objective function based on cd trees we develop an efficient algorithm to solve the new clustering problem our experimental evaluation on synthetic and real datasets demonstrates the quality of the generated clusters and the scalability of the algorithm
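This toy sketch is not the CD-tree algorithm; it only illustrates the constraint-driven idea that a minimum-significance (here, minimum-size) constraint, rather than a user-supplied number of clusters, decides when clustering stops.

    # Greedily merge undersized clusters with their nearest neighbor until every
    # cluster meets the minimum-significance constraint; the final cluster count
    # is a byproduct of the constraint, not an input.
    import math

    def centroid(cluster):
        return [sum(coords) / len(cluster) for coords in zip(*cluster)]

    def constraint_driven_clusters(points, min_size):
        clusters = [[p] for p in points]
        while len(clusters) > 1:
            small = [i for i, c in enumerate(clusters) if len(c) < min_size]
            if not small:
                break                                  # all constraints satisfied
            i = min(small, key=lambda k: len(clusters[k]))
            j = min((k for k in range(len(clusters)) if k != i),
                    key=lambda k: math.dist(centroid(clusters[i]), centroid(clusters[k])))
            clusters[i].extend(clusters[j])
            del clusters[j]
        return clusters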
debugging multithreaded programs which involves detection and identification of the cause of data races has proved to be hard problem although there has been significant amount of research on this topic prior works rely on one important assumption the debuggers must be aware of all the synchronization operations that take place during program run this assumption is significant limitation as multithreaded programs including the popular splash benchmark have barriers and flag synchronizations implemented in the user code we show that the lack of knowledge of these synchronization operations leads to unnecessary reporting of numerous races our experiments with splash benchmark suite show that distinct segments in source code on an average give rise to well over million dynamic instances of falsely reported races for these programs we propose dynamic software technique that identifies the user defined synchronizations exercised during program run this information not only helps avoids reporting of unnecessary races but also helps record replay system to speedup the replay our evaluation confirms that our synchronization detector is highly accurate with no false negatives and very few false positives thus reporting of nearly all unnecessary races is avoided finally we show that the knowledge of synchronization operations resulted in about reduction in replay time
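A rough, assumption-laden sketch of how a user-defined synchronization might be spotted dynamically: a thread that keeps re-reading the same unchanged value from an address and finally observes a value stored by another thread is likely spinning on a flag. The trace format and threshold below are invented; the paper's detector is not reproduced here.

    # Heuristic spin-flag detection over a trace of (thread, op, addr, value) events.
    def find_spin_flags(trace, min_spins=100):
        reads = {}          # (thread, addr) -> (last_value, consecutive_count)
        last_writer = {}    # addr -> thread that wrote the current value
        flags = set()
        for thread, op, addr, value in trace:
            if op == "store":
                last_writer[addr] = thread
            elif op == "load":
                last_val, count = reads.get((thread, addr), (None, 0))
                if value == last_val:
                    reads[(thread, addr)] = (value, count + 1)
                else:
                    # the spinning read finally saw a new value from another thread
                    if count >= min_spins and last_writer.get(addr) not in (None, thread):
                        flags.add(addr)
                    reads[(thread, addr)] = (value, 1)
        return flags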
we introduce new non intrusive on chip cache tuning hardware module capable of accurately predicting the best configuration of configurable cache for an executing application previous dynamic cache tuning approaches change the cache configuration several times as part of the tuning search process executing the application using inferior configurations and temporarily causing energy and performance overhead the introduced tuner uses different approach which non intrusively collects data on addresses issued by the microprocessor analyzes that data to predict the best cache configuration and then updates the cache to the new best configuration in one shot without ever having to examine inferior configurations the result is less energy and less performance overhead meaning that cache tuning can be applied more frequently we show through experiments that the one shot cache tuner can reduce memory access related energy for instructions by and comes within of previous intrusive approach and results in times less energy overhead and times speedup in tuning time compared to previous intrusive approach at the main expense of larger size
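Sketch of the one-shot idea under simplifying assumptions (a software model of an LRU cache and a placeholder energy model, both invented here): collect the address stream once, evaluate every candidate configuration on it offline, and switch directly to the predicted best configuration instead of executing with inferior ones.

    # Evaluate candidate (sets, ways, line_size) configurations on one address trace.
    def misses(addresses, sets, ways, line_size):
        cache = [[] for _ in range(sets)]              # per-set LRU stacks of tags
        miss = 0
        for addr in addresses:
            block = addr // line_size
            s, tag = block % sets, block // sets
            if tag in cache[s]:
                cache[s].remove(tag)
            else:
                miss += 1
                if len(cache[s]) == ways:
                    cache[s].pop()                     # evict LRU entry (tail)
            cache[s].insert(0, tag)                    # MRU at the head
        return miss

    def one_shot_tune(addresses, configs, energy_model):
        """Return the configuration the placeholder energy model ranks best."""
        return min(configs, key=lambda c: energy_model(c, misses(addresses, *c)))

    # usage: one_shot_tune(addrs, [(64, 1, 32), (32, 2, 32), (16, 4, 64)],
    #                      lambda cfg, m: m)           # toy model: minimize misses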
shareable data services providing consistency guarantees such as atomicity linearizability make building distributed systems easier however combining linearizability with efficiency in practical algorithms is difficult reconfigurable linearizable data service called rambo was developed by lynch and shvartsman this service guarantees consistency under dynamic conditions involving asynchrony message loss node crashes and new node arrivals the specification of the original algorithm is given at an abstract level aimed at concise presentation and formal reasoning about correctness the algorithm propagates information by means of gossip messages if the service is in use for long time the size and the number of gossip messages may grow without bound this paper presents consistent data service for long lived objects that improves on rambo in two ways it includes an incremental communication protocol and leave service the new protocol takes advantage of the local knowledge and carefully manages the size of messages by removing redundant information while the leave service allows the nodes to leave the system gracefully the new algorithm is formally proved correct by forward simulation using levels of abstraction an experimental implementation of the system was developed for networks of workstations the paper also includes selected analytical and preliminary empirical results that illustrate the advantages of the new algorithm
modern database applications including computer aided design multimedia information systems medical imaging molecular biology or geographical information systems impose new requirements on the effective and efficient management of spatial data particular problems arise from the need of high resolutions for large spatial objects and from the design goal to use general purpose database management systems in order to guarantee industrial strength in the past two decades various stand alone spatial index structures have been proposed but their integration into fully fledged database systems is problematic most of these approaches are based on the decomposition of spatial objects leading to replicating index structures in contrast to common black and white decompositions which suffer from the lack of intermediate solutions we introduce gray intervals which are stored in spatial index additionally we store the exact information of these gray intervals in compressed way these gray intervals are created by using cost based decompositioning algorithm which takes the access probability and the decompression cost of them into account furthermore we exploit statistical information of the database objects to find cost optimal decomposition of the query objects the experimental evaluation on the sequoia benchmark test points out that our new concept outperforms the relational interval tree by more than one order of magnitude with respect to overall query response time
concurrency control is one of the key problems in design and implementation of collaborative systems such as hypertext hypermedia systems cad cam systems and software development environments most existing systems store data in specialized databases with built in concurrency control policies usually implemented via locking it is desirable to construct such collaborative systems on top of the world wide web but most web servers do not support even conventional transactions let alone distributed multi website transactions or flexible concurrency control mechanisms oriented toward teamwork such as event notification shared locks and fine granularity locks we present transaction server that operates independently of web servers or the collaborative systems to fill the concurrency control gap by default the transaction server enforces the conventional atomic transaction model where sets of operations are performed in an all or nothing fashion and isolated from concurrent users the server can be tailored dynamically to apply more sophisticated concurrency control policies appropriate for collaboration the transaction server also supports applications employing information resources other than web servers such as legacy databases corba objects and other hypermedia systems our implementation permits wide range of system architecture styles
pushdown systems pdss are an automata theoretic formalism for specifying class of infinite state transition systems infiniteness comes from the fact that each configuration in the state space consists of formal control location coupled with stack of unbounded size pdss can model program paths that have matching calls and returns and automaton based representations allow analysis algorithms to account for the infinite control state space of recursive programs weighted pushdown systems wpdss are generalization of pdss that add general black box abstraction for program data through weights wpdss also generalize other frameworks for interprocedural analysis such as the sharir pnueli functional approach this paper surveys recent work in this area and establishes few new connections with existing work
wireless mobile networks and devices are becoming increasingly popular to provide users the access anytime and anywhere we are witnessing now an unprecedented demand for wireless networks to support both data and real time multimedia traffic the wireless mobile systems are based on cellular approach and the area is covered by cells that overlap each other in mobile cellular systems the handover is very important process many handover algorithms are proposed in the literature however to make better handover and keep the qos in wireless networks is very difficult task for this reason new intelligent algorithms should be implemented to deal with this problem in this paper we carried out comparison study of two handover systems based on fuzzy logic we implement two fuzzy based handover systems fbhs called fbhs and fbhs the performance evaluation via simulations shows that fbhs has better behavior than fbhs and can avoid ping pong effect in all simulation cases
genetic programming has now been used to produce at least instances of results that are competitive with human produced results these human competitive results come from wide variety of fields including quantum computing circuits analog electrical circuits antennas mechanical systems controllers game playing finite algebras photonic systems image recognition optical lens systems mathematical algorithms cellular automata rules bioinformatics sorting networks robotics assembly code generation software repair scheduling communication protocols symbolic regression reverse engineering and empirical model discovery this paper observes that despite considerable variation in the techniques employed by the various researchers and research groups that produced these human competitive results many of the results share several common features many of the results were achieved by using developmental process and by using native representations regularly used by engineers in the fields involved the best individual in the initial generation of the run of genetic programming often contains only small number of operative parts most of the results that duplicated the functionality of previously issued patents were novel solutions not infringing solutions in addition the production of human competitive results as well as the increased intricacy of the results are broadly correlated to increased availability of computing power tracked by moore’s law the paper ends by predicting that the increased availability of computing power through both parallel computing and moore’s law should result in the production in the future of an increasing flow of human competitive results as well as more intricate and impressive results
we present new approach for constructing and verifying higher order imperative programs using the coq proof assistant we build on the past work on the ynot system which is based on hoare type theory that original system was proof of concept where every program verification was accomplished via laborious manual proofs with much code devoted to uninteresting low level details in this paper we present re implementation of ynot which makes it possible to implement fully verified higher order imperative programs with reasonable proof burden at the same time our new system is implemented entirely in coq source files showcasing the versatility of that proof assistant as platform for research on language design and verification both versions of the system have been evaluated with case studies in the verification of imperative data structures such as hash tables with higher order iterators the verification burden in our new system is reduced by at least an order of magnitude compared to the old system by replacing manual proof with automation the core of the automation is simplification procedure for implications in higher order separation logic with hooks that allow programmers to add domain specific simplification rules we argue for the effectiveness of our infrastructure by verifying number of data structures and packrat parser and we compare to similar efforts within other projects compared to competing approaches to data structure verification our system includes much less code that must be trusted namely about hundred lines of coq code defining program logic all of our theorems and decision procedures have or build machine checkable correctness proofs from first principles removing opportunities for tool bugs to create faulty verifications
benchmarks are vital tools in the performance measurement evaluation and comparison of computer hardware and software systems standard benchmarks such as the trec tpc spec sap oracle microsoft ibm wisconsin as ap oo oo xoo benchmarks have been used to assess the system performance these benchmarks are domain specific and domain dependent in that they model typical applications and tie to problem domain test results from these benchmarks are estimates of possible system performance for certain pre determined problem types when the user domain differs from the standard problem domain or when the application workload is divergent from the standard workload they do not provide an accurate way to measure the system performance of the user problem domain system performance of the actual problem domain in terms of data and transactions may vary significantly from the standard benchmarks in this research we address the issue of generalization and precision of benchmark workload model for web search technology the current performance measurement and evaluation method suffers from the rough estimate of system performance which varies widely when the problem domain changes the performance results provided by the vendors cannot be reproduced nor reused in the real users environment hence in this research we tackle the issue of domain boundness and workload boundness which represents the root of the problem of imprecise ir representative and ir reproducible performance results we address the issue by presenting domain independent and workload independent workload model benchmark method which is developed from the perspective of the user requirements and generic constructs we present user driven workload model to develop benchmark in process of workload requirements representation transformation and generation via the common carrier of generic constructs we aim to create more generalized and precise evaluation method which derives test suites from the actual user domain and application setting the workload model benchmark method comprises three main components they are high level workload specification scheme translator of the scheme and set of generators to generate the test database and the test suite they are based on the generic constructs the specification scheme is used to formalize the workload requirements the translator is used to transform the specification the generator is used to produce the test database and the test workload we determine the generic constructs via the analysis of search methods the generic constructs form page model query model and control model in the workload model development the page model describes the web page structure the query model defines the logics to query the web the control model defines the control variables to set up the experiments in this study we have conducted ten baseline research experiments to validate the feasibility and validity of the benchmark method an experimental prototype is built to execute these experiments experimental results demonstrate that the method based on generic constructs and driven by the perspective of user requirements is capable of modeling the standard benchmarks as well as more general benchmark requirements
value speculation is speculative technique proposed to reduce the execution time of programs it relies on predictor checker and recovery mechanism the predictor predicts the result of an instruction in order to issue speculatively its dependent instructions the checker checks the prediction after issuing the predicted instruction and the recovery mechanism deals with mispredictions in order to maintain program correctness previous works on value speculation have considered that the instructions dependent on predicted instruction can be issued before issuing the predicted instruction non delayed issue policy in this work we propose delaying the issue time of the instructions dependent on value predicted instruction until issuing the value predicted instruction delayed issue policy although the potential performance benefits of the delayed issue policy are smaller than that of the non delayed issue policy the recovery mechanism required by the delayed issue policy is simpler than the recovery mechanism required by the non delayed issue policy we have evaluated both issue policies in the context of load value prediction by means of address prediction in order to determine in which scenarios the performance of the delayed issue policy is competitive with that of the non delayed issue policy our results show that the delayed policy is cost effective alternative to the non delayed policy especially for realistic issue queue sizes
the incorporation of context awareness capabilities into pervasive applications allows them to leverage contextual information to provide additional services while maintaining an acceptable quality of service these added capabilities however introduce distinct input space that can affect the behavior of these applications at any point during their execution making their validation quite challenging in this paper we introduce an approach to improve the test suite of context aware application by identifying context aware program points where context changes may affect the application’s behavior and by systematically manipulating the context data fed into the application to increase its exposure to potentially valuable context variations preliminary results indicate that the approach is more powerful than existing testing approaches used on this type of application
shared visual workspace allows multiple people to see similar views of objects and environments prior empirical literature demonstrates that visual information helps collaborators understand the current state of their task and enables them to communicate and ground their conversations efficiently we present an empirical study that demonstrates how action replaces explicit verbal instruction in shared visual workspace pairs performed referential communication task with and without shared visual space detailed sequential analysis of the communicative content reveals that pairs with shared workspace were less likely to explicitly verify their actions with speech rather they relied on visual information to provide the necessary communicative and coordinative cues
this work introduces transformation methodology for functional logic programs based on needed narrowing the optimal and complete operational principle for modern declarative languages which integrate the best features of functional and logic programming we provide correctness results for the transformation system wrt the set of computed values and answer substitutions and show that the prominent properties of needed narrowing namely the optimality wrt the length of derivations and the number of computed solutions carry over to the transformation process and the transformed programs we illustrate the power of the system by taking on in our setting two well known transformation strategies composition and tupling we also provide an implementation of the transformation system which by means of some experimental results highlights the potentiality of our approach
in this paper we give an account of the current state of practice in ontology engineering oe based on the findings of months empirical survey that analyzed oe projects the survey focused on process related issues and looked into the impact of research achievements on real world oe projects the complexity of particular ontology development tasks the level of tool support and the usage scenarios for ontologies the main contributions of this survey are twofold the size of the data set is larger than every other similar endeavor the findings of the survey confirm that oe is an established engineering discipline wrt the maturity and level of acceptance of its main components methodologies etc whereas further research should target economic aspects of oe and the customization of existing technology to the specifics of vertical domains
density biased sampling dbs has been proposed to address the limitations of uniform sampling by producing the desired probability distribution in the sample the ease of producing random sample depends on the available mechanism for accessing the elements of the dataset existing dbs algorithms perform sampling over flat files in this paper we develop new method that exploits spatial indexes and the local density information they preserve to provide good quality of sampling result and fast access to elements of the dataset with the proposed method accurate density estimations can be produced with respect to factors like skew noise or dimensionality moreover significant improvement in sampling time is attained the performance of the proposed method is examined analytically and experimentally the comparative results illustrate its superiority over existing methods
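A minimal sketch of density biased sampling given per-bucket density estimates of the kind a spatial index could preserve; the weighting exponent and the normalization below are illustrative assumptions, not the paper's exact scheme.

    # Include each point with probability decreasing in its bucket's density, so
    # sparse regions are not drowned out as they would be under uniform sampling.
    import random

    def density_biased_sample(points, bucket_of, bucket_density, sample_size, e=1.0):
        weights = [bucket_density[bucket_of(p)] ** (-e) for p in points]
        total = sum(weights)
        sample = []
        for p, w in zip(points, weights):
            if random.random() < min(1.0, sample_size * w / total):
                sample.append(p)
        return sample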
the evolving submicron technology is making it particularly attractive to use decentralized designs common form of decentralization adopted in processors is to partition the execution core into multiple clusters each cluster has small instruction window and set of functional units number of algorithms have been proposed for distributing instructions among the clusters the first part of this paper analyzes qualitatively as well as quantitatively the effect of various hardware parameters such as the type of cluster interconnect the fetch size the cluster issue width the cluster window size and the number of clusters on the performance of different instruction distribution algorithms the study shows that the relative performance of the algorithms is very sensitive to these hardware parameters and that the algorithms that perform relatively better with four or fewer clusters are generally not the best ones for larger number of clusters this is important given that with an imminent increase in the transistor budget more clusters are expected to be integrated on single chip the second part of the paper investigates alternate interconnects that provide scalable performance as the number of clusters is increased in particular it investigates two hierarchical interconnects single ring of crossbars and multiple rings of crossbars as well as instruction distribution algorithms to take advantage of these interconnects our study shows that these new interconnects with the appropriate distribution techniques achieve an ipc instructions per cycle that is percent better than the most scalable existing configuration and is within percent of that achieved by hypothetical ideal processor having cycle latency crossbar interconnect these results confirm the utility and applicability of hierarchical interconnects and hierarchical distribution algorithms in clustered processors
we propose novel communication efficient topology control algorithm for each wireless node to select communication neighbors and adjust its transmission power such that all nodes together self form topology that is energy efficient simultaneously for both unicast and broadcast communications we prove that the proposed topology is planar which guarantees packet delivery if certain localized routing method is used it is power efficient for unicast the energy needed to connect any pair of nodes is within small constant factor of the minimum under common power attenuation model it is efficient for broadcast the energy consumption for broadcasting data on top of it is asymptotically the best compared with structures constructed locally it has constant bounded logical degree which will potentially reduce interference and signal contention we further prove that the average physical degree of all nodes is bounded by small constant to the best of our knowledge this is the first communication efficient distributed algorithm to achieve all these properties previously only centralized algorithm was reported in moreover by assuming that the id and position of every node can be represented in log bits for wireless network of nodes our method uses at most messages where each message is of log bits we also show that this structure can be efficiently updated for dynamical network environment our theoretical results are corroborated in the simulations
comparing the expressive power of access control models is recognized as fundamental problem in computer security such comparisons are generally based on simulations between different access control schemes however the definitions for simulations that are used in the literature make it impossible to put results and claims about the expressive power of access control models into single context and to compare such models to one another in meaningful way we propose theory for comparing the expressive power of access control models we perceive access control systems as state transition systems and require simulations to preserve security properties we discuss the rationale behind such theory apply the theory to reexamine some existing work on the expressive power of access control models in the literature and present three results we show that rbac with particular administrative model from the literature arbac is limited in its expressive power atam augmented typed access matrix is more expressive than tam typed access matrix thereby solving an open problem posed in the literature and trust management language is at least as expressive as rbac with particular administrative model the ura component of arbac
in this article we consider the issue of ranking xml data and data sources in distributed xml data warehouse our ranking model applies to service oriented data management applications where web services store and exchange xml fragments each service publishes set of operations implemented as parameterized queries on local xml data warehouse integrating locally generated data and query results received from other services we propose new way for ranking distributed data and data sources taking into consideration their usage for the evaluation of queries the main results are formal ranking model of data queries and services and an implementation on data warehouse
previous studies have revealed that paravirtualization imposes minimal performance overhead on high performance computing hpc workloads while exposing numerous benefits for this field in this study we are investigating the memory hierarchy characteristics of paravirtualized systems and their impact on automatically tuned software systems we are presenting an accurate characterization of memory attributes using hardware counters and user process accounting for that we examine the proficiency of atlas quintessential example of an autotuning software system in tuning the blas library routines for paravirtualized systems in addition we examine the effects of paravirtualization on the performance boundary our results show that the combination of atlas and xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles our research thus exposes new benefits to memory intensive applications arising from the ability to slim down the guest os without influencing the system performance in addition our findings support novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds
we consider the problem of using sampling to estimate the result of an aggregation operation over subset based sql query where subquery is correlated to an outer query by not exists not in exists or in clause we design an unbiased estimator for our query and prove that it is indeed unbiased we then provide second biased estimator that makes use of the superpopulation concept from statistics to minimize the mean squared error of the resulting estimate the two estimators are tested over an extensive set of experiments
solid state drives perform random reads more than faster than traditional magnetic hard disks while offering comparable sequential read and write bandwidth because of their potential to speed up applications as well as their reduced power consumption these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers however although they may benefit applications that stress random reads immediately they may not improve database applications especially those running long data analysis queries database query processing engines have been designed around the speed mismatch between random and sequential on hard disks and their algorithms currently emphasize sequential accesses for disk resident data in this paper we investigate data structures and algorithms that leverage fast random reads to speed up selection projection and join operations in relational query processing we first demonstrate how column based layout within each page reduces the amount of data read during selections and projections we then introduce flashjoin general pipelined join algorithm that minimizes accesses to base and intermediate relational data flashjoin’s binary join kernel accesses only the join attributes producing partial results in the form of join index subsequently its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed flashjoin significantly reduces memory and requirements for each join in the query we implemented these techniques inside postgres and experimented with an enterprise ssd drive our techniques improved query runtimes by up to for queries ranging from simple relational scans and joins to full tpc queries
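A schematic rendering of the two-kernel structure described for flashjoin (function names and data formats are assumptions): the join kernel touches only row ids and join keys and emits a join index, and the fetch kernel later performs the reads for just the columns the rest of the plan needs, which is attractive when random reads are cheap, as on SSDs.

    # Binary join kernel producing a join index, plus a late fetch kernel.
    def join_kernel(r_keys, s_keys):
        """r_keys, s_keys: iterables of (row_id, join_key); returns the join index."""
        index = {}
        for rid, key in r_keys:
            index.setdefault(key, []).append(rid)
        return [(rid, sid) for sid, key in s_keys for rid in index.get(key, [])]

    def fetch_kernel(join_index, fetch_r, fetch_s, columns_r, columns_s):
        """fetch_*: callables doing (random) reads of requested columns by row id,
        each returning a tuple of column values."""
        return [fetch_r(rid, columns_r) + fetch_s(sid, columns_s)
                for rid, sid in join_index]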
well known method to represent partially ordered set order for short consists in associating to each element of subset of fixed set such that the order relation coincides with subset inclusion such an embedding of into the lattice of all subsets of is called bit vector encoding of these encodings provide an interesting way to store an order they are economical with space and comparisons between elements can be performed efficiently via subset inclusion testsgiven an order minimizing the size of the encoding ie the cardinal of is however difficult problem the smallest size is called the dimension of and denoted by dim in the literature the decision problem for the dimension has been classified as np complete and generating small bit vector encodings is challenging issueseveral works deal with bit vector encodings from theoretical point of view in this article we focus on computational complexity results after synthesis of known results we come back on the np completeness by detailing proof and enforcing the conclusion with non approximability ratios besides this general result we investigate the complexity of the dimension for the class of trees we describe approximation algorithm for this class it uses an optimal balancing strategy which solves conjecture of krall vitek and horspool several interesting open problems are listed
leakage power reduction in cache memories continues to be critical area of research because of the promise of significant pay off various techniques have been developed so far that can be broadly categorized into state preserving eg drowsy caches and non state preserving eg cache decay decay saves more leakage but also incurs dynamic power overhead in the form of induced misses previous work has shown that depending on the leakage vs dynamic power trade off one or the other technique can be better several factors such as cache architecture technology parameters and temperature affect this trade off our work proposes the first mechanism to the best of our knowledge that takes into account temperature in adjusting the leakage control policy at run time at very low temperatures leakage is relatively weak so the need to tightly control it is not as important as the need to minimize extra dynamic power eg decay induced misses or performance loss we use hybrid decay drowsy policy where the main benefit comes from decaying cache lines while the drowsy mode is used to save leakage in long decay intervals to adapt the decay mode to temperature we propose simple triggering mechanism that is based on the principles of decaying thermal sensors and as such tied to temperature the hotter the cache is the faster cache lines are decayed since it is beneficial to do so with very high leakage currents conversely when the cache temperature is low our mechanism defers putting cache lines in decay mode to avoid dynamic power overhead but still saves significant amount of leakage using the drowsy mode our study shows that across wide range of temperatures the simple adaptability of our proposal yields consistently better results than either the decay mode or drowsy mode alone improving over the best by as much as
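A simplified sketch of the temperature-triggered policy described above; the thresholds, the halving rule, and the cycle counts are invented for illustration, not taken from the paper.

    # Hotter cache -> shorter decay interval; below a reference temperature only
    # the drowsy (state-preserving) mode is used.
    def decay_interval_cycles(temp_c,
                              base_interval=512_000,
                              reference_temp=45.0,
                              halve_every=10.0):
        """Halve the decay interval for every `halve_every` degrees above reference."""
        if temp_c <= reference_temp:
            return None                     # too cool: rely on drowsy mode only
        return int(base_interval / (2 ** ((temp_c - reference_temp) / halve_every)))

    def policy_for_line(idle_cycles, temp_c, drowsy_after=4_000):
        interval = decay_interval_cycles(temp_c)
        if interval is not None and idle_cycles >= interval:
            return "decay"                  # gate the line off, contents lost
        if idle_cycles >= drowsy_after:
            return "drowsy"                 # low-voltage state, contents preserved
        return "active"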
the emerging edge services architecture promises to improve the availability and performance of web services by replicating servers at geographically distributed sites key challenge in such systems is data replication and consistency so that edge server code can manipulate shared data without incurring the availability and performance penalties that would be incurred by accessing traditional centralized database this paper explores using distributed object architecture to build an edge service system for an e-commerce application an online bookstore represented by the tpc benchmark we take advantage of application specific semantics to design distributed objects to manage specific subset of shared information using simple and effective consistency models our experimental results show that by slightly relaxing consistency within individual distributed objects we can build an edge service system that is highly available and efficient for example in one experiment we find that our object based edge server system provides factor of five improvement in response time over traditional centralized cluster architecture and factor of nine improvement over an edge service system that distributes code but retains centralized database
the subject of this paper is flow and context insensitive pointer analysis we present novel approach for precisely modelling struct variables and indirect function calls our method emphasises efficiency and simplicity and extends the language of set constraints we experimentally evaluate the precision cost trade off using benchmark suite of common programs between to lines of code our results indicate the field sensitive analysis is more expensive to compute but yields significantly better precision
this paper investigates the appropriateness of knowledge management system kms designs for different organizational knowledge processing challenges building on the theory of task technology fit ttf we argue that different kms designs are more effective for different knowledge tasks an exploratory field experiment was conducted in the context of internet based knowledge sharing services to provide empirical support for our hypotheses the results of our experiment show that kms designed to support the goal generate is more appropriate for divergent type knowledge problems because of its affordances for iterative brainstorming processes conversely for convergent type knowledge processing challenges kms with the goal choose that supports the ability to clarify and to analyze is more effective
it is often impossible to obtain one size fits all solution for high performance algorithms when considering different choices for data distributions parallelism transformations and blocking the best solution to these choices is often tightly coupled to different architectures problem sizes data and available system resources in some cases completely different algorithms may provide the best performance current compiler and programming language techniques are able to change some of these parameters but today there is no simple way for the programmer to express or the compiler to choose different algorithms to handle different parts of the data existing solutions normally can handle only coarse grained library level selections or hand coded cutoffs between base cases and recursive cases we present petabricks new implicitly parallel language and compiler where having multiple implementations of multiple algorithms to solve problem is the natural way of programming we make algorithmic choice first class construct of the language choices are provided in way that also allows our compiler to tune at finer granularity the petabricks compiler autotunes programs by making both fine grained as well as algorithmic choices choices also include different automatic parallelization techniques data distributions algorithmic parameters transformations and blocking additionally we introduce novel techniques to autotune algorithms for different convergence criteria when choosing between various direct and iterative methods the petabricks compiler is able to tune program in such way that delivers near optimal efficiency for any desired level of accuracy the compiler has the flexibility of utilizing different convergence criteria for the various components within single algorithm providing the user with accuracy choice alongside algorithmic choice
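the following toy python sketch (not petabricks syntax) conveys the flavour of algorithmic choice plus autotuning: two interchangeable implementations are registered and a trivial tuner times them on a training input and picks one

# a toy rendition of the algorithmic-choice idea; petabricks does this inside
# the compiler, this is only an illustrative sketch
import time, random

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        j, v = i, a[i]
        while j > 0 and a[j - 1] > v:
            a[j] = a[j - 1]
            j -= 1
        a[j] = v
    return a

def builtin_sort(a):
    return sorted(a)

def autotune(choices, training_input):
    # time every registered implementation and keep the fastest one
    timings = {}
    for f in choices:
        start = time.perf_counter()
        f(training_input)
        timings[f] = time.perf_counter() - start
    return min(timings, key=timings.get)

best = autotune([insertion_sort, builtin_sort],
                [random.random() for _ in range(2000)])
print("selected:", best.__name__)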
oscillatory motion is ubiquitous in computer graphics yet existing animation techniques are ill suited to its authoring we introduce new type of spline for this purpose known as wiggly spline the spline generalizes traditional piecewise cubics when its resonance and damping are set to zero but creates oscillatory animation when its resonance and damping are changed the spline provides combination of direct manipulation and physical realism to create overlapped and propagating motion we generate phase shifts of the wiggly spline and use these to control appropriate degrees of freedom in model the phase shifts can be created directly by procedural techniques or through paint like interface further option is to derive the phase shifts statistically by analyzing time series of simulation in this case the wiggly spline makes it possible to canonicalize simulation generalize it by providing frequency and damping controls and control it through direct manipulation
the total number of popular search engines has decreased over time from its peak of the late however when combining the three remaining major ones yahoo google ms live with large repositories of information eg bbccom nytcom gazetapl etc the total number of important information sources can be seen as slowly increasing focusing on utilization of search engines only it is easy to observe that the same query issued to each one of them results in different suggestions the question that thus arises is whether it is possible and worthwhile to combine responses obtained from each one of them into single answer set in this paper we look into three approaches to achieving this goal which are based on game theory auction and consensus methods while our focus is to study and compare their performance
the ability to produce join results before having read an entire input early reduces query response time this is especially important for interactive applications and for joins in mediator systems that may have to wait on network delays when reading the inputs although several early join algorithms have been proposed there has been no formal treatment of how different reading policies affect the number of results produced in this work we show that alternate reading is optimal among fixed reading policies and we provide expressions for the expected number of results produced over time further we analyze policies that adapt their execution to the tuples already read and to the distribution of the inputs we present greedy adaptive algorithm that is optimal in that it outperforms all reading policies on average however the greedy policy is shown to perform only marginally better than the alternating policy thus the alternating policy emerges as policy that is easy to implement requires no knowledge of the input distributions is optimal among fixed policies and is nearly optimal among all policies
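a minimal python sketch of the alternating reading policy in a symmetric hash join, assuming both inputs fit in memory; it only illustrates why interleaved reading produces join results early, it is not the paper's analytical model

from itertools import zip_longest

def symmetric_hash_join(r, s, key=lambda t: t[0]):
    # alternate: read one tuple from r then one from s per step, probing the
    # hash table built so far on the other side
    hr, hs = {}, {}
    out = []
    for tr, ts in zip_longest(r, s):
        if tr is not None:
            hr.setdefault(key(tr), []).append(tr)
            out += [(tr, u) for u in hs.get(key(tr), [])]
        if ts is not None:
            hs.setdefault(key(ts), []).append(ts)
            out += [(v, ts) for v in hr.get(key(ts), [])]
    return out

r = [(1, "r1"), (2, "r2"), (2, "r3")]
s = [(2, "s1"), (1, "s2")]
print(symmetric_hash_join(r, s))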
in recent years much effort has been put in ontology learning however the knowledge acquisition process is typically focused in the taxonomic aspect the discovery of non taxonomic relationships is often neglected even though it is fundamental point in structuring domain knowledge this paper presents an automatic and unsupervised methodology that addresses the non taxonomic learning process for constructing domain ontologies it is able to discover domain related verbs extract non taxonomically related concepts and label relationships using the web as corpus the paper also discusses how the obtained relationships can be automatically evaluated against wordnet and presents encouraging results for several domains
we present fast space efficient algorithm for constructing compressed suffix arrays csa the algorithm requires log n time in the worst case and only bits of extra space in addition to the csa as the basic step we describe an algorithm for merging two csas we show that the construction algorithm can be parallelized in symmetric multiprocessor system and discuss the possibility of distributed implementation we also describe parallel implementation of the algorithm capable of indexing several gigabytes per hour
tuple dropping though commonly used for load shedding in most data stream operations is generally inadequate for multi way windowed stream joins the join output rate can be unnecessarily reduced because tuple dropping fails to exploit the time correlations likely to exist among interrelated streams in this paper we introduce grubjoin an adaptive multi way windowed stream join that effectively performs time correlation aware cpu load shedding grubjoin maximizes the output rate by achieving near optimal window harvesting which picks only the most profitable segments of individual windows for the join due mainly to the combinatorial explosion of possible multi way join sequences involving different window segments grubjoin faces unique challenges that do not exist for binary joins such as determining the optimal window harvesting configuration in time efficient manner and learning the time correlations among the streams without introducing overhead to tackle these challenges we formalize window harvesting as an optimization problem develop greedy heuristics to determine near optimal window harvesting configurations and use approximation techniques to capture the time correlations our experimental results show that grubjoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent
we study the problem of secure communication in multi channel single hop radio network with malicious adversary that can cause collisions and spoof messages we assume no pre shared secrets or trusted third party infrastructure the main contribution of this paper is a randomized fast authenticated message exchange protocol that enables nodes to exchange messages in reliable and authenticated manner it runs in log time and has optimal resilience to disruption where is the set of pairs of nodes that need to swap messages is the total number of nodes the number of channels and log rounds for the setup phase and log rounds for an arbitrary pair to communicate by contrast existing solutions rely on pre shared secrets trusted third party infrastructure and or the assumption that all interference is non malicious
in the last decade empirical studies on object oriented design metrics have shown some of them to be useful for predicting the fault proneness of classes in object oriented software systems this research did not however distinguish among faults according to the severity of impact it would be valuable to know how object oriented design metrics and class fault proneness are related when fault severity is taken into account in this paper we use logistic regression and machine learning methods to empirically investigate the usefulness of object oriented design metrics specifically subset of the chidamber and kemerer suite in predicting fault proneness when taking fault severity into account our results based on public domain nasa data set indicate that most of these design metrics are statistically related to fault proneness of classes across fault severity and the prediction capabilities of the investigated metrics greatly depend on the severity of faults more specifically these design metrics are able to predict low severity faults in fault prone classes better than high severity faults in fault prone classes
direction is an important spatial concept that is used in many fields such as geographic information systems gis and image interpretation it is also frequently used as selection condition in spatial queries previous work has modeled direction as relational predicate between spatial objects conversely in this paper we model direction as new kind of spatial object using the concepts of vectors points and angles the basic approach is to model direction as unit vector this novel view of direction has several obvious advantages being modeled as spatial object direction object can have its own attributes and operation set secondly new spatial data types such as oriented spatial objects and open spatial objects can be defined at the abstract object level finally the object view of direction makes direction reasoning easy and also reduces the need for large number of inference rules these features are important in spatial query processing and optimization the applicability of the direction model is demonstrated by geographic query examples
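a small python sketch of the object view of direction, with invented attribute and operation names: a direction is a unit vector with its own operations rather than a predicate over two spatial objects

# direction modelled as a first class object (a unit vector)
import math

class Direction:
    def __init__(self, angle_deg):
        self.x = math.cos(math.radians(angle_deg))
        self.y = math.sin(math.radians(angle_deg))

    def angle_to(self, other):
        # angle between two direction objects in degrees
        dot = max(-1.0, min(1.0, self.x * other.x + self.y * other.y))
        return math.degrees(math.acos(dot))

    @staticmethod
    def between(p, q):
        # direction object from point p towards point q
        dx, dy = q[0] - p[0], q[1] - p[1]
        return Direction(math.degrees(math.atan2(dy, dx)))

north, east = Direction(90), Direction(0)
print(round(north.angle_to(east)))                              # 90
print(round(Direction.between((0, 0), (1, 1)).angle_to(east)))  # 45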
this paper proposes cycle accounting architecture for simultaneous multithreading smt processors that estimates the execution times for each of the threads had they been executed alone while they are running simultaneously on the smt processor this is done by accounting each cycle to either base miss event or waiting cycle component during multi threaded execution single threaded alone execution time is then estimated as the sum of the base and miss event components the waiting cycle component represents the lost cycle count due to smt execution the cycle accounting architecture incurs reasonable hardware cost around kb of storage and estimates single threaded performance with average prediction errors around for two program workloads and for four program workloads the cycle accounting architecture has several important applications to system software and its interaction with smt hardware for one the estimated single thread alone execution time provides an accurate picture to system software of the actually consumed processor cycles per thread the alone execution time instead of the total execution time timeslice may make system software scheduling policies more effective second new class of thread progress aware smt fetch policies based on per thread progress indicators enable system software level priorities to be enforced at the hardware level
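a back of the envelope python rendition of the accounting identity described above, with made up cycle counts: each cycle is charged to a base, miss event or waiting component and the alone execution time is estimated as base plus miss event cycles

# hypothetical cycle counts for one thread of a two-program smt workload
base_cycles    = 6_000_000
miss_cycles    = 1_500_000   # cycles charged to miss events
waiting_cycles = 2_500_000   # cycles lost because the co-runner held resources

smt_time_estimate   = base_cycles + miss_cycles + waiting_cycles
alone_time_estimate = base_cycles + miss_cycles   # single-threaded estimate

slowdown = smt_time_estimate / alone_time_estimate
print(f"estimated alone time: {alone_time_estimate} cycles, slowdown {slowdown:.2f}x")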
in this paper we provide simple game theoretic model of an online question and answer forum we focus on factual questions in which user responses aggregate while question remains open each user has unique piece of information and can decide when to report this information the asker prefers to receive information sooner rather than later and will stop the process when satisfied with the cumulative value of the posted information we consider two distinct cases complements case in which each successive piece of information is worth more to the asker than the previous one and substitutes case in which each successive piece of information is worth less than the previous one best answer scoring rule is adopted to model yahoo answers and is effective for substitutes information where it isolates an equilibrium in which all users respond in the first round but we find that this rule is ineffective for complements information isolating instead an equilibrium in which all users respond in the final round in addressing this we demonstrate that an approval voting scoring rule and proportional share scoring rule can enable the most efficient equilibrium with complements information under certain conditions by providing incentives for early responders as well as the user who submits the final answer
statistical summaries of traffic in ip networks are at the heart of network operation and are used to recover information on the traffic of arbitrary subpopulations of flows it is therefore of great importance to collect the most accurate and informative summaries given the router’s resource constraints cisco’s sampled netflow based on aggregating sampled packet stream into flows is the most widely deployed such system we observe two sources of inefficiency in current methods firstly single parameter the sampling rate is used to control utilization of both memory and processing access speed which means that it has to be set according to the bottleneck resource secondly the unbiased estimators are applicable to summaries that in effect are collected through uneven use of resources during the measurement period information from the earlier part of the measurement period is either not collected at all and fewer counters are utilized or discarded when performing sampling rate adaptation we develop algorithms that collect more informative summaries through an even and more efficient use of available resources the heart of our approach is novel derivation of unbiased estimators that use these more informative counts we show how to efficiently compute these estimators and prove analytically that they are superior have smaller variance on all packet streams and subpopulations to previous approaches simulations on pareto distributions and ip flow data show that the new summaries provide significantly more accurate estimates we provide an implementation design that can be efficiently deployed at routers
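for context, the baseline unbiased estimator that packet sampled measurement relies on is simple inverse probability scaling; the sketch below (synthetic data, not the paper's improved estimators) shows the idea in python

# inverse probability estimate: scale every sampled packet by 1/p
import random

def sampled_estimate(packet_sizes, p):
    sampled = [s for s in packet_sizes if random.random() < p]
    est_packets = len(sampled) / p   # unbiased estimate of the packet count
    est_bytes = sum(sampled) / p     # unbiased estimate of the byte count
    return est_packets, est_bytes

random.seed(0)
stream = [random.choice([64, 576, 1500]) for _ in range(100_000)]
print("true:", len(stream), sum(stream))
print("estimated:", sampled_estimate(stream, 0.01))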
mixed presence groupware mpg supports both co located and distributed participants working over shared visual workspace it does this by connecting multiple single display groupware workspaces together through shared data structure our implementation and observations of mpg systems exposes two problems the first is display disparity where connecting heterogeneous tabletop and vertical displays introduces issues in how one seats people around the virtual table and how one orients work artifacts the second is presence disparity where participant’s perception of the presence of others is markedly different depending on whether collaborator is co located or remote this is likely caused by inadequate consequential communication between remote participants which in turn disrupts group collaborative and communication dynamics to mitigate display and presence disparity problems we determine virtual seating positions and replace conventional telepointers with digital arm shadows that extend from person’s side of the table to their pointer location
internet topology discovery consists of inferring the inter router connectivity links and the mapping from ip addresses to routers alias resolution current topology discovery techniques use ttl limited traceroute probes to discover links and use direct router probing to resolve aliases the often ignored record route rr ip option provides source of disparate topology data that could augment existing techniques but it is difficult to properly align with traceroute based topologies because router rr implementations are under standardized correctly aligned rr and traceroute topologies have fewer false links include anonymous and hidden routers and discover aliases for routers that do not respond to direct probing more accurate and feature rich topologies benefit overlay construction and network diagnostics modeling and measurement we present discarte system for aligning and cross validating rr and traceroute topology data using observed engineering practices discarte uses disjunctive logic programming dlp logical inference and constraint solving technique to intelligently merge rr and traceroute data we demonstrate that the resultant topology is more accurate and complete than previous techniques by validating its internal consistency and by comparing to publicly available topologies we classify irregularities in router implementations and introduce divide and conquer technique used to scale dlp to internet sized systems
though skyline queries in wireless sensor networks have been intensively studied in recent years existing solutions are not optimized for multiple skyline queries as they focus on single full space skyline queries it is not efficient to individually evaluate skyline queries especially in wireless sensor network environment where power consumption should be minimized in this paper we propose an energy efficient multi skyline evaluation emse algorithm to effectively evaluate multiple skyline queries in wireless sensor networks emse first utilizes global optimization mechanism to reduce the number of skyline queries and save on query propagation cost and parts of redundant result transmission cost as consequence then it utilizes local optimization mechanism to share the skyline results among skyline queries and uses some filtering policies to further eliminate unnecessary data transmission and save the skyline result transmission cost as consequence the experimental results show that the proposed algorithm is energy efficient when evaluating multiple skyline queries over wireless sensor networks
the notion of product families is becoming more and more popular both in research and in industry every product family initiative that is started within company has its own context such as particular business strategy and particular application domain each product family has its own specific characteristics that have to fit in with its context in this paper we will describe two dimensions for classifying product families the first dimension deals with the coverage of the product family platform platform coverage deals with the proportion of the functionality provided by the platform and the additional functionality needed to derive specific product within the product family the second dimension deals with the variation mechanisms that are used to derive specific product from the generic platform the coverage of the platform and the variation mechanisms used are not totally unrelated we will discuss various types of platform coverage and variation mechanisms including their characteristics these two dimensions are based on experience gained with number of product families we will look at four of these in greater detail to illustrate our ideas we believe that these dimensions will aid the classification of product families this will both facilitate the selection of new product family approach for particular context and support the evaluation of existing product families
developing sensor network applications demands new set of tools to aid programmers number of simulation environments have been developed that provide varying degrees of scalability realism and detail for understanding the behavior of sensor networks to date however none of these tools have addressed one of the most important aspects of sensor application design that of power consumption while simple approximations of overall power usage can be derived from estimates of node duty cycle and communication rates these techniques often fail to capture the detailed low level energy requirements of the cpu radio sensors and other peripherals in this paper we present scalable simulation environment for wireless sensor networks that provides an accurate per node estimate of power consumption powertossim is an extension to tossim an event driven simulation environment for tinyos applications in powertossim tinyos components corresponding to specific hardware peripherals such as the radio eeprom leds and so forth are instrumented to obtain trace of each device’s activity during the simulation run powertossim employs novel code transformation technique to estimate the number of cpu cycles executed by each node eliminating the need for expensive instruction level simulation of sensor nodes powertossim includes detailed model of hardware energy consumption based on the mica sensor node platform through instrumentation of actual sensor nodes we demonstrate that powertossim provides accurate estimation of power consumption for range of applications and scales to support very large simulations
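the per node energy accounting can be pictured as the python sketch below, where energy is current times voltage times active time summed over components; the current values are placeholders, not the calibrated mica numbers used by powertossim

ACTIVE_CURRENT_A = {        # hypothetical per component draw in amperes
    "cpu": 0.008,
    "radio_rx": 0.010,
    "radio_tx": 0.012,
    "leds": 0.0066,
    "eeprom": 0.003,
}
VOLTAGE = 3.0

def node_energy(activity_seconds):
    # activity_seconds maps component name -> total active time in seconds
    return sum(ACTIVE_CURRENT_A[c] * VOLTAGE * t
               for c, t in activity_seconds.items())

trace = {"cpu": 120.0, "radio_rx": 40.0, "radio_tx": 5.0, "leds": 2.0}
print(f"{node_energy(trace):.3f} joules")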
in prefix hijacking an autonomous system as advertises routes for prefixes that are owned by another as and ends up hijacking traffic that is intended to the owner while misconfigurations and or misunderstandings of policies are the likely reasons behind the majority of those incidents malicious incidents have also been reported recent works have focused on malicious scenarios that aim to maximize the amount of hijacked traffic from all ases without considering scenarios where the attacker is aiming to avoid detection in this paper we expose new class of prefix hijacking that is stealthy in nature the idea is to craft path of tunable lengths that deceive only small subset of ases by finely tuning the degree to which ases are effected the attacker can handle the hijacked traffic while the victimized as would not observe major reduction in its incoming traffic that would raise an alarm we give upper bounds on the impact of those attacks via simulations on real bgp internet announcements obtained from route views we discuss shortcomings in current proposed defense mechanisms against attackers which can falsify traceroute replies we also present defense mechanism against stealthy prefix hijacking attacks
in this paper we present an original and useful way for specifying and verifying temporal properties of concurrent programs with our tool named quasar quasar is based on asis and uses formal methods model checking properties that can be checked are either general like deadlock or fairness or more context specific referring to tasks states or to value of variables properties are then expressed in temporal logic in order to simplify the expression of these properties we define some templates that can be instantiated with specific items of the programs we demonstrate the usefulness of these templates by verifying subtle variations of the peterson algorithm thus although quasar uses up to date formal methods it remains accessible to large class of practitioners
the similarity join is an important operation for mining high dimensional feature spaces given two data sets the similarity join computes all tuples that are within distance epsilon one of the most efficient algorithms for processing similarity joins is the multidimensional spatial join msj by koudas and sevcik in our previous work pursued for the two dimensional case we found however that msj has several performance shortcomings in terms of cpu and cost as well as memory requirements therefore msj is not generally applicable to high dimensional data in this paper we propose new algorithm named generic external space sweep gess gess introduces modest rate of data replication to reduce the number of expensive distance computations we present new cost model for replication an model and an inexpensive method for duplicate removal the principal component of our algorithm is highly flexible replication engine our analytical model predicts tremendous reduction of the number of expensive distance computations by several orders of magnitude in comparison to msj factor in addition the memory requirements of gess are shown to be lower by several orders of magnitude furthermore the cost of our algorithm is by factor better independent from the fact whether replication occurs or not our analytical results are confirmed by large series of simulations and experiments with synthetic and real high dimensional data sets
web caches are traditionally organised in simple tree like hierarchy in this paper new architecture is proposed where federations of caches are distributed globally caching data partially the advantages of the proposed system are that contention on global caches is reduced while at the same time improving the scalability of the system since extra cache resources can be added on the fly among other topics discussed in this paper are the scalability of the proposed system the algorithms used to control the federation of web caches and the approach used to identify the potential web cache partners in order to obtain successful collaborative web caching system the formation of federations must be controlled by an algorithm that takes the dynamics of the internet traffic into consideration we use the history of web cache access in order to determine how federations should be formed initial performance results of simulation of number of nodes are promising
in recent years there has been prevalence of search engines being employed to find useful information in the web as they efficiently explore hyperlinks between web pages which define natural graph structure that yields good ranking unfortunately current search engines cannot effectively rank those relational data which exists on dynamic websites supported by online databases in this study to rank such structured data ie find the best items we propose an integrated online system consisting of compressed data structure to encode the dominant relationship of the relational data efficient querying strategies and updating scheme are devised to facilitate the ranking process extensive experiments illustrate the effectiveness and efficiency of our methods as such we believe the work in this paper can be complementary to traditional search engines
co occurrence data is quite common in many real applications latent semantic analysis lsa has been successfully used to identify semantic relations in such data however lsa can only handle single co occurrence relationship between two types of objects in practical applications there are many cases where multiple types of objects exist and any pair of these objects could have pairwise co occurrence relation all these co occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully in this paper we propose novel algorithm lsa which conducts latent semantic analysis by incorporating all pairwise co occurrences among multiple types of objects based on the mutual reinforcement principle lsa identifies the most salient concepts among the co occurrence data and represents all the objects in unified semantic space lsa is general and we show that several variants of lsa are special cases of our algorithm experiment results show that lsa outperforms lsa on multiple applications including collaborative filtering text clustering and text categorization
bitmap indices are efficient for answering queries on low cardinality attributes in this article we present new compression scheme called word aligned hybrid wah code that makes compressed bitmap indices efficient even for high cardinality attributes we further prove that the new compressed bitmap index like the best variants of the tree index is optimal for one dimensional range queries more specifically the time required to answer one dimensional range query is linear function of the number of hits this strongly supports the well known observation that compressed bitmap indices are efficient for multidimensional range queries because results of one dimensional range queries computed with bitmap indices can be easily combined to answer multidimensional range queries our timing measurements on range queries not only confirm the linear relationship between the query response time and the number of hits but also demonstrate that wah compressed indices answer queries faster than the commonly used indices including projection indices tree indices and other compressed bitmap indices
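a compact python sketch of word aligned hybrid encoding with 32 bit words, cutting the bitmap into 31 bit groups and turning runs of identical all zero or all one groups into fill words; decoding and the bitwise query algebra are omitted and details may differ from the authors' implementation

def wah_encode(bits):
    # pad to a multiple of 31 for simplicity
    bits = bits + [0] * (-len(bits) % 31)
    groups = [bits[i:i + 31] for i in range(0, len(bits), 31)]
    words, i = [], 0
    while i < len(groups):
        g = groups[i]
        if set(g) <= {g[0]}:                     # all bits in the group equal
            run = 1
            while i + run < len(groups) and groups[i + run] == g:
                run += 1
            words.append((1 << 31) | (g[0] << 30) | run)   # fill word, msb set
            i += run
        else:
            words.append(int("".join(map(str, g)), 2))     # literal word
            i += 1
    return words

bitmap = [0] * 93 + [1, 0, 1] + [1] * 59
print([hex(w) for w in wah_encode(bitmap)])
# e.g. ['0x80000003', '0x5fffffff', '0xc0000001']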
the opportunity for users to participate in design and development processes has expanded in recent years through such communication and information technologies as mailing lists bug trackers usage monitoring rich interactions between users and service center staff remote usability testing and so on key question therefore is deciding how to engage users in design and development through such technologies this paper addresses this question by reviewing literature on end user programming and open source development to develop framework concerning user roles and discourse the framework makes two claims user roles and social structure emerge after the introduction of software application role differentiation and different roles demand different kinds of discourse for deciding what to do and for reflecting upon intended and unintended consequences role discourse demands to show its application the framework is used to analyze the development of delicious breakthrough application for social bookmarking this development process is notable because it is characteristic of open source software development in some respects but the code is not made available publicly this hybridization appears to be widely applicable and suggests how design and development processes can be structured as service where the design and development of the system proceeds simultaneously with the formation and nurturing of community of users
autocompletion is useful feature when user is doing look up from table of records with every letter being typed autocompletion displays strings that are present in the table containing as their prefix the search string typed so far just as there is need for making the lookup operation tolerant to typing errors we argue that autocompletion also needs to be error tolerant in this paper we take first step towards addressing this problem we capture input typing errors via edit distance we show that naive approach of invoking an offline edit distance matching algorithm at each step performs poorly and present more efficient algorithms our empirical evaluation demonstrates the effectiveness of our algorithms
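a naive but faithful python illustration of error tolerant autocompletion: each candidate keeps the last row of its edit distance table against the query typed so far, one new row is computed per keystroke, and a candidate is suggested when some prefix of it is within distance k of the query; this is the baseline the paper improves upon, not its optimized algorithms

class FuzzyAutocomplete:
    def __init__(self, strings, k=1):
        self.k = k
        self.rows = {s: list(range(len(s) + 1)) for s in strings}
        self.q = ""

    def type_char(self, c):
        self.q += c
        i = len(self.q)
        for s, prev in self.rows.items():
            row = [i] + [0] * len(s)
            for j in range(1, len(s) + 1):
                row[j] = min(prev[j] + 1,                       # delete c
                             row[j - 1] + 1,                    # insert s[j-1]
                             prev[j - 1] + (c != s[j - 1]))     # substitute/match
            self.rows[s] = row
        # suggest strings with some prefix within edit distance k of the query
        return [s for s, row in self.rows.items() if min(row) <= self.k]

ac = FuzzyAutocomplete(["database", "datacenter", "dalmatian"], k=1)
for ch in "datc":
    suggestions = ac.type_char(ch)
print(suggestions)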
in this paper we report research results investigating microblogging as form of electronic word of mouth for sharing consumer opinions concerning brands we analyzed more than microblog postings containing branding comments sentiments and opinions we investigated the overall structure of these microblog postings the types of expressions and the movement in positive or negative sentiment we compared automated methods of classifying sentiment in these microblogs with manual coding using case study approach we analyzed the range frequency timing and content of tweets in corporate account our research findings show that percent of microblogs contain mention of brand of the branding microblogs nearly percent contained some expression of brand sentiments of these more than percent were positive and percent were critical of the company or product our comparison of automated and manual coding showed no significant differences between the two approaches in analyzing microblogs for structure and composition the linguistic structure of tweets approximates the linguistic patterns of natural language expressions we find that microblogging is an online tool for customer word of mouth communications and discuss the implications for corporations using microblogging as part of their overall marketing strategy
we present netgrok tool for visualizing computer network usage in real time netgrok combines well known information visualization techniques overview zoom filter details on demand with network graph and treemap visualizations netgrok integrates these tools with shared data store that can read pcap formatted network traces capture traces from live interface and filter the data set dynamically by bandwidth number of connections and time we performed an expert user case study that demonstrates the benefits of applying these techniques to static and real time streaming packet data our user study shows netgrok serves as an excellent real time diagnostic enabling fast understanding of network resource usage and rapid anomaly detection
we present an architecture for synthesizable datapath oriented fpga core that can be used to provide post fabrication flexibility to an soc our architecture is optimized for bus based operations and employs directional routing architecture which allows it to be synthesized using standard asic design tools and flows the primary motivation for this architecture is to provide an efficient mechanism to support on chip debugging the fabric can also be used to implement other datapath oriented circuits such as those needed in signal processing and computation intensive applications we evaluate our architecture using set of benchmark circuits and compare it to previous fabrics in terms of area speed and power
latency insensitive design is the foundation of correct by construction methodology for soc design this approach can handle latency’s increasing impact on deep submicron technologies and facilitate the reuse of intellectual property cores for building complex systems on chips reducing the number of costly iterations in the design process
as parallel machines scale to one million nodes and beyond it becomes increasingly difficult to build reliable network that is able to guarantee packet delivery eventually large systems will need to employ fault tolerant messaging protocols that afford correct execution in the presence of lossy network in this paper we present lightweight protocol that preserves message idempotence and is easy to implement in hardware we identify the requirements for correct implementation of the protocol experiments are performed in simulation to determine implementation parameters that optimize performance we find that an aggressive implementation on fat tree network results in slowdown of less than compared to buffered wormhole routing on fault free network
motivated by applications to modern networking technologies there has been interest in designing efficient gossip based protocols for computing aggregate functions while gossip based protocols provide robustness due to their randomized nature reducing the message and time complexity of these protocols is also of paramount importance in the context of resource constrained networks such as sensor and peer to peer networks we present the first provably almost optimal gossip based algorithms for aggregate computation that are both time optimal and message optimal given node network our algorithms guarantee that all the nodes can compute the common aggregates such as min max count sum average rank etc of their values in optimal log time and using log log messages our result improves on the algorithm of kempe et al that is time optimal but uses log messages as well as on the algorithm of kashyap et al that uses log log messages but is not time optimal takes log log log time furthermore we show that our algorithms can be used to improve gossip based aggregate computation in sparse communication networks such as in peer to peer networks the main technical ingredient of our algorithm is technique called distributed random ranking drr that can be useful in other applications as well drr gives an efficient distributed procedure to partition the network into forest of disjoint trees of small size since the size of each tree is small aggregates within each tree can be efficiently obtained at their respective roots all the roots then perform uniform gossip algorithm on their local aggregates to reach distributed consensus on the global aggregates our algorithms are non address oblivious in contrast we show lower bound of log on the message complexity of any address oblivious algorithm for computing aggregates this shows that non address oblivious algorithms are needed to obtain significantly better message complexity our lower bound holds regardless of the number of rounds taken or the size of the messages used our lower bound is the first non trivial lower bound for gossip based aggregate computation and also gives the first formal proof that computing aggregates is strictly harder than rumor spreading in the address oblivious model
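for intuition, the sketch below shows the basic uniform gossip averaging primitive that protocols of this kind build on (each node repeatedly averages its value with a uniformly random partner, so all values converge to the global average); it is not the drr construction itself

import random

def gossip_average(values, rounds=50, seed=1):
    random.seed(seed)
    v = list(values)
    n = len(v)
    for _ in range(rounds):
        for i in range(n):
            j = random.randrange(n)
            v[i] = v[j] = (v[i] + v[j]) / 2.0   # pairwise averaging preserves the sum
    # the global average is invariant, so every node converges to it
    return v

vals = [random.uniform(0, 100) for _ in range(64)]
result = gossip_average(vals)
print(sum(vals) / len(vals), min(result), max(result))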
distributed collaborative editors dce provide computer support for modifying simultaneously shared documents such as articles wiki pages and programming source code by dispersed users controlling access in such systems is still challenging problem as they need dynamic access changes and low latency access to shared documents in this paper we propose flexible access control model where the shared document and its authorization policy are replicated at the local memory of each user to deal with latency and dynamic access changes we use an optimistic access control technique in such way that enforcement of authorizations is retroactive we show that naive coordination between updates of both copies can create security holes on the shared document by permitting illegal modifications or rejecting legal modifications finally we present prototype for managing authorizations in collaborative editing work which may be deployed easily on pp networks
in many retrieval tasks one important goal involves retrieving diverse set of results eg documents covering wide range of topics for search query first of all this reduces redundancy effectively showing more information with the presented results secondly queries are often ambiguous at some level for example the query jaguar can refer to many different topics such as the car or feline set of documents with high topic diversity ensures that fewer users abandon the query because no results are relevant to them unlike existing approaches to learning retrieval functions we present method that explicitly trains to diversify results in particular we formulate the learning problem of predicting diverse subsets and derive training method based on structural svms
in this paper we present new approach to mining binary data we treat each binary feature item as means of distinguishing two sets of examples our interest is in selecting from the total set of items an itemset of specified size such that the database is partitioned with as uniform distribution over the parts as possible to achieve this goal we propose the use of joint entropy as quality measure for itemsets and refer to optimal itemsets of cardinality as maximally informative itemsets we claim that this approach maximises distinctive power as well as minimises redundancy within the feature set number of algorithms is presented for computing optimal itemsets efficiently
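the objective can be stated in a few lines of python: compute the joint entropy of the value combinations an itemset induces over the database and pick the itemset of size k maximizing it; the brute force search below only illustrates the measure, not the paper's efficient algorithms

from itertools import combinations
from collections import Counter
from math import log2

def joint_entropy(db, itemset):
    # entropy of the joint distribution of the selected binary columns
    counts = Counter(tuple(row[i] for i in itemset) for row in db)
    n = len(db)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def most_informative_itemset(db, k):
    items = range(len(db[0]))
    return max(combinations(items, k), key=lambda s: joint_entropy(db, s))

db = [  # rows are transactions, columns are binary items
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
best = most_informative_itemset(db, 2)
print(best, joint_entropy(db, best))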
indoor spaces accommodate large populations of individuals with appropriate indoor positioning eg bluetooth and rfid in place large amounts of trajectory data result that may serve as foundation for wide variety of applications eg space planning way finding and security this scenario calls for the indexing of indoor trajectories based on an appropriate notion of indoor trajectory and definitions of pertinent types of queries the paper proposes two tree based structures for indexing object trajectories in symbolic indoor space the rtr tree represents trajectory as set of line segments in space spanned by positioning readers and time the tpr tree applies data transformation that yields representation of trajectories as points with extension along the time dimension the paper details the structure node organization strategies and query processing algorithms for each index an empirical performance study suggests that the two indexes are effective efficient and robust the study also elicits the circumstances under which our proposals perform the best
federating mission critical systems over wide area networks still represents challenging issue for example it is hard to assure both reliability and timeliness in hostile environment such as internet the publish subscribe pub sub interaction model is promising solution for scalable data dissemination over wide area networks nevertheless currently available pub sub systems lack efficient support to achieve both reliability and timeliness in unreliable scenarios this paper describes an innovative approach to fill this gap making three contributions first cluster based peer to peer organization is introduced to handle large number of publishers and subscribers second the cluster coordinator is replicated to mask process crashes and to preserve cluster connectivity toward the outside world third multiple tree redundancy is applied to tolerate link crashes thereby minimizing unpredictability in the delivery time we present simulation based evaluation to assess the effectiveness of our approach in an unreliable setting this study indicates that our approach enforces the reliability of event delivery without affecting its timeliness
internet protocols encapsulate significant amount of state making implementing the host software complex in this paper we define the statecall policy language spl which provides usable middle ground between ad hoc coding and formal reasoning it enables programmers to embed automata in their code which can be statically model checked using spin and dynamically enforced the performance overheads are minimal and the automata also provide higher level debugging capabilities we also describe some practical uses of spl by describing the automata used in an ssh server written entirely in ocaml spl
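a generic python analogue of an embedded, dynamically enforced automaton (plain python, not spl syntax, with a hypothetical ssh-like event sequence): transitions outside the specification raise immediately, which is the dynamic enforcement half of the approach, while the same transition table could in principle be handed to a model checker for the static half

class ProtocolAutomaton:
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions   # {(state, event): next_state}

    def fire(self, event):
        try:
            self.state = self.transitions[(self.state, event)]
        except KeyError:
            raise RuntimeError(f"illegal event {event!r} in state {self.state!r}")

# hypothetical fragment: authentication must precede channel requests
ssh = ProtocolAutomaton("connected", {
    ("connected", "version_exchange"): "negotiated",
    ("negotiated", "auth_request"): "authenticated",
    ("authenticated", "channel_open"): "session",
})
for ev in ["version_exchange", "auth_request", "channel_open"]:
    ssh.fire(ev)
print("final state:", ssh.state)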
association rule mining from transaction database tdb requires the detection of frequently occurring patterns called frequent itemsets fis whereby the number of fis may be potentially huge recent approaches for fi mining use the closed itemset paradigm to limit the mining effort to subset of the entire fi family the frequent closed itemsets fcis we show here how fcis can be mined incrementally yet efficiently whenever new transaction is added to database whose mining results are available our approach for mining fis in dynamic databases relies on recent results about lattice incremental restructuring and lattice construction the fundamentals of the incremental fci mining task are discussed and its reduction to the problem of lattice update via the ci family is made explicit the related structural results underlie two algorithms for updating the set of fcis of given tdb upon the insertion of new transaction straightforward method searches for necessary completions throughout the entire ci family whereas second method exploits lattice properties to limit the search to cis which share at least one item with the new transaction efficient implementations of the parsimonious method is discussed in the paper together with set of results from preliminary study of the method’s practical performances
technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors however increasing device counts and decreasing on chip voltage levels have made transient errors first order design constraint that can no longer be ignored several proposals have provided fault detection and tolerance through redundantly executing program on an additional hardware thread or core while such techniques can provide high fault coverage they at best provide equivalent performance to the original execution and at worst incur slowdown due to error checking contention for shared resources and synchronization overheads this work achieves similar goal of detecting transient errors by redundantly executing program on an additional processor core however it speeds up rather than slows down program execution compared to the unprotected baseline case it makes the observation that small number of instructions are detrimental to overall performance and selectively skipping them enables one core to advance far ahead of the other to obtain prefetching and large instruction window benefits we highlight the modest incremental hardware required to support skewed redundancy and demonstrate speedup of for collection of integer floating point benchmarks while still providing error detection coverage within our sphere of replication additionally we show that third core can further improve performance while adding error recovery capabilities
we present and evaluate novel switching mechanism called layered switching conceptually the layered switching implements wormhole on top of virtual cut through switching to show the feasibility of layered switching as well as to confirm its advantages we conducted an rtl implementation study based on canonical wormhole architecture synthesis results show that our strategy suggests negligible degradation in hardware speed and area overhead simulation results demonstrate that it achieves higher throughput than wormhole alone while significantly reducing the buffer space required at network nodes when compared with virtual cut through
content distribution networks cdns are type of distributed database using geographically dispersed servers to efficiently distribute large multimedia contents among various kinds of cdns are those resembling peer to peer pp networks in which all the servers are equivalent and autonomous are easy to maintain and tolerant of faults however they differ from pp networks in that the number of nodes joining and leaving the network is negligible the main problems in cdns are the placement of contents and the location of content widely used cdns either have inefficient flooding like techniques for content location or restrict either content or index placement to use distributed hash tables for efficient content location however for the efficient distribution of contents the contents must be optimally placed within the cdn and no restrictions should be placed in the content or index placement algorithm we developed an efficient content location algorithm for cdns based on the distributed construction of search index without imposing any restrictions on the content or index placement algorithm we described our algorithm compared it with the existing content location algorithms and showed its effectiveness in increasing the success rate of queries with less traffic
consensus is one of the most common problems in distributed systems an important example of this in the field of dependability is group membership however consensus presents certain impossibilities which are not solvable on asynchronous systems therefore in the case of group membership systems must rely on additional services to solve the constraints imposed on them by the impossibility of consensus such additional services exist in the form of failure detectors and membership estimators the contribution of this paper is the upper level algorithm of protocol stack that provides group membership for dynamic mobile and partitionable systems mainly aimed at mobile ad hoc networks stability criteria are established to select subset of nodes with low failure probability to form stable groups of nodes we provide description of the algorithm and the results of performance experiments on the ns network simulator
pre execution removes the microarchitectural latency of problem loads from program's critical path by redundantly executing copies of their computations in parallel with the main program there have been several proposed pre execution systems quantitative framework pthsel for analytical pre execution thread thread selection and even research prototype to date however the energy aspects of pre execution have not been studied cycle level performance and energy simulations on spec integer benchmarks that suffer from misses show that energy blind pre execution naturally has linear latency energy trade off improving performance by while increasing energy consumption by to improve this trade off we propose two extensions to pthsel first we replace the flat cycle for cycle load cost model with model based on critical path estimation this extension increases thread efficiency in an energy independent way second we add parameterized energy model to pthsel forming pthsel that allows it to actively select threads that reduce energy rather than or in combination with execution latency experiments show that pthsel manipulates pre execution's latency energy more effectively latency targeted selection benefits from the improved load cost model its performance improvements grow to an average of while energy costs drop to ed targeted selection produces threads that improve performance by only but ed by targeting thread selection for energy reduction results in energy free pre execution with average speedup of and small decrease in total energy consumption
the novel experience anywhere allowed participants to explore an urban area tying together information not normally available new points of views and interaction embedded into physical places guided by unseen on the street performers in an ongoing conversation maintained over mobile phones they gained access to locative media and staged performances our analysis demonstrates how anywhere produced engaging and uniquely personalised paths through complex landscape of content negotiated by the performer participant pair around various conflicting constraints we reflect our analysis through the lens of the key characteristics exhibited by mechanisms that support city exploration before focussing on possible extensions to the technological support of teams of professional and amateur guides
applications must be able to synchronize accesses to operating system resources in order to ensure correctness in the face of concurrency and system failures system transactions allow the programmer to specify updates to heterogeneous system resources with the os guaranteeing atomicity consistency isolation and durability acid system transactions efficiently and cleanly solve persistent concurrency problems that are difficult to address with other techniques for example system transactions eliminate security vulnerabilities in the file system that are caused by time of check to time of use tocttou race conditions system transactions enable an unsuccessful software installation to roll back without disturbing concurrent independent updates to the file system this paper describes txos variant of linux that implements system transactions txos uses new implementation techniques to provide fast serializable transactions with strong isolation and fairness between system transactions and non transactional activity the prototype demonstrates that mature os running on commodity hardware can provide system transactions at reasonable performance cost for instance transactional installation of openssh incurs only overhead and non transactional compilation of linux incurs negligible overhead on txos by making transactions central os abstraction txos enables new transactional services for example one developer prototyped transactional ext file system in less than one month
we describe method for enumerating all essentially different executions possible for cryptographic protocol we call them the shapes of the protocol naturally occurring protocols have only finitely many indeed very few shapes authentication and secrecy properties are easy to determine from them as are attacks cpsa our cryptographic protocol shape analyzer implements the method in searching for shapes cpsa starts with some initial behavior and discovers what shapes are compatible with it normally the initial behavior is the point of view of one participant the analysis reveals what the other principals must have done given this participant’s view
burst detection is the activity of finding abnormal aggregates in data streams such aggregates are based on sliding windows over data streams in some applications we want to monitor many sliding window sizes simultaneously and to report those windows with aggregates significantly different from other periods we will present general data structure for detecting interesting aggregates over such elastic windows in near linear time we present applications of the algorithm for detecting gamma ray bursts in large scale astrophysical data detection of periods with high volumes of trading activities and high stock price volatility is also demonstrated using real time trade and quote taq data from the new york stock exchange nyse our algorithm beats the direct computation approach by several orders of magnitude
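a brute force python sketch of the monitoring task: with a prefix sum array the aggregate of any suffix window is available in constant time, so several window sizes can be checked as each item arrives; the thresholds are arbitrary placeholders and the paper's near linear data structure is far more efficient than this simple loop

def detect_bursts(stream, window_sizes, thresholds):
    prefix = [0]
    alarms = []
    for t, x in enumerate(stream, start=1):
        prefix.append(prefix[-1] + x)
        for w, thr in zip(window_sizes, thresholds):
            # sum over the last w items via prefix sums
            if t >= w and prefix[t] - prefix[t - w] > thr:
                alarms.append((t, w, prefix[t] - prefix[t - w]))
    return alarms

stream = [1, 0, 2, 1, 9, 8, 1, 0, 1, 1]
print(detect_bursts(stream, window_sizes=[2, 5], thresholds=[10, 15]))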
mobile internet users have several options today including high bandwidth cellular data services such as that may be the choice for many however the ubiquity and low cost of wifi suggests an attractive alternative namely opportunistic use of open wifi access points aps or planned municipal mesh networks unfortunately for vehicular users the intermittent nature of wifi connectivity makes it challenging to support popular interactive applications such as web search and browsing our work is driven by two questions how can we enable system support for interactive web applications to tolerate disruptions in wifi connectivity from mobile nodes can opportunistic mobile to mobile mm transfers enhance application performance over only using aps and if so under what conditions and by how much we present thedu system that enables access to web search from moving vehicles the key idea is to use aggressive prefetching to transform the interactive web search application into one shot request response process we deployed prototype of thedu on the dieselnet testbed in amherst ma consisting of transit buses averaging on the road at time our deployment results show that thedu can deliver times as many relevant web pages than not using thedu bus receives relevant web pages with mean delay of minutes and within minutes in areas with high ap density thedu augments ap connectivity with mm transfers using utility driven dtn routing algorithm and uses caching to exploit query locality our analytic model and trace driven simulations suggest that mm routing yields little benefit over using aps alone even under moderately dense ap deployment such as in amherst with sparsely deployed aps as may be the case in rural areas our conclusions are more mixed mm routing with caching improves the number of relevant responses delivered per bus by up to but the mean delay is significantly high at minutes calling into question its practicality for interactive applications
during the life cycle of an xml application both schemas and queries may change from one version to another schema evolutions may affect query results and potentially the validity of produced data nowadays challenge is to assess and accommodate the impact of these changes in evolving xml applications such questions arise naturally in xml static analyzers these analyzers often rely on decision procedures such as inclusion between xml schemas query containment and satisfiability however existing decision procedures cannot be used directly in this context the reason is that they are unable to distinguish information related to the evolution from information corresponding to bugs this paper proposes predicate language within logical framework that can be used to make this distinction we present system for monitoring the effect of schema evolutions on the set of admissible documents and on the results of queries the system is very powerful in analyzing various scenarios where the result of query may not be anymore what was expected specifically the system is based on set of predicates which allow fine grained analysis for wide range of forward and backward compatibility issues moreover the system can produce counterexamples and witness documents which are useful for debugging purposes the current implementation has been tested with realistic use cases where it allows identifying queries that must be reformulated in order to produce the expected results across successive schema versions
large scale parallel systems multiprocessors system on chip mp socs multicomputers and cluster computers are often composed of hundreds or thousands of components such as routers channels and connectors that collectively possess failure rates higher than those arising in ordinary systems one of the most important issues in the design of such systems is the development of efficient fault tolerant mechanisms that provide high throughput and low latency in communications to ensure that these systems will keep running in degraded mode until the faulty components are repaired pipelined circuit switching pcs has been suggested as an efficient switching method for supporting inter processor communications in networks due to its ability to preserve both communication performance and fault tolerance demands in such systems this paper presents new mathematical model to investigate the effects of failures and capture the mean message latency in torus using pcs in the presence of faulty components simulation experiments confirm that the analytical model exhibits good degree of accuracy under different working conditions
domain specific languages play an important role in model driven engineering of software intensive industrial systems rich body of knowledge exists on the development of languages modeling environments and transformation systems the understanding of architectural choices for combining these parts into feasible solution however is not particularly deep we report on an endeavor in the realm of technology transfer process from academia to industry where we encountered unexpected influences of the architecture on the modeling language by examining the evolution of our language and its programming interface we show that these influences mainly stemmed from practical considerations for identifying these early on tight interaction between our research lab and the industrial partner was key in addition we share insights into the practice of cooperating with industry by presenting essential lessons we learned
bounded timed arc petri nets with read arcs were recently proven equivalent to networks of timed automata though the petri net model cannot express urgent behaviour and the described mutual translations are rather inefficient we propose an extension of timed arc petri nets with invariants to enforce urgency and with transport arcs to generalise the read arcs we also describe novel translation from the extended timed arc petri net model to networks of timed automata the translation is implemented in the tool tapaal and it uses uppaal as the verification engine our experiments confirm the efficiency of the translation and in some cases the translated models verify significantly faster than the native uppaal models do
deterministic finite automata dfas are widely used to perform regular expression matching in linear time several techniques have been proposed to compress dfas in order to reduce memory requirements unfortunately many real world ids regular expressions include complex terms that result in an exponential increase in number of dfa states since all recent proposals use an initial dfa as starting point they cannot be used as comprehensive regular expression representations in an ids in this work we propose hybrid automaton which addresses this issue by combining the benefits of deterministic and non deterministic finite automata we test our proposal on snort rule sets and we validate it on real traffic traces finally we address and analyze the worst case behavior of our scheme and compare it to traditional ones
in the data warehouse environment the concept of materialized view is common and important for efficient support of olap query processing materialized views are generally derived from several relations these materialized views need to be updated when source relations change since the propagation of updates to the views may impose significant overhead it is essential to update the warehouse views efficiently though various view maintenance strategies have been discussed in the past optimizations on the total accesses to relations have not been sufficiently investigated in this paper we propose an efficient incremental view maintenance method called optimal delta evaluation that can minimize the total accesses to relations we first present the delta evaluation expression and delta evaluation tree which are core concepts of the method then dynamic programming algorithm that can find the optimal delta evaluation tree is proposed we also present various experimental results that show the usefulness and efficiency of our proposed method
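A minimal sketch of incremental view maintenance for a two-relation join view using the standard delta rule, assuming insert-only changes; this illustrates delta evaluation in general, not the optimal delta evaluation tree or the dynamic programming algorithm described above.

```python
# Incremental maintenance of a join view V = R join S using the delta rule
#   dV = (dR join S) + (R join dS) + (dR join dS)
# for insert-only deltas.

def join(r, s):
    # natural join on the shared key; tuples are (key, payload)
    return [(k1, a, b) for (k1, a) in r for (k2, b) in s if k1 == k2]

def maintain(view, r, s, delta_r, delta_s):
    view = view + join(delta_r, s) + join(r, delta_s) + join(delta_r, delta_s)
    return view, r + delta_r, s + delta_s

if __name__ == "__main__":
    R = [(1, "a"), (2, "b")]
    S = [(1, "x")]
    V = join(R, S)                      # [(1, 'a', 'x')]
    V, R, S = maintain(V, R, S, delta_r=[(3, "c")], delta_s=[(2, "y"), (3, "z")])
    print(V)  # adds (2, 'b', 'y') and (3, 'c', 'z') without recomputing the full join
```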
in recent years we extended the theory of abadi and lamport on the existence of refinement mappings the present paper gives an overview of several extensions of the theory and of number of recent applications to practical verifications it concludes with sketch of the results on semantic completeness and discussion of the relationship between semantic completeness and methodological convenience
we present physically based system for creating animations of novel words and phrases from text and audio input based on the analysis of motion captured speech examples leading image based techniques exhibit photo real quality yet lack versatility especially with regard to interactions with the environment data driven approaches that use motion capture to deform three dimensional surface often lack any anatomical or physically based structure limiting their accuracy and realism in contrast muscle driven physics based facial animation systems can trivially integrate external interacting objects and have the potential to produce very realistic animations as long as the underlying model and simulation framework are faithful to the anatomy of the face and the physics of facial tissue deformation we start with high resolution anatomically accurate flesh and muscle model built for specific subject then we translate motion captured training set of speech examples into muscle activation signals and subsequently segment those into intervals corresponding to individual phonemes finally these samples are used to synthesize novel words and phrases the versatility of our approach is illustrated by combining this novel speech content with various facial expressions as well as interactions with external objects
the frequent pattern tree fp tree is an efficient data structure for association rule mining without generation of candidate itemsets it was used to compress database into tree structure which stored only large items it however needed to process all transactions in batch way in real world applications new transactions are usually inserted into databases in this paper we thus attempt to modify the fp tree construction algorithm for efficiently handling new transactions fast updated fp tree fufp tree structure is proposed which makes the tree update process become easier an incremental fufp tree maintenance algorithm is also proposed for reducing the execution time in reconstructing the tree when new transactions are inserted experimental results also show that the proposed fufp tree maintenance algorithm runs faster than the batch fp tree construction algorithm for handling new transactions and generates nearly the same tree structure as the fp tree algorithm the proposed approach can thus achieve good trade off between execution time and tree complexity
clipping is the process of transforming real valued series into sequence of bits representing whether each data is above or below the average in this paper we argue that clipping is useful and flexible transformation for the exploratory analysis of large time dependent data sets we demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that under some assumptions the discriminatory power with clipped series is asymptotically equivalent to that achieved with the raw data unlike other transformations clipped series can be compared directly to the raw data series we show that this means we can form tight lower bounding metric for euclidean and dynamic time warping distance and hence efficiently query by content clipped data can be used in conjunction with host of algorithms and statistical tests that naturally follow from the binary nature of the data series of experiments illustrate how clipped series can be used in increasingly complex ways to achieve better results than other popular representations the usefulness of the proposed representation is demonstrated by the fact that the results with clipped data are consistently better than those achieved with wavelet or discrete fourier transformation at the same compression ratio for both clustering and query by content the flexibility of the representation is shown by the fact that we can take advantage of variable run length encoding of clipped series to define an approximation of the kolmogorov complexity and hence perform kolmogorov based clustering
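A minimal sketch of the clipped representation and its run-length encoding, assuming the usual above-the-mean rule; compression-ratio tuning, lower-bounding distances and the Kolmogorov-based clustering are not shown.

```python
# Clip a real-valued series into bits (1 if above the mean, else 0),
# then run-length encode the bit sequence.
from statistics import mean

def clip(series):
    mu = mean(series)
    return [1 if x > mu else 0 for x in series]

def run_length_encode(bits):
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

if __name__ == "__main__":
    series = [2.0, 2.1, 1.9, 5.0, 5.2, 4.8, 2.0, 1.8]
    bits = clip(series)            # [0, 0, 0, 1, 1, 1, 0, 0]
    print(bits, run_length_encode(bits))
```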
parametric polymorphism constrains the behavior of pure functional programs in way that allows the derivation of interesting theorems about them solely from their types ie virtually for free the formal background of such free theorems is well developed for extensions of the girard reynolds polymorphic lambda calculus by algebraic datatypes and general recursion provided the resulting calculus is endowed with either purely strict or purely nonstrict semantics but modern functional languages like clean and haskell while using nonstrict evaluation by default also provide means to enforce strict evaluation of subcomputations at will the resulting selective strictness gives the advanced programmer explicit control over evaluation order but is not without semantic consequences it breaks standard parametricity results this paper develops an operational semantics for core calculus supporting all the language features emphasized above its main achievement is the characterization of observational approximation with respect to this operational semantics via carefully constructed logical relation this establishes the formal basis for new parametricity results as illustrated by several example applications including the first complete correctness proof for short cut fusion in the presence of selective strictness the focus on observational approximation rather than equivalence allows finer grained analysis of computational behavior in the presence of selective strictness than would be possible with observational equivalence alone
chip multiprocessors cmps are now commodity hardware but commoditization of parallel software remains elusive in the near term the current trend of increased core per socket count will continue despite lack of parallel software to exercise the hardware future cmps must deliver thread level parallelism when software provides threads to run but must also continue to deliver performance gains for single threads by exploiting instruction level parallelism and memory level parallelism however power limitations will prevent conventional cores from exploiting both simultaneously this work presents the forwardflow architecture which can scale its execution logic up to run single threads or down to run multiple threads in cmp forwardflow dynamically builds an explicit internal dataflow representation from conventional instruction set architecture using forward dependence pointers to guide instruction wakeup selection and issue forwardflow’s backend is organized into discrete units that can be individually de activated allowing each core’s performance to be scaled by system software at the architectural level on single threads forwardflow core scaling yields mean runtime reduction of for increase in power consumption for multithreaded workloads forwardflow based cmp allows system software to select the performance point that best matches available power
we propose variational method for model based segmentation of gray scale images of highly degraded historical documents given training set of characters of certain letter we construct small set of shape models that cover most of the training set’s shape variance for each gray scale image of respective degraded character we construct custom made shape prior using those fragments of the shape models that best fit the character’s boundary therefore we are not limited to any particular shape in the shape model set in addition we demonstrate the application of our shape prior to degraded character recognition experiments show that our method achieves very accurate results both in segmentation of highly degraded characters and in recognition when compared with manual segmentation the average distance between the boundaries of respective segmented characters was pixels the average size of the characters was pixels
we propose methods to accelerate texture based volume rendering by skipping invisible voxels we partition the volume into sub volumes each containing voxels with similar properties sub volumes composed of only voxels mapped to empty by the transfer function are skipped to render the adaptively partitioned sub volumes in visibility order we reorganize them into an orthogonal bsp tree we also present an algorithm that computes incrementally the intersection of the volume with the slicing planes which avoids the overhead of the intersection and texture coordinates computation introduced by the partitioning rendering with empty space skipping is to times faster than without it to skip occluded voxels we introduce the concept of orthogonal opacity map that simplifies the transformation between the volume coordinates and the opacity map coordinates which is intensively used for occlusion detection the map is updated efficiently by the gpu the sub volumes are then culled and clipped against the opacity map we also present method that adaptively adjusts the optimal number of the opacity map updates with occlusion clipping about of non empty voxels can be skipped and an additional speedup on average is gained for iso surface like rendering
outlier detection has many important applications in sensor networks eg abnormal event detection animal behavior change etc it is difficult problem since global information about data distributions must be known to identify outliers in this paper we use histogram based method for outlier detection to reduce communication cost rather than collecting all the data in one location for centralized processing we propose collecting hints in the form of histogram about the data distribution and using the hints to filter out unnecessary data and identify potential outliers we show that this method can be used for detecting outliers in terms of two different definitions our simulation results show that the histogram method can dramatically reduce the communication cost
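A minimal sketch of using histogram hints to filter candidate outliers, assuming a fixed bin width and a simple low-support rule; the actual outlier definitions and the in-network communication protocol of the abstract are not reproduced.

```python
# Nodes report coarse histograms instead of raw readings; the sink merges them
# and requests only values falling into rare bins as potential outliers.
from collections import Counter

BIN_WIDTH = 5.0

def local_histogram(readings):
    return Counter(int(x // BIN_WIDTH) for x in readings)

def merge(histograms):
    total = Counter()
    for h in histograms:
        total.update(h)
    return total

def suspicious_bins(global_hist, min_support=2):
    # bins with very few readings across the whole network are candidates
    return {b for b, count in global_hist.items() if count < min_support}

if __name__ == "__main__":
    node_a = [20.1, 21.3, 22.0, 19.8]
    node_b = [20.5, 21.1, 55.0]        # 55.0 is a likely outlier
    hist = merge([local_histogram(node_a), local_histogram(node_b)])
    bins = suspicious_bins(hist)
    # each node now sends only readings that fall into a suspicious bin
    print([x for x in node_a + node_b if int(x // BIN_WIDTH) in bins])
```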
we compare several optimization strategies implemented in an xml query evaluation system the strategies incorporate the use of path summaries into the query optimizer and rely on heuristics that exploit data statistics we present experimental results that demonstrate wide range of performance improvements for the different strategies supported in addition we compare the speedups obtained using path summaries with those reported for index based methods the comparison shows that low cost path summaries combined with optimization strategies achieve essentially the same benefits as more expensive index structures
in this paper we propose query expansion and user profile enrichment approach to improve the performance of recommender systems operating on folksonomy storing and classifying the tags used to label set of available resources our approach builds and maintains profile for each user when he submits query consisting of set of tags on this folksonomy to retrieve set of resources of his interest it automatically finds further authoritative tags to enrich his query and proposes them to him all authoritative tags considered interesting by the user are exploited to refine his query and along with those tags directly specified by him are stored in his profile in such way to enrich it the expansion of user queries and the enrichment of user profiles allow any content based recommender system operating on the folksonomy to retrieve and suggest high number of resources matching with user needs and desires moreover enriched user profiles can guide any collaborative filtering recommender system to proactively discover and suggest to user many resources relevant to him even if he has not explicitly searched for them
evolution and reactivity in the semantic web address the vision and concrete need for an active web where data sources evolve autonomously and perceive and react to events in when the rewerse project started regarding work on evolution and reactivity in the semantic web there wasn’t much more than vision of such an active web materialising this vision requires the definition of model architecture and also prototypical implementations capable of dealing with reactivity in the semantic web including an ontology based description of all concepts this resulted in general framework for reactive event condition action rules in the semantic web over heterogeneous component languages inasmuch as heterogeneity of languages is in our view an important aspect to take into consideration for dealing with the heterogeneity of sources and behaviour of the semantic web concrete homogeneous languages targeting the specificity of reactive rules are of course also needed this is especially the case for languages that can cope with the challenges posed by dealing with composite structures of events or executing composite actions over web data in this chapter we report on the advances made on this front namely by describing the above mentioned general heterogeneous framework and by describing the concrete homogeneous language xchange
the problem of updating databases through interfaces based on the weak instance model is studied thus extending previous proposals that considered them only from the query point of view insertions and deletions of tuples are considered as preliminary tool lattice on states is defined based on the information content of the various states potential results of an insertion are states that contain at least the information in the original state and that in the new tuple sometimes there is no potential result and in the other cases there may be many of them we argue that the insertion is deterministic if the state that contains the information common to all the potential results the greatest lower bound in the lattice framework is potential result itself effective characterizations for the various cases exist symmetric approach is followed for deletions with fewer cases since there are always potential results determinism is characterized as consequence
nowadays the demand for user friendly querying interface such as query by sketch and query by editing is an important issue in the content based retrieval system for object database especially in mpeg pds perceptual shape descriptor has been developed in order to provide the user friendly querying which can not be covered by an existing international standard for description and browsing of object database since the pds descriptor is based on the part based representation of object it is kind of attributed relational graph arg so that the arg matching algorithm naturally follows as the core procedure for the similarity matching of the pds descriptor in this paper given pds database from the corresponding object database we bring focus into investigating the pros and cons of the target arg matching algorithms in order to demonstrate the objective evidence of our conclusion we have conducted the experiments based on the database of objects with categories in terms of the bull’s eye performance average normalized modified retrieval rate and precision recall curve
we present algorithms for finding optimal strategies for discounted infinite horizon deterministic markov decision processes dmdps our fastest algorithm has worst case running time of mn improving the recent bound of mn obtained by andersson and vorbyov we also present randomized time algorithm for finding discounted all pairs shortest paths dapsp improving an mn time algorithm that can be obtained using ideas of papadimitriou and tsitsiklis
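A minimal sketch of the underlying problem, solving a small discounted deterministic MDP by plain value iteration; the faster algorithms claimed above are not reproduced, and the toy instance is illustrative.

```python
# Plain value iteration for a discounted deterministic MDP.
def value_iteration(transitions, gamma=0.9, iters=500):
    """transitions[s] -> list of (action, next_state, reward); all deterministic."""
    value = {s: 0.0 for s in transitions}
    for _ in range(iters):
        value = {s: max(r + gamma * value[t] for (_, t, r) in transitions[s])
                 for s in transitions}
    # greedy policy with respect to the (approximately) optimal values
    policy = {s: max(transitions[s], key=lambda x: x[2] + gamma * value[x[1]])[0]
              for s in transitions}
    return value, policy

if __name__ == "__main__":
    # two states, each with a 'stay' and a 'move' action
    dmdp = {
        "A": [("stay", "A", 1.0), ("move", "B", 0.0)],
        "B": [("stay", "B", 2.0), ("move", "A", 0.0)],
    }
    print(value_iteration(dmdp))   # optimal policy: move from A, stay in B
```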
new approach is described for the fusion of multimedia information based on the concept of active documents advertising on the internet whereby the metadata of document travels in the network to seek out documents of interest to the parent document and at the same time advertises its parent document to other interested documents this abstraction of metadata is called an adlet which is the core of our approach two important features make this approach applicable to multimedia information fusion information retrieval data mining geographic information systems and medical information systems any document including web page database record video file audio file image and even paper documents can be enhanced by an adlet and become an active document and any node in nonactive network can be enhanced by adlet savvy software and the adlet enhanced node can coexist with other nonenhanced nodes an experimental prototype provides testbed for feasibility studies in hybrid active network environment
web advertising online advertising is form of promotion that uses the world wide web for the expressed purpose of delivering marketing messages to attract customers this paper addresses the mechanism of content oriented advertising contextual advertising which refers to the assignment of relevant ads within the content of generic web page eg blogs as blogs become platform for expressing personal opinion they naturally contain various kinds of expressions including both facts and comments of both positive and negative nature in this paper we propose the utilization of sentiment detection to improve web based contextual advertising the proposed soca sentiment oriented contextual advertising framework aims to combine contextual advertising matching with sentiment analysis to select ads that are related to the positive and neutral aspects of blog and rank them according to their relevance we experimentally validate our approach using set of data that includes both real ads and actual blog pages the results clearly indicate that our proposed method can effectively identify those ads that are positively correlated with the given blog pages
data mining is increasingly performed by people who are not computer scientists or professional programmers it is often done as an iterative process involving multiple ad hoc tasks as well as data pre and post processing all of which must be executed over large databases in order to make data mining more accessible it is critical to provide simple easy to use language that allows the user to specify ad hoc data processing model construction and model manipulation simultaneously it is necessary for the underlying system to scale up to large datasets unfortunately while each of these requirements can be satisfied individually by existing systems no system fully satisfies all criteria in this paper we present system called splash to fill this void splash supports an extended relational data model and sql query language which allows for the natural integration of statistical modeling and ad hoc data processing it also supports novel representatives operator to help explain models using limited number of examples we have developed prototype implementation of splash our experimental study indicates that it scales well to large input datasets further to demonstrate the simplicity of the language we conducted case study using splash to perform series of exploratory analyses using network log data our study indicates that the query based interface is simpler than common data mining software package and it often requires less programming effort to use
the locations of base stations are critically important to the viability of wireless sensor networks in this paper we examine the location privacy problem from both the attack and defense sides we start by examining adversaries targeting at identifying the sink location using minimum amount of resources in particular they launch zeroing in attack leveraging the fact that several network metrics are dimensional functions in the plane of the network and their values minimize at the sink thus determining the sink locations is equivalent to finding the minima of those functions we have shown that by obtaining the hop counts or the arrival time of broadcast packet at few spots in the network the adversaries are able to determine the sink location with the accuracy of one radio range sufficient to disable the sink by launching jamming attacks to cope with the zeroing in attacks we have proposed directed walk based scheme and validated that the defense strategy is effective in deceiving adversaries at little energy costs
ip networks today require massive effort to configure and manage ethernet is vastly simpler to manage but does not scale beyond small local area networks this paper describes an alternative network architecture called seattle that achieves the best of both worlds the scalability of ip combined with the simplicity of ethernet seattle provides plug and play functionality via flat addressing while ensuring scalability and efficiency through shortest path routing and hash based resolution of host information in contrast to previous work on identity based routing seattle ensures path predictability and stability and simplifies network management we performed simulation study driven by real world traffic traces and network topologies and used emulab to evaluate prototype of our design based on the click and xorp open source routing platforms our experiments show that seattle efficiently handles network failures and host mobility while reducing control overhead and state requirements by roughly two orders of magnitude compared with ethernet bridging
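A minimal sketch of hash-based resolution of host information in the spirit described above, assuming a SHA-1 ring and successor placement; the actual SEATTLE protocol details (switch-level routing, caching, failure handling) are not reproduced.

```python
# Each host's location record is stored at the switch whose hash is the
# successor of the host key on a hash ring.
import hashlib
from bisect import bisect_right

def h(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

class Directory:
    def __init__(self, switches):
        self.ring = sorted((h(s), s) for s in switches)

    def resolver_for(self, host_key: str) -> str:
        keys = [k for k, _ in self.ring]
        idx = bisect_right(keys, h(host_key)) % len(self.ring)
        return self.ring[idx][1]

if __name__ == "__main__":
    d = Directory(["switch-1", "switch-2", "switch-3"])
    # the switch attached to a host publishes (host -> attachment point) at:
    print(d.resolver_for("00:11:22:33:44:55"))
```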
objective interactions between genes are realized as gene regulatory networks grns the control of such networks is essential for investigating issues like different diseases control is the process of studying the states and behavior of given system under different conditions the system considered in this study is gene regulatory network grn and one of the most important aspects in the control of grns is scalability consequently the objective of this study is to develop scalable technique that facilitates the control of grns method as the approach described in this paper concentrates on the control of grns we argue that it is possible to improve scalability by reducing the number of genes to be considered by the control policy consequently we propose novel method that considers gene relevancy to estimate genes that are less important for control this way it is possible to get reduced model after identifying genes that can be ignored in model building the latter genes are located based on threshold value which is expected to be provided by domain expert some guidelines are listed to help the domain expert in setting appropriate threshold value results we run experiments using both synthetic and real data including metastatic melanoma and budding yeast saccharomyces cerevisiae the reported test results identified genes that could be eliminated from each of the investigated grns for instance test results on budding yeast identified the two genes swi and mcm as candidates to be eliminated this considerably reduces the computation cost and hence demonstrate the applicability and effectiveness of the proposed approach conclusion employing the proposed reduction strategy results in close to optimal solutions to the control of grns which are otherwise intractable due to the huge state space implied by the large number of genes
in this paper we describe technique for specifying time related properties on traditional software components we apply the separation of concerns paradigm to allow independent specification of timing and to integrate timechecking specialized tool support into conventional software design processes we aim at helping the designer to specify time contracts and at simplifying the introduction of time properties in the component behaviour description we propose to handle timing issues in separate and specific design activity in order to provide means of formal computation of time properties for component assemblies without modifying in depth existing design processes
recently the world wide web consortium wc upgraded its web content accessibility guidelines wcag from version to wcag further encourages the design of accessible web content and has been put in place to address the limitations of the earlier version wcag the new development requires that updates be made accordingly one of the areas affected by the transition is automated web content accessibility evaluation and repair since most accessibility evaluation and repair tools aerts depend on guidelines to make suggestions about potential accessibility barriers and proffer repair solutions existing tools have to be modified to accommodate the changes wcag brings in particular more techniques for performing automated web content accessibility evaluation and repair are desirable the heterogeneous nature of web content which aerts assess calls for techniques of cross disciplinary origin in this paper we discuss the implications of the transition for automated evaluation and repair in addition we present meta review of relevant techniques from related disciplines for the purpose of informing research that surrounds testing and repair techniques employed by aerts
we consider the problem of determining whether sparse or lacunary polynomial is perfect power that is hr for some other polynomial and positive integer and of finding and should they exist we show how to determine if is perfect power in time polynomial in the size of the lacunary representation the algorithm works over gf at least for large characteristic and over where the cost is also polynomial in the log of the infinity norm of subject to conjecture we show how to find if it exists via kind of sparse newton iteration again in time polynomial in the size of the sparse representation finally we demonstrate an implementation using the library ntl
mobile pp networks have potential applications in many fields making them focus of current research however mobile pp networks are subject to the limitations of transmission range wireless bandwidth and highly dynamic network topology giving rise to many new challenges for efficient search in this paper we propose hybrid search approach which is automatic and economical in mobile pp networks the region covered by mobile pp network is partitioned into subregions each of which can be identified by unique id and known to all peers all the subregions then construct mobile kademlia mkad network the proposed hybrid retrieval approach aims to utilize flooding based and dht based schemes in mkad for indexing and searching according to designed utility functions our experiments show that the proposed approach is more accurate and efficient than existing methods
rootkits are prevalent in today’s internet in particular persistent rootkits pose serious security threat because they reside in storage and survive system reboots using hypervisors is an attractive way to deal with rootkits especially when the rootkits have kernel privileges because hypervisors have higher privileges than os kernels however most of the previous studies do not focus on prevention of persistent rootkits this paper presents hypervisor based file protection scheme for preventing persistent rootkits from residing in storage based on security policies created in secure environment the hypervisor makes critical system files read only and unmodifiable by rootkits even if they have kernel privileges our scheme is designed to significantly reduce the size of hypervisors when combined with the architecture of bitvisor thin hypervisor for enforcing device security thereby contributing to the reliability of hypervisors our hypervisor consists of only kilo lines of code in total and its overhead on windows xp with fat file system is only
pipelined filter ordering is central problem in database query optimization the problem is to determine the optimal order in which to apply given set of commutative filters predicates to set of elements the tuples of relation so as to find as efficiently as possible the tuples that satisfy all of the filters optimization of pipelined filter ordering has recently received renewed attention in the context of environments such as the web continuous high speed data streams and sensor networks pipelined filter ordering problems are also studied in areas such as fault detection and machine learning under names such as learning with attribute costs minimum sum set cover and satisficing search we present algorithms for two natural extensions of the classical pipelined filter ordering problem distributional type problem where the filters run in parallel and the goal is to maximize throughput and an adversarial type problem where the goal is to minimize the expected value of multiplicative regret we present two related algorithms for solving both running in time which improve on the log algorithm of kodialam we use techniques from our algorithms for to obtain an algorithm for
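A minimal sketch of the classical serial pipelined filter ordering problem that this work extends: with independent filters, sorting by cost divided by drop probability minimizes expected cost per tuple; the parallel throughput and adversarial variants of the abstract are not shown.

```python
# Classical serial filter ordering: sort by cost / P(drop); assumes independent
# filters with selectivity strictly less than 1.

def order_filters(filters):
    """filters: list of (name, cost, selectivity); selectivity = P(tuple passes)."""
    return sorted(filters, key=lambda f: f[1] / (1.0 - f[2]))

def expected_cost(ordered):
    cost, pass_prob = 0.0, 1.0
    for _, c, s in ordered:
        cost += pass_prob * c      # a filter runs only if all earlier ones passed
        pass_prob *= s
    return cost

if __name__ == "__main__":
    filters = [("f1", 1.0, 0.9), ("f2", 5.0, 0.1), ("f3", 2.0, 0.5)]
    best = order_filters(filters)
    print([name for name, _, _ in best], expected_cost(best))   # f3, f2, f1 -> 4.55
```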
task dependencies drive the need to coordinate work activities we describe technique for using automatically generated archival data to compute coordination requirements ie who must coordinate with whom to get the work done analysis of data from large software development project revealed that coordination requirements were highly volatile and frequently extended beyond team boundaries congruence between coordination requirements and coordination activities shortened development time developers particularly the most productive ones changed their use of electronic communication media over time achieving higher congruence we discuss practical implications of our technique for the design of collaborative and awareness tools
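A minimal sketch of computing coordination requirements from archival data, assuming the commonly used matrix form CR = TA x TD x TA^T, where TA is a developer-by-file assignment matrix and TD a file-by-file dependency matrix; the exact matrices and congruence measure used in the study may differ.

```python
# Coordination-requirements matrix from assignment and dependency matrices.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def coordination_requirements(task_assignment, task_dependency):
    return matmul(matmul(task_assignment, task_dependency), transpose(task_assignment))

if __name__ == "__main__":
    TA = [[1, 0, 1],   # developer 0 touched files 0 and 2
          [0, 1, 1]]   # developer 1 touched files 1 and 2
    TD = [[1, 1, 0],   # file 0 and file 1 depend on each other
          [1, 1, 0],
          [0, 0, 1]]
    CR = coordination_requirements(TA, TD)
    print(CR)  # off-diagonal entries indicate how strongly developers must coordinate
```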
contemporary datacenters comprise hundreds or thousands of machines running applications requiring high availability and responsiveness although performance crisis is easily detected by monitoring key end to end performance indicators kpis such as response latency or request throughput the variety of conditions that can lead to kpi degradation makes it difficult to select appropriate recovery actions we propose and evaluate methodology for automatic classification and identification of crises and in particular for detecting whether given crisis has been seen before so that known solution may be immediately applied our approach is based on new and efficient representation of the datacenter’s state called fingerprint constructed by statistical selection and summarization of the hundreds of performance metrics typically collected on such systems our evaluation uses months of trouble ticket data from production datacenter with hundreds of machines running enterprise class user facing application in experiments in realistic and rigorous operational setting our approach provides operators the information necessary to initiate recovery actions with correctness in an average of minutes which is minutes earlier than the deadline provided to us by the operators to the best of our knowledge this is the first rigorous evaluation of any such approach on large scale production installation
since its inception the concept of network coordinates has been proposed to solve wide variety of problems such as overlay optimization network routing network localization and network modeling however two practical problems significantly limit the applications of network coordinates today first how can network coordinates be stabilized without losing accuracy so that they can be cached by applications second how can network coordinates be secured such that legitimate nodes coordinates are not impacted by misbehaving nodes although these problems have been discussed extensively solving them in decentralized network coordinates systems remains an open problem this paper presents new distributed algorithms to solve the coordinates stability and security problems for the stability problem we propose an error elimination model that can achieve stability without hurting accuracy novel algorithm based on this model is presented for the security problem we show that recently proposed statistical detection mechanisms cannot achieve an acceptable level of security against even simple attacks we propose to address the security problem in two parts first we show how the computation of coordinates can be protected by customized byzantine fault detection algorithm second we adopt triangle inequality violation detection algorithm to protect delay measurements these algorithms can be integrated together to provide stable and secure network coordinates
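For background, a minimal sketch of a decentralized network-coordinate update in the style of Vivaldi spring relaxation; this only illustrates what network coordinates compute, not the error-elimination, Byzantine-detection or triangle-inequality algorithms proposed above, and the step size is an arbitrary assumption.

```python
# Spring-relaxation style coordinate update: nudge our coordinate so that
# its distance to the remote node better matches the measured RTT.
import math

def update(own, remote, measured_rtt, step=0.05):
    dx = [a - b for a, b in zip(own, remote)]
    dist = math.sqrt(sum(d * d for d in dx)) or 1e-9
    error = measured_rtt - dist                 # positive: we are too close
    unit = [d / dist for d in dx]
    return [a + step * error * u for a, u in zip(own, unit)]

if __name__ == "__main__":
    coord = [0.0, 0.0]
    for _ in range(200):
        coord = update(coord, remote=[10.0, 0.0], measured_rtt=25.0)
    print(coord)  # drifts toward a point about 25ms away from the remote node
```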
we study the problem of caching query result pages in web search engines popular search engines receive millions of queries per day and for each query return result page to the user who submitted the query the user may request additional result pages for the same query submit new query or quit searching altogether an efficient scheme for caching query result pages may enable search engines to lower their response time and reduce their hardware requirements this work studies query result caching within the framework of the competitive analysis of algorithms we define discrete time stochastic model for the manner in which queries are submitted to search engines by multiple user sessions we then present an adaptation of known online paging scheme to this model the expected number of cache misses of the resulting algorithm is no greater than times the expected number of misses that any online caching algorithm will experience under our specific model of query generation
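A minimal sketch of caching query result pages with a standard online paging policy (LRU); the abstract's adaptation of a specific paging scheme to its stochastic query model is not reproduced, and the capacity and backend function are illustrative.

```python
# LRU cache keyed by (query, page number); a miss runs the query backend.
from collections import OrderedDict

class ResultPageCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.pages = OrderedDict()   # (query, page_no) -> result page
        self.misses = 0

    def get(self, query, page_no, compute_page):
        key = (query, page_no)
        if key in self.pages:
            self.pages.move_to_end(key)          # mark as most recently used
            return self.pages[key]
        self.misses += 1
        page = compute_page(query, page_no)      # run the query on a miss
        self.pages[key] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)       # evict least recently used
        return page

if __name__ == "__main__":
    cache = ResultPageCache(capacity=2)
    backend = lambda q, p: f"results for {q!r} page {p}"
    for q, p in [("noc", 1), ("noc", 2), ("noc", 1), ("spl", 1), ("noc", 2)]:
        cache.get(q, p, backend)
    print(cache.misses)   # 4 misses with capacity 2 on this sequence
```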
we present new algorithm for interactive generation of hard edged umbral shadows in complex environments with moving light source our algorithm uses hybrid approach that combines the image quality of object precision methods with the efficiencies of image precision techniques we present an algorithm for computing compact potentially visible set pvs using levels of detail lods and visibility culling we use the pvss computed from both the eye and the light in novel cross culling algorithm that identifies reduced set of potential shadow casters and shadow receivers finally we use combination of shadow polygons and shadow maps to generate shadows we also present techniques for lod selection to minimize possible artifacts arising from the use of lods our algorithm can generate sharp shadow edges and reduces the aliasing in pure shadow map approaches we have implemented the algorithm on three pc system with nvidia geforce cards we achieve frames per second in three complex environments composed of millions of triangles
taint analysis form of information flow analysis establishes whether values from untrusted methods and parameters may flow into security sensitive operations taint analysis can detect many common vulnerabilities in web applications and so has attracted much attention from both the research community and industry however most static taint analysis tools do not address critical requirements for an industrial strength tool specifically an industrial strength tool must scale to large industrial web applications model essential web application code artifacts and generate consumable reports for wide range of attack vectors we have designed and implemented static taint analysis for java taj that meets the requirements of industry level applications taj can analyze applications of virtually any size as it employs set of techniques designed to produce useful answers given limited time and space taj addresses wide variety of attack vectors with techniques to handle reflective calls flow through containers nested taint and issues in generating useful reports this paper provides description of the algorithms comprising taj evaluates taj against production level benchmarks and compares it with alternative solutions
processor in memory pim architectures avoid the von neumann bottleneck in conventional machines by integrating high density dram and cmos logic on the same chip parallel systems based on this new technology are expected to provide higher scalability adaptability robustness fault tolerance and lower power consumption than current mpps or commodity clusters in this paper we describe the design of gilgamesh pim based massively parallel architecture and elements of its execution model gilgamesh extends existing pim capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance the gilgamesh execution model is based on macroservers middleware layer which supports object based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing the paper concludes with discussion of related research activities and an outlook to future work
the problem of programming scalable multicore processors has renewed interest in message passing languages and frameworks such languages and frameworks are typically actor oriented implementing some variant of the standard actor semantics this paper analyzes some of the more significant efforts to build actor oriented frameworks for the jvm platform it compares the frameworks in terms of their execution semantics the communication and synchronization abstractions provided and the representations used in the implementations it analyzes the performance of actor oriented frameworks to determine the costs of supporting different actor properties on jvm the analysis suggests that with suitable optimizations standard actor semantics and some useful communication and synchronization abstractions may be supported with reasonable efficiency on the jvm platform
current grid service choreography proposals remain at the descriptive level without providing any kind of reasoning mechanism to check the compatibility and validity of grid service composition how to formalize and verify the behavior of the grid service composition is therefore imperative novel process algebra cpi calculus conditional pi calculus is proposed in this paper by analyzing the interactive behavior of composite grid services we present grid service composition signature based on the signature we construct interactive behavior model iabm for grid service composition and specify the interactive behavior using the cpi calculus the case study shows that the mechanism for grid service specification and verification could be the algebraic foundation to be used afterwards in automatic and dynamic composition
for flexible and dynamic resource management in environments where users collaborate to fulfill their common tasks various attempts at modeling delegation of authority have been proposed using the role based access control rbac model however to achieve higher level of collaboration in large scale networked systems it is worthwhile supporting cross domain delegation with low administration cost for that purpose we propose capability role based access control crbac model by integrating capability based access control mechanism into the rbac model central to this scheme is the mapping of capabilities to permissions as well as to roles in each domain thereby realizing the delegation of permissions and roles by capability transfer by taking this approach of capability based access control our model has the advantages of flexibility and reduced administration costs we also demonstrate the effectiveness of our model by using examples of various types of delegation in clinical information systems
processors used in embedded systems are usually characterized by specialized irregular hardware architectures for which traditional code generation and optimization techniques fail especially for these types of processors the propan system has been developed that enables high quality machine dependent postpass optimizers to be generated from concise hardware specification optimizing code transformations as featured by propan require the control flow graph of the input program to be known the control flow reconstruction algorithm is generic ie machine independent and automatically derives the required hardware specific knowledge from the machine specification the reconstruction is based on an extended program slicing mechanism and is tailored to assembly programs it has been retargeted to assembly programs of two contemporary microprocessors the analog devices sharc and the philips trimedia tm experimental results show that the assembly based slicing enables the control flow graph of large assembly programs to be constructed in short time our experiments also demonstrate that the hardware design significantly influences the precision of the control flow reconstruction and the required computation time
in contrast to regular queries that are evaluated only once continuous query remains active over period of time and has to be continuously evaluated to provide up to date answers we propose method for continuous range query processing for different types of queries characterized by mobility of objects and or queries which all follow paths in an underlying spatial network the method assumes an available indexing scheme for indexing spatial network data an appropriately extended tree that primarily is used as an indexing scheme for network segments provides matching of queries and objects according to their locations on the network or their network routes the method introduces an additional pre refinement step which generates main memory data structures to support efficient incremental reevaluation of continuous range queries in periodically performed refinement steps
the main idea of content based image retrieval cbir is to search on an image’s visual content directly typically features eg color shape texture are extracted from each image and organized into feature vector retrieval is performed by image example where query image is given as input by the user and an appropriate metric is used to find the best matches in the corresponding feature space we attempt to bypass the feature selection step and the metric in the corresponding feature space by following what we believe is the logical continuation of the cbir idea of searching visual content directly it is based on the observation that since ultimately the entire visual content of an image is encoded into its raw data ie the raw pixel values in theory it should be possible to determine image similarity based on the raw data alone the main advantage of this approach is its simplicity in that explicit selection extraction and weighting of features is not needed this work is an investigation into an image dissimilarity measure following from the theoretical foundation of the recently proposed normalized information distance nid li chen li ma vitanyi the similarity metric in proceedings of the th acm siam symposium on discrete algorithms pp approximations of the kolmogorov complexity of an image are created by using different compression methods using those approximations the nid between images is calculated and used as metric for cbir the compression based approximations to kolmogorov complexity are shown to be valid by proving that they create statistically significant dissimilarity measures by testing them against null hypothesis of random retrieval furthermore when compared against several feature based methods the nid approach performed surprisingly well
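A minimal sketch of the compression-based approximation to the normalized information distance (the normalized compression distance), using zlib as a stand-in for the compressors evaluated above.

```python
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C a compressor.
import zlib

def c(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    x1 = b"abcabcabcabcabcabc" * 20
    x2 = b"abcabcabcabcabcabc" * 20
    y1 = b"qwertyuiopasdfghjk" * 20
    print(ncd(x1, x2), ncd(x1, y1))   # similar inputs give a smaller distance
```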
the majority of current internet applications uses transmission control protocol tcp for ensuring reliable end to end delivery of data over ip networks the resulting path is generally speaking characterized by fairly large propagation delays of the order of tens to hundreds of milliseconds and increasing available bandwidth current tcp performance is far from representing an optimal solution in such operating conditions the main reason lies in the conservative congestion control strategy employed which does not let tcp exploit the always increasing available path capacity as consequence tcp optimization has been an active research topic in the research community over the last years boosted in the last few years by the widespread adoption of high speed optical fiber links in the backbone and the emergence of supercomputing networked applications from one side and tremendous growth of wireless bandwidth in network access from another this has led to the introduction of several alternative proposals for performing congestion control most of them focus on the effectiveness of bandwidth utilization introducing more aggressive congestion control strategies however such approaches result often in unfairness among flows with substantially different rtts or do not present the inter protocol fairness features required for incremental network deployment in this paper we propose tcp logwestwood tcp westwood enhancement based on logarithmic increase function targeting adaptation to the high speed wireless environment the algorithm shows low sensitivity with respect to rtt value while maintaining high network utilization in wide range of network settings the performance fairness and stability properties of the proposed tcp logwestwood are studied analytically and then validated by means of an extensive set of experiments including computer simulations and wide area internet measurements
open mass can be extremely dynamic due to heterogeneous agents that migrate among them to obtain resources or services not found locally in order to prevent malicious actions and to ensure agent trust open mas should be enhanced with normative mechanisms however it is not reasonable to expect that foreign agents will know in advance all the norms of the mas in which they will execute thus this paper presents dynacrom our approach for addressing these issues from the individual agents perspective dynacrom is an information mechanism so that agents become context norm aware from the system developers perspective dynacrom is methodology for norm management in regulated mass notwithstanding the ultimate goal of regulated mas is to have an enforcement mechanism we also present in the paper the integration of dynacrom and scaar scaar is the current solution of dynacrom for norm enforcement
this paper presents our experience mapping openmp parallel programming model to the ibm cyclops architecture the employs many core on chip design that integrates processing logic thread units embedded memory mb and communication hardware on the same die such unique architecture presents new opportunities for optimization specifically we consider the following three areas memory aware runtime library that places frequently used data structures in scratchpad memory unique spin lock algorithm for shared memory synchronization based on in memory atomic instructions and native support for thread level execution fast barrier that directly uses hardware support for collective synchronization all three optimizations together result in an overhead reduction for language constructs in openmp we believe that such drastic reduction in the cost of managing parallelism makes openmp more amenable for writing parallel programs on the platform
this work presents the use of click graphs in improving query intent classifiers which are critical if vertical search and general purpose search services are to be offered in unified user interface previous works on query classification have primarily focused on improving feature representation of queries eg by augmenting queries with search engine results in this work we investigate completely orthogonal approach instead of enriching feature representation we aim at drastically increasing the amounts of training data by semi supervised learning with click graphs specifically we infer class memberships of unlabeled queries from those of labeled ones according to their proximities in click graph moreover we regularize the learning with click graphs by content based classification to avoid propagating erroneous labels we demonstrate the effectiveness of our algorithms in two different applications product intent and job intent classification in both cases we expand the training data with automatically labeled queries by over two orders of magnitude leading to significant improvements in classification performance an additional finding is that with large amount of training data obtained in this fashion classifiers using only query words phrases as features can work remarkably well
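A minimal sketch of propagating query-intent scores through a click graph, assuming a plain iterative averaging scheme with labeled queries held fixed; the content-based regularization described above is not included, and all names are illustrative.

```python
# Scores flow from labeled queries to clicked URLs, then back to unlabeled
# queries that click the same URLs.
from collections import defaultdict

def propagate(clicks, seed_labels, iterations=10):
    """clicks: list of (query, url); seed_labels: query -> score in [0, 1]."""
    q2u, u2q = defaultdict(set), defaultdict(set)
    for q, u in clicks:
        q2u[q].add(u)
        u2q[u].add(q)

    score = {q: seed_labels.get(q, 0.5) for q in q2u}
    for _ in range(iterations):
        url_score = {u: sum(score[q] for q in qs) / len(qs) for u, qs in u2q.items()}
        for q in score:
            if q in seed_labels:
                continue                           # keep labeled queries fixed
            score[q] = sum(url_score[u] for u in q2u[q]) / len(q2u[q])
    return score

if __name__ == "__main__":
    clicks = [("buy laptop", "shop.example"), ("cheap laptops", "shop.example"),
              ("laptop reviews", "reviews.example")]
    print(propagate(clicks, seed_labels={"buy laptop": 1.0}))
```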
we present framework to extract the most important features tree fragments from tree kernel tk space according to their importance in the target kernel based machine eg support vector machines svms in particular our mining algorithm selects the most relevant features based on svm estimated weights and uses this information to automatically infer an explicit representation of the input data the explicit features improve our knowledge on the target problem domain and make large scale learning practical improving training and test time while yielding accuracy in line with traditional tk classifiers experiments on semantic role labeling and question classification illustrate the above claims
for the first time the problem of optimizing energy for communication and motion is investigated we consider single mobile robot with continuous high bandwidth wireless communication eg caused by multimedia application like video surveillance this robot is connected to radio base station and moves with constant speed from given starting point on the plane to target point the task is to find the best path such that the energy consumption for mobility and the communication is optimized this is motivated by the fact that the energy consumption of radio devices increases polynomially at least to the power of two with the transmission distance we introduce efficient approximation algorithms finding the optimal path given the starting point the target point and the position of the radio stations we exemplify the influence of the communication cost by starting scenario with one radio station we study the performance of the proposed algorithm in simulation compare it with the scenario without applying our approach and present the results
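A minimal sketch of one way to approximate the energy-optimal path by discretizing the plane into a grid graph and running Dijkstra, with edge cost equal to length times (motion power plus quadratic transmission power at the edge midpoint); this is not the approximation algorithm of the abstract, and all constants are made up.

```python
# Grid-graph approximation: Dijkstra over 8-connected grid points, where each
# edge pays for motion plus distance-squared communication energy.
import heapq, math

def comm_power(p, base):
    return (p[0] - base[0]) ** 2 + (p[1] - base[1]) ** 2   # ~ distance^2

def optimal_path(start, target, base, grid=20, motion_cost=50.0):
    def cost(a, b):
        mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        return math.dist(a, b) * (motion_cost + comm_power(mid, base))

    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue
        x, y = u
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                v = (x + dx, y + dy)
                if (dx, dy) == (0, 0) or not (0 <= v[0] <= grid and 0 <= v[1] <= grid):
                    continue
                nd = d + cost(u, v)
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(heap, (nd, v))

    path, node = [], target
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1], dist[target]

if __name__ == "__main__":
    # the returned path trades extra travel against cheaper transmission near the base
    path, energy = optimal_path(start=(0, 10), target=(20, 10), base=(10, 15))
    print(energy, path)
```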
service oriented architectures soa and in particular web services technologies are widely adopted for the development of interoperable systems in dynamic scenario composed service may exploit component services in order to complete its task composed services are variously distributed and offered by different providers in different security domains and under different load conditions so the development of services and their integration entails huge number of design choices obtaining optimality for all of the involved parameters for composed services is challenging and open issue in this paper we present mawes an autonomic framework that makes it possible to auto configure and to auto tune the composition of services guaranteeing optimal performance and the fulfillment of given security requirements we will illustrate the framework architecture and how it is able to support the development of self optimizing autonomic services on the basis of two evaluation services the first one able to predict the performance of different services execution the second one able to evaluate the security level provided by service
model checking is successful technique for automatically verifying concurrent finite state systems when designing model checker good compromise must be made between the expressive power of the property description formalism the complexity of the model checking problem and the user friendliness of the interface we present temporal logic and an associated model checking method that attempt to fulfill these criteria the logic is an extension of the alternation free calculus with actl like action formulas and pdl like regular expressions allowing concise and intuitive description of safety liveness and fairness properties over labeled transition systems the model checking method is based upon succinct translation of the verification problem into boolean equation system which is solved by means of an efficient local algorithm having good average complexity the algorithm also allows to generate full diagnostic information examples and counterexamples for temporal formulas this method is at the heart of the evaluator model checker that we implemented within the cadp toolbox using the generic open caesar environment for on the fly verification
compilers and optimizers for declarative query languages use some form of intermediate language to represent user level queries the advent of compositional query languages for orthogonal type systems eg oql calls for internal query representations beyond extensions of relational algebra this work adopts view of query processing which is greatly influenced by ideas from the functional programming domain uniform formal framework is presented which covers all query translation phases including user level query language compilation query optimization and execution plan generation we pursue the type based design based on initial algebras of core functional language which is then developed into an intermediate representation that fits the needs of advanced query processing based on the principle of structural recursion we extend the language by monad comprehensions which provide us with calculus style sublanguage that proves to be useful during the optimization of nested queries and combinators abstractions of the query operators implemented by the underlying target query engine due to its functional nature the language is susceptible to program transformation techniques that were developed by the functional programming as well as the functional data model communities we show how database query processing can substantially benefit from these techniques
there has been little attention to search based test data generation in the presence of pointer inputs and dynamic data structures an area in which recent concolic methods have excelled this paper introduces search based testing approach which is able to handle pointers and dynamic data structures it combines an alternating variable hill climb with set of constraint solving rules for pointer inputs the result is lightweight and efficient method as shown in the results from case study which compares the method to cute concolic unit testing tool
this paper focuses on data warehouse modelling the conceptual model we defined is based on object concepts extended with specific concepts like generic classes temporal classes and archive classes the temporal classes are used to store the detailed evolutions and the archive classes store the summarised data evolutions we also provide flexible concept allowing the administrator to define historised parts and non historised parts into the warehouse schema moreover we introduce constraints which configure the data warehouse behaviour and these various parts to validate our propositions we describe prototype dedicated to the data warehouse design
when the first specification of the fortran language was released in the goal was to provide an automatic programming system that would enhance the economy of programming by replacing assembly language with notation closer to the domain of scientific programming key issue in this context explicitly recognized by the authors of the language was the requirement to produce efficient object programs that could compete with their hand coded counterparts more than years later similar situation exists with respect to finding the right programming paradigm for high performance computing systems fortran as the traditional language for scientific programming has played major role in the quest for high productivity programming languages that satisfy very strict performance constraints this paper focuses on high level support for locality awareness one of the most important requirements in this context the discussion centers on the high performance fortran hpf family of languages and their influence on current language developments for peta scale computing hpf is data parallel language that was designed to provide the user with high level interface for programming scientific applications while delegating to the compiler the task of generating an explicitly parallel message passing program we outline developments that led to hpf explain its major features identify set of weaknesses and discuss subsequent languages that address these problems the final part of the paper deals with chapel modern object oriented language developed in the high productivity computing systems hpcs program sponsored by darpa salient property of chapel is its general framework for the support of user defined distributions which is related in many ways to ideas first described in vienna fortran this framework is general enough to allow concise specification of sparse data distributions the paper concludes with an outlook to future research in this area
the internet’s routing system is facing stresses due to its poor fundamental scaling properties compact routing is research field that studies fundamental limits of routing scalability and designs algorithms that try to meet these limits in particular compact routing research shows that shortest path routing forming core of traditional routing algorithms cannot guarantee routing table rt sizes that on all network topologies grow slower than linearly as functions of the network size however there are plenty of compact routing schemes that relax the shortest path requirement and allow for improved sublinear rt size scaling that is mathematically provable for all static network topologies in particular there exist compact routing schemes designed for grids trees and internet like topologies that offer rt sizes that scale logarithmically with the network size in this paper we demonstrate that in view of recent results in compact routing research such logarithmic scaling on internet like topologies is fundamentally impossible in the presence of topology dynamics or topology independent flat addressing we use analytic arguments to show that the number of routing control messages per topology change cannot scale better than linearly on internet like topologies we also employ simulations to confirm that logarithmic rt size scaling gets broken by topology independent addressing cornerstone of popular locator identifier split proposals aiming at improving routing scaling in the presence of network topology dynamics or host mobility these pessimistic findings lead us to the conclusion that fundamental re examination of assumptions behind routing models and abstractions is needed in order to find routing architecture that would be able to scale indefinitely
commercial applications such as databases and web servers constitute the largest and fastest growing segment of the market for multiprocessor servers ongoing innovations in disk subsystems along with the ever increasing gap between processor and memory speeds have elevated memory system design as the critical performance factor for such workloads however most current server designs have been optimized to perform well on scientific and engineering workloads potentially leading to design decisions that are non ideal for commercial applications the above problem is exacerbated by the lack of information on the performance requirements of commercial workloads the lack of available applications for widespread study and the fact that most representative applications are too large and complex to serve as suitable benchmarks for evaluating trade offs in the design of processors and servers this paper presents detailed performance study of three important classes of commercial workloads online transaction processing oltp decision support systems dss and web index search we use the oracle commercial database engine for our oltp and dss workloads and the altavista search engine for our web index search workload this study characterizes the memory system behavior of these workloads through large number of architectural experiments on alpha multiprocessors augmented with full system simulations to determine the impact of architectural trends we also identify set of simplifications that make these workloads more amenable to monitoring and simulation without affecting representative memory system behavior we observe that systems optimized for oltp versus dss and index search workloads may lead to diverging designs specifically in the size and speed requirements for off chip caches
how do we find natural clustering of real world point set which contains an unknown number of clusters with different shapes and which may be contaminated by noise most clustering algorithms were designed with certain assumptions gaussianity they often require the user to give input parameters and they are sensitive to noise in this paper we propose robust framework for determining natural clustering of given data set based on the minimum description length mdl principle the proposed framework robust information theoretic clustering ric is orthogonal to any known clustering algorithm given preliminary clustering ric purifies these clusters from noise and adjusts the clusterings such that it simultaneously determines the most natural amount and shape subspace of the clusters our ric method can be combined with any clustering technique ranging from means and medoids to advanced methods such as spectral clustering in fact ric is even able to purify and improve an initial coarse clustering even if we start with very simple methods such as grid based space partitioning moreover ric scales well with the data set size extensive experiments on synthetic and real world data sets validate the proposed ric framework
supersampling is widely used by graphics hardware to render anti aliased images in conventional supersampling multiple scene samples are computationally combined to produce single screen pixel we consider novel imaging paradigm that we call display supersampling where multiple display samples are physically combined via the superimposition of multiple image subframes conventional anti aliasing and texture mapping techniques are shown inadequate for the task of rendering high quality images on supersampled displays instead of requiring anti aliasing filters supersampled displays actually require alias generation filters to cancel the aliasing introduced by nonuniform sampling we present fundamental theory and efficient algorithms for the real time rendering of high resolution anti aliased images on supersampled displays we show that significant image quality gains are achievable by taking advantage of display supersampling we prove that alias free resolution beyond the nyquist limits of single subframe may be achieved by designing bank of alias canceling rendering filters in addition we derive practical noniterative filter bank approach to real time rendering and discuss implementations on commodity graphics hardware
the development process in software product line engineering is divided into domain engineering and application engineering as consequence of this division tests should be performed in both processes however existing testing techniques for single systems cannot be applied during domain engineering because of the variability in the domain artifacts existing software product line test techniques only cover unit and system tests our contribution is model based automated integration test technique that can be applied during domain engineering for generating integration test case scenarios the technique abstracts from variability and assumes that placeholders are created for variability the generated scenarios cover all interactions between the integrated components which are specified in test model additionally the technique reduces the effort for creating placeholders by minimizing the number of placeholders needed to execute the integration test case scenarios we have experimentally measured the performance of the technique and the potential reduction of placeholders
previous studies have examined various aspects of user behavior on the web including general information seeking patterns search engine use and revisitation habits little research has been conducted to study how users navigate and interact with their web browser across different information seeking tasks we have conducted field study of participants in which we logged detailed web usage and asked participants to provide task categorizations of their web usage based on the following categories fact finding information gathering browsing and transactions we used implicit measures logged during each task session to provide usage measures such as dwell time number of pages viewed and the use of specific browser navigation mechanisms we also report on differences in how participants interacted with their web browser across the range of information seeking tasks within each type of task we found several distinguishing characteristics in particular information gathering tasks were the most complex participants spent more time completing this task viewed more pages and used the web browser functions most heavily during this task the results of this analysis have been used to provide implications for future support of information seeking on the web as well as direction for future research in this area
this paper presents new method for inferring the semantic properties of documents by leveraging free text keyphrase annotations such annotations are becoming increasingly abundant due to the recent dramatic growth in semi structured user generated online content one especially relevant domain is product reviews which are often annotated by their authors with pros cons keyphrases such as real bargain or good value these annotations are representative of the underlying semantic properties however unlike expert annotations they are noisy lay authors may use different labels to denote the same property and some labels may be missing to learn using such noisy annotations we find hidden paraphrase structure which clusters the keyphrases the paraphrase structure is linked with latent topic model of the review texts enabling the system to predict the properties of unannotated documents and to effectively aggregate the semantic properties of multiple reviews our approach is implemented as hierarchical bayesian model with joint inference we find that joint inference increases the robustness of the keyphrase clustering and encourages the latent topics to correlate with semantically meaningful properties multiple evaluations demonstrate that our model substantially outperforms alternative approaches for summarizing single and multiple documents into set of semantically salient keyphrases
the goal of this paper is to investigate and assess the ability of explanatory models based on design metrics to describe and predict defect counts in an object oriented software system specifically we empirically evaluate the influence of design decisions to defect behavior of the classes in two products from the commercial software domain information provided by these models can help in resource allocation and serve as base for assessment and future improvements we use innovative statistical methods to deal with the peculiarities of the software engineering data such as non normally distributed count data to deal with overdispersed data and excess of zeroes in the dependent variable we use negative binomial nb and zero inflated nb regression in addition to poisson regression furthermore we form framework for comparison of models descriptive and predictive ability predictive capability of the models to identify most critical classes in the system early in the software development process can help in allocation of resources and foster software quality improvement in addition to the correlation coefficients we use additional statistics to assess models ability to explain high variability in the data and pareto analysis to assess models ability to identify the most critical classes in the system results indicate that design aspects related to communication between classes and inheritance can be used as indicators of the most defect prone classes which require the majority of resources in development and testing phases the zero inflated negative binomial regression model designed to explicitly model the occurrence of zero counts in the dataset provides the best results for this purpose
systems for fast search of personal information are rapidly becoming ubiquitous such systems promise to dramatically improve personal information management yet most are modeled on web search in which users know very little about the content that they are searching we describe the design and deployment of system called phlat that optimizes search for personal information with an intuitive interface that merges search and browsing through variety of associative and contextual cues in addition phlat supports unified tagging labeling scheme for organizing personal content across storage systems files email etc the system has been deployed to hundreds of employees within our organization we report on both quantitative and qualitative aspects of system use phlat is available as free download at http://research.microsoft.com/adapt/phlat
this paper considers the problem of publishing transaction data for research purposes each transaction is an arbitrary set of items chosen from large universe detailed transaction data provides an electronic image of one’s life this has two implications one transaction data are excellent candidates for data mining research two use of transaction data would raise serious concerns over individual privacy therefore before transaction data is released for data mining it must be made anonymous so that data subjects cannot be re identified the challenge is that transaction data has no structure and can be extremely high dimensional traditional anonymization methods lose too much information on such data to date there has been no satisfactory privacy notion and solution proposed for anonymizing transaction data this paper proposes one way to address this issue
heterogeneous multicore processors promise high execution efficiency under diverse workloads and program scheduling is critical in exploiting this efficiency this paper presents novel method to leverage the inherent characteristics of program for scheduling decisions in heterogeneous multicore processors the proposed method projects the core’s configuration and the program’s resource demand to unified multi dimensional space and uses weighted euclidean distance between these two to guide the program scheduling the experimental results show that on average this distance based scheduling heuristic achieves reduction in energy delay product reduction in energy and improvement in throughput when compared with traditional hardware oblivious scheduling algorithm
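the distance based heuristic lends itself to a very small sketch shown below the feature dimensions weights and normalized values are invented for illustration and are not the configuration space used in the paper

# Illustrative sketch (assumed feature names, not the paper's exact model):
# map each core configuration and each program's resource demand into one
# normalized vector space and schedule each program on the core at minimum
# weighted Euclidean distance.
import math

def weighted_distance(core, demand, weights):
    return math.sqrt(sum(w * (core[k] - demand[k]) ** 2 for k, w in weights.items()))

def schedule(cores, programs, weights):
    """cores/programs: {name: {dim: normalized value}}; returns {program: core}."""
    placement = {}
    for p, demand in programs.items():
        placement[p] = min(cores, key=lambda c: weighted_distance(cores[c], demand, weights))
    return placement

if __name__ == "__main__":
    cores = {"big":    {"issue_width": 1.0, "cache": 1.0, "freq": 1.0},
             "little": {"issue_width": 0.3, "cache": 0.4, "freq": 0.6}}
    programs = {"mcf":  {"issue_width": 0.4, "cache": 0.9, "freq": 0.5},
                "gzip": {"issue_width": 0.9, "cache": 0.3, "freq": 0.9}}
    weights = {"issue_width": 1.0, "cache": 1.0, "freq": 0.5}
    print(schedule(cores, programs, weights))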
software refactoring is the process of reorganizing the internal structure of code while preserving the external behavior aspect oriented programming aop provides new modularization of software systems by encapsulating crosscutting concerns based on these two techniques aspect oriented ao refactoring restructures crosscutting elements in code ao refactoring includes two steps aspect mining identification of aspect candidates in code and aspect refactoring semantic preserving transformation to migrate the aspect candidate code to ao code aspect refactoring clusters similar join points together for the aspect candidates and encapsulates each cluster with an effective pointcut definition with the increase in size of the code and crosscutting concerns it is tedious to manually identify aspects and their corresponding join points cluster the join points and infer pointcut expressions therefore there is need to automate the process of ao refactoring this paper proposes an automated approach that identifies aspect candidates in code and infers pointcut expressions for these aspects our approach mines for aspect candidates identifies the join points for the aspect candidates clusters the join points and infers an effective pointcut expression for each cluster of join points the approach also provides an additional testing mechanism to ensure that the inferred pointcut expressions are of correct strength the empirical results show that our approach helps achieve significant reduction in the total number of pointcut expressions to be used in the refactored code
our icse paper showed how an application can be adapted at runtime by manipulating its architectural model in particular our paper demonstrated the beneficial role of software connectors in aiding runtime change an explicit architectural model fielded with the system and used as the basis for runtime change and architectural style in providing both structural and behavioral constraints over runtime change this paper examines runtime evolution in the decade hence broad framework for studying and describing evolution is introduced that serves to unify the wide range of work now found in the field of dynamic software adaptation this paper also looks to the future identifying what we believe to be highly promising directions
variety of animation effects such as herds and fluids contain detailed motion fields characterized by repetitive structures such detailed motion fields are often visually important but tedious to specify manually or expensive to simulate computationally due to the repetitive nature some of these motion fields eg turbulence in fluids could be synthesized by procedural texturing but procedural texturing is known for its limited generality we apply example based texture synthesis for motion fields our technique is general and can take on variety of user inputs including captured data manual art and physical procedural simulation this data driven approach enables artistic effects that are difficult to achieve via previous methods such as heart shaped swirls in fluid animation due to the use of texture synthesis our method is able to populate large output field from small input exemplar imposing minimum user workload our algorithm also allows the synthesis of output motion fields not only with the same dimension as the input eg to but also of higher dimension such as volumetric outputs from planar inputs this cross dimension capability supports convenient usage scenario ie the user could simply supply images and our method produces motion field with similar characteristics the motion fields produced by our method are generic and could be combined with variety of large scale low resolution motions that are easy to specify either manually or computationally but lack the repetitive structures to be characterized as textures we apply our technique to variety of animation phenomena including smoke liquid and group motion
the third international acm workshop on data engineering for wireless and mobile access mobide for short took place on september at the westin horton plaza hotel in san diego california in conjunction with mobicom the mobide workshops serve as bridge between the data management and network research communities and have tradition of presenting innovations on mobile as well as wireless data engineering issues such as those found in sensor networks this workshop was the third in the mobide series mobide having taken place in seattle in conjunction with mobicom and mobide having taken place in santa barbara in conjunction with sigmod
we present new approach to character skinning where divergence free vector fields induced by skeletal motion describe the velocity of skin deformation the joint transformations for pose relative to rest pose create bend deformation field resulting in pose dependent or kinematic skin deformations varying smoothly across joints the bend deformation parameters are interactively controlled to capture the varying deformability of bone and other anatomic tissue within an overall fold over free and volume preserving skin deformation subsequently we represent the dynamics of skeletal motion tissue elasticity muscular tension and the environment as forces that are mapped to vortices at tissue interfaces simplified biot savart law in the context of elastic deformation recovers divergence free velocity field from the vorticity finally we apply new stable technique to efficiently integrate points along their deformation trajectories adding these dynamic forces over window of time prior to given pose provides continuum of user controllable kinodynamic skinning comprehensive implementation using typical animator workflow in maya shows our approach to be effective for complex character skinning
when generalization algorithms are known to the public an adversary can obtain more precise estimation of the secret table than what can be deduced from the disclosed generalization result therefore whether generalization algorithm can satisfy privacy property should be judged based on such an estimation in this paper we show that the computation of the estimation is inherently recursive process that exhibits high complexity when generalization algorithms take straightforward inclusive strategy to facilitate the design of more efficient generalization algorithms we suggest an alternative exclusive strategy which adopts seemingly drastic approach to eliminate the need for recursion surprisingly the data utility of the two strategies are actually not comparable and the exclusive strategy can provide better data utility in certain cases
in this research work we propose novel embedded dual execution mode bit processor architecture qsp which supports queue and stack programming models the qsp core is based on high performance produced order parallel queue architecture and is targeted for applications constrained in terms of area memory and power requirements the design focuses on the ability to execute queue programs and also to support stack programs without considerable increase in hardware to the base queue architecture prototype implementation of the processor is produced by synthesizing the high level model for target fpga device we present the architecture description and design results in fair amount of details from the design and evaluation results the qsp core efficiently executes both queue and stack based programs and achieves on average about mhz speed in addition when compared to the base single mode architecture pqp the qsp core requires only about additional hardware moreover the prototype fits on single fpga device thereby eliminating the need to perform multi chip partitioning which results in loss of resource efficiency
users have begun downloading an increasingly large number of mobile phone applications in response to advancements in handsets and wireless networks the increased number of applications results in greater chance of installing trojans and similar malware in this paper we propose the kirin security service for android which performs lightweight certification of applications to mitigate malware at install time kirin certification uses security rules which are templates designed to conservatively match undesirable properties in security configuration bundled with applications we use variant of security requirements engineering techniques to perform an in depth security analysis of android to produce set of rules that match malware characteristics in a sample of the most popular applications downloaded from the official android market kirin and our rules found applications that implement dangerous functionality and therefore should be installed with extreme caution upon close inspection another five applications asserted dangerous rights but were within the scope of reasonable functional needs these results indicate that security configuration bundled with android applications provides practical means of detecting malware
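a toy sketch of install time rule checking in the spirit of the certification described above is given below the rule contents and permission names are invented examples and not the actual kirin rule set

# Minimal sketch of install-time rule checking (rule contents are invented
# examples, not the paper's actual rules): a rule names a combination of
# requested permissions considered dangerous; an app is flagged if it
# requests all of them.
RULES = [
    {"name": "eavesdropper", "permissions": {"RECORD_AUDIO", "INTERNET"}},
    {"name": "location-leak", "permissions": {"ACCESS_FINE_LOCATION", "INTERNET", "RECEIVE_BOOT_COMPLETED"}},
]

def check_app(requested_permissions):
    """Return the names of all rules whose permission set the app fully requests."""
    requested = set(requested_permissions)
    return [r["name"] for r in RULES if r["permissions"] <= requested]

if __name__ == "__main__":
    manifest = ["INTERNET", "RECORD_AUDIO", "VIBRATE"]
    violations = check_app(manifest)
    print("flagged rules:", violations or "none")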
as the internet evolves into ubiquitous communication infrastructure and supports increasingly important services its dependability in the presence of various failures becomes critical in this paper we analyze is is routing updates from the sprint ip backbone network to characterize failures that affect ip connectivity failures are first classified based on patterns observed at the ip layer in some cases it is possible to further infer their probable causes such as maintenance activities router related and optical layer problems key temporal and spatial characteristics of each class are analyzed and when appropriate parameterized using well known distributions our results indicate that of all failures happen during period of scheduled maintenance activities of the unplanned failures almost are shared by multiple links and are most likely due to router related and optical equipment related problems respectively while affect single link at time our classification of failures reveals the nature and extent of failures in the sprint ip backbone furthermore our characterization of the different classes provides probabilistic failure model which can be used to generate realistic failure scenarios as input to various network design and traffic engineering problems
node selecting queries over trees lie at the core of several important xml languages for the web such as the node selection language xpath the query language xquery and the transformation language xslt the main syntactic constructs of such queries are the backward predicates for example ancestor and preceding and the forward predicates for example descendant and following forward predicates are included in the depth first left to right preorder relation associated with the input tree whereas backward predicates are included in the inverse of this preorder relation this work is devoted to an expressiveness study of node selecting queries with proven theoretical and practical applicability especially in the field of query evaluation against xml streams the main question it answers positively is whether for each input query with forward and backward predicates there exists an equivalent forward only output query this question is then positively answered for input and output queries of varying structural complexity using loglin and pspace reductions various existing applications based on the results of this work are reported including query optimization and streamed evaluation
achieving expressive and efficient content based routing in publish subscribe systems is difficult problem traditional approaches prove to be either inefficient or severely limited in their expressiveness and flexibility we present novel routing method based on bloom filters which shows high efficiency while simultaneously preserving the flexibility of content based schemes the resulting implementation is fast flexible and fully decoupled content based publish subscribe system
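the sketch below shows the basic bloom filter mechanics that such a routing scheme builds on encoding subscription attributes as set bits so that matching reduces to cheap bit tests it is a generic illustration under assumed parameters and not the routing protocol of the paper

# Rough Bloom filter sketch: encode the attribute values of a subscription
# as set bits so a broker can test a publication against many subscriptions
# with cheap bit operations (false positives cause only redundant
# forwarding, never lost events).
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=4):
        self.size, self.k, self.bits = size_bits, num_hashes, 0

    def _positions(self, item):
        # derive k independent positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

if __name__ == "__main__":
    subscription = BloomFilter()
    for attr in ("topic=stocks", "symbol=ACME"):
        subscription.add(attr)
    print(subscription.might_contain("topic=stocks"))   # True
    print(subscription.might_contain("topic=weather"))  # very likely False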
one counter automata are fundamental and widely studied class of infinite state systems in this paper we consider one counter automata with counter updates encoded in binary which we refer to as the succinct encoding it is easily seen that the reachability problem for this class of machines is in pspace and is np hard one of the main results of this paper is to show that this problem is in fact in np and is thus np complete we also consider parametric one counter automata in which counter updates may be integer valued parameters the reachability problem asks whether there are values for the parameters such that final state can be reached from an initial state our second main result shows decidability of the reachability problem for parametric one counter automata by reduction to existential presburger arithmetic with divisibility
this paper presents the realisation using service oriented architecture of an approach for dynamic flexible and extensible exception handling in workflows based not on proprietary frameworks but on accepted ideas of how people actually work the resultant service implements detailed taxonomy of workflow exception patterns to provide an extensible repertoire of self contained exception handling processes called exlets which may be applied at the task case or specification levels when an exception occurs at runtime an exlet is dynamically selected from the repertoire depending on the context of the exception and of the particular work instance both expected and unexpected exceptions are catered for in real time so that manual handling is avoided
the hardware and software in modern aircraft control systems are good candidates for verification using formal methods they are complex safety critical and challenge the capabilities of test based verification strategies we have previously reported on our use of model checking to verify the time partitioning property of the deos real time operating system for embedded avionics the size and complexity of this system have limited us to analyzing only one configuration at time to overcome this limit and generalize our analysis to arbitrary configurations we have turned to theorem proving this paper describes our use of the pvs theorem prover to analyze the deos scheduler in addition to our inductive proof of the time partitioning invariant we present a feature based technique for modeling state transition systems and formulating inductive invariants this technique facilitates an incremental approach to theorem proving that scales well to models of increasing complexity and has the potential to be applicable to wide range of problems
viewing data sampled on complicated geometry such as helix or torus is hard because single camera view can only encompass part of the object either multiple views or non linear projection can be used to expose more of the object in single view however specifying such views is challenging because of the large number of parameters involved we show that small set of versatile widgets can be used to quickly and simply specify wide variety of such views these widgets are built on top of general framework that in turn encapsulates variety of complicated camera placement issues into more natural set of parameters making the specification of new widgets or combining multiple widgets simpler this framework is entirely view based and leaves intact the underlying geometry of the dataset making it applicable to wide range of data types
in this paper we focus on passive measurements of tcp traffic we propose heuristic technique to classify tcp anomalies ie segments that have sequence number different from the expected one such as out of sequence and duplicate segments since tcp is closed loop protocol that infers network conditions from packet losses and reacts accordingly the possibility of carefully distinguishing the causes of anomalies in tcp traffic is very appealing and may be instrumental to understand tcp behavior in real environments we apply the proposed heuristic to traffic traces collected at both network edges and backbone links by comparing results obtained from traces collected over several years we observe some phenomena such as the impact of the introduction of tcp sack which reduces the unnecessary retransmissions the large percentage of network reordering etc by further studying the statistical properties of tcp anomalies we find that while their aggregate exhibits long range dependence anomalies suffered by individual long lived flows are on the contrary uncorrelated interestingly no dependence on the actual link load is observed
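a greatly simplified sketch of the per flow bookkeeping such a classification heuristic needs is shown below the real heuristic also uses timing information which is omitted here so the labels are only illustrative

# Simplified per-flow classifier: track the next expected sequence number
# and label segments that do not match it (timing would be needed to
# separate retransmissions from network reordering).
def classify(segments):
    """segments: list of (seq, length) in arrival order for one flow."""
    expected, seen, labels = None, set(), []
    for seq, length in segments:
        if expected is None or seq == expected:
            labels.append("in-sequence")
            expected = seq + length
        elif (seq, length) in seen:
            labels.append("duplicate")
        elif seq < expected:
            labels.append("retransmission-or-reordering")   # timing would disambiguate
        else:
            labels.append("out-of-sequence (hole)")
            expected = seq + length
        seen.add((seq, length))
    return labels

if __name__ == "__main__":
    flow = [(1, 100), (101, 100), (301, 100), (201, 100), (101, 100)]
    print(classify(flow))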
in addition to information text contains attitudinal and more specifically emotional content this paper explores the text based emotion prediction problem empirically using supervised machine learning with the snow learning architecture the goal is to classify the emotional affinity of sentences in the narrative domain of children’s fairy tales for subsequent usage in appropriate expressive rendering of text to speech synthesis initial experiments on preliminary data set of fairy tales show encouraging results over naive baseline and bow approach for classification of emotional versus non emotional contents with some dependency on parameter tuning we also discuss results for tripartite model which covers emotional valence as well as feature set alternations in addition we present plans for more cognitively sound sequential model taking into consideration larger set of basic emotions
surface reconstruction provides powerful paradigm for modeling shapes from samples for point cloud data with only geometric coordinates as input delaunay based surface reconstruction algorithms have been shown to be quite effective both in theory and practice however major complaint against delaunay based methods is that they are slow and cannot handle large data we extend the cocone algorithm to handle supersize data this is the first reported delaunay based surface reconstruction algorithm that can handle data containing more than million sample points on modest machine
software pipelining is loop scheduling technique that extracts parallelism out of loops by overlapping the execution of several consecutive iterations due to the overlapping of iterations schedules impose high register requirements during their execution schedule is valid if it requires at most the number of registers available in the target architecture if not its register requirements have to be reduced either by decreasing the iteration overlapping or by spilling registers to memory in this paper we describe set of heuristics to increase the quality of register constrained modulo schedules the heuristics decide between the two previous alternatives and define criteria for effectively selecting spilling candidates the heuristics proposed for reducing the register pressure can be applied to any software pipelining technique the proposals are evaluated using register conscious software pipeliner on workbench composed of large set of loops from the perfect club benchmark and set of processor configurations proposals in this paper are compared against previous proposal already described in the literature for one of these processor configurations and the set of loops that do not fit in the available registers speed up of and reduction of the memory traffic by factor of are achieved with an affordable increase in compilation time for all the loops this represents speed up of and reduction of the memory traffic by factor of
we study the problem of finding the least priced path lpp between source and destination in opportunistic spectrum access osa networks this problem is motivated by economic considerations whereby spectrum opportunities are sold leased to secondary radios srs this incurs communication cost eg for traffic relaying as the beneficiary of these services the end user must compensate the service providing srs for their spectrum cost to give an incentive ie profit for srs to report their true cost typically the payment to sr should be higher than the actual cost however from an end user’s perspective unnecessary overpayment should be avoided so we are interested in the optimal route selection and payment determination mechanism that minimizes the price tag of the selected route and at the same time guarantees truthful cost reports from srs this setup is in contrast to the conventional truthful least cost path lcp problem where the interest is to find the minimum cost route the lpp problem is investigated with and without capacity constraints at individual srs for both cases our algorithmic solutions can be executed in polynomial time the effectiveness of our algorithms in terms of price saving is verified through extensive simulations
reasoning about heap allocated data structures such as linked lists and arrays is challenging the reachability predicate has proved to be useful for reasoning about the heap in type safe languages where memory is manipulated by dereferencing object fields sound and precise analysis for such data structures becomes significantly more challenging in the presence of low level pointer manipulation that is prevalent in systems software in this paper we give novel formalization of the reachability predicate in the presence of internal pointers and pointer arithmetic we have designed an annotation language for programs that makes use of the new predicate this language enables us to specify properties of many interesting data structures present in the windows kernel we present preliminary experience with prototype verifier on set of illustrative benchmarks
summary generation for multiple documents poses number of issues including sentence selection sentence ordering and sentence reduction over single document summarization in addition the temporal resolution among extracted sentences is also important this article considers informative words and event words to deal with multidocument summarization these words indicate the important concepts and relationships in document or among set of documents and can be used to select salient sentences we present temporal resolution algorithm using focusing time and coreference chains to convert chinese temporal expressions in document into calendrical forms moreover we consider the last calendrical form of sentence as sentence time stamp to address sentence ordering informative words event words and temporal words are introduced to sentence reduction algorithm which deals with both length constraints and information coverage experiments on chinese news data sets show significant improvements of both information coverage and readability
we present data structure enabling efficient nearest neighbor nn retrieval for bregman divergences the family of bregman divergences includes many popular dissimilarity measures including kl divergence relative entropy mahalanobis distance and itakura saito divergence these divergences present challenge for efficient nn retrieval because they are not in general metrics for which most nn data structures are designed the data structure introduced in this work shares the same basic structure as the popular metric ball tree but employs convexity properties of bregman divergences in place of the triangle inequality experiments demonstrate speedups over brute force search of up to several orders of magnitude
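for concreteness the sketch below computes two of the named divergences and performs brute force nearest neighbor search the paper's contribution is precisely the index structure that prunes this brute force loop which is not reproduced here

# Brute-force Bregman nearest neighbor as a baseline (the ball-tree-like
# index in the paper accelerates exactly this search).
import numpy as np

def kl_divergence(p, q):
    """Relative entropy between discrete distributions p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def itakura_saito(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p / q - np.log(p / q) - 1.0))

def brute_force_nn(query, database, divergence=kl_divergence):
    """Return the index of the database point with smallest divergence from the query."""
    return min(range(len(database)), key=lambda i: divergence(query, database[i]))

if __name__ == "__main__":
    db = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.1, 0.8]), np.array([0.3, 0.4, 0.3])]
    q = np.array([0.6, 0.3, 0.1])
    print(brute_force_nn(q, db))  # index 0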
we introduce multi stage ensemble framework error driven generalist expert or edge for improved classification on large scale text categorization problems edge first trains generalist capable of classifying under all classes to deliver reasonably accurate initial category ranking given an instance edge then computes confusion graph for the generalist and allocates the learning resources to train experts on relatively small groups of classes that tend to be systematically confused with one another by the generalist the experts votes when invoked on given instance yield reranking of the classes thereby correcting the errors of the generalist our evaluations showcase the improved classification and ranking performance on several large scale text categorization datasets edge is in particular efficient when the underlying learners are efficient our study of confusion graphs is also of independent interest
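a small sketch of the first stage is given below building a confusion graph from the generalist's validation predictions and grouping classes connected by heavy confusion edges the grouping rule and threshold are illustrative assumptions and not the exact edge procedure

# Build a confusion graph from generalist predictions, then union classes
# connected by heavy confusion edges into groups for expert training
# (threshold and grouping rule are illustrative).
from collections import Counter, defaultdict

def confusion_pairs(true_labels, predicted_labels):
    return Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)

def confused_groups(pairs, min_count=2):
    """Union-find over classes connected by confusion edges with count >= min_count."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), count in pairs.items():
        if count >= min_count:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [g for g in groups.values() if len(g) > 1]

if __name__ == "__main__":
    y_true = ["cat", "cat", "dog", "dog", "car", "car", "cat"]
    y_pred = ["dog", "dog", "cat", "dog", "car", "bus", "cat"]
    print(confused_groups(confusion_pairs(y_true, y_pred)))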
in this paper we present the design and implementation of the xorbac component that provides flexible rbac service the xorbac implementation conforms to level of the unified nist model for rbac and can be reused for arbitrary applications on unix or windows with or tcl linkage xorbac runtime elements can be serialized and recreated from rdf data models conforming to well defined rdf schema furthermore we present our experiences with xorbac for the deployment within the http environment for web based mobile code system
in this paper we present method for rendering deformations as part of the programmable shader pipeline of contemporary graphical processing units in our method we allow general deformations including cuts previous approaches to deformation place the role of the gpu as general purpose processor for computing vertex displacement with the advent of vertex texture fetch in current gpus number of approaches have been proposed to integrate deformation into the rendering pipeline however the rendering of cuts cannot be easily programmed into vertex shader due to the inability to change the topology of the mesh furthermore rendering smooth deformed surfaces requires fine tessellation of the mesh in order to prevent self intersection and meshing artifacts for large deformations in our approach we overcome these problems by considering deformation as part of the pixel shader where transformation is performed on per pixel basis we demonstrate how this approach can be efficiently implemented using contemporary graphics hardware to obtain high quality rendering of deformation at interactive rates
embedded hard real time systems need reliable guarantees for the satisfaction of their timing constraints experience with the use of static timing analysis methods and the tools based on them in the automotive and the aeronautics industries is positive however both the precision of the results and the efficiency of the analysis methods are highly dependent on the predictability of the execution platform in fact the architecture determines whether static timing analysis is practically feasible at all and whether the most precise obtainable results are precise enough results contained in this paper also show that measurement based methods still used in industry are not useful for quite commonly used complex processors this dependence on the architectural development is of growing concern to the developers of timing analysis tools and their customers the developers in industry the problem reaches new level of severity with the advent of multicore architectures in the embedded domain this paper describes the architectural influence on static timing analysis and gives recommendations as to profitable and unacceptable architectural features
most of current researches on web page classification focus on leveraging heterogeneous features such as plain text hyperlinks and anchor texts in an effective and efficient way composite kernel method is one topic of interest among them it first selects bunch of initial kernels each of which is determined separately by certain type of features then classifier is trained based on linear combination of these kernels in this paper we propose an effective way to optimize the linear combination of kernels we proved that this problem is equivalent to solving generalized eigenvalue problem and the weight vector of the kernels is the eigenvector associated with the largest eigenvalue support vector machine svm classifier is then trained based on this optimized combination of kernels our experiment on the webkb dataset has shown the effectiveness of our proposed method
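the reduction can be illustrated with a generic generalized eigenvalue solve as below the matrices a and b are placeholders rather than the statistics derived in the paper only the solve and the choice of the eigenvector with the largest eigenvalue follow the described method

# Hedged illustration: solve A w = lambda B w and take the eigenvector of
# the largest eigenvalue as kernel-combination weights (A and B below are
# placeholder symmetric matrices, not the paper's actual statistics).
import numpy as np
from scipy.linalg import eigh

def top_generalized_eigenvector(A, B):
    eigvals, eigvecs = eigh(A, B)          # eigenvalues in ascending order
    w = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    return w / np.abs(w).sum()             # normalize the weights

if __name__ == "__main__":
    # three candidate kernels (e.g. text, anchor, link features)
    A = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])
    B = np.eye(3)
    print("kernel weights:", top_generalized_eigenvector(A, B))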
frequent pattern mining fpm has become one of the most popular data mining approaches for the analysis of purchasing patterns methods such as apriori and fp growth have been shown to work efficiently in this setting however these techniques are typically restricted to single concept level since typical business databases support hierarchies that represent the relationships amongst many different concept levels it is important that we extend our focus to discover frequent patterns in multi level environments unfortunately little attention has been paid to this research area in this paper we present two novel algorithms that efficiently discover multi level frequent patterns adopting either top down or bottom up approach our algorithms exploit existing fp tree structures rather than excessively scanning the raw data set multiple times as might be done with naive implementation in addition we also introduce an algorithm to mine cross level frequent patterns experimental results have shown that our new algorithms maintain their performance advantage across broad spectrum of test environments
there are many information objects and users in large company it is an important issue how to control user’s access in order that only authorized user can access information objects traditional access control models discretionary access control mandatory access control and role based access control do not properly reflect the characteristics of enterprise environment this paper proposes an improved access control model for enterprise environment the characteristics of access control in an enterprise environment are examined and task role based access control rbac model founded on concept of classification of tasks is introduced task is fundamental unit of business work or business activity rbac deals with each task differently according to its class and supports task level access control and supervision role hierarchy rbac is suitable access control model for industrial companies
large scale distributed data integration systems have to deal with important query processing costs which are essentially due to the high communication overload between data peers caching techniques can drastically reduce processing and communication cost we propose new distributed caching strategy that reduces redundant caching decisions of individual peers we estimate cache redundancy by distributed algorithm without additional messages our simulation experiments show that considering redundancy scores can drastically reduce distributed query execution costs
this paper describes seggen new algorithm for linear text segmentation on general corpuses it aims to segment texts into thematic homogeneous parts several existing methods have been used for this purpose based on sequential creation of boundaries here we propose to consider boundaries simultaneously thanks to genetic algorithm seggen uses two criteria maximization of the internal cohesion of the formed segments and minimization of the similarity of the adjacent segments first experimental results are promising and seggen appears to be very competitive compared with existing methods
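a simplified reading of the two fitness criteria is sketched below given a sentence similarity matrix and a candidate set of boundaries it scores internal cohesion and adjacent segment similarity the formulas are illustrative and not the authors exact objective

# Fitness components for a candidate segmentation: average within-segment
# similarity (to maximize) and average similarity of adjacent segments
# (to minimize); both are simplified illustrations.
import itertools
import numpy as np

def segments_from_boundaries(n_sentences, boundaries):
    """boundaries: sorted indices where a new segment starts (excluding 0)."""
    cuts = [0] + list(boundaries) + [n_sentences]
    return [range(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]

def internal_cohesion(sim, segs):
    scores = []
    for seg in segs:
        pairs = list(itertools.combinations(seg, 2))
        if pairs:
            scores.append(np.mean([sim[i, j] for i, j in pairs]))
    return float(np.mean(scores)) if scores else 0.0

def adjacent_similarity(sim, segs):
    scores = [np.mean([sim[i, j] for i in a for j in b]) for a, b in zip(segs, segs[1:])]
    return float(np.mean(scores)) if scores else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sim = rng.random((8, 8))
    sim = (sim + sim.T) / 2                    # toy symmetric similarity matrix
    segs = segments_from_boundaries(8, [3, 6])
    print(internal_cohesion(sim, segs), adjacent_similarity(sim, segs))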
the scope of telephony is significantly broadening providing users with variety of communication modes including presence status instant messaging and videoconferencing furthermore telephony is being increasingly combined with number of non telephony heterogeneous resources consisting of software entities such as web services and hardware entities such as location tracking devices this heterogeneity compounded with the intricacies of underlying technologies make the programming of new telephony applications daunting task this paper proposes an approach to supporting the development of advanced telephony applications we introduce declarative language to define the entities of target telephony application area this definition is passed to generator to produce java programming framework dedicated to the application area the generated frameworks provide service discovery and high level communication mechanisms these mechanisms are automatically mapped into sip making our approach compatible with existing sip infrastructures and entities our work has been validated on various advanced telephony applications
we present new decision procedure for detecting property violations in pushdown models for concurrent programs that use lock based synchronization where each thread’s lock operations are properly nested a la synchronized methods in java the technique detects violations expressed as indexed phase automata pas class of non deterministic finite automata in which the only loops are self loops our interest in pas stems from their ability to capture atomic set serializability violations atomic set serializability is relaxation of atomicity to only user specified set of memory locations we implemented the decision procedure and applied it to detecting atomic set serializability violations in models of concurrent java programs compared with prior method based on semi decision procedure not only was the decision procedure faster overall but the semi decision procedure timed out on about of the queries versus for the decision procedure
the first osr issue read was volume no april at the time was actively working on the spin extensible operating system and looking for new ways to safely expose interrupt handling to user provided code loaded into the kernel the article by kleiman and eykholt on interrupts as threads in that issue of osr was highly relevant and informative ever since then had planned on contributing to osr but unfortunately never got around to it for this reason when jeanna matthews asked whether could serve as the guest editor of the inaugural special topics osr issue jumped at the opportunity
in this paper we take the idea of application level processing on disks to one level further and focus on an architecture called cluster of active disks cad where the storage system contains network of parallel active disks each individual active disk which includes an embedded processor disk caches memory and interconnect can perform some application level processing but more importantly the active disks can collectively perform parallel input output and processing thereby reducing not just the communication latency but latency and computation time as well the cad architecture poses many challenges for the next generation software systems at all levels including programming models operating and runtime systems application mapping compilation parallelization and performance modeling and evaluation in this paper we focus exclusively on code scheduling support required for clusters of active disks more specifically we address the problem of code scheduling with the goal of minimizing the power consumption on the disk system our experiments indicate that the proposed scheduling approach is very successful in reducing power and generates better results than three other alternate scheduling schemes tested
with semiconductor technology advancing toward deep submicron leakage energy is of increasing concern especially for large on chip array structures such as caches and branch predictors recent work has suggested that larger aggressive branch predictors can and should be used in order to improve microprocessor performance further consideration is that more aggressive branch predictors especially multiported predictors for multiple branch prediction may be thermal hot spots thus further increasing leakage moreover as the branch predictor holds state that is transient and predictive elements can be discarded without adverse effect for these reasons it is natural to consider applying decay techniques already shown to reduce leakage energy for caches to branch prediction structures due to the structural difference between caches and branch predictors applying decay techniques to branch predictors is not straightforward this paper explores the strategies for exploiting spatial and temporal locality to make decay effective for bimodal gshare and hybrid predictors as well as the branch target buffer btb furthermore the predictive behavior of branch predictors steers them towards decay based not on state preserving static storage cells but rather quasi static dynamic storage cells this paper will examine the results of implementing decaying branch predictor structures with dynamic appropriately decaying cells rather than the standard static sram cell overall this paper demonstrates that decay techniques can apply to more than just caches with the branch predictor and btb as an example we show decay can either be implemented at the architectural level or with wholesale replacement of static storage cells with quasi static storage cells which naturally implement decay more importantly decay techniques can be applied and should be applied to other such transient and or predictive structures
in this paper we propose an interactive color natural image segmentation method the method integrates color feature with multiscale nonlinear structure tensor texture msnst feature and then uses grabcut method to obtain the segmentations the msnst feature is used to describe the texture feature of an image and integrated into grabcut framework to overcome the problem of the scale difference of textured images in addition we extend the gaussian mixture model gmm to msnst feature and gmm based on msnst is constructed to describe the energy function so that the texture feature can be suitably integrated into grabcut framework and fused with the color feature to achieve the more superior image segmentation performance than the original grabcut method for easier implementation and more efficient computation the symmetric kl divergence is chosen to produce the estimates of the tensor statistics instead of the riemannian structure of the space of tensor the conjugate norm was employed using locality preserving projections lpp technique as the distance measure in the color space for more discriminating power an adaptive fusing strategy is presented to effectively adjust the mixing factor so that the color and msnst texture features are efficiently integrated to achieve more robust segmentation performance last an iteration convergence criterion is proposed to reduce the time of the iteration of grabcut algorithm dramatically with satisfied segmentation accuracy experiments using synthesis texture images and real natural scene images demonstrate the superior performance of our proposed method
this paper describes new advance in solving cross lingual question answering cl qa tasks it is built on three main pillars the use of several multilingual knowledge resources to reference words between languages the inter lingual index ili module of eurowordnet and the multilingual knowledge encoded in wikipedia ii the consideration of more than only one translation per word in order to search candidate answers and iii the analysis of the question in the original language without any translation process this novel approach overcomes the errors caused by the common use of machine translation mt services by cl qa systems we also expose some studies and experiments that justify the importance of analyzing whether named entity should be translated or not experimental results in bilingual scenarios show that our approach performs better than an mt based cl qa approach achieving an average improvement of
spanner is graph on set of points with the following property between any pair of points there is path in the spanner whose total length is at most 1 + ε times the actual distance between the points in this paper we consider points residing in metric space equipped with doubling dimension λ and show how to construct dynamic spanner whose degree and update time are bounded in terms of ε λ and a logarithmic factor when ε and λ are taken as constants the degree and update times are optimal
we present liverac visualization system that supports the analysis of large collections of system management time series data consisting of hundreds of parameters across thousands of network devices liverac provides high information density using reorderable matrix of charts with semantic zooming adapting each chart’s visual representation to the available space liverac allows side by side visual comparison of arbitrary groupings of devices and parameters at multiple levels of detail staged design and development process culminated in the deployment of liverac in production environment we conducted an informal longitudinal evaluation of liverac to better understand which proposed visualization techniques were most useful in the target environment
major difference between face to face interaction and computer mediated communication is how contact negotiation the way in which people start and end conversations is managed contact negotiation is especially problematic for distributed group members who are separated by distance and thus do not share many of the cues needed to help mediate interaction an understanding of what resources and cues people use to negotiate making contact when face to face identifies ways to design support for contact negotiation in new technology to support remote collaboration this perspective is used to analyze the design and use experiences with three communication prototypes desktop conferencing prototype montage and awarenex these prototypes use text video and graphic indicators to share the cues needed to gracefully start and end conversations experiences with using these prototypes focused on how these designs support the interactional commitment of the participants when they have to commit their attention to an interaction and how flexibly that can be negotiated reviewing what we learned from these research experiences identifies directions for future research in supporting contact negotiation in computer mediated communication
this paper presents contribution to word spotting applied for digitized syriac manuscripts the syriac language was wrongfully accused of being dead language and has been set aside by the domain of handwriting recognition yet it is very fascinating handwriting that combines the word structure and calligraphy of the arabic handwriting with the particularity of being intentionally written tilted by an angle of approximately for the spotting process we developed method that should find all occurrences of certain query word image based on selective sliding window technique from which we extract directional features and afterwards perform matching using euclidean distance correspondence between features the proposed method does not require any prior information and does not depend on word to character segmentation algorithm which would be extremely complex to realize due to the tilted nature of the handwriting
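As a rough illustration of the matching step described above, the sketch below pairs a fixed-step sliding window with a gradient-orientation histogram standing in for the "directional features" and ranks window positions by Euclidean distance to the query word image. The feature definition, window step and threshold are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def directional_features(window, bins=8):
    """Histogram of gradient orientations, weighted by magnitude
    (an illustrative stand-in for the paper's directional features)."""
    gy, gx = np.gradient(window.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                 # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def spot_word(page, query, step=4, threshold=0.5):
    """Slide a query-sized window over the page image and keep locations whose
    feature vector is close (Euclidean distance) to the query's."""
    qh, qw = query.shape
    qf = directional_features(query)
    hits = []
    for y in range(0, page.shape[0] - qh + 1, step):
        for x in range(0, page.shape[1] - qw + 1, step):
            f = directional_features(page[y:y + qh, x:x + qw])
            d = float(np.linalg.norm(f - qf))
            if d < threshold:
                hits.append((y, x, d))
    return sorted(hits, key=lambda h: h[2])          # best matches first
```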
evaluation and applicability of many database techniques ranging from access methods histograms and optimization strategies to data normalization and mining crucially depend on their ability to cope with varying data distributions in robust way however comprehensive real data is often hard to come by and there is no flexible data generation framework capable of modelling varying rich data distributions this has led individual researchers to develop their own ad hoc data generators for specific tasks as consequence the resulting data distributions and query workloads are often hard to reproduce analyze and modify thus preventing their wider usage in this paper we present flexible easy to use and scalable framework for database generation we then discuss how to map several proposed synthetic distributions to our framework and report preliminary results
networks of sensors are used in many different fields from industrial applications to surveillance applications common feature of these applications is the necessity of monitoring infrastructure that analyzes large number of data streams and outputs values that satisfy certain constraints in this paper we present query processor for monitoring queries in network of sensors with prediction functions sensors communicate their values according to threshold policy and the proposed query processor leverages prediction functions to compare tuples efficiently and to generate answers even in the absence of new incoming tuples two types of constraints are managed by the query processor window join constraints and value constraints uncertainty issues are considered to assign probabilistic values to the results returned to the user moreover we have developed an appropriate buffer management strategy that takes into account the contributions of the prediction functions contained in the tuples we also present some experimental results that show the benefits of the proposal
one of the uses of social tagging is to associate freely selected terms tags to resources for sharing resources among tag consumers this enables tag consumers to locate new resources through the collective intelligence of other tag creators and offers new avenue for resource discovery this paper investigates the effectiveness of tags as resource descriptors determined through the use of text categorisation using support vector machines two text categorisation experiments were done for this research and tags and web pages from delicious were used the first study concentrated on the use of terms as its features the second study used both terms and its tags as part of its feature set the results indicate that the tags were not always reliable indicators of the resource contents at the same time the results from the terms only experiment were better compared to the experiment with terms and tags deeper analysis of sample of tags and documents were also conducted and implications of this research are discussed
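A minimal sketch of the two experimental conditions described above (terms only versus terms plus tags), assuming scikit-learn is available; the variable names `pages`, `tags` and `labels` are placeholders, and the authors' actual feature engineering and evaluation protocol may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def categorisation_score(texts, labels):
    """Mean micro-averaged F1 of a linear SVM over cross-validation folds."""
    features = TfidfVectorizer().fit_transform(texts)
    return cross_val_score(LinearSVC(), features, labels,
                           cv=5, scoring="f1_micro").mean()

def compare(pages, tags, labels):
    """pages[i]: page text, tags[i]: space-separated tags, labels[i]: category."""
    terms_only = categorisation_score(pages, labels)
    terms_plus_tags = categorisation_score(
        [p + " " + t for p, t in zip(pages, tags)], labels)
    return terms_only, terms_plus_tags
```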
we present physics based simulation method for animating sand to allow for efficiently scaling up to large volumes of sand we abstract away the individual grains and think of the sand as continuum in particular we show that an existing water simulator can be turned into sand simulator with only few small additions to account for inter grain and boundary frictionwe also propose an alternative method for simulating fluids our core representation is cloud of particles which allows for accurate and flexible surface tracking and advection but we use an auxiliary grid to efficiently enforce boundary conditions and incompressibility we further address the issue of reconstructing surface from particle data to render each frame
multivalued dependencies mvds are an important class of relational constraints that is fundamental to relational database design reflexivity axiom complementation rule and pseudo transitivity rule form minimal set of inference rules for the implication of mvds the complementation rule plays distinctive role as it takes into account the underlying relation schema which the mvds are defined on the axiom is much weaker than the complementation rule but is sufficient to form minimal set of inference rules together with augmentation and pseudo difference rule fagin has asked whether it is possible to reduce the power of the complementation rule and drop the augmentation rule at the same time and still obtain complete set it was argued that there is trade off between complementation rule and augmentation rule and one can only dispense with one of these rules at the same time it is shown in this paper that an affirmative answer to fagin’s problem can nevertheless be achieved in fact it is proven that axiom together with weaker form of the reflexivity axiom pseudo transitivity rule and exactly one of union intersection or difference rule form such desirable minimal sets the positive solution to this problem gives further insight into the difference between the notions of functional and multivalued dependencies
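For orientation, the standard MVD inference rules as they are usually stated in the literature are reproduced below; the paper's minimal sets (including the weakened reflexivity axiom and the pseudo-difference rule) are variations built from rules of this kind rather than exactly this list.

```latex
% Standard inference rules for MVDs over a relation schema R.
\begin{align*}
\text{(reflexivity)}         &\quad Y \subseteq X \;\Rightarrow\; X \twoheadrightarrow Y\\
\text{(complementation)}     &\quad X \twoheadrightarrow Y \;\Rightarrow\; X \twoheadrightarrow R \setminus (X \cup Y)\\
\text{(augmentation)}        &\quad X \twoheadrightarrow Y,\; Z \subseteq W \;\Rightarrow\; XW \twoheadrightarrow YZ\\
\text{(pseudo-transitivity)} &\quad X \twoheadrightarrow Y,\; YW \twoheadrightarrow Z \;\Rightarrow\; XW \twoheadrightarrow Z \setminus (YW)\\
\text{(union)}               &\quad X \twoheadrightarrow Y,\; X \twoheadrightarrow Z \;\Rightarrow\; X \twoheadrightarrow YZ\\
\text{(intersection)}        &\quad X \twoheadrightarrow Y,\; X \twoheadrightarrow Z \;\Rightarrow\; X \twoheadrightarrow Y \cap Z\\
\text{(difference)}          &\quad X \twoheadrightarrow Y,\; X \twoheadrightarrow Z \;\Rightarrow\; X \twoheadrightarrow Y \setminus Z
\end{align*}
```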
recognition in uncontrolled situations is one of the most important bottlenecks for practical face recognition systems we address this by combining the strengths of robust illumination normalization local texture based face representations and distance transform based matching metrics specifically we make three main contributions we present simple and efficient preprocessing chain that eliminates most of the effects of changing illumination while still preserving the essential appearance details that are needed for recognition ii we introduce local ternary patterns ltp generalization of the local binary pattern lbp local texture descriptor that is more discriminant and less sensitive to noise in uniform regions and iii we show that replacing local histogramming with local distance transform based similarity metric further improves the performance of lbp ltp based face recognition the resulting method gives state of the art performance on three popular datasets chosen to test recognition under difficult illumination conditions face recognition grand challenge version experiment extended yale and cmu pie
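A minimal sketch of the local ternary pattern operator mentioned above: each 3x3 neighbour is compared to the centre pixel with a tolerance t, and the resulting ternary code is split into the usual positive and negative binary maps. This follows the generic LTP construction; the paper's preprocessing chain and histogramming are not reproduced.

```python
import numpy as np

def local_ternary_pattern(img, t=5):
    """Return the positive and negative LTP code maps of a grayscale image."""
    img = img.astype(int)
    c = img[1:-1, 1:-1]                              # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]     # 8 neighbours, clockwise
    pos = np.zeros_like(c)
    neg = np.zeros_like(c)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        pos += (nb >= c + t).astype(int) << bit      # neighbour clearly brighter
        neg += (nb <= c - t).astype(int) << bit      # neighbour clearly darker
    return pos, neg
```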
we present an approach for image retrieval using very large number of highly selective features and efficient learning of queries our approach is predicated on the assumption that each image is generated by sparse set of visual ldquo causes rdquo and that images which are visually similar share causes we propose mechanism for computing very large number of highly selective features which capture some aspects of this causal structure in our implementation there are over highly selective features at query time user selects few example images and the adaboost algorithm is used to learn classification function which depends on small number of the most appropriate features this yields highly efficient classification function in addition we show that the adaboost framework provides natural mechanism for the incorporation of relevance feedback finally we show results on wide variety of image queries
the acyclicity degree of relational database scheme is an interesting topic due to several desirable properties of the corresponding database in this paper simple and homogeneous way to characterize the acyclicity degree of database scheme is given the method is based on the existence in all acyclic database schemes of nonempty set of relation schemes that satisfy pruning predicate which is property similar to the property satisfied by the leaves in an ordinary tree this fact implies that such relation schemes may be eliminated using recursive pruning algorithm in order to determine the acyclicity degree furthermore if we use an incremental step by step design methodology enriching the scheme one relation at time the pruning predicate suggests set of properties that have to be verified by the new relation scheme in order to preserve the acyclicity degree of the database scheme
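The classical GYO-style ear-removal reduction below is in the same spirit as the pruning predicate described above: relation schemes that behave like leaves are repeatedly eliminated, and the scheme set is acyclic exactly when everything prunes away. It is shown here only as an analogue, not as the paper's own predicate or its treatment of different acyclicity degrees.

```python
def gyo_acyclic(schemes):
    """GYO-style reduction: repeatedly remove 'ears', i.e. relation schemes whose
    attributes shared with the rest are covered by a single other scheme."""
    schemes = [set(s) for s in schemes]
    changed = True
    while changed and len(schemes) > 1:
        changed = False
        for i, r in enumerate(schemes):
            others = schemes[:i] + schemes[i + 1:]
            shared = {a for a in r if any(a in o for o in others)}
            if any(shared <= o for o in others):     # r is an ear: prune it
                schemes.pop(i)
                changed = True
                break
    return len(schemes) <= 1

# gyo_acyclic([{"a", "b"}, {"b", "c"}, {"c", "d"}])  -> True
# gyo_acyclic([{"a", "b"}, {"b", "c"}, {"c", "a"}])  -> False (a 3-cycle)
```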
association rule mining has made many achievements in the area of knowledge discovery however the quality of the extracted association rules is big concern one problem with the quality of the extracted association rules is the huge size of the extracted rule set as matter of fact very often tens of thousands of association rules are extracted among which many are redundant thus useless mining non redundant rules is promising approach to solve this problem the min max exact basis proposed by pasquier et al pasquier has showed exciting results by generating only non redundant rules in this paper we first propose relaxing definition for redundancy under which the min max exact basis still contains redundant rules then we propose condensed representation called reliable exact basis for exact association rules the rules in the reliable exact basis are not only non redundant but also more succinct than the rules in min max exact basis we prove that the redundancy eliminated by the reliable exact basis does not reduce the belief to the reliable exact basis the size of the reliable exact basis is much smaller than that of the min max exact basis moreover we prove that all exact association rules can be deduced from the reliable exact basis therefore the reliable exact basis is lossless representation of exact association rules experimental results show that the reliable exact basis significantly reduces the number of non redundant rules
we have been interested in developing an otoneurological decision support system that supports diagnostics of vertigo diseases in this study we concentrate on testing its inference mechanism and knowledge discovery method knowledge is presented as patterns of classes each pattern includes attributes with weight and fitness values concerning the class with the knowledge discovery method it is possible to form fitness values from data knowledge formation is based on frequency distributions of attributes knowledge formed by the knowledge discovery method is tested with two vertigo data sets and compared to experts knowledge the experts and machine learnt knowledge are also combined in various ways in order to examine effects of weights on classification accuracy the classification accuracy of knowledge discovery method is compared to nearest neighbour methods and naive bayes classifier the results showed that knowledge bases combining machine learnt knowledge with the experts knowledge yielded the best classification accuracies further attribute weighting had an important effect on the classification capability of the system when considering different diseases in the used data sets the performance of the knowledge discovery method and the inference method is comparable to other methods employed in this study
we present graphical semantics for the pi calculus that is easier to visualize and better suited to expressing causality and temporal properties than conventional relational semantics pi chart is finite directed acyclic graph recording computation in the pi calculus each node represents process and each edge either represents computation step or message passing interaction pi charts enjoy natural pictorial representation akin to message sequence charts in which vertical edges represent control flow and horizontal edges represent data flow based on message passing pi chart represents single computation starting from its top the nodes with no ancestors to its bottom the nodes with no descendants unlike conventional reductions or transitions the edges in pi chart induce ancestry and other causal relations on processes we give both compositional and operational definitions of pi charts and illustrate the additional expressivity afforded by the chart semantics via series of examples
despite the fact that explicit congestion notification ecn demonstrated clear potential to substantially improve network performance recent network measurements reveal an extremely poor usage of this option in today’s internet in this paper we analyze the roots of this phenomenon and develop set of novel incentives to encourage network providers end hosts and web servers to apply ecn initially we examine fundamental drawback of the current ecn specification and demonstrate that the absence of ecn indications in tcp control packets can dramatically hinder system performance while security reasons primarily prevent the usage of ecn bits in tcp syn packets we show that applying ecn to tcp syn ack packets can significantly improve system performance without introducing any novel security or stability side effects our network experiments on cluster of web servers show dramatic performance improvement over the existing ecn specification throughput increases by more than while the average web response time simultaneously decreases by nearly an order of magnitude in light of the above finding using large scale simulations modeling and network experiments we re investigate the relevance of ecn and provide set of practical recommendations and insights ecn systematically improves the performance of all investigated aqm schemes contrary to common belief this particularly holds for red ii the impact of ecn is highest for web only traffic mixes such that even generic aqm algorithm with ecn support outperforms all non ecn enabled aqm schemes that we investigated iii primarily due to moderate queuing levels the superiority of ecn over other aqm mechanisms largely holds for high speed backbone routers even in more general traffic scenarios iv end hosts that apply ecn can exercise the above performance benefits instantly without waiting for the entire internet community to support the option
effective use of communication networks is critical to the performance and scalability of parallel applications partitioned global address space languages like upc bring the promise of performance and programmer productivity studies of well tuned programs have suggested that pgas languages are effective at utilizing modern networks because their one sided communication is good match to the underlying network hardware an open question is whether the manual optimizations required to achieve good performance can be performed automatically by the compiler in performance portable manner in this paper we present compiler and runtime optimization framework for loops containing communication operations our framework performs compile time message vectorization and strip mining and defers until runtime the selection of the actual communication operations at runtime the communication requirements of the program are analyzed and communication is instantiated and scheduled based on highly tuned network and application performance models the runtime analysis takes into account network flow control and quality of service restrictions and it is able to select from large class of available communication primitives the communication schedule best suited for the dynamic combination of input size and system parameters the results indicate that our framework produces code that scales and performs better than that of manually optimized implementations our approach not only improves performance but increases programmer productivity as well
reusing available software components in developing new systems is always priority as it usually saves considerable amount of time money and human effort since it might not always be possible to find single component that provides the sought functionality an ideal scenario for software reuse would be to build new software system by composing existing components based on their behavioral properties in this paper we take advantage of logical reasoning to find solution for automatic composition of stateless components stateless components are components with simple two step behavior they receive all their inputs at the same time and then return the corresponding outputs also at the same time we provide concrete algorithms to find possible component compositions for requested behavior we then validate the returned compositions using composition algebraic rules composition algebra is minimal process algebra that is specifically designed for this validation in order to understand the functionality of the proposed approach in realistic situations we also study some of the experimental results obtained by implementing the algorithm and running it on some test cases
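A toy forward-chaining sketch of the composition idea: starting from the available inputs, any stateless component whose inputs are covered is fired until the requested outputs are produced. The Component record and the greedy search are illustrative assumptions; the paper's logical-reasoning-based algorithms and the composition-algebra validation are not modelled here.

```python
from collections import namedtuple

Component = namedtuple("Component", "name inputs outputs")

def compose(components, available, goal):
    """Return one possible composition (list of component names) or None."""
    available, goal = set(available), set(goal)
    plan = []
    progress = True
    while progress and not goal <= available:
        progress = False
        for c in components:
            if c.name not in plan and set(c.inputs) <= available:
                available |= set(c.outputs)          # fire the component
                plan.append(c.name)
                progress = True
    return plan if goal <= available else None

# compose([Component("geocode", ["address"], ["lat", "lon"]),
#          Component("weather", ["lat", "lon"], ["forecast"])],
#         available=["address"], goal=["forecast"])   -> ["geocode", "weather"]
```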
research on multimedia information retrieval mir has recently witnessed booming interest prominent feature of this research trend is its simultaneous but independent materialization within several fields of computer science the resulting richness of paradigms methods and systems may on the long run result in fragmentation of efforts and slow down progress the primary goal of this study is to promote an integration of methods and techniques for mir by contributing conceptual model that encompasses in unified and coherent perspective the many efforts that are being produced under the label of mir the model offers retrieval capability that spans two media text and images but also several dimensions form content and structure in this way it reconciles similarity based methods with semantics based ones providing the guidelines for the design of systems that are able to provide generalized multimedia retrieval service in which the existing forms of retrieval not only coexist but can be combined in any desired manner the model is formulated in terms of fuzzy description logic which plays twofold role it directly models semantics based retrieval and it offers an ideal framework for the integration of the multimedia and multidimensional aspects of retrieval mentioned above the model also accounts for relevance feedback in both text and image retrieval integrating known techniques for taking into account user judgments the implementation of the model is addressed by presenting decomposition technique that reduces query evaluation to the processing of simpler requests each of which can be solved by means of widely known methods for text and image retrieval and semantic processing prototype for multidimensional image retrieval is presented that shows this decomposition technique at work in significant case
curve skeletons are thinned representations of objects useful for many visualization tasks including virtual navigation reduced model formulation visualization improvement animation etc there are many algorithms in the literature describing extraction methodologies for different applications however it is unclear how general and robust they are in this paper we provide an overview of many curve skeleton applications and compile set of desired properties of such representations we also give taxonomy of methods and analyze the advantages and drawbacks of each class of algorithms
artificial intelligence is playing an increasingly important role in network management in particular research in the area of intrusion detection relies extensively on ai techniques to design implement and enhance security monitoring systems this chapter discusses ways in which intrusion detection uses or could use ai some general ideas are presented and some actual systems are discussed the focus is mainly on knowledge representation machine learning and multi agent architectures
trajectory based tasks are common in many applications and have been widely studied recently researchers have shown that even very simple tasks such as selecting items from cascading menus can benefit from haptic force guidance haptic guidance is also of significant value in many applications such as medical training handwriting learning and in applications requiring precise manipulations there are however only very few guiding principles for selecting parameters that are best suited for proper force guiding in this paper we present model derived from the steering law that relates movement time to the essential components of tunneling task in the presence of haptic force guidance results of an experiment show that our model is highly accurate for predicting performance times in force enhanced tunneling tasks
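The presumed starting point of the model is the classical steering law, reproduced below; a and b are empirically fitted constants, C the tunnel path and W(s) the tunnel width at position s. The additional terms the paper introduces for the haptic force-guidance parameters are not reproduced here.

```latex
% Classical steering law (Accot & Zhai) for movement time T through a tunnel C.
T = a + b \int_{C} \frac{ds}{W(s)},
\qquad\text{which for a straight tunnel of length } A
\text{ and constant width } W \text{ reduces to } T = a + b\,\frac{A}{W}.
```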
manufacturers are focusing on multiprocessor system on chip mpsoc architectures in order to provide increased concurrency rather than increased clock speed for both large scale as well as embedded systems traditionally lock based synchronization is provided to support concurrency however managing locks can be very difficult and error prone in addition the performance and power cost of lock based synchronization can be high transactional memories have been extensively investigated as an alternative to lock based synchronization in general purpose systems it has been shown that transactional memory has advantages over locks in terms of ease of programming performance and energy consumption however their applicability to embedded multi core platforms has not been explored yet in this paper we demonstrate complete hardware transactional memory solution for an embedded multi core architecture consisting of cache coherent arm based cluster similar to arm’s mpcore using cycle accurate power and performance models for the transactional memory hardware we evaluate our architectural framework over set of different system and application settings and show that transactional memory is promising solution even for resource constrained embedded multiprocessors
on the web browsing and searching categories is popular method of finding documents two well known category based search systems are the yahoo and dmoz hierarchies which are maintained by experts who assign documents to categories however manual categorisation by experts is costly subjective and not scalable with the increasing volumes of data that must be processed several methods have been investigated for effective automatic text categorisation these include selection of categorisation methods selection of pre categorised training samples use of hierarchies and selection of document fragments or features in this paper we further investigate categorisation into web hierarchies and the role of hierarchical information in improving categorisation effectiveness we introduce new strategies to reduce errors in hierarchical categorisation in particular we propose novel techniques that shift the assignment into higher level categories when lower level assignment is uncertain our results show that absolute error rates can be reduced by over
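One plausible reading of the shift-up strategy is sketched below: commit to the best leaf category only when the classifier's confidence clears a threshold, and otherwise back off to the parent category. The threshold rule, the dictionaries and the example hierarchy are illustrative assumptions rather than the paper's exact criterion.

```python
def assign_category(leaf_probs, parent_of, threshold=0.6):
    """Shift-up assignment for hierarchical categorisation.

    leaf_probs: dict category -> estimated probability for a document
    parent_of:  dict category -> parent category in the web hierarchy
    """
    best = max(leaf_probs, key=leaf_probs.get)
    if leaf_probs[best] >= threshold:
        return best                                  # confident: keep the leaf
    return parent_of.get(best, best)                 # uncertain: shift one level up

# example (hypothetical hierarchy):
# assign_category({"python": 0.41, "java": 0.38},
#                 {"python": "programming", "java": "programming"})
#   -> "programming"
```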
texture transfer algorithm modifies the target image replacing the high frequency information with the example source image previous texture transfer techniques normally use such factors as color distance and standard deviation for selecting the best texture from the candidate sets these factors are useful for expressing texture effect of the example source in the target image but are less than optimal for considering the object shape of the target image in this paper we propose novel texture transfer algorithm to express the directional effect based on the flow of the target image for this we use directional factor that considers the gradient direction of the target image we add an additional energy term that respects the image gradient to the previous fast texture transfer algorithm additionally we propose method for estimating the directional factor weight value from the target image we have tested our algorithm with various target images our algorithm can express result image with the feature of the example source texture and the flow of the target image
multidimensional lightcuts is new scalable method for efficiently rendering rich visual effects such as motion blur participating media depth of field and spatial anti aliasing in complex scenes it introduces flexible general rendering framework that unifies the handling of such effects by discretizing the integrals into large sets of gather and light points and adaptively approximating the sum of all possible gather light pair interactions we create an implicit hierarchy the product graph over the gather light pairs to rapidly and accurately approximate the contribution from hundreds of millions of pairs per pixel while only evaluating tiny fraction eg we build upon the techniques of the prior lightcuts method for complex illumination at point however by considering the complete pixel integrals we achieve much greater efficiency and scalability our example results demonstrate efficient handling of volume scattering camera focus and motion of lights cameras and geometry for example enabling high quality motion blur with temporal sampling requires only increase in shading cost in scene with complex moving geometry materials and illumination
ambient intelligent systems provide an unexplored hardware platform for executing distributed applications under strict energy constraints these systems must respond quickly to changes in user behavior or environmental conditions and must provide high availability and fault tolerance under given quality constraints these systems will necessitate fault tolerance to be built into applications one way to provide such fault tolerance is to employ the use of redundancy hundreds of computational devices will be available in deeply networked ambient intelligent systems providing opportunities to exploit node redundancy to increase application lifetime or improve quality of results if it drops below threshold pre copying with remote execution is proposed as novel alternative technique of code migration to enhance system lifetime for ambient intelligent systems self management of the system is considered in two different scenarios applications that tolerate graceful quality degradation and applications with single point failures the proposed technique can be part of design methodology for prolonging the lifetime of a wide range of applications under various types of faults despite scarce energy resources
we describe an approach to the analysis of protocols for wireless sensor networks in scenarios with mobile nodes and dynamic link quality the approach is based on the theorem proving system pvs and can be used for formal specification automated simulation and verification of the behaviour of the protocol in order to demonstrate the applicability of the approach we analyse the reverse path forwarding algorithm which is the basic technique used for diffusion protocols for wireless sensor networks
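As a reference point for the analysed algorithm, the core reverse path forwarding rule can be written in a few lines: a flooded packet is re-broadcast only when it arrives from the neighbour on the node's own route back to the source. The routing-table shape (next_hop_to) is an assumption for illustration; the paper's PVS model is of course far more detailed.

```python
def rpf_forward(packet_source, arrival_neighbor, next_hop_to, neighbors):
    """Reverse path forwarding decision at one node.

    next_hop_to: dict destination -> neighbor used to reach it (assumed routing state)
    Returns the list of neighbors to forward to (empty list means drop).
    """
    if arrival_neighbor != next_hop_to.get(packet_source):
        return []                                    # not on the reverse path: drop
    return [n for n in neighbors if n != arrival_neighbor]
```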
we consider the problem of admission control in resource sharing systems such as web servers and transaction processing systems when the job size distribution has high variability with the aim of minimizing the mean response time it is well known that in such resource sharing systems as the number of tasks concurrently sharing the resource is increased the server throughput initially increases due to more efficient utilization of resources but starts falling beyond certain point due to resource contention and thrashing most admission control mechanisms solve this problem by imposing fixed upper bound on the number of concurrent transactions allowed into the system called the multi programming limit mpl and making the arrivals which find the server full queue up almost always the mpl is chosen to be the point that maximizes server efficiency in this paper we abstract such resource sharing systems as processor sharing ps server with state dependent service rate and first come first served fcfs queue and we analyze the performance of this model from queueing theoretic perspective we start by showing that counter to the common wisdom the peak efficiency point is not always optimal for minimizing the mean response time instead significant performance gains can be obtained by running the system at less than the peak efficiency we provide simple expression for the static mpl that achieves near optimal mean response time for general distributions next we present two traffic oblivious dynamic admission control policies that adjust the mpl based on the instantaneous queue length while also taking into account the variability of the job size distribution the structure of our admission control policies is mixture of fluid control when the number of jobs in the system is high with stochastic component when the system is near empty we show via simulations that our dynamic policies are much more robust to unknown traffic intensities and burstiness in the arrival process than imposing static mpl
people dynamically structure social interactions and activities at various locations in their environments in specialized types of places such as the office home coffee shop museum and school they also imbue various locations with personal meaning creating group hangouts and personally meaningful places mobile location aware community systems can potentially utilize the existence of such places to support the management of social information and interaction however acting effectively on this potential requires an understanding of how places and place types relate to people’s desire for place related awareness of and communication with others and what information people are willing to provide about themselves to enable place related communication and awareness we present here the findings from two qualitative studies survey of individuals in new york and study of how mobility traces can be used to find people’s important places in an exploration of these questions these studies highlight how people value and are willing to routinely provide information such as ratings comments event records relevant to place and when appropriate their location to enable services they also suggest how place and place type data could be used in conjunction with other information regarding people and places so that systems can be deployed that respect users people to people to places data sharing preferences we conclude with discussion on how place data can best be utilized to enable services when the systems in question are supported by sophisticated computerized user community social geographical model
most of today’s authentication schemes involve verifying the identity of principal in some way this process is commonly known as entity authentication in emerging ubiquitous computing paradigms which are highly dynamic and mobile in nature entity authentication may not be sufficient or even appropriate especially if principal’s privacy is to be protected in order to preserve privacy other attributes eg location or trustworthiness of the principal may need to be authenticated to verifier in this paper we propose ninja non identity based authentication scheme for mobile ubiquitous environment in which the trustworthiness of user’s device is authenticated anonymously to remote service provider verifier during the service discovery process we show how this can be achieved using trusted computing functionality
generating high quality gene clusters and identifying the underlying biological mechanism of the gene clusters are the important goals of clustering gene expression analysis to get high quality cluster results most of the current approaches rely on choosing the best cluster algorithm in which the design biases and assumptions meet the underlying distribution of the dataset there are two issues for this approach usually the underlying data distribution of the gene expression datasets is unknown and there are so many clustering algorithms available and it is very challenging to choose the proper one to provide textual summary of the gene clusters the most explored approach is the extractive approach that essentially builds upon techniques borrowed from the information retrieval in which the objective is to provide terms to be used for query expansion and not to act as stand alone summary for the entire document sets another drawback is that the clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately in this paper we design and develop unified system gene expression miner to address these challenging issues in principled and general manner by integrating cluster ensemble text clustering and multidocument summarization and provide an environment for comprehensive gene expression data analysis we present novel cluster ensemble approach to generate high quality gene cluster in our text summarization module given gene cluster our expectation maximization based algorithm can automatically identify subtopics and extract most probable terms for each topic then the extracted top topical terms from each subtopic are combined to form the biological explanation of each gene cluster experimental results demonstrate that our system can obtain high quality clusters and provide informative key terms for the gene clusters
we consider the problem of segmenting webpage into visually and semantically cohesive pieces our approach is based on formulating an appropriate optimization problem on weighted graphs where the weights capture if two nodes in the dom tree should be placed together or apart in the segmentation we present learning framework to learn these weights from manually labeled data in principled manner our work is significant departure from previous heuristic and rule based solutions to the segmentation problem the results of our empirical analysis bring out interesting aspects of our framework including variants of the optimization problem and the role of learning
query by humming system allows the user to find song by humming part of the tune no musical training is needed previous query by humming systems have not provided satisfactory results for various reasons some systems have low retrieval precision because they rely on melodic contour information from the hum tune which in turn relies on the error prone note segmentation process some systems yield better precision when matching the melody directly from audio but they are slow because of their extensive use of dynamic time warping dtw our approach improves both the retrieval precision and speed compared to previous approaches we treat music as time series and exploit and improve well developed techniques from time series databases to index the music for fast similarity queries we improve on existing dtw indexes technique by introducing the concept of envelope transforms which gives general guideline for extending existing dimensionality reduction methods to dtw indexes the net result is high scalability we confirm our claims through extensive experiments
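The envelope idea referred to above builds on the well-known LB_Keogh-style lower bound for DTW, sketched here: the query is expanded into upper and lower envelopes within a warping window, and candidates whose cheap bound already exceeds the best match so far are pruned without running full DTW. Equal-length sequences are assumed; the paper's envelope transforms generalise existing dimensionality reductions on top of this.

```python
import numpy as np

def envelope(query, r):
    """Upper/lower envelope of the query within warping window r."""
    n = len(query)
    upper = np.array([max(query[max(0, i - r):i + r + 1]) for i in range(n)])
    lower = np.array([min(query[max(0, i - r):i + r + 1]) for i in range(n)])
    return upper, lower

def lb_keogh(candidate, upper, lower):
    """Cheap lower bound on the DTW distance between query and candidate."""
    c = np.asarray(candidate, dtype=float)
    above = np.clip(c - upper, 0, None)              # parts above the envelope
    below = np.clip(lower - c, 0, None)              # parts below the envelope
    return float(np.sqrt(np.sum(above ** 2 + below ** 2)))
```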
the accuracy of cardinality estimates is crucial for obtaining good query execution plan today’s optimizers make several simplifying assumptions during cardinality estimation that can lead to large errors and hence poor plans in scenario such as query optimizer testing it is very desirable to obtain the best plan ie the plan produced when the cardinality of each relevant expression is exact such plan serves as baseline against which plans produced by using the existing cardinality estimation module in the query optimizer can be compared however obtaining all exact cardinalities by executing appropriate subexpressions can be prohibitively expensive in this paper we present set of techniques that makes exact cardinality query optimization viable option for significantly larger set of queries than previously possible we have implemented this functionality in microsoft sql server and we present results using the tpc benchmark queries that demonstrate their effectiveness
critical problem in implementing interactive perception applications is the considerable computational cost of current computer vision and machine learning algorithms which typically run one to two orders of magnitude too slowly to be used interactively fortunately many of these algorithms exhibit coarse grained task and data parallelism that can be exploited across machines the slipstream project focuses on building highly parallel runtime system called sprout that can harness the computing power of cluster to execute perception applications with low latency this paper makes the case for using clusters for perception applications describes the architecture of the sprout runtime and presents two compute intensive yet interactive applications
using and extending framework is challenging task whose difficulty is exacerbated by the poor documentation that generally comes with the framework even in the presence of documentation developers often desire implementation examples for concrete guidance we propose an approach that automatically locates implementation examples from code base given lightweight documentation of framework based on our experience with concern oriented documentation we devised an approach that uses the framework documentation as template and that finds instances of this template in code base the concern instances represent self contained and structured implementation examples the relationships and the roles of parts composing the examples are uncovered and explained we implemented our approach in tool and conducted study comparing the results of our tool with results provided by eclipse committers showing that our approach can locate examples with high precision
existing distributed hash tables provide efficient mechanisms for storing and retrieving data item based on an exact key but are unsuitable when the search key is similar but not identical to the key used to store the data item in this paper we present scalable and efficient peer to peer system with new search primitive that can efficiently find the data items with keys closest to the search key the system works via novel assignment of virtual coordinates to each object in high dimensional synthetic space such that the proximity between two points in the coordinate space is correlated with the similarity between the strings that the points represent we examine the feasibility of this approach for efficient peer to peer search on inexact string keys and show that the system provides robust method to handle key perturbations that naturally occur in applications such as file sharing networks where the query strings are provided by users
many applications for mobile devices make use of maps but because interaction with these maps can be laborious the applications are often hard to use therefore the usability of maps on mobile devices must be improved in this paper we review the research that has been done to solve technical environmental and social challenges of mobile map use we will discuss interaction visualization and adaptive user support for maps on mobile devices we propose usability engineering as the method that should be used when developing maps for mobile applications
synthesizing expressive facial animation is very challenging topic within the graphics community in this paper we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data accurate motions of the markers on the face of human subject are captured while he she recites predesigned corpus with specific spoken and visual expressions we present novel motion capture mining technique that learns speech coarticulation models for diphones and triphones from the recorded data phoneme independent expression eigenspace piees that encloses the dynamic expression signals is constructed by motion signal processing phoneme based time warping and subtraction and principal component analysis pca reduction new expressive facial animations are synthesized as follows first the learned coarticulation models are concatenated to synthesize neutral visual speech according to novel speech input then texture synthesis based approach is used to generate novel dynamic expression signal from the piees model and finally the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation
in this paper we describe rote extractor that learns patterns for finding semantic relationships in unrestricted text with new procedures for pattern generalization and scoring these include the use of part of speech tags to guide the generalization named entity categories inside the patterns an edit distance based pattern generalization algorithm and pattern accuracy calculation procedure based on evaluating the patterns on several test corpora in an evaluation with entities the system attains precision higher than for half of the relationships considered
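A small sketch of edit-distance-based pattern generalisation in the spirit described above: two token-level patterns are aligned, shared tokens are kept, and insertions, deletions and substitutions collapse into wildcards. The wildcard handling and the use of difflib for the alignment are illustrative choices, not the paper's exact procedure.

```python
import difflib

def generalise(pattern_a, pattern_b, wildcard="*"):
    """Generalise two extraction patterns given as token lists."""
    out = []
    sm = difflib.SequenceMatcher(a=pattern_a, b=pattern_b)
    for op, i1, i2, _j1, _j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(pattern_a[i1:i2])             # keep shared tokens
        elif not out or out[-1] != wildcard:
            out.append(wildcard)                     # collapse mismatches
    return out

# generalise("<X> was born in <Y> in".split(),
#            "<X> was born near <Y> on".split())
#   -> ['<X>', 'was', 'born', '*', '<Y>', '*']
```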
this paper introduces new way to provide strong atomicity in an implementation of transactional memory strong atomicity lets us offer clear semantics to programs even if they access the same locations inside and outside transactions it also avoids differences between hardware implemented transactions and software implemented ones our approach is to use off the shelf page level memory protection hardware to detect conflicts between normal memory accesses and transactional ones this page level mechanism ensures correctness but gives poor performance because of the costs of manipulating memory protection settings and receiving notifications of access violations however in practice we show how combination of careful object placement and dynamic code update allows us to eliminate almost all of the protection changes existing implementations of strong atomicity in software rely on detecting conflicts by conservatively treating some non transactional accesses as short transactions in contrast our page level mechanism lets us be less conservative about how non transactional accesses are treated we avoid changes to non transactional code until possible conflict is detected dynamically and we can respond to phase changes where given instruction sometimes generates conflicts and sometimes does not we evaluate our implementation with versions of many of the stamp benchmarks and show how it performs within of an implementation with weak atomicity on all the benchmarks we have studied it avoids pathological cases in which other implementations of strong atomicity perform poorly
multidimensional databases have been designed to provide decision makers with the necessary tools to help them understand their data this framework is different from transactional data as the datasets contain huge volumes of historicized and aggregated data defined over set of dimensions that can be arranged through multiple levels of granularities many tools have been proposed to query the data and navigate through the levels of granularity however automatic tools are still missing to mine this type of data in order to discover regular specific patterns in this article we present method for mining sequential patterns from multidimensional databases at the same time taking advantage of the different dimensions and levels of granularity which is original compared to existing work the necessary definitions and algorithms are extended from regular sequential patterns to this particular case experiments are reported showing the significance of this approach
we present the tiny aggregation tag service for aggregation in low power distributed wireless environments tag allows users to express simple declarative queries and have them distributed and executed efficiently in networks of low power wireless sensors we discuss various generic properties of aggregates and show how those properties affect the performance of our in network approach we include performance study demonstrating the advantages of our approach over traditional centralized out of network methods and discuss variety of optimizations for improving the performance and fault tolerance of the basic solution
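A generic sketch of the in-network aggregation scheme: an aggregate such as AVG is expressed as an initialiser, a merge function applied to partial state records as they flow up the routing tree, and a final evaluator at the root. The tree and reading structures are placeholders; TAG's declarative queries, epochs and optimisations are not modelled.

```python
def make_avg():
    """AVG as (initialiser, merge, final evaluator) over (sum, count) records."""
    init = lambda value: (value, 1)
    merge = lambda a, b: (a[0] + b[0], a[1] + b[1])
    final = lambda state: state[0] / state[1]
    return init, merge, final

def aggregate(node, children, readings, init, merge):
    """Each node combines its own reading with its children's partial states
    and sends a single record to its parent."""
    state = init(readings[node])
    for child in children.get(node, []):
        state = merge(state, aggregate(child, children, readings, init, merge))
    return state

# init, merge, final = make_avg()
# final(aggregate("root", {"root": ["a", "b"]},
#                 {"root": 20, "a": 10, "b": 30}, init, merge))   -> 20.0
```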
based on monadic datalog we introduce the concept of weighted monadic datalog over unranked trees this provides query language that can be used to extract quantitative information from semi structured databases where the quantities are taken from some semiring we show that weighted monadic datalog is as expressive as weighted tree automata on unranked trees moreover we prove that query can be evaluated efficiently on an unranked tree provided that i the semiring is commutative and the underlying datalog program is non circular or ii the semiring is finite and commutative cpo semiring
rough sets are widely used in feature evaluation and attribute reduction and number of rough set based evaluation functions and search algorithms were reported however little attention has been paid to compute and compare stability of feature evaluation functions in this work we introduce three coefficients to calculate the stabilities of feature significance via perturbing samples experimental results show that entropy and fuzzy entropy based evaluation functions are more stable than the others and fuzzy rough set based functions are stable compared with the crisp functions these results give guideline to select feature evaluation for different applications
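One reasonable stability coefficient of the kind discussed above is sketched here: the feature-evaluation function is re-run on perturbed subsamples, the top-k features are kept each time, and the average pairwise Jaccard overlap of the selected sets is reported. The evaluate callback and the subsampling scheme are assumptions; the paper defines its own three coefficients.

```python
import random
from itertools import combinations

def selection_stability(samples, labels, evaluate, k=10, rounds=20, frac=0.9):
    """Stability of a feature-evaluation function under sample perturbation.

    evaluate(sub_samples, sub_labels) -> per-feature significance scores (assumed).
    """
    n = len(samples)
    tops = []
    for _ in range(rounds):
        idx = random.sample(range(n), int(frac * n))         # perturbed subsample
        scores = evaluate([samples[i] for i in idx], [labels[i] for i in idx])
        ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
        tops.append(set(ranked[:k]))                         # selected features
    pairs = list(combinations(tops, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```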
the problem of frequently updating multi dimensional indexes arises in many location dependent applications while the r tree and its variants are the dominant choices for indexing multi dimensional objects the r tree exhibits inferior performance in the presence of frequent updates in this paper we present an r tree variant termed the rum tree which stands for r tree with update memo that reduces the cost of object updates the rum tree processes updates in memo based approach that avoids disk accesses for purging old entries during an update process therefore the cost of an update operation in the rum tree is reduced to the cost of only an insert operation the removal of old object entries is carried out by garbage cleaner inside the rum tree in this paper we present the details of the rum tree and study its properties we also address the issues of crash recovery and concurrency control for the rum tree theoretical analysis and comprehensive experimental evaluation demonstrate that the rum tree outperforms other r tree variants by up to one order of magnitude in scenarios with frequent updates
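A schematic of the memo-based update path described above, with the underlying index abstracted behind an assumed insert-only interface: an update inserts the new entry and bumps a per-object stamp in the memo, and stale entries are recognised (and later garbage-collected) by comparing stamps.

```python
class UpdateMemo:
    """Memo-based updates in the spirit of the RUM-tree; the index object is an
    assumed abstraction exposing only insert(oid, stamp, entry)."""

    def __init__(self, index):
        self.index = index          # any insert-only multidimensional index
        self.latest = {}            # object id -> latest stamp

    def update(self, oid, new_entry):
        """An update costs only one insert; the old entry is not purged here."""
        stamp = self.latest.get(oid, 0) + 1
        self.latest[oid] = stamp
        self.index.insert(oid, stamp, new_entry)

    def is_obsolete(self, oid, stamp):
        """Used by queries and by the garbage cleaner to skip stale entries."""
        return stamp < self.latest.get(oid, 0)
```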
pattern based java bytecode compression techniques rely on the identification of identical instruction sequences that occur more than once each occurrence of such sequence is substituted by single instruction the sequence defines pattern that is used for extending the standard bytecode instruction set with the instruction that substitutes the pattern occurrences in the original bytecode alternatively the pattern may be stored in dictionary that serves for the bytecode decompression in this case the instruction that substitutes the pattern in the original bytecode serves as an index to the dictionary in this paper we investigate bytecode compression technique that considers more general case of patterns specifically we employ the use of an advanced pattern discovery technique that allows locating patterns of an arbitrary length which may contain variable number of wildcards in place of certain instruction opcodes or operands we evaluate the benefits and the limitations of this technique in various scenarios that aim at compressing the reference implementation of midp standard java environment for the development of applications for mobile devices
most mesh generation techniques require simplification and mesh improvement stages to prepare tetrahedral model for efficient simulation we have developed an algorithm that both reduces the number of tetrahedra in the model to permit interactive manipulation and removes the most poorly shaped tetrahedra to allow for stable physical simulations such as the finite element method the initial tetrahedral model may be composed of several different materials representing internal structures our approach targets the elimination of poorly shaped elements while simplifying the model using edge collapses and other mesh operations such as vertex smoothing tetrahedral swaps and vertex addition we present the results of our algorithm on variety of inputs including models with more than million tetrahedra in practice our algorithm reliably reduces meshes to contain only tetrahedra that meet specified shape requirements such as the minimum solid angle
automated techniques to diagnose the cause of system failures based on monitoring data is an active area of research at the intersection of systems and machine learning in this paper we identify three tasks that form key building blocks in automated diagnosis identifying distinct states of the system using monitoring data retrieving monitoring data from past system states that are similar to the current state pinpointing attributes in the monitoring data that indicate the likely cause of system failure we provide to our knowledge the first apples to apples comparison of both classical and state of the art techniques for these three tasks such studies are vital to the consolidation and growth of the field our study is based on variety of failures injected in multitier web service we present empirical insights and research opportunities
dynamic database is set of transactions in which the content and the size can change over time there is an essential difference between dynamic database mining and traditional database mining this is because recently added transactions can be more interesting than those inserted long ago in dynamic database this paper presents method for mining dynamic databases this approach uses weighting techniques to increase efficiency enabling us to reuse frequent itemsets mined previously this model also considers the novelty of itemsets when assigning weights in particular this method can find kind of new patterns from dynamic databases referred to as trend patterns to evaluate the effectiveness and efficiency of the proposed method we implemented our approach and compare it with existing methods
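As one concrete way to realise the recency weighting described above, the sketch below computes a decayed support in which older transaction batches count geometrically less; the decay scheme is an illustrative assumption rather than the paper's exact weighting model.

```python
def weighted_support(itemset, batches, decay=0.8):
    """Recency-weighted support of an itemset over time-stamped batches.

    batches: list of transaction batches, oldest first; each transaction is a set.
    """
    itemset = set(itemset)
    num = den = 0.0
    for age, batch in enumerate(reversed(batches)):  # age 0 = newest batch
        w = decay ** age                             # older batches count less
        num += w * sum(1 for t in batch if itemset <= t)
        den += w * len(batch)
    return num / den if den else 0.0
```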
in all wireless networks crucial problem is to minimize energy consumption as in most cases the nodes are battery operated we focus on the problem of power optimal broadcast for which it is well known that the broadcast nature of the radio transmission can be exploited to optimize energy consumption several authors have conjectured that the problem of power optimal broadcast is np complete we provide here formal proof both for the general case and for the geometric one in the former case the network topology is represented by generic graph with arbitrary weights whereas in the latter euclidean distance is considered we then describe new heuristic embedded wireless multicast advantage we show that it compares well with other proposals and we explain how it can be distributed
modal transition systems mts are an extension of labelled transition systems lts that distinguish between required proscribed and unknown behaviour and come equipped with notion of refinement that supports incremental modelling where unknown behaviour is iteratively elaborated into required or proscribed behaviour the original formulation of mts introduces two alternative semantics for mts strong and weak which require mts models to have the same communicating alphabet the latter allowing the use of distinguished unobservable action in this paper we show that the requirement of fixing the alphabet for mts semantics and the treatment of observable actions are limiting if mts are to support incremental elaboration of partial behaviour models we present novel semantics branching alphabet semantics for mts inspired by branching lts equivalence we show that some unintuitive refinements allowed by weak semantics are avoided and prove number of theorems that relate branching refinement with alphabet refinement and consistency these theorems which do not hold for other semantics support the argument for considering branching implementation of mts as the basis for sound semantics to support behaviour model elaboration
inspired by ideas from research on geometric and motion levels of detail we generalize lod to combine all currently novel techniques on real time human locomotion generation learn from fruitful research on key framed kinematic methods physically based approaches motion capture data reuse and action synthesis of reactive articulated characters we design and implement tentative but viable lod scheduler we also developed data driven on line locomotion generation system base on freely available motion library by integrating this lod transition scheduler into popular graphics and dynamics game engine this paper will then give brief overview on our experimental results and discussion on future work
internet facilitates access to data information and knowledge sources but at the same time it threatens to cognitively overload the decision makers this necessitates the development of effective decision support tools to properly inform the decision process internet technologies require new type of decision support that provides tighter integration and higher degree of direct interaction with the problem domain the central argument of this work is that in dynamic and highly complex electronic environments decision support systems dsss should be situated in the problem domain generic architecture the set of capabilities for our vision of situated dss is proposed and the architecture is illustrated with dss for investment management
we introduce multi agent logic variant of the linear temporal logic ltl with embedded multi agent knowledge with interacting agents the logic is motivated by semantics based on potentially infinite runs with time points represented by clusters of states with distributed knowledge of the agents we address properties of local and global knowledge modeled in this framework consider modeling of interaction between agents by possibility to pass information from one agent to others via possible transitions within time clusters of states main question we are focused on is the satisfiability problem and decidability of the logic key result is proposed algorithm which recognizes theorems of the logic so we show that the logic is decidable it is based on verification of validity for special normal reduced forms of rules in models with at most triple exponential size in the testing rules in the final part we discuss possible variations of the proposed logic
dependence aware transactional memory datm is recently proposed model for increasing concurrency of memory transactions without complicating their interface datm manages dependences between conflicting uncommitted transactions so that they commit safely the contributions of this paper are twofold first we provide safety proof for the dependence aware model this proof also shows that the datm model accepts all concurrent interleavings that are conflict serializable second we describe the first application of dependence tracking to software transactional memory stm design and implementation we compare our implementation with state of the art stm tl we use benchmarks from the stamp suite quantifying how dependence tracking converts certain types of transactional conflicts into successful commits on high contention workloads datm is able to take advantage of dependences to speed up execution by up to
the nearest neighbor search is an important operation widely used in multimedia databases in higher dimensions most of previous methods for nearest neighbor search become inefficient and require to compute nearest neighbor distances to large fraction of points in the space in this paper we present new approach for processing nearest neighbor search with the euclidean metric which searches over only small subset of the original space this approach effectively approximates clusters by encapsulating them into geometrically regular shapes and also computes better upper and lower bounds of the distances from the query point to the clusters for showing the effectiveness of the proposed approach we perform extensive experiments the results reveal that the proposed approach significantly outperforms the tree as well as the sequential scan
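The abstract above describes cluster-based pruning for nearest-neighbor search only at a high level; the sketch below is a minimal Python illustration of the general idea (skip a cluster when its optimistic lower distance bound already exceeds the best distance found so far), not the paper's specific cluster shapes or bound computations. The helper `cluster_pruned_nn` and the sphere-shaped cluster summaries are assumptions made for the example.

```python
import numpy as np

def cluster_pruned_nn(query, clusters):
    """Nearest-neighbor search that scans only clusters whose lower
    distance bound can still beat the best distance found so far.
    `clusters` is a list of (center, radius, points) summaries."""
    # visit clusters in order of optimistic lower bound: max(0, ||q - c|| - r)
    order = sorted(
        clusters,
        key=lambda c: max(0.0, np.linalg.norm(query - c[0]) - c[1]),
    )
    best_dist, best_point = np.inf, None
    for center, radius, points in order:
        lower = max(0.0, np.linalg.norm(query - center) - radius)
        if lower >= best_dist:          # no point inside can be closer
            continue                    # cluster pruned
        d = np.linalg.norm(points - query, axis=1)
        i = int(np.argmin(d))
        if d[i] < best_dist:
            best_dist, best_point = d[i], points[i]
    return best_point, best_dist

# toy usage: three Gaussian blobs summarized by center and radius
rng = np.random.default_rng(0)
clusters = []
for mean in ([0, 0], [10, 10], [20, 0]):
    pts = rng.normal(mean, 1.0, size=(200, 2))
    center = pts.mean(axis=0)
    clusters.append((center, np.linalg.norm(pts - center, axis=1).max(), pts))
print(cluster_pruned_nn(np.array([9.0, 9.0]), clusters))
```

Tighter geometric approximations and bounds of the kind the paper proposes prune more clusters, but they plug into the same search skeleton.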
this paper addresses the notion of context in multiagent systems from an organisational point of view setting out from the rica metamodel that shapes the agents space of interaction on the basis of hierarchical organisational and communicative abstractions we propose interaction state machines as new formalism for the specification and enactment of multiagent interaction protocols by means of examples from the fipa interaction protocol library we show how this formalism allows for successive refinement of interaction protocols and how this process is guided by the organisational model underlying multiagent application
convex polyhedra are the basis for several abstractions used in static analysis and computer aided verification of complex and sometimes mission critical systems for such applications the identification of an appropriate complexity precision trade off is particularly acute problem so that the availability of wide spectrum of alternative solutions is mandatory we survey the range of applications of polyhedral computations in this area give an overview of the different classes of polyhedra that may be adopted outline the main polyhedral operations required by automatic analyzers and verifiers and look at some possible combinations of polyhedra with other numerical abstractions that have the potential to improve the precision of the analysis areas where further theoretical investigations can result in important contributions are highlighted
context modeling has long been acknowledged as key aspect in wide variety of problem domains in this paper we focus on the combination of contextualization and personalization methods to improve the performance of personalized information retrieval the key aspects in our proposed approach are the explicit distinction between historic user context and live user context the use of ontology driven representations of the domain of discourse as common enriched representational ground for content meaning user interests and contextual conditions enabling the definition of effective means to relate the three of them and the introduction of fuzzy representations as an instrument to properly handle the uncertainty and imprecision involved in the automatic interpretation of meanings user attention and user wishes based on formal grounding at the representational level we propose methods for the automatic extraction of persistent semantic user preferences and live ad hoc user interests which are combined in order to improve the accuracy and reliability of personalization for retrieval
human memory plays an important role in personal information management pim several scholars have noted that people refind information based on what they remember and it has been shown that people adapt their management strategies to compensate for the limitations of memory nevertheless little is known about what people tend to remember about their personal information and how they use their memories to refind the aim of this article is to increase our understanding of the role that memory plays in the process of refinding personal information concentrating on email re finding we report on user study that investigates what attributes of email messages participants remember when trying to refind we look at how the attributes change in different scenarios and examine the factors which impact on what is remembered
acquired data often provides the best knowledge of material’s bidirectional reflectance distribution function brdf its integration into most real time rendering systems requires both data compression and the implementation of the decompression and filtering stages on contemporary graphics processing units gpus this paper improves the quality of real time per pixel lighting on gpus using wavelet decomposition of acquired brdfs three dimensional texture mapping with indexing allows us to efficiently compress the brdf data by exploiting much of the coherency between hemispherical data we apply built in hardware filtering and pixel shader flexibility to perform filtering in the full brdf domain anti aliasing of specular highlights is performed via progressive level of detail technique built upon the multiresolution of the wavelet encoding this technique increases rendering performance on distant surfaces while maintaining accurate appearance of close ones
this paper proposes stochastic fluid flow model to compute the transfer time distribution of resources in peer to peer file sharing applications the amount of bytes transferred among peers is represented by continuous quantity the fluid level whose flow rate is modulated by set of discrete states representing the concurrent upload and download operations on the peers participating in the transfer transient solution of the model is then performed to compute the probability that peer can download given resource in less than units of time as function of several system parameters in particular the impact of file popularity bandwidth characteristics concurrent downloads and uploads cooperation level among peers and user behavior are included in our model specification we also provide numerical results aiming at proving the potentialities of the approach we adopted as well as to investigate interesting issues related to the effect of incentive mechanisms on the user cooperation
this paper elaborates on some of the fundamental contributions made by john mylopoulos in the area of requirements engineering we specifically focus on the use of goal models and their soft goals for reasoning about alternative options arising in the requirements engineering process personal account of john’s qualitative reasoning technique for comparing alternatives is provided first quantitative but lightweight technique for evaluating alternative options is then presented this technique builds on mechanisms introduced by the qualitative scheme while overcoming some problems raised by it meeting scheduling system is used as running example to illustrate the main ideas
given the exponential increase of indexable content on the web ranking is an increasingly difficult problem in information retrieval systems recent research shows that implicit feedback regarding user preferences can be extracted from web access logs in order to increase ranking performance we analyze the implicit user feedback from access logs in the citeseer academic search engine and show how site structure can better inform the analysis of clickthrough feedback providing accurate personalized ranking services tailored to individual information retrieval systems experiments and analysis show that our proposed method is more accurate at predicting user preferences than any non personalized ranking methods when user preferences are stable over time we compare our method with several non personalized ranking methods including ranking svmlight as well as several ranking functions specific to the academic document domain the results show that our ranking algorithm can reach accuracy in comparison to for ranking svmlight and below for all other single feature ranking methods we also show how the derived personalized ranking vectors can be employed for other ranking related purposes such as recommendation systems
there has been much interest in testing from finite state machines fsms as result of their suitability for modelling or specifying state based systems where there are multiple ports interfaces multi port fsm is used and in testing tester is placed at each port if the testers cannot communicate with one another directly and there is no global clock then we are testing in the distributed test architecture it is known that the use of the distributed test architecture can affect the power of testing and recent work has characterised this in terms of local equivalence in the distributed test architecture we can distinguish two fsms such as an implementation and specification if and only if they are not locally equivalent however there may be many fsms that are locally equivalent to given fsm and the nature of these fsms has not been explored this paper examines the set of fsms that are locally equivalent to given fsm it shows that there is unique smallest fsm and unique largest fsm that are locally equivalent to here smallest and largest refer to the set of traces defined by an fsm and thus to its semantics we also show that for given fsm the set of fsms that are locally equivalent to defines bounded lattice finally we define an fsm that amongst all fsms locally equivalent to has fewest states we thus give three alternative canonical fsms that are locally equivalent to an fsm one that defines the smallest set of traces one that defines the largest set of traces and one with fewest states all three provide valuable information and the first two can be produced in time that is polynomial in terms of the number of states of we prove that the problem of finding an equivalent fsm with fewest states is np hard in general but can be solved in polynomial time for the special case where there are two ports
an indulgent algorithm is distributed algorithm that besides tolerating process failures also tolerates arbitrarily long periods of instability with an unbounded number of timing and scheduling failures in particular no process can take any irrevocable action based on the operational status correct or failed of other processes this paper presents an intuitive and general characterization of indulgence the characterization can be viewed as simple application of murphy’s law to partial runs of distributed algorithm in computing model that encompasses various communication and resilience schemes we use our characterization to establish several results about the inherent power and limitations of indulgent algorithms
originating from basic research conducted in the and the parallel and distributed simulation field has matured over the last few decades today operational systems have been fielded for applications such as military training analysis of communication networks and air traffic control systems to mention few this tutorial gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems particular emphasis is placed on synchronization also called time management algorithms as well as data distribution techniques
the sa tree is an interesting metric space indexing structure that is inspired by the voronoi diagram in essence the sa tree records portion of the delaunay graph of the data set graph whose vertices are the voronoi cells with edges between adjacent cells an improvement is presented on the original search strategy for the sa tree this consists of details on the intuition behind the improvement as well as the original search strategy and proof of their correctness furthermore it is shown how to adapt an incremental nearest neighbor algorithm to the sa tree which allows computing nearest neighbor in progressive manner unlike other adaptations the resulting algorithm does not take the unnecessary steps to ensure that keys of node elements are monotonically non decreasing
effective resizing of images should not only use geometric constraints but consider the image content as well we present simple image operator called seam carving that supports content aware image resizing for both reduction and expansion seam is an optimal connected path of pixels on single image from top to bottom or left to right where optimality is defined by an image energy function by repeatedly carving out or inserting seams in one direction we can change the aspect ratio of an image by applying these operators in both directions we can retarget the image to new size the selection and order of seams protect the content of the image as defined by the energy function seam carving can also be used for image content enhancement and object removal we support various visual saliency measures for defining the energy of an image and can also include user input to guide the process by storing the order of seams in an image we create multi size images that are able to continuously change in real time to fit given size
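As a concrete companion to the description above, here is a minimal numpy sketch of removing one vertical seam by dynamic programming, using a simple gradient-magnitude energy. It illustrates the seam idea only; it is not the paper's implementation and omits its saliency measures, seam insertion, and multi-size images.

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimum-energy vertical seam from a grayscale image.
    Energy is the sum of absolute x/y gradients; the seam is the
    8-connected top-to-bottom path minimizing cumulative energy."""
    h, w = img.shape
    energy = np.abs(np.gradient(img, axis=0)) + np.abs(np.gradient(img, axis=1))
    cost = energy.copy()
    for y in range(1, h):                       # dynamic programming pass
        left = np.r_[np.inf, cost[y - 1, :-1]]
        up = cost[y - 1]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, up), right)
    # backtrack the cheapest seam from bottom to top
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    # carve: drop the seam pixel from every row
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1)

img = np.random.rand(6, 8)
print(remove_vertical_seam(img).shape)   # (6, 7)
```

Repeating this operator (or its insertion counterpart) in either direction changes the aspect ratio while the energy function steers seams away from salient content.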
we present novel implementation of compressed suffix arrays exhibiting new tradeoffs between search time and space occupancy for given text or sequence of symbols over an alphabet sigma where each symbol is encoded by lg |sigma| bits we show that compressed suffix arrays use just about n H_h bits apart from lower order terms while retaining full text indexing functionalities such as searching any pattern sequence of length in lg |sigma| polylog time the term H_h <= lg |sigma| denotes the hth order empirical entropy of the text which means that our index is nearly optimal in space apart from lower order terms achieving asymptotically the empirical entropy of the text with multiplicative constant if the text is highly compressible so that H_h is small and the alphabet size is small we obtain text index with search time that requires only bits further results and tradeoffs are reported in the paper
the aim of this paper is to support user browsing on semantically heterogeneous information spaces in advance of user’s explicit actions his search context should be predicted by the locally annotated resources in his access histories we thus exploit semantic transcoding method and measure the relevance between the estimated model of user intention and the candidate resources in web spaces for these experiments we simulated the scenario of comparison shopping systems on the testing bed organized by twelve online stores in which images are annotated with semantically heterogeneous metadata
as cmos technology scales and more transistors are packed on to the same chip soft error reliability has become an increasingly important design issue for processors prior research has shown that there is significant architecture level masking and many soft error solutions take advantage of this effect prior work has also shown that the degree of such masking can vary significantly across workloads and between individual workload phases motivating dynamic adaptation of reliability solutions for optimal cost and benefit for such adaptation it is important to be able to accurately estimate the amount of masking or the architecture vulnerability factor avf online while the program is running unfortunately existing solutions for estimating avf are often based on offline simulators and hard to implement in real processors this paper proposes novel way of estimating avf online using simple modifications to the processor the estimation method applies to both logic and storage structures on the processor compared to previous methods for estimating avf our method does not require any offline simulation or calibration for different workloads we tested our method with widely used simulator from industry for four processor structures and for to intervals of each of eleven spec benchmarks the results show that our method provides acceptably accurate avf estimates at runtime the absolute error rarely exceeds across all application intervals for all structures and the mean absolute error for given application and structure combination is always within
in many contexts today documents are available in number of versions in addition to explicit knowledge that can be queried searched in documents these documents also contain implicit knowledge that can be found by text mining in this paper we will study association rule mining of temporal document collections and extend previous work within the area by performing mining based on semantics as well as studying the impact of appropriate techniques for ranking of rules
multimedia applications place high demands for quality of service qos performance and reliability on systems these stringent requirements make design of cost effective and scalable systems difficult therefore efficient adaptive and dynamic resource management techniques in conjunction with data placement techniques can be of great help in improving performance scalability and reliability of such systems this is the focus of our paper
consumer studies demonstrate that online users value personalized content at the same time providing personalization on websites seems quite profitable for web vendors this win win situation is however marred by privacy concerns since personalizing people’s interaction entails gathering considerable amounts of data about them as numerous recent surveys have consistently demonstrated computer users are very concerned about their privacy on the internet moreover the collection of personal data is also subject to legal regulations in many countries and states both user concerns and privacy regulations impact frequently used personalization methods this article analyzes the tension between personalization and privacy and presents approaches to reconcile the two
personalized web services strive to adapt their services advertisements news articles etc to individual users by making use of both content and user information despite few recent advances this problem remains challenging for at least two reasons first web service is featured with dynamically changing pools of content rendering traditional collaborative filtering methods inapplicable second the scale of most web services of practical interest calls for solutions that are both fast in learning and computation in this work we model personalized recommendation of news articles as contextual bandit problem principled approach in which learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles while simultaneously adapting its article selection strategy based on user click feedback to maximize total user clicks the contributions of this work are three fold first we propose new general contextual bandit algorithm that is computationally efficient and well motivated from learning theory second we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic finally using this offline evaluation method we successfully applied our new algorithm to yahoo front page today module dataset containing over million events results showed click lift compared to standard context free bandit algorithm and the advantage becomes even greater when data gets more scarce
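To make the contextual-bandit formulation concrete, the sketch below implements a small disjoint linear-UCB style learner in Python: each arm (article) keeps a ridge-regression estimate of expected clicks given the user context and adds an upper-confidence bonus. It is a toy rendering of the general approach described above, not the authors' exact algorithm or offline evaluation protocol; `LinUCBArm`, the simulated click model, and the parameter `alpha` are assumptions for illustration.

```python
import numpy as np

class LinUCBArm:
    """One arm (article) of a disjoint linear-UCB contextual bandit:
    ridge regression on the context plus an upper-confidence bonus."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)        # X^T X + I
        self.b = np.zeros(dim)      # X^T clicks
        self.alpha = alpha

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose(arms, x):
    """Serve the article with the highest upper-confidence score."""
    return int(np.argmax([arm.ucb(x) for arm in arms]))

# toy simulation: 3 articles, 5-dim user context, hidden click weights
rng = np.random.default_rng(1)
true_w = rng.normal(size=(3, 5))
arms = [LinUCBArm(5, alpha=0.5) for _ in range(3)]
for _ in range(2000):
    x = rng.normal(size=5)
    a = choose(arms, x)
    click = float(rng.random() < 1 / (1 + np.exp(-true_w[a] @ x)))
    arms[a].update(x, click)
```

The exploration bonus shrinks as an arm accumulates feedback, which is what lets the learner keep adapting as the article pool and user population change.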
discriminative learning methods are widely used in natural language processing these methods work best when their training and test data are drawn from the same distribution for many nlp tasks however we are confronted with new domains in which labeled data is scarce or non existent in such cases we seek to adapt existing models from resource rich source domain to resource poor target domain we introduce structural correspondence learning to automatically induce correspondences among features from different domains we test our technique on part of speech tagging and show performance gains for varying amounts of source and target training data as well as improvements in target domain parsing accuracy using our improved tagger
the graphical user interface gui is an important component of many software systems past surveys indicate that the development of gui is significant undertaking and that the gui’s source code often comprises substantial portion of the program’s overall source base graphical user interface creation frameworks for popular object oriented programming languages enable the rapid construction of simple and complex guis in this paper we examine the run time performance of two gui creation frameworks swing and thinlet that are tailored for the java programming language using simple model of java gui we formally define the difficulty of gui manipulation event after implementing case study application we conducted experiments to measure the event handling latency for gui manipulation events of varying difficulties during our investigation of the run time performance of the swing and thinlet gui creation frameworks we also measured the cpu and memory consumption of our candidate application during the selected gui manipulation events our experimental results indicate that thinlet often outperformed swing in terms of both event handling latency and memory consumption however swing appears to be better suited in terms of event handling latency and cpu consumption for the construction of guis that require manipulations of high difficulty levels
markov models have been widely used to represent and analyze user web navigation data in previous work we have proposed method to dynamically extend the order of markov chain model and complementary method for assessing the predictive power of such variable length markov chain herein we review these two methods and propose novel method for measuring the ability of variable length markov model to summarize user web navigation sessions up to given length although the summarization ability of model is important to enable the identification of user navigation patterns the ability to make predictions is important in order to foresee the next link choice of user after following given trail so as for example to personalize web site we present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarization ability
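A small sketch of the underlying mechanics may help: the toy model below counts next-page frequencies for contexts of increasing length and predicts from the longest matching suffix of a trail. It illustrates variable-length Markov prediction in general, not the authors' dynamic order-extension method or their summarization measure; the class name `VLMarkov` and the fixed `max_order` are assumptions.

```python
from collections import defaultdict

class VLMarkov:
    """Toy variable-length Markov model of navigation sessions:
    counts next-page frequencies for every context suffix up to
    `max_order` and predicts from the longest matching context."""
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sessions):
        for s in sessions:
            for i in range(1, len(s)):
                for k in range(1, self.max_order + 1):
                    if i - k < 0:
                        break
                    ctx = tuple(s[i - k:i])
                    self.counts[ctx][s[i]] += 1

    def predict(self, trail):
        # back off from the longest matching suffix to shorter ones
        for k in range(min(self.max_order, len(trail)), 0, -1):
            ctx = tuple(trail[-k:])
            if ctx in self.counts:
                nxt = self.counts[ctx]
                return max(nxt, key=nxt.get)
        return None

m = VLMarkov(max_order=2)
m.fit([["home", "news", "sport"], ["home", "news", "sport"],
       ["search", "news", "weather"]])
print(m.predict(["home", "news"]))   # -> "sport"
```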
we present knowledge discovery and data mining process developed as part of the columbia con edison project on manhole event prediction this process can assist with real world prioritization problems that involve raw data in the form of noisy documents requiring significant amounts of pre processing the documents are linked to set of instances to be ranked according to prediction criteria in the case of manhole event prediction which is new application for machine learning the goal is to rank the electrical grid structures in manhattan manholes and service boxes according to their vulnerability to serious manhole events such as fires explosions and smoking manholes our ranking results are currently being used to help prioritize repair work on the manhattan electrical grid
producing reliable and robust software has become one of the most important software development concerns in recent years testing is process by which software quality can be assured through the collection of information while testing can improve software reliability current tools typically are inflexible and have high overheads making it challenging to test large software projects in this paper we describe new scalable and flexible framework for testing programs with novel demand driven approach based on execution paths to implement test coverage this technique uses dynamic instrumentation on the binary code that can be inserted and removed on the fly to keep performance and memory overheads low we describe and evaluate implementations of the framework for branch node and def use testing of java programs experimental results for branch testing show that our approach has on average speed up over static instrumentation and also uses less memory
consider an sql query that specifies duplicate elimination via distinct clause because duplicate elimination often requires an expensive sort of the query result it is often worthwhile to identify situations where the distinct clause is unnecessary to avoid the sort altogether we prove necessary and sufficient condition for deciding if query requires duplicate elimination the condition exploits knowledge about keys table constraints and query predicates because the condition cannot always be tested efficiently we offer practical algorithm that tests simpler sufficient condition we consider applications of this condition for various types of queries and show that we can exploit this condition in both relational and nonrelational database systems
the main purpose of an enterprise ontology is to promote the common understanding between people across enterprises as well as to serve as communication medium between people and applications and between different applications this paper outlines top level ontology called the context based enterprise ontology which aims to advance the understanding of the nature purposes and meanings of things in enterprises by providing basic concepts for conceiving structuring and representing things within contexts and or as contexts the ontology is based on the contextual approach according to which context involves seven domains purpose actor action object facility location and time the concepts in the ontology are defined in english and presented in meta models in uml based ontology engineering language
previous research has addressed the scalability and availability issues associated with the construction of cluster based network services this paper studies the clustering of replicated services when the persistent service data is frequently updated to this end we propose neptune an infrastructural middleware that provides flexible interface to aggregate and replicate existing service modules neptune accommodates variety of underlying storage mechanisms maintains dynamic and location transparent service mapping to isolate faulty modules and enforce replica consistency furthermore it allows efficient use of multi level replica consistency model with staleness control at its highest level this paper describes neptune’s overall architecture data replication support and the results of our performance evaluation
set based analysis is constraint based whole program analysis that is applicable to functional and object oriented programming languages unfortunately the analysis is useless for large programs since it generates descriptions of data flow relationships that grow quadratically in the size of the program this paper presents componential set based analysis which is faster and handles larger programs without any loss of accuracy over set based analysis the design of the analysis exploits number of theoretical results concerning constraint systems including completeness result and decision algorithm concerning the observable equivalence of constraint systems experimental results validate the practicality of the analysis
materialized views and view maintenance are becoming increasingly important in practice in order to satisfy different data currency and performance requirements number of view maintenance policies have been proposed immediate maintenance involves potential refresh of the view after every update to the deriving tables when staleness of views can be tolerated view may be refreshed periodically or on demand when it is queried the maintenance policies that are chosen for views have implications on the validity of the results of queries and affect the performance of queries and updates in this paper we investigate number of issues related to supporting multiple views with different maintenance policies we develop formal notions of consistency for views with different maintenance policies we then introduce model based on view groupings for view maintenance policy assignment and provide algorithms based on the viewgroup model that allow consistency of views to be guaranteed next we conduct detailed study of the performance aspects of view maintenance policies based on an actual implementation of our model the performance study investigates the trade offs between different maintenance policy assignments our analysis of both the consistency and performance aspects of various view maintenance policies are important in making correct maintenance policy assignments
software systems modernisation using service oriented architectures soas and web services represents valuable option for extending the lifetime of mission critical legacy systems this paper presents black box modernisation approach for exposing interactive functionalities of legacy systems as services the problem of transforming the original user interface of the system into the request response interface of soa is solved by wrapper that is able to interact with the system on behalf of the user the wrapper behaviour is defined in the form of finite state machines retrievable by black box reverse engineering of the human computer interface the paper describes our wrapper based migration process and discusses the results of case studies showing process effectiveness and quality of resulting services
data replication is practical and effective method to achieve efficient and fault tolerant data access in grids traditionally data replication schemes maintain an entire replica in each site where file is replicated providing read only model these solutions require huge storage resources to store the whole set of replicas and do not allow efficient data modification to avoid the consistency problem in this paper we propose new replication method called the branch replication scheme brs that provides three main advantages over traditional approaches optimizing storage usage by creating subreplicas increasing data access performance by applying parallel techniques and providing the possibility to modify the replicas by maintaining consistency among updates in an efficient way an analytical model of the replication scheme naming system and replica updating scheme are formally described in the paper using this model operations such as reading writing or updating replica are analyzed simulation results demonstrate the feasibility of brs as they show that the new replication algorithm increases data access performance compared with popular replication schemes such as hierarchical and server directed replication which are commonly used in current data grids
the paper presents the development by using the proof assistant isabelle hol of compiler back end translating from functional source language to the bytecode language of an abstract machine the haskell code of the compiler is extracted from the isabelle hol specification and this tool is also used for proving the correctness of the implementation the main correctness theorem not only ensures functional semantics preservation but also resource consumption preservation the heap and stacks figures predicted by the semantics are confirmed in the translation to the abstract machine the language and the development belong to wider proof carrying code framework in which formal compiler generated certificates about memory consumption are sought for
we explore suitable node labeling schemes used in collaborative xml dbmss xdbmss for short supporting typical xml document processing interfaces such schemes have to provide holistic support for essential xdbms processing steps for declarative as well as navigational query processing and with the same importance lock management in this paper we evaluate existing range based and prefix based labeling schemes before we propose our own scheme based on deweyids we experimentally explore its suitability as general and immutable node labeling mechanism stress its synergetic potential for query processing and locking and show how it can be implemented efficiently various compression and optimization measures deliver surprising space reductions frequently reduce the size of storage representation compared to an already space efficient encoding scheme to less than in the average and thus conclude their practical relevance
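For readers unfamiliar with prefix-based labels, the following minimal sketch shows why Dewey-style node labels are attractive for both query processing and locking: ancestor tests are prefix tests and document order is plain lexicographic order. It is a generic illustration, not the paper's DeweyID encoding, gap policy, or compression measures; `child_label` and the odd-ordinal gap rule are assumptions for the example.

```python
def is_ancestor(a, b):
    """a and b are Dewey-style labels, e.g. (1, 3, 2) for /1/3/2.
    a is an ancestor of b iff a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a

def doc_order(a, b):
    """Document order falls out of lexicographic comparison of labels."""
    return (a > b) - (a < b)   # -1, 0 or 1

def child_label(parent, ordinal):
    """Labels can stay immutable under insertion if ordinals leave gaps
    (here: odd ordinals only, so a new sibling can take an even slot)."""
    return parent + (2 * ordinal - 1,)

root = (1,)
first = child_label(root, 1)        # (1, 1)
second = child_label(root, 2)       # (1, 3)
grandchild = child_label(first, 1)  # (1, 1, 1)
print(is_ancestor(root, grandchild), doc_order(first, second))  # True -1
```

Because every label carries its whole ancestor path, a lock manager can derive the intention-lock path for a node from the label alone, without touching the document.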
in this paper we investigate the problem of query rewriting using views in hybrid language allowing nominals ie individual names to occur in intentional descriptions of particular interest restricted form of nominals where individual names refer to simple values enable the specification of value constraints ie sets of allowed values for attributes such constraints are very useful in practice enabling for example fine grained description of queries and views in integration systems and thus can be exploited to reduce the query processing cost we use description logics to formalize the problem of query rewriting using views in presence of value constraints and show that the technique of query rewriting can be used to process queries under the certain answer semantics we propose sound and complete query rewriting bucket like algorithm data mining techniques have been used to favor scalability wrt the number of views experiments on synthetic datasets have been conducted
the problem of verifying the correctness of test executions is well known while manual verification is time consuming and error prone developing an oracle to automatically verify test executions can be as costly as implementing the original program this is especially true for concurrent programs due to their non determinism and complexity in this paper we present method that uses partial specifications to systematically derive oracles for concurrent programs we illustrate the method by deriving an ada task that monitors the execution of concurrent ada program and describe prototype tool that partially automates the derivation process we present the results of study that shows the derived oracles are surprisingly effective at error detection the study also shows that manual verification is an inaccurate means of failure detection that large test case sets must be used to ensure adequate testing coverage and that test cases must be run many times to cover for variations in run time behaviour
with the rapid increase in the amount of content on the world wide web it is now becoming clear that information cannot always be stored in form that anticipates all of its possible uses one solution to this problem is to create transcoding intermediaries that convert data on demand from one form into another up to now these transcoders have usually been stand alone components converting one particular data format to another particular data format more flexible approach is to create modular transcoding units that can be composed as needed in this paper we describe the benefits of an intermediary based transcoding approach and present formal framework for document transcoding that is meant to simplify the problem of composing transcoding operations
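The composability argument can be illustrated with a few lines of Python: each transcoding unit is a function from one representation to another, and an intermediary builds a pipeline by composing whichever units a request needs. The stand-in transcoders below are hypothetical examples and do not belong to the paper's framework.

```python
from functools import reduce

# hypothetical stand-ins for real transcoding units
def markdown_to_html(text):
    return "<p>" + text.replace("**", "") + "</p>"

def html_to_plain(html):
    return html.replace("<p>", "").replace("</p>", "")

def summarize(text, limit=20):
    return text[:limit]

def compose(*stages):
    """Build one on-demand transcoding pipeline out of modular units."""
    return lambda doc: reduce(lambda d, stage: stage(d), stages, doc)

pipeline = compose(markdown_to_html, html_to_plain, summarize)
print(pipeline("**hello world** this is a longer document"))
```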
risk of covert insertion of circuitry into reconfigurable computing rc systems exists this paper reviews risks of hardware attack on field programmable gate array fpga based rc systems and proposes method for secure system credentials generation unique random and partially anonymous and trusted self reconfiguration using secure reconfiguration controller serecon and partial reconfiguration pr serecon provides root of trust rot for rc systems incorporating novel algorithms for security credentials generation and trusted design verification credentials are generated internally during system certification the private credential element never leaves the serecon security perimeter to provide integrity maintaining self reconfiguration serecon performs analysis of each new ip core structure prior to reconfiguration an unverified ip core can be used provided that its spatial isolation is retained serecon provides encrypted storage for installed ip cores resource usage for prototype serecon system is presented the protection provided by serecon is illustrated in number of security attack scenarios
applications and services are increasingly dependent on networks of smart sensors embedded in the environment to constantly sense and react to events in typical sensor network application information is collected from large number of distributed and heterogeneous sensor nodes information fusion in such applications is challenging research issue due to the dynamicity heterogeneity and resource limitations of sensor networks we present midfusion an adaptive middleware architecture to facilitate information fusion in sensor network applications midfusion discovers and selects the best set of sensors or sensor agents on behalf of applications transparently depending on the quality of service qos guarantees and the cost of information acquisition we also provide the theoretical foundation for midfusion to select the best set of sensors using the principles of bayesian and decision theories sensor selection algorithm ssa for selecting the best set of sensors is presented in this paper our theoretical findings are validated through simulation of the ssa algorithm on an example scenario
the need for incremental algorithms for evaluating database queries is well known but constructing algorithms that work on object oriented databases oodbs has been difficult the reason is that oodb query languages involve complex data types including composite objects and nested collections as result existing algorithms have limitations in that the kinds of database updates are restricted the operations found in many query languages are not supported or the algorithms are too complex to be described precisely we present an incremental computation algorithm that can handle any kind of database updates can accept any expressions in complex query languages such as oql and can be described precisely by translating primitive values and records into collections we can reduce all query expressions to comprehensions this makes the problems with incremental computation less complicated and thus allows us to describe our algorithm which consists of two parts one is to maintain the consistency in each comprehension occurrence and the other is to update the value of an entire expression the algorithm is so flexible that we can use strict updates lazy updates and their combinations by comparing the performance of applications built with our mechanism and that of equivalent hand written update programs we show that our incremental algorithm can be implemented efficiently
in this paper we present novel framework for direct volume rendering using splatting approach based on elliptical gaussian kernels to avoid aliasing artifacts we introduce the concept of resampling filter combining reconstruction with low pass kernel because of the similarity to heckbert’s ewa elliptical weighted average filter for texture mapping we call our technique ewa volume splatting it provides high image quality without aliasing artifacts or excessive blurring even with non spherical kernels hence it is suitable for regular rectilinear and irregular volume data sets moreover our framework introduces novel approach to compute the footprint function it facilitates efficient perspective projection of arbitrary elliptical kernels at very little additional cost finally we show that ewa volume reconstruction kernels can be reduced to surface reconstruction kernels this makes our splat primitive universal in reconstructing surface and volume data
database researchers have striven to improve the capability of database in terms of both performance and functionality we assert that the usability of database is as important as its capability in this paper we study why database systems today are so difficult to use we identify set of five pain points and propose research agenda to address these in particular we introduce presentation data model and recommend direct data manipulation with schema later approach we also stress the importance of provenance and of consistency across presentation models
we describe the construction of generic natural language query interface to an xml database our interface can accept large class of english sentences as query which can be quite complex and include aggregation nesting and value joins among other things this query is translated potentially after reformulation into an xquery expression the translation is based on mapping grammatical proximity of natural language parsed tokens in the parse tree of the query sentence to proximity of corresponding elements in the xml data to be retrieved iterative search in the form of followup queries is also supported our experimental assessment through user study demonstrates that this type of natural language interface is good enough to be usable now with no restrictions on the application domain
surface reconstruction from unorganized sample points is an important problem in computer graphics computer aided design medical imaging and solid modeling recently few algorithms have been developed that have theoretical guarantee of computing topologically correct and geometrically close surface under certain condition on sampling density unfortunately this sampling condition is not always met in practice due to noise non smoothness or simply due to inadequate sampling this leads to undesired holes and other artifacts in the output surface certain cad applications such as creating prototype from model boundary require water tight surface ie no hole should be allowed in the surface in this paper we describe simple algorithm called tight cocone that works on an initial mesh generated by popular surface reconstruction algorithm and fills up all holes to output water tight surface in doing so it does not introduce any extra points and produces triangulated surface interpolating the input sample points in support of our method we present experimental results with number of difficult data sets
real world data are often stored as relational database systems with different numbers of significant attributes unfortunately most classification techniques are proposed for learning from balanced non relational data and mainly for classifying one single attribute in this paper we propose an approach for learning from relational data with the specific goal of classifying multiple imbalanced attributes in our approach we extend relational modelling technique prms im designed for imbalanced relational learning to deal with multiple imbalanced attributes classification we address the problem of classifying multiple imbalanced attributes by enriching the prms im with the bagging classification ensemble we evaluate our approach on real world imbalanced student relational data and demonstrate its effectiveness in predicting student performance
in this paper we present results on real data focusing on personal identification based on one lead ecg using reduced number of heartbeat waveforms wide range of features can be used to characterize the ecg signal trace with application to personal identification we apply feature selection fs to the problem with the dual purpose of improving the recognition rate and reducing data dimensionality feature subspace ensemble method fse is described which uses an association between fs and parallel classifier combination techniques to overcome some fs difficulties with this approach the discriminative information provided by multiple feature subspaces determined by means of fs contributes to the global classification system decision leading to improved classification performance furthermore by considering more than one heartbeat waveform in the decision process through sequential classifier combination higher recognition rates were obtained
we describe bag of rectangles method for representing and recognizing human actions in videos in this method each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body then spatial oriented histograms are formed to represent the distribution of these rectangular patches in order to carry the information from the spatial domain described by the bag of rectangles descriptor to temporal domain for recognition of the actions four different methods are proposed these are namely i frame by frame voting which recognizes the actions by matching the descriptors of each frame ii global histogramming which extends the idea of motion energy image proposed by bobick and davis by rectangular patches iii classifier based approach using svms and iv adaptation of dynamic time warping on the temporal representation of the descriptor the detailed experiments are carried out on the action dataset of blank et al high success rates prove that with very simple and compact representation we can achieve robust recognition of human actions compared to complex representations
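Of the four methods listed, frame-by-frame voting is the simplest to spell out; the sketch below matches each frame descriptor to its nearest labelled training frame and takes a majority vote over the sequence. The 16-bin random descriptors stand in for the paper's oriented-rectangle histograms and are assumptions made purely for the example.

```python
import numpy as np
from collections import Counter

def classify_sequence(frames, train_frames, train_labels):
    """Frame-by-frame voting: match every frame descriptor (e.g. a
    spatial histogram of oriented rectangles) to its nearest labelled
    training frame, then take a majority vote over the sequence."""
    votes = []
    for f in frames:
        d = np.linalg.norm(train_frames - f, axis=1)   # nearest neighbour
        votes.append(train_labels[int(np.argmin(d))])
    return Counter(votes).most_common(1)[0][0]

# toy data: 16-bin descriptors for two actions
rng = np.random.default_rng(2)
train_frames = np.vstack([rng.normal(0, 1, (50, 16)),
                          rng.normal(3, 1, (50, 16))])
train_labels = ["wave"] * 50 + ["walk"] * 50
test_seq = rng.normal(3, 1, (20, 16))
print(classify_sequence(test_seq, train_frames, train_labels))   # "walk"
```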
we present reputation scheme for pseudonymous peer to peer pp system in an anonymous network misbehavior is one of the biggest problems in pseudonymous pp systems where there is little incentive for proper behavior in our scheme using ecash for reputation points the reputation of each user is closely related to his real identity rather than to his current pseudonym thus our scheme allows an honest user to switch to new pseudonym keeping his good reputation while hindering malicious user from erasing his trail of evil deeds with new pseudonym
most recently answer set programming asp has been attracting interest as new paradigm for problem solving an important aspect for which several approaches have been presented is the handling of preferences between rules in this paper we consider the problem of implementing preference handling approaches by means of meta interpreters in answer set programming in particular we consider the preferred answer set approaches by brewka and eiter by delgrande schaub and tompits and by wang zhou and lin we present suitable meta interpreters for these semantics using dlv which is an efficient engine for asp moreover we also present meta interpreter for the weakly preferred answer set approach by brewka and eiter which uses the weak constraint feature of dlv as tool for expressing and solving an underlying optimization problem we also consider advanced meta interpreters which make use of graph based characterizations and often allow for more efficient computations our approach shows the suitability of asp in general and of dlv in particular for fast prototyping this can be fruitfully exploited for experimenting with new languages and knowledge representation formalisms
the paper presents review of references in content based image retrieval the paper starts with discussing the working conditions of content based retrieval patterns of use types of pictures the role of semantics and the sensory gap subsequent sections discuss computational steps for image retrieval systems step one of the review is image processing for retrieval sorted by color texture and local geometry features for retrieval are discussed next sorted by accumulative and global features salient points object and shape features signs and structural combinations thereof similarity of pictures and objects in pictures is reviewed for each of the feature types in close connection to the types and means of feedback the user of the systems is capable of giving by interaction we briefly discuss aspects of system engineering databases system architecture and evaluation in the concluding section we present our view on the driving force of the field the heritage from computer vision the influence on computer vision the role of similarity and of interaction the need for databases the problem of evaluation and the role of the semantic gap
in this paper we propose op tkc order preserving top closed itemsets algorithm for mining top frequent closed itemsets our methodology visits the closed itemsets lattice in breadth first manner and generates all the top closed itemsets without generating all the closed itemsets of given dataset ie in the search space only closed itemsets that belongs to top are expanded and all other closed itemsets are pruned off our algorithm computes all the top closed itemsets with space complexity where is the dataset experiments involving publicly available datasets show that our algorithm takes less memory and running time than tfp algorithm
this paper is concerned with computing graph edit distance one of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance hence our aim is to convert graphs to string sequences so that string matching techniques can be used to do this we use graph spectral seriation method to convert the adjacency matrix into string or sequence order we show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix we pose the problem of graph matching as maximum posteriori probability map alignment of the seriation sequences for pairs of graphs this treatment leads to an expression in which the edit cost is the negative logarithm of the posteriori sequence alignment probability we compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice the edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched we demonstrate the utility of the edit distance on number of graph clustering problems
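A rough sketch of the pipeline, heavily simplified: order each graph's vertices by the leading eigenvector of its adjacency matrix, turn the ordering into a sequence (here, vertex degrees listed in seriation order), and compare sequences with a standard dynamic-programming edit distance. The unit edit costs and the degree sequences are simplifications introduced for the example; the paper derives costs from eigenvector components and edge densities and frames the alignment as a maximum a posteriori probability problem.

```python
import numpy as np

def seriation(adj):
    """Order vertices by the leading eigenvector of the adjacency matrix
    (a simple stand-in for the paper's spectral seriation step)."""
    vals, vecs = np.linalg.eigh(adj)
    lead = np.abs(vecs[:, np.argmax(vals)])
    return np.argsort(-lead)

def to_sequence(adj):
    # represent each graph as its vertex degrees in seriation order
    adj = np.asarray(adj, dtype=float)
    return list(adj.sum(axis=1)[seriation(adj)])

def edit_distance(s, t):
    """Classic string edit distance (unit costs) between two sequences."""
    m, n = len(s), len(t)
    d = np.zeros((m + 1, n + 1))
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i, j] = min(d[i - 1, j] + 1,                    # deletion
                          d[i, j - 1] + 1,                    # insertion
                          d[i - 1, j - 1] + (s[i - 1] != t[j - 1]))
    return d[m, n]

g1 = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
g2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(edit_distance(to_sequence(g1), to_sequence(g2)))
```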
accuracy is most important data quality dimension and its assessment is key issue in data management most of current studies focus on how to qualitatively analyze accuracy dimension and the analysis depends heavily on experts knowledge little work has been done on how to automatically quantify the accuracy dimension based on jensen shannon divergence jsd measure we propose accuracy of data can be automatically quantified by comparing data with its entity’s most approximation in available context to quickly identify most approximation in large scale data sources locality sensitive hashing lsh is employed to extract most approximation at multiple levels namely column record and field level our approach can not only give each data source an objective accuracy score very quickly as long as context member is available but also avoid human’s laborious interaction as an automatic accuracy assessment solution in multiple source environment our approach is distinguished especially for large scale data sources theory and experiment show our approach performs well in achieving metadata on accuracy dimension
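The divergence-based scoring step can be illustrated directly: the sketch below scores a source column against a reference (closest-approximation) column by the Jensen-Shannon divergence of their value distributions, with 1 meaning identical. It omits the LSH-based search for approximations and the record- and field-level variants described above; `column_accuracy_score` is an assumed helper name, not the paper's API.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def column_accuracy_score(column, reference):
    """Score a source column against its reference column by comparing
    value distributions: 1 means identical, 0 means maximally divergent
    (JSD is bounded by 1 when using log base 2)."""
    values = sorted(set(column) | set(reference))
    p = [column.count(v) for v in values]
    q = [reference.count(v) for v in values]
    return 1.0 - js_divergence(p, q)

src = ["ny", "ny", "nyc", "boston"]
ref = ["ny", "ny", "ny", "boston"]
print(round(column_accuracy_score(src, ref), 3))
```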
we present new role system in which the type or role of each object depends on its referencing relationships with other objects with the role changing as these relationships change roles capture important object and data structure properties and provide useful information about how the actions of the program interact with these properties our role system enables the programmer to specify the legal aliasing relationships that define the set of roles that objects may play the roles of procedure parameters and object fields and the role changes that procedures perform while manipulating objects we present an interprocedural compositional and context sensitive role analysis algorithm that verifies that program maintains role constraints
owing to long serving time and huge numbers of clients internet services can easily suffer from transient faults although restarting service can solve this problem information of the on line requests will be lost owing to the service restart which is unacceptable for many commercial or transaction based services in this paper we propose an approach to achieve the goal of zero loss restart for internet services under this approach kernel subsystem is responsible for detecting the transient faults retaining the sol channels of the service and managing the service restart flow in addition some straightforward modifications to the service should be made to take advantage of the kernel support to demonstrate the feasibility of our approach we implemented the subsystem in the linux kernel moreover we modified web server and cgi program to take advantage of the kernel support according to the experimental results our approach incurs little runtime overhead ie less than percent when the service crashes it can be restarted quickly ie within mu with no information loss furthermore the performance impact due to the service crash is small these results show that the approach can efficiently achieve the goal of zero loss restart for internet services
we describe the use of flexible meta interpreter for performing access control checks on deductive databases the meta program is implemented in prolog and takes as input database and an access policy specification we then proceed to specialise the meta program for given access policy and intensional database by using the logen partial evaluation system in addition to describing the programs involved in our approach we give number of performance measures for our implementation of an access control checker and we discuss the implications of using this approach for access control on deductive databases in particular we show that by using our approach we get flexible access control with virtually zero overhead
our goal is to simulate the full hair geometry consisting of approximately one hundred thousand hairs on typical human head this will require scalable methods that can simulate every hair as opposed to only few guide hairs novel to this approach is that the individual hair hair interactions can be modeled with physical parameters friction static attraction etc at the scale of single hair as opposed to clumped or continuum interactions in this vein we first propose new altitude spring model for preventing collapse in the simulation of volumetric tetrahedra and we show that it is also applicable both to bending in cloth and torsion in hair we demonstrate that this new torsion model for hair behaves in fashion similar to more sophisticated models with significantly reduced computational cost for added efficiency we introduce semi implicit discretization of standard springs that makes them truly linear in multiple spatial dimensions and thus unconditionally stable without requiring newton raphson iteration we also simulate complex hair hair interactions including sticking and clumping behavior collisions with objects eg head and shoulders and self collisions notably in line with our goal to simulate the full head of hair we do not generate any new hairs at render time
we describe novel approach to inferring curves from perspective drawings in an interactive design tool our methods are based on traditional design drawing style known as analytic drawing which supports precise image space construction of linear scaffold this scaffold in turn acts as set of visual constraints for sketching curves we implement analytic drawing techniques in pure inference sketching interface which supports both single and multi view incremental construction of complex scaffolds and curve networks new representation of drawings is proposed and useful interactive drawing aids are described novel techniques are presented for deriving constraints from single view sketches drawn relative to the current scaffold and then inferring line and curve geometry which satisfies these constraints the resulting analytic drawing tool allows drawings to be constructed using exactly the same strokes as one would make on paper
our research is aimed at characterizing understanding and exploiting the interactions between hardware and software to improve system performance we have developed a paradigm for continuous program optimization cpo that assists in and automates the challenging task of performance tuning and we have implemented an initial prototype of this paradigm at the core of our implementation is a performance and environment monitoring pem component that vertically integrates performance events from various layers in the execution stack cpo agents use the data provided by pem to detect diagnose and alleviate performance problems on existing systems in addition cpo can be used to improve future architecture designs by analyzing pem data collected on a whole system simulator while varying architectural characteristics in this paper we present the cpo paradigm describe an initial implementation that includes pem as a component and discuss two cpo clients
in many applications it is desirable to cluster high dimensional data along various subspaces which we refer to as projective clustering we propose a new objective function for projective clustering taking into account the inherent trade off between the dimension of a subspace and the induced clustering error we then present an extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces and also propose techniques to avoid local minima unlike previous algorithms ours can choose the dimension of each cluster independently and automatically furthermore experimental results show that our algorithm is significantly more accurate than the previous approaches
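To make the notion of clustering "along various subspaces" concrete, the following toy Python sketch runs a k-means-style loop in which each cluster fits its own low-dimensional subspace (via PCA on its current members) and points are reassigned by projection residual; this only illustrates the idea of per-cluster subspaces with per-cluster dimensions, and is not the authors' objective function or algorithm.

```python
import numpy as np

def projective_kmeans(X, k, dims, n_iter=20, seed=0):
    """Toy projective clustering: each cluster keeps its own dims[j]-dimensional
    subspace (top principal directions of its members) and points are assigned
    to the cluster whose subspace reconstructs them with the smallest residual."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        centers, bases = [], []
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:                       # re-seed empty clusters
                pts = X[rng.integers(0, n, size=1)]
            mu = pts.mean(axis=0)
            # top principal directions of the cluster define its subspace
            _, _, vt = np.linalg.svd(pts - mu, full_matrices=False)
            centers.append(mu)
            bases.append(vt[:dims[j]])
        # residual of projecting each point onto each cluster's affine subspace
        errs = np.empty((n, k))
        for j in range(k):
            diff = X - centers[j]
            proj = diff @ bases[j].T @ bases[j]
            errs[:, j] = np.linalg.norm(diff - proj, axis=1)
        labels = errs.argmin(axis=1)
    return labels, centers, bases
```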
resource management is one of the focus areas in grid computing research one major objective of resource management in a computational grid environment is to allocate jobs to make efficient use of the computational resources under different resource providers and thereby achieve high performance of the jobs therefore performance analysis is also an important issue in grid resource management this paper aims at providing an integrated framework for performance based resource management in a computational grid environment the framework is supported by a multi agent system mas which has been developed using a firm software engineering approach based on the gaia methodology the mas provides an adaptive execution facility either by rescheduling the jobs onto different resource providers or through application of local tuning techniques to jobs in case of any performance problem which may be due to some change in the resource availability or usage scenario thus it always guarantees to maintain the quality of service as desired by the client
boosting is a set of methods for the construction of classifier ensembles the differential feature of these methods is that they allow a strong classifier to be obtained from the combination of weak classifiers therefore it is possible to use boosting methods with very simple base classifiers one of the simplest classifiers is the decision stump a decision tree with only one decision node this work proposes a variant of the most well known boosting method adaboost it is based on considering as the base classifiers for boosting not only the last weak classifier but a classifier formed by the last few selected weak classifiers whose number is a parameter of the method if the weak classifiers are decision stumps the combination of weak classifiers is a decision tree the ensembles obtained with the variant are formed by the same number of decision stumps as the original adaboost hence the original version and the variant produce classifiers with very similar sizes and computational complexities for training and classification the experimental study shows that the variant is clearly beneficial
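For reference, a compact Python sketch of the standard AdaBoost baseline with decision stumps as weak learners is shown below; the proposed variant (reusing the last few selected stumps as a small tree) is not implemented here, and the code is purely illustrative.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the feature, threshold and polarity that
    minimise the weighted error for labels in {-1, +1}."""
    n, d = X.shape
    best = (np.inf, 0, 0.0, 1)                     # (error, feature, threshold, polarity)
    for f in range(d):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) > 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(X, f, thr, pol):
    return np.where(pol * (X[:, f] - thr) > 0, 1, -1)

def adaboost_stumps(X, y, rounds=50):
    """Plain AdaBoost with decision stumps as the weak learner."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = fit_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, f, thr, pol)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, f, thr, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)
```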
statically typed aspect oriented programming languages restrict application of around advice only to the join points that have conforming types though the restriction guarantees type safety it can prohibit application of advice that is useful yet does not cause runtime type errors to this problem we present novel weaving mechanism called the type relaxed weaving that allows such advice applications while preserving type safety we formalized the mechanism and implemented as an aspectj compatible compiler called relaxaj
in order to facilitate sketch recognition most existing online works assume that people will not start to draw a new symbol before the current one has been finished we propose in this paper a method that relaxes this constraint the proposed methodology relies on a two dimensional dynamic programming dp technique allowing symbol hypothesis generation which can correctly segment and recognize interspersed symbols in addition as discriminative classifiers usually have limited capability to reject outliers some domain specific knowledge is included to circumvent those errors due to untrained patterns corresponding to erroneous segmentation hypotheses with point level measurement the experiment shows that the proposed novel approach is able to achieve high accuracy
automatic extraction of semantic information from text and links in web pages is key to improving the quality of search results however the assessment of automatic semantic measures is limited by the coverage of user studies which do not scale with the size heterogeneity and growth of the web here we propose to leverage human generated metadata namely topical directories to measure semantic relationships among massive numbers of pairs of web pages or topics the open directory project classifies millions of urls in topical ontology providing rich source from which semantic relationships between web pages can be derived while semantic similarity measures based on taxonomies trees are well studied the design of well founded similarity measures for objects stored in the nodes of arbitrary ontologies graphs is an open problem this paper defines an information theoretic measure of semantic similarity that exploits both the hierarchical and non hierarchical structure of an ontology an experimental study shows that this measure improves significantly on the traditional taxonomy based approach this novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity surprisingly the traditional use of text similarity turns out to be ineffective for relevance ranking
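As background for the "traditional taxonomy based approach" mentioned above, one widely used tree-based information-theoretic similarity is Lin's measure, reproduced here only for context (the paper's contribution is a generalization to arbitrary ontology graphs):

```latex
% Lin's taxonomy-based similarity between topics t_1 and t_2,
% where t_0 is their lowest common ancestor in the tree and
% p(t) is the probability that a randomly chosen object is classified under t.
\sigma(t_1, t_2) = \frac{2 \log p(t_0)}{\log p(t_1) + \log p(t_2)}
```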
as the number of cores in cmps increases the noc is projected to be the dominant communication fabric the increase in the number of cores brings an important issue to the forefront the issue of chip power consumption which is projected to increase rapidly with the increase in number of cores since the noc infrastructure contributes significantly to the total chip power consumption reducing noc power is crucial while circuit level techniques are important in reducing noc power architectural and software level approaches can be very effective in optimizing power consumption any such power saving technique should be scalable and have minimal adverse impact on performance we propose a dynamic communication link usage based proactive link power management scheme this scheme using a markov model proactively manages communication link turn ons and turn offs which results in negligible performance degradation and significant power savings we show that our prediction scheme is highly accurate for the spec omp benchmarks and across all applications experimented this accuracy helps us achieve substantial link power savings while incurring very low performance penalties on average
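A toy sketch of the proactive idea is given below: a per-link two-state (busy/idle) Markov predictor trained from observed usage windows, which suggests switching the link off when the predicted idle probability is high. The state space, thresholds and wake-up handling of the actual scheme are not modeled here; everything in the code is an assumption for illustration.

```python
from collections import defaultdict

class LinkUsagePredictor:
    """Toy two-state Markov predictor for a NoC link: states are 'busy'/'idle'
    per observation window; the link is proactively switched off when the
    predicted probability of being idle in the next window exceeds a threshold."""

    def __init__(self, off_threshold=0.9):
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[s][s'] = transitions
        self.prev = None
        self.off_threshold = off_threshold

    def observe(self, busy: bool):
        state = 'busy' if busy else 'idle'
        if self.prev is not None:
            self.counts[self.prev][state] += 1
        self.prev = state

    def p_idle_next(self) -> float:
        if self.prev is None:
            return 0.0
        row = self.counts[self.prev]
        total = sum(row.values())
        return row['idle'] / total if total else 0.0

    def should_power_off(self) -> bool:
        return self.p_idle_next() >= self.off_threshold
```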
high instruction cache hit rates are key to high performance one known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of the basic blocks of the code however the performance impact of this technique has been reported for application code only even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found we address this problem in this paper this paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality unfortunately caches are not able to extract much of it rarely executed special case code disrupts spatial locality loops with few iterations that call routines make loop locality hard to exploit and plenty of loop less code hampers temporal locality based on our observations we propose an algorithm to expose these localities and reduce interference in the cache for a range of cache sizes associativities line sizes and organizations we show that we substantially reduce total instruction miss rates using a simple model this corresponds to significant execution time reductions in addition our optimized operating system combines well with optimized and unoptimized applications
reasoning about the correctness of replication algorithm is difficult endeavor if correctness has to be shown for component based architecture where client request can lead to execution across different components or tiers this is even more difficult existing formalisms are either restricted to systems with only one component or make strong assumptions about the setup of the system in this paper we present flexible framework to reason about exactly once execution in failure prone replicated component based system our approach allows us to reason about the execution across the entire system eg application server and database tier if given replication algorithm makes assumptions about some of the components then those can be easily integrated into the reasoning process
optimizing the common case has been an adage in decades of processor design practices however as the system complexity and optimization techniques sophistication have increased substantially maintaining correctness under all situations however unlikely is contributing to the necessity of extra conservatism in all layers of the system design the mounting process voltage and temperature variation concerns further add to the conservatism in setting operating parameters excessive conservatism in turn hurts performance and efficiency in the common case however much of the system's complexity comes from advanced performance features and may not compromise the whole system's functionality and correctness even if some components are imperfect and introduce occasional errors we propose to separate performance goals from the correctness goal using an explicitly decoupled architecture in this paper we discuss one such incarnation where an independent core serves as an optimistic performance enhancement engine that helps accelerate the correctness guaranteeing core by passing high quality predictions and performing accurate prefetching the lack of concern for correctness in the optimistic core allows us to optimize its execution in a more effective fashion than is possible in optimizing a monolithic core with correctness requirements we show that such a decoupled design allows significant optimization benefits and is much less sensitive to conservatism applied in the correctness domain
nguyen and shparlinski have recently presented a polynomial time algorithm that provably recovers the signer's secret dsa key when a few consecutive bits of the random nonces used at each signature generation are known for a number of dsa signatures at most linear in log q where q denotes as usual the small prime of dsa under a reasonable assumption on the hash function used in dsa the number of required bits is about log^(1/2) q but can be decreased to log log q with a running time q^(o(1/log log q)) subexponential in log q and even further to two in polynomial time if one assumes access to ideal lattice basis reduction namely an oracle for the lattice closest vector problem for the infinity norm all previously known results were only heuristic including those of howgrave graham and smart who introduced the topic here we obtain similar results for the elliptic curve variant of dsa ecdsa
data aggregation reduces energy consumption by reducing the number of message transmissions in sensor networks effective aggregation requires that event messages be routed along common paths while existing routing protocols provide many ways to construct the aggregation tree this opportunistic style of aggregation is usually not optimal the minimal steiner tree mst maximises the possible degree of aggregation but finding such a tree requires global knowledge of the network which is not practical in sensor networks in this paper we propose the adaptive aggregation tree aat to dynamically transform the structure of the routing tree to improve the efficiency of data aggregation it adapts to changes in the set of source nodes automatically and approaches the cost savings of mst without explicit maintenance of an infrastructure the evaluation results show that aat reduces the communication energy consumption compared to the shortest path tree and to gpsr
ubiquitous computing aims to enhance computer use by utilizing many computer resources available through physical environments but also making them invisible to users the purpose of ubiquitous computing is anywhere and anytime access to information within computing infrastructures that is blended into the background and no longer noticed this ubiquitous computing poses new security challenges while the information can be accessed anywhere and anytime because it may be applied by criminal users the information may contain private information that cannot be shared by all user communities several approaches are designed to protect information for pervasive environments however ad hoc mechanisms or protocols are typically added in the approaches by compromising disorganized policies or additional components to protect from unauthorized access usage control has been considered as the next generation access control model with distinguishing properties of decision continuity in this paper we present a usage control model to protect services and devices in ubiquitous computing environments which allows the access restrictions directly on services and object documents the model not only supports complex constraints for pervasive computing such as services devices and data types but also provides a mechanism to build rich reuse relationships between models and objects finally comparisons with related works are analysed
dynamic aspect oriented programming aop technologies typically provide coarse grained mechanisms for adapting aspects that cross cut system deployment ie whole aspect modules can be added and removed at runtime however in this paper we demonstrate that adaptation of the finer grained elements of individual aspect modules is required in highly dynamic systems and applications we present aspectopencom principled reflection based component framework that provides meta object protocol capable of fine grained adaptation of deployed aspects we then evaluate this solution by eliciting set of requirements for dynamic fine grained adaptation from series of case studies and illustrate how the framework successfully meets these criteria we also investigate the performance gains of fine grained adaptation versus coarse grained approach
since security is of critical importance for modern storage systems it is imperative to protect stored data from being tampered with or disclosed although an increasing number of secure storage systems have been developed there is no way to dynamically choose security services to meet disk requests' flexible security requirements furthermore existing security techniques for disk systems are not suitable to guarantee desired response times of disk requests we remedy this situation by proposing an adaptive strategy referred to as awards that can judiciously select the most appropriate security service for each write request while endeavoring to guarantee the desired response times of all disk requests to prove the efficiency of the proposed approach we build an analytical model to measure the probability that a disk request is completed before its desired response time the model also can be used to derive the expected value of disk requests' security levels empirical results based on synthetic workloads as well as real intensive applications show that awards significantly improves overall performance over an existing scheme
this paper presents an offline partial evaluator for the calculus with the delimited continuation constructs shift and reset based on danvy and filinski’s type system for shift and reset we first present type system that specifies well annotated terms we then show specializer that receives an annotated term and produces the output in continuation passing style cps the correctness of our partial evaluator is established using the technique of logical relations thanks to the explicit reference to the type of continuations we can establish the correctness using the standard proof technique of structural induction despite the fact that the specializer itself is written in cps the paper also shows an efficient constraint based binding time analysis as well as how to extend the present work to richer language constructs such as recursion and conditionals
skyline query is of great importance in many applications such as multi criteria decision making and business planning in particular skyline point is data object in the database whose attribute vector is not dominated by that of any other objects previous methods to retrieve skyline points usually assume static data objects in the database ie their attribute vectors are fixed whereas several recent work focus on skyline queries with dynamic attributes in this paper we propose novel variant of skyline queries namely metric skyline whose dynamic attributes are defined in the metric space ie not limited to the euclidean space we illustrate an efficient and effective pruning mechanism to answer metric skyline queries through metric index extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques over the metric index in answering metric skyline queries
this work approaches the problem of recognizing emotional facial expressions in static images focusing on three preprocessing techniques for feature extraction such as principal component analysis pca linear discriminant analysis lda and gabor filters these methods are commonly used for face recognition and the novelty consists in combining features provided by them in order to improve the performance of an automatic procedure for recognizing emotional facial expressions testing and recognition accuracy were performed on the japanese female facial expression jaffe database using a multi layer perceptron mlp neural network as classifier the best classification accuracy on variations of facial expressions included in the training set was obtained by combining pca and lda features whereas combining pca lda and gabor filter features gave the best correct classification rate on facial expressions of subjects not included in the training set
software product lines of industrial size can easily incorporate thousands of variation points this scale of variability can become extremely complex to manage resulting in product development process that bears significant costs one technique that can be applied beneficially in this context is visualisation visualisation is widely used in software engineering and has proven useful to amplify human cognition in data intensive applications adopting this technique in software product line engineering can help stakeholders in supporting essential work tasks and in enhancing their understanding of large and complex product lines the research presented in this paper describes an integrated meta model and research tool that employs visualisation techniques to address significant software product line tasks such as variability management and product derivation examples of the tasks are described and the ways in which these tasks can be further supported by utilising visualisation techniques are explained
we present generalization of the ideal model for recursive polymorphic types types are defined as sets of terms instead of sets of elements of semantic domain our proof of the existence of types computed by fixpoint of typing operator does not rely on metric properties but on the fact that the identity is the limit of sequence of projection terms this establishes connection with the work of pitts on relational properties of domains this also suggests that ideals are better understood as closed sets of terms defined by orthogonality with respect to set of contexts
tablet pcs are gaining popularity but many older adults still struggle with pointing particularly with two error types missing landing and lifting outside the target bounds and slipping landing on the target but slipping off before lifting to solve these problems we examined the feasibility of extending and combining existing techniques designed for younger users and the mouse focusing our investigation on the bubble cursor and steady clicks techniques through laboratory experiment with younger and older adults we showed that both techniques can be adapted for use in pen interface and that combining the two techniques provides greater support than either technique on its own though our results were especially pertinent to the older group both ages benefited from the designs we also found that technique performance depended on task context from these findings we established guidelines for technique selection
we propose new fault block model minimal connected component mcc for fault tolerant adaptive routing in mesh connected multiprocessor systems this model refines the widely used rectangular model by including fewer nonfaulty nodes in fault blocks the positions of source destination nodes relative to faulty nodes are taken into consideration when constructing fault blocks the main idea behind it is that node will be included in fault block only if using it in routing will definitely make the route nonminimal the resulting fault blocks are of the rectilinear monotone polygonal shapes sufficient and necessary condition is proposed for the existence of the minimal manhattan routes in the presence of such fault blocks based on the condition an algorithm is proposed to determine the existence of manhattan routes since mcc is designed to facilitate minimal route finding if there exists no minimal route under mcc fault model then there will be absolutely no minimal route whatsoever we will also present two adaptive routing algorithms that construct manhattan route avoiding all fault blocks should such routes exist
we develop new algorithms for the management of transactions in page shipping client server database system in which the physical database is organized as sparse tree index our starvation free fine grained locking protocol combines adaptive callbacks with key range locking and guarantees repeatable read level isolation ie serializability for transactions containing any number of record insertions record deletions and key range scans partial and total rollbacks of client transactions are performed by the client each structure modification such as page split or merge is defined as an atomic action that affects only two levels of the tree and is logged using single redo only log record so that the modification never needs to be undone during transaction rollback or restart recovery the steal and no force buffering policy is applied by the server when flushing updated pages onto disk and by the clients when shipping updated data pages to the server while pages involved in structure modification are forced to the server when the modification is finished the server performs the restart recovery from client and system failures using an aries csa based recovery protocol our algorithms avoid accessing stale data but allow data page to be updated by one client transaction and read by many other client transactions simultaneously and updates may migrate from data page to another in structure modifications caused by other transactions while the updating transaction is still active
due to name abbreviations identical names name misspellings and pseudonyms in publications or bibliographies citations an author may have multiple names and multiple authors may share the same name such name ambiguity affects the performance of document retrieval web search database integration and may cause improper attribution to authors this paper investigates two supervised learning approaches to disambiguate authors in the citations one approach uses the naive bayes probability model generative model the other uses support vector machines svms and the vector space representation of citations discriminative model both approaches utilize three types of citation attributes co author names the title of the paper and the title of the journal or proceeding we illustrate these two approaches on two types of data one collected from the web mainly publication lists from homepages the other collected from the dblp citation databases
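A minimal Python sketch of the generative (naive Bayes) flavor of the first approach is shown below, using the three citation attributes named in the abstract; the attribute encoding, Laplace smoothing and token handling are assumptions for illustration, not the authors' exact model.

```python
import math
from collections import Counter, defaultdict

def tokens(citation):
    """A citation is a dict with 'coauthors', 'title', 'venue'; prefix tokens by
    attribute so the three types of evidence stay distinct."""
    return ([f"coauthor:{c}" for c in citation["coauthors"]]
            + [f"title:{w}" for w in citation["title"].lower().split()]
            + [f"venue:{w}" for w in citation["venue"].lower().split()])

def train_nb(labeled):
    """labeled: list of (citation, author_id). Returns priors, token counts, vocabulary."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for cit, author in labeled:
        class_counts[author] += 1
        for t in tokens(cit):
            token_counts[author][t] += 1
            vocab.add(t)
    return class_counts, token_counts, vocab

def predict_author(cit, class_counts, token_counts, vocab):
    """Pick the author maximising log prior + sum of smoothed log likelihoods."""
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for author, c in class_counts.items():
        score = math.log(c / total)
        denom = sum(token_counts[author].values()) + len(vocab)
        for t in tokens(cit):
            score += math.log((token_counts[author][t] + 1) / denom)  # Laplace smoothing
        if score > best_score:
            best, best_score = author, score
    return best
```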
in this paper we address the problem of constructing an index for text document or collection of documents to answer various questions about the occurrences of pattern when allowing constant number of errors in particular our index can be built to report all occurrences all positions or all documents where pattern occurs in time linear in the size of the query string and the number of results this improves over previous work where the look up time was either not linear or depended upon the size of the document corpus our data structure has size o(n log^d n) on average and with high probability for input size n and queries with up to d errors additionally we present trade off between query time and index complexity that achieves worst case bounded index size and preprocessing time with linear look up time on average
warp processing is a recent computing technology capable of autonomously partitioning the critical kernels within an executing software application to hardware circuits implemented within an on chip fpga while previous performance driven warp processing has been shown to provide significant performance improvements over software only execution the dynamic performance improvement of warp processors may be lost for certain application domains such as real time systems alternatively as power consumption continues to become a dominant design constraint we present and thoroughly analyze a low power warp processing methodology that leverages voltage and or frequency scaling to substantially reduce power consumption without any performance degradation all without requiring designer effort beyond the initial software development
making queries to database system through computer application can become repetitive and time consuming task for those users who generally make similar queries to get the information they need to work with we believe that interface agents could help these users by personalizing the query making and information retrieval tasks interface agents are characterized by their ability to learn users interests in given domain and to help them by making suggestions or by executing tasks on their behalf having this purpose in mind we have developed an agent named queryguesser to assist users of computer applications in which retrieving information from database is key task this agent observes user’s behavior while he is working with the database and builds the user’s profile then queryguesser uses this profile to suggest the execution of queries according to the user’s habits and interests and to provide the user information relevant to him by making time demanding queries in advance or by monitoring the events and operations occurring in the database system in this way the interaction between database users and databases becomes personalized while it is enhanced
mobile phones are set to become the universal interface to online services and cloud computing applications however using them for this purpose today is limited to two configurations applications either run on the phone or run on the server and are remotely accessed by the phone these two options do not allow for customized and flexible service interaction limiting the possibilities for performance optimization as well in this paper we present middleware platform that can automatically distribute different layers of an application between the phone and the server and optimize variety of objective functions latency data transferred cost etc our approach builds on existing technology for distributed module management and does not require new infrastructures in the paper we discuss how to model applications as consumption graph and how to process it with number of novel algorithms to find the optimal distribution of the application modules the application is then dynamically deployed on the phone in an efficient and transparent manner we have tested and validated our approach with extensive experiments and with two different applications the results indicate that the techniques we propose can significantly optimize the performance of cloud applications when used from mobile phones
large heterogeneous volumes of simulation data are calculated and stored in many disciplines eg in climate and climate impact research to gain insight current climate analysis applies statistical methods and model sensitivity analyzes in combination with standard visualization techniques however there are some obstacles for researchers in applying the full functionality of sophisticated visualization exploiting the available interaction and visualization functionality in order to go beyond data presentation tasks in particular there is gap between available and actually applied multi variate visualization techniques furthermore visual data comparison of simulation and measured data is still challenging task consequently this paper introduces library of visualization techniques tailored to support exploration and evaluation of climate simulation data these techniques are integrated into the easy to use visualization framework simenvvis designed as front end user interface to simulation environment which provides high level of user support generating visual representations
when addressing the formal validation of generated software two main alternatives consist either to prove the correctness of compilers or to directly validate the generated code here we focus on directly proving the correctness of compiled code issued from powerful pattern matching constructions typical of ml like languages or rewrite based languages such as elan maude or tom in this context our first contribution is to define general framework for anchoring algebraic pattern matching capabilities in existing languages like java or ml then using just enough powerful intermediate language we formalize the behavior of compiled code and define the correctness of compiled code with respect to pattern matching behavior this allows us to prove the equivalence of compiled code correctness with generic first order proposition whose proof could be achieved via proof assistant or an automated theorem prover we then extend these results to the multi match situation characteristic of the ml like languages the whole approach has been implemented on top of the tom compiler and used to validate the syntactic matching code of the tom compiler itself
as more interactive surfaces enter public life casual interactions from passersby are bound to increase most of these users can be expected to carry mobile phone or pda which nowadays offers significant computing capabilities of its own this offers new possibilities for interaction between these users private displays and large public ones in this paper we present system that supports such casual interactions we first explore method to track mobile phones that are placed on horizontal interactive surface by examining the shadows which are cast on the surface this approach detects the presence of mobile device as opposed to any other opaque object through the signal strength emitted by the built in bluetooth transceiver without requiring any modifications to the devices software or hardware we then go on to investigate interaction between sudoku game running in parallel on the public display and on mobile devices carried by passing users mobile users can join running game by placing their devices on designated area the only requirement is that the device is in discoverable bluetooth mode after specific device has been recognized client software is sent to the device which then enables the user to interact with the running game finally we explore the results of study which we conducted to determine the effectiveness and intrusiveness of interactions between users on the tabletop and users with mobile devices
the notion of data warehouse for integrating operational data into single repository is rapidly becoming popular in modern organizations an important issue in this context is how often one should synchronize the data warehouse to reflect the changes in the constituent operational data sources if the synchronization is performed very frequently the associated cost might be quite high although the data warehouse would only have small amount of stale data on the other hand if the data warehouse is synchronized infrequently it might result in costly errors in business decisions arising from the stale data this paper examines the trade off between the synchronization and staleness costs and derives the optimal synchronization frequency
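One simplified way to formalize this trade-off (an illustrative model, not necessarily the one derived in the paper): if each synchronization costs C_s, staleness accrues cost at rate c_d per unit of average staleness, and the warehouse is refreshed every T time units so that average staleness is roughly T/2, then

```latex
\text{cost rate}(T) = \frac{C_s}{T} + c_d\,\frac{T}{2},
\qquad
\frac{d}{dT}\,\text{cost rate}(T) = -\frac{C_s}{T^2} + \frac{c_d}{2} = 0
\;\Longrightarrow\;
T^{*} = \sqrt{\frac{2\,C_s}{c_d}}
```

so a higher per-sync cost pushes the optimum toward less frequent refreshes, while a higher staleness penalty pushes it toward more frequent ones.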
since annotations were added to the java language many frameworks have moved to using annotated plain old java objects pojos in their newest releases legacy applications are thus forced to undergo extensive restructuring in order to migrate from old framework versions to new versions based on annotations version lock in additionally because annotations are embedded in the application code changing between framework vendors may also entail large scale manual changes vendor lock in this paper presents a novel refactoring approach that effectively solves these two problems our approach infers a concise set of semantics preserving transformation rules from two versions of a single class unlike prior approaches that detect only simple structural refactorings our algorithm can infer general composite refactorings and is highly accurate on average we demonstrate the effectiveness of our approach by automatically upgrading the unit testing code of four open source java applications to use the latest version of the popular junit testing framework
group signatures allow users to anonymously sign messages in the name of a group membership revocation has always been a critical issue in such systems boneh and shacham formalized the concept of group signatures with verifier local revocation where revocation messages are only sent to signature verifiers as opposed to both signers and verifiers this paper presents an efficient verifier local revocation group signature vlr gs providing backward unlinkability ie previously issued signatures remain anonymous even after the signer’s revocation with a security proof in the standard model ie without resorting to the random oracle heuristic
buffered coscheduled mpi bcs mpi introduces new approach to design the communication layer for large scale parallel machines the emphasis of bcs mpi is on the global coordination of large number of communicating processes rather than on the traditional optimization of the point to point performance bcs mpi delays the inter processor communication in order to schedule globally the communication pattern and it is designed on top of minimal set of collective communication primitives in this paper we describe prototype implementation of bcs mpi and its communication protocols several experimental results executed on set of scientific applications show that bcs mpi can compete with production level mpi implementation but is much simpler to implement debug and model
several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast accurate approximate answers to user queries while conventional wavelet synopses are based on greedily minimizing the overall root mean squared ie l2 norm error in the data approximation recent work has demonstrated that such synopses can suffer from important problems including severe bias and wide variance in the quality of the data reconstruction and lack of non trivial guarantees for individual approximate answers as a result probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control other approximation error metrics such as the maximum relative error in data value reconstruction which is arguably the most important for approximate query answers and meaningful error guarantees one of the main open problems posed by this earlier work is whether it is possible to design efficient deterministic wavelet thresholding algorithms for minimizing non l2 error metrics that are relevant to approximate query processing systems such as maximum relative or maximum absolute error obviously such algorithms can guarantee better wavelet synopses and avoid the pitfalls of probabilistic techniques eg bad coin flip sequences leading to poor solutions in this paper we address this problem and propose novel computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing maximum error metrics we introduce an optimal low polynomial time algorithm for one dimensional wavelet thresholding our algorithm is based on a new dynamic programming dp formulation and can be employed to minimize the maximum relative or absolute error in the data reconstruction unfortunately directly extending our one dimensional dp algorithm to multi dimensional wavelets results in a super exponential increase in time complexity with the data dimensionality thus we also introduce novel polynomial time approximation schemes with tunable approximation guarantees for the target maximum error metric for deterministic wavelet thresholding in multiple dimensions
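To ground what a wavelet synopsis is, the following small Python example computes a one-dimensional (unnormalized) Haar decomposition and its inverse; a synopsis keeps only B of these coefficients, and the paper's contribution is the deterministic choice of which ones so that the maximum (relative or absolute) reconstruction error is minimized, which this sketch does not implement.

```python
import numpy as np

def haar_decompose(data):
    """One-dimensional (unnormalised) Haar wavelet decomposition of a
    power-of-two length array: returns [overall average] + detail coefficients."""
    a = np.asarray(data, dtype=float)
    details = []
    while len(a) > 1:
        avg = (a[0::2] + a[1::2]) / 2.0
        det = (a[0::2] - a[1::2]) / 2.0
        details = list(det) + details       # coarser-level details go in front
        a = avg
    return [a[0]] + details

def haar_reconstruct(coeffs):
    """Exact inverse of haar_decompose."""
    a = np.array(coeffs[:1], dtype=float)
    pos = 1
    while pos < len(coeffs):
        det = np.array(coeffs[pos:pos + len(a)], dtype=float)
        nxt = np.empty(2 * len(a))
        nxt[0::2] = a + det
        nxt[1::2] = a - det
        a, pos = nxt, pos + len(det)
    return a

data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)               # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
assert np.allclose(haar_reconstruct(coeffs), data)
# a synopsis would keep only B coefficients (setting the rest to 0) before reconstructing
```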
due to the high dynamic frequency of virtual method calls in typical object oriented programs feedback directed devirtualization and inlining is one of the most important optimizations performed by high performance virtual machines critical input to effective feedback directed inlining is an accurate dynamic call graph in virtual machine the dynamic call graph is computed online during program execution therefore to maximize overall system performance the profiling mechanism must strike balance between profile accuracy the speed at which the profile becomes available to the optimizer and profiling overhead this paper introduces new low overhead sampling based technique that rapidly converges on high accuracy dynamic call graph we have implemented the technique in two high performance virtual machines jikes rvm and we empirically assess our profiling technique by reporting on the accuracy of the dynamic call graphs it computes and by demonstrating that increasing the accuracy of the dynamic call graph results in more effective feedback directed inlining
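The sampling idea can be illustrated in Python as below: record only a random fraction of call events into a weighted (caller, callee) edge multiset. A real virtual machine samples via timer or counter-based mechanisms inside the runtime rather than sys.setprofile, so this is purely illustrative and not the paper's mechanism.

```python
import random
import sys
from collections import Counter

class SampledCallGraph:
    """Builds an approximate dynamic call graph by recording only a random
    fraction of call events, trading accuracy for low profiling overhead."""

    def __init__(self, sample_rate=0.01):
        self.sample_rate = sample_rate
        self.edges = Counter()          # (caller, callee) -> sampled count

    def _profile(self, frame, event, arg):
        if event == "call" and random.random() < self.sample_rate:
            callee = frame.f_code.co_name
            caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
            self.edges[(caller, callee)] += 1

    def __enter__(self):
        sys.setprofile(self._profile)
        return self

    def __exit__(self, *exc):
        sys.setprofile(None)
        return False

# usage: with SampledCallGraph(0.05) as cg: run_workload()
# cg.edges then gives weighted call edges an optimizer could use for inlining decisions
```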
one of the data services offered within the cords service environment is multidatabase fulfilling this service requirement is the mandate of the cords multidatabase project involving researchers at the university of waterloo and queen’s university one objective of this project is to research issues in multidatabase systems thus far these include schema integration global query decomposition and optimization and distributed transaction management second objective is to design and implement multidatabase prototype the latter objective presented an opportunity to explore and assess number of international and industry standards for example microsoft’s open database connectivity odbc and the open software foundation’s distributed computing environment osf dce this paper discusses the overall design of the multidatabase the research issues addressed and the status of the prototype implementation
bounded reachability or model checking is widely believed to work poorly when using decision diagrams instead of sat procedures recent research suggests this to be untrue with regards to synchronous systems particularly digital circuits this paper shows that the belief is also myth for asynchronous systems such as models specified by petri nets we propose bounded saturation new algorithm to compute bounded state spaces using multi way decision diagrams mdds this is based on the established saturation algorithm which benefits from non standard search strategy that is very different from breadth first search to bound saturation we employ edge valued mdds and rework its search strategy experimental results show that our algorithm often but not always compares favorably against two sat based approaches advocated in the literature for deadlock checking in petri nets
consider a scientist who wants to explore multiple data sets to select the relevant ones for further analysis since the visualization real estate may put a stringent constraint on how much detail can be presented to this user in a single page effective table summarization techniques are needed to create summaries that are both sufficiently small and effective in communicating the available content in this paper we first argue that table summarization can benefit from knowledge about acceptable value clustering alternatives for clustering the values in the database we formulate the problem of table summarization with the help of value lattices we then provide a framework to express alternative clustering strategies and to account for various utility measures such as information loss in assessing different summarization alternatives based on this interpretation we introduce three preference criteria max min util cautious max sum util cumulative and pareto util for the problem of table summarization to tackle the inherent complexity we rely on the properties of the fuzzy interpretation to further develop a novel ranked set cover based evaluation mechanism rsc these are brought together in an alphasum table summarization system experimental evaluations showed that rsc improves both execution times and the summary qualities in alphasum by pruning the search space more effectively than the existing solutions
since catalogs are dynamic autonomous and heterogeneous the integration of potentially large number of dynamic catalogs is delicate and time consuming task in this paper we describe the design and the implementation of system through which existing on line product catalogs can be integrated and the resulting integrated catalogs can be continuously adapted and personalized within dynamic environment the integration framework originates from previous project on integration of web data called webfindit using the framework we propose methodology for adaptation of integrated catalogs based on the observation of customers' interaction patterns
fundamental problem that confronts peer to peer applications is the efficient location of the node that stores desired data item this paper presents chord distributed lookup protocol that addresses this problem chord provides support for just one operation given key it maps the key onto node data location can be easily implemented on top of chord by associating key with each data item and storing the key data pair at the node to which the key maps chord adapts efficiently as nodes join and leave the system and can answer queries even if the system is continuously changing results from theoretical analysis and simulations show that chord is scalable communication cost and the state maintained by each node scale logarithmically with the number of chord nodes
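A minimal Python sketch of the key-to-node mapping described above (consistent hashing onto an identifier ring, with each key stored at its successor) is shown below; the finger tables that give Chord its logarithmic lookups and the node join/leave protocol are omitted, and the hash width and node names are illustrative.

```python
import bisect
import hashlib

M = 16                       # identifier space is [0, 2**M)

def chord_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

class Ring:
    """Static view of Chord's key -> node mapping: every key is stored at its
    successor, the first node clockwise from the key's identifier."""

    def __init__(self, nodes):
        self.ids = sorted(chord_id(n) for n in nodes)
        self.by_id = {chord_id(n): n for n in nodes}

    def successor(self, key: str) -> str:
        kid = chord_id(key)
        i = bisect.bisect_left(self.ids, kid)
        return self.by_id[self.ids[i % len(self.ids)]]   # wrap around the ring

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.successor("some-data-item"))   # node responsible for this key
```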
understanding large software systems is difficult traditionally automated tools are used to assist program understanding however the representations constructed by these tools often require prohibitive time and space demand driven techniques can be used to reduce these requirements however the use of pointers in modern languages introduces additional problems that do not integrate well with these techniques we present new techniques for effectively coping with pointers in large software systems written in the programming language and use our techniques to implement program slicing tool first we use fast flow insensitive points to analysis before traditional data flow analysis second we allow the user to parameterize the points to analysis so that the resulting program slices more closely match the actual program behavior such information cannot easily be obtained by the tool or might otherwise be deemed unsafe finally we present data flow equations for dealing with pointers to local variables in recursive programs these equations allow the user to select an arbitrary amount of calling context in order to better trade performance for precision to validate our techniques we present empirical results using our program slicer on large programs the results indicate that cost effective analysis of large programs with pointers is feasible using our techniques
this paper explores the relationship between domain scheduling in a virtual machine monitor vmm and performance traditionally vmm schedulers have focused on fairly sharing the processor resources among domains while leaving the scheduling of resources as a secondary concern however this can result in poor and or unpredictable application performance making virtualization less desirable for applications that require efficient and consistent behavior this paper is the first to study the impact of the vmm scheduler on performance using multiple guest domains concurrently running different types of applications in particular different combinations of processor intensive bandwidth intensive and latency sensitive applications are run concurrently to quantify the impacts of different scheduler configurations on processor and performance these applications are evaluated on different scheduler configurations within the xen vmm these configurations include variety of scheduler extensions aimed at improving performance this cross product of scheduler configurations and application types offers insight into the key problems in vmm scheduling for and motivates future innovation in this area
conceptual models are well known tools to achieve good design of information systems nevertheless the understanding and use of all the constructs and constraints which are presented in such models are not an easy task and sometimes it is cause of loss of interest in this chapter we have tried to study in depth and clarify the meaning of the features of conceptual models the disagreements between main conceptual models the confusion in the use of some of their constructs and some open problems in these models are shown another important topic treated in this chapter is the conceptual to logic schemata transformation process some solutions are presented in order to clarify the relationship construct and to extend the cardinality constraint concept in ternary relationships how to preserve the cardinality constraint semantics in binary and ternary relationships for their implementation in dbms with active capabilities has also been developed
future socs will contain multiple cores for workloads with significant parallelism prior work has shown the benefit of many small multi threaded scalar cores for workloads that require better single thread performance dedicated larger core can help but comes at large opportunity cost in the number of scalar cores that could be provisioned instead this paper proposes way to repurpose pair of scalar cores into way out of order issue core with minimal area overhead federating scalar cores in this way nevertheless achieves comparable performance to dedicated out of order core and dissipates less power as well
we present optimal algorithms for several fundamental problems on planar graphs our main contribution is an efficient algorithm for computing small vertex separator of an unweighted planar graph this algorithm is superior to all existing external memory algorithms for this problem as it requires neither breadth first search tree nor an embedding of the graph as part of the input in fact we derive optimal algorithms for planar embedding breadth first search depth first search single source shortest paths and computing weighted separators of planar graphs from our unweighted separator algorithm
this paper addresses some of the foundational issues associated with discovering the best few correlations from database specifically we consider the computational complexity of various definitions of the top correlation problem where the goal is to discover the few sets of events whose co occurrence exhibits the smallest degree of independence our results show that many rigorous definitions of correlation lead to intractable and strongly inapproximable problems proof of this inapproximability is significant since similar problems studied by the computer science theory community have resisted such analysis one goal of the paper and for future research is to develop alternative correlation metrics whose use will both allow efficient search and produce results that are satisfactory for users
while fine grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems their flexibility has generally resulted in poor execution efficiency in such languages the computation consists of many small threads which are created dynamically and synchronized implicitly in order to minimize the overhead of these operations we propose a hybrid execution model which dynamically adapts to runtime data layout providing both sequential efficiency and low overhead parallel execution this model uses separately optimized sequential and parallel versions of code sequential efficiency is obtained by dynamically coalescing threads via stack based execution and parallel efficiency through latency hiding and cheap synchronization using heap allocated activation frames novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages the responsibility to reply is passed along like call/cc in scheme on the stack in addition the hybrid execution model is expressed entirely in and therefore is easily portable to many systems experiments with function call intensive programs show that this model achieves sequential efficiency comparable to programs experiments with regular and irregular application kernels on the cm and td demonstrate that it can yield significantly better performance than code optimized for parallel execution alone
the existing predictive spatiotemporal indexes can be classified into two categories depending on whether they are based on the primal or dual methodology although we have gained considerable empirical knowledge about various access methods currently there is only limited understanding on the theoretical characteristics of the two methodologies in fact the experimental results in different papers even contradict each other regarding the relative superiority of the primal and dual techniques this paper presents careful study on the query performance of general primal and dual indexes and reveals important insight into the behavior of each technique in particular we mathematically establish the conditions that determine the superiority of each methodology and provide rigorous justification for well known observations that have not been properly explained in the literature our analytical findings also resolve the contradiction in the experiments of previous work
commerce applications have diverse security requirements ranging from business to business over business to consumer to consumer to consumer types of applications this range of requirements cannot be handled adequately by one single security model although role based access controls rbac provide a promising foundation for generic high level security furthermore rbac is well researched but rather incompletely realized in most of the current backend as well as business layer systems security mechanisms have often been added to existing software causing many of the well known deficiencies found in most software products however with the rise of component based software development security models can also be made available for reuse therefore we present a general purpose software framework providing security mechanisms such as authentication access controls and auditing for java software development the framework is called gamma generic authorization mechanisms for multi tier applications and offers multiple high level security models including the aforementioned rbac that may even be used concurrently to cover such diverse security requirements as found within commerce environments
a multi clip query requests that multiple video clips be returned as the answer of the query in many applications and situations the order in which these clips are to be delivered does not matter that much to the user this allows the system ample opportunities to optimize system throughput by using schedules that maximize the effect of piggybacking in this paper we study how to find such optimal schedules in particular we consider two optimization criteria the first based on maximizing the number of piggybacked clips and the second based on maximizing the impact on buffer space we show that the optimal schedule under the first criterion is equivalent to a maximum matching in a suitably defined bipartite graph and that under the second criterion the optimal schedule is equivalent to a maximum matching in a suitably defined weighted bipartite graph our experimental results which are based on realistic distributions indicate that both kinds of optimal schedules can lead to a substantial gain in throughput and yet the time taken to compute such an optimal schedule is negligible finally we show how to deal with clips that are variable in length
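Since the optimal schedule under the first criterion is equivalent to a maximum matching in a bipartite graph, here is a generic augmenting-path maximum bipartite matching in Python; the construction of the actual graph (which pending clips can piggyback on which ongoing deliveries) is the paper's contribution and is only hinted at in the comments.

```python
def max_bipartite_matching(adj, n_right):
    """adj[u] lists the right-side vertices left vertex u may be matched to.
    Classic augmenting-path algorithm, O(V * E)."""
    match_right = [-1] * n_right               # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matched = 0
    for u in range(len(adj)):
        if try_augment(u, [False] * n_right):  # fresh 'seen' per left vertex
            matched += 1
    return matched, match_right

# e.g. left = pending clip requests, right = deliveries they could piggyback on
adj = [[0, 1], [0], [1, 2]]
print(max_bipartite_matching(adj, 3))          # -> (3, [1, 0, 2]) for this toy graph
```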
this paper presents new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources we show the inadequacy of traditional distance estimation heuristics for tasks of this type and present alternative heuristics that are more appropriate for multiple goal search in particular we introduce the marginal utility heuristic which estimates the cost and the benefit of exploring subtree below search node we developed two methods for online learning of the marginal utility heuristic one is based on local similarity of the partial marginal utility of sibling nodes and the other generalizes marginal utility over the state feature space we apply our adaptive and non adaptive multiple goal search algorithms to several problems including focused crawling and show their superiority over existing methods
this paper describes the design implementation and testing of system for selecting necessary axioms from large set also containing superfluous axioms to obtain proof of conjecture the selection is determined by semantics of the axioms and conjecture ordered heuristically by syntactic relevance measure the system is able to solve many problems that cannot be solved alone by the underlying conventional automated reasoning system
in information retrieval sub space techniques are usually used to reveal the latent semantic structure of a data set by projecting it to a low dimensional space non negative matrix factorisation nmf which generates a non negative representation of data through matrix decomposition is one such technique it is different from other similar techniques such as singular value decomposition svd in its non negativity constraints which lead to its parts based representation characteristic in this paper we present the novel use of nmf in two tasks object class detection and automatic annotation of images experimental results imply that nmf is a promising sub space technique for discovering the latent structure of image data sets with the ability of encoding the latent topics that correspond to object classes in the basis vectors generated
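For concreteness, a standard formulation of NMF with Lee-Seung multiplicative updates (minimizing the Frobenius reconstruction error) is sketched below in Python; the paper applies NMF to image data and need not use these exact updates.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimising ||V - W H||_F^2 with V, W, H >= 0.
    V: (m, n) data matrix, W: (m, r) basis vectors, H: (r, n) encodings."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# columns of W act as non-negative basis vectors; H gives the parts-based encoding
```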
today’s internet is open and anonymous while it permits free traffic from any host attackers that generate malicious traffic cannot typically be held accountable in this paper we present system called hosttracker that tracks dynamic bindings between hosts and ip addresses by leveraging application level data with unreliable ids using month long user login trace from large email provider we show that hosttracker can attribute most of the activities reliably to the responsible hosts despite the existence of dynamic ip addresses proxies and nats with this information we are able to analyze the host population to conduct forensic analysis and also to blacklist malicious hosts dynamically
quorum system is formed by organizing nodes into subsets called quorums where every two quorums intersect and no quorum includes another quorum the quorum system reduces the access cost per operation to balance the load and to improve the system scalability all of these properties make quorum systems particularly attractive for large scale sensor applications involving data gathering and diffusion tasks the task of data dissemination with quorum system is the same as doing match making between quorums but those pseudo quorum systems designed for wired networks fail to address the challenges introduced by sensor networks in this paper novel data dissemination scheme mm gsq built on top of new quorum system in wireless sensor network is proposed the mm gsq uses new quorum system named spatial neighbor proxy quorum snpq which is evolved from the pseudo quorum but much smaller the snpq utilizes the geometric properties of the planar graph eg gg or rng graph and reduces the quorum access cost greatly as opposed to the traditional pseudo quorum the mm gsq improves energy consumption by reducing message transmissions and collisions increases the match making success rate and is easy to implement theoretical analyses and experimental results indicate that the new quorum system snpq with related data dissemination algorithm mm gsq has higher scalability energy efficiency and match making success rate than those of the original which means it is especially suitable for data dissemination in large scale wireless sensor networks wsns
many ranking algorithms applying machine learning techniques have been proposed in information retrieval and web search however most of existing approaches do not explicitly take into account the fact that queries vary significantly in terms of ranking and entail different treatments regarding the ranking models in this paper we apply divide and conquer framework for ranking specialization ie learning multiple ranking models by addressing query difference we first generate query representation by aggregating ranking features through pseudo feedbacks and employ unsupervised clustering methods to identify set of ranking sensitive query topics based on training queries to learn multiple ranking models for respective ranking sensitive query topics we define global loss function by combining the ranking risks of all query topics and we propose unified svm based learning process to minimize the global loss moreover we employ an ensemble approach to generate the ranking result for each test query by applying set of ranking models of the most appropriate query topics we conduct experiments using benchmark dataset for learning ranking functions as well as dataset from commercial search engine experimental results show that our proposed approach can significantly improve the ranking performance over existing single model approaches as well as straightforward local ranking approaches and the automatically identified ranking sensitive topics are more useful for enhancing ranking performance than pre defined query categorization
the framework of consistent query answers and repairs has been introduced to alleviate the impact of inconsistent data on the answers to query repair is minimally different consistent instance and an answer is consistent if it is present in every repair in this article we study the complexity of consistent query answers and repair checking in the presence of universal constraints we propose an extended version of the conflict hypergraph which allows to capture all repairs wrt set of universal constraints we show that repair checking is in ptime for the class of full tuple generating dependencies and denial constraints and we present polynomial repair algorithm this algorithm is sound ie always produces repair but also complete ie every repair can be constructed next we present polynomial time algorithm computing consistent answers to ground quantifier free queries in the presence of denial constraints join dependencies and acyclic full tuple generating dependencies finally we show that extending the class of constraints leads to intractability for arbitrary full tuple generating dependencies consistent query answering becomes conp complete for arbitrary universal constraints consistent query answering is complete and repair checking conp complete
the structure of the web is increasingly being used to improve organization search and analysis of information on the web for example google uses the text in citing documents documents that link to the target document for search we analyze the relative utility of document text and the text in citing documents near the citation for classification and description results show that the text in citing documents when available often has greater discriminative and descriptive power than the text in the target document itself the combination of evidence from document and citing documents can improve on either information source alone moreover by ranking words and phrases in the citing documents according to expected entropy loss we are able to accurately name clusters of web pages even with very few positive examples our results confirm quantify and extend previous research using web structure in these areas introducing new methods for classification and description of pages
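the following python sketch illustrates one common reading of ranking terms by expected entropy loss (information gain) with respect to cluster membership, computed from the text of citing documents. the toy documents and labels are invented and the exact weighting used in the paper may differ.

import math

def entropy(pos, neg):
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_entropy_loss(docs, labels):
    """docs: token sets from citing documents, labels: 1 if the cited page is in
    the cluster of interest; returns {term: expected entropy loss}."""
    n_pos, n_neg = sum(labels), len(labels) - sum(labels)
    prior = entropy(n_pos, n_neg)
    scores = {}
    for term in set().union(*docs):
        with_pos = sum(1 for d, y in zip(docs, labels) if term in d and y)
        with_neg = sum(1 for d, y in zip(docs, labels) if term in d and not y)
        without_pos, without_neg = n_pos - with_pos, n_neg - with_neg
        cond = ((with_pos + with_neg) * entropy(with_pos, with_neg)
                + (without_pos + without_neg) * entropy(without_pos, without_neg)) / len(docs)
        scores[term] = prior - cond
    return scores

if __name__ == "__main__":
    docs = [{"database", "query"}, {"database", "index"},    # invented toy data
            {"soccer", "league"}, {"soccer", "query"}]
    labels = [1, 1, 0, 0]
    ranked = sorted(expected_entropy_loss(docs, labels).items(), key=lambda kv: -kv[1])
    print(ranked[:3])   # "database" and "soccer" rank above the uninformative "query"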
in recent years role based access control rbac has been spreading within organizations however companies still have considerable difficulty migrating to this model due to the complexity involved in identifying set of roles fitting the real needs of the company all the various role engineering methods proposed thus far lack metric for measuring the quality of candidate roles produced this paper proposes new approach guided by cost based metric where cost represents the effort to administer the resulting rbac further we propose rbam role based association rule mining an algorithm leveraging the cost metric to find candidate role sets with the lowest possible administration cost for specific parameter set rbam behaves as already existing role mining algorithms and is worst case np complete yet we will provide several examples showing the sensibility of assumptions made by the algorithm further application of the algorithm to real data will highlight the improvements over current solutions finally we comment on the direction of future research
cast shadows can be significant in many computer vision applications such as lighting insensitive recognition and surface reconstruction nevertheless most algorithms neglect them primarily because they involve nonlocal interactions in nonconvex regions making formal analysis difficult however many real instances map closely to canonical configurations like wall groove type structure or pitted surface in particular we experiment with textures like moss gravel and kitchen sponge whose surfaces include canonical configurations like grooves this paper takes first step toward formal analysis of cast shadows showing theoretically that many configurations can be mathematically analyzed using convolutions and fourier basis functions our analysis exposes the mathematical convolution structure of cast shadows and shows strong connections to recent signal processing frameworks for reflection and illumination
we consider succinct or highly space efficient representations of static string consisting of pairs of balanced parentheses which support natural operations such as finding the matching parenthesis for given parenthesis or finding the pair of parentheses that most tightly enclose given pair this problem was considered by jacobson space efficient static trees and graphs in proc of the th focs pp and munro and raman succinct representation of balanced parentheses and static trees siam comput who gave bit and bit representations respectively that supported the above operations in time on the ram model of computation this data structure is fundamental tool in succinct representations and has applications in representing suffix trees ordinal trees planar graphs and permutations we consider the practical performance of parenthesis representations first we give new bit representation that supports all the above operations in time this representation is conceptually simpler its space bound has smaller term and it also has simple and uniform time and space construction algorithm we implement our data structure and variant of jacobson’s and evaluate their practical performance speed and memory usage when used in succinct representation of trees derived from xml documents as baseline we compare our representations against widely used implementation of the standard dom document object model representation of xml documents both succinct representations use orders of magnitude less space than dom and tree traversal operations are usually only slightly slower than in dom
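for readers who want the semantics of the two queries spelled out, the following plain python reference implementation answers find match and enclose by linear scans. it deliberately ignores the succinctness aspect: the structures discussed above answer the same queries in constant time with o(n) extra bits, which this sketch makes no attempt to do.

def find_match(s, i):
    """index of the parenthesis matching the one at position i (s is assumed balanced)."""
    step = 1 if s[i] == "(" else -1
    depth, j = 0, i
    while True:
        depth += 1 if s[j] == "(" else -1
        if depth == 0:
            return j
        j += step

def enclose(s, i):
    """opening index of the pair that most tightly encloses the pair opening at
    position i, or None if that pair is at the top level."""
    depth = 0
    for j in range(i - 1, -1, -1):
        depth += 1 if s[j] == "(" else -1
        if depth == 1:
            return j
    return None

if __name__ == "__main__":
    s = "(()(()))"
    print(find_match(s, 0), find_match(s, 3), enclose(s, 4))  # 7 6 3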
the quality of compiler optimized code for high performance applications is far behind what optimization and domain experts can achieve by hand although it may seem surprising at first glance the performance gap has been widening over time due to the tremendous complexity increase in microprocessor and memory architectures and to the rising level of abstraction of popular programming languages and styles this paper explores in between solutions neither fully automatic nor fully manual ways to adapt computationally intensive application to the target architecture by mimicking complex sequences of transformations useful to optimize real codes we show that generative programming is practical means to implement architecture aware optimizations for high performance applications this work explores the promises of generative programming languages and techniques for the high performance computing expert we show that complex architecture specific optimizations can be implemented in type safe purely generative framework peak performance is achievable through the careful combination of high level multi stage evaluation language metaocaml with low level code generation techniques nevertheless our results also show that generative approaches for high performance computing do not come without technical caveats and implementation barriers concerning productivity and reuse we describe these difficulties and identify ways to hide or overcome them from abstract syntaxes to heterogeneous generators of code generators combining high level and type safe multi stage programming with back end generator of imperative code
non interference is semantical condition on programs that guarantees the absence of illicit information flow throughout their execution and that can be enforced by appropriate information flow type systems much of previous work on type systems for noninterference has focused on calculi or high level programming languages and existing type systems for low level languages typically omit objects exceptions and method calls and or do not prove formally the soundness of the type system we define an information flow type system for sequential jvm like language that includes classes objects arrays exceptions and method calls and prove that it guarantees non interference for increased confidence we have formalized the proof in the proof assistant coq an additional benefit of the formalization is that we have extracted from our proof certified lightweight bytecode verifier for information flow our work provides to our best knowledge the first sound and implemented information flow type system for such an expressive fragment of the jvm
classification of streaming data faces three basic challenges it has to deal with huge amounts of data the varying time between two stream data items must be used best possible anytime classification and additional training data must be incrementally learned anytime learning for applying the classifier consistently to fast data streams in this work we propose novel index based technique that can handle all three of the above challenges using the established bayes classifier on effective kernel density estimators our novel bayes tree automatically generates adapted efficiently to the individual object to be classified hierarchy of mixture densities that represent kernel density estimators at successively coarser levels our probability density queries together with novel classification improvement strategies provide the necessary information for very effective classification at any point of interruption moreover we propose novel evaluation method for anytime classification using poisson streams and demonstrate the anytime learning performance of the bayes tree
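the following sketch is a loose two level caricature of the core idea above: bayes classification over class-conditional densities at different granularities, where level 0 uses a single gaussian summary per class and level 1 the full kernel density. the real bayes tree maintains a whole hierarchy of mixture densities and refines it adaptively under interruption, so the bandwidth, class structure and data below are purely illustrative.

import numpy as np

def gauss(x, mean, var):
    var = np.maximum(var, 1e-9)
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

class TwoLevelKdeBayes:
    """bayes classification over class-conditional densities at two granularities."""
    def __init__(self, bandwidth=0.5):
        self.h = bandwidth

    def fit(self, x, y):
        self.classes = {}
        for c in np.unique(y):
            pts = x[y == c]
            self.classes[c] = {"prior": len(pts) / len(x), "mean": pts.mean(axis=0),
                               "var": pts.var(axis=0) + 1e-9, "points": pts}
        return self

    def predict(self, q, level=1):
        best, best_score = None, -np.inf
        for c, m in self.classes.items():
            if level == 0:     # coarse: one gaussian summary per class
                dens = np.prod(gauss(q, m["mean"], m["var"]))
            else:              # fine: full kernel density over the stored points
                dens = np.prod(gauss(q, m["points"], self.h ** 2), axis=1).mean()
            if m["prior"] * dens > best_score:
                best, best_score = c, m["prior"] * dens
        return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    clf = TwoLevelKdeBayes().fit(x, y)
    print(clf.predict(np.array([3.5, 4.2]), level=0), clf.predict(np.array([3.5, 4.2]), level=1))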
we study the inference of data type definitions dtds for views of xml data using an abstraction that focuses on document content structure the views are defined by query language that produces list of documents selected from one or more input sources the selection conditions involve vertical and horizontal navigation thus querying explicitly the order present in input documents we point several strong limitations in the descriptive ability of current dtds and the need for extending them with subtyping mechanism and ii more powerful specification mechanism than regular languages such as context free languages with these extensions we show that one can always infer tight dtds that precisely characterize selection view on sources satisfying given dtds we also show important special cases where one can infer tight dtd without requiring extension ii finally we consider related problems such as verifying conformance of view definition with predefined dtd extensions to more powerful views that construct complex documents are also briefly discussed
because case based reasoning cbr is instance based it is vulnerable to noisy data other learning techniques such as support vector machines svms and decision trees have been developed to be noise tolerant so certain level of noise in the data can be condoned by contrast noisy data can have big impact in cbr because inference is normally based on small number of cases so far research on noise reduction has been based on majority rule strategy cases that are out of line with their neighbors are removed we depart from that strategy and use local svms to identify noisy cases this is more powerful than majority rule strategy because it explicitly considers the decision boundary in the noise reduction process in this paper we provide details on how such local svm strategy for noise reduction can be made to scale to very large datasets training samples the technique is evaluated on nine very large datasets and shows excellent performance when compared with alternative techniques
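under the simple reading that a case is noisy when a local svm trained only on its nearest neighbours disagrees with its stored label, the strategy can be sketched as follows with scikit-learn. the kernel, the neighbourhood size and the toy data are arbitrary choices, not the tuned and scaled-up setup evaluated in the paper.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def flag_noisy_cases(x, y, k=20):
    """flag case i as noisy when an svm trained on its k nearest neighbours
    (excluding the case itself) predicts a label different from y[i]."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(x).kneighbors(x)
    noisy = np.zeros(len(x), dtype=bool)
    for i, neigh in enumerate(idx):
        neigh = neigh[1:]                      # drop the case itself
        labels = y[neigh]
        if len(np.unique(labels)) == 1:
            pred = labels[0]                   # single-class neighbourhood
        else:
            clf = SVC(kernel="rbf", gamma="scale").fit(x[neigh], labels)
            pred = clf.predict(x[i:i + 1])[0]
        noisy[i] = pred != y[i]
    return noisy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    y[5] = 1                                   # inject one label-noise case
    print(np.flatnonzero(flag_noisy_cases(x, y, k=15)))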
according to different kinds of connectivity we can distinguish three types of mobile ad hoc networks dense sparse and clustered networks this paper is about modeling mobility in clustered networks where nodes are concentrated into clusters of dense connectivity and in between there exists sparse connectivity the dense and sparse networks are extensively studied and modeled but not much attention is paid to the clustered networks in the sparse and clustered networks an inherently important aspect is the mobility model both for the design and evaluation of routing protocols we propose new mobility model for clustered networks called heterogeneous random walk this model is simple mathematically tractable and most importantly it captures the phenomenon of emerging clusters observed in real partitioned networks in an elegant way we provide closed form expression for the stationary distribution of node position and we give recipe for the perfect simulation moreover based on the real mobility trace we provide strong evidence for the main macroscopic characteristics of clustered networks captured by the proposed mobility model for the very first time in the literature we show evidence for the correlation between the spatial speed distribution and the cluster formation we also present the results of the analysis of real cluster dynamics caused by nodes mobility
motion trajectory is meaningful and informative clue in characterizing the motions of human robots or moving objects hence it is important to explore effective motion trajectory modeling however with the existing methods motion trajectory is used in its raw data form and effective trajectory description is lacking in this paper we propose novel motion trajectory signature descriptor and develop three signature descriptions for motion characterization the flexible descriptions give the signature high functional adaptability to meet various application requirements in trajectory representation perception and recognition the full signature optimized signature and cluster signature are firstly defined for trajectory representation then we explore the motion perception from single signature inter signature matching and the generalization of cluster signature furthermore three solutions for signature recognition are investigated corresponding to different signature descriptions the conducted experiments verified the signature’s capabilities and flexibility the signature’s application to robot learning is also discussed
there is well known gap between systems oriented information retrieval ir and user oriented ir which cognitive ir seeks to bridge it is therefore interesting to analyze approaches at the level of frameworks models and study designs this article is an exercise in such an analysis focusing on two significant approaches to ir the lab ir approach and ingwersen’s cognitive ir approach the article focuses on their research frameworks models hypotheses laws and theories study designs and possible contributions the two approaches are quite different which becomes apparent in the use of independent controlled and dependent variables in the study designs of each approach thus each approach is capable of contributing very differently to understanding and developing information access the article also discusses integrating the approaches at the study design level
we present novel framework for motion segmentation that combines the concepts of layer based methods and feature based motion estimation we estimate the initial correspondences by comparing vectors of filter outputs at interest points from which we compute candidate scene relations via random sampling of minimal subsets of correspondences we achieve dense piecewise smooth assignment of pixels to motion layers using fast approximate graphcut algorithm based on markov random field formulation we demonstrate our approach on image pairs containing large inter frame motion and partial occlusion the approach is efficient and it successfully segments scenes with inter frame disparities previously beyond the scope of layer based motion segmentation methods we also present an extension that accounts for the case of non planar motion in which we use our planar motion segmentation results as an initialization for regularized thin plate spline fit in addition we present applications of our method to automatic object removal and to structure from motion
due to the development of efficient solvers declarative problem solving frameworks based on model generation are becoming more and more applicable in practice however there are almost no tools to support debugging in these frameworks for several reasons current solvers are not suitable for debugging by tracing in this paper we propose new solver algorithm for one of these frameworks namely model expansion that allows for debugging by tracing we explain how to explore the trace of this solver in order to quickly locate bug and we compare our debugging method with existing ones for answer set programming and the alloy system
managing complex software systems is one of the most important problems to be solved by software engineering the software engineer needs to apply new techniques that allow for their adequate manipulation software architecture is becoming an important part of software design helping the designer to handle the structure and the complexity of large systems and aosd is paradigm proposed to manage this complexity by considering crosscutting concerns throughout the software’s life cycle the suitability of the existence of an aspect oriented ao architectural design appears when ao concepts are extended to the whole life cycle in order to adequately specify the ao design aspect oriented architecture description languages are needed the formal basis of these will allow architects to reason about the properties of the software architecture in this paper new architecture description language aspectleda is formally described in order to adequately manipulate ao concepts at the software architecture stage the aspectleda translation process is also described toolkit assists the architect during the process finally prototype of the system can be obtained and the correctness of the architecture obtained can be checked
we present diagsplit parallel algorithm for adaptively tessellating displaced parametric surfaces into high quality crack free micropolygon meshes diagsplit modifies the split dice tessellation algorithm to allow splits along non isoparametric directions in the surface’s parametric domain and uses dicing scheme that supports unique tessellation factors for each subpatch edge edge tessellation factors are computed using only information local to subpatch edges these modifications allow all subpatches generated by diagsplit to be processed independently without introducing t junctions or mesh cracks and without incurring the tessellation overhead of binary dicing we demonstrate that diagsplit produces output that is better in terms of image quality and number of micropolygons produced than existing parallel tessellation schemes and as good as highly adaptive split dice implementations that are less amenable to parallelization
redundant threading architectures duplicate all instructions to detect and possibly recover from transient faults several lighter weight partial redundant threading prt architectures have been proposed recently opportunistic fault tolerance duplicates instructions only during periods of poor single thread performance ii restore does not explicitly duplicate instructions and instead exploits mispredictions among highly confident branch predictions as symptoms of faults iii slipstream creates reduced alternate thread by replacing many instructions with highly confident predictions we explore prt as possible direction for achieving the fault tolerance of full duplication with the performance of single thread execution opportunistic and restore yield partial coverage since they are restricted to using only partial duplication or only confident predictions respectively previous analysis of slipstream fault tolerance was cursory and concluded that only duplicated instructions are covered in this paper we attempt to better understand slipstream’s fault tolerance conjecturing that the mixture of partial duplication and confident predictions actually closely approximates the coverage of full duplication thorough dissection of prediction scenarios confirms that faults in nearly of instructions are detectable fewer than of faulty instructions are not detectable due to coincident faults and mispredictions next we show that the current recovery implementation fails to leverage excellent detection capability since recovery sometimes initiates belatedly after already retiring detected faulty instruction we propose and evaluate suite of simple microarchitectural alterations to recovery and checking using the best alterations slipstream can recover from faults in of instructions compared to only of instructions without alterations both results are much higher than predicted by past research which claims coverage for only duplicated instructions or of instructions on an issue smt processor slipstream performs within of single thread execution whereas full duplication slows performance by key byproduct of this paper is novel analysis framework in which every dynamic instruction is considered to be hypothetically faulty thus not requiring explicit fault injection fault coverage is measured in terms of the fraction of candidate faulty instructions that are directly or indirectly detectable before
the success of model checking for large programs depends crucially on the ability to efficiently construct parsimonious abstractions predicate abstraction is parsimonious if at each control location it specifies only relationships between current values of variables and only those which are required for proving correctness previous methods for automatically refining predicate abstractions until sufficient precision is obtained do not systematically construct parsimonious abstractions predicates usually contain symbolic variables and are added heuristically and often uniformly to many or all control locations at once we use craig interpolation to efficiently construct from given abstract error trace which cannot be concretized parsimonious abstraction that removes the trace at each location of the trace we infer the relevant predicates as an interpolant between the two formulas that define the past and the future segment of the trace each interpolant is relationship between current values of program variables and is relevant only at that particular program location it can be found by linear scan of the proof of infeasibility of the trace we develop our method for programs with arithmetic and pointer expressions and call by value function calls for function calls craig interpolation offers systematic way of generating relevant predicates that contain only the local variables of the function and the values of the formal parameters when the function was called we have extended our model checker blast with predicate discovery by craig interpolation and applied it successfully to programs with more than lines of code which was not possible with approaches that build less parsimonious abstractions
object representation in the inferior temporal cortex it an area of visual cortex critical for object recognition in the primate exhibits two prominent properties objects are represented by the combined activity of columnar clusters of neurons with each cluster representing component features or parts of objects and closely related features are continuously represented along the tangential direction of individual columnar clusters here we propose learning model that reflects these properties of parts based representation and topographic organization in unified framework this model is based on nonnegative matrix factorization nmf basis decomposition method nmf alone provides parts based representation where nonnegative inputs are approximated by additive combinations of nonnegative basis functions our proposed model of topographic nmf tnmf incorporates neighborhood connections between nmf basis functions arranged on topographic map and attains the topographic property without losing the parts based property of the nmf the tnmf represents an input by multiple activity peaks to describe diverse information whereas conventional topographic models such as the self organizing map som represent an input by single activity peak in topographic map we demonstrate the parts based and topographic properties of the tnmf by constructing hierarchical model for object recognition where the tnmf is at the top tier for learning high level object features the tnmf showed better generalization performance over nmf for data set of continuous view change of an image and more robustly preserving the continuity of the view change in its object representation comparison of the outputs of our model with actual neural responses recorded in the it indicates that the tnmf reconstructs the neuronal responses better than the som giving plausibility to the parts based learning of the model
given many known results on wireless network capacity practical and optimal capacity utilization remains an open question the existing link scheduling schemes in the literature are not applicable in large wireless networks because of their global operations for topology collection and transmission synchronization as network size increases global operations become infeasible to implement we propose in this paper localized link scheduling solution for achieving order optimal network capacity our method eliminates the global operations and improves significantly the practicality of scheduling implementation as the cost localized scheduling reduces the network capacity utilization however we prove that the reduction can be bounded by constant factor from the scaling order point of view we hence provide practical scheduling approach to optimize the network utilization
the need to automatically extract and classify the contents of multimedia data archives such as images video and text documents has led to significant work on similarity based retrieval of data to date most work in this area has focused on the creation of index structures for similarity based retrieval there is very little work on developing formalisms for querying multimedia databases that support similarity based computations and optimizing such queries even though it is well known that feature extraction and identification algorithms in media data are very expensive we introduce similarity algebra that brings together relational operators and results of multiple similarity implementations in uniform language the algebra can be used to specify complex queries that combine different interpretations of similarity values and multiple algorithms for computing these values we prove equivalence and containment relationships between similarity algebra expressions and develop query rewriting methods based on these results we then provide generic cost model for evaluating cost of query plans in the similarity algebra and query optimization methods based on this model we supplement the paper with experimental results that illustrate the use of the algebra and the effectiveness of query optimization methods using the integrated search engine isee as the testbed
tl and similar stm algorithms deliver high scalability based on write locking and invisible readers in fact no modern stm design locks to read along its common execution path because doing so would require memory synchronization operation that would greatly hamper performance in this paper we introduce tlrw new stm algorithm intended for the single chip multicore systems that are quickly taking over large fraction of the computing landscape we make the claim that the cost of coherence in such single chip systems is down to level that allows one to design scalable stm based on read write locks tlrw is based on byte locks novel read write lock design with low read lock acquisition overhead and the ability to take advantage of the locality of reference within transactions as we show tlrw has painfully simple design one that naturally provides coherent state without validation implicit privatization and irrevocable transactions providing similar properties in stms based on invisible readers such as tl has typically resulted in major loss of performance in series of benchmarks we show that when running on way single chip multicore machine tlrw delivers surprisingly good performance competitive with and sometimes outperforming tl however on way chip system that has higher coherence costs across the interconnect performance deteriorates rapidly we believe our work raises the question of whether on single chip multicore machines read write lock based stms are the way to go
speculative execution is an important technique that has historically been used to extract concurrency from sequential programs while techniques to support speculation work well when computations perform relatively simple actions eg reads and writes to known locations understanding speculation for multi threaded programs in which threads may communicate and synchronize through multiple shared references is significantly more challenging and is the focus of this paper we use as our reference point simple higher order concurrent language extended with an n way barrier and fork join execution model our technique permits the expression guarded by the barrier to speculatively proceed before the barrier has been satisfied ie before all threads that synchronize on that barrier have done so and to have participating threads that would normally block on the barrier to speculatively proceed as well our solution formulates safety properties under which speculation is correct in fork join model and per synchronization basis
raid storage systems protect data from storage errors such as data corruption using set of one or more integrity techniques such as checksums the exact protection offered by certain techniques or combination of techniques is sometimes unclear we introduce and apply formal method of analyzing the design of data protection strategies specifically we use model checking to evaluate whether common protection techniques used in parity based raid systems are sufficient in light of the increasingly complex failure modes of modern disk drives we evaluate the approaches taken by number of real systems under single error conditions and find flaws in every scheme in particular we identify parity pollution problem that spreads corrupt data the result of single error across multiple disks thus leading to data loss or corruption we further identify which protection measures must be used to avoid such problems finally we show how to combine real world failure data with the results from the model checker to estimate the actual likelihood of data loss of different protection strategies
we discuss the problem of path feasibility for programs manipulating strings using collection of standard string library functions we prove results on the complexity of this problem including its undecidability in the general case and decidability of some special cases in the context of test case generation we are interested in an efficient finite model finding method for string constraints to this end we develop two tier finite model finding procedure first an integer abstraction of string constraints are passed to an smt satisfiability modulo theories solver the abstraction is either unsatisfiable or the solver produces model that fixes lengths of enough strings to reduce the entire problem to be finite domain the resulting fixed length string constraints are then solved in second phase we implemented the procedure in symbolic execution framework report on the encouraging results and discuss directions for improving the method further
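the two tier procedure can be caricatured in pure python as follows: the first tier picks string lengths satisfying an integer abstraction of the constraints, and the second tier searches for concrete strings of exactly those lengths. a real implementation would hand the length constraints to an smt solver rather than enumerate them, and the path condition in the example is invented.

from itertools import product

ALPHABET = "abc"   # tiny alphabet, for illustration only

def solve(length_ok, string_ok, max_len=6):
    # tier 1: candidate length assignments (ls, lt) satisfying the integer abstraction
    for ls, lt in product(range(max_len + 1), repeat=2):
        if not length_ok(ls, lt):
            continue
        # tier 2: fixed-length search for concrete strings s and t
        for s in map("".join, product(ALPHABET, repeat=ls)):
            for t in map("".join, product(ALPHABET, repeat=lt)):
                if string_ok(s, t):
                    return s, t
    return None

if __name__ == "__main__":
    # invented path condition: s starts with "ab", s + t contains "bc", len(s) + len(t) == 5
    print(solve(lambda ls, lt: ls + lt == 5 and ls >= 2,
                lambda s, t: s.startswith("ab") and "bc" in s + t))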
the performance of memory hierarchies in which caches play an essential role is critical in today’s general purpose and embedded computing systems because of the growing memory bottleneck problem unfortunately cache behavior is very unstable and difficult to predict this is particularly true in the presence of irregular access patterns which exhibit little locality such patterns are very common for example in applications in which pointers or compressed sparse matrices give place to indirections nevertheless cache behavior in the presence of irregular access patterns has not been widely studied in this paper we present an extension of systematic analytical modeling technique based on pmes probabilistic miss equations previously developed by the authors that allows the automated analysis of the cache behavior for codes with irregular access patterns resulting from indirections the model generates very accurate predictions despite the irregularities and has very low computing requirements being the first model that gathers these desirable characteristics that can automatically analyze this kind of codes these properties enable this model to help drive compiler optimizations as we show with an example
an essential part of modern enterprise software development is metadata mainstream metadata formats including xml deployment descriptors and java annotations suffer from number of limitations that complicate the development and maintenance of enterprise applications their key problem is that they make it impossible to reuse metadata specifications not only across different applications but even across smaller program constructs such as classes or methods to provide better enterprise metadata we present pattern based structural expressions pbse novel metadata representation that offers conciseness and maintainability advantages and is reusable to apply pbse to enterprise applications we translate pbse specifications to java annotations with annotating classes automatically as an intermediate build step we demonstrate the advantages of the new metadata format by assessing its conciseness and reusability as compared to xml and annotations in the task of expressing metadata of jee reference applications and mid size commercial enterprise application
intelligent adaptation is key issue for the design of flexible support systems for mobile users in this paper we present ubiquito tourist guide which integrates different forms of adaptation to the device used web access via laptop pda smartphone ii to the user and her features and preferences personalized interaction iii to the context of interaction and in particular to the user location besides some other features such as the time of the day ubiquito adapts the content of the service being provided recommendation and amount type of information features associated with each recommendation and the presentation interface in order to achieve better performance it keeps track of the user behavior updating and refining the user model during the interaction in the paper we introduce the architecture of the system and the choices we made as regards user device and context modeling and adaptation strategies we also present the results of preliminary evaluation of the system behavior
in luca cardelli and peter wegner my advisor published an acm computing surveys paper called on understanding types data abstraction and polymorphism their work kicked off flood of research on semantics and type theory for object oriented programming which continues to this day despite years of research there is still widespread confusion about the two forms of data abstraction abstract data types and objects this essay attempts to explain the differences and also why the differences matter
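a small code illustration, not taken from the essay itself, may help fix the distinction: an abstract data type commits to one hidden representation whose operations may inspect both arguments, while objects are characterised only by the interface they answer to, so independently written implementations interoperate.

# adt flavour: one hidden representation, whose operations may look inside both arguments
class IntSetADT:
    def __init__(self, elems=()):
        self._rep = frozenset(elems)               # the single chosen representation
    def contains(self, n):
        return n in self._rep
    def union(self, other):
        return IntSetADT(self._rep | other._rep)   # inspects the other argument's rep

# object flavour: anything answering contains() is a set, so different
# implementations (even an infinite one) combine through the interface alone
class FiniteSet:
    def __init__(self, elems):
        self._elems = set(elems)
    def contains(self, n):
        return n in self._elems

class EvenNumbers:
    def contains(self, n):
        return n % 2 == 0

class UnionSet:
    def __init__(self, a, b):
        self._a, self._b = a, b
    def contains(self, n):
        return self._a.contains(n) or self._b.contains(n)

if __name__ == "__main__":
    s = UnionSet(FiniteSet({1, 3}), EvenNumbers())
    print(s.contains(3), s.contains(4), s.contains(5))  # True True False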
concurrency related bugs may happen when multiple threads access shared data and interleave in ways that do not correspond to any sequential execution their absence is not guaranteed by the traditional notion of data race freedom we present new definition of data races in terms of problematic interleaving scenarios and prove that it is complete by showing that any execution not exhibiting these scenarios is serializable for chosen set of locations our definition subsumes the traditional definition of data race as well as high level data races such as stale value errors and inconsistent views we also propose language feature called atomic sets of locations which lets programmers specify the existence of consistency properties between fields in objects without specifying the properties themselves we use static analysis to automatically infer those points in the code where synchronization is needed to avoid data races under our new definition an important benefit of this approach is that in general far fewer annotations are required than is the case with existing approaches such as synchronized blocks or atomic sections our implementation successfully inferred the appropriate synchronization for significant subset of java’s standard collections framework
in many learning tasks to obtain labeled instances is hard due to heavy cost while unlabeled instances can be easily collected active learners can significantly reduce labeling cost by only selecting the most informative instances for labeling graph based learning methods are popular in machine learning in recent years because of clear mathematical framework and strong performance with suitable models however they suffer heavy computation when the whole graph is in huge size in this paper we propose scalable algorithm for graph based active learning the proposed method can be described as follows in the beginning backbone graph is constructed instead of the whole graph then the instances in the backbone graph are chosen for labeling finally the instances with the maximum expected information gain are sampled repeatedly based on the graph regularization model the experiments show that the proposed method obtains smaller data utilization and average deficiency than other popular active learners on selected datasets from semi supervised learning benchmarks
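the backbone idea can be sketched roughly as follows: cluster the data, build a knn graph over the cluster centres only, propagate labels on that small graph, and query the centre whose prediction is most uncertain. entropy is used here as a crude stand-in for the expected information gain criterion, and the clustering, propagation scheme and parameters are illustrative guesses, not the algorithm evaluated in the paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def backbone_active_learning(x, oracle, n_backbone=50, k=5, queries=10, sweeps=30):
    """cluster x, build a knn backbone graph over the centres, propagate soft
    labels on it, and repeatedly query the most uncertain centre."""
    centres = KMeans(n_clusters=n_backbone, n_init=10, random_state=0).fit(x).cluster_centers_
    w = kneighbors_graph(centres, k, mode="connectivity", include_self=False).toarray()
    w = np.maximum(w, w.T)                       # symmetrise the backbone graph
    f = np.full((n_backbone, 2), 0.5)            # soft labels for two classes
    labelled = {}
    for _ in range(queries):
        for _ in range(sweeps):                  # graph regularisation by averaging
            f = (w @ f) / np.maximum(w.sum(1, keepdims=True), 1e-9)
            for i, y in labelled.items():
                f[i] = np.eye(2)[y]              # clamp already-queried nodes
        ent = -(f * np.log(f + 1e-12)).sum(1)    # uncertainty of each centre
        ent[list(labelled)] = -1.0               # never re-query
        i = int(ent.argmax())
        labelled[i] = oracle(centres[i])         # ask the oracle for a label
    return centres, f, labelled

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
    centres, f, asked = backbone_active_learning(x, oracle=lambda c: int(c[0] > 2))
    print(len(asked), "queries; first predicted classes:", f.argmax(1)[:10])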
current approaches for sequential pattern mining usually assume that the mining is performed in static sequence database however databases are not static due to update so that the discovered patterns might become invalid and new patterns could be created in addition to higher complexity the maintenance of sequential patterns is more challenging than that of association rules owing to sequence merging sequence merging which is unique in sequence databases requires the appended new sequences to be merged with the existing ones if their customer ids are the same re mining of the whole database appears to be inevitable since the information collected in previous discovery will be corrupted by sequence merging instead of re mining the proposed incsp incremental sequential pattern update algorithm solves the maintenance problem through effective implicit merging and efficient separate counting over appended sequences patterns found previously are incrementally updated rather than re mined from scratch moreover the technique of early candidate pruning further speeds up the discovery of new patterns empirical evaluation using comprehensive synthetic data shows that incsp is fast and scalable
applications that exploit the capabilities of sensor networks have triggered significant research on query processing in sensor systems energy constraints make optimizing query processing particularly important this article addresses multiroot multiquery optimization for region queries the work focuses on application layer issues exploiting query semantics the article formulates three algorithms naïve algorithm without data sharing and static and heuristic data sharing algorithm the heuristic algorithm allows sharing of partially aggregated results of preconfigured geographic regions and exploits the location attribute of sensor nodes as grouping criterion simulation studies indicate the potential for significant energy savings with the proposed algorithms
this paper starts from well known idea that structure in irregular problems improves sequential performance and tries to show that the same structure can also be exploited for parallelization of irregular problems on distributed memory multicomputer in particular we extend well known parallelization technique called run time compilation to use structure information that is explicit on the array subscripts this paper presents number of internal representations suited to particular access patterns and shows how various preprocessing structures such as translation tables trace arrays and interprocessor communication schedules can be encoded in terms of one or more of these representations we show how loop and index normalization are important for detection of irregularity in array references as well as the presence of locality in such references this paper presents methods for detection of irregularity feasibility of inspection and finally placement of inspectors and interprocessor communication schedules we show that this process can be automated through extensions to an hpf fortran distributed memory compiler paradigm and new run time support for irregular problems pilar that uses variety of internal representations of communication patterns we devise performance measures which consider the relationship between the inspection cost the execution cost and the number of times the executor is invoked so that comparison of the competing schemes can be performed independent of the number of iterations finally we show experimental results on an ibm sp that validate our approach these results show that dramatic improvements in both memory requirements and execution time can be achieved by using these techniques
previous research has shown that global multiple scattering simulation is needed to achieve physically realistic renderings of hair particularly light colored hair with low absorption however previous methods have either sacrificed accuracy or have been too computationally expensive for practical use in this paper we describe physically based volumetric rendering method that computes multiple scattering solutions including directional effects much faster than previous accurate methods our two pass method first traces light paths through volumetric representation of the hair contributing power to grid of spherical harmonic coefficients that store the directional distribution of scattered radiance everywhere in the hair volume then in ray tracing pass multiple scattering is computed by integrating the stored radiance against the scattering functions of visible fibers using an efficient matrix multiplication single scattering is computed using conventional direct illumination methods in our comparisons the new method produces quality similar to that of the best previous methods but computes multiple scattering more than times faster
we study the process in which search engines with segmented indices serve queries in particular we investigate the number of result pages that search engines should prepare during the query processing phase search engine users have been observed to browse through very few pages of results for queries that they submit this behavior of users suggests that prefetching many results upon processing an initial query is not efficient since most of the prefetched results will not be requested by the user who initiated the search however policy that abandons result prefetching in favor of retrieving just the first page of search results might not make optimal use of system resources either we argue that for certain behavior of users engines should prefetch constant number of result pages per query we define concrete query processing model for search engines with segmented indices and analyze the cost of such prefetching policies based on these costs we show how to determine the constant that optimizes the prefetching policy our results are mostly applicable to local index partitions of the inverted files but are also applicable to processing short queries in global index architectures
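a back-of-envelope version of choosing the prefetch constant can be written as follows under a deliberately simplified cost model: each round trip to an index segment costs a fixed overhead plus a per-page cost, users view a random number of result pages, and we pick the k that minimises expected cost. the distribution and costs in the example are invented and the paper’s cost model is more detailed.

import math

def expected_cost(k, view_dist, fixed_cost, page_cost):
    """view_dist maps 'pages eventually viewed' to its probability."""
    return sum(p * math.ceil(v / k) * (fixed_cost + k * page_cost)
               for v, p in view_dist.items())

def best_prefetch_constant(view_dist, fixed_cost, page_cost, max_k=10):
    return min(range(1, max_k + 1),
               key=lambda k: expected_cost(k, view_dist, fixed_cost, page_cost))

if __name__ == "__main__":
    views = {1: 0.70, 2: 0.15, 3: 0.08, 5: 0.05, 10: 0.02}   # invented user behaviour
    print(best_prefetch_constant(views, fixed_cost=5.0, page_cost=1.0))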
to bridge the semantic gap in content based image retrieval detecting meaningful visual entities eg faces sky foliage buildings etc in image content and classifying images into semantic categories based on trained pattern classifiers have become active research trends in this paper we present dual cascading learning frameworks that extract and combine intra image and inter class semantics for image indexing and retrieval in the supervised learning version support vector detectors are trained on semantic support regions without image segmentation the reconciled and aggregated detection based indexes then serve as input for support vector learning of image classifiers to generate class relative image indexes during retrieval similarities based on both indexes are combined to rank images in the unsupervised learning approach image classifiers are first trained on local image blocks from small number of labeled images then local semantic patterns are discovered from clustering the image blocks with high classification output training samples are induced from cluster memberships for support vector learning to form local semantic pattern detectors during retrieval similarities based on local class pattern indexes and discovered pattern indexes are combined to rank images query by example experiments on unconstrained consumer photos with semantic queries show that the combined matching approaches are better than matching with single indexes both the supervised semantics design and the semantics discovery approaches also outperformed the linear fusion of color and texture features significantly in average precisions by and respectively
the natural world is enormous dynamic incredibly diverse and highly complex despite the inherent challenges of surviving in such world biological organisms evolve self organize self repair navigate and flourish generally they do so with only local knowledge and without any centralized control our computer networks are increasingly facing similar challenges as they grow larger in size but are yet to be able to achieve the same level of robustness and adaptability many research efforts have recognized these parallels and wondered if there are some lessons to be learned from biological systems as result biologically inspired research in computer networking is quickly growing field this article begins by exploring why biology and computer network research are such natural match we then present broad overview of biologically inspired research grouped by topic and classified in two ways by the biological field that inspired each topic and by the area of networking in which the topic lies in each case we elucidate how biological concepts have been most successfully applied in aggregate we conclude that research efforts are most successful when they separate biological design from biological implementation that is to say when they extract the pertinent principles from the former without imposing the limitations of the latter
parameter variation in integrated circuits causes sections of chip to be slower than others if to prevent any resulting timing errors we design processors for worst case parameter values we may lose substantial performance an alternate approach explored in this paper is to design for closer to nominal values and provide some transistor budget to tolerate unavoidable variation induced errors to assess this approach this paper first presents novel framework that shows how microarchitecture techniques can trade off variation induced errors for power and processor frequency then the paper introduces an effective technique to maximize performance and minimize power in the presence of variation induced errors namely high dimensional dynamic adaptation for efficiency the technique is implemented using machine learning algorithm the results show that our best configuration increases processor frequency by on average allowing the processor to cycle faster than without variation processor performance increases by on average resulting in performance that is higher than without variation at only area cost
self adaptive recovery net sarn is an extended petri net model for specifying exceptional behavior in workflow systems sarn caters for high level recovery policies that are incorporated either with single task or set of tasks called recovery region recovery region delimits the part of the workflow from which the associated recovery policies take place in this paper we assume that sarn is initially partitioned into recovery regions by workflow designers who have priori expectations for how exceptions will be handled we propose pattern based approach to dynamically restructure sarn partition the objective is to continuously restructure recovery regions within sarn partition to reflect the dynamic changes in handling exceptions the restructuring of sarn partition is based on the observation of predefined recovery patterns
in this paper we present an interactive motion deformation method to modify animations so that they satisfy set of prioritized constraints our approach successfully handles the problem of retargeting and adjusting motion as well as adding significant changes to preexisting animations we introduce the concept of prioritized constraints for motion editing by exploiting an arbitrary large number of priority layers each frame is individually and smoothly adjusted to enforce set of prioritized constraints the iterative construction of the solution channels the convergence through intermediate solutions enforcing the highest prioritized constraints first in addition we propose new simple formulation to control the position of the center of mass so that the resulting motions are physically plausible finally we demonstrate that our method can address wide range of motion editing problems
the beautification of user interface resulting from model to model and model to code transformations in model driven architecture consists of performing manual changes to address user requirements which have not been supported during the transformations these requirements may include customization users preferences and compliance with corporate style guidelines this paper introduces beautification process into user interface model this process includes series of beautification operations based on formal definition as well as constrained editor that enables designers to apply these beautification operations on user interface all manual changes done using these beautification operations are transformed into model to model transformations thus reducing the problem of round trip engineering the paper also demonstrates that this process significantly reduces the number of manual changes performed on user interfaces of information systems while preserving the quality properties induced by the transformations
link analysis in various forms is now an established technique in many different subjects reflecting the perceived importance of links and of the web critical but very difficult issue is how to interpret the results of social science link analyses it is argued that the dynamic nature of the web its lack of quality control and the online proliferation of copying and imitation mean that methodologies operating within highly positivist quantitative framework are ineffective conversely the sheer variety of the web makes application of qualitative methodologies and pure reason very problematic to large scale studies methodology triangulation is consequently advocated in combination with warning that the web is incapable of giving definitive answers to large scale link analysis research questions concerning social factors underlying link creation finally it is claimed that although theoretical frameworks are appropriate for guiding research theory of link analysis is not possible
in order to obtain acceptable quality of filtering services in real time conditions trade off between result relevance and response time has to be addressed ignoring resource availability is major drawback for many existing systems which try to boost quality by making different synergies between filtering strategies the essence of the proposed solution for combining filtering strategies is comprehensive self improving and resource aware coordination which both takes care about current resource usability and tries to improve itself during runtime the applicability of the presented coordination between filtering strategies is illustrated in system serving as an intelligent personal information assistant pia experimental results show that long lasting filtering jobs with duration over seconds are eliminated and that at the same time jobs which are shorter than seconds can be effectively used for improving coordination activities
formal description of real time requirements is difficult and error prone task conceptual and tool support for this activity plays central role in the agenda of technology transference from the formal verification engineering community to the real time systems development practice in this article we present ts visual language to define complex event based requirements such as freshness bounded response event correlation etc the underlying formalism is based on partial orders and supports real time constraints the problem of checking whether timed automaton model of system satisfies this sort of scenarios is shown to be decidable moreover we have also developed a tool that translates visually specified scenarios into observer timed automata the resulting automata can be composed with model under analysis in order to check satisfaction of the stated scenarios we show the benefits of applying these ideas to some case studies
today chip multiprocessors cmps that accommodate multiple processor cores on the same chip have become reality as the communication complexity of such multicore systems is rapidly increasing designing an interconnect architecture with predictable behavior is essential for proper system operation in cmps general purpose processor cores are used to run software tasks of different applications and the communication between the cores cannot be precharacterized designing an efficient network on chip noc based interconnect with predictable performance is thus challenging task in this paper we address the important design issue of synthesizing the most power efficient noc interconnect for cmps providing guaranteed optimum throughput and predictable performance for any application to be executed on the cmp in our synthesis approach we use accurate delay and power models for the network components switches and links that are obtained from layouts of the components using industry standard tools the synthesis approach utilizes the floorplan knowledge of the noc to detect timing violations on the noc links early in the design cycle this leads to faster design cycle and quicker design convergence across the high level synthesis approach and the physical implementation of the design we validate the design flow predictability of our proposed approach by performing layout of the noc synthesized for core cmp our approach maintains the regular and predictable structure of the noc and is applicable in practice to existing noc architectures
in recent years there has been tremendous growth of online text information related to digital libraries medical diagnostic systems remote education news sources and electronic commerce there is great need to search and organise huge amount of information in text documents this paper focuses on word tendencies in documents and presents an automatic extraction method for specific subject field judgment is conducted by using field association words and similarity among word tendencies and other word tendencies are computed with information then word tendencies which have the same subject are grouped as one group and the important word tendencies are chosen from that group finally system suggests word tendencies from specific subjects and fields are implemented from the experimental result about of suggested word tendencies have been associated with popular subjects
probabilistic models with hidden variables such as probabilistic latent semantic analysis plsa and latent dirichlet allocation lda have recently become popular for solving several image content analysis tasks in this work we will use plsa model to represent images for performing scene classification we evaluate the influence of the type of local feature descriptor in this context and compare three different descriptors moreover we also examine three different local interest region detectors with respect to their suitability for this task our results show that two examined local descriptors the geometric blur and the self similarity feature outperform the commonly used sift descriptor by large margin
the rapid and robust identification of suspect’s footwear while he she is in police custody is an essential component in any system that makes full use of the footwear marks recovered from crime scenes footwear is an important source of forensic intelligence and sometimes evidence here we present an automated system for shoe model identification from outsole impressions taken directly from suspect’s shoes that can provide information in timely manner while suspect is in custody currently the process of identifying the shoe model from the of recorded model types is time consuming manual task the underlying methodology is based on large numbers of localized features located using mser feature detectors these features are transformed into robust sift descriptors and encoded relative to feature codebook forming histogram representations of each shoe pattern this representation facilitates fast indexing of footwear patterns whilst finer search proceeds by comparing the correspondence between footwear patterns in short list through the application of modified constrained spectral correspondence methods the effectiveness of this approach is illustrated for reference dataset of different shoe model patterns from which first rank performance and top eight rank performance are achieved practical aspects of the system and future developments are also discussed
vehicular ad hoc networks vanets which provide vehicles with an easy access to exchange the up to date traffic status and various kinds of data have become promising application of mobile ad hoc networks in the life critical vanets security issues are considered as focal topic one challenging problem among these issues is the insider misbehavior since it bypasses the traditional security mechanisms such as authentication the existing works focusing on this problem do not take the privacy issue into account their presented solutions are paralyzed in the anonymous vanets where the drivers real identity is protected in this paper we propose slep and prp two novel protocols to efficiently remove the misbehaving insiders from the anonymous vanets through analysis and extensive simulations we demonstrate that these two protocols can achieve high reaction speed and accuracy for both the local eviction and the permanent revocation to the misbehaving vehicles at an acceptable cost
during highly productive period running from to about the research in lossless compression of meshes mainly consisted in hard battle for the best bitrates but for few years compression rates seem stabilized around bit per vertex for the connectivity coding of usual meshes and more and more work is dedicated to remeshing lossy compression or gigantic mesh compression where memory and cpu optimizations are the new priority however the size of models keeps growing and many application fields keep requiring lossless compression in this paper we present new contribution for single rate lossless connectivity compression which first brings improvement over current state of the art bitrates and secondly does not constrain the coding of the vertex positions offering therefore good complementarity with the best performing geometric compression methods the initial observation having motivated this work is that very often most of the connectivity part of mesh can be automatically deduced from its geometric part using reconstruction algorithms this has already been used within the limited framework of projectable objects essentially terrain models and gis but finds here its first generalization to arbitrary triangular meshes without any limitation regarding the topological genus the number of connected components the manifoldness or the regularity this can be obtained by constraining and guiding delaunay based reconstruction algorithm so that it outputs the initial mesh to be coded the resulting rates seem extremely competitive when the meshes are fully included in delaunay and are still good compared to the state of the art in the general case
the majority of existing work on agent dialogues considers negotiation persuasion or deliberation dialogues we focus on inquiry dialogues which allow agents to collaborate in order to find new knowledge we present general framework for representing dialogues and give the details necessary to generate two subtypes of inquiry dialogue that we define argument inquiry dialogues allow two agents to share knowledge to jointly construct arguments warrant inquiry dialogues allow two agents to share knowledge to jointly construct dialectical trees essentially tree with an argument at each node in which child node is counter argument to its parent existing inquiry dialogue systems only model dialogues meaning they provide protocol which dictates what the possible legal next moves are but not which of these moves to make our system not only includes dialogue game style protocol for each subtype of inquiry dialogue that we present but also strategy that selects exactly one of the legal moves to make we propose benchmark against which we compare our dialogues being the arguments that can be constructed from the union of the agents beliefs and use this to define soundness and completeness properties that we show hold for all inquiry dialogues generated by our system
predicate abstraction is the basis of many program verification tools until now the only known way to overcome the inherent limitation of predicate abstraction to safety properties was to manually annotate the finite state abstraction of program we extend predicate abstraction to transition predicate abstraction transition predicate abstraction goes beyond the idea of finite abstract state programs and checking the absence of loops instead our abstraction algorithm transforms program into finite abstract transition program then second algorithm checks fair termination the two algorithms together yield an automated method for the verification of liveness properties under full fairness assumptions impartiality justice and compassion in summary we exhibit principles that extend the applicability of predicate abstraction based program verification to the full set of temporal properties
it is well known that using conventional concurrency control techniques for obtaining serializable answers to long running queries leads to an unacceptable drop in system performance as result most current dbmss execute such queries under reduced degree of consistency thus providing non serializable answers in this paper we present new and highly concurrent approach for processing large decision support queries in relational databases in this new approach called compensation based query processing concurrent updates to any data participating in query are communicated to the query’s on line query processor which then compensates for these updates so that the final answer reflects changes caused by the updates very high concurrency is achieved by locking data only briefly while still delivering transaction consistent answers to queries
we present construction of piecewise rational free form surface of arbitrary topological genus which may contain sharp features creases corners or cusps the surface is automatically generated from given closed triangular mesh some of the edges are tagged as sharp ones defining the features on the surface the surface is smooth for an arbitrary order of continuity except for the sharp features defined by the user our method is based on the manifold construction and follows the blending approach
advanced analysis of data streams is quickly becoming key area of data mining research as the number of applications demanding such processing increases online mining when such data streams evolve over time that is when concepts drift or change completely is becoming one of the core issues when tackling non stationary concepts ensembles of classifiers have several advantages over single classifier methods they are easy to scale and parallelize they can adapt to change quickly by pruning under performing parts of the ensemble and they therefore usually also generate more accurate concept descriptions this paper proposes new experimental data stream framework for studying concept drift and two new variants of bagging adwin bagging and adaptive size hoeffding tree asht bagging using the new experimental framework an evaluation study on synthetic and real world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods
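below is a minimal python sketch in the spirit of the adaptive ensembles described above: online bagging over incremental learners where a member whose recent error grows is reset; this is a simplified stand in for adwin or asht bagging, not the paper's algorithms, and it uses sklearn's GaussianNB only because it supports partial_fit; all names, thresholds and the toy stream are illustrative

import numpy as np
from sklearn.naive_bayes import GaussianNB

class OnlineBaggingEnsemble:
    def __init__(self, n_members=5, classes=(0, 1), window=200, reset_error=0.4):
        self.classes = np.array(classes)
        self.members = [GaussianNB() for _ in range(n_members)]
        self.errors = [[] for _ in range(n_members)]
        self.window, self.reset_error = window, reset_error

    def predict(self, x):
        votes = []
        for m in self.members:
            try:
                votes.append(m.predict([x])[0])
            except Exception:            # member not trained yet
                pass
        return max(set(votes), key=votes.count) if votes else self.classes[0]

    def learn(self, x, y):
        for i, m in enumerate(self.members):
            try:
                # prequential error estimate over a sliding window
                self.errors[i].append(int(m.predict([x])[0] != y))
                self.errors[i] = self.errors[i][-self.window:]
            except Exception:
                pass
            # poisson(1) weighting approximates bootstrap sampling online
            for _ in range(np.random.poisson(1.0)):
                m.partial_fit([x], [y], classes=self.classes)
            # prune (reset) the member if its recent error suggests drift
            if len(self.errors[i]) == self.window and np.mean(self.errors[i]) > self.reset_error:
                self.members[i] = GaussianNB()
                self.errors[i] = []

# usage on a toy stream whose concept flips halfway through
rng = np.random.default_rng(0)
ens = OnlineBaggingEnsemble()
for t in range(1000):
    x = rng.normal(size=2)
    y = int(x[0] > 0) if t < 500 else int(x[0] < 0)   # abrupt drift at t=500
    ens.predict(list(x))
    ens.learn(list(x), y)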
aggressive hardware prefetching often significantly increases energy consumption in the memory system experiments show that major fraction of prefetching energy degradation is due to the hardware history table related energy costs in this paper we present pare power aware prefetching engine that uses newly designed indexed hardware history table compared to the conventional single table design the new prefetching table consumes less power per access with the help of compiler based location set analysis we show that the proposed pare design improves energy consumption by as much as in the data memory systems in nm processor designs
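the following is a behavioral python sketch, not rtl and not the pare design itself, of the general idea of an indexed history table: the history is split into small banks selected by low order bits of a memory region so each access touches only one small structure; bank counts, the indexing rule and the replacement policy are all assumptions made for illustration

class IndexedHistoryTable:
    def __init__(self, num_banks=16, entries_per_bank=16):
        self.num_banks = num_banks
        self.banks = [dict() for _ in range(num_banks)]
        self.capacity = entries_per_bank

    def _bank(self, region):
        # low order bits of the region select one small bank
        return self.banks[region % self.num_banks]

    def update(self, region, stride):
        bank = self._bank(region)
        if region not in bank and len(bank) >= self.capacity:
            bank.pop(next(iter(bank)))      # crude replacement for the sketch
        bank[region] = stride

    def predict(self, region, addr):
        stride = self._bank(region).get(region)
        return addr + stride if stride is not None else None

tbl = IndexedHistoryTable()
tbl.update(region=42, stride=64)
print(tbl.predict(region=42, addr=0x1000))   # 0x1040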
this paper introduces new kernel which computes similarity between two natural language sentences as the number of paths shared by their dependency trees the paper gives very efficient algorithm to compute it this kernel is also an improvement over the word subsequence kernel because it only counts linguistically meaningful word subsequences which are based on word dependencies it overcomes some of the difficulties encountered by syntactic tree kernels as well experimental results demonstrate the advantage of this kernel over word subsequence and syntactic tree kernels
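a minimal python sketch of a path counting kernel of the kind described above, assuming for brevity that every word token in a sentence is distinct; the tree is given as head dependent edges, every pair of nodes defines one path labelled by the word sequence along it, and the kernel value is the number of path labels the two trees share; the toy sentences and names are illustrative, not the paper's formulation

from collections import defaultdict, deque

def all_paths(edges):
    # adjacency of the (undirected) dependency tree
    adj = defaultdict(set)
    for h, d in edges:
        adj[h].add(d)
        adj[d].add(h)
    labels = defaultdict(int)
    for start in adj:
        queue = deque([(start, (start,))])
        while queue:
            node, path = queue.popleft()
            if len(path) > 1:
                labels[min(path, path[::-1])] += 1   # canonical direction
            for nxt in adj[node]:
                if nxt not in path:
                    queue.append((nxt, path + (nxt,)))
    # every path is discovered from both of its endpoints, so halve the counts
    return {p: c // 2 for p, c in labels.items()}

def path_kernel(edges_a, edges_b):
    pa, pb = all_paths(edges_a), all_paths(edges_b)
    return sum(min(ca, pb[p]) for p, ca in pa.items() if p in pb)

t1 = [("saw", "John"), ("saw", "dog"), ("dog", "the")]
t2 = [("saw", "Mary"), ("saw", "dog"), ("dog", "a")]
print(path_kernel(t1, t2))   # number of dependency paths shared by the trees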
the increasing popularity of points as rendering primitives has led to variety of different rendering algorithms and the different implementations compare like apples to oranges in this paper we revisit and compare number of recently developed point based rendering implementations within common testbed also we briefly summarize few proposed hierarchical multiresolution point data structures and discuss their advantages based on common view dependent level of detail lod rendering framework we then examine different hardware accelerated point rendering algorithms experimental results are given with respect to performance timing and rendering quality for the different approaches additionally we also compare the point based rendering techniques to basic triangle mesh approach
people’s preferences are expressed at varying levels of granularity and detail as result of partial or imperfect knowledge one may have some preference for general class of entities for example liking comedies and another one for fine grained specific class such as disliking recent thrillers with al pacino in this article we are interested in capturing such complex multi granular preferences for personalizing database queries and in studying their impact on query results we organize the collection of one’s preferences in preference network directed acyclic graph where each node refers to subclass of the entities that its parent refers to and whenever they both apply more specific preferences override more generic ones we study query personalization based on networks of preferences and provide efficient algorithms for identifying relevant preferences modifying queries accordingly and processing personalized queries finally we present results of both synthetic and real user experiments which demonstrate the efficiency of our algorithms provide insight as to the appropriateness of the proposed preference model and show the benefits of query personalization based on composite preferences compared to simpler preference representations
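a minimal python sketch of the override rule described above for a network of multi granular preferences: when both a generic and a more specific preference apply to an entity, the specific one wins; here an applicable node suppresses its direct parent only, deeper ancestor chains are omitted, and the predicates and scores are purely illustrative

class PrefNode:
    def __init__(self, name, applies, score, parent=None):
        self.name, self.applies, self.score, self.parent = name, applies, score, parent

def effective_preference(nodes, entity):
    # keep applicable nodes whose applicable children do not override them
    applicable = [n for n in nodes if n.applies(entity)]
    overridden = {n.parent for n in applicable if n.parent is not None}
    most_specific = [n for n in applicable if n not in overridden]
    return {n.name: n.score for n in most_specific}

thrillers = PrefNode("thrillers", lambda m: m["genre"] == "thriller", +0.2)
recent_pacino = PrefNode("recent thrillers with al pacino",
                         lambda m: m["genre"] == "thriller"
                                   and m["year"] >= 2000
                                   and "al pacino" in m["cast"],
                         -0.9, parent=thrillers)
comedies = PrefNode("comedies", lambda m: m["genre"] == "comedy", +0.8)

movie = {"genre": "thriller", "year": 2003, "cast": ["al pacino"]}
print(effective_preference([thrillers, recent_pacino, comedies], movie))
# the specific dislike overrides the generic liking of thrillers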
background data analysis due to virus scanning backup and desktop search is increasingly prevalent on client systems as the number of tools and their resource requirements grow their impact on foreground workloads can be prohibitive this creates tension between users foreground work and the background work that makes information management possible we present system called scan lite that addresses this tension scan lite exploits the fact that data in an enterprise is often replicated to efficiently schedule background data analyses it uses content hashing to identify duplicate content and scans each unique piece of content only once it delays scheduling these scans to increase the likelihood that the content will be replicated on multiple machines thus providing more choices for where to perform the scan furthermore it prioritizes machines to maximize use of idle time and minimize the impact on foreground activities we evaluate scan lite using measurements of enterprise replication behavior we find that scan lite significantly improves scanning performance over the naive approach and that it effectively exploits replication to reduce total work done and the impact on client foreground activity
online mining of frequent itemsets over stream sliding window is one of the most important problems in stream data mining with broad applications it is also difficult issue since the streaming data possess some challenging characteristics such as unknown or unbound size possibly very fast arrival rate inability to backtrack over previously arrived transactions and lack of system control over the order in which the data arrive in this paper we propose an effective bit sequence based one pass algorithm called mfi transsw mining frequent itemsets within transaction sensitive sliding window to mine the set of frequent itemsets from data streams within transaction sensitive sliding window which consists of fixed number of transactions the proposed mfi transsw algorithm consists of three phases window initialization window sliding and pattern generation first every item of each transaction is encoded in an effective bit sequence representation in the window initialization phase the proposed bit sequence representation of item is used to reduce the time and memory needed to slide the windows in the following phases second mfi transsw uses the left bit shift technique to slide the windows efficiently in the window sliding phase finally the complete set of frequent itemsets within the current sliding window is generated by level wise method in the pattern generation phase experimental studies show that the proposed algorithm not only attains highly accurate mining results but also runs significantly faster and consumes less memory than existing algorithms for mining frequent itemsets over data streams with sliding window furthermore based on the mfi transsw framework an extended single pass algorithm called mfi timesw mining frequent itemsets within time sensitive sliding window is presented to mine the set of frequent itemsets efficiently over time sensitive sliding windows
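a minimal python sketch of the bit sequence idea described above: each item keeps one integer whose i-th bit records whether the i-th transaction in the window contains the item, sliding the window is a left shift plus a mask, and the support of an itemset is the popcount of the AND of its members' bit sequences; the class name, the brute force pattern generation step and the toy transactions are illustrative simplifications, not the paper's full algorithm

from itertools import combinations

class BitSlidingWindow:
    def __init__(self, window_size):
        self.w = window_size
        self.mask = (1 << window_size) - 1
        self.bits = {}          # item -> bit sequence over the current window

    def add_transaction(self, items):
        # window sliding phase: shift every bit sequence and open a new slot
        for it in self.bits:
            self.bits[it] = (self.bits[it] << 1) & self.mask
        for it in items:
            self.bits[it] = self.bits.get(it, 0) | 1

    def support(self, itemset):
        acc = self.mask
        for it in itemset:
            acc &= self.bits.get(it, 0)
        return bin(acc).count("1")

    def frequent_itemsets(self, min_support, max_size=3):
        # pattern generation phase (brute force over small itemsets for brevity)
        items = [i for i in self.bits if self.support([i]) >= min_support]
        result = []
        for k in range(1, max_size + 1):
            for cand in combinations(items, k):
                if self.support(cand) >= min_support:
                    result.append((cand, self.support(cand)))
        return result

win = BitSlidingWindow(window_size=3)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]:
    win.add_transaction(t)
print(win.frequent_itemsets(min_support=2))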
this paper discusses the problems that software development organization must address in order to assess and improve its software processes in particular the authors are involved in project aiming at assessing and improving the current practice and the quality manual of the business unit telecommunications for defense butd of large telecommunications company the paper reports on the usage of formal process modeling languages to detect inconsistencies ambiguities incompleteness and opportunities for improvement of both the software process and its documentation
stronger protection is needed for the confidentiality and integrity of data because programs containing untrusted code are the rule rather than the exception information flow control allows the enforcement of end to end security policies but has been difficult to put into practice this article describes the decentralized label model new label model for control of information flow in systems with mutual distrust and decentralized authority the model improves on existing multilevel security models by allowing users to declassify information in decentralized way and by improving support for fine grained data sharing it supports static program analysis of information flow so that programs can be certified to permit only acceptable information flows while largely avoiding the overhead of run time checking the article introduces the language jif an extension to java that provides static checking of information flow using the decentralized label model
mobile objects have gained lot of attention in research and industry in the recent past but they also have long history security is one of the key requirements of mobile objects and one of the most researched characteristics related to mobility resource management has been somewhat neglected in the past but it is being increasingly addressed in both the context of security and qos in this paper we place few systems supporting mobile objects in perspective based upon how they address security and resource management we start with the theoretical model of actors that supports concurrent mobile objects in programming environment then we describe task migration for the mach microkernel case of mobile objects supported by an operating system using the omg masif standard as an example we then analyze middleware support for mobile objects mobile objects and agents moa system is an example of middleware level support based on java the active networks project conversant supports object mobility at the communication protocol level we summarize these projects comparing their security and resource management and conclude by deriving few general observations on how security and resource management have been applied and how they might evolve in the future
the usage control ucon model was introduced as unified approach to capture number of extensions for traditional access control models while the policy specification flexibility and expressive power of this model have been studied in previous work as related and fundamental problem the safety analysis of ucon has not been explored this paper presents two fundamental safety results for ucona sub model of ucon only considering authorizations in ucona an access control decision is based on the subject and or the object attributes which can be changed as the side effects of using the access right resulting in possible changes to future access control decisions hence the safety question in ucona is all the more pressing since every access can potentially enable additional permissions due to the mutability of attributes in ucon in this paper first we show that the safety problem is in general undecidable then we show that restricted form of ucona with finite attribute value domains and acyclic attribute creation relation has decidable safety property the decidable model maintains good expressive power as shown by specifying an rbac system with specific user role assignment scheme and drm application with consumable rights
in this paper we propose new structured pp overlay network named sw uinta small world in order to reduce the routing latency we firstly construct the uinta network in which both physical characteristics of network and data semantic are considered furthermore based on uinta nondeterministic caching strategy is employed to allow for poly logarithmic search time while having only constant cache size compared with the deterministic caching strategy proposed by previous pp systems the nondeterministic caching strategy can reduce communication overhead for maintaining the routing cache table cache entries in the cache table of peer nodes can be updated by subsequent queries rather than only by running stabilization periodically in the following novel cache replacement scheme named the sw cache replacement scheme is used to improve lookup performance which has proved to satisfy the small world principle so we call this network sw uinta small world after that according to the theoretical analysis it can be proved that sw uinta small world can get log search time with cache size lastly the performance of sw uinta small world is compared with those of other structured pp networks such as chord and uinta it shows that sw uinta small world can achieve improved object lookup performance and reduce maintenance cost
we study personalized web ranking algorithms based on the existence of document clusterings motivated by the topic sensitive page ranking of haveliwala we develop and implement an efficient local cluster algorithm by extending the web search algorithm of achlioptas et al we propose some formal criteria for evaluating such personalized ranking algorithms and provide some preliminary experiments in support of our analysis both theoretically and experimentally our algorithm differs significantly from topic sensitive page rank
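a minimal numpy sketch of the generic topic sensitive (personalized) pagerank construction that the work above builds on, where the teleport vector is concentrated on a document cluster instead of being uniform; this is not the paper's local cluster algorithm, and the graph, damping factor and cluster are illustrative

import numpy as np

def personalized_pagerank(adj, cluster, alpha=0.85, iters=100):
    n = adj.shape[0]
    # column-stochastic transition matrix; dangling nodes jump uniformly
    out = adj.sum(axis=1)
    P = np.where(out[:, None] > 0, adj / np.maximum(out[:, None], 1), 1.0 / n).T
    # teleport distribution restricted to the chosen cluster of pages
    v = np.zeros(n)
    v[list(cluster)] = 1.0 / len(cluster)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * P.dot(r) + (1 - alpha) * v
    return r

adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(personalized_pagerank(adj, cluster={0, 1}))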
recently wireless sensor networks wsns have been widely discussed in many applications in this paper we propose novel three dimensional emergency service that aims to guide people to safe places when emergencies happen at normal time the network is responsible for monitoring the environment when emergency events are detected the network can adaptively modify its topology to ensure transportation reliability quickly identify hazardous regions that should be avoided and find safe navigation paths that can lead people to exits in particular the structures of three dimensional buildings are taken into account in our design simulation results show that our protocols can adapt to emergencies quickly at low message cost and can find safer paths to exits than existing results
this paper proposes model of the inner enterprise knowledge recommender system among an organization different members have different demands for knowledge in different context comparing with traditional knowledge query way the knowledge recommender systems supply us more proactive way that could deliver the proper knowledge to the proper people at the proper time the recommendation mechanism is based on semantic matching on context information from both users side and knowledge’s side recommendation rules are also maintained in the recommendation engine which is the core module in the system by adjusting the rules the configuration of the knowledge recommender system could be adapted to different users this paper presents the system design as well as some key technologies analysis and also discusses the advantages and preconditions for implementing the proposed system
this paper investigates the casual interactions that support and nourish community and seeks to provide solution to the increasing detachment of modern society as community spaces become less and less engaging we suggest the use of ubiquitous computing ubicomp infrastructure to promote and support community connectedness via the hosting of virtual community environments and by providing local information and interaction possibilities this infrastructure addresses our need as society to communicate more effectively and create loose bonds with familiar strangers within our community we explore this idea with use scenario and user study of users interacting with the services in developed intelligent environment
this work presents static analysis technique based on program slicing for csp specifications given particular event in csp specification our technique allows us to know what parts of the specification must necessarily be executed before this event and what parts of the specification could be executed before it in some execution our technique is based on new data structure which extends the synchronized control flow graph scfg we show that this new data structure improves the scfg by taking into account the context in which processes are called and thus makes the slicing process more precise
having been extensively used to summarize massive data sets wavelet synopses can be classified into two types space bounded and error bounded synopses although various research efforts have been made for the space bounded synopses construction the constructions of error bounded synopses are yet to be thoroughly studied the state of the art approaches on error bounded synopses mainly focus on building one dimensional wavelet synopses while efficient algorithms on constructing multidimensional error bounded wavelet synopses still need to be investigated in this paper we propose first linear approximate algorithm to construct multidimensional error bounded synopses our algorithm constructs synopsis that has log n approximation ratio to the size of the optimal solution experiments on two dimensional array data have been conducted to support the theoretical aspects of our algorithm our method can build two dimensional wavelet synopses in less than second for large data set up to data array under given error bounds the advantages of our algorithm are further demonstrated through other comparisons in terms of synopses construction time and synopses sizes
in this paper we propose new polygonization method based on the classic marching triangle algorithm it is an improved and efficient version of the basic algorithm which produces complete mesh without any cracks our method is useful in the surface reconstruction process of scanned objects it works over the scalar field distance transform of the object to produce the resulting triangle mesh first we improve the original algorithm in finding new potential vertices in the mesh growing process second we modify the delaunay sphere test on the new triangles third we consider new triangles configuration to obtain more complete mesh finally we introduce an edge processing sequence to improve the overall marching triangle algorithm we use relevant error metric tool to compare results and show our new method is more accurate than marching cube which is the most widely used triangulation algorithm in the surface reconstruction process of scanned objects
in this paper we present an log time algorithm for finding shortest paths in an node planar graph with real weights this can be compared to the best previous strongly polynomial time algorithm developed by lipton rose and tarjan in which runs in time and the best polynomial time algorithm developed by henzinger klein subramanian and rao in which runs in time we also present significantly improved data structures for reporting distances between pairs of nodes and algorithms for updating the data structures when edge weights change
state space explosion causes most relevant behavioral questions for component based systems to be pspace hard here we exploit the structure of component based systems to obtain first approximation of the reachable global state space in order to improve this approximation we introduce new technique we call cross checking the resulting approximation can be used to study global properties of component based systems which we demonstrate here for local deadlock freedom
this paper describes and evaluates unified approach to phrasal query suggestions in the context of high precision search engine the search engine performs ranked extended boolean searches with the proximity operator near being the default operation suggestions are offered to the searcher when the length of the result list falls outside predefined bounds if the list is too long the engine specializes the query through the use of super phrases if the list is too short the engine generalizes the query through the use of proximal subphrases we describe methods for generating both types of suggestions and present algorithms for ranking the suggestions specifically we present the problem of counting proximal subphrases for specialization and the problem of counting unordered super phrases for generalization the uptake of our approach was evaluated by analyzing search log data from before and after the suggestion feature was added to commercial version of the search engine we looked at approximately million queries and found that after they were added suggestions represented nearly of the total queries efficacy was evaluated through controlled study of participants performing nine searches using three different search engines we found that the engine with phrasal query suggestions had better high precision recall than both the same search engine without suggestions and search engine with similar interface but using an okapi bm ranking algorithm
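a minimal python sketch of generating candidate proximal subphrases for query generalization as described above; the ranking by counts against a corpus is omitted, candidates are simply ordered so that longer and hence more specific subphrases come first, and the function name and example query are illustrative

from itertools import combinations

def proximal_subphrases(query_terms):
    n = len(query_terms)
    cands = []
    for k in range(n - 1, 1, -1):               # strictly shorter than the query
        for idx in combinations(range(n), k):   # order preserving subsequences
            cands.append(tuple(query_terms[i] for i in idx))
    return cands

print(proximal_subphrases(["ranked", "boolean", "search", "engine"]))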
the goal of our research is to support cooperative work performed by stakeholders sitting around table to support such cooperation various table based systems with shared electronic display on the tabletop have been developed these systems however suffer the common problem of not recognizing shared information such as text and images equally because the orientation of their view angle is not favorable to solve this problem we propose the lumisight table this is system capable of displaying personalized information to each required direction on one horizontal screen simultaneously by multiplexing them and of capturing stakeholders gestures to manipulate the information
autonomic computing is being described as new paradigm for computing but in order to become new paradigm robust and easy to use methodologies and tools have to be developed in this paper we take significant step towards this goal by proposing an agent based methodology to autonomic computing systems starting from the recently realized fact that agent technology has the potential to be integrated into the framework of autonomic computing we propose to adapt the agent based methodology gaia for the analysis and the design of autonomic computing systems gaia adapted considers system’s organization as made up of two subsystem organizations namely managed system organization and an autonomic manager organization which stand in certain relationships to one another the key concepts in gaia adapted are roles which take two forms namely basic roles and autonomic roles that can interact with one another in certain institutionalized ways which are defined in the autonomic interaction model
we examine how to apply the hash join paradigm to spatial joins and define new framework for spatial hash joins our spatial partition functions have two components set of bucket extents and an assignment function which may map data item into multiple buckets furthermore the partition functions for the two input datasets may be different we have designed and tested spatial hash join method based on this framework the partition function for the inner dataset is initialized by sampling the dataset and evolves as data are inserted the partition function for the outer dataset is immutable but may replicate data item from the outer dataset into multiple buckets the method mirrors relational hash joins in other aspects our method needs no pre computed indices it is therefore applicable to wide range of spatial joins our experiments show that our method outperforms current spatial join algorithms based on tree matching by wide margin further its performance is superior even when the tree based methods have pre computed indices this makes the spatial hash join method highly competitive both when the input datasets are dynamically generated and when the datasets have pre computed indices
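a minimal python sketch, under simplifying assumptions, of the partition function idea described above: bucket extents are fixed grid cells, the assignment function maps a rectangle to every cell it overlaps so outer items may be replicated, and the join pairs items bucket by bucket with duplicate elimination; the fixed grid, names and toy data are illustrative, whereas the paper's inner partition function is initialized by sampling and evolves

from collections import defaultdict

def overlaps(a, b):
    # rectangles as (xmin, ymin, xmax, ymax)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cells_for(rect, cell_size):
    x0, y0, x1, y1 = rect
    for cx in range(int(x0 // cell_size), int(x1 // cell_size) + 1):
        for cy in range(int(y0 // cell_size), int(y1 // cell_size) + 1):
            yield (cx, cy)

def spatial_hash_join(inner, outer, cell_size=10.0):
    # build phase: hash the inner dataset into buckets by grid cell
    buckets = defaultdict(list)
    for iid, rect in inner:
        for cell in cells_for(rect, cell_size):
            buckets[cell].append((iid, rect))
    # probe phase: each outer rectangle is replicated into every overlapping cell
    seen, result = set(), []
    for oid, rect in outer:
        for cell in cells_for(rect, cell_size):
            for iid, irect in buckets.get(cell, []):
                if (iid, oid) not in seen and overlaps(irect, rect):
                    seen.add((iid, oid))
                    result.append((iid, oid))
    return result

inner = [("i1", (0, 0, 5, 5)), ("i2", (12, 12, 18, 18))]
outer = [("o1", (4, 4, 13, 13))]
print(spatial_hash_join(inner, outer))   # [('i1', 'o1'), ('i2', 'o1')]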
we describe the reveal formal functional verification system and its application to four representative hardware test cases reveal employs counterexample guided abstraction refinement or cegar and is suitable for verifying the complex control logic of designs with wide datapaths reveal performs automatic datapath abstraction yielding an approximation of the original design with much smaller state space this approximation is subsequently used to verify the correctness of control logic interactions if the approximation proves to be too coarse it is automatically refined based on the spurious counterexample it generates such refinement can be viewed as form of on demand learning similar in spirit to conflict based learning in modern boolean satisfiability solvers the abstraction refinement process is iterated until the design is shown to be correct or an actual design error is reported the reveal system allows some user control over the abstraction and refinement steps this paper examines the effect on reveal’s performance of the various available options for abstraction and refinement based on our initial experience with this system we believe that automating the verification for useful class of hardware designs is now quite feasible
amos is mediator system that supports passive non intrusive integration of data from heterogeneous and autonomous data sources it is based on functional data model and declarative functional query language amosql foreign data sources eg relational databases text files or other types of data sources can be wrapped with amos mediators making them accessible through amosql amos mediators can communicate among each other through the multi database constructs of amosql that allow definition of functional queries and oo views accessing other amos servers the integrated views can contain both functions and types derived from the data sources furthermore local data associated with these view definitions may be stored in the mediator database this paper describes amos multi database query facilities and their optimization techniques calculus based function transformations are used to generate minimal query expressions before the query decomposition and cost based algebraic optimization steps take place object identifier oid generation is used for correctly representing derived objects in the mediators selective oid generation mechanism avoids overhead by generating in the mediator oids only for those derived objects that are either needed during the processing of query or have associated local data in the mediator database the validity of the derived objects that are assigned oids and the completeness of queries to the views are guaranteed by system generated predicates added to the queries
we present new framework for verifying partial specifications of programs in order to catch type and memory errors and check data structure invariants our technique can verify large class of data structures namely all those that can be expressed as graph types earlier versions were restricted to simple special cases such as lists or trees even so our current implementation is as fast as the previous specialized tools programs are annotated with partial specifications expressed in pointer assertion logic new notation for expressing properties of the program store we work in the logical tradition by encoding the programs and partial specifications as formulas in monadic second order logic validity of these formulas is checked by the mona tool which also can provide explicit counterexamples to invalid formulas to make verification decidable the technique requires explicit loop and function call invariants in return the technique is highly modular every statement of given program is analyzed only once the main target applications are safety critical data type algorithms where the cost of annotating program with invariants is justified by the value of being able to automatically verify complex properties of the program
for most web based applications contents are created dynamically based on the current state of business such as product prices and inventory stored in database systems these applications demand personalized content and track user behavior while maintaining application integrity many of such practices are not compatible with web acceleration solutions consequently although many web acceleration solutions have shown promising performance improvement and scalability architecting and engineering distributed enterprise web applications to utilize available content delivery networks remains challenge in this paper we examine the challenge to accelerate jee based enterprise web applications we list obstacles and recommend some practices to transform typical database driven jee applications to cache friendly web applications where web acceleration solutions can be applied furthermore such transformation should be done without modification to the underlying application business logic and without sacrificing functions that are essential to commerce we take the jee reference software the java petstore as case study by using the proposed guideline we are able to cache more than of the content in the petstore and scale up the web site more than times
over time researchers have acknowledged the importance of understanding the users strategies in the design of search systems however when involving users in the comparison of search systems methodological challenges still exist as researchers are pondering on how to handle the variability that human participants bring to the comparisons this paper presents methods for controlling the complexity of user centered evaluations of search user interfaces through within subjects designs balanced task sets time limitations pre formulated queries cached result pages and through limiting the users access to result documents additionally we will present our experiences in using three measures search speed qualified search speed and immediate accuracy to facilitate the comparison of different search systems over studies
in spite of the increasing popularity of handheld touchscreen devices little research has been conducted on how to evaluate and design one handed thumb tapping interactions in this paper we present study that researched three issues related to these interactions whether it is necessary to evaluate these interactions with the preferred and the non preferred hand whether participants evaluating these interactions should be asked to stand and walk during evaluations whether targets on the edge of the screen enable participants to be more accurate in selection than targets not on the edge half of the forty participants in the study used their non preferred hand and half used their preferred hand each participant conducted half of the tasks while walking and half while standing we used different target positions on the edge of the screen and five different target sizes the participants who used their preferred hand completed tasks more quickly and accurately than the participants who used their non preferred hand with the differences being large enough to suggest it is necessary to evaluate this type of interactions with both hands we did not find differences in the performance of participants when they walked versus when they stood suggesting it is not necessary to include this as variable in evaluations in terms of target location participants rated targets near the center of the screen as easier and more comfortable to tap but the highest accuracy rates were for targets on the edge of the screen
the computation of kemeny rankings is central to many applications in the context of rank aggregation given set of permutations votes over set of candidates one searches for consensus permutation that is closest to the given set of permutations unfortunately the problem is np hard we provide broad study of the parameterized complexity for computing optimal kemeny rankings besides the three obvious parameters number of votes number of candidates and solution size called kemeny score we consider further structural parameterizations more specifically we show that the kemeny score and corresponding kemeny ranking of an election can be computed efficiently whenever the average pairwise distance between two input votes is not too large in other words kemeny score is fixed parameter tractable with respect to the parameter average pairwise kendall tau distance we describe fixed parameter algorithm with running time poly moreover we extend our studies to the parameters maximum range and average range of positions candidate takes in the input votes whereas kemeny score remains fixed parameter tractable with respect to the parameter maximum range it becomes np complete in the case of an average range of two this excludes fixed parameter tractability with respect to the parameter average range unless np finally we extend some of our results to votes with ties and incomplete votes where in both cases one no longer has permutations as input
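a minimal python sketch of the quantities discussed above: the kendall tau distance between two votes, the kemeny score of a consensus ranking as the sum of its distances to all votes, and the average pairwise distance used as the structural parameter; the brute force search over all permutations is only to illustrate the objective on tiny inputs, not the paper's fixed parameter algorithm

from itertools import combinations, permutations

def kendall_tau(r1, r2):
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    # number of candidate pairs ordered differently by the two rankings
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]))

def kemeny_score(ranking, votes):
    return sum(kendall_tau(ranking, v) for v in votes)

def avg_pairwise_distance(votes):
    pairs = list(combinations(votes, 2))
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)

def brute_force_kemeny(votes):
    candidates = votes[0]
    return min(permutations(candidates), key=lambda r: kemeny_score(r, votes))

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(avg_pairwise_distance(votes), brute_force_kemeny(votes))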
key pre distribution scheme is method by which initially an off line trusted authority distributes pieces of information among set of users later each member of group of users can compute common key for secure communication in this paper we present an asymmetric key predistribution scheme instead of assuming that the network is comprised entirely of identical users in conventional key predistribution schemes the network now consists of mix of users with different missions ie ordinary users and keying material servers group of users using secret keys preloaded in their memory and public keying material retrieved from one keying material server can compute session key the properties of this method are that the compromise of keying material servers does not reveal any information about users secret keys and the session keys of privileged subset of users if computational assumptions are considered each user has very low storage requirement these properties make it attractive for sensor networks we first formally define the asymmetric key pre distribution scheme in terms of the entropy and give lower bounds on user’s storage requirement and the public keying material size then we present its constructions and applications for sensor networks
motivated by the advent of powerful hardware such as smp machines and execution environments such as grids research in parallel programming has gained much attention within the distributed computing community there is substantial body of efforts in the form of parallel libraries and frameworks that supply developers with programming tools to exploit parallelism in their applications still many of these efforts prioritize performance over other important characteristics such as code invasiveness ease of use and independence of the underlying executing hardware environment in this paper we present easyfjp new approach for semi automatically injecting parallelism into sequential java applications that offers convenient balance to these four aspects easyfjp is based upon the popular fork join parallel pattern and combines implicit application level parallelism with explicit non invasive application tuning experiments performed with several classic cpu intensive benchmarks and real world application confirm that easyfjp effectively addresses these problems while delivering very competitive performance
the co allocation architecture was developed to enable the parallel download of datasets servers from selected replica servers and the bandwidth performance is the main factor that affects the internet transfer between the client and the server therefore it is important to reduce the difference of finished time among replica servers and manage changeful network performance during the term of transferring as well in this paper we proposed an anticipative recursively adjusting co allocation scheme to adjust the workload of each selected replica server which handles unwarned variant network performances of the selected replica servers the algorithm is based on the previous finished rate of assigned transfer size to anticipate the bandwidth status on next section for adjusting the workload and further to reduce file transfer time in grid environment our approach is useful in unstable grid environment which reduces the idle time wasted waiting for the slowest server and decreases file transfer completion time
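a minimal python sketch of the underlying idea of adjusting per server workload from the finish rate observed on the previous section of a co-allocated transfer; the actual anticipative recursively adjusting scheme is more involved, and the proportional rule, names and numbers below are illustrative assumptions

def next_assignment(section_size, assigned, elapsed):
    # assigned[i] bytes were given to replica server i and took elapsed[i]
    # seconds, so the observed rate anticipates the bandwidth of the next section
    rates = [a / t for a, t in zip(assigned, elapsed)]
    total = sum(rates)
    return [section_size * r / total for r in rates]

# a slow third replica server receives a smaller share of the next section
print(next_assignment(900, assigned=[300, 300, 300], elapsed=[10, 12, 30]))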
classification in genres and domains is major field of research for information retrieval scientific and technical watch data mining etc and the selection of appropriate descriptors to characterize and classify texts is particularly crucial to that effect most of practical experiments consider that domains are correlated to the content level words tokens lemmas etc and genres to the morphosyntactic or linguistic one function words pos etc however currently used variables are generally not accurate enough to be applied to the categorization task the present study assesses the impact of the lexical and linguistic levels in the field of genre and domain categorization the empirical results we obtained demonstrate how important it is to select an appropriate tagset that meets the requirement of the task the results also assess the efficiency of the linguistic level for both genre and domain based categorization
ksplice allows system administrators to apply patches to their operating system kernels without rebooting unlike previous hot update systems ksplice operates at the object code layer which allows ksplice to transform many traditional source code patches into hot updates with little or no programmer involvement in the common case that patch does not change the semantics of persistent data structures ksplice can create hot update without programmer writing any new code security patches are one compelling application of hot updates an evaluation involving all significant linux security patches from may to may finds that most security patches of require no new code to be performed as ksplice update in other words ksplice can correct of the linux kernel vulnerabilities from this interval without the need for rebooting and without writing any new code if programmer writes small amount of new code to assist with the remaining patches about lines per patch on average then ksplice can apply all of the security patches from this interval without rebooting
we study the problem of packing element disjoint steiner trees in graphs we are given graph and designated subset of terminal nodes and the goal is to find maximum cardinality set of element disjoint trees such that each tree contains every terminal node an element means non terminal node or an edge thus each non terminal node and each edge must be in at most one of the trees we show that the problem is apx hard when there are only three terminal nodes thus answering an open question our main focus is on the special case when the graph is planar we show that the problem of finding two element disjoint steiner trees in planar graph is np hard we design an algorithm for planar graphs that achieves an approximation guarantee close to in fact given planar graph that is element connected on the terminals is an upper bound on the number of element disjoint steiner trees the algorithm returns element disjoint steiner trees using this algorithm we get an approximation algorithm for the edge disjoint version of the problem on planar graphs that improves on the previous approximation guarantees we also show that the natural lp relaxation of the planar problem has an integrality ratio approaching
in the context of the polikom research program novel telecooperation tools to support the distributed german government are being developed politeam is one of the projects resulting from this program aiming at the development of system supporting cooperation in large geographically distributed organizations main area of research is asynchronous cooperation based on the metaphors of electronic circulation folders and shared electronic workspaces the design process of politeam is based on the continuous improvement of an existing groupware system in close cooperation with selected pilot users this paper discusses our methodological design approach the design of the politeam system and the experiences our application partners made in the course of using the system at work
multiple disk systems disk arrays have been an attractive approach to meet high performance demands in data intensive applications such as information retrieval systems when we partition and distribute files across multiple disks to exploit the potential for parallelism balanced workload distribution becomes important for good performance naturally the performance of parallel information retrieval system using an inverted file structure is affected by the partitioning scheme of the inverted file in this paper we propose two different partitioning schemes for an inverted file system for shared everything multiprocessor machine with multiple disks we study the performance of these schemes by simulation under number of workloads where the term frequencies in the documents are varied the term frequencies in the queries are varied the number of disks are varied and the multiprogramming level is varied
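a minimal python sketch contrasting two common ways of partitioning an inverted file across multiple disks, in the spirit of the schemes compared above; the round robin term assignment, the modulo document assignment and the toy index are illustrative assumptions, not the paper's exact schemes

from collections import defaultdict

def term_partition(inverted_file, num_disks):
    # a term's whole posting list lives on one disk
    disks = [{} for _ in range(num_disks)]
    for i, (term, postings) in enumerate(sorted(inverted_file.items())):
        disks[i % num_disks][term] = list(postings)
    return disks

def document_partition(inverted_file, num_disks):
    # every posting list is split by document, so all disks serve each query term
    disks = [defaultdict(list) for _ in range(num_disks)]
    for term, postings in inverted_file.items():
        for doc_id in postings:
            disks[doc_id % num_disks][term].append(doc_id)
    return [dict(d) for d in disks]

inv = {"noc": [1, 2, 5, 8], "router": [2, 3, 8], "cache": [1, 4]}
print(term_partition(inv, 2))
print(document_partition(inv, 2))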
botnet is network of compromised hosts that is under the control of single malicious entity often called the botmaster we present system that aims to detect bots independent of any prior information about the command and control channels or propagation vectors and without requiring multiple infections for correlation our system relies on detection models that target the characteristic fact that every bot receives commands from the botmaster to which it responds in specific way these detection models are generated automatically from network traffic traces recorded from actual bot instances we have implemented the proposed approach and demonstrate that it can extract effective detection models for variety of different bot families these models are precise in describing the activity of bots and raise very few false positives
we formulate dependency parsing as graphical model with the novel ingredient of global constraints we show how to apply loopy belief propagation bp simple and effective tool for approximate learning and inference as parsing algorithm bp is both asymptotically and empirically efficient even with second order features or latent variables which would make exact parsing considerably slower or np hard bp needs only time with small constant factor furthermore such features significantly improve parse accuracy over exact first order methods incorporating additional features would increase the runtime additively rather than multiplicatively
this survey covers rollback recovery techniques that do not require special language constructs in the first part of the survey we classify rollback recovery protocols into checkpoint based and log based checkpoint based protocols rely solely on checkpointing for system state restoration checkpointing can be coordinated uncoordinated or communication induced log based protocols combine checkpointing with logging of nondeterministic events encoded in tuples called determinants depending on how determinants are logged log based protocols can be pessimistic optimistic or causal throughout the survey we highlight the research issues that are at the core of rollback recovery and present the solutions that currently address them we also compare the performance of different rollback recovery protocols with respect to series of desirable properties and discuss the issues that arise in the practical implementations of these protocols
the prevalent use of xml highlights the need for generic flexible access control mechanism for xml documents that supports efficient and secure query access without revealing sensitive information to unauthorized users this paper introduces novel paradigm for specifying xml security constraints and investigates the enforcement of such constraints during xml query evaluation our approach is based on the novel concept of security views which provide for each user group an xml view consisting of all and only the information that the users are authorized to access and view dtd that the xml view conforms to security views effectively protect sensitive data from access and potential inferences by unauthorized user and provide authorized users with necessary schema information to facilitate effective query formulation and optimization we propose an efficient algorithm for deriving security view definitions from security policies defined on the original document dtd for different user groups we also develop novel algorithms for xpath query rewriting and optimization such that queries over security views can be efficiently answered without materializing the views our algorithms transform query over security view to an equivalent query over the original document and effectively prune query nodes by exploiting the structural properties of the document dtd in conjunction with approximate xpath containment tests our work is the first to study flexible dtd based access control model for xml and its implications on the xml query execution engine furthermore it is among the first efforts for query rewriting and optimization in the presence of general dtds for rich class of xpath queries an empirical study based on real life dtds verifies the effectiveness of our approach
this paper describes the design goals and current status of tidier software tool that tidies erlang source code making it cleaner simpler and often also more efficient in contrast to other refactoring tools tidier is completely automatic and is not tied to any particular editor or ide instead tidier comes with suite of code transformations that can be selected by its user via command line options and applied in bulk on set of modules or entire applications using simple command alternatively users can use tidier’s gui to inspect one by one the transformations that will be performed on their code and manually select only those that they fancy we have used tidier to clean up various applications of erlang otp and have tested it on many open source erlang code bases of significant size we briefly report our experiences and show opportunities for tidier’s current set of transformations on existing erlang code out there as by product our paper also documents what we believe are good coding practices in erlang last but not least our paper describes in detail the automatic code cleanup methodology we advocate and set of refactorings which are general enough to be applied as is or with only small modifications to the source code of programs written in haskell or clean and possibly even in non functional languages
the paper covers the problem of bridging the gap between abstract and textual concrete syntaxes of software languages in the model driven engineering mde context this problem has been well studied in the context of programming languages but due to the obvious difference in the definitions of abstract syntax mde requires new set of engineering principles we first explore different approaches to defining abstract and concrete syntaxes in the mde context next we investigate the current state of languages and techniques used for bridging between textual concrete and abstract syntaxes in the context of mde finally we report on lessons learned in experimenting with the current technologies in order to provide comprehensive coverage of the problem under study we have selected case of web rule languages web rule languages leverage various types of syntax specification languages and they are complex in nature and large in terms of the language elements thus they provide us with realistic analysis framework based on which we can draw general conclusions based on the series of experiments that we conducted with the analyzed languages we propose method for approaching such problems and report on the empirical results obtained from the data collected during our experiments
radio frequency identification authentication protocols rfid aps are an active research topic and many protocols have been proposed in this paper we consider class of recently proposed lightweight rfid authentication protocols crap lcap ohlcap trap ya trap and ya trap which are claimed to be resistant to conventional attacks and suitable for low cost rfid device scenarios we examine them using gny logic to determine whether they can be proved to have achieved their protocol goals we show that most of them meet their goals though some do not furthermore this approach enables us to identify similarities and subtle differences among these protocols finally we offer guidelines on when it is necessary to use encryption rather than hash functions in the design of rfid authentication protocols
the processing of nn queries has been studied extensively both in centralized computing environment and in structured pp environment however the problem over an unstructured pp system is not well studied despite their popularity communication efficient processing of nn queries in such an environment is unique challenge due to the distribution dynamics and large scale of the system in this paper we investigate the problem of efficiently computing nn queries over unstructured pp systems we first propose location based domination model to determine search space we then present two types of probing strategies radius convergence and radius expanding comprehensive performance study demonstrates that our techniques are efficient and scalable
animated characters that move and gesticulate appropriately with spoken text are useful in wide range of applications unfortunately this class of movement is very difficult to generate even more so when unique individual movement style is required we present system that with focus on arm gestures is capable of producing full body gesture animation for given input text in the style of particular performer our process starts with video of person whose gesturing style we wish to animate tool assisted annotation process is performed on the video from which statistical model of the person’s particular gesturing style is built using this model and input text tagged with theme rheme and focus our generation algorithm creates gesture script as opposed to isolated singleton gestures our gesture script specifies stream of continuous gestures coordinated with speech this script is passed to an animation system which enhances the gesture description with additional detail it then generates either kinematic or physically simulated motion based on this description the system is capable of generating gesture animations for novel text that are consistent with given performer’s style as was successfully validated in an empirical user study
providing information about other users and their activities is central function of many collaborative applications the data that provide this presence awareness are usually automatically generated and highly dynamic for example services such as aol instant messenger allow users to observe the status of one another and to initiate and participate in chat sessions as such services become more powerful privacy and security issues regarding access to sensitive user data become critical two key software engineering challenges arise in this context policies regarding access to data in collaborative applications have subtle complexities and must be easily modifiable during collaboration users must be able to have high degree of confidence that the implementations of these policies are correct in this paper we propose framework that uses an automated verification approach to ensure that such systems conform to complex policies our approach takes advantage of verisoft recent tool for systematically testing implementations of concurrent systems and is applicable to wide variety of specification and development platforms for collaborative applications we illustrate the key features of our framework by applying it to the development of presence awareness system
pop up targets such as the items of popup menus and animated targets such as the moving windows in mac os exposé are common in current desktop environments this paper describes an initial study of pointing on pop up and animated targets since we are interested in expert performance we study the situation where the user has previous knowledge of the final position of the target we investigate the effect of the delay factor ie the delay before the target pops up for pop up targets or the duration of the animation for animated targets we find little difference between the two techniques in terms of pointing performance time and error however kinematic analysis reveals differences in the nature of the pointing movement we also find that movement time increases with delay but the degradation is smaller when the target is farther away than when it is closer indeed larger distances require longer movement time therefore the target reaches its destination while the participant is still moving the pointer providing more opportunity to correct the movement than with short distances finally we take into account these results to propose an extension to fitts law that better predicts movement time for these tasks
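For reference, the abstract builds on the standard (Shannon) formulation of Fitts' law, which predicts movement time MT from target distance D and width W; the delay-dependent correction proposed in the paper is not reproduced here:

```latex
MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)
```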
this paper proposes the use of session types to extend with behavioural information the simple descriptions usually provided by software component interfaces we show how session types allow not only high level specifications of complex interactions but also the definition of powerful interoperability tests at the protocol level namely compatibility and substitutability of components we present decidable proof system to verify these notions which makes our approach of pragmatic nature
the quantification of lexical semantic relatedness has many applications in nlp and many different measures have been proposed we evaluate five of these measures all of which use wordnet as their central resource by comparing their performance in detecting and correcting real word spelling errors an information content based measure proposed by jiang and conrath is found superior to those proposed by hirst and st onge leacock and chodorow lin and resnik in addition we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness
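As an illustration only (not the authors' evaluation setup), several of the WordNet-based measures named above are available in NLTK; the snippet assumes the WordNet and WordNet-IC corpora are installed, and the Hirst and St-Onge measure is not among NLTK's built-ins:

```python
# Compare a few WordNet-based relatedness measures with NLTK (illustrative only).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

ic = wordnet_ic.ic('ic-brown.dat')                  # information content from the Brown corpus
s1, s2 = wn.synsets('car')[0], wn.synsets('bicycle')[0]

print('jiang-conrath   :', s1.jcn_similarity(s2, ic))   # information-content based
print('lin             :', s1.lin_similarity(s2, ic))
print('resnik          :', s1.res_similarity(s2, ic))
print('leacock-chodorow:', s1.lch_similarity(s2))        # path/depth based
```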
thread level speculation tls has proven to be promising method of extracting parallelism from both integer and scientific workloads targeting speculative threads that range in size from hundreds to several thousand dynamic instructions and have minimal dependences between them recent work has shown that tls can offer compelling performance improvements for database workloads but only when targeting much larger speculative threads of more than dynamic instructions per thread with many frequent data dependences between them to support such large and dependent speculative threads hardware must be able to buffer the additional speculative state and must also address the more challenging problem of tolerating the resulting cross thread data dependences in this paper we present hardware support for large speculative threads that integrates several previous proposals for tls hardware we also introduce support for subthreads mechanism for tolerating cross thread data dependences by checkpointing speculative execution when speculation fails due to violated data dependence with sub threads the failed thread need only rewind to the checkpoint of the appropriate sub thread rather than rewinding to the start of execution this significantly reduces the cost of mis speculation we evaluate our hardware support for large and dependent speculative threads in the database domain and find that the transaction response time for three of the five transactions from tpc on simulated processor chip multiprocessor speedup by factor of to
in this paper we present method to jointly optimise the relevance and the diversity of the results in image retrieval without considering diversity image retrieval systems often mainly find set of very similar results so called near duplicates which is often not the desired behaviour from the user perspective the ideal result consists of documents which are not only relevant but ideally also diverse most approaches addressing diversity in image or information retrieval use two step approach where in first step set of potentially relevant images is determined and in second step these images are reranked to be diverse among the first positions in contrast to these approaches our method addresses the problem directly and jointly optimises the diversity and the relevance of the images in the retrieval ranking using techniques inspired by dynamic programming algorithms we quantitatively evaluate our method on the imageclef photo retrieval data and obtain results which outperform the state of the art additionally we perform qualitative evaluation on new product search task and it is observed that the diverse results are more attractive to an average user
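The greedy, maximal-marginal-relevance style reranker below is only meant to illustrate the idea of trading relevance against redundancy; it is not the paper's joint, dynamic-programming-inspired optimisation, and the scoring functions and the weight lam are placeholders:

```python
def rerank(candidates, relevance, similarity, k, lam=0.7):
    """Greedily pick k images, balancing relevance against similarity to already picked ones.

    candidates: iterable of ids; relevance: id -> float; similarity: (id, id) -> float in [0, 1].
    """
    selected, pool = [], set(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1.0 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```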
the rapid propagation of the internet and information technologies has changed the nature of many industries fast response and personalized recommendations have become natural trends for all businesses this is particularly important for content related products and services such as consulting news and knowledge management in an organization the digital nature of their products allows for more customized delivery over the internet to provide personalized services however complete understanding of user profile and accurate recommendation are essential in this paper an internet recommendation system that allows customized content to be suggested based on the user’s browsing profile is developed the method adopts semantic expansion approach to build the user profile by analyzing documents previously read by the person once the customer profile is constructed personalized contents can be provided by the system an empirical study using master theses in the national central library in taiwan shows that the semantic expansion approach outperforms the traditional keyword approach in catching user interests the proper usage of this technology can increase customer satisfaction
the amount of information produced in the world increases by every year and this rate will only go up with advanced network technology more and more sources are available either over the internet or in enterprise intranets modern data management applications such as setting up web portals managing enterprise data managing community data and sharing scientific data often require integrating available data sources and providing uniform interface for users to access data from different sources such requirements have been driving fruitful research on data integration over the last two decades
tcp is widely used in commercial multimedia streaming systems with recent measurement studies indicating that significant fraction of internet streaming media is currently delivered over http tcp these observations motivate us to develop analytic performance models to systematically investigate the performance of tcp for both live and stored media streaming we validate our models via ns simulations and experiments conducted over the internet our models provide guidelines indicating the circumstances under which tcp streaming leads to satisfactory performance showing for example that tcp generally provides good streaming performance when the achievable tcp throughput is roughly twice the media bitrate with only few seconds of startup delay
the selection of stopping time ie scale significantly affects the performance of anisotropic diffusion filter for image denoising this paper designs markov random field mrf scale selection model which selects scales for image segments then the denoised image is the composition of segments at their optimal scales in the scale space firstly statistics based scale selection criteria are proposed for image segments then we design scale selection energy function in the mrf framework by considering the scale coherence between neighboring segments segment based noise estimation algorithm is also developed to estimate the noise statistics efficiently experiments show that the performance of mrf scale selection model is much better than the previous global scale selection schemes combined with this scale selection model the anisotropic diffusion filter is comparable to or even outperforms the state of the art denoising methods in performance
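The abstract does not spell out the energy function; as a generic sketch only, an MRF scale-selection energy of the usual form combines a per-segment data term D_i (the statistics-based criterion) with a pairwise term V enforcing scale coherence between neighbouring segments, weighted by a parameter lambda:

```latex
E(s) \;=\; \sum_{i} D_i(s_i) \;+\; \lambda \sum_{(i,j) \in \mathcal{N}} V(s_i, s_j)
```

where s_i is the scale assigned to segment i and N is the set of neighbouring segment pairs.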
this paper is concerned with transformation based approach to update propagation in an extended version of codd’s relational algebra which allows for defining derived relations even recursively it is shown that the desired optimization effects of update propagation may be lost if no generalized selection pushing strategy is employed to the transformed algebra expressions possible solution is the application of the magic sets rewriting but this may lead to unstratifiability of the incremental expressions for the efficient evaluation of magic sets transformed algebra expressions we propose to use the soft stratification approach because of the simplicity and efficiency of this technique
in many mid to large sized cities public maps are ubiquitous one can also find great number of maps in parks or near hiking trails public maps help to facilitate orientation and provide special information to not only tourists but also to locals who just want to look up an unfamiliar place while on the go these maps offer many advantages compared to mobile maps from services like google maps mobile or nokia maps they often show local landmarks and sights that are not shown on standard digital maps often these you are here yah maps are adapted to special use case eg zoo map or hiking map of certain area being designed for fashioned purpose these maps are often aesthetically well designed and their usage is therefore more pleasant in this paper we present novel technique and application called photomap that uses images of you are here maps taken with gps enhanced mobile camera phone as background maps for on the fly navigation tasks we discuss different implementations of the main challenge namely helping the user to properly georeference the taken image with sufficient accuracy to support pedestrian navigation tasks we present study that discusses the suitability of various public maps for this task and we evaluate if these georeferenced photos can be used for navigation on gps enabled devices
in this paper we propose the use of semistructured constraints in wrappers to mitigate the impact of poor extraction accuracy on cooperative information system cis data quality wrappers are critical element of ciss whenever the constituent information systems publish semistructured text such as forms reports and memos rather than structured databases the accuracy of cis data that stem from text depends upon the wrappers as well as the accuracy of the underlying sources wrapper specification is the process of defining patterns ie regular expressions to extract information from semistructured text wrapper verification is the process of ensuring extraction accuracy that the extracted information faithfully reflects the underlying source we focus on the problem of extraction accuracy we use constraints on semistructured data for both wrapper specification and verification consequently we perform extraction and verification simultaneously we apply the concept to wrappers for uniform domain name dispute resolution policy udrp cis of arbitration decisions udrp decisions are currently distributed across arbitration authorities on three continents the accuracy of data extracted using constraint based specification and verification is measured by type and type ii errors
despite the fact that large scale shared memory multiprocessors have been commercially available for several years system software that fully utilizes all their features is still not available mostly due to the complexity and cost of making the required changes to the operating system recently proposed approach called disco substantially reduces this development cost by using virtual machine monitor that leverages the existing operating system technology in this paper we present system called cellular disco that extends the disco work to provide all the advantages of the hardware partitioning and scalable operating system approaches we argue that cellular disco can achieve these benefits at only small fraction of the development cost of modifying the operating system cellular disco effectively turns large scale shared memory multiprocessor into virtual cluster that supports fault containment and heterogeneity while avoiding operating system scalability bottlenecks yet at the same time cellular disco preserves the benefits of shared memory multiprocessor by implementing dynamic fine grained resource sharing and by allowing users to overcommit resources such as processors and memory this hybrid approach requires scalable resource manager that makes local decisions with limited information while still providing good global performance and fault containment in this paper we describe our experience with cellular disco prototype on processor sgi origin system we show that the execution time penalty for this approach is low typically within of the best available commercial operating system for most workloads and that it can manage the cpu and memory resources of the machine significantly better than the hardware partitioning approach
software agents can be used to automate many of the tedious time consuming information processing tasks that humans currently have to complete manually however to do so agent plans must be capable of representing the myriad of actions and control flows required to perform those tasks in addition since these tasks can require integrating multiple sources of remote information typically slow bound process it is desirable to make execution as efficient as possible to address both of these needs we present flexible software agent plan language and highly parallel execution system that enable the efficient execution of expressive agent plans the plan language allows complex tasks to be more easily expressed by providing variety of operators for flexibly processing the data as well as supporting subplans for modularity and recursion for indeterminate looping the executor is based on streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime we have implemented both the language and executor in system called theseus our results from testing theseus show that streaming dataflow execution can yield significant speedups over both traditional serial von neumann as well as nonstreaming dataflow style execution that existing software and robot agent execution systems currently support in addition we show how plans written in the language we present can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines finally we demonstrate that the increased expressivity of our plan language does not hamper performance specifically we show how data can be integrated from multiple remote sources just as efficiently using our architecture as is possible with state of the art streaming dataflow network query engine
mobile computing based upon wireless technology as the interconnect and pdas web enabled cell phones etc as the end devices provide rich infrastructure for anywhere anytime information access wireless connectivity also poses tough problems network nodes may be mobile and the connectivity could be sporadic in many cases application mobility involving migration from one network node to another could provide interesting possibilities however the migration process is expensive in terms of both time and power overheads to minimize the migration cost an efficient strategy must decide which parts of the program should migrate to continue execution and at which program point the migration should take place in this work we develop compiler framework to achieve the above two goals first the potential migration points are decided by analyzing the call chains in the code then the compiler determines what parts of the program are dead at these points at run time using the current context of the call chain decision on whether to migrate now or later is taken such decision depends mainly upon the cost of migration involved at the current program point vs at later potential migration point our experiments with multimedia applications show that both the migration state and the latency are significantly reduced by our techniques over the base case of migration with full state in the absence of any compiler guidance thus the key contribution of the paper is to provide an efficient migration methodology removing barriers to application mobility
automatic text chunking aims to recognize grammatical phrase structures in natural language text text chunking provides downstream syntactic information for further analysis which is also an important technology in the area of text mining tm and natural language processing nlp existing chunking systems make use of external knowledge eg grammar parsers or integrate multiple learners to achieve higher performance however the external knowledge is almost unavailable in many domains and languages besides employing multiple learners does not only complicate the system architecture but also increase training and testing time costs in this paper we present novel phrase chunking model based on the proposed mask method without employing external knowledge and multiple learners the mask method could automatically derive more training examples from the original training data which significantly improves system performance we had evaluated our method in different chunking tasks and languages in comparison to previous studies the experimental results show that our method achieves state of the art performance in chunking tasks in two english chunking tasks ie shallow parsing and base chunking our method achieves and in rates when porting to chinese the rate is also our chunker is quite efficient the complete chunking time of words is less than
service level agreements slas are currently one of the major research topics in grid computing among many system components for sla related grid jobs the sla mapping mechanism has received widespread attention it is responsible for assigning sub jobs of workflow to variety of grid resources in way that meets the user’s deadline and costs as little as possible with the distinguished workload and resource characteristics mapping heavy communication workflow within an sla context gives rise to complicated combinatorial optimization problem this paper presents the application of various metaheuristics and suggests possible approach to solving this problem performance measurements deliver evaluation results on the quality and efficiency of each method
as in the web the growth of information is the main problem of the academic digital libraries thus similar tools could be applied in university digital libraries to facilitate the information access by the students and teachers in we presented fuzzy linguistic recommender system to advise research resources in university digital libraries the problem of this system is that the user profiles are provided directly by the users themselves and the process for acquiring user preferences is quite difficult because it requires too much user effort in this paper we present new fuzzy linguistic recommender system that facilitates the acquisition of the user preferences to characterize the user profiles we allow users to provide their preferences by means of incomplete fuzzy linguistic preference relation we include tools to manage incomplete information when the users express their preferences and in such way we show that the acquisition of the user profiles is improved
peer to peer pp systems which provide variety of popular services such as file sharing video streaming and voice over ip contribute significant portion of today’s internet traffic by building overlay networks that are oblivious to the underlying internet topology and routing these systems have become one of the greatest traffic engineering challenges for internet service providers isps and the source of costly data traffic flows in an attempt to reduce these operational costs isps have tried to shape block or otherwise limit pp traffic much to the chagrin of their subscribers who consistently find ways to eschew these controls or simply switch providers in this paper we present the design deployment and evaluation of an approach to reducing this costly cross isp traffic without sacrificing system performance our approach recycles network views gathered at low cost from content distribution networks to drive biased neighbor selection without any path monitoring or probing using results collected from deployment in bittorrent with over users in nearly networks we show that our lightweight approach significantly reduces cross isp traffic and over of the time it selects peers along paths that are within single autonomous system as further we find that our system locates peers along paths that have two orders of magnitude lower latency and lower loss rates than those picked at random and that these high quality paths can lead to significant improvements in transfer rates in challenged settings where peers are overloaded in terms of available bandwidth our approach provides average download rate improvement in environments with large available bandwidth it increases download rates by on average and improves median rates by
centroidal voronoi tessellations cvts are special voronoi tessellations whose generators are also the centers of mass centroids of the voronoi regions with respect to given density function and cvt based methodologies have been proven to be very useful in many diverse applications in science and engineering in the context of image processing and its simplest form cvt based algorithms reduce to the well known means clustering and are easy to implement in this paper we develop an edge weighted centroidal voronoi tessellation ewcvt model for image segmentation and propose some efficient algorithms for its construction our ewcvt model can overcome some deficiencies possessed by the basic cvt model in particular the new model appropriately combines the image intensity information together with the length of cluster boundaries and can handle very sophisticated situations we demonstrate through extensive examples the efficiency effectiveness robustness and flexibility of the proposed method
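In its simplest (unweighted, intensity-only) form the CVT construction mentioned above reduces to Lloyd-style clustering; the sketch below shows only that special case and omits the edge-weighted boundary-length term that distinguishes the EWCVT model:

```python
import numpy as np

def cvt_segment(image, k=3, iters=20, seed=0):
    """Plain CVT / k-means on pixel intensities (no edge-weighted boundary term)."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1).astype(float)
    centroids = rng.choice(pixels, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)   # assign step
        for j in range(k):                                                          # update step
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean()
    return labels.reshape(image.shape), centroids
```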
debugging is time consuming task in software development although various automated approaches have been proposed they are not effective enough on the other hand in manual debugging developers have difficulty in choosing breakpoints to address these problems and help developers locate faults effectively we propose an interactive fault localization framework combining the benefits of automated approaches and manual debugging before the fault is found this framework continuously recommends checking points based on statements suspicions which are calculated according to the execution information of test cases and the feedback information from the developer at earlier checking points then we propose naive approach which is an initial implementation of this framework however with this naive approach or manual debugging developers’ wrong estimation of whether the faulty statement is executed before the checking point breakpoint may make the debugging process fail so we propose another robust approach based on this framework handling cases where developers make mistakes during the fault localization process we performed two experimental studies and the results show that the two interactive approaches are quite effective compared with existing fault localization approaches moreover the robust approach can help developers find faults when they make wrong estimation at some checking points
we derive two big step abstract machines natural semantics and the valuation function of denotational semantics based on the small step abstract machine for core scheme presented by clinger at pldi starting from functional implementation of this small step abstract machine we fuse its transition function with its driver loop obtaining the functional implementation of big step abstract machine we adjust this big step abstract machine so that it is in defunctionalized form obtaining the functional implementation of second big step abstract machine we refunctionalize this adjusted abstract machine obtaining the functional implementation of natural semantics in continuation passing style and we closure unconvert this natural semantics obtaining compositional continuation passing evaluation function which we identify as the functional implementation of denotational semantics in continuation passing style we then compare this valuation function with that of clinger’s original denotational semantics of scheme
private data sometimes must be made public corporation may keep its customer sales data secret but reveals totals by sector for marketing reasons hospital keeps individual patient data secret but might reveal outcome information about the treatment of particular illnesses over time to support epidemiological studies in these and many other situations aggregate data or partial data is revealed but other data remains private moreover the aggregate data may depend not only on private data but on public data as well eg commodity prices general health statistics our ghostdb platform allows queries that combine private and public data produce aggregates to data warehouses for olap purposes and reveal exactly what is desired neither more nor less we call this functionality revelation on demand
in software transactional memory stm contention management refers to the mechanisms used to ensure forward progress to avoid livelock and starvation and to promote throughput and fairness unfortunately most past approaches to contention management were designed for obstruction free stm frameworks and impose significant constant time overheads priority based approaches in particular typically require that reads be visible to all transactions an expensive property that is not easy to support in most stm systems in this paper we present comprehensive strategy for contention management via fair resolution of conflicts in an stm with invisible reads our strategy depends on lazy acquisition of ownership extendable timestamps and an efficient way to capture both priority and conflicts we introduce two mechanisms one using bloom filters the other using visible read bits that implement point these mechanisms unify the notions of conflict resolution inevitability and transaction retry they are orthogonal to the rest of the contention management strategy and could be used in wide variety of hardware and software tm systems experimental evaluation demonstrates that the overhead of the mechanisms is low particularly when conflicts are rare and that our strategy as whole provides good throughput and fairness including livelock and starvation freedom even for challenging workloads
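As a generic illustration of the kind of compact read-set summary mentioned above, here is a minimal Bloom filter; the filter size, hash scheme and the way it would be wired into the STM's conflict and priority checks are assumptions, not the paper's design:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over an integer bit array (illustrative sizes and hashing)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array |= 1 << p

    def might_contain(self, item):
        return all(self.array & (1 << p) for p in self._positions(item))

reads = BloomFilter()
reads.add("location_42")
print(reads.might_contain("location_42"))   # True
print(reads.might_contain("location_7"))    # almost certainly False
```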
while interactive multimedia animation is very compelling medium few people are able to express themselves in it there are too many low level details that have to do not with the desired content eg shapes appearance and behavior but rather how to get computer to present the content for instance behaviors like motion and growth are generally gradual continuous phenomena moreover many such behaviors go on simultaneously computers on the other hand cannot directly accommodate either of these basic properties because they do their work in discrete steps rather than continuously and they only do one thing at time graphics programmers have to spend much of their effort bridging the gap between what an animation is and how to present it on computer we propose that this situation can be improved by change of language and present fran synthesized by complementing an existing declarative host language haskell with an embedded domain specific vocabulary for modeled animation as demonstrated in collection of examples the resulting animation descriptions are not only relatively easy to write but also highly composable
we have developed multithreaded implementation of breadth first search bfs of sparse graph using the cilk extensions to our pbfs program on single processor runs as quickly as standard breadth first search implementation pbfs achieves high work efficiency by using novel implementation of multiset data structure called bag in place of the fifo queue usually employed in serial breadth first search algorithms for variety of benchmark input graphs whose diameters are significantly smaller than the number of vertices condition met by many real world graphs pbfs demonstrates good speedup with the number of processing cores since pbfs employs nonconstant time reducer hyperobject feature of cilk the work inherent in pbfs execution depends nondeterministically on how the underlying work stealing scheduler load balances the computation we provide general method for analyzing nondeterministic programs that use reducers pbfs also is nondeterministic in that it contains benign races which affect its performance but not its correctness fixing these races with mutual exclusion locks slows down pbfs empirically but it makes the algorithm amenable to analysis in particular we show that for graph with diameter and bounded out degree this data race free version of pbfs algorithm runs in time dlg on processors which means that it attains near perfect linear speedup if dlg
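A sequential, level-synchronous sketch of the bag-per-level idea: each BFS level is an unordered collection whose elements may be visited in any order (and, in PBFS, in parallel across workers); plain Python sets stand in for the paper's bag data structure and none of the Cilk++ reducer machinery is shown:

```python
from collections import defaultdict

def level_sync_bfs(adj, source):
    dist = {source: 0}
    frontier = {source}                      # the "bag" holding the current level
    level = 0
    while frontier:
        next_bag = set()
        for u in frontier:                   # processed in parallel in PBFS
            for v in adj[u]:
                if v not in dist:
                    dist[v] = level + 1
                    next_bag.add(v)
        frontier, level = next_bag, level + 1
    return dist

adj = defaultdict(list, {0: [1, 2], 1: [3], 2: [3], 3: []})
print(level_sync_bfs(adj, 0))                # {0: 0, 1: 1, 2: 1, 3: 2}
```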
we describe an implementation of session types in haskell session types statically enforce that client server communication proceeds according to protocols they have been added to several concurrent calculi but few implementations of session types are available our embedding takes advantage of haskell where appropriate but we rely on no exotic features thus our approach translates with minimal modification to other polymorphic typed languages such as ml and java our implementation works with existing haskell concurrency mechanisms handles multiple communication channels and recursive session types and infers protocols automatically while our implementation uses unsafe operations in haskell it does not violate haskell’s safety guarantees we formalize this claim in concurrent calculus with unsafe communication primitives over which we layer our implementation of session types and we prove that the session types layer is safe in particular it enforces that channel based communication follows consistent protocols
in this paper we address the problem of building class of robust factorization algorithms that solve for the shape and motion parameters with both affine weak perspective and perspective camera models we introduce gaussian uniform mixture model and its associated em algorithm this allows us to address parameter estimation within data clustering approach we propose robust technique that works with any affine factorization method and makes it resilient to outliers in addition we show how such framework can be further embedded into an iterative perspective factorization scheme we carry out large number of experiments to validate our algorithms and to compare them with existing ones we also compare our approach with factorization methods that use estimators
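As a sketch of the E-step in a generic Gaussian-plus-uniform mixture (the exact parametrisation and its coupling with the factorization step in the paper may differ), the posterior probability that measurement x_i is an inlier is

```latex
w_i \;=\; \frac{\pi \, \mathcal{N}(x_i;\,\mu,\Sigma)}
              {\pi \, \mathcal{N}(x_i;\,\mu,\Sigma) \;+\; (1-\pi)\,\tfrac{1}{V}}
```

where pi is the inlier prior, N a Gaussian residual model and 1/V the uniform outlier density over a region of area V; the M-step then re-estimates the parameters using these weights.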
web hosting providers are increasingly looking into dynamic hosting to reduce costs and improve the performance of their platforms instead of provisioning fixed resources to each customer dynamic hosting maintains variable number of application instances to satisfy current demand while existing research in this area has mostly focused on the algorithms that decide on the number and location of application instances we address the problem of efficient enactment of these decisions once they are made we propose new approach to application placement and experimentally show that it dramatically reduces the cost of application placement which in turn improves the end to end agility of the hosting platform in reacting to demand changes
in wireless sensor network wsn concealing the locations and in some cases the identities of nodes especially the controller sometimes called sink or base station is an important problem in this paper we explain that appropriate solutions for this problem depend on the nature of the traffic generated in the network as well as the capabilities of the adversary that must be resisted when there is sufficient amount of data flows real or fake packets our proposed dcarps anonymous routing protocol can support location privacy against global eavesdropper otherwise it is only possible to stop packet tracing attacks by local eavesdropper which is what our probabilistic dcarps protocol achieves these protocols are based on label switching which has not been used in this kind of network before to enable dcarps we propose new approach for network topology discovery that allows the sink to obtain global view of the topology without revealing its own location as opposed to what is common today in sensor networks in order to resist traffic analysis attacks aiming at locating nodes we have used layered cryptography to make packet look randomly different on consecutive links stochastic security analysis of this protocol is provided another important issue in resource constrained sensor networks is energy conservation to this end our protocols use only modest symmetric cryptography also the sink is responsible for all routing calculations while the sensors only perform simple label swapping actions when forwarding packets another advantage of labels is preventing unnecessary cryptographic operations as will be seen in the manuscript furthermore we have embedded fairness scheme in the creation of the routing tree for the sensor network that distributes the burden of packet forwarding evenly
synopses construction algorithms have been found to be of interest in query optimization approximate query answering and mining and over the last few years several good synopsis construction algorithms have been proposed these algorithms have mostly focused on the running time of the synopsis construction vis à vis the synopsis quality however the space complexity of synopsis construction algorithms has not been investigated as thoroughly many of the optimum synopsis construction algorithms are expensive in space for some of these algorithms the space required to construct the synopsis is significantly larger than the space required to store the input these algorithms rely on the fact that they require smaller working space and most of the data can be resident on disc the large space complexity of synopsis construction algorithms is handicap in several scenarios in the case of streaming algorithms space is fundamental constraint in case of offline optimal or approximate algorithms better space complexity often makes these algorithms much more attractive by allowing them to run in main memory and not use disc or alternately allows us to scale to significantly larger problems without running out of space in this paper we propose simple and general technique that reduces space complexity of synopsis construction algorithms as consequence we show that the notion of working space proposed in these contexts is redundant this technique can be easily applied to many existing algorithms for synopsis construction problems we demonstrate the performance benefits of our proposal through experiments on real life and synthetic data we believe that our algorithm also generalizes to broader range of dynamic programs beyond synopsis construction
using directed acyclic graph dag model of algorithms the authors focus on processor time minimal multiprocessor schedules time minimal multiprocessor schedules that use as few processors as possible the kung lo and lewis kll algorithm for computing the transitive closure of relation over set of elements requires at least parallel steps as originally reported their systolic array comprises sup processing elements it is shown that any time minimal multiprocessor schedule of the kll algorithm’s dag needs at least sup processing elements then processor time minimal systolic array realizing the kll dag is constructed its processing elements are organized as cylindrically connected mesh when mod when not mod the mesh is connected as torus
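The sketch below shows only the underlying computation, a Warshall-style transitive closure of a boolean relation; it says nothing about the systolic array, its schedule, or the processor-time bounds discussed in the abstract:

```python
def transitive_closure(reach):
    """Warshall-style boolean transitive closure; reach is an n x n list of booleans."""
    n = len(reach)
    closure = [row[:] for row in reach]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return closure

relation = [[False, True, False],
            [False, False, True],
            [False, False, False]]
print(transitive_closure(relation))   # element 0 now reaches element 2 through element 1
```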
language extensions increase programmer productivity by providing concise often domain specific syntax and support for static verification of correctness security and style constraints language extensions can often be realized through translation to the base language supported by preprocessors and extensible compilers however various kinds of extensions require further adaptation of base compiler’s internal stages and components for example to support separate compilation or to make use of low level primitives of the platform eg jump instructions or unbalanced synchronization to allow for more loosely coupled approach we propose an open compiler model based on normalization steps from high level language to subset of it the core language we developed such compiler for mixed java and core bytecode language and evaluate its effectiveness for composition mechanisms such as traits as well as statement level and expression level language extensions
this paper presents geographic routing protocol boundary state routing bsr which consists of two components the first is an improved forwarding strategy greedy boundedcompass which can forward packets around concave boundaries where the packet moves away from the destination without looping the second component is boundary mapping protocol bmp which is used to maintain link state information for boundaries containing concave vertices the proposed forwarding strategy greedy boundedcompass is shown to produce higher rate of path completion than greedy forwarding and significantly improves the performance of gpsr in sparse networks when used in place of greedy forwarding the proposed geographic routing protocol bsr is shown to produce significant improvements in performance in comparison to gpsr in sparse networks due to informed decisions regarding direction of boundary traversal at local minima
we combine the work of garg and konemann and fleischer with ideas from dynamic graph algorithms to obtain faster approximation schemes for various versions of the multicommodity flow problem in particular if is moderately small and the size of every number used in the input instance is polynomially bounded the running times of our algorithms match up to poly logarithmic factors and some provably optimal terms the mn flow decomposition barrier for single commodity flow
the twisted cube is an important variant of the hypercube recently fan et al proved that the dimensional twisted cube tq is edge pancyclic for every they also asked if tq is edge pancyclic with faults for we find that tq is not edge pancyclic with only one faulty edge for any then we prove that tq is node pancyclic with faulty edges for every the result is optimal in the sense that with faulty edges the faulty tq is not node pancyclic for any
helper threading is technology to accelerate program by exploiting processor’s multithreading capability to run assist threads previous experiments on hyper threaded processors have demonstrated significant speedups by using helper threads to prefetch hard to predict delinquent data accesses in order to apply this technique to processors that do not have built in hardware support for multithreading we introduce virtual multithreading vmt novel form of switch on event user level multithreading capable of fly weight multiplexing of event driven thread executions on single processor without additional operating system support the compiler plays key role in minimizing synchronization cost by judiciously partitioning register usage among the user level threads the vmt approach makes it possible to launch dynamic helper thread instances in response to long latency cache miss events and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalledthe concept of vmt is prototyped on an itanium processor using features provided by the processor abstraction layer pal firmware mechanism already present in currently shipping processors on way mp physical system equipped with vmt enabled itanium processors helper threading via the vmt mechanism can achieve significant performance gains for diverse set of real world workloads ranging from single threaded workstation benchmarks to heavily multithreaded large scale decision support systems dss using the ibm db universal database we measure wall clock speedup of to for the workstation benchmarks and to on various queries in the dss workload
the satisfiability test checks whether or not the evaluation of query returns the empty set for any input document and can be used in query optimization for avoiding the submission and the computation of unsatisfiable queries thus applying the satisfiability test before executing query can save processing time and query costs we focus on the satisfiability problem for queries formulated in the xml query language xpath and propose schema based approach to the satisfiability test of xpath queries which checks whether or not an xpath query conforms to the constraints in given schema if an xpath query does not conform to the constraints given in the schema the evaluation of the query will return an empty result for any valid xml document thus the xpath query is unsatisfiable we present complexity analysis of our approach which proves that our approach is efficient for typical cases we present an experimental analysis of our developed prototype which shows the optimization potential of avoiding the evaluation of unsatisfiable queries
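A toy illustration of the schema-based idea (far simpler than the paper's test): a child-only path is unsatisfiable if some step names an element that the DTD never allows as a child of the elements reached so far; real XPath with predicates, descendant axes and recursive DTDs needs much more machinery:

```python
def satisfiable(path_steps, dtd_children, root):
    """Return False if the child-only path can never select a node in any document valid for the DTD."""
    current = {root}
    for step in path_steps:
        current = {child for elem in current
                         for child in dtd_children.get(elem, ())
                         if child == step}
        if not current:
            return False          # the query returns an empty result on every valid document
    return True

dtd = {"library": ["book"], "book": ["title", "author"]}
print(satisfiable(["book", "title"], dtd, "library"))   # True
print(satisfiable(["book", "price"], dtd, "library"))   # False: unsatisfiable, no need to evaluate
```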
differential topology and specifically morse theory provide suitable setting for formalizing and solving several problems related to shape analysis the fundamental idea behind morse theory is that of combining the topological exploration of shape with quantitative measurement of geometrical properties provided by real function defined on the shape the added value of approaches based on morse theory is in the possibility of adopting different functions as shape descriptors according to the properties and invariants that one wishes to analyze in this sense morse theory allows one to construct general framework for shape characterization parametrized with respect to the mapping function used and possibly the space associated with the shape the mapping function plays the role of lens through which we look at the properties of the shape and different functions provide different insights in the last decade an increasing number of methods that are rooted in morse theory and make use of properties of real valued functions for describing shapes have been proposed in the literature the methods proposed range from approaches which use the configuration of contours for encoding topographic surfaces to more recent work on size theory and persistent homology all these have been developed over the years with specific target domain and it is not trivial to systematize this work and understand the links similarities and differences among the different methods moreover different terms have been used to denote the same mathematical constructs which often overwhelm the understanding of the underlying common framework the aim of this survey is to provide clear vision of what has been developed so far focusing on methods that make use of theoretical frameworks that are developed for classes of real functions rather than for single function even if they are applied in restricted manner the term geometrical topological used in the title is meant to underline that both levels of information content are relevant for the applications of shape descriptions geometrical or metrical properties and attributes are crucial for characterizing specific instances of features while topological properties are necessary to abstract and classify shapes according to invariant aspects of their geometry the approaches surveyed will be discussed in detail with respect to theory computation and application several properties of the shape descriptors will be analyzed and compared we believe this is crucial step to exploit fully the potential of such approaches in many applications as well as to identify important areas of future research
we present an automated approach for high quality preview of feature film rendering during lighting design similar to previous work we use deep framebuffer shaded on the gpu to achieve interactive performance our first contribution is to generate the deep framebuffer and corresponding shaders automatically through data flow analysis and compilation of the original scene cache compression reduces automatically generated deep framebuffers to reasonable size for complex production scenes and shaders we also propose new structure the indirect framebuffer that decouples shading samples from final pixels and allows deep framebuffer to handle antialiasing motion blur and transparency efficiently progressive refinement enables fast feedback at coarser resolution we demonstrate our approach in real world production
autonomic systems manage themselves given high level objectives by their administrators they utilise feedback from their own execution and their environment to self adapt in order to satisfy their goals an important consideration for such systems is structure which is conducive to self management this paper presents structuring methodology for autonomic systems which explicitly models self adaptation while separating functionality and evolution our contribution is software architecture based framework combining an architecture description language based on pi calculus for describing the structure and behaviour of autonomic systems development methodology for evolution and mechanisms for feedback and change
computational design cd is paradigm where both program design and program synthesis are computations cd merges model driven engineering mde which synthesizes programs by transforming models with software product lines spl where programs are synthesized by composing transformations called features in this paper basic relationships between mde and spl are explored using the language of modern mathematics note although jointly authored this paper is written as presented by batory in his models keynote
java programs are increasing in popularity and prevalence on numerous platforms including high performance general purpose processors the success of java technology largely depends on the efficiency in executing the portable java bytecodes however the dynamic characteristics of the java runtime system present unique performance challenges for several aspects of microarchitecture design in this work we focus on the effects of indirect branches on branch target address prediction performance runtime bytecode translation just in time jit compilation frequent calls to the native interface libraries and dependence on virtual methods increase the frequency of polymorphic indirect branches therefore accurate target address prediction for indirect branches is very important for java code this paper characterizes the indirect branch behavior in java processing and proposes an adaptive branch target buffer btb design to enhance the predictability of the targets our characterization shows that traditional btb will frequently mispredict few polymorphic indirect branches significantly deteriorating predictor accuracy in java processing therefore we propose rehashable branch target buffer btb which dynamically identifies polymorphic indirect branches and adapts branch target storage to accommodate multiple targets for branch the btb improves the target predictability of indirect branches without sacrificing overall target prediction accuracy simulations show that the btb eliminates percent of the indirect branch mispredictions suffered with traditional btb for java programs running in interpreter mode percent in jit mode which leads to percent decrease in overall target address misprediction rate percent in jit mode with an equivalent number of entries the btb also outperforms the previously proposed target cache scheme for majority of java programs by adapting to greater variety of indirect branch behaviors
we investigate four hierarchical clustering methods single link complete link groupwise average and single pass and two linguistically motivated text features noun phrase heads and proper names in the context of document clustering statistical model for combining similarity information from multiple sources is described and applied to darpa’s topic detection and tracking phase tdt data this model based on log linear regression alleviates the need for extensive search in order to determine optimal weights for combining input features through an extensive series of experiments with more than documents from multiple news sources and modalities we establish that both the choice of clustering algorithm and the introduction of the additional features have an impact on clustering performance we apply our optimal combination of features to the tdt test data obtaining partitions of the documents that compare favorably with the results obtained by participants in the official tdt competition
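As a stand-in for the log-linear combination model (the actual features, training data and regression details in the paper differ), a logistic regression over per-pair similarity features can learn weights for deciding whether two documents belong in the same cluster:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: three similarity features per document pair (e.g. word overlap,
# noun-phrase-head overlap, proper-name overlap) and a same-cluster label.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X @ np.array([2.0, 1.0, 3.0]) + rng.normal(0, 0.5, 200) > 3.0).astype(int)

model = LogisticRegression().fit(X, y)
print(model.coef_)                              # learned weights for combining the features
print(model.predict_proba([[0.9, 0.8, 0.7]]))   # probability the pair belongs to the same cluster
```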
the recognition of events in videos is relevant and challenging task of automatic semantic video analysis at present one of the most successful frameworks used for object recognition tasks is the bag of words bow approach however this approach does not model the temporal information of the video stream in this paper we present method to introduce temporal information within the bow approach events are modeled as sequence composed of histograms of visual features computed from each frame using the traditional bow model the sequences are treated as strings where each histogram is considered as character event classification of these sequences of variable size depending on the length of the video clip is performed using svm classifiers with string kernel that uses the needleman wunsch edit distance experimental results performed on two datasets soccer video and trecvid demonstrate the validity of the proposed approach
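An illustrative Needleman-Wunsch-style edit distance between two sequences of per-frame bag-of-words histograms; the substitution cost and gap penalty are arbitrary choices rather than the paper's, and the string kernel built on top of such distances is not shown:

```python
import numpy as np

def nw_distance(seq_a, seq_b, gap=1.0):
    """Global alignment cost between two lists of (normalised) histogram vectors."""
    def sub(h1, h2):                              # histogram dissimilarity as substitution cost
        return float(np.abs(h1 - h2).sum()) / 2.0
    n, m = len(seq_a), len(seq_b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * gap
    D[0, :] = np.arange(m + 1) * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j - 1] + sub(seq_a[i - 1], seq_b[j - 1]),
                          D[i - 1, j] + gap,
                          D[i, j - 1] + gap)
    return D[n, m]                                # lower cost means more similar event sequences
```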
replicating content across geographically distributed set of servers and redirecting clients to the closest server in terms of latency has emerged as common paradigm for improving client performance in this paper we analyze latencies measured from servers in google’s content distribution network cdn to clients all across the internet to study the effectiveness of latency based server selection our main result is that redirecting every client to the server with least latency does not suffice to optimize client latencies first even though most clients are served by geographically nearby cdn node sizeable fraction of clients experience latencies several tens of milliseconds higher than others in the same region second we find that queueing delays often override the benefits of client interacting with nearby server to help the administrators of google’s cdn cope with these problems we have built system called whyhigh first whyhigh measures client latencies across all nodes in the cdn and correlates measurements to identify the prefixes affected by inflated latencies second since clients in several thousand prefixes have poor latencies whyhigh prioritizes problems based on the impact that solving them would have eg by identifying either an as path common to several inflated prefixes or cdn node where path inflation is widespread finally whyhigh diagnoses the causes for inflated latencies using active measurements such as traceroutes and pings in combination with datasets such as bgp paths and flow records typical causes discovered include lack of peering routing misconfigurations and side effects of traffic engineering we have used whyhigh to diagnose several instances of inflated latencies and our efforts over the course of year have significantly helped improve the performance offered to clients by google’s cdn
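A much simplified sketch of the first step attributed to WhyHigh above: aggregate round-trip times per prefix and flag prefixes whose best observed latency sits well above the best latency seen in their region; the input format and the slack threshold are assumptions for illustration:

```python
from collections import defaultdict

def inflated_prefixes(measurements, slack_ms=50.0):
    """measurements: iterable of (prefix, region, rtt_ms) tuples."""
    per_prefix, per_region = defaultdict(list), defaultdict(list)
    for prefix, region, rtt in measurements:
        per_prefix[(region, prefix)].append(rtt)
        per_region[region].append(rtt)
    flagged = []
    for (region, prefix), rtts in per_prefix.items():
        if min(rtts) > min(per_region[region]) + slack_ms:
            flagged.append((region, prefix, min(rtts)))
    return sorted(flagged, key=lambda t: -t[2])   # worst inflation first, to prioritise fixes
```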
two important features of modern database models are support for complex data structures and support for high level data retrieval and update the first issue has been studied by the development of various semantic data models the second issue has been studied through universal relation data models how the advantages of these two approaches can be combined is presently examined new data model that incorporates standard concepts from semantic data models such as entities aggregations and isa hierarchies is introduced it is then shown how nonnavigational queries and updates can be interpreted in this model the main contribution is to demonstrate how universal relation techniques can be extended to more powerful data model moreover the semantic constructs of the model allow one to eliminate many of the limitations of previous universal relation models
in contrast to current practices where software reuse is applied recursively and reusable assets are tailored through parameterization or specialization existing reuse economic models assume that i the cost of reusing software asset depends on its size and ii reusable assets are developed from scratch the contribution of this paper is that it provides modeling elements and an economic model that is better aligned with current practices the functioning of the model is illustrated in an example the example also shows how the model can support practitioners in deciding whether it is economically feasible to apply software reuse recursively
searching and comparing information from semi structured repositories is an important but cognitively complex activity for internet users the typical web interface displays list of results as textual list which is limited in helping the user compare or gain an overview of the results from series of iterative queries in this paper we propose new interactive lightweight technique that uses multiple synchronized tag clouds to support iterative visual analysis and filtering of query results although tag clouds are frequently available in web interfaces they are typically used for providing an overview of key terms in set of results but thus far have not been used for presenting semi structured information to support iterative queries we evaluated our proposed design in user study that presents typical search and comparison scenarios to users trying to understand heterogeneous clinical trials from leading repository of scientific information the study gave us valuable insights regarding the challenges that semi structured data collections pose and indicated that our design may ease cognitively demanding browsing activities of semi structured information
we consider the analysis and optimization of code utilizing operations and functions operating on entire arrays models are developed for studying the minimization of the number of materializations of array valued temporaries in basic blocks each consisting of sequence of assignment statements involving array valued variables we derive lower bounds on the number of materializations required and develop several algorithms minimizing the number of materializations subject to simple constraint on allowable statement rearrangement in contrast we also show that when statement rearrangement is unconstrained minimizing the number of materializations becomes np complete even for very simple basic blocks
predictable system behaviour is necessary but not sufficient condition when creating safety critical and safety related embedded systems at the heart of such systems there is usually form of scheduler the use of time triggered schedulers is of particular concern in this paper it has been demonstrated in previous studies that the problem of determining the task parameters for such scheduler is np hard we have previously described an algorithm ttsa which is intended to address this problem this paper describes an extended version of this algorithm ttsa which employs task segmentation to increase schedulability we show that the ttsa algorithm is highly efficient when compared with alternative branch and bound search schemes
in ubiquitous environments context aware applications need to monitor their execution context they use middleware services such as context managers for this purpose the space of monitorable entities is huge and each context aware application has specific monitoring requirements which can change at runtime as result of new opportunities or constraints due to context variations the issues dealt with in this paper are to guide context aware application designers in the specification of the monitoring of distributed context sources and to allow the adaptation of context management capabilities by dynamically taking into account new context data collectors not foreseen during the development process the solution we present cam follows the model driven engineering approach for answering the previous questions designers specialised into context management specify context awareness concerns into models that conform to context awareness meta model and these context awareness models are present at runtime and may be updated to cater with new application requirements this paper presents the whole chain from the context awareness model definition to the dynamic instantiation of context data collectors following modifications of context awareness models at runtime
social tagging is becoming increasingly popular in many web applications where users can annotate resources eg web pages with arbitrary keywords ie tags tag recommendation module can assist users in tagging process by suggesting relevant tags to them it can also be directly used to expand the set of tags annotating resource the benefits are twofold improving user experience and enriching the index of resources however the former one is not emphasized in previous studies though lot of work has reported that different users may describe the same concept in different ways we address the problem of personalized tag recommendation for text documents in particular we model personalized tag recommendation as query and ranking problem and propose novel graph based ranking algorithm for interrelated multi type objects when user issues tagging request both the document and the user are treated as part of the query tags are then ranked by our graph based ranking algorithm which takes into consideration both relevance to the document and preference of the user finally the top ranked tags are presented to the user as suggestions experiments on large scale tagging data set collected from delicious have demonstrated that our proposed algorithm significantly outperforms algorithms which fail to consider the diversity of different users interests
applications ranging from grid management to sensor nets to web based information integration and extraction can be viewed as receiving data from some number of autonomous remote data sources and then answering queries over this collected data in such environments it is helpful to inform users which data sources are relevant to their query results it is not immediately obvious what relevant should mean in this context as different users will have different requirements in this paper rather than proposing single definition of relevance we propose spectrum of definitions which we term relevance for we give algorithms for identifying relevant data sources for relational queries and explore their efficiency both analytically and experimentally finally we explore the impact of integrity constraints including dependencies and materialized views on the problem of computing and maintaining relevant data sources
redpin is fingerprint based indoor localization system designed and built to run on mobile phones the basic principles of our system are based on known systems like place lab or radar however with redpin it is possible to consider the signal strength of gsm bluetooth and wifi access points on mobile phone moreover we devised methods to omit the time consuming training phase and instead incorporate folksonomy like approach where the users train the system while using it finally this approach also enables the system to expeditiously adapt to changes in the environment caused for example by replaced access points
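a minimal sketch in the spirit of fingerprint matching as described above but not redpin’s actual algorithm each stored fingerprint maps access point identifiers gsm bluetooth or wifi to observed signal strengths and the location label of the most similar stored fingerprint is returned the similarity measure and the missing access point penalty are illustrative assumptions

def similarity(fp_a, fp_b, missing_penalty=100):
    # fp_a, fp_b: dicts mapping access point id -> observed signal strength
    aps = set(fp_a) | set(fp_b)
    score = 0.0
    for ap in aps:
        if ap in fp_a and ap in fp_b:
            score -= abs(fp_a[ap] - fp_b[ap])   # reward similar readings
        else:
            score -= missing_penalty            # penalize APs seen in only one scan
    return score

def locate(measurement, labeled_fingerprints):
    # labeled_fingerprints: list of (fingerprint_dict, user_given_location_name)
    # returned label comes from fingerprints contributed by users while using the system
    best = max(labeled_fingerprints, key=lambda item: similarity(measurement, item[0]))
    return best[1]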
in this paper we discuss recent developments in interaction design principles for ubiquitous computing environments specifically implications related to situated and mobile aspects of work we present interaction through negotiation as general human computer interaction hci paradigm aimed at ubiquitous pervasive technology and environments with focus on facilitating negotiation in and between webs of different artifacts humans and places this approach is concerned with the way technology presents itself to us both as physical entities and as conceptual entities as well as the relations between these presentations as we move between different work settings and tasks it incorporates much needed focus on availability interpretability and connectivity as fundamental for understanding and supporting hci in relation to single devices as well as complex constellations of them based on several extensive empirical case studies as well as co operative design sessions we present reflective analysis providing insights into results of the interaction through negotiation design approach in action very promising area of application is exception handling in pervasive computing environments
whole file transfer is basic primitive for internet content dissemination content servers are increasingly limited by disk arm movement given the rapid growth in disk density disk transfer rates server network bandwidth and content size individual file transfers are sequential but the block access sequence on content server is effectively random when many slow clients access large files concurrently although larger blocks can help improve disk throughput buffering requirements increase linearly with block size this paper explores novel block reordering technique that can reduce server disk traffic significantly when large content files are shared the idea is to transfer blocks to each client in any order that is convenient for the server the server sends blocks to each client opportunistically in order to maximize the advantage from the disk reads it issues to serve other clients accessing the same file we first illustrate the motivation and potential impact of opportunistic block reordering using simple analytical model then we describe file transfer system using simple block reordering algorithm called circus experimental results with the circus prototype show that it can improve server throughput by factor of two or more in workloads with strong file access locality
we present an incremental algorithm to compute image based simplifications of large environment we use an optimization based approach to generate samples based on scene visibility and from each viewpoint create textured depth meshes tdms using sampled range panoramas of the environment the optimization function minimizes artifacts such as skins and cracks in the reconstruction we also present an encoding scheme for multiple tdms that exploits spatial coherence among different viewpoints the resulting simplifications incremental textured depth meshes itdms reduce preprocessing storage rendering costs and visible artifacts our algorithm has been applied to large complex synthetic environments comprising millions of primitives it is able to render them at frames second on pc with little loss in visual fidelity
even after decades of software engineering research complex computer systems still fail this paper makes the case for increasing research emphasis on dependability and specifically on improving availability by reducing time to recover all software fails at some point so systems must be able to recover from failures recovery itself can fail too so systems must know how to intelligently retry their recovery we present here recursive approach in which minimal subset of components is recovered first if that does not work progressively larger subsets are recovered our domain of interest is internet services these systems experience primarily transient or intermittent failures that can typically be resolved by rebooting conceding that failure free software will continue eluding us for years to come we undertake systematic investigation of fine grain component level restarts microreboots as high availability medicine building and maintaining an accurate model of large internet systems is nearly impossible due to their scale and constantly evolving nature so we take an application generic approach that relies on empirical observations to manage recovery we apply recursive microreboots to mercury commercial off the shelf cots based satellite ground station that is based on an internet service platform mercury has been in successful operation for over years from our experience with mercury we draw design guidelines and lessons for the application of recursive microreboots to other software systems we also present set of guidelines for building systems amenable to recursive reboots known as crash only software systems
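a schematic sketch of the recursive recovery idea reboot a minimal suspect subset of components first and if the service is still unhealthy escalate to progressively larger subsets the component and health check hooks below are hypothetical

def recursive_microreboot(suspect_sets, reboot, healthy):
    # suspect_sets: component groups ordered from smallest to largest,
    #               e.g. [{faulty_component}, {its_subsystem}, {whole_application}]
    # reboot(components): restarts the given components (hypothetical hook)
    # healthy(): returns True once the service passes its health checks (hypothetical hook)
    for components in suspect_sets:
        reboot(components)
        if healthy():
            return True          # recovered with a fine-grain restart
    return False                 # recovery failed; fall back to a full reboot or an operator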
with the increasing performance gap between processor and memory it is essential that caches are utilized efficiently however caches are very inefficiently utilized because not all the excess data fetched into the cache to exploit spatial locality is accessed studies have shown that prediction accuracy of about can be achieved when predicting the to be referenced words in cache block in this paper we use this prediction mechanism to fetch only the to be referenced data into the data cache on cache miss we then utilize the cache space thus made available to store words from multiple cache blocks in single physical cache block space in the cache thus increasing the useful words in the cache we also propose methods to combine this technique with value based approach to further increase the cache capacity our experiments show that with our techniques we achieve about of the data cache miss rate reduction and about of the cache capacity increase observed when using double sized cache with only about cache space overhead
we present methodological and technological solutions for evolving large scale software systems these solutions are based on many years of research and experience in developing systems in one of the most volatile application domains banking we discuss why promising software development techniques such as object oriented and component based approaches on their own cannot meet the challenges and objectives of software development today and propose three layered architectural approach based on the strict separation between computation coordination and configuration we present set of modelling primitives design principles and support tools through which such an approach can be put effectively into practice and discuss how it promotes more dynamic approach to software evolution finally we make comparisons with related work
previous work shows that web page can be partitioned into multiple segments or blocks and often the importance of those blocks in page is not equivalent also it has been proven that differentiating noisy or unimportant blocks from pages can facilitate web mining search and accessibility however no uniform approach and model has been presented to measure the importance of different segments in web pages through user study we found that people do have consistent view about the importance of blocks in web pages in this paper we investigate how to find model to automatically assign importance values to blocks in web page we define the block importance estimation as learning problem first we use vision based page segmentation algorithm to partition web page into semantic blocks with hierarchical structure then spatial features such as position and size and content features such as the number of images and links are extracted to construct feature vector for each block based on these features learning algorithms are used to train model to assign importance to different segments in the web page in our experiments the best model can achieve the performance with micro and micro accuracy which is quite close to person’s view
problem with the location free nature of cell phones is that callers have difficulty predicting receivers states leading to inappropriate calls one promising solution involves helping callers decide when to interrupt by providing them contextual information about receivers we tested the effectiveness of different kinds of contextual information by measuring the degree of agreement between receivers desires and callers decisions in simulation five groups of participants played the role of callers choosing between making calls or leaving messages and sixth group played the role of receivers choosing between receiving calls or receiving messages callers were provided different contextual information about receivers locations their cell phones ringer state the presence of others or no information at all callers provided with contextual information made significantly more accurate decisions than those without it our results suggest that different contextual information generates different kinds of improvements more appropriate interruptions or better avoidance of inappropriate interruptions we discuss the results and implications for practice in the light of other important considerations such as privacy and technological simplicity
over the recent years new approach for obtaining succinct approximate representation of n dimensional vectors or signals has been discovered for any signal x the succinct representation of x is equal to Ax where A is carefully chosen m x n real matrix with m much smaller than n often A is chosen at random from some distribution over m x n matrices the vector Ax is often referred to as the measurement vector or sketch of x although the dimension of Ax is much shorter than that of x it contains plenty of useful information about x
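a minimal numpy illustration of the sketching idea a random matrix A maps a long signal x to a much shorter measurement vector Ax the gaussian choice of A and the concrete dimensions are assumptions for illustration only

import numpy as np

n, m = 10000, 200                           # signal length and sketch length, m << n
rng = np.random.default_rng(0)

A = rng.normal(size=(m, n)) / np.sqrt(m)    # a (here randomly) chosen m x n real matrix
x = np.zeros(n)
x[rng.choice(n, size=10, replace=False)] = rng.normal(size=10)   # a sparse signal

sketch = A @ x   # the measurement vector Ax, dimension m instead of n
# recovering an approximation of x from the sketch is the subject of the surveyed algorithms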
this paper introduces user centered design process and case study evaluation of novel wearable visualization system for team sports coined teamawear teamawear consists of three basketball jerseys that are equipped with electroluminescent wires and surfaces each jersey can be wirelessly controlled to represent game related information on the player in real time such as the amount of individual fouls scores and time alerts participatory user centered approach guided the development process towards more meaningful ethically and ergonomically valid design the system aims to enhance the awareness and understanding of game related public information for all stakeholders including players referees coaches and audience members we initially hypothesized that such increased awareness would positively influence in game decisions by players resulting in more interesting and enjoyable game play experience for all participants instead the case study evaluation demonstrated teamawear’s perceived usefulness particularly for nonplaying stakeholders such as the audience referees and coaches supporting more accurate coaching assessments better understanding of in game situations and increased enjoyment for spectators the high amount of game related cognitive load on the players during game play seems to hinder its influence on in game decisions
the wire length estimation is the bottleneck of packing based block placers to cope with this problem we present fast wire length estimation method in this paper the key idea is to bundle the pin nets between block pairs and measure the wire length bundle by bundle instead of net by net previous bundling method introduces huge error which compromises the performance we present an error free bundling approach which utilizes the piecewise linear wire length function of pair of blocks with the function implemented into lookup table the wire length can be computed promptly and precisely by binary search furthermore we show that pin nets can also be bundled resulting in further speedup the effectiveness of our method is verified by experiments
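a small sketch of the lookup idea evaluate a piecewise linear wire length function stored as a table of breakpoints by binary search the breakpoint representation below is an assumption made only to illustrate the mechanism not the paper’s exact table layout

import bisect

def wirelength_from_table(xs, ys, d):
    # xs: sorted breakpoints of the relative block position, ys: wire length at each breakpoint;
    # between breakpoints the function is linear, so interpolate after a binary search
    i = bisect.bisect_right(xs, d)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (d - x0) / (x1 - x0)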
over the last few decades several usability knowledge based systems have been developed to provide user interface designers with usability knowledge eg heuristics usability guidelines standards such systems are intended to assist designers during the design process and to improve the usability of the user interface being designed however the assumption that such systems actually improve the usability of the resulting user interface remains to be demonstrated virtually no systems have been empirically tested by designers who create products in order to confirm this assumption we conducted an experimental study in which professional web designers had to create webpages either using knowledge based system metroweb or without it this study was intended to determine the influence of metroweb on the professional web designers cognitive activity and to find out whether metroweb actually assists them to develop user centred design the results show that the web designers did not very often use metroweb while designing webpages however rather surprisingly the designers who did use metroweb more often exhibited user centred activity than those working without metroweb we conclude this paper by discussing these findings and suggesting future possible ways of research intended to assist designers to adopt user centred approach to their activity
research on vibrotactile displays for mobile devices has developed and evaluated complex multi dimensional tactile stimuli with promising results however the possibility that user distraction an inevitable component of mobile interaction may mask or obscure vibrotactile perception has not been thoroughly considered this omission is addressed here with three studies comparing recognition performance on nine tactile icons between control and distracter conditions the icons were two dimensional three body sites against three roughness values and displayed to the wrist the distracter tasks were everyday activities transcription mouse based data entry and walking the results indicated performance significantly dropped in the distracter condition by between and in all studies variations in the results suggest different tasks may exert different masking effects this work indicates that distraction should be considered in the design of vibrotactile cues and that the results reported in lab based studies are unlikely to represent real world performance
as the data management field has diversified to consider settings in which queries are increasingly complex statistics are less available or data is stored remotely there has been an acknowledgment that the traditional optimize then execute paradigm is insufficient this has led to plethora of new techniques generally placed under the common banner of adaptive query processing that focus on using runtime feedback to modify query processing in way that provides better response time or more efficient cpu utilization in this survey paper we identify many of the common issues themes and approaches that pervade this work and the settings in which each piece of work is most appropriate our goal with this paper is to be value add over the existing papers on the material providing not only brief overview of each technique but also basic framework for understanding the field of adaptive query processing in general we focus primarily on intra query adaptivity of long running but not full fledged streaming queries we conclude with discussion of open research problems that are of high importance
operational type theory optt is type theory allowing possibly diverging programs while retaining decidability of type checking and consistent logic this is done by distinguishing proofs and program terms as well as formulas and types the theory features propositional equality on type free terms which facilitates reasoning about dependently typed programs optt has been implemented in the guru verified programming language which includes type and proof checker and compiler to efficient code in addition to the core optt guru implements number of extensions including ones for verification of programs using mutable state and input output this paper gives an introduction to verified programming in guru
tree patterns form natural basis to query tree structured data such as xml and ldap to improve the efficiency of tree pattern matching it is essential to quickly identify and eliminate redundant nodes in the pattern in this paper we study tree pattern minimization both in the absence and in the presence of integrity constraints ics on the underlying tree structured database in the absence of ics we develop polynomial time query minimization algorithm called cim whose efficiency stems from two key properties i node cannot be redundant unless its children are and ii the order of elimination of redundant nodes is immaterial when ics are considered for minimization we develop technique for query minimization based on three fundamental operations augmentation an adaptation of the well known chase procedure minimization based on homomorphism techniques and reduction we show the surprising result that the algorithm referred to as acim obtained by first augmenting the tree pattern using ics and then applying cim always finds the unique minimal equivalent query while acim is polynomial time it can be expensive in practice because of its inherent non locality we then present fast algorithm cdm that identifies and eliminates local redundancies due to ics based on propagating information labels up the tree pattern cdm can be applied prior to acim for improving the minimization efficiency we complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques
to improve the performance of mobile computers number of broadcast based cache invalidation schemes have been proposed in the past to support object locality however most of these schemes have focused on providing support for client disconnection and reducing query delay the size of invalidation reports and the effect of invalidating items cached by many clients are also important issues that must be addressed in order to provide cost efficient cache invalidation in mobile environment in this paper we propose two techniques validation invalidation reports vir and the delayed requests scheme drs to address these issues vir uses combination of validation and invalidation reports allowing the server to construct and broadcast smaller reports at each interval thus improving downlink channel utilization drs addresses the problem where multiple clients request for the same data items it introduces cool down period after an invalidation to reduce the number of uplink requests sent by clients simulation results show that compared to the original ts approach the proposed schemes lower transmission cost associated with cache invalidation by between in the downlink channel and between in the uplink channel
we describe how some simple properties of discrete one forms directly relate to some old and new results concerning the parameterization of mesh data our first result is an easy proof of tutte’s celebrated spring embedding theorem for planar graphs which is widely used for parameterizing meshes with the topology of disk as planar embedding with convex boundary our second result generalizes the first dealing with the case where the mesh contains multiple boundaries which are free to be non convex in the embedding we characterize when it is still possible to achieve an embedding despite these boundaries being non convex the third result is an analogous embedding theorem for meshes with genus topologically equivalent to the torus applications of these results to the parameterization of meshes with disk and toroidal topologies are demonstrated extensions to higher genus meshes are discussed
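for reference the classical statement behind tutte’s spring embedding fixes the boundary to a convex polygon and places every interior vertex at a convex combination of its neighbours written here in standard notation rather than the one form formulation of the paper

\[
v_i \;=\; \sum_{j \in N(i)} \lambda_{ij}\, v_j,
\qquad \lambda_{ij} > 0,
\qquad \sum_{j \in N(i)} \lambda_{ij} = 1
\qquad \text{for every interior vertex } i,
\]

solving this sparse linear system with the boundary vertices pinned to a convex polygon yields a crossing free planar embedding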
the internet is currently experiencing one of the most important challenges in terms of content distribution since its first uses as medium for content delivery users from passive downloaders and browsers are moving towards content producers and publishers they often distribute and retrieve multimedia contents establishing network communities this is the case of peer to peer iptv communities in this work we present detailed study of p2p iptv traffic providing useful insights on both transport and packet level properties as well as on the behavior of the peers inside the network in particular we provide novel results on i the ports and protocols used ii differences between signaling and video traffic iii behavior of the traffic at different time scales iv differences between tcp and udp traffic v traffic generated and received by peers vi peers neighborhood and session duration the knowledge gained thanks to this analysis is useful for several tasks eg traffic identification understanding the performance of different p2p iptv technologies and the impact of such traffic on network nodes and links and building more realistic models for simulations
the majority of security schemes available for sensor networks assume deployment in areas without access to wired infrastructure more specifically nodes in these networks are unable to leverage key distribution centers kdcs to assist them with key management in networks with heterogeneous mix of nodes however it is not unrealistic to assume that some more powerful nodes have at least intermittent contact with backbone network for instance an air deployed battlefield network may have to operate securely for some time until uplinked friendly forces move through the area we therefore propose liger hybrid key management scheme for heterogeneous sensor networks that allows systems to operate in both the presence and absence of kdc specifically when no kdc is available nodes communicate securely with each other based upon probabilistic unbalanced method of key management the ability to access kdc allows nodes to probabilistically authenticate neighboring devices with which they are communicating we also demonstrate that this scheme is robust to the compromise of both low and high capability nodes and that the same keys can be used for both modes of operation detailed experiments and simulations are used to show that liger is highly practical solution for the current generation of sensors and the unbalanced approach can significantly reduce network initialization time
recent years have seen shift in perception of the nature of hci and interactive systems as interface work has increasingly become focus of attention for the social sciences we have expanded our appreciation of the importance of issues such as work practice adaptation and evolution in interactive systems the reorientation in our view of interactive systems has been accompanied by call for new model of design centered around user needs and participation this article argues that new process of design is not enough and that the new view necessitates similar reorientation in the structure of the systems we build it outlines some requirements for systems that support deeper conception of interaction and argues that the traditional system design techniques are not suited to creating such systems finally using examples from ongoing work in the design of an open toolkit for collaborative applications it illustrates how the principles of computational reflection and metaobject protocols can lead us toward new model based on open abstraction that holds great promise in addressing these issues
in order to let software programs gain full benefit from semi structured web sources wrapper programs must be built to provide machine readable view over them wrappers are able to accept query against the source and return set of structured results thus enabling applications to access web data in similar manner to that of information from databases significant problem in this approach arises as web sources may undergo changes that invalidate the current wrappers in this paper we present novel heuristics and algorithms to address this problem in our approach the system collects some query results during normal wrapper operation and when the source changes it uses them as input to generate set of labeled examples for the source which can then be used to induce new wrapper
chaotic routers are randomizing nonminimal adaptive packet routers designed for use in the communication networks of parallel computers chaotic routers combine the flexibility found in adaptive routing with design simple enough to be competitive with the most streamlined oblivious routers we review chaotic routing and compare it with other contemporary network routing approaches including state of the art oblivious and adaptive routers detailed head to head comparison between oblivious minimal adaptive and chaotic routing is then presented exploring the performance of comparable vlsi implementations through analysis and simulation the results indicate that chaotic routers provide very effective and efficient high performance message routing
functional logic programming and probabilistic programming have demonstrated the broad benefits of combining laziness non strict evaluation with sharing of the results with non determinism yet these benefits are seldom enjoyed in functional programming because the existing features for non strictness sharing and non determinism in functional languages are tricky to combine we present practical way to write purely functional lazy non deterministic programs that are efficient and perspicuous we achieve this goal by embedding the programs into existing languages such as haskell sml and ocaml with high quality implementations by making choices lazily and representing data with non deterministic components by working with custom monadic data types and search strategies and by providing equational laws for the programmer to reason about their code
planetlab is geographically distributed overlay network designed to support the deployment and evaluation of planetary scale network services two high level goals shape its design first to enable large research community to share the infrastructure planetlab provides distributed virtualization whereby each service runs in an isolated slice of planetlab’s global resources second to support competition among multiple network services planetlab decouples the operating system running on each node from the network wide services that define planetlab principle referred to as unbundled management this paper describes how planetlab realizes the goals of distributed virtualization and unbundled management with focus on the os running on each node
to improve the scalability of the web it is common practice to apply caching and replication techniques numerous strategies for placing and maintaining multiple copies of web documents at several sites have been proposed these approaches essentially apply global strategy by which single family of protocols is used to choose replication sites and keep copies mutually consistent we propose more flexible approach by allowing each distributed document to have its own associated strategy we propose method for assigning an optimal strategy to each document separately and prove that it generates family of optimal results using trace based simulations we show that optimal assignments clearly outperform any global strategy we have designed an architecture for supporting documents that can dynamically select their optimal strategy and evaluate its feasibility
building content based search tools for feature rich data has been challenging problem because feature rich data such as audio recordings digital images and sensor data are inherently noisy and high dimensional comparing noisy data requires comparisons based on similarity instead of exact matches and thus searching for noisy data requires similarity search instead of exact search the ferret toolkit is designed to help system builders quickly construct content based similarity search systems for feature rich data types the key component of the toolkit is content based similarity search engine for generic multi feature object representations to solve the similarity search problem in high dimensional spaces we have developed approximation methods inspired by recent theoretical results on dimension reduction the search engine constructs sketches from feature vectors as highly compact data structures for matching filtering and ranking data objects the toolkit also includes several other components to help system builders address search system infrastructure issues we have implemented the toolkit and used it to successfully construct content based similarity search systems for four data types audio recordings digital photos shape models and genomic microarray data
we propose new class of methods for vliw code compression using variable sized branch blocks with self generating tables code compression traditionally works on fixed sized blocks with its efficiency limited by their small size branch block series of instructions between two consecutive possible branch targets provides larger blocks for code compression we compare three methods for compressing branch blocks table based lempel ziv welch lzw based and selective code compression our approaches are fully adaptive and generate the coding table on the fly during compression and decompression when encountering branch target the coding table is cleared to ensure correctness decompression requires simple table lookup and updates the coding table when necessary when decoding sequentially the table based method produces bytes per iteration while the lzw based methods provide bytes peak and bytes average decompression bandwidth compared to huffman’s byte and variable to fixed vf bit peak performance our methods have higher decoding bandwidth and comparable compression ratio parallel decompression could also be applied to our methods which is more suitable for vliw architectures
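an illustrative sketch of the lzw style idea an adaptive dictionary is generated on the fly and cleared at every branch target so that decompression can start independently at any branch block this is a generic byte oriented lzw coder not the exact coding used in the paper

def lzw_compress_block(data):
    # data: bytes of one branch block; the dictionary starts fresh for every block,
    # mirroring the idea of clearing the self-generating coding table at each branch target
    table = {bytes([b]): b for b in range(256)}
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = next_code   # table grows during compression and, symmetrically, decompression
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def compress_branch_blocks(blocks):
    # blocks: list of bytes objects, one per branch block (instructions between branch targets)
    return [lzw_compress_block(block) for block in blocks]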
several recent proposals for an active networks architecture advocate the placement of user defined computation within the network as key mechanism to enable wide range of new applications and protocols including reliable multicast transports mechanisms to foil denial of service attacks intra network real time signal transcoding and so forth this laudable goal however creates number of very difficult research problems and although number of pioneering research efforts in active networks have solved some of the preliminary small scale problems large number of wide open problems remain in this paper we propose an alternative to active networks that addresses restricted and more tractable subset of the active networks design space our approach which we and others call active services advocates the placement of user defined computation within the network as with active networks but unlike active networks preserves all of the routing and forwarding semantics of current internet architecture by restricting the computation environment to the application layer because active services do not require changes to the internet architecture they can be deployed incrementally in today’s internet we believe that many of the applications and protocols targeted by the active networks initiative can be solved with active services and toward this end we propose herein specific architecture for an active service and develop one such service in detail the media gateway mega service that exploits this architecture in defining our active service we encountered six key problems service location service control service management service attachment service composition and the definition of the service environment and have crafted solutions for these problems in the context of the mega service to verify our design we implemented and fielded mega on the uc berkeley campus where it has been used regularly for several months by real users who connect via isdn to an on line classroom our initial experience indicates that our active services prototype provides very flexible and programmable platform for intra network computation that strikes good balance between the flexibility of the active networks architecture and the practical constraints of incremental deployment in the current internet
energy consumption and heat dissipation have become key considerations for modern high performance computer systems in this paper we focus on non clairvoyant speed scaling to minimize flow time plus energy for batched parallel jobs on multiprocessors we consider common scenario where the total power consumption cannot exceed given budget and the power consumed on each processor is s^α when running at speed s extending the equi processor allocation policy we propose two algorithms equi and equi which use respectively uniform speed and non uniform speed scaling function for the allocated processors using competitive analysis we show that equi is p^α competitive for flow time plus energy and equi is ln p competitive for the same metric when given sufficient power where p is the total number of processors our simulation results confirm that equi and equi achieve better performance than straightforward fixed speed equi strategy moreover moderate power constraint does not significantly affect the performance of our algorithms
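the setting sketched above can be summarized in the usual flow time plus energy notation the symbols below follow the standard formulation of this problem and are spelled out only for readability not as the paper’s exact statement

\[
P(s) = s^{\alpha},
\qquad
\min \;\; \sum_{j} F_j + E
\quad \text{subject to} \quad
\sum_{i=1}^{P} s_i(t)^{\alpha} \;\le\; B \;\; \text{for all } t,
\]

where $F_j$ is the flow time of job $j$, $E$ the total energy consumed, $P$ the number of processors, $s_i(t)$ the speed of processor $i$ at time $t$ and $B$ the given power budget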
most of traditional text categorization approaches utilize term frequency tf and inverse document frequency idf for representing importance of words and or terms in classifying text document this paper describes an approach to apply term distributions in addition to tf and idf to improve performance of centroid based text categorization three types of term distributions called inter class intra class and in collection distributions are introduced these distributions are useful to increase classification accuracy by exploiting information of term distribution among classes term distribution within class and term distribution in the whole collection of training data in addition this paper investigates how these term distributions contribute to weight each term in documents eg high term distribution of word promotes or demotes importance or classification power of that word to this end several centroid based classifiers are constructed with different term weightings using various data sets their performances are investigated and compared to standard centroid based classifier tfidf and centroid based classifier modified with information gain moreover we also compare them to two well known methods nn and naïve bayes in addition to unigram model of document representation bigram model is also explored finally the effectiveness of term distributions to improve classification accuracy is explored with regard to the training set size and the number of classes
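a minimal sketch of a centroid based classifier with tf idf document vectors and cosine scoring the optional per term factor stands in for the inter class intra class and in collection distribution weights discussed above and is not the paper’s exact weighting scheme

import numpy as np

def train_centroids(doc_vectors, labels):
    # doc_vectors: (num_docs, num_terms) tf-idf matrix; labels: class id per document
    centroids = {}
    for c in set(labels):
        members = doc_vectors[[i for i, y in enumerate(labels) if y == c]]
        centroid = members.mean(axis=0)
        centroids[c] = centroid / (np.linalg.norm(centroid) + 1e-12)
    return centroids

def classify(doc_vector, centroids, term_weights=None):
    # term_weights: optional per-term factor standing in for a term-distribution based weight
    v = doc_vector if term_weights is None else doc_vector * term_weights
    v = v / (np.linalg.norm(v) + 1e-12)
    return max(centroids, key=lambda c: float(v @ centroids[c]))   # cosine similarity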
traditional relation extraction methods require pre specified relations and relation specific human tagged examples bootstrapping systems significantly reduce the number of training examples but they usually apply heuristic based methods to combine set of strict hard rules which limit the ability to generalize and thus generate low recall furthermore existing bootstrapping methods do not perform open information extraction open ie which can identify various types of relations without requiring pre specifications in this paper we propose statistical extraction framework called statistical snowball statsnowball which is bootstrapping system and can perform both traditional relation extraction and open ie statsnowball uses the discriminative markov logic networks mlns and softens hard rules by learning their weights in maximum likelihood estimate sense mln is general model and can be configured to perform different levels of relation extraction in statsnowball pattern selection is performed by solving an l1 norm penalized maximum likelihood estimation which enjoys well founded theories and efficient solvers we extensively evaluate the performance of statsnowball in different configurations on both small but fully labeled data set and large scale web data empirical results show that statsnowball can achieve significantly higher recall without sacrificing the high precision during iterations with small number of seeds and the joint inference of mln can improve the performance finally statsnowball is efficient and we have developed working entity relation search engine called renlifang based on it
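the pattern selection step can be summarized by a generic l1 penalized maximum likelihood objective over the mln weights written here in standard form rather than the paper’s exact notation

\[
\hat{w} \;=\; \arg\max_{w} \;\; \sum_{d} \log p_w\!\left(y_d \mid x_d\right) \;-\; \lambda \lVert w \rVert_1 ,
\]

where $w$ are the weights of the soft rules (patterns); the $\ell_1$ penalty drives the weights of uninformative patterns to zero, which is what performs the selection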
as mobile and wireless technologies become more pervasive in our society people begin to depend on network connectivity regardless of their location their mobility however implies dynamic topology where routes to destination cannot always be guaranteed the intermittent connectivity that results from this lack of end to end connection is dominant problem that leads to user frustration existing research to provide the mobile user with facade of constant connectivity generally presents mechanisms to handle disconnections when they occur in contrast the system we propose in this paper provides ways to handle disconnections before they occur we present data bundling system for intermittent connections dbs ic comprised of stationary agent sa and mobile agent ma the sa pro actively gathers data the user has previously specified and opportunistically sends this data to the ma the sa groups the user requested data into one or more data bundles which are then incrementally delivered to the ma during short periods of connectivity we fully implement dbs ic and evaluate its performance via live tests under varying network conditions results show that our system decreases data retrieval time by factor of two in the average case and by factor of in the best case
we present formalism called addressed term rewriting systems which can be used to model implementations of theorem proving symbolic computation and programming languages especially aspects of sharing recursive computations and cyclic data structures addressed term rewriting systems are therefore well suited to describing object based languages and as an example we present language called lambda cal bj incorporating both functional and object based features as case study in how reasoning about languages is supported in the atrs formalism we define type system for lambda cal bj and prove type soundness result
we present method to detect and visualize evolution patterns in source code our method consists of three steps first we extract an annotated syntax tree ast from each version of given source code next we hash the extracted syntax nodes based on metric combining structure and type information and construct matches correspondences between similar hash subtrees our technique detects code fragments which have not changed or changed little during the software evolution by parameterizing the similarity metric we can flexibly decide what is considered to be identical or not during the software evolution finally we visualize the evolution of the code structure by emphasizing both changing and constant code patterns we demonstrate our technique on versioned code base containing variety of changes ranging from simple to complex
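a toy sketch of hashing syntax subtrees so that unchanged fragments across versions hash to the same value the node attributes used here are hypothetical and the real similarity metric combining structure and type information is not reproduced

import hashlib

def subtree_hash(node):
    # node is assumed to expose .kind (syntax/type label) and .children (list of child nodes)
    child_part = ",".join(subtree_hash(child) for child in node.children)
    return hashlib.sha1(f"{node.kind}({child_part})".encode()).hexdigest()

def match_versions(roots_v1, roots_v2):
    # fragments whose hashes appear in both versions are treated as unchanged (or changed little)
    hashes_v1 = {subtree_hash(n) for n in roots_v1}
    return [n for n in roots_v2 if subtree_hash(n) in hashes_v1]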
in recent years numerous algorithms have been proposed for incremental view maintenance of data warehouses as matter of fact all of them follow almost the same general approach namely they compute the change of multi source view in response to an update message from data source following two steps i issue set of queries against the other data sources and ii compensate the query result due to concurrent updates interfering with the first step despite many recent improvements the compensation approach needs precise detection of interfering updates occurring remotely in autonomous data sources and the assumption that messages are never lost and are delivered in the order in which they are sent however in real networks loss and misordering of messages are usual in this paper we propose maintenance algorithm that does not need the compensation step and applies to general view expressions of the bag algebra without limit on the number of base relations per data source
one ideal of configuration management is to specify only desired behavior in high level language while an automatic configuration management system assures that behavior on an ongoing basis we call self managing subsystem of this kind closure to better understand the nature of closures we implemented an http service closure on top of an apache web server while the procedure for building the server is imperative in nature and the configuration language for the original server is declarative the language for the closure must be transactional ie based upon predictable and verifiable atomic changes in behavioral state we study the desirable properties of such transactional configuration management languages and conclude that these languages may well be the key to solving the change management problem for network configuration management
the presence of holes in triangle mesh is classically ascribed to the deficiencies of the point cloud acquired from physical object to be reverse engineered this lack of information results from both the scanning process and the object complexity the consequences are simply not acceptable in many application domains eg visualization finite element analysis or stl prototyping this paper addresses the way these holes can be filled in while minimizing the curvature variation between the surrounding and inserted meshes the curvature variation is simulated by the variation between external forces applied to the nodes of linear mechanical model coupled to the meshes the functional to be minimized is quadratic and set of geometric constraints can be added to further shape the inserted mesh in addition complete cleaning toolbox is proposed to remove degenerated and badly oriented triangles resulting from the scanning process
hmf is conservative extension of hindley milner type inference with first class polymorphism in contrast to other proposals hmf uses regular system f types and has simple type inference algorithm that is just small extension of the usual damas milner algorithm given the relative simplicity and expressive power we feel that hmf can be an attractive type system in practice there is reference implementation of the type system available online together with technical report containing proofs leijen
location information contains huge promise in the area of awareness technologies in pepe project automatic location detection was investigated as part of mobile presence system field study with twelve young adults was conducted to explore the usage habits of sharing location information the participants defined named and shared on average twenty meaningful locations with their friends they found the location information as the most relevant mobile presence attribute due to fact that it gave good overview on the status of the other users we focus on analyzing how the participants named locations and how they used location information in the context of mobile presence the participants utilized shared meanings of locations in naming and storing them to the pepe system we classified the created locations as generic locations points of interest and geographical areas the presented results will facilitate in designing location enhanced mobile awareness systems
mining graph patterns in large networks is critical to variety of applications such as malware detection and biological module discovery however frequent subgraphs are often ineffective to capture association existing in these applications due to the complexity of isomorphism testing and the inelastic pattern definition in this paper we introduce proximity pattern which is significant departure from the traditional concept of frequent subgraphs defined as set of labels that co occur in neighborhoods proximity pattern blurs the boundary between itemset and structure it relaxes the rigid structure constraint of frequent subgraphs while introducing connectivity to frequent itemsets therefore it can benefit from both efficient mining in itemsets and structure proximity from graphs we developed two models to define proximity patterns the second one called normalized probabilistic association nmpa is able to transform complex graph mining problem to simplified probabilistic itemset mining problem which can be solved efficiently by modified fp tree algorithm called pfp nmpa and pfp are evaluated on real life social and intrusion networks empirical results show that it not only finds interesting patterns that are ignored by the existing approaches but also achieves high performance for finding proximity patterns in large scale graphs
in this article we examine the role played by the interprocedural analysis of array accesses in the automatic parallelization of fortran programs we use the ptran system to provide measurements of several benchmarks to compare different methods of representing interprocedurally accessed arrays we examine issues concerning the effectiveness of automatic parallelization using these methods and the efficiency of precise summarization method
we develop cache oblivious data structure for storing set of axis aligned rectangles in the plane such that all rectangles in intersecting query rectangle or point can be found efficiently our structure is an axis aligned bounding box hierarchy and as such it is the first cache oblivious tree with provable performance guarantees if no point in the plane is contained in or more rectangles in the structure answers rectangle query using memory transfers and point query using memory transfers for any where is the block size of memory transfers between any two levels of multilevel memory hierarchy we also develop variant of our structure that achieves the same performance on input sets with arbitrary overlap among the rectangles the rectangle query bound matches the bound of the best known linear space cache aware structure
evolution of software intensive system is unavoidable in fact evolution can be seen as part of reuse process during the evolution of the software asset the major part of the system functionality is normally reused so the key issue is to identify the volatile parts of the domain requirements additionally there is promise that tailored tool support may help supporting evolution in software intensive systems in this paper we describe the volatility analysis method for product lines this highly practical method has been used in multiple domains and is able to express and estimate common types of evolutional characteristics the method is able to represent volatility in multiple levels and has capacity to tie the volatility estimation to one product line member specification we also briefly describe current tool support for the method the main contribution of this paper is volatility analysis framework that can be used to describe how requirements are estimated to evolve in the future the method is based on the definition hierarchy framework
we present technique that provides frame to frame coherence in non photorealistic animations it is considered very important subject for non photorealistic animations to maintain frame to frame coherence so that the resulting frames do not randomly change every frame we maintain coherence by using particle systems each particle means brush stroke in the resulting image since we have located particles on the object’s surface the coherence is maintained when the object or camera is moving in the scene of course the coherence is maintained when camera is zooming in out however the brush strokes on the surface also zoom in out this results in too large or too small brush strokes that are not considered hand crafted brush strokes meanwhile frame to frame coherence can be preserved during camera zoom in out by dynamically managing the number of brush strokes and maintaining their size
in this article we demonstrate the applicability of semantic techniques for detection of conflict of interest coi we explain the common challenges involved in building scalable semantic web applications in particular those addressing connecting the dots problems we describe in detail the challenges involved in two important aspects on building semantic web applications namely data acquisition and entity disambiguation or reference reconciliation we extend upon our previous work where we integrated the collaborative network of subset of dblp researchers with persons in friend of friend social network foaf our method finds the connections between people measures collaboration strength and includes heuristics that use friendship affiliation information to provide an estimate of potential coi in peer review scenario evaluations are presented by measuring what could have been the coi between accepted papers in various conference tracks and their respective program committee members the experimental results demonstrate that scalability can be achieved by using dataset of over million entities all bibliographic data from dblp and large collection of foaf documents
we study various shortcut fusion rules for languages like haskell following careful semantic account of recently proposed rule for circular program transformation we propose new rule that trades circularity for higher orderedness and thus attains better semantic properties this also leads us to revisit the original foldr build rule as well as its dual and to develop variants that do not suffer from detrimental impacts of haskell’s mixed strict nonstrict semantics throughout we offer pragmatic insights about our new rules to investigate also their relative effectiveness rather than just their semantic correctness
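for reference the original foldr build rule that the paper revisits can be stated as the following equation

\[
\mathit{foldr}\;k\;z\;(\mathit{build}\;g) \;=\; g\;k\;z ,
\]

which is valid when $g$ has the polymorphic type $\forall b.\,(a \to b \to b) \to b \to b$, so the intermediate list produced by $\mathit{build}\;g$ and consumed by $\mathit{foldr}\;k\;z$ is never materialized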
we present a novel technique for large deformations on meshes using the volumetric graph laplacian we first construct a graph representing the volume inside the input mesh the graph need not form a solid meshing of the input mesh’s interior its edges simply connect nearby points in the volume this graph’s laplacian encodes volumetric details as the difference between each point in the graph and the average of its neighbors preserving these volumetric details during deformation imposes a volumetric constraint that prevents unnatural changes in volume we also include in the graph points a short distance outside the mesh to avoid local self intersections volumetric detail preservation is represented by a quadric energy function minimizing it preserves details in a least squares sense distributing error uniformly over the whole deformed mesh it can also be combined with conventional constraints involving surface positions details or smoothness and efficiently minimized by solving a sparse linear system we apply this technique in a curve based deformation system allowing novice users to create pleasing deformations with little effort a novel application of this system is to apply nonrigid and exaggerated deformations of cartoon characters to meshes we demonstrate our system’s potential with several examples
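the core numerical step described above reduces to a sparse least squares solve over laplacian (differential) coordinates; below is a minimal python sketch of that step, assuming numpy and scipy are available and using hypothetical inputs (a point array, adjacency lists, a few soft positional constraints); it illustrates the general laplacian detail preservation technique rather than the paper's actual volumetric implementation

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import lsqr

    def deform_with_laplacian(points, neighbors, handles, w=10.0):
        """points: (n,3) array; neighbors: list of neighbor-index lists;
        handles: {vertex_index: target_position} used as soft constraints; w is illustrative."""
        points = np.asarray(points, dtype=float)
        n = len(points)
        L = sp.lil_matrix((n, n))
        for i, nbrs in enumerate(neighbors):
            L[i, i] = 1.0
            for j in nbrs:
                L[i, j] = -1.0 / len(nbrs)
        L = L.tocsr()
        delta = L @ points                        # differential (detail) coordinates

        C = sp.lil_matrix((len(handles), n))      # weighted positional constraint rows
        targets = np.zeros((len(handles), 3))
        for k, (idx, target) in enumerate(handles.items()):
            C[k, idx] = w
            targets[k] = w * np.asarray(target, dtype=float)

        A = sp.vstack([L, C.tocsr()]).tocsr()     # stack detail rows and constraint rows
        b = np.vstack([delta, targets])
        # least squares solve per coordinate keeps details while moving the handles
        return np.column_stack([lsqr(A, b[:, c])[0] for c in range(3)])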
generic file sharing pp applications have gained high popularity in the past few years in particular pp streaming architectures have also attracted attention many of them consider many to one streaming or high level multicast techniques in contrast to these models we propose techniques and algorithms for point to point streaming in autonomous systems as it might occur in large companies campus or even in large hotels our major aim is to create replica situation that inter subnetwork rsvp streams are reduced to minimum therefore we introduce the architecture of an overlay network for interconnecting subnetworks each subnetwork contains so called local active rendezvous server lars which not only acts as directory server but also controls availability of movie content in its subnetwork owing to this we consider data placement strategies depending on restrictions of network bandwidth peer capabilities as well as the movies access frequency
as the number of users using web based applications increases online help systems are required to provide appropriate information in multiple formats and accesses since users are dealing with various applications simultaneously so as to complete objectives and make daily decisions it is necessary to provide anytime anywhere help or tutorials however developing effective and efficient help systems on behalf of users is costly as geographically distributed and multilingual users with minimal training or interruption have increased significantly it has become requirement to restructure help systems in order to meet users requirements and business challenges it is necessary to restructure help systems to deliver training documentation and online help while taking into account nonfunctional requirements such as usability time to market quality and maintainability in this paper we try to overcome such technical and economic constraints through single sourcing and content reuse embrace as many users as possible by html based help systems and beef up help contents through unified and structural help system design in addition network pipe method for user feedback data collection and statistical analysis has been suggested as user feedback automation
highly realistic virtual human models are rapidly becoming commonplace in computer graphics these models are often represented by complex shapes and require a labor intensive process which makes automatic modeling a challenging problem here the problem of and solutions to automatic modeling of animatable virtual humans are studied methods for capturing the shape of real people and parameterization techniques for modeling the static shape of virtual humans the variety of human body shapes as well as their dynamic shape how the body shape changes as it moves are classified summarized and compared finally methods for clothed virtual humans are reviewed
image annotation can be formulated as classification problem recently adaboost learning with feature selection has been used for creating an accurate ensemble classifier we propose dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation in mpeg standard in each iteration of adaboost learning genetic algorithm ga is used to dynamically generate and optimize set of feature subsets on which the weak classifiers are constructed so that an ensemble member is selected we investigate two methods of ga feature selection binary coded chromosome ga feature selection method used to perform optimal feature subset selection and bi coded chromosome ga feature selection method used to perform optimal weighted feature subset selection ie simultaneously perform optimal feature subset selection and corresponding optimal weight subset selection to improve the computational efficiency of our approach master slave ga parallel program of ga is implemented nearest neighbor classifier is used as the base classifier the experiments are performed over classified corel images to validate the performance of the approaches
how to measure usability is an important question in hci research and user interface evaluation we review current practice in measuring usability by categorizing and discussing usability measures from studies published in core hci journals and proceedings the discussion distinguishes several problems with the measures including whether they actually measure usability if they cover usability broadly how they are reasoned about and if they meet recommendations on how to measure usability in many studies the choice of and reasoning about usability measures fall short of a valid and reliable account of usability as quality in use of the user interface being studied based on the review we discuss challenges for studies of usability and for research into how to measure usability the challenges are to distinguish and empirically compare subjective and objective measures of usability to focus on developing and employing measures of learning and retention to study long term use and usability to extend measures of satisfaction beyond post use questionnaires to validate and standardize the host of subjective satisfaction questionnaires used to study correlations between usability measures as a means for validation and to use both micro and macro tasks and corresponding measures of usability in conclusion we argue that increased attention to the problems identified and challenges discussed may strengthen studies of usability and usability research
this article studies approximate distributed routing schemes on dynamic communication networks the work focuses on dynamic weighted general graphs where the vertices of the graph are fixed but the weights of the edges may change our main contribution concerns bounding the cost of adapting to dynamic changes the update efficiency of a routing scheme is measured by the time needed to update the routing scheme following a weight change a naive dynamic routing scheme which updates all vertices following a weight change requires at least time proportional to the diameter diam of the underlying graph after every weight change in contrast this article presents approximate dynamic routing schemes whose average time per topological change scales with the local density parameter of the underlying graph rather than with the diameter following a weight change our scheme never incurs more than roughly diam time thus our scheme is particularly efficient on graphs which have low local density and large diameter the article also establishes upper and lower bounds on the size of the databases required by the scheme at each site
robust detection of large dictionary of salient objects in natural image database is of fundamental importance to image retrieval systems we review three popular frameworks for salient object detection ie segmentation based method grid based method and part based method and discuss their advantages and limitations we argue that using these frameworks individually is generally not enough to handle large number of salient object classes accurately because of the intrinsic diversity of salient object features motivated by this observation we have proposed new system which combines the merits of these frameworks into one single hybrid system the system automatically selects the appropriate modeling method for each individual object class using measure and shape variance we conduct comparison experiments on two popular image dataset corel and labelme empirical results have shown that the proposed hybrid method is more general and can handle much more salient object classes in robust manner
in this article we present a compiler based technique to help develop correct real time systems the domain we consider is that of multiprogrammed real time applications in which periodic tasks control physical systems via interacting with external sensors and actuators while a system is up and running these operations must be performed as specified otherwise the system may fail correctness depends not only on each program individually but also on the time multiplexed behavior of all of the programs running together errors due to overloaded resources are exposed very late in the development process and often at runtime they are usually remedied by human intensive activities such as instrumentation measurement code tuning and redesign we describe a static alternative to this process which relies on well accepted technologies from optimizing compilers and fixed priority scheduling specifically when a set of tasks is found to be overloaded a scheduling analyzer determines candidate tasks to be transformed via program slicing the slicing engine decomposes each of the selected tasks into two fragments one that is time critical and the other unobservable the unobservable part is then spliced to the end of the time critical code with the external semantics being maintained the benefit is that the scheduler may postpone the unobservable code beyond its original deadline which can enhance overall schedulability while the optimization is completely local the improvement is realized globally for the entire task set
two translations from activity diagrams to the input language of nusmv symbolic model verifier are presented both translations map an activity diagram into finite state machine and are inspired by existing statechart semantics the requirements level translation defines state machines that can be efficiently verified but are bit unrealistic since they assume the perfect synchrony hypothesis the implementation level translation defines state machines that cannot be verified so efficiently but that are more realistic since they do not use the perfect synchrony hypothesis to justify the use of the requirements level translation we show that for large class of activity diagrams and certain properties both translations are equivalent regardless of which translation is used the outcome of model checking is the same moreover for linear stuttering closed properties the implementation level translation is equivalent to slightly modified version of the requirements level translation we use the two translations to model check data integrity constraints for an activity diagram and set of class diagrams that specify the data manipulated in the activities both translations have been implemented in two tools we discuss our experiences in applying both translations to model check some large example activity diagrams
current models of the classification problem do not effectively handle bursts of particular classes coming in at different times in fact the current model of the classification problem simply concentrates on methods for one pass classification modeling of very large data sets our model for data stream classification views the data stream classification problem from the point of view of dynamic approach in which simultaneous training and test streams are used for dynamic classification of data sets this model reflects real life situations effectively since it is desirable to classify test streams in real time over an evolving training and test stream the aim here is to create classification system in which the training model can adapt quickly to the changes of the underlying data stream in order to achieve this goal we propose an on demand classification process which can dynamically select the appropriate window of past training data to build the classifier the empirical results indicate that the system maintains high classification accuracy in an evolving data stream while providing an efficient solution to the classification task
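a minimal sketch of the on demand idea described above, choosing the window of past training chunks whose classifier scores best on a small recent labelled sample; it assumes scikit-learn for the base classifier, uses hypothetical class and method names, and is not the paper's micro-cluster based algorithm

    from collections import deque
    from sklearn.neighbors import KNeighborsClassifier

    class OnDemandStreamClassifier:
        def __init__(self, max_chunks=20):
            # training chunks kept in arrival order, oldest dropped first
            self.chunks = deque(maxlen=max_chunks)

        def add_training_chunk(self, X, y):
            self.chunks.append((X, y))

        def classify(self, X_test, X_recent, y_recent):
            """X_recent/y_recent: small recent labelled sample used to pick the window;
            assumes at least one training chunk with a few samples has been added."""
            best_model, best_acc = None, -1.0
            Xs, ys = [], []
            for X, y in reversed(self.chunks):     # grow the window backwards in time
                Xs = list(X) + Xs
                ys = list(y) + ys
                model = KNeighborsClassifier(n_neighbors=3).fit(Xs, ys)
                acc = model.score(X_recent, y_recent)
                if acc > best_acc:                 # keep the best-scoring window so far
                    best_model, best_acc = model, acc
            return best_model.predict(X_test)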
our studies have shown that as displays become larger users leave more windows open for easy multitasking larger number of windows however may increase the time that users spend arranging and switching between tasks we present scalable fabric task management system designed to address problems with the proliferation of open windows on the pc desktop scalable fabric couples window management with flexible visual representation to provide focus plus context solution to desktop complexity users interact with windows in central focus region of the display in normal manner but when user moves window into the periphery it shrinks in size getting smaller as it nears the edge of the display the window minimize action is redefined to return the window to its preferred location in the periphery allowing windows to remain visible when not in use windows in the periphery may be grouped together into named tasks and task switching is accomplished with single mouse click the spatial arrangement of tasks leverages human spatial memory to make task switching easier we review the evolution of scalable fabric over three design iterations including discussion of results from two user studies that were performed to compare the experience with scalable fabric to that of the microsoft windows xp taskbar
application specific instruction set extension is an effective technique for reducing accesses to components such as on and off chip memories and the register file and for enhancing energy efficiency however the addition of custom functional units to the base processor is required to support custom instructions which is becoming an issue due to the increase of manufacturing and design costs in new nanometer scale technologies and shorter time to market to address these issues our proposed approach uses an optimized reconfigurable functional unit instead and instruction set customization is done after chip fabrication therefore while maintaining the flexibility of a conventional microprocessor the low energy benefit of customization still applies experimental results show substantial maximum and average energy savings for our proposed architecture framework
many important applications in wireless mesh networks require reliable multicast communication ie with a perfect packet delivery ratio pdr previously numerous multicast protocols based on automatic repeat request arq have been proposed to improve the packet delivery ratio however these arq based protocols can lead to excessive control overhead and drastically reduced throughput in this paper we present a comprehensive exploration of the design space for developing high throughput reliable multicast protocols that achieve a perfect pdr motivated by the fact that mac layer broadcast which is used by most wireless multicast protocols offers no reliability we first examine whether better hop by hop reliability provided by unicasting the packets at the mac layer can help to achieve end to end multicast reliability we then turn to end to end solutions at the transport layer previously forward error correction fec techniques have proved effective for providing reliable multicast in the internet by avoiding the control packet implosion and scalability problems of arq based protocols in this paper we examine whether fec techniques can be equally effective in supporting reliable multicast in wireless mesh networks we integrate four representative reliable schemes one arq one fec and two hybrid originally developed for the internet with a representative multicast protocol odmrp and evaluate their performance our experimental results via extensive simulations offer an in depth understanding of the various choices in the design space first compared to broadcast based unreliable odmrp using unicast for per hop transmission only offers a very small improvement in reliability under low load and fails to improve the reliability under high load due to the significantly increased capacity requirement which leads to congestion and packet drops second at the transport layer the use of pure fec can significantly improve the reliability bringing the pdr close to perfect in many cases but can be inefficient in terms of the number of redundant packets transmitted in contrast a carefully designed arq fec hybrid protocol such as rmdp can also offer reliability while being substantially more efficient than a pure fec scheme to the best of our knowledge this is the first in depth study of high throughput reliable multicast protocols that provide a perfect pdr for wireless mesh networks
this paper describes a graph coloring compiler framework to allocate on chip srf stream register file storage for optimizing scientific applications on stream processors our framework consists of first applying enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism ie overlapping kernel execution and memory transfers then the three srf management tasks placing streams in the srf exploiting stream reuse and maximizing parallelism are solved in a unified manner via graph coloring we evaluate the performance of our compiler framework by actually running nine representative scientific computing kernels on our ft stream processor our preliminary results show that compiler management achieves a clear average speedup compared to first fit allocation and a further average speedup is observed in comparison with the performance results obtained from running these benchmarks on itanium
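a minimal sketch of the graph coloring view of srf placement, assuming streams with known live ranges and a fixed number of srf regions; the names and the greedy ordering are illustrative, not the ft compiler's implementation

    def color_streams(live_ranges, num_regions):
        """live_ranges: {stream: (start, end)}; returns {stream: region}; raises if a spill is needed."""
        def overlaps(a, b):
            return not (a[1] <= b[0] or b[1] <= a[0])
        streams = sorted(live_ranges, key=lambda s: live_ranges[s][0])
        assignment = {}
        for s in streams:
            # streams whose live ranges overlap interfere and need different srf regions
            used = {assignment[t] for t in assignment
                    if overlaps(live_ranges[s], live_ranges[t])}
            free = [r for r in range(num_regions) if r not in used]
            if not free:
                raise RuntimeError(f"stream {s} must be spilled to memory")
            assignment[s] = free[0]
        return assignment

    print(color_streams({"a": (0, 4), "b": (2, 6), "c": (5, 9)}, num_regions=2))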
key evolving protocols aim at limiting damages when an attacker obtains full access to the signer’s storage to simplify the integration of such mechanisms into standard security architectures boyen shacham shen and waters suggested the construction of forward secure signatures fss that protect past periods after break in with untrusted updates where private keys are additionally protected by second factor derived from password key updates can be made on encrypted version of private keys so that passwords only come into play for signing messages boyen et al described pairing based scheme in the standard model and also suggested the integration of untrusted updates in the bellare miner forward secure signature they left open the problem of endowing other efficient fss systems with the same second factor protection we first address this problem and suggest generic ways to construct fss schemes in untrusted update environments in second step we extend the unprotected update model to other key evolving systems such as forward secure public key encryption and key insulated cryptosystems we then explain how some of the constructions that we proposed for forward secure signatures can be adapted to these models
for online medical education purposes we have developed novel scheme to incorporate the results of semantic video classification to select the most representative video shots for generating concept oriented summarization and skimming of surgery education videos first salient objects are used as the video patterns for feature extraction to achieve good representation of the intermediate video semantics the salient objects are defined as the salient video compounds that can be used to characterize the most significant perceptual properties of the corresponding real world physical objects in video and thus the appearances of such salient objects can be used to predict the appearances of the relevant semantic video concepts in specific video domain second novel multi modal boosting algorithm is developed to achieve more reliable video classifier training by incorporating feature hierarchy and boosting to dramatically reduce both the training cost and the size of training samples thus it can significantly speed up svm support vector machine classifier training in addition the unlabeled samples are integrated to reduce the human efforts on labeling large amount of training samples finally the results of semantic video classification are incorporated to enable concept oriented video summarization and skimming experimental results in specific domain of surgery education videos are provided
in this paper we present a number of augmented refrigerator magnet concepts the concepts are shown to be derived from previous research into the everyday use of fridge surfaces three broadly encompassing practices have been addressed through the concepts i organization and planning in households ii reminding and iii methods household members use to assign ownership to particular tasks activities and artifacts particular emphasis is given to a design approach that aims to build on the simplicity of magnets so that each of the concepts offers a basic simple to operate function the concepts and our use of what we call this less is more design sensibility are examined using a low fidelity prototyping exercise the results of this preliminary work suggest that the concepts have the potential to be easily incorporated into household routines and that the design of simple functioning devices lends itself to this
while string matching plays an important role in deep packet inspection applications its software algorithms are insufficient to meet the demands of high speed performance accordingly we were motivated to propose a fast and deterministic performance root hashing automaton matching rham coprocessor for embedded network processors although automaton algorithms are robust with deterministic matching time there is still plenty of room for improvement of their average case performance the proposed rham employs a novel root hashing technique to accelerate automaton matching in our experiment rham is implemented on top of the prevalent automaton algorithm aho corasick ac which is often used in many packet inspection applications compared to the original ac rham only requires a small extra vector measured in kbytes for root hashing and achieves a clear performance improvement for both url and virus pattern sets an fpga implementation of rham can perform at gigabit per second rates with large pattern sets which is superior to previous matching hardware in terms of throughput and pattern set size
the osgi framework is run time environment for deploying service containing java components dynamically reconfigurable java applications can be developed through the framework’s powerful capabilities such as installing uninstalling updating components at run time and allowing the substitution of service implementations at run time coupled with the capability to be remotely managed the osgi framework is proving success in variety of application domains one domain where it is yet to make an impact is real time systems despite the fact that osgi components and services can be developed using the real time specification for java rtsj there are still variety of problems preventing the use of the framework to develop real time systems one such problem is lack of temporal isolation this paper focuses on how temporal isolation can be provided in the osgi framework as first step towards using the framework to developing real time systems with the rtsj
as more of our communication commerce and personal data goes online credibility becomes an increasingly important issue how do we determine if our commerce sites our healthcare sites or our online communication partners are credible this paper examines the research literature in the area of web credibility this review starts by examining the cognitive foundations of credibility other sections of the paper examine not only the general credibility of web sites but also online communication such as mail instant messaging and online communities training and education as well as future issues such as captchas and phishing will be addressed the implications for multiple populations users web developers browser designers and librarians will be discussed
the use of rules in distributed environment creates new challenges for the development of active rule execution models in particular since single event can trigger multiple rules that execute over distributed sources of data it is important to make use of concurrent rule execution whenever possible this paper presents the details of the integration rule scheduling irs algorithm integration rules are active database rules that are used for component integration in distributed environment the irs algorithm identifies rule conflicts for multiple rules triggered by the same event through static compile time analysis of the read and write sets of each rule unique aspect of the algorithm is that the conflict analysis includes the effects of nested rule execution that occurs as result of using an execution model with an immediate coupling mode the algorithm therefore identifies conflicts that may occur as result of the concurrent execution of different rule triggering sequences the rules are then formed into priority graph before execution defining the order in which rules triggered by the same event should be processed rules with the same priority can be executed concurrently the irs algorithm guarantees confluence in the final state of the rule execution the irs algorithm is applicable for rule scheduling in both distributed and centralized rule execution environments
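a minimal sketch of the compile time conflict analysis step, assuming each rule's read and write sets are known; rules that do not conflict may be executed concurrently, while conflicting ones are ordered in the priority graph (the pair set below is the undirected version of that graph)

    def build_conflict_graph(rules):
        """rules: {name: {"read": set, "write": set}}; returns the set of conflicting pairs."""
        conflicts = set()
        names = sorted(rules)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                ra, wa = rules[a]["read"], rules[a]["write"]
                rb, wb = rules[b]["read"], rules[b]["write"]
                # write-write, write-read or read-write overlap means the pair must be ordered
                if wa & (rb | wb) or wb & ra:
                    conflicts.add((a, b))
        return conflicts

    rules = {
        "r1": {"read": {"x"}, "write": {"y"}},
        "r2": {"read": {"y"}, "write": {"z"}},   # conflicts with r1: reads what r1 writes
        "r3": {"read": {"u"}, "write": {"v"}},   # independent: may run concurrently
    }
    print(build_conflict_graph(rules))           # {('r1', 'r2')}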
we propose the first differentially private aggregation algorithm for distributed time series data that offers good practical utility without any trusted server this addresses two important challenges in participatory data mining applications where i individual users collect temporally correlated time series data such as location traces web history and personal health data and ii an untrusted third party aggregator wishes to run aggregate queries on the data to ensure differential privacy for time series data despite the presence of temporal correlation we propose the fourier perturbation algorithm fpak standard differential privacy techniques perform poorly for time series data to answer a long sequence of queries such techniques add noise to each query answer that grows with the number of queries making the answers practically useless when many queries are asked our fpak algorithm perturbs the discrete fourier transform of the query answers and improves the expected error from a quantity that grows with the number of queries to one that grows roughly with the number of fourier coefficients that can approximately reconstruct all the query answers our experiments show that this number of coefficients is small for many real life data sets resulting in a huge error improvement for fpak to deal with the absence of a trusted central server we propose the distributed laplace perturbation algorithm dlpa which adds noise in a distributed way in order to guarantee differential privacy to the best of our knowledge dlpa is the first distributed differentially private algorithm that can scale with a large number of users dlpa outperforms the only other distributed solution for differential privacy proposed so far by reducing the computational load per user from a cost that grows with the number of users to a constant cost
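a minimal numpy sketch of the fourier perturbation idea, keeping only the first k dft coefficients of the query answers and perturbing them with laplace noise before reconstruction; the noise scale here is purely illustrative and the calibration to a privacy budget, as well as the distributed dlpa part, is omitted

    import numpy as np

    def fourier_perturb(answers, k, noise_scale, rng=np.random.default_rng(0)):
        """answers: 1-d array of query answers; k: number of dft coefficients kept."""
        n = len(answers)
        coeffs = np.fft.rfft(answers)
        kept = np.zeros_like(coeffs)
        kept[:k] = coeffs[:k]
        # perturb real and imaginary parts of the retained coefficients only
        kept[:k] += rng.laplace(0.0, noise_scale, k) + 1j * rng.laplace(0.0, noise_scale, k)
        return np.fft.irfft(kept, n)

    series = np.sin(np.linspace(0, 4 * np.pi, 128)) * 10 + 50   # smooth synthetic time series
    private = fourier_perturb(series, k=8, noise_scale=1.0)
    print(np.max(np.abs(series - private)))                     # error tied to k, not to the query count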
providing scalable video services in a peer to peer p2p environment is challenging since videos are typically large and require high communication bandwidth for delivery many peers may be unwilling to cache them in whole to serve others in this paper we address two fundamental research problems in providing scalable p2p video services how a host can find enough video pieces which may be scattered across the whole system to assemble a complete video and given a limited buffer size what part of a video a host should cache and what existing data should be expunged to make the necessary space we address these problems with two new ideas cell caching collaboration and controlled inverse proportional cip cache allocation the cell concept allows cost effective caching collaboration in a fully distributed environment and can dramatically reduce the video lookup cost on the other hand cip cache allocation challenges the conventional caching wisdom by caching unpopular videos with higher priority our approach allows the system to retain many copies of popular videos to avoid creating hot spots and at the same time prevents unpopular videos from being quickly evicted from the system we have implemented a gnutella like simulation network and use it as a testbed to evaluate the proposed technique our extensive study shows convincingly the performance advantage of the new scheme
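a minimal sketch of a controlled inverse proportional allocation, giving cache space in inverse proportion to popularity so that unpopular videos are not the first to be evicted; the function and parameter names are hypothetical and the controlling exponent alpha is illustrative

    def cip_allocation(popularity, total_cache_slots, alpha=1.0):
        """popularity: {video: access frequency}; returns {video: cache slots}."""
        # weight each video by the inverse of its popularity, raised to a control exponent
        inv = {v: 1.0 / (freq ** alpha) for v, freq in popularity.items()}
        norm = sum(inv.values())
        return {v: int(round(total_cache_slots * w / norm)) for v, w in inv.items()}

    # the rare title gets most of the dedicated cache; popular titles rely on their many replicas
    print(cip_allocation({"hit_movie": 100, "average": 10, "rare": 1}, total_cache_slots=50))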
grid computing infrastructures embody cost effective computing paradigm that virtualises heterogeneous system resources to meet the dynamic needs of critical business and scientific applications these applications range from batch processes and long running tasks to real time and even transactional applications grid computing environments are inherently dynamic and unpredictable environments sharing services amongst many different users grid schedulers aim to make the most efficient use of grid resources high utilisation while providing the best possible performance to the grid applications reducing makespan and satisfying the associated performance and quality of service qos constraints additionally in commercial grid settings where economic considerations are an increasingly important part of grid scheduling it is necessary to minimise the cost of application execution on the behalf of the grid users while ensuring that the applications meet their qos constraints furthermore efficient resource allocation may allow resource broker to maximise their profit by minimising the quantity of resource procurement scheduling in such large scale dynamic and distributed environment is complex undertaking in this paper we propose an approach to grid scheduling which abstracts over the details of individual applications focusing instead on the global cost optimisation problem while taking into account the entire workload dynamically adjusting to the varying service demands our model places particular emphasis on the stochastic and unpredictable nature of the grid leading to more accurate reflection of the state of the grid and hence more efficient and accurate scheduling decisions
the oasis technical committee published the xacml administration and delegation profile xacml admin as a working draft in order to provide policy administration and dynamic delegation services to the xacml runtime we enhance this profile by adding role based delegation amalgamating the proposed profile with the previously proposed xacml arbac profile by doing so we improve the scalability of the delegation mechanism second we show how the previously proposed xacml arbac enforcement mechanism can be enhanced to enforce the proposed role based administration and delegation xacml profile xacml adrbac therefore providing a method to enforce the xacml admin profile
we propose a general method for reranker construction which targets choosing the candidate with the least expected loss rather than the most probable candidate different approaches to expected loss approximation are considered including estimating it from the probabilistic model used to generate the candidates estimating it from a discriminative model trained to rerank the candidates and learning to approximate the expected loss the proposed methods are applied to the parse reranking task with various baseline models achieving significant improvement both over the probabilistic models and the discriminative rerankers when a neural network parser is used as the probabilistic model and the voted perceptron algorithm with data defined kernels as the learning algorithm the loss minimization model achieves a strong labeled constituent score on the standard wsj parsing task
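a minimal sketch of choosing the candidate with the least expected loss, where the expectation is taken over the candidate list weighted by the model posteriors; the toy loss and probabilities below are illustrative only

    def min_expected_loss(candidates, probs, loss):
        """candidates: list of hypotheses; probs: posterior for each; loss(a, b) -> float."""
        best, best_risk = None, float("inf")
        for c in candidates:
            # expected loss of picking c, estimated against all candidates weighted by probability
            risk = sum(p * loss(c, other) for other, p in zip(candidates, probs))
            if risk < best_risk:
                best, best_risk = c, risk
        return best

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

    cands = [["the", "cat", "sat"], ["the", "cat", "sits"], ["a", "dog", "sat"]]
    print(min_expected_loss(cands, probs=[0.4, 0.35, 0.25], loss=hamming))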
as data of an unprecedented scale are becoming accessible it becomes more and more important to help each user identify the ideal results of manageable size as such mechanism skyline queries have recently attracted lot of attention for its intuitive query formulation this intuitiveness however has side effect of retrieving too many results especially for high dimensional data this paper is to support personalized skyline queries as identifying truly interesting objects based on user specific preference and retrieval size in particular we abstract personalized skyline ranking as dynamic search over skyline subspaces guided by user specific preference we then develop novel algorithm navigating on compressed structure itself to reduce the storage overhead furthermore we also develop novel techniques to interleave cube construction with navigation for some scenarios without priori structure finally we extend the proposed techniques for user specific preferences including equivalence preference our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithms on both real life and synthetic data
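a minimal sketch of a preference aware skyline, in which only the user's preferred attributes take part in the dominance test and the result is cut to the requested size; the in-skyline ranking used below is a simple placeholder, not the paper's subspace navigation algorithm

    def personalized_skyline(points, preferred_attrs, k):
        """points: list of dicts; preferred_attrs: attributes to minimise; returns up to k points."""
        def dominates(a, b):
            no_worse = all(a[x] <= b[x] for x in preferred_attrs)
            better = any(a[x] < b[x] for x in preferred_attrs)
            return no_worse and better

        skyline = [p for p in points
                   if not any(dominates(q, p) for q in points if q is not p)]
        # illustrative ranking inside the skyline to honour the retrieval size k
        skyline.sort(key=lambda p: sum(p[x] for x in preferred_attrs))
        return skyline[:k]

    hotels = [{"price": 80, "distance": 3}, {"price": 120, "distance": 1}, {"price": 90, "distance": 4}]
    print(personalized_skyline(hotels, preferred_attrs=["price", "distance"], k=2))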
measurement and estimation of packet loss characteristics are challenging due to the relatively rare occurrence and typically short duration of packet loss episodes while active probe tools are commonly used to measure packet loss on end to end paths there has been little analysis of the accuracy of these tools or their impact on the network the objective of our study is to understand how to measure packet loss episodes accurately with end to end probes we begin by testing the capability of standard poisson modulated end to end measurements of loss in controlled laboratory environment using ip routers and commodity end hosts our tests show that loss characteristics reported from such poisson modulated probe tools can be quite inaccurate over range of traffic conditions motivated by these observations we introduce new algorithm for packet loss measurement that is designed to overcome the deficiencies in standard poisson based tools specifically our method entails probe experiments that follow geometric distribution to enable an explicit trade off between accuracy and impact on the network and enable more accurate measurements than standard poisson probing at the same rate we evaluate the capabilities of our methodology experimentally by developing and implementing prototype tool called badabing the experiments demonstrate the trade offs between impact on the network and measurement accuracy we show that badabing reports loss characteristics far more accurately than traditional loss measurement tools
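a minimal sketch of geometrically modulated probing, sending a probe in each discretised time slot with probability p so that inter probe gaps follow a geometric distribution; the transport of probes and the loss bookkeeping of the actual tool are stubbed out

    import random

    def probe_schedule(duration_slots, p, seed=1):
        """return the time slots at which probes are sent under a per-slot bernoulli trial."""
        rng = random.Random(seed)
        return [t for t in range(duration_slots) if rng.random() < p]

    slots = probe_schedule(duration_slots=1000, p=0.1)
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    print(len(slots), sum(gaps) / len(gaps))   # roughly 100 probes, mean gap near 1/p = 10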
active contour modelling is useful to fit non textured objects and algorithms have been developed to recover the motion of an object and its uncertainty here we show that these algorithms can be used also with point features matched in textured objects and that active contours and point matches complement in natural way in the same manner we also show that depth from zoom algorithms developed for zooming cameras can be exploited also in the foveal peripheral eye configuration present in the armar iii humanoid robot
software system documentation is almost always expressed informally in natural language and free text examples include requirement specifications design documents manual pages system development journals error logs and related maintenance reports we propose method based on information retrieval to recover traceability links between source code and free text documents premise of our work is that programmers use meaningful names for program items such as functions variables types classes and methods we believe that the application domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers therefore the analysis of these mnemonics can help to associate high level concepts with program concepts and vice versa we apply both probabilistic and vector space information retrieval model in two case studies to trace source code onto manual pages and java code to functional requirements we compare the results of applying the two models discuss the benefits and limitations and describe directions for improvements
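a minimal sketch of the vector space variant of this idea, comparing identifiers mined from the code with the words of a free text document via tf-idf cosine similarity; it assumes scikit-learn and uses a toy class and requirement as hypothetical inputs

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def split_identifiers(source):
        """extract identifiers, split camelCase and lowercase them into a pseudo-document."""
        tokens = re.findall(r"[A-Za-z]+", source)
        words = []
        for t in tokens:
            words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", t)
        return " ".join(w.lower() for w in words)

    code_units = {"AccountManager.java": "class AccountManager { void closeAccount(int id) {...} }"}
    docs = {"REQ-12": "the system shall allow an operator to close a customer account"}

    corpus = [split_identifiers(c) for c in code_units.values()] + list(docs.values())
    tfidf = TfidfVectorizer().fit_transform(corpus)
    sim = cosine_similarity(tfidf[:len(code_units)], tfidf[len(code_units):])
    print(sim)   # a high score suggests a candidate traceability link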
the proliferation of bioinformatics activities brings new challenges how to understand and organise these resources how to exchange and reuse successful experimental procedures and to provide interoperability among data and tools this paper describes an effort toward these directions it is based on combining research on ontology management ai and scientific workflows to design reuse and annotate bioinformatics experiments the resulting framework supports automatic or interactive composition of tasks based on ai planning techniques and takes advantage of ontologies to support the specification and annotation of bioinformatics workflows we validate our proposal with prototype running on real data
we present a method to view dependently control the size of shape features depicted in computer generated line drawings of meshes our method exhibits good temporal coherence during level of detail transitions and is fast because the calculations are carried out entirely on the gpu the strategy is to pre compute via a digital geometry processing technique a sequence of filtered versions of the mesh that eliminate shape features at progressively larger scales each filtered mesh retains the original connectivity providing direct correspondence between meshes at run time the meshes are loaded onto the graphics card and a vertex program interpolates curvatures and positions between corresponding vertices in adjacent meshes of the sequence a fragment program then renders silhouettes and suggestive contours to produce a line drawing for which the size of depicted shape features follows a user specified target size for example we can depict fine shape features over nearby surfaces and appropriately coarse scaled features in more distant regions more general level of detail policies could be implemented on top of our approach by letting the target size vary with scene attributes such as depth image location or annotations provided by the scene designer
many applications of object oriented database systems demand high performance and require longer duration transactions these requirements are contrary to one another two trends in modern systems can help improve the situation firstly multiprocessors are becoming commonplace and secondly object oriented database systems supporting multiple versions are becoming popular consequently database protocols that are less prone to extensive blocking as can be the case with the popular two phase locking protocol are needed in future systems the authors simulation studies have shown substantial performance improvements can be obtained by using multiversion protocols for database transaction management these protocols provide higher throughput at higher levels of concurrency which are achievable with multiprocessors than their traditional single version equivalents
this paper contributes a general approach to characterizing patterns of change in a spatio temporal database while there is particular interest in modelling and querying how spatio temporal entities evolve the approach contributed by the paper is distinctive in being applicable without modification to aspatial entities as well the paper uses the tripod spatio temporal model to describe and instantiate in detail the contributed approach after briefly describing a typical application and providing basic knowledge about tripod the paper characterizes and classifies evolution queries and describes in detail how they are evaluated
we present semantics for architectural specifications in the common algebraic specification language casl including an extended static analysis compatible with model theoretic requirements the main obstacle here is the lack of amalgamation for casl models to circumvent this problem we extend the casl logic by introducing enriched signatures where subsort embeddings form category rather than just preorder the extended model functor satisfies the amalgamation property as well as its converse which makes it possible to express the amalgamability conditions in the semantic rules in static terms using these concepts we develop the semantics at various levels in an institution independent fashion moreover amalgamation for enriched casl means that variety of results for institutions with amalgamation such as computation of normal forms and theorem proving for structured specifications can now be used for casl
xml is emerging as an important format for describing the schema of documents and data to facilitate integration of applications in variety of industry domains an important issue that naturally arises is the requirement to generate store and access xml documents it is important to reuse existing data management systems and repositories for this purpose in this paper we describe the xml access server xas general purpose xml based storage and retrieval system which provides the appearance of large set of xml documents while retaining the data in underlying federated data sources that could be relational object oriented or semi structured xas automatically maps the underlying data into virtual xml components when mappings between dtds and underlying schemas are established the components can be presented as xml documents or assembled into larger components xas manages the relationship between xml components and the mapping in the form of document composition logic the versatility in its ways to generate xml documents enables xas to serve large number of xml components and documents efficiently and expediently
this paper presents methodology for automatically designing instruction set extensions in embedded processors many commercially available cpus now offer the possibility of extending their instruction set for specific application their tool chains typically support manual experimentations but algorithms that can define the set of customised functional units most beneficial for given applications are missing only few algorithms exist but are severely limited in the type and size of operation clusters they can choose and hence reduce significantly the effectiveness of specialisation more general algorithm is presented here which selects maximal speedup convex subgraphs of the application data flow graph under fundamental microarchitectural constraints and which improves significantly on the state of the art
network security depends heavily on automated intrusion detection systems ids to sense malicious activity unfortunately ids often deliver both too much raw information and an incomplete local picture impeding accurate assessment of emerging threats we propose system to support analysis of ids logs that visually pivots large sets of net flows in particular two visual representations of the flow data are compared treemap visualization of local network hosts which are linked through hierarchical edge bundles with the external hosts and graph representation using force directed layout to visualize the structure of the host communication patterns three case studies demonstrate the capabilities of our tool to analyze service usage in managed network detect distributed attack and investigate hosts in our network that communicate with suspect external ips
application layer multicast alm uses overlays built on top of existing network infrastructure for rapid deployment of multicast applications key to the efficiency of this technique is the structure of the overlay tree used this work reviews and compares various self organising techniques that strive to build low cost and low delay trees using extensive simulations protocols investigated include hmtp hostcast switch trees dcmaltp nice tbcp and narada which encompass wide spectrum of overlay construction optimisation and maintenance techniques the protocols are evaluated based on their ability to achieve their objectives overlay path penalties protocol convergence and overhead we also conduct detailed analysis of two main components in building an overlay initial construction and the overhead of periodical improvement based on the observed results we identify strengths and weaknesses of various approaches and provide suggestions for future work on alm overlay optimisation
clio is an existing schema mapping tool that provides user friendly means to manage and facilitate the complex task of transformation and integration of heterogeneous data such as xml over the web or in xml databases by means of mappings from source to target schemas clio can help users conveniently establish the precise semantics of data transformation and integration in this paper we study the problem of how to efficiently implement such data transformation ie generating target data from the source data based on schema mappings we present three phase framework for high performance xml to xml transformation based on schema mappings and discuss methodologies and algorithms for implementing these phases in particular we elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk based merging of overlapping data including duplicate elimination we compare our transformation framework with alternative methods such as using xquery or sql xml provided by current commercial databases the results demonstrate that the three phase framework although as simple as it is is highly scalable and outperforms the alternative methods by orders of magnitude
in many applications users specify target values for certain attributes features without requiring exact matches to these values in return instead the result is typically a ranked list of the top k objects that best match the specified feature values user subjectivity is an important aspect of such queries ie which objects are relevant to the user and which are not depends on the perception of the user due to the subjective nature of top k queries the answers returned by the system to a user query often do not satisfy the user’s need right away either because the weights and the distance functions associated with the features do not accurately capture the user’s perception or because the specified target values do not fully capture her information need or both in such cases the user would like to refine the query and resubmit it in order to get back a better set of answers while there has been a lot of research on query refinement models there is no work that we are aware of on supporting refinement of top k queries efficiently in a database system done naively each refined query can be treated as a starting query and evaluated from scratch this paper explores alternative approaches that significantly improve the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another our experiments over a real life multimedia data set show that the proposed techniques save a large fraction of the execution cost of refined queries over the naive approach and are more than an order of magnitude faster than a simple sequential scan
with the rapid increase of data in many areas clustering on large datasets has become an important problem in data analysis since cluster analysis is highly iterative process cluster analysis on large datasets prefers short iteration on relatively small representative set thus two phase framework sampling summarization iterative cluster analysis is often applied in practice since the clustering result only labels the small representative set there are problems with extending the result to the entire large dataset which are almost ignored by the traditional clustering research this extending is often named as labeling process labeling irregular shaped clusters distinguishing outliers and extending cluster boundary are the main problems in this stage we address these problems and propose visualization based approach to dealing with them precisely this approach partially involves human into the process of defining and refining the structure clustermap based on this structure the clustermap algorithm scans the large dataset to adapt the boundary extension and generate the cluster labels for the entire dataset experimental result shows that clustermap can preserve cluster quality considerably with low computational cost compared to the distance comparison based labeling algorithms
we present an efficient fully automatic approach to fault localization for safety properties stated in linear temporal logic we view the failure as contradiction between the specification and the actual behavior and look for components that explain this discrepancy we find these components by solving the satisfiability of propositional boolean formula we show how to construct this formula and how to extend it so that we find exactly those components that can be used to repair the circuit for given set of counterexamples furthermore we discuss how to efficiently solve the formula by using the proper decision heuristics and simulation based preprocessing we demonstrate the quality and efficiency of our approach by experimental results
in recent years online spam has become a major problem for the sustainability of the internet excessive amounts of spam are not only reducing the quality of information available on the internet but also creating concern amongst search engines and web users this paper aims to analyse existing works in two different categories of spam domains email spam and image spam to gain a deeper understanding of this problem future research directions are also presented in these spam domains
hierarchical organization is well known property of language and yet the notion of hierarchical structure has been largely absent from the best performing machine translation systems in recent community wide evaluations in this paper we discuss new hierarchical phrase based statistical machine translation system chiang presenting recent extensions to the original proposal new evaluation results in community wide evaluation and novel technique for fine grained comparative analysis of mt systems
memory leaks are caused by software programs that prevent the reclamation of memory that is no longer in use they can cause significant slowdowns exhaustion of available storage space and eventually application crashes detecting memory leaks is challenging because real world applications are built on multiple layers of software frameworks making it difficult for developer to know whether observed references to objects are legitimate or the cause of leak we present graph mining solution to this problem wherein we analyze heap dumps to automatically identify subgraphs which could represent potential memory leak sources although heap dumps are commonly analyzed in existing heap profiling tools our work is the first to apply graph grammar mining solution to this problem unlike classical graph mining work we show that it suffices to mine the dominator tree of the heap dump which is significantly smaller than the underlying graph our approach identifies not just leaking candidates and their structure but also provides aggregate information about the access path to the leaks we demonstrate several synthetic as well as real world examples of heap dumps for which our approach provides more insight into the problem than state of the art tools such as eclipse’s mat
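a minimal sketch of one aggregation step that is useful in this setting, ranking dominator subtrees of a heap dump by retained size so that containers keeping large object graphs alive surface first; the real approach mines graph grammars on the dominator tree, which is not shown here

    def retained_sizes(dom_children, shallow_size):
        """dom_children: {node: [dominated children]}; shallow_size: {node: bytes}."""
        retained = {}
        def walk(node):
            # retained size = own size plus everything the node transitively dominates
            total = shallow_size[node] + sum(walk(c) for c in dom_children.get(node, []))
            retained[node] = total
            return total
        roots = set(dom_children) - {c for cs in dom_children.values() for c in cs}
        for r in roots:
            walk(r)
        return sorted(retained.items(), key=lambda kv: -kv[1])

    children = {"root": ["cacheA", "uiB"], "cacheA": ["entry1", "entry2"], "uiB": []}
    sizes = {"root": 16, "cacheA": 32, "entry1": 4096, "entry2": 4096, "uiB": 64}
    print(retained_sizes(children, sizes)[:2])   # cacheA's subtree accounts for most retained bytes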
sampling is important for a variety of graphics applications including rendering imaging and geometry processing however producing sample sets with desired efficiency and blue noise statistics has been a major challenge as existing methods are either sequential with limited speed or are parallel but only through pre computed datasets and thus fall short in producing samples with blue noise statistics we present a poisson disk sampling algorithm that runs in parallel and produces all samples on the fly with desired blue noise properties our main idea is to subdivide the sample domain into grid cells and to draw samples concurrently from multiple cells that are sufficiently far apart so that their samples cannot conflict with one another we present a parallel implementation of our algorithm running on a gpu with constant cost per sample and a constant number of computation passes for a target number of samples our algorithm also works in arbitrary dimension and allows adaptive sampling from a user specified importance field furthermore our algorithm is simple and easy to implement and runs faster than existing techniques
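a minimal pure python sketch of the phase group idea, with cells of size r grouped into a 3 by 3 phase pattern so that cells in the same group are at least 2r apart and their candidate samples cannot conflict; this toy version visits the phases sequentially, whereas the paper processes the cells of a group concurrently on the gpu

    import random, math

    def poisson_disk(width, height, r, tries=20, seed=0):
        rng = random.Random(seed)
        cell = r
        cols, rows = int(width / cell) + 1, int(height / cell) + 1
        grid = {}          # at most one accepted sample per cell in this simplified version
        samples = []

        def ok(p):
            cx, cy = int(p[0] / cell), int(p[1] / cell)
            for i in range(cx - 2, cx + 3):
                for j in range(cy - 2, cy + 3):
                    q = grid.get((i, j))
                    if q and math.dist(p, q) < r:
                        return False
            return True

        for phase in [(a, b) for a in range(3) for b in range(3)]:   # 9 independent phase groups
            for i in range(phase[0], cols, 3):
                for j in range(phase[1], rows, 3):
                    for _ in range(tries):                           # dart throwing within the cell
                        p = ((i + rng.random()) * cell, (j + rng.random()) * cell)
                        if p[0] < width and p[1] < height and ok(p):
                            grid[(i, j)] = p
                            samples.append(p)
                            break
        return samples

    print(len(poisson_disk(100, 100, r=5)))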
in this article we propose to investigate the extension of the edt squared euclidean distance transformation to irregular isothetic grids we give two algorithms to handle different structurings of such grids we first describe a simple approach based on the complete voronoi diagram of the background irregular cells which is naturally a fast approach on sparse and chaotic grids we then extend a separable algorithm previously defined for square regular grids which is more convenient for dense grids these two methodologies permit the edt to be computed efficiently on any irregular isothetic grid
the problem of cooperatively performing a collection of tasks in a decentralized setting where the computing medium is subject to adversarial perturbations is one of the fundamental problems in distributed computing such perturbations can be caused by processor failures unpredictable delays and communication breakdowns this work presents failure sensitive bounds for distributed cooperation problems for synchronous processors subject to crash failures these research results are motivated by the earlier work of the third author with paris kanellakis at brown university
in dsp processors minimizing the amount of address calculations is critical for reducing code size and improving performance since studies of programs have shown that instructions that manipulate address registers constitute a significant portion of the overall instruction count this work presents a compiler based optimization strategy to reduce the code size in embedded systems our strategy maximizes the use of indirect addressing modes with postincrement decrement capabilities available in dsp processors these modes can be exploited by ensuring that successive references to variables access consecutive memory locations to achieve this spatial locality our approach uses both access pattern modification program code restructuring and memory storage reordering data layout restructuring experimental results on a set of benchmark codes show the effectiveness of our solution and indicate that our approach outperforms the previous approaches to the problem in addition to resulting in significant reductions in instruction memory storage requirements the proposed technique improves execution time
composition of mappings between schemas is essential to support schema evolution data exchange data integration and other data management tasks in many applications mappings are given by embedded dependencies in this article we study the issues involved in composing such mappings our algorithms and results extend those of fagin et al who studied the composition of mappings given by several kinds of constraints in particular they proved that full source to target tuple generating dependencies tgds are closed under composition but embedded source to target tgds are not they introduced class of second order constraints so tgds that is closed under composition and has desirable properties for data exchange we study constraints that need not be source to target and we concentrate on obtaining first order embedded dependencies as part of this study we also consider full dependencies and second order constraints that arise from skolemizing embedded dependencies for each of the three classes of mappings that we study we provide an algorithm that attempts to compute the composition and sufficient conditions on the input mappings which guarantee that the algorithm will succeed in addition we give several negative results in particular we show that full and second order dependencies that are not limited to be source to target are not closed under composition for the latter under the additional restriction that no new function symbols are introduced furthermore we show that determining whether the composition can be given by these kinds of dependencies is undecidable
in this paper we address the problem of defining roles in organizations like trade ones the methodology we use is to model roles according to the agent metaphor we attribute to roles mental attitudes like beliefs desires and goals we relate them to the agent’s required expertise and responsibilities and we model role behavior in game theoretic terms analogously the organization is modelled as an agent which acts as normative system it imposes obligations to roles and to the agents playing the roles
we explore in this paper the efficient clustering of market basket data different from those of the traditional data the features of market basket data are known to be of high dimensionality and sparsity without explicitly considering the presence of the taxonomy most prior efforts on clustering market basket data can be viewed as dealing with items in the leaf level of the taxonomy tree clustering transactions across different levels of the taxonomy is of great importance for marketing strategies as well as for the result representation of the clustering techniques for market basket data in view of the features of market basket data we devise in this paper novel measurement called the category based adherence and utilize this measurement to perform the clustering with this category based adherence measurement we develop an efficient clustering algorithm called algorithm todes for market basket data with the objective to minimize the category based adherence the distance of an item to given cluster is defined as the number of links between this item and its nearest tode the category based adherence of transaction to cluster is then defined as the average distance of the items in this transaction to that cluster validation model based on information gain is also devised to assess the quality of clustering for market basket data as validated by both real and synthetic datasets it is shown by our experimental results with the taxonomy information algorithm todes devised in this paper significantly outperforms the prior works in both the execution efficiency and the clustering quality as measured by information gain indicating the usefulness of category based adherence in market basket data clustering
data mining applications place special requirements on clustering algorithms including the ability to find clusters embedded in subspaces of high dimensional data scalability end user comprehensibility of the results non presumption of any canonical data distribution and insensitivity to the order of input records we present clique clustering algorithm that satisfies each of these requirements clique identifies dense clusters in subspaces of maximum dimensionality it generates cluster descriptions in the form of dnf expressions that are minimized for ease of comprehension it produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution through experiments we show that clique efficiently finds accurate cluster in large high dimensional datasets
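to make the grid density idea above concrete here is a minimal python sketch of a bottom up apriori style search for dense units in all subspaces it is an illustration under assumed parameters xi grid resolution and tau density threshold not the clique implementation and it omits the connected component and dnf description steps

from collections import Counter
from itertools import combinations

def dense_units(points, xi=10, tau=0.05):
    # grid-based, bottom-up (apriori-style) search for dense units in all subspaces;
    # points: list of equal-length numeric tuples; returns {dims tuple: set of unit keys}
    n, d = len(points), len(points[0])
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]

    def cell(p, i):  # index of the 1-d interval on dimension i that p falls into
        return 0 if hi[i] == lo[i] else min(int((p[i] - lo[i]) / (hi[i] - lo[i]) * xi), xi - 1)

    dense, level = {}, {}
    for i in range(d):  # dense 1-d units
        units = {u for u, c in Counter((cell(p, i),) for p in points).items() if c >= tau * n}
        if units:
            dense[(i,)] = level[(i,)] = units

    k = 2
    while level and k <= d:  # a k-d subspace can hold dense units only if every
        nxt = {}             # (k-1)-d sub-subspace does (monotonicity of density)
        for dims in combinations(range(d), k):
            if any(tuple(x for x in dims if x != m) not in dense for m in dims):
                continue
            cnt = Counter(tuple(cell(p, i) for i in dims) for p in points)
            units = {u for u, c in cnt.items() if c >= tau * n}
            if units:
                nxt[dims] = units
        dense.update(nxt)
        level, k = nxt, k + 1
    return dense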
the behaviour of three methods for constructing binary heap on computer with hierarchical memory is studied the methods considered are the original one proposed by williams in which elements are repeatedly inserted into single heap the improvement by floyd in which small heaps are repeatedly merged to bigger heaps and recent method proposed eg by fadel et al in which heap is built layerwise both the worst case number of instructions and that of cache misses are analysed it is well known that floyd’s method has the best instruction count let denote the size of the heap to be constructed the number of elements that fit into cache line and let and be some positive constants our analysis shows that under reasonable assumptions repeated insertion and layerwise construction both incur at most cn cache misses whereas repeated merging as programmed by floyd can incur more than dn log cache misses however for our memory tuned versions of repeated insertion and repeated merging the number of cache misses incurred is close to the optimal bound in addition to these theoretical findings we communicate many practical experiences which we hope to be valuable for others doing experimental algorithmic work
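for readers unfamiliar with the two classical constructions compared above the following python sketch shows textbook williams style repeated insertion and floyd style repeated merging sift down for max heap it says nothing about cache behaviour and is not the authors memory tuned code

def build_heap_williams(items):
    """Repeated insertion: sift each new element up an initially empty max-heap."""
    heap = []
    for x in items:
        heap.append(x)
        i = len(heap) - 1
        while i > 0 and heap[(i - 1) // 2] < heap[i]:   # sift up
            heap[i], heap[(i - 1) // 2] = heap[(i - 1) // 2], heap[i]
            i = (i - 1) // 2
    return heap

def build_heap_floyd(items):
    """Repeated merging: sift down every internal node, from the last one to the root."""
    heap = list(items)
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:                                     # sift down
            largest = j
            for c in (2 * j + 1, 2 * j + 2):
                if c < n and heap[c] > heap[largest]:
                    largest = c
            if largest == j:
                break
            heap[j], heap[largest] = heap[largest], heap[j]
            j = largest
    return heap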
xml compression has gained prominence recently because it counters the disadvantage of the verbose representation xml gives to data in many applications such as data exchange and data archiving entirely compressing and decompressing document is acceptable in other applications where queries must be run over compressed documents compression may not be beneficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits while balancing the interests of compression and query processing has received significant attention in the domain of relational databases these results do not immediately translate to xml data in this article we address the problem of embedding compression into xml databases without degrading query performance since the setting is rather different from relational databases the choice of compression granularity and compression algorithms must be revisited query execution in the compressed domain must also be rethought in the framework of xml query processing due to the richer structure of xml data indeed proper storage design for the compressed data plays crucial role here the xquec system xquery processor and compressor covers wide set of xquery queries in the compressed domain and relies on workload based cost model to perform the choices of the compression granules and of their corresponding compression algorithms as consequence xquec provides efficient query processing on compressed xml data an extensive experimental assessment is presented showing the effectiveness of the cost model the compression ratios and the query execution times
new method for generating trails from person’s movement through virtual environment ve is described the method is entirely automatic no user input is needed and uses string matching to identify similar sequences of movement and derive the person’s primary trail the method was evaluated in virtual building and generated trails that substantially reduced the distance participants traveled when they searched for target objects in the building weeks after set of familiarization sessions only modest amount of data typically five traversals of the building was required to generate trails that were both effective and stable and the method was not affected by the order in which objects were visited the trail generation method models an environment as graph and therefore may be applied to aiding navigation in the real world and information spaces as well as ves
we consider number of dynamic problems with no known poly logarithmic upper bounds and show that they require polynomial time per operation unless sum has strongly subquadratic algorithms our result is modular we describe carefully chosen dynamic version of set disjointness the multiphase problem and conjecture that it requires omega time per operation all our lower bounds follow by easy reduction we reduce sum to the multiphase problem ours is the first nonalgebraic reduction from sum and allows sum hardness results for combinatorial problems for instance it implies hardness of reporting all triangles in graph it is plausible that an unconditional lower bound for the multiphase problem can be established via number on forehead communication game
the use of concatenated schnorr signatures sch for the hierarchical delegation of public keys is well known technique in this paper we carry out thorough analysis of the identity based signature scheme that this technique yields the resulting scheme is of interest since it is intuitive simple and does not require pairings we prove that the scheme is secure against existential forgery on adaptive chosen message and adaptive identity attacks using variant of the forking lemma ps the security is proven in the random oracle model under the discrete logarithm assumption next we provide an estimation of its performance including comparison with the state of the art on identity based signatures we draw the conclusion that the schnorr like identity based signature scheme is arguably the most efficient such scheme known to date
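as background only the sketch below shows plain textbook schnorr signing and verification over a tiny prime order subgroup the parameters p q g are toy values chosen only for illustration it is not the identity based scheme analysed above and must never be used as real cryptography

import hashlib, secrets

# toy schnorr group: p = 1019 is prime, q = 509 divides p - 1, and g = 4 has order q mod p
p, q, g = 1019, 509, 4

def _h(r, msg):
    return int.from_bytes(hashlib.sha256(str(r).encode() + msg).digest(), "big") % q

def keygen():
    x = secrets.randbelow(q - 1) + 1        # private key
    return x, pow(g, x, p)                  # (private, public)

def sign(x, msg):
    k = secrets.randbelow(q - 1) + 1        # per-signature nonce
    r = pow(g, k, p)
    e = _h(r, msg)
    s = (k + x * e) % q
    return e, s

def verify(y, msg, sig):
    e, s = sig
    r = (pow(g, s, p) * pow(y, (-e) % q, p)) % p   # recovers g^k since g^s * y^(-e) = g^k
    return e == _h(r, msg)

x, y = keygen()
assert verify(y, b"hello", sign(x, b"hello"))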
the past three decades have seen the creation of several tools that extract visualize and manipulate graph structured representations of program information to facilitate interconnection and exchange of information between these tools and to support the prototyping and development of new tools it is desirable to have some generic support for the specification of graph transformations and exchanges between them genset is generic programmable tool for transformation of graph structured data the implementation of the genset system and the programming paradigm of its language are both based on the view of directed graph as binary relation rather than use traditional relational algebra to specify transformations however we opt instead for the more expressive class of flow equations flow equations or more generally systems of simultaneous fixpoint equations have seen fruitful applications in several areas including data and control flow analysis formal verification and logic programming in genset they provide the fundamental construct for the programmer to use in defining new transformations
radio frequency identification rfid applications are emerging as key components in object tracking and supply chain management systems in the near future almost every major retailer will use rfid systems to track the shipment of products from suppliers to warehouses due to the features of rfid readings this will result in huge amount of information generated by such systems when costs will be at level such that each individual item could be tagged thus leaving trail of data as it moves through different locations we define technique for efficiently detecting anomalous data in order to prevent problems related to inefficient shipment or fraudulent actions since items usually move together in large groups through distribution centers and only in stores do they move in smaller groups we exploit such feature in order to design our technique the preliminary experiments show the effectiveness of our approach
software developers often structure programs in such way that different pieces of code constitute distinct principals types help define the protocol by which these principals interact in particular abstract types allow principal to make strong assumptions about how well typed clients use the facilities that it provides we show how the notions of principals and type abstraction can be formalized within language different principals can know the implementation of different abstract types we use additional syntax to track the flow of values with abstract types during the evaluation of program and demonstrate how this framework supports syntactic proofs in the style of subject reduction for type abstraction properties such properties have traditionally required semantic arguments using syntax avoids the need to build model and recursive types for the language we present various typed lambda calculi with principals including versions that have mutable state and recursive types
internet data traffic is doubling each year yet bandwidth does not appear to be growing as fast as expected and thus shortfalls in available bandwidth particularly at the last mile may result to address these bandwidth allocation and congestion problems researchers are proposing new overlay networks that provide high quality of service and near lossless guarantee however the central question raised by these new services is what impact will they have in the large to address these and other network engineering research questions high performance simulation tools are required however to date optimistic techniques have been viewed as operating outside of the performance envelope for internet protocols such as tcp ospf and bgp in this paper we dispel those views and demonstrate that optimistic protocols are able to efficiently simulate large scale tcp scenarios for realistic network topologies using single hyper threaded computing system costing less than usd for our real world topology we use the core at&t us network our optimistic simulator yields extremely high efficiency and many of our performance runs produce zero rollbacks our compact modeling framework reduces the amount of memory required per tcp connection and thus our memory overhead per connection for one of our largest experimental network topologies was kb that value was comprised of all events used to model tcp packets tcp connection state and routing information
sponsored search is one of the enabling technologies for today’s web search engines it corresponds to matching and showing ads related to the user query on the search engine results page users are likely to click on topically related ads and the advertisers pay only when user clicks on their ad hence it is important to be able to predict if an ad is likely to be clicked and maximize the number of clicks we investigate the sponsored search problem from machine learning perspective with respect to three main sub problems how to use click data for training and evaluation which learning framework is more suitable for the task and which features are useful for existing models we perform large scale evaluation based on data from commercial web search engine results show that it is possible to learn and evaluate directly and exclusively on click data encoding pairwise preferences following simple and conservative assumptions we find that online multilayer perceptron learning based on small set of features representing content similarity of different kinds significantly outperforms an information retrieval baseline and other learning models providing suitable framework for the sponsored search task
we study conflict free data distribution schemes in parallel memories in multiprocessor system architectures given host graph the problem is to map the nodes of into memory modules such that any instance of template type in can be accessed without memory conflicts conflict occurs if two or more nodes of are mapped to the same memory module the mapping algorithm should i be fast in terms of data access possibly mapping each node in constant time ii minimize the required number of memory modules for accessing any instance in of the given template type and iii guarantee load balancing on the modules in this paper we consider conflict free access to star templates ie to any node of along with all of its neighbors such template type arises in many classical algorithms like breadth first search in graph message broadcasting in networks and nearest neighbor based approximation in numerical computation we consider the star template access problem on two specific host graphs tori and hypercubes that are also popular interconnection network topologies the proposed conflict free mappings on these graphs are fast use an optimal or provably good number of memory modules and guarantee load balancing
mining frequent sequences in large databases has been an important research topic the main challenge of mining frequent sequences is the high processing cost due to the large amount of data in this paper we propose novel strategy to find all the frequent sequences without having to compute the support counts of non frequent sequences the previous works prune candidate sequences based on the frequent sequences with shorter lengths while our strategy prunes candidate sequences according to the non frequent sequences with the same lengths as result our strategy can cooperate with the previous works to achieve better performance we then identify three major strategies used in the previous works and combine them with our strategy into an efficient algorithm the novelty of our algorithm lies in its ability to dynamically switch from previous strategy to our new strategy in the mining process for better performance experiment results show that our algorithm outperforms the previous ones under various parameter settings
for many parallel applications on distributed memory systems array re decomposition is usually required to enhance data locality and reduce the communication overheads how to effectively schedule messages to improve the performance of array re decomposition has received much attention in recent years this paper is devoted to develop efficient scheduling algorithms using the compiling information provided by array distribution patterns array alignment patterns and the periodic property of array accesses our algorithms not only avoid inter processor contention but also reduces real communication cost and communication generation time the experimental results show that the performance of array redecomposition can be significantly improved using our algorithms
recent seminal result of räcke is that for any undirected network there is an oblivious routing algorithm with polylogarithmic competitive ratio with respect to congestion unfortunately räcke’s construction is not polynomial time we give polynomial time construction that guarantees räcke’s bounds and more generally gives the true optimal ratio for any undirected or directed network
often insertion of several aspects into one system is desired and in that case the problem of interference among the different aspects might arise even if each aspect individually woven is correct relative to its specification in this type of interference one aspect can prevent another from having the required effect on woven system such interference is defined and specifications of aspects are described an incremental proof strategy based on model checking pairs of aspects for generic model expressing the specifications is defined when an aspect is added to library of noninterfering aspects only its interaction with each of the aspects from the library needs to be checked such checks for each pair of aspects are proven sufficient to detect interference or establish interference freedom for any order of application of any collection of aspects in library implemented examples of interfering aspects are analyzed and the results are described showing the advantage of the incremental strategy over direct proof in space needed for the model check early analysis and detection of such interference in libraries of aspects will enable informed choice of the aspects to be applied and of the weaving order
in this paper we introduce new nearest neighbor data structure and describe several ways that it may be used for symbolic regression compared to genetic programming alone an algorithm using nearest neighbor indexing can search much larger space and even so typically find smaller more general models in addition we introduce permutation tests in order to discriminate between relevant and irrelevant features
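the permutation test idea mentioned above can be illustrated independently of symbolic regression the fragment below is a generic sketch not the authors procedure fit_and_score n_perm and the p value construction are assumptions

import random

def permutation_feature_pvalue(fit_and_score, X, y, feature, n_perm=200, seed=0):
    """Estimate how often shuffling one feature column yields a fit at least as good
    as the fit on the unshuffled data; a high p-value suggests the feature is irrelevant.
    fit_and_score(X, y) -> goodness-of-fit score (higher is better)."""
    rng = random.Random(seed)
    baseline = fit_and_score(X, y)
    hits = 0
    for _ in range(n_perm):
        Xp = [row[:] for row in X]
        col = [row[feature] for row in Xp]
        rng.shuffle(col)                      # break any association between feature and target
        for row, v in zip(Xp, col):
            row[feature] = v
        if fit_and_score(Xp, y) >= baseline:
            hits += 1
    return (hits + 1) / (n_perm + 1)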
functional regression testing frt is performed to ensure that new version of product functions properly as designed in corporate environment the large numbers of test jobs and the complexity of scheduling the jobs on different platforms make performance of this testing an important issue grid provides an infrastructure for applications to use shared heterogeneous resources such an infrastructure may be used to solve large scale testing problems or to improve application performance frt is good candidate application for running on grid because each test job can run separately in parallel however experience indicates that such applications may suffer performance problems without proper cost based grid scheduling strategy the database technology dbt regression test team at ibm conducts the frt for ibm db universal database db udb products as case study we examined the current test scheduling approach for the db products we found that the performance of the test scheduler suffers because it does not incorporate cost dependent selection of jobs and slaves testing ids therefore we have replaced the db test scheduler with one that estimates jobs run times and then chooses slaves using those times although knowing job’s actual run time is difficult we can use case based reasoning to estimate it based on past experience we create case base to store historical data and design an algorithm to estimate new jobs run times by identifying cases that have executed in the past the performance evaluation of our new scheduler shows significant performance benefit over the original scheduler in this paper we also examine how machine specifications such as the number of slaves running on machine and the machine speed affect application performance and run time estimation accuracy
state of the art relational and continuous algorithms alike have focused on producing optimal or near optimal query plans by minimizing single cost function however ensuring accurate yet real time responses for stream processing applications necessitates that the system identifies qualified rather than optimal query plans with the former guaranteeing that their utilization of both the cpu and the memory resources stays within their respective system capacities in such scenarios being optimal in one resource usage while out of bound in the other is not viable our experimental study illustrates that to be effective qualified plan optimizer must explore an extended plan search space called the jtree space composed not only of the standard mjoin and binary join plans but also of general join trees with mixed operator types while our proposed dynamic programming based jtree finder algorithm is guaranteed to generate qualified query plan if such plan exists in the search space its exponential time complexity makes it not viable for continuous stream environments to facilitate run time optimization we thus propose an efficient yet effective two layer plan generation framework the proposed framework first exploits the positive correlation between the cpu and memory usages to obtain plans that are minimal in at least one of the two resource usages in our second layer we propose two alternative polynomial time algorithms to explore the negative correlation between the resource usages to successfully generate query plans that adhere to both cpu and memory resource constraints effectiveness and efficiency of the proposed algorithms are experimentally evaluated by comparing them to each other as well as state of the art techniques
this work considers the problem of minimizing the power consumption for real time scheduling on processors with discrete operating modes we provide model for determining the expected energy demand based on statistical execution profiles which considers both the current and subsequent tasks if the load after the execution of the current task is expected to be high and slack time is conserved for subsequent tasks we are able to derive an optimal solution to the energy minimization problem for the remaining cases we propose heuristic approach that also achieves low run time overhead in contrast to previous work our scheduling approach is not restricted to single task scenarios frame based real time systems or pre computed schedules simulations and comparisons with energy efficient schedulers from literature demonstrate the efficiency of our approach
in this paper we investigate the compiler algorithms to support compiled communication in multiprocessor environments and study the benefits of compiled communication assuming that the underlying network is an all optical time division multiplexing tdm network we present an experimental compiler suif that supports compiled communication for high performance fortran hpf like programs on all optical tdm networks describe and evaluate the compiler algorithms used in suif we further demonstrate the effectiveness of compiled communication on all optical tdm networks by comparing the performance of compiled communication with that of traditional communication method using number of application programs
discretionary access control dac systems provide powerful resource management mechanisms based on the selective distribution of capabilities to selected classes of principals we study type based theory of dac models for process calculus that extends cardelli ghelli and gordon’s pi calculus with groups cardelli et al in our theory groups play the role of principals and form the unit of abstraction for our access control policies and types allow the specification of fine grained access control policies to govern the transmission of names bound the iterated re transmission of capabilities and predicate their use on the inability to pass them to third parties the type system relies on subtyping to achieve selective distribution of capabilities to the groups that control the communication channels we show that the typing and subtyping relationships of the calculus are decidable we also prove type safety result showing that in well typed processes all names i flow according to the access control policies specified by their types and ii are received at the intended sites with the intended capabilities we illustrate the expressive power and the flexibility of the typing system using several examples
in this paper we present method for creating watercolor like animation starting from video as input the method involves two main steps applying textures that simulate watercolor appearance and creating simplified abstracted version of the video to which the texturing operations are applied both of these steps are subject to highly visible temporal artifacts so the primary technical contributions of the paper are extensions of previous methods for texturing and abstraction to provide temporal coherence when applied to video sequences to maintain coherence for textures we employ texture advection along lines of optical flow we furthermore extend previous approaches by incorporating advection in both forward and reverse directions through the video which allows for minimal texture distortion particularly in areas of disocclusion that are otherwise highly problematic to maintain coherence for abstraction we employ mathematical morphology extended to the temporal domain using filters whose temporal extents are locally controlled by the degree of distortions in the optical flow together these techniques provide the first practical and robust approach for producing watercolor animations from video which we demonstrate with number of examples
reasoning in systems integrating description logics dl ontologies and datalog rules is very hard task and previous studies have shown undecidability of reasoning in systems integrating even very simple dl ontologies with recursive datalog however the results obtained so far constitute very partial picture of the computational properties of systems combining dl ontologies and datalog rules the aim of this paper is to contribute to complete this picture extending the computational analysis of reasoning in systems integrating ontologies and datalog rules more precisely we first provide set of decidability and complexity results for reasoning in systems combining ontologies specified in dls and rules specified in nonrecursive datalog and its extensions with inequality and negation such results identify from the viewpoint of the expressive abilities of the two formalisms minimal combinations of description logics and datalog in which reasoning is undecidable then we present new results on the decidability and complexity of the so called restricted or safe integration of dl ontologies and datalog rules our results show that the unrestricted interaction between dls and datalog is computationally very hard even in the absence of recursion in rules surprisingly the various safeness restrictions which have been defined to regain decidability of reasoning in the interaction between dls and recursive datalog appear as necessary restrictions even when rules are not recursive
in this paper we are concerned with locating the most congested regions in fpga designs before routing is completed as well we are interested in the amount of congestion in these locations relative to surrounding areas if this estimation is done accurately and early enough eg prior to routing or even prior to placement the data can be used to avoid or spread out congestion before it becomes problem we implemented several estimation methods in the vpr tool set and visually compare estimation results to an actual routing congestion map we find that standard image processing techniques such as blending and peak saturation considerably improve the quality of estimation for all metrics
recently trajectory data mining has received lot of attention in both the industry and the academic research in this paper we study the privacy threats in trajectory data publishing and show that traditional anonymization methods are not applicable for trajectory data due to its challenging properties high dimensional sparse and sequential our primary contributions are to propose new privacy model called lkc privacy that overcomes these challenges and to develop an efficient anonymization algorithm to achieve lkc privacy while preserving the information utility for trajectory pattern mining
multicast avoids sending repeated packets over the same network links and thus offers the promise of supporting multimedia streaming over wide area networks previously two opposite multicast schemes forward path forwarding and reverse path forwarding have been proposed on top of structured peer to peer pp overlay networks this paper presents borg new scalable application level multicast system built on top of pp overlay networks borg is hybrid protocol that exploits the asymmetry in pp routing and leverages the reverse path multicast scheme for its low link stress on the physical networks borg has been implemented on top of pastry generic structured pp routing substrate simulation results based on realistic network topology model shows that borg induces significantly lower routing delay penalty than both forward path and reverse path multicast schemes while retaining the low link stress of the reverse path multicast scheme
prior research suggests that people may ask their family and friends for computer help but what influences whether and how helper will provide help to answer this question we conducted qualitative investigation of people who participated in computer support activities with family and friends in the past year we describe how factors including maintenance of one’s personal identity as computer expert and accountability to one’s social network determine who receives help and the quality of help provided we also discuss the complex fractured relationship between the numerous stakeholders involved in the upkeep of home computing infrastructures based on our findings we provide implications for the design of systems to support informal help giving in residential settings
signed binary resolution and hyperresolution belong to the basic resolution proof methods both the resolution methods are refutation sound and under some finitary restrictions refutation complete our aim is to investigate their refutational completeness in more general case we shall assume that clausal theories may be countable sets and the set of truth values an arbitrary one there are unsatisfiable countable clausal theories for which there do not exist refutations by signed binary resolution and hyperresolution we propose criterion on the existence of refutation of an unsatisfiable countable clausal theory by the investigated resolution methods two important applications of the achieved results to automated deduction in signed logic herbrand’s theorem and signed davis putnam logemann loveland dpll procedure will be discussed
wireless sensor networks promise an unprecedented opportunity to monitor physical environments via inexpensive wireless embedded devices given the sheer amount of sensed data efficient classification of them becomes critical task in many sensor network applications the large scale and the stringent energy constraints of such networks however challenge the conventional classification techniques that demand enormous storage space and centralized computation in this paper we propose novel decision tree based hierarchical distributed classification approach in which local classifiers are built by individual sensors and merged along the routing path forming spanning tree the classifiers are iteratively enhanced by combining strategically generated pseudo data and new local data eventually converging to global classifier for the whole network we also introduce some control factors to facilitate the effectiveness of our approach through extensive simulations we study the impact of the introduced control factors and demonstrate that our approach maintains high classification accuracy with very low storage and communication overhead the approach also addresses critical issue of heterogeneous data distribution among the sensors
this paper presents an algorithm for static termination analysis of active rules in context of modular design several recent works have suggested proving termination by using the concept of triggering graph we propose here an original approach based on these works and that allows to guarantee the termination of set of rules conceived by several designers even when none of the designers knows the set of the active rules we introduce the notions of private event and of public event and we refine the notion of triggering graph by enclosing also events in graphs we replace then the notion of cycle which is no longer relevant in context of modular design by the notion of maximal private path preceding rule by means of these tools we show that it is possible to prove termination of modular sets of active rules
prototype selection on the basis of conventional clustering algorithms results in good representation but is extremely time consuming on large data sets kd trees on the other hand are exceptionally efficient in terms of time and space requirements for large data sets but fail to produce reasonable representation in certain situations we propose new algorithm with speed comparable to the present kd tree based algorithms which overcomes the problems related to the representation for high condensation ratios it uses the maxdiff criterion to separate out distant clusters in the initial stages before splitting them any further thus improving on the representation the splits being axis parallel more nodes would be required for representing data set which has no regions where the points are well separated
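one simple way to compute a maxdiff style split is sketched below pick the dimension and position with the largest gap between consecutive sorted coordinate values this is an illustrative reading of the criterion not the proposed algorithm

def maxdiff_split(points):
    """Return (dimension, threshold) of the widest gap between consecutive
    sorted coordinates over all dimensions, or None if all points coincide."""
    best = None          # (gap, dimension, threshold)
    dims = len(points[0])
    for d in range(dims):
        vals = sorted(p[d] for p in points)
        for a, b in zip(vals, vals[1:]):
            gap = b - a
            if gap > 0 and (best is None or gap > best[0]):
                best = (gap, d, (a + b) / 2.0)
    return None if best is None else (best[1], best[2])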
among the vast numbers of images on the web are many duplicates and near duplicates that is variants derived from the same original image such near duplicates appear in many web image searches and may represent infringements of copyright or indicate the presence of redundancy while methods for identifying near duplicates have been investigated there has been no analysis of the kinds of alterations that are common on the web or evaluation of whether real cases of near duplication can in fact be identified in this paper we use popular queries and commercial image search service to collect images that we then manually analyse for instances of near duplication we show that such duplication is indeed significant but that not all kinds of image alteration explored in previous literature are evident in web data removal of near duplicates from collection is impractical but we propose that they be removed from sets of answers we evaluate our technique for automatic identification of near duplicates during query evaluation and show that it has promise as an effective mechanism for management of near duplication in practice
faceted metadata and navigation have become major topics in library science information retrieval and human computer interaction hci this work surveys range of extant approaches in this design space classifying systems along several relevant dimensions we use that survey to analyze the organization of data and its querying within faceted browsing systems we contribute formal entity relationship er and relational data models that explain that organization and relational query models that explain systems browsing functionality we use these types of models since they are widely used to conceptualize data and to model back end data stores their structured nature also suggests ways in which both the models and faceted systems might be extended
active learning reduces the amount of manually annotated sentences necessary when training state of the art statistical parsers one popular method uncertainty sampling selects sentences for which the parser exhibits low certainty however this method does not quantify confidence about the current statistical model itself in particular we should be less confident about selection decisions based on low frequency events we present novel two stage method which first targets sentences which cannot be reliably selected using uncertainty sampling and then applies standard uncertainty sampling to the remaining sentences an evaluation shows that this method performs better than pure uncertainty sampling and better than an ensemble method based on bagged ensemble members only
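for orientation here is a minimal generic sketch of the basic uncertainty sampling step selecting the items whose predictive distribution has the highest entropy the optional eligible filter only hints at the two stage idea described above and all names are placeholders

import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_by_uncertainty(pool, predict_proba, k, eligible=None):
    """pool: list of unlabeled items; predict_proba(item) -> list of outcome probabilities.
    Picks the k eligible items whose predictive distribution has the highest entropy."""
    candidates = pool if eligible is None else [x for x in pool if eligible(x)]
    scored = sorted(candidates, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return scored[:k]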
with the increased need of data sharing among multiple organizations such as government organizations financial corporations medical hospitals and academic institutions it is critical to ensure that data is trustworthy so that effective decisions can be made based on these data in this paper we first discuss motivations and requirement for data trustworthiness we then present an architectural framework for comprehensive system for trustworthiness assurance we then discuss an important issue in our framework that is the evaluation of data provenance and survey trust model for estimating the confidence level of the data and the trust level of data providers by taking into account confidence about data provenance we introduce an approach for policy observing query evaluation we highlight open research issues and research directions throughout the paper
the purpose of our research is to support cognitive motor and emotional development of severely disabled children in the school context we designed and implemented set of novel learning experiences that are both low cost and easily customizable and combine the visual communication paradigm of augmented alternative communication acc with multimedia tangible technology using an application framework developed at our lab called talking paper teachers and therapists can easily associate conventional paper based elements eg pcs cards drawings pictures to multimedia resources videos sounds animations and create playful interactive spaces that are customized to the specific learning needs of each disabled child paper based elements work as visual representations for the concepts children must learn as communication devices and as physical affordances for interacting with multimedia resources the paper presents the approach and its application in real school context highlighting the benefits for both disabled and non disabled children the latter were involved as co designers of multimedia contents and learning activities their creative participation favored group binding and increased tolerance and sense of community in the classroom so that the overall project became means for real inclusive education
in the context of learning recommender systems we propose that the users with greater knowledge for example those who have obtained better results in various tests have greater weight in the calculation of the recommendations than the users with less knowledge to achieve this objective we have designed some new equations in the nucleus of the memory based collaborative filtering in such way that the existent equations are extended to collect and process the information relative to the scores obtained by each user in variable number of level tests
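a minimal sketch of the general idea weighting each neighbour both by similarity and by the knowledge score obtained in level tests might look as follows the multiplicative weighting and all names are assumptions and the paper s actual extended equations are not reproduced here

def predict_rating(target, item, ratings, similarity, knowledge, eps=1e-9):
    """ratings[u][i] -> rating, similarity(u, v) -> [0, 1], knowledge[u] -> [0, 1] test score.
    Weighted user-based CF prediction where users with higher knowledge count more."""
    num = den = 0.0
    for u, items in ratings.items():
        if u == target or item not in items:
            continue
        w = similarity(target, u) * knowledge[u]   # knowledge-weighted similarity (assumed form)
        num += w * items[item]
        den += abs(w)
    return None if den < eps else num / den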
the problem of defining optimal optimization strategies for compilers is well known and has been studied extensively over the last years the problem arises from the fact that the sheer number of possible combinations of optimizations their order and their setting creates search space which cannot adequately be searched although it has been shown that compiler settings can be found that outperform standard ox switches for single application it is not known how to find such settings that work well for sets of applications in this paper we introduce statistical technique to derive methodology which trims down the search space considerably thereby allowing feasible and flexible solution for defining high performance optimization strategies we show that our technique finds single compiler setting for collection of programs specint that performs better than the standard ox settings of gcc
it is well known that traditional skyline query is very likely to return overly many but less informative data points in the result especially when the querying dataset is high dimensional or anti correlated in data stream applications where large amounts of data are continuously generated this problem becomes much more serious since the full skyline result cannot be obtained efficiently and analyzed easily to cope with this difficulty in this paper we propose new concept called combinatorial dominant relationship to abstract dominant representatives of stream data based on this concept we propose three novel skyline queries namely basic convex skyline query bcsq dynamic convex skyline query dcsq and reverse convex skyline query rcsq combining the concepts of convex derived from geometry and the traditional skyline for the first time these queries can adaptively abstract the contour of skyline points without specifying the size of result set in advance and promote information content of the query result to efficiently process these queries and maintain their results we design and analyze algorithms by exploiting memory indexing structure called dcel which is used to represent and store the arrangement of data in the sliding window we convert the problems of points in the primal plane into those of lines in dual plane through dual transformation which helps us avoid expensive full skyline computation and speeds up the candidate set selection finally through extensive experiments with both real and synthetic datasets we validate the representative capability of csqs as well as the performance of our proposed algorithms
in the incremental versions of facility location and median the demand points arrive one at time and the algorithm maintains good solution by either adding each new demand to an existing cluster or placing it in new singleton cluster the algorithm can also merge some of the existing clusters at any point in time for facility location we consider the case of uniform facility costs where the cost of opening facility is the same for all points and present the first incremental algorithm which achieves constant performance ratio using this algorithm as building block we obtain the first incremental algorithm for median which achieves constant performance ratio using medians the algorithm is based on novel merge rule which ensures that the algorithm’s configuration monotonically converges to the optimal facility locations according to certain notion of distance using this property we reduce the general case to the special case when the optimal solution consists of single facility
program transformation system is determined by repertoire of correctness preserving rules such as folding and unfolding normally we would like the folding rule to be in some sense the inverse of the unfolding rule typically however the folding rule of logic program transformation systems is an inverse of limited kind of unfolding in many cases this limited kind of folding suffices we argue nevertheless that it is both important and possible to extend such folding so as to be able to fold the clauses resulting from any unfolding of positive literal this extended folding rule allows us to derive some programs underivable by the existing version of this rule alone in addition our folding rule has applications to decompilation and reengineering where we are interested in obtaining high level program constructs from low level program constructs moreover we establish connection between logic program transformation and inductive logic programming this connection stems from viewing our folding rule as common extension of the existing multiple clause folding rule on the one hand and an operator devised in inductive logic programming called intra construction on the other hand hence our folding rule can be regarded as step towards incorporating inductive inference into logic program transformation we prove correctness with respect to dung and kanchanasut’s semantic kernel
identity management refers to authentication sharing of personally identifiable information pii and provision of mechanisms protecting the privacy thereof the most commonly implemented is federated authentication permitting users to maintain single set of credentials to access many services specifications exist for profile exchange between service provider sp and the identity provider idp but are rarely used most frequently local storage of profile data is utilised due to security and privacy concerns key work in this area includes that of the prime project which provides privacy enhancing identity management their work utilises local data stores and or trusted third parties
we explore the design space of using direct finger input in conjunction with deformable physical prop for the creation and manipulation of conceptual geometric models the user sculpts virtual models by manipulating the space on into and around the physical prop in an extension of the types of manipulations one would perform on traditional modeling media such as clay or foam the prop acts as proxy to the virtual model and hence as frame of reference to the user’s fingers prototype implementation uses camera based motion tracking technology to track passive markers on the fingers and prop the interface supports variety of clay like sculpting operations including deforming smoothing pasting and extruding all operations are performed using the unconstrained fingers with command input enabled by small set of finger gestures coupled with on screen widgets
environment monitoring in coal mines is an important application of wireless sensor networks wsns that has commercial potential we discuss the design of structure aware self adaptive wsn system sasa by regulating the mesh sensor network deployment and formulating collaborative mechanism based on regular beacon strategy sasa is able to rapidly detect structure variations caused by underground collapses we further develop sound and robust mechanism for efficiently handling queries under instable circumstances prototype is deployed with mica motes in real coal mine we present our implementation experiences as well as the experimental results to better evaluate the scalability and reliability of sasa we also conduct large scale trace driven simulation based on real data collected from the experiments
this paper is about non approximate acceleration of high dimensional nonparametric operations such as nearest neighbor classifiers we attempt to exploit the fact that even if we want exact answers to nonparametric queries we usually do not need to explicitly find the data points close to the query but merely need to answer questions about the properties of that set of data points this offers small amount of computational leeway and we investigate how much that leeway can be exploited this is applicable to many algorithms in nonparametric statistics memory based learning and kernel based learning but for clarity this paper concentrates on pure nn classification we introduce new ball tree algorithms that on real world data sets give accelerations from fold to fold compared against highly optimized traditional ball tree based nn these results include data sets with up to dimensions and records and demonstrate non trivial speed ups while giving exact answers
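as background the sketch below is a plain exact nearest neighbour search over a simple ball tree with the usual triangle inequality pruning it is not the new ball tree algorithms introduced above which answer questions about sets of neighbours rather than materializing them

import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build(points):
    """Very simple ball tree: split on the dimension of largest spread, at the median."""
    if len(points) <= 8:
        return {"leaf": points}
    d = max(range(len(points[0])),
            key=lambda i: max(p[i] for p in points) - min(p[i] for p in points))
    pts = sorted(points, key=lambda p: p[d])
    mid = len(pts) // 2
    center = tuple(sum(c) / len(pts) for c in zip(*pts))
    radius = max(dist(center, p) for p in pts)
    return {"center": center, "radius": radius,
            "left": build(pts[:mid]), "right": build(pts[mid:])}

def nearest(node, q, best=None):
    """best is (distance, point); prune a ball if it cannot beat the current best."""
    if "leaf" in node:
        for p in node["leaf"]:
            d = dist(q, p)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    if best is not None and dist(q, node["center"]) - node["radius"] >= best[0]:
        return best
    for child in (node["left"], node["right"]):
        best = nearest(child, q, best)
    return best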
runtime monitoring systems play an important role in system security and verification efforts that ensure that these systems satisfy certain desirable security properties are growing in importance one such security property is complete mediation which requires that sensitive operations are performed by piece of code only after the monitoring system authorizes these actions in this paper we describe verification technique that is designed to check for the satisfaction of this property directly on code from java standard libraries we describe tool cmv that implements this technique and automatically checks shrink wrapped java bytecode for the complete mediation property experimental results on running our tool over several thousands of lines of bytecode from the java libraries suggest that our approach is scalable and leads to very significant reduction in human efforts required for system verification
modern systems on chip platforms support multiple clock domains in which different sub circuits are driven by different clock signals although the frequency of each domain can be customized the number of unique clock frequencies on platform is typically limited we define the clock frequency assignment problem to be the assignment of frequencies to processing modules each with an ideal maximum frequency such that the sum of module processing times is minimized subject to limit on the number of unique frequencies we develop novel polynomial time optimal algorithm to solve the problem based on dynamic programming we apply the algorithm to the particular context of post improvement of accelerator based hardware software partitioning and demonstrate additional speedups using just three clock domains
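to give the flavour of such dynamic program consider the simplified model below each module has workload and an ideal maximum frequency it must be clocked no higher than that maximum and its processing time is workload divided by the assigned frequency both this cost model and the code are illustrative assumptions not the authors formulation

def assign_frequencies(modules, k):
    """modules: list of (workload, f_max); at most k distinct clock frequencies allowed.
    Returns (total_time, chosen frequencies), grouping modules by ascending f_max and
    clocking each group at the smallest f_max in it."""
    mods = sorted(modules, key=lambda m: m[1])
    n = len(mods)
    prefix = [0.0]
    for w, _ in mods:
        prefix.append(prefix[-1] + w)
    def cost(i, j):                         # modules i..j share frequency mods[i][1]
        return (prefix[j + 1] - prefix[i]) / mods[i][1]
    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    choice = [[None] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for c in range(1, k + 1):
            for i in range(j):              # last group is modules i..j-1
                cand = dp[i][c - 1] + cost(i, j - 1)
                if cand < dp[j][c]:
                    dp[j][c], choice[j][c] = cand, i
    best_c = min(range(1, k + 1), key=lambda c: dp[n][c])
    freqs, j, c = [], n, best_c
    while j > 0:
        i = choice[j][c]
        freqs.append(mods[i][1])
        j, c = i, c - 1
    return dp[n][best_c], sorted(freqs)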
chip multiprocessors are quickly becoming popular in embedded systems however the practical success of cmps strongly depends on addressing the difficulty of multithreaded application development for such systems transactional memory tm promises to simplify concurrency management in multithreaded applications by allowing programmers to specify coarse grain parallel tasks while achieving performance comparable to fine grain lock based applications this paper presents atlas the first prototype of cmp with hardware support for transactional memory atlas includes embedded powerpc cores that access coherent shared memory in transactional manner the data cache for each core is modified to support the speculative buffering and conflict detection necessary for transactional execution we have mapped atlas to the bee multi fpga board to create full system prototype that operates at mhz boots linux and provides significant performance and ease of use benefits for range of parallel applications overall the atlas prototype provides an excellent framework for further research on the software and hardware techniques necessary to deliver on the potential of transactional memory
we propose novel type inference algorithm for dependently typed functional language the novel features of our algorithm are it can iteratively refine dependent types with interpolants until the type inference succeeds or the program is found to be ill typed and ii in the latter case it can generate kind of counter example as an explanation of why the program is ill typed we have implemented prototype type inference system and tested it for several programs
with the ever increasing deployment and usage of gigabit networks traditional network anomaly detection based intrusion detection systems ids have not scaled accordingly most if not all ids assume the availability of complete and clean audit data we contend that this assumption is not valid factors like noise mobility of the nodes and the large amount of network traffic make it difficult to build traffic profile of the network that is complete and immaculate for the purpose of anomaly detection in this paper we attempt to address these issues by presenting an anomaly detection scheme called scan stochastic clustering algorithm for network anomaly detection that has the capability to detect intrusions with high accuracy even with incomplete audit data to address the threats posed by network based denial of service attacks in high speed networks scan consists of two modules an anomaly detection module that is at the core of the design and an adaptive packet sampling scheme that intelligently samples packets to aid the anomaly detection module the noteworthy features of scan include it intelligently samples the incoming network traffic to decrease the amount of audit data being sampled while retaining the intrinsic characteristics of the network traffic itself it computes the missing elements of the sampled audit data by utilizing an improved expectation maximization em algorithm based clustering algorithm and it improves the speed of convergence of the clustering process by employing bloom filters and data summaries
the ivmx architecture contains novel vector register file of up to vector registers accessed indirectly via mapping mechanism providing compatibility with the vmx architecture and potential for dramatic performance benefits the large number of vector registers and the unique indirection mechanism pose compilation challenges to be used efficiently the indirection mechanism emphasizes spatial locality of registers and interaction among destination and source operands during register allocation and the many vector registers call for aggressive automatic vectorization this work is first step in addressing the compilability of ivmx following the presentation and validation of its architectural aspects in this paper we present several compilation approaches to deal with the mapping mechanism and an outer loop vectorization transformation developed to promote the use of many vector registers we modified an existing register allocator to target all available registers and added post pass to rename live ranges considering spatial locality and interaction among operand types an fir filter is used to demonstrate the effectiveness of the techniques developed compared to version hand optimized for ivmx initial results show that we can reduce the overhead of map management down to of the total instruction count compared to obtained manually and compared to obtained using naive scheme while outperforming an equivalent vmx implementation by factor of
maximum entropy maxent is useful in natural language processing and many other areas iterative scaling is methods are one of the most popular approaches to solve maxent with many variants of is methods it is difficult to understand them and see the differences in this paper we create general and unified framework for iterative scaling methods this framework also connects iterative scaling and coordinate descent methods we prove general convergence results for is methods and analyze their computational complexity based on the proposed framework we extend coordinate descent method for linear svm to maxent results show that it is faster than existing iterative scaling methods
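as background the fragment below is textbook generalized iterative scaling for small unconditional maxent problem over finite outcome set it only illustrates what an is update looks like and is not the unified framework or the coordinate descent method studied above

import math

def gis(outcomes, feats, data, iters=200):
    """Generalized iterative scaling for p(y) proportional to exp(sum_j w_j f_j(y)) over a
    finite set. outcomes: list of y; feats: list of functions f_j(y) >= 0; data: observed y's."""
    # add a slack feature so that every outcome has the same total feature mass C
    totals = {y: sum(f(y) for f in feats) for y in outcomes}
    C = max(totals.values())
    feats = feats + [lambda y, t=totals, C=C: C - t[y]]
    m = len(feats)
    emp = [sum(f(y) for y in data) / len(data) for f in feats]   # empirical expectations
    w = [0.0] * m
    for _ in range(iters):
        scores = {y: math.exp(sum(w[j] * feats[j](y) for j in range(m))) for y in outcomes}
        Z = sum(scores.values())
        model = [sum(scores[y] / Z * feats[j](y) for y in outcomes) for j in range(m)]
        for j in range(m):
            if emp[j] > 0 and model[j] > 0:          # skip features never observed in the data
                w[j] += math.log(emp[j] / model[j]) / C
    return w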
detection of near duplicate documents is an important problem in many data mining and information filtering applications when faced with massive quantities of data traditional duplicate detection techniques relying on direct inter document similarity computation eg using the cosine measure are often not feasible given the time and memory performance constraints on the other hand fingerprint based methods such as match are very attractive computationally but may be brittle with respect to small changes to document content we focus on approaches to near replica detection that are based upon large collection statistics and present general technique of increasing their robustness via multiple lexicon randomization in experiments with large web page and spam email datasets the proposed method is shown to consistently outperform traditional match with the relative improvement in duplicate document recall reaching as high as the large gains in detection accuracy are offset by only small increases in computational requirements
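a highly simplified sketch of this style of method fingerprint each document by hashing its set of in lexicon terms and use several randomly perturbed lexicons so that small edits are less likely to break every fingerprint how the base lexicon is derived from collection statistics and the drop rate are assumptions here

import hashlib, random

def make_lexicons(base_lexicon, k=4, drop=0.1, seed=0):
    """Derive k randomized lexicons by dropping a small random fraction of the base lexicon."""
    rng = random.Random(seed)
    base = sorted(base_lexicon)
    return [frozenset(t for t in base if rng.random() > drop) for _ in range(k)]

def fingerprints(doc_tokens, lexicons):
    """One fingerprint per lexicon: hash of the sorted set of in-lexicon terms."""
    fps = []
    for lex in lexicons:
        kept = sorted(set(doc_tokens) & lex)
        fps.append(hashlib.sha1(" ".join(kept).encode()).hexdigest())
    return fps

def near_duplicates(fps_a, fps_b):
    """Flag a pair when any of the randomized fingerprints coincide."""
    return any(a == b for a, b in zip(fps_a, fps_b))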
collaborative filtering cf is data analysis task appearing in many challenging applications in particular data mining in internet and commerce cf can often be formulated as identifying patterns in large and mostly empty rating matrix in this paper we focus on predicting unobserved ratings this task is often part of recommendation procedure we propose new cf approach called interlaced generalized linear models glm it is based on factorization of the rating matrix and uses probabilistic modeling to represent uncertainty in the ratings the advantage of this approach is that different configurations encoding different intuitions about the rating process can easily be tested while keeping the same learning procedure the glm formulation is the keystone to derive an efficient learning procedure applicable to large datasets we illustrate the technique on three public domain datasets
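for orientation only here is plain stochastic gradient matrix factorization baseline for predicting missing ratings it is not the interlaced glm approach proposed above and the hyperparameters are arbitrary

import random

def factorize(ratings, n_users, n_items, rank=8, lr=0.01, reg=0.05, epochs=30, seed=0):
    """ratings: list of (user, item, value). Returns (U, V); predicted rating is dot(U[u], V[i])."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            err = r - pred
            for f in range(rank):            # regularized SGD step on both factors
                uu, vv = U[u][f], V[i][f]
                U[u][f] += lr * (err * vv - reg * uu)
                V[i][f] += lr * (err * uu - reg * vv)
    return U, V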
matrix factorization or often called decomposition is frequently used kernel in large number of applications ranging from linear solvers to data clustering and machine learning the central contribution of this paper is thorough performance study of four popular matrix factorization techniques namely lu cholesky qr and svd on the sti cell broadband engine the paper explores algorithmic as well as implementation challenges related to the cell chip multiprocessor and explains how we achieve near linear speedup on most of the factorization techniques for range of matrix sizes for each of the factorization routines we identify the bottleneck kernels and explain how we have attempted to resolve the bottleneck and to what extent we have been successful our implementations for the largest data sets that we use running on two node ghz cell bladecenter exercising total of sixteen spes on average deliver and gflops for dense lu dense cholesky sparse cholesky qr and svd respectively the implementations achieve speedup of and respectively for dense lu dense cholesky sparse cholesky qr and svd when running on sixteen spes we discuss the interesting interactions that result from parallelization of the factorization routines on two node non uniform memory access numa cell blade cluster
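for reference, the numerical kernel being tuned can be stated very compactly; below is a plain unblocked cholesky factorization in python, purely as a baseline sketch of the computation the paper parallelizes on the cell processor, with none of the blocking, spe offloading or vectorization the paper is about.

    import math

    def cholesky(a):
        """unblocked cholesky factorization: returns lower-triangular l with
        a = l * l^T for a symmetric positive definite matrix given as lists."""
        n = len(a)
        l = [[0.0] * n for _ in range(n)]
        for j in range(n):
            s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
            l[j][j] = math.sqrt(s)
            for i in range(j + 1, n):
                l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
        return l

    print(cholesky([[4.0, 2.0], [2.0, 3.0]]))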
multiprocessors are now dominant but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification instead they have subtle relaxed or weak memory models usually described only in ambiguous prose leading to widespread confusion we develop rigorous and accurate semantics for multiprocessor programs from instruction decoding to relaxed memory model mechanised in hol we test the semantics against actual processors and the vendor litmus test examples and give an equivalent abstract machine characterisation of our axiomatic memory model for programs that are in some precise sense data race free we prove in hol that their behaviour is sequentially consistent we also contrast the model with some aspects of power and arm behaviour this provides solid intuition for low level programming and sound foundation for future work on verification static analysis and compilation of low level concurrent code
very thorough online self test is essential for overcoming major reliability challenges such as early life failures and transistor aging in advanced technologies this paper demonstrates the need for operating system os support to efficiently orchestrate online self test in future robust systems experimental data from an actual dual quad core system demonstrate that without software support online self test can significantly degrade performance of soft real time and computation intensive applications by up to and can result in perceptible delays for interactive applications to mitigate these problems we develop os scheduling techniques that are aware of online self test and schedule migrate tasks in multi core systems by taking into account the unavailability of one or more cores undergoing online self test these techniques eliminate any performance degradation and perceptible delays in soft real time and interactive applications otherwise introduced by online self test and significantly reduce the impact of online self test on the performance of computation intensive applications our techniques require minor modifications to existing os schedulers thereby enabling practical and efficient online self test in real systems
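a minimal sketch (hypothetical scheduler state, not the authors' os modification) of the central scheduling idea: when a core enters online self test, its run queue is drained and its tasks are migrated to the least loaded remaining cores.

    def migrate_for_self_test(run_queues, core_under_test):
        """self-test-aware migration: move every task off the core entering
        online self test onto the currently least loaded available core."""
        displaced = run_queues.pop(core_under_test, [])
        for task in displaced:
            if not run_queues:
                raise RuntimeError("no core available to absorb migrated tasks")
            target = min(run_queues, key=lambda c: len(run_queues[c]))
            run_queues[target].append(task)
        return run_queues

    queues = {"core0": ["browser", "codec"], "core1": ["batch"], "core2": []}
    print(migrate_for_self_test(queues, "core0"))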
for several reasons databases may become inconsistent with respect to given set of integrity constraints ics the dbms have no mechanism to maintain certain classes of ics new constraints are imposed on preexisting legacy data the ics are soft user or informational constraints that are considered at query time but without being necessarily enforced data from different and autonomous sources are being integrated in particular in mediator based approaches
wireless sensor networks consist of large number of tiny sensors that have only limited energy supply one of the major challenges in constructing such networks is to maintain long network lifetime as well as sufficient sensing areas to achieve this goal broadly used method is to turn off redundant sensors in this paper the problem of estimating redundant sensing areas among neighbouring wireless sensors is analysed we present simple methods to estimate the degree of redundancy without the knowledge of location or directional information we also provide tight upper and lower bounds on the probability of complete redundancy and on the average partial redundancy with random sensor deployment our analysis shows that partial redundancy is more realistic for real applications as complete redundancy is expensive requiring up to neighbouring sensors to provide percent chance of complete redundancy based on the analysis we propose scalable lightweight deployment aware scheduling ldas algorithm which turns off redundant sensors without using accurate location information simulation study demonstrates that the ldas algorithm can reduce network energy consumption and provide desired qos requirement effectively
as technology scales the delay uncertainty caused by process variations has become increasingly pronounced in deep submicron designs in the presence of process variations worst case timing analysis may lead to overly conservative synthesis and may end up using excess resources to guarantee design constraints in this paper we propose an efficient variation aware resource sharing and binding algorithm in behavioral synthesis which takes into account the performance variations for functional units the performance yield which is defined as the probability that the synthesized hardware meets the target performance constraints is used to evaluate the synthesis result an efficient metric called statistical performance improvement is used to guide resource sharing and binding the proposed algorithm is evaluated within commercial synthesis framework that generates optimized rtl netlists from behavioral specifications the effectiveness of the proposed algorithm is demonstrated with set of industrial benchmark designs which consist of blocks that are commonly used in wireless and image processing applications the experimental results show that our method achieves an average area reduction over traditional methods which are based on the worst case delay analysis with an average run time overhead
current distributed meeting support systems support meeting management audio video communication and application sharing the modelling and the execution of meeting processes are usually not supported moreover the overall business context of the meetings in an enterprise is missing in this paper cooperative hypermedia system is presented which provides jointly editable visual hypermedia artifacts in distributed meeting these visual hypermedia objects can be used for modelling meeting processes handling external documents and application integration in addition the system provides process support for flexible meeting control and the invocation of communication tools office tools and groupware tools for specific meeting activities more importantly the meeting processes and their related information objects are modeled as part of the overall enterprise model such that they are placed in the overall business context of an enterprise use cases are provided to show how distributed project team can perform many kinds of distributed meetings while enjoying dedicated support for the planning control information management and follow up activities as integral parts of their business tasks
the success of system on chip soc hinges upon well concerted integrated approach from multiple disciplines such as device design and application from the device perspective rapidly improving vlsi technology allows the integration of billions of transistors on single chip thus permitting wide range of functions to be combined on one chip from the application perspective numerous killer applications have been identified which can make full use of the aforementioned functionalities provided by single chip from the design perspective however with greater device integration system designs become more complex and are increasingly challenging to design moving forward novel approaches will be needed to meet these challenges this paper explores several new design strategies which represent the current design trends to deal with the emerging issues for example recognizing the stringent requirements on power consumption memory bandwidth latency and transistor variability novel power thermal management multi processor soc reconfigurable logic and design for verification and testing have now been incorporated into modern system design in addition we look into some plausible solutions for example further innovations on scalable reusable and reliable system architectures ip deployment and integration on chip interconnects and memory hierarchies are all anticipated in the near future
multiclass support vector machine svm methods are well studied in recent literature comparison studies on uci statlog multiclass datasets suggest using one against one method for multiclass svm classification however in unilabel multiclass text categorization with svms no comparison studies exist with one against one and other methods eg one against all and several well known improvements to these approaches in this paper we bridge this gap by performing empirical comparison of standard one against all and one against one together with three improvements to these standard approaches for unilabel text categorization with svm as base binary learner we performed all our experiments on three standard text corpuses using two types of document representation outcome of our experiments partly support rifkin and klautau’s statement that for small scale unilabel text categorization tasks if parameters of the classifiers are well tuned one against all will have better performance than one against one and other methods
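a minimal python sketch of the one against one scheme the comparison covers: one binary classifier per class pair and prediction by majority vote. the binary learner is abstracted away; the keyword overlap scorer standing in for an svm and the toy corpus are assumptions made for the example.

    from itertools import combinations

    def train_one_vs_one(train, fit):
        """train one binary classifier per pair of class labels.
        `train` maps a label to its documents; `fit(pos_docs, neg_docs)` is any
        binary learner (an svm in the paper) returning a scoring function."""
        models = {}
        for a, b in combinations(sorted(train), 2):
            models[(a, b)] = fit(train[a], train[b])
        return models

    def predict_one_vs_one(models, doc):
        """majority vote over all pairwise classifiers."""
        votes = {}
        for (a, b), score in models.items():
            winner = a if score(doc) >= 0 else b
            votes[winner] = votes.get(winner, 0) + 1
        return max(votes, key=votes.get)

    # toy usage with a trivial keyword-overlap 'learner' standing in for an svm
    def fit(pos, neg):
        pos_words = set(w for d in pos for w in d.split())
        neg_words = set(w for d in neg for w in d.split())
        return lambda d: (sum(w in pos_words for w in d.split())
                          - sum(w in neg_words for w in d.split()))

    data = {"sports": ["match goal team"], "tech": ["cpu cache chip"],
            "food": ["pasta sauce basil"]}
    models = train_one_vs_one(data, fit)
    print(predict_one_vs_one(models, "new cpu chip announced"))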
we present the design and analysis of the first fully expressive iterative combinatorial exchange ice the exchange incorporates tree based bidding language tbbl that is concise and expressive for ces bidders specify lower and upper bounds in tbbl on their value for different trades and refine these bounds across rounds these bounds allow price discovery and useful preference elicitation in early rounds and allow termination with an efficient trade despite partial information on bidder valuations all computation in the exchange is carefully optimized to exploit the structure of the bid trees and to avoid enumerating trades proxied interpretation of revealed preference activity rule coupled with simple linear prices ensures progress across rounds the exchange is fully implemented and we give results demonstrating several aspects of its scalability and economic properties with simulated bidding strategies
intermittent sensory actuation and communication failures may cause agents to fail in maintaining their commitments to others thus to collaborate robustly agents must monitor others to detect coordination failures previous work on monitoring has focused mainly on small scale systems with only limited number of agents however as the number of monitored agents is scaled up two issues are raised that challenge previous work first agents become physically and logically disconnected from their peers and thus their ability to monitor each other is reduced second the number of possible coordination failures grows exponentially with all potential interactions thus previous techniques that sift through all possible failure hypotheses cannot be used in large scale teams this paper tackles these challenges in the context of detecting disagreements among team members monitoring task that is of particular importance to robust teamwork first we present new bounds on the number of agents that must be monitored in team to guarantee disagreement detection these bounds significantly reduce the connectivity requirements of the monitoring task in the distributed case second we present yoyo highly scalable disagreement detection algorithm which guarantees sound detection yoyo’s run time scales linearly in the number of monitored agents despite the exponential number of hypotheses it compactly represents all valid hypotheses in single structure while allowing for complex hierarchical organizational structure to be considered in the monitoring both yoyo and the new bounds are explored analytically and empirically in monitoring problems involving thousands of agents
tree induction and logistic regression are two standard off the shelf methods for building models for classification we present large scale experimental comparison of logistic regression and tree induction assessing classification accuracy and the quality of rankings based on class membership probabilities we use learning curve analysis to examine the relationship of these measures to the size of the training set the results of the study show several things contrary to some prior observations logistic regression does not generally outperform tree induction more specifically and not surprisingly logistic regression is better for smaller training sets and tree induction for larger data sets importantly this often holds for training sets drawn from the same domain that is the learning curves cross so conclusions about induction algorithm superiority on given domain must be based on an analysis of the learning curves contrary to conventional wisdom tree induction is effective at producing probability based rankings although apparently comparatively less so for given training set size than at making classifications finally the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by simple measure of the separability of signal from noise
caching frequently accessed items on the mobile client is an important technique to enhance data availability and to improve data access time cache replacement policies are used to find suitable subset of items for eviction from the cache due to limited cache size the existing policies rely on euclidean space and consider euclidean distance as an important parameter for eviction however in practice the position and movement of objects are constrained to spatial networks where the important distance measure is the network distance in this paper we propose cache replacement policy which considers the network density network distance and probability of access as important factors for eviction we make use of an already proven technique called progressive incremental network expansion to compute the network distance more efficiently series of simulation experiments have been conducted to evaluate the performance of the policy results indicate that the proposed cache replacement scheme performs significantly better than the existing policies far and paid and wprrp
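a hedged sketch of an eviction score in the spirit of the abstract, combining network distance, local density and access probability; the exact formula and weights are illustrative assumptions, not the paper's cost function, and the distance, density and access probability callbacks are hypothetical.

    def eviction_score(item, client_position, network_distance, density, access_prob):
        """higher score = better candidate for eviction: items far away on the
        road network, in dense regions, and rarely accessed are evicted first.
        the weighting here is illustrative only."""
        d = network_distance(client_position, item)  # shortest-path distance on the spatial network
        rho = density(item)                          # data density around the item
        p = access_prob(item)                        # estimated probability of future access
        return d * rho / (p + 1e-9)

    def choose_victim(cache, client_position, network_distance, density, access_prob):
        return max(cache, key=lambda item: eviction_score(
            item, client_position, network_distance, density, access_prob))

    cache = ["restaurant_a", "fuel_b", "atm_c"]
    dist = {"restaurant_a": 2.0, "fuel_b": 9.0, "atm_c": 5.0}
    print(choose_victim(cache, "client",
                        lambda _, i: dist[i], lambda i: 1.0, lambda i: 0.2))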
the major goal of the cospal project is to develop an artificial cognitive system architecture with the ability to autonomously extend its capabilities exploratory learning is one strategy that allows an extension of competences as provided by the environment of the system whereas classical learning methods aim at best for parametric generalization ie concluding from number of samples of problem class to the problem class itself exploration aims at applying acquired competences to new problem class and to apply generalization on conceptual level resulting in new models incremental or online learning is crucial requirement to perform exploratory learning in the cospal project we mainly investigate reinforcement type learning methods for exploratory learning and in this paper we focus on the organization of cognitive systems for efficient operation learning is used over the entire system it is organized in the form of four nested loops where the outermost loop reflects the user reinforcement feedback loop the intermediate two loops switch between different solution modes at symbolic respectively sub symbolic level and the innermost loop performs the acquired competences in terms of perception action cycles we present system diagram which explains this process in more detail we discuss the learning strategy in terms of learning scenarios provided by the user this interaction between user teacher and system is major difference to classical robotics systems where the system designer places his world model into the system we believe that this is the key to extendable robust system behavior and successful interaction of humans and artificial cognitive systems we furthermore address the issue of bootstrapping the system and in particular the visual recognition module we give some more in depth details about our recognition method and how feedback from higher levels is implemented the described system is however work in progress and no final results are available yet the available preliminary results that we have achieved so far clearly point towards successful proof of the architecture concept
in the emerging area of sensor based systems significant challenge is to develop scalable fault tolerant methods to extract useful information from the data the sensors collect an approach to this data management problem is the use of sensor database systems which allow users to perform aggregation queries such as min count and avg on the readings of sensor network in addition more advanced queries such as frequency counting and quantile estimation can be supported due to energy limitations in sensor based networks centralized data collection is generally impractical so most systems use in network aggregation to reduce network traffic however even these aggregation strategies remain bandwidth intensive when combined with the fault tolerant multipath routing methods often used in these environments to avoid this expense we investigate the use of approximate in network aggregation using small sketches we present duplicate insensitive sketching techniques that can be implemented efficiently on small sensor devices with limited hardware support and we analyze both their performance and accuracy finally we present an experimental evaluation that validates the effectiveness of our methods
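a minimal python sketch of a duplicate insensitive flajolet martin style sketch of the kind such systems use for approximate distinct counting: each reading sets one bit, sketches merge with bitwise or, so a reading that arrives over several paths is counted once. the constants and helper names are assumptions for the example.

    import hashlib

    SKETCH_BITS = 32

    def fm_insert(sketch, item):
        """set bit r, where r is the number of trailing zero bits of a hash of
        the item; re-inserting the same item (a multipath duplicate) is a no-op."""
        h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
        r = 0
        while h % 2 == 0 and r < SKETCH_BITS - 1:
            h >>= 1
            r += 1
        return sketch | (1 << r)

    def fm_merge(a, b):
        """in-network aggregation step: merging is just bitwise or."""
        return a | b

    def fm_estimate(sketch):
        """rough estimate of the number of distinct items: 2^k divided by the
        classic fm correction factor, where k indexes the lowest unset bit."""
        k = 0
        while sketch & (1 << k):
            k += 1
        return (2 ** k) / 0.77351

    s1 = 0
    for reading in ["node3:17", "node5:42", "node9:7"]:
        s1 = fm_insert(s1, reading)
    s2 = fm_insert(0, "node3:17")      # same reading arriving over a second path
    print(fm_estimate(fm_merge(s1, s2)))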
while the user centered design methods we bring from human computer interaction to ubicomp help sketch ideas and refine prototypes few tools or techniques help explore divergent design concepts reflect on their merits and come to new understanding of design opportunities and ways to address them we present speed dating design method for rapidly exploring application concepts and their interactions and contextual dimensions without requiring any technology implementation situated between sketching and prototyping speed dating structures comparison of concepts helping identify and understand contextual risk factors and develop approaches to address them we illustrate how to use speed dating by applying it to our research on the smart home and dual income families and highlight our findings from using this method
when producing estimates in software projects expert opinions are frequently combined however it is poorly understood whether when and how to combine expert estimates in order to study the effects of combination technique called planning poker the technique was introduced in software project for half of the tasks the tasks estimated with planning poker provided group consensus estimates that were less optimistic than the statistical combination mean of individual estimates for the same tasks and group consensus estimates that were more accurate than the statistical combination of individual estimates for the same tasks for tasks in the same project individual experts who estimated set of control tasks achieved estimation accuracy similar to that achieved by estimators who estimated tasks using planning poker moreover for both planning poker and the control group measures of the median estimation bias indicated that both groups had unbiased estimates because the typical estimated task was perfectly on target code analysis revealed that for tasks estimated with planning poker more effort was expended due to the complexity of the changes to be made possibly caused by the information provided in group discussions
moving objects past arms reach is common action in both real world and digital tabletops in the real world the most common way to accomplish this task is by throwing or sliding the object across the table sliding is natural easy to do and fast however in digital tabletops few existing techniques for long distance movement bear any resemblance to these real world motions we have designed and evaluated two tabletop interaction techniques that closely mimic the action of sliding an object across the table flick is an open loop technique that is extremely fast superflick is based on flick but adds correction step to improve accuracy for small targets we carried out two user studies to compare these techniques to fast and accurate proxy based technique the radar view in the first study we found that flick is significantly faster than the radar for large targets but is inaccurate for small targets in the second study we found no differences between superflick and radar for either time or accuracy given the simplicity and learnability of flicking our results suggest that throwing based techniques have promise for improving the usability of digital tables
this paper describes how to estimate surface models from dense sets of noisy range data taken from different points of view ie multiple range maps the proposed method uses sensor model to develop an expression for the likelihood of surface conditional on set of noisy range measurements optimizing this likelihood with respect to the model parameters provides an unbiased and efficient estimator the proposed numerical algorithms make this estimation computationally practical for wide variety of circumstances the results from this method compare favorably with state of the art approaches that rely on the closest point or perpendicular distance metric convenient heuristic that produces biased solutions and fails completely when surfaces are not sufficiently smooth as in the case of complex scenes or noisy range measurements empirical results on both simulated and real ladar data demonstrate the effectiveness of the proposed method for several different types of problems furthermore the proposed method offers general framework that can accommodate extensions to include surface priors ie maximum posteriori more sophisticated noise models and other sensing modalities such as sonar or synthetic aperture radar
as the competition of web search market increases there is high demand for personalized web search to conduct retrieval incorporating web users information needs this paper focuses on utilizing clickthrough data to improve web search since millions of searches are conducted everyday search engine accumulates large volume of clickthrough data which records who submits queries and which pages he she clicks on the clickthrough data is highly sparse and contains different types of objects user query and web page and the relationships among these objects are also very complicated by performing analysis on these data we attempt to discover web users interests and the patterns that users locate information in this paper novel approach cubesvd is proposed to improve web search the clickthrough data is represented by order tensor on which we perform mode analysis using the higher order singular value decomposition technique to automatically capture the latent factors that govern the relations among these multi type objects users queries and web pages tensor reconstructed based on the cubesvd analysis reflects both the observed interactions among these objects and the implicit associations among them therefore web search activities can be carried out based on cubesvd analysis experimental evaluations using real world data set collected from an msn search engine show that cubesvd achieves encouraging search results in comparison with some standard methods
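a hedged numpy sketch of a hosvd style decomposition of a small user by query by page clickthrough tensor: per mode singular vectors, a core tensor, and a low rank reconstruction whose entries can be used to score unseen triples. the ranks, the toy tensor and the function names are assumptions, not the paper's cubesvd implementation.

    import numpy as np

    def unfold(t, mode):
        """mode-n unfolding of a 3-way tensor into a matrix."""
        return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

    def mode_multiply(t, m, mode):
        """multiply tensor t along the given mode by matrix m."""
        return np.moveaxis(np.tensordot(m, np.moveaxis(t, mode, 0), axes=1), 0, mode)

    def cubesvd_like(t, ranks):
        """hosvd style decomposition of a (user x query x page) clickthrough
        tensor: per-mode singular vectors, a core tensor, and a low-rank
        reconstruction whose entries score (user, query, page) triples."""
        factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0][:, :r]
                   for m, r in enumerate(ranks)]
        core = t
        for m, u in enumerate(factors):
            core = mode_multiply(core, u.T, m)
        rec = core
        for m, u in enumerate(factors):
            rec = mode_multiply(rec, u, m)
        return rec

    clicks = np.zeros((3, 3, 3))                  # 3 users, 3 queries, 3 pages
    clicks[0, 0, 0] = clicks[1, 0, 0] = clicks[2, 1, 1] = 1.0
    print(cubesvd_like(clicks, ranks=(2, 2, 2)).round(2))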
lossless data compression techniques can potentially free up more than of the memory resources however previously proposed schemes suffer from high access costs the proposed main memory compression scheme practically eliminates performance losses of previous schemes by exploiting simple and yet effective compression scheme highly efficient structure for locating compressed block in memory and hierarchical memory layout that allows compressibility of blocks to vary with low fragmentation overhead we have evaluated an embodiment of the proposed scheme in detail using integer and floating point applications from the spec suite along with two server applications and we show that the scheme robustly frees up of the memory resources on average with negligible impact on the performance of only on average
developing component based systems supports reuse and modularity but introduces new compatibility problems testing and analysis are usually based on the availability of either the source code or the specifications but cots components are commonly provided without source code and with incomplete specifications dynamic analysis technique called behavior capture and test can reveal cots component misbehaviors and incompatibilities bct first automatically derives behavioral models by monitoring component executions and then dynamically checks these models when the components are replaced or used as part of new system
we give graph theoretical characterization of answer sets of normal logic programs we show that there is one to one correspondence between answer sets and special non standard graph coloring of so called block graphs of logic programs this leads us to an alternative implementation paradigm to compute answer sets by computing non standard graph colorings our approach is rule based and not atom based like most of the currently known methods we present an implementation for computing answer sets which works on polynomial space
relevance feedback is well established and effective framework for narrowing down the gap between low level visual features and high level semantic concepts in content based image retrieval in most traditional implementations of relevance feedback distance metric or classifier is usually learned from user’s provided negative and positive examples however due to the limitation of the user’s feedback and the high dimensionality of the feature space one is often confronted with the curse of dimensionality recently several researchers have considered manifold ways to address this issue such as locality preserving projections augmented relation embedding and semantic subspace projection in this paper by using techniques from spectral graph embedding and regression we propose unified framework called spectral regression for learning an image subspace this framework facilitates the analysis of the differences and connections between the algorithms mentioned above and more crucially it provides much faster computation and therefore makes the retrieval system capable of responding to the user’s query more efficiently
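a hedged numpy sketch of the two stage idea: a spectral step takes leading nontrivial eigenvectors of a normalized affinity graph as low dimensional responses, and a regression step (here plain ridge regression) maps image features to those responses to obtain projection directions. the graph construction, regularization and toy data are placeholder assumptions.

    import numpy as np

    def spectral_regression(X, W, dim=2, alpha=0.1):
        """two stage subspace learning sketch:
        1) spectral step: leading nontrivial eigenvectors of the normalized
           affinity matrix serve as low dimensional responses for each image.
        2) regression step: ridge regression maps image features to those
           responses, giving projection directions usable for unseen images."""
        deg = np.maximum(W.sum(axis=1), 1e-12)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
        sym = d_inv_sqrt @ W @ d_inv_sqrt
        _, vecs = np.linalg.eigh(sym)
        Y = vecs[:, -(dim + 1):-1]            # skip the trivial top eigenvector
        A = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
        return A                               # columns are projection directions

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))                # 6 images, 4 visual features each
    W = np.ones((6, 6)) - np.eye(6)            # placeholder affinity graph
    print(spectral_regression(X, W).shape)     # (4, 2)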
we study the relationship between concurrent separation logic csl and the assume guarantee method aka rely guarantee method we show in three steps that csl can be treated as specialization of the assume guarantee method for well synchronized concurrent programs first we present an assume guarantee based program logic for low level language with built in locking primitives then we extend the program logic with explicit separation of private data and shared data which provides better memory modularity finally we show that csl adapted for the low level language can be viewed as specialization of the extended logic by enforcing the invariant that shared resources are well formed outside of critical regions this work can also be viewed as different approach from brookes to proving the soundness of csl our csl inference rules are proved as lemmas in the assume guarantee based logic whose soundness is established following the syntactic approach to proving soundness of type systems
we present novel algorithm for efficiently splitting deformable solids along arbitrary piecewise linear crack surfaces in cutting and fracture simulations the algorithm combines meshless discretization of the deformation field with explicit surface tracking using triangle mesh we decompose the splitting operation into first step where we synthesize crack surfaces and second step where we use the newly synthesized surfaces to update the meshless discretization of the deformation field we present novel visibility graph for facilitating fast update of shape functions in the meshless discretization the separation of the splitting operation into two steps along with our novel visibility graph enables high flexibility and control over the splitting trajectories provides fast dynamic update of the meshless discretization and allows for an easy implementation as result our algorithm is scalable versatile and suitable for large range of applications from computer animation to interactive medical simulation
the application area of image retrieval systems has widely spread to the www and multimedia database environments this paper presents an image search system with analytical functions for combining shape structure and color features the system pre processes an image segmentation from hybrid color systems of hsl and cielab this segmentation process includes new mechanism for clustering the elements of high resolution images in order to improve precision and reduce computation time the system extracts three features of an image which are color shape and structure we apply the color vector quantization for the color feature extraction the shape properties of an image which include eccentricity area equivalent diameter and convex area are analyzed for extracting the shape feature the image structure is identified by applying forward mirror extended curvelet transform another distinctive idea introduced in this paper is new distance metric which represents semantic similarity this paper has evaluations of the system using jpeg images from corel image collections the experimental results clarify the feasibility and effectiveness of the proposed system to improve accuracy for image retrieval
this paper considers two questions in cryptography the first is cryptography secure against memory attacks particularly devastating side channel attack against cryptosystems termed the memory attack was proposed recently in this attack significant fraction of the bits of secret key of cryptographic algorithm can be measured by an adversary if the secret key is ever stored in part of memory which can be accessed even after power has been turned off for short amount of time such an attack has been shown to completely compromise the security of various cryptosystems in use including the rsa cryptosystem and aes we show that the public key encryption scheme of regev stoc and the identity based encryption scheme of gentry peikert and vaikuntanathan stoc are remarkably robust against memory attacks where the adversary can measure large fraction of the bits of the secret key or more generally can compute an arbitrary function of the secret key of bounded output length this is done without increasing the size of the secret key and without introducing any complication of the natural encryption and decryption routines the second is simultaneous hardcore bits we say that block of bits of are simultaneously hard core for one way function if given they cannot be distinguished from random string of the same length although any candidate one way function can be shown to hide one hardcore bit and even logarithmic number of simultaneously hardcore bits there are few examples of one way or trapdoor functions for which linear number of the input bits have been proved simultaneously hardcore the ones that are known relate the simultaneous security to the difficulty of factoring integers we show that for lattice based injective trapdoor function which is variant of function proposed earlier by gentry peikert and vaikuntanathan an number of input bits are simultaneously hardcore where is the total length of the input these two results rely on similar proof techniques
the technique of abstract interpretation has allowed the development of very sophisticated global program analyses which are at the same time provably correct and practical we present in tutorial fashion novel program development framework which uses abstract interpretation as fundamental tool the framework uses modular incremental abstract interpretation to obtain information about the program this information is used to validate programs to detect bugs with respect to partial specifications written using assertions in the program itself and or in system libraries to generate and simplify run time tests and to perform high level program transformations such as multiple abstract specialization parallelization and resource usage control all in provably correct way in the case of validation and debugging the assertions can refer to variety of program points such as procedure entry procedure exit points within procedures or global computations the system can reason with much richer information than for example traditional types this includes data structure shape including pointer sharing bounds on data structure sizes and other operational variable instantiation properties as well as procedure level properties such as determinacy termination nonfailure and bounds on resource consumption time or space cost ciaopp the preprocessor of the ciao multi paradigm programming system which implements the described functionality will be used to illustrate the fundamental ideas
objective the purpose of this study was to assess the performance of real time open end version of the dynamic time warping dtw algorithm for the recognition of motor exercises given possibly incomplete input stream of data and reference time series the open end dtw algorithm computes both the size of the prefix of reference which is best matched by the input and the dissimilarity between the matched portions the algorithm was used to provide real time feedback to neurological patients undergoing motor rehabilitation methods and materials we acquired dataset of multivariate time series from sensorized long sleeve shirt which contains strain sensors distributed on the upper limb seven typical rehabilitation exercises were recorded in several variations both correctly and incorrectly executed and at various speeds totaling data set of time series nearest neighbour classifiers were built according to the outputs of open end dtw alignments and their global counterparts on exercise pairs the classifiers were also tested on well known public datasets from heterogeneous domains results nonparametric tests show that on full time series the two algorithms achieve the same classification accuracy value on partial time series classifiers based on open end dtw have far higher accuracy versus
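a minimal python sketch of open end dtw on one dimensional sequences: the usual dtw recursion is run, but instead of reading the cost at the full reference length, the best cell in the last row over all reference prefixes is returned together with the matched prefix length. the toy exercise signals are assumptions for the example.

    def open_end_dtw(query, reference):
        """returns (dissimilarity, matched_prefix_length): the query (possibly an
        incomplete exercise) is aligned against every prefix of the reference and
        the best matching prefix is reported."""
        n, m = len(query), len(reference)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(query[i - 1] - reference[j - 1])
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        best_j = min(range(1, m + 1), key=lambda j: cost[n][j])
        return cost[n][best_j], best_j

    # the query covers roughly the first half of the reference movement
    reference = [0.0, 0.2, 0.5, 0.9, 1.0, 0.7, 0.3, 0.0]
    query = [0.0, 0.25, 0.5, 0.85]
    print(open_end_dtw(query, reference))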
histograms and wavelet synopses provide useful tools in query optimization and approximate query answering traditional histogram construction algorithms such as optimal optimize absolute error measures for which the error in estimating true value of by has the same effect of estimating true value of by however several researchers have recently pointed out the drawbacks of such schemes and proposed wavelet based schemes to minimize relative error measures none of these schemes provide satisfactory guarantees and we provide evidence that the difficulty may lie in the choice of wavelets as the representation scheme in this paper we consider histogram construction for the known relative error measures we develop optimal as well as fast approximation algorithms we provide comprehensive theoretical analysis and demonstrate the effectiveness of these algorithms in providing significantly more accurate answers through synthetic and real life data sets
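as an illustration of the dynamic programming flavour of such constructions, here is a python sketch that partitions a sequence into b buckets minimizing a sum of squared relative errors, with the per bucket representative in closed form from prefix sums. this is a simplified stand in, not the paper's optimal or approximation algorithms or its exact error measure.

    def bucket_error(prefix_inv, prefix_inv2, i, j):
        """squared relative error sum_k (1 - v/x_k)^2 of the best single
        representative v for x[i..j-1] (x assumed positive), computed in closed
        form from prefix sums of 1/x and 1/x^2."""
        s1 = prefix_inv[j] - prefix_inv[i]      # sum of 1/x_k
        s2 = prefix_inv2[j] - prefix_inv2[i]    # sum of 1/x_k^2
        count = j - i
        v = s1 / s2                              # representative minimizing the measure
        return count - 2 * v * s1 + v * v * s2

    def relative_error_histogram(x, b):
        """dynamic program: minimum total squared relative error using b buckets."""
        n = len(x)
        prefix_inv, prefix_inv2 = [0.0], [0.0]
        for v in x:
            prefix_inv.append(prefix_inv[-1] + 1.0 / v)
            prefix_inv2.append(prefix_inv2[-1] + 1.0 / (v * v))
        INF = float("inf")
        dp = [[INF] * (b + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0
        for j in range(1, n + 1):
            for k in range(1, b + 1):
                for i in range(k - 1, j):
                    e = bucket_error(prefix_inv, prefix_inv2, i, j)
                    dp[j][k] = min(dp[j][k], dp[i][k - 1] + e)
        return dp[n][b]

    print(relative_error_histogram([1.0, 1.1, 10.0, 10.5, 100.0], b=2))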
the use of embodied agents defined as visual human like representations accompanying computer interface is becoming prevalent in applications ranging from educational software to advertisements in the current work we assimilate previous empirical studies which compare interfaces with visually embodied agents to interfaces without agents both using an informal descriptive technique based on experimental results studies as well as formal statistical meta analysis studies results revealed significantly larger effect sizes when analyzing subjective responses ie questionnaire ratings interviews than when analyzing behavioral responses such as task performance and memory furthermore the effects of adding an agent to an interface are larger than the effects of animating an agent to behave more realistically however the overall effect sizes were quite small eg across studies adding face to an interface only explains approximately of the variance in results we discuss the implications for both designers building interfaces as well as social scientists designing experiments to evaluate those interfaces
scenarios have gained acceptance in both research and practice as way of grounding software engineering projects in the users work however the research on scenario based design sbd includes very few studies of how scenarios are actually used by practising software engineers in real world projects such studies are needed to evaluate current sbd approaches and advance our general understanding of what scenarios contribute to design this longitudinal field study analyses the use of scenarios during the conceptual design of large information system the role of the scenarios is compared and contrasted with that of three other design artefacts the requirements specification the business model and the user interface prototype the distinguishing features of the scenarios were that they were task based and descriptive by being task based the scenarios strung individual events and activities together purposeful sequences and thereby provided an intermediate level of description that was both an instantiation of overall work objectives and fairly persistent context for the gradual elaboration of subtasks by being descriptive the scenarios preserved real world feel of the contents flow and dynamics of the users work the scenarios made the users work recognizable to the software engineers as complex but organized human activity this way the scenarios attained unifying role as mediator among both the design artefacts and the software engineers whilst they were not used for communication with users the scenarios were however discontinued before the completion of the conceptual design because their creation and management was dependent on few software engineers who were also the driving forces of several other project activities finally the software engineers valued the concreteness and coherence of the scenarios although that entailed risk of missing some effective reconceptions of the users work
level of detail rendering is essential for rendering very large detailed worlds in real time unfortunately level of detail computations can be expensive creating bottleneck at the cpu this paper presents the cabtt algorithm an extension to existing binary triangle tree based level of detail algorithms instead of manipulating triangles the cabtt algorithm instead operates on clusters of geometry called aggregate triangles this reduces cpu overhead eliminating bottleneck common to level of detail algorithms since aggregate triangles stay fixed over several frames they may be cached on the video card this further reduces cpu load and fully utilizes the hardware accelerated rendering pipeline on modern video cards these improvements result in fourfold increase in frame rate over roam at high detail levels our implementation renders an approximation of an million triangle heightfield at frames per second with an maximum error of pixel on consumer hardware
the problem of computing maximum posteriori map configuration is central computational challenge associated with markov random fields there has been some focus on tree based linear programming lp relaxations for the map problem this paper develops family of super linearly convergent algorithms for solving these lps based on proximal minimization schemes using bregman divergences as with standard message passing on graphs the algorithms are distributed and exploit the underlying graphical structure and so scale well to large problems our algorithms have double loop character with the outer loop corresponding to the proximal sequence and an inner loop of cyclic bregman projections used to compute each proximal update we establish convergence guarantees for our algorithms and illustrate their performance via some simulations we also develop two classes of rounding schemes deterministic and randomized for obtaining integral configurations from the lp solutions our deterministic rounding schemes use re parameterization property of our algorithms so that when the lp solution is integral the map solution can be obtained even before the lp solver converges to the optimum we also propose graph structured randomized rounding schemes applicable to iterative lp solving algorithms in general we analyze the performance of and report simulations comparing these rounding schemes
as future technologies push towards higher clock rates traditional scheduling techniques that are based on wake up and select from an instruction window fail to scale due to their circuit complexities speculative instruction schedulers can significantly reduce logic on the critical scheduling path but can suffer from instruction misscheduling that can result in wasted issue opportunities misscheduled instructions can spawn other misscheduled instructions only to be replayed over again and again until correctly scheduled these tornadoes in the speculative scheduler are characterized by extremely low useful scheduling throughput and high volume of wasted issue opportunities the impact of tornadoes becomes even more severe when using simultaneous multithreading misschedulings from one thread can occupy significant portion of the processor issue bandwidth effectively starving other threads in this paper we propose zephyr an architecture that inhibits the formation of tornadoes zephyr makes use of existing load latency prediction techniques as well as coarse grain fifo queues to buffer instructions before entering scheduling queues on average we observe improvement in ipc performance reduction in hazards reduction in occupancy and reduction in the number of replays compared with baseline scheduler
motivated by structural properties of the web graph that support efficient data structures for in memory adjacency queries we study the extent to which large network can be compressed boldi and vigna www showed that web graphs can be compressed down to three bits of storage per edge we study the compressibility of social networks where again adjacency queries are fundamental primitive to this end we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs we show that some of the problems are np hard yet admit effective heuristics some of which can exploit properties of social networks such as link reciprocity our extensive experiments show that social networks and the web graph exhibit vastly different compressibility characteristics
major requirement of distributed database management system ddbms is to enable users to write queries as though the database were not distributed distribution transparency the ddbms transforms the user’s queries into execution strategies that is sequences of operations on the various nodes of the network and of transmissions between them an execution strategy on distributed database is correct if it returns the same result as if the query were applied to nondistributed database this paper analyzes the correctness problem for query execution strategies formal model called multirelational algebra is used as unifying framework for this purpose the problem of proving the correctness of execution strategies is reduced to the problem of proving the equivalence of two expressions of multirelational algebra set of theorems on equivalence is given in order to facilitate this task the proposed approach can be used also for the generation of correct execution strategies because it defines the rules which allow the transformation of correct strategy into an equivalent one this paper does not deal with the problem of evaluating equivalent strategies and therefore is not in itself proposal for query optimizer for distributed databases however it constitutes theoretical foundation for the design of such optimizers
automatic symbolic techniques to generate test inputs for example through concolic execution suffer from path explosion the number of paths to be symbolically solved for grows exponentially with the number of inputs in many applications though the inputs can be partitioned into non interfering blocks such that symbolically solving for each input block while keeping all other blocks fixed to concrete values can find the same set of assertion violations as symbolically solving for the entire input this can greatly reduce the number of paths to be solved in the best case from exponentially many to linearly many in the number of inputs we present an algorithm that combines test input generation by concolic execution with dynamic computation and maintenance of information flow between inputs our algorithm iteratively constructs partition of the inputs starting with the finest all inputs separate and merging blocks if dependency is detected between variables in distinct input blocks during test generation instead of exploring all paths of the program our algorithm separately explores paths for each block while fixing variables in other blocks to random values in the end the algorithm outputs an input partition and set of test inputs such that inputs in different blocks do not have any dependencies between them and the set of tests provides equivalent coverage with respect to finding assertion violations as full concolic execution we have implemented our algorithm in the splat test generation tool we demonstrate that our reduction is effective by generating tests for four examples in packet processing and operating system code
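a minimal python sketch of the partition maintenance part only: every input starts in its own block and blocks are merged whenever the dynamic information flow monitor reports a dependency between inputs from distinct blocks. the concolic engine and the monitor itself are stubbed out, and all names are hypothetical.

    class InputPartition:
        """union-find over input names: the finest partition is coarsened only
        by merging blocks when a cross-block dependency is observed at run time."""
        def __init__(self, inputs):
            self.parent = {x: x for x in inputs}

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]   # path halving
                x = self.parent[x]
            return x

        def report_dependency(self, a, b):
            """called by the dynamic information-flow monitor when inputs a and b
            influence the same branch or value."""
            ra, rb = self.find(a), self.find(b)
            if ra != rb:
                self.parent[ra] = rb

        def blocks(self):
            groups = {}
            for x in self.parent:
                groups.setdefault(self.find(x), []).append(x)
            return list(groups.values())

    p = InputPartition(["len", "flags", "payload", "checksum"])
    p.report_dependency("len", "payload")      # e.g. a branch compares len to payload size
    p.report_dependency("payload", "checksum")
    print(p.blocks())   # 'flags' can be explored separately from the other three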
text is word together with an additional linear order on it we study quantitative models for texts ie text series which assign to texts elements of semiring we introduce an algebraic notion of recognizability following reutenauer and bozapalidis as well as weighted automata for texts combining an automaton model of lodaya and weil with model of esik and nemeth after that we show that both formalisms describe the text series definable in certain fragment of weighted logics as introduced by droste and gastin in order to do so we study certain definable transductions and show that they are compatible with weighted logics
conference refers to group of members in network who communicate with each other within the group in this paper we propose novel design for conference network which supports multiple disjoint conferences the major component of the network is multistage network composed of switch modules with fan in and fan out capability fast self routing algorithm is developed for setting up routing paths in the conference network for an n times n conference network we design the hardware cost is o(n log n) and the routing time and communication delay vary from o(1) to o(log n) depending on where the conference is allocated in the network as can be seen the new conference network is superior to existing designs in terms of hardware cost routing time and communication delay the conference network proposed is rearrangeably nonblocking in general and is strictly nonblocking under some conference service policy it can be used in applications that require efficient or real time group communication
we describe methodology for transforming large class of highly concurrent linearizable objects into highly concurrent transactional objects as long as the linearizable implementation satisfies certain regularity properties informally that every method has an inverse we define simple wrapper for the linearizable implementation that guarantees that concurrent transactions without inherent conflicts can synchronize at the same granularity as the original linearizable implementation
architecture generation is the first step in the design of software systems many of the qualities that the final software system possesses are usually decided at the architecture development stage itself thus if the final system should be usable testable secure high performance mobile and adaptable then these qualities or non functional requirements nfrs should be engineered into the architecture itself in particular recently adaptability is emerging as an important attribute required by almost all software systems briefly adaptability is the ability of software system to accommodate changes in its environment embedded systems are usually constrained both in hardware and software current adaptable architecture development methods for embedded systems are usually manual and ad hoc there are almost no comprehensive systematic approaches to consider nfrs at the architecture development stage while there are several examples of approaches to generate architectures based on functional requirements we believe that there are very few techniques that consider nfrs such as adaptability during the process of architecture generation in this paper we present an automated design method that helps develop adaptable architectures for embedded systems by developing tool called software architecture adaptability assistant sa sa helps the developer during the process of software architecture development by selecting the architectural constituents such as components connections patterns constraints styles and rationales that best fit the adaptability requirements for the architecture the developer can then complete the architecture from the constituents chosen by the tool sa uses the knowledge base properties of the nfr framework in order to help automatically generate the architectural constituents the nfr framework provides systematic method to consider nfrs in particular their synergies and conflicts we validated the architectures generated by sa in type of embedded system called the vocabulary evolution system ves by implementing the codes from the generated architectures in the ves and confirming that the resulting system satisfied the requirements for adaptability ves responds to external commands through an interface such as ethernet and the vocabulary of these commands changes with time the validation process also led to the discovery of some of the shortcomings of our automated design method
near term digital radio ntdr network is kind of manet in which mobile nodes are assigned to different clusters therefore it lets the nodes communicate with each other efficiently in large area although several ntdr protocols have been proposed an efficient and secure one is still lacking accordingly in this paper we propose new method based on id based bilinear pairings to overcome these currently unsolved security problems after our analysis we conclude that our scheme is the first protocol for ntdr network that is not only secure but also very efficient
we present new approach for the elicitation and development of security requirements in the entire data warehouse dws life cycle which we have called secure engineering process for data warehouses sedawa whilst many methods for the requirements analysis phase of the dws have been proposed the elicitation of security requirements as non functional requirements has not received sufficient attention hence in this paper we propose methodology for the dw design based on model driven architecture mda and the standard software process engineering metamodel specification spem from the object management group omg we define four phases comprising several activities and steps and five disciplines which cover the whole dw design our methodology adapts the framework to be used under mda and the spem approaches in order to elicit and develop security requirements for dws the benefits of our proposal are shown through an example related to the management of the pharmacies consortium business
modeling languages are fundamental part of automated software development mdd for example uses uml class diagrams and state machines as languages to define applications in this paper we explore how feature oriented software development fosd uses modern mathematics as modeling language to express the design and synthesis of programs in software product lines but demands little mathematical sophistication from its users doing so has three practical benefits it offers simple and principled mathematical description of how fosd transforms derives and relates program artifacts it exposes previously unrecognized commuting relationships among tool chains thereby providing new ways to debug tools and it reveals new ways to optimize software synthesis
cluster ensembles offer solution to challenges inherent to clustering arising from its ill posed nature cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned in this article we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space we leverage the diversity of the input clusterings in order to generate consensus partition that is superior to the participating ones since we are dealing with weighted clusters our consensus functions make use of the weight vectors associated with the clusters we demonstrate the effectiveness of our techniques by running experiments with several real datasets including high dimensional text data furthermore we investigate in depth the issue of diversity and accuracy for our ensemble methods our analysis and experimental results show that the proposed techniques are capable of producing partition that is as good as or better than the best individual clustering
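a hedged python sketch of one simple consensus function for weighted clusters: input clusterings vote for object pairs with their cluster weights in a co association matrix, and the consensus partition is read off as connected components of the thresholded graph. this is a deliberately simple stand in for the paper's consensus functions, with illustrative weights and threshold.

    from itertools import combinations

    def weighted_coassociation(n_objects, clusterings):
        """each clustering is a list of (members, weight) clusters; pairs of
        objects appearing together accumulate the cluster weight."""
        sim = [[0.0] * n_objects for _ in range(n_objects)]
        for clusters in clusterings:
            for members, weight in clusters:
                for a, b in combinations(members, 2):
                    sim[a][b] += weight
                    sim[b][a] += weight
        return sim

    def consensus_partition(sim, threshold):
        """consensus clusters = connected components of the thresholded
        co-association graph."""
        n = len(sim)
        seen, parts = set(), []
        for start in range(n):
            if start in seen:
                continue
            stack, comp = [start], []
            while stack:
                x = stack.pop()
                if x in seen:
                    continue
                seen.add(x)
                comp.append(x)
                stack.extend(y for y in range(n)
                             if y not in seen and sim[x][y] >= threshold)
            parts.append(sorted(comp))
        return parts

    c1 = [((0, 1, 2), 0.9), ((3, 4), 0.6)]
    c2 = [((0, 1), 0.8), ((2, 3, 4), 0.5)]
    print(consensus_partition(weighted_coassociation(5, [c1, c2]), threshold=1.0))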
recovery oriented software is built with the perspective that hardware or software failures and operation mistakes are facts to be coped with as they are problems that cannot be fully solved while developing real complex applications consequently any software will always have non zero chance of failure some of these failures may be caused by defects that may be removed or encapsulated from the point of view of removing or encapsulating defects failure is considered to be trivial when the required effort to identify and eliminate or encapsulate the causing defect is small ii the risk of making mistakes in these steps is also small and iii the consequences of the failure are tolerable it is highly important to design systems in such way that most ideally all of the failures are trivial such systems are called ‘debuggable systems’ in this study we present the results of systematically applying techniques that focus on creating debuggable software for real embedded applications
dynamic voltage scaling dvs is popular energy saving technique for real time tasks the effectiveness of dvs critically depends on the accuracy of workload estimation since dvs exploits the slack or the difference between the deadline and execution time many existing dvs techniques are profile based and simply utilize the worst case or average execution time without estimation several recent approaches recognize the importance of workload estimation and adopt statistical estimation techniques however these approaches still require extensive profiling to extract reliable workload statistics and furthermore cannot effectively handle time varying workloads feedback control based adaptive algorithms have been proposed to handle such nonstationary workloads but their results are often too sensitive to parameter selection to overcome these limitations of existing approaches we propose novel workload estimation technique for dvs this technique is based on the kalman filter and can estimate the processing time of workloads in robust and accurate manner by adaptively calibrating estimation error by feedback we tested the proposed method with workloads of various characteristics extracted from eight mpeg video clips to thoroughly evaluate the performance of our approach we used both cycle accurate simulator and an xscale based test board our simulation result demonstrates that the proposed technique outperforms the compared alternatives with respect to the ability to meet given timing and quality of service constraints furthermore we found that the accuracy of our approach is almost comparable to the oracle accuracy achievable only by offline analysis experimental results indicate that using our approach can reduce energy consumption by on average only with negligible deadline miss ratio dmr around moreover the average of computational overheads for the proposed technique is just which is the minimum value compared to other methods more importantly the dmr of our method is bounded by in the worst case while those of other methods are twice or more than ours
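a minimal python sketch of the idea: a scalar kalman filter with a random walk workload model tracks cycles per frame, and the predicted workload picks the lowest frequency that still meets the deadline. the noise constants, frequency list and deadline are illustrative assumptions, not the paper's parameters.

    def kalman_step(x, p, measured, q=1e4, r=4e4):
        """scalar kalman filter with a random-walk workload model:
        predict, then correct with the measured cycle count of the last frame."""
        p = p + q                      # predict: variance grows by process noise q
        k = p / (p + r)                # kalman gain against measurement noise r
        x = x + k * (measured - x)     # correct the workload estimate
        p = (1 - k) * p
        return x, p

    def pick_frequency(predicted_cycles, deadline_s, freqs_hz):
        """lowest available frequency that finishes the predicted work in time."""
        for f in sorted(freqs_hz):
            if predicted_cycles / f <= deadline_s:
                return f
        return max(freqs_hz)

    freqs = [200e6, 400e6, 600e6, 800e6]
    x, p = 8e6, 1e12                   # initial cycles-per-frame estimate, large variance
    for measured in [7.5e6, 9.0e6, 12.0e6, 11.0e6]:
        f = pick_frequency(x, deadline_s=1 / 30, freqs_hz=freqs)
        print(f"run at {f/1e6:.0f} MHz, estimate {x/1e6:.2f} Mcycles")
        x, p = kalman_step(x, p, measured)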
we present method for image interpolation which is able to create high quality perceptually convincing transitions between recorded images by implementing concepts derived from human vision the problem of physically correct image interpolation is relaxed to an image interpolation that is perceived as physically correct by human observers we find that it suffices to focus on exact edge correspondences homogeneous regions and coherent motion to compute such solutions in our user study we confirm the visual quality of the proposed image interpolation approach we show how each aspect of our approach increases the perceived quality of the interpolation results compare the results obtained by other methods and investigate the achieved quality for different types of scenes
the openmp programming model provides parallel applications very important feature job malleability job malleability is the capacity of an application to dynamically adapt its parallelism to the number of processors allocated to it we believe that job malleability provides to applications the flexibility that system needs to achieve its maximum performance we also defend that system has to take its decisions not only based on user requirements but also based on run time performance measurements to ensure the efficient use of resources job malleability is the application characteristic that makes possible the run time performance analysis without malleability applications would not be able to adapt their parallelism to the system decisions to support these ideas we present two new approaches to attack the two main problems of gang scheduling the excessive number of time slots and the fragmentation our first proposal is to apply scheduling policy inside each time slot of gang scheduling to distribute processors among applications considering their efficiency calculated based on run time measurements we call this policy performance driven gang scheduling our second approach is new re packing algorithm compress join that exploits the job malleability this algorithm modifies the processor allocation of running applications to adapt it to the system necessities and minimize the fragmentation and number of time slots these proposals have been implemented in sgi origin with processors results show the validity and convenience of both to consider the job performance analysis calculated at run time to decide the processor allocation and to use flexible programming model that adapts applications to system decisions
in this paper we present an enhanced approach to cope with consistency and validation issues arising in service oriented integration design using an expressive logic language this approach goes beyond the traditional ones which are focused on the simple consistency of structural specification indeed it is able to take into account both static and dynamic constraints while the former type applies to system states the latter concerns the system state transitions the present solution is oriented towards the analysis of dynamic features as services of system described using rich ontology specification based on description logics the aim of such solution is to provide the architect at the design phase with adequate support tools to check both the consistency of static and dynamic artifacts it relies on the iope input output preconditions and effects paradigm to specify operation semantics and on decidable fragment of the first order logic in order to provide reasoning based tool able to verify various semantic properties
input validation is essential and critical in web applications it is the enforcement of constraints that any input must satisfy before it is accepted to raise external effects we have discovered some empirical properties for characterizing input validation in web applications in this paper we propose an approach for automated recovery of input validation model from program source code the model recovered is represented in variant of control flow graph called validation flow graph which shows essential input validation features implemented in programs based on the model we then formulate two coverage criteria for testing input validation the two criteria can be used to guide the structural testing of input validation in web applications we have evaluated the proposed approach through case studies and experiments
the rapid growth of available data raises the need for more sophisticated techniques for semantic access to information it has been proved that using conceptual model or ontology over relational data sources is necessary to overcome many problems related with accessing the structured data however the task of wrapping the data residing in database by means of an ontology is mainly done manually the research we are carrying out studies the reuse of relational sources in the context of semantics based access to information this problem is tackled in two phases i extracting semantics hidden in the relational sources by wrapping them by means of an ontology ii understanding the methodology for semantic extension of such ontologies in this paper we focus on the first sub problem and present an automatic procedure for extracting from relational database schema its conceptual view the semantic mapping between the database and its conceptualization is captured by associating views over the data source to elements of the extracted ontology to represent the extracted conceptual model we use an ontology language rather than graphical notation in order to provide precise formal semantics our approach uses heuristics based on ideas of standard relational schema design and normalization under this we formally prove that our technique preserves the semantics of constraints in the database therefore there is no data loss and the extracted model constitutes faithful wrapper of the relational database
mining association rules is most commonly seen among the techniques for knowledge discovery from databases kdd it is used to discover relationships among items or itemsets furthermore temporal data mining is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities in this paper new concept of up to date patterns is proposed which is hybrid of the association rules and temporal mining an itemset may not be frequent large for an entire database but may be large up to date since the items seldom occurring early may often occur lately an up to date pattern is thus composed of an itemset and its up to date lifetime in which the user defined minimum support threshold must be satisfied the proposed approach can mine more useful large itemsets than the conventional ones which discover large itemsets valid only for the entire database experimental results show that the proposed algorithm is more effective than the traditional ones in discovering such up to date temporal patterns especially when the minimum support threshold is high
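A minimal sketch of the underlying notion of an up-to-date lifetime, assuming a time-ordered transaction list: find the earliest start time from which the itemset is still frequent up to the present. The actual mining algorithm is more elaborate; this only illustrates the definition.

```python
# Minimal sketch (illustrative, not the paper's algorithm): given a
# time-ordered list of transactions, find the up-to-date lifetime of an
# itemset, i.e., the earliest start index t such that the itemset is frequent
# in transactions[t:] under min_support. Returns None if no suffix qualifies.

def up_to_date_lifetime(transactions, itemset, min_support):
    items = set(itemset)
    n = len(transactions)
    count = 0          # occurrences of the itemset in the current suffix
    best = None
    # scan from the most recent transaction backwards, growing the suffix
    for t in range(n - 1, -1, -1):
        if items <= set(transactions[t]):
            count += 1
        suffix_len = n - t
        if count >= min_support * suffix_len:
            best = t   # itemset is frequent in the suffix starting at t
    return best

transactions = [["a"], ["b"], ["a", "b"], ["a", "b"], ["a", "b", "c"]]
print(up_to_date_lifetime(transactions, ["a", "b"], min_support=0.7))  # -> 1
```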
the idea of phong shading is applied to subdivision surfaces normals are associated with vertices and the same construction is used for both locations and normals this creates vertex positions and normals the vertex normals are smoother than the normals of the subdivision surface and using vertex normals for shading attenuates the well known visual artifacts of many subdivision schemes we demonstrate how to apply subdivision to normals and how to blend and combine different normals for achieving variety of effects
many applications require exploration of alternative scenarios most support it poorly subjunctive interfaces provide mechanisms for the parallel setup viewing and control of scenarios aiming to support users thinking about and interaction with their choices we illustrate how applications for information access real time simulation and document design may be extended with these mechanisms to investigate the usability of this form of extension we compare simple census browser against version with subjunctive interface in the first of three studies subjects reported higher satisfaction with the subjunctive interface and relied less on interim marks on paper no reduction in task completion time was found however mainly because some subjects encountered problems in setting up and controlling scenarios at the end of second five session study users of redesigned interface completed tasks percent more quickly than with the simple interface in the third study we examined how subjects reasoned about multiple scenario setups in pursuing complex open ended data explorations our main observation was that subjects treated scenarios as information holders using them creatively in various ways to facilitate task completion
compiler directed locality optimization techniques are effective in reducing the number of cycles spent in off chip memory accesses recently methods have been developed that transform memory layouts of data structures at compile time to improve spatial locality of nested loops beyond current control centric loop nest based optimizations most of these data centric transformations use single static program wide memory layout for each array disadvantage of these static layout based locality enhancement strategies is that they might fail to optimize codes that manipulate arrays which demand different layouts in different parts of the code in this paper we introduce new approach which extends current static layout optimization techniques by associating different memory layouts with the same array in different parts of the code we call this strategy quasidynamic layout optimization in this strategy the compiler determines memory layouts for different parts of the code at compile time but layout conversions occur at runtime we show that the possibility of dynamically changing memory layouts during the course of execution adds new dimension to the data locality optimization problem our strategy employs static layout optimizer module as building block and by repeatedly invoking it for different parts of the code it checks whether runtime layout modifications bring additional benefits beyond static optimization our experiments indicate significant improvements in execution time over static layout based locality enhancing techniques
loma mobile location aware messaging application is designed to be mobile portal to location based information in cities the user can perform textual searches to location based content navigate using maps assisted by gps and leave messages to the environment or recognize the environment from map the map view is the key feature of the loma system the loma client is capable of rendering photorealistic city models with augmented location based information in smart phone without hardware rendering support at interactive frame rates this paper presents the key challenges and solutions in creating this map engine and lightweight but photorealistic city model
compilers base many critical decisions on abstracted architectural models while recent research has shown that modeling is effective for some compiler problems building accurate models requires great deal of human time and effort this paper describes how machine learning techniques can be leveraged to help compiler writers model complex systems because learning techniques can effectively make sense of high dimensional spaces they can be valuable tool for clarifying and discerning complex decision boundaries in this work we focus on loop unrolling well known optimization for exposing instruction level parallelism using the open research compiler as testbed we demonstrate how one can use supervised learning techniques to determine the appropriateness of loop unrolling we use more than loops drawn from benchmarks to train two different learning algorithms to predict unroll factors ie the amount by which to unroll loop for any novel loop the technique correctly predicts the unroll factor for of the loops in our dataset which leads to overall improvement for the spec benchmark suite for the spec floating point benchmarks
programmers build large scale systems with multiple languages to reuse legacy code and leverage languages best suited to their problems for instance the same program may use java for ease of programming and to interface with the operating system these programs pose significant debugging challenges because programmers need to understand and control code across languages which may execute in different environments unfortunately traditional multilingual debuggers require single execution environment this paper presents novel composition approach to building portable mixed environment debuggers in which an intermediate agent interposes on language transitions controlling and reusing single environment debuggers we implement debugger composition in blink debugger for java and the jeannie programming language we show that blink is relatively simple it requires modest amounts of new code portable it supports multiple java virtual machines compilers operating systems and component debuggers and powerful composition eases debugging while supporting new mixed language expression evaluation and java native interface jni bug diagnostics in real world case studies we show that language interface errors require single environment debuggers to restart execution multiple times whereas blink directly diagnoses them with one execution we also describe extensions for other mixed environments to show debugger composition will generalize
current web search engines generally impose link analysis based re ranking on web page retrieval however the same techniques when applied directly to small web search such as intranet and site search cannot achieve the same performance because their link structures are different from the global web in this paper we propose an approach to constructing implicit links by mining users access patterns and then apply modified pagerank algorithm to re rank web pages for small web search our experimental results indicate that the proposed method outperforms content based method by explicit link based pagerank by and directhit by respectively
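To make the re-ranking step concrete, here is a minimal sketch that runs a standard damped PageRank over a weighted implicit-link graph mined from access logs; the paper's modified PageRank may weight or normalize differently, so this is purely an illustration.

```python
# Minimal sketch (assumed details): build an implicit-link graph from mined
# access patterns (page -> {target: weight}) and run a damped PageRank
# iteration over it to obtain re-ranking scores.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> {target_page: weight, ...}"""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, targets in links.items():
            total = sum(targets.values())
            if total == 0:
                continue
            for t, w in targets.items():
                # distribute this page's rank proportionally to edge weights
                new_rank[t] += damping * rank[p] * (w / total)
        rank = new_rank
    return rank

# implicit links mined from access logs: users who viewed /a often moved to /b
implicit = {"/a": {"/b": 3, "/c": 1}, "/b": {"/c": 2}, "/c": {"/a": 1}}
scores = pagerank(implicit)
```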
publish subscribe systems are successfully used to decouple distributed applications however their efficiency is closely tied to the topology of the underlying network the design of which has been neglected peer to peer network topologies can offer inherently bounded delivery depth load sharing and self organisation in this paper we present content based publish subscribe system routed over peer to peer topology graph the implications of combining these approaches are explored and particular implementation using elements from rebeca and chord is proven correct
in multimedia databases the spatial index structures based on trees like tree tree have been proved to be efficient and scalable for low dimensional data retrieval however if the data dimensionality is too high the hierarchy of nested regions represented by the tree nodes becomes spatially indistinct hence the query processing deteriorates to inefficient index traversal in terms of random access costs and in such case the tree based indexes are less efficient than the sequential search this is mainly due to repeated access to many nodes at the top levels of the tree in this paper we propose modified storage layout of tree based indexes such that nodes belonging to the same tree level are stored together such level ordered storage allows to prefetch several top levels of the tree into the buffer pool by only few or even single contiguous operation ie one seek read the experimental results show that our approach can speedup the tree based search significantly
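A minimal sketch of the level-ordered layout idea: serialize the index nodes breadth-first so that the top levels form one contiguous run that can be prefetched with a single sequential read. The node structure (a `children` attribute) is an assumption made for the example.

```python
# Minimal sketch (illustrative): serialize a tree-based index in level order
# (breadth-first) so that all nodes of the top levels lie in one contiguous
# run on disk and can be prefetched with a single sequential read.

from collections import deque

def level_ordered_layout(root, prefetch_levels=2):
    """Return (ordered_nodes, prefetch_count): nodes in BFS order and how many
    of them belong to the first `prefetch_levels` levels."""
    ordered, prefetch_count = [], 0
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        ordered.append(node)
        if level < prefetch_levels:
            prefetch_count += 1
        for child in getattr(node, "children", []):
            queue.append((child, level + 1))
    return ordered, prefetch_count
```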
camera calibration is an indispensable step in retrieving metric information from images one classical self calibration method is based on kruppa equation derived from pairwise image correspondences however the calibration constraints derived from kruppa equation are quadratic which are computationally intensive and difficult to obtain initial values in this paper we propose new initialization algorithm to estimate the unknown scalar in the equation thus the camera parameters can be initialized linearly in closed form and then optimized iteratively via the kruppa constraints we prove that the scalar can be uniquely recovered from the infinite homography and propose practical method to estimate the homography from physical or virtual plane located at far distance to the camera extensive experiments on synthetic and real images validate the effectiveness of the proposed method
number of multi hop wireless network programming systems have emerged for sensor network retasking but none of these systems support cryptographically strong public key based system for source authentication and integrity verification the traditional technique for authenticating program binary namely digital signature of the program hash is poorly suited to resource constrained sensor nodes our solution to the secure programming problem leverages authenticated streams is consistent with the limited resources of typical sensor node and can be used to secure existing network programming systems under our scheme program binary consists of several code and data segments that are mapped to series of messages for transmission over the network an advertisement consisting of the program name version number and hash of the very first message is digitally signed and transmitted first the advertisement authenticates the first message which in turn contains hash of the second message similarly the second message contains hash of the third message and so on binding each message to the one logically preceding it in the series through the hash chain we augmented the deluge network programming system with our protocol and evaluated the resulting system performance
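A minimal sketch of the hash-chain construction described above, assuming fixed-size messages and SHA-256; the real protocol additionally signs an advertisement carrying the program name and version, and uses a proper wire format.

```python
# Minimal sketch (assumed framing): split a program image into fixed-size
# messages, embed in each message the hash of the next one, and authenticate
# the whole stream through the hash of the first message (which is what the
# signed advertisement would carry).

import hashlib

def build_hash_chain(image, chunk_size=64):
    chunks = [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]
    messages = []
    next_hash = b""                      # last message carries no forward hash
    for chunk in reversed(chunks):
        message = chunk + next_hash      # payload || hash of the next message
        next_hash = hashlib.sha256(message).digest()
        messages.insert(0, message)
    advertisement_hash = next_hash       # hash of the first message; this value gets signed
    return advertisement_hash, messages

def verify_stream(advertisement_hash, messages):
    expected = advertisement_hash
    for message in messages:
        if hashlib.sha256(message).digest() != expected:
            return False                 # message does not match the hash that authenticated it
        expected = message[-32:]         # forward hash stored at the end of the payload
    return True

adv, msgs = build_hash_chain(b"firmware image bytes ...")
assert verify_stream(adv, msgs)
```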
in this paper we describe two techniques for the efficient modularized implementation of large class of algorithms we illustrate these techniques using several examples including efficient generic unification algorithms that use reference cells to encode substitutions and highly modular language implementations we chose these examples to illustrate the following important techniques that we believe many functional programmers would find useful first defining recursive data types by splitting them into two levels structure defining level and recursive knot tying level second the use of rank polymorphism inside haskell’s record types to implement kind of type parameterized modules finally we explore techniques that allow us to combine already existing recursive haskell data types with the highly modular style of programming proposed here
we revisit the problem of detecting greedy behavior in the ieee mac protocol by evaluating the performance of two previously proposed schemes domino and the sequential probability ratio test sprt our evaluation is carried out in four steps we first derive new analytical formulation of the sprt that considers access to the wireless medium in discrete time slots then we introduce an analytical model for domino as third step we evaluate the theoretical performance of sprt and domino with newly introduced metrics that take into account the repeated nature of the tests this theoretical comparison provides two major insights into the problem it confirms the optimality of sprt and motivates us to define yet another test nonparametric cusum statistic that shares the same intuition as domino but gives better performance we finalize the paper with experimental results confirming the correctness of our theoretical analysis and validating the introduction of the new nonparametric cusum statistic
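For reference, a generic one-sided nonparametric CUSUM detector looks like the sketch below; the drift and threshold values, and the choice of per-observation statistic, are assumptions rather than the paper's exact definitions.

```python
# Minimal sketch of a one-sided nonparametric CUSUM test (generic form; the
# paper's statistic over 802.11 backoff observations may be defined
# differently). drift and threshold are illustrative tuning parameters.

def cusum_detector(observations, drift, threshold):
    """Yield (index, statistic, alarm) per observation; the statistic
    accumulates positive deviations above `drift` and raises an alarm when it
    exceeds `threshold`."""
    s = 0.0
    for i, x in enumerate(observations):
        s = max(0.0, s + x - drift)
        yield i, s, s > threshold

# example: x could be (expected_backoff - observed_backoff), which stays near
# zero for honest stations and grows positive for a greedy one
samples = [0.1, -0.2, 0.3, 0.8, 0.9, 1.1, 1.3]
for i, s, alarm in cusum_detector(samples, drift=0.2, threshold=2.0):
    if alarm:
        print("greedy behavior suspected at observation", i)
        break
```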
this paper describes an approach to building real time highly controllable characters kinematic character controller is built on the fly during capture session and updated after each new motion clip is acquired active learning is used to identify which motion sequence the user should perform next in order to improve the quality and responsiveness of the controller because motion clips are selected adaptively we avoid the difficulty of manually determining which ones to capture and can build complex controllers from scratch while significantly reducing the number of necessary motion samples
there are currently two approaches to providing byzantine fault tolerant state machine replication replica based approach eg bft that uses communication between replicas to agree on proposed ordering of requests and quorum based approach such as in which clients contact replicas directly to optimistically execute operations both approaches have shortcomings the quadratic cost of inter replica communication is unnecessary when there is no contention and requires large number of replicas and performs poorly under contention we present hq hybrid byzantine fault tolerant state machine replication protocol that overcomes these problems hq employs lightweight quorum based protocol when there is no contention but uses bft to resolve contention when it arises furthermore hq uses only replicas to tolerate faults providing optimal resilience to node failures we implemented prototype of hq and we compare its performance to bft and analytically and experimentally additionally in this work we use new implementation of bft designed to scale as the number of faults increases our results show that both hq and our new implementation of bft scale as increases additionally our hybrid approach of using bft to handle contention works well
although gender differences in technological world are receiving significant research attention much of the research and practice has aimed at how society and education can impact the successes and retention of female computer science professionals but the possibility of gender issues within software has received almost no attention if gender issues exist with some types of software features it is possible that accommodating them by changing these features can increase effectiveness but only if we know what these issues are in this paper we empirically investigate gender differences for end users in the context of debugging spreadsheets our results uncover significant gender differences in self efficacy and feature acceptance with females exhibiting lower self efficacy and lower feature acceptance the results also show that these differences can significantly reduce females effectiveness
trace ratio is natural criterion in discriminant analysis as it directly connects to the euclidean distances between training data points this criterion is re analyzed in this paper and fast algorithm is developed to find the global optimum for the orthogonal constrained trace ratio problem based on this problem we propose novel semi supervised orthogonal discriminant analysis via label propagation differing from the existing semi supervised dimensionality reduction algorithms our algorithm propagates the label information from the labeled data to the unlabeled data through specially designed label propagation and thus the distribution of the unlabeled data can be explored more effectively to learn better subspace extensive experiments on toy examples and real world applications verify the effectiveness of our algorithm and demonstrate much improvement over the state of the art algorithms
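For context, the orthogonal-constrained trace ratio criterion is usually written as below, with S_b and S_w the between-class and within-class scatter matrices and W the projection to be learned (notation assumed, not taken from the paper):

```latex
% Orthogonal-constrained trace ratio criterion (standard form; notation assumed):
% S_b = between-class scatter, S_w = within-class scatter, W = projection matrix.
\[
  W^{*} \;=\; \arg\max_{W^{\top} W = I}\;
  \frac{\operatorname{tr}\!\left(W^{\top} S_b\, W\right)}
       {\operatorname{tr}\!\left(W^{\top} S_w\, W\right)}
\]
```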
representation of features of items and user feedback and reasoning about their relationships are major problems in recommender systems this is because item features and user feedback are subjective imprecise and vague the paper presents fuzzy set theoretic method ftm for recommender systems that handles the non stochastic uncertainty induced from subjectivity vagueness and imprecision in the data and the domain knowledge and the task under consideration the research further advances the application of fuzzy modeling for content based recommender systems initially presented by ronald yager the paper defines representation method similarity measures and aggregation methods as well as empirically evaluates the methods performance through simulation using benchmark movie data ftm consists of representation method for items features and user feedback using fuzzy sets and content based algorithm based on various fuzzy set theoretic similarity measures the fuzzy set extensions of the jaccard index cosine proximity or correlation similarity measures and aggregation methods for computing recommendation confidence scores the maximum minimum or weighted sum fuzzy set theoretic aggregation methods compared to the baseline crisp set based method csm presented the empirical evaluation of the ftm using the movie data and simulation shows an improvement in precision without loss of recall moreover the paper provides guideline for recommender systems designers that will help in choosing from combination of one of the fuzzy set theoretic aggregation methods and similarity measures
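A minimal sketch of the fuzzy set theoretic similarity measures mentioned above, treating an item or a user profile as a fuzzy set (feature mapped to a membership degree); the exact feature space and the FTM aggregation rules are not reproduced here, so this is only illustrative.

```python
# Minimal sketch (illustrative): fuzzy-set extensions of the Jaccard and
# cosine similarity measures over items represented as fuzzy sets, i.e.,
# dicts mapping a feature (genre, keyword, ...) to a membership degree in [0, 1].

import math

def fuzzy_jaccard(a, b):
    features = set(a) | set(b)
    inter = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    union = sum(max(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    return inter / union if union else 0.0

def fuzzy_cosine(a, b):
    features = set(a) | set(b)
    dot = sum(a.get(f, 0.0) * b.get(f, 0.0) for f in features)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

item = {"comedy": 0.8, "romance": 0.4}
profile = {"comedy": 0.6, "action": 0.7}
# one possible "maximum" style aggregation of the two measures (illustrative only)
score = max(fuzzy_jaccard(item, profile), fuzzy_cosine(item, profile))
```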
we consider two classes of preemptive processor sharing scheduling policies in which the instantaneous weight given to customer depends on the amount of service already imparted to the customer age based scheduling or the amount of service yet to be imparted to the customer residual processing time rpt based scheduling we analyze the system for the mean sojourn time of tagged customer conditioned on its service requirement the main contributions of this article are i we decompose the sojourn time of customer into two parts the contribution of the descendants of the customer and the contribution of the jobs and their descendants which the customer sees on arrival for the preemptive system under consideration it is shown that the behavior of the mean sojourn time for large values of service requirement is determined only by the first term above ii we provide closed form expression for this component of the sojourn time for general age based and rpt based scheduling disciplines iii if the weight assigned to customer with an age units is alpha for some alpha we show that the behavior of mean sojourn time conditioned on service requirement is asymptotically linear for all le alpha le moreover this asymptotic slope is same for le alpha and shows discontinuity at alpha
typically in multimedia databases there exist two kinds of clues for query perceptive features and semantic classes in this paper we propose novel framework for multimedia databases index and retrieval integrating the perceptive features and semantic classes to improve the speed and the precision of the content based multimedia retrieval cbmr we develop semantics supervised clustering based index approach briefly as ssci the entire data set is divided hierarchically into many clusters until the objects within cluster are not only close in the perceptive feature space but also within the same semantic class and then an index term is built for each cluster especially the perceptive feature vectors in cluster are organized adjacently in disk so the ssci based nearest neighbor nn search can be divided into two phases first the indexes of all clusters are scanned sequentially to get the candidate clusters with the smallest distances from the query example second the original feature vectors within the candidate clusters are visited to get search results furthermore if the results are not satisfied the ssci supports an effective relevance feedback rf search users mark the positive and negative samples regarding cluster as unit instead of single object then the bayesian classifiers on perceptive features and that on semantics are used respectively to adjust retrieval similarity distance our experiments show that ssci based searching was faster than va based searching the quality of the search result based on ssci was better than that of the sequential search in terms of semantics and few cycles of the rf by the proposed approach can improve the retrieval precision significantly
one major challenge in communication networks is the problem of dynamically distributing load in the presence of bursty and hard to predict changes in traffic demands current traffic engineering operates on time scales of several hours which is too slow to react to phenomena like flash crowds or bgp reroutes one possible solution is to use load sensitive routing yet interacting routing decisions at short time scales can lead to oscillations which has prevented load sensitive routing from being deployed since the early experiences in arpanet however recent theoretical results have devised game theoretical re routing policy that provably avoids such oscillation and in addition can be shown to converge quickly in this paper we present replex distributed dynamic traffic engineering algorithm based on this policy exploiting the fact that most underlying routing protocols support multiple equal cost routes to destination it dynamically changes the proportion of traffic that is routed along each path these proportions are carefully adapted utilising information from periodic measurements and optionally information exchanged between the routers about the traffic condition along the path we evaluate the algorithm via simulations employing traffic loads that mimic actual web traffic bursty tcp traffic and whose characteristics are consistent with self similarity the simulations quickly converge and do not exhibit significant oscillations on both artificial as well as real topologies as can be expected from the theoretical results
peer to peer pp networks represent an effective way to share information since there are no central points of failure or bottleneck however the flip side to the distributive nature of pp networks is that it is not trivial to aggregate and broadcast global information efficiently we believe that this aggregation broadcast functionality is fundamental service that should be layered over existing distributed hash tables dhts and in this work we design novel algorithm for this purpose specifically we build an aggregation broadcast tree in bottom up fashion by mapping nodes to their parents in the tree with parent function the particular parent function family we propose allows the efficient construction of multiple interior node disjoint trees thus preventing single points of failure in tree structures in this way we provide dhts with an ability to collect and disseminate information efficiently on global scale simulation results demonstrate that our algorithm is efficient and robust
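To illustrate the parent-function idea, here is a hypothetical parent function (not the one proposed in the paper) that maps ring identifiers to parents and yields a k-ary aggregation/broadcast tree per chosen root:

```python
# Minimal sketch (hypothetical parent function, not the paper's): map every
# node identifier on a ring of size `ring` to a parent, forming a k-ary
# aggregation/broadcast tree rooted at `root`. Choosing different roots yields
# different trees, which is the spirit of building multiple trees to avoid
# single points of failure.

def parent(node_id, root, ring, k=2):
    rank = (node_id - root) % ring      # position of the node in this tree
    if rank == 0:
        return None                     # the root has no parent
    parent_rank = (rank - 1) // k       # standard k-ary heap parent rule
    return (parent_rank + root) % ring

# aggregation: each node sends its local value to parent(node, root, ring);
# broadcast: the root pushes data down the same edges in reverse
ring = 16
edges = {n: parent(n, root=5, ring=ring) for n in range(ring)}
```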
this paper proposes method which aims at increasing the efficiency of enterprise system implementations first we argue that existing process modeling languages that feature different degrees of abstraction for different user groups exist and are used for different purposes which makes it necessary to integrate them we describe how to do this using the meta models of the involved languages second we argue that an integrated process model based on the integrated meta model needs to be configurable and elaborate on the enabling mechanisms we introduce business example using sap modeling techniques to illustrate the proposed method
motivated by recent work of abiteboul vianu fordham and yesha we investigate the verifiability of transaction protocols specifying the interaction of multiple parties via network the protocols which we are concerned with typically occur in the context of electronic commerce applications and can be formalized as relational transducers we introduce class of powerful relational transducers based on gurevich’s abstract state machines and show that several verification problems related to electronic commerce applications are decidable for these transducers
the widespread availability of networked environments and the arrival of high speed networks have rekindled interest in the area of automatic data refresh update mechanisms in many application areas the updated information has limited period of usefulness therefore the development of systems and protocols that can handle such update tasks within predefined deadlines is required in this paper we propose and evaluate two real time update propagation mechanisms in client server environment the fundamental difference in these two time constrained techniques client push and server push is in the location where the push transactions are generated in both these techniques and in contrast to conventional methods we propose the transport of the scripts of updating transactions in order to make client cached data current this avoids unnecessary shipments of data over the network instead messages are used to maintain the consistency of cached data in addition the propagation of update transaction scripts to client sites is neither periodic nor mandatory but is instead based on client specific criteria these criteria depend on the content of the database objects being updated we carry out comprehensive experimental evaluation of the suggested methods as we examine the following aspects time constrained push scheduling issues effects of various workloads on real time push transaction completion rates efficiency and overheads imposed by push transactions on the regular transaction processing our experiments show that client push outperforms server push only for small number of clients the opposite is true once the load is increased by attaching large number of sites per server the efficiency of the update push protocols is as expected dependent on the load on the system as well as the percentage of updates to the database surprisingly the percentage of successfully completed real time push transactions is not affected very much by the strategy used to schedule them
this paper presents fourier descriptor based image alignment algorithm fdbia for applications of automatic optical inspection aoi performed in real time environment it deliberates component detection and contour tracing algorithms and uses the magnitude and phase information of fourier descriptors to establish correspondences between the target objects detected in the reference and the inspected images so the parameters for aligning the two images can be estimated accordingly to enhance the computational efficiency the proposed component detection and contour tracing algorithms use the run length encoding rle and blobs tables to represent the pixel information in the regions of interest the fourier descriptors derived from the component boundaries are used to match the target objects finally the transformation parameters for aligning the inspected image with the reference image are estimated based on novel phase shifted technique experimental results show that the proposed fdbia algorithm sustains similar accuracy as achieved by the commercial software easyfind against various rotation and translation conditions also the computational time consumed by the fdbia algorithm is significantly shorter than that by easyfind
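As background for the descriptor step, a minimal sketch of magnitude-based Fourier descriptors for a closed contour is given below; the FDBIA method additionally uses the phase information to recover rotation and establish correspondences, which this sketch omits.

```python
# Minimal sketch (illustrative): compute translation/scale-invariant Fourier
# descriptor magnitudes from a closed contour given as an (N, 2) array of
# (x, y) points; matching two detected components then reduces to comparing
# descriptor vectors.

import numpy as np

def fourier_descriptors(contour_xy, n_coeffs=10):
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # complex contour signal
    spectrum = np.fft.fft(z)
    spectrum[0] = 0.0                              # drop DC term -> translation invariance
    mags = np.abs(spectrum)
    if mags[1] > 0:
        mags = mags / mags[1]                      # normalize -> scale invariance
    return mags[1:n_coeffs + 1]                    # magnitudes are rotation/start-point invariant
```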
types or sorts are pervasive in computer science and in rewriting based programming languages which often support subtypes subsorts and subtype polymorphism programs in these languages can be modeled as order sorted term rewriting systems os trss often termination of such programs heavily depends on sort information but few techniques are currently available for proving termination of os trss and they often fail for interesting os trss in this paper we generalize the dependency pairs approach to prove termination of os trss preliminary experiments suggest that this technique can succeed where existing ones fail yielding easier and simpler termination proofs
the security policy of an information system may include wide range of different requirements the literature has primarily focused on access and information flow control requirements and more recently on authentication and usage control requirements specifying administration and delegation policies is also an important issue especially in the context of pervasive distributed systems in this paper we are investigating the new issue of modelling intrusion detection and reaction policies and study the appropriateness of using deontic logic for this purpose we analyze how intrusion detection requirements may be specified to face known intrusions but also new intrusions in the case of new intrusions we suggest using the bring it about modality and specifying requirements as prohibitions to bring it about that some security objectives are violated when some intrusions occur the security policy to be complete should specify what happens in this case this is what we call reaction policy the paper shows that this part of the policy corresponds to contrary to duty requirements and suggests an approach based on assigning priority to activation contexts of security requirements
shared variable construction is called buffer based if the values of the variable are stored in buffers that are different from control storage each buffer stores only single value from the domain of the variable buffer based construction is conflict free if in each execution of the shared variable no reading of any buffer overlaps with any writing of that buffer this paper studies shared space requirements for wait free conflict free deterministic constructions of writer reader multivalued atomic variables from safe variables that four buffers are necessary and sufficient for such constructions has been established in the literature this paper establishes the requirement for control storage the least shared space for such construction in the literature is four safe buffers and four safe control bits we show that four safe control bits are necessary for such constructions when the reader is restricted to read at most one buffer in each read operation
path method is used as mechanism in object oriented databases oodbs to retrieve or to update information relevant to one class that is not stored with that class but with some other class path method is method which traverses from one class through chain of connections between classes and accesses information at another class however it is difficult task for casual user or even an application programmer to write path methods to facilitate queries this is because it might require comprehensive knowledge of many classes of the conceptual schema that are not directly involved in the query and therefore may not even be included in user’s incomplete view about the contents of the database we have developed system called path method generator pmg which generates path methods automatically according to user’s database manipulating requests the pmg offers the user one of the possible path methods and the user verifies from his knowledge of the intended purpose of the request whether that path method is the desired one if the path method is rejected then the user can utilize his now increased knowledge about the database to request with additional parameters given another offer from the pmg the pmg is based on access weights attached to the connections between classes and precomputed access relevance between every pair of classes of the oodb specific rules for access weight assignment and algorithms for computing access relevance appeared in our previous papers mgpf mgpf mgpf in this paper we present variety of traversal algorithms based on access weights and precomputed access relevance experiments identify some of these algorithms as very successful in generating most desired path methods the pmg system utilizes these successful algorithms and is thus an efficient tool for aiding the user with the difficult task of querying and updating large oodb
performance simulation of software for multiprocessor system on chips mpsoc suffers from poor tool support cycle accurate simulation at instruction set simulation level is too slow and inefficient for any design of realistic size behavioral simulation though useful for functional analysis at high level does not provide any performance information that is crucial for design and analysis of mpsoc implementations as consequence designers are often reduced to manually annotate performance information onto behavioral models which contributes further to inefficiency and inaccuracy in this paper we use structural performance models to provide fast and accurate simulation of software for mpsoc we generate structural models automatically using gcc with accurate performance annotation while considering optimizations for instruction selection branch prediction and pipeline interlock our structural models are able to simulate at several orders of magnitude faster than iss and provide less than error on performance estimation these models allow realistic mpsoc design space explorations based on performance characteristics with simulation speed comparable to behavioral simulation we validate our simulation models with several benchmarks and demonstrate our approach with design case study of an mpeg decoder
visualisations of complex interrelationships have the potential to be complex and require lot of cognitive input we have drawn analogues from natural systems to create new visualisation approaches that are more intuitive and easier to work with we use nature inspired concepts to provide cognitive amplification moving the load from the user’s cognitive to their perceptual systems and thus allowing them to focus their cognitive resources where they are most appropriate two systems are presented one uses physically based model to construct the visualisation while the other uses biological inspiration their application to four visualisation tasks is discussed the structure of information browsing on the internet the structure of parts of the web itself to aid the refinement of queries to digital library and to compare different documents for similar content
many applications in geometric modeling computer graphics visualization and computer vision benefit from reduced representation called curve skeletons of shape these are curves possibly with branches which compactly represent the shape geometry and topology the lack of proper mathematical definition has been bottleneck in developing and applying the curve skeletons set of desirable properties of these skeletons has been identified and the existing algorithms try to satisfy these properties mainly through procedural definition we define function called medial geodesic on the medial axis which leads to mathematical definition and an approximation algorithm for curve skeletons empirical study shows that the algorithm is robust against noise operates well with single user parameter and produces curve skeletons with the desirable properties moreover the curve skeletons can be associated with additional attributes that follow naturally from the definition these attributes capture shape eccentricity local measure of how far shape is away from tubular one
knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems in this paper we investigate the application of discovered knowledge concept hierarchies and knowledge discovery tools for intelligent query answering in database systems knowledge rich data model is constructed to incorporate discovered knowledge and knowledge discovery tools queries are classified into data queries and knowledge queries both types of queries can be answered directly by simple retrieval or intelligently by analyzing the intent of query and providing generalized neighborhood or associated information using stored or discovered knowledge techniques have been developed for intelligent query answering using discovered knowledge and or knowledge discovery tools which includes generalization data summarization concept clustering rule discovery query rewriting deduction lazy evaluation application of multiple layered databases etc our study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications on query answering in data and knowledge base systems
"divide and conquer" strategy to compute natural joins by sequential scans on unordered relations is described this strategy is shown to always be better than merging scans when both relations must be sorted before joining and generally better in practical cases when only the largest relation must be sorted
descriptive complexity approach to random sat is initiated we show that unsatisfiability of any significant fraction of random cnf formulas cannot be certified by any property that is expressible in datalog combined with the known relationship between the complexity of constraint satisfaction problems and expressibility in datalog our result implies that any constraint propagation algorithm working with small constraints will fail to certify unsatisfiability almost always our result is consequence of designing winning strategy for one of the players in the existential pebble game the winning strategy makes use of certain extension axioms that we introduce and hold almost surely on random cnf formula the second contribution of our work is the connection between finite model theory and propositional proof complexity to make this connection explicit we establish tight relationship between the number of pebbles needed to win the game and the width of the resolution refutations as consequence to our result and the known size width relationship in resolution we obtain new proofs of the exponential lower bounds for resolution refutations of random cnf formulas and the pigeonhole principle
disruption tolerant networks dtns technologies are emerging solutions to networks that experience frequent partitions as result multicast design in dtns is considerably more difficult problem compared to that in internet and mobile ad hoc networks in this paper we first investigate three basic dtn multicast strategies including unicast based multicast multicast static tree based multicast st multicast and dynamic tree based multicast dt multicast strategies then we focus on studying two dt multicast routing schemes dynamic tree based routing dtbr and on demand situation aware multicast os multicast which address the challenges of utilizing opportunistic links to conduct dynamic multicast structures in dtns performances of different strategies are then evaluated by simulations including applying the real world dtn traces our results show that os multicast and dtbr can achieve higher message delivery ratio than that of using multicast and st multicast strategies also to get better performance we recommend that system designers select os multicast when the source traffic rate is low
mobile ad hoc network manet is group of mobile nodes which communicates with each other without any supporting infrastructure routing in manet is extremely challenging because of manets dynamic features its limited bandwidth and power energy nature inspired algorithms swarm intelligence such as ant colony optimization aco algorithms have shown to be good technique for developing routing algorithms for manets swarm intelligence is computational intelligence technique that involves collective behavior of autonomous agents that locally interact with each other in distributed environment to solve given problem in the hope of finding global solution to the problem in this paper we propose hybrid routing algorithm for manets based on aco and zone routing framework of bordercasting the algorithm hopnet based on ants hopping from one zone to the next consists of the local proactive route discovery within node’s neighborhood and reactive communication between the neighborhoods the algorithm has features extracted from zrp and dsr protocols and is simulated on glomosim and is compared to aodv routing protocol the algorithm is also compared to the well known hybrid routing algorithm anthocnet which is not based on zone routing framework results indicate that hopnet is highly scalable for large networks compared to anthocnet the results also indicate that the selection of the zone radius has considerable impact on the delivery packet ratio and hopnet performs significantly better than anthocnet for high and low mobility the algorithm has been compared to random way point model and random drunken model and the results show the efficiency and inefficiency of bordercasting finally hopnet is compared to zrp and the strength of nature inspired algorithm is shown
access to legal information and in particular to legal literature is examined in conjunction with the creation of portal to italian legal doctrine the design and implementation of services such as integrated access to wide range of resources are described with particular focus on the importance of exploiting metadata assigned to disparate legal material the integration of structured repositories and web documents is the main purpose of the portal it is constructed on the basis of federation system with service provider functions aiming at creating centralized index of legal resources the index is based on uniform metadata view created for structured data by means of the oai approach and for web documents by machine learning approach subject searching is major requirement for legal literature users and solution based on the exploitation of dublin core metadata as well as the use of legal ontologies and related terms prepared for accessing indexed articles have been implemented
increasingly software must dynamically adapt its behavior in response to changes in the supporting computing communication infrastructure and in the surrounding physical environment assurance that the adaptive software correctly satisfies its requirements is crucial if the software is to be used in high assurance systems such as command and control or critical infrastructure protection systems adaptive software development for these systems must be grounded upon formalism and rigorous software engineering methodology to gain assurance in this paper we briefly describe amoeba rt run time monitoring and verification technique that provides assurance that dynamically adaptive software satisfies its requirements
as sensor networks are deployed over various terrains the complexity of their topology continues to grow voids in networks often cause existing geographic routing algorithms to fail in this paper we introduce novel concept virtual position to address this issue virtual position is the middle position of all direct neighbors of node such virtual position reflects the neighborhood of sensor node as well as the tendency of further forwarding instead of comparing nodes real geographic positions virtual positions are compared when selecting the next hop for sparsely deployed networks this technique increases success rate of packet routing without introducing significant overhead we here present an algorithm using this foundation concept and then design several enhanced versions to improve success rate of packet routing in sensor networks we also conduct complexity analysis of the algorithms and support our claims of the algorithms superiority with extensive simulation results
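A minimal sketch of the basic virtual-position rule, assuming each candidate neighbor advertises the positions of its own neighbors; the enhanced versions described in the paper refine this selection.

```python
# Minimal sketch (illustrative): a node's virtual position is the centroid of
# its direct neighbors' positions, and the next hop is the neighbor whose
# virtual position is closest to the destination.

import math

def virtual_position(neighbor_positions):
    xs = [p[0] for p in neighbor_positions]
    ys = [p[1] for p in neighbor_positions]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def next_hop(neighbors, destination):
    """neighbors: dict candidate_id -> list of that candidate's own neighbor positions."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return min(neighbors,
               key=lambda n: dist(virtual_position(neighbors[n]), destination))
```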
geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing query on particular geographic region geographic search technology also called local search has recently received significant interest from major search engine companies academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web in this paper we study the problem of efficient query processing in scalable geographic search engines query processing is major bottleneck in standard web search engines and the main reason for the thousands of machines used by the major engines geographic search engine query processing is different in that it requires combination of text and spatial data processing techniques we propose several algorithms for efficient query processing in geographic search engines integrate them into an existing web search query processor and evaluate them on large sets of real data and query traces
scratch pad memories spms enable fast access to time critical data while prior research studied both static and dynamic spm management strategies not being able to keep all hot data ie data with high reuse in the spm remains the biggest problem this paper proposes data compression to increase the number of data blocks that can be kept in the spm our experiments with several embedded applications show that our compression based spm management heuristic is very effective and outperforms prior static and dynamic spm management approaches we also present an ilp formulation of the problem and show that the proposed heuristic generates competitive results with those obtained through ilp while spending much less time in compilation
in this paper we describe the design and implementation of static array bound checker for family of embedded programs the flight control software of recent mars missions these codes are large up to kloc pointer intensive heavily multithreaded and written in an object oriented style which makes their analysis very challenging we designed tool called global surveyor cgs that can analyze the largest code in couple of hours with precision of the scalability and precision of the analyzer are achieved by using an incremental framework in which pointer analysis and numerical analysis of array indices mutually refine each other cgs has been designed so that it can distribute the analysis over several processors in cluster of machines to the best of our knowledge this is the first distributed implementation of static analysis algorithms throughout the paper we will discuss the scalability setbacks that we encountered during the construction of the tool and their impact on the initial design decisions
similarity calculations and document ranking form the computationally expensive parts of query processing in ranking based text retrieval in this work for these calculations alternative implementation techniques are presented under four different categories and their asymptotic time and space complexities are investigated to our knowledge six of these techniques are not discussed in any other publication before furthermore analytical experiments are carried out on gb document collection to evaluate the practical performance of different implementations in terms of query processing time and space consumption advantages and disadvantages of each technique are illustrated under different querying scenarios and several experiments that investigate the scalability of the implementations are presented
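For orientation, one common baseline among such implementations is term-at-a-time similarity accumulation with heap-based top-k selection, sketched below; the specific alternative techniques compared in the paper are not reproduced here.

```python
# Minimal sketch of one baseline implementation (not the paper's specific
# techniques): term-at-a-time similarity accumulation over inverted lists,
# followed by heap selection of the top-k documents.

import heapq

def rank(query_terms, inverted_index, k=10):
    """inverted_index: term -> list of (doc_id, term_weight) postings.
    Query-term weights are assumed to be 1 for simplicity."""
    accumulators = {}
    for term in query_terms:
        for doc_id, weight in inverted_index.get(term, []):
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + weight
    # top-k selection without sorting all accumulators
    return heapq.nlargest(k, accumulators.items(), key=lambda kv: kv[1])

index = {"query": [(1, 0.5), (3, 0.2)], "processing": [(1, 0.25), (2, 0.7)]}
print(rank(["query", "processing"], index, k=2))   # -> [(1, 0.75), (2, 0.7)]
```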
in this paper we introduce solution for relational database content rights protection through watermarking rights protection for relational data is of ever increasing interest especially considering areas where sensitive valuable content is to be outsourced good example is data mining application where data is sold in pieces to parties specialized in mining it different avenues are available each with its own advantages and drawbacks enforcement by legal means is usually ineffective in preventing theft of copyrighted works unless augmented by digital counterpart for example watermarking while being able to handle higher level semantic constraints such as classification preservation our solution also addresses important attacks such as subset selection and random and linear data changes we introduce wmdb proof of concept implementation and its application to real life data namely in watermarking the outsourced wal mart sales data that we have available at our institute
cross lingual information retrieval allows users to query mixed language collections or to probe for documents written in an unfamiliar language major difficulty for cross lingual information retrieval is the detection and translation of out of vocabulary oov terms for oov terms in chinese another difficulty is segmentation at ntcir we explored methods for translation and disambiguation for oov terms when using chinese query on an english collection we have developed new segmentation free technique for automatic translation of chinese oov terms using the web we have also investigated the effects of distance factor and window size when using hidden markov model to provide disambiguation our experiments show these methods significantly improve effectiveness in conjunction with our post translation query expansion technique effectiveness approaches that of monolingual retrieval
recent research efforts on spoken document retrieval have tried to overcome the low quality of best automatic speech recognition transcripts especially in the case of conversational speech by using statistics derived from speech lattices containing multiple transcription hypotheses as output by speech recognizer we present method for lattice based spoken document retrieval based on statistical gram modeling approach to information retrieval in this statistical lattice based retrieval slbr method smoothed statistical model is estimated for each document from the expected counts of words given the information in lattice and the relevance of each document to query is measured as probability under such model we investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines using the fisher english corpus of conversational telephone speech experimental results show that our method consistently achieves better retrieval performance than using only the best transcripts in statistical retrieval outperforms recently proposed lattice based vector space retrieval method and also compares favorably with lattice based retrieval method based on the okapi bm model
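A minimal sketch of the document-scoring step under the statistical lattice-based retrieval idea, assuming the expected word counts have already been computed from the lattice and using Jelinek-Mercer smoothing as an illustrative choice (the paper's smoothing may differ):

```python
# Minimal sketch (assumed interface): score documents by query likelihood
# under a unigram model estimated from expected word counts taken from the
# recognition lattice, smoothed against a collection model. Computing the
# expected counts from the lattice itself is omitted.

import math

def score(query_words, doc_expected_counts, collection_probs, lam=0.8):
    doc_total = sum(doc_expected_counts.values())
    log_prob = 0.0
    for w in query_words:
        p_doc = doc_expected_counts.get(w, 0.0) / doc_total if doc_total else 0.0
        p_col = collection_probs.get(w, 1e-9)
        log_prob += math.log(lam * p_doc + (1.0 - lam) * p_col)
    return log_prob

# expected counts are fractional because they are summed over lattice paths
doc = {"flight": 1.7, "delay": 0.4, "weather": 2.1}
collection = {"flight": 0.001, "delay": 0.0005, "weather": 0.002}
print(score(["flight", "delay"], doc, collection))
```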
constraint based approach to invariant generation in programs translates program into constraints that are solved using off the shelf constraint solvers to yield desired program invariants in this paper we show how the constraint based approach can be used to model wide spectrum of program analyses in an expressive domain containing disjunctions and conjunctions of linear inequalities in particular we show how to model the problem of context sensitive interprocedural program verification we also present the first constraint based approach to weakest precondition and strongest postcondition inference the constraints we generate are boolean combinations of quadratic inequalities over integer variables we reduce these constraints to sat formulae using bitvector modeling and use off the shelf sat solvers to solve them furthermore we present interesting applications of the above analyses namely bounds analysis and generation of most general counter examples for both safety and termination properties we also present encouraging preliminary experimental results demonstrating the feasibility of our technique on variety of challenging examples
the main strengths of collaborative filtering cf the most successful and widely used filtering technique for recommender systems are its cross genre or outside the box recommendation ability and that it is completely independent of any machine readable representation of the items being recommended however cf suffers from sparsity scalability and loss of neighbor transitivity cf techniques are either memory based or model based while the former is more accurate its scalability compared to model based is poor an important contribution of this paper is hybrid fuzzy genetic approach to recommender systems that retains the accuracy of memory based cf and the scalability of model based cf using hybrid features novel user model is built that helped in achieving significant reduction in system complexity sparsity and made the neighbor transitivity relationship hold the user model is employed to find set of like minded users within which memory based search is carried out this set is much smaller than the entire set thus improving system’s scalability besides our proposed approaches are scalable and compact in size computational results reveal that they outperform the classical approach
this paper presents the first experiments with an intelligent tutoring system in the domain of linked lists fundamental topic in computer science the system has been deployed in an introductory college level computer science class and engendered significant learning gains constraint based approach has been adopted in the design and implementation of the system we describe the system architecture its current functionalities and the future directions of its development
long standing research problem in computer graphics is to reproduce the visual experience of walking through large photorealistic environment interactively on one hand traditional geometry based rendering systems fall short of simulating the visual realism of complex environment on the other hand image based rendering systems have to date been unable to capture and store sampled representation of large environment with complex lighting and visibility effects in this paper we present sea of images practical approach to dense sampling storage and reconstruction of the plenoptic function in large complex indoor environments we use motorized cart to capture omnidirectional images every few inches on eye height plane throughout an environment the captured images are compressed and stored in multiresolution hierarchy suitable for real time prefetching during an interactive walkthrough later novel images are reconstructed for simulated observer by resampling nearby captured images our system acquires images over square feet at an average image spacing of inches the average capture and processing time is hours we demonstrate realistic walkthroughs of real world environments reproducing specular reflections and occlusion effects while rendering frames per second
multi touch interaction has received considerable attention in the last few years in particular for natural two dimensional interaction however many application areas deal with three dimensional data and require intuitive interaction techniques therefore indeed virtual reality vr systems provide sophisticated user interface but then lack efficient interaction and are therefore rarely adopted by ordinary users or even by experts since multi touch interfaces represent good trade off between intuitive constrained interaction on touch surface providing tangible feedback and unrestricted natural interaction without any instrumentation they have the potential to form the foundation of the next generation user interface for as well as interaction in particular stereoscopic display of data provides an additional depth cue but until now the challenges and limitations for multi touch interaction in this context have not been considered in this paper we present new multi touch paradigms and interactions that combine both traditional interaction and novel interaction on touch surface to form new class of multi touch systems which we refer to as interscopic multi touch surfaces imuts we discuss imuts based user interfaces that support interaction with content displayed in monoscopic mode and content usually displayed stereoscopically in order to underline the potential of the proposed imuts setup we have developed and evaluated two example interaction metaphors for different domains first we present intuitive navigation techniques for virtual city models and then we describe natural metaphor for deforming volumetric datasets in medical context
volumetric data such as output from ct scans or laser range scan processing methods often have isosurfaces that contain topological noise small handles and holes that are not present in the original model because this noise can significantly degrade the performance of other geometric processing algorithms we present volumetric method that removes the topological noise and patches holes in undefined regions for given isovalue we start with surface completely inside the isosurface of interest and surface completely outside the isosurface these surfaces are expanded and contracted respectively on voxel by voxel basis changes in topology of the surfaces are prevented at every step using local topology test the result is pair of surfaces that accurately reflect the geometry of the model but have simple topology we represent the volume in an octree format for improved performance in space and time
the crucial issue in many classification applications is how to achieve the best possible classifier with limited number of labeled data for training training data selection is one method which addresses this issue by selecting the most informative data for training in this work we propose three data selection mechanisms based on fuzzy clustering method center based selection border based selection and hybrid selection center based selection selects the samples with high degree of membership in each cluster as training data border based selection selects the samples around the border between clusters hybrid selection is the combination of center based selection and border based selection compared with existing work our methods do not require much computational effort moreover they are independent with respect to the supervised learning algorithms and initial labeled data we use fuzzy means to implement our data selection mechanisms the effects of them are empirically studied on set of uci data sets experimental results indicate that compared with random selection hybrid selection can effectively enhance the learning performance in all the data sets center based selection shows better performance in certain data sets border based selection does not show significant improvement
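a minimal python sketch of the three selection mechanisms is given below assuming a membership matrix U from a fuzzy clustering run is already available the threshold names m_high and margin are hypothetical parameters not taken from the paper

    import numpy as np

    def center_based(U, m_high=0.8):
        """Select samples whose largest cluster membership is at least m_high."""
        return np.where(U.max(axis=1) >= m_high)[0]

    def border_based(U, margin=0.1):
        """Select samples whose two largest memberships differ by at most margin."""
        top2 = np.sort(U, axis=1)[:, -2:]          # two largest memberships per sample
        return np.where(top2[:, 1] - top2[:, 0] <= margin)[0]

    def hybrid(U, m_high=0.8, margin=0.1):
        """Union of center-based and border-based selections."""
        return np.union1d(center_based(U, m_high), border_based(U, margin))

    # toy membership matrix (rows sum to 1), e.g. produced by a fuzzy clustering run
    U = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.48, 0.52]])
    print(center_based(U), border_based(U), hybrid(U))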
this paper examines strategic arrangement of fact data in data warehouse in order to answer analytical queries efficiently usually the composite of foreign keys from dimension tables is defined as the fact table’s primary key we focus on analytical queries that specify value for randomly chosen foreign key the desired data for answering query are typically located at different parts of the disk thus requiring multiple disk ios to read them from disk to memory we formulate cost model to express the expected time to read the desired data as function of disk system’s parameters seek time rotational latency and reading speed and the lengths of foreign keys for predetermined disk page size we search for an arrangement of the fact data that minimizes the expected time cost an algorithm is then provided for identifying the most desirable disk page size finally we present heuristic for answering complex queries that specify values for multiple foreign keys
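in my own notation a hedged sketch of such an expected time cost model is

    E[T] \;\approx\; n_{pages} \cdot ( t_{seek} + t_{rot} ) \;+\; \frac{n_{pages} \cdot S_{page}}{r_{read}}

where n_{pages} is the expected number of disk pages touched by a query on a randomly chosen foreign key value itself a function of the data arrangement the page size S_{page} and the foreign key lengths t_{seek} is the average seek time t_{rot} the average rotational latency and r_{read} the reading speed the arrangement and then the page size are chosen to minimize this expectation the paper’s exact model may differ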
cross language text categorization is the task of exploiting labelled documents in source language eg english to classify documents in target language eg chinese in this paper we focus on investigating the use of bilingual lexicon for cross language text categorization to this end we propose novel refinement framework for cross language text categorization the framework consists of two stages in the first stage cross language model transfer is proposed to generate initial labels of documents in target language in the second stage expectation maximization algorithm based on naive bayes model is introduced to yield resulting labels of documents preliminary experimental results on collected corpora show that the proposed framework is effective
this paper presents power constrained test scheduling method for multi clock domain socs that consist of cores operating at different clock frequencies during test in the proposed method we utilize virtual tam to bridge the frequency gaps between cores and the ate moreover we present technique to reduce power consumption of cores during test while the test time of the cores remains the same or increases only slightly by using virtual tam experimental results show the effectiveness of the proposed method
techniques for test case prioritization re order test cases to increase their rate of fault detection when there is fixed time budget that does not allow the execution of all the test cases time aware techniques for test case prioritization may achieve better rate of fault detection than traditional techniques for test case prioritization in this paper we propose novel approach to time aware test case prioritization using integer linear programming to evaluate our approach we performed experiments on two subject programs involving four techniques for our approach two techniques for an approach to time aware test case prioritization based on genetic algorithms and four traditional techniques for test case prioritization the empirical results indicate that two of our techniques outperform all the other techniques for the two subjects under the scenarios of both general and version specific prioritization the empirical results also indicate that some traditional techniques with lower analysis time cost for test case prioritization may still perform competitively when the time budget is not quite tight
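as a hedged illustration of how a time budgeted prioritization or selection step can be cast as an integer linear program consider the following formulation written in my own notation the paper’s actual variables and objective may differ

    \max \sum_j w_j\, y_j \quad \text{s.t.} \quad \sum_i t_i\, x_i \le B \qquad y_j \le \sum_{i\ \mathrm{covers}\ j} x_i \qquad x_i, y_j \in \{0,1\}

here x_i selects test case i with running time t_i B is the time budget and y_j indicates that coverage unit j used as a proxy for fault detection is exercised by some selected test so the program maximizes the weighted covered units within the budget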
we have developed new technique for evaluating cache coherent shared memory computers the wisconsin wind tunnel wwt runs parallel shared memory program on parallel computer cm and uses execution driven distributed discrete event simulation to accurately calculate program execution time wwt is virtual prototype that exploits similarities between the system under design the target and an existing evaluation platform the host the host directly executes all target program instructions and memory references that hit in the target cache wwt’s shared memory uses the cm memory’s error correcting code ecc as valid bits for fine grained extension of shared virtual memory only memory references that miss in the target cache trap to wwt which simulates cache coherence protocol wwt correctly interleaves target machine events and calculates target program execution time wwt runs on parallel computers with greater speed and memory capacity than uniprocessors wwt’s simulation time decreases as target system size increases for fixed size problems and holds roughly constant as the target system and problem scale
the problem of similarity search query by content has attracted much research interest it is difficult problem because of the inherently high dimensionality of the data the most promising solutions involve performing dimensionality reduction on the data then indexing the reduced data with multidimensional index structure many dimensionality reduction techniques have been proposed including singular value decomposition svd the discrete fourier transform dft the discrete wavelet transform dwt and piecewise polynomial approximation in this work we introduce novel framework for using ensembles of two or more representations for more efficient indexing the basic idea is that instead of committing to single representation for an entire dataset different representations are chosen for indexing different parts of the database the representations are chosen based upon local view of the database for example sections of the data that can achieve high fidelity representation with wavelets are indexed as wavelets but highly spectral sections of the data are indexed using the fourier transform at query time it is necessary to search several small heterogeneous indices rather than one large homogeneous index as we will theoretically and empirically demonstrate this results in much faster query response times
tracking the changing dynamics of object oriented frameworks design patterns architectural styles and subsystems during the development and reuse cycle can aid producing complex systems unfortunately current object oriented programming tools are relatively oblivious to the rich architectural abstractions in system this paper shows that architecture oriented visualization the graphical presentation of system statics and dynamics in terms of its architectural abstractions is highly beneficial in designing complex systems in addition the paper presents architecture aware instrumentation new technique for building efficient on line instrumentation to support architectural queries we demonstrate the effectiveness and performance of the scheme with case studies in the design of the choices object oriented operating system
personalized graphical user interfaces have the potential to reduce visual complexity and improve interaction efficiency by tailoring elements such as menus and toolbars to better suit an individual user’s needs when an interface is personalized to make useful features more accessible for user’s current task however there may be negative impact on the user’s awareness of the full set of available features making future tasks more difficult to assess this tradeoff we introduce awareness as an evaluation metric to be used in conjunction with performance we then discuss three studies we have conducted which show that personalized interfaces tradeoff awareness of unused features for performance gains on core tasks the first two studies previously published and presented only in summary demonstrate this tradeoff by measuring awareness using recognition test of unused features in the interface the studies also evaluated two different types of personalized interfaces layered interfaces approach and an adaptive split menu approach the third study presented in full focuses on adaptive split menus and extends results from the first two studies to show that different levels of awareness also correspond to an impact on performance when users are asked to complete new tasks based on all three studies and survey of related work we outline design space of personalized interfaces and present several factors that could affect the tradeoff between core task performance and awareness finally we provide set of design implications that should be considered for personalized interfaces
this paper investigates the problem of establishing trust in service oriented environments we focus on providing an infrastructure for evaluating the credibility of raters in reputation based framework that would enable trust based web services interactions the techniques we develop would aid service consumer in assigning an appropriate weight to the testimonies of different raters regarding prospective service provider the experimental analysis shows that the proposed techniques successfully dilute the effects of malicious ratings
batched stream processing is new distributed data processing paradigm that models recurring batch computations on incrementally bulk appended data streams the model is inspired by our empirical study on trace from large scale production data processing cluster it allows set of effective query optimizations that are not possible in traditional batch processing model we have developed query processing system called comet that embraces batched stream processing and integrates with dryadlinq we used two complementary methods to evaluate the effectiveness of optimizations that comet enables first prototype system deployed on node cluster shows an reduction of over using our benchmark second when applied to real production trace covering over million machine hours our simulator shows an estimated saving of over
analytic models based on discrete time markov chains dtmc are proposed to assess the algorithmic performance of software transactional memory tm systems base stm variants are compared optimistic stm with inplace memory updates and write buffering and pessimistic stm starting from an absorbing dtmc closed form analytic expressions are developed which are quickly solved iteratively to determine key parameters of the considered stm systems like the mean number of transaction restarts and the mean transaction length since the models reflect complex transactional behavior in terms of read write locking data consistency checks and conflict management independent of implementation details they highlight the algorithmic performance advantages of one system over the other which due to their at times small differences are often blurred by implementation of stm systems and even difficult to discern with statistically significant discrete event simulations
the hot set model characterizing the buffer requirements of relational queries is presented this model allows the system to determine the optimal buffer space to be allocated to query it can also be used by the query optimizer to derive efficient execution plans accounting for the available buffer space and by query scheduler to prevent thrashing the hot set model is compared with the working set model simulation study is presented
php is popular language for server side applications in php assignment to variables copies the assigned values according to its so called copy on assignment semantics in contrast typical php implementation uses copy on write scheme to reduce the copy overhead by delaying copies as much as possible this leads us to ask if the semantics and implementation of php coincide and actually this is not the case in the presence of sharings within values in this paper we describe the copy on assignment semantics with three possible strategies to copy values containing sharings the current php implementation has inconsistencies with these semantics caused by its naïve use of copy on write we fix this problem by the novel mostly copy on write scheme making the copy on write implementations faithful to the semantics we prove that our copy on write implementations are correct using bisimulation with the copy on assignment semantics
repairing database means bringing the database in accordance with given set of integrity constraints by applying some minimal change if database can be repaired in more than one way then the consistent answer to query is defined as the intersection of the query answers on all repaired versions of the database earlier approaches have confined the repair work to deletions and insertions of entire tuples we propose theoretical framework that also covers updates as repair primitive update based repairing is interesting in that it allows rectifying an error within tuple without deleting the tuple thereby preserving consistent values in the tuple another novel idea is the construct of nucleus single database that yields consistent answers to class of queries without the need for query rewriting we show the construction of nuclei for full dependencies and conjunctive queries consistent query answering and constructing nuclei is generally intractable under update based repairing nevertheless we also show some tractable cases of practical interest
iterative algorithms are often used for range image matching in this paper we treat the iterative process of range image matching as live biological system evolving from one generation to another whilst different generations of the population are regarded as range images captured at different viewpoints the iterative process is simulated using time the well known replicator equations in theoretical biology are then adapted to estimate the probabilities of possible correspondences established using the traditional closest point criterion to reduce the effect of image resolutions on the final results for efficient and robust overlapping range image matching the relative fitness difference rather than the absolute fitness difference is employed in the replicator equations in order to model the probability change of possible correspondences being real over successive iterations the fitness of possible correspondence is defined as the negative of power of its squared euclidean distance while the replicator dynamics penalize those individuals with low fitness they are further penalised with parameter since distant points are often unlikely to represent their real replicators while the replicator equations assume that all individuals are equally likely to meet each other and thus treat them equally we penalise those individuals competing for the same points as their possible replicators the estimated probabilities of possible correspondences being real are finally embedded into the powerful deterministic annealing scheme for global optimization resulting in the camera motion parameters being estimated in the weighted least squares sense comparative study based on real range images with partial overlap has shown that the proposed algorithm is promising for automatic matching of overlapping range images
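for reference a hedged sketch of the standard discrete replicator dynamics that the abstract adapts is written below in my own notation the paper’s relative fitness variant and its extra penalty terms are not reproduced here

    p_i(t+1) \;=\; p_i(t)\, \frac{f_i(t)}{\sum_j p_j(t)\, f_j(t)}

where p_i is the probability that candidate correspondence i is real and f_i is its fitness a decreasing function of the squared euclidean distance between the matched points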
program analysis and program optimization of java programs require reference information that estimates the instances of classes that may be accessed through dereferences recent work has presented several approaches for adapting andersen’s algorithm the most precise flow insensitive and context insensitive points to analysis algorithm developed for for analyzing java programs eg studies in our previous work indicate that this algorithm may compute very imprecise reference information for java programs
dynamic evolution can be used to upgrade distributed applications without shutdown and restart as way of improving service levels while minimising the loss of business revenue caused by the downtime an evaluation framework assessing the level of support offered by existing methodologies in composition based application eg component based and service oriented development is proposed it was developed by an analysis of the literature and existing methodologies together with refinement based on survey of experienced practitioners and researchers the use of the framework is demonstrated by applying it to twelve methodologies to assess their support for dynamic evolution
query processing is major cost factor in operating large web search engines in this paper we study query result caching one of the main techniques used to optimize query processing performance our first contribution is study of result caching as weighted caching problem most previous work has focused on optimizing cache hit ratios but given that processing costs of queries can vary very significantly we argue that total cost savings also need to be considered we describe and evaluate several algorithms for weighted result caching and study the impact of zipf based query distributions on result caching our second and main contribution is new set of feature based cache eviction policies that achieve significant improvements over all previous methods substantially narrowing the existing performance gap to the theoretically optimal clairvoyant method finally using the same approach we also obtain performance gains for the related problem of inverted list caching
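a toy python sketch of the cost aware idea behind weighted result caching is given below it evicts the cached result whose estimated benefit observed frequency times processing cost is smallest this is an illustrative stand in and not the feature based eviction policies proposed in the paper

    class WeightedResultCache:
        """Toy cost-aware result cache: evicts the entry whose estimated benefit
        (observed frequency * query processing cost) is smallest."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = {}          # query -> (result, processing cost)
            self.freq = {}             # query -> observed frequency

        def get(self, query):
            if query in self.entries:
                self.freq[query] += 1
                return self.entries[query][0]
            return None

        def put(self, query, result, cost):
            if query not in self.entries and len(self.entries) >= self.capacity:
                victim = min(self.entries, key=lambda q: self.freq[q] * self.entries[q][1])
                del self.entries[victim]
                del self.freq[victim]
            self.entries[query] = (result, cost)
            self.freq[query] = self.freq.get(query, 0) + 1

    cache = WeightedResultCache(capacity=2)
    cache.put("cheap query", [1, 2], cost=1.0)
    cache.put("expensive query", [3], cost=50.0)
    cache.put("another cheap query", [4], cost=1.2)   # evicts the cheaper resident entry
    print(list(cache.entries))

under such a policy a rarely repeated but very expensive query can be worth keeping even when a pure hit ratio objective would favor cheaper popular ones which illustrates the distinction between hit ratio and total cost savings made above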
automatically clustering web pages into semantic groups promises improved search and browsing on the web in this paper we demonstrate how user generated tags from large scale social bookmarking websites such as delicious can be used as complementary data source to page text and anchor text for improving automatic clustering of web pages this paper explores the use of tags in means clustering in an extended vector space model that includes tags as well as page text and novel generative clustering algorithm based on latent dirichlet allocation that jointly models text and tags we evaluate the models by comparing their output to an established web directory we find that the naive inclusion of tagging data improves cluster quality versus page text alone but more principled inclusion can substantially improve the quality of all models with statistically significant absolute score increase of the generative model outperforms means with another score increase
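a small python sketch of the naive inclusion variant is given below assuming scikit learn is available tag tokens are appended with a prefix to the page text and the combined tf idf vectors are clustered with k means the generative lda based joint model of the paper is not reproduced here and the example pages and tags are invented

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    pages = ["noc router design with flit buffers", "spatial hypertext digital library study"]
    tags = [["networking", "hardware"], ["hypertext", "library"]]

    # extended vector space: page text plus tag tokens marked with a prefix
    docs = [text + " " + " ".join("tag_" + t for t in ts) for text, ts in zip(pages, tags)]

    X = TfidfVectorizer().fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)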
chinese word segmentation cws is necessary step in chinese english statistical machine translation smt and its performance has an impact on the results of smt however there are many choices involved in creating cws system such as various specifications and cws methods the choices made will create new cws scheme but whether it will produce superior or inferior translation has remained unknown to date this article examines the relationship between cws and smt the effects of cws on smt were investigated using different specifications and cws methods four specifications were selected for investigation beijing university pku hong kong city university cityu microsoft research msr and academia sinica as we created cws schemes under different settings to examine the relationship between cws and smt our experimental results showed that the msr’s specifications produced the lowest quality translations in examining the effects of cws methods we tested dictionary based and crf based approaches and found there was no significant difference between the two in the quality of the resulting translations we also found the correlation between the cws score and smt bleu score was very weak we analyzed cws errors and their effect on smt by evaluating systems trained with and without these errors this article also proposes two methods for combining advantages of different specifications simple concatenation of training data and feature interpolation approach in which the same types of features of translation models from various cws schemes are linearly interpolated we found these approaches were very effective in improving the quality of translations
this paper presents garnet novel spatial hypertext interface to digital library garnet supports both information structuring via spatial hypertext and traditional information seeking via digital library user study of garnet is reported together with an analysis of how the organizing work done by users in spatial hypertext workspace could support later information seeking the use of garnet during the study is related to both digital library and spatial hypertext research spatial hypertexts support the detection of implicit document groups in user’s workspace the study also investigates the degree of similarity found in the full text of documents within such document groups
we present novel grid based method for simulating multiple unmixable fluids moving and interacting unlike previous methods that can only represent the interface between two fluids usually between liquid and gas this method can handle an arbitrary number of fluids through multiple independent level sets coupled with constraint condition to capture the fluid surface more accurately we extend the particle level set method to multi fluid version it shares the advantages of the particle level set method and has the ability to track the interfaces of multiple fluids to handle the dynamic behavior of different fluids existing together we use multiphase fluid formulation based on smooth weight function
information filters play an important role in processing streams of events both for filtering as well as routing events based on their content stateful information filters like agile cayuga and sase have gained significant amount of attention recently such filters not only consider the data of single event but also additional state such as sequence of previous events or context state applications for wireless sensors and rfid data are particularly prominent examples for the need for stateful information filtering with use cases like event correlation or sensor data affecting the routing of other events while quality of service has been researched fairly thoroughly for networking systems and general data stream management systems no comprehensive work exists for information filters the goal of this work is to present qos criteria for stateful information filters and to examine how qos control methods established in other areas can be applied to information filters
developers of fault tolerant distributed systems need to guarantee that fault tolerance mechanisms they build are in themselves reliable otherwise these mechanisms might in the end negatively affect overall system dependability thus defeating the purpose of introducing fault tolerance into the system to achieve the desired levels of reliability mechanisms for detecting and handling errors should be developed rigorously or formally we present an approach to modeling and verifying fault tolerant distributed systems that use exception handling as the main fault tolerance mechanism in the proposed approach formal model is employed to specify the structure of system in terms of cooperating participants that handle exceptions in coordinated manner and coordinated atomic actions serve as representatives of mechanisms for exception handling in concurrent systems we validate the approach through two case studies i system responsible for managing production cell and ii medical control system in both systems the proposed approach has helped us to uncover design faults in the form of implicit assumptions and omissions in the original specifications
motivated by the studies in gestalt principle this paper describes novel approach on the adaptive selection of visual features for trademark retrieval we consider five kinds of visual saliencies symmetry continuity proximity parallelism and closure property the first saliency is based on zernike moments while the others are modeled by geometric elements extracted illusively as whole from trademark given query trademark we adaptively determine the features appropriate for retrieval by investigating its visual saliencies we show that in most cases either geometric or symmetric features can give us good enough accuracy to measure the similarity of geometric elements we propose maximum weighted bipartite graph wbg matching algorithm under transformation sets which is found to be both effective and efficient for retrieval
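a minimal python sketch of the maximum weighted bipartite matching step is given below using scipy’s linear_sum_assignment as a stand in solver the similarity values between the geometric elements of a query trademark and a database trademark are purely illustrative

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # similarity between geometric elements of a query trademark (rows)
    # and a database trademark (columns); the values are illustrative only
    similarity = np.array([[0.9, 0.2, 0.1],
                           [0.3, 0.8, 0.4],
                           [0.1, 0.5, 0.7]])

    rows, cols = linear_sum_assignment(similarity, maximize=True)
    score = similarity[rows, cols].sum()   # matching score used to compare trademarks
    print(list(zip(rows, cols)), score)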
the decision tree based classification is popular approach for pattern recognition and data mining most decision tree induction methods assume training data being present at one central location given the growth in distributed databases at geographically dispersed locations the methods for decision tree induction in distributed settings are gaining importance this paper describes one such method that generates compact trees using multifeature splits in place of single feature split decision trees generated by most existing methods for distributed data our method is based on fisher’s linear discriminant function and is capable of dealing with multiple classes in the data for homogeneously distributed data the decision trees produced by our method are identical to decision trees generated using fisher’s linear discriminant function with centrally stored data for heterogeneously distributed data certain approximation is involved with small change in performance with respect to the tree generated with centrally stored data experimental results for several well known datasets are presented and compared with decision trees generated using fisher’s linear discriminant function with centrally stored data
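a numpy sketch of the two class fisher linear discriminant direction that underlies such multifeature splits is given below the distributed aggregation of the mean and scatter statistics which is what makes the method applicable to distributed data is not shown and the small ridge term is my own addition for numerical stability

    import numpy as np

    def fisher_split(X1, X2):
        """Return (w, threshold): branch on w.x > threshold, a multifeature split."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        # within-class scatter matrices; the tiny ridge keeps the system solvable
        Sw = np.cov(X1, rowvar=False) * (len(X1) - 1) + np.cov(X2, rowvar=False) * (len(X2) - 1)
        w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m1 - m2)
        threshold = 0.5 * (w @ m1 + w @ m2)   # midpoint of the projected class means
        return w, threshold

    rng = np.random.default_rng(0)
    X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
    X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))
    w, t = fisher_split(X1, X2)
    print((X1 @ w > t).mean(), (X2 @ w > t).mean())   # fraction of each class on the w.x > t side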
we describe secure network virtualization framework that helps realize the abstraction of trusted virtual domains tvds security enhanced variant of virtualized network zones the framework allows groups of related virtual machines running on separate physical machines to be connected together as though they were on their own separate network fabric and at the same time helps enforce cross group security requirements such as isolation confidentiality security and information flow control the framework uses existing network virtualization technologies such as ethernet encapsulation vlan tagging and vpns and combines and orchestrates them appropriately to implement tvds our framework aims at automating the instantiation and deployment of the appropriate security mechanism and network virtualization technologies based on an input security model that specifies the required level of isolation and permitted network flows we have implemented prototype of the framework based on the xen hypervisor experimental evaluation of the prototype shows that the performance of our virtual networking extensions is comparable to that of the standard xen configuration
one of the characteristics of scientific application software is its long lifetime of active maintenance there has been little software engineering research into the development characteristics of scientific software and into the factors that support its successful long evolution the research described in this paper introduces novel model to examine the nature of change that influenced an example of industrial scientific software over its lifetime the research uses the model to provide an objective analysis of factors that contributed to long term evolution of the software system conclusions suggest that the architectural design of the software and the characteristics of the software development group played major role in the successful evolution of the software the novel model of change and the research method developed for this study are independent of the type of software under study
many stream processing systems enforce an order on data streams during query evaluation to help unblock blocking operators and purge state from stateful operators such in order processing iop systems not only must enforce order on input streams but also require that query operators preserve order this order preserving requirement constrains the implementation of stream systems and incurs significant performance penalties particularly for memory consumption especially for high performance potentially distributed stream systems the cost of enforcing order can be prohibitive we introduce new architecture for stream systems out of order processing oop that avoids ordering constraints the oop architecture frees stream systems from the burden of order maintenance by using explicit stream progress indicators such as punctuation or heartbeats to unblock and purge operators we describe the implementation of oop stream systems and discuss the benefits of this architecture in depth for example the oop approach has proven useful for smoothing workload bursts caused by expensive end of window operations which can overwhelm internal communication paths in iop approaches we have implemented oop in two stream systems gigascope and niagarast our experimental study shows that the oop approach can significantly outperform iop in number of aspects including memory throughput and latency
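a toy python sketch of the punctuation idea is given below tuples may arrive out of order and a punctuation carrying a progress timestamp both unblocks emits and purges every window it guarantees to be complete the class and method names are mine and not the api of gigascope or niagarast

    from collections import defaultdict

    class TumblingCount:
        """Count tuples per tumbling window; emit and purge on punctuation, not on arrival order."""

        def __init__(self, width):
            self.width = width
            self.counts = defaultdict(int)      # window start -> running count

        def on_tuple(self, timestamp):
            self.counts[(timestamp // self.width) * self.width] += 1

        def on_punctuation(self, progress):
            """progress promises that no tuple with timestamp < progress arrives later."""
            closed = [w for w in self.counts if w + self.width <= progress]
            return [(w, self.counts.pop(w)) for w in sorted(closed)]

    op = TumblingCount(width=10)
    for ts in [3, 14, 7, 12, 1]:              # tuples arrive out of order
        op.on_tuple(ts)
    print(op.on_punctuation(progress=20))     # windows [0,10) and [10,20) are complete

because completeness is signalled explicitly the operator never has to buffer and reorder tuples which illustrates where the memory savings discussed above can come from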
this paper reports research into semi automatic generation of scenarios for validating software intensive system requirements the research was undertaken as part of the esprit iv crews long term research project the paper presents the underlying theoretical models of domain knowledge computational mechanisms and user driven dialogues needed for scenario generation it describes how crews draws on theoretical results from the esprit iii nature basic research action that is object system models which are abstractions of the fundamental features of different categories of problem domain crews uses these models to generate normal course scenarios then draws on theoretical and empirical research from cognitive science human computer interaction collaborative systems and software engineering to generate alternative courses for these scenarios the paper describes computational mechanism for deriving use cases from object system models simple rules to link actions in use case taxonomies of classes of exceptions which give rise to alternative courses in scenarios and computational mechanism for generation of multiple scenarios from use case specification
xml is quickly becoming the de facto standard for data exchange over the internet this is creating new set of data management requirements involving xml such as the need to store and query xml documents researchers have proposed using relational database systems to satisfy these requirements by devising ways to shred xml documents into relations and translate xml queries into sql queries over these relations however key issue with such an approach which has largely been ignored in the research literature is how and whether the ordered xml data model can be efficiently supported by the unordered relational data model this paper shows that xml’s ordered data model can indeed be efficiently supported by relational database system this is accomplished by encoding order as data value we propose three order encoding methods that can be used to represent xml order in the relational data model and also propose algorithms for translating ordered xpath expressions into sql using these encoding methods finally we report the results of an experimental study that investigates the performance of the proposed order encoding methods on workload of ordered xml queries and updates
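an illustrative python sketch using sqlite of one simple possibility a global order style encoding where each node stores its document order position as a plain data value is given below together with the kind of sql an ordered xpath step such as /book/author[2] could translate into the schema and the translation are illustrative and not the paper’s exact proposals

    import sqlite3
    import xml.etree.ElementTree as ET

    doc = ET.fromstring("<book><title>noc design</title><author>a</author><author>b</author></book>")

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE node(id INTEGER, parent INTEGER, tag TEXT, pos INTEGER, text TEXT)")

    # shred the document; pos records global document order as an ordinary data value
    order = 0
    def shred(elem, parent):
        global order
        order += 1
        nid = order
        con.execute("INSERT INTO node VALUES (?,?,?,?,?)", (nid, parent, elem.tag, order, elem.text))
        for child in elem:
            shred(child, nid)
    shred(doc, None)

    # ordered XPath /book/author[2] expressed over the order-encoded relation
    row = con.execute("""
        SELECT c.text FROM node p JOIN node c ON c.parent = p.id
        WHERE p.tag = 'book' AND c.tag = 'author'
        ORDER BY c.pos LIMIT 1 OFFSET 1""").fetchone()
    print(row)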
correlation analysis is basic problem in the field of data stream mining typical approaches add sliding window to data streams to get the recent results but the window length defined by users is always fixed which is not suitable for the changing stream environment we propose boolean representation based data adaptive method for correlation analysis among large number of time series streams the periodical trends of each stream series are monitored to choose the most suitable window size and group the series with the same trends together instead of adopting complex pair wise calculation we can also quickly get the correlation pairs of series at the optimal window sizes all the processing is realized by simple boolean operations both the theory analysis and the experimental evaluations show that our method has good computation efficiency with high accuracy
this paper proposes complementary novel idea called minitasking to further reduce the number of cache misses by improving the data temporal locality for multiple concurrent queries our idea is based on the observation that in many workloads such as decision support systems dss there is usually significant amount of data sharing among different concurrent queries minitasking exploits such data sharing to improve data temporal locality by scheduling query execution at three levels query level batching operator level grouping and mini task level scheduling the experimental results with various types of concurrent tpc query workloads show that with the traditional ary storage model nsm layout minitasking significantly reduces the cache misses by up to and thereby achieves reduction in execution time with the partition attributes across pax layout minitasking further reduces the cache misses by and the execution time by for the tpc throughput test workload minitasking improves the end performance up to
the field of delay tolerant networking is rich with protocols that exploit node mobility to overcome unpredictable or otherwise bad connectivity the performance of many of these protocols is highly sensitive to the underlying mobility model which determines the nodes movements and the characteristics of these mobility models are not often studied or compared with few exceptions authors test their ideas using mobility models implemented on simulators written for the specific purpose of testing their protocols we argue that it is better to unify these simulations to one highly capable simulator we develop suite of mobility models in omnet that specifically target delay tolerant networks we also present series of metrics that can be used to reason about mobility models independent of which communication protocols and data traffic patterns are in use these metrics can be used to compare existing mobility models with future ones and also to provide insight into which characteristics of the mobility models affect which aspects of protocol performance we implement tool that derives these metrics from omnet simulations and implement several popular delay tolerant mobility models finally we present the results of our analysis
webcams microphones pressure gauges and other sensors provide exciting new opportunities for querying and monitoring the physical world in this paper we focus on querying wide area sensor databases containing xml data derived from sensors spread over tens to thousands of miles we present the first scalable system for executing xpath queries on such databases the system maintains the logical view of the data as single xml document while physically the data is fragmented across any number of host nodes for scalability sensor data is stored close to the sensors but can be cached elsewhere as dictated by the queries our design enables self starting distributed queries that jump directly to the lowest common ancestor of the query result dramatically reducing query response times we present novel query evaluate gather technique using xslt for detecting which data in local database fragment is part of the query result and how to gather the missing parts we define partitioning and cache invariants that ensure that even partial matches on cached data are exploited and that correct answers are returned despite our dynamic query driven caching experimental results demonstrate that our techniques dramatically increase query throughputs and decrease query response times in wide area sensor databases
distributed enterprise applications today are increasingly being built from services available over the web unit of functionality in this framework is web service software application that exposes set of typed connections that can be accessed over the web using standard protocols these units can then be composed into composite web service bpel business process execution language is high level distributed programming language for creating composite web services although bpel program invokes services distributed over several servers the orchestration of these services is typically under centralized control because performance and throughput are major concerns in enterprise applications it is important to remove the inefficiencies introduced by the centralized control in distributed or decentralized orchestration the bpel program is partitioned into independent sub programs that interact with each other without any centralized control decentralization can increase parallelism and reduce the amount of network traffic required for an application this paper presents technique to partition composite web service written as single bpel program into an equivalent set of decentralized processes it gives new code partitioning algorithm to partition bpel program represented as program dependence graph with the goal of minimizing communication costs and maximizing the throughput of multiple concurrent instances of the input program in contrast much of the past work on dependence based partitioning and scheduling seeks to minimize the completion time of single instance of program running in isolation the paper also gives cost model to estimate the throughput of given code partition
in modern processors the dynamic translation of virtual addresses to support virtual memory is done before or in parallel with the first level cache access as processor technology improves at rapid pace and the working sets of new applications grow insatiably the latency and bandwidth demands on the tlb translation lookaside buffer are getting more and more difficult to meet the situation is worse in multiprocessor systems which run larger applications and are plagued by the tlb consistency problem we evaluate and compare five options for virtual address translation in the context of comas cache only memory architectures the dynamic address translation mechanism can be located after the cache access provided the cache is virtual in particular design which we call v coma for virtual coma the physical address concept and the traditional tlb are eliminated while still supporting virtual memory coma reduces the address translation overhead to minimum v coma scales well and works better in systems with large number of processors as machine running on virtual addresses coma provides simple and consistent hardware model to the operating system and the compiler in which further optimization opportunities are possible
due to increasing clock speeds increasing design sizes and shrinking technologies it is becoming more and more challenging to distribute single global clock throughout chip in this paper we study the effect of using globally asynchronous locally synchronous gals organization for superscalar out of order processor both in terms of power and performance to this end we propose novel modeling and simulation environment for multiple clock cores with static or dynamically variable voltages for each synchronous block using this design exploration environment we were able to assess the power performance tradeoffs available for multiple clock single voltage mcsv as well as multiple clock dynamic voltage mcdv cores our results show that mcsv processors are more power efficient when compared to single clock single voltage designs with performance penalty of about by exploiting the flexibility of independent dynamic voltage scaling the various clock domains the power efficiency of gals designs can be improved by on average and up to more in select cases the power efficiency of mcdv cores becomes comparable with the one of single clock dynamic voltage scdv cores while being up to better in some cases our results show that mcdv cores consume less power at an average performance loss
many scientific and high performance computing applications consist of multiple processes running on different processors that communicate frequently because of their synchronization needs these applications can suffer severe performance penalties if their processes are not all coscheduled to run together two common approaches to coscheduling jobs are batch scheduling wherein nodes are dedicated for the duration of the run and gang scheduling wherein time slicing is coordinated across processors both work well when jobs are load balanced and make use of the entire parallel machine however these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation in which resources and processors are left idle because jobs cannot be packed with perfect efficiency this situation leads to reduced utilization and suboptimal performance flexible coscheduling fcs addresses this problem by monitoring each job’s computation granularity and communication pattern and scheduling jobs based on their synchronization and load balancing requirements in particular jobs that do not require stringent synchronization are identified and are not coscheduled instead these processes are used to reduce fragmentation fcs has been fully implemented on top of the storm resource manager on processor alpha cluster and compared to batch gang and implicit coscheduling algorithms this paper describes in detail the implementation of fcs and its performance evaluation with variety of workloads including large scale benchmarks scientific applications and dynamic workloads the experimental results show that fcs saturates at higher loads than other algorithms up to percent higher in some cases and displays lower response times and slowdown than the other algorithms in nearly all scenarios
buffer overflow has become major source of network security vulnerability traditional schemes for detecting buffer overflow attacks usually terminate the attacked service degrading the service availability in this paper we propose lightweight buffer overflow protection mechanism that allows continued network service the proposed mechanism allows service program to reconfigure itself to identify and protect the vulnerable functions upon buffer overflow attacks protecting only the vulnerable functions instead of the whole program keeps the runtime overhead small moreover the mechanism adopts the idea of failure oblivious computing to allow service programs to execute through memory errors caused by the attacks once the vulnerable functions have been identified eliminating the need of restarting the service program upon further attacks to the vulnerable functions we have applied the mechanism on five internet servers the experiment results show that the mechanism has little impact on the runtime performance
checking sequence generated from finite state machine is test sequence that is guaranteed to lead to failure if the system under test is faulty and has no more states than the specification the problem of generating checking sequence for finite state machine is simplified if the machine has distinguishing sequence an input sequence with the property that the output sequence produced by the machine in response to it is different for the different states of the machine previous work has shown that where distinguishing sequence is known an efficient checking sequence can be produced from the elements of set of sequences that verify the distinguishing sequence used and the elements of set upsilon of subsequences that test the individual transitions by following each transition by the distinguishing sequence that verifies its final state in this previous work the verifying set is predefined and upsilon is defined in terms of it the checking sequence is produced by connecting the elements of upsilon and the verifying set to form single sequence using predefined acyclic set ec of transitions an optimization algorithm is used in order to produce the shortest such checking sequence that can be generated on the basis of the given verifying set and ec however this previous work did not state how the verifying set and ec should be chosen this paper investigates the problem of finding an appropriate verifying set and ec to be used in checking sequence generation we show how the verifying set may be chosen so that it minimizes the sum of the lengths of the sequences to be combined further we show that the optimization step in the checking sequence generation algorithm may be adapted so that it generates the optimal ec experiments are used to evaluate the proposed method
privacy preserving data mining addresses the need of multiple parties with private inputs to run data mining algorithm and learn the results over the combined data without revealing any unnecessary information most of the existing cryptographic solutions to privacy preserving data mining assume semi honest participants in theory these solutions can be extended to the malicious model using standard techniques like commitment schemes and zero knowledge proofs however these techniques are often expensive especially when the data sizes are large in this paper we investigate alternative ways to convert solutions in the semi honest model to the malicious model we take two classical solutions as examples one of which can be extended to the malicious model with only slight modifications while another requires careful redesign of the protocol in both cases our solutions for the malicious model are much more efficient than the zero knowledge proofs based solutions
preserving user trust in recommender system depends on the perception of the system as objective unbiased and accurate however publicly accessible user adaptive systems such as collaborative recommender systems present security problem attackers closely resembling ordinary users might introduce biased profiles to force the system to adapt in manner advantageous to them the authors discuss some of the major issues in building secure recommender systems including some of the most effective attacks and their impact on various recommendation algorithms approaches for responding to these attacks range from algorithmic approaches to designing more robust recommenders to effective methods for detecting and eliminating suspect profiles
current web search engines focus on searching only the most recent snapshot of the web in some cases however it would be desirable to search over collections that include many different crawls and versions of each page one important example of such collection is the internet archive though there are many others since the data size of such an archive is multiple times that of single snapshot this presents us with significant performance challenges current engines use various techniques for index compression and optimized query execution but these techniques do not exploit the significant similarities between different versions of page or between different pages in this paper we propose general framework for indexing and query processing of archival collections and more generally any collections with sufficient amount of redundancy our approach results in significant reductions in index size and query processing costs on such collections and it is orthogonal to and can be combined with the existing techniques it also supports highly efficient updates both locally and over network within this framework we describe and evaluate different implementations that trade off index size versus cpu cost and other factors and discuss applications ranging from archival web search to local search of web sites email archives or file systems we present experimental results based on search engine query log and large collection consisting of multiple crawls
consider set of customers eg wifi receivers and set of service providers eg wireless access points where each provider has capacity and the quality of service offered to its customers is inversely proportional to their distance the capacity constrained assignment cca is matching between the two sets such that i each customer is assigned to at most one provider ii every provider serves no more customers than its capacity iii the maximum possible number of customers are served and iv the sum of euclidean distances within the assigned provider customer pairs is minimized although max flow algorithms are applicable to this problem they require the complete distance based bipartite graph between the customer and provider sets for large spatial datasets this graph is expensive to compute and it may be too large to fit in main memory motivated by this fact we propose efficient algorithms for optimal assignment that employ novel edge pruning strategies based on the spatial properties of the problem additionally we develop incremental techniques that maintain an optimal assignment in the presence of updates with processing cost several times lower than cca recomputation from scratch finally we present approximate ie suboptimal cca solutions that provide tunable trade off between result accuracy and computation cost abiding by theoretical quality guarantees thorough experimental evaluation demonstrates the efficiency and practicality of the proposed techniques
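a small python sketch of the flow formulation that the abstract contrasts its algorithms with is given below assuming networkx is available customers and providers become nodes provider capacities become edge capacities scaled integer distances become edge costs and max flow min cost yields an optimal assignment the pruning incremental and approximate techniques of the paper are not shown and the coordinates are invented

    import math
    import networkx as nx

    customers = {"c1": (0, 0), "c2": (2, 0), "c3": (5, 5)}
    providers = {"p1": ((1, 0), 2), "p2": ((5, 4), 1)}      # position, capacity

    G = nx.DiGraph()
    for c, cpos in customers.items():
        G.add_edge("source", c, capacity=1, weight=0)
        for p, (ppos, cap) in providers.items():
            dist = math.dist(cpos, ppos)
            G.add_edge(c, p, capacity=1, weight=int(round(dist * 1000)))   # integer costs
    for p, (ppos, cap) in providers.items():
        G.add_edge(p, "sink", capacity=cap, weight=0)

    flow = nx.max_flow_min_cost(G, "source", "sink")
    assignment = [(c, p) for c in customers for p in providers if flow[c].get(p, 0) == 1]
    print(assignment)

as the abstract notes this formulation needs the complete customer provider distance graph which is exactly the cost that the proposed edge pruning strategies avoid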
new approach to software reliability estimation is presented that combines operational testing with stratified sampling in order to reduce the number of program executions that must be checked manually for conformance to requirements automatic cluster analysis is applied to execution profiles in order to stratify captured operational executions experimental results are reported that suggest this approach can significantly reduce the cost of estimating reliability
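for reference a hedged latex sketch of the standard stratified estimator underlying the approach is written below in my own notation with the execution profile clusters playing the role of strata

    \hat{R} \;=\; \sum_{h=1}^{H} W_h\, \hat{R}_h \qquad \mathrm{Var}(\hat{R}) \;=\; \sum_{h=1}^{H} W_h^{2}\, \frac{\hat{R}_h (1-\hat{R}_h)}{n_h}

where W_h is the fraction of captured operational executions that fall in cluster h n_h is the number of executions from cluster h that are checked manually for conformance and \hat{R}_h is the observed conformance rate within that cluster the variance reduction obtained by stratifying over relatively homogeneous clusters is what allows fewer executions to be checked for a given precision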
in this article we address the problem of reference disambiguation specifically we consider situation where entities in the database are referred to using descriptions eg set of instantiated attributes the objective of reference disambiguation is to identify the unique entity to which each description corresponds the key difference between the approach we propose called reldc and the traditional techniques is that reldc analyzes not only object features but also inter object relationships to improve the disambiguation quality our extensive experiments over two real data sets and over synthetic datasets show that analysis of relationships significantly improves quality of the result
when an xml document conforms to given type eg dtd or an xml schema type it is called valid document checking if given xml document is valid is called the validation problem and is typically performed by parser hence validating parser more precisely it is performed right after parsing by the same program module in practice however xml documents are often generated dynamically by some program checking whether all xml documents generated by the program are valid wrt given type is called the typechecking problem while validation analyzes an xml document type checker analyzes program and the problem’s difficulty is function of the language in which that program is expressed the xml typechecking problem has been investigated recently in msv hp hvp amn amn and the xquery working group adopted some of these techniques for typechecking xquery ffm all these techniques however have limitations which need to be understood and further explored and investigated in this paper we define the xml typechecking problem and present current approaches to typechecking discussing their limitations
bibliometrics are important measures for venue quality in digital libraries impacts of venues are usually the major consideration for subscription decision making and for ranking and recommending high quality venues and documents for digital libraries in the computer science literature domain conferences play major role as an important publication and dissemination outlet however with recent profusion of conferences and rapidly expanding fields it is increasingly challenging for researchers and librarians to assess the quality of conferences we propose set of novel heuristics to automatically discover prestigious and low quality conferences by mining the characteristics of program committee members we examine the proposed cues both in isolation and combination under classification scheme evaluation on collection of conferences and pc members shows that our heuristics when combined correctly classify about of the conferences with low false positive rate of and recall of more than for identifying reputable conferences furthermore we demonstrate empirically that our heuristics can also effectively detect set of low quality conferences with false positive rate of merely we also report our experience of detecting two previously unknown low quality conferences finally we apply the proposed techniques to the entire quality spectrum by ranking conferences in the collection
we study the problem of learning an unknown function represented as an expression or program over known finite monoid as in other areas of computational complexity where programs over algebras have been used the goal is to relate the computational complexity of the learning problem with the algebraic complexity of the finite monoid indeed our results indicate close connection between both kinds of complexity we focus on monoids which are either groups or aperiodic and on the learning model of exact learning from queries for group we prove that expressions over are efficiently learnable if is nilpotent and impossible to learn efficiently under cryptographic assumptions if is nonsolvable we present some results for restricted classes of solvable groups and point out connection between their efficient learnability and the existence of lower bounds on their computational power in the program model for aperiodic monoids our results seem to indicate that the monoid class known as da captures exactly learnability of expressions by polynomially many evaluation queries when using programs instead of expressions we show that our results for groups remain true while the situation is quite different for aperiodic monoids
power energy and thermal concerns have constrained embedded systems designs computing capability and storage density have increased dramatically enabling the emergence of handheld devices from special to general purpose computing in many mobile systems the disk is among the top energy consumers many previous optimizations for disk energy have assumed uniprogramming environments however many optimizations degrade in multiprogramming because programs are unaware of other programs execution context we introduce framework to make programs aware of and adapt to their runtime execution context we evaluated real workloads by collecting user activity traces and characterizing the execution contexts the study confirms that many users run limited number of programs concurrently we applied execution context optimizations to eight programs and tested ten combinations the programs ran concurrently while the disk’s power was measured our measurement infrastructure allows interactive sessions to be scripted recorded and replayed to compare the optimizations effects against the baseline our experiments covered two write cache policies for write through energy savings was in the range with an average of for write back energy savings was in the range with an average of in all cases our optimizations incurred less than performance penalty
the advent of commerce government and the rapid expansion of world wide connectivity demands end user systems that adhere to well defined security policies in this context trusted computing tc aims at providing framework and effective mechanisms that allow computing platforms and processes in distributed it system to gain assurance about each other’s integrity trustworthiness an industrial attempt towards realization of tc is the initiative of the trusted computing group tcg an alliance of large number of it enterprises the tcg has published set of specifications for extending conventional computer architectures with variety of security related features and cryptographic mechanisms the tcg approach has not only been subject of research but also public debates and concerns currently several prominent academic and industrial research projects are investigating trustworthy it systems based on tc virtualization technology and secure operating system design we highlight special aspects of trusted computing and present some current research and challenges we believe that tc technology is indeed capable of enhancing the security of computer systems and is another helpful means towards establishing trusted infrastructures however we also believe that it is not universal remedy for all of the security problems we are currently facing in information societies
when releasing microdata for research purposes one needs to preserve the privacy of respondents while maximizing data utility an approach that has been studied extensively in recent years is to use anonymization techniques such as generalization and suppression to ensure that the released data table satisfies the anonymity property major thread of research in this area aims at developing more flexible generalization schemes and more efficient searching algorithms to find better anonymizations ie those that have less information loss this paper presents three new generalization schemes that are more flexible than existing schemes this flexibility can lead to better anonymizations we present taxonomy of generalization schemes and discuss their relationship we present enumeration algorithms and pruning techniques for finding optimal generalizations in the new schemes through experiments on real census data we show that more flexible generalization schemes produce higher quality anonymizations and the bottom up works better for small values and small number of quasi identifier attributes than the top down approach
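As background for the anonymity property being optimized, here is a minimal sketch of checking k-anonymity for one candidate generalization; the hierarchy encoding (value mapped to a list of ancestors indexed by generalization level) and all attribute names are hypothetical, and none of the paper's generalization schemes, enumeration algorithms, or pruning techniques are shown.

```python
from collections import Counter

def generalize(row, hierarchies, levels):
    """Map each quasi-identifier value to its ancestor at the chosen level
    of a (hypothetical) value hierarchy; other attributes pass through."""
    return {a: (hierarchies[a][v][levels[a]] if a in hierarchies else v)
            for a, v in row.items()}

def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """A table is k-anonymous if every combination of quasi-identifier
    values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in rows)
    return all(count >= k for count in groups.values())
```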
in text mining we are often confronted with very high dimensional data clustering with high dimensional data is challenging problem due to the curse of dimensionality in this paper to address this problem we propose a subspace maximum margin clustering smmc method which performs dimensionality reduction and maximum margin clustering simultaneously within unified framework we aim to learn subspace in which we try to find cluster assignment of the data points together with hyperplane classifier such that the resultant margin is maximized among all possible cluster assignments and all possible subspaces the original problem is transformed from learning the subspace to learning positive semi definite matrix in order to avoid tuning the dimensionality of the subspace the transformed problem can be solved efficiently via cutting plane technique and constrained concave convex procedure cccp since the sub problem in each iteration of cccp is joint convex alternating minimization is adopted to obtain the global optimum experiments on benchmark data sets illustrate that the proposed method outperforms the state of the art clustering methods as well as many dimensionality reduction based clustering approaches
this paper presents the results of an experiment to measure empirically the remaining opportunities for exploiting loop level parallelism that are missed by the stanford suif compiler state of the art automatic parallelization system targeting shared memory multiprocessor architectures for the purposes of this experiment we have developed run time parallelization test called the extended lazy privatizing doall elpd test which is able to simultaneously test multiple loops in loop nest the elpd test identifies specific type of parallelism where each iteration of the loop being tested accesses independent data possibly by making some of the data private to each processor for programs in three benchmark suites the elpd test was executed at run time for each candidate loop left unparallelized by the suif compiler to identify which of these loops could safely execute in parallel for the given program input the results of this experiment point to two main requirements for improving the effectiveness of parallelizing compiler technology incorporating control flow tests into analysis and extracting low cost run time parallelization tests from analysis results
we propose the study of visibly pushdown automata vpa for processing xml documents vpas are pushdown automata where the input determines the stack operation and xml documents are naturally visibly pushdown with the vpa pushing onto the stack on open tags and popping the stack on close tags in this paper we demonstrate the power and ease visibly pushdown automata give in the design of streaming algorithms for xml documents we study the problems of type checking streaming xml documents against sdtd schemas and the problem of typing tags in streaming xml document according to an sdtd schema for the latter problem we consider both pre order typing and post order typing of document which dynamically determines types at open tags and close tags respectively as soon as they are met we also generalize the problems of pre order and post order typing to prefix querying we show that deterministic vpa yields an algorithm to the problem of answering in one pass the set of all answers to any query that has the property that node satisfying the query is determined solely by the prefix leading to the node all the streaming algorithms we develop in this paper are based on the construction of deterministic vpas and hence for any fixed problem the algorithms process each element of the input in constant time and use space where is the depth of the document
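The following toy sketch illustrates the visibly pushdown discipline on a stream of tag events: the stack is pushed only on open tags and popped only on close tags, so each event costs constant time and space stays proportional to document depth. The event format and the parent-to-allowed-children table are a drastic simplification of an SDTD, not the paper's construction.

```python
def stream_check(events, allowed_children, root_type):
    """events: iterable of ('open', tag) / ('close', tag) pairs.
    allowed_children: dict mapping a tag/type to the set of child tags it permits.
    Push on open, pop on close -- the visibly pushdown discipline."""
    stack = [root_type]                      # space proportional to document depth
    for kind, tag in events:
        if kind == "open":
            parent = stack[-1]
            if tag not in allowed_children.get(parent, set()):
                return False                 # tag not permitted under its parent
            stack.append(tag)                # constant work per event
        else:                                # close tag
            if not stack or stack.pop() != tag:
                return False                 # mismatched or excess close tag
    return len(stack) == 1                   # only the artificial root remains
```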
emerging ubiquitous computing network is expected to consist of variety of heterogeneous and distributed devices while web services technology is increasingly being considered as promising solution to support the inter operability between such heterogeneous devices via well defined protocol currently there is no effective framework reported in the literature that can address the problem of coordinating the web services enabled devices this paper considers ubiquitous computing environment that is comprised of active autonomous devices interacting with each other through web services and presents an eca event condition action based framework for effective coordination of those devices specifically we first present an xml based language for describing eca rules that are embedded in web service enabled devices an eca rule when triggered by an internal or external event to the device can result in the invocation of appropriate web services in the system subsequently we consider the situation in which the rules are introduced and managed by multiple independent users and propose effective mechanisms that can detect and resolve potential inconsistencies among the rules the presented eca based coordination approach is expected to facilitate seamless inter operation among the web service enabled devices in the emerging ubiquitous computing environments
nearest neighbor searching is the problem of preprocessing set of point points in dimensional space so that given any query point it is possible to report the closest point to rapidly in approximate nearest neighbor searching parameter epsiv is given and multiplicative error of epsiv is allowed we assume that the dimension is constant and treat and epsiv as asymptotic quantities numerous solutions have been proposed ranging from low space solutions having space and query time log epsiv minus to high space solutions having space roughly log epsiv and query time log epsiv we show that there is single approach to this fundamental problem which both improves upon existing results and spans the spectrum of space time tradeoffs given tradeoff parameter gamma where le gamma le epsiv we show that there exists data structure of space gamma minus log epsiv that can answer queries in time log gamma epsiv gamma minus when gamma equals this yields data structure of space log epsiv that can answer queries in time log epsiv minus when gamma equals epsiv it provides data structure of space epsiv minus log epsiv that can answer queries in time log epsiv our results are based on data structure called epsiv avd which is hierarchical quadtree based subdivision of space into cells each cell stores up to representative points of the set such that for any query point in the cell at least one of these points is an approximate nearest neighbor of we provide new algorithms for constructing avds and tools for analyzing their total space requirements we also establish lower bounds on the space complexity of avds and show that up to factor of log epsiv our space bounds are asymptotically tight in the two extremes gamma equals and gamma equals epsiv
the web wrapping problem ie the problem of extracting structured information from html documents is one of great practical importance the often observed information overload that users of the web experience witnesses the lack of intelligent and encompassing web services that provide high quality collected and value added information the web wrapping problem has been addressed by significant amount of research work previous work can be classified into two categories depending on whether the html input is regarded as sequential character string eg or pre parsed document tree for instance the latter category of work thus assumes that systems may make use of an existing html parser as front end
in this paper we investigate the power implications of tile size selection for tile based processors we refer to this investigation as tile granularity study this is accomplished by distilling the architectural cost of tiles with different computational widths into system metric we call the granularity indicator gi the gi is then compared against the communications exposed when algorithms are partitioned across multiple tiles through this comparison the tile granularity that best fits given set of algorithms can be determined reducing the system power for that set of algorithms when the gi analysis is applied to the synchroscalar tile architecture we find that synchroscalar’s already low power consumption can be further reduced by when customized for execution of the receiver in addition the gi can also be used to evaluate tile size when considering multiple applications simultaneously providing convenient platform for hardware software co design
this paper describes scheme of encapsulating test support code as built in test bit components and embedding them into the hot spots of an object oriented framework so that defects caused by the modification and extension of the framework can be detected effectively and efficiently through testing the test components embedded into framework in this way increase the testability of the framework by making it easy to control and observe the process of framework testing the proposed technique is illustrated using the facilities of our testing scheme however is equally applicable to other object oriented languages using our scheme test components can be designed and embedded into the hot spots of framework without incurring changes or intervention to the framework code and also can be attached and detached dynamically to from the framework as needed at run time
traditional debug methodologies are limited in their ability to provide debugging support for many core parallel programming synchronization problems or bugs due to race conditions are particularly difficult to detect with software debugging tools most traditional debugging approaches rely on globally synchronized signals but these pose problems in terms of scalability the first contribution of this paper is to propose novel nonuniform debug architecture nuda based on ring interconnection schema our approach makes debugging both feasible and scalable for many core processing scenarios the key idea is to distribute the debugging support structures across set of hierarchical clusters while avoiding address overlap this allows the address space to be monitored using non uniform protocols our second contribution is non intrusive approach to race detection supported by the nuda non uniform page based monitoring cache in each nuda node is used to watch the access footprints the union of all the caches can serve as race detection probe using the proposed approach we show that parallel race bugs can be precisely captured and that most false positive alerts can be efficiently eliminated at an average slow down cost of only the net hardware cost is relatively low so that the nuda can readily scale increasingly complex many core systems
multicore architectures are an inflection point in mainstream software development because they force developers to write parallel programs in previous article in queue herb sutter and james larus pointed out "the concurrency revolution is primarily software revolution the difficult problem is not building multicore hardware but programming it in way that lets mainstream applications benefit from the continued exponential growth in cpu performance" in this new multicore world developers must write explicitly parallel applications that can take advantage of the increasing number of cores that each successive multicore generation will provide
finite mixture models have been applied for different computer vision image processing and pattern recognition tasks the majority of the work done concerning finite mixture models has focused on mixtures for continuous data however many applications involve and generate discrete data for which discrete mixtures are better suited in this paper we investigate the problem of discrete data modeling using finite mixture models we propose novel well motivated mixture that we call the multinomial generalized dirichlet mixture the novel model is compared with other discrete mixtures we designed experiments involving spatial color image databases modeling and summarization and text classification to show the robustness flexibility and merits of our approach
reuse of domain models is often limited to the reuse of the structural aspects of the domain eg by means of generic data models in object oriented models reuse of dynamic aspects is achieved by reusing the methods of domain classes because in the object oriented approach any behavior is attached to class it is impossible to reuse behavior without at the same time reusing the class in addition because of the message passing paradigm object interaction must be specified as method attached to one class which is invoked by another class in this way object interaction is hidden in the behavioral aspects of classes this makes object interaction schemas difficult to reuse and customize the focus of this paper is on improving the reuse of object oriented domain models this is achieved by centering the behavioral aspects around the concept of business events
proof checkers for proof carrying code and similar systems can suffer from two problems huge proof witnesses and untrustworthy proof rules no previous design has addressed both of these problems simultaneously we show the theory design and implementation of proof checker that permits small proof witnesses and machine checkable proofs of the soundness of the system
current evaluation metrics for machine translation have increasing difficulty in distinguishing good from merely fair translations we believe the main problem to be their inability to properly capture meaning good translation candidate means the same thing as the reference translation regardless of formulation we propose metric that assesses the quality of mt output through its semantic equivalence to the reference translation based on rich set of match and mismatch features motivated by textual entailment we first evaluate this metric in an evaluation setting against combination metric of four state of the art scores our metric predicts human judgments better than the combination metric combining the entailment and traditional features yields further improvements then we demonstrate that the entailment metric can also be used as learning criterion in minimum error rate training mert to improve parameter estimation in mt system training manual evaluation of the resulting translations indicates that the new model obtains significant improvement in translation quality
in this work we examine the potential of using the recently released sti cell processor as building block for future high end scientific computing systems our work contains several novel contributions first we introduce performance model for cell and apply it to several key numerical kernels dense matrix multiply sparse matrix vector multiply stencil computations and ffts next we validate our model by comparing results against published hardware data as well as our own cell blade implementations additionally we compare cell performance to benchmarks run on leading superscalar amd opteron vliw intel itanium and vector cray xe architectures our work also explores several different kernel implementations and demonstrates simple and effective programming model for cell’s unique architecture finally we propose modest microarchitectural modifications that could significantly increase the efficiency of double precision calculations overall results demonstrate the tremendous potential of the cell architecture for scientific computations in terms of both raw performance and power efficiency
updating delaunay triangulation when its vertices move is bottleneck in several domains of application rebuilding the whole triangulation from scratch is surprisingly very viable option compared to relocating the vertices this can be explained by several recent advances in efficient construction of delaunay triangulations however when all points move with small magnitude or when only fraction of the vertices move rebuilding is no longer the best option this paper considers the problem of efficiently updating delaunay triangulation when its vertices are moving under small perturbations the main contribution is set of filters based upon the concept of vertex tolerances experiments show that filtering relocations is faster than rebuilding the whole triangulation from scratch under certain conditions
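A minimal sketch of the filtering idea as we read it: a vertex whose displacement stays within its precomputed tolerance needs no structural update, and only the remaining vertices are actually relocated in the triangulation. How tolerances are computed, and the relocation itself, are not shown; all names are illustrative.

```python
def filter_relocations(old_pos, new_pos, tolerance):
    """Split moving vertices into those whose displacement stays within their
    tolerance (no structural update needed) and those that must actually be
    relocated. old_pos/new_pos/tolerance are dicts keyed by vertex id."""
    keep, relocate = [], []
    for v in new_pos:
        dx = new_pos[v][0] - old_pos[v][0]
        dy = new_pos[v][1] - old_pos[v][1]
        if (dx * dx + dy * dy) ** 0.5 <= tolerance[v]:
            keep.append(v)        # local certificates still hold: skip the update
        else:
            relocate.append(v)    # tolerance exceeded: repair connectivity locally
    return keep, relocate
```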
mouselight is spatially aware standalone mobile projector with the form factor of mouse that can be used in combination with digital pens on paper by interacting with the projector and the pen bimanually users can visualize and modify the virtually augmented contents on top of the paper and seamlessly transition between virtual and physical information we present high fidelity hardware prototype of the system and demonstrate set of novel interactions specifically tailored to the unique properties of mouselight mouselight differentiates itself from related systems such as penlight in two aspects first mouselight presents rich set of bimanual interactions inspired by the toolglass interaction metaphor but applied to physical paper secondly our system explores novel displaced interactions that take advantage of the independent input and output that is spatially aware of the underneath paper these properties enable users to issue remote commands such as copy and paste or search we also report on preliminary evaluation of the system which produced encouraging observations and feedback
variable order markov chains vomcs are flexible class of models that extend the well known markov chains they have been applied to variety of problems in computational biology eg protein family classification linear time and space construction algorithm has been published in by apostolico and bejerano however neither report of the actual running time nor an implementation of it have been published since in this paper we use the lazy suffix tree and the enhanced suffix array to improve upon the algorithm of apostolico and bejerano we introduce new software which is orders of magnitude faster than current tools for building vomcs and is suitable for large scale sequence analysis
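For orientation, the sketch below shows what a variable-order Markov chain stores, namely conditional counts for contexts of every length up to a maximum, using a deliberately naive O(n * max_order) construction; the paper's contribution is obtaining this information in linear time and space via lazy suffix trees and enhanced suffix arrays, which is not reproduced here.

```python
from collections import defaultdict

def vomc_counts(sequence, max_order):
    """Count symbol occurrences after every context of length 0..max_order.
    Naive O(n * max_order) construction, purely illustrative."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, symbol in enumerate(sequence):
        for order in range(min(i, max_order) + 1):
            context = tuple(sequence[i - order:i])
            counts[context][symbol] += 1
    return counts

def predict(counts, context, max_order):
    """Back off to the longest context that has been observed."""
    for order in range(min(len(context), max_order), -1, -1):
        ctx = tuple(context[len(context) - order:])
        if ctx in counts:
            total = sum(counts[ctx].values())
            return {s: c / total for s, c in counts[ctx].items()}
    return {}
```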
in this paper we present cowsami middleware infrastructure that enables context awareness in open ambient intelligence environments consisting of mobile users and context sources that become dynamically available as the users move from one location to another central requirement in such dynamic scenarios is to be able to integrate new context sources and users at run time cowsami exploits novel approach towards this goal the proposed approach is based on utilizing web services as interfaces to context sources and dynamically updatable relational views for storing aggregating and interpreting context context rules are employed to provide mappings that specify how to populate context relations with respect to the different context sources that become dynamically available an underlying context sources discovery mechanism is utilized to maintain context information up to date as context sources and users get dynamically involved
the main contributions of this paper are two fold first we present simple general framework for obtaining efficient constant factor approximation algorithms for the mobile piercing set mps problem on unit disks for standard metrics in fixed dimension vector spaces more specifically we provide low constant approximations for and norms on dimensional space for any fixed and for the norm on two and three dimensional spaces our framework provides family of fully distributed and decentralized algorithms which adapt asymptotically optimally to the mobility of disks at the expense of low degradation on the best known approximation factors of the respective centralized algorithms our algorithms take time to update the piercing set maintained per movement of disk we also present family of fully distributed algorithms for the mps problem which either match or improve the best known approximation bounds of centralized algorithms for the respective norms and space dimensionssecond we show how the proposed algorithms can be directly applied to provide theoretical performance analyses for two popular hop clustering algorithms in ad hoc networks the lowest id algorithm and the least cluster change lcc algorithm more specifically we formally prove that the lcc algorithm adapts in constant time to the mobility of the network nodes and minimizes up to low constant factors the number of hop clusters maintained while there is vast literature on simulation results for the lcc and the lowest id algorithms these had not been formally analyzed prior to this workwe also present an log approximation algorithm for the mobile piercing set problem for nonuniform disks ie disks that may have different radii with constant update time
the proliferation of text documents on the web as well as within institutions necessitates their convenient organization to enable efficient retrieval of information although text corpora are frequently organized into concept hierarchies or taxonomies the classification of the documents into the hierarchy is expensive in terms of human effort we present novel and simple hierarchical dirichlet generative model for text corpora and derive an efficient algorithm for the estimation of model parameters and the unsupervised classification of text documents into given hierarchy the class conditional feature means are assumed to be inter related due to the hierarchical bayesian structure of the model we show that the algorithm provides robust estimates of the classification parameters by performing smoothing or regularization we present experimental evidence on real web data that our algorithm achieves significant gains in accuracy over simpler models
structural constraint solving is being increasingly used for software reliability tasks such as systematic testing or error recovery for example the korat algorithm provides constraint based test generation given java predicate that describes desired input constraints and bound on the input size korat systematically searches the bounded input space of the predicate to generate all inputs that satisfy the constraints as another example the starc tool uses constraint based search to repair broken data structures key issue for these approaches is the efficiency of search this paper presents novel approach that significantly improves the efficiency of structural constraint solvers specifically most existing approaches use backtracking through code re execution to explore their search space in contrast our approach performs checkpoint based backtracking by storing partial program states and performing abstract undo operations the heart of our approach is light weight search that is performed purely through code instrumentation the experimental results on korat and starc for generating and repairing set of complex data structures show an order to two orders of magnitude speed up over the traditionally used searches
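A toy, Korat-flavoured illustration of constraint-based generation by backtracking over candidate field values; note that it re-evaluates the predicate on each extension rather than checkpointing partial program states, which is the overhead the instrumentation-based approach above is designed to remove. The example predicate (strictly increasing sequences) is ours.

```python
def generate(size, domain, predicate):
    """Enumerate all vectors of `size` values drawn from `domain` that
    satisfy `predicate`, by depth-first backtracking. The predicate is also
    used to prune partial candidates, so it must be meaningful on prefixes."""
    results, candidate = [], []

    def search():
        if len(candidate) == size:
            if predicate(candidate):
                results.append(list(candidate))
            return
        for value in domain:
            candidate.append(value)
            if predicate(candidate):        # prune: only extendable prefixes recurse
                search()
            candidate.pop()                 # backtrack by undoing the choice in place

    search()
    return results

# Example predicate: strictly increasing sequences (valid on prefixes too).
increasing = lambda xs: all(a < b for a, b in zip(xs, xs[1:]))
print(generate(3, range(5), increasing))
```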
self management is put forward as one of the means by which we could provide systems that are scalable support dynamic composition and rigorous analysis and are flexible and robust in the presence of change in this paper we focus on architectural approaches to self management not because the language level or network level approaches are uninteresting or less promising but because we believe that the architectural level seems to provide the required level of abstraction and generality to deal with the challenges posed self managed software architecture is one in which components automatically configure their interaction in way that is compatible with an overall architectural specification and achieves the goals of the system the objective is to minimise the degree of explicit management necessary for construction and subsequent evolution whilst preserving the architectural properties implied by its specification this paper discusses some of the current promising work and presents an outline three layer reference model as context in which to articulate some of the main outstanding research challenges
we present method for scattered data approximation with subdivision surfaces which actually uses the true representation of the limit surface as linear combination of smooth basis functions associated with the control vertices robust and fast algorithm for exact closest point search on loop surfaces which combines newton iteration and non linear minimization is used for parameterizing the samples based on this we perform unconditionally convergent parameter correction to optimize the approximation with respect to the metric and thus we make well established scattered data fitting technique which has been available before only for spline surfaces applicable to subdivision surfaces we also adapt the recently discovered local second order squared distance function approximant to the parameter correction setup further we exploit the fact that the control mesh of subdivision surface can have arbitrary connectivity to reduce the error up to certain user defined tolerance by adaptively restructuring the control mesh combining the presented algorithms we describe complete procedure which is able to produce high quality approximations of complex detailed models
we advocate the use of point sets to represent shapes we provide definition of smooth manifold surface from set of points close to the original surface the definition is based on local maps from differential geometry which are approximated by the method of moving least squares mls we present tools to increase or decrease the density of the points thus allowing an adjustment of the spacing among the points to control the fidelity of the representation to display the point set surface we introduce novel point rendering technique the idea is to evaluate the local maps according to the image resolution this results in high quality shading effects and smooth silhouettes at interactive frame rates
in this paper we propose novel online learning algorithm for system level power management we formulate both dynamic power management dpm and dynamic voltage frequency scaling problems as one of workload characterization and selection and solve them using our algorithm the selection is done among set of experts which refers to set of dpm policies and voltage frequency settings leveraging the fact that different experts outperform each other under different workloads and device leakage characteristics the online learning algorithm adapts to changes in the characteristics and guarantees fast convergence to the best performing expert in our evaluation we perform experiments on hard disk drive hdd and intel pxax core cpu with real life workloads our results show that our algorithm adapts really well and achieves an overall performance comparable to the best performing expert at any point in time with energy savings as high as and for hdd and cpu respectively moreover it is extremely lightweight and has negligible overhead
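A generic sketch of the exponentially weighted expert-selection scheme in the spirit of the online learning described above; the expert set, the loss signal (for example a normalized energy-delay cost per interval), and the full-information assumption that every expert's loss is observable each round are all simplifications of ours.

```python
import math
import random

def select_expert_online(experts, observe_losses, rounds, eta=0.5):
    """`experts` is any list of candidate policies (DPM timeouts, V/f settings, ...).
    `observe_losses(chosen)` actuates the chosen expert for one interval and
    returns a list with one loss in [0, 1] per expert for that interval.
    Weights of badly performing experts decay exponentially, so selection
    converges toward the best-performing expert."""
    weights = [1.0] * len(experts)
    for _ in range(rounds):
        total = sum(weights)
        probs = [w / total for w in weights]
        chosen = random.choices(range(len(experts)), weights=probs, k=1)[0]
        losses = observe_losses(chosen)              # measured per-expert losses
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return max(range(len(experts)), key=lambda i: weights[i])
```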
pauses in distributed groupware activity can indicate anything from technical latency through infrastructure failure to participant’s thoughtful contemplation unraveling these ambiguities highlights mismatches between unseen off screen activities and on screen cursor behaviors in this paper we suggest that groupware systems have typically been poor at representing off screen activities and introduce the concept of display trajectories to bridge the sensor gap between the display and its surrounding space we consider requirements for display trajectories using the distributed social scientific analysis of video data as an example domain drawing on these requirements we prototype freeform whiteboard pen tracking and visualization technique around displays using ultrasound we describe an experiment which inspects the impact of display trajectories on remote response efficiency our findings show that visualization of the display trajectory improves participants ability to coordinate their actions by one second per interaction turn reducing latency in organizing turn taking by standard maximum conversation pause
new hand held laser range scanner is introduced that can capture multi view range images of an object and integrate the images without registering them the scanner uses reference double frame that acts as the coordinate system of the object range images captured from different views of the object are in the coordinate system of the double frame and thus automatically come together single view image is obtained by sweeping laser line over the object while keeping the camera fixed and analyzing the acquired laser stripes the laser line generator and the camera can move independently making it possible to conveniently scan an object just like painting over it with paintbrush while viewing it from different views the hardware and software organization of the scanner are described the characteristics of the scanner are investigated and example images captured by the scanner are presented
instruction scheduling hardware can be simplified and easily pipelined if pairs of dependent instructions are fused so they share single instruction scheduling slot we study an implementation of the isa that dynamically translates code to an underlying isa that supports instruction fusing microarchitecture that is co designed with the fused instruction set completes the implementation in this paper we focus on the dynamic binary translator for such co designed virtual machine the dynamic binary translator first cracks instructions belonging to hot superblocks into risc style micro operations and then uses heuristics to fuse together pairs of dependent micro operations experimental results with spec integer benchmarks demonstrate that the fused isa with dynamic binary translation reduces the number of scheduling decisions by about versus conventional implementation that uses hardware cracking into risc micro operations an instruction scheduling slot needs only hold two source register fields even though it may hold two instructions translations generated in the proposed isa consume about less storage than corresponding fixed length risc style isa
real time control of three dimensional avatars is an important problem in the context of computer games and virtual environments avatar animation and control is difficult however because large repertoire of avatar behaviors must be made available and the user must be able to select from this set of behaviors possibly with low dimensional input device one appealing approach to obtaining rich set of avatar behaviors is to collect an extended unlabeled sequence of motion data appropriate to the application in this paper we show that such motion database can be preprocessed for flexibility in behavior and efficient search and exploited for real time avatar control flexibility is created by identifying plausible transitions between motion segments and efficient search through the resulting graph structure is obtained through clustering three interface techniques are demonstrated for controlling avatar motion using this data structure the user selects from set of available choices sketches path through an environment or acts out desired motion in front of video camera we demonstrate the flexibility of the approach through four different applications and compare the avatar motion to directly recorded human motion
we study the problem of aggregate querying over sensor networks where the network topology is continuously evolving we develop scalable data aggregation techniques that remain efficient and accurate even as nodes move join or leave the network we present novel distributed algorithm called counttorrent that enables fast estimation of certain classes of aggregate queries such as count and sum counttorrent does not require static routing infrastructure is easily implemented in distributed setting and can be used to inform all network nodes of the aggregate query result instead of just the query initiator as is done in traditional query aggregation schemes we evaluate its robustness and accuracy compared to previous aggregation approaches through simulations of dynamic and mobile sensor network environments and experiments on micaz motes we show that in networks where the nodes are stationary counttorrent can provide accurate aggregate results even in the presence of lossy links in mobile sensor networks where the nodes constantly move and hence the network topology changes continuously counttorrent provides close within estimate of the accurate aggregate query value to all nodes in the network at all times
high performance embedded architectures will in some cases combine simple caches and multithreading two techniques that increase energy efficiency and performance at the same time however that combination can produce high and unpredictable cache miss rates even when the compiler optimizes the data layout of each program for the cache this paper examines data cache aware compilation for multithreaded architectures data cache aware compilation finds layout for data objects which minimizes inter object conflict misses this research extends and adapts prior cache conscious data layout optimizations to the much more difficult environment of multithreaded architectures solutions are presented for two computing scenarios the more general case where any application can be scheduled along with other applications and the case where the co scheduled working set is more precisely known
demand for content served by provider can fluctuate with time complicating the task of provisioning serving resources so that requests for its content are not rejected one way to address this problem is to have providers form collective in which they pool together their serving resources to assist in servicing requests for one another’s content in this paper we determine the conditions under which provider’s participation in collective reduces the rejection rate of requests for its content property that is necessary for the provider to justify participating in the collective we show that all request rejection rates are reduced when the collective is formed from homogeneous set of providers but that some rates can increase within heterogeneous sets of collectives we also show that asymptotically growing the size of the collective will sometimes but not always resolve this problem we explore the use of thresholding techniques where each collective participant sets aside portion of its serving resources to serve only requests for its own content we show that thresholding allows more diverse set of providers to benefit from the collective model making collectives more viable option for content delivery services
data breakpoint associates debugging actions with programmer specified conditions on the memory state of an executing program data breakpoints provide means for discovering program bugs that are tedious or impossible to isolate using control breakpoints alone in practice programmers rarely use data breakpoints because they are either unimplemented or prohibitively slow in available debugging software in this paper we present the design and implementation of practical data breakpoint facility data breakpoint facility must monitor all memory updates performed by the program being debugged we implemented and evaluated two complementary techniques for reducing the overhead of monitoring memory updates first we checked write instructions by inserting checking code directly into the program being debugged the checks use segmented bitmap data structure that minimizes address lookup complexity second we developed data flow algorithms that eliminate checks on some classes of write instructions but may increase the complexity of the remaining checks we evaluated these techniques on the sparc using the spec benchmarks checking each write instruction using segmented bitmap achieved an average overhead of this overhead is independent of the number of breakpoints in use data flow analysis eliminated an average of of the dynamic write checks for scientific programs such as the nas kernels analysis reduced write checks by factor of ten or more on the sparc these optimizations reduced the average overhead to
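A small sketch of the segmented bitmap idea: a top-level segment table whose entries are lazily allocated bitmaps, so the check inserted before each write reduces to a couple of index and mask operations. In the paper this check is compiled directly into the debugged program; here it is ordinary Python for illustration, and the segment size is chosen arbitrarily.

```python
class SegmentedBitmap:
    """Watched-address set: a segment table whose entries are lazily allocated
    bitmaps, keeping lookups cheap and memory proportional to watched regions."""
    SEG_BITS = 16                               # 64 Ki addresses per segment (arbitrary)

    def __init__(self):
        self.segments = {}

    def watch(self, addr):
        seg = addr >> self.SEG_BITS
        off = addr & ((1 << self.SEG_BITS) - 1)
        bitmap = self.segments.setdefault(seg, bytearray(1 << (self.SEG_BITS - 3)))
        bitmap[off >> 3] |= 1 << (off & 7)

    def check_write(self, addr):
        """The code inserted before each write instruction reduces to this."""
        bitmap = self.segments.get(addr >> self.SEG_BITS)
        if bitmap is None:
            return False                        # no watched address in this segment
        off = addr & ((1 << self.SEG_BITS) - 1)
        return bool(bitmap[off >> 3] & (1 << (off & 7)))
```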
in emergency scenarios we can obtain more effective coordination among team members each of them equipped with hand held devices through the use of workflow management software team members constitute mobile ad hoc network manet whose topology both influences and is influenced by the workflow in this paper we propose an algebraic approach for modeling workflow progress as well as its modifications as required by topology transformations the approach is based on algebraic higher order nets and sees both workflows and topologies as tokens allowing their concurrent modification
effective and efficient retrieval of similar shapes from large image databases is still challenging problem in spite of the high relevance that shape information can have in describing image contents in this paper we propose novel fourier based approach called warp for matching and retrieving similar shapes the unique characteristics of warp are the exploitation of the phase of fourier coefficients and the use of the dynamic time warping dtw distance to compare shape descriptors while phase information provides more accurate description of object boundaries than using only the amplitude of fourier coefficients the dtw distance permits us to accurately match images even in the presence of limited phase shiftings in terms of classical precision recall measures we experimentally demonstrate that warp can gain say up to percent in precision at percent recall level with respect to fourier based techniques that use neither phase nor dtw distance
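The dynamic time warping distance that WARP builds on, in its textbook O(nm) dynamic-programming form; feeding it Fourier phase descriptors, and any constraints the authors may impose on the warping path, are outside this sketch.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Textbook dynamic time warping distance between two descriptor sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```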
with the increased complexity of platforms coupled with data centers servers sprawl power consumption is reaching unsustainable limits memory is an important target for platform level energy efficiency where most power management techniques use multiple power state dram devices to transition them to low power states when they are sufficiently idle however fully interleaved memory in high performance servers presents research challenge to the memory power management problem due to data striping across all memory modules memory accesses are distributed in manner that considerably reduces the idleness of memory modules to warrant transitions to low power states in this paper we introduce novel technique for dynamic memory interleaving that is adaptive to incoming workload in manner that reduces memory energy consumption while maintaining the performance at an acceptable level we use optimization theory to formulate and solve the power performance management problem we use dynamic cache line migration techniques to increase the idleness of memory modules by consolidating the application’s working set on minimal set of ranks our technique yields energy saving of about kj compared to traditional techniques measured at it delivers the maximum performance per watt during all phases of the application execution with maximum performance per watt improvement of
recently there have been several experimental and theoretical results showing significant performance benefits of recursive algorithms on both multi level memory hierarchies and on shared memory systems in particular such algorithms have the data reuse characteristics of blocked algorithm that is simultaneously blocked at many different levels most existing applications however are written using ordinary loops we present new compiler transformation that can be used to convert loop nests into recursive form automatically we show that the algorithm is fast and effective handling loop nests with arbitrary nesting and control flow the transformation achieves substantial performance improvements for several linear algebra codes even on current system with two level cache hierarchy as side effect of this work we also develop an improved algorithm for transitive dependence analysis powerful technique used in the recursion transformation and other loop transformations that is much faster than the best previously known algorithm in practice
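To make the transformation concrete, here is the canonical target such compilers aim for: a triply nested matrix-multiply loop rewritten by hand as a divide-and-conquer recursion that is implicitly blocked at every level of the memory hierarchy. The compiler derives such code automatically; this illustrative version assumes the matrix size is the leaf size times a power of two.

```python
def matmul_recursive(A, B, C, ri, rj, rk, n, leaf=32):
    """Compute C[ri:ri+n, rj:rj+n] += A[ri:ri+n, rk:rk+n] * B[rk:rk+n, rj:rj+n]
    by halving each index range; the recursion is blocked at every level,
    unlike the original triple loop. Assumes n = leaf * 2**k."""
    if n <= leaf:                                   # base case: a small loop nest
        for i in range(ri, ri + n):
            for k in range(rk, rk + n):
                aik = A[i][k]
                for j in range(rj, rj + n):
                    C[i][j] += aik * B[k][j]
        return
    h = n // 2
    for di in (0, h):                               # recurse on the 8 sub-products
        for dj in (0, h):
            for dk in (0, h):
                matmul_recursive(A, B, C, ri + di, rj + dj, rk + dk, h, leaf)
```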
given distributed computation and global predicate predicate detection involves determining whether there exists at least one consistent cut or global state of the computation that satisfies the predicate on the other hand computation slicing is concerned with computing the smallest subcomputation with the least number of consistent cuts that contains all consistent cuts of the computation satisfying the predicate in this paper we investigate the relationship between predicate detection and computation slicing and show that the two problems are equivalent specifically given an algorithm to detect predicate in computation we derive an algorithm to compute the slice of with respect to the time complexity of the derived slicing algorithm is where is the number of processes and is the set of events and is the time complexity of the detection algorithm we discuss how the equivalence result of this paper can be utilized to derive faster algorithm for solving the general predicate detection problem in many casesslicing algorithms described in our earlier papers are all off line in nature in this paper we also present two on line algorithms for computing the slice the first algorithm can be used to compute the slice for general predicate its amortized time complexity is where is the average concurrency in the computation and is the time complexity of the detection algorithm the second algorithm can be used to compute the slice for regular predicate its amortized time complexity is only
we present cache oblivious solutions to two important variants of range searching range reporting and approximate range counting the main contribution of our paper is general approach for constructing cache oblivious data structures that provide relative approximations for general class of range counting queries this class includes three sided range counting dominance counting and halfspace range counting our technique allows us to obtain data structures that use linear space and answer queries in the optimal query bound of logb block transfers in the worst case where is the number of points in the query range using the same technique we also obtain the first approximate halfspace range counting and dominance counting data structures with worst case query time of log in internal memory an easy but important consequence of our main result is the existence of log space cache oblivious data structures with an optimal query bound of logbn block transfers for the reporting versions of the above problems using standard reductions these data structures allow us to obtain the first cache oblivious data structures that use near linear space and achieve the optimal query bound for circular range reporting and nearest neighbour searching in the plane as well as for orthogonal range reporting in three dimensions
servlet cache can effectively improve the throughput and reduce response time experienced by customers in servlet container an essential issue of servlet cache is cache replacement traditional solutions such as lru lfu and gdsf only concern some intrinsic factors of cache objects regardless of associations among cached objects for higher performance some approaches are proposed to utilize these associations to predict customer visit behaviors but they are still restricted by first order markov model and lead to inaccurate predication in this paper we describe associations among servlets as sequential patterns and compose them into pattern graphs which eliminates the limitation of markov model and achieve more accurate predictions at last we propose discovery algorithm to generate pattern graphs and two predictive probability functions for cache replacement based on pattern graphs our evaluation shows that this approach can get higher cache hit ratio and effectively improve the performance of servlet container
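For reference, a sketch of the GDSF baseline named above, whose priority combines an aging clock with frequency, miss cost, and object size; the pattern-graph-based prediction proposed in the paper is not shown, and the data-structure choices here (a lazy-deletion heap) are ours.

```python
import heapq

class GDSFCache:
    """Greedy-Dual-Size-Frequency replacement: priority = clock + freq * cost / size;
    the clock is inflated to each evicted priority so old objects age out."""
    def __init__(self, capacity):
        self.capacity, self.used, self.clock = capacity, 0, 0.0
        self.entries = {}          # key -> [priority, freq, size, cost, value]
        self.heap = []             # (priority, key) lazy-deletion min-heap

    def get(self, key):
        e = self.entries.get(key)
        if e is None:
            return None
        e[1] += 1
        e[0] = self.clock + e[1] * e[3] / e[2]        # refresh priority on a hit
        heapq.heappush(self.heap, (e[0], key))
        return e[4]

    def put(self, key, value, size, cost=1.0):
        if key in self.entries:                       # simple sketch: no in-place update
            return
        while self.used + size > self.capacity and self.entries:
            prio, victim = heapq.heappop(self.heap)
            e = self.entries.get(victim)
            if e is None or e[0] != prio:
                continue                              # stale heap entry, skip
            self.clock = prio                         # age the cache
            self.used -= e[2]
            del self.entries[victim]
        e = [self.clock + cost / size, 1, size, cost, value]
        self.entries[key] = e
        self.used += size
        heapq.heappush(self.heap, (e[0], key))
```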
we present novel unsupervised sentence fusion method which we apply to corpus of biographies in german given group of related sentences we align their dependency trees and build dependency graph using integer linear programming we compress this graph to new tree which we then linearize we use germanet and wikipedia for checking semantic compatibility of co arguments in an evaluation with human judges our method outperforms the fusion approach of barzilay mckeown with respect to readability
the term frequency normalisation parameter sensitivity is an important issue in the probabilistic model for information retrieval high parameter sensitivity indicates that slight change of the parameter value may considerably affect the retrieval performance therefore weighting model with high parameter sensitivity is not robust enough to provide consistent retrieval performance across different collections and queries in this paper we suggest that the parameter sensitivity is due to the fact that the query term weights are not adequate enough to allow informative query terms to differ from non informative ones we show that query term reweighing which is part of the relevance feedback process can be successfully used to reduce the parameter sensitivity experiments on five text retrieval conference trec collections show that the parameter sensitivity does remarkably decrease when query terms are reweighed
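As one concrete instance of the parameter in question, a widely used term frequency normalization from the divergence-from-randomness family (our choice of example; the argument applies to such hyper-parameters in general):

```latex
% Within-document term frequency tf is rescaled by document length l(d)
% relative to the average length avg_l, under the control of a single
% hyper-parameter c > 0 -- the parameter whose sensitivity is at issue.
\mathit{tfn} = \mathit{tf} \cdot \log_{2}\!\Bigl(1 + c \cdot \frac{\mathrm{avg\_l}}{l(d)}\Bigr)
```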
schlipf proved that stable logic programming slp solves all $\mathit{NP}$ decision problems we extend schlipf’s result to prove that slp solves all search problems in the class $\mathit{NP}$ moreover we do this in uniform way as defined in marek and truszczyński specifically we show that there is single $\mathrm{Datalog}^{\neg}$ program $\mathit{trg}$ such that given any turing machine $M$ any polynomial $p$ with non negative integer coefficients and any input $\sigma$ of size $n$ over fixed alphabet $\Sigma$ there is an extensional database $\mathit{edb}_{\sigma}$ such that there is one to one correspondence between the stable models of $\mathit{edb}_{\sigma} \cup \mathit{trg}$ and the accepting computations of the machine $M$ that reach the final state in at most $p(n)$ steps moreover $\mathit{edb}_{\sigma}$ can be computed in polynomial time from $\sigma$ and the description of $M$ and the decoding of such accepting computations from its corresponding stable model of $\mathit{edb}_{\sigma} \cup \mathit{trg}$ can be computed in linear time similar statement holds for default logic with respect to $\Sigma^{\mathrm{P}}_{2}$ search problems
network on chip noc based chip multiprocessors cmps are expected to become more widespread in future in both high performance scientific computing and low end embedded computing for many execution environments that employ these systems reducing power consumption is an important goal this paper presents software approach for reducing power consumption in such systems through compiler directed voltage frequency scaling the unique characteristic of this approach is that it scales the voltages and frequencies of select cpus and communication links in coordinated manner to maximize energy savings without degrading performance our approach has three important components the first component is the identification of phases in the application the next step is to determine the critical execution paths and slacks in each phase for implementing these two components our approach employs novel parallel program representation the last component of our approach is the assignment of voltages and frequencies to cpus and communication links to maximize energy savings we use integer linear programming ilp for this voltage frequency assignment problem to test our approach we implemented it within compilation framework and conducted experiments with applications from the specomp suite and specjbb our results show that the proposed combined cpu link scaling is much more effective than scaling voltages of cpus or communication links in isolation in addition we observed that the energy savings obtained are consistent across wide range of values of our major simulation parameters such as the number of cpus the number of voltage frequency levels and the thread to cpu mapping
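A schematic of the kind of 0-1 ILP such an assignment step solves, in our own notation rather than the paper's: binary variables pick one voltage/frequency level per CPU and per link in each phase, total energy is minimized, and the extra delay charged to each phase must fit within that phase's slack (the additive delay model shown is deliberately simplified):

```latex
% x_{r,l,p} = 1 iff resource r (a CPU or a communication link) runs at
% voltage/frequency level l during phase p; E and \Delta t are the
% corresponding energy and added delay, slack_p the critical-path slack.
\min \sum_{p} \sum_{r} \sum_{\ell} E_{r,\ell,p}\, x_{r,\ell,p}
\quad \text{s.t.} \quad
\sum_{\ell} x_{r,\ell,p} = 1 \;\; \forall r,p ;
\qquad
\sum_{r}\sum_{\ell} \Delta t_{r,\ell,p}\, x_{r,\ell,p} \le \mathrm{slack}_{p} \;\; \forall p ;
\qquad
x_{r,\ell,p} \in \{0,1\}
```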
distributed computing in large size dynamic networks often requires the availability at each and every node of globally aggregated information about some overall properties of the network in this context traditional broadcasting solutions become inadequate as the number of participating nodes increases therefore aggregation schemes inspired by the physical biological phenomenon of diffusion have been recently proposed as simple yet effective alternative to solve the problem however diffusive aggregation algorithms require solutions to cope with the dynamics of the network and or of the values being aggregated solutions which are typically based on periodic restarts epoch based approaches this paper proposes an original and autonomic solution relying on coupling diffusive aggregation schemes with the bio inspired mechanism of evaporation while gossip based diffusive communication scheme is used to aggregate values over network gradual evaporation of values can be exploited to account for network and value dynamics without requiring periodic restarts comparative performance evaluation shows that the evaporative approach is able to manage the dynamism of the values and of the network structure in an effective way in most situations it leads to more accurate aggregate estimations than epoch based techniques
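A toy round-based simulation contrasting plain pairwise gossip averaging with an evaporation-style variant in which estimates decay and fresh local inputs are re-injected each round, so stale contributions fade without epoch restarts; the decay constant and the exact way evaporation is applied are illustrative guesses, not the paper's scheme.

```python
import random

def gossip_round(values, neighbors):
    """One synchronous round of pairwise averaging (diffusive aggregation).
    values: dict node -> current estimate; neighbors: dict node -> list of nodes."""
    new = dict(values)
    for node in values:
        peer = random.choice(neighbors[node])
        avg = (new[node] + new[peer]) / 2.0    # pairwise averaging preserves the sum
        new[node] = new[peer] = avg
    return new

def gossip_round_evaporating(values, neighbors, local_input, evaporation=0.05):
    """Variant: estimates decay slightly each round while local inputs are
    re-injected, so contributions from departed nodes or outdated values
    gradually evaporate instead of requiring a restart."""
    new = gossip_round(values, neighbors)
    return {n: (1 - evaporation) * v + evaporation * local_input[n]
            for n, v in new.items()}
```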
the application of data mining and knowledge discovery techniques to medical and health datasets is rewarding but highly challenging area not only are the datasets large complex heterogeneous hierarchical time varying and of varying quality but there exists a substantial medical knowledge base which demands robust collaboration between the data miner and the health professional if useful information is to be extracted this paper presents the experiences of the authors and others in applying exploratory data mining techniques to medical health and clinical data in so doing it elicits number of general issues and provides pointers to possible areas of future research in data mining and knowledge discovery more broadly
this paper addresses how intellectual property affects the web in general and content publishing on the web in particular before its commercialization the web was perceived as being free and unregulated this assumption is no longer true nowadays content providers need to know which practices on the web can result in potential legal problems the vast majority of web sites are developed by individual such as technical writers or graphic artists and small organizations which receive limited or no legal advice as result these web sites are developed with little or no regard to the legal constraints of intellectual property law in order to help this group of people the paper tries to answer the following question what are the typical legal issues for web content providers to watch out for this paper gives an overview of these legal issues for intellectual property ie copyrights patents and trademarks and discusses relevant law cases as first step towards more formal risk assessment of intellectual property issues we introduce maturity model that captures web site’s intellectual property coverage with five different maturity levels
in this paper we study queries over relational databases with integrity constraints ics the main problem we analyze is owa query answering ie query answering over database with ics under open world assumption the kinds of ics that we consider are functional dependencies in particular key dependencies and inclusion dependencies the query languages we consider are conjunctive queries cqs union of conjunctive queries ucqs cqs and ucqs with negation and or inequality we present set of results about the decidability and finite controllability of owa query answering under ics in particular i we identify the decidability undecidability frontier for owa query answering under different combinations of the ics allowed and the query language allowed ii we study owa query answering both over finite databases and over unrestricted databases and identify the cases in which such problem is finitely controllable ie when owa query answering over finite databases coincides with owa query answering over unrestricted databases moreover we are able to easily turn the above results into new results about implication of ics and query containment under ics due to the deep relationship between owa query answering and these two classical problems in database theory in particular we close two long standing open problems in query containment since we prove finite controllability of containment of conjunctive queries both under arbitrary inclusion dependencies and under key and foreign key dependencies besides their theoretical interest we believe that the results of our investigation are very relevant in many research areas which have recently dealt with databases under an incomplete information assumption eg view based information access ontology based information systems data integration data exchange and peer to peer information systems
purely functional programs should run well on parallel hardware because of the absence of side effects but it has proved hard to realise this potential in practice plenty of papers describe promising ideas but vastly fewer describe real implementations with good wall clock performance we describe just such an implementation and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build our measurements are necessarily detailed and specific but they are reproducible and we believe that they offer some general insights
prefetching is often used to overlap memory latency with computation for array based applications however prefetching for pointer intensive applications remains challenge because of the irregular memory access pattern and pointer chasing problem in this paper we proposed cooperative hardware software prefetching framework the push architecture which is designed specifically for linked data structures the push architecture exploits program structure for future address generation instead of relying on past address history it identifies the load instructions that traverse lds and uses prefetch engine to execute them ahead of the cpu execution this allows the prefetch engine to successfully generate future addresses to overcome the serial nature of lds address generation the push architecture employs novel data movement model it attaches the prefetch engine to each level of the memory hierarchy and pushes rather than pulls data to the cpu this push model decouples the pointer dereference from the transfer of the current node up to the processor thus series of pointer dereferences becomes pipelined process rather than serial process simulation results show that the push architecture can reduce up to of memory stall time on suite of pointer intensive applications reducing overall execution time by an average
slipstream processor accelerates program by speculatively removing repeatedly ineffectual instructions detecting the roots of ineffectual computation unreferenced writes nonmodifying writes and correctly predicted branches is straightforward on the other hand detecting ineffectual instructions in the backward slices of these root instructions currently requires complex back propagation circuitry we observe that by logically monitoring the speculative program instead of the original program back propagation can be reduced to detecting unreferenced writes that is once root instructions are actually removed instructions at the next higher level in the backward slice become newly exposed unreferenced writes in the speculative program this new algorithm called implicit back propagation eliminates complex hardware and achieves an average performance improvement of percent only marginally lower than the percent improvement achieved with explicit back propagation we further simplify the hardware component by electing not to detect ineffectual memory writes focusing only on ineffectual register writes minimal implementation consisting of only register indexed table similar to an architectural register file achieves good balance between complexity and performance percent average performance improvement with implicit back propagation and without detection of ineffectual memory writes
paper and traditional books have been serving as useful tools in supporting knowledge intensive tasks and school learning although learning strategies such as selective verbatim note taking or question asking may foster intentional recall or resolve comprehension difficulties in paper based learning practice improvement in learning may depend on the opportunity and quality of which students apply note taking review notes or enhance comprehension through questioning this study aims to complement paper textbook with mobile phone and to treat the combination as whole to facilitate verbatim note taking resolving comprehension questions and receiving reading recommendations the textbook paragraphs were augmented with line numbers to facilitate coordination between the mobile phone and the paper textbook an eight week comparative study was conducted to explore the use of two reading vehicles the results and findings show that using mobile phone to augment paper based learning is technically feasible and seems to promote the application of verbatim note taking and posting comprehension questions for discussion however the results of two course tests indicate that consequent learning improvement seemed inconsistent among the students six week case study was also conducted to explore the implications of the augmented support to students learning practice the findings show that mobile phones as learning supportive tools to augment paper based learning could support students planning and management of learning strategies or activities the portability of mobile phones and paper textbooks and the ubiquitous connection of paper based learning with an online learning community may provide the flexibility in planning ahead for suitable learning strategies or activities and may enhance students assessment for management of students learning goals
stack inspection is mechanism for programming secure applications by which method can obtain information from the call stack about the code that directly or indirectly invoked it this mechanism plays fundamental role in the security architecture of java and the net common language runtime central problem with stack inspection is to determine to what extent the local checks inserted into the code are sufficient to guarantee that global security property is enforced in this paper we present technique for inferring secure calling context for method by secure calling context we mean pre condition on the call stack sufficient for guaranteeing that execution of the method will not violate given global property this is particularly useful for annotating library code in order to avoid having to re analyse libraries for every new application the technique is constraint based static program analysis implemented via fixed point iteration over an abstract domain of linear temporal logic properties
macro tree transducers mtt are an important model that both covers many useful xml transformations and allows decidable exact typechecking this paper reports our first step toward an implementation of mtt typechecker that has practical efficiency our approach is to represent an input type obtained from backward inference as an alternating tree automaton in style similar to tozawa’s xslt typechecking in this approach typechecking reduces to checking emptiness of an alternating tree automaton we propose several optimizations cartesian factorization state partitioning on the backward inference process in order to produce much smaller alternating tree automata than the naive algorithm and we present our efficient algorithm for checking emptiness of alternating tree automata where we exploit the explicit representation of alternation for local optimizations our preliminary experiments confirm that our algorithm has practical performance that can typecheck simple transformations with respect to the full xhtml in reasonable time
previous studies on extracting class attributes from unstructured text consider either web documents or query logs as the source of textual data web search queries have been shown to yield attributes of higher quality however since many relevant attributes found in web documents occur infrequently in query logs web documents remain an important source for extraction in this paper we introduce bootstrapped web search bws extraction the first approach to extracting class attributes simultaneously from both sources extraction is guided by small set of seed attributes and does not rely on further domain specific knowledge bws is shown to improve extraction precision and also to improve attribute relevance across test classes
materialized views and view maintenance are important for data warehouses retailing banking and billing applications we consider two related view maintenance problems how to maintain views after the base tables have already been modified and how to minimize the time for which the view is inaccessible during maintenance typically view is maintained immediately as part of the transaction that updates the base tables immediate maintenance imposes significant overhead on update transactions that cannot be tolerated in many applications in contrast deferred maintenance allows view to become inconsistent with its definition refresh operation is used to reestablish consistency we present new algorithms to incrementally refresh view during deferred maintenance our algorithms avoid state bug that has artificially limited techniques previously used for deferred maintenance incremental deferred view maintenance requires auxiliary tables that contain information recorded since the last view refresh we present three scenarios for the use of auxiliary tables and show how these impact per transaction overhead and view refresh time each scenario is described by an invariant that is required to hold in all database states we then show that with the proper choice of auxiliary tables it is possible to lower both per transaction overhead and view refresh time
user interaction with animated hair is desirable for various applications but difficult because it requires real time animation and rendering of hair hair modeling including styling simulation and rendering is computationally challenging due to the enormous number of deformable hair strands on human head elevating the computational complexity of many essential steps such as collision detection and self shadowing for hair using simulation localization techniques multi resolution representations and graphics hardware rendering acceleration we have developed physically based virtual hair salon system that simulates and renders hair at accelerated rates enabling users to interactively style virtual hair with haptic interface users can directly manipulate and position hair strands as well as employ real world styling applications cutting blow drying etc to create hairstyles more intuitively than previous techniques
the piduce project comprises programming language and distributed runtime environment devised for experimenting web services technologies by relying on solid theories about process calculi and formal languages for xml documents and schemas the language features values and datatypes that extend xml documents and schemas with channels an expressive type system with subtyping pattern matching mechanism for deconstructing xml values and control constructs that are based on milner’s asynchronous pi calculus the runtime environment supports the execution of piduce processes over networks by relying on state of the art technologies such as xml schema and wsdl thus enabling interoperability with existing web services we thoroughly describe the piduce project the programming language and its semantics the architecture of the distributed runtime and its implementation
preserving the integrity of software systems is essential in ensuring future product success commonly companies allocate only limited budget toward perfective maintenance and instead pressure developers to focus on implementing new features traditional techniques such as code inspection consume many staff resources and attention from developers metrics automate the process of checking for problems but produce voluminous imprecise and incongruent results an opportunity exists for visualization to assist where automated measures have failed however current software visualization techniques only handle the voluminous aspect of data but fail to address imprecise and incongruent aspects in this paper we describe several techniques for visualizing possible defects reported by automated inspection tools we propose catalogue of lightweight visualizations that assist reviewers in weeding out false positives we implemented the visualizations in tool called noseprints and present case study on several commercial systems and open source applications in which we examined the impact of our tool on the inspection process
this paper explores system issues for involving end users in constructing and enhancing smart home in support of this involvement we present an infrastructure and tangible deployment tool active participation of users is essential in domestic environment as it offers simplicity greater usercentric control lower deployment costs and better support for personalization our proposed infrastructure provides the foundation for end user deployment utilizing loosely coupled framework to represent an artefact and its augmented functionalities pervasive applications are built independently and are expressed as collection of functional tasks runtime component fednet maps these tasks to corresponding service provider artefacts the tangible deployment tool uses fednet and allows end users to deploy and control artefacts and applications only by manipulating rfid cards primary advantages of our approach are two fold firstly it allows end users to deploy ubicomp systems easily in do it yourself fashion secondly it allows developers to write applications and to build augmented artefacts in generic way regardless of the constraints of the target environment we describe an implemented prototype and illustrate its feasibility in real life deployment session by the end users our study shows that the end users might be involved in deploying future ubicomp systems if appropriate tools and supporting infrastructure are provided
we describe an algorithm to generate manifold mesh from an octree while preserving surface features the algorithm requires samples of surface coordinates on the octree edges along with the surface normals at those coordinates the distinct features of the algorithm are the output mesh is manifold the resolution of the output mesh can be adjusted over the space with octree subdivision and surface features are generally preserved a mesh generation algorithm with this combination of advantages has not been presented before
we examine how ambient displays can augment social television social tv is an interactive television solution that incorporates two ambient displays to convey to participants an aggregate view of their friends current tv watching status social tv also allows users to see which television shows friends and family are watching and send lightweight messages from within the tv viewing experience through two week field study we found the ambient displays to be an integral part of the experience we present the results of our field study with discussion of the implications for future social systems in the home
due to the usual incompleteness of information representation any approach to assign semantics to logic programs has to rely on default assumption on the missing information the stable model semantics that has become the dominating approach to give semantics to logic programs relies on the closed world assumption cwa which asserts that by default the truth of an atom is false there is second well known assumption called open world assumption owa which asserts that the truth of the atoms is supposed to be unknown by default however the cwa the owa and the combination of them are extremal though important assumptions over large variety of possible assumptions on the truth of the atoms whenever the truth is taken from an arbitrary truth space the topic of this paper is to allow any assignment ie interpretation over truth space to be default assumption our main result is that our extension is conservative in the sense that under the everywhere false default assumption cwa the usual stable model semantics is captured due to the generality and the purely algebraic nature of our approach it abstracts from the particular formalism of choice and the results may be applied in other contexts as well
many approaches to software verification are currently semi automatic human must provide key logical insights eg loop invariants class invariants and frame axioms that limit the scope of changes that must be analyzed this paper describes technique for automatically inferring frame axioms of procedures and loops using static analysis the technique builds on pointer analysis that generates limited information about all data structures in the heap our technique uses that information to over approximate potentially unbounded set of memory locations modified by each procedure loop this over approximation is candidate frame axiom we have tested this approach on the buffer overflow benchmarks from ase with manually provided specifications and invariants axioms our tool could verify falsify of the benchmarks with our automatically inferred frame axioms the tool could verify falsify of the demonstrating the effectiveness of our approach
this paper provides new worst case bounds for the size and treewidth of the result of conjunctive query to database we derive bounds for the result size in terms of structural properties of both in the absence and in the presence of keys and functional dependencies these bounds are based on novel coloring of the query variables that associates coloring number to each query using this coloring number we derive tight bounds for the size of in case i no functional dependencies or keys are specified and ii simple one attribute keys are given these results generalize recent size bounds for join queries obtained by atserias grohe and marx focs an extension of our coloring technique also gives lower bound for in the general setting of query with arbitrary functional dependencies our new coloring scheme also allows us to precisely characterize both in the absence of keys and with simple keys the treewidth preserving queries the queries for which the output treewidth is bounded by function of the input treewidth finally we characterize the queries that preserve the sparsity of the input in the general setting with arbitrary functional dependencies
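for context, the size bound of atserias, grohe and marx that the coloring-based bounds above generalize can be stated as follows; this is a paraphrase of that earlier known result, not of the new coloring number itself

    % AGM bound: for a join query q and any fractional edge cover (u_R) of its hypergraph,
    % the output size over a database D is bounded by
    |q(D)| \;\le\; \prod_{R \in \mathrm{atoms}(q)} |R^{D}|^{\,u_R}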
this paper proposes system of interactive multimedia contents that allows multiple users to participate in face to face manner and share the same time and space it provides an interactive environment where multiple users can see and manipulate stereoscopic animation with individual sound two application examples are implemented one is location based content design and the other is user based content design both effectively use unique feature of the illusionhole ie location sensitive display device that provides stereoscopic image with multiple users around the table
this paper examines the area power performance and design issues for the on chip interconnects on chip multiprocessor attempting to present comprehensive view of class of interconnect architectures it shows that the design choices for the interconnect have significant effect on the rest of the chip potentially consuming significant fraction of the real estate and power budget this research shows that designs that treat interconnect as an entity that can be independently architected and optimized would not arrive at the best multi core design several examples are presented showing the need for careful co design for instance increasing interconnect bandwidth requires area that then constrains the number of cores or cache sizes and does not necessarily increase performance also shared l2 caches become significantly less attractive when the overhead of the resulting crossbar is accounted for hierarchical bus structure is examined which negates some of the performance costs of the assumed base line architecture
we describe an approach for evaluating whether candidate architecture dependably satisfies stakeholder requirements expressed in requirements level scenarios we map scenarios to architectural elements through an ontology of requirements level event classes and domain entities the scenarios express both functional requirements and quality attributes of the system for quality attributes the scenarios either operationalize the quality or show how the quality can be verified our approach provides connection between requirements stakeholder can understand directly and architectures developed to satisfy those requirements the requirements level ontology simplifies the mapping acts as the focus for maintaining the mapping as both scenarios and architecture evolve and provides foundation for evaluating scenarios and architecture individually and jointly in this paper we focus on the mapping through event classes and demonstrate our approach with two examples
we propose radial stroke and finger count shortcuts two techniques aimed at augmenting the menubar on multi touch surfaces we designed these multi finger two handed interaction techniques in an attempt to overcome the limitations of direct pointing on interactive surfaces while maintaining compatibility with traditional interaction techniques while radial stroke shortcuts exploit the well known advantages of radial strokes finger count shortcuts exploit multi touch by simply counting the number of fingers of each hand in contact with the surface we report the results of an experimental evaluation of our technique focusing on expert mode performance finger count shortcuts outperformed radial stroke shortcuts in terms of both easiness of learning and performance speed
design rules have been the primary contract between technology and design and are likely to remain so to preserve abstractions and productivity while current approaches for defining design rules are largely unsystematic and empirical in nature this paper offers novel framework for early and systematic evaluation of design rules and layout styles in terms of major layout characteristics of area manufacturability and variability due to the focus on co exploration in early stages of technology development we use first order models of variability and manufacturability instead of relying on accurate simulation and layout topology congestion based area estimates instead of explicit and slow layout generation the framework is used to efficiently co evaluate several debatable rules evaluation for cell library takes minutes results show that diffusion rounding mainly from diffusion power straps is dominant source of variability cell area overhead of fixed gate pitch implementation compared to poly implementation is tolerable given the improvement in variability and poly restriction which improves manufacturability and variability has almost no area overhead compared to poly in addition we explore gate spacing rules using our evaluation framework this exploration yields almost identical values as those of commercial nm process which serves as validation for our approach
we improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type this is achieved by parsing the english side of parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs in order to retain broad coverage of non constituent phrases complex syntactic labels are introduced manual evaluation indicates absolute improvement in paraphrase quality over the baseline method
conventional gang scheduling has the disadvantage that when processes perform i/o or blocking communication their processors remain idle because alternative processes cannot be run independently of their own gangs to alleviate this problem we suggest slight relaxation of this rule match gangs that make heavy use of the cpu with gangs that make light use of the cpu presumably due to i/o or communication activity and schedule such pairs together allowing the local scheduler on each node to select either of the two processes at any instant as i/o intensive gangs make light use of the cpu this only causes minor degradation in the service to compute bound jobs this degradation is more than offset by the overall improvement in system performance due to the better utilization of the resources
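to make the matching rule concrete, a minimal python sketch of pairing cpu-bound gangs with i/o-bound gangs is shown below; it assumes each gang object exposes a measured cpu_util field, and the utilization thresholds and greedy matching order are illustrative assumptions rather than the authors' algorithm

    # illustrative sketch: pair gangs with heavy cpu use against gangs with light
    # cpu use so that a node's local scheduler always has a runnable process.
    def pair_gangs(gangs, heavy_threshold=0.7, light_threshold=0.3):
        heavy = sorted((g for g in gangs if g.cpu_util >= heavy_threshold),
                       key=lambda g: g.cpu_util, reverse=True)
        light = sorted((g for g in gangs if g.cpu_util <= light_threshold),
                       key=lambda g: g.cpu_util)
        paired, alone = [], []
        while heavy and light:
            paired.append((heavy.pop(0), light.pop(0)))   # share one scheduling slot
        alone.extend(heavy + light)
        alone.extend(g for g in gangs
                     if light_threshold < g.cpu_util < heavy_threshold)
        return paired, alone   # gangs in `alone` keep a slot to themselves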
scratch pad memories spms are serious alternative to conventional cache memories in embedded computing since they allow software to manage data flowing from and into memory components resulting in predictable behavior at runtime the prior studies considered compiler directed spm management using both static and dynamic approaches one of the assumptions under which most of the proposed approaches to data spm management operate is that the application code is structured with regular loop nests with little or no control flow within the loops this assumption while it makes data spm management relatively easy to implement limits the applicability of those approaches to codes that involve conditional execution and complex control flows to address this problem this paper proposes novel data spm management strategy based on dataflow analysis this analysis operates on representation that reflects the conditional execution flow of the application and consequently it is applicable to large class of embedded applications including those with complex control flows
we propose definition of spatial database system as database system that offers spatial data types in its data model and query language and supports spatial data types in its implementation providing at least spatial indexing and spatial join methods spatial database systems offer the underlying database technology for geographic information systems and other applications we survey data modeling querying data structures and algorithms and system architecture for such systems the emphasis is on describing known technology in coherent manner rather than listing open problems
the publicly available bgp vantage points vps have been heavily used by the research community to build the internet autonomous system as level topology which is key input to many applications such as routing protocol design performance evaluation and network security issues however detailed study on the eyeshots of these vps has received little attention before in this paper we inspect these vps carefully specifically we do measurement work to evaluate the effect of various factors on the eyeshot of each individual vp as well as the relationship between the eyeshots of different vps based on the measurements we disclose several counterintuitive observations and explain the possible reasons behind which will help people to better understand the eyeshots of vps and make better use of them in practice
the constraint diagram language was designed to be used in conjunction with the unified modelling language uml primarily for placing formal constraints on software models in particular constraint diagrams play similar role to the textual object constraint language ocl in that they can be used for specifying system invariants and operation contracts in the context of uml model unlike the ocl however constraint diagrams can be used independently of the uml in this paper we illustrate range of intuitive and counter intuitive features of constraint diagrams and highlight some potential expressiveness limitations the counter intuitive features are related to how the individual pieces of syntax interact generalized version of the constraint diagram language that overcomes the illustrated counter intuitive features and limitations is proposed in order to discourage specification readers and writers from overlooking certain semantic information the generalized notation allows this information to be expressed more explicitly than in the non generalized case the design of the generalized notation takes into account five language design principles which are discussed in the paper we provide formalization of the syntax and semantics for generalized constraint diagrams moreover we establish lower bound on the expressiveness of the generalized notation and show that they are at least as expressive as constraint diagrams
method of reducing the wireless cost of tracking mobile users with uncertain parameters is developed in this paper such uncertainty arises naturally in wireless networks since an efficient user tracking is based on prediction of its future call and mobility parameters the conventional approach based on dynamic tracking is not reliable in the sense that inaccurate prediction of the user mobility parameters may significantly reduce the tracking efficiency unfortunately such uncertainty is unavoidable for mobile users especially for bursty mobility pattern the two main contributions of this study are novel method for topology independent distance tracking and combination of distance based tracking with distance sensitive timer that guarantees both efficiency and robustness the expected wireless cost of tracking under the proposed method is significantly reduced in comparison to the existing methods currently used in cellular networks furthermore as opposed to other tracking methods the worst case tracking cost is bounded from above and governed by the system such that it outperforms the existing methods the proposed strategy can be easily implemented and it does not require significant computational power from the user
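one possible reading of the combination of distance-based tracking with a distance-sensitive timer is sketched below in python; the threshold, the timeout function and the api are invented for illustration and are not taken from the paper

    import time

    # illustrative sketch: the terminal reports its location either when it has
    # moved far enough from the last reported cell or when a timer whose length
    # shrinks with distance expires, bounding the worst-case tracking cost.
    class DistanceSensitiveTracker:
        def __init__(self, distance_threshold=4, base_timeout=600.0):
            self.distance_threshold = distance_threshold
            self.base_timeout = base_timeout
            self.last_report = time.time()

        def timeout_for(self, distance):
            return self.base_timeout / (1 + distance)   # farther away -> sooner update

        def should_report(self, distance_from_last_report):
            waited = time.time() - self.last_report
            return (distance_from_last_report >= self.distance_threshold
                    or waited >= self.timeout_for(distance_from_last_report))

        def report_now(self):
            self.last_report = time.time()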
based on concepts of the human visual system computational visual attention systems aim to detect regions of interest in images psychologists neurobiologists and computer scientists have investigated visual attention thoroughly during the last decades and profited considerably from each other however the interdisciplinarity of the topic holds not only benefits but also difficulties concepts of other fields are usually hard to access due to differences in vocabulary and lack of knowledge of the relevant literature this article aims to bridge this gap and bring together concepts and ideas from the different research areas it provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems furthermore it presents broad range of applications of computational attention systems in fields like computer vision cognitive systems and mobile robotics we conclude with discussion on the limitations and open questions in the field
this paper describes framework that allows for reasoning about and verification of concurrent statecharts with real time constraints subject to semantic variations the major problems addressed by this paper include the capture of multiple semantic variations of real time statecharts and the use of the resulting semantics for further analysis our solution is based on theoretical framework involving two dimensional temporal logic that is used to independently capture flow of control through statecharts as well as flow of time the independence of these dimensions along with the high level nature of temporal logic allows for simple adaptation to alternate semantics of statecharts as well as real time models the paper defines our logic shows how the semantics of real time statecharts can be expressed in this formalism and describes our tools for capturing and reasoning about these semantics the underlying goal is the formal analysis of real time software behavior in way that captures designer intentions
bandwidth aggregation is promising technology that can speed up access to the internet by bandwidth sharing and multi path communication current bandwidth aggregation systems bass deployed in public networks provide limited performance and flexibility when they are directly used in home networking environments to reap the full performance benefits of bass in home networks they need to be easily and dynamically programmable by home network users we present the design and implementation of programmable bandwidth aggregation system pbas that can provide home network users improved performance when sharing bandwidth for activities that access the internet we also present an empirical performance evaluation of the system and we demonstrate the superior efficiency of our proposed pbas over traditional bas in terms of computational overheads loadable code throughput performance and programmability
event driven programming has emerged as standard to implement high performance servers due to its flexibility and low os overhead still memory access remains bottleneck generic optimization techniques yield only small improvements in the memory access behavior of event driven servers as such techniques do not exploit their specific structure and behavior this paper presents an optimization framework dedicated to event driven servers based on strategy to eliminate data cache misses we propose novel memory manager combined with tailored scheduling strategy to restrict the working data set of the program to memory region mapped directly into the data cache our approach exploits the flexible scheduling and deterministic execution of event driven servers we have applied our framework to industry standard webservers including tux and thttpd as well as to the squid proxy server and the cactus qos framework testing tux and thttpd using standard http benchmark tool shows that our optimizations applied to the tux web server reduce data cache misses under heavy load by up to and increase the throughput of the server by up to
two dimensional point data can be considered one of the most basic yet one of the most ubiquitous data types arising in wide variety of applications the basic scatter plot approach is widely used and highly effective for data sets of small to moderate size however it shows scalability problems for data sets of increasing size of multiple classes and of time dependency in this short paper we therefore present an improved visual analysis of such point clouds the basic idea is to monitor certain statistical properties of the data for each point and for each class as function of time the output of the statistic analysis is used for identification of interesting data views decreasing information overload the data is interactively visualized using various techniques in this paper we specify the problem detail our approach and present application results based on real world data set
we present novel technique both flexible and efficient for interactive remeshing of irregular geometry first the original arbitrary genus mesh is substituted by series of maps in parameter space using these maps our algorithm is then able to take advantage of established signal processing and halftoning tools that offer real time interaction and intricate control the user can easily combine these maps to create control map map which controls the sampling density over the surface patch this map is then sampled at interactive rates allowing the user to easily design tailored resampling once this sampling is complete delaunay triangulation and fast optimization are performed to perfect the final mesh as result our remeshing technique is extremely versatile and general being able to produce arbitrarily complex meshes with variety of properties including uniformity regularity semi regularity curvature sensitive resampling and feature preservation we provide high level of control over the sampling distribution allowing the user to interactively custom design the mesh based on their requirements thereby increasing their productivity in creating wide variety of meshes
when concurrent shared memory program written with sequential consistency sc model is run on machine implemented with relaxed consistency rc model it could cause sc violations that are very hard to debug to avoid such violations programmers need to provide explicit synchronizations or insert fence instructions in this paper we propose scheme to detect and eliminate potential sc violations by combining shasha snir’s conflict graph and delay set theory with existing data race detection techniques for each execution we generate race graph which contains the improperly synchronized conflict accesses called race accesses and race cycles formed with those accesses as race cycle would probably lead to non sequential consistent execution we call it potential violation of sequential consistency pvsc bug we then compute the race delays of race cycles and suggest programmers to insert fences into source code to eliminate pvsc bugs we further convert race graph into pc race graph and improve cycle detection and race delay computation to where is the number of race access instructions we evaluate our approach with the splash benchmarks two large real world applications mysql and apache and several multi threaded cilk programs the results show that the proposed approach could effectively detect pvsc bugs in real world applications with good scalability it retains most of the performance of the concurrent program after inserting required fence instructions with less than performance loss and the additional cost of our approach over traditional race detection techniques is quite low with on average
we present process logic for the pi calculus with the linear affine type discipline built on the preceding studies on logics for programs and processes simple systems of assertions are developed capturing the classes of behaviours ranging from purely functional interactions to those with destructive update local state and genericity central feature of the logic is representation of the behaviour of an environment as the dual of that of process in an assertion which is crucial for obtaining compositional proof systems from the process logic we can derive compositional program logics for various higher order programming languages whose soundness is proved via their embeddings into the process logic in this paper the key technical framework of the process logic and its applications is presented focussing on pure functional behaviour and prototypical call by value functional language leaving the full technical development to
the author discusses the likely evolution of commercial data managers over the next several years topics to be covered include the following why sql structured query language has become universal standard who can benefit from sql standardization why the current sql standard has no chance of lasting why all database systems can be distributed soon what new technologies are likely to be commercialized and why vendor independence may be achievable
we propose type system for lock freedom in the pi calculus which guarantees that certain communications will eventually succeed distinguishing features of our type system are it can verify lock freedom of concurrent programs that have sophisticated recursive communication structures it can be fully automated it is hybrid in that it combines type system for lock freedom with local reasoning about deadlock freedom termination and confluence analyses moreover the type system is parameterized by deadlock freedom termination confluence analyses so that any methods eg type systems and model checking can be used for those analyses lock freedom analysis tool has been implemented based on the proposed type system and tested for nontrivial programs
programs written in type unsafe languages such as c and c++ incur costly memory errors that result in corrupted data structures program crashes and incorrect results we present data centric solution to memory corruption called critical memory memory model that allows programmers to identify and protect data that is critical for correct program execution critical memory defines operations to consistently read and update critical data and ensures that other non critical updates in the program will not corrupt it we also present samurai runtime system that implements critical memory in software samurai uses replication and forward error correction to provide probabilistic guarantees of critical memory semantics because samurai does not modify memory operations on non critical data the majority of memory operations in programs run at full speed and samurai is compatible with third party libraries using both applications including web server and libraries an stl list class and memory allocator we evaluate the performance overhead and fault tolerance that samurai provides we find that samurai is useful and practical approach for the majority of the applications and libraries considered
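a minimal python sketch of the replication side of critical memory is given below, assuming three software copies and majority voting on critical loads; samurai additionally uses forward error correction and allocator integration, which are omitted here

    # illustrative sketch: critical stores update every replica, critical loads vote,
    # so a stray non-critical write that corrupts one copy can be detected and repaired.
    REPLICAS = 3

    class CriticalMemory:
        def __init__(self):
            self._copies = [dict() for _ in range(REPLICAS)]

        def critical_store(self, addr, value):
            for copy in self._copies:
                copy[addr] = value

        def critical_load(self, addr):
            values = [copy.get(addr) for copy in self._copies]
            winner = max(set(values), key=values.count)   # majority vote
            self.critical_store(addr, winner)             # re-establish replication
            return winner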
web search is the main way for millions of users to access information every day but we continue to struggle when it comes to finding the right information at the right time in this paper we build on recent work to describe and evaluate new application of case based web search one that focuses on how experience reuse can support collaboration among searchers special emphasis is placed on the development of case based system that is compatible with existing search engines we also describe the results of live user deployment
nowadays decision support systems are evolving in order to handle complex data some recent works have shown the interest of combining on line analysis processing olap and data mining we think that coupling olap and data mining would provide excellent solutions to treat complex data to do that we propose an enhanced olap operator based on the agglomerative hierarchical clustering ahc the proposed operator called opac operator for aggregation by clustering is able to provide significant aggregates of facts referred to complex objects we complete this operator with tool allowing the user to evaluate the best partition from the ahc results corresponding to the most interesting aggregates of facts
over the past few years we have been trying to build an end to end system at wisconsin to manage unstructured data using extraction integration and user interaction this paper describes the key information extraction ie challenges that we have run into and sketches our solutions we discuss in particular developing declarative ie language optimizing for this language generating ie provenance incorporating user feedback into the ie process developing novel wiki based user interface for feedback best effort ie pushing ie into rdbmss and more our work suggests that ie in managing unstructured data can open up many interesting research challenges and that these challenges can greatly benefit from the wealth of work on managing structured data that has been carried out by the database community
the well definedness problem for programming language consists of checking given an expression and an input type whether the semantics of the expression is defined for all inputs adhering to the input type related problem is the semantic type checking problem which consists of checking given an expression an input type and an output type whether the expression always returns outputs adhering to the output type on inputs adhering to the input type both problems are undecidable for general purpose programming languages in this paper we study these problems for the nested relational calculus specific purpose database query language we also investigate how these problems behave in the presence of programming language features such as singleton coercion and type tests
many peer to peer overlay operations are inherently parallel and this parallelism can be exploited by using multi destination multicast routing resulting in significant message reduction in the underlying network we propose criteria for assessing when multicast routing can effectively be used and compare multi destination multicast and host group multicast using these criteria we show that the assumptions underlying the chuang sirbu multicast scaling law are valid in large scale peer to peer overlays and thus chuang sirbu is suitable for estimating the message reduction when replacing unicast overlay messages with multicast messages using simulation we evaluate message savings in two overlay algorithms when multi destination multicast routing is used in place of unicast messages we further describe parallelism in range of overlay algorithms including multi hop variable hop load balancing random walk and measurement overlay
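for reference, the chuang sirbu scaling law assessed above is usually stated as follows, with m the multicast group size and \bar{L}_{u} the average unicast path length; the exponent is the empirically observed value

    % chuang-sirbu multicast scaling law (usual statement)
    L_{\mathrm{multicast}}(m) \;\approx\; \bar{L}_{u}\, m^{0.8}
    % hence replacing m separate unicasts (total cost m \bar{L}_{u}) by one multicast
    % saves roughly a fraction 1 - m^{-0.2} of the link traversals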
the recent evolution of internet driven by the web services technology is extending the role of the web from support of information interaction to middleware for b2b interactions indeed the web services technology allows enterprises to outsource parts of their business processes using web services and it also provides the opportunity to dynamically offer new value added services through the composition of pre existing web services in spite of the growing interest in web services current technologies are found lacking efficient transactional support for composite web services css in this paper we propose transactional approach to ensure the failure atomicity of cs required by partners we use the accepted termination states ats property as mean to express the required failure atomicity partners specify their cs mainly its control flow and the required ats then we use set of transactional rules to assist designers to compose valid cs with regards to the specified ats
computer security is severely threatened by software vulnerabilities prior work shows that information flow tracking also referred to as taint analysis is promising technique to detect wide range of security attacks however current information flow tracking systems are not very practical because they either require program annotations source code non trivial hardware extensions or incur prohibitive runtime overheads this paper proposes low overhead software only information flow tracking system called lift which minimizes run time overhead by exploiting dynamic binary instrumentation and optimizations for detecting various types of security attacks without requiring any hardware changes more specifically lift aggressively eliminates unnecessary dynamic information flow tracking coalesces information checks and efficiently switches between target programs and instrumented information flow tracking code we have implemented lift on dynamic binary instrumentation framework on windows our real system experiments with two real world server applications one client application and eighteen attack benchmarks show that lift can effectively detect various types of security attacks lift also incurs very low overhead only for server applications and times on average for seven spec int applications our dynamic optimizations are very effective in reducing the overhead by factor of times
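to make the information flow tracking concrete, a toy python sketch of taint propagation over a small three-address instruction stream is shown below; lift itself works on x86 binaries with shadow state and the dynamic optimizations described above, none of which appear in this sketch

    # illustrative sketch: propagate taint from untrusted inputs through data movement
    # and arithmetic, and flag control transfers through tainted values.
    def propagate_taint(instructions, tainted_inputs):
        taint = set(tainted_inputs)                 # names currently carrying untrusted data
        for op, dst, srcs in instructions:          # e.g. ("add", "x", ("y", "z"))
            if op == "load_input":
                taint.add(dst)                      # data arriving from outside is tainted
            elif op == "const":
                taint.discard(dst)                  # constants clear the destination
            else:
                if any(s in taint for s in srcs):
                    taint.add(dst)                  # taint flows from sources to destination
                else:
                    taint.discard(dst)
        return taint

    def check_indirect_jump(target_name, taint):
        if target_name in taint:                    # tainted jump target: possible hijack
            raise RuntimeError("potential control-flow attack detected")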
due to high levels of integration and complexity the design of multi core socs has become increasingly challenging in particular energy consumption and distributing single global clock signal throughout chip have become major design bottlenecks to deal with these issues globally asynchronous locally synchronous gals design is considered for achieving low power consumption and modular design such design style fits nicely with the concept of voltage frequency islands vfis which has been recently introduced for achieving fine grain system level power management this paper proposes design methodology for partitioning an noc architecture into multiple vfis and assigning supply and threshold voltage levels to each vfi simulation results show about savings for real video application and demonstrate the effectiveness of our approach in reducing the overall system energy consumption the results and functional correctness are validated using an fpga prototype for an noc with multiple vfis
network fragmentation occurs when the accessibility of network based resource to an observer is function of how the observer is connected to the network in the context of the internet network fragmentation is well known and occurs in many situations including an increasing preponderance of network address translation firewalls and virtual private networks recently however new threats to internet consistency have received media attention alternative namespaces have emerged as the result of formal objections to the process by which internet names and addresses are provisioned in addition various governments and service providers around the world have deployed network technology that accidentally or intentionally restricts access to certain internet content combined with the aforementioned sources of fragmentation these new concerns provide ample motivation for network that allows users the ability to specify not only the network location of internet resources they want to view but also the perspectives from which they want to view them our vision of perspective access network pan is peer to peer overlay network that incorporates routing and directory services that allow network perspective sharing and nonhierarchical organization of the internet in this paper we present the design implementation and evaluation of directory service for such networks we demonstrate its feasibility and efficacy using measurements from test deployment on planetlab
the majority of documents on the web are written in html constituting huge amount of legacy data all documents are formatted for visual purposes only and with different styles due to diverse authorships and goals and this makes the process of retrieval and integration of web contents difficult to automate we provide contribution to the solution of this problem by proposing structured approach to data reverse engineering of data intensive web sites we focus on data content and on the way in which such content is structured on the web we profitably use web data model to describe abstract structural features of html pages and propose method for the segmentation of html documents in special blocks grouping semantically related web objects we have developed tool based on this method that supports the identification of structure function and meaning of data organized in web object blocks we demonstrate with this tool the feasibility and effectiveness of our approach over set of real web sites
beyond document classes the notion of document series denotes sets of documents whose semantic rhetorical and narrative structures comply with some given model whereas they may or not belong to the same document class this paper focuses on the production of document series it first examines the current research activity on topics related to document series and exhibits the directions which need to be combined to specify at generic level the intention of the author of document series then it describes framework for role based specification and shows how to turn such specification into documents the author specifies the generic narrative argumentative and rhetoric structures the thesis of the document in terms of the roles of the document elements rather than in terms of their contents binding actual content to roles is done separately it defines the theme on which the document is instantiated then mechanism based on tree recursive transformations turns the generic specification into an actual document the chosen set of transformations defines the document genre
future wireless sensor networks wsns will transport high bandwidth low latency streaming data and will host sophisticated processing such as image fusion and object tracking in network on sensor network nodes recent middleware proposals provide capabilities for in network processing reducing energy drain based on communication costs alone however hosting complex processing on wsn nodes incurs additional processing energy and latency costs that impact network lifetime and application performance there is need for wsn planning framework to explore energy saving and application performance trade offs for models of future sensor networks that account for processing costs in addition to communication costs in this work we present simulation framework to analyze the interplay between resource requirements for compute and communication intensive in network processing for streaming applications we simulate surveillance application workload with middleware capabilities for data fusion adaptive policy driven migration of data fusion computation across network nodes and prefetching of streaming data inputs for fusion processing our study sheds light on application figures of merit such as latency throughput and lifetime with respect to migration policy and node cpu and radio characteristics
monte carlo is the only choice for physically correct method to do global illumination in the field of realistic image synthesis generally monte carlo based algorithms require lot of time to eliminate the noise to get an acceptable image adaptive sampling is an interesting tool to reduce noise in which the evaluation of homogeneity of pixel’s samples is the key point in this paper we propose new homogeneity measure namely the arithmetic mean geometric mean difference abbreviated to am gm difference which is developed to execute adaptive sampling efficiently implementation results demonstrate that our novel adaptive sampling method can perform significantly better than classic ones
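a minimal python sketch of using an arithmetic mean geometric mean difference as a homogeneity test for adaptive sampling is shown below; the batch size, the threshold and the luminance-only samples are simplifying assumptions, and the paper's exact estimator may differ

    import math

    # illustrative sketch: keep adding samples to a pixel while the am-gm difference
    # of its (non-negative) sample luminances indicates the samples are inhomogeneous.
    def am_gm_difference(samples, eps=1e-9):
        am = sum(samples) / len(samples)
        gm = math.exp(sum(math.log(s + eps) for s in samples) / len(samples))
        return am - gm          # >= 0 by the am-gm inequality; 0 iff all samples equal

    def adaptive_pixel(sample_fn, batch=4, max_samples=64, threshold=0.01):
        samples = [sample_fn() for _ in range(batch)]
        while len(samples) < max_samples and am_gm_difference(samples) > threshold:
            samples.extend(sample_fn() for _ in range(batch))
        return sum(samples) / len(samples)   # final pixel estimate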
in this paper we consider tree based routing scheme for supporting barrier synchronization on scalable parallel computers with mesh network based on the characteristics of standard programming interface the scheme builds collective synchronization cs tree among the participating nodes using distributed algorithm when the routers are set up properly with the cs tree information barrier synchronization can be accomplished very efficiently by passing simple messages performance evaluations show that our proposed method performs better than previous path based approaches and is less sensitive to variations in group size and startup delay however our scheme has the extra overhead of building the cs tree thus it is more suitable for parallel iterative computations in which the same barrier is invoked repetitively
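a software analogue of the cs tree barrier is sketched below in python using threads; in the paper the combining is done by the mesh routers with simple messages, so this sketch only illustrates the gather-up, release-down structure

    import threading

    # illustrative sketch: arrivals combine up the tree, one release fans back down.
    class TreeBarrierNode:
        def __init__(self, parent=None):
            self.parent = parent
            self.children = []
            if parent is not None:
                parent.children.append(self)
            self.arrived = 0
            self.lock = threading.Lock()
            self.release = threading.Event()

        def wait(self):                      # called by the process hosted at this node
            self._arrive()
            self.release.wait()
            self.release.clear()

        def _arrive(self):
            with self.lock:
                self.arrived += 1
                done = (self.arrived == len(self.children) + 1)
            if done:
                self.arrived = 0
                if self.parent is None:
                    self._release_down()     # root: every node has arrived
                else:
                    self.parent._arrive()    # report this whole subtree upward

        def _release_down(self):
            self.release.set()
            for child in self.children:
                child._release_down()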
typestate analysis determines whether program violates set of finite state properties because the typestate analysis problem is statically undecidable researchers have proposed hybrid approach that uses residual monitors to signal property violations at runtime we present an efficient novel static typestate analysis that is flow sensitive partially context sensitive and that generates residual runtime monitors to gain efficiency our analysis uses precise flow sensitive information on an intra procedural level only and models the remainder of the program using flow insensitive pointer abstraction unlike previous flow sensitive analyses our analysis uses an additional backward analysis to partition states into equivalence classes code locations that transition between equivalent states are irrelevant and require no monitoring as we show in this work this notion of equivalent states is crucial to obtaining sound runtime monitors we proved our analysis correct implemented the analysis in the clara framework for typestate analysis and applied it to the dacapo benchmark suite in half of the cases our analysis determined exactly the property violating program points in many other cases the analysis reduced the number of instrumentation points by large amounts yielding significant speed ups during runtime monitoring
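The paper's backward analysis groups automaton states into equivalence classes so that transitions between equivalent states need no monitoring. The sketch below is only a generic analogue of that idea, not the analysis implemented in Clara: standard Moore-style partition refinement on a small, made-up typestate automaton, in which two states end up in the same block when no event sequence distinguishes them with respect to the error state.

def partition_states(states, alphabet, delta, error_states):
    # delta: dict mapping (state, symbol) -> state
    # start from the coarse partition {error, non-error} and refine until stable
    partition = [set(error_states), set(states) - set(error_states)]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # group the states of this block by the blocks their successors fall into
            groups = {}
            for s in block:
                key = tuple(next(i for i, b in enumerate(partition)
                                 if delta[(s, a)] in b) for a in alphabet)
                groups.setdefault(key, set()).add(s)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition

# hypothetical typestate automaton for an iterator-like protocol
states = {"init", "hasnext", "error"}
alphabet = ["hasNext", "next"]
delta = {("init", "hasNext"): "hasnext", ("init", "next"): "error",
         ("hasnext", "hasNext"): "hasnext", ("hasnext", "next"): "init",
         ("error", "hasNext"): "error", ("error", "next"): "error"}
print(partition_states(states, alphabet, delta, {"error"}))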
this paper presents compiler optimization algorithm to reduce the run time overhead of array subscript range checks in programs without compromising safety the algorithm is based on partial redundancy elimination and it incorporates previously developed algorithms for range check optimization we implemented the algorithm in our research compiler nascent and conducted experiments on suite of benchmark programs to obtain four results the execution overhead of naive range checking is high enough to merit optimization there are substantial differences between various optimizations loop based optimizations that hoist checks out of loops are effective in eliminating about of the range checks and more sophisticated analysis and optimization algorithms produce very marginal benefits
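A toy example of the loop-based optimization mentioned above, written in Python rather than in the compiler's target language: the per-iteration subscript checks of the naive version are replaced by a single check of the loop bounds hoisted out of the loop. Function and variable names are made up for illustration.

def sum_naive(a, lo, hi):
    total = 0
    for i in range(lo, hi):
        # naive code: a range check guards every single subscript
        if i < 0 or i >= len(a):
            raise IndexError(i)
        total += a[i]
    return total

def sum_hoisted(a, lo, hi):
    # hoisted check: i runs monotonically from lo to hi - 1, so checking the
    # two loop bounds once covers every subscript (guarded so that an empty
    # loop raises no error, matching the behavior of the naive version)
    if lo < hi and (lo < 0 or hi > len(a)):
        raise IndexError((lo, hi))
    total = 0
    for i in range(lo, hi):
        total += a[i]
    return total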
many applications such as commerce routinely use copies of data that are not in sync with the database due to heuristic caching strategies used to enhance performance we study concurrency control for transactional model that allows update transactions to read out of date copies each read operation carries freshness constraint that specifies how fresh copy must be in order to be read we offer definition of correctness for this model and present algorithms to ensure several of the most interesting freshness constraints we outline serializability theoretic correctness proof and present the results of detailed performance study
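A minimal sketch, under an assumed representation, of the kind of freshness constraint described above: each cached copy carries the time at which it was last refreshed from the database, and a read carrying a freshness bound succeeds only if the copy is recent enough. The names and the time-based form of the bound are illustrative assumptions, not the paper's model.

import time

class CachedCopy:
    def __init__(self, value, refreshed_at):
        self.value = value
        self.refreshed_at = refreshed_at  # when the copy was last synced with the database

def read_with_freshness(copy, max_staleness_seconds):
    # the freshness constraint: the copy may be read only if it lags the
    # database by at most max_staleness_seconds
    age = time.time() - copy.refreshed_at
    if age > max_staleness_seconds:
        raise ValueError("copy too stale for this read: %.1fs old" % age)
    return copy.value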
one of today’s challenges is producing reliable software in the face of an increasing number of interacting components our system chet lets developers define specifications describing how component should be used and checks these specifications in real java systems unlike previous systems chet is able to check wide range of complex conditions in large software systems without programmer intervention this paper explores the specification techniques that are used in chet and how they are able to handle the types of specifications needed to accurately model and automatically identify component checks
two well known indexing methods are inverted files and signature files we have undertaken detailed comparison of these two approaches in the context of text indexing paying particular attention to query evaluation speed and space requirements we have examined their relative performance using both experimentation and refined approach to modeling of signature files and demonstrate that inverted files are distinctly superior to signature files not only can inverted files be used to evaluate typical queries in less time than can signature files but inverted files require less space and provide greater functionality our results also show that synthetic text database can provide realistic indication of the behavior of an actual text database the tools used to generate the synthetic database have been made publicly available
reconfigurable architectures are one of the most promising solutions satisfying both performance and flexibility however reconfiguration overhead in those architectures makes them inappropriate for repetitive reconfigurations in this paper we introduce configuration sharing technique to reduce reconfiguration overhead between similar applications using static partial reconfiguration compared to the traditional resource sharing that configures multiple temporal partitions simultaneously and employs time multiplexing technique the proposed configuration sharing reconfigures device incrementally as an application changes and requires backend adaptation to reuse configurations between applications adopting data flow intermediate representation our compiler framework extends min cut placer and negotiation based router to deal with the configuration sharing the results report that the framework could reduce of configuration time at the expense of of computation time on average
the web has been rapidly deepened by myriad searchable databases online where data are hidden behind query interfaces as an essential task toward integrating these massive deep web sources large scale schema matching ie discovering semantic correspondences of attributes across many query interfaces has been actively studied recently in particular many works have emerged to address this problem by holistically matching many schemas at the same time and thus pursuing mining approaches in nature however while holistic schema matching has built its promise upon the large quantity of input schemas it also suffers the robustness problem caused by noisy data quality such noises often inevitably arise in the automatic extraction of schema data which is mandatory in large scale integration for holistic matching to be viable it is thus essential to make it robust against noisy schemas to tackle this challenge we propose data ensemble framework with sampling and voting techniques which is inspired by bagging predictors specifically our approach creates an ensemble of matchers by randomizing input schema data into many independently downsampled trials executing the same matcher on each trial and then aggregating their ranked results by taking majority voting as principled basis we provide analytic justification of the effectiveness of this data ensemble framework further empirically our experiments on real web data show that the ensemblization indeed significantly boosts the matching accuracy under noisy schema input and thus maintains the desired robustness of holistic matcher
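A minimal sketch of the sampling-and-voting idea described above, in the spirit of bagging: the same base matcher is run on many downsampled subsets of the input schemas, and the ranked correspondences are aggregated by majority voting. The base_matcher callable, the sampling fraction, and the voting rule are illustrative assumptions, not the paper's exact algorithm; correspondences are assumed to be hashable values such as attribute-name pairs.

import random
from collections import Counter

def ensemble_match(schemas, base_matcher, trials=100, sample_fraction=0.5, top_k=10):
    # base_matcher: takes a list of schemas and returns a ranked list of
    # attribute correspondences (best first); supplied by the caller
    votes = Counter()
    for _ in range(trials):
        # randomize the input by downsampling the schemas for this trial
        sample = random.sample(schemas, max(2, int(sample_fraction * len(schemas))))
        ranked = base_matcher(sample)
        # a correspondence gets one vote per trial in which it ranks near the top
        for corr in ranked[:top_k]:
            votes[corr] += 1
    # aggregate by majority voting: keep correspondences found in most trials
    return [corr for corr, v in votes.most_common() if v > trials // 2]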
in non photorealistic rendering sketchiness is essential to communicate visual ideas and can be used to illustrate drafts and concepts in for instance architecture and product design in this paper we present hardware accelerated real time rendering algorithm for drawings that sketches visually important edges as well as inner color patches of arbitrary objects even beyond the geometrical boundary the algorithm preserves edges and color patches as intermediate rendering results using textures to achieve sketchiness it applies uncertainty values in image space to perturb texture coordinates when accessing intermediate rendering results the algorithm adjusts depth information derived from objects to ensure visibility when composing sketchy drawings with arbitrary scene contents rendering correct depth values while sketching edges and colors beyond the boundary of objects is achieved by depth sprite rendering moreover we maintain frame to frame coherence because consecutive uncertainty values have been determined by perlin noise function so that they are correlated in image space finally we introduce solution to control and predetermine sketchiness by preserving geometrical properties of objects in order to calculate associated uncertainty values this method significantly reduces the inherent shower door effect
delivery of products bought online can violate consumers privacy although not in straightforward way in particular delivery companies that have contracted with website know the company selling the product as well as the name and address of the online customer to make matters worse if the same delivery company has contracted with many websites aggregated information per address may be used to profile customers transaction activities in this paper we present fair delivery service system with guaranteed customer anonymity and merchant customer unlinkability with reasonable assumptions about the threat model
methods that learn from prior information about input features such as generalized expectation ge have been used to train accurate models with very little effort in this paper we propose an active learning approach in which the machine solicits labels on features rather than instances in both simulated and real user experiments on two sequence labeling tasks we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances preliminary experiments suggest that novel interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation
the use of real time data streams in data driven computational science is driving the need for stream processing tools that work within the architectural framework of the larger application data stream processing systems are beginning to emerge in the commercial space but these systems fail to address the needs of large scale scientific applications in this paper we illustrate the unique needs of large scale data driven computational science through an example taken from weather prediction and forecasting we apply realistic workload from this application against our calder stream processing system to determine effective throughput event processing latency data access scalability and deployment latency
it is important to find the person with right expertise and the appropriate solutions in the specific field to solve critical situation in large complex system such as an enterprise level application in this paper we apply the experts knowledge to construct solution retrieval system for expert finding and problem diagnosis firstly we aim to utilize the experts problem diagnosis knowledge which can identify the error type of problem to suggest the corresponding expert and retrieve the solution for specific error type therefore how to find an efficient way to use domain knowledge and the corresponding experts has become an important issue to transform experts knowledge into the knowledge base of solution retrieval system the idea of developing solution retrieval system based on hybrid approach using rbr rule based reasoning and cbr case based reasoning rcbr rule based cbr is proposed in this research furthermore we incorporate domain expertise into our methodology with role based access control model to suggest appropriate expert for problem solving and build prototype system with expert finding and problem diagnosis for the complex system the experimental results show that rcbr rule based cbr can improve accuracy of retrieval cases and reduce retrieval time prominently
software evolution and software quality are ever changing phenomena as software evolves evolution impacts software quality on the other hand software quality needs may drive software evolution strategies this paper presents an approach to schedule quality improvement under constraints and priority the general problem of scheduling quality improvement has been instantiated into the concrete problem of planning duplicated code removal in geographical information system developed in throughout the last years priority and constraints arise from development team and from the adopted development process the developer team long term goal is to get rid of duplicated code improve software structure decrease coupling and improve cohesion we present our problem formulation the adopted approach including model of clone removal effort and preliminary results obtained on real world application
multisignature scheme allows group of signers to cooperate to generate compact signature on common document the length of the multisignature depends only on the security parameters of the signature schemes and not on the number of signers involved the existing state of the art multisignature schemes suffer either from impractical key setup assumptions from loose security reductions or from inefficient signature verification in this paper we present two new multisignature schemes that address all of these issues ie they have efficient signature verification they are provably secure in the plain public key model and their security is tightly related to the computational and decisional diffie hellman problems in the random oracle model our construction derives from variants of edl signatures
the computation of relatedness between two fragments of text in an automated manner requires taking into account wide range of factors pertaining to the meaning the two fragments convey and the pairwise relations between their words without doubt measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words such measure that captures well both aspects of text relatedness may help in many tasks such as text retrieval classification and clustering in this paper we present new approach for measuring the semantic relatedness between words based on their implicit semantic links the approach exploits only word thesaurus in order to devise implicit semantic links between words based on this approach we introduce omiotis new measure of semantic relatedness between texts which capitalizes on the word to word semantic relatedness measure sr and extends it to measure the relatedness between texts we gradually validate our method we first evaluate the performance of the semantic relatedness measure between individual words covering word to word similarity and relatedness synonym identification and word analogy then we proceed with evaluating the performance of our method in measuring text to text semantic relatedness in two tasks namely sentence to sentence similarity and paraphrase recognition experimental evaluation shows that the proposed method outperforms every lexicon based method of semantic relatedness in the selected tasks and the used data sets and competes well against corpus based and hybrid approaches
the notion of roll up dependency rud extends functional dependencies with generalization hierarchies ruds can be applied in olap and database design the problem of discovering ruds in large databases is at the center of this paper an algorithm is provided that relies on number of theoretical results the algorithm has been implemented results on two real life datasets are given the extension of functional dependency fd with roll ups turns out to capture meaningful rules that are outside the scope of classical fd mining performance figures show that ruds can be discovered in linear time in the number of tuples of the input dataset
this paper presents general platform namely synchronous tree sequence substitution grammar stssg for the grammar comparison study in translational equivalence modeling tem and statistical machine translation smt under the stssg platform we compare the expressive abilities of various grammars through synchronous parsing and real translation platform on variety of chinese english bilingual corpora experimental results show that the stssg is able to better explain the data in parallel corpora than other grammars our study further finds that the complexity of structure divergence is much higher than suggested in literature which imposes big challenge to syntactic transformation based smt
we present simple stochastic system for non periodically tiling the plane with small set of wang tiles the tiles may be filled with texture patterns or geometry that when assembled create continuous representation the primary advantage of using wang tiles is that once the tiles are filled large expanses of non periodic texture or patterns or geometry can be created as needed very efficiently at runtime wang tiles are squares in which each edge is assigned color valid tiling requires all shared edges between tiles to have matching colors we present new stochastic algorithm to non periodically tile the plane with small set of wang tiles at runtime furthermore we present new methods to fill the tiles with texture poisson distributions or geometry to efficiently create at runtime as much non periodic texture or distributions or geometry as needed we leverage previous texture synthesis work and adapt it to fill wang tiles we demonstrate how to fill individual tiles with poisson distributions that maintain their statistical properties when combined these are used to generate large arrangement of plants or other objects on terrain we show how such environments can be rendered efficiently by pre lighting the individual wang tiles containing the geometry we also extend the definition of wang tiles to include coding of the tile corners to allow discrete objects to overlap more than one edge the larger set of tiles provides increased degrees of freedom
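A minimal sketch of the stochastic non-periodic tiling step: each tile is described by its four edge colors, and the tile for each cell is chosen uniformly at random among the tiles whose north and west edges match the already placed neighbours. The tile set here is the complete set over two horizontal and two vertical edge colors, which trivially guarantees that a matching tile always exists; the paper works with a smaller, carefully chosen set with the same property.

import random

# a wang tile as its (north, east, south, west) edge colors
TILES = [(n, e, s, w) for n in "rg" for e in "by" for s in "rg" for w in "by"]

def tile_plane(rows, cols, tiles=TILES):
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # the tile's north edge must match the south edge of the tile above,
            # and its west edge must match the east edge of the tile to the left
            need_n = grid[r - 1][c][2] if r > 0 else None
            need_w = grid[r][c - 1][1] if c > 0 else None
            candidates = [t for t in tiles
                          if (need_n is None or t[0] == need_n)
                          and (need_w is None or t[3] == need_w)]
            # a stochastic choice among the valid tiles yields a non-periodic tiling
            grid[r][c] = random.choice(candidates)
    return grid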
in dynamic environments with frequent content updates we require online full text search that scales to large data collections and achieves low search latency several recent methods that support fast incremental indexing of documents typically keep on disk multiple partial index structures that they continuously update as new documents are added however spreading indexing information across multiple locations on disk tends to considerably decrease the search responsiveness of the system in the present paper we take fresh look at the problem of online full text search with consideration of the architectural features of modern systems selective range flush is greedy method that we introduce to manage the index in the system by using fixed size blocks to organize the data on disk and dynamically keep low the cost of data transfer between memory and disk as we experimentally demonstrate with the proteus prototype implementation that we developed we retrieve indexing information at latency that matches the lowest achieved by existing methods additionally we reduce the total building cost by in comparison to methods with similar retrieval time
phenomenal advances in nano technology and packaging have made it possible to develop miniaturized low power devices that integrate sensing special purpose computing and wireless communications capabilities it is expected that these small devices referred to as sensors will be mass produced and deployed making their production cost negligible due to their small form factor and modest non renewable energy budget individual sensors are not expected to be gps enabled moreover in most applications exact geographic location is not necessary and all that the individual sensors need is coarse grain location awareness the task of acquiring such coarse grain location awareness is referred to as training in this paper two scalable energy efficient training protocols are proposed for massively deployed sensor networks where sensors are initially anonymous and unaware of their location the training protocols are lightweight and simple to implement they are based on an intuitive coordinate system imposed onto the deployment area which partitions the anonymous sensors into clusters where data can be gathered from the environment and synthesized under local control
future generations of chip multiprocessors cmp will provide dozens or even hundreds of cores inside the chip writing applications that benefit from the massive computational power offered by these chips is not going to be an easy task for mainstream programmers who are used to sequential algorithms rather than parallel ones this paper explores the possibility of using transactional memory tm in openmp the industrial standard for writing parallel programs on shared memory architectures for and fortran one of the major complexities in writing openmp applications is the use of critical regions locks atomic regions and barriers to synchronize the execution of parallel activities in threads tm has been proposed as mechanism that abstracts some of the complexities associated with concurrent access to shared data while enabling scalable performance the paper presents first proof of concept implementation of openmp with tm some extensions to the language are proposed to express transactions these extensions are handled by our source to source openmp mercurium compiler and our software transactional memory stm library nebelung that supports the code generated by mercurium the current implementation of the library has no support at the hardware level so it is proof of concept implementation hardware transactional memory htm or hardware assisted stm hastm are seen as possible paths to make the tandem tm openmp more usable the paper finishes with set of open issues that still need to be addressed either in openmp or in the hardware software implementations of tm
we present novel algorithms for parallel testing of code that takes structurally complex test inputs the algorithms build on the korat algorithm for constraint based generation of structurally complex test inputs given an imperative predicate that specifies the desired structural constraints and finitization that bounds the desired input size korat performs systematic search to generate all test inputs within the bounds that satisfy the constraints we present how to generate test inputs with parallel search in korat and how to execute test inputs in parallel both off line when the inputs are saved on disk and on line when execution immediately follows generation the inputs that korat generates enable bounded exhaustive testing that checks the code under test exhaustively for all inputs within the given bounds we also describe novel methodology for reducing the number of equivalent inputs that korat can generate our development of parallel korat and the methodology for reducing equivalent inputs are motivated by testing an application developed at google the experimental results on running parallel korat across up to machines on google’s infrastructure show that parallel test generation and execution can achieve significant speedup up to times
in this paper we introduce new view on roles in object oriented programming languages based on an ontological analysis of roles role is always associated with an object instance playing the role and also to an object instance its institution which represents its context the definition of role depends on the definition of the institution this property allows one to endow role players with powers that can modify the state of the institution and of the other roles defined in it as an example we introduce role construct in java where the above features are interpreted as follows roles are implemented as classes which can be instantiated only in presence of an instance of the player and of an instance of an institution the definition of class implementing role is included in the class of the institution the role belongs to powers are methods which can access private fields and methods of the institution they belong to and of the other roles of the same institution
robots are said to be capable of self assembly when they can autonomously form physical connections with each other by examining different ways in which system can use self assembly ie different strategies we demonstrate and quantify the performance costs and benefits of i acting as physically larger self assembled entity ii letting the system choose when and if to self assemble iii coordinating the sensing and actuation of the connected robots so that they respond to the environment as single collective entity our analysis is primarily based on real world experiments in hill crossing task the configuration of the hill is not known by the robots in advance the hill can be present or absent and can vary in steepness and orientation in some configurations the robots can overcome the hill more quickly by navigating individually while other configurations require the robots to self assemble to overcome the hill we demonstrate the applicability of our self assembly strategies to two other tasks hole crossing and robot rescue for which we present further proof of concept experiments with real robots
ugo’s research activity in the area of models of computation moc for short has been prominent influential and broadly scoped ugo’s trademark is that undefinable ability to understand and distill computational aspects into new models as if you were reading them out of some evident connection between well known models only most often that connection is really visible only after ugo shows the way like experienced sailors have trusted compasses and sextants to help them find the best routes to harbour ugo relies on bag of favourite tools which he has used along the years to deliver variety of contributions to the moc area to mention just three in alphabetic order algebraic techniques concurrency theory and unification mechanisms in this introductory contribution we would like to recall some of the influential moc models put forward by ugo which cut across the three approaches before doing that it is worth devoting some space to discuss the three aspects separately notably the use of category theory is pervasive common trait
spyware is class of malicious code that is surreptitiously installed on victims machines once active it silently monitors the behavior of users records their web surfing habits and steals their passwords current anti spyware tools operate in way similar to traditional virus scanners that is they check unknown programs against signatures associated with known spyware instances unfortunately these techniques cannot identify novel spyware require frequent updates to signature databases and are easy to evade by code obfuscation in this paper we present novel dynamic analysis approach that precisely tracks the flow of sensitive information as it is processed by the web browser and any loaded browser helper objects using the results of our analysis we can identify unknown components as spyware and provide comprehensive reports on their behavior the techniques presented in this paper address limitations of our previous work on spyware detection and significantly improve the quality and richness of our analysis in particular our approach allows human analyst to observe the actual flows of sensitive data in the system based on this information it is possible to precisely determine which sensitive data is accessed and where this data is sent to to demonstrate the effectiveness of the detection and the comprehensiveness of the generated reports we evaluated our system on substantial body of spyware and benign samples
this paper presents the conceptual modelling parts of methodology for the design of large scale data intensive web information systems wiss that is based on an abstract abstraction layer model alm it concentrates on the two most important layers in this model business layer and conceptual layer the major activities on the business layer deal with user profiling and storyboarding which addresses the design of an underlying application story the core of such story can be expressed by directed multigraph in which the vertices represent scenes and the edges actions by the users including navigation this leads to story algebras which can then be used to personalise the wis to the needs of user with particular profile the major activities on the conceptual layer address the support of scenes by modelling media types which combine links to databases via extended views with the generation of navigation structures operations supporting the activities in the storyboard hierarchical presentations and adaptivity to users end devices and channels
this paper presents study of the main current collaborative applications and shows how their architectural models focus on the interactive aspects of the systems for very specific applications it also analyses state of the art web service based collaborative applications and shows how they only solve specific problems and do not provide an extensible and flexible architecture from this study we conclude that there is currently no standard architecture and even less web service based one which can be taken as model for collaborative application development we therefore propose web service based architectural model for the development of this type of application this model provides flexible collaborative sessions in order to facilitate collaborative work in consistent way and with group awareness mechanisms the proposed architecture enables applications components or tools to be added and can be extended with new web services when required without the need to modify existing services the resulting collaborative applications are therefore flexible and extensible
rdf data are usually accessed using one of two methods either graphs are rendered in forms perceivable by human users eg in tabular or in graphical form which are difficult to handle for large data sets alternatively query languages like sparql provide means to express information needs in structured form hence they are targeted towards developers and experts inspired by the concept of spreadsheet tools where users can perform relatively complex calculations by splitting formulas and values across multiple cells we have investigated mechanisms that allow us to access rdf graphs in more intuitive and manageable yet formally grounded manner in this paper we make three contributions towards this direction first we present rdfunctions an algebra that consists of mappings between sets of rdf language elements uris blank nodes and literals under consideration of the triples contained in background graph second we define syntax for expressing rdfunctions which can be edited parsed and evaluated third we discuss tripcel an implementation of rdfunctions using spreadsheet metaphor using this tool users can easily edit and execute function expressions and perform analysis tasks on the data stored in an rdf graph
problem with many distributed applications is their behavior in the face of unpredictable variations in user request volumes or in available resources this paper explores performance isolation based approach to creating robust distributed applications for each application the approach is to understand the performance dependencies that pervade it and then provide mechanisms for imposing constraints on the possible spread of such dependencies through the application concrete results are attained for jee middleware for which we identify sample performance dependencies in the application layer during request execution and in the middleware layer during request de fragmentation and during return parameter marshalling isolation points are the novel software abstraction used to capture performance dependencies and represent solutions for dealing with them and they are used to create isolation rmi which is version of rmi iiop implemented in the websphere service infrastructure enhanced with isolation points initial results show the approach’s ability to detect and filter ill behaving messages that can cause an up to drop in performance for the trade benchmark and to eliminate up to drop in performance due to misbehaving clients
efficient discovery of information based on partial knowledge is challenging problem faced by many large scale distributed systems this paper presents plexus peer to peer search protocol that provides an efficient mechanism for advertising bit sequence pattern and discovering it using any subset of its bits pattern eg bloom filter summarizes the properties eg key words service description associated with shared object eg document service plexus has partially decentralized architecture involving super peers it adopts novel structured routing mechanism derived from the theory of error correcting codes ecc plexus achieves better resilience to peer failure by utilizing replication and redundant routing paths routing efficiency in plexus scales logarithmically with the number of superpeers the concept presented in this paper is supported with theoretical analysis and simulation results obtained from the application of plexus to partial keyword search utilizing the extended golay code
we present phrase based statistical machine translation approach which uses linguistic analysis in the preprocessing phase the linguistic analysis includes morphological transformation and syntactic transformation since the word order problem is solved using syntactic transformation there is no reordering in the decoding phase for morphological transformation we use hand crafted transformational rules for syntactic transformation we propose transformational model based on probabilistic context free grammar this model is trained using bilingual corpus and broad coverage parser of the source language this approach is applicable to language pairs in which the target language is poor in resources we considered translation from english to vietnamese and from english to french our experiments showed significant bleu score improvements in comparison with pharaoh state of the art phrase based smt system
although luminance contrast plays predominant role in motion perception significant additional effects are introduced by chromatic contrasts in this paper relevant results from psychophysical and physiological research are described to clarify the role of color in motion detection interpreting these psychophysical experiments we propose guidelines for the design of animated visualizations and calibration procedure that improves the reliability of visual motion representation the guidelines are applied to examples from texture based flow visualization as well as graph and tree visualization
we introduce general uniform language independent framework for designing online and offline source to source program transformations by abstract interpretation of program semantics iterative source to source program transformations are designed constructively by composition of source to semantics semantics to transformed semantics and semantics to source abstractions applied to fixpoint trace semantics the correctness of the transformations is expressed through observational and performance abstractions the framework is illustrated on three examples constant propagation program specialization by online and offline partial evaluation and static program monitoring
pervasive computing applications often entail continuous monitoring tasks issuing persistent queries that return continuously updated views of the operational environment we present paq middleware that supports applications needs by approximating persistent query as sequence of one time queries paq introduces an integration strategy abstraction that allows composition of one time query responses into streams representing sophisticated spatio temporal phenomena of interest distinguishing feature of our middleware is the realization that the suitability of persistent query’s result is function of the application’s tolerance for accuracy weighed against the associated overhead costs in paq programmers can specify an inquiry strategy that dictates how information is gathered since network dynamics impact the suitability of particular inquiry strategy paq associates an introspection strategy with persistent query that evaluates the quality of the query’s results the result of introspection can trigger application defined adaptation strategies that alter the nature of the query paq’s simple api makes developing adaptive querying systems easily realizable we present the key abstractions describe their implementations and demonstrate the middleware’s usefulness through application examples and evaluation
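A minimal, heavily simplified sketch of the abstractions described above: a persistent query is approximated by issuing one-time queries under an inquiry strategy (here, a fixed period), composing the responses with an integration strategy, and letting an introspection strategy trigger an adaptation when result quality drops. All function names and the quality threshold are hypothetical and are not PAQ's actual API.

import time

def persistent_query(one_time_query, integrate, introspect, adapt, period, duration):
    # one_time_query(): issues a single query and returns its response
    # integrate(history, response): folds a new response into the result stream
    # introspect(history): estimates the quality of recent results (0..1)
    # adapt(period): returns an adjusted inquiry period when quality is poor
    history = []
    deadline = time.time() + duration
    while time.time() < deadline:
        response = one_time_query()
        history = integrate(history, response)
        if introspect(history) < 0.5:        # hypothetical quality threshold
            period = adapt(period)           # e.g. query more aggressively
        time.sleep(period)
    return history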
we present framework that assesses relevance with respect to several relevance criteria by combining the query dependent and query independent evidence indicating these criteria this combination of evidence is modelled in uniform way irrespective of whether the evidence is associated with single document or related documents the framework is formally expressed within dempster shafer theory it is evaluated for web retrieval in the context of trec’s topic distillation task our results indicate that aggregating content based evidence from the linked pages of page is beneficial and that the additional incorporation of their homepage evidence further improves the effectiveness
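For reference, the standard Dempster rule of combination that underlies this kind of evidence aggregation; how the paper maps query-dependent and query-independent evidence onto mass functions is specific to the cited work, so only the generic rule is shown. For two mass functions m1 and m2 over the same frame of discernment:

\[
(m_1 \oplus m_2)(A) \;=\; \frac{1}{1-K}\sum_{B \cap C = A} m_1(B)\, m_2(C),
\qquad A \neq \emptyset, \qquad (m_1 \oplus m_2)(\emptyset) = 0,
\]
\[
K \;=\; \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).
\]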
in this research we aim to explore meaningful design directions for future photography applications with focus on the experiences around sharing we review wide range of photo related applications extracting emerging patterns of different photo related interactions to inform framework for their discussion we extract two themes from the first stage of our analysis contextual annotation and tangible representation and then examine interesting application ideas around those themes we categorize design ideas into four groups augmentation of photo taking editing as creating new memories building new social networks through photo sharing and tangible representation to mediate intimacy finally we present user reactions to our design ideas in addition to providing framework for describing different photography applications this work provides an example of an integrative approach to designing new sharing experiences through digital photography
gpu architectures are increasingly important in the multi core era due to their high number of parallel processors performance optimization for multi core processors has been challenge for programmers furthermore optimizing for power consumption is even more difficult unfortunately as result of the high number of processors the power consumption of many core processors such as gpus has increased significantly hence in this paper we propose an integrated power and performance ipp prediction model for gpu architecture to predict the optimal number of active processors for given application the basic intuition is that when an application reaches the peak memory bandwidth using more cores does not result in performance improvement we develop an empirical power model for the gpu unlike most previous models which require measured execution times hardware performance counters or architectural simulations ipp predicts execution times to calculate dynamic power events we then use the outcome of ipp to control the number of running cores we also model the increases in power consumption that resulted from the increases in temperature with the predicted optimal number of active cores we show that we can save up to of runtime gpu energy consumption and on average of that for the five memory bandwidth limited benchmarks
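A minimal sketch of the stated intuition, not of the paper's full empirical model: once the aggregate memory traffic of the active cores saturates peak bandwidth, adding cores no longer improves performance, so the predicted optimal number of active cores is capped by the bandwidth ratio. Parameter names and the example numbers are assumptions.

import math

def optimal_active_cores(total_cores, peak_bandwidth_gbps, per_core_bandwidth_gbps):
    # per_core_bandwidth_gbps: memory bandwidth one core of this kernel consumes
    if per_core_bandwidth_gbps <= 0:
        return total_cores  # compute-bound kernel: use every core
    # cores beyond the saturation point only burn power without adding performance
    saturation_cores = math.ceil(peak_bandwidth_gbps / per_core_bandwidth_gbps)
    return min(total_cores, saturation_cores)

# example: a kernel consuming 8 GB/s per core on a 30-core gpu with 120 GB/s peak
print(optimal_active_cores(30, 120.0, 8.0))   # -> 15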
dwell time on web pages has been extensively used for various information retrieval tasks however some basic yet important questions have not been sufficiently addressed eg what distribution is appropriate to model the distribution of dwell times on web page and furthermore what the distribution tells us about the underlying browsing behaviors in this paper we draw an analogy between abandoning page during web browsing and system failure in reliability analysis and propose to model the dwell time using the weibull distribution using this distribution provides better goodness of fit to real world data and it uncovers some interesting patterns of user browsing behaviors not previously reported for example our analysis reveals that web browsing in general exhibits significant negative aging phenomenon which means that some initial screening has to be passed before page is examined in detail giving rise to the browsing behavior that we call screen and glean in addition we demonstrate that dwell time distributions can be reasonably predicted purely based on low level page features which broadens the possible applications of this study to situations where log data may be unavailable
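A minimal sketch of fitting a Weibull distribution to observed dwell times with SciPy. Mathematically, a fitted shape parameter below one corresponds to a decreasing abandonment hazard, which matches the negative aging pattern described above; the synthetic data here is only a stand-in for real log data.

import numpy as np
from scipy import stats

# dwell times in seconds for one page (synthetic stand-in for log data)
dwell_times = np.random.weibull(0.7, size=1000) * 30.0

# fit a two-parameter weibull (location fixed at 0) by maximum likelihood
shape, loc, scale = stats.weibull_min.fit(dwell_times, floc=0)

# shape < 1: the hazard of leaving decreases with time on the page, i.e. pages
# that survive an initial screening tend to be examined in detail
print("shape=%.2f scale=%.1f" % (shape, scale))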
the single instruction multiple data simd architecture is very efficient for executing arithmetic intensive programs but frequently suffers from data alignment problems the data alignment problem not only induces extra time overhead but also hinders automatic vectorization of the simd compiler in this paper we compare three on chip memory systems which are single bank multi bank and multi port for the simd architecture to resolve the data alignment problems the single bank memory is the simplest but supports only the aligned accesses the multi bank memory requires little higher complexity but enables the unaligned accesses and the stride accesses with bank conflict limitation the multi port memory is capable of both the unaligned and stride accesses without any restriction but needs quite much expensive hardware we also developed vectorizing compiler that can conduct dynamic memory allocation and simd code generation the performances of the three memory systems with our simd compiler are evaluated using several digital signal processing kernels and the mpeg encoder the experimental results show that the multi bank memory can carry out mpeg encoding times faster whereas the single bank memory only achieves times speed up when employed in multimedia system with issue host processor and an way simd coprocessor the multi port memory obviously shows the best performance which is however an impractical improvement over the multi bank memory when the hardware cost is considered
with the increasing demand for proper and efficient xml data storage xml enabled database xendb has emerged as one of the popular solutions it claims to combine the pros and limit the cons of the traditional database management systems dbms and native xml database nxd in this paper we focus on xml data update management in xendb our aim is to preserve the conceptual semantic constraints and to avoid inconsistencies in xml data during update operations in this current era when xml data interchange mostly occurs in commercial setting it is highly critical that data exchanged be correct at all times and hence data integrity in xml data is paramount to achieve our goal we firstly classify different constraints in xml documents secondly we transform these constraints into xml schema with embedded sql annotations thirdly we propose generic update methodology that utilizes the proposed schema we then implement the method in one of the current xendb products since xendb has relational model as the underlying data model our update method uses the sql xml as standard language finally we also analyze the processing performance
search condition in object oriented object relational queries consists of nested predicates which are predicates on path expressions in this paper we propose new technique for estimating selectivity for nested predicates selectivity of nested predicate nested selectivity is defined as the ratio of the number of the qualified objects of the starting class in the path expression to the total number of objects of the class the new technique takes into account the effects of direct representation of the many to many relationship and the partial participation of objects in the relationship these two features occur frequently in object oriented object relational databases but have not been properly handled in the conventional selectivity estimation techniques for the many to many relationship we generalize the block hit function proposed by sb yao to allow the cases that an object belongs to more than one block for the partial participation we propose the concept of active objects and extend our technique for total participation to handle active objects we also propose efficient methods for obtaining statistical information needed for our estimation technique we finally analyze the accuracy of our technique through series of experiments and compare with the conventional ones the experiment results showed that there was significant inaccuracy in the estimation by the conventional ones confirming the advantage of our technique
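For context, the classical block-hit estimate of Yao that the technique above generalizes; the paper's extensions to objects belonging to more than one block and to partial participation are not reproduced here. With n objects stored p per block in m = n/p blocks, and k objects selected uniformly at random without replacement, the expected number of blocks touched is

\[
\mathrm{Yao}(n, m, k) \;=\; m\left(1 - \frac{\binom{n-p}{k}}{\binom{n}{k}}\right)
\;=\; m\left(1 - \prod_{i=0}^{k-1}\frac{n-p-i}{n-i}\right), \qquad p = \frac{n}{m}.
\]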
embedded systems can operate perpetually without being connected to power source by harvesting environmental energy from motion the sun wind or heat differentials however programming these perpetual systems is challenging in response to changing energy levels programmers can adjust the execution frequency of energy intensive tasks or provide higher service levels when energy is plentiful and lower service levels when energy is scarce however it is often difficult for programmers to predict the energy consumption resulting from these adjustments worse explicit energy management can tie program to particular hardware platform limiting portability this paper introduces eon programming language and runtime system designed to support the development of perpetual systems to our knowledge eon is the first energy aware programming language eon is declarative coordination language that lets programmers compose programs from components written in or nesc paths through the program flows may be annotated with different energy states eon’s automatic energy management then dynamically adapts these states to current and predicted energy levels it chooses flows to execute and adjusts their rates of execution maximizing the quality of service under available energy constraints we demonstrate the utility and portability of eon by deploying two perpetual applications on widely different hardware platforms gps based location tracking sensor deployed on threatened species of turtle and on automobiles and solar powered camera sensor for remote ad hoc deployments we also evaluate the simplicity and effectiveness of eon with user study in which novice eon programmers produced more efficient energy adaptive systems in substantially less time than experienced programmers
one of the necessary techniques for constructing virtual museum is to estimate the surface normal and the albedo of the artwork which has high specularity in this paper we propose novel photometric stereo method which is robust to the specular reflection of the object surface our method can also digitize the artwork arranged inside glass or acrylic display case without bringing the artwork out of the display case our method treats the specular reflection at the object surface or at the display case as an outlier and finds good surface normal evading the influence of the outliers we judiciously design the cost function so that the outlier will be automatically removed under the assumption that the object’s shape and color are smooth at the end of this paper we also show some archived data of segonko tumulus and objects in the university museum at the university of tokyo that were generated by applying the proposed method
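To make the setting concrete, a minimal sketch of conventional Lambertian photometric stereo with a naive outlier-rejection step follows; the paper's actual cost function, its smoothness assumption, and its handling of the display case are not reproduced here. It assumes a reasonably large number of known light directions so that enough inliers remain after rejection, and the keep fraction is an illustrative parameter.

import numpy as np

def photometric_stereo(intensities, light_dirs, keep_fraction=0.7):
    # intensities: (k,) observed pixel intensities under k known light directions
    # light_dirs:  (k, 3) unit light direction vectors
    I = np.asarray(intensities, dtype=float)
    L = np.asarray(light_dirs, dtype=float)
    # first pass: least squares estimate of g = albedo * normal
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    # naive outlier handling: drop the observations with the largest residuals,
    # which tend to be specular highlights, then re-solve on the rest
    residuals = np.abs(L @ g - I)
    keep = np.argsort(residuals)[: int(keep_fraction * len(I))]
    g, *_ = np.linalg.lstsq(L[keep], I[keep], rcond=None)
    albedo = np.linalg.norm(g)
    normal = g / albedo if albedo > 0 else g
    return normal, albedo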
this paper presents method for efficient compression and relighting with high resolution precomputed light transport matrices we accomplish this using wavelet transform transforming the columns of the transport matrix in addition to the row transform used in previous work we show that standard wavelet transform can actually inflate portions of the matrix because high frequency lights lead to high frequency images that cannot easily be compressed therefore we present an adaptive wavelet transform that terminates at level that avoids inflation and maximizes sparsity in the matrix data finally we present an algorithm for fast relighting from adaptively compressed transport matrices combined with gpu based precomputation pipeline this results in an image and geometry relighting system that performs significantly better than compression techniques on average better in terms of storage cost and rendering speed for equal quality matrices
data stream is massive unbounded sequence of data elements continuously generated at rapid rate consequently the knowledge embedded in data stream is more likely to be changed as time goes by identifying the recent change of data stream especially for an online data stream can provide valuable information for the analysis of the data stream however most of mining algorithms or frequency approximation algorithms over data stream do not differentiate the information of recently generated data elements from the obsolete information of old data elements which may be no longer useful or possibly invalid at present therefore they are not able to extract the recent change of information in data stream adaptively this paper proposes data mining method for finding recently frequent itemsets adaptively over an online transactional data stream the effect of old transactions on the current mining result of data steam is diminished by decaying the old occurrences of each itemset as time goes by furthermore several optimization techniques are devised to minimize processing time as well as memory usage finally the performance of the proposed method is analyzed by series of experiments to identify its various characteristics
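A minimal sketch of the decay mechanism described above, not of the paper's full algorithm or its memory optimizations: every itemset count is scaled down by a decay factor per arriving transaction, so old occurrences fade and an itemset is judged recently frequent relative to the decayed window size. The decay value, the bound on itemset size, and the naive per-transaction decay pass are simplifying assumptions.

from itertools import combinations

def update(counts, transaction, decay=0.999, max_size=3):
    # diminish the weight of every previously counted occurrence by one step
    for itemset in counts:
        counts[itemset] *= decay
    # then add the itemsets of the newly arrived transaction
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            counts[itemset] = counts.get(itemset, 0.0) + 1.0

def recently_frequent(counts, n_transactions, decay=0.999, min_support=0.1):
    # the decayed "window size" of the stream is the geometric sum of the weights
    window = (1.0 - decay ** n_transactions) / (1.0 - decay)
    return {s for s, c in counts.items() if c / window >= min_support}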
this paper describes rudder decentralized agent based infrastructure for supporting the autonomic composition of grid applications rudder provides agents and protocols for discovering selecting and composing elements it also implements agent interaction and negotiation protocols to enable appropriate application behaviors to be dynamically negotiated and enacted the defined protocols and agent activities are supported by comet scalable decentralized shared space based coordination substrate the implementation operation and experimental evaluation of the system are presented
software evolution research has focused mostly on analyzing the evolution of single software systems however it is rarely the case that project exists as standalone independent of others rather projects exist in parallel within larger contexts in companies research groups or even the open source communities we call these contexts software ecosystems in this paper we present the small project observatory prototype tool which aims to support the analysis of software ecosystems through interactive visualization and exploration we present case study of exploring an ecosystem using our tool we describe the architecture of the tool and we distill lessons learned during the tool building experience
multiple display environment mde networks personal and shared devices to form virtual workspace and designers are just beginning to grapple with the challenges of developing interfaces tailored for these environments to develop effective interfaces for mdes designers must employ methods that allow them to rapidly generate and test alternative designs early in the design process paper prototyping offers one promising method but needs to be adapted to effectively simulate the use of multiple displays and allow testing with groups of users in this paper we share experiences from two projects in which paper prototyping was utilized to explore interfaces for mdes we identify problems encountered when applying the traditional method describe how these problems were overcome and distill our experiences into recommendations that others can draw upon by following our recommendations designers need only make minor modifications to the existing method to better realize benefits of paper prototyping for mdes
we introduce benchmark called texture text under relations to measure the relative strengths and weaknesses of combining text processing with relational workload in an rdbms while the well known trec benchmarks focus on quality we focus on efficiency texture is micro benchmark for query workloads and considers two central text support issues that previous benchmarks did not queries with relevance ranking rather than those that just compute all answers and richer mix of text and relational processing reflecting the trend toward seamless integration in developing this benchmark we had to address the problem of generating large text collections that reflected the performance characteristics of given seed collection this is essential for controlled study of specific data characteristics and their effects on performance in addition to presenting the benchmark with performance numbers for three commercial dbmss we present and validate synthetic generator for populating text fields
databases of text and text annotated data constitute significant fraction of the information available in electronic form searching and browsing are the typical ways that users locate items of interest in such databases interfaces that use multifaceted hierarchies represent new powerful browsing paradigm which has been proven to be successful complement to keyword searching thus far multifaceted hierarchies have been created manually or semi automatically making it difficult to deploy multifaceted interfaces over large number of databases we present automatic and scalable methods for creation of multifaceted interfaces our methods are integrated with traditional relational databases and can scale well for large databases furthermore we present methods for selecting the best portions of the generated hierarchies when the screen space is not sufficient for displaying all the hierarchy at once we apply our technique to range of large data sets including annotated images television programming schedules and web pages the results are promising and suggest directions for future research
random sampling is one of the most fundamental data management tools available however most current research involving sampling considers the problem of how to use sample and not how to compute one the implicit assumption is that sample is small data structure that is easily maintained as new data are encountered even though simple statistical arguments demonstrate that very large samples of gigabytes or terabytes in size can be necessary to provide high accuracy no existing work tackles the problem of maintaining very large disk based samples from data management perspective and no techniques now exist for maintaining very large samples in an online manner from streaming data in this paper we present online algorithms for maintaining on disk samples that are gigabytes or terabytes in size the algorithms are designed for streaming data or for any environment where large sample must be maintained online in single pass through data set the algorithms meet the strict requirement that the sample always be true statistically random sample without replacement of all of the data processed thus far we also present algorithms to retrieve small size random sample from large disk based sample which may be used for various purposes including statistical analyses by dbms
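The paper targets disk-based samples far larger than memory; as a point of reference only, the classic in-memory reservoir algorithm below maintains a true uniform random sample without replacement in a single pass, which is the invariant the paper's disk-based algorithms also guarantee. This is a baseline illustration, not the paper's technique.

import random

def reservoir_sample(stream, k):
    # maintains a uniform random sample of size k, without replacement,
    # over everything seen so far, in one pass (vitter's algorithm r)
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)
        else:
            # item n replaces a random slot with probability k/n, which keeps
            # every item seen so far in the sample with probability k/n
            j = random.randrange(n)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(10**6), 5))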
we introduce multidimensional framework for classifying and comparing trust and reputation systems the framework dimensions encompass both hard and soft features of such systems including different witness location approaches various reputation calculation engines variety of information sources and rating systems which are categorised as hard features and also basic reputation measurement parameters context diversity checking reliability and honesty assessment and adaptability which are referred to as soft features specifically the framework dimensions answer questions related to major characteristics of systems including those parameters from the real world that should be imitated in virtual environment the proposed framework can serve as basis to understand the current state of the art in the area of computational trust and reputation and also help in designing suitable control mechanisms for online communities in addition we have provided critical analysis of some of the existing techniques in the literature compared within the context of the proposed framework dimensions
the developments in information technology during the last decade have been rapidly changing the possibilities for data and knowledge access to respect this several declarative knowledge representation formalisms have been extended with the capability to access data and knowledge sources that are external to knowledge base this article reviews some of these formalisms that are centered around answer set programming viz hex programs modular logic programs and multi context systems which were developed by the kbs group of the vienna university of technology in cooperation with external colleagues these formalisms were designed with different principles and four different settings and thus have different properties and features however as argued they are not unrelated furthermore they provide basis for advanced knowledge based information systems which are targeted in ongoing research projects
presentation of content is an important aspect of today’s virtual reality applications especially in domains such as virtual museums the large amount and variety of exhibits in such applications raise need for adaptation and personalization of the environment this paper presents content personalization platform for virtual museums which is based on semantic description of content and on information implicitly collected about the users through their interactions with the museum the proposed platform uses stereotypes to initialize user models adapts user profiles dynamically and clusters users into similar interest groups science fiction museum has been set up as case study for this platform and an evaluation has been carried out
extensive studies have shown that mining gene expression data is important for both bioinformatics research and biomedical applications however most existing studies focus only on either co regulated gene clusters or emerging patterns factually another analysis scheme ie simultaneously mining phenotypes and diagnostic genes is also biologically significant which has received relatively little attention so far in this paper we explore novel concept of local conserved gene cluster lc cluster to address this problem specifically an lc cluster contains subset of genes and subset of conditions such that the genes show steady expression values instead of the coherent pattern rising and falling synchronously defined by some previous work only on the subset of conditions but not along all given conditions to avoid the exponential growth in subspace search we further present two efficient algorithms namely falconer and falconer to mine the complete set of maximal lc clusters from gene expression data sets based on enumeration tree extensive experiments conducted on both real gene expression data sets and synthetic data sets show our approaches are efficient and effective our approaches outperform the existing enumeration tree based algorithms and our approaches can discover an amount of lc clusters which are potentially of high biological significance
context and motivation in market driven software development it is crucial but challenging to find the right balance among competing quality requirements qr problem in order to identify the unique challenges associated with the selection trade off and management of quality requirements an interview study is performed results this paper describes how qr are handled in practice data is collected through interviews with five product managers and five project leaders from five software companies contribution the contribution of this study is threefold firstly it includes an examination of the interdependencies among quality requirements perceived as most important by the practitioners secondly it compares the perceptions and priorities of quality requirements by product management and project management respectively thirdly it characterizes the selection and management of quality requirements in down stream development activities
retrieval in question and answer archive involves finding good answers for user’s question in contrast to typical document retrieval retrieval model for this task can exploit question similarity as well as ranking the associated answers in this paper we propose retrieval model that combines translation based language model for the question part with query likelihood approach for the answer part the proposed model incorporates word to word translation probabilities learned through exploiting different sources of information experiments show that the proposed translation based language model for the question part outperforms baseline methods significantly by combining with the query likelihood language model for the answer part substantial additional effectiveness improvements are obtained
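a simplified python sketch of the kind of scoring described above mixing translation based language model on the question field with query likelihood model on the answer field the smoothing form the mixture weights and the translation table are illustrative assumptions rather than the exact model of the paper

    import math
    from collections import Counter

    def score(query, question, answer, p_translate, p_collection,
              lam=0.5, beta=0.8, alpha=0.5):
        # query, question, answer: lists of tokens
        # p_translate[(w, t)]: probability that question term t "translates" to w
        # p_collection[w]: background collection language model
        q_tf, a_tf = Counter(question), Counter(answer)
        q_len, a_len = max(len(question), 1), max(len(answer), 1)
        s = 0.0
        for w in query:
            p_bg = p_collection.get(w, 1e-6)
            # translation based estimate over the question part
            p_tr = sum(p_translate.get((w, t), 0.0) * (c / q_len)
                       for t, c in q_tf.items())
            p_q = (1 - lam) * (beta * p_tr + (1 - beta) * q_tf[w] / q_len) + lam * p_bg
            # plain query likelihood estimate over the answer part
            p_a = (1 - lam) * (a_tf[w] / a_len) + lam * p_bg
            s += math.log(alpha * p_q + (1 - alpha) * p_a)
        return s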
there are two approaches to reduce the overhead associated with coordinated checkpointing first is to minimize the number of synchronization messages and the number of checkpoints the other is to make the checkpointing process non blocking in our previous work ieee parallel distributed systems we proved that there does not exist nonblocking algorithm which forces only minimum number of processes to take their checkpoints in this paper we present min process algorithm which relaxes the non blocking condition while trying to minimize the blocking time and non blocking algorithm which relaxes the min process condition while minimizing the number of checkpoints saved on the stable storage the proposed non blocking algorithm is based on the concept of mutable checkpoint which is neither tentative checkpoint nor permanent checkpoint based on mutable checkpoints our nonblocking algorithm avoids the avalanche effect and forces only minimum number of processes to take their checkpoints on the stable storage
program slicing is potentially useful analysis for aiding program understanding however in reality even slices of small programs are often too large to be useful imprecise pointer analyses have been suggested as one cause of this problem in this paper we use dynamic points to data which represents optimistic pointer information to obtain bound on the best case slice size improvement that can be achieved with improved pointer precision our experiments show that slice size can be reduced significantly for programs that make frequent use of calls through function pointers because for them the dynamic pointer data results in considerably smaller call graph which leads to fewer data dependences programs without or with only few calls through function pointers however show considerably less improvement we discovered that programs appear to have significant fraction of direct and nonspurious pointer data dependences so that reducing spurious dependences via pointers is only of limited benefit consequently to make slicing useful in general for such programs improvements beyond better pointer analyses will be necessary on the other hand since we show that collecting dynamic function pointer information can be performed with little overhead average slowdown of percent for our benchmarks dynamic pointer information may be practical approach to making slicing of programs with frequent function pointer use more successful in practice
in this paper we present voxelbars as an informative interface for volume visualization voxelbars arrange voxels into space and visually encode multiple attributes of voxels into one display voxelbars allow users to easily find out clusters of interesting voxels set opacities and colors of specific group of voxels and achieve various sophisticated visualization tasks at voxel level we provide results on real volume data to demonstrate the advantages of the voxelbars over scatterplots and traditional transfer function specification methods some novel visualization techniques including visibility aware transfer function design and selective clipping based on voxelbars are also introduced
to improve the effectiveness and efficiency of mining tasks constraint based mining enables users to concentrate on mining their interested association rules instead of the complete set of association rules previously proposed methods are mainly contributed to handling single constraint and only consider the constraints which are characterized by single attribute value in this paper we propose an approach to mine association rules with multiple constraints constructed by multi dimensional attribute values our proposed approach basically consists of three phases first we collect the frequent items and prune infrequent items according to the apriori property second we exploit the properties of the given constraints to prune search space or save constraint checking in the conditional databases third for each itemset possible to satisfy the constraint we generate its conditional database and perform the three phases in the conditional database recursively our proposed algorithms can exploit the properties of constraints to prune search space or save constraint checking therefore our proposed algorithm is more efficient than the revised fp growth and fic algorithms
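a generic python sketch of how an anti monotone constraint can prune apriori style level wise search it is not the conditional database algorithm of the paper only an illustration of using constraint properties to cut the search space

    from itertools import combinations

    def constrained_frequent_itemsets(transactions, min_support,
                                      constraint=lambda itemset: True):
        # level wise mining: a candidate survives only if it is frequent and
        # satisfies an anti monotone constraint, so both prune the search space
        transactions = [frozenset(t) for t in transactions]
        level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
        found = []
        while level:
            survivors = {c for c in level
                         if sum(1 for t in transactions if c <= t) >= min_support
                         and constraint(c)}
            found.extend(sorted(survivors, key=sorted))
            nxt = set()
            for a, b in combinations(survivors, 2):
                u = a | b
                if len(u) == len(a) + 1 and all(
                        frozenset(s) in survivors for s in combinations(u, len(a))):
                    nxt.add(u)
            level = list(nxt)
        return found

    # usage: itemsets in at least 2 transactions with at most 2 items
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(constrained_frequent_itemsets(data, 2, constraint=lambda c: len(c) <= 2))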
this paper presents an improved client server system that increases the availability of remote data in order to reduce the required bandwidth the data related to the appearance color and normal involved in the rendering of meshes and point clouds is quantized on the fly during the transmission to the final client without reducing the geometric complexity our new quantization technique for the appearance that can be implemented on the gpu strongly reduces the cpu load on the server side and the transmission time is largely decreased
in the past data holders protected the privacy of their constituents by issuing separate disclosures of sensitive eg dna and identifying data eg names however individuals visit many places and their location visit patterns or trails can re identify seemingly anonymous data in this paper we introduce formal model of privacy protection called unlinkability to prevent trail re identification in distributed data the model guarantees that sensitive data trails are linkable to no less than identities we develop graph based model and illustrate how unlinkability is more appropriate solution to this privacy problem compared to alternative privacy protection models
the specifications of an application’s security configuration are crucial for understanding its security policies which can be very helpful in security related contexts such as misconfiguration detection such specifications however are often ill documented or even closed because of the increasing use of graphic user interfaces to set program options in this paper we propose configre new technique for automatic reverse engineering of an application’s access control configurations our approach first partitions configuration input into fields and then identifies the semantic relations among these fields and the roles they play in enforcing an access control policy based upon such knowledge configre automatically generates specification language to describe the syntactic relations of these fields the language can be converted into scanner using standard parser generators for scanning configuration files and discovering the security policies specified in an application we implemented configre in our research and evaluated it against real applications the experiment results demonstrate the efficacy of our approach
people regularly interact with different representations of web pages person looking for new information may initially find web page represented as short snippet rendered by search engine when he wants to return to the same page the next day the page may instead be represented by link in his browser history previous research has explored how to best represent web pages in support of specific task types but as we find in this paper consistency in representation across tasks is also important we explore how different representations are used in variety of contexts and present compact representation that supports both the identification of new relevant web pages and the re finding of previously viewed pages
let text of characters over an alphabet of size be compressible to phrases by the lz algorithm we show how to build data structure based on the ziv lempel trie called the lz index that takes log bits of space that is times the entropy of the text for ergodic sources and reports the occurrences of pattern of length in worst case time log log we present practical implementation of the lz index which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found
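a small python sketch of the lz parse from which the ziv lempel trie is built each phrase extends previously seen phrase by one character it is only the parsing step not the compressed index or its search procedure

    def lz78_parse(text):
        # produce lz78 phrases: each phrase is a previously seen phrase plus one char
        trie = {}        # (parent phrase id, char) -> phrase id
        phrases = []     # phrase id - 1 -> (parent phrase id, char)
        node = 0         # 0 is the empty phrase at the trie root
        for ch in text:
            if (node, ch) in trie:
                node = trie[(node, ch)]      # keep extending the current phrase
            else:
                phrases.append((node, ch))   # close the phrase, add a trie node
                trie[(node, ch)] = len(phrases)
                node = 0
        if node:
            phrases.append((node, ""))       # flush an unfinished last phrase
        return phrases

    # usage: the number of phrases grows slowly for compressible text
    print(lz78_parse("abababababa"))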
we present macroprogramming framework called macrolab that offers vector programming abstraction similar to matlab for cyber physical systems cpss the user writes single program for the entire network using matlab like operations such as addition find and max the framework executes these operations across the network in distributed fashion centralized fashion or something between the two whichever is most efficient for the target deployment we call this approach deployment specific code decomposition dscd macrolab programs can be executed on mote class hardware such as the telos motes our results indicate that macrolab introduces almost no additional overhead in terms of message cost power consumption memory footprint or cpu cycles over tinyos programs
in this paper we present performance comparison of database replication techniques based on total order broadcast while the performance of total order broadcast based replication techniques has been studied in previous papers this paper presents many new contributions first it compares with each other techniques that were presented and evaluated separately usually by comparing them to classical replication scheme like distributed locking second the evaluation is done using finer network model than previous studies third the paper compares techniques that offer the same consistency criterion one copy serializability in the same environment using the same settings the paper shows that while networking performance has little influence in lan setting the cost of synchronizing replicas is quite high because of this total order broadcast based techniques are very promising as they minimize synchronization between replicas
peer to peer networks are widely criticized for their inefficient flooding search mechanism distributed hash table dht algorithms have been proposed to improve the search efficiency by mapping the index of file to unique peer based on predefined hash functions however the tight coupling between indices and hosting peers incurs high maintenance cost in highly dynamic network to properly balance the tradeoff between the costs of indexing and searching we propose the distributed caching and adaptive search dicas algorithm where indices are passively cached in group of peers based on predefined hash function guided by the same function adaptive search selectively forwards queries to matched peers with high probability of caching the desired indices the search cost is reduced due to shrunk searching space different from the dht solutions distributed caching loosely maps the index of file to group of peers in passive fashion which saves the cost of updating indices our simulation study shows that the dicas protocol can significantly reduce the network search traffic with the help of small cache space contributed by each individual peer
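a small python sketch of the group based placement idea described above an index is passively cached at peers whose group matches the hash of the file name and queries are forwarded preferentially to peers of that group the hash function and group count are illustrative

    import hashlib

    NUM_GROUPS = 8   # illustrative group count, not a value from the paper

    def group_of(key):
        # map a file name or a peer id to one of NUM_GROUPS caching groups
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_GROUPS

    def should_cache_index(peer_id, file_name):
        # a peer passively caches a file's index when both map to the same group
        return group_of(peer_id) == group_of(file_name)

    def forward_targets(neighbors, file_name):
        # adaptive search: prefer matched neighbors, fall back to all of them
        g = group_of(file_name)
        matched = [p for p in neighbors if group_of(p) == g]
        return matched if matched else neighbors

    # usage
    peers = ["peer-%d" % i for i in range(20)]
    print([p for p in peers if should_cache_index(p, "song.mp3")])
    print(forward_targets(peers[:5], "song.mp3"))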
power based side channel attacks are significant security risk especially for embedded applications to improve the security of such devices protected logic styles have been proposed as an alternative to cmos however they should only be used sparingly since their area and power consumption are both significantly larger than for cmos we propose to augment processor realized in cmos with custom instruction set extensions designed with security and performance as the primary objectives that are realized in protected logic we have developed design flow based on standard cad tools that can automatically synthesize and place and route such hybrid designs the flow is integrated into simulation and evaluation environment to quantify the security achieved on sound basis using mcml logic as case study we have explored different partitions of the present block cipher between protected and unprotected logic this experiment illustrates the tradeoff between the type and amount of application level functionality implemented in protected logic and the level of security achieved by the design our design approach and evaluation tools are generic and could be used to partition any algorithm using any protected logic style
security in cloud computing is getting more and more important recently besides passive defense such as encryption it is necessary to implement real time active monitoring detection and defense in the cloud according to the published research dpi deep packet inspection is the most effective technology to realize active inspection and defense however most recent works of dpi aim at space reduction but could not meet the demands of high speed and stability in the cloud so it is important to improve regular methods of dpi making it more suitable for cloud computing in this paper an asynchronous parallel finite automaton named apfa is proposed by introducing the asynchronous parallelization and the heuristically forecast mechanism which significantly decreases the time consumed in matching while still reducing the memory required what is more apfa is immune to the overlapping problem so that the stability is also enhanced the evaluation results show that apfa achieves higher stability better performance on time and memory in short apfa is more suitable for cloud computing
the authors describe recommender model that uses intermediate agents to evaluate large body of subjective data according to set of rules and make recommendations to users after scoring recommended items agents adapt their own selection rules via interactive evolutionary computing to fit user tastes even when user preferences undergo rapid change the model can be applied to such tasks as critiquing large numbers of music or written compositions in this paper we use musical selections to illustrate how agents make recommendations and report the results of several experiments designed to test the model’s ability to adapt to rapidly changing conditions yet still make appropriate decisions and recommendations
today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements the network architecture typically consists of tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy unfortunately even when deploying the highest end ip switches routers resulting topologies may only support of the aggregate bandwidth available at the edge of the network while still incurring tremendous cost non uniform bandwidth among data center nodes complicates application design and limits overall system performance in this paper we show how to leverage largely commodity ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements similar to how clusters of commodity computers have largely replaced more specialized smps and mpps we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher end solutions our approach requires no modifications to the end host network interface operating system or applications critically it is fully backward compatible with ethernet ip and tcp
traditionally stemming has been applied to information retrieval tasks by transforming words in documents to their root form before indexing and applying similar transformation to query terms although it increases recall this naive strategy does not work well for web search since it lowers precision and requires significant amount of additional computation in this paper we propose context sensitive stemming method that addresses these two issues two unique properties make our approach feasible for web search first based on statistical language modeling we perform context sensitive analysis on the query side we accurately predict which of its morphological variants is useful to expand query term with before submitting the query to the search engine this dramatically reduces the number of bad expansions which in turn reduces the cost of additional computation and improves the precision at the same time second our approach performs context sensitive document matching for those expanded variants this conservative strategy serves as safeguard against spurious stemming and it turns out to be very important for improving precision using word pluralization handling as an example of our stemming approach our experiments on major web search engine show that stemming only of the query traffic we can improve relevance as measured by average discounted cumulative gain dcg by on these queries and over all query traffic
this paper proposes technique that mixes simulation and an analytical method to evaluate the characteristics of networks on chips nocs the advantage of this technique is to reduce the simulation time by reducing the complexity of the noc model while still obtaining accurate results for latency and throughput the basis of this technique is to send the whole payload data at once in the packet header ii to reduce the noc simulation complexity by omitting the flit by flit payload forwarding iii to use an algorithm for controlling the release of the packet trailer in order to close the connection at the right time for the evaluation of this technique an actor oriented model of noc joselito was created simulation results show that joselito is in average times faster in of the executed case studies than the implementation without using the proposed technique the worst case simulation results for latency and throughput have respectively and error compared to the corresponding register transfer level rtl model
data warehousing has been widely adopted by contemporary enterprises for inter organizational information sharing the need cannot be over emphasized to conduct research on the integration of heterogeneous data warehouses to overcome the challenging situations today that makes it urgent to establish systematic integration methodology for integrating heterogeneous data warehouses via the internet or proprietary extranets traditionally researchers usually employed canonical format as the integration medium for logical data integrations among heterogeneous systems in this paper to fully utilize the power of the internet we propose framework and develop prototype to integrate heterogeneous data warehouses by xml technologies we first formally define the elements in data warehousing and discuss various semantic conflicts occurring among heterogeneous data cubes then we propose the system architecture and related resolution procedures for all kinds of semantic conflicts for local data cubes with different schemas we define global xml schema to integrate the local cube structures and transform each local cube respectively into an xml document conforming to the global xml schema these transformed xml documents obtained from local cubes will be manipulated by pre defined xquery commands to form unified xml document which can be regarded as the global cube the integrated global cube can be easily stored and manipulated in native xml databases the proposed methodology enables global users to browse or pose multi dimensional expressions mdx on the global cube to obtain result in the same way as they perform locally
image and video labeling is important for computers to understand images and videos and for image and video search manual labeling is tedious and costly automatic image and video labeling is yet dream in this paper we adopt web approach to labeling images and videos efficiently internet users around the world are mobilized to apply their common sense to solve problems that are hard for today’s computers such as labeling images and videos we first propose general human computation framework that binds problem providers web sites and internet users together to solve large scale common sense problems efficiently and economically the framework addresses the technical challenges such as preventing malicious party from attacking others removing answers from bots and distilling human answers to produce high quality solutions to the problems the framework is then applied to labeling images three incremental refinement stages are applied the first stage collects candidate labels of objects in an image the second stage refines the candidate labels using multiple choices synonymic labels are also correlated in this stage to prevent bots and lazy humans from selecting all the choices trap labels are generated automatically and intermixed with the candidate labels semantic distance is used to ensure that the selected trap labels would be different enough from the candidate labels so that no human users would mistakenly select the trap labels the last stage is to ask users to locate an object given label from segmented image the experimental results are also reported in this paper they indicate that our proposed schemes can successfully remove spurious answers from bots and distill human answers to produce high quality image labels
qualitative choice logic qcl is propositional logic for representing alternative ranked options for problem solutions the logic adds to classical propositional logic new connective called ordered disjunction intuitively an ordered disjunction of two options means if possible the first option but if the first is not possible then at least the second one the semantics of qualitative choice logic is based on preference relation among models consequences of qcl theories can be computed through compilation to stratified knowledge bases which in turn can be compiled to classical propositional theories we also discuss potential applications of the logic several variants of qcl based on alternative inference relations and their relation to existing nonmonotonic formalisms
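a small python sketch of the ordered disjunction semantics sketched above each model gets satisfaction degree per formula and models with better degrees are preferred the representation and the lexicographic comparison used here are simplifications for illustration

    def degree(model, options):
        # satisfaction degree of an ordered disjunction: 1 if the first option
        # holds, 2 if only the second does, and so on; None if none holds
        for rank, option in enumerate(options, start=1):
            if option(model):
                return rank
        return None

    def preferred_models(models, theory):
        # keep models satisfying every ordered disjunction, then prefer the ones
        # with the best degree vector (compared lexicographically for brevity)
        scored = []
        for m in models:
            degs = [degree(m, opts) for opts in theory]
            if None not in degs:
                scored.append((degs, m))
        if not scored:
            return []
        best = min(d for d, _ in scored)
        return [m for d, m in scored if d == best]

    # usage: "if possible coffee, otherwise at least tea"
    theory = [[lambda m: m["coffee"], lambda m: m["tea"]]]
    models = [{"coffee": False, "tea": True}, {"coffee": True, "tea": False}]
    print(preferred_models(models, theory))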
this paper evaluates the raw microprocessor raw addresses the challenge of building general purpose architecture that performs well on larger class of stream and embedded computing applications than existing microprocessors while still running existing ilp based sequential programs with reasonable performance in the face of increasing wire delays raw approaches this challenge by implementing plenty of on chip resources including logic wires and pins in tiled arrangement and exposing them through new isa so that the software can take advantage of these resources for parallel applications raw supports both ilp and streams by routing operands between architecturally exposed functional units over point to point scalar operand network this network offers low latency for scalar data transport raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport we have implemented prototype raw microprocessor in ibm’s nm layer copper cmos sf standard cell asic process we have also implemented ilp and stream compilers our evaluation attempts to determine the extent to which raw succeeds in meeting its goal of serving as more versatile general purpose processor central to achieving this goal is raw’s ability to exploit all forms of parallelism including ilp dlp tlp and stream parallelism specifically we evaluate the performance of raw on diverse set of codes including traditional sequential programs streaming applications server workloads and bit level embedded computation our experimental methodology makes use of cycle accurate simulator validated against our real hardware compared to nm pentium iii using commodity pc memory system components raw performs within factor of for sequential applications with very low degree of ilp about to better for higher levels of ilp and better when highly parallel applications are coded in stream language or optimized by hand the paper also proposes new versatility metric and uses it to discuss the generality of raw
we present the induced generalized ordered weighted averaging igowa operator it is new aggregation operator that generalizes the owa operator including the main characteristics of both the generalized owa and the induced owa operator this operator uses generalized means and order inducing variables in the reordering process it provides very general formulation that includes as special cases wide range of aggregation operators including all the particular cases of the iowa and the gowa operator the induced ordered weighted geometric iowg operator and the induced ordered weighted quadratic averaging iowqa operator we further generalize the igowa operator via quasi arithmetic means the result is the quasi iowa operator finally we present numerical example to illustrate the new approach in financial decision making problem
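a compact python sketch of the aggregation described above the arguments are reordered by their order inducing values and combined with generalized power mean so that different choices of the parameter recover the owa style special cases the example weights are arbitrary

    def igowa(pairs, weights, lam=1.0):
        # pairs: (order inducing value u_i, argument a_i)
        # weights: owa weights that sum to 1
        # reorder the arguments by decreasing u_i, then apply a generalized mean
        ordered = [a for u, a in sorted(pairs, key=lambda p: p[0], reverse=True)]
        return sum(w * (b ** lam) for w, b in zip(weights, ordered)) ** (1.0 / lam)

    # usage: lam=1 behaves like the induced owa operator and lam=2 like the
    # induced ordered weighted quadratic averaging operator
    pairs = [(3, 60.0), (1, 20.0), (2, 40.0)]
    print(igowa(pairs, [0.5, 0.3, 0.2], lam=1.0))
    print(igowa(pairs, [0.5, 0.3, 0.2], lam=2.0))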
this paper argues that computational grids can be used for far more types of applications than just trivially parallel ones algorithmic optimizations like latency hiding and exploiting locality can be used effectively to obtain high performance on grids despite the relatively slow wide area networks that connect the grid resources moreover the bandwidth of wide area networks increases rapidly allowing even some applications that are extremely communication intensive to run on grid provided the underlying algorithms are latency tolerant we illustrate large scale parallel computing on grids with three example applications that search large state spaces transposition driven search retrograde analysis and model checking we present several performance results on state of the art computer science grid das with dedicated optical network
in this paper we show that better performance can be achieved by training keypoint detector to only find those points that are suitable to the needs of the given task we demonstrate our approach in an urban environment where the keypoint detector should focus on stable man made structures and ignore objects that undergo natural changes such as vegetation and clouds we use waldboost learning with task specific training samples in order to train keypoint detector with this capability we show that our approach generalizes to broad class of problems where the task is known beforehand
wireless sensor networks are an emerging technology for low cost unattended monitoring of wide range of environments their importance has been enforced by the recent delivery of the ieee standard for the physical and mac layers and the forthcoming zigbee standard for the network and application layers the fast progress of research on energy efficiency networking data management and security in wireless sensor networks and the need to compare with the solutions adopted in the standards motivates the need for survey on this field
universality the property of the web that makes it the largest data and information source in the world is also the property behind the lack of uniform organization scheme that would allow easy access to data and information semantic web wherein different applications and web sites can exchange information and hence exploit web data and information to their full potential requires the information about web resources to be represented in detailed and structured manner resource description framework rdf an effort in this direction supported by the world wide web consortium provides means for the description of metadata which is necessity for the next generation of interoperable web applications the success of rdf and the semantic web will depend on the development of applications that prove the applicability of the concept the availability of application interfaces which enable the development of such applications and databases and inference systems that exploit rdf to identify and locate most relevant web resources in addition many practical issues such as security ease of use and compatibility will be crucial in the success of rdf this survey aims at providing glimpse at the past present and future of this upcoming technology and highlights why we believe that the next generation of the web will be more organized informative searchable accessible and most importantly useful it is expected that knowledge discovery and data mining can benefit from rdf and the semantic web
existing texture synthesis from example strategies for polygon meshes typically make use of three components multi resolution mesh hierarchy that allows the overall nature of the pattern to be reproduced before filling in detail matching strategy that extends the synthesized texture using the best fit from texture sample and transfer mechanism that copies the selected portion of the texture sample to the target surface we introduce novel alternatives for each of these components use of √ subdivision surfaces provides the mesh hierarchy and allows fine control over the surface complexity adaptive subdivision is used to create an even vertex distribution over the surface use of the graph defined by surface region for matching rather than regular texture neighbourhood provides for flexible control over the scale of the texture and allows simultaneous matching against multiple levels of an image pyramid created from the texture sample we use graph cuts for texture transfer adapting this scheme to the context of surface synthesis the resulting surface textures are realistic tolerant of local mesh detail and are comparable to results produced by texture neighbourhood sampling approaches
this paper presents an approach for tracking paper documents on the desk over time and automatically linking them to the corresponding electronic documents using an overhead video camera we demonstrate our system in the context of two scenarios paper tracking and photo sorting in the paper tracking scenario the system tracks changes in the stacks of printed documents and books on the desk and builds complete representation of the spatial structure of the desktop when users want to find printed document buried in the stacks they can query the system based on appearance keywords or access time the system also provides remote desktop interface for directly browsing the physical desktop from remote location in the photo sorting scenario users sort printed photographs into physical stacks on the desk the system automatically recognizes the photographs and organizes the corresponding digital photographs into separate folders according to the physical arrangement our framework provides way to unify the physical and electronic desktops without the need for specialized physical infrastructure except for video camera
design variability due to within die and die to die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high performance microprocessors in future process technology generations this variability manifests itself by increasing the number and criticality of long delay paths to quantify this impact we use an architectural process variation model that is appropriate for the analysis of system performance in the early stages of the design process we propose method of selecting microarchitectural parameters to mitigate the frequency impact due to process variability for distinct structures while minimizing ipc instructions per cycle loss we propose an optimization procedure to be used for system level design decisions and we find that joint architecture and statistical timing analysis can be more advantageous than pure circuit level optimization overall the technique can improve the yield frequency by about with ipc loss for baseline machine with fo logic depth per pipestage this approach is sensitive to the selection of processor pipeline depth and we demonstrate that machines with aggressive pipelines will experience greater challenges in coping with process variability
given the cost of memories and the very large storage and bandwidth requirements of large scale multimedia databases hierarchical storage servers which consist of disk based secondary storage and tape library based tertiary storage are becoming increasingly popular such server applications rely upon tape libraries to store all media exploiting their excellent storage capacity and cost per mb characteristics they also rely upon disk arrays exploiting their high bandwidth to satisfy very large number of requests given typical access patterns and server configurations the tape drives are fully utilized uploading data for requests that fall through to the tertiary level such upload operations consume significant secondary storage device and bus bandwidth in addition with present technology and trends the disk array can serve fewer requests to continuous objects than it can store mainly due to io and or backplane bus bandwidth limitations in this work we address comprehensively the performance of these hierarchical continuous media storage servers by looking at all three main system resources the tape drive bandwidth the secondary storage bandwidth and the host’s ram we provide techniques which while fully utilizing the tape drive bandwidth an expensive resource they introduce bandwidth savings which allow the secondary storage devices to serve more requests and do so without increasing demands for the host’s ram space specifically we consider the issue of elevating continuous data from its permanent place in tertiary for display purposes we develop algorithms for sharing the responsibility for the playback between the secondary and tertiary devices and for placing the blocks of continuous objects on tapes and show how they achieve the above goals we study these issues for different commercial tape library products with different bandwidth and tape capacity and in environments with and without the multiplexing of tape libraries
the creation of most models used in computer animation and computer games requires the assignment of texture coordinates texture painting and texture editing we present novel approach for texture placement and editing based on direct manipulation of textures on the surface compared to conventional tools for surface texturing our system combines uv coordinate specification and texture editing into one seamless process reducing the need for careful initial design of parameterization and providing natural interface for working with textures directly on surfaces a combination of efficient techniques for interactive constrained parameterization and advanced input devices makes it possible to realize set of natural interaction paradigms the texture is regarded as piece of stretchable material which the user can position and deform on the surface selecting arbitrary sets of constraints and mapping texture points to the surface in addition the multi touch input makes it possible to specify natural handles for texture manipulation using point constraints associated with different fingers pressure can be used as direct interface for texture combination operations the position of the object and its texture can be manipulated simultaneously using two hand input
earliest query answering eqa is an objective of streaming algorithms for xml query answering that aim for close to optimal memory management in this paper we show that eqa is infeasible even for small fragment of xpath unless np we then present an eqa algorithm for queries and schemas defined by deterministic nested word automata dnwas and distinguish large class of dnwas for which streaming query answering is feasible in polynomial space and time
in this paper we perform an extensive theoretical and experimental study on common synopsis construction algorithms with emphasis on wavelet based techniques that take under consideration query workload statistics our goal is to compare expensive quadratic time algorithms with cheap near linear time algorithms particularly when the latter are not optimal and or not workload aware for the problem at hand further we present the first known algorithm for constructing wavelet synopses for special class of range sum query workloads our experimental results clearly justify the necessity for designing workload aware algorithms especially in the case of range sum queries
the signature file method is popular indexing technique used in information retrieval and databases it excels in efficient index maintenance and lower space overhead however it suffers from inefficiency in query processing due to the fact that for each query processed the entire signature file needs to be scanned in this paper we introduce tree structure called signature tree established over signature file which can be used to expedite the signature file scanning by one order of magnitude or more
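a minimal python sketch of the signature file idea itself superimposed coding builds block signature and query signature filters candidate blocks the tree built over the signature file is not shown and the widths are illustrative

    import hashlib

    SIG_BITS = 64        # signature width, illustrative
    BITS_PER_WORD = 3    # bits set per word, illustrative

    def word_signature(word):
        # superimposed coding: each word sets a few pseudo random bit positions
        sig = 0
        for i in range(BITS_PER_WORD):
            h = hashlib.md5(("%d:%s" % (i, word)).encode()).hexdigest()
            sig |= 1 << (int(h, 16) % SIG_BITS)
        return sig

    def block_signature(words):
        sig = 0
        for w in words:
            sig |= word_signature(w)
        return sig

    def may_contain(block_sig, query_words):
        # a block qualifies only if every query bit is set (false drops possible)
        q = block_signature(query_words)
        return block_sig & q == q

    # usage
    blocks = [["signature", "file", "index"], ["tree", "structure"], ["query", "processing"]]
    sigs = [block_signature(b) for b in blocks]
    print([i for i, s in enumerate(sigs) if may_contain(s, ["signature", "index"])])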
multiprocessor systems on chips mpsocs have become popular architectural technique to increase performance however mpsocs may lead to undesirable power consumption characteristics for computing systems that have strict power budgets such as pdas mobile phones and notebook computers this paper presents the super complex instruction set computing supercisc embedded processor architecture and in particular investigates performance and power consumption of this device compared to traditional processor architecture based execution supercisc is heterogeneous multicore processor architecture designed to exceed performance of traditional embedded processors while maintaining reduced power budget compared to low power embedded processors at the heart of the supercisc processor is multicore vliw very large instruction word containing several homogeneous execution cores functional units in addition complex and heterogeneous combinational hardware function cores are tightly integrated to the core vliw engine providing an opportunity for improved performance and reduced energy consumption our supercisc processor core has been synthesized for both nm stratix ii field programmable gate array fpga and nm standard cell application specific integrated circuit asic fabrication process from oki each operating at approximately mhz for the vliw core we examine several reasons for speedup and power improvement through the supercisc architecture including predicated control flow cycle compression and reduction in arithmetic power consumption which we call power compression finally testing our supercisc processor with multimedia and signal processing benchmarks we show how the supercisc processor can provide performance improvements ranging from to with an average of while also providing orders of magnitude of power improvements for the computational kernels the power improvements for our benchmark kernels range from just over to over with an average savings exceeding by combining these power and performance improvements our total energy improvements all exceed as these savings are limited to the computational kernels of the applications which often consume approximately percent of the execution time we expect our savings to approach the ideal application improvement of
our study compared how experts and novices performed exploratory search using traditional search engine and social tagging system as expected results showed that social tagging systems could facilitate exploratory search for both experts and novices we however also found that experts were better at interpreting the social tags and generating search keywords which made them better at finding information in both interfaces specifically experts found more general information than novices by better interpretation of social tags in the tagging system and experts also found more domain specific information by generating more of their own keywords we found dynamic interaction between knowledge in the head and knowledge in the social web that although information seekers are more and more reliant on information from the social web domain expertise is still important in guiding them to find and evaluate the information implications on the design of social search systems that facilitate exploratory search are also discussed
effective planning in uncertain environment is important to agents and multi agent systems in this paper we introduce new logic based approach to probabilistic contingent planning probabilistic planning with imperfect sensing actions by relating probabilistic contingent planning to normal hybrid probabilistic logic programs with probabilistic answer set semantics we show that any probabilistic contingent planning problem can be encoded as normal hybrid probabilistic logic program we formally prove the correctness of our approach moreover we show that the complexity of finding probabilistic contingent plan in our approach is np complete in addition we show that any probabilistic contingent planning problem pp can be encoded as classical normal logic program with answer set semantics whose answer sets correspond to valid trajectories in pp we show that probabilistic contingent planning problems can be encoded as sat problems we present new high level probabilistic action description language that allows the representation of sensing actions with probabilistic outcomes
algorithm animation attempts to explain an algorithm by visualizing interesting events of the execution of the implemented algorithm on some sample input algorithm explanation describes the algorithm on some adequate level of abstraction states invariants explains how important steps of the algorithm preserve the invariants and abstracts from the input data up to the relevant properties it uses small focus onto the execution state this paper is concerned with the explanation of algorithms on linked data structures the thesis of the paper is that shape analysis of such algorithms produces abstract representations of such data structures which focus on the active parts ie the parts of the data structures which the algorithm can access during its next steps the paper presents concept of visually executing an algorithm on these abstract representations of data
tagging has become primary tool for users to organize and share digital content on many social media sites in addition tag information has been shown to enhance capabilities of existing search engines however many resources on the web still lack tag information this paper proposes content based approach to tag recommendation which can be applied to webpages with or without prior tag information while social bookmarking service such as delicious enables users to share annotated bookmarks tag recommendation is available only for pages with tags specified by other users our proposed approach is motivated by the observation that similar webpages tend to have the same tags each webpage can therefore share the tags they own with similar webpages the propagation of tag depends on its weight in the originating webpage and the similarity between the sending and receiving webpages the similarity metric between two webpages is defined as linear combination of four cosine similarities taking into account both tag information and page content experiments using data crawled from delicious show that the proposed method is effective in populating untagged webpages with the correct tags
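a small python sketch of the propagation idea described above page similarity is weighted combination of cosine similarities and each similar page shares its tags with weight equal to the tag weight times the similarity the four components and the combination weights here are placeholders

    import math

    def cosine(a, b):
        # cosine similarity between two sparse term -> weight dictionaries
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def page_similarity(p, q, combo=(0.25, 0.25, 0.25, 0.25)):
        # linear combination of cosine similarities over several views of a page
        views = ("content", "tags", "title", "anchor")   # placeholder views
        return sum(c * cosine(p.get(v, {}), q.get(v, {})) for c, v in zip(combo, views))

    def recommend_tags(target, tagged_pages, top_k=5):
        # a tag propagates with weight = its weight in the source page * similarity
        scores = {}
        for page in tagged_pages:
            sim = page_similarity(target, page)
            for tag, w in page.get("tags", {}).items():
                scores[tag] = scores.get(tag, 0.0) + w * sim
        return sorted(scores, key=scores.get, reverse=True)[:top_k]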
we consider the problem of partitioning the nodes of complete edge weighted graph into clusters so as to minimize the sum of the diameters of the clusters since the problem is np complete our focus is on the development of good approximation algorithms when edge weights satisfy the triangle inequality we present the first approximation algorithm for the problem the approximation algorithm yields solution which has no more than clusters such that the sum of cluster diameters is within factor ln of the optimal value using exactly clusters our approach also permits tradeoff among the constant terms hidden by the two big terms and the running time for any fixed we present an approximation algorithm that produces clusters whose total diameter is at most twice the optimal value when the distances are not required to satisfy the triangle inequality we show that unless np for any there is no polynomial time approximation algorithm that can provide performance guarantee of even when the number of clusters is fixed at we also present some results for the problem of minimizing the sum of cluster radii
this paper presents an infrastructure and mechanism for achieving dynamic inter enterprise workflow management using services provided by collaborative business enterprises services are distributed services that can be accessed programmatically on the internet using soap messages and the http protocol in this work we categorize services according to their business types and manage them in uddi enabled constraint based broker server service requests are specified in the activities of process model according to some standardized service templates and are bound to the proper service providers at run time by using constraint based dynamic service binding mechanism the workflow management system is dynamic in the sense that the actual business organizations which take part in business process are not determined until run time we have extended the traditional workflow process modeling by including service requests in activity specifications and extended the web service description language wsdl by including constraints in both service specifications and service requests so that the selection of service providers can be more accurately performed
the concept of an information space provides powerful metaphor for guiding the design of interactive retrieval systems we present case study of related article search browsing tool designed to help users navigate the information space defined by results of the pubmed search engine this feature leverages content similarity links that tie medline citations together in vast document network we examine the effectiveness of related article search from two perspectives topological analysis of networks generated from information needs represented in the trec genomics track and query log analysis of real pubmed users together data suggest that related article search is useful feature and that browsing related articles has become an integral part of how users interact with pubmed
optimizations in traditional compiler are applied sequentially with each optimization destructively modifying the program to produce transformed program that is then passed to the next optimization we present new approach for structuring the optimization phase of compiler in our approach optimizations take the form of equality analyses that add equality information to common intermediate representation the optimizer works by repeatedly applying these analyses to infer equivalences between program fragments thus saturating the intermediate representation with equalities once saturated the intermediate representation encodes multiple optimized versions of the input program at this point profitability heuristic picks the final optimized program from the various programs represented in the saturated representation our proposed way of structuring optimizers has variety of benefits over previous approaches our approach obviates the need to worry about optimization ordering enables the use of global optimization heuristic that selects among fully optimized programs and can be used to perform translation validation even on compilers other than our own we present our approach formalize it and describe our choice of intermediate representation we also present experimental results showing that our approach is practical in terms of time and space overhead is effective at discovering intricate optimization opportunities and is effective at performing translation validation for realistic optimizer
many distributed applications can be understood in terms of components interacting in an open environment such as the internet open environments are subject to change in unpredictable ways as applications may arrive evolve or disappear in order to validate components in such environments it can be useful to build simulation environments which reflect this highly unpredictable behavior this paper considers the validation of components with respect to behavioral interfaces behavioral interfaces specify semantic requirements on the observable behavior of components expressed in an assume guarantee style in our approach rewriting logic model is transparently extended with the history of all observable communications and metalevel strategies are used to guide the simulation of environment behavior over specification of the environment is avoided by allowing arbitrary environment behavior within the bounds of the assumption on observable behavior while the component is validated with respect to the guarantee of the behavioral interface
this paper addresses the problem of interactively modeling large street networks we introduce an intuitive and flexible modeling framework in which user can create street network from scratch or modify an existing street network this is achieved through designing an underlying tensor field and editing the graph representing the street network the framework is intuitive because it uses tensor fields to guide the generation of street network the framework is flexible because it allows the user to combine various global and local modeling operations such as brush strokes smoothing constraints noise and rotation fields our results will show street networks and three dimensional urban geometry of high visual quality
strong direct product theorem states that if we want to compute independent instances of function using less than times the resources needed for one instance then the overall success probability will be exponentially small in we establish such theorem for the randomized communication complexity of the disjointness problem ie with communication const kn the success probability of solving instances of size can only be exponentially small in this solves an open problem of ksw lss we also show that this bound even holds for am communication protocols with limited ambiguity the main result implies new lower bound for disjointness in restricted player nof protocol and optimal communication space tradeoffs for boolean matrix product our main result follows from solution to the dual of linear programming problem whose feasibility comes from so called intersection sampling lemma that generalizes result by razborov raz
this paper presents typed programming language and compiler for run time code generation the language called ml extends ml with modal operators in the style of the mini ml’e language of davies and pfenning ml allows programmers to use types to specify precisely the stages of computation in program the types also guide the compiler in generating target code that exploits the staging information through the use of run time code generation the target machine is currently version of the categorical abstract machine called the ccam which we have extended with facilities for run time code generation this approach allows the programmer to express the staging that he wants directly to the compiler it also provides typed framework in which to verify the correctness of his staging intentions and to discuss his staging decisions with other programmers finally it supports in natural way multiple stages of run time specialization so that dynamically generated code can be used in the generation of yet further specialized code this paper presents an overview of the language with several examples of programs that illustrate key concepts and programming techniques then it discusses the ccam and the compilation of ml programs into ccam code finally the results of some experiments are shown to demonstrate the benefits of this style of run time code generation for some applications
nowadays concepts languages and models for coordination cannot leave aside the needs of the increasing number of commercial applications based on global computing over the inter net for example platforms like microsoft net and sun microsystems java come equipped with packages for supporting ad hoc transactional features which are essential for most business applications we show how to extend the coordination language par excellence viz linda with basic primitives for transactions while retaining formal model for its concurrent computations this is achieved by exploiting variation of petri nets called zero safe nets where transactions can be suitably modelled by distinguishing between stable places ordinary ones and zero places where tokens can only be temporarily allocated defining hidden states the relevance of the transaction mechanism is illustrated in terms of expressive power finally it is shown that stable places and transactions viewed as atomic steps define an abstract semantics that is apt for fully algebraic treatment as demonstrated via categorical adjunctions between suitable categories of nets
we present novel hybrid analysis technology which can efficiently and seamlessly integrate all static and run time analysis of memory references into single framework that is capable of performing all data dependence analysis and can generate necessary information for most associated memory related optimizations we use ha to perform automatic parallelization by extracting run time assertions from any loop and generating appropriate run time tests that range from low cost scalar comparison to full reference by reference run time analysis moreover we can order the run time tests in increasing order of complexity overhead and thus risk the minimum necessary overhead we accomplish this by both extending compile time ip analysis techniques and by incorporating speculative run time techniques when necessary our solution is to bridge free compile time techniques with exhaustive run time techniques through continuum of simple to complex solutions we have implemented our framework in the polaris compiler by introducing an innovative intermediate representation called rtlmad and run time library that can operate on it based on the experimental results obtained to date we hope to automatically parallelize most and possibly all perfect codes significant accomplishment
in this paper we address the issue of deciding when to stop active learning for building labeled training corpus firstly this paper presents new stopping criterion classification change which considers the potential ability of each unlabeled example to change decision boundaries secondly multi criteria based combination strategy is proposed to solve the problem of predefining an appropriate threshold for each confidence based stopping criterion such as max confidence min error and overall uncertainty finally we examine the effectiveness of these stopping criteria on uncertainty sampling and heterogeneous uncertainty sampling for active learning experimental results show that these stopping criteria work well on evaluation data sets and the combination strategies outperform individual criteria
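a minimal python sketch, assuming hypothetical confidence scores from an uncertainty sampling learner, of how the confidence based criteria and the classification change criterion described above could be combined into one stopping decision; all helper names and thresholds are illustrative assumptions, not the paper's implementation

# illustrative sketch: combining active-learning stopping criteria
# confidences: max posterior probability per unlabeled pool example
# prev_labels / curr_labels: pool predictions in two successive rounds
def max_confidence_stop(confidences, threshold=0.9):
    # stop when even the least certain unlabeled example is confidently classified
    return min(confidences) >= threshold

def overall_uncertainty_stop(confidences, threshold=0.1):
    # stop when average uncertainty (1 - confidence) over the pool is small
    avg_uncertainty = sum(1.0 - c for c in confidences) / len(confidences)
    return avg_uncertainty <= threshold

def classification_change_stop(prev_labels, curr_labels):
    # stop when no unlabeled example changes its predicted label between rounds,
    # i.e. newly labeled examples no longer move the decision boundary
    return all(p == c for p, c in zip(prev_labels, curr_labels))

def combined_stop(confidences, prev_labels, curr_labels):
    # multi-criteria combination: boundary is stable and at least one
    # confidence-based criterion fires
    return classification_change_stop(prev_labels, curr_labels) and (
        max_confidence_stop(confidences) or overall_uncertainty_stop(confidences))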
data warehouses dws are currently considered to be the cornerstone of business intelligence bi systems security is key issue in dws since the business information that they manage is crucial and highly sensitive and should be carefully protected however the increasing amount of data available on the web signifies that more and more dw systems are considering the web as the primary data source through which to populate their dws xml is therefore widely accepted as being the principal means through which to provide easier data and metadata interchange among heterogeneous data sources from the web and the dw systems although security issues have been considered during the whole development process of traditional dws current research lacks approaches with which to consider security when the target platform is based on the web and xml technologies the idiosyncrasy of the unstructured and semi structured data available on the web definitely requires particular security rules that are specifically tailored to these systems in order to permit their particularities to be captured correctly in order to tackle this situation in this paper we propose methodological approach based on the model driven architecture mda for the development of secure xml dws we therefore specify set of transformation rules that are able to automatically generate not only the corresponding xml structure of the dw from secure conceptual dw models but also the security rules specified within the dw xml structure thus allowing us to implement both aspects simultaneously case study is provided at the end of the paper to show the benefits of our approach
in today’s data rich networked world people express many aspects of their lives online it is common to segregate different aspects in different places you might write opinionated rants about movies in your blog under pseudonym while participating in forum or web site for scholarly discussion of medical ethics under your real name however it may be possible to link these separate identities because the movies journal articles or authors you mention are from sparse relation space whose properties eg many items related to by only few users allow re identification this re identification violates people’s intentions to separate aspects of their life and can have negative consequences it also may allow other privacy violations such as obtaining stronger identifier like name and address this paper examines this general problem in specific setting re identification of users from public web movie forum in private movie ratings dataset we present three major results first we develop algorithms that can re identify large proportion of public users in sparse relation space second we evaluate whether private dataset owners can protect user privacy by hiding data we show that this requires extensive and undesirable changes to the dataset making it impractical third we evaluate two methods for users in public forum to protect their own privacy suppression and misdirection suppression doesn’t work here either however we show that simple misdirection strategy works well mention few popular items that you haven’t rated
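a hedged python sketch of the sparse relation space intuition above: candidate users in the private ratings dataset are scored by the items they share with a public forum user, with rare items weighted more heavily; the data structures and scoring form are assumptions for illustration, not the paper's algorithm

import math

def reidentify(public_items, private_ratings, item_popularity):
    # public_items: set of items mentioned by the public forum user
    # private_ratings: dict user -> set of items rated in the private dataset
    # item_popularity: dict item -> number of raters (must cover all items used)
    scores = {}
    for user, rated in private_ratings.items():
        overlap = public_items & rated
        # rare items (few raters) carry more identifying information
        scores[user] = sum(1.0 / math.log(1 + item_popularity[i]) for i in overlap)
    # candidates ranked by score, best match first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)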
imagine some program and number of changes if none of these changes is applied yesterday the program works if all changes are applied today the program does not work which change is responsible for the failure this is how the abstract of the paper yesterday my program worked today it does not why started paper which originally published at esec fse introduced the concept of delta debugging one of the most popular automated debugging techniques this year this paper receives the acm sigsoft impact paper award recognizing its influence in the past ten years in my keynote i review the state of debugging then and now i share how it can be hard to be simple what programmers really need and what research should do and should not do to explore these needs and cater to them
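a compact python sketch of the delta debugging idea discussed above: given a set of changes and a test that reports whether a program built with a subset of changes fails, narrow down a small failure-inducing subset; this is a simplified ddmin-style minimization under assumed helper names, not the original algorithm in full

def ddmin(changes, fails):
    # changes: list of change identifiers
    # fails(subset) -> True if the program with exactly these changes applied fails
    n = 2
    while len(changes) >= 2:
        chunk = max(1, len(changes) // n)
        subsets = [changes[i:i + chunk] for i in range(0, len(changes), chunk)]
        reduced = False
        for subset in subsets:
            complement = [c for c in changes if c not in subset]
            if fails(subset):            # failure reproduced by the subset alone
                changes, n, reduced = subset, 2, True
                break
            if fails(complement):        # failure reproduced without the subset
                changes, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(changes):        # finest granularity reached
                break
            n = min(len(changes), n * 2) # increase granularity and retry
    return changes

# example: the failure is caused by change 7 alone
# culprit = ddmin(list(range(1, 9)), lambda s: 7 in s)   # -> [7]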
bittorrent suffers from the free riding problem induced by selfish peers hurting the system robustness existing research studies have focused on the fairness performance and robustness of bittorrent resulting from the tit for tat tft choking algorithm while very few studies have considered the effect of the seed choking algorithm this paper experimentally analyzes the impact of the free riding of selfish peers on bittorrent’s performance and robustness and proposes an activeness based seed choking algorithm where according to the activeness values of request peers which are the ratios of the available download bandwidth to the available upload bandwidth seed preferentially uploads to five request peers with the highest activeness values without any explicit reputation management system our simulation experiments show that compared to existing seed choking algorithms the activeness based seed choking algorithm not only restrains the free riding of selfish peers but also improves the performance of benign peers enhancing bittorrent’s robustness
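a minimal sketch, assuming simple per-peer bandwidth measurements, of the activeness-based seed choking rule described above: the seed ranks requesting peers by activeness (available download bandwidth over available upload bandwidth) and uploads to the five most active ones; the data structures and the zero-bandwidth guard are illustrative choices

def select_unchoked_peers(request_peers, slots=5):
    # request_peers: dict peer_id -> (available_download_bw, available_upload_bw)
    def activeness(peer_id):
        down_bw, up_bw = request_peers[peer_id]
        # guard against missing or zero measurements (illustrative choice)
        return down_bw / up_bw if up_bw > 0 else 0.0
    ranked = sorted(request_peers, key=activeness, reverse=True)
    return ranked[:slots]   # the peers the seed preferentially uploads to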
concurrent object oriented programming coop languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control in particular pure fine grained concurrent object oriented languages as opposed to hybrid or data parallel provide the programmer with simple uniform and flexible model while exposing maximum concurrency while such languages promise to greatly reduce the complexity of large scale concurrent programming the popularity of these languages has been hampered by efficiency which is often many orders of magnitude less than that of comparable sequential code we present sufficiency set of techniques which enables the efficiency of fine grained concurrent object oriented languages to equal that of traditional sequential languages like when the required data is available these techniques are empirically validated by the application to coop implementation of the livermore loops
radio resource management and quality of service qos provision in mobile ad hoc networks manets require the cooperation among different nodes and the design of distributed control mechanisms imposed by the self configuring and dynamic nature of these networks in this context in order to solve the tradeoff between qos provision and efficient resource utilization distributed admission control is required this article presents an adaptive admission procedure based on cross layer qos routing supported by an efficient end to end available bandwidth estimation the proposed scheme has been designed to perform flexible parameter configuration that allows the system response to be adapted to the observed grade of mobility in the environment the performance evaluation has shown the capability of the proposal to guarantee soft qos provision thanks to flexible resource management adapted to different scenarios
tcp is suboptimal in heterogeneous wired wireless networks because it reacts in the same way to losses due to congestion and losses due to link errors in this paper we propose to improve tcp performance in wired wireless networks by endowing it with classifier that can distinguish packet loss causes in contrast to other proposals we do not change tcp’s congestion control nor tcp’s error recovery packet loss whose cause is classified as link error will simply be ignored by tcp’s congestion control and recovered as usual while packet loss classified as congestion loss will trigger both mechanisms as usual to build our classification algorithm database of pre classified losses is gathered by simulating large set of random network conditions and classification models are automatically built from this database by using supervised learning methods several learning algorithms are compared for this task our simulations of different scenarios show that adding such classifier to tcp can improve the throughput of tcp substantially in wired wireless networks without compromising tcp friendliness in both wired and wireless environments
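a hedged python sketch of the supervised loss-cause classifier idea above: per-loss features are gathered from simulations where the true cause is known and a model is trained offline; the feature set and synthetic data below are illustrative stand-ins, not the paper's database or feature definitions

import random
from sklearn.tree import DecisionTreeClassifier

def synthetic_loss_event(cause):
    # toy model: congestion losses tend to follow queue build-up (higher delay),
    # link-error losses occur at arbitrary delay levels
    delay = random.gauss(80, 10) if cause == "congestion" else random.gauss(40, 15)
    jitter = abs(random.gauss(5, 2))
    return [delay, jitter]

random.seed(0)
X, y = [], []
for _ in range(2000):
    cause = random.choice(["congestion", "link_error"])
    X.append(synthetic_loss_event(cause))
    y.append(cause)

clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
# at run time a loss classified as "link_error" would be ignored by congestion
# control but still recovered by retransmission as usual
print(clf.predict([[42.0, 4.0]]))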
content adaptation is an attractive solution for the ever growing desktop based web content delivered to the user via heterogeneous devices in order to provide acceptable experience while surfing the web bridging the mismatch between the rich content and the user device’s resources display processing navigation network bandwidth media support without user intervention requires proactive behavior while content adaptation poses multitude of benefits without proper strategies adaptation will not be truly optimized there have been many projects focused on content adaptation that have been designed with different goals and approaches in this paper we introduce comprehensive classification for content adaptation system the classification is used to group the approaches applied in the implementation of existing content adaptation system survey on some content adaptation systems has also been provided we also present the research spectrum in content adaptation and discuss the challenges
coronary heart disease chd is global epidemic that is the leading cause of death worldwide chd can be detected by measuring and scoring the regional and global motion of the left ventricle lv of the heart this project describes novel automatic technique which can detect the regional wall motion abnormalities of the lv from echocardiograms given sequence of endocardial contours extracted from lv ultrasound images the sequence of contours moving through time can be interpreted as three dimensional surface from the surfaces we compute several geometry based features shape index values curvedness surface normals etc to obtain histogram based similarity functions that are optimally combined using mathematical programming approach to learn kernel function designed to classify normal vs abnormal heart wall motion in contrast with other state of the art methods our formulation also generates sparse kernels kernel sparsity is directly related to the computational cost of the kernel evaluation which is an important factor when designing classifiers that are part of real time system experimental results on set of echocardiograms collected in routine clinical practice at one hospital demonstrate the potential of the proposed approach
agile software development represents major departure from traditional plan based approaches to software engineering systematic review of empirical studies of agile software development up to and including was conducted the search strategy identified studies of which were identified as empirical studies the studies were grouped into four themes introduction and adoption human and social factors perceptions on agile methods and comparative studies the review investigates what is currently known about the benefits and limitations of and the strength of evidence for agile methods implications for research and practice are presented the main implication for research is need for more and better empirical studies of agile software development within common research agenda for the industrial readership the review provides map of findings according to topic that can be compared for relevance to their own settings and situations
we consider the problem of cropping surveillance videos this process chooses trajectory that small sub window can take through the video selecting the most important parts of the video for display on smaller monitor we model the information content of the video simply by whether the image changes at each pixel then we show that we can find the globally optimal trajectory for cropping window by using shortest path algorithm in practice we can speed up this process without affecting the results by stitching together trajectories computed over short intervals this also reduces system latency we then show that we can use second shortest path formulation to find good cuts from one trajectory to another improving coverage of interesting events in the video we describe additional techniques to improve the quality and efficiency of the algorithm and show results on surveillance videos
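a small python sketch of the globally optimal cropping trajectory as a dynamic program over a trellis of window positions: the per-state cost rewards covered image change and a smoothness term penalizes large window moves between frames; for brevity the window only moves horizontally and the cost weights are illustrative assumptions, not the paper's exact formulation

def optimal_crop_trajectory(change_maps, window, move_penalty=1.0):
    # change_maps: list (one per frame) of per-column changed-pixel counts
    # window: width of the cropping window in columns
    n_frames = len(change_maps)
    n_pos = len(change_maps[0]) - window + 1

    def state_cost(frame, pos):
        # negative covered change, so minimizing prefers informative windows
        return -sum(change_maps[frame][pos:pos + window])

    INF = float("inf")
    cost = [[INF] * n_pos for _ in range(n_frames)]
    back = [[0] * n_pos for _ in range(n_frames)]
    for p in range(n_pos):
        cost[0][p] = state_cost(0, p)
    for t in range(1, n_frames):
        for p in range(n_pos):
            for q in range(n_pos):
                c = cost[t - 1][q] + move_penalty * abs(p - q) + state_cost(t, p)
                if c < cost[t][p]:
                    cost[t][p], back[t][p] = c, q
    # backtrack the cheapest trajectory from the best final state
    p = min(range(n_pos), key=lambda i: cost[-1][i])
    path = [p]
    for t in range(n_frames - 1, 0, -1):
        p = back[t][p]
        path.append(p)
    return list(reversed(path))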
we describe an implementation that has users flick notes images audio and video files onto virtual imaginary piles beyond the display of small screen devices multiple sets of piles can be maintained in persistent workspaces two user studies yielded the following participants developed mental schemes to remember virtual pile locations and they successfully reinstated pile locations after several days while situated in varying environments alignment of visual cues on screen with surrounding physical cues in situ accelerated sorting task when compared to other non aligned visual cues the latter however yielded better long term retention
dynamic coalitions enable autonomous domains to achieve common objectives by sharing resources based on negotiated resource sharing agreements major requirement for administering dynamic coalitions is the availability of comprehensive set of access control tools in this paper we discuss the design implementation evaluation and demonstration of such tools in particular we have developed tools for negotiating resource sharing agreements access policy specification access review wholesale and selective distribution and revocation of privileges and policy decision and enforcement
in this paper we propose methodology for detecting abnormal traffic on the net such as worm attacks based on the observation of the behaviours of different elements at the network edges in order to achieve this we suggest set of critical features and we judge normal site status based on these standards for our goal this characterization must be free of virus traffic once this has been set we would be able to find abnormal situations when the observed behaviour set against the same features is significantly different from the previous model we have based our work on netflow information generated by the main routers in the university of zaragoza network with more than hosts the proposed model helps to characterize the whole corporate network sub nets and the individual hosts this methodology has proved its effectiveness in real infections caused by viruses such as spybot agobot etc in accordance with our experimental tests this system would allow the detection of new kinds of worms independently from the vulnerabilities or methods used for their propagation
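an illustrative python sketch of the baseline-versus-observation idea above: a per-host profile of critical features is built from virus-free traffic and a host is flagged when its current feature vector deviates strongly from its own baseline; the feature names, the z-score rule and the requirement of at least two clean samples per host are assumptions for clarity, not the paper's exact characterization

import statistics

def build_baseline(history):
    # history: dict host -> list of feature dicts observed in clean periods
    # (at least two samples per host are assumed here)
    baseline = {}
    for host, samples in history.items():
        baseline[host] = {}
        for feat in samples[0]:
            values = [s[feat] for s in samples]
            baseline[host][feat] = (statistics.mean(values),
                                    statistics.stdev(values) or 1.0)
    return baseline

def is_abnormal(host, observation, baseline, z_threshold=3.0):
    # observation: e.g. distinct destinations contacted, failed connections,
    # flows to unusual ports per time window (illustrative features)
    for feat, value in observation.items():
        mean, std = baseline[host][feat]
        if abs(value - mean) / std > z_threshold:
            return True
    return False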
qos routing in multi channel wireless mesh networks wmns with contention based mac protocols is very challenging problem in this paper we propose an on demand bandwidth constrained routing protocol for multiradio multi rate multi channel wmns with the ieee dcf mac protocol the routing protocol is based on distributed threshold triggered bandwidth estimation scheme implemented at each node for estimating the free to use bandwidth on each associated channel according to the free to use bandwidth at each node the call admission control which is integrated into the routing protocol predicts the residual bandwidth of path with the consideration of inter flow and intra flow interference to select the most efficient path among all feasible ones we propose routing metric which strikes balance between the cost and the bandwidth of the path the simulation results show that our routing protocol can successfully discover paths that meet the end to end bandwidth requirements of flows protect existing flows from qos violations exploit the capacity gain due to multiple channels and incur low message overhead
support vector machine svm is novel pattern classification method that is valuable in many applications kernel parameter setting in the svm training process along with the feature selection significantly affects classification accuracy the objective of this study is to obtain the better parameter values while also finding subset of features that does not degrade the svm classification accuracy this study develops simulated annealing sa approach for parameter determination and feature selection in the svm termed sa svm to measure the proposed sa svm approach several datasets in uci machine learning repository are adopted to calculate the classification accuracy rate the proposed approach was compared with grid search which is conventional method of performing parameter setting and various other methods experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and other approaches the sa svm is thus useful for parameter determination and feature selection in the svm
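a minimal simulated-annealing sketch in python for joint kernel-parameter and feature selection in the spirit of the sa-svm approach above; evaluate stands for a cross-validated svm accuracy on the selected features and is assumed to be supplied by the caller, so only the annealing search loop is shown and all neighbourhood moves and cooling settings are illustrative

import math
import random

def sa_svm(n_features, evaluate, iterations=500, t0=1.0, cooling=0.99):
    # state: (C, gamma, feature mask); evaluate(C, gamma, mask) -> accuracy
    state = (1.0, 0.1, [True] * n_features)
    best = state
    best_acc = curr_acc = evaluate(*state)
    temp = t0
    for _ in range(iterations):
        c, gamma, mask = state
        # neighbour: perturb one kernel parameter or flip one feature bit
        move = random.choice(["c", "gamma", "feature"])
        if move == "c":
            c = max(1e-3, c * random.uniform(0.5, 2.0))
        elif move == "gamma":
            gamma = max(1e-4, gamma * random.uniform(0.5, 2.0))
        else:
            mask = mask[:]
            i = random.randrange(n_features)
            mask[i] = not mask[i]
        cand = (c, gamma, mask)
        acc = evaluate(*cand)
        # accept improvements always, worse solutions with annealing probability
        if acc > curr_acc or random.random() < math.exp((acc - curr_acc) / temp):
            state, curr_acc = cand, acc
            if acc > best_acc:
                best, best_acc = cand, acc
        temp *= cooling
    return best, best_acc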
we present method for generating surface crack patterns that appear in materials such as mud ceramic glaze and glass to model these phenomena we build upon existing physically based methods our algorithm generates cracks from stress field defined heuristically over triangle discretization of the surface the simulation produces cracks by evolving this field over time the user can control the characteristics and appearance of the cracks using set of simple parameters by changing these parameters we have generated examples similar to variety of crack patterns found in the real world we assess the realism of our results by comparison with photographs of real world examples using physically based approach also enables us to generate animations similar to time lapse photography
service level agreements slas define performance guarantees made by service providers eg in terms of packet loss delay delay variation and network availability in this paper we describe new active measurement methodology to accurately monitor whether measured network path characteristics are in compliance with performance targets specified in slas specifically we introduce new methodology for measuring mean delay along path that improves accuracy over existing methodologies and method for obtaining confidence intervals on quantiles of the empirical delay distribution without making any assumption about the true distribution of delay introduce new methodology for measuring delay variation that is more robust than prior techniques describe new methodology for estimating packet loss rate that significantly improves accuracy over existing approaches and extend existing work in network performance tomography to infer lower bounds on the quantiles of distribution of performance measures along an unmeasured path given measurements from subset of paths active measurements for these metrics are unified in discrete time based tool called slam the unified probe stream from slam consumes lower overall bandwidth than if individual streams are used to measure path properties we demonstrate the accuracy and convergence properties of slam in controlled laboratory environment using range of background traffic scenarios and in one and two hop settings and examine its accuracy improvements over existing standard techniques
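a hedged python sketch of a distribution-free confidence interval for a delay quantile using order statistics and the binomial distribution, in the spirit of the sla monitoring methodology above that makes no assumption about the true delay distribution; this is a textbook construction offered for illustration, not slam's exact procedure

import math

def binom_cdf(k, n, p):
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def quantile_confidence_interval(delays, p, alpha=0.05):
    # returns two order statistics bracketing the p-quantile with roughly
    # 1 - alpha coverage (conservative for moderate sample sizes)
    x = sorted(delays)
    n = len(x)
    # lower index: largest l with P(B <= l - 1) <= alpha/2, B ~ Binomial(n, p)
    l = max([i for i in range(1, n + 1) if binom_cdf(i - 1, n, p) <= alpha / 2],
            default=1)
    # upper index: smallest u with P(B >= u) <= alpha/2
    u = min([i for i in range(1, n + 1) if 1 - binom_cdf(i - 1, n, p) <= alpha / 2],
            default=n)
    return x[l - 1], x[u - 1]

# usage: low, high = quantile_confidence_interval(measured_delays, p=0.95)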
this chapter considers the different temporal constructs that appeared in the literature of temporal conceptual models timestamping and evolution constraints and it provides coherent model theoretic formalisation for them it then introduces correct and succinct encoding in subset of first order temporal logic namely dlr_us the description logic dlr extended with the temporal operators since and until at the end results on the complexity of reasoning in temporal conceptual models are presented
virtual worlds have long history that also includes various technologies yet research interest towards them has diverged over the years it seems that nowadays they are in focus again with the evolution of socially oriented and community supporting virtual worlds instead of technical factors human factors and the motivation behind the use are highlighted in this paper we will briefly review the research efforts made in the and derive set of themes that were of interest back then next we will expand the set by arguing for newer themes identified in the latest information systems literature these two sets of themes form basis of research agenda for studying virtual worlds in the next five years the sets also compose dimensional space for further theorizing and developing new virtual worlds or virtual places for work and pleasure hence the paper presents and argues for transition from technology engineering to social engineering
maintaining software systems is becoming more difficult as the size and complexity of software increase one factor that complicates software maintenance is the presence of code clones code clone is code fragment that has identical or similar code fragments to it in the source code code clones are introduced for various reasons such as reusing code by copy and paste if modifying code clone with many similar code fragments we must consider whether to modify each of them especially for large scale software such process is very complicated and expensive in this paper we propose methods of visualizing and featuring code clones to support their understanding in large scale software the methods have been implemented as tool called gemini which has been applied to an open source software system application results show the usefulness and capability of our system
digital archives can best survive failures if they have made several copies of their collections at remote sites in this paper we discuss how autonomous sites can cooperate to provide preservation by trading data we examine the decisions that an archive must make when forming trading networks such as the amount of storage space to provide and the best number of partner sites we also deal with the fact that some sites may be more reliable than others experimental results from data trading simulator illustrate which policies are most reliable our techniques focus on preserving the bits of digital collections other services that focus on other archiving concerns such as preserving meaningful metadata can be built on top of the system we describe here
out of order speculative processors need bookkeeping method to recover from incorrect speculation in recent years several microarchitectures that employ checkpoints have been proposed either extending the reorder buffer or entirely replacing it this work presents an in depth study of checkpointing in checkpoint based microarchitectures from the desired content of checkpoint via implementation trade offs and to checkpoint allocation and release policies major contribution of the article is novel adaptive checkpoint allocation policy that outperforms known policies the adaptive policy controls checkpoint allocation according to dynamic events such as second level cache misses and rollback history it achieves percent and percent speedup for the integer and floating point benchmarks respectively and does not require branch confidence estimator the results show that the proposed adaptive policy achieves most of the potential of an oracle policy whose performance improvement is percent and percent for the integer and floating point benchmarks respectively we exploit known techniques for saving leakage power by adapting and applying them to checkpoint based microarchitectures the proposed applications combine to reduce the leakage power of the register file to about one half of its original value
software systems today are built from collections of interacting components written in different languages at varying levels of abstraction from the machine hardware the ability to integrate certified components from different levels of software architecture is necessary part of the process of developing dependable and secure computing infrastructure in this paper we present prototype system in the context of proof carrying code that allows for the integration of safety proofs derived from high level type system with certified low level memory management runtime library
in this paper we introduce system named argo which provides intelligent advertising made possible from users photo collections based on the intuition that user generated photos imply user interests which are the key for profitable targeted ads the argo system attempts to learn user’s profile from his shared photos and suggests relevant ads accordingly to learn user interest in an offline step hierarchical and efficient topic space is constructed based on the odp ontology which is used later on for bridging the vocabulary gap between ads and photos as well as reducing the effect of noisy photo tags in the online stage the process of argo contains three steps understanding the content and semantics of user’s photos and auto tagging each photo to supplement user submitted tags such tags may not be available learning the user interest given set of photos based on the learnt hierarchical topic space and representing ads in the topic space and matching their topic distributions with the target user interest the top ranked ads are output as the suggested ads two key challenges are tackled during the process the semantic gap between the low level image visual features and the high level user semantics and the vocabulary impedance between photos and ads we conducted series of experiments based on real flickr users and amazon.com products as candidate ads which show the effectiveness of the proposed approach
for an increasing number of modern database applications efficient support of similarity search becomes an important task along with the complexity of the objects such as images molecules and mechanical parts also the complexity of the similarity models increases more and more whereas algorithms that are directly based on indexes work well for simple medium dimensional similarity distance functions they do not meet the efficiency requirements of complex high dimensional and adaptable distance functions the use of multi step query processing strategy is recommended in these cases and our investigations substantiate that the number of candidates which are produced in the filter step and exactly evaluated in the refinement step is fundamental efficiency parameter after revealing the strong performance shortcomings of the state of the art algorithm for nearest neighbor search korn et al we present novel multi step algorithm which is guaranteed to produce the minimum number of candidates experimental evaluations demonstrate the significant performance gain over the previous solution and we observed average improvement factors of up to for the number of candidates and up to for the total runtime
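a compact python sketch of the optimal multi-step nearest-neighbour strategy described above: candidates are consumed in increasing order of a lower-bounding filter distance and refined with the exact expensive distance, and the scan stops as soon as the next filter distance exceeds the current k-th exact distance, which is what keeps the number of refined candidates minimal; the distance functions are assumed to be supplied by the caller and the filter distance must lower-bound the exact one

import heapq

def multi_step_knn(query, objects, filter_dist, exact_dist, k):
    # filter step: rank all objects by the cheap lower-bounding distance
    ranked = sorted(objects, key=lambda o: filter_dist(query, o))
    result = []          # max-heap of (-exact_distance, idx, object), size <= k
    for idx, obj in enumerate(ranked):
        lb = filter_dist(query, obj)
        if len(result) == k and lb > -result[0][0]:
            break        # no remaining candidate can improve the k-th distance
        d = exact_dist(query, obj)       # expensive refinement step
        entry = (-d, idx, obj)           # idx breaks ties between equal distances
        if len(result) < k:
            heapq.heappush(result, entry)
        elif d < -result[0][0]:
            heapq.heapreplace(result, entry)
    return sorted(((-negd, obj) for negd, _, obj in result), key=lambda t: t[0])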
recommender systems are based mainly on collaborative filtering algorithms which only use the ratings given by the users to the products when context is taken into account there might be difficulties when it comes to making recommendations to users who are placed in context other than the usual one since their preferences will not correlate with the preferences of those in the new context in this paper hybrid collaborative filtering model is proposed which provides recommendations based on the context of the travelling users combination of user based collaborative filtering method and semantic based one has been used contextual recommendation may be applied in multiple social networks that are spreading world wide the resulting system has been tested over com good example of social network where context is primary concern
paraphrase patterns are semantically equivalent patterns which are useful in both paraphrase recognition and generation this paper presents pivot approach for extracting paraphrase patterns from bilingual parallel corpora whereby the paraphrase patterns in english are extracted using the patterns in another language as pivots we make use of log linear models for computing the paraphrase likelihood between pattern pairs and exploit feature functions based on maximum likelihood estimation mle lexical weighting lw and monolingual word alignment mwa using the presented method we extract more than million pairs of paraphrase patterns from about million pairs of bilingual parallel sentences the precision of the extracted paraphrase patterns is above experimental results show that the presented method significantly outperforms well known method called discovery of inference rules from text dirt additionally the log linear model with the proposed feature functions is effective the extracted paraphrase patterns are fully analyzed especially we found that the extracted paraphrase patterns can be classified into five types which are useful in multiple natural language processing nlp applications
we introduce structural feasibility into procedural modeling of buildings this allows for more realistic structural models that can be interacted with in physical simulations while existing structural analysis tools focus heavily on providing an analysis of the stress state our proposed method automatically tunes set of designated free parameters to obtain forms that are structurally sound
in an identity based encryption ibe scheme there is key extraction protocol where user submits an identity string to master authority who then returns the corresponding secret key for that identity in this work we describe how this protocol can be performed efficiently and in blind fashion for several known ibe schemes that is user can obtain secret key for an identity without the master authority learning anything about this identity we formalize this notion as blind ibe and discuss its many practical applications in particular we build upon the recent work of camenisch neven and shelat to construct oblivious transfer ot schemes which achieve full simulatability for both sender and receiver ot constructions with comparable efficiency prior to camenisch et al were proven secure in the weaker half simulation model our ot schemes are constructed from the blind ibe schemes we propose which require only static complexity assumptions eg dbdh whereas prior comparable schemes require dynamic assumptions eg pddh
we introduce volumetric space time technique for the reconstruction of moving and deforming objects from point data the output of our method is four dimensional space time solid made up of spatial slices each of which is three dimensional solid bounded by watertight manifold the motion of the object is described as an incompressible flow of material through time we optimize the flow so that the distance material moves from one time frame to the next is bounded the density of material remains constant and the object remains compact this formulation overcomes deficiencies in the acquired data such as persistent occlusions errors and missing frames we demonstrate the performance of our flow based technique by reconstructing coherent sequences of watertight models from incomplete scanner data
in information systems engineering conceptual models are constructed to assess existing information systems and work out requirements for new ones as these models serve as means for communication between customers and developers it is paramount that both parties understand the models as well as that the models form proper basis for the subsequent design and implementation of the systems new case environments are now experimenting with formal modeling languages and various techniques for validating conceptual models though it seems difficult to come up with technique that handles the linguistic barriers between the parties involved in satisfactory manner in this article we discuss the theoretical basis of an explanation component implemented for the ppp case environment this component integrates other validation techniques and provides very flexible natural language interface to complex model information it describes properties of the modeling language and the conceptual models in terms familiar to users and the explanations can be combined with graphical model views when models are executed it can justify requested inputs and explain computed outputs by relating trace information to properties of the models
sensor networks naturally apply to broad range of applications that involve system monitoring and information tracking eg airport security infrastructure monitoring of children in metropolitan areas product transition in warehouse networks fine grained weather environmental measurements etc meanwhile there are considerable performance deficiencies in applying existing sensornets in the applications that have stringent requirements for efficient mechanisms for querying sensor data and delivering the query result the amount of data collected from all relevant sensors may be quite large and will require high data transmission rates to satisfy time constraints it implies that excessive packet collisions can lead to packet losses and retransmissions resulting in significant energy costs and latency in this paper we provide formal consideration of data transmission algebra dta that supports application driven data interrogation patterns and optimization across multiple network layers we use logical framework to specify dta semantics and to prove its soundness and completeness further we prove that dta query execution schedules have the key property of being collision free finally we describe and evaluate an algebraic query optimizer performing collision aware query scheduling that both improve the response time and reduce the energy consumption
caterpillar expressions have been introduced by brüggemann klein and wood for applications in markup languages caterpillar expression can be implemented as tree walking automaton operating on unranked trees here we give formal definition of determinism of caterpillar expressions that is based on the language of instruction sequences defined by the expression we show that determinism of caterpillar expressions can be decided in polynomial time
we present new disk scheduling framework to address the needs of shared multimedia service that provides differentiated multilevel quality of service for mixed media workloads in such shared service requests from different users have different associated performance objectives and utilities in accordance with the negotiated service level agreements slas service providers typically provision resources only for average workload intensity so it becomes important to handle workload surges in way that maximizes the utility of the served requests we capture the performance objectives and utilities associated with these multiclass diverse workloads in unified framework and formulate the disk scheduling problem as reward maximization problem we map the reward maximization problem to minimization problem on graphs and by novel use of graph theoretic techniques design scheduling algorithm that is computationally efficient and optimal in the class of seek optimizing algorithms comprehensive experimental studies demonstrate that the proposed algorithm outperforms other disk schedulers under all loads with the performance improvement approaching percent under certain high load conditions in contrast to existing schedulers the proposed scheduler is extensible to new performance objectives workload type and utilities by simply altering the reward functions associated with the requests
the traffic matrix tm is one of the crucial inputs for many network management and traffic engineering tasks as it is usually impossible to directly measure traffic matrices it becomes an important research topic to infer them by modeling incorporating measurable data and additional information many estimation methods have been proposed so far but most of them are not sufficiently accurate or efficient researchers are therefore making efforts to seek better estimation methods of the proposed methods the kalman filtering method is very efficient and accurate method however the error covariance calculation components of kalman filtering are difficult to implement in realistic network systems due to the existence of ill conditioning problems in this paper we proposed square root kalman filtering traffic matrix estimation srkftme algorithm based on matrix decomposition to improve the kalman filtering method the srkftme algorithm makes use of the evolution equations of forecast and analysis error covariance square roots in this way the srkftme algorithm can ensure the positive definiteness of the error covariance matrices which can solve some ill conditioning problems also square root kalman filtering will be less affected by numerical problems simulation and actual traffic testing results show superior accuracy and stability of srkftme algorithm compared with prior kalman filtering methods
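a hedged numpy sketch of the square-root idea behind the srkftme algorithm described above: instead of propagating the error covariance directly, a square-root factor S with P = S S^T is propagated via a QR decomposition, which keeps the covariance positive semidefinite by construction; this is the generic square-root kalman time update, not the paper's full traffic matrix estimator

import numpy as np

def sqrt_kf_predict(x, S, F, Q_sqrt):
    # x: state estimate, S: covariance square root (P = S @ S.T)
    # F: state transition matrix, Q_sqrt: square root of the process noise cov
    x_pred = F @ x
    # P_pred = F P F^T + Q = [F S, Q_sqrt] [F S, Q_sqrt]^T
    M = np.hstack((F @ S, Q_sqrt))
    # QR of M^T gives M M^T = R^T R, so R^T is a valid square root of P_pred
    _, R = np.linalg.qr(M.T)
    return x_pred, R.T

# usage on a toy 2-state model with illustrative dimensions
# x, S = np.zeros(2), np.eye(2)
# F, Q_sqrt = np.eye(2), 0.1 * np.eye(2)
# x, S = sqrt_kf_predict(x, S, F, Q_sqrt)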
the problem of interestingness of discovered rules has been investigated by many researchers the issue is that data mining algorithms often generate too many rules which make it very hard for the user to find the interesting ones over the years many techniques have been proposed however few have made it to real life applications since august we have been working on major application for motorola the objective is to find causes of cellular phone call failures from large amount of usage log data class association rules have been shown to be suitable for this type of diagnostic data mining application we were also able to put several existing interestingness methods to the test which revealed some major shortcomings one of the main problems is that most existing methods treat rules individually however we discovered that users seldom regard single rule to be interesting by itself rule is only interesting in the context of some other rules furthermore in many cases each individual rule may not be interesting but group of them together can represent an important piece of knowledge this led us to discover deficiency of the current rule mining paradigm using non zero minimum support and non zero minimum confidence eliminates large amount of context information which makes rule analysis difficult this paper proposes novel approach to deal with all of these issues which casts rule analysis as olap operations and general impression mining this approach enables the user to explore the knowledge space to find useful knowledge easily and systematically it also provides natural framework for visualization as an evidence of its effectiveness our system called opportunity map based on these ideas has been deployed and it is in daily use in motorola for finding actionable knowledge from its engineering and other types of data sets
increasing cache capacity via compression enables designers to improve performance of existing designs for small incremental cost further leveraging the large die area invested in last level caches this paper explores the compressed cache design space with focus on implementation feasibility our compression schemes use companion line pairs cache lines whose addresses differ by single bit as candidates for compression we propose two novel compressed cache organizations the companion bit remapped cache and the pseudoassociative cache our cache organizations use fixed width physical cache line implementation while providing variable length logical cache line organization without changing the number of sets or ways and with minimal increase in state per tag we evaluate banked and pairwise schemes as two alternatives for storing compressed companion pairs within physical cache line we evaluate companion line prefetching clp simple yet effective prefetching mechanism that works in conjunction with our compression scheme clp is nearly pollution free since it only prefetches lines that are compression candidates using detailed cycle accurate ia simulator we measure the performance of several third level compressed cache designs simulating representative collection of workloads our experiments show that our cache compression designs improve ipc for all cache sensitive workloads even those with modest data compressibility the pairwise pseudo associative compressed cache organization with companion line prefetching is the best configuration providing mean ipc improvement of for cache sensitive workloads and best case ipc improvement of finally our cache designs exhibit negligible overall ipc degradation for cache insensitive workloads
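a tiny python sketch of the companion-line notion used above: two cache lines are companions when their block addresses differ in exactly one designated bit, so a compressed physical line can hold the pair; the line size, the chosen bit position and the address layout are illustrative assumptions, not the paper's parameters

LINE_OFFSET_BITS = 6      # 64-byte lines (assumed)
COMPANION_BIT = 0         # which block-address bit distinguishes companions (assumed)

def block_address(addr):
    return addr >> LINE_OFFSET_BITS

def companion_block(addr):
    # flip the designated bit of the block address to get the companion line
    return block_address(addr) ^ (1 << COMPANION_BIT)

def are_companions(addr_a, addr_b):
    return block_address(addr_a) ^ block_address(addr_b) == 1 << COMPANION_BIT

# example: with 64-byte lines, addresses 0x1000 and 0x1040 map to companion lines
# assert are_companions(0x1000, 0x1040)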
grid computing utilizes the distributed heterogeneous resources in order to support complicated computing problems grid can be classified into two types computing grid and data grid job scheduling in computing grid is very important problem to utilize grids efficiently we need good job scheduling algorithm to assign jobs to resources in grids in the natural environment the ants have tremendous ability to team up to find an optimal path to food resources an ant algorithm simulates the behavior of ants in this paper we propose balanced ant colony optimization baco algorithm for job scheduling in the grid environment the main contributions of our work are to balance the entire system load while trying to minimize the makespan of given set of jobs compared with the other job scheduling algorithms baco can outperform them according to the experimental results
in this paper we present firewxnet multi tiered portable wireless system for monitoring weather conditions in rugged wildland fire environments firewxnet provides the fire fighting community the ability to safely and easily measure and view fire and weather conditions over wide range of locations and elevations within forest fires this previously unattainable information allows fire behavior analysts to better predict fire behavior heightening safety considerations our system uses tiered structure beginning with directional radios to stretch deployment capabilities into the wilderness far beyond current infrastructures at the end point of our system we designed and integrated multi hop sensor network to provide environmental data we also integrated web enabled surveillance cameras to provide visual data this paper describes week long full system deployment utilizing sensor networks and web cams in the selway salmon complex fires of we perform an analysis of system performance and present observations and lessons gained from our deployment
prior works in communication security policy have focused on general purpose policy languages and evaluation algorithms however because the supporting frameworks often defer enforcement the correctness of realization of these policies in software is limited by the quality of domain specific implementations this paper introduces the antigone communication security policy enforcement framework the antigone framework fills the gap between representations and enforcement by implementing and integrating the diverse security services needed by policy policy is enforced by the run time composition configuration and regulation of security services we present the antigone architecture and demonstrate non trivial applications and policies profile of policy enforcement performance is developed and key architectural enhancements identified we also consider the advantages and disadvantages of alternative software architectures appropriate for policy enforcement
the increasing importance of unstructured knowledge intensive processes in enterprises is largely recognized conventional workflow solutions do not provide adequate support for the management and optimization of such processes therefore the need for more flexible approaches arises this paper presents conceptual framework for unobtrusive support of unstructured knowledge intensive business processes the framework enables modeling exchange and reuse of light weight user defined task structures in addition to the person to person exchange of best practices it further enables outsourcing of dynamic task structures and resources in personal workspaces and organizational units where these are managed according to local domain knowledge and made available for reuse in shared repositories the delegation of tasks enables the generation of enterprise process chains spreading beyond the boundaries of user’s personal workspace the structures emerging from user defined tasks task delegations and on demand acquisition of dynamic externally managed tasks and resources adequately represent agile human centric business processes thereby the framework facilitates effective knowledge management and fosters proactive tailoring of underspecified business processes through end users in light weight unobtrusive manner the presented concepts are supported within the collaborative task manager ctm novel prototype for email integrated task management
the efficient distributed construction of maximal independent set mis of graph is of fundamental importance we study the problem in the class of growth bounded graphs which includes for example the well known unit disk graphs in contrast to the fastest time optimal existing approach we assume that no geometric information eg distances in the graph’s embedding is given instead nodes employ randomization for their decisions our algorithm computes mis in o(log log n log* n) rounds with very high probability for graphs with bounded growth where n denotes the number of nodes in the graph in view of linial’s omega(log* n) lower bound for computing mis in ring networks which was extended to randomized algorithms independently by naor and linial our solution is close to optimal in nutshell our algorithm shows that for computing mis randomization is viable alternative to distance information
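a generic randomized mis sketch in python (luby style) to make the setting above concrete: in synchronous rounds every remaining node draws a random value and joins the mis if its value is a local minimum among remaining neighbours, after which it and its neighbours drop out; the paper's algorithm for growth bounded graphs is more refined, this only illustrates the basic randomized idea

import random

def randomized_mis(adjacency):
    # adjacency: dict node -> set of neighbouring nodes (undirected graph)
    active = set(adjacency)
    mis = set()
    while active:
        r = {v: random.random() for v in active}
        # a node joins the mis if its random value beats all active neighbours
        winners = {v for v in active
                   if all(r[v] < r[u] for u in adjacency[v] if u in active)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adjacency[v] & active   # neighbours of mis nodes drop out
        active -= removed
    return mis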
we identify and present major interaction approach for tangible user interfaces based upon systems of tokens and constraints in these interfaces tokens are discrete physical objects which represent digital information constraints are confining regions that are mapped to digital operations these are frequently embodied as structures that mechanically channel how tokens can be manipulated often limiting their movement to single degree of freedom placing and manipulating tokens within systems of constraints can be used to invoke and control variety of computational interpretationswe discuss the properties of the token constraint approach consider strengths that distinguish them from other interface approaches and illustrate the concept with eleven past and recent supporting systems we present some of the conceptual background supporting these interfaces and consider them in terms of bellotti et al five questions for sensing based interaction we believe this discussion supports token constraint systems as powerful and promising approach for sensing based interaction
the ability to predict the time required to repair software defects is important for both software quality management and maintenance estimated repair times can be used to improve the reliability and time to market of software under development this paper presents an empirical approach to predicting defect repair times by constructing models that use well established machine learning algorithms and defect data from past software defect reports we describe as case study the analysis of defect reports collected during the development of large medical software system our predictive models give accuracies as high as despite the limitations of the available data we present the proposed methodology along with detailed experimental results which include comparisons with other analytical modeling approaches
fine grained lock protocols should allow for highly concurrent transaction processing on xml document trees which is addressed by the tadom lock protocol family enabling specific lock modes and lock granules adjusted to the various xml processing models we have already proved its operational flexibility and performance superiority when compared to competitor protocols here we outline our experiences gained during the implementation and optimization of these protocols we figure out their performance drivers to maximize throughput while keeping the response times at an acceptable level and perfectly exploiting the advantages of our tailor made lock protocols for xml trees because we have implemented all options and alternatives in our prototype system xtc benchmark runs for all drivers allow for comparisons in identical environments and illustrate the benefit of all implementation decisions finally they reveal that careful lock protocol optimization pays off
agent based simulations have proven to be suitable to investigate many kinds of problems especially in the field of social science but to provide useful insights the behaviour of the involved simulated actors needs to reflect relevant features of the real world in this paper we address one particular aspect in this regard namely the correct reflection of an actor’s evolution during simulation very often some knowledge exists about how an actor can evolve for example the typical development stages of entrepreneurs when investigating entrepreneurship networks we propose to model this knowledge explicitly using evolution links between roles enriched with suitable conditions and extend an agent and goal oriented modelling framework thereby we provide mapping to the simulation environment congolog that serves as an intermediary approach between not providing change of behaviour at all and very open approaches to behaviour adaptation such as learning
the continuous growth in the size and use of the world wide web imposes new methods of design and development of online information services the need for predicting the users needs in order to improve the usability and user retention of web site is more than evident and can be addressed by personalizing it recommendation algorithms aim at proposing “next” pages to users based on their current visit and past users navigational patterns in the vast majority of related algorithms however only the usage data is used to produce recommendations disregarding the structural properties of the web graph thus pages that are important in terms of pagerank authority score may be underrated in this work we present upr pagerank style algorithm which combines usage data and link analysis techniques for assigning probabilities to web pages based on their importance in the web site’s navigational graph we propose the application of localized version of upr upr to personalized navigational subgraphs for online web page ranking and recommendation moreover we propose hybrid probabilistic predictive model based on markov models and link analysis for assigning prior probabilities in hybrid probabilistic model we prove through experimentation that this approach results in more objective and representative predictions than the ones produced from the pure usage based approaches
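a python sketch of a pagerank-style ranking with a usage-based prior, in the spirit of upr as described above: the teleportation vector comes from observed page visit frequencies instead of the uniform distribution, so link analysis and usage data are combined; the damping factor, iteration count and dangling-page handling are conventional choices, not the paper's exact parameterization

def usage_weighted_pagerank(out_links, visit_counts, d=0.85, iters=50):
    # out_links: dict page -> list of linked pages (the site's navigational graph)
    # visit_counts: dict page -> number of visits observed in the usage logs
    pages = list(out_links)
    total_visits = sum(visit_counts.get(p, 0) for p in pages)
    if total_visits > 0:
        prior = {p: visit_counts.get(p, 0) / total_visits for p in pages}
    else:
        prior = {p: 1.0 / len(pages) for p in pages}   # fallback: uniform prior
    rank = dict(prior)
    for _ in range(iters):
        new_rank = {p: (1 - d) * prior[p] for p in pages}
        for p in pages:
            targets = [t for t in out_links[p] if t in new_rank]
            if not targets:
                # dangling page: redistribute its rank according to the prior
                for t in pages:
                    new_rank[t] += d * rank[p] * prior[t]
            else:
                share = d * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank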
code size and energy consumption are critical design concerns for embedded processors as they determine the cost of the overall system techniques such as reduced length instruction sets lead to significant code size savings but also introduce performance and energy consumption impediments such as additional dynamic instructions or decompression latency in this paper we show that block aware instruction set bliss which stores basic block descriptors in addition to and separately from the actual instructions in the program allows embedded processors to achieve significant improvements in all three metrics reduced code size and improved performance and lower energy consumption
references between objects in loosely coupled distributed information systems pose problem on the one hand one tries to avoid referential inconsistencies like eg dangling links in the www on the other hand using strict constraints as in databases may restrict the data providers severely we present the solution to this problem that we developed for the nexus system the approach tolerates referential inconsistencies in the data while providing consistent query answers to users for traversing references we present concept based on return references this concept is especially suitable for infrequent object migrations and provides good query performance for scenarios where object migrations are frequent we developed an alternative concept based on distributed hash table
representations are at work in it technology as plans of and for work they enable cooperation coordination accountability and control which have to be balanced off against each other the article describes standard developed for electronic health records ehr and the results of test of prototype built on that standard at department of internal medicine in it is argued that the prototype did not support clinical work which is attributed to the model of work embedded in the standard called basic structure for ehr behr the article concludes by calling for critical conceptualizations of the relations between representation work and knowledge production
we study envy free mechanisms for scheduling tasks on unrelated machines agents that approximately minimize the makespan for indivisible tasks we put forward an envy free poly time mechanism that approximates the minimal makespan to within factor of o(log m) where m is the number of machines we also show lower bound of omega(log m / log log m) this improves the recent result of mu’alem who give an upper bound of and lower bound of for divisible tasks we show that there always exists an envy free poly time mechanism with optimal makespan finally we demonstrate how our mechanism for envy free makespan minimization can be interpreted as market clearing problem
the goal of this research is to organize maps mined from journal articles into categories for hierarchical browsing within region time and theme facets map training set collected manually was used to develop classifiers metadata pertinent to the maps were harvested and then run separately though knowledge sources and our classifiers for region time and theme evaluation of the system based on map test set of unseen maps showed classification accuracy when compared with two human classifications for the same maps data mining and semantic analysis methods used here could support systems that index other types of article components such as diagrams or charts by region time and theme
very challenging issue for optimizing compilers is the phase ordering problem in what order should collection of compiler optimizations be performed we address this problem in the context of optimizing sequence of tensor contractions the pertinent loop transformations are loop permutation tiling and fusion in addition the placement of disk statements crucially affects performance the space of possible combinations is exponentially large we develop novel pruning strategies whereby search problem in larger space is replaced by large number of searches in much smaller space to determine the optimal permutation fusion tiling and placement of disk statements experimental results show that we obtain an improvement in cost by factor of up to over an equi tile size approach
in concurrent programming non blocking synchronization is very efficient but difficult to design correctly this paper presents static analysis to show that code blocks are atomic ie that every execution of the program is equivalent to one in which those code blocks execute without interruption by other threads our analysis determines commutativity of operations based primarily on how synchronization primitives including locks load linked store conditional and compare and swap are used reduction theorem states that certain patterns of commutativity imply atomicity atomicity is itself an important correctness requirement for many concurrent programs furthermore an atomic code block can be treated as single transition during subsequent analysis of the program this can greatly improve the efficiency of the subsequent analysis we demonstrate the effectiveness of our approach on several concurrent non blocking programs
the continuous development of wireless networks and mobile devices has motivated an intense research in mobile data services some of these services provide the user with context aware information specifically location based services and location dependent queries have attracted lot of interest in this article the existing literature in the field of location dependent query processing is reviewed the technological context mobile computing and support middleware such as moving object databases and data stream technology are described location based services and location dependent queries are defined and classified and different query processing approaches are reviewed and compared
we present an algorithm for efficient depth calculations and view synthesis the main goal is the on line generation of realistic interpolated views of dynamic scene the inputs are video streams originating from two or more calibrated static cameras efficiency is accomplished by the parallel use of the cpu and the gpu in multi threaded implementation the input images are projected on plane sweeping through space using the hardware accelerated transformations available on the gpu correlation measure is calculated simultaneously for all pixels on the plane and is compared at the different plane positions noisy virtual view and crude depth map result in very limited time we apply min cut max flow algorithm on graph implemented on the cpu to ameliorate this result by global optimisation
the both as view bav approach to data integration has the advantage of specifying mappings between schemas in bidirectional manner so that once bav mapping has been established between two schemas queries may be exchanged in either direction between the schemas in this paper we discuss the reformulation of queries over bav transformation pathways and demonstrate the use of this reformulation in two modes of query processing in the first mode public schemas are shared between peers and queries posed on the public schema can be reformulated into queries over any data sources that have been mapped to the public schema in the second queries are posed on the schema of data source and are reformulated into queries on another data source via any public schema to which both data sources have been mapped
people have developed variety of conventions for negotiating face to face interruptions the physical distribution of teams however together with the use of computer mediated communication and awareness systems fundamentally alters what information is available to person considering an interruption of remote collaborator this paper presents detailed comparison between self reports of interruptibility collected from participants over extended periods in their actual work environment and estimates of this interruptibility provided by second set of participants based on audio and video recordings our results identify activities and environmental cues that affect participants ability to correctly estimate interruptibility we show for example that closed office door had significant effect on observers estimation of interruptibility but did not have an effect on participants reports of their own interruptibility we discuss our findings and their importance for successful design of computer mediated communication and awareness systems
the design of new indexes has been driven by many factors such as data types operations and application environment the increasing demand for database systems to support new applications such as online analytical processing olap spatial databases and temporal databases has continued to fuel the development of new indexes in this paper we summarize the major considerations in developing new indexes paying particular attention to progress made in the design of indexes for spatial temporal databases and object oriented databases oodb our discussion focuses on the general concepts or features of these indexes thus presenting the building blocks for meeting the challenges of designing new indexes for novel applications to be encountered in the future
embodied interaction has been claimed to offer important advantages for learning programming however frequently claims have been based on intuitions and work in the area has focused largely around system building rather than on evaluation and reflection around those claims taking into account research in the area as well as in areas such as tangibles psychology of programming and the learning and teaching of programming this paper identifies set of important factors to take into account when analysing the potential of learning environments for programming employing embodied interaction these factors are formulated as set of questions that could be asked either when designing or analysing this type of learning environments
the relation between datalog programs and homomorphism problems and between datalog programs and bounded treewidth structures has been recognized for some time and given much attention recently additionally the essential role of persistent variables of program expansions in solving several relevant problems has also started to be observed it turns out that to understand the contribution of these persistent variables to the difficulty of some expressibility problems we need to understand the interrelationship among different notions of persistency numbers some of which we introduce and or formalize in the present work this article is first foundational study of the various persistency numbers and their interrelationships to prove the relations among these persistency numbers we had to develop some nontrivial technical tools that promise to help in proving other interesting results too more precisely we define the adorned dependency graph of program useful tool for visualizing sets of persistent variables and we define automata that recognize persistent sets in expansions we start by elaborating on finer definitions of expansions and queries which capture aspects of homomorphism problems on bounded treewidth structures the main results of this article are program transformation technique based on automata theoretic tools which manipulates persistent variables leading in certain cases to programs of fewer persistent variables categorization of the different roles of persistent variables this is done by defining four notions of persistency numbers which capture the propagation of persistent variables from syntactical level to semantical one decidability results concerning the syntactical notions of persistency numbers that we have defined and the exhibition of new classes of programs for which boundedness is undecidable
this paper presents two embedded feature selection algorithms for linear chain crfs named gfsalcrf and pgfsalcrf gfsalcrf iteratively selects one feature at a time choosing the feature whose incorporation into the crf most improves the conditional log likelihood for time efficiency only the weight of the new feature is optimized to maximize the log likelihood instead of all weights of features in the crf the process is iterated until incorporating new features into the crf cannot improve the log likelihood of the crf noticeably pgfsalcrf adopts pseudo likelihood as evaluation criterion to iteratively select features to improve the speed of gfsalcrf furthermore it scans all candidate features and forms small feature set containing some promising features at certain iterations then the small feature set will be used by subsequent iterations to further improve the speed experiments on two real world problems show that crfs with significantly fewer features selected by our algorithms achieve competitive performance while obtaining significantly shorter testing time
today’s pre college students have been immersed in social media systems sms that mediate their everyday interactions before students arrive at college they are typically using blogs wikis forums social connection systems digital asset sharing systems and even community game systems to stay connected when students reach college their social networks change in both their function and structure institutional emphasis is placed upon course ware course management systems cms to augment lecture classroom and discussion section experiences while cms may share similarities with their favorite sms students do not always experience the same level of social engagement from them as they do with the tools they use outside of the academic experience this paper examines how students perceive sms examines what students consider sms and addresses feature differences between sms and cms mechanisms
the aim of this research is to develop an in depth understanding of the dynamics of online group interaction and the relationship between the participation in an online community and an individual’s off line life the half year study of thriving online health support community bob’s acl wwwboard used broad fieldwork approach guided by the ethnographic research techniques of observation interviewing and archival research in combination with analysis of the group’s dynamics during one week period research tools from the social sciences were used to develop thick rich description of the group the significant findings of this study include dependable and reliable technology is more important than state of the art technology in this community strong community development exists despite little differentiation of the community space provided by the software members reported that participation in the community positively influenced their offline lives strong group norms of support and reciprocity made externally driven governance unnecessary tools used to assess group dynamics in face to face groups provide meaningful information about online group dynamics and membership patterns in the community and strong subgroups actively contributed to the community’s stability and vitality
it has become increasingly important to be able to generate free form shapes in commercial applications using rapid prototyping technologies in many cases the shapes of interest are taken from real world objects that do not have pre existing computer models constructing an accurate model for these objects by hand is extremely time consuming and difficult with even the latest software packages to aid in the modeling process scanners are used to capture the object shape and generate high resolution model of the object however these models built from scans often have irregularities that prevent the construction of useful prototype this paper proposes method for generating models suitable for rapid prototyping from measurements of real world objects taken by scanner this is accomplished by taking cloud of point data as input and fitting closed surface to the data in such way as to ensure accuracy in the representation of the object surface and compatibility with rapid prototyping machine we treat surface modeling and adaptation to the data in new framework as stochastic surface estimation
scalable overlay networks such as chord can pastry and tapestry have recently emerged as flexible infrastructure for building large peer to peer systems in practice such systems have two disadvantages they provide no control over where data is stored and no guarantee that routing paths remain within an administrative domain whenever possible skipnet is scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data primarily by string names skipnet allows for both fine grained and coarse grained control over data placement content can be placed either on pre determined node or distributed uniformly across the nodes of hierarchical naming sub tree an additional useful consequence of skipnet’s locality properties is that partition failures in which an entire organization disconnects from the rest of the system can result in two disjoint but well connected overlay networks
the traditional setting of supervised learning requires large amount of labeled training examples in order to achieve good generalization however in many practical applications unlabeled training examples are readily available but labeled ones are fairly expensive to obtain therefore semi supervised learning has attracted much attention previous research on semi supervised learning mainly focuses on semi supervised classification although regression is almost as important as classification semi supervised regression is largely understudied in particular although co training is main paradigm in semi supervised learning few works have been devoted to co training style semi supervised regression algorithms in this paper co training style semi supervised regression algorithm ie coreg is proposed this algorithm uses two regressors each labels the unlabeled data for the other regressor where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean square error over the labeled neighborhood of that example analysis and experiments show that coreg can effectively exploit unlabeled data to improve regression estimates
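a minimal sketch of this co-training style regression loop is given below, using two knn regressors that differ in their distance metric; the confidence measure mirrors the description above (reduction in mean square error over the labeled neighborhood), while the regressor settings, the single-example selection and all names are illustrative assumptions rather than the reference implementation

    # hedged sketch of a coreg-style selection step (assumed parameters, not the authors' code)
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def coreg_selection_round(X_lab, y_lab, X_unlab, k=3):
        """each regressor nominates the unlabeled point whose pseudo-label most reduces
        the mse over its labeled neighborhood; in the full algorithm that point would be
        added to the other regressor's labeled set and the loop repeated."""
        r1 = KNeighborsRegressor(n_neighbors=k, p=2).fit(X_lab, y_lab)   # euclidean metric
        r2 = KNeighborsRegressor(n_neighbors=k, p=1).fit(X_lab, y_lab)   # manhattan metric
        nominations = []
        for reg in (r1, r2):
            gains = []
            for x in X_unlab:
                y_hat = reg.predict(x[None])[0]
                nbrs = reg.kneighbors(x[None], return_distance=False)[0]
                mse_before = np.mean((y_lab[nbrs] - reg.predict(X_lab[nbrs])) ** 2)
                retrained = KNeighborsRegressor(n_neighbors=k, p=reg.p).fit(
                    np.vstack([X_lab, x[None]]), np.append(y_lab, y_hat))
                mse_after = np.mean((y_lab[nbrs] - retrained.predict(X_lab[nbrs])) ** 2)
                gains.append(mse_before - mse_after)                     # confidence estimate
            j = int(np.argmax(gains))
            nominations.append((j, reg.predict(X_unlab[j][None])[0]))
        return nominations    # [(index, pseudo-label) from regressor 1, from regressor 2]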
when navigating in real physical environments as human beings we tend to display systematic or near systematic errors with distance direction and other navigation issues to avoid making these errors we choose different strategies to find our way while there have been lot of hci studies of navigation design guidelines for using maps or speech based or tactile based guidance in mobile devices in this paper we introduce an initial study of multimodal navigation design utilising the design practice of episodes of motion originated from urban planning the implications of designing cues and providing rhythm as the design guidelines of episodes of motions suggest are explored in this study with the subjects being pedestrians with wayfinding tasks in an urban area the main contributions of this paper are in evaluating the design implications in the context of mobile wayfinding tasks and in reflecting the results according to human wayfinding behaviour it is concluded that by designing predictive clues and rhythm into mobile multimodal navigation applications we can improve navigation aids for users
the impact of interruptions on workflow and productivity has been extensively studied in the pc domain but while fragmented user attention is recognized as an inherent aspect of mobile phone usage little formal evidence exists of its effect on mobile productivity using survey and screenshot based diary study we investigated the types of barriers people face when performing tasks on their mobile phones the ways they follow up with such suspended tasks and how frustrating the experience of task disruption is for mobile users from situated samples provided by iphone and pocket pc users we distill classification of barriers to the completion of mobile tasks our data suggest that moving to pc to complete phone task is common yet not inherently problematic depending on the task finally we relate our findings to prior design guidelines for desktop workflow and discuss how the guidelines can be extended to mitigate disruptions to mobile taskflow
the reeb graph tracks topology changes in level sets of scalar function and finds applications in scientific visualization and geometric modeling we describe an algorithm that constructs the reeb graph of morse function defined on manifold our algorithm maintains connected components of the two dimensional level sets as dynamic graph and constructs the reeb graph in O(n log n + n log g log log g) time where n is the number of triangles in the tetrahedral mesh representing the manifold and g is the maximum genus over all level sets of the function we extend this algorithm to construct reeb graphs of manifolds in O(n log n log log n) time where n is the number of triangles in the simplicial complex that represents the manifold our result is significant improvement over the previously known algorithm finally we present experimental results of our implementation and demonstrate that our algorithm for manifolds performs efficiently in practice
microarchitecture is described that achieves high performance on conventional single threaded program codes without compiler assistance to obtain high instructions per clock ipc for inherently sequential eg specint programs large number of instructions must be in flight simultaneously however several problems are associated with such microarchitectures including scalability issues related to control flow and memory latency our design investigates how to utilize large mesh of processing elements in order to execute single threaded program we present basic overview of our microarchitecture and discuss how it addresses scalability as we attempt to execute many instructions in parallel the microarchitecture makes use of control and value speculative execution multipath execution and high degree of out of order execution to help extract instruction level parallelism execution time predication and time tags for operands are used for maintaining program order we provide simulation results for several geometries of our microarchitecture illustrating range of design tradeoffs results are also presented that show the small performance impact over range of memory system latencies
given large set of data common data mining problem is to extract the frequent patterns occurring in this set the idea presented in this paper is to extract condensed representation of the frequent patterns called disjunction bordered condensation dbc instead of extracting the whole frequent pattern collection we show that this condensed representation can be used to regenerate all frequent patterns and their exact frequencies moreover this regeneration can be performed without any access to the original data practical experiments show that the dbc can be extracted very efficiently even in difficult cases and that this extraction and the regeneration of the frequent patterns is much more efficient than the direct extraction of the frequent patterns themselves we compared the dbc with another representation of frequent patterns previously investigated in the literature called frequent closed sets in nearly all experiments we have run the dbc has been extracted much more efficiently than frequent closed sets in the other cases the extraction times are very close
adaptive personalization where the system adapts the interface to user’s needs has the potential for significant performance benefits on small screen devices however research on adaptive interfaces has almost exclusively focused on desktop displays to explore how well previous findings generalize to small screen devices we conducted study with subjects to compare adaptive interfaces for small and desktop sized screens results show that high accuracy adaptive menus have an even larger positive impact on performance and satisfaction when screen real estate is constrained the drawback of the high accuracy menus however is that they reduce the user’s awareness of the full set of items in the interface potentially making it more difficult for users to learn about new features
visualization can provide valuable assistance for data analysis and decision making tasks however how people perceive and interact with visualization tool can strongly influence their understanding of the data as well as the system’s usefulness human factors therefore contribute significantly to the visualization process and should play an important role in the design and evaluation of visualization tools several research initiatives have begun to explore human factors in visualization particularly in perception based design nonetheless visualization work involving human factors is in its infancy and many potentially promising areas have yet to be explored therefore this paper aims to review known methodology for doing human factors research with specific emphasis on visualization review current human factors research in visualization to provide basis for future investigation and identify promising areas for future research
the term wireless sensor network is applied broadly to range of significantly different networking environments on the other hand there exists substantial body of research on key establishment in wireless sensor networks much of which does not pay heed to the variety of different application requirements we set out simple framework for classifying wireless sensor networks in terms of those properties that directly influence key distribution requirements we fit number of existing schemes within this framework and use this process to identify areas which require further attention from key management architects
in this correspondence we propose novel efficient and effective refined histogram rh for modeling the wavelet subband detail coefficients and present new image signature based on the rh model for supervised texture classification our rh makes use of step function with exponentially increasing intervals to model the histogram of detail coefficients and the concatenation of the rh model parameters for all wavelet subbands forms the so called rh signature to justify the usefulness of the rh signature we discuss and investigate some of its statistical properties these properties would clarify the sufficiency of the signature to characterize the wavelet subband information in addition we shall also present an efficient rh signature extraction algorithm based on the coefficient counting technique which helps to speed up the overall classification system performance we apply the rh signature to texture classification using the well known databases experimental results show that our proposed rh signature in conjunction with the use of symmetrized kullback leibler divergence gives satisfactory classification performance compared with the current state of the art methods
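as a small illustration of the ideas above, the sketch below builds a histogram over exponentially increasing intervals for one subband's detail coefficients and compares two such signatures with a symmetrized kullback leibler divergence; the number of bins, the bin growth factor and the smoothing constant are assumptions, not the parameters of the proposed rh model

    import numpy as np

    def refined_histogram(coeffs, n_bins=8, eps=1e-12):
        """histogram of |detail coefficients| over bins whose widths roughly double
        (a stand-in for the step function with exponentially increasing intervals)."""
        a = np.abs(np.asarray(coeffs, dtype=float)).ravel()
        top = a.max() + eps
        edges = np.concatenate(([0.0], top / 2.0 ** np.arange(n_bins - 1, -1, -1)))
        hist, _ = np.histogram(a, bins=edges)
        p = hist.astype(float) + eps          # smoothed so the divergence below stays finite
        return p / p.sum()

    def symmetrized_kl(p, q):
        """symmetric kullback leibler divergence between two normalized signatures."""
        return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

    # e.g. compare signatures of two subbands drawn from differently scaled distributions
    rng = np.random.default_rng(0)
    print(symmetrized_kl(refined_histogram(rng.laplace(0, 1, 4096)),
                         refined_histogram(rng.laplace(0, 3, 4096))))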
reconfigurable computers rcs combine general purpose processors gpps with field programmable gate arrays fpgas the fpgas are reconfigured at run time to become application specific processors that collaborate with the gpps to execute the application high level language hll to hardware description language hdl compilers allow the fpga based kernels to be generated using hll based programming rather than hdl based hardware design unfortunately the loops needed for floating point reduction operations often cannot be pipelined by these hll hdl compilers this capability gap prevents the development of number of important fpga based kernels this article describes novel architecture and algorithm that allow the use of an hll hdl environment to implement high performance fpga based kernels that reduce multiple variable length sets of floating point data sparse matrix iterative solver is used to demonstrate the effectiveness of the reduction kernel the fpga augmented version running on contemporary rc is up to times faster than the software only version of the same solver running on the gpp conservative estimates show the solver will run up to times faster than software on next generation rc
duplex is distributed collaborative editor for users connected through large scale environment such as the internet large scale implies heterogeneity unpredictable communication delays and failures and inefficient implementations of techniques traditionally used for collaborative editing in local area networks to cope with these unfavorable conditions duplex proposes model based on splitting the document into independent parts maintained individually and replicated by kernel users act on document parts and interact with co authors using local environment providing safe store and recovery mechanisms against failures or divergence with co authors communication is reduced to minimum allowing disconnected operation atomicity concurrency and replica control are confined to manageable small context
providing an effective mobile search service is difficult task given the unique characteristics of the mobile space small screen devices with limited input and interaction capabilities do not make ideal search devices in addition mobile content by its concise nature offers limited indexing opportunities which makes it difficult to build high quality mobile search engines and indexes in this paper we consider the issue of limited page content by evaluating heuristic content enrichment framework that uses standard web resources as source of additional indexing knowledge we present an evaluation using mobile news service that demonstrates significant improvements in search performance compared to benchmark mobile search engine
fast and flexible message demultiplexing are well established goals in the networking community currently however network architects have had to sacrifice one for the other we present new packet filter system dpf dynamic packet filters that provides both the traditional flexibility of packet filters and the speed of hand crafted demultiplexing routines dpf filters run times faster than the fastest packet filters reported in the literature dpf’s performance is either equivalent to or when it can exploit runtime information superior to hand coded demultiplexors dpf achieves high performance by using carefully designed declarative packet filter language that is aggressively optimized using dynamic code generation the contributions of this work are detailed description of the dpf design discussion of the use of dynamic code generation and quantitative results on its performance impact quantitative results on how dpf is used in the aegis kernel to export network devices safely and securely to user space so that udp and tcp can be implemented efficiently as user level libraries and the unrestricted release of dpf into the public domain
given user specified minimum correlation threshold and market basket database with items and transactions an all strong pairs correlation query finds all item pairs with correlations above the threshold however when the number of items and transactions are large the computation cost of this query can be very high in this paper we identify an upper bound of pearson’s correlation coefficient for binary variables this upper bound is not only much cheaper to compute than pearson’s correlation coefficient but also exhibits special monotone property which allows pruning of many item pairs even without computing their upper bounds two step all strong pairs correlation query taper algorithm is proposed to exploit these properties in filter and refine manner furthermore we provide an algebraic cost model which shows that the computation savings from pruning are independent or improve when the number of items is increased in data sets with common zipf or linear rank support distributions experimental results from synthetic and real data sets exhibit similar trends and show that the taper algorithm can be an order of magnitude faster than brute force alternatives
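the support-only upper bound can be written down directly from the definition of pearson's phi coefficient for two binary items a and b: since supp(ab) <= min(supp(a), supp(b)), taking supp(a) >= supp(b) gives phi <= sqrt(supp(b)(1 - supp(a)) / (supp(a)(1 - supp(b)))); the sketch below uses that bound in a filter-and-refine pass over a toy 0/1 transaction matrix, and is an illustration of the idea rather than the taper implementation

    import numpy as np
    from itertools import combinations

    def phi_upper_bound(sa, sb):
        """support-based upper bound of pearson's phi for two binary items."""
        hi, lo = max(sa, sb), min(sa, sb)
        return np.sqrt(lo * (1.0 - hi) / (hi * (1.0 - lo)))

    def phi(sa, sb, sab):
        return (sab - sa * sb) / np.sqrt(sa * (1 - sa) * sb * (1 - sb))

    def all_strong_pairs(data, theta):
        """filter-and-refine search for item pairs with phi >= theta;
        data is a 0/1 integer transaction matrix (rows = transactions, columns = items)."""
        supp = data.mean(axis=0)
        strong = []
        for i, j in combinations(range(data.shape[1]), 2):
            if supp[i] in (0.0, 1.0) or supp[j] in (0.0, 1.0):
                continue                                   # phi undefined for constant items
            if phi_upper_bound(supp[i], supp[j]) < theta:
                continue                                   # filter: pruned from supports alone
            sab = np.mean(data[:, i] & data[:, j])
            if phi(supp[i], supp[j], sab) >= theta:
                strong.append((i, j))                      # refine: exact phi only for survivors
        return strong

    # toy usage on a random basket matrix (illustrative data only)
    basket = (np.random.default_rng(1).random((1000, 6)) < 0.3).astype(int)
    print(all_strong_pairs(basket, theta=0.1))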
in this paper we examine some critical design features of trace cache fetch engine for wide issue processor and evaluate their effects on performance we evaluate path associativity partial matching and inactive issue all of which are straightforward extensions to the trace cache we examine features such as the fill unit and branch predictor design in our final analysis we show that the trace cache mechanism attains percent performance improvement over an aggressive single block fetch mechanism and percent improvement over sequential multiblock mechanism
increases in on chip communication delay and the large working sets of server and scientific workloads complicate the design of the on chip last level cache for multicore processors the large working sets favor shared cache design that maximizes the aggregate cache capacity and minimizes off chip memory requests at the same time the growing on chip communication delay favors core private caches that replicate data to minimize delays on global wires recent hybrid proposals offer lower average latency than conventional designs but they address the placement requirements of only subset of the data accessed by the application require complex lookup and coherence mechanisms that increase latency or fail to scale to high core counts in this work we observe that the cache access patterns of range of server and scientific workloads can be classified into distinct classes where each class is amenable to different block placement policies based on this observation we propose reactive nuca nuca distributed cache design which reacts to the class of each cache access and places blocks at the appropriate location in the cache nuca cooperates with the operating system to support intelligent placement migration and replication without the overhead of an explicit coherence mechanism for the on chip last level cache in range of server scientific and multiprogrammed workloads nuca matches the performance of the best cache design for each workload improving performance by on average over competing designs and by at best while achieving performance within of an ideal cache design
we study an adaptive variant of oblivious transfer in which sender has messages of which receiver can adaptively choose to receive one after the other in such way that the sender learns nothing about the receiver’s selections and the receiver only learns about the requested messages we propose two practical protocols for this primitive that achieve stronger security notion than previous schemes with comparable efficiency in particular by requiring full simulatability for both sender and receiver security our notion prohibits subtle selective failure attack not addressed by the security notions achieved by previous practical schemes our first protocol is very efficient generic construction from unique blind signatures in the random oracle model the second construction does not assume random oracles but achieves remarkable efficiency with only constant number of group elements sent during each transfer this second construction uses novel techniques for building efficient simulatable protocols
this paper describes an algorithm for simultaneously optimizing across multiple levels of the memory hierarchy for dense matrix computations our approach combines compiler models and heuristics with guided empirical search to take advantage of their complementary strengths the models and heuristics limit the search to small number of candidate implementations and the empirical results provide the most accurate information to the compiler to select among candidates and tune optimization parameter values we have developed an initial implementation and applied this approach to two case studies matrix multiply and jacobi relaxation for matrix multiply our results on two architectures sgi and sun ultrasparc iie outperform the native compiler and either outperform or achieve comparable performance as the atlas self tuning library and the hand tuned vendor blas library jacobi results also substantially outperform the native compilers
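a toy version of the combined strategy is sketched below: a simple cache-capacity heuristic plays the role of the compiler model and prunes the candidate tile sizes, and the few survivors are timed empirically on a tiled matrix multiply; the cache size, tile candidates and kernel are assumptions for illustration, not the paper's compiler framework

    import time
    import numpy as np

    def candidate_tiles(n, elem_bytes=8, cache_bytes=256 * 1024):
        """model/heuristic step: keep only tile sizes whose working set
        (three t x t blocks) is predicted to fit in cache (assumed capacity)."""
        tiles = [16, 32, 64, 128, 256]
        return [t for t in tiles if 3 * t * t * elem_bytes <= cache_bytes and t <= n]

    def tiled_matmul(A, B, t):
        n = A.shape[0]
        C = np.zeros_like(A)
        for i in range(0, n, t):
            for k in range(0, n, t):
                for j in range(0, n, t):
                    C[i:i+t, j:j+t] += A[i:i+t, k:k+t] @ B[k:k+t, j:j+t]
        return C

    def empirical_search(n=256):
        """empirical step: time each surviving candidate and pick the fastest."""
        A, B = np.random.rand(n, n), np.random.rand(n, n)
        best, best_time = None, float("inf")
        for t in candidate_tiles(n):
            start = time.perf_counter()
            tiled_matmul(A, B, t)
            elapsed = time.perf_counter() - start
            if elapsed < best_time:
                best, best_time = t, elapsed
        return best, best_time

    print(empirical_search(256))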
while rfid has become ubiquitous technology there is still need for rfid systems with different capabilities protocols and features depending on the application this article describes design automation flow and power estimation technique for fast implementation and design feedback of new rfid systems physical layer features are described using waveform features which are used to automatically generate physical layer encoding and decoding hardware blocks rfid primitives to be supported by the tag are enumerated with rfid macros and the behavior of each primitive is specified using ansi within the template to automatically generate the tag controller case studies implementing widely used standards such as iso part and iso part using this automation technique are presented the power macromodeling flow demonstrated here is shown to be within percent to percent accuracy while providing results times faster than traditional methods when eliminating the need for certain features of iso part the design flow shows that the power required by the implementation is reduced by nearly percent
consider an ordered static tree where each node has label from alphabet sigma tree may be of arbitrary degree and shape our goal is designing compressed storage scheme of that supports basic navigational operations among the immediate neighbors of node ie parent ith child or any child with some label as well as more sophisticated path based search operations over its labeled structure we present novel approach to this problem by designing what we call the xbw transform of the tree in the spirit of the well known burrows wheeler transform for strings the xbw transform uses path sorting to linearize the labeled tree into two coordinated arrays one capturing the structure and the other the labels for the first time by using the properties of the xbw transform our compressed indexes go beyond the information theoretic lower bound and support navigational and path search operations over labeled trees within near optimal time bounds and entropy bounded space our xbw transform is simple and likely to spur new results in the theory of tree compression and indexing as well as interesting application contexts as an example we use the xbw transform to design and implement compressed index for xml documents whose compression ratio is significantly better than the one achievable by state of the art tools and its query time performance is orders of magnitude faster
mining non taxonomic relations is an important part of the semantic web puzzle building on the work of the semantic annotation community we address the problem of extracting relation instances among annotated entities in particular we analyze the problem of verb based relation instantiation in some detail and present heuristic domain independent approach based on verb chunking and entity clustering which doesn’t require parsing we also address the problem of mapping linguistic tuples to relations from the ontology case study conducted within the biography domain demonstrates the validity of our results in contrast to related work whilst examining the complexity of the extraction task and the feasibility of verb based extraction in general
constraint databases cdbs are an extension of relational databases that enrich both the relational data model and the relational query primitives with constraints by providing finite representation of data with infinite semantics the constraint database approach is particularly appropriate for querying spatiotemporal data since the introduction of constraint databases in the early 1990s several constraint database systems have been implemented in this paper we discuss several extensions to paris’s constraint database framework that we believe necessary based on the experience with these implementations and specifically cqa cdb extending the cdb schema to explicitly distinguish traditional from constraint attributes additional constraint query algebra operators keeping queries safe multi attribute indexing systems how best to group the attributes flexibility in representing infinite data taking constraints we believe that paris would have been the first to modify the constraint database framework if he felt there were way to improve it and we hope that these extensions to the cdb framework bring us closer towards realizing the promise that constraint database technology holds for integrating the advantages of traditional relational database technology with emerging data intensive applications
this paper presents the first complete design to apply compressive sampling theory to sensor data gathering for large scale wireless sensor networks the successful scheme developed in this research is expected to offer fresh frame of mind for research in both compressive sampling applications and large scale wireless sensor networks we consider the scenario in which large number of sensor nodes are densely deployed and sensor readings are spatially correlated the proposed compressive data gathering is able to reduce global scale communication cost without introducing intensive computation or complicated transmission control the load balancing characteristic is capable of extending the lifetime of the entire sensor network as well as individual sensors furthermore the proposed scheme can cope with abnormal sensor readings gracefully we also carry out the analysis of the network capacity of the proposed compressive data gathering and validate the analysis through ns simulations more importantly this novel compressive data gathering has been tested on real sensor data and the results show the efficiency and robustness of the proposed scheme
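the gathering step can be illustrated with a few lines of linear algebra: each sensor i holds one column phi_i of a random measurement matrix and forwards the running sums plus phi_i * x_i, so every hop carries m values and the sink ends up with y = phi x; the dimensions, sparsity and random matrix below are arbitrary, and the sparse recovery step at the sink (an l1 solver) is only indicated

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 100, 20                      # sensors, measurements (M << N), illustrative sizes
    x = np.zeros(N)
    x[rng.choice(N, 5, replace=False)] = rng.normal(size=5)   # sparse sensed field (toy data)
    Phi = rng.normal(size=(M, N))       # sensor i stores column Phi[:, i]

    # in-network gathering along a chain: node i forwards partial_sums + Phi[:, i] * x[i],
    # so every hop transmits exactly M numbers instead of up to N raw readings
    partial_sums = np.zeros(M)
    for i in range(N):
        partial_sums = partial_sums + Phi[:, i] * x[i]

    y = partial_sums                    # what the sink receives
    assert np.allclose(y, Phi @ x)      # equivalent to one global projection y = Phi x
    # recovering x from (y, Phi) would use an l1-minimization / basis pursuit solver at the sink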
for mobile robotics head gear in augmented reality ar applications or computer vision it is essential to continuously estimate the egomotion and the structure of the environment this paper presents the system developed in the smarttracking project which simultaneously integrates visual and inertial sensors in combined estimation scheme the sparse structure estimation is based on the detection of corner features in the environment from single known starting position the system can move into an unknown environment the vision and inertial data are fused and the performance of both unscented kalman filter and extended kalman filter is compared for this task the filters are designed to handle asynchronous input from visual and inertial sensors which typically operate at different and possibly varying rates additionally bank of extended kalman filters one per corner feature is used to estimate the position and the quality of structure points and to include them into the structure estimation process the system is demonstrated on mobile robot executing known motions such that the estimation of the egomotion in an unknown environment can be compared to ground truth
the problem of software integrity is traditionally addressed as the static verification of the code before the execution often by checking the code signature however there are no well defined solutions to the run time verification of code integrity when the code is executed remotely which is referred to as run time remote entrusting in this paper we present the research challenges involved in run time remote entrusting and how we intend to solve this problem specifically we address the problem of ensuring that given piece of code executes on a remote untrusted machine and that its functionalities have not been tampered with both before execution and during run time
the benefits of synergistic collaboration are at the heart of arguments in favor of pair programming however empirical studies usually investigate direct effects of various factors on pair programming performance without looking into the details of collaboration this paper reports from an empirical study that investigated the nature of pair programming collaboration and subsequently investigated postulated effects of personality on pair programming collaboration audio recordings of professional programmer pairs were categorized according to taxonomy of collaboration we then measured postulated relationships between the collaboration categories and the personality of the individuals in the pairs we found evidence that personality generally affects the type of collaboration that occurs in pairs and that different levels of given personality trait between two pair members increases the amount of communication intensive collaboration exhibited by pair
interactive tv research encompasses rather diverse body of work eg multimedia hci cscw uist user modeling media studies that has accumulated over the past years in this article we highlight the state of the art and consider two basic issues what is interactive tv research can it help us reinvent the practices of creating sharing and watching tv we survey the literature and identify three concepts that have been inherent in interactive tv research interactive tv as content creation interactive tv as content and experience sharing process and interactive tv as control of audiovisual content we propose this simple taxonomy create share control as an evolutionary step over the traditional hierarchical produce distribute consume paradigm moreover we highlight the importance of sociability in all phases of the create share control model
in machine learning and data mining heuristic and association rules are two dominant schemes for rule discovery heuristic rule discovery usually produces small set of accurate rules but fails to find many globally optimal rules association rule discovery generates all rules satisfying some constraints but yields too many rules and is infeasible when the minimum support is small here we present unified framework for the discovery of family of optimal rule sets and characterize the relationships with other rule discovery schemes such as nonredundant association rule discovery we theoretically and empirically show that optimal rule discovery is significantly more efficient than association rule discovery independent of data structure and implementation optimal rule discovery is an efficient alternative to association rule discovery especially when the minimum support is low
the cade atp system competition casc is an annual evaluation of fully automatic classical logic automated theorem proving atp systems casc was the fourteenth competition in the casc series twenty nine atp systems and system variants competed in the various competition and demonstration divisions an outline of the competition design and commentated summary of the results are presented
previous studies have highlighted the high arrival rate of new content on the web we study the extent to which this new content can be efficiently discovered by crawler our study has two parts first we study the inherent difficulty of the discovery problem using a maximum cover formulation under an assumption of perfect estimates of likely sources of links to new content second we relax this assumption and study more realistic setting in which algorithms must use historical statistics to estimate which pages are most likely to yield links to new content we recommend simple algorithm that performs comparably to all approaches we consider we measure the overhead of discovering new content defined as the average number of fetches required to discover one new page we show first that with perfect foreknowledge of where to explore for links to new content it is possible to discover of all new content with under overhead and of new content with overhead but actual algorithms which do not have access to perfect foreknowledge face more difficult task one quarter of new content is simply not amenable to efficient discovery of the remaining three quarters of new content during given week may be discovered with overhead if content is recrawled fully on monthly basis
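the overhead measure defined above reduces to a simple ratio; a tiny helper with made-up numbers:

    def discovery_overhead(fetches, new_pages_found):
        """average number of fetches spent per newly discovered page (lower is better)."""
        if new_pages_found == 0:
            return float("inf")
        return fetches / new_pages_found

    # e.g. a crawler that spends 5000 fetches and uncovers 1250 previously unseen pages
    print(discovery_overhead(5000, 1250))   # -> 4.0 fetches per new page (illustrative numbers)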
applications in pervasive computing are typically required to interact seamlessly with their changing environments to provide users with smart computational services these applications must be aware of incessant context changes in their environments and adjust their behaviors accordingly as these environments are highly dynamic and noisy context changes thus acquired could be obsolete corrupted or inaccurate this gives rise to the problem of context inconsistency which must be timely detected in order to prevent applications from behaving anomalously in this paper we propose formal model of incremental consistency checking for pervasive contexts based on this model we further propose an efficient checking algorithm to detect inconsistent contexts the performance of the algorithm and its advantages over conventional checking techniques are evaluated experimentally using cabot middleware
this work achieves an efficient acquisition of scenes and their depths along long streets camera is mounted on vehicle moving along straight or mildly curved path and sampling line properly set in the camera frame scans the images over scenes continuously to form route panorama this paper proposes method to estimate the depth from the camera path by analyzing phenomenon called stationary blur in the route panorama this temporal blur is perspective effect in parallel projection yielded from the sampling slit with physical width we analyze the behavior of the stationary blur with respect to the scene depth vehicle path and camera properties based on that we develop an adaptive filter to evaluate the degree of the blur for depth estimation which avoids error prone feature matching or tracking in capturing complex street scenes and facilitates real time sensing the method also uses much less data than the structure from motion approach so that it can extend the sensing area significantly the resulting route panorama with depth information is useful for urban visualization monitoring navigation and modeling
we introduce relational variants of neural gas very efficient and powerful neural clustering algorithm which allow clustering and mining of data given in terms of pairwise similarity or dissimilarity matrix it is assumed that this matrix stems from euclidean distance or dot product respectively however the underlying embedding of points is unknown one can equivalently formulate batch optimization in terms of the given similarities or dissimilarities thus providing way to transfer batch optimization to relational data for this procedure convergence is guaranteed and extensions such as the integration of label information can readily be transferred to this framework
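the identity that makes the relational formulation work is worth spelling out: if a prototype is kept implicitly as a convex combination w_j = sum_l alpha_jl x_l and D holds squared euclidean dissimilarities, then ||x_i - w_j||^2 = [D alpha_j]_i - (1/2) alpha_j^T D alpha_j, so the batch updates can be run on D alone; the sketch below is a generic batch scheme built on that identity with an assumed annealing schedule, not the authors' implementation

    import numpy as np

    def relational_neural_gas(D, n_prototypes=3, n_iter=50, seed=0):
        """batch neural gas run directly on a squared-euclidean dissimilarity matrix D (n x n);
        prototypes are kept implicitly as convex coefficient vectors alpha (one row each)."""
        rng = np.random.default_rng(seed)
        n = D.shape[0]
        alpha = rng.dirichlet(np.ones(n), size=n_prototypes)        # rows sum to 1
        lam0, lam_end = n_prototypes / 2.0, 0.01                    # assumed annealing schedule
        for t in range(n_iter):
            lam = lam0 * (lam_end / lam0) ** (t / max(n_iter - 1, 1))
            # squared distance of every point to every implicit prototype:
            # ||x_i - w_j||^2 = [D a_j]_i - 0.5 * a_j^T D a_j
            dist = alpha @ D - 0.5 * np.sum(alpha * (alpha @ D), axis=1, keepdims=True)
            ranks = np.argsort(np.argsort(dist, axis=0), axis=0)    # rank of each prototype per point
            h = np.exp(-ranks / lam)                                # neighborhood weighting
            alpha = h / h.sum(axis=1, keepdims=True)                # new convex coefficients
        dist = alpha @ D - 0.5 * np.sum(alpha * (alpha @ D), axis=1, keepdims=True)
        return alpha, dist.argmin(axis=0)

    # toy usage on squared euclidean dissimilarities of three 2-d point clouds
    pts = np.vstack([np.random.default_rng(c).normal(3 * c, 0.2, (20, 2)) for c in range(3)])
    D = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    alpha, labels = relational_neural_gas(D)
    print(labels)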
there appears to be no evidence of research that has attempted to bring together the knowledge relating to trust among team members working in virtual organisations unlike traditional workplaces the nature of virtual teams is such that working relationships are typically short and often there is no actual personal contact in this environment trust must be taken to new level as it is essential for the success of collaborative ventures the scope of this review will involve an examination of range of articles sourced through proquest and business premier resource databases these will be reviewed based on content analysis to develop patterns of issues raised this will involve using combination of integrative and interpretative approaches it is anticipated the outcome of this review will contribute to greater understanding of trust from the perspectives of key stakeholders in virtual working relationships
this paper describes method for building visual maps from video data using quantized descriptions of motion this enables unsupervised classification of scene regions based upon the motion patterns observed within them our aim is to recognise generic places using qualitative representation of the spatial layout of regions with common motion patterns such places are characterised by the distribution of these motion patterns as opposed to static appearance patterns and could include locations such as train platforms bus stops and park benches motion descriptions are obtained by tracking image features over temporal window and are then subjected to normalisation and thresholding to provide quantized representation of that feature’s gross motion input video is quantized spatially into pixel blocks and histogram of the frequency of occurrence of each vector is then built for each of these small areas of scene within these we can therefore characterise the dominant patterns of motion and then group our spatial regions based upon both proximity and local motion similarity to define areas or regions with particular motion characteristics moving up level we then consider the relationship between the motion in adjacent spatial areas and can characterise the dominant patterns of motion expected in particular part of the scene over time the current paper differs from previous work which has largely been based on the paths of moving agents and therefore restricted to scenes in which such paths are identifiable we demonstrate our method in three very different scenes an indoor room scenario with multiple chairs and unpredictable unconstrained motion an underground station featuring regions where motion is constrained train tracks and regions with complicated motion and difficult occlusion relationships platform and an outdoor scene with challenging camera motion and partially overlapping video streams
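a compact sketch of the representation described above: per-pixel motion is thresholded and quantized into a handful of direction codes and a histogram of codes is accumulated for every spatial block over a temporal window; the block size, magnitude threshold and number of direction bins are assumptions

    import numpy as np

    N_DIRS = 8          # direction codes 1..8; 0 means "no significant motion" (assumed)
    BLOCK = 16          # spatial quantization into BLOCK x BLOCK pixel cells (assumed size)

    def quantize_motion(dx, dy, min_mag=0.5):
        """map per-pixel displacement to a code: 0 below threshold, else 1..N_DIRS by angle."""
        mag = np.hypot(dx, dy)
        ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)
        codes = 1 + np.floor(ang / (2 * np.pi / N_DIRS)).astype(int)
        codes[mag < min_mag] = 0
        return codes

    def accumulate_block_histograms(flow_frames, h, w):
        """hist[by, bx, c] counts how often code c was observed in block (by, bx) over time."""
        hb, wb = h // BLOCK, w // BLOCK
        hist = np.zeros((hb, wb, N_DIRS + 1), dtype=np.int64)
        for dx, dy in flow_frames:                      # each frame: two (h, w) arrays
            codes = quantize_motion(dx, dy)
            for by in range(hb):
                for bx in range(wb):
                    block = codes[by*BLOCK:(by+1)*BLOCK, bx*BLOCK:(bx+1)*BLOCK]
                    hist[by, bx] += np.bincount(block.ravel(), minlength=N_DIRS + 1)
        # blocks can later be grouped by proximity and histogram similarity to form regions
        return hist

    # e.g. two synthetic flow frames of size 64 x 48
    frames = [(np.random.randn(48, 64), np.random.randn(48, 64)) for _ in range(2)]
    print(accumulate_block_histograms(frames, 48, 64).shape)   # -> (3, 4, 9)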
as one of the most important tasks of web usage mining wum web user clustering which establishes groups of users exhibiting similar browsing patterns provides useful knowledge to personalized web services and motivates long term research interests in the web community most of the existing approaches cluster web users based on the snapshots of web usage data although web usage data are evolutionary in nature consequently the usefulness of the knowledge discovered by existing web user clustering approaches might be limited in this paper we address this problem by clustering web users based on the evolution of web usage data given set of web users and their associated historical web usage data we study how their usage data change over time and mine evolutionary patterns from each user’s usage history the discovered patterns capture the characteristics of changes to web user’s information needs we can then cluster web users by analyzing common and similar evolutionary patterns shared by users web user clusters generated in this way provide novel and useful knowledge for various personalized web applications including web advertisement and web caching
multi step recognition process is developed for extracting compound forest cover information from manually produced scanned historical topographic maps of the th century this information is unique data source for gis based land cover change modeling based on salient features in the image the steps to be carried out are character recognition line detection and structural analysis of forest symbols semantic expansion implying the meanings of objects is applied for final forest cover extraction the procedure resulted in high accuracies of indicating potential for automatic and robust extraction of forest cover from larger areas
web service based systems are built orchestrating loosely coupled standardized and internetworked programs if on the one hand web services address the interoperability issues of modern information systems on the other hand they enable the development of software systems on the basis of reuse greatly limiting the necessity for reimplementation techniques and methodologies to gain the maximum from this emerging computing paradigm are in great demand in particular way to explicitly model and manage variability would greatly facilitate the creation and customization of web service based systems by variability we mean the ability of software system to be extended changed customized or configured for use in specific context we present framework and related tool suite for modeling and managing the variability of web service based systems for design and run time respectively it is an extension of the covamof framework for the variability management of software product families which was developed at the university of groningen among the novelties and advantages of the approach are the full modeling of variability via uml diagrams the run time support and the low involvement of the user all of which leads to great deal of automation in the management of all kinds of variability
we propose novel energy efficient memory architecture which relies on the use of cache with reduced number of tag bits the idea behind the proposed architecture is based on moving large number of the tag bits from the cache into an external register tag overflow buffer that identifies the current locality of the memory references additional hardware allows to dynamically update the value of the reference locality contained in the buffer energy efficiency is achieved by using for most of the memory accesses reduced tag cache this architecture is minimally intrusive for existing designs since it assumes the use of regular cache and does not require any special circuitry internal to the cache such as row or column activation mechanisms average energy savings are on tag energy corresponding to about saving on total cache energy measured on set of typical embedded applications
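a behavioural model (not the circuit) of the reduced-tag idea may help: one shared register, the tag overflow buffer, holds the high-order tag bits of the current locality while each line stores only a few low-order tag bits, and an access hits only when both parts match; the bit widths, the direct-mapped organization and the flush-on-locality-change policy are assumptions of this sketch

    class ReducedTagCache:
        """behavioural sketch: direct-mapped cache whose lines keep only a few tag bits
        while one shared register (the tag overflow buffer) holds the rest."""
        def __init__(self, n_sets=64, line_bytes=32, reduced_tag_bits=4):
            self.n_sets, self.line_bytes = n_sets, line_bytes
            self.reduced_bits = reduced_tag_bits
            self.tob = None                                # tag overflow buffer register
            self.lines = [None] * n_sets                   # each entry: reduced tag or None

        def _split(self, addr):
            index = (addr // self.line_bytes) % self.n_sets
            tag = addr // (self.line_bytes * self.n_sets)
            reduced = tag & ((1 << self.reduced_bits) - 1)
            high = tag >> self.reduced_bits                # compared against the tob
            return index, reduced, high

        def access(self, addr):
            index, reduced, high = self._split(addr)
            if high != self.tob:                           # locality change: update the tob and
                self.tob = high                            # invalidate lines (assumed policy)
                self.lines = [None] * self.n_sets
                self.lines[index] = reduced
                return "miss (locality change)"
            if self.lines[index] == reduced:
                return "hit"                               # only the reduced bits compared per line
            self.lines[index] = reduced
            return "miss"

    c = ReducedTagCache()
    print(c.access(0x1000), c.access(0x1000), c.access(0x80000))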
this paper presents graph oriented framework called webgop for architecture modeling and programming of web based distributed applications webgop is based on the graph oriented programming gop model under which the components of distributed program are configured as logical graph and implemented using set of operations defined over the graph webgop reshapes gop with reflective object oriented design which provides powerful architectural support in the world wide web environment in webgop the architecture graph is reified as an explicit object which itself is distributed over the network providing graph oriented context for the execution of distributed applications the programmer can specialize the type of graph to represent particular architecture style tailored for an application webgop also has built in support for flexible and dynamic architectures including both planned and unplanned dynamic reconfiguration of distributed applications we describe the webgop framework prototypical implementation of the framework on top of soap and performance evaluation of the prototype the prototype demonstrated the feasibility of our approach results of the performance evaluation showed that the overhead introduced by webgop over soap is reasonable and acceptable
the crossed cube cq is an important variant of the hypercube and possesses many desirable properties for interconnection networks this paper shows that in cq with faulty vertices and faulty edges there exists fault free path of length between any two distinct fault free vertices for each satisfying provided that where the lower bound of and the upper bound of are tight for some moreover this result improves the known result that cq is hamiltonian connected
in this paper it is shown how structural and cognitive versioning issues can be efficiently managed in petri nets based hypertextual model the advantages of this formalism are enhanced by modular and structured modeling modularity allows one to focus the attention only on some modules while giving the abstraction of the others each module owns metaknowledge that is useful in defining new layers and contexts the central point of the data model is the formulation and resolution of three recurrence equations effective in describing both the versioning and the derivation history these equations make it possible to express in precise terms both the structural evolution changes operated on specific nodes of the net and the behavioral one changes concerning browsing
advances in computation and communication technologies allow users to access computer networks using portable computing devices via wireless connection while mobile furthermore multidatabases offer practical means of managing information sharing from multiple preexisting heterogeneous databases by superimposing the mobile computing environment onto the multidatabase system new computing environment is attained in this work we concentrate on the effects of the mobile computing environment on query processing in multidatabases we show how broadcasting as possible solution would respond to current challenges such as bandwidth and storage limitations organizing data objects along single dimension broadcast channel should follow the semantic links assumed within the multiple dimension objects structure learning from our past experiences in objects organization on conventional storage mediums disks we propose schemes for organizing objects along single broadcast air channel the proposed schemes are simulated and analyzed
although adaptive processors can exploit application variability to improve performance or save energy effectively managing their adaptivity is challenging to address this problem we introduce new approach to adaptivity the positional approach in this approach both the testing of configurations and the application of the chosen configurations are associated with particular code sections this is in contrast to the currently used temporal approach to adaptation where both the testing and application of configurations are tied to successive intervals in time we propose to use subroutines as the granularity of code sections in positional adaptation moreover we design three implementations of subroutine based positional adaptation that target energy reduction in three different workload environments embedded or specialized server general purpose and highly dynamic all three implementations of positional adaptation are much more effective than temporal schemes on average they boost the energy savings of applications by and over temporal schemes in two experiments
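a schematic of subroutine-grained positional adaptation is sketched below: the first few invocations of an instrumented subroutine each try one configuration, the best is then locked in and re-applied on every later call of that same subroutine; the decorator, the configuration names and the wall-clock scoring (a real system would score energy) are all illustrative

    import time
    from collections import defaultdict

    CONFIGS = ["low-power", "balanced", "high-perf"]       # hypothetical adaptation knob settings
    _state = defaultdict(lambda: {"trials": {}, "chosen": None})

    def positional_adaptive(func):
        """tie configuration testing and application to a code section (here: a subroutine)."""
        def wrapper(*args, **kwargs):
            st = _state[func.__name__]
            if st["chosen"] is None:                       # testing phase for this subroutine
                cfg = CONFIGS[len(st["trials"])]
                start = time.perf_counter()
                result = func(*args, **kwargs)             # apply_config(cfg) would go here
                st["trials"][cfg] = time.perf_counter() - start
                if len(st["trials"]) == len(CONFIGS):      # all configurations tried: lock in the best
                    st["chosen"] = min(st["trials"], key=st["trials"].get)
            else:                                          # application phase: reuse the choice
                result = func(*args, **kwargs)             # apply_config(st["chosen"])
            return result
        return wrapper

    @positional_adaptive
    def hot_subroutine(n=100000):
        return sum(i * i for i in range(n))

    for _ in range(5):
        hot_subroutine()
    print(_state["hot_subroutine"]["chosen"])              # configuration locked in after the trials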
although many suggestions have been made for concurrency in trees few of these have considered recovery as well we describe an approach which provides high concurrency while preserving well formed trees across system crashes our approach works for class of index trees that is generalization of the b link tree this class includes some multi attribute indexes and temporal indexes structural changes in an index tree are decomposed into sequence of atomic actions each one leaving the tree well formed and each working on separate level of the tree all atomic actions on levels of the tree above the leaf level are independent of database transactions and so are of short duration incomplete structural changes are detected in normal operations and trigger completion
cloning in software systems is known to create problems during software maintenance several techniques have been proposed to detect the same or similar code fragments in software so called simple clones while the knowledge of simple clones is useful detecting design level similarities in software could ease maintenance even further and also help us identify reuse opportunities we observed that recurring patterns of simple clones so called structural clones often indicate the presence of interesting design level similarities an example would be patterns of collaborating classes or components finding structural clones that signify potentially useful design information requires efficient techniques to analyze the bulk of simple clone data and making non trivial inferences based on the abstracted information in this paper we describe practical solution to the problem of detecting some basic but useful types of design level similarities such as groups of highly similar classes or files first we detect simple clones by applying conventional token based techniques then we find the patterns of co occurring clones in different files using the frequent itemset mining fim technique finally we perform file clustering to detect those clusters of highly similar files that are likely to contribute to design level similarity pattern the novelty of our approach is application of data mining techniques to detect design level similarities experiments confirmed that our method finds many useful structural clones and scales up to big programs the paper describes our method for structural clone detection prototype tool called clone miner that implements the method and experimental results
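as an illustrative sketch only, not the clone miner implementation, the co-occurrence step described above can be pictured as a tiny frequent itemset pass over per file sets of simple clone classes; the file names, clone ids and support threshold below are invented for the example

    from itertools import combinations

    # hypothetical input: for each file, the set of simple-clone class ids found in it
    clones_per_file = {
        "a.java": {1, 2, 3},
        "b.java": {1, 2, 3, 4},
        "c.java": {1, 2, 5},
        "d.java": {2, 3, 4},
    }

    def frequent_clone_patterns(per_file, min_support=2, max_size=3):
        """Sets of clone classes that co-occur in at least min_support files
        (candidate structural clones)."""
        counts = {}
        for ids in per_file.values():
            for size in range(2, max_size + 1):
                for combo in combinations(sorted(ids), size):
                    counts[combo] = counts.get(combo, 0) + 1
        return {pattern: c for pattern, c in counts.items() if c >= min_support}

    print(frequent_clone_patterns(clones_per_file))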
data warehouse is an integrated and time varying collection of data derived from operational data and primarily used in strategic decision making by means of olap techniques although it is generally agreed that warehouse design is non trivial problem and that multidimensional data models as well as star or snowflake schemata are relevant in this context there exist neither methods for deriving such schema from an operational database nor measures for evaluating warehouse schema in this paper sequence of multidimensional normal forms is established that allows reasoning about the quality of conceptual data warehouse schemata in rigorous manner these normal forms address traditional database design objectives such as faithfulness completeness and freedom of redundancies as well as the notion of summarizability which is specific to multidimensional database schemata
regression testing as it’s commonly practiced is unsound due to inconsistent test repair and test addition this paper presents new technique differential testing that alleviates the test repair problem and detects more changes than regression testing alone differential testing works by creating test suites for both the original system and the modified system and contrasting both versions of the system with these two suites differential testing is made possible by recent advances in automated unit test generation furthermore it makes automated test generators more useful because it abstracts away the interpretation and management of large volumes of tests by focusing on the changes between test suites in our preliminary empirical study of subjects differential testing discovered and more behavior changes than regression testing alone
the concept of dominance has recently attracted much interest in the context of skyline computation given an n dimensional data set point p is said to dominate point q if p is better than q in at least one dimension and equal to or better than it in the remaining dimensions in this paper we propose extending the concept of dominance for business analysis from microeconomic perspective more specifically we propose new form of analysis called dominant relationship analysis dra which aims to provide insight into the dominant relationships between products and potential buyers by analyzing such relationships companies can position their products more effectively while remaining profitable to support dra we propose novel data cube called dada data cube for dominant relationship analysis which captures the dominant relationships between products and customers three types of queries called dominant relationship queries drqs are consequently proposed for analysis purposes linear optimization queries loq subspace analysis queries saq and comparative dominant queries cdq algorithms are designed for efficient computation of dada and answering the drqs using dada results of our comprehensive experiments show the effectiveness and efficiency of dada and its associated query processing strategies
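to make the dominance definition above concrete, a minimal python version of the dominance test and a brute force skyline is given below; it assumes larger attribute values are better, which is an arbitrary choice made only for the example and is unrelated to the dada cube itself

    def dominates(p, q):
        """p dominates q: at least as good in every dimension, strictly better in one
        (here 'better' means larger, an assumption made only for this example)."""
        return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

    def skyline(points):
        """Brute-force set of non-dominated points."""
        return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

    products = [(3, 5), (4, 4), (2, 6), (1, 1)]
    print(skyline(products))    # (1, 1) is dominated by every other product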
an essential element in defining the semantics of web services is the domain knowledge medical informatics is one of the few domains to have considerable domain knowledge exposed through standards these standards offer significant value in terms of expressing the semantics of web services in the healthcare domain in this paper we describe the architecture of the artemis project which exploits ontologies based on the domain knowledge exposed by the healthcare information standards through standard bodies like hl cen tc iso tc and gehr we use these standards for two purposes first to describe the web service functionality semantics that is the meaning associated with what web service does and secondly to describe the meaning associated with the messages or documents exchanged through web services artemis web service architecture uses ontologies to describe semantics but it does not propose globally agreed ontologies rather healthcare institutes reconcile their semantic differences through mediator component the mediator component uses ontologies based on prominent healthcare standards as references to facilitate semantic mediation among involved institutes mediators have pp communication architecture to provide scalability and to facilitate the discovery of other mediators
efficiently and accurately discovering similarities among moving object trajectories is difficult problem that appears in many spatiotemporal applications in this paper we consider how to efficiently evaluate trajectory joins ie how to identify all pairs of similar trajectories between two datasets our approach represents an object trajectory as sequence of symbols ie string based on special lower bounding distances between two strings we propose pruning heuristic for reducing the number of trajectory pairs that need to be examined furthermore we present an indexing scheme designed to support efficient evaluation of string similarities in secondary storage through comprehensive experimental evaluation we present the advantages of the proposed techniques
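a hedged sketch of the overall filter and refine idea: trajectories are symbolized into strings over grid cells and a cheap lower bound on edit distance prunes pairs before the exact distance is computed; the bag distance used here is a generic lower bound, not the special lower bounding distances proposed in the paper, and the grid size, threshold and data are illustrative

    from collections import Counter

    def symbolize(traj, cell=1.0):
        """Map a trajectory (list of (x, y) points) to a sequence of grid-cell symbols."""
        return [(int(x // cell), int(y // cell)) for x, y in traj]

    def edit_distance(s, t):
        """Plain Levenshtein distance between two symbol sequences."""
        row = list(range(len(t) + 1))
        for i, a in enumerate(s, 1):
            prev, row[0] = row[0], i
            for j, b in enumerate(t, 1):
                prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (a != b))
        return row[-1]

    def bag_lower_bound(s, t):
        """Multiset ('bag') difference, a cheap lower bound on edit distance."""
        cs, ct = Counter(s), Counter(t)
        return max(sum((cs - ct).values()), sum((ct - cs).values()))

    def trajectory_join(left, right, eps, cell=1.0):
        ls = [symbolize(t, cell) for t in left]
        rs = [symbolize(t, cell) for t in right]
        return [(i, j) for i, s in enumerate(ls) for j, t in enumerate(rs)
                if bag_lower_bound(s, t) <= eps and edit_distance(s, t) <= eps]

    left = [[(0, 0), (1, 0), (2, 1)], [(5, 5), (6, 6)]]
    right = [[(0, 0), (1, 1), (2, 1)], [(9, 9), (9, 8)]]
    print(trajectory_join(left, right, eps=1))    # only the first pair survives the filter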
this paper aims to contribute with an understanding of meaningful experiences of photography to support reflection upon the design of future camera devices we have conducted study of passive camera device called sensecam which previously has been investigated as memory aid combination of life logging and memory tool and as resource for digital narratives we take creative perspective and show that even if camera is designed to be forgotten in use ie is worn as necklace and takes pictures automatically it can still be part of an engaging or active photographic experience because sensecam is different from film cameras camera phones and other digital cameras it involves different type of photographic experience for example when moving through different social contexts and how the resulting pictures are appreciated the findings stem from people who used the camera for week this is complemented with reflections from the researcher who has been using the camera for month
we present directed acyclic graph visualisation designed to allow interaction with set of multiple classification trees specifically to find overlaps and differences between groups of trees and individual trees the work is motivated by the need to find representation for multiple trees that has the space saving property of general graph representation and the intuitive parent child direction cues present in individual representation of trees using example taxonomic data sets we describe augmentations to the common barycenter dag layout method that reveal shared sets of child nodes between common parents in clearer manner other interactions such as displaying the multiple ancestor paths of node when it occurs in several trees and revealing intersecting sibling sets within the context of single dag representation are also discussed
previous trust models are mainly focused on reputational mechanism based on explicit trust ratings however the large amount of user generated content and community context published on web is often ignored without enough information there are several problems with previous trust models first they cannot determine in which field one user trusts in another so many models assume that trust exists in all fields second some models are not able to delineate the variation of trust scales therefore they assume each user trusts all his friends to the same extent third since these models only focus on explicit trust ratings the trust matrix is very sparse to solve these problems we present rcctrust trust model which combines reputation content and context based mechanisms to provide more accurate fine grained and efficient trust management for the electronic community we extract trust related information from user generated content and community context from web to extend reputation based trust models we introduce role based and behavior based reasoning functionalities to infer users interests and category specific trust relationships following the study in sociology rcctrust exploits similarities between pairs of users to depict differentiated trust scales the experimental results show that rcctrust outperforms pure user similarity method and linear decay trust aware technique in both accuracy and coverage for recommender system
we propose new grid group deployment scheme in wireless sensor networks we use combinatorial designs for key predistribution in sensor nodes the deployment region is divided into square regions the predistribution scheme has the advantage that all nodes within particular region can communicate with each other directly and nodes which lie in different regions can communicate via special nodes called agents which have more resources than the general nodes the number of agents in region is always three whatever the size of the network we give measures of resiliency taking the lee distance into account apart from considering the resiliency in terms of fraction of links broken we also consider the resiliency as the number of nodes and regions disconnected when some sensors are compromised this second measure though very important had not been studied so far in key predistribution schemes which use deployment knowledge we find that the resiliency as the fraction of links compromised is better than existing schemes the number of keys preloaded in each sensor node is much less than all existing schemes and nodes are either directly connected or connected via two hop paths the deterministic key predistribution schemes result in constant time computation overhead for shared key discovery and path key establishment
in this paper we propose an approach to reason on uml schemas with ocl constraints we provide set of theorems to determine that schema does not have any infinite model and then provide decidable method that given schema of this kind efficiently checks whether it satisfies set of desirable properties such as schema satisfiability and class or association liveliness
the paper presents kermit knowledge based entity relationship modelling intelligent tutor kermit is problem solving environment for the university level students in which they can practise conceptual database design using the entity relationship data model kermit uses constraint based modelling cbm to model the domain knowledge and generate student models we have used cbm previously in tutors that teach sql and english punctuation rules the research presented in this paper is significant because we show that cbm can be used to support students learning design tasks which are very different from domains we dealt with in earlier tutors the paper describes the system’s architecture and functionality the system observes students actions and adapts to their knowledge and learning abilities kermit has been evaluated in the context of genuine teaching activities we present the results of two evaluation studies with students taking database courses which show that kermit is an effective system the students have enjoyed the system’s adaptability and found it valuable asset to their learning
activity inference attempts to identify what person is doing at given point in time from series of observations since the the task has developed into fruitful research field and is now considered key step in the design of many human centred systems for activity inference wearable and mobile devices offer unique opportunities to sense user’s context unobtrusively throughout the day unfortunately the limited battery life of these platforms does not always allow continuous activity logging in this paper we present novel technique to fill in gaps in activity logs by exploiting both short and long range dependencies in human behaviour inference is performed by sequence alignment using scoring parameters learnt from training data in probabilistic framework experiments on the reality mining dataset show significant improvements over baseline results even with reduced training and long gaps in data
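as a rough illustration of the alignment step (the scoring here is fixed by hand rather than learnt, and the activity labels are invented) a basic needleman wunsch style global alignment of a gappy day against a typical day could look as follows

    def align(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
        """Global alignment score of two activity sequences (Needleman-Wunsch style)."""
        n, m = len(seq_a), len(seq_b)
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
                score[i][j] = max(score[i - 1][j - 1] + s,
                                  score[i - 1][j] + gap,
                                  score[i][j - 1] + gap)
        return score[n][m]

    observed_day = ["home", "commute", "office", "office", "commute"]   # log with a gap
    typical_day = ["home", "commute", "office", "lunch", "office", "commute", "home"]
    print(align(observed_day, typical_day))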
in this paper we consider the problem of answering queries consistently in the presence of inconsistent data ie data violating integrity constraints we propose technique based on the rewriting of integrity constraints into disjunctive rules with two different forms of negation negation as failure and classical negation the disjunctive program can be used to generate repairs for the database and ii to produce consistent answers ie maximal set of atoms which do not violate the constraints we show that our technique is sound complete and more general than techniques previously proposed
an important field of application of intelligent logical agents where rationality plays main role is that of automated negotiation our work is related to the use of argumentation in the field of negotiation in particular we are interested in contract violations and in the construction of justifications to motivate the violation itself and recover if possible the contract on modified conditions we propose temporal modal logic language able to support and depict the arguments justification used in dialectical disputes and we consider suitable algorithms and mechanisms to introduce and manage justifications
game theory is popular tool for designing interaction protocols for agent systems it is currently not clear how to extend this to open agent systems by open we mean that foreign agents will be free to enter and leave different systems at will this means that agents will need to be able to work with previously unseen protocols there does not yet exist any agreement on standard way in which such protocols can be specified and published furthermore it is not clear how an agent could be given the ability to use an arbitrary published protocol the agent would need to be able to work out strategy for participation to address this we propose machine readable language in which game theory mechanism can be written in the form of an agent interaction protocol this language allows the workings of the protocol to be made public so that agents can inspect it to determine its properties and hence their best strategy enabling agents to automatically determine the game theoretic properties of an arbitrary interaction protocol is difficult rather than requiring agents to find the equilibrium of game we propose that recommended equilibrium will be published along with the protocol agents can then check the recommendation to decide if it is indeed an equilibrium we present an algorithm for this decision problem we also develop an equilibrium which simplifies the complexity of the checking problem while still being robust to unilateral deviations
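a toy version of the equilibrium checking idea: given a finite normal form game (here just payoff tables, standing in for a published protocol) and a recommended pure strategy profile, verify that no agent gains by a unilateral deviation; the game, action names and payoffs are hypothetical, and the recommended equilibrium in the paper may be of a different, richer kind

    def is_equilibrium(payoffs, profile):
        """payoffs maps a full strategy profile (tuple of actions) to per-agent payoffs.
        True iff no single agent can improve its own payoff by deviating alone."""
        n = len(profile)
        actions = [sorted({p[i] for p in payoffs}) for i in range(n)]
        for agent in range(n):
            current = payoffs[profile][agent]
            for alt in actions[agent]:
                deviation = profile[:agent] + (alt,) + profile[agent + 1:]
                if payoffs[deviation][agent] > current:
                    return False
        return True

    # hypothetical two-agent coordination game: matching actions pays best
    game = {("a", "a"): (2, 2), ("a", "b"): (0, 0),
            ("b", "a"): (0, 0), ("b", "b"): (1, 1)}
    print(is_equilibrium(game, ("a", "a")))   # True: the recommendation checks out
    print(is_equilibrium(game, ("a", "b")))   # False: a unilateral deviation pays off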
increasingly business applications need to capture consumers complex preferences interactively and monitor those preferences by translating them into event condition action eca rules and syntactically correct processing specification an expressive event model to specify primitive and composite events that may involve timing constraints among events is critical to such applications relying on the work done in active databases and real time systems this research proposes new composite event model based on real time logic rtl the proposed event model does not require fixed event consumption policies and allows the users to represent the exact correlation of event instances in defining composite events it also supports wide range of domain specific temporal events and constraints such as future events time constrained events and relative events this event model is validated within an electronic brokerage architecture that unbundles the required functionalities into three separable components business rule manager eca rule manager and event monitor with well defined interfaces proof of concept prototype was implemented in the java programming language to demonstrate the expressiveness of the event model and the feasibility of the architecture the performance of the composite event monitor was evaluated by varying the number of rules event arrival rates and type of composite events
open source development has long passed the state of infancy with brands such as apache mozilla and linux open source development is becoming major player on the global software market yet most open source projects today are using mailing lists as their primary communication channel and the resulting mailing archives as their only source of documentation mailing archives typically contain massive amounts of data and only support simplistic structures creating problems such as information overflow this makes it difficult for the developers to maintain common direction of their work causing reduced productivity and eventually loss of developers various approaches have been suggested to address this and similar problems both for open source development and for software development in general some approaches enable developers to model their exact reasoning others extract new data from existing data yet again others let developers describe low level details of the system in close proximity to the actual code the calliope project aims at facilitating developers in aligning their efforts in common direction at high level of abstraction as an important aspect of this we propose that the explicit modeling of multivalence could improve the acceptance of more advanced documentation tools into the environment of open source development in order to test this claim we have built prototype that implements explicit modeling of multivalence with this prototype we have carried out tests that support our claim
fundamental aspect of autonomous vehicle guidance is planning trajectories historically two fields have contributed to trajectory or motion planning methods robotics and dynamics and control the former typically have stronger focus on computational issues and real time robot control while the latter emphasize the dynamic behavior and more specific aspects of trajectory performance guidance for unmanned aerial vehicles uavs including fixed and rotary wing aircraft involves significant differences from most traditionally defined mobile and manipulator robots qualities characteristic to uavs include non trivial dynamics three dimensional environments disturbed operating conditions and high levels of uncertainty in state knowledge otherwise uav guidance shares qualities with typical robotic motion planning problems including partial knowledge of the environment and tasks that can range from basic goal interception which can be precisely specified to more general tasks like surveillance and reconnaissance which are harder to specify these basic planning problems involve continual interaction with the environment the purpose of this paper is to provide an overview of existing motion planning algorithms while adding perspectives and practical examples from uav guidance approaches
high precision parameter manipulation tasks typically require adjustment of the scale of manipulation in addition to the parameter itself this paper introduces the notion of zoom sliding or zliding for fluid integrated manipulation of scale zooming via pressure input while parameter manipulation within that scale is achieved via cursor movement sliding we also present the zlider figure widget that instantiates the zliding concept we experimentally evaluate three different input techniques for use with the zlider in conjunction with stylus for cursor positioning in high accuracy zoom and select task our results marginally favor the stylus with integrated isometric pressure sensing tip over bimanual techniques which separate zooming and sliding controls over the two hands we discuss the implications of our results and present further designs that make use of zliding
in this paper we overview one specific approach to the formal development of multi agent systems this approach is based on the use of temporal logics to represent both the behaviour of individual agents and the macro level behaviour of multi agent systems we describe how formal specification verification and refinement can all be developed using this temporal basis and how implementation can be achieved by directly executing these formal representations we also show how the basic framework can be extended in various ways to handle the representation and implementation of agents capable of more complex deliberation and reasoning
in this paper we present an approximate data gathering technique called edges for sensor networks that utilizes temporal and spatial correlations the goal of edges is to efficiently obtain the sensor reading within certain error bound to do this edges utilizes the multiple model kalman filter which is for the non linear data distribution as an approximation approach the use of the kalman filter allows edges to predict the future value using single previous sensor reading in contrast to the other statistical models such as the linear regression and multivariate gaussian in order to extend the lifetime of networks edges utilizes the spatial correlation in edges we group spatially close sensors as cluster since cluster header in network acts as sensor and router cluster header wastes its energy severely to send its own reading and or data coming from its children thus we devise redistribution method which distributes the energy consumption of cluster header using the spatial correlation in some previous works the fixed routing topology is used or the roles of nodes are decided at the base station and this information propagates through the whole network but in edges the change of cluster is notified to small portion of the network our experimental results over randomly generated sensor networks with synthetic and real data sets demonstrate the efficiency of edges
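a stripped down scalar kalman filter sketch of the predict and suppress idea (single linear model with hand picked noise terms, not the multiple model filter edges actually uses; the readings, error bound and constants are synthetic)

    class Kalman1D:
        """Minimal scalar Kalman filter used to predict the next reading; a node only
        transmits when the prediction misses by more than the error bound."""
        def __init__(self, x0, p0=1.0, q=0.01, r=0.25):
            self.x, self.p, self.q, self.r = x0, p0, q, r

        def predict(self):
            self.p += self.q                      # process noise inflates the variance
            return self.x

        def update(self, z):
            k = self.p / (self.p + self.r)        # Kalman gain
            self.x += k * (z - self.x)
            self.p *= (1.0 - k)
            return self.x

    readings = [20.0, 20.1, 20.2, 25.0, 25.1]     # synthetic sensor values
    kf, error_bound, sent = Kalman1D(readings[0]), 0.5, 0
    for z in readings[1:]:
        if abs(z - kf.predict()) > error_bound:   # prediction misses: transmit and correct
            sent += 1
            kf.update(z)
    print("transmitted", sent, "of", len(readings) - 1, "readings")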
access path deployment is critical issue in physical database design access paths typically include clustered index as the primary access path and set of secondary indexes as auxiliary access paths to deploy the right access paths requires an effective algorithm and accurate estimation of the parameters used by the algorithm one parameter central to any index selection algorithm is the block selectivity of query existing methods for estimating block selectivities are limited by restrictive assumptions furthermore most existing methods produce estimates useful for aiding the selection of secondary indexes only little research has been done in the area of estimating block selectivities for supporting the selection of the clustered index the paper presents set of methods that do not depend on any specific assumption produce accurate estimates and can be used to aid in selecting the clustered index as well as secondary indexes
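for context, one classical estimator of block selectivity is yao's formula, which exemplifies the kind of assumption laden method the paper moves beyond: for n records spread uniformly over m blocks, a query selecting k distinct records is expected to touch

    \[
      B(k) \;=\; m\left[\,1 \;-\; \prod_{i=1}^{k} \frac{n - n/m - i + 1}{\,n - i + 1\,}\right]
    \]

blocks; the uniform placement assumption behind this formula is exactly what the proposed estimation methods avoid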
the retrieval facilities of most peer to peer pp systems are limited to queries based on unique identifier or small set of keywords the techniques used for this purpose are hardly applicable for content based image retrieval cbir in pp network furthermore we will argue that the curse of dimensionality and the high communication overhead prevent the adaptation of multidimensional search trees or fast sequential scan techniques for pp cbir in the present paper we will propose two compact data representations which can be distributed in pp network and used as the basis for source selection this allows to communicate only with small fraction of all peers during query processing without deteriorating the result quality significantly we will also present experimental results confirming our approach
the aim in information filtering is to provide users with personalised selection of information based on their interest profile in adaptive information filtering this profile is partially or completely acquired by automatic means this paper investigates if profile generation can be partially acquired by automatic methods and partially by direct user involvement the issue is explored through an empirical study of simulated filtering system that mixes automatic and manual profile generation the study covers several issues involved in mixed control the first issue concerns if machine learned profile can provide better filtering performance if generated from an initial explicit user profile the second issue concerns if user involvement can improve on system generated or adapted profile finally the relationship between filtering performance and user ratings is investigated in this particular study the initial setup of personal profile was effective and yielded performance improvements that persisted after substantial training however the study showed no correlation between users ratings of profiles and profile filtering performance and only weak indications that users could improve profiles that already had been trained on feedback
in this paper we integrate an assertion based verification methodology with our object oriented system level synthesis methodology to address the problem of hw sw co verification in this direction system level assertion language is defined the system level assertions can be used to monitor the current state of system or flow of transactions these assertions are automatically converted to monitor hardware or monitor software during the system level synthesis process depending on their type and also synthesis style of their corresponding functions the synthesized assertions are functionally equivalent to their original system level assertions and hence can be reused to verify the system after hw sw synthesis and also at run time after system manufacturing this way not only system level assertions are reused in lower levels of abstraction but also run time verification of system is provided in this paper we describe the system level assertion language and explain the corresponding synthesis method in our object oriented system level synthesis methodology however the concept can be applied to any system level design methodology with modifications to assertion types and synthesis method
this paper investigates the complexity of propositional projection temporal logic with star pptl to this end propositional projection temporal logic pptl is first extended to include projection star then by reducing the emptiness problem of star free expressions to the problem of the satisfiability of pptl formulas the lower bound of the complexity for the satisfiability of pptl formulas is proved to be non elementary then to prove the decidability of pptl the normal form normal form graph nfg and labelled normal form graph lnfg for pptl are defined also algorithms for transforming formula to its normal form and lnfg are presented finally decision algorithm for checking the satisfiability of pptl formulas is formalised using lnfgs
despite the importance of ranked queries in numerous applications involving multi criteria decision making they are not efficiently supported by traditional database systems in this paper we propose simple yet powerful technique for processing such queries based on multi dimensional access methods and branch and bound search the advantages of the proposed methodology are it is space efficient requiring only single index on the given relation storing each tuple at most once ii it achieves significant ie orders of magnitude performance gains with respect to the current state of the art iii it can efficiently handle data updates and iv it is applicable to other important variations of ranked search including the support for non monotone preference functions at no extra space overhead we confirm the superiority of the proposed methods with detailed experimental study
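an illustrative best first (branch and bound) sketch of ranked retrieval over a toy two level index: node bounds come from per dimension minima, which lower bound any monotone increasing preference function; the index layout, data and weights are invented and this is not the authors' exact algorithm

    import heapq

    points = [(0.2, 0.9), (0.4, 0.3), (0.8, 0.1), (0.6, 0.6), (0.1, 0.7)]
    tree = {"root": {"children": ["leaf0", "leaf1"]},
            "leaf0": {"points": points[:3]},
            "leaf1": {"points": points[3:]}}

    def node_bound(node_points, f):
        """f evaluated at the per-dimension minima of a node's box lower-bounds f
        on every point inside (valid for monotone increasing f)."""
        mins = tuple(min(p[d] for p in node_points) for d in range(len(node_points[0])))
        return f(mins)

    def top_k(tree, f, k):
        counter = 0                                # tie-breaker so the heap never compares entries
        heap = [(float("-inf"), counter, "root")]  # the root is explored first regardless of bound
        result = []
        while heap and len(result) < k:
            bound, _, entry = heapq.heappop(heap)
            if isinstance(entry, tuple):           # a data point: its bound is its exact score
                result.append((bound, entry))
                continue
            node = tree[entry]
            children = node.get("points") or node["children"]
            for child in children:
                counter += 1
                if isinstance(child, tuple):
                    heapq.heappush(heap, (f(child), counter, child))
                else:
                    heapq.heappush(heap, (node_bound(tree[child]["points"], f), counter, child))
        return result

    f = lambda p: 0.5 * p[0] + 0.5 * p[1]          # a monotone preference function (minimised)
    print(top_k(tree, f, k=2))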
this paper presents experimental comparisons between the declarative encodings of various computationally hard problems in answer set programming asp and constraint logic programming over finite domains clp fd the objective is to investigate how solvers in the two domains respond to different problems highlighting the strengths and weaknesses of their implementations and suggesting criteria for choosing one approach over the other ultimately the work in this paper is expected to lay the foundations for transfer of technology between the two domains for example by suggesting ways to use clp fd in the execution of asp
computational simulation of time varying physical processes is of fundamental importance for many scientific and engineering applications most frequently time varying simulations are performed over multiple spatial grids at discrete points in time in this paper we investigate new approach to time varying simulation spacetime discontinuous galerkin finite element methods the result of this simulation method is simplicial tessellation of spacetime with per element polynomial solutions for physical quantities such as strain stress and velocity to provide accurate visualizations of the resulting solutions we have developed method for per pixel evaluation of solution data on the gpu we demonstrate the importance of per pixel rendering versus simple linear interpolation for producing high quality visualizations we also show that our system can accommodate reasonably large datasets spacetime meshes containing up to million tetrahedra are not uncommon in this domain
face recognition is very active biometric research field due to the data’s insensitivity to illumination and pose variations face recognition has the potential to perform better than face recognition in this paper we focus on local feature based face recognition and propose novel faceprint method sift features are extracted from texture and range images and matched the matching number of key points together with geodesic distance ratios between models are used as three kinds of matching scores likelihood ratio based score level fusion is conducted to calculate the final matching score thanks to the robustness of sift shape index and geodesic distance against various changes of geometric transformation illumination pose and expression the faceprint method is inherently insensitive to these variations experimental results indicate that faceprint method achieves consistently high performance comparing with commonly used sift on texture images
single chip heterogeneous multiprocessors schms are becoming more commonplace especially in portable devices where reduced energy consumption is priority the use of coordinated collections of processors which are simpler or which execute at lower clock frequencies is widely recognized as means of reducing power while maintaining latency and throughput primary limitation of using this approach to reduce power at the system level has been the time to develop and simulate models of many processors at the instruction set simulator level high level models simulators and design strategies for schms are required to enable designers to think in terms of collections of cooperating heterogeneous processors in order to reduce power toward this end this paper has two contributions the first is to extend unique preexisting high level performance simulator the modeling environment for software and hardware mesh to include power annotations mesh can be thought of as thread level simulator instead of an instruction level simulator thus the problem is to understand how power might be calibrated and annotated with program fragments instead of at the instruction level program fragments are finer grained than threads and coarser grained than instructions our experimentation found that compilers produce instruction patterns that allow power to be annotated at this level using single number over all compiler generated fragments executing on processor since energy is power × time this makes system runtime ie performance the dominant factor to be dynamically calculated at this level of simulation the second contribution arises from the observation that high level modeling is most beneficial when it opens up new possibilities for organizing designs thus we introduce design strategy enabled by the high level performance power simulation which we refer to as spatial voltage scaling the strategy both reduces overall system power consumption and improves performance in our example the design space for this design strategy could not be explored without high level schm power performance simulation
one of the major challenges in engineering distributed multiagent systems is the coordination necessary to align the behavior of different agents decentralization of control implies style of coordination in which the agents cooperate as peers with respect to each other and no agent has global control over the system or global knowledge about the system the dynamic interactions and collaborations among agents are usually structured and managed by means of roles and organizations in existing approaches agents typically have dual responsibility on the one hand playing roles within the organization on the other hand managing the life cycle of the organization itself for example setting up the organization and managing organization dynamics engineering realistic multiagent systems in which agents encapsulate this dual responsibility is complex task in this article we present middleware for context driven dynamic agent organizations the middleware is part of an integrated approach called macodo middleware architecture for context driven dynamic agent organizations the complementary part of the macodo approach is an organization model that defines abstractions to support application developers in describing dynamic organizations as described in weyns et al the macodo middleware offers the life cycle management of dynamic organizations as reusable service separated from the agents which makes it easier to understand design and manage dynamic organizations in multiagent systems we give detailed description of the software architecture of the macodo middleware the software architecture describes the essential building blocks of distributed middleware platform that supports the macodo organization model we used the middleware architecture to develop prototype middleware platform for traffic monitoring application we evaluate the macodo middleware architecture by assessing the adaptability scalability and robustness of the prototype platform
vector digital signal processors dsps offer good performance to power consumption ratio therefore they are suitable for mobile devices in software defined radio applications these vector dsps require input algorithms with vector operations the performance of vectorized algorithms to great extent depends on the distribution of data on vector elements traditional algorithms for vectorization focus on the extraction of parallelism from program we propose an analysis tool that focuses on the selection of an efficient dynamic data mapping for vector dsps we transferred garcia’s communication parallelism graph garcia et al ieee trans parallel distrib syst for distributed memory multiprocessor systems to vector dsps by alternating the representation of two dimensional data distributions and the cost models we are able to determine dynamic mapping of data on vector elements on the embedded vector processor evp van berkel et al proceedings of the software defined radio technical conference sdr additionally we propose new efficient algorithm for processing the graph representation that operates in two steps we demonstrate the capabilities of our tool by describing the vectorization of some mimo ofdm algorithms
advancing mobile computing technologies are enabling “ubiquitous personal computing environment” in this paper we focus on an important problem in such environment user mobility in the case of user mobility user is free to access his her personalized service at anytime anywhere through any possible mobile fixed devices providing mobility support in this scenario poses series of challenges the most essential problem is to preserve the user’s access to the same service despite changes of the accessing host or service provider existing system level mobility solutions are insufficient to address this issue since they are not aware of the application semantics on the other hand making each application mobility aware will greatly increase the development overhead we argue that the middleware layer is the best place to address this problem on one hand it is aware of application semantics on the other hand by building application neutral mobility functions in the middleware layer we eliminate the need to make each application mobility aware in this paper we design middleware framework to support user mobility in the ubiquitous computing environment its major mobility functions include user level handoff management and service instantiation across heterogeneous computing platforms we validate the major mobility functions using our prototype middleware system and test them on two multimedia applications mobile video player and mobile audio player to maximally approximate the real world user mobility scenario we have conducted experiments on variety of computing platforms and communication paradigms ranging from connected high end pc to handheld devices with wireless networks the results show that our middleware framework is able to provide efficient user mobility support in the heterogeneous computing environment
growing trend in commercial search engines is the display of specialized content such as news products etc interleaved with web search results ideally this content should be displayed only when it is highly relevant to the search query as it competes for space with regular results and advertisements one measure of the relevance to the search query is the click through rate the specialized content achieves when displayed hence if we can predict this click through rate accurately we can use this as the basis for selecting when to show specialized content in this paper we consider the problem of estimating the click through rate for dedicated news search results for queries for which news results have been displayed repeatedly before the click through rate can be tracked online however the key challenge for which previously unseen queries to display news results remains in this paper we propose supervised model that offers accurate prediction of news click through rates and satisfies the requirement of adapting quickly to emerging news events
coalitional games provide useful tool for modeling cooperation in multiagent systems an important special class of coalitional games is weighted voting games in which each player has weight intuitively corresponding to its contribution and coalition is successful if the sum of its members weights meets or exceeds given threshold key question in coalitional games is finding coalitions and payoff division schemes that are stable ie no group of players has any rational incentive to leave in this paper we investigate the computational complexity of stability related questions for weighted voting games we study problems involving the core the least core and the nucleolus distinguishing those that are polynomial time computable from those that are np hard or conp hard and providing pseudopolynomial and approximation algorithms for some of the computationally hard problems
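to fix the definitions, a brute force core membership test for a weighted voting game is sketched below; it enumerates all coalitions and so is only usable for tiny games, unlike the pseudopolynomial and approximation algorithms the paper is about, and the weights, threshold and payoff vectors are invented

    from itertools import combinations

    def value(coalition, weights, threshold):
        """A coalition is worth 1 if its total weight meets the threshold, else 0."""
        return 1 if sum(weights[i] for i in coalition) >= threshold else 0

    def in_core(payoff, weights, threshold, eps=1e-9):
        """payoff is in the core iff it distributes v(N) exactly and no coalition
        receives less than its own value."""
        players = range(len(weights))
        if abs(sum(payoff) - value(players, weights, threshold)) > eps:
            return False
        for size in range(1, len(weights) + 1):
            for coalition in combinations(players, size):
                if sum(payoff[i] for i in coalition) + eps < value(coalition, weights, threshold):
                    return False
        return True

    weights, threshold = [4, 2, 2, 1], 6          # player 0 is a veto player (the others total 5)
    print(in_core([1.0, 0.0, 0.0, 0.0], weights, threshold))      # True
    print(in_core([0.25, 0.25, 0.25, 0.25], weights, threshold))  # False: {0, 1} wins but gets 0.5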
graphs are an increasingly important data source with such important graphs as the internet and the web other familiar graphs include cad circuits phone records gene sequences city streets social networks and academic citations any kind of relationship such as actors appearing in movies can be represented as graph this work presents data mining tool called anf that can quickly answer number of interesting questions on graph represented data such as the following how robust is the internet to failures what are the most influential database papers are there gender differences in movie appearance patterns at its core anf is based on fast and memory efficient approach for approximating the complete neighbourhood function for graph for the internet graph nodes anf’s highly accurate approximation is more than times faster than the exact computation this reduces the running time from nearly day to matter of minute or two allowing users to perform ad hoc drill down tasks and to repeatedly answer questions about changing data sources to enable this drill down anf employs new techniques for approximating neighbourhood type functions for graphs with distinguished nodes and or edges when compared to the best existing approximation anf’s approach is both faster and more accurate given the same resources additionally unlike previous approaches anf scales gracefully to handle disk resident graphs finally we present some of our results from mining large graphs using anf
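a small sketch of the bit mask idea behind approximate neighbourhood functions: every node keeps a handful of flajolet martin style bitmasks approximating the set of nodes within h hops, masks are or-ed along edges once per hop, and the lowest unset bit gives a count estimate; the graph, number of masks and correction constant are illustrative, not anf's tuned implementation

    import random

    def fm_bit():
        """Geometric bit position: i is chosen with probability 2^-(i+1)."""
        i = 0
        while random.random() < 0.5 and i < 31:
            i += 1
        return i

    def lowest_zero(mask):
        i = 0
        while mask & (1 << i):
            i += 1
        return i

    def approx_neighbourhood(graph, hops, num_masks=32, seed=7):
        """Estimate N(h): the number of (node, node-within-h-hops) pairs, for h = 1..hops."""
        random.seed(seed)
        masks = {v: [1 << fm_bit() for _ in range(num_masks)] for v in graph}
        estimates = []
        for _ in range(hops):
            new = {v: list(masks[v]) for v in graph}
            for v, neighbours in graph.items():
                for u in neighbours:
                    for k in range(num_masks):
                        new[v][k] |= masks[u][k]
            masks = new
            total = 0.0
            for v in graph:
                b = sum(lowest_zero(m) for m in masks[v]) / num_masks
                total += (2 ** b) / 0.77351       # Flajolet-Martin style correction
            estimates.append(total)
        return estimates

    graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a 4-node path
    print(approx_neighbourhood(graph, hops=3))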
ip packet streams consist of multiple interleaving ip flows statistical summaries of these streams collected for different measurement periods are used for characterization of traffic billing anomaly detection inferring traffic demands configuring packet filters and routing protocols and more while queries are posed over the set of flows the summarization algorithm is applied to the stream of packets aggregation of traffic into flows before summarization requires storage of per flow counters which is often infeasible therefore the summary has to be produced over the unaggregated stream an important aggregate performed over summary is to approximate the size of subpopulation of flows that is specified a posteriori for example flows belonging to an application such as web or dns or flows that originate from certain autonomous system we design efficient streaming algorithms that summarize unaggregated streams and provide corresponding unbiased estimators for subpopulation sizes our summaries outperform in terms of estimates accuracy those produced by packet sampling deployed by cisco’s sampled netflow the most widely deployed such system performance of our best method step sample and hold is close to that of summaries that can be obtained from pre aggregated traffic
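a minimal sample and hold pass over an unaggregated packet stream, in the spirit of the baseline the paper improves on (this is the plain version, not the step sample and hold method, and the flows, sampling probability and size adjustment are illustrative)

    import random

    def sample_and_hold(packets, p, seed=1):
        """packets: iterable of flow ids, one per packet. Once a packet of a flow is
        sampled (probability p), every later packet of that flow is counted."""
        random.seed(seed)
        counters = {}
        for flow in packets:
            if flow in counters:
                counters[flow] += 1
            elif random.random() < p:
                counters[flow] = 1
        return counters

    def estimate_size(count, p):
        """Rough per-flow size estimate: add the expected number of packets missed
        before the flow was first sampled (about 1/p - 1 under geometric sampling)."""
        return count + 1.0 / p - 1.0

    stream = ["web"] * 500 + ["p2p"] * 200 + ["dns"] * 20
    random.seed(0)
    random.shuffle(stream)
    held = sample_and_hold(stream, p=0.05)
    print({flow: round(estimate_size(c, 0.05), 1) for flow, c in held.items()})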
substantial number of massive large scale applications require scalable underlying network topologies nowadays structured peer to peer overlay networks meet these requirements very well but there is still need to decide which of these overlay networks is most suitable for providing the best possible performance for certain application this paper describes simcon simulation environment for overlay networks and large scale applications simcon allows the comparison of different overlay networks with respect to predefined metrics derived from requirements of the considered application this approach allows determining which overlay network meets the needs of given application best which in turn is great support for developers of large scale applications
during program maintenance programmer may make changes that enhance program functionality or fix bugs in code then the programmer usually will run unit regression tests to prevent invalidation of previously tested functionality if test fails unexpectedly the programmer needs to explore the edit to find the failure inducing changes for that test crisp uses results from chianti tool that performs semantic change impact analysis to allow the programmer to examine those parts of the edit that affect the failing test crisp then builds compilable intermediate version of the program by adding programmer selected partial edit to the original code augmenting the selection as necessary to ensure compilation the programmer can reexecute the test on the intermediate version in order to locate the exact reasons for the failure by concentrating on the specific changes that were applied in nine initial case studies on pairs of versions from two real java programs daikon and eclipse jdt compiler we were able to use crisp to identify the failure inducing changes for all but of failing tests on average changes were found to affect each failing test of the but only of these changes were found to be actually failure inducing
fully automatic methods that extract lists of objects from the web have been studied extensively record extraction the first step of this object extraction process identifies set of web page segments each of which represents an individual object eg product state of the art methods suffice for simple search but they often fail to handle more complicated or noisy web page structures due to key limitation their greedy manner of identifying list of records through pairwise comparison ie similarity match of consecutive segments this paper introduces new method for record extraction that captures list of objects in more robust way based on holistic analysis of web page the method focuses on how distinct tag path appears repeatedly in the dom tree of the web document instead of comparing pair of individual segments it compares pair of tag path occurrence patterns called visual signals to estimate how likely these two tag paths represent the same list of objects the paper introduces similarity measure that captures how closely the visual signals appear and interleave clustering of tag paths is then performed based on this similarity measure and sets of tag paths that form the structure of data records are extracted experiments show that this method achieves higher accuracy than previous methods
this paper presents geos new algorithm for the efficient segmentation of dimensional image and video data the segmentation problem is cast as approximate energy minimization in conditional random field new parallel filtering operator built upon efficient geodesic distance computation is used to propose set of spatially smooth contrast sensitive segmentation hypotheses an economical search algorithm finds the solution with minimum energy within sensible and highly restricted subset of all possible labellings advantages include computational efficiency with high segmentation accuracy ii the ability to estimate an approximation to the posterior over segmentations iii the ability to handle generally complex energy models comparison with max flow indicates up to times greater computational efficiency as well as greater memory efficiency geos is validated quantitatively and qualitatively by thorough comparative experiments on existing and novel ground truth data numerous results on interactive and automatic segmentation of photographs video and volumetric medical image data are presented
in software testing developing effective debugging strategies is important to guarantee the reliability of software under testing heuristic technique is to cause failure and therefore expose faults based on this approach mutation testing has been found very useful technique in detecting faults however it suffers from two problems with successfully testing programs requires extensive computing resources and puts heavy demand on human resources later empirical observations suggest that critical slicing based on statement deletion sdl mutation operator has been found the most effective technique in reducing effort and the required computing resources in locating the program faults the second problem of mutation testing may be solved by automating the program testing with the help of software tools our study focuses on determining the effectiveness of the critical slicing technique with the help of the mothra mutation testing system in detecting program faults this paper presents the results showing the performance of mothra mutation testing system through conducting critical slicing testing on selected suite of programs
we present dynamic voltage scaling dvs technique that minimizes system wide energy consumption for both periodic and sporadic tasks it is known that system consists of processors and number of other components energy aware processors can be run in different speed levels components like memory and subsystems and network interface cards can be in standby state when they are active but idle processor energy optimization solutions are not necessarily efficient from the perspective of systems current system wide energy optimization studies are often limited to periodic tasks with heuristics in getting approximated solutions in this paper we develop an exact dynamic programming algorithm for periodic tasks on processors with practical discrete speed levels the algorithm determines the lower bound of energy expenditure in pseudopolynomial time an approximation algorithm is proposed to provide performance guarantee with given bound in polynomial running time because of their time efficiency both the optimization and approximation algorithms can be adapted for online scheduling of sporadic tasks with irregular task releases we prove that system wide energy optimization for sporadic tasks is np hard in the strong sense we develop pseudo polynomial time solutions by exploiting its inherent properties
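a pseudopolynomial dynamic program in miniature, hedged as a sketch rather than the paper's algorithm: each task picks one discrete speed level, energy counts both the cpu power at that level and a device power drawn whenever the system is busy, and the table is indexed by discretized total execution time up to the deadline; the workloads, levels, powers and deadline are invented

    import math

    def min_energy(cycles, speeds, cpu_power, device_power, deadline, step=0.5):
        """cycles[i]: work of task i; speeds/cpu_power: the discrete levels.
        best[t] = minimal energy when exactly t time slots have been used so far."""
        slots = int(deadline / step)
        INF = float("inf")
        best = [0.0] + [INF] * slots
        for c in cycles:
            nxt = [INF] * (slots + 1)
            for t, e in enumerate(best):
                if e == INF:
                    continue
                for s, p in zip(speeds, cpu_power):
                    exec_time = c / s
                    used = t + math.ceil(exec_time / step)
                    if used <= slots:
                        nxt[used] = min(nxt[used], e + (p + device_power) * exec_time)
            best = nxt
        feasible = [e for e in best if e < INF]
        return min(feasible) if feasible else None    # None: no speed assignment meets the deadline

    cycles = [4, 6, 2]                  # hypothetical task workloads per period
    speeds = [1.0, 2.0, 4.0]            # discrete processor speed levels
    cpu_power = [1.0, 4.0, 16.0]        # convex power at each level (an assumption)
    print(min_energy(cycles, speeds, cpu_power, device_power=0.5, deadline=10.0))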
surveillance has been typical application of wireless sensor networks to conduct surveillance of given area in real life one can use stationary watch towers or can also use patrolling sentinels comparing them to solutions in sensor network surveillance all current coverage based methods fall into the first category in this paper we propose and study patrol based surveillance operations in sensor networks two patrol models are presented the coverage oriented patrol and the on demand patrol they achieve one of the following goals respectively to achieve surveillance of the entire field with low power drain but still bounded delay of detection ii to use an on demand manner to achieve user initiated surveillance only to interested places we propose the senstrol protocol to fulfill the patrol setup procedure for both models with the implementation in the glomosim simulator it is shown that patrol on arbitrary path can be set up in network where each node follows time sleep time wake power schedule
overlapping and multiversion techniques are two popular frameworks that transform an ephemeral index into multiple logical tree structure in order to support versioning databases although both frameworks have produced numerous efficient indexing methods their performance analysis is rather limited as result there is no clear understanding about the behavior of the alternative structures and the choice of the best one given the data and query characteristics furthermore query optimization based on these methods is currently impossible these are serious problems due to the incorporation of overlapping and multiversion techniques in several traditional eg financial and emerging eg spatiotemporal applications in this article we reduce performance analysis of overlapping and multiversion structures to that of the corresponding ephemeral structures thus simplifying the problem significantly this reduction leads to accurate cost models that predict the sizes of the trees the node page accesses and selectivity of queries furthermore the models offer significant insight into the behavior of the structures and provide guidelines about the selection of the most appropriate method in practice extensive experimentation proves that the proposed models yield errors below and percent for uniform and nonuniform data respectively
information retrieval effectiveness is usually evaluated using measures such as normalized discounted cumulative gain ndcg mean average precision map and precision at some cutoff precision on set of judged queries recent research has suggested an alternative evaluating information retrieval systems based on user behavior particularly promising are experiments that interleave two rankings and track user clicks according to recent study interleaving experiments can identify large differences in retrieval effectiveness with much better reliability than other click based methods we study interleaving in more detail comparing it with traditional measures in terms of reliability sensitivity and agreement to detect very small differences in retrieval effectiveness reliable outcome with standard metrics requires about judged queries and this is about as reliable as interleaving with user impressions amongst the traditional measures ndcg has the strongest correlation with interleaving finally we present some new forms of analysis including an approach to enhance interleaving sensitivity
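for reference, one common formulation of the gain measure quoted above, where rel_i is the graded relevance of the document at rank i and idcg is the dcg of the ideal ordering (this is the standard textbook definition, independent of the interleaving method studied here)

    \[
      \mathrm{DCG}@k \;=\; \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
      \qquad
      \mathrm{nDCG}@k \;=\; \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}
    \]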
this paper presents fair traceable multi group signatures ftmgs which have enhanced capabilities compared to group and traceable signatures that are important in real world scenarios combining accountability and anonymity the main goal of the primitive is to allow multiple groups that are managed separately managers are not even aware of the other ones yet allowing users in the spirit of the identity initiative to manage what they reveal about their identity with respect to these groups by themselves this new primitive incorporates the following additional features while considering multiple groups it discourages users from sharing their private membership keys through two orthogonal and complementary approaches in fact it merges functionality similar to credential systems with anonymous type of signing with revocation the group manager now mainly manages joining procedures and new entities called fairness authorities and consisting of various representatives possibly are involved in opening and revealing procedures in many systems scenario assuring fairness in anonymity revocation is required we specify the notion and implement it in the random oracle model
this paper proposes an incremental maintenance algorithm that efficiently updates the materialized xpath xslt views defined using xpath expressions in xp vars the algorithm consists of two processes the dynamic execution flow of an xslt program is stored as an xt xml transformation tree during the full transformation in response to source xml data update the impacted portions of the xt tree are identified and maintained by partially re evaluating the xslt program this paper discusses the xpath xslt features of incremental view maintenance for subtree insertion deletion and applies them to the maintenance algorithm experiments show that the incremental maintenance algorithm outperforms full xml transformation algorithms by factors of up to
we consider large overlay network where any two nodes can communicate directly via the underlying internet as long as the sender knows the recipient’s ip address due to the scalability requirement the overlay network must be sparse given node can store at most polylogarithmic number of ip addresses notion of distance locality in the network is given by node to node round trip times we assume that initially the overlay links are random and hence have no explicit locality aware properties we provide fast distributed constructions for various locality aware low stretch distributed data structures such as distance labeling schemes name independent routing schemes and multicast trees in previous work such data structures have only been constructed via centralized algorithms our constructions complete in poly logarithmic time and thus induce at most poly logarithmic load on every given node and achieve quality guarantees similar to those of the corresponding centralized algorithms our algorithms use common locality aware small world like overlay framework constructed via concurrent random walks our guarantees are for growth constrained metrics well studied family of metrics which have been proposed as reasonable abstraction of round trip times in the internet
there has been considerable recent interest in both hardware and software transactional memory tm we present an intermediate approach in which hardware serves to accelerate tm implementation controlled fundamentally by software specifically we describe an alert on update mechanism aou that allows thread to receive fast asynchronous notification when previously identified lines are written by other threads and programmable data isolation mechanism pdi that allows thread to hide its speculative writes from other threads ignoring conflicts until software decides to make them visible these mechanisms reduce bookkeeping validation and copying overheads without constraining software policy on host of design decisions we have used aou and pdi to implement hardware accelerated software transactional memory system we call rtm we have also used aou alone to create simpler rtm lite across range of microbenchmarks rtm outperforms rstm publicly available software transactional memory system by as much as geometric mean of in single thread mode at threads it outperforms rstm by as much as with an average speedup of performance degrades gracefully when transactions overflow hardware structures rtm lite is slightly faster than rtm for transactions that modify only small objects full rtm is significantly faster when objects are large in strong argument for policy flexibility we find that the choice between eager first access and lazy commit time conflict detection can lead to significant performance differences in both directions depending on application characteristics
runtime code generation that uses the values of one or more variables to generate specialized code is called value specific optimization typically value specific optimization focuses on variables that are modified much less frequently than they are referenced we call these glacial variables in current systems that use runtime code generation glacial variables are identified by programmer directives next we describe glacial variable analysis the first data flow analysis for automatically identifying glacial variables we introduce the term staging analysis to describe analyses that divide program into stages or use the stage structure of program glacial variable analysis is an interprocedural staging analysis that identifies the relative modification and reference frequencies for each variable and expression later several experiments are given to characterize set of benchmark programs with respect to their stage structure and we show how often value specific optimization might be applied finally we explain how staging analysis relates to runtime code generation briefly describe glacial variable analysis and present some initial results
because of the widespread increasing application of web services and autonomic computing self adaptive software is an area gaining increasing importance control theory provides theoretical foundation for self adaptive software in this paper we propose the use of the supervisory control theory of discrete event dynamic systems deds to provide rigorous foundation for designing software for reactive systems this paper focuses in particular on design of software with an attractivity requirement it studies this problem using the polynomial dynamic system pds model of deds necessary and sufficient condition for software existence and two algorithms for such software design are presented
the accurate prediction of program’s memory requirements is critical component in software development existing heap space analyses either do not take deallocation into account or adopt specific models of garbage collectors which do not necessarily correspond to the actual memory usage we present novel approach to inferring upper bounds on memory requirements of java like programs which is parametric on the notion of object lifetime ie on when objects become collectible if objects lifetimes are inferred by reachability analysis then our analysis infers accurate upper bounds on the memory consumption for reachability based garbage collector interestingly if objects lifetimes are inferred by heap liveness analysis then we approximate the program minimal memory requirement ie the peak memory usage when using an optimal garbage collector which frees objects as soon as they become dead the key idea is to integrate information on objects lifetimes into the process of generating the recurrence equations which capture the memory usage at the different program states if the heap size limit is set to the memory requirement inferred by our analysis it is ensured that execution will not exceed the memory limit with the only assumption that garbage collection works when the limit is reached experiments on java bytecode programs provide evidence of the feasibility and accuracy of our analysis
mesh network is vulnerable to privacy attacks because of the open medium property of wireless channel the fixed topology and the limited network size traditional anonymous routing algorithms cannot be directly applied to mesh network because they do not defend against global attackers in this paper we design private routing algorithm that uses onion ie layered encryption to hide routing information in addition we explore special ring topology that fits the investigated network scenario to preserve certain level of privacy against global adversary
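a minimal sketch of the layered (onion) encryption idea on a fixed ring of relays, assuming the third party cryptography package and symmetric per node keys purely for illustration; the node names and path are hypothetical and this is not the paper's actual protocol, which must also resist a global observer.

```python
from cryptography.fernet import Fernet

path = ["node_a", "node_b", "node_c"]                # hypothetical forwarding ring segment
keys = {node: Fernet.generate_key() for node in path}

def wrap(payload: bytes, path):
    # encrypt for the last hop first, so each relay can peel exactly one layer
    for node in reversed(path):
        payload = Fernet(keys[node]).encrypt(payload)
    return payload

def peel(onion: bytes, node: str) -> bytes:
    # each relay removes only its own layer and forwards the rest
    return Fernet(keys[node]).decrypt(onion)

onion = wrap(b"route: deliver to gateway 7", path)
for node in path:                                    # simulate hop-by-hop forwarding
    onion = peel(onion, node)
print(onion)                                         # original payload at the exit node
```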
developing applications for sensor networks is challenging task most programming systems narrowly focus on programming issues while ignoring that programming represents only tiny fraction of the typical life cycle of an application furthermore application developers face the prospect of investing lot of time in writing code that has nothing to do with the actual application logic lot of this code is related to different life cycle concerns such as distributed programming issues or runtime services eg group communication or time synchronisation in this paper we introduce an engineering method that simplifies the development of sensor network applications by providing comprehensive life cycle support for programming as well as ongoing evolutionary modification of embedded applications throughout the application life cycle the proposed engineering method is realised in form of concrete system called rulecaster to verify the utility of the engineering method and rulecaster we use scenario based evaluation method
program understanding is an essential part of all software maintenance and enhancement activities as currently practiced program understanding consists mainly of code reading the few automated understanding tools that are actually used in industry provide helpful but relatively shallow information such as the line numbers on which variable names occur or the calling structure possible among system components these tools rely on analyses driven by the nature of the programming language used as such they are adequate to answer questions concerning implementation details so called what questions they are severely limited however when trying to relate system to its purpose or requirements the why questions application programs solve real world problems the part of the world with which particular application is concerned is that application’s domain model of an application’s domain can serve as supplement to programming language based analysis methods and tools domain model carries knowledge of domain boundaries terminology and possible architectures this knowledge can help an analyst set expectations for program content moreover domain model can provide information on how domain concepts are related this article discusses the role of domain knowledge in program understanding it presents method by which domain models together with the results of programming language based analyses can be used to answer both what and why questions representing the results of domain based program understanding is also important and variety of representation techniques are discussed although domain based understanding can be performed manually automated tool support can guide discovery reduce effort improve consistency and provide repository of knowledge useful for downstream activities such as documentation reengineering and reuse tools framework for domain based program understanding dowser is presented in which variety of tools work together to make use of domain information to facilitate understanding experience with domain based program understanding methods and tools is presented in the form of collection of case studies after the case studies are described our work on domain based program understanding is compared with that of other researchers working in this area the paper concludes with discussion of the issues raised by domain based understanding and directions for future work
middleware based database replication protocols are more portable and flexible than kernel based protocols but have coarser grain information about transaction access data resulting in reduced concurrency and increased aborts this paper proposes conflict aware load balancing techniques to increase the concurrency and reduce the abort rate of middleware based replication protocols experimental evaluation using prototype of our system running the tpc benchmark showed that aborts can be reduced with no penalty in response time
we are interested in wireless sensor networks which are used to detect intrusion objects such as enemy tanks cars submarines etc since sensor nodes have limited energy supply sensor networks are configured to put some sensor nodes in sleep mode to save energy this is special case of randomized scheduling algorithm ignored by many studies an intrusion object’s size and shape are important factors that greatly affect the performance of sensor networks for example an extremely large object in small sensor field can easily be detected by even one sensor node no matter where the sensor node is deployed the larger an intrusion object is the fewer sensor nodes that are required for detection furthermore using fewer sensor nodes can save resources and reduce the waste of dead sensor nodes in the environment therefore studying coverage based on intrusion object’s size is important in this paper we study the performance of the randomized scheduling algorithm via both analysis and simulation in terms of intrusion coverage intensity in particular we study cases where intrusion objects occupy areas in two dimensional plane and where intrusion objects occupy areas in three dimensional space respectively we also study the deployment of sensor nodes when intrusion objects are of different sizes and shapes first sensor nodes are deployed in two dimensional plane and three dimensional space with uniform distributions then they are deployed in two dimensional plane and three dimensional space in two dimensional and three dimensional gaussian distributions respectively therefore our study not only demonstrates the impact of the size and shape of intrusion objects on the performance of sensor networks but also provides guideline on how to configure sensor networks to meet certain detecting capability in more realistic situations
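a small monte carlo sketch, under simplified assumptions (uniform deployment in a square field, disk shaped intrusion object, independent sleep decisions per node), of how detection probability grows with the object's radius; all parameter values are hypothetical.

```python
import random, math

def detection_probability(n_sensors=200, duty_cycle=0.3, sensing_range=5.0,
                          object_radius=3.0, field=100.0, trials=2000):
    """fraction of trials in which at least one awake sensor detects a disk-shaped
    intruder dropped uniformly at random in a field x field plane."""
    detected = 0
    for _ in range(trials):
        ox, oy = random.uniform(0, field), random.uniform(0, field)
        for _ in range(n_sensors):
            if random.random() > duty_cycle:        # node scheduled to sleep this round
                continue
            sx, sy = random.uniform(0, field), random.uniform(0, field)
            # a larger object is detected farther away: the ranges effectively add up
            if math.hypot(sx - ox, sy - oy) <= sensing_range + object_radius:
                detected += 1
                break
    return detected / trials

print(detection_probability(object_radius=1.0))
print(detection_probability(object_radius=5.0))     # larger objects need fewer awake nodes
```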
as embedded systems get more complex deployment of embedded operating systems oss as software run time engines has become common in particular this trend is true even for battery powered embedded systems where maximizing battery life is primary concern in such os driven embedded software the overall energy consumption depends very much on which os is used and how the os is used therefore the energy effects of the os need to be studied in order to design low energy systems effectively in this paper we discuss the motivation for performing os energy characterization and propose methodology to perform the characterization systematically the methodology consists of two parts the first part is analysis which is concerned with identifying set of components that can be used to characterize the os energy consumption called energy characteristics the second part is macromodeling which is concerned with obtaining quantitative macromodels for the energy characteristics it involves the process of experiment design data collection and macromodel fitting the os energy macromodels can be used conveniently as os energy estimators in high level or architectural optimization of embedded systems for low energy consumption as far as we know this work is the first attempt to systematically tackle energy macromodeling of an embedded os to demonstrate our approach we present experimental results for two well known embedded oss namely mu os and embedded linux os
hierarchical access control hac has been fundamental problem in computer and network systems since akl and taylor proposed the first hac scheme based on number theory in cryptographic key management techniques for hac have appeared as new and promising class of solutions to the hac problem many cryptographic hac schemes have been proposed in the past two decades one common feature associated with these schemes is that they basically limited dynamic operations at the node level in this paper by introducing the innovative concept of access polynomial and representing key value as the sum of two polynomials in finite field we propose new key management scheme for dynamic access hierarchy the newly proposed scheme supports full dynamics at both the node level and user level in uniform yet efficient manner furthermore the new scheme allows access hierarchy to be random structure and can be flexibly adapted to many other access models such as transfer down and depth limited transfer
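a toy illustration, not the paper's actual construction, of the flavour of representing a key value as the sum of two polynomials over a finite field, where an access polynomial vanishes on authorised identities; the prime modulus, identities and coefficients below are made up.

```python
P = 2**31 - 1   # a prime modulus standing in for the finite field

def poly_eval(coeffs, x, p=P):
    """horner evaluation of a polynomial (lowest-degree coefficient first) over gf(p)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

# hypothetical access polynomial a(x) that vanishes on authorised ids, plus a
# hypothetical masking polynomial b(x); a key share is their sum at the user's id
authorised_ids = [11, 42, 97]
def access_poly(x, p=P):
    prod = 1
    for uid in authorised_ids:
        prod = prod * (x - uid) % p
    return prod

mask = [123456, 789, 2024]           # b(x) coefficients, hypothetical

def key_share(user_id):
    return (access_poly(user_id) + poly_eval(mask, user_id)) % P

# an authorised user's share collapses to b(id) alone, since a(id) = 0
print(key_share(42) == poly_eval(mask, 42))
print(key_share(13) == poly_eval(mask, 13))   # unauthorised id: the mask stays hidden
```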
the increasing gap in performance between processors and main memory has made effective instruction prefetching techniques more important than ever major deficiency of existing prefetching methods is that most of them require an extra port to cache recent study by rivers et al shows that this factor alone explains why most modern microprocessors do not use such hardware based cache prefetch schemes the contribution of this paper is two fold first we present method that does not require an extra port to cache second the performance improvement for our method is greater than the best competing method bhgp even disregarding the improvement from not having an extra port the three key features of our method that prevent the above deficiencies are as follows first late prefetching is prevented by correlating misses to dynamically preceding instructions for example if the cache miss latency is cycles then the instruction that was fetched cycles prior to the miss is used as the prefetch trigger second the miss history table is kept to reasonable size by grouping contiguous cache misses together and associating them with one preceding instruction and therefore one table entry third the extra cache port is avoided through efficient prefetch filtering methods experiments show that for our benchmarks chosen for their poor cache performance an average improvement of in runtime is achieved versus the bhgp methods while the hardware cost is also reduced the improvement will be greater if the runtime impact of avoiding an extra port is considered when compared to the original machine without prefetching our method improves performance by about for our benchmarks
edit distance based string similarity join is fundamental operator in string databases increasingly many applications in data cleaning data integration and scientific computing have to deal with fuzzy information in string attributes despite the intensive efforts devoted in processing deterministic string joins and managing probabilistic data respectively modeling and processing probabilistic strings is still largely unexplored territory this work studies the string join problem in probabilistic string databases using the expected edit distance eed as the similarity measure we first discuss two probabilistic string models to capture the fuzziness in string values in real world applications the string level model is complete but may be expensive to represent and process the character level model has much more succinct representation when uncertainty in strings only exists at certain positions since computing the eed between two probabilistic strings is prohibitively expensive we have designed efficient and effective pruning techniques that can be easily implemented in existing relational database engines for both models extensive experiments on real data have demonstrated order of magnitude improvements of our approaches over the baseline
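a minimal sketch of the expected edit distance under the string level model, where each probabilistic string is a set of possible strings with probabilities; the brute force enumeration shown here is exactly what makes eed expensive and motivates the paper's pruning techniques, and the example values are hypothetical.

```python
from itertools import product

def edit_distance(a, b):
    """classic dynamic-programming levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def expected_edit_distance(ps1, ps2):
    """eed under the string-level model: each argument is a list of
    (possible_string, probability) pairs; cost grows with the number of
    possible worlds, hence the need for pruning."""
    return sum(p1 * p2 * edit_distance(s1, s2)
               for (s1, p1), (s2, p2) in product(ps1, ps2))

street = [("main st", 0.7), ("main street", 0.3)]        # hypothetical dirty values
other  = [("maine st", 0.6), ("main st", 0.4)]
print(expected_edit_distance(street, other))
```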
record calculi use labels to distinguish between the elements of products and sums this paper presents novel variation type indexed rows in which labels are discarded and elements are indexed by their type alone the calculus lambda tir can express tuples recursive datatypes monomorphic records polymorphic extensible records and closed world style type based overloading our motivating application of lambda tir however is to encode the choice types of xml and the unordered tuple types of sgml indeed lambda tir is the kernel of the language xm lambda lazy functional language with direct support for xml types dtds and terms documents the system is built from rows equality constraints insertion constraints and constrained or qualified parametric polymorphism the test for constraint satisfaction is complete and for constraint entailment is only mildly incomplete we present type checking algorithm and show how lambda tir may be implemented by type directed translation which replaces type indexing by conventional natural number indexing though not presented in this paper we have also developed constraint simplification algorithm and type inference system
web page segmentation is crucial step for many applications in information retrieval such as text classification de duplication and full text search in this paper we describe new approach to segment html pages building on methods from quantitative linguistics and strategies borrowed from the area of computer vision we utilize the notion of text density as measure to identify the individual text segments of web page reducing the problem to solving partitioning task the distribution of segment level text density seems to follow negative hypergeometric distribution described by frumkina’s law our extensive evaluation confirms the validity and quality of our approach and its applicability to the web
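a rough sketch of the text density intuition: score each candidate block by running text per markup tag and keep the dense blocks as content segments; the crude tag stripping, the threshold and the example fragments are assumptions, not the paper's partitioning algorithm.

```python
import re

def text_density(block_html):
    text = re.sub(r"<[^>]+>", " ", block_html)          # strip tags (crude)
    tokens = text.split()
    tags = len(re.findall(r"<[^>]+>", block_html))
    return len(tokens) / (tags + 1)                     # tokens per tag, +1 avoids /0

def segment(blocks, threshold=3.0):
    """label each block as content or boilerplate by its text density."""
    labelled = []
    for b in blocks:
        density = text_density(b)
        labelled.append((density, "content" if density >= threshold else "boilerplate"))
    return labelled

page_blocks = [                                          # hypothetical page fragments
    "<div><a href='/'>home</a> <a href='/news'>news</a> <a href='/about'>about</a></div>",
    "<p>the committee approved the new budget after a long debate over transport "
    "spending and announced a public consultation for next month</p>",
]
for density, label in segment(page_blocks):
    print(round(density, 2), label)
```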
in networking it is often required to quantify by how much one protocol is fairer than another and how certain parameter setting and or protocol enhancements improve fairness this paper provides framework to evaluate the fairness of various protocols in general telecommunications network within this framework there are two key components benchmark and single dimension metric we suggest using the max min fairness bandwidth allocation as the benchmark and the euclidean distance between any bandwidth allocation under any protocol and the max min bandwidth allocation as the metric explicitly we provide method to compare the fairness of two sets of bandwidth allocation under two different protocols for given network by using this metric on the basis of this new framework we evaluate the fairness of fast tcp and tcp reno relative to the max min fairness criteria the distance between the max min fair allocation and allocations based on each of the two protocols is measured using the euclidean norm we derive explicit expressions for these distances for general network and compare the fairness of these two protocols by using their corresponding utility functions finally we numerically demonstrate how this method can be applied to compare the fairness of fast tcp and tcp reno for parking lot linear network and for the nsfnet backbone network in addition to mere comparison between protocols such numerical results can provide guidelines for better choice of parameters to make protocol fairer in given scenario
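a small sketch of the two framework components: a progressive filling routine for the max min fair benchmark and the euclidean distance metric against it; the parking lot style topology and the observed protocol rates are hypothetical.

```python
import math

def max_min_fair(capacity, routes):
    """progressive filling: raise all unfrozen flow rates equally until some link
    saturates, freeze the flows crossing that link, and repeat.
    capacity: {link: capacity}; routes: {flow: set of links it crosses}."""
    rate = {f: 0.0 for f in routes}
    frozen = set()
    residual = dict(capacity)
    while len(frozen) < len(routes):
        active = [f for f in routes if f not in frozen]
        # smallest equal increment that saturates some link carrying an active flow
        inc = min(residual[link] / sum(1 for f in active if link in routes[f])
                  for link in residual if any(link in routes[f] for f in active))
        for f in active:
            rate[f] += inc
            for link in routes[f]:
                residual[link] -= inc
        for link, left in residual.items():
            if left <= 1e-9:
                frozen.update(f for f in active if link in routes[f])
    return rate

def fairness_distance(alloc, benchmark):
    """euclidean distance between a protocol's allocation and the max-min benchmark."""
    return math.sqrt(sum((alloc[f] - benchmark[f]) ** 2 for f in benchmark))

capacity = {"l1": 10.0, "l2": 10.0}                       # hypothetical two-link path
routes = {"long": {"l1", "l2"}, "a": {"l1"}, "b": {"l2"}}
benchmark = max_min_fair(capacity, routes)                # long, a, b each get 5.0
observed = {"long": 2.0, "a": 8.0, "b": 8.0}              # hypothetical protocol outcome
print(benchmark, round(fairness_distance(observed, benchmark), 3))
```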
this paper presents solution for texture mapping unparameterized models the quality of texture on model is often limited by the model’s parameterization into texture space for models with complex topologies or complex distributions of structural detail finding this parameterization can be very difficult and usually must be performed manually through slow iterative process between the modeler and texture painter this is especially true of models which carry no natural parameterizations such as subdivision surfaces or models acquired from scanners instead we remove the parameterization and store the texture in space as sparse adaptive octree because no parameterization is necessary textures can be painted on any surface that can be rendered no mappings between disparate topologies are used so texture artifacts such as seams and stretching do not exist because this method is adaptive detail is created in the map only where required by the texture painter conserving memory usage
previewing links in hypertext navigation helps reduce the cognitive overhead associated with deciding whether or not to follow link in this paper we introduce new concept called dual use of image space duis and we show how it is used to provide preview information of image map links in duis the pixels in the image space are used both as shading information as well as characters which can be read this concept provides mechanism for placing the text information related to images in context that is the text is placed within the corresponding objects prior to duis contextualized preview of links was only possible with text links the following are the advantages of contextualized preview of image map links readers can benefit from both the text and the image without making visual saccades between the two the text does not obstruct the image as is the case in the existing techniques it is easy for the readers to associate the text and its corresponding image since the two are presented close to each other the text in the image space may also contain links and for this reason it is possible to introduce multiple links for image maps
we predict that the ever growing number of cores on our desktops will require re examination of concurrent programming two technologies are likely to become mainstream in response transactional memory provides superior programming model to traditional lock based concurrency while concurrent gc can take advantage of multiple cores to eliminate perceptible pauses in desktop applications such as games or internet telephony this paper proposes combination of the two technologies producing synergy that improves scalability while eliminating the annoyance of user perceivable pauses specifically we show how concurrent gc can share some of the mechanisms required for transactional memory thus as transactional memory becomes more efficient so too will concurrent gc we demonstrate how using state of the art software transactional memory system we can build state of the art concurrent collector our goal was to reduce of pause times to under one millisecond of the remainder we aim for to be under ms and of those left to be under ms our performance results show that we were able to achieve these targets with pause times between one or two orders of magnitude lower than mainstream technologies
in network processing involving operations such as filtering compression and fusion is widely used in sensor networks to reduce the communication overhead in many tactical and stream oriented wireless network applications both link bandwidth and node energy are critically constrained resources and in network processing itself imposes non negligible computing cost in this work we have developed unified and distributed closed loop control framework that computes both the optimal level of sensor stream compression performed by forwarding node and the best set of nodes where the stream processing operators should be deployed our framework extends the network utility maximization num paradigm where resource sharing among competing applications is modeled as form of distributed utility maximization we also show how our model can be adapted to more realistic cases where in network compression may be varied only discretely and where fusion operation cannot be fractionally distributed across multiple nodes
there has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not only offer benefits for web site structure improvement but also for better understanding of user navigational behavior in this paper we present web usage mining method which utilize web user usage and page linkage information to capture user access pattern based on probabilistic latent semantic analysis plsa model specific probabilistic model analysis algorithm em algorithm is applied to the integrated usage data to infer the latent semantic factors as well as generate user session clusters for revealing user access patterns experiments have been conducted on real world data set to validate the effectiveness of the proposed approach the results have shown that the presented method is capable of characterizing the latent semantic factors and generating user profile in terms of weighted page vectors which may reflect the common access interest exhibited by users among same session cluster
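a compact em sketch for plsa on a user by page count matrix, using numpy; the latent factor count, iteration budget and the toy visit counts are assumptions, and the session clustering built on top of the factors is omitted.

```python
import numpy as np

def plsa(counts, n_factors=2, iters=50, seed=0):
    """tiny em for plsa on a user-by-page count matrix:
    p(u, g) = sum_z p(z) p(u|z) p(g|z)."""
    rng = np.random.default_rng(seed)
    n_u, n_g = counts.shape
    pz = np.full(n_factors, 1.0 / n_factors)
    pu_z = rng.random((n_factors, n_u)); pu_z /= pu_z.sum(axis=1, keepdims=True)
    pg_z = rng.random((n_factors, n_g)); pg_z /= pg_z.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # e-step: responsibility p(z|u,g) for every (user, page) cell
        joint = pz[:, None, None] * pu_z[:, :, None] * pg_z[:, None, :]
        post = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # m-step: re-estimate the factors from expected counts
        expected = post * counts[None, :, :]
        pz = expected.sum(axis=(1, 2)); pz /= pz.sum()
        pu_z = expected.sum(axis=2); pu_z /= pu_z.sum(axis=1, keepdims=True) + 1e-12
        pg_z = expected.sum(axis=1); pg_z /= pg_z.sum(axis=1, keepdims=True) + 1e-12
    return pz, pu_z, pg_z

# hypothetical session-by-page visit counts: two users browse one page group, two the other
counts = np.array([[5, 4, 0, 0], [4, 6, 1, 0], [0, 0, 5, 7], [0, 1, 6, 4]], float)
pz, pu_z, pg_z = plsa(counts)
print(np.round(pg_z, 2))     # each latent factor tends to concentrate on one page group
```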
service oriented computing is an emerging paradigm with increasing impact on the way modern software systems are designed and developed services are autonomous loosely coupled and heterogeneous computational entities able to cooperate to achieve common goals this paper introduces model for service orchestration which combines exogenous coordination model with services interfaces annotated with behavioural patterns specified in process algebra which is parametric on the interaction discipline the coordination model is variant of reo for which new semantic model is proposed
it is well known that the use of native methods in java defeats java’s guarantees of safety and security which is why the default policy of java applets for example does not allow loading non local native code however there is already large amount of trusted native code that comprises significant portion of the java development kit jdk we have carried out an empirical security study on portion of the native code in sun’s jdk by applying static analysis tools and manual inspection we have identified in this security critical code previously undiscovered bugs based on our study we describe taxonomy to classify bugs our taxonomy provides guidance to construction of automated and accurate bug finding tools we also suggest systematic remedies that can mediate the threats posed by the native code
this paper presents an approach that helps to discover geographic locations from the recognition extraction and geocoding of urban addresses found in web pages experiments that evaluate the presence and incidence of urban addresses in web pages are described experimental results based on collection of over million documents from the brazilian web show the feasibility and effectiveness of the proposed method
verifying authenticity and integrity of delivered data is indispensable for security sensitive wireless sensor networks wsn unfortunately conventional security approaches are unsuitable for wsn because they do not treat energy efficiency as an important issue whereas energy conservation is truly critical issue in wsn in this paper proposed hybrid security system called energy efficient hybrid intrusion prohibition ehip system combines intrusion prevention with intrusion detection to provide an energy efficient and secure cluster based wsn cwsn the ehip system consists of authentication based intrusion prevention aip subsystem and collaboration based intrusion detection cid subsystem both subsystems provide heterogeneous mechanisms for different demands of security levels in cwsn to improve energy efficiency in aip two distinct authentication mechanisms are introduced to verify control messages and sensed data to prevent external attacks these two authentication mechanisms are customized according to the relative importance of information contained in control messages and sensed data however because the security threat from compromised sensor nodes cannot be fully avoided by aip cid is therefore proposed in cid the concept of collaborative monitoring is proposed to balance the tradeoff between network security and energy efficiency in order to evaluate the performance of ehip theoretical analyses and simulations of aip and cid are also presented in this paper simulation results fully support the theoretical analysis of ehip
column oriented database systems perform better than traditional row oriented database systems on analytical workloads such as those found in decision support and business intelligence applications moreover recent work has shown that lightweight compression schemes significantly improve the query processing performance of these systems one such lightweight compression scheme is to use dictionary in order to replace long variable length values of certain domain with shorter fixed length integer codes in order to further improve expensive query operations such as sorting and searching column stores often use order preserving compression schemes in contrast to the existing work in this paper we argue that order preserving dictionary compression does not only pay off for attributes with small fixed domain size but also for long string attributes with large domain size which might change over time consequently we introduce new data structures that efficiently support an order preserving dictionary compression for variable length string attributes with large domain size that is likely to change over time the main idea is that we model dictionary as table that specifies mapping from string values to arbitrary integer codes and vice versa and we introduce novel indexing approach that provides efficient access paths to such dictionary while compressing the index data our experiments show that our data structures are as fast as or in some cases even faster than other state of the art data structures for dictionaries while being less memory intensive
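a toy order preserving dictionary where codes are ranks in sorted order, so integer comparisons on the codes mirror string comparisons on the values; supporting updates without renumbering is the hard part the paper's indexed structures address, and this sketch simply rebuilds from scratch.

```python
import bisect

class OrderPreservingDict:
    """toy order-preserving string dictionary: encode maps a value to its rank,
    decode maps a code back to the value."""
    def __init__(self, values):
        self.sorted_values = sorted(set(values))
    def encode(self, value):
        i = bisect.bisect_left(self.sorted_values, value)
        if i == len(self.sorted_values) or self.sorted_values[i] != value:
            raise KeyError(value)
        return i
    def decode(self, code):
        return self.sorted_values[code]

d = OrderPreservingDict(["whole milk", "apples", "whole grain bread", "butter"])
codes = [d.encode(v) for v in ("apples", "butter", "whole milk")]
print(codes, codes == sorted(codes))          # order of codes mirrors string order
print(d.decode(codes[-1]))
```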
trajectories are spatio temporal traces of moving objects which contain valuable information to be harvested by spatio temporal data mining techniques applications like city traffic planning identification of evacuation routes trend detection and many more can benefit from trajectory mining however the trajectories of individuals often contain private and sensitive information so anyone who possesses trajectory data must take special care when disclosing this data removing identifiers from trajectories before the release is not effective against linkage type attacks and rich sources of background information make it even worse an alternative is to apply transformation techniques to map the given set of trajectories into another set where the distances are preserved this way the actual trajectories are not released but the distance information can still be used for data mining techniques such as clustering in this paper we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes the background knowledge is in the form of known trajectories and extra information such as the speed limit we provide analytical results which bound the number of the known trajectories needed to reconstruct private trajectories experiments performed on real trajectory data sets show that the number of known samples is surprisingly smaller than the actual theoretical bounds
many couples are forced to live apart for work school or other reasons this paper describes our study of such couples and what they lack from existing communication technologies we explored what they wanted to share presence mood environment daily events and activities how they wanted to share simple lightweight playful pleasant interaction and when they wanted to share empty moments such as waiting walking taking break waking up eating and going to sleep empty moments provide compelling new opportunity for design requiring subtlety and flexibility to enable participants to share connection without explicit messages we designed missu as technology probe to study empty moments in situ similar to private radio station missu shares music and background sounds field studies produced results relevant to social science technology and design couples with established routines were comforted characteristics such as ambiguity and movable technology situated in the home yet portable provide support these insights suggest design space for supporting the sharing of empty moments
we present parallel generational copying garbage collector implemented for the glasgow haskell compiler we use block structured memory allocator which provides natural granularity for dividing the work of gc between many threads leading to simple yet effective method for parallelising copying gc the results are encouraging we demonstrate wall clock speedups of on average factor of in gc time on commodity core machine with no programmer intervention compared to our best sequential gc
computational grid is composed of resources owned and controlled by number of geographically distributed organizations each organization demands site autonomy each organization must have complete direct control over their resources eg regulating how much resource can be used by whom and at what time one of the reasons for site autonomy is to ensure that local computation jobs within the organization can obtain the resources in timely fashion typically policy enforcement lies on the shoulders of the system administrators in each organization we see there is great need for developing new automated method to facilitate this process in recent years there has been great deal of research on grid economies to provide accounting management and economic incentives for allocating and sharing grid resources in this paper we describe control theoretic approach for automated enforcement of local policy built upon the notion of computation grid economy closed loop system is constructed by adding controller that manipulates the resource price using proportional integral control law our simulation results show that the controller helps the system to quickly achieve the targeted utilization and adapt to changing conditions
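a toy closed loop sketch of the proportional integral idea: utilisation above the target raises the resource price, which lowers demand until the target is reached; the price elastic demand curve, the capacity and the controller gains are hypothetical.

```python
def simulate(target_util=0.7, kp=0.5, ki=0.2, steps=30):
    """toy closed loop: demand (and hence utilisation) falls as price rises, and a
    proportional-integral law nudges the price until utilisation hits the target."""
    price, integral = 1.0, 0.0
    for t in range(steps):
        demand = 2.0 / price                  # hypothetical price-elastic demand for cpu hours
        utilization = min(demand / 2.5, 1.0)  # capacity of 2.5 units, clipped at 100%
        error = utilization - target_util     # positive error -> overloaded -> raise price
        integral += error
        price = max(0.1, price + kp * error + ki * integral)
        if t % 10 == 0:
            print(f"step {t:2d} price {price:.2f} utilization {utilization:.2f}")

simulate()
```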
this paper presents novel segmentation approach based on markov random field mrf fusion model which aims at combining several segmentation results associated with simpler clustering models in order to achieve more reliable and accurate segmentation result the proposed fusion model is derived from the recently introduced probabilistic rand measure for comparing one segmentation result to one or more manual segmentations of the same image this non parametric measure allows us to easily derive an appealing fusion model of label fields easily expressed as gibbs distribution or as nonstationary mrf model defined on complete graph concretely this gibbs energy model encodes the set of binary constraints in terms of pairs of pixel labels provided by each segmentation results to be fused combined with prior distribution this energy based gibbs model also allows for definition of an interesting penalized maximum probabilistic rand estimator with which the fusion of simple quickly estimated segmentation results appears as an interesting alternative to complex segmentation models existing in the literature this fusion framework has been successfully applied on the berkeley image database the experiments reported in this paper demonstrate that the proposed method is efficient in terms of visual evaluation and quantitative performance measures and performs well compared to the best existing state of the art segmentation methods recently proposed in the literature
the ary cube denoted by is one of the most important interconnection networks for parallel computing in this paper we consider the problem of embedding cycles and paths into faulty ary cubes let be set of faulty nodes and or edges and we show that when
distributed storage systems provide data availability by means of redundancy to assure given level of availability in case of node failures new redundant fragments need to be introduced since node failures can be either transient or permanent deciding when to generate new fragments is non trivial an additional difficulty is due to the fact that the failure behavior in terms of the rate of permanent and transient failures may vary over time to be able to adapt to changes in the failure behavior many systems adopt reactive approach in which new fragments are created as soon as failure is detected however reactive approaches tend to produce spikes in bandwidth consumption proactive approaches create new fragments at fixed rate that depends on the knowledge of the failure behavior or is given by the system administrator however existing proactive systems are not able to adapt to changing failure behavior which is common in real world we propose new technique based on an ongoing estimation of the failure behavior that is obtained using model that consists of network of queues this scheme combines the adaptiveness of reactive systems with the smooth bandwidth usage of proactive systems generalizing the two previous approaches now the duality reactive or proactive becomes specific case of wider approach tunable with respect to the dynamics of the failure behavior
proof carrying code pcc is general approach to mobile code safety in which the code supplier augments the program with certificate or proof the intended benefit is that the program consumer can locally validate the certificate wrt the untrusted program by means of certificate checker process which should be much simpler efficient and automatic than generating the original proof abstraction carrying code acc is an enabling technology for pcc in which an abstract model of the program plays the role of certificate the generation of the certificate ie the abstraction is automatically carried out by an abstract interpretation based analysis engine which is parametric wrt different abstract domains while the analyzer on the producer side typically has to compute semantic fixpoint in complex iterative process on the receiver it is only necessary to check that the certificate is indeed fixpoint of the abstract semantics equations representing the program this is done in single pass in much more efficient process acc addresses the fundamental issues in pcc and opens the door to the applicability of the large body of frameworks and domains based on abstract interpretation as enabling technology for pcc we present an overview of acc and we describe in tutorial fashion an application to the problem of resource aware security in mobile code essentially the information computed by cost analyzer is used to generate cost certificates which attest safe and efficient use of mobile code receiving side can then reject code which brings cost certificates which it cannot validate or which have too large cost requirements in terms of computing resources in time and or space and accept mobile code which meets the established requirements
we present refinement type based approach for the static verification of complex data structure invariants our approach is based on the observation that complex data structures are typically fashioned from two elements recursion eg lists and trees and maps eg arrays and hash tables we introduce two novel type based mechanisms targeted towards these elements recursive refinements and polymorphic refinements these mechanisms automate the challenging work of generalizing and instantiating rich universal invariants by piggybacking simple refinement predicates on top of types and carefully dividing the labor of analysis between the type system and an smt solver further the mechanisms permit the use of the abstract interpretation framework of liquid type inference to automatically synthesize complex invariants from simple logical qualifiers thereby almost completely automating the verification we have implemented our approach in dsolve which uses liquid types to verify ocaml programs we present experiments that show that our type based approach reduces the manual annotation required to verify complex properties like sortedness balancedness binary search ordering and acyclicity by more than an order of magnitude
we consider piecewise linear embeddings of graphs in space such an embbeding is linkless if every pair of disjoint cycles forms trivial link in the sense of knot theory robertson seymour and thomas showed that graph has linkless embedding in if and only if it does not contain as minor any of seven graphs in petersen’s family graphs obtained from by series of yδ and δy operations they also showed that graph is linklessly embeddable in if and only if it admits flat embedding into ie an embedding such that for every cycle of there exists closed disk with clearly every flat embeddings is linkless but the converse is not true we first consider the following algorithmic problem associated with embeddings in flat embedding for given graph either detect one of petersen’s family graphs as minor in or return flat and hence linkless embedding in the first outcome is certificate that has no linkless and no flat embeddings our first main result is to give an algorithm for this problem while there is known polynomial time algorithm for constructing linkless embeddings this is the first polynomial time algorithm for constructing flat embeddings in space and we thereby settle problem proposed by lovasz we also consider the following classical problem in topology the unknot problem decide if given knot is trivial or not this is fundamental problem in knot theory and low dimensional topology whose time complexity is unresolved it has been extensively studied by researchers working in computational geometry related problem is the link problem decide if two given knots form link hass lagarias and pippenger observed that polynomial time algorithm for the link problem yields polynomial time algorithm for the unknot problem we relate the link problem to the following problem that was proposed independently by lovasz and by robertson et al conjecture lovasz robertson seymour and thomas there is polynomial time algorithm to decide whether given embedding of graph in the space is linkless affirming this conjecture would clearly yield polynomial time solution for the link problem we prove that the converse is also true by providing polynomial time solution for the above conjecture if we are given polynomial time oracle for the link problem
different types of program profiles control flow value address and dependence have been collected and extensively studied by researchers to identify program characteristics that can then be exploited to develop more effective compilers and architectures due to the large amounts of profile data produced by realistic program runs most work has focused on separately collecting and compressing different types of profiles in this paper we present unified representation of profiles called whole execution trace wet which includes the complete information contained in each of the above types of traces thus wets provide basis for next generation software tool that will enable mining of program profiles to identify program characteristics that require understanding of relationships among various types of profiles the key features of our wet representation are wet is constructed by labeling static program representation with profile information such that relevant and related profile information can be directly accessed by analysis algorithms as they traverse the representation highly effective two tier strategy is used to significantly compress the wet and compression techniques are designed such that they do not adversely affect the ability to rapidly traverse wet for extracting subsets of information corresponding to individual profile types as well as combination of profile types eg in form of dynamic slices of wets our experimentation shows that on an average execution traces resulting from execution of million statements can be stored in megabytes of storage after compression the compression factors range from to moreover the rates at which different types of profiles can be individually or simultaneously extracted are high
in valid time indeterminacy it is known that an event stored in database did in fact occur but it is not known exactly when in this paper we extend the sql data model and query language to support valid time indeterminacy we represent the occurrence time of an event with set of possible instants delimiting when the event might have occurred and probability distribution over that set we also describe query language constructs to retrieve information in the presence of indeterminacy these constructs enable users to specify their credibility in the underlying data and their plausibility in the relationships among that data denotational semantics for sql’s select statement with optional credibility and plausibility constructs is given we show that this semantics is reliable in that it never produces incorrect information is maximal in that if it were extended to be more informative the results may not be reliable and reduces to the previous semantics when there is no indeterminacy although the extended data model and query language provide needed modeling capabilities these extensions appear initially to carry significant execution cost contribution of this paper is to demonstrate that our approach is useful and practical an efficient representation of valid time indeterminacy and efficient query processing algorithms are provided the cost of support for indeterminacy is empirically measured and is shown to be modest finally we show that the approach is general by applying it to the temporal query language constructs being proposed for sql
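a minimal sketch of an indeterminate occurrence time as a set of possible chronons with a probability mass, plus a before predicate filtered by a plausibility threshold; the chronon values, the distribution and the thresholds are hypothetical.

```python
shipment_received = {                      # possible days (chronons) and their probabilities
    11: 0.1, 12: 0.2, 13: 0.4, 14: 0.2, 15: 0.1,
}

def prob_before(event, instant):
    """probability mass of the event occurring strictly before the given instant."""
    return sum(p for chronon, p in event.items() if chronon < instant)

def satisfies(event, instant, plausibility):
    """indeterminate 'before' predicate with a user-supplied plausibility threshold."""
    return prob_before(event, instant) >= plausibility

print(prob_before(shipment_received, 14))          # 0.7
print(satisfies(shipment_received, 14, 0.95))      # too demanding: excluded from the answer
print(satisfies(shipment_received, 14, 0.50))      # included under a looser plausibility
```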
although large volume of literature is available on mobile commerce commerce the topic is still under development and offers potential opportunities for further research and applications since the subject is at the stage of development review of the literature on commerce with the objective of bringing to the fore the state of art in commerce research and applications will initiate further research on the growth of commerce technologies this paper reviews the literature on commerce and applications using suitable classification scheme to identify the gap between theory and practice and future research directions the commerce articles are classified and the results of these are presented based on scheme that consists of five distinct categories commerce theory and research wireless network infrastructure mobile middleware wireless user infrastructure and commerce applications and cases comprehensive list of references is presented we hope that the findings of this research will provide useful insights into the anatomy of commerce literature and be good source for anyone who is interested in commerce the paper also provides some future directions for research
software metrics can provide an automated way for software practitioners to assess the quality of their software the earlier in the software development lifecycle this information is available the more valuable it is since changes are much more expensive to make later in the lifecycle semantic metrics introduced by etzkorn and delugach assess software according to the meaning of the software’s functionality in its domain this is in contrast to traditional metrics which use syntax measures to assess code because semantic metrics do not rely on the syntax or structure of code they can be computed from requirements or design specifications before the system has been implemented this paper focuses on using semantic metrics to assess systems that have not yet been implemented
this article presents new method to animate photos of characters using motion capture data given single image of person or essentially human like subject our method transfers the motion of skeleton onto the subject’s shape in image space generating the impression of realistic movement we present robust solutions to reconstruct projective camera model and model pose which matches best to the given image depending on the reconstructed view shape template is selected which enables the proper handling of occlusions after fitting the template to the character in the input image it is deformed as rigid as possible by taking the projected motion data into account unlike previous work our method thereby correctly handles projective shape distortion it works for images from arbitrary views and requires only small amount of user interaction we present animations of diverse set of human and nonhuman characters with different types of motions such as walking jumping or dancing
existing keyword search systems in relational databases require users to submit complete query to compute answers often users feel left in the dark when they have limited knowledge about the data and have to use try and see approach for modifying queries and finding answers in this paper we propose novel approach to keyword search in the relational world called tastier tastier system can bring instant gratification to users by supporting type ahead search which finds answers on the fly as the user types in query keywords main challenge is how to achieve high interactive speed for large amounts of data in multiple tables so that query can be answered efficiently within milliseconds we propose efficient index structures and algorithms for finding relevant answers on the fly by joining tuples in the database we devise partition based method to improve query performance by grouping highly relevant tuples and pruning irrelevant tuples efficiently we also develop technique to answer query efficiently by predicting the highly relevant complete queries for the user we have conducted thorough experimental evaluation of the proposed techniques on real data sets to demonstrate the efficiency and practicality of this new search paradigm
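a small sketch of the type ahead step only: keywords kept sorted and each keystroke answered by a binary search over the prefix range, with postings giving candidate tuples; the keyword to tuple mapping is hypothetical and the on the fly joining of tuples across tables is omitted.

```python
import bisect

class TypeAhead:
    """toy prefix index: answers each keystroke with the keywords in the prefix range
    and the tuple ids they point to."""
    def __init__(self, keyword_to_tuples):
        self.keywords = sorted(keyword_to_tuples)
        self.postings = keyword_to_tuples
    def complete(self, prefix, limit=5):
        lo = bisect.bisect_left(self.keywords, prefix)
        hi = bisect.bisect_left(self.keywords, prefix + "\uffff")
        return self.keywords[lo:hi][:limit]
    def candidates(self, prefix):
        return {t for kw in self.complete(prefix) for t in self.postings[kw]}

index = TypeAhead({                      # hypothetical keyword -> tuple-id postings
    "database": [1, 4], "datalog": [2], "datamining": [3, 4], "query": [1, 2],
})
print(index.complete("data"))            # refreshed on every keystroke
print(index.candidates("datam"))
```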
in this paper we tackle the problem of helping domain experts to construct parameterize and deploy mashups of data and code we view mashup as data processing flow that describes how data is obtained from one or more sources processed by one or more components and finally sent to one or more sinks our approach allows specifying patterns of flows in language called cascade the patterns cover different possible variations of the flows including variations in the structure of the flow the components in the flow and the possible parameterizations of these components we present tool that makes use of this knowledge of flow patterns and associated metadata to allow domain experts to explore the space of possible flows described in the pattern the tool uses an ai planning approach to automatically build flow belonging to the flow pattern from high level goal specified as set of tags we describe examples from the financial services domain to show the use of flow patterns in allowing domain experts to construct large variety of mashups rapidly
meeting timing constraint is one of the most important issues for modern design automation tools this situation is exacerbated with the existence of process variation current high level synthesis tools performing task scheduling resource allocation and binding may result in unexpected performance discrepancy due to the ignorance of the impact of process variation which requires shift in the design paradigm from today’s deterministic design to statistical or probabilistic design in this paper we present variation aware performance yield guaranteed high level synthesis algorithm the proposed approach integrates high level synthesis and statistical static timing analysis into simulated annealing engine to simultaneously explore solution space while meeting design objectives our results show that the area reduction is in the average of when performance yield is imposed with the same total completion time constraint
we installed large plasma displays on the walls of seven inside offices of faculty and staff at university and displayed as the default image real time hdtv views of the immediate outside scene then utilizing field study methodology data were collected over week period to explore the user experience with these large display windows through the triangulation of data pages of interview transcripts journal entries and responses to email inquiries results showed that users deeply appreciated many aspects of their experience benefits included reported increase in users connection to the wider social community connection to the natural world psychological wellbeing and cognitive functioning users also integrated the large display window into their workplace practice however users expressed concerns particularly about the impacts on the privacy of people whose images were captured in the public place by the hdtv camera discussion focuses on design challenges for future investigations into related uses of large displays
we focus on the creative use of paper in the music composition process particularly the interaction between paper and end user programming when expressing musical ideas composers draw in precise way not just sketch working in close collaboration with composers we designed musink to provide them with smooth transition between paper drawings and openmusic flexible music composition tool musink’s built in recognizers handle common needs such as scoping and annotation users can also define new gestures and associate them with their own or predefined software functions musink supports semi structured delayed interpretation and serves as customizable gesture browser giving composers significant freedom to create their own individualized composition languages and to experiment with music on paper and on line
dynamic slicing is well known technique for program analysis debugging and understanding given program and input it finds all program statements which directly indirectly affect the values of some variables occurrences when is executed with in this article we develop dynamic slicing method for java programs our technique proceeds by backwards traversal of the bytecode trace produced by an input in given program since such traces can be huge we use results from data compression to compactly represent bytecode traces the major space savings in our method come from the optimized representation of data addresses used as operands by memory reference bytecodes and instruction addresses used as operands by control transfer bytecodes we show how dynamic slicing algorithms can directly traverse our compact bytecode traces without resorting to costly decompression we also extend our dynamic slicing algorithm to perform relevant slicing the resultant slices can be used to explain omission errors that is why some events did not happen during program execution detailed experimental results on space time overheads of tracing and slicing are reported in the article the slices computed at the bytecode level are translated back by our tool to the source code level with the help of information available in java class files our jslice dynamic slicing tool has been integrated with the eclipse platform and is available for usage in research and development
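a minimal sketch of backward dynamic slicing over an explicit trace of executed instances; control dependences, the compressed bytecode representation and the relevant slicing extension are omitted, and the toy trace is hypothetical.

```python
def dynamic_slice(trace, criterion_vars):
    """walk the execution trace backwards; keep an instance if it defines a variable
    that is currently relevant, and replace that variable by the instance's uses."""
    relevant, in_slice = set(criterion_vars), []
    for stmt, defs, uses in reversed(trace):
        if defs & relevant:
            in_slice.append(stmt)
            relevant = (relevant - defs) | uses
    return list(reversed(in_slice))

# hypothetical trace: (source line, variables defined, variables used) per executed instance
trace = [
    ("s1", {"a"}, set()),        # a = input()
    ("s2", {"b"}, set()),        # b = 10
    ("s3", {"c"}, {"a"}),        # c = a * 2
    ("s4", {"b"}, {"b"}),        # b = b + 1   (irrelevant to the criterion)
    ("s5", {"d"}, {"c"}),        # d = c - 5
]
print(dynamic_slice(trace, {"d"}))       # ['s1', 's3', 's5']
```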
the constrained expression approach to analysis of concurrent software systems can be used with variety of design and programming languages and does not require complete enumeration of the set of reachable states of the concurrent system the construction of toolset automating the main constrained expression analysis techniques and the results of experiments with that toolset are reported the toolset is capable of carrying out completely automated analyses of variety of concurrent systems starting from source code in an ada like design language and producing system traces displaying the properties represented by the analysts queries the strengths and weaknesses of the toolset and the approach are assessed on both theoretical and empirical grounds
in this paper we present an efficient scalable and general algorithm for performing set joins on predicates involving various similarity measures like intersect size jaccard coefficient cosine similarity and edit distance this expands the existing suite of algorithms for set joins on simpler predicates such as set containment equality and non zero overlap we start with basic inverted index based probing method and add sequence of optimizations that result in one to two orders of magnitude improvement in running time the algorithm folds in data partitioning strategy that can work efficiently with an index compressed to fit in any available amount of main memory the optimizations used in our algorithm generalize to several weighted and unweighted measures of partial word overlap between sets
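a basic inverted index probing sketch for a self join on jaccard similarity, the starting point the paper then optimises with ordering, partitioning and compression; the example records and threshold are made up.

```python
from collections import defaultdict

def jaccard(a, b):
    return len(a & b) / len(a | b)

def similarity_join(sets, threshold=0.5):
    """each set probes the index with its elements to collect candidates sharing at
    least one element, then every candidate pair is verified exactly."""
    index = defaultdict(set)                 # element -> ids of sets seen so far
    result = []
    for sid, s in enumerate(sets):
        candidates = set()
        for elem in s:
            candidates |= index[elem]
        for cid in candidates:               # verification step
            if jaccard(s, sets[cid]) >= threshold:
                result.append((cid, sid))
        for elem in s:                       # add the current set to the index
            index[elem].add(sid)
    return result

records = [{"data", "cleaning", "tools"}, {"data", "cleaning", "tool"},
           {"query", "optimizer"}, {"query", "optimization", "engine"}]
print(similarity_join(records, threshold=0.4))
```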
many real world applications of multiagent systems require independently designed and operated heterogeneous autonomous agents to interoperate we consider agents who offer business services and collaborate in interesting business service engagements we formalize notions of interoperability and conformance which appropriately support agent heterogeneity and autonomy with respect to autonomy our approach considers the choices that each agent has and how their choices are coordinated so that at any time one agent leads and its counterpart follows but with initiative fluidly shifting among the participants with respect to heterogeneity we characterize the variations in the agents’ designs and show how an agent may conform to specification or substitute for another agent our approach addresses challenging problem with multi party interactions that existing approaches cannot solve further we introduce set of edit operations by which to modify an agent design so as to ensure its conformance with others
we propose formal specification framework for functional aspects of services we define services as operations which are specified by means of pre and postconditions for the specification of which we use extensions of description logic the extensions of description logic and the specification framework itself are defined as institutions this gives the framework uniformity of definition and solid algebraic and logical foundation the framework can be used for the specification of service requests and service providers given signature morphism from request to provider we define when service request is matched by service provider which can be used in service discovery we provide model theoretic definition of matching and show that matching can be characterized by semantic entailment relation which is formulated over particular standard description logic thus proofs of matching can be reduced to standard reasoning in description logic for which one can use description logic reasoners
rich internet applications rias are quickly becoming the de facto standard for interactive web applications on the internet featuring rich interfaces that increase user usability and efficiency these technologies increase the complexity of implementing web applications making it difficult to address non functional requirements such as application quality and reliability there is much activity in developing modelling languages for web applications but rias introduce additional concerns for application developers without identifying the requirements of interactive web applications we cannot quantitatively compare different formal methodologies nor suggest they are robust enough for industry in this paper we present comprehensive list of web application modelling requirements derived from previous work and existing real world interactive web applications we use these requirements to then propose an industry inspired benchmarking application which allows us to evaluate approaches to handling the complexity of modelling real world applications
in modern day high performance processors the complexity of the register rename logic grows along with the pipeline width and leads to larger renaming time delay and higher power consumption renaming logic in the front end of the processor is one of the largest contributors of peak temperatures on the chip and so demands attention to reduce the power consumption further with the advent of clustered microarchitectures the rename map table at the front end is shared by the clusters and hence its critical path delay should not become bottleneck in determining the processor clock cycle time analysis of characteristics of spec integer benchmark programs reveals that when the programs are processed in wide processor none or only one two source instruction an instruction with two source registers is renamed in cycle for percent of the total execution time similarly in an wide processor none or only one two source instruction is renamed in cycle for percent of the total execution time thus the analysis observes that the rename map table port bandwidth is highly underutilized for significant portion of time based on the analysis in this paper we propose novel technique to significantly reduce the number of ports in the rename map table the novelty of the technique is that it is easy to implement and succeeds in reducing the access time power and area of the rename logic without any additional power area and delay overheads in any other logic on the chip the proposed technique performs the register renaming of instructions in the order of their fetch with no significant impact on the processor’s performance with this technique in an wide processor as compared to conventional rename map table in an integer pipeline with ports to look up source operands rename map table with nine ports results in reduction in access time power and area by percent percent and percent respectively with only percent loss in instructions committed per cycle ipc the implementation of the technique in wide processor results in reduction in access time power and area by percent percent and percent respectively with an ipc loss of only percent
as new generation of parallel supercomputers enables researchers to conduct scientific simulations of unprecedented scale and resolution terabyte scale simulation output has become increasingly commonplace analysis of such massive data sets is typically i/o bound many parallel analysis programs spend most of their execution time reading data from disk rather than performing useful computation to overcome this bottleneck we have developed new data access method our main idea is to cache copy of simulation output files on the local disks of an analysis cluster’s compute nodes and to use novel task assignment protocol to co locate data access with computation we have implemented our methodology in parallel disk cache system called zazen by avoiding the overhead associated with querying metadata servers and by reading data in parallel from local disks zazen is able to deliver sustained read bandwidth of over gigabytes per second on commodity linux cluster with nodes approaching the optimal aggregated bandwidth attainable on these nodes compared with conventional nfs pvfs and hadoop hdfs respectively zazen is and times faster for accessing large gb files and and times faster for accessing small mb files we have deployed zazen in conjunction with anton special purpose supercomputer that dramatically accelerates molecular dynamics md simulations and have been able to accelerate the parallel analysis of terabyte scale md trajectories by about an order of magnitude
we propose novel trust metric for social networks which is suitable for application to recommender systems it is personalised and dynamic and allows one to compute the indirect trust between two agents which are not neighbours based on the direct trust between agents that are neighbours in analogy to some personalised versions of pagerank this metric makes use of the concept of feedback centrality and overcomes some of the limitations of other trust metrics in particular it does not neglect cycles and other patterns characterising social networks as some other algorithms do in order to apply the metric to recommender systems we propose way to make trust dynamic over time we show by means of analytical approximations and computer simulations that the metric has the desired properties finally we carry out an empirical validation on dataset crawled from an internet community and compare the performance of recommender system using our metric to one using collaborative filtering
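a sketch of the general idea of feedback centrality based indirect trust, implemented here as a personalized pagerank style propagation of direct trust; the damping factor and normalization are assumptions and this is not the exact metric proposed in the paper.

```python
# personalized-pagerank-style propagation of direct trust scores; only an
# illustration of feedback-centrality-based indirect trust, with assumed
# 'alpha' and normalization.

def indirect_trust(direct, source, alpha=0.85, iters=50):
    """direct: dict node -> dict of neighbour -> direct trust weight.
    returns dict node -> indirect trust as seen from 'source'."""
    nodes = set(direct) | {v for nbrs in direct.values() for v in nbrs}
    trust = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - alpha) * (1.0 if n == source else 0.0) for n in nodes}
        for u, nbrs in direct.items():
            total = sum(nbrs.values())
            if total == 0:
                continue
            for v, w in nbrs.items():          # propagate u's trust along weighted edges
                new[v] += alpha * trust[u] * (w / total)
        trust = new
    return trust

if __name__ == "__main__":
    g = {"alice": {"bob": 0.9, "carol": 0.1},
         "bob":   {"dave": 1.0},
         "carol": {"dave": 0.5},
         "dave":  {"alice": 1.0}}              # note the cycle back to alice
    scores = indirect_trust(g, "alice")
    print({k: round(v, 3) for k, v in sorted(scores.items())})
```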
the set agreement problem is generalization of the uniform consensus problem each process proposes value and each non faulty process has to decide value such that decided value is proposed value and at most different values are decided it has been shown that any algorithm that solves the set agreement problem in synchronous systems that can suffer up to crash failures requires rounds in the worst case it has also been shown that it is possible to design early deciding algorithms where no process decides and halts after min rounds where is the number of actual crashes in run this paper explores new direction to solve the set agreement problem in synchronous system it considers that the system is enriched with base objects denoted sa objects that allow solving the set agreement problem in set of m processes the paper has several contributions it first proposes synchronous set agreement algorithm that benefits from such underlying base objects this algorithm requires tl mk rounds more precisely rt rounds where mod the paper then shows that this bound that involves all the parameters that characterize both the problem and its environment and is lower bound the proof of this lower bound sheds additional light on the deep connection between synchronous efficiency and asynchronous computability finally the paper extends its investigation to the early deciding case it presents set agreement algorithm that directs the processes to decide and stop by round rf min these bounds generalize the bounds previously established for solving the set problem in pure synchronous systems
traditionally computer interfaces have been confined to conventional displays and focused activities however as displays become embedded throughout our environment and daily lives increasing numbers of them must operate on the periphery of our attention peripheral displays can allow person to be aware of information while she is attending to some other primary task or activity we present the peripheral displays toolkit ptk toolkit that provides structured support for managing user attention in the development of peripheral displays our goal is to enable designers to explore different approaches to managing user attention the ptk supports three issues specific to conveying information on the periphery of human attention these issues are abstraction of raw input rules for assigning notification levels to input and transitions for updating display when input arrives our contribution is the investigation of issues specific to attention in peripheral display design and toolkit that encapsulates support for these issues we describe our toolkit architecture and present five sample peripheral displays demonstrating our toolkit’s capabilities
symbolic simulation and uninterpreted functions have long been staple techniques for formal hardware verification in recent years we have adapted these techniques for the automatic formal verification of low level embedded software specifically checking the equivalence of different versions of assembly language programs our approach though limited in scalability has proven particularly promising for the intricate code optimizations and complex architectures typical of high performance embedded software such as for dsps and vliw processors indeed one of our key findings was how easy it was to create or retarget our verification tools to different even very complex machines the resulting tools automatically verified or found previously unknown bugs in several small sequences of industrial and published example code this paper provides an introduction to these techniques and review of our results
it has become promising direction to measure similarity of web search queries by mining the increasing amount of click through data logged by web search engines which record the interactions between users and the search engines most existing approaches employ the click through data for similarity measure of queries with little consideration of the temporal factor while the click through data is often dynamic and contains rich temporal information in this paper we present new framework of time dependent query semantic similarity model on exploiting the temporal characteristics of historical click through data the intuition is that more accurate semantic similarity values between queries can be obtained by taking into account the timestamps of the log data with set of user defined calendar schema and calendar patterns our time dependent query similarity model is constructed using the marginalized kernel technique which can exploit both explicit similarity and implicit semantics from the click through data effectively experimental results on large set of click through data acquired from commercial search engine show that our time dependent query similarity model is more accurate than the existing approaches moreover we observe that our time dependent query similarity model can to some extent reflect real world semantics such as real world events that are happening over time
energy consumption is crucial factor in designing battery operated embedded and mobile systems the memory system is major contributor to the system energy in such environments in order to optimize energy and energy delay in the memory system we investigate ways of splitting the instruction cache into several smaller units each of which is cache by itself called subcache the subcache architecture employs page based placement strategy dynamic cache line remapping policy and predictive precharging policy in order to improve the memory system energy behavior using applications from the specjvm and specint benchmarks the proposed subcache architecture is shown to be effective in improving both the energy and energy delay metrics
model composition helps designers manage complexity by modeling different system views separately and later composing them into an integrated model in the past years researchers have focused on the definition of model composition approaches operators and the tools supporting them model composition engines testing model composition engines is hard it requires the synthesis and analysis of complex data structures models in this context synthesis means to assemble complex structures in coherent way with respect to semantic constraints in this paper we propose to automatically synthesize input data for model composition engines using model decomposition operator through this operator we synthesize models in coherent way satisfying semantic constraints and taking into account the complex mechanics involved in the model composition furthermore such operator enables straightforward analysis of the composition result
there is now extensive interest in reasoning about moving objects probabilistic spatio temporal pst knowledge base kb contains atomic statements of the form object is was will be in region at time with probability in the interval in this paper we study mechanisms for belief revision in pst kbs we propose multiple methods for revising pst kbs these methods involve finding maximally consistent subsets and maximal cardinality consistent subsets in addition there may be applications where the user has doubts about the accuracy of the spatial information or the temporal aspects or about the ability to recognize objects in such statements we study belief revision mechanisms that allow changes to the kb in each of these three components finally there may be doubts about the assignment of probabilities in the kb allowing changes to the probability of statements in the kb yields another belief revision mechanism each of these belief revision methods may be epistemically desirable for some applications but not for others we show that some of these approaches cannot satisfy agm style axioms for belief revision under certain conditions we also perform detailed complexity analysis of each of these approaches simply put all belief revision methods proposed that satisfy agm style axioms turn out to be intractable with the exception of the method that revises beliefs by changing the probabilities minimally in the kb we also propose two hybrids of these basic approaches to revision and analyze the complexity of these hybrid methods
encodings based on higher order abstract syntax represent the variables of an object language as the variables of meta language such encodings allow for the reuse of conversion substitution and hypothetical judgments already defined in the meta language and thus often lead to simple and natural formalization however it is also well known that there are some inherent difficulties with higher order abstract syntax in supporting recursive definitions we demonstrate novel approach to explicitly combining higher order abstract syntax with first order abstract syntax that makes use of restricted form of dependent types with this combination we can readily define recursive functions over first order abstract syntax while ensuring the correctness of these functions through higher order abstract syntax we present an implementation of substitution and verified evaluator for pure untyped call by value calculus
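for contrast with the higher order encoding discussed above, a plain first order treatment of the same object language: capture avoiding substitution and a call by value evaluator for the untyped lambda calculus; this is only an illustration of the object language, not the dependently typed combination the paper formalizes.

```python
# first-order abstract syntax for the untyped lambda calculus with
# capture-avoiding substitution and a call-by-value evaluator.
import itertools

_fresh = itertools.count()

def Var(x):      return ("var", x)
def Lam(x, b):   return ("lam", x, b)
def App(f, a):   return ("app", f, a)

def free_vars(t):
    tag = t[0]
    if tag == "var": return {t[1]}
    if tag == "lam": return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    """substitute s for x in t, renaming bound variables to avoid capture."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "app":
        return App(subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:
        return t                                  # x is shadowed
    if y in free_vars(s):                         # rename binder to a fresh name
        z = f"{y}_{next(_fresh)}"
        body = subst(body, y, Var(z))
        y = z
    return Lam(y, subst(body, x, s))

def eval_cbv(t):
    """call-by-value: reduce the function, then the argument, then beta-reduce."""
    if t[0] != "app":
        return t                                  # variables and lambdas are values
    f = eval_cbv(t[1])
    a = eval_cbv(t[2])
    if f[0] == "lam":
        return eval_cbv(subst(f[2], f[1], a))
    return App(f, a)

if __name__ == "__main__":
    identity = Lam("x", Var("x"))
    const = Lam("x", Lam("y", Var("x")))
    print(eval_cbv(App(App(const, identity), Lam("z", Var("z")))))  # ('lam', 'x', ('var', 'x'))
```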
spatial join finds pairs of spatial objects having specific spatial relationship in spatial database systems number of spatial join algorithms have recently been proposed in the literature most of them however perform the join in the original space joining in the original space has drawback of dealing with sizes of objects and thus has difficulty in developing formal algorithm that does not rely on heuristics in this paper we propose spatial join algorithm based on the transformation technique an object having size in the two dimensional original space is transformed into point in the four dimensional transform space and the join is performed on these point objects this can be easily extended to dimensional cases we show the excellence of the proposed approach through analysis and extensive experiments the results show that the proposed algorithm has performance generally better than that of the based algorithm proposed by brinkhoff et al this is strong indication that corner transformation preserves clustering among objects and that spatial operations can be performed better in the transform space than in the original space this reverses the common belief that transformation will adversely affect clustering we believe that our result will provide new insight towards transformation based spatial query processing
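to make the corner transformation concrete, a small sketch under simplifying assumptions: each rectangle (x_lo, y_lo, x_hi, y_hi) becomes a point in four dimensional transform space and the intersection join becomes a coordinate wise range predicate on those points; the paper performs this join with an index in transform space, while the sketch below uses a naive nested loop.

```python
# sketch of the corner-transformation idea: a rectangle with extent in 2-d
# original space becomes a point in 4-d transform space, and "r intersects s"
# becomes a simple coordinate-wise range predicate on those points.

def to_transform_space(rect):
    """rect = (x_lo, y_lo, x_hi, y_hi) in original space -> 4-d point."""
    return tuple(rect)

def intersects_in_transform_space(p, q):
    # the rectangles represented by p and q overlap iff each low corner lies
    # below the other's high corner in both axes.
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]

def spatial_join(rects_r, rects_s):
    pr = [to_transform_space(r) for r in rects_r]
    ps = [to_transform_space(s) for s in rects_s]
    return [(i, j) for i, p in enumerate(pr)
                   for j, q in enumerate(ps)
                   if intersects_in_transform_space(p, q)]

if __name__ == "__main__":
    R = [(0, 0, 2, 2), (5, 5, 6, 6)]
    S = [(1, 1, 3, 3), (7, 7, 8, 8)]
    print(spatial_join(R, S))   # [(0, 0)]
```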
security in ambient intelligence ami poses too many challenges due to the inherently insecure nature of wireless sensor nodes however there are two characteristics of these environments that can be used effectively to prevent detect and confine attacks redundancy and continuous adaptation in this article we propose global strategy and system architecture to cope with security issues in ami applications at different levels unlike in previous approaches we assume an individual wireless node is vulnerable we present an agent based architecture with supporting services that is proven to be adequate to detect and confine common attacks decisions at different levels are supported by trust based framework with good and bad reputation feedback while maintaining resistance to bad mouthing attacks we also propose set of services that can be used to handle identification authentication and authorization in intelligent ambients the resulting approach takes into account practical issues such as resource limitation bandwidth optimization and scalability
ranking is main research issue in ir styled keyword search over set of documents in this paper we study new keyword search problem called context sensitive document ranking which is to rank documents with an additional context that provides additional information about the application domain where the documents are to be searched and ranked the work is motivated by the fact that additional information associated with the documents can possibly assist users to find more relevant documents when they are unable to find the needed documents from the documents alone in this paper context is multi attribute graph which can represent any information maintained in relational database the context sensitive ranking is related to several research issues how to score documents how to evaluate the additional information obtained in the context that may contribute the document ranking how to rank the documents by combining the scores costs from the documents and the context more importantly the relationships between documents and the information stored in relational database may be uncertain because they are from different data sources and the relationships are determined systematically using similarity match which causes uncertainty in this paper we concentrate ourselves on these research issues and provide our solution on how to rank the documents in context where there exist uncertainty between the documents and the context we confirm the effectiveness of our approaches by conducting extensive experimental studies using real datasets
there exist many embedded applications such as those executing on set top boxes wireless base stations hdtv and mobile handsets that are structured as nested loops and benefit significantly from software managed memory prior work on scratchpad memories spms focused primarily on applications with regular data access patterns unfortunately some embedded applications do not fit in this category and consequently conventional spm management schemes will fail to produce the best results for them in this work we propose novel compilation strategy for data spms for embedded applications that exhibit irregular data access patterns our scheme divides the task of optimization between compiler and runtime the compiler processes each loop nest and inserts code to collect information at runtime then the code is modified in such fashion that depending on the collected information it dynamically chooses to use or not to use the data spm for given set of accesses to irregular arrays our results indicate that this approach is very successful with the applications that have irregular patterns and improves their execution cycles by about over state of the art spm management technique and over the conventional cache memories also the additional code size overhead incurred by our approach is less than for all the applications tested
hardware performance monitors provide detailed direct feedback about application behavior and are an additional source of information that compiler may use for optimization jit compiler is in good position to make use of such information because it is running on the same platform as the user applications as hardware platforms become more and more complex it becomes more and more difficult to model their behavior profile information that captures general program properties like execution frequency of methods or basic blocks may be useful but does not capture sufficient information about the execution platform machine level performance data obtained from hardware performance monitor can not only direct the compiler to those parts of the program that deserve its attention but also determine if an optimization step actually improved the performance of the application this paper presents an infrastructure based on dynamic compiler runtime environment for java that incorporates machine level information as an additional kind of feedback for the compiler and runtime environment the low overhead monitoring system provides fine grained performance data that can be tracked back to individual java bytecode instructions as an example the paper presents results for object co allocation in generational garbage collector that optimizes spatial locality of objects on line using measurements about cache misses in the best case the execution time is reduced by and cache misses by
multicore processors have emerged as powerful platform on which to efficiently exploit thread level parallelism tlp however due to amdahl’s law such designs will be increasingly limited by the remaining sequential components of applications to overcome this limitation it is necessary to design processors with many lower performance cores for tlp and some high performance cores designed to execute sequential algorithms such cores will need to address the memory wall by implementing kilo instruction windows large window processors require large load store queues that would be too slow if implemented using current cam based designs this paper proposes an epoch based load store queue elsq new design based on execution locality it is integrated into large window processor that has fast out of order core operating only on cache hits and slower cores that process misses and their dependent instructions the large lsq is coupled with the slow cores and is partitioned into small and local lsqs one per core we evaluate elsq in large window environment finding that it enables high performance at low power by exploiting locality among loads and stores elsq outperforms even an idealized central lsq when implemented on top of decoupled processor design
recent studies have shown the effectiveness of job co scheduling in alleviating shared cache contention on chip multiprocessors although program inputs affect cache usage and thus cache contention significantly their influence on co scheduling remains unexplored in this work we measure that influence and show that the ability to adapt to program inputs is important for co scheduler to work effectively on chip multiprocessors we then conduct an exploration in addressing the influence by constructing cross input predictive models for some memory behaviors that are critical for recently proposed co scheduler the exploration compares the effectiveness of both linear and non linear regression techniques in the model building finally we conduct systematic measurement of the sensitivity of co scheduling to the errors of the predictive behavior models the results demonstrate the potential of the predictive models in guiding contention aware co scheduling
in this paper flow sensitive context insensitive alias analysis in java is proposed it is more efficient and precise than previous analyses for and it does not negatively affect the safety of aliased references to this end we first present reference set alias representation second data flow equations based on the propagation rules for the reference set alias representation are introduced the equations compute alias information more efficiently and precisely than previous analyses for third for the constant time complexity of the type determination type table is introduced with reference variables and all possible types for each reference variable fourth an alias analysis algorithm is proposed which uses popular iterative loop method for an alias analysis finally running times of benchmark codes are compared for reference set and existing object pair representation
many searches on the web have transactional intent we argue that pages satisfying transactional needs can be distinguished from the more common pages that have some information and links but cannot be used to execute transaction based on this hypothesis we provide recipe for constructing transaction annotator by constructing an annotator with one corpus and then demonstrating its classification performance on another we establish its robustness finally we show experimentally that search procedure that exploits such pre annotation greatly outperforms traditional search for retrieving transactional pages
collaborative and content based filtering are the recommendation techniques most widely adopted to date traditional collaborative approaches compute similarity value between the current user and each other user by taking into account their rating style that is the set of ratings given on the same items based on the ratings of the most similar users commonly referred to as neighbors collaborative algorithms compute recommendations for the current user the problem with this approach is that the similarity value is only computable if users have common rated items the main contribution of this work is possible solution to overcome this limitation we propose new content collaborative hybrid recommender which computes similarities between users relying on their content based profiles in which user preferences are stored instead of comparing their rating styles in more detail user profiles are clustered to discover current user neighbors content based user profiles play key role in the proposed hybrid recommender traditional keyword based approaches to user profiling are unable to capture the semantics of user interests distinctive feature of our work is the integration of linguistic knowledge in the process of learning semantic user profiles representing user interests in more effective way compared to classical keyword based profiles due to sense based indexing semantic profiles are obtained by integrating machine learning algorithms for text categorization namely naïve bayes approach and relevance feedback method with word sense disambiguation strategy based exclusively on the lexical knowledge stored in the wordnet lexical database experiments carried out on content based extension of the eachmovie dataset show an improvement of the accuracy of sense based profiles with respect to keyword based ones when coping with the task of classifying movies as interesting or not for the current user an experimental session has been also performed in order to evaluate the proposed hybrid recommender system the results highlight the improvement in the predictive accuracy of collaborative recommendations obtained by selecting like minded users according to user profiles
this paper offers first in breadth survey and comparison of current aspect mining tools and techniques it focuses mainly on automated techniques that mine program’s static or dynamic structure for candidate aspects we present an initial comparative framework for distinguishing aspect mining techniques and assess known techniques against this framework the results of this assessment may serve as roadmap to potential users of aspect mining techniques to help them in selecting an appropriate technique it also helps aspect mining researchers to identify remaining open research questions possible avenues for future research and interesting combinations of existing techniques
in many distributed systems concurrent access is required to shared object where abstract object servers may incorporate type specific properties to define consistency requirements each operation and its outcome is treated as an event and conflicts may occur between different event types hence concurrency control and synchronization are required at the granularity of conflicting event types with such fine granularity of locking the occurrence of conflicts is likely to be lower than with whole object locking so optimistic techniques become more attractive this work describes the design implementation and performance of servers for shared atomic object semiqueue where each server employs either pessimistic or optimistic locking techniques on each conflicting event type we compare the performance of purely optimistic server purely pessimistic server and hybrid server which treats certain event types optimistically and others pessimistically to demonstrate the most appropriate environment for using pessimistic optimistic or hybrid control we show that the advantages of low overhead on optimistic locking at low conflict levels is offset at higher conflict levels by the wasted work done by aborted transactions to achieve optimum performance over the whole range of conflict levels an adaptable server is required whereby the treatment of conflicting event types can be changed dynamically between optimistic and pessimistic according to various criteria depending on the expected frequency of conflict we describe our implementations of adaptable servers which may allocate concurrency control strategy on the basis of state information the history of conflicts encountered or by using preset transaction priorities we show that the adaptable servers perform almost as well as the best of the purely optimistic pessimistic or hybrid servers under the whole range of conflict levels showing the versatility and efficiency of the dynamic servers finally we outline general design methodology for implementing adaptable concurrency control in servers for atomic objects illustrated using an atomic shared tree
in video on demand vod applications it is desirable to provide the user with the video cassette recorder like vcr capabilities such as fast forwarding video or jumping to specific frame we address this issue in the broadcast framework where each video is broadcast repeatedly on the network existing techniques rely on data prefetching as the mechanism to provide this functionality this approach provides limited usability since the prefetching rate cannot keep up with typical fast forward speeds fast forwarding video for several seconds would inevitably exhaust the prefetch buffer we address this practical problem in this paper by repeatedly broadcasting the interactive versions of the videos for instance an interactive version might contain only every fifth frame in the original video our client software leverages these interactive broadcasts to provide better vcr service we formally prove the correctness of this approach and compare its performance to prefetch method called active buffer management this scheme has been shown to offer in the broadcast environment the best performance to date our simulation results indicate that the new technique is superior in handling long duration vcr actions
motivated by the low structural fidelity for near regular textures in current texture synthesis algorithms we propose and implement an alternative texture synthesis method for near regular texture we view such textures as statistical departures from regular patterns and argue that thorough understanding of their structures in terms of their translation symmetries can enhance existing methods of texture synthesis we demonstrate the perils of texture synthesis for near regular texture and the promise of faithfully preserving the regularity as well as the randomness in near regular texture sample
large writes are beneficial both on individual disks and on disk arrays eg raid the presented design enables large writes of internal tree nodes and leaves it supports both in place updates and large append only log structured write operations within the same storage volume within the same tree and even at the same time the essence of the proposal is to make page migration inexpensive to migrate pages while writing them and to make such migration optional rather than mandatory as in log structured file systems the inexpensive page migration also aids traditional defragmentation as well as consolidation of free space needed for future large writes these advantages are achieved with very limited modification to conventional trees that also simplifies other tree operations eg key range locking and compression prior proposals and prototypes implemented transacted tree on top of log structured file systems and added transaction support to log structured file systems instead the presented design adds techniques and performance characteristics of log structured file systems to traditional trees and their standard transaction support notably without adding layer of indirection for locating tree nodes on disk the result retains fine granularity locking full transactional acid guarantees fast search performance etc expected of modern tree implementation yet adds efficient transacted page relocation and large high bandwidth writes
clustering is central task in data mining applications such as customer segmentation high dimensional data has always been challenge for clustering algorithms because of the inherent sparsity of the points therefore techniques have recently been proposed to find clusters in hidden subspaces of the data however since the behavior of the data can vary considerably in different subspaces it is often difficult to define the notion of cluster with the use of simple mathematical formalizations the widely used practice of treating clustering as the exact problem of optimizing an arbitrarily chosen objective function can often lead to misleading results in fact the proper clustering definition may vary not only with the application and data set but also with the perceptions of the end user this makes it difficult to separate the definition of the clustering problem from the perception of an end user in this paper we propose system which performs high dimensional clustering by cooperation between the human and the computer the complex task of cluster creation is accomplished through combination of human intuition and the computational support provided by the computer the result is system which leverages the best abilities of both the human and the computer for solving the clustering problem
physical simulation of dynamic objects has become commonplace in computer graphics because it produces highly realistic animations in this paradigm the animator provides few physical parameters such as the objects initial positions and velocities and the simulator automatically generates realistic motions the resulting motion however is difficult to control because even small adjustment of the input parameters can drastically affect the subsequent motion furthermore the animator often wishes to change the end result of the motion instead of the initial physical parameters we describe novel interactive technique for intuitive manipulation of rigid multi body simulations using our system the animator can select bodies at any time and simply drag them to desired locations in response the system computes the required physical parameters and simulates the resulting motion surface characteristics such as normals and elasticity coefficients can also be automatically adjusted to provide greater range of feasible motions if the animator so desires because the entire simulation editing process runs at interactive speeds the animator can rapidly design complex physical animations that would be difficult to achieve with existing rigid body simulators
gossip based algorithms were first introduced for reliably disseminating data in large scale distributed systems however their simplicity robustness and flexibility make them attractive for more than just pure data dissemination alone in particular gossiping has been applied to data aggregation overlay maintenance and resource allocation gossiping applications more or less fit the same framework with often subtle differences in algorithmic details determining divergent emergent behavior this divergence is often difficult to understand as formal models have yet to be developed that can capture the full design space of gossiping solutions in this paper we present brief introduction to the field of gossiping in distributed systems by providing simple framework and using that framework to describe solutions for various application domains
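a minimal example of one of the gossiping applications mentioned above, push pull averaging for data aggregation; peer sampling is uniform random and rounds are synchronous, which is a simplification of real protocols.

```python
# minimal push-pull averaging gossip: every round each node exchanges its
# estimate with a uniformly random peer and both keep the average, so all
# estimates converge to the global mean while the sum is preserved.
import random

def gossip_average(values, rounds=20, seed=0):
    """values: list of local node values; returns per-node estimates."""
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)            # pick a random peer
            avg = (est[i] + est[j]) / 2.0   # exchange and average (push-pull)
            est[i] = est[j] = avg
    return est

if __name__ == "__main__":
    vals = [10.0, 0.0, 4.0, 2.0]
    print([round(v, 3) for v in gossip_average(vals)])  # all close to 4.0
```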
recently developed quantitative model describing the dynamical response characteristics of primate cones is used for rendering high dynamic range hdr video the model provides range compression as well as luminance dependent noise suppression the steady state static version of the model provides global tone mapping algorithm for rendering hdr images both the static and dynamic cone models can be inverted enabling expansion of the hdr images and video that were compressed with the cone model
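as an illustration of the static global tone mapping operator mentioned above, here is a generic naka rushton style saturating response with a closed form inverse; the exponent and semi saturation constant are assumptions and this is not the specific primate cone model the abstract refers to.

```python
# illustrative global tone mapping in the spirit of a static cone response
# model: a naka-rushton-style nonlinearity r = L^n / (L^n + s^n), with a
# closed-form inverse for expansion. n and s are assumed parameters.
def compress(L, s=1.0, n=0.7):
    """map scene luminance L >= 0 to a display value in [0, 1)."""
    return L**n / (L**n + s**n)

def expand(r, s=1.0, n=0.7):
    """inverse of compress, recovering luminance from the compressed value."""
    return (r * s**n / (1.0 - r)) ** (1.0 / n)

if __name__ == "__main__":
    for L in (0.01, 1.0, 100.0, 10000.0):
        r = compress(L)
        print(f"L={L:10.2f}  r={r:.4f}  recovered={expand(r):10.2f}")
```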
reproducing bugs is hard deterministic replay systems address this problem by providing high fidelity replica of an original program run that can be repeatedly executed to zero in on bugs unfortunately existing replay systems for multiprocessor programs fall short these systems either incur high overheads rely on non standard multiprocessor hardware or fail to reliably reproduce executions their primary stumbling block is data races source of nondeterminism that must be captured if executions are to be faithfully reproduced in this paper we present odr software only replay system that reproduces bugs and provides low overhead multiprocessor recording the key observation behind odr is that for debugging purposes replay system does not need to generate high fidelity replica of the original execution instead it suffices to produce any execution that exhibits the same outputs as the original guided by this observation odr relaxes its fidelity guarantees to avoid the problem of reproducing data races altogether the result is system that replays real multiprocessor applications such as apache mysql and the java virtual machine and provides low record mode overhead
recent high level synthesis approaches and based hardware description languages attempt to improve the hardware design process by allowing developers to capture desired hardware functionality in well known high level source language however these approaches have yet to achieve wide commercial success due in part to the difficulty of incorporating such approaches into software tool flows the requirement of using specific language compiler or development environment may cause many software developers to resist such approaches due to the difficulty and possible instability of changing well established robust tool flows thus in the past several years synthesis from binaries has been introduced both in research and in commercial tools as means of better integrating with tool flows by supporting all high level languages and software compilers binary synthesis can be more easily integrated into software development tool flow by only requiring an additional backend tool and it even enables completely transparent dynamic translation of executing binaries to configurable hardware circuits in this article we survey the key technologies underlying the important emerging field of binary synthesis we compare binary synthesis to several related areas of research and we then describe the key technologies required for effective binary synthesis decompilation techniques necessary for binary synthesis to achieve results competitive with source level synthesis hardware software partitioning methods necessary to find critical binary regions suitable for synthesis synthesis methods for converting regions to custom circuits and binary update methods that enable replacement of critical binary regions by circuits
the task of generating minimal models of knowledge base is at the computational heart of diagnosis systems like truth maintenance systems and of nonmonotonic systems like autoepistemic logic default logic and disjunctive logic programs unfortunately it is np hard in this paper we present hierarchy of classes of knowledge bases with the following properties first is the class of all horn knowledge bases second if knowledge base is in then has at most minimal models and all of them may be found in time lk where is the length of the knowledge base third for an arbitrary knowledge base we can find the minimum such that belongs to in time polynomial in the size of and last where is the class of all knowledge bases it is the case that that is every knowledge base belongs to some class in the hierarchy the algorithm is incremental that is it is capable of generating one model at time
we describe two level push relabel algorithm for the maximum flow problem and compare it to the competing codes the algorithm generalizes practical algorithm for bipartite flows experiments show that the algorithm performs well on several problem families
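for reference, a compact generic push relabel implementation; it shows the push and relabel operations the two level algorithm builds on, but none of the two level or bipartite specific techniques of the paper.

```python
# generic fifo push-relabel maximum flow; illustrative, not the two-level
# variant described in the abstract.
from collections import deque

def max_flow(n, edges, s, t):
    """n nodes (0..n-1), edges = list of (u, v, capacity); returns max s-t flow."""
    # residual graph: adjacency lists of [to, residual_capacity, index_of_reverse]
    graph = [[] for _ in range(n)]
    for u, v, c in edges:
        graph[u].append([v, c, len(graph[v])])
        graph[v].append([u, 0, len(graph[u]) - 1])

    height = [0] * n
    excess = [0] * n
    height[s] = n
    for e in graph[s]:                       # saturate all edges out of the source
        v, c, rev = e
        e[1] = 0
        graph[v][rev][1] += c
        excess[v] += c
        excess[s] -= c

    active = deque(u for u in range(n) if u not in (s, t) and excess[u] > 0)
    while active:
        u = active.popleft()
        while excess[u] > 0:                 # discharge u completely
            pushed = False
            for e in graph[u]:
                v, cap, rev = e
                if cap > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], cap)  # push d units along (u, v)
                    e[1] -= d
                    graph[v][rev][1] += d
                    excess[u] -= d
                    excess[v] += d
                    if v not in (s, t) and excess[v] > 0 and v not in active:
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # relabel: raise u just above its lowest residual neighbour
                height[u] = 1 + min(height[e[0]] for e in graph[u] if e[1] > 0)
    return excess[t]

if __name__ == "__main__":
    # small example: max flow from node 0 to node 3 is 3
    E = [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 1), (2, 3, 2)]
    print(max_flow(4, E, 0, 3))
```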
extraction of addresses and location names from web pages is challenging task for search engines traditional information extraction and natural language processing models remain unsuccessful in the context of the web because of the uncontrolled heterogeneous nature of the web resources as well as the effects of html and other markup tags we describe new pattern based approach for extraction of addresses from web pages both html and vision based segmentations are used to increase the quality of address extraction the proposed system uses several address patterns and small table of geographic knowledge to hit addresses and then itemize them into smaller components the experiments show that this model can extract and itemize different addresses effectively without large gazetteers or human supervision
this paper presents an overview of dolap the th acm international workshop on data warehousing and olap held on november in lisbon portugal in conjunction with cikm the acm th conference on information and knowledge management
fault tolerance is one of the most important means to avoid service failure in the presence of faults so as to guarantee they will not interrupt the service delivery software testing instead is one of the major fault removal techniques realized in order to detect and remove software faults during software development so that they will not be present in the final product this paper shows how fault tolerance and testing can be used to validate component based systems fault tolerance requirements guide the construction of fault tolerant architecture which is successively validated with respect to requirements and submitted to testing the theory is applied to mining control system running example
the recent introduction of several pieces of legislation mandating minimum and maximum retention periods for corporate records has prompted the enterprise content management ecm community to develop various records retention solutions records retention is significant subfield of records management and legal records retention requirements apply over corporate records regardless of their shape or form unfortunately the scope of existing solutions has been largely limited to proper identification classification and retention of documents and not of data more generally in this paper we address the problem of managed records retention in the context of relational database systems the problem is significantly more challenging than it is for documents for several reasons foremost there is no clear definition of what constitutes business record in relational databases it could be an entire table tuple part of tuple or parts of several tuples from multiple tables there are also no standardized mechanisms for purging anonymizing and protecting relational records functional dependencies user defined constraints and side effects caused by triggers make it even harder to guarantee that any given record will actually be protected when it needs to be protected or expunged when the necessary conditions are met most importantly relational tuples may be organized such that one piece of data may be part of various legal records and subject to several possibly conflicting retention policies we address the above problems and present complete solution for designing managing and enforcing records retention policies in relational database systems we experimentally demonstrate that the proposed framework can guarantee compliance with broad range of retention policies on an off the shelf system without incurring significant performance overhead for policy monitoring and enforcement
splice sites define the boundaries of exonic regions and dictate protein synthesis and function the splicing mechanism involves complex interactions among positional and compositional features of different lengths computational modeling of the underlying constructive information is especially challenging in order to decipher splicing inducing elements and alternative splicing factors spliceit splice identification technique introduces hybrid method for splice site prediction that couples probabilistic modeling with discriminative computational or experimental features inferred from published studies in two subsequent classification steps the first step is undertaken by gaussian support vector machine svm trained on the probabilistic profile that is extracted using two alternative position dependent feature selection methods in the second step the extracted predictions are combined with known species specific regulatory elements in order to induce tree based modeling the performance evaluation on human and arabidopsis thaliana splice site datasets shows that spliceit is highly accurate compared to current state of the art predictors in terms of the maximum sensitivity specificity tradeoff without compromising space complexity and in time effective way the source code and supplementary material are available at http wwwmedauthgr research spliceit
in recent years scaling of single core superscalar processor performance has slowed due to complexity and power considerations to improve program performance designs are increasingly adopting chip multiprocessing with homogeneous or heterogeneous cmps by trading off features from modern aggressive superscalar core cmps often offer better scaling characteristics in terms of aggregate performance complexity and power but often require additional software investment to rewrite retune or recompile programs to take advantage of the new designs the cell broadband engine is modern example of heterogeneous cmp with coprocessors accelerators which can be found in supercomputers roadrunner blade servers ibm qs and video game consoles scei ps cell be processor has host power risc processor ppe and eight synergistic processor elements spe each consisting of synergistic processor unit spu and memory flow controller mfc in this work we explore the idea of offloading automatic dynamic garbage collection gc from the host processor onto accelerator processors using the coprocessor paradigm offloading part or all of gc to coprocessor offers potential performance benefits because while the coprocessor is running gc the host processor can continue running other independent more general computations we implement bdw garbage collection on cell system and offload the mark phase to the spe co processor we show mark phase execution on the spe accelerator to be competitive with execution on full fledged ppe processor we also explore object based and block based caching strategies for explicitly managed memory hierarchies and explore the effectiveness of several prefetching schemes in the context of garbage collection finally we implement capitulative loads using the dma by extending software caches and quantify its performance impact on the coprocessor
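a small sketch of the offloaded part in isolation: the mark phase of a tracing collector as a worklist traversal over a toy heap; the dma transfers, software caches and cell specific details from the abstract are not modelled.

```python
# mark phase of a tracing (mark-sweep) collector over a toy heap, using an
# explicit worklist; generic algorithm only, no cell/spe specifics.

def mark(heap, roots):
    """heap: dict object_id -> list of referenced object_ids.
    returns the set of reachable (live) object ids."""
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)                       # set the mark bit
        worklist.extend(heap.get(obj, []))    # scan the object's reference fields
    return marked

def sweep(heap, marked):
    """reclaim every object that was not marked."""
    return {obj: refs for obj, refs in heap.items() if obj in marked}

if __name__ == "__main__":
    heap = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["e"], "e": []}
    live = mark(heap, roots=["a"])
    print(sorted(live))                 # ['a', 'b', 'c']; 'd' and 'e' are garbage
    print(sorted(sweep(heap, live)))    # ['a', 'b', 'c']
```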
the goal of statistical disclosure control sdc is to modify statistical data so that it can be published without releasing confidential information that may be linked to specific respondents the challenge for sdc is to achieve this variation with minimum loss of the detail and accuracy sought by final users there are many approaches to evaluate the quality of protection method however all these measures are only applicable to numerical or categorical attributes in this paper we present some recent results about time series protection and re identification we propose complete framework to evaluate time series protection methods we also present some empirical results to show how our framework works
poorly designed olap on line analytical processing cube can have size much larger than the volume of information potentially leading to problems with performance and usability we give new normal form for olap cube design and synthesis and decomposition algorithms to produce normalised olap cube schemata olap cube normalisation controls the structural sparsity resulting from inter dimensional functional dependencies we assume that functional dependencies are used to describe the constraints of the application universe of discourse our methods help the user to identify cube schemata with structural sparsity and to change the design in order to obtain more economy of space
the social impact from the world wide web cannot be underestimated but technologies used to build the web are also revolutionizing the sharing of business and government information within intranets in many ways the lessons learned from the internet carry over directly to intranets but others do not apply in particular the social forces that guide the development of intranets are quite different and the determination of good answer for intranet search is quite different than on the internet in this paper we study the problem of intranet search our approach focuses on the use of rank aggregation and allows us to examine the effects of different heuristics on ranking of search results
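as one concrete instance of rank aggregation for intranet search, a borda count style positional aggregation over several ranked lists; the heuristics actually examined in the paper may differ, so treat this as an illustrative baseline.

```python
# borda-count rank aggregation over several ranked result lists; one simple
# aggregation heuristic, not necessarily the one studied in the paper.
from collections import defaultdict

def borda_aggregate(rankings):
    """rankings: list of ranked lists of document ids (best first).
    returns a single ranked list by total borda score."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            scores[doc] += n - pos          # top position gets the most points
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    by_anchor_text = ["doc3", "doc1", "doc2"]
    by_url_depth   = ["doc1", "doc3", "doc4"]
    by_clicks      = ["doc1", "doc2", "doc3"]
    print(borda_aggregate([by_anchor_text, by_url_depth, by_clicks]))  # doc1 first
```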
in this paper we show how conceptual graphs cg are powerful metaphor for identifying and understanding the wc resource description framework we also present cg as target language and graph homomorphism as an abstract machine to interpret implement rdf sparql and rules we show that cg components can be used to implement such notions as named graphs and properties as resources in brief we think that cg are an excellent framework to progress in the semantic web because the wc now considers that rdf graphs are along with xml trees one of the two standard formats for the web
whereas physical database tuning has received lot of attention over the last decade logical database tuning seems to be under studied we have developed project called dba companion devoted to the understanding of logical database constraints from which logical database tuning can be achieved in this setting two main data mining issues need to be addressed the first one is the design of efficient algorithms for functional dependencies and inclusion dependencies inference and the second one is about the interestingness of the discovered knowledge in this paper we point out some relationships between database analysis and data mining in this setting we sketch the underlying themes of our approach some database applications that could benefit from our project are also described including logical database tuning
increasingly prominent variational effects impose imminent threat to the progress of vlsi technology this work explores redundancy which is well known fault tolerance technique for variation tolerance it is observed that delay variability can be reduced by making redundant paths distributed or less correlated based on this observation gate splitting methodology is proposed for achieving distributed redundancy we show how to avoid short circuit and estimate delay in dual driver nets which are caused by gate splitting spin off gate placement heuristic is developed to minimize redundancy cost monte carlo simulation results on benchmark circuits show that our method can improve timing yield from to with only increase on cell area and increase on wirelength on average
named entity recognition studies the problem of locating and classifying parts of free text into set of predefined categories although extensive research has focused on the detection of person location and organization entities there are many other entities of interest including phone numbers dates times and currencies to name few examples we refer to these types of entities as semi structured named entities since they usually follow certain syntactic formats according to some conventions although their structure is typically not well defined regular expression solutions require significant amount of manual effort and supervised machine learning approaches rely on large sets of labeled training data therefore these approaches do not scale when we need to support many semi structured entity types in many languages and regions in this paper we study this problem and propose novel three level bootstrapping framework for the detection of semi structured entities we describe the proposed techniques for phone date and time entities and perform extensive evaluations on english german polish swedish and turkish documents despite the minimal input from the user our approach can achieve precision and recall for phone entities and precision and recall for date and time entities on average we also discuss implementation details and report run time performance results which show significant improvements over regular expression based solutions
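to make the notion of semi structured entities concrete, a small regular expression baseline for phone and date entities; the patterns below are illustrative assumptions and exactly the kind of hand written rules the abstract argues do not scale across languages and regions, which motivates its bootstrapping framework.

```python
# regex baseline for two semi-structured entity types (phone numbers and
# dates); the patterns are illustrative assumptions, not the paper's method.
import re

PHONE = re.compile(r"""
    (?:\+?\d{1,3}[\s.-])?        # optional country code, e.g. "+1 "
    (?:\(\d{1,4}\)[\s.-]?)?      # optional area code in parentheses
    \d{3}[\s.-]\d{3,4}           # local number, e.g. 555-0199
    (?:[\s.-]\d{3,4})?           # optional extra digit group
""", re.VERBOSE)

DATE = re.compile(r"""
    \b(?:\d{1,2}[./-]\d{1,2}[./-]\d{2,4}   # 31.12.2024, 12/31/24
      |\d{4}-\d{2}-\d{2})\b                # 2024-12-31
""", re.VERBOSE)

def extract(text):
    return {"phone": [m.group() for m in PHONE.finditer(text)],
            "date": DATE.findall(text)}

if __name__ == "__main__":
    sample = "call +1 (800) 555-0199 before 31.12.2024 or after 2025-01-15"
    print(extract(sample))   # one phone entity and two date entities
```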
technologies for the efficient and effective reuse of ontological knowledge are one of the key success factors for the semantic web putting aside matters of cost or quality being reusable is an intrinsic property of ontologies originally conceived of as means to enable and enhance the interoperability between computing applications this article gives an account based on empirical evidence and real world findings of the methodologies methods and tools currently used to perform ontology reuse processes we study the most prominent case studies on ontology reuse published in the knowledge ontology engineering literature from the early nineties this overview is complemented by two self conducted case studies in the areas of ehealth and erecruitment in which we developed semantic web ontologies for different scopes and purposes by resorting to existing ontological knowledge on the web based on the analysis of the case studies we are able to identify series of research and development challenges which should be addressed to ensure reuse becomes feasible alternative to other ontology engineering strategies such as development from scratch in particular we emphasize the need for context and task sensitive treatment of ontologies both from an engineering and usage perspective and identify the typical phases of reuse processes which could profit considerably from such an approach further on we argue for the need for ontology reuse methodologies which optimally exploit human and computational intelligence to effectively operationalize reuse processes
query logs the patterns of activity left by millions of users contain wealth of information that can be mined to aid personalization we perform large scale study of yahoo search engine logs tracking million browser cookies over period of months we define metrics to address questions such as how much history is available how do users topical interests vary as reflected by their queries and what can we learn from user clicks we find that there is significantly more expected history for the user of randomly picked query than for randomly picked user we show that users exhibit consistent topical interests that vary between users we also see that user clicks indicate variety of special interests our findings shed light on user activity and can inform future personalization efforts
we present novel approach for interactive multimedia content creation that establishes an interactive environment in cyberspace in which users interact with autonomous agents generated from video images of real world creatures each agent has autonomy personality traits and behaviors that reflect the results of various interactions determined by an emotional model with fuzzy logic after an agent’s behavior is determined sequence of video images that best match the determined behavior is retrieved from the database in which variety of video image sequences of the real creature’s behaviors are stored the retrieved images are successively displayed on the cyberspace to make it responsive thus the autonomous agent behaves continuously in addition an explicit sketch based method directly initiate the reactive behavior of the agent without involving the emotional process this paper describes the algorithm that establishes such an interactive system first an image processing algorithm to generate video database is described then the process of behavior generation using emotional models and sketch based instruction are introduced finally two application examples are demonstrated video agents with humans and goldfish
in this paper we propose new intelligent cross layer qos support for wireless mobile ad hoc networks the solution named fuzzyqos exploits fuzzy logic for improving traffic regulation and the control of congestion to support both real time multimedia audio video services and non real time traffic services fuzzyqos includes three contributions fuzzy logic approach for best effort traffic regulation fuzzyqos new fuzzy petri nets technique fuzzyqos for modeling and analyzing the qos decision making for traffic regulation control and fuzzy logic approach for threshold buffer management fuzzyqos in fuzzyqos the feedback delay information received from the network is used to perform fuzzy regulation for best effort traffic using fuzzy logic fuzzyqos uses fuzzy thresholds to adapt to the dynamic conditions the evaluation of fuzzyqos performances was studied under different mobility channel and traffic conditions the results of simulations confirm that cross layer design using fuzzy logic at different levels can achieve low and stable end to end delay and high throughput under different network conditions these results will benefit delay and jitter sensitive real time services
we present method for progressive deforming meshes most existing mesh decimation methods focus on static meshes however there are more and more animation data today and it is important to address the problem of simplifying deforming meshes our method is based on deformation oriented decimation dod error metric and dynamic connectivity updating dcu algorithm deformation oriented decimation extends the deformation sensitivity decimation dsd error metric by augmenting an additional term to model the distortion introduced by deformation this new metric preserves not only geometric features but also areas with large deformation using this metric static reference connectivity is extracted for the whole animation dynamic connectivity updating algorithm utilizes vertex trees to further reduce geometric distortion by allowing the connectivity to change temporal coherence in the dynamic connectivity between frames is achieved by penalizing large deviations from the reference connectivity the combination of dod and dcu demonstrates better simplification and triangulation performance than previous methods for deforming mesh simplification
since its introduction in bounded model checking has gained industrial relevance for detecting errors in digital and hybrid systems one of the main reasons for this is that it always provides counterexample when an erroneous execution trace is found such counterexample can guide the designer while debugging the system in this paper we are investigating how bounded model checking can be applied to generate counterexamples for different kind of model namely discrete time markov chains since in this case counterexamples in general do not consist of single path to safety critical state but of potentially large set of paths novel optimization techniques like loop detection are applied not only to speed up the counterexample computation but also to reduce the size of the counterexamples significantly we report on some experiments which demonstrate the practical applicability of our method
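As a toy illustration of the counterexample notion used here (not the SAT-based bounded model checking or loop detection of the paper), the sketch below enumerates paths of a small discrete-time Markov chain up to a depth bound and collects paths reaching a bad state until their probability mass exceeds the violated threshold. The DTMC, state names, and bounds are made up.

```python
# a counterexample for "P(reach bad) <= p" in a DTMC is a set of paths into
# the bad states whose probabilities sum to more than p; this brute-force
# depth-bounded enumeration only illustrates that notion.

def bounded_counterexample(trans, start, bad, p_bound, depth):
    """trans: {state: [(next_state, prob), ...]}"""
    paths, total = [], 0.0
    stack = [((start,), 1.0)]
    while stack and total <= p_bound:
        path, prob = stack.pop()
        state = path[-1]
        if state in bad:
            paths.append((path, prob))
            total += prob
            continue
        if len(path) <= depth:                 # stop expanding at the depth bound
            for nxt, pr in trans.get(state, []):
                stack.append((path + (nxt,), prob * pr))
    return (paths, total) if total > p_bound else None

dtmc = {"s0": [("s1", 0.5), ("s2", 0.5)],
        "s1": [("bad", 0.4), ("s0", 0.6)],
        "s2": [("s2", 1.0)]}
print(bounded_counterexample(dtmc, "s0", {"bad"}, 0.2, depth=6))
```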
dynamic voltage and frequency scaling dvfs is an effective technique for controlling microprocessor energy and performance existing dvfs techniques are primarily based on hardware os time interrupts or static compiler techniques however substantially greater gains can be realized when control opportunities are also explored in dynamic compilation environment there are several advantages to deploying dvfs and managing energy performance tradeoffs through the use of dynamic compiler most importantly dynamic compiler driven dvfs is fine grained code aware and adaptive to the current microarchitecture environment this paper presents design framework of the run time dvfs optimizer in general dynamic compilation system prototype of the dvfs optimizer is implemented and integrated into an industrial strength dynamic compilation system the obtained optimization system is deployed in real hardware platform that directly measures cpu voltage and current for accurate power and energy readings experimental results based on physical measurements for over spec or olden benchmarks show that significant energy savings are achieved with little performance degradation speck fp benchmarks benefit with energy savings of up to with performance loss in addition speck int show up to energy savings with performance loss spec fp save up to with performance loss and olden save up to with performance loss on average the technique leads to an energy delay product edp improvement that is better than static voltage scaling and is more than vs better than the reported dvfs results of prior static compiler work while the proposed technique is an effective method for microprocessor voltage and frequency control the design framework and methodology described in this paper have broader potential to address other energy and power issues such as di dt and thermal control
fundamental problem in the implementation of object oriented languages is that of frugal implementation of dynamic dispatching that is small footprint data structure that supports quick response to runtime dispatching queries of the following format which method should be executed in response to certain message sent to given object previous theoretical algorithms for this problem tend to be impractical due to their conceptual complexity and large hidden constants in contrast successful practical heuristics lack theoretical support the contribution of this article is in novel type slicing technique which results in two dispatching schemes ts and ctd we make the case for these schemes both practically and theoretically the empirical findings on corpus of hierarchies totaling some thousand types from eight different languages demonstrate improvement over previous results in terms of the space required for the representation and the time required for computing it the theoretical analysis is with respect to iota the best possible compression factor of the dispatching matrix the results are expressed as function of parameter kappa which can be thought of as metric of the complexity of the topology of multiple inheritance hierarchy in single inheritance hierarchies kappa equals but although kappa can be in the order of the size of the hierarchy it is typically small constant in actual use of inheritance in our corpus the median value of kappa is while its average is the ts scheme generalizes the famous interval containment technique to multiple inheritance ts achieves compression factor of iota kappa that is our generalization comes with an increase to the space requirement by small factor of kappa the pay is in the dispatching time which is no longer constant as in naive matrix implementation but logarithmic in the number of different method implementations in practice dispatching uses one indirect branch and on average only binary branches the ct schemes are sequence of algorithms ct ct ct … where ctd uses memory dereferencing operations during dispatch and achieves compression factor of iota minus in single inheritance setting generalization of these algorithms to multiple inheritance setting increases the space by factor of kappa minus this trade off represents the first bounds on the compression ratio of constant time dispatching algorithms we also present an incremental variant of the ctd suited for languages such as java
this paper describes the overall design and architecture of the timber xml database system currently being implemented at the university of michigan the system is based upon bulk algebra for manipulating trees and natively stores xml new access methods have been developed to evaluate queries in the xml context and new cost estimation and query optimization techniques have also been developed we present performance numbers to support some of our design decisions we believe that the key intellectual contribution of this system is comprehensive set at a time query processing ability in native xml store with all the standard components of relational query processing including algebraic rewriting and cost based optimizer
we consider naive quadratic string matcher testing whether pattern occurs in text we equip it with cache mediating its access to the text and we abstract the traversal policy of the pattern the cache and the text we then specialize this abstracted program with respect to pattern using the off the shelf partial evaluator similix instantiating the abstracted program with left to right traversal policy yields the linear time behavior of knuth morris and pratt’s string matcher instantiating it with right to left policy yields the linear time behavior of boyer and moore’s string matcher
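A minimal sketch of the abstraction being specialized: the naive quadratic matcher takes the order in which pattern positions are compared as a parameter, with left-to-right and right-to-left policies corresponding to the two instantiations mentioned above. The partial-evaluation step with Similix, which produces the linear-time specialized matchers, is not reproduced here.

```python
# the traversal policy is the order in which pattern positions are tested;
# left-to-right corresponds to the KMP-style instantiation and right-to-left
# to the Boyer-Moore-style one.

def naive_match(pattern, text, policy):
    """Return the first index where pattern occurs in text, or -1."""
    m, n = len(pattern), len(text)
    order = policy(m)                      # sequence of pattern positions to compare
    for i in range(n - m + 1):
        if all(pattern[j] == text[i + j] for j in order):
            return i
    return -1

left_to_right = lambda m: range(m)
right_to_left = lambda m: range(m - 1, -1, -1)

print(naive_match("aba", "cabab", left_to_right))   # 1
print(naive_match("aba", "cabab", right_to_left))   # 1
```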
we explore the automatic generation of test data that respect constraints expressed in the object role modeling orm language orm is popular conceptual modeling language primarily targeting database applications with significant uses in practice the general problem of even checking whether an orm diagram is satisfiable is quite hard restricted forms are easily np hard and the problem is undecidable for some expressive formulations of orm brute force mapping to input for constraint and sat solvers does not scale state of the art solvers fail to find data to satisfy uniqueness and mandatory constraints in realistic time even for small examples we instead define restricted subset of orm that allows efficient reasoning yet contains most constraints overwhelmingly used in practice we show that the problem of deciding whether these constraints are consistent ie whether we can generate appropriate test data is solvable in polynomial time and we produce highly efficient interactive speed checker additionally we analyze over orm diagrams that capture data models from industrial practice and demonstrate that our subset of orm is expressive enough to handle their vast majority
we argue that finding vulnerabilities in software components is different from finding exploits against them exploits that compromise security often use several low level details of the component such as layouts of stack frames existing software analysis tools while effective at identifying vulnerabilities fail to model low level details and are hence unsuitable for exploit finding we study the issues involved in exploit finding by considering application programming interface api level exploits software component is vulnerable to an api level exploit if its security can be compromised by invoking sequence of api operations allowed by the component we present framework to model low level details of apis and develop an automatic technique based on bounded infinite state model checking to discover api level exploits we present two instantiations of this framework we show that format string exploits can be modeled as api level exploits and demonstrate our technique by finding exploits against vulnerabilities in widely used software we also use the framework to model cryptographic key management api the ibm cca and demonstrate tool that identifies previously known exploit
while intraprocedural static single assignment ssa is ubiquitous in modern compilers the use of interprocedural ssa although seemingly natural extension is limited we find that part of the impediment is due to the narrow scope of variables handled by previously reported approaches leading to limited benefits in optimization in this study we increase the scope of interprocedural ssa issa to record elements and singleton heap variables we show that issa scales reasonably well to all mediabench and most of the speck while resolving on average times more loads to their definition we propose and evaluate an interprocedural copy propagation and an interprocedural liveness analysis and demonstrate their effectiveness on reducing input and output instructions by and respectively issa is then leveraged for constant propagation and dead code removal where additional expressions are folded
we present novel algorithm for finding the longest factors in text for which the working space is proportional to the history text size moreover our algorithm is online and exact in that unlike the previous batch algorithms which needs to read the entire input beforehand our algorithm reports the longest match just after reading each character this algorithm can be directly used for data compression pattern analysis and data mining our algorithm also supports the window buffer in that we can bound the working space by discarding the history from the oldest character using the dynamic rank select dictionary our algorithm requires log log bits of working space and log time per character log total time is the length of the history and is the alphabet size we implemented our algorithm and compared it with the recent algorithms in terms of speed and the working space we found that our algorithm can work with smaller working space less than of those for the previous methods in real world data and with reasonable decline in speed
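A quadratic-time reference sketch of the quantity being computed, assuming the task is to report, after each character, the length of the longest suffix of the history that also occurs starting strictly earlier in it (optionally within a sliding window). The actual algorithm does this in compressed space with a dynamic rank/select dictionary, which is not shown here.

```python
# brute-force reference for the online "longest match with the history"
# computation; the window argument mimics the bounded-history variant.

def online_longest_matches(stream, window=None):
    history = ""
    for ch in stream:
        history += ch
        if window is not None and len(history) > window:
            history = history[-window:]
        best = 0
        # longest suffix of the history that also starts strictly earlier in it
        for length in range(len(history) - 1, 0, -1):
            if history.find(history[-length:], 0, len(history) - 1) != -1:
                best = length
                break
        yield ch, best

print(list(online_longest_matches("abab")))
# [('a', 0), ('b', 0), ('a', 1), ('b', 2)]
```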
component based system cbs is integration centric with focus on assembling individual components to build software system in cbs component source code information is usually unavailable each component also introduces added properties such as constraints associated with its use interactions with other components and customizability properties recent research suggests that most faults are found in only few system components complexity measure at specification phase can identify these components however traditional complexity metrics are not adequate for cbs as they focus mainly on either lines of code loc or information based on object and class properties there is therefore need to develop new technique for measuring the complexity of cbs specification cbss this paper describes structural complexity measure for cbss written in unified modelling language uml from system analyst’s point of view cbss consists of individual component descriptions characterized by its syntactic semantic and interaction properties we identify three factors interface constraints and interaction as primary contributors to the complexity of cbss we also present an application of our technique to university course registration system
modern data centers have large number of components that must be monitored including servers switches routers and environmental control systems this paper describes intemon prototype monitoring and mining system for data centers it uses the snmp protocol to monitor new data center at carnegie mellon it stores the monitoring data in mysql database allowing visualization of the time series data using jsp web based frontend interface for system administrators what sets intemon apart from other cluster monitoring systems is its ability to automatically analyze correlations in the monitoring data in real time and alert administrators of potential anomalies it uses efficient state of the art stream mining methods to report broken correlations among input streams it also uses these methods to intelligently compress historical data and avoid the need for administrators to configure threshold based monitoring bands
time varying congestion on internet paths and failures due to software hardware and configuration errors often disrupt packet delivery on the internet many approaches to avoiding these problems use multiple paths between two network locations these approaches rely on path independence assumption in order to work well ie they work best when the problems on different paths between two locations are uncorrelated in time this paper examines the extent to which this assumption holds on the internet by analyzing days of data collected from nodes in the ron testbed we examine two problems that manifest themselves congestion triggered loss and path failures and find that the chances of losing two packets between the same hosts is nearly as high when those packets are sent through an intermediate node as when they are sent back to back on the same path in so doing we also compare two different ways of taking advantage of path redundancy proposed in the literature mesh routing based on packet replication and reactive routing based on adaptive path selection
in service oriented computing soc environments service clients interact with service providers for consuming services from the viewpoint of service clients the trust level of service or service provider is critical issue to consider in service selection and discovery particularly when client is looking for service from large set of services or service providers however service may invoke other services offered by different providers forming composite services the complex invocations in composite services greatly increase the complexity of trust oriented service selection and discovery in this paper we propose novel approaches for composite service representation trust evaluation and trust oriented service selection and discovery our experiments illustrate that compared with the existing approaches our proposed trust oriented service selection and discovery algorithm is realistic and more efficient
after short analysis of the requirements that knowledge representation language must satisfy we introduce description logics modal logics and nonmonotonic logics as formalisms for representing terminological knowledge time dependent or subjective knowledge and incomplete knowledge respectively at the end of each section we briefly comment on the connection to logic programming
embedded systems are pervasive and frequently used for critical systems with time dependent functionality dwyer et al have developed qualitative specification patterns to facilitate the specification of critical properties such as those that must be satisfied by embedded systems thus far no analogous repository has been compiled for real time specification patterns this paper makes two main contributions first based on an analysis of timing based requirements of several industrial embedded system applications we created real time specification patterns in terms of three commonly used real time temporal logics second as means to further facilitate the understanding of the meaning of specification we offer structured english grammar that includes support for real time properties we illustrate the use of the real time specification patterns in the context of property specifications of real world automotive embedded system
in the study of data exchange one usually assumes an open world semantics making it possible to extend instances of target schemas an alternative closed world semantics only moves as much data as needed from the source to the target to satisfy constraints of schema mapping it avoids some of the problems exhibited by the open world semantics but limits the expressivity of schema mappings here we propose mixed approach one can designate different attributes of target schemas as open or closed to combine the additional expressivity of the open world semantics with the better behavior of query answering in closed worlds we define such schema mappings and show that they cover large space of data exchange solutions with two extremes being the known open and closed world semantics we investigate the problems of query answering and schema mapping composition and prove two trichotomy theorems classifying their complexity based on the number of open attributes we find conditions under which schema mappings compose extending known results to wide range of closed world mappings we also provide results for restricted classes of queries and mappings guaranteeing lower complexity
keyword search techniques that take advantage of xml structure make it very easy for ordinary users to query xml databases but current approaches to processing these queries rely on intuitively appealing heuristics that are ultimately ad hoc these approaches often retrieve irrelevant answers overlook relevant answers and cannot rank answers appropriately to address these problems for data centric xml we propose coherency ranking cr domain and database design independent ranking method for xml keyword queries that is based on an extension of the concept of mutual information with cr the results of keyword query are invariant under schema reorganization we analyze how previous approaches to xml keyword search approximate cr and present efficient algorithms to perform cr our empirical evaluation with user supplied queries over two real world xml data sets shows that cr has better precision and recall and provides better ranking than all previous approaches
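Coherency ranking is described above as an extension of mutual information; the snippet below only computes plain mutual information between two discrete variables from co-occurrence counts, as a reminder of the underlying quantity. It is not the authors' ranking method over XML structure, and the sample data is invented.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information (in bits) of two discrete variables given (x, y) samples."""
    pairs = list(pairs)
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# toy co-occurrence sample of (parent tag, child tag) pairs in a document collection
obs = ([("book", "title")] * 6 + [("book", "isbn")] * 2
       + [("article", "journal")] * 5 + [("article", "title")] * 3)
print(mutual_information(obs))
```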
this short note introduces the reader to the contributed papers in the area of concurrency theory that have been written in honour of ugo montanari also draw the attention to one of the contributions of ugo in this area that in my opinion constitutes pearl in theoretical computer science and would deserve to be investigated further
the world wide web provides vast resource to genomics researchers with web based access to distributed data sources such as blast sequence homology search interfaces however finding the desired scientific information can still be very tedious and frustrating while there are several known servers on genomic data eg genebank embl ncbi that are shared and accessed frequently new data sources are created each day in laboratories all over the world sharing these new genomics results is hindered by the lack of common interface or data exchange mechanism moreover the number of autonomous genomics sources and their rate of change outpace the speed at which they can be manually identified meaning that the available data is not being utilized to its full potential an automated system that can find classify describe and wrap new sources without tedious and low level coding of source specific wrappers is needed to assist scientists in accessing hundreds of dynamically changing bioinformatics web data sources through single interface correct classification of any kind of web data source must address both the capability of the source and the conversation interaction semantics inherent in the design of the data source we propose service class description scd meta data approach for classifying web data sources that takes into account both the capability and the conversational semantics of the source the ability to discover the interaction pattern of web source leads to increased accuracy in the classification process our results show that an scd based approach successfully classifies two thirds of blast sites with accuracy and two thirds of bioinformatics keyword search sites with around precision
we discuss the relation between program slicing and data dependencies we claim that slicing can be defined and therefore calculated parametrically on the chosen notion of dependency which implies different result when building the program dependency graph in this framework it is possible to choose dependency in the syntactic or semantic sense thus leading to compute possibly different smaller slices moreover the notion of abstract dependency based on properties instead of exact data values is investigated in its theoretical meaning constructive ideas are given to compute abstract dependencies on expressions and to transform properties in order to rule out some dependencies the application of these ideas to information flow is also discussed
user query is an element that specifies an information need but it is not the only one studies in literature have found many contextual factors that strongly influence the interpretation of query recent studies have tried to consider the user’s interests by creating user profile however single profile for user may not be sufficient for variety of queries of the user in this study we propose to use query specific contexts instead of user centric ones including context around query and context within query the former specifies the environment of query such as the domain of interest while the latter refers to context words within the query which is particularly useful for the selection of relevant term relations in this paper both types of context are integrated in an ir model based on language modeling our experiments on several trec collections show that each of the context factors brings significant improvements in retrieval effectiveness
view materialization is promising technique for achieving the data sharing and virtual restructuring capabilities needed by advanced applications such as data warehousing and workflow management systems much existing work addresses the problem of how to maintain the consistency of materialized relational views under update operations however little progress has been made thus far regarding the topic of view materialization in object oriented databases oodbs in this paper we demonstrate that there are several significant differences between the relational and object oriented paradigms that can be exploited when addressing the object oriented view materialization problem first we propose techniques that prune update propagation by exploiting knowledge of the subsumption relationships between classes to identify branches of classes to which we do not need to propagate updates and by using derivation ordering to eliminate self canceling propagation second we use encapsulated interfaces combined with the fact that any unique database property is inherited from single location to provide registration service by which virtual classes can register their interest in specific properties and be notified upon modification of those properties third we introduce the notion of hierarchical registrations that further optimizes update propagation by organizing the registration structures according to the class generalization hierarchy thereby pruning the set of classes that are notified of updates we have successfully implemented all proposed techniques in the multiview system on top of the gemstone oodbms to the best of our knowledge multiview is the first oodb view system to provide updatable materialized virtual classes and virtual schemata in this paper we also present cost model for our update algorithms and we report results from the experimental studies we have run on the multiview system measuring the impact of various optimization strategies incorporated into our materialization update algorithms
we present extensive experimental results on our static analysis and source level transformation that adds explicit memory reuse commands into ml program text our analysis and transformation cost is negligible to lines per seconds enough to be used in daily programming the payoff is the reduction of memory peaks and the total garbage collection time the transformed programs reuse to of total allocated memory cells and the memory peak is reduced by to when the memory peak reduction is large enough to overcome the costs of dynamic flags and the memory reuse in the generational garbage collection it speeds up program’s execution by up to otherwise our transformation can slowdown programs by up to the speedup is likely only when the portion of garbage collection time among the total execution time is more than about
in this paper we investigate the collective tree spanners problem in homogeneously orderable graphs this class of graphs was introduced by brandstädt et al to generalize the dually chordal graphs and the distance hereditary graphs and to show that the steiner tree problem can still be solved in polynomial time on this more general class of graphs in this paper we demonstrate that every vertex homogeneously orderable graph admits spanning tree such that for any two vertices of dt dg ie an additive tree spanner and system of at most log spanning trees such that for any two vertices of spanning tree exists with dt dg ie system of at most log collective additive tree spanners these results generalize known results on tree spanners of dually chordal graphs and of distance hereditary graphs the results above are also complemented with some lower bounds which say that on some vertex homogeneously orderable graphs any system of collective additive tree spanners must have at least spanning trees and there is no system of collective additive tree spanners with constant number of trees
this paper proposes framework for detecting global state predicates in systems of processes with approximately synchronized real time clocks timestamps from these clocks are used to define two orderings on events definitely occurred before and possibly occurred before these orderings lead naturally to definitions of distinct detection modalities ie meanings of predicate held during computation namely poss db possibly held def db definitely held and inst definitely held in specific global state this paper defines these modalities and gives efficient algorithms for detecting them the algorithms are based on algorithms of garg and waldecker alagar and venkatesan cooper and marzullo and fromentin and raynal complexity analysis shows that under reasonable assumptions these real time clock based detection algorithms are less expensive than detection algorithms based on lamport’s happened before ordering sample applications are given to illustrate the benefits of this approach
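A minimal sketch of the two timestamp orderings, assuming clocks are synchronized to within a bound eps so an event stamped t actually occurred somewhere in [t - eps, t + eps]: "definitely occurred before" requires the uncertainty intervals to be disjoint in the right order, while "possibly occurred before" only requires that the order is consistent with them. The bound and function names are illustrative, not taken from the paper.

```python
EPS = 0.005  # assumed clock synchronization bound (seconds)

def definitely_before(t1, t2, eps=EPS):
    # the whole interval of event 1 ends before the interval of event 2 begins
    return t1 + eps < t2 - eps

def possibly_before(t1, t2, eps=EPS):
    # the intervals allow event 1 to have happened first
    return t1 - eps < t2 + eps

print(definitely_before(1.000, 1.020))  # True: intervals cannot overlap
print(definitely_before(1.000, 1.008))  # False: the real order is uncertain
print(possibly_before(1.000, 1.008))    # True
```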
software transactional memory is concurrency control technique gaining increasing popularity as it provides high level concurrency control constructs and eases the development of highly multi threaded applications but this easiness comes at the expense of restricting the operations that can be executed within memory transaction and operations such as terminal and file are either not allowed or incur in serious performance penalties database is another example of operations that usually are not allowed within memory transaction this paper proposes to combine memory and database transactions in single unified model benefiting from the acid properties of the database transactions and from the speed of main memory data processing the new unified model covers without differentiating both memory and database operations thus the users are allowed to freely intertwine memory and database accesses within the same transaction knowing that the memory and database contents will always remain consistent and that the transaction will atomically abort or commit the operations in both memory and database this approach allows to increase the granularity of the in memory atomic actions and hence simplifies the reasoning about them
we present an algorithm which splits surface into reliefs relatively flat regions that have smooth boundaries the surface is then resampled in regular manner within each of the reliefs as result we obtain piecewise regular mesh prm having regular structure on large regions experimental results show that we are able to approximate the input surface with the mean square error of about of the diameter of the bounding box without increasing the number of vertices we introduce compression scheme tailored to work with our remeshed models and show that it is able to compress them losslessly after quantizing the vertex locations without significantly increasing the approximation error using about bits per vertex of the resampled model
while scalable data mining methods are expected to cope with massive web data coping with evolving trends in noisy data in continuous fashion and without any unnecessary stoppages and reconfigurations is still an open challenge this dynamic and single pass setting can be cast within the framework of mining evolving data streams in this paper we explore the task of mining mass user profiles by discovering evolving web session clusters in single pass with recently proposed scalable immune based clustering approach tecno streams and study the effect of the choice of different similarity measures on the mining process and on the interpretation of the mined patterns we propose simple similarity measure that has the advantage of explicitly coupling the precision and coverage criteria to the early learning stages and furthermore requiring that the affinity of the data to the learned profiles or summaries be defined by the minimum of their coverage or precision hence requiring that the learned profiles are simultaneously precise and complete with no compromises in our experiments we study the task of mining evolving user profiles from web clickstream data web usage mining in single pass and under different trend sequencing scenarios showing that compared to the cosine similarity measure the proposed similarity measure explicitly based on precision and coverage allows the discovery of more correct profiles at the same precision or recall quality levels
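A sketch of the proposed similarity idea, under the assumption that sessions and profiles can be treated as sets of URLs: the score is the minimum of precision and coverage, so a profile must be simultaneously precise and complete to match well. The exact precision/coverage definitions and the cosine variant (shown only for contrast) are illustrative, not the paper's formulas.

```python
import math

def min_precision_coverage(session, profile):
    session, profile = set(session), set(profile)
    if not session or not profile:
        return 0.0
    overlap = len(session & profile)
    precision = overlap / len(profile)   # how much of the profile is supported
    coverage = overlap / len(session)    # how much of the session is explained
    return min(precision, coverage)

def cosine(session, profile):
    session, profile = set(session), set(profile)
    if not session or not profile:
        return 0.0
    return len(session & profile) / math.sqrt(len(session) * len(profile))

s = {"/home", "/cart", "/checkout"}
p = {"/home", "/cart", "/products", "/search"}
print(min_precision_coverage(s, p), cosine(s, p))
```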
many applications operate in heterogeneous wireless sensor networks which represent challenging programming environment due to the wide range of device capabilities servilla addresses this difficulty in developing applications by offering new middleware framework based on service provisioning using servilla developers can construct platform independent applications over dynamic and diverse set of devices salient feature of servilla is its support for the discovery and binding to local and remote services which enables flexible and energy efficient in network collaboration among heterogeneous devices furthermore servilla provides modular middleware architecture that can be easily tailored to devices with wide range of resources allowing resource constrained devices to provide services while leveraging the capabilities of more powerful devices servilla has been implemented on tinyos for two representative hardware platforms imote and telosb with drastically different resources microbenchmarks demonstrate the efficiency of servilla’s implementation while an application case study on structural health monitoring demonstrates the efficacy of its coordination model for integrating heterogeneous devices
this paper provides detailed description of the general design space for metadata storage capabilities the design space considers issues of metadata identification typing and representation dynamic behavior predefined and user defined metadata schema discovery update operations api packaging marshalling searching and versioning the design space is used to structure retrospective analysis of the three major alternative metadata designs considered during the design of the webdav distributed authoring protocol deployment experience with webdav properties is also discussed with the most successful use occurring in custom client server pairs and in protocol extensions
region based memory management offers several important potential advantages over garbage collection including real time performance better data locality and more efficient use of limited memory researchers have advocated the use of regions for functional imperative and object oriented languages lexically scoped regions are now core feature of the real time specification for java rtsj recent research in region based programming for java has focused on region checking which requires manual effort to augment the program with region annotations in this paper we propose an automatic region inference system for core subset of java to provide an inference method that is both precise and practical we support classes and methods that are region polymorphic with region polymorphic recursion for methods one challenging aspect is to ensure region safety in the presence of features such as class subtyping method overriding and downcast operations our region inference rules can handle these object oriented features safely without creating dangling references
query can be answered by binary classifier which separates the instances that are relevant to the query from the ones that are not when kernel methods are employed to train such classifier the class boundary is represented as hyperplane in projected space data instances that are farthest from the hyperplane are deemed to be most relevant to the query and that are nearest to the hyperplane to be most uncertain to the query in this paper we address the twin problems of efficient retrieval of the approximate set of instances farthest from and nearest to query hyperplane retrieval of instances for this hyperplane based query scenario is mapped to the range query problem allowing for the reuse of existing index structures empirical evaluation on large image datasets confirms the effectiveness of our approach
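For concreteness, the brute-force scan below computes signed distances to a hyperplane (w, b) and returns the k instances farthest from it on the positive side (most relevant) and the k closest to the boundary (most uncertain). This is the computation the proposed mapping to range queries over an index is meant to accelerate; the data here is random and the names are illustrative.

```python
import numpy as np

def hyperplane_distances(X, w, b):
    return (X @ w + b) / np.linalg.norm(w)       # signed distances to the hyperplane

def most_relevant(X, w, b, k):
    d = hyperplane_distances(X, w, b)
    return np.argsort(-d)[:k]                    # farthest on the positive side

def most_uncertain(X, w, b, k):
    d = hyperplane_distances(X, w, b)
    return np.argsort(np.abs(d))[:k]             # nearest to the boundary

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
w, b = rng.normal(size=16), 0.1
print(most_relevant(X, w, b, 5), most_uncertain(X, w, b, 5))
```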
rendering applications in design manufacturing ecommerce and other fields are used to simulate the appearance of objects and scenes fidelity with respect to appearance is often critical and calculating global illumination gi is an important contributor to image fidelity but it is expensive to compute gi approximation methods such as virtual point light vpl algorithms are efficient but they can induce image artifacts and distortions of object appearance in this paper we systematically study the perceptual effects on image quality and material appearance of global illumination approximations made by vpl algorithms in series of psychophysical experiments we investigate the relationships between rendering parameters object properties and image fidelity in vpl renderer using the results of these experiments we analyze how vpl counts and energy clamping levels affect the visibility of image artifacts and distortions of material appearance and show how object geometry and material properties modulate these effects we find the ranges of these parameters that produce vpl renderings that are visually equivalent to reference renderings further we identify classes of shapes and materials that cannot be accurately rendered using vpl methods with limited resources using these findings we propose simple heuristics to guide visually equivalent and efficient rendering and present method for correcting energy losses in vpl renderings this work provides strong perceptual foundation for popular and efficient class of gi algorithms
image search reranking is an effective approach to refining the text based image search result in the reranking process the estimation of visual similarity is critical to the performance however the existing measures based on global or local features cannot be adapted to different queries in this paper we propose to estimate query aware image similarity by incorporating the global visual similarity local visual similarity and visual word co occurrence into an iterative propagation framework after the propagation query aware image similarity combining the advantages of both global and local similarities is achieved and applied to image search reranking the experiments on real world web image dataset demonstrate that the proposed query aware similarity outperforms the global local similarity and their linear combination for image search reranking
the discovery of quantitative association rules in large databases is considered an interesting and important research problem recently different aspects of the problem have been studied and several algorithms have been presented in the literature among others in srikant and agrawal fukuda et al fukuda et al yoda et al miller and yang an aspect of the problem that has so far been ignored is its computational complexity in this paper we study the computational complexity of mining quantitative association rules
modern compilers must expose sufficient amounts of instruction level parallelism ilp to achieve the promised performance increases of superscalar and vliw processors one of the major impediments to achieving this goal has been inefficient programmatic control flow historically the compiler has translated the programmer’s original control structure directly into assembly code with conditional branch instructions eliminating inefficiencies in handling branch instructions and exploiting ilp has been the subject of much research however traditional branch handling techniques cannot significantly alter the program’s inherent control structure the advent of predication as program control representation has enabled compilers to manipulate control in form more closely related to the underlying program logic this work takes full advantage of the predication paradigm by abstracting the program control flow into logical form referred to as program decision logic network this network is modeled as boolean equation and minimized using modified versions of logic synthesis techniques after minimization the more efficient version of the program’s original control flow is re expressed in predicated code furthermore this paper proposes extensions to the hpl playdoh predication model in support of more effective predicate decision logic network minimization finally this paper shows the ability of the mechanisms presented to overcome limits on ilp previously imposed by rigid program control structure
costs are often an important part of the classification process cost factors have been taken into consideration in many previous studies regarding decision tree models in this study we also consider cost sensitive decision tree construction problem we assume that there are test costs that must be paid to obtain the values of the decision attribute and that record must be classified without exceeding the spending cost threshold unlike previous studies however in which records were classified with only single condition attribute in this study we are able to simultaneously classify records with multiple condition attributes an algorithm is developed to build cost constrained decision tree which allows us to simultaneously classify multiple condition attributes the experimental results show that our algorithm satisfactorily handles data with multiple condition attributes under different cost constraints
we analyze the performance of size interval task assignment sita policies for multi host assignment in non preemptive environment assuming poisson arrivals we provide general bounds on the average waiting time independent of the job size distribution we establish general duality theory for the performance of sita policies we provide detailed analysis of the performance of sita systems when the job size distribution is bounded pareto and the range of job sizes tends to infinity in particular we determine asymptotically optimal cutoff values and provide asymptotic formulas for average waiting time and slowdown we compare the results with the least work remaining policy and compute which policy is asymptotically better for any given set of parameters in the case of inhomogeneous hosts we determine their optimal ordering
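A minimal SITA dispatcher, for concreteness: each host serves only jobs whose size falls in its interval, determined by a sorted list of cutoffs. The cutoff values below are placeholders, not the asymptotically optimal ones derived in the paper for bounded Pareto job sizes.

```python
import bisect

def sita_host(job_size, cutoffs):
    """cutoffs: increasing interior boundaries, one fewer than the host count."""
    # host i serves sizes in [cutoffs[i-1], cutoffs[i])
    return bisect.bisect_right(cutoffs, job_size)

cutoffs = [10.0, 100.0]        # 3 hosts: sizes in [0,10), [10,100), [100,inf)
for size in (3.0, 42.0, 5000.0):
    print(size, "-> host", sita_host(size, cutoffs))
```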
we show that standard formulations of intersection type systems are unsound in the presence of computational effects and propose solution similar to the value restriction for polymorphism adopted in the revised definition of standard ml it differs in that it is not tied to let expressions and requires an additional weakening of the usual subtyping rules we also present bi directional type checking algorithm for the resulting language that does not require an excessive amount of type annotations and illustrate it through some examples we further show that the type assignment system can be extended to incorporate parametric polymorphism taken together we see our system and associated type checking algorithm as significant step towards the introduction of intersection types into realistic programming languages the added expressive power would allow many more properties of programs to be stated by the programmer and statically verified by compiler
exploratory spatial analysis is increasingly necessary as larger spatial data is managed in electro magnetic media we propose an exploratory method that reveals robust clustering hierarchy from point data our approach uses the delaunay diagram to incorporate spatial proximity it does not require prior knowledge about the data set nor does it require preconditions multi level clusters are successfully discovered by this new method in only o(n log n) time where n is the size of the data set the efficiency of our method allows us to construct and display new type of tree graph that facilitates understanding of the complex hierarchy of clusters we show that clustering methods adopting raster like or vector like representation of proximity are not appropriate for spatial clustering we conduct an experimental evaluation with synthetic data sets as well as real data sets to illustrate the robustness of our method
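A rough sketch of the proximity structure behind such a method, assuming scipy is available: build the Delaunay triangulation of the points, drop edges that are long relative to the mean edge length, and read clusters off the connected components of what remains. The real method derives a multi-level hierarchy rather than using one global threshold, so this single-threshold version is only an illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_clusters(points, factor=2.0):
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:                 # collect triangulation edges
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    lengths = {e: np.linalg.norm(points[e[0]] - points[e[1]]) for e in edges}
    limit = factor * np.mean(list(lengths.values()))
    parent = list(range(len(points)))             # union-find over short edges
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), length in lengths.items():
        if length <= limit:
            parent[find(a)] = find(b)
    return [find(i) for i in range(len(points))]

rng = np.random.default_rng(0)
pts = np.vstack([rng.random((50, 2)), rng.random((50, 2)) + 5.0])  # two separated blobs
print(set(delaunay_clusters(pts)))
```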
unit test cases are focused and efficient system tests are effective at exercising complex usage patterns differential unit tests dut are hybrid of unit and system tests they are generated by carving the system components while executing system test case that influence the behavior of the target unit and then re assembling those components so that the unit can be exercised as it was by the system test we conjecture that duts retain some of the advantages of unit tests can be automatically and inexpensively generated and have the potential for revealing faults related to intricate system executions in this paper we present framework for automatically carving and replaying duts that accounts for wide variety of strategies we implement an instance of the framework with several techniques to mitigate test cost and enhance flexibility and we empirically assess the efficacy of carving and replaying duts
data clustering plays an important role in many disciplines including data mining machine learning bioinformatics pattern recognition and other fields where there is need to learn the inherent grouping structure of data in an unsupervised manner there are many clustering approaches proposed in the literature with different quality complexity tradeoffs each clustering algorithm works on its domain space with no optimum solution for all datasets of different properties sizes structures and distributions in this paper novel cooperative clustering cc model is presented it involves cooperation among multiple clustering techniques for the goal of increasing the homogeneity of objects within the clusters the cc model is capable of handling datasets with different properties by developing two data structures histogram representation of the pair wise similarities and cooperative contingency graph the two data structures are designed to find the matching sub clusters between different clusterings and to obtain the final set of clusters through coherent merging process the cooperative model is consistent and scalable in terms of the number of adopted clustering approaches experimental results show that the cooperative clustering model outperforms the individual clustering algorithms over number of gene expression and text documents datasets
in this paper we study the privacy preservation properties of a specific technique for query log anonymization token based hashing in this approach each query is tokenized and then secure hash function is applied to each token we show that statistical techniques may be applied to partially compromise the anonymization we then analyze the specific risks that arise from these partial compromises focused on revelation of identity from unambiguous names addresses and so forth and the revelation of facts associated with an identity that are deemed to be highly sensitive our goal in this work is two fold to show that token based hashing is unsuitable for anonymization and to present concrete analysis of specific techniques that may be effective in breaching privacy against which other anonymization schemes should be measured
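Token-based hashing itself is simple to state; the sketch below tokenizes a query and replaces each token with a keyed secure hash (the key and hash truncation are illustrative choices, not from the paper). The point of the analysis above is that token co-occurrence statistics survive this mapping, so it should not be treated as a safe anonymization.

```python
import hashlib
import hmac

SECRET = b"log-release-key"        # illustrative key, not from the paper

def anonymize_query(query):
    # each token is replaced by a keyed hash, preserving token boundaries
    return [
        hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:12]
        for token in query.lower().split()
    ]

print(anonymize_query("cheap flights to boston"))
```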
we present concurrent face routing cfr algorithm we formally prove that the worst case latency of our algorithm is asymptotically optimal our simulation results demonstrate that on average cfr significantly outperforms the best known geometric routing algorithms in the path stretch the speed of message delivery its performance approaches the shortest possible path cfr maintains its advantage over the other algorithms in pure form as well as in combination with greedy routing on planar as well as on non planar graphs
with the development of positioning technologies and the boosting deployment of inexpensive location aware sensors large volumes of trajectory data have emerged however efficient and scalable query processing over trajectory data remains big challenge we explore new approach to this target in this paper presenting new framework for query processing over trajectory data based on mapreduce traditional trajectory data partitioning indexing and query processing technologies are extended so that they may fully utilize the highly parallel processing power of large scale clusters we also show that the append only scheme of mapreduce storage model can be nice base for handling updates of moving objects preliminary experiments show that this framework scales well in terms of the size of trajectory data set we also discuss the limitations of traditional trajectory data processing techniques and our future research directions
clustering suffers from the curse of dimensionality and similarity functions that use all input features with equal relevance may not be effective we introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features this approach avoids the risk of loss of information encountered in global dimensionality reduction techniques and does not assume any data distribution model our method associates to each cluster weight vector whose values capture the relevance of features within the corresponding cluster we experimentally demonstrate the gain in performance our method achieves with respect to competitive methods using both synthetic and real datasets in particular our results show the feasibility of the proposed technique to perform simultaneous clustering of genes and conditions in gene expression data and clustering of very high dimensional data such as text data
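A small illustration of the per-cluster feature weighting idea: features with low within-cluster dispersion receive high weights, and distances to the cluster centroid are computed under those weights. The exponential weighting used here is one common choice and is only an assumption, not necessarily the paper's exact update rule.

```python
import numpy as np

def weighted_distance(x, centroid, weights):
    return np.sqrt(np.sum(weights * (x - centroid) ** 2))

def feature_weights(points_in_cluster, centroid, h=1.0):
    # features with small within-cluster dispersion get large weights
    dispersion = np.mean((points_in_cluster - centroid) ** 2, axis=0)
    w = np.exp(-dispersion / h)
    return w / w.sum()

rng = np.random.default_rng(1)
# first feature is tight (relevant), second is noisy (irrelevant) for this cluster
cluster = np.c_[rng.normal(0, 0.1, 100), rng.normal(0, 5.0, 100)]
c = cluster.mean(axis=0)
w = feature_weights(cluster, c)
print(w)                                   # first feature dominates
print(weighted_distance(cluster[0], c, w))
```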
several concurrent implementations of familiar data abstractions such as queues sets or maps typically do not follow locking disciplines and often use lock free synchronization to gain performance since such algorithms are exposed to weak memory model they are notoriously hard to get correct as witnessed by many bugs found in published algorithms we outline technique for analyzing correctness of concurrent algorithms under weak memory models in which model checker is used to search for correctness violations the algorithm to be analyzed is transformed into form where statements may be reordered according to particular weak memory ordering the transformed algorithm can then be analyzed by model checking tool eg by enumerative state exploration we illustrate the approach on small example of queue which allows an enqueue operation to be concurrent with dequeue operation which we analyze with respect to the rmo memory model defined in sparc
while multiprocessor system on chips mpsocs are becoming widely adopted in embedded systems there is strong need for methodologies that quickly and accurately estimate performance of such complex systems in this paper we present novel method for accurately estimating the cycle counts of parameterized mpsoc architectures through workload simulation driven by program execution traces encoded in the form of branch bitstreams experimental results show that the proposed method delivers speedup factor of to against the instruction set simulator based method while achieving high cycle accuracy whose estimation error ranges between and
we propose implement and evaluate new energy conservation schemes for efficient data propagation in wireless sensor networks our protocols are adaptive ie locally monitor the network conditions and accordingly adjust towards optimal operation choices this dynamic feature is particularly beneficial in heterogeneous settings and in cases of re deployment of sensor devices in the network area we implement our protocols and evaluate their performance through detailed simulation study using our extended version of ns in particular we combine our schemes with known communication paradigms the simulation findings demonstrate significant gains and good trade offs in terms of delivery success delay and energy dissipation
this paper describes the design implementation and evaluation of parallel object database server while number of research groups and companies now provide object database servers designed to run on uniprocessors there has been surprisingly little work on the exploitation of parallelism to provide scalable performance in object database management systems odbms the work described in this paper takes as its starting point the object database management group odmg standard for object databases thereby allowing the project to focus on research into parallelism rather than on the odbms interfaces the system is designed to run on distributed memory parallel machine and the paper describes the key issues and design decisions including parallel query optimisation and execution flow control support for user defined operations in queries object distribution cache management and navigational client access the work shows that the significant differences between the object and relational database paradigms lead to significant differences in the designs of parallel servers to support these two paradigms the paper presents an extensive performance analysis of the prototype systems which shows that good performance can be achieved on cluster of linux pcs
we propose method to analyze secure information flow in stack based assembly languages communicating with the external environment by means of input and output channels the method computes for each instruction security level for each memory variable and stack element instruction level security analysis is flow sensitive and hence is more precise than other analyses such as standard security typing instruction level security analysis is specified in the framework of abstract interpretation we define concrete operational semantics which handles in addition to execution aspects the flow of information of the program the basis of the approach is that each value is annotated by security level and that the abstract domain is obtained from the concrete one by keeping the security levels and forgetting the actual values operand stack are abstracted as fixed length stacks of security levels an abstract state is map from instructions to abstract machine configurations where values are substituted by security levels the abstract semantics consists of set of abstract rules manipulating abstract states the instruction level security typing can be performed by an efficient fixpoint iteration algorithm similar to that used by bytecode verification
owing to the advent of wireless networking and personal digital devices information systems in the era of mobile computing are expected to be able to handle tremendous amount of traffic and service requests from the users wireless data broadcast thanks to its high scalability is particularly suitable for meeting such challenge indexing techniques have been developed for wireless data broadcast systems in order to conserve the scarce power resources in mobile clients however most of the previous studies do not take into account the impact of location information of users in this paper we address the issues of supporting spatial queries including window queries and nn queries of location dependent information via wireless data broadcast linear index structure based on the hilbert curve and corresponding search algorithms are proposed to answer spatial queries on air experiments are conducted to evaluate the performance of the proposed indexing technique results show that the proposed index and its enhancement outperform existing algorithms significantly
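A hedged sketch of the core building block of such a linear index: mapping a 2D cell to its Hilbert-curve distance so that spatially close objects end up close in the broadcast order. The function below is the standard iterative coordinate-to-distance conversion; the item list and the ordering step are illustrative, not the paper's index layout or search algorithms.

```python
def hilbert_index(order, x, y):
    """Map a 2D grid cell (x, y) to its 1D Hilbert-curve distance.

    `order` is the curve order, so the grid is 2**order x 2**order cells.
    Classic iterative conversion that rotates/reflects the quadrant per level.
    """
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/reflect so the lower-order bits line up with the sub-curve
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# order the broadcast items by their Hilbert value to obtain a linear index
items = [(2, 3), (5, 1), (0, 7), (6, 6)]          # (x, y) object locations
broadcast_order = sorted(items, key=lambda p: hilbert_index(3, *p))
print(broadcast_order)
```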
we consider in this paper class of publish subscribe pub sub systems called topic based systems where users subscribe to topics and are notified on events that belong to those subscribed topics with the recent flourishing of rss news syndication these systems are regaining popularity and are raising new challenging problems in most of the modern topic based systems the events in each topic are delivered to the subscribers via supporting distributed data structure typically multicast tree since peers in the network may come and go frequently this supporting structure must be continuously maintained so that holes do not disrupt the events delivery the dissemination of events in each topic thus incurs two main costs the actual transmission cost for the topic events and the maintenance cost for its supporting structure this maintenance overhead becomes particularly dominating when pub sub system supports large number of topics with moderate event frequency typical scenario in today's news syndication scene the goal of this paper is to devise method for reducing this maintenance overhead to the minimum our aim is not to invent yet another topic based pub sub system but rather to develop generic technique for better utilization of existing platforms our solution is based on novel distributed clustering algorithm that utilizes correlations between user subscriptions to dynamically group topics together into virtual topics called topic clusters and thereby unifies their supporting structures and reduces costs our technique continuously adapts the topic clusters and the user subscriptions to the system state and incurs only very minimal overhead we have implemented our solution in the tamara pub sub system our experimental study shows this approach to be extremely effective improving the performance by an order of magnitude
image mining is an important task to discover interesting and meaningful patterns from large image databases in this paper we introduce the spatial co orientation patterns in image databases spatial co orientation patterns refer to objects that frequently occur with the same spatial orientation eg left right below etc among images for example an object frequently appears to the left of another object among images we utilize the data structure string to represent the spatial orientation of objects in an image two approaches apriori based and pattern growth approaches are proposed for mining co orientation patterns an experimental evaluation with synthetic datasets shows the advantages and disadvantages of these two algorithms
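A minimal counting sketch of the co-orientation idea under assumed names: derive a coarse left/right/above/below relation from object centers and count how many images support each object-pair relation, keeping those above a support threshold. This is only the frequency-counting core, not the paper's 2D-string-based apriori or pattern-growth algorithms.

```python
from collections import Counter
from itertools import combinations

def orientation(a, b):
    """Coarse spatial relation of object a w.r.t. object b from their centers."""
    (ax, ay), (bx, by) = a, b
    horiz = "left" if ax < bx else "right" if ax > bx else "same-x"
    vert = "below" if ay < by else "above" if ay > by else "same-y"
    return horiz, vert

def frequent_co_orientations(images, min_support):
    """images: list of dicts {object_label: (x, y)}; returns relations meeting min_support."""
    counts = Counter()
    for objects in images:
        seen = set()
        for (la, pa), (lb, pb) in combinations(sorted(objects.items()), 2):
            h, v = orientation(pa, pb)
            # count each (label_a, relation, label_b) at most once per image
            seen.add((la, h, lb))
            seen.add((la, v, lb))
        counts.update(seen)
    return {rel: c for rel, c in counts.items() if c >= min_support}

images = [
    {"sun": (1, 9), "tree": (5, 3)},
    {"sun": (2, 8), "tree": (6, 2), "house": (8, 2)},
]
print(frequent_co_orientations(images, min_support=2))
```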
this paper describes method for animating the appearance of clothing such as pants or shirt that fits closely to figure’s body compared to flowing cloth such as loose dresses or capes these types of garments involve nearly continuous collision contact and small wrinkles that can be troublesome for traditional cloth simulation methods based on the observation that the wrinkles in close fitting clothing behave in predominantly kinematic fashion we have developed an example based wrinkle synthesis technique our method drives wrinkle generation from the pose of the figure’s kinematic skeleton this approach allows high quality clothing wrinkles to be combined with coarse cloth simulation that computes the global and dynamic aspects of the clothing motion while the combined results do not exactly match high resolution reference simulation they do capture many of the characteristic fine scale features and wrinkles further the combined system runs at interactive rates making it suitable for applications where high resolution offline simulations would not be viable option the wrinkle synthesis method uses precomputed database built by simulating the high resolution clothing as the articulated figure is moved over range of poses in principle the space of poses is exponential in the total number of degrees of freedom however clothing wrinkles are primarily affected by the nearest joints allowing each joint to be processed independently during synthesis mesh interpolation is used to consider the influence of multiple joints and combined with coarse simulation to produce the final results at interactive rates
development environments typically present the software engineer with structural perspective of an object oriented system in terms of packages classes and methods from structural perspective it is difficult to gain an understanding of how source entities participate in system’s features at runtime especially when using dynamic languages such as smalltalk in this paper we evaluate the usefulness of offering an alternative complementary feature centric perspective of software system when performing maintenance activities we present feature centric environment combining interactive visual representations of features with source code browser displaying only the classes and methods participating in feature under investigation to validate the usefulness of our feature centric view we conducted controlled empirical experiment where we measured and compared the performance of subjects when correcting two defects in an unfamiliar software system with traditional development environment and with our feature centric environment we evaluate both quantitative and qualitative data to draw conclusions about the usefulness of feature centric perspective to support program comprehension during maintenance activities
several studies have repeatedly demonstrated that both the performance and scalability of shared nothing parallel database system depend on the physical layout of data across the processing nodes of the system today data is allocated in these systems using horizontal partitioning strategies this approach has number of drawbacks if query involves the partitioning attribute then typically only small number of the processing nodes can be used to speedup the execution of this query on the other hand if the predicate of selection query includes an attribute other than the partitioning attribute then the entire data space must be searched again this results in waste of computing resources in recent years several multidimensional data declustering techniques have been proposed to address these problems however these schemes are too restrictive eg fx ecc etc or optimized for certain type of queries eg dm hcam etc in this paper we introduce new technique which is flexible and performs well for general queries we prove its optimality properties and present experimental results showing that our scheme outperforms dm and hcam by significant margin
this paper presents tempos set of models and languages supporting the manipulation of temporal data on top of object dbms the proposed models exploit object oriented technology to meet some important yet traditionally neglected design criteria related to legacy code migration and representation independence two complementary ways for accessing temporal data are offered query language and visual browser the query language namely tempoql is an extension of oql supporting the manipulation of histories regardless of their representations through fully composable functional operators the visual browser offers operators that facilitate several time related interactive navigation tasks such as studying snapshot of collection of objects at given instant or detecting and examining changes within temporal attributes and relationships tempos models and languages have been formalized both at the syntactical and the semantical level and have been implemented on top of an object dbms the suitability of the proposals with regard to applications requirements has been validated through concrete case studies
although graphical user interfaces started as imitations of the physical world many interaction techniques have since been invented that are not available in the real world this paper focuses on one of these previewing and how sensory enhanced input device called presense keypad can provide preview for users before they actually execute the commands preview is important in the real world because it is often not possible to undo an action this previewable feature helps users to see what will occur next it is also helpful when the command assignment of the keypad dynamically changes such as for universal commanders we present several interaction techniques based on this input device including menu and map browsing systems and text input system we also discuss finger gesture recognition for the presense keypad
in this research we propose to use the discrete cosine transform to approximate the cumulative distributions of data cube cells values the cosine transform is known to have good energy compaction property and thus can approximate data distribution functions easily with small number of coefficients the derived estimator is accurate and easy to update we perform experiments to compare its performance with well known technique the haar wavelet the experimental results show that the cosine transform performs much better than the wavelet in estimation accuracy speed space efficiency and update easiness
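A small sketch of the compression idea, assuming scipy is available: take the DCT of an empirical cumulative distribution, keep only the first few coefficients (energy compaction keeps most of the shape), and reconstruct an approximate CDF for estimation. The toy cell counts and the number of retained coefficients are illustrative; the paper's estimator and update procedure are not reproduced here.

```python
import numpy as np
from scipy.fft import dct, idct

# empirical cumulative distribution over the cells of one dimension (toy data)
cell_counts = np.array([5, 0, 2, 7, 1, 0, 3, 9, 4, 2, 0, 6], dtype=float)
cdf = np.cumsum(cell_counts) / cell_counts.sum()

# keep only the first k DCT coefficients
k = 4
coeffs = dct(cdf, norm="ortho")
compressed = np.zeros_like(coeffs)
compressed[:k] = coeffs[:k]

# reconstruct an approximate CDF from the truncated coefficients
approx_cdf = idct(compressed, norm="ortho")
print(np.max(np.abs(approx_cdf - cdf)))   # worst-case estimation error on this toy data
```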
the web is quickly moving from the era of search engines to the era of discovery engines whereas search engines help you find information you are looking for discovery engines help you find things that you never knew existed common discovery technique is to automatically identify and display objects similar to ones previously viewed by the user core to this approach is an accurate method to identify similar documents in this paper we present new approach to identifying similar documents based on conceptual tree similarity measure we represent each document as concept tree using the concept associations obtained from classifier then we employ tree similarity measure based on tree edit distance to compute similarities between concept trees experiments on documents from the citeseer collection showed that our algorithm performed significantly better than document similarity based on the traditional vector space model
we present framework for offline partial evaluation for call by value functional programming languages with an ml style typing discipline this includes binding time analysis which is polymorphic with respect to binding times allows the use of polymorphic recursion with respect to binding times is applicable to polymorphically typed term and is proven correct with respect to novel small step specialization semantics the main innovation is to build the analysis on top of the region calculus of tofte and talpin thus leveraging the tools and techniques developed for it our approach factorizes the binding time analysis into region inference and subsequent constraint analysis the key insight underlying our framework is to consider binding times as properties of regions specialization is specified as small step semantics building on previous work on syntactic type soundness results for the region calculus using similar syntactic proof techniques we prove soundness of the binding time analysis with respect to the specializer in addition we prove that specialization preserves the call by value semantics of the region calculus by showing that the reductions of the specializer are contextual equivalences in the region calculus
we study in simulation the performance of two manet routing algorithms in an urban environment the two algorithms aodv and anthocnet are representative of two different approaches and design methodologies aodv is state of the art algorithm following purely reactive approach anthocnet is based on ant colony optimization and integrates proactive and reactive mechanisms we investigate the usefulness of the different approaches they adopt when confronted with the peculiarities of urban environments and real world applications to this aim we define detailed and realistic simulation setup in terms of radio propagation constrained node mobility and data traffic
in the last decade there has been an explosion of online commercial activity enabled by the world wide web an electronic marketplace market provides an online method to perform transactions between buyers and sellers potentially supporting all of the steps in the entire order fulfillment process credibility is an important requirement for the success of an market in this work we model and characterize an market as complex network and use the network structure to investigate the sellers credibility we propose new algorithm based on the structure of the negotiation network to recommend whether the seller is trustable or not we use real data from online marketplace from the biggest brazilian internet service provider as case study besides being preliminary work our technique achieves good results in terms of accuracy predicting correctly the results in more than it can be used to provide more effective reputation system for electronic negotiations which can be very useful as support decision mechanism for buyers
it is difficult for users to formulate appropriate queries for search in this paper we propose an approach to query term selection by measuring the effectiveness of query term in ir systems based on its linguistic and statistical properties in document collections two query formulation algorithms are presented for improving ir performance experiments on ntcir and ntcir ad hoc ir tasks demonstrate that the algorithms can significantly improve the retrieval performance by averagely compared to the performance of the original queries given in the benchmarks
with rapid development of information technology more and more document images are made by scanners but new problem arises that many of document images from thick books are warped it is quite inconvenient for further processing on computer this paper introduces an integrative algorithm on restoring chinese document images which is new field and few researchers have worked on this subject yet the complicated structure of chinese block words makes the problem more difficult to solve this restoring method which is based on binding characters iteratively and building curved lines using parallel lines method is introduced in the phase of fitting svr is adopted instead of other parameter methods an idea of collaboration is also recommended to guarantee the quality of the final results correction rate of for experiment of document images proves this method works out very well
this paper defines and describes the properties of multicast virtual topology the array and resource efficient variation the rem array it is shown how several collective operations can be implemented efficiently using these virtual topologies while maintaining low complexity because the methods are applicable to any parallel computing environment that supports multicast communication in hardware they provide framework for collective communication libraries that are portable and yet take advantage of such low level hardware functionality in particular the paper describes the practical issues of using these methods in wormhole routed massively parallel computers mpcs and in workstation clusters connected by asynchronous transfer mode atm networks performance results are given for both environments
the virtualization technology makes it feasible that multiple guest operating systems run on single physical machine it is the virtual machine monitor that dynamically maps the virtual cpu of virtual machines to physical cpus according to the scheduling strategy the scheduling strategy in xen schedules virtual cpus of virtual machines asynchronously while guaranteeing the proportion of the cpu time corresponding to its weight maximizing the throughput of the system however this scheduling strategy may deteriorate the performance when the virtual machine is used to execute the concurrent applications such as parallel programs or multithreaded programs in this paper we analyze the cpu scheduling problem in the virtual machine monitor theoretically and the result is that the asynchronous cpu scheduling strategy will waste considerable physical cpu time when the system workload is the concurrent application then we present hybrid scheduling framework for the cpu scheduling in the virtual machine monitor there are two types of virtual machines in the system the high throughput type and the concurrent type the virtual machine can be set as the concurrent type when the majority of its workload is concurrent applications in order to reduce the cost of synchronization otherwise it is set as the high throughput type as the default moreover we implement the hybrid scheduling framework based on xen and we will give description of our implementation in detail at last we test the performance of the presented scheduling framework and strategy based on the multi core platform and the experiment result indicates that the scheduling framework and strategy is feasible to improve the performance of the virtual machine system
as mobile phone has various advanced functionalities or features usability issues are increasingly challenging due to the particular characteristics of mobile phone typical usability evaluation methods and heuristics most of which are relevant to software system might not effectively be applied to mobile phone another point to consider is that usability evaluation activities should help designers find usability problems easily and produce better design solutions to support usability practitioners of the mobile phone industry we propose framework for evaluating the usability of mobile phone based on multi level hierarchical model of usability factors in an analytic way the model was developed on the basis of set of collected usability problems and our previous study on conceptual framework for identifying usability impact factors it has multi abstraction levels each of which considers the usability of mobile phone from particular perspective as there are goal means relationships between adjacent levels range of usability issues can be interpreted in holistic as well as diagnostic way another advantage is that it supports two different types of evaluation approaches task based and interface based to support both evaluation approaches we developed four sets of checklists each of which is concerned respectively with task based evaluation and three different interface types logical user interface lui physical user interface pui and graphical user interface gui the proposed framework specifies an approach to quantifying usability so that several usability aspects are collectively measured to give single score with the use of the checklists small case study was conducted in order to examine the applicability of the framework and to identify the aspects of the framework to be improved it showed that it could be useful tool for evaluating the usability of mobile phone based on the case study we improved the framework in order that usability practitioners can use it more easily and consistently
we study the power of reliable anonymous distributed systems where processes do not fail do not have identifiers and run identical programmes we are interested specifically in the relative powers of systems with different communication mechanisms anonymous broadcast read write registers or read write registers plus additional shared memory objects we show that system with anonymous broadcast can simulate system of shared memory objects if and only if the objects satisfy property we call idemdicence this result holds regardless of whether either system is synchronous or asynchronous conversely the key to simulating anonymous broadcast in anonymous shared memory is the ability to count broadcast can be simulated by an asynchronous shared memory system that uses only counters but read write registers by themselves are not enough we further examine the relative power of different types and sizes of bounded counters and conclude with nonrobustness result
timed concurrent constraint programming tcc is declarative model for concurrency offering logic for specifying reactive systems ie systems that continuously interact with the environment the universal tcc formalism utcc is an extension of tcc with the ability to express mobility here mobility is understood as communication of private names as typically done for mobile systems and security protocols in this paper we consider the denotational semantics for tcc and we extend it to collecting semantics for utcc based on closure operators over sequences of constraints relying on this semantics we formalize the first general framework for data flow analyses of tcc and utcc programs by abstract interpretation techniques the concrete and abstract semantics we propose are compositional thus allowing us to reduce the complexity of data flow analyses we show that our method is sound and parametric wrt the abstract domain thus different analyses can be performed by instantiating the framework we illustrate how it is possible to reuse abstract domains previously defined for logic programming eg to perform groundness analysis for tcc programs we show the applicability of this analysis in the context of reactive systems furthermore we also make use of the abstract semantics to exhibit secrecy flaw in security protocol we have developed prototypical implementation of our methodology and we have implemented the abstract domain for security to perform automatically the secrecy analysis
we propose zip calculus of open existential types that is an extension of system obtained by decomposing the introduction and elimination of existential types into more atomic constructs open existential types model modular type abstraction as done in module systems the static semantics of zip adapts standard techniques to deal with linearity of typing contexts its dynamic semantics is small step reduction semantics that performs extrusion of type abstraction as needed during reduction and the two are related by subject reduction and progress lemmas applying the curry howard isomorphism zip can be also read back as logic with the same expressive power as second order logic but with more modular ways of assembling partial proofs we also extend the core calculus to handle the double vision problem as well as type level and term level recursion the resulting language turns out to be new formalization of minor variant of dreyer’s internal language for recursive and mixin modules
feature interaction presents challenge to feature selection for classification feature by itself may have little correlation with the target concept but when it is combined with some other features they can be strongly correlated with the target concept unintentional removal of these features can result in poor classification performance handling feature interaction can be computationally intractable recognizing the presence of feature interaction we propose to efficiently handle feature interaction to achieve efficient feature selection and present extensive experimental results of evaluation
in an uncertain data set where is the ground set consisting of elements ps probability function and fs score function each element with score appears independently with probability the top query on asks for the set of elements that has the maximum probability of appearing to be the elements with the highest scores in random instance of computing the top answer on fixed is known to be easy in this paper we consider the dynamic problem that is how to maintain the top query answer when changes including element insertions and deletions in the ground set changes in the probability function and in the score function we present fully dynamic data structure that handles an update in k log n time and answers top query in log n time for any the structure has size and can be constructed in n log k time as building block of our dynamic structure we present an algorithm for the all top problem that is computing the top answers for all which may be of independent interest
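As a point of reference, the quantity being maximized can be written down directly: a candidate set is the top-k of a random instance exactly when all of its elements appear and no higher-scoring element outside it appears. The brute-force sketch below (with assumed names and a toy data set) evaluates that probability for every size-k candidate; the paper's contribution is a structure that maintains the maximizing set under updates without this re-evaluation.

```python
from itertools import combinations
from math import prod

def prob_topk(candidate, elements):
    """Probability that `candidate` (a set of ids) is exactly the set of k
    highest-scored elements of a random instance.

    elements: dict id -> (score, prob); scores assumed distinct; each element
    appears independently with its probability.
    """
    threshold = min(elements[e][0] for e in candidate)
    p_all_in = prod(elements[e][1] for e in candidate)
    p_no_better_out = prod(
        1.0 - prob
        for e, (score, prob) in elements.items()
        if e not in candidate and score > threshold
    )
    return p_all_in * p_no_better_out

elements = {"a": (9.0, 0.4), "b": (7.0, 0.9), "c": (5.0, 0.8), "d": (3.0, 0.5)}
k = 2
# brute force over all size-k candidates; a dynamic structure avoids this re-evaluation
best = max(combinations(elements, k), key=lambda t: prob_topk(set(t), elements))
print(best, prob_topk(set(best), elements))
```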
given large graph stored on disk there is often need to perform search over this graph such need could arise for example in the search component of data intensive expert system or to solve path problems in deductive database systems in this paper we present novel data structuring technique and show how branch and bound search algorithm can use this data structuring to prune the search space simulation results confirm that using these techniques search can be expedited significantly without incurring large storage penalty as side benefit it is possible to organize the search to obtain successive approximations to the desired solution with considerable reduction in the total search
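A generic best-first branch-and-bound skeleton, shown here for a cheapest-path search: a lower-bound estimate on the remaining cost lets partial paths that cannot beat the incumbent solution be pruned. The graph encoding and the zero lower bound in the example are illustrative assumptions; the paper's disk-resident data structuring is not modeled.

```python
import heapq

def branch_and_bound_path(graph, lower_bound, source, goal):
    """Best-first branch and bound for a cheapest source-goal path.

    graph: dict node -> list of (neighbor, edge_cost)
    lower_bound(node): estimate of the remaining cost to the goal, used to
    prune partial paths that cannot improve on the incumbent solution.
    """
    best_cost, best_path = float("inf"), None
    frontier = [(lower_bound(source), 0.0, source, [source])]
    while frontier:
        bound, cost, node, path = heapq.heappop(frontier)
        if bound >= best_cost:          # prune: cannot improve the incumbent
            continue
        if node == goal:
            best_cost, best_path = cost, path
            continue
        for nxt, w in graph.get(node, []):
            if nxt in path:             # avoid cycles in the partial path
                continue
            c = cost + w
            b = c + lower_bound(nxt)
            if b < best_cost:
                heapq.heappush(frontier, (b, c, nxt, path + [nxt]))
    return best_cost, best_path

graph = {"s": [("a", 2), ("b", 5)], "a": [("g", 6)], "b": [("g", 1)]}
print(branch_and_bound_path(graph, lambda n: 0, "s", "g"))   # (6.0, ['s', 'b', 'g'])
```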
is it possible to efficiently reveal concurrency bugs by forcing well selected set of conditions on program interleaving to study this general question we defined two simple models of conditions on program interleaving targeted at the insufficient synchronization scope bug pattern we analyzed these models with respect to several buggy programs we also implemented an algorithm that tries to force one of these models the analysis of these models shows that relatively small models can detect insufficient synchronization scope bugs the experiments with the forcing algorithm demonstrated the ability of finding the bug with high efficiency the average testing time till the bug is detected was improved by factors of and compared with the average time required by dynamic exploration that did not incorporate the forcing algorithm
while scalable noc network on chip based communication architectures have clear advantages over long point to point communication channels their power consumption can be very high in contrast to most of the existing hardware based efforts on noc power optimization this paper proposes compiler directed approach where the compiler decides the appropriate voltage frequency levels to be used for each communication channel in the noc our approach builds and operates on novel graph based representation of parallel program and has been implemented within an optimizing compiler and tested using embedded benchmarks our experiments indicate that the proposed approach behaves better from both performance and power perspectives than hardware based scheme and the energy savings it achieves are very close to the savings that could be obtained from an optimal but hypothetical voltage frequency scaling scheme
with the explosive growth of the web people often need to monitor fresh information about their areas of interest by browsing the same sites repeatedly especially for local area so many pages are created every day that they might be greatly helpful knowledge in daily decision support in order to reduce such monitoring work and not to miss chances to meet critical information for local area we are developing continuous geographic web search system with push based web monitoring services this paper will describe the system architecture to deal with multiple user queries to represent users geographic attention over multiple data pages incoming from geographic web crawlers if newly incoming data page is relevant to preregistered query the user who previously registered the query will be informed spontaneously about the new information by our notification service this paper especially focuses on the problem of how to match multiple data pages and multiple queries as quickly as possible for this purpose we will propose fast matching algorithm based on spatial join and show primitive experimental result with synthetic data
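A hedged sketch of one way such query/page matching can be organized: index the registered queries' bounding boxes in a coarse grid and, for each incoming page region, probe only the overlapping cells before an exact intersection test. Cell size, box format, and the example regions are assumptions, not the paper's spatial-join algorithm.

```python
from collections import defaultdict

def grid_cells(box, cell):
    """Yield the grid cells overlapped by box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    for gx in range(int(xmin // cell), int(xmax // cell) + 1):
        for gy in range(int(ymin // cell), int(ymax // cell) + 1):
            yield gx, gy

def build_query_grid(queries, cell=10.0):
    """queries: dict query_id -> bounding box of the user's geographic attention."""
    grid = defaultdict(list)
    for qid, box in queries.items():
        for c in grid_cells(box, cell):
            grid[c].append((qid, box))
    return grid

def match_page(page_box, grid, cell=10.0):
    """Return query ids whose region intersects the incoming page's region."""
    hits = set()
    for c in grid_cells(page_box, cell):
        for qid, qbox in grid[c]:
            if (page_box[0] <= qbox[2] and qbox[0] <= page_box[2]
                    and page_box[1] <= qbox[3] and qbox[1] <= page_box[3]):
                hits.add(qid)
    return hits

grid = build_query_grid({"q1": (0, 0, 15, 15), "q2": (40, 40, 50, 50)})
print(match_page((10, 10, 20, 20), grid))   # -> {'q1'}
```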
hyperspectral imaging is new technique in remote sensing that generates images with hundreds of spectral bands at different wavelength channels for the same area on the surface of the earth although in recent years several efforts have been directed toward the incorporation of parallel and distributed computing in hyperspectral image analysis there are no standardized architectures for this purpose in remote sensing missions to address this issue this paper develops two highly innovative implementations of standard hyperspectral data processing chain utilized among others in commercial software tools such as kodak’s research systems envi software package one of the most popular tools currently available for processing remotely sensed data it should be noted that the full hyperspectral processing chain has never been implemented in parallel in the past analytical and experimental results are presented in the context of real application using hyperspectral data collected by nasa’s jet propulsion laboratory over the world trade center area in new york city shortly after the terrorist attacks of september th the parallel implementations are tested in two different platforms including thunderhead massively parallel beowulf cluster at nasa’s goddard space flight center and xilinx virtex ii field programmable gate array fpga device combined these platforms deliver an excellent snapshot of the state of the art in those areas and offer thoughtful perspective on the potential and emerging challenges of incorporating parallel processing systems into realistic hyperspectral imaging problems
this paper reports as case study an attempt to model check the control subsystem of an operational nasa robotics system thirty seven properties including both safety and liveness specifications were formulated for the system twenty two of the thirty seven properties were successfully model checked several significant flaws in the original software system were identified and corrected during the model checking process the case study presents the entire process in semi historical mode the goal is to provide reusable knowledge of what worked what did not work and why
context processing in body area networks bans faces unique challenges due to the user and node mobility the need of real time adaptation to the dynamic topological and contextual changes and heterogeneous processing capabilities and energy constraints present on the available devices this paper proposes service oriented framework for the execution of context recognition algorithms we describe and theoretically analyze the performance of the main framework components including the sensor network organization service discovery service graph construction service distribution and mapping the theoretical results are followed by the simulation of the proposed framework as whole showing the overall cost of dynamically distributing applications on the network
we present query algebra that supports optimized access of web services through service oriented queries the service query algebra is defined based on formal service model that provides high level abstraction of web services across an application domain the algebra defines set of algebraic operators algebraic service queries can be formulated using these operators this allows users to query their desired services based on both functionality and quality we provide the implementation of each algebraic operator this enables the generation of service execution plans seps that can be used by users to directly access services we present an optimization algorithm by extending the dynamic programming dp approach to efficiently select the seps with the best user desired quality the experimental study validates the proposed algorithm by demonstrating significant performance improvement compared with the traditional dp approach
the ability to predict linkages among data objects is central to many data mining tasks such as product recommendation and social network analysis substantial literature has been devoted to the link prediction problem either as an implicitly embedded problem in specific applications or as generic data mining task this literature has mostly adopted static graph representation where snapshot of the network is analyzed to predict hidden or future links however this representation is only appropriate to investigate whether certain link will ever occur and does not apply to many applications for which the prediction of the repeated link occurrences are of primary interest eg communication network surveillance in this paper we introduce the time series link prediction problem taking into consideration temporal evolutions of link occurrences to predict link occurrence probabilities at particular time using enron mail data and high energy particle physics literature coauthorship data we have demonstrated that time series models of single link occurrences achieve comparable link prediction performance with commonly used static graph link prediction algorithms furthermore combination of static graph link prediction algorithms and time series models produced significantly better predictions over static graph link prediction methods demonstrating the great potential of integrated methods that exploit both interlink structural dependencies and intralink temporal dependencies
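A minimal illustration of the time-series side of this idea: treat each link's per-period occurrence counts as a series, forecast the next period with simple exponential smoothing, and rank candidate links by the forecast. The smoothing model, parameter, and toy histories are assumptions; the paper evaluates richer time series models and their combination with static graph predictors.

```python
def exp_smooth_forecast(series, alpha=0.3):
    """One-step-ahead forecast of a link's occurrence count by simple
    exponential smoothing over its per-period history."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def rank_links(link_histories, alpha=0.3):
    """link_histories: dict (u, v) -> list of occurrence counts per time period.
    Returns links ranked by their forecast for the next period."""
    scores = {link: exp_smooth_forecast(h, alpha) for link, h in link_histories.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

histories = {
    ("alice", "bob"): [3, 4, 2, 5, 6],
    ("alice", "carol"): [1, 0, 0, 1, 0],
}
print(rank_links(histories))
```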
due to the large amount of mobile devices that continue to appear on the consumer market mobile user interface design becomes increasingly important the major issue with many existing mobile user interface design approaches is the time and effort that is needed to deploy user interface design to the target device in order to address this issue we propose the plug and design tool that relies on continuous multi device mouse pointer to design user interfaces directly on the mobile target device this will shorten iteration time since designers can continuously test and validate each design action they take using our approach designers can empirically learn the specialities of target device which will help them while creating user interfaces for devices they are not familiar with
the exact string matching problem is to find the occurrences of pattern of length from text of length symbols we develop novel and unorthodox filtering technique for this problem our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences while this is seemingly more difficult than the original problem we show that the idea leads to very simple algorithms that are optimal on average we then show how our basic method can be used to solve multiple string matching as well as several approximate matching problems in average optimal time the general method can be applied to many existing string matching algorithms our experimental results show that the algorithms perform very well in practice
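A generic filter-and-verify sketch in the same spirit: index a few pieces of the pattern, scan the text for any of them, and verify full matches at the implied positions. Note it uses contiguous pieces rather than the carefully chosen subsequences of the paper, and the piece count is an arbitrary assumption.

```python
def filter_verify_search(text, pattern, pieces=4):
    """Filter-and-verify exact matching sketch: not the paper's subsequence filter."""
    m = len(pattern)
    q = max(1, m // pieces)
    # map each chosen piece to its offset inside the pattern
    piece_at = {}
    for off in range(0, m - q + 1, q):
        piece_at.setdefault(pattern[off:off + q], []).append(off)
    matches = set()
    for i in range(len(text) - q + 1):
        for off in piece_at.get(text[i:i + q], ()):
            start = i - off
            if 0 <= start <= len(text) - m and text[start:start + m] == pattern:
                matches.add(start)
    return sorted(matches)

print(filter_verify_search("abracadabra abracadabra", "abracadabra"))   # [0, 12]
```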
although there are several factors contributing to the difficulty in meeting distributed real time transaction deadlines data conflicts among transactions especially in commitment phase are the prime factor resulting in system performance degradation therefore design of an efficient commit protocol is of great significance for distributed real time database systems drtdbs most of the existing commit protocols try to improve system performance by allowing committing cohort to lend its data to an executing cohort thus reducing data inaccessibility these protocols block the borrower when it tries to send workdone prepared message thus increasing the transactions commit time this paper first analyzes all kinds of dependencies that may arise due to data access conflicts among executing committing transactions when committing cohort is allowed to lend its data to an executing cohort it then proposes static two phase locking and high priority based write update type ideal for fast and timeliness commit protocol ie swift in swift the execution phase of cohort is divided into two parts locking phase and processing phase and then in place of workdone message workstarted message is sent just before the start of processing phase of the cohort further the borrower is allowed to send workstarted message if it is only commit dependent on other cohorts instead of being blocked as opposed to this reduces the time needed for commit processing and is free from cascaded aborts to ensure non violation of acid properties checking of completion of processing and the removal of dependency of cohort are required before sending the yes vote message simulation results show that swift improves the system performance in comparison to earlier protocol the performance of swift is also analyzed for partial read only optimization which minimizes intersite message traffic execute commit conflicts and log writes consequently resulting in better response time the impact of permitting the cohorts of the same transaction to communicate with each other on swift has also been analyzed
this paper addresses the question of updating relational databases through xml views using query trees to capture the notions of selection projection nesting grouping and heterogeneous sets found throughout most xml query languages we show how xml views expressed using query trees can be mapped to set of corresponding relational views we then show how updates on the xml view are mapped to updates on the corresponding relational views existing work on updating relational views can then be leveraged to determine whether or not the relational views are updatable with respect to the relational updates and if so to translate the updates to the underlying relational database
it has been established that active learning is effective for learning complex subjective query concepts for image retrieval however active learning has been applied in concept independent way ie the kernel parameters and the sampling strategy are identically chosen for learning query concepts of differing complexity in this work we first characterize concept’s complexity using three measures hit rate isolation and diversity we then propose multimodal learning approach that uses images semantic labels to guide concept dependent active learning process based on the complexity of concept we make intelligent adjustments to the sampling strategy and the sampling pool from which images are to be selected and labeled to improve concept learnability our empirical study on image dataset shows that concept dependent learning is highly effective for image retrieval accuracy
commercial cache coherent nonuniform memory access ccnuma systems often require extensive investments in hardware design and operating system support different approach to building these systems is to use standard high volume shv hardware and stock software components as building blocks and assemble them with minimal investments in hardware and software this design approach trades the performance advantages of specialized hardware design for simplicity and implementation speed and relies on application level tuning for scalability and performance we present our experience with this approach in this paper we built way ccnuma intel system consisting of four commodity four processor fujitsu teamserver smps connected by synfinity cache coherent switch the system features total of sixteen mhz intel xeon processors and gb of physical memory and runs the standard commercial microsoft windows nt operating system the system can be partitioned statically or dynamically and uses an innovative combined hardware software approach to support application level performance tuning on the hardware side programmable performance monitor card measures the frequency of remote memory accesses which constitute the predominant source of performance overhead the monitor does not cause any performance overhead and can be deployed in production mode providing the possibility for dynamic performance tuning if the application workload changes over time on the software side the resource set abstraction allows application level threads to improve performance and scalability by specifying their execution and memory affinity across the ccnuma system results from performance evaluation study confirm the success of the combined hardware software approach for performance tuning in computation intensive workloads the results also show that the poor local memory bandwidth in commodity intel based systems rather than the latency of remote memory access is often the main contributor to poor scalability and performance the contributions of this work can be summarized as follows the resource set abstraction allows control over resource allocation in portable manner across ccnuma architectures we describe how it was implemented without modifying the operating system an innovative hardware design for programmable performance monitor card is designed specifically for ccnuma environment and allows dynamic adaptive performance optimizations performance study shows that performance and scalability are often limited by the local memory bandwidth rather than by the effects of remote memory access in an intel based architecture
the sliding window model is useful for discounting stale data in data stream applications in this model data elements arrive continually and only the most recent elements are used when answering queries we present novel technique for solving two important and related problems in the sliding window model maintaining variance and maintaining median clustering our solution to the problem of maintaining variance provides continually updated estimate of the variance of the last values in data stream with relative error of at most epsilon using epsilon log memory we present constant factor approximation algorithm which maintains an approximate median solution for the last data points using tau tau log memory where tau is parameter which trades off the space bound with the approximation factor of tau
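For contrast, an exact sliding-window variance can be maintained with constant work per arrival if the whole window is stored; the point of the paper's structure is to approximate this answer with far less memory. The baseline below is only that exact reference implementation, with illustrative names.

```python
from collections import deque

class SlidingVariance:
    """Exact variance over the most recent `window` values, O(1) per update.

    Keeps the full window, so it uses O(window) memory; the paper's structure
    approximates this with much less space at a bounded relative error."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.s1 = 0.0   # running sum of values
        self.s2 = 0.0   # running sum of squared values

    def add(self, x):
        self.buf.append(x)
        self.s1 += x
        self.s2 += x * x
        if len(self.buf) > self.window:
            old = self.buf.popleft()
            self.s1 -= old
            self.s2 -= old * old

    def variance(self):
        n = len(self.buf)
        mean = self.s1 / n
        return self.s2 / n - mean * mean

sv = SlidingVariance(window=4)
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    sv.add(x)
print(sv.variance())   # variance of the last 4 values [5, 5, 7, 9]
```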
we study the problem of classification when only dissimilarity function between objects is accessible that is data samples are represented not by feature vectors but in terms of their pairwise dissimilarities we establish sufficient conditions for dissimilarity functions to allow building accurate classifiers the theory immediately suggests learning paradigm construct an ensemble of simple classifiers each depending on pair of examples then find convex combination of them to achieve large margin we next develop practical algorithm referred to as dissimilarity based boosting dboost for learning with dissimilarity functions under theoretical guidance experiments on variety of databases demonstrate that the dboost algorithm is promising for several dissimilarity measures widely used in practice
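A small sketch of the learning paradigm under stated assumptions: each weak rule depends on one positive/negative pair of training examples and predicts by comparing dissimilarities to the two, and an AdaBoost-style loop forms the weighted (convex up to scaling) combination. The candidate-sampling heuristic, round count, and toy data are illustrative, not the DBoost algorithm's exact construction.

```python
import math
import random

def pair_classifier(a, b, dissim):
    """Weak rule on one (positive, negative) pair: predict the class of
    whichever of the two the input is less dissimilar to."""
    def h(x):
        return 1 if dissim(x, a) <= dissim(x, b) else -1
    return h

def dboost_train(X, y, dissim, rounds=20, seed=0):
    """AdaBoost-style combination of pair-based weak classifiers.
    X: list of samples, y: labels in {-1, +1}, dissim: pairwise dissimilarity."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    for _ in range(rounds):
        # sample a handful of candidate pairs and keep the lowest weighted error
        candidates = [pair_classifier(rng.choice(pos), rng.choice(neg), dissim)
                      for _ in range(10)]
        h, err = min(((h, sum(wi for wi, xi, ti in zip(w, X, y) if h(xi) != ti))
                      for h in candidates), key=lambda he: he[1])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * ti * h(xi)) for wi, xi, ti in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]

    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict

# toy 1-D data with absolute difference as the dissimilarity
X = [0.1, 0.3, 0.2, 0.9, 1.1, 1.3]
y = [-1, -1, -1, 1, 1, 1]
predict = dboost_train(X, y, dissim=lambda u, v: abs(u - v))
print([predict(x) for x in [0.0, 1.2]])   # expected [-1, 1]
```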
peer to peer pp systems research has gained considerable attention recently with the increasing popularity of file sharing applications since these applications are used for sharing huge amounts of data it is very important to efficiently locate the data of interest in such systems however these systems usually do not provide efficient search techniques existing systems offer only keyword search functionality through centralized index or by query flooding in this paper we propose scheme based on reference vectors for sharing multi dimensional data in pp systems this scheme effectively supports larger set of query operations such as nn queries and content based similarity search than current systems which generally support only exact key lookups and keyword searches the basic idea is to store multiple replicas of an object’s index at different peers based on the distances between the object’s feature vector and the reference vectors later when query is posed the system identifies the peers that are likely to store the index information about relevant objects using reference vectors thus the system is able to return accurate results by contacting small fraction of the participating peers
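A minimal sketch of the reference-vector idea with assumed names: both index publication and query routing map a feature vector to the peers associated with its nearest reference vectors, so the two peer sets overlap and only a small fraction of peers needs to be contacted.

```python
import math

def nearest_refs(vec, reference_vectors, replicas=2):
    """Indices of the `replicas` reference vectors closest to a feature vector;
    each index is interpreted as the peer responsible for that region."""
    ranked = sorted(range(len(reference_vectors)),
                    key=lambda i: math.dist(vec, reference_vectors[i]))
    return ranked[:replicas]

# publishing: replicate the object's index entry on the peers of its nearest references
refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
obj = {"id": "img42", "feature": (0.2, 0.1)}
target_peers = nearest_refs(obj["feature"], refs)

# querying: contact only the peers associated with the query's nearest references
query = (0.15, 0.05)
peers_to_ask = nearest_refs(query, refs)
print(target_peers, peers_to_ask)   # overlapping peer sets => the index entry is found
```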
the paper presents an approach to performance analysis of heterogeneous parallel algorithms as typical heterogeneous parallel algorithm is just modification of some homogeneous one the idea is to compare the heterogeneous algorithm with its homogeneous prototype and to assess the heterogeneous modification rather than analyse the algorithm as an isolated entity criterion of optimality of heterogeneous parallel algorithms is suggested parallel algorithm of matrix multiplication on heterogeneous clusters is used to illustrate the proposed approach
this paper defines new analysis paradigm blended program analysis that enables practical effective analysis of large framework based java applications for performance understanding blended analysis combines dynamic representation of the program calling structure with static analysis applied to region of that calling structure with observed performance problems blended escape analysis is presented which enables approximation of object effective lifetimes to facilitate explanation of the usage of newly created objects in program region performance bottlenecks stemming from overuse of temporary structures are common in framework based applications metrics are introduced to expose how in aggregate these applications make use of new objects results of empirical experiments with the trade benchmark are presented case study demonstrates how results from blended escape analysis can help locate in region which calls distinct methods the single call path responsible for performance problem involving objects created at distinct sites and as far as call levels away
program differencing is common means of software debugging although many differencing algorithms have been proposed for procedural and object oriented languages like and java there is no differencing algorithm for aspect oriented languages so far in this paper we propose an approach for difference analysis of aspect oriented programs the proposed algorithm contains novel way of matching two versions of module of which the signature has been modified for this we also work out set of well defined signatures for the new elements in the aspectj language in accordance with these signatures and with those existent for elements of the java language we investigate set of signature patterns to be used with the module matching algorithm furthermore we demonstrate successful application of node by node comparison algorithm originally developed for object oriented programs using tool which implements our algorithms we set up and evaluate set of test cases the results demonstrate the effectiveness of our approach for large subset of the aspectj language
distributed virtual environments dves are distributed simulated virtual worlds where users gather and interact within shared space web based dve applications are attracting more and more attention however building dve applications requires significant effort even with the modern development tools in this paper we propose component based and service based framework for constructing dve applications from coarse grained components this component based and service oriented architecture provides great flexibility for building complex dve applications based on the developed terminology and profile the framework provides high level description language for specifying user interaction tasks the dve developers can concentrate on the application design rather than worrying about the programming details the framework also provides runtime platform for coarse grained components integration and shared scene graph for coordinating the presentation for individual users
we directly investigate subject of much recent debate do word sense disambiguation models help statistical machine translation quality we present empirical results casting doubt on this common but unproved assumption using state of the art chinese word sense disambiguation model to choose translation candidates for typical ibm statistical mt system we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone error analysis suggests several key factors behind this surprising finding including inherent limitations of current statistical mt architectures
promising approach for type safe generic codes in the object oriented paradigm is variant parametric type which allows covariant and contravariant subtyping on fields where appropriate previous approaches formalise variant type as special case of the existential type system in this paper we present new framework based on flow analysis and modular type checking to provide simple but accurate model for capturing generic types our scheme stands to benefit from past and future advances in flow analysis and subtyping constraints furthermore it fully supports casting for variant types with special reflection mechanism called cast capture to handle objects with unknown types we have built constraint based type checker and have proven its soundness we have also successfully annotated suite of java libraries and client code with our flow based variant type system
this research abstract outlines the work i plan to do as part of my phd in particular i propose to devise practical way of integrating ownership control into existing programming languages in way that will help with adoption of ownership in the general programming community
in this paper we introduce optimus an optimizing synthesis compiler for streaming applications optimus compiles programs written in high level streaming language to either software or hardware implementations the compiler uses hierarchical compilation strategy that separates concerns between macro and micro functional requirements macro functional concerns address how components modules are assembled to implement larger more complex applications micro functional issues deal with synthesis issues of the module internals optimus thus allows software developers who lack deep hardware design expertise to transparently leverage the advantages of hardware customization without crossing the semantic gap between high level languages and hardware description languages optimus generates streaming hardware that achieves on average speedup over our baseline embedded processor for fraction of the energy additionally our results show that streaming specific optimizations can further improve performance by and reduce the area requirements by on average these designs are competitive with handel implementations for some of the same benchmarks
the throughput of local area networks is rapidly increasing for example the bandwidth of new atm networks and fddi token rings is an order of magnitude greater than that of ethernets other network technologies promise bandwidth increase of yet another order of magnitude in several years however in distributed systems lowered latency rather than increased throughput is often of primary concern this paper examines the system level effects of newer high speed network technologies on low latency cross machine communications to evaluate number of influences both hardware and software we designed and implemented new remote procedure call system targeted at providing low latency we then ported this system to several hardware platforms decstation and sparcstation with several different networks and controllers atm fddi and ethernet comparing these systems allows us to explore the performance impact of alternative designs in the communication system with respect to achieving low latency eg the network the network controller the host architecture and cache system and the kernel and user level runtime software our rpc system which achieves substantially reduced call times in the microsecond range on an atm network using decstation hosts allows us to isolate those components of next generation networks and controllers that still stand in the way of low latency communication we demonstrate that new generation processor technology and software design can reduce small packet rpc times to near network imposed limits making network and controller design more crucial than ever to achieving truly low latency communication
the aim of our gold model is to provide an object oriented oo multidimensional data model supported by an oo formal specification language that allows us to automatically generate prototypes from the specification at the conceptual level and therefore to animate and check system properties within the context of oo modeling and automatic prototyping the basis of the mapping from modeling to programming is focused on the identification of cardinality and behavioral patterns in the design phase and their relationships with the data model process model and interface design the aim of this paper therefore is the identification of these patterns based on the relationships between the dimension attributes included in cube classes these patterns will associate data together with olap operations and will allow us to have concise execution model that maps every pattern of modeling into its corresponding implementation making users able to accomplish olap operations on cube classes furthermore we extend the set of classical olap operations with two more operations combine divide to allow us to navigate along attributes that are not part of any classification hierarchy
nowadays xml based data integration systems are accepted as data service providers on the web in order to make such data integration system fully equipped with data manipulation capabilities programming frameworks which support update at the integration level are being developed when the user is permitted to submit updates it is necessary to establish the best possible data consistency in the whole data integration system to that end we present an approach based on an xquery trigger service we define an xquery trigger model together with its semantics we report on the integration of the xquery trigger service into the overall architecture and discuss details of the execution model experiments show that data consistency is enforced easily efficiently and conveniently at the global level
the grid and agent communities both develop concepts and mechanisms for open distributed systems albeit from different perspectives the grid community has historically focused on brawn infrastructure tools and applications for reliable and secure resource sharing within dynamic and geographically distributed virtual organizations in contrast the agents community has focused on brain autonomous problem solvers that can act flexibly in uncertain and dynamic environments yet as the scale and ambition of both grid and agent deployments increase we see convergence of interests with agent systems requiring robust infrastructure and grid systems requiring autonomous flexible behaviors motivated by this convergence of interests we review the current state of the art in both areas review the challenges that concern the two communities and propose research and technology development activities that can allow for mutually supportive efforts
this paper describes the novel features of commercial software only solution to scanning the software object modeler our work is motivated by the desire to produce low cost portable scanning system based on hand held digital photographs we describe the novel techniques we have employed to achieve robust software based system in the areas of camera calibration surface generation and texture extraction
the implementation of interconnect is becoming significant challenge in modern integrated circuit ic design both synchronous and asynchronous strategies have been suggested to manage this problem creating low skew clock tree for synchronous inter block pipeline stages is significant challenge asynchronous interconnect does not require global clock and therefore it has potential advantage in terms of design effort this paper presents an asynchronous interconnect design that can be implemented using standard application specific ic flow this design is considered across range of ic interconnect scenarios the results demonstrate that there is region of the design space where the implementation provides an advantage over synchronous interconnect by removing the need for clocked inter block pipeline stages while maintaining high throughput further results demonstrate computer aided design tool enhancement that would significantly increase this space detailed comparison of power area and latency of the two strategies is also provided for range of ic scenarios
during the recent years the web has been developed rapidly making the efficient searching of information difficult and time consuming in this work we propose web search personalization methodology by coupling data mining techniques with the underlying semantics of the web content to this purpose we exploit reference ontologies that emerge from web catalogs such as odp which can scale to the growth of the web our methodology uses ontologies to provide the semantic profiling of users interests based on the implicit logging of their behavior and the on the fly semantic analysis and annotation of the web results summaries
as collaborative learning in general and pair programming in particular has become widely adopted in computer science education so has the use of pedagogical visualization tools for facilitating collaboration however there is little theory on collaborative learning with visualization and few studies on their effect on each other we build on the concept of the engagement taxonomy and extend it to classify finer variations in the engagement that result from the use of visualization tool we analyze the applicability of the taxonomy to the description of the differences in the collaboration process when visualization is used our hypothesis is that increasing the level of engagement between learners and the visualization tool results in higher positive impact of the visualization on the collaboration process this article describes an empirical investigation designed to test the hypothesis the results provide support for our extended engagement taxonomy and hypothesis by showing that the collaborative activities of the students and the engagement levels are correlated
this paper presents data driven approach to simultaneous segmentation and labeling of parts in meshes an objective function is formulated as conditional random field model with terms assessing the consistency of faces with labels and terms between labels of neighboring faces the objective function is learned from collection of labeled training meshes the algorithm uses hundreds of geometric and contextual label features and learns different types of segmentations for different tasks without requiring manual parameter tuning our algorithm achieves significant improvement in results over the state of the art when evaluated on the princeton segmentation benchmark often producing segmentations and labelings comparable to those produced by humans
many to one packet routing and scheduling are fundamental operations of sensor networks it is well known that many sensor network applications rely on data collection from the nodes the sensors by central processing device there is wide range of data gathering applications like target and hazard detection environmental monitoring battlefield surveillance etc consequently efficient data collection solutions are needed to improve the performance of the network in this paper we assume known distribution of sources each node wants to transmit at most one packet and one common destination called base station we provide via simple mathematical models transmission schedule for routing all the messages to the base station jointly minimizing both the completion time and the average packet delivery time we define our network model and provide improved lower bounds for linear two branch and star or multi branch network topologies all our algorithms run in polynomial time finally we prove that the problem of quality of service qos in our setting for star network under the same target function is np complete by showing reduction from the set partition problem
median clustering extends popular neural data analysis methods such as the self organizing map or neural gas to general data structures given by dissimilarity matrix only this offers flexible and robust global data inspection methods which are particularly suited for variety of data as occurs in biomedical domains in this chapter we give an overview about median clustering and its properties and extensions with particular focus on efficient implementations adapted to large scale data analysis
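as an illustration of the median idea above, here is a minimal python sketch of median clustering in its simplest k-means-like form: prototypes are restricted to data indices and are updated to the generalized median of their cluster, so only a dissimilarity matrix is needed. it is not the batch neural gas / som variant discussed in the chapter, and the function name and interface are invented for this sketch.

    import numpy as np

    def median_kmeans(D, k, n_iter=50, seed=0):
        # D is an (n, n) symmetric dissimilarity matrix; prototypes are data indices
        rng = np.random.default_rng(seed)
        n = D.shape[0]
        prototypes = rng.choice(n, size=k, replace=False)
        assign = np.zeros(n, dtype=int)
        for _ in range(n_iter):
            # assignment step: each point joins its closest prototype
            assign = np.argmin(D[:, prototypes], axis=1)
            new_protos = prototypes.copy()
            for j in range(k):
                members = np.where(assign == j)[0]
                if len(members) == 0:
                    continue
                # median step: pick the member minimizing summed dissimilarity to its cluster
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_protos[j] = members[np.argmin(costs)]
            if np.array_equal(new_protos, prototypes):
                break
            prototypes = new_protos
        return prototypes, assign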
the difficulty of developing wireless sensor systems is widely recognized problems associated with testing debugging and profiling are key contributing factors while network simulators have proven useful they are unable to capture the subtleties of underlying hardware or the dynamics of wireless signal propagation and interference and physical experimentation remains necessity to this end developers increasingly rely on shared deployments exposed for physical experimentation sensor network testbeds are under development across the world we present complementary testbed architecture that derives its novelty from three characteristics first the system is interactive users can profile source and network level components across network in real time as well as inject transient state faults and external network traffic second the system is source centric it enables automated source code analysis instrumentation and compilation finally the design is open developers can extend the set of exposed interfaces as appropriate to particular projects without modifying the underlying middleware we present the testbed design and implementation graphical user interface shell based macro programming interface example scenarios that illustrate their use and discuss the testbed’s application in the research and teaching activities at client institutions
we examine the problem of overcoming noisy word level alignments when learning tree to string translation rules our approach introduces new rules and re estimates rule probabilities using em the major obstacles to this approach are the very reasons that word alignments are used for rule extraction the huge space of possible rules as well as controlling overfitting by carefully controlling which portions of the original alignments are reanalyzed and by using bayesian inference during re analysis we show significant improvement over the baseline rules extracted from word level alignments
in this paper we present novel and fast constructive technique that relocates the instruction code into the main memory in such manner that the cache is utilized more efficiently the technique is applied as preprocessing step ie before the code is executed our technique is applicable in embedded systems where the number and characteristics of tasks running on the system are known a priori the technique does not impose any computational overhead on the system as result of applying our technique to variety of real world applications we observed through simulation significant drop of cache misses furthermore the energy consumption of the whole system cpu caches buses main memory is reduced by up to these benefits could be achieved by slightly increased main memory size of about on average
in this paper hybrid anomaly intrusion detection scheme using program system calls is proposed in this scheme hidden markov model hmm detection engine and normal database detection engine have been combined to utilise their respective advantages fuzzy based inference mechanism is used to infer soft boundary between anomalous and normal behaviour which is otherwise very difficult to determine when they overlap or are very close to address the challenging issue of high cost in hmm training an incremental hmm training with optimal initialization of hmm parameters is suggested experimental results show that the proposed fuzzy based detection scheme can reduce false positive alarms by compared to the single normal database detection scheme our hmm incremental training with the optimal initialization produced significant improvement in terms of training time and storage as well the hmm training time was reduced by four times and the memory requirement was also reduced significantly
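a minimal sketch of the fuzzy combination step, assuming both detection engines already emit scores scaled to [0, 1]; the membership boundaries, the rule set and the weighted-average defuzzification below are illustrative choices of this sketch, not the parameters used in the paper.

    def tri(x, a, b, c):
        # triangular membership function on [a, c] with peak at b
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def fuzzy_anomaly(hmm_score, db_mismatch):
        # both inputs are assumed to be normalized anomaly indicators in [0, 1]
        high_h, high_d = tri(hmm_score, 0.4, 0.75, 1.01), tri(db_mismatch, 0.4, 0.75, 1.01)
        low_h, low_d = tri(hmm_score, -0.01, 0.25, 0.6), tri(db_mismatch, -0.01, 0.25, 0.6)
        mid_h, mid_d = tri(hmm_score, 0.3, 0.5, 0.7), tri(db_mismatch, 0.3, 0.5, 0.7)
        rules = [
            (min(high_h, high_d), 1.0),   # both engines agree it is anomalous
            (min(low_h, low_d), 0.0),     # both engines agree it is normal
            (max(mid_h, mid_d), 0.5),     # near the soft boundary: uncertain
        ]
        weight = sum(w for w, _ in rules) or 1.0
        return sum(w * out for w, out in rules) / weight   # soft anomaly degree in [0, 1]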
the task of list selection is fundamental to many user interfaces and the traditional scrollbar is control that does not utilise the rich input features of many mobile devices we describe the design and evaluation of zoofing list selection interface for touch pen devices that combines pressure based zooming and flick based scrolling while previous flick based interfaces have performed similarly to traditional scrolling for short distances and worse for long ones zoofing outperforms and is preferred to traditional scrolling flick based scrolling and orthozoom we analyse experimental logs to understand how pressure was used and discuss directions for further work
learning ranking or preference functions has become an important data mining task in recent years as various applications have been found in information retrieval among rank learning methods ranking svm has been favorably applied to various applications eg optimizing search engines improving data retrieval quality in this paper we first develop norm ranking svm that is faster in testing than the standard ranking svm and propose ranking vector svm rv svm that revises the norm ranking svm for faster training the number of variables in the rv svm is significantly smaller thus the rv svm trains much faster than the other ranking svms we experimentally compared the rv svm with the state of the art rank learning method provided in svm light the rv svm uses much less support vectors and trains much faster for nonlinear kernels than the svm light the accuracies of rv svm and svm light are comparable on relatively large data sets our implementation of rv svm is posted at http iispostechackr rv svm
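for readers unfamiliar with rank learning, the sketch below shows the standard pairwise reduction behind ranking svms: same-query preference pairs become difference vectors and a linear classifier on them yields a scoring function. it is not the rv svm formulation itself (which reworks the problem to use far fewer support vectors); the toy data and names are invented for this sketch.

    import numpy as np
    from sklearn.svm import LinearSVC

    def pairwise_transform(X, y, qid):
        # build difference vectors x_i - x_j for same-query pairs where y_i > y_j
        Xp, yp = [], []
        for q in np.unique(qid):
            idx = np.where(qid == q)[0]
            for i in idx:
                for j in idx:
                    if y[i] > y[j]:
                        Xp.append(X[i] - X[j]); yp.append(1)
                        Xp.append(X[j] - X[i]); yp.append(-1)
        return np.array(Xp), np.array(yp)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))                             # toy feature vectors
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = (X @ w_true > 0).astype(int) + (X @ w_true > 1.5)    # graded relevance 0/1/2
    qid = np.repeat([0, 1, 2], 10)                           # three toy queries

    Xp, yp = pairwise_transform(X, y, qid)
    model = LinearSVC(C=1.0).fit(Xp, yp)                     # linear scorer learned from preference pairs
    scores = X @ model.coef_.ravel()                         # documents are ranked by descending score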
the use of pointers presents serious problems for software productivity tools for software understanding restructuring and testing pointers enable indirect memory accesses through pointer dereferences as well as indirect procedure calls eg through function pointers in such indirect accesses and calls can be disambiguated with pointer analysis in this paper we evaluate the precision of one specific pointer analysis the fa pointer analysis by zhang et al for the purposes of call graph construction for programs with function pointers the analysis is incorporated in production strength code browsing tool from siemens corporate research in which the program call graph is used as primary tool for code understanding the fa pointer analysis uses an inexpensive almost linear flow and context insensitive algorithm to measure analysis precision we compare the call graph constructed by this analysis with the most precise call graph obtainable by large category of existing pointer analyses surprisingly for all our data programs the fa analysis achieves the best possible precision this result indicates that for the purposes of call graph construction inexpensive pointer analyses may provide precision comparable to the precision of expensive pointer analyses
this article describes gaming and storytelling activities in mixed environment that integrates the real and virtual worlds uses an augmented reality paradigm and is supported by structuring and presentation framework for use in context aware mixed reality applications the basis of the framework is generic hypermedia model that can handle different media elements objects and relations between spaces and locations in physical and virtual worlds main component of the model deals with providing contextual content according to the state of the application and the person using it storytelling layer was also defined mainly by using the contextual mechanisms of the base model this layer provides abstractions to storytelling applications that reflect the morphology of common story structures and supports gaming flow the framework is being tested in gaming and storytelling environment that integrates the real world media elements and virtual worlds
checkpointing with rollback recovery is well known method for achieving fault tolerance in distributed systems in this work we introduce algorithms for checkpointing and rollback recovery on asynchronous unidirectional and bi directional ring networks the proposed checkpointing algorithms can handle multiple concurrent initiations by different processes while taking checkpoints processes do not have to take into consideration any application message dependency the synchronization is achieved by passing control messages among the processes application messages are acknowledged each process maintains list of unacknowledged messages here we use logical checkpoint which is standard checkpoint ie snapshot of the process plus list of messages that have been sent by this process but are unacknowledged at the time of taking the checkpoint the worst case message complexity of the proposed checkpointing algorithm is kn when initiators initiate concurrently the time complexity is for the recovery algorithm time and message complexities are both
planning and allocating resources for testing is difficult and it is usually done on empirical basis often leading to unsatisfactory results the possibility of early estimating the potential faultiness of software could be of great help for planning and executing testing activities most research concentrates on the study of different techniques for computing multivariate models and evaluating their statistical validity but we still lack experimental data about the validity of such models across different software applications this paper reports an empirical study of the validity of multivariate models for predicting software fault proneness across different applications it shows that suitably selected multivariate models can predict fault proneness of modules of different software packages
administration of grid resources is time consuming and often tedious job most administrative requests are predictable and in general handling them requires knowledge of the local resources and the requester in this paper we discuss system to provide automated support for administrative requests such as resource reservation and user account management we propose using trust metrics to help judge the merits and suitability of each request we outline how these metrics can be implemented using trust management techniques into practical system we call gridadmin
computation of semantic similarity between concepts is very common problem in many language related tasks and knowledge domains in the biomedical field several approaches have been developed to deal with this issue by exploiting the knowledge available in domain ontologies snomed ct and specific closed and reliable corpuses clinical data however in recent years the enormous growth of the web has motivated researchers to start using it as the base corpus to assist semantic analysis of language this paper proposes and evaluates the use of the web as background corpus for measuring the similarity of biomedical concepts several classical similarity measures have been considered and tested using benchmark composed by biomedical terms and comparing the results against approaches in which specific clinical data were used results shows that the similarity values obtained from the web are even more reliable than those obtained from specific clinical data manifesting the suitability of the web as an information corpus for the biomedical domain
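one widely used way to turn web page counts into a similarity score is the normalized web (google) distance; the sketch below shows that computation under the assumption that hit counts for the two terms and their co-occurrence come from some search api. the paper evaluates several classical measures, so this is only a representative example, not necessarily the measure it adopts, and the counts in the usage line are illustrative.

    import math

    def normalized_web_distance(hits_a, hits_b, hits_ab, total_pages):
        # hits_a / hits_b: page counts for each term, hits_ab: pages containing both,
        # total_pages: an estimate of the number of indexed pages
        la, lb, lab = math.log(hits_a), math.log(hits_b), math.log(hits_ab)
        n = math.log(total_pages)
        return (max(la, lb) - lab) / (n - min(la, lb))

    def web_similarity(hits_a, hits_b, hits_ab, total_pages):
        # map the unbounded distance to a similarity in (0, 1]
        return math.exp(-normalized_web_distance(hits_a, hits_b, hits_ab, total_pages))

    # illustrative counts only; real counts would come from a search engine
    print(web_similarity(2_400_000, 1_100_000, 320_000, 50_000_000_000))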
multi label problems arise in various domains such as multi topic document categorization protein function prediction and automatic image annotation one natural way to deal with such problems is to construct binary classifier for each label resulting in set of independent binary classification problems since multiple labels share the same input space and the semantics conveyed by different labels are usually correlated it is essential to exploit the correlation information contained in different labels in this paper we consider general framework for extracting shared structures in multi label classification in this framework common subspace is assumed to be shared among multiple labels we show that the optimal solution to the proposed formulation can be obtained by solving generalized eigenvalue problem though the problem is nonconvex for high dimensional problems direct computation of the solution is expensive and we develop an efficient algorithm for this case one appealing feature of the proposed framework is that it includes several well known algorithms as special cases thus elucidating their intrinsic relationships we further show that the proposed framework can be extended to the kernel induced feature space we have conducted extensive experiments on multi topic web page categorization and automatic gene expression pattern image annotation tasks and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms
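the core computational step named above is a generalized eigenvalue problem; the sketch below shows how a shared subspace could be extracted with scipy, using an illustrative pair of matrices built from a feature matrix X and a label matrix Y. the exact matrices and regularization in the paper's formulation differ, so treat this as a sketch of the solver step only.

    import numpy as np
    from scipy.linalg import eigh

    def shared_subspace(X, Y, dim, reg=1e-3):
        # X: (n, d) feature matrix, Y: (n, k) binary label matrix
        # solve A v = lambda B v with an illustrative choice of A and B
        d = X.shape[1]
        A = X.T @ Y @ Y.T @ X                # label-correlated covariance (illustrative)
        B = X.T @ X + reg * np.eye(d)        # regularized data covariance
        w, V = eigh(A, B)                    # generalized symmetric eigenproblem, ascending
        order = np.argsort(w)[::-1][:dim]
        return V[:, order]                   # columns span the shared subspace

    rng = np.random.default_rng(0)
    U = shared_subspace(rng.normal(size=(100, 20)),
                        rng.integers(0, 2, size=(100, 4)).astype(float), dim=3)
    print(U.shape)   # (20, 3)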
wireless sensor networks wsns are composed of tiny devices with limited computation and battery capacities for such resource constrained devices data transmission is very energy consuming operation to maximize wsn lifetime it is essential to minimize the number of bits sent and received by each device one natural approach is to aggregate sensor data along the path from sensors to the sink aggregation is especially challenging if end to end privacy between sensors and the sink or aggregate integrity is required in this article we propose simple and provably secure encryption scheme that allows efficient additive aggregation of encrypted data only one modular addition is necessary for ciphertext aggregation the security of the scheme is based on the indistinguishability property of pseudorandom function prf standard cryptographic primitive we show that aggregation based on this scheme can be used to efficiently compute statistical values such as mean variance and standard deviation of sensed data while achieving significant bandwidth savings to protect the integrity of the aggregated data we construct an end to end aggregate authentication scheme that is secure against outsider only attacks also based on the indistinguishability property of prfs
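a minimal sketch of prf-based additively homomorphic encryption of the kind described: each sensor adds a pseudorandom pad modulo M, ciphertexts are aggregated with modular additions only, and the sink, knowing every key, strips the combined pad. instantiating the prf with hmac-sha256 and choosing M = 2^32 are assumptions of this sketch; M only needs to exceed the largest possible aggregate.

    import hmac, hashlib

    M = 2**32  # modulus; must be larger than the maximum possible sum of plaintexts

    def prf(key: bytes, nonce: bytes) -> int:
        # pseudorandom function instantiated here with hmac-sha256 (one common choice)
        return int.from_bytes(hmac.new(key, nonce, hashlib.sha256).digest(), "big") % M

    def encrypt(m, key, nonce):
        return (m + prf(key, nonce)) % M          # additively homomorphic ciphertext

    def aggregate(ciphertexts):
        return sum(ciphertexts) % M               # one modular addition per ciphertext

    def decrypt_sum(agg, keys, nonce):
        # the sink knows every sensor key, so it can remove the combined pad
        return (agg - sum(prf(k, nonce) for k in keys)) % M

    keys = [b"k1", b"k2", b"k3"]; nonce = b"epoch-42"
    cts = [encrypt(m, k, nonce) for m, k in zip([7, 11, 5], keys)]
    assert decrypt_sum(aggregate(cts), keys, nonce) == 23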
this paper presents general framework for the study of relation based intuitionistic fuzzy rough sets by using constructive and axiomatic approaches in the constructive approach by employing an intuitionistic fuzzy implicator and an intuitionistic fuzzy triangle norm lower and upper approximations of intuitionistic fuzzy sets with respect to an intuitionistic fuzzy approximation space are first defined properties of intuitionistic fuzzy rough approximation operators are examined the connections between special types of intuitionistic fuzzy relations and properties of intuitionistic fuzzy approximation operators are established in the axiomatic approach an operator oriented characterization of intuitionistic fuzzy rough sets is proposed different axiom sets characterizing the essential properties of intuitionistic fuzzy approximation operators associated with various intuitionistic fuzzy relations are explored
we present system for producing multi viewpoint panoramas of long roughly planar scenes such as the facades of buildings along city street from relatively sparse set of photographs captured with handheld still camera that is moved along the scene our work is significant departure from previous methods for creating multi viewpoint panoramas which composite thin vertical strips from video sequence captured by translating video camera in that the resulting panoramas are composed of relatively large regions of ordinary perspective in our system the only user input required beyond capturing the photographs themselves is to identify the dominant plane of the photographed scene our system then computes panorama automatically using markov random field optimization users may exert additional control over the appearance of the result by drawing rough strokes that indicate various high level goals we demonstrate the results of our system on several scenes including urban streets river bank and grocery store aisle
many applications of wireless sensor networks require the sensor nodes to obtain their locations the main idea in most localization methods has been that some statically deployed nodes landmarks with known coordinates eg gps equipped nodes transmit beacons with their coordinates in order to help other nodes to localize themselves promising method that significantly reduces the cost is to replace the set of statically deployed gps enhanced sensors with one mobile landmark equipped with gps unit that moves to cover the entire network in this case fundamental research issue is the planning of the path that the mobile landmark should travel along in order to minimize the localization error as well as the time required to localize the whole network these two objectives can potentially conflict with each other in this paper we first study three different trajectories for the mobile landmark namely scan double scan and hilbert we show that any deterministic trajectory that covers the whole area offers significant benefits compared to random movement of the landmark when the mobile landmark traverses the network area at fine resolution scan has the lowest localization error among the three trajectories followed closely by hilbert but when the resolution of the trajectory is larger than the communication range the hilbert space filling curve offers significantly better accuracy than the other two trajectories we further study the tradeoffs between the trajectory resolution and the localization accuracy in the presence of hop localization in which sensors that have already obtained an estimate of their positions help to localize other sensors we show that under moderate sensor mobility hop localization along with good trajectory reduces the average localization error over time by about
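to make the trajectory comparison concrete, the sketch below generates waypoints for the hilbert trajectory using the standard index-to-coordinate mapping for a hilbert space-filling curve; the cell_size parameter (how coarse or fine the resolution is relative to the communication range) is the knob the paper studies. scan and double scan are simple row sweeps and are omitted; function names here are invented for the sketch.

    def d2xy(order, d):
        # map index d along a hilbert curve covering a (2^order x 2^order) grid to (x, y)
        x = y = 0
        t, s = d, 1
        while s < (1 << order):
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:                      # rotate the quadrant
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        return x, y

    def hilbert_trajectory(order, cell_size):
        # waypoints (cell centres) for a mobile landmark sweeping a square field
        n = 1 << order
        return [((x + 0.5) * cell_size, (y + 0.5) * cell_size)
                for x, y in (d2xy(order, i) for i in range(n * n))]

    waypoints = hilbert_trajectory(order=3, cell_size=25.0)   # 8x8 cells at 25 m resolution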
bags ie sets with duplicates are often used to implement relations in database systems in this paper we study the expressive power of algebras for manipulating bags the algebra we present is simple extension of the nested relation algebra our aim is to investigate how the use of bags in the language extends its expressive power and increases its complexity we consider two main issues namely the relationship between the depth of bag nesting and the expressive power and ii the relationship between the algebraic operations and their complexity and expressive power we show that the bag algebra is more expressive than the nested relation algebra at all levels of nesting and that the difference may be subtle we establish hierarchy based on the structure of algebra expressions this hierarchy is shown to be highly related to the properties of the powerset operator
we propose method which given document to be classified automatically generates an ordered set of appropriate descriptors extracted from thesaurus the method creates bayesian network to model the thesaurus and uses probabilistic inference to select the set of descriptors having high posterior probability of being relevant given the available evidence the document to be classified our model can be used without having preclassified training documents although it improves its performance as more training data become available we have tested the classification model using document dataset containing parliamentary resolutions from the regional parliament of andalucia in spain which were manually indexed from the eurovoc thesaurus also carrying out an experimental comparison with other standard text classifiers
in recent work we showed how to implement new atomic keyword as an extension to the java programming language it allows program to perform series of heap accesses atomically without needing to use mutual exclusion locks we showed that data structures built using it could perform well and scale to large multi processor systems in this paper we extend our system in two ways firstly we show how to provide an explicit abort operation to abandon execution of an atomic block and to automatically undo any updates made within it secondly we show how to perform external io within an atomic block during our work we found that it was surprisingly difficult to support these operations without opening loopholes through which the programmer could subvert language based security mechanisms our final design is based on external action abstraction allowing code running within an atomic block to request that given pre registered operation be executed outside the block
we examine the problem of efficiently computing sum count avg aggregates over objects with non zero extent recent work on computing multi dimensional aggregates has concentrated on objects with zero extent points on multi dimensional grid or one dimensional intervals however in many spatial and or spatio temporal applications objects have extent in various dimensions while they can be located anywhere in the application space the aggregation predicate is typically described by multi dimensional box box sum aggregation we examine two variations of the problem in the simple case an object’s value contributes to the aggregation result as whole as long as the object intersects the query box more complex is the functional box sum aggregation introduced in this paper where objects participate in the aggregation proportionally to the size of their intersection with the query box we first show that both problems can be reduced to dominance sum queries traditionally dominance sum queries are addressed in main memory by static structure the ecdf tree we then propose two extensions namely the ecdf trees that make this structure disk based and dynamic finally we introduce the ba tree that combines the advantages from each ecdf tree we run experiments comparing the performance of the ecdf trees the ba tree and traditional tree which has been augmented to include aggregation information on its index nodes over spatial datasets our evaluation reaffirms that the ba tree has more robust performance compared against the augmented tree the ba tree offers drastic improvement in query performance at the expense of some limited extra space
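the reduction from box-sum to dominance-sum queries is the classical inclusion-exclusion over the 2^d corners of the query box; the sketch below assumes integer coordinates and a dominance_sum callback (e.g. backed by an ecdf-style structure), and covers the simple variant in which object contributions have already been mapped to point contributions.

    from itertools import product

    def box_sum(dominance_sum, lo, hi):
        # sum over the box [lo, hi] (inclusive, integer coords) from 2^d dominance sums
        d = len(lo)
        total = 0
        for choice in product((0, 1), repeat=d):
            corner = tuple(hi[i] if c else lo[i] - 1 for i, c in enumerate(choice))
            sign = (-1) ** (d - sum(choice))
            total += sign * dominance_sum(corner)
        return total

    # toy check against a brute-force prefix sum over weighted points
    points = {(1, 2): 5, (3, 3): 2, (4, 1): 7}
    dom = lambda p: sum(v for (x, y), v in points.items() if x <= p[0] and y <= p[1])
    assert box_sum(dom, lo=(2, 1), hi=(4, 3)) == 9   # (3,3) and (4,1) fall inside the box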
scientific programs often include multiple loops over the same data interleaving parts of different loops may greatly improve performance we exploit this in compiler for titanium dialect of java our compiler combines reordering optimizations such as loop fusion and tiling with storage optimizations such as array contraction eliminating or reducing the size of temporary arrays the programmers we have in mind are willing to spend some time tuning their code and their compiler parameters given that and the difficulty in statically selecting parameters such as tile sizes it makes sense to provide automatic parameter searching alongside the compiler our strategy is to optimize aggressively but to expose the compiler’s decisions to external control we double or triple the performance of gauss seidel relaxation and multi grid versus an optimizing compiler without tiling and array contraction and we argue that ours is the best compiler for that kind of program
recently time synchronization algorithm called pairwise broadcast synchronization pbs was proposed with pbs sensor can be synchronized by overhearing synchronization packet exchange among its neighbouring sensors without sending out any packet itself in a one hop sensor network where every node is neighbour of each other single pbs message exchange between two nodes would facilitate all nodes to synchronize however in multi hop sensor network pbs message exchanges in several node pairs are needed in order to achieve network wide synchronization to reduce the number of message exchanges these node pairs should be carefully chosen in this paper we investigate how to choose these appropriate sensors aiming at reducing the number of pbs message exchanges while allowing every node to synchronize this selection problem is shown to be np complete for which the greedy heuristic is good polynomial time approximation algorithm nevertheless centralized algorithm is not suitable for wireless sensor networks therefore we develop distributed heuristic algorithm allowing sensor to determine how to synchronize itself based on its neighbourhood information only the protocol is tested through extensive simulations the simulation results reveal that the proposed protocol gives consistent performance under different conditions with its performance comparable to that of the centralized algorithm
concurrent data structure implementation is considered non blocking if it meets one of three following liveness criteria wait freedom lock freedom or obstruction freedom developers of non blocking algorithms aim to meet these criteria however to date their proofs for non trivial algorithms have been only manual pencil and paper semi formal proofs this paper proposes the first fully automatic tool that allows developers to ensure that their algorithms are indeed non blocking our tool uses rely guarantee reasoning while overcoming the technical challenge of sound reasoning in the presence of interdependent liveness properties
routing algorithms with time and message complexities that are provably low and independent of the total number of nodes in the network are essential for the design and operation of very large scale wireless mobile ad hoc networks manets in this paper we develop and analyze cluster overlay broadcast cob low complexity routing algorithm for manets cob runs on top of one hop cluster cover of the network which can be created and maintained using for instance the least cluster change lcc algorithm we formally prove that the lcc algorithm maintains cluster cover with constant density of cluster leaders with minimal update cost cob discovers routes by flooding broadcasting route requests through the network of cluster leaders with doubling radius technique building on the constant density property of the network of cluster leaders we formally prove that if there exists route from source to destination node with minimum hop count of delta then cob discovers route with at most delta hops from the source to the destination node in at most delta time and by sending at most delta messages we prove this result for arbitrary node distributions and mobility patterns and also show that cob adapts asymptotically optimally to the mobility of the nodes in our simulation experiments we examine the network layer performance of cob compare it with dynamic source routing and investigate the impact of the mac layer on cob routing
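the doubling-radius (expanding-ring) search at the heart of cob can be sketched as follows; flood is a placeholder for flooding a route request with a given ttl over the cluster-leader overlay and returning a route if the destination answers. because the per-round cost grows geometrically, the total cost stays within a constant factor of a single search at the final radius, which is how the delta-proportional bounds arise.

    def discover_route(flood, dest, max_radius):
        # flood(dest, ttl) is a placeholder: broadcast a route request with the given
        # ttl through the network of cluster leaders and return a route or None
        ttl = 1
        while ttl <= max_radius:
            route = flood(dest, ttl)
            if route is not None:
                return route      # found with ttl within a factor 2 of the true hop count
            ttl *= 2              # doubling keeps total cost near the final round's cost
        return None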
the development of real time systems is based on variety of different methods and notations despite the purported benefits of formal methods informal techniques still play predominant role in current industrial practice formal and informal methods have been combined in various ways to smoothly introduce formal methods in industrial practice the combination of real time structured analysis sa rt with petri nets is among the most popular approaches but has been applied only to requirements specifications this paper extends sa rt to specifications of the detailed design of embedded real time systems and combines the proposed notation with petri nets
thermal effects are becoming increasingly important in today’s sub micron technologies thermal issues affect the performance the reliability and the cooling costs of integrated systems high peak temperatures are of major concern in modern designs where the stacking of multiple layers leads to higher power densities therefore the integration of the thermal aware design during the initial phases of the design can reduce the cost and the time to market of the resulting product an efficient floorplanning in terms of thermal effects will reduce the appearance of critical hotspots and will spread heat across the chip area this paper analyzes the thermal distribution of multicore architectures and provides motivation for the need of thermal aware floorplanner for such architectures
protecting the sensitive information in company’s data warehouse from unauthorized access is an important component of regulatory compliance and privacy protection for business intelligence bi applications the access control features in current database systems are not suitable to meet this requirement since they are limited to base table accesses whereas bi applications typically use materialized views for better performance in this paper we provide middleware enabled policy based framework that allows access control to be uniformly applied to both base tables and materialized views to enable selective access to data warehouse we also provide empirical evaluation of our approach
within distributed computing environments access to very large geospatial datasets often suffers from slow or unreliable network connections to allow users to start working with partially delivered dataset progressive transmission methods are viable solution while incremental and progressive methods have been applied successfully to the transmission of raster images over the world wide web and in the form of prototypes of triangular meshes the transmission of vector map datasets has lacked similar attention this paper introduces solution to the progressive transmission of vector map data that allows users to apply analytical gis methods to partially transmitted data sets the architecture follows client server model with multiple map representations at the server side and thin client that compiles transmitted increments into topologically consistent format this paper describes the concepts develops an architecture and discusses implementation concerns
zigzag is unique hyperstructural paradigm designed by the hypertext pioneer ted nelson it has piqued lot of interest in the hypertext community in recent years because of its aim of revolutionizing electronic access to information and knowledge bases in zigzag information is stored in cells that are arranged into lists organized along unlimited numbers of intersecting sets of associations called dimensions to this infrastructure mechanism of transclusion is added allowing the data stored in cells to span and hence be utilized in different contexts proponents of zigzag claim that it is flexible and universal structure for information representation and yet the system has not been widely adopted and has been implemented even more rarely in this paper we address the question of whether there are intrinsic theoretical reasons as to why this is the case while the basic features and specifications of zigzag are well known we delve into the less understood area of its theoretical underpinnings to tackle this question by modeling zigzag within the framework of set theory we reveal new class of hyperstructure that contains no referencable link objects whatsoever instead grouping non referencable binary associations into disjunct but parallel sets of common semantics dimensions we go on to further specialize these dimensional models into sets of finite partial functions which are closed over single domain isolating the new class of hyperstructures we are calling hyperorders this analysis not only sheds light on the benefits and limitations of the zigzag hypermedia system but also provides framework to describe and understand wider family of possible hyperstructure models of which it is an early example characteristics of zigzag’s transclusion mechanisms are also investigated highlighting previously unrecognized distinction and potential irrevocable conflict between two distinct uses of content reuse instance and identity transclusion
this paper presents novel methodology for modelling and analyzing behavior relations of concurrent systems the set of all firing sequences of petri net is an important tool for describing the dynamic behavior of concurrent systems in this paper the behavior relativity of two concurrent subsystems in their synchronous composition is presented such behavior relativities including controlled relativity united relativity interactive relativity and exclusive relativity are defined respectively the properties of the relativities are discussed in detail the analysis method for these properties is based on minimum invariants when two subsystems are live bounded petri nets well known example has also been analysed using the new methodology to demonstrate the advantages of the proposed methodology
the dynamic nature of clinical work makes it challenging to assess the usability of mobile information and communication technology ict for hospitals to achieve some of the realism of field evaluations combined with the control of laboratory based evaluations we have conducted usability tests of prototypes in laboratory custom designed as full scale ward section nurses and physicians acting out simulated clinical scenarios have used the prototypes this paper reports on the general methodological lessons learned from three such formative usability evaluations we have learned that the physical test environment the test scenarios and the prototypes form three variables that need to reflect sufficient amount of realism and concreteness in order to help generate valid test results at the same time these variables are tools that can help control and focus the evaluation on specific issues that one wants to gather data on we have also learned that encouraging reflection among participants and using detailed multi perspective recordings of usage can help form more precise understanding of how mobile ict can accommodate clinical work the current paper aims to inform work toward best practice for laboratory based evaluations of mobile ict for hospitals
wireless sensor networks consist of large number of tiny sensors that have only limited energy supply one of the major challenges in constructing such networks is to maintain long network lifetime as well as sufficient sensing area to achieve this goal broadly used method is to turn off redundant sensors in this paper the problem of estimating redundant sensing areas among neighbouring wireless sensors is analysed we present an interesting observation concerning the minimum and maximum number of neighbours that are required to provide complete redundancy and introduce simple methods to estimate the degree of redundancy without the knowledge of location or directional information we also provide tight upper and lower bounds on the probability of complete redundancy and on the average partial redundancy with random sensor deployment our analysis shows that partial redundancy is more realistic for real applications as complete redundancy is expensive requiring up to neighbouring sensors to provide percent chance of complete redundancy our results can be utilised in designing effective sensor scheduling algorithms to reduce energy consumption and in the meantime maintain reasonable sensing area
to cope with the challenges posed by device capacity and capability and also the nature of ad hoc networks service discovery model is needed that can resolve security and privacy issues with simple solutions the use of complex algorithms and powerful fixed infrastructure is infeasible due to the volatile nature of pervasive environment and tiny pervasive devices in this paper we present trust based secure service discovery model tssd trust based secure service discovery for truly pervasive environment our model is hybrid one that allows both secure and non secure discovery of services this model allows service discovery and sharing based on mutual trust the security model handles the communication and service sharing security issues tssd also incorporates trust mode for sharing services with unknown devices
order sorted logic has been formalized as first order logic with sorted terms where sorts are ordered to build hierarchy called sort hierarchy these sorted logics lead to useful expressions and inference methods for structural knowledge that ordinary first order logic lacks nitta et al pointed out that for legal reasoning sort hierarchy or sorted term is not sufficient to describe structural knowledge for event assertions which express facts caused at some particular time and place the event assertions are represented by predicates with arguments ie ary predicates and then particular kind of hierarchy called predicate hierarchy is built by relationship among the predicates to deal with such predicate hierarchy which is more intricate than sort hierarchy nitta et al implemented typed sorted logic programming language extended to include hierarchy of verbal concepts corresponding to predicates however the inference system lacks theoretical foundation because its hierarchical expressions exceed the formalization of order sorted logic in this paper we formalize logic programming language with not only sort hierarchy but also predicate hierarchy this language can derive general and concrete expressions in the two kinds of hierarchies for the hierarchical reasoning of predicates we propose manipulation of arguments in which surplus and missing arguments in derived predicates are eliminated and supplemented as discussed by allen mcdermott and shoham in research on temporal logic and as applied by nitta et al to legal reasoning if each predicate is interpreted as an event or action not as static property then missing arguments should be supplemented by existential terms in the argument manipulation based on this we develop horn clause resolution system extended to add inference rules of predicate hierarchies with semantic model restricted by interpreting predicate hierarchy the soundness and completeness of the horn clause resolution is proven
schema mappings are declarative specifications that describe the relationship between two database schemas in recent years there has been an extensive study of schema mappings and of their applications to several different data inter operability tasks including applications to data exchange and data integration schema mappings are expressed in some logical formalism that is typically fragment of first order logic or fragment of second order logic these fragments are chosen because they possess certain desirable structural properties such as existence of universal solutions or closure under target homomorphisms in this paper we turn the tables and focus on the following question can we characterize the various schema mapping languages in terms of structural properties possessed by the schema mappings specified in these languages we obtain number of characterizations of schema mappings specified by source to target dependencies including characterizations of schema mappings specified by lav local as view tgds schema mappings specified by full tgds and schema mappings specified by arbitrary tgds these results shed light on schema mapping languages from new perspective and more importantly demarcate the properties of schema mappings that can be used to reason about them in data inter operability applications
the use of query independent knowledge to improve the ranking of documents in information retrieval has proven very effective in the context of web search this query independent knowledge is derived from an analysis of the graph structure of hypertext links between documents however there are many cases where explicit hypertext links are absent or sparse eg corporate intranets previous work has sought to induce graph link structure based on various measures of similarity between documents after inducing these links standard link analysis algorithms eg pagerank can then be applied in this paper we propose and examine an alternative approach to derive query independent knowledge which is not based on link analysis instead we analyze each document independently and calculate specificity score based on normalized inverse document frequency and ii term entropies two re ranking strategies ie hard cutoff and soft cutoff are then discussed to utilize our query independent specificity scores experiments on standard trec test sets show that our re ranking algorithms produce gains in mean reciprocal rank of about and to gains in precision at and respectively when using the collection of trec disk and queries from trec ad hoc topics empirical tests demonstrate that the entropy based algorithm produces stable results across retrieval models ii query sets and iii collections
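a minimal sketch of the two query-independent evidence sources named above, computed per document from token lists: (a) the mean normalized inverse document frequency of its distinct terms and (b) the normalized entropy of its term distribution. the exact normalizations and the hard/soft cutoff re-ranking strategies in the paper may differ from this sketch.

    import math
    from collections import Counter

    def specificity_scores(docs):
        # docs: dict mapping doc id -> list of tokens
        n_docs = len(docs)
        df = Counter(t for tokens in docs.values() for t in set(tokens))
        max_idf = math.log(n_docs) or 1.0            # idf upper bound used for normalization
        scores = {}
        for doc_id, tokens in docs.items():
            tf = Counter(tokens)
            if not tf:
                scores[doc_id] = (0.0, 0.0)
                continue
            # (a) mean normalized inverse document frequency of the distinct terms
            nidf = sum(math.log(n_docs / df[t]) / max_idf for t in tf) / len(tf)
            # (b) normalized entropy of the within-document term distribution
            probs = [c / len(tokens) for c in tf.values()]
            ent = -sum(p * math.log(p) for p in probs)
            ent_norm = ent / math.log(len(tf)) if len(tf) > 1 else 0.0
            scores[doc_id] = (nidf, ent_norm)
        return scores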
software testing is an essential process to improve software quality in practice researchers have proposed several techniques to automate parts of this process in particular symbolic execution can be used to automatically generate set of test inputs that achieves high code coverage however most state of the art symbolic execution approaches cannot directly handle programs whose inputs are pointers as is often the case for programs automatically generating test inputs for pointer manipulating code such as linked list or balanced tree implementation remains challenge eagerly enumerating all possible heap shapes forfeits the advantages of symbolic execution alternatively for tester writing assumptions to express the disjointness of memory regions addressed by input pointers is tedious and labor intensive task this paper proposes novel solution for this problem by exploiting type information disjointness constraints that characterize permissible configurations of typed pointers in byte addressable memory can be automatically generated as result the constraint solver can automatically generate relevant heap shapes for the program under test we report on our experience with an implementation of this approach in pex dynamic symbolic execution framework for net we examine two different symbolic representations for typed memory and we discuss the impact of various optimizations
the aim of this paper is to contribute to an understanding of how pd plays out in emerging large scale is projects we argue that even if many of these projects start out on well founded small step methodological basis such as agile methods xp etc organizational politics and maneuvering will inevitably be part of the process especially as the scope and size of the system increases more specifically we discuss this implicated organizational complexity the increasingly unclear user roles as well as critically examine the traditional neutral vendor role which is an assumption of agile engineering methods
the connection between integrality gaps and computational hardness of discrete optimization problems is an intriguing question in recent years this connection has prominently figured in several tight ugc based hardness results we show in this paper direct way of turning integrality gaps into hardness results for several fundamental classification problems specifically we convert linear programming integrality gaps for the multiway cut extension and metric labeling problems into ugc based hardness results qualitatively our result suggests that if the unique games conjecture is true then linear relaxation of the latter problems studied in several papers so called earthmover linear program yields the best possible approximation taking this step further we also obtain integrality gaps for semi definite programming relaxation matching the integrality gaps of the earthmover linear program prior to this work there was an intriguing possibility of obtaining better approximation factors for labeling problems via semi definite programming
while overall bandwidth in the internet has grown rapidly over the last few years and an increasing number of clients enjoy broadband connectivity many others still access the internet over much slower dialup or wireless links to address this issue number of techniques for optimized delivery of web and multimedia content over slow links have been proposed including protocol optimizations caching compression and multimedia transcoding and several large isps have recently begun to widely promote dialup acceleration services based on such techniques recent paper by rhea liang and brewer proposed an elegant technique called value based caching that caches substrings of files rather than entire files and thus avoids repeated transmission of substrings common to several pages or page versions we propose and study hierarchical substring caching technique that provides significant savings over this basic approach we describe several additional techniques for minimizing overheads and perform an evaluation on large set of real web access traces that we collected in the second part of our work we compare our approach to widely studied alternative approach based on delta compression and show how to integrate the two for best overall performance the studied techniques are typically employed in client proxy environment with each proxy serving large number of clients and an important aspect is how to conserve resources on the proxy while exploiting the significant memory and cpu power available on current clients
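to illustrate substring (value-based) caching, the sketch below splits a payload into content-defined chunks with a crude rolling fingerprint and replaces chunks the client has already seen with short digest references; real systems use rabin fingerprints, larger chunks and, in the hierarchical variant studied here, multiple chunk granularities, so this is only the basic idea with invented parameters.

    import hashlib

    def chunk(data: bytes, mask=0x3F, min_size=32):
        # content-defined chunking with a crude 32-bit fingerprint of recent bytes;
        # a boundary is declared whenever the low bits of the fingerprint match the mask
        chunks, start, h = [], 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + b) & 0xFFFFFFFF
            if i - start + 1 >= min_size and (h & mask) == mask:
                chunks.append(data[start:i + 1]); start = i + 1; h = 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks

    def encode(data: bytes, seen: set):
        # replace chunks the client already holds with short digest references
        out = []
        for c in chunk(data):
            d = hashlib.sha1(c).digest()
            out.append(("ref", d) if d in seen else ("data", c))
            seen.add(d)
        return out

    cache = set()
    page_v1 = bytes(range(256)) * 8 + b"edition one"
    page_v2 = bytes(range(256)) * 8 + b"edition two"
    encode(page_v1, cache)
    delta = encode(page_v2, cache)   # re-encoding a slightly changed page reuses cached chunks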
when faced with anything out of the ordinary faulty or suspicious the work of determining and categorizing the trouble and scoping for what to do about it if anything often go hand in hand this is diagnostic work in all its expert and non expert forms diagnostic work is often both intellectual and embodied collaborative and distributed and ever more deeply entangled with technologies yet it is often poorly supported by them in this special issue we show that diagnostic work is an important and pervasive aspect of people’s activities at work at home and on the move the papers published in this special issue come from range of domains including ambulance dispatch friendly fire incident and anomaly response for the nasa space shuttle software network and photocopier troubleshooting and users attempting to use new travel management system these papers illustrate the variety of work that may be thought of as diagnostic we hope that bringing focus on diagnostic work to these diverse practices and situations opens up rich vein of inquiry for cscw scholars designers and users
multicore processors contain new hardware characteristics that are different from previous generation single core systems or traditional smp symmetric multiprocessing multiprocessor systems these new characteristics provide new performance opportunities and challenges in this paper we show how hardware performance monitors can be used to provide fine grained closely coupled feedback loop to dynamic optimizations done by multicore aware operating system these multicore optimizations are possible due to the advanced capabilities of hardware performance monitoring units currently found in commodity processors such as execution pipeline stall breakdown and data address sampling we demonstrate three case studies on how multicore aware operating system can use these online capabilities for determining cache partition sizes which helps reduce contention in the shared cache among applications detecting memory regions with bad cache usage which helps in isolating these regions to reduce cache pollution and detecting sharing among threads which helps in clustering threads to improve locality using realistic applications from standard benchmark suites the following performance improvements were achieved up to improvement in ipc instructions per cycle due to cache partition sizing up to reduction in cache miss rates due to reduced cache pollution resulting in up to improvement in ipc and up to reduction in remote cache accesses due to thread clustering resulting in up to application level improvement
business activity monitoring bam aims to support the real time analysis of business processes in order to improve the speed and effectiveness of business operations providing timely integrated high level view on the evolution and well being of business activities within enterprises constitutes highly valuable analytical tool for monitoring managing and hopefully enhancing businesses however the degree of automation currently achieved cannot support the level of reactivity and adaptation demanded by businesses we argue that the fundamental problem is that moving between the business level and the it level is insufficiently automated and suggest an extensive use of semantic technologies as solution in particular we present sentinel semantic business process monitoring tool that advances the state of the art in bam by making extensive use of semantic technologies in order to support the integration and derivation of business level knowledge out of low level audit trails generated by it systems
with the rapid advancement in wireless communications and positioning techniques it is now feasible to track the positions of moving objects however existing indexes and associated algorithms which are usually disk based are unable to keep up with the high update rate while providing speedy retrieval at the same time since main memory is much faster than disk efficient management of moving object database can be achieved through aggressive use of main memory in this paper we propose an integrated memory partitioning and activity conscious twin index impact framework where the moving object database is indexed by pair of indexes based on the properties of the objects movement main memory structure manages active objects while disk based index handles inactive objects as objects become active or inactive they dynamically migrate from one structure to the other in the worst case where each object needs to be migrated to the disk at every update which means each update may incur disk access the performance of impact degrades to be the same as the disk based index structures moreover the main memory is also organized into two partitions one for the main memory index and the other as buffers for the frequently accessed nodes of the disk based index we also present the detailed algorithms for different operations and cost model to estimate the optimal memory allocation our analytical and experimental results show that the proposed impact framework achieves significant performance improvement over the traditional indexing scheme
while participatory design makes end users part of the design process we might also want the resulting system to be open for interpretation appropriation and change over time to reflect its usage but how can we design for appropriation we need to strike good balance between making the user an active co constructor of system functionality versus making too strong interpretative design that does it all for the user thereby inhibiting their own creative use of the system through revisiting five systems in which appropriation has happened both within and outside the intended use we are going to show how it can be possible to design with open surfaces these open surfaces have to be such that users can fill them with their own interpretation and content they should be familiar to the user resonating with their real world practice and understanding thereby shaping its use
despite the success of global search engines website search is still problematic in its retrieval accuracy server logs contain rich source of information about how users actually access website in this paper we propose novel approach of using server log analysis to extract terms to build the web page representation which is new source of evidence for website search then we use multiple evidence combination to combine this log based evidence with text based and anchor based evidence we test the performance of different combination approaches the combination of representations and of ranking scores using linear combination and inference network models we also consider different baseline retrieval models our experimental results have shown that the server log when used in multiple evidence combination can improve the effectiveness of website search whereas the impact on different models is different
generic traversals over recursive data structures are often referred to as boilerplate code the definitions of functions involving such traversals may repeat very similar patterns but with variations for different data types and different functionality libraries of operations abstracting away boilerplate code typically rely on elaborate types to make operations generic the motivating observation for this paper is that most traversals have value specific behaviour for just one type we present the design of new library exploiting this assumption our library allows concise expression of traversals with competitive performance
mado interface is tangible user interface consisting of compact touch screen display and physical blocks mado means window in japanese and mado interface is utilized as the real window into the virtual world users construct physical object by simply combining electrical blocks then by connecting mado interface to the physical object they can watch the virtual model corresponding to the physical block configuration shape color etc the size and the viewpoint of the virtual model seen by the user depend on the position of mado interface maintaining the consistency between the physical and virtual worlds in addition users can interact with the virtual model by touching the display on mado interface these features enable users to explore the virtual world intuitively and powerfully
we explore dtm techniques within the context of uniform and nonuniform smt workloads while dvs is suitable for addressing workloads with uniformly high temperatures for nonuniform workloads performance loss occurs because of the slowdown of the cooler thread to address this we propose and evaluate dtm mechanisms that exploit the steering based thread management mechanisms inherent in clustered smt architecture we show that in contrast to dvs which operates globally our techniques are more effective at controlling temperature for nonuniform workloads furthermore we devise dtm technique that combines steering and dvs to achieve consistently good performance across all workloads
we present memory aware load balancing malb technique to dispatch transactions to replicas in replicated database our malb algorithm exploits knowledge of the working sets of transactions to assign them to replicas in such way that they execute in main memory thereby reducing disk in support of malb we introduce method to estimate the size and the contents of transaction working sets we also present an optimization called update filtering that reduces the overhead of update propagation between replicas we show that malb greatly improves performance over other load balancing techniques such as round robin least connections and locality aware request distribution lard that do not use explicit information on how transactions use memory in particular lard demonstrates good performance for read only static content web workloads but it gives performance inferior to malb for database workloads as it does not efficiently handle large requests malb combined with update filtering further boosts performance over lard we build prototype replicated system called tashkent with which we demonstrate that malb and update filtering techniques improve performance of the tpc and rubis benchmarks in particular in replica cluster and using the ordering mix of tpc malb doubles the throughput over least connections and improves throughput over lard malb with update filtering further improves throughput to triple that of least connections and more than double that of lard our techniques exhibit super linear speedup the throughput of the replica cluster is times the peak throughput of standalone database due to better use of the cluster’s memory
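As a rough illustration of the memory-aware dispatch idea summarized above, the sketch below sends each transaction to the replica whose cached working set already covers most of the transaction's estimated working set, subject to a memory budget. The scoring rule, the set-based working-set model, and the field names are assumptions for illustration, not the MALB algorithm itself.

```python
def dispatch(tx_working_set, replicas):
    """tx_working_set: set of page ids; replicas: list of dicts with
    'cached' (set of page ids currently in memory) and 'mem_pages' (capacity)."""
    best, best_score = None, None
    for r in replicas:
        overlap = len(tx_working_set & r['cached'])      # pages already resident
        extra = len(tx_working_set - r['cached'])        # pages that would be faulted in
        fits = len(r['cached']) + extra <= r['mem_pages']
        # prefer replicas where the transaction runs entirely in memory,
        # then maximum overlap, then minimum extra paging
        score = (fits, overlap, -extra)
        if best_score is None or score > best_score:
            best, best_score = r, score
    best['cached'] |= tx_working_set   # model the effect of running the transaction
    return best

# usage: dispatch({1, 2, 3}, [{'cached': {1, 2}, 'mem_pages': 10},
#                             {'cached': set(), 'mem_pages': 10}])
```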
distributing data is one of the key problems in implementing efficient distributed memory parallel programs the problem becomes more difficult in programs where data redistribution between computational phases is considered the global data distribution problem is to find the optimal distribution in multi phase parallel programs solving this problem requires accurate knowledge of data redistribution cost we are investigating this problem in the context of software distributed shared memory sdsm system in which obtaining accurate redistribution cost estimates is difficult this is because sdsm communication is implicit it depends on access patterns page locations and the sdsm consistency protocol we have developed integrated compile and run time analysis for sdsm systems to determine accurate redistribution cost estimates with low overhead our resulting system suif adapt can efficiently and accurately estimate execution time including redistribution to within of the actual time in all of our test cases and is often much closer these precise costs enable suif adapt to find efficient global data distributions in multiple phase programs
computer understanding of human actions and interactions is one of the key research issues in human computing in this regard context plays an essential role in semantic understanding of human behavioral and social signals from sensor data this paper puts forward an event based dynamic context model to address the problems of context awareness in the analysis of group interaction scenarios event driven multilevel dynamic bayesian network is correspondingly proposed to detect multilevel events which underlies the context awareness mechanism online analysis can be achieved which is superior to previous works experiments in our smart meeting room demonstrate the effectiveness of our approach
extracting classification rules from data is an important task of data mining and has been gaining considerably more attention in recent years in this paper new meta heuristic algorithm which is called taco miner is proposed for rule extraction from artificial neural networks ann the proposed rule extraction algorithm actually works on the trained anns in order to discover the hidden knowledge which is available in the form of connection weights within ann structure the proposed algorithm is mainly based on meta heuristic which is known as touring ant colony optimization taco and consists of two step hierarchical structure the proposed algorithm is experimentally evaluated on six binary and ary classification benchmark data sets results of the comparative study show that taco miner is able to discover accurate and concise classification rules
in this paper general scheme for solving coherent geometric queries on freeform geometry is presented and demonstrated on variety of problems common in geometric modeling the underlying strategy of the approach is to lift the domain of the problem into higher dimensional space to enable analysis on the continuum of all possible configurations of the geometry this higher dimensional space supports analysis of changes to solution topology by solving for critical points using spline based constraint solver the critical points are then used to guide fast local methods to robustly update repeated queries this approach effectively combines the speed of local updates with the robustness of global search solutions the effectiveness of the domain lifting scheme dls is demonstrated on several geometric computations including accurately generating offset curves and finding minimum distances our approach requires preprocessing step that computes the critical points but once the topology is analyzed an arbitrary number of geometry queries can be solved using fast local methods experimental results show that the approach solves for several hundred minimum distance computations between planar curves in one second and results in hundredfold speedup for trimming self intersections in offset curves
metadata repository systems manage metadata typically represented as models or meta models in order to facilitate repository application development dedicated query language addressing the specific capabilities of such systems is required this paper introduces declarative query language for querying omg mof based metadata repository systems called msql meta sql some of the key features of msql are support for higher order queries and model independent querying unified handling of repository data and metadata quantification over repository model elements sql alignment some of the areas where msql may be applied are querying schematically disparate models in mof repositories metadata application development generic browsing of complex meta data data collections and ultimately model driven development
ubiquitous computing challenges the conventional notion of user logged into personal computing device whether it is desktop laptop or digital assistant when the physical environment of user contains hundreds of networked computer devices each of which may be used to support one or more user applications the notion of personal computing becomes inadequate further when group of users share such physical environment new forms of sharing cooperation and collaboration are possible and mobile users may constantly change the computers with which they interact we refer to these digitally augmented physical spaces as active spaces we present in this paper an application framework that provides mechanisms to construct run or adapt existing applications to ubiquitous computing environments the framework binds applications to users uses multiple devices simultaneously and exploits resource management within the users environment that reacts to context and mobility our research contributes to application mobility partitioning and adaptation within device rich environments and uses context awareness to focus the resources of ubiquitous computing environments on the needs of users
many scientific problems can be represented as computational workflows of operations that access remote data integrate heterogeneous data and analyze and derive new data even when the data access and processing operations are implemented as web or grid services workflows are often constructed manually in languages such as bpel adding semantic descriptions of the services enables automatic or mixed initiative composition in most previous work these descriptions consist of semantic types for inputs and outputs of services or type for the service as whole while this is certainly useful we argue that it is not enough to model and construct complex data workflows we present planning approach to automatically constructing data processing workflows where the inputs and outputs of services are relational descriptions in an expressive logic our workflow planner uses relational subsumption to connect the output of service with the input of another this modeling style has the advantage that adaptor services so called shims can be automatically inserted into the workflow where necessary
we consider the problem of deciding query equivalence for conjunctive language in which queries output complex objects composed from mixture of nested unordered collection types using an encoding of nested objects as flat relations we translate the problem to deciding the equivalence between encodings output by relational conjunctive queries this encoding equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics as part of our characterization of encoding equivalence we define normal form for encoding queries and contend that this normal form offers new insight into the fundamental principles governing the behaviour of nested aggregation
collaborative filtering cf is one of the most widely used methods for personalized product recommendation at online stores cf predicts users preferences on products using past data of users such as purchase records or their ratings on products the prediction is then used for personalized recommendation so that products with highly estimated preference for each user are selected and presented one of the most difficult issues in using cf is that it is often hard to collect sufficient amount of data for each user to estimate preferences accurately enough in order to address this problem this research studies how we can gain the most information about each user by collecting data on very small number of selected products and develops method for choosing sequence of such products tailored to each user based on metrics from information theory and correlation based product similarity the effectiveness of the proposed methods is tested using experiments with the movielens dataset
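The abstract above combines an information-theoretic metric with correlation-based product similarity to pick which products to ask a user about; a hedged sketch of one such criterion (item rating entropy discounted by similarity to items already rated) is given below. The exact scoring formula is an assumption for illustration, not the paper's method.

```python
import numpy as np

def item_entropy(ratings_column):
    # ratings_column: 1-D float array of observed ratings, NaN for missing
    vals = ratings_column[~np.isnan(ratings_column)]
    if len(vals) == 0:
        return 0.0
    _, counts = np.unique(vals, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def next_item_to_ask(ratings, asked, sim):
    """ratings: users x items matrix (NaN = missing); asked: item indices the
    target user already rated; sim: item-item correlation matrix."""
    best, best_score = None, -np.inf
    for j in range(ratings.shape[1]):
        if j in asked:
            continue
        # penalize items too similar to what has already been asked
        redundancy = max((abs(sim[j, k]) for k in asked), default=0.0)
        score = item_entropy(ratings[:, j]) * (1.0 - redundancy)
        if score > best_score:
            best, best_score = j, score
    return best
```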
we describe the design and implementation of the glue nail database system the nail language is purely declarative query language glue is procedural language used for non query activities the two languages combined are sufficient to write complete application nail and glue code both compile into the target language iglue the nail compiler uses variants of the magic sets algorithm and supports well founded models static optimization is performed by the glue compiler using techniques that include peephole methods and data flow analysis the iglue code is executed by the iglue interpreter which features run time adaptive optimizer the three optimizers each deal with separate optimization domains and experiments indicate that an effective synergism is achieved the glue nail system is largely complete and has been tested using suite of representative applications
self scheduling algorithms are useful for achieving load balance in heterogeneous computational systems therefore they can be applied in computational grids here we introduce two families of self scheduling algorithms the first considers an explicit form for the chunks distribution function the second focuses on the variation rate of the chunks distribution function from the first family we propose quadratic self scheduling qss algorithm from the second two new algorithms exponential self scheduling ess and root self scheduling rss are introduced qss ess and rss are tested in an internet based grid of computers involving resources from spain and mexico qss and ess outperform previous self scheduling algorithms qss is found slightly more efficient than ess rss shows poor performance fact traced back to the curvature of the chunks distribution function
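The abstract describes chunk-distribution functions with quadratic, exponential, and root forms; the sketch below shows two generic decreasing chunk-size generators in that spirit, with idle workers assumed to grab the next chunk from the list. The concrete formulas are illustrative stand-ins, not the QSS/ESS/RSS definitions from the paper.

```python
def quadratic_chunks(total_iters, n_steps):
    # chunk i proportional to a decreasing quadratic in i (illustrative form)
    weights = [(n_steps - i) ** 2 for i in range(n_steps)]
    s = float(sum(weights))
    chunks, remaining = [], total_iters
    for w in weights:
        c = min(remaining, max(1, round(total_iters * w / s)))
        chunks.append(c)
        remaining -= c
        if remaining == 0:
            break
    if remaining > 0:
        chunks[-1] += remaining          # make the chunks sum exactly to total_iters
    return chunks

def exponential_chunks(total_iters, ratio=0.7, min_chunk=1):
    # each chunk is a fixed fraction of the remaining iterations (illustrative form)
    chunks, remaining = [], total_iters
    while remaining > 0:
        c = min(remaining, max(min_chunk, int(remaining * (1 - ratio))))
        chunks.append(c)
        remaining -= c
    return chunks

# example: carve 1000 loop iterations into decreasing chunks for self-scheduling
print(quadratic_chunks(1000, n_steps=8))
print(exponential_chunks(1000, ratio=0.7))
```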
designing complex software system is cognitively challenging task thus designers need cognitive support to create good designs domain oriented design environments are cooperative problem solving systems that support designers in complex design tasks in this paper we present the architecture and facilities of argo domain oriented design environment for software architecture argo's own architecture is motivated by the desire to achieve reuse and extensibility of the design environment it separates domain neutral code from domain oriented code which is distributed among active design materials as opposed to being centralized in the design environment argo's facilities are motivated by the observed cognitive needs of designers these facilities extend previous work in design environments by enhancing support for reflection in action and adding new support for opportunistic design and comprehension and problem solving
despite the automated refactoring support provided by today’s ides many program transformations that are easy to conceptualize such as improving the implementation of design pattern are not supported and are hence hard to perform we propose an extension to the refactoring paradigm that provides for the modular maintenance of crosscutting design idioms supporting both substitutability of design idiom implementations and the checking of essential constraints we evaluate this new approach through the design and use of arcum an ide based mechanism for declaring checking and evolving crosscutting design idioms
one of the key tasks of database administrator is to optimize the set of materialized indices with respect to the current workload to aid administrators in this challenging task commercial dbmss provide advisors that recommend set of indices based on sample workload it is left for the administrator to decide which of the recommended indices to materialize and when this decision requires some knowledge of how the indices benefit the workload which may be difficult to understand if there are any dependencies or interactions among indices unfortunately advisors do not provide this crucial information as part of the recommendation motivated by this shortcoming we propose framework and associated tools that can help an administrator understand the interactions within the recommended set of indices we formalize the notion of index interactions and develop novel algorithm to identify the interaction relationships that exist within set of indices we present experimental results with prototype implementation over ibm db that demonstrate the efficiency of our approach we also describe two new database tuning tools that utilize information about index interactions the first tool visualizes interactions based on partitioning of the index set into non interacting subsets and the second tool computes schedule that materializes the indices over several maintenance windows with maximal overall benefit in both cases we provide strong analytical results showing that index interactions can enable enhanced functionality
in this paper we present new approach for labeling points with different geometric surface primitives using novel feature descriptor the fast point feature histograms and discriminative graphical models to build informative and robust feature point representations our descriptors encode the underlying surface geometry around point using multi value histograms this highly dimensional feature space copes well with noisy sensor data and is not dependent on pose or sampling density by defining classes of geometric surfaces and making use of contextual information using conditional random fields crfs our system is able to successfully segment and label point clouds based on the type of surfaces the points are lying on we validate and demonstrate the method’s efficiency by comparing it against similar initiatives as well as present results for table setting datasets acquired in indoor environments
reputation systems are emerging as one of the promising solutions for building trust among market participants in commerce finding ways to avoid or reduce the influence of unfair ratings is fundamental problem in reputation systems we propose an implicit reputation rating mechanism suitable for bc commerce the conceptual framework of the mechanism is based on the source credibility model in consumer psychology we have experimentally evaluated the performance of the mechanism by comparing with the other benchmark rating mechanisms the experimental results provide evidence that the general users opinions can be predicted more effectively by only small number of users selected by our proposed mechanism
goal oriented methods are increasingly popular for elaborating software requirements they offer systematic support for incrementally building intentional structural and operational models of the software and its environment event based transition systems on the other hand are convenient formalisms for reasoning about software behaviour at the architectural level the paper relates these two worlds by presenting technique for translating formal specification of software operations built according to the kaos goal oriented method into event based transition systems analysable by the ltsa toolset the translation involves moving from declarative state based timed synchronous formalism typical of requirements modelling languages to an operational event based untimed asynchronous one typical of architecture description languages the derived model can be used for the formal analysis and animation of kaos operation models in ltsa the paper also provides insights into the two complementary formalisms and shows that the use of synchronous temporal logic for requirements specification hinders smooth transition from requirements to software architecture models
we consider wireless ad hoc network composed of set of wireless nodes distributed in two dimensional plane several routing protocols based on the positions of the mobile hosts have been proposed in the literature typical assumption in these protocols is that all wireless nodes have uniform transmission regions modeled by unit disk centered at each wireless node however all these protocols are likely to fail if the transmission ranges of the mobile hosts vary due to natural or man made obstacles or weather conditions these protocols may fail because either some connections that are used by routing protocols do not exist which effectively results in disconnecting the network or the use of some connections causes livelocks in this paper we describe robust routing protocol that tolerates up to roughly of variation in the transmission ranges of the mobile hosts more precisely our protocol guarantees message delivery in connected ad hoc network whenever the ratio of the maximum transmission range to the minimum transmission range is at most
pedagogical algorithm visualization av systems produce graphical representations that aim to assist learners in understanding the dynamic behavior of computer algorithms in order to foster active learning computer science educators have developed av systems that empower learners to construct their own visualizations of algorithms under study notably these systems support similar development model in which coding an algorithm is temporally distinct from viewing and interacting with the resulting visualization given that they are known to have problems both with formulating syntactically correct code and with understanding how code executes novice learners would appear likely to benefit from more live development model that narrows the gap between coding an algorithm and viewing its visualization in order to explore this possibility we have implemented what you see is what you code an algorithm development and visualization model geared toward novices first learning to program under the imperative paradigm in the model the line of algorithm code currently being edited is reevaluated on every edit leading to immediate syntactic feedback along with immediate semantic feedback in the form of an av analysis of usability and field studies involving introductory computer science students suggests that the immediacy of the model’s feedback can help novices to quickly identify and correct programming errors and ultimately to develop semantically correct code
graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them and data manipulation is expressed by graph oriented operations and type constructors these models took off in the eighties and early nineties alongside object oriented models their influence gradually died out with the emergence of other database models in particular geographical spatial semistructured and xml recently the need to manage information with graph like nature has reestablished the relevance of this area the main objective of this survey is to present the work that has been conducted in the area of graph database modeling concentrating on data structures query languages and integrity constraints
the database volumes of enterprise resource planning erp systems like sap are growing at tremendous rate and some of them have already reached size of several terabytes oltp online transaction processing databases of this size are hard to maintain and tend to perform poorly therefore most database vendors have implemented new features like horizontal partitioning to optimize such mission critical applications horizontal partitioning was already investigated in detail in the context of shared nothing distributed database systems but today’s erp systems mostly use centralized database with shared everything architecture in this work we therefore investigate how an sap system performs when the data in the underlying database is partitioned horizontally our results show that especially joins in parallel executed statements and administrative tasks benefit greatly from horizontal partitioning while the resulting small increase in the execution times of insertions deletions and updates is tolerable these positive results have initiated the sap cooperation partners to pursue partitioned data layout in some of their largest installed productive systems
interactive rendering of global illumination effects is challenging problem while precomputed radiance transfer prt is able to render such effects in real time the geometry is generally assumed static this work proposes to replace the precomputed lighting response used in prt by precomputed depth precomputing depth has the same cost as precomputing visibility but allows visibility tests for moving objects at runtime using simple shadow mapping for this purpose compression scheme for high number of coherent surface shadow maps cssms covering the entire scene surface is developed cssms allow visibility tests between all surface points against all points in the scene we demonstrate the effectiveness of cssm based visibility using novel combination of the lightcuts algorithm and hierarchical radiosity which can be efficiently implemented on the gpu we demonstrate interactive bounce diffuse global illumination with final glossy bounce and many high frequency effects general brdfs texture and normal maps and local or distant lighting of arbitrary shape and distribution all evaluated per pixel furthermore all parameters can vary freely over time the only requirement is rigid geometry
large scale non uniform memory access numa multiprocessors are gaining increased attention due to their potential for achieving high performance through the replication of relatively simple components because of the complexity of such systems scheduling algorithms for parallel applications are crucial in realizing the performance potential of these systems in particular scheduling methods must consider the scale of the system with the increased likelihood of creating bottlenecks along with the numa characteristics of the system and the benefits to be gained by placing threads close to their code and data we propose class of scheduling algorithms based on processor pools processor pool is software construct for organizing and managing large number of processors by dividing them into groups called pools the parallel threads of job are run in single processor pool unless there are performance advantages for job to span multiple pools several jobs may share one pool our simulation experiments show that processor pool based scheduling may effectively reduce the average job response time the performance improvements attained by using processor pools increase with the average parallelism of the jobs the load level of the system the differentials in memory access costs and the likelihood of having system bottlenecks as the system size increases while maintaining the workload composition and intensity we observed that processor pools can be used to provide significant performance improvements we therefore conclude that processor pool based scheduling may be an effective and efficient technique for scalable systems
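A minimal sketch of processor-pool-based placement as described above: processors are divided into fixed-size pools and a job's threads go into a single pool when it fits, spilling across pools only when its parallelism exceeds what one pool can offer. The pool size, the free-processor bookkeeping, and the ordering heuristic are assumptions for illustration.

```python
def make_pools(n_processors, pool_size):
    # group processor ids into fixed-size pools
    return [list(range(i, min(i + pool_size, n_processors)))
            for i in range(0, n_processors, pool_size)]

def place_job(n_threads, pools, free):
    """free: list giving the number of free processors in each pool.
    Returns a list of (pool_index, threads_placed) pairs."""
    placement, remaining = [], n_threads
    # prefer a single pool that can host the whole job; otherwise spill over
    whole = [p for p in range(len(pools)) if free[p] >= n_threads]
    order = sorted(whole, key=lambda p: -free[p]) or \
            sorted(range(len(pools)), key=lambda p: -free[p])
    for p in order:
        take = min(remaining, free[p])
        if take > 0:
            placement.append((p, take))
            free[p] -= take
            remaining -= take
        if remaining == 0:
            break
    return placement

# usage: pools = make_pools(16, 4); place_job(6, pools, free=[4, 4, 4, 4])
```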
the freebsd gnu linux solaris and windows operating systems have kernels that provide comparable facilities interestingly their code bases share almost no common parts while their development processes vary dramatically we analyze the source code of the four systems by collecting metrics in the areas of file organization code structure code style the use of the preprocessor and data organization the aggregate results indicate that across various areas and many different metrics four systems developed using wildly different processes score comparably this allows us to posit that the structure and internal quality attributes of working non trivial software artifact will represent first and foremost the engineering requirements of its construction with the influence of process being marginal if any
energy saving is one of the most important issues in wireless mobile computing among others one viable approach to achieving energy saving is to use an indexed data organization to broadcast data over wireless channels to mobile units using indexed broadcasting mobile units can be guided to the data of interest efficiently and only need to be actively listening to the broadcasting channel when the relevant information is present in this paper we explore the issue of indexing data with skewed access for sequential broadcasting in wireless mobile computing we first propose methods to build index trees based on access frequencies of data records to minimize the average cost of index probes we consider two cases one for fixed index fanouts and the other for variant index fanouts and devise algorithms to construct index trees for both cases we show that the cost of index probes can be minimized not only by employing an imbalanced index tree that is designed in accordance with data access skew but also by exploiting variant fanouts for index nodes note that even for the same index tree different broadcasting orders of data records will lead to different average data access times to address this issue we develop an algorithm to determine the optimal order for sequential data broadcasting to minimize the average data access time performance evaluation on the algorithms proposed is conducted examples and remarks are given to illustrate our results
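To make the frequency-driven, imbalanced index-tree idea concrete, the sketch below builds a k-ary Huffman-style tree over record access frequencies so that frequently accessed records sit closer to the root, and computes the resulting weighted probe count. This standard construction is a stand-in for the paper's fixed- and variant-fanout algorithms, not a reproduction of them.

```python
import heapq
import itertools

def build_index_tree(freqs, fanout):
    """freqs: dict record_id -> access frequency; returns a nested tuple tree."""
    counter = itertools.count()
    heap = [(f, next(counter), rid) for rid, f in freqs.items()]
    # pad with zero-frequency dummy leaves so every merge takes exactly `fanout` nodes
    while (len(heap) - 1) % (fanout - 1) != 0:
        heap.append((0, next(counter), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        children = [heapq.heappop(heap) for _ in range(fanout)]
        total = sum(c[0] for c in children)
        heapq.heappush(heap, (total, next(counter), tuple(c[2] for c in children)))
    return heap[0][2]

def expected_probes(tree, freqs, depth=0):
    # frequency-weighted number of internal (index) nodes visited per record
    if tree is None:
        return 0.0
    if not isinstance(tree, tuple):
        return freqs[tree] * depth
    return sum(expected_probes(c, freqs, depth + 1) for c in tree)

# example: skewed access frequencies push record 'a' close to the root
freqs = {'a': 50, 'b': 20, 'c': 15, 'd': 10, 'e': 5}
tree = build_index_tree(freqs, fanout=3)
print(tree, expected_probes(tree, freqs) / sum(freqs.values()))
```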
we present brute force attack on an elliptic curve cryptosystem implemented on uc berkeley's tinyos operating system for wireless sensor networks wsns the attack exploits the short period of the pseudorandom number generator prng used by the cryptosystem to generate private keys in order to define failure in the event brute force attack takes too long to execute we create metric that relates the duty cycle of the mote to the compromise rate and the period of the key generation algorithm experimental results show that roughly of the mote's address space leads to private key compromise in min on average furthermore approximately of the mote address space leads to compromise in min on average in min and the remaining in min or less we examine two alternatives to the prng our own design modified from published algorithm and the new prng distributed with the beta release of tinyos our design executes times faster than the other alternative and requires cpu cycles more than the original prng in addition our design is times smaller than the other alternative and requires additional bytes of memory the period of our prng is uniform for all mote addresses and requires years on average for key compromise with the attack presented in this paper
secure multi party computation has been considered by the cryptographic community for number of years until recently it has been purely theoretical area with few implementations with which to test various ideas this has led to number of optimisations being proposed which are quite restricted in their application in this paper we describe an implementation of the two party case using yao’s garbled circuits and present various algorithmic protocol improvements these optimisations are analysed both theoretically and empirically using experiments of various adversarial situations our experimental data is provided for reasonably large circuits including one which performs an aes encryption problem which we discuss in the context of various possible applications
evaluating and analyzing the performance of parallel application on an architecture to explain the disparity between projected and delivered performance is an important aspect of parallel systems research however conducting such study is hard due to the vast design space of these systems in this paper we study two important aspects related to the performance of parallel applications on shared memory parallel architectures first we quantify overheads observed during the execution of these applications on three different simulated architectures we next use these results to synthesize the bandwidth requirements for the applications with respect to different network topologies this study is performed using an execution driven simulation tool called spasm which provides way of isolating and quantifying the different parallel system overheads in nonintrusive manner the first exercise shows that in shared memory machines with private caches as long as the applications are well structured to exploit locality the key determinant that impacts performance is network contention the second exercise quantifies the network bandwidth needed to minimize the effect of network contention specifically it is shown that for the applications considered as long as the problem sizes are increased commensurate with the system size current network technologies supporting mbytes sec link bandwidth are sufficient to keep the network overheads such as the latency and contention within acceptable bounds
in this article we propose an efficient method for estimating depth map from long baseline image sequences captured by calibrated moving multi camera system our concept for estimating depth map is very simple we integrate the counting of the total number of interest points tnip in images with the original framework of multiple baseline stereo even by using simple algorithm the depth can be determined without computing similarity measures such as ssd sum of squared differences and ncc normalized cross correlation that have been used for conventional stereo matching the proposed stereo algorithm is computationally efficient and robust for distortions and occlusions and has high affinity with omni directional and multi camera imaging although expected trade off between accuracy and efficiency is confirmed for naive tnip based method hybrid approach that uses both tnip and ssd improve this with realizing high accurate and efficient depth estimation we have experimentally verified the validity and feasibility of the tnip based stereo algorithm for both synthetic and real outdoor scenes
conventional content based image retrieval cbir schemes employing relevance feedback may suffer from some problems in the practical applications first most ordinary users would like to complete their search in single interaction especially on the web second it is time consuming and difficult to label lot of negative examples with sufficient variety third ordinary users may introduce some noisy examples into the query this correspondence explores solutions to new issue that image retrieval using unclean positive examples in the proposed scheme multiple feature distances are combined to obtain image similarity using classification technology to handle the noisy positive examples new two step strategy is proposed by incorporating the methods of data cleaning and noise tolerant classifier the extensive experiments carried out on two different real image collections validate the effectiveness of the proposed scheme
web service orchestration is widely spread for the creation of composite web services using standard specifications such as bpelws the myriad of specifications and aspects that should be considered in orchestrated web services are resulting in increasing complexity this complexity leads to software infrastructures difficult to maintain with interwoven code involving different aspects such as security fault tolerance distribution etc in this paper we present zen flow reflective bpel engine that enables to separate the implementation of different aspects among them and from the implementation of the regular orchestration functionality of the bpel engine we illustrate its capabilities and performance exercising the reflective interface through decentralized orchestration use case
ensembles of distributed heterogeneous resources or computational grids have emerged as popular platforms for deploying large scale and resource intensive applications large collaborative efforts are currently underway to provide the necessary software infrastructure grid computing raises challenging issues in many areas of computer science and especially in the area of distributed computing as computational grids cover increasingly large networks and span many organizations in this paper we briefly motivate grid computing and introduce its basic concepts we then highlight number of distributed computing research questions and discuss both the relevance and the short comings of previous research results when applied to grid computing we choose to focus on issues concerning the dissemination and retrieval of information and data on computational grid platforms we feel that these issues are particularly critical at this time and as we can point to preliminary ideas work and results in the grid community and the distributed computing community this paper is of interest to distributing computing researchers because grid computing provides new challenges that need to be addressed as well as actual platforms for experimentation and research
focused crawling is aimed at selectively seeking out pages that are relevant to predefined set of topics since an ontology is well formed knowledge representation ontology based focused crawling approaches have come into research however since these approaches utilize manually predefined concept weights to calculate the relevance scores of web pages it is difficult to acquire the optimal concept weights to maintain stable harvest rate during the crawling process to address this issue we proposed learnable focused crawling framework based on ontology an ann artificial neural network was constructed using domain specific ontology and applied to the classification of web pages experimental results show that our approach outperforms the breadth first search crawling approach the simple keyword based crawling approach the ann based focused crawling approach and the focused crawling approach that uses only domain specific ontology
context new processes tools and practices are being introduced into software companies at an increasing rate with each new advance in technology software managers need to consider not only whether it is time to change the technologies currently used but also whether an evolutionary change is sufficient or revolutionary change is required objective in this paper we approach this dilemma from the organizational and technology research points of view to see whether they can help software companies in initiating and managing technology change in particular we explore the fit of the technology curve the classic change curve and technological change framework to software technology change project and examine the insights that such frameworks can bring method the descriptive case study described in this paper summarizes software technology change project in which year old legacy information system running on mainframe was replaced by network server system at the same time as the individual centric development practices were replaced with organization centric ones the study is based on review of the company’s annual reports in conjunction with other archival documents five interviews and collaboration with key stakeholder in the company results analyses of the collected data suggest that software technology change follows the general change research findings as characterized by the technology curve and the classic change curve further that such frameworks present critical questions for management to address when embarking on and then running such projects conclusions we describe how understanding why software technology change project is started the way in which it unfolds and how different factors affect it are essential tools for project leaders in preparing for change projects and for keeping them under control moreover we show how it is equally important to understand how software technology change can work as catalyst in revitalizing stagnated organization facilitating other changes and thereby helping an organization to redefine its role in the marketplace
releasing person specific data could potentially reveal sensitive information about individuals anonymization is promising privacy protection mechanism in data publishing although substantial research has been conducted on anonymization and its extensions in recent years only few prior works have considered releasing data for some specific purpose of data analysis this paper presents practical data publishing framework for generating masked version of data that preserves both individual privacy and information usefulness for cluster analysis experiments on real life data suggest that by focusing on preserving cluster structure in the masking process the cluster quality is significantly better than the cluster quality of the masked data without such focus the major challenge of masking data for cluster analysis is the lack of class labels that could be used to guide the masking process our approach converts the problem into the counterpart problem for classification analysis wherein class labels encode the cluster structure in the data and presents framework to evaluate the cluster quality on the masked data
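A sketch of the evaluation idea described above: cluster the original data, treat the cluster ids as class labels, produce a masked version of the data, re-cluster the masked data, and compare the two cluster structures. The quantile-binning masker and the k-means/adjusted-Rand choices are illustrative assumptions, not the paper's masking algorithm or quality measure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def mask_by_binning(X, n_bins=5):
    # crude generalization: replace each value by the midpoint of its quantile bin
    Xm = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, n_bins - 1)
        mids = (edges[:-1] + edges[1:]) / 2.0
        Xm[:, j] = mids[idx]
    return Xm

def cluster_quality_after_masking(X, k=3, n_bins=5, seed=0):
    # cluster labels on the original data play the role of class labels
    labels_orig = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    labels_masked = KMeans(n_clusters=k, n_init=10,
                           random_state=seed).fit_predict(mask_by_binning(X, n_bins))
    # agreement between the original cluster structure and the one
    # recovered from the masked data
    return adjusted_rand_score(labels_orig, labels_masked)
```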
currently it is possible to use aspect oriented languages to attach behavior to code based on semantic or syntactic properties of that code there is no language however that allows developers to attach behavior based on static metaproperties of code here we demonstrate technique for applying aop methods to metaproperties of source code we use advice to coherently define runtime behavior for subsets of code that need not share semantic or syntactic properties to illustrate the approach we use java as base language and define family of pointcuts based on the edit time of the source lines then build simple debugging application that applies runtime tracing to only the most recently changed code using this technique the tracing code is neatly modularized and need not depend on any semantic properties of the base code we believe that this approach has powerful applications for debugging as well as for software engineering researchers looking to explore the runtime effects of extra linguistic features
service computing has increasingly been adopted by the industry developing business applications by means of orchestration and choreography choreography specifies how services collaborate with one another by defining say the message exchange rather than via the process flow as in the case of orchestration messages sent from one service to another may require the use of different xpaths to manipulate or extract message contents mismatches in xml manipulations through xpaths such as to relate incoming and outgoing messages in choreography specifications may result in failures in this paper we propose to associate xpath rewriting graphs xrgs structure that relates xpath and xml schema with actions of choreography applications that are skeletally modeled as labeled transition systems we develop the notion of xrg patterns to capture how different xrgs are related even though they may refer to different xml schemas or their tags by applying xrg patterns we successfully identify new data flow associations in choreography applications and develop new data flow testing criteria finally we report an empirical case study that evaluates our techniques the result shows our techniques are promising in detecting failures in choreography applications
dualization of monotone boolean function represented by conjunctive normal form cnf is problem which in different disguise is ubiquitous in many areas including computer science artificial intelligence and game theory to mention some of them it is also one of the few problems whose precise tractability status in terms of polynomial time solvability is still unknown and now open for more than years in this paper we briefly survey computational results for this problem where we focus on the famous paper by fredman and khachiyan on the complexity of dualization of monotone disjunctive normal forms algorithms which showed that the problem is solvable in quasi polynomial time and thus most likely not co np hard as well as on follow up works we consider computational aspects including limited nondeterminism probabilistic computation parallel and learning based algorithms and implementations and experimental results from the literature the paper closes with open issues for further research
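In its standard reformulation, deciding whether a monotone DNF is the dual of a given monotone formula amounts to checking whether a monotone CNF and a monotone DNF represent the same function; for tiny instances this can be checked by brute force over all assignments, as sketched below, whereas the Fredman-Khachiyan algorithm decides it in quasi-polynomial time without such enumeration. The set-of-indices encoding of clauses and terms is just a convenience, not anything taken from the paper.

```python
from itertools import product

def eval_cnf(cnf, x):
    # cnf: list of clauses, each a set of variable indices (monotone, positive literals)
    return all(any(x[v] for v in clause) for clause in cnf)

def eval_dnf(dnf, x):
    # dnf: list of terms, each a set of variable indices (monotone, positive literals)
    return any(all(x[v] for v in term) for term in dnf)

def same_function(cnf, dnf, n_vars):
    # exponential-time reference check: agree on every assignment?
    for bits in product([False, True], repeat=n_vars):
        x = dict(enumerate(bits))
        if eval_cnf(cnf, x) != eval_dnf(dnf, x):
            return False
    return True

# example: (x0 or x1) and (x1 or x2) is equivalent to x1 or (x0 and x2)
assert same_function([{0, 1}, {1, 2}], [{1}, {0, 2}], 3)
```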
detailed understanding of expansion in complex networks can greatly aid in the design and analysis of algorithms for variety of important network tasks including routing messages ranking nodes and compressing graphs this has motivated several recent investigations of expansion properties in real world graphs and also in random models of real world graphs like the preferential attachment graph the results point to gap between real world observations and theoretical models some real world graphs are expanders and others are not but graph generated by the preferential attachment model is an expander whp we study random graph gn that combines certain aspects of geometric random graphs and preferential attachment graphs this model yields graph with power law degree distribution where the expansion property depends on tunable parameter of the model the vertices of gn are sequentially generated points xn chosen uniformly at random from the unit sphere in after generating xt we randomly connect it to points from those points in xt
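Since the exact attachment rule and its parameters are not spelled out in the abstract, the sketch below is only one plausible instantiation of a generator that mixes geometric proximity on the unit sphere with preferential attachment: each new point connects to m earlier points chosen degree-proportionally among its geometrically nearest predecessors. The parameters m and neighbourhood are illustrative assumptions, not the model's definition.

```python
import math
import random

def random_unit_vector(dim=3):
    v = [random.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def generate_graph(n, m=2, neighbourhood=20):
    points, degree, edges = [], [], []
    for t in range(n):
        p = random_unit_vector()
        new_deg = 0
        if t > 0:
            # candidate set: the geometrically closest earlier points
            cand = sorted(range(t),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], p)))
            cand = cand[:neighbourhood]
            # preferential attachment restricted to the candidate set
            weights = [degree[i] + 1 for i in cand]
            for _ in range(min(m, len(cand))):
                j = random.choices(cand, weights=weights, k=1)[0]
                edges.append((t, j))
                degree[j] += 1
                new_deg += 1
        points.append(p)
        degree.append(new_deg)
    return points, edges
```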
tensor voting tv methods have been developed in series of papers by medioni and coworkers during the last years the method has been proved efficient for feature extraction and grouping and has been applied successfully in diversity of applications such as contour and surface inferences motion analysis etc we present here two studies on improvements of the method the first one consists in iterating the tv process and the second one integrates curvature information in contrast to other grouping methods tv claims the advantage to be non iterative although non iterative tv methods provide good results in many cases the algorithm can be iterated to deal with more complex or more ambiguous data configurations we present experiments that demonstrate that iterations substantially improve the process of feature extraction and help to overcome limitations of the original algorithm as further contribution we propose curvature improvement for tv unlike the curvature augmented tv proposed by tang and medioni our method evaluates the full curvature sign and amplitude in the case another advantage of the method is that it uses part of the curvature calculation already performed by the classical tv limiting the computational costs curvature modified voting fields are also proposed results show smoother curves lower degree of artifacts and high tolerance against scale variations of the input the methods are finally tested under noisy conditions showing that the proposed improvements preserve the noise robustness of the tv method
alternative semantics for aspect oriented abstractions can be defined by language designers using extensible aspect compiler frameworks however application developers are prevented from tailoring the language semantics in an application specific manner to address this problem we propose an architecture for aspect oriented languages with an explicit meta interface to language semantics we demonstrate the benefits of such an architecture by presenting several scenarios in which aspect oriented programs use the meta interface of the language to tailor its semantics to particular application execution context
in this paper we apply regression via classification rvc to the problem of estimating the number of software defects this approach apart from estimating certain number of faults also outputs an associated interval of values within which this estimate lies with certain confidence rvc also allows the production of comprehensible models of software defects exploiting symbolic learning algorithms to evaluate this approach we perform an extensive comparative experimental study of the effectiveness of several machine learning algorithms in two software data sets rvc manages to get better regression error than the standard regression approaches on both datasets
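A minimal sketch of regression via classification as described above: the defect counts are discretized into intervals, a symbolic classifier is trained on the interval labels, and each prediction is reported as an interval plus a point estimate such as its midpoint. Equal-frequency binning and a decision tree are illustrative choices, not necessarily those evaluated in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def discretize_target(y, n_bins=4):
    # equal-frequency bin edges; assumes y has enough distinct values for n_bins
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    labels = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    return labels, edges

def rvc_fit_predict(X_train, y_train, X_test, n_bins=4):
    labels, edges = discretize_target(np.asarray(y_train, dtype=float), n_bins)
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, labels)
    pred_bins = clf.predict(X_test)
    # each prediction is an interval of defect counts plus its midpoint
    intervals = [(edges[b], edges[b + 1]) for b in pred_bins]
    point_estimates = [(lo + hi) / 2.0 for lo, hi in intervals]
    return point_estimates, intervals
```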
in this paper we describe the user modeling phase of our general research approach developing adaptive intelligent user interfaces to facilitate enhanced natural communication during the human computer interaction natural communication is established by recognizing users affective states ie emotions experienced by the users and responding to those emotions by adapting to the current situation via an affective user model adaptation of the interface was designed to provide multi modal feedback to the users about their current affective state and to respond to users negative emotional states in order to compensate for the possible negative impacts of those emotions bayesian belief networks formalization was employed to develop the user model to enable the intelligent system to appropriately adapt to the current context and situation by considering user dependent factors such as personality traits and preferences
software architecture has been shown to provide an appropriate level of granularity for assessing software system’s quality attributes eg performance and dependability similarly previous research has adopted an architecture centric approach to reasoning about and managing the run time adaptation of software systems for mobile and pervasive software systems which are known to be innately dynamic and unpredictable the ability to assess system’s quality attributes and manage its dynamic run time behavior is especially important in the past researchers have argued that software architecture based approach can be instrumental in facilitating mobile computing in this paper we present an integrated architecture driven framework for modeling analysis implementation deployment and run time migration of software systems executing on distributed mobile heterogeneous computing platforms in particular we describe the framework’s support for dealing with the challenges posed by both logical and physical mobility we also provide an overview of our experience with applying the framework to family of distributed mobile robotics systems this experience has verified our envisioned benefits of the approach and has helped us to identify several avenues of future work
designing energy efficient clusters has recently become an important concern to make these systems economically attractive for many applications since the cluster interconnect is major part of the system the focus of this paper is to characterize and optimize the energy consumption in the entire interconnect using cycle accurate simulator of an infiniband architecture iba compliant interconnect fabric and actual designs of its components we investigate the energy behavior on regular and irregular interconnects the energy profile of the three major components switches network interface cards nics and links reveals that the links and switch buffers consume the major portion of the power budget hence we focus on energy optimization of these two components to minimize power in the links first we investigate the dynamic voltage scaling dvs algorithm and then propose novel dynamic link shutdown dls technique the dls technique makes use of an appropriate adaptive routing algorithm to shut down the links intelligently we also present an optimized buffer design for reducing leakage energy in nm technology our analysis on different networks reveals that while dvs is an effective energy conservation technique it incurs significant performance penalty at low to medium workload moreover energy saving with dvs reduces as the buffer leakage current becomes significant with nm design on the other hand the proposed dls technique can provide optimized performance energy behavior up to percent energy savings with less than percent performance degradation in the best case for the cluster interconnects
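A hedged sketch of a dynamic-link-shutdown-style policy in the spirit of the DLS technique above: the least-utilized links are switched off only if the remaining topology stays connected, so that adaptive routing can still reach every node. The utilization threshold and the connectivity-only safety check are simplifying assumptions; the paper's technique is tied to the adaptive routing algorithm and the IBA fabric model.

```python
def connected(nodes, links):
    # simple reachability check over an undirected link set
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(adj[n] - seen)
    return seen == set(nodes)

def links_to_shut_down(nodes, links, utilization, threshold=0.1):
    """utilization: dict link -> recent utilization in [0, 1]."""
    active, off = set(links), []
    # consider the least-utilized links first
    for link in sorted(links, key=lambda l: utilization.get(l, 0.0)):
        if utilization.get(link, 0.0) < threshold:
            candidate = active - {link}
            if connected(nodes, candidate):   # keep every node reachable
                active = candidate
                off.append(link)
    return off
```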
in this paper we consider security related and energy efficiency issues in multi hop wireless networks we start our work from the observation known in the literature that shortest path routing creates congested areas in multi hop wireless networks these areas are critical they generate both security and energy efficiency issues we attack these problems and set out routing in outer space new routing mechanism that transforms any shortest path routing protocol or approximated versions of it into new protocol that does not create congested areas does not have the associated security related issues and does not encourage selfish positioning moreover the network lives longer than the same network using the original routing protocol in spite of using more energy globally and dies more gracefully
design of multiprocessor system on chips requires efficient and accurate simulation of every component since the memory subsystem accounts for up to of the performance and energy expenditures it has to be considered in system level design space exploration in this paper we present novel technique to simulate memory accesses in software tlm models we use compiler to automatically expose all memory accesses in software and annotate them onto efficient tlm models reverse address map provides target memory addresses for accurate cache and memory simulation simulating at more than mhz our models allow realistic architectural design space explorations on memory subsystems we demonstrate our approach with design exploration case study of an industrial strength mpeg decoder
this paper presents results of study of the effect of global variables on the quantity of dependence in general and on the presence of dependence clusters in particular the paper introduces simple transformation based analysis algorithm for measuring the impact of globals on dependence it reports on the application of this approach to the detailed assessment of dependence in an empirical study of programs consisting of just over lines of code the technique is used to identify global variables that have significant impact upon program dependence and to identify and characterize the ways in which global variable dependence may lead to dependence clusters in the study over half of the programs include such global variable and quarter have one that is solely responsible for dependence cluster
in wireless sensor networks wsns lot of sensory traffic with redundancy is produced due to massive node density and their diverse placement this causes the decline of scarce network resources such as bandwidth and energy thus decreasing the lifetime of sensor network recently the mobile agent ma paradigm has been proposed as solution to overcome these problems the ma approach accounts for performing data processing and making data aggregation decisions at nodes rather than bring data back to central processor sink using this approach redundant sensory data is eliminated in this article we consider the problem of calculating near optimal routes for mas that incrementally fuse the data as they visit the nodes in wsn the order of visited nodes the agent’s itinerary affects not only the quality but also the overall cost of data fusion our proposed heuristic algorithm adapts methods usually applied in network design problems in the specific requirements of sensor networks it computes an approximate solution to the problem by suggesting an appropriate number of mas that minimizes the overall data fusion cost and constructs near optimal itineraries for each of them the performance gain of our algorithm over alternative approaches both in terms of cost and task completion latency is demonstrated by quantitative evaluation and also in simulated environments through java based tool
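To illustrate the itinerary-planning problem described above, the sketch below splits sensor nodes into angular sectors around the sink and orders each agent's visits by nearest neighbour. This is only a baseline-style construction; the paper's heuristic also chooses the number of agents and optimizes an explicit fusion-cost model.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_itineraries(sink, nodes, n_agents):
    """sink, nodes: (x, y) tuples; returns one visiting order per agent."""
    def angle(p):
        return math.atan2(p[1] - sink[1], p[0] - sink[0])

    # split nodes by angle around the sink so each agent covers one sector
    ordered = sorted(nodes, key=angle)
    size = max(1, math.ceil(len(ordered) / n_agents))
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]

    itineraries = []
    for group in groups:
        tour, current, remaining = [], sink, list(group)
        while remaining:                       # nearest-neighbour visiting order
            nxt = min(remaining, key=lambda p: dist(current, p))
            tour.append(nxt)
            remaining.remove(nxt)
            current = nxt
        itineraries.append(tour)
    return itineraries

# usage: build_itineraries((0, 0), [(1, 2), (3, 1), (-2, 2), (-1, -3)], n_agents=2)
```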
the knowledge base management systems kbms project at the university of toronto was inspired by need for advanced knowledge representation applications that require knowledge bases containing hundreds of thousands or even millions of knowledge units the knowledge representation language telos provided framework for the project the key results included conceptual modeling innovations in the use of semantic abstractions representations of time and space and implementation techniques for storage management query processing rule management and concurrency control in this paper we review the key ideas introduced in the kbms project and connect them to some of the work since the conclusion of the project that is either closely related to or directly inspired by it
improvement of the software process is major concern for many organizations critical part of such an endeavor is the definition of metrics despite the importance of metric definition there have been no evaluations of existing methods for achieving this it is generally taken for granted that method with wide acceptance is suitable review of metric definition methods identifies basili's as one of the most widely used this paper reports on an evaluation of the method the evaluation is based on an actual application of the method in process improvement effort the resultant metrics and instrument are evaluated with respect to the following criteria interpretability validity reliability effectiveness and transportability the causes of problems found are identified these causes are problems with the method itself the evaluation indicates that the metrics resultant from the application of the method do not appropriately meet the above criteria and research on evolving or developing alternative methods is emphasized
in this paper we provide formal analysis of the idea of normative co ordination we argue that this idea is based on the assumption that agents can achieve flexible co ordination by conferring normative positions to other agents these positions include duties permissions and powers in particular we explain the idea of declarative power which consists in the capacity of the power holder of creating normative positions involving other agents simply by proclaiming such positions in addition we account also for the concepts of representation namely the representative’s capacity of acting in the name of his principal and of mandate which is the mandatee’s duty to act as the mandator has requested finally we show how the framework can be applied to represent the contract net protocol some brief remarks on future research and applications conclude this contribution
we analyse the complexity of finite model reasoning in the description logic alcqi ie alc augmented with qualifying number restrictions inverse roles and general tboxes it turns out that all relevant reasoning tasks such as concept satisfiability and abox consistency are exptime complete regardless of whether the numbers in number restrictions are coded unarily or binarily thus finite model reasoning with alcqi is not harder than standard reasoning with alcqi
public displays are typically situated in strategic places like town centers and in salient positions on walls within buildings however currently most public displays are non interactive and are typically used for information broadcasting tv news advertisements etc people passing by pay little attention to them as consequence public displays are under utilized in the everyday world we are investigating whether use of interactive public displays might increase people’s interaction with one another with resulting increase in sense of community in this paper we describe the design and first deployment experiences of platform independent interactive video commenting system using large public display in two sections of large enrollment university class our preliminary evaluation suggests that students enjoyed the activity of commenting that they participated great deal and that their sense of community was greater after using the system we discuss lessons we have learned from this initial experience and describe further work we are planning using this and similar interactive activities
we consider the problem of optimal netlist simplification in the presence of constraints because constraints restrict the reachable states of netlist they may enhance logic minimization techniques such as redundant gate elimination which generally benefit from unreachability invariants however optimizing the logic appearing in constraint definition may weaken its state restriction capability hence prior solutions have resorted to suboptimally neglecting certain valid optimization opportunities we develop the theoretical foundation and corresponding efficient implementation to enable the optimal simplification of netlists with constraints experiments confirm that our techniques enable significantly greater degree of redundant gate elimination than prior approaches often greater than which has been key to the automated solution of various difficult verification problems
virtual intimate objects are low bandwidth devices for communicating intimacy for couples in long distance relationships vios were designed to express intimacy in rich manner over low bandwidth connection vios were evaluated using logbook which included open ended questions designed to understand the context within which the vio was used users constructed complex dynamically changing understanding of the meaning of each interaction based on an understanding of their and their partner’s context of use the results show that users had rich and complex interpretations of this seemingly simple communication which suggests the necessity of exploring context of use to understand the situated nature of the interactions as an intrinsic part of an evaluation process for such technologies
uncontrolled overload can lead commerce applications to considerable revenue losses for this reason overload prevention in these applications is critical issue in this paper we present complete characterization of secure commerce applications scalability to determine which are the bottlenecks in their performance that must be considered for an overload control strategy with this information we design an adaptive session based overload control strategy based on ssl secure socket layer connection differentiation and admission control the ssl connection differentiation is key factor because the cost of establishing new ssl connection is much greater than establishing resumed ssl connection it reuses an existing ssl session on the server considering this big difference we have implemented an admission control algorithm that prioritizes resumed ssl connections to maximize the performance in session based environments and dynamically limits the number of new ssl connections accepted according to the available resources and the current number of connections in the system in order to avoid server overload our evaluation on tomcat server demonstrates the benefit of our proposal for preventing server overload
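A hedged sketch of session-based admission control that always admits resumed SSL connections and caps new ones by spare capacity; the capacity units and the 10:1 handshake cost ratio are illustrative assumptions, not the paper's measurements.

class SSLAdmissionControl:
    """Sketch of session-based admission control: resumed handshakes are cheap,
    new handshakes are expensive, so new connections are capped by spare capacity."""

    def __init__(self, capacity_units, new_cost=10, resumed_cost=1):
        self.capacity_units = capacity_units      # abstract server capacity per interval
        self.new_cost = new_cost                  # assumed relative cost of a full handshake
        self.resumed_cost = resumed_cost          # assumed relative cost of a resumed handshake
        self.used = 0

    def start_interval(self, active_resumed_load):
        # Reserve capacity for connections of already-admitted sessions first.
        self.used = active_resumed_load * self.resumed_cost

    def admit(self, is_resumed):
        cost = self.resumed_cost if is_resumed else self.new_cost
        if is_resumed or self.used + cost <= self.capacity_units:
            self.used += cost
            return True
        return False   # refuse new SSL connections once spare capacity is exhausted

ctrl = SSLAdmissionControl(capacity_units=50)
ctrl.start_interval(active_resumed_load=20)
print([ctrl.admit(is_resumed=r) for r in (True, False, False, True, False)])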
we are given collection of text documents … dk with sum which may be preprocessed in the document listing problem we are given an online query comprising of pattern string of length and our goal is to return the set of all documents that contain one or more copies of in the closely related occurrence listing problem we output the set of all positions within the documents where pattern occurs in weiner presented an algorithm with time and space preprocessing following which the occurrence listing problem can be solved in time output where output is the number of positions where occurs this algorithm is clearly optimal in contrast no optimal algorithm is known for the closely related document listing problem which is perhaps more natural and certainly well motivated we provide the first known optimal algorithm for the document listing problem more generally we initiate the study of pattern matching problems that require retrieving documents matched by the patterns this contrasts with pattern matching problems that have been studied more frequently namely those that involve retrieving all occurrences of patterns we consider document retrieval problems that are motivated by online query processing in databases information retrieval systems and computational biology we present very efficient optimal algorithms for our document retrieval problems our approach for solving such problems involves performing local encodings whereby they are reduced to range query problems on geometric objects points and lines that have color we present improved algorithms for these colored range query problems that arise in our reductions using the structural properties of strings this approach is quite general and yields simple efficient implementable algorithms for all the document retrieval problems in this paper
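For contrast with the optimal algorithm described above, a naive baseline that only makes the two outputs concrete (document listing returns distinct documents, occurrence listing returns every position); the scan below is illustrative and has none of the preprocessing guarantees.

def list_documents(docs, pattern):
    """Naive baseline for the document listing problem: report each document
    that contains the pattern at least once."""
    return [i for i, d in enumerate(docs) if pattern in d]

def list_occurrences(docs, pattern):
    """Occurrence listing: every (document, position) pair where the pattern occurs."""
    hits = []
    for i, d in enumerate(docs):
        start = d.find(pattern)
        while start != -1:
            hits.append((i, start))
            start = d.find(pattern, start + 1)
    return hits

docs = ["abracadabra", "banana", "cabana", "radar"]
print(list_documents(docs, "ana"))    # -> [1, 2]
print(list_occurrences(docs, "ana"))  # -> [(1, 1), (1, 3), (2, 3)]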
single language runtime systems in the form of java virtual machines are widely deployed platforms for executing untrusted mobile code these runtimes provide some of the features that operating systems provide interapplication memory protection and basic system services they do not however provide the ability to isolate applications from each other neither do they provide the ability to limit the resource consumption of applications consequently the performance of current systems degrades severely in the presence of malicious or buggy code that exhibits ill behaved resource usage we show that java runtime systems can be extended to support processes and that processes can provide robust and efficient support for untrusted applications we have designed and built kaffeos java runtime system that provides support for processes kaffeos isolates processes and manages the physical resources available to them cpu and memory unlike existing java virtual machines kaffeos can safely terminate processes without adversely affecting the integrity of the system and it can fully reclaim terminated process’s resources finally kaffeos requires no changes to the java language the novel aspects of the kaffeos architecture include the application of user kernel boundary as structuring principle for runtime systems the employment of garbage collection techniques for resource management and isolation and model for direct sharing of objects between untrusted applications the difficulty in designing kaffeos lay in balancing the goals of isolation and resource management against the goal of allowing direct sharing of objects for the specjvm benchmarks the overhead that our kaffeos prototype incurs ranges from percent to percent when compared to the open source jvm on which it is based we consider this overhead acceptable for the safety that kaffeos provides in addition our kaffeos prototype can scale to run more applications than running multiple jvms finally in the presence of malicious or buggy code that engages in denial of service attack kaffeos can contain the attack remove resources from the attacked applications and continue to provide robust service to other clients
in this paper we propose novel way of supporting occasional meetings that take place in unfamiliar public places which promotes lightweight visible and fluid collaboration our central idea is that the sharing and exchange of information occurs across public surfaces that users can easily access and interact with to this end we designed and implemented dynamo communal multi user interactive surface the surface supports the cooperative sharing and exchange of wide range of media that can be brought to the surface by users that are remote from their familiar organizational settings
with the advances in and popularity of mobile devices mobile service providers have direct channel for transferring information to their subscribers ie short messaging service sms and multimedia messaging service mms mobile service operators can recommend new content and information to users who opt in to receive such information directly through push messages at any time or place however as mobile push messages sent to users can cause interruptions such as alarms users who receive irrelevant push messages may become dissatisfied with their mobile web service and even their service provider in this paper we propose mobile content recommender system for sending personalized mobile push messages with content that users are likely to find relevant this system learns users preferences from contents and keywords in their usage logs and recommends items that match these preferences or those of similar users we analyzed customer feedback on personalized content dissemination and the relationship between customer feedback and mobile web usage of customers subscribing to korean mobile service provider push messages with personalized recommendations resulted in more positive feedback from customers and the mobile web usage of these customers increased
object oriented languages such as java and smalltalk provide uniform object reference model allowing objects to be conveniently shared if implemented directly these uniform reference models can suffer in efficiency due to additional memory dereferences and memory management operations automatic inline allocation of child objects within parent objects can reduce overheads of heap allocated pointer referenced objects we present three compiler analyses to identify inlinable fields by tracking accesses to heap objects these analyses span range from local data flow to adaptive whole program flow sensitive inter procedural analysis we measure their cost and effectiveness on suite of moderate sized programs up to lines including libraries we show that aggressive interprocedural analysis is required to enable object inlining and our adaptive inter procedural analysis computes precise information efficiently object inlining eliminates typically of object accesses and allocations improving performance up to
hardware trends have produced an increasing disparity between processor speeds and memory access times while variety of techniques for tolerating or reducing memory latency have been proposed these are rarely successful for pointer manipulating programs this paper explores complementary approach that attacks the source poor reference locality of the problem rather than its manifestation memory latency it demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer manipulating programs and consequently their performance it explores two placement techniques clustering and coloring that improve cache performance by increasing pointer structure’s spatial and temporal locality and by reducing cache conflicts to reduce the cost of applying these techniques this paper discusses two strategies cache conscious reorganization and cache conscious allocation and describes two semi automatic tools ccmorph and ccmalloc that use these strategies to produce cache conscious pointer structure layouts ccmorph is transparent tree reorganizer that utilizes topology information to cluster and color the structure ccmalloc is cache conscious heap allocator that attempts to co locate contemporaneously accessed data elements in the same physical cache block our evaluations with microbenchmarks several small benchmarks and couple of large real world applications demonstrate that the cache conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits in most cases significantly outperforming state of the art prefetching
it has been established that the second order stochastic gradient descent sgd method can potentially achieve generalization performance as well as empirical optimum in single pass through the training examples however second order sgd requires computing the inverse of the hessian matrix of the loss function which is prohibitively expensive for structured prediction problems that usually involve very high dimensional feature space this paper presents new second order sgd method called periodic step size adaptation psa psa approximates the jacobian matrix of the mapping function and explores linear relation between the jacobian and hessian to approximate the hessian which is proved to be simpler and more effective than directly approximating hessian in an on line setting we tested psa on wide variety of models and tasks including large scale sequence labeling tasks using conditional random fields and large scale classification tasks using linear support vector machines and convolutional neural networks experimental results show that single pass performance of psa is always very close to empirical optimum
modeling software features with model programs is way of formalizing software requirements that lends itself to automated analysis such as model based testing unordered structures like sets and maps provide useful abstract view of system state within model program and greatly reduce the number of states that must be considered during analysis similarly technique called linearization reduces the number of states that must be considered by identifying isomorphic states or states that are identical except for reserve element choice such as the choice of object ids for instances of classes unfortunately linearization does not work on unordered structures such as sets the problem turns into graph isomorphism for which no polynomial time solution is known in this paper we discuss the issue of state isomorphism in the presence of unordered structures and give practical approach that overcomes some of the algorithmic limitations
we present software pipeline that enables an animator to deform light fields the pipeline can be used to deform complex objects such as furry toys while maintaining photo realistic quality our pipeline consists of three stages first we split the light field into sub light fields to facilitate splitting of complex objects we employ novel technique based on projected light patterns second we deform each sub light field to do this we provide the animator with controls similar to volumetric free form deformation third we recombine and render each sub light field our rendering technique properly handles visibility changes due to occlusion among sub light fields to ensure consistent illumination of objects after they have been deformed our light fields are captured with the light source fixed to the camera rather than being fixed to the object we demonstrate our deformation pipeline using synthetic and photographically acquired light fields potential applications include animation interior design and interactive gaming
this paper describes how formal methods were used to produce evidence in certification based on the common criteria of security critical software system the evidence included top level specification tls of the security relevant software behavior formal statement of the required security properties proofs that the specification satisfied the properties and demonstration that the source code which had been annotated with preconditions and postconditions was refinement of the tls the paper also describes those aspects of our approach which were most effective and research that could significantly increase the effectiveness of formal methods in software certification
the class of frequent hypergraph mining problems is introduced which includes the frequent graph mining problem class and contains also the frequent itemset mining problem we study the computational properties of different problems belonging to this class in particular besides negative results we present practically relevant problems that can be solved in incremental polynomial time some of our practical algorithms are obtained by reductions to frequent graph mining and itemset mining problems our experimental results in the domain of citation analysis show the potential of the framework on problems that have no natural representation as an ordinary graph
this paper presents case study of web based distributed simulation across the atlantic ocean between canada and france the distributed simulation engine known as dcd extends the cd environment to expose the simulation functionalities as machine consumable services based on the devs and cell devs formalisms and commonly used web service technologies dcd provides platform that represents step further towards transparent sharing of computing power data models and experiments in heterogeneous environment on global scale also the simulation service can be easily integrated with other services such as visualization network management and geographic information services in larger system experiments have been carried out to investigate simulation performance over commodity internet connections and major bottlenecks in the system have been identified based on the experimental results we put forward several areas that warrant further research
we present pan on demand self organizing wireless personal area network pan that balances performance and energy concerns by scaling the structure of the network to match the demands of applications pan on demand autonomously organizes co located mobile devices with one or more commodity radios such as bluetooth and wi fi to form network that enables data sharing among those devices it improves performance and extends battery life by switching between interfaces and opportunistically exploiting available power saving modes when applications are actively using the network pan on demand offers high bandwidth low latency communication when demand is light it adapts the network structure to minimize energy usage our results show that pan on demand reduces the average response time of pan activities such as mp playing mail viewing and photo sharing by up to and extends battery life by up to compared to current pan communication strategies
database systems are concerned with structured data unfortunately data is still often available in an unstructured manner eg in files even when it does have strong internal structure eg electronic documents or programs in previous paper we focussed on the use of high level query languages to access such files and developed optimization techniques to do so in this paper we consider how structured data stored in files can be updated using database update languages the interest of using database languages to manipulate files is twofold first it opens database systems to external data this concerns data residing in files or data transiting on communication channels and possibly coming from other databases secondly it provides high level query update facilities to systems that usually rely on very primitive linguistic support see for recent works in this direction similar motivations appear in in previous paper we introduced the notion of structuring schemas as means of providing database view on structured data residing in file structuring schema consists of grammar together with semantic actions in database language we also showed how queries on files expressed in high level query language inf inf sql could be evaluated efficiently using variations of standard database optimization techniques the problem of update was mentioned there but remained largely unexplored this is the topic of the present paper we argue that updates on files can be expressed conveniently using high level database update languages that work on the database view of the file the key problem is how to propagate an update specified on the database here view to the file here the physical storage as first step we propose naive way of update propagation the database view of the file is materialized the update is performed on the database the database is unparsed to produce an updated file for this we develop an unparsing technique the problems that we meet while developing this technique are related to the well known view update problem see for instance the technique relies on the existence of an inverse mapping from the database to the file we show that the existence of such an inverse mapping results from the use of restricted structuring schemas the naive technique presents two major drawbacks it is inefficient it entails intense data construction and unparsing most of which dealing with data not involved in the update it may result in information loss information in the file that is not recorded in the database may be lost in the process the major contribution of this paper is combination of techniques that allows to minimize both the data construction and the unparsing work first we briefly show how optimization techniques from can be used to focus on the relevant portion of the database and to avoid constructing the entire database then we show that for class of structuring schemas satisfying locality condition it is possible to carefully circumscribe the unparsing some of the results in the paper are negative they should not come as surprise since we are dealing with complex theoretical foundations language theory for parsing and unparsing and first order logic for database languages however we do present positive results for particular classes of structuring schemas we believe that the restrictions imposed on these schemas are very acceptable in practice for instance all real examples of structuring schemas that we examined are local the paper is organized as follows in section we present the update problem and the structuring schemas in section naive technique for update propagation and the unparsing technique section introduces locality condition and presents more efficient technique for propagating updates in local structuring schemas the last section is conclusion
since the early on line analytical processing olap has been well studied research topic that has focused on implementation outside the database either with olap servers or entirely within the client computers our approach involves the computation and storage of olap cubes using user defined functions udf with database management system udfs offer users chance to write their own code that can then be called like any other standard sql function by generating olap cubes within udf we are able to create the entire lattice in main memory the udf also allows the user to assert more control over the actual generation process than when using standard olap functions such as the cube operator we introduce data structure that can not only efficiently create an olap lattice in main memory but also be adapted to generate association rule itemsets with minimal change we experimentally show that the udf approach is more efficient than sql using one real dataset and synthetic dataset also we present several experiments showing that generating association rule itemsets using the udf approach is comparable to sql approach in this paper we show that techniques such as olap and association rules can be efficiently pushed into the udf and have better performance in most cases compared to standard sql functions
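A sketch of what a UDF-style cube generator might compute: the full lattice of group-bys held in main memory as dictionaries; the dimension columns, SUM measure, and rows are invented for illustration.

from itertools import combinations
from collections import defaultdict

def cube(rows, dims, measure):
    """Sketch of building a full OLAP cube lattice in memory: one aggregation
    table per subset of the dimension columns (SUM of the measure)."""
    lattice = {}
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            agg = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in subset)
                agg[key] += row[measure]
            lattice[subset] = dict(agg)
    return lattice

rows = [
    {"region": "east", "product": "tv", "sales": 100.0},
    {"region": "east", "product": "radio", "sales": 40.0},
    {"region": "west", "product": "tv", "sales": 70.0},
]
lat = cube(rows, dims=("region", "product"), measure="sales")
print(lat[("region",)])   # totals per region, keyed by 1-tuples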
we consider broadcasting on the multiple access channel when packets are injected continuously multiple access channel is synchronous system with the properties that single transmission at round delivers the message to all nodes while multiple simultaneous transmissions result in conflict which prevents delivering messages to any among the recipients the traditional approach to dynamic broadcasting has been concerned with stability of protocols under suitable stochastic assumptions about injection rates we study deterministic protocols competing against adversaries restricted by injection rate and burstiness of traffic stability means that the number of packets in queues is bounded by constant in any execution for given number of stations protocol and adversary strong stability denotes the property that the number of queued packets is proportional to the burstiness of traffic that is the maximum number of packets an adversary may inject simultaneously there are three natural classes of protocols we consider the weakest acknowledgement based protocols have station rely on its local clock and on feedback from the channel during its own attempts of transmissions full sensing protocols allow station to rely on global clock and to store the history of all the previous successes failures of transmissions in the course of an execution station running an adaptive protocol can rely on global clock may add control bits to be piggybacked on messages and may store the complete history of the feedback from the channel during an execution it turns out that there is no adaptive broadcast protocol stable for the injection rate for the multiple access channel with at least stations even when collision detection is available we show that simple full sensing protocol is universally stable which means it can handle any constant injection rate in stable manner more involved full sensing protocol is shown to be both universally stable and strongly stable for injection rate over lg where is sufficiently large constant and is the number of stations we show that there is an acknowledgement based protocol that is strongly stable for injection rate over lg for sufficiently large constant regarding the stability of acknowledgement based protocols we show that no such protocol is stable for injection rate over lg this implies that there are no universally stable acknowledgement based protocols we show that when collision detection is available then simple full sensing protocol is both universally stable and strongly stable for injection rate over lgn as complementary fact we prove that no adaptive protocol for channel with collision detection can be strongly stable for the injection rate that satisfies over log this shows that the protocol we give is optimal with respect to injection rates it handles in strongly stable manner
handset is transforming from traditional cellular phone to an integrated content delivery platform for communications entertainment and commerce their increasing capabilities and value added features provide more utilities and at the same time make the design more complicated and the device more difficult to use an online survey was conducted to measure user’s perspective of the usability level of their current handset using psychometric type of instrument total of usability factors were derived from the results of exploratory factor analysis the total percentage variance explained by these factors of the overall variance of the data was the average internal consistency in this study is
classic cache replacement policies assume that miss costs are uniform however the correlation between miss rate and cache performance is not as straightforward as it used to be ultimately the true cost measure of miss should be the penalty ie the actual processing bandwidth lost because of the miss it is known that contrary to loads the penalty of stores is mostly hidden in modern processors to take advantage of this observation we propose simple schemes to replace load misses by store misses we extend classic replacement algorithms such as lru least recently used and plru partial lru to reduce the aggregate miss penalty instead of the miss count one key issue is to predict the next access type to block so that higher replacement priority is given to blocks that will be accessed next with store we introduce and evaluate various prediction schemes based on instructions and broadly inspired from branch predictors to guide the design we run extensive trace driven simulations on eight spec benchmarks with wide range of cache configurations and observe that our simple penalty sensitive policies yield positive load miss improvements over classic algorithms across most of the benchmarks and cache configurations in some cases the improvements are very large
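A simplified sketch of a penalty-sensitive replacement policy: an LRU cache that, on eviction, prefers the least recently used block whose next access is predicted to be a store; the predictor interface is a stand-in for the instruction-based predictors described above.

from collections import OrderedDict

class PenaltySensitiveLRU:
    """Sketch: LRU cache that, when evicting, prefers the least recently used
    block predicted to be accessed next by a store (store misses are assumed
    cheaper than load misses). next_is_store is a stand-in predictor."""

    def __init__(self, capacity, next_is_store):
        self.capacity = capacity
        self.next_is_store = next_is_store   # block -> bool (prediction)
        self.blocks = OrderedDict()          # ordered oldest -> newest

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self._evict()
            self.blocks[block] = True
        return hit

    def _evict(self):
        # Prefer the oldest block whose next access is predicted to be a store.
        for b in self.blocks:
            if self.next_is_store(b):
                del self.blocks[b]
                return
        self.blocks.popitem(last=False)      # fall back to plain LRU

cache = PenaltySensitiveLRU(capacity=2, next_is_store=lambda b: b == "A")
for blk in ["A", "B", "C", "A"]:
    print(blk, "hit" if cache.access(blk) else "miss")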
we present novel geometric algorithm to construct smooth surface that interpolates triangular or quadrilateral mesh of arbitrary topological type formed by vertices although our method can be applied to spline surfaces and subdivision surfaces of all kinds we illustrate our algorithm focusing on loop subdivision surfaces as most of the meshes are in triangular form we start our algorithm by assuming that the given triangular mesh is control net of loop subdivision surface the control points are iteratively updated globally by simple local point surface distance computation and an offsetting procedure without solving linear system the complexity of our algorithm is mn where is the number of vertices and is the number of iterations the number of iterations depends on the fineness of the mesh and accuracy required
this paper addresses the problem of building failure detection service for large scale distributed systems as well as multi agent systems it describes the failure detector mechanism and defines the roles it plays in the system afterwards the key construction problems that are fundamental in the context of building the failure detection service are presented finally sketch of general framework for implementing such service is described the proposed failure detection service can be used by mobile agents as crucial component for building fault tolerant multi agent systems
the first stage in transitioning from stakeholders needs to formal designs is the synthesis of user requirements from information elicited from the stakeholders in this paper we show how shallow natural language techniques can be used to assist analysis of the elicited information and so inform the synthesis of the user requirements we also show how related techniques can be used for the subsequent management of requirements and even help detect the absence of requirements motivation by identifying unprovenanced requirements
for the problem of reflecting an update on database view to the main schema the constant complement strategies are precisely those which avoid all update anomalies and so define the gold standard for well behaved solutions to the problem however the families of view updates which are supported under such strategies are limited so it is sometimes necessary to go beyond them albeit in systematic fashion in this work an investigation of such extended strategies is initiated for relational schemata the approach is to characterize the information content of database instance and then require that the optimal reflection of view update to the main schema embody the least possible change of information to illustrate the utility of the idea sufficient conditions for the existence of optimal insertions in the context of families of extended embedded implicational dependencies xeids are established it is furthermore established that all such optimal insertions are equivalent up to renaming of the new constant symbols which were introduced in support of the insertion
we believe that the future for problem solving method psm derived work is very promising in short psms provide solid foundation for creating semantic layer supporting planetary scale networks moreover within world scale network where billions of services are used and created by billions of parties in ad hoc dynamic fashion we believe that psm based mechanisms provide the only viable approach to dealing with the sheer scale systematically our current experiments in this area are based upon generic ontology for describing web services derived from earlier work on psms we outline how platforms based on our ontology can support large scale networked interactivity in three main areas within large european project we are able to map business level process descriptions to semantic web service descriptions to enable business experts to manage and use enterprise processes running in corporate information technology systems although highly successful web service based applications predominately run behind corporate firewalls and are far less pervasive on the general web within second large european project we are extending our semantic service work using the principles underlying the web and web to transform the web from web of data to one where services are managed and used at large scale significant initiatives are now underway in north america asia and europe to design new internet using clean slate approach to fulfill the demands created by new modes of use and the additional billion users linked to mobile phones our investigations within the european based future internet program indicate that significant opportunity exists for our psm derived work to address the key challenges currently identified scalability trust interoperability pervasive usability and mobility we outline one psm derived approach as an exemplar
we focus on improving the effectiveness of content based shape retrieval motivated by retrieval performance of several individual model feature vectors we propose novel method called prior knowledge based automatic weighted combination to improve the retrieval effectiveness the method dynamically determines the weighting scheme for different feature vectors based on the prior knowledge the experimental results show that the proposed method provides significant improvements on retrieval effectiveness of shape search with several measures on standard database compared with two existing combination methods the prior knowledge weighted combination technique has gained better retrieval effectiveness
visual object analysis researchers are increasingly experimenting with video because it is expected that motion cues should help with detection recognition and other analysis tasks this paper presents the cambridge driving labeled video database camvid as the first collection of videos with object class semantic labels complete with metadata the database provides ground truth labels that associate each pixel with one of semantic classes the database addresses the need for experimental data to quantitatively evaluate emerging algorithms while most videos are filmed with fixed position cctv style cameras our data was captured from the perspective of driving automobile the driving scenario increases the number and heterogeneity of the observed object classes over min of high quality hz footage is being provided with corresponding semantically labeled images at hz and in part hz the camvid database offers four contributions that are relevant to object analysis researchers first the per pixel semantic segmentation of over images was specified manually and was then inspected and confirmed by second person for accuracy second the high quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego motion third we filmed calibration sequences for the camera color response and intrinsics and computed camera pose for each frame in the sequences finally in support of expanding this or other databases we present custom made labeling software for assisting users who wish to paint precise class labels for other images and videos we evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains multi class object recognition pedestrian detection and label propagation
spatial data are found in geographic information systems such as digital road map databases where city and road attributes are associated with nodes and links in directed graph queries on spatial data are expensive because of the recursive property of graph traversal we propose graph indexing technique to expedite spatial queries where the graph topology remains relatively stationary using probabilistic analysis this paper shows that the graph indexing technique significantly improves the efficiency of constrained spatial queries
although several analytical models have been proposed in the literature for deterministic routing in different interconnection networks very few of them have considered the effects of virtual channel multiplexing on network performance this paper proposes new analytical model to compute message latency in general dimensional torus network with an arbitrary number of virtual channels per physical channel unlike the previous models proposed for toroidal based networks this model uses combinatorial approach to consider all different possible cases for the source destination pairs thus resulting in an accurate prediction the results obtained from simulation experiments confirm that the proposed model exhibits high degree of accuracy for various network sizes under different operating conditions compared to similar model proposed very recently which considers virtual channel utilization in the ary cube network
zero day cyber attacks such as worms and spy ware are becoming increasingly widespread and dangerous the existing signature based intrusion detection mechanisms are often not sufficient in detecting these types of attacks as result anomaly intrusion detection methods have been developed to cope with such attacks among the variety of anomaly detection approaches the support vector machine svm is known to be one of the best machine learning algorithms to classify abnormal behaviors the soft margin svm is one of the well known basic svm methods using supervised learning however it is not appropriate to use the soft margin svm method for detecting novel attacks in internet traffic since it requires pre acquired learning information for supervised learning procedure such pre acquired learning information is divided into normal and attack traffic with labels separately furthermore we apply the one class svm approach using unsupervised learning for detecting anomalies this means one class svm does not require the labeled information however there is downside to using one class svm it is difficult to use the one class svm in the real world due to its high false positive rate in this paper we propose new svm approach named enhanced svm which combines these two methods in order to provide unsupervised learning and low false alarm capability similar to that of supervised svm approach we use the following additional techniques to improve the performance of the proposed approach referred to as anomaly detector using enhanced svm first we create profile of normal packets using self organized feature map sofm for svm learning without pre existing knowledge second we use packet filtering scheme based on passive tcp ip fingerprinting ptf in order to reject incomplete network traffic that either violates the tcp ip standard or generation policy inside of well known platforms third feature selection technique using genetic algorithm ga is used for extracting optimized information from raw internet packets fourth we use the flow of packets based on temporal relationships during data preprocessing for considering the temporal relationships among the inputs used in svm learning lastly we demonstrate the effectiveness of the enhanced svm approach using the above mentioned techniques such as sofm ptf and ga on mit lincoln lab datasets and live dataset captured from real network the experimental results are verified by fold cross validation and the proposed approach is compared with real world network intrusion detection systems nids
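A minimal sketch of only the unsupervised one-class SVM step, using scikit-learn on made-up feature vectors; the SOFM profiling, PTF packet filtering, and GA feature selection stages of the approach are not shown, and the nu/kernel values are assumptions.

import numpy as np
from sklearn.svm import OneClassSVM

# Toy "normal traffic" feature vectors (e.g., per-packet or per-flow statistics).
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))

# Train on unlabeled (assumed mostly normal) traffic only.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal)

# Score new observations: +1 = normal, -1 = anomaly.
new_traffic = np.vstack([rng.normal(0, 1, (5, 4)), rng.normal(6, 1, (2, 4))])
print(detector.predict(new_traffic))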
this paper explores the use of hierarchical structure for classifying large heterogeneous collection of web content the hierarchical structure is initially used to train different second level classifiers in the hierarchical case model is learned to distinguish second level category from other categories within the same top level in the flat non hierarchical case model distinguishes second level category from all other second level categories scoring rules can further take advantage of the hierarchy by considering only second level categories that exceed threshold at the top level we use support vector machine svm classifiers which have been shown to be efficient and effective for classification but not previously explored in the context of hierarchical classification we found small advantages in accuracy for hierarchical models over flat models for the hierarchical approach we found the same accuracy using sequential boolean decision rule and multiplicative decision rule since the sequential approach is much more efficient requiring only of the comparisons used in the other approaches we find it to be good choice for classifying text into large hierarchical structures
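A hedged sketch of the hierarchical scheme with scikit-learn linear SVMs: a top-level model picks the branch, then a per-branch second-level model distinguishes categories within it; the toy corpus, labels, and two-level taxonomy are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy corpus: (text, top_level, second_level) -- labels are illustrative only.
data = [
    ("stock market earnings report", "business", "finance"),
    ("quarterly profits rise sharply", "business", "finance"),
    ("new smartphone chip announced", "business", "tech-industry"),
    ("startup releases gadget lineup", "business", "tech-industry"),
    ("team wins championship final", "sports", "football"),
    ("striker scores twice in derby", "sports", "football"),
    ("record lap time at grand prix", "sports", "motorsport"),
    ("driver takes pole position", "sports", "motorsport"),
]
texts = [t for t, _, _ in data]
tops = [a for _, a, _ in data]

vec = TfidfVectorizer().fit(texts)
X = vec.transform(texts)
top_clf = LinearSVC().fit(X, tops)                       # top-level model

second_clf = {}                                          # one model per top-level branch
for t in set(tops):
    xs, ys = zip(*[(x, s) for x, tt, s in data if tt == t])
    second_clf[t] = LinearSVC().fit(vec.transform(list(xs)), list(ys))

def classify(text):
    x = vec.transform([text])
    top = top_clf.predict(x)[0]          # pick the top-level branch first
    return top, second_clf[top].predict(x)[0]

print(classify("profits and market outlook"))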
in this paper new approach for detecting previously unencountered malware targeting mobile device is proposed in the proposed approach time stamped security data is continuously monitored within the target mobile device ie smartphones pdas and then processed by the knowledge based temporal abstraction kbta methodology using kbta continuously measured data eg the number of sent smss and events eg software installation are integrated with mobile device security domain knowledge base ie an ontology for abstracting meaningful patterns from raw time oriented security data to create higher level time oriented concepts and patterns also known as temporal abstractions automatically generated temporal abstractions are then monitored to detect suspicious temporal patterns and to issue an alert these patterns are compatible with set of predefined classes of malware as defined by security expert or the owner employing set of time and value constraints the goal is to identify malicious behavior that other defensive technologies eg antivirus or firewall failed to detect since the abstraction derivation process is complex the kbta method was adapted for mobile devices that are limited in resources ie cpu memory battery to evaluate the proposed modified kbta method lightweight host based intrusion detection system hids combined with central management capabilities for android based mobile phones was developed evaluation results demonstrated the effectiveness of the new approach in detecting malicious applications on mobile devices detection rate above in most scenarios and the feasibility of running such system on mobile devices cpu consumption was on average
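A simplified sketch of one KBTA-style step: abstracting a time-stamped measurement stream into labeled intervals and firing a rule when an episode persists long enough; the SMS-rate feature, threshold, and duration constraint are invented examples.

def state_abstraction(samples, threshold):
    """Abstract (timestamp, value) samples into intervals labeled HIGH/NORMAL."""
    intervals = []
    for ts, value in samples:
        label = "HIGH" if value >= threshold else "NORMAL"
        if intervals and intervals[-1][2] == label:
            intervals[-1][1] = ts                 # extend the current interval
        else:
            intervals.append([ts, ts, label])     # open a new interval
    return [tuple(i) for i in intervals]

def detect(intervals, label="HIGH", min_duration=60):
    """Temporal-pattern rule: alert if a labeled episode lasts at least min_duration."""
    return [iv for iv in intervals if iv[2] == label and iv[1] - iv[0] >= min_duration]

# Toy stream: SMS messages sent per 30-second window (timestamps in seconds).
sms_rate = [(0, 1), (30, 2), (60, 9), (90, 12), (120, 11), (150, 1)]
episodes = state_abstraction(sms_rate, threshold=8)
print(detect(episodes, min_duration=60))   # -> [(60, 120, 'HIGH')]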
enterprises have to be increasingly agile and responsive to address the challenges posed by the fast moving market with the software architecture evolving into service oriented architecture and the adoption of radio frequency identification rfid event processing can fit well in enterprise information systems in terms of facilitation of event aggregation into high level actionable information and event response to improve the responsiveness to make it more applicable the architecture of event processing in enterprise information systems is proposed event meta model is put forth and the rules are defined to improve the detection efficiency classification and partition of event instance is utilized we have implemented the event processing mechanism in enterprise information systems based on rfid including the data structures optimization strategies and algorithm that is considered as one of the contributions the performance evaluations show that the method is effective in terms of scalability and the capability of event processing complex event processing can improve operation efficiency and discover more actionable information which is justified by the application
the ability to quickly locate one or more instances of model in grey scale image is of importance to industry the recognition localization must be fast and accurate in this paper we present an algorithm which incorporates normalized correlation into pyramid image representation structure to perform fast recognition and localization the algorithm employs an estimate of the gradient of the correlation surface to perform steepest descent search test results are given detailing search time by target size effect of rotation and scale changes on performance and accuracy of the subpixel localization algorithm used in the algorithm finally results are given for searches on real images with perspective distortion and the addition of gaussian noise
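A hedged sketch of coarse-to-fine normalized correlation: locate the template at a downsampled pyramid level, then refine in a small full-resolution neighborhood; the pyramid factor and search window are illustrative, and the gradient-based steepest-descent search on the correlation surface is not shown.

import numpy as np

def ncc(patch, template):
    """Zero-normalized cross-correlation between two equally sized arrays."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(image, template):
    th, tw = template.shape
    best, best_pos = -2.0, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

def pyramid_search(image, template, factor=2):
    """Coarse-to-fine sketch: locate at a downsampled level, then refine the
    estimate in a small full-resolution neighborhood around it."""
    (cy, cx), _ = search(image[::factor, ::factor], template[::factor, ::factor])
    y0, x0 = cy * factor, cx * factor
    th, tw = template.shape
    wy, wx = max(0, y0 - factor), max(0, x0 - factor)
    window = image[wy:y0 + th + factor, wx:x0 + tw + factor]
    (ry, rx), score = search(window, template)
    return (wy + ry, wx + rx), score

rng = np.random.default_rng(1)
img = rng.random((64, 64))
tpl = img[20:36, 30:46].copy()
print(pyramid_search(img, tpl))   # expected near ((20, 30), ~1.0)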
wikipedia is one of the most successful online knowledge bases attracting millions of visits daily not surprisingly its huge success has in turn led to immense research interest for better understanding of the collaborative knowledge building process in this paper we performed terrorism domain specific case study comparing and contrasting the knowledge evolution in wikipedia with knowledge base created by domain experts specifically we used the terrorism knowledge base tkb developed by experts at mipt we identified wikipedia articles matching tkb records and went ahead to study them from three aspects creation revision and link evolution we found that the knowledge building in wikipedia had largely been independent and did not follow tkb despite the open and online availability of the latter as well as awareness of at least some of the wikipedia contributors about the tkb source in an attempt to identify possible reasons we conducted detailed analysis of contribution behavior demonstrated by wikipedians it was found that most wikipedians contribute to relatively small set of articles each their contribution was biased towards one or very few articles at the same time each article’s contributions are often championed by very few active contributors including the article’s creator we finally arrive at conjecture that the contributions in wikipedia are more to cover knowledge at the article level rather than at the domain level
this paper presents the design and deployment of polar defence an interactive game for large public display we designed this display based on model of users and their interactions with large public displays in public spaces which we derived from prior work we conducted four day user study of this system in public space to evaluate the game and its impact on the surrounding environment our analysis showed that the installation successfully encouraged participation among strangers and that its design and deployment addressed many of the challenges described by prior research literature finally we reflect on this deployment to provide design guidance to other researchers building large interactive public displays for public spaces
this paper proposes novel accurate and efficient hybrid cpu gpu based dof haptic rendering algorithm for highly complex and large scale virtual environments ves that may simultaneously contain different types of object data representations in slower rendering process on the gpu local geometry near the haptic interaction point hip is obtained in the form of six directional depth maps from virtual cameras adaptively located around the object to be touched in faster rendering process on the cpu collision detection and response computations are performed using the directional depth maps without the need for any complex data hierarchy of virtual objects or data conversion of multiple data formats to efficiently find an ideal hip ihip the proposed algorithm uses new abstract local occupancy map instance lomi and the nearest neighbor search algorithm which does not require physical memory for storing voxel types during online voxelization and reduces the search time by factor of about finally in order to achieve accurate haptic interaction sub voxelization of voxel in lomi is proposed the effectiveness of the proposed algorithm is subsequently demonstrated with several benchmark examples
significant body of research in ubiquitous computing deals with mobile networks ie networks of mobile devices interconnected by wireless communication links due to the very nature of such mobile networks addressing and communicating with remote objects is significantly more difficult than in their fixed counterparts this paper reconsiders the remote object reference concept one of the most fundamental programming abstractions of distributed programming languages in the context of mobile networks we describe four desirable characteristics of remote references in mobile networks show how existing remote object references fail to exhibit them and subsequently propose ambient references remote object references designed for mobile networks
the widely used k-means clustering deals with ball shaped spherical gaussian clusters in this paper we extend the k-means clustering to accommodate extended clusters in subspaces such as line shaped clusters plane shaped clusters and ball shaped clusters the algorithm retains much of the k-means clustering flavors easy to implement and fast to converge model selection procedure is incorporated to determine the cluster shape as result our algorithm can recognize wide range of subspace clusters studied in various literatures and also the global ball shaped clusters living in all dimensions we carry out extensive experiments on both synthetic and real world datasets and the results demonstrate the effectiveness of our algorithm
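A rough sketch of the idea (not the paper's algorithm): a k-means-style loop in which each cluster is an affine subspace (point, line, plane, ...) refit by PCA/SVD, with points assigned by distance to the subspace; the model selection of cluster shape is omitted here and the subspace dimensions are fixed in advance.

import numpy as np

def subspace_kmeans(X, dims, iters=20, seed=0):
    """Sketch of a k-means variant with per-cluster subspaces: cluster j is a
    point (dims[j] == 0, ball-shaped), a line (1), a plane (2), ...; points are
    assigned by distance to the affine subspace, which is refit by SVD."""
    rng = np.random.default_rng(seed)
    k = len(dims)
    means = X[rng.choice(len(X), size=k, replace=False)]
    bases = [np.zeros((d, X.shape[1])) for d in dims]

    def dist_to_subspace(x, mean, basis):
        r = x - mean
        if len(basis):
            r = r - basis.T @ (basis @ r)   # remove the in-subspace component
        return np.linalg.norm(r)

    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest affine subspace.
        for i, x in enumerate(X):
            labels[i] = min(range(k),
                            key=lambda j: dist_to_subspace(x, means[j], bases[j]))
        # Update step: refit mean and principal directions of each cluster.
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue
            means[j] = pts.mean(axis=0)
            if dims[j] > 0 and len(pts) > dims[j]:
                _, _, vt = np.linalg.svd(pts - means[j], full_matrices=False)
                bases[j] = vt[:dims[j]]
    return labels, means, bases

rng = np.random.default_rng(2)
line = np.outer(np.linspace(-3, 3, 60), [1.0, 0.5]) + rng.normal(0, 0.05, (60, 2))
ball = rng.normal([4.0, -3.0], 0.3, (60, 2))
labels, _, _ = subspace_kmeans(np.vstack([line, ball]), dims=[1, 0])
print(np.bincount(labels))   # cluster sizes (ideally about 60 each if the shapes are recovered)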
cloud computing enables highly scalable services to be easily consumed over the internet on an as needed basis while cloud computing is expanding rapidly and used by many individuals and organizations internationally data protection issues in the cloud have not been carefully addressed at current stage in the cloud users data is usually processed remotely in unknown machines that users do not own or operate hence users fear of confidential data particularly financial and health data leakage and loss of privacy in the cloud becomes significant barrier to the wide adoption of cloud services to allay users concerns of their data privacy in this paper we propose novel data protection framework which addresses challenges during the life cycle of cloud service the framework consists of three key components policy ranking policy integration and policy enforcement for each component we present various models and analyze their properties our goal is to provide new vision toward addressing the issues of the data protection in the cloud rather than detailed techniques of each component to this extent the paper includes discussion of set of general guidelines for evaluating systems designed based on such framework
we describe the first algorithms to compute minimum cuts in surface embedded graphs in near linear time given an undirected graph embedded on an orientable surface of genus with two specified vertices and our algorithm computes minimum cut in go log time except for the special case of planar graphs for which log time algorithms have been known for more than years the best previous time bounds for finding minimum cuts in embedded graphs follow from algorithms for general sparse graphs slight generalization of our minimum cut algorithm computes minimum cost subgraph in every homology class we also prove that finding minimum cost subgraph homologous to single input cycle is np hard
the growing availability of internet access has led to significant increase in the use of world wide web if we are to design dependable web based systems that deal effectively with the increasing number of clients and highly variable workload it is important to be able to describe the web workload and errors accurately in this paper we focus on the detailed empirical analysis of the session based workload and reliability based on the data extracted from actual web logs of eleven web servers first we introduce and rigourously analyze several intra session and inter session metrics that collectively describe web workload in terms of user sessions then we analyze web error characteristics and estimate the request based and session based reliability of web servers finally we identify the invariants of the web workload and reliability that apply through all data sets considered the results presented in this paper show that session based workload and reliability are better indicators of the users perception of the web quality than the request based metrics
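A minimal sketch of sessionizing a request log with an inactivity timeout and computing simple session-based workload metrics plus a request-based reliability estimate; the log fields and the 30-minute timeout are assumptions.

from collections import defaultdict

TIMEOUT = 30 * 60   # assumed inactivity threshold (seconds) that closes a session

def sessionize(log):
    """log: iterable of (client_id, timestamp_seconds, status_code) sorted by time."""
    sessions = defaultdict(list)          # client -> list of sessions (lists of requests)
    last_seen = {}
    for client, ts, status in log:
        if client not in last_seen or ts - last_seen[client] > TIMEOUT:
            sessions[client].append([])   # start a new session
        sessions[client][-1].append((ts, status))
        last_seen[client] = ts
    return [s for per_client in sessions.values() for s in per_client]

def metrics(sessions):
    lengths = [s[-1][0] - s[0][0] for s in sessions]        # session duration
    sizes = [len(s) for s in sessions]                      # requests per session
    errors = sum(1 for s in sessions for _, st in s if st >= 400)
    reliability = 1 - errors / sum(sizes)                   # request-based reliability
    return {"sessions": len(sessions),
            "avg_requests": sum(sizes) / len(sizes),
            "avg_duration_s": sum(lengths) / len(lengths),
            "request_reliability": reliability}

log = [("a", 0, 200), ("a", 10, 200), ("b", 20, 500), ("a", 4000, 200), ("b", 30, 200)]
print(metrics(sessionize(sorted(log, key=lambda r: r[1]))))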
consequence finding has been recognized as an important technique in many intelligent systems involving inference in previous work propositional or first order clausal theories have been considered for consequence finding in this paper we consider consequence finding from default theory which consists of first order clausal theory and set of normal defaults in an extension of default theory consequence finding can be done with the generating defaults for the extension alternatively all extensions can be represented at once with the conditional answer format which represents how conclusion depends on which defaults we also propose procedure for consequence finding and query answering in default theory using the first order consequence finding procedure sol in computing consequences from default theories efficiently the notion of tcs freeness is most important to prune large number of irrational tableaux induced by the generating defaults for an extension in order to simulate the tcs freeness the refined sol calculus called sol is adopted using skip preference and complement checking
the analysis of scientific articles produced by different groups of authors helps to identify and characterize research groups and collaborations among them although this is quite studied area some issues such as quick understanding of groups and visualization of large social networks still pose some interesting challenges in order to contribute to this study we present solution based on overlapper tool for the visualization of overlapping groups that makes use of an enhanced variation of force directed graphs for real case study the tool has been applied to articles in the dblp database
the streamit programming model has been proposed to exploit parallelism in streaming applications on general purpose multi core architectures this model allows programmers to specify the structure of program as set of filters that act upon data and set of communication channels between them the streamit graphs describe task data and pipeline parallelism which can be exploited on modern graphics processing units gpus as they support abundant parallelism in hardware in this paper we describe the challenges in mapping streamit to gpus and propose an efficient technique to software pipeline the execution of stream programs on gpus we formulate this problem both scheduling and assignment of filters to processors as an efficient integer linear program ilp which is then solved using ilp solvers we also describe novel buffer layout technique for gpus which facilitates exploiting the high memory bandwidth available in gpus the proposed scheduling utilizes both the scalar units in gpu to exploit data parallelism and multiprocessors to exploit task and pipeline parallelism further it takes into consideration the synchronization and bandwidth limitations of gpus and yields speedups between and over single threaded cpu
fundamental problem in distributed computation is the distributed evaluation of functions the goal is to determine the value of function over set of distributed inputs in communication efficient manner specifically we assume that each node holds time varying input vector and we are interested in determining at any given time whether the value of an arbitrary function on the average of these vectors crosses predetermined threshold in this paper we introduce new method for monitoring distributed data which we term shape sensitive geometric monitoring it is based on geometric interpretation of the problem which enables to define local constraints on the data received at the nodes it is guaranteed that as long as none of these constraints has been violated the value of the function does not cross the threshold we generalize previous work on geometric monitoring and solve two problems which seriously hampered its performance as opposed to the constraints used so far which depend only on the current values of the local input vectors here we incorporate their temporal behavior into the constraints also the new constraints are tailored to the geometric properties of the specific function which is being monitored while the previous constraints were generic experimental results on real world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches and considerably narrows the gap between existing results and newly defined lower bound on the communication complexity
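A sketch of the classic geometric-monitoring condition that this shape-sensitive method generalizes: each node checks the ball whose diameter joins the shared estimate vector and its local drift vector, and stays silent while the monitored function cannot cross the threshold inside that ball. The function f(x) = ||x||^2 is chosen here because its extremes over a ball can be bounded exactly; the vectors and threshold are toy values.

import numpy as np

def ball(estimate, drift):
    """Ball whose diameter is the segment between the shared estimate vector
    and this node's drift vector (the classic geometric-monitoring construction)."""
    center = (estimate + drift) / 2.0
    radius = np.linalg.norm(estimate - drift) / 2.0
    return center, radius

def node_is_safe(estimate, drift, threshold):
    """Local check for f(x) = ||x||^2 against a threshold: the node is safe if
    f stays on the same side of the threshold everywhere inside its ball."""
    center, radius = ball(estimate, drift)
    lo = max(0.0, np.linalg.norm(center) - radius) ** 2   # min of f over the ball
    hi = (np.linalg.norm(center) + radius) ** 2           # max of f over the ball
    above = np.dot(estimate, estimate) >= threshold       # side of f at the estimate
    return (lo >= threshold) if above else (hi < threshold)

# Toy run: estimate shared at the last synchronization, local vectors drift afterwards.
estimate = np.array([1.0, 1.0])                  # f(estimate) = 2 < threshold
threshold = 9.0
drifts = [estimate + d for d in (np.array([0.2, -0.1]), np.array([0.5, 0.4]))]
if all(node_is_safe(estimate, d, threshold) for d in drifts):
    print("no communication needed: the global average cannot have crossed", threshold)
else:
    print("some node must report its vector for a global resynchronization")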
this paper reports findings from study of how guidebook was used by pairs of visitors touring historic house we describe how the guidebook was incorporated into their visit in four ways shared listening independent use following one another and checking in on each other we discuss how individual and groupware features were adopted in support of different visiting experiences and illustrate how that adoption was influenced by social relationships the nature of the current visit and any museum visiting strategies that the couples had finally we describe how the guidebook facilitated awareness between couples and how awareness of non guidebook users strangers influenced use
we incorporate prewrite operation before write operation in mobile transaction to improve data availability prewrite operation does not update the state of data object but only makes visible the future value that the data object will have after the final commit of the transaction once transaction reads all the values and declares all the prewrites it can pre commit at mobile host mh computer connected to unreliable mobile communication network the remaining transaction’s execution writes on database is shifted to the mobile service station mss computer connected to the reliable fixed network writes on database consume time and resources and are therefore shifted to mss and delayed this reduces wireless network traffic congestion since the responsibility of expensive part of the transaction’s execution is shifted to the mss it also reduces the computing expenses at mobile host pre committed transaction’s prewrite values are made visible both at mobile and at fixed database servers before the final commit of the transaction thus it increases data availability during frequent disconnection common in mobile computing since pre committed transaction does not abort no undo recovery needs to be performed in our model mobile host needs to cache only prewrite values of the data objects which take less memory transmission time energy and can be transmitted over low bandwidth we have analysed various possible schedules of running transactions concurrently both at mobile and fixed database servers we have discussed the concurrency control algorithm for our transaction model and proved that the concurrent execution of our transaction processing model produces only serializable schedules our performance study shows that our model increases throughput and decreases transaction abort ratio in comparison to other lock based schemes we have briefly discussed the recovery issues and implementation of our model
this article presents an experimental and analytical study of value prediction and its impact on speculative execution in superscalar microprocessors value prediction is new paradigm that suggests predicting outcome values of operations at run time and using these predicted values to trigger the execution of true data dependent operations speculatively as result stalls to memory locations can be reduced and the amount of instruction level parallelism can be extended beyond the limits of the program’s dataflow graph this article examines the characteristics of the value prediction concept from two perspectives the related phenomena that are reflected in the nature of computer programs and the significance of these phenomena to boosting instruction level parallelism of superscalar microprocessors that support speculative execution in order to better understand these characteristics our work combines both analytical and experimental studies
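to make the value prediction idea concrete, here is a minimal last-value predictor, a common baseline scheme rather than the specific predictors studied in the article; the table size, direct-mapped indexing by program counter, and the toy value stream are all assumptions made for illustration.

```python
# illustrative last-value predictor: predicts that an instruction will
# produce the same result it produced the previous time it executed.
class LastValuePredictor:
    def __init__(self, entries=4096):
        self.entries = entries
        self.table = {}           # table index -> last observed result

    def _index(self, pc):
        return pc % self.entries  # direct-mapped indexing by program counter

    def predict(self, pc):
        # returns a predicted value, or None when no history exists yet
        return self.table.get(self._index(pc))

    def update(self, pc, actual_value):
        # record the committed result so the next prediction can reuse it
        self.table[self._index(pc)] = actual_value

# toy usage: measure how often a value-producing instruction repeats its result
pred = LastValuePredictor()
hits = total = 0
for actual in [10, 10, 10, 12, 12, 10]:
    guess = pred.predict(pc=0x400123)
    if guess is not None:
        total += 1
        hits += (guess == actual)
    pred.update(pc=0x400123, actual_value=actual)
print(f"prediction accuracy: {hits}/{total}")
```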
as raw system performance continues to improve at exponential rates the utility of many services is increasingly limited by availability rather than performance key approach to improving availability involves replicating the service across multiple wide area sites however replication introduces well known trade offs between service consistency and availability thus this article explores the benefits of dynamically trading consistency for availability using continuous consistency model in this model applications specify maximum deviation from strong consistency on per replica basis in this article we i evaluate the availability of prototype replication system running across the internet as function of consistency level consistency protocol and failure characteristics ii demonstrate that simple optimizations to existing consistency protocols result in significant availability improvements more than an order of magnitude in some scenarios iii use our experience with these optimizations to prove tight upper bound on the availability of services and iv show that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity resulting in communication versus availability trade off
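the following toy sketch illustrates the per-replica bound on divergence described above: a replica buffers local writes until their accumulated weight would exceed its configured bound, at which point it must synchronize with its peers. the class name, the bound value, and the write weights are made up, and this is only a stand-in for the continuous consistency protocols evaluated in the article.

```python
# toy illustration of per-replica bounded divergence from strong consistency
class Replica:
    def __init__(self, name, max_numerical_error):
        self.name = name
        self.bound = max_numerical_error   # maximum tolerated unsynced weight
        self.unsynced_weight = 0.0

    def write(self, weight):
        if self.unsynced_weight + weight > self.bound:
            self.synchronize()             # tighter consistency costs availability
        self.unsynced_weight += weight

    def synchronize(self):
        print(f"{self.name}: pushing {self.unsynced_weight} units to peers")
        self.unsynced_weight = 0.0

r = Replica("replica-eu", max_numerical_error=10.0)
for w in [3.0, 4.0, 2.0, 5.0]:             # the last write forces a synchronization
    r.write(w)
```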
the wide adaptation of gps and cellular technologies has created many applications that collect and maintain large repositories of data in the form of trajectories previous work on querying analyzing trajectorial data typically falls into methods that either address spatial range and nn queries or similarity based queries nevertheless trajectories are complex objects whose behavior over time and space can be better captured as sequence of interesting events we thus facilitate the use of motion pattern queries which allow the user to select trajectories based on specific motion patterns such patterns are described as regular expressions over spatial alphabet that can be implicitly or explicitly anchored to the time domain moreover we are interested in flexible patterns that allow the user to include variables in the query pattern and thus greatly increase its expressive power in this paper we introduce framework for efficient processing of flexible pattern queries the framework includes an underlying indexing structure and algorithms for query processing using different evaluation strategies an extensive performance evaluation of this framework shows significant performance improvement when compared to existing solutions
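a small sketch of the core idea above: a trajectory is encoded as the string of region labels it visits, and a flexible pattern with a variable is matched against it. the single-letter region alphabet, the example trajectory, and the use of a regex named group with a backreference to stand in for a pattern variable are assumptions for illustration, not the paper's index-supported evaluation strategies.

```python
import re

# regions are labelled with single letters; a trajectory becomes the string of
# regions it visits, and a pattern "variable" @x is modelled with a named
# group plus a backreference.
trajectory = ["A", "A", "B", "C", "B", "D"]
seq = "".join(trajectory)

# pattern: start in A, later pass through some region @x, then C, then @x again
pattern = re.compile(r"A.*(?P<x>[A-Z]).*C.*(?P=x)")

m = pattern.search(seq)
if m:
    print("match, variable @x bound to region", m.group("x"))
```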
in this paper by considering the notion of an mv algebra we consider relationship between rough sets and mv algebra theory we introduce the notion of rough ideal with respect to an ideal of an mv algebra which is an extended notion of ideal in an mv algebra and we give some properties of the lower and the upper approximations in an mv algebra
feature modules are the building blocks of programs in software product lines spls foundational assumption of feature based program synthesis is that features are composed in predefined sequence called natural order recent work on virtual separation of concerns reveals new model of feature interactions that shows that feature modules can be quantized as compositions of smaller modules called derivatives we present this model and examine some of its consequences namely that given program can be reconstructed by composing features in any order and the contents of feature module as expressed as composition of derivatives is determined automatically by feature order we show that different orders allow one to adjust the contents of feature module to isolate and study the impact of interactions that feature has with other features we also show the utility of generalizing safe composition sc basic analysis of spls that verifies program type safety to demonstrate that every legal composition of derivatives and thus any composition order of features produces type safe program which is much stronger sc property
network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures not only does this approach mean worms cannot be detected until the signatures are created but that variants of known worms will remain undetected since they will have different signatures the intuitive solution is to write more generic signatures this solution however would increase the false alarm rate and is therefore practically not feasible this paper reports on the feasibility of using machine learning technique to detect variants of known worms in real time support vector machines svms are machine learning technique known to perform well at various pattern recognition tasks such as text categorization and handwritten digit recognition given the efficacy of svms in standard pattern recognition problems this work applies svms to the worm detection problem specifically we investigate the optimal configuration of svms and associated kernel functions to classify various types of synthetically generated worms we demonstrate that the optimal configuration for real time detection of variants of known worms is to use linear kernel and unnormalized bi gram frequency counts as input
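the configuration reported above (linear kernel with unnormalized bi-gram frequency counts) can be sketched with off-the-shelf tools as below; the payload strings and labels are fabricated placeholders and the real study used synthetically generated worms, so this is only a shape of the setup, not a reproduction of it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# made-up payloads: 0 = benign traffic, 1 = worm-like payload
payloads = ["GET /index.html HTTP/1.0", "\x90\x90\x90\x90\xcc\xcc jmp esp",
            "POST /login user=alice", "\x90\x90\xeb\xfe\xcc shellcode-ish"]
labels   = [0, 1, 0, 1]

# character bi-grams with raw (unnormalized) counts feeding a linear SVM
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 2)),
    LinearSVC())
model.fit(payloads, labels)
print(model.predict(["\x90\x90\xcc\xcc jmp esp again"]))
```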
bittorrent bt has carried out significant and continuously increasing portion of internet traffic while several designs have been recently proposed and implemented to improve the resource utilization by bridging the application layer overlay and the network layer underlay these designs are largely dependent on internet infrastructures such as isps and cdns in addition they also demand large scale deployments of their systems to work effectively consequently they require efforts far beyond an individual user’s ability to be widely used in the internet in this paper aiming at building an infrastructure independent user level facility we present our design implementation and evaluation of topology aware bt system called topbt to significantly improve the overall internet resource utilization without degrading user downloading performance the unique feature of topbt client lies in that topbt client actively discovers network proximities to connected peers and uses both proximities and transmission rates to maintain fast downloading while reducing the transmitting distance of the bt traffic and thus the internet traffic as result topbt client neither requires feeds from major internet infrastructures such as isps or cdns nor requires large scale deployment of other topbt clients on the internet to work effectively we have implemented topbt based on widely used open source bt client code base and made the software publicly available by deploying topbt and other bittorrent clients on hundreds of internet hosts we show that on average topbt can reduce about download traffic while achieving faster download speed compared to several prevalent bt clients topbt has been widely used in the internet by many users all over the world
divide and conquer algorithms are good match for modern parallel machines they tend to have large amounts of inherent parallelism and they work well with caches and deep memory hierarchies but these algorithms pose challenging problems for parallelizing compilers they are usually coded as recursive procedures and often use pointers into dynamically allocated memory blocks and pointer arithmetic all of these features are incompatible with the analysis algorithms in traditional parallelizing compilers this paper presents the design and implementation of compiler that is designed to parallelize divide and conquer algorithms whose subproblems access disjoint regions of dynamically allocated arrays the foundation of the compiler is flow sensitive context sensitive and interprocedural pointer analysis algorithm range of symbolic analysis algorithms build on the pointer analysis information to extract symbolic bounds for the memory regions accessed by potentially recursive procedures that use pointers and pointer arithmetic the symbolic bounds information allows the compiler to find procedure calls that can execute in parallel without violating the data dependences the compiler generates code that executes these calls in parallel we have used the compiler to parallelize several programs that use divide and conquer algorithms our results show that the programs perform well and exhibit good speedup
content management systems cms store enterprise data such as insurance claims insurance policies legal documents patent applications or archival data like in the case of digital libraries search over content allows for information retrieval but does not provide users with great insight into the data more analytical view is needed through analysis aggregations groupings trends pivot tables or charts and so on multidimensional content exploration mcx is about effectively analyzing and exploring large amounts of content by combining keyword search with olap style aggregation navigation and reporting we focus on unstructured data or generally speaking documents or content with limited metadata as it is typically encountered in cms we formally present how cms content and metadata should be organized in well defined multidimensional structure so that sophisticated queries can be expressed and evaluated the cms metadata provide traditional olap static dimensions that are combined with dynamic dimensions discovered from the analyzed keyword search result as well as measures for document scores based on the link structure between the documents in addition we provide means for multidimensional content exploration through traditional olap rollup and drilldown operations on the static and dynamic dimensions solutions for multi cube analysis and dynamic navigation of the content we present our prototype called dbpubs which stores research publications as documents that can be searched and most importantly analyzed and explored finally we present experimental results of the efficiency and effectiveness of our approach
the technology for anonymous communication has been thoroughly researched but despite the existence of several protection services business model for anonymous web surfing has not emerged as of today one possibility to stimulate adoption is to facilitate it in specific subnet the idea is to identify promising target group which has substantial benefit from adopting the technology and to facilitate the adoption within that target group we examine the feasibility of this approach for anonymity services we identify potential target group consumers of pornographic online material and empirically validate their suitability by conducting traffic analysis we also discuss several business models for anonymity services we argue that providers of anonymity services should try to generate revenue from content providers like adult entertainment distributors the latter could benefit from offering anonymous access to their products by differentiating against competitors or by selling their products at higher price over the anonymous channel
this paper presents techniques for analyzing channel contract specifications in microsoft research’s singularity operating system channel contract is state machine that specifies the allowable interactions between server and client through an asynchronous communication channel we show that contrary to what is claimed in the singularity documentation processes that faithfully follow channel contract can deadlock we present realizability analysis that can be used to identify channel contracts with problems our realizability analysis also leads to an efficient verification approach where properties about the interaction behavior can be verified without modeling the contents of communication channels we analyzed more than channel contracts from the singularity code distribution and documentation only two contracts failed our realizability condition and these two contracts allow deadlocks our experimental results demonstrate that realizability analysis and verification of channel contracts can be done efficiently using our approach
we show that set of independently developed spam filters may be combined in simple ways to provide substantially better filtering than any of the individual filters the results of fifty three spam filters evaluated at the trec spam track were combined post hoc so as to simulate the parallel on line operation of the filters the combined results were evaluated using the trec methodology yielding more than factor of two improvement over the best filter the simplest method averaging the binary classifications returned by the individual filters yields remarkably good result new method averaging log odds estimates based on the scores returned by the individual filters yields somewhat better result and provides input to svm and logistic regression based stacking methods the stacking methods appear to provide further improvement but only for very large corpora of the stacking methods logistic regression yields the better result finally we show that it is possible to select a priori small subsets of the filters that when combined still outperform the best individual filter by substantial margin
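the two simplest fusion rules mentioned above (vote averaging and log-odds averaging) are easy to state in a few lines; the score matrix below is fabricated for three hypothetical filters and four messages, and converting per-filter probabilities to log odds is an assumption about how the individual scores are calibrated.

```python
import numpy as np

# made-up spamminess probabilities: rows are messages, columns are filters
p = np.array([[0.95, 0.80, 0.99],
              [0.10, 0.30, 0.05],
              [0.60, 0.70, 0.40],
              [0.02, 0.01, 0.20]])

eps = 1e-6
log_odds = np.log((p + eps) / (1 - p + eps))

votes      = (p > 0.5).mean(axis=1)   # fraction of filters voting "spam"
fused_odds = log_odds.mean(axis=1)    # averaged log-odds score per message

print("vote average :", votes)
print("log-odds avg :", fused_odds)
```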
constraint based mining is an active field of research which is necessary step to achieve interactive and successful kdd processes the limitations of the task lies in languages being limited to describe the mined patterns and the ability to express varied constraints in practice current approaches focus on language and the most generic frameworks mine individually or simultaneously monotone and an anti monotone constraints in this paper we propose generic framework dealing with any partially ordered language and large set of constraints we prove that this set of constraints called primitive based constraints not only is superclass of both kinds of monotone ones and their boolean combinations but also other classes such as convertible and succinct constraints we show that the primitive based constraints can be efficiently mined thanks to relaxation method based on virtual patterns which summarize the specificities of the search space indeed this approach automatically deduces pruning conditions having suitable monotone properties and thus these conditions can be pushed into usual constraint mining algorithms we study the optimal relaxations finally we provide an experimental illustration of the efficiency of our proposal by experimenting it on several contexts
agent based computing is promising approach for developing applications in complex domains however despite the great deal of research in the area number of challenges still need to be faced i to make agent based computing widely accepted paradigm in software engineering practice and ii to turn agent oriented software abstractions into practical tools for facing the complexity of modern application areas in this paper after short introduction to the key concepts of agent based computing as they pertain to software engineering we characterise the emerging key issues in multiagent systems mass engineering in particular we show that such issues can be analysed in terms of three different “scales of observation” ie in analogy with the scales of observation of physical phenomena in terms of micro macro and meso scales based on this characterisation we discuss for each scale of observation what are the peculiar engineering issues arising the key research challenges to be solved and the most promising research directions to be explored in the future
assistance work carried out by one entity in support of another is concept of long standing interest both as type of human work common in organizations and as model of how computational systems might interact with humans surprisingly the perhaps most paradigmatic form of assistance the work of administrative assistants or secretaries has received almost no attention this paper reports on study of assistants and their principals and managers laying out model of their work the skills and competencies they need to function effectively and reflects on implications for the design of systems and organizations
well known bad code smell in refactoring and software maintenance is duplicated code or code clones code clone is code fragment that is identical or similar to another unjustified code clones increase code size make maintenance and comprehension more difficult and also indicate design problems such as lack of encapsulation or abstraction this paper proposes token and ast based hybrid approach to automatically detecting code clones in erlang otp programs underlying collection of refactorings to support user controlled automatic clone removal and examines their application in substantial case studies both the clone detector and the refactorings are integrated within wrangler the refactoring tool developed at kent for erlang otp
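as a rough illustration of the token-based half of the hybrid approach above, the sketch below hashes fixed-size token windows and reports windows that occur in more than one place; whitespace tokenisation, the window size of five, and the tiny erlang-like snippets are arbitrary choices, and the ast comparison and refactoring stages are not shown.

```python
from collections import defaultdict

def clone_candidates(sources, window=5):
    seen = defaultdict(list)               # window hash -> list of (file, position)
    for name, code in sources.items():
        tokens = code.split()
        for i in range(len(tokens) - window + 1):
            key = hash(tuple(tokens[i:i + window]))
            seen[key].append((name, i))
    # any window seen at two or more locations is a clone candidate
    return [locs for locs in seen.values() if len(locs) > 1]

sources = {
    "a.erl": "start ( ) -> X = init ( ) , loop ( X ) .",
    "b.erl": "go ( ) -> X = init ( ) , loop ( X ) .",
}
print(clone_candidates(sources))
```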
computational biology or bioinformatics has been defined as the application of mathematical and computer science methods to solving problems in molecular biology that require large scale data computation and analysis as expected molecular biology databases play an essential role in computational biology research and development this paper introduces into current molecular biology databases stressing data modeling data acquisition data retrieval and the integration of molecular biology data from different sources this paper is primarily intended for an audience of computer scientists with limited background in biology
finding useful sharing information between instances in object oriented programs has recently been the focus of much research the applications of such static analysis are multiple by knowing which variables definitely do not share in memory we can apply conventional compiler optimizations find coarse grained parallelism opportunities or more importantly verify certain correctness aspects of programs even in the absence of annotations in this paper we introduce framework for deriving precise sharing information based on abstract interpretation for java like language our analysis achieves precision in various ways including supporting multivariance which allows separating different contexts we propose combined set sharing nullity classes domain which captures which instances do not share and which ones are definitively null and which uses the classes to refine the static information when inheritance is present the use of set sharing abstraction allows more precise representation of the existing sharings and is crucial in achieving precision during interprocedural analysis carrying the domains in combined way facilitates the interaction among them in the presence of multivariance in the analysis we show through examples and experimentally that both the set sharing part of the domain as well as the combined domain provide more accurate information than previous work based on pair sharing domains at reasonable cost
the deterministic block distribution method proposed for raid systems known as striping has been traditional solution for achieving high performance increased capacity and redundancy all the while allowing the system to be managed as if it were single device however this distribution method requires one to completely change the data layout when adding new storage subsystems which is drawback for current applications this paper presents adaptivez an adaptive block placement method based on deterministic zones which grows dynamically zone by zone according to capacity demands when adapting new storage subsystems it changes only fraction of the data layout while preserving simple management of data due to deterministic placement adaptivez uses both mechanism focused on reducing the overhead suffered during the upgrade as well as heterogeneous data layout for taking advantage of disks with higher capabilities the evaluation reveals that adaptivez only needs to move fraction of data blocks to adapt new storage subsystems while delivering an improved performance and balanced load the migration scheme used by this approach produces low overhead within an acceptable time finally it keeps the complexity of the data management at an acceptable level
at an abstract level hybrid systems are related to variants of kleene algebra since it has recently been shown that kleene algebras and their variants like omega algebras provide reasonable base for automated reasoning the aim of the present paper is to show that automated algebraic reasoning for hybrid system is feasible we mainly focus on applications in particular we present case studies and proof experiments to show how concrete properties of hybrid systems like safety and liveness can be algebraically characterised and how off the shelf automated theorem provers can be used to verify them
b trees have been ubiquitous in database management systems for several decades and they are used in other storage systems as well their basic structure and basic operations are well and widely understood including search insertion and deletion concurrency control of operations in b trees however is perceived as difficult subject with many subtleties and special cases the purpose of this survey is to clarify simplify and structure the topic of concurrency control in b trees by dividing it into two subtopics and exploring each of them in depth
this paper presents cluster validation based document clustering algorithm which is capable of identifying an important feature subset and the intrinsic value of model order cluster number the important feature subset is selected by optimizing cluster validity criterion subject to some constraint for achieving model order identification capability this feature selection procedure is conducted for each possible value of cluster number the feature subset and the cluster number which maximize the cluster validity criterion are chosen as our answer we have evaluated our algorithm using several datasets from the newsgroup corpus experimental results show that our algorithm can find the important feature subset estimate the cluster number and achieve higher micro averaged precision than previous document clustering algorithms which require the value of cluster number to be provided
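a generic stand-in for the model-order identification step described above is to sweep candidate cluster numbers and keep the one that maximizes a validity criterion; the silhouette score, k-means, tf-idf features, and the toy documents below are all assumptions for illustration and are not the paper's own criterion or feature-selection procedure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

docs = ["nasa launches a new satellite", "rocket reaches orbit today",
        "stock markets fall sharply", "investors fear rate hikes",
        "team wins the championship", "coach praises the players"]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# sweep candidate cluster numbers and keep the one maximizing the criterion
best_k, best_score = None, -1.0
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print("estimated cluster number:", best_k, "validity:", round(best_score, 3))
```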
in recent years technological developments have made it possible to build interactive models of objects and virtual environments that can be experienced through the web using common low cost personal computers as in the case of web based hypermedia adaptivity can play an important role in increasing the usefulness effectiveness and usability of web sites ie web sites distributing content this paper introduces the reader to the concepts issues and techniques of adaptive web sites
learning to rank arises in many information retrieval applications ranging from web search engine online advertising to recommendation system in learning to rank the performance of ranking model is strongly affected by the number of labeled examples in the training set on the other hand obtaining labeled examples for training data is very expensive and time consuming this presents great need for the active learning approaches to select most informative examples for ranking learning however in the literature there is still very limited work to address active learning for ranking in this paper we propose general active learning framework expected loss optimization elo for ranking the elo framework is applicable to wide range of ranking functions under this framework we derive novel algorithm expected dcg loss optimization elo dcg to select most informative examples furthermore we investigate both query and document level active learning for ranking and propose two stage elo dcg algorithm which incorporate both query and document selection into active learning extensive experiments on real world web search data sets have demonstrated great potential and effectiveness of the proposed framework and algorithms
general constraint solver simplifies the implementation of program analyses because constraint generation can then be separated from constraint solving in return general solver often needs to sacrifice performance for generality we describe strategy that given set of constraints first performs off line optimizations performed before the execution of the solver which enable solver to find potential equivalences between analysis variables so as to reduce the problem space and thus improve performance the idea is that different analyses use different subsets of constraints as result specific property may hold for the subsets and specific optimization can be conducted on the constraints to be concrete we introduce two off line algorithms and apply them on the constraints generated by andersen’s pointer analysis or by reaching definitions analysis respectively the experimental results show that these algorithms dramatically reduce the effort of solving the constraints by detecting and unifying equivalent analysis variables furthermore because these optimizations are conducted on constraints instead of analysis specifications we can reuse them for different analyses and even automatically detect the off line analyses to be used
this paper lays theoretical and software foundations for world wide argument web wwaw large scale web of inter connected arguments posted by individuals to express their opinions in structured manner first we extend the recently proposed argument interchange format aif to express arguments with structure based on walton’s theory of argumentation schemes then we describe an implementation of this ontology using the rdf schema semantic web based ontology language and demonstrate how our ontology enables the representation of networks of arguments on the semantic web finally we present pilot semantic web based system argdf through which users can create arguments using different argumentation schemes and can query arguments using semantic web query language manipulation of existing arguments is also handled in argdf users can attack or support parts of existing arguments or use existing parts of an argument in the creation of new arguments argdf also enables users to create new argumentation schemes as such argdf is an open platform not only for representing arguments but also for building interlinked and dynamic argument networks on the semantic web this initial public domain tool is intended to seed variety of future applications for authoring linking navigating searching and evaluating arguments on the web
finding shortest paths and distances on the surface of a mesh is a well studied problem with most research aiming to minimize computation time however for large meshes such as tin terrain models in gis the major bottleneck is often the memory required by an algorithm in this paper we evaluate techniques for computing path distances both for paths restricted to edges of the mesh and for paths traveling freely across the triangles of the mesh that do not need to store data structure for the entire mesh in memory in particular we implement novel combination of dijkstra and mmp aka continuous dijkstra methods that in our experiments on tins containing millions of triangles reduces the memory requirement by two orders of magnitude we are also able to compare distances computed by dijkstra fast marching and the mmp methods
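the edge-restricted half of the comparison above amounts to running dijkstra on the mesh's edge graph, as in the minimal sketch below; the tiny adjacency dictionary is made up, and the mmp-style propagation of distances across triangle interiors (the harder, memory-hungry part the paper addresses) is not reproduced here.

```python
import heapq

def dijkstra(adj, source):
    # standard binary-heap dijkstra over an adjacency dict {u: [(v, weight), ...]}
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# made-up tetrahedron-like mesh given by its weighted edges
mesh_edges = {0: [(1, 1.0), (2, 1.5)],
              1: [(0, 1.0), (2, 0.7), (3, 2.0)],
              2: [(0, 1.5), (1, 0.7), (3, 1.1)],
              3: [(1, 2.0), (2, 1.1)]}
print(dijkstra(mesh_edges, source=0))
```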
we present general unwinding framework for the definition of information flow security properties of concurrent programs described in simple imperative language enriched with parallelism and atomic statement constructors we study different classes of programs obtained by instantiating the general framework and we prove that they entail the noninterference principle accurate proof techniques for the verification of such properties are defined by exploiting the tarski decidability result for first order formulae over the reals moreover we illustrate how the unwinding framework can be instantiated in order to deal with intentional information release and we extend our verification techniques to the analysis of security properties of programs admitting downgrading
method of analysing join algorithms based upon the time required to access transfer and perform the relevant cpu based operations on disk page is proposed the costs of variations of several of the standard join algorithms including nested block sort merge grace hash and hybrid hash are presented for given total buffer size the cost of these join algorithms depends on the parts of the buffer allocated for each purpose for example when joining two relations using the nested block join algorithm the amount of buffer space allocated for the outer and inner relations can significantly affect the cost of the join analysis of expected and experimental results of various join algorithms show that combination of the optimal nested block and optimal grace hash join algorithms usually provide the greatest cost benefit unless the relation size is small multiple of the memory size algorithms to quickly determine buffer allocation producing the minimal cost for each of these algorithms are presented when the relation size is small multiple of the amount of main memory available typically up to three to six times the hybrid hash join algorithm is preferable
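to show why the buffer split matters, here is a textbook i/o-count model for the nested block join with a search over outer allocations; the page counts and buffer size are made up, and the paper's fuller cost model (which also charges transfer and cpu time and covers the sort merge and hash variants) is not reproduced.

```python
import math

def nested_block_cost(outer_pages, inner_pages, outer_buf):
    # read the outer relation once, and scan the inner once per outer chunk
    return outer_pages + math.ceil(outer_pages / outer_buf) * inner_pages

M, N, B = 1000, 4000, 52            # outer pages, inner pages, total buffer pages
# leave at least one page each for the inner relation and the output
best = min(
    ((nested_block_cost(M, N, k), k) for k in range(1, B - 1)),
    key=lambda t: t[0])
print("best outer allocation:", best[1], "pages, estimated i/o:", best[0])
```

under this simplified model the outer relation should get as many pages as possible; with richer cost terms and other algorithms such as grace or hybrid hash the optimum shifts, which is why the paper searches for the minimal-cost allocation per algorithm.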
we present an approach for adaptive scheduling of soft real time legacy applications for which no timing information is exposed to the system our strategy is based on the combination of two techniques real time monitor that observes the sequence of events generated by the application to infer its activation period feedback mechanism that adapts the scheduling parameters to ensure timely execution of the application by thorough experimental evaluation of an implementation of our approach we show its performance and its efficiency
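a very rough sketch of the two ingredients above is given below: the activation period is inferred from observed event timestamps and a feedback rule inflates the reserved budget when deadlines are missed. the median-based estimate, the multiplicative adjustment, and the example timestamps are assumptions; the real monitor works on the application's event sequence inside the scheduler.

```python
import statistics

def estimate_period(timestamps):
    # infer the activation period from inter-arrival gaps of observed events
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.median(gaps) if gaps else None

def adapt_budget(budget, deadline_misses, period, step=0.1):
    # crude feedback rule: grow the reserved budget while deadlines are missed
    if deadline_misses > 0:
        budget = min(period, budget * (1.0 + step))
    return budget

activations = [0.00, 0.033, 0.067, 0.100, 0.134, 0.166]   # roughly a 30 fps task
period = estimate_period(activations)
budget = adapt_budget(budget=0.010, deadline_misses=2, period=period)
print(f"estimated period {period:.3f}s, new budget {budget:.4f}s")
```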
the self organising map som is finding more and more applications in wide range of fields such as clustering pattern recognition and visualisation it has also been employed in knowledge management and information retrieval we propose an alternative to existing dimensional som based methods for document analysis the method termed adaptive topological tree structure atts generates taxonomy of underlying topics from set of unclassified unstructured documents the atts consists of hierarchy of adaptive self organising chains each of which is validated independently using proposed entropy based bayesian information criterion node meeting the expansion criterion spans child chain with reduced vocabulary and increased specialisation the atts creates topological tree of topics which can be browsed like content hierarchy and reflects the connections between related topics at each level review is also given on the existing neural network based methods for document clustering and organisation experimental results on real world datasets using the proposed atts method are presented and compared with other approaches the results demonstrate the advantages of the proposed validation criteria and the efficiency of the atts approach for document organisation visualisation and search it shows that the proposed methods not only improve the clustering results but also boost the retrieval
virtual organization provides cost efficient method allowing different autonomous entities such as organizations departments and individuals to extend service offerings in virtual marketplace to support cost efficient service provisioning suitable procedure must be applied to determine the amount of resources necessary for the operation of virtual organizations we propose new mathematical model for quantitative performance evaluation of resource management in virtual organizations we present an efficient algorithm to determine the steady state probabilities and the performance measures of the system comparison with detailed simulation model and other numerical approaches shows that the proposed algorithm is fast and accurate this algorithm can therefore be used for resource dimensioning to support the cost efficient operation of virtual organizations
in java software one important flexibility mechanism is dynamic class loading unfortunately the vast majority of static analyses for java treat dynamic class loading either unsoundly or too conservatively we present novel semi static approach for resolving dynamic class loading by combining static string analysis with dynamically gathered information about the execution environment the insight behind the approach is that dynamic class loading often depends on characteristics of the environment that are encoded in various environment variables such variables are not static elements however their run time values typically remain the same across multiple executions of the application thus the string values reported by our technique are tailored to the current installation of the system under analysis additionally we propose extensions of string analysis to increase the number of sites that can be resolved purely statically and to track the names of environment variables an experimental evaluation on the java standard libraries shows that state of the art purely static approach resolves only of non trivial sites while our approach resolves of such sites we also demonstrate how the information gained from resolved dynamic class loading can be used to determine the classes that can potentially be instantiated through the use of reflection our extensions of string analysis greatly increase the number of resolvable reflective instantiation sites this work is step towards making static analysis tools better equipped to handle the dynamic features of java
continuous monitoring of network domain poses several challenges first routers of network domain need to be polled periodically to collect statistics about delay loss and bandwidth second this huge amount of data has to be mined to obtain useful monitoring information this increases the overhead for high speed core routers and restricts the monitoring process from scaling to large number of flows to achieve scalability polling and measurements that involve core routers should be avoided we design and evaluate distributed monitoring scheme that uses only edge to edge measurements and scales well to large network domains in our scheme all edge routers form an overlay network with their neighboring edge routers the network is probed intelligently from nodes in the overlay to detect congestion in both directions of link the proposed scheme involves only edge routers and requires significantly fewer number of probes than existing monitoring schemes through analytic study and series of experiments we show that the proposed scheme can effectively identify the congested links the congested links are used to capture the misbehaving flows that are violating their service level agreements or attacking the domain by injecting excessive traffic
category constrained search in which category is selected to restrict the query to match instances in that category is very popular mechanism provided by information sources it reduces the search space and improves both the response time and the quality of the query results as more and more online sources are available it is challenging to build an integrated search system which provides unified interface and metasearch capability to search and access all sources from different websites in one query submission one of the fundamental problems with building such an integrated system is category mapping which maps the selected category in the unified interface to categories provided by the information sources in this paper we present an efficient algorithm for automatic category mapping our experiment shows that our approach is very convincing and can be used to implement automatic category mapping for the integration of category constrained search
the term model driven engineering mde is typically used to describe software development approaches in which abstract models of software systems are created and systematically transformed to concrete implementations in this paper we give an overview of current research in mde and discuss some of the major challenges that must be tackled in order to realize the mde vision of software development we argue that full realizations of the mde vision may not be possible in the near to medium term primarily because of the wicked problems involved on the other hand attempting to realize the vision will provide insights that can be used to significantly reduce the gap between evolving software complexity and the technologies used to manage complexity
client interactions with web accessible network services are organized into sessions involving requests that read and write shared application data when executed concurrently web sessions may invalidate each other’s data allowing the session with invalid data to progress might lead to financial penalties for the service provider while blocking the session’s progress will result in user dissatisfaction compromise would be to tolerate some bounded data inconsistency which would allow most of the sessions to progress while limiting the potential financial loss incurred by the service this paper develops analytical models of concurrent web sessions with bounded inconsistency in shared data which enable quantitative reasoning about these tradeoffs we illustrate our models using the sample buyer scenario from the tpc benchmark and validate them by showing their close correspondence to measured results in both simulated and real web server environments we augment our web server with profiling and automated decision making infrastructure which is shown to successfully choose the best concurrency control algorithm in real time in response to changing service usage patterns
in this paper we unify two supposedly distinct tasks in multimedia retrieval one task involves answering queries with few examples the other involves learning models for semantic concepts also with few examples in our view these two tasks are identical with the only differentiation being the number of examples that are available for training once we adopt this unified view we then apply identical techniques for solving both problems and evaluate the performance using the nist trecvid benchmark evaluation data we propose combination hypothesis of two complementary classes of techniques nearest neighbor model using only positive examples and discriminative support vector machine model using both positive and negative examples in case of queries where negative examples are rarely provided to seed the search we create pseudo negative samples we then combine the ranked lists generated by evaluating the test database using both methods to create final ranked list of retrieved multimedia items we evaluate this approach for rare concept and query topic modeling using the nist trecvid video corpus in both tasks we find that applying the combination hypothesis across both modeling techniques and variety of features results in enhanced performance over any of the baseline models as well as in improved robustness with respect to training examples and visual features in particular we observe an improvement of for rare concept detection and for the search task
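the final step described above, combining the two ranked lists, can be sketched as simple score-level fusion; the clip names, the scores, and the equal-weight averaging rule are all fabricated for illustration and are not the exact combination used in the paper.

```python
# made-up scores from the two models over the same test items
nn_scores  = {"clip1": 0.91, "clip2": 0.40, "clip3": 0.75, "clip4": 0.10}
svm_scores = {"clip1": 0.62, "clip2": 0.55, "clip3": 0.80, "clip4": 0.05}

w = 0.5   # equal weight to the nearest-neighbour and discriminative models
fused = {c: w * nn_scores[c] + (1 - w) * svm_scores[c] for c in nn_scores}

ranked = sorted(fused, key=fused.get, reverse=True)
print("final ranked list:", ranked)
```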
wireless sensor and actor networks wsans are made up of large number of sensing devices which are resource impoverished nodes and powerful actuation devices both are equipped with computation and communication capabilities these devices cooperate to manage sensing and perform acting tasks numerous work conducted in the field of wsans assumes the existence of addresses and routing infrastructure to validate their proposals however assigning addresses and delivering detected events in these networks remains highly challenging specifically due to the sheer number of nodes to address these issues this paper proposes subcast novel distributed address assignment and routing scheme based on topic clustering system and fractal theory iterated function systems in order to minimize data delivery costs among actors the proposed architecture first builds an actor overlay network before allocating addresses to network nodes location information in the allocated addresses allows establishing data delivery paths simulation results confirm that the proposed system efficiently guarantees the allocation of unique addresses and performs efficient data delivery while reducing communication costs delays as well as the impact of imprecise locations
variety of systems with possibly embedded computing power such as small portable robots hand held computers and automated vehicles have power supply constraints their batteries generally last only for few hours before being replaced or recharged it is important that all design efforts are made to conserve power in those systems energy consumption in system can be reduced using number of techniques such as low power electronics architecture level power reduction compiler techniques to name just few however energy conservation at the application software level has not yet been explored in this paper we show the impact of various software implementation techniques on energy saving based on the observation that different instructions of processor cost different amount of energy we propose three energy saving strategies namely i assigning live variables to registers ii avoiding repetitive address computations and iii minimizing memory accesses we also study how variety of algorithm design and implementation techniques affect energy consumption in particular we focus on the following aspects i recursive versus iterative with stacks and without stacks ii different representations of the same algorithm iii different algorithms with identical asymptotic complexity for the same problem and iv different input representations we demonstrate the energy saving capabilities of these approaches by studying variety of applications related to power conscious systems such as sorting pattern matching matrix operations depth first search and dynamic programming from our experimental results we conclude that by suitably choosing an algorithm for problem and applying the energy saving techniques energy savings in excess of can be achieved
digital rights management drm can be considered to be mechanism to enforce access control over resource without considering its location there are currently no formal models for drm although there has been some work in analysing and formalising the interpretation of access control rules in drm systems formal model for drm is essential to provide specific access control semantics that are necessary for creating interoperable unambiguous implementations in this paper we discuss how drm differs as an access control model to the three well known traditional access control models dac mac and rbac and using these existing approaches motivate set of requirements for formal model for drm thereafter we present formal description of lirel rights expression language that is able to express access control policies and contractual agreement in single use license our motivation with this approach is to identify the different components in license contract and define how these components interact within themselves and with other components of the license formal notation allows for an uniform and unambiguous interpretation and implementation of the access control policies
cps transformation is an important tool in the compilation of functional programming languages for strict languages such as our web programming language rinso or microsoft’s monadic expressions can help with structuring and composing computations to apply cps transformation in the compilation process of such language we integrate explicit monadic abstraction in call by value source language present danvy filinski style cps transformation for this extension and verify that the translation preserves simple typing we establish the simulation properties of this transformation in an untyped setting and relate it to two stage transformation that implements the monadic abstraction with thunks and introduces continuations in second step furthermore we give direct style translation which corresponds to the monadic translation
we propose new algorithm for building decision tree classifiers the algorithm is executed in distributed environment and is especially designed for classifying large data sets and streaming data it is empirically shown to be as accurate as standard decision tree classifier while being scalable for processing of streaming data on multiple processors these findings are supported by rigorous analysis of the algorithm’s accuracy the essence of the algorithm is to quickly construct histograms at the processors which compress the data to fixed amount of memory master processor uses this information to find near optimal split points to terminal tree nodes our analysis shows that guarantees on the local accuracy of split points imply guarantees on the overall tree accuracy
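a simplified stand-in for the histogram idea above: each worker compresses its share of one numeric feature into fixed equal-width bins per class, a master merges the bins, and candidate thresholds are scanned for the best split. the equal-width bins, the gini criterion, and the synthetic data are assumptions; the published algorithm uses mergeable variable-width histograms and its own accuracy analysis.

```python
import numpy as np

def worker_histogram(values, labels, edges, n_classes=2):
    # per-class counts over fixed bin edges: the worker's compressed summary
    hist = np.zeros((n_classes, len(edges) - 1))
    for c in range(n_classes):
        hist[c], _ = np.histogram(values[labels == c], bins=edges)
    return hist

def gini(counts):
    total = counts.sum()
    return 1.0 - ((counts / total) ** 2).sum() if total else 0.0

def best_split(merged, edges):
    # scan bin boundaries as candidate thresholds, minimizing weighted gini
    best = (None, 1.0)
    for i in range(1, merged.shape[1]):
        left, right = merged[:, :i].sum(axis=1), merged[:, i:].sum(axis=1)
        n = left.sum() + right.sum()
        score = (left.sum() / n) * gini(left) + (right.sum() / n) * gini(right)
        if score < best[1]:
            best = (edges[i], score)
    return best

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
y = np.concatenate([np.zeros(500, int), np.ones(500, int)])
edges = np.linspace(x.min(), x.max(), 33)     # 32 fixed-memory bins

# four "workers" each summarize a slice of the stream; the master merges them
merged = sum(worker_histogram(x[i::4], y[i::4], edges) for i in range(4))
print("chosen threshold, weighted gini:", best_split(merged, edges))
```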
alert correlation is process that analyzes the alerts produced by one or more intrusion detection systems and provides more succinct and high level view of occurring or attempted intrusions even though the correlation process is often presented as single step the analysis is actually carried out by number of components each of which has specific goal unfortunately most approaches to correlation concentrate on just few components of the process providing formalisms and techniques that address only specific correlation issues this paper presents general correlation model that includes comprehensive set of components and framework based on this model tool using the framework has been applied to number of well known intrusion detection data sets to identify how each component contributes to the overall goals of correlation the results of these experiments show that the correlation components are effective in achieving alert reduction and abstraction they also show that the effectiveness of component depends heavily on the nature of the data set analyzed
we propose new algorithm for fusion transformation that allows both stacks and accumulating parameters the new algorithm can fuse programs that cannot be handled by existing fusion techniques eg xml transformations the algorithm is formulated in modular type directed style where the transformation process is comprised of several transformation steps that change types but preserve the observational behavior of programs we identify class of functions to which our new fusion method successfully applies and show that closure property holds for that class
on line transducers are an important class of computational agent we construct and compose together many software systems using them such as stream processors layered network protocols dsp networks and graphics pipelines we show an interesting use of continuations that when taken in cps setting exposes the control flow of these systems this enables cps based compiler to optimise systems composed of these transducers using only standard known analyses and optimisations critically the analysis permits optimisation across the composition of these transducers allowing efficient construction of systems in hierarchical way
current discrete event simulator requires heavy simulation overhead to switch between different components to simulate them in strictly chronological order therefore timed simulation is significantly slower than un timed simulation by simply adding delays in the components and communication channels our timed mpeg decoder simulates more than times slower than an un timed simulation in this paper we propose partial order method to speed up timed simulation by relaxing the order that the components are simulated with partial order method component is not required to schedule channel access if both behavioral and timing results of the access are known the simulation switches less frequently hence the simulation overhead reduces we show that partial order method can be used in complex system level simulation such as mpsoc implementations of the mpeg decoder in our experiments partial order method provides more than times speedups over regular discrete event simulation for timed simulation
the overlap of computation and communication has long been considered to be significant performance benefit for applications similarly the ability of the message passing interface mpi to make independent progress that is to make progress on outstanding communication operations while not in the mpi library is also believed to yield performance benefits using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution but the benefits of this approach have not been studied in depth at the application level this lack of analysis is complicated by the fact that most mpi implementations do not sufficiently support overlap or independent progress recent work has demonstrated quantifiable advantage for an mpi implementation that uses offload to provide overlap and independent progress the study is conducted on two different platforms with each having two mpi implementations one with and one without independent progress thus identical network hardware and virtually identical software stacks are used furthermore one platform asci red allows further separation of features such as overlap and offload thus this paper extends previous work by further qualifying the source of the performance advantage offload overlap or independent progress
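the overlap pattern discussed above is the familiar post-early, compute, wait-late structure sketched below with mpi4py; the buffer size, the unrelated work loop, and the script name in the run command are illustrative, and whether the transfer actually progresses during the computation depends on the mpi implementation's independent progress, which is exactly the effect the paper measures.

```python
# run with something like: mpiexec -n 2 python overlap.py
from mpi4py import MPI
import numpy as np

comm, rank = MPI.COMM_WORLD, MPI.COMM_WORLD.Get_rank()
buf = np.zeros(1 << 20)

if rank == 0:
    buf[:] = 1.0
    req = comm.Isend(buf, dest=1, tag=7)     # post the send early
else:
    req = comm.Irecv(buf, source=0, tag=7)   # post the receive early

acc = sum(i * i for i in range(200000))      # unrelated work while data may move
req.Wait()                                   # complete the communication
print(f"rank {rank}: local work {acc}, first element {buf[0]}")
```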
this paper presents sharp framework for secure distributed resource management in an internet scale computing infrastructure the cornerstone of sharp is construct to represent cryptographically protected resource claims promises or rights to control resources for designated time intervals together with secure mechanisms to subdivide and delegate claims across network of resource managers these mechanisms enable flexible resource peering sites may trade their resources with peering partners or contribute them to federation according to local policies separation of claims into tickets and leases allows coordinated resource management across the system while preserving site autonomy and local control over resources sharp also introduces mechanisms for controlled accountable oversubscription of resource claims as fundamental tool for dependable efficient resource management we present experimental results from sharp prototype for planetlab and illustrate its use with decentralized barter economy for global planetlab resources the results demonstrate the power and practicality of the architecture and the effectiveness of oversubscription for protecting resource availability in the presence of failures
research interest in grid computing has grown significantly over the past five years management of distributed resources is one of the key issues in grid computing central to management of resources is the effectiveness of resource allocation as it determines the overall utility of the system the current approaches to brokering in grid environment are non coordinated since application level schedulers or brokers make scheduling decisions independently of the others in the system clearly this can exacerbate the load sharing and utilization problems of distributed resources due to sub optimal schedules that are likely to occur to overcome these limitations we propose mechanism for coordinated sharing of distributed clusters based on computational economy the resulting environment called grid federation allows the transparent use of resources from the federation when local resources are insufficient to meet its users requirements the use of computational economy methodology in coordinating resource allocation not only facilitates the quality of service qos based scheduling but also enhances utility delivered by resources we show by simulation while some users that are local to popular resources can experience higher cost and or longer delays the overall users qos demands across the federation are better met also the federation’s average case message passing complexity is seen to be scalable though some jobs in the system may lead to large numbers of messages before being scheduled
effective knowledge management in knowledge intensive environment can place heavy demands on the information filtering if strategies used to model workers long term task needs because of the growing complexity of knowledge intensive work tasks profiling technique is needed to deliver task relevant documents to workers in this study we propose an if technique with task stage identification that provides effective codification based support throughout the execution of task task needs pattern similarity analysis based on correlation value is used to identify worker’s task stage the pre focus focus formulation or post focus task stage the identified task stage is then incorporated into profile adaptation process to generate the worker’s current task profile the results of pilot study conducted in research institute confirm that there is low or negative correlation between search sessions and transactions in the pre focus task stage whereas there is at least moderate correlation between search sessions transactions in the post focus stage compared with the traditional if technique the proposed if technique with task stage identification achieves on average improvement in task relevant document support the results confirm the effectiveness of the proposed method for knowledge intensive work tasks
although sensor planning in computer vision has been subject of research for over two decades vast majority of the research seems to concentrate on two particular applications in rather limited context of laboratory and industrial workbenches namely object reconstruction and robotic arm manipulation recently increasing interest is engaged in research to come up with solutions that provide wide area autonomous surveillance systems for object characterization and situation awareness which involves portable wireless and or internet connected radar digital video and or infrared sensors the prominent research problems associated with multisensor integration for wide area surveillance are modality selection sensor planning data fusion and data exchange communication among multiple sensors thus the requirements and constraints to be addressed include far field view wide coverage high resolution cooperative sensors adaptive sensing modalities dynamic objects and uncontrolled environments this article summarizes new survey and analysis conducted in light of these challenging requirements and constraints it involves techniques and strategies from work done in the areas of sensor fusion sensor networks smart sensing geographic information systems gis photogrammetry and other intelligent systems where finding optimal solutions to the placement and deployment of multimodal sensors covering wide area is important while techniques covered in this survey are applicable to many wide area environments such as traffic monitoring airport terminal surveillance parking lot surveillance etc our examples will be drawn mainly from such applications as harbor security and long range face recognition
queries on xml documents typically combine selections on element contents and via path expressions the structural relationships between tagged elements structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query namely parent child and ancestor descendant relationships efficient support for structural joins is thus the key to efficient implementations of xml queries recently proposed node numbering schemes enable the capturing of the xml document structure using traditional indices such as trees or trees this paper proposes efficient structural join algorithms in the presence of tag indices we first concentrate on using trees and show how to expedite structural join by avoiding collections of elements that do not participate in the join we then introduce an enhancement based on sibling pointers that further improves performance such sibling pointers are easily implemented and dynamically maintainable we also present structural join algorithm that utilizes trees an extensive experimental comparison shows that the tree structural joins are more robust furthermore they provide drastic improvement gains over the current state of the art
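As background for the abstract above, the sketch below illustrates one way an ancestor-descendant structural join can be evaluated over region-encoded elements, assuming each element carries a (start, end) interval from a document-order numbering scheme so that containment of intervals means structural containment. It is a generic stack-based merge written for illustration, not the index-assisted algorithms proposed in the paper.

```python
# Illustrative sketch (not the paper's exact algorithm): a stack-based
# ancestor-descendant structural join over region-encoded elements.
# Each element is assumed to carry a (start, end) interval from a
# document-order numbering scheme, so that a is an ancestor of d iff
# a.start < d.start and d.end < a.end.

def structural_join(ancestors, descendants):
    """Both inputs are lists of (start, end) tuples sorted by start."""
    result, stack = [], []
    ai = di = 0
    while ai < len(ancestors) and di < len(descendants):
        a, d = ancestors[ai], descendants[di]
        # Pop ancestors whose interval ends before the next node begins.
        while stack and stack[-1][1] < min(a[0], d[0]):
            stack.pop()
        if a[0] < d[0]:
            stack.append(a)      # a may enclose upcoming descendants
            ai += 1
        else:
            for anc in stack:    # every stacked ancestor encloses d
                result.append((anc, d))
            di += 1
    # Drain remaining descendants against the still-open ancestors.
    while di < len(descendants):
        d = descendants[di]
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        for anc in stack:
            result.append((anc, d))
        di += 1
    return result

# Example: <a 1,10> contains <d 2,3> and <d 5,6>; <a 12,14> contains <d 13,13>.
print(structural_join([(1, 10), (12, 14)], [(2, 3), (5, 6), (13, 13)]))
```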
shared memory programs running on non uniform memory access numa machines usually face inherent performance problems stemming from excessive remote memory accesses solution called the adaptive runtime system ars is presented in this paper ars is designed to adjust the data distribution at runtime through automatic page migrations it uses memory access histograms gathered by hardware monitors to find access hot spots and based on this detection to dynamically and transparently modify the data layout in this way incorrectly allocated data can be moved to the most appropriate node and hence data locality can be improved simulations show that this allows to achieve performance gain of as high as
in typical application of association rule learning from market basket data set of transactions for fixed period of time is used as input to rule learning algorithms for example the well known apriori algorithm can be applied to learn set of association rules from such transaction set however learning association rules from set of transactions is not one time only process for example market manager may perform the association rule learning process once every month over the set of transactions collected through the last month for this reason we will consider the problem where transaction sets are input to the system as stream of packages the sets of transactions may come in varying sizes and in varying periods once set of transactions arrive the association rule learning algorithm is executed on the last set of transactions resulting in new association rules therefore the set of association rules learned will accumulate and increase in number over time making the mining of interesting ones out of this enlarging set of association rules impractical for human experts we refer to this sequence of rules as association rule set stream or streaming association rules and the main motivation behind this research is to develop technique to overcome the interesting rule selection problem successful association rule mining system should select and present only the interesting rules to the domain experts however definition of interestingness of association rules on given domain usually differs from one expert to another and also over time for given expert this paper proposes post processing method to learn subjective model for the interestingness concept description of the streaming association rules the uniqueness of the proposed method is its ability to formulate the interestingness issue of association rules as benefit maximizing classification problem and obtain different interestingness model for each user in this new classification scheme the determining features are the selective objective interestingness factors related to the interestingness of the association rules and the target feature is the interestingness label of those rules the proposed method works incrementally and employs user interactivity at certain level it is evaluated on real market dataset the results show that the model can successfully select the interesting ones
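For context on the rule-learning step this abstract builds on, here is a minimal Apriori sketch in Python, assuming market-basket transactions represented as sets of item names. It mines frequent itemsets, from which association rules would subsequently be derived; it is background only and unrelated to the paper's interestingness post-processing method.

```python
# Minimal Apriori sketch (illustrative background, not the paper's method):
# mine frequent itemsets from one incoming package of transactions.
from itertools import combinations
from collections import defaultdict

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: s / n for c, s in counts.items() if s / n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    level, all_frequent = frequent(items), {}
    k = 2
    while level:
        all_frequent.update(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        level = frequent(candidates)
        k += 1
    return all_frequent

print(apriori([{"milk", "bread"}, {"milk", "eggs"}, {"milk", "bread", "eggs"}], 0.6))
```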
emerging software development environments are characterized by heterogeneity they are composed of diverse object stores user interfaces and tools this paper presents an approach for providing hypermedia services in this heterogeneous setting central notions of the approach include the following anchors are established with respect to interactive views of objects rather than the objects themselves composable n ary links can be established between anchors on different views of objects which may be stored in distinct object bases viewers may be implemented in different programming languages and hypermedia services are provided to multiple concurrently active viewers the paper describes the approach supporting architecture and lessons learned related work in the areas of supporting heterogeneity and hypermedia data modeling is discussed the system has been employed in variety of contexts including research development and education
in this paper we suggest variational model for optic flow computation based on non linearised and higher order constancy assumptions besides the common grey value constancy assumption also gradient constancy as well as the constancy of the hessian and the laplacian are proposed since the model strictly refrains from linearisation of these assumptions it is also capable to deal with large displacements for the minimisation of the rather complex energy functional we present an efficient numerical scheme employing two nested fixed point iterations following coarse to fine strategy it turns out that there is theoretical foundation of so called warping techniques hitherto justified only on an experimental basis since our algorithm consists of the integration of various concepts ranging from different constancy assumptions to numerical implementation issues detailed account of the effect of each of these concepts is included in the experimental section the superior performance of the proposed method shows up by significantly smaller estimation errors when compared to previous techniques further experiments also confirm excellent robustness under noise and insensitivity to parameter variations
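As a hedged illustration, an energy functional of the kind described (non-linearised grey value and gradient constancy plus a smoothness term) is often written as below; the paper's exact functional, including the Hessian and Laplacian constancy terms and the weighting, may be arranged differently.

```latex
% A representative warping-type energy with grey value and gradient constancy
% (the Hessian and Laplacian constancy terms of the paper can be added analogously):
E(u,v) = \int_{\Omega} \Psi\left( |I(\mathbf{x}+\mathbf{w}) - I(\mathbf{x})|^{2}
       + \gamma\, |\nabla I(\mathbf{x}+\mathbf{w}) - \nabla I(\mathbf{x})|^{2} \right) \mathrm{d}\mathbf{x}
       + \alpha \int_{\Omega} \Psi\left( |\nabla u|^{2} + |\nabla v|^{2} \right) \mathrm{d}\mathbf{x},
\qquad \Psi(s^{2}) = \sqrt{s^{2} + \varepsilon^{2}}, \quad \mathbf{w} = (u,v)^{\top}.
```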
nowadays cisco netflow is the de facto standard tool used by network operators and administrators for monitoring large edge and core networks implemented by all major vendors and recently ietf standard netflow reports aggregated information about traffic traversing the routers in the form of flow records while this kind of data is already effectively used for accounting monitoring and anomaly detection the limited amount of information it conveys has until now hindered its employment for traffic classification purposes in this paper we present behavioral algorithm which successfully exploits netflow records for traffic classification since our classifier identifies an application by means of the simple counts of received packets and bytes netflow records contain all information required we test our classification engine based on machine learning algorithm over an extended set of traces containing heterogeneous mix of applications ranging from p2p file sharing and p2p live streaming to traditional client server services results show that our methodology correctly identifies the byte wise traffic volume with an accuracy of in the worst case thus representing first step towards the use of netflow data for fine grained classification of network traffic
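A minimal sketch of the general idea of training a traffic classifier from per-flow counts of the kind NetFlow exports. The feature set, the labels and the random-forest learner below are illustrative assumptions, not the authors' behavioral algorithm.

```python
# Illustrative sketch: training a classifier on per-flow packet and byte
# counts of the kind exported in NetFlow records. The features, labels and
# learner are assumptions for illustration, not the paper's algorithm.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical flow records: (packets, bytes, duration_seconds) -> application label
X = [
    [12, 9000, 3.1], [10, 8400, 2.9],           # e.g. web-like flows
    [400, 310000, 60.2], [380, 295000, 58.7],   # e.g. p2p-like flows
    [50, 4000, 120.0], [55, 4300, 115.0],       # e.g. chat-like flows
]
y = ["web", "web", "p2p", "p2p", "chat", "chat"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print(clf.predict(X_test))
```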
we present an algorithm to efficiently and robustly process collisions contact and friction in cloth simulation it works with any technique for simulating the internal dynamics of the cloth and allows true modeling of cloth thickness we also show how our simulation data can be post processed with collision aware subdivision scheme to produce smooth and interference free data for rendering
critical reality in integration is that knowledge obtained from different sources may often be conflicting conflict resolution whether performed during the design phase or during run time can be costly and if done without proper understanding of the usage context can be ineffective in this paper we propose novel exploration and feedback based approach ficsr pronounced as fixer to conflict resolution when integrating metadata from different sources rather than relying on purely automated conflict resolution mechanisms ficsr brings the domain expert in the conflict resolution process and informs the integration based on the expert’s feedback in particular instead of relying on traditional model based definition of consistency which whenever there are conflicts picks possible world among many we introduce ranked interpretation of the metadata and statements about the metadata this not only enables ficsr to avoid committing to an interpretation too early but also helps in achieving more direct correspondence between the expert’s subjective interpretation of the data and the system’s objective treatment of the available alternatives consequently the ranked interpretation leads to new opportunities for exploratory feedback for conflict resolution within the context of given statement of interest preliminary ranking of candidate matches representing different resolutions of the conflicts informs the user about the alternative interpretations of the metadata while user feedback regarding the preferences among alternatives is exploited to inform the system about the expert’s relevant domain knowledge the expert’s feedback then is used for resolving not only the conflicts among different sources but also possible mis alignments due to the initial matching phase to enable this mutually informing system user feedback process we develop data structures and algorithms for efficient off line conflict agreement analysis of the integrated metadata we also develop algorithms for efficient on line query processing candidate result enumeration validity analysis and system feedback the results are brought together and evaluated in the feedback based inconsistency resolution ficsr system
this paper examines heuristic algorithms for processing distributed queries using generalized joins as this optimization problem is np hard heuristic algorithms are deemed to be justified heuristic algorithm to formulate strategies to process queries is presented it has special property in that its overhead can be controlled the higher its overhead the better the strategies it produces modeling on test bed of queries is used to demonstrate that there is trade off between the strategy’s execution and formulation delays the modeling results also support the notion that simple greedy heuristic algorithms such as are proposed by many researchers are sufficient in that they are likely to lead to near optimal strategies and that increasing the overhead in forming strategies is only marginally beneficial both the strategy formulation and execution delays are examined in relation to the number of operations specified by the strategy and the total size of partial results
scalable storage architectures allow for the addition or removal of storage devices to increase storage capacity and bandwidth or retire older devices assuming random placement of data objects across multiple storage devices of storage pool our optimization objective is to redistribute minimum number of objects after scaling the pool in addition uniform distribution and hence balanced load should be ensured after redistribution moreover the redistributed objects should be retrieved efficiently during the normal mode of operation in one access and with low complexity computation to achieve this we propose an algorithm called random disk labeling rdl based on double hashing where storage can be added or removed without any increase in complexity we compare rdl with other proposed techniques and demonstrate its effectiveness through experimentation
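A sketch in the spirit of placement by double hashing over a labeled slot space. The hash functions, the slot-space size P and the relabeling policy below are assumptions for illustration, not the exact RDL construction described in the abstract.

```python
# Illustrative sketch: each of P slots is either empty or labeled with a disk;
# an object probes slots with a double-hashing sequence and is stored on the
# first labeled slot it hits, so adding or removing a disk only relabels slots
# and moves the objects whose probe sequences pass through them.
import hashlib

P = 97  # size of the label space (a prime, larger than the number of disks)

def _h(key, salt):
    return int(hashlib.sha1(f"{salt}:{key}".encode()).hexdigest(), 16)

def probe_sequence(obj_id):
    start = _h(obj_id, "start") % P
    step = 1 + _h(obj_id, "step") % (P - 1)   # step in [1, P-1], coprime with prime P
    return ((start + i * step) % P for i in range(P))

def place(obj_id, slot_to_disk):
    """slot_to_disk maps a subset of the P slots to disk ids."""
    for slot in probe_sequence(obj_id):
        if slot in slot_to_disk:
            return slot_to_disk[slot]
    raise RuntimeError("no disks labeled")

slots = {3: "disk-A", 40: "disk-B", 71: "disk-C"}
print([place(f"obj{i}", slots) for i in range(5)])
```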
this article analyses the first years of research published in the information systems frontiers isf from to the analysis of the published material includes examining variables such as most productive authors citation analysis universities associated with the most publications geographic diversity authors backgrounds and research methods the keyword analysis suggests that isf research has evolved from establishing concepts and domain of information systems is technology and management to contemporary issues such as outsourcing web services and security the analysis presented in this paper has identified intellectually significant studies that have contributed to the development and accumulation of intellectual wealth of isf the analysis has also identified authors published in other journals whose work largely shaped and guided the researchers published in isf this research has implications for researchers journal editors and research institutions
pairwise key establishment in mobile ad hoc networks allows any pair of nodes to agree upon shared key this is an important security service needed to secure routing protocols and in general to facilitate secure communication among the nodes of the network we present two self keying mechanisms for pairwise key establishment in mobile ad hoc networks which do not require any centralized support the mechanisms are built using the well known technique of threshold secret sharing and are robust and secure against collusion of up to certain number of nodes we evaluate and compare the performance of both the mechanisms in terms of the node admission and pairwise key establishment
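For background, a minimal (k, n) Shamir threshold secret sharing sketch over a prime field, since threshold secret sharing is the building block the abstract names; the actual self-keying mechanisms involve considerably more than this, and the prime and parameter choices below are toy assumptions.

```python
# Minimal (k, n) Shamir threshold secret sharing sketch over a prime field.
import random

PRIME = 2**127 - 1  # a Mersenne prime large enough for toy secrets

def make_shares(secret, k, n):
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation of the polynomial
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=5)
print(reconstruct(shares[:3]) == 123456789)   # any 3 of the 5 shares suffice
```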
existing source code based program comprehension approaches analyze either the code itself or the comments identifiers but not both in this research we combine code understanding with comment and identifier understanding this synergistic approach allows much deeper understanding of source code than is possible using either code or comments identifiers alone our approach also allows comparing comments to their associated code to see whether they match or belong to the code our combined approach implements both our heuristic code understanding and the comment identifier understanding within the same knowledge base inferencing engine this inferencing engine is the same used by an earlier well tested mature comment identifier based program understanding approach
to serve asynchronous requests using multicast two categories of techniques stream merging and periodic broadcasting have been proposed for sequential streaming access where requests are uninterrupted from the beginning to the end of an object these techniques are highly scalable the required server bandwidth for stream merging grows logarithmically as request arrival rate and the required server bandwidth for periodic broadcasting varies logarithmically as the inverse of start up delay sequential access model however is inappropriate to model partial requests and client interactivity observed in various streaming access workloads this paper analytically and experimentally studies the scalability of multicast delivery under non sequential access model where requests start at random points in the object we show that the required server bandwidth for any protocol providing immediate service grows at least as the square root of request arrival rate and the required server bandwidth for any protocol providing delayed service grows linearly with the inverse of start up delay we also investigate the impact of limited client receiving bandwidth on scalability we optimize practical protocols which provide immediate service to non sequential requests the protocols utilize limited client receiving bandwidth and they are near optimal in that the required server bandwidth is very close to its lower bound
this paper develops static type system equivalent to static single assignment ssa form in this type system type of variable at some program point represents the control flows from the assignment statements that reach the program point for this type system we show that derivable typing of program corresponds to the program in ssa form by this result any ssa transformation can be interpreted as type inference process in our type system by adopting result on efficient ssa transformation we develop type inference algorithm that reconstructs type annotated code from given code these results provide static alternative to ssa based compiler optimization without performing code transformation since this process does not change the code it does not incur overhead due to insertion of phi functions another advantage of this type based approach is that it is not constrained to naming mechanism of variables and can therefore be combined with other static properties useful for compilation and code optimization such as liveness information of variables as an application we express optimizations as type directed code transformations
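A toy sketch of the underlying observation that SSA versions behave like per-use annotations recording the reaching definition. It handles straight-line code only (so no phi functions) and is emphatically not the paper's type system or inference algorithm.

```python
# Toy illustration: each use of a variable is annotated with the assignment
# (version) that reaches it, the information an SSA-equivalent type would carry.
# Straight-line code only; joins and phi functions are out of scope here.

def to_ssa(stmts):
    """stmts: list of (target, [operands]); returns renamed statements."""
    version = {}          # variable -> current SSA version number
    renamed = []
    for target, operands in stmts:
        # A use is "typed" by the version (reaching definition) of its operand.
        ops = [f"{v}_{version[v]}" if v in version else v for v in operands]
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", ops))
    return renamed

prog = [("x", ["1"]), ("y", ["x", "2"]), ("x", ["x", "y"]), ("z", ["x"])]
for t, ops in to_ssa(prog):
    print(t, "<-", ops)
# x_1 <- ['1']; y_1 <- ['x_1', '2']; x_2 <- ['x_1', 'y_1']; z_1 <- ['x_2']
```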
these notes are an introduction to ql an object oriented query language for any type of structured data we illustrate the use of ql in assessing software quality namely to find bugs to compute metrics and to enforce coding conventions the class mechanism of ql is discussed in depth and we demonstrate how it can be used to build libraries of reusable queries
this paper presents model called scene driver for the re use of film and television material we begin by exploring general issues surrounding the ways in which content can be sub divided into meaningful units for re use and how criteria might then be applied to the selection and ordering of these units we also identify and discuss the different means by which user might interact with the content to create novel and engaging experiences the scene driver model has been instantiated using content from an animated children’s television series called tiny planets which is aimed at children of year old this type of material being story based itself lends itself particularly well to the application of narrative constraints to scene reordering to provide coherence to the experience of interacting with the content we propose an interactive narrative driven game architecture in which user generates novel narratives from existing content by placing domino like tiles these tiles act as glue between scenes and each tile choice dictates certain properties of the next scene to be shown within game there are three different game types based on three different ways in which tiles can be matched to scenes we introduce algorithms for generating legal tile sets for each of these three game types which can be extended to include narrative constraints this ensures that all novel orderings adhere to minimum narrative plan which has been identified based on analysis of the tiny planets series and on narrative theories we also suggest ways in which basic narratives can be enhanced by the inclusion of directorial techniques and by the use of more complex plot structures in our evaluation studies with children in the target age range our game compared favourably with other games that the children enjoyed playing
in this work we focus on managing scientific environmental data which are measurement readings collected from wireless sensors in environmental science applications raw sensor data often need to be validated interpolated aligned and aggregated before being used to construct meaningful result sets due to the lack of system that integrates all the necessary processing steps scientists often resort to multiple tools to manage and process the data which can severely affect the efficiency of their work in this paper we propose new data processing framework hypergrid to address the problem hypergrid adopts generic data model and generic query processing and optimization framework it offers an integrated environment to store query analyze and visualize scientific datasets the experiments on real query set and data set show that the framework not only introduces little processing overhead but also provides abundant opportunities to optimize the processing cost and thus significantly enhances the processing efficiency
although overlap between specifications that is the incorporation of elements which designate common aspects of the system of concern is precondition for specification inconsistency it has only been side concern in requirements engineering research this paper is concerned with overlaps it defines overlap relations in terms of specification interpretations identifies properties of these relations which are derived from the proposed definition shows how overlaps may affect the detection of inconsistency shows how specifications could be rewritten to reflect overlap relations and still be amenable to consistency checking using theorem proving analyses various methods that have been proposed for identifying overlaps with respect to the proposed definition and outlines directions for future research
clustering on multi type relational data has attracted more and more attention in recent years due to its high impact on various important applications such as web mining commerce and bioinformatics however the research on general multi type relational data clustering is still limited and preliminary the contribution of the paper is three fold first we propose general model the collective factorization on related matrices for multi type relational data clustering the model is applicable to relational data with various structures second under this model we derive novel algorithm the spectral relational clustering to cluster multi type interrelated data objects simultaneously the algorithm iteratively embeds each type of data objects into low dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects extensive experiments demonstrate the promise and effectiveness of the proposed algorithm third we show that the existing spectral clustering algorithms can be considered as the special cases of the proposed model and algorithm this demonstrates the good theoretic generality of the proposed model and algorithm
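A simplified two-type illustration of the embed-then-cluster idea, assuming a single relation matrix between two object types. The paper's spectral relational clustering handles many interrelated types simultaneously and iterates between them; this sketch only conveys the flavour, not the proposed algorithm.

```python
# Simplified illustration: embed the two object types related by one matrix
# via a truncated SVD, then cluster each embedding with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical 8 x 6 relation matrix (e.g. documents x terms) with two blocks.
R = np.zeros((8, 6))
R[:4, :3] = rng.random((4, 3)) + 1.0
R[4:, 3:] = rng.random((4, 3)) + 1.0

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
row_embedding = U[:, :k] * s[:k]        # low-dimensional embedding of row objects
col_embedding = (Vt[:k, :].T) * s[:k]   # low-dimensional embedding of column objects

row_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(row_embedding)
col_clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(col_embedding)
print(row_clusters, col_clusters)
```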
it is not uncommon in parallel workloads to encounter shared data structures with read mostly access patterns where operations that update data are infrequent and most operations are read only typically data consistency is guaranteed using mutual exclusion or read write locks the cost of atomic update of lock variables results in high overheads and high cache coherence traffic under active sharing thus slowing down single thread performance and limiting scalability in this paper we present solero software optimistic lock elision for read only critical sections new lock implementation for optimizing read only critical sections in java based on sequential locks solero is compatible with the conventional lock implementation of java however unlike the conventional implementation only critical sections that may write data or have side effects need to update lock variables while read only critical sections need only read lock variables without writing them each writing critical section changes the lock value to new value hence read only critical section is guaranteed to be consistent if the lock is free and its value does not change from the beginning to the end of the read only critical section using java workloads including specjbb and the hashmap and treemap java classes we evaluate the performance impact of applying solero to read mostly locks our experimental results show performance improvements across the board often substantial in both single thread speed and scalability over the conventional lock implementation mutual exclusion and read write locks solero improves the performance of specjbb by on single and multiple threads the results using the hashmap and treemap benchmarks show that solero outperforms the conventional lock implementation and read write locks by substantial multiples on multi threads
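A conceptual sequential-lock (seqlock) sketch of the reader/writer protocol the abstract describes: writers make the version counter odd while updating, and readers retry if the version was odd or changed during the read. The Python rendering is only illustrative and is not the authors' Java implementation.

```python
# Sequential-lock sketch: readers never write the lock word; they retry
# if a concurrent write was (or becomes) in progress.
import threading

class SeqLock:
    def __init__(self):
        self.version = 0
        self._write_mutex = threading.Lock()   # writers still exclude each other

    def write(self, update):
        with self._write_mutex:
            self.version += 1      # becomes odd: a write is in progress
            try:
                update()
            finally:
                self.version += 1  # becomes even: data consistent again

    def read(self, snapshot):
        while True:
            v1 = self.version
            if v1 % 2 == 1:
                continue           # a write is in progress, retry
            value = snapshot()     # read-only critical section, no lock writes
            if self.version == v1:
                return value       # nothing changed while reading

data = {"x": 0, "y": 0}
lock = SeqLock()
lock.write(lambda: data.update(x=1, y=1))
print(lock.read(lambda: (data["x"], data["y"])))
```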
this paper presents detailed analysis of the structure and components of queries written by experimental participants in study that manipulated two factors found to affect end user information retrieval performance training in boolean logic and the type of search interface as reported previously we found that both boolean training and the use of an assisted interface improved the participants ability to find correct responses to information requests here we examine the impact of these training and interface manipulations on the boolean operators and search terms that comprise the submitted queries our analysis shows that both boolean training and the use of an assisted interface improved the participants ability to correctly utilize various operators an unexpected finding is that this training also had positive impact on term selection the terms and to lesser extent the operators comprising query were important factors affecting the participants performance in query tasks our findings demonstrate that even small training interventions can improve the users search performance and highlight the need for additional information retrieval research into how search interfaces can provide superior support to today’s untrained users of the web
many powerful methods and tools exist for extracting meaning from scientific publications their texts and their citation links however existing proposals often neglect fundamental aspect of learning that understanding and learning require an active and constructive exploration of domain in this paper we describe new method and tool that use data mining and interactivity to turn the typical search and retrieve dialogue in which the user asks questions and system gives answers into dialogue that also involves sense making in which the user has to become active by constructing bibliography and domain model of the search term this model starts from an automatically generated and annotated clustering solution that is iteratively modified by users the tool is part of an integrated authoring system covering all phases from search through reading and sense making to writing two evaluation studies demonstrate the usability of this interactive and constructive approach and they show that clusters and groups represent identifiable sub topics
we study the problem of efficiently evaluating similarity queries on histories where history is multi dimensional time series while there are some solutions for one dimensional time series and spatio temporal trajectories of low dimensionality we are not aware of any work that examines the problem for higher dimensionalities in this paper we address the problem in its general case and propose class of summaries for histories with few interesting properties first for commonly used distance functions such as the lp norm lcss and dtw the summaries can be used to efficiently prune some of the histories that cannot be in the answer set of the queries second histories can be indexed based on their summaries hence the qualifying candidates can be efficiently retrieved to further reduce the number of unnecessary distance computations for false positives we propose finer level approximation of histories and an algorithm to find an approximation with the least maximum distance estimation error experimental results confirm that the combination of our feature extraction approaches and the indexability of our summaries can improve upon existing methods and scales up for higher dimensionalities and database sizes based on our experiments on real and synthetic datasets of multi dimensional histories
the simulation of large scale vegetation has always been important to computer graphics as well as to virtual plantation in ecology although we can employ texture synthesis technique to produce large scale vegetation from small input sample the synthesized result lacks physical accuracy which only focuses on visual appearance on the other hand the model developed in bio physical area to simulate vegetation evolution provides meaningful distribution result of vegetation however the visualization of these meaning results is simple and preliminary in this paper we propose new controllable texture synthesis method to generate visually pleasing large scale vegetation of physical meaning under the guidance of the control map the control map is computed by the vegetation evolution model as well as considering the convenience to guide synthesis furthermore we adapt our method on gpu to realize real time simulation of large scale vegetation the experimental results demonstrate that our method is effective and efficient
the requirements specifications of complex systems are increasingly developed in distributed fashion it makes inconsistency management necessary during the requirements stage however identifying appropriate inconsistency handling proposals is still an important challenge in particular for inconsistencies involving many different stakeholders with different concerns it is difficult to reach an agreement on inconsistency handling to address this this paper presents vote based approach to choosing acceptable common proposals for handling inconsistency this approach focuses on the inconsistency in requirements that results from conflicting intentions of stakeholders informally speaking we consider each distinct stakeholder or distributed artifact involved in the inconsistency as voter then we transform identification of an acceptable common proposal into problem of combinatorial vote based on each stakeholder’s preferences on the set of proposals an acceptable common proposal is identified in an automated way according to given social vote rule
anypoint is new model for one to many communication with ensemble sites aggregations of end nodes that appear to the external internet as unified site policies for routing any point traffic are defined by application layer plugins residing in extensible routers at the ensemble edge anypoint’s switching functions operate at the transport layer at the granularity of transport frames communication over an anypoint connection preserves end to end transport rate control partial ordering and reliable delivery experimental results from host based anypoint prototype and an nfs storage router application show that anypoint is powerful technique for virtualizing and extending cluster services and is amenable to implementation in high speed switches the anypoint prototype improves storage router throughput by relative to tcp proxy
tinyos is an effective platform for developing lightweight embedded network applications but the platform’s lean programming model and power efficient operation come at price tinyos applications are notoriously difficult to construct debug and maintain the development difficulties stem largely from programming model founded on events and deferred execution in short the model introduces non determinism in the execution ordering of primitive actions an issue exacerbated by the fact that embedded network systems are inherently distributed and reactive the resulting set of possible execution sequences for given system is typically large and can swamp developers unaided ability to reason about program behavior in this paper we present visualization toolkit for tinyos to aid in program comprehension the goal is to assist developers in reasoning about the computation forest underlying system under test and the particular branches chosen during each run the toolkit supports comprehension activities involving both local and distributed runtime behavior the constituent components include i full featured static analysis and instrumentation library ii selection based probe insertion system iii lightweight event recording service iv trace extraction and reconstruction tool and three visualization front ends we demonstrate the utility of the toolkit using both standard and custom system examples and present an analysis of the toolkit’s resource usage and performance characteristics
shared interface allowing several users in co presence to interact simultaneously on digital data on single display is an uprising challenge in human computer interaction hci its development is motivated by the advent of large displays such as wall screens and tabletops it affords fluid and natural digital interaction without hindering human communication and collaboration it enables mutual awareness making participant conscious of each other activities in this paper we are interested in mixed presence groupware mpg when two or more remote shared interfaces are connected for distant collaborative session our contribution strives to answer to the question can the actual technology provide sufficient presence feeling of the remote site to enable efficient collaboration between two distant groups we propose digitable an experimental platform we hope lessen the gap between collocated and distant interaction digitable is combining multiuser tactile interactive tabletop video communication system enabling eye contact with real size distant user visualization and spatialized sound system for speech transmission robust computer vision module for distant users gesture visualization completes the platform we discuss first experiments using digitable for collaborative task mosaic completion in term of distant mutual awareness although digitable does not provide the same presence feeling in distant and or collocated situation first and important finding emerges distance does not hinder efficient collaboration anymore
we present the ssbox modular signaling platform tool for rapid application prototyping in cellular mobile network and examine in detail its performance limits for the application of active network based tracking called sstracker this application is highly configurable non intrusive and cost effective solution for large scale data collection on user mobility in the network uniquely enabling tracking of both active and passive mobile clients we present performance studies of real deployment in an existing cellular network and document the measured as well as simulated performance limits such as platform interconnection utilization other factors such as impact on battery consumption of the tracked device are studied as well platform modularity and variability is discussed and demonstrated by further deployed use cases we conclude by observing promising applicability for future cellular networks
due to the popularity of computer games and computer animated movies models are fast becoming an important element in multimedia applications in addition to the conventional polygonal representation for these models the direct adoption of the original scanned point set for model representation is recently gaining more and more attention due to the possibility of bypassing the time consuming mesh construction stage and various approaches have been proposed for directly processing point based models in particular the design of simplification approach which can be directly applied to point based models to reduce their size is important for applications such as model transmission and archival given point based model which is defined by point set and desired reduced number of output samples the simplification approach finds reduced point set which i has the desired number of output samples as its cardinality and ii minimizes the difference of the corresponding surface defined by the reduced set and the original surface defined by the input point set although number of previous approaches has been proposed for simplification most of them i do not focus on point based models ii do not consider efficiency quality and generality together and iii do not consider the distribution of the output samples in this paper we propose an adaptive simplification method asm which is an efficient technique for simplifying point based complex models specifically the asm consists of three parts hierarchical cluster tree structure the specification of simplification criteria and an optimization process the asm achieves low computation time by clustering the points locally based on the preservation of geometric characteristics we analyze the performance of the asm and show that it outperforms most of the current state of the art methods in terms of efficiency quality and generality
key problem in grid networks is how to efficiently manage the available infrastructure in order to satisfy user requirements and maximize resource utilization this is in large part influenced by the algorithms responsible for the routing of data and the scheduling of tasks in this paper we present several multi cost algorithms for the joint scheduling of the communication and computation resources that will be used by grid task we propose multi cost scheme of polynomial complexity that performs immediate reservations and selects the computation resource to execute the task and determines the path to route the input data furthermore we introduce multi cost algorithms that perform advance reservations and thus also find the starting times for the data transmission and the task execution we initially present an optimal scheme of non polynomial complexity and by appropriately pruning the set of candidate paths we also give heuristic algorithm of polynomial complexity our performance results indicate that in grid network in which tasks are either cpu or data intensive or both it is beneficial for the scheduling algorithm to jointly consider the computational and communication problems comparison between immediate and advance reservation schemes shows the trade offs with respect to task blocking probability end to end delay and the complexity of the algorithms
in this paper we summarize our work on the udt high performance data transport protocol over the past four years udt was designed to effectively utilize the rapidly emerging high speed wide area optical networks it is built on top of udp with reliability control and congestion control which makes it quite easy to install the congestion control algorithm is the major internal functionality to enable udt to effectively utilize high bandwidth meanwhile we also implemented set of apis to support easy application implementation including both reliable data streaming and partial reliable messaging the original udt library has also been extended to composable udt which can support various congestion control algorithms we will describe in detail the design and implementation of udt the udt congestion control algorithm composable udt and the performance evaluation
financial institutions are restricted by legislation and have to ensure that mobile access to data is legal in defined context however today’s access control solutions work but cannot decide whether an access is legal especially when an access from different countries is required different legislations have to be taken into account in this paper we address the problem of law compliant access in international financial environments we present an extension to context aware access control systems so that they incorporate legal constraints to this end we introduce different facets of context information their interrelations and describe their necessity for law aware access control finally by using an international banking application scenario we demonstrate how system that follows our approach can decide about access
in this paper we present learn an xml based multi agent system for supporting user device adaptive learning ie learning activities which take into account the profile past behaviour preferences and needs of users as well as the characteristics of the devices they use for these activities learn is characterized by the following features it is highly subjective since it handles quite rich and detailed user profile that plays key role during the learning activities ii it is dynamic and flexible ie it is capable of reacting to variations in user exigencies and objectives iii it is device adaptive since it decides the learning objects to present to the user on the basis of the device he is currently using iv it is generic ie it is capable of operating in large variety of learning contexts it is xml based since it exploits many facilities of xml technology for handling and exchanging information related to learning activities the paper also reports various experimental results as well as comparison between learn and other related learning management systems already presented in the literature
advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases these images if analyzed can reveal useful information to the human users image mining deals with the extraction of implicit knowledge image data relationship or other patterns not explicitly stored in the images image mining is more than just an extension of data mining to image domain it is an interdisciplinary endeavor that draws upon expertise in computer vision image processing image retrieval data mining machine learning database and artificial intelligence in this paper we will examine the research issues in image mining current developments in image mining particularly image mining frameworks state of the art techniques and systems we will also identify some future research directions for image mining
schema matching is basic problem in many database application domains such as data integration business data warehousing and semantic query processing in current implementations schema matching is typically performed manually which has significant limitations on the other hand previous research papers have proposed many techniques to achieve partial automation of the match operation for specific application domains we present taxonomy that covers many of these existing approaches and we describe the approaches in some detail in particular we distinguish between schema level and instance level element level and structure level and language based and constraint based matchers based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover we intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching when developing new match algorithm and when implementing schema matching component
in this paper we are concerned with disseminating high volume data streams to many simultaneous applications over low bandwidth wireless mesh network for bandwidth efficiency we propose group aware stream filtering approach used in conjunction with multicasting that exploits two overlooked yet important properties of these applications many applications can tolerate some degree of slack in their data quality requirements and there may exist multiple subsets of the source data satisfying the quality needs of an application we can thus choose the best alternative subset for each application to maximise the data overlap within the group to best benefit from multicasting an evaluation of our prototype implementation shows that group aware data filtering can save bandwidth with low cpu overhead we also analyze the key factors that affect its performance based on testing with heterogeneous filtering requirements
explicit data graph execution edge architectures offer the possibility of high instruction level parallelism with energy efficiency in edge architectures the compiler breaks program into sequence of structured blocks that the hardware executes atomically the instructions within each block communicate directly instead of communicating through shared registers the trips edge architecture imposes restrictions on its blocks to simplify the microarchitecture each trips block has at most instructions issues at most loads and or stores and executes at most register bank reads and writes to detect block completion each trips block must produce constant number of outputs stores and register writes and branch decision the goal of the trips compiler is to produce trips blocks full of useful instructions while enforcing these constraints this paper describes set of compiler algorithms that meet these sometimes conflicting goals including an algorithm that assigns load and store identifiers to maximize the number of loads and stores within block we demonstrate the correctness of these algorithms in simulation on spec eembc and microbenchmarks extracted from spec and others we measure speedup in cycles over an alpha on microbenchmarks
this research examines the structural complexity of software and specifically the potential interaction of the two dominant dimensions of structural complexity coupling and cohesion analysis based on an information processing view of developer cognition results in theoretically driven model with cohesion as moderator for main effect of coupling on effort an empirical test of the model was devised in software maintenance context utilizing both procedural and object oriented tasks with professional software engineers as participants the results support the model in that there was significant interaction effect between coupling and cohesion on effort even though there was no main effect for either coupling or cohesion the implication of this result is that when designing implementing and maintaining software to control complexity both coupling and cohesion should be considered jointly instead of independently by providing guidance on structuring software for software professionals and researchers these results enable software to continue as the solution of choice for wider range of richer more complex problems
we study the fifo and causal multicast problem two group communication abstractions that deliver messages in an order consistent with their context with fifo multicast the context of message m at process p is all messages that were previously multicast by m’s sender and addressed to p causal multicast extends the notion of context to all messages that are causally linked to m by chain of multicast and delivery events we propose multicast algorithms for systems composed of set of disjoint groups of processes server racks or data centers these algorithms offer several desirable properties i the protocols are latency optimal ii to deliver message m only m’s sender and addressees communicate iii messages can be addressed to any subset of groups and iv these algorithms are highly resilient an arbitrary number of process failures is tolerated and we only require the network to be quasi reliable ie message m is guaranteed to be received only if the sender and receiver of m are always up to the best of our knowledge these are the first multicast protocols to offer all of these properties at the same time
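For background only, the classical vector-clock causal delivery check below illustrates what delivering consistently with causal context means; the paper's group-based, fault-tolerant protocols are different and offer stronger properties than this textbook technique.

```python
# Classical vector-clock causal delivery check (background illustration only).

def can_deliver(msg_vc, sender, local_vc):
    """msg_vc: vector clock attached to the message; sender: its origin index."""
    for p, count in enumerate(msg_vc):
        if p == sender:
            if count != local_vc[p] + 1:     # must be the next message from sender
                return False
        elif count > local_vc[p]:            # a causally preceding message is missing
            return False
    return True

def deliver(msg_vc, sender, local_vc):
    local_vc[sender] += 1

local = [0, 0, 0]
m1 = ([1, 0, 0], 0)     # first message from process 0
m2 = ([1, 1, 0], 1)     # message from process 1 that causally follows m1
print(can_deliver(*m2, local))   # False: m1 not delivered yet
deliver(*m1, local)
print(can_deliver(*m2, local))   # True
```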
this paper provides transparent and speculative algorithm for content based web page prefetching the algorithm relies on profile based on the internet browsing habits of the user it aims at reducing the perceived latency when the user requests document by clicking on hyperlink the proposed user profile relies on the frequency of occurrence for selected elements forming the web pages visited by the user these frequencies are employed in mechanism for the prediction of the user’s future actions for the anticipation of an adjacent action the anchored text around each of the outbound links is used and weights are assigned to these links some of the linked documents are then prefetched and stored in local cache according to the assigned weights the proposed algorithm was tested against three different prefetching algorithms and yield improved cache hit rates given moderate bandwidth overhead furthermore the precision of accurately inferring the user’s preference is evaluated through the recall precision curves statistical evaluation testifies that the achieved recall precision performance improvement is significant
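A minimal sketch of profile-based link weighting of the kind described: score each outbound link by how well its anchor text matches a keyword-frequency profile and prefetch the top-weighted links. The profile structure and the scoring formula are assumptions for illustration, not the paper's exact mechanism.

```python
# Illustrative link weighting from a user keyword-frequency profile.
from collections import Counter

user_profile = Counter({"football": 12, "league": 7, "transfer": 5, "weather": 1})

links = {
    "/sports/transfer-news": "latest football transfer rumours from the league",
    "/weather/today": "tomorrow weather forecast and maps",
    "/politics/vote": "parliament vote results",
}

def link_weight(anchor_text, profile):
    words = anchor_text.lower().split()
    return sum(profile.get(w, 0) for w in words) / max(len(words), 1)

weights = {url: link_weight(text, user_profile) for url, text in links.items()}
to_prefetch = sorted(weights, key=weights.get, reverse=True)[:2]
print(weights, to_prefetch)
```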
research on rent’s rule in electrical engineering the applied sciences and technology has been based on the publication of interpretation of rent’s memoranda by landman and russo because of the wide impact of rent’s work and requests from researchers we present his original memoranda in this paper we review the impact of rent’s work and present the memoranda in the context of ibm computer hardware development since the furthermore because computer hardware components have changed significantly since the memoranda were written in new interpretation is needed for today’s ultra large scale integrated circuitry on the basis of our analysis of the memoranda one of the authors personal knowledge of the and computers and our experience in the design of high performance circuitry for microprocessor chips we have derived an historically equivalent interpretation of rent’s memoranda that is suitable for today’s computer components we describe an application of our historically equivalent interpretation to the problem of assessing on chip interconnection requirements of control logic circuitry in the ibm power microprocessor
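For context, Rent's rule in its commonly cited form relates the number of external terminals of a logic block to the number of gates it contains; this standard statement is given only as background, the paper's historically equivalent interpretation restates the memoranda in its own terms.

```latex
% Rent's rule, commonly cited form: T external terminals for a block of G gates,
% with t the average terminals per gate and 0 < p < 1 the Rent exponent.
T = t\,G^{p}, \qquad 0 < p < 1.
```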
the $\ell_1$ distance also known as the manhattan or taxicab distance between two vectors in $\mathbb{R}^n$ is $\sum_{i=1}^{n} |x_i - y_i|$ approximating this distance is fundamental primitive on massive databases with applications to clustering nearest neighbor search network monitoring regression sampling and support vector machines we give the first one pass streaming algorithm for this problem in the turnstile model with space and update time the notation hides polylogarithmic factors in the problem parameters and the precision required to store vector entries all previous algorithms either required space or update time and or could not work in the turnstile model ie support an arbitrary number of updates to each coordinate our bounds are optimal up to factors
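As background intuition only, the classical Cauchy (1-stable) sketch below approximates the l1 norm of a turnstile-updated vector; it is a well-known prior technique, not the paper's algorithm, whose space and update-time bounds are stronger. The sketch size k and the dense projection matrix are simplifying assumptions.

```python
# Classical Cauchy (1-stable) sketch for approximating the l1 norm of a
# dynamically updated vector (background illustration, not the paper's method).
import numpy as np

rng = np.random.default_rng(1)
n, k = 1000, 200                      # vector length, number of sketch counters
A = rng.standard_cauchy((k, n))       # shared random projection matrix

sketch = np.zeros(k)

def update(i, delta):                 # turnstile update: x[i] += delta
    sketch[:] += delta * A[:, i]

def estimate_l1():
    # Each counter is Cauchy with scale ||x||_1, so the median of the
    # absolute counters estimates the l1 norm.
    return np.median(np.abs(sketch))

x = np.zeros(n)
for _ in range(5000):
    i, d = rng.integers(n), rng.integers(-3, 4)
    x[i] += d
    update(i, d)

print(estimate_l1(), np.abs(x).sum())
```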
we argue that accessing the transitive closure of relationships is an important component of both databases and knowledge representation systems in artificial intelligence the demands for efficient access and management of large relationships motivate the need for explicitly storing the transitive closure in compressed and local way while allowing updates to the base relation to be propagated incrementally we present transitive closure compression technique based on labeling spanning trees with numeric intervals and provide both analytical and empirical evidence of its efficacy including proof of optimality
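A minimal sketch of the interval-labeling idea for the spanning-tree part of such a scheme: label nodes with DFS (pre, post) intervals so that tree reachability becomes interval containment. The full compression technique additionally propagates intervals along non-tree edges and supports incremental updates, which is omitted here.

```python
# Interval labeling of a spanning tree: u reaches v in the tree iff v's
# (pre, post) interval is nested inside u's.

def label_tree(root, children):
    labels, counter = {}, [0]
    def dfs(u):
        counter[0] += 1
        pre = counter[0]
        for c in children.get(u, []):
            dfs(c)
        labels[u] = (pre, counter[0])   # interval spans all descendants
    dfs(root)
    return labels

def reaches(u, v, labels):
    (u_pre, u_post), (v_pre, v_post) = labels[u], labels[v]
    return u_pre <= v_pre and v_post <= u_post

tree = {"a": ["b", "c"], "b": ["d"]}
L = label_tree("a", tree)
print(L)                                            # a:(1,4) b:(2,3) d:(3,3) c:(4,4)
print(reaches("a", "d", L), reaches("b", "c", L))   # True False
```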
software engineering methodologies are subject to complex cost benefit tradeoffs economic models can help practitioners and researchers assess methodologies relative to these tradeoffs effective economic models however can be established only through an iterative process of refinement involving analytical and empirical methods sensitivity analysis provides one such method by identifying the factors that are most important to models sensitivity analysis can help simplify those models it can also identify factors that must be measured with care leading to guidelines for better test strategy definition and application in prior work we presented the first comprehensive economic model for the regression testing process that captures both cost and benefit factors relevant to that process while supporting evaluation of these processes across entire system lifetimes in this work we use sensitivity analysis to examine our model analytically and assess the factors that are most important to the model based on the results of that analysis we propose two new models of increasing simplicity we assess these models empirically on data obtained by using regression testing techniques on several non trivial software systems our results show that one of the simplified models assesses the relationships between techniques in the same way as the full model
in this paper we present pipeline for camera pose and trajectory estimation and image stabilization and rectification for dense as well as wide baseline omnidirectional images the input is set of images taken by single hand held camera the output is set of stabilized and rectified images augmented by the computed camera trajectory and reconstruction of feature points facilitating visual object recognition the paper generalizes previous works on camera trajectory estimation done on perspective images to omnidirectional images and introduces new technique for omnidirectional image rectification that is suited for recognizing people and cars in images the performance of the pipeline is demonstrated on real image sequence acquired in urban as well as natural environments
the goal of system evaluation in information retrieval has always been to determine which of set of systems is superior on given collection the tool used to determine system ordering is an evaluation metric such as average precision which computes relative collection specific scores we argue that broader goal is achievable in this paper we demonstrate that by use of standardization scores can be substantially independent of particular collection allowing systems to be compared even when they have been tested on different collections compared to current methods our techniques provide richer information about system performance improved clarity in outcome reporting and greater simplicity in reviewing results from disparate sources
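A small sketch of per-topic score standardization, assuming a pool of reference systems per topic; converting each raw topic score to a z-score against that pool is the basic idea, while the exact reference pool and any further rescaling of z-scores are assumptions for illustration rather than the paper's precise procedure.

```python
# Per-topic standardization of retrieval scores against a reference pool.
import statistics

# raw average-precision scores: reference_scores[topic] -> list over pooled systems
reference_scores = {
    "t1": [0.10, 0.20, 0.30, 0.40],
    "t2": [0.50, 0.55, 0.60, 0.65],
}
system_scores = {"t1": 0.35, "t2": 0.52}

def standardize(system_scores, reference_scores):
    z = {}
    for topic, score in system_scores.items():
        mu = statistics.mean(reference_scores[topic])
        sigma = statistics.stdev(reference_scores[topic])
        z[topic] = (score - mu) / sigma
    return z

z = standardize(system_scores, reference_scores)
print(z, statistics.mean(z.values()))   # per-topic z-scores and their average
```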
similarity search is crucial task in multimedia retrieval and data mining most existing work has modelled this problem as the nearest neighbor nn problem which considers the distance between the query object and the data objects over fixed set of features such an approach has two drawbacks i it leaves many partial similarities uncovered ii the distance is often affected by few dimensions with high dissimilarity to overcome these drawbacks we propose the match problem in this paper the match problem models similarity search as matching between the query object and the data objects in subset of dimensions whose size is given integer smaller than the dimensionality and these dimensions are determined dynamically to make the query object and the data objects returned in the answer set match best the match query is expected to be superior to the knn query in discovering partial similarities however it may not be as good in identifying full similarity since single such value may only correspond to particular aspect of an object instead of the entirety to address this problem we further introduce the frequent match problem which finds set of objects that appears in the match answers most frequently for range of these values moreover we propose search algorithms for both problems we prove that our proposed algorithm is optimal in terms of the number of individual attributes retrieved which is especially useful for information retrieval from multiple systems we can also apply the proposed algorithmic strategy to achieve disk based algorithm for the frequent match query by thorough experimental study using both real and synthetic data sets we show that the match query yields better result than the knn query in identifying similar objects by partial similarities our proposed method for processing the frequent match query outperforms existing techniques for similarity search in terms of both effectiveness and efficiency
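A toy illustration of matching in the k best dimensions, assuming plain per-dimension absolute differences; it conveys the flavour of the query model only, not the paper's search algorithms or their optimality results.

```python
# For each data object keep only the k smallest per-dimension differences to
# the query and rank objects by that partial distance.

def k_match_distance(query, obj, k):
    diffs = sorted(abs(q - o) for q, o in zip(query, obj))
    return sum(diffs[:k])               # ignore the d - k worst dimensions

def k_match(query, data, k, top=3):
    scored = [(k_match_distance(query, obj, k), name) for name, obj in data.items()]
    return sorted(scored)[:top]

data = {
    "A": [1.0, 2.0, 9.0, 4.0],   # similar to the query except in one dimension
    "B": [1.1, 2.1, 3.1, 4.1],
    "C": [5.0, 6.0, 7.0, 8.0],
}
query = [1.0, 2.0, 3.0, 4.0]
print(k_match(query, data, k=3))   # A's single bad dimension no longer dominates
```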
traditional algorithms for description logic dl instance retrieval are inefficient for large amounts of underlying data as dl is becoming more and more popular in areas such as the semantic web and information integration it is very important to have systems which can reason efficiently over large data sets in this paper we present an approach to transform dl axioms formalised in the $\mathcal{SHIQ}$ dl language into prolog program under the unique name assumption this transformation is performed with no knowledge about particular individuals they are accessed dynamically during the normal prolog execution of the generated program this technique together with the top down prolog execution implies that only those pieces of data are accessed that are indeed important for answering the query this makes it possible to store the individuals in database instead of memory which results in better scalability and helps in using dl ontologies directly on top of existing information sources the transformation process consists of two steps the dl axioms are converted to first order clauses of restricted form and prolog program is generated from these clauses step which is the focus of the present paper actually works on more general clauses than those obtainable by applying step to $\mathcal{SHIQ}$ knowledge base we first present base transformation the output of which can be either executed using simple interpreter or further extended to executable prolog code we then discuss several optimisation techniques applicable to the output of the base transformation some of these techniques are specific to our approach while others are general enough to be interesting for dl reasoner implementors not using prolog we give an overview of dlog dl reasoner in prolog which is an implementation of the techniques outlined above we evaluate the performance of dlog and compare it to some widely used dl reasoners such as racerpro pellet and kaon
this paper deals with the systematic synthesis of space optimal arrays as target example an asymptotically space optimal array for $N_x \times N_y \times N_z$ rectangular mesh algorithms with affine schedule $ai + bj + ck$ is designed the obtained bound improves the best previously known ones the key idea underlying our approach is to compress the initial index domain along number of directions in order to obtain new domain that is more suitable for the application of projection methods
we propose using one class two class and multiclass svms to annotate images for supporting keyword retrieval of images providing automatic annotation requires an accurate mapping of images low level perceptual features eg color and texture to some high level semantic labels eg landscape architecture and animals much work has been performed in this area however there is lack of ability to assess the quality of annotation in this paper we propose confidence based dynamic ensemble cde which employs three level classification scheme at the base level cde uses one class support vector machines svms to characterize confidence factor for ascertaining the correctness of an annotation or class prediction made by binary svm classifier the confidence factor is then propagated to the multiclass classifiers at subsequent levels cde uses the confidence factor to make dynamic adjustments to its member classifiers so as to improve class prediction accuracy to accommodate new semantics and to assist in the discovery of useful low level features our empirical studies on large real world data set demonstrate cde to be very effective
concurrent computer programs are fast becoming prevalent in many critical applications unfortunately these programs are especially difficult to test and debug recently it has been suggested that injecting random timing noise into many points within program can assist in eliciting bugs within the program upon eliciting the bug it is necessary to identify minimal set of points that indicate the source of the bug to the programmer in this paper we pose this problem as an active feature selection problem we propose an algorithm called the iterative group sampling algorithm that iteratively samples lower dimensional projection of the program space and identifies candidate relevant points we analyze the convergence properties of this algorithm we test the proposed algorithm on several real world programs and show its superior performance finally we show the algorithms performance on large concurrent program
two approaches to high throughput processors are chip multi processing cmp and simultaneous multi threading smt cmp increases layout efficiency which allows more functional units and faster clock rate however cmp suffers from hardware partitioning of functional resources smt increases functional unit utilization by issuing instructions simultaneously from multiple threads however wide issue smt suffers from layout and technology implementation problems we use silicon resources as our basis for comparison and find that area and system clock have large effect on the optimal smt cmp design trade we show the area overhead of smt on each processor and how it scales with the width of the processor pipeline and the number of smt threads the wide issue smt delivers the highest single thread performance with improved multithread throughput however multiple smaller cores deliver the highest throughput also alternate processor configurations are explored that trade off smt threads for other microarchitecture features the result is small increase to single thread performance but fairly large reduction in throughput
often software systems are developed by organizations consisting of many teams of individuals working together brooks states in the mythical man month book that product quality is strongly affected by organization structure unfortunately there has been little empirical evidence to date to substantiate this assertion in this paper we present metric scheme to quantify organizational complexity in relation to the product development process to identify if the metrics impact failure proneness in our case study the organizational metrics when applied to data from windows vista were statistically significant predictors of failure proneness the precision and recall measures for identifying failure prone binaries using the organizational metrics was significantly higher than using traditional metrics like churn complexity coverage dependencies and pre release bug measures that have been used to date to predict failure proneness our results provide empirical evidence that the organizational metrics are related to and are effective predictors of failure proneness
we examine the influence of task types on information seeking behaviors on the web by using screen capture logs and eye movement data eleven participants performed two different types of web search an informational task and transactional task and their think aloud protocols and behaviors were recorded analyses of the screen capture logs showed that the task type affected the participants information seeking behaviors in the transactional task participants visited more web pages than for the informational task but their reading time for each page was shorter than in the informational task preliminary analysis of eye movement data for nine participants revealed characteristics of the scanpaths followed in search result pages as well as the distribution of lookzones for each task
for simulations involving complex objects number of different properties must be represented an example of this is in modeling an object undergoing combustion heat amounts fuel consumption and even object shape must be modeled and changed over time ideally we would put everything into unified representation but this is sometimes not possible or feasible due to measurement limitations or the suitability of specific representation in this paper we define multi representation framework for dealing with multiple properties and their interactions within an object this model is especially useful in physically based modeling where the time variation of some properties affects other properties including geometry or topology as motivating example case we present method for modeling decomposition of burning object
recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients termed wavelet synopses that can be used to provide fast and reasonably accurate approximate query answers major shortcoming of these existing wavelet techniques is that the quality of the approximate answers they provide varies widely even for identical queries on nearly identical values in distinct parts of the data as result users have no way of knowing whether particular approximate answer is highly accurate or off by many orders of magnitude in this article we introduce probabilistic wavelet synopses the first wavelet based data reduction technique optimized for guaranteed accuracy of individual approximate answers whereas previous approaches rely on deterministic thresholding for selecting the wavelet coefficients to include in the synopsis our technique is based on novel probabilistic thresholding scheme that assigns each coefficient probability of being included based on its importance to the reconstruction of individual data values and then flips coins to select the synopsis we show how our scheme avoids the above pitfalls of deterministic thresholding providing unbiased highly accurate answers for individual data values in data vector we propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics experimental results on real world and synthetic data sets evaluate these algorithms and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast highly accurate answers with improved quality guarantees
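A toy sketch of the coin-flipping step described above. It keeps each coefficient with some probability and rescales retained coefficients by the inverse of that probability so that reconstruction is unbiased in expectation; the crude magnitude-proportional probability allocation here is our placeholder for the tuning/optimization algorithms the article actually proposes.

```python
# Simplified probabilistic thresholding over an array of wavelet coefficients:
# coefficient c_i is retained with probability p_i and, if retained, rescaled
# to c_i / p_i so the expected value of each synopsis entry equals c_i.
import random

def probabilistic_synopsis(coeffs, budget):
    total = sum(abs(c) for c in coeffs) or 1.0
    synopsis = {}
    for i, c in enumerate(coeffs):
        p = min(1.0, budget * abs(c) / total)   # crude allocation of the space budget
        if p > 0 and random.random() < p:       # flip a coin for this coefficient
            synopsis[i] = c / p                 # rescale -> unbiased estimator
    return synopsis

def reconstruct(synopsis, length):
    # missing coefficients are treated as zero, as in deterministic thresholding
    return [synopsis.get(i, 0.0) for i in range(length)]
```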
we have modified mach to treat cross domain remote procedure call rpc as single entity instead of sequence of message passing operations with rpc thus elevated we improved the transfer of control during rpc by changing the thread model like most operating systems mach views threads as statically associated with single task with two threads involved in an rpc an alternate model is that of migrating threads in which during rpc single thread abstraction moves between tasks with the logical flow of control and server code is passively executed we have compatibly replaced mach’s static threads with migrating threads in an attempt to isolate this aspect of operating system design and implementation the key element of our design is decoupling of the thread abstraction into the execution context and the schedulable thread of control consisting of chain of contexts key element of our implementation is that threads are now based in the kernel and temporarily make excursions into tasks via upcalls the new system provides more precisely defined semantics for thread manipulation and additional control operations allows scheduling and accounting attributes to follow threads simplifies kernel code and improves rpc performance we have retained the old thread and ipc interfaces for backwards compatibility with no changes required to existing client programs and only minimal change to servers as demonstrated by functional unix single server and clients the logical complexity along the critical rpc path has been reduced by factor of nine local rpc doing normal marshaling has sped up by factors of we conclude that migrating thread model is superior to static model that kernel visible rpc is prerequisite for this improvement and that it is feasible to improve existing operating systems in this manner
in this paper we want to reconsider the role anthropology both its theory and methods can play within hci research one of the areas anthropologists can contribute to here is to rethink the notion of social context where technology is used context is usually equated with the immediate activities such as work tasks when and by whom the task is performed this tends to under represent some fundamental aspects of social life like culture and history in this paper we want to open up discussion about what context means in hci and to emphasize socio structural and historical aspects of the term we will suggest more inclusive analytic way that enables the hci community to make better sense of use situation an example of technology use in workplace will be given to demonstrate the yields this kind of theoretical framework can bring into hci
the unmanaged internet architecture uia provides zero configuration connectivity among mobile devices through personal names users assign personal names through an ad hoc device introduction process requiring no central allocation once assigned names bind securely to the global identities of their target devices independent of network location each user manages one namespace shared among all the user’s devices and always available on each device users can also name other users to share resources with trusted acquaintances devices with naming relationships automatically arrange connectivity when possible both in ad hoc networks and using global infrastructure when available uia prototype demonstrates these capabilities using optimistic replication for name resolution and group management and routing algorithm exploiting the user’s social network for connectivity
nowadays wi fi is the most mature technology for wireless internet access despite the large and ever increasing diffusion of wi fi hotspots energy limitations of mobile devices are still an issue to deal with this the standard includes power saving mode psm but not much attention has been devoted by the research community to understand its performance in depth we think that this paper contributes to fill the gap we focus on typical wi fi hotspot scenario and assess the dependence of the psm behavior on several key parameters such as the packet loss probability the round trip time the number of users within the hotspot we show that during traffic bursts psm is able to save up to of the energy spent when no energy management is used and introduces limited additional delay unfortunately in the case of long inactivity periods between bursts psm is not the optimal solution for energy management we thus propose very simple cross layer energy manager xem that dynamically tunes its energy saving strategy depending on the application behavior and key network parameters xem does not require any modification to the applications or to the standard and can thus be easily integrated in current wi fi devices depending on the network traffic pattern xem reduces the energy consumption of an additional with respect to the standard psm
because they are based on large content addressable memories load store queues lsqs present implementation challenges in superscalar processors in this paper we propose an alternate lsq organization that separates the time critical forwarding functionality from the process of checking that loads received their correct values two main techniques are exploited first the store forwarding logic is accessed only by those loads and stores that are likely to be involved in forwarding and second the checking structure is banked by address the result of these techniques is that the lsq can be implemented by collection of small low bandwidth structures yielding an estimated three to five times reduction in lsq dynamic power
an important privacy issue in location based services is to hide user’s identity while still provide quality location based services previous work has addressed the problem of location $\mathcal{K}$-anonymity either based on centralized or decentralized schemes however centralized scheme relies on an anonymizing server as for location cloaking which may become the performance bottleneck when there are large number of clients more importantly holding information in centralized place is more vulnerable to malicious attacks decentralized scheme depends on peer communication to cloak locations and is more scalable however it may pose too much computation and communication overhead to the clients the service fulfillment rate may also be unsatisfied especially when there are not enough peers nearby this paper proposes new hybrid framework called hisc that balances the load between the as and mobile clients hisc partitions the space into base cells and mobile client claims surrounding area consisting of base cells the number of mobile clients in the surrounding cells is kept and updated at both client and as sides mobile client can either request cloaking service from the centralized as or use peer to peer approach for spatial cloaking based on personalized privacy response time and service quality requirements hisc can elegantly distribute the work load between the as and the mobile clients by tuning one system parameter base cell size and two client parameters surrounding cell size and tolerance count by integrating salient features of two schemes hisc successfully preserves query anonymity and provides more scalable and consistent service both the as and the clients can enjoy much less work load additionally we propose simple yet effective random range shifting algorithm to prevent possible privacy leakage that would exist in the original p2p approach our experiments show that hisc can elegantly balance the work load based on privacy requirements and client distribution hisc provides close to optimal service quality meanwhile it reduces the response time by more than an order of magnitude from both the p2p scheme and the centralized scheme when anonymity level value of $\mathcal{K}$ or number of clients is large it also reduces the update message cost of the as by nearly times and the peer searching message cost of the clients by more than an order of magnitude
disconnected skeleton is very coarse yet very stable skeleton based representation scheme for generic shape recognition in which recognition is performed mainly based on the structure of disconnection points of extracted branches without explicitly using information about boundary details however sometimes sensitivity to boundary details may be required in order to achieve the goal of recognition in this study we first present simple way to enrich disconnected skeletons with radius functions next we attempt to resolve the conflicting goals of stability and sensitivity by proposing coarse to fine shape matching algorithm as the first step two shapes are matched based on the structure of their disconnected skeletons and following to that the computed matching cost is re evaluated by taking into account the similarity of boundary details in the light of class specific boundary deformations which are learned from given set of examples
in this paper we propose new cooperative caching strategies by multiple clients in push based broadcast system which replaces cached items based on benefits of the waiting time key idea is that the clients construct logical peer to peer network and each of them determines the replacement of its own cache by taking into account access probabilities to data items from neighboring clients and the broadcast periods of data items or the times remaining until these items are broadcasted next we confirm that the proposed strategies reduce the average response time by simulation experiments
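A hedged sketch of a replacement rule driven by waiting-time benefit, as described above: the value of keeping an item is approximated by its aggregated access probability (the client's own plus its neighboring peers') times the time remaining until the item is broadcast again. The exact benefit formula and the peer-to-peer bookkeeping of the proposed strategies may differ; all names are illustrative.

```python
# Cache replacement in a push-based broadcast system based on expected waiting
# time saved by keeping an item in the (cooperative) cache.
def benefit(item, access_prob, time_to_broadcast):
    p = access_prob.get(item, 0.0)          # aggregated over self + neighboring peers
    return p * time_to_broadcast[item]      # expected waiting time saved

def insert_with_replacement(cache, capacity, new_item, access_prob, time_to_broadcast):
    if new_item in cache:
        return
    if len(cache) < capacity:
        cache.add(new_item)
        return
    victim = min(cache, key=lambda it: benefit(it, access_prob, time_to_broadcast))
    # evict only if the newcomer is worth more than the least valuable cached item
    if benefit(new_item, access_prob, time_to_broadcast) > benefit(victim, access_prob, time_to_broadcast):
        cache.remove(victim)
        cache.add(new_item)
```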
we present static cache oblivious dictionary structures for strings which provide analogues of tries and suffix trees in the cache oblivious model our construction takes as input either set of strings to store single string for which all suffixes are to be stored trie compressed trie or suffix tree and creates cache oblivious data structure which performs prefix queries in $O(\log_B n + |P|/B)$ i/os where $n$ is the number of leaves in the trie $P$ is the query string and $B$ is the block size this query cost is optimal for unbounded alphabets the data structure uses linear space
in many ways the central problem of ubiquitous computing how computational systems can make sense of and respond sensibly to complex dynamic environment laden with human meaning is identical to that of artificial intelligence ai indeed some of the central challenges that ubicomp currently faces in moving from prototypes that work in restricted environments to the complexity of real world environments eg difficulties in scalability integration and fully formalizing context echo some of the major issues that have challenged ai researchers over the history of their field in this paper we explore key moment in ai’s history where researchers grappled directly with these issues resulting in variety of novel technical solutions within ai we critically reflect on six strategies from this history to suggest technical solutions for how to approach the challenge of building real world usable solutions in ubicomp today
let be set of
this paper presents new technique for rendering bidirectional texture functions btfs at different levels of detail lods our method first decomposes each btf image into multiple subbands with laplacian pyramid each vector of laplacian coefficients of texel at the same level is regarded as laplacian bidirectional reflectance distribution function brdf these vectors are then further compressed by applying principal components analysis pca at the rendering stage the lod parameter for each pixel is calculated according to the distance from the viewpoint to the surface our rendering algorithm uses this parameter to determine how many levels of btf laplacian pyramid are required for rendering under the same sampling resolution btf gradually transits to brdf as the camera moves away from the surface our method precomputes this transition and uses it for multiresolution btf rendering our laplacian pyramid representation allows real time anti aliased rendering of btfs using graphics hardware in addition to provide visually satisfactory multiresolution rendering for btfs our method has comparable compression rate to the available single resolution btf compression techniques
the idea of using user preferences to assist with information filtering and with providing the most relevant answers to queries has recently received some attention from the research community this has resulted in the proposition of several frameworks for formulating preferences and their direct embedding into relational query languages in this paper we discuss major exploitation issues and privacy concerns inherent in the basic paradigm used by these proposed approaches when used with businesses not implicitly trusted by the user we then outline an alternative approach geared specifically towards using user preferences when interacting with businesses
when monitoring spatial phenomena which can often be modeled as gaussian processes gps choosing sensor locations is fundamental task there are several common strategies to address this task for example geometry or disk models placing sensors at the points of highest entropy variance in the gp model and or optimal design in this paper we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected we prove that the problem of finding the configuration that maximizes mutual information is np complete to address this issue we describe polynomial time approximation that is within of the optimum by exploiting the submodularity of mutual information we also show how submodularity can be used to obtain online bounds and design branch and bound search procedures we then extend our algorithm to exploit lazy evaluations and local structure in the gp yielding significant speedups we also extend our approach to find placements which are robust against node failures and uncertainties in the model these extensions are again associated with rigorous theoretical approximation guarantees exploiting the submodularity of the objective function we demonstrate the advantages of our approach towards optimizing mutual information in very extensive empirical study on two real world data sets
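The lazy-evaluation speedup mentioned above follows directly from submodularity: marginal gains can only shrink as the selected set grows, so stale gains are safe upper bounds. Below is a generic lazy-greedy sketch; the gain function is a placeholder for the mutual-information gain computed from the GP model in the paper, and the interface is our assumption.

```python
# Lazy greedy selection for a monotone submodular gain function: items whose
# cached (stale) gain is not maximal are skipped without re-evaluation.
import heapq

def lazy_greedy(candidates, gain, k):
    """gain(x, selected) must be monotone and submodular in `selected`."""
    selected = []
    # heap entries: (negated stale gain, round it was computed in, tiebreaker, item)
    heap = [(-gain(x, []), 0, i, x) for i, x in enumerate(candidates)]
    heapq.heapify(heap)
    while len(selected) < k and heap:
        neg_g, rnd, i, x = heapq.heappop(heap)
        if rnd == len(selected):          # gain is up to date -> true maximizer this round
            selected.append(x)
        else:                             # stale upper bound: recompute once, push back
            heapq.heappush(heap, (-gain(x, selected), len(selected), i, x))
    return selected
```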
information retrieval techniques have been traditionally exploited outside of relational database systems due to storage overhead the complexity of programming them inside the database system and their slow performance in sql implementations this project supports the idea that searching and querying digital libraries with information retrieval models in relational database systems can be performed with optimized sql queries and user defined functions in our research we propose several techniques divided into two phases storing and retrieving the storing phase includes executing document pre processing stop word removal and term extraction and the retrieval phase is implemented with three fundamental ir models the popular vector space model the okapi probabilistic model and the dirichlet prior language model we conduct experiments using article abstracts from the dblp bibliography and the acm digital library we evaluate several query optimizations compare the on demand and the static weighting approaches and we study the performance with conjunctive and disjunctive queries with the three ranking models our prototype proved to have linear scalability and satisfactory performance with medium sized document collections our implementation of the vector space model is competitive with the two other models
xml employs tree structured data model and naturally xml queries specify patterns of selection predicates on multiple elements related by tree structure finding all occurrences of such twig pattern in an xml database is core operation for xml query processing prior work has typically decomposed the twig pattern into binary structural parent child and ancestor descendant relationships and twig matching is achieved by using structural join algorithms to match the binary relationships against the xml database and ii stitching together these basic matches limitation of this approach for matching twig patterns is that intermediate result sizes can get large even when the input and output sizes are more manageable in this paper we propose novel holistic twig join algorithm twigstack for matching an xml query twig pattern our technique uses chain of linked stacks to compactly represent partial results to root to leaf query paths which are then composed to obtain matches for the twig pattern when the twig pattern uses only ancestor descendant relationships between elements twigstack is i/o and cpu optimal among all sequential algorithms that read the entire input it is linear in the sum of sizes of the input lists and the final result list but independent of the sizes of intermediate results we then show how to use modification of trees along with twigstack to match query twig patterns in sub linear time finally we complement our analysis with experimental results on range of real and synthetic data and query twig patterns
out of vocabulary lexicons including new words collocations as well as phrases are the key flesh of human language while an obstacle to machine translation but the translation of oov is quite difficult to obtain web mining solution to the oov translation is adopted in our research the basic assumption lies in that most of the oov’s translations exist on the web and search engines can provide many web pages containing the oov and corresponding translations we mine the translation from returned snippets of the search engine with expanded oov as the query term the difference of our method from other methods lies in that query classification is made before submitting to the search engine experiment shows our solution can discover the translation to many of the oovs with quite high precision
coherent read misses in shared memory multiprocessors account for substantial fraction of execution time in many important scientific and commercial workloads we propose temporal streaming to eliminate coherent read misses by streaming data to processor in advance of the corresponding memory accesses temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared memory access patterns temporal address correlation groups of shared addresses tend to be accessed together and in the same order and temporal stream locality recently accessed address streams are likely to recur we present practical design for temporal streaming we evaluate our design using combination of trace driven and cycle accurate full system simulation of cache coherent distributed shared memory system we show that temporal streaming can eliminate of coherent read misses in scientific applications and between and in database and web server workloads our design yields speedups of to in scientific applications and to in commercial workloads
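A toy software model (not the hardware design) of the streaming idea above: the global order of coherent read misses is logged, and when a miss address recurs, the addresses that followed it last time are replayed as stream candidates. Class and parameter names are ours.

```python
# Toy model of temporal streaming: exploit temporal address correlation and
# stream locality by replaying the recorded successors of a recurring miss.
from collections import defaultdict, deque

class TemporalStreamer:
    def __init__(self, stream_len=4):
        self.history = []                          # global miss-address log
        self.index = defaultdict(list)             # addr -> positions in the log
        self.stream_len = stream_len

    def on_miss(self, addr):
        predictions = deque()
        if self.index[addr]:                       # this address has missed before:
            last = self.index[addr][-1]            # replay what followed it last time
            predictions.extend(self.history[last + 1:last + 1 + self.stream_len])
        self.index[addr].append(len(self.history))
        self.history.append(addr)
        return list(predictions)                   # addresses to stream ahead of use
```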
we review the evolution of the nonparametric regression modeling in imaging from the local nadaraya watson kernel estimate to the nonlocal means and further to transform domain filtering based on nonlocal block matching the considered methods are classified mainly according to two main features local nonlocal and pointwise multipoint here nonlocal is an alternative to local and multipoint is an alternative to pointwise these alternatives though obvious simplifications allow us to impose fruitful and transparent classification of the basic ideas in the advanced techniques within this framework we introduce novel single and multiple model transform domain nonlocal approach the block matching and 3d filtering bm3d algorithm which is currently one of the best performing denoising algorithms is treated as special case of the latter approach
we propose an algorithm for the hierarchical aggregation of observations in dissemination based distributed traffic information systems instead of transmitting observed parameters directly we propose soft state sketches an extension of flajolet martin sketches as probabilistic approximation this data representation is duplicate insensitive trait that overcomes two central problems of existing aggregation schemes for vanet applications first when multiple aggregates of observations for the same area are available it is possible to combine them into an aggregate containing all information from the original aggregates this is fundamentally different from existing approaches where typically one of the aggregates is selected for further use while the rest is discarded second any observation or aggregate can be included into higher level aggregates regardless if it has already been previously directly or indirectly added those characteristics result in very flexible aggregate construction and high quality of the aggregates we demonstrate these traits of our approach by simulation study
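A minimal Flajolet-Martin-style sketch showing why the aggregation is duplicate insensitive: each observation deterministically sets one bit, so OR-merging two aggregates never double-counts repeated observations, no matter how often or along which paths they were folded in. The soft-state (aging) extension proposed in the paper is omitted; register and width constants are arbitrary choices for illustration.

```python
# Duplicate-insensitive counting sketch (Flajolet-Martin / PCSA style).
import hashlib

NUM_REGISTERS = 32
WIDTH = 24

def _lowest_set_bit(x):
    return (x & -x).bit_length() - 1 if x else WIDTH

def add(sketch, observation_id):
    h = int.from_bytes(hashlib.sha1(observation_id.encode()).digest(), "big")
    reg = h % NUM_REGISTERS
    bit = min(_lowest_set_bit(h // NUM_REGISTERS), WIDTH - 1)
    sketch[reg] |= (1 << bit)              # the same observation always sets the same bit

def merge(a, b):
    return [x | y for x, y in zip(a, b)]   # duplicate-insensitive union of aggregates

def estimate(sketch):
    # classic FM estimate: position of the lowest unset bit, averaged over registers
    r = [_lowest_set_bit(~s) for s in sketch]
    return NUM_REGISTERS * (2 ** (sum(r) / len(r))) / 0.77351

sketch = [0] * NUM_REGISTERS               # an empty aggregate
```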
grids are geographically distributed aggregates of resource nodes that support the provision of computing services including computing cycle simulation services data mining and data processing services grids span multiple management domains with different service provisioning strategies in this paper we present decentralised strategy of service discovery that utilises selective service capacity state dissemination and an experience based confidence model the model provides measure of the likelihood that discovery request forwarded to peer node would lead to match between the requested capacity and the node’s available service capacity the simulation results show that the proposed algorithm outperforms both the flooding as well as the random forwarding discovery schemes
intelligibility and control are important user concerns in context aware applications they allow user to understand why an application is behaving certain way and to change its behavior because of their importance to end users they must be addressed at an interface level however often the sensors or machine learning systems that users need to understand and control are created long before specific application is built or created separately from the application interface thus supporting interface designers in building intelligibility and control into interfaces requires application logic and underlying infrastructure to be exposed in some structured fashion as context aware infrastructures do not provide generalized support for this we extended one such infrastructure with situations components that appropriately exposes application logic and supports debugging and simple intelligibility and control interfaces while making it easier for an application developer to build context aware applications and facilitating designer access to application state and behavior we developed support for interface designers in visual basic and flash we demonstrate the usefulness of this support through an evaluation of programmers an evaluation of the usability of the new infrastructure with interface designers and the augmentation of three common context aware applications
efficient algorithms for incrementally computing nested query expressions do not exist nested query expressions are query expressions in which selection join predicates contain subqueries in order to respond to this problem we propose two step strategy for incrementally computing nested query expressions in step the query expression is transformed into an equivalent unnested flat query expression in step the flat query expression is incrementally computed to support step we have developed very concise algebra to algebra transformation algorithm and we have formally proved its correctness the flat query expressions resulting from the transformation make intensive use of the relational set difference operator to support step we present and analyze an efficient algorithm for incrementally computing set differences based on view pointer caches when combined with existing incremental algorithms for spj queries our incremental set difference algorithm can be used to compute the unnested flat query expressions efficiently it is important to notice that without our incremental set difference algorithm the existing incremental algorithms for spj queries are useless for any query involving the set difference operator including queries that are not the result of unnesting nested queries
application specific safe message handlers ashs are designed to provide applications with hardware level network performance ashs are user written code fragments that safely and efficiently execute in the kernel in response to message arrival ashs can direct message transfers thereby eliminating copies and send messages thereby reducing send response latency in addition the ash system provides support for dynamic integrated layer processing thereby eliminating duplicate message traversals and dynamic protocol composition thereby supporting modularity ashs provide this high degree of flexibility while still providing network performance as good as or if they exploit application specific knowledge even better than hard wired in kernel implementations combination of user level microbenchmarks and end to end system measurements using tcp demonstrate the benefits of the ash system
taking the temporal dimension into account in searching ie using time of content creation as part of the search condition is now gaining increasingly interest however in the case of web search and web warehousing the timestamps time of creation or creation of contents of web pages and documents found on the web are in general not known or can not be trusted and must be determined otherwise in this paper we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on temporal language model through number of experiments on temporal document collections we show how our new methods improve the accuracy of timestamping compared to the previous models
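A hedged sketch of the baseline temporal-language-model idea that the paper builds on: partition a reference corpus by time period, build a smoothed unigram model per period, and assign a document the period whose model gives its words the highest log-likelihood. The paper's enhancements to this baseline are not reproduced; function names and the Dirichlet smoothing choice are our assumptions.

```python
# Timestamping a document against per-period unigram language models.
import math
from collections import Counter

def build_models(partitioned_docs):
    """partitioned_docs: {period: [token lists]} -> {period: (Counter, total tokens)}"""
    models = {}
    for period, docs in partitioned_docs.items():
        counts = Counter(tok for doc in docs for tok in doc)
        models[period] = (counts, sum(counts.values()))
    return models

def log_likelihood(tokens, model, vocab_size, mu=2000):
    counts, total = model
    # Dirichlet-smoothed unigram log-likelihood of the document under this period
    return sum(math.log((counts[t] + mu / vocab_size) / (total + mu)) for t in tokens)

def estimate_timestamp(tokens, models, vocab_size):
    return max(models, key=lambda p: log_likelihood(tokens, models[p], vocab_size))
```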
we consider the problem of duplicate document detection for search evaluation given query and small number of web results for that query we show how to detect duplicate web documents with precision and recall in contrast charikar’s algorithm designed for duplicate detection in an indexing pipeline achieves precision but with recall of our improvement in recall while maintaining high precision comes from combining three ideas first because we are only concerned with duplicate detection among results for the same query the number of pairwise comparisons is small therefore we can afford to compute multiple pairwise signals for each pair of documents model learned with standard machine learning techniques improves recall to with precision second most duplicate detection has focused on text analysis of the html contents of document in some web pages the html is not good indicator of the final contents of the page we use extended fetching techniques to fill in frames and execute java script including signals based on our richer fetches further improves the recall to and the precision to finally we also explore using signals based on the query comparing contextual snippets based on the richer fetches improves the recall to we show that the overall accuracy of this final model approaches that of human judges
we present machine translation framework that can incorporate arbitrary features of both input and output sentences the core of the approach is novel decoder based on lattice parsing with quasi synchronous grammar smith and eisner syntactic formalism that does not require source and target trees to be isomorphic using generic approximate dynamic programming techniques this decoder can handle non local features similar approximate inference techniques support efficient parameter estimation with hidden variables we use the decoder to conduct controlled experiments on german to english translation task to compare lexical phrase syntax and combined models and to measure effects of various restrictions on non isomorphism
in this paper we propose simple and highly robust point matching method named graph transformation matching gtm relying on finding consensus nearest neighbour graph emerging from candidate matches the method iteratively eliminates dubious matches in order to obtain the consensus graph the proposed technique is compared against both the softassign algorithm and combination of ransac and epipolar constraint among these three techniques gtm demonstrates to yield the best results in terms of elimination of outliers the algorithm is shown to be able to deal with difficult cases such as duplication of patterns and non rigid deformations of objects an execution time comparison is also presented where gtm shows to be also superior to ransac for high outlier rates in order to improve the performance of gtm for lower outlier rates we present an optimised version of the algorithm lastly gtm is successfully applied in the context of constructing mosaics of retinal images where feature points are extracted from properly segmented binary images similarly the proposed method could be applied to number of other important applications
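A hedged sketch of the consensus-graph filtering loop: build k-nearest-neighbour graphs over the matched points of each image, compare their adjacency matrices, and repeatedly drop the correspondence that disagrees most until the two graphs coincide. Details of the original GTM method (such as median-distance edge pruning) are simplified away, and all names are illustrative.

```python
# Iterative elimination of dubious matches via k-NN graph consensus.
import numpy as np

def knn_adjacency(points, k):
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    adj = np.zeros_like(d, dtype=int)
    for i in range(len(pts)):
        adj[i, np.argsort(d[i])[:k]] = 1   # directed edge to the k nearest points
    return adj

def gtm_filter(pts1, pts2, k=4):
    """pts1[i] <-> pts2[i] are candidate matches; returns indices of matches kept."""
    keep = list(range(len(pts1)))
    while len(keep) > k + 1:
        a = knn_adjacency([pts1[i] for i in keep], k)
        b = knn_adjacency([pts2[i] for i in keep], k)
        r = np.abs(a - b)                  # structural disagreement between the graphs
        if not r.any():                    # consensus graph reached
            break
        worst = int(np.argmax(r.sum(axis=0) + r.sum(axis=1)))
        keep.pop(worst)                    # drop the most dubious correspondence
    return keep
```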
what are the key factors that contribute to the success of hypermedia development tool we have investigated this issue in the context of non ict professional environments eg schools or small museums which have limited in house technical competences and must cope with very limited budget we discuss set of success factors relevant to hypermedia tools targeted to this audience and present tool for multichannel hypermedia development that we have developed with these factors in mind we report the key results from wide on the field study in which the different success factors have been measured
modular analyses of object oriented programs need clear encapsulation boundaries between program components the reference semantics of object oriented languages complicates encapsulation ownership type systems are way to guarantee encapsulation however they introduce substantial and nontrivial annotation overhead for the programmer this is in particular true for type systems with an access policy that is more flexible than owners as dominators as we want to use ownership disciplines as basis for modular analyses we need the flexibility however to keep it practical the annotation overhead should be kept minimal in this paper we present such flexible ownership type system together with an inference technique to reduce the annotation overhead runtime components in our approach can be accessed via the interface of the owner as well as via other boundary objects with explicitly declared interface types the resulting type system is quite complex however the programmer only has to annotate the interface types of component the ownership type information for the classes implementing the components is automatically inferred by constraint based algorithm we proved the soundness of our approach for java like core language
advances in hardware capacity especially devices such as cameras and displays are driving the development of applications like high definition video conferencing that have tight timing and cpu requirements unfortunately current operating systems do not adequately provide the timing response needed by these applications in this paper we present hierarchical scheduling model that aims to provide these applications with tight timing response while at the same time preserve the strengths of current schedulers namely fairness and efficiency our approach called cooperative polling consists of an application level event scheduler and kernel thread scheduler that cooperate to dispatch time constrained application events accurately and with minimal kernel preemption while still ensuring rigorously that all applications share resources fairly fairness is enforced in flexible manner allowing sharing according to mixture of both traditional resource centric metrics and new application centric metrics the latter being critical to support graceful application level adaptation in overload unlike traditional real time systems our model does not require specification or estimation of resource requirements simplifying its usage dramatically our evaluation using an adaptive video application and graphics server shows that our system has event dispatch accuracies that are one to two orders of magnitude smaller than are achieved by existing schedulers at the same time our scheduler still maintains fairness and has low overhead
in optimizing compilers data structure choices directly influence the power and efficiency of practical program optimization poor choice of data structure can inhibit optimization or slow compilation to the point that advanced optimization features become undesirable recently static single assignment form and the control dependence graph have been proposed to represent data flow and control flow properties of programs each of these previously unrelated techniques lends efficiency and power to useful class of program optimizations although both of these structures are attractive the difficulty of their construction and their potential size have discouraged their use we present new algorithms that efficiently compute these data structures for arbitrary control flow graphs the algorithms use dominance frontiers new concept that may have other applications we also give analytical and experimental evidence that all of these data structures are usually linear in the size of the original program this paper thus presents strong evidence that these structures can be of practical use in optimization
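Since the dominance frontier is the enabling concept here, a small sketch may help. This is the compact formulation later popularized by Cooper, Harvey and Kennedy rather than the paper's own presentation: a join point b belongs to the dominance frontier of every node on the walk from each of b's predecessors up to, but not including, b's immediate dominator.

```python
# Dominance frontier computation over a control-flow graph, given predecessor
# lists and immediate dominators; phi-functions for a variable are then placed
# at the iterated dominance frontier of the blocks assigning to it.
def dominance_frontiers(preds, idom):
    """preds: {node: [predecessor nodes]} covering every node; idom: {node: immediate dominator}."""
    df = {n: set() for n in preds}
    for b, ps in preds.items():
        if len(ps) < 2:
            continue                      # only join points generate frontier entries
        for p in ps:
            runner = p
            while runner != idom[b]:      # walk up the dominator tree from each predecessor
                df[runner].add(b)
                runner = idom[runner]
    return df
```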
this paper addresses the problem of recovering variable like entities when analyzing executables in the absence of debugging information we show that variable like entities can be recovered by iterating value set analysis vsa combined numeric analysis and pointer analysis algorithm and aggregate structure identification an algorithm to identify the structure of aggregates our initial experiments show that the technique is successful in correctly identifying of the local variables and of the fields of heap allocated objects previous techniques recovered of the local variables but of the fields of heap allocated objects moreover the values computed by vsa using the variables recovered by our algorithm would allow any subsequent analysis to do better job of interpreting instructions that use indirect addressing to access arrays and heap allocated data objects indirect operands can be resolved better at to of the sites of writes and up to of the sites of reads these are the memory access operations for which it is the most difficult for an analyzer to obtain useful results
wireless mesh networks wmns support the cost effective broadband access for internet users although today’s ieee phy and mac specifications provide multi channel and multi rate capabilities exploiting available channels and data rates is critical issue to guarantee high network performance in multi rate wireless networks high rate links heavily suffer from performance degradation due to the presence of low rate links this problem is often referred to as performance anomaly in this paper we propose rate based channel assignment rb ca protocol to alleviate performance anomaly by using multiple channels in wmns rb ca exploits new metric called prioritized transmission time ptt in order to form high rate multi channel paths hmps with hmps large volume of traffics can be simultaneously delivered from or to the internet via multiple non overlapping channels as well as high rate links extensive ns simulations and experiments in real test bed have been performed to evaluate the performance of rb ca and then we compared it with well known existing wmn architecture our simulation and experimental results show that rb ca achieves improved performance in terms of aggregate throughput packet delivery ratio end to end delay and fairness
most if not all state of the art complete sat solvers are complex variations of the dpll procedure described in the early published descriptions of these modern algorithms and related data structures are given either as high level state transition systems or informally as pseudo programming language code the former although often accompanied with informal correctness proofs are usually very abstract and do not specify many details crucial for efficient implementation the latter usually do not involve any correctness argument and the given code is often hard to understand and modify this paper aims to bridge this gap by presenting sat solving algorithms that are formally proved correct and also contain information required for efficient implementation we use tutorial top down approach and develop sat solver starting from simple design that is subsequently extended step by step with requisite series of features the heuristic parts of the solver are abstracted away since they usually do not affect solver correctness although they are very important for efficiency all algorithms are given in pseudo code and are accompanied with correctness conditions given in hoare logic style the correctness proofs are formalized within the isabelle theorem proving system and are available in the extended version of this paper the given pseudo code served as basis for our sat solver argo sat
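In the spirit of the paper's step-by-step development, a minimal recursive DPLL skeleton is shown below: unit propagation followed by splitting on an unassigned variable. Watched literals, heuristics and clause learning are deliberately omitted; the clause encoding (lists of signed integers) is our choice, not the paper's formalization.

```python
# Minimal DPLL: clauses are lists of nonzero integers (negative = negated literal).
def unit_propagate(clauses, assignment):
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                       # clause already satisfied
            unassigned = [l for l in clause if l not in assignment and -l not in assignment]
            if not unassigned:
                return None                    # all literals false -> conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause forces its literal
                changed = True
    return assignment

def dpll(clauses, assignment=frozenset()):
    assignment = unit_propagate(clauses, set(assignment))
    if assignment is None:
        return None
    free = {abs(l) for c in clauses for l in c} - {abs(l) for l in assignment}
    if not free:
        return assignment                      # every variable decided -> model found
    v = next(iter(free))
    for lit in (v, -v):                        # split: try v true, then v false
        model = dpll(clauses, assignment | {lit})
        if model is not None:
            return model
    return None

# example: (x1 or x2) and (not x1 or x3) and (not x3)
print(dpll([[1, 2], [-1, 3], [-3]]))
```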
we introduce nonparametric model for sensitivity estimation which relies on generating points similar to the prediction point using its nearest neighbors unlike most previous work the sampled points differ simultaneously in multiple dimensions from the prediction point in manner dependent on the local density our approach is based on an intuitive idea of locality which uses the voronoi cell around the prediction point ie all points whose nearest neighbor is the prediction point we demonstrate how an implicit density over this neighborhood can be used in order to compute relative estimates of the local sensitivity the resulting estimates demonstrate improved performance when used in classifier combination and classifier recalibration as well as being potentially useful in active learning and variety of other problems
this paper presents general class of dynamic stochastic optimization problems we refer to as stochastic depletion problems number of challenging dynamic optimization problems of practical interest are stochastic depletion problems optimal solutions for such problems are difficult to obtain both from pragmatic computational perspective as well as from theoretical perspective as such simple heuristics are desirable we isolate two simple properties that if satisfied by problem within this class guarantee that myopic policy incurs performance loss of at most relative to the optimal adaptive control policy for that problem we are able to verify that these two properties are satisfied for several interesting families of stochastic depletion problems and as consequence we identify computationally efficient approximations to optimal control policies for number of interesting dynamic stochastic optimization problems
ranking plays central role in many web search and information retrieval applications ensemble ranking sometimes called meta search aims to improve the retrieval performance by combining the outputs from multiple ranking algorithms many ensemble ranking approaches employ supervised learning techniques to learn appropriate weights for combining multiple rankers the main shortcoming with these approaches is that the learned weights for ranking algorithms are query independent this is suboptimal since ranking algorithm could perform well for certain queries but poorly for others in this paper we propose novel semi supervised ensemble ranking sser algorithm that learns query dependent weights when combining multiple rankers in document retrieval the proposed sser algorithm is formulated as an svm like quadratic program qp and therefore can be solved efficiently by taking advantage of optimization techniques that were widely used in existing svm solvers we evaluated the proposed technique on standard document retrieval testbed and observed encouraging results by comparing to number of state of the art techniques
novel method for projecting points onto point cloud possibly with noise is presented based on the point directed projection dp algorithm proposed by azariadis sapidis drawing curves onto cloud of points for point based modelling computer aided design the new method operates directly on the point cloud without any explicit or implicit surface reconstruction procedure the presented method uses simple robust and efficient algorithm least squares projection lsp which projects points onto the point cloud in least squares sense without any specification of the projection vector the main contribution of this novel method is the automatic computation of the projection vector furthermore we demonstrate the effectiveness of this approach through number of application examples including thinning point cloud point normal estimation projecting curves onto point cloud and others
mobile phones have the potential to be useful agents for their owners by detecting and reporting situations that are of interest several challenges emerge in the case of detecting and reporting nice to know situations being alerted of these events may not be of critical importance but may be useful if the user is not busy for detection the precision of sensing must be high enough to minimize annoying false notifications despite the constraints imposed by the inaccuracy of commodity sensors and the limited battery power available on mobile phones for reporting the notifications cannot be too obtrusive to the user or those in the vicinity peripheral cues are appropriate for conveying information like proximity but have been studied primarily in settings like offices where sensors and cueing mechanisms can be controlled we explore these issues through the design of peopletones buddy proximity application for mobile phones we contribute an algorithm for detecting proximity techniques for reducing sensor noise and power consumption and method for generating peripheral cues empirical measurements demonstrate the precision and recall characteristics of our proximity algorithm two week study of three groups of friends using peopletones shows that our techniques were effective enabling the study of how people respond to peripheral cues in the wild our qualitative findings underscore the importance of cue selection and personal control for peripheral cues
recent work in query optimization has addressed the issue of placing expensive predicates in query plan in this paper we explore the predicate placement options considered in the montage dbms presenting family of algorithms that form successively more complex and effective optimization solutions through analysis and performance measurements of montage sql queries we classify queries and highlight the simplest solution that will optimize each class correctly we demonstrate limitations of previously published algorithms and discuss the challenges and feasibility of implementing the various algorithms in commercial grade system
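One of the simplest building blocks in this line of work is ordering a set of independent expensive predicates by rank; the sketch below illustrates that ordering and its expected per-tuple cost. It is only an illustration of the ordering step drawn from the standard expensive-predicate literature, not the placement algorithms the paper compares, and nothing here is specific to the montage system.

```python
# Order independent predicates by rank = (selectivity - 1) / cost_per_tuple,
# which minimizes the expected evaluation cost over a stream of tuples.
def order_predicates(predicates):
    """predicates: list of dicts with 'name', 'selectivity' in (0, 1], 'cost' > 0."""
    return sorted(predicates, key=lambda p: (p["selectivity"] - 1.0) / p["cost"])

def expected_cost_per_tuple(ordered):
    cost, survivors = 0.0, 1.0
    for p in ordered:
        cost += survivors * p["cost"]          # only surviving tuples pay this predicate
        survivors *= p["selectivity"]
    return cost

preds = [
    {"name": "cheap_unselective", "selectivity": 0.9, "cost": 1.0},
    {"name": "expensive_selective", "selectivity": 0.1, "cost": 100.0},
]
ordered = order_predicates(preds)
print([p["name"] for p in ordered], expected_cost_per_tuple(ordered))
```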
research aims to facilitate collaboration across time and distance researchers need techniques and tools to support their collaborative work groupware is one technique that supports groups of people engaging in common task over the network besides it is also one of the most effective means to solve the collaboration problem existing groupware projects provide fixed functions such as messaging conferencing electronic meeting document management document collaboration and so on however they put limited emphasis on scientists research work in their specific fields this paper proposes groupware environment and tries to give domain specific group editor to facilitate researchers collaboration the groupware implements visual molecule group editor for chemists to co edit molecular structures over network and it has plug in extensible architecture intending to easily integrate other tools which is useful for chemists collaboration the idea given in this paper could be possible solution to facilitate chemists collaborative research work
an algorithm is proposed for face recognition in the presence of varied facial expressions it is based on combining the match scores from matching multiple overlapping regions around the nose experimental results are presented using the largest database employed to date in face recognition studies over scans of subjects results show substantial improvement over matching the shape of single larger frontal face region this is the first approach to use multiple overlapping regions around the nose to handle the problem of expression variation
current approaches for answering queries with imprecise constraints require users to provide distance metrics and importance measures for attributes of interest in this paper we focus on providing domain and end user independent solution for supporting imprecise queries over web databases without affecting the underlying database we propose query processing framework that integrates techniques from ir and database research to efficiently determine answers for imprecise queries we mine and use approximate functional dependencies between attributes to create precise queries having tuples relevant to the given imprecise query an approach to automatically estimate the semantic distances between values of categorical attributes is also proposed we provide preliminary results showing the utility of our approach
huge number of documents that were only available in libraries are now on the web the web access is solution to protect the cultural heritage and to facilitate knowledge transmission most of these documents are displayed as images of the original paper pages and are indexed by hand in this paper we present how and why document image analysis contributes to build the digital libraries of the future readers expect human centred interactive reading stations which imply the production of hyperdocuments to fit the reader’s intentions and needs image analysis allows extracting and categorizing the meaningful document components and relationships it also provides readers adapted visualisation of the original images document image analysis is an essential prerequisite to enrich hyperdocuments that support content based readers activities such as information seeking and navigation this paper focuses the function of the original image reference for the reader and the input data that are processed to automatically detect what makes sense in document
support for object oriented programming has become an integral part of mainstream languages and more recently generic programming has gained widespread acceptance as well natural question is how these two paradigms and their underlying language mechanisms should interact one particular design option that of using subtyping to constrain the type parameters of generic functions has been chosen in the generics of java and those planned for future revision of certain shortcomings have previously been identified in using subtyping for constraining parametric polymorphism in the context of generic programming to address these we propose extending object oriented interfaces and subtyping to include associated types and constraint propagation associated types are type members of interfaces and classes constraint propagation allows certain constraints on type parameters to be inferred from other constraints on those parameters and their use in base class type expressions the paper demonstrates these extensions in the context of with generics describes translation of the extended features to and presents formalism proving their safety the formalism is applicable to other mainstream object oriented languages supporting bounded polymorphism such as java
decade ago as wireless sensor network research took off many researchers in the field denounced the use of ip as inadequate and in contradiction to the needs of wireless sensor networking since then the field has matured standard links have emerged and ip has evolved in this paper we present the design of complete ipv6 based network architecture for wireless sensor networks we validate the architecture with production quality implementation that incorporates many techniques pioneered in the sensor network community including duty cycled link protocols header compression hop by hop forwarding and efficient routing with effective link estimation in addition to providing interoperability with existing ip devices this implementation was able to achieve an average duty cycle of average per hop latency of ms and data reception rate of over period of weeks in real world home monitoring application where each node generates one application packet per minute our results outperform existing systems that do not adhere to any particular standard or architecture in light of this demonstration of full ipv6 capability we review the central arguments that led the field away from ip we believe that the presence of an architecture specifically an ipv6 based one provides strong foundation for wireless sensor networks going forward
we introduce transformational approach to improve the first stage of offline partial evaluation of functional programs the so called binding time analysis bta for this purpose we first introduce an improved defunctionalization algorithm that transforms higher order functions into first order ones so that existing techniques for termination analysis and propagation of binding times of first order programs can be applied then we define another transformation tailored to defunctionalized programs that allows us to get the accuracy of polyvariant bta from monovariant bta over the transformed program finally we show summary of experimental results that demonstrate the usefulness of our approach
we present novel graph embedding to speed up distance range and nearest neighbor queries on static and or dynamic objects located on weighted graph that is applicable also for very large networks our method extends an existing embedding called reference node embedding which can be used to compute accurate lower and upper bounding filters for the true shortest path distance in order to solve the problem of high storage cost for the network embedding we propose novel concept called hierarchical embedding that scales well to very large traffic networks our experimental evaluation on several real world data sets demonstrates the benefits of our proposed concepts ie efficient query processing and reduced storage cost over existing work
the protection of privacy is an increasing concern in our networked society because of the growing amount of personal information that is being collected by number of commercial and public services emerging scenarios of user service interactions in the digital world are then pushing toward the development of powerful and flexible privacy aware models and languages this paper aims at introducing concepts and features that should be investigated to fulfill this demand we identify different types of privacy aware policies access control release and data handling policies the access control policies govern access release of data services managed by the party as in traditional access control and release policies govern release of personal identifiable information pii of the party and specify under which conditions it can be disclosed the data handling policies allow users to specify and communicate to other parties the policy that should be enforced to deal with their data we also discuss how data handling policies can be integrated with traditional access control systems and present privacy control module in charge of managing integrating and evaluating access control release and data handling policies
in this paper we present visual framework developed as an eclipse plug in to define and execute reverse engineering processes aimed at comprehending traditional and web based information systems processes are defined in terms of uml activity diagrams where predefined or newly developed software components can be associated with each activity components implemented using either traditional programming languages or software environments for data analysis ie matlab or can be reused once the process has been fully defined the software engineer executes it to reverse engineer and comprehend software systems the proposed visual framework has been evaluated on two case studies
this work presents the results of comparative study in which we investigate the ways manipulation of physical versus digital media are fundamentally different from one another participants carried out both puzzle task and photo sorting task in two different modes in physical dimensional space and on multi touch interactive tabletop in which the digital items resembled their physical counterparts in terms of appearance and behavior by observing the interaction behaviors of participants we explore the main differences and discuss what this means for designing interactive surfaces which use aspects of the physical world as design resource
scatterplots and parallel coordinate plots pcps can both be used to assess correlation visually in this paper we compare these two visualization methods in controlled user experiment more specifically participants were asked to report observed correlation as function of the sample correlation under varying conditions of visualization method sample size and observation time statistical model is proposed to describe the correlation judgment process the accuracy and the bias in the judgments in different conditions are established by interpreting the parameters in this model discriminability index is proposed to characterize the performance accuracy in each experimental condition moreover statistical test is applied to derive whether or not the human sensation scale differs from theoretically optimal that is unbiased judgment scale based on these analyses we conclude that users can reliably distinguish twice as many different correlation levels when using scatterplots as when using pcps we also find that there is bias towards reporting negative correlations when using pcps therefore we conclude that scatterplots are more effective than parallel plots in supporting visual correlation analysis
the logic fo id uses ideas from the field of logic programming to extend first order logic with non monotone inductive definitions the goal of this paper is to extend gentzen’s sequent calculus to obtain deductive inference method for fo id the main difficulty in building such proof system is the representation and inference of unfounded sets it turns out that we can represent unfounded sets by least fixpoint expressions borrowed from stratified least fixpoint logic slfp which is logic with least fixpoint operator and characterizes the expressibility of stratified logic programs therefore in this paper we integrate least fixpoint expressions into fo id and define the logic fo id slfp we investigate sequent calculus for fo id slfp which extends the sequent calculus for slfp with inference rules for the inductive definitions of fo id we show that this proof system is sound with respect to slightly restricted fragment of fo id and complete for more restricted fragment of fo id
argumentation is the process by which arguments are constructed and handled argumentation constitutes major component of human intelligence the ability to engage in argumentation is essential for humans to understand new problems to perform scientific reasoning to express to clarify and to defend their opinions in their daily lives argumentation mining aims to detect the arguments presented in text document the relations between them and the internal structure of each individual argument in this paper we analyse the main research questions when dealing with argumentation mining and the different methods we have studied and developed in order to successfully confront the challenges of argumentation mining in legal texts
the complexity raised in modern software systems seems to be no longer affordable in terms of the abstractions and methodologies promoted by traditional approaches to computer science and software engineering and radically new approaches are required this paper focuses on the problem of engineering the motion coordination of large scale multi agent system and proposes an approach that takes inspiration from the laws of physics our idea is to have the movements of agents driven by force fields generated by the agents themselves and propagated via some infrastructure or by the agents in an ad hoc way globally coordinated and self organized behavior in the agent’s movements can then emerge due to the interrelated effects of agents following the shape of the fields and dynamic fields re shaping the approach is presented and its effectiveness described with regard to concrete case study in the area of urban traffic management
pram parallel random access model has been widely regarded as desirable parallel machine model for many years but it is also believed to be impossible in reality as the new billion transistor processor era begins the explicit multi threading xmt pram on chip project is attempting to design an on chip parallel processor that efficiently supports pram algorithms this paper presents the first prototype of the xmt architecture that incorporates simple in order processors operating at mhz the microarchitecture of the prototype is described and the performance is studied with respect to some micro benchmarks using cycle accurate emulation the projected performance of an mhz xmt asic processor is compared with amd opteron ghz which uses similar area as would processor asic version of the xmt prototype the results suggest that an only mhz xmt asic system outperforms amd opteron ghz with speedups ranging between and
this paper develops fuzzy constraint based model for bilateral multi issue negotiation in trading environments in particular we are concerned with the principled negotiation approach in which agents seek to strike fair deal for both parties but which nevertheless maximises their own payoff thus there are elements of both competition and cooperation in the negotiation hence semi competitive environments one of the key intuitions of the approach is that there is often more than one option that can satisfy the interests of both parties so if the opponent cannot accept an offer then the proponent should endeavour to find an alternative that is equally acceptable to it but more acceptable to the opponent that is the agent should make trade off only if such trade off is not possible should the agent make concession against this background our model ensures the agents reach deal that is fair pareto optimal for both parties if such solution exists moreover this is achieved by minimising the amount of private information that is revealed the model uses prioritised fuzzy constraints to represent trade offs between the different possible values of the negotiation issues and to indicate how concessions should be made when they are necessary also by using constraints to express negotiation proposals the model can cover the negotiation space more efficiently since each exchange covers region rather than single point which is what most existing models deal with in addition by incorporating the notion of reward into our negotiation model the agents can sometimes reach agreements that would not otherwise be possible
semantic portal is the next generation of web portals that are powered by semantic web technologies for improved information sharing and exchange for community of users current methods of searching in semantic portals are limited to keyword based search using information retrieval ir techniques ontology based formal query and reasoning or simple combination of the two in this paper we propose an enhanced model that tightly integrates ir with formal query and reasoning to fully utilize both textual and semantic information for searching in semantic portals the model extends the search capabilities of existing methods and can answer more complex search requests the ideas in fuzzy description logic dl ir model and formal dl query method are employed and combined in our model based on the model semantic search service is implemented and evaluated the evaluation shows very large improvements over existing methods
in distributed proof construction systems information release policies can make it unlikely that any single node in the system is aware of the complete structure of any particular proof tree this property makes it difficult for queriers to determine whether the proofs constructed using these protocols sampled consistent snapshot of the system state this has previously been shown to have dire consequences in decentralized authorization systems unfortunately the consistency enforcement solutions presented in previous work were designed for systems in which only information encoded in certificates issued by certificate authorities is used during the decision making process further they assume that each piece of certified evidence used during proof construction is available to the decision making node at runtime in this paper we generalize these previous results and present lightweight mechanisms through which consistency constraints can be enforced in proof systems in which the full details of proof may be unavailable to the querier and the existence of certificate authorities for certifying evidence is unlikely these types of distributed proof systems are likely candidates for use in pervasive computing and sensor network environments we present modifications to one such distributed proof system that enable two types of consistency constraints to be enforced while still respecting the same confidentiality and integrity policies as the original proof system further we detail performance analysis that illustrates the modest overheads of our consistency enforcement schemes
government transformation is new term used to signify practices undertaken by governments in order to change their processes and services towards electronic automation as services are being transformed every day in many countries the involved stakeholders are in urgent need of introducing and utilizing powerful instruments to facilitate and organize service composition and provision this paper presents the inspiration of conceptual model capable of originating an effective scalable government ontology and further on the implementation of an ontology based repository for designing modelling and even reengineering of governmental services pilot tested in the greek government
proof carrying code pcc allows code producer to provide host with program along with its formal safety proof the proof attests to certain safety policy enforced by the code and can be mechanically checked by the host while this language based approach to code certification is very general in principle existing pcc systems have only focused on programs whose safety proofs can be automatically generated as result many low level system libraries eg memory management have not yet been handled in this paper we explore complementary approach in which general properties and program correctness are semi automatically certified in particular we introduce low level language cap for building certified programs and present certified library for dynamic storage allocation
motion blur is crucial for high quality rendering but is also very expensive our first contribution is frequency analysis of motion blurred scenes including moving objects specular reflections and shadows we show that motion induces shear in the frequency domain and that the spectrum of moving scenes can be approximated by wedge this allows us to compute adaptive space time sampling rates to accelerate rendering for uniform velocities and standard axis aligned reconstruction we show that the product of spatial and temporal bandlimits or sampling rates is constant independent of velocity our second contribution is novel sheared reconstruction filter that is aligned to the first order direction of motion and enables even lower sampling rates we present rendering algorithm that computes sheared reconstruction filter per pixel without any intermediate fourier representation this often permits synthesis of motion blurred images with far fewer rendering samples than standard techniques require
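a minimal one dimensional illustration of the shear described above, written in latex under the assumption of uniform velocity v and an ideal exposure, not the paper's full derivation:

```latex
% A 1D signal translating with uniform velocity v:
%   g(x,t) = f(x - v t)
% its space-time Fourier transform concentrates on a sheared line,
% which finite shutter time and finite object support thicken into a wedge.
\[
  g(x,t) = f(x - v t)
  \;\Longrightarrow\;
  \hat{g}(\Omega_x,\Omega_t) \;=\; 2\pi\,\hat{f}(\Omega_x)\,
  \delta\!\left(\Omega_t + v\,\Omega_x\right)
\]
```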
an increasing number of novel applications produce rich set of different data types that need to be managed efficiently and coherently in this article we present our experience with designing and implementing data management infrastructure for distributed immersive performance dip application the dip project investigates versatile framework for the capture recording and replay of video audio and midi musical instrument digital interface streams in an interactive environment for collaborative music performance we are focusing on two classes of data streams that are generated within this environment the first category consists of high resolution isochronous media streams namely audio and video the second class comprises midi data produced by electronic instruments midi event sequences are alphanumeric in nature and fall into the category of the data streams that have been of interest to data management researchers in recent years we present our data management architecture which provides repository for all dip data streams of both categories need to be acquired transmitted stored and replayed in real time data items are correlated across different streams with temporal indices the audio and video streams are managed in our own high performance data recording architecture hydra which integrates multistream recording and retrieval in consistent manner this paper reports on the practical issues and challenges that we encountered during the design implementation and experimental phases of our prototype we also present some analysis results and discuss future extensions for the architecture
in this paper we present our initial design and implementation of declarative network verifier dnv dnv utilizes theorem proving well established verification technique where logic based axioms that automatically capture network semantics are generated and user driven proof process is used to establish network correctness properties dnv takes as input declarative networking specifications written in the network datalog ndlog query language and maps that automatically into logical axioms that can be directly used in existing theorem provers to validate protocol correctness dnv is significant improvement compared to existing use case of theorem proving which typically require several man months to construct the system specifications moreover ndlog high level specification whose semantics are precisely compiled into dnv without loss can be directly executed as implementations hence bridging specifications verification and implementation to validate the use of dnv we present case studies using dnv in conjunction with the pvs theorem prover to verify routing protocols including eventual properties of protocols in dynamic settings
our world today is generating huge amounts of graph data such as social networks biological networks and the semantic web many of these real world graphs are edge labeled graphs ie each edge has label that denotes the relationship between the two vertices connected by the edge fundamental research problem on these labeled graphs is how to handle the label constraint reachability query can vertex reach vertex through path whose edge labels are constrained by set of labels in this work we introduce novel tree based index framework which utilizes the directed maximal weighted spanning tree algorithm and sampling techniques to maximally compress the generalized transitive closure for the labeled graphs an extensive experimental evaluation on both real and synthetic datasets demonstrates the efficiency of our approach in answering label constraint reachability queries
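for contrast, a minimal python sketch of the naive per query baseline that such a tree based index is meant to avoid, a breadth first search restricted to the allowed edge labels, with the graph layout and names being assumptions:

```python
from collections import deque

def label_constrained_reachable(graph, src, dst, allowed_labels):
    """Baseline answer to: can src reach dst using only edges whose labels
    are in allowed_labels?  graph: {u: [(v, label), ...]} adjacency lists."""
    allowed = set(allowed_labels)
    seen, frontier = {src}, deque([src])
    while frontier:
        u = frontier.popleft()
        if u == dst:
            return True
        for v, lab in graph.get(u, ()):
            if lab in allowed and v not in seen:
                seen.add(v)
                frontier.append(v)
    return False
```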
vendors have widely adopted rbac to manage user access to computer resources in various products including database management systems however as this analysis shows the standard is hindered by limitations errors and design flaws
xml’s increasing diffusion makes efficient xml query processing and indexing all the more critical given the semistructured nature of xml documents however general query processing techniques won’t work researchers have proposed several specialized indexing methods that offer query processors efficient access to xml documents although none are yet fully implemented in commercial products the classification of xml indexing techniques in this article identifies current practices and trends offering insight into how developers can improve query processing and select the best solution for particular contexts
we present hashcache configurable cache storage engine designed to meet the needs of cache storage in the developing world with the advent of cheap commodity laptops geared for mass deployments developing regions are poised to become major users of the internet and given the high cost of bandwidth in these parts of the world they stand to gain significantly from network caching however current web proxies are incapable of providing large storage capacities while using small resource footprints requirement for the integrated multi purpose servers needed to effectively support developing world deployments hash cache presents radical departure from the conventional wisdom in network cache design and uses to times less memory than current techniques while still providing comparable or better performance as such hash cache can be deployed in configurations not attainable with current approaches such as having multiple terabytes of external storage cache attached to low powered machines hashcache has been successfully deployed in two locations in africa and further deployments are in progress
online communities have become popular for publishing and searching content as well as for finding and connecting to other users user generated content includes for example personal blogs bookmarks and digital photos these items can be annotated and rated by different users and these social tags and derived user specific scores can be leveraged for searching relevant content and discovering subjectively interesting items moreover the relationships among users can also be taken into consideration for ranking search results the intuition being that you trust the recommendations of your close friends more than those of your casual acquaintances queries for tag or keyword combinations that compute and rank the top-k results thus face large variety of options that complicate the query processing and pose efficiency challenges this paper addresses these issues by developing an incremental top-k algorithm with two dimensional expansions social expansion considers the strength of relations among users and semantic expansion considers the relatedness of different tags it presents new algorithm based on principles of threshold algorithms by folding friends and related tags into the search space in an incremental on demand manner the excellent performance of the method is demonstrated by an experimental evaluation on three real world datasets crawled from delicious flickr and librarything
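a minimal python sketch of the classic threshold algorithm stopping rule that such top-k processing builds on, the incremental social and semantic expansion of the paper is not modeled here and all names are assumptions:

```python
import heapq

def threshold_topk(sorted_lists, lookup, k):
    """Threshold-algorithm style top-k over several score lists.

    sorted_lists: one iterable of (item, score) per list (e.g. per tag or per
                  friend), each sorted by descending score.
    lookup[i](item): random access returning item's score in list i (0 if absent).
    """
    best = {}                       # item -> aggregated (summed) score
    last_seen = [None] * len(sorted_lists)
    iters = [iter(lst) for lst in sorted_lists]
    heap = []                       # min-heap holding the current top-k
    exhausted = 0
    while exhausted < len(iters):
        exhausted = 0
        for i, it in enumerate(iters):      # one sorted access per list per round
            try:
                item, score = next(it)
            except StopIteration:
                exhausted += 1
                continue
            last_seen[i] = score
            if item not in best:
                # random accesses complete the aggregate for a newly seen item
                best[item] = sum(lookup[j](item) for j in range(len(iters)))
                heapq.heappush(heap, (best[item], item))
                if len(heap) > k:
                    heapq.heappop(heap)
        threshold = sum(s for s in last_seen if s is not None)
        if len(heap) == k and heap[0][0] >= threshold:
            break                    # no unseen item can beat the current top-k
    return sorted(heap, reverse=True)
```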
recent progress of computer and network technologies makes it possible to store and retrieve large volume of multimedia data in many applications in such applications efficient indexing scheme is very important for multimedia retrieval depending on the media type multimedia data shows distinct characteristics and requires different approach to handle in this paper we propose fast melody finder fmf that can retrieve melodies fast from audio database based on frequently queried tunes those tunes are collected from user queries and incrementally updated into index considering empirical user request pattern for multimedia data those tunes will cover significant portion of user requests fmf represents all the acoustic and common music notational inputs using well known string format such as udr and lsr and uses string matching techniques to find query results we implemented prototype system and report on its performance through various experiments
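a minimal python sketch of contour based melody matching, assuming the common reading of udr as up down repeat, the paper's exact encodings and index layout are not given here:

```python
def udr_contour(pitches):
    """Convert a pitch sequence into a coarse contour string:
    U = next note higher, D = lower, R = repeated (one common reading of 'UDR')."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        out.append('U' if cur > prev else 'D' if cur < prev else 'R')
    return ''.join(out)

def find_melody(query_pitches, index):
    """index: dict mapping a song id to its precomputed contour string.
    Returns the songs whose contour contains the query contour as a substring."""
    q = udr_contour(query_pitches)
    return [song for song, contour in index.items() if q in contour]
```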
is one of the few recent research projects that is examining operating system design structure issues in the context of new whole system design is open source and was designed from the ground up to perform well and to be scalable customizable and maintainable the project was begun in by team at ibm research over the last nine years there has been development effort on from between six to twenty researchers and developers across ibm collaborating universities and national laboratories supports the linux api and abi and is able to run unmodified linux applications and libraries the approach we took in to achieve scalability and customizability has been successful the project has produced positive research results has resulted in contributions to linux and the xen hypervisor on power and continues to be rich platform for exploring system software technology today is one of the key exploratory platforms in the doe’s fast os program is being used as prototyping vehicle in ibm’s percs project and is being used by universities and national labs for exploratory research in this paper we provide insight into building an entire system by discussing the motivation and history of describing its fundamental technologies and presenting an overview of the research directions we have been pursuing
this paper deals with energy aware real time system scheduling using dynamic voltage scaling dvs for energy constrained embedded systems that execute variable and unpredictable workloads the goal is to design dvs schemes to minimize the expected energy consumption of the whole system while meeting the deadlines of the tasks researchers have attempted to take advantage of stochastic information about workloads to achieve better energy savings and accordingly various stochastic dvs schemes have been proposed however the existing stochastic dvs schemes are based on much simplified power models that assume unrestricted continuous frequency well defined power frequency relation and no speed change overhead when these schemes are used in practice they need to be patched in order to comply with realistic power models experiments show that some of such dvs schemes perform even worse than certain non stochastic dvs schemes furthermore even for stochastic schemes that were shown experimentally to outperform non stochastic schemes it is not clear how well they perform compared to the optimal solution which is yet to be found in this work we provide unified practical approach for obtaining optimal or provably close to optimal stochastic inter task intra task and hybrid dvs schemes under realistic power models in which the processor only provides set of discrete speeds no assumption is made on power frequency relation and speed change overhead is considered we also evaluate the existing dvs schemes by comparing them with our dvs schemes
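a minimal python sketch of the basic discrete speed selection step that any such dvs scheme performs, the stochastic workload information and speed change overhead handled by the paper are not modeled here:

```python
def pick_speed(wc_cycles, time_left, speeds):
    """Choose the lowest available discrete frequency (in Hz) that still finishes
    wc_cycles worst-case remaining cycles within time_left seconds."""
    for f in sorted(speeds):
        if wc_cycles / f <= time_left:
            return f
    return max(speeds)   # deadline cannot be met even at the highest speed
```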
despite an increasing interest in scientific workflow technologies in recent years workflow design remains challenging slow and often error prone process thus limiting the speed of further adoption of scientific workflows based on practical experience with data driven workflows we identify and illustrate number of recurring scientific workflow design challenges ie parameter rich functions data assembly disassembly and cohesion conditional execution iteration and more generally workflow evolution in conventional approaches such challenges usually lead to the introduction of different types of shims ie intermediary workflow steps that act as adapters between otherwise incorrectly wired components however relying heavily on the use of shims leads to brittle ie change intolerant workflow designs that are hard to comprehend and maintain to this end we present general workflow design paradigm called virtual data assembly lines vdal in this paper we show how the vdal approach can overcome common scientific workflow design challenges and improve workflow designs by exploiting i semistructured nested data model like xml ii flexible statically analyzable configuration mechanism eg an xquery fragment and iii an underlying virtual assembly line model that is resilient to workflow and data changes the approach has been implemented as kepler comad and applied to improve the design of complex real world workflows
as part of our continuing research on using petri nets to support automated analysis of ada tasking behavior we have investigated the application of petri net reduction for deadlock analysis although reachability analysis is an important method to detect deadlocks it is in general inefficient or even intractable net reduction can aid the analysis by reducing the size of the net while preserving relevant properties we introduce number of reduction rules and show how they can be applied to ada nets which are automatically generated petri net models of ada tasking we define reduction process and method by which useful description of detected deadlock state can be obtained from the reduced net’s information reduction tool and experimental results from applying the reduction process are discussed
the customization of natural language interface to certain application domain or knowledge base still represents major effort for end users given the current state of the art in this article we present our natural language interface orakel describe its architecture design choices and implementation in particular we present orakel’s adaptation model which allows users which are not familiar with methods from natural language processing nlp or formal linguistics to port natural language interface to certain domain and knowledge base the claim that our model indeed meets our requirement of intuitive adaptation is experimentally corroborated by diverse experiments with end users showing that non nlp experts can indeed create domain lexica for our natural language interface leading to similar performance compared to lexica engineered by nlp experts
the widespread use of clusters and web farms has increased the importance of data replication in this article we show how to implement consistent and scalable data replication at the middleware level we do this by combining transactional concurrency control with group communication primitives the article presents different replication protocols argues their correctness describes their implementation as part of generic middleware middle and proves their feasibility with an extensive performance evaluation the solution proposed is well suited for variety of applications including web farms and distributed object platforms
it is known that given an edge weighted graph maximum adjacency ordering ma ordering of vertices can find special pair of vertices called pendent pair and that minimum cut in graph can be found by repeatedly contracting pendent pair yielding one of the fastest and simplest minimum cut algorithms in this paper we provide another ordering of vertices called minimum degree ordering md ordering as new fundamental tool to analyze the structure of graphs we prove that an md ordering finds different type of special pair of vertices called flat pair which actually can be obtained as the last two vertices after repeatedly removing vertex with the minimum degree by contracting flat pairs we can find not only minimum cut but also all extreme subsets of given graph these results can be extended to the problem of finding extreme subsets in symmetric submodular set functions
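for reference, a minimal python sketch of the classical stoer wagner procedure built on maximum adjacency orderings, contracting the pendent pair found in each phase, the md ordering variant and the computation of extreme subsets are not shown:

```python
def global_min_cut(weights):
    """Stoer-Wagner global minimum cut via maximum-adjacency (MA) orderings.
    weights: symmetric dict-of-dicts with every vertex as a key,
             weights[u][v] = weight of edge (u, v)."""
    graph = {u: dict(nbrs) for u, nbrs in weights.items()}
    best = float('inf')
    while len(graph) > 1:
        # maximum adjacency ordering starting from an arbitrary vertex
        start = next(iter(graph))
        order = [start]
        conn = {v: graph[start].get(v, 0) for v in graph if v != start}
        while conn:
            u = max(conn, key=conn.get)      # most strongly connected next
            order.append(u)
            del conn[u]
            for v, w in graph[u].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]          # the pendent pair of this phase
        best = min(best, sum(graph[t].values()))   # "cut of the phase"
        # contract the pendent pair (s, t)
        for v, w in graph[t].items():
            if v != s:
                graph[s][v] = graph[s].get(v, 0) + w
                graph[v][s] = graph[v].get(s, 0) + w
        for v in graph[t]:
            graph[v].pop(t, None)
        del graph[t]
    return best
```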
accurate estimation of link quality is the key to enable efficient routing in wireless sensor networks current link estimators focus mainly on identifying long term stable links for routing they leave out potentially large set of intermediate links offering significant routing progress fine grained analysis of link qualities reveals that such intermediate links are bursty ie stable in the short term in this paper we use short term estimation of wireless links to accurately identify short term stable periods of transmission on bursty links our approach allows routing protocol to forward packets over bursty links if they offer better routing progress than long term stable links we integrate short term link estimator and its associated routing strategy with standard routing protocol for sensor networks our evaluation reveals an average of and maximum of reduction in the overall transmissions when routing over long range bursty links our approach is not tied to any specific routing protocol and integrates seamlessly with existing routing protocols and link estimators
this paper deals with visualization based approach to performance analysis and tuning of highly irregular task parallel applications at its core lies novel automatic layout algorithm for execution graphs which is based on sugiyama’s framework our visualization enables the application designer to reliably detect manifestations of parallel overhead and to investigate their individual root causes we particularly focus on structural properties of task parallel computations which are hard to detect in more analytical way for example false sharing and false parallelism in addition we discuss embedding our visualization into an integrated development environment realizing seamless work flow for implementation execution analysis and tuning of parallel programs
measuring association among variables is an important step for finding solutions to many data mining problems an existing metric might not be effective to serve as measure of association among set of items in database in this paper we propose two measures of association and we introduce the notion of associative itemset in database we express the proposed measures in terms of supports of itemsets in addition we provide theoretical foundations of our work we present experimental results on both real and synthetic databases to show the effectiveness of
in past years number of works considered behavioral protocols of components and discussed approaches for automatically checking the compatibility of protocols protocol conformance in component based systems the approaches are usually model checking approaches ie positive answer guarantees protocol conformance for all executions while negative answer provides example executions that may lead to protocol violations it turned out that if behavioral abstractions take into account unbounded concurrency and unbounded recursion the protocol conformance checking problem becomes undecidable there are two possibilities to overcome this problem i further behavioral abstraction to finite state systems or ii conservative approximation of the protocol conformance checking problem both approaches may lead to spurious counterexamples ie due to abstractions or approximations the shown execution can never happen this work considers the second approach and shows heuristics that reduce the number of spurious counterexamples by cutting off search branches that definitely do not contain real counterexamples
we introduce new realistic input model for straight line geometric graphs and nonconvex polyhedra geometric graph is local if the longest edge at every vertex is only constant factor longer than the distance from to its euclidean nearest neighbor among the other vertices of and the longest and shortest edges of differ in length by at most polynomial factor polyhedron is local if all its faces are simplices and its edges form local geometric graph we show that any boolean combination of two local polyhedra in each with vertices can be computed in nlogn time using standard hierarchy of axis aligned bounding boxes using results of de berg we also show that any local polyhedron in has binary space partition tree of size nlog and depth logn these bounds are tight in the worst case when
we present the implementation of large scale latency estimation system based on gnp and incorporated into the google content delivery network our implementation employs standard features of contemporary web clients and carefully controls the overhead incurred by latency measurements using scalable centralized scheduler it also requires only small number of cdn modifications which makes it attractive for any cdn interested in large scale latency estimation we investigate the issue of coordinate stability over time and show that coordinates drift away from their initial values with time so that of node coordinates become inaccurate by more than ms after one week however daily re computations make of the coordinates stay within ms of their initial values furthermore we demonstrate that using coordinates to decide on client to replica re direction leads to selecting replicas closest in term of measured latency in of all cases in another of all cases clients are re directed to replicas offering latencies that are at most two times longer than optimal finally collecting huge volume of latency data and using clustering techniques enable us to estimate latencies between globally distributed internet hosts that have not participated in our measurements at all the results are sufficiently promising that google may offer public interface to the latency estimates in the future
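a minimal python sketch of how such coordinates are used once computed, latency is approximated by the euclidean distance between coordinate vectors and the closest replica is chosen accordingly, function names are assumptions:

```python
import math

def predicted_latency_ms(coord_a, coord_b):
    """GNP-style estimate: the latency between two hosts is approximated by the
    Euclidean distance between their network coordinates."""
    return math.dist(coord_a, coord_b)

def closest_replica(client_coord, replica_coords):
    # Redirect the client to the replica with the smallest predicted latency.
    return min(replica_coords,
               key=lambda r: predicted_latency_ms(client_coord, replica_coords[r]))
```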
how do tangible systems take advantage of our sociophysical embodiment in the world how can we use tangible interaction to better understand collaboration and intersubjectivity we present parazoan an interactive installation where evocative objects collaboratively control dynamic visual display our analysis of interactions with parazoan explores our questions and discusses implications for our understanding of tangible and virtual collaboration
the goal of the mobius project is to develop proof carrying code architecture to secure global computers that consist of java enabled mobile devices in this overview we present the consumer side of the mobius proof carrying code infrastructure for which we have developed formally certified executable checkers we consider wholesale proof carrying code scenarios in which trusted authority verifies the certificate before cryptographically signing the application we also discuss retail proof carrying code where the verification is performed on the consumer device
the majority of the existing techniques for surface reconstruction and the closely related problem of normal reconstruction are deterministic their main advantages are the speed and given reasonably good initial input the high quality of the reconstructed surfaces nevertheless their deterministic nature may hinder them from effectively handling incomplete data with noise and outliers an ensemble is statistical technique which can improve the performance of deterministic algorithms by putting them into statistics based probabilistic setting in this paper we study the suitability of ensembles in normal and surface reconstruction we experimented with widely used normal reconstruction technique (hoppe derose duchamp mcdonald stuetzle surface reconstruction from unorganized points computer graphics) and multi level partitions of unity implicits for surface reconstruction (ohtake belyaev alexa turk seidel multi level partition of unity implicits acm transactions on graphics) showing that normal and surface ensembles can successfully be combined to handle noisy point sets
high performance architectures depend heavily on efficient multi level memory hierarchies to minimize the cost of accessing data this dependence will increase with the expected increases in relative distance to main memory there have been number of published proposals for cache conflict avoidance schemes we investigate the design and performance of conflict avoiding cache architectures based on polynomial modulus functions which earlier research has shown to be highly effective at reducing conflict miss ratios we examine number of practical implementation issues and present experimental evidence to support the claim that pseudo randomly indexed caches are both effective in performance terms and practical from an implementation viewpoint
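a minimal python sketch of polynomial modulus indexing, the block address is reduced modulo an irreducible polynomial over gf(2) and the remainder bits select the cache set, the concrete polynomial is an arbitrary example and not taken from the paper:

```python
def poly_mod_index(block_addr, poly, k):
    """Pseudo-random cache set index: interpret the block address as a
    polynomial over GF(2) and return its remainder modulo 'poly', a degree-k
    polynomial (bit k set).  The k remainder bits select one of 2**k sets."""
    r = block_addr
    for bit in range(r.bit_length() - 1, k - 1, -1):
        if r >> bit & 1:
            r ^= poly << (bit - k)     # cancel the leading term, GF(2) division
    return r                           # value in [0, 2**k)

# Example: k = 6 index bits, using x^6 + x + 1 (0b1000011) as the modulus,
# after dropping the block-offset bits of a 32-bit address.
set_index = poly_mod_index(0xDEADBEEF >> 6, 0b1000011, 6)
```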
handling missing values when tackling real world datasets is great challenge arousing the interest of many scientific communities many works propose completion methods or implement new data mining techniques tolerating the presence of missing values it turns out that these tasks are very hard in this paper we propose new typology characterizing missing values according to relationships within the data these relationships are automatically discovered by data mining techniques using generic bases of association rules we define four types of missing values from these relationships the characterization is made for each missing value it differs from the well known statistical methods which apply same treatment for all missing values coming from same attribute we claim that such local characterization enables us to apply perceptive techniques to deal with missing values according to their origins the way in which we deal with the missing values should depend on their origins eg attribute meaningless wrt other attributes missing values depending on other data missing values by accident experiments on real world medical dataset highlight the interest of such characterization
quantitative characterization of skin appearance is an important but difficult task the skin surface is detailed landscape with complex geometry and local optical properties in addition skin features depend on many variables such as body location eg forehead cheek subject parameters age gender and imaging parameters lighting camera as with many real world surfaces skin appearance is strongly affected by the direction from which it is viewed and illuminated computational modeling of skin texture has potential uses in many applications including realistic rendering for computer graphics robust face models for computer vision computer assisted diagnosis for dermatology topical drug efficacy testing for the pharmaceutical industry and quantitative comparison for consumer products in this work we present models and measurements of skin texture with an emphasis on faces we develop two models for use in skin texture recognition both models are image based representations of skin appearance that are suitably descriptive without the need for prohibitively complex physics based skin models our models take into account the varied appearance of the skin with changes in illumination and viewing direction we also present new face texture database comprised of more than images corresponding to human faces locations on each face forehead cheek chin and nose and combinations of imaging angles the complete database is made publicly available for further research
the modeling of high level semantic events from low level sensor signals is important in order to understand distributed phenomena for such content modeling purposes transformation of numeric data into symbols and the modeling of resulting symbolic sequences can be achieved using statistical models markov chains mcs and hidden markov models hmms we consider the problem of distributed indexing and semantic querying over such sensor models specifically we are interested in efficiently answering range queries return all sensors that have observed an unusual sequence of symbols with high likelihood ii top queries return the sensor that has the maximum probability of observing given sequence and iii nn queries return the sensor model which is most similar to query model all the above queries can be answered at the centralized base station if each sensor transmits its model to the base station however this is communication intensive we present much more efficient alternative distributed index structure mist model based index structure and accompanying algorithms for answering the above queries mist aggregates two or more constituent models into single composite model and constructs an in network hierarchy over such composite models we develop two kinds of composite models the first kind captures the average behavior of the underlying models and the second kind captures the extreme behaviors of the underlying models using the index parameters maintained at the root of subtree we bound the probability of observation of query sequence from sensor in the subtree we also bound the distance of query model to sensor model using these parameters extensive experimental evaluation on both real world and synthetic data sets show that the mist schemes scale well in terms of network size and number of model states we also show its superior performance over the centralized schemes in terms of update query and total communication costs
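a minimal python sketch of the centralized baseline, scoring a symbol sequence against per sensor markov chain models, the in network composite models and bounds of mist are not shown and all names are assumptions:

```python
import math

def log_likelihood(seq, start_prob, trans_prob):
    """Log-probability that a discrete-time Markov chain emits symbol sequence seq.
    start_prob[s]: initial probability; trans_prob[s][t]: P(next = t | current = s)."""
    if not seq:
        return 0.0
    lp = math.log(start_prob.get(seq[0], 1e-12))       # floor avoids log(0)
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans_prob.get(a, {}).get(b, 1e-12))
    return lp

def range_query(models, query_seq, log_threshold):
    # Return all sensors whose Markov chain gives the query sequence high likelihood.
    return [sid for sid, (pi, A) in models.items()
            if log_likelihood(query_seq, pi, A) >= log_threshold]
```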
in distributed environment presentation of structured composite multimedia information poses new challenges in dealing with variable bandwidth bw requirement and synchronization of media data objects the detailed knowledge of bw requirement obtained by analyzing the document structure can be used to create prefetch schedule that results in efficient utilization of system resources distributed environment consists of various system components that are either dedicated to client or shared across multiple clients shared system components could benefit from fine granularity advanced reservation fgar of resources based on true bw requirement prefetching by utilizing advance knowledge of bw requirement can further improve resource utilization in this paper we describe the jinsil retrieval system that takes into account the available bandwidth and buffer resources and the nature of sharing in each component on the delivery path it reshapes bw requirement creates prefetch schedule for efficient resource utilization in each component and reserves necessary bw and buffer we also consider good choices for placement of prefetch buffers across various system components
this paper investigates the impact of proper tile size selection on the power consumption of tile based processors we refer to this investigation as tile granularity study this is accomplished by distilling the architectural cost of tiles with different computational widths into system metric we call the granularity indicator gi the gi is then compared against the bisection bandwidth of algorithms when partitioned across multiple tiles from this comparison the tile granularity that best fits given set of algorithms can be determined reducing the system power for that set of algorithms when the gi analysis is applied to the synchroscalar tile architecture we find that synchroscalar’s already low power consumption can be further reduced by when customized for execution of the receiver in addition the gi can also be used to evaluate tile size when considering multiple applications simultaneously providing convenient platform for hardware software co design
this article provides an overview of such embodied agents that reason about the body eg self reconfiguring robots and of research into recognizing body part as belonging to one’s own body on the part of robotic agents vs animals more sketchily we also consider such animated avatars whose movements imitate human body movements and virtual models of the human body
much effort is invested in generating natural deformations of three dimensional shapes deformation transfer simplifies this process by allowing to infer deformations of new shape from existing deformations of similar shape current deformation transfer methods can be applied only to shapes which are represented as single component manifold mesh hence their applicability to real life models is somewhat limited we propose novel deformation transfer method which can be applied to variety of shape representations tet meshes polygon soups and multiple component meshes our key technique is deformation of the space in which the shape is embedded we approximate the given source deformation by harmonic map using set of harmonic basis functions then given sparse set of user selected correspondence points between the source and target shapes we generate deformation of the target shape which has differential properties similar to those of the source deformation our method requires only the solution of linear systems of equations and hence is very robust and efficient we demonstrate its applicability on wide range of deformations for different shape representations
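a minimal python sketch of the generic linear solve such a space deformation reduces to, fitting basis function weights to sparse correspondences by least squares, the harmonic basis construction and the matching of differential properties described above are not shown:

```python
import numpy as np

def fit_space_deformation(basis, corr_points, corr_displacements):
    """Fit a space deformation d(x) = sum_j W[j] * b_j(x) to sparse correspondences.
    basis: list of callables b_j mapping an (n, 3) array of points to (n,) values.
    Returns the weight matrix W of shape (num_basis, 3)."""
    P = np.asarray(corr_points, dtype=float)          # (m, 3) correspondence points
    D = np.asarray(corr_displacements, dtype=float)   # (m, 3) target displacements
    B = np.stack([b(P) for b in basis], axis=1)       # (m, num_basis)
    W, *_ = np.linalg.lstsq(B, D, rcond=None)         # least-squares weights
    return W

def deform(points, basis, W):
    P = np.asarray(points, dtype=float)
    B = np.stack([b(P) for b in basis], axis=1)
    return P + B @ W                                  # apply the fitted deformation
```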
in network aggregation is an essential primitive for performing queries on sensor network data however most aggregation algorithms assume that all intermediate nodes are trusted in contrast the standard threat model in sensor network security assumes that an attacker may control fraction of the nodes which may misbehave in an arbitrary byzantine manner we present the first algorithm for provably secure hierarchical in network data aggregation our algorithm is guaranteed to detect any manipulation of the aggregate by the adversary beyond what is achievable through direct injection of data values at compromised nodes in other words the adversary can never gain any advantage from misrepresenting intermediate aggregation computations our algorithm incurs only log node congestion supports arbitrary tree based aggregator topologies and retains its resistance against aggregation manipulation in the presence of arbitrary numbers of malicious nodes the main algorithm is based on performing the sum aggregation securely by first forcing the adversary to commit to its choice of intermediate aggregation results and then having the sensor nodes independently verify that their contributions to the aggregate are correctly incorporated we show how to reduce secure median count and average to this primitive
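a minimal python sketch of the commit step only, each node binds itself to its children's partial counts sums and hashes so that contributions can later be checked, the dissemination and per leaf verification phases of the actual protocol are not shown:

```python
import hashlib

def H(*parts):
    # Simple hash over the string forms of the parts, used as a commitment.
    h = hashlib.sha256()
    for p in parts:
        h.update(str(p).encode())
        h.update(b'|')
    return h.hexdigest()

def leaf(node_id, value):
    # Each sensor commits to its own reading.
    return {'count': 1, 'sum': value, 'hash': H('leaf', node_id, value)}

def aggregate(children):
    """An intermediate node combines its children's commitments; the hash binds
    the aggregator to exactly these inputs, so intermediate results cannot be
    silently changed later without detection."""
    count = sum(c['count'] for c in children)
    total = sum(c['sum'] for c in children)
    return {'count': count, 'sum': total,
            'hash': H('inner', count, total, *(c['hash'] for c in children))}
```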
to deliver effective personalization for digital library users it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users this paper examines three key human factors cognitive styles levels of expertise and gender differences and utilizes three individual clustering techniques k-means hierarchical clustering and fuzzy clustering to understand user behavior and perception moreover robust clustering capable of correcting the bias of individual clustering techniques is used to obtain deeper understanding the robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior ie cognitive style dominates and justifies each of the robust clusters created we also found that perception was mainly determined by the level of expertise of user we conclude that robust clustering is an effective technique to analyze user behavior and perception
this paper presents system for rapid editing of highly dynamic motion capture data at the heart of this system is an optimization algorithm that can transform the captured motion so that it satisfies high level user constraints while enforcing that the linear and angular momentum of the motion remain physically plausible unlike most previous approaches to motion editing our algorithm does not require pose specification or model reduction and the user only need specify high level changes to the input motion to preserve the dynamic behavior of the input motion we introduce spline based parameterization that matches the linear and angular momentum patterns of the motion capture data because our algorithm enables rapid convergence by presenting good initial state of the optimization the user can efficiently generate large number of realistic motions from single input motion the algorithm can then populate the dynamic space of motions by simple interpolation effectively parameterizing the space of realistic motions we show how this framework can be used to produce an effective interface for rapid creation of dynamic animations as well as to drive the dynamic motion of character in real time
distributed query processing algorithms usually perform data reduction by using semijoin program but the problem with these approaches is that they still require an explicit join of the reduced relations in the final phase we introduce an efficient algorithm for join processing in distributed database systems that makes use of bipartite graphs in order to reduce data communication costs and local processing costs the bipartite graphs represent the tuples that can be joined in two relations taking also into account the reduction state of the relations this algorithm fully reduces the relations at each site we then present an adaptive algorithm for response time optimization that takes into account the system configuration ie the additional resources available and the data characteristics in order to select the best strategy for response time minimization we also report on the results of set of experiments which show that our algorithms outperform number of the recently proposed methods for total processing time and response time minimization
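for context, a minimal python sketch of the standard semijoin reduction that such algorithms improve on, relations are modeled as lists of dicts and only the join column of one relation is shipped between sites:

```python
def semijoin(r, s, attr):
    """Reduce relation r to the tuples that can join with some tuple of s on attr.
    Only the projection of s on attr needs to be shipped to r's site."""
    keys = {t[attr] for t in s}          # shipped join-column values
    return [t for t in r if t[attr] in keys]

def reduce_pair(r, s, attr):
    # A simple semijoin program: reduce both relations before the final join.
    r2 = semijoin(r, s, attr)
    s2 = semijoin(s, r2, attr)
    return r2, s2
```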
queries are convenient abstractions for the discovery of information and services as they offer content based information access in distributed settings query semantics are well defined for example queries are often designed to satisfy acid transactional properties when query processing is introduced in dynamic network setting achieving transactional semantics becomes complex due to the open and unpredictable environment in this article we propose query processing model for mobile ad hoc and sensor networks that is suitable for expressing wide range of query semantics the semantics differ in the degree of consistency with which query results reflect the state of the environment during query execution we introduce several distinct notions of consistency and formally express them in our model practical and significant contribution of this article is protocol for query processing that automatically assesses and adaptively provides an achievable degree of consistency given the operational environment throughout its execution the protocol attaches an assessment of the achieved guarantee to returned query results allowing precise reasoning about query with range of possible semantics we evaluate the performance of this protocol and demonstrate the benefits accrued to applications through examples drawn from an industrial application
component oriented and service oriented approaches have gained strong enthusiasm in industries and academia with particular interest for service oriented approaches component is software entity with given functionalities made available by provider and used to build other application within which it is integrated the service concept and its use in web based application development have huge impact on reuse practices accordingly considerable part of software architectures is influenced these architectures are moving towards service oriented architectures therefore applications re use services that are available elsewhere and many applications interact without knowing each other using services available via service servers and their published interfaces and functionalities industries propose through various consortium languages technologies and standards more academic works are also undertaken concerning semantics and formalisation of components and service based systems we consider here both streams of works in order to raise research concerns that will help in building quality software are there new challenging problems with respect to service based software construction to service construction and especially to software verification besides what are the links and the advances compared to distributed systems specific emphasis should be put on correctness properties of services and on service based systems in order to ensure their quality and therefrom the durability of information systems and applications therefore an important research issue is to reason on the correctness of software applications that will dynamically use or embed existing services for example additionally to the formal specification of its functionalities service may embed its specific properties and the certificate proof that guarantees these properties
we present template based approach to detecting human silhouettes in specific walking pose our templates consist of short sequences of silhouettes obtained from motion capture data this lets us incorporate motion information into them and helps distinguish actual people who move in predictable way from static objects whose outlines roughly resemble those of humans moreover during the training phase we use statistical learning techniques to estimate and store the relevance of the different silhouette parts to the recognition task at run time we use it to convert chamfer distance to meaningful probability estimates the templates can handle six different camera views excluding the frontal and back view as well as different scales we demonstrate the effectiveness of our technique using both indoor and outdoor sequences of people walking in front of cluttered backgrounds and acquired with moving camera which makes techniques such as background subtraction impractical
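a minimal python sketch of chamfer matching via a distance transform, the learned mapping from chamfer distance to a probability estimate described above is not included:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(image_edges, template_points):
    """Mean distance from each template silhouette point to the nearest image edge.
    image_edges: boolean 2D array (True at edge pixels).
    template_points: (n, 2) array of (row, col) points, already placed at the
    candidate location and scale and assumed to lie inside the image."""
    # Distance, at every pixel, to the closest edge pixel of the image.
    dist = distance_transform_edt(~image_edges)
    rows, cols = np.round(template_points).astype(int).T
    return float(dist[rows, cols].mean())   # lower means a better match
```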
self protecting systems require the ability to instantaneously detect malicious activity at run time and prevent execution we argue that it is impossible to perfectly self protect systems without false positives due to the limited amount of information one might have at run time and that eventually some undesirable activity will occur that will need to be rolled back as consequence of this it is important that self protecting systems have the ability to completely and automatically roll back malicious activity which has occurred as the cost of human resources currently dominates the cost of cpu network and storage resources we contend that computing systems should be built with automated analysis and recovery as primary goal towards this end we describe the design implementation and evaluation of forensix robust high precision analysis and recovery system for supporting self healing the forensix system records all activity of target computer and allows for efficient automated reconstruction of activity when needed such system can be used to automatically detect patterns of malicious activity and selectively undo their operations forensix uses three key mechanisms to improve the accuracy and reduce the human overhead of performing analysis and recovery first it performs comprehensive monitoring of the execution of target system at the kernel event level giving high resolution application independent view of all activity second it streams the kernel event information in real time to append only storage on separate hardened logging machine making the system resilient to wide variety of attacks third it uses database technology to support high level querying of the archived log greatly reducing the human cost of performing analysis and recovery
compiler optimizations pose many problems to source level debugging of an optimized program due to reordering insertion and deletion of code one such problem is to determine whether the value of variable is current at breakpoint that is whether its actual value is the same as its expected value we use the notion of dynamic currency of variable in source level debugging and propose the use of minimal unrolled graph to reduce the run time overhead of dynamic currency determination we prove that the minimal unrolled graph is an adequate basis for performing bit vector data flow analyses at breakpoint this property is used to perform dynamic currency determination it is also shown to help in recovery of dynamically noncurrent variable
this paper presents novel algorithm for generating watertight level set from an octree we show that the level set can be efficiently extracted regardless of the topology of the octree or the values assigned to the vertices the key idea behind our approach is the definition of set of binary edge trees derived from the octree’s topology we show that the edge trees can be used to define the positions of the isovalue crossings in consistent fashion and to resolve inconsistencies that may arise when single edge has multiple isovalue crossings using the edge trees we show that provably watertight mesh can be extracted from the octree without necessitating the refinement of nodes or modification of their values
this paper describes our development of the starburst rule system an active database rules facility integrated into the starburst extensible relational database system at the ibm almaden research center the starburst rule language is based on arbitrary database state transitions rather than tuple or statement level changes yielding clear and flexible execution semantics the rule system has been implemented completely its rapid implementation was facilitated by the extensibility features of starburst and rule management and rule processing are integrated into all aspects of database processing
with the increased complexity of platforms the growing demand of applications and data center server sprawl power consumption is reaching unsustainable limits the need for improved power management is becoming essential for many reasons including reduced power consumption cooling improved density reliability compliance with environmental standards this paper presents theoretical framework and methodology for autonomic power and performance management in business data centers we optimize for power and performance performance per watt at each level of the hierarchy while maintaining scalability we adopt mathematically rigorous optimization approach to minimize power while meeting performance constraints our experimental results show around savings in power while maintaining performance as compared to static power management techniques and additional savings with both global and local optimizations
ieee and mote devices are today two of the most interesting wireless technologies for ad hoc and sensor networks respectively and many efforts are currently devoted to understanding their potentialities unfortunately few works adopt an experimental approach though several papers highlight that popular simulation and analytical approximations may lead to very inaccurate results in this paper we discuss outcomes from an extensive measurement study focused on these technologies we analyze the dependence of the communication range on several parameters such as node distance from the ground transmission data rate environment humidity then we study the extent of the physical carrier sensing zone around sending node on the basis of these elements we provide unified wireless link model for both technologies finally by using this model we analyze well known scenarios such as the hidden node problem and we modify the traditional formulations according to our experimental results
there has been lot of research and industrial effort on building xquery engines with different kinds of xml storage and index models however most of these efforts focus on building either an efficient xquery engine with one kind of xml storage index view model in mind or general xquery engine without any consideration of the underlying xml storage index and view model we need an underlying framework to build an xquery engine that can work with and provide optimization for different xml storage index and view models besides xquery rdbmss also support sql xml standard language that integrates xml and relational processing there are industrial efforts for building hybrid xquery and sql xml engines that support both languages so that users can manage and query both relational and xml data on one platform however we need theoretical framework to optimize both sql xml and xquery languages in one rdbms in this paper we show our industrial work of building combined xquery and sql xml engine that is able to work and provide optimization for different kinds of xml storage and index models in oracle xmldb this work is based on xml extended relational algebra as the underlying tuple based logical algebra and incorporates tree and automata based physical algebra into the logical tuple based algebra so as to provide optimization for different physical xml formulations this results in logical and physical rewrite techniques to optimize xquery and sql xml over variety of physical xml storage index and view models including schema aware object relational xml storage with relational indexes binary xml storage with schema agnostic path value order key xmlindex sql xml view over relational data and relational view over xml furthermore we show the approach of leveraging cost based xml physical rewrite strategy to evaluate different physical rewrite plans
efficiently simulating large deformations of flexible objects is challenging problem in computer graphics in this paper we present physically based approach to this problem using the linear elasticity model and finite elements method to handle large deformations in the linear elasticity model we exploit the domain decomposition method based on the observation that each sub domain undergoes relatively small local deformation involving global rigid transformation in order to efficiently solve the deformation at each simulation time step we pre compute the object responses in terms of displacement accelerations to the forces acting on each node yielding force displacement matrix however the force displacement matrix could be too large to handle for densely tessellated objects to address this problem we present two methods the first method exploits spatial coherence to compress the force displacement matrix using the clustered principal component analysis method and the second method pre computes only the force displacement vectors for the boundary vertices of the sub domains and resorts to the cholesky factorization to solve the acceleration for the internal vertices of the sub domains finally we present some experimental results to show the large deformation effects and fast performance on complex large scale objects under interactive user manipulations
gestural user interfaces designed for planar touch sensitive tactile displays require an appropriate concept for teaching gestures and other haptic interaction to blind users we consider proportions of hands and demonstrate gestures by tactile only methods without the need for braille skills or verbalization user test was performed to confirm blind users may learn gestures autonomously
we present new transformation of terms into continuation passing style cps this transformation operates in one pass and is both compositional and first order previous cps transformations only enjoyed two out of the three properties of being first order one pass and compositional but the new transformation enjoys all three properties it is proved correct directly by structural induction over source terms instead of indirectly with colon translation as in plotkin’s original proof similarly it makes it possible to reason about cps transformed terms by structural induction over source terms directly the new cps transformation connects separately published approaches to the cps transformation it has already been used to state new and simpler correctness proof of direct style transformation and to develop new and simpler cps transformation of control flow information
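To make the notion of CPS-transforming terms concrete, here is a minimal Python sketch of the classic compositional call-by-value CPS transformation in the style of Plotkin. It is emphatically not the first-order one-pass transformation described above (this naive version introduces administrative redexes); the toy term representation is an assumption made only for illustration.

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a)
# Plotkin-style call-by-value CPS (naive, with administrative redexes).
import itertools

_fresh = itertools.count()

def gensym(prefix="k"):
    return f"{prefix}{next(_fresh)}"

def cps(term):
    """Return the CPS counterpart of `term`: a term expecting a continuation."""
    k = gensym("k")
    tag = term[0]
    if tag == "var":                       # [[x]] = \k. k x
        return ("lam", k, ("app", ("var", k), term))
    if tag == "lam":                       # [[\x.M]] = \k. k (\x. [[M]])
        _, x, body = term
        return ("lam", k, ("app", ("var", k), ("lam", x, cps(body))))
    if tag == "app":                       # [[M N]] = \k. [[M]] (\m. [[N]] (\n. m n k))
        _, f, a = term
        vf, va = gensym("f"), gensym("a")
        return ("lam", k,
                ("app", cps(f),
                 ("lam", vf,
                  ("app", cps(a),
                   ("lam", va,
                    ("app", ("app", ("var", vf), ("var", va)), ("var", k)))))))
    raise ValueError(f"unknown term: {term}")

# Example: CPS-transform (\x. x) y
print(cps(("app", ("lam", "x", ("var", "x")), ("var", "y"))))
```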
this paper addresses the problems of state space decomposition and predicate detection in distributed computation involving asynchronous messages we introduce natural communication dependency which leads to the definition of the communication graph this abstraction proves to be useful tool to decompose the state lattice of distributed computation into simpler structures known as concurrent intervals efficient algorithms have been proposed in the literature to detect special classes of predicates such as conjunctive predicates and bounded sum predicates we show that more general classes of predicates can be detected when proper constraints are imposed on the underlying computations in particular we introduce class of predicates defined herein as separable predicates that properly includes the above mentioned classes we show that separable predicates can be efficiently detected on distributed computations whose communication graphs satisfy the series parallel constraint
this paper presents decentralized peer to peer web cache called squirrel the key idea is to enable web browsers on desktop machines to share their local caches to form an efficient and scalable web cache without the need for dedicated hardware and the associated administrative cost we propose and evaluate decentralized web caching algorithms for squirrel and discover that it exhibits performance comparable to centralized web cache in terms of hit ratio bandwidth usage and latency it also achieves the benefits of decentralization such as being scalable self organizing and resilient to node failures while imposing low overhead on the participating nodes
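As a rough illustration of how browsers could share their local caches without dedicated hardware, the sketch below maps each URL to a "home" desktop node with consistent hashing. This is only a hypothetical simplification, not Squirrel's actual Pastry-based routing.

```python
# Minimal consistent-hashing sketch of the "home node" idea (hypothetical,
# not Squirrel's Pastry-based protocol).
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas=64):
        # each node is placed at several points on the ring for balance
        self._ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(replicas))
        self._keys = [k for k, _ in self._ring]

    def home_node(self, url: str) -> str:
        """Desktop responsible for caching `url` (first point clockwise on the ring)."""
        i = bisect.bisect(self._keys, h(url)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["desktop-a", "desktop-b", "desktop-c"])
print(ring.home_node("http://example.com/index.html"))
```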
in this paper we present method for grouping relevant object contours in edge maps by taking advantage of contour skeleton duality regularizing contours and skeletons simultaneously allows us to combine both low level perceptual constraints as well as higher level model constraints in very effective way the models are represented using paths in symmetry sets skeletons are treated as trajectories of an imaginary virtual robot in discrete space of symmetric points obtained from pairs of edge segments boundaries are then defined as the maps obtained by grouping the associated pairs of edge segments along the trajectories casting the grouping problem in this manner makes it similar to the problem of simultaneous localization and mapping slam hence we adapt the state of the art probabilistic framework namely rao blackwellized particle filtering that has been successfully applied to slam we use the framework to maximize the joint posterior over skeletons and contours
this paper describes resource management system for soft real time distributed object system that is based on three level feedback loop the resource management system employs profiling algorithm that monitors the usage of the resources least laxity scheduling algorithm that schedules the methods of the tasks and hot spot and cooling algorithms that allocate and migrate objects to balance the loads on the resources the resource management system consists of single resource manager for the distributed system and profiler and scheduler located on each of the processors in the distributed system
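One of the ingredients named above is least-laxity scheduling; the short sketch below shows the core selection rule (pick the ready task with the smallest slack), with toy tasks invented purely for illustration.

```python
# Least-laxity-first selection: laxity = deadline - now - remaining work.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float   # absolute deadline
    remaining: float  # remaining execution time

def pick_least_laxity(tasks, now):
    ready = [t for t in tasks if t.remaining > 0]
    return min(ready, key=lambda t: t.deadline - now - t.remaining, default=None)

tasks = [Task("audit", deadline=10, remaining=4), Task("billing", deadline=6, remaining=3)]
now = 0.0
while (t := pick_least_laxity(tasks, now)) is not None:
    t.remaining -= 1          # run the chosen task for one time unit
    now += 1
    print(now, t.name)
```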
symbolic execution is flexible and powerful but computationally expensive technique to detect dynamic behaviors of program in this paper we present context sensitive relevancy analysis algorithm based on weighted pushdown model checking which pinpoints memory locations in the program where symbolic values can flow into this information is then utilized by code instrumenter to transform only relevant parts of the program with symbolic constructs to help improve the efficiency of symbolic execution of java programs our technique is evaluated on generalized symbolic execution engine that is developed upon java path finder with checking safety properties of java applications our experiments indicate that this technique can effectively improve the performance of the symbolic execution engine with respect to the approach that blindly instruments the whole program
we present flightpath novel peer to peer streaming application that provides highly reliable data stream to dynamic set of peers we demonstrate that flightpath reduces jitter compared to previous works by several orders of magnitude furthermore flightpath uses number of run time adaptations to maintain low jitter despite of the population behaving maliciously and the remaining peers acting selfishly at the core of flightpath’s success are approximate equilibria these equilibria allow us to design incentives to limit selfish behavior rigorously yet they provide sufficient flexibility to build practical systems we show how to use an nash equilibrium instead of strict nash to engineer live streaming system that uses bandwidth efficiently absorbs flash crowds adapts to sudden peer departures handles churn and tolerates malicious activity
this paper introduces texture representation suitable for recognizing images of textured surfaces under wide range of transformations including viewpoint changes and nonrigid deformations at the feature extraction stage sparse set of affine harris and laplacian regions is found in the image each of these regions can be thought of as texture element having characteristic elliptic shape and distinctive appearance pattern this pattern is captured in an affine invariant fashion via process of shape normalization followed by the computation of two novel descriptors the spin image and the rift descriptor when affine invariance is not required the original elliptical shape serves as an additional discriminative feature for texture recognition the proposed approach is evaluated in retrieval and classification tasks using the entire brodatz database and publicly available collection of photographs of textured surfaces taken from different viewpoints
so far the core component of the it system was absolutely server and the storage was recognized as its peripheral the recent evolution of device and network technologies has enabled storage consolidation by which all the data and its related simple software codes can be placed in one place storage centric designs are being deployed into many enterprise systems the role of the storage should be reconsidered this paper presents activities of the storage fusion project five year research and development project storage fusion is an idea of elegant deep collaboration between storage and database servers two substantial works are presented in this paper first the exploitation of query execution plans enables dynamically informed prefetching accordingly boosting ad hoc queries significantly second the idea of putting autonomic database reorganization into the storage has the potential benefit of relieving the management burdens of database structural deterioration
reo is channel based coordination model whose operational semantics is given by constraint automata ca quantitative constraint automata extend ca and hence reo with quantitative models to capture such non functional aspects of system’s behaviour as delays costs resource needs and consumption that depend on the internal details of the system however the performance of system can crucially depend not only on its internal details but also on how it is used in an environment as determined for instance by the frequencies and distributions of the arrivals of requests in this paper we propose quantitative intentional automata qia an extension of ca that allow incorporating the influence of system’s environment on its performance moreover we show the translation of qia into continuous time markov chains ctmcs which allows us to apply existing ctmc tools and techniques for performance analysis of qia and reo circuits
in this paper describe an ethnographic study of children and parents looking at issues of domestic privacy and security will provide an overview of parental rules and strategies for keeping children safe and briefly discuss children’s perspective on their online safety and how their parents shared the domestic work and responsibility for protecting them as part of the discussion will present implications for design and reflect on the problematic state of ethics privacy ethics review boards when working with children
component based software system consists of well encapsulated components that interact with each other via their interfaces software integration tests are generated to test the interactions among different components these tests are usually in the form of sequences of interface method calls although many components are equipped with documents that provide informal specifications of individual interface methods few documents specify component interaction constraints on the usage of these interface methods including the order in which these methods should be called and the constraints on the method arguments and returns across multiple methods in this paper we propose substra framework for automatic generation of software integration tests based on call sequence constraints inferred from initial test executions or normal runs of the subsystem under test two types of sequencing constraints are inferred shared subsystem states and object define use relationships the inferred constraints are used to guide automatic generation of integration tests we have implemented substra with tool and applied the tool on an atm example the preliminary results show that the tool can effectively generate integration tests that exercise new program behaviors
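The sketch below illustrates, under simplifying assumptions, the general flavor of inferring call-sequence constraints from initial test executions and then generating sequences that respect them; the trace data and the "may-follow" abstraction are hypothetical and much coarser than the shared-state and define-use constraints described above.

```python
# Hypothetical sketch: learn which interface method may directly follow which
# from observed traces, then generate test sequences that respect that order.
import random
from collections import defaultdict

traces = [
    ["open", "authenticate", "withdraw", "close"],
    ["open", "authenticate", "deposit", "withdraw", "close"],
]

# 1. Infer "may-follow" constraints from the initial executions.
may_follow = defaultdict(set)
starts = {t[0] for t in traces}
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        may_follow[a].add(b)

# 2. Generate new call sequences that never violate the inferred constraints.
def generate(max_len=6, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(sorted(starts))]
    while len(seq) < max_len and may_follow[seq[-1]]:
        seq.append(rng.choice(sorted(may_follow[seq[-1]])))
    return seq

print(generate())
```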
robot control in uncertain and dynamic environments can be greatly improved using sensor based control vision is versatile low cost sensory modality but low sample rate high sensor delay and uncertain measurements limit its usability especially in strongly dynamic environments vision can be used to estimate dof pose of an object by model based pose estimation methods but the estimate is typically not accurate along all degrees of freedom force is complementary sensory modality allowing accurate measurements of local object shape when tooltip is in contact with the object in multimodal sensor fusion several sensors measuring different modalities are combined together to give more accurate estimate of the environment as force and vision are fundamentally different sensory modalities not sharing common representation combining the information from these sensors is not straightforward we show that the fusion of tactile and visual measurements makes it possible to estimate the pose of moving target at high rate and accuracy by making assumptions about the object shape and carefully modeling the uncertainties of the sensors the measurements can be fused together in an extended kalman filter experimental results show greatly improved pose estimates with the proposed sensor fusion
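As a toy illustration of Kalman-style fusion of two sensors with different noise levels, the sketch below runs measurement updates on a single scalar state; it is a drastic simplification (no process model, one dimension), not the full extended Kalman filter used for pose estimation here, and the sample values are invented.

```python
# Scalar Kalman measurement update fusing a noisy (vision-like) and an
# accurate (force-like) measurement of the same quantity.
def kalman_update(x, p, z, r):
    """x, p: current estimate and variance; z, r: measurement and its variance."""
    k = p / (p + r)                 # Kalman gain
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1.0                     # initial estimate of one pose coordinate, and its variance
for z, r in [(0.9, 0.5), (1.1, 0.5), (1.02, 0.05)]:   # two noisy samples, one accurate sample
    x, p = kalman_update(x, p, z, r)
print(x, p)                         # estimate is pulled strongly toward the accurate sample
```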
the design of decision procedures for combinations of theories sharing some arithmetic fragment is challenging problem in verification one possible solution is to apply combination method à la nelson oppen like the one developed by ghilardi for unions of non disjoint theories we show how to apply this non disjoint combination method with the theory of abelian groups as shared theory we consider the completeness and the effectiveness of this non disjoint combination method for the completeness we show that the theory of abelian groups can be embedded into theory admitting quantifier elimination for achieving effectiveness we rely on superposition calculus modulo abelian groups that is shown complete for theories of practical interest in verification
we propose in this paper technique for the acceleration of embedded java virtual machines the technique relies on an established synergy between efficient interpretation and selective dynamic compilation actually efficient interpretation is achieved by generated threaded interpreter that is made of pool of codelets the latter are native code units efficiently implementing the dynamic semantics of given bytecode besides each codelet carries out the dispatch to the next bytecode eliminating therefore the need for costly centralized traditional dispatch mechanism the acceleration technique described in this paper advocates the use of selective dynamic compiler to translate performance critical methods to native code the translation process takes advantage of the threaded interpreter by reusing most of the previously mentioned codelets this tight collaboration between the interpreter and the dynamic compiler leads to fast and lightweight in terms of footprint execution of java class files
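To illustrate the dispatch style described above, the sketch below builds one closure per bytecode of a toy program, and each closure performs its operation and then directly invokes its successor, so there is no central dispatch loop. This is a hypothetical Python analogue of threaded dispatch, not the paper's native-code codelets.

```python
# Threaded-dispatch sketch: each "codelet" ends by calling the next codelet.
def make_codelets(program):
    # Build codelets back to front so each one holds a direct reference to its
    # successor and dispatches to it itself.
    next_codelet = lambda stack: stack          # end of program: return the stack
    for op, arg in reversed(program):
        if op == "PUSH":
            next_codelet = (lambda nxt, a: lambda stack: nxt(stack + [a]))(next_codelet, arg)
        elif op == "ADD":
            next_codelet = (lambda nxt: lambda stack: nxt(stack[:-2] + [stack[-2] + stack[-1]]))(next_codelet)
        else:
            raise ValueError(op)
    return next_codelet

run = make_codelets([("PUSH", 2), ("PUSH", 3), ("ADD", None)])
print(run([]))   # [5]
```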
there is growing interest in designing high performance network devices to perform packet processing at flow level applications such as stateful access control deep inspection and flow based load balancing all require efficient flow level packet processing in this paper we present design of high performance flow level packet processing system based on multi core network processors main contribution of this paper includes high performance flow classification algorithm optimized for network processors an efficient flow state management scheme leveraging memory hierarchy to support large number of concurrent flows two hardware optimized order preserving strategies that preserve internal and external per flow packet order experimental results show that the proposed flow classification algorithm aggrecuts outperforms the well known hicuts algorithm in terms of classification rate and memory usage the presented sighash scheme can manage over concurrent flow states on the intel ixp np with extremely low collision rate the performance of internal packet order preserving scheme using sram queue array is about of that of external packet order preserving scheme realized by ordered thread execution
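The sketch below shows the basic shape of per-flow state management keyed on the 5-tuple, with creation on the first packet and eviction on an idle timeout; it is a generic hash-table illustration, not the SigHash scheme or anything tuned for a network processor's memory hierarchy.

```python
# Generic flow-state table keyed by the 5-tuple (illustrative only).
import time

class FlowTable:
    def __init__(self, idle_timeout=30.0):
        self.idle_timeout = idle_timeout
        self.flows = {}   # 5-tuple -> (last_seen, packet_count)

    def process(self, src, dst, sport, dport, proto, now=None):
        now = time.time() if now is None else now
        key = (src, dst, sport, dport, proto)
        _, count = self.flows.get(key, (now, 0))
        self.flows[key] = (now, count + 1)
        return count + 1   # packets seen so far on this flow

    def expire(self, now=None):
        now = time.time() if now is None else now
        stale = [k for k, (seen, _) in self.flows.items() if now - seen > self.idle_timeout]
        for key in stale:
            del self.flows[key]

table = FlowTable()
print(table.process("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"))
```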
this paper explores an important and relatively unstudied quality measure of sponsored search advertisement bounce rate the bounce rate of an ad can be informally defined as the fraction of users who click on the ad but almost immediately move on to other tasks high bounce rate can lead to poor advertiser return on investment and suggests search engine users may be having poor experience following the click in this paper we first provide quantitative analysis showing that bounce rate is an effective measure of user satisfaction we then address the question can we predict bounce rate by analyzing the features of the advertisement an affirmative answer would allow advertisers and search engines to predict the effectiveness and quality of advertisements before they are shown we propose solutions to this problem involving large scale learning methods that leverage features drawn from ad creatives in addition to their keywords and landing pages
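As a rough illustration of treating bounce-rate prediction as supervised learning over ad features, the sketch below fits a logistic regression on a bag-of-words of the creative text plus one landing-page feature. The features, data, and model choice are assumptions for illustration, not the large-scale methods used in the work above.

```python
# Toy bounce prediction from ad-creative text plus a landing-page feature.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

creatives = ["cheap flights book now", "free ringtones click here", "hotel deals official site"]
landing_page_load_secs = np.array([[1.2], [6.5], [0.9]])   # hypothetical landing-page feature
bounced = [0, 1, 0]                                        # 1 = high bounce rate observed

vec = CountVectorizer()
X = sp.hstack([vec.fit_transform(creatives), sp.csr_matrix(landing_page_load_secs)])
model = LogisticRegression().fit(X, bounced)

new = sp.hstack([vec.transform(["free prize click now"]), sp.csr_matrix([[5.0]])])
print(model.predict_proba(new)[0, 1])   # predicted probability of a bounce
```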
we describe the design and implementation of sword scalable resource discovery service for wide area distributed systems in contrast to previous systems sword allows users to describe desired resources as topology of interconnected groups with required intragroup intergroup and per node characteristics along with the utility that the application derives from specified ranges of metric values this design gives users the flexibility to find geographically distributed resources for applications that are sensitive to both node and network characteristics and allows the system to rank acceptable configurations based on their quality for that application rather than evaluating single implementation of sword we explore variety of architectural designs that deliver the required functionality in scalable and highly available manner we discuss the trade offs of using centralized architecture as compared to fully decentralized design to perform wide area resource discovery to summarize our results we found that centralized architecture based on node server cluster sites at network peering facilities outperforms decentralized dht based resource discovery infrastructure with respect to query latency for all but the smallest number of sites however although centralized architecture shows significant promise in stable environments we find that our decentralized implementation has acceptable performance and also benefits from the dht’s self healing properties in more volatile environments we evaluate the advantages and disadvantages of centralized and distributed resource discovery architectures on hosts in emulation and on approximately planetlab nodes spread across the internet
the dominant architecture for the next generation of shared memory multiprocessors is cc numa cache coherent non uniform memory architecture these machines are attractive as compute servers because they provide transparent access to local and remote memory however the access latency to remote memory is to times the latency to local memory cc now machines provide the benefits of cache coherence to networks of workstations at the cost of even higher remote access latency given the large remote access latencies of these architectures data locality is potentially the most important performance issue using realistic workloads we study the performance improvements provided by os supported dynamic page migration and replication analyzing our kernel based implementation we provide detailed breakdown of the costs we show that sampling of cache misses can be used to reduce cost without compromising performance and that tlb misses may not be consistent approximation for cache misses finally our experiments show that dynamic page migration and replication can substantially increase application performance as much as and reduce contention for resources in the numa memory system
in densely packed environments access point domains significantly overlap and wireless hosts interfere with each other in complex ways knowing which devices interfere is an essential first step to minimizing this interference improving efficiency and delivering quality connectivity throughout the network this knowledge however is extremely difficult to obtain without either taking running network offline for measurements or having client hosts monitor and report airspace anomalies something typically outside the control of network administrators in this paper we describe technique we have developed to reveal wireless network interference relationships by examining the network traffic at wired routers that connect wireless domains to the internet this approach which we call void wireless online interference detection searches for correlated throughput changes that occur when traffic from one node causes throughput drop at other nodes in its radio range in one analysis round we identify each node’s interference neighbours using single set of performance data collected from wired network router we have evaluated void in emulab testbeds consisting of tens of nodes as well as six node testbed in live wireless network the initial results have shown the promise of void to accurately correlate interfering devices together and effectively discriminate interfering devices from non interfering ones
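A very small sketch of the underlying signal: if one node's throughput gains coincide with another's losses, their per-interval throughput changes are strongly negatively correlated. This is only a caricature of correlated-throughput analysis with made-up numbers, not the VOID system itself.

```python
# Flag two nodes as likely interferers when their throughput deltas are
# strongly anti-correlated (requires Python 3.10+ for statistics.correlation).
from statistics import correlation

def deltas(series):
    return [b - a for a, b in zip(series, series[1:])]

def likely_interferers(tput_a, tput_b, threshold=-0.7):
    r = correlation(deltas(tput_a), deltas(tput_b))
    return r < threshold, r

# throughput samples (Mbit/s) observed at the wired router, per interval
node_a = [10, 10, 4, 3, 9, 10, 5]
node_b = [1, 1, 8, 9, 2, 1, 7]
print(likely_interferers(node_a, node_b))
```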
in this paper we initiate study on comparing artifact centric workflow schemas in terms of the ability of one schema to emulate the possible behaviors of another schema artifact centric workflows are centered around business artifacts which contain both data schema which can hold all of the data about key business entity as it passes through workflow along with lifecycle schema which specifies the possible ways that the entity can evolve through the workflow in this paper the data schemas for artifact types are finite sets of attribute value pairs and the lifecycle schemas are specified as sets of condition action rules where the condition is evaluated against the current snapshot of the artifact and where the actions are external services or tasks which read subset of the attributes of an artifact which write onto subset of the attributes and which are performed by an entity outside of the workflow system often human the services are also characterized by pre and post conditions in the spirit of semantic web services to compare artifact centric workflows we introduce the notion of dominance which intuitively captures the fact that all executions of workflow can be emulated by second workflow in the current paper the emulation is focused only on the starting and ending snapshots of the possible enactments of the two workflows in fact dominance is parametric notion that depends on the characterization of the policies that govern the execution of the services invoked by the workflows in this paper we study in detail the case of absolute dominance in which this policy places no constraints on the possible service executions we provide decidability and complexity results for bounded and unbounded workflow executions in the cases where the values in an artifact range over an infinite structure such as the integers the rationals or the reals possibly with order addition or multiplication
we describe path approximate search process based on an extended editing distance designed to manage don’t care characters with variable length in path matching scheme extending xpath the structural path is bounded to conditional properties using variables whose values are retrieved thanks to backtracking processed on the editing distance matrix this system provides dedicated iterator for xml query and processing scripting language that features large xml document collection management joint operations and extraction features
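To show what an edit distance with variable-length don't-care symbols can look like in path matching, here is a small dynamic-programming sketch in which a "*" pattern step absorbs any run of path steps at no cost. The scoring is a generic illustration, not the paper's exact extended editing distance.

```python
# Edit distance between a path pattern and a document path, where "*" is a
# variable-length don't-care that matches any run of steps for free.
def path_edit_distance(pattern, path):
    m, n = len(pattern), len(path)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        dp[0][j] = j                                   # insertions
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + (0 if pattern[i - 1] == "*" else 1)
        for j in range(1, n + 1):
            if pattern[i - 1] == "*":
                dp[i][j] = min(dp[i - 1][j],           # "*" matches nothing
                               dp[i][j - 1])           # "*" absorbs path[j-1]
            else:
                sub = dp[i - 1][j - 1] + (pattern[i - 1] != path[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[m][n]

# "*" bridges the intermediate steps between book and title
print(path_edit_distance(["book", "*", "title"], ["book", "chapter", "section", "title"]))  # 0
```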
most web applications are data intensive ie they rely heavily on dynamic contents usually stored in databases website design and maintenance can greatly benefit from conceptual descriptions of both data and hypermedia aspects ie those design dimensions which distinguish this application class the data upon which the content is based the way dynamic contents are composed together to form pages and how pages are linked together in order to move across the application content the paper proposes webile visual domain specific language based on uml which enables model driven approach to high level specification of web applications in contrast with other approaches webile exploits the uml meta model architecture by serialising the specifications in the xmi interchange format this representation provides interoperability amongst different operative platforms and enables an xsl transformation based automatic generation of the applications that are being designed
we propose nce an efficient algorithm to identify and extract relevant content from news webpages we define relevant as the textual sections that more objectively describe the main event in the article this includes the title and the main body section and excludes comments about the story and presentation elements our experiments suggest that nce is competitive in terms of extraction quality with the best methods available in the literature it achieves in our test corpus containing news webpages from sites the main advantages of our method are its simplicity and its computational performance it is at least an order of magnitude faster than methods that use visual features this characteristic is very suitable for applications that process large number of pages
in recent years the labels gossip and gossip based have been applied to an increasingly general class of algorithms including approaches to information aggregation overlay network management and clock synchronization these algorithms are intuitively similar irrespective of their purpose their distinctive features include relying on local information being round based and relatively simple and having bounded information transmission and processing complexity in each round our position is that this class can and should be significantly extended to involve algorithms from other disciplines that share the same or similar distinctive features like certain parallel numerical algorithms routing protocols bio inspired algorithms and cellular automata to name but few such broader perspective would allow us to import knowledge and tools to design and understand gossip based distributed systems and we could also export accumulated knowledge to re interpret some of the problems in other disciplines such as vehicular traffic control in this position paper we describe number of areas that show parallels with gossip protocols these example areas will hopefully serve as inspiration for future research in addition we believe that comparisons with other fields also helps clarify the definition of gossip protocols and represents necessary first step towards an eventual formal definition
general framework that integrates both control and data speculation using alias profiling and or compiler heuristic rules has been shown to improve spec performance on itanium systems however speculative optimizations require check instructions and recovery code to ensure correct execution when speculation fails at runtime how to generate check instructions and their associated recovery code efficiently and effectively is an issue yet to be well studied also it is very important that the recovery code generated in the earlier phases integrate gracefully in the later optimization phases at the very least it should not hinder later optimizations thus ensuring overall performance improvement this paper proposes framework that uses an if block structure to facilitate check instructions and recovery code generation for general speculative optimizations it allows speculative instructions and their recovery code generated in the early compiler optimization phases to be integrated effectively with the subsequent optimization phases it also allows multi level speculation for multi level pointers and multi level expression trees to be handled with no additional complexity the proposed recovery code generation framework has been implemented in the open research compiler orc
many nlp applications entail that texts are classified based on their semantic distance how similar or different the texts are for example comparing the text of new document to that of documents of known topics can help identify the topic of the new text typically distributional distance is used to capture the implicit semantic distance between two pieces of text however such approaches do not take into account the semantic relations between words in this article we introduce an alternative method of measuring the semantic distance between texts that integrates distributional information and ontological knowledge within network flow formalism we first represent each text as collection of frequency weighted concepts within an ontology we then make use of network flow method which provides an efficient way of explicitly measuring the frequency weighted ontological distance between the concepts across two texts we evaluate our method in variety of nlp tasks and find that it performs well on two of three tasks we develop new measure of semantic coherence that enables us to account for the performance difference across the three data sets shedding light on the properties of data set that lends itself well to our method
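The sketch below gives one plausible reading of measuring text distance as the cheapest way to "move" one text's frequency-weighted concept mass onto the other's, with moving costs given by ontological distance and the transport solved as a min-cost flow. Concepts, frequencies, and costs are invented; this is an illustration of the general network-flow idea, not the paper's formalism.

```python
# Toy concept-transport distance via min-cost flow (networkx).
import networkx as nx

text_a = {"dog": 3, "car": 1}          # concept -> frequency in text A
text_b = {"animal": 2, "vehicle": 2}   # concept -> frequency in text B
cost = {("dog", "animal"): 1, ("dog", "vehicle"): 5,
        ("car", "animal"): 5, ("car", "vehicle"): 1}   # hypothetical ontological distances

G = nx.DiGraph()
for c, f in text_a.items():
    G.add_node(("a", c), demand=-f)            # supply nodes
for c, f in text_b.items():
    G.add_node(("b", c), demand=f)             # demand nodes
for (ca, cb), w in cost.items():
    G.add_edge(("a", ca), ("b", cb), weight=w, capacity=4)

flow = nx.min_cost_flow(G)
print(nx.cost_of_flow(G, flow))   # total "transport" cost between the two texts
```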
since more and more business data are represented in xml format there is compelling need of supporting analytical operations in xml queries particularly the latest version of xquery proposed by w3c xquery introduces new construct to explicitly express grouping operation in flwor expression existing works in xml query processing mainly focus on physically matching query structure over xml document given the explicit grouping operation in query how to efficiently compute grouping and aggregate functions over xml document is not well studied yet in this paper we extend our previous xml query processing algorithm vert to efficiently perform grouping and aggregate function in queries the main technique of our approach is introducing relational tables to index values query pattern matching and aggregation computing are both conducted with table indices we also propose two semantic optimizations to further improve the query performance finally we present experimental results to validate the efficiency of our approach over other existing approaches
in the design of wireless mesh networks wmns one of the fundamental considerations is the reliability and availability of communication paths between network pairs in the presence of nodes failure the reliability and deployment cost are important and are largely determined by network topology usually network performance and reliability are considered separately in this paper we propose new algorithm based on ear decomposition for constructing reliable wmn infrastructure that resists the failure of single mesh node and ensures full coverage to all mesh clients mcs via case study we show the tied relationship between network deployment cost performance and reliability in simultaneous optimization of cost and load balance over network channels the optimization model proposed is solved using metaheuristics which provides the network operator with set of reliable tradeoff solutions
differentiated service approach has been proposed as potential solution to provide quality of services qos in the next generation internet the ultimate goal of end to end service differentiation can be achieved by complementing the network level qos with the service differentiation at the internet servers in this paper we have presented detailed study of the performance of service differentiating web servers sdis various aspects such as admission control scheduling and task assignment schemes for sdis have been evaluated through real workload traces the impact of these aspects has been quantified in the simulation based study under high system utilization service differentiating server provides significantly better services to high priority tasks compared to traditional internet server combination of selective early discard and priority based task scheduling and assignment is required to provide efficient service differentiation at the servers the results of these studies could be used as foundation for further studies on service differentiating internet servers
in recent years overlay networks have become an important vehicle for delivering internet applications overlay network nodes are typically implemented using general purpose servers or clusters we investigate the performance benefits of more integrated architectures combining general purpose servers with high performance network processor np subsystems we focus on planetlab as our experimental context and report on the design and evaluation of an experimental planetlab platform capable of much higher levels of performance than typical system configurations to make it easier for users to port applications the system supports fast path slow path application structure that facilitates the mapping of the most performance critical parts of an application onto an np subsystem while allowing the more complex control and exception handling to be implemented within the programmer friendly environment provided by conventional servers we report on implementations of two sample applications an ipv router and forwarding application for the internet indirection infrastructure we demonstrate an improvement in packet processing rates and comparable reductions in latency
high performance parallel programs are currently difficult to write and debug one major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism locks are the most common mechanism but they have number of disadvantages including possibly unnecessary serialization and possible deadlock transactional memory is an alternative mechanism that makes parallel programming easier with transactional memory transaction provides atomic and serializable operations on an arbitrary set of memory locations when transaction commits all operations within the transaction become visible to other threads when it aborts all operations in the transaction are rolled back transactional memory can be implemented in either hardware or software straightforward hardware approach can have high performance but imposes strict limits on the amount of data updated in each transaction software approach removes these limits but incurs high overhead we propose novel hybrid hardware software transactional memory scheme that approaches the performance of hardware scheme when resources are not exhausted and gracefully falls back to software scheme otherwise
we recently developed static analysis to extract runtime architectures from object oriented programs written in existing languages the approach relies on adding ownership domain annotations to the code and statically extracts hierarchical runtime architecture from an annotated program we present promising results from week long on site field study to evaluate the method and the tools on kloc module of kloc commercial system in few days we were able to add the annotations to the module and extract top level architecture for review by developer
in many applications involving continuous data streams data arrival is bursty and data rate fluctuates over time systems that seek to give rapid or real time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance we discuss one strategy for processing bursty streams adaptive load aware scheduling of query operators to minimize resource consumption during times of peak load we show that the choice of an operator scheduling strategy can have significant impact on the runtime system memory usage as well as output latency our aim is to design scheduling strategy that minimizes the maximum runtime system memory while maintaining the output latency within prespecified bounds we first present chain scheduling an operator scheduling strategy for data stream systems that is near optimal in minimizing runtime memory usage for any collection of single stream queries involving selections projections and foreign key joins with stored relations chain scheduling also performs well for queries with sliding window joins over multiple streams and multiple queries of the above types however during bursts in input streams when there is buildup of unprocessed tuples chain scheduling may lead to high output latency we study the online problem of minimizing maximum runtime memory subject to constraint on maximum latency we present preliminary observations negative results and heuristics for this problem thorough experimental evaluation is provided where we demonstrate the potential benefits of chain scheduling and its different variants compare it with competing scheduling strategies and validate our analytical conclusions
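The sketch below captures only the greedy intuition behind memory-minimizing operator scheduling: at each step, run the queued operator that sheds the most in-flight state per unit of processing time. It is a simplified stand-in with invented numbers, not the exact lower-envelope chain construction described above.

```python
# Greedy "steepest descent" operator scheduling sketch for a stream system.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Operator:
    name: str
    cost: float          # time to process one tuple
    selectivity: float   # output size / input size per tuple
    queue: deque = field(default_factory=deque)

    def priority(self):
        return (1.0 - self.selectivity) / self.cost   # size reduction per unit time

def schedule_step(operators):
    runnable = [op for op in operators if op.queue]
    if not runnable:
        return None
    op = max(runnable, key=Operator.priority)
    size = op.queue.popleft()                 # process one queued tuple batch
    return op.name, size * op.selectivity     # remaining size after the operator

ops = [Operator("filter", cost=1.0, selectivity=0.1, queue=deque([100.0])),
       Operator("join", cost=2.0, selectivity=0.8, queue=deque([100.0]))]
print(schedule_step(ops))   # the filter runs first: biggest memory drop per unit time
```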
many html pages are generated by software programs by querying some underlying databases and then filling in template with the data in these situations the metainformation about the data structure is lost so automated software programs cannot process these data in such powerful manners as information from databases we propose set of novel techniques for detecting structured records in web page and extracting the data values that constitute them our method needs only an input page it starts by identifying the data region of interest in the page then it is partitioned into records by using clustering method that groups similar subtrees in the dom tree of the page finally the attributes of the data records are extracted by using method based on multiple string alignment we have tested our techniques with high number of real web sources obtaining high precision and recall values
this article describes method for creating an evaluation measure for discourse understanding in spoken dialogue systems no well established measure has yet been proposed for evaluating discourse understanding which has made it necessary to evaluate it only on the basis of the system’s total performance such evaluations however are greatly influenced by task domains and dialogue strategies to find measure that enables good estimation of system performance only from discourse understanding results we enumerated possible discourse understanding related metrics and calculated their correlation with the system’s total performance through dialogue experiments
generating extension of program specializes the program with respect to part of the input applying partial evaluator to the program trivially yields generating extension but specializing the partial evaluator with respect to the program often yields more efficient one this specialization can be carried out by the partial evaluator itself in this case the process is known as the second futamura projection we derive an ml implementation of the second futamura projection for type directed partial evaluation tdpe due to the differences between traditional syntax directed partial evaluation and tdpe this derivation involves several conceptual and technical steps these include suitable formulation of the second futamura projection and techniques for making tdpe amenable to self application in the context of the second futamura projection we also compare and relate tdpe with conventional off line partial evaluation we demonstrate our technique with several examples including compiler generation for tiny prototypical imperative language
stateless model checking is useful state space exploration technique for systematically testing complex real world software existing stateless model checkers are limited to the verification of safety properties on terminating programs however realistic concurrent programs are nonterminating property that significantly reduces the efficacy of stateless model checking in testing them moreover existing stateless model checkers are unable to verify that nonterminating program satisfies the important liveness property of livelock freedom property that requires the program to make continuous progress for any input to address these shortcomings this paper argues for incorporating fair scheduler in stateless exploration the key contribution of this paper is an explicit scheduler that is strongly fair and at the same time sufficiently nondeterministic to guarantee full coverage of safety properties we have implemented the fair scheduler in the chess model checker we show through theoretical arguments and empirical evaluation that our algorithm satisfies two important properties it visits all states of finite state program achieving state coverage at faster rate than existing techniques and it finds all livelocks in finite state program before this work nonterminating programs had to be manually modified in order to apply chess to them the addition of fairness has allowed chess to be effectively applied to real world nonterminating programs without any modification for example we have successfully booted the singularity operating system under the control of chess
this paper provides an overview of how empirical research can be valid approach to improve epistemological foundations and ontological representations in software engineering se despite all the research done in se most of the results have not yet been stated as laws theories hypotheses or conjectures ie from an epistemological point of view this paper explores such facts and advocates that the use of empirical methods can help to improve this situation furthermore it is also imperative for se experiments to be planned and executed properly in order to be valid epistemologically finally this paper presents some epistemological and ontological results obtained from empirical research in se
the open source mobility middleware developed in the fuego core project provides stack for efficient xml processing on limited devices its components are persistent map api advanced xml serialization and out of order parsing with byte level access xas data structures and algorithms for lazy manipulation and random access to xml trees reftree and component for xml document management raxs such as packaging versioning and synchronization the components provide toolbox of simple and lightweight xml processing techniques rather than complete xml database we demonstrate the fuego xml stack by building viewer and multiversion editor capable of processing gigabyte sized wikipedia xml files on mobile phone we present performance measurements obtained on the phone and comparison to implementations based on existing technologies these show that the fuego xml stack allows going beyond what is commonly considered feasible on limited devices in terms of xml processing and that it provides advantages in terms of decreased set up time and storage space requirements compared to existing approaches
the exciting developments in the world wide web www have revived interest in computer simulation for modeling particularly for conceiving simulation languages and building model libraries that can be assembled and executed over the internet and for analysis particularly for developing simulation optimization algorithms for parallel experimentation this paper contributes to this second stream of research by introducing framework for optimization via simulation ovs through parallel replicated discrete event simulation prdes in particular we combine nested partitions np and extended optimal computing budget allocation eocba to provide an efficient framework for prdes experiments the number of candidate alternatives to be evaluated can be reduced by the application of np eocba modification of the optimal computing budget allocation ocba for prdes minimizes the number of simulation replications required to evaluate particular alternative by allocating computing resources to potentially critical alternatives we deploy web services technologies based on the java axis and net framework to create viable infrastructure for heterogeneous prdes systems this approach which receives increasing attention under the banners of grid computing and cloud computing further promotes reusability scalability and interoperability experimental results with prototype implementation not only furnish proof of concept but also illustrate significant gains in simulation efficiency with prdes the proposed concept and techniques can also be applied to simulation models that require coordination and interoperation in heterogeneous environments such as decentralized supply chains
the internet witnessed its traffic evolve from text and image based traditional web content to more multimedia rich applications in the past decade as result multimedia and internet streaming technology have become an increasingly important building block to many internet applications ranging from remote education digital radio internet protocol tv iptv etc this paper discusses the fundamental issues in internet streaming delivery and associated technical challenges it emphasizes the architecture and system differences between streaming applications and traditional web applications it reviews our research experience and presents lessons learned in this area as well as points out directions for future work
in this paper we present new approach to web search personalization based on user collaboration and sharing of information about web documents the proposed personalization technique separates data collection and user profiling from the information system whose contents and indexed documents are being searched for ie the search engines and uses social bookmarking and tagging to re rank web search results it is independent of the search engine being used so users are free to choose the one they prefer even if their favorite search engine does not natively support personalization we show how to design and implement such system in practice and investigate its feasibility and usefulness with large sets of real word data and user study
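As a small, engine-independent illustration of re-ranking by social bookmarking data, the sketch below scores each result URL by how well its tags overlap with a user's own tag profile; the scoring function and data are hypothetical, not the system's actual profiling method.

```python
# Re-rank an engine's results by overlap between URL tags and the user's tag profile.
def rerank(results, url_tags, user_profile):
    """results: URLs in engine order; url_tags: url -> set of tags;
    user_profile: tag -> weight accumulated from the user's bookmarks."""
    def personal_score(url):
        return sum(user_profile.get(t, 0.0) for t in url_tags.get(url, ()))
    # stable sort: ties keep the engine's original order
    return sorted(results, key=personal_score, reverse=True)

results = ["http://a.example", "http://b.example", "http://c.example"]
url_tags = {"http://a.example": {"python", "tutorial"},
            "http://b.example": {"snake", "zoo"},
            "http://c.example": {"python", "web"}}
user_profile = {"python": 2.0, "web": 1.0}
print(rerank(results, url_tags, user_profile))
```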
the content based cross media retrieval is new type of multimedia retrieval in which the media types of query examples and the returned results can be different in order to learn the semantic correlations among multimedia objects of different modalities the heterogeneous multimedia objects are analyzed in the form of multimedia document mmd which is set of multimedia objects that are of different media types but carry the same semantics we first construct an mmd semi semantic graph mmdssg by jointly analyzing the heterogeneous multimedia data after that cross media indexing space cmis is constructed for each query the optimal dimension of cmis is automatically determined and the cross media retrieval is performed on per query basis by doing this the most appropriate retrieval approach for each query is selected ie different search methods are used for different queries the query dependent search methods make cross media retrieval performance not only accurate but also stable we also propose different learning methods of relevance feedback rf to improve the performance experiment is encouraging and validates the proposed methods
despite the extensiveness of recent investigations on static typing for xml parametric polymorphism has rarely been treated this well established typing discipline can also be useful in xml processing in particular for programs involving parametric schemas ie schemas parameterized over other schemas eg soap the difficulty in treating polymorphism for xml lies in how to extend the semantic approach used in the mainstream monomorphic xml type systems naive extension would be semantic quantification over all substitutions for type variables however this approach reduces to an nexptime complete problem for which no practical algorithm is known in this paper we propose different method that smoothly extends the semantic approach yet is algorithmically easier in this we devise novel and simple marking technique where we interpret polymorphic type as set of values with annotations of which subparts are parameterized we exploit this interpretation in every ingredient of our polymorphic type system such as subtyping inference of type arguments and so on as result we achieve sensible system that directly represents usual expected behavior of polymorphic type systems values of variable types are never reconstructed in reminiscence of reynolds’ parametricity theory also we obtain set of practical algorithms for typechecking by local modifications to existing ones for monomorphic system
the problem of content based image retrieval cbir has traditionally been investigated within framework that emphasises the explicit formulation of query users initiate an automated search for relevant images by submitting an image or drawing sketch that exemplifies their information need often relevance feedback is incorporated as post retrieval step for optimising the way evidence from different visual features is combined while this sustained methodological focus has helped cbir to mature it has also brought out its limitations more clearly there is often little support for exploratory search and scaling to very large collections is problematic moreover the assumption that users are always able to formulate an appropriate query is questionable an effective albeit much less studied method of accessing image collections based on visual content is that of browsing the aim of this survey paper is to provide structured overview of the different models that have been explored over the last one to two decades to highlight the particular challenges of the browsing approach and to focus attention on few interesting issues that warrant more intense research
the general purpose shape retrieval problem is challenging task particularly an ideal technique which can work in cluttered environment meet the requirements of perceptual similarity measure on partial query and overcome dimensionality curse and adverse environment is in demand this paper reports our study on one local structural approach that addresses these issues shape representation and indexing are two key points in shape retrieval the proposed approach combines novel local structure based shape representation and new histogram indexing structure the former makes possible partial shape matching of objects without the requirement of segmentation separation of objects from complex background while the latter has an advantage on indexing performance the search time is linearly proportional to the input complexity in addition the method is relatively robust under adverse environments it is able to infer retrieval results from incomplete information of an input by first extracting consistent and structurally unique local neighborhood information from inputs or models and then voting on the optimal matches thousands of images have been used to test the proposed concepts on sensitivity analysis similarity based retrieval partial query and mixed object query very encouraging experimental results with respect to efficiency and effectiveness have been obtained
it is challenging problem of surface based deformation to avoid apparent volumetric distortions around largely deformed areas in this paper we propose new rigidity constraint for gradient domain mesh deformation to address this problem intuitively the proposed constraint can be regarded as several small cubes defined by the mesh vertices through mean value coordinates the user interactively specifies the cubes in the regions which are prone to volumetric distortions and the rigidity constraints could make the mesh behave like solid object during deformation the experimental results demonstrate that our constraint is intuitive easy to use and very effective
multicast routing in mobile ad hoc networks manets poses several challenges due to inherent characteristics of the network such as node mobility reliability scarce resources etc this paper proposes an agent based multicast routing scheme abmrs in manets which uses set of static and mobile agents five types of agents are used in the scheme route manager static agent network initiation mobile agent network management static agent multicast initiation mobile agent and multicast management static agent the scheme operates in the following steps to identify reliable nodes to connect reliable nodes through intermediate nodes to construct backbone for multicasting using reliable nodes and intermediate nodes to join multicast group members to the backbone to perform backbone and group members management in case of mobility the scheme has been simulated in various network scenarios to test operation effectiveness in terms of performance parameters such as packet delivery ratio control overheads and group reliability also comparison of proposed scheme with maodv multicast ad hoc on demand distance vector protocol is presented abmrs performs better than maodv as observed from the simulation abmrs offers flexible and adaptable multicast services and also supports component based software development
the use of blogs to track and comment on real world political news entertainment events is growing similarly as more individuals start relying on the web as their primary information source and as more traditional media outlets try reaching consumers through alternative venues the number of news sites on the web is also continuously increasing content reuse whether in the form of extensive quotations or content borrowing across media outlets is very common in blogs and news entries outlets tracking the same real world event knowledge about which web entries re use content from which others can be an effective asset when organizing these entries for presentation on the other hand this knowledge is not cheap to acquire considering the size of the related space web entries it is essential that the techniques developed for identifying re use are fast and scalable furthermore the dynamic nature of blog and news entries necessitates incremental processing for reuse detection in this paper we develop novel qsign algorithm that efficiently and effectively analyzes the blogosphere for quotation and reuse identification experiment results show that with qsign processing time gains from to are possible while maintaining reuse detection rates of up to furthermore processing time gains can be pushed multiple orders of magnitude from to for recall
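The abstract does not spell out qsign's signature construction, so the following is only a minimal sketch, assuming a generic shingle-hash representation of entries, of how overlap between fingerprinted word windows can flag quotation or content reuse; the incremental aspects of qsign are not modeled.

```python
# minimal sketch of shingle-based reuse detection between web entries;
# qsign's actual signature scheme is not described in the abstract, so this
# only illustrates the generic idea of comparing fingerprinted word windows
import hashlib

def shingles(text, k=5):
    """Return the set of hashed k-word windows (shingles) of a text."""
    words = text.lower().split()
    return {
        hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(len(words) - k + 1, 1))
    }

def reuse_score(entry_a, entry_b, k=5):
    """Fraction of entry_a's shingles that also appear in entry_b."""
    a, b = shingles(entry_a, k), shingles(entry_b, k)
    return len(a & b) / len(a) if a else 0.0

blog = "the senator said the bill would never pass in its current form this year"
news = "in a press briefing the senator said the bill would never pass in its current form"
print(reuse_score(blog, news))  # high overlap signals quotation / content reuse
```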
this course summarizes the motivations and requirements for camera control presents an overview of the state of the art and examines promising avenues and hot topics for future research it classifies the various techniques and identifies the representational limits and commitments of each approaches range from completely interactive techniques based on the possible mappings between user’s input and the camera parameters to completely automated paradigms in which the camera moves and jumps according to high level scenario oriented goals between these extremes lie approaches with more limited expressiveness that use range of algebraic and constraint based optimization techniques the course includes number of live examples from both commercial systems and research prototypes and it emphasizes the tough issues facing application developers such as real time handling of visibility for complex multiple targets in dynamic environments multi object tracking
microarchitectural redundancy has been proposed as means of improving chip lifetime reliability it is typically used in reactive way allowing chips to maintain operability in the presence of failures by detecting and isolating correcting and or replacing components on first come first served basis only after they become faulty in this paper we explore an alternative more preferred method of exploiting microarchitectural redundancy to enhance chip lifetime reliability in our proposed approach redundancy is used proactively to allow non faulty microarchitecture components to be temporarily deactivated on rotating basis to suspend and or recover from certain wearout effects this approach improves chip lifetime reliability by warding off the onset of wearout failures as opposed to reacting to them posteriorly applied to on chip cache sram for combating nbti induced wearout failure our proactive wearout recovery approach increases lifetime reliability measured in mean time to failure of the cache by about factor of seven relative to no use of microarchitectural redundancy and factor of five relative to conventional reactive use of redundancy having similar area overhead
data parallel algorithms are presented for polygonizing collection of line segments represented by data parallel bucket pmr quadtree data parallel tree and data parallel tree such an operation is useful in geographic information system gis sample performance comparison of the three data parallel structures for this operation is also given
we propose novel method for obtaining more accurate tangential velocities for solid fluid coupling our method works for both rigid and deformable objects as well as both volumetric objects and thin shells the fluid can be either one phase such as smoke or two phase such as water with free surface the coupling between the solid and the fluid can either be one way with kinematic solids or fully two way coupled the only previous scheme that was general enough to handle both two way coupling and thin shells required mass lumping strategy that did not allow for freely flowing tangential velocities similar to that previous work our method prevents leaking of fluid across thin shell however unlike that work our method does not couple the tangential velocities in any fashion allowing for the proper slip independently on each side of the body moreover since it accurately and directly treats the tangential velocity it does not rely on grid refinement to obtain reasonable solution therefore it gives highly improved result on coarse meshes
software designers in the object oriented paradigm can make use of modeling tools and standard notations such as uml nevertheless casual observations from collocated design collaborations suggest that teams tend to use physical mediums to sketch plethora of informal diagrams in varied representations that often diverge from uml to better understand such collaborations and support them with tools we need to understand the origins roles uses and implications of these alternate representations to this end we conducted observational studies of collaborative design exercises in which we focused on representation use our primary finding is that teams intentionally improvise representations and organize design information in response to ad hoc needs which arise from the evolution of the design and which are difficult to meet with fixed standard notations this behavior incurs orientation and grounding difficulties for which teams compensate by relying on memory other communication mediums and contextual cues without this additional information the artifacts are difficult to interpret and have limited documentation potential collaborative design tools and processes should therefore focus on preserving contextual information while permitting unconstrained mixing and improvising of notations
definitional trees have been introduced by sergio antoy in order to design an efficient term rewrite strategy which computes needed outermost redexes in this paper we consider the use of definitional trees in the context of term graph rewriting we show that unlike the case of term rewrite systems the strategies induced by definitional trees do not always compute needed redexes in presence of term graph rewrite systems we then define new class called inductively sequential term graph rewrite systems istgrs for which needed redexes are still provided by definitional trees systems in this class are not confluent in general we give additional syntactic criteria over istgrs’s which ensure the confluence property with respect to the set of admissible term graphs
this paper describes generalization of the god object method for haptic interaction between rigid bodies our approach separates the computation of the motion of the six degree of freedom god object from the computation of the force applied to the user the motion of the god object is computed using continuous collision detection and constraint based quasi statics which enables high quality haptic interaction between contacting rigid bodies the force applied to the user is computed using novel constraint based quasi static approach which allows us to suppress force artifacts typically found in previous methods the constraint based force applied to the user which handles any number of simultaneous contact points is computed within few microseconds while the update of the configuration of the rigid god object is performed within few milliseconds for rigid bodies containing up to tens of thousands of triangles our approach has been successfully tested on complex benchmarks our results show that the separation into asynchronous processes allows us to satisfy the different update rates required by the haptic and visual displays force shading and textures can be added and enlarge the range of haptic perception of virtual environment this paper is an extension of
challenge involved in applying density based clustering to categorical datasets is that the cube of attribute values has no ordering defined we propose the hierdenc algorithm for hierarchical density based clustering of categorical data hierdenc offers basis for designing simpler clustering algorithms that balance the tradeoff of accuracy and speed the characteristics of hierdenc include it builds hierarchy representing the underlying cluster structure of the categorical dataset ii it minimizes the user specified input parameters iii it is insensitive to the order of object input iv it can handle outliers we evaluate hierdenc on small dimensional standard categorical datasets on which it produces more accurate results than other algorithms we present faster simplification of hierdenc called the mulic algorithm mulic performs better than subspace clustering algorithms in terms of finding the multi layered structure of special datasets
we describe an approach to designing and implementing distributed system as family of related finite state machines generated from single abstract model various artefacts are generated from each state machine including diagrams source level protocol implementations and documentation the state machine family formalises the interactions between the components of the distributed system allowing increased confidence in correctness our methodology facilitates the application of state machines to problems for which they would not otherwise be suitable we illustrate the technique with the example of byzantine fault tolerant commit protocol used in distributed storage system showing how an abstract model can be defined in terms of an abstract state space and various categories of state transitions we describe how such an abstract model can be deployed in concrete system and propose general methodology for developing systems in this style
all to all personalized communication commonly occurs in many important parallel algorithms such as fft and matrix transpose this paper presents new algorithms for all to all personalized communication or complete exchange in multidimensional torus or mesh connected multiprocessors for an r times c torus or mesh where r leq c the proposed algorithms have time complexities of message startups and rc message transmissions the algorithms for three or higher dimensional tori or meshes follow similar structure unlike other existing message combining algorithms in which the number of nodes in each dimension should be power of two and square the proposed algorithms accommodate non power of two tori or meshes where the number of nodes in each dimension need not be power of two and square in addition destinations remain fixed over larger number of steps in the proposed algorithms thus making them amenable to optimizations finally the data structures used are simple hence making substantial savings of message rearrangement time
predicated execution has been used to reduce the number of branch mispredictions by eliminating hard to predict branches however the additional instruction overhead and additional data dependencies due to predicated execution sometimes offset the performance advantage of having fewer mispredictions we propose mechanism in which the compiler generates code that can be executed either as predicated code or non predicated code ie code with normal conditional branches the hardware decides whether the predicated code or the non predicated code is executed based on run time confidence estimation of the branch prediction the code generated by the compiler is the same as predicated code except the predicated conditional branches are not removed they are left intact in the program code these conditional branches are called wish branches the goal of wish branches is to use predicated execution for hard to predict dynamic branches and branch prediction for easy to predict dynamic branches thereby obtaining the best of both worlds we also introduce class of wish branches called wish loops which utilize predication to reduce the misprediction penalty for hard to predict backward loop branches we describe the semantics types and operation of wish branches along with the software and hardware support required to generate and utilize them our results show that wish branches decrease the average execution time of subset of spec int benchmarks by compared to traditional conditional branches and by compared to the best performing predicated code binary
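As a worked illustration of the trade-off that wish branches arbitrate at run time, the toy model below compares the expected cost of a conditional branch against always-predicated code; all latencies and instruction counts are hypothetical values chosen for the example, not figures from the paper.

```python
# back-of-envelope model of when predicated execution beats a conditional
# branch; the latencies below are hypothetical, chosen only to illustrate the
# trade-off that wish branches resolve at run time via confidence estimation
def branch_cost(mispredict_rate, misprediction_penalty=20):
    # branch path: pay the pipeline flush penalty only on mispredictions
    return mispredict_rate * misprediction_penalty

def predicated_cost(extra_instructions=4):
    # predicated path: always pay for executing both sides of the hammock
    return extra_instructions

for rate in (0.02, 0.10, 0.30):
    use_predication = predicated_cost() < branch_cost(rate)
    print(f"mispredict rate {rate:.2f}: "
          f"{'predicate' if use_predication else 'branch'}")
# a wish branch defers this choice to the hardware, which predicates only
# when the run-time confidence estimator flags the branch as hard to predict
```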
this paper proposes formal framework for agent communication where agents can reason about their goals using strategic reasoning this reasoning is argumentation based and enables agents to generate set of strategic goals depending on set of constraints sub goals are generated using this reasoning and they can be cancelled or substituted for alternatives during the dialogue progress an original characteristic of this framework is that agents can use this strategic reasoning together with tactic reasoning to persist in the achievement of their goals by considering alternatives depending on set of constraints tactic reasoning is responsible for selecting the communicative acts to perform in order to realize the strategic goals some constraints are fixed when the conversation starts and others during the dialogue progress the paper also discusses the computational complexity of such reasoning
coordination protocols are widely recognized as an efficient mechanism to support agent interaction involved in modern multi agent system mas applications this paper proposes solution for dynamic execution of coordination protocols in such open and distributed mas applications more precisely this paper shows how agents can dynamically take part in conversations where the role they are intended to hold within protocol is played without the need of prior knowledge
allocation of goal responsibilities to agent roles in multi agent systems mas influences the degree to which these systems satisfy nonfunctional requirements this paper proposes systematic approach that starts from nonfunctional requirements identification and moves towards agent role definition guided by the degree of nonfunctional requirements satisfaction the approach relies on goal dependencies to allow potential mas vulnerabilities to be studied in contrast to related work where organizational patterns are imposed on mas roles are constructed first allowing mas organizational structures to emerge from role definitions
streaming data models have been shown to be useful in many applications requiring high performance data exchange application level overlay networks are natural way to realize these applications data flows and their internal computations but existing middleware is not designed to scale to the data rates and low overhead computations necessary for the high performance domain this paper describes evpath middleware infrastructure that supports the construction and management of overlay networks that can be customized both in topology and in the data manipulations being performed extending from previous high performance publish subscribe system evpath not only provides for the low overhead movement and in line processing of large data volumes but also offers the flexibility needed to support the varied data flow and control needs of alternative higher level streaming models we explore some of the challenges of high performance event systems including those experienced when operating an event infrastructure used to transport io events at the scale of hundred thousand nodes specifically when transporting output data from large scale simulation running on the ornl cray jaguar petascale machine surprising new issue seen in experimentation at scale is the potential for strong perturbation of running applications from inappropriate speeds at which io is performed this requires the io system’s event transport to be explicitly scheduled to constrain resource competition in addition to dynamically setting and changing the topologies of event delivery
flip chip is solution for designs requiring more pins and higher speed however the higher speed demand also brings the issue of signal skew in this paper we propose new stage design layout methodology for flip chip considering signal skew firstly we produce an initial bumper signal assignment and then solve the flip chip floorplanning problem using partitioning based technique to spread the modules across the flip chip as the distribution of its bumpers with an anchoring and relocation strategy we can effectively place buffers at desirable locations finally we further reduce signal skew and monotonic routing density by refining the bumper signal assignment experimental results show that signal skew of traditional floorplanners range from to higher than ours and the total wirelength of other floorplanners is as much as higher than ours moreover our signal refinement method can further decrease monotonic routing density by up to and signal skew by up to
we present security analysis of the complete tls protocol in the universal composable security framework this analysis evaluates the composition of key exchange functionalities realized by the tls handshake with the message transmission of the tls record layer to emulate secure communication sessions and is based on the adaption of the secure channel model from canetti and krawczyk to the setting where peer identities are not necessarily known prior to the protocol invocation and may remain undisclosed our analysis shows that tls including the diffie hellman and key transport suites in the uni directional and bi directional models of authentication securely emulates secure communication sessions
web page classification is important to many tasks in information retrieval and web mining however applying traditional textual classifiers on web data often produces unsatisfying results fortunately hyperlink information provides important clues to the categorization of web page in this paper an improved method is proposed to enhance web page classification by utilizing the class information from neighboring pages in the link graph the categories represented by four kinds of neighbors parents children siblings and spouses are combined to help with the page in question in experiments to study the effect of these factors on our algorithm we find that the method proposed is able to boost the classification accuracy of common textual classifiers from around to more than on large dataset of pages from the open directory project and outperforms existing algorithms unlike prior techniques our approach utilizes same host links and can improve classification accuracy even when neighboring pages are unlabeled finally while all neighbor types can contribute sibling pages are found to be the most important
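A minimal sketch of the general idea of combining a textual classifier's output with the classes of labeled neighbors; the weighting below is an illustrative assumption of mine and is not the combination rule used in the paper.

```python
# minimal sketch of folding neighbor categories into a textual classifier's
# decision; the blending scheme here is invented for illustration only
from collections import Counter

def combine(text_probs, neighbor_labels, alpha=0.6):
    """Blend per-class text probabilities with neighbor label frequencies.

    text_probs      -- dict class -> probability from a textual classifier
    neighbor_labels -- labels of parent/child/sibling/spouse pages (only the
                       neighbors that happen to be labeled; others are skipped)
    alpha           -- weight of the textual evidence (hypothetical value)
    """
    counts = Counter(neighbor_labels)
    total = sum(counts.values()) or 1
    classes = set(text_probs) | set(counts)
    scores = {c: alpha * text_probs.get(c, 0.0)
                 + (1 - alpha) * counts[c] / total
              for c in classes}
    return max(scores, key=scores.get)

print(combine({"sports": 0.4, "politics": 0.6},
              ["sports", "sports", "politics"]))  # -> 'sports'
```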
this paper provides mathematical analysis of higher order variational methods and nonlinear diffusion filtering for image denoising besides the average grey value it is shown that higher order diffusion filters preserve higher moments of the initial data while maximum minimum principle in general does not hold for higher order filters we derive stability in the norm in the continuous and discrete setting considering the filters in terms of forward and backward diffusion one can explain how not only the preservation but also the enhancement of certain features in the given data is possible numerical results show the improved denoising capabilities of higher order filtering compared to the classical methods
numismatics deals with various historical aspects of the phenomenon money fundamental part of numismatists work is the identification and classification of coins according to standard reference books the recognition of ancient coins is highly complex task that requires years of experience in the entire field of numismatics to date no optical recognition system for ancient coins has been investigated successfully in this paper we present an extension and combination of local image descriptors relevant for ancient coin recognition interest points are detected and their appearance is described by local descriptors coin recognition is based on the selection of similar images based on feature matching experiments are presented for database containing ancient coin images demonstrating the feasibility of our approach
due to the increasing use of very large databases and data warehouses mining useful information and helpful knowledge from transactions is evolving into an important research area in the past researchers usually assumed databases were static to simplify data mining problems thus most of the classic algorithms proposed focused on batch mining and did not utilize previously mined information in incrementally growing databases in real world applications however developing mining algorithm that can incrementally maintain discovered information as database grows is quite important in this paper we propose the concept of pre large itemsets and design novel efficient incremental mining algorithm based on it pre large itemsets are defined by lower support threshold and an upper support threshold they act as gaps to avoid the movements of itemsets directly from large to small and vice versa the proposed algorithm doesn’t need to rescan the original database until number of transactions have been newly inserted if the database has grown larger then the number of new transactions allowed will be larger too
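A small sketch of the two-threshold bookkeeping that pre-large itemsets rely on; the threshold values are illustrative, and the bound on how many new transactions may arrive before a rescan becomes necessary is not computed here.

```python
# sketch of the pre-large itemset idea: an itemset is large, pre-large, or
# small depending on where its support falls relative to the lower and upper
# thresholds (the values used here are illustrative, not from the paper)
def classify(support, lower=0.3, upper=0.5):
    if support >= upper:
        return "large"          # kept as a mined result
    if support >= lower:
        return "pre-large"      # buffered so it cannot jump straight to large
    return "small"              # discarded

for s in (0.65, 0.42, 0.10):
    print(s, classify(s))
# only after enough new transactions accumulate could a small itemset become
# large, which is why the algorithm can postpone rescanning the original
# database until that bound on newly inserted transactions is reached
```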
an emerging challenge for software engineering is the development of the methods and tools to aid design and analysis of concurrent and distributed software over the past few years number of analysis methods that focus on ada tasking have been developed many of these methods are based on some form of reachability analysis which has the advantage of being conceptually simple but the disadvantage of being computationally expensive we explore the effectiveness of various petri net based techniques for the automated deadlock analysis of ada programs our experiments consider variety of state space reduction methods both individually and in various combinations the experiments are applied to number of classical concurrent programs as well as set of real world programs the results indicate that petri net reduction and reduced state space generation are mutually beneficial techniques and that combined approaches based on petri net models are quite effective compared to alternative analysis approaches
modern software systems are built to operate in an open world setting by this we mean software that is conceived as dynamically adaptable and evolvable aggregate of components that may change at run time to respond to continuous changes in the external world moreover the software designer may have different degrees of ownership control and visibility of the different parts that compose an application in this scenario design time assumptions may be based on knowledge that may have different degrees of accuracy for the different parts of the application and of the external world that interacts with the system furthermore even if initially accurate they may later change after the system is deployed and running in this paper we investigate how these characteristics influence the way engineers can deal with performance attributes such as response time following model driven approach we discuss how to use at design time performance models based on queuing networks to drive architectural reasoning we also discuss the possible use of keeping models alive at run time this enables automatic re estimation of model parameters to reflect the real behavior of the running system re execution of the model and detection of possible failure which may trigger reaction that generates suitable architectural changes we illustrate our contribution through running example and numerical simulations that show the effectiveness of the proposed approach
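As a minimal illustration of keeping a performance model alive at run time, the sketch below re-evaluates a single-queue M/M/1 approximation of one tier from measured arrival and service rates; the paper's queuing networks are richer than this single node, so treat it only as the flavor of the reasoning.

```python
# minimal illustration of a runtime-refreshed performance model: an M/M/1
# approximation of one tier, re-parameterized from measured rates; this is a
# stand-in for the paper's queuing networks, not their actual model
def mm1_response_time(arrival_rate, service_rate):
    if arrival_rate >= service_rate:
        return float("inf")            # saturated tier -> predicted failure
    return 1.0 / (service_rate - arrival_rate)

# design-time assumption vs. rates re-estimated from the running system
print(mm1_response_time(arrival_rate=40, service_rate=100))   # ~0.017 s
print(mm1_response_time(arrival_rate=95, service_rate=100))   # 0.2 s -> adapt?
```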
we present occlusion switches for interactive visibility culling in complex environments an occlusion switch consists of two gpus graphics processing units and each gpu is used to either compute an occlusion representation or cull away primitives not visible from the current viewpoint moreover we switch the roles of each gpu between successive frames the visible primitives are rendered in parallel on third gpu we utilize frame to frame coherence to lower the communication overhead between different gpus and improve the overall performance the overall visibility culling algorithm is conservative up to image space precision this algorithm has been combined with levels of detail and implemented on three networked pcs each consisting of single gpu we highlight its performance on complex environments composed of tens of millions of triangles in practice it is able to render these environments at interactive rates with little loss in image quality
light weight embedded systems are now gaining more popularity due to the recent technological advances in fabrication that have resulted in more powerful tiny processors with greater communication capabilities that pose various scientific challenges for researchers perhaps the most significant challenge is the energy consumption concern and reliability mainly due to the small size of batteries in this tutorial we portray brief description of low power light weight embedded systems depict several power profiling studies previously conducted and present several research challenges that require low power consumption in embedded systems for each challenge we highlight how low power designs may enhance the overall performance of the system finally we present several techniques that minimize the power consumption in such systems
previous research works have presented convincing arguments that frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency upon discovery of frequent closed xml query patterns indexing and caching can be effectively adopted for query performance enhancement most of the previous algorithms for finding frequent patterns basically introduced straightforward generate and test strategy in this paper we present solaria an efficient algorithm for mining frequent closed xml query patterns without candidate maintenance and costly tree containment checking efficient algorithm of sequence mining is involved in discovering frequent tree structured patterns which aims at replacing expensive containment testing with cheap parent child checking in sequences solaria deeply prunes unrelated search space for frequent pattern enumeration by parent child relationship constraint by thorough experimental study on various real life data we demonstrate the efficiency and scalability of solaria over the previous known alternative solaria is also linearly scalable in terms of xml queries size
measuring the effectiveness of proposed black box correlation attacks against deployed anonymous networks is not feasible this results in not being able to measure the effectiveness of defensive techniques or performance enhancements with respect to anonymity to overcome this problem discrete event based network simulation of the tor anonymous network is developed the simulation is validated against traffic transmitted through the real tor network and the scalability of the simulation is measured simulations with up to clients were run upon which several attacks are implemented thus allowing for measure of anonymity experimental defensive techniques are tested with corresponding anonymity measured
middleware supported database replication is a way to increase performance and tolerate failures of enterprise applications middleware architectures distinguish themselves by their performance scalability and their application interface on one hand and the degree to which they guarantee replication consistency on the other both groups of features may conflict since the latter comes with an overhead that bears on the former we review different techniques proposed to achieve and measure improvements of the performance scalability and overhead introduced by different degrees of data consistency we do so with particular emphasis on the requirements of enterprise applications
existing software transactional memory stm designs attach metadata to ranges of shared memory subsequent runtime instructions read and update this metadata in order to ensure that an in flight transaction’s reads and writes remain correct the overhead of metadata manipulation and inspection is linear in the number of reads and writes performed by transaction and involves expensive read modify write instructions resulting in substantial overheads we consider novel approach to stm in which transactions represent their read and write sets as bloom filters and transactions commit by enqueuing bloom filter onto global list using this approach our ringstm system requires at most one read modify write operation for any transaction and incurs validation overhead linear not in transaction size but in the number of concurrent writers who commit furthermore ringstm is the first stm that is inherently livelock free and privatization safe while at the same time permitting parallel writeback by concurrent disjoint transactions we evaluate three variants of the ringstm algorithm and find that it offers superior performance and or stronger semantics than the state of the art tl algorithm under number of workloads
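A minimal sketch, in the spirit of the abstract, of Bloom-filter read/write sets and the intersection test a committing transaction would run against recently committed writers; the filter size, hash count and the ring bookkeeping itself are simplifications or omissions of mine.

```python
# minimal sketch of Bloom-filter read/write sets; a hit on the intersection
# test may be a false positive, which forces a conservative abort/revalidation
import hashlib

class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes, self.word = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.word |= 1 << p

    def intersects(self, other):
        return (self.word & other.word) != 0

# my transaction's read set vs. the write set of a concurrently committed writer
reads, writes = BloomFilter(), BloomFilter()
for addr in ("0x1000", "0x1040"):
    reads.add(addr)
writes.add("0x2000")
print(reads.intersects(writes))  # False here -> no conflict, commit may proceed
```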
in this paper an analysis of the effect of partial occlusion on facial expression recognition is investigated the classification from partially occluded images in one of the six basic facial expressions is performed using method based on gabor wavelets texture information extraction supervised image decomposition method based on discriminant non negative matrix factorization and shape based method that exploits the geometrical displacement of certain facial features we demonstrate how partial occlusion affects the above mentioned methods in the classification of the six basic facial expressions and indicate the way partial occlusion affects human observers when recognizing facial expressions an attempt to specify which part of the face left right lower or upper region contains more discriminant information for each facial expression is also made and conclusions regarding the pairs of facial expressions misclassifications that each type of occlusion introduces are drawn
we present proud probabilistic approach to processing similarity queries over uncertain data streams where the data streams here are mainly time series streams in contrast to data with certainty an uncertain series is an ordered sequence of random variables the distance between two uncertain series is also random variable we use general uncertain data model where only the mean and the deviation of each random variable at each timestamp are available we derive mathematical conditions for progressively pruning candidates to reduce the computation cost we then apply proud to streaming environment where only sketches of streams like wavelet synopses are available extensive experiments are conducted to evaluate the effectiveness of proud and compare it with det deterministic approach that directly processes data without considering uncertainty the results show that compared with det proud offers flexible trade off between false positives and false negatives by controlling threshold while maintaining similar computation cost in contrast det does not provide such flexibility this trade off is important as in some applications false negatives are more costly while in others it is more critical to keep the false positives low
this paper presents selinks programming language focused on building secure multi tier web applications selinks provides uniform programming model in the style of linq and ruby on rails with language syntax for accessing objects residing either in the database or at the server object level security policies are expressed as fully customizable first class labels which may themselves be subject to security policies access to labeled data is mediated via trusted user provided policy enforcement functions selinks has two novel features that ensure security policies are enforced correctly and efficiently first selinks implements type system called fable that allows protected object’s type to refer to its protecting label the type system can check that labeled data is never accessed directly by the program without first consulting the appropriate policy enforcement function second selinks compiles policy enforcement code to database resident user defined functions that can be called directly during query processing database side checking avoids transferring data to the server needlessly while still allowing policies to be expressed in customizable and portable manner our experience with two sizable web applications model health care database and secure wiki with fine grained security policies indicates that cross tier policy enforcement in selinks is flexible relatively easy to use and when compared to single tier approach improves throughput by nearly an order of magnitude selinks is freely available
near duplicate keyframes ndk play unique role in large scale video search news topic detection and tracking in this paper we propose novel ndk retrieval approach by exploring both visual and textual cues from the visual vocabulary and semantic context respectively the vocabulary which provides entries for visual keywords is formed by the clustering of local keypoints the semantic context is inferred from the speech transcript surrounding keyframe we experiment the usefulness of visual keywords and semantic context separately and jointly using cosine similarity and language models by linearly fusing both modalities performance improvement is reported compared with the techniques with keypoint matching while matching suffers from expensive computation due to the need of online nearest neighbor search our approach is effective and efficient enough for online video search
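A small sketch of the linear fusion of a visual-keyword similarity and a transcript-based similarity using cosine scores; the fusion weight is a free parameter used for illustration, not a value reported in the abstract.

```python
# sketch of linearly fusing a visual-keyword similarity and a semantic-context
# (transcript) similarity for near-duplicate keyframe retrieval
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fused_score(visual_a, visual_b, text_a, text_b, weight=0.7):
    # weight is a hypothetical fusion parameter, not taken from the paper
    return weight * cosine(visual_a, visual_b) + (1 - weight) * cosine(text_a, text_b)

# toy bag-of-visual-words and bag-of-transcript-words vectors
print(fused_score([3, 0, 1, 2], [2, 1, 1, 2], [1, 1, 0], [1, 0, 1]))
```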
virtual environments systems based on immersive projection technologies ipts offer users the possibility of collaborating intuitively in environment while considerable work has been done to examine interaction in desktop based collaborative virtual environments cves there are currently no studies for collaborative interaction using ipts the aim of this paper is to examine how immersive technologies support interaction and to compare this to the experience with desktop systems study of collaboration is presented where two partners worked together using networked ipt environments the data collected included observations analysis of video and audio recordings questionnaires and debriefing interviews from both ipt sites this paper focuses on the successes and failures in collaboration through detailed examination of particular incidents during the interaction we compare these successes and failures with the findings of study by hindmarsh fraser heath benford computer supported collaborative work cscw pp that examined object focused interaction on desktop based cve system our findings identify situations where interaction is better supported with the ipt system than the desktop system and situations where interaction is not as well supported we also present examples of how social interaction is critical to seamless collaboration
ferret is toolkit for building content based similarity search systems for feature rich data types such as audio video and digital photos the key component of this toolkit is content based similarity search engine for generic multi feature object representations this paper describes the filtering mechanism used in the ferret toolkit and experimental results with several datasets the filtering mechanism uses approximation algorithms to generate candidate set and then ranks the objects in the candidate set with more sophisticated multi feature distance measure the paper compared two filtering methods using segment feature vectors and sketches constructed from segment feature vectors our experimental results show that filtering can substantially speedup the search process and reduce memory requirement while maintaining good search quality to help systems designers choose the filtering parameters we have developed rank based analytical model for the filtering algorithm using sketches our experiments show that the model gives conservative and good prediction for different datasets
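A minimal sketch of the filter-then-rank pattern the toolkit's filtering mechanism follows: a cheap approximate distance prunes the collection to a candidate set, and only candidates pay for the full multi-feature distance; both distance functions below are stand-ins chosen for illustration.

```python
# sketch of the filter-then-rank pattern: a cheap sketch distance shortlists
# candidates, a more expensive full distance ranks only the shortlist
def filter_then_rank(query, objects, sketch_dist, full_dist, candidates=10):
    # cheap pass: keep the objects whose sketches are closest to the query
    shortlist = sorted(objects, key=lambda o: sketch_dist(query, o))[:candidates]
    # expensive pass: rank the shortlist with the full multi-feature measure
    return sorted(shortlist, key=lambda o: full_dist(query, o))

# toy 1-D example where the "sketch" is just a coarse, rounded view of the feature
objects = [0.11, 0.52, 0.49, 0.90, 0.48, 0.75]
cheap = lambda q, o: abs(round(q, 1) - round(o, 1))
exact = lambda q, o: abs(q - o)
print(filter_then_rank(0.50, objects, cheap, exact, candidates=3))
```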
modern distributed systems are diverse and dynamic and consequently difficult to manage using traditional approaches which rely on an extensive initial knowledge of the system on the performance front these systems often offer multiple opportunities for dynamically degrading or improving service level based on workload intensity to avoid overload and underload in this context we propose novel approach for building distributed systems capable of autonomously deciding when and how to adapt service level our approach limits the knowledge that must be provided manually to component based representation of the system from this representation we build and maintain performance profile which allows us to identify the most promising adaptations based on workload type and dynamically characterize the intrinsic efficiency of each adaptation based on past attempts we have successfully implemented and evaluated prototype of our approach in the context of multi tiered application servers
within recent years the development of location based services have received increasing attention from the software industry as well as from researchers within wide range of computing disciplines as particular interesting class of context aware mobile systems however while lot of research has been done into sensing adapting to and philosophising over the complex concept of context little theoretically based knowledge exists about why from user experience perspective some system designs work well and why others do not contributing to this discussion this article suggests the perspective of gestalt theory as theoretical framework for understanding the use of this class of computer systems based on findings from an empirical study we argue that the user experience of location based services can be understood through gestalt theory’s five principles of perceptual organisation proximity closure symmetry continuity and similarity specifically we argue that these principles assist us in explaining the interplay between context and technology in the user experience of location based services and how people make sense of small and fragmented pieces of information on mobile devices in context
we consider an active learning game within transductive learning model major problem with many active learning algorithms is that an unreliable current hypothesis can mislead the querying component to query uninformative points in this work we propose remedy to this problem our solution can be viewed as patch for fixing this deficiency and also as proposed modular approach for active transductive learning that produces powerful new algorithms extensive experiments on real data demonstrate the advantage of our method
neural symbolic systems are hybrid systems that integrate symbolic logic and neural networks the goal of neural symbolic integration is to benefit from the combination of features of the symbolic and connectionist paradigms of artificial intelligence this paper introduces new neural network architecture based on the idea of fibring logical systems fibring allows one to combine different logical systems in principled way fibred neural networks may be composed not only of interconnected neurons but also of other networks forming recursive architecture fibring function then defines how this recursive architecture must behave by defining how the networks in the ensemble relate to each other typically by allowing the activation of neurons in one network to influence the change of weights in another network intuitively this can be seen as training network at the same time that one runs network we show that in addition to being universal approximators like standard feedforward networks fibred neural networks can approximate any polynomial function to any desired degree of accuracy thus being more expressive than standard feedforward networks
we consider repository of animation models and motions that can be reused to generate new animation sequences for instance user can retrieve an animation of dog kicking its leg in air and manipulate the result to generate new animation where the dog is kicking ball in this particular example inverse kinematics technique can be used to retarget the kicking motion of dog to ball this approach of reusing models and motions to generate new animation sequences can be facilitated by operations such as querying of animation databases for required models and motions and manipulation of the query results to meet new constraints however manipulation operations such as motion retargeting are quite complex in nature hence there is need for visualizing the queries on animation databases as well as the manipulation operations on the query results in this paper we propose visually interactive method for reusing motions and models by adjusting the query results from animation databases for new situations while at the same time keeping the desired properties of the original models and motions here user first queries for animation objects ie geometric models and motions then the user interactively makes new animations by visually manipulating the query results depending on the orders in which the guis graphical user interfaces are invoked and the parameters are changed the system automatically generates sequence of operations list of sql like syntax commands and applies it to the query results of motions and models with the help of visualization tools the user can view the changes before accepting them
xpath is the standard declarative language for navigating xml data and returning set of matching nodes in the context of xslt xquery analysis query optimization and xml type checking xpath decision problems arise naturally they notably include xpath comparisons such as equivalence whether two queries always return the same result and containment whether for any tree the result of particular query is included in the result of second one xpath decision problems have attracted lot of research attention especially for studying the computational complexity of various xpath fragments however what is missing at present is the constructive use of an expressive logic which would allow capturing these decision problems while providing practically effective decision procedures in this paper we propose logic based framework for the static analysis of xpath specifically we propose the alternation free modal calculus with converse as the appropriate logic for effectively solving xpath decision problems we present translation of large xpath fragment into calculus together with practical experiments on the containment using state of the art exptime decision procedure for calculus satisfiability these preliminary experiments shed light for the first time on the cost of checking the containment in practice we believe they reveal encouraging results for further static analysis of xml transformations
as software systems continue to grow and evolve locating code for maintenance and reuse tasks becomes increasingly difficult existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant and few recommend alternative words to help developers reformulate poor queries in this paper we present novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in hierarchy our contextual search approach allows developers to explore the word usage in piece of software helping them to quickly identify relevant program elements for investigation or to quickly recognize alternative words for query reformulation an empirical evaluation of developers reveals that our contextual search approach significantly outperforms the most closely related technique in terms of effort and effectiveness
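A small sketch of the first step such a contextual search needs, splitting identifiers into natural-language phrases and grouping the results; the flat head-word index below is a simplification of mine, not the hierarchy described in the abstract.

```python
# sketch of extracting natural-language phrases from source code identifiers
# and grouping them by head word (a simplification of the described hierarchy)
import re
from collections import defaultdict

def split_identifier(name):
    """Split camelCase / snake_case identifiers into lower-case words."""
    parts = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [w.lower() for w in parts.split() if w]

def index_by_head(identifiers):
    index = defaultdict(list)
    for name in identifiers:
        words = split_identifier(name)
        if words:
            index[words[0]].append((name, " ".join(words)))
    return dict(index)

print(index_by_head(["addTrackListener", "add_track", "removeTrackListener"]))
# {'add': [('addTrackListener', 'add track listener'), ('add_track', 'add track')],
#  'remove': [('removeTrackListener', 'remove track listener')]}
```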
despite the importance of the quality of software project data problematic data inevitably occurs during data collection these data are the outliers with abnormal values on certain attributes which we call the abnormal attributes of outliers manually detecting outliers and their abnormal attributes is laborious and time consuming although few existing approaches identify outliers and their abnormal attributes these approaches are not effective in identifying the abnormal attributes when the outlier has abnormal values on more than the specific number of its attributes or discovering accurate rules to detect outliers and their abnormal attributes in this paper we propose pattern based outlier detection method that identifies abnormal attributes in software project data after discovering the reliable frequent patterns that reflect the typical characteristics of the software project data outliers and their abnormal attributes are detected by matching the software project data with those patterns empirical studies were performed on three industrial data sets and artificial data sets with injected outliers the results demonstrate that our approach outperforms five other approaches by an average of and in detecting the outliers and abnormal attributes respectively on the industrial data sets and an average of and respectively on the artificial data sets
the filtering of incoming tuples of data stream should be completed quickly and continuously which requires strict time and space constraints in order to guarantee these constraints the selection predicates of continuous queries are grouped or indexed in most data stream management systems dsms this paper proposes new scheme called attribute selection construct asc given set of continuous queries an asc divides the domain of an attribute of data stream into set of disjoint regions based on the selection predicates that are imposed on the attribute each region maintains the pre computed matching results of the selection predicates consequently an asc can collectively evaluate all of its selection predicates at the same time furthermore it can also monitor the overall evaluation statistics such as its selectivity and tuple dropping ratio dynamically for those attributes that are employed to express the selection predicates of the queries the processing order of their asc’s can significantly influence the overall performance of multiple query evaluation the evaluation sequence can be optimized by periodically capturing the run time tuple dropping ratio of its current evaluation sequence the performance of the proposed method is analyzed by series of experiments to identify its various characteristics
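A minimal sketch of the construct's core idea, assuming hypothetical predicates on a price attribute: the attribute domain is cut into disjoint regions, each storing the precomputed set of predicates it satisfies, so one lookup evaluates all predicates on that attribute at once.

```python
# sketch of an attribute selection construct over a single stream attribute;
# the three queries and their predicates below are hypothetical
import bisect

# Q1: price < 50      Q2: 20 <= price < 80      Q3: price >= 80
boundaries = [20, 50, 80]                 # region cut points
matches = [                               # queries satisfied in each region
    {"Q1"},                               # price < 20
    {"Q1", "Q2"},                         # 20 <= price < 50
    {"Q2"},                               # 50 <= price < 80
    {"Q3"},                               # price >= 80
]

def satisfied_queries(price):
    # one binary search collectively evaluates every predicate on the attribute
    return matches[bisect.bisect_right(boundaries, price)]

for p in (10, 35, 60, 95):
    print(p, satisfied_queries(p))
```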
many recursive query processing applications are still poorly supported partly because implementations of general recursive capabilities are inefficient and hard to understand for users partly because the approaches do not integrate well with existing query languages an extension is proposed of the database language sql for the processing of recursive structures the new constructs are integrated in the view definition mechanism of sql therefore users with knowledge of sql can take advantage of the increased functionality without learning new language the construct is based on generalization of transitive closure and is formally defined because of the importance of extreme value selections special constructs are introduced for the selection of tuples with minimal or maximal values in some attributes applying these selections on recursively defined views constitutes nonlinear recursion by the introduction of special constructs for these selections dealing with general nonlinear recursion can be avoided
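To make the underlying semantics concrete, the sketch below computes a transitive closure over an edge relation in plain Python; the paper's actual SQL view syntax is not reproduced, and the closing comment only gestures at where its extreme-value selections would apply.

```python
# sketch of the transitive-closure semantics that the proposed SQL extension
# generalizes, written here over a plain edge relation for illustration
def transitive_closure(edges):
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in edges:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

flights = {("JFK", "ORD"), ("ORD", "DEN"), ("DEN", "SFO")}
print(sorted(transitive_closure(flights)))
# selecting, say, the minimal-cost tuple among such derived pairs is the kind
# of extreme-value selection the paper's special constructs are meant to express
```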
one of the challenges for software architects is ensuring that an implemented system faithfully represents its architecture we describe and demonstrate tool called discotect that addresses this challenge by dynamically monitoring running system and deriving the software architecture as that system runs the derivation process is based on mappings that relate low level system level events to higher level architectural events the resulting architecture is then fed into existing architectural design tools so that comparisons can be conducted with the design time architecture and architectural analyses can be re run to ensure that they are still valid in addition to the demonstration we briefly describe the mapping language and formal definition of the language in terms of colored petri nets
in this paper we present the design and implementation of itdb self healing or intrusion tolerant database prototype system while traditional secure database systems rely on preventive controls and are very limited in surviving malicious attacks itdb can detect intrusions isolate attacks contain assess and repair the damage caused by intrusions in timely manner such that sustained self stabilized levels of data integrity and availability can be provided to applications in the face of attacks itdb is implemented on top of cots dbms we have evaluated the cost effectiveness of itdb using several micro benchmarks preliminary testing measurements suggest that when the accuracy of intrusion detection is satisfactory itdb can effectively locate and repair the damage on the fly with reasonable database performance penalty
recommender systems are an effective tool to help find items of interest from an overwhelming number of available items collaborative filtering cf the best known technology for recommender systems is based on the idea that set of like minded users can help each other find useful information new user poses challenge to cf recommenders since the system has no knowledge about the preferences of the new user and therefore cannot provide personalized recommendations new user preference elicitation strategy needs to ensure that the user does not abandon lengthy signup process and lose interest in returning to the site due to the low quality of initial recommendations we extend the work of in this paper by incrementally developing set of information theoretic strategies for the new user problem we propose an offline simulation framework and evaluate the strategies through extensive offline simulations and an online experiment with real users of live recommender system
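One commonly used information-theoretic signup strategy is to ask new users about items whose existing ratings are both plentiful and divided; the sketch below scores items by rating entropy purely as an illustration and is not necessarily the exact scoring developed in the paper.

```python
# sketch of entropy-based item selection for new-user preference elicitation;
# the thresholds and scoring are illustrative assumptions
import math
from collections import Counter

def rating_entropy(ratings):
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def pick_items(item_ratings, k=2, min_ratings=3):
    eligible = {i: r for i, r in item_ratings.items() if len(r) >= min_ratings}
    return sorted(eligible, key=lambda i: rating_entropy(eligible[i]), reverse=True)[:k]

catalog = {
    "divisive_movie": [1, 5, 1, 5, 3],   # opinions split -> informative to ask
    "crowd_pleaser": [5, 5, 5, 4, 5],    # nearly unanimous -> little information
    "obscure_title": [4],                # too few ratings to rely on
}
print(pick_items(catalog))
```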
the ability to detect and pinpoint memory related bugs in production runs is important because in house testing may miss bugs this paper presents heapmon heap memory bug detection scheme that has very low performance overhead is automatic and is easy to deploy heapmon relies on two new techniques first it decouples application execution from bug monitoring which executes as helper thread on separate core in chip multiprocessor system second it associates filter bit with each cached word to safely and significantly reduce bug checking frequency by on average we test the effectiveness of these techniques using existing and injected memory bugs in spec applications and show that heapmon effectively detects and identifies most forms of heap memory bugs our results also indicate that the heapmon performance overhead is only on average orders of magnitude less than existing tools its overhead is also modest of the cache size and kb victim cache for on chip filter bits and of the allocated heap memory size for state bits which are maintained by the helper thread as software data structure
the rise of social interactions on the web requires developing new methods of information organization and discovery to that end we propose generative community based probabilistic tagging model that can automatically uncover communities of users and their associated tags we experimentally validate the quality of the discovered communities over the social bookmarking system delicious in comparison to an alternative generative model latent dirichlet allocation lda we find that the proposed community based model improves the empirical likelihood of held out test data and discovers more coherent interest based communities based on the community based probabilistic tagging model we develop novel community based ranking model for effective community based exploration of socially tagged web resources we compare community based ranking to three state of the art retrieval models i bm ii cluster based retrieval using means clustering and iii lda based retrieval we find that the proposed ranking model results in significant improvement over these alternatives from to in the quality of retrieved pages
the possibility of non deterministic reductions is distinctive feature of some declarative languages two semantics commonly adopted for non determinism are call time choice notion that at the operational level is related to the sharing mechanism of lazy evaluation in functional languages and run time choice which corresponds closely to ordinary term rewriting but there are practical situations where neither semantics is appropriate if used in isolation in this paper we propose to annotate functions in program with the semantics most adequate to its intended use annotated programs are then mapped into unified core language but still high level designed to achieve careful but neat combination of ordinary rewriting to cope with run time choice with local bindings via let construct devised to express call time choice the result is flexible framework into which existing languages using pure run time or call time choice can be embedded either directly in the case of run time choice or by means of simple program transformation introducing lets in function definitions for the case of call time choice we prove the adequacy of the embedding as well as other relevant properties of the framework
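The classic coin/double example makes the difference between the two semantics concrete; the enumeration below is an illustration in Python rather than in the declarative core language of the paper.

```python
# worked example of call-time vs. run-time choice using the classic
# coin/double program, where coin nondeterministically yields 0 or 1
# and double x = x + x
def coin():
    return {0, 1}

def double_call_time(arg_values):
    # call-time choice: the argument is fixed to one value, then shared
    return {v + v for v in arg_values}

def double_run_time(arg_values):
    # run-time choice: each occurrence of the argument may choose independently
    return {a + b for a in arg_values for b in arg_values}

print(double_call_time(coin()))  # {0, 2}
print(double_run_time(coin()))   # {0, 1, 2}
# annotating double with one semantics or the other, as the paper proposes,
# selects which of these result sets the program denotes
```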
complex entities are one of the most popular ways to model relationships among data especially complex entities known as physical assemblies are popular in several applications typically complex entities consist of several parts organized at many nested levels contemporary query languages intended for manipulating complex entities support only extensional queries likewise the user has to master the structures of complex entities completely which is impossible if physical assembly consists of huge number of parts further query languages do not support the manipulation of documents related to parts of physical assemblies in this paper we introduce novel declarative and powerful query language in which the above deficiencies have been eliminated our query language supports text information retrieval related to parts and it contains intensional and combined extensional intensional query features these features support making queries of new types in the paper we give several sample queries which demonstrate the usefulness of these query types in addition we show that conventional extensional queries can be formulated intuitively and compactly in our query language among other things this is due to our query primitives allowing removal of the explicit specification of navigation from the user
recently the practice of speculation in resolving data dependences has been studied as means of extracting more instruction level parallelism ilp once the outcome of an instruction is predicted by value predictors the instruction and its dependent instructions can be executed simultaneously thereby exploiting ilp aggressively one of the serious hurdles for realizing data speculation is huge hardware budget of the predictors in this paper we propose technique reducing the budget by exploiting narrow width values the hardware budget of value predictors is reduced by up to simulation results show that the technique called mode scheme maintains processor performance with slight decrease of the value prediction accuracy
the paper introduces the cama context aware mobile agents framework intended for developing large scale mobile applications using the agent paradigm cama provides powerful set of abstractions supporting middleware and an adaptation layer allowing developers to address the main characteristics of the mobile applications openness asynchronous and anonymous communication fault tolerance and device mobility it ensures recursive system structuring using location scope agent and role abstractions cama supports system fault tolerance through exception handling and structured agent coordination within nested scopes the applicability of the framework is demonstrated using an ambient lecture scenario the first part of an ongoing work on series of ambient campus applications this scenario is developed starting from thorough definition of the traceable requirements including the fault tolerance requirements this is followed by the design phase at which the cama abstractions are applied at the implementation phase the cama middleware services are used through provided api this work is part of the fp ist rodin project on rigorous open development environment for complex systems
reversible execution has not been fully exploited in symbolic debuggers debuggers that can undo instructions usually incur significant performance penalty during debugging session in this paper we describe an efficient reversible debugging mechanism based on program instrumentation the approach enables repetitive debugging sessions with selectable reversible routines and recording modes experimental results indicate that the execution penalty can be significantly reduced with moderate code growth
in this paper we design and analyze web object replacement algorithm referred to as gain based exchange and migration algorithm gema suitable for cooperative world wide web proxy caching environment in cooperative environment where more than one proxy exists the replacement algorithms used for single system cache cannot be directly applied to achieve an acceptable performance in this paper we first present an analytical model which quantifies the importance referred to as object caching gain of an object at cache this gain is used in making replacement decisions and considers the benefit of caching at local as well as neighboring proxies our model efficiently exploits the advantages present in the existing research contributions on designing replacement strategies for the single cache environment further with this model we introduce two basic powerful primitive operations namely the object exchange and object migration to improve an overall performance these two operations are carried out as an outcome of replacement decisions based on the comparison of gains among objects thus the calculation of the gain and deciding on which of the operations to use constitute the main part of our algorithm gema for quantifying the performance of gema we carry out rigorous simulation experiments based on trace driven and event driven approaches using the event driven simulation we comprehensively verify the performance improvement of gema under variety of performance measures such as average access time of web objects hit ratio and byte hit ratio we compare and analyze our strategies with some of the popular strategies found in the literature we also highlight some possible extensions to the research contributions in this paper
the cost complexity and inflexibility of hardware based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors an important performance limitation of such software only protocols is that software latency associated with directory management ends up on the critical memory access path for read miss transactions we propose five strategies that support efficient data transfers in hardware whereas directory management is handled at slower pace in the background by software handlers simulations show that this approach can remove the directory management latency from the memory access path whereas the directory is managed in software the hardware mechanisms must access the memory state in order to enable data transfers at high speed overall our strategies reach between and of the hardware based protocol performance
this paper presents simple and scalable framework for architecting peer to peer overlays called peer to peer receiver driven overlay or pro pro is designed for non interactive streaming applications and its primary design goal is to maximize delivered bandwidth and thus delivered quality to peers with heterogeneous and asymmetric bandwidth to achieve this goal pro adopts receiver driven approach where each receiver or participating peer independently discovers other peers in the overlay through gossiping and ii selfishly determines the best subset of parent peers through which to connect to the overlay to maximize its own delivered bandwidth participating peers form an unstructured overlay which is inherently more robust to high churn rates than structured overlay networks furthermore each receiver leverages congestion controlled bandwidth from its parents as implicit signal to detect and react to long term changes in network or overlay condition without any explicit coordination with other participating peers independent parent selection by individual peers dynamically converges to an efficient overlay structure
unified framework is proposed for designing textures using energy optimization and deformation our interactive scheme has the ability to globally change the visual properties of texture elements and locally change texture elements with little user interaction given small sample texture the design process starts with applying set of global deformation operations rotation translation mirror scale and flip to the sample texture to obtain set of deformed textures automatically then we further make the local deformation to the deformed textures interactively by replacing the local texture elements regions from other textures by utilizing the energy optimization method interactive selections and deformations of local texture elements are accomplished simply through indicating the positions of texture elements very roughly with brush tool finally the deformed textures are further utilized to create large textures with the fast layer based texture deformation algorithm and the wavelet based energy optimization our experimental results demonstrate that the proposed approach can help design large variety of textures from small example change the locations of texture elements increase or decrease the density of texture elements and design cyclic marbling textures
this paper proposes an efficient method to locate three dimensional object in cluttered environment model of the object is represented in reference scale by the local features extracted from several reference images pca based hashing technique is introduced for accessing the database of reference features efficiently localization is performed in an estimated relative scale firstly pair of stereo images is captured simultaneously by calibrated cameras then the object is identified in both images by extracting features and matching them with reference features clustering the matched features with generalized hough transformation and verifying clusters with spatial relations between the features after the identification process knowledge based correspondences of features belonging to the object present in the stereo images are used for the estimation of the position the localization method is robust to different kinds of geometric and photometric transformations in addition to cluttering partial occlusions and background changes as both the model representation and localization are single scale processes the method is efficient in memory usage and computing time the proposed relative scale method has been implemented and experiments have been carried out on set of objects the method yields very good accuracy and takes only few seconds for object localization with our primary implementation an application of the relative scale method for exploration of an object in cluttered environment is demonstrated the proposed method could be useful for many other practical applications
similarities among subsequences are typically regarded as categorical features of sequential data we introduce an algorithm for capturing the relationships among similar contiguous subsequences two time series are considered to be similar during time interval if every contiguous subsequence of predefined length satisfies the given similarity criterion our algorithm identifies patterns based on the similarity among sequences captures the sequence subsequence relationships among patterns in the form of directed acyclic graph dag and determines pattern conglomerates that allow the application of additional meta analyses and mining algorithms for example our pattern conglomerates can be used to analyze time information that is lost in categorical representations we apply our algorithm to stock market data as well as several other time series data sets and show the richness of our pattern conglomerates through qualitative and quantitative evaluations an exemplary meta analysis determines timing patterns representing relations between time series intervals and demonstrates the merit of pattern relationships as an extension of time series pattern mining
recently in moving object databases that mainly manage the spatiotemporal attributes approximate query processing for the future location based queries has received enormous attention histograms are generally used for selectivity estimation and approximate query answering in database environments because histograms static properties may however make them inappropriate for application areas that treat dynamic properties such as moving object databases it is necessary to develop several mechanisms that can be well applied to dynamic query processing in this paper we present new method to efficiently process the approximate answers for future location based query predicates on demand by using spatiotemporal histograms based on the concepts of entropy and marginal distribution we build spatiotemporal histograms for the movement parameters which result in the avoidance of reconstructing histograms using spatiotemporal histograms the approximate future query processing can be achieved efficiently in addition we clarify and evaluate our proposed method with several experiments
we present novel approach for summarizing video in the form of multiscale image that is continuous in both the spatial domain and across the scale dimension there are no hard borders between discrete moments in time and user can zoom smoothly into the image to reveal additional temporal details we call these artifacts tapestries because their continuous nature is akin to medieval tapestries and other narrative depictions predating the advent of motion pictures we propose set of criteria for such summarization and series of optimizations motivated by these criteria these can be performed as an entirely offline computation to produce high quality renderings or by adjusting some optimization parameters the later stages can be solved in real time enabling an interactive interface for video navigation our video tapestries combine the best aspects of two common visualizations providing the visual clarity of dvd chapter menus with the information density and multiple scales of video editing timeline representation in addition they provide continuous transitions between zoom levels in user study participants preferred both the aesthetics and efficiency of tapestries over other interfaces for visual browsing
computer system models provide detailed answers to system performance for given system configuration system model estimates the cycles per instruction that the system would incur while running given workload in addition it estimates the proportion of time that is spent in different parts of the system and other related metrics such as bus utilizations consider those inputs to system model that are estimated with uncertainty examples include cache miss rates that are obtained via trace driven cache simulation and also sometimes by extrapolating beyond the simulation domain errors incurred during the measurement and fitting processes are propagated to the system model outputs on the other hand other inputs such as hardware latencies are known precisely in this paper we propose several measures of uncertainty of system model outputs when it stems from uncertainty in the inputs some of these measures are based on sensitivity of an output to the inputs we propose ways of defining and determining these sensitivities and turning them into uncertainty measures other measures are based on sampling schemes additionally we determine uncertainty measures for the system model outputs over wide range of inputs covering large system design spaces this is done by first selecting set of input configurations based on an experimental design methodology where the uncertainty measures are determined then these data are used to interpolate the uncertainty measure function over the rest of the input space we quantitatively characterize each input’s contribution to the output uncertainty over the input’s entire range we also propose ways that call attention to high output uncertainty regions in the input space the methodology is illustrated on system models developed at sun microsystems laboratories the particular goal of the performance analysis is design of level two caches
building commodity networked storage systems is an important architectural trend commodity servers hosting moderate number of consumer grade disks and interconnected with high performance network are an attractive option for improving storage system scalability and cost efficiency however such systems incur significant overheads and are not able to deliver to applications the available throughput we examine in detail the sources of overheads in such systems using working prototype to quantify the overheads associated with various parts of the protocol we optimize our base protocol to deal with small requests by batching them at the network level and without any specific knowledge we also redesign our protocol stack to allow for asynchronous event processing in line during send path request processing these techniques improve performance for disk sata raid array from to mbytes improvement using ramdisk peak performance improves from to mbytes improvement which is of the maximum possible throughput in our experimental setup we also analyze the remaining system bottlenecks and find that although commodity storage systems have potential for building high performance subsystems traditional network and protocols are not fully capable of delivering this potential
this paper presents novel algorithm for performing integrated segmentation and pose estimation of human body from multiple views unlike other state of the art methods which focus on either segmentation or pose estimation individually our approach tackles these two tasks together our method works by optimizing cost function based on conditional random field crf this has the advantage that all information in the image edges background and foreground appearances as well as the prior information on the shape and pose of the subject can be combined and used in bayesian framework optimizing such cost function would have been computationally infeasible however our recent research in dynamic graph cuts allows this to be done much more efficiently than before we demonstrate the efficacy of our approach on challenging motion sequences although we target the human pose inference problem in the paper our method is completely generic and can be used to segment and infer the pose of any rigid deformable or articulated object
desktop grids which use the idle cycles of many desktop pcs are one of the largest distributed systems in the world despite the popularity and success of many desktop grid projects the heterogeneity and volatility of hosts within desktop grids have been poorly understood yet resource characterization is essential for accurate simulation and modelling of such platforms in this paper we present application level traces of four real desktop grids that can be used for simulation and modelling purposes in addition we describe aggregate and per host statistics that reflect the heterogeneity and volatility of desktop grid resources finally we apply our characterization to develop performance model for desktop grid applications for various task granularities and then use cluster equivalence metric to quantify the utility of the desktop grid relative to that of dedicated cluster for task parallel applications
the challenges posed by complex real time digital image processing at high resolutions cannot be met by current state of the art general purpose or dsp processors due to the lack of processing power on the other hand large arrays of fpga based accelerators are too inefficient to cover the needs of cost sensitive professional markets we present new architecture composed of network of configurable flexible weakly programmable processing elements flexible weakly programmable advanced film engine flexwafe this architecture delivers both programmability and high efficiency when implemented on an fpga basis we demonstrate these claims using professional next generation noise reducer with more than image operations at percent fpga area utilization on four virtex ii pro fpgas this article will focus on the flexwafe architecture principle and implementation on pci express board
user authority delegation is granting or withdrawing access to computer based information by entities that own and or control that information these entities must consider who should be granted access to specific information in the organization and determine reasonable authority delegation role based access control rbac delegation management where user access authority is granted for the minimum resources necessary for users to perform their tasks is not suitable for the actual working environment of an organization currently rbac implementations cannot correctly model inheritance and rules for different delegations are in conflict further these systems require that user roles positions and information access be continuously and accurately updated resulting in manual error prone access delegation system this paper presents proposal for new authority delegation model which allows users to identify their own function based delegation requirements as the initial input to the rbac process the conditions for delegations are identified and functions to implement these delegations are defined the criteria for basic authority delegation authentication and constraints are quantified and formulated for evaluation an analysis of the proposed model is presented showing that this approach both minimizes errors in delegating authority and is more suitable for authority delegation administration in real organizational applications
technical literature such as patents research papers whitepapers and technology news articles are widely recognized as important information sources for people seeking broad knowledge in technology fields however it is generally labor intensive task to survey these resources to track major advances in broad range of technical areas to alleviate this problem we propose novel survey assistance tool that focuses on novel semantic class for phrases advantage phrases which mention strong advantageous points of technologies or products the advantage phrases such as reduce cost improve pc performance and provide early warning of future failure can help users to grasp the capabilities of new technology and to come up with innovative solutions with large business values for themselves and their clients the proposed tool automatically extracts and lists up those advantage phrases from large technical documents and places the phrases that mention novel technology applications high on the output list the developed prototype of the tool is now available for consultants analyzing patent disclosures in this paper method to identify advantage phrases in technical documents and scoring function to give higher score to novel applications of technology are proposed and evaluated
large general purposed community question answering sites are becoming popular as new venue for generating knowledge and helping users in their information needs in this paper we analyze the characteristics of knowledge generation and user participation behavior in the largest question answering online community in south korea naver knowledge in we collected and analyzed over million question answer pairs from fifteen categories between and and have interviewed twenty six users to gain insights into their motivations roles usage and expertise we find altruism learning and competency are frequent motivations for top answerers to participate but that participation is often highly intermittent using simple measure of user performance we find that higher levels of participation correlate with better performance we also observe that users are motivated in part through point system to build comprehensive knowledge database these and other insights have significant implications for future knowledge generating online communities
in this paper we introduce simple randomized dynamic data structure for storing multidimensional point sets called quadtreap this data structure is randomized balanced variant of quadtree data structure in particular it defines hierarchical decomposition of space into cells which are based on hyperrectangles of bounded aspect ratio each of constant combinatorial complexity it can be viewed as multidimensional generalization of the treap data structure of seidel and aragon when inserted points are assigned random priorities and the tree is restructured through rotations as if the points had been inserted in priority order in any fixed dimension we show it is possible to store set of points in quadtreap of space the height of the tree is log with high probability it supports point insertion in time it supports point deletion in worst case time and expected case time averaged over the points of the tree it can answer approximate spherical range counting queries over groups and approximate nearest neighbor queries in time
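the balancing idea can be sketched with the one dimensional analogue, a textbook treap in which each inserted key receives a random priority and rotations restore the heap order (a minimal sketch of the underlying mechanism only, not the multidimensional quadtreap itself):

    # minimal treap sketch: binary search tree on keys, max-heap on random
    # priorities, restructured by rotations exactly as if keys had been
    # inserted in priority order
    import random

    class Node:
        def __init__(self, key):
            self.key, self.prio = key, random.random()
            self.left = self.right = None

    def rotate_right(t):
        l = t.left
        t.left, l.right = l.right, t
        return l

    def rotate_left(t):
        r = t.right
        t.right, r.left = r.left, t
        return r

    def insert(t, key):
        if t is None:
            return Node(key)
        if key < t.key:
            t.left = insert(t.left, key)
            if t.left.prio > t.prio:      # heap property violated -> rotate
                t = rotate_right(t)
        else:
            t.right = insert(t.right, key)
            if t.right.prio > t.prio:
                t = rotate_left(t)
        return t

    root = None
    for k in [7, 3, 9, 1, 5]:
        root = insert(root, k)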
verification is applied to software as proof method with respect to its requirements software testing is necessary because verification is often infeasible automation is desirable since the complexity and the effort involved are significant however automated software testing is commonly used to ensure confidence in the conformance of an implementation to an abstract model not to its requirement properties in this paper we introduce the notion of property relevance of test cases property relevant test cases can be used to determine property violations it is shown how to detect the properties relevant to test case new coverage criteria based on property relevance are introduced automated generation of test suites satisfying these criteria is also presented finally feasibility is illustrated with an empirical evaluation
surface flattening is crucial problem for many applications as indicated by the steady flow of new methods appearing in related publications quality control of these methods by means of accuracy criteria independent of particular flattening methodologies has not been addressed yet by researchers this is exactly the subject of this paper detailed analysis of flattening is presented leading to geometric and physics based criteria these are implemented in intuitively acceptable visualization techniques which are applied to practical examples
during face to face interactions listeners use backchannel feedback such as head nods as signal to the speaker that the communication is working and that they should continue speaking predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans in this paper we show how sequential probabilistic models eg hidden markov model or conditional random fields can automatically learn from database of human to human interactions to predict listener backchannels using the speaker multimodal output features eg prosody spoken words and eye gaze the main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models for prediction of visual backchannel cues ie head nods our prediction model shows statistically significant improvement over previously published approach based on hand crafted rules
the development of areas such as remote and airborne sensing location based services and geosensor networks enables the collection of large volumes of spatial data these datasets necessitate the wide application of spatial databases queries on these geo referenced data often require the aggregation of isolated data points to form spatial clusters and obtain properties of the clusters however current sql standard does not provide an effective way to form and query spatial clusters in this paper we aim at introducing cluster by into spatial databases to allow broad range of interesting queries to be posted on spatial clusters we also provide language construct to specify spatial clustering algorithms the extension is demonstrated with several motivating examples
enhancing and maintaining complex software system requires detailed understanding of the underlying source code gaining this understanding by reading source code is difficult since software systems are inherently dynamic it is complex and time consuming to imagine for example the effects of method’s source code at run time the inspection of software systems during execution as encouraged by debugging tools contributes to source code comprehension leveraged by test cases as entry points we want to make it easy for developers to experience selected execution paths in their code by debugging into examples we show how links between test cases and application code can be established by means of dynamic analysis while executing regular tests
multi touch interfaces allow users to translate rotate and scale digital objects in single interaction however this freedom represents problem when users intend to perform only subset of manipulations user trying to scale an object in print layout program for example might find that the object was also slightly translated and rotated interfering with what was already carefully laid out earlier we implemented and tested interaction techniques that allow users to select subset of manipulations magnitude filtering eliminates transformations eg rotation that are small in magnitude gesture matching attempts to classify the user’s input into subset of manipulation gestures handles adopts conventional single touch handles approach for touch input our empirical study showed that these techniques significantly reduce errors in layout while the handles technique was slowest variation of the gesture matching technique presented the best combination of speed and control and was favored by participants
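a minimal sketch of the magnitude filtering idea, assuming a two touch gesture decomposed into translation, rotation and scale with illustrative thresholds (the study's actual gesture model and thresholds may differ):

    # decompose a two-finger gesture into translation, rotation and scale,
    # then suppress the components whose magnitude falls below a threshold
    import math

    def decompose(p1_old, p2_old, p1_new, p2_new):
        vx_o, vy_o = p2_old[0] - p1_old[0], p2_old[1] - p1_old[1]
        vx_n, vy_n = p2_new[0] - p1_new[0], p2_new[1] - p1_new[1]
        rotation = math.atan2(vy_n, vx_n) - math.atan2(vy_o, vx_o)
        scale = math.hypot(vx_n, vy_n) / math.hypot(vx_o, vy_o)
        tx = (p1_new[0] + p2_new[0]) / 2 - (p1_old[0] + p2_old[0]) / 2
        ty = (p1_new[1] + p2_new[1]) / 2 - (p1_old[1] + p2_old[1]) / 2
        return (tx, ty), rotation, scale

    def magnitude_filter(translation, rotation, scale,
                         t_min=5.0, r_min=math.radians(3), s_min=0.03):
        """zero out manipulations that are small in magnitude (illustrative thresholds)"""
        if math.hypot(*translation) < t_min:
            translation = (0.0, 0.0)
        if abs(rotation) < r_min:
            rotation = 0.0
        if abs(scale - 1.0) < s_min:
            scale = 1.0
        return translation, rotation, scale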
although digital library dl information is becoming increasingly annotated using metadata semantic query with respect to the structure of metadata has seldom been addressed the correlation of the two important aspects of dl content and services can generate additional semantic relationships this study proposes content and service inference model csim to derive relationships between content and services and defines functions to manipulate these relationships adding the manipulation functions to query predicates facilitates the description of structural semantics of dl content moreover in search for dl services inferences concerning csim relationships can be made to reuse dl service components which is highly promising experimental results demonstrate that csim outperforms the conventional keyword based method in both content and service queries applying csim in dl significantly improves semantic queries and alleviates the administrative load when developing novel dl services such as dl query interface library resource planning and virtual union catalog system
snapshot object is an abstraction of the problem of obtaining consistent view of the contents of shared memory in distributed system despite concurrent changes to the memory there are implementations of component snapshot objects shared by ge processes using registers this is the minimum number of registers possible we prove time lower bound for implementations that use this minimum number of registers it matches the time taken by the fastest such implementation our proof yields insight into the structure of any such implementation showing that processes must access the registers in very constrained way we also prove time lower bound for snapshot implementations using single writer registers in addition to historyless objects such as registers and swap objects
current research is demonstrating that model checking and other forms of automated finite state verification can be effective for checking properties of software systems due to the exponential costs associated with model checking multiple forms of abstraction are often necessary to obtain system models that are tractable for automated checking the bandera tool set provides multiple forms of automated support for compiling concurrent java software systems to models that can be supplied to several different model checking tools in this paper we describe the foundations of bandera’s data abstraction mechanism which is used to reduce the cardinality of data domains in software to be model checked and hence the program’s state space from technical standpoint the form of data abstraction used in bandera is simple and it is based on classical presentations of abstract interpretation we describe the mechanisms that bandera provides for declaring abstractions for attaching abstractions to programs and for generating abstracted programs and properties the contributions of this work are the design and implementation of various forms of tool support required for effective application of data abstraction to software components written in programming language like java which has rich set of linguistic features
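the flavour of such a data abstraction can be sketched with the classical signs domain (a generic abstract interpretation example chosen here for illustration, not bandera's actual abstraction library):

    # classical 'signs' abstraction: concrete integers are mapped to the
    # abstract values NEG, ZERO, POS (plus TOP when the sign is unknown),
    # and arithmetic is re-defined over the abstract domain
    NEG, ZERO, POS, TOP = "NEG", "ZERO", "POS", "TOP"

    def alpha(n):
        """abstraction function from concrete ints to abstract signs"""
        return ZERO if n == 0 else (POS if n > 0 else NEG)

    def abs_add(a, b):
        """abstract addition: sound but deliberately imprecise"""
        if a == ZERO:
            return b
        if b == ZERO:
            return a
        if a == b:
            return a            # POS+POS=POS, NEG+NEG=NEG
        return TOP              # POS+NEG could be anything

    assert abs_add(alpha(3), alpha(4)) == POS
    assert abs_add(alpha(-2), alpha(5)) == TOP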
during the last years the growing application complexity design and mask costs have compelled embedded system designers to increasingly consider partially reconfigurable application specific instruction set processors rasips which combine programmable base processor with reconfigurable fabric although such processors promise to deliver excellent balance between performance and flexibility their design remains challenging task the key to the successful design of rasip is combined architecture exploration of all the three major components the programmable core the reconfigurable fabric and the interfaces between these two this work presents design flow that supports fast architecture exploration for rasips the design flow is centered around unified description of an entire rasip in an architecture description language adl this adl description facilitates consistent modeling and exploration of all three components of rasip through automatic generation of the software tools compiler tool chain and instruction set simulator and the rtl hardware model the generated software tools and the rtl model can be used either for final implementation of the rasip or can serve as preoptimized starting point for implementation that can be hand optimized afterward the design flow is further enhanced by number of automatic application analysis tools including fine grained application profiler an instruction set extension ise generator and data path mapper for coarse grained reconfigurable architectures cgras we present some case studies on embedded benchmarks to show how the design space exploration process helps to efficiently design an application domain specific rasip
large percentage of computed results have fewer significant bits compared to the full width of register we exploit this fact to pack multiple results into single physical register to reduce the pressure on the register file in superscalar processor two schemes for dynamically packing multiple narrow width results into partitions within single register are evaluated the first scheme is conservative and allocates full width register for computed result if the computed result turns out to be narrow the result is reallocated to partitions within common register freeing up the full width register the second scheme allocates register partitions based on prediction of the width of the result and reallocates register partitions when the actual result width is higher than what was predicted if the actual width is narrower than what was predicted allocated partitions are freed up detailed evaluation of our schemes shows that average ipc gains of up to can be realized across the spec benchmarks on somewhat register constrained datapath
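a software sketch of the width test and of packing two narrow results into one 64 bit register word (illustrative partition sizes assumed here, not the exact hardware scheme evaluated in the paper):

    # a value is 'narrow' here if it fits in the low 32 bits of a 64-bit
    # register after sign extension; two narrow results can then share one
    # physical register as two 32-bit partitions
    MASK32 = (1 << 32) - 1

    def is_narrow(value, width=64, part=32):
        """true if the top (width - part) bits are a pure sign extension"""
        v = value & ((1 << width) - 1)
        top = v >> (part - 1)
        return top == 0 or top == (1 << (width - part + 1)) - 1

    def pack(lo_val, hi_val):
        return (lo_val & MASK32) | ((hi_val & MASK32) << 32)

    def unpack(reg, part_index, part=32):
        raw = (reg >> (part_index * part)) & MASK32
        return raw - (1 << part) if raw >> (part - 1) else raw   # sign extend

    r = pack(-5, 1000)
    assert is_narrow(-5) and is_narrow(1000)
    assert unpack(r, 0) == -5 and unpack(r, 1) == 1000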
peculiarity oriented mining pom aiming to discover peculiarity rules hidden in dataset is new data mining method in the past few years many results and applications on pom have been reported however there is still lack of theoretical analysis in this paper we prove that the peculiarity factor pf one of the most important concepts in pom can accurately characterize the peculiarity of data with respect to the probability density function of normal distribution but is unsuitable for more general distributions thus we propose the concept of local peculiarity factor lpf it is proved that the lpf has the same ability as the pf for normal distribution and is the so called sensitive peculiarity description for general distributions to demonstrate the effectiveness of the lpf we apply it to outlier detection problems and give new outlier detection algorithm called lpf outlier experimental results show that lpf outlier is an effective outlier detection algorithm
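a minimal sketch of how a local peculiarity score could drive outlier detection, assuming the common formulation in which the peculiarity factor sums distances raised to a power and the local variant restricts the sum to the k nearest neighbours (the paper's exact definitions and thresholds may differ):

    import numpy as np

    def local_peculiarity_factor(data, k=5, alpha=0.5):
        """lpf(x) = sum of distances from x to its k nearest neighbours,
        each raised to the power alpha (assumed formulation)"""
        data = np.asarray(data, dtype=float)
        dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
        lpf = np.empty(len(data))
        for i, row in enumerate(dists):
            nearest = np.sort(row)[1:k + 1]        # skip the zero self-distance
            lpf[i] = np.sum(nearest ** alpha)
        return lpf

    def lpf_outliers(data, k=5, alpha=0.5, n_sigma=2.0):
        """flag points whose lpf deviates from the mean by n_sigma std devs"""
        lpf = local_peculiarity_factor(data, k, alpha)
        return np.where(lpf > lpf.mean() + n_sigma * lpf.std())[0]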
we report on the value sensitive design development and appropriation of groupware system to support software engineering knowledge sharing usage data from visitors and semi structured interviews with individuals suggest the methods employed were successful in addressing value tensions particularly with respect to privacy awareness and reputation key contributions include proof of concept that established value sensitive design principles and methods can be used to good effect for the design of groupware in an industry setting new design method for addressing value tensions value dams and flows and demonstration of the co evolution of technology and organizational policy
this paper describes garbage collector designed around the use of permanent private thread local nurseries and is principally oriented towards functional languages we try to maximize the cache hit rate by having threads continually reuse their individual private nurseries these private nurseries operate in such way that they can be garbage collected independently of other threads which creates low collection pause times objects which survive thread local collections are moved to mature generation that can be collected either concurrently or in stop the world fashion we describe several optimizations including two dynamic control parameter adaptation schemes related to garbage collecting the private nurseries and to our concurrent collector some of which are made possible when the language provides mutability information we tested our collector against six benchmarks and saw single threaded performance improvements in the range of we also saw increase for processors in scalability for one parallel benchmark that had previously been memory bound
we present new memory access optimization for java to perform aggressive code motion for speculatively optimizing memory accesses by applying partial redundancy elimination pre techniques first to reduce as many barriers as possible and to enhance code motion we perform alias analysis to identify all the regions in which each object reference is not aliased secondly we find all the possible barriers finally we perform code motions in three steps for the first step we apply non speculative pre algorithm to move load instructions and their following instructions in the backwards direction of the control flow graph for the second step we apply speculative pre algorithm to move some of them aggressively before the conditional branches for the third step we apply our modified version of non speculative pre algorithm to move store instructions in the forward direction of the control flow graph and to even move some of them after the merge points we implemented our new algorithm in our production level java just in time compiler our experimental results show that our speculative algorithm improves the average maximum performance by for jbytemark and for specjvm over the fastest algorithm previously described while it increases the average maximum compilation time by for both benchmark suites
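the effect of moving a partially redundant memory access can be illustrated at source level (a generic pre style hoisting example written in python for illustration, not the jit compiler's actual transformation on bytecode):

    # before: obj.scale is loaded on every loop iteration even though the
    # field never changes inside the loop
    def scale_all_naive(values, obj):
        return [v * obj.scale for v in values]

    # after code motion: the load is hoisted out of the loop, so the
    # partially redundant accesses inside the loop disappear
    def scale_all_hoisted(values, obj):
        s = obj.scale            # single load on the loop-invariant path
        return [v * s for v in values]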
in recent years multi agent systems gained growing acceptance as required technology to develop complex distributed systems as result there is an increased need for practical methodology for developing such systems this paper presents new multi agent system development masd methodology developed over several years through analyzing and studying most of the existing agent oriented methodologies this new methodology is constructed based on the strengths and weaknesses of existing methodologies masd aims to provide designers of agent based systems with set of methods and guidelines that allow them to control the construction process of complex systems enabling software engineers to specify agent based systems that would be implemented within an execution environment for example the jadex platform masd differs from existing methodologies in that it is detailed and complete methodology for developing multi agent systems this paper describes the methodology’s process and illustrates it using running example namely car rental system
in this paper we explore sustainability in interaction design by reframing concepts of user identity and use in domestic setting building on our own work on everyday design and blevis’s sustainable interaction design principles we present examples from an ethnographic study of families in their homes which illustrate design in use the creative and sustainable ways people appropriate and adapt designed artifacts we claim that adopting conception of the user as creative everyday designer generates new set of design principles that promote sustainable interaction design
the effective documentation of architectural knowledge ak is one of the key factors in leveraging the paradigm shift toward sharing and reusing ak however current documentation approaches have severe shortcomings in capturing the knowledge of large and complex systems and subsequently facilitating its usage in this paper we propose to tackle this problem through the enrichment of traditional architectural documentation with formal ak we have developed an approach consisting of method and an accompanying tool suite to support this enrichment we evaluate our approach through quasi controlled experiment with the architecture of real large and complex system we provide empirical evidence that our approach helps to partially solve the problem and indicate further directions in managing documented ak
operator strength reduction is technique that improves compiler generated code by reformulating certain costly computations in terms of less expensive ones common case arises in array addressing expressions used in loops the compiler can replace the sequence of multiplies generated by direct translation of the address expression with an equivalent sequence of additions when combined with linear function test replacement strength reduction can speed up the execution of loops containing array references the improvement comes from two sources reduction in the number of operations needed to implement the loop and the use of less costly operations this paper presents new algorithm for operator strength reduction called osr osr improves upon an earlier algorithm of allen cocke and kennedy allen et al osr operates on the static single assignment ssa form of procedure cytron et al by taking advantage of the properties of ssa form we have derived an algorithm that is simple to understand quick to implement and in practice fast to run its asymptotic complexity is in the worst case the same as the allen cocke and kennedy algorithm ack osr achieves optimization results that are equivalent to those obtained with the ack algorithm osr has been implemented in several research and production compilers
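the classic transformation can be sketched on an array addressing loop (a generic illustration of operator strength reduction and linear function test replacement, not the osr implementation itself):

    ELEM_SIZE = 8  # bytes per array element (illustrative)

    # direct translation: one multiply per iteration to form the address
    def addresses_naive(base, n):
        return [base + i * ELEM_SIZE for i in range(n)]

    # after strength reduction: the multiply becomes a running addition,
    # and the loop test is rewritten against the address itself
    # (linear function test replacement)
    def addresses_reduced(base, n):
        out, addr, limit = [], base, base + n * ELEM_SIZE
        while addr < limit:
            out.append(addr)
            addr += ELEM_SIZE
        return out

    assert addresses_naive(0x1000, 4) == addresses_reduced(0x1000, 4)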
credentials are an indispensable means for service access control in electronic commerce however regular credentials such as certificates and spki sdsi certificates do not address user privacy at all while anonymous credentials that protect user privacy are complex and have compatibility problems with existing pkis in this paper we propose privacy preserving credentials concept between regular credentials and anonymous credentials the privacy preserving credentials enjoy the advantageous features of both regular credentials and anonymous credentials and strike balance between user anonymity and system complexity we achieve this by employing computer servers equipped with tpms trusted platform modules we present detailed construction for elgamal encryption credentials we also present xml based specification for the privacy preserving credentials
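for reference, textbook elgamal encryption over a prime field looks as follows (toy parameters only; the credential construction in the paper builds on this primitive together with a tpm, which is not shown here):

    import secrets

    # toy parameters: a small Mersenne prime and a fixed base (never use in practice)
    P = 2 ** 127 - 1
    G = 3

    def keygen():
        x = secrets.randbelow(P - 2) + 1          # private key
        return x, pow(G, x, P)                    # (private, public h = g^x)

    def encrypt(h, m):
        k = secrets.randbelow(P - 2) + 1          # ephemeral randomness
        return pow(G, k, P), (m * pow(h, k, P)) % P

    def decrypt(x, c1, c2):
        s = pow(c1, x, P)                         # shared secret g^(xk)
        return (c2 * pow(s, P - 2, P)) % P        # divide via Fermat inverse

    priv, pub = keygen()
    c1, c2 = encrypt(pub, 42)
    assert decrypt(priv, c1, c2) == 42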
we show that there exists metric space such that admits bilipschitz embedding into but does not admit an equivalent metric of negative type in fact we exhibit strong quantitative bound there are point subsets yn such that mapping yn to metric of negative type requires distortion log in formal sense this is the first lower bound specifically against bilipschitz embeddings into negative type metrics and therefore unlike other lower bounds ours cannot be derived from dimensional poincare inequality this answers an open question about the strength of strong vs weak triangle inequalities in number of semi definite programs our construction sheds light on the power of various notions of dual flows that arise in algorithms for approximating the sparsest cut problem it also has other interesting implications for bilipschitz embeddings of finite metric spaces
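for readers unfamiliar with the terminology, the standard definition of a metric of negative type (a textbook definition, not a result of the paper) is:

    % a metric d on X is of negative type iff (X, \sqrt{d}) embeds
    % isometrically into Hilbert space; equivalently, for all points
    % x_1,\dots,x_n \in X and all reals b_1,\dots,b_n summing to zero,
    \sum_{i=1}^{n}\sum_{j=1}^{n} b_i\, b_j\, d(x_i, x_j) \;\le\; 0
    \qquad \text{whenever } \sum_{i=1}^{n} b_i = 0 .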
efficient exploration of unknown or unmapped environments has become one of the fundamental problem domains in algorithm design its applications range from robot navigation in hazardous environments to rigorous searching indexing and analysing digital data available on the internet large number of exploration algorithms has been proposed under various assumptions about the capability of mobile exploring entities and various characteristics of the environment which are to be explored this paper considers the graph model where the environment is represented by graph of connections in which discrete moves are permitted only along its edges designing efficient exploration algorithms in this model has been extensively studied under diverse set of assumptions eg directed vs undirected graphs anonymous nodes vs nodes with distinct identities deterministic vs probabilistic solutions single vs multiple agent exploration as well as in the context of different complexity measures including the time complexity the memory consumption and the use of other computational resources such as tokens and messages in this work the emphasis is on memory efficient exploration of anonymous graphs we discuss in more detail three approaches random walk propp machine and basic walk reviewing major relevant results presenting recent developments and commenting on directions for further research
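the memory footprint advantage of the random walk approach is easy to see in a sketch: the walker only keeps its current node (an illustrative simulation on an explicitly stored graph, whereas a real agent would query its neighbours locally):

    import random

    def random_walk_cover(adj, start=0, max_steps=10_000_000):
        """walk along uniformly random incident edges until every node of the
        undirected graph 'adj' (node -> list of neighbours) has been visited;
        the agent itself only needs to remember its current position"""
        unvisited = set(adj) - {start}
        node, steps = start, 0
        while unvisited and steps < max_steps:
            node = random.choice(adj[node])
            unvisited.discard(node)
            steps += 1
        return steps

    cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(random_walk_cover(cycle))   # cover time of a cycle grows quadratically in n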
effective data prefetching requires accurate mechanisms to predict embedded patterns in the miss reference behavior this paper proposes novel prefetching mechanism called the spectral prefetcher sp that accurately identifies the pattern by dynamically adjusting to its frequency the proposed mechanism divides the memory address space into tag concentration zones tczones and detects either the pattern of tags higher order bits or the pattern of strides differences between consecutive tags within each tczone the prefetcher dynamically determines whether the pattern of tags or strides will increase the effectiveness of prefetching and switches accordingly to measure the performance of our scheme we use cycle accurate aggressive out of order simulator that models bus occupancy bus protocol and limited bandwidth our experimental results show performance improvement of on average and at best for the memory intensive benchmarks we studied further we show that sp outperforms the previously proposed scheme with twice the size of sp by percent and larger cache with equivalent storage area by percent
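the stride half of such a pattern detector can be sketched as follows (a generic per zone stride predictor with illustrative fields; the actual sp mechanism and its tag mode are more involved):

    # per-zone stride detector: once the same stride between consecutive miss
    # addresses is seen twice, issue prefetches for the next few strides
    class StrideZone:
        def __init__(self, degree=2):
            self.last_addr = None
            self.last_stride = None
            self.confidence = 0
            self.degree = degree

        def on_miss(self, addr):
            prefetches = []
            if self.last_addr is not None:
                stride = addr - self.last_addr
                if stride == self.last_stride and stride != 0:
                    self.confidence += 1
                else:
                    self.confidence = 0
                self.last_stride = stride
                if self.confidence >= 1:        # stride confirmed at least once
                    prefetches = [addr + stride * i for i in range(1, self.degree + 1)]
            self.last_addr = addr
            return prefetches

    zone = StrideZone()
    for a in [0x100, 0x140, 0x180, 0x1c0]:
        print(hex(a), [hex(p) for p in zone.on_miss(a)])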
fault scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance the query update protocol is new tool that enables construction of fault scalable byzantine fault tolerant services the optimistic quorum based nature of the protocol allows it to provide better throughput and fault scalability than replicated state machines using agreement based protocols prototype service built using the protocol outperforms the same service built using popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution moreover the performance of the protocol decreases by only as the number of byzantine faults tolerated increases from one to five whereas the performance of the replicated state machine decreases by
geometric framework for the recognition of three dimensional objects represented by point clouds is introduced in this paper the proposed approach is based on comparing distributions of intrinsic measurements on the point cloud in particular intrinsic distances are exploited as signatures for representing the point clouds the first signature we introduce is the histogram of pairwise diffusion distances between all points on the shape surface these distances represent the probability of traveling from one point to another in fixed number of random steps the average intrinsic distances of all possible paths of given number of steps between the two points this signature is augmented by the histogram of the actual pairwise geodesic distances in the point cloud the distribution of the ratio between these two distances as well as the distribution of the number of times each point lies on the shortest paths between other points these signatures are not only geometric but also invariant to bends we further augment these signatures by the distribution of curvature function and the distribution of curvature weighted distance these histograms are compared using the or other common distance metrics for distributions the presentation of the framework is accompanied by theoretical and geometric justification and state of the art experimental results with the standard princeton shape benchmark isdb and nonrigid datasets we also present detailed analysis of the particular relevance of each one of the different proposed histogram based signatures finally we briefly discuss more local approach where the histograms are computed for number of overlapping patches from the object rather than the whole shape thereby opening the door to partial shape comparisons
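a minimal sketch of one of the signatures, the histogram of pairwise geodesic distances approximated on a k nearest neighbour graph of the point cloud (illustrative parameters, and only one of the several histograms the paper combines):

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import dijkstra

    def geodesic_histogram(points, k=8, bins=32):
        """histogram of approximate pairwise geodesic distances on a point cloud;
        in practice distances would be normalised for scale before binning"""
        pts = np.asarray(points, dtype=float)
        n = len(pts)
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        # connect each point to its k nearest neighbours, edge weight = euclidean
        idx = np.argsort(d, axis=1)[:, 1:k + 1]
        rows = np.repeat(np.arange(n), k)
        cols = idx.ravel()
        graph = csr_matrix((d[rows, cols], (rows, cols)), shape=(n, n))
        geo = dijkstra(graph, directed=False)
        vals = geo[np.isfinite(geo) & (geo > 0)]
        hist, _ = np.histogram(vals, bins=bins, density=True)
        return hist

    def l1_dissimilarity(h1, h2):
        """compare two shape signatures with the l1 distance between histograms"""
        return np.abs(np.asarray(h1) - np.asarray(h2)).sum()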
models of objects have become widely accessible in several disciplines within academia and industry spanning from scientific visualization to entertainment in the last few years models are often organized into digital libraries accessible over the network and thus semantic annotation of such models becomes an important issue fundamental step in annotating model is to segment it into meaningful parts in this work we present javad framework for inspecting and segmenting objects represented in xd format in particular we present combination of segmentation and merging techniques for producing feasible decomposition of the boundary of object we represent such decomposition as graph that we call the segmentation graph which is the basis for semantic annotation we describe also the interface we have developed to allow visualization and browsing of both the decomposition and the segmentation graph in order to understand the topological structure of the resulting decomposition
social interaction among people is an essential part of every society and strong foundation for the development and self actualization of person even in virtual environments we tend to interact in social way our research addresses the tasks of recognition interpretation and visualization of affect communicated through text messaging in order to facilitate sensitive and expressive interaction in computer mediated communication we previously introduced novel syntactical rule based approach to affect recognition from text the evaluation of the developed affect analysis model showed promising results regarding its capability to accurately recognize affective information in text from an existing corpus of informal online conversations to enrich the user’s experience in online communication to make it enjoyable exciting and fun we implemented web based instant messaging im application affectim and endowed it with emotional intelligence by integrating the developed affect analysis model this paper describes the findings of twenty person study conducted with our affectim system the results of the study indicate that our im system with automatic emotion recognition function can achieve level of affective intelligence system is successful at conveying the user’s feelings avatar expression is appropriate that is comparable to gold standard where users select the label of the conveyed emotion manually
error rates in the assessment of routine claims for welfare benefits have been found to be very high in netherlands usa and uk this is significant problem both in terms of quality of service and financial loss through over payments these errors also present challenges for machine learning programs using the data in this paper we propose way of addressing this problem by using process of moderation in which agents argue about the classification on the basis of data from distinct groups of assessors our agents employ an argument based dialogue protocol padua in which the agents produce arguments directly from database of cases with each agent having their own separate database we describe the protocol and report encouraging results from series of experiments comparing padua with other classifiers and assessing the effectiveness of the moderation process
we analyze access control mechanisms of the com architecture and define configuration of the com protection system in more precise and less ambiguous language than the com documentation using this configuration we suggest an algorithm that formally specifies the semantics of authorization decisions in com we analyze the level of support for the american national standards institute’s ansi specification of role based access control rbac components and functional specification in com our results indicate that com falls short of supporting even core rbac the main limitations exist due to the tight integration of the com architecture with the underlying operating system which prevents support for session management and role activation as specified in ansi rbac
this paper describes intra method control flow and data flow testing criteria for the java bytecode language six testing criteria are considered for the generation of testing requirements four control flow and two data flow based the main reason to work at lower level is that even when there is no source code structural testing requirements can still be derived and used to assess the quality of given test set it can be used for instance to perform structural testing on third party java components in addition the bytecode can be seen as an intermediate language so the analysis performed at this level can be mapped back to the original high level language that generated the bytecode to support the application of the testing criteria we have implemented tool named jabuti java bytecode understanding and testing jabuti is used to illustrate the application of the ideas developed in this paper
computers are increasingly being incorporated in devices with limited amount of available memory as result research is increasingly focusing on the automated reduction of program size existing literature focuses on either data or code compaction or on highly language dependent techniques this paper shows how combined code and data compaction can be achieved using link time code compaction system that reasons about the use of both code and data addresses the analyses proposed rely only on fundamental properties of linked code and are therefore generally applicable the combined code and data compaction is implemented in squeeze link time program compaction system and evaluated on spec mediabench and programs resulting in total binary program size reductions of this compaction involves no speed trade off as the compacted programs are on average about faster
the growing requirement on the correct design of high performance dsp systems in short time forces us to use ip's in many designs in this paper we propose an efficient ip block based design environment for high throughput vlsi systems the flow generates systemc register transfer level rtl architecture starting from matlab functional model described as netlist of functional ip the refinement process automatically inserts control structures to treat delays induced by the use of rtl ips it also inserts control structure to coordinate the execution of parallel clocked ip the delays may be managed by registers or by counters included in the control structure the experiments show that the approach can produce efficient rtl architecture and allow huge savings in time
in this paper we present web recommender system for recommending predicting and personalizing music playlists based on user model we have developed hybrid similarity matching method that combines collaborative filtering with ontology based semantic distance measurements we dynamically generate personalized music playlist from selection of recommended playlists which comprises the most relevant tracks to the user our web recommender system features three functionalities predict the likability of user towards specific music playlist recommend set of music playlists and compose new personalized music playlist our experimental results will show the efficacy of our hybrid similarity matching approach and the information personalization method
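the hybrid matching idea can be sketched as a weighted blend of a collaborative filtering similarity and an ontology derived semantic similarity (hypothetical weighting and inputs assumed here, not the system's actual formula):

    import math

    def cosine(u, v):
        """collaborative-filtering similarity over two users' track ratings"""
        common = set(u) & set(v)
        if not common:
            return 0.0
        num = sum(u[t] * v[t] for t in common)
        den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return num / den

    def semantic_similarity(distance, max_distance):
        """map an ontology path distance between two items or profiles to [0, 1]"""
        return 1.0 - min(distance, max_distance) / max_distance

    def hybrid_score(cf_sim, sem_sim, w=0.6):
        """weighted combination; w is an assumed tuning parameter"""
        return w * cf_sim + (1.0 - w) * sem_sim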
number of studies have focused on the usability aspect of groupware in supporting collaborative work unfortunately our understanding of their impact on supporting collaborative learning is still limited due to lack of attention to this issue furthermore the majority of educators and designers in cscl expect that interactions and collaborations would come naturally as result we are too busy with how versatile the tools in educational groupware systems should be in order to provide wide variety of interaction opportunities for both learners and educators and largely ignore whether or not these features are valuable from learners as well as educators perspective to bridge this gap in this paper we describe our experiences with loosely coupled collaborative software called groupscribbles in its potential of supporting cooperation and coordination among learners as well as its failures our study suggests that it is not the versatility of the tools in these educational groupware systems but how well they can provide seamless and focused distributed learning environment that determines the overall pedagogical appropriateness of the software in cscl that is the learning environment although distributed and fragmented should be capable of sticking learners their activities and meta cognitive problem solving skills together cohesively so as to continuously construct relatively compact learning space where coordination and collaborations can be made cheap lightweight effective and efficient
the aim of the experimental study described in this article is to investigate the effect of lifelike character with subtle expressivity on the affective state of users the character acts as quizmaster in the context of mathematical game this application was chosen as simple and for the sake of the experiment highly controllable instance of human computer interfaces and software subtle expressivity refers to the character’s affective response to the user’s performance by emulating multimodal human human communicative behavior such as different body gestures and varying linguistic style the impact of empathic behavior which is special form of affective response is examined by deliberately frustrating the user during the game progress there are two novel aspects in this investigation first we employ an animated interface agent to address the affective state of users rather than text based interface which has been used in related research second while previous empirical studies rely on questionnaires to evaluate the effect of life like characters we utilize physiological information of users in addition to questionnaire data in order to precisely associate the occurrence of interface events with users autonomic nervous system activity the results of our study indicate that empathic character response can significantly decrease user stress and that affective behavior may have positive effect on users perception of the difficulty of task
this paper presents systematic study of the properties of large number of web sites hosted by major isp to our knowledge ours is the first comprehensive study of large server farm that contains thousands of commercial web sites we also perform simulation analysis to estimate potential performance benefits of content delivery networks cdns for these web sites we make several interesting observations about the current usage of web technologies and web site performance characteristics first compared with previous client workload studies the web server farm workload contains much higher degree of uncacheable responses and responses that require mandatory cache validations significant reason for this is that cookie use is prevalent among our population especially among more popular sites however we found an indication of wide spread indiscriminate usage of cookies which unnecessarily impedes the use of many content delivery optimizations we also found that most web sites do not utilize the cache control features of the http protocol resulting in suboptimal performance moreover the implicit expiration time in client caches for responses is constrained by the maximum values allowed in the squid proxy finally our simulation results indicate that most web sites benefit from the use of cdn the amount of the benefit depends on site popularity and somewhat surprisingly cdn may increase the peak to average request ratio at the origin server because the cdn can decrease the average request rate more than the peak request rate
informal and formal approaches to documenting software architecture design offer disjoint advantages and disadvantages informal approaches are often used in practice since they are easily accessible and support creativity and flexibility during design but they are hard to maintain and validate this is the strength of formally defined approaches which can be automatically processed maintained and validated but are expensive to use combining the advantages of both approaches promises to increase the reach of formal approaches and to make the aforementioned advantages more accessible we present an approach that offers seamless transition from relaxed and informal architecture descriptions to detailed and formally defined architecture definition
transactional memory tm has been shown to be promising programming model for multi core systems we developed software based transactional memory stm compiler that generates efficient transactional code for transactions to run on stm runtime without the need of transactional hardware support since real world applications often invoke third party libraries available only in binary form it is imperative for our stm compiler to support legacy binary functions and provide an efficient solution to convert those invoked inside transactions to the corresponding transactional code our stm compiler employs lightweight dynamic binary translation and optimization module ldbtom to automatically convert legacy binary functions to transactional code in this paper we describe our ldbtom system which seamlessly integrates the translated code with the stm compiler generated code to run on the stm runtime and optimizes the translated code taking advantage of dynamic optimization opportunities and stm runtime information although the binary code is inherently harder to optimize than high level source code our experiment shows that it can be translated and optimized into efficient transactional code by ldbtom
one of the major challenges of post pc computing is the need to reduce energy consumption thereby extending the lifetime of the batteries that power these mobile devices memory is particularly important target for efforts to improve energy efficiency memory technology is becoming available that offers power management features such as the ability to put individual chips in any one of several different power modes in this paper we explore the interaction of page placement with static and dynamic hardware policies to exploit these emerging hardware features in particular we consider page allocation policies that can be employed by an informed operating system to complement the hardware power management strategies we perform experiments using two complementary simulation environments trace driven simulator with workload traces that are representative of mobile computing and an execution driven simulator with detailed processor memory model and more memory intensive set of benchmarks spec our results make compelling case for cooperative hardware software approach for exploiting power aware memory with down to as little as of the energy delay for the best static policy and to of the energy delay for traditional full power memory
digital information goods constitute growing class of economic goods during decision making for purchase buyer searches for information about digital information goods such as information about the content price and trading information usage information how it can be presented and which legal restrictions apply we present logical container model for knowledge intensive digital information goods knowledge content object kco that directly references formalised semantic descriptions of key information types on information goods key information types are formalised as plug in slots facets facets can be instantiated by semantic descriptions that are linked with domain ontologies we have identified six logically congruent facet types by which user can interpret information goods kcos are mediated and managed by technical middleware called knowledge content carrier architecture kcca based on the technical and logical structure of kco we will discuss five economic implications that drive further research
the domain name system dns is critical part of the internet’s infrastructure and is one of the few examples of robust highly scalable and operational distributed system although few studies have been devoted to characterizing its properties such as its workload and the stability of the top level servers many key components of dns have not yet been examined based on large scale measurements taken from servers in large content distribution network we present detailed study of key characteristics of the dns infrastructure such as load distribution availability and deployment patterns of dns servers our analysis includes both local dns servers and servers in the authoritative hierarchy we find that the vast majority of users use small fraction of deployed name servers the availability of most name servers is high and there exists larger degree of diversity in local dns server deployment and usage than for authoritative servers furthermore we use our dns measurements to draw conclusions about federated infrastructures in general we evaluate and discuss the impact of federated deployment models on future systems such as distributed hash tables
we consider the problem of locating replicas in network to minimize communications costs under the assumption that the read one write all policy is used to ensure data consistency an optimization problem is formulated in which the cost function estimates the total communications costs the paper concentrates on the study of the optimal communications cost as function of the ratio between the frequency of the read and write operations the problem is reformulated as zero one linear programming problem and its connection to the median problem is explained the general problem is proved to be np complete for path graphs dynamic programming algorithm for the problem is presented
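a minimal zero one formulation in the spirit of the abstract above offered as an illustrative sketch rather than the paper's own notation here r_i and w_i are the read and write frequencies of node i d_ij is the communication cost between nodes i and j x_j marks a replica placed at node j and y_ij marks the replica that serves reads from i under read one write all each read goes to one replica and each write goes to every replica

\min_{x,y}\;\; \sum_{i}\sum_{j} r_i\, d_{ij}\, y_{ij} \;+\; \sum_{i}\sum_{j} w_i\, d_{ij}\, x_j
\text{subject to}\quad \sum_{j} y_{ij} = 1 \;\;\forall i, \qquad y_{ij} \le x_j \;\;\forall i,j, \qquad \sum_{j} x_j \ge 1, \qquad x_j,\, y_{ij} \in \{0,1\}

as the write frequencies grow relative to the reads the second term dominates and the optimum places fewer replicas which is exactly the cost versus read write ratio trade off the abstract studies and the linearisation above also makes the connection to the median problem visible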
hard disk drives returned back to seagate undergo the field return incoming test during the test the available logs in disk drives are collected if possible these logs contain cumulative data on the workload seen by the drive during its lifetime including the amount of bytes read and written the number of completed seeks and the number of spin ups the population of returned drives is considerable and the respective collected data represents good source of information on disk drive workloads in this paper we present an in breadth analysis of these logs for the cheetah and families of drives we observe that over an entire family of drives the workload behavior is variable the workload variability is more enhanced during the first month of the drive’s life than afterward our analysis shows that the drives are generally underutilized yet there is portion of them about that experience higher utilization levels also the data sets indicate that the majority of disk drives write more than they read during their lifetime these observations can be used in the design process of disk drive features that intend to enhance overall drive operation including reliability performance and power consumption
home networks are common but notoriously difficult to setup and maintain the difficulty users experience in setting up and maintaining their home network is problematic because of the numerous security threats that can exploit poorly configured and maintained network security because there is little empirical data to characterize the usability problems associated with the adoption of wireless network security we surveyed primary caretakers and users of home networks examining their perceptions and usage of the security features available to them we found that users did not understand the difference between access control lists and encryption and that devices fail to properly notify users of weak security configuration choices to address these issues we designed and evaluated novel wireless router configuration wizard that encouraged strong security choices by improving the network configuration steps we found that security choices made by users of our wizard resulted in stronger security practices when compared to the wizard from leading equipment manufacturer
online program monitoring is an effective technique for detecting bugs and security attacks in running applications extending these tools to monitor parallel programs is challenging because the tools must account for inter thread dependences and relaxed memory consistency models existing tools assume sequential consistency and often slow down the monitored program by orders of magnitude in this paper we present novel approach that avoids these pitfalls by not relying on strong consistency models or detailed inter thread dependence tracking instead we only assume that events in the distant past on all threads have become visible we make no assumptions on and avoid the overheads of tracking the relative ordering of more recent events on other threads to overcome the potential state explosion of considering all the possible orderings among recent events we adapt two techniques from static dataflow analysis reaching definitions and reaching expressions to this new domain of dynamic parallel monitoring significant modifications to these techniques are proposed to ensure the correctness and efficiency of our approach we show how our adapted analysis can be used in two popular memory and security tools we prove that our approach does not miss errors and sacrifices precision only due to the lack of relative ordering among recent events moreover our simulation study on collection of splash and parsec benchmarks running memory checking tool on hardware assisted logging platform demonstrates the potential benefits in i trading off very low false positive rate for reduced overhead and ii the ability to run on relaxed consistency models
the emergent field of computational photography is proving that by coupling generalized imaging optics with software processing the quality and flexibility of imaging systems can be increased in this paper we capture and manipulate multiple images of scene taken with different aperture settings f numbers we design and implement prototype optical system and associated algorithms to capture four images of the scene in single exposure each taken with different aperture setting our system can be used with commercially available dslr cameras and photographic lenses without modification to either we leverage the fact that defocus blur is function of scene depth and f number to estimate depth map we demonstrate several applications of our multi aperture camera such as post exposure editing of the depth of field including extrapolation beyond the physical limits of the lens synthetic refocusing and depth guided deconvolution
testing of concurrent software is extremely difficult despite all the progress in the testing and verification technology concurrent bugs the most common of which are deadlocks and races make it to the field this paper describes set of techniques implemented in tool called contest allowing concurrent programs to self heal at run time concurrent bugs have the very desirable property for healing that some of the interleaving produce correct results while in others bugs manifest healing concurrency problems is about limiting or changing the probability of interleaving such that bugs will be seen less when healing concurrent programs if deadlock does not result from limiting the interleaving we are sure that the result of the healed program could have been in the original program and therefore no new functional bug has been introduced in this initial work which deals with different types of data races we suggest three types of healing mechanisms changing the probability of interleaving by introducing sleep or yield statements or by changing thread priorities removing interleaving using synchronisation commands like locking and unlocking certain mutexes or waits and notifies and removing the result of bad interleaving by replacing the value of variables by the one that should have been taken we also classify races according to the relevant healing strategies to apply
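a minimal python sketch of the first two healing mechanisms described above shown for a generic shared counter rather than the java programs targeted by contest the noise strategy only changes the probability of the bad interleaving while the lock strategy removes it all names below are illustrative

import threading, time, random

counter = 0                      # shared variable with a read-modify-write race
heal_lock = threading.Lock()     # synchronisation introduced by the "healing" step

def racy_increment():
    global counter
    tmp = counter                # read
    time.sleep(0)                # yield point where the bad interleaving manifests
    counter = tmp + 1            # write (may overwrite a concurrent update)

def healed_increment(strategy="lock"):
    global counter
    if strategy == "noise":
        # healing mechanism 1: change interleaving probability with a tiny random delay
        time.sleep(random.uniform(0, 1e-4))
        tmp = counter
        counter = tmp + 1
    else:
        # healing mechanism 2: remove the interleaving with a lock around the update
        with heal_lock:
            tmp = counter
            counter = tmp + 1

def run(worker, n_threads=8, n_iters=1000):
    global counter
    counter = 0
    threads = [threading.Thread(target=lambda: [worker() for _ in range(n_iters)])
               for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return counter

if __name__ == "__main__":
    print("racy   :", run(racy_increment))      # often below 8000 when the race manifests
    print("healed :", run(healed_increment))    # always 8000 with the lock strategy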
it is challenge to provide detection facilities for large scale distributed systems running legacy code on hosts that may not allow fault tolerant functions to execute on them it is tempting to structure the detection in an observer system that is kept separate from the observed system of protocol entities with the former only having access to the latter’s external message exchanges in this paper we propose an autonomous self checking monitor system which is used to provide fast detection to underlying network protocols the monitor architecture is application neutral and therefore lends itself to deployment for different protocols with the rulebase against which the observed interactions are matched making it specific to protocol to make the detection infrastructure scalable and dependable we extend it to hierarchical monitor structure the monitor structure is made dynamic and reconfigurable by designing different interactions to cope with failures load changes or mobility the latency of the monitor system is evaluated under fault free conditions while its coverage is evaluated under simulated error injections
automatic concept learning from large scale imbalanced data sets is key issue in video semantic analysis and retrieval which means the number of negative examples is far more than that of positive examples for each concept in the training data the existing methods generally adopt under sampling for the majority negative examples or over sampling for the minority positive examples to balance the class distribution on training data the main drawbacks of these methods are that the degree of re sampling a key factor that greatly affects the performance needs to be pre fixed in most existing methods which is not generally the optimal choice that many useful negative samples may be discarded in under sampling and that some works only focus on the improvement of the computational speed rather than the accuracy to address the above issues we propose new approach and algorithm named adaouboost adaptive over sampling and under sampling boost the novelty of adaouboost mainly lies in adaptively over sampling the minority positive examples and under sampling the majority negative examples to form different sub classifiers and combine these sub classifiers according to their accuracy to create strong classifier which aims to fully use the whole training data and improve the performance of the class imbalance learning classifier in adaouboost first our clustering based under sampling method is employed to divide the majority negative examples into some disjoint subsets then for each subset of negative examples we utilize the borderline smote synthetic minority over sampling technique algorithm to over sample the positive examples with different sizes train each sub classifier using each of them and get the classifier by fusing these sub classifiers with different weights finally we combine these classifiers in each subset of negative examples to create strong classifier we compare the performance between adaouboost and the state of the art methods on trecvid benchmark with all concepts and the results show that adaouboost can achieve superior performance in large scale imbalanced data sets
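a compressed sketch of the resampling and fusion idea under stated simplifications plain random duplication of positives stands in for borderline smote sklearn logistic regression stands in for the sub classifiers and the accuracy based weights are computed on the training data all function names are illustrative not the authors code

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def adaptive_ensemble(X_pos, X_neg, n_clusters=4, oversample_ratios=(1.0, 2.0, 3.0), seed=0):
    """Train one accuracy-weighted sub-ensemble per cluster of negatives."""
    rng = np.random.default_rng(seed)
    neg_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_neg)
    ensemble = []                                    # list of (weight, classifier)
    for c in range(n_clusters):
        X_neg_c = X_neg[neg_labels == c]             # under-sampling: one disjoint negative subset
        for ratio in oversample_ratios:              # over-sample positives at several sizes
            n_extra = int(ratio * len(X_pos))
            idx = rng.integers(0, len(X_pos), n_extra)
            X_pos_c = np.vstack([X_pos, X_pos[idx]]) # stand-in for borderline SMOTE
            X = np.vstack([X_pos_c, X_neg_c])
            y = np.r_[np.ones(len(X_pos_c)), np.zeros(len(X_neg_c))]
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            acc = clf.score(X, y)                    # weight sub-classifiers by their accuracy
            ensemble.append((acc, clf))
    total = sum(w for w, _ in ensemble)
    return [(w / total, clf) for w, clf in ensemble]

def predict_score(ensemble, X):
    """Fuse sub-classifier scores with their normalised accuracy weights."""
    return sum(w * clf.predict_proba(X)[:, 1] for w, clf in ensemble)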
the principle of least privilege is well known design principle to which access control models and systems should adhere in the context of role based access control the principle of least privilege can be implemented through the use of sessions in this paper we first define family of simple role based models that provide support for multiple hierarchies and temporal constraints we then investigate question related to sessions in each of these models the inter domain role mapping problem the question has previously been defined and analyzed in the context of particular role based model we redefine the question and analyze it in the context of number of different role based models
we are inevitably moving into realm where small and inexpensive wireless devices would be seamlessly embedded in the physical world and form wireless sensor network in order to perform complex monitoring and computational tasks such networks pose new challenges in data processing and dissemination because of the limited resources processing bandwidth energy that such devices possess in this paper we propose new technique for compressing multiple streams containing historical data from each sensor our method exploits correlation and redundancy among multiple measurements on the same sensor and achieves high degree of data reduction while managing to capture even the smallest details of the recorded measurements the key to our technique is the base signal series of values extracted from the real measurements used for encoding piece wise linear correlations among the collected data values we provide efficient algorithms for extracting the base signal features from the data and for encoding the measurements using these features our experiments demonstrate that our method by far outperforms standard approximation techniques like wavelets histograms and the discrete cosine transform on variety of error metrics and for real datasets from different domains
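a toy sketch assuming the encoding step reduces to fitting each fixed length window of a measurement stream as an affine function of the matching base signal window and storing only the two coefficients plus an error bound the base signal extraction itself is not shown and the helper names are ours

import numpy as np

def encode(stream, base, window=32):
    """Encode each window of `stream` as a*base_window + b (least squares)."""
    coded = []
    for start in range(0, len(stream) - window + 1, window):
        y = np.asarray(stream[start:start + window], dtype=float)
        x = np.asarray(base[start:start + window], dtype=float)
        A = np.column_stack([x, np.ones_like(x)])
        (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.max(np.abs(A @ [a, b] - y)))   # keep the max error as a quality bound
        coded.append((start, a, b, err))
    return coded

def decode(coded, base, window=32):
    """Rebuild the approximate stream from the stored (a, b) pairs and the base signal."""
    out = []
    for start, a, b, _ in coded:
        x = np.asarray(base[start:start + window], dtype=float)
        out.extend(a * x + b)
    return np.array(out)

only two coefficients and an error bound are stored per window so the achievable reduction grows with the window length as long as the stream stays well correlated with the base signal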
when network resources are shared between instantaneous request ir and book ahead ba connections activation of future ba connections may cause preemption of on going ir connections due to resource scarcity rerouting of preempted calls via alternative feasible paths is often considered as the final option to restore and maintain service continuity existing rerouting techniques however do not ensure acceptably low service disruption time and suffer from high failure rate and low network utilization in this work new rerouting strategy is proposed that estimates the future resource scarcity identifies the probable candidate connections for preemption and initiates the rerouting process in advance for those connections simulations on widely used network topology suggest that the proposed rerouting scheme achieves higher successful rerouting rate with lower service disruption time while not compromising other network performance metrics like utilization and call blocking rate
testing object oriented programs is still hard task despite many studies on criteria to better cover the test space test criteria establish requirements one wants to achieve in testing programs to help in finding software defects on the other hand program verification guarantees that program preserves its specification but it is not very straightforwardly applicable in many cases both program testing and verification are expensive tasks and could be used to complement each other this paper presents study on using formal verification to reduce the space of program testing as properties are checked using program model checkers programs are traced information from these traces can be used to realize how much testing criteria have been satisfied reducing the further program test space the present work is study on how much the test space of concurrent java programs can be reduced if deadlock freedom is checked prior to testing
lightweight bytecode verification uses stack maps to annotate java bytecode programs with type information in order to reduce the verification to type checking this paper describes an improved bytecode analyser together with algorithms for optimizing the stack maps generated the analyser is simplified in its treatment of base values keeping only the necessary information to ensure memory safety and enriched in its representation of interface types using the dedekind macneille completion technique the computed interface information allows to remove the dynamic checks at interface method invocations we prove the memory safety property guaranteed by the bytecode verifier using an operational semantics whose distinguishing feature is the use of untagged bit values for bytecode typable without sets of types we show how to prune the fix point to obtain stack map that can be checked without computing with sets of interfaces ie lightweight verification is not made more complex or costly experiments on three substantial test suites show that stack maps can be computed and correctly pruned by an optimized but incomplete pruning algorithm
adequate coverage is very important for sensor networks to fulfill the issued sensing tasks in many working environments it is necessary to make use of mobile sensors which can move to the correct places to provide the required coverage in this paper we study the problem of placing mobile sensors to get high coverage based on voronoi diagrams we design two sets of distributed protocols for controlling the movement of sensors one favoring communication and one favoring movement in each set of protocols we use voronoi diagrams to detect coverage holes and use one of three algorithms to calculate the target locations of sensors if holes exist simulation results show the effectiveness of our protocols and give insight on choosing protocols and calculation algorithms under different application requirements and working conditions
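a rough sketch of one farthest voronoi vertex heuristic in the spirit of the protocols above computed centrally with scipy for brevity whereas the protocols in the abstract are distributed each sensor moves at most max_move toward the farthest vertex of its cell when that vertex lies outside its sensing range

import numpy as np
from scipy.spatial import Voronoi

def movement_targets(positions, sensing_range, max_move):
    """One round of the farthest-vertex heuristic: return a proposed new position per sensor."""
    positions = np.asarray(positions, dtype=float)
    vor = Voronoi(positions)
    targets = positions.copy()
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if not region or -1 in region:          # skip unbounded cells on the boundary
            continue
        verts = vor.vertices[region]
        d = np.linalg.norm(verts - positions[i], axis=1)
        far = verts[np.argmax(d)]
        if d.max() > sensing_range:             # a coverage hole exists inside the cell
            step = far - positions[i]
            step *= min(1.0, max_move / np.linalg.norm(step))
            targets[i] = positions[i] + step    # move (at most max_move) toward the hole
    return targets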
in order to run untrusted code in the same process as trusted code there must be mechanism to allow dangerous calls to determine if their caller is authorized to exercise the privilege of using the dangerous routine java systems have adopted technique called stack inspection to address this concern but its original definition in terms of searching stack frames had an unclear relationship to the actual achievement of security overconstrained the implementation of java system limited many desirable optimizations such as method inlining and tail recursion and generally interfered with interprocedural optimization we present new semantics for stack inspection based on belief logic and its implementation using the calculus of security passing style which addresses the concerns of traditional stack inspection with security passing style we can efficiently represent the security context for any method activation and we can build new implementation strictly by rewriting the java bytecodes before they are loaded by the system no changes to the jvm or bytecode semantics are necessary with combination of static analysis and runtime optimizations our prototype implementation shows reasonable performance although traditional stack inspection is still faster and is easier to consider for languages beyond java we call our system safkasi the security architecture formerly known as stack inspection
many web sites especially those that dynamically generate html pages to display the results of user’s query present information in the form of list or tables current tools that allow applications to programmatically extract this information rely heavily on user input often in the form of labeled extracted records the sheer size and rate of growth of the web make any solution that relies primarily on user input infeasible in the long term fortunately many web sites contain much explicit and implicit structure both in layout and content that we can exploit for the purpose of information extraction this paper describes an approach to automatic extraction and segmentation of records from web tables automatic methods do not require any user input but rely solely on the layout and content of the web source our approach relies on the common structure of many web sites which present information as list or table with link in each entry leading to detail page containing additional information about that item we describe two algorithms that use redundancies in the content of table and detail pages to aid in information extraction the first algorithm encodes additional information provided by detail pages as constraints and finds the segmentation by solving constraint satisfaction problem the second algorithm uses probabilistic inference to find the record segmentation we show how each approach can exploit the web site structure in general domain independent manner and we demonstrate the effectiveness of each algorithm on set of twelve web sites
maude is high performance reflective language and system supporting both equational and rewriting logic specification and programming for wide range of applications and has relatively large worldwide user and open source developer base this paper introduces novel features of maude including support for unification and narrowing unification is supported in core maude the core rewriting engine of maude with commands and metalevel functions for order sorted unification modulo some frequently occurring equational axioms narrowing is currently supported in its full maude extension we also give brief summary of the most important features of maude that were not part of maude and earlier releases these features include communication with external objects new implementation of its module algebra and new predefined libraries we also review some new maude applications
passive wand tracked in using computer vision techniques is explored as new input mechanism for interacting with large displays we demonstrate variety of interaction techniques that exploit the affordances of the wand resulting in an effective interface for large scale interaction the lack of any buttons or other electronics on the wand presents challenge that we address by developing set of postures and gestures to track state and enable command input we also describe the use of multiple wands and posit designs for more complex wands in the future
future wireless internet will consist of different wireless technologies that should operate together in an efficient way to provide seamless connectivity to mobile users the integration of different networks and technologies is challenging problem mainly because of the heterogeneity in access technologies network architectures protocols and service demands firstly this paper discusses three architectures for an all ip network integrating different wireless technologies using ip and its associated service models the first architecture called isb is based on combination of differentiated services diffserv and integrated services intserv models appropriate for low bandwidth cellular networks with significant resource management capabilities the second architecture called dsb is purely based on the diffserv model targeted for high bandwidth wireless lans with little resource management capabilities the last architecture called aip combines isb and dsb architectures to facilitate the integration of wireless lan and cellular networks towards uniform architecture for all ip wireless networks secondly this paper proposes flexible hierarchical resource management mechanism for the proposed all ip architecture aip which aims at providing connection level quality of service qos for mobile users simulation results show that the proposed mechanism satisfies the hard constraint on connection dropping probability while maintaining high bandwidth utilisation
the concurrency of transactions executing on atomic data types can be enhanced through the use of semantic information about operations defined on these types hitherto commutativity of operations has been exploited to provide enhanced concurrency while avoiding cascading aborts we have identified property known as recoverability which can be used to decrease the delay involved in processing noncommuting operations while still avoiding cascading aborts when an invoked operation is recoverable with respect to an uncommitted operation the invoked operation can be executed by forcing commit dependency between the invoked operation and the uncommitted operation the transaction invoking the operation will not have to wait for the uncommitted operation to abort or commit further this commit dependency only affects the order in which the operations should commit if both commit if either operation aborts the other can still commit thus avoiding cascading aborts to ensure the serializability of transactions we force the recoverability relationship between transactions to be acyclic simulation studies based on the model presented by agrawal et al indicate that using recoverability the turnaround time of transactions can be reduced further our studies show enhancement in concurrency even when resource constraints are taken into consideration the magnitude of enhancement is dependent on the resource contention the lower the resource contention the higher the improvement
scientific peer review open source software development wikis and other domains use distributed review to improve quality of created content by providing feedback to the work’s creator distributed review is used to assess or improve the quality of work eg an article however it can also provide learning benefits to the participants in the review process we developed an online review system for beginning computer programming students it gathers multiple anonymous peer reviews to give students feedback on their programming work we deployed the system in an introductory programming class and evaluated it in controlled study we find that peer reviews are accurate compared to an accepted evaluation standard that students prefer reviews from other students with less experience than themselves and that participating in peer review process results in better learning outcomes
contextualised open hypermedia can be used to provide added value to document collections or artefacts however transferring the underlying hyper structures into users conceptual model is often problem augmented reality provides mechanism for presenting these structures in visual and tangible manner translating the abstract action of combining contextual linkbases into physical gestures of real familiarity to users of the system this paper examines the use of augmented reality in hypermedia and explores some possible modes of interaction that embody the functionality of open hypermedia and contextual linking using commonplace and easily understandable real world metaphors
versioned and bug tracked software systems provide huge amount of historical data regarding source code changes and issues management in this paper we deal with impact analysis of change request and show that data stored in software repositories are good descriptor on how past change requests have been resolved fine grained analysis method of software repositories is used to index code at different levels of granularity such as lines of code and source files with free text contained in software repositories the method exploits information retrieval algorithms to link the change request description and code entities impacted by similar past change requests we evaluate such approach on set of three open source projects
although many augmented tabletop systems have shown the potential and usability of finger based interactions and paper based interfaces they have mainly dealt with each of them separately in this paper we introduce novel method aimed to improve human natural interactions on augmented tabletop systems which enables multiple users to use both fingertips and physical papers as mediums for interaction this method uses computer vision techniques to detect multi fingertips both over and touching the surface in real time regardless of their orientations fingertip and touch positions would then be used in combination with paper tracking to provide richer set of interaction gestures that the users can perform in collaborative scenarios
we present novel technique that speeds up state space exploration sse for evolving programs with dynamically allocated data sse is the essence of explicit state model checking and an increasingly popular method for automating test generation traditional non incremental sse takes one version of program and systematically explores the states reachable during the program’s executions to find property violations incremental sse considers several versions that arise during program evolution reusing the results of sse for one version can speed up sse for the next version since state spaces of consecutive program versions can have significant similarities we have implemented our technique in two model checkers java pathfinder and the sim state space explorer the experimental results on program evolutions and exploration changes show that for non initial runs our technique speeds up sse in cases from to with median of and slows down sse in only two cases for and
the phenomenon of churn has significant effect on the performance of peer to peer pp networks especially in mobile environments that are characterized by intermittent connections and unguaranteed network bandwidths number of proposals have been put forward to deal with this problem however we have so far not seen any thorough analysis to guide the optimal design choices and parameter configurations for structured pp networks in this article we present performance evaluation of structured communication oriented pp system in the presence of churn the evaluation is conducted using both simulation models and real life prototype implementation in both evaluation environments we utilize kademlia with some modifications as the underlying distributed hash table dht algorithm and peer to peer protocol ppp as the signaling protocol the results from the simulation models created using nethawk east telecommunication simulator software suggest that in most situations lookup parallelism degree of and resource replication degree of are enough for guaranteeing high resource lookup success ratio we also notice that with the parallel lookup mechanism good success ratio is achieved even without the keepalive traffic that is used for detecting the aliveness of nodes prototype system that works in mobile environment is implemented to evaluate the feasibility of mobile nodes acting as full fledged peers the measurements made using the prototype show that from the viewpoints of cpu load and network traffic load it is feasible for the mobile nodes to take part in the overlay through energy consumption measurements we draw the conclusion that in general the umts access mode consumes slightly more power than the wlan access mode protocol packets with sizes of bytes or less are observed to be the most energy efficient in the umts access mode
advances in wireless and mobile technology flood us with amounts of moving object data that preclude all means of manual data processing the volume of data gathered from position sensors of mobile phones pdas or vehicles defies human ability to analyze the stream of input data on the other hand vast amounts of gathered data hide interesting and valuable knowledge patterns describing the behavior of moving objects thus new algorithms for mining moving object data are required to unearth this knowledge an important function of the mobile objects management system is the prediction of the unknown location of an object in this paper we introduce data mining approach to the problem of predicting the location of moving object we mine the database of moving object locations to discover frequent trajectories and movement rules then we match the trajectory of moving object with the database of movement rules to build probabilistic model of object location experimental evaluation of the proposal reveals prediction accuracy close to our original contribution includes the elaboration on the location prediction model the design of an efficient mining algorithm introduction of movement rule matching strategies and thorough experimental evaluation of the proposed model
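a minimal sketch of the matching step assuming movement rules take the form trajectory tail implies next region with support and confidence attached the frequent trajectory mining that produces the rules is omitted and all names are illustrative

def predict_location(recent_trajectory, rules):
    """
    recent_trajectory: list of visited regions, most recent last, e.g. ['A', 'D', 'F']
    rules: list of (antecedent_tuple, predicted_region, support, confidence)
    Returns candidate regions ranked by their best matching rule.
    """
    candidates = {}
    for antecedent, prediction, support, confidence in rules:
        k = len(antecedent)
        if k <= len(recent_trajectory) and tuple(recent_trajectory[-k:]) == tuple(antecedent):
            # prefer longer matches, then higher confidence, then higher support
            score = (k, confidence, support)
            if prediction not in candidates or score > candidates[prediction]:
                candidates[prediction] = score
    return sorted(candidates, key=candidates.get, reverse=True)

# example usage with made-up rules
rules = [(('A', 'D'), 'F', 120, 0.8), (('D',), 'E', 300, 0.6), (('D', 'F'), 'G', 90, 0.7)]
print(predict_location(['A', 'D', 'F'], rules))   # -> ['G'] (only ('D','F') matches the tail)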
the need to fit together reusable components and system designs in spite of differences in protocol and representation choices occurs often in object oriented software construction it is therefore necessary to use adapters to achieve an exact fit between the available socket for reusable part and the actual part in this paper we discuss an approach to the construction of tools that largely automate the synthesis of adapter code such tools are important in reducing the effort involved in reuse since adapter synthesis can be challenging and error prone in the complex type environment of an object oriented language our approach is applicable to statically typed languages like and eiffel and is based on formal notion of adaptability which is related to but distinct from both subtyping and inheritance
the presentation of search results on the web has been dominated by the textual form of document representation on the other hand the document’s visual aspects such as the layout colour scheme or presence of images have been studied in limited context with regard to their effectiveness of search result presentation this article presents comparative evaluation of textual and visual forms of document representation as additional components of document surrogates total of people were recruited for our task based user study the experimental results suggest that an increased level of document representation available in the search results can facilitate users interaction with search interface the results also suggest that the two forms of additional representations are likely beneficial to users information searching process in different contexts
most programs are repetitive where similar behavior can be seen at different execution times algorithms have been proposed that automatically group similar portions of program’s execution into phases where samples of execution in the same phase have homogeneous behavior and similar resource requirements in this paper we present an automated profiling approach to identify code locations whose executions correlate with phase changes these software phase markers can be used to easily detect phase changes across different inputs to program without hardware support our approach builds combined hierarchical procedure call and loop graph to represent program’s execution where each edge also tracks the max average and standard deviation in hierarchical execution variability on paths from that edge we search this annotated call loop graph for instructions in the binary that accurately identify the start of unique stable behaviors across different inputs we show that our phase markers can be used to accurately partition execution into units of repeating homogeneous behavior by counting execution cycles and data cache hits we also compare the use of our software markers to prior work on guiding data cache reconfiguration using data reuse markers finally we show that the phase markers can be used to partition the program’s execution at code transitions to accurately pick simulation points for simpoint when simulation points are defined in terms of phase markers they can potentially be re used across inputs compiler optimizations and different instruction set architectures for the same source code
this paper identifies generic axiom framework for prioritised fuzzy constraint satisfaction problems pfcsps and proposes methods to instantiate it ie to construct specific schemes which obey the generic axiom framework in particular we give five methods to construct the priority operators that are used for calculating the local satisfaction degree of prioritised fuzzy constraint and identify priority norm operators that can be used for calculating the global satisfaction degree of prioritised fuzzy constraint problem moreover number of numerical examples and real examples are used to validate our system and thus we further obtain some insights into our system in addition we explore the relationship between weight schemes and prioritised fcsp schemes and reveal that the weighted fcsp schemes are the dual of prioritised fcsp schemes which can correspondingly be called posterioritised fcsp schemes
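one standard way to instantiate such a framework shown purely as an illustrative assumption in the style of prioritised constraints from the fuzzy csp literature rather than the paper's own operators is to let a priority p_i in the unit interval bound how much constraint c_i can penalise an assignment v

\alpha_i(v) \;=\; \max\bigl(1 - p_i,\; \mu_{C_i}(v)\bigr), \qquad \alpha(v) \;=\; \min_i \alpha_i(v)

under this reading a priority of one forces ordinary fuzzy satisfaction while a priority of zero never lowers the global degree the priority operators and priority norm operators of the abstract play the roles taken here by max and min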
as the dynamic voltage scaling dvs technique provides system engineers the flexibility to trade off the performance and the energy consumption dvs has been adopted in many computing systems however the longer job executes the more energy in the leakage current the device processor consumes for the job to reduce the energy consumption resulting from the leakage current system might enter the dormant mode this paper targets energy efficient rate monotonic scheduling for periodic real time tasks on uniprocessor dvs system with non negligible leakage power consumption an on line simulated scheduling strategy and virtually blocking time strategy are developed for procrastination scheduling to reduce energy consumption the proposed algorithms derive feasible schedule for real time tasks with worst case guarantees for any input instance experimental results show that our proposed algorithms could derive energy efficient solutions
the programming language is at least as well known for its absence of spatial memory safety guarantees ie lack of bounds checking as it is for its high performance c’s unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions silent data corruption and security vulnerabilities many prior proposals have tackled enforcing spatial safety in programs by checking pointer and array accesses however existing software only proposals have significant drawbacks that may prevent wide adoption including unacceptably high run time overheads lack of completeness incompatible pointer representations or need for non trivial changes to existing source code and compiler infrastructure inspired by the promise of these software only approaches this paper proposes hardware bounded pointer architectural primitive that supports cooperative hardware software enforcement of spatial memory safety for programs this bounded pointer is new hardware primitive datatype for pointers that leaves the standard pointer representation intact but augments it with bounds information maintained separately and invisibly by the hardware the bounds are initialized by the software and they are then propagated and enforced transparently by the hardware which automatically checks pointer’s bounds before it is dereferenced one mode of use requires instrumenting only malloc which enables enforcement of per allocation spatial safety for heap allocated objects for existing binaries when combined with simple intraprocedural compiler instrumentation hardware bounded pointers enable low overhead approach for enforcing complete spatial memory safety in unmodified programs
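a small python model of the checking semantics that the proposal implements in hardware included only to make the bounds flow concrete a shadow table maps pointer values to base and size the instrumented malloc initialises an entry pointer arithmetic propagates it and every dereference is checked against it the class and addresses below are illustrative not the paper's design

class BoundsTable:
    """Software model of the per-pointer bounds metadata the paper keeps in hardware."""
    def __init__(self):
        self.bounds = {}            # pointer value -> (base, size)

    def on_malloc(self, base, size):
        # the instrumented malloc initialises the bounds for the returned pointer
        self.bounds[base] = (base, size)
        return base

    def derive(self, old_ptr, new_ptr):
        # pointer arithmetic propagates the original allocation's bounds
        if old_ptr in self.bounds:
            self.bounds[new_ptr] = self.bounds[old_ptr]
        return new_ptr

    def check_deref(self, ptr, access_size=1):
        # the "hardware" check performed before every dereference
        base, size = self.bounds.get(ptr, (None, None))
        if base is None or not (base <= ptr and ptr + access_size <= base + size):
            raise MemoryError(f"out-of-bounds access at {ptr:#x}")
        return ptr

# example: an 8-byte allocation at address 0x1000
bt = BoundsTable()
p = bt.on_malloc(0x1000, 8)
bt.check_deref(bt.derive(p, p + 4))          # in bounds
try:
    bt.check_deref(bt.derive(p, p + 8))      # one past the end -> violation
except MemoryError as e:
    print("caught:", e)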
this paper presents an online support vector machine svm that uses the stochastic meta descent smd algorithm to adapt its step size automatically we formulate the online learning problem as stochastic gradient descent in reproducing kernel hilbert space rkhs and translate smd to the nonparametric setting where its gradient trace parameter is no longer coefficient vector but an element of the rkhs we derive efficient updates that allow us to perform the step size adaptation in linear time we apply the online svm framework to variety of loss functions and in particular show how to handle structured output spaces and achieve efficient online multiclass classification experiments show that our algorithm outperforms more primitive methods for setting the gradient step size
distributed hash table dht systems are an important class of peer to peer routing infrastructures they enable scalable wide area storage and retrieval of information and will support the rapid development of wide variety of internet scale applications ranging from naming systems and file systems to application layer multicast dht systems essentially build an overlay network but path on the overlay between any two nodes can be significantly different from the unicast path between those two nodes on the underlying network as such the lookup latency in these systems can be quite high and can adversely impact the performance of applications built on top of such systems in this paper we discuss random sampling technique that incrementally improves lookup latency in dht systems our sampling can be implemented using information gleaned from lookups traversing the overlay network for this reason we call our approach lookup parasitic random sampling lprs lprs converges quickly and requires relatively few modifications to existing dht systems for idealized versions of dht systems like chord tapestry and pastry we analytically prove that lprs can result in lookup latencies proportional to the average unicast latency of the network provided the underlying physical topology has power law latency expansion we then validate this analysis by implementing lprs in the chord simulator our simulations reveal that lprs chord exhibits qualitatively better latency scaling behavior relative to unmodified chord the overhead of lprs is one sample per lookup hop in the worst case finally we provide evidence which suggests that the internet router level topology resembles power law latency expansion this finding implies that lprs has significant practical applicability as general latency reduction technique for many dht systems this finding is also of independent interest since it might inform the design of latency sensitive topology models for the internet
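a rough sketch of the sampling step where the nodes to sample are gleaned from a lookup that traversed the overlay assuming each routing table slot may hold any node from an eligible set as chord fingers allow bucket_of and ping are assumed helpers not part of the paper

import random

def lprs_update(routing_table, lookup_path_nodes, bucket_of, ping, sample_size=1):
    """
    routing_table: dict bucket_id -> (node, latency)
    lookup_path_nodes: nodes observed on one lookup that traversed the overlay
    bucket_of: maps a node to the routing-table slot it is eligible for (or None)
    ping: measures round-trip latency to a node
    """
    for node in random.sample(lookup_path_nodes, min(sample_size, len(lookup_path_nodes))):
        bucket = bucket_of(node)
        if bucket is None:
            continue
        latency = ping(node)
        current = routing_table.get(bucket)
        if current is None or latency < current[1]:
            routing_table[bucket] = (node, latency)   # keep the lower-latency candidate
    return routing_table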
state of the art cluster based data centers consisting of three tiers web server application server and database server are being used to host complex web services such as commerce applications the application server handles dynamic and sensitive web contents that need protection from eavesdropping tampering and forgery although the secure sockets layer ssl is the most popular protocol to provide secure channel between client and cluster based network server its high overhead degrades the server performance considerably and thus affects the server scalability therefore improving the performance of ssl enabled network servers is critical for designing scalable and high performance data centers in this paper we examine the impact of ssl offering and ssl session aware distribution in cluster based network servers we propose back end forwarding scheme called sslwithbf that employs low overhead user level communication mechanism like virtual interface architecture via to achieve good load balance among server nodes we compare three distribution models for network servers round robin rr sslwithsession and sslwithbf through simulation the experimental results with node and node cluster configurations show that although the session reuse of sslwithsession is critical to improve the performance of application servers the proposed back end forwarding scheme can further enhance the performance due to better load balancing the sslwithbf scheme can minimize the average latency by about percent and improve throughput across variety of workloads
the pervasiveness and operational autonomy of mesh based wireless sensor networks wsns make them an ideal candidate in offering sustained monitoring functions at reasonable cost over wide area there has been general consensus within the research community that it is of critical importance to jointly optimize protocol sublayers in order to devise energy efficient cost effective and reliable communication strategies for wsns this paper proposes cross layer organizational approach based on sleep scheduling called sense sleep trees ss trees that aims to harmonize the various engineering issues and provides method to increase the monitoring coverage and the operational lifetime of mesh based wsns engaged in wide area surveillance applications an integer linear programming ilp formulation based on network flow model is provided to determine the optimal ss tree structures for achieving such design goals
computing semantic relatedness of natural language texts requires access to vast amounts of common sense and domain specific world knowledge we propose explicit semantic analysis esa novel method that represents the meaning of texts in high dimensional space of concepts derived from wikipedia we use machine learning techniques to explicitly represent the meaning of any text as weighted vector of wikipedia based concepts assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics eg cosine compared with the previous state of the art using esa results in substantial improvements in correlation of computed relatedness scores with human judgments from to for individual words and from to for texts importantly due to the use of natural concepts the esa model is easy to explain to human users
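a small sketch of the relatedness computation once texts are already mapped to weighted wikipedia concept vectors building those vectors from a wikipedia dump which is the heart of esa is not shown and the toy index weights below are made up

import math

def cosine(u, v):
    """Cosine similarity between two sparse concept->weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def text_to_concepts(text, inverted_index):
    """Sum the per-word concept vectors produced offline from wikipedia articles."""
    vec = {}
    for word in text.lower().split():
        for concept, weight in inverted_index.get(word, {}).items():
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec

# toy inverted index: word -> {wikipedia concept: tf-idf-like weight}
index = {"bank": {"Bank (finance)": 2.1, "River bank": 1.4},
         "loan": {"Bank (finance)": 1.8, "Debt": 2.0},
         "river": {"River bank": 2.2, "River": 2.5}}
a = text_to_concepts("bank loan", index)
b = text_to_concepts("river bank", index)
print(round(cosine(a, b), 3))   # prints a relatedness score in [0, 1] for the two short texts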
in this paper we introduce generic framework for semi supervised kernel learning given pair wise dis similarity constraints we learn kernel matrix over the data that respects the provided side information as well as the local geometry of the data our framework is based on metric learning methods where we jointly model the metric kernel over the data along with the underlying manifold furthermore we show that for some important parameterized forms of the underlying manifold model we can estimate the model parameters and the kernel matrix efficiently our resulting algorithm is able to incorporate local geometry into the metric learning task at the same time it can handle wide class of constraints finally our algorithm is fast and scalable unlike most of the existing methods it is able to exploit the low dimensional manifold structure and does not require semi definite programming we demonstrate wide applicability and effectiveness of our framework by applying to various machine learning tasks such as semi supervised classification colored dimensionality reduction manifold alignment etc on each of the tasks our method performs competitively or better than the respective state of the art method
wide range of applications require that large quantities of data be maintained in sort order on disk the tree and its variants are an efficient general purpose disk based data structure that is almost universally used for this task the trie has the potential to be competitive alternative for the storage of data where strings are used as keys but has not previously been thoroughly described or tested we propose new algorithms for the insertion deletion and equality search of variable length strings in disk resident trie as well as novel splitting strategies which are critical element of practical implementation we experimentally compare the trie against variants of tree on several large sets of strings with range of characteristics our results demonstrate that although the trie uses more memory it is faster more scalable and requires less disk space
an increasing number of high tech devices such as driver monitoring systems and internet usage monitoring tools are advertised as useful or even necessary for good parenting of teens simultaneously there is growing market for mobile personal safety devices as these trends merge there will be significant implications for parent teen relationships affecting domains such as privacy trust and maturation not only the teen and his or her parents are affected other important stakeholders include the teen’s friends who may be unwittingly monitored this problem space with less clear cut assets risks and affected parties thus lies well outside of more typical computer security applications to help understand this problem domain and what if anything should be built we turn to the theory and methods of value sensitive design systematic approach to designing for human values in technology we first develop value scenarios that highlight potential issues benefits harms and challenges we then conducted semi structured interviews with participants teens and their parents results show significant differences with respect to information about internal state eg mood versus external environment eg location state situation eg emergency vs non emergency and awareness eg notification vs non notification the value scenario and interview results positioned us to identify key technical challenges such as strongly protecting the privacy of teen’s contextual information during ordinary situations but immediately exposing that information to others as appropriate in an emergency and corresponding architectural levers for these technologies in addition to laying foundation for future work in this area this research serves as prototypical example of using value sensitive design to explicate the underlying human values in complex security domains
the two problems of information integration and interoperability among multiple sources and management of uncertain data have received significant attention in recent years in this paper we study information integration and interoperability among xml sources with uncertain data we extend query processing algorithms in the semantic model approach for information integration and interoperability to the case where sources may contain uncertain xml information and present probability calculation algorithms for answers to user query in this approach
in this paper we present novel approach for mining opinions from product reviews which converts the opinion mining task into identifying product features expressions of opinions and relations between them by taking advantage of the observation that lot of product features are phrases concept of phrase dependency parsing is introduced which extends traditional dependency parsing to phrase level this concept is then implemented for extracting relations between product features and expressions of opinions experimental evaluations show that the mining task can benefit from phrase dependency parsing
energy efficiency is rapidly becoming first class optimization parameter for modern systems caches are critical to the overall performance and thus modern processors both high and low end tend to deploy cache with large size and high degree of associativity due to the large size cache power takes up significant percentage of total system power one important way to reduce cache power consumption is to reduce the dynamic activities in the cache by reducing the dynamic load store counts in this work we focus on programs that are only available as binaries which need to be improved for energy efficiency for adapting these programs for energy constrained devices we propose feedback directed post pass solution that tries to do register re allocation to reduce dynamic load store counts and to improve energy efficiency our approach is based on zero knowledge of original code generator or compiler and performs post pass register allocation to get more power efficient binary we attempt to find out the dead as well as unused registers in the binary and then re allocate them on hot paths to reduce dynamic load store counts it is shown that the static code size increase due to our framework is very minimal our experiments on spec and mediabench show that our technique is effective we have seen dynamic spill loads stores reduction in the data cache ranging from to overall our approach improves the energy delay product of the program
several mechanisms such as canonical structures type classes or pullbacks have been recently introduced with the aim to improve the power and flexibility of the type inference algorithm for interactive theorem provers we claim that all these mechanisms are particular instances of simpler and more general technique just consisting in providing suitable hints to the unification procedure underlying type inference this allows simple modular and not intrusive implementation of all the above mentioned techniques opening at the same time innovative and unexpected perspectives on its possible applications
in dynamic environments like the web data sources may change not only their data but also their schemas their semantics and their query capabilities when mapping is left inconsistent by schema change it has to be detected and updated we present novel framework and tool tomas for automatically adapting rewriting mappings as schemas evolve our approach considers not only local changes to schema but also changes that may affect and transform many components of schema our algorithm detects mappings affected by structural or constraint changes and generates all the rewritings that are consistent with the semantics of the changed schemas our approach explicitly models mapping choices made by user and maintains these choices whenever possible as the schemas and mappings evolve when there is more than one candidate rewriting the algorithm may rank them based on how close they are to the semantics of the existing mappings
traditional approaches to recommender systems have not taken into account situational information when making recommendations and this seriously limits the relevance of the results this paper advocates context awareness as promising approach to enhance the performance of recommenders and introduces mechanism to realize this approach we present framework that separates the contextual concerns from the actual recommendation module so that contexts can be readily shared across applications more importantly we devise learning algorithm to dynamically identify the optimal set of contexts for specific recommendation task and user an extensive series of experiments has validated that our system is indeed able to learn both quickly and accurately
radio frequency identification rfid tags containing privacy sensitive information are increasingly embedded into personal documents eg passports and driver’s licenses the problem is that people are often unaware of the security and privacy risks associated with rfid likely because the technology remains largely invisible and uncontrollable for the individual to mitigate this problem we developed collection of novel yet simple and inexpensive alternative tag designs to make rfid visible and controllable this video and demonstration illustrates these designs for awareness our tags provide visual audible or tactile feedback when in the range of an rfid reader for control people can allow or disallow access to the information on the tag by how they touch orient move press or illuminate the tag for example figure shows tilt sensitive rfid tag
we address the challenges of bursty convergecast in multi hop wireless sensor networks where large burst of packets from different locations needs to be transported reliably and in real time to base station via experiments on mica mote sensor network using realistic traffic trace we determine the primary issues in bursty convergecast and accordingly design protocol rbc for reliable bursty convergecast to address these issues to improve channel utilization and to reduce ack loss we design window less block acknowledgment scheme that guarantees continuous packet forwarding and replicates the acknowledgment for packet to alleviate retransmission incurred channel contention we introduce differentiated contention control moreover we design mechanisms to handle varying ack delay and to reduce delay in timer based re transmissions we evaluate rbc again via experiments and show that compared to commonly used implicit ack scheme rbc doubles packet delivery ratio and reduces end to end delay by an order of magnitude as result of which rbc achieves close to optimal goodput
this paper describes our experiments on automatic parameter optimization for the japanese monolingual retrieval task unlike regression approaches we optimized parameters completely independently of retrieval models enabling the optimized parameter set to illustrate the characteristics of the target test collections we adopted genetic algorithms as optimization tools and cross validated with four test collections namely the clir collections for ntcir to ntcir the most difficult retrieval parameters to optimize are the feedback parameters because there are no principles for calibrating them our approach optimized feedback parameters and basic scoring parameters at the same time using test sets and validation sets we achieved effectiveness levels comparable with very strong baselines ie the best performing ntcir official runs
the use of frequent itemsets has been limited by the high computational cost as well as the large number of resulting itemsets in many real world scenarios however it is often sufficient to mine small representative subset of frequent itemsets with low computational cost to that end in this paper we define new problem of finding the frequent itemsets with maximum length and present novel algorithm to solve this problem indeed maximum length frequent itemsets can be efficiently identified in very large data sets and are useful in many application domains our algorithm generates the maximum length frequent itemsets by adapting pattern fragment growth methodology based on the fp tree structure also number of optimization techniques have been exploited to prune the search space finally extensive experiments on real world data sets validate the proposed algorithm
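As a point of reference for the problem definition, the sketch below finds a maximum-length frequent itemset by a depth-first search that prunes infrequent extensions using the anti-monotone property of support. The algorithm summarised above instead adapts pattern-fragment growth over an FP-tree with further optimizations, so this brute-force version only illustrates the goal; the toy transactions and threshold are assumptions.

# Depth-first search for a maximum-length frequent itemset with
# anti-monotone support pruning; illustrative only, not the FP-tree based
# pattern-fragment-growth algorithm described above.

def support(transactions, itemset):
    s = set(itemset)
    return sum(1 for t in transactions if s <= t)

def max_length_frequent(transactions, min_sup):
    items = sorted({i for t in transactions for i in t})
    best = []

    def grow(prefix, start):
        nonlocal best
        if len(prefix) > len(best):
            best = list(prefix)
        for idx in range(start, len(items)):
            cand = prefix + [items[idx]]
            if support(transactions, cand) >= min_sup:  # prune infrequent extensions
                grow(cand, idx + 1)

    grow([], 0)
    return best

if __name__ == "__main__":
    db = [set("abc"), set("abcd"), set("abd"), set("abce")]
    print(max_length_frequent(db, min_sup=3))   # ['a', 'b', 'c']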
fourteen concurrent object oriented languages are compared in terms of how they deal with communication synchronization process management inheritance and implementation trade offs the ways in which they divide responsibility between the programmer the compiler and the operating system are also investigated it is found that current object oriented languages that have concurrency features are often compromised in important areas including inheritance capability efficiency ease of use and degree of parallel activity frequently this is because the concurrency features were added after the language was designed the languages discussed are actors abd abd argus cool concurrent smalltalk eiffel emerald es kit hybrid nexus parmacs pool and presto
we prove several results related to local proofs interpolation and superposition calculus and discuss their use in predicate abstraction and invariant generation our proofs and results suggest that symbol eliminating inferences may be an interesting alternative to interpolation
for aspect oriented software development aosd to live up to being software engineering method there must be support for the separation of crosscutting concerns across the development lifecycle part of this support is traceability from one lifecycle phase to another this paper investigates the traceability between one particular aosd design level language theme uml and one particular aosd implementation level language aspectj this provides for means to assess these languages and their incompatibilities with view towards eventually developing standard design language for broad range of aosd approaches
this paper presents quantitative human performance model of making single stroke pen gestures within certain error constraints in terms of production time computed from the properties of curves line segments and corners clc in gesture stroke the model may serve as foundation for the design and evaluation of existing and future gesture based user interfaces at the basic motor control efficiency level similar to the role of previous laws of action played to pointing crossing or steering based user interfaces we report and discuss our experimental results on establishing and validating the clc model together with other basic empirical findings in stroke gesture production
the paper is concerned with applying learning to rank to document retrieval ranking svm is typical method of learning to rank we point out that there are two factors one must consider when applying ranking svm in general learning to rank method to document retrieval first correctly ranking documents on the top of the result list is crucial for an information retrieval system one must conduct training in way that such ranked results are accurate second the number of relevant documents can vary from query to query one must avoid training model biased toward queries with large number of relevant documents previously when existing methods that include ranking svm were applied to document retrieval none of the two factors was taken into consideration we show it is possible to make modifications in conventional ranking svm so it can be better used for document retrieval specifically we modify the hinge loss function in ranking svm to deal with the problems described above we employ two methods to conduct optimization on the loss function gradient descent and quadratic programming experimental results show that our method referred to as ranking svm for ir can outperform the conventional ranking svm and other existing methods for document retrieval on two datasets
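The sketch below illustrates the general shape of such a pairwise approach: a hinge loss over preference pairs within each query, with a per-pair weight intended to normalise across queries that have very different numbers of relevant documents, minimised by plain gradient descent with numpy. The specific weighting and solver are assumptions for illustration, not the exact modified loss or the gradient-descent and quadratic-programming optimisers mentioned above.

import numpy as np

# Pairwise hinge loss in the spirit of Ranking SVM for IR: each preference
# pair (relevant d_i over irrelevant d_j of the same query) contributes a
# weighted hinge term; the pair weight tau normalises across queries. The
# weighting scheme and plain gradient descent are illustrative assumptions.

def pairwise_hinge_train(X, y, qid, lr=0.01, lam=0.01, epochs=200):
    w = np.zeros(X.shape[1])
    pairs = []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        pos = [i for i in idx if y[i] > 0]
        neg = [i for i in idx if y[i] == 0]
        if not pos or not neg:
            continue
        tau = 1.0 / (len(pos) * len(neg))        # query-level normalisation
        pairs += [(i, j, tau) for i in pos for j in neg]
    for _ in range(epochs):
        grad = lam * w
        for i, j, tau in pairs:
            margin = w @ (X[i] - X[j])
            if margin < 1.0:                     # hinge is active
                grad -= tau * (X[i] - X[j])
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))
    y = (X[:, 0] > 0).astype(int)                # toy relevance signal
    qid = np.repeat(np.arange(3), 10)
    print(np.round(pairwise_hinge_train(X, y, qid), 2))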
we study the technology mapping problem for sequential circuits for look up table lut based field programmable gate arrays fpgas existing approaches to the problem simply remove the flip flops ffs then map the remaining combinational logic and finally put the ffs back these approaches ignore the sequential nature of circuit and assume the positions of the ffs are fixed however ffs in sequential circuit can be reposistioned by functionality preserving transformation called retiming as result existing approaches can only consider very small portion of the available solution space we propose in this paper novel approach to the technology mapping problem in our approach retiming is integrated into the technology mapping process so as to consider the full solution space we then present polynomial technology mapping algorithm that for given circuit produces mapping solution with the minimum clock period among all possible ways of retiming the effectiveness of the algorithm is also demonstrated experimentally
the world wide web was originally developed as shared writable hypertext medium facility that is still widely neededwe have recently developed web based management reporting system for legal firm in an attempt to improve the efficiency and management of their overall business process this paper shares our experiences in relating the firm’s specific writing and issue tracking tasks to existing web open hypermedia and semantic web research and describes why we chose to develop new solution set of open hypermedia components collectively called the management reporting system rather than employ an existing system
we present an improvement to the disk paxos protocol by gafni and lamport which utilizes extended functionality and flexibility provided by active disks and supports unmediated concurrent data access by an unlimited number of processes the solution facilitates coordination by an infinite number of clients using finite shared memory it is based on collection of read modify write objects with faults that emulate new reliable shared memory abstraction called ranked register the required read modify write objects are readily available in active disks and in object storage device controllers making our solution suitable for state of the art storage area network san environments
most of the mobile devices are equipped with nand flash memories even if it has characteristics of not in place update and asymmetric latencies among read write and erase operations write erase operation is much slower than read operation in flash memory for the overall performance of flash memory system the buffer replacement policy should consider the above severely asymmetric latencies existing buffer replacement algorithms such as lru lirs and arc cannot deal with the above problems this paper proposes an add on buffer replacement policy that enhances lirs by reordering writes of not cold dirty pages from the buffer cache to flash storage the enhanced lirs wsr algorithm focuses on reducing the number of write erase operations as well as preventing serious degradation of buffer hit ratio the trace driven simulation results show that among the existing buffer replacement algorithms including lru cf lru arc and lirs our lirs wsr is best in almost all cases for flash storage systems
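To illustrate why write-aware replacement matters, here is a toy cache that prefers to evict the least recently used clean page so that dirty pages, whose eviction would cost a flash write, stay resident longer. This is a simplified stand-in for the idea of asymmetric read/write costs, not the LIRS-WSR policy itself; the class and parameters are hypothetical.

from collections import OrderedDict

# Toy write-aware buffer cache: on eviction it prefers the least recently
# used clean page so that dirty pages (a flash write on eviction) stay cached
# longer. Simplified stand-in, not LIRS-WSR.

class WriteAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> dirty flag, LRU order
        self.flash_writes = 0

    def access(self, page_id, is_write):
        if page_id in self.pages:
            dirty = self.pages.pop(page_id) or is_write
        else:
            dirty = is_write
            if len(self.pages) >= self.capacity:
                self._evict()
        self.pages[page_id] = dirty  # most recently used at the end

    def _evict(self):
        # evict the LRU clean page if any, otherwise the LRU dirty page
        victim = next((p for p, d in self.pages.items() if not d),
                      next(iter(self.pages)))
        if self.pages.pop(victim):
            self.flash_writes += 1   # writing back a dirty page costs a write

if __name__ == "__main__":
    c = WriteAwareCache(capacity=3)
    for pid, w in [(1, True), (2, False), (3, False), (4, False), (1, False)]:
        c.access(pid, w)
    print(len(c.pages), c.flash_writes)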
this paper presents string matching hardware on fpga for network intrusion detection systems the proposed architecture consisting of packet classifiers and strings matching verifiers achieves superb throughput by using several mechanisms first based on incoming packet contents the packet classifiers can dramatically reduce the number of strings to be matched for each packet and accordingly feed the packet to proper verifier to conduct matching second novel multi threading finite state machine fsm is proposed which improves fsm clock frequency and allows multiple packets to be examined by single fsm simultaneously design techniques for high speed interconnect and interface circuits are also presented experimental results are presented to explore the trade offs between system performance strings partition granularity and hardware resource cost
robustness analysis research has shown that conventional memory based recommender systems are very susceptible to malicious profile injection attacks number of attack models have been proposed and studied and recent work has suggested that model based collaborative filtering cf algorithms have greater robustness against these attacks moreover to combat such attacks several attack detection algorithms have been proposed one that has shown high detection accuracy is based on using principal component analysis pca to cluster attack profiles on the basis that such profiles are highly correlated in this paper we argue that the robustness observed in model based algorithms is due to the fact that the proposed attacks have not targeted the specific vulnerabilities of these algorithms we discuss how an effective attack targeting model based algorithms that employ profile clustering can be designed it transpires that the attack profiles employed in this attack exhibit low rather than high pair wise similarities and can easily be obfuscated to avoid pca based detection while remaining effective
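A rough sketch of the detection idea discussed above: profiles injected by an attack are highly inter-correlated, and PCA-based detectors described in the literature flag the users with the smallest loadings on the leading principal components. The number of components, the number of users flagged, and the synthetic data below are illustrative assumptions, and, as argued above, obfuscated attacks may evade this rule.

import numpy as np

# PCA-based flagging of suspicious rating profiles: z-score the user-item
# matrix, take the leading principal components of the user space, and flag
# users with the smallest absolute loadings (the rule reported for
# PCA-based detectors in the literature). Parameters are assumptions.

def flag_suspect_users(ratings, n_components=3, n_flag=10):
    X = ratings.astype(float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)    # z-score per item
    U, S, Vt = np.linalg.svd(X, full_matrices=False)      # U: user loadings
    loadings = np.abs(U[:, :n_components]).sum(axis=1)
    return np.argsort(loadings)[:n_flag]                  # smallest loadings first

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    genuine = rng.integers(1, 6, size=(100, 50)).astype(float)
    attack = np.tile(rng.integers(1, 6, size=(1, 50)), (10, 1)).astype(float)
    ratings = np.vstack([genuine, attack + rng.normal(0, 0.1, attack.shape)])
    print(flag_suspect_users(ratings))   # indices of users flagged as suspicious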
this paper presents the avalla language domain specific modelling language for scenario based validation of asm models and its supporting tool the asmetav validator they have been developed according to the model driven development principles as part of the asmeta asm metamodelling toolset set of tools around asms as proof of concepts the paper reports the results of the scenario based validation for the well known lift control case study
scientists economists and planners in government industry and academia spend much of their time accessing integrating and analyzing data however many of their studies are one of kind with little sharing and reuse for subsequent endeavors the argos project seeks to improve the productivity of analysts by providing framework that encourages reuse of data sources and data processing operations and by developing tools to generate data processing workflows in this paper we present an approach to automatically generate data processing workflows first we define methodology for assigning formal semantics to data and operations according to domain ontology which allows sharing and reuse specifically we define data contents using relational descriptions in an expressive logic second we develop novel planner that uses relational subsumption to connect the output of data processing operation with the input of another our modeling methodology has the significant advantage that the planner can automatically insert adaptor operations wherever necessary to bridge the inputs and outputs of operations in the workflow we have implemented the approach in transportation modeling domain
web service caching ie caching the responses of xml web service requests is needed for designing scalable web service architectures such caching of dynamic content requires maintaining the caches appropriately to reflect dynamic updates to the back end data source in the database especially relational context extensive research has addressed the problem of incremental view maintenance however only few attempts have been made to address the cache maintenance problem for xml web service messages we propose middleware solution that bridges the gap between the cached web service responses and the backend dynamic data source we assume for generality that the back end source has general xml logical data model since the rdbms technology is widely used for storing and querying xml data we show how our solution can be implemented when the xml data source is implemented on top of an rdbms such implementation exploits the well known maturity of the rdbms technology the middleware solution described in this paper has the following features that distinguish it from the existing technology in this area it provides declarative description of web services based on rich and standards based view specification language xquery xpath no knowledge of the source xml schema is assumed instead the source can be any general well formed xml data the solution can be easily deployed on rdbms and the size of the auxiliary data needed for the cache maintenance does not depend on the source data size therefore the solution is highly scalable experimental evaluation is conducted to assess the performance benefits of the proposed approach
the secure shell ssh protocol is one of the most popular cryptographic protocols on the internet unfortunately the current ssh authenticated encryption mechanism is insecure in this paper we propose several fixes to the ssh protocol and using techniques from modern cryptography we prove that our modified versions of ssh meet strong new chosen ciphertext privacy and integrity requirements furthermore our proposed fixes will require relatively little modification to the ssh protocol and to ssh implementations we believe that our new notions of privacy and integrity for encryption schemes with stateful decryption algorithms will be of independent interest
abstract we describe an integrated framework for system on chip soc test automation our framework is based on new test access mechanism tam architecture consisting of flexible width test buses that can fork and merge between cores test wrapper and tam cooptimization for this architecture is performed by representing core tests using rectangles and by employing novel rectangle packing algorithm for test scheduling test scheduling is tightly integrated with tam optimization and it incorporates precedence and power constraints in the test schedule while allowing the soc integrator to designate group of tests as preemptable test preemption helps avoid hardware and power consumption conflicts thereby leading to more efficient test schedule finally we study the relationship between tam width and tester data volume to identify an effective tam width for the soc we present experimental results on our test automation framework for four benchmark socs
maximal association rule is one of the popular data mining techniques however no current research has been found that allows for the visualization of the captured maximal rules in this paper smarviz soft maximal association rules visualization an approach for visualizing soft maximal association rules is proposed the proposed approach contains four main steps including discovering visualizing maximal supported sets capturing and finally visualizing the maximal rules under soft set theory
large scale wireless sensor networks wsns are highly vulnerable to attacks because they consist of numerous resource constrained devices and communicate via wireless links these vulnerabilities are exacerbated when wsns have to operate unattended in hostile environment such as battlefields in such an environment an adversary poses physical threat to all the sensor nodes that is an adversary may capture any node compromising critical security data including keys used for confidentiality and authentication consequently it is necessary to provide security services to these networks to ensure their survival we propose novel self organizing key management scheme for large scale and long lived wsns called survivable and efficient clustered keying seck that provides administrative services that ensures the survivability of the network seck is suitable for managing keys in hierarchical wsn consisting of low end sensor nodes clustered around more capable gateway nodes using cluster based administrative keys seck provides five efficient security administration mechanisms clustering and key setup node addition key renewal recovery from multiple node captures and re clustering all of these mechanisms have been shown to localize the impact of attacks and considerably improve the efficiency of maintaining fresh session keys using simulation and analysis we show that seck is highly robust against node capture and key compromise while incurring low communication and storage overhead
autonomic computing systems reduce software maintenance costs and management complexity by taking on the responsibility for their configuration optimization healing and protection these tasks are accomplished by switching at runtime to different system behaviour the one that is more efficient more secure more stable etc while still fulfilling the main purpose of the system thus identifying the objectives of the system analyzing alternative ways of how these objectives can be met and designing system that supports all or some of these alternative behaviours is promising way to develop autonomic systems this paper proposes the use of requirements goal models as foundation for such software development process and demonstrates this on an example
blogging in the enterprise is increasingly popular and recent research has shown that there are numerous benefits for both individuals and the organization eg developing reputation or sharing knowledge however participation is very low blogs are often abandoned and few users realize those benefits we have designed and implemented novel system called blog muse whose goal is to inspire potential blog writers by connecting them with their audience through topic suggestion system we describe our system design and report results from week study with users who installed our tool our data indicate that topics requested by users are effective at inspiring bloggers to write and lead to more social interactions around the resulting entries
clustering is one of the important data mining tasks nested clusters or clusters of multi density are very prevalent in data sets in this paper we develop hierarchical clustering approach cluster tree to determine such cluster structure and understand hidden information present in data sets of nested clusters or clusters of multi density we embed the agglomerative k means algorithm in the generation of cluster tree to detect such clusters experimental results on both synthetic data sets and real data sets are presented to illustrate the effectiveness of the proposed method compared with some existing clustering algorithms dbscan k means birch cure nbc optics neural gas tree som endbscan and ldbscan our proposed cluster tree approach performs better than these methods
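As a simplified illustration of a cluster tree, the following recursion splits the data with k-means (k = 2) at each level and records the resulting hierarchy. The approach summarised above embeds an agglomerative k-means variant and explicitly targets nested and multi-density clusters, which this sketch does not reproduce; the stopping thresholds are assumptions.

import numpy as np

# Build a cluster tree by recursively splitting the data with 2-means.
# Illustrative only; the agglomerative variant and multi-density handling
# described above are not modelled, and the stopping rule is an assumption.

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_tree(X, min_size=5, depth=0, max_depth=4):
    node = {"size": len(X), "children": []}
    if len(X) < 2 * min_size or depth >= max_depth:
        return node
    labels = kmeans(X, 2)
    for j in (0, 1):
        part = X[labels == j]
        if min_size <= len(part) < len(X):
            node["children"].append(cluster_tree(part, min_size, depth + 1, max_depth))
    return node

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
    print(cluster_tree(X))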
while the web has grown significantly in recent years some portions of the web remain largely underdeveloped as shown in lack of high quality content and functionality an example is the arabic web in which lack of well structured web directories limits users ability to browse for arabic resources in this research we proposed an approach to building web directories for the underdeveloped web and developed proof of concept prototype called the arabic medical web directory ameddir that supports browsing of over arabic medical web sites and pages organized in hierarchical structure we conducted an experiment involving arab participants and found that the ameddir significantly outperformed two benchmark arabic web directories in terms of browsing effectiveness efficiency information quality and user satisfaction participants expressed strong preference for the ameddir and provided many positive comments this research thus contributes to developing useful web directory for organizing the information in the arabic medical domain and to better understanding of how to support browsing on the underdeveloped web
users of the web are increasingly interested in tracking the appearance of new postings rather than locating existing knowledge coupled with this is the emergence of the web movement where everyone effectively publishes and subscribes and the concept of the internet of things these trends bring into sharp focus the need for efficient distribution of information however to date there have been few examples of applying ontology based techniques to achieve this knowledge based networking kbn involves the forwarding of messages across network based not just on the contents of the messages but also on the semantics of the associated metadata in this paper we examine the scalability problems of such network that would meet the needs of internet scale semantic based event feeds this examination is conducted by evaluating an implemented extension to an existing pub sub content based networking cbn algorithm to support matching of notification messages to client subscription filters using ontology based reasoning we also demonstrate how the clustering of ontologies leads to increased efficiencies in the subscription forwarding tables used which in turn results in increased scalability of the network
the capability to represent and use concepts like time and events in computer science is essential to solve wide class of problems characterized by the notion of change real time databases and multimedia are just few of several areas which need good tools to deal with time another area where these concepts are essential is artificial intelligence because an agent must be able to reason about dynamic environment in this work formalism is proposed which allows the representation and use of several features that had been recognized as useful in the attempts to solve such class of problems general framework based on many sorted logic is proposed centering our attention in issues such as the representation of time actions properties events and causality the proposal is compared with related work from the temporal logic and artificial intelligence areas this work complements and enhances previously related efforts on formalizing temporal concepts with the same purpose
every notion of component for the development of embedded systems has to take heterogeneity into account components may be hardware or software or os synchronous or asynchronous deterministic or not detailed wrt time or not detailed wrt data or not etc lot of approaches following ptolemy propose to define several models of computation and communication moccs to deal with heterogeneity and framework in which they can be combined hierarchically this paper presents the very first design of component model for embedded systems called we aim at expressing fine grain timing aspects and several types of concurrency as moccs but we require that all the moccs be programmed in terms of more basic primitives is meant to be an abstract description level intended to be translated into an existing language eg lustre for execution and property validation purposes
in this paper we propose new quadrilateral remeshing method for manifolds of arbitrary genus that is at once general flexible and efficient our technique is based on the use of smooth harmonic scalar fields defined over the mesh given such field we compute its gradient field and second vector field that is everywhere orthogonal to the gradient we then trace integral lines through these vector fields to sample the mesh the two nets of integral lines together are used to form the polygons of the output mesh curvature sensitive spacing of the lines provides for anisotropic meshes that adapt to the local shape our scalar field construction allows users to exercise extensive control over the structure of the final mesh the entire process is performed without computing an explicit parameterization of the surface and is thus applicable to manifolds of any genus without the need for cutting the surface into patches
compile time optimization is often limited by lack of target machine and input data set knowledge without this information compilers may be forced to make conservative assumptions to preserve correctness and to avoid performance degradation in order to cope with this lack of information at compile time adaptive and dynamic systems can be used to perform optimization at runtime when complete knowledge of input and machine parameters is available this paper presents compiler supported high level adaptive optimization system users describe in domain specific language optimizations performed by stand alone optimization tools and backend compiler flags as well as heuristics for applying these optimizations dynamically at runtime the adapt compiler reads these descriptions and generates application specific runtime systems to apply the heuristics to facilitate the usage of existing tools and compilers overheads are minimized by decoupling optimization from execution our system adapt supports range of paradigms proposed recently including dynamic compilation parameterization and runtime sampling we demonstrate our system by applying several optimization techniques to suite of benchmarks on two target machines adapt is shown to consistently outperform statically generated executables improving performance by as much as
set of mutation operators for sql queries that retrieve information from database is developed and tested against set of queries drawn from the nist sql conformance test suite the mutation operators cover wide spectrum of sql features including the handling of null values additional experiments are performed to explore whether the cost of executing mutants can be reduced using selective mutation or the test suite size can be reduced by using an appropriate ordering of the mutants the sql mutation approach can be helpful in assessing the adequacy of database test cases and their development and as tool for systematically injecting faults in order to compare different database testing techniques
modern internet streaming services have utilized various techniques to improve the quality of streaming media delivery despite the characterization of media access patterns and user behaviors in many measurement studies few studies have focused on the streaming techniques themselves particularly on the quality of streaming experiences they offer end users and on the resources of the media systems that they consume in order to gain insights into current streaming services techniques and thus provide guidance on designing resource efficient and high quality streaming media systems we have collected large streaming media workload from thousands of broadband home users and business users hosted by major isp and analyzed the most commonly used streaming techniques such as automatic protocol switch fast streaming mbr encoding and rate adaptation our measurement and analysis results show that with these techniques current streaming systems tend to over utilize cpu and bandwidth resources to provide better services to end users which may not be desirable and effective and is not necessarily the best way to improve the quality of streaming media delivery motivated by these results we propose and evaluate coordination mechanism that effectively takes advantage of both fast streaming and rate adaptation to better utilize the server and internet resources for streaming quality improvement
peer to peer pp databases are becoming prevalent on the internet for distribution and sharing of documents applications and other digital media the problem of answering large scale ad hoc analysis queries for example aggregation queries on these databases poses unique challenges exact solutions can be time consuming and difficult to implement given the distributed and dynamic nature of pp databases in this paper we present novel sampling based techniques for approximate answering of ad hoc aggregation queries in such databases computing high quality random sample of the database efficiently in the pp environment is complicated due to several factors the data is distributed usually in uneven quantities across many peers within each peer the data is often highly correlated and moreover even collecting random sample of the peers is difficult to accomplish to counter these problems we have developed an adaptive two phase sampling approach based on random walks of the pp graph as well as block level sampling techniques we present extensive experimental evaluations to demonstrate the feasibility of our proposed solution
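The toy simulation below conveys the flavour of the two-phase idea: a random walk over the peer graph picks peers, a few blocks of tuples are sampled inside each picked peer, and a SUM aggregate is scaled up from the samples. The real technique corrects for walk bias, adapts the two phases to the data distribution, and handles skew across peers; none of that is modelled, and every parameter here is an assumption.

import random

# Toy two-phase sampling for approximate SUM over a simulated P2P database:
# phase one selects peers by random walk, phase two samples blocks of tuples
# within each selected peer; the estimate is scaled to the whole network.
# Bias correction and adaptivity from the real approach are not modelled.

def random_walk(neighbors, start, steps):
    node = start
    for _ in range(steps):
        node = random.choice(neighbors[node])
    return node

def approximate_sum(neighbors, data, n_peers_total, walk_steps=50,
                    n_peer_samples=20, blocks_per_peer=2, block_size=5):
    estimates = []
    for _ in range(n_peer_samples):
        peer = random_walk(neighbors, start=0, steps=walk_steps)
        tuples = data[peer]
        sampled = []
        for _ in range(blocks_per_peer):
            i = random.randrange(0, max(1, len(tuples) - block_size + 1))
            sampled += tuples[i:i + block_size]            # block-level sample
        if sampled:
            # scale the block sample up to the peer, then to the network
            estimates.append(sum(sampled) / len(sampled) * len(tuples))
    return sum(estimates) / len(estimates) * n_peers_total

if __name__ == "__main__":
    random.seed(3)
    n = 100
    neighbors = {i: [(i - 1) % n, (i + 1) % n, (i + 7) % n, (i - 7) % n] for i in range(n)}
    data = {i: [random.randint(0, 10) for _ in range(random.randint(20, 80))] for i in range(n)}
    true_sum = sum(sum(v) for v in data.values())
    print(true_sum, round(approximate_sum(neighbors, data, n)))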
novel method of rights protection for categorical data through watermarking is introduced in this paper new watermark embedding channels are discovered and associated novel watermark encoding algorithms are proposed while preserving data quality requirements the introduced solution is designed to survive important attacks such as subset selection and random alterations mark detection is fully blind in that it doesn’t require the original data an important characteristic especially in the case of massive data various improvements and alternative encoding methods are proposed and validation experiments on real life data are performed important theoretical bounds including mark vulnerability are analyzed the method is proved experimentally and by analysis to be extremely resilient to both alteration and data loss attacks for example tolerating up to percent data loss with watermark alteration of only percent
dynamic voltage scaling and sleep state control have been shown to be extremely effective in reducing energy consumption in cmos circuits though plenty of research papers have studied the application of these techniques in real time embedded system design through intelligent task and or voltage scheduling most of these results are limited to relatively simple real time application models in this paper comprehensive real time application model including periodic sporadic and bursty tasks as well as distributed real time constraints such as end to end delays is considered two methods are presented for reducing energy consumption while satisfying complex real time constraints for this model experimental results show that the methods achieve significant energy savings without violating any deadlines
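For the simplest special case of this problem, the sketch below applies the standard utilization-based static speed selection for periodic tasks under EDF: any speed s (as a fraction of the maximum frequency) with U / s <= 1 keeps the task set schedulable, so the lowest available such speed saves the most energy. The methods summarised above cover a much richer model with sporadic and bursty tasks and end-to-end constraints, which this does not represent.

# Utilization-based static speed selection for periodic tasks under EDF.
# Covers only the simple periodic special case, not the richer model above.

def choose_speed(tasks, speed_levels):
    # tasks: list of (wcet_at_fmax, period); speed_levels: fractions of f_max
    utilization = sum(c / p for c, p in tasks)
    for s in sorted(speed_levels):
        if utilization / s <= 1.0:          # EDF schedulability test at speed s
            return s
    raise ValueError("task set not schedulable even at full speed")

if __name__ == "__main__":
    tasks = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]               # U = 0.3
    print(choose_speed(tasks, speed_levels=[0.25, 0.5, 0.75, 1.0]))  # -> 0.5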
many polyvariant program analyses have been studied including cfa polymorphic splitting and the cartesian product algorithm the idea of polyvariance is to analyze functions more than once and thereby obtain better precision for each call site in this paper we present an equivalence theorem which relates co inductively defined family of polyvariant flow analyses and standard type system the proof embodies way of understanding polyvariant flow information in terms of union and intersection types and conversely way of understanding union and intersection types in terms of polyvariant flow information we use the theorem as basis for new flow type system in the spirit of the lambda cil calculus of wells dimock muller and turbak in which types are annotated with flow information flow type system is useful as an interface between flow analysis algorithm and program optimizer derived systematically via our equivalence theorem our flow type system should be good interface to the family of polyvariant analyses that we study
we present method for efficiently performing deletions and updates of records when the records to be deleted or updated are chosen by range scan on an index the traditional method involves numerous unnecessary lock calls and traversals of the index from root to leaves especially when the qualifying records keys span more than one leaf page of the index customers have suffered performance losses from these inefficiencies and have complained about them our goal was to minimize the number of interactions with the lock manager and the number of page fixes comparison operations and possibly i/os some of our improvements come from increased synergy between the query planning and data manager components of dbms our patented method has been implemented in db to address specific customer requirements it has also been done to improve performance on the tpc benchmark
we are concerned with producing high quality images of implicit surfaces in particular those with non manifold features in this work we present point based technique that improves the rendering of non manifold implicit surfaces by using point and gradient information to prune plotting nodes resulting from using octree spatial subdivision based on the natural interval extension of the surface’s function the use of intervals guarantees that no parts of the surfaces are missed in the view volume and the combination of point and gradient sampling preserves this feature while greatly enhancing the quality of point based rendering of implicit surfaces we also successfully render non manifold features of implicit surfaces such as rays and thin sections we illustrate the technique with number of example surfaces
the performance of nearest neighbor nn classifier is known to be sensitive to the distance or similarity function used in classifying test instance another major disadvantage of nn is that it uses all training instances in the generalization phase this can cause slow execution speed and high storage requirement when dealing with large datasets in the past research many solutions have been proposed to handle one or both of the above problems in the scheme proposed in this paper we tackle both of these problems by assigning weight to each training instance the weight of training instance is used in the generalization phase to calculate the distance or similarity of query pattern to that instance the basic nn classifier can be viewed as special case of this scheme that treats all instances equally by assigning equal weight to all training instances using this form of weighted similarity measure we propose learning algorithm that attempts to maximize the leave one out lv classification rate of the nn rule by adjusting the weights of the training instances at the same time the algorithm reduces the size of the training set and can be viewed as powerful instance reduction technique an instance having zero weight is not used in the generalization phase and can be virtually removed from the training set we show that our scheme has comparable or better performance than some recent methods proposed in the literature for the task of learning the distance function and or prototype reduction
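The following sketch shows how instance weights can unify distance adaptation and instance reduction: each training instance carries a weight that scales its similarity to a query, and a greedy pass zeroes weights (removing instances) whenever the leave-one-out accuracy does not drop. The actual optimisation of the LOO rate described above is more sophisticated; this is only an illustration with hypothetical function names.

import numpy as np

# Instance-weighted nearest neighbour with a greedy leave-one-out pruning
# pass: an instance whose weight can be set to zero without hurting LOO
# accuracy is effectively removed from the training set. Illustrative only.

def loo_accuracy(X, y, w):
    correct = 0
    for i in range(len(X)):
        # weighted similarity: larger weight makes an instance more attractive
        sim = w / (np.linalg.norm(X - X[i], axis=1) + 1e-9)
        sim[i] = -np.inf                      # leave instance i out
        if w.sum() > w[i] and y[np.argmax(sim)] == y[i]:
            correct += 1
    return correct / len(X)

def prune_instances(X, y):
    w = np.ones(len(X))
    base = loo_accuracy(X, y, w)
    for i in range(len(X)):                   # greedy zeroing pass
        old = w[i]
        w[i] = 0.0
        if loo_accuracy(X, y, w) < base:
            w[i] = old                        # keep instance if accuracy drops
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    w = prune_instances(X, y)
    print("instances kept:", int((w > 0).sum()), "LOO:", loo_accuracy(X, y, w))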
we introduce stable noise function with controllable properties the well known perlin noise function is generated by interpolation of pre defined random number table this table must be modified if user defined constraints are to be satisfied but modification can destroy the stability of the table we integrate statistical tools for measuring the stability of random number table with user constraints within an optimization procedure so as to create controlled random number table which nevertheless has uniform random distribution no periodicity and band limited property
this paper studies the impact of off chip store misses on processor performance for modern commercial applications the performance impact of off chip store misses is largely determined by the extent of their overlap with other off chip cache misses the epoch mlp model is used to explain and quantify how these overlaps are affected by various store handling optimizations and by the memory consistency model implemented by the processor the extent of these overlaps are then translated to off chip cpi experimental results show that store handling optimizations are crucial for mitigating the substantial performance impact of stores in commercial applications while some previously proposed optimizations such as store prefetching are highly effective they are unable to fully mitigate the performance impact of off chip store misses and they also leave performance gap between the stronger and weaker memory consistency models new optimizations such as the store miss accelerator an optimization of hardware scout and new application of speculative lock elision are demonstrated to virtually eliminate the impact of off chip store misses
the semiring based formalism to model soft constraint has been introduced by ugo montanari and the authors of this paper the idea was to make constraint programming more flexible and widely applicable we also wanted to define the extension via general formalism so that all its instances could inherit its properties and be easily compared since then much work has been done to study extend and apply this formalism this paper gives brief summary of some of these research activities
this article uses data from the social bookmarking site delicious to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users first we study the formation of stable distributions in tagging systems seen as an implicit form of “consensus” reached by the users of the system around the tags that best describe resource we show that final tag frequencies for most resources converge to power law distributions and we propose an empirical method to examine the dynamics of the convergence process based on the kullback leibler divergence measure the convergence analysis is performed for both the most utilized tags at the top of tag distributions and the so called long tail second we study the information structures that emerge from collaborative tagging namely tag correlation or folksonomy graphs we show how community based network techniques can be used to extract simple tag vocabularies from the tag correlation graphs by partitioning them into subsets of related tags furthermore we also show for specialized domain that shared vocabularies produced by collaborative tagging are richer than the vocabularies which can be extracted from large scale query logs provided by major search engine although the empirical analysis presented in this article is based on set of tagging data obtained from delicious the methods developed are general and the conclusions should be applicable across other websites that employ tagging
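A minimal version of the convergence check described above: compare the tag distribution of a resource after t annotations with the distribution after some additional annotations using the Kullback-Leibler divergence, and watch the divergence decay as the distribution stabilises. The smoothing constant, window, and synthetic tag stream are assumptions.

import math
import random
from collections import Counter

# Measure how quickly a resource's tag distribution stabilises by computing
# the KL divergence between the distribution at time t and a later one.
# Smoothing, window, and the synthetic "consensus" weights are assumptions.

def distribution(tags):
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def kl_divergence(p, q, vocab, eps=1e-6):
    ps = [p.get(t, 0.0) + eps for t in vocab]   # additive smoothing
    qs = [q.get(t, 0.0) + eps for t in vocab]
    zp, zq = sum(ps), sum(qs)
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))

if __name__ == "__main__":
    random.seed(5)
    vocab = ["python", "web", "tutorial", "blog", "misc"]
    weights = [0.5, 0.25, 0.15, 0.07, 0.03]     # hypothetical emerging consensus
    stream = random.choices(vocab, weights=weights, k=2000)
    for t in (100, 500, 1500):
        d = kl_divergence(distribution(stream[:t]), distribution(stream[:t + 500]), vocab)
        print(t, round(d, 5))                    # divergence shrinks as t grows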
collaborative filtering cf is the most commonly applied recommendation system for personalized services since cf systems rely on neighbors as information sources the recommendation quality of cf depends on the recommenders selected however conventional cf has some fundamental limitations in selecting neighbors recommender reliability proof theoretical lack of credibility attributes and no consideration of customers heterogeneous characteristics this study employs multidimensional credibility model source credibility from consumer psychology and provides theoretical background for credible neighbor selection the proposed method extracts each consumer’s importance weights on credibility attributes which improves the recommendation performance by personalizing recommendations
it is often desirable for reasons of clarity portability and efficiency to write parallel programs in which the number of processes is independent of the number of available processors several modern operating systems support more than one process in an address space but the overhead of creating and synchronizing kernel processes can be high many runtime environments implement lightweight processes threads in user space but this approach usually results in second class status for threads making it difficult or impossible to perform scheduling operations at appropriate times eg when the current thread blocks in the kernel in addition lack of common assumptions may also make it difficult for parallel programs or library routines that use dissimilar thread packages to communicate with each other or to synchronize access to shared data we describe set of kernel mechanisms and conventions designed to accord first class status to user level threads allowing them to be used in any reasonable way that traditional kernel provided processes can be used while leaving the details of their implementation to user level code the key features of our approach are shared memory for asynchronous communication between the kernel and the user software interrupts for events that might require action on the part of user level scheduler and scheduler interface convention that facilitates interactions in user space between dissimilar kinds of threads we have incorporated these mechanisms in the psyche parallel operating system and have used them to implement several different kinds of user level threads we argue for our approach in terms of both flexibility and performance
customer information is increasingly being solicited by organizations as they try to enhance their product and service offerings customers are becoming increasingly protective of the information they disclose the prior research on information disclosure has focused on privacy concerns and trust that lead to intentions to disclose in this study we tread new ground by examining the link between intent to disclose information and the actual disclosure drawing from social response theory and the principle of reciprocity we examine how organizations can influence the strength of the link between intent and actual disclosure we conduct an experiment using pieces of information in non commercial context that examines voluntary individual information disclosure our results indicate that by implementing reasoned dyadic condition where the organization provides reasoning on why they are collecting particular information individuals are more likely to actually disclose more information the results open up opportunities to go beyond intent and study the actual disclosure of sensitive information organizations can use the concept of reciprocity to enhance the design of information acquisition systems
the lack of proper support for multicast services in the internet has hindered the widespread use of applications that rely on group communication services such as mobile software agents although they do not require high bandwidth or heavy traffic these types of applications need to cooperate in scalable fair and decentralized way this paper presents gmac an overlay network that implements all multicast related functionality including membership management and packet forwarding in the end systems gmac introduces new approach for providing multicast services for mobile agent platforms in decentralized way where group members cooperate in fair way to minimize the protocol overhead thus achieving great scalability simulations comparing gmac with other approaches in aspects such as end to end group propagation delay group latency group bandwidth protocol overhead resource utilization and failure recovery show that gmac is scalable and robust solution to provide multicast services in decentralized way to mobile software agent platforms with requirements similar to movilog
current technology continues providing smaller and faster transistors so processor architects can offer more complex and functional ilp processors because manufacturers can fit more transistors on the same chip area as consequence the fraction of chip area reachable in single clock cycle is dropping and at the same time the number of transistors on the chip is increasing however problems related with power consumption and heat dissipation are worrying this scenario is forcing processor designers to look for new processor organizations that can provide the same or more performance but using smaller sizes this fact especially affects the on chip cache memory design therefore studies proposing new smaller cache organizations while maintaining or even increasing the hit ratio are welcome in this sense the cache schemes that propose better exploitation of data locality bypassing schemes prefetching techniques victim caches etc are good example this paper presents data cache scheme called filter cache that splits the first level data cache into two independent organizations and its performance is compared with two other proposals appearing in the open literature as well as larger classical caches to check the performance two different scenarios are considered superscalar processor and symmetric multiprocessor the obtained results show that i in the superscalar processor the split data caches perform similarly or better than larger conventional caches ii some splitting schemes work well in multiprocessors while others work less well because of data localities iii the reuse information that some split schemes incorporate for managing is also useful for designing new competitive protocols to boost performance in multiprocessors iv the filter data cache achieves the best performance in both scenarios
coherence protocols and memory consistency models are two important issues in hardware coherent shared memory multiprocessors and software distributed shared memory dsm systems over the years many researchers have made extensive study on these two issues respectively however the interaction between them has not been studied in the literature in this paper we study the coherence protocols and memory consistency models used by hardware and software dsm systems in detail based on our analysis we draw general definition for memory consistency model ie memory consistency model is the logical sum of the ordering of events in each processor and coherence protocol we also point out that in hardware dsm system the emphasis of memory consistency model is relaxing the restriction of event ordering while in software dsm system memory consistency model focuses mainly on relaxing coherence protocol taking lazy release consistency lrc as an example we analyze the relationship between coherence protocols and memory consistency models in software dsm systems and find that whether the advantages of lrc can be exploited or not depends greatly on its corresponding protocol we draw the conclusion that the more relaxed consistency model is the more relaxed coherence protocol needed to support it this conclusion is very useful when we design new consistency model furthermore we make some improvements on traditional multiple writer protocol and as far as we are aware we describe the complex state transition for multiple writer protocol for the first time in the end we list the main research directions for memory consistency models in hardware and software dsm systems
the study of random graphs has traditionally been dominated by the closely related models in which graph is sampled from the uniform distribution on graphs with vertices and edges and in which each of the edges is sampled independently with probability recently however there has been considerable interest in alternate random graph models designed to more closely approximate the properties of complex real world networks such as the web graph the internet and large social networks two of the most well studied of these are the closely related preferential attachment and copying models in which vertices arrive one by one in sequence and attach at random in rich get richer fashion to earlier vertices here we study the infinite limits of the preferential attachment process namely the asymptotic behavior of finite graphs produced by preferential attachment briefly pa graphs as well as the infinite graphs obtained by continuing the process indefinitely we are guided in part by striking result of erdős and rényi on countable graphs produced by the infinite analogue of the model showing that any two graphs produced by this model are isomorphic with probability it is natural to ask whether comparable result holds for the preferential attachment process we find somewhat surprisingly that the answer depends critically on the out degree of the model for small values of the out degree there exist infinite graphs such that random graph generated according to the infinite preferential attachment process is isomorphic to that graph with probability for larger values of the out degree on the other hand two different samples generated from the infinite preferential attachment process are non isomorphic with positive probability the main technical ingredients underlying this result have fundamental implications for the structure of finite pa graphs in particular we give characterization of the graphs for which the expected number of subgraph embeddings of in an node pa graph remains bounded as goes to infinity
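For reference, this is the standard finite preferential attachment generator the text above refers to: each new vertex attaches d edges to earlier vertices with probability proportional to their current degree, using the usual repeated-endpoint list trick. The infinite-limit results concern what happens as such graphs grow without bound, with the out-degree d playing the decisive role; the demo parameters are arbitrary.

import random

# Finite preferential attachment generator: each new vertex adds d edges to
# earlier vertices chosen with probability proportional to current degree
# (the "rich get richer" rule). The repeated-endpoint list is a common trick.

def pa_graph(n, d, seed=0):
    random.seed(seed)
    edges = []
    endpoints = [0]                 # vertex ids repeated once per incident edge
    for v in range(1, n):
        targets = set()
        while len(targets) < min(d, v):
            targets.add(random.choice(endpoints))   # degree-proportional choice
        for u in targets:
            edges.append((v, u))
            endpoints += [v, u]
    return edges

if __name__ == "__main__":
    edges = pa_graph(n=1000, d=2)
    deg = {}
    for v, u in edges:
        deg[v] = deg.get(v, 0) + 1
        deg[u] = deg.get(u, 0) + 1
    print(len(edges), max(deg.values()))   # heavy-tailed: a few very high degrees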
the technique of relevance feedback has been introduced to content based model retrieval however two essential issues which affect the retrieval performance have not been addressed in this paper novel relevance feedback mechanism is presented which effectively makes use of strengths of different feature vectors and perfectly solves the problem of small sample and asymmetry during the retrieval process the proposed method takes the user’s feedback details as the relevant information of query model and then dynamically updates two important parameters of each feature vector narrowing the gap between high level semantic knowledge and low level object representation the experiments based on the publicly available model database princeton shape benchmark psb show that the proposed approach not only precisely captures the user’s semantic knowledge but also significantly improves the retrieval performance of model retrieval compared with three state of the art query refinement schemes for model retrieval it provides superior retrieval effectiveness only with few rounds of relevance feedback based on several standard measures
one of the most prominent and comprehensive ways of data collection in sensor networks is to periodically extract raw sensor readings this way of data collection enables complex analysis of data which may not be possible with in network aggregation or query processing however this flexibility in data analysis comes at the cost of power consumption in this paper we develop asap an adaptive sampling approach to energy efficient periodic data collection in sensor networks the main idea behind asap is to use dynamically changing subset of the nodes as samplers such that the sensor readings of the sampler nodes are directly collected whereas the values of the non sampler nodes are predicted through the use of probabilistic models that are locally and periodically constructed asap can be effectively used to increase the network lifetime while keeping the quality of the collected data high in scenarios where either the spatial density of the network deployment is superfluous relative to the required spatial resolution for data analysis or certain amount of data quality can be traded off in order to decrease the power consumption of the network asap approach consists of three main mechanisms first sensing driven cluster construction is used to create clusters within the network such that nodes with close sensor readings are assigned to the same clusters second correlation based sampler selection and model derivation are used to determine the sampler nodes and to calculate the parameters of the probabilistic models that capture the spatial and temporal correlations among the sensor readings last adaptive data collection and model based prediction are used to minimize the number of messages used to extract data from the network unique feature of asap is the use of in network schemes as opposed to the protocols requiring centralized control to select and dynamically refine the subset of the sensor nodes serving as samplers and to adjust the value prediction models used for non sampler nodes such runtime adaptations create data collection schedule which is self optimizing in response to the changes in the energy levels of the nodes and environmental dynamics we present simulation based experimental results and study the effectiveness of asap under different system settings
the one shot shortest path query has been studied for decades however in the applications on road networks users are actually interested in the path with the minimum travel time the fastest path which varies over time this motivates us to study the continuous evaluation of fastest path queries in order to capture the dynamics of road networks repeatedly evaluating large number of fastest path queries at every moment is infeasible due to its high computational cost we propose novel approach that employs the concept of the affecting area and the tolerance parameter to avoid the reevaluation while the travel time of the current answer is close enough to that of the fastest path furthermore grid based index is designed to achieve the efficient processing of multiple queries experiments on real datasets show significant reduction in the total amount of reevaluation and therefore the cost for reevaluating query
substantial medical data such as discharge summaries and operative reports are stored in electronic textual form databases containing free text clinical narratives reports often need to be retrieved to find relevant information for clinical and research purposes the context of negation negative finding is of special importance since many of the most frequently described findings are such when searching free text narratives for patients with certain medical condition if negation is not taken into account many of the documents retrieved will be irrelevant hence negation is major source of poor precision in medical information retrieval systems previous research has shown that negated findings may be difficult to identify if the words implying negations negation signals are more than few words away from them we present new pattern learning method for automatic identification of negative context in clinical narratives reports we compare the new algorithm to previous methods proposed for the same task and show its advantages accuracy improvement compared to other machine learning methods and much faster than manual knowledge engineering techniques with matching accuracy the new algorithm can be applied also to further context identification and information extraction tasks
we describe minimalist methodology to develop usage based recommender systems for multimedia digital libraries prototype recommender system based on this strategy was implemented for the open video project digital library of videos that are freely available for download sequential patterns of video retrievals are extracted from the project’s web download logs and analyzed to generate network of video relationships spreading activation algorithm locates video recommendations by searching for associative paths connecting query related videos we evaluate the performance of the resulting system relative to an item based collaborative filtering technique operating on user profiles extracted from the same log data
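As a rough illustration of the spreading activation step described above, the following Python sketch spreads activation from query-related items over a weighted item-to-item graph whose edge weights could come from co-occurrence in download sessions; the decay factor, iteration count, and toy graph are illustrative assumptions, not values from the paper.

def spreading_activation(graph, seeds, decay=0.5, iterations=3):
    """Spread activation from query-related items over a weighted item graph.

    graph: dict mapping item -> list of (neighbor, weight) edges, e.g. weights
           derived from co-occurrence in download sessions.
    seeds: dict mapping seed item -> initial activation.
    Returns items ranked by accumulated activation (seeds excluded)."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(iterations):
        next_frontier = {}
        for node, energy in frontier.items():
            total_w = sum(w for _, w in graph.get(node, []))
            if total_w == 0:
                continue
            for neighbor, w in graph.get(node, []):
                pulse = decay * energy * (w / total_w)
                activation[neighbor] = activation.get(neighbor, 0.0) + pulse
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + pulse
        frontier = next_frontier
    ranked = [(item, a) for item, a in activation.items() if item not in seeds]
    return sorted(ranked, key=lambda x: -x[1])

# toy usage: videos A-D linked by co-download counts
graph = {"A": [("B", 3), ("C", 1)], "B": [("A", 3), ("D", 2)],
         "C": [("A", 1)], "D": [("B", 2)]}
print(spreading_activation(graph, {"A": 1.0}))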
this paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler developed as part of the stanford suif compiler system the system incorporates comprehensive and integrated collection of analyses including privatization and reduction recognition for both array and scalar variables and symbolic analysis of array subscripts the interprocedural analysis framework is designed to provide analysis results nearly as precise as full inlining but without its associated costs experimentation with this system shows that it is capable of detecting coarser granularity of parallelism than previously possible specifically it can parallelize loops that span numerous procedures and hundreds of lines of code frequently requiring modifications to array data structures such as privatization and reduction transformations measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially advance the capability of automatic parallelization technology
over the last decade digital photography has entered the mainstream inexpensive miniaturized cameras are now routinely included in consumer electronics digital projection is poised to make similar breakthrough with variety of vendors offering small low cost projectors as result active imaging is topic of renewed interest in the computer graphics community in particular low cost homemade scanners are now within reach of students and hobbyists with modest budgets this course provides beginners with the mathematics software and practical details they need to leverage projector camera systems in their own scanning projects an example driven approach is used throughout each new concept is illustrated using practical scanner implemented with off the shelf parts the course concludes by detailing how these new approaches are used in rapid prototyping entertainment cultural heritage and web based applications
we address the problem of approximating the distance of bounded degree and general sparse graphs from having some predetermined graph property that is we are interested in sublinear algorithms for estimating the fraction of edge modifications additions or deletions that must be performed on graph so that it obtains this fraction is taken with respect to given upper bound on the number of edges in particular for graphs with degree bound over vertices equals dn to perform such an approximation the algorithm may ask for the degree of any vertex of its choice and may ask for the neighbors of any vertex the problem of estimating the distance to having property was first explicitly addressed by parnas et al in the context of graphs this problem was studied by fischer and newman in the dense graphs model in this model the fraction of edge modifications is taken with respect to and the algorithm may ask for the existence of an edge between any pair of vertices of its choice fischer and newman showed that every graph property that has testing algorithm in this model with query complexity independent of the size of the graph also has distance approximation algorithm with query complexity that is independent of the size of graph in this work we focus on bounded degree and general sparse graphs and give algorithms for all properties shown to have efficient testing algorithms by goldreich and ron specifically these properties are edge connectivity subgraph freeness for constant size subgraphs being an eulerian graph and cycle freeness variant of our subgraph freeness algorithm approximates the size of minimum vertex cover of graph in sublinear time this approximation improves on recent result of parnas and ron
this paper proposes novel architectural framework handling effective unexpected exceptions in workflow management systems wfms effective unexpected exceptions are events for which the organizations lack handling strategies unstructured human interventions are necessary to overcome these situations but clash with the type of model control currently exercised by wfms the proposed framework uses the notion of map guidance to orchestrate these human interventions map guidance empowers users with contextual information about the wfms and environment enables the interruption of model control on the affected instances supports collaborative exception handling and facilitates regaining model control after the exception has been resolved the framework implementation in the open symphony open source platform is also described
it is well known that model checking and satisfiability for pltl are pspace complete by contrast very little is known about whether there exist some interesting fragments of pltl with lower worst case complexity such results would help understand why pltl model checkers are successfully used in practice in this article we investigate this issue and consider model checking and satisfiability for all fragments of pltl obtainable by restricting the temporal connectives allowed the number of atomic propositions and the temporal height
let be set of points in the center problem for is to find two congruent balls of the minimum radius whose union covers we present two randomized algorithms for computing center of the first algorithm runs in log expected time and the second algorithm runs in log expected time where is the radius of the center of and is the radius of the smallest enclosing ball of the second algorithm is faster than the first one as long as is not very close to which is equivalent to the condition of the centers of the two balls in the center of not being very close to each other
full hierarchical dependencies fhds constitute large class of relational dependencies relation exhibits an fhd precisely when it can be decomposed into at least two of its projections without loss of information therefore fhds generalise multivalued dependencies mvds in which case the number of these projections is precisely two the implication of fhds has been defined in the context of some fixed finite universe this paper identifies sound and complete set of inference rules for the implication of fhds this axiomatisation is very reminiscent of that for mvds then an alternative notion of fhd implication is introduced in which the underlying set of attributes is left undetermined the main result proposes finite axiomatisation for fhd implication in undetermined universes moreover the result clarifies the role of the complementation rule as mere means of database normalisation in fact an axiomatisation for fhd implication in fixed universes is proposed which allows to infer any fhds either without using the complementation rule at all or only in the very last step of the inference this also characterises the expressiveness of an incomplete set of inference rules in fixed universes the results extend previous work on mvds by biskup
taint tracking is emerging as general technique in software security to complement virtualization and static analysis it has been applied for accurate detection of wide range of attacks on benign software as well as in malware defense although it is quite robust for tackling the former problem application of taint analysis to untrusted and potentially malicious software is riddled with several difficulties that lead to gaping holes in defense these holes arise not only due to the limitations of information flow analysis techniques but also the nature of today’s software architectures and distribution models this paper highlights these problems using an array of simple but powerful evasion techniques that can easily defeat taint tracking defenses given today’s binary based software distribution and deployment models our results suggest that information flow techniques will be of limited use against future malware that has been designed with the intent of evading these defenses
gradient mesh vector graphics representation used in commercial software is regular grid with specified position and color and their gradients at each grid point gradient meshes can compactly represent smoothly changing data and are typically used for single objects this paper advances the state of the art for gradient meshes in several significant ways firstly we introduce topology preserving gradient mesh representation which allows an arbitrary number of holes this is important as objects in images often have holes either due to occlusion or their structure secondly our algorithm uses the concept of image manifolds adapting surface parameterization and fitting techniques to generate the gradient mesh in fully automatic manner existing gradient mesh algorithms require manual interaction to guide grid construction and to cut objects with holes into disk like regions our new algorithm is empirically at least times faster than previous approaches furthermore image segmentation can be used with our new algorithm to provide automatic gradient mesh generation for whole image finally fitting errors can be simply controlled to balance quality with storage
this paper surveys and demonstrates the power of non strict evaluation in applications executed on distributed architectures we present the design implementation and experimental evaluation of single assignment incomplete data structures in distributed memory architecture and abstract network machine anm incremental structures is incremental structure software cache issc and dynamic incremental structures dis provide nonstrict data access and fully asynchronous operations that make them highly suited for the exploitation of fine grain parallelism in distributed memory systems we focus on split phase memory operations and non strict information processing under distributed address space to improve the overall system performance novel technique of optimization at the communication level is proposed and described we use partial evaluation of local and remote memory accesses not only to remove much of the excess overhead of message passing but also to reduce the number of messages when some information about the input or part of the input is known we show that split phase transactions of is together with the ability of deferring reads allow partial evaluation of distributed programs without losing determinacy our experimental evaluation indicates that commodity pc clusters with both is and caching mechanism issc are more robust the system can deliver speedup for both regular and irregular applications we also show that partial evaluation of memory accesses decreases the traffic in the interconnection network and improves the performance of mpi is and mpi issc applications
webd open standards allow the delivery of interactive virtual learning environments through the internet reaching potentially large numbers of learners worldwide at any time this paper introduces the educational use of virtual reality based on webd technologies after briefly presenting the main webd technologies we summarize the pedagogical basis that motivate their exploitation in the context of education and highlight their interesting features we outline the main positive and negative results obtained so far and point out some of the current research directions
this paper presents an action analysis method based on robust string matching using dynamic programming similar to matching text sequences atomic actions based on semantic and structural features are first detected and coded as spatio temporal characters or symbols these symbols are subsequently concatenated to form unique set of strings for each action similarity metric using longest common subsequence algorithm is employed to robustly match action strings with variable length dynamic programming method with polynomial computational complexity and linear space complexity is implemented an effective learning scheme based on similarity metric embedding is developed to deal with matching strings of variable length our proposed method works with limited amount of training data and exhibits desirable generalization property moreover it can be naturally extended to detect compound behaviors and events experimental evaluation on our own and commonly used data set demonstrates that our method allows for large pose and appearance changes is robust to background clutter and can accommodate spatio temporal behavior variations amongst different subjects while achieving high discriminability between different behaviors
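The matching step above relies on the longest common subsequence between variable-length action strings; the sketch below shows the standard dynamic programming recurrence together with a simple length-normalized similarity. The normalization by the longer string is an illustrative choice rather than the paper's exact metric.

def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming,
    O(len(a)*len(b)) time."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def action_similarity(s, t):
    """Normalized LCS similarity between two variable-length action strings."""
    if not s or not t:
        return 0.0
    return lcs_length(s, t) / max(len(s), len(t))

# toy usage with character-coded atomic actions
print(action_similarity("walkturnsit", "walksitwave"))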
internet traffic exhibits multifaceted burstiness and correlation structure over wide span of time scales previous work analyzed this structure in terms of heavy tailed session characteristics as well as tcp timeouts and congestion avoidance in relatively long time scales we focus on shorter scales typically less than milliseconds our objective is to identify the actual mechanisms that are responsible for creating bursty traffic in those scales we show that tcp self clocking joint with queueing in the network can shape the packet interarrivals of tcp connection in two level on off pattern this structure creates strong correlations and burstiness in time scales that extend up to the round trip time rtt of the connection this effect is more important for bulk transfers that have large bandwidth delay product relative to their window size also the aggregation of many flows without rescaling their packet interarrivals does not converge to poisson stream as one might expect from classical superposition results instead the burstiness in those scales can be significantly reduced by tcp pacing in particular we focus on the importance of the minimum pacing timer and show that millisecond timer would be too coarse for removing short scale traffic burstiness while millisecond timer would be sufficient to make the traffic almost as smooth as poisson stream in sub rtt scales
we present method for representing solid objects with spatially varying oriented textures by repeatedly pasting solid texture exemplars the underlying concept is to extend the texture patch pasting approach of lapped textures to solids using tetrahedral mesh and texture patches the system places texture patches according to the user defined volumetric tensor fields over the mesh to represent oriented textures we have also extended the original technique to handle nonhomogeneous textures for creating solid models whose textural patterns change gradually along the depth fields we identify several texture types considering the amount of anisotropy and spatial variation and provide tailored user interface for each with our simple framework large scale realistic solid models can be created easily with little memory and computational cost we demonstrate the effectiveness of our approach with several examples including trees fruits and vegetables
research has reported that about percent of web searchers utilize advanced query operators with the other percent using extremely simple queries it is often assumed that the use of query operators such as boolean operators and phrase searching improves the effectiveness of web searching we test this assumption by examining the effects of query operators on the performance of three major web search engines we selected one hundred queries from the transaction log of web search service each of these original queries contained query operators such as and or must appear or phrase we then removed the operators from these one hundred advanced queries we submitted both the original and modified queries to three major web search engines total of queries were submitted and documents evaluated we compared the results from the original queries with the operators to the results from the modified queries without the operators we examined the results for changes in coverage relative precision and ranking of relevant documents the use of most query operators had no significant effect on coverage relative precision or ranking although the effect varied depending on the search engine we discuss implications for the effectiveness of searching techniques as currently taught for future information retrieval system design and for future research
detecting the failure of data stream is relatively easy when the stream is continually full of data the transfer of large amounts of data allows for the simple detection of interference whether accidental or malicious however during interference data transmission can become irregular rather than smooth when the traffic is intermittent it is harder to detect when failure has occurred and may lead to an application at the receiving end requesting retransmission or disconnecting request retransmission places additional load on system and disconnection can lead to unnecessary reversion to checkpointed database before reconnecting and reissuing the same request or response in this paper we model the traffic in data streams as set of significant events with an arrival rate distributed with poisson distribution once an arrival rate has been determined over time or lost events can be determined with greater chance of reliability this model also allows for the alteration of the rate parameter to reflect changes in the system and provides support for multiple levels of data aggregation one significant benefit of the poisson based model is that transmission events can be deliberately manipulated in time to provide steganographic channel that confirms sender receiver identity
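A minimal sketch of the Poisson-based reasoning above: if significant events arrive with rate lambda, the probability of observing no events during a silent interval of length t is exp(-lambda*t), so a receiver can flag a probable failure only when that probability becomes very small instead of disconnecting immediately. The rate estimator and the alpha threshold are illustrative assumptions.

import math

def estimate_rate(event_times):
    """Estimate the Poisson rate (events per unit time) from observed arrival times."""
    if len(event_times) < 2:
        return None
    span = event_times[-1] - event_times[0]
    return (len(event_times) - 1) / span if span > 0 else None

def silence_is_suspicious(rate, silent_for, alpha=0.001):
    """Probability of zero Poisson events in 'silent_for' time units is
    exp(-rate * silent_for); flag a likely failure when it drops below alpha."""
    p_no_events = math.exp(-rate * silent_for)
    return p_no_events < alpha

times = [0.0, 1.2, 2.9, 4.1, 5.0, 6.3]   # toy arrival timestamps
lam = estimate_rate(times)
print(lam, silence_is_suspicious(lam, silent_for=10.0))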
this article describes the main techniques developed for falcon’s matlab to fortran compiler falcon is programming environment for the development of high performance scientific programs it combines static and dynamic inference methods to translate matlab programs into fortran the static inference is supported with advanced value propagation techniques and symbolic algorithms for subscript analysis experiments show that falcon’s matlab translator can generate code that performs more than times faster than the interpreted version of matlab and substantially faster than commercially available matlab compilers on one processor of an sgi power challenge furthermore in most cases we have tested the compiler generated code is as fast as corresponding hand written programs
documents cannot be automatically classified unless they have been represented as collection of computable features model is representation of document with computable features however model may not be sufficient to express document especially when two documents have the same features they might not be necessarily classified into the same category we propose method for determining the fitness of document model by using conflict instances conflict instances are instances with exactly the same features but with different category labels given by human expert in an interactive document labelling process for training of the classifier in our paper we do not treat conflict instances as noise but as evidence that can reveal distribution of positive instances we develop an approach to the representation of this distribution information as hyperplane namely distribution hyperplane then the fitness problem becomes problem of computing the distribution hyperplane besides determining the fitness of model distribution hyperplane can also be used for acting as classifier itself and being membership function of fuzzy sets in this paper we also propose the selection criteria of effectiveness measuring for model in process of fitness computations
searching relevant visual information based on content features in large databases is an interesting and challenging topic that has drawn lots of attention from both the research community and industry this paper gives an overview of our investigations on effective and efficient video similarity search we briefly introduce some novel techniques developed for two specific tasks studied in this phd project video retrieval in large collection of segmented video clips and video subsequence identification from long unsegmented stream the proposed methods for processing these two types of similarity queries have shown encouraging performance and are being incorporated into our prototype system of video search named uqlips which has demonstrated some marketing potential for commercialisation
traditional biosurveillance algorithms detect disease outbreaks by looking for peaks in univariate time series of health care data current health care surveillance data however are no longer simply univariate data streams instead wealth of spatial temporal demographic and symptomatic information is available we present an early disease outbreak detection algorithm called what’s strange about recent events wsare which uses multivariate approach to improve its timeliness of detection wsare employs rule based technique that compares recent health care data against data from baseline distribution and finds subgroups of the recent data whose proportions have changed the most from the baseline data in addition health care data also pose difficulties for surveillance algorithms because of inherent temporal trends such as seasonal effects and day of week variations wsare approaches this problem using bayesian network to produce baseline distribution that accounts for these temporal trends the algorithm itself incorporates wide range of ideas including association rules bayesian networks hypothesis testing and permutation tests to produce detection algorithm that is careful to evaluate the significance of the alarms that it raises
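To make the rule-scoring idea concrete, the sketch below scores one candidate rule by how much the proportion of matching records changes between a recent window and a baseline window, using a plain two-by-two chi-square statistic; the real WSARE system derives its baseline from a Bayesian network and assesses significance with randomization and permutation tests, so this is only a simplified stand-in.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table
    [[a, b], [c, d]] = [[recent matching, recent not], [baseline matching, baseline not]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

def score_rule(recent, baseline, rule):
    """Score how much the proportion of records matching 'rule' changed
    between the recent window and the baseline window."""
    a = sum(1 for r in recent if rule(r))
    b = len(recent) - a
    c = sum(1 for r in baseline if rule(r))
    d = len(baseline) - c
    return chi_square_2x2(a, b, c, d)

# toy records: (age_group, symptom)
recent = [("child", "cough")] * 12 + [("adult", "fever")] * 38
baseline = [("child", "cough")] * 20 + [("adult", "fever")] * 480
print(score_rule(recent, baseline, lambda r: r[0] == "child" and r[1] == "cough"))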
in the last decade we have observed an unprecedented development in molecular biology an extremely high number of organisms have been sequenced in genome projects and included in genomic databases for further analysis these databases present an exponential growth rate and they are intensively accessed daily all over the world once sequence is obtained its function and or structure must be determined direct experimentation is considered to be the most reliable method to do that however the experiments that must be conducted are very complex and time consuming for this reason it is far more productive to use computational methods to infer biological information from sequence this is usually done by comparing the new sequence with sequences that already had their characteristics determined blast is the most widely used heuristic tool for sequence comparison thousands of blast searches are made daily all over the world in order to further reduce the blast execution time cluster and grid environments can be effectively used this paper proposes and evaluates an adaptive task allocation framework to perform blast searches in grid environment the framework called packageblast provides an infrastructure that executes distributed blast genomic database comparisons in addition it is flexible since the user can choose or incorporate new task allocation strategies furthermore we propose mechanism to compute grid nodes execution weight adapting the chosen allocation policy to the observed computational power and local load of the nodes our results present very good speedups for instance in machine heterogeneous grid testbed speedup of was achieved reducing the blast execution time from min to min also we show that the adaptive task allocation strategy was able to handle successfully the complexity of grid environment
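A minimal sketch of the adaptive allocation idea, assuming a node's execution weight is its benchmarked power scaled by its idle fraction and that work packages are split proportionally to the normalized weights; the weight formula and the node names are assumptions for illustration, not the formula used by packageblast.

def node_weights(nodes):
    """Compute a relative execution weight per grid node.

    nodes: dict name -> (benchmark_power, load) with load in [0, 1].
    Assumed formula: effective power = benchmark_power * (1 - load),
    normalized so the weights sum to 1."""
    effective = {n: power * (1.0 - load) for n, (power, load) in nodes.items()}
    total = sum(effective.values()) or 1.0
    return {n: e / total for n, e in effective.items()}

def allocate_packages(num_packages, weights):
    """Split database-comparison work packages proportionally to node weights."""
    allocation = {n: int(num_packages * w) for n, w in weights.items()}
    leftover = num_packages - sum(allocation.values())
    for n in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        allocation[n] += 1
    return allocation

nodes = {"fast-idle": (10.0, 0.1), "fast-busy": (10.0, 0.8), "slow-idle": (3.0, 0.0)}
w = node_weights(nodes)
print(w, allocate_packages(100, w))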
traditional tfidf like term weighting schemes have rough statistic idf as the term weighting factor which does not exploit the category information category labels on documents and intra document information the relative importance of given term to given document that contains it from the training data for text categorization task we present here more elaborate nonparametric probabilistic model to make use of this sort of information in the term weighting phase idf is theoretically proved to be rough approximation of this new term weighting factor this work is preliminary and mainly aiming at providing inspiration for further study on exploitation of this information but it already provides moderate performance boost on three popular document collections
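For reference, the sketch below contrasts the plain idf factor criticized above with one well-known supervised alternative in the spirit of relevance-frequency weighting, which does use category labels; it is not the nonparametric probabilistic model proposed in the paper, only an illustration of how category information can enter the term weighting factor.

import math

def idf(term, documents):
    """Classic inverse document frequency: log(N / df)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0

def supervised_weight(term, documents, labels, positive):
    """A supervised alternative in the spirit of relevance-frequency weighting:
    favour terms concentrated in the positive category.
    (Illustrative only; not the paper's probabilistic model.)"""
    pos = sum(1 for doc, y in zip(documents, labels) if y == positive and term in doc)
    neg = sum(1 for doc, y in zip(documents, labels) if y != positive and term in doc)
    return math.log(2 + pos / max(1, neg))

docs = [{"stock", "market"}, {"stock", "soup"}, {"market", "rally"}, {"soup", "recipe"}]
labels = ["finance", "food", "finance", "food"]
for t in ("stock", "market"):
    print(t, round(idf(t, docs), 3), round(supervised_weight(t, docs, labels, "finance"), 3))

On this toy data both terms have the same idf, while the supervised factor separates the term that appears only in the finance category from the one that does not, which is the kind of information idf alone cannot capture.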
digital signatures are an important security mechanism especially when non repudiation is desired however non repudiation is meaningful only when the private signing keys and functions are adequately protected an assumption that is very difficult to accommodate in the real world because computers and thus cryptographic keys and functions could be relatively easily compromised one approach to resolving or at least alleviating this problem is to use threshold cryptography but how should such techniques be employed in the real world in this paper we propose exploiting social networks whereby average users take advantage of their trusted ones to help secure their cryptographic keys while the idea is simple from an individual user’s perspective we aim to understand the resulting systems from whole system perspective specifically we propose and investigate two measures of the resulting systems attack resilience which captures the security consequences due to the compromise of some computers and thus the compromise of the cryptographic key shares stored on them availability which captures the effect when computers are not always responsive due to the peer to peer nature of social networks
in many applications data appear with huge number of instances as well as features linear support vector machines svm is one of the most popular tools to deal with such large scale sparse data this paper presents novel dual coordinate descent method for linear svm with and loss functions the proposed method is simple and reaches an epsilon accurate solution in log epsilon iterations experiments indicate that our method is much faster than state of the art solvers such as pegasos tron svmperf and recent primal coordinate descent implementation
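A compact sketch of dual coordinate descent for the L1-loss (hinge) linear SVM, the standard formulation this line of work targets: each pass updates one dual variable in closed form while maintaining the primal vector w. The dense-list representation, epoch count, and toy data are illustrative; production solvers add shrinking heuristics and sparse arithmetic.

import random

def dcd_linear_svm(X, y, C=1.0, epochs=10, seed=0):
    """Dual coordinate descent for L1-loss linear SVM (sketch).

    Solves min_alpha 1/2 ||sum_i alpha_i y_i x_i||^2 - sum_i alpha_i
    subject to 0 <= alpha_i <= C, maintaining w = sum_i alpha_i y_i x_i.
    X: list of dense feature lists, y: labels in {-1, +1}."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    qii = [sum(v * v for v in X[i]) for i in range(n)]            # diagonal of Q
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            if qii[i] == 0:
                continue
            g = y[i] * sum(w[k] * X[i][k] for k in range(d)) - 1.0   # coordinate gradient
            new_alpha = min(max(alpha[i] - g / qii[i], 0.0), C)      # closed-form projected update
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                for k in range(d):
                    w[k] += delta * y[i] * X[i][k]
                alpha[i] = new_alpha
    return w

X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
print(dcd_linear_svm(X, y))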
this survey reviews advances in human motion capture and analysis from to following previous survey of papers up to tb moeslund granum survey of computer vision based human motion capture computer vision and image understanding human motion capture continues to be an increasingly active research area in computer vision with over publications over this period number of significant research advances are identified together with novel methodologies for automatic initialization tracking pose estimation and movement recognition recent research has addressed reliable tracking and pose estimation in natural scenes progress has also been made towards automatic understanding of human actions and behavior this survey reviews recent trends in video based human capture and analysis as well as discussing open problems for future research to achieve automatic visual analysis of human movement
detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction of the many approaches proposed most rely on the assumption of operating within the confines of single website or require expensive hand labeling of relevant and non relevant blocks for model induction this reduces their applicability since in many practical scenarios template blocks need to be detected in arbitrary web pages with no prior knowledge of the site structure in this work we propose to bridge these two approaches by using within site template discovery techniques to drive the induction of site independent template detector our approach eliminates the need for human annotation and produces highly effective models experimental results demonstrate the usefulness of the proposed methodology for the important applications of keyword extraction with relative performance gain as high as
this paper proposes multidestination message passing on wormhole ary cube networks using new base routing conformed path brcp model this model allows both unicast single destination and multidestination messages to co exist in given network without leading to deadlock the model is illustrated with several common routing schemes deterministic as well as adaptive and the associated deadlock freedom properties are analyzed using this model set of new algorithms for popular collective communication operations broadcast and multicast are proposed and evaluated it is shown that the proposed algorithms can considerably reduce the latency of these operations compared to the umesh unicast based multicast and the hamiltonian path based schemes very interesting result that is presented shows that multicast can be implemented with reduced or near constant latency as the number of processors participating in the multicast increases beyond certain number it is also shown that the brcp model can take advantage of adaptivity in routing schemes to further reduce the latency of these operations the multidestination mechanism and the brcp model establish new foundation to provide fast and scalable collective communication support on wormhole routed systems
trace effects are statically generated program abstractions that can be model checked for verification of assertions in temporal program logic in this paper we develop type and effect analysis for obtaining trace effects of object oriented programs in featherweight java we observe that the analysis is significantly complicated by the interaction of trace behavior with inheritance and other object oriented features particularly overridden methods dynamic dispatch and downcasting we propose an expressive type and effect inference algorithm combining polymorphism and subtyping subeffecting constraints to obtain flexible trace effect analysis in this setting and show how these techniques are applicable to object oriented features we also extend the basic language model with exceptions and stack based event contexts and show how trace effects scale to these extensions by structural transformations
controllers are necessary for physically based synthesis of character animation however creating controllers requires either manual tuning or expensive computer optimization we introduce linear bellman combination as method for reusing existing controllers given set of controllers for related tasks this combination creates controller that performs new task it naturally weights the contribution of each component controller by its relevance to the current state and goal of the system we demonstrate that linear bellman combination outperforms naive combination often succeeding where naive combination fails furthermore this combination is provably optimal for new task if the component controllers are also optimal for related tasks we demonstrate the applicability of linear bellman combination to interactive character control of stepping motions and acrobatic maneuvers
the execution of query in parallel database machine can be controlled in either control flow way or in data flow way in the former case single system node controls the entire query execution in the latter case the processes that execute the query although possibly running on different nodes of the system trigger each other lately many database research projects focus on data flow control since it should enhance response times and throughput the authors study control versus data flow with regard to controlling the execution of database queries an analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances also some systems using data flow techniques are described and the authors investigate to which degree they are really data flow the results show that for particular types of queries data flow is very attractive since it reduces the number of control messages and balances these messages over the nodes
many important issues in the design and implementation of hypermedia system functionality focus on the way interobject connections are represented manipulated and stored prototypic system called hb is being designed to meet the storage needs of next generation hypermedia system architectures hb is referred to as hyperbase management system hbms because it supports not only the storage and manipulation of information but the storage and manipulation of the connectivity data that link information together to form hypermedia among hb distinctions is its use of semantic network database system to manage physical storage here basic semantic modeling concepts as they apply to hypermedia systems are reviewed and experiences using semantic database system in hb are discussed semantic data models attempt to provide more powerful mechanisms for structuring objects than are provided by traditional approaches in hb it was necessary to abstract interobject connectivity behaviors and information for hypermedia building on top of semantic database system facilitated such separation and made the structural aspects of hypermedia conveniently accessible to manipulation this becomes particularly important in the implementation of structure related operations such as structural queries our experience suggests that an integrated semantic object oriented database paradigm appears to be superior to purely relational semantic or object oriented methodologies for representing the structurally complex interrelationships that arise in hypermedia
we present derivation of control flow analysis by abstract interpretation our starting point is transition system semantics defined as an abstract machine for small functional language in continuation passing style we obtain galois connection for abstracting the machine states by composing galois connections most notably an independent attribute galois connection on machine states and galois connection induced by closure operator associated with constituent parts relation on environments we calculate abstract transfer functions by applying the state abstraction to the collecting semantics resulting in novel characterization of demand driven cfa
this paper discusses the use of networks on chip nocs consisting of multiple voltage frequency islands to cope with power consumption clock distribution and parameter variation problems in future multiprocessor systems on chip mpsocs in this architecture communication within each island is synchronous while communication across different islands is achieved via mixed clock mixed voltage queues in order to dynamically control the speed of each domain in the presence of parameter and workload variations we propose robust feedback control methodology towards this end we first develop state space model based on the utilization of the inter domain queues then we identify the theoretical conditions under which the network is controllable finally we synthesize state feedback controllers to cope with workload variations and minimize power consumption experimental results demonstrate robustness to parameter variations and more than energy savings by exploiting workload variations through dynamic voltage frequency scaling dvfs for hardware mpeg encoder design
ensemble learning is one of the principal current directions in the research of machine learning in this paper subspace ensembles for classification are explored which constitute an ensemble classifier system by manipulating different feature subspaces starting with the nature of ensemble efficacy we probe into the microcosmic meaning of ensemble diversity and propose to use region partitioning and region weighting to implement effective subspace ensembles an improved random subspace method that integrates this mechanism is presented individual classifiers possessing eminent performance on partitioned region reflected by high neighborhood accuracies are deemed to contribute largely to this region and are assigned large weights in determining the labels of instances in this area the robustness and effectiveness of the proposed method is shown empirically with the base classifier of linear support vector machines on the classification problem of eeg signals
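A small sketch of a random subspace ensemble in which each member's vote is weighted by its accuracy on the training neighborhood of the test instance, a simple stand-in for the region partitioning and region weighting described above; the use of scikit-learn's LinearSVC, the neighborhood size, and the subspace fraction are illustrative assumptions.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestNeighbors

def fit_subspace_ensemble(X, y, n_members=10, subspace_frac=0.5, seed=0):
    """Random subspace ensemble: each member is trained on a random feature subset."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    members = []
    for _ in range(n_members):
        feats = rng.choice(d, size=k, replace=False)
        clf = LinearSVC(max_iter=5000).fit(X[:, feats], y)
        members.append((feats, clf))
    return members

def predict_locally_weighted(members, X_train, y_train, x, n_neighbors=7):
    """Weight each member's vote by its accuracy on the training neighborhood of x."""
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X_train)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    votes = {}
    for feats, clf in members:
        local_acc = np.mean(clf.predict(X_train[idx][:, feats]) == y_train[idx])
        label = clf.predict(x[feats].reshape(1, -1))[0]
        votes[label] = votes.get(label, 0.0) + local_acc
    return max(votes, key=votes.get)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
members = fit_subspace_ensemble(X, y)
print(predict_locally_weighted(members, X, y, X[0]))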
automated database design systems embody knowledge about the database design process however their lack of knowledge about the domains for which databases are being developed significantly limits their usefulness methodology for acquiring and using general world knowledge about business for database design has been developed and implemented in system called the common sense business reasoner which acquires facts about application domains and organizes them into hierarchical context dependent knowledge base this knowledge is used to make intelligent suggestions to user about the entities attributes and relationships to include in database design distance function approach is employed for integrating specific facts obtained from individual design sessions into the knowledge base learning and for applying the knowledge to subsequent design problems reasoning
as the amount of multimedia data is increasing day by day thanks to less expensive storage devices and increasing numbers of information sources machine learning algorithms are faced with large sized and noisy datasets fortunately the use of good sampling set for training influences the final results significantly but using simple random sample srs may not obtain satisfactory results because such sample may not adequately represent the large and noisy dataset due to its blind approach in selecting samples the difficulty is particularly apparent for huge datasets where due to memory constraints only very small sample sizes are used this is typically the case for multimedia applications where data size is usually very large in this article we propose new and efficient method to sample large and noisy multimedia data the proposed method is based on simple distance measure that compares the histograms of the sample set and the whole set in order to estimate the representativeness of the sample the proposed method deals with noise in an elegant manner which srs and other methods are not able to deal with we experiment on image and audio datasets comparison with srs and other methods shows that the proposed method is vastly superior in terms of sample representativeness particularly for small sample sizes although time wise it is comparable to srs the least expensive method in terms of time
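A minimal sketch of the histogram-comparison idea, assuming a one-dimensional feature, an L1 distance between normalized histograms, and a simple pick-the-best-of-several-candidates strategy; the paper's actual sampling procedure and distance measure may differ.

import random

def histogram(values, bins, lo, hi):
    """Normalized histogram of 'values' over [lo, hi) with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = float(len(values)) or 1.0
    return [c / total for c in counts]

def hist_distance(h1, h2):
    """L1 distance between two normalized histograms (illustrative choice)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def representative_sample(data, sample_size, candidates=50, bins=20, seed=0):
    """Pick, among several random candidate samples, the one whose histogram
    is closest to the histogram of the whole data set."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data) + 1e-9
    full_hist = histogram(data, bins, lo, hi)
    best, best_dist = None, float("inf")
    for _ in range(candidates):
        sample = rng.sample(data, sample_size)
        dist = hist_distance(histogram(sample, bins, lo, hi), full_hist)
        if dist < best_dist:
            best, best_dist = sample, dist
    return best, best_dist

gen = random.Random(1)
data = [gen.gauss(0, 1) for _ in range(10000)]
sample, dist = representative_sample(data, sample_size=100)
print(len(sample), round(dist, 4))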
the development process of hypermedia and web systems poses very specific problems that do not appear in other software applications such as the need for mechanisms to model sophisticated navigational structures interactive behaviors interfaces with external applications security constraints and multimedia compositions even though experience modelling skills and abstractions can be borrowed from existing software design methods such as object oriented modelling hypermedia developers need specific mechanisms to analyze and design using entities that belong to the hypermedia domain such as nodes links anchors and space and time based relationships moreover hypermedia methods should provide mechanisms to deal with all the aforementioned features in progressive and integrated way in this paper we present the ariadne development method adm software engineering method that proposes systematic flexible integrative and platform independent process to specify and evaluate hypermedia and web applications adm has been shown particularly useful in complex systems involving huge number of users with different abilities to access information with complex structure where huge number of nodes have to be organized in clear way to produce specifications that are discussed by people with an heterogeneous background this is the case of arce latin american project where countries are cooperating to produce web platform to improve assistance in disaster mitigation situations
we present tool for the user controlled creation of multiresolution meshes most automatic mesh reduction methods are not able to identify mesh regions of high semantic or functional importance for example the face of character model or areas deformed by animation to address this problem we present method allowing user to provide importance weights for mesh regions to control the automatic simplification process to demonstrate the usefulness of this approach in real world setting maya plug in is presented that lets the user create multiresolution meshes with importance weighting interactively and intuitively the user simply paints the importance of regions directly onto the mesh the plug in can handle arbitrary meshes with attributes vertex colors textures normals and attribute discontinuities this work aims to show that an integrated editing approach with full support for mesh attributes which lets the user exercise selective control over the simplification rather than operating fully automatic can bring multiresolution meshes out of academic environments into widespread use in the digital content creation industry
software product line spl is set of software systems with well defined commonalities and variabilities that are developed by managed reuse of common artifacts in this paper we present novel approach to implement spl by fine grained reuse mechanisms which are orthogonal to class based inheritance we introduce the featherweight record trait java frtj calculus where units of product functionality are modeled by traits construct that was already shown useful with respect to code reuse and by records construct that complements traits to model the variability of the state part of products explicitly records and traits are assembled in classes that are used to build products this composition of product functionalities is realized by explicit operators of the calculus allowing code manipulations for modeling product variability the frtj type system ensures that the products in the spl are type safe by type checking only once the records traits and classes shared by different products moreover type safety of an extension of type safe spl can be guaranteed by checking only the newly added parts
pixelization is the simple yet powerful technique of mapping each element of some data set to pixel in image there are primary characteristics of pixels that can be leveraged to impart information their color and color related attributes hue saturation etc and their arrangement in the image we have found that applying dimensional stacking layout to pixelization uniquely facilitates feature discovery informs and directs user queries supports interactive data mining and provides means for exploratory analysis in this paper we describe our approach and how it is being used to analyze multidimensional multivariate neuroscience data
highly distributed anonymous communications systems have the promise to reduce the effectiveness of certain attacks and improve scalability over more centralized approaches existing approaches however face security and scalability issues requiring nodes to have full knowledge of the other nodes in the system as in tor and tarzan limits scalability and can lead to intersection attacks in peer to peer configurations morphmix avoids this requirement for complete system knowledge but users must rely on untrusted peers to select the path this can lead to the attacker controlling the entire path more often than is acceptable to overcome these problems we propose salsa structured approach to organizing highly distributed anonymous communications systems for scalability and security salsa is designed to select nodes to be used in anonymous circuits randomly from the full set of nodes even though each node has knowledge of only subset of the network it uses distributed hash table based on hashes of the nodes ip addresses to organize the system with virtual tree structure limited knowledge of other nodes is enough to route node lookups throughout the system we use redundancy and bounds checking when performing lookups to prevent malicious nodes from returning false information without detection we show that our scheme prevents attackers from biasing path selection while incurring moderate overheads as long as the fraction of malicious nodes is less than additionally the system prevents attackers from obtaining snapshot of the entire system until the number of attackers grows too large eg for peers and groups the number of groups can be used as tunable parameter in the system depending on the number of peers that can be used to balance performance and security
key predistribution in sensor networks refers to the problem of distributing secret keys among sensor nodes prior to deployment recently many key predistribution schemes have been proposed for wireless sensor networks to further improve these techniques researchers have also proposed to use sensors expected location information to help predistribution of keying materials in this paper we propose lightweight and scalable key establishment scheme for gps enabled wireless sensor networks this scheme has little communication overhead and memory usage since communications and computations of key establishment schemes are done only once when the network is being initiated actually the most important overhead is the memory usage only nine keys should be stored in sensors memory this scheme also provides high local connectivity we have studied the performance of this scheme using simulations and analysis with the same storage overhead our scheme has lower overhead compared to other schemes
periodicity is particularly interesting feature which is often inherent in real world time series data sets in this article we propose data mining technique for detecting multiple partial and approximate periodicities our approach is exploratory and follows filter refine paradigm in the filter phase we introduce an autocorrelation based algorithm that produces set of candidate partial periodicities the algorithm is extended to capture approximate periodicities in the refine phase we effectively prune invalid periodicities we conducted series of experiments with various real world data sets to test the performance and verify the quality of the results
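A sketch of the filter phase described above: compute the sample autocorrelation of the series at each lag and keep the lags whose autocorrelation exceeds a threshold as candidate periodicities to be pruned later; the mean-centred estimator and the threshold value are illustrative choices.

def autocorrelation(series, lag):
    """Sample autocorrelation of 'series' at a given lag (mean-centred)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

def candidate_periods(series, max_lag=None, threshold=0.3):
    """Filter phase: lags whose autocorrelation exceeds the threshold are
    kept as candidate (possibly partial or approximate) periodicities."""
    max_lag = max_lag or len(series) // 2
    return [(lag, autocorrelation(series, lag))
            for lag in range(2, max_lag + 1)
            if autocorrelation(series, lag) > threshold]

# toy series with period 7
series = [1 if t % 7 == 0 else 0 for t in range(70)]
print(candidate_periods(series))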
new approach to automatic fingerprint verification based on general purpose wide baseline matching methodology is here proposed the approach is not based on the standard ridge minutiae based framework instead of detecting and matching the standard structural features local interest points are detected in the fingerprints then local descriptors are computed in the neighborhood of these points and afterwards these descriptors are compared using local and global matching procedures then final verification is carried out by bayes classifier the methodology is validated using the fvc dataset where competitive results are obtained
web searchers commonly have difficulties crafting queries to fulfill their information needs even after they are able to craft query they often find it challenging to evaluate the results of their web searches sources of these problems include the lack of support for constructing and refining queries and the static nature of the list based representations of web search results wordbars has been developed to assist users in their web search and exploration tasks this system provides visual representation of the frequencies of the terms found in the first document surrogates returned from an initial query in the form of histogram exploration of the search results is supported through term selection in the histogram resulting in re sorting of the search results based on the use of the selected terms in the document surrogates terms from the histogram can be easily added or removed from the query generating new set of search results examples illustrate how wordbars can provide valuable support for query refinement and search results exploration both when vague and specific initial queries are provided user evaluations with both expert and intermediate web searchers illustrate the benefits of the interactive exploration features of wordbars in terms of effectiveness as well as subjective measures although differences were found in the demographics of these two user groups both were able to benefit from the features of wordbars
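A minimal sketch of the two interactions described above, assuming the surrogates are plain title-plus-snippet strings: build a term frequency histogram over the first result surrogates, then re-sort the results by how many user-selected terms each surrogate contains; the tokenizer and stop-word list are simplified assumptions.

import re
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "in", "for", "on", "with"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP]

def term_histogram(surrogates):
    """Frequencies of terms across the document surrogates (title + snippet),
    counting each term once per surrogate."""
    counts = Counter()
    for s in surrogates:
        counts.update(set(tokenize(s)))
    return counts.most_common()

def resort(surrogates, selected_terms):
    """Re-sort the result list by how many selected terms each surrogate contains."""
    def score(s):
        toks = set(tokenize(s))
        return sum(1 for t in selected_terms if t in toks)
    return sorted(range(len(surrogates)), key=lambda i: -score(surrogates[i]))

results = ["Python pandas tutorial for beginners",
           "Snakes of the Amazon basin",
           "Data analysis with pandas and numpy"]
print(term_histogram(results)[:5])
print(resort(results, ["pandas", "data"]))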
we examine the role of transactional memory from two perspectives that of programming language with atomic actions and that of implementations of the language we argue that it is difficult to formulate clean separate and generally useful definition of transactional memory in both programming language semantics and implementations the treatment of atomic actions benefits from being combined with that of other language features in this respect as in many others transactional memory is analogous to garbage collection which is often coupled with other parts of language runtime systems
current feature based gesture recognition systems use human chosen features to perform recognition effective features for classification can also be automatically learned and chosen by the computer in other recognition domains such as face recognition manifold learning methods have been found to be good nonlinear feature extractors few manifold learning algorithms however have been applied to gesture recognition current manifold learning techniques focus only on spatial information making them undesirable for use in the domain of gesture recognition where stroke timing data can provide helpful insight into the recognition of hand drawn symbols in this paper we develop new algorithm for multi stroke gesture recognition which integrates timing data into manifold learning algorithm based on kernel isomap experimental results show it to perform better than traditional human chosen feature based systems
this article describes an architecture that allows replicated service to survive crashes without breaking its tcp connections our approach does not require modifications to the tcp protocol to the operating system on the server or to any of the software running on the clients furthermore it runs on commodity hardware we compare two implementations of this architecture one based on primary backup replication and another based on message logging focusing on scalability failover time and application transparency we evaluate three types of services file server web server and multimedia streaming server our experiments suggest that the approach incurs low overhead on throughput scales well as the number of clients increases and allows recovery of the service in near optimal time
collaborative recommender systems aim to recommend items to user based on the information gathered from other users who have similar interests the current state of the art systems fail to consider the underlying semantics involved when rating an item this in turn contributes to many false recommendations these models hinder the possibility of explaining why user has particular interest or why user likes particular item in this paper we develop an approach incorporating the underlying semantics involved in the rating experiments on movie database show that this improves the accuracy of the model
this paper describes mediafaces system that enables faceted exploration of media collections the system processes semi structured information sources to extract objects and facets eg the relationships between two objects next we rank the facets based on statistical analysis of image search query logs and the tagging behaviour of users annotating photos in flickr for given object of interest we can then retrieve the top most relevant facets and present them to the user the system is currently deployed in production by yahoo image search engine we present the system architecture its main components and the application of the system as part of the image search experience
business operations involve many factors and relationships and are modeled as complex business process workflows the execution of these business processes generates vast volumes of complex data the operational data are instances of the process flow taking different paths through the process the goal is to use the complex information to analyze and improve operations and to optimize the process flow in this paper we introduce new visualization technique called visimpact that turns raw operational business data into valuable information visimpact reduces data complexity by analyzing operational data and abstracting the most critical factors called impact factors which influence business operations the analysis may identify single nodes of the business flow graph as important factors but it may also determine aggregations of nodes to be important moreover the analysis may find that single nodes have certain data values associated with them which have an influence on some business metrics or resource usage parameters the impact factors are presented as nodes in symmetric circular graph providing insight into core business operations and relationships cause effect mechanism is built in to determine good and bad operational behavior and to take action accordingly we have applied visimpact to real world applications fraud analysis and service contract analysis to show the power of visimpact for finding relationships among the most important impact factors and for immediate identification of anomalies the visimpact system provides highly interactive interface including drilldown capabilities down to transaction levels to allow multilevel views of business dynamics
in this paper we report the development of an energy efficient high performance distributed computing paradigm to carry out collaborative signal and information processing csip in sensor networks using mobile agents in this paradigm the processing code is moved to the sensor nodes through mobile agents in contrast to the client server based computing where local data are transferred to processing center although the client server paradigm has been widely used in distributed computing the many advantages of the mobile agent paradigm make it more suitable for sensor networks the paper first presents simulation models for both the client server paradigm and the mobile agent paradigm we use the execution time energy and the energy delay product as metrics to measure the performance several experiments are designed to show the effect of different parameters on the performance of the paradigms experimental results show that the mobile agent paradigm performs much better when the number of nodes is large while the client server paradigm is advantageous when the number of nodes is small based on this observation we then propose cluster based hybrid computing paradigm to combine the advantages of these two paradigms there are two schemes in this paradigm and simulation results show that there is always one scheme which performs better than either the client server or the mobile agent paradigms thus the cluster based hybrid computing provides an energy efficient and high performance solution to csip
new generation of community based social networking mobile applications is emerging in these applications there is often fundamental tension between users desire for preserving the privacy of their own data and their need for fine grained information about others our work is motivated by community based mobile application called aegis personal safety enhancement service based on sharing location information with trusted nearby friends we model the privacy participation tradeoffs in this application using game theoretic formulation users in this game are assumed to be self interested they prefer to obtain more fine grained knowledge from others while limiting their own privacy leak ie their own contributions to the game as much as possible we design tit for tat mechanism to give user incentives to contribute to the application we investigate the convergence of two best response dynamics to achieve non trivial nash equilibrium for this game further we propose an algorithm that yields pareto optimal nash equilibrium we show that this algorithm guarantees polynomial time convergence and can be executed in distributed manner
static scheduling of program represented by directed task graph on multiprocessor system to minimize the program completion time is well known problem in parallel processing since finding an optimal schedule is an np complete problem in general researchers have resorted to devising efficient heuristics plethora of heuristics have been proposed based on wide spectrum of techniques including branch and bound integer programming searching graph theory randomization genetic algorithms and evolutionary methods the objective of this survey is to describe various scheduling algorithms and their functionalities in contrasting fashion as well as examine their relative merits in terms of performance and time complexity since these algorithms are based on diverse assumptions they differ in their functionalities and hence are difficult to describe in unified context we propose taxonomy that classifies these algorithms into different categories we consider scheduling algorithms with each algorithm explained through an easy to understand description followed by an illustrative example to demonstrate its operation we also outline some of the novel and promising optimization approaches and current research trends in the area finally we give an overview of the software tools that provide scheduling mapping functionalities
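as a concrete illustration of the list scheduling family covered by the survey above here is a minimal python sketch that ranks the tasks of a directed task graph by their static b level and greedily places each task on the processor giving the earliest start time the task graph costs and processor count are invented and communication costs are ignored so this is a simplification of the heuristic family rather than any single algorithm from the survey

    # minimal list-scheduling sketch: rank tasks by static b-level, then place
    # each task on the processor that allows the earliest start time
    from collections import defaultdict

    def b_levels(succ, cost):
        # longest computation-cost path from each task to an exit task
        memo = {}
        def bl(t):
            if t not in memo:
                memo[t] = cost[t] + max((bl(s) for s in succ[t]), default=0)
            return memo[t]
        return {t: bl(t) for t in succ}

    def list_schedule(succ, cost, num_procs):
        pred = defaultdict(set)
        for t, ss in succ.items():
            for s in ss:
                pred[s].add(t)
        bl = b_levels(succ, cost)
        order = sorted(succ, key=lambda t: -bl[t])   # predecessors come first
        finish, placement = {}, {}
        proc_free = [0] * num_procs                  # next free time per processor
        for t in order:
            ready = max((finish[p] for p in pred[t]), default=0)
            p = min(range(num_procs), key=lambda i: max(proc_free[i], ready))
            start = max(proc_free[p], ready)
            finish[t] = start + cost[t]
            proc_free[p] = finish[t]
            placement[t] = (p, start)
        return placement, max(finish.values())

    # toy fork-join graph on two processors
    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    cost = {"a": 2, "b": 3, "c": 1, "d": 2}
    print(list_schedule(succ, cost, num_procs=2))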
we examine index representation techniques for document based inverted files and present mechanism for compressing them using word aligned binary codes the new approach allows extremely fast decoding of inverted lists during query processing while providing compression rates better than other high throughput representations results are given for several large text collections in support of these claims both for compression effectiveness and query efficiency
motion blending allows the generation of new motions by interpolation or transition between motion capture sequences which is widely accepted as standard technique in computer animation but traditional blending approaches let the user choose manually the transition time and duration this paper presents new motion blending method for smoothly blending between two motion capture clips and automatically selecting the transition time and duration to evaluate the effectiveness of the improved method we have done extensive experiments the experiment results show that the novel motion blending method is effective in smoothly blending between two motion sequences
concurrent garbage collection is highly attractive for real time systems because offloading the collection effort from the executing threads allows faster response allowing for extremely short deadlines at the microseconds level concurrent collectors also offer much better scalability over incremental collectors the main problem with concurrent real time collectors is their complexity the first concurrent real time garbage collector that can support fine synchronization stopless has recently been presented by pizlo et al in this paper we propose two additional and different algorithms for concurrent real time garbage collection clover and chicken both collectors obtain reduced complexity over the first collector stopless but need to trade benefit for it we study the algorithmic strengths and weaknesses of clover and chicken and compare them to stopless finally we have implemented all three collectors on the bartok compiler and runtime for and we present measurements to compare their efficiency and responsiveness
isa links are the core component of all ontologies and are organized into hierarchies of concepts in this paper we will first address the problem of an automatic help to build sound hierarchies dependencies called existence constraints are the foundation for the definition of normalized hierarchy of concepts in the first part of the paper algorithms are provided to obtain normalized hierarchy starting either from concepts or from instances using boolean functions the second part of the paper is devoted to the hierarchy maintenance automatically inserting merging or removing pieces of knowledge we also provide way to give synthetic views of the hierarchy
cache oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of multi level cache hierarchy regardless of the specifics cache size and cache line size of each level in this paper we describe cache oblivious sorting algorithms with optimal work optimal cache complexity and polylogarithmic depth using known mappings these lead to low cache complexities on shared memory multiprocessors with single level of private caches or single shared cache moreover the low cache complexities extend to shared memory multiprocessors with common configurations of multi level caches the key factor in the low cache complexity on multiprocessors is the low depth of the algorithms we propose
we define reactive simulatability for general asynchronous systems roughly simulatability means that real system implements an ideal system specification in way that preserves security in general cryptographic sense reactive means that the system can interact with its users multiple times eg in many concurrent protocol runs or multi round game in terms of distributed systems reactive simulatability is type of refinement that preserves particularly strong properties in particular confidentiality core feature of reactive simulatability is composability ie the real system can be plugged in instead of the ideal system within arbitrary larger systems this is shown in follow up papers and so is the preservation of many classes of individual security properties from the ideal to the real systems large part of this paper defines suitable system model it is based on probabilistic io automata pioa with two main new features one is generic distributed scheduling important special cases are realistic adversarial scheduling procedure call type scheduling among colocated system parts and special schedulers such as for fairness also in combinations the other is the definition of the reactive runtime via realization by turing machines such that notions like polynomial time are composable the simple complexity of the transition functions of the automata is not composable as specializations of this model we define security specific concepts in particular separation between honest users and adversaries and several trust models the benefit of io automata as the main model instead of only interactive turing machines as usual in cryptographic multi party computation is that many cryptographic systems can be specified with an ideal system consisting of only one simple deterministic io automaton without any cryptographic objects as many follow up papers show this enables the use of classic formal methods and automatic proof tools for proving larger distributed protocols and systems that use these cryptographic systems
the growing number of online accessible services call for effective techniques to support users in discovering selecting and aggregating services we present ws advisor framework for enabling users to capture and share task memories task memory represents knowledge eg context and user rating about services selection history for given task ws advisor provides declarative language that allows users to share task definitions and task memories with other users and communities the service selection component of this framework enables user agent to improve its service selection recommendations by leveraging task memories of other user agents with which the user share tasks in addition to the local task memories
the ability to locate network bottlenecks along end to end paths on the internet is of great interest to both network operators and researchers for example knowing where bottleneck links are network operators can apply traffic engineering either at the interdomain or intradomain level to improve routing existing tools either fail to identify the location of bottlenecks or generate large amount of probing packets in addition they often require access to both end points in this paper we present pathneck tool that allows end users to efficiently and accurately locate the bottleneck link on an internet path pathneck is based on novel probing technique called recursive packet train rpt and does not require access to the destination we evaluate pathneck using wide area internet experiments and trace driven emulation in addition we present the results of an extensive study on bottlenecks in the internet using carefully selected geographically diverse probing sources and destinations we found that pathneck can successfully detect bottlenecks for almost of the internet paths we probed we also report our success in using the bottleneck location and bandwidth bounds provided by pathneck to infer bottlenecks and to avoid bottlenecks in multihoming and overlay routing
nominal abstract syntax and higher order abstract syntax provide means for describing binding structure which is higher level than traditional techniques these approaches have spawned two different communities which have developed along similar lines but with subtle differences that make them difficult to relate the nominal abstract syntax community has devices like names freshness name abstractions with variable capture and the new quantifier whereas the higher order abstract syntax community has devices like lambda binders lambda conversion raising and the nabla quantifier this paper aims to unify these communities and provide concrete correspondence between their different devices in particular we develop semantics preserving translation from alpha prolog nominal abstract syntax based logic programming language to higher order abstract syntax based logic programming language we also discuss higher order judgments common and powerful tool for specifications with higher order abstract syntax and we show how these can be incorporated into this establishes as language with the power of higher order abstract syntax the fine grained variable control of nominal specifications and the desirable properties of higher order judgments
information propagation within the blogosphere is of much importance in implementing policies marketing research launching new products and other applications in this paper we take microscopic view of the information propagation pattern in blogosphere by investigating blog cascade affinity blog cascade is group of posts linked together discussing the same topic and cascade affinity refers to the phenomenon of blog’s inclination to join specific cascade we identify and analyze an array of features that may affect blogger’s cascade joining behavior and utilize these features to predict cascade affinity of blogs evaluated on real dataset consisting of posts our svm based prediction achieved accuracy of measured by our experiments also showed that among all features identified the number of friends was the most important factor affecting bloggers inclination to join cascades
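the prediction step described above can be pictured with a small scikit learn sketch that trains an svm on per blogger features such as the number of friends the feature names values and labels below are invented for illustration and do not come from the dataset used in the paper

    # toy svm sketch for cascade-affinity prediction; features and labels invented
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # one row per (blogger, cascade) pair:
    # [number of friends, friends already in cascade, posts on the cascade topic]
    X = np.array([[120, 6, 4], [5, 0, 0], [300, 11, 9],
                  [12, 1, 1], [80, 4, 3], [2, 0, 0]])
    y = np.array([1, 0, 1, 0, 1, 0])          # 1 = blogger joined the cascade

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    print(cross_val_score(clf, X, y, cv=3).mean())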
given multiple time sequences with missing values we propose dynammo which summarizes compresses and finds latent variables the idea is to discover hidden variables and learn their dynamics making our algorithm able to function even when there are missing values we performed experiments on both real and synthetic datasets spanning several megabytes including motion capture sequences and chlorine levels in drinking water we show that our proposed dynammo method can successfully learn the latent variables and their evolution can provide high compression for little loss of reconstruction accuracy can extract compact but powerful features for segmentation interpretation and forecasting has complexity linear in the duration of the sequences
email is no longer perceived as communication marvel but rather as constant source of information overload several studies have shown that accessing managing and archiving email threatens to affect users productivity while several strategies and tools have been proposed to assuage this burden none have attempted to empower users to fight the overload collaboratively we hypothesize that despite differences in email management practices and frequencies of filing among users there is some degree of similarity in the end product of the organizational structures reached by those working in close cooperative roles eg members of research group employees of an organization in this paper we describe system that enables collaborators to share their filing strategies among themselves tags applied by one user are suggested to other recipients of the same email thereby amortizing the cost of tagging and email management across all stakeholders we wish to examine if such system support for semi automated tagging reduces email overload for all users and whether it leads to overall time savings for an entire enterprise as network effects propagate over time
this paper introduces general framework for the use of translation probabilities in cross language information retrieval based on the notion that information retrieval fundamentally requires matching what the searcher means with what the author of document meant that perspective yields computational formulation that provides natural way of combining what have been known as query and document translation two well recognized techniques are shown to be special case of this model under restrictive assumptions cross language search results are reported that are statistically indistinguishable from strong monolingual baselines for both french and chinese documents
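to make the idea of combining translation and retrieval probabilities concrete here is a small sketch that scores a document for a query by marginalizing over candidate translations of each query term weighted by a unigram document language model the translation table the toy document and the absence of smoothing are simplifications and not the exact formulation of the paper

    # score a document for a query by summing translation probabilities over the
    # document language model; toy translation table, no smoothing
    from collections import Counter

    def clir_score(query_terms, doc_terms, trans_prob):
        counts = Counter(doc_terms)
        lm = {w: c / len(doc_terms) for w, c in counts.items()}
        score = 1.0
        for q in query_terms:
            score *= sum(trans_prob.get((q, d), 0.0) * p for d, p in lm.items())
        return score

    trans_prob = {("house", "maison"): 0.7, ("house", "domicile"): 0.2,
                  ("red", "rouge"): 0.9}
    doc = ["la", "maison", "rouge", "maison"]
    print(clir_score(["red", "house"], doc, trans_prob))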
valid group is defined as group of moving users that are within distance threshold from one another for at least minimum time duration unlike grouping of users determined by traditional clustering algorithms members of valid group are expected to stay close to one another during their movement each valid group suggests some social grouping that can be used in targeted marketing and social network analysis the existing valid group mining algorithms are designed to mine complete set of valid groups from time series of user location data known as the user movement database unfortunately there is considerable redundancy in the complete set of valid groups in this paper we therefore address this problem of mining the set of maximal valid groups we first extend our previous valid group mining algorithms to mine maximal valid groups leading to amg and vgmax algorithms we further propose the vgbk algorithm based on maximal clique enumeration to mine the maximal valid groups the performance results of these algorithms under different sets of mining parameters are also reported
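the validity condition itself is easy to state in code the brute force predicate below checks whether a candidate group of users stays within a distance threshold of one another for at least a minimum number of consecutive timestamps it is only the definition not the amg vgmax or vgbk mining algorithms and the trajectories are toy data

    # brute-force check of the valid-group definition: every pair stays within
    # max_dist for at least min_dur consecutive timestamps
    from itertools import combinations
    from math import dist

    def is_valid_group(group, traj, max_dist, min_dur):
        run = 0
        for t in range(len(traj[group[0]])):
            close = all(dist(traj[u][t], traj[v][t]) <= max_dist
                        for u, v in combinations(group, 2))
            run = run + 1 if close else 0
            if run >= min_dur:
                return True
        return False

    traj = {"u1": [(0, 0), (1, 0), (2, 0), (9, 9)],
            "u2": [(0, 1), (1, 1), (2, 1), (0, 0)]}
    print(is_valid_group(["u1", "u2"], traj, max_dist=1.5, min_dur=3))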
near duplicate web documents are abundant two such documents differ from each other in very small portion that displays advertisements for example such differences are irrelevant for web search so the quality of web crawler increases if it can assess whether newly crawled web page is near duplicate of previously crawled web page or not in the course of developing near duplicate detection system for multi billion page repository we make two research contributions first we demonstrate that charikar’s fingerprinting technique is appropriate for this goal second we present an algorithmic technique for identifying existing bit fingerprints that differ from given fingerprint in at most bit positions for small our technique is useful for both online queries single fingerprints and all batch queries multiple fingerprints experimental evaluation over real data confirms the practicality of our design
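a charikar style fingerprint and the hamming distance test can be sketched in a few lines the version below hashes raw tokens into a 64 bit simhash and flags two fingerprints as near duplicates when they differ in at most k bits the table based index the paper uses to find such matches at multi billion page scale is not shown and real systems typically fingerprint weighted shingles rather than single words

    # 64-bit simhash fingerprints plus a hamming-distance near-duplicate test
    import hashlib

    def simhash(tokens, bits=64):
        v = [0] * bits
        for tok in tokens:
            h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
            for i in range(bits):
                v[i] += 1 if (h >> i) & 1 else -1
        return sum(1 << i for i in range(bits) if v[i] > 0)

    def near_duplicate(f1, f2, k=3):
        return bin(f1 ^ f2).count("1") <= k

    a = simhash("the quick brown fox jumps over the lazy dog".split())
    b = simhash("the quick brown fox jumped over the lazy dog".split())
    print(bin(a ^ b).count("1"), near_duplicate(a, b))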
object relational database management systems allow knowledgeable users to define new data types as well as new methods operators for the types this flexibility produces an attendant complexity which must be handled in new ways for an object relational database management system to be efficient in this article we study techniques for optimizing queries that contain time consuming methods the focus of traditional query optimizers has been on the choice of join methods and orders selections have been handled by pushdown rules these rules apply selections in an arbitrary order before as many joins as possible using the assumption that selection takes no time however users of object relational systems can embed complex methods in selections thus selections may take significant amounts of time and the query optimization model must be enhanced in this article we carefully define query cost framework that incorporates both selectivity and cost estimates for selections we develop an algorithm called predicate migration and prove that it produces optimal plans for queries with expensive methods we then describe our implementation of predicate migration in the commercial object relational database management system illustra and discuss practical issues that affect our earlier assumptions we compare predicate migration to variety of simpler optimization techniques and demonstrate that predicate migration is the best general solution to date the alternative techniques we present may be useful for constrained workloads
current mmx like extensions provide mechanism for general purpose processors to meet the growing performance demand of multimedia applications however the computing performance of these extensions is often limited because they only operate on single data stream to overcome this obstacle this paper presents an architecture named multi streaming simd architecture that enables one simd instruction to simultaneously manipulate multiple data streams the proposed architecture is processor in memory like register file architecture including simd operating logics for general purpose processors to further extend current mmx like extensions to obtain high performance to efficiently and flexibly realize the proposed architecture an operation cell is designed by fusing the logic gates and the storage cells together the operation cells are then used to compose register file with the ability of performing simd operations called multimedia operation storage unit mosu further many mosus are used to compose multi streaming simd computing engine that can simultaneously manipulate multiple data streams and exploit the subword parallelisms of the elements in each data stream three instruction modes global coupling and isolated modes are defined for the mmx like extensions to modulate the amount of parallel data streams and to efficiently utilize the computation resources simulation results show that when the multi streaming simd architecture has four register mosus it provides factor of to performance improvement compared with intel’s mmx extensions on eleven multimedia kernels
chip multiprocessors are quickly gaining momentum in all segments of computing however the practical success of cmps strongly depends on addressing the difficulty of multithreaded application development to address this challenge it is necessary to co develop new cmp architecture with novel programming models currently architecture research relies on software simulators which are too slow to facilitate interesting experiments with cmp software without using small datasets or significantly reducing the level of detail in the simulated models an alternative to simulation is to exploit the rich capabilities of modern fpgas to create fpga based platforms for novel cmp research this paper presents atlas the first prototype for cmps with hardware support for transactional memory tm technology aiming to simplify parallel programming atlas uses the bee multi fpga board to provide system with powerpc cores that run at mhz and runs linux atlas provides significant benefits for cmp research such as performance improvement over software simulator and good visibility that helps with software tuning and architectural improvements in addition to presenting and evaluating atlas we share our observations about building fpga based framework for cmp research specifically we address issues such as overall performance challenges of mapping asic style cmp rtl on to fpgas software support the selection criteria for the base processor and the challenges of using pre designed ip libraries
we consider two tier content distribution system for distributing massive content consisting of an infrastructure content distribution network cdn and large number of ordinary clients the nodes of the infrastructure network form structured distributed hash table based dht peer to peer pp network each file is first placed in the cdn and possibly is replicated among the infrastructure nodes depending on its popularity in such system it is particularly pressing to have proper load balancing mechanisms to relieve server or network overload the subject of the paper is on popularity based file replication techniques within the cdn using multiple hash functions our strategy is to set aside large number of hash functions when the demand for file exceeds the overall capacity of the current servers previously unused hash function is used to obtain new node id where the file will be replicated the central problems are how to choose an unused hash function when replicating file and how to choose used hash function when requesting the file our solution to the file replication problem is to choose the unused hash function with the smallest index and our solution to the file request problem is to choose used hash function uniformly at random our main contribution is that we have developed set of distributed robust algorithms to implement the above solutions and we have evaluated their performance in particular we have analyzed random binary search algorithm for file request and random gap removal algorithm for failure recovery
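the two placement rules described above choosing the unused hash function with the smallest index when replicating and a used hash function uniformly at random when requesting are easy to sketch the hash family below is simulated by salting a single hash and the dht lookup is reduced to computing a node id so this illustrates only the rules not the full system or its failure recovery algorithms

    # replication uses the smallest-index unused hash function, requests pick a
    # used hash function uniformly at random; the dht itself is faked
    import hashlib, random

    def node_id(filename, i, ring_size=2 ** 32):
        # i-th member of the hash-function family, simulated by salting sha1
        digest = hashlib.sha1(f"{i}:{filename}".encode()).hexdigest()
        return int(digest, 16) % ring_size

    class ReplicaDirectory:
        def __init__(self):
            self.used = {}                       # filename -> hash functions in use

        def replicate(self, filename):
            r = self.used.get(filename, 0)
            self.used[filename] = r + 1
            return node_id(filename, r)          # new replica goes to index r

        def request(self, filename):
            r = self.used.get(filename, 1)
            return node_id(filename, random.randrange(r))

    d = ReplicaDirectory()
    print(d.replicate("movie.mp4"), d.replicate("movie.mp4"), d.request("movie.mp4"))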
the extensibility and evolution of network services and protocols had become major research issue in recent years the programmable and active network paradigms have been trying to solve the problems emanating from the immutable organization of network software layers by allowing arbitrary custom codes to be embedded inside network layers in this work we propose new approach for building extensible network systems to support cross layer optimization the fundamental idea is to perform simple light weight meta engineering on the classical osi protocols organization to make it interactive and transparent the protocols become interactive since they can provide event notification to service subscribers and they become transparent since they also allow controlled access to their state information actual protocol extensions or modifications can then be performed at the application space by what we call transientware modules this organization provides the infrastructure needed for easy and practical extensions of the current network services and it becomes much easier to address other difficult issues like security and flexibility we call this mechanism interactive transparent networking intran and we call the extended kernel intran enabled we have realized freebsd implementation of the extensible intran enabled kernel in this paper we present formal efsm based model for the proposed meta engineering and illustrate the principles through real example of tcp extension then we demonstrate how it can be used to realize equivalents of other protocol modifications by showing the intran model of snoop balakrishnan seshan katz improving reliable transport and handoff performance in cellular wireless networks acm wireless networks
tinkertype is pragmatic framework for compact and modular description of formal systems type systems operational semantics logics etc family of related systems is broken down into set of clauses ie individual inference rules and set of features controlling the inclusion of clauses in particular systems simple static checks are used to help maintain consistency of the generated systems we present tinkertype and its implementation and describe its application to two substantial repositories of typed lambda calculi the first repository covers broad range of typing features including subtyping polymorphism type operators and kinding computational effects and dependent types it describes both declarative and algorithmic aspects of the systems and can be used with our tool the tinkertype assembler to generate calculi either in the form of typeset collections of inference rules or as executable ml typecheckers the second repository addresses smaller collection of systems and provides modularized proofs of basic safety properties
finding nearly optimal optimization settings for modern compilers which can utilize large number of optimizations is combinatorially exponential problem in this paper we investigate whether in the presence of many optimization choices random generation of compiler settings can be used to obtain well performing compiler settings we apply this random generation of compiler setting to gcc which implements optimizations our results show that this technique can be used to obtain setting which exceeds the performance of the default optimization settings and for each program in the specint benchmark suite we also apply this technique to obtain general setting which is suitable for many programs this setting performs equally well as the default settings using significantly less options finally we compare our setting with the default settings of gcc and analyze the difference
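the random sampling idea can be sketched as a small driver that compiles a benchmark with a random subset of optimization flags times the resulting binary and keeps the best setting seen the flag list benchmark file and number of trials below are placeholders rather than the configuration studied in the paper

    # random search over gcc optimization flags; flag list and benchmark are
    # placeholders, timings use wall-clock runs of the compiled binary
    import random, subprocess, time

    FLAGS = ["-funroll-loops", "-fomit-frame-pointer", "-finline-functions",
             "-ftree-vectorize", "-fno-strict-aliasing", "-fpeel-loops"]

    def evaluate(flags, src="benchmark.c"):
        subprocess.run(["gcc", "-O2", *flags, src, "-o", "bench"], check=True)
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        return time.perf_counter() - start

    best_time, best_flags = float("inf"), []
    for _ in range(50):
        flags = [f for f in FLAGS if random.random() < 0.5]
        t = evaluate(flags)
        if t < best_time:
            best_time, best_flags = t, flags
    print(best_time, best_flags)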
program bugs remain major challenge for software developers and various tools have been proposed to help with their localization and elimination most present day tools are based either on over approximating techniques that can prove safety but may report false positives or on under approximating techniques that can find real bugs but with possible false negatives in this paper we propose dual static analysis that is based on only over approximation its main novelty is to concurrently derive conditions that lead to either success or failure outcomes and thus we provide comprehensive solution for both proving safety and finding real program bugs we have proven the soundness of our approach and have implemented prototype system that is validated by set of experiments
randomized linear expected time algorithm for computing the zonoid depth of point with respect to fixed dimensional point set is presented zonoid depth is described in dyckerhoff koshevoy mosler zonoid data depth theory and computation in prat ed compstat proceedings in computational statistics physica verlag heidelberg and in mosler multivariate dispersion central regions and depth the lift zonoid approach lecture notes in statistics springer verlag new york
we evaluated two strategies for alleviating working memory load for users of voice interfaces presenting fewer options per turn and providing confirmations forty eight users booked appointments using nine different dialogue systems which varied in the number of options presented and the confirmation strategy used participants also performed four cognitive tests and rated the usability of each dialogue system on standardised questionnaire when systems presented more options per turn and avoided explicit confirmation subdialogues both older and younger users booked appointments more quickly without compromising task success users with lower information processing speed were less likely to remember all relevant aspects of the appointment working memory span did not affect appointment recall older users were slightly less satisfied with the dialogue systems than younger users we conclude that the number of options is less important than an accurate assessment of the actual cognitive demands of the task at hand
we present an architecture for integrating set of natural language processing nlp techniques with wiki platform this entails support for adding organizing and finding content in the wiki we perform comprehensive analysis of how nlp techniques can support the user interaction with the wiki using an intelligent interface to provide suggestions the architecture is designed to be deployed with any existing wiki platform especially those used in corporate environments we implemented prototype integrating the nlp techniques keyphrase extraction and text segmentation as well as an improved search engine the prototype is integrated with two widely used wiki platforms media wiki and twiki
vulnerability driven filtering of network data can offer fast and easy to deploy alternative or intermediary to software patching as exemplified in shield in this paper we take shield’s vision to new domain inspecting and cleansing not just static content but also dynamic content the dynamic content we target is the dynamic html in web pages which have become popular vector for attacks the key challenge in filtering dynamic html is that it is undecidable to statically determine whether an embedded script will exploit the browser at run time we avoid this undecidability problem by rewriting web pages and any embedded scripts into safe equivalents inserting checks so that the filtering is done at run time the rewritten pages contain logic for recursively applying run time checks to dynamically generated or modified web content based on known vulnerabilities we have built and evaluated browsershield system that performs this dynamic instrumentation of embedded scripts and that admits policies for customized run time actions like vulnerability driven filtering
meshes obtained from laser scanner data often contain topological noise due to inaccuracies in the scanning and merging process this topological noise complicates subsequent operations such as remeshing parameterization and smoothing we introduce an approach that removes unnecessary nontrivial topology from meshes using local wave front traversal we discover the local topologies of the mesh and identify features such as small tunnels we then identify non separating cuts along which we cut and seal the mesh reducing the genus and thus the topological complexity of the mesh
this paper describes the synchronization and communication primitives of the cray te multiprocessor shared memory system scalable to processors we discuss what we have learned from the td project the predecessor to the te and the rationale behind changes made for the te we include performance measurements for various aspects of communication and synchronization the te augments the memory interface of the dec microprocessor with large set of explicitly managed external registers these registers are used as the source or target for all remote communication they provide highly pipelined interface to global memory that allows dozens of requests per processor to be outstanding through registers the te provides rich set of atomic memory operations and flexible user level messaging facility the te also provides set of virtual hardware barrier eureka networks that can be arbitrarily embedded into the torus interconnect
tools and technologies aiming to support electronic teamwork are constantly improving however the landscape of supporting technologies and tools is fragmented and comprehensive solutions that fully realize the promises of electronic collaboration remain an elusive goal in this paper we describe and relate the capabilities of workflow systems groupware tools and content management system by introducing framework that provides common concepts and reference architectures to evaluate these technologies from the perspective of teamwork and collaboration we discuss the requirements of specific application domain and describe which requirements each of these technologies can address in addition we identify currently unsupported requirements and propose corresponding areas for additional research
although number of normalized edit distances presented so far may offer good performance in some applications none of them can be regarded as genuine metric between strings because they do not satisfy the triangle inequality given two strings and over finite alphabet this paper defines new normalized edit distance between and as simple function of their lengths and and the generalized levenshtein distance gld between them the new distance can be easily computed through gld with complexity proportional to the product of the two string lengths and it is metric valued in under the condition that the weight function is metric over the set of elementary edit operations with all costs of insertions deletions having the same weight experiments using the aesa algorithm in handwritten digit recognition show that the new distance can generally provide similar results to some other normalized edit distances and may perform slightly better if the triangle inequality is violated in particular data set
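for the special case of unit edit costs the distance described above reduces to a simple function of the plain levenshtein distance and the two string lengths the sketch below uses the normalization 2 gld divided by the sum of the lengths plus gld which stays between 0 and 1 the general weighted form in the paper is not reproduced here

    # unit-cost sketch: levenshtein distance normalized as 2*g / (|x| + |y| + g)
    def levenshtein(x, y):
        prev = list(range(len(y) + 1))
        for i, cx in enumerate(x, 1):
            cur = [i]
            for j, cy in enumerate(y, 1):
                cur.append(min(prev[j] + 1,               # deletion
                               cur[j - 1] + 1,            # insertion
                               prev[j - 1] + (cx != cy))) # substitution
            prev = cur
        return prev[-1]

    def normalized_edit_distance(x, y):
        if not x and not y:
            return 0.0
        g = levenshtein(x, y)
        return 2.0 * g / (len(x) + len(y) + g)

    print(normalized_edit_distance("kitten", "sitting"))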
the microsoft sensecam is small lightweight wearable camera used to passively capture photos and other sensor readings from user’s day to day activities it can capture up to images per day equating to almost million images per year it is used to aid memory by creating personal multimedia lifelog or visual recording of the wearer’s life however the sheer volume of image data captured within visual lifelog creates number of challenges particularly for locating relevant content within this work we explore the applicability of semantic concept detection method often used within video retrieval on the novel domain of visual lifelogs concept detector models the correspondence between low level visual features and high level semantic concepts such as indoors outdoors people buildings etc using supervised machine learning by doing so it determines the probability of concept’s presence we apply detection of everyday semantic concepts on lifelog collection composed of sensecam images from users the results were then evaluated on subset of images to determine the precision for detection of each semantic concept and to draw some interesting inferences on the lifestyles of those users we additionally present future applications of concept detection within the domain of lifelogging
in emerging applications such as location based services sensor monitoring and biological management systems the values of the database items are naturally imprecise for these uncertain databases an important query is the probabilistic nearest neighbor query pnn which computes the probabilities of sets of objects for being the closest to given query point the evaluation of this query can be both computationally and expensive since there is an exponentially large number of object sets and numerical integration is required often user may not be concerned about the exact probability values for example he may only need answers that have sufficiently high confidence we thus propose the probabilistic threshold nearest neighbor query pnn which returns sets of objects that satisfy the query with probabilities higher than some threshold three steps are proposed to handle this query efficiently in the first stage objects that cannot constitute an answer are filtered with the aid of spatial index the second step called probabilistic candidate selection significantly prunes number of candidate sets to be examined the remaining sets are sent for verification which derives the lower and upper bounds of answer probabilities so that candidate set can be quickly decided on whether it should be included in the answer we also examine spatially efficient data structures that support these methods our solution can be applied to uncertain data with arbitrary probability density functions we have also performed extensive experiments to examine the effectiveness of our methods
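the filtering candidate selection and verification pipeline above is what makes the query practical a much cruder way to see what the query asks for is the monte carlo baseline below which samples every uncertain object's position repeatedly and keeps the objects whose estimated probability of being nearest to the query exceeds the threshold this is not the paper's method only a naive reference point with made up uncertainty regions

    # naive monte carlo estimate of each uncertain object's probability of being
    # the nearest neighbour of the query; keep objects above the threshold
    import random
    from math import dist

    def threshold_pnn(samplers, query, threshold, trials=20000):
        wins = {name: 0 for name in samplers}
        for _ in range(trials):
            positions = {name: draw() for name, draw in samplers.items()}
            nearest = min(positions, key=lambda n: dist(positions[n], query))
            wins[nearest] += 1
        return {n: w / trials for n, w in wins.items() if w / trials >= threshold}

    samplers = {                      # toy uncertain objects: uniform boxes
        "a": lambda: (random.uniform(0, 1), random.uniform(0, 1)),
        "b": lambda: (random.uniform(0.5, 1.5), random.uniform(0.5, 1.5)),
        "c": lambda: (random.uniform(3, 4), random.uniform(3, 4)),
    }
    print(threshold_pnn(samplers, query=(0, 0), threshold=0.2))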
key characteristic of today’s high performance computing systems is physically distributed memory which makes the efficient management of locality essential for taking advantage of the performance enhancements offered by these architectures currently the standard technique for programming such systems involves the extension of traditional sequential programming languages with explicit message passing libraries in processor centric model for programming and execution it is commonly understood that this programming paradigm results in complex brittle and error prone programs because of the way in which algorithms and communication are inextricably interwoven this paper describes new approach to locality awareness which focuses on data distributions in high productivity languages data distributions provide an abstract specification of the partitioning of large scale data collections across memory units supporting coarse grain parallel computation and locality of access at high level of abstraction our design which is based on new programming language called chapel is motivated by the need to provide high productivity paradigm for the development of efficient and reusable parallel code we present an object oriented framework that allows the explicit specification of the mapping of elements in collection to memory units the control of the arrangement of elements within such units the definition of sequential and parallel iteration over collections and the formulation of specialized allocation policies as required for advanced applications the result is concise high productivity programming model that separates algorithms from data representation and enables reuse of distributions allocation policies and data structures
we describe method for plausible interpolation of images with wide range of applications like temporal up sampling for smooth playback of lower frame rate video smooth view interpolation and animation of still images the method is based on the intuitive idea that given pixel in the interpolated frames traces out path in the source images therefore we simply move and copy pixel gradients from the input images along this path key innovation is to allow arbitrary asymmetric transition points where the path moves from one image to the other this flexible transition preserves the frequency content of the originals without ghosting or blurring and maintains temporal coherence perhaps most importantly our framework makes occlusion handling particularly simple the transition points allow for matches away from the occluded regions at any suitable point along the path indeed occlusions do not need to be handled explicitly at all in our initial graph cut optimization moreover simple comparison of computed path lengths after the optimization allows us to robustly identify occluded regions and compute the most plausible interpolation in those areas finally we show that significant improvements are obtained by moving gradients and using poisson reconstruction
fetch performance is very important factor because it effectively limits the overall processor performance however there is little performance advantage in increasing front end performance beyond what the back end can consume for each processor design the target is to build the best possible fetch engine for the required performance level fetch engine will be better if it provides better performance but also if it takes fewer resources requires less chip area or consumes less power in this paper we propose novel fetch architecture based on the execution of long streams of sequential instructions taking maximum advantage of code layout optimizations we describe our architecture in detail and show that it requires less complexity and resources than other high performance fetch architectures like the trace cache while providing high fetch performance suitable for wide issue super scalar processors our results show that using our fetch architecture and code layout optimizations obtains higher performance than the ev fetch architecture and higher than the ftb architecture using state of the art branch predictors while being only slower than the trace cache even in the absence of code layout optimizations fetching instruction streams is still faster than the ev and only slower than the trace cache fetching instruction streams effectively exploits the special characteristics of layout optimized codes to provide high fetch performance close to that of trace cache but has much lower cost and complexity similar to that of basic block architecture
most documents available over the web conform to the html specification such documents are hierarchically structured in nature the existing data models for the web either fail to capture the hierarchical structure within the documents or can only provide very low level representation of such hierarchical structure how to represent and query html documents at higher level is an important issue in this paper we first propose novel conceptual model for html this conceptual model has only few simple constructs but is able to represent the complex hierarchical structure within html documents at level that is close to human conceptualization visualization of the documents we also describe how to convert html documents based on this conceptual model using the conceptual model and conversion method one can capture the essence ie semistructure of html documents in natural and simple way based on this conceptual model we then present rule based language to query html documents over the internet this language provides simple but very powerful way to query both intra document structures and inter document structures and allows the query results to be restructured being rule based it naturally supports negation and recursion and therefore is more expressive than sql based languages logical semantics is also provided
physical design of modern systems on chip is extremely challenging such digital integrated circuits often contain tens of millions of logic gates intellectual property blocks embedded memories and custom register transfer level rtl blocks at current and future technology nodes their power and performance are impacted more than ever by the placement of their modules however our experiments show that traditional techniques for placement and floorplanning and existing academic tools cannot reliably solve the placement task to study this problem we identify particularly difficult industrial instances and reproduce the failures of existing tools by modifying pre existing benchmark instances furthermore we propose algorithms that facilitate placement of these difficult instances empirically our techniques consistently produce legal placements and on instances where comparison is possible reduce wirelength by over capo and over patoma the pre existing tools that most frequently produce legal placements in our experiments
finding typical instances is an effective approach to understand and analyze large data sets in this paper we apply the idea of typicality analysis from psychology and cognition science to database query answering and study the novel problem of answering top typicality queries we model typicality in large data sets systematically to answer questions like who are the top most typical nba players the measure of simple typicality is developed to answer questions like who are the top most typical guards distinguishing guards from other players the notion of discriminative typicality is proposed computing the exact answer to top typicality query requires quadratic time which is often too costly for online query answering on large databases we develop series of approximation methods for various situations the randomized tournament algorithm has linear complexity though it does not provide theoretical guarantee on the quality of the answers the direct local typicality approximation using vp trees provides an approximation quality guarantee vp tree can be exploited to index large set of objects then typicality queries can be answered efficiently with quality guarantees by tournament method based on local typicality tree data structure an extensive performance study using two real data sets and series of synthetic data sets clearly show that top typicality queries are meaningful and our methods are practical
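the exact quadratic computation that the approximation methods above try to avoid is straightforward to write down the sketch below scores each object by its average gaussian kernel similarity to all other objects and returns the top k the bandwidth and the toy data are arbitrary and the tournament and vp tree based approximations are not shown

    # exact quadratic simple typicality: average gaussian-kernel similarity to
    # every other object, then take the top-k
    from math import exp, dist

    def top_k_typical(points, k=3, h=1.0):
        def typicality(i):
            sims = [exp(-dist(points[i], q) ** 2 / (2 * h * h))
                    for j, q in enumerate(points) if j != i]
            return sum(sims) / len(sims)
        order = sorted(range(len(points)), key=typicality, reverse=True)
        return [points[i] for i in order[:k]]

    data = [(0, 0), (0.2, 0.1), (0.1, -0.1), (3, 3), (0.05, 0.2)]
    print(top_k_typical(data, k=2))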
we present sampling strategy and rendering framework for intersectable models whose surface is implicitly defined by black box intersection test that provides the location and normal of the closest intersection of ray with the surface to speed up image generation despite potentially slow intersection tests our method exploits spatial coherence by adjusting the sampling resolution in image space to the surface variation in object space the result is set of small view dependent bilinear surface approximations which are rendered as quads using conventional graphics hardware the advantage of this temporary rendering representation is two fold first rendering is performed on the gpu leaving cpu time for ray intersection computation as the number of primitives is typically small complex per vertex or per fragment programs can be used to achieve variety of rendering effects second bilinear surface approximations are derived from the geometry and can be reused in other views here graphics hardware is exploited to determine the subset of image space in need of re sampling we demonstrate our system by ray casting an implicit surface defined from point samples for which current ray surface intersection computations are usually too slow to generate images at interactive rates
this paper presents semantics of self adjusting computation and proves that the semantics is correct and consistent the semantics integrates change propagation with the classic idea of memoization to enable reuse of computations under mutation to memory during evaluation reuse of computation via memoization triggers change propagation that adjusts the reused computation to reflect the mutated memory since the semantics combines memoization and change propagation it involves both non determinism and mutation our consistency theorem states that the non determinism is not harmful any two evaluations of the same program starting at the same state yield the same result our correctness theorem states that mutation is not harmful self adjusting programs are consistent with purely functional programming we formalized the semantics and its meta theory in the lf logical framework and machine checked the proofs in twelf
we present the mathematical foundations and the design methodology of the contract based model developed in the framework of the speeds project speeds aims at developing methods and tools to support speculative design design methodology in which distributed designers develop different aspects of the overall system in concurrent but controlled way our generic mathematical model of contract supports this style of development this is achieved by focusing on behaviors by supporting the notion of rich component where diverse functional and non functional aspects of the system can be considered and combined by representing rich components via their set of associated contracts and by formalizing the whole process of component composition
material design is the process by which artists specify the reflectance properties of surface such as its diffuse color and specular roughness we present user study to evaluate the relative benefits of different material design interfaces focusing on novice users since they stand to gain the most from intuitive interfaces specifically we investigate the editing of the parameters of analytic bidirectional reflectance distribution functions brdfs using three interface paradigms physical sliders by which users set the parameters of analytic brdf models such as diffuse albedo and specular roughness perceptual sliders by which users set perceptually inspired parameters such as diffuse luminance and gloss contrast and image navigation by which material variations are displayed in arrays of image thumbnails and users make edits by selecting them we investigate two design tasks precise adjustment and artistic exploration we collect objective and subjective data finding that subjects can perform equally well with physical and perceptual sliders as long as the interface responds interactively image navigation performs worse than the other interfaces on precise adjustment tasks but excels at aiding in artistic exploration we find that given enough time novices can perform relatively complex material editing tasks with little training and most novices work similarly to one another
real world use of rdf requires the ability to transparently represent and explain metadata associated with rdf triples for example when rdf triples are extracted automatically by information extraction programs there is need to represent where the triples came from what their temporal validity is and how certain we are that the triple is correct today there is no theoretically clean and practically scalable mechanism that spans these different needs reification is the only solution proposed to date and its implementations have been ugly in this paper we present annotated rdf or ardf for short in which rdf triples are annotated by members of partially ordered set with bottom element that can be selected in any way desired by the user we present formal declarative semantics model theory for annotated rdf and develop algorithms to check consistency of ardf theories and to answer queries to ardf theories we show that annotated rdf supports users who need to think about the uncertainty temporal aspects and provenance of the rdf triples in an rdf database we develop prototype ardf implementation and show that our algorithms work efficiently even on real world data sets containing over million triples
data aggregation in geographic information systems gis is desirable feature spatial data are integrated in olap engines for this purpose however the development and operation of those systems is still complex task due to methodologies followed there are some ad hoc solutions that deal only with isolated aspects and do not provide developer and analyst with an intuitive integrated and standard framework for designing all relevant parts to overcome these problems we have defined model driven approach to accomplish geographic data warehouse gdw development then we have defined data model required to implement and query spatial data its modeling is defined and implemented by using an extension of uml metamodel and it is also formalized by using ocl language in addition the proposal has been verified against example scenario with sample data sets for this purpose we have accomplished developing tool based on eclipse platform and mda standard the great advantage of this solution is that developers can directly include spatial data at conceptual level while decision makers can also conceptually make geographic queries without being aware of logical details
the sharing of tacit knowledge is strategic factor for the success of software process from number of perspectives training project assimilation and reducing noise in knowledge transfer pair programming is supposed to be practice suitable for this purpose unfortunately the building of tacit knowledge is determined by factors that are difficult to isolate and capture because they concern personal attitude and capability thus we have focused on the possible causes forming the individual ability that can be isolated and studied such as the individual education background we have applied the practice of working in pairs to the design phase we have made an experiment and replica in academic environment in order to understand the relationship between the building of knowledge through the practice and the individual background in this paper we discuss the replica and compare the results with the first experiment’s ones
in this paper we propose diagonal ordering new technique for nearest neighbor knn search in high dimensional space our solution is based on data clustering and particular sort order of the data points which is obtained by slicing each cluster along the diagonal direction in this way we are able to transform the high dimensional data points into one dimensional space and index them using tree structure knn search is then performed as sequence of one dimensional range searches advantages of our approach include irrelevant data points are eliminated quickly without extensive distance computations the index structure can effectively adapt to different data distributions on line query answering is supported which is natural byproduct of the iterative searching algorithm we conduct extensive experiments to evaluate the diagonal ordering technique and demonstrate its effectiveness
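a rough sketch of the one dimensional transform is shown below points are assigned to their nearest cluster centre keyed by the projection of their offset from the centre onto the diagonal direction and a query scans a window of candidates around its own key before ranking them by exact distance the b+ tree indexing and the pruning rules that make the real search exact are omitted so this version is only approximate and the data are synthetic

    # simplified diagonal-ordering sketch: cluster id plus projection onto the
    # diagonal direction gives a one-dimensional key; knn scans a candidate
    # window around the query key (the real method adds a b+-tree and pruning)
    import numpy as np

    def build_index(points, centres):
        d = points.shape[1]
        diag = np.ones(d) / np.sqrt(d)
        cid = np.argmin(((points[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        keys = (points - centres[cid]) @ diag
        order = np.lexsort((keys, cid))              # sort by (cluster, key)
        return cid[order], keys[order], points[order]

    def knn(index, centres, query, k, window=50):
        cid, keys, pts = index
        diag = np.ones(len(query)) / np.sqrt(len(query))
        qc = int(np.argmin(((centres - query) ** 2).sum(-1)))
        qkey = (query - centres[qc]) @ diag
        members = np.where(cid == qc)[0]
        pos = int(np.searchsorted(keys[members], qkey))
        cand = pts[members[max(0, pos - window):pos + window]]
        return cand[np.argsort(np.linalg.norm(cand - query, axis=1))[:k]]

    rng = np.random.default_rng(0)
    points = rng.random((1000, 8))
    centres = points[rng.choice(1000, size=4, replace=False)]
    index = build_index(points, centres)
    print(knn(index, centres, rng.random(8), k=3))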
collaborative learning environments provide set of tools for students acting in groups to interact and accomplish an assigned task in this kind of systems students are free to express and communicate with each other which usually lead to collaboration and communication problems that may require the intervention of teacher in this article we introduce an intelligent agent approach to assist teachers through monitoring participations made by students within collaborative distance learning environment detecting conflictive situations in which teacher’s intervention may be necessary high precision rates achieved on conflict detection scenarios suggest great potential for the application of the proposed rule based approach for providing personalized assistance to teachers during the development of group works
power delivery electricity consumption and heat management are becoming key challenges in data center environments several past solutions have individually evaluated different techniques to address separate aspects of this problem in hardware and software and at local and global levels unfortunately there has been no corresponding work on coordinating all these solutions in the absence of such coordination these solutions are likely to interfere with one another in unpredictable and potentially dangerous ways this paper seeks to address this problem we make two key contributions first we propose and validate power management solution that coordinates different individual approaches using simulations based on server traces from nine different real world enterprises we demonstrate the correctness stability and efficiency advantages of our solution second using our unified architecture as the base we perform detailed quantitative sensitivity analysis and draw conclusions about the impact of different architectures implementations workloads and system design choices
we present the architecture of an end to end semantic search engine that uses graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the web in particular we study distributed indexing methods for graph structured data and parallel query evaluation methods on cluster of computers we evaluate the system on dataset with million statements collected from the web and provide scale up experiments on billion synthetically generated statements
the software architecture of most systems is usually described informally and diagrammatically by means of boxes and lines in order for these descriptions to be meaningful the diagrams are understood by interpreting the boxes and lines in specific conventionalized ways the informal imprecise nature of these interpretations has number of limitations in this article we consider these conventionalized interpretations as architectural styles and provide formal framework for their uniform definition in addition to providing template for precisely defining new architectural styles this framework allows for analysis within and between different architectural styles
the techniques for making decisions ie branching play central role in complete methods for solving structured csp instances in practice there are cases when sat solvers benefit from limiting the set of variables the solver is allowed to branch on to so called input variables theoretically however restricting branching to input variables implies super polynomial increase in the length of the optimal proofs for dpll without clause learning and thus input restricted dpll cannot polynomially simulate dpll in this paper we settle the case of dpll with clause learning surprisingly even with unlimited restarts input restricted clause learning dpll cannot simulate dpll even without clause learning the opposite also holds and hence dpll and input restricted clause learning dpll are polynomially incomparable additionally we analyse the effect of input restricted branching on clause learning solvers in practice with various structural real world benchmarks
we propose two new improvements for bagging methods on evolving data streams recently two new variants of bagging were proposed adwin bagging and adaptive size hoeffding tree asht bagging asht bagging uses trees of different sizes and adwin bagging uses adwin as change detector to decide when to discard underperforming ensemble members we improve adwin bagging using hoeffding adaptive trees trees that can adaptively learn from data streams that change over time to speed up the time for adapting to change of adaptive size hoeffding tree asht bagging we add an error change detector for each classifier we test our improvements by performing an evaluation study on synthetic and real world datasets comprising up to ten million examples
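a minimal sketch of the overall pattern described above, online bagging over a stream with one error change detector per ensemble member, is given below; the base learner is a toy online perceptron and the detector is a simple windowed error-rate test, not adwin or a hoeffding adaptive tree, and all constants are illustrative.

```python
# illustrative sketch: oza-style online bagging where each member is reset
# when its own error change detector fires.
import numpy as np
from collections import deque

class OnlinePerceptron:
    def __init__(self, dim):
        self.w = np.zeros(dim + 1)
    def predict(self, x):
        return int(np.dot(self.w, np.append(x, 1.0)) >= 0.0)
    def learn(self, x, y):                        # y in {0, 1}
        if self.predict(x) != y:
            self.w += (2 * y - 1) * np.append(x, 1.0)

class SimpleDriftDetector:
    """flags a change when the recent error rate is much worse than the
    long-run error rate (a crude stand-in for a real detector)."""
    def __init__(self, window=200, slack=0.15):
        self.recent = deque(maxlen=window)
        self.errors, self.count, self.slack = 0, 0, slack
    def add(self, error):
        self.recent.append(error)
        self.errors += error
        self.count += 1
        if self.count < 2 * self.recent.maxlen:
            return False
        return (np.mean(self.recent) - self.errors / self.count) > self.slack

class BaggingStream:
    def __init__(self, n_members, dim, rng):
        self.members = [OnlinePerceptron(dim) for _ in range(n_members)]
        self.detectors = [SimpleDriftDetector() for _ in range(n_members)]
        self.dim, self.rng = dim, rng
    def predict(self, x):
        votes = [m.predict(x) for m in self.members]
        return int(sum(votes) * 2 >= len(votes))
    def learn(self, x, y):
        for i in range(len(self.members)):
            m, d = self.members[i], self.detectors[i]
            if d.add(int(m.predict(x) != y)):     # detector fired: reset this member
                m, d = OnlinePerceptron(self.dim), SimpleDriftDetector()
                self.members[i], self.detectors[i] = m, d
            for _ in range(self.rng.poisson(1)):  # oza-style online bagging weight
                m.learn(x, y)

rng = np.random.default_rng(1)
ens = BaggingStream(n_members=10, dim=5, rng=rng)
correct = 0
for t in range(20000):
    concept = 1.0 if t < 10000 else -1.0          # abrupt concept drift halfway
    x = rng.normal(size=5)
    y = int(concept * x[0] + 0.1 * rng.normal() > 0)
    correct += int(ens.predict(x) == y)
    ens.learn(x, y)
print("prequential accuracy:", correct / 20000)
```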
we decompose the stereo matching problem into three sub problems in this work disparity estimation for non occlusion regions and occlusion detection disparity estimation for occlusion regions and surface model for the disparity map three step procedure is proposed to solve them sequentially at the first step we perform an initial matching and develop new graph model using the ordering and segmentation constraints to improve disparity values in non occlusion regions and detect occlusion regions at the second step we determine disparity values in occlusion regions based on global optimization since the conventional segmentation based stereo matching is not efficient in highly slanted or curved objects we propose post processing technique for disparity map enhancement using three dimensional geometric structure the proposed three step stereo matching procedure yields excellent quantitative and qualitative results with middlebury data sets
we present simple and computationally efficient algorithm for approximating catmull clark subdivision surfaces using minimal set of bicubic patches for each quadrilateral face of the control mesh we construct geometry patch and pair of tangent patches the geometry patches approximate the shape and silhouette of the catmull clark surface and are smooth everywhere except along patch edges containing an extraordinary vertex where the patches are only c0 continuous to make the patch surface appear smooth we provide pair of tangent patches that approximate the tangent fields of the catmull clark surface these tangent patches are used to construct continuous normal field through their cross product for shading and displacement mapping using this bifurcated representation we are able to define an accurate proxy for catmull clark surfaces that is efficient to evaluate on next generation gpu architectures that expose programmable tessellation unit
active rules may interact in complex and sometimes unpredictable ways thus possibly yielding infinite rule executions by triggering each other indefinitely this paper presents analysis techniques focused on detecting termination of rule execution we describe an approach which combines static analysis of rule set at compile time and detection of endless loops during rule processing at runtime the compile time analysis technique is based on the distinction between mutual triggering and mutual activation of rules this distinction motivates the introduction of two graphs defining rule interaction called triggering and activation graphs respectively this analysis technique allows us to identify reactive behaviors which are guaranteed to terminate and reactive behaviors which may lead to infinite rule processing when termination cannot be guaranteed at compile time it is crucial to detect infinite rule executions at runtime we propose technique for identifying loops which is based on recognizing that given situation has already occurred in the past and therefore will occur an infinite number of times in the future this technique is potentially very expensive therefore we explain how it can be implemented in practice with limited computational effort particular use of this technique allows us to develop cycle monitors which check that critical rule sequences detected at compile time do not repeat forever we bridge compile time analysis to runtime monitoring by showing techniques based on the result of rule analysis for the identification of rule sets that can be independently monitored and for the optimal selection of cycle monitors
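the compile-time side of the analysis described above can be pictured with a small sketch: build a triggering graph over the rule set and report the rules that lie on no cycle as guaranteed to terminate. the rule definitions are invented for illustration and the refinement with a separate activation graph is omitted.

```python
# illustrative sketch: rule r1 points to r2 when an action of r1 can raise an
# event that triggers r2; rules on no cycle of this graph cannot run forever.
from itertools import product

rules = {
    # rule name: (events it is triggered by, events its action may raise)
    "r_audit":   ({"update_salary"}, {"insert_log"}),
    "r_bonus":   ({"insert_log"},    {"update_salary"}),
    "r_archive": ({"delete_row"},    set()),
}

def triggering_graph(rules):
    edges = {name: set() for name in rules}
    for (a, (_, raises)), (b, (triggers, _)) in product(rules.items(), rules.items()):
        if raises & triggers:
            edges[a].add(b)
    return edges

def rules_on_cycles(edges):
    """rules reachable from themselves via at least one edge (may loop)."""
    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    return {r for r in edges if r in reachable(r)}

graph = triggering_graph(rules)
risky = rules_on_cycles(graph)
print("may loop:", sorted(risky))
print("guaranteed to terminate:", sorted(set(rules) - risky))
```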
the quality of dependable systems ds is characterized by number of non functional properties eg performance reliability availability etc assessing the ds quality against these properties imposes the application of quality analysis and evaluation quality analysis consists of checking analytically solving or simulating models of the system which are specified using formalisms like csp ccs markov chains petri nets queuing nets etc however developers are usually not keen on using such formalisms for modeling and evaluating ds quality on the other hand they are familiar with using architecture description languages and object oriented notations for building ds models based on the previous and to render the use of traditional quality analysis techniques more tractable this paper proposes an architecture based environment that facilitates the specification and quality analysis of ds at the architectural level
we propose simple but effective upsampling method for automatically enhancing the image video resolution while preserving the essential structural information the main advantage of our method lies in feedback control framework which faithfully recovers the high resolution image information from the input data without imposing additional local structure constraints learned from other examples this makes our method independent of the quality and number of the selected examples which are issues typical of learning based algorithms while producing high quality results without observable unsightly artifacts another advantage is that our method naturally extends to video upsampling where the temporal coherence is maintained automatically finally our method runs very fast we demonstrate the effectiveness of our algorithm by experimenting with different image video data
the issue of data quality is gaining importance as individuals as well as corporations are increasingly relying on multiple often external sources of data to make decisions traditional query systems do not factor in data quality considerations in their response studies into the diverse interpretations of data quality indicate that fitness for use is fundamental criterion in the evaluation of data quality in this paper we present step methodology that includes user preferences for data quality in the response of queries from multiple sources user preferences are modelled using the notion of preference hierarchies we have developed an sql extension to facilitate the specification of preference hierarchies further we will demonstrate through experimentation how our approach produces an improved result in query response
accommodating the uncertain latency of load instructions is one of the most vexing problems in in order microarchitecture design and compiler development compilers can generate schedules with high degree of instruction level parallelism but cannot effectively accommodate unanticipated latencies incorporating traditional out of order execution into the microarchitecture hides some of this latency but redundantly performs work done by the compiler and adds additional pipeline stages although effective techniques such as prefetching and threading have been proposed to deal with anticipable long latency misses the shorter more diffuse stalls due to difficult to anticipate first or second level misses are less easily hidden on in order architectures this paper addresses this problem by proposing microarchitectural technique referred to as two pass pipelining wherein the program executes on two in order back end pipelines coupled by queue the advance pipeline executes instructions greedily without stalling on unanticipated latency dependences executing independent instructions while otherwise blocking instructions are deferred the backup pipeline allows concurrent resolution of instructions that were deferred in the other pipeline resulting in the absorption of shorter misses and the overlap of longer ones this paper argues that this design is both achievable and good use of transistor resources and shows results indicating that it can deliver significant speedups for in order processor designs
mobile ad hoc networks or manets are flexible networks that are expected to support emerging group applications such as spontaneous collaborative activities and rescue operations in order to provide secrecy to these applications common encryption key has to be established between group members of the application this task is critical in manets because these networks have no fixed infrastructure frequent node and link failures and dynamic topology the proposed approaches to cope with these characteristics aim to avoid centralized solutions and organize the network into clusters however the clustering criteria used in the literature are not always adequate for key management and security in this paper we propose group key management framework based on trust oriented clustering scheme we show that trust is relevant clustering criterion for group key management in manets trust information enforces authentication and is disseminated by the mobility of nodes furthermore it helps to evict malicious nodes from the multicast session even if they are authorized members of the group simulation results show that our solution is efficient and typically adapted to mobility of nodes
this paper examines the performance of broadcast communication on multicomputer networks unlike many existing works this study considers number of key factors and properties including scalability parallelism and routing scheme that could greatly affect the service provided by the network to broadcast messages both deterministic and adaptive routing schemes have been included in our analysis unlike the previous works this study considers the issue of broadcast latency at both the network and node levels across different traffic scenarios extensive simulation results show that both our suggested adaptive and deterministic algorithms exhibit superior performance characteristics under wide range of traffic conditions
intrinsic complexity is used to measure the complexity of learning areas limited by broken straight lines called open semi hulls and intersections of such areas any strategy learning such geometrical concepts can be viewed as sequence of primitive basic strategies thus the length of such sequence together with the complexities of the primitive strategies used can be regarded as the complexity of learning the concepts in question we obtained the best possible lower and upper bounds on learning open semi hulls as well as matching upper and lower bounds on the complexity of learning intersections of such areas surprisingly upper bounds in both cases turn out to be much lower than those provided by natural learning strategies another surprising result is that learning intersections of open semi hulls turns out to be easier than learning open semi hulls themselves
the ssapre algorithm for performing partial redundancy elimination based entirely on ssa form is presented the algorithm is formulated based on new conceptual framework the factored redundancy graph for analyzing redundancy and represents the first sparse approach to the classical problem and on methods for its solution with the algorithm description theorems and their proofs are given showing that the algorithm produces the best possible code by the criteria of computational optimality and lifetime optimality of the introduced temporaries in addition to the base algorithm practical implementation of ssapre that exhibits additional compile time efficiencies is described in closing measurement statistics are provided that characterize the instances of the partial redundancy problem from set of benchmark programs and compare optimization time spent by an implementation of ssapre against classical partial redundancy elimination implementation the data lend insight into the nature of partial redundancy elimination and demonstrate the expediency of this new approach
we present generative method for reconstructing human motion from single images and monocular image sequences inadequate observation information in monocular images and the complicated nature of human motion make the human pose reconstruction challenging in order to mine more prior knowledge about human motion we extract the motion subspace by performing conventional principal component analysis pca on small sample set of motion capture data in doing so we also reduce the problem dimensionality so that the generative pose recovery can be performed more effectively and the extracted subspace is naturally hierarchical this allows us to explore the solution space efficiently we design an annealed genetic algorithm aga and hierarchical annealed genetic algorithm haga for human motion analysis that searches the optimal solutions by utilizing the hierarchical characteristics of state space in tracking scenario we embed the evolutionary mechanism of aga into the framework of evolution strategy for adapting the local characteristics of fitness function we adopt the robust shape contexts descriptor to construct the matching function our methods are demonstrated in different motion types and different image sequences results of human motion estimation show that our novel generative method can achieve viewpoint invariant pose reconstruction
user level virtual memory vm primitives are used in many different application domains including distributed shared memory persistent objects garbage collection and checkpointing unfortunately vm primitives only allow traps to be handled at the granularity of fixed sized pages defined by the operating system and architecture in many cases this results in size mismatch between pages and application defined objects that can lead to significant loss in performance in this paper we describe the design and implementation of library that provides at the granularity of application defined regions the same set of services that are commonly available at page granularity using vm primitives applications that employ the interface of this library called the region trap library rtl can create and use multiple objects with different levels of protection ie invalid read only or read write that reside on the same virtual memory page and trap only on read write references to objects in an invalid state or write references to objects in read only state all other references to these objects proceed at hardware speeds benchmarks of an implementation on five different os architecture combinations are presented along with case study using region trapping within distributed shared memory dsm system to implement region based version of the lazy release consistency lrc coherence protocol together the benchmark results and the dsm case study suggest that region trapping mechanisms provide feasible region granularity alternative for application domains that commonly rely on page based virtual memory primitives
the race condition checker rccjava uses formal type system to statically identify potential race conditions in concurrent java programs but it requires programmer supplied type annotations this paper describes type inference algorithm for rccjava due to the interaction of parameterized classes and dependent types this type inference problem is np complete this complexity result motivates our new approach to type inference which is via reduction to propositional satisfiability this paper describes our type inference algorithm and its performance on programs of up to lines of code
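a minimal sketch of inference-by-sat follows: each candidate annotation becomes a boolean variable, typing constraints become clauses, and any satisfying assignment yields a consistent annotation. the variables and constraints below are invented for illustration, they are not rccjava's actual rules, and a real tool would call an off-the-shelf sat solver instead of the brute-force search used here.

```python
# illustrative sketch: reduce annotation inference to propositional satisfiability.
from itertools import product

# variables: "field f is guarded by lock L" style propositions
variables = ["f_guarded_by_l1", "f_guarded_by_l2", "g_guarded_by_l1"]

# clauses in cnf, each literal is (variable, polarity)
clauses = [
    [("f_guarded_by_l1", True), ("f_guarded_by_l2", True)],    # f needs some guard
    [("f_guarded_by_l1", False), ("g_guarded_by_l1", True)],   # if f uses l1, so must g
    [("f_guarded_by_l2", False)],                               # l2 not held at a write to f
]

def satisfy(variables, clauses):
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[v] == pol for v, pol in clause) for clause in clauses):
            return model
    return None

print(satisfy(variables, clauses))
# -> {'f_guarded_by_l1': True, 'f_guarded_by_l2': False, 'g_guarded_by_l1': True}
```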
this paper studies the greedy ensemble selection family of algorithms for ensembles of regression models these algorithms search for the globally best subset of regressors by making local greedy decisions for changing the current subset we abstract the key points of the greedy ensemble selection algorithms and present general framework which is applied to an application domain with important social and commercial value water quality prediction
this paper presents the situation manager tool that includes both language and an efficient runtime execution mechanism aimed at reducing the complexity of active applications this tool follows the observation that in many cases there is gap between current tools that enable one to react to single event following the eca event condition action paradigm and the reality in which single event may not require any reaction however the reaction should be given to patterns over the event history the concept of presented in this paper extends the concept of in its expressive power flexibility and usability this paper motivates the work surveys other efforts in this area and discusses both the language and the execution model
this paper proposes transformation based approach to design constraint based analyses for java at coarser granularity in this approach we design less or equally precise but more efficient version of an original analysis by transforming the original construction rules into new ones as applications of this rule transformation we provide two instances of analysis design by rule transformation the first one designs sparse version of class analysis for java and the second one deals with sparse exception analysis for java both are designed based on method level and the sparse exception analysis is shown to give the same information for every method as the original analysis
wireless lans have been densely deployed in many urban areas contention among nearby wlans is location sensitive which makes some hosts much more capable than others to obtain the channel for their transmissions another reality is that wireless hosts use different transmission rates to communicate with the access points due to attenuation of their signals we show that location sensitive contention aggravates the throughput anomaly caused by different transmission rates it can cause throughput degradation and host starvation this paper studies the intriguing interaction between location sensitive contention and time fairness across contending wlans achieving time fairness across multiple wlans is very difficult problem because the hosts may perceive very different channel conditions and they may not be able to communicate and coordinate their operations due to the disparity between the interference range and the transmission range in this paper we design mac layer time fairness solution based on two novel techniques channel occupancy adaptation which applies aimd on the channel occupancy of each flow and queue spreading which ensures that all hosts and only those hosts in saturated channel detect congestion and reduce their channel occupancies in response we show that these two techniques together approximate the generic adaptation algorithm for proportional fairness
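a minimal sketch of aimd applied to per-flow channel occupancy, the idea behind channel occupancy adaptation above, is shown below; each sender additively grows the fraction of air time it uses and multiplicatively backs off whenever the channel is saturated. the constants and the congestion signal are illustrative, not the paper's exact mechanism.

```python
# illustrative sketch: additive increase / multiplicative decrease on air-time shares.
ALPHA, BETA, CAPACITY = 0.01, 0.5, 1.0

def aimd_rounds(occupancy, rounds=2000):
    for _ in range(rounds):
        congested = sum(occupancy) > CAPACITY      # stand-in for queue-spreading feedback
        for i, share in enumerate(occupancy):
            occupancy[i] = share * BETA if congested else share + ALPHA
    return occupancy

# two hosts starting from very unequal shares converge toward equal air time
print(aimd_rounds([0.9, 0.05]))
```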
one of the most frequently studied problems in the context of information dissemination in communication networks is the broadcasting problem in this paper we consider radio broadcasting in random geometric graphs in which nodes are placed uniformly at random in [0, sqrt(n)]^2 and there is directed edge from node u to node v in the corresponding graph iff the distance between u and v is smaller than the transmission radius assigned to u throughout this paper we consider the distributed case ie each node is only aware apart from of its own coordinates and its own transmission radius and we assume that the transmission radii of the nodes vary according to power law distribution first we consider the model in which any node is assigned transmission radius r >= r_min according to probability density function proportional to 1/r^alpha where alpha is constant and r_min = delta sqrt(log n) with delta being large constant for this case we develop simple radio broadcasting algorithm which has the running time log log n with high probability and show that this result is asymptotically optimal then we consider the model in which any node is assigned transmission radius according to the probability density function where is drawn from the same range as before and is constant since this graph is usually not strongly connected we assume that the message which has to be spread to all nodes of the graph is placed initially in one of the nodes of the giant component we show that there exists fully distributed randomized algorithm which disseminates the message in log log n steps with high probability where denotes the diameter of the giant component of the graph our results imply that by setting the transmission radii of the nodes according to power law distribution one can design energy efficient radio networks with low average transmission radius in which broadcasting can be performed exponentially faster than in the extensively studied case where all nodes have the same transmission power
routers must do best matching prefix lookup for every packet solutions for gigabit speeds are well known as internet link speeds grow higher we seek scalable solution whose speed scales with memory speeds while allowing large prefix databases in this paper we show that providing such solution requires careful attention to memory allocation and pipelining this is because fast lookups require on chip or off chip sram which is limited by either expense or manufacturing process we show that doing so while providing guarantees on the number of prefixes supported requires new algorithms and the breaking down of traditional abstraction boundaries between hardware and software we introduce new problem specific memory allocators that have provable memory utilization guarantees that can reach this is in contrast to all standard allocators that can only guarantee utilization when the requests can come in the range an optimal version of our algorithm requires new but feasible sram memory design that allows shifted access in addition to normal word access our techniques generalize to other ip lookup schemes and to other state lookups besides prefix lookup
sybil attacks have been shown to be unpreventable except under the protection of vigilant central authority we use an economic analysis to show quantitatively that some applications and protocols are more robust against the attack than others in our approach for each distributed application and an attacker objective there is critical value that determines the cost effectiveness of the attack sybil attack is worthwhile only when the critical value is exceeded by the ratio of the value of the attacker’s goal to the cost of identities we show that for many applications successful sybil attacks may be expensive even when the sybil attack cannot be prevented specifically we propose the use of recurring fee as deterrent against the sybil attack as detailed example we look at four variations of the sybil attack against recurring fee based onion routing anonymous routing network and quantify its vulnerability
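a small worked example of the cost-effectiveness test described above: the attack pays off only when the ratio of the attacker's value for the goal to the per-identity cost exceeds the application's critical value, and a recurring fee raises that cost over time. all numbers are illustrative, not taken from the paper.

```python
# illustrative arithmetic for the critical-value test.
def attack_worthwhile(goal_value, identity_cost, critical_value):
    return goal_value / identity_cost > critical_value

goal_value = 10_000.0          # attacker's value for achieving the goal (hypothetical)
critical_value = 50.0          # property of the protocol / application (hypothetical)
one_off_cost = 20.0            # one-time cost per identity
recurring_fee, months = 20.0, 12

print(attack_worthwhile(goal_value, one_off_cost, critical_value))             # True
print(attack_worthwhile(goal_value, recurring_fee * months, critical_value))   # False
```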
the damas milner type inference algorithm commonly known as algorithm w is at the core of all ml type checkers although the algorithm is known to have poor worst case behavior in practice well engineered type checkers will run in approximately linear time to achieve this efficiency implementations need to improve on algorithm w’s method of scanning the complete type environment to determine whether type variable can be generalized at let binding following suggestion of damas most ml type checkers use an alternative method based on ranking unification variables to track their position in the type environment here we formalize two such ranking systems one based on lambda depth used in the sml nj compiler and the other based on let depth used in ocaml for instance each of these systems is formalized both with and without the value restriction and they are proved correct relative to the classic algorithm w our formalizations of the various algorithms use simple abstract machines that are similar to small step evaluation semantics
automatic paraphrasing is an important component in many natural language processing tasks in this article we present new parallel corpus with paraphrase annotations we adopt definition of paraphrase based on word alignments and show that it yields high inter annotator agreement as kappa is suited to nominal data we employ an alternative agreement statistic which is appropriate for structured alignment tasks we discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically eg by measuring precision recall and and also in developing linguistically rich paraphrase models based on syntactic structure
energy is precious resource in wireless sensor networks as sensor nodes are typically powered by batteries with high replacement cost this paper presents esense an energy efficient stochastic sensing framework for wireless sensor platforms esense is node level framework that utilizes knowledge of the underlying data streams as well as application data quality requirements to conserve energy on sensor node esense employs stochastic scheduling algorithm to dynamically control the operating modes of the sensor node components this scheduling algorithm enables an adaptive sampling strategy that aggressively conserves power by adjusting sensing activity to the application requirements using experimental results obtained on power tossim with real world data trace we demonstrate that our approach reduces energy consumption by while providing strong statistical guarantees on data quality
web applications are complex software artefacts whose creation and maintenance is not feasible without abstractions or models many special purpose languages are used today as notations for these models we show that functional programming languages can be used as modelling languages offering substantial benefits the precision and expressive power of functional languages helps in developing concise and maintainable specifications we demonstrate our approach with the help of simple example web site using haskell as the implementation language
we review ideas about the relationship between qualitative description of local image structure and quantitative description based on responses to family of linear filters we propose sequence of three linking hypotheses the first the feature hypothesis is that qualitative descriptions arise from category system on filter response space the second the icon hypothesis is that the partitioning into categories of filter response space is determined by system of iconic images one associated with each point of the space the third the texton hypothesis is that the correct images to play the role of icons are those that are the most likely explanations of vector of filter responses we present results in support of these three hypotheses including new results on first order structure
we introduce bilingually motivated word segmentation approach to languages where word boundaries are not orthographically marked with application to phrase based statistical machine translation pb smt our approach is motivated by the insight that pb smt systems can be improved by optimizing the input representation to reduce the predictive power of translation models we firstly present an approach to optimize the existing segmentation of both source and target languages for pb smt and demonstrate the effectiveness of this approach using chinese english mt task that is to measure the influence of the segmentation on the performance of pb smt systems we report percent relative increase in bleu score and consistent increase according to other metrics we then generalize this method for chinese word segmentation without relying on any segmenters and show that using our segmentation pb smt can achieve more consistent state of the art performance across two domains there are two main advantages of our approach first of all it is adapted to the specific translation task at hand by taking the corresponding source target language into account second this approach does not rely on manually segmented training data so that it can be automatically adapted for different domains
the visual hull summarizes the relations between an object and its silhouettes and shadows this paper develops the theory of the visual hull of general piecewise smooth objects as those used in cad applications complete catalogue of the nine types of ruled surfaces that are possible boundaries of the visual hull of these objects is derived the construction of the visual hull is simplified by detailed analysis that allows pruning and trimming many surfaces not relevant for particular object an algorithm for computing the visual hull is presented together with several examples constructed with commercial cad package the theory developed includes as particular cases the previous approaches to the computation of the visual hull
developers use class diagrams to describe the architecture of their programs intensively class diagrams represent the structure and global behaviour of programs they show the programs classes and interfaces and their relationships of inheritance instantiation use association aggregation and composition class diagrams could provide useful data during programs maintenance however they often are obsolete and imprecise they do not reflect the real implementation and behaviour of programs we propose reverse engineering tool suite ptidej to build precise class diagrams from java programs with respect to their implementation and behaviour we describe static and dynamic models of java programs and algorithms to analyse these models and to build class diagrams in particular we detail algorithms to infer use association aggregation and composition relationships because these relationships do not have precise definitions we show that class diagrams obtained semi automatically are similar to those obtained manually and more precise than those provided usually demonstration applet of the ptidej tool suite is provided also the latest version of the ptidej tool suite is available at wwwyann gaelgueheneucnet work
in the literature proposals can be found to use hypermedia for requirements engineering the major technical and commercial constraints for wide spread application are removed now but there is little knowledge generally available yet on how exactly such approaches can be usefully applied in industrial practice or what the advantages and issues to be solved are so we report on case study of using hypermedia real time in the requirements phase of an important real world project inside our environment where several ways of applying hypermedia were tried with varying success since the resulting hypermedia repository can be fully automatically exported into web representation several versions were made available on the intranet at different stages while this case study may serve as data point in the space of such applications we also discuss reflections on this experience with the motivation of helping practitioners apply hypermedia successfully in requirements engineering in nutshell this paper presents some experience from using hypermedia in requirements engineering practice and critical discussion of advantages and issues involved
previous works in computer architecture have mostly neglected revenue and or profit key factors driving any design decision in this paper we evaluate architectural techniques to optimize for revenue profit the continual trend of technology scaling and sub wavelength lithography has caused transistor feature sizes to shrink into the nanoscale range as result the effects of process variations on critical path delay and chip yields have amplified common concept to remedy the effects of variations is speed binning by which chips from single batch are rated by discrete range of frequencies and sold at different prices an efficient binning distribution thus decides the profitability of the chip manufacturer we propose and evaluate cache redundancy scheme called substitute cache which allows the chip manufacturers to modify the number of chips in different bins particularly this technique introduces small fully associative array associated with each cache way to replicate the data elements that will be stored in the high latency lines and hence can be effectively used to boost up the overall chip yield and also shift the chip binning distribution towards higher frequencies we also develop models based on linear regression and neural networks to accurately estimate the chip prices from their architectural configurations using these estimation models we find that our substitute cache scheme can potentially increase the revenue for the batch of chips by as much as
aggregating spatial objects is necessary step in generating spatial data cubes to support roll up drill down operations current approaches face performance bottleneck issues when attempting to dynamically aggregate geometries for large set of spatial data we observe that changing the resolution of region is reflective of the fact that the precision of spatial data can be changed to certain extent without compromising its usefulness moreover most spatial datasets are stored at much higher resolutions than are necessary for some applications the existing approaches which aggregate objects at base resolution often results in processing bottleneck due to extraneous in this paper we develop new aggregation methodology that can significantly reduce retrieval costs and improve overall performance by utilising multiresolution data storage and retrieval techniques topological inconsistencies that may arise during resolution change which are not handled by current amalgamation techniques are identified by factoring these issues into the amalgamation query processing the retrieval loads can be further reduced with guaranteed topological correctness experimental results illustrate significant savings in data retrieval and overall processing time of dynamic aggregation
we propose pipelined architecture to accelerate high quality global illumination ray tracing can be done in real time so the main challenge is how to combine the visibility information from ray tracing with the global illumination information from photon tracing our architecture is based on reverse photon mapping which under reasonable assumptions is algorithmically faster than photon mapping without sacrificing versatility or visual quality furthermore reverse photon mapping exposes fine grain data objects photons which can be efficiently pipelined through our architecture for very high throughput because photon mapping is bandwidth limited we use cache behavior and bandwidths to measure the effectiveness of our approach simulations indicate that this architecture will eventually be able to render high quality global illumination in real time we believe that fine grain pipelining is powerful tool that will be necessary to achieve real time photorealistic rendering
information derived from relational databases is routinely used for decision making however little thought is usually given to the quality of the source data its impact on the quality of the derived information and how this in turn affects decisions to assess quality one needs framework that defines relevant metrics that constitute the quality profile of relation and provides mechanisms for their evaluation we build on quality framework proposed in prior work and develop quality profiles for the result of the primitive relational operations difference and union these operations have nuances that make both the classification of the resulting records as well as the estimation of the different classes quite difficult to address and very different from that for other operations we first determine how tuples appearing in the results of these operations should be classified as accurate inaccurate or mismember and when tuples that should appear do not called incomplete in the result although estimating the cardinalities of these subsets directly is difficult we resolve this by decomposing the problem into sequence of drawing processes each of which follows hyper geometric distribution finally we discuss how decisions would be influenced based on the resulting quality profiles
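the "sequence of drawing processes" idea above can be illustrated with a small sketch: when tuples are drawn without replacement from a relation whose quality profile is known, the number of accurate, inaccurate or mismember tuples in the draw follows a hypergeometric distribution, whose mean and variance have closed forms. the profile numbers below are invented for illustration and the sketch does not reproduce the paper's full decomposition for difference and union.

```python
# illustrative sketch: closed-form mean and variance of a hypergeometric draw
# per quality class of a relation.
def hypergeom_mean_var(population, successes, draws):
    mean = draws * successes / population
    var = (draws * successes / population
           * (population - successes) / population
           * (population - draws) / (population - 1))
    return mean, var

profile = {"accurate": 900, "inaccurate": 70, "mismember": 30}   # tuples per class
population = sum(profile.values())
draws = 100                     # tuples contributed to the operation's result

for cls, count in profile.items():
    mean, var = hypergeom_mean_var(population, count, draws)
    print(f"{cls}: expected {mean:.1f} tuples (variance {var:.2f})")
```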
new solutions have been proposed to address problems with the internet’s interdomain routing protocol bgp before their deployment validation of incremental performance gains and backwards compatibility is necessary for this task the internet’s large size and complexity make all techniques but simulation infeasible when performing large scale network simulations memory requirements for routing table storage can become limiting factor this work uses model reduction to mitigate this problem with reduction defined in terms of the number of routers our framework uses path properties specific to interdomain routing to define the conditions of path preserving scale down transformation for implementation vertex contraction and deletion were used to remove routers from preliminary nominal network model vertex contraction was seen to violate the conditions of the transformation small subgraph from measured topology is used for experimental validation routing tables are compared to show equivalence under the model reduction
recent work has shown equivalences between various type systems and flow logics ideally the translations upon which such equivalences are based should be faithful in the sense that information is not lost in round trip translations from flows to types and back or from types to flows and back building on the work of nielson nielson and of palsberg pavlopoulou we present the first faithful translations between class of finitary polyvariant flow analyses and type system supporting polymorphism in the form of intersection and union types additionally our flow type correspondence solves several open problems posed by palsberg pavlopoulou it expresses call string based polyvariance such as cfa as well as argument based polyvariance it enjoys subject reduction property for flows as well as for types and it supports flow oriented perspective rather than type oriented one
many real time systems must control their cpu utilizations in order to meet end to end deadlines and prevent overload utilization control is particularly challenging in distributed real time systems with highly unpredictable workloads and large number of end to end tasks and processors this paper presents the decentralized end to end utilization control deucon algorithm which can dynamically enforce the desired utilizations on multiple processors in such systems in contrast to centralized control schemes adopted in earlier works deucon features novel decentralized control structure that requires only localized coordination among neighbor processors deucon is systematically designed based on recent advances in distributed model predictive control theory both control theoretic analysis and simulations show that deucon can provide robust utilization guarantees and maintain global system stability despite severe variations in task execution times furthermore deucon can effectively distribute the computation and communication cost to different processors and tolerate considerable communication delay between local controllers our results indicate that deucon can provide scalable and robust utilization control for large scale distributed real time systems executing in unpredictable environments
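a minimal sketch of the decentralized control structure described above is given below: every processor runs its own small feedback controller that nudges the rates of the tasks passing through it toward a local utilization set point, so coordination happens only through the shared end-to-end tasks rather than a central controller. this is a plain proportional controller, not deucon's model-predictive formulation, and the task placement, gains and execution times are illustrative.

```python
# illustrative sketch: per-processor feedback control of end-to-end task rates.
import random

SET_POINT, GAIN, STEPS = 0.7, 0.3, 300
# task -> list of (processor, nominal execution time per job)
tasks = {"t1": [(0, 0.02), (1, 0.03)],
         "t2": [(1, 0.02), (2, 0.04)],
         "t3": [(0, 0.03), (2, 0.02)]}
rates = {t: 10.0 for t in tasks}                  # jobs per second, the actuator

def utilization(proc, noise):
    u = 0.0
    for t, placement in tasks.items():
        for p, exec_time in placement:
            if p == proc:
                u += rates[t] * exec_time * noise[t]
    return u

for step in range(STEPS):
    noise = {t: random.uniform(0.8, 1.4) for t in tasks}   # unpredictable execution times
    for proc in range(3):
        error = SET_POINT - utilization(proc, noise)
        for t, placement in tasks.items():                  # adjust only locally hosted tasks
            if any(p == proc for p, _ in placement):
                rates[t] = max(0.1, rates[t] + GAIN * error * rates[t])

print({p: round(utilization(p, {t: 1.0 for t in tasks}), 2) for p in range(3)})
print({t: round(r, 1) for t, r in rates.items()})
```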
efficient architecture exploration and design of application specific instruction set processors asips requires retargetable software development tools in particular compilers that can be quickly adapted to new architectures widespread approach is to model the target architecture in dedicated architecture description language adl and to generate the tools automatically from the adl specification for compiler generation however most existing systems are limited either by the manual retargeting effort or by redundancies in the adl models that lead to potential inconsistencies we present new approach to retargetable compilation based on the lisa adl with instruction semantics that minimizes redundancies while simultaneously achieving high degree of automation the key of our approach is to generate the mapping rules needed in the compiler’s code selector from the instruction semantics information we describe the required analysis and generation techniques and present experimental results for several embedded processors
we develop general model to estimate the throughput and goodput between arbitrary pairs of nodes in the presence of interference from other nodes in wireless network our model is based on measurements from the underlying network itself and is thus more accurate than abstract models of rf propagation such as those based on distance the seed measurements are easy to gather requiring only measurements in an n node network compared to existing measurement based models our model advances the state of the art in three important ways first it goes beyond pairwise interference and models interference among an arbitrary number of senders second it goes beyond broadcast transmissions and models the more common case of unicast transmissions third it goes beyond homogeneous nodes and models the general case of heterogeneous nodes with different traffic demands and different radio characteristics using simulations and measurements from two different wireless testbeds we show that the predictions of our model are accurate in wide range of scenarios
this paper proposes method to recognize scene categories using bags of visual words obtained by hierarchically partitioning the input images into subregions specifically for each subregion the textons distribution and the extension of the corresponding subregion are taken into account the bags of visual words computed on the subregions are weighted and used to represent the whole scene the classification of scenes is carried out by support vector machine nearest neighbor algorithm and similarity measure based on bhattacharyya coefficient are used to retrieve from the scene database those that contain similar visual content to given scene used as query experimental tests using fifteen different scene categories show that the proposed approach achieves good performances with respect to the state of the art methods
the computer and communication systems that office workers currently use tend to interrupt at inappropriate times or unduly demand attention because they have no way to determine when an interruption is appropriate sensor based statistical models of human interruptibility offer potential solution to this problem prior work to examine such models has primarily reported results related to social engagement but it seems that task engagement is also important using an approach developed in our prior work on sensor based statistical models of human interruptibility we examine task engagement by studying programmers working on realistic programming task after examining many potential sensors we implement system to log low level input events in development environment we then automatically extract features from these low level event logs and build statistical model of interruptibility by correctly identifying situations in which programmers are non interruptible and minimizing cases where the model incorrectly estimates that programmer is non interruptible we can support reduction in costly interruptions while still allowing systems to convey notifications in timely manner
we explore the use of space time cuts to smoothly transition between stochastic mesh animation clips involving numerous deformable mesh groups while subject to physical constraints these transitions are used to construct mesh ensemble motion graphs for interactive data driven animation of high dimensional mesh animation datasets such as those arising from expensive physical simulations of deformable objects blowing in the wind we formulate the transition computation as an integer programming problem and introduce novel randomized algorithm to compute transitions subject to geometric nonpenetration constraints we present examples for several physically based motion datasets with real time display and optional interactive control over wind intensity via transitions between wind levels we discuss challenges and opportunities for future work and practical application
we propose practical defect prediction approach for companies that do not track defect related data specifically we investigate the applicability of cross company cc data for building localized defect predictors using static code features firstly we analyze the conditions where cc data can be used as is these conditions turn out to be quite few then we apply principles of analogy based learning ie nearest neighbor nn filtering to cc data in order to fine tune these models for localization we compare the performance of these models with that of defect predictors learned from within company wc data as expected we observe that defect predictors learned from wc data outperform the ones learned from cc data however our analyses also yield defect predictors learned from nn filtered cc data with performance close to but still not better than wc data therefore we perform final analysis for determining the minimum number of local defect reports in order to learn wc defect predictors we demonstrate in this paper that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected quickly within few months hence for companies with no local defect data we recommend two phase approach that allows them to employ the defect prediction process instantaneously in phase one companies should use nn filtered cc data to initiate the defect prediction process and simultaneously start collecting wc local data once enough wc data is collected ie after few months organizations should switch to phase two and use predictors learned from wc data
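a minimal sketch of the nearest-neighbour filtering step described above is shown below: for every local (wc) module we keep its k nearest cross-company (cc) rows in the static code feature space and train the defect predictor only on that filtered pool. the synthetic features and the tiny gaussian naive bayes are illustrative, not the paper's datasets or exact learner.

```python
# illustrative sketch: nn-filter cc data toward the wc feature space, then train.
import numpy as np

def nn_filter(cc_X, wc_X, k=10):
    """indices of cc rows that are among the k nearest of any wc row."""
    keep = set()
    for x in wc_X:
        d = np.linalg.norm(cc_X - x, axis=1)
        keep.update(np.argsort(d)[:k].tolist())
    return np.array(sorted(keep))

def fit_gnb(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6, len(Xc) / len(X))
    return params

def predict_gnb(params, X):
    scores, classes = [], []
    for c, (mu, var, prior) in params.items():
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append(ll + np.log(prior))
        classes.append(c)
    return np.array(classes)[np.stack(scores).argmax(axis=0)]

rng = np.random.default_rng(2)
cc_X = rng.normal(size=(2000, 8)) + rng.integers(0, 3, size=(2000, 1))  # mixed companies
cc_y = (cc_X[:, 0] + cc_X[:, 1] > 2.5).astype(int)                      # defect-prone flag
wc_X = rng.normal(size=(60, 8)) + 1.0                                   # local modules
wc_y = (wc_X[:, 0] + wc_X[:, 1] > 2.5).astype(int)

idx = nn_filter(cc_X, wc_X, k=10)
model = fit_gnb(cc_X[idx], cc_y[idx])
print("accuracy on local data:", (predict_gnb(model, wc_X) == wc_y).mean())
```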
considerable research has been done on the content based multimedia delivery and access in distributed data repositories as noted in the literature there is always trade off between multimedia quality and access speed in addition the overall performance is greatly determined by the distribution of the multimedia data in this article an unsupervised multimedia semantic integration approach for distributed infrastructure the distributed semantic indexing dsi is presented that addresses both the data quality and search performance with the ability of summarizing content information and guiding data distribution the proposed approach is distinguished by logic based representation and concise abstraction of the semantic contents of multimedia data which are further integrated to form general overview of multimedia data repository called content signature application of linguistic relationships to construct hierarchical metadata based on the content signatures allowing imprecise queries and achieving the optimal performance in terms of search cost the fundamental structure of the proposed model is presented the proposed scheme has been simulated and the simulation results are analyzed and compared against several other approaches that have been advocated in the literature
no matter how the image is computationally produced screen based graphics are still typically presented on two dimensional surface like screen wall or electronic paper the limit of manipulating objects in two dimensional graphical display is where each pixel is an independent object these two observations motivate the development of calligraphic video textures that can be manipulated by the user using intuitions about physical material such as water ink or smoke we argue for phenomenological approach to complex visual interaction based on corporeal kinesthetic intuition and provide an effective way to provide such texture based interaction using computational physics motivating application is to create palpable highly textured video that can be projected as structured light fields in responsive environments
it is becoming increasingly common to see computers with two or even three monitors being used today people seem to like having more display space available and intuition tells us that the added space should be beneficial to work little research has been done to examine the effects and potential utility of multiple monitors for work on everyday tasks with common applications however we compared how people completed trip planning task that involved different applications and included interjected interruptions when they worked on computer with one monitor as compared to computer with two monitors results showed that participants who used the computer with two monitors performed the task set faster and with less workload and they also expressed subjective preference for the multiple monitor computer
we consider general form of information sources consisting of set of objects classified by terms arranged in taxonomy the query based access to the information stored in sources of this kind is plagued with uncertainty due among other things to the possible linguistic mismatch between the user and the object classification to overcome this uncertainty in all situations in which the user is not finding the desired information and is not willing or able to state new query the study proposes to extend the classification in way that is as reasonable as possible with respect to the original one by equating reasonableness with logical implication the sought extension turns out to be an explanation of the classification captured by abduction the problem of query evaluation on information sources extended in this way is studied and polynomial time algorithm is provided for the general case in which no hypothesis is made on the structure of the taxonomy the algorithm is successively specialized on most common kind of information sources namely sources whose taxonomy can be represented as directed acyclic graph it is shown that query evaluation on extended sources is easier for this kind of sources finally two applications of the method are presented which capture very important aspects of information access information browsing and query result ranking
multimedia content with rich internet applications using dynamic html dhtml and adobe flash is now becoming popular in various websites however visually impaired users cannot deal with such content due to audio interference with the speech from screen readers and intricate structures strongly optimized for sighted users we have been developing an accessibility internet browser for multimedia aibrowser to address these problems the browser has two novel features non visual multimedia audio controls and alternative user interfaces using external metadata first by using the aibrowser users can directly control the audio from the embedded media with fixed shortcut keys therefore this allows blind users to increase or decrease the media volume and pause or stop the media to handle conflicts between the audio of the media and the speech from the screen reader second the aibrowser can provide an alternative simplified user interface suitable for screen readers by using external metadata which can even be applied to dynamic content such as dhtml and flash in this paper we discuss accessibility problems with multimedia content due to streaming media and the dynamic changes in such content and explain how the aibrowser addresses these problems by describing non visual multimedia audio controls and external metadata based alternative user interfaces the evaluation of the aibrowser was conducted by comparing it to jaws one of the most popular screen readers on three well known multimedia content intensive websites the evaluation showed that the aibrowser made the content that was inaccessible with jaws relatively accessible by using the multimedia audio controls and alternative interfaces with metadata which included alternative text heading information and so on it also drastically reduced the keystrokes for navigation with aibrowser which implies improved non visual usability
in applications of data mining characterized by highly skewed misclassification costs certain types of errors become virtually unacceptable this limits the utility of classifier to range in which such constraints can be met naive bayes which has proven to be very useful in text mining applications due to high scalability can be particularly affected although its loss tends to be small its misclassifications are often made with apparently high confidence aside from efforts to better calibrate naive bayes scores it has been shown that its accuracy depends on document sparsity and feature selection can lead to marked improvement in classification performance traditionally sparsity is controlled globally and the result for any particular document may vary in this work we examine the merits of local sparsity control for naive bayes in the context of highly asymmetric misclassification costs in experiments with three benchmark document collections we demonstrate clear advantages of document level feature selection in the extreme cost setting multinomial naive bayes with local sparsity control is able to outperform even some of the recently proposed effective improvements to the naive bayes classifier there are also indications that local feature selection may be preferable in different cost settings
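a minimal sketch of document-level (local) feature selection for multinomial naive bayes is given below: instead of fixing one global vocabulary, each document keeps only its own top-k terms, ranked here by the absolute class log-odds of the term. the scoring rule, the value of k and the toy corpus are illustrative choices, not the paper's exact procedure or cost-sensitive setting.

```python
# illustrative sketch: per-document sparsity control before naive bayes scoring.
import math
from collections import Counter

docs = [("spam cheap pills cheap offer".split(), 1),
        ("meeting agenda project offer".split(), 0),
        ("cheap spam offer now".split(), 1),
        ("project meeting notes agenda".split(), 0)]

def train(docs, alpha=1.0):
    counts = {0: Counter(), 1: Counter()}
    for words, label in docs:
        counts[label].update(words)
    vocab = set(counts[0]) | set(counts[1])
    logp = {c: {} for c in counts}
    for c, cnt in counts.items():
        total = sum(cnt.values()) + alpha * len(vocab)
        for w in vocab:
            logp[c][w] = math.log((cnt[w] + alpha) / total)
    prior = {c: math.log(sum(1 for _, l in docs if l == c) / len(docs)) for c in counts}
    return logp, prior

def classify(words, logp, prior, k=3):
    known = [w for w in words if w in logp[0]]
    # local feature selection: keep this document's k most discriminative terms
    selected = sorted(known, key=lambda w: abs(logp[1][w] - logp[0][w]), reverse=True)[:k]
    scores = {c: prior[c] + sum(logp[c][w] for w in selected) for c in prior}
    return max(scores, key=scores.get)

logp, prior = train(docs)
print(classify("cheap offer for the project meeting".split(), logp, prior, k=2))
```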
indirect input techniques allow users to quickly access all parts of tabletop workspaces without the need for physical access however indirect techniques restrict the available social cues that are seen on direct touch tables this reduced awareness results in impoverished coordination for example the number of conflicts might increase since users are more likely to interact with objects that another person is planning to use conflicts may also arise because indirect techniques reduce territorial behavior expanding the interaction space of each collaborator in this paper we introduce three new tabletop coordination techniques designed to reduce conflicts arising from indirect input while still allowing users the flexibility of distant object control two techniques were designed to promote territoriality and to allow users to protect objects when they work near their personal areas and the third technique lets users set their protection levels dynamically we present the results of an evaluation which shows that people prefer techniques that automatically provide protection for personal territories and that these techniques also increase territorial behavior
we introduce multi trials a new technique for symmetry breaking for distributed algorithms and apply it to various problems in general graphs for instance we present three randomized algorithms for distributed vertex or edge coloring improving on previous algorithms and showing a time color trade off to get a Δ+1 coloring takes time O(log Δ + √(log n)) to obtain an O(Δ + log^{1+1/log* n} n) coloring takes time O(log* n) this is more than an exponential improvement in time for graphs of polylogarithmic degree our fastest algorithm works in constant time using O(Δ log^{(t)} n + log^{1+1/t} n) colors where t denotes an arbitrary constant and log^{(t)} denotes the t times recursively applied logarithm to n we also use the multi trials technique to compute network decompositions and to compute maximal independent sets mis obtaining new results for several graph classes
virtual evidence ve first introduced by pearl provides convenient way of incorporating prior knowledge into bayesian networks this work generalizes the use of ve to undirected graphical models and in particular to conditional random fields crfs we show that ve can be naturally encoded into crf model as potential functions more importantly we propose novel semi supervised machine learning objective for estimating crf model integrated with ve the objective can be optimized using the expectation maximization algorithm while maintaining the discriminative nature of crfs when evaluated on the classifieds data our approach significantly outperforms the best known solutions reported on this task
due to its superiority such as low access latency low energy consumption light weight and shock resistance the success of flash memory as storage alternative for mobile computing devices has been steadily expanded into personal computer and enterprise server markets with ever increasing capacity of its storage however since flash memory exhibits poor performance for small to moderate sized writes requested in random order existing database systems may not be able to take full advantage of flash memory without elaborate flash aware data structures and algorithms the objective of this work is to understand the applicability and potential impact that flash memory ssd solid state drive has for certain type of storage spaces of database server where sequential writes and random reads are prevalent we show empirically that up to more than an order of magnitude improvement can be achieved in transaction processing by replacing magnetic disk with flash memory ssd for transaction log rollback segments and temporary table spaces
this paper studies continuous monitoring of nearest neighbor nn queries over sliding window streams according to this model data points continuously stream in the system and they are considered valid only while they belong to sliding window that contains the most recent arrivals count based or the arrivals within fixed interval covering the most recent time stamps time based the task of the query processor is to constantly maintain the result of long running nn queries among the valid data we present two processing techniques that apply to both count based and time based windows the first one adapts conceptual partitioning the best existing method for continuous nn monitoring over update streams to the sliding window model the second technique reduces the problem to skyline maintenance in the distance time space and precomputes the future changes in the nn set we analyze the performance of both algorithms and extend them to variations of nn search finally we compare their efficiency through comprehensive experimental evaluation the skyline based algorithm achieves lower cpu cost at the expense of slightly larger space overhead
current operating systems provide programmers an insufficient interface for expressing consistency requirements for accesses to system resources such as files and interprocess communication to ensure consistency programmers must be able to access system resources atomically and in isolation from other applications on the same system although the os updates system resources atomically and in isolation from other processes within single system call not all operations critical to the integrity of an application can be condensed into single system call operating systems should support transactional execution of system calls providing simple comprehensive mechanism for atomic and isolated accesses to system resources preliminary results from linux prototype implementation indicate that the overhead of system transactions can be acceptably low
wireless sensor network wsn design often requires the decision of optimal locations deployment and transmit power levels power assignment of the sensors to be deployed in an area of interest few attempts have been made on optimizing both decision variables for maximizing the network coverage and lifetime objectives even though most of the latter studies consider the two objectives individually this paper defines the multiobjective deployment and power assignment problem dpap using the multi objective evolutionary algorithm based on decomposition moea the dpap is decomposed into set of scalar subproblems that are classified based on their objective preference and tackled in parallel by using neighborhood information and problem specific evolutionary operators in single run the proposed operators adapt to the requirements and objective preferences of each subproblem dynamically during the evolution resulting in significant improvements on the overall performance of moea simulation results have shown the superiority of the problem specific moea against the nsga ii in several network instances providing diverse set of high quality network designs to facilitate the decision maker’s choice
each node in wireless multi hop network can adjust the power level at which it transmits and thus change the topology of the network to save energy by choosing the neighbors with which it directly communicates many previous algorithms for distributed topology control have assumed an ability at each node to deduce some location based information such as the direction and the distance of its neighbor nodes with respect to itself such deduction of location based information however cannot be relied upon in real environments where the path loss exponents vary greatly leading to significant errors in distance estimates also multipath effects may result in different signal paths with different loss characteristics and none of these paths may be line of sight making it difficult to estimate the direction of neighboring node in this paper we present step topology control stc simple distributed topology control algorithm which reduces energy consumption while preserving the connectivity of heterogeneous sensor network without use of any location based information the stc algorithm avoids the use of gps devices and also makes no assumptions about the distance and direction between neighboring nodes we show that the stc algorithm achieves the same or better order of communication and computational complexity when compared to other known algorithms that also preserve connectivity without the use of location based information we also present detailed simulation based comparative analysis of the energy savings and interference reduction achieved by the algorithms the results show that in spite of not incurring higher communication or computational complexity the stc algorithm performs better than other algorithms in uniform wireless environments and especially better when path loss characteristics are non uniform
the tussle between reliability and functionality of the internet is firmly biased on the side of reliability new enabling technologies fail to achieve traction across the majority of isps we believe that the greatest challenge is not in finding solutions and improvements to the internet’s many problems but in how to actually deploy those solutions and re balance the tussle between reliability and functionality network virtualization provides promising approach to enable the coexistence of innovation and reliability we describe network virtualization architecture as technology for enabling internet innovation this architecture is motivated from both business and technical perspectives and comprises four main players in order to gain insight about its viability we also evaluate some of its components based on experimental results from prototype implementation
many safety critical systems that have been considered by the verification community are parameterized by the number of concurrent components in the system and hence describe an infinite family of systems traditional model checking techniques can only be used to verify specific instances of this family in this paper we present technique based on compositional model checking and program analysis for automatic verification of infinite families of systems the technique views parameterized system as an expression in process algebra ccs and interprets this expression over domain of formulas modal mu calculus considering process as property transformer the transformers are constructed using partial model checking techniques at its core our technique solves the verification problem by finding the limit of chain of formulas we present widening operation to find such limit for properties expressible in subset of modal mu calculus we describe the verification of number of parameterized systems using our technique to demonstrate its utility
since its introduction frequent itemset mining has been the subject of numerous studies however most of them return frequent itemsets in the form of textual lists the common cliché that picture is worth thousand words advocates that visual representation can enhance user understanding of the inherent relations in collection of objects such as frequent itemsets many visualization systems have been developed to visualize raw data or mining results however most of these systems were not designed for visualizing frequent itemsets in this paper we propose frequent itemset visualizer fisviz fisviz provides many useful features so that users can effectively see and obtain implicit previously unknown and potentially useful information that is embedded in data of various real life applications
we present new approach for simulating deformable objects the underlying model is geometrically motivated it handles point based objects and does not need connectivity information the approach does not require any pre processing is simple to compute and provides unconditionally stable dynamic simulations the main idea of our deformable model is to replace energies by geometric constraints and forces by distances of current positions to goal positions these goal positions are determined via generalized shape matching of an undeformed rest state with the current deformed state of the point cloud since points are always drawn towards well defined locations the overshooting problem of explicit integration schemes is eliminated the versatility of the approach in terms of object representations that can be handled the efficiency in terms of memory and computational complexity and the unconditional stability of the dynamic simulation make the approach particularly interesting for games
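a minimal numpy sketch of the generic rigid shape matching step that this class of methods builds on: the rest shape is optimally rotated and translated onto the current configuration to obtain goal positions, and each point is then pulled toward its goal. equal masses, a single rigid cluster, and the explicit update used here are simplifying assumptions; this is not the authors' full method.

# Minimal sketch of rigid shape matching for a point cloud (equal masses assumed).
# Goal positions g_i = R q_i + c come from the best rotation of the rest shape onto
# the current (deformed) positions; points are then pulled toward their goals.
import numpy as np

def shape_matching_goals(rest, current):
    """rest, current: (n,3) arrays. Returns goal positions from the optimal rotation."""
    c0, c = rest.mean(axis=0), current.mean(axis=0)
    q, p = rest - c0, current - c
    A = p.T @ q                        # sum_i p_i q_i^T
    U, _, Vt = np.linalg.svd(A)
    R = U @ Vt                         # closest rotation (polar/SVD extraction)
    if np.linalg.det(R) < 0:           # avoid reflections
        U[:, -1] *= -1
        R = U @ Vt
    return q @ R.T + c                 # g_i = R q_i + c

def integrate(rest, x, v, dt=0.01, alpha=0.5, gravity=np.array([0.0, -9.81, 0.0])):
    """One explicit step: pull points toward goal positions, then integrate."""
    g = shape_matching_goals(rest, x)
    v = v + alpha * (g - x) / dt + dt * gravity
    return x + dt * v, v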
the chaos router randomizing nonminimal adaptive packet router is introduced adaptive routers allow messages to dynamically select paths depending on network traffic and bypass congested nodes this flexibility contrasts with oblivious packet routers where the path of packet is statically determined at the source node key advancement of the chaos router over previous nonminimal routers is the use of randomization to eliminate the need for livelock protection this simplifies adaptive routing to be of approximately the same complexity along the critical decision path as an oblivious router the primary cost is that the chaos router is probabilistically livelock free rather than being deterministically livelock free but evidence is presented implying that these are equivalent in practice the principal advantage is excellent performance for nonuniform traffic patterns the chaos router is described it is shown to be deadlock free and probabilistically livelock free and performance results are presented for variety of work loads
the overhead of context switching limits efficient scheduling of multiple concurrent threads on uniprocessor when real time requirements exist software implemented protocol controller may be crippled by this problem the available idle time may be too short to recover through context switching so only the primary thread can execute during message activity slowing the secondary threads and potentially missing deadlines asynchronous software thread integration asti uses coroutine calls and integration letting threads make independent progress efficiently and reducing the needed context switches we demonstrate the methods with software implementation of an automotive communication protocol and several secondary threads
next generation multiprocessor systems on chip mpsocs are expected to contain numerous processing elements interconnected via on chip networks executing real time applications it is anticipated that runtime optimization algorithms which dynamically adjust system parameters with the purpose of optimizing the system’s operation will be embedded in the system software and or hardware in this paper we present methodology for simulating and evaluating system level optimization algorithms demonstrated by the case of on chip dynamic task allocation applied to generic mpsoc architectures through this methodology we are able to show that dynamic system level bidding based task allocation can improve system performance when compared to round robin allocation in popular mpsoc applications
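a purely illustrative sketch contrasting round robin allocation with a bidding style allocation in which each processing element bids its current load plus an estimated communication cost and the lowest bid wins; the bid function, the cost model, and the task set are assumptions, not the system level algorithm evaluated in the paper.

# Illustrative comparison: round-robin vs. bidding-based task allocation.
# Each PE bids (current load + estimated communication cost); the lowest bid wins.
import itertools

def round_robin(tasks, num_pes):
    pe_cycle = itertools.cycle(range(num_pes))
    return {task: next(pe_cycle) for task in tasks}

def bidding(tasks, num_pes, comm_cost):
    """comm_cost(task, pe) -> estimated cost of running task on pe (assumed given)."""
    load = [0.0] * num_pes
    placement = {}
    for task, work in tasks.items():
        bids = [load[pe] + comm_cost(task, pe) for pe in range(num_pes)]
        winner = min(range(num_pes), key=lambda pe: bids[pe])
        placement[task] = winner
        load[winner] += work            # the winning PE absorbs the task's work
    return placement

tasks = {"decode": 3.0, "filter": 1.0, "encode": 4.0, "mix": 2.0}
print(round_robin(tasks, num_pes=2))
print(bidding(tasks, num_pes=2, comm_cost=lambda t, pe: 0.5 * pe))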
we argue that relationships between web pages are functions of the user’s intent we identify class of web tasks information gathering that can be facilitated by providing links to pages related to the page the user is currently viewing we define three kinds of intentional relationships that correspond to whether the user is seeking sources of information reading pages which provide information or surfing through pages as part of an extended information gathering process we show that these three relationships can be mined using combination of textual and link information and provide three scoring mechanisms that correspond to them seekrel factrel and surfrel these scoring mechanisms incorporate both textual and link information we build set of capacitated subnetworks each corresponding to particular keyword scores are computed by computing flows on these subnetworks the capacities of the links are derived from the hub and authority values of the nodes they connect following the work of kleinberg on assigning authority to pages in hyperlinked environments we evaluated our scoring mechanism by running experiments on four data sets taken from the web we present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare those to the top results returned by google’s similar pages feature and the companion algorithm dean and henzinger
experienced web users have strategies for information search and re access that are not directly supported by web browsers or search engines we studied how prevalent these strategies are and whether even experienced users have problems with searching and re accessing information with this aim we conducted survey with experienced web users the results showed that this group has frequently used key strategies eg using several browser windows in parallel that they find important whereas some of the strategies that have been suggested in previous studies are clearly less important for them eg including urls on webpage in some aspects such as query formulation this group resembles less experienced web users for instance we found that most of the respondents had misconceptions about how their search engine handles queries as well as other problems with information search and re access in addition to presenting the prevalence of the strategies and rationales for their use we present concrete designs solutions and ideas for making the key strategies also available to less experienced users
we present websos novel overlay based architecture that provides guaranteed access to web server that is targeted by denial of service dos attack our approach exploits two key characteristics of the web environment its design around human centric interface and the extensibility inherent in many browsers through downloadable applets we guarantee access to web server for large number of previously unknown users without requiring pre existing trust relationships between users and the system by using reverse graphic turing tests furthermore our system makes it easy for service providers to charge users providing incentives to commercial offering of the service users can dynamically decide whether to use the websos overlay based on the prevailing network conditions our prototype requires no modifications to either servers or browsers and makes use of graphical turing tests web proxies and client authentication using the ssl tls protocol all readily supported by modern browsers we then extend this system with credential based micropayment scheme that combines access control and payment authorization in one operation turing tests ensure that malicious code such as worm cannot abuse user’s micropayment wallet we use the websos prototype to conduct performance evaluation over the internet using planetlab testbed for experimentation with network overlays we determine the end to end latency using both chord based approach and our shortcut extension our evaluation shows the latency increase by factor of and respectively confirming our simulation results
objective teaching refers to letting learners meet preset goal over certain time frame and for this purpose it is essential to find suitable teachers or combination of teachers most appropriate for the task this paper uses clustering and decision tree technologies to solve the problem of how to select teachers and students for objective teaching first students are classified into clusters according to their characteristics that are determined based on their pretest performance in english skill areas such as listening reading and writing second several classes of each skill area are available for the students freely to choose from each class already assigned teacher after the course of instruction is completed an analysis is made of the posttest results in order to evaluate both teaching and learning performances then the participant teachers who have taught particularly well in certain skill area are thus selected and recommended as appropriate for different clusters of students who have failed the posttest this study with the teaching experiment having turned out with highly encouraging results not only proposes new method of how to achieve objective teaching but also presents prototype system to demonstrate the effectiveness of this method
unlike twig query an xtwig query contains some selection predicates with reverse axes which are either ancestor or parent to evaluate such queries in the stream based context some rewriting rules have been proposed to transform the paths with reverse axes into equivalent reverse axis free ones however the transformation method is expensive due to multiple scanning input streams and the generation of unnecessary intermediate results to solve these problems holistic stream based algorithm xtwigstack is proposed for xtwig queries experiments show that xtwigstack is much more efficient than the transformation method
xfi is comprehensive protection system that offers both flexible access control and fundamental integrity guarantees at any privilege level and even for legacy code in commodity systems for this purpose xfi combines static analysis with inline software guards and two stack execution model we have implemented xfi for windows on the architecture using binary rewriting and simple stand alone verifier the implementation’s correctness depends on the verifier but not on the rewriter we have applied xfi to software such as device drivers and multimedia codecs the resulting modules function safely within both kernel and user mode address spaces with only modest enforcement overheads
we present new discrete velocity level formulation of frictional contact dynamics that reduces to pair of coupled projections and introduce simple fixed point property of this coupled system this allows us to construct novel algorithm for accurate frictional contact resolution based on simple staggered sequence of projections the algorithm accelerates performance using warm starts to leverage the potentially high temporal coherence between contact states and provides users with direct control over frictional accuracy applying this algorithm to rigid and deformable systems we obtain robust and accurate simulations of frictional contact behavior not previously possible at rates suitable for interactive haptic simulations as well as large scale animations by construction the proposed algorithm guarantees exact velocity level contact constraint enforcement and obtains long term stable and robust integration examples are given to illustrate the performance plausibility and accuracy of the obtained solutions
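a minimal sketch of a staggered projection loop for a single particle contacting a ground plane with coulomb friction: a normal impulse projection enforcing non penetration alternates with a friction projection clamped to the friction cone until the impulses stop changing. the one contact setup and the projected gauss seidel style updates are simplifying assumptions, not the paper's general formulation.

# Staggered projections for one particle on the plane y = 0 with Coulomb friction mu.
# Alternate (a) a normal-impulse projection removing approaching normal velocity and
# (b) a friction projection clamped to the cone |jt| <= mu*jn, until a fixed point.
import numpy as np

def staggered_contact(v, mass=1.0, mu=0.5, iters=20, tol=1e-9):
    """v: pre-contact velocity (3,), contact normal = +y. Returns post-contact velocity."""
    n = np.array([0.0, 1.0, 0.0])
    jn, jt = 0.0, np.zeros(3)                    # normal impulse, friction impulse
    for _ in range(iters):
        jn_old, jt_old = jn, jt.copy()
        # (a) normal projection: choose jn >= 0 so the normal velocity is non-negative
        v_trial = v + (jn * n + jt) / mass
        jn = max(0.0, jn - mass * (v_trial @ n))
        # (b) friction projection: oppose tangential slip, clamp to the friction cone
        v_trial = v + (jn * n + jt) / mass
        vt = v_trial - (v_trial @ n) * n
        jt_new = jt - mass * vt
        norm = np.linalg.norm(jt_new)
        if norm > mu * jn:
            jt_new *= mu * jn / max(norm, 1e-12)
        jt = jt_new
        if abs(jn - jn_old) + np.linalg.norm(jt - jt_old) < tol:
            break
    return v + (jn * n + jt) / mass

print(staggered_contact(np.array([2.0, -1.0, 0.0])))   # penetration removed, sliding slowed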
the paper introduces the notion of offline justification for answer set programming asp justifications provide graph based explanation of the truth value of an atom with respect to given answer set the paper extends also this notion to provide justification of atoms during the computation of an answer set on line justification and presents an integration of online justifications within the computation model of smodels offline and online justifications provide useful tools to enhance understanding of asp and they offer basic data structure to support methodologies and tools for debugging answer set programs preliminary implementation has been developed in
this paper presents rule based declarative database language which extends datalog to express events and nondeterministic state transitions by using the choice construct to model uncertainty in dynamic rules the proposed language called event choice datalog datalog ev for short provides powerful mechanism to formulate queries on the evolution of knowledge base given sequence of events envisioned to occur in the future distinguished feature of this language is the use of multiple spatio temporal dimensions in order to model finer control of evolution comprehensive study of the computational complexity of answering datalog ev queries is reported
it has long been recognized that capturing term relationships is an important aspect of information retrieval even with large amounts of data we usually only have significant evidence for fraction of all potential term pairs it is therefore important to consider whether multiple sources of evidence may be combined to predict term relations more accurately this is particularly important when trying to predict the probability of relevance of set of terms given query which may involve both lexical and semantic relations between the terms we describe markov chain framework that combines multiple sources of knowledge on term associations the stationary distribution of the model is used to obtain probability estimates that potential expansion term reflects aspects of the original query we use this model for query expansion and evaluate the effectiveness of the model by examining the accuracy and robustness of the expansion methods and investigate the relative effectiveness of various sources of term evidence statistically significant differences in accuracy were observed depending on the weighting of evidence in the random walk for example using co occurrence data later in the walk was generally better than using it early suggesting further improvements in effectiveness may be possible by learning walk behaviors
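a small sketch, assuming numpy, of the kind of combination the framework describes: several term association matrices are row normalized, mixed into one transition matrix, and the stationary distribution of a walk restarted at the query terms ranks candidate expansion terms. the two toy evidence sources and the mixing and restart weights are illustrative assumptions.

# Sketch: combine term-association sources into one Markov chain and rank expansion
# terms by the stationary distribution of a query-restarted walk. Toy data only.
import numpy as np

terms = ["car", "auto", "engine", "bank", "loan"]
cooccur = np.array([[0, 3, 2, 0, 0],      # co-occurrence counts (symmetric, toy)
                    [3, 0, 2, 0, 0],
                    [2, 2, 0, 0, 0],
                    [0, 0, 0, 0, 4],
                    [0, 0, 0, 4, 0]], float)
synonyms = np.array([[0, 1, 0, 0, 0],     # a second evidence source (e.g., a thesaurus)
                     [1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0]], float)

def row_normalize(M):
    s = M.sum(axis=1, keepdims=True)
    return np.divide(M, s, out=np.full_like(M, 1.0 / M.shape[0]), where=s > 0)

P = 0.7 * row_normalize(cooccur) + 0.3 * row_normalize(synonyms)   # mixture chain
query = np.array([1.0, 0, 0, 0, 0])                                # query term: "car"
restart, pi = 0.3, np.array([1.0, 0, 0, 0, 0])
for _ in range(100):                      # power iteration with restart at the query
    pi = restart * query + (1 - restart) * (pi @ P)
print(sorted(zip(terms, pi), key=lambda t: -t[1]))                 # top terms = expansions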
in data stream environments the initial plan of long running query may gradually become inefficient due to changes of the data characteristics in this case the query optimizer will generate more efficient plan based on the current statistics the online transition from the old to the new plan is called dynamic plan migration in addition to correctness an effective technique for dynamic plan migration should achieve the following objectives minimize the memory and cpu overhead of the migration reduce the duration of the transition and maintain steady output rate the only known solutions for this problem are the moving states ms and parallel track pt strategies which have some serious shortcomings related to the above objectives motivated by these shortcomings we first propose hybmig which combines the merits of ms and pt and outperforms both in every aspect as second step we extend pt ms and hybmig to the general problem of migration where both the new and the old plans are treated as black boxes
modern processors use branch target buffers btbs to predict the target address of branches such that they can fetch ahead in the instruction stream increasing concurrency and performance ideally btbs would be sufficiently large to capture the entire working set of the application and sufficiently small for fast access and practical on chip dedicated storage depending on the application these requirements are at odds this work introduces btb design that accommodates large instruction footprints without dedicating expensive on chip resources in the proposed phantom btb pbtb design conventional btb is augmented with virtual table that collects branch target information as the application runs the virtual table does not have fixed dedicated storage instead it is transparently allocated on demand in the on chip caches at cache line granularity the entries in the virtual table are proactively prefetched and installed in the dedicated conventional btb thus increasing its perceived capacity experimental results with commercial workloads under full system simulation demonstrate that pbtb improves ipc performance over entry btb by on average and up to with storage overhead of only overall the virtualized design performs within of conventional entry single cycle access btb while the dedicated storage is times smaller
sequential pattern mining has become challenging task in data mining due to its complexity essentially the mining algorithms discover all the frequent patterns meeting the user specified minimum support threshold however it is very unlikely that the user could obtain the satisfactory patterns in just one query usually the user must try various support thresholds to mine the database for the final desirable set of patterns consequently the time consuming mining process has to be repeated several times however current approaches are inadequate for such interactive mining due to the long processing time required for each query in order to reduce the response time for each query during the interactive process we propose knowledge base assisted mining algorithm for interactive sequence discovery the proposed approach utilizes the knowledge acquired from each mining process accumulates the counting information to facilitate efficient counting of patterns and speeds up the whole interactive mining process furthermore the knowledge base makes possible the direct generation of new candidate sets and the concurrent support counting of variable sized candidates even for some queries due to the pattern information already kept in the knowledge base database access is not required at all the conducted experiments show that our approach outperforms gsp state of the art sequential pattern mining algorithm by several orders of magnitude for interactive sequence discovery
in this paper we show the existence of small coresets for the problems of computing k median and k means clustering for points in low dimension in other words we show that given point set in rd one can compute weighted set of size log such that one can compute the k median or k means clustering on the weighted set instead of on the original point set and get an approximation as result we improve the fastest known algorithms for approximate k means and k median our algorithms have linear running time for a fixed number of clusters and a fixed approximation parameter in addition we can maintain the approximate k median or k means clustering of stream when points are only being inserted using polylogarithmic space and update time
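the coreset construction itself is not sketched here; the snippet below, assuming numpy, only illustrates the downstream step the result relies on: running a weighted lloyd style k means iteration on a small weighted point set instead of the full input. the toy data and uniform weights are placeholders.

# Weighted Lloyd's k-means: cluster a weighted point set (such as a coreset)
# instead of the full input. The coreset construction is not shown here.
import numpy as np

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)                      # nearest center per point
        for j in range(k):
            mask = assign == j
            if mask.any():
                w = weights[mask]
                centers[j] = (w[:, None] * points[mask]).sum(0) / w.sum()
    return centers, assign

pts = np.random.default_rng(1).normal(size=(40, 2))
wts = np.ones(40)              # in a real coreset the weights would be non-uniform
print(weighted_kmeans(pts, wts, k=3)[0])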
most research on ubiquitous services has been interested in constructing intelligent environments in physical spaces such as conference hall meeting room home and campus in these spaces people are able to share data easily however as cooperative work from distance becomes social issue we need data sharing system for an off site community that is group of people with common interest and purpose in different ubiquitous places like remote conference meeting in this paper we propose dynamic storage system cristore for heterogeneous devices of the off site communities which autonomously builds distributed shared data space and keeps flexible overlay topology of the participant devices according to the devices’ capabilities cristore also performs file operations fitted to the capabilities
the emerging field of visual analytics changes the way we model gather and analyze data current data analysis approaches suggest to gather as much data as possible and then focus on goal and process oriented data analysis techniques visual analytics changes this approach and the methodology to interpret the results becomes the key issue this paper contributes with method to interpret visual hierarchical heavy hitters vhhhs we show how to analyze data on the general level and how to examine specific areas of the data we identify five common patterns that build the interpretation alphabet of vhhhs we demonstrate our method on three different real world datasets and show the effectiveness of our approach
there is an increasing tendency in sensor networks and related networked embedded systems to push more complexity and intelligence into end nodes this in turn leads to growing need to support isolation between the software modules in node in conventional systems isolation is achieved using standard memory management hardware but this is not cost effective or energy efficient solution for small cheap embedded nodes we therefore propose software based solution that promises isolation in significantly lighter weight manner than existing software based mechanisms this is achieved by frontloading effort into offline compilation phases and leaving only small amount of work to be done at load time and run time
this paper presents an interactive gpu based system for cinematic relighting with multiple bounce indirect illumination from fixed view point we use deep frame buffer containing set of view samples whose indirect illumination is recomputed from the direct illumination on large set of gather samples distributed around the scene this direct to indirect transfer is linear transform which is particularly large given the size of the view and gather sets this makes it hard to precompute store and multiply with we address this problem by representing the transform as set of sparse matrices encoded in wavelet space hierarchical construction is used to impose wavelet basis on the unstructured gather cloud and an image based approach is used to map the sparse matrix computations to the gpu we precompute the transfer matrices using hierarchical algorithm and variation of photon mapping in less than three hours on one processor we achieve high quality indirect illumination at frames per second for complex scenes with over million polygons with diffuse and glossy materials and arbitrary direct lighting models expressed using shaders we compute per pixel indirect illumination without the need of irradiance caching or other subsampling techniques
isps are increasingly reluctant to collect and store raw network traces because they can be used to compromise their customers privacy anonymization techniques mitigate this concern by protecting sensitive information trace anonymization can be performed offline at later time or online at collection time offline anonymization suffers from privacy problems because raw traces must be stored on disk until the traces are deleted there is the potential for accidental leaks or exposure by subpoenas online anonymization drastically reduces privacy risks but complicates software engineering efforts because trace processing and anonymization must be performed at line speed this paper presents bunker network tracing system that combines the software development benefits of offline anonymization with the privacy benefits of online anonymization bunker uses virtualization encryption and restricted interfaces to protect the raw network traces and the tracing software exporting only an anonymized trace we present the design and implementation of bunker evaluate its security properties and show its ease of use for developing complex network tracing application
in this paper novel illumination normalization model is proposed for the pre processing of face recognition under varied lighting conditions the novel model could compensate all the illumination effects in face samples like the diffuse reflection specular reflection attached shadow and cast shadow firstly it uses the tvl model to get the low frequency part of face image and adopts the self quotient model to normalize the diffuse reflection and attached shadow then it generates the illumination invariant small scale part of face sample secondly tvl model is used to get the noiseless large scale part of face sample all kinds of illumination effects in the large scale part are further removed by the region based histogram equalization thirdly two parts are fused to generate the illumination invariant face sample the result of our model contains multi scaled image information and all illumination effects in face samples are compensated finally high order statistical relationships among variables of samples are extracted for classifier experimental results on some large scale face databases prove that the processed image by our model could largely improve the recognition performances of conventional methods under low level lighting conditions
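a minimal sketch of the self quotient step mentioned in the pipeline, assuming numpy and scipy: dividing the image by a smoothed version of itself suppresses slowly varying illumination. gaussian smoothing stands in here for the low frequency estimate; the tvl decomposition, region based histogram equalization, and fusion stages are not shown.

# Self-quotient normalization sketch: I / smooth(I) suppresses slowly varying
# illumination. Gaussian smoothing is a stand-in for the low-frequency estimate.
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient(image, sigma=8.0, eps=1e-6):
    """image: 2-D float array in [0, 1]. Returns an illumination-normalized image."""
    smooth = gaussian_filter(image, sigma)
    sq = image / (smooth + eps)
    return (sq - sq.min()) / (sq.max() - sq.min() + eps)   # rescale for display

face = np.random.default_rng(0).random((64, 64))            # placeholder for a face image
print(self_quotient(face).shape)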
open communities allow anonymous members to join and leave anytime in which interactions commonly take place between parties who have insufficient information about each other the openness brings risk to both the service provider and consumer trust is an important tool in human life as it enables people to get along with strangers it is accepted that trust management is potential way to solve the above problem in computational systems trust management mechanism can assist members in the open community to evaluate each other and decide which and whether to interact it provides an incentive to good behavior and gives punishment to malicious behavior the representation of trust in current computational trust and reputation models can be classified into four categories we present new evidential trust model based on the dezert smarandache theory which has higher expressiveness than the trust models based on the dempster shafer theory of evidence we propose smooth and effective approach of trust acquisition the general rule of combination of the dezert smarandache theory is used for trust aggregation we consider that trust is transitive and present rule of trust transitivity for our model lastly we evaluate our model experimentally
in this paper we present characterization study of search engine crawlers for the purposes of our work we use web server access logs from five academic sites in three different countries based on these logs we analyze the activity of different crawlers that belong to five search engines google altavista inktomi fastsearch and citeseer we compare crawler behavior to the characteristics of the general world wide web traffic and to general characterization studies we analyze crawler requests to derive insights into the behavior and strategy of crawlers we propose set of simple metrics that describe qualitative characteristics of crawler behavior vis à vis crawler’s preference on resources of particular format its frequency of visits on web site and the pervasiveness of its visits to particular site to the best of our knowledge this is the first extensive and in depth characterization of search engine crawlers our results and observations provide useful insights into crawler behavior and serve as basis of our ongoing work on the automatic detection of web crawlers
though numerous multimedia systems exist in the commercial market today relatively little work has been done on developing the mathematical foundation of multimedia technology we attempt to take some initial steps towards the development of theoretical basis for multimedia information system to do so we develop the notion of structured multimedia database system we begin by defining mathematical model of media instance media instance may be thought of as glue residing on top of specific physical media representation such as video audio documents etc using this glue it is possible to define general purpose logical query language to query multimedia data this glue consists of set of states eg video frames audio tracks etc and features together with relationships between states and or features structured multimedia database system imposes certain mathematical structures on the set of features states using this notion of structure we are able to define indexing structures for processing queries methods to relax queries when answers do not exist to those queries as well as sound complete and terminating procedures to answer such queries and their relaxations when appropriate we show how media presentation can be generated by processing sequence of queries and furthermore we show that when these queries are extended to include constraints then these queries can not only generate presentations but also generate temporal synchronization properties and spatial layout properties for such presentations we describe the architecture of prototype multimedia database system based on the principles described in this paper
the common overarching goal of service bus and grid middleware is virtualization virtualization of business functions and virtualization of resources respectively by combining both capabilities new infrastructure called business grid results this infrastructure meets the requirements of both business applications and scientific computations in unified manner and in particular those that are not addressed by the middleware infrastructures in each of the fields furthermore it is the basis for enacting new trends like software as service or cloud computing in this paper the overall architecture of the business grid is outlined the business grid applications are described and the need for their customizability and adaptability is advocated requirements on the business grid like concurrency multi tenancy and scalability are addressed the concept of provisioning flows and other mechanisms to enable scalability as required by high number of concurrent users are outlined
we specify non invasive method allowing to estimate the time each developer of pair spends over the development activity during pair programming the method works by performing first behavioural fingerprinting of each developer based on low level event logs which then is used to operate segmentation over the log sequence produced by the pair in timelined log event sequence this is equivalent to estimating the times of the switching between developers we model the individual developer’s behaviour by means of markov chain inferred from the logs and model the developers role switching process by further higher level markov chain the overall model consisting in the two nested markov chains belongs to the class of hierarchical hidden markov models the method could be used not only to assess the degree of conformance with respect to predefined pair programming switch times policies but also to capture the characteristics of given programmers pair’s switching process namely in the context of pair programming effectiveness studies
existing large scale display systems generally adopt an indirect approach to user interaction this is due to the use of standard desktop oriented devices such as mouse on desk to control the large wall sized display by using an infrared laser pointer and an infrared tracking device more direct interaction with the large display can be achieved thereby reducing the cognitive load of the user and improving their mobility the challenge in designing such systems is to allow users to interact with objects on the display naturally and easily our system addresses this with hotspots regions surrounding objects of interest and gestures movements made with the laser pointer which trigger an action similar to those found in modern web browsers eg mozilla and opera finally these concepts are demonstrated by an add in module for microsoft powerpoint using the naturalpoint smart nav tracking device
suffix array is widely used full text index that allows fast searches on the text it is constructed by sorting all suffixes of the text in the lexicographic order and storing pointers to the suffixes in this order binary search is used for fast searches on the suffix array compact suffix array is compressed form of the suffix array that still allows binary searches but the search times are also dependent on the compression in this paper we give efficient methods for constructing and querying compact suffix arrays we also study practical issues such as the trade off between compression and search times and show how to reduce the space requirement of the construction experimental results are provided in comparison with other search methods with large text corpora the index took times the size of the text while the searches were only two times slower than from suffix array
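a plain (uncompressed) suffix array with binary search, to make the baseline structure concrete; the compaction and the compression/search trade off studied in the paper are not shown. the quadratic toy builder and the python 3.10 key argument of bisect are assumptions of this sketch.

# Plain suffix array: sort suffix start positions lexicographically, then use binary
# search to locate a pattern. (The compact/compressed form is not shown.)
import bisect   # bisect's key= argument requires Python 3.10+

def build_suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])   # O(n^2 log n) toy builder

def find(text, sa, pattern):
    """Return all start positions of pattern in text using two binary searches."""
    key = lambda i: text[i:i + len(pattern)]
    lo = bisect.bisect_left(sa, pattern, key=key)
    hi = bisect.bisect_right(sa, pattern, key=key)
    return sorted(sa[lo:hi])

text = "mississippi"
sa = build_suffix_array(text)
print(find(text, sa, "iss"))        # -> [1, 4]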
distributed continuous media server dcms architectures are proposed to minimize the communication storage cost for those continuous media applications that serve large number of geographically distributed clients typically dcms is designed as pure hierarchy tree of centralized continuous media servers in an earlier work we proposed redundant hierarchical topology for dcms networks termed redhi which can potentially result in higher utilization and better reliability over pure hierarchy in this paper we focus on the design of resource management system for redhi that can exploit the resources of its dcms network to achieve these performance objectives our proposed resource management system is based on fully decentralized approach to achieve optimal scalability and robustness in general the major drawback of fully decentralized design is the increase in latency time and communication overhead to locate the requested object however as compared to the typically long duration and high resource bandwidth requirements of continuous media objects the extra latency and overhead of decentralized resource management approach become negligible moreover our resource management system collapses three management tasks object location path selection and resource reservation into one fully decentralized object delivery mechanism reducing the latency even further in sum decentralization of the resource management satisfies our scalability and robustness objectives whereas collapsing the management tasks helps alleviate the latency and overhead constraints to achieve high resource utilization the object delivery scheme uses our proposed cost function as well as various object location and resource reservation policies to select and allocate the best streaming path to serve each request the object delivery scheme is designed as an application layer resource management middleware for the dcms architecture to be independent of the underlying telecommunication infrastructure our experiments show that our resource management system is successful in realization of the higher resource utilization for the dcms networks with the redhi topology
we consider the following problem given set of clusterings find single clustering that agrees as much as possible with the input clusterings this problem clustering aggregation appears naturally in various contexts for example clustering categorical data is an instance of the clustering aggregation problem each categorical attribute can be viewed as clustering of the input rows where rows are grouped together if they take the same value on that attribute clustering aggregation can also be used as metaclustering method to improve the robustness of clustering by combining the output of multiple algorithms furthermore the problem formulation does not require priori information about the number of clusters it is naturally determined by the optimization function in this article we give formal statement of the clustering aggregation problem and we propose number of algorithms our algorithms make use of the connection between clustering aggregation and the problem of correlation clustering although the problems we consider are np hard for several of our methods we provide theoretical guarantees on the quality of the solutions our work provides the best deterministic approximation algorithm for the variation of the correlation clustering problem we consider we also show how sampling can be used to scale the algorithms for large datasets we give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions
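a tiny sketch of the objective described above: the disagreement distance between two clusterings counts element pairs grouped together in one and apart in the other, and a simple baseline aggregator returns the input clustering with the lowest total disagreement. this baseline is illustrative only and is not one of the paper's algorithms.

# Pairwise disagreement distance between clusterings, plus a baseline aggregator
# that returns the input clustering minimizing total disagreement.
from itertools import combinations

def disagreements(c1, c2):
    """c1, c2: lists mapping element index -> cluster label."""
    return sum(1 for i, j in combinations(range(len(c1)), 2)
               if (c1[i] == c1[j]) != (c2[i] == c2[j]))

def best_of_inputs(clusterings):
    return min(clusterings,
               key=lambda c: sum(disagreements(c, other) for other in clusterings))

# three clusterings of 5 elements (e.g., three categorical attributes)
inputs = [[0, 0, 1, 1, 2], [0, 0, 1, 2, 2], [0, 1, 1, 1, 2]]
print(best_of_inputs(inputs))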
component based software structuring principles are now commonplace at the application level but componentization is far less established when it comes to building low level systems software although there have been pioneering efforts in applying componentization to systems building these efforts have tended to target specific application domains eg embedded systems operating systems communications systems programmable networking environments or middleware platforms they also tend to be targeted at specific deployment environments eg standard personal computer pc environments network processors or microcontrollers the disadvantage of this narrow targeting is that it fails to maximize the genericity and abstraction potential of the component approach in this article we argue for the benefits and feasibility of generic yet tailorable approach to component based systems building that offers uniform programming model that is applicable in wide range of systems oriented target domains and deployment environments the component model called opencom is supported by reflective runtime architecture that is itself built from components after describing opencom and evaluating its performance and overhead characteristics we present and evaluate two case studies of systems we have built using opencom technology thus illustrating its benefits and its general applicability
in mesh multicomputer performing jobs needs to schedule submeshes according to some processor allocation scheme in order to assign the incoming jobs to free submesh task compaction scheme is needed to generate larger contiguous free region the overhead of compaction depends on the efficiency of the task migration scheme in this paper two simple task migration schemes are first proposed in dimensional mesh multicomputers with supporting dimension ordered wormhole routing in one port communication model then hybrid scheme which combines advantages of the two schemes is discussed finally we evaluate the performance of all of these proposed approaches
store misses cause significant delays in shared memory multiprocessors because of limited store buffering and ordering constraints required for proper synchronization today programmers must choose from spectrum of memory consistency models that reduce store stalls at the cost of increased programming complexity prior research suggests that the performance gap among consistency models can be closed through speculation enforcing order only when dynamically necessary unfortunately past designs either provide insufficient buffering replace all stores with read modify write operations and or recover from ordering violations via impractical fine grained rollback mechanisms we propose two mechanisms that together enable store wait free implementations of any memory consistency model to eliminate buffer capacity related stalls we propose the scalable store buffer which places private speculative values directly into the cache thereby eliminating the non scalable associative search of conventional store buffers to eliminate ordering related stalls we propose atomic sequence ordering which enforces ordering constraints over coarse grain access sequences while relaxing order among individual accesses using cycle accurate full system simulation of scientific and commercial applications we demonstrate that these mechanisms allow the simplified programming of strict ordering while outperforming conventional implementations on average by sequential consistency sparc total store order and sparc relaxed memory order
in the generalized connectivity problem we are given an edge weighted graph and a collection s1 t1 … sk tk of distinct demands each demand si ti is a pair of disjoint vertex subsets we say that a subgraph of the input graph connects a demand si ti when it contains a path with one endpoint in si and the other in ti the goal is to identify a minimum weight subgraph that connects all demands in the collection alon et al soda introduced this problem to study online network formation settings and showed that it captures some well studied problems such as steiner forest non metric facility location tree multicast and group steiner tree finding a non trivial approximation ratio for generalized connectivity was left as an open problem our starting point is the first polylogarithmic approximation for generalized connectivity attaining a performance guarantee that is polylogarithmic in n and k where n is the number of vertices in the input graph and k is the number of demands we also prove that the cut covering relaxation of this problem has a polylogarithmic integrality gap building upon the results for generalized connectivity we obtain improved approximation algorithms for two problems that contain generalized connectivity as a special case for the directed steiner network problem we obtain an approximation guarantee parameterized by a constant ε that improves on the currently best performance guarantee due to charikar et al soda for the set connector problem recently introduced by fukunaga and nagamochi ipco we present a polylogarithmic approximation this result improves on the previously known ratio which can be Ω(·) in the worst case
we present method for extracting boundary surfaces from segmented cross section image data we use constrained potts model to interpolate an arbitrary number of region boundaries between segmented images this produces segmented volume from which we extract triangulated boundary surface using well known marching tetrahedra methods this surface contains staircase like artifacts and an abundance of unnecessary triangles we describe an approach that addresses these problems with voxel accurate simplification algorithm that reduces surface complexity by an order of magnitude our boundary interpolation and simplification methods are novel contributions to the study of surface extraction from segmented cross sections we have applied our method to construct polycrystal grain boundary surfaces from micrographs of sample of the metal tantalum
for the purposes of classification it is common to represent document as bag of words such representation consists of the individual terms making up the document together with the number of times each term appears in the document all classification methods make use of the terms it is common to also make use of the local term frequencies at the price of some added complication in the model examples are the naïve bayes multinomial model mm the dirichlet compound multinomial model dcm and the exponential family approximation of the dcm edcm as well as support vector machines svm although it is usually claimed that incorporating local word frequency in document improves text classification performance we here test whether such claims are true or not in this paper we show experimentally that simplified forms of the mm edcm and svm models which ignore the frequency of each word in document perform about at the same level as mm dcm edcm and svm models which incorporate local term frequency we also present new form of the naïve bayes multivariate bernoulli model mbm which is able to make use of local term frequency and show again that it offers no significant advantage over the plain mbm we conclude that word burstiness is so strong that additional occurrences of word essentially add no useful information to classifier
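a short sketch, assuming scikit-learn, of the manipulation the experiments rest on: clipping every term count to presence/absence so the classifier ignores local term frequency, here with a multinomial naive bayes model on a toy corpus; the benchmark collections and the full set of model variants are not reproduced.

# Compare multinomial naive Bayes on raw term counts vs. on binarized
# (presence/absence) counts; only a toy corpus is used here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["good good great film", "boring boring plot",
        "great acting good film", "dull boring scenes"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(docs)
X_bin = (X > 0).astype(int)              # drop local term frequency, keep presence only

for name, features in [("raw counts", X), ("binarized", X_bin)]:
    clf = MultinomialNB().fit(features, labels)
    print(name, clf.score(features, labels))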
in this article we present method for automatically recovering complete and dense depth maps of an indoor environment by fusing incomplete data for the environment modeling problem the geometry of indoor environments is usually extracted by acquiring huge amount of range data and registering it by acquiring small set of intensity images and very limited amount of range data the acquisition process is considerably simplified saving time and energy consumption in our method the intensity and partial range data are registered first by using an image based registration algorithm then the missing geometric structures are inferred using statistical learning method that integrates and analyzes the statistical relationships between the visual data and the available depth on terms of small patches experiments on real world data on variety of sampling strategies demonstrate the feasibility of our method
we introduce logic of functional fixed points it is suitable for analyzing heap manipulating programs and can encode several logics used for program verification with different ways of expressing reachability while full fixed point logic remains undecidable several subsets admit decision procedures in particular for the logic of linear functional fixed points we develop an abstraction refinement integration of the smt solver and satisfiability checker for propositional linear time temporal logic the integration refines the temporal abstraction by generating safety formulas until the temporal abstraction is unsatisfiable or model for it is also model for the functional fixed point formula
in this paper we propose an efficient graph based mining gbm algorithm for mining the frequent trajectory patterns in spatial temporal database the proposed method comprises two phases first we scan the database once to generate mapping graph and trajectory information lists ti lists then we traverse the mapping graph in depth first search manner to mine all frequent trajectory patterns in the database by using the mapping graph and ti lists the gbm algorithm can localize support counting and pattern extension in small number of ti lists moreover it utilizes the adjacency property to reduce the search space therefore our proposed method can efficiently mine the frequent trajectory patterns in the database the experimental results show that it outperforms the apriori based and prefixspan based methods by more than one order of magnitude
massive taxonomies for product classification are currently gaining popularity among commerce systems for diverse domains for instance amazoncom maintains an entire plethora of hand crafted taxonomies classifying books movies apparel and various other types of consumer goods we use such taxonomic background knowledge for the computation of personalized recommendations exploiting relationships between super concepts and sub concepts during profile generation empirical analysis both offline and online demonstrates our proposal’s superiority over existing approaches when user information is sparse and implicit ratings prevail besides addressing the sparsity issue we use parts of our taxonomy based recommender framework for balancing and diversifying personalized recommendation lists in order to reflect the user’s complete spectrum of interests though being detrimental to average accuracy we show that our method improves user satisfaction with recommendation lists in particular for lists generated using the common item based collaborative filtering algorithm we evaluate our method using book recommendation data including offline analysis on ratings and an online study involving more than subjects
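a small sketch of the taxonomy driven profile generation idea: the score of an item a user rated is propagated with decay to the item's ancestor concepts, so profiles of users with few common items can still overlap at higher levels. the toy taxonomy, decay factor, and cosine comparison are illustrative assumptions.

# Build a taxonomy-based user profile by propagating each rated item's score to its
# ancestor concepts (with decay), then compare users by cosine similarity on concepts.
from collections import defaultdict
import math

parent = {"sci-fi": "fiction", "fantasy": "fiction", "thriller": "fiction",
          "fiction": "books", "books": None}              # toy taxonomy (assumed)

def profile(rated_items, item_concept, decay=0.5):
    scores = defaultdict(float)
    for item, rating in rated_items.items():
        concept, weight = item_concept[item], 1.0
        while concept is not None:                         # walk up to the root
            scores[concept] += weight * rating
            concept, weight = parent[concept], weight * decay
    return scores

def cosine(p, q):
    num = sum(p[c] * q[c] for c in set(p) & set(q))
    den = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

item_concept = {"dune": "sci-fi", "hobbit": "fantasy", "gone girl": "thriller"}
u1 = profile({"dune": 5}, item_concept)
u2 = profile({"hobbit": 4}, item_concept)
print(cosine(u1, u2))      # non-zero overlap via the shared ancestor concepts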
in this paper we discuss the uprising problem of public key revocation the main problem in key revocation includes the relatively large memory and communication required to store and transmit the revoked list of keys this problem becomes serious as the sensor network is subjected to several constraints in this paper we introduce several efficient representation mechanisms for representing set of revoked identifiers of keys we discuss several network and revocation scenarios and introduce the corresponding solution for each to demonstrate the value of our proposed approaches practical simulation results and several comparisons with the current used revocation mechanism are included
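one generic compact set representation for a revoked identifier list, a bloom filter, shown only to make the memory/communication trade off concrete; it is not necessarily one of the representation mechanisms proposed in the paper, and its false positives would occasionally treat a valid key as revoked.

# Generic sketch: a Bloom filter as one compact representation of a set of revoked
# key identifiers. Membership tests can give false positives but never false negatives.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key_id):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key_id}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key_id):
        for p in self._positions(key_id):
            self.bits[p // 8] |= 1 << (p % 8)

    def probably_revoked(self, key_id):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key_id))

revoked = BloomFilter()
for kid in ["key-17", "key-42", "key-93"]:
    revoked.add(kid)
print(revoked.probably_revoked("key-42"), revoked.probably_revoked("key-7"))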
this paper discusses an approach to the problem of annotating multimedia content our approach provides annotation as metadata for indexing retrieval and semantic processing as well as content enrichment we use an underlying model for structured multimedia descriptions and annotations allowing the establishment of spatial temporal and linking relationships we discuss aspects related with documents and annotations used to guide the design of an application that allows annotations to be made with pen based interaction with tablet pcs as result video stream can be annotated during the capture the annotation can be further edited extended or played back synchronously
this study focuses on the development of conceptual simulation modeling tool that can be used to structure domain specific simulation environment the issues in software engineering and knowledge engineering such as object oriented concepts and knowledge representations are addressed to identify and analyze modeling frameworks and patterns of specific problem domain thus its structural and behavioral characteristics can be conceptualized and described in terms of simulation architecture and context moreover symbols notations and diagrams are developed as communication tool that creates blueprint to be seen and recognized by both domain experts and simulation developers which leads to the effectiveness and efficiency in the simulation development of any specific domains
reynolds’s abstraction theorem john reynolds types abstraction and parametric polymorphism in information processing pages north holland proceedings of the ifip th world computer congress often referred to as the parametricity theorem can be used to derive properties about functional programs solely from their types unfortunately in the presence of runtime type analysis the abstraction properties of polymorphic programs are no longer valid however runtime type analysis can be implemented with term level representations of types as in the language of crary et al karl crary stephanie weirich and greg morrisett intensional polymorphism in type erasure semantics journal of functional programming november where case analysis on these runtime representations introduces type refinement in this paper we show that representation based analysis is consistent with type abstraction by extending the abstraction theorem to such language we also discuss the free theorems that result this work provides foundation for the more general problem of extending the abstraction theorem to languages with generalized algebraic datatypes gadts
in this paper new definition of distance based outlier and an algorithm called hilout designed to efficiently detect the top outliers of large and high dimensional data set are proposed given an integer the weight of point is defined as the sum of the distances separating it from its nearest neighbors outlier are those points scoring the largest values of weight the algorithm hilout makes use of the notion of space filling curve to linearize the data set and it consists of two phases the first phase provides an approximate solution within rough factor after the execution of at most sorts and scans of the data set with temporal cost quadratic in and linear in and in where is the number of dimensions of the data set and is the number of points in the data set during this phase the algorithm isolates points candidate to be outliers and reduces this set at each iteration if the size of this set becomes then the algorithm stops reporting the exact solution the second phase calculates the exact solution with final scan examining further the candidate outliers that remained after the first phase experimental results show that the algorithm always stops reporting the exact solution during the first phase after much less than steps we present both an in memory and disk based implementation of the hilout algorithm and thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases
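As a minimal sketch of the weight definition above (the weight of a point is the sum of the distances to its k nearest neighbors, and the points with the largest weights are reported as outliers), assuming a brute-force in-memory computation; the actual HilOut algorithm avoids this quadratic cost through space-filling curves and candidate pruning.

```python
import numpy as np

def topk_outliers_by_weight(points: np.ndarray, k: int, n_outliers: int):
    """Score each point by the sum of distances to its k nearest neighbors
    and return the indices of the n_outliers largest weights.

    Brute-force O(n^2) illustration only, not the HilOut procedure itself.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))          # pairwise distances
    dists_sorted = np.sort(dists, axis=1)              # per-row ascending
    weights = dists_sorted[:, 1:k + 1].sum(axis=1)     # skip the zero self-distance
    return np.argsort(weights)[::-1][:n_outliers], weights

# usage on a toy data set with one obvious outlier
pts = np.vstack([np.random.randn(100, 2), [[8.0, 8.0]]])
top, w = topk_outliers_by_weight(pts, k=5, n_outliers=1)
```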
the task of discovering natural groupings of input patterns or clustering is an important aspect of machine learning and pattern analysis in this paper we study the widely used spectral clustering algorithm which clusters data using eigenvectors of similarity affinity matrix derived from data set in particular we aim to solve two critical issues in spectral clustering how to automatically determine the number of clusters and how to perform effective clustering given noisy and sparse data an analysis of the characteristics of eigenspace is carried out which shows that not every eigenvector of data affinity matrix is informative and relevant for clustering eigenvector selection is critical because using uninformative irrelevant eigenvectors could lead to poor clustering results and the corresponding eigenvalues cannot be used for relevant eigenvector selection given realistic data set motivated by the analysis novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative relevant eigenvectors are employed for determining the number of clusters and performing clustering the key element of the proposed algorithm is simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters our algorithm was evaluated using synthetic data sets as well as real world data sets generated from two challenging visual learning problems the results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal natural grouping of the input data patterns even given sparse and noisy data
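A hedged sketch of the eigenvector-selection idea: compute the leading eigenvectors of the normalized affinity matrix and keep only those that look informative for separating the data. The relevance score used below (the gap between the two halves of an eigenvector split at its median, normalized by its spread) and the threshold are crude placeholder assumptions; the paper learns relevance from how well each eigenvector separates the data set into clusters.

```python
import numpy as np

def relevant_eigenvectors(affinity: np.ndarray, max_k: int, threshold: float = 1.8):
    """Sketch of informative-eigenvector selection for spectral clustering."""
    deg = affinity.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-12))
    sym = d_inv_sqrt @ affinity @ d_inv_sqrt            # normalized affinity matrix
    vals, vecs = np.linalg.eigh(sym)
    leading = vecs[:, np.argsort(vals)[::-1][:max_k]]   # top-max_k eigenvectors

    def relevance(v):
        hi, lo = v[v >= np.median(v)], v[v < np.median(v)]
        if len(lo) == 0:                                 # near-constant eigenvector
            return 0.0
        return abs(hi.mean() - lo.mean()) / (v.std() + 1e-12)

    scores = np.array([relevance(leading[:, j]) for j in range(leading.shape[1])])
    kept = leading[:, scores > threshold]
    # run k-means on `kept`; its column count gives an estimate of the cluster number
    return kept, scores
```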
many scientific problems can be modeled as computational workflows that integrate data from heterogeneous sources and process such data to derive new results these data analysis problems are pervasive in the physical and social sciences as well as in government practice in this paper we present an approach to automatically create computational workflows in response to user data requests we represent both data access and data processing operations uniformly as web services we describe the inputs and outputs of the services according to an ontology of the application domain expressed in rdf rdfs our system uses the triple logic engine to formally represent the ontology and the services and to automatically generate the workflows this work is part of the argos project that is developing flexible data query and analysis system based on the web services paradigm our application domain is goods movement analysis and its effects on spatial urban structure since our ontology represents data items as multi dimensional objects with hierarchical values for each dimension in this paper we focus on automatically generating workflows that include aggregation operations
the purpose of software metrics is to measure the quality of programs the results can be for example used to predict maintenance costs or improve code quality an emerging view is that if software metrics are going to be used to improve quality they must help in finding code that should be refactored often refactoring or applying design pattern is related to the role of the class to be refactored in client based metrics project gives the class context these metrics measure how class is used by other classes in the context we present new client based metric lcic lack of coherence in clients which analyses if the class being measured has coherent set of roles in the program interfaces represent the roles of classes if class does not have coherent set of roles it should be refactored or new interface should be defined for the class we have implemented tool for measuring the metric lcic for java projects in the eclipse environment we calculated lcic values for classes of several open source projects we compare these results with results of other related metrics and inspect the measured classes to find out what kind of refactorings are needed we also analyse the relation of different design patterns and refactorings to our metric our experiments reveal the usefulness of client based metrics to improve the quality of code
data stream management systems may be subject to higher input rates than their resources can handle when overloaded the system must shed load in order to maintain low latency query results in this paper we describe load shedding technique for queries consisting of one or more aggregate operators with sliding windows we introduce new type of drop operator called window drop this operator is aware of the window properties ie window size and window slide of its downstream aggregate operators in the query plan accordingly it logically divides the input stream into windows and probabilistically decides which windows to drop this decision is further encoded into tuples by marking the ones that are disallowed from starting new windows unlike earlier approaches our approach preserves integrity of windows throughout query plan and always delivers subsets of original query answers with minimal degradation in result quality
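A hedged sketch of the window-drop idea under simplifying assumptions (tuple-based windows whose start positions fall every `window_slide` tuples): the operator decides probabilistically per window whether it may be opened and marks tuples that are disallowed from starting new windows, so any window the downstream aggregate does open is complete and its answer is exact.

```python
import random
from dataclasses import dataclass

@dataclass
class StreamTuple:
    value: float
    can_start_window: bool = True      # mark consulted by downstream aggregates

def window_drop(stream, window_slide: int, keep_prob: float, seed: int = 0):
    """Mark-based window drop for sliding-window aggregates (sketch).

    Tuples whose position coincides with a window start are, with probability
    1 - keep_prob, marked as disallowed from opening a new window; all tuples
    are still forwarded so that windows already opened stay complete.
    """
    rng = random.Random(seed)
    for i, t in enumerate(stream):
        if i % window_slide == 0 and rng.random() > keep_prob:
            t.can_start_window = False
        yield t

# usage: the downstream aggregate only opens windows where can_start_window is True
out = list(window_drop((StreamTuple(float(v)) for v in range(20)),
                       window_slide=5, keep_prob=0.5))
```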
as network bandwidth increases designing an effective memory system for network processors becomes significant challenge the size of the routing tables the complexity of the packet classification rules and the amount of packet buffering required all continue to grow at staggering rate simply relying on large fast srams alone is not likely to be scalable or cost effective instead trends point to the use of low cost commodity dram devices as means to deliver the worst case memory performance that network data plane algorithms demand while drams can deliver great deal of throughput the problem is that memory banking significantly complicates the worst case analysis and specialized algorithms are needed to ensure that specific types of access patterns are conflict free we introduce virtually pipelined memory an architectural technique that efficiently supports high bandwidth uniform latency memory accesses and high confidence throughput even under adversarial conditions virtual pipelining provides simple to analyze programming model of deep pipeline deterministic latencies with completely different physical implementation memory system with banks and probabilistic mapping this allows designers to effectively decouple the analysis of their algorithms and data structures from the analysis of the memory buses and banks unlike specialized hardware customized for specific data plane algorithm our system makes no assumption about the memory access patterns we present mathematical argument for our system’s ability to provably provide bandwidth with high confidence and demonstrate its functionality and area overhead through synthesizable design we further show that even though our scheme is general purpose to support new applications such as packet reassembly it outperforms the state of the art in specialized packet buffering architectures
we present congestion driven placement flow first we consider in the global placement stage the routing demand to replace cells in order to avoid congested regions then we allocate appropriate amounts of white space into different regions of the chip according to the congestion map finally detailed placer is applied to legalize placements while preserving the distributions of white space experimental results show that our placement flow can achieve the best routability with the shortest routed wirelength among all publicly available placement tools moreover our white space allocation approach can significantly improve the routabilities of placements generated by other placement tools
we show how solutions to many recursive arena equations can be computed in natural way by allowing loops in arenas we then equip arenas with winning functions and total winning strategies we present two natural winning conditions compatible with the loop construction which respectively provide initial algebras and terminal coalgebras for large class of continuous functors finally we introduce an intuitionistic sequent calculus extended with syntactic constructions for least and greatest fixed points and prove it has sound and in certain weak sense complete interpretation in our game model
multi hop network of wireless sensors can be used to gather spatio temporal samples of physical phenomenon and transmit these samples to processing center this paper addresses the important issue of minimizing the number of transmissions required to gather one sample from each sensor the technique used to minimize communication costs combines analytical results from stochastic geometry with distributed randomized algorithm for generating clusters of sensors the minimum communication energy achieved by this approach is significantly lower than the energy costs incurred in non clustered networks and in clustered networks produced by such algorithms as the max min cluster algorithm
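A hedged sketch of the distributed randomized clustering step: each sensor elects itself cluster head with some probability and the remaining sensors attach to the nearest head within radio range. The election probability and radio range below are plain parameters; in the paper they would be derived from the stochastic-geometry analysis, which this sketch does not reproduce.

```python
import math
import random

def form_clusters(nodes, p_head: float, radio_range: float, seed: int = 0):
    """Randomized cluster formation for a planar sensor field (sketch).

    `nodes` is a list of (x, y) positions; returns the indices of elected
    cluster heads and a mapping node index -> head index (or None).
    """
    rng = random.Random(seed)
    heads = [i for i in range(len(nodes)) if rng.random() < p_head]
    assignment = {}
    for i, n in enumerate(nodes):
        if i in heads:
            assignment[i] = i
            continue
        reachable = [(math.dist(n, nodes[h]), h) for h in heads
                     if math.dist(n, nodes[h]) <= radio_range]
        assignment[i] = min(reachable)[1] if reachable else None
    return heads, assignment

# usage: 200 sensors scattered on a unit square
sensors = [(random.random(), random.random()) for _ in range(200)]
heads, assign = form_clusters(sensors, p_head=0.05, radio_range=0.25)
```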
large scale value added internet services composed of independent cooperating or competing services will soon become common place several groups have addressed the performance communication discovery and description aspects of these services however little work has been done on effectively composing paid services and the quality of service qos guarantees that they provide we address these issues in the context of distributed file storage in this paper in particular we propose implement and evaluate cost effective qos aware distributed file service comprising front end file service and back end third party storage services our front end service uses mathematical modeling and optimization to provide performance and availability guarantees at low cost by carefully orchestrating the accesses to the back end services experimental results from our prototype implementation validate our modeling and optimization we conclude that our approach for providing qos at low cost should be useful to future composite internet services
one of the important issues in constructing interprocedural program slices is maintaining context sensitivity or preserving calling context when procedure is called at multiple call sites though number of context sensitive techniques have been presented in the last decade the following important questions remain unanswered what is the level of precision lost if context sensitivity is not maintained what are the additional costs for achieving context sensitivity in this paper we evaluate pdg based explicitly context sensitive interprocedural program slicing technique for accuracy and efficiency we compare this technique against context insensitive technique using program slicing framework we have developed for java programs for which only the byte code sequences are available our results show that the context sensitive technique in spite of its worst case exponential complexity can be very efficient in practice the execution time for our set of benchmarks is on the average only twice as much as the execution time for the context insensitive technique the results on the accuracy for the context insensitive technique are mixed for of the slicing criteria used in our experiments the context insensitive technique does not lose accuracy however in some cases it can also lead to slices with times more vertices on the average the slices constructed from the context insensitive technique are twice as large as the one from the context sensitive technique
in this paper we are interested in multimedia xml document retrieval whose aim is to find relevant document components ie xml elements that focus on the user needs we propose to represent multimedia elements using not only textual information but also hierarchical structure indeed an xml document can be represented as tree whose nodes correspond to xml elements thanks to this representation an analogy between xml documents and ontologies can be established therefore to quantify the participation degree of each node in the multimedia element representation we propose two measures using the ontology hierarchy another part of our model consists of defining the best window of multimedia fragments to be returned to the user through the evaluation of our model on the inex multimedia fragments task we show the importance of using the document structure in multimedia information retrieval
the ability to create user defined aggregate functions udas is rapidly becoming standard feature in relational database systems therefore problems such as query optimization query rewriting and view maintenance must take into account queries or views with udas there is wealth of research on these problems for queries with general aggregate functions unfortunately there is mismatch between the manner in which udas are created and the information that the database system requires in order to apply previous research the purpose of this paper is to explore this mismatch and to bridge the gap between theory and practice thereby enabling udas to become first class citizens within the database specifically we consider query optimization query rewriting and view maintenance for queries with udas for each of these problems we first survey previous results and explore the mismatch between theory and practice we then present theoretical and practical insights that can be combined to derive coherent framework for defining udas within database system
the area of wireless sensor networks wsn is currently attractive in the research community area due to its applications in diverse fields such as defense security civilian applications and medical research routing is serious issue in wsn due to the use of computationally constrained and resource constrained micro sensors these constraints prohibit the deployment of traditional routing protocols designed for other ad hoc wireless networks any routing protocol designed for use in wsn should be reliable energy efficient and should increase the lifetime of the network we propose simple least time energy efficient routing protocol with one level data aggregation that ensures increased life time for the network the proposed protocol was compared with popular ad hoc and sensor network routing protocols viz aodv royer and perkins das et al dsr johnson et al dsdv perkins and bhagwat dd intanagonwiwat et al and mcf ye et al it was observed that the proposed protocol outperformed them in throughput latency average energy consumption and average network lifetime the proposed protocol uses absolute time and node energy as the criteria for routing this ensures reliability and congestion avoidance
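The abstract states that absolute time and node energy are the routing criteria but does not publish how they are combined, so the next-hop selection below is only a hedged sketch: a weighted combination of an estimated delivery time and the residual energy of each neighbor, with the weight `alpha`, the neighbor fields `eta` and `energy`, and the sign convention all assumptions for illustration.

```python
def choose_next_hop(neighbors, alpha: float = 0.5):
    """Pick a next hop trading off delivery time against residual energy (sketch).

    Each neighbor is a dict with an estimated time to reach the sink ('eta')
    and a normalized residual energy ('energy'); lower cost is better.
    """
    def cost(nb):
        return alpha * nb["eta"] - (1.0 - alpha) * nb["energy"]
    return min(neighbors, key=cost)

# usage
neighbors = [
    {"id": "n1", "eta": 0.8, "energy": 0.9},
    {"id": "n2", "eta": 0.5, "energy": 0.2},
]
best = choose_next_hop(neighbors)
```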
networks on chip for future many core processor platforms face an increasing diversity of traffic requirements ranging from streaming traffic with real time requirements to bursty latency sensitive best effort traffic from general purpose processors with caches in this paper we propose back suction novel flow control scheme to implement quality of service traffic with service guarantees is selectively prioritized upon low buffer occupancy of downstream routers as result best effort traffic is preferred for an improved latency as long as guaranteed service traffic makes sufficient progress we present formal analysis and an experimental evaluation of the back suction scheme showing improved latency of best effort traffic when compared to current approaches even under formal service guarantees for streaming traffic
as the diversity of sensornet use cases increases the combinations of environments and applications that will coexist will make custom engineering increasingly impractical we investigate an approach that focuses on replacing custom engineering with automated optimization of declarative protocol specifications specifically we automate network rendezvous and proxy selection from program source these optimizations perform program transformations that are grounded in recursive query optimization an area of database theory our prototype system implementation can automatically choose program executions that are as much as three and usually one order of magnitude better than original source programs
ontology plays an essential role in recognizing the meaning of the information in web documents it has been shown that extracting concepts is easier than building relationships among them for defined set of concepts many existing algorithms produce all possible relationships for that set this makes the process of refining the relationships almost impossible new algorithm is needed to reduce the number of relationships among defined set of concepts produced by existing algorithms this article contributes such an algorithm which enables domain knowledge expert to refine the relationships linking set of concepts in the research reported here text mining tools have been used to extract concepts in the domain of commerce laws new algorithm has been proposed to reduce the number of extracted relationships it groups the concepts according to the number of relationships with other concepts and provides formalization an experiment and software have been built proving that reducing the number of relationships will reduce the efforts needed from human expert
we present an on the fly mechanism that detects access conflicts in executions of multi threaded java programs access conflicts are conservative approximation of data races the checker tracks access information at the level of objects object races rather than at the level of individual variables this viewpoint allows the checker to exploit specific properties of object oriented programs for optimization by restricting dynamic checks to those objects that are identified by escape analysis as potentially shared the checker has been implemented in collaboration with an ahead of time java compiler the combination of static program analysis escape analysis and inline instrumentation during code generation allows us to reduce the runtime overhead of detecting access conflicts this overhead amounts to about in time and less than in space for typical benchmark applications and compares favorably to previously published on the fly mechanism that incurred an overhead of about factor of in time and up to factor of in space
large body of research analyzes the runtime execution of system to extract abstract behavioral views those approaches primarily analyze control flow by tracing method execution events or they analyze object graphs of heap memory snapshots however they do not capture how objects are passed through the system at runtime we refer to the exchange of objects as the object flow and we claim that it is necessary to analyze object flows if we are to understand the runtime of an object oriented application we propose and detail object flow analysis novel dynamic analysis technique that takes this new information into account to evaluate its usefulness we present visual approach that allows developer to study classes and components in terms of how they exchange objects at runtime we illustrate our approach on three case studies
we propose data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step large dataset when compared to other studies in this domain is considered with white and red vinho verde samples from portugal three regression techniques were applied under computationally efficient procedure that performs simultaneous variable and model selection the support vector machine achieved promising results outperforming the multiple regression and neural network methods such model is useful to support the oenologist wine tasting evaluations and improve wine production furthermore similar techniques can help in target marketing by modeling consumer tastes from niche markets
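A hedged sketch of the modeling step using scikit-learn: the file name, separator, and column layout assume the public UCI "winequality" CSV format, and the plain RBF support vector regressor below omits the simultaneous variable and model selection procedure the study actually performs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Path and separator are assumptions (UCI "winequality-white.csv" layout).
data = pd.read_csv("winequality-white.csv", sep=";")
X, y = data.drop(columns="quality"), data["quality"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A plain RBF support vector regressor on standardized physicochemical inputs.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```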
in all to all personalized communication aapc every node of parallel system sends potentially unique packet to every other node aapc is an important primitive operation for modern parallel compilers since it is used to redistribute data structures during parallel computations as an extremely dense communication pattern aapc causes congestion in many types of networks and therefore executes very poorly on general purpose asynchronous message passing routers we present and evaluate network architecture that executes all to all communication optimally on two dimensional torus the router combines optimal partitions of the aapc step with self synchronizing switching mechanism integrated into conventional wormhole router optimality is achieved by routing along shortest paths while fully utilizing all links simple hardware addition for synchronized message switching can guarantee optimal aapc routing in many existing network architectures the flexible communication agent of the iwarp vlsi component allowed us to implement an efficient prototype for the evaluation of the hardware complexity as well as possible software overheads the measured performance on an times torus exceeded gigabytes sec or of the limit set by the raw speed of the interconnects we make quantitative comparison of the aapc router with conventional message passing system the potential gain of such router for larger parallel programs is illustrated with the example of two dimensional fast fourier transform
mobile application developers should be able to specify how applications can adapt to changing conditions and to later reconfigure the application to suit new circumstances event based communication have been advocated to facilitate such dynamic changes event based models however are fragmented which makes it difficult to understand the dependencies between components process oriented methodology overcomes this issue by specifying dependencies according to process model this paper describes methodology that combines the comprehensibility and manageability of control from process oriented methodologies with the flexibility of event based communication this enables fine grained adaptation of process oriented applications
we propose framework to construct web oriented user interfaces in high level way by exploiting declarative programming techniques such user interfaces are intended to manipulate complex data in type safe way ie it is ensured that only type correct data is accepted by the interface where types can be specified by standard types of programming language as well as any computable predicate on the data the interfaces are web based ie the data can be manipulated with standard web browsers without any specific requirements on the client side however if the client’s browser has javascript enabled one could also check the correctness of the data on the client side providing immediate feedback to the user in order to release the application programmer from the tedious details to interact with javascript we propose an approach where the programmer must only provide declarative description of the requirements of the user interface from which the necessary javascript programs and html forms are automatically generated this approach leads to very concise and maintainable implementation of web based user interfaces we demonstrate an implementation of this concept in the declarative multi paradigm language curry where the integrated functional and logic features are exploited to enable the high level of abstraction proposed in this paper
subspace clustering also called projected clustering addresses the problem that different sets of attributes may be relevant for different clusters in high dimensional feature spaces in this paper we propose the algorithm dish detecting subspace cluster hierarchies that improves in the following points over existing approaches first dish can detect clusters in subspaces of significantly different dimensionality second dish uncovers complex hierarchies of nested subspace clusters ie clusters in lower dimensional subspaces that are embedded within higher dimensional subspace clusters these hierarchies do not only consist of single inclusions but may also exhibit multiple inclusions and thus can only be modeled using graphs rather than trees third dish is able to detect clusters of different size shape and density furthermore we propose to visualize the complex hierarchies by means of an appropriate visualization model the so called subspace clustering graph such that the relationships between the subspace clusters can be explored at glance several comparative experiments show the performance and the effectivity of dish
direct volume rendering of scalar fields uses transfer function to map locally measured data properties to opacities and colors the domain of the transfer function is typically the one dimensional space of scalar data values this paper advances the use of curvature information in multi dimensional transfer functions with methodology for computing high quality curvature measurements the proposed methodology combines an implicit formulation of curvature with convolution based reconstruction of the field we give concrete guidelines for implementing the methodology and illustrate the importance of choosing accurate filters for computing derivatives with convolution curvature based transfer functions are shown to extend the expressivity and utility of volume rendering through contributions in three different application areas non photorealistic volume rendering surface smoothing via anisotropic diffusion and visualization of isosurface uncertainty
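A hedged sketch of the implicit curvature formulation mentioned above: with gradient g, unit normal n = g/|g|, projector P = I - n nᵀ and Hessian H, the eigenvalues of -P H P / |g| give the two principal curvatures of the isosurface (plus a zero along the normal). Central differences stand in here for the higher-quality convolution filters the paper advocates; the synthetic test volume in the usage is an assumption.

```python
import numpy as np

def principal_curvatures(volume: np.ndarray, point, spacing: float = 1.0):
    """Principal curvatures of the isosurface through a grid point (sketch)."""
    first = np.gradient(volume, spacing)                 # three first-derivative volumes
    x, y, z = point
    g = np.array([d[x, y, z] for d in first])
    H = np.empty((3, 3))
    for i, di in enumerate(first):
        second = np.gradient(di, spacing)
        for j in range(3):
            H[i, j] = second[j][x, y, z]
    norm_g = np.linalg.norm(g) + 1e-12
    n = g / norm_g
    P = np.eye(3) - np.outer(n, n)
    G = -P @ H @ P / norm_g
    evals = np.linalg.eigvalsh(G)
    return np.delete(evals, np.argmin(np.abs(evals)))    # drop the ~0 normal eigenvalue

# usage on a synthetic radial field whose isosurfaces are spheres
xs = np.linspace(-1.0, 1.0, 33)
vol = sum(c ** 2 for c in np.meshgrid(xs, xs, xs, indexing="ij"))
print(principal_curvatures(vol, (16, 16, 26), spacing=xs[1] - xs[0]))
```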
exhaustive verification often suffers from the state explosion problem where the reachable state space is too large to fit in main memory for this reason and because of disk swapping once the main memory is full very little progress is made and the process is not scalable to alleviate this partial verification methods have been proposed some based on randomized exploration mostly in the form of random walks in this paper we enhance partial randomized state space exploration methods with the concept of resource awareness the exploration algorithm is made aware of the limits on resources in particular memory and time we present memory aware algorithm that by design never stores more states than those that fit in main memory we also propose criteria to compare this algorithm with similar other algorithms we study properties of such algorithms both theoretically on simple classes of state spaces and experimentally on some preliminary case studies
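A hedged sketch of a memory-aware randomized exploration: the algorithm never remembers more visited states than a fixed budget, dropping older states when the budget is exceeded. The FIFO eviction, the random frontier selection, and the function names are assumptions for illustration, not the exact policy of the paper's algorithm.

```python
import random
from collections import deque

def bounded_random_exploration(initial, successors, check, max_states, max_steps, seed=0):
    """Randomized partial state-space exploration under a hard memory budget.

    `successors(s)` returns the next states of s; `check(s)` returns False
    when s violates the property being verified.
    """
    rng = random.Random(seed)
    visited, order = set(), deque()
    frontier = [initial]
    for _ in range(max_steps):
        if not frontier:
            return None                              # nothing left reachable from memory
        state = frontier.pop(rng.randrange(len(frontier)))
        if not check(state):
            return state                             # counterexample found
        if state in visited:
            continue
        visited.add(state)
        order.append(state)
        if len(visited) > max_states:                # enforce the memory budget
            visited.discard(order.popleft())
        frontier.extend(s for s in successors(state) if s not in visited)
        while len(frontier) > max_states:            # keep the frontier bounded too
            frontier.pop(rng.randrange(len(frontier)))
    return None                                      # budget exhausted, no violation seen
```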
large isps experience millions of bgp routing changes day in this paper we discuss the impact of bgp routing changes on the flow of trafffic summarizing and reconciling the results from six measurement studies of the sprint and at backbone networks
in this paper we present pixel level object categorization method suitable to be applied under real time constraints since pixels are categorized using bag of features scheme the major bottleneck of such an approach would be the feature pooling in local histograms of visual words therefore we propose to bypass this time consuming step and directly obtain the score from linear support vector machine classifier this is achieved by creating an integral image of the components of the svm which can readily obtain the classification score for any image sub window with only additions and products regardless of its size besides we evaluated the performance of two efficient feature quantization methods the hierarchical means and the extremely randomized forest all experiments have been done in the graz database showing comparable or even better results to related work with lower computational cost
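A hedged sketch of the integral-image trick described above: if every pixel is assigned a visual word and the classifier is linear, the per-pixel SVM weight contributions can be accumulated into an integral image, so the score of any sub-window is obtained with a handful of additions. The word map and trained weights are assumed given; normalization of the bag-of-features histogram is omitted in this illustration.

```python
import numpy as np

def build_score_integral(word_map: np.ndarray, svm_weights: np.ndarray):
    """Integral image of per-pixel linear-SVM contributions.

    word_map[y, x] is the visual-word index of each pixel and svm_weights[w]
    the learned weight of word w.
    """
    contrib = svm_weights[word_map]                       # per-pixel score term
    integral = contrib.cumsum(axis=0).cumsum(axis=1)
    return np.pad(integral, ((1, 0), (1, 0)))             # zero row/col for easy lookup

def window_score(integral: np.ndarray, y0, x0, y1, x1, bias: float = 0.0):
    """Linear SVM score of the sub-window [y0, y1) x [x0, x1) in O(1)."""
    s = (integral[y1, x1] - integral[y0, x1]
         - integral[y1, x0] + integral[y0, x0])
    return s + bias

# usage with random data
words = np.random.randint(0, 100, size=(240, 320))
weights = np.random.randn(100)
I = build_score_integral(words, weights)
print(window_score(I, 10, 10, 60, 80))
```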
cryptographic protocols often make use of nested cryptographic primitives for example signed message digests or encrypted signed messages gordon and jeffrey’s prior work on types for authenticity did not allow for such nested cryptography in this work we present the pattern matching spi calculus which is an obvious extension of the spi calculus to include pattern matching as primitive the novelty of the language is in the accompanying type system which uses the same language of patterns to describe complex data dependencies which cannot be described using prior type systems we show that any appropriately typed process is guaranteed to satisfy robust authenticity secrecy and integrity properties
the classic two stepped approach of the apriori algorithm and its descendants which consisted of finding all large itemsets and then using these itemsets to generate all association rules has worked well for certain categories of data nevertheless for many other data types this approach shows highly degraded performance and proves rather inefficient we argue that we need not search all the search space of candidate itemsets but rather let the database unveil its secrets as the customers use it we propose system that does not merely scan all possible combinations of the itemsets but rather acts like search engine specifically implemented for making recommendations to the customers using techniques borrowed from information retrieval
sensor network application experts such as biologists geologists and environmental engineers generally have little experience with and little patience for general purpose and often low level sensor network programming languages we believe sensor network languages should be designed for application experts who may not be expert programmers to further that goal we propose the concepts of sensor network application archetypes archetype specific languages and archetype templates our work makes the following contributions we have examined wide range of wireless sensor networks to develop taxonomy of seven archetypes this taxonomy permits the design of compact languages that are appropriate for novice programmers we developed language named wasp and its associated compiler for commonly encountered archetype we conducted user studies to evaluate the suitability of wasp and several alternatives for novice programmers to the best of our knowledge this hour user study is the first to evaluate broad range of sensor network languages tinyscript tiny sql swissqm and tinytemplate on average users of other languages successfully implemented their assigned applications of the time among the successful completions the average development time was minutes users of wasp had an average success rate of and an average development time of minutes an improvement of
mining weighted interesting patterns wip is an important research issue in data mining and knowledge discovery with broad applications wip can detect correlated patterns with strong weight and or support affinity however it still requires two database scans which are not applicable for efficient processing of the real time data like data streams in this paper we propose novel tree structure called spwip tree single pass weighted interesting pattern tree that captures database information using single pass of database and provides efficient mining performance using pattern growth mining approach extensive experimental results show that our approach outperforms the existing wip algorithm moreover it is very efficient and scalable for weighted interesting pattern mining with single database scan
in this paper we present technique for the interactive visualization and interrogation of multi dimensional giga pixel imagery co registered image layers representing discrete spectral wavelengths or temporal information can be seamlessly displayed and fused users can freely pan and zoom while swiftly transitioning through data layers enabling intuitive analysis of massive multi spectral or time varying records data resource aware display paradigm is introduced which progressively and adaptively loads data from remote network attached storage devices the technique is specifically designed to work with scalable high resolution massively tiled display environments by displaying hundreds of mega pixels worth of visual information all at once several users can simultaneously compare and contrast complex data layers in collaborative environment
we define bi simulations up to preorder and show how we can use them to provide coinductive bi simulation like characterisation of semantic equivalences preorders for processes in particular we can apply our results to all the semantics in the linear time branching time spectrum that are defined by preorders coarser than the ready simulation preorder the relation between bisimulations up to and simulations up to allows us to find some new relations between the equivalences that define the semantics and the corresponding preorders in particular we have shown that the simulation up to an equivalence relation is canonical preorder whose kernel is the given equivalence relation since all of these canonical preorders are defined in an homogeneous way we can prove properties for them in generic way as an illustrative example of this technique we generate an axiomatic characterisation of each of these canonical preorders that is obtained simply by adding single axiom to the axiomatization of the original equivalence relation thus we provide an alternative axiomatization for any axiomatizable preorder in the linear time branching time spectrum whose correctness and completeness can be proved once and for all although we first prove by induction our results for finite processes then we see by using continuity arguments that they are also valid for infinite finitary processes
mining web click streams is an important data mining problem with broad applications however it is also difficult problem since the streaming data possess some interesting characteristics such as unknown or unbounded length possibly very fast arrival rate inability to backtrack over previously arrived click sequences and lack of system control over the order in which the data arrive in this paper we propose projection based single pass algorithm called dsm plw data stream mining for path traversal patterns in landmark window for online incremental mining of path traversal patterns over continuous stream of maximal forward references generated at rapid rate according to the algorithm each maximal forward reference of the stream is projected into set of reference suffix maximal forward references and these reference suffix maximal forward references are inserted into new in memory summary data structure called sp forest summary path traversal pattern forest which is an extended prefix tree based data structure for storing essential information about frequent reference sequences of the stream so far the set of all maximal reference sequences is determined from the sp forest by depth first search mechanism called mrs mining maximal reference sequence mining theoretical analysis and experimental studies show that the proposed algorithm has gently growing memory requirements and makes only one pass over the streaming data
information retrieval systems often use machine learning techniques to induce classifiers capable of categorizing documents unfortunately the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention and induction techniques currently in use often suffer from prohibitive computational costs in the case study reported in this article we managed to reduce these costs by running baseline induction algorithm on the training examples described by diverse feature subsets thus obtaining several subclassifiers when asked about document’s classes master classifier combines the outputs of the subclassifiers this combination can be accomplished in several different ways but we achieved the best results with our own mechanism inspired by the dempster shafer theory dst we describe the technique compare its performance experimentally with that of more traditional voting approaches and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance
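A hedged sketch of the combination step: Dempster's rule merges the mass functions produced by two subclassifiers over a small frame of discernment. How a subclassifier's raw output is turned into belief masses is an application choice not taken from the paper; the class labels in the usage are invented for illustration.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozensets of class labels to belief mass;
    conflicting mass is discarded and the rest renormalized.
    """
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

# usage: two subclassifiers voting over the classes {sports, politics}
theta = frozenset({"sports", "politics"})
m_a = {frozenset({"sports"}): 0.7, theta: 0.3}
m_b = {frozenset({"sports"}): 0.5, frozenset({"politics"}): 0.2, theta: 0.3}
print(combine(m_a, m_b))
```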
the demand for network enabled limited footprint mobile devices is increasing rapidly central challenge that must be addressed in order to use these next generation devices effectively is efficient data management persistent data manipulated or required by applications executing on these computationally and communicationally impoverished devices must be consistently managed and made highly available this data management has traditionally been the responsibility of the os on which applications execute in this paper we extend this conventional os functionality to include post pc devices we propose novel programmatic solution to the problem of maintaining high data availability while attaining eventual consistency in the presence of mobility and disconnected operations device and network failures and limited device capabilities we achieve this by using combination of novel proxy architecture split request reply queue based on soft state principles and two tier update commit protocol we also exploit strong object typing to provide application specific conflict handling in order to attain faster eventual consistency as well as greater probability of automatic reconciliation
we propose regular expression pattern matching as core feature of programming languages for manipulating xml we extend conventional pattern matching facilities as in ml with regular expression operators such as repetition alternation etc that can match arbitrarily long sequences of subtrees allowing compact pattern to extract data from the middle of complex sequence we then show how to check standard notions of exhaustiveness and redundancy for these patterns regular expression patterns are intended to be used in languages with type systems based on regular expression types to avoid excessive type annotations we develop type inference scheme that propagates type constraints to pattern variables from the type of input values the type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations union intersection and difference on tree automata the main technical challenge is dealing with the interaction of repetition and alternation patterns with the first match policy which gives rise to subtleties concerning both the termination and precision of the analysis we address these issues by introducing data structure representing these closure operations lazily
the paper proposes forbidden set labeling scheme for the family of graphs with doubling dimension bounded by for an vertex graph in this family and for any desired precision parameter the labeling scheme stores an log bit label at each vertex given the labels of two end vertices and and the labels of set of forbidden vertices and or edges our scheme can compute in time polynomial in the length of the labels stretch approximation for the distance between and in the graph gf the labeling scheme can be extended into forbidden set labeled routing scheme with stretch for graphs of bounded doubling dimension
science projects of various disciplines generate large amounts of data and face fundamental challenge thousands of researchers want to obtain new scientific results by logically relating subsets of the total volume of data considering the huge and widely distributed amounts of data science communities investigate different technologies to provide fast access to the growing data sets among these technologies peer to peer pp and data grid are two models that fit these requirements well because of their potential to provide high quality of service with low cost in this paper we explore the possibility of using the pp paradigm for data intensive science applications on the grid we argue that additional support is required to achieve fast access to the huge and widely distributed amounts of data and propose escigrid to overcome the scalability barriers in today’s science communities escigrid allows science communities to achieve high query throughput through decentralized protocol which integrates caching with query processing the protocol takes into account the physical distance between peers and the amount of traffic carried by each node the result of this integration is constant complexity for moderate queries and fast data transfers between grid peers our results show that escigrid increases the performance of data access on science grids
composite object represented as directed graph is an important data structure which requires efficient support in cad cam case office systems software management web databases and document databases it is cumbersome to handle such an object in relational database systems when it involves recursive relationships in this chapter we present new encoding method to support the efficient computation of recursion in addition we devise linear time algorithm to identify sequence of reachable trees wrt directed acyclic graph dag which covers all the edges of the graph together with the new encoding method this algorithm enables us to compute recursion wrt dag in time where represents the number of edges of the dag more importantly this method is especially suitable for relational environment
formal frameworks exist that allow service providers and users to negotiate the quality of service while these agreements usually include non functional service properties the quality of the information offered by provider is neglected yet in important application scenarios notably in those based on the service oriented computing paradigm the outcome of complex workflows is directly affected by the quality of the data involved in this paper we propose model for formal data quality agreements between data providers and data consumers and analyze its feasibility by showing how provider may take data quality constraints into account as part of its data provisioning process our analysis of the technical issues involved suggests that this is complex problem in general although satisfactory algorithmic and architectural solutions can be found under certain assumptions to support this claim we describe an algorithm for dealing with constraints on the completeness of query result with respect to reference data source and outline an initial provider architecture for managing more general data quality constraints
inter object references are one of the key concepts of object relational and object oriented database systems in this work we investigate alternative techniques to implement inter object references and make the best use of them in query processing ie in evaluating functional joins we will give comprehensive overview and performance evaluation of all known techniques for simple single valued as well as multi valued functional joins furthermore we will describe special order preserving functional join techniques that are particularly attractive for decision support queries that require ordered results while most of the presentation of this paper is focused on object relational and object oriented database systems some of the results can also be applied to plain relational databases because index nested loop joins along key foreign key relationships as they are frequently found in relational databases are just one particular way to execute functional join
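A hedged sketch of a single-valued functional join evaluated by following stored object identifiers: each order holds the OID of its customer, and the join is a sequence of dereferences. A hash lookup stands in for an OID map or physical pointer; the data layout and field names are assumptions for illustration.

```python
def functional_join(orders: list, customers_by_oid: dict):
    """Evaluate a single-valued functional join by dereferencing OIDs (sketch)."""
    for order in orders:
        customer = customers_by_oid[order["customer_oid"]]   # pointer chase per tuple
        yield order["id"], customer["name"]

# usage
customers = {101: {"name": "acme"}, 102: {"name": "globex"}}
orders = [{"id": 1, "customer_oid": 102}, {"id": 2, "customer_oid": 101}]
print(list(functional_join(orders, customers)))
```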
dynamic finite versioning dfv schemes are an effective approach to concurrent transaction and query processing where finite number of consistent but maybe slightly out of date logical snapshots of the database can be dynamically derived for query access in dfv the storage overhead for keeping additional versions of changed data to support the logical snapshots and the amount of obsolescence faced by queries are two major performance issues in this paper we analyze the performance of dfv with emphasis on the trade offs between the storage cost and obsolescence we develop analytical models based on renewal process approximation to evaluate the performance of dfv using snapshots asymptotic closed form results for high query arrival rates are given for the case of two snapshots simulation is used to validate the analytical models and to evaluate the trade offs between various strategies for advancing snapshots when the results show that the analytical models match closely with simulation both the storage cost and obsolescence are sensitive to the snapshot advancing strategies especially for snapshots and generally speaking increasing the number of snapshots demonstrates trade off between storage overhead and query obsolescence for cases with skewed access or low update rates moderate increase in the number of snapshots beyond two can substantially reduce the obsolescence while the storage overhead may increase only slightly or even decrease in some cases such reduction in obsolescence is more significant as the coefficient of variation of the query length distribution becomes larger moreover for very low update rates large number of snapshots can be used to reduce the obsolescence to almost zero without increasing the storage overhead
fundamental problem in automating object database storage reclamation is determining how often to perform garbage collection we show that the choice of collection rate can have significant impact on application performance and that the best rate depends on the dynamic behavior of the application tempered by the particular performance goals of the user we describe two semi automatic self adaptive policies for controlling collection rate that we have developed to address the problem using trace driven simulations we evaluate the performance of the policies on test database application that demonstrates two distinct reclustering behaviors our results show that the policies are effective at achieving user specified levels of operations and database garbage percentage we also investigate the sensitivity of the policies over range of object connectivities the evaluation demonstrates that semi automatic self adaptive policies are practical means for flexibly controlling garbage collection rate
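The abstract does not spell out how its semi-automatic self-adaptive policies compute the collection rate, so the controller below is only a hedged sketch: a proportional feedback rule that nudges the rate toward a user-specified target garbage percentage. The gain, rate bounds, and the measurement interface are assumptions.

```python
class AdaptiveGcController:
    """Adjust garbage-collection rate toward a target garbage percentage (sketch)."""

    def __init__(self, target_garbage_pct, gain=0.1,
                 min_rate=0.01, max_rate=10.0, initial_rate=1.0):
        self.target = target_garbage_pct
        self.gain = gain
        self.min_rate, self.max_rate = min_rate, max_rate
        self.rate = initial_rate            # collections per unit of application work

    def update(self, measured_garbage_pct: float) -> float:
        error = measured_garbage_pct - self.target
        self.rate *= (1.0 + self.gain * error)          # collect more when above target
        self.rate = max(self.min_rate, min(self.max_rate, self.rate))
        return self.rate

# usage: react to periodic garbage measurements supplied by the store
ctl = AdaptiveGcController(target_garbage_pct=5.0)
for pct in (8.0, 6.5, 5.2, 4.1):
    print(ctl.update(pct))
```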
automatic differentiation ad is family of techniques to generate derivative code from mathematical model expressed in programming language ad computes partial derivatives for each operation in the input code and combines them to produce the desired derivative by applying the chain rule activity analysis is compiler analysis used to find active variables in automatic differentiation by lifting the burden of computing partial derivatives for passive variables activity analysis can reduce the memory requirement and run time of the generated derivative code this paper compares new context sensitive flow insensitive csfi activity analysis with an existing context insensitive flow sensitive cifs activity analysis in terms of execution time and the quality of the analysis results our experiments with eight benchmarks show that the new csfi activity analysis runs up to times faster and overestimates up to times fewer active variables than does the existing cifs activity analysis
this paper describes novel interactive media authoring framework mediate that enables amateurs to create videos of higher narrative or aesthetic quality with completely mobile lifecycle novel event bootstrapping dialog is used to derive shot suggestions that yield both targetted footage and annotation enabling an automatic computational media aesthetics aware editing phase the manual performance of which is typically barrier to the amateur this facilitates move away from requiring prior conception of the events or locale being filmed in the form of template to at capture bootstrapping of this information metadata gathered as part of the critical path of media creation also has implications for the longevity and reuse of captured media assets results of an evaluation performed on both the usability and delivered media aspects of the system are discussed which highlight the tenability of the proposed framework and the quality of the produced media
peer to peer pp file sharing systems are characterized by highly replicated content distributed among nodes with enormous aggregate resources for storage and communication these properties alone are not sufficient however to render pp networks immune to denial of service dos attack in this paper we study by means of analytical modeling and simulation the resilience of pp file sharing systems against dos attacks in which malicious nodes respond to queries with erroneous responses we consider the file targeted attacks in current use in the internet and we introduce new class of pp network targeted attacks in file targeted attacks the attacker puts large number of corrupted versions of single file on the network we demonstrate that the effectiveness of these attacks is highly dependent on the clients behavior for the attacks to succeed over the long term clients must be unwilling to share files slow in removing corrupted files from their machines and quick to give up downloading when the system is under attack in network targeted attacks attackers respond to queries for any file with erroneous information our results indicate that these attacks are highly scalable increasing the number of malicious nodes yields hyperexponential decrease in system goodput and moderate number of attackers suffices to cause near collapse of the entire system the key factors inducing this vulnerability are hierarchical topologies with misbehaving supernodes ii high path length networks in which attackers have increased opportunity to falsify control information and iii power law networks in which attackers insert themselves into high degree points in the graph finally we consider the effects of client counter strategies such as randomized reply selection redundant and parallel download and reputation systems some counter strategies eg randomized reply selection provide considerable immunity to attack reducing the scaling from hyperexponential to linear yet significantly hurt performance in the absence of an attack other counter strategies yield little benefit or penalty in particular reputation systems show little impact unless they operate with near perfection
relational learning and inductive logic programming ilp commonly use as covering test the subsumption test defined by plotkin based on reformulation of subsumption as binary constraint satisfaction problem this paper describes novel subsumption algorithm named django which combines well known csp procedures and subsumption specific data structures django is validated using the stochastic complexity framework developed in csps and imported in ilp by giordana et saitta principled and extensive experiments within this framework show that django improves on earlier subsumption algorithms by several orders of magnitude and that different procedures are better at different regions of the stochastic complexity landscape these experiments allow for building control layer over django termed meta django which determines the best procedures to use depending on the order parameters of the subsumption problem instance the performance gains and good scalability of django and meta django are finally demonstrated on real world ilp task emulating the search for frequent clauses in the mutagenesis domain though the smaller size of the problems results in smaller gain factors ranging from to
processing and extracting meaningful knowledge from count data is an important problem in data mining the volume of data is increasing dramatically as the data is generated by day to day activities such as market basket data web clickstream data or network data most mining and analysis algorithms require multiple passes over the data which requires extreme amounts of time one solution to save time would be to use samples since sampling is good surrogate for the data and the same sample can be used to answer many kinds of queries in this paper we propose two deterministic sampling algorithms biased and drs both produce samples vastly superior to the previous deterministic and random algorithms both in sample quality and accuracy our algorithms also improve on the run time and memory footprint of the existing deterministic algorithms the new algorithms can be used to sample from relational database as well as data streams with the ability to examine each transaction only once and maintain the sample on the fly in streaming fashion we further show how to engineer one of our algorithms drs to adapt and recover from changes to the underlying data distribution or sample size we evaluate our algorithms on three different synthetic datasets as well as on real world clickstream data and demonstrate the improvements over previous art
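To make the one-pass, bounded-memory requirement concrete, here is a hedged sketch of a deterministic streaming sample: each transaction receives a priority from a hash of its identifier and the items with the smallest priorities are kept, so the result depends only on the data and not on a random seed. This is not the paper's Biased or DRS algorithm, just an illustration of the setting.

```python
import hashlib
import heapq

def deterministic_stream_sample(stream, sample_size: int):
    """Single-pass, bounded-memory, deterministic sample of (id, item) pairs."""
    heap = []                                # max-heap of kept items via negated priorities
    for item_id, item in stream:
        prio = int.from_bytes(hashlib.sha1(str(item_id).encode()).digest()[:8], "big")
        if len(heap) < sample_size:
            heapq.heappush(heap, (-prio, item_id, item))
        elif -heap[0][0] > prio:             # new item beats the worst kept priority
            heapq.heapreplace(heap, (-prio, item_id, item))
    return [(i, x) for _, i, x in sorted(heap)]

# usage: sample 100 of 10,000 streamed transactions in one pass
sample = deterministic_stream_sample(enumerate(range(10_000)), sample_size=100)
```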
many software systems fail to address their intended purpose because of lack of user involvement and requirements deficiencies this paper discusses the elaboration of requirements analysis process that integrates critical parameter based approach to task modeling within user centric design framework on one hand adapting task models to capture requirements bridges the gap between scenarios and critical parameters which benefits design from the standpoint of user involvement and accurate requirements on the other hand using task models as reusable component leverages requirements reuse which benefits design by increasing quality while simultaneously reducing development costs and time to market first we present the establishment of both user centric and reuse centric requirements process along with its implementation within an integrated design tool suite secondly we report the design procedures and findings of two user studies aimed at assessing the feasibility for novice designers to conduct the process as well as evaluating the resulting benefits upon requirements analysis deliverables requirements quality and requirements reuse
in many applications one common problem is to identify images which may have undergone unknown transformations we define this problem as transformed image identification tii where the goal is to identify geometrically transformed and signal processed images for given test image the tii consists of three main stages feature detection feature representation and feature matching the tii approach by lowe presented in dg lowe distinctive image features from scale invariant keypoints int comput vision is one of the most promising techniques however both of its feature detection and matching stages are expensive because large number of feature points are detected in the image scale space and each feature point is described using high dimensional vector in this paper we explore the use of different techniques in each of the three tii stages and propose number of promising tii approaches by combining different techniques of the three stages our experimental results reveal that the proposed approaches not only improve the computational efficiency and decrease the storage requirement significantly but also increase the transformed image identification accuracy and robustness
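as a baseline illustration of the detection representation and matching stages the sketch below uses the sift implementation shipped with recent opencv together with lowe's ratio test it is not the reduced cost approaches proposed in the paper and the image paths and ratio threshold are placeholder assumptions

import cv2

# baseline pipeline: detect keypoints, describe them with 128-d sift vectors,
# then match descriptors with a nearest-neighbour ratio test
img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)        # placeholder paths
img2 = cv2.imread("candidate.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)

good = []
for pair in matches:
    # lowe's ratio test: keep a match only if it clearly beats the runner-up
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(len(good), "confident matches out of", len(matches))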
regression testing is an expensive maintenance process used to revalidate modified software regression test selection rts techniques try to lower the cost of regression testing by selecting and running subset of the existing test cases many such techniques have been proposed and initial studies show that they can produce savings we believe however that issues such as the frequency with which testing is done have strong effect on the behavior of these techniques therefore we conducted an experiment to assess the effects of test application frequency on the costs and benefits of regression test selection techniques our results expose essential tradeoffs that should be considered when using these techniques over series of software releases
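a minimal sketch of the selection step follows it maps each test to the entities it covers and reruns only the tests whose coverage intersects the changed entities this is a generic rts illustration not any specific published technique and the coverage data is invented

# generic regression test selection sketch: rerun only the tests whose
# coverage intersects the set of entities modified in the new release
def select_tests(coverage, changed_entities):
    """coverage: {test_name: set of covered functions/blocks}."""
    changed = set(changed_entities)
    return sorted(t for t, covered in coverage.items() if covered & changed)

coverage = {
    "test_login":    {"auth.check", "db.lookup"},
    "test_checkout": {"cart.total", "db.lookup"},
    "test_report":   {"report.render"},
}
print(select_tests(coverage, ["db.lookup"]))   # ['test_checkout', 'test_login']

running selection more often shrinks the set of changed entities and hence the selected subset which is one intuition behind the frequency effect studied above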
the error tolerance of human perception offers range of opportunities to trade numerical accuracy for performance in physics based simulation however most prior work on perceptual error tolerance either focuses exclusively on understanding the tolerance of the human visual system or burdens the application developer with case specific implementations such as level of detail lod techniques in this article based on detailed set of perceptual metrics we propose methodology to identify the maximum error tolerance of physics simulation then we apply this methodology in the evaluation of four case studies first we utilize the methodology in the tuning of the simulation timestep the second study deals with tuning the iteration count for the lcp solver then we evaluate the perceptual quality of fast estimation with error control feec yeh et al finally we explore the hardware optimization technique of precision reduction
intelligent desktop environments allow the desktop user to define set of projects or activities that characterize the user’s desktop work these environments then attempt to identify the current activity of the user in order to provide various kinds of assistance these systems take hybrid approach in which they allow the user to declare their current activity but they also employ learned classifiers to predict the current activity to cover those cases where the user forgets to declare the current activity the classifiers must be trained on the very noisy data obtained from the user’s activity declarations instead of asking the user to review and relabel the data manually we employ an active em algorithm that combines the em algorithm and active learning em can be viewed as retraining on its own predictions to make it more robust we only retrain on those predictions that are made with high confidence for active learning we make small number of queries to the user based on the most uncertain instances experimental results on real users show this active em algorithm can significantly improve the prediction precision and that it performs better than either em or active learning alone
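a minimal sketch in the spirit of the described loop follows retrain only on predictions made with high confidence and query the user about the most uncertain instances the classifier choice the thresholds and the simulated oracle are assumptions not the paper's exact algorithm

import numpy as np
from sklearn.linear_model import LogisticRegression

def active_em(X, y_noisy, oracle, rounds=5, conf=0.9, queries_per_round=3):
    """y_noisy: possibly wrong activity labels; oracle(i) returns a corrected label."""
    y = np.array(y_noisy)
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        proba = clf.predict_proba(X)
        confident = proba.max(axis=1) >= conf
        # em-style step: trust the model only where it is highly confident
        y[confident] = clf.classes_[proba[confident].argmax(axis=1)]
        # active-learning step: ask the user about the least confident instances
        for i in np.argsort(proba.max(axis=1))[:queries_per_round]:
            y[i] = oracle(i)
    return LogisticRegression(max_iter=1000).fit(X, y)

# toy usage: two synthetic desktop activities with 20% label noise
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y_true = np.array([0] * 50 + [1] * 50)
y_noisy = y_true.copy()
y_noisy[rng.choice(100, 20, replace=False)] ^= 1
model = active_em(X, y_noisy, oracle=lambda i: y_true[i])
print((model.predict(X) == y_true).mean())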
this paper addresses the problem of designing and implementing complex control systems for real time embedded software typical applications involve different control laws corresponding to different phases or modes eg take off full flight and landing in fly by wire control system on one hand existing methods such as the combination of simulink stateflow provide powerful but unsafe mechanisms by means of imperative updates of shared variables on the other hand synchronous languages and tools such as esterel or scade lustre are too restrictive and forbid to fully separate the specification of modes from their actual instantiation with particular control automaton in this paper we introduce conservative extension of synchronous data flow language close to lustre in order to be able to define systems with modes in more modular way while insuring the absence of data races we show that such system can be viewed as an object where modes are methods acting on shared memory the object is associated to scheduling policy which specifies the ways methods can be called to build valid synchronous reaction we show that the verification of the proper use of an object reduces to type inference problem using row types introduced by wand rémy and vouillon we define the semantics of the extended synchronous language and the type system the proposed extension has been implemented and we illustrate its use through several examples
in keyword search over data graphs an answer is nonredundant subtree that includes the given keywords an algorithm for enumerating answers is presented within an architecture that has two main components an engine that generates set of candidate answers and ranker that evaluates their score to be effective the engine must have three fundamental properties it should not miss relevant answers has to be efficient and must generate the answers in an order that is highly correlated with the desired ranking it is shown that none of the existing systems has implemented an engine that has all of these properties in contrast this paper presents an engine that generates all the answers with provable guarantees experiments show that the engine performs well in practice it is also shown how to adapt this engine to queries under the or semantics in addition this paper presents novel approach for implementing rankers destined for eliminating redundancy essentially an answer is ranked according to its individual properties relevancy and its intersection with the answers that have already been presented to the user within this approach experiments with specific rankers are described
the metric space model abstracts many proximity or similarity problems where the most frequently considered primitives are range and nearest neighbor search leaving out the similarity join an extremely important primitive in fact despite the great attention that this primitive has received in traditional and even multidimensional databases little has been done for general metric databases we solve two variants of the similarity join problem range joins given two sets of objects and distance threshold find all the object pairs one from each set at distance at most and closest pair joins find the closest object pairs one from each set for this sake we devise new metric index coined list of twin clusters ltc which indexes both sets jointly instead of the natural approach of indexing one or both sets independently finally we show how to use the ltc in order to solve classical range queries our results show significant speedups over the basic quadratic time naive alternative for both join variants and that the ltc is competitive with the original list of clusters when solving range queries furthermore we show that our technique has great potential for improvements
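the basic quadratic time naive alternative used as the baseline above fits in a few lines the ltc index itself is not reproduced and the distance function and toy data are placeholder assumptions

# naive baselines for the two similarity-join variants: compare every pair
def range_join(A, B, dist, r):
    """all pairs (a, b), one from each set, with dist(a, b) <= r."""
    return [(a, b) for a in A for b in B if dist(a, b) <= r]

def closest_pair_join(A, B, dist, k):
    """the k closest pairs, one element from each set."""
    pairs = sorted(((dist(a, b), a, b) for a in A for b in B), key=lambda p: p[0])
    return pairs[:k]

def euclidean(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

A = [(0, 0), (2, 1), (5, 5)]
B = [(1, 0), (6, 5), (9, 9)]
print(range_join(A, B, euclidean, r=1.5))
print(closest_pair_join(A, B, euclidean, k=2))

an index such as the ltc avoids most of these comparisons by grouping objects into clusters and pruning candidate pairs with the triangle inequality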
in this paper we show that feedback vertex set on planar graphs has kernel of size at most we give polynomial time algorithm that given planar graph finds equivalent planar graph with at most vertices where is the size of the minimum feedback vertex set of the kernelization algorithm is based on number of reduction rules the correctness of most of these rules is shown using new notion bases of induced subgraphs we also show how to use this new notion to automatically prove safeness of reduction rules and obtain tighter bounds for the size of the kernel
we describe novel document summarization technique that uses informational cues such as social bookmarks or search queries as the basis for summary construction by leveraging the snippet generation capabilities of standard search engines comprehensive evaluation demonstrates how the social summarization technique can generate summaries that are of significantly higher quality than those produced by number of leading alternatives
information provision to address the changing requirements can be best supported by content management the current information technology enables information to be stored and provided from various distributed sources to identify and retrieve relevant information requires effective mechanisms for information discovery and assembly this paper presents method which enables the design of such mechanisms with set of techniques for articulating and profiling users requirements formulating information provision specifications realising management of information content in repositories and facilitating response to the user’s requirements dynamically during the process of knowledge construction these functions are represented in an ontology which integrates the capability of the mechanisms the ontological modelling in this paper has adopted semiotics principles with embedded norms to ensure coherent course of actions represented in these mechanisms
the impact of pipeline length on both the power and performance of microprocessor is explored both by theory and by simulation theory is presented for range of power performance metrics bipsm the theory shows that the more important power is to the metric the shorter the optimum pipeline length that results for typical parameters neither bips nor bips yield an optimum ie non pipelined design is optimal for bips the optimum averaged over all workloads studied occurs at fo design point stage pipeline but this value is highly dependent on the assumed growth in latch count with pipeline depth as dynamic power grows the optimal design point shifts to shorter pipelines clock gating pushes the optimum to deeper pipelines surprisingly as leakage power grows the optimum is also found to shift to deeper pipelines the optimum pipeline depth varies for different classes of workloads spec and spec integer applications traditional legacy database and on line transaction processing applications modern web applications and floating point applications
traditional text classification algorithms are based on basic assumption the training and test data should hold the same distribution however this identical distribution assumption is always violated in real applications because the distribution of test data from target domain and the distribution of training data from auxiliary domain are different we call this classification problem cross domain classification although most of the training data are drawn from auxiliary domain we still can obtain few training data drawn from target domain to solve the cross domain classification problem in this situation we propose two stage algorithm which is based on semi supervised classification we first utilize labeled data in target domain to filter the support vectors of the auxiliary domain then use filtered data and labeled data from target domain to construct classifier for the target domain the experimental evaluation on real world text classification problems demonstrates encouraging results and validates our approach
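one plausible reading of the two stage procedure is sketched below an svm is trained on the auxiliary domain its support vectors are filtered by agreement with a classifier fit on the few target labels and the final classifier is trained on the filtered vectors plus the target data the agreement based filter and the linear kernel are assumptions not necessarily the paper's exact choices

import numpy as np
from sklearn.svm import SVC

def two_stage_cross_domain(X_aux, y_aux, X_tgt, y_tgt):
    # stage 1: train on the auxiliary domain and extract its support vectors
    aux_clf = SVC(kernel="linear").fit(X_aux, y_aux)
    X_sv = X_aux[aux_clf.support_]
    y_sv = y_aux[aux_clf.support_]

    # filtering: keep only auxiliary support vectors whose labels agree with a
    # classifier fit on the small labelled target sample (assumed filter rule)
    tgt_clf = SVC(kernel="linear").fit(X_tgt, y_tgt)
    keep = tgt_clf.predict(X_sv) == y_sv

    # stage 2: train the final target-domain classifier on filtered + target data
    X_final = np.vstack([X_sv[keep], X_tgt])
    y_final = np.concatenate([y_sv[keep], y_tgt])
    return SVC(kernel="linear").fit(X_final, y_final)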
with internet delivery of video content surging to an unprecedented level online video advertising is becoming increasingly pervasive in this paper we present novel advertising system for online video service called videosense which automatically associates the most relevant video ads with online videos and seamlessly inserts the ads at the most appropriate positions within each individual video unlike most current video oriented sites that only display video ad at the beginning or the end of video videosense aims to embed more contextually relevant ads at less intrusive positions within the video stream given an online video videosense is able to detect set of candidate ad insertion points based on content discontinuity and attractiveness select list of relevant candidate ads ranked according to global textual relevance and compute local visual aural relevance between each pair of insertion points and ads to support contextually relevant and less intrusive advertising the ads are expected to be inserted at the positions with highest discontinuity and lowest attractiveness while the overall global and local relevance is maximized we formulate this task as nonlinear integer programming problem and embed these rules as constraints the experiments have proved the effectiveness of videosense for online video advertising
this paper addresses some issues involved in applying the event condition action eca rule paradigm of active databases to policies collections of general principles specifying the desired behavior of system we use declarative policy description language called pdl in which policies are formulated as sets of eca rules the main contribution of the paper is framework for detecting action conflicts and finding resolutions to these conflicts conflicts are captured as violations of action constraints the semantics of rules and conflict detection and resolution are defined axiomatically using logic programs given policy and set of action constraints the framework defines range of monitors that filter the output of the policy to satisfy the constraints
coordination languages and models can play key role in the engineering of environment in mas multiagent systems in this paper we take the respect coordination language for programming tuple centres and extend it so as to govern interactions between agents and environment in particular we show how its event model can be generalised to support the management of general environment events and make tuple centres situated to this end first case study is sketched where it is shown how the extended respect can be adopted to coordinate system for sensing and controlling environmental properties then the syntax and semantics of the extended version of respect is discussed
domain specific languages dsl have many potential advantages in terms of software engineering ranging from increased productivity to the application of formal methods although they have been used in practice for decades there has been little study of methodology or implementation tools for the dsl approach in this paper we present our dsl approach and its application to realistic domain the generation of video display device drivers the presentation focuses on the validation of our proposed framework for domain specific languages from design to implementation the framework leads to flexible design and structure and provides automatic generation of efficient implementations of dsl programs additionally we describe an example of complete dsl for video display adaptors and the benefits of the dsl approach for this application this demonstrates some of the generally claimed benefits of using dsls increased productivity higher level abstraction and easier verification this dsl has been fully implemented with our approach and is available at the compose project url http://www.irisa.fr/compose/gal
this paper describes an architecture allowing to verify properties of multiagent system during its execution this architecture is the basis of our study whose goal is to check at runtime if agents and more generally multiagent systems satisfy requirements considering that correct system is system verifying the properties specified by the designer we are interested in the property notion that is why we give here definition of property and we present an architecture to validate them the architecture multiagent system itself is based on set of agents whose goals are to check at runtime the whole system’s properties so after brief description of the property notion we describe our architecture and the way to check systems
model driven architecture mda is software development approach promoted by the omg mda is based on two key concepts models and model transformations several kinds of models are generally used throughout the development process to specify software system and to support its analysis and validation uml and its extensions such as the uml profile for real time systems uml spt are commonly used to define the structure and the behavior of software systems while other models such as performance models or schedulability models are more suitable for performance or schedulability analysis respectively in this paper we discuss model transformation enabling the derivation of schedulability analysis models from uml spt models as proof of concepts we present prototype implementation of this model transformation using atl we provide definition of the source and target metamodels using the metamodel specification language km and we specify the transformation in an atl module we discuss the merits and limitations of our approach and of its implementation
alhambra is browser based system designed to enforce and test web browser security policies at the core of alhambra is policy enhanced browser supporting fine grain security policies that restrict web page contents and execution alhambra requires no server side modifications or additions to the web application policies can restrict the construction of the document as well as the execution of javascript using access control rules and taint tracking engine using the alhambra browser we present two security policies that we have built using our architecture both designed to prevent cross site scripting the first policy uses taint tracking engine to prevent cross site scripting attacks that exploit bugs in the client side of the web applications the second one uses browsing history to create policies that restrict the contents of documents and prevent the inclusion of malicious content using alhambra we analyze the impact of policies on the compatibility of web pages to test compatibility alhambra supports revisiting user generated browsing sessions and comparing multiple security policies in parallel to quickly and automatically evaluate security policies to compare security policies for identical pages we have also developed useful comparison metrics that quantify differences between identical pages executed with different security policies not only do we show that our policies are effective with minimal compatibility cost we also demonstrate that alhambra can enforce strong security policies and provide quantitative evaluation of the differences introduced by security policies
in this paper we show that any n point metric space can be embedded into distribution over dominating tree metrics such that the expected stretch of any edge is o(log n) this improves upon the result of bartal who gave bound of o(log n log log n) moreover our result is existentially tight there exist metric spaces where any tree embedding must have omega(log n) distortion this problem lies at the heart of numerous approximation and online algorithms including ones for group steiner tree metric labeling buy at bulk network design and metrical task system our result improves the performance guarantees for all of these problems
demographic data regarding users and items exist in most available recommender systems data sets still there has been limited research involving such data this work sets the foundations for novel filtering technique which relies on information of that kind it starts by providing general step by step description of an approach which combines demographic information with existing filtering algorithms via weighted sum in order to generate more accurate predictions demog and demog are presented as an application of that general approach specifically on user based and item based collaborative filtering several experiments involving different settings of the proposed approach support its utility and prove that it shows enough promise in generating predictions of improved quality
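the weighted sum combination can be sketched as follows a demographic estimate is computed from the ratings of demographically similar users and blended with whatever cf prediction is already available the cosine similarity the neighbourhood size and the weight are illustrative assumptions

import numpy as np

def demographic_prediction(user_demo, all_demo, item_ratings, k=10):
    """estimate a rating from the k most demographically similar users who rated the item."""
    # in practice each demographic attribute would be normalised first
    rated = np.where(~np.isnan(item_ratings))[0]
    if rated.size == 0:
        return np.nan
    sims = all_demo[rated] @ user_demo / (
        np.linalg.norm(all_demo[rated], axis=1) * np.linalg.norm(user_demo) + 1e-9
    )
    order = np.argsort(-sims)[:k]
    weights = np.clip(sims[order], 0.0, None) + 1e-9
    return float(np.average(item_ratings[rated[order]], weights=weights))

def hybrid_prediction(cf_pred, demo_pred, w=0.7):
    # weighted sum of the collaborative-filtering and demographic estimates
    return cf_pred if np.isnan(demo_pred) else w * cf_pred + (1 - w) * demo_pred

demo = np.array([[25, 1, 0], [30, 1, 0], [60, 0, 1], [27, 1, 0]], dtype=float)
item_ratings = np.array([4.0, 5.0, 2.0, np.nan])   # the active user (index 3) has not rated
d = demographic_prediction(demo[3], demo, item_ratings, k=2)
print(hybrid_prediction(cf_pred=3.8, demo_pred=d))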
web services ws constitute an essential factor for the next generation of application integration an important direction apart from the optimization of the description mechanism is the discovery of ws information and ws search engines lookup capabilities in this paper we propose novel decentralized approach for ws discovery based on new distributed peer based approach our proposed solution builds upon the domain name service decentralized approach majorly enhanced with novel efficient lookup system in particular we work with peers that store ws information such as service descriptions which are efficiently located using scalable and robust data indexing structure for peer to peer networks the balanced distributed tree bdt the bdt provides support for processing exact match queries of the form given key map the key onto node and range queries of the form map the nodes whose keys belong to given range bdt adapts efficiently update queries as nodes join and leave the system and can answer queries even if the system is continuously changing results from our theoretical analysis point out that the communication cost of both lookup and update operations scales in sub logarithmic almost double logarithmic time complexity on the number of nodes furthermore our system is also robust to failures
many natural games have both high and low cost nash equilibria their price of anarchy is high and yet their price of stability is low in such cases one could hope to move behavior from high cost equilibrium to low cost one by public service advertising campaign encouraging players to follow the low cost equilibrium and if every player follows the advice then we are done however the assumption that everyone follows instructions is unrealistic more natural assumption is that some players will follow them while other players will not in this paper we consider the question of to what extent can such an advertising campaign cause behavior to switch from bad equilibrium to good one even if only fraction of people actually follow the given advice and do so only temporarily unlike the value of altruism model we assume everyone will ultimately act in their own interest we analyze this question for several important and widely studied classes of games including network design with fair cost sharing scheduling with unrelated machines and party affiliation games which include consensus and cut games we show that for some of these games such as fair cost sharing random alpha fraction of the population following the given advice is sufficient to get guarantee within an alpha factor of the price of stability for any alpha for other games such as party affiliation games there is strict threshold in this case alpha yields almost no benefit yet alpha is enough to reach near optimal behavior finally for some games such as scheduling no value alpha is sufficient we also consider viral marketing model in which certain players are specifically targeted and analyze the ability of such targeting to influence behavior using much smaller number of targeted players
streaming data to efficiently render complex scenes in presence of global illumination is still challenging task in this paper we introduce new data structure based on grid of irradiance vectors to store the indirect illumination appearing on complex and detailed objects the irradiance vector grid ivg this representation is independent of the geometric complexity and is suitable for quantization to different quantization schemes moreover its streaming over network involves only small overhead compared to detailed geometry and can be achieved independently of the geometry furthermore it can be efficiently rendered using modern graphics hardware we demonstrate our new data structure in new remote visualization system that integrates indirect lighting streaming and progressive transmission of the geometry and study the impact of different strategies on data transfer
in the past few years there has been an increasing availability of technologies for the acquisition of digital models of real objects and the consequent use of these models in variety of applications in medicine engineering and cultural heritage in this framework content based retrieval of objects is becoming an important subject of research and finding adequate descriptors to capture global or local characteristics of the shape has become one of the main investigation goals in this article we present comparative analysis of few different solutions for description and retrieval by similarity of models that are representative of the principal classes of approaches proposed we have developed an experimental analysis by comparing these methods according to their robustness to deformations the ability to capture an object’s structural complexity and the resolution at which models are considered
finding the most relevant symmetry planes for an object is key step in many computer vision and object recognition tasks in fact such information can be effectively used as starting point for object segmentation noise reduction alignment and recognition some of these applications are strongly affected by the accuracy of symmetry planes estimation thus the use of technique that is both accurate and robust to noise is critical in this paper we introduce new weighted association graph which relates the main symmetry planes of objects to large sets of tightly coupled vertices this technique allows us to cast symmetry detection to classical pairwise clustering problem which we solve using the very effective dominant sets framework the improvement of our approach over other well known techniques is shown with several tests over both synthetic data and sampled point clouds
people need to find work with and put together information diverse activities such as scholarly research comparison shopping and entertainment involve collecting and connecting information resources we need to represent collections in ways that promote understanding of individual information resources and also their relationships representing individual resources with images as well as text makes good use of human cognitive facilities composition an alternative to lists means putting representations of elements in collection together using design principles to form connected whole we develop combinformation mixed initiative system for representing collections as compositions of image and text surrogates the system provides set of direct manipulation facilities for forming editing organizing and distributing collections as compositions additionally to assist users in sifting through the vast expanse of potentially relevant information resources the system also includes generative agent that can proactively engage in processes of collecting information resources and forming image and text surrogates generative temporal visual composition agent develops the collection and its visual representation over time enabling users to see more possibilities to keep the user in control we develop interactive techniques that enable the user to direct the agent for evaluation we conducted field study in an undergraduate general education course offered in the architecture department alternating groups of students used combinformation as an aid in preparing one of two major assignments involving information discovery to support processes of invention the students that used combinformation were found to perform better
traditionally an instruction decoder is designed as monolithic structure that inhibits the leakage energy optimization in this paper we consider split instruction decoder that enables the leakage energy optimization we also propose compiler scheduling algorithm that exploits instruction slack to increase the simultaneous active and idle duration in instruction decoder the proposed compiler assisted scheme obtains further reduction of energy consumption of instruction decoder over hardware only scheme for vliw architecture the benefits are and in the context of clustered and clustered vliw architecture respectively
peer production systems rely on users to self select appropriate tasks and scratch their personal itch however many such systems require significant maintenance work which also implies the need for collective action that is individuals following goals set by the group and performing good citizenship behaviors how can this paradox be resolved here we examine one potential answer the influence of social identification with the larger group on contributors behavior we examine wikipedia highly successful peer production system and find significant and growing influence of group structure with prevalent example being the wikiproject comparison of editors who join projects with those who do not and comparisons of the joiners behavior before and after they join project suggest their identification with the group plays an important role in directing them towards group goals and good citizenship behaviors upon joining wikipedians are more likely to work on project related content to shift their contributions towards coordination rather than production work and to perform maintenance work such as reverting vandalism these results suggest that group influence can play an important role in maintaining the health of online communities even when such communities are putatively self directed peer production systems
tracking unknown human motions using generative tracking techniques requires the exploration of high dimensional pose space which is both difficult and computationally expensive alternatively if the type of activity is known and training data is available low dimensional latent pose space may be learned and the difficulty and cost of the estimation task reduced in this paper we attempt to combine the competing benefits flexibility and efficiency of these two generative tracking scenarios within single approach we define number of activity models each composed of pose space with unique dimensionality and an associated dynamical model and each designed for use in the recovery of particular class of activity we then propose method for the fair combination of these activity models for use in particle dispersion by an annealed particle filter the resulting algorithm which we term multiple activity model annealed particle filtering mam apf is able to dynamically vary the scope of its search effort using small number of particles to explore latent pose spaces and large number of particles to explore the full pose space we present quantitative results on the humaneva and humaneva ii datasets demonstrating robust tracking of known and unknown activities from fewer than four cameras
in this paper we give surprisingly efficient algorithms for list decoding and testing random linear codes our main result is that random sparse linear codes are locally list decodable and locally testable in the high error regime with only constant number of queries more precisely we show that for all constants and and for every linear code which is sparse nc and unbiased each nonzero codeword in has weight in is locally testable and locally list decodable from fraction worst case errors using only poly queries to received word we also give subexponential time algorithms for list decoding arbitrary unbiased but not necessarily sparse linear codes in the high error regime in particular this yields the first subexponential time algorithm even for the problem of unique decoding random linear codes of inverse polynomial rate from fixed positive fraction of errors earlier kaufman and sudan had shown that sparse unbiased codes can be locally unique decoded and locally tested from constant fraction of errors where this constant fraction tends to as the number of codewords grows our results significantly strengthen their results while also having significantly simpler proofs at the heart of our algorithms is natural self correcting operation defined on codes and received words this self correcting operation transforms code with received word into simpler code and related received word such that is close to if and only if is close to starting with sparse unbiased code and an arbitrary received word constant number of applications of the self correcting operation reduces us to the case of local list decoding and testing for the hadamard code for which the well known algorithms of goldreich levin and blum luby rubinfeld are available this yields the constant query local algorithms for the original code our algorithm for decoding unbiased linear codes in subexponential time proceeds similarly applying the self correcting operation to an unbiased code and an arbitrary received word super constant number of times we get reduced to the problem of learning noisy parities for which non trivial subexponential time algorithms were recently given by blum kalai wasserman and feldman gopalan khot ponnuswami our result generalizes result of lyubashevsky which gave subexponential time algorithm for decoding random linear codes of inverse polynomial rate from random errors
multicore is now the dominant processor trend and the number of cores is rapidly increasing the paradigm shift to multicore forces the redesign of the software stack which includes dynamic analysis dynamic analyses provide rich features to software in various areas such as debugging testing optimization and security however these techniques often suffer from excessive overhead which makes them less practical previously this overhead has been overcome by improved processor performance as each generation gets faster but the performance requirements of dynamic analyses in the multicore era cannot be fulfilled without redesigning for parallelism scalable design of dynamic analysis is challenging problem not only must the analysis itself be parallel but the analysis must also be decoupled from the application and run concurrently typical method of decoupling the analysis from the application is to send the analysis data from the application to the core that runs the analysis thread via buffering however buffering can perturb application cache performance and the cache coherence protocol may not be efficient or even implemented with large numbers of cores in the future this paper presents our initial effort to explore the hardware design space and software approach that will alleviate the scalability problem for dynamic analysis on multicore we choose to make use of explicit inter core communication that is already available in real processor the tile processor and evaluate the opportunity for scalable dynamic analyses we provide our model and implement concurrent call graph profiling as case study our evaluation shows that pure communication overhead from the application point of view is as low as we expect that our work will help design scalable dynamic analyses and will influence the design of future many core processors
in this paper method with the double purpose of reducing the consumption of energy and giving deterministic guarantee on the fault tolerance of real time embedded systems operating under the rate monotonic discipline is presented lower bound exists on the slack left free by tasks being executed at their worst case execution time this deterministic slack can be redistributed and used for any of the two purposes the designer can set the trade off point between them in addition more slack can be reclaimed when tasks are executed in less than their worst case time fault tolerance is achieved by using the slack to recompute the faulty task energy consumption is reduced by lowering the operating frequency of the processor as much as possible while meeting all time constraints this leads to multifrequency method simulations are carried out to test it versus two single frequency methods nominal and reduced frequencies this is done under different trade off points and rates of faults occurrence the existence of an upper bound on the overhead caused by the transition time between frequencies in rate monotonic scheduled real time systems is formally proved the method can also be applied to multicore or multiprocessor systems
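a much simplified sketch of the underlying trade off follows using the classical liu and layland utilization bound part of the spare utilization is reserved for re executing a faulty task and the remainder is converted into a lower operating frequency this only illustrates the idea of splitting slack between the two purposes it is not the paper's deterministic slack analysis

def rm_bound(n):
    return n * (2 ** (1.0 / n) - 1)        # liu-layland sufficient utilization bound

def choose_frequency(tasks, recovery_share=0.5):
    """tasks: list of (wcet, period) measured at nominal frequency.
    recovery_share: fraction of the spare utilization kept free so a faulty
    task instance can be re-executed; the rest is used to slow the processor."""
    u = sum(c / t for c, t in tasks)
    slack = rm_bound(len(tasks)) - u
    if slack <= 0:
        return 1.0                          # no provable slack: stay at nominal speed
    usable = slack * (1.0 - recovery_share)
    # at relative frequency f the utilization becomes u / f, so require
    # u / f <= u + usable, i.e. f >= u / (u + usable); clamp to real levels in practice
    return u / (u + usable)

tasks = [(1.0, 10.0), (2.0, 20.0), (3.0, 40.0)]
print(round(choose_frequency(tasks), 3))    # about 0.52 of the nominal frequency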
parameterization is one of the most powerful features to make specifications and declarative programs modular and reusable and our best hope for scaling up formal verification efforts this paper studies order sorted parameterization at three different levels its mathematical semantics ii its operational semantics by term rewriting and iii the inductive reasoning principles that can soundly be used to prove properties about such specifications it shows that achieving the desired properties at each of these three levels is considerably subtler matter than for many sorted specifications but that such properties can be attained under reasonable conditions
transferring existing mesh deformation from one character to another is simple way to accelerate the laborious process of mesh animation in many cases it is useful to preserve the semantic characteristics of the motion instead of its literal deformation for example when applying the walking motion of human to flamingo the knees should bend in the opposite direction semantic deformation transfer accomplishes this task with shape space that enables interpolation and projection with standard linear algebra given several example mesh pairs semantic deformation transfer infers correspondence between the shape spaces of the two characters this enables automatic transfer of new poses and animations
power and energy are first order design constraints in high performance computing current research using dynamic voltage scaling dvs relies on trading increased execution time for energy savings which is unacceptable for most high performance computing applications we present adagio novel runtime system that makes dvs practical for complex real world scientific applications by incurring only negligible delay while achieving significant energy savings adagio improves and extends previous state of the art algorithms by combining the lessons learned from static energy reducing cpu scheduling with novel runtime mechanism for slack prediction we present results using adagio for two real world programs umtk and paradis along with the nas parallel benchmark suite while requiring no modification to the application source code adagio provides total system energy savings of and for umtk and paradis respectively with less than increase in execution time
one of the strengths of rough set theory is the fact that an unknown target concept can be approximately characterized by existing knowledge structures in knowledge base knowledge structures in knowledge bases have two categories complete and incomplete in this paper through uniformly expressing these two kinds of knowledge structures we first address four operators on knowledge base which are adequate for generating new knowledge structures through using known knowledge structures then an axiom definition of knowledge granulation in knowledge bases is presented under which some existing knowledge granulations become its special forms finally we introduce the concept of knowledge distance for calculating the difference between two knowledge structures in the same knowledge base noting that the knowledge distance satisfies the three properties of distance space on all knowledge structures induced by given universe these results will be very helpful for knowledge discovery from knowledge bases and significant for establishing framework of granular computing in knowledge bases
distributed consensus algorithm allows processes to reach a common decision value starting from individual inputs wait free consensus in which process always terminates within finite number of its own steps is impossible in an asynchronous shared memory system however consensus becomes solvable using randomization when process only has to terminate with probability randomized consensus algorithms are typically evaluated by their total step complexity which is the expected total number of steps taken by all processes this work proves that the total step complexity of randomized consensus is in an asynchronous shared memory system using multi writer multi reader registers the bound is achieved by improving both the lower and the upper bounds for this problem in addition to improving upon the best previously known result by a factor of log the lower bound features a greatly streamlined proof both goals are achieved through restricting attention to set of layered executions and using an isoperimetric inequality for analyzing their behavior the matching algorithm decreases the expected total step complexity by log factor by leveraging the multi writing capability of the shared registers its correctness proof is facilitated by viewing each execution of the algorithm as stochastic process and applying kolmogorov’s inequality
aggregates of individual objects such as forests crowds and piles of fruit are common source of complexity in computer graphics scenes when viewing an aggregate observers attend less to individual objects and focus more on overall properties such as numerosity variety and arrangement paradoxically rendering and modeling costs increase with aggregate complexity exactly when observers are attending less to individual objects in this paper we take some first steps to characterize the limits of visual coding of aggregates to efficiently represent their appearance in scenes we describe psychophysical experiments that explore the roles played by the geometric and material properties of individual objects in observers abilities to discriminate different aggregate collections based on these experiments we derive metrics to predict when two aggregates have the same appearance even when composed of different objects in follow up experiment we confirm that these metrics can be used to predict the appearance of range of realistic aggregates finally as proof of concept we show how these new aggregate perception metrics can be applied to simplify scenes by allowing substitution of geometrically simpler aggregates for more complex ones without changing appearance
semantic formalisms represent content in uniform way according to ontologies this enables manipulation and reasoning via automated means eg semantic web services but limits the user’s ability to explore the semantic data from point of view that originates from knowledge representation motivations we show how for user consumption visualization of semantic data according to some easily graspable dimensions eg space and time provides effective sense making of data in this paper we look holistically at the interaction between users and semantic data and propose multiple visualization strategies and dynamic filters to support the exploration of semantic rich data we discuss user evaluation and how interaction challenges could be overcome to create an effective user centred framework for the visualization and manipulation of semantic data the approach has been implemented and evaluated on real company archive
extended transaction models in databases were motivated by the needs of complex applications such as cad and software engineering transactions in such applications have diverse needs for example they may be long lived and they may need to cooperate we describe asset system for supporting extended transactions asset consists of set of transaction primitives that allow users to define custom transaction semantics to match the needs of specific applications we show how the transaction primitives can be used to specify variety of transaction models including nested transactions split transactions and sagas application specific transaction models with relaxed correctness criteria and computations involving workflows can also be specified using the primitives we describe the implementation of the asset primitives in the context of the ode database
for real time communication we must be able to guarantee timely delivery of messages in recent years improvements in technology have made possible switch based local area networks lans and system area networks sans that use wormhole switching pipelined switching technique which permits significantly shorter network latencies and higher throughputs than traditional store and forward packet switching this paper proposes model for real time communication in such wormhole networks based on the use of real time wormhole channels which are simplex virtual circuits in wormhole networks with certain real time guarantees distinguishing feature of our model is that it can be used in existing wormhole networks without any special hardware support preliminary delay analysis and properties are shown for the proposed real time wormhole channel model practical quadratic time complexity algorithm is shown for determining the feasibility of set of wormhole channels finally as an example of the utility of our model actual parameter values obtained from experiments on myrinet switch based network are used to determine if real time guarantees are possible for an example set of real time traffic streams intermixed with nonreal time traffic
multi core system on chips socs with on chip networks are becoming reality after almost decade of research one challenge in developing such socs is the need of efficient and accurate simulators for design space exploration this paper addresses this need by presenting socexplore framework for fast communication centric design space exploration of complex socs with network based interconnects efficiency is achieved through abstraction of computation as high level trace while accuracy is maintained through cycle accurate interconnect simulation the flexibility offered allows for fast partition mapping and interconnect design space exploration in case study speed up of over architectural simulation is obtained for the mpeg application critical evaluation of the capabilities of our or any trace based framework is also provided
this paper presents simultaneous speculative threading sst which is technique for creating high performance area and power efficient cores for chip multiprocessors sst hardware dynamically extracts two threads of execution from single sequential program one consisting of load miss and its dependents and the other consisting of the instructions that are independent of the load miss and executes them in parallel sst uses an efficient checkpointing mechanism to eliminate the need for complex and power inefficient structures such as register renaming logic reorder buffers memory disambiguation buffers and large issue windows simulations of certain sst implementations show better per thread performance on commercial benchmarks than larger and higher powered out of order cores sun microsystems rock processor which is the first processor to use sst cores has been implemented and is scheduled to be commercially available in
we present new characterization of termination of general logic programs most existing termination analysis approaches rely on some static information about the structure of the source code of logic program such as modes types norms level mappings models interargument relations and the like we propose dynamic approach that employs some key dynamic features of an infinite generalized sldnf derivation such as repetition of selected subgoals and recursive increase in term size we also introduce new formulation of sldnf trees called generalized sldnf trees generalized sldnf trees deal with negative subgoals in the same way as prolog and exist for any general logic programs
managing the power consumption of computing platforms is complicated problem thanks to multitude of hardware configuration options and characteristics much of the academic research is based on unrealistic assumptions and has therefore seen little practical uptake we provide an overview of the difficulties facing power management schemes when used in real systems we present koala platform which uses pre characterised model at run time to predict the performance and energy consumption of piece of software an arbitrary policy can then be applied in order to dynamically trade performance and energy consumption we have implemented this system in recent linux kernel and evaluated it by running variety of benchmarks on number of different platforms under some conditions we observe energy savings of for performance loss
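a hedged sketch of the policy layer follows given per setting runtime and energy predictions from a pre characterised model pick the lowest energy setting whose predicted slowdown stays within the allowed performance loss the settings and numbers are invented placeholders not koala's actual model

# choose a frequency setting from model predictions: minimise energy subject
# to a bound on performance loss relative to the fastest setting
def pick_setting(predictions, max_loss=0.10):
    """predictions: {setting: (predicted_runtime_s, predicted_energy_j)}."""
    t_best = min(t for t, _ in predictions.values())
    admissible = {s: (t, e) for s, (t, e) in predictions.items()
                  if t <= (1.0 + max_loss) * t_best}
    return min(admissible, key=lambda s: admissible[s][1])

predictions = {               # placeholder outputs of a pre-characterised model
    "2.4GHz": (10.0, 120.0),
    "1.8GHz": (10.8, 105.0),
    "1.2GHz": (13.0,  98.0),
}
print(pick_setting(predictions, max_loss=0.10))   # -> '1.8GHz'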
in many application domains eg www mining molecular biology large string datasets are available and yet under exploited the inductive database framework assumes that both such datasets and the various patterns holding within them might be queryable in this setting queries which return patterns are called inductive queries and solving them is one of the core research topics for data mining indeed constraint based mining techniques on string datasets have been studied extensively efficient algorithms enable to compute complete collections of patterns eg substrings which satisfy conjunctions of monotonic and or anti monotonic constraints in large datasets eg conjunctions of minimal and maximal support constraints we consider that fault tolerance and softness are extremely important issues for tackling real life data analysis we address some of the open problems when evaluating soft support constraint which implies the computations of pattern soft occurrences instead of the classical exact matching ones solving efficiently soft support constraints is challenging since it prevents from the clever use of monotonicity properties we describe our proposal and we provide an experimental validation on real life clickstream data which confirms the added value of this approach
sets and bags are closely related structures and have been studied in relational databases bag is different from set in that it is sensitive to the number of times an element occurs while set is not in this paper we introduce the concept of web bag in the context of web warehouse called whoweda warehouse of web data which we are currently building informally web bag is web table which allows multiple occurrences of identical web tuples web bag helps to discover useful knowledge from web table such as visible documents or web sites luminous documents and luminous paths in this paper we perform cost benefit analysis with respect to storage transmission and operational cost of web bags and discuss issues and implications of materializing web bags as opposed to web tables containing distinct web tuples we have computed analytically the upper and lower bounds for the parameters which affect the cost of materializing web bags
this work investigates design choices in modeling discourse scheme for improving opinion polarity classification for this two diverse global inference paradigms are used supervised collective classification framework and an unsupervised optimization framework both approaches perform substantially better than baseline approaches establishing the efficacy of the methods and the underlying discourse scheme we also present quantitative and qualitative analyses showing how the improvements are achieved
in this paper we present general and an efficient algorithm for automatic selection of new application specific instructions under hardware resources constraints the instruction selection is formulated as an ilp problem and efficient solvers can be used for finding the optimal solution an important feature of our algorithm is that it is not restricted to basic block level nor does it impose any limitation on the number of the newly added instructions or on the number of the inputs outputs of these instructions the presented results show that significant overall application speedup is achieved even for large kernels for adpcm decoder the speedup ranges from to and that our algorithm compares well with other state of art algorithms for automatic instruction set extensions
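the ilp formulation can be illustrated with a small knapsack style sketch using the pulp library binary variables select candidate custom instructions to maximise estimated cycle savings under an area budget the candidates their numbers and the reduced constraint set are assumptions the formulation in the paper additionally handles input output counts and other architectural constraints

from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# candidate custom instructions: estimated cycle savings and hardware area cost
candidates = {
    "mac":       {"saving": 120, "area": 35},
    "sad4":      {"saving": 200, "area": 60},
    "butterfly": {"saving":  90, "area": 25},
    "clip_add":  {"saving":  40, "area": 10},
}
area_budget = 80

prob = LpProblem("instruction_selection", LpMaximize)
x = {name: LpVariable(f"sel_{name}", cat=LpBinary) for name in candidates}

prob += lpSum(c["saving"] * x[n] for n, c in candidates.items())              # objective
prob += lpSum(c["area"] * x[n] for n, c in candidates.items()) <= area_budget  # area limit

prob.solve()
print([n for n in candidates if x[n].value() == 1])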
inverse dithering is to restore the original continuous tone image from its dithering halftone we propose to use iterated conditional modes icm for approximating maximum posteriori map solution to the inverse problem the statistical model on which the icm is based takes advantage of the information on dither arrays for the considered two common mrf’s for measuring the smoothness of images the corresponding energy functions are convex the combination of this convexity and the structure of the constraint space associated with the map problem guarantees the global optimality the icm always searches the valid image space for better estimate there is no question of going beyond the valid space in addition it requires only local computation and is easy to implement the experimental results show that the restored images have high quality compared with two previous dmi dithering model based inverse methods our icm has higher psnr’s by db the results also show that using the gauss mrf gmrf for the continuous tone image often had higher psnr than using the huber mrf hmrf an advantage of the gmrf is that it makes the icm much easier to implement than the hmrf makes
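a compact sketch of the icm iteration under the gauss mrf follows the halftone and the dither array confine each pixel to an interval and the constrained quadratic minimiser is the neighbourhood mean clipped to that interval the parallel update order the toy dither array and the stopping rule are simplifying assumptions

import numpy as np

def icm_inverse_dither(halftone, dither, iters=20, levels=256):
    """halftone: 0/1 array; dither: threshold array of the same shape."""
    # valid interval per pixel: halftone 1 means the original value exceeded the threshold
    lo = np.where(halftone > 0, dither + 1.0, 0.0)
    hi = np.where(halftone > 0, float(levels - 1), dither.astype(float))
    x = (lo + hi) / 2.0                             # feasible initial estimate
    for _ in range(iters):
        # 4-neighbour mean with replicated borders: the unconstrained gmrf minimiser
        p = np.pad(x, 1, mode="edge")
        mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        # clipping to the valid interval gives the constrained minimiser; all pixels are
        # updated in parallel here for brevity, classical icm sweeps them one by one
        x = np.clip(mean, lo, hi)
    return x

# toy example: a constant mid-grey image halftoned with a tiled 2x2 dither array
dither = np.tile(np.array([[0.0, 128.0], [192.0, 64.0]]), (16, 16))
original = np.full(dither.shape, 100.0)
halftone = (original > dither).astype(np.uint8)
print(icm_inverse_dither(halftone, dither).mean())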
concolic testing automates test input generation by combining the concrete and symbolic concolic execution of the code under test traditional test input generation techniques use either concrete execution or symbolic execution that builds constraints and is followed by generation of concrete test inputs from these constraints in contrast concolic testing tightly couples both concrete and symbolic executions they run simultaneously and each gets feedback from the other we have implemented concolic testing in tools for testing both c and java programs we have used the tools to find bugs in several real world software systems including sglib popular data structure library used in commercial tool third party implementation of the needham schroeder protocol and the tmn protocol the scheduler of honeywell’s deos real time operating system and the sun microsystems jdk collection framework in this tutorial we will describe concolic testing and some of its recent extensions
in multihop wireless systems the need for cooperation among nodes to relay each other’s packets exposes them to wide range of security attacks particularly devastating attack is the wormhole attack where malicious node records control traffic at one location and tunnels it to colluding node possibly far away which replays it locally this can have an adverse effect on route establishment by preventing nodes from discovering legitimate routes that are more than two hops away previous works on tolerating wormhole attacks have focused only on detection and used specialized hardware such as directional antennas or extremely accurate clocks more recent work has addressed the problem of locally isolating the malicious nodes however all of this work has been done in the context of static networks due to the difficulty of secure neighbor discovery with mobile nodes the existing work on secure neighbor discovery has limitations in accuracy resource requirements and applicability to ad hoc and sensor networks in this paper we present countermeasure for the wormhole attack called mobiworp which alleviates these drawbacks and efficiently mitigates the wormhole attack in mobile networks mobiworp uses secure central authority ca for global tracking of node positions local monitoring is used to detect and isolate malicious nodes locally additionally when sufficient suspicion builds up at the ca it enforces global isolation of the malicious node from the whole network the effect of mobiworp on the data traffic and the fidelity of detection is brought out through extensive simulation using ns the results show that as time progresses the data packet drop ratio goes to zero with mobiworp due the capability of mobiworp to detect diagnose and isolate malicious nodes with an appropriate choice of design parameters mobiworp is shown to completely eliminate framing of legitimate node by malicious nodes at the cost of slight increase in the drop ratio the results also show that increasing mobility of the nodes degrades the performance of mobiworp
this work is settled in the framework of abstract simplicial complexes we propose definition of watershed and of collapse for maps defined on pseudomanifolds of arbitrary dimension through an equivalence theorem we establish deep link between these two notions any watershed can be obtained by collapse iterated until idempotence and conversely any collapse iterated until idempotence induces watershed we also state an equivalence result which links the notions of watershed and of collapse with the one of minimum spanning forest
we evaluate the dense optical flow between two frames via variational approach in this paper new framework for deriving the regularization term is introduced giving geometric insight into the action of smoothing term the framework is based on the beltrami paradigm in image denoising it includes general formulation that unifies several previous methods using the proposed framework we also derive two novel anisotropic regularizers incorporating new criterion that requires co linearity between the gradients of optical flow components and possibly the intensity gradient we call this criterion alignment and reveal its existence also in the celebrated nagel and enkelmann’s formulation it is shown that the physical model of rotational motion of rigid body pure divergent convergent flow and irrotational fluid flow satisfy the alignment criterion in the flow field experimental tests in comparison to recently published method show the capability of the new criterion in improving the optical flow estimations
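one possible formalization of the alignment criterion mentioned above (co-linearity of the gradients of the two flow components) is a vanishing two-dimensional cross product, which can be penalized in the regularizer; this is a hedged reconstruction of the idea, not necessarily the paper's exact term.

```latex
% alignment: the gradients of the flow components u and v are co-linear
\nabla u \times \nabla v \;=\; u_x v_y - u_y v_x \;=\; 0 ,
\qquad
E_{\mathrm{align}}(u,v) \;=\; \int_\Omega \bigl( u_x v_y - u_y v_x \bigr)^2 \, dx\, dy .
```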
there has been lot of recent research on transaction based concurrent programming aimed at offering an easier concurrent programming paradigm that enables programmers to better exploit the parallelism of modern multi processor machines such as multi core microprocessors we introduce transactional state machines tsms as an abstract finite data model of transactional shared memory concurrent programs tsms are variant of concurrent boolean programs or concurrent extended recursive state machines augmented with additional constructs for specifying potentially nested transactions namely some procedures or code segments can be marked as transactions and are meant to be executed atomically and there are also explicit commit and abort operations for transactions the tsm model is non blocking and allows interleaved executions where multiple processes can simultaneously be executing inside transactions it also allows nested transactions transactions which may never terminate and transactions which may be aborted explicitly or aborted automatically by the run time environment due to memory conflicts we show that concurrent executions of tsms satisfy correctness criterion closely related to serializability which we call stutter serializability with respect to shared memory we initiate study of model checking problems for tsms model checking arbitrary tsms is easily seen to be undecidable but we show it is decidable in the following case when recursion is exclusively used inside transactions in all but one of the processes we show that model checking such tsms against all stutter invariant omega regular properties of shared memory is decidable
we present websos novel overlay based architecture that provides guaranteed access to web server that is targeted by denial of service dos attack our approach exploits two key characteristics of the web environment its design around human centric interface and the extensibility inherent in many browsers through downloadable applets we guarantee access to web server for large number of previously unknown users without requiring pre existing trust relationships between users and the system our prototype requires no modifications to either servers or browsers and makes use of graphical turing tests web proxies and client authentication using the ssl tls protocol all readily supported by modern browsers we use the websos prototype to conduct performance evaluation over the internet using planetlab testbed for experimentation with network overlays we determine the end to end latency using both chord based approach and our shortcut extension our evaluation shows the latency increase by factor of and respectively confirming our simulation results
the tiered algorithm is presented for time efficient and message efficient detection of process termination it employs global invariant of equality between process production and consumption at each level of process nesting to detect termination regardless of execution interleaving order and network transit time correctness is validated for arbitrary process launching hierarchies including launch in transit hazards where processes are created dynamically based on run time conditions for remote execution the performance of the tiered algorithm is compared to three existing schemes with comparable capabilities namely the cv ltd and credit termination detection algorithms for synchronization of tasks terminating in epochs of idle processing the tiered algorithm is shown to incur message count complexity and lg message bit complexity while incurring detection latency corresponding to only integer addition and comparison the synchronization performance in terms of messaging overhead detection operations and storage requirements are evaluated and compared across numerous task creation and termination hierarchies
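the sketch below is a toy, centralized illustration of the invariant stated above, namely that termination holds when process production equals consumption at every level of nesting; the real tiered algorithm distributes this bookkeeping across the hierarchy, and the class and method names here are assumptions for illustration only.

```python
from collections import defaultdict

class TierBookkeeper:
    """Toy illustration of the production/consumption invariant: termination
    is declared when, at every nesting level, the number of processes
    produced equals the number consumed (terminated)."""
    def __init__(self):
        self.produced = defaultdict(int)
        self.consumed = defaultdict(int)

    def launch(self, level):          # a process at level-1 spawns a child
        self.produced[level] += 1

    def terminate(self, level):       # a child at this level reports termination
        self.consumed[level] += 1

    def terminated(self):
        return all(self.produced[l] == self.consumed[l] for l in self.produced)

bk = TierBookkeeper()
bk.launch(1); bk.launch(2)            # root spawns a child, which spawns another
bk.terminate(2); bk.terminate(1)
print(bk.terminated())                # True once every level balances
```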
this article describes geographical study on the usage of search engine focusing on the traffic details at the level of countries and continents the main objective is to understand from geographic point of view how the needs of the users are satisfied taking into account the geographic location of the host in which the search originates and the host that contains the web page that was selected by the user in the answers our results confirm that the web is cultural mirror of society and shed light on the implicit social network behind search these results are also useful as input for the design of distributed search engines
overlay network monitoring enables distributed internet applications to detect and recover from path outages and periods of degraded performance within seconds for an overlay network with end hosts existing systems either require measurements and thus lack scalability or can only estimate the latency but not congestion or failures our earlier extended abstract chen bindel and katz tomography based overlay network monitoring proceedings of the acm sigcomm internet measurement conference imc briefly proposes an algebraic approach that selectively monitors linearly independent paths that can fully describe all the paths the loss rates and latency of these paths can be used to estimate the loss rates and latency of all other paths our scheme only assumes knowledge of the underlying ip topology with links dynamically varying between lossy and normal in this paper we improve implement and extensively evaluate such monitoring system we further make the following contributions scalability analysis indicating that for reasonably large eg the growth of is bounded as log ii efficient adaptation algorithms for topology changes such as the addition or removal of end hosts and routing changes iii measurement load balancing schemes iv topology measurement error handling and design and implementation of an adaptive streaming media system as representative application both simulation and internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors
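the following numpy sketch illustrates the algebraic idea described above: pick a set of linearly independent rows of the path-link routing matrix, measure only those paths, and infer all remaining paths by least squares on the per-link quantities (log success rates here); the toy matrix and rates are made-up values, not the authors' implementation.

```python
import numpy as np

def select_basis_paths(G):
    """Greedily pick rows of the path-link routing matrix G that are linearly
    independent; only these paths need to be actively measured."""
    basis, rank = [], 0
    for i in range(G.shape[0]):
        if np.linalg.matrix_rank(G[basis + [i], :]) > rank:
            basis.append(i)
            rank += 1
    return basis

# toy example: 3 links, 4 paths (each row says which links the path crosses)
G = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
basis = select_basis_paths(G)                     # here the first 3 paths suffice
x_true = np.log(np.array([0.99, 0.95, 0.90]))     # per-link log success rates
b_measured = G[basis] @ x_true                    # only basis paths are probed
x_hat, *_ = np.linalg.lstsq(G[basis], b_measured, rcond=None)
print(np.exp(G @ x_hat))                          # success-rate estimates for all paths
```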
in web environment the online community is fundamental to the business model and participants in the online community are often motivated and rewarded by abstract concepts of social capital how networks of relationships in online communities are structured has important implications for how social capital may be generated which is critical to both attract and govern the necessary user base to sustain the site we examine popular website slashdot which uses system by which users can declare relationships with other users and also has an embedded reputation system to rank users called karma we test the relationship between user’s karma level and the social network structure measured by structural holes to evaluate the brokerage and closure theories of social capital development we find that slashdot users develop deep networks at lower levels of participation indicating value from closure and that participation intensity helps increase the returns we conclude with some comments on mechanism design which would exploit these findings to optimize the social networks and potentially increase the opportunities for monetization
we define the problem of bounded similarity querying in time series databases which generalizes earlier notions of similarity querying given sub sequence query sequence lower and upper bounds on shifting and scaling parameters and tolerance is considered boundedly similar to if can be shifted and scaled within the specified bounds to produce modified sequence whose distance from is within we use similarity transformation to formalize the notion of bounded similarity we then describe framework that supports the resulting set of queries it is based on fingerprint method that normalizes the data and saves the normalization parameters for off line data we provide an indexing method with single index structure and search technique for handling all the special cases of bounded similarity querying experimental investigations find the performance of our method to be competitive with earlier less general approaches
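as a hedged sketch of the bounded similarity test described above, the code below fits the best scaling and shifting of one sequence onto the other by least squares, clamps both parameters to the user-supplied bounds (a simplification; a fully constrained fit would be more faithful), and accepts if the residual distance stays within the tolerance.

```python
import numpy as np

def boundedly_similar(q, s, scale_bounds, shift_bounds, tol):
    """Return whether s can be scaled/shifted within the given bounds so that
    its distance to q is within tol, plus the fitted parameters."""
    q, s = np.asarray(q, float), np.asarray(s, float)
    A = np.column_stack([s, np.ones_like(s)])        # model: a*s + b ~ q
    (a, b), *_ = np.linalg.lstsq(A, q, rcond=None)
    a = np.clip(a, *scale_bounds)
    b = np.clip(b, *shift_bounds)
    dist = np.linalg.norm(a * s + b - q)
    return dist <= tol, (a, b, dist)

q = [1.0, 2.0, 3.0, 4.0]
s = [2.0, 4.0, 6.0, 8.0]                 # q scaled by 2
print(boundedly_similar(q, s, (0.25, 1.0), (-1.0, 1.0), tol=0.1))
```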
concise or condensed representations of frequent patterns follow the minimum description length mdl principle by providing the shortest description of the whole set of frequent patterns in this work we introduce new exact concise representation of frequent itemsets this representation is based on an exploration of the disjunctive search space the disjunctive itemsets convey information about the complementary occurrence of items in dataset novel closure operator is then devised to suit the characteristics of the explored search space the proposed operator aims at mapping many disjunctive itemsets to unique one called disjunctive closed itemset hence it permits to drastically reduce the number of handled itemsets within the targeted representation interestingly the proposed representation offers direct access to the disjunctive and negative supports of frequent itemsets while ensuring the derivation of their exact conjunctive supports we conclude from the experimental results reported and discussed here that our representation is effective and sound in comparison with different other concise representations
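the worked sketch below illustrates the relationship the representation relies on: disjunctive support counts transactions containing at least one item of the itemset, and for a pair of items the exact conjunctive support follows by inclusion-exclusion; the transaction set is a toy example, and the closure operator itself is not shown.

```python
# Disjunctive vs conjunctive support, and derivation by inclusion-exclusion.
transactions = [{'a', 'b'}, {'a'}, {'b', 'c'}, {'a', 'c'}, {'c'}]

def supp_conj(items):   # transactions containing *all* the items
    return sum(1 for t in transactions if set(items) <= t)

def supp_disj(items):   # transactions containing *at least one* item
    return sum(1 for t in transactions if set(items) & t)

# for a pair {a, b}: supp(a AND b) = supp(a) + supp(b) - supp(a OR b)
derived = supp_conj(['a']) + supp_conj(['b']) - supp_disj(['a', 'b'])
assert derived == supp_conj(['a', 'b'])
print(supp_disj(['a', 'b']), supp_conj(['a', 'b']), derived)
```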
we study optimization of relational queries using materialized views where views may be regular or restructured in restructured view some data from the base table are represented as metadata that is schema information such as table and attribute names or vice versa using restructured views in query optimization opens up new spectrum of views that were not previously available and can result in significant additional savings in query evaluation costs these savings can be obtained due to significantly larger set of views to choose from and may involve reduced table sizes elimination of self joins clustering produced by restructuring and horizontal partitioning in this paper we propose general query optimization framework that treats regular and restructured views in uniform manner and is applicable to sql select project join queries and views without or with aggregation within the framework we provide algorithms to determine when view regular or restructured is usable in answering query and algorithms to rewrite queries using usable views semantic information such as knowledge of the key of view can be used to further optimize rewritten query within our general query optimization framework we develop techniques for determining the key of regular or restructured view and show how this information can be used to further optimize rewritten query it is straightforward to integrate all our algorithms and techniques into standard query optimization algorithms our extensive experimental results illustrate how using restructured views in addition to regular views in query optimization can result in significant reduction in query processing costs compared to system that uses only regular views
current disk prefetch policies in major operating systems track access patterns at the level of the file abstraction while this is useful for exploiting application level access patterns file level prefetching cannot realize the full performance improvements achievable by prefetching there are two reasons for this first certain prefetch opportunities can only be detected by knowing the data layout on disk such as the contiguous layout of file metadata or data from multiple files second nonsequential access of disk data requiring disk head movement is much slower than sequential access and the penalty for mis prefetching random block relative to that of sequential block is correspondingly more costly to overcome the inherent limitations of prefetching at the logical file level we propose to perform prefetching directly at the level of disk layout and in portable way our technique called diskseen is intended to be supplementary to and to work synergistically with file level prefetch policies if present diskseen tracks the locations and access times of disk blocks and based on analysis of their temporal and spatial relationships seeks to improve the sequentiality of disk accesses and overall prefetching performance our implementation of the diskseen scheme in the linux kernel shows that it can significantly improve the effectiveness of prefetching reducing execution times by for micro benchmarks and real applications such as grep cvs and tpc
the vision of web as combination of massive online virtual environments and today’s www currently attracts lot of attention while it provides multitude of opportunities the realisation of this vision on global scale poses severe technical challenges this paper points out some of the major challenges and highlights key concepts of an infrastructure that is being developed in order to meet them among these concepts special emphasis is put on the usage of two tier peer to peer approach the implementation of torrent based data distribution and the development of graded consistency notion the paper presents the current state of prototype implementation that is being developed in order to validate these concepts and evaluate alternative approaches
composite concepts result from the integration of multiple basic concepts by students to form high level knowledge so information about how students learn composite concepts can be used by instructors to facilitate students learning and the ways in which computational techniques can assist the study of the integration process are therefore intriguing for learning cognition and computer scientists we provide an exploration of this problem using heuristic methods search methods and machine learning techniques while employing bayesian networks as the language for representing the student models given experts expectation about students and simulated students responses to test items that were designed for the concepts we try to find the bayesian network structure that best represents how students learn the composite concept of interest the experiments were conducted with only simulated students the accuracy achieved by the proposed classification methods spread over wide range depending on the quality of collected input evidence we discuss the experimental procedures compare the experimental results observed in certain experiments provide two ways to analyse the influences of matrices on the experimental results and we hope that this simulation based experience may contribute to the endeavours in mapping the human learning process
recent cscw research has focused on methods for evaluating usability rather than the more problematic evaluation of systems in use possible approach to the integration of use design and evaluation is through the representation of evaluation findings as design oriented models method is described for modeling computer supported cooperative work and its context design patterns language based on the principles of activity theory the language is the outcome of an evaluation of the evolving use of tools to support collaborative information sharing carried out at global ngo
process based composition of web services has recently gained significant momentum for the implementation of inter organizational business collaborations in this approach individual web services are choreographed into composite web services whose integration logics are expressed as composition schema in this paper we present goal directed composition framework to support on demand business processes composition schemas are generated incrementally by rule inference mechanism based on set of domain specific business rules enriched with contextual information in situations where multiple composition schemas can achieve the same goal we must first select the best composition schema wherein the best schema is selected based on the combination of its estimated execution quality and schema quality by coupling the dynamic schema creation and quality driven selection strategy in one single framework we ensure that the generated composite service complies with business rules when being adapted and optimized
numerous extended transaction models have been proposed in the literature to overcome the limitations of the traditional transaction model for advanced applications characterized by their long durations cooperation between activities and access to multiple databases like cad cam and office automation however most of these extended models have been proposed with specific applications in mind and almost always fail to support applications with slightly different requirements we propose the multiform transaction model to overcome this limitation the multiform transaction model supports variety of other extended transaction models multiform transaction consists of set of component transactions together with set of coordinators which specify the transaction completion dependencies among the component transactions set of transaction primitives allow the programmer to define custom completion dependencies we show how wide range of extended transactions can be implemented as multiform transactions including sagas transactional workflows nested transactions and contingent transactions we allow the programmers to define their own primitives having very well defined interfaces so that application specific transaction models like distributed multilevel secure transactions can also be supported
data warehouse is an integrated database whose data is collected from several data sources and supports on line analytical processing olap typically query to the data warehouse tends to be complex and involves large volume of data to keep the data at the warehouse consistent with the source data changes to the data sources should be propagated to the data warehouse periodically because the propagation of the changes maintenance is batch processing it takes long time since both query transactions and maintenance transactions are long and involve large volumes of data traditional concurrency control mechanisms such as two phase locking are not adequate for data warehouse environment we propose multi version concurrency control mechanism suited for data warehouses which use multi dimensional olap molap servers we call the mechanism multiversion concurrency control for data warehouses mvccdw to our knowledge our work is the first attempt to exploit versions for online data warehouse maintenance in molap environment mvcc dw guarantees the serializability of concurrent transactions transactions running under the mechanism do not block each other and do not need to place locks
desktop grids have emerged as an important methodology to harness the idle cycles of millions of participant desktop pcs over the internet however to effectively utilize the resources of desktop grid it is necessary to use scheduling policies suitable for such systems scheduling policy must be applicable to large scale systems involving large numbers of machines also the policy must be fault aware in the sense that it copes with resource volatility further adding to the complexity of scheduling for desktop grids is the inherent heterogeneity of such systems sub optimal performance would result if the scheduling policy does not take into account information on heterogeneity in this paper we suggest and develop several scheduling policies for desktop grid systems involving different levels of heterogeneity in particular we propose policy which utilizes the solution to linear programming problem which maximizes system capacity we consider parallel applications that consist of independent tasks
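one plausible form of the linear program mentioned above is sketched below: choose the task rate sent to each machine class to maximize total throughput, subject to each class's effective capacity (speed times availability) and the server's dispatch limit; the numbers and the exact constraints are illustrative assumptions, not the paper's formulation, and scipy is assumed to be available.

```python
from scipy.optimize import linprog

speed        = [50.0, 30.0, 10.0]     # tasks/hour each class can complete
availability = [0.9, 0.7, 0.5]        # fraction of time the class is usable
dispatch_cap = 60.0                   # tasks/hour the server can hand out

c    = [-1.0, -1.0, -1.0]             # maximise sum(x) == minimise -sum(x)
A_ub = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
b_ub = [s * a for s, a in zip(speed, availability)] + [dispatch_cap]
res  = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, -res.fun)                # per-class rates and total system capacity
```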
while many objects exhibit various forms of global symmetries prominent intrinsic symmetries which exist only on parts of an object are also well recognized such partial symmetries are often seen as more natural than global one even when the symmetric parts are under complex pose we introduce an algorithm to extract partial intrinsic reflectional symmetries pirs of shape given closed manifold mesh we develop voting scheme to obtain an intrinsic reflectional symmetry axis irsa transform which is scalar field over the mesh that accentuates prominent irsas of the shape we then extract set of explicit irsa curves on the shape based on refined measure of local reflectional symmetry support along curve the iterative refinement procedure combines irsa induced region growing and region constrained symmetry support refinement to improve accuracy and address potential issues arising from rotational symmetries in the shape we show how the extracted irsa curves can be incorporated into conventional mesh segmentation scheme so that the implied symmetry cues can be utilized to obtain more meaningful results we also demonstrate the use of irsa curves for symmetry driven part repair
an ensemble is set of learned models that make decisions collectively although an ensemble is usually more accurate than single learner existing ensemble methods often tend to construct unnecessarily large ensembles which increases the memory consumption and computational cost ensemble pruning tackles this problem by selecting subset of ensemble members to form subensembles that are subject to less resource consumption and response time with accuracy that is similar to or better than the original ensemble in this paper we analyze the accuracy diversity trade off and prove that classifiers that are more accurate and make more predictions in the minority group are more important for subensemble construction based on the gained insights heuristic metric that considers both accuracy and diversity is proposed to explicitly evaluate each individual classifier’s contribution to the whole ensemble by incorporating ensemble members in decreasing order of their contributions subensembles are formed such that users can select the top percent of ensemble members depending on their resource availability and tolerable waiting time for predictions experimental results on uci data sets show that subensembles formed by the proposed epic ensemble pruning via individual contribution ordering algorithm outperform the original ensemble and state of the art ensemble pruning method orientation ordering oo
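the sketch below gives a rough stand-in for the individual contribution ordering described above: each classifier is scored by its accuracy plus a bonus for correct predictions it makes while voting in the ensemble minority, and the top fraction of classifiers by score forms the subensemble; the weighting and the random data are assumptions, not the exact epic metric.

```python
import numpy as np

def contribution_scores(preds, y):
    """preds: (n_classifiers, n_samples) 0/1 predictions, y: true labels.
    Reward correct votes, with extra weight when the classifier is correct
    while disagreeing with the ensemble majority."""
    n_clf, _ = preds.shape
    majority = (preds.sum(axis=0) > n_clf / 2).astype(int)
    scores = np.zeros(n_clf)
    for i in range(n_clf):
        correct = preds[i] == y
        in_minority = preds[i] != majority
        scores[i] = correct.mean() + 2.0 * (correct & in_minority).mean()
    return scores

def prune(preds, y, keep_fraction=0.3):
    order = np.argsort(-contribution_scores(preds, y))
    k = max(1, int(len(order) * keep_fraction))
    return order[:k]                    # indices of the retained sub-ensemble

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
preds = np.array([(y ^ (rng.random(200) < err)).astype(int)
                  for err in [0.1, 0.2, 0.3, 0.4, 0.45]])
print(prune(preds, y))
```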
we propose new method for the parameterized verification of formal specifications of cache coherence protocols the goal of parameterized verification is to establish system properties for an arbitrary number of caches in order to achieve this purpose we define abstractions that allow us to reduce the original parameterized verification problem to control state reachability problem for system with integer data variables specifically the methodology we propose consists of the following steps we first define an abstraction in which we only keep track of the number of caches in given state during the execution of protocol then we use linear arithmetic constraints to symbolically represent infinite sets of global states of the resulting abstract protocol for reasons of efficiency we relax the constraint operations by interpreting constraints over real numbers finally we check parameterized safety properties of abstract protocols using symbolic backward reachability strategy that allows us to obtain sufficient conditions for termination for an interesting class of protocols the latter problem can be solved by using the infinite state model checker hytech henzinger ho and wong toi model checker for hybrid systems proc of the th international conference on computer aided verification cav lecture notes in computer science springer haifa israel vol pp hytech handles linear arithmetic constraints using the polyhedra library of halbwachs and proy verification of real time systems using linear relation analysis formal methods in system design vol no pp by using this methodology we have automatically validated parameterized versions of widely implemented write invalidate and write update cache coherence protocols like synapse mesi moesi berkeley illinois firefly and dragon handy the cache memory book academic press with this application we have shown that symbolic model checking tools like hytech originally designed for the verification of hybrid systems can be applied successfully to new classes of infinite state systems of practical interest
propositionalization has recently received much attention in the ilp community as mean to learn efficiently non determinate concepts using adapted propositional algorithms this paper proposes to extend such an approach to unsupervised learning from symbolic relational description to help deal with the known combinatorial explosion of the number of possible clusters and the size of their descriptions we suggest an approach that gradually increases the expressivity of the relational language used to describe the classes at each level only the initial object descriptions that could benefit from such an enriched generalization language are propositionalized this latter representation allows us to use an efficient propositional clustering algorithm this approach is implemented in the cac system experiments on large chinese character database show the interest of using kids to cluster relational descriptions and pinpoint current problems for analyzing relational classifications
synchronizer with phase counter sometimes called asynchronous phase clock is an asynchronous distributed algorithm where each node maintains local pulse counter that simulates the global clock in synchronous network in this paper we present time optimal self stabilizing scheme for such synchronizer assuming unbounded counters we give simple rule by which each node can compute its pulse number as function of its neighbors pulse numbers we also show that some of the popular correction functions for phase clock synchronization are not self stabilizing in asynchronous networks using our rule the counters stabilize in time bounded by the diameter of the network without invoking global operations we argue that the use of unbounded counters can be justified by the availability of memory for counters that are large enough to be practically unbounded and by the existence of reset protocols that can be used to restart the counters in some rare cases where faults will make this necessary
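the toy simulation below illustrates the style of local rule the abstract refers to, using one commonly cited correction rule for unbounded phase clocks (set each pulse to one more than the minimum pulse in the node's closed neighbourhood); this is not necessarily the paper's exact rule, and the topology and initial state are made-up values.

```python
# Synchronous toy simulation of a self-stabilizing phase clock rule with
# unbounded counters on a small path network.
def step(pulses, adj):
    return [min([pulses[u]] + [pulses[v] for v in adj[u]]) + 1
            for u in range(len(pulses))]

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path of 4 nodes
pulses = [7, 0, 13, 2]                          # arbitrary (faulty) initial state
for _ in range(len(adj)):                       # stabilises within ~diameter steps
    pulses = step(pulses, adj)
    print(pulses)
```

on this example the counters agree after two steps and then advance in lockstep, matching the claim that stabilization time is bounded by the diameter.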
recommendations are crucial for the success of large websites while there are many ways to determine recommendations the relative quality of these recommenders depends on many factors and is largely unknown we propose new classification of recommenders and comparatively evaluate their relative quality for sample web site the evaluation is performed with awesome adaptive website recommendations new data warehouse based recommendation system capturing and evaluating user feedback on presented recommendations moreover we show how awesome performs an automatic and adaptive closed loop website optimization by dynamically selecting the most promising recommenders based on continuously measured recommendation feedback we propose and evaluate several alternatives for dynamic recommender selection including powerful machine learning approach
in this paper we present the apfel plug in that collects fine grained changes from version archives in database apfel is built upon the eclipse infrastructure for cvs and java in order to describe changes apfel uses tokens such as method calls exceptions and variable usages we demonstrate the usefulness of apfel’s database with several case studies
this paper gives the main definitions relating to dependability generic concept including as special case such attributes as reliability availability safety integrity maintainability etc security brings in concerns for confidentiality in addition to availability and integrity basic definitions are given first they are then commented upon and supplemented by additional definitions which address the threats to dependability and security faults errors failures their attributes and the means for their achievement fault prevention fault tolerance fault removal fault forecasting the aim is to explicate set of general concepts of relevance across wide range of situations and therefore helping communication and cooperation among number of scientific and technical communities including ones that are concentrating on particular types of system of system failures or of causes of system failures
data type generic programming can be used to traverse and manipulate specific parts of large heterogeneously typed tree structures without the need for tedious boilerplate generic programming is often approached from theoretical perspective where the emphasis lies on the power of the representation rather than on efficiency we describe use cases for generic system derived from our work on nanopass compiler where efficiency is real concern and detail new generics approach alloy that we have developed in haskell to allow our compiler passes to traverse the abstract syntax tree quickly we benchmark our approach against several other haskell generics approaches and statistically analyse the results finding that alloy is fastest on heterogeneously typed trees
this paper presents the modeling of embedded systems with simbed an execution driven simulation testbed that measures the execution behavior and power consumption of embedded applications and rtoss by executing them on an accurate architectural model of microcontroller with simulated real time stimuli we briefly describe the simulation environment and present study that compares three rtoss mgr os ii popular public domain embedded real time operating system echidna sophisticated industrial strength commercial rtos and nos bare bones multi rate task scheduler reminiscent of typical roll your own rtoss found in many commercial embedded systems the microcontroller simulated in this study is the motorola core processor low power bit cpu core with bit instructions running at mhz
web queries have been and will remain an essential tool for accessing processing and ultimately reasoning with data on the web with the vast data size on the web and semantic web reducing costs of data transfer and query evaluation for web queries is crucial to reduce costs it is necessary to narrow the data candidates to query simplify complex queries and reduce intermediate results this article describes static approach to optimization of web queries we introduce set of rules which achieves the desired optimization by schema and type based query rewriting the approach consists in using schema information for removing incompleteness as expressed by descendant constructs and disjunctions from queries the approach is presented on the query language xcerpt though applicable to other query languages like xquery the approach is an application of rules in many aspects query rules are optimized using rewriting rules based on schema or type information specified in grammar rules
this paper makes the case that pin bandwidth will be critical consideration for future microprocessors we show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements using decomposition of execution time we show that for modern processors that employ aggressive memory latency tolerance techniques wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies given the importance of maximizing memory bandwidth we calculate effective pin bandwidth then estimate optimal effective pin bandwidth we measure these quantities by determining the amount by which both caches and minimal traffic caches filter accesses to the lower levels of the memory hierarchy we see that there is gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal traffic caches implying that the potential exists to increase effective pin bandwidth substantially we decompose this traffic gap into four factors and show they contribute quite differently to traffic reduction for different benchmarks we conclude that in the short term pin bandwidth limitations will make more complex on chip caches cost effective for example flexible caches may allow individual applications to choose from range of caching policies in the long term we predict that off chip accesses will be so expensive that all system memory will reside on one or more processor chips
high dimensional data has always been challenge for clustering algorithms because of the inherent sparsity of the points recent research results indicate that in high dimensional data even the concept of proximity or clustering may not be meaningful we discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality the subspaces are specific to the clusters themselves this definition is substantially more general and realistic than currently available techniques which limit the method to only projections from the original set of attributes the generalized projected clustering technique may also be viewed as way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter attribute correlations we provide new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases the running time and space requirements of the algorithm are adjustable and are likely to trade off with better accuracy
peer to peer data integration aka peer data management systems pdmss promises to extend the classical data integration approach to the internet scale unfortunately some challenges remain before realizing this promise one of the biggest challenges is preserving the privacy of the exchanged data while passing through several intermediate peers another challenge is protecting the mappings used for data translation protecting the privacy without being unfair to any of the peers is yet third challenge this paper presents novel query answering protocol in pdmss to address these challenges the protocol employs technique based on noise selection and insertion to protect the query results and commutative encryption based technique to protect the mappings and ensure fairness among peers an extensive security analysis of the protocol shows that it is resilient to several possible types of attacks we implemented the protocol within an established pdms the hyperion system we conducted an experimental study using real data from the healthcare domain the results show that our protocol manages to achieve its privacy and fairness goals while maintaining query processing time at the interactive level
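to show why commutative encryption is useful for protecting the mappings, the toy sketch below uses sra/pohlig-hellman style modular exponentiation, whose layers can be applied and removed in any order; the prime, the exponents and the message are illustrative only, the parameters are far too small for real security, and this is not the paper's actual protocol.

```python
# Toy commutative encryption: E_k(m) = m^k mod p, so (m^a)^b == (m^b)^a (mod p).
p = 2**61 - 1                      # a Mersenne prime; far too small for real use
def keygen(e):                     # e must be coprime with p-1; d is its inverse
    return e, pow(e, -1, p - 1)    # modular inverse (Python 3.8+)

e1, d1 = keygen(65537)
e2, d2 = keygen(257)
m = 123456789
c12 = pow(pow(m, e1, p), e2, p)    # peer 1 encrypts, then peer 2
c21 = pow(pow(m, e2, p), e1, p)    # peer 2 encrypts, then peer 1
assert c12 == c21                  # same ciphertext regardless of order
assert pow(pow(c12, d1, p), d2, p) == m   # layers strip off in any order
```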
rough sets are applied to data tables containing missing values discernibility and indiscernibility between missing value and another value are considered simultaneously family of possible equivalence classes is obtained in which each equivalence class has the possibility that it is an actual one by using the family of possible equivalence classes we can derive lower and upper approximations even if the approximations are not obtained by previous methods furthermore the lower and upper approximations coincide with those obtained from methods of possible worlds
we present opportunistic controls class of user interaction techniques for augmented reality ar applications that support gesturing on and receiving feedback from otherwise unused affordances already present in the domain environment opportunistic controls leverage characteristics of these affordances to provide passive haptics that ease gesture input simplify gesture recognition and provide tangible feedback to the user widgets are tightly coupled with affordances to provide visual feedback and hints about the functionality of the control for example set of buttons is mapped to existing tactile features on domain objects we describe examples of opportunistic controls that we have designed and implemented using optical marker tracking combined with appearance based gesture recognition we present the results of user study in which participants performed simulated maintenance inspection of an aircraft engine using set of virtual buttons implemented both as opportunistic controls and using simpler passive haptics opportunistic controls allowed participants to complete their tasks significantly faster and were preferred over the baseline technique
the shading in scene depends on combination of many factors how the lighting varies spatially across surface how it varies along different directions the geometric curvature and reflectance properties of objects and the locations of soft shadows in this article we conduct complete first order or gradient analysis of lighting shading and shadows showing how each factor separately contributes to scene appearance and when it is important gradients are well suited to analyzing the intricate combination of appearance effects since each gradient term corresponds directly to variation in specific factor first we show how the spatial and directional gradients of the light field change as light interacts with curved objects this extends the recent frequency analysis of durand et al to gradients and has many advantages for operations like bump mapping that are difficult to analyze in the fourier domain second we consider the individual terms responsible for shading gradients such as lighting variation convolution with the surface brdf and the object’s curvature this analysis indicates the relative importance of various terms and shows precisely how they combine in shading third we understand the effects of soft shadows computing accurate visibility gradients and generalizing previous work to arbitrary curved occluders as one practical application our visibility gradients can be directly used with conventional ray tracing methods in practical gradient interpolation methods for efficient rendering moreover our theoretical framework can be used to adaptively sample images in high gradient regions for efficient rendering
wireless ad hoc routing has been extensively studied and many clever schemes have been proposed over the last several years one class of ad hoc routing is geographic routing where each intermediate node independently selects the next hop using the given location information of destination geographic routing which eliminates the overhead of route request packet flooding is scalable and suitable for large scale ad hoc networks however geographic routing may select the long detour paths when there are voids between source and destination in this paper we propose novel geographic routing approach called geographic landmark routing glr glr recursively discovers the intermediate nodes called landmarks and constructs sub paths that connect the subsequent landmarks simulation results on various network topologies show that glr significantly improves the performance of geographic routing
we capture the shape of moving cloth using custom set of color markers printed on the surface of the cloth the output is sequence of triangle meshes with static connectivity and with detail at the scale of individual markers in both smooth and folded regions we compute markers coordinates in space using correspondence across multiple synchronized video cameras correspondence is determined from color information in small neighborhoods and refined using novel strain pruning process final correspondence does not require neighborhood information we use novel data driven hole filling technique to fill occluded regions our results include several challenging examples wrinkled shirt sleeve dancing pair of pants and rag tossed onto cup finally we demonstrate that cloth capture is reusable by animating pair of pants using human motion capture data
wireless sensor networks have been widely used in civilian and military applications primarily designed for monitoring purposes many sensor applications require continuous collection and processing of sensed data due to the limited power supply for sensor nodes energy efficiency is major performance concern in query processing in this paper we focus on continuous knn query processing in object tracking sensor networks we propose localized scheme to monitor nearest neighbors to query point the key idea is to establish monitoring area for each query so that only the updates relevant to the query are collected the monitoring area is set up when the knn query is initially evaluated and is expanded and shrunk on the fly upon object movement we analyze the optimal maintenance of the monitoring area and develop an adaptive algorithm to dynamically decide when to shrink the monitoring area experimental results show that establishing monitoring area for continuous knn query processing greatly reduces energy consumption and prolongs network lifetime
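the sketch below illustrates one simple way to realize the monitoring area idea described above: a circle around the query point that covers the current k nearest objects plus some slack, so only nodes inside the circle need to report movement, and the circle is recomputed when the answer changes; the slack factor and the example objects are assumptions, not the paper's exact maintenance policy.

```python
import math

def knn(query, objects, k):
    return sorted(objects, key=lambda o: math.dist(query, objects[o]))[:k]

def monitoring_radius(query, objects, k, slack=1.2):
    """Circle around the query point covering the current k nearest objects,
    inflated by a slack factor; only nodes inside it need to report movement."""
    kth = knn(query, objects, k)[-1]
    return slack * math.dist(query, objects[kth])

query, k = (0.0, 0.0), 2
objects = {'o1': (1, 0), 'o2': (0, 2), 'o3': (3, 3), 'o4': (5, 1)}
print(knn(query, objects, k), monitoring_radius(query, objects, k))

# when an object inside the area moves (or crosses its boundary) it reports,
# the kNN answer is re-evaluated and the area is expanded or shrunk accordingly
objects['o2'] = (0, 4)
print(knn(query, objects, k), monitoring_radius(query, objects, k))
```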
to solve problems that require far more memory than single machine can supply data can be swapped to disk in some manner it can be compressed and or the memory of multiple parallel machines can be used to provide enough memory and storage space instead of implementing either functionality anew and specific for each application or instead of relying on the operating system’s swapping algorithms which are inflexible not algorithm aware and often limited in their fixed storage capacity our solution is large virtual machine lvm that transparently provides large address space to applications and that is more flexible and efficient than operating system approaches lvm is virtual machine for java that is designed to support large address spaces for billions of objects it swaps objects out to disk compresses objects where needed and uses multiple parallel machines in distributed shared memory dsm setting the latter is the main focus of this paper allocation and collection performance is similar to well known jvms if no swapping is needed with swapping and clustering we are able to create list containing elements far faster than other jvms lvm’s swapping is up to times faster than os level swapping swap aware gc algorithm helps by factor of
designing and optimizing high performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space high cost of detailed simulation and several constraints that processor design must satisfy in this paper we propose the use of empirical non linear modeling techniques to assist processor architects in making design decisions and resolving complex trade offs we propose procedure for building accurate non linear models that consists of the following steps selection of small set of representative design points spread across processor design space using latin hypercube sampling ii obtaining performance measures at the selected design points using detailed simulation iii building non linear models for performance using the function approximation capabilities of radial basis function networks and iv validating the models using an independently and randomly generated set of design points we evaluate our model building procedure by constructing non linear performance models for programs from the spec cpu benchmark suite with microarchitectural design space that consists of key parameters our results show that the models built using relatively small number of simulations achieve high prediction accuracy only error in cpi estimates on average across large processor design space our models can potentially replace detailed simulation for common tasks such as the analysis of key microarchitectural trends or searches for optimal processor design points
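the sketch below walks through the modeling workflow described above with stand-in components: latin hypercube sampling of a two-parameter design space, a stub in place of the detailed simulator, and a gaussian radial basis function network fitted to the sampled points; the functions and numbers are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def latin_hypercube(n, d):
    """n samples in [0,1]^d, one per row, stratified in every dimension."""
    cut = (np.arange(n) + rng.random(n)) / n
    return np.column_stack([rng.permutation(cut) for _ in range(d)])

def simulate(points):
    """Stand-in for a detailed simulator returning CPI-like values."""
    return 1.0 + points[:, 0] ** 2 + 0.3 * np.sin(3 * points[:, 1])

def fit_rbf(X, y, gamma=2.0, ridge=1e-8):
    """Gaussian radial-basis-function network with centres at the samples."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-gamma * d2)
    w = np.linalg.solve(Phi + ridge * np.eye(len(X)), y)
    return lambda Z: np.exp(-gamma * ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)) @ w

X = latin_hypercube(40, 2)          # 40 simulated design points, 2 parameters
model = fit_rbf(X, simulate(X))
Xtest = rng.random((5, 2))
print(np.c_[simulate(Xtest), model(Xtest)])   # simulator vs. model prediction
```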
trace files are widely used in research and academia to study the behavior of programs they are simple to process and guarantee repeatability unfortunately they tend to be very large this paper describes vpc fundamentally new approach to compressing program traces vpc employs value predictors to bring out and amplify patterns in the traces so that conventional compressors can compress them more effectively in fact our approach not only results in much higher compression rates but also provides faster compression and decompression for example compared to bzip vpc geometric mean compression rate on speccpu store address traces is times higher compression is ten times faster and decompression is three times faster
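the core idea can be sketched as follows: run a simple predictor over the trace, emit a cheap one-byte symbol whenever the prediction is correct and the raw value otherwise, then hand the transformed stream to an off-the-shelf compressor (bz2 from the standard library here); the single stride predictor and the synthetic trace are simplifications of the predictor set the tool actually uses.

```python
import bz2, struct, random

def predictor_transform(values):
    """Stride predictor: emit byte 0 on a correct prediction, else an escape
    byte plus the raw 8-byte value.  Regular traces become mostly zeros,
    which a conventional compressor handles far better."""
    out, last, stride = bytearray(), 0, 0
    for v in values:
        if v == last + stride:
            out.append(0)
        else:
            out.append(1)
            out += struct.pack('<q', v)
        stride, last = v - last, v
    return bytes(out)

# a synthetic "address trace": mostly strided accesses with occasional jumps
trace, addr = [], 0x1000
for _ in range(100_000):
    addr = addr + 64 if random.random() < 0.95 else random.randrange(2**32)
    trace.append(addr)

raw = b''.join(struct.pack('<q', v) for v in trace)
print(len(bz2.compress(raw)), len(bz2.compress(predictor_transform(trace))))
```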
in profiling tradeoff exists between information and overhead for example hardware sampling profilers incur negligible overhead but the information they collect is consequently very coarse other profilers use instrumentation tools to gather temporal traces such as path profiles and hot memory streams but they have high overhead runtime and feedback directed compilation systems need detailed information to aggressively optimize but the cost of gathering profiles can outweigh the benefits shadow profiling is novel method for sampling long traces of instrumented code in parallel with normal execution taking advantage of the trend of increasing numbers of cores each instrumented sample can be many millions of instructions in length the primary goal is to incur negligible overhead yet attain profile information that is nearly as accurate as perfect profile the profiler requires no modifications to the operating system or hardware and is tunable to allow for greater coverage or lower overhead we evaluate the performance and accuracy of this new profiling technique for two common types of instrumentation based profiles interprocedural path profiling and value profiling overall profiles collected using the shadow profiling framework are accurate versus perfect value profiles while incurring less than overhead consequently this technique increases the viability of dynamic and continuous optimization systems by hiding the high overhead of instrumentation and enabling the online collection of many types of profiles that were previously too costly
mini graph is dataflow graph that has an arbitrary internal size and shape but the interface of singleton instruction two register inputs one register output maximum of one memory operation and maximum of one terminal control transfer previous work has exploited dataflow sub graphs whose execution latency can be reduced via programmable fpga style hardware in this paper we show that mini graphs can improve performance by amplifying the bandwidths of superscalar processor’s stages and the capacities of many of its structures without custom latency reduction hardware amplification is achieved because the processor deals with complete mini graph via single quasi instruction the handle by constraining mini graph structure and forcing handles to behave as much like singleton instructions as possible the number and scope of the modifications over conventional superscalar microarchitecture is kept to minimum this paper describes mini graphs simple algorithm for extracting them from basic block frequency profiles and microarchitecture for exploiting them cycle level simulation of several benchmark suites shows that mini graphs can provide average performance gains of over an aggressive baseline with peak gains exceeding alternatively they can compensate for substantial reductions in register file and scheduler size and in pipeline bandwidth
we present class of richly structured undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities our model generalizes techniques such as principal component analysis to heterogeneous data types in contrast to other approaches this framework allows modalities such as words authors and timestamps to be captured in their natural probabilistic encodings latent space representation for previously unseen document can be obtained through fast matrix multiplication using our method we demonstrate the effectiveness of our framework on the task of author prediction from years of the nips conference proceedings and for recipient prediction task using month academic email archive of researcher our approach should be more broadly applicable to many real world applications where one wishes to efficiently make predictions for large number of potential outputs using dimensionality reduction in well defined probabilistic framework
camus is multi user multi phone extension of the camus system mobile camera phones use their cameras to track position rotation height and other parameters over marker sheet to allow interactive performance of music multiple camera phones can use the same or separate marker sheets and send their interaction parameters via bluetooth to computer where the sensor information is converted to midi format to allow control of wide range of sound generation and manipulation hardware and software the semantics of the mapping of midi message to performance parameters of the camera interactions are fed back into the visualization on the camera phone
body area sensor network or ban based health monitoring is increasingly becoming popular alternative to traditional wired biomonitoring techniques however most biomonitoring applications need continuous processing of large volumes of data as result of which both power consumption and computation bandwidth turn out to be serious constraints for sensor network platforms this has resulted in lot of recent interest in design methods modeling and software analysis techniques specifically targeted towards bans and applications running on them in this paper we show that appropriate optimization of the application running on the communication gateway of wireless ban and accurate modeling of the microarchitectural details of the gateway processor can lead to significantly better resource usage and power savings in particular we propose method for deriving the optimal order in which the different sensors feeding the gateway processor should be sampled to maximize cache re use our case study using faint fall detection application from the geriatric care domain which is fed by number of smart sensors to detect physiological and physical gait signals of patient show very attractive energy savings in the underlying processor alternatively our method can be used to improve the sampling frequency of the sensors leading to higher reliability and better response time of the application
we address the problem of querying xml data over pp network in pp networks the allowed kinds of queries are usually exact match queries over file names we discuss the extensions needed to deal with xml data and xpath queries single peer can hold whole document or partial complete fragment of the latter each xml fragment document is identified by distinct path expression which is encoded in distributed hash table our framework differs from content based routing mechanisms biased towards finding the most relevant peers holding the data we perform fragments placement and enable fragments lookup by solely exploiting few path expressions stored on each peer by taking advantage of quasi zero replication of global catalogs our system supports fast full and partial xpath querying to this purpose we have extended the chord simulator and performed an experimental evaluation of our approach
we present technique to analyse successive versions of service interface in order to detect changes that cause clients using an earlier version not to interact properly with later version we focus on behavioural incompatibilities and adopt the notion of simulation as basis for determining if new version of service is behaviourally compatible with previous one unlike prior work our technique does not simply check if the new version of the service simulates the previous one instead in the case of incompatible versions the technique provides detailed diagnostics including list of incompatibilities and specific states in which these incompatibilities occur the technique has been implemented in tool that visually pinpoints set of changes that cause one behavioural interface not to simulate another one
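the sketch below shows the kind of check involved: a standard greatest-simulation computation between two labelled transition systems, returning the offending state pairs and missing actions as rough diagnostics when the new version fails to simulate the old one; the data structures and the example interfaces are assumptions, not the tool's implementation.

```python
def simulates(lts_new, lts_old, start_new, start_old):
    """Check that every behaviour of the old interface can be matched by the
    new one (old is simulated by new); lts_*: dict state -> {action: next}."""
    rel = {(o, n) for o in lts_old for n in lts_new}
    changed = True
    while changed:                      # iteratively remove non-matching pairs
        changed = False
        for (o, n) in list(rel):
            for a, o2 in lts_old[o].items():
                n2 = lts_new[n].get(a)
                if n2 is None or (o2, n2) not in rel:
                    rel.discard((o, n))
                    changed = True
                    break
    if (start_old, start_new) in rel:
        return True, []
    removed = {(o, n) for o in lts_old for n in lts_new} - rel
    issues = [(o, n, a) for (o, n) in removed
              for a in lts_old[o] if lts_new[n].get(a) is None]
    return False, issues

old = {'s0': {'login': 's1'}, 's1': {'order': 's2', 'logout': 's0'}, 's2': {}}
new = {'t0': {'login': 't1'}, 't1': {'order': 't2'}, 't2': {}}   # dropped logout
print(simulates(new, old, 't0', 's0'))
```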
we present an algorithm for interactively extracting and rendering isosurfaces of large volume datasets in view dependent fashion recursive tetrahedral mesh refinement scheme based on longest edge bisection is used to hierarchically decompose the data into multiresolution structure this data structure allows fast extraction of arbitrary isosurfaces to within user specified view dependent error bounds data layout scheme based on hierarchical space filling curves provides access to the data in cache coherent manner that follows the data access pattern indicated by the mesh refinement
while nand flash memories have rapidly increased in both capacity and performance and are increasingly used as storage device in many embedded systems their reliability has decreased both because of increased density and the use of multi level cells mlc current mlc technology only specifies the minimum requirement for an error correcting code ecc but provides no additional protection in hardware however existing flash file systems such as yaffs and jffs rely upon ecc to survive small numbers of bit errors but cannot survive the larger numbers of bit errors or page failures that are becoming increasingly common as flash file systems scale to multiple gigabytes we have developed flash memory file system rcffs that increases reliability by utilizing algebraic signatures to validate data and reed solomon codes to correct erroneous or missing data our file system allows users to adjust the level of reliability they require by specifying the number of redundancy pages for each erase block allowing them to dynamically trade off reliability and storage overhead by integrating error mitigation with advanced features such as fast mounting and compression we show via simulation in nandsim that our file system can outperform yaffs and jffs while surviving flash memory errors that would cause data loss for existing flash file systems
clustering high dimensional data is an emerging research field subspace clustering or projected clustering group similar objects in subspaces ie projections of the full space in the past decade several clustering paradigms have been developed in parallel without thorough evaluation and comparison between these paradigms on common basis conclusive evaluation and comparison is challenged by three major issues first there is no ground truth that describes the true clusters in real world data second large variety of evaluation measures have been used that reflect different aspects of the clustering result finally in typical publications authors have limited their analysis to their favored paradigm only while paying other paradigms little or no attention in this paper we take systematic approach to evaluate the major paradigms in common framework we study representative clustering algorithms to characterize the different aspects of each paradigm and give detailed comparison of their properties we provide benchmark set of results on large variety of real world and synthetic data sets using different evaluation measures we broaden the scope of the experimental analysis and create common baseline for future developments and comparable evaluations in the field for repeatability all implementations data sets and evaluation measures are available on our website
harvesting energy from the environment is desirable and increasingly important capability in several emerging applications of embedded systems such as sensor networks biomedical implants etc while energy harvesting has the potential to enable near perpetual system operation designing an efficient energy harvesting system that actually realizes this potential requires an in depth understanding of several complex tradeoffs these tradeoffs arise due to the interaction of numerous factors such as the characteristics of the harvesting transducers chemistry and capacity of the batteries used if any power supply requirements and power management features of the embedded system application behavior etc this paper surveys the various issues and tradeoffs involved in designing and operating energy harvesting embedded systems system design techniques are described that target high conversion and storage efficiency by extracting the most energy from the environment and making it maximally available for consumption harvesting aware power management techniques are also described which reconcile the very different spatio temporal characteristics of energy availability and energy usage within system and across network
most current banner advertising is sold through negotiation thereby incurring large transaction costs and possibly suboptimal allocations we propose new automated system for selling banner advertising in this system each advertiser specifies collection of host webpages which are relevant to his product desired total quantity of impressions on these pages and maximum per impression price the system selects subset of advertisers as winners and maps each winner to set of impressions on pages within his desired collection the distinguishing feature of our system as opposed to current combinatorial allocation mechanisms is that mimicking the current negotiation system we guarantee that winners receive at least as many advertising opportunities as they requested or else receive ample compensation in the form of monetary payment by the host such guarantees are essential in markets like banner advertising where major goal of the advertising campaign is developing brand recognition as we show the problem of selecting feasible subset of advertisers with maximum total value is inapproximable we thus present two greedy heuristics and discuss theoretical techniques to measure their performances our first algorithm iteratively selects advertisers and corresponding sets of impressions which contribute maximum marginal per impression profit to the current solution we prove bi criteria approximation for this algorithm showing that it generates approximately as much value as the optimum algorithm on slightly harder problem however this algorithm might perform poorly on instances in which the value of the optimum solution is quite large clearly undesirable failure mode hence we present an adaptive greedy algorithm which again iteratively selects advertisers with maximum marginal per impression profit but additionally reassigns impressions at each iteration for this algorithm we prove structural approximation result newly defined framework for evaluating heuristics we thereby prove that this algorithm has better performance guarantee than the simple greedy algorithm
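the simple greedy heuristic described above can be sketched as follows; the advertiser and inventory data structures are hypothetical and the marginal per impression profit is simplified to the advertiser's bid price, so this is only an illustration of the selection loop, not the paper's exact algorithm or its adaptive variant

```python
# minimal sketch of the simple greedy heuristic: repeatedly pick the advertiser
# whose full demand can still be met and whose per impression profit is highest.
# advertiser/inventory shapes are hypothetical, not taken from the paper.

def greedy_allocate(advertisers, inventory):
    """
    advertisers: list of dicts with 'pages' (set), 'demand' (int), 'price' (per impression)
    inventory:   dict page -> free impressions on that page
    returns list of (advertiser index, allocation dict page -> impressions)
    """
    winners = []
    remaining = dict(inventory)
    candidates = set(range(len(advertisers)))
    while candidates:
        best, best_profit, best_alloc = None, 0.0, None
        for i in candidates:
            a = advertisers[i]
            free = sum(remaining.get(p, 0) for p in a['pages'])
            if free < a['demand']:
                continue                        # the guarantee could not be honoured
            profit = a['price']                 # marginal per impression profit (simplified)
            if profit > best_profit:
                alloc, need = {}, a['demand']
                for p in a['pages']:
                    take = min(remaining.get(p, 0), need)
                    if take:
                        alloc[p] = take
                        need -= take
                best, best_profit, best_alloc = i, profit, alloc
        if best is None:
            break
        for p, n in best_alloc.items():
            remaining[p] -= n
        winners.append((best, best_alloc))
        candidates.remove(best)
    return winners
```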
commercial off the shelf cots software tends to be cheap reliable and functionally powerful due to its large user base it has thus become highly desirable to incorporate cots software into software products systems as it can significantly reduce development cost and effort while maintaining overall software product quality and increasing product acceptance however incorporating cots software into software products introduces new complexities that developers are currently ill equipped to handle most significantly while cots software frequently contains programmatic interfaces that allow other software components to obtain services from them on direct call basis they usually lack the ability to initiate interactions with other components this often leads to problems of state and or data inconsistency this paper presents framework for integrating cots software as proactive components within software system that maintain the consistency of the state and data they share with other components the framework utilizes combination of low level instrumentation and high level reasoning to expose the relevant internal activities within cots component required to initiate the communication needed to maintain consistency with the other components with which it shares state and data we will illustrate these capabilities through the integration of ibm’s rational rose into design suite and demonstrate how our framework solves the complex data synchronization problems that arise from this integration
performance simulation tools must be validated during the design process as functional models and early hardware are developed so that designers can be sure of the performance of their designs as they implement changes the current state of the art is to use simple hand coded bandwidth and latency testcases to assess early performance and to calibrate performance models applications and benchmark suites such as spec cpu are difficult to set up or take too long to execute on functional models short trace snippets from applications can be executed on performance and functional simulators but not without difficulty on hardware and there is no guarantee that hand coded tests and short snippets cover the performance of the original applications we present new automatic testcase synthesis methodology to address these concerns by basing testcase synthesis on the workload characteristics of an application we create source code that largely represents the performance of the application but which executes in fraction of the runtime we synthesize representative versions of the spec benchmarks compile and execute them and obtain an average ipc within of the average ipc of the original benchmarks with similar average workload characteristics in addition the changes in ipc due to design changes are found to be proportional to the changes in ipc for the original applications the synthetic testcases execute more than three orders of magnitude faster than the original applications typically in less than instructions making performance model validation feasible
mapreduce and stream processing are two emerging but different paradigms for analyzing processing and making sense of large volumes of modern day data while mapreduce offers the capability to analyze several terabytes of stored data stream processing solutions offer the ability to process possibly few million updates every second however there is an increasing number of data processing applications which need solution that effectively and efficiently combines the benefits of mapreduce and stream processing to address their data processing needs for example in the automated stock trading domain applications usually require periodic analysis of large amounts of stored data to generate model using mapreduce which is then used to process stream of incident updates using stream processing system this paper presents deduce which extends ibm’s system stream processing middleware with support for mapreduce by providing language and runtime support for easily specifying and embedding mapreduce jobs as elements of larger data flow capability to describe reusable modules that can be used as map and reduce tasks and configuration parameters that can be tweaked to control and manage the usage of shared resources by the mapreduce and stream processing components we describe the motivation for deduce and the design and implementation of the mapreduce extensions for system and then present experimental results
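the pattern the abstract describes, a periodic mapreduce style job that builds a model from stored data which a streaming operator then applies to live updates, can be sketched as below; all function and field names are hypothetical and this is not the deduce or system s api

```python
# sketch of the combined pattern: an offline mapreduce style pass builds a
# model, and a streaming operator applies the latest model to incoming ticks.
# all names here are hypothetical; this is not the deduce / system s api.

from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for r in records:
        for k, v in map_fn(r):
            groups[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

# offline part: average historical price per stock symbol
def map_trade(trade):
    yield trade['symbol'], trade['price']

def reduce_avg(symbol, prices):
    return sum(prices) / len(prices)

# streaming part: flag ticks that deviate strongly from the current model
def process_stream(ticks, model, threshold=0.1):
    for t in ticks:
        avg = model.get(t['symbol'])
        if avg and abs(t['price'] - avg) / avg > threshold:
            yield ('alert', t['symbol'], t['price'])

stored = [{'symbol': 'x', 'price': 10.0}, {'symbol': 'x', 'price': 12.0}]
model = run_mapreduce(stored, map_trade, reduce_avg)
alerts = list(process_stream([{'symbol': 'x', 'price': 14.0}], model))
```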
in this paper we describe how high quality transaction data comprising of online searching product viewing and product buying activity of large online community can be used to infer semantic relationships between queries we work with large scale query log consisting of around million queries from ebay we discuss various techniques to infer semantic relationships among queries and show how the results from these methods can be combined to measure the strength and depict the kinds of relationships further we show how this extraction of relations can be used to improve search relevance related query recommendations and recovery from null results in an ecommerce context
end users develop more software than any other group of programmers using software authoring devices such as mail filtering editors by demonstration macro builders and spreadsheet environments despite this there has been little research on finding ways to help these programmers with the dependability of their software we have been addressing this problem in several ways one of which includes supporting end user debugging activities through fault localization techniques this paper presents the results of an empirical study conducted in an end user programming environment to examine the impact of two separate factors in fault localization techniques that affect technique effectiveness our results shed new insights into fault localization techniques for end user programmers and the factors that affect them with significant implications for the evaluation of those techniques
we argue in this paper that concurrency errors should be treated as exceptions ie have fail stop behavior and precise semantics we propose an exception model based on conflict of synchronization free regions which precisely detects broad class of data races we show that our exceptions provide enough guarantees to simplify high level programming language semantics and debugging but are significantly cheaper to enforce than traditional data race detection to make the performance cost of enforcement negligible we propose architecture support for accurately detecting and precisely delivering these exceptions we evaluate the suitability of our model as well as the behavior of our architectural mechanisms using the parsec benchmark suite and commercial applications our results show that the exception model largely reflects how programmers are already writing code and that the main memory traffic and performance overheads of the enforcement mechanisms we propose are very low
reuse of existing libraries simplifies software development efforts however these libraries are often complex and reusing the apis in the libraries involves steep learning curve programmer often uses search engine such as google to discover code snippets involving library usage to perform common task problem with search engines is that they return many pages that programmer has to manually mine to discover the desired code recent research efforts have tried to address this problem by automating the generation of code snippets from user queries however these queries need to have type information and therefore require the user to have partial knowledge of the apis we propose novel code search technique called sniff which retains the flexibility of performing code search in plain english while obtaining small set of relevant code snippets to perform the desired task our technique is based on the observation that the library methods that user code calls are often well documented we use the documentation of the library methods to add plain english meaning to an otherwise undocumented user code the annotated user code is then indexed for the purpose of free form query search another novel contribution of our technique is that we take type based intersection of the candidate code snippets obtained from query search to generate set of small and highly relevant code snippets we have implemented sniff for java and have performed evaluations and user studies to demonstrate the utility of sniff our evaluations show that sniff performed better than most of the existing online search engines as well as related tools
we propose multiresolution volume simplification and polygonization algorithm traditionally voxel based algorithms lack the adaptive resolution support and consequently simplified volumes quickly lose sharp features after several levels of downsampling while tetrahedral based simplification algorithms usually generate poorly shaped triangles in our method each boundary cell is represented by carefully selected representative vertex the quadric error metrics are applied as the geometric error metric our approach first builds an error pyramid by bottom up cell merging we avoid topology problems in hierarchical cell merging by disabling erroneous cells and penalizing cells containing disconnected surface components with additional costs then top down traversal is used to collect cells within user specified error threshold the surfacenets algorithm is used to polygonize these cells we enhance it with online triangle shape optimization and budget control finally we discuss novel octree implementation which greatly eases the polygonization operations
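for reference, the quadric error metric mentioned above is commonly written in the following standard form (due to garland and heckbert), where the error of placing a representative vertex v is measured against the planes of the faces it stands for; this is the textbook formulation, not a reproduction of the paper's exact cell level usage

```latex
% standard quadric error metric (garland and heckbert), shown for reference;
% the paper applies it as the geometric error of a cell's representative vertex
\[
  Q = \sum_{p \in \mathrm{planes}(v)} K_p, \qquad
  K_p = p\,p^{\mathsf T}, \quad p = (a,b,c,d)^{\mathsf T},\ a^2+b^2+c^2=1,
\]
\[
  \Delta(v) = v^{\mathsf T} Q\, v, \qquad v = (x,y,z,1)^{\mathsf T}.
\]
```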
user generated video content has become increasingly popular with large number of internet video sharing portals appearing many portals wish to rapidly find and remove objectionable material from the uploaded videos this paper considers the flagging of uploaded videos as potentially objectionable due to sexual content of an adult nature such videos are often characterized by the presence of large amount of skin although other scenes such as close ups of faces also satisfy this criterion the main contribution of this paper is to introduce to this task two uses of contextual information in the form of detected faces the first is to use combination of different face detectors to adjust the parameters of the skin detection model the second is through the summarization of video in the form of path in skin face plot this plot allows potentially objectionable segments of videos to be found while ignoring segments containing close ups of faces the proposed approach runs in real time experiments are done on per pixel annotated and challenging on line videos from an on line service provider to prove our approach large scale experiments are carried out on popular public video clips from web platforms these are chosen from the community top rated and cover large variety of different skin colors illuminations image quality and difficulty levels we find compact and reliable representation for videos to flag suspicious content efficiently
in the freeze tag problem the objective is to awaken set of asleep robots starting with only one awake robot robot awakens sleeping robot by moving to the sleeping robot’s position when robot awakens it is available to assist in awakening other slumbering robots the objective is to compute an optimal awakening schedule such that all robots are awake by time for the smallest possible value of because of its resemblance to the children’s game of freeze tag this problem has been called freeze tag problem ftp particularly intriguing aspect of the ftp is that any algorithm that is not purposely unproductive yields an log approximation while no log approximation algorithms are known for general metric spaces this paper presents an approximation algorithm for the ftp in unweighted graphs in which there is one asleep robot at each node we show that this version of the ftp is np hard we generalize our methods to the case in which there are multiple robots at each node and edges are unweighted we obtain log approximation in this case in the case of weighted edges our methods yield an log approximation algorithm where is the length of the longest edge and is the diameter of the graph
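a toy greedy schedule for the setting above, in which every awake robot is repeatedly sent to the nearest still asleep robot by hop distance, is sketched below; this is a generic heuristic for illustration only and not the approximation algorithm analysed in the paper

```python
# toy greedy schedule for the freeze tag setting: every awake robot is sent to
# the nearest asleep robot (hop distance via bfs), round after round. a generic
# illustration, not the paper's approximation algorithm.

from collections import deque

def bfs_dist(graph, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_freeze_tag(graph, start):
    awake, asleep = {start}, set(graph) - {start}
    positions = {start: start}
    makespan = 0
    while asleep:
        round_len = 0
        for robot in list(awake):
            if not asleep:
                break
            d = bfs_dist(graph, positions[robot])
            target = min(asleep, key=lambda v: d.get(v, float('inf')))
            if target not in d:
                continue                       # unreachable sleepers are skipped
            round_len = max(round_len, d[target])
            positions[robot] = target
            positions[target] = target
            awake.add(target)
            asleep.discard(target)
        makespan += round_len
        if round_len == 0:
            break                              # nothing reachable was woken
    return makespan
```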
in the decision support queries which manipulate large data volumes it is frequent that query constituted by several joins can not be computed completely in memory in this paper we propose three strategies allowing to assign the memory of shared nothing parallel architecture to operation clones of query the performance evaluation of the three strategies shows that the strategies which favor the operation clones using lot of memory obtain better response time than the strategy which favors the clones using little memory the main contribution of this paper is to take into account the available memory sizes on every processor and to avoid allotting the same processor to two operation clones that must run in parallel
knowledge sharing enables people in virtual communities to access relevant knowledge explicit or tacit from broader scope of resources the performance in such environments is fundamentally based on how effectively the explicit and tacit knowledge can be shared across people and how efficiently the created knowledge can be organized and disseminated to enrich digital content this study will address how to apply social network based system to support interactive collaboration in knowledge sharing over peer to peer networks results of this study demonstrate that applying such social network based collaboration support to knowledge sharing helps people find relevant content and knowledgeable collaborators who are willing to share their knowledge
cyclic debugging is used to execute programs over and over again for tracking down and eliminating bugs during re execution programmers may want to stop at breakpoints or apply step by step execution for inspecting the programs state and detecting errors for long running parallel programs the biggest drawback is the cost associated with restarting the programs execution every time from the beginning solution is offered by combining checkpointing and debugging which allows program run to be initiated at any intermediate checkpoint problem is the selection of an appropriate recovery line for given breakpoint the temporal distance between these two points may be rather long if recovery lines are only chosen at consistent global checkpoints the method described in this paper allows users to select an arbitrary checkpoint as starting point for debugging and thus to shorten the temporal distance in addition mechanism for reducing the amount of trace data in terms of logged messages is provided the resulting technique is able to reduce the waiting time and the costs of cyclic debugging
software is typically improved and modified in small increments we refer to each of these increments as modification record mr mrs are usually stored in configuration management or version control system and can be retrieved for analysis in this study we retrieved the mrs from several mature open software projects we then concentrated our analysis on those mrs that fix defects and provided heuristics to automatically classify them we used the information in the mrs to visualize what files are changed at the same time and who are the people who tend to modify certain files we argue that these visualizations can be used to understand the development stage of in which project is at given time new features are added or defects are being fixed the level of modularization of project and how developers might interact between each other and the source code of system
this paper addresses the problem of semantics based temporal expert finding which means identifying person with given expertise for different time periods for example many real world applications like reviewer matching for papers and finding hot topics in newswire articles need to consider time dynamics intuitively there will be different reviewers and reporters for different topics during different time periods traditional approaches used graph based link structure by using keywords based matching and ignored semantic information while topic modeling considered semantics based information without conferences influence richer text semantics and relationships between authors and time information simultaneously consequently they result in not finding appropriate experts for different time periods we propose novel temporal expert topic tet approach based on semantics and temporal information based expert search stms for temporal expert finding which simultaneously models conferences influence and time information consequently topics semantically related probabilistic clusters of words occurrence and correlations change over time while the meaning of particular topic almost remains unchanged by using bayes theorem we can obtain topically related experts for different time periods and show how experts interests and relationships change over time experimental results on scientific literature dataset show that the proposed generalized time topic modeling approach significantly outperformed the non generalized time topic modeling approaches due to simultaneously capturing conferences influence with time information
towards sophisticated representation and reasoning techniques that allow for probabilistic uncertainty in the rules logic and proof layers of the semantic web we present probabilistic description logic programs or pdl programs which are combination of description logic programs or dl programs under the answer set semantics and the well founded semantics with poole’s independent choice logic we show that query processing in such pdl programs can be reduced to computing all answer sets of dl programs and solving linear optimization problems and to computing the well founded model of dl programs respectively moreover we show that the answer set semantics of pdl programs is refinement of the well founded semantics of pdl programs furthermore we also present an algorithm for query processing in the special case of stratified pdl programs which is based on reduction to computing the canonical model of stratified dl programs
this paper concerns sequential checkpoint placement problems under two dependability measures steady state system availability and expected reward per unit time in the steady state we develop numerical computation algorithms to determine the optimal checkpoint sequence based on the classical brender’s fixed point algorithm and further give three simple approximation methods numerical examples with the weibull failure time distribution are devoted to illustrate quantitatively the overestimation and underestimation of the sub optimal checkpoint sequences based on the approximation methods
finite state machine based abstractions of software behaviour are popular because they can be used as the basis for wide range of semi automated verification and validation techniques these can however rarely be applied in practice because the specifications are rarely kept up to date or even generated in the first place several techniques to reverse engineer these specifications have been proposed but they are rarely used in practice because their input requirements ie the number of execution traces are often very high if they are to produce an accurate result an insufficient set of traces usually results in state machine that is either too general or incomplete temporal logic formulae can often be used to concisely express constraints on system behaviour that might otherwise require thousands of execution traces to identify this paper describes an extension of an existing state machine inference technique that accounts for temporal logic formulae and encourages the addition of new formulae as the inference process converges on solution the implementation of this process is openly available and some preliminary results are provided
we present graph theoretic framework in which to study instances of the semiunification problem sup which is known to be undecidable but has several known and important decidable subsets one such subset the acyclic semiunification problem asup has proved useful in the study of polymorphic type inference we present graph theoretic criteria in our framework that exactly characterize the asup acyclicity constraint we then use our framework to find decidable subset of sup which we call asup which has more natural description than asup and strictly contains it
the issue of certificate revocation in mobile ad hoc networks manets where there are no on line access to trusted authorities is challenging problem in wired network environments when certificates are to be revoked certificate authorities cas add the information regarding the certificates in question to certificate revocation lists crls and post the crls on accessible repositories or distribute them to relevant entities in purely ad hoc networks there are typically no access to centralized repositories or trusted authorities therefore the conventional method of certificate revocation is not applicable in this paper we present decentralized certificate revocation scheme that allows the nodes within manet to revoke the certificates of malicious entities the scheme is fully contained and it does not rely on inputs from centralized or external entities
the availability of low cost hardware such as cmos cameras and microphones has fostered the development of wireless multimedia sensor networks wmsns ie networks of wirelessly interconnected devices that are able to ubiquitously retrieve multimedia content such as video and audio streams still images and scalar sensor data from the environment in this paper the state of the art in algorithms protocols and hardware for wireless multimedia sensor networks is surveyed and open research issues are discussed in detail architectures for wmsns are explored along with their advantages and drawbacks currently off the shelf hardware as well as available research prototypes for wmsns are listed and classified existing solutions and open research issues at the application transport network link and physical layers of the communication protocol stack are investigated along with possible cross layer synergies and optimizations
in this paper we revisit constructions from the literature that translate alternating automata into language equivalent nondeterministic automata such constructions are of practical interest in finite state model checking since formulas of widely used linear time temporal logics with future and past operators can directly be translated into alternating automata we present construction scheme that can be instantiated for different automata classes to translate alternating automata into language equivalent nondeterministic automata the scheme emphasizes the core ingredient of previously proposed alternation elimination constructions namely reduction to the problem of complementing nondeterministic automata furthermore we clarify and improve previously proposed constructions for different classes of alternating automata by recasting them as instances of our construction scheme finally we present new complementation constructions for way nondeterministic automata from which we then obtain novel alternation elimination constructions
opportunistic connections to the internet from open wireless access points is now commonly possible in urban areas vehicular networks can opportunistically connect to the internet for several seconds via open access points in this paper we adapt the interactive process of web search and retrieval to vehicular networks with intermittent internet access our system called thedu has mobile nodes use an internet proxy to collect search engine results and prefetch result pages the mobile nodes download the pre fetched web pages from the proxy our contribution is novel set of techniques to make aggressive but selective prefetching practical resulting in significantly greater number of relevant web results returned to mobile users in particular we prioritize responses in the order of the usefulness of the response to the query that allows the mobile node to download the most useful response first to evaluate our scheme we deployed thedu on dieselnet our vehicular testbed operating in micro urban area around amherst ma using simulated workload we find that users can expect four times as many useful responses to web search queries compared to not using thedu’s mechanisms moreover the mean latency in receiving the first relevant response for query is minutes for our deployment we expect thedu to have even better performance in larger cities that have densely populated open aps
we study the implications for the expressive power of call cc of upward continuations specifically the idiom of using continuation twice although such control effects were known to landin and reynolds when they invented j and escape the forebears of call cc they still act as conceptual pitfall for some attempts to reason about continuations we use this idiom to refute some recent conjectures about equivalences in language with continuations but no other effects this shows that first class continuations as given by call cc have greater expressive power than one would expect from goto or exits
various temporal extensions to the relational model have been proposed all of these however deviate significantly from the original relational model this paper presents temporal extension of the relational algebra that is not significantly different from the original relational model yet is at least as expressive as any of the previous approaches this algebra employs multidimensional tuple time stamping to capture the complete temporal behavior of data the basic relational operations are redefined as consistent extensions of the existing operations in manner that preserves the basic algebraic equivalences of the snapshot ie conventional static algebra new operation namely temporal projection is introduced the complete update semantics are formally specified and aggregate functions are defined the algebra is closed and reduces to the snapshot algebra it is also shown to be at least as expressive as the calculus based temporal query language tquel in order to assess the algebra it is evaluated using set of twenty six criteria proposed in the literature and compared to existing temporal relational algebras the proposed algebra appears to satisfy more criteria than any other existing algebra
memory models define which executions of multithreaded programs are legal this paper formalises in fixpoint form the happens before memory model an over approximation of the java one and it presents static analysis using abstract interpretation our approach is completely independent of both the programming language and the analysed property it also appears to be promising framework to define compare and statically analyse other memory models
although many clustering methods have been presented in the literature most of them suffer from some drawbacks such as the requirement of user specified parameters and being sensitive to outliers for general divisive hierarchical clustering methods an obstacle to practical use is the expensive computation in this paper we propose an automatic divisive hierarchical clustering method divfrp its basic idea is to bipartition clusters repeatedly with novel dissimilarity measure based on furthest reference points sliding average of sum of error is employed to estimate the cluster number preliminarily and the optimum number of clusters is achieved after spurious clusters identified the method does not require any user specified parameter even any cluster validity index furthermore it is robust to outliers and the computational cost of its partition process is lower than that of general divisive clustering methods numerical experimental results on both synthetic and real data sets show the performances of divfrp
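a simplified sketch of a furthest reference point bipartition in the spirit of divfrp is given below: the point furthest from the cluster mean serves as the reference, points are ranked by distance to it, and the split is made at the largest gap; the exact dissimilarity measure and cluster number estimation of the paper are not reproduced here

```python
# simplified furthest-reference-point bipartition: use the point furthest from
# the cluster mean as reference, sort points by distance to it, and split at
# the largest jump in the sorted distances. an illustration of the idea, not
# the paper's exact dissimilarity measure.

import numpy as np

def bipartition_frp(points):
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    ref = points[np.argmax(np.linalg.norm(points - mean, axis=1))]
    d = np.linalg.norm(points - ref, axis=1)
    order = np.argsort(d)
    gaps = np.diff(d[order])
    cut = int(np.argmax(gaps)) + 1             # split where sorted distances jump most
    near, far = order[:cut], order[cut:]
    return points[near], points[far]

a = np.random.randn(50, 2)
b = np.random.randn(50, 2) + 8.0
left, right = bipartition_frp(np.vstack([a, b]))
```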
battery powered systems account for significant and rapidly expanding segment of the electronics and semiconductor industries unfortunately projections of the complexity functionality and performance of such systems far exceed expected improvements in battery technologies leading to widening battery gap bridging this gap is challenge that system designers must face for the foreseeable future the need to improve battery life has in large
in this paper we introduce the concept of witness anonymity for peer to peer systems witness anonymity combines the seemingly conflicting requirements of anonymity for honest peers who report on the misbehavior of other peers and accountability for malicious peers that attempt to misuse the anonymity feature to slander honest peers we propose the secure deep throat sdt protocol to provide anonymity for witnesses of malicious or selfish behavior to enable such peers to report on this behavior without fear of retaliation on the other hand in sdt the misuse of anonymity is restrained in such way that any malicious peer that attempts to send multiple claims against the same innocent peer for the same reason ie the same misbehavior type can be identified we also describe how sdt can be used in two modes the active mode can be used in scenarios with real time requirements eg detecting and preventing the propagation of peer to peer worms whereas the passive mode is suitable for scenarios without strict real time requirements eg query based reputation systems we analyze the security and overhead of sdt and present countermeasures that can be used to mitigate various attacks on the protocol our analysis shows that the communication storage and computation overheads of sdt are acceptable in peer to peer systems
while dynamic voltage scaling dvs and dynamic power management dpm techniques are widely used in real time embedded applications their complex interaction is not fully understood in this research effort we consider the problem of minimizing the expected energy consumption on settings where the workload is known only probabilistically by adopting system level power model we formally show how the optimal processing frequency can be computed efficiently for real time embedded application that can use multiple devices during its execution while still meeting the timing constraints our evaluations indicate that the new technique provides clear up to energy gains over the existing solutions that are proposed for deterministic workloads moreover in non negligible part of the parameter spectrum the algorithm’s performance is shown to be close to that of clairvoyant algorithm that can minimize the energy consumption with the advance knowledge about the exact workload
reliable data delivery and congestion control are two fundamental transport layer functions due to the specific characteristics of wireless sensor networks wsns traditional transport layer protocols eg transmission control protocol tcp and user datagram protocol udp that are widely used in the internet may not be suitable for wsns in this paper the characteristics of wsns are reviewed and the requirements and challenges of reliable data transport over wsns are presented the issues with applying traditional transport protocols over wsns are discussed we then survey recent research progress in developing suitable transport protocols for wsns the proposed reliable data transport and congestion control protocols for wsns are reviewed and summarised finally we describe some future research directions of transport protocol in wsns
mobile user is roaming in zone of cells in cellular network system when call for the mobile arrives the system pages the mobile in these cells since it never reports its location unless it leaves the zone delay constraint paging strategy must find the mobile after at most paging rounds each pages subset of the cells the goal is to minimize the number of paged cells until the mobile is found optimal solutions are known for the off line case for which a priori probability of mobile residing in any one of the cells is known in this paper we address the on line case an on line paging strategy makes its decisions based only on past locations of the mobile while trying to learn its future locations we present deterministic and randomized on line algorithms for various values of number of paging rounds as function of number of cells and evaluate them using competitive analysis in particular we present constant competitive on line algorithm for the two extreme cases of and the former is the first nontrivial delay constraint case and the latter is the case for which there are no delay constraints we then show that the constant competitiveness can be attained already for log n all of the above algorithms are deterministic our randomized on line algorithm achieves near optimal performance for all values of this algorithm is based on solutions to the best expert problem
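a small illustrative strategy for this setting is sketched below: cells are ranked by how often the mobile was recently seen in them and paged in batches over the allowed rounds; this heuristic is only meant to make the problem concrete and is not one of the competitive algorithms of the paper

```python
# illustrative online paging heuristic: rank cells by how often the mobile was
# recently observed there and page them in batches over the d allowed rounds,
# most likely cells first. not the paper's competitive algorithms.

from collections import Counter

def page_mobile(history, cells, d, probe):
    """
    history: past cell locations of the mobile
    cells:   all cells in the zone
    d:       maximum number of paging rounds
    probe:   function(list of cells) -> True if the mobile answered there
    returns the number of cells paged until the mobile was found
    """
    freq = Counter(history)
    ranked = sorted(cells, key=lambda c: -freq[c])   # most likely cells first
    batch = max(1, len(ranked) // d)
    paged = 0
    for r in range(d):
        group = ranked[r * batch:] if r == d - 1 else ranked[r * batch:(r + 1) * batch]
        paged += len(group)
        if probe(group):
            return paged
    return paged
```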
in this paper we make use of the relationship between the laplace beltrami operator and the graph laplacian for the purposes of embedding graph onto riemannian manifold to embark on this study we review some of the basics of riemannian geometry and explain the relationship between the laplace beltrami operator and the graph laplacian using the properties of jacobi fields we show how to compute an edge weight matrix in which the elements reflect the sectional curvatures associated with the geodesic paths on the manifold between nodes for the particular case of constant sectional curvature surface we use the kruskal coordinates to compute edge weights that are proportional to the geodesic distance between points we use the resulting edge weight matrix to embed the nodes of the graph onto riemannian manifold to do this we develop method that can be used to perform double centring on the laplacian matrix computed from the edge weights the embedding coordinates are given by the eigenvectors of the centred laplacian with the set of embedding coordinates at hand number of graph manipulation tasks can be performed in this paper we are primarily interested in graph matching we recast the graph matching problem as that of aligning pairs of manifolds subject to geometric transformation we show that this transformation is procrustean in nature we illustrate the utility of the method on image matching using the coil database
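the embedding step described above can be sketched with numpy as follows, assuming the curvature dependent edge weights are already given in a weight matrix W; the double centring and eigenvector extraction follow the general recipe, so this should be read as a schematic of the procedure rather than the paper's exact formulation

```python
# schematic of the embedding step: build a weighted graph laplacian, double
# centre it, and use leading eigenvectors as node coordinates. the curvature
# dependent edge weights are assumed to be supplied in W.

import numpy as np

def embed_graph(W, dim=2):
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph laplacian from edge weights
    n = L.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ L @ J                       # double centring
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]         # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```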
the preservation of literary hypertexts presents significant challenges if we are to ensure continued access to them as the underlying technology changes not only does such an effort involve standard digital preservation problems of representing and refreshing metadata any constituent media types and structure hypertext preservation poses additional dimensions that arise from the work’s on screen appearance its interactive behavior and the ways reader’s interaction with the work is recorded in this paper we describe aspects of preservation introduced by literary hypertexts such as the need to reproduce their modes of interactivity and their means of capturing and using records of reading we then suggest strategies for addressing the pragmatic dimensions of hypertext preservation and discuss their status within existing digital preservation schemes finally we examine the possible roles various stakeholders within and outside of the hypertext community might assume including several social and legal issues that stem from preservation
as became apparent after the tragic events of september terrorist organizations and other criminal groups are increasingly using the legitimate ways of internet access to conduct their malicious activities such actions cannot be detected by existing intrusion detection systems that are generally aimed at protecting computer systems and networks from some kind of cyber attacks preparation of an attack against the human society itself can only be detected through analysis of the content accessed by the users the proposed study aims at developing an innovative methodology for abnormal activity detection which uses web content as the audit information provided to the detection system the new behavior based detection method learns the normal behavior by applying an unsupervised clustering algorithm to the contents of publicly available web pages viewed by group of similar users in this paper we represent page content by the well known vector space model the content models of normal behavior are used in real time to reveal deviation from normal behavior at specific location on the net the detection algorithm sensitivity is controlled by threshold parameter the method is evaluated by the trade off between the detection rate tp and the false positive rate fp
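a minimal sketch of the behaviour based detection pipeline described above is given below: tf idf vectors for the viewed pages, an unsupervised clustering step to model normal content, and a distance threshold to flag deviations; the scikit learn components and the threshold rule are stand ins chosen for the example

```python
# minimal sketch of content based anomaly detection: tf-idf vectors for viewed
# pages, k-means clusters as the model of normal behaviour, and a distance
# threshold to flag abnormal content. components and threshold are stand-ins.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def train_normal_model(pages, n_clusters=5):
    vec = TfidfVectorizer(stop_words='english')
    X = vec.fit_transform(pages)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    return vec, km

def is_abnormal(page, vec, km, threshold=1.0):
    x = vec.transform([page])
    dists = km.transform(x)                    # distance to each cluster centre
    return float(dists.min()) > threshold      # far from every "normal" cluster
```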
we propose dynamic aspect oriented system for operating system os kernels written in the language unlike other similar systems our system named klasy allows the users to pointcut not only function calls but also member accesses to structures this feature helps the developers who want to use aspects for profiling or debugging an os kernel to enable this klasy uses modified compiler for compiling an os kernel the modified compiler produces extended symbol information which enables dynamic weaver to find the memory addresses of join point shadows during runtime since normal compiler produces only limited symbol information other dynamic aspect oriented systems for have been able to pointcut only function calls we have implemented klasy for linux with the gnu compiler our experiments revealed that klasy achieves sufficient execution performance for practical use our case studies disclosed that klasy is useful for real applications
because of their rapid growth in recent years embedded systems present new front in vulnerability and an attractive target for attackers their pervasive use including sensors and mobile devices makes it easier for an adversary to gain physical access to facilitate both attacks and reverse engineering of the system this paper describes system codesseal for software protection and evaluates its overhead codesseal aims to protect embedded systems from attackers with enough expertise and resources to capture the device and attempt to manipulate not only software but also hardware the protection mechanism involves both compiler based software tool that instruments executables and an on chip fpga based hardware component that provides run time integrity and control flow checking on the executable code the use of reconfigurable hardware allows codesseal to provide such security services as confidentiality integrity and program flow protection in platform independent manner without requiring redesign of the processor similarly the compiler instrumentation hides the security details from software developers software and data protection techniques are presented for our system and performance analysis is provided using cycle accurate simulation our experimental results show that protecting instructions and data with high level of security can be achieved with low performance penalty in most cases less than
internet technology holds significant potential to respond to business educational and social needs but this same technology poses fundamentally new challenges for research ethics to reason about ethical questions researchers and ethics review boards typically rely on dichotomies like public versus private published vs unpublished and anonymous vs identified however online these categories are blurred and the underlying concepts require reinterpretation how then are we to reason about ethical dilemmas about research on the internet to date most work in this area has been grounded in combination of theoretical analysis and experience gained by people in the course of conducting internet research in these studies ethical insight was welcome byproduct of research aimed primarily at exploring other ends however little work has used experimental methods for the primary purpose of contributing to our reasoning about the ethics of research online in this paper we discuss the role of empirical data in helping us answer questions about internet research ethics as an example we review results of one study in which we gauged participant expectations of privacy in public chatrooms hudson bruckman using an experimental approach we demonstrate how participants expectations of privacy conflict with the reality of these public chatrooms although these empirical data cannot provide concrete answers we show how they influence our reasoning about the ethical issues of obtaining informed consent
we present novel algorithm for stochastic rasterization which can rasterize triangles with attributes depending on parameter varying continuously from to inside single frame these primitives are called time continuous triangles and can be used to render motion blur we develop efficient techniques for rasterizing time continuous triangles and specialized sampling and filtering algorithms for improved image quality our algorithm needs some new hardware mechanisms implemented on top of today’s graphics hardware pipelines however our algorithm can leverage much of the already existing hardware units in contemporary gpus which makes the implementation fairly inexpensive we introduce time dependent textures and show that motion blurred shadows and motion blurred reflections can be handled in our framework in addition we also present new techniques for efficient rendering of depth of field and glossy planar reflections using our stochastic rasterizer
most approaches to information filtering taken so far have the underlying hypothesis of potentially delivering notifications from every information producer to subscribers this exact publish subscribe model creates an efficiency and scalability bottleneck and might not even be desirable in certain applications the work presented here puts forward maps novel approach to support approximate information filtering in peer to peer environment in maps user subscribes to and monitors only carefully selected data sources and receives notifications about interesting events from these sources only this way scalability is enhanced by trading recall for lower message traffic we define the protocols of peer to peer architecture especially designed for approximate information filtering and introduce new node selection strategies based on time series analysis techniques to improve data source selection our experimental evaluation shows that maps is scalable it achieves high recall by monitoring only few data sources
nowadays many data mining analysis applications use the graph analysis techniques for decision making many of these techniques are based on the importance of relationships among the interacting units number of models and measures that analyze the relationship importance link structure have been proposed eg centrality importance and page rank and they are generally based on intuition where the analyst intuitively decides reasonable model that fits the underlying data in this paper we address the problem of learning such models directly from training data specifically we study way to calibrate connection strength measure from training data in the context of reference disambiguation problem experimental evaluation demonstrates that the proposed model surpasses the best model used for reference disambiguation in the past leading to better quality of reference disambiguation
mlj compiles sml into verifier compliant java byte codes its features include type checked interlanguage working extensions which allow ml and java code to call each other automatic recompilation management compact compiled code and runtime performance which using just in time compiling java virtual machine usually exceeds that of existing specialised bytecode interpreters for ml notable features of the compiler itself include whole program optimisation based on rewriting compilation of polymorphism by specialisation novel monadic intermediate language which expresses effect information in the type system and some interesting data representation choices
we propose method for dynamic security domain scaling on smps that offers both highly scalable performance and high security for future high end embedded systems its most important feature is its highly efficient use of processor resources accomplished by dynamically changing the number of processors within security domain ie dynamically yielding processors to other security domains in response to application load requirements two new technologies make this scaling possible without any virtualization software self transition management and unified virtual address mapping evaluations show that this domain control provides highly scalable performance and incurs almost no performance overhead in security domains the increase in oss in binary code size is less than percent and the time required for individual state transitions is on the order of single millisecond this scaling is the first in the world to make possible the dynamic changing of the number of processors within security domain on an arm smp
this paper proposes general framework for selecting features in the computer vision domain ie learning descriptions from data where the prior knowledge related to the application is confined in the early stages the main building block is regularization algorithm based on penalty term enforcing sparsity the overall strategy we propose is also effective for training sets of limited size and reaches competitive performances with respect to the state of the art to show the versatility of the proposed strategy we apply it to both face detection and authentication implementing two modules of monitoring system working in real time in our lab aside from the choices of the feature dictionary and the training data which require prior knowledge on the problem the proposed method is fully automatic the very good results obtained in different applications speak for the generality and the robustness of the framework
device attestation is an essential feature in many security protocols and applications the lack of dedicated hardware and the impossibility to physically access devices to be attested makes attestation of embedded devices in applications such as wireless sensor networks prominent challenge several software based attestation techniques have been proposed that either rely on tight time constraints or on the lack of free space to store malicious code this paper investigates the shortcomings of existing software based attestation techniques we first present two generic attacks one based on return oriented rootkit and the other on code compression we further describe specific attacks on two existing proposals namely swatt and ice based schemes and argue about the difficulty of fixing them all attacks presented in this paper were implemented and validated on commodity sensors
measuring the structural similarity among xml documents is the task of finding their semantic correspondence and is fundamental to many web based applications while there exist several methods to address the problem the data mining approach seems to be novel interesting and promising one it explores the idea of extracting paths from xml documents encoding them as sequences and finding the maximal frequent sequences using the sequential pattern mining algorithms in view of the deficiencies encountered by ignoring the hierarchical information in encoding the paths for mining new sequential pattern mining scheme for xml document similarity computation is proposed in this paper it makes use of preorder tree representation ptr to encode the xml tree paths so that both the semantics of the elements and the hierarchical structure of the document can be taken into account when computing the structural similarity among documents in addition it proposes postprocessing step to reuse the mined patterns to estimate the similarity of unmatched elements so that another metric to qualify the similarity between xml documents can be introduced encouraging experimental results were obtained and reported
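the idea of keeping hierarchical information when encoding paths can be sketched as below, where each root to leaf path becomes a sequence of (tag, depth) pairs ready for sequential pattern mining; this is a simplified stand in for the preorder tree representation used in the paper

```python
# sketch of turning xml documents into depth annotated path sequences, so that
# both element names and their hierarchical position are kept before any
# sequential pattern mining. a simplified stand-in for the paper's preorder
# tree representation.

import xml.etree.ElementTree as ET

def paths_as_sequences(xml_text):
    root = ET.fromstring(xml_text)
    sequences = []

    def walk(node, prefix):
        prefix = prefix + [(node.tag, len(prefix))]   # (tag, depth) keeps hierarchy
        children = list(node)
        if not children:
            sequences.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(root, [])
    return sequences

doc = "<order><item><name>book</name><qty>2</qty></item></order>"
for seq in paths_as_sequences(doc):
    print(seq)
```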
this article presents the design implementation and evaluation of cats network storage service with strong accountability properties cats offers simple web services interface that allows clients to read and write opaque objects of variable size this interface is similar to the one offered by existing commercial internet storage services cats extends the functionality of commercial internet storage services by offering support for strong accountability cats server annotates read and write responses with evidence of correct execution and offers audit and challenge interfaces that enable clients to verify that the server is faithful faulty server cannot conceal its misbehavior and evidence of misbehavior is independently verifiable by any participant cats clients are also accountable for their actions on the service client cannot deny its actions and the server can prove the impact of those actions on the state views it presented to other clients experiments with cats prototype evaluate the cost of accountability under range of conditions and expose the primary factors influencing the level of assurance and the performance of strongly accountable storage server the results show that strong accountability is practical for network storage systems in settings with strong identity and modest degrees of write sharing we discuss how the accountability concepts and techniques used in cats generalize to other classes of network services
microprocessor design is very complex and time consuming activity one of the primary reasons is the huge design space that needs to be explored in order to identify the optimal design given number of constraints simulations are usually used to explore these huge design spaces however they are fairly slow several hundreds of billions of instructions need to be simulated per benchmark and this needs to be done for every design point of interest recently statistical simulation was proposed to efficiently cull huge design space the basic idea of statistical simulation is to collect number of important program characteristics and to generate synthetic trace from it simulating this synthetic trace is extremely fast as it contains million instructions only this paper improves the statistical simulation methodology by proposing accurate memory data flow models we model i load forwarding ii delayed cache hits and iii correlation between cache misses based on path info our experiments using the spec cpu benchmarks show substantial improvement upon current state of the art statistical simulation methods for example for our baseline configuration we reduce the average ipc prediction error from to in addition we show that performance trends are predicted very accurately making statistical simulation enhanced with accurate data flow models useful tool for efficient and accurate microprocessor design space explorations
we present middleware platform for assembling pervasive applications that demand fault tolerance and adaptivity in distributed dynamic environments unlike typical adaptive middleware approaches in which sophisticated component model semantics are embedded into an existing underlying platform eg corba com ejb we propose platform that imposes minimal constraints for greater flexibility such tradeoff is advantageous when the platform is targeted by automatic code generators that inherently enforce correctness by construction applications are written as simple single threaded programs that assemble and monitor set of distributed components the approach decomposes applications into two distinct layers distributed network of interconnected modules performing computations and constructor logic that assembles that network via simple block diagram construction api the constructor logic subsequently monitors the configured system via stream of high level events such as notifications of resource availability or failures and consequently provides convenient centralized location for reconfiguration and debugging the component network is optimized for performance while the construction api is optimized for ease of assembly microbenchmarks indicate that our runtime incurs minimal overhead in addition to describing the programming model platform implementation and variety of pervasive applications built using our system this paper also extends our previous work with thorough analysis of remote objects and tracking techniques new contributions in distributed component liveness monitoring approaches and expanded microbenchmarks
we consider the use of medial surfaces to represent symmetries of objects this allows for qualitative abstraction based on directed acyclic graph of components and also degree of invariance to variety of transformations including the articulation of parts we demonstrate the use of this representation for object model retrieval our formulation uses the geometric information associated with each node along with an eigenvalue labeling of the adjacency matrix of the subgraph rooted at that node we present comparative retrieval results against the techniques of shape distributions osada et al and harmonic spheres kazhdan et al on models from the mcgill shape benchmark representing object classes for objects with articulating parts the precision vs recall curves using our method are consistently above and to the right of those of the other two techniques demonstrating superior retrieval performance for objects that are rigid our method gives results that compare favorably with these methods
this article proposes new online voltage scaling vs technique for battery powered embedded systems with real time constraints the vs technique takes into account the execution times and discharge currents of tasks to further reduce the battery charge consumption when compared to the recently reported slack forwarding technique ahmed and chakrabarti while maintaining low online complexity of furthermore we investigate the impact of online rescheduling and remapping on the battery charge consumption for tasks with data dependency which has not been explicitly addressed in the literature and propose novel rescheduling remapping technique finally we take leakage power into consideration and extend the proposed online techniques to include adaptive body biasing abb which is used to reduce the leakage power we demonstrate and compare the efficiency of the presented techniques using seven real life benchmarks and numerous automatically generated examples
we investigate ways in which an algorithm can improve its expected performance by fine tuning itself automatically with respect to an arbitrary unknown input distribution we give such self improving algorithms for sorting and clustering the highlights of this work i sorting algorithm with optimal expected limiting running time and ii median algorithm over the hamming cube with linear expected limiting running time in all cases the algorithm begins with learning phase during which it adjusts itself to the input distribution typically in logarithmic number of rounds followed by stationary regime in which the algorithm settles to its optimized incarnation
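a toy version of the learning phase plus stationary regime structure is sketched below for sorting: the learning phase samples inputs from the unknown distribution and stores bucket boundaries, and later inputs are sorted by bucketing against them; this simplification only illustrates the two phase structure, not the paper's algorithm or its optimality analysis

```python
# toy self-improving sorter: a learning phase estimates bucket boundaries from
# sample inputs, and the stationary phase sorts new inputs by bucketing against
# those boundaries. illustrates the two-phase structure only.

import bisect
import random

class SelfImprovingSorter:
    def __init__(self, n_buckets=64):
        self.n_buckets = n_buckets
        self.boundaries = None

    def learn(self, training_inputs):
        sample = sorted(x for inp in training_inputs for x in inp)
        step = max(1, len(sample) // self.n_buckets)
        self.boundaries = sample[step::step]

    def sort(self, values):
        if not self.boundaries:
            return sorted(values)
        buckets = [[] for _ in range(len(self.boundaries) + 1)]
        for v in values:
            buckets[bisect.bisect_left(self.boundaries, v)].append(v)
        out = []
        for b in buckets:
            out.extend(sorted(b))              # small buckets sort cheaply
        return out

s = SelfImprovingSorter()
s.learn([[random.gauss(0, 1) for _ in range(100)] for _ in range(50)])
print(s.sort([random.gauss(0, 1) for _ in range(10)]))
```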
the maintenance of semantic consistency between numerous heterogeneous electronic product catalogues epc that are distributed autonomous interdependent and emergent on the internet is an unsolved issue for the existing heterogeneous epc integration approaches this article attempts to solve this issue by conceptually designing an interoperable epc iepc system through proposed novel collaborative conceptualisation approach this approach introduces collaboration into the heterogeneous epc integration it implies much potential for future marketplace research it theoretically answers why real world epcs are so complex how these complex epcs can be explained and articulated in product map theory for heterogeneous epc integration how semantic consistency maintenance model can be created to satisfy the three heterogeneous epc integration conditions and implemented by adopting collaborative integration strategy on collaborative concept exchange model and how this collaborative integration strategy can be realised on collaboration mechanism this approach has been validated through theoretical justification and its applicability has been demonstrated in two prototypical business applications
large volume of research in temporal data mining is focusing on discovering temporal rules from time stamped data the majority of the methods proposed so far have been mainly devoted to the mining of temporal rules which describe relationships between data sequences or instantaneous events and do not consider the presence of complex temporal patterns into the dataset such complex patterns such as trends or up and down behaviors are often very interesting for the users in this paper we propose new kind of temporal association rule and the related extraction algorithm the learned rules involve complex temporal patterns in both their antecedent and consequent within our proposed approach the user defines set of complex patterns of interest that constitute the basis for the construction of the temporal rule such complex patterns are represented and retrieved in the data through the formalism of knowledge based temporal abstractions an apriori like algorithm looks then for meaningful temporal relationships in particular precedence temporal relationships among the complex patterns of interest the paper presents the results obtained by the rule extraction algorithm on simulated dataset and on two different datasets related to biomedical applications the first one concerns the analysis of time series coming from the monitoring of different clinical variables during hemodialysis sessions while the other one deals with the biological problem of inferring relationships between genes from dna microarray data
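As a rough illustration of the precedence-rule idea, the sketch below counts, over a set of episodes, how often one labelled pattern interval ends shortly before another begins and reports rules meeting support and confidence thresholds; the clinical labels, the max_gap window and the pair-only counting are illustrative assumptions, not the paper's knowledge-based temporal abstraction formalism or its Apriori-like search.

```python
from collections import Counter
from itertools import product

def precedence_rules(episodes, max_gap, min_support, min_confidence):
    """Count how often labelled pattern intervals (label, start, end) are
    followed by other labels within max_gap and emit precedence rules."""
    pattern_count = Counter()
    pair_count = Counter()
    for intervals in episodes:                 # one episode per patient/session
        pattern_count.update({lab for lab, _, _ in intervals})
        seen_pairs = set()
        for (la, sa, ea), (lb, sb, eb) in product(intervals, intervals):
            if la != lb and ea <= sb <= ea + max_gap:
                seen_pairs.add((la, lb))       # a ends before b starts
        pair_count.update(seen_pairs)
    rules, n = [], len(episodes)
    for (a, b), c in pair_count.items():
        support = c / n
        confidence = c / pattern_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

if __name__ == "__main__":
    data = [
        [("increase_bp", 0, 5), ("decrease_hr", 6, 9)],
        [("increase_bp", 2, 4), ("decrease_hr", 5, 8), ("stable", 9, 12)],
        [("stable", 0, 3)],
    ]
    print(precedence_rules(data, max_gap=3, min_support=0.5, min_confidence=0.8))
```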
in this paper we address the problem of segmentation in image sequences using region based active contours and level set methods we propose novel method for variational segmentation of image sequences containing nonrigid moving objects the method is based on the classical chan vese model augmented with novel frame to frame interaction term which allows us to update the segmentation result from one image frame to the next using the previous segmentation result as shape prior the interaction term is constructed to be pose invariant and to allow moderate deformations in shape it is expected to handle the appearance of occlusions which otherwise can make segmentation fail the performance of the model is illustrated with experiments on synthetic and real image sequences
basic proposition of process assessment models is that higher process maturity is associated with improved project performance and product quality this study provides empirical evidence to support this proposition by testing the hypothesis that higher process maturity is negatively associated with schedule deviation in software maintenance next the present study investigates whether two process context factors organizational size and geographical region modify the relationship between process maturity and schedule deviation by using moderator testing method our results show that organizational size does not influence the relationship while geographical region is deemed to be an independent variable
this paper investigates the role of online resources in problem solving we look specifically at how programmers an exemplar form of knowledge workers opportunistically interleave web foraging learning and writing code we describe two studies of how programmers use online resources the first conducted in the lab observed participants web use while building an online chat room we found that programmers leverage online resources with range of intentions they engage in just in time learning of new skills and approaches clarify and extend their existing knowledge and remind themselves of details deemed not worth remembering the results also suggest that queries for different purposes have different styles and durations do programmers queries in the wild have the same range of intentions or is this result an artifact of the particular lab setting we analyzed month of queries to an online programming portal examining the lexical structure refinements made and result pages visited here we also saw traits that suggest the web is being used for learning and reminding these results contribute to theory of online resource usage in programming and suggest opportunities for tools to facilitate online knowledge work
at the integration scale of system on chips socs the conflicts between communication and computation will become prominent even on chip big fraction of system time will shift from computation to communication in synchronous systems large amount of communication time is spent on multiple clock period wires in this paper we explore retiming to pipeline long interconnect wires in soc designs behaviorally it means that both computation and communication are rescheduled for parallelism the retiming is applied to a netlist of macro blocks where the internal structures may not be changed and flip flops may not be able to be inserted on some wire segments this problem is different from that on a gate level netlist and is formulated as wire retiming problem theoretical treatment and polynomial time algorithm are presented in the paper experimental results showed the benefits and effectiveness of our approach
the successful integration of data from autonomous and heterogeneous systems calls for the resolution of semantic conflicts that may be present such conflicts are often reflected by discrepancies in attribute values of the same data object in this paper we describe recently developed prototype system discovering and reconciling conflicts direct the system mines data value conversion rules in the process of integrating business data from multiple sources the system architecture and functional modules are described the process of discovering conversion rules from sales data of trading company is presented as an illustrative example
instruction set simulation iss is widely used in system evaluation and software development for embedded processors despite the significant advancements in the iss technology it still suffers from low simulation speed compared to real hardware especially for embedded software developers simulation speed close to real time is important in order to efficiently develop complex software in this paper novel retargetable hybrid simulation framework hysim is presented which allows switching between native code execution and iss based simulation to reach certain state of an application as fast as possible all platform independent parts of the application are directly executed on the host while the platform dependent code executes on the iss during the native code execution performance estimation is conducted case study shows that speed ups ranging from to can be achieved without compromising debugging accuracy the performance estimation during native code execution shows an average error of
modern languages and operating systems often encourage programmers to use threads or independent control streams to mask the overhead of some operations and simplify program structure multitasking operating systems use threads to mask communication latency either with hardware devices or users client server applications typically use threads to simplify the complex control flow that arises when multiple clients are used recently the scientific computing community has started using threads to mask network communication latency in massively parallel architectures allowing computation and communication to be overlapped lastly some architectures implement threads in hardware using those threads to tolerate memory latency in general it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible or to simplify the program design however threads incur time and space overheads and programmers often compromise simple designs for performance in this paper we show how to reduce time and space thread overhead using control flow and register liveness information inferred after compilation our techniques work on binaries are not specific to particular compiler or thread library and reduce the overall execution time of fine grain threaded programs by approximately we use execution driven analysis and an instrumented operating system to show why the execution time is reduced and to indicate areas for future work
in grid computing environment several applications such as scientific data analysis and visualization are naturally computation and communication intensive these applications can be decomposed into sequence of pipeline stages which can be placed on different grid nodes for concurrent execution due to the aggregation of the computation and communication costs involved finding the way to place such pipeline stages on grid in order to achieve the maximum application throughput becomes challenging problem this paper proposes solution that considers both the pipeline placement and the data movement between stages specifically we try to minimize the computation cost of the pipeline stages while preventing the communication overhead between successive stages from dominating the entire processing time our proposed solution consists of two novel methods the first method is single path pipeline execution which exploits only temporal parallelism and the second method is multipath pipeline execution which considers both temporal and spatial parallelism inherent in any pipeline applications we evaluate our work in simulated environment and also conduct set of experiments in real grid computing system when compared with the results from several traditional placement methods our proposed methods give the highest throughput
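A toy sketch of the placement problem, assuming a small number of stages and nodes so brute force is feasible: throughput is modelled as the reciprocal of the slowest stage (its compute time plus the transfer time from the previous stage's node), which reflects the bottleneck view described above; the paper's single-path and multipath heuristics are not reproduced here.

```python
from itertools import product

def best_single_path_placement(compute_cost, comm_cost):
    """compute_cost[s][n]: time of stage s on node n
    comm_cost[n1][n2]: per-item transfer time from node n1 to node n2
    Returns (placement, throughput) maximising pipeline throughput,
    modelled as 1 / max over stages of (compute + incoming transfer)."""
    n_stages = len(compute_cost)
    n_nodes = len(comm_cost)
    best = (None, 0.0)
    for placement in product(range(n_nodes), repeat=n_stages):
        stage_times = []
        for s, node in enumerate(placement):
            t = compute_cost[s][node]
            if s > 0:                       # data moved from the previous stage
                t += comm_cost[placement[s - 1]][node]
            stage_times.append(t)
        throughput = 1.0 / max(stage_times)
        if throughput > best[1]:
            best = (placement, throughput)
    return best

if __name__ == "__main__":
    compute = [[4, 2, 6], [3, 5, 2], [1, 4, 3]]        # 3 stages x 3 nodes
    comm = [[0, 2, 4], [2, 0, 1], [4, 1, 0]]
    print(best_single_path_placement(compute, comm))
```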
with the continuous shrinking of transistor size processor designers are facing new difficulties to achieve high clock frequency the register file read time the wake up and selection logic traversal delay and the bypass network transit delay with also their respective power consumptions constitute major difficulties for the design of wide issue superscalar processors in this paper we show that transgressing rule that has so far been applied in the design of all the superscalar processors allows us to reduce these difficulties currently used general purpose isas feature single logical register file and generally floating point register file up to now all superscalar processors have allowed any general purpose functional unit to read and write any physical general purpose register first we propose register write specialization ie forcing distinct groups of functional units to write only in distinct subsets of the physical register file thus limiting the number of write ports on each individual register register write specialization significantly reduces the access time the power consumption and the silicon area of the register file without impairing performance second we propose to combine register write specialization with register read specialization for clustered superscalar processors this limits the number of read ports on each individual register and simplifies both the wakeup logic and the bypass network with way cluster wsrs architecture the complexities of the wake up logic entry and bypass point are equivalent to the ones found with conventional way issue processor more physical registers are needed in wsrs architectures nevertheless using wsrs architecture allows dramatic reduction of the total silicon area devoted to the physical register file by factor four to six its power consumption is more than halved and its read access time is shortened by one third some extra hardware and or few extra pipeline stages are needed for register renaming wsrs architecture induces constraints on the policy for allocating instructions to clusters however performance of way cluster wsrs architecture stands the comparison with the one of conventional way cluster conventional superscalar processor
queries on major web search engines produce complex result pages primarily composed of two types of information organic results that is short descriptions and links to relevant web pages and sponsored search results the small textual advertisements often displayed above or to the right of the organic results strategies for optimizing each type of result in isolation and the consequent user reaction have been extensively studied however the interplay between these two complementary sources of information has been ignored situation we aim to change our findings indicate that their perceived relative usefulness as evidenced by user clicks depends on the nature of the query specifically we found that when both sources focus on the same intent for navigational queries there is clear competition between ads and organic results while for non navigational queries this competition turns into synergy we also investigate the relationship between the perceived usefulness of the ads and their textual similarity to the organic results and propose model that formalizes this relationship to this end we introduce the notion of responsive ads which directly address the user’s information need and incidental ads which are only tangentially related to that need our findings support the hypothesis that in the case of navigational queries which are usually fully satisfied by the top organic result incidental ads are perceived as more valuable than responsive ads which are likely to be duplicative on the other hand in the case of non navigational queries incidental ads are perceived as less beneficial possibly because they diverge too far from the actual user need we hope that our findings and further research in this area will allow search engines to tune ad selection for an increased synergy between organic and sponsored results leading to both higher user satisfaction and better monetization
abstract in this paper we address the problem of the efficient visualization of large irregular volume data sets by exploiting multiresolution model based on tetrahedral meshes multiresolution models also called level of detail lod models allow encoding the whole data set at virtually continuous range of different resolutions we have identified set of queries for extracting meshes at variable resolution from multiresolution model based on field values domain location or opacity of the transfer function such queries allow trading off between resolution and speed in visualization we define new compact data structure for encoding multiresolution tetrahedral mesh built through edge collapses to support selective refinement efficiently and show that such structure has storage cost from to times lower than standard data structures used for tetrahedral meshes the data structures and variable resolution queries have been implemented together with state of the art visualization techniques in system for the interactive visualization of three dimensional scalar fields defined on tetrahedral meshes experimental results show that selective refinement queries can support interactive visualization of large data sets
problem solving frameworks in large scale and wide area environments must handle connectivity issues nats and firewalls maintain scalability with respect to connection management accommodate dynamic processes joining leaving at runtime and provide simple means to tolerate communication node failures all of the above must be presented in simple and flexible programming model this paper designs and implements such framework by minimally extending distributed object oriented models for maximum generality and flexibility to make parallelism manageable we introduce an implicit serialization semantics on objects to relieve programmers from explicit synchronization while avoiding the recursion deadlock problems from which some models based on active objects suffer we show how this design nicely incorporates dynamically joining processes in our implementation participating nodes automatically construct tcp overlay so as to address connectivity and scalability issues we have implemented our framework gluepy as library for python for evaluation we show on over cores across clusters with complex networks involving nats and firewalls and process managements involving ssh torque and sge configurations how simple branch and bound search application can be expressed simply and executed easily
the need for content access control in hierarchies cach appears naturally in all contexts where set of users have different access rights to set of resources the hierarchy is defined using the access rights the different resources are encrypted using different keys key management is critical issue for scalable content access control in this paper we study the problem of key management for cach we present main existing access control models and show why these models are not suitable to the cach applications and why they are not implemented in the existing key management schemes furthermore we classify these key management schemes into two approaches and construct an access control model for each approach the proposed access control models are then used to describe the schemes in uniform and coherent way final contribution of our work consists of classification of the cach applications comparison of the key management schemes and study of the suitability of the existing schemes to the cach applications with respect to some analytical measurements
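For intuition about hierarchy-aware key management, here is a generic top-down key-derivation sketch (a textbook-style construction, not any particular scheme surveyed in the paper): each class key is derived from its parent's key by hashing, so higher classes can recompute the keys of their descendants while the reverse direction is infeasible.

```python
import hashlib

def derive_key(parent_key: bytes, child_label: str) -> bytes:
    # child key = H(parent_key || child_label); parents can re-derive it,
    # children cannot invert the hash to climb the hierarchy
    return hashlib.sha256(parent_key + child_label.encode()).digest()

def keys_for_hierarchy(root_key: bytes, children: dict, root: str) -> dict:
    """children maps a class name to the list of its direct subclasses."""
    keys = {root: root_key}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            keys[child] = derive_key(keys[node], child)
            stack.append(child)
    return keys

if __name__ == "__main__":
    hierarchy = {"premium": ["standard"], "standard": ["free"]}
    keys = keys_for_hierarchy(b"\x00" * 32, hierarchy, "premium")
    # a holder of the 'premium' key can locally re-derive the 'free' content key
    assert keys["free"] == derive_key(derive_key(b"\x00" * 32, "standard"), "free")
```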
peace is an extremely important value for humankind yet it has been largely ignored by the computing and human computer interaction community this paper seeks to begin discussion within the human computer interaction community on how we can design technologies that have peace as an explicit goal to begin this discussion we review empirical studies on the factors that contribute to conflict and those that make conflict less likely based on this we identify areas where human computer interaction research has already contributed to prevent conflict and promote peace and open areas where our community can make positive difference
we show how the state space exploration tool verisoft can be used to analyze parallel programs compositionally verisoft is used to check assume guarantee specifications of parallel processes automatically the analysis is meant to complement standard assume guarantee reasoning which is usually carried out solely with pencil and paper while successful analysis does not always imply the general correctness of the specification it increases the confidence in the verification effort an unsuccessful analysis always produces counterexample which can be used to correct the specification or the program verisoft’s optimization and visualization techniques make the analysis relatively efficient and effective
plan diagram is pictorial enumeration of the execution plan choices of database query optimizer over the relational selectivity space we have shown recently that for industrial strength database engines these diagrams are often remarkably complex and dense with large number of plans covering the space however they can often be reduced to much simpler pictures featuring significantly fewer plans without materially affecting the query processing quality plan reduction has useful implications for the design and usage of query optimizers including quantifying redundancy in the plan search space enhancing usability of parametric query optimization identifying error resistant and least expected cost plans and minimizing the overheads of multi plan approaches we investigate here the plan reduction issue from theoretical statistical and empirical perspectives our analysis shows that optimal plan reduction with respect to minimizing the number of plans is an np hard problem in general and remains so even for storage constrained variant we then present greedy reduction algorithm with tight and optimal performance guarantees whose complexity scales linearly with the number of plans in the diagram for given resolution next we devise fast estimators for locating the best tradeoff between the reduction in plan cardinality and the impact on query processing quality finally extensive experimentation with suite of multi dimensional tpch based query templates on industrial strength optimizers demonstrates that complex plan diagrams easily reduce to anorexic levels ie small absolute number of plans incurring only marginal increases in the estimated query processing costs
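The following is a simplified, hypothetical sketch of greedy plan-diagram reduction: given a selectivity grid with the optimizer's chosen plan per cell and estimated costs for every plan at every cell, it repeatedly retires a plan whose cells can all be taken over by another plan within a cost-increase threshold; the paper's estimators, guarantees and cost model are not reproduced.

```python
def reduce_plan_diagram(assignment, cost, threshold=1.1):
    """assignment[cell] -> plan id chosen by the optimizer
    cost[(plan, cell)]  -> estimated cost of that plan at that cell
    Greedily drops plans whose cells can all be taken over by some
    remaining plan within `threshold` times the original cost."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        plans = set(assignment.values())
        # try to retire the plan covering the fewest cells first
        for victim in sorted(plans, key=lambda p: sum(1 for v in assignment.values() if v == p)):
            cells = [c for c, p in assignment.items() if p == victim]
            for swallower in plans - {victim}:
                if all(cost[(swallower, c)] <= threshold * cost[(victim, c)] for c in cells):
                    for c in cells:
                        assignment[c] = swallower
                    changed = True
                    break
            if changed:
                break
    return assignment

if __name__ == "__main__":
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    assignment = {(0, 0): "P1", (0, 1): "P1", (1, 0): "P2", (1, 1): "P3"}
    cost = {("P1", c): 10.0 for c in cells}
    cost.update({("P2", c): 10.5 for c in cells})
    cost.update({("P3", c): 12.0 for c in cells})
    print(reduce_plan_diagram(assignment, cost, threshold=1.1))   # all cells -> P1
```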
in this paper we introduce compact random access vector representation for solid textures made of intermixed regions with relatively smooth internal color variations it is feature preserving and resolution independent in this representation texture volume is divided into multiple regions region boundaries are implicitly defined using signed distance function color variations within the regions are represented using compactly supported radial basis functions rbfs with spatial indexing structure such rbfs enable efficient color evaluation during real time solid texture mapping effective techniques have been developed for generating such vector representation from bitmap solid textures data structures and techniques have also been developed to compactly store region labels and distance values for efficient random access during boundary and color evaluation
as cpu cores become building blocks we see great expansion in the types of on chip memory systems proposed for cmps unfortunately designing the cache and protocol controllers to support these memory systems is complex and their concurrency and latency characteristics significantly affect the performance of any cmp to address this problem this paper presents microarchitecture framework for cache and protocol controllers which can aid in generating the rtl for new memory systems the framework consists of three pipelined engines request tracking state manipulation and data movement which are programmed to implement higher level memory model this approach simplifies the design and verification of cmp systems by decomposing the memory model into sequences of state and data manipulations moreover implementing the framework itself produces polymorphic memory system to validate the approach we implemented scalable flexible cmp in silicon the memory system was then programmed to support three disparate memory models cache coherent shared memory streams and transactional memory measured overheads of this approach seem promising our system generates controllers with performance overheads of less than compared to an ideal controller with zero internal latency even the overhead of directly implementing fully programmable controller was modest while it did double the controller’s area the amortized effective area in the system grew by roughly
due to recent large scale deployments of delay and loss sensitive applications there are increasingly stringent demands on the monitoring of service level agreement metrics although many end to end monitoring methods have been proposed they are mainly based on active probing and thus inject measurement traffic into the network in this paper we propose new scheme for monitoring service level agreement metrics in particular delay distribution our scheme is passive and therefore will not cause perturbation to real traffic using realistic delay and traffic demands we show that our scheme achieves high accuracy and can detect burst events that will be missed by probing based methods
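A minimal sketch of the passive-monitoring viewpoint, under the assumption that per-packet one-way delays can be observed from existing traffic: keep a sliding window of delays, track a tail quantile, and flag a burst when it exceeds the SLA bound; this is illustrative only and not the estimation scheme proposed in the paper.

```python
from collections import deque
from statistics import quantiles

class DelayMonitor:
    """Illustrative passive SLA monitor: keep a sliding window of observed
    one-way delays and flag a burst when the 95th percentile exceeds the
    agreed bound."""

    def __init__(self, sla_bound_ms, window=1000):
        self.sla_bound_ms = sla_bound_ms
        self.window = deque(maxlen=window)

    def observe(self, delay_ms):
        self.window.append(delay_ms)

    def p95(self):
        return quantiles(self.window, n=20)[-1]    # 95th percentile cut point

    def burst_detected(self):
        return len(self.window) >= 20 and self.p95() > self.sla_bound_ms

if __name__ == "__main__":
    mon = DelayMonitor(sla_bound_ms=50.0)
    for d in [12, 15, 11, 14, 13] * 10:            # normal traffic
        mon.observe(d)
    print(mon.burst_detected())                    # False
    for d in [80, 95, 120] * 10:                   # congestion burst
        mon.observe(d)
    print(mon.burst_detected())                    # True
```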
this paper describes the worldtravel service oriented application and testbed the purpose of the testbed is to provide to researchers an open source venue for experimenting with and evaluating ideas methods and implementation options for service oriented architectures and applications built upon standard service technologies the worldtravel testbed offers implementations of services and service interactions specific to the worldtravel application comprised of substantive back end that includes simple airline pricing ticketing engine with representative flight database both structured similarly to those used by companies actually offering such services front end for travel services interacting with mid tier request processing and routing services and load traces from the corresponding business applications that are used to drive the use of worldtravel and its services we call worldtravel testbed rather than benchmark because its design permits extension at both the front end eg to add interesting new services like weather information about possible travel destinations and at the back end eg to add payment services this paper identifies the need for testbeds like worldtravel considers the attributes required of such testbeds describes our current testbed in detail and presents an initial testbed evaluation it also describes the actual production quality system on which worldtravel is based
in this paper we present our experience with applying multidimensional separation of concerns to software engineering environment by comparing two different designs of our system we show the importance of separating integration issues from the implementation of the individual concerns we present model in which integration issues are encapsulated into first class connector objects and indicate how this facilitates the understandability maintenance and evolution of the system we identify issues of binding time binding granularity and binding cardinality as important criteria in selecting an appropriate model for separation of concerns we finally show how good choice following these criteria and considering the requirements of software engineering environments leads to system with dynamic configurability high level component integration and support for multiple instantiable views
two years ago we analyzed the architecture of sagitta sd large business information system being developed on behalf of dutch customs we were in particular interested in assessing the capabilities of the system to accommodate future complex changes we asked stakeholders to bring forward possible changes to the system and next investigated how these changes would affect the software architecture since then the system has been implemented and used and actual modifications have been proposed and realized we studied all change requests submitted since our initial analysis the present paper addresses how well we have been able to predict complex changes during our initial analysis and how and to what extent the process to elicit and assess the impact of such changes might be improved this study suggests that architecture analysis can be improved if we explicitly challenge the initial requirements the study also hints at some fundamental limitations of this type of analysis fundamental modifiability related decisions need not be visible in the documentation available the actual evolution of system remains to large extent unpredictable and some changes concern complex components and this complexity might not be known at the architecture level and or be unavoidable
we present novel approach that integrates occlusion culling within the view dependent rendering framework view dependent rendering provides the ability to change level of detail over the surface seamlessly and smoothly in real time the exclusive use of view parameters to perform level of detail selection causes even occluded regions to be rendered in high level of detail to overcome this serious drawback we have integrated occlusion culling into the level selection mechanism because computing exact visibility is expensive and it is currently not possible to perform this computation in real time we use visibility estimation technique instead our approach reduces dramatically the resolution at occluded regions
software developers use testing to gain and maintain confidence in the correctness of software system automated reduction and prioritization techniques attempt to decrease the time required to detect faults during test suite execution this paper uses the harrold gupta soffa delayed greedy traditional greedy and optimal greedy algorithms for both test suite reduction and prioritization even though reducing and reordering test suite is primarily done to ensure that testing is cost effective these algorithms are normally configured to make greedy choices with coverage information alone this paper extends these algorithms to greedily reduce and prioritize the tests by using both test cost eg execution time and the ratio of code coverage to test cost an empirical study with eight real world case study applications shows that the ratio greedy choice metric aids test suite reduction method in identifying smaller and faster test suite the results also suggest that incorporating test cost during prioritization allows for an average increase of and maximum improvement of for time sensitive evaluation metric called coverage effectiveness
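The cost-aware greedy choice can be illustrated directly; the sketch below repeatedly selects the test with the best ratio of newly covered requirements to execution cost (a simplified stand-in for the HGS, delayed-greedy and optimal-greedy variants evaluated in the paper).

```python
def cost_aware_greedy_reduction(tests):
    """tests: {test_name: (cost_seconds, set_of_covered_requirements)}
    Returns a reduced suite chosen by the coverage-to-cost ratio."""
    remaining = set().union(*(cov for _, cov in tests.values()))
    selected, candidates = [], dict(tests)
    while remaining and candidates:
        def ratio(name):
            cost, cov = candidates[name]
            return len(cov & remaining) / cost
        best = max(candidates, key=ratio)
        cost, cov = candidates.pop(best)
        if not cov & remaining:
            break                       # no candidate adds new coverage
        selected.append(best)
        remaining -= cov
    return selected

if __name__ == "__main__":
    suite = {
        "t_fast": (1.0, {"r1", "r2"}),
        "t_slow": (10.0, {"r1", "r2", "r3"}),
        "t_r3":   (2.0, {"r3"}),
    }
    print(cost_aware_greedy_reduction(suite))   # -> ['t_fast', 't_r3']
```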
this paper presents shimmer wireless platform for sensing and actuation that combines localized processing with energy harvesting to provide long lived structural health monitoring the life cycle of the node is significantly extended by the use of super capacitors for energy storage instead of batteries during this period the node is expected to work completely maintenance free the node is capable of harvesting up to per day this makes it completely self sufficient while employed in real structural health monitoring applications unlike other sensor networks that periodically monitor structure and route information to base station our device acquires the data and processes it locally after being radio triggered by an external agent the localized processing allows us to avoid issues due to network congestion our experiments show that its bits computational core can run at mips for minutes daily
the problem of discovering frequent arrangements of temporal intervals is studied it is assumed that the database consists of sequences of events where an event occurs during time interval the goal is to mine temporal arrangements of event intervals that appear frequently in the database the motivation of this work is the observation that in practice most events are not instantaneous but occur over period of time and different events may occur concurrently thus there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from american sign language as well as network and biological data three efficient methods to find frequent arrangements of temporal intervals are described the first two are tree based and use breadth and depth first search to mine the set of frequent arrangements whereas the third one is prefix based the above methods apply efficient pruning techniques that include set of constraints that add user controlled focus into the mining process moreover based on the extracted patterns standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules the performance of the proposed algorithms is evaluated and compared with other approaches on real american sign language annotations and network data and large synthetic datasets
multimedia constitutes an interesting field of application for semantic web and semantic web reasoning as the access and management of multimedia content and context depends strongly on the semantic descriptions of both at the same time multimedia resources constitute complex objects the descriptions of which are involved and require the foundation on sound modeling practice in order to represent findings of low and high level multimedia analysis and to make them accessible via semantic web querying of resources this tutorial aims to provide red thread through these different issues and to give an outline of where semantic web modeling and reasoning needs to further contribute to the area of semantic multimedia for the fruitful interaction between these two fields of computer science
software architecture documentation helps people in understanding the software architecture of system in practice software architectures are often documented after the fact ie they are maintained or created after most of the design decisions have been made and implemented to keep the architecture documentation up to date an architect needs to recover and describe these decisions this paper presents addra an approach an architect can use for recovering architectural design decisions after the fact addra uses architectural deltas to provide the architect with clues about these design decisions this allows the architect to systematically recover and document relevant architectural design decisions the recovered architectural design decisions improve the documentation of the architecture which increases traceability communication and general understanding of system
this paper presents market based macroprogramming mbm new paradigm for achieving globally efficient behavior in sensor networks rather than programming the individual low level behaviors of sensor nodes mbm defines virtual market where nodes sell actions such as taking sensor reading or aggregating data in response to global price information nodes take actions to maximize their own utility subject to energy budget constraints the behavior of the network is determined by adjusting the price vectors for each action rather than by directly specifying local node actions resulting in globally efficient allocation of network resources we present the market based macro programming paradigm as well as several experiments demonstrating its value for sensor network vehicle tracking application
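A tiny sketch of the node-side decision in a price-driven network, with hypothetical action names and a simple greedy on price per unit of energy standing in for the utility maximisation described above.

```python
def choose_actions(prices, energy_cost, energy_budget):
    """Greedy node-local policy: take the best-paying actions per unit of
    energy until the budget runs out. prices and energy_cost are dicts
    keyed by action name (e.g. 'sample', 'aggregate', 'transmit')."""
    order = sorted(prices, key=lambda a: prices[a] / energy_cost[a], reverse=True)
    chosen, spent, utility = [], 0.0, 0.0
    for action in order:
        if spent + energy_cost[action] <= energy_budget:
            chosen.append(action)
            spent += energy_cost[action]
            utility += prices[action]
    return chosen, utility

if __name__ == "__main__":
    prices = {"sample": 1.0, "aggregate": 2.5, "transmit": 4.0}   # set globally
    energy = {"sample": 0.5, "aggregate": 1.0, "transmit": 3.0}
    print(choose_actions(prices, energy, energy_budget=2.0))
    # raising the global price of 'transmit' shifts nodes toward transmitting
```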
several advanced role based access control rbac models have been developed supporting specific features ie role hierarchy separation of duty to achieve high flexibility however integrating additional features also increases their design complexity and consequently the opportunity for mistakes that may cause information to flow to inappropriate destinations in this paper we present formal technique to model and analyze rbac using colored petri nets cp nets and cpntools for editing and analyzing cp nets our purpose is to elaborate cp net model which describes generic access control structures based on an rbac policy the resulting cp net model can be then composed with different context specific aspects depending on the application significant benefit of cp nets and particularly cpntools is to provide graphical representation and an analysis framework that can be used by security administrators to understand why some permissions are granted or not and to detect whether security constraints are violated
open distributed multi agent systems are gaining interest in the academic community and in industry in such open settings agents are often coordinated using standardized agent conversation protocols the representation of such protocols for analysis validation monitoring etc is an important aspect of multi agent applications recently petri nets have been shown to be an interesting approach to such representation and radically different approaches using petri nets have been proposed however their relative strengths and weaknesses have not been examined moreover their scalability and suitability for different tasks have not been addressed this paper addresses both these challenges first we analyze existing petri net representations in terms of their scalability and appropriateness for overhearing an important task in monitoring open multi agent systems then building on the insights gained we introduce novel representation using colored petri nets that explicitly represent legal joint conversation states and messages this representation approach offers significant improvements in scalability and is particularly suitable for overhearing furthermore we show that this new representation offers comprehensive coverage of all conversation features of fipa conversation standards we also present procedure for transforming auml conversation protocol diagrams standard human readable representation to our colored petri net representation
hop by hop data aggregation is very important technique for reducing the communication overhead and energy expenditure of sensor nodes during the process of data collection in sensor network however because individual sensor readings are lost in the per hop aggregation process compromised nodes in the network may forge false values as the aggregation results of other nodes tricking the base station into accepting spurious aggregation results here fundamental challenge is how can the base station obtain good approximation of the fusion result when fraction of sensor nodes are compromised to answer this challenge we propose sdap secure hop by hop data aggregation protocol for sensor networks sdap is general purpose secure data aggregation protocol applicable to multiple aggregation functions the design of sdap is based on the principles of divide and conquer and commit and attest first sdap uses novel probabilistic grouping technique to dynamically partition the nodes in tree topology into multiple logical groups subtrees of similar sizes commitment based hop by hop aggregation is performed in each group to generate group aggregate the base station then identifies the suspicious groups based on the set of group aggregates finally each group under suspect participates in an attestation process to prove the correctness of its group aggregate the aggregate by the base station is calculated over all the group aggregates that are either normal or have passed the attestation procedure extensive analysis and simulations show that sdap can achieve the level of efficiency close to an ordinary hop by hop aggregation protocol while providing high assurance on the trustworthiness of the aggregation result last prototype implementation on top of tinyos shows that our scheme is practical on current generation sensor nodes such as mica motes
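To give a flavour of the commit step only, here is a simplified hop-by-hop sum aggregation in which every node forwards a partial sum plus a hash commitment over its own reading and its children's commitments; SDAP's probabilistic grouping and attestation phases are not modelled, so this is an illustrative assumption-laden sketch rather than the protocol itself.

```python
import hashlib

def commit(value, child_commitments):
    h = hashlib.sha256(str(value).encode())
    for c in sorted(child_commitments):     # deterministic order
        h.update(c)
    return h.digest()

def aggregate(node, readings, children):
    """Hop-by-hop sum with hash commitments over a tree rooted at `node`.
    readings: {node: sensed value}; children: {node: [child nodes]}."""
    child_results = [aggregate(c, readings, children) for c in children.get(node, [])]
    total = readings[node] + sum(s for s, _ in child_results)
    proof = commit(readings[node], [p for _, p in child_results])
    return total, proof

if __name__ == "__main__":
    children = {"base": ["a", "b"], "a": ["a1", "a2"]}
    readings = {"base": 0, "a": 3, "b": 5, "a1": 2, "a2": 4}
    total, proof = aggregate("base", readings, children)
    print(total, proof.hex()[:16])   # 14 plus a commitment a group could later attest to
```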
with the emergence of an effective infrastructure supporting grid computing and web services service oriented computing has been growing over the last few years and service oriented architectures are becoming an important computing paradigm when different trust domains control different component services trust management plays critical role to smooth the collaboration among component services the federation of these component services makes new demands for managing trust related behavior although many extant trust management systems deal with intradomain trust behaviors there is growing need for effective strategies for managing inter domain behaviors in this paper we explore requirements for federated trust management system the purpose of this paper is not to suggest single type of system covering all necessary features instead its purpose is to initiate discussion of the requirements arising from inter domain federation to offer context in which to evaluate current and future solutions and to encourage the development of proper models and systems for federated trust management our discussion addresses issues arising from trust representation trust exchange trust establishment trust enforcement and trust storage
anomalies in wireless sensor networks can occur due to malicious attacks faulty sensors changes in the observed external phenomena or errors in communication defining and detecting these interesting events in energy constrained situations is an important task in managing these types of networks key challenge is how to detect anomalies with few false alarms while preserving the limited energy in the network in this article we define different types of anomalies that occur in wireless sensor networks and provide formal models for them we illustrate the model using statistical parameters on dataset gathered from real wireless sensor network deployment at the intel berkeley research laboratory our experiments with novel distributed anomaly detection algorithm show that it can detect elliptical anomalies with exactly the same accuracy as that of centralized scheme while achieving significant reduction in energy consumption in the network finally we demonstrate that our model compares favorably to four other well known schemes on four datasets
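A centralized version of elliptical anomaly detection is easy to state: fit a mean and covariance to the readings and flag points whose Mahalanobis distance exceeds a threshold; the article's contribution is doing this distributively by exchanging summary statistics, which the sketch below (using numpy) does not attempt.

```python
import numpy as np

def elliptical_anomalies(data, threshold=3.0):
    """data: (n_samples, n_features) array of sensor readings.
    Flags points whose Mahalanobis distance from the fitted ellipse
    (mean + covariance) exceeds `threshold`."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    inv_cov = np.linalg.inv(cov)
    centered = data - mean
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.multivariate_normal([20.0, 45.0], [[1.0, 0.8], [0.8, 1.5]], size=500)
    faulty = np.array([[20.0, 80.0], [35.0, 45.0]])       # stuck/erroneous sensors
    readings = np.vstack([normal, faulty])
    flags = elliptical_anomalies(readings)
    print(flags[-2:])                                      # the injected anomalies
```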
parallel scheduling research based on multi core systems has become more and more popular due to their high computing capacity scheduling fairness and load balance are the key performance indicators for current scheduling algorithms the action of scheduler can be modeled as accepting the task state graph performing task scheduling analysis and putting the produced tasks into the scheduling queue current algorithms rely on action prediction according to the history record of task scheduling one disadvantage is that they become less efficient when task costs differ greatly our contribution is to rearrange one long task into small subtasks then form another task state graph and schedule them into the task queue in parallel the final experiments show that performance improvement is achieved in comparison with the traditional method
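A small sketch of the rearrangement idea under simplifying assumptions (subtask dependencies ignored, equal-sized splits): long tasks are broken into chunks and the pieces are list-scheduled onto the least-loaded core, which keeps the load balanced even when the original task costs differ greatly.

```python
import heapq

def split_long_tasks(task_costs, chunk):
    """Break any task longer than `chunk` into equal-sized subtasks."""
    pieces = []
    for name, cost in task_costs.items():
        parts = max(1, -(-cost // chunk))              # ceiling division
        pieces += [(f"{name}#{i}", cost / parts) for i in range(parts)]
    return pieces

def list_schedule(pieces, n_cores):
    """Assign each piece to the currently least-loaded core."""
    heap = [(0.0, core) for core in range(n_cores)]
    assignment = {core: [] for core in range(n_cores)}
    for name, cost in sorted(pieces, key=lambda p: -p[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

if __name__ == "__main__":
    tasks = {"t_long": 12, "t_a": 3, "t_b": 2, "t_c": 2}
    _, makespan_plain = list_schedule(list(tasks.items()), n_cores=4)
    _, makespan_split = list_schedule(split_long_tasks(tasks, chunk=3), n_cores=4)
    print(makespan_plain, makespan_split)               # splitting balances the load
```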
in computer cinematography the process of lighting design involves placing and configuring lights to define the visual appearance of environments and to enhance story elements this process is labor intensive and time consuming primarily because lighting artists receive poor feedback from existing tools interactive previews have very poor quality while final quality images often take hours to render this paper presents an interactive cinematic lighting system used in the production of computer animated feature films containing environments of very high complexity in which surface and light appearances are described using procedural renderman shaders our system provides lighting artists with high quality previews at interactive framerates with only small approximations compared to the final rendered images this is accomplished by combining numerical estimation of surface response image space caching deferred shading and the computational power of modern graphics hardware our system has been successfully used in the production of two feature length animated films dramatically accelerating lighting tasks in our experience interactivity fundamentally changes an artist’s workflow improving both productivity and artistic expressiveness
today’s high end mobile phones commonly include one or two digital cameras these devices also known as cameraphones allow their owners to take photographs anywhere at any time with practically no cost as result many urban dwellers are photographed everyday without even being aware of it although in many countries legislation recognises the right of people to veto the dissemination of their image snapshots including recognisable passers by often end up on photo sharing websites in this paper we present cooperative system for cameraphones which automatically anonymises faces of people photographed involuntarily our system called blurme uses bluetooth awareness to inform photographer’s cameraphone when people around this photographer do not wish for their picture being taken it then identifies subjects on the photograph and anonymises other people’s faces blurme was tested on face regions detected on real life photographs collected from photo sharing website and manually labelled for subject faces the system achieved very promising results on photographs with up to three subjects
the problem of version detection is critical in many important application scenarios including software clone identification web page ranking plagiarism detection and peer to peer searching natural and commonly used approach to version detection relies on analyzing the similarity between files most of the techniques proposed so far rely on the use of hard thresholds for similarity measures however defining threshold value is problematic for several reasons in particular the threshold value is not the same when considering different similarity functions and ii it is not semantically meaningful for the user to overcome this problem our work proposes version detection mechanism for xml documents based on naïve bayesian classifiers thus our approach turns the detection problem into classification problem in this paper we present the results of various experiments on synthetic data that show that our approach produces very good results both in terms of recall and precision measures
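The classification view can be sketched with a tiny Gaussian naive Bayes over similarity features; the feature names and training pairs below are hypothetical, and the paper's specific feature set and evaluation are not reproduced.

```python
import numpy as np

class GaussianNaiveBayes:
    """Tiny Gaussian naive Bayes used to classify document pairs as
    'version' / 'not a version' from similarity features."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log P(c) + sum over features of log N(x | mean, var), argmax over c
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :]
                          + (X[:, None, :] - self.means[None, :, :]) ** 2
                          / self.vars[None, :, :]).sum(axis=2)
        return self.classes[np.argmax(np.log(self.priors)[None, :] + log_lik, axis=1)]

if __name__ == "__main__":
    # features: [structural similarity, textual similarity] for labelled pairs
    X = np.array([[0.9, 0.8], [0.85, 0.9], [0.95, 0.7],     # version pairs
                  [0.2, 0.3], [0.1, 0.25], [0.3, 0.1]])     # unrelated pairs
    y = np.array([1, 1, 1, 0, 0, 0])
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict(np.array([[0.88, 0.75], [0.15, 0.2]])))   # -> [1 0]
```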
this paper considers semantic approach for merging logic programs under answer set semantics given logic programs the goal is to provide characterisations of the merging of these programs our formal techniques are based on notions of relative distance between the underlying se models of the logic programs two approaches are examined the first informally selects those models of the programs that vary the least from the models of the other programs the second approach informally selects those models of program that are closest to the models of programs can be thought of as analogous to set of database integrity constraints we examine formal properties of these operators and give encodings for computing the mergings of multiset of logic programs within the same logic programming framework as by product we provide complexity analysis revealing that our operators do not increase the complexity of the base formalism
nonphotorealistic algorithm for retargeting images adapts large images so that important objects in the image are still recognizable when displayed at lower target resolution unlike existing image manipulation techniques such as cropping and scaling the retargeting algorithm can handle multiple important objects in an image the authors approach constructs topologically constrained epitome of an image based on visual attention model that is both comprehensible and size varying making the method suitable for display critical applications they extend this algorithm to other types of imagery such as medical data
in database cluster preventive replication can provide strong consistency without the limitations of synchronous replication in this paper we present full solution for preventive replication that supports multi master and partial configurations where databases are partially replicated at different nodes to increase transaction throughput we propose an optimization that eliminates delay at the expense of few transaction aborts and we introduce concurrent replica refreshment we describe large scale experimentation of our algorithm based on our repdb prototype http wwwsciencesuniv nantesfr lina atlas repdb over cluster of nodes running the postgresql dbms our experimental results using the tpc benchmark show that the proposed approach yields excellent scale up and speed up
recent years have witnessed tremendous growth of research in the field of wireless systems and networking protocols consequently simulation has appeared as the most convenient approach for the performance evaluation of such systems and several wireless network simulators have been proposed in recent years however the complexity of the wireless physical layer phy induces clear tradeoff between the accuracy and the scalability of simulators thereby the accuracy of the simulation results varies drastically from one simulator to another in this paper we focus on this tradeoff and we investigate the impact of the physical layer modeling accuracy on both the computational cost and the confidence in simulations we first provide detailed discussion on physical layer issues including the radio range link and interference modeling and we investigate how they have been handled in existing popular simulators we then introduce flexible and modular new wireless network simulator called wsnet using this simulator we analyze the influence of the phy modeling on the performance and the accuracy of simulations the results show that the phy modeling and in particular interference modeling can have significant impact on the behavior of the evaluated protocols at the expense of an increased computational overhead moreover we show that the use of realistic propagation models can improve the simulation accuracy without inducing severe degradation of scalability
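The accuracy/scalability trade-off at the physical layer can be illustrated by contrasting a range-only check with an SINR-based reception test using a generic log-distance path-loss model; the constants below are illustrative assumptions and this is not wsnet's actual implementation.

```python
import math

def received_power_dbm(tx_dbm, distance_m, path_loss_exp=3.0, ref_loss_db=40.0):
    # log-distance path loss relative to a 1 m reference
    return tx_dbm - ref_loss_db - 10 * path_loss_exp * math.log10(max(distance_m, 1.0))

def dbm_to_mw(dbm):
    return 10 ** (dbm / 10.0)

def frame_received(tx_dbm, d_signal, interferer_distances,
                   noise_dbm=-100.0, sinr_threshold_db=10.0):
    """SINR model: signal power over (noise + sum of interferer powers)."""
    signal = dbm_to_mw(received_power_dbm(tx_dbm, d_signal))
    interference = sum(dbm_to_mw(received_power_dbm(tx_dbm, d)) for d in interferer_distances)
    sinr_db = 10 * math.log10(signal / (dbm_to_mw(noise_dbm) + interference))
    return sinr_db >= sinr_threshold_db

if __name__ == "__main__":
    # a range-only model accepts both cases; the SINR test rejects the second one
    print(frame_received(0.0, d_signal=20, interferer_distances=[]))        # True
    print(frame_received(0.0, d_signal=20, interferer_distances=[30, 40]))  # False
```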
this paper presents novel interaction system aimed at hands on manipulation of digital models through natural hand gestures our system is composed of new physical interaction device coupled with simulated compliant virtual hand model the physical interface consists of spacenavigator augmented with pressure sensors to detect directional forces applied by the user’s fingertips this information controls the position orientation and posture of the virtual hand in the same way that the spacenavigator uses measured forces to animate virtual frame in this manner user control does not involve fatigue due to reaching gestures or holding desired hand shape during contact the user has realistic visual feedback in the form of plausible interactions between the virtual hand and its environment our device is well suited to any situation where hand gesture contact or manipulation tasks need to be performed in virtual we demonstrate the device in several simple virtual worlds and evaluate it through series of user studies
the neuroweb project supports cerebrovascular researchers association studies intended as the search for statistical correlations between feature eg genotype and phenotype in this project the phenotype refers to the patients pathological state and thus it is formulated on the basis of the clinical data collected during the diagnostic activity in order to enhance the statistical robustness of the association inquiries the project involves four european union clinical institutions each institution provides its proprietary repository storing patients data although all sites comply with common diagnostic guidelines they also adopt specific protocols resulting in partially discrepant repository contents therefore in order to effectively exploit neuroweb data for association studies it is necessary to provide framework for the phenotype formulation grounded on the clinical repository content which explicitly addresses the inherent integration problem to that end we developed an ontological model for cerebrovascular phenotypes the neuroweb reference ontology composed of three layers the top layer top phenotypes is an expert based cerebrovascular disease taxonomy the middle layer deconstructs the top phenotypes into more elementary phenotypes low phenotypes and general use medical concepts such as anatomical parts and topological concepts the bottom layer core data set or cds comprises the clinical indicators required for cerebrovascular disorder diagnosis low phenotypes are connected to the bottom layer cds by specifying what combination of cds values is required for their existence finally cds elements are mapped to the local repositories of clinical data the neuroweb system exploits the reference ontology to query the different repositories and to retrieve patients characterized by common phenotype
this paper examines the representational requirements for interactive collaborative systems intended to support sensemaking and argumentation over contested issues we argue that perspective supported by semiotic and cognitively oriented discourse analyses offers both theoretical insights and motivates representational requirements for the semantics of tools for contesting meaning we introduce our semiotic approach highlighting its implications for discourse representation before describing research system claimaker designed to support the construction of scholarly argumentation by allowing analysts to publish and contest claims about scientific contributions we show how claimaker’s representational scheme is grounded in specific assumptions concerning the nature of explicit modelling and the evolution of meaning within discourse community these characteristics allow the system to represent scholarly discourse as dynamic process in the form of continuously evolving structures cognitively oriented discourse analysis then shows how the use of small set of cognitive relational primitives in the underlying ontology opens possibilities for offering users advanced forms of computational service for analysing collectively constructed argumentation networks
challenging problem facing the semantic search of multimedia data objects is the ability to index them here we present an architectural paradigm for collaborative semantic indexing which makes use of dynamic evolutionary approach by capturing analyzing and interpreting user response and query behavior the patterns of searching and finding multimedia data objects may be established within the present architectural paradigm the semantic index may be dynamically constructed validated and built up where the performance of the system will increase as time progresses our system also incorporates high degree of robustness and fault tolerance whereby inappropriate index terms will be gradually eliminated from the index while appropriate ones will be reinforced we also incorporate genetic variations into the design to allow objects which may otherwise be hidden to be discovered experimental results indicate that the present approach is able to confer significant performance benefits in the semantic searching and discovery of wide variety of multimedia data objects
processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research high level language and compiler support for developing applications that analyze and process such datasets has however been lacking so far in this paper we present set of language extensions and prototype compiler for supporting high level object oriented programming of data intensive reduction operations over multidimensional data we have chosen dialect of java with data parallel extensions for specifying collection of objects parallel for loop and reduction variables as our source high level language our compiler analyzes parallel loops and optimizes the processing of datasets through the use of an existing run time system called active data repository adr we show how loop fission followed by interprocedural static program slicing can be used by the compiler to extract required information for the run time system we present the design of compiler time interface which allows the compiler to effectively utilize the existing run time system a prototype compiler incorporating these techniques has been developed using the titanium front end from berkeley we have evaluated this compiler by comparing the performance of compiler generated code with hand customized adr code for three templates from the areas of digital microscopy and scientific simulations our experimental results show that the performance of compiler generated versions is on the average lower and in all cases within factor of two of the performance of hand coded versions
we present logic based formalism for modeling of dialogues between intelligent and autonomous software agents building on theory of abstract dialogue games which we present the formalism enables representation of complex dialogues as sequences of moves in combination of dialogue games and allows dialogues to be embedded inside one another the formalism is computational and its modular nature enables different types of dialogues to be represented
uncertain data streams where data is incomplete imprecise and even misleading have been observed in many environments feeding such data streams to existing stream systems produces results of unknown quality which is of paramount concern to monitoring applications in this paper we present the pods system that supports stream processing for uncertain data naturally captured using continuous random variables pods employs unique data model that is flexible and allows efficient computation built on this model we develop evaluation techniques for complex relational operators ie aggregates and joins by exploring advanced statistical theory and approximation evaluation results show that our techniques can achieve high performance while satisfying accuracy requirements and significantly outperform state of the art sampling method case study further shows that our techniques can enable tornado detection system for the first time to produce detection results at stream speed and with much improved quality
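As a small illustration of distribution-aware operators, the sketch below treats each uncertain reading as an independent Gaussian and propagates the exact distribution of a SUM aggregate (a standard property of Gaussians), then evaluates a threshold probability; pods' actual data model, join operators and approximation techniques are not represented.

```python
import math
from dataclasses import dataclass

@dataclass
class GaussianValue:
    mean: float
    var: float

def sum_aggregate(values):
    """SUM of independent Gaussian attributes is Gaussian with the summed
    means and variances, so the operator can propagate distributions
    instead of point values."""
    return GaussianValue(sum(v.mean for v in values), sum(v.var for v in values))

def prob_greater_than(value: GaussianValue, threshold: float) -> float:
    # P(X > t) for a Gaussian via the complementary error function
    z = (threshold - value.mean) / math.sqrt(value.var)
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    window = [GaussianValue(10.0, 4.0), GaussianValue(12.0, 1.0), GaussianValue(9.0, 9.0)]
    total = sum_aggregate(window)
    print(total)                             # GaussianValue(mean=31.0, var=14.0)
    print(prob_greater_than(total, 35.0))    # probability the aggregate exceeds 35
```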
software architects often use model based techniques to analyse performance eg response times reliability and other extra functional properties of software systems these techniques operate on models of software architecture and execution environment and are applied at design time for early evaluation of design alternatives especially to avoid implementing systems with insufficient quality virtualisation such as operating system hypervisors or virtual machines and multiple layers in execution environments eg raid disk array controllers on top of hard disks are becoming increasingly popular in reality and need to be reflected in the models of execution environments however current component meta models do not support virtualisation and cannot model individual layers of execution environments this means that the entire monolithic model must be recreated when different implementations of layer must be compared to make design decision eg when comparing different java virtual machines in this paper we present an extension of an established model based performance prediction approach and associated tools which allow modeling and predicting state of the art layered execution environments such as disk arrays virtual machines and application servers the evaluation of the presented approach shows its applicability and the resulting accuracy of the performance prediction while respecting the structure of the modelled resource environment
lazy replication protocols provide good scalability properties by decoupling transaction execution from the propagation of new values to replica sites while guaranteeing correct and more efficient transaction processing and replica maintenance however they impose several restrictions that are often not valid in practical database settings eg they require that each transaction executes at its initiation site and or are restricted to full replication schemes also the protocols cannot guarantee that the transactions will always see the freshest available replicas this paper presents new lazy replication protocol called pdbrep that is free of these restrictions while ensuring one copy serializable executions the protocol exploits the distinction between read only and update transactions and works with arbitrary physical data organizations such as partitioning and striping as well as different replica granularities it does not require that each read only transaction executes entirely at its initiation site hence each read only site need not contain fully replicated database pdbrep moreover generalizes the notion of freshness to finer data granules than entire databases
in causal processes decisions do not depend on future data many well known problems such as occlusion culling order independent transparency and edge antialiasing cannot be properly solved using the traditional causal rendering architectures because future data may change the interpretation of current events we propose adding delay stream between the vertex and pixel processing units while triangle resides in the delay stream subsequent triangles generate occlusion information as result the triangle may be culled by primitives that were submitted after it we show two to fourfold efficiency improvements in pixel processing and video memory bandwidth usage in common benchmark scenes we also demonstrate how the memory requirements of order independent transparency can be substantially reduced by using delay streams finally we describe how discontinuity edges can be detected in hardware previously used heuristics for collapsing samples in adaptive supersampling are thus replaced by connectivity information
this paper describes boostmap method for efficient nearest neighbor retrieval under computationally expensive distance measures database and query objects are embedded into vector space in which distances can be measured efficiently each embedding is treated as classifier that predicts for any three objects whether the first is closer to the second or to the third it is shown that linear combination of such embedding based classifiers naturally corresponds to an embedding and distance measure based on this property the boostmap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier the classification accuracy of the resulting strong classifier is direct measure of the amount of nearest neighbor structure preserved by the embedding an important property of boostmap is that the embedding optimization criterion is equally valid in both metric and non metric spaces performance is evaluated in databases of hand images handwritten digits and time series in all cases boostmap significantly improves retrieval efficiency with small losses in accuracy compared to brute force search moreover boostmap significantly outperforms existing nearest neighbor retrieval methods such as lipschitz embeddings fastmap and vp trees
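The following sketch illustrates the embedding-as-classifier idea behind this abstract, assuming a generic distance function: each reference object induces a one-dimensional embedding, a weak classifier votes on proximity triples, and an AdaBoost-style loop combines the selected embeddings into a weighted L1 distance. The candidate sampling, training loop, and all names are assumptions for illustration, not the authors' implementation.

import math
import random

def line_embedding(r, dist):
    # 1D embedding F_r(x) = dist(x, r) induced by a reference object r
    return lambda x: dist(x, r)

def weak_prediction(F, q, a, b):
    # +1: the embedding says q is closer to a, -1: closer to b, 0: tie
    da, db = abs(F(q) - F(a)), abs(F(q) - F(b))
    return (da < db) - (da > db)

def boostmap_train(objects, dist, triples, rounds=20, candidates_per_round=10):
    # AdaBoost-style combination of 1D embeddings on proximity triples (q, a, b),
    # labelled +1 when dist(q, a) < dist(q, b) and -1 otherwise
    labels = [1 if dist(q, a) < dist(q, b) else -1 for q, a, b in triples]
    w = [1.0 / len(triples)] * len(triples)
    chosen = []
    for _ in range(rounds):
        cands = [line_embedding(r, dist)
                 for r in random.sample(objects, min(candidates_per_round, len(objects)))]
        best, best_err = None, float("inf")
        for F in cands:
            err = sum(wi for wi, t, y in zip(w, triples, labels)
                      if weak_prediction(F, *t) != y)
            if err < best_err:
                best, best_err = F, err
        alpha = 0.5 * math.log((1 - best_err + 1e-9) / (best_err + 1e-9))
        chosen.append((alpha, best))
        # re-weight the triples that the chosen embedding gets wrong
        w = [wi * math.exp(-alpha * labels[i] * weak_prediction(best, *triples[i]))
             for i, wi in enumerate(w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

def embedded_distance(chosen, x, y):
    # weighted L1 distance in the learned embedding
    return sum(alpha * abs(F(x) - F(y)) for alpha, F in chosen)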
this paper presents methods based on the open standard xd to rapidly describe life like characters and other scene elements in the context of storyboarding and pre visualization current frameworks that employ virtual agents often rely on non standardized pipelines and lack functionality to describe lighting camera staging or character behavior in descriptive and simple manner even though demand for such system is high ranging from edutainment to pre visualization in the movie industry few such systems exist thereto we present the answer framework which provides set of interconnected components that aid film director in the process of film production from the planning stage to post production rich and intuitive user interfaces are used for scene authoring and the underlying knowledge model is populated using semantic web technologies over which reasoning is applied this transforms the user input into animated pre visualizations that enable director to experience and understand certain film making decisions before production begins in this context we also propose some extensions to the current xd standard for describing cinematic contents
the notion of counting is central to number of basic multiprocessor coordination problems such as dynamic load balancing barrier synchronization and concurrent data structure design we investigate the scalability of variety of counting techniques for large scale multiprocessors we compare counting techniques based on spin locks message passing distributed queues software combining trees and counting networks our comparison is based on series of simple benchmarks on simulated processor alewife machine distributed memory multiprocessor currently under development at mit although locking techniques are known to perform well on small scale bus based multiprocessors serialization limits performance and contention can degrade performance both counting networks and combining trees outperform the other methods substantially by avoiding serialization and alleviating contention although combining tree throughput is more sensitive to variations in load comparison of shared memory and message passing implementations of counting networks and combining trees shows that message passing implementations have substantially higher throughput
this paper describes set of techniques developed for the visualization of high resolution volume data generated from industrial computed tomography for nondestructive testing ndt applications because the data are typically noisy and contain fine features direct volume rendering methods do not always give us satisfactory results we have coupled region growing techniques and histogram interface to facilitate volumetric feature extraction the new interface allows the user to conveniently identify separate or composite and compare features in the data to lower the cost of segmentation we show how partial region growing results can suggest reasonably good classification function for the rendering of the whole volume the ndt applications that we work on demand visualization tasks including not only feature extraction and visual inspection but also modeling and measurement of concealed structures in volumetric objects an efficient filtering and modeling process for generating surface representation of extracted features is also introduced four ct data sets for preliminary ndt are used to demonstrate the effectiveness of the new visualization strategy that we have developed
the availability of machine readable bilingual linguistic resources is crucial not only for rule based machine translation but also for other applications such as cross lingual information retrieval however the building of such resources bilingual single word and multi word correspondences translation rules demands extensive manual work and as consequence bilingual resources are usually more difficult to find than shallow monolingual resources such as morphological dictionaries or part of speech taggers especially when they involve less resourced language this paper describes methodology to build automatically both bilingual dictionaries and shallow transfer rules by extracting knowledge from word aligned parallel corpora processed with shallow monolingual resources morphological analysers and part of speech taggers we present experiments for brazilian portuguese spanish and brazilian portuguese english parallel texts the results show that the proposed methodology can enable the rapid creation of valuable computational resources bilingual dictionaries and shallow transfer rules for machine translation and other natural language processing tasks
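As a rough illustration of the dictionary-extraction step described above, the sketch below counts aligned word pairs in a word-aligned parallel corpus and keeps frequent, reliable translations; the part-of-speech processing and transfer-rule induction of the described methodology are omitted, and the thresholds are assumed.

from collections import Counter, defaultdict

def extract_bilingual_dictionary(aligned_corpus, min_count=3, min_ratio=0.3):
    # aligned_corpus: iterable of (src_tokens, tgt_tokens, alignments), where
    # alignments is a list of (i, j) pairs linking src_tokens[i] to tgt_tokens[j]
    pair_counts = Counter()
    src_counts = Counter()
    for src, tgt, alignment in aligned_corpus:
        for i, j in alignment:
            pair_counts[(src[i], tgt[j])] += 1
            src_counts[src[i]] += 1
    dictionary = defaultdict(list)
    for (s, t), c in pair_counts.items():
        # keep translations that are frequent both absolutely and relative to the source word
        if c >= min_count and c / src_counts[s] >= min_ratio:
            dictionary[s].append(t)
    return dict(dictionary)

# usage with a toy word-aligned sentence pair
corpus = [(["casa", "grande"], ["big", "house"], [(0, 1), (1, 0)])]
print(extract_bilingual_dictionary(corpus, min_count=1, min_ratio=0.1))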
we introduce imagemap as method for indexing and similarity searching in image databases idbs imagemap answers queries by example involving any number of objects or regions and taking into account their interrelationships we adopt the most general image content representation that is attributed relational graphs args in conjunction with the well accepted arg editing distance on args we tested imagemap on real and realistic medical images our method not only provides for visualization of the data set clustering and data mining but it also achieves up to fold speed up in search over sequential scanning with zero or very few false dismissals
the secure scalar product or dot product is one of the most used sub protocols in privacy preserving data mining indeed the dot product is probably the most common sub protocol used as such lot of attention has been focused on coming up with secure protocols for computing it however an inherent problem with these protocols is the extremely high computation cost especially when the dot product needs to be carried out over large vectors this is quite common in vertically partitioned data and is real problem in this paper we present ways to efficiently compute the approximate dot product we implement the dot product protocol and demonstrate the quality of the approximation our dot product protocol can be used to securely and efficiently compute association rules from data vertically partitioned between two parties
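The sketch below shows only the approximation side, a sampled unbiased estimate of a dot product over long vectors; the secure two-party machinery that the paper is actually about is not shown, and the sampling scheme is an assumption rather than the paper's protocol.

import random

def approx_dot(x, y, sample_size=1000, seed=0):
    # unbiased sampled estimate of sum(x[i] * y[i]) for very long vectors;
    # the cryptographic blinding needed for a secure two-party version is omitted
    n = len(x)
    if sample_size >= n:
        return sum(a * b for a, b in zip(x, y))
    rng = random.Random(seed)
    idx = rng.sample(range(n), sample_size)
    return (n / sample_size) * sum(x[i] * y[i] for i in idx)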
semantic web data seems like promising source of information for improving search while there is some literature about how semantic data should be used to enhance search there are no positive conclusions about the best approach this paper surveys existing approaches to semantic web search describes adapting trec benchmark for evaluation and proposes learned representation algorithm for using semantic web data in search
the hm system is generalization of the hindley milner system parameterized in the constraint domain type inference is performed by generating constraints out of the program text which are then solved by the domain specific constraint solver the solver has to be invoked at the latest when type inference reaches let node so that we can build polymorphic type typical example of such an inference approach is milner’s algorithm we formalize an inference approach where the hm type inference problem is first mapped to clp program the actual type inference is achieved by executing the clp program such an inference approach supports the uniform construction of type inference algorithms and has important practical consequences when it comes to reporting type errors the clp style inference system where the constraint domain is defined by constraint handling rules is implemented as part of the chameleon system
we present general technique for approximating various descriptors of the extent of set of points in rd when the dimension is an arbitrary fixed constant for given extent measure μ and parameter ε it computes in time depending on ε a subset of the points of size depending only on ε with the property that the extent of the subset lies between (1 − ε) times the extent of the full point set and the extent of the full point set the specific applications of our technique include ε approximation algorithms for (i) computing diameter width and smallest bounding box ball and cylinder of the point set (ii) maintaining all the previous measures for set of moving points and (iii) fitting spheres and cylinders through point set our algorithms are considerably simpler and faster in many cases than previously known algorithms
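Read under the usual extent-approximation formulation, with P the input point set, Q the computed subset, μ the extent measure and ε the approximation parameter (this naming is assumed, not taken from the abstract), the guarantee is:

\[ (1 - \varepsilon)\,\mu(P) \;\le\; \mu(Q) \;\le\; \mu(P) \]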
libre software licensing schemes are sometimes abused by companies or individuals in order to encourage open source development it is necessary to build tools that can help in the rapid identification of open source licensing violations this paper describes an attempt to build such tool we introduce framework for approximate matching of programs and describe an implementation for java byte code programs first we statically analyze program to remove dead code simplify expressions and then extract slices which are generated from assignment statements we then compare programs by matching between sets of slices based on distance function we demonstrate the effectiveness of our method by running experiments on programs generated from two compilers and transformed by two commercial grade control flow obfuscators our method achieves an acceptable level of precision
xml extensible mark up language has been embraced as new approach to data modeling nowadays more and more information is formatted as semi structured data eg articles in digital library documents on the web and so on implementation of an efficient system enabling storage and querying of xml documents requires development of new techniques many different techniques of xml indexing have been proposed in recent years in the case of xml data we can distinguish the following trees an xml tree tree of elements and attributes and dataguide tree of element tags and attribute names obviously the xml tree of an xml document is much larger than the dataguide of given document authors often consider dataguide as small tree therefore they consider the dataguide search as small problem however we show that dataguide trees are often massive in the case of real xml documents consequently trivial dataguide search may be time and memory consuming in this article we introduce efficient methods for searching an xml twig pattern in large complex dataguide trees
it is well known that universally composable multiparty computation cannot in general be achieved in the standard model without setup assumptions when the adversary can corrupt an arbitrary number of players one way to get around this problem is by having trusted third party generate some global setup such as common reference string crs or public key infrastructure pki the recent work of katz shows that we may instead rely on physical assumptions and in particular tamper proof hardware tokens in this paper we consider similar but strictly weaker physical assumption we assume that player alice can partially isolate another player bob for brief portion of the computation and prevent bob from communicating more than some limited number of bits with the environment for example isolation might be achieved by asking bob to put his functionality on tamper proof hardware token and assuming that alice can prevent this token from communicating to the outside world alternatively alice may interact with bob directly but in special office which she administers and where there are no high bandwidth communication channels to the outside world we show that under standard cryptographic assumptions such physical setup can be used to uc realize any two party and multiparty computation in the presence of an active and adaptive adversary corrupting any number of players we also consider an alternative scenario in which there are some trusted third parties but no single such party is trusted by all of the players this compromise allows us to significantly limit the use of the physical set up and hence might be preferred in practice
information networks are ubiquitous in many applications and analysis on such networks has attracted significant attention in the academic communities one of the most important aspects of information network analysis is to measure similarity between nodes in network simrank is simple and influential measure of this kind based on solid theoretical random surfer model existing work computes simrank similarity scores in an iterative mode we argue that the iterative method can be infeasible and inefficient when as in many real world scenarios the networks change dynamically and frequently we envision non iterative method to bridge the gap it allows users not only to update the similarity scores incrementally but also to derive similarity scores for an arbitrary subset of nodes to enable the non iterative computation we propose to rewrite the simrank equation into non iterative form by using the kronecker product and vectorization operators based on this we develop family of novel approximate simrank computation algorithms for static and dynamic information networks and give their corresponding theoretical justification and analysis the non iterative method supports efficient processing of various node analysis including similarity tracking and centrality tracking on evolving information networks the effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets
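A minimal numpy sketch of the Kronecker-product rewrite, for a simplified SimRank variant S = c·WᵀSW + (1 − c)I in which the exact diagonal correction is not modelled; the column-normalised W and the closed-form solve are standard, but this is an illustration of the vectorisation idea rather than the paper's algorithms.

import numpy as np

def simrank_noniterative(adj, c=0.6):
    # simplified SimRank variant S = c * W.T @ S @ W + (1 - c) * I, solved in
    # closed form via vec(S) = (1 - c) * (I - c * kron(W, W).T)^-1 vec(I),
    # where W is the column-normalised adjacency matrix (A[i, j] = 1 for edge i -> j)
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    col_sums = A.sum(axis=0)
    W = A / np.where(col_sums == 0, 1.0, col_sums)
    K = np.kron(W, W)                                  # n^2 x n^2 Kronecker product
    vec_I = np.eye(n).reshape(-1, order="F")           # column-stacked identity
    vec_S = np.linalg.solve(np.eye(n * n) - c * K.T, (1 - c) * vec_I)
    return vec_S.reshape((n, n), order="F")

A = np.array([[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]])        # node 2 points to nodes 0 and 1
print(round(simrank_noniterative(A)[0, 1], 2))   # nonzero score for the pair with a common in-neighbour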
in memory tree structured index search is fundamental database operation modern processors provide tremendous computing power by integrating multiple cores each with wide vector units there has been much work to exploit modern processor architectures for database primitives like scan sort join and aggregation however unlike other primitives tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal in this paper we present fast an extremely fast architecture sensitive layout of the index tree fast is binary tree logically organized to optimize for architecture features like page size cache line size and simd width of the underlying hardware fast eliminates impact of memory latency and exploits thread level and data level parallelism on both cpus and gpus to achieve million cpu and million gpu queries per second cpu and gpu faster than the best previously reported performance on the same architectures fast supports efficient bulk updates by rebuilding index trees in less than seconds for datasets as large as mkeys and naturally integrates compression techniques overcoming the memory bandwidth bottleneck and achieving performance improvement over uncompressed index search for large keys on cpus
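The sketch below only illustrates the underlying idea of a pointer-free, breadth-first (implicit) tree layout searched by index arithmetic; FAST's page-, cache-line- and SIMD-level blocking, its compression, and its GPU code are not reproduced.

import math

def build_bfs_layout(sorted_keys):
    # lay out a balanced binary search tree over sorted_keys in breadth-first
    # (implicit) order: node i keeps its children at indices 2i+1 and 2i+2,
    # so no child pointers need to be stored
    n = len(sorted_keys)
    size = 2 ** math.ceil(math.log2(n + 1)) - 1 if n else 0
    tree = [None] * size
    def fill(node, lo, hi):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        tree[node] = sorted_keys[mid]
        fill(2 * node + 1, lo, mid - 1)
        fill(2 * node + 2, mid + 1, hi)
    fill(0, 0, n - 1)
    return tree

def search(tree, key):
    # branch down the implicit tree; index arithmetic replaces child pointers
    i = 0
    while i < len(tree) and tree[i] is not None:
        if key == tree[i]:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False

layout = build_bfs_layout(list(range(0, 100, 2)))
print(search(layout, 42), search(layout, 43))        # True False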
multi methods collections of overloaded methods associated to the same message whose selection takes place dynamically instead of statically as in standard overloading are useful mechanism since they unleash the power of dynamic binding in object oriented languages so enhancing re usability and separation of responsibilities however many mainstream languages such as eg java do not provide it resorting to only static overloading in this paper we propose an extension we call fmj featherweight multi java of featherweight java with encapsulated multi methods thus providing dynamic overloading the extension is conservative and type safe both message not understood and message ambiguous are statically ruled out our core language can be used as the formal basis for an actual implementation of dynamic overloading in java like languages
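A small dynamic-overloading dispatcher, written here in Python only to illustrate multi-method dispatch on the runtime classes of all arguments, including the two error cases named in the abstract; it does not model FMJ's static type system or its compile-time ambiguity checks.

class MultiMethod:
    # dispatch on the dynamic classes of all arguments; the most specific match wins
    def __init__(self, name):
        self.name, self.impls = name, {}

    def register(self, *types):
        def decorator(fn):
            self.impls[types] = fn
            return self
        return decorator

    def __call__(self, *args):
        matches = [(sig, fn) for sig, fn in self.impls.items()
                   if len(sig) == len(args)
                   and all(isinstance(a, t) for a, t in zip(args, sig))]
        if not matches:
            raise TypeError(f"{self.name}: message not understood")
        def more_specific(s1, s2):
            return all(issubclass(a, b) for a, b in zip(s1, s2))
        best = [s for s, _ in matches
                if all(more_specific(s, other) for other, _ in matches)]
        if len(best) != 1:
            raise TypeError(f"{self.name}: message ambiguous")
        return dict(matches)[best[0]](*args)

class Shape: pass
class Circle(Shape): pass

intersect = MultiMethod("intersect")
@intersect.register(Shape, Shape)
def _(a, b): return "generic intersection"
@intersect.register(Circle, Circle)
def _(a, b): return "specialised circle-circle intersection"

print(intersect(Circle(), Circle()))   # dynamic overloading picks the specialised body
print(intersect(Circle(), Shape()))    # falls back to the generic body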
mobile clients have limited display and navigation capabilities to browse set of documents an intuitive method is to navigate through concept hierarchies to reduce semantic loading for each term that represents the concepts and the cognitive loading of users due to the limited display similar documents are grouped together before concept hierarchies are constructed for each document group since the concept hierarchies only represent the salient concepts in the documents term extraction is necessary our pilot experiments showed that an unconventional combination of term frequency and inverse document frequency yielded similar performance ie to previous work and the use of terms in titles achieved better performance than previous work ie our preliminary results of building concept hierarchies after clustering compared to that without is encouraging cf and we believe that further research can enhance the performance of concept hierarchies to level for commercial deployment for mobile clients
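The abstract does not spell out the unconventional weighting it uses, so the sketch below is just the conventional tf-idf formulation with an assumed multiplicative boost for terms that also occur in the document title.

import math
from collections import Counter

def tfidf_terms(docs, titles=None, title_boost=2.0, top_k=5):
    # docs: list of token lists; titles: optional parallel list of title token lists
    # returns the top_k weighted terms per document
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    results = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        title = set(titles[i]) if titles else set()
        scores = {t: (tf[t] / len(doc)) * math.log(n / df[t])
                     * (title_boost if t in title else 1.0)
                  for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return results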
nowadays integration of enterprise information systems constitutes real and growing need for most enterprises especially for large and dynamic industrial ones it constitutes the main approach to dealing with heterogeneity that concerns the multiple software applications that make up information systems this paper constitutes general survey on integration of industrial information systems and aims to overview the main approaches that can be used in the context of industrial enterprises either for syntactic or semantic integration issues in particular this paper focuses on some semantics based approaches that promote the use of ontologies and especially the use of owl service ontology
in this paper we discuss set of functional requirements for software exploration tools and provide initial evidence that various combinations of these features are needed to effectively assist developers in understanding software we observe that current tools for software exploration only partly support these features this has motivated the development of sextant software exploration tool tightly integrated into the eclipse ide that has been developed to fill this gap by means of case studies we demonstrate how the requirements fulfilled by sextant are conducive to an understanding needed to perform maintenance task
today globally distributed software development has become the norm for many organizations and the popularity of implementing such an approach continues to increase in these circumstances strategy often employed is the use of virtual software development teams due to the collaborative nature of software development this has proved difficult and complex endeavor research has identified distance in its various forms as an important factor which negatively impacts on global software development and on virtual software team operation in particular in this context the aspects of distance have been defined as temporal geographical cultural and linguistic key element for the success of any team based project is the development of trust and cooperation each aspect of distance can negatively impact on the development of trust and hamper cooperation particularly in the virtual team environment an additional factor which this research identified is the importance and negative impact fear plays the serious implications of these factors are due to the need for dependence on asynchronous and online communication which is inherent to global software development and the operation of virtual software teams in particular the findings presented here are the results from four independent studies undertaken over twelve year period which consider each of these issues having identified the problems associated with trust and communication how these issues were successfully addressed and managed on multimillion dollar project which was heading for failure is outlined
this article presents new and highly accurate method for branch prediction the key idea is to use one of the simplest possible neural methods the perceptron as an alternative to the commonly used two bit counters the source of our predictor’s accuracy is its ability to use long history lengths because the hardware resources for our method scale linearly rather than exponentially with the history length we describe two versions of perceptron predictors and we evaluate these predictors with respect to five well known predictors we show that for kb hardware budget simple version of our method that uses global history achieves misprediction rate of percent on the spec integer benchmarks an improvement of percent over gshare we also introduce global local version of our predictor that is percent more accurate than the mcfarling style hybrid predictor of the alpha we show that for hardware budgets of up to kb this global local perceptron predictor is more accurate than evers multicomponent predictor so we conclude that ours is the most accurate dynamic predictor currently available to explore the feasibility of our ideas we provide circuit level design of the perceptron predictor and describe techniques that allow our complex predictor to operate quickly finally we show how the relatively complex perceptron predictor can be used in modern cpus by having it override simpler quicker smith predictor providing ipc improvements of percent over gshare and percent over the mcfarling hybrid predictor
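The perceptron predictor itself is well documented, so the following sketch stays close to the published scheme: a table of weight vectors indexed by a hash of the branch address, a prediction equal to the sign of the dot product with the global history, and training on a misprediction or a low-magnitude output; the table size, history length and threshold constant are assumed parameters.

class PerceptronPredictor:
    # prediction is the sign of w0 + sum(w_i * h_i) over the global history
    # (+1 = taken, -1 = not taken)
    def __init__(self, n_entries=1024, history_len=32, theta=None):
        self.hlen = history_len
        # training threshold suggested in the literature: about 1.93 * h + 14
        self.theta = theta if theta is not None else int(1.93 * history_len + 14)
        self.weights = [[0] * (history_len + 1) for _ in range(n_entries)]
        self.history = [1] * history_len          # +1 = taken, -1 = not taken

    def _output(self, pc):
        w = self.weights[pc % len(self.weights)]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y, t = self._output(pc), 1 if taken else -1
        w = self.weights[pc % len(self.weights)]
        # train on a misprediction or when the output magnitude is below the threshold
        if (y >= 0) != taken or abs(y) <= self.theta:
            w[0] += t
            for i in range(self.hlen):
                w[i + 1] += t * self.history[i]
        self.history = self.history[1:] + [t]      # shift in the actual outcome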
although many web development methods exist they are rarely used by practitioners the work reported here seeks to explain why this might be so and suggests that for many the perceived benefits may be outweighed by the difficulty or effort required to learn the method in attempting to gauge the utility of methods the authors undertook year study of small web development projects attempting to use range of published academic methods of the projects we found only one case where the developer continued to use an academic web development method throughout the lifecycle the ability to understand method and or its techniques was repeatedly cited as the reason for its abandonment our findings also indicate number of key areas relating to terminology completeness and guidance where existing methods may be failing their intended users in attempting to further our understanding of web development methods we completed comprehensive survey of web development methods covering web development methods encompassing range of different research communities and drawing upon different sources our findings here shed some light upon the confusion of methods for the would be user in summary the findings are that although there is much of value in variety of methods method choice is somewhat bewildering for the newcomer to the field and many methods are incomplete in some dimension by providing this work we hope to go some way towards supporting the software engineering community in both academia and industry in their understanding of the quality issues that exist with the take up and use of web development methods
this paper reports the results of study comparing the effectiveness of automatically generated tests constructed using random and way combinatorial techniques on safety related industrial code using mutation adequacy criteria reference point is provided by hand generated test vectors constructed during development to establish minimum acceptance criteria the study shows that way testing is not adequate measured by mutants kill rate compared with hand generated test set of similar size but that higher factor way test sets can perform at least as well to reduce the computation overhead of testing large numbers of vectors over large numbers of mutants staged optimising approach to applying way tests is proposed and evaluated which shows improvements in execution time and final test set size
typical commercial cad tools provide modal tools such as pan zoom orbit look etc to facilitate freeform navigation in scene mastering these navigation tools requires significant amount of learning and even experienced computer users can find learning confusing and error prone to address this we have developed concept called safe navigation where we augment these modal tools with properties to reduce the occurrence of confusing situations and improve the learning experience in this paper we describe the major properties needed for safe navigation the features we implemented to realize these properties and usability tests on the effectiveness of these features we conclude that indeed these properties do improve the learning experience for users that are new to furthermore many of the features we implemented for safe navigation are also very popular with experienced users as result these features have been integrated into six commercial cad applications and we recommend other application developers include these features to improve navigation
the ability to store vast quantities of data and the emergence of high speed networking have led to intense interest in distributed data mining however privacy concerns as well as regulations often prevent the sharing of data between multiple parties privacy preserving distributed data mining allows the cooperative computation of data mining algorithms without requiring the participating organizations to reveal their individual data items to each other this paper makes several contributions first we present simple deterministic efficient k-clustering algorithm that was designed with the goal of enabling an efficient privacy preserving version of the algorithm our algorithm examines each item in the database only once and uses only sequential access to the data our experiments show that this algorithm produces cluster centers that are on average more accurate than the ones produced by the well known iterative k-means algorithm and compares well against birch second we present distributed privacy preserving protocol for clustering based on our new clustering algorithm the protocol applies to databases that are horizontally partitioned between two parties the participants of the protocol learn only the final cluster centers on completion of the protocol unlike most of the earlier results in privacy preserving clustering our protocol does not reveal intermediate candidate cluster centers the protocol is also efficient in terms of communication and does not depend on the size of the database although there have been other clustering algorithms that improve on the k-means algorithm ours is the first for which communication efficient cryptographic privacy preserving protocol has been demonstrated
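The paper's clustering algorithm is not specified in the abstract, so the sketch below is only a generic stand-in with the same access pattern: a single sequential pass that assigns each item to the nearest of k evolving centers and updates that center by a running mean.

def one_pass_kcenters(stream, k):
    # generic single-pass clustering sketch; not the privacy-preserving algorithm
    # described in the paper: the first k items seed the centers, and every later
    # item moves its nearest center by the running-mean rule
    centers, counts = [], []
    for x in stream:                      # each item is read exactly once, sequentially
        if len(centers) < k:
            centers.append(list(x))
            counts.append(1)
            continue
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        counts[j] += 1
        centers[j] = [c + (a - c) / counts[j] for c, a in zip(centers[j], x)]
    return centers

print(one_pass_kcenters([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.2)], k=2))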
feature selection has become an increasingly important field of research it aims at finding optimal feature subsets that can achieve better generalization on unseen data however this can be very challenging task especially when dealing with large feature sets hence search strategy is needed to explore relatively small portion of the search space in order to find semi optimal subsets many search strategies have been proposed in the literature however most of them do not take into consideration relationships between features due to the fact that features usually have different degrees of dependency among each other we propose in this paper new search strategy that utilizes dependency between feature pairs to guide the search in the feature space when compared to other well known search strategies the proposed method prevailed
inheritance is crucial part of object oriented programming but its use in practice and the resulting large scale inheritance structures in programs remain poorly understood previous studies of inheritance have been relatively small and have generally not considered issues such as java’s distinction between classes and interfaces nor have they considered the use of external libraries in this paper we present the first substantial empirical study of the large scale use of inheritance in contemporary oo programming language we present suite of structured metrics for quantifying inheritance in java programs we present the results of performing corpus analysis using those metrics to over applications consisting of over separate classes and interfaces our analysis finds higher use of inheritance than anticipated variation in the use of inheritance between interfaces and classes and differences between inheritance within application types compared with inheritance from external libraries
nearest neighbor finding is one of the most important spatial operations in the field of spatial data structures concerned with proximity because the goal of the space filling curves is to preserve the spatial proximity the nearest neighbor queries can be handled by these space filling curves when data are ordered by the peano curve we can directly compute the sequence numbers of the neighboring blocks next to the query block in eight directions in the space based on its bit shuffling property but when data are ordered by the rbg curve or the hilbert curve neighbor finding is complex however we observe that there is some relationship between the rbg curve and the peano curve as with the hilbert curve therefore in this paper we first show the strategy based on the peano curve for the nearest neighbor query next we present the rules for transformation between the peano curve and the other two curves including the rbg curve and the hilbert curve such that we can also efficiently find the nearest neighbor by the strategies based on these two curves from our simulation we show that the strategy based on the hilbert curve requires the least total time the cpu time and the time to process the nearest neighbor query among our three strategies since it can provide the good clustering property
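Assuming the bit-shuffling property refers to Morton/Z-order bit interleaving of the cell coordinates, the sketch below computes a block's curve sequence number directly from its coordinates and derives the sequence numbers of the neighbouring blocks in the eight directions.

def interleave(x, y, bits=16):
    # sequence number of a grid cell on the bit-shuffled (Z-order) curve:
    # interleave the bits of x and y
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def deinterleave(z, bits=16):
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y

def neighbor_sequence_numbers(z, grid_size, bits=16):
    # sequence numbers of the up to eight neighbouring blocks of the query block,
    # obtained by decoding, stepping in each direction, and re-encoding
    x, y = deinterleave(z, bits)
    out = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx, dy) != (0, 0) and 0 <= nx < grid_size and 0 <= ny < grid_size:
                out[(dx, dy)] = interleave(nx, ny, bits)
    return out

print(neighbor_sequence_numbers(interleave(5, 9), grid_size=16))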
this paper presents new method for evaluating boolean set operations between binary space partition bsp trees our algorithm has many desirable features including both numerical robustness and output sensitive time complexity while simultaneously admitting straightforward implementation to achieve these properties we present two key algorithmic improvements the first is method for eliminating null regions within bsp tree using linear programming this replaces previous techniques based on polygon cutting and tree splitting the second is an improved method for compressing bsp trees based on similar approach within binary decision diagrams the performance of the new method is analyzed both theoretically and experimentally given the importance of boolean set operations our algorithms can be directly applied to many problems in graphics cad and computational geometry
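A leaf region of a BSP tree is the intersection of the halfspaces collected along its root-to-leaf path, so deciding whether it is a null region reduces to a small linear-programming feasibility test; the sketch below shows only that test (via scipy), not the authors' full elimination and compression procedure.

import numpy as np
from scipy.optimize import linprog

def region_is_null(halfspaces, eps=1e-9):
    # halfspaces: list of (a, b) with a an n-vector, encoding the constraint a . x <= b;
    # returns True when the intersection has no interior point, i.e. the BSP leaf
    # region is a null region that can be pruned
    A = np.array([a for a, _ in halfspaces], dtype=float)
    # shrink each halfspace slightly so feasibility means a full-dimensional region
    b = np.array([bi - eps for _, bi in halfspaces], dtype=float)
    n = A.shape[1]
    res = linprog(c=np.zeros(n), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * n, method="highs")
    return not res.success

# the region {x <= 0, x >= 1} is empty and would be pruned
print(region_is_null([((1.0,), 0.0), ((-1.0,), -1.0)]))   # True
print(region_is_null([((1.0,), 1.0), ((-1.0,), 0.0)]))    # False: 0 <= x <= 1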
data warehouses collect masses of operational data allowing analysts to extract information by issuing decision support queries on the otherwise discarded data in many application areas eg telecommunications the warehoused data sets are multiple terabytes in size parts of these data sets are stored on very large disk arrays while the remainder is stored on tape based tertiary storage which is one to two orders of magnitude less expensive than on line storage however the inherently sequential nature of access to tape based tertiary storage makes the efficient access to tape resident data difficult to accomplish through conventional databases in this paper we present way to make access to massive tape resident data warehouse easy and efficient ad hoc decision support queries usually involve large scale and complex aggregation over the detail data these queries are difficult to express in sql and frequently require self joins on the detail data which are prohibitively expensive on the disk resident data and infeasible to compute on tape resident data or unnecessary multiple passes through the detail data an extension to sql the extended multi feature sql emf sql expresses complex aggregation computations in clear manner without using self joins the detail data in data warehouse usually represents record of past activities and therefore is temporal we show that complex queries involving sequences can be easily expressed in emf sql an emf sql query can be optimized to minimize the number of passes through the detail data required to evaluate the query in many cases to only one pass we describe an efficient query evaluation algorithm along with query optimization algorithm that minimizes the number of passes through the detail data and which minimizes the amount of main memory required to evaluate the query these algorithms are useful not only in the context of tape resident data warehouses but also in data stream systems which require similar processing techniques
with new technologies temperature has become major issue to be considered at system level design without taking temperature aspects into consideration no approach to energy and/or performance optimization will be sufficiently accurate and efficient in this paper we propose an on line temperature aware dynamic voltage and frequency scaling dvfs technique which is able to exploit both static and dynamic slack the approach implies an offline temperature aware optimization step and on line voltage frequency settings based on temperature sensor readings most importantly the presented approach is aware of the frequency temperature dependency by which important additional energy savings are obtained
reputation system provides way to maintain trust through social control by utilizing feedbacks about the service providers past behaviors conventional memory based reputation system mrs is one of the most successful mechanisms in terms of accuracy though mrs performs well on giving predicted values for service providers offering averaging quality services our experiments show that mrs performs poor on giving predicted values for service providers offering high and low quality services we propose bayesian memory based reputation system bmrs which uses bayesian theory to analyze the probability distribution of the predicted valued given by mrs and makes suitable adjustment the simulation results which are based on eachmovie dataset show that our proposed bmrs has higher accuracy than mrs on giving predicted values for service providers offering high and low quality services
data clustering is an important task in many disciplines large number of studies have attempted to improve clustering by using the side information that is often encoded as pairwise constraints however these studies focus on designing special clustering algorithms that can effectively exploit the pairwise constraints we present boosting framework for data clustering termed as boostcluster that is able to iteratively improve the accuracy of any given clustering algorithm by exploiting the pairwise constraints the key challenge in designing boosting framework for data clustering is how to influence an arbitrary clustering algorithm with the side information since clustering algorithms by definition are unsupervised the proposed framework addresses this problem by dynamically generating new data representations at each iteration that are on the one hand adapted to the clustering results at previous iterations by the given algorithm and on the other hand consistent with the given side information our empirical study shows that the proposed boosting framework is effective in improving the performance of number of popular clustering algorithms means partitional single link spectral clustering and its performance is comparable to the state of the art algorithms for data clustering with side information
existing encoding schemes and index structures proposed for xml query processing primarily target the containment relationship specifically the parent child and ancestor descendant relationship the presence of preceding sibling and following sibling location steps in the xpath specification which is the de facto query language for xml makes the horizontal navigation besides the vertical navigation among nodes of xml documents necessity for efficient evaluation of xml queries our work enhances the existing range based and prefix based encoding schemes such that all structural relationships between xml nodes can be determined from their codes alone furthermore an external memory index structure based on the traditional tree xl tree xml location tree is introduced to index element sets such that all defined location steps in the xpath language vertical and horizontal top down and bottom up can be processed efficiently the xl trees under the range or prefix encoding scheme actually share the same structure but various search operations upon them may be slightly different as result of the richer information provided by the prefix encoding scheme finally experiments are conducted to validate the efficiency of the xl tree approach we compare the query performance of xl tree with that of tree which is capable of handling comprehensive xpath location steps and has been empirically shown to outperform other indexing approaches
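A simplified illustration of a range encoding extended with level and parent information, under which the ancestor/descendant, parent/child and preceding/following-sibling relationships reduce to comparisons on the codes alone; the field names and exact scheme are assumptions, not the paper's enhanced encodings.

from dataclasses import dataclass

@dataclass
class Code:
    start: int      # preorder begin position
    end: int        # position just after the last descendant
    level: int      # depth in the document tree
    parent: int     # start position of the parent (-1 for the root)

def is_ancestor(a, d):
    return a.start < d.start and d.end <= a.end

def is_parent(a, d):
    return is_ancestor(a, d) and d.level == a.level + 1

def is_preceding_sibling(a, b):
    return a.parent == b.parent and a.end <= b.start

def is_following_sibling(a, b):
    return a.parent == b.parent and a.start >= b.end

# <root><x/><y><z/></y></root> encoded with interval positions
root = Code(0, 7, 0, -1)
x = Code(1, 2, 1, 0)
y = Code(3, 6, 1, 0)
z = Code(4, 5, 2, 3)
print(is_ancestor(root, z), is_preceding_sibling(x, y), is_following_sibling(y, x))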
the application of feature subsets with high order correlation in classification has demonstrated its power in recent study where non redundant interacting feature subsets nifs is defined based on multi information in this paper we re examine the problem of finding nifss we further improve the upper bounds and lower bounds on the correlations which can be used to significantly prune the search space the experiments on real datasets demonstrate the efficiency and effectiveness of our approach
in multimode distributed systems active task sets are assigned to their distributed components for realizing one or more functions many of these systems encounter runtime task variations at the input and across the system while processing their tasks in real time very few efforts have been made to address energy efficient scheduling in these types of distributed systems in this paper we propose an analytical model for energy efficient scheduling in distributed real time embedded systems to handle time varying task inputs new slack distribution scheme is introduced and adopted during the schedule of the task sets in the system the slack distribution is made according to the service demand at the nodes which affects the energy consumption in the system the active component at node periodically determines the service rate and applies voltage scaling according to the dynamic traffic condition observed at various network nodes the proposed approach uses comprehensive traffic description function at nodes and provides adequate information about the worst case traffic behavior anywhere in the distributed network thereby enhancing the system power management capabilities we evaluate the proposed technique using several benchmarks employing an event driven simulator and demonstrate its performance for multimode applications experimental results indicate significant energy savings in various examples and case studies
today most object oriented software systems are developed using an evolutionary process model therefore understanding the phases that the system’s logical design has gone through and the style of their evolution can provide valuable insights in support of consistently maintaining and evolving the system without compromising the integrity and stability of its architecture in this paper we present method for analyzing the evolution of object oriented software systems from the point of view of their logical design this method relies on umldiff uml structure differencing algorithm which given sequence of uml class models corresponding to the logical design of sequence of system code releases produces sequence of change records that describe the design level changes between subsequent system releases this change records sequence is subsequently analyzed from the perspective of each individual system class to produce the class evolution profile ie class specific change records sequence three types of longitudinal analyses phasic gamma and optimal matching analysis are applied to the class evolution profiles to recover high level abstraction of distinct evolutionary phases and their corresponding styles and to identify class clusters with similar evolution trajectories the recovered knowledge facilitates the overall understanding of system evolution and the planning of future maintenance activities we report on one real world case study evaluating our approach
we survey principles of model checking techniques for the automatic analysis of reactive systems the use of model checking is exemplified by an analysis of the needham schroeder public key protocol we then formally define transition systems temporal logic automata and their relationship basic model checking algorithms for linear and branching time temporal logics are defined followed by an introduction to symbolic model checking and partial order reduction techniques the paper ends with list of references to some more advanced topics
interconnection network design plays central role in the design of parallel systems most of the previous research has evaluated the performance of interconnection networks in isolation in this study we investigate the relationship between application program characteristics and interconnection network performance using an execution driven simulation testbed the reconfigurable architecture workbench raw we simulate five topological configurations of ary cube interconnect and four different network link models for node simd machine and quantify the impact of the network on two application programs we provide experimental evidence that such in context simulation provides better view of the impact of network design variables on system performance we show that recent results indicating that low dimensional designs provide better icn performance ignore application requirements that may favor high dimensional designs furthermore applications that would appear to favor low dimensional designs may not in fact be significantly impacted by the network’s dimensionality we experimentally test the results of published performance models comparing the use of synthetic load to that of load generated by typical application program the experiments indicate that the standard metric of average message latency can vary considerably under different application loads and that average message latency may not reflect overall application performance in particular at the level of the offered application generated load to the network the topological properties of the network are important in determining the average message latency however for overall application performance we found that the network topology may not be critical so long as there is sufficient network bandwidth in such cases the results suggest that optimizing the implementation cost of the network should be the key design criterion we also present simple abstraction for the network that captures all the important design parameters of the interconnect that can be easily incorporated into any execution driven simulation framework
we present algorithms that solve number of fundamental problems on planar directed graphs planar digraphs in O(sort(N)) I/Os where sort(N) is the number of I/Os needed to sort N elements the problems we consider are breadth first search the single source shortest path problem computing directed ear decomposition of strongly connected planar digraph computing an open directed ear decomposition of strongly connected biconnected planar digraph and topologically sorting planar directed acyclic graph
transactional memory tm is an emerging concurrent programming abstraction numerous software based transactional memory stm implementations have been developed in recent years stm implementations must guarantee transaction atomicity and isolation in order to ensure progress an stm implementation must resolve transaction collisions by consulting contention manager cm recent work established that serializing contention management technique in which the execution of colliding transactions is serialized for eliminating repeat collisions can dramatically improve stm performance in high contention workloads in low contention and highly parallel workloads however excessive serialization of memory transactions may limit concurrency too much and hurt performance it is therefore important to better understand how the impact of serialization on stm performance varies as function of workload characteristics we investigate how serializing cm influences the performance of stm systems specifically we study serialization’s influence on stm throughput number of committed transactions per time unit and efficiency ratio between the extent of useful work done by the stm and work wasted by aborts as the workload’s level of contention varies towards this goal we implement cbench synthetic benchmark that generates workloads in which transactions have parameter pre determined length and probability of being aborted in the lack of contention reduction mechanisms cbench facilitates evaluating the efficiency of contention management algorithms across the full spectrum of contention levels the characteristics of tm workloads generated by real applications may vary over time to achieve good performance cm algorithms need to monitor these characteristics and change their behavior accordingly we implement adaptive algorithms that control the activation of serialization cm according to measured contention level based on novel low overhead serialization algorithm we then evaluate our new algorithms on cbench generated workloads and on additional well known stm benchmark applications we believe our results shed light on the manner in which serializing cm should be used by stm systems
this paper discusses visualization and interactive exploration of large relational data sets through the integration of several well chosen multidimensional data visualization techniques and for the purpose of visual data mining and exploratory data analysis the basic idea is to combine the techniques of grand tour direct volume rendering and data aggregation in databases to deal with both the high dimensionality of data and large number of relational records each technique has been enhanced or modified for this application specifically positions of data clusters are used to decide the path of grand tour this cluster guided tour makes intercluster distance preserving projections in which data clusters are displayed as separate as possible tetrahedral mapping method applied to cluster centroids helps in choosing interesting cluster guided projections multidimensional footprint splatting is used to directly render large relational data sets this approach abandons the rendering techniques that enhance realism and focuses on how to efficiently produce real time explanatory images that give comprehensive insights into global features such as data clusters and holes examples are given where the techniques are applied to large more than million records relational data sets
we present novel approach to automatic information extraction from deep web life science databases using wrapper induction traditional wrapper induction techniques focus on learning wrappers based on examples from one class of web pages ie from web pages that are all similar in structure and content thereby traditional wrapper induction targets the understanding of web pages generated from database using the same generation template as observed in the example set however life science web sites typically contain structurally diverse web pages from multiple classes making the problem more challenging furthermore we observed that such life science web sites do not just provide mere data but they also tend to provide schema information in terms of data labels giving further cues for solving the web site wrapping task our solution to this novel challenge of site wide wrapper induction consists of sequence of steps classification of similar web pages into classes discovery of these classes and wrapper induction for each class our approach thus allows us to perform unsupervised information retrieval from across an entire web site we test our algorithm against three real world biochemical deep web sources and report our preliminary results which are very promising
satchmore was introduced as mechanism to integrate relevancy testing with the model generation theorem prover satchmo this made it possible to avoid invoking some clauses that appear in no refutation which was major drawback of the satchmo approach satchmore relevancy however is driven by the entire set of negative clauses and no distinction is accorded to the query negation under unfavorable circumstances such as in the presence of large amounts of negative data this can reduce the efficiency of satchmore in this paper we introduce further refinement of satchmo called satchmorebid satchmore with bidirectional relevancy satchmorebid uses only the negation of the query for relevancy determination at the start other negative clauses are introduced on demand and only if refutation is not possible using the current set of negative clauses the search for the relevant negative clauses is performed in forward chaining mode as opposed to relevancy propagation in satchmore which is based on backward chaining satchmorebid is shown to be refutationally sound and complete experiments on prototype satchmorebid implementation point to its potential to enhance the efficiency of the query answering process in disjunctive databases
we consider the problem of fitting step function to set of points more precisely given an integer and set of points in the plane our goal is to find step function with steps that minimizes the maximum vertical distance between and all the points in we first give an optimal logn algorithm for the general case in the special case where the points in are given in sorted order according to their coordinates we give an optimal time algorithm then we show how to solve the weighted version of this problem in time log finally we give an logh algorithm for the case where outliers are allowed and the input is sorted the running time of all our algorithms is independent of
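The sketch below is only an illustrative way to attack the problem, not the paper's optimal algorithms: it binary-searches the error value and uses a greedy left-to-right check of how many steps are needed at a given error, assuming the points are already sorted by x.

def steps_needed(ys, err):
    # greedy count of steps so every step's points fit in a band of half-width err
    steps, lo, hi = 0, None, None
    for y in ys:
        if lo is None:
            steps, lo, hi = steps + 1, y, y
        else:
            lo2, hi2 = min(lo, y), max(hi, y)
            if (hi2 - lo2) / 2 > err:
                steps, lo, hi = steps + 1, y, y
            else:
                lo, hi = lo2, hi2
    return steps

def fit_step_function(points, k, tol=1e-9):
    # points sorted by x; returns the (near-)optimal maximum vertical error of a
    # k-step fit, found by binary search over the error value
    ys = [y for _, y in points]
    lo, hi = 0.0, (max(ys) - min(ys)) / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if steps_needed(ys, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi

pts = [(0, 1.0), (1, 1.2), (2, 5.0), (3, 5.1), (4, 9.0)]
print(fit_step_function(pts, k=3))   # each cluster of y-values gets its own step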
due to uncertainty in nodal mobility dtn routing usually employs multi copy forwarding schemes to avoid the cost associated with flooding much effort has been focused on probabilistic forwarding which aims to reduce the cost of forwarding while retaining high performance rate by forwarding messages only to nodes that have high delivery probabilities this paper aims to provide an optimal forwarding protocol which maximizes the expected delivery rate while satisfying certain constraint on the number of forwardings per message in our proposed optimal probabilistic forwarding opf protocol we use an optimal probabilistic forwarding metric derived by modeling each forwarding as an optimal stopping rule problem we also present several extensions to allow opf to use only partial routing information and work with other probabilistic forwarding schemes such as ticket based forwarding we implement opf and several other protocols and perform trace driven simulations simulation results show that the delivery rate of opf is only lower than epidemic and greater than the state of the art delegation forwarding while generating more copies and longer delay
this paper presents the design and deployment experience of an air dropped wireless sensor network for volcano hazard monitoring the deployment of five stations into the rugged crater of mount st helens only took one hour with helicopter the stations communicate with each other through an amplified radio and establish self forming and self healing multi hop wireless network the distance between stations is up to km each sensor station collects and delivers real time continuous seismic infrasonic lightning gps raw data to gateway the main contribution of this paper is the design and evaluation of robust sensor network to replace data loggers and provide real time long term volcano monitoring the system supports utc time synchronized data acquisition with ms accuracy and is online configurable it has been tested in the lab environment the outdoor campus and the volcano crater despite the heavy rain snow and ice as well as gusts exceeding miles per hour the sensor network has achieved remarkable packet delivery ratio above with an overall system uptime of about over the months evaluation period after deployment our initial deployment experiences with the system have alleviated the doubts of domain scientists and prove to them that low cost sensor network system can support real time monitoring in extremely harsh environments
to achieve the goal of realizing object adaptation to environments new role based model epsilon and language epsilonj are proposed in epsilon an environment is defined as field of collaboration between roles and an object adapts to the environment assuming one of the roles objects can freely enter or leave environments and belong to multiple environments at time so that dynamic adaptation or evolution of objects is realized environments and roles are the first class constructs at runtime as well as at model description time so that separation of concerns is not only materialized as static structure but also observable as behaviors environments encapsulating collaboration are independent reuse components to be deployed separately from objects in this paper the epsilon model and the language are explained with some examples the effectiveness of the model is illustrated by case study on the problem of integrated systems implementation of the language is also reported
in this paper we present photometry based approach to the digital documentation of cultural artifacts rather than representing an artifact as geometric model with spatially varying reflectance properties we instead propose directly representing the artifact in terms of its reflectance field the manner in which it transforms light into images the principal device employed in our technique is computer controlled lighting apparatus which quickly illuminates an artifact from an exhaustive set of incident illumination directions and set of digital video cameras which record the artifact’s appearance under these forms of illumination from this database of recorded images we compute linear combinations of the captured images to synthetically illuminate the object under arbitrary forms of complex incident illumination correctly capturing the effects of specular reflection subsurface scattering self shadowing mutual illumination and complex brdf’s often present in cultural artifacts we also describe computer application that allows users to realistically and interactively relight digitized artifacts
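A minimal sketch of the core relighting step, assuming the captured data is stored as a stack of basis images, one photograph per incident lighting direction; relighting under a novel illumination is then a weighted sum of those images (array names and shapes are my assumptions).

```python
import numpy as np

def relight(basis_images, light_weights):
    """basis_images: (L, H, W, 3) array, one photo per incident lighting direction.
    light_weights: length-L intensities of the novel illumination sampled in the
    same directions. Returns the synthetically relit (H, W, 3) image."""
    return np.tensordot(light_weights, basis_images, axes=(0, 0))
```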
network and server centric computing paradigms are quickly returning to being the dominant methods by which we use computers web applications are so prevalent that the role of pc today has been largely reduced to terminal for running client or viewer such as web browser implementers of network centric applications typically rely on the limited capabilities of html employing proprietary plug ins or transmitting the binary image of an entire application that will be executed on the client alternatively implementers can develop without regard for remote use requiring users who wish to run such applications on remote server to rely on system that creates virtual frame buffer on the server and transmits copy of its raster image to the local client we review some of the problems that these current approaches pose and show how they can be solved by developing distributed user interface toolkit distributed user interface toolkit applies techniques to the high level components of toolkit that are similar to those used at low level in the window system as an example of this approach we present remotejfc working distributed user interface toolkit that makes it possible to develop thin client applications using distributed version of the java foundation classes
large number of web pages contain data structured in the form of lists many such lists can be further split into multi column tables which can then be used in more semantically meaningful tasks however harvesting relational tables from such lists can be challenging task the lists are manually generated and hence need not have well defined templates they have inconsistent delimiters if any and often have missing information we propose novel technique for extracting tables from lists the technique is domain independent and operates in fully unsupervised manner we first use multiple sources of information to split individual lines into multiple fields and then compare the splits across multiple lines to identify and fix incorrect splits and bad alignments in particular we exploit corpus of html tables also extracted from the web to identify likely fields and good alignments for each extracted table we compute an extraction score that reflects our confidence in the table’s quality we conducted an extensive experimental study using both real web lists and lists derived from tables on the web the experiments demonstrate the ability of our technique to extract tables with high accuracy in addition we applied our technique on large sample of about lists crawled from the web the analysis of the extracted tables has led us to believe that there are likely to be tens of millions of useful and queryable relational tables extractable from lists on the web
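A crude stand-in for the line-splitting step, under the assumption that a single delimiter pattern dominates the list: pick the candidate delimiter whose splits are most consistent across lines, then split every line with it. The real technique uses multiple information sources and a corpus of web tables; the delimiter patterns and names here are illustrative only.

```python
import re
from collections import Counter

CANDIDATE_DELIMS = [r"\t", r"\s{2,}", r"\s*[|;,]\s*", r"\s-\s"]

def split_list_into_table(lines):
    """Choose the delimiter giving the most consistent field count, then split."""
    best = None
    for delim in CANDIDATE_DELIMS:
        counts = Counter(len(re.split(delim, ln.strip())) for ln in lines)
        n_fields, support = counts.most_common(1)[0]
        if n_fields > 1 and (best is None or support > best[2]):
            best = (delim, n_fields, support)
    if best is None:
        return [[ln] for ln in lines]          # no usable delimiter found
    delim, n_fields, _ = best
    # pad or truncate so every row has the same number of columns
    return [(re.split(delim, ln.strip()) + [""] * n_fields)[:n_fields] for ln in lines]
```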
as statistical machine learning algorithms and techniques continue to mature many researchers and developers see statistical machine learning not only as topic of expert study but also as tool for software development extensive prior work has studied software development but little prior work has studied software developers applying statistical machine learning this paper presents interviews of eleven researchers experienced in applying statistical machine learning algorithms and techniques to human computer interaction problems as well as study of ten participants working during five hour study to apply statistical machine learning algorithms and techniques to realistic problem we distill three related categories of difficulties that arise in applying statistical machine learning as tool for software development difficulty pursuing statistical machine learning as an iterative and exploratory process difficulty understanding relationships between data and the behavior of statistical machine learning algorithms and difficulty evaluating the performance of statistical machine learning algorithms and techniques in the context of applications this paper provides important new insight into these difficulties and the need for development tools that better support the application of statistical machine learning
building efficient tools for understanding large software systems is difficult many existing program understanding tools build control flow and data flow representations of the program a priori and therefore may require prohibitive space and time when analyzing large systems since much of these representations may be unused during an analysis we construct representations on demand not in advance furthermore some representations such as the abstract syntax tree may be used infrequently during an analysis we discard these representations and recompute them as needed reducing the overall space required finally we permit the user to selectively trade off time for precision and to customize the termination of these costly analyses in order to provide finer user control we revised the traditional software architecture for compilers to provide these features without unnecessarily complicating the analyses themselves these techniques have been successfully applied in the design of program slicer for the comprehensive health care system chcs million line hospital management system written in the mumps programming language
rendering complex scenes with indirect illumination high dynamic range environment lighting and many direct light sources remains challenging problem prior work has shown that all these effects can be approximated by many point lights this paper presents scalable solution to the many light problem suitable for gpu implementation we view the problem as large matrix of sample light interactions the ideal final image is the sum of the matrix columns we propose an algorithm for approximating this sum by sampling entire rows and columns of the matrix on the gpu using shadow mapping the key observation is that the inherent structure of the transfer matrix can be revealed by sampling just small number of rows and columns our prototype implementation can compute the light transfer within few seconds for scenes with indirect and environment illumination area lights complex geometry and arbitrary shaders we believe this approach can be very useful for rapid previewing in applications like cinematic and architectural lighting design
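A crude simplification of the row-column sampling idea, not the paper's algorithm: reduced columns built from a few sampled rows guide a cheap clustering of the lights, and one full representative column per cluster, suitably scaled, stands in for the whole cluster. The callback and parameter names are assumptions.

```python
import numpy as np

def many_lights_image(column, n_pixels, n_lights, n_rows=300, n_clusters=200, seed=0):
    """column(j) returns column j of the pixel-by-light transfer matrix
    (conceptually one shadow-map render per call)."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(n_pixels, size=min(n_rows, n_pixels), replace=False)
    reduced = np.stack([column(j)[rows] for j in range(n_lights)], axis=1)   # (R, L)
    norms = np.linalg.norm(reduced, axis=0)
    # order columns by a 1-D random projection and cut into contiguous clusters
    proj = rng.normal(size=reduced.shape[0]) @ reduced
    clusters = np.array_split(np.argsort(proj), n_clusters)
    image = np.zeros(n_pixels)
    for members in clusters:
        total = norms[members].sum()
        if len(members) == 0 or total <= 0.0:
            continue
        rep = members[np.argmax(norms[members])]       # representative light
        image += (total / norms[rep]) * column(rep)    # scaled to stand for its cluster
    return image
```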
traditional multi layered approach is adopted to human body modeling and deformation the model is split into three general anatomical structures the skeleton musculature and skin it is shown that each of these layers is modeled and deformed by using fast procedural ad hoc methods that can painlessly be reimplemented the modeling approach is generic enough to handle muscles of varying shape size and characteristics and does not break in extreme skeleton poses the integrated musclebuilder system is also described whose main features are i easy and quick creation of muscle deformation models and ii automatic deformation of an overlying skin it is shown that visually realistic results can be obtained at interactive frame rates with very little input from the designer
graph based modeling has emerged as powerful abstraction capable of capturing in single and unified framework many of the relational spatial topological and other characteristics that are present in variety of datasets and application areas computationally efficient algorithms that find patterns corresponding to frequently occurring subgraphs play an important role in developing data mining driven methodologies for analyzing the graphs resulting from such datasets this paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have sufficient number of edge disjoint embeddings in single large undirected labeled sparse graph these algorithms use three different methods for determining the number of edge disjoint embeddings of subgraph and employ novel algorithms for candidate generation and frequency counting which allow them to operate on datasets with different characteristics and to quickly prune unpromising subgraphs experimental evaluation on real datasets from various domains show that both algorithms achieve good performance scale well to sparse input graphs with more than vertices or edges and significantly outperform previously developed algorithms
we present several powerful new techniques for similarity based modelling of surfaces using geodesic fans new framework for local surface comparison similarity based surface modelling provides intelligent surface manipulation by simultaneously applying modification to all similar areas of the surface we demonstrate similarity based painting deformation and filtering of surfaces and show how to vary our similarity measure to encompass geometry textures or other arbitrary signals geodesic fans are neighbourhoods uniformly sampled in the geodesic polar coordinates of point on surface we show how geodesic fans offer fast approximate alignment and comparison of surface neighbourhoods using simple spoke reordering as geodesic fans offer structurally equivalent definition of neighbourhoods everywhere on surface they are amenable to standard acceleration techniques and are well suited to extending image domain methods for modelling by example to surfaces
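A minimal sketch of the spoke-reordering comparison, assuming each geodesic fan is stored as a fixed-size array of signal values sampled along its spokes: alignment is approximated by trying every cyclic reordering of one fan's spokes and keeping the smallest difference.

```python
import numpy as np

def fan_distance(fan_a, fan_b):
    """fan_a, fan_b: (n_spokes, n_samples) arrays of a surface signal (e.g. curvature
    or texture) sampled along geodesic spokes around two points."""
    best = np.inf
    for shift in range(fan_b.shape[0]):                     # try each spoke rotation
        best = min(best, np.linalg.norm(fan_a - np.roll(fan_b, shift, axis=0)))
    return best
```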
array operations are used in large number of important scientific codes such as molecular dynamics finite element methods climate modeling etc to implement these array operations efficiently many methods have been proposed in the literature however the majority of these methods are focused on the two dimensional arrays when extended to higher dimensional arrays these methods usually do not perform well hence designing efficient algorithms for multidimensional array operations becomes an important issue in this paper we propose new scheme extended karnaugh map representation ekmr for the multidimensional array representation the main idea of the ekmr scheme is to represent multidimensional array by set of two dimensional arrays hence efficient algorithm design for multidimensional array operations becomes less complicated to evaluate the proposed scheme we design efficient algorithms for multidimensional array operations matrix matrix addition subtraction and matrix matrix multiplications based on the ekmr and the traditional matrix representation tmr schemes both theoretical analysis and experimental test for these array operations were conducted since fortran provides rich set of intrinsic functions for multidimensional array operations in the experimental test we also compare the performance of intrinsic functions provided by the fortran compiler and those based on the ekmr scheme the experimental results show that the algorithms based on the ekmr scheme outperform those based on the tmr scheme and those provided by the fortran compiler
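A sketch of the general idea of folding a multidimensional array into a two-dimensional one so that element-wise operations reduce to plain 2-D traversals; the exact EKMR index mapping in the paper may differ from the layout chosen here.

```python
import numpy as np

def flatten_4d(a):
    """Fold a 4-D array into a 2-D array by pairing axes (a plausible EKMR-style layout)."""
    n1, n2, n3, n4 = a.shape
    return a.transpose(0, 2, 1, 3).reshape(n1 * n3, n2 * n4)

def add_4d(a, b):
    """4-D addition carried out on the 2-D representation, then folded back."""
    n1, n2, n3, n4 = a.shape
    c2d = flatten_4d(a) + flatten_4d(b)
    return c2d.reshape(n1, n3, n2, n4).transpose(0, 2, 1, 3)
```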
several of the emerging mobile commerce services such as mobile auctions mobile financial services and multiparty interactive games will require support for dependable transactions this is difficult challenge because of both intermittent connectivity and potential failures in wireless infrastructure in this paper we present multinetwork access based wireless architecture and related protocols to support dependable transactions the key idea is to allow group users to utilize access to one or more wireless networks to complete different steps of transaction this allows for transactions to be completed even under time and location dependent connectivity problems and network failures the performance results show that access to multiple wireless networks leads to very high transaction completion probability even when individual wireless networks do not offer continuous and or highly available access the transaction completion probability is found to be dependent on the group size and number of steps in transaction and the same level of dependable performance for transactions can be achieved by increasing the number of wireless networks or improved access to individual networks the overhead for multi network access can be further reduced by creating preferred wireless networks and by reducing the number of critical users in different transaction stages
while mapping streaming application such as multimedia or network packet processing onto specified architecture an important issue is to determine the input stream rates that can be supported by the architecture for any given mapping this is subject to typical constraints such as on chip buffers should not overflow and specified playout buffers which feed audio or video devices should not underflow so that the quality of the audio video output is maintained the main difficulty in this problem arises from the high variability in execution times of stream processing algorithms coupled with the bursty nature of the streams to be processed in this paper we present mathematical framework for such rate analysis for streaming applications and illustrate its feasibility through detailed case study of mpeg decoder application when integrated into tool for automated design space exploration such an analysis can be used for fast performance evaluation of different stream processing architectures
synthesis is the process of automatically generating correct running system from its specification in this paper we suggest translation of live sequence chart specification into two player game for the purpose of synthesis we use this representation for synthesizing reactive system and introduce novel algorithm for composing two such systems for two subsets of specification even though this algorithm may fail to compose the systems or to prove the joint specification to be inconsistent we present some promising results for which the composition algorithm does succeed and saves significant running time we also discuss options for extending the algorithm into sound and complete one
in this article we give our vision of pervasive grid grid dealing with light mobile and uncertain devices using context awareness for delivering the right information using the right infrastructure to final users we focus on the data side of the problem since it encompasses number of problems related to such an environment from data access query optimization data placement to security adaptation
volunteer computing is powerful way to harness distributed resources to perform large scale tasks similarly to other types of community based initiatives volunteer computing is based on two pillars the first is computational allocating and managing large computing tasks the second is participative making large numbers of individuals volunteer their computer resources to project while the computational aspects of volunteer computing received much research attention the participative aspect remains largely unexplored in this study we aim to address this gap by drawing on social psychology and online communities research we develop and test three dimensional model of the factors determining volunteer computing users contribution we investigate one of the largest volunteer computing projects seti home by linking survey data about contributors motivations to their activity logs our findings highlight the differences between volunteer computing and other forms of community based projects and reveal the intricate relationship between individual motivations social affiliation tenure in the project and resource contribution implications for research and practice are discussed
while previous compiler research indicates that significant improvements in energy efficiency may be possible if properly optimized code is used the energy constraints under which given application code should be optimized may not always be available at compile time more importantly these constraints may change dynamically during the course of execution in this work we present dynamic recompilation linking framework using which the energy behavior of given application can be optimized while the application is being executed our preliminary experiments indicate that large energy gains are possible through dynamic code recompilation linking at the expense of relatively small increase in execution time
loop fusion combines corresponding iterations of different loops it is traditionally used to decrease program run time by reducing loop overhead and increasing data locality in this paper however we consider its effect on energy the uniformity or balance of demand for system resources on conventional superscalar processor increased balance tends to increase ipc and thus dynamic power so that fusion induced improvements in program energy are slightly smaller than improvements in program run time if ipc is held constant however by reducing frequency and voltage particularly on processor with multiple clock domains then energy improvements may significantly exceed run time improvements we demonstrate the benefits of increased program balance under theoretical model of processor energy consumption we then evaluate the benefits of fusion empirically on synthetic and real world benchmarks using our existing loop fusing compiler and heavily modified version of the simplescalar wattch simulator for the real world benchmarks we demonstrate energy savings ranging from with run times ranging from slowdown to speedup in addition to validating our theoretical model the simulation results allow us to tease apart the factors that contribute to fusion induced time and energy savings
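A minimal illustration of the transformation itself (the energy argument in the abstract concerns hardware frequency and voltage scaling, which this sketch does not model): two loops that share the array s are combined into one pass, reducing loop overhead and giving each iteration a more balanced mix of memory and arithmetic work.

```python
def unfused(a, b):
    s = [0.0] * len(a)
    for i in range(len(a)):      # loop 1: reads a, writes s
        s[i] = 2.0 * a[i]
    for i in range(len(b)):      # loop 2: re-reads s, reads and writes b
        b[i] = s[i] + b[i]

def fused(a, b):
    for i in range(len(a)):      # fused body: one pass over the data,
        s_i = 2.0 * a[i]         # the intermediate stays in a local variable,
        b[i] = s_i + b[i]        # and locality improves
```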
simulation of the musculoskeletal system has important applications in biomechanics biomedical engineering surgery simulation and computer graphics the accuracy of the muscle bone and tendon geometry as well as the accuracy of muscle and tendon dynamic deformation are of paramount importance in all these applications we present framework for extracting and simulating high resolution musculoskeletal geometry from the segmented visible human data set we simulate contact collision coupled muscles in the upper limb and describe computationally tractable implementation using an embedded mesh framework muscle geometry is embedded in nonmanifold connectivity preserving simulation mesh molded out of lower resolution bcc lattice containing identical well shaped elements leading to relaxed time step restriction for stability and thus reduced computational cost the muscles are endowed with transversely isotropic quasi incompressible constitutive model that incorporates muscle fiber fields as well as passive and active components the simulation takes advantage of new robust finite element technique that handles both degenerate and inverted tetrahedra
web applications are becoming increasingly popular for mobile wireless pdas however web browsing on these systems can be quite slow an alternative approach is handheld thin client computing in which the web browser and associated application logic run on server which then sends simple screen updates to the pda for display to assess the viability of this thin client approach we compare the web browsing performance of thin clients against fat clients that run the web browser locally on pda our results show that thin clients can provide better web browsing performance compared to fat clients both in terms of speed and ability to correctly display web content surprisingly thin clients are faster even when having to send more data over the network we characterize and analyze different design choices in various thin client systems and explain why these approaches can yield superior web browsing performance on mobile wireless pdas
imagine that you have been entrusted with private data such as corporate product information sensitive government information or symptom and treatment information about hospital patients you may want to issue queries whose result will combine private and public data but private data must not be revealed ghostdb is an architecture and system to achieve this you carry private data in smart usb key large flash persistent store combined with tamper and snoop resistant cpu and small ram when the key is plugged in you can issue queries that link private and public data and be sure that the only information revealed to potential spy is which queries you pose queries linking public and private data entail novel distributed processing techniques on extremely unequal devices standard computer and smart usb key this paper presents the basic framework to make this all work intuitively and efficiently
in many applications that track and analyze spatiotemporal data movements obey periodic patterns the objects follow the same routes approximately over regular time intervals for example people wake up at the same time and follow more or less the same route to their work everyday the discovery of hidden periodic patterns in spatiotemporal data apart from unveiling important information to the data analyst can facilitate data management substantially based on this observation we propose framework that analyzes manages and queries object movements that follow such patterns we define the spatiotemporal periodic pattern mining problem and propose an effective and fast mining algorithm for retrieving maximal periodic patterns we also devise novel specialized index structure that can benefit from the discovered patterns to support more efficient execution of spatiotemporal queries we evaluate our methods experimentally using datasets with object trajectories that exhibit periodicity
rootkits have become growing concern in cyber security typically they exploit kernel vulnerabilities to gain root privileges of system and conceal malware activities from users and system administrators without any authorization once infected these malware applications will operate completely in stealth leaving no trace for administrators and anti malware tools current anti rootkit solutions try to either strengthen the kernel by removing known vulnerabilities or develop software tools at the os or virtual machine monitor levels to monitor the integrity of the kernel seeing the failure of these software techniques we propose in this paper an autonomic architecture called shark or secure hardware support against rootkit by employing hardware support to provide system level security without trusting the software stack including the os kernel shark enhances the relationship between the os and the hardware architecture making the entire system more security aware in defending rootkits shark proposes new architectural support to provide secure association between each software context and the underlying hardware it helps system administrators to obtain feedback directly from the hardware to reveal all running processes even when the os kernel is compromised we emulated the functionality of shark by using bochs and modifying the linux kernel version based on our proposed architectural extension several real rootkits were installed to compromise the kernel and conceal malware processes on our emulated environment shark is shown to be highly effective in identifying variety of rootkits employing different software schemes in addition the performance analysis based on our simics simulations shows negligible overhead making the shark architecture highly practical
transactional memory tm is concurrency control mechanism that aims to simplify concurrent programming with reasonable scalability programmers can simply specify the code regions that access the shared data and then tm system executes them as transactions however programmers often need to modify the application logic to achieve high scalability on tm if there is any variable that is frequently updated in many transactions the program does not scale well on tm we propose an approach that uses ordered shared locks in tm systems to improve the scalability of such programs the ordered shared locks allow multiple transactions to update shared variable concurrently without causing rollbacks or blocking until other transactions finish our approach improves the scalability of tm by applying the ordered shared locks to variables that are frequently updated in many transactions while being accessed only once in each transaction we implemented our approach on software tm stm system for java in our experiments it improved the performance of an hsqldb benchmark by on threads and by on threads compared to the original stm system
the slack time in real time systems can be used by recovery schemes to increase system reliability as well as by frequency and voltage scaling techniques to save energy moreover the rate of transient faults ie soft errors caused for example by cosmic ray radiations also depends on system operating frequency and supply voltage thus there is an interesting trade off between system reliability and energy consumption this work first investigates the effects of frequency and voltage scaling on the fault rate and proposes two fault rate models based on previously published data then the effects of energy management on reliability are studied our analysis results show that energy management through frequency and voltage scaling could dramatically reduce system reliability and ignoring the effects of energy management on the fault rate is too optimistic and may lead to unsatisfied system reliability
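One commonly used exponential form of such a fault-rate model, reconstructed here as an illustration (the paper's exact model and constants may differ): with normalized frequency f in [f_min, 1] and matching voltage scaling, the transient-fault rate grows as frequency and voltage drop, and task reliability follows from assuming Poisson fault arrivals over the task's execution time c/f.

```latex
\lambda(f) \;=\; \lambda_0 \, 10^{\,d\,\frac{1-f}{1-f_{\min}}},
\qquad
R(f) \;=\; e^{-\lambda(f)\,c/f}
```

Here λ0 is the fault rate at maximum frequency and d > 0 controls how strongly scaling down raises the rate; both symbols are my notation, not necessarily the paper's.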
web user search customization research has been fueled by the recognition that if the www is to attain to its optimal potential as an interactive medium the development of new and or improved web resource classification page identification referencing indexing etc and retrieval delivery systems supportive of and responsive to user preference is of prime importance user preference as it relates to web user’s search agenda entails maintaining the user as director of his search and expert as to which web pages are relevant in our work web usage and web structure mining are employed in theoretically skillful way to produce strongly connected virtual bipartite clique biclique search neighborhood of high quality pages of relevance to the web user’s search objective our algorithm is designed to exploit linkage data inherent in web access logs using the combined log format cblf to assemble referer partite set of pages consistent with the user’s preference and search intent members user’s initial choice of web resource page and other relevant authority type pages and request partite set members pages with incoming links from the referer partite the web user’s initial page of choice becomes the first member of the referer partite and gatekeeper to the biclique neighborhood our algorithm uses web site’s collective user’s history log entries in collaborative manner to identify and further qualify pages of relevance for membership in the appropriate partite set web user search customization strategically fostered by our algorithm enhances the efficiency and productivity of web user’s activity in three ways it delivers high quality pages organized hierarchically to facilitate the user’s ready assessment of the web site’s benefit to his search objective thus minimizing time spent at an unfruitful site it facilitates ease of navigation in either breadth first depth first or combination of the two and it nullifies time spent locating and traversing paths to pages hosted in much too large distributed search environments
defeasible logic is rule based nonmonotonic logic with both strict and defeasible rules and priority relation on rules we show that inference in the propositional form of the logic can be performed in linear time this contrasts markedly with most other propositional nonmonotonic logics in which inference is intractable
we propose variational framework for the integration of multiple competing shape priors into level set based segmentation schemes by optimizing an appropriate cost functional with respect to both level set function and vector valued labeling function we jointly generate segmentation by the level set function and recognition driven partition of the image domain by the labeling function which indicates where to enforce certain shape priors our framework fundamentally extends previous work on shape priors in level set segmentation by directly addressing the central question of where to apply which prior it allows for the seamless integration of numerous shape priors such that while segmenting both multiple known and unknown objects the level set process may selectively use specific shape knowledge for simultaneously enhancing segmentation and recognizing shape
phishing attacks in which criminals lure internet users to web sites that spoof legitimate web sites are occurring with increasing frequency and are causing considerable harm to victims while great deal of effort has been devoted to solving the phishing problem by prevention and detection of phishing emails and phishing web sites little research has been done in the area of training users to recognize those attacks our research focuses on educating users about phishing and helping them make better trust decisions we identified number of challenges for end user security education in general and anti phishing education in particular users are not motivated to learn about security for most users security is secondary task it is difficult to teach people to identify security threats without also increasing their tendency to misjudge nonthreats as threats keeping these challenges in mind we developed an email based anti phishing education system called phishguru and an online game called anti phishing phil that teaches users how to use cues in urls to avoid falling for phishing attacks we applied learning science instructional principles in the design of phishguru and anti phishing phil in this article we present the results of phishguru and anti phishing phil user studies that demonstrate the effectiveness of these tools our results suggest that while automated detection systems should be used as the first line of defense against phishing attacks user education offers complementary approach to help people better recognize fraudulent emails and websites
placement is one of the most important steps in the rtl to gdsii synthesis process as it directly defines the interconnects which have become the bottleneck in circuit and system performance in deep submicron technologies the placement problem has been studied extensively in the past years however recent studies show that existing placement solutions are surprisingly far from optimal the first part of this tutorial summarizes results from recent optimality and scalability studies of existing placement tools these studies show that the results of leading placement tools from both industry and academia may be up to to away from optimal in total wirelength if such gap can be closed the corresponding performance improvement will be equivalent to several technology generation advancements the second part of the tutorial highlights the recent progress on large scale circuit placement including techniques for wirelength minimization routability optimization and performance optimization
in this paper we consider the problem of data allocation in environments of self motivated servers where information servers respond to queries from users new data items arrive frequently and have to be allocated in the distributed system the servers have no common interests and each server is concerned with the exact location of each of the data items there is also no central controller we suggest using negotiation framework which takes into account the passage of time during the negotiation process itself using this negotiation mechanism the servers have simple and stable negotiation strategies that result in efficient agreements without delays we provide heuristics for finding the details of the strategies which depend on the specific settings of the environment and which cannot be provided to the agents in advance we demonstrate the quality of the heuristics using simulations we consider situations characterized by complete as well as incomplete information and prove that our methods yield better results than the static allocation policy currently used for data allocation for servers in distributed systems
we present new routing protocol pathlet routing in which networks advertise fragments of paths called pathlets that sources concatenate into end to end source routes intuitively the pathlet is highly flexible building block capturing policy constraints as well as enabling an exponentially large number of path choices in particular we show that pathlet routing can emulate the policies of bgp source routing and several recent multipath proposals this flexibility lets us address two major challenges for internet routing scalability and source controlled routing when router’s routing policy has only local constraints it can be represented using small number of pathlets leading to very small forwarding tables and many choices of routes for senders crucially pathlet routing does not impose global requirement on what style of policy is used but rather allows multiple styles to coexist the protocol thus supports complex routing policies while enabling and incentivizing the adoption of policies that yield small forwarding plane state and high degree of path choice
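A minimal sketch of route construction from advertised fragments, assuming each pathlet is reduced to an edge from its first virtual node to its last: a breadth-first search concatenates pathlet identifiers into a source route that the sender would place in the packet. Names and data shapes are illustrative.

```python
from collections import deque

def build_route(pathlets, src, dst):
    """pathlets: dict pathlet_id -> list of vnodes (a path fragment)."""
    by_head = {}
    for pid, vnodes in pathlets.items():
        by_head.setdefault(vnodes[0], []).append(pid)
    queue, seen = deque([(src, [])]), {src}
    while queue:
        node, route = queue.popleft()
        if node == dst:
            return route                      # ordered list of pathlet ids
        for pid in by_head.get(node, []):
            tail = pathlets[pid][-1]
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, route + [pid]))
    return None                               # no concatenation of pathlets reaches dst
```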
error detection is an important activity of program development which is applied to detect incorrect computations or runtime failures of software the costs of debugging are strongly related to the complexity and the scale of the investigated programs both characteristics are especially cumbersome for large scale parallel programs with long runtimes which are quite common in computational science and engineering cse applications solution is offered by combination of techniques using the event graph model as representation of parallel program behaviour with process isolation subset of the original number of processes can be investigated while the absent processes are simulated by the debugging system with checkpointing an arbitrary temporal section of program’s runtime can be extracted for exhaustive analysis without the need to restart the program from the beginning additional benefits of the event graph are support of equivalent execution of nondeterministic programs as well as comprehensible visualisation as space time diagram
the quality of software architecture for component based distributed systems is defined not just by its source code but also by other systemic artifacts such as the assembly deployment and configuration of the application components and their component middleware in the context of distributed real time and embedded dre component based systems bin packing algorithms and schedulability analysis have been used to make deployment and configuration decisions however these algorithms make only coarse grained node assignments but do not indicate how components are allocated to different middleware containers on the node which are known to impact runtime system performance and resource consumption this paper presents model transformation based algorithm that combines user specified quality of service qos requirements with the node assignments to provide finer level of granularity and precision in the deployment and configuration decisions beneficial side effect of our work lies in how these decisions can be leveraged by additional backend performance optimization techniques we evaluate our approach and compare it against the existing state of the art in the context of representative dre system
we investigate complexity issues related to pure nash equilibria of strategic games we show that even in very restrictive settings determining whether game has pure nash equilibrium is np hard while deciding whether game has strong nash equilibrium is Σp2 complete we then study practically relevant restrictions that lower the complexity in particular we are interested in quantitative and qualitative restrictions of the way each player’s payoff depends on moves of other players we say that game has small neighborhood if the utility function for each player depends only on the actions of logarithmically small number of other players the dependency structure of game can be expressed by graph or by hypergraph by relating nash equilibrium problems to constraint satisfaction problems csps we show that if has small neighborhood and if has bounded hypertree width or if has bounded treewidth then finding pure nash and pareto equilibria is feasible in polynomial time if the game is graphical then these problems are logcfl complete and thus in the class nc of highly parallelizable problems
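A brute-force check for pure Nash equilibria in a game given in explicit form, included to make the decision problem concrete; its cost is exponential in the number of players, which is consistent with the hardness the abstract states for the general case.

```python
from itertools import product

def pure_nash_equilibria(n_actions, payoff):
    """n_actions: tuple of action counts per player; payoff(profile) -> utility tuple."""
    equilibria = []
    for profile in product(*(range(a) for a in n_actions)):
        u = payoff(profile)
        stable = True
        for p in range(len(n_actions)):
            for alt in range(n_actions[p]):
                if alt != profile[p]:
                    dev = list(profile)
                    dev[p] = alt
                    if payoff(tuple(dev))[p] > u[p]:   # profitable unilateral deviation
                        stable = False
                        break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

# usage: prisoner's dilemma (action 1 = defect); the only pure equilibrium is (1, 1)
pd = {(0, 0): (-1, -1), (0, 1): (-3, 0), (1, 0): (0, -3), (1, 1): (-2, -2)}
print(pure_nash_equilibria((2, 2), lambda prof: pd[prof]))
```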
the limited screen size and resolution of current mobile devices can still be problematic for map multimedia and browsing applications in this paper we present touch interact an interaction technique in which mobile phone is able to touch display at any position to perform selections through the combination of the output capabilities of the mobile phone and display applications can share the entire display space moreover there is potential to realize new interaction techniques between the phone and display for example select pick and select drop are interactions whereby entities can be picked up onto the phone or dropped onto the display we report the implementation of touch interact its usage for tourist guide application and experimental comparison the latter shows that the performance of touch interact is comparable to approaches based on touch screen it also shows the advantages of our system regarding ease of use intuitiveness and enjoyment
this paper presents new approach to combining outputs of existing word alignment systems each alignment link is represented with set of feature functions extracted from linguistic features and input alignments these features are used as the basis of alignment decisions made by maximum entropy approach the learning method has been evaluated on three language pairs yielding significant improvements over input alignments and three heuristic combination methods the impact of word alignment on mt quality is investigated using phrase based mt system
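A minimal sketch of scoring a single alignment link with a log-linear (maximum entropy) model over features extracted from the input alignments and linguistic cues; the weights are assumed to be trained elsewhere, and all names are illustrative.

```python
import math

def link_probability(weights, features):
    """Binary maximum-entropy model: p(keep link | features)."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))

def combine_alignments(weights, candidate_links, feature_fn, threshold=0.5):
    """Keep every candidate link the model trusts more than the threshold."""
    return {link for link in candidate_links
            if link_probability(weights, feature_fn(link)) >= threshold}
```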
this paper proposes new discrete optimization framework for tomographic reconstruction and segmentation of ct volumes when only few projection views are available the problem has important clinical applications in coronary angiographic imaging we first show that the limited view reconstruction and segmentation problem can be formulated as constrained version of the metric labeling problem this lays the groundwork for linear programming framework that brings metric labeling classification and classical algebraic tomographic reconstruction art together in unified model if the imaged volume is known to be comprised of finite set of attenuation coefficients realistic assumption given regular limited view reconstruction we view it as task of voxels reassignment subject to maximally maintaining consistency with the input reconstruction and the objective of art simultaneously the approach can reliably reconstruct or segment volumes with several multiple contrast objects we present evaluations using experiments on cone beam computed tomography
in this paper we investigate position based enhancements to wi fi network security specifically we investigate whether received signal strength rss measurements can identify attempts at network access by malicious nodes exterior to an authorised network perimeter we assume the malicious nodes will spoof their received or transmitted power levels in attempts to circumvent standard position based security techniques we outline why residual analysis of the rss measurements cannot robustly identify illegal network access requests however we show that by referring the residual rss analysis to claimed position interior to the authorised perimeter robust position based verification system for secure network access can be developed indoor systems based on rss fingerprints and differential rss fingerprints are studied outdoor systems under the assumption of log normal shadowing are also investigated
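A minimal sketch of residual analysis referred to a claimed position, assuming a log-distance path-loss model with log-normal shadowing; the transmit power, path-loss exponent and acceptance threshold are illustrative parameters, not values from the paper.

```python
import numpy as np

def expected_rss(p_tx_dbm, d_m, n_path=3.0, d0=1.0, pl_d0=40.0):
    """Log-distance path-loss model: mean RSS at distance d_m (metres)."""
    return p_tx_dbm - pl_d0 - 10.0 * n_path * np.log10(np.maximum(d_m, d0) / d0)

def verify_claimed_position(measured_rss, ap_positions, claimed_pos,
                            p_tx_dbm=20.0, max_residual_db=8.0):
    """Accept the access request only if the RSS residuals, computed against the
    position the node claims inside the authorised perimeter, stay small."""
    d = np.linalg.norm(ap_positions - claimed_pos, axis=1)
    residuals = measured_rss - expected_rss(p_tx_dbm, d)
    return float(np.sqrt(np.mean(residuals ** 2))) <= max_residual_db
```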
this paper presents case study of the process of insightful analysis of clinical data collected in regular hospital practice the approach is applied to database describing patients suffering from brain ischaemia either permanent as brain stroke with positive computer tomography ct or reversible ischaemia with normal brain ct test the goal of the analysis is the extraction of useful knowledge that can help in diagnosis prevention and better understanding of the vascular brain disease this paper demonstrates the applicability of subgroup discovery for insightful data analysis and describes the expert’s process of converting the induced rules into useful medical knowledge detection of coexisting risk factors selection of relevant discriminative points for numerical descriptors as well as the detection and description of characteristic patient subpopulations are important results of the analysis graphical representation is extensively used to illustrate the detected dependencies in the available clinical data
as multi core architectures with thread level speculation tls are becoming better understood it is important to focus on tls compilation tls compilers are interesting in that while they do not need to fully prove the independence of concurrent tasks they make choices of where and when to generate speculative tasks that are crucial to overall tls performance this paper presents posh new fully automated tls compiler built on top of gcc posh is based on two design decisions first to partition the code into tasks it leverages the code structures created by the programmer namely subroutines and loops second it uses simple profiling pass to discard ineffective tasks with the code generated by posh simulated tls chip multiprocessor with superscalar cores delivers an average speedup of for the specint applications moreover an estimated of this speedup is result of the implicit data prefetching provided by squashed tasks
nowadays more and more news sites publish news stories using news rss feeds for easier access and subscription on the web generally news stories are grouped by several categories and each category corresponds to one news rss feed however there are no uniform standards for categorization each news site has its own way of categorization for grouping news stories these dissimilar categorizations cannot always satisfy every individual user and generally the provided categories are not detailed enough for personal use in this paper we propose method for users to create customizable personal news rss feeds using existing ones we implemented news directory system nds which can retrieve news stories by rss feeds and classify them using this system we can recategorize news stories from original rss feeds or subdivide one rss feed to more detailed level with the classification information for each news article we offer customizable personal news rss feeds to subscribers
mobile ad hoc networks manets are very dynamic networks with devices continuously entering and leaving the group the highly dynamic nature of manets renders the manual creation and update of policies associated with the initial incorporation of devices to the manet admission control as well as with anomaly detection during communications among members access control very difficult task in this paper we present barter mechanism that automatically creates and updates admission and access control policies for manets based on behavior profiles barter is an adaptation for fully distributed environments of our previously introduced bb nac mechanism for nac technologies rather than relying on centralized nac enforcer manet members initially exchange their behavior profiles and compute individual local definitions of normal network behavior during admission or access control each member issues an individual decision based on its definition of normalcy individual decisions are then aggregated via threshold cryptographic infrastructure that requires an agreement among fixed amount of manet members to change the status of the network we present experimental results using content and volumetric behavior profiles computed from the enron dataset in particular we show that the mechanism achieves true rejection rates of with false rejection rates of
well distributed point sets play an important role in variety of computer graphics contexts such as anti aliasing global illumination halftoning non photorealistic rendering point based modeling and rendering and geometry processing in this paper we introduce novel technique for rapidly generating large point sets possessing blue noise fourier spectrum and high visual quality our technique generates non periodic point sets distributed over arbitrarily large areas the local density of point set may be prescribed by an arbitrary target density function without any preset bound on the maximum density our technique is deterministic and tile based thus any local portion of potentially infinite point set may be consistently regenerated as needed the memory footprint of the technique is constant and the cost to generate any local portion of the point set is proportional to the integral over the target density in that area these properties make our technique highly suitable for variety of real time interactive applications some of which are demonstrated in the paper our technique utilizes set of carefully constructed progressive and recursive blue noise wang tiles the use of wang tiles enables the generation of infinite non periodic tilings the progressive point sets inside each tile are able to produce spatially varying point densities recursion allows our technique to adaptively subdivide tiles only where high density is required and makes it possible to zoom into point sets by an arbitrary amount while maintaining constant apparent density
recently scientific workflows have emerged as platform for automating and accelerating data processing and data sharing in scientific communities many scientific workflows have been developed for collaborative research projects that involve number of geographically distributed organizations sharing of data and computation across organizations in different administrative domains is essential in such collaborative environment because of the competitive nature of scientific research it is important to ensure that sensitive information in scientific workflows can be accessed by and propagated to only authorized parties to address this problem we present techniques for analyzing how information propagates in scientific workflows we also present algorithms for incrementally analyzing how information propagates upon every change to an existing scientific workflow
instruction wakeup logic consumes large amount of energy in out of order processors existing solutions to the problem require prediction or additional hardware complexity to reduce the energy consumption and in some cases may have negative impact on processor performance this paper proposes new mechanism for instruction wakeup which uses partitioned instruction queue iq the energy consumption of an iq partition block is proportional to the number of entries in it all the blocks are turned off until the mechanism determines which blocks to access on wakeup using simple successor tracking mechanism the proposed approach is shown to require as little as comparisons per committed instruction for spec benchmarks the energy consumption and timing of the partitioned iq design are evaluated using cacti models for μm process the average energy savings observed were and respectively for entry integer and floating point partitioned iqs
wireless sensor networks wsn have become significant research challenge attracting many researchers this paper provides an overview of collaborative wsn reviewing the algorithms techniques and state of the art developed so far we discuss the research challenges and opportunities in this area the major focus is given to cooperative signal and information processing in collaborative wsn in order to expose important constraints in wireless applications which require distributed computing such as node localization target detection and tracking in this paper we also present and discuss the applications of multi agent systems into wsn as core technology of cooperative information processing
with the ubiquitous interplay of structured semi structured and unstructured data from different sources neither db style structured query requiring knowledge of full schema and complex language nor ir style keyword search ignoring latent structures can satisfy users in this paper we present novel semi structured search engine se that provides easy flexible precise and rapid access to heterogeneous data represented by semi structured graph model by using an intuitive se query language sql users are able to pose queries on heterogeneous data in varying degree of structural constraint according to their knowledge of schema se evaluates sql queries as the top answers composed of logical units and relationship paths between them and thus can extract meaningful information even if the query conditions are vague ambiguous and inaccurate
efficient collaboration allows organizations and individuals to improve the efficiency and quality of their business activities delegations as significant approach may occur as workflow collaborations supply chain collaborations or collaborative commerce role based delegation models have been used as flexible and efficient access management for collaborative business environments delegation revocations can provide significant functionalities for the models in business environments when the delegated roles or permissions are required to get back however problems may arise in the revocation process when one user delegates user role and another user delegates negative authorization of the role this paper aims to analyse various role based delegation revocation features through examples revocations are categorized in four dimensions dependency resilience propagation and dominance according to these dimensions sixteen types of revocations exist for specific requests in collaborative business environments dependentweaklocaldelete dependentweaklocalnegative dependentweakglobaldelete dependentweakglobalnegative independentweaklocaldelete independentweaklocalnegative independentweakglobaldelete independentweakglobalnegative and so on we present revocation delegating models and then discuss user delegation authorization and the impact of revocation operations finally comparisons with other related work are discussed
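The sixteen revocation types follow mechanically from the four binary dimensions; a tiny enumeration makes this explicit (taking strong as the complement of weak, which the abstract implies but does not spell out).

```python
from itertools import product

DIMENSIONS = [("dependent", "independent"),   # dependency
              ("weak", "strong"),             # resilience
              ("local", "global"),            # propagation
              ("delete", "negative")]         # dominance

REVOCATION_TYPES = ["".join(parts) for parts in product(*DIMENSIONS)]
assert len(REVOCATION_TYPES) == 16            # e.g. "dependentweaklocaldelete"
```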
the advent of sensor networks presents untapped opportunities for synthesis we examine the problem of synthesis of behavioral specifications into networks of programmable sensor blocks the particular behavioral specification we consider is an intuitive user created network diagram of sensor blocks each block having pre defined combinational or sequential behavior we synthesize this specification to new network that utilizes minimum number of programmable blocks in place of the pre defined blocks thus reducing network size and hence network cost and power we focus on the main task of this synthesis problem namely partitioning pre defined blocks onto minimum number of programmable blocks introducing the efficient but effective paredown decomposition algorithm for the task we describe the synthesis and simulation tools we developed we provide results showing excellent network size reductions through such synthesis and significant speedups of our algorithm over exhaustive search while obtaining near optimal results for real network designs as well as nearly randomly generated designs
the in network aggregation paradigm in sensor networks provides versatile approach for evaluating aggregate queries traditional approaches need separate aggregate to be computed and communicated for each query and hence do not scale well with the number of queries since approximate query results are sufficient for many applications we use an alternate approach based on summary data structures we consider two kinds of aggregate queries value range queries that compute the number of sensors that report values in the given range and location range queries that compute the sum of values reported by sensors in given location range we construct summary data structures called linear sketches over the sensor data using in network aggregation and use them to answer aggregate queries in an approximate manner at the base station there is trade off between accuracy of the query results and lifetime of the sensor network that can be exploited to achieve increased lifetimes for small loss in accuracy experimental results show that linear sketching achieves significant improvements in lifetime of sensor networks for only small loss in accuracy of the queries further our approach achieves more accurate query results than the other classical techniques using discrete fourier transform and discrete wavelet transform
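One simple instance of a linear sketch, given here as an illustration rather than the authors' construction: a count-min style histogram over discretized sensor values. Because the tables add component-wise, partial sketches can be merged inside the network, and a value-range query is answered by summing point estimates over the range.

```python
import numpy as np

class LinearSketch:
    """Count-min style histogram sketch; sketches from different sensors add linearly."""
    def __init__(self, depth=4, width=64, seed=0):
        rng = np.random.default_rng(seed)
        self.salt = rng.integers(1, 1 << 31, size=depth)   # toy hashing, illustration only
        self.table = np.zeros((depth, width), dtype=np.int64)

    def add_reading(self, bin_id, count=1):
        for r, s in enumerate(self.salt):
            self.table[r, (bin_id * int(s) + r) % self.table.shape[1]] += count

    def merge(self, other):
        self.table += other.table            # in-network aggregation is just addition

    def point_estimate(self, bin_id):
        return min(self.table[r, (bin_id * int(s) + r) % self.table.shape[1]]
                   for r, s in enumerate(self.salt))

    def range_count(self, lo_bin, hi_bin):
        """Approximate number of readings whose value fell in [lo_bin, hi_bin]."""
        return sum(self.point_estimate(b) for b in range(lo_bin, hi_bin + 1))
```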
wireless ad hoc sensor networks have emerged as one of the key growth areas for wireless networking and computing technologies so far these networks systems have been designed with static and custom architectures for specific tasks thus providing inflexible operation and interaction capabilities our vision is to create sensor networks that are open to multiple transient users with dynamic needs working towards this vision we propose framework to define and support lightweight and mobile control scripts that allow the computation communication and sensing resources at the sensor nodes to be efficiently harnessed in an application specific fashion the replication migration of such scripts in several sensor nodes allows the dynamic deployment of distributed algorithms into the network our framework sensorware defines creates dynamically deploys and supports such scripts our implementation of sensorware occupies less than kbytes of code memory and thus easily fits into several sensor node platforms extensive delay measurements on our ipaq based prototype sensor node platform reveal the small overhead of sensorware to the algorithms less than msec in most high level operations in return the programmer of the sensor network receives compactness of code abstraction services for all of the node’s modules and in built multi user support sensorware with its features apart from making dynamic programming possible it also makes it easy and efficient without restricting the expressiveness of the algorithms
the objective of this paper is to use computer vision to detect and localize multiple objects within an image in the presence of cluttered background substantial occlusion and significant scale changes our approach consists of first generating set of hypotheses for each object using generative model plsa with bag of visual words representing each image then the discriminative part verifies each hypothesis using multi class svm classifier with merging features that combines both spatial shape and color appearance of an object in the post processing stage environmental context information is used to improve the performance of the system combination of features and context information are used to investigate the performance on our local database the best performance is obtained using object specific weighted merging features and the context information our approach overcomes the limitations of some state of the art methods
Since the last in-depth studies of measured TCP traffic some years ago, the Internet has experienced significant changes, including the rapid deployment of backbone links with orders of magnitude more capacity, the emergence of bandwidth-intensive streaming applications, and the massive penetration of new TCP variants. These and other changes beg the question of whether the characteristics of measured TCP traffic in today's Internet reflect these changes or have largely remained the same. To answer this question, we collected and analyzed packet traces from a number of Internet backbone and access links, focusing on the heavy-hitter flows responsible for the majority of traffic. We then analyzed their within-flow packet dynamics and observed the following features: in one of our datasets, up to of flows have an initial congestion window (ICW) size larger than the upper bound specified by the RFC; among flows that encounter retransmission rates of more than , of them exhibit irregular retransmission behavior in which the sender does not slow down its sending rate during retransmissions; and TCP flow clocking (i.e., regular spacing between flights of packets) can be caused by both RTT and non-RTT factors, such as the application or link layer, with of the flows studied showing no pronounced flow clocking. To arrive at these findings, we developed novel techniques for analyzing unidirectional TCP flows, including a technique for inferring ICW size, a method for detecting irregular retransmissions, and a new approach for accurately extracting flow clocks.
context and motivation the principle of divide and conquer suggests that complex software problems should be decomposed into simpler problems and those problems should be solved before considering how they can be composed the eventual composition may fail if solutions to simpler problems interact in unexpected ways question problem given descriptions of individual problems early identification of situations where composition might fail remains an outstanding issue principal ideas results in this paper we present tool supported approach for early identification of all possible interactions between problems where the composition cannot be achieved fully our tool called the openpf provides simple diagramming editor for drawing problem diagrams and describing them using the event calculus ii structures the event calculus formulae of individual problem diagrams for the abduction procedure and iii communicates with an off the shelf abductive reasoner in the background and relates the results of the abduction procedure to the problem diagrams the theory and the tool framework proposed are illustrated with an interaction problem from smart home application contribution this tool highlights at an early stage the parts in problem diagrams that will interact when composed together
Applications that process complex inputs often react in different ways to changes in different regions of the input. Small changes to forgiving regions induce correspondingly small changes in the behavior and output; small changes to critical regions, on the other hand, can induce disproportionately large changes in the behavior or output. Identifying the critical and forgiving regions in the input, and the corresponding critical and forgiving regions of code, is directly relevant to many software engineering tasks. We present a system, Snap, for automatically grouping related input bytes into fields and classifying each field and the corresponding regions of code as critical or forgiving. Given an application and one or more inputs, Snap uses targeted input fuzzing in combination with dynamic execution and influence tracing to classify regions of input fields and code as critical or forgiving. Our experimental evaluation shows that Snap makes classifications with close to perfect precision and very good recall (between and , depending on the application).
In this paper, we describe how the BBC is working to integrate data and link documents across BBC domains by using Semantic Web technology, in particular Linked Data, MusicBrainz, and DBpedia. We cover the work of BBC Programmes and BBC Music in building Linked Data sites for all music- and programme-related brands, and we describe existing projects, ongoing development, and further research carried out in joint collaboration between the BBC, Freie Universität Berlin, and Rattle Research, in order to use DBpedia as the controlled vocabulary and semantic backbone for the whole BBC.
the method of logical relations assigns relational interpretation to types that expresses operational invariants satisfied by all terms of type the method is widely used in the study of typed languages for example to establish contextual equivalences of terms the chief difficulty in using logical relations is to establish the existence of suitable relational interpretation we extend work of pitts and birkedal and harper on constructing relational interpretations of types to polymorphism and recursive types and apply it to establish parametricity and representation independence properties in purely operational setting we argue that once the existence of relational interpretation has been established it is straightforward to use it to establish properties of interest
Implicit coscheduling is a distributed algorithm for time-sharing communicating processes in a cluster of workstations. By observing and reacting to implicit information, local schedulers in the system make independent decisions that dynamically coordinate the scheduling of communicating processes. The principal mechanism involved is two-phase spin-blocking: a process waiting for a message response spins for some amount of time and then relinquishes the processor if the response does not arrive. In this paper, we describe our experience implementing implicit coscheduling on a cluster of UltraSPARC workstations. This has led to contributions in three main areas. First, we more rigorously analyze the two-phase spin-block algorithm and show that spin time should be increased when a process is receiving messages. Second, we present performance measurements for a wide range of synthetic benchmarks and for seven Split-C parallel applications. Finally, we show how implicit coscheduling behaves under different job layouts and scaling, and discuss preliminary results for achieving fairness.
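A minimal sketch of the two-phase spin-block mechanism is given below. The callbacks poll_response, poll_incoming, and yield_processor are hypothetical stand-ins for the runtime's primitives, and the baseline spin interval and the bonus added while messages keep arriving are illustrative values, not the paper's tuned parameters.

```python
import time

def two_phase_spin_block(poll_response, poll_incoming, yield_processor,
                         base_spin_us=200, incoming_bonus_us=100):
    """Spin for a baseline interval, extending the spin while other messages
    arrive; if the awaited response still has not shown up, block (yield).

    poll_response() -> response or None   (the reply this process waits for)
    poll_incoming() -> bool               (did some other message just arrive?)
    yield_processor() -> response         (relinquish the CPU until woken up)
    """
    deadline = time.monotonic() + base_spin_us / 1e6
    while time.monotonic() < deadline:                 # phase 1: spin
        resp = poll_response()
        if resp is not None:
            return resp
        if poll_incoming():
            # The analysis suggests spinning longer while a process is
            # receiving messages, since its partners are likely scheduled.
            deadline += incoming_bonus_us / 1e6
    return yield_processor()                           # phase 2: block
```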
the policy continuum is fundamental component of any policy based management implementation for autonomic networking but as of yet has no formal operational semantics we propose policy continuum model and accompanying policy authoring process that demonstrates the key properties that set continuum apart from non hierarchical policy model as part of the policy authoring process we present policy conflict analysis algorithm that leverages the information model making it applicable to arbitrary applications and continuum levels the approach for policy conflict analysis entails analysing candidate policy either newly created or modified on pair wise basis with already deployed policies and potential conflicts between the policies are fed back to the policy author central to the approach is two phase algorithm which firstly determines the relationships between the pair of policies and secondly applies an application specific conflict pattern to determine if the policies should be flagged as potentially conflicting in this paper we present the formal policy continuum and two phase conflict analysis algorithm as part of the policy authoring process we describe an implementation where we demonstrate the detection of potential conflicts within policy continuum
Energy consumption is one of the most important issues in resource-constrained embedded systems. Many such systems run Java-based applications because of Java's architecture-independent format, bytecode. Standard techniques for executing bytecode programs (e.g., interpretation or just-in-time compilation) have performance or memory issues that make them unsuitable for resource-constrained embedded systems. A superoperator-extended lightweight Java virtual machine (JVM) can be used in resource-constrained embedded systems to improve performance and reduce memory consumption. This paper shows that such a JVM also significantly reduces energy consumption. This is due primarily to a considerable reduction in the number of memory accesses, and thus in the energy consumed in the instruction and data TLBs and caches and, in most cases, in DRAM energy consumption. Since the fraction of processor energy dissipated in these units is approximately , the energy savings achieved are significant. The paper evaluates the number of load, store, and computational instructions eliminated by the use of the proposed superoperators, as compared to a simple interpreter, on a set of embedded benchmarks. Using cache and DRAM per-access energy, we estimate the total processor and DRAM energy saved by using our JVM. Our results show that with kB caches, the reduction in energy consumption ranges from to of the overall processor-plus-DRAM energy. Even higher savings may be achieved with smaller caches and increased access to DRAM, as DRAM access energy is fairly high.
content based shape retrieval for broad domains like the world wide web has recently gained considerable attention in computer graphics community one of the main challenges in this context is the mapping of objects into compact canonical representations referred to as descriptors which serve as search keys during the retrieval process the descriptors should have certain desirable properties like invariance under scaling rotation and translation very importantly they should possess descriptive power providing basis for similarity measure between three dimensional objects which is close to the human notion of resemblancein this paper we advocate the usage of so called zernike invariants as descriptors for content based shape retrieval the basis polynomials of this representation facilitate computation of invariants under the above transformations some theoretical results have already been summarized in the past from the aspect of pattern recognition and shape analysis we provide practical analysis of these invariants along with algorithms and computational details furthermore we give detailed discussion on influence of the algorithm parameters like type and resolution of the conversion into volumetric function number of utilized coefficients etc as is revealed by our study the zernike descriptors are natural extensions of spherical harmonics based descriptors which are reported to be among the most successful representations at present we conduct comparison of zernike descriptors against these regarding computational aspects and shape retrieval performance
we describe system level simulation model and show that it enables accurate predictions of both subsystem and overall system performance in contrast the conventional approach for evaluating the performance of an subsystem design which is based on standalone subsystem models is often unable to accurately predict performance changes because it is too narrow in scope in particular conventional methodology treats all requests equally ignoring differences in how individual requests response times affect system behavior including both system performance and the subsequent workload we introduce the concept of request criticality to describe these feedback effects and show that real workloads are not approximated well by either open or closed input models because conventional methodology ignores this fact it often leads to inaccurate performance predictions and can thereby lead to incorrect conclusions and poor design choices we illustrate these problems with real examples and show that system level model which includes both the subsystem and other important system components eg cpus and system software properly captures the feedback and subsequent performance effects
Measurement, collection, and interpretation of network usage data commonly involve multiple stages of sampling and aggregation. Examples include sampling packets and aggregating them into flow statistics at a router, and the sampling and aggregation of usage records in a network data repository for reporting, query, and archiving. Although unbiased estimates of packet, byte, and flow usage can be formed for each sampling operation, for many applications it is crucial to know the inherent estimation error. Previous work in this area has been limited mainly to analyzing the estimator variance for particular methods (e.g., independent packet sampling). However, the variance is of limited use for more general sampling methods where the estimate may not be well approximated by a Gaussian distribution. This motivates our paper, in which we establish Chernoff bounds on the likelihood of estimation error in a general multistage combination of measurement sampling and aggregation. We derive the scale against which errors are measured in terms of the constituent sampling and aggregation operations. In particular, this enables us to obtain rigorous confidence intervals around any given estimate. We apply our method to a number of sampling schemes, both in the literature and currently deployed, including sampling of packet-sampled NetFlow records, sample-and-hold, and flow slicing. We obtain one particularly striking result in the first case: for a range of parameterizations, packet sampling has no additional impact on the estimator confidence derived from our bound, beyond that already imposed by flow sampling.
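As a reminder of the flavor of guarantee involved (a standard multiplicative Chernoff bound for sums of independent bounded contributions, not the paper's specific multistage derivation), one has:

```latex
\Pr\bigl[\,|\hat{X}-\mu| \ge \delta\mu\,\bigr]
  \;\le\; 2\exp\!\Bigl(-\frac{\delta^{2}\mu}{3c}\Bigr),
  \qquad 0 < \delta \le 1,\;\; \mu = \mathbb{E}[\hat{X}],
```

where the estimate \(\hat{X}\) is a sum of independent contributions each bounded by \(c\). Inverting at a target failure probability \(\varepsilon\) gives a relative-error scale \(\delta = \sqrt{3c\ln(2/\varepsilon)/\mu}\), i.e., a rigorous confidence interval whose width depends on the per-contribution bound introduced by the constituent sampling operations.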
in this paper we describe the program structure tree pst hierarchical representation of program structure based on single entry single exit sese regions of the control flow graph we give linear time algorithm for finding sese regions and for building the pst of arbitrary control flow graphs including irreducible ones next we establish connection between sese regions and control dependence equivalence classes and show how to use the algorithm to find control regions in linear time finally we discuss some applications of the pst many control flow algorithms such as construction of static single assignment form can be speeded up by applying the algorithms in divide and conquer style to each sese region on its own the pst is also used to speed up data flow analysis by exploiting ldquo sparsity rdquo experimental results from the perfect club and spec benchmarks confirm that the pst approach finds and exploits program structure
In this paper, we present a new point set surface (PSS) definition based on moving least squares (MLS) fitting of algebraic spheres. Our surface representation can be expressed either by a projection procedure or in implicit form. The central advantages of our approach compared to existing planar MLS include significantly improved stability of the projection under low sampling rates and in the presence of high curvature. The method can approximate or interpolate the input point set and naturally handles planar point clouds. In addition, our approach provides a reliable estimate of the mean curvature of the surface at no additional cost and allows for the robust handling of sharp features and boundaries. It processes a simple point set as input, but can also take significant advantage of surface normals to improve robustness, quality, and performance. We also present a novel normal estimation procedure which exploits the properties of the spherical fit for both direction estimation and orientation propagation. Very efficient computational procedures enable us to compute the algebraic sphere fitting with up to million points per second on latest-generation GPUs.
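To make the representation concrete, here is one standard way to write an algebraic sphere and its weighted least squares fit; the notation is generic and may differ from the paper's exact formulation:

```latex
% Scalar field of an algebraic sphere; its zero set is a sphere,
% degenerating to a plane when u_4 = 0.
s_{\mathbf{u}}(\mathbf{x}) \;=\; u_0 \;+\; \mathbf{u}_{\ell}^{\top}\mathbf{x}
  \;+\; u_4\,\lVert\mathbf{x}\rVert^{2}

% Local MLS fit at evaluation point x, with weights w_i(x) and input points p_i,
% together with some normalization that rules out the trivial solution u = 0.
\mathbf{u}(\mathbf{x}) \;=\; \arg\min_{\mathbf{u}}\;
  \sum_i w_i(\mathbf{x})\, s_{\mathbf{u}}(\mathbf{p}_i)^{2}
```

The planar degenerate case (u_4 = 0) is what lets planar point clouds be handled naturally, and the radius of the locally fitted sphere gives a curvature estimate essentially for free.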
synchronization ensures exclusive shared variable access at runtime and static access control mechanisms give similar guarantees at compilation time usually we treat these language concepts as separate in this work we propose to integrate synchronization into access control in java like language shared variable access depends on the availability of tokens as form of access control and the compiler generates code for locking to gain the needed tokens synchronization we get more freedom in expressing synchronization at appropriate points in program and weaker influence of concurrency on the program structure
in recent years the internet has become one of the most important sources of information and it is now imperative that companies are able to collect retrieve process and manage information from the web however due to the sheer amount of information available browsing web content by searches using keywords is inefficient largely because unstructured html web pages are written for human comprehension and not for direct machine processing for the same reason the degree of web automation is limited it is recognized that semantics can enhance web automation but it will take an indefinite amount of effort to convert the current html web into the semantic web this study proposes novel ontology extractor called ontospider for extracting ontology from the html web the contribution of this work is the design and implementation of six phase process that includes the preparation transformation clustering recognition refinement and revision for extracting ontology from unstructured html pages the extracted ontology provides structured and relevant information for applications such as commerce and knowledge management that can be compared and analyzed more effectively we give detailed information on the system and provide series of experimental results that validate the system design and illustrate the effectiveness of ontospider
video streaming is vital for many important applications such as distance learning digital video libraries and movie on demand since video streaming requires significant server and networking resources caching has been used to reduce the demand on these resources in this paper we propose novel collaboration scheme for video caching on overlay networks called overlay caching scheme ocs to further minimize service delays and loads placed on an overlay network for video streaming applications ocs is not centralized nor hierarchical collaborative scheme despite its design simplicity ocs effectively uses an aggregate storage space and capability of distributed overlay nodes to cache popular videos and serve nearby clients moreover ocs is light weight and adaptive to clients locations and request patterns we also investigate other video caching techniques for overlay networks including both collaborative and non collaborative ones compared with these techniques on topologies inspired from actual networks ocs offers extremely low average service delays and approximately half the server load ocs also offers smaller network load in most cases in our study
We propose a new graph-based semi-supervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a weighted undirected graph, and our proposed framework minimizes the weighted Kullback-Leibler divergence between distributions that encode the class membership probabilities of each vertex. The proposed objective is convex, with guaranteed convergence using an alternating minimization procedure. Further, it generalizes in a straightforward manner to multi-class problems. We present results on two standard tasks, namely Reuters and WebKB, showing that the proposed algorithm significantly outperforms the state of the art.
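A generic form of such a graph-regularized KL objective is shown below; this is illustrative only, and the authors' exact formulation may include additional entropy or normalization terms.

```latex
\min_{\{p_i\}}\;\;
  \sum_{i \in L} \mathrm{KL}\!\left(r_i \,\middle\|\, p_i\right)
  \;+\; \mu \sum_{(i,j) \in E} w_{ij}\, \mathrm{KL}\!\left(p_i \,\middle\|\, p_j\right)
```

Here each \(p_i\) is a probability distribution over classes at vertex \(i\), \(r_i\) encodes the observed label for a labeled vertex \(i \in L\), \(w_{ij}\) are edge weights, and \(\mu\) trades off fit to the labels against smoothness over the graph. Since KL divergence is jointly convex in its arguments, objectives of this kind remain convex and lend themselves to alternating-minimization updates over subsets of the distributions.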
component dependency is an important software measure it is directly related to software understandability maintainability and reusability two important parameters in describing component dependency are the type of coupling between two components and the type of the dependent component depending upon the different types of coupling and the type of the dependent components there can be different effects on component maintenance and component reuse in this paper we divide dependent components into three types we then classify various component dependencies and analyze their effects on maintenance and reuse based on the classification we present dependency metric and validate it on open source java components our study shows that strong correlation exists between the measurement of the dependency of the component and the effort to reuse the component this indicates that the classification of component dependency and the suggested metric could be further used to represent other external software quality factors
model based development promises to increase productivity by offering modeling languages tailored to specific domain such modeling languages are typically defined by metamodel in response to changing requirements and technological progress the domains and thus the metamodels are subject to change manually migrating existing models to new version of their metamodel is tedious and error prone hence adequate tool support is required to support the maintenance of modeling languages this paper introduces cope an integrated approach to specify the coupled evolution of metamodels and models to reduce migration effort with cope language is evolved by incrementally composing modular coupled transformations that adapt the metamodel and specify the corresponding model migrations this modular approach allows to combine the reuse of recurring transformations with the expressiveness to cater for complex transformations we demonstrate the applicability of cope in practice by modeling the coupled evolution of two existing modeling languages
A parameterless feature ranking approach is presented for feature selection in the pattern classification task. Compared with Battiti's mutual information feature selection (MIFS) and Kwak and Choi's MIFS-U methods, the proposed method derives an estimation of the conditional MI between the candidate feature and the output class given the subset of already-selected features, i.e., without any parameter (as required in the MIFS and MIFS-U methods) to be preset. Thus the intractable problem of how to choose an appropriate value for that parameter, so as to achieve the tradeoff between relevance to the output classes and redundancy with the already-selected features, is avoided completely. Furthermore, a modified greedy feature selection algorithm, called the second-order MI feature selection approach (SOMIFS), is proposed. Experimental results demonstrate the superiority of SOMIFS on both synthetic and benchmark data sets.
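The sketch below is not the SOMIFS estimator itself; it is a generic greedy forward selector that approximates the conditional MI pairwise (conditioning on one already-selected feature at a time), shown only to make the conditional-MI selection criterion concrete. All data and parameters are illustrative.

```python
import numpy as np

def _entropy(*cols):
    """Joint Shannon entropy (in nats) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def cond_mi(x, y, z=None):
    """I(X;Y) if z is None, else I(X;Y|Z), for discrete vectors."""
    if z is None:
        return _entropy(x) + _entropy(y) - _entropy(x, y)
    return _entropy(x, z) + _entropy(y, z) - _entropy(x, y, z) - _entropy(z)

def greedy_select(X, y, k):
    """Pick k feature indices; each step keeps the candidate whose relevance
    survives conditioning on every already-selected feature (a pairwise
    stand-in for the full conditional MI used in the paper)."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return cond_mi(X[:, f], y)
            return min(cond_mi(X[:, f], y, X[:, s]) for s in selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Usage with a toy discrete dataset: feature 0 determines the label,
# feature 1 duplicates feature 0 (redundant), feature 2 is noise.
rng = np.random.default_rng(0)
f0 = rng.integers(0, 2, 500)
X = np.stack([f0, f0, rng.integers(0, 2, 500)], axis=1)
print(greedy_select(X, f0, 2))   # picks 0 first, then avoids the redundant copy
```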
we show that the generalized variant of formal systems where the underlying equational specifications are membership equational theories and where the rules are conditional and can have equations memberships and rewrites in the conditions is reflective we also show that membership equational logic many sorted equational logic and horn logic with equality are likewise reflective these results provide logical foundations for reflective languages and tools based on these logics
Program analysis has been increasingly used in software engineering tasks, such as auditing programs for security vulnerabilities and finding errors in general. Such tools often require analyses much more sophisticated than those traditionally used in compiler optimizations. In particular, context-sensitive pointer alias information is a prerequisite for any sound and precise analysis that reasons about uses of heap objects in a program. Context-sensitive analysis is challenging because there are over contexts in a typical large program, even after recursive cycles are collapsed. Moreover, pointers cannot be resolved in general without analyzing the entire program. This paper presents a new framework, based on the concept of deductive databases, for context-sensitive program analysis. In this framework, all program information is stored as relations; data access and analyses are written as Datalog queries. To handle the large number of contexts in a program, the database represents relations with binary decision diagrams (BDDs). The system we have developed, called bddbddb, automatically translates database queries into highly optimized BDD programs. Our preliminary experiences suggest that a large class of analyses involving heap objects can be described succinctly in Datalog and implemented efficiently with BDDs. To make developing application-specific analyses easy for programmers, we have also created a language called PQL that makes a subset of Datalog queries more intuitive to define. We have used the language to find many security holes in Web applications.
this paper describes the design and implementation of query engine that provides extended sql based access to the data managed by an object oriented database system this query engine allows extended sql queries to be embedded in programs or issued interactively as from command line interface the language supported by the engine is the complete sql select statement plus object extensions for navigating along paths and embedded structures querying nested sets and invoking member functions in addition an object oriented sql view facility is provided using this view facility one can define object oriented views one can also define views that flatten complex oodb schemas allowing direct access by existing tools designed to provide remote access to relational databases the view facility also supports the definition of views that include reference and set valued columns based on other views thus allowing entire view schemas to be created this paper describes the sql query and view extensions and discusses number of issues that arose on the way to the implementation that is currently running on top of the objectstore oodb system
We consider the problem of approximating sliding-window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We first discuss alternate architectural models for data stream join processing, and we survey suitable measures for the quality of an approximation of a set-valued query result. We then consider the number of generated result tuples as the quality measure, and we give optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data, we show the efficacy of our solutions. For applications with a demand for exact results, we introduce a new archive metric which captures the amount of work needed to complete the join in case the streams are archived for later processing.
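To make the setting concrete, the sketch below shows a symmetric sliding-window equi-join with a naive random-drop shedding policy; it is a baseline illustration, not the paper's optimal offline or online strategy, and the window size and keep probability are arbitrary.

```python
import random
from collections import deque

class SheddingWindowJoin:
    """Symmetric sliding-window equi-join that sheds load by randomly
    dropping a fraction of arriving tuples (a simple baseline policy)."""

    def __init__(self, window, keep_prob, key=lambda t: t[0]):
        self.window = window          # window size in tuples per stream
        self.keep_prob = keep_prob    # probability of keeping an arrival
        self.key = key
        self.buf = {"L": deque(), "R": deque()}

    def arrive(self, side, tup):
        """Process one arrival on side 'L' or 'R'; return new join results."""
        if random.random() > self.keep_prob:
            return []                                  # load shedding: drop
        other = "R" if side == "L" else "L"
        k = self.key(tup)
        out = [(tup, o) if side == "L" else (o, tup)
               for o in self.buf[other] if self.key(o) == k]
        self.buf[side].append(tup)
        if len(self.buf[side]) > self.window:          # expire the oldest tuple
            self.buf[side].popleft()
        return out

# Usage: join two toy streams on their first field.
j = SheddingWindowJoin(window=100, keep_prob=0.5)
print(j.arrive("L", ("k1", "left-payload")))
print(j.arrive("R", ("k1", "right-payload")))   # may miss matches due to drops
```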
xml extensible markup language has emerged as prevalent standard for document representation and exchange on the web it is often the case that xml documents contain information of different sensitivity degrees that must be selectively shared by possibly large user communities there is thus the need for models and mechanisms enabling the specification and enforcement of access control policies for xml documents mechanisms are also required enabling secure and selective dissemination of documents to users according to the authorizations that these users have in this article we make several contributions to the problem of secure and selective dissemination of xml documents first we define formal model of access control policies for xml documents policies that can be defined in our model take into account both user profiles and document contents and structures we also propose an approach based on an extension of the cryptolope trade approach gladney and lotspiech which essentially allows one to send the same document to all users and yet to enforce the stated access control policies our approach consists of encrypting different portions of the same document according to different encryption keys and selectively distributing these keys to the various users according to the access control policies we show that the number of encryption keys that have to be generated under our approach is minimal and we present an architecture to support document distribution
rich multimedia content including images audio and text are frequently used to describe the same semantics in learning and ebusiness web pages instructive slides multimedia cyclopedias and so on in this paper we present framework for cross media retrieval where the query example and the retrieved result can be of different media types we first construct multimedia correlation space mmcs by exploring the semantic correlation of different multimedia modalities during which multimedia content and co occurrence information is utilized we propose novel ranking algorithm namely ranking with local regression and global alignment lrga which learns robust laplacian matrix for data ranking in lrga for each data point local linear regression model is used to predict the ranking values of its neighboring points we propose unified objective function to globally align the local models from all the data points so that an optimal ranking value can be assigned to each data point lrga is insensitive to parameters making it particularly suitable for data ranking relevance feedback algorithm is proposed to improve the retrieval performance comprehensive experiments have demonstrated the effectiveness of our methods
in this paper we present the design and implementation of the distributed open network emulator done scalable hybrid network emulation simulation environment it has several novel contributions first new model of time called relativistic time that combines the controllability of virtual time with the naturally flowing characteristics of wall clock time this enables hybrid environment in which direct code execution can be mixed with simulation models second done uses new transparent object based framework called weaves which enables the composition of unmodified network applications and protocol stacks to create large scale simulations finally it implements novel parallelization strategy that minimizes the number of independent timelines and offers an efficient mechanism to progress the event timeline our prototype implementation incorporates the complete tcp ip stack from the linux kernel family and executes any application code written for the bsd sockets interface the prototype runs on processors and produces super linear speedup in simulation of hundred infinite source to infinite sink pairs
dependability requirements such as safety and availability often conflict with one another making the development of dependable systems challenging it is not always possible to design system that fulfils all of its dependability requirements and consequently it is necessary to identify conflicts early in the development process and to optimize the architectural design with regard to dependability and cost this paper first provides an overview of fifteen different approaches to optimizing system designs at an architectural level then an abstract method is proposed that synthesises the main points of the different approaches to yield generic approach that could be applied across wide variety of different system attributes
synchronous languages are widely used in industrial applications for the design and implementation of real time embedded and reactive systems and are also well suited for real time verification purposes since they have clean formal semantics in this paper we focuse on the real time temporal logic jctl which can directly support the real time formal verification of synchronous programs for the design of systems in earlier high level as well as in later low level design stages creating bridging between industrial real time descriptions and formal real time verification we extend the model checking capabilities of jctl by introducing new forward symbolic model checking techniques allowing jctl to benefit from both forward as well as traditional backward state traversal methods and of course their combination
indexing structures based on space partitioning are powerless because of the well known curse of dimensionality linear scan of the data with approximation is more efficient in high dimensional similarity search however approaches so far concentrated on reducing ignored the computation cost for an expensive distance function such as norm with fractional the computation cost becomes the bottleneck we propose new technique to address expensive distance functions by indexing the function by pre computing some key values of the function once then the values are used to develop the upper lower bounds of the distance between each data and the query vector the technique is extremely efficient since it avoids most of the distance function computations moreover it does not spend any extra storage because no index is constructed and stored the efficiency is confirmed by cost analyses as well as experiments on synthetic and real data
this paper describes an efficient web page detection approach based on restricting the similarity computations between two versions of given web page to the nodes with the same html tag type before performing the similarity computations the html web page is transformed into an xml like structure in which node corresponds to an open closed html tag analytical expressions and supporting experimental results are used to quantify the improvements that are made when comparing the proposed approach to the traditional one which computes the similarities across all nodes of both pages it is shown that the improvements are highly dependent on the diversity of tags in the page that is the more diverse the page is ie contains mixed content of text images links etc the greater the improvements are while the more uniform it is the lesser they are
Radio interference, whether intentional or otherwise, represents a serious threat to assuring the availability of sensor network services. As such, techniques that enhance the reliability of sensor communications in the presence of radio interference are critical. In this article, we propose to cope with this threat through a technique called channel surfing, whereby the sensor nodes in the network adapt their channel assignments to restore network connectivity in the presence of interference. We explore two different approaches to channel surfing: coordinated channel switching, in which the entire sensor network adjusts its channel, and spectral multiplexing, in which nodes in a jammed region switch channels and nodes on the boundary of the jammed region act as radio relays between different spectral zones. For coordinated channel switching, we examine an autonomous strategy where each node detects the loss of its neighbors in order to initiate channel switching. To cope with latency issues in the autonomous strategy, we propose a broadcast-assisted channel switching strategy to more rapidly coordinate channel switching. For spectral multiplexing, we have devised both synchronous and asynchronous strategies to facilitate the scheduling of nodes in order to improve network fidelity when sensor nodes operate on multiple channels. In designing these algorithms, we have taken a system-oriented approach focused on actual implementation issues under realistic network settings. We have implemented the proposed methods on a testbed of Mica sensor nodes, and the experimental results show that channel surfing, in its various forms, is an effective technique for repairing network connectivity in the presence of radio interference while not introducing significant performance overhead.
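A minimal sketch of the autonomous coordinated-channel-switching idea follows: each node tracks heartbeats from its neighbors and, if too many of them go silent (a possible sign of jamming), hops to the next channel in a schedule shared by the whole network. The channel list, timeout, loss threshold, and the radio.set_channel interface are all assumptions for illustration, not the paper's implementation.

```python
import time

CHANNELS = [11, 12, 13, 14]          # hypothetical shared channel plan
SILENCE_TIMEOUT = 2.0                # seconds without a heartbeat => "lost"
LOSS_FRACTION = 0.5                  # fraction of lost neighbors that triggers a switch

class ChannelSurfingNode:
    def __init__(self, radio, neighbors):
        self.radio = radio           # assumed to expose set_channel(channel)
        self.last_heard = {n: time.monotonic() for n in neighbors}
        self.channel_index = 0

    def on_heartbeat(self, neighbor):
        self.last_heard[neighbor] = time.monotonic()

    def check_and_switch(self):
        """Detect neighbor loss and, if widespread, hop to the next channel."""
        now = time.monotonic()
        lost = sum(1 for t in self.last_heard.values()
                   if now - t > SILENCE_TIMEOUT)
        if lost >= LOSS_FRACTION * len(self.last_heard):
            self.channel_index = (self.channel_index + 1) % len(CHANNELS)
            self.radio.set_channel(CHANNELS[self.channel_index])
            # Reset timers so one switch is not immediately followed by another.
            self.last_heard = {n: now for n in self.last_heard}
            return True
        return False
```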
data mining is most commonly used in attempts to induce association rules from transaction data most previous studies focused on binary valued transactions however the data in real world applications usually consists of quantitative values in the last few years many researchers have proposed evolutionary algorithms for mining interesting association rules from quantitative data in this paper we present preliminary study on the evolutionary extraction of quantitative association rules experimental results on real world dataset show the effectiveness of this approach
service oriented system is composed of independent software units namely services that interact with one another exclusively through message exchanges the proper functioning of such system depends on whether or not each individual service behaves as the other services expect it to behave since services may be developed and operated independently it is unrealistic to assume that this is always the case this article addresses the problem of checking and quantifying how much the actual behavior of service as recorded in message logs conforms to the expected behavior as specified in process model we consider the case where the expected behavior is defined using the bpel industry standard business process execution language for web services bpel process definitions are translated into petri nets and petri net based conformance checking techniques are applied to derive two complementary indicators of conformance fitness and appropriateness the approach has been implemented in toolset for business process analysis and mining namely prom and has been tested in an environment comprising multiple oracle bpel servers
most existing concepts in data warehousing provide central data base system storing gathered raw data and redundantly computed materialized views while in current system architectures client tools are sending queries to central data warehouse system and are only used to graphically present the result the steady rise in power of personal computers and the expansion of network bandwidth makes it possible to store replicated parts of the data warehouse at the client thus saving network bandwidth and utilizing local com puting power within such scenario potentially mobile client does not need to be connected to central server while performing local analyses although this scenario seems attractive several pro blems arise by introducing such an architecture for example schema data could be changed or new fact data could be available this paper is focusing on the first problem and presents ideas on how changed schema data can be detected and efficiently synchro nized between client and server exploiting the special needs and requirements of data warehousing
this paper proposes natural deduction system cnds for classical modal logic with necessity and possibility modalities this new system is an extension of parigot’s classical natural deduction with dual context to formulate modal logic the modal calculus is also introduced as computational extraction of cnds it is an extension of both the calculus and the modal calculus subject reduction confluency and strong normalization of the modal calculus are shown finally the computational interpretation of the modal calculus especially the computational meaning of the modal possibility operator is discussed
the overhead in terms of code size power consumption and execution time caused by the use of precompiled libraries and separate compilation is often unacceptable in the embedded world where real time constraints battery life time and production costs are of critical importance in this paper we present our link time optimizer for the arm architecture we discuss how we can deal with the peculiarities of the arm architecture related to its visible program counter and how the introduced overhead can to large extent be eliminated our link time optimizer is evaluated with four tool chains two proprietary ones from arm and two open ones based on gnu gcc when used with proprietary tool chains from arm ltd our link time optimizer achieved average code size reductions of and percnt while the programs have become and percnt faster and to percnt more energy efficient finally we show how the incorporation of link time optimization in tool chains may influence library interface design
Greedy geographic routing is attractive in wireless sensor networks due to its efficiency and scalability. However, greedy geographic routing may incur long routing paths, or even fail, due to routing voids on random network topologies. We study greedy geographic routing in an important class of wireless sensor networks that provide sensing coverage over a geographic area (e.g., surveillance or object-tracking systems). Our geometric analysis and simulation results demonstrate that existing greedy geographic routing algorithms can successfully find short routing paths based on local states in sensing-covered networks. In particular, we derive theoretical upper bounds on the network dilation of sensing-covered networks under greedy geographic routing algorithms. Furthermore, we propose a new greedy geographic routing algorithm, called bounded Voronoi greedy forwarding (BVGF), that allows sensing-covered networks to achieve an asymptotic network dilation lower than , as long as the communication range is at least twice the sensing range. Our results show that simple greedy geographic routing is an effective routing scheme in many sensing-covered networks.
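The sketch below shows plain greedy geographic forwarding (next hop = neighbor closest to the destination, forwarding only when progress is made), not the paper's Voronoi-based BVGF variant; the toy topology in the usage example is made up.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_next_hop(node, neighbors, dest):
    """Forward to the neighbor geographically closest to the destination,
    but only if it is strictly closer than the current node; otherwise
    report a routing void by returning None."""
    best = min(neighbors, key=lambda n: dist(n, dest), default=None)
    if best is not None and dist(best, dest) < dist(node, dest):
        return best
    return None

def greedy_route(positions, adjacency, src, dst):
    """Follow greedy next hops over a static topology; positions maps node id
    to (x, y), adjacency maps node id to its list of neighbor ids."""
    path, current = [src], src
    while current != dst:
        nxt = greedy_next_hop(positions[current],
                              [positions[n] for n in adjacency[current]],
                              positions[dst])
        if nxt is None:
            return None                      # stuck at a routing void
        current = next(n for n in adjacency[current] if positions[n] == nxt)
        path.append(current)
    return path

# Usage on a tiny line topology.
pos = {0: (0, 0), 1: (1, 0), 2: (2, 0)}
adj = {0: [1], 1: [0, 2], 2: [1]}
print(greedy_route(pos, adj, 0, 2))          # [0, 1, 2]
```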
most power reduction techniques have focused on gating the clock to unused functional units to minimize static power consumption while system level optimizations have been used to deal with dynamic power consumption once these techniques are applied register file power consumption becomes dominant factor in the processor this paper proposes power aware reconfiguration mechanism in the register file driven by compiler optimal usage of the register file in terms of size is achieved and unused registers are put into low power state total energy consumption in the register file is reduced by with no appreciable performance penalty for mibench benchmarks on an embedded processor the effect of reconfiguration granularity on energy savings is also analyzed and the compiler approach to optimize energy results is presented
this paper presents an efficient deterministic gossip algorithm for synchronous crash prone message passing processors the algorithm has time complexity logp and message complexity for any this substantially improves the message complexity of the previous best algorithm that has while maintaining the same time complexitythe strength and utility of the new result is demonstrated by constructing deterministic algorithm for performing tasks in this distributed setting previous solutions used coordinator or checkpointing approaches immediately incurring work penalty for crashes or relied on strong communication primitives such as reliable broadcast or had work too close to the trivial bound of oblivious algorithms the new algorithm uses crash prone processors to perform similar and idempotent tasks so long as one processor remains active the work of the algorithm is min logp and its message complexity is fpepsiv min log for any this substantially improves the work complexity of previous solutions using simple point to point messaging while meeting or beating the corresponding message complexity boundsthe new algorithms use communication graphs and permutations with certain combinatorial properties that are shown to exist the algorithms are correct for any permutations and in particular the same expected bounds can be achieved using random permutations
description logics dls theoretically explore knowledge representation and reasoning in concept languages however since they are conceptually oriented they are not equipped with rule based reasoning mechanisms for assertional knowledge bases specifically rules and facts in logic programming lp or the interaction of rules and facts with terminological knowledge to combine rule based reasoning with terminological knowledge this paper presents hybrid reasoning system for dl knowledge bases tbox and abox and first order clause sets the primary result of this study involves the design of sound and complete resolution method for the composed knowledge bases and this method possesses features of an effective deduction procedure such as robinson’s resolution principle
broadcasters are demonstrating interest in systems that ease the process of annotation the huge amount of live and archived video materials exploitation of such assets is considered key method for the improvement of production quality and sport videos are one of the most marketable assets in particular in europe soccer is one of the most relevant sport types this paper deals with detection and recognition of soccer highlights using an approach based on temporal logic models
limited energy supply is one of the major constraints in wireless sensor networks feasible strategy is to aggressively reduce the spatial sampling rate of sensors that is the density of the measure points in field by properly scheduling we want to retain the high fidelity of data collection in this paper we propose data collection method that is based on careful analysis of the surveillance data reported by the sensors by exploring the spatial correlation of sensing data we dynamically partition the sensor nodes into clusters so that the sensors in the same cluster have similar surveillance time series they can share the workload of data collection in the future since their future readings may likely be similar furthermore during short time period sensor may report similar readings such correlation in the data reported from the same sensor is called temporal correlation which can be explored to further save energy we develop generic framework to address several important technical challenges including how to partition the sensors into clusters how to dynamically maintain the clusters in response to environmental changes how to schedule the sensors in cluster how to explore temporal correlation and how to restore the data in the sink with high fidelity we conduct an extensive empirical study to test our method using both real test bed system and large scale synthetic data set
the number of web users and the diversity of their interests increase continuously web content providers seek to infer these interests and to adapt their web sites to improve accessibility of the offered content usage pattern mining is promising approach in support of this goal assuming that past navigation behavior is an indicator of the users interests then web server logs can be mined to infer what the users are interested in on that basis the web site may be reorganized to make the interesting content more easily accessible or recommendations can be dynamically generated to help new visitors find information of interest faster in this paper we discuss case study examining the effectiveness of sequential pattern mining for understanding the users navigation behavior in focused web sites this study examines the web site of an undergraduate course as an example of focused web site that offers information intriusically related to process and closely reflects the workflow of this underlying process we found that in such focused sites indeed visitor behavior reflects the process supported by the web site and that sequential pattern mining can effectively predict web usage behavior in these sites
advances in the efficient discovery of frequent itemsets have led to the development of number of schemes that use frequent itemsets to aid developing accurate and efficient classifiers these approaches use the frequent itemsets to generate set of composite features that expand the dimensionality of the underlying dataset in this paper we build upon this work and present variety of schemes for composite feature selection that achieve substantial reduction in the number of features without adversely affecting the accuracy gains and ii show both analytically and experimentally that the composite features can lead to improved classification models even in the context of support vector machines in which the dimensionality can automatically be expanded by the use of appropriate kernel functions
this paper presents novel real time surveillance video summarization system that employs the eye gaze positions of the surveillance operator the system output can be used to efficiently review the overlooked sections of the surveillance video which increases the reliability of the surveillance system the summary of the operator monitored actions can also be obtained for efficient re examination of the surveillance videos the system employs novel non linear video abstraction method that can mix actions from different frames into the same summary frame for more compact videos the video summaries are performed in real time on average hardware thanks to our improved dynamic programming based summary techniques we performed several experiments using synthetic and real world surveillance videos which showed the practical applicability of our system sample videos and their summaries can be reached at http visiongyteedutr projectsphp id
in this paper we argue that developing information extraction ie programs using datalog with embedded procedural extraction predicates is good way to proceed first compared to current ad hoc composition using eg perl or datalog provides cleaner and more powerful way to compose small extraction modules into larger programs thus writing ie programs this way retains and enhances the important advantages of current approaches programs are easy to understand debug and modify second once we write ie programs in this framework we can apply query optimization techniques to them this gives programs that when run over variety of data sets are more efficient than any monolithic program because they are optimized based on the statistics of the data on which they are invoked we show how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework then provide initial solutions extensive experiments over real world data demonstrate that optimization is indeed vital for ie programs and that we can effectively optimize ie programs written in this proposed framework
presently solid state disks ssds are emerging as disruptive storage technology and promise breakthroughs for important application properties they quickly enter the enterprise domain and partially replace magnetic disks hdds for database servers to identify performance and energy use of both types of storage devices we have built an analysis tool and measured access times and energy needed for them associating these measurements to physical io patterns we checked and verified the performance claims given by the device manufacturers using typical read write access patterns frequently observed in io intensive database applications we fathomed the performance and energy efficiency potential of spectrum of differing storage devices low end medium and high end ssds and hdds cross comparing measurements of identical experiments we present indicative parameters concerning io performance and energy consumption furthermore we reexamine an io rule of thumb guiding their energy efficient use in database servers these findings suggest some database related optimization areas where they can improve performance while energy is saved at the same time
Many modern applications result in a significant operating system (OS) component. The OS component has several implications, including affecting control flow transfer in the execution environment. This paper focuses on understanding the operating system's effects on control flow transfer and prediction, and on designing architectural support to alleviate the resulting bottlenecks. We characterize the control flow transfer of several emerging applications on a commercial operating system. We find that the exception-driven, intermittent invocation of OS code and the user/OS branch history interference increase mispredictions in both user and kernel code. We propose two simple OS-aware control flow prediction techniques to alleviate the destructive impact of user/OS branch interference. The first consists of capturing separate branch correlation information for user and kernel code. The second involves using separate branch prediction tables for user and kernel code. We study the improvement contributed by OS-aware prediction to various branch predictors, ranging from the simple gshare to the more elegant Agree, Multi-Hybrid, and Bi-Mode predictors. On -entry predictors, incorporating the OS-aware techniques yields up to , , , and prediction accuracy improvement in the gshare, Multi-Hybrid, Agree, and Bi-Mode predictors, respectively, resulting in up to execution speedup.
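The toy predictor below illustrates the second OS-aware technique (separate tables and histories for user and kernel code) on top of a basic gshare scheme; table sizes and the two-bit counter initialization are illustrative, and this is a conceptual sketch rather than the paper's hardware design.

```python
class SplitGshare:
    """Gshare-style predictor with separate global histories and pattern
    tables per privilege mode, so kernel branches do not pollute user state."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        # One 2-bit-counter table and one history register per mode.
        self.tables = {"user": [2] * (1 << index_bits),
                       "kernel": [2] * (1 << index_bits)}
        self.history = {"user": 0, "kernel": 0}

    def _index(self, mode, pc):
        return (pc ^ self.history[mode]) & self.mask

    def predict(self, mode, pc):
        return self.tables[mode][self._index(mode, pc)] >= 2   # predict taken?

    def update(self, mode, pc, taken):
        idx = self._index(mode, pc)
        ctr = self.tables[mode][idx]
        self.tables[mode][idx] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        self.history[mode] = ((self.history[mode] << 1) | int(taken)) & self.mask

# Usage: kernel-mode updates leave the user-mode history and table untouched.
bp = SplitGshare()
bp.update("kernel", 0x80001234, taken=True)
print(bp.predict("user", 0x00401000))
```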
similarly to the web wikis have advanced from initially simple ad hoc solutions to highly popular systems of widespread use this evolution is reflected by the impressive number of wiki engines available and by the numerous settings and disciplines they have found applicability to in the last decade in conjunction to these rapid advances the question on the fundamental principles underlying the design and the architecture of wiki technologies becomes inevitable for their systematic further development and their long lasting success at public private and corporate level this paper aims at be part of this endeavor building upon the natural relationship between wikis and hypermedia we examine to which extent the current state of the art in the field complemented by results achieved in adjacent communities such as the world wide web and the semantic web fulfills the requirements of modern hypermedia systems as conclusion of the study we outline further directions of research and development which are expected to contribute to the realization of this vision
the pseudo boolean optimization pbo and maximum satisfiability maxsat problems are natural optimization extensions of boolean satisfiability sat in the recent past different algorithms have been proposed for pbo and for maxsat despite the existence of straightforward mappings from pbo to maxsat and vice versa this papers proposes weighted boolean optimization wbo new unified framework that aggregates and extends pbo and maxsat in addition the paper proposes new unsatisfiability based algorithm for wbo based on recent unsatisfiability based algorithms for maxsat besides standard maxsat the new algorithm can also be used to solve weighted maxsat and pbo handling pseudo boolean constraints either natively or by translation to clausal form experimental results illustrate that unsatisfiability based algorithms for maxsat can be orders of magnitude more efficient than existing dedicated algorithms finally the paper illustrates how other algorithms for either pbo or maxsat can be extended to wbo
we consider hierarchical systems where nodes represent entities and edges represent binary relationships among them an example is hierarchical composition of web services where the nodes denote services and the edges represent the parent child relationship of service invoking another service fundamental issue to address in such systems is for two nodes and in the hierarchy whether can see that is whether has visibility over the visibility could be with respect to certain attributes like operational details execution logs and security related issues in general setting seeing may depend on wishing to see ii wishing to be seen by and iii other nodes not objecting to seeing in this paper we develop generic conceptual model to express visibility we study two complementary notions sphere of visibility of node that includes all the nodes in the hierarchy that can see and sphere of noticeability of that includes all the nodes that can see we also identify dual properties coherence and correlation that relate the spheres of different nodes in special ways and also relate the visibility and noticeability notions we study some variants of coherence and correlation also these properties give rise to interesting and useful visibility and noticeability assignments and their representations
users in peer to peer pp system join and leave the network in continuous manner understanding the resilience properties of pp systems under high rate of node churn becomes important in this work we first find that lifetime based dynamic churn model for pp network that has reached stationarity is reducible to uniform node failure model this is simple yet powerful result that bridges the gap between the complex dynamic churn models and the more tractable uniform failure model we further develop the reachable component method and derive the routing performance of wide range of structured pp systems under varying rates of churn we find that the de bruijn graph based routing systems offer excellent resilience under extremely high rate of node turnovers followed by group of routing systems that include can kademlia chord and randomized chord we show that our theoretical predictions agree well with large scale simulation results we finish by suggesting methods to further improve the routing performance of dynamic pp systems in the presence of churn and failures
Context-sensitive points-to analysis is critical for several program optimizations. However, as the number of contexts grows exponentially, the storage requirements of the analysis increase tremendously for large programs, making the analysis non-scalable. We propose a scalable, flow-insensitive, context-sensitive, inclusion-based points-to analysis that uses a specially designed multi-dimensional Bloom filter to store the points-to information. Two key observations motivate our proposal: (i) points-to information between pointer and object, and between pointer and pointer, is sparse; and (ii) moving from an exact to an approximate representation of points-to information only leads to reduced precision, without affecting the correctness of the (may) points-to analysis. By using an approximate representation, a multi-dimensional Bloom filter can significantly reduce the memory requirements with a probabilistic bound on the loss in precision. Experimental evaluation on SPEC benchmarks and two large open-source programs reveals that, with an average storage requirement of MB, our approach achieves almost the same precision as the exact implementation. By increasing the average memory to MB, it achieves precision up to for these benchmarks. Using Mod/Ref analysis as the client, we find that the client analysis is not affected that often, even when there is some loss of precision in the points-to representation: the NoModRef percentage is within of the exact analysis, while requiring MB (maximum MB) of memory and less than minutes on average for the points-to analysis. Another major advantage of our technique is that it allows precision to be traded off against the memory usage of the analysis.
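The sketch below uses a plain (single-dimension) Bloom filter over (pointer, context, object) triples, which is much simpler than the paper's multi-dimensional design; it is included only to make concrete why an approximate set is safe for a may-analysis: membership queries can return false positives (extra points-to facts, i.e., lost precision) but never false negatives (missed facts). Sizes and hash choices are illustrative.

```python
import hashlib

class BloomPointsTo:
    """Approximate set of may-point-to facts backed by a bit array."""

    def __init__(self, m_bits=1 << 20, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # k independent positions derived from keyed hashes of the triple.
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.m

    def add(self, pointer, context, obj):
        for pos in self._positions((pointer, context, obj)):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_point_to(self, pointer, context, obj):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions((pointer, context, obj)))

# Usage
pts = BloomPointsTo()
pts.add("p", "ctx42", "heap_obj_7")
print(pts.may_point_to("p", "ctx42", "heap_obj_7"))   # True
print(pts.may_point_to("q", "ctx42", "heap_obj_7"))   # almost certainly False
```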
this paper describes simulation system that has been developed to model the deformation and fracture of solid objects in real time gaming context based around corotational tetrahedral finite element method this system has been constructed from components published in the graphics and computational physics literatures the goal of this paper is to describe how these components can be combined to produce an engine that is robust to unpredictable user interactions fast enough to model reasonable scenarios at real time speeds suitable for use in the design of game level and with appropriate controls allowing content creators to match artistic direction details concerning parallel implementation solver design rendering method and other aspects of the simulation are elucidated with the intent of providing guide to others wishing to implement similar systems examples from in game scenes captured on the xbox ps and pc platforms are included
xml is widely regarded as promising means for data representation integration and exchange as companies transact business over the internet the sensitive nature of the information mandates that access must be provided selectively using sophisticated access control specifications using the specification directly to determine if user has access to specific xml data item can hence be extremely inefficient the alternative of fully materializing for each data item the users authorized to access it can be space inefficient in this paper we propose space and time efficient solution to the access control problem for xml data our solution is based on novel notion of compressed accessibility map cam which compactly identifies the xml data items to which user has access by exploiting structural locality of accessibility in tree structured data we present cam lookup algorithm for determining if user has access to data item it takes time proportional to the product of the depth of the item in the xml data and logarithm of the cam size
interactive visualization of large digital elevation models is of continuing interest in scientific visualization gis and virtual reality applications taking advantage of the regular structure of grid digital elevation models efficient hierarchical multiresolution triangulation and adaptive level of detail lod rendering algorithms have been developed for interactive terrain visualization despite the higher triangle count these approaches generally outperform mesh simplification methods that produce irregular triangulated network tin based lod representations in this project we combine the advantage of tin based mesh simplification preprocess with high performance quadtree based lod triangulation and rendering at run time this approach called quadtin generates an efficient quadtree triangulation hierarchy over any irregular point set that may originate from irregular terrain sampling or from reducing oversampling in high resolution grid digital elevation models
an important part of many software maintenance tasks is to gain sufficient level of understanding of the system at hand the use of dynamic information to aid in this software understanding process is common practice nowadays major issue in this context is scalability due to the vast amounts of information it is very difficult task to successfully navigate through the dynamic data contained in execution traces without getting lost in this paper we propose the use of two novel trace visualization techniques based on the massive sequence and circular bundle view which both reflect strong emphasis on scalability these techniques have been implemented in tool called extravis by means of distinct usage scenarios that were conducted on three different software systems we show how our approach is applicable in three typical program comprehension tasks trace exploration feature location and top down analysis with domain knowledge
previous efforts on event detection from the web have focused primarily on web content and structure data ignoring the rich collection of web log data in this paper we propose the first approach to detect events from the click through data which is the log data of web search engines the intuition behind event detection from click through data is that such data is often event driven and each event can be represented as set of query page pairs that are not only semantically similar but also have similar evolution pattern over time given the click through data in our proposed approach we first segment it into sequence of bipartite graphs based on the user defined time granularity next the sequence of bipartite graphs is represented as vector based graph which records the semantic and evolutionary relationships between queries and pages after that the vector based graph is transformed into its dual graph where each node is query page pair that will be used to represent real world events then the problem of event detection is equivalent to the problem of clustering the dual graph of the vector based graph the clustering process is based on two phase graph cut algorithm in the first phase query page pairs are clustered based on the semantic based similarity such that each cluster in the result corresponds to specific topic in the second phase query page pairs related to the same topic are further clustered based on the evolution pattern based similarity such that each cluster is expected to represent specific event under the specific topic experiments with real click through data collected from commercial web search engine show that the proposed approach produces high quality results
we explore the use of the landing page content in sponsored search ad selection specifically we compare the use of the ad’s intrinsic content to augmenting the ad with the whole or parts of the landing page we explore two types of extractive summarization techniques to select useful regions from the landing pages out of context and in context methods out of context methods select salient regions from the landing page by analyzing the content alone without taking into account the ad associated with the landing page in context methods use the ad context including its title creative and bid phrases to help identify regions of the landing page that should be used by the ad selection engine in addition we introduce simple yet effective unsupervised algorithm to enrich the ad context to further improve the ad selection experimental evaluation confirms that the use of landing pages can significantly improve the quality of ad selection we also find that our extractive summarization techniques reduce the size of landing pages substantially while retaining or even improving the performance of ad retrieval over the method that utilizes the entire landing page
many systems design configuration runtime and management decisions must be made from large set of possible alternatives ad hoc heuristics have traditionally been used to make these decisions but they provide no guarantees of solution quality we argue that operations research style optimization techniques should be used to solve these problems we provide an overview of these techniques and where they are most effective address common myths and fears about their use in making systems decisions give several success stories and propose systems areas that could benefit from their application
this paper presents symbol spotting approach for indexing by content database of line drawing images as line drawings are digital born documents designed by vectorial softwares instead of using pixel based approach we present spotting method based on vector primitives graphical symbols are represented by set of vectorial primitives which are described by an off the shelf shape descriptor relational indexing strategy aims to retrieve symbol locations into the target documents by using combined numerical relational description of structures the zones which are likely to contain the queried symbol are validated by hough like voting scheme in addition performance evaluation framework for symbol spotting in graphical documents is proposed the presented methodology has been evaluated with benchmarking set of architectural documents achieving good performance results
users of social networking services can connect with each other by forming communities for online interaction yet as the number of communities hosted by such websites grows over time users have even greater need for effective community recommendations in order to meet more users in this paper we investigate two algorithms from very different domains and evaluate their effectiveness for personalized community recommendation first is association rule mining arm which discovers associations between sets of communities that are shared across many users second is latent dirichlet allocation lda which models user community co occurrences using latent aspects in comparing lda with arm we are interested in discovering whether modeling low rank latent structure is more effective for recommendations than directly mining rules from the observed data we experiment on an orkut data set consisting of users and communities our empirical comparisons using the top recommendations metric show that lda performs consistently better than arm for the community recommendation task when recommending list of or more communities however for recommendation lists of up to communities arm is still bit better we analyze examples of the latent information learned by lda to explain this finding to efficiently handle the large scale data set we parallelize lda on distributed computers and demonstrate our parallel implementation’s scalability with varying numbers of machines
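The paper's exact ARM and parallel LDA implementations are not reproduced here; the sketch below only illustrates the association-rule side with simple pairwise community co-occurrence counts over user memberships, and all names and thresholds (`pairwise_association_recommender`, `min_support`) are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_association_recommender(memberships, min_support=2):
    """Count how often two communities are joined by the same user (simple 2-item rules)."""
    pair_counts = defaultdict(int)
    for communities in memberships.values():
        for a, b in combinations(sorted(set(communities)), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), count in pair_counts.items():
        if count >= min_support:        # keep only sufficiently supported associations
            related[a].append((b, count))
            related[b].append((a, count))
    return related

def recommend(related, user_communities, top_n=5):
    scores = defaultdict(int)
    for c in user_communities:
        for other, count in related.get(c, []):
            if other not in user_communities:
                scores[other] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

memberships = {"u1": ["hiking", "photography", "travel"],
               "u2": ["hiking", "travel"],
               "u3": ["photography", "travel", "cooking"]}
related = pairwise_association_recommender(memberships, min_support=2)
print(recommend(related, {"hiking"}))   # communities co-joined with "hiking" by enough users
```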
in the mobile code paradigm for distributed systems as well as in the active networks and agents frameworks programs from possibly unknown hosts interact with the resources local to the host while this model offers great potential it also raises difficult security and performance issues the mobile code unit should be guaranteed to be safe not to abuse the resources of the host in limited time since the acquisition of the code happens in real time eg java applet existing host security schemes can be classified as discretion based accept certificate of authenticity at your discretion and ii verification based formally prove the safety verification provides the desired level of security however it comes at large performance delay while discretion is efficient but limited and relies on blind trust we present an optimization verification caching for enhancing the performance of verification based security methods secure indexing of previously encountered code units is established by using message digest algorithm eg md to generate fingerprint of the code we characterize the performance and security of this scheme and investigate optimizations to lower the cost of generating the fingerprint by indexing on small partial fingerprints and generating the full fingerprint only if there is cache hit in addition we generalize the approach to allow multiple trusting nodes to distribute caching among them sharing experiences and effectively increasing the cache size
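A minimal sketch of the verification-caching idea, assuming the cache is keyed by a digest of the code unit and that a cheap partial fingerprint is checked before the full fingerprint is computed; SHA-256 is used here in place of the MD-style digest mentioned above, and the verifier stub, class name, and prefix-based partial fingerprint are illustrative assumptions.

```python
import hashlib

class VerificationCache:
    """Cache of code units that already passed verification, keyed by a digest of the code."""

    def __init__(self, verifier, prefix_bytes=64):
        self.verifier = verifier          # the expensive safety check, e.g. formal verification
        self.verified = set()             # full fingerprints of known-safe code units
        self.prefix_index = set()         # cheap partial fingerprints, checked first
        self.prefix_bytes = prefix_bytes

    def _partial(self, code):
        return hashlib.sha256(code[:self.prefix_bytes]).hexdigest()

    def _full(self, code):
        return hashlib.sha256(code).hexdigest()

    def check(self, code):
        # compute the full fingerprint only if the cheap partial fingerprint hits
        if self._partial(code) in self.prefix_index and self._full(code) in self.verified:
            return True                   # cache hit: skip re-verification
        if not self.verifier(code):
            return False
        self.prefix_index.add(self._partial(code))
        self.verified.add(self._full(code))
        return True

cache = VerificationCache(verifier=lambda code: b"unsafe" not in code)
print(cache.check(b"mobile code unit A"))   # verified once, then cached
print(cache.check(b"mobile code unit A"))   # served from the cache
```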
this paper explores the idea of managing mobile ad hoc networks manets by the communication needs of their nodes as means to facilitate the operation of distributed applications specifically we present middleware layer that enables reasoning about the multiple possibilities there may exist to ensure satisfiability of certain communication needs this middleware has been explicitly devised to handle partial and changeable knowledge about the networks and to guide the search for missing information whenever it cannot conclude whether it will be possible to satisfy some needs these features provide the basis to implement policies with which to coordinate activities in manet in quest for the configuration that best satisfies the communication needs of its nodes we provide simulation results to show the comparative advantages of our solution plus report of experiments to assess its practicality and usability
in the design of algorithms the greedy paradigm provides powerful tool for solving efficiently classical computational problems within the framework of procedural languages however expressing these algorithms within the declarative framework of logic based languages has proven difficult research challenge in this paper we extend the framework of datalog like languages to obtain simple declarative formulations for such problems and propose effective implementation techniques to ensure computational complexities comparable to those of procedural formulations these advances are achieved through the use of the choice construct extended with preference annotations to effect the selection of alternative stable models and nondeterministic fixpoints we show that with suitable storage structures the differential fixpoint computation of our programs matches the complexity of procedural algorithms in classical search and optimization problems
are trust and risk important in consumers electronic commerce purchasing decisions what are the antecedents of trust and risk in this context how do trust and risk affect an internet consumer’s purchasing decision to answer these questions we develop theoretical framework describing the trust based decision making process consumer uses when making purchase from given site ii test the proposed model using structural equation modeling technique on internet consumer purchasing behavior data collected via web survey and iii consider the implications of the model the results of the study show that internet consumers trust and perceived risk have strong impacts on their purchasing decisions consumer disposition to trust reputation privacy concerns security concerns the information quality of the website and the company’s reputation have strong effects on internet consumers trust in the website interestingly the presence of third party seal did not strongly influence consumers trust
when database query has large number of results the user can only be shown one page of results at time one popular approach is to rank results such that the best results appear first however standard database query results comprise set of tuples with no associated ranking it is typical to allow users the ability to sort results on selected attributes but no actual ranking is defined an alternative approach to the first page is not to try to show the best results but instead to help users learn what is available in the whole result set and direct them to finding what they need in this paper we demonstrate through user study that page comprising one representative from each of clusters generated through medoid clustering is superior to multiple alternative candidate methods for generating representatives of data set users often refine query specifications based on returned results traditional clustering may lead to completely new representatives after refinement step furthermore clustering can be computationally expensive we propose tree based method for efficiently generating the representatives and smoothly adapting them with query refinement experiments show that our algorithms outperform the state of the art in both result quality and efficiency
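A minimal sketch of medoid clustering for picking one representative tuple per cluster, assuming a plain iterative k-medoid loop rather than the paper's tree-based incremental method; the distance function and toy result set are illustrative.

```python
import random

def k_medoids(points, k, distance, max_iter=20, seed=0):
    """Naive k-medoid clustering: representatives are actual result tuples, not averages."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: distance(p, m))
            clusters[nearest].append(p)
        new_medoids = []
        for members in clusters.values():
            # the new medoid minimises total distance to its cluster's members
            best = min(members, key=lambda c: sum(distance(c, p) for p in members))
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids

# toy "query results" as (price, rating) tuples; one representative per cluster fills the first page
results = [(10, 4.5), (12, 4.4), (95, 3.0), (99, 3.2), (50, 4.0), (48, 4.1)]
dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
print(k_medoids(results, k=3, distance=dist))
```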
xquery and sql xml are powerful new languages for querying xml data however they contain number of stumbling blocks that users need to be aware of to get the expected results and performance for example certain language features make it hard if not impossible to exploit xml indexes the major database vendors provide xquery and sql xml support in their current or upcoming product releases in this paper we identify common pitfalls gleaned from the experiences of early adopters of this functionality we illustrate these pitfalls through concrete examples explain the unexpected query behavior and show alternative formulations of the queries that behave and perform as anticipated as results we provide guidelines for xquery and sql xml users feedback on the language standards and food for thought for emerging languages and apis
we describe method for the acquisition of deformable human geometry from silhouettes our technique uses commercial tracking system to determine the motion of the skeleton then estimates geometry for each bone using constraints provided by the silhouettes from one or more cameras these silhouettes do not give complete characterization of the geometry for particular point in time but when the subject moves many observations of the same local geometries allow the construction of complete model our reconstruction algorithm provides simple mechanism for solving the problems of view aggregation occlusion handling hole filling noise removal and deformation modeling the resulting model is parameterized to synthesize geometry for new poses of the skeleton we demonstrate this capability by rendering the geometry for motion sequences that were not included in the original datasets
the design of concurrent programs is error prone due to the interaction between concurrently executing threads traditional automated techniques for finding errors in concurrent programs such as model checking explore all possible thread interleavings since the number of thread interleavings increases exponentially with the number of threads such analyses have high computational complexity in this paper we present novel analysis technique for concurrent programs that avoids this exponential complexity our analysis transforms concurrent program into sequential program that simulates the execution of large subset of the behaviors of the concurrent program the sequential program is then analyzed by tool that only needs to understand the semantics of sequential execution our technique never reports false errors but may miss errors we have implemented the technique in kiss an automated checker for multithreaded programs and obtained promising initial results by using kiss to detect race conditions in windows device drivers
peer to peer pp systems based on distributed hash tables allow the construction of applications with high scalability and high availability these kinds of applications are more sophisticated and demanding on data volume to be handled as well as their location cache is quite interesting within these applications since cache reduces the latency experienced by users nevertheless configuring suitable cache is not trivial issue due to the quantity of parameters especially within distributed and dynamic environments this is the motivation for proposing the dhtcache cache service that allows developers to experiment with different cache configurations to provide information to make better decisions on the type of cache suitable for pp applications
contrast set mining aims at finding differences between different groups this paper shows that contrast set mining task can be transformed to subgroup discovery task whose goal is to find descriptions of groups of individuals with unusual distributional characteristics with respect to the given property of interest the proposed approach to contrast set mining through subgroup discovery was successfully applied to the analysis of records of patients with brain stroke confirmed by positive ct test in contrast with patients with other neurological symptoms and disorders having normal ct test results detection of coexisting risk factors as well as description of characteristic patient subpopulations are important outcomes of the analysis
mobile location dependent information services ldiss have become increasingly popular in recent years however data caching strategies for ldiss have thus far received little attention in this paper we study the issues of cache invalidation and cache replacement for location dependent data under geometric location model we introduce new performance criterion called caching efficiency and propose generic method for location dependent cache invalidation strategies in addition two cache replacement policies pa and paid are proposed unlike the conventional replacement policies pa and paid take into consideration the valid scope area of data value we conduct series of simulation experiments to study the performance of the proposed caching schemes the experimental results show that the proposed location dependent invalidation scheme is very effective and the pa and paid policies significantly outperform the conventional replacement policies
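The exact PA and PAID cost formulas are not given above, so the sketch below assumes simple illustrative forms (access probability times valid-scope area, optionally divided by distance to the scope) just to show how a scope-aware replacement policy could pick an eviction victim; function names and numbers are made up.

```python
def pa_score(entry):
    """Assumed Probability-Area form: access probability times the area of the data's valid scope."""
    return entry["access_prob"] * entry["valid_scope_area"]

def paid_score(entry):
    """Assumed Probability-Area-Inverse-Distance form: also penalise scopes that are far away."""
    return entry["access_prob"] * entry["valid_scope_area"] / max(entry["distance_to_scope"], 1e-9)

def choose_victim(cache, score):
    """Evict the cached item with the lowest score under the chosen policy."""
    return min(cache, key=lambda name: score(cache[name]))

cache = {
    "nearest_restaurant": {"access_prob": 0.4, "valid_scope_area": 2.0, "distance_to_scope": 0.1},
    "city_weather":       {"access_prob": 0.1, "valid_scope_area": 50.0, "distance_to_scope": 5.0},
}
print(choose_victim(cache, paid_score))   # candidate to evict under the PAID-style score
```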
web refers to new generation of web applications designed to support collaboration and the sharing of user generated content these applications are increasingly being used not just to share personal information but also to manage it for example user might use facebook to manage their photos and personal contacts networking site such as linkedin to manage professional contacts and various project wiki sites to manage and share information about publications and presentations as result personal data and its management become fragmented not only across desktop applications but also between desktop applications and various web applications we look at personal information management pim issues in the realm of web showing how the respective communities might profit from each other
this paper reports our in situ study using contextual inquiry ci it solicits user requirements of hierarchically organized search results for mobile access in our experiment search activities of our subjects are recorded in the video and the interviewer solicits the interface requirements during and after the experiment an affinity diagram is built as summary of our findings in the experiment and the major issues are discussed in this paper the search behavior of our subjects is summarized into flow chart in this study we report mobile interface features that are desired by our users in addition to those found in an earlier survey
in large scale clusters and computational grids component failures become norms instead of exceptions failure occurrence as well as its impact on system performance and operation costs have become an increasingly important concern to system designers and administrators in this paper we study how to efficiently utilize system resources for high availability clusters with the support of the virtual machine vm technology we design reconfigurable distributed virtual machine rdvm infrastructure for clusters computing we propose failure aware node selection strategies for the construction and reconfiguration of rdvms we leverage the proactive failure management techniques in calculating nodes reliability status we consider both the performance and reliability status of compute nodes in making selection decisions we define capacity reliability metric to combine the effects of both factors in node selection and propose best fit algorithms to find the best qualified nodes on which to instantiate vms to run parallel jobs we have conducted experiments using failure traces from production clusters and the nas parallel benchmark programs on real cluster the results show the enhancement of system productivity and dependability by using the proposed strategies with the best fit strategies the job completion rate is increased by compared with that achieved in the current lanl hpc cluster and the task completion rate reaches
spatial co location patterns represent the subsets of boolean spatial features whose instances often locate in close geographic proximity the existing co location pattern mining algorithms aim to find spatial relations based on the distance threshold however it is hard to decide the distance threshold for spatial data set without any prior knowledge moreover spatial data sets are usually not evenly distributed and single distance value cannot fit an irregularly distributed spatial data set well in this paper we propose the notion of the nearest features simply nf based co location pattern the nf set of spatial feature’s instances is used to evaluate the spatial relationship between this feature and any other feature nf based co location pattern mining algorithm by using tree knfcom in short is further presented to identify the co location patterns in large spatial data sets the experimental results show that the knfcom algorithm is effective and efficient and its complexity is
an ontology is powerful way of representing knowledge for multiple purposes there are several ontology languages for describing concepts properties objects and relationships however ontologies in information systems are not primarily written for human reading and communication among humans for many business government and scientific purposes written documents are the primary description and communication media for human knowledge communication unfortunately there is significant gap between knowledge expressed as textual documents and knowledge represented as ontologies semantic documents aim at combining documents and ontologies and allowing users to access the knowledge in multiple ways by adding annotations to electronic document formats and including ontologies in electronic documents it is possible to reconcile documents and ontologies and to provide new services such as ontology based searches of large document databases to accomplish this goal semantic documents require tools that support both complex ontologies and advanced document formats the protege ontology editor together with custom tailored documentation handling extension enables developers to create semantic documents by linking preexisting documents to ontologies
xml has been explored by both research and industry communities more than papers were published on different aspects of xml with so many publications it is hard for someone to decide where to start hence this paper presents some of the research topics on xml namely xml on relational databases query processing views data matching and schema evolution it then summarizes some of the most relevant or traditional papers on those subjects
in order to verify larger and more complicated systems with model checking it is necessary to apply some abstraction techniques using subset of first order logic called euf is one of them the euf model checking problem is however generally undecidable in this paper we introduce technique called term height reduction to guarantee the termination of state enumeration in euf model checking this technique generates an over approximate set of states including all the reachable states by checking designated invariant property we can guarantee whether the invariant property always holds for the design when verification succeeds we apply our algorithm to simple program and dsp design and show the experimental results
this article presents the design implementation and evaluation of escheduler an energy efficient soft real time cpu scheduler for multimedia applications running on mobile device escheduler seeks to minimize the total energy consumed by the device while meeting multimedia timing requirements to achieve this goal escheduler integrates dynamic voltage scaling into the traditional soft real time cpu scheduling it decides at what cpu speed to execute applications in addition to when to execute what applications escheduler makes these scheduling decisions based on the probability distribution of cycle demand of multimedia applications and obtains their demand distribution via online profiling we have implemented escheduler in the linux kernel and evaluated it on laptop with variable speed cpu and typical multimedia codecs our experimental results show four findings first the cycle demand distribution of our studied codecs is stable or changes slowly this stability implies the feasibility to perform our proposed energy efficient scheduling with low overhead second escheduler delivers soft performance guarantees to these codecs by bounding their deadline miss ratio under the application specific performance requirements third escheduler reduces the total energy of the laptop by % to % relative to the scheduling algorithm without voltage scaling and by % to % relative to voltage scaling algorithms without considering the demand distribution finally escheduler saves energy by % to % by explicitly considering the discrete cpu speeds and the corresponding total power of the whole laptop rather than assuming continuous speeds and cubic speed power relationship
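A minimal sketch of the scheduling decision described above, assuming the speed is chosen from a discrete set so that a cycle-demand quantile estimated from profiled per-frame cycle counts meets the deadline; the miss-ratio bound, the profile numbers, and the function name are illustrative, not the escheduler algorithm itself.

```python
def choose_speed(cycle_demand_samples, deadline_s, available_speeds_hz, miss_ratio_bound=0.05):
    """Pick the slowest discrete speed whose predicted deadline-miss ratio stays within the bound."""
    samples = sorted(cycle_demand_samples)
    # demand quantile that must finish within the deadline (e.g. the 95th percentile)
    idx = min(int(len(samples) * (1.0 - miss_ratio_bound)), len(samples) - 1)
    required_cycles = samples[idx]
    for speed in sorted(available_speeds_hz):
        if required_cycles / speed <= deadline_s:
            return speed
    return max(available_speeds_hz)   # even the top speed cannot guarantee the bound

# per-frame cycle demands gathered by online profiling (synthetic numbers)
profile = [8e6, 9e6, 9.5e6, 10e6, 12e6, 15e6, 9e6, 11e6, 18e6, 10e6]
print(choose_speed(profile, deadline_s=1 / 30, available_speeds_hz=[300e6, 600e6, 800e6, 1000e6]))
```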
we consider the problem of scheduling an application composed of independent tasks on fully heterogeneous master worker platform with communication costs we introduce bi criteria approach aiming at maximizing the throughput of the application while minimizing the energy consumed by participating resources assuming arbitrary super linear power consumption laws we investigate different models for energy consumption with and without start up overheads building upon closed form expressions for the uniprocessor case we derive optimal or asymptotically optimal solutions for both models
energy efficiency is becoming an increasingly important feature for both mobile and high performance server systems most processors designed today include power management features that provide processor operating points which can be used in power management algorithms however existing power management algorithms implicitly assume that lower performance points are more energy efficient than higher performance points our empirical observations indicate that for many systems this assumption is not valid we introduce new concept called critical power slope to explain and capture the power performance characteristics of systems with power management features we evaluate three systems clock throttled pentium laptop frequency scaled powerpc platform and voltage scaled system to demonstrate the benefits of our approach our evaluation is based on empirical measurements of the first two systems and publicly available data for the third using critical power slope we explain why on the pentium based system it is energy efficient to run only at the highest frequency while on the powerpc based system it is energy efficient to run at the lowest frequency point we confirm our results by measuring the behavior of web serving benchmark furthermore we extend the critical power slope concept to understand the benefits of voltage scaling when combined with frequency scaling we show that in some cases it may be energy efficient not to reduce voltage below certain point
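A small worked comparison of the energy trade-off that the critical power slope concept captures, under assumed (not measured) power numbers: with clock throttling the active power barely drops at the lower speed, so finishing fast and idling longer can cost less energy overall.

```python
def energy_for_fixed_work(power_active_w, freq_hz, power_idle_w, cycles, period_s):
    """Energy to run `cycles` of work within `period_s`, idling for the rest of the period."""
    busy = cycles / freq_hz
    assert busy <= period_s, "this operating point cannot finish the work in time"
    return power_active_w * busy + power_idle_w * (period_s - busy)

cycles, period, idle_power = 50e6, 0.1, 5.0   # assumed workload and whole-system idle power

# clock throttling barely lowers active power at the reduced speed (no voltage change)
low  = energy_for_fixed_work(power_active_w=16.0, freq_hz=600e6,  power_idle_w=idle_power,
                             cycles=cycles, period_s=period)
high = energy_for_fixed_work(power_active_w=18.0, freq_hz=1000e6, power_idle_w=idle_power,
                             cycles=cycles, period_s=period)
print(f"throttled point: {low:.3f} J, full-speed point: {high:.3f} J")
# here the full-speed point finishes sooner and idles longer, and ends up using less
# energy, which is the kind of behaviour the critical power slope is meant to identify
```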
reducing the number of data cache accesses improves performance port efficiency bandwidth and motivates the use of single ported caches instead of complex and expensive multi ported ones in this paper we consider an intrusion detection system as target application and study the effectiveness of two techniques prefetching data from the cache into local buffers in the processor core and ii load instruction reuse ir in reducing data cache traffic the analysis is carried out using microarchitecture and instruction set representative of programmable processor with the aim of determining if the above techniques are viable for programmable pattern matching engine found in many network processors we find that ir is the most generic and efficient technique which reduces cache traffic by up to however combination of prefetching and ir with application specific tuning performs as well as and sometimes better than ir alone
this paper introduces techniques to detect mutability of fields and classes in java variable is considered to be mutable if new value is stored into it as well as if any of its reachable variables is mutable we present static flow sensitive analysis algorithm which can be applied to any java component the analysis classifies fields and classes as either mutable or immutable in order to facilitate openworld analysis the algorithm identifies situations that expose variables to potential modification by code outside the component as well as situations where variables are modified by the analyzed code we also present an implementation of the analysis which focuses on detecting mutability of class variables so as to avoid isolation problems the implementation incorporates intra and inter procedural data flow analyses and is shown to be highly scalable experimental results demonstrate the effectiveness of the algorithms
input validation is the enforcement of constraints that an input must satisfy before it is accepted in program it is an essential and important feature in large class of systems and usually forms major part of data intensive system currently the design and implementation of input validation are carried out by application developers the recovery and maintenance of input validation implemented in system is challenging issue in this paper we introduce variant of control flow graph called validation flow graph as model to analyze input validation implemented in program we have also discovered some empirical properties that characterize the implementation of input validation based on the model and the properties discovered we then propose method that recovers the input validation model from source and use program slicing techniques to aid the understanding and maintenance of input validation we have also evaluated the proposed method through case studies the results show that the method can be very useful and effective for both experienced and inexperienced developers
nand flash memory can provide cost effective secondary storage in mobile embedded systems but its lack of random access capability means that code shadowing is generally required taking up extra ram space demand paging with nand flash memory has recently been proposed as an alternative which requires less ram this scheme is even more attractive for onenand flash which consists of nand flash array with sram buffers and supports execute in place xip which allows limited random access to data on the sram buffers we introduce novel demand paging method for onenand flash memory with xip feature the proposed on line demand paging method with xip adopts finite size sliding window to capture the paging history and thus predict future page demands we particularly focus on non critical code accesses which can disturb real time code experimental results show that our method outperforms conventional lru based demand paging by in terms of execution time and by in terms of energy consumption it even beats the optimal solution obtained from min which is conventional off line demand paging technique by and respectively
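A minimal sketch of demand paging driven by a finite sliding window over recent page references, assuming the least-referenced resident page within the window is evicted on a fault; the class name and eviction rule are illustrative, not the paper's exact predictor.

```python
from collections import Counter, deque

class SlidingWindowPager:
    """Keep in RAM the pages referenced most often within a finite window of recent accesses."""

    def __init__(self, ram_pages, window_size):
        self.ram_pages = ram_pages
        self.window = deque(maxlen=window_size)   # only the most recent page references
        self.resident = set()

    def access(self, page):
        self.window.append(page)
        fault = page not in self.resident
        if fault:
            if len(self.resident) >= self.ram_pages:
                counts = Counter(self.window)
                # evict the resident page that looks least useful in the recent window
                victim = min(self.resident, key=lambda p: counts[p])
                self.resident.discard(victim)
            self.resident.add(page)
        return fault

pager = SlidingWindowPager(ram_pages=2, window_size=8)
trace = ["a", "b", "a", "a", "c", "a", "b", "a"]
print(sum(pager.access(p) for p in trace), "page faults on", len(trace), "accesses")
```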
dbtm ims and tandemtm systems aries is applicable not only to database management systems but also to persistent object oriented languages recoverable file systems and transaction based operating systems aries has been implemented to varying degrees in ibm’s os tm extended edition database manager db workstation data save facility vm starburst and quicksilver and in the university of wisconsin’s exodus and gamma database machine
value sensitive design and participatory design are two methodological frameworks that account for ethical issues throughout the process of technology design through analysis and case studies this paper argues that such methods should be applied to persuasive technology computer systems that are intended to change behaviors and attitudes
in this paper we analyze the performance of functional disk system with relational database engine fds rii for nonuniform data distribution fds rii is relational storage system designed to accelerate relational algebraic operations which employs hash based algorithm to process relational operations basically in the hash based algorithm relation is first partitioned into several clusters by split function then each cluster is staged onto the main memory and further hash function is applied to each cluster to perform relational operation thus the nonuniformity of split and hash functions is considered to be resulting from nonuniform data distribution on the hash based algorithm we clarify the effect of nonuniformity of the hash and split functions on the join performance it is possible to attenuate the effect of the hash function nonuniformity by increasing the number of processors and processing the buckets in parallel furthermore in order to tackle the nonuniformity of split function we introduce the combined hash algorithm this algorithm combines the grace hash algorithm with the nested loop algorithm in order to handle the overflown bucket efficiently using the combined hash algorithm we find that the execution time of the nonuniform data distribution is almost equal to that of the uniform data distribution thus we can get sufficiently high performance on fds rii also for nonuniformly distributed data
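A minimal algorithmic sketch of the combined hash idea, assuming buckets produced by a split function are joined with an in-memory hash table when they fit and with a nested-loop fallback when they overflow the memory limit; FDS-R II's storage-level staging and multiprocessor aspects are not modeled, and all names and limits are illustrative.

```python
from collections import defaultdict

def split(relation, key, num_buckets):
    """Partition a relation into buckets with a split function (here: hash of the join key)."""
    buckets = defaultdict(list)
    for tup in relation:
        buckets[hash(tup[key]) % num_buckets].append(tup)
    return buckets

def hash_join_bucket(r_bucket, s_bucket, key):
    """In-memory hash join of one bucket pair."""
    table = defaultdict(list)
    for r in r_bucket:
        table[r[key]].append(r)
    return [(r, s) for s in s_bucket for r in table.get(s[key], [])]

def nested_loop_join_bucket(r_bucket, s_bucket, key):
    """Fallback for an overflowing bucket: no hash table needs to fit in memory."""
    return [(r, s) for r in r_bucket for s in s_bucket if r[key] == s[key]]

def combined_hash_join(r_rel, s_rel, key, num_buckets=4, memory_limit=1000):
    buckets_r, buckets_s = split(r_rel, key, num_buckets), split(s_rel, key, num_buckets)
    out = []
    for b in range(num_buckets):
        r_b, s_b = buckets_r.get(b, []), buckets_s.get(b, [])
        join = hash_join_bucket if len(r_b) <= memory_limit else nested_loop_join_bucket
        out.extend(join(r_b, s_b, key))
    return out

R = [{"id": i, "x": i % 3} for i in range(10)]
S = [{"id": i, "y": i} for i in range(5)]
print(len(combined_hash_join(R, S, key="id")))   # 5 matching pairs
```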
in this paper we present partitioning mapping and routing optimization framework for energy efficient vfi voltage frequency island based network on chip unlike the recent work which only performs partitioning together with voltage frequency assignment for given mesh network layout our framework consists of three key vfi aware components ie vfi aware partitioning vfi aware mapping and vfi aware routing thus our technique effectively reduces vfi overheads such as mixed clock fifos and voltage level converters by over and energy consumption by over compared with the previous state of art works
to exploit larger amounts of instruction level parallelism processors are being built with wider issue widths and larger numbers of functional units instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors block structured isas provide an effective means of increasing the instruction fetch rate we define an optimization called block enlargement that can be applied to block structured isa to increase the instruction fetch rate of processor that implements that isa we have constructed compiler that generates block structured isa code and simulator that models the execution of that code on block structured isa processor we show that for the specint benchmarks the block structured isa improves the performance of an aggressive wide issue dynamically scheduled processor by while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling
we consider mobile ad hoc network manet formed by agents that move at speed according to the manhattan random way point model over square region of side length the resulting stationary agent spatial probability distribution is far from being uniform the average density over the central zone is asymptotically higher than that over the suburb agents exchange data iff they are at distance at most within each other we study the flooding time of this manet the number of time steps required to broadcast message from one source agent to all agents of the network in the stationary phase we prove the first asymptotic upper bound on the flooding time this bound holds with high probability it is decreasing function of and and it is tight for wide and relevant range of the network parameters ie and consequence of our result is that flooding over the sparse and highly disconnected suburb can be as fast as flooding over the dense and connected central zone rather surprisingly this property holds even when is exponentially below the connectivity threshold of the manet and the speed is very low
the act of reminiscence is an important element of many interpersonal activities especially for elders where the therapeutic benefits are well understood individuals typically use various objects as memory aids in the act of recalling sharing and reviewing their memories of life experiences through preliminary user study with elders using cultural probe we identified that common memory aid is photo album or scrapbook in which items are collected and preserved in this article we present and discuss novel interface to our memento system that can support the creation of scrapbooks that are both digital and physical in form we then provide an overview of the user’s view of memento and brief description of its multi agent architecture we report on series of exploratory user studies in which we evaluate the effect and performance of memento and its suitability in supporting memory sharing and dissemination with physical digital scrapbooks taking account of the current technical limitations of memento our results show general approval and suitability of our system as an appropriate interaction scheme for the creation of physical digital items such as scrapbooks
this paper presents novel ranking style word segmentation approach called rsvm seg which is well tailored to chinese information retrieval cir this strategy makes segmentation decision based on the ranking of the internal associative strength between each pair of adjacent characters of the sentence on the training corpus composed of query items ranking model is learned by widely used tool ranking svm with some useful statistical features such as mutual information difference of test frequency and dictionary information experimental results show that this method is able to eliminate overlapping ambiguity much more effectively compared to the current word segmentation methods furthermore as this strategy naturally generates segmentation results with different granularity the performance of cir systems is improved and achieves the state of the art
query processing techniques for xml data have focused mainly on tree pattern queries tpqs however the need for querying xml data sources whose structure is very complex or not fully known to the user and the need to integrate multiple xml data sources with different structures have driven recently the suggestion of query languages that relax the complete specification of tree pattern in order to implement the processing of such languages in current dbmss their containment problem has to be efficiently solved in this paper we consider query language which generalizes tpqs by allowing the partial specification of tree pattern partial tree pattern queries ptpqs constitute large fragment of xpath that flexibly permits the specification of broad range of queries from keyword queries without structure to queries with partial specification of the structure to complete tpqs we address the containment problem for ptpqs this problem becomes more complex in the context of ptpqs because the partial specification of the structure allows new non trivial structural expressions to be inferred from those explicitly specified in query we show that the containment problem cannot be characterized by homomorphisms between ptpqs even when ptpqs are put in canonical form that comprises all derived structural expressions we provide necessary and sufficient conditions for this problem in terms of homomorphisms between ptpqs and possibly exponential number of tpqs to cope with the high complexity of ptpq containment we suggest heuristic approach for this problem that trades accuracy for speed an extensive experimental evaluation of our heuristic shows that our heuristic approach can be efficiently implemented in query optimizer
our research shows that for large databases without considerable additional storage overhead cluster based retrieval cbr can compete with the time efficiency and effectiveness of the inverted index based full search fs the proposed cbr method employs storage structure that blends the cluster membership information into the inverted file posting lists this approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency for example in terms of in memory computations our new approach can reduce query processing time to of fs the experiments confirm that the approach is scalable and system performance improves with increasing database size in the experiments we use the cover coefficient based clustering methodology and the financial times database of trec containing documents of size mb defined by terms with total of inverted index elements this study provides cbr efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering
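A minimal sketch of blending cluster membership into the posting lists, assuming each posting is grouped by cluster id so that query processing scores only documents from the selected best-matching clusters; the scoring (raw term frequency), the data, and all names are illustrative simplifications of the paper's cluster-skipping structure.

```python
from collections import defaultdict

def build_index(docs, doc_cluster):
    """Posting lists of (doc_id, term_frequency), grouped by cluster so whole clusters can be skipped."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for term in set(terms):
            index[term][doc_cluster[doc_id]].append((doc_id, terms.count(term)))
    return index

def cluster_based_retrieval(index, query_terms, best_clusters, top_k=3):
    """Score only documents belonging to the selected clusters; other postings are skipped."""
    scores = defaultdict(int)
    for term in query_terms:
        for cluster_id in best_clusters:
            for doc_id, tf in index.get(term, {}).get(cluster_id, []):
                scores[doc_id] += tf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

docs = {1: ["stock", "market", "rates"], 2: ["football", "cup"],
        3: ["market", "interest", "rates"], 4: ["cup", "final"]}
doc_cluster = {1: "finance", 2: "sport", 3: "finance", 4: "sport"}
index = build_index(docs, doc_cluster)
print(cluster_based_retrieval(index, ["market", "rates"], best_clusters=["finance"]))
```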
security in host based computing continues to be an area of regular research problems in host based security exist in three principal areas application protection host environment protection and data protection these three areas echo the security issues for mobile agents protecting agents from hostile hosts protecting hosts from agents and protecting data in transit as identified in vuong and peng mobile agents are specialized applications which travel between hosts mobility is achieved by passing data and execution parameters from one host to another parallel can be drawn between mobile agents and host based applications considering both perform host based computations in this paper we regard the security challenges for mobile agents equivalent to those of applications installed on host furthermore we also consider circular threat that exists between applications and the host environment
in this paper we present an unlexicalized parser for german which employs smoothing and suffix analysis to achieve labelled bracket score of higher than previously reported results on the negra corpus in addition to the high accuracy of the model the use of smoothing in an unlexicalized parser allows us to better examine the interplay between smoothing and parsing results
restricting network access of routing and packet forwarding to well behaving nodes and denying access from misbehaving nodes are critical for the proper functioning of mobile ad hoc network where cooperation among all networking nodes is usually assumed however the lack of network infrastructure the dynamics of the network topology and node membership and the potential attacks from inside the network by malicious and or noncooperative selfish nodes make the conventional network access control mechanisms not applicable we present ursa ubiquitous and robust access control solution for mobile ad hoc networks ursa implements ticket certification services through multiple node consensus and fully localized instantiation it uses tickets to identify and grant network access to well behaving nodes in ursa no single node monopolizes the access decision or is completely trusted instead multiple nodes jointly monitor local node and certify revoke its ticket furthermore ursa ticket certification services are fully localized into each node’s neighborhood to ensure service ubiquity and resilience through analysis simulations and experiments we show that our design effectively enforces access control in the highly dynamic mobile ad hoc network
operating system abstractions do not always reach high enough for direct use by language or applications designer the gap is filled by language specific runtime environments which become more complex for richer languages commonlisp needs more than which needs more than but language specific environments inhibit integrated multi lingual programming and also make porting hard for instance because of operating system dependencies to help solve these problems we have built the portable common runtime pcr language independent and operating system independent base for modern languages pcr offers four interrelated facilities storage management including universal garbage collection symbol binding including static and dynamic linking and loading threads lightweight processes and low level including network sockets pcr is "common" because these facilities simultaneously support programs in several languages pcr supports cedar scheme and commonlisp intercalling and runs pre existing and commonlisp kyoto binaries pcr is "portable" because it uses only small set of operating system features the pcr source code is available for use by other researchers and developers
resource description extracted by query sampling method can be applied to determine which database sources certain query should be firstly sent to in this paper we propose contextualized query sampling method to extract the resources which are most relevant to up to date context practically the proposed approach is adopted to personal crawler systems the so called focused crawlers which can support the corresponding user’s web navigation tasks in real time by taking into account the user context eg intentions or interests the crawler can build the queries to evaluate candidate information sources as result we can discover semantic associations between user context and the sources and ii between all pairs of the sources these associations are applied to rank the sources and transform the queries for the other sources for evaluating the performance of contextualized query sampling on information sources we compared the ranking lists recommended by the proposed method with user feedbacks ie ideal ranks and also computed the precision of discovered subsumptions as semantic associations between the sources
in this work we present novel approach to clustering web site users into different groups and generating common user profiles these profiles can be used to make recommendations personalize web sites and for other uses such as targeting users for advertising by using the concept of mass distribution in dempster shafer’s theory the belief function similarity measure in our algorithm adds to the clustering task the ability to capture the uncertainty among web user’s navigation behavior our algorithm is relatively simple to use and gives comparable results to other approaches reported in the literature of web mining
memory constraint presents one of the critical challenges for embedded software writers while circuit level solutions based on cramming as many bits as possible into the smallest area possible are certainly important memory conscious software can bring much higher benefits focusing on an embedded java based environment this paper studies potential benefits and challenges when heap memory is managed at field granularity instead of object this paper discusses these benefits and challenges with the help of two field level analysis techniques the first of these called the field level lifetime analysis takes advantage of the observation that for given object instance not all the fields have the same lifetime the field level lifetime analysis demonstrates the potential benefits of exploiting this information our second analysis referred to as the disjointness analysis is built upon the fact that for given object some fields have disjoint lifetimes and therefore they can potentially share the same memory space to quantify the impact of these techniques we performed experiments with several benchmarks and point out the important characteristics that need to be considered by application writers
in this paper we analyze novel paradigm of reliable communication which is not based on the traditional timeout and retransmit mechanism of tcp our approach which we call fountain based protocol fbp consists of using digital fountain encoding which guarantees that duplicate packets are almost impossible by using game theory we analyze the behavior of tcp and fbp in the presence of congestion we show that hosts using tcp have an incentive to switch to an fbp approach obtaining higher goodput furthermore we also show that nash equilibrium occurs when all hosts use fbp ie when fbp hosts act in an absolutely selfish manner injecting packets into the network as fast as they can and without any kind of congestion control approach at this equilibrium the performance of the network is similar to the performance obtained when all hosts comply with tcp regarding the interaction of hosts using fbp at different rates our results show that the nash equilibrium is reached when all hosts send at the highest possible rate and as before that the performance of the network in such case is similar to the obtained when all hosts comply with tcp
web applications such as delicious flickr or lastfm have recently become extremely popular and as result large amount of semantically rich metadata produced by users becomes available and exploitable tag information can be used for many purposes eg user profiling recommendations clustering etc though the benefit of tags for search is by far the most discussed usage tag types differ largely across systems and previous studies showed that while some tag type categories might be useful for some particular users when searching they may not bring any benefit to others the present paper proposes an approach which utilizes rule based as well as model based methods in order to automatically identify exactly these different types of tags we compare the automatic tag classification produced by our algorithms against ground truth data set consisting of manual tag type assignments produced by human raters experimental results show that our methods can identify tag types with high accuracy thus enabling further improvement of systems making use of social tags
novel approach to structured design of complex interactive virtual reality applications called flex vr is presented two main elements of the approach are first componentization of vr content which enables to dynamically compose interactive behavior rich virtual scenes from independent components and second high level vr content model which enables users to easily create and manipulate complex vr application content
appliances represent quickly growing domain that raises new challenges in os design and development first new products appear at rapid pace to satisfy emerging needs second the nature of these markets makes these needs unpredictable lastly given the competitiveness of such markets there exists tremendous pressure to deliver new products in fact innovation is requirement in emerging markets to gain commercial success the embedded nature of appliances makes upgrading and fixing bugs difficult and sometimes impossible to achieve consequently there must be high level of confidence in the software additionally the pace of innovation requires rapid os development so as to match ever changing needs of new appliances to offer confidence software must be highly robust that is for given type of appliance critical behavioral properties must be determined and guaranteed eg power management must ensure that data are not lost robustness can be provided by mechanisms and or tools the ideal approach takes the form of certification tools aimed at statically verifying critical properties such tools avoid the need for laborious and error prone testing process to be first in market requires not only that the testing process be shortened but the development time as well to achieve this goal three strategies are needed re use of code to rapidly produce new product by assembling existing building blocks factorization of expertise to capitalize on domain specific experience and open endedness of software systems to match evolving functionalities and hardware features in this paper existing os approaches are assessed with respect to the requirements raised by appliances the limitations of these approaches are analyzed and used as basis to propose new approach to designing and structuring oses for appliances this approach is based on domain specific languages dsls and offers rapid development of robust oses we illustrate and assess our approach by concrete examples
the slowing pace of commodity microprocessor performance improvements combined with ever increasing chip power demands has become of utmost concern to computational scientists as result the high performance computing community is examining alternative architectures that address the limitations of modern cache based designs in this work we examine the potential of using the forthcoming sti cell processor as building block for future high end computing systems our work contains several novel contributions first we introduce performance model for cell and apply it to several key scientific computing kernels dense matrix multiply sparse matrix vector multiply stencil computations and ffts the difficulty of programming cell which requires assembly level intrinsics for the best performance makes this model useful as an initial step in algorithm design and evaluation next we validate the accuracy of our model by comparing results against published hardware results as well as our own implementations on the cell full system simulator additionally we compare cell performance to benchmarks run on leading superscalar amd opteron vliw intel itanium and vector cray xe architectures our work also explores several different mappings of the kernels and demonstrates simple and effective programming model for cell’s unique architecture finally we propose modest microarchitectural modifications that could significantly increase the efficiency of double precision calculations overall results demonstrate the tremendous potential of the cell architecture for scientific computations in terms of both raw performance and power efficiency
unstructured hexahedral volume meshes are of particular interest for visualization and simulation applications they allow regular tiling of the three dimensional space and show good numerical behaviour in finite element computations beside such appealing properties volume meshes take up huge amounts of space when stored in raw format in this paper we present technique for encoding the connectivity and geometry of unstructured hexahedral volume meshes for connectivity compression we generalize the concept of coding with degrees from the surface to the volume case in contrast to the connectivity of surface meshes which can be coded as sequence of vertex degrees the connectivity of volume meshes is coded as sequence of edge degrees this naturally exploits the regularity of typical hexahedral meshes we achieve compression rates of around bits per hexahedron bph that go down to bph for regular meshes on our test meshes the average connectivity compression ratio is for geometry compression we perform simple parallelogram prediction on uniformly quantized vertices within the side of hexahedron tests show an average geometry compression ratio of at quantization level of bits
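To make the geometry coding step concrete, below is a minimal sketch of parallelogram prediction on one quadrilateral face, assuming uniformly quantized coordinates; the function names, the quantization step, and the numpy representation are illustrative choices rather than the paper's implementation, and entropy coding of the residual is omitted.

```python
import numpy as np

def quantize(v, step=1e-3):
    """Uniformly quantize a vertex position (quantization step is an assumed parameter)."""
    return np.round(np.asarray(v, dtype=float) / step).astype(np.int64)

def parallelogram_residual(v_prev, v_corner, v_next, v_actual, step=1e-3):
    """Predict the fourth vertex of a quadrilateral face as v_prev + v_next - v_corner
    and return the integer residual that would then be entropy coded."""
    prediction = quantize(v_prev, step) + quantize(v_next, step) - quantize(v_corner, step)
    return quantize(v_actual, step) - prediction

# toy usage: a nearly planar quad gives a small residual, which is what makes the scheme effective
r = parallelogram_residual([0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0.001])
print(r)  # -> [0 0 1] at a 1e-3 quantization step
```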
in this paper resting on the analysis of instruction frequency and function based instruction sequences we develop an automatic malware categorization system amcs for automatically grouping malware samples into families that share some common characteristics using cluster ensemble by aggregating the clustering solutions generated by different base clustering algorithms we propose principled cluster ensemble framework for combining individual clustering solutions based on the consensus partition the domain knowledge in the form of sample level constraints can be naturally incorporated in the ensemble framework in addition to account for the characteristics of feature representations we propose hybrid hierarchical clustering algorithm which combines the merits of hierarchical clustering and medoids algorithms and weighted subspace medoids algorithm to generate base clusterings the categorization results of our amcs system can be used to generate signatures for malware families that are useful for malware detection the case studies on large and real daily malware collection from kingsoft anti virus lab demonstrate the effectiveness and efficiency of our amcs system
in this paper we address the problem of automatically deriving vocabularies of motion modules from human motion data taking advantage of the underlying spatio temporal structure in motion we approach this problem with data driven methodology for modularizing motion stream or time series of human motion into vocabulary of parameterized primitive motion modules and set of meta level behaviors characterizing extended combinations of the primitives central to this methodology is the discovery of spatio temporal structure in motion stream we estimate this structure by extending an existing nonlinear dimension reduction technique isomap to handle motion data with spatial and temporal dependencies the motion vocabularies derived by our methodology provide substrate of autonomous behavior and can be used in variety of applications we demonstrate the utility of derived vocabularies for the application of synthesizing new humanoid motion that is structurally similar to the original demonstrated motion
many dynamic optimization and or binary translation systems hold optimized translated superblocks in code cache conventional code caching systems suffer from overheads when control is transferred from one cached superblock to another especially via register indirect jumps the basic problem is that instruction addresses in the code cache are different from those in the original program binary therefore performance for register indirect jumps depends on the ability to translate efficiently from source binary pc values to code cache pc values we analyze several key aspects of superblock chaining and find that conventional baseline code cache with software jump target prediction results in ipc loss versus the original binary we identify the inability to use a conventional return address stack as the most significant performance limiter in code cache systems we introduce a modified software prediction technique that reduces the ipc loss to this technique is based on technique used in threaded code interpreters a number of hardware mechanisms including specialized return address stack and hardware cache for translated jump target addresses are studied for efficiently supporting register indirect jumps once all the chaining overheads are removed by these support mechanisms a superblock based code cache improves performance due to a better branch prediction rate improved cache locality and increased chances of straight line fetches simulation results show ipc improvement over current generation way superscalar processor
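The software jump-target prediction described above can be pictured as a per-site fast path backed by a full source-pc-to-code-cache-pc table, similar in spirit to the dispatch used in threaded-code interpreters. The sketch below is a simplified, hypothetical model (the class and field names are invented) and is not the paper's mechanism or ISA-level code.

```python
class IndirectJumpSite:
    """Toy model of one register-indirect jump site in a code cache.
    A cheap per-site predicted source pc is checked first; on a miss we
    fall back to a full lookup from source pcs to code-cache pcs."""

    def __init__(self, translation_map):
        self.translation_map = translation_map  # source pc -> code cache pc
        self.predicted_src = None
        self.predicted_dst = None

    def dispatch(self, src_pc):
        # fast path: the software "prediction", analogous to an inline cache
        if src_pc == self.predicted_src:
            return self.predicted_dst
        # slow path: table lookup, then update the prediction for next time
        dst = self.translation_map[src_pc]
        self.predicted_src, self.predicted_dst = src_pc, dst
        return dst

site = IndirectJumpSite({0x400100: 0x7F0010, 0x400200: 0x7F0080})
print(hex(site.dispatch(0x400100)))  # miss: full lookup, prediction updated
print(hex(site.dispatch(0x400100)))  # hit on the predicted target
```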
this work introduces new family of link based dissimilarity measures between nodes of weighted directed graph this measure called the randomized shortest path rsp dissimilarity depends on parameter and has the interesting property of reducing on one end to the standard shortest path distance when is large and on the other end to the commute time or resistance distance when is small near zero intuitively it corresponds to the expected cost incurred by random walker in order to reach destination node from starting node while maintaining constant entropy related to spread in the graph the parameter is therefore biasing gradually the simple random walk on the graph towards the shortest path policy by adopting statistical physics approach and computing sum over all the possible paths discrete path integral it is shown that the rsp dissimilarity from every node to particular node of interest can be computed efficiently by solving two linear systems of equations where is the number of nodes on the other hand the dissimilarity between every couple of nodes is obtained by inverting an matrix the proposed measure can be used for various graph mining tasks such as computing betweenness centrality finding dense communities etc as shown in the experimental section
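For readers who want the limiting behaviour spelled out, the randomized shortest path literature commonly writes the path distribution and the expected cost along the following lines; the notation here is assumed for illustration and is not taken verbatim from the paper:

\[
\Pr\nolimits_{\theta}(\wp) \;=\; \frac{\tilde{\pi}(\wp)\, e^{-\theta\, c(\wp)}}{\sum_{\wp' \in \mathcal{P}_{ij}} \tilde{\pi}(\wp')\, e^{-\theta\, c(\wp')}},
\qquad
\Delta_{\theta}(i,j) \;=\; \sum_{\wp \in \mathcal{P}_{ij}} \Pr\nolimits_{\theta}(\wp)\, c(\wp),
\]

where \(\mathcal{P}_{ij}\) is the set of paths from node \(i\) to node \(j\), \(c(\wp)\) is the total cost of path \(\wp\), and \(\tilde{\pi}(\wp)\) is its likelihood under the natural (reference) random walk. Large \(\theta\) concentrates the distribution on least-cost paths and recovers the shortest path distance, while \(\theta \to 0^{+}\) spreads it over all paths and recovers commute-time-like behaviour. In the related literature the sum over paths is evaluated in closed form through \(\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}\) with \(\mathbf{W} = \mathbf{P}^{\mathrm{ref}} \circ e^{-\theta \mathbf{C}}\) taken elementwise, which is what reduces the computation to solving linear systems as stated above.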
software engineering faculty face the challenge of educating future researchers and industry practitioners regarding the generation of empirical software engineering studies and their use in evidence based software engineering in order to engage the net generation with this topic we propose development and population of community driven web database containing summaries of empirical software engineering studies we also present our experience with integrating these activities into graduate software engineering course these efforts resulted in the creation of seeds software engineering evidence database system graduate students initially populated seeds with summaries of empirical software engineering studies the summaries were randomly sampled and reviewed by industry professionals who found the student written summaries to be at least as useful as professional written summaries in fact more of the respondents found the student written summaries to be very useful motivations student and instructor developed prototypes and assessments of the resulting artifacts will be discussed
we describe an optimization method for combinational and sequential logic networks with emphasis on scalability and the scope of optimization the proposed resynthesis is capable of substantial logic restructuring is customizable to solve variety of optimization tasks and has reasonable runtime on industrial designs the approach uses don’t cares computed for window surrounding node and can take into account external don’t cares eg unreachable states it uses sat solver and interpolation to find new representation for node this representation can be in terms of inputs from other nodes in the window thus effecting boolean re substitution experimental results on input lut networks after high effort synthesis show substantial reductions in area and delay when applied to large academic benchmarks the lut count and logic level is reduced by and respectively the longest runtime for synthesis and mapping is about two minutes when applied to set of industrial benchmarks ranging up to luts the lut count and logic level is reduced by and respectively experimental results on input lut networks after high effort synthesis show substantial reductions in area and delay the longest runtime is about minutes
accurate and synchronized time is crucial in many sensor network applications due to the need for consistent distributed sensing and coordination in hostile environments where an adversary may attack the networks and or the applications through external or compromised nodes time synchronization becomes an attractive target due to its importance this paper describes the design implementation and evaluation of tinysersync secure and resilient time synchronization subsystem for wireless sensor networks running tinyos this paper makes three contributions first it develops secure single hop pairwise time synchronization technique using hardware assisted authenticated medium access control mac layer timestamping unlike the previous attempts this technique can handle high data rate such as those produced by micaz motes in contrast to those by mica motes second this paper develops secure and resilient global time synchronization protocol based on novel use of the μtesla broadcast authentication protocol for local authenticated broadcast resolving the conflict between the goal of achieving time synchronization with μtesla based broadcast authentication and the fact that μtesla requires loose time synchronization the resulting protocol is secure against external attacks and resilient against compromised nodes the third contribution consists of an implementation of the proposed techniques on micaz motes running tinyos and thorough evaluation through field experiments in network of micaz motes
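As background for the pairwise step, a generic two-way timestamp exchange (the kind of exchange that MAC-layer timestamping makes accurate) computes clock offset and delay as sketched below; this is the textbook sender-receiver formulation, shown only for illustration and not claimed to be the exact tinysersync protocol.

```python
def pairwise_offset_delay(t1, t2, t3, t4):
    """Classic two-way exchange: node A sends at t1 (A's clock), node B receives at t2
    and replies at t3 (B's clock), A receives the reply at t4 (A's clock).
    Returns B's clock offset relative to A and the round-trip delay estimate."""
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# toy usage: true offset 50 time units, one-way propagation delay 2 units each way
print(pairwise_offset_delay(100.0, 152.0, 153.0, 105.0))  # -> (50.0, 4.0)
```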
due to the exponential growth in worldwide information companies have to deal with an ever growing amount of digital information one of the most important challenges for data mining is quickly and correctly finding the relationship among data the apriori algorithm has been the most popular technique in finding frequent patterns however when applying this method database has to be scanned many times to calculate the counts of huge number of candidate itemsets parallel and distributed computing is an effective strategy for accelerating the mining process in this paper the distributed parallel apriori dpa algorithm is proposed as solution to this problem in the proposed method metadata are stored in the form of transaction identifiers tids such that only single scan to the database is needed the approach also takes the factor of itemset counts into consideration thus generating balanced workload among processors and reducing processor idle time experiments on pc cluster with computing nodes are also made to show the performance of the proposed approach and compare it with some other parallel mining algorithms the experimental results show that the proposed approach outperforms the others especially while the minimum supports are low
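A minimal sketch of the transaction-identifier idea is given below: one scan builds vertical tid lists, and the support of an itemset is then simply the size of the intersection of its tid lists, so the database is never rescanned per candidate. The helper names and the restriction to pairs are illustrative simplifications, and the distribution of work across processors is omitted.

```python
from collections import defaultdict
from itertools import combinations

def build_tid_lists(transactions):
    """Single scan: map each item to the set of transaction ids containing it."""
    tids = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tids[item].add(tid)
    return tids

def frequent_pairs(transactions, min_support):
    """Support of a candidate itemset is the intersection size of its tid sets,
    so no further database scans are needed after the initial one."""
    tids = build_tid_lists(transactions)
    frequent_items = [i for i, t in tids.items() if len(t) >= min_support]
    result = {}
    for a, b in combinations(sorted(frequent_items), 2):
        support = len(tids[a] & tids[b])
        if support >= min_support:
            result[(a, b)] = support
    return result

db = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
print(frequent_pairs(db, min_support=2))  # {('a', 'c'): 2, ('b', 'c'): 2}
```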
this paper presents performance model developed for the deployment design of ieee wireless mesh networks wmn the model contains seven metrics to analyze the state of wmn and novel mechanisms to use multiple evaluation criteria in wmn performance optimization the model can be used with various optimization algorithms in this work two example algorithms for channel assignment and minimizing the number of mesh access points aps have been developed prototype has been implemented with java evaluated by optimizing network topology with different criteria and verified with ns simulations according to the results multirate operation interference aware routing and the use of multiple evaluation criteria are crucial in wmn deployment design by channel assignment and removing useless aps the capacity increase in the presented simulations was between and compared to single channel configuration at the same time the coverage was kept high and the traffic distribution fair among the aps
contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem our goal is to investigate how and to what extent contention for shared resource can be mitigated via thread scheduling scheduling is an attractive tool because it does not require extra hardware and is relatively easy to integrate into the system our study is the first to provide comprehensive analysis of contention mitigating techniques that use only scheduling the most difficult part of the problem is to find classification scheme for threads which would determine how they affect each other when competing for shared resources we provide comprehensive analysis of such classification schemes using newly proposed methodology that enables to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal as result of this analysis we discovered classification scheme that addresses not only contention for cache space but contention for other shared resources such as the memory controller memory bus and prefetching hardware to show the applicability of our analysis we design new scheduling algorithm which we prototype at user level and demonstrate that it performs within of the optimal we also conclude that the highest impact of contention aware scheduling techniques is not in improving performance of workload as whole but in improving quality of service or performance isolation for individual applications
this paper presents sample based cameras for rendering high quality reflections on convex reflectors at interactive rates the method supports change of view moving objects and reflectors higher order reflections view dependent lighting of reflected objects and reflector surface properties in order to render reflections with the feed forward graphics pipeline one has to project reflected vertices sample based camera is collection of bsp trees of pinhole cameras that jointly approximate the projection function it is constructed from the reflected rays defined by the desired view and the scene reflectors scene point is projected by invoking only the cameras that contain it in their frustums reflections are rendered by projecting the scene geometry and then rasterizing in hardware
in this paper we study the live streaming workload from large content delivery network our data collected over month period contains over million requests for distinct urls from clients in over countries to our knowledge this is the most extensive data of live streaming on the internet that has been studied to date our contributions are two fold first we present macroscopic analysis of the workload characterizing popularity arrival process session duration and transport protocol use our results show that popularity follows mode zipf distribution session interarrivals within small time windows are exponential session durations are heavy tailed and that udp is far from having universal reach on the internet second we cover two additional characteristics that are more specific to the nature of live streaming applications the diversity of clients in comparison to traditional broadcast media like radio and tv and the phenomena that many clients regularly join recurring events we find that internet streaming does reach wide audience often spanning hundreds of as domains and tens of countries more interesting is that small streams also have diverse audience we also find that recurring users often have lifetimes of at least as long as one third of the days in the event
in typical client server scenario trusted server provides valuable services to client which runs remotely on an untrusted platform of the many security vulnerabilities that may arise such as authentication and authorization guaranteeing the integrity of the client code is one of the most difficult to address this security vulnerability is an instance of the malicious host problem where an adversary in control of the client’s host environment tries to tamper with the client code we propose novel client replacement strategy to counter the malicious host problem the client code is periodically replaced by new orthogonal clients such that their combination with the server is functionally equivalent to the original client server application the reverse engineering efforts of the adversary are deterred by the complexity of analysis of frequently changing orthogonal program code we use the underlying concepts of program obfuscation as basis for formally defining and providing orthogonality we also give preliminary empirical validation of the proposed approach
the standard solution for user authentication on the web is to establish tls based secure channel in server authenticated mode and run protocol on top of tls where the user enters password in an html form however as many studies point out the average internet user is unable to identify the server based on certificate so that impersonation attacks eg phishing are feasible we tackle this problem by proposing protocol that allows the user to identify the server based on human perceptible authenticators eg picture voice we prove the security of this protocol by refining the game based security model of bellare and rogaway and present proof of concept implementation
evidence shows that integrated development environments ides are too often functionality oriented and difficult to use learn and master this article describes challenges in the design of usable ides and in the evaluation of the usability of such tools it also presents the results of three different empirical studies of ide usability different methods are sequentially applied across the empirical studies in order to identify increasingly specific kinds of usability problems that developers face in their use of ides the results of these studies suggest several problems in ide user interfaces with the representation of functionalities and artifacts such as reusable program components we conclude by making recommendations for the design of ide user interfaces with better affordances which may ameliorate some of most serious usability problems and help to create more human centric software development environments
this paper presents aspectoptima language independent aspect oriented framework consisting of set of ten base aspects each one providing well defined reusable functionality that can be configured to ensure the acid properties atomicity consistency isolation and durability for transactional objects the overall goal of aspectoptima is to serve as case study for aspect oriented software development particularly for evaluating the expressivity of aop languages and how they address complex aspect interactions and dependencies the ten base aspects of aspectoptima are simple yet have complex dependencies and interactions among each other to implement different concurrency control and recovery strategies these aspects can be composed and assembled into different configurations some aspects conflict with each other others have to adapt their run time behavior according to the presence or absence of other aspects the design of aspectoptima highlights the need for set of key language features required for implementing reusable aspect oriented frameworks to illustrate the usefulness of aspectoptima as means for evaluating programming language features an implementation of aspectoptima in aspectj is presented the experiment reveals that aspectj’s language features do not directly support implementation of reusable aspect oriented frameworks with complex dependencies and interactions the encountered aspectj language limitations are discussed workaround solutions are shown potential language improvements are proposed where appropriate and some preliminary measurements are presented that highlight the performance impact of certain language features
qursed enables the development of web based query forms and reports qfrs that query and report semistructured xml data ie data that are characterized by nesting irregularities and structural variance the query aspects of qfr are captured by its query set specification which formally encodes multiple parameterized condition fragments and can describe large numbers of queries the run time component of qursed produces xquery compliant queries by synthesizing fragments from the query set specification that have been activated during the interaction of the end user with the qfr the design time component of qursed called qursed editor semi automates the development of the query set specification and its association with the visual components of the qfr by translating visual actions into appropriate query set specifications we describe qursed and illustrate how it accommodates the intricacies that the semistructured nature of the underlying database introduces we specifically focus on the formal model of the query set specification its generation via the qursed editor and its coupling with the visual aspects of the web based form and report
in this paper we generalize the notion of compositional semantics to cope with transfinite reductions of transition system standard denotational and predicate transformer semantics even though compositional provide inadequate models for some known program manipulation techniques we are interested in the systematic design of extended compositional semantics observing possible transfinite computations ie computations that may occur after given number of infinite loops this generalization is necessary to deal with program manipulation techniques modifying the termination status of programs such as program slicing we include the transfinite generalization of semantics in the hierarchy developed in by cousot where semantics at different levels of abstraction are related with each other by abstract interpretation we prove that specular hierarchy of non standard semantics modeling transfinite computations of programs can be specified in such way that the standard hierarchy can be derived by abstract interpretation we prove that non standard transfinite denotational and predicate transformer semantics can be both systematically derived as solutions of simple abstract domain equations involving the basic operation of reduced power of abstract domains this allows us to prove the optimality of these semantics ie they are the most abstract semantics in the hierarchy which are compositional and observe respectively the terminating and initial states of transfinite computations providing an adequate mathematical model for program manipulation
in the database framework of kanellakis et al it was argued that constraint query languages should take constraint databases as input and give other constraint databases that use the same type of atomic constraints as output this closed form requirement has been difficult to realize in constraint query languages that contain the negation symbol this paper describes general approach to restricting constraint query languages with negation to safe subsets that contain only programs that are evaluable in closed form on any valid constraint database input
data access costs contribute significantly to the execution time of applications with complex data structures as the latency of memory accesses becomes high relative to processor cycle times application performance is increasingly limited by memory performance in some situations it is useful to trade increased computation costs for reduced memory costs the contributions of this paper are three fold we provide detailed analysis of the memory performance of seven memory intensive benchmarks we describe computation regrouping source level approach to improving the performance of memory bound applications by increasing temporal locality to eliminate cache and tlb misses and we demonstrate significant performance improvement by applying computation regrouping to our suite of seven benchmarks using computation regrouping we observe geometric mean speedup of with individual speedups ranging from to most of this improvement comes from eliminating memory stall time
one fundamental challenge for mining recurring subgraphs from semi structured data sets is the overwhelming abundance of such patterns in large graph databases the total number of frequent subgraphs can become too large to allow full enumeration using reasonable computational resources in this paper we propose new algorithm that mines only maximal frequent subgraphs ie subgraphs that are not part of any other frequent subgraphs this may exponentially decrease the size of the output set in the best case in our experiments on practical data sets mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude our method first mines all frequent trees from general graph database and then reconstructs all maximal subgraphs from the mined trees using two chemical structure benchmarks and set of synthetic graph data sets we demonstrate that in addition to decreasing the output size our algorithm can achieve five fold speed up over the current state of the art subgraph mining algorithms
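The maximality notion can be illustrated with a toy filter; for brevity the patterns below are item sets rather than graphs, so the strict-subset test stands in for the subgraph-isomorphism containment test an actual implementation would need.

```python
def maximal_patterns(frequent_patterns):
    """Keep only patterns not strictly contained in another frequent pattern.
    Patterns are frozensets purely for illustration; in the graph setting the
    containment check would be a subgraph-isomorphism test."""
    patterns = [frozenset(p) for p in frequent_patterns]
    return [p for p in patterns if not any(p < q for q in patterns)]

freq = [{"a"}, {"b"}, {"a", "b"}, {"c"}, {"a", "b", "c"}]
print(maximal_patterns(freq))  # only {'a', 'b', 'c'} remains
```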
this paper describes model based approach to generate conformance tests for interactive applications our method builds on existing work to address generation of small yet effective set of test frames for testing individual operations set up sequence that brings the system under test in an appropriate state for test frame self priming verification sequence for expected output and state changes self checking and negative test cases in the presence of exceptions our method exploits novel mutation scheme applied to operations specified as pre and postconditions on parameters and state variables set of novel abstraction techniques which result in compact finite state automaton and search techniques to automatically generate the set up and verification sequences we illustrate our method with simple atm application
network intrusions have become significant threat to network servers and their availability simple intrusion can suspend the organization’s network services and can lead to financial disaster in this paper we propose framework called timevm to mitigate or even eliminate the infection of network intrusion on line as fast as possible the framework is based on the virtual machine technology and traffic replay based recovery timevm gives the illusion of time machine timevm logs only the network traffic to server and replays the logged traffic to multiple shadow virtual machines shadow vm after different time delays time lags consequently each shadow vm will represent the server at different time in history when attack infection is detected timevm enables navigating through the traffic history logs picking uninfected shadow vm removing the attack traffic and then fast replaying the entire traffic history to this shadow vm as result typical up to date uninfected version of the original system can be constructed the paper shows the implementation details for timevm it also addresses many practical challenges related to how to configure and deploy timevm in system in order to minimize the recovery time we present analytical framework and extensive evaluation to validate our approach in different environments
robots have entered our domestic lives but yet little is known about their impact on the home this paper takes steps towards addressing this omission by reporting results from an empirical study of irobot’s roomba vacuuming robot our findings suggest that by developing intimacy to the robot our participants were able to derive increased pleasure from cleaning and expended effort to fit roomba into their homes and shared it with others these findings lead us to propose four design implications that we argue could increase people’s enthusiasm for smart home technologies
compositional verification is essential for verifying large systems however approximate environments are needed when verifying the constituent modules in system effective compositional verification requires finding simple but accurate overapproximate environment for each module otherwise many verification failures may be produced therefore incurring high computational penalty for distinguishing the false failures from the real ones this paper presents an automated method to refine the state space of each module within an overapproximate environment this method is sound as long as an overapproximate environment is found for each module at the beginning of the verification process and it has less restrictions on system partitioning it is also coupled with several state space reduction techniques for better results experiments of this method on several large asynchronous designs show promising results
rfid is an emerging technology with many potential applications such as inventory management for supply chain in practice these applications often need series of continuous scanning operations to accomplish task for example if one wants to scan all the products with rfid tags in large warehouse given limited reading range of an rfid reader multiple scanning operations have to be launched at different locations to cover the whole warehouse usually this series of scanning operations are not completely independent as some rfid tags can be read by multiple processes simply scanning all the tags in the reading range during each process is inefficient because it collects lot of redundant data and consumes long time in this paper we develop efficient schemes for continuous scanning operations defined in both spatial and temporal domains our basic idea is to fully utilize the information gathered in the previous scanning operations to reduce the scanning time of the succeeding ones we illustrate in the evaluation that our algorithms dramatically reduce the total scanning time when compared with other solutions
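The core idea of reusing earlier results can be sketched as follows, assuming each scanning operation can be told which tag ids are already known; this abstracts away the reader-tag protocol details and is not the paper's specific scheme.

```python
def continuous_scan(scan_rounds):
    """Each round yields the set of tag ids currently in range; only tags not
    already known from earlier rounds are worth interrogating, which is what
    shortens the later scanning operations."""
    known = set()
    per_round_new = []
    for tags_in_range in scan_rounds:
        new = tags_in_range - known   # skip tags collected by previous scans
        known |= new
        per_round_new.append(new)
    return known, per_round_new

rounds = [{1, 2, 3}, {2, 3, 4, 5}, {5, 6}]
total, new_per_round = continuous_scan(rounds)
print(total)          # {1, 2, 3, 4, 5, 6}
print(new_per_round)  # [{1, 2, 3}, {4, 5}, {6}]
```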
innocent looking program transformations can easily change the space complexity of lazy functional programs the theory of space improvement seeks to characterize those local program transformations which are guaranteed never to worsen asymptotic space complexity of any program previous work by the authors introduced the space improvement relation and showed that number of simple local transformation laws are indeed space improvements this paper seeks an answer to the following questions is the improvement relation inhabited by interesting program transformations and if so how might they be established we show that the asymptotic space improvement relation is semantically badly behaved but that the theory of strong space improvement possesses fixed point induction theorem which permits the derivation of improvement properties for recursive definitions with the help of this tool we explore the landscape of space improvement by considering range of classical program transformation
we show how hypertext based program understanding tools can achieve new levels of abstraction by using inferred type information for cases where the subject software system is written in weakly typed language we propose typeexplorer tool for browsing cobol legacy systems based on these types the paper addresses how types an invented abstraction can be presented meaningfully to software re engineers the implementation techniques used to construct typeexplorer and the use of typeexplorer for understanding legacy systems at the level of individual statements as well as at the level of the software architecture which is illustrated by using typeexplorer to browse an industrial cobol system of lines of code
finding the right agents in large and dynamic network to provide the needed resources in timely fashion is long standing problem this paper presents method for information searching and sharing that combines routing indices with token based methods the proposed method enables agents to search effectively by acquiring their neighbors interests advertising their information provision abilities and maintaining indices for routing queries in an integrated way specifically the paper demonstrates through performance experiments how static and dynamic networks of agents can be tuned to answer queries effectively as they gather evidence for the interests and information provision abilities of others without altering the topology or imposing an overlay structure to the network of acquaintances
to solve some difficult problems that require procedural knowledge people often seek the advice of experts who have got competence in that problem domain this paper focuses on locating and determining an expert in particular knowledge domain in most cases social network of user is explored through referrals to locate human experts past work in searching for experts through referrals focused primarily on static social network however static social networks fail to accurately represent the set of experts in knowledge domain as time evolves experts continuously keep changing this paper focuses on the problem of finding experts through referrals in time evolving co author social network authors and co authors of research publication for instance are domain experts in this paper we propose solution where the network is expanded incrementally and the information on domain experts is suitably modified this will avoid periodic global expertise recomputation and would help to effectively retrieve relevant information on domain experts novel data structure is also introduced in our study to effectively track the change in expertise of an author with time
human perceptual processes organize visual input to make the structure of the world explicit successful techniques for automatic depiction meanwhile create images whose structure clearly matches the visual information to be conveyed we discuss how analyzing these structures and realizing them in formal representations can allow computer graphics to engage with perceptual science to mutual benefit we call these representations visual explanations their job is to account for patterns in two dimensions as evidence of visual world
automatic capture of the user’s interaction environment for user adapted interaction and evaluation purposes is an unexplored area in the web accessibility research field this paper presents an application that collects user data regarding assistive technologies be either software or hardware in an unobtrusive way as result cc pp based profiles are created so that interoperability between components such as evaluation engines or server side content adaptors can be attained the implications that versioning issues and the potential user group of given assistive technology have on the guidelines to apply are also remarked the major benefit of this approach is that users can perform their tasks avoiding distractions while interacting with the world wide web
skyline query returns set of objects that are not dominated by other objects an object is said to dominate another if it is closer to the query than the latter on all factors under consideration in this paper we consider the case where the similarity measures may be arbitrary and do not necessarily come from metric space we first explore middleware algorithms analyze how skyline retrieval for non metric spaces can be done on the middleware backend and lay down necessary and sufficient stopping condition for middleware based skyline algorithms we develop the balanced access algorithm which is provably more io friendly than the state of the art algorithm for skyline query processing on middleware and show that baa outperforms the latter by orders of magnitude we also show that without prior knowledge about data distributions it is unlikely to have middleware algorithm that is more io friendly than baa in fact we empirically show that baa is very close to the absolute lower bound of io costs for middleware algorithms further we explore the non middleware setting and devise an online algorithm for skyline retrieval which uses recently proposed value space index over non metric spaces al tree the al tree based algorithm is able to prune subspaces and efficiently maintain candidate sets leading to better performance we compare our algorithms to existing ones which can work with arbitrary similarity measures and show that our approaches are better in terms of computational and disk access costs leading to significantly better response times
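For reference, the dominance relation underlying skyline queries and a naive quadratic skyline computation look like the sketch below (lower scores assumed better); the middleware-oriented balanced access algorithm itself, with its sorted and random accesses and stopping condition, is not reproduced here.

```python
def dominates(p, q):
    """p dominates q if p is at least as close to the query on every factor
    (here, lower scores are better) and strictly closer on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive O(n^2) skyline: keep the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1, 4), (2, 2), (3, 1), (4, 4), (2, 3)]
print(skyline(pts))  # [(1, 4), (2, 2), (3, 1)]
```

Note that nothing in the dominance test above assumes a metric space, which is why the same check carries over to arbitrary similarity measures.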
this tutorial reviews image alignment and image stitching algorithms image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap they are ideally suited for applications such as video stabilization summarization and the creation of panoramic mosaics image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in seamless manner taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures this tutorial reviews the basic motion models underlying alignment and stitching algorithms describes effective direct pixel based and feature based alignment algorithms and describes blending algorithms used to produce seamless mosaics it ends with discussion of open research problems in the area
users on twitter microblogging service started the phenomenon of adding tags to their messages sometime around february these tags are distinct from those in other web systems because users are less likely to index messages for later retrieval we compare tagging patterns in twitter with those in delicious to show that tagging behavior in twitter is different because of its conversational rather than organizational nature we use mixed method of statistical analysis and an interpretive approach to study the phenomenon we find that tagging in twitter is more about filtering and directing content so that it appears in certain streams the most illustrative example of how tagging in twitter differs is the phenomenon of the twitter micro meme emergent topics for which tag is created used widely for few days then disappears we describe the micro meme phenomenon and discuss the importance of this new tagging practice for the larger real time search context
we consider the problem of extracting features for multi class recognition problems the features are required to make fine distinctions between similar classes combined with tolerance for distortions and missing information we define and compare two general approaches both based on maximizing the delivered information for recognition one divides the problem into multiple binary classification tasks while the other uses single multi class scheme the two strategies result in markedly different sets of features which we apply to face identification and detection we show that the first produces sparse set of distinctive features that are specific to an individual face and are highly tolerant to distortions and missing input the second produces compact features each shared by about half of the faces which perform better in general face detection the results show the advantage of distinctive features for making fine distinctions in robust manner they also show that different features are optimal for recognition tasks at different levels of specificity
the computational requirements of full global illumination rendering are such that it is still not possible to achieve high fidelity graphics of very complex scenes in reasonable time on single computer by identifying which computations are more relevant to the desired quality of the solution selective rendering can significantly reduce rendering times in this paper we present novel component based selective rendering system in which the quality of every image and indeed every pixel can be controlled by means of component regular expression crex the crex provides flexible mechanism for controlling which components are rendered and in which order it can be used as strategy for directing the light transport within scene and also in progressive rendering framework furthermore the crex can be combined with visual perception techniques to reduce rendering computation times without compromising the perceived visual quality by means of psychophysical experiment we demonstrate how the crex can be successfully used in such perceptual rendering framework in addition we show how the crex’s flexibility enables it to be incorporated in predictive framework for time constrained rendering
the design and implementation of advanced personalized database applications requires preference driven approach representing preferences as strict partial orders is good choice in most practical cases therefore the efficient integration of preference querying into standard database technology is an important issue we present novel approach to relational preference query optimization based on algebraic transformations variety of new laws for preference relational algebra is presented this forms the foundation for preference query optimizer applying heuristics like push preference prototypical implementation and series of benchmarks show that significant performance gains can be achieved in summary our results give strong evidence that by extending relational databases by strict partial order preferences one can get both good modelling capabilities for personalization and good query runtimes our approach extends to recursive databases as well
critical infrastructures ci are complex and highly interdependent systems networks and assets that provide essential services in our daily life given the increasing dependence upon such critical infrastructures research and investments in identifying their vulnerabilities and devising survivability enhancements are recognized as paramount by many countries understanding and analyzing interdependencies and interoperabilities between different critical infrastructures and between the several heterogeneous subsystems each infrastructure is composed of are among the most challenging aspects faced today by designers developers and operators in these critical sectors assessing the impact of interdependencies on the ability of the system to provide resilient and secure services is of primary importance following this analysis steps can be taken to mitigate vulnerabilities revealed in critical assets this paper addresses the analysis of ci with focus on interdependencies between the involved subsystems in particular the experience gained by the authors in an on going european project is reported and discussed both in terms of identified challenges and in viable approaches under investigation
the key element to support ad hoc resource sharing on the web is to discover resources of interest the hypermedia paradigm provides way of overlaying set of resources with additional information in the form of links to help people find other resources however existing hypermedia approaches primarily develop mechanisms to enable resource sharing in fairly static centralized way recent developments in distributed computing on the other hand introduced peer to peer pp computing that is notable for employing distributed resources to perform critical function in more dynamic and ad hoc scenario we investigate the feasibility and potential benefits of bringing together the pp paradigm with the concept of hypermedia link services to implement ad hoc resource sharing on the web this is accomplished by utilizing web based distributed dynamic link service ddls as testbed and addressing the issues arising from the design implementation and enhancement of the service our experimental result reveals the behavior and performance of the semantics based resource discovery in ddls and demonstrates that the proposed enhancing technique for ddls topology reorganization is appropriate and efficient
we present technique for modeling and automatic verification of network protocols based on graph transformation it is suitable for protocols with potentially unbounded number of nodes in which the structure and topology of the network is central aspect such as routing protocols for ad hoc networks safety properties are specified as set of undesirable global configurations we verify that there is no undesirable configuration which is reachable from an initial configuration by means of symbolic backward reachability analysis in general the reachability problem is undecidable we implement the technique in graph grammar analysis tool and automatically verify several interesting nontrivial examples notably we prove loop freedom for the dymo ad hoc routing protocol dymo is currently on the ietf standards track to potentially become an internet standard
recursive queries are quite important in the context of xml databases in addition several recent papers have investigated relational approach to store xml data and there is growing evidence that schema conscious approaches are better option than schema oblivious techniques as far as query performance is concerned however the issue of recursive xml queries for such approaches has not been dealt with satisfactorily in this paper we argue that it is possible to design schema oblivious approach that outperforms schema conscious approaches for certain types of recursive queries to that end we propose novel schema oblivious approach called sucxent schema unconscious xml enabled system that outperforms existing schema oblivious approaches such as xparent by up to times and schema conscious approaches shared inlining by up to eight times for recursive query execution our approach has up to two times smaller storage requirements compared to existing schema oblivious approaches and less than schema conscious techniques in addition sucxent performs marginally better than shared inlining and is times faster than xparent as far as insertion time is concerned
one of the key factors influencing project success or failure is project management unfortunately effective management of software projects is not in practice what is actually being practiced varies significantly from what is advised in the available literature in order to improve performance in the field of software project management there is dire need to formally educate prospective project managers in both the theoretical and practical aspects of managing software projects this paper focuses on the formulation and execution of practicum in software project management graduate course that aids students in learning practical aspects of software project management this course has been part of the masters in software project management curriculum at national university of computer and emerging sciences nuces lahore pakistan since we discuss the course in light of the major software project management activities recommended in literature comparison of the course with the portfolio program and project management maturity model pm has been done to allow us to assess the maturity of this course in terms of software engineering project management processes and assist us in identifying and highlighting the areas needing further improvement in terms of teaching practice and industry needs the comparison is based on the key process areas applicable to our course and shows that practicum in software project management is capable at the repeatable and capable at the defined levels of the pm
future mixed reality systems will need to support large numbers of simultaneous nonexpert users at reasonable per user costs if the systems are to be widely deployed within society in the short to medium term we have constructed prototype of such system an interactive entertainment space called ada that was designed to behave like simple organism using ada we conducted two studies the first assessing the effect of varying the operating parameters of the space on the collective behavior and attitudes of its users and the second assessing the relationships among user demographics behavior and attitudes our results showed that small changes in the ambient settings of the environment have significant effect on both user attitudes and behavior and that the changes in user attitudes do not necessarily correspond to the environmental changes we also found that individual user opinions are affected by demographics and reflected in overt behavior using these results we propose some tentative guidelines for the design of future shared mixed reality spaces
massively multiplayer online games mmos have emerged as an exciting new class of applications for database technology mmos simulate long lived interactive virtual worlds which proceed by applying updates in frames or ticks typically at or hz in order to sustain the resulting high update rates of such games game state is kept entirely in main memory by the game servers nevertheless durability in mmos is usually achieved by standard dbms implementing aries style recovery this architecture limits scalability forcing mmo developers to either invest in high end hardware or to over partition their virtual worlds in this paper we evaluate the applicability of existing checkpoint recovery techniques developed for main memory dbms to mmo workloads our thorough experimental evaluation uses detailed simulation model fed with update traces generated synthetically and from prototype game server based on our results we recommend mmo developers to adopt copy on update scheme with double backup disk organization to checkpoint game state this scheme outperforms alternatives in terms of the latency introduced in the game as well the time necessary to recover after crash
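A minimal sketch of the copy-on-update idea follows: while a checkpoint is in progress, the first update to each key preserves its pre-checkpoint value, so the checkpointer sees a consistent snapshot without pausing the game loop. The class and its methods are hypothetical, only updates to existing keys are handled, and the double backup disk organization is omitted.

```python
class CopyOnUpdateCheckpointer:
    """Toy copy-on-update checkpointing for an in-memory key/value game state."""

    def __init__(self, state=None):
        self.state = dict(state or {})
        self.shadow = None  # key -> pre-checkpoint value, only while checkpointing

    def begin_checkpoint(self):
        self.shadow = {}

    def update(self, key, value):
        # copy the old value only on the first update during a checkpoint
        # (insertions of brand-new keys are omitted for brevity)
        if self.shadow is not None and key not in self.shadow:
            self.shadow[key] = self.state[key]
        self.state[key] = value

    def finish_checkpoint(self):
        snapshot = dict(self.state)
        snapshot.update(self.shadow)  # roll back keys changed mid-checkpoint
        self.shadow = None
        return snapshot

cp = CopyOnUpdateCheckpointer({"hp": 100, "gold": 5})
cp.begin_checkpoint()
cp.update("hp", 90)            # the old value 100 is preserved for the snapshot
print(cp.finish_checkpoint())  # {'hp': 100, 'gold': 5}
print(cp.state)                # {'hp': 90, 'gold': 5}
```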
this paper introduces new application of boosting for parse reranking several parsers have been proposed that utilize the all subtrees representation eg tree kernel and data oriented parsing this paper argues that such an all subtrees representation is extremely redundant and comparable accuracy can be achieved using just small set of subtrees we show how the boosting algorithm can be applied to the all subtrees representation and how it selects small and relevant feature set efficiently two experiments on parse reranking show that our method achieves comparable or even better performance than kernel methods and also improves the testing efficiency
traditional software project management theory often focuses on desk based development of software and algorithms much in line with the traditions of the classical project management and software engineering this can be described as tools and techniques perspective which assumes that software project management success is dependent on having the right instruments available rather than on the individual qualities of the project manager or the cumulative qualities and skills of the software organisation surprisingly little is known about how or whether these tools techniques are used in practice this study in contrast uses qualitative grounded theory approach to develop the basis for an alternative theoretical perspective that of competence competence approach to understanding software project management places the responsibility for success firmly on the shoulders of the people involved project members project leaders managers the competence approach is developed through an investigation of the experiences of project managers in medium sized software development company wm data in denmark starting with simple model relating project conditions project management competences and desired project outcomes we collected data through interviews focus groups and one large plenary meeting with most of the company’s project managers data analysis employed content analysis for concept variable development and causal mapping to trace relationships between variables in this way we were able to build up picture of the competences project managers use in their daily work at wm data which we argue is also partly generalisable to theory the discrepancy between the two perspectives is discussed particularly in regard to the current orientation of the software engineering field the study provides many methodological and theoretical starting points for researchers wishing to develop more detailed competence perspective of software project managers work
the dependencies that exist among definitions and uses of variables in program are required by many language processing tools this paper considers the computation of definition use and use definition chains that extend across procedure boundaries at call and return sites intraprocedural definition and use information is abstracted for each procedure and is used to construct an interprocedural flow graph this intraprocedural data flow information is then propagated throughout the program via the interprocedural flow graph to obtain sets of reaching definitions and or reachable uses for each interprocedural control point including procedure entry exit call and return interprocedural definition use and or use definition chains are computed from this reaching information the technique handles the interprocedural effects of the data flow caused by both reference parameters and global variables while preserving the calling context of called procedures additionally recursion aliasing and separate compilation are handled the technique has been implemented using sun workstation and incorporated into an interprocedural data flow tester results from experiments indicate the practicality of the technique both in terms of the size of the interprocedural flow graph and the size of the data flow sets
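The propagation step can be pictured with a standard iterative worklist computation of reaching definitions over a flow graph, sketched below; the toy graph and the names are invented, and the paper's interprocedural flow graph, calling-context preservation, aliasing, and separate-compilation handling are not modelled.

```python
def reaching_definitions(succ, gen, kill):
    """Iterative worklist computation of reaching definitions: succ maps each
    node to its successors, gen/kill map each node to sets of definition ids.
    Returns the IN and OUT sets of every node."""
    nodes = list(succ)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    work = list(nodes)
    while work:
        n = work.pop()
        preds_out = [out_sets[p] for p in nodes if n in succ[p]]
        in_sets[n] = set().union(*preds_out) if preds_out else set()
        out_new = gen[n] | (in_sets[n] - kill[n])
        if out_new != out_sets[n]:
            out_sets[n] = out_new
            work.extend(succ[n])   # re-propagate along outgoing edges
    return in_sets, out_sets

# toy graph: entry -> call -> return -> exit, with a back edge return -> call
succ = {"entry": ["call"], "call": ["return"], "return": ["exit", "call"], "exit": []}
gen  = {"entry": {"d1"}, "call": {"d2"}, "return": set(), "exit": set()}
kill = {"entry": set(), "call": {"d1"}, "return": set(), "exit": set()}
ins, outs = reaching_definitions(succ, gen, kill)
print(ins["exit"])  # {'d2'}: d1 is killed at the call node before reaching exit
```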
we describe our experiences with the design implementation deployment and evaluation of portholes tool which provides group and collaboration awareness through the web the research objective was to explore as to how such system would improve communication and facilitate shared understanding among distributed development groups during the deployment of our portholes system we conducted naturalistic study by soliciting user feedback and evolving the system in response many of the initial reactions of potential users indicated that our system projected the wrong image so that we designed new version that provided explicit cues about being in public and who is looking back to suggest social rather than information interface we implemented the new design as java applet and evaluated design choices with preference study our experiences with different portholes versions and user reactions to them provide insights for designing awareness tools beyond portholes systems our approach is for the studies to guide and to provide feedback for the design and technical development of our system
we present continuously adaptive continuous query cacq implementation based on the eddy query processing framework we show that our design provides significant performance benefits over existing approaches to evaluating continuous queries not only because of its adaptivity but also because of the aggressive cross query sharing of work and space that it enables by breaking the abstraction of shared relational algebra expressions our telegraph cacq implementation is able to share physical operators both selections and join state at very fine grain we augment these features with grouped filter index to simultaneously evaluate multiple selection predicates we include measurements of the performance of our core system along with comparison to existing continuous query approaches
dynamic remapping is critical to the performance of bulk synchronous computations that have non deterministic behaviors because of the need of barrier synchronization between phases there are two basic issues in remapping when and how to remap this paper presents formal analysis of the issue of when to remap for dynamic computations with priori known statistical behaviors with an objective of finding optimal remapping frequencies for given tolerance of load imbalance the problem is formulated as two complement sequential stochastic optimization since general optimization techniques tend to reveal stationary properties of the workload process they are not readily applicable to the analysis of the effect of periodic remapping instead this paper develops new analytical approaches to precisely characterize the transient statistical behaviors of the workload process on both homogeneous and heterogeneous machines optimal remapping frequencies are derived for various random workload change processes with known or unknown probabilistic distributions they are shown accurate via simulations
run time monitoring can provide important insights about program’s behavior and for simple properties it can be done efficiently monitoring properties describing sequences of program states and events however can result in significant run time overhead in this paper we present novel approach to reducing the cost of run time monitoring of path properties properties are composed to form single integrated property that is then systematically decomposed into set of properties that encode necessary conditions for property violations the resulting set of properties forms lattice whose structure is exploited to select sample of properties that can lower monitoring cost while preserving violation detection power relative to the original properties preliminary studies for widely used java api reveal that our approach produces rich structured set of properties that enables control of monitoring overhead while detecting more violations than alternative techniques
when multiple valid solutions are available to problem preferences can be used to indicate choice in distributed system such preference based solution can be produced by autonomous agents cooperating together but the attempt will lead to contention if the same resource is given preference by several user agents to resolve such contentions this paper proposes market based payment scheme for selling and buying preferences by the contenders in which the best solution is defined as the one where as many preferences as theoretically possible are globally met after exploring the nature of preference the paper develops preference processing model based on the market based scheme and presents theoretical performance model to verify the correctness of the processing model this verification is provided by simulation study of the processing model for the simulation study manufacturing environment is conjectured where set of tasks are resolved into subtasks by coordinator agents and then these subtasks are allocated to assembler agents through cooperation and negotiation in which preferred resources are exchanged against payments the study shows that our agent based strategy not only produces convergence on the total preference value for the whole system but also reaches that final value irrespective of the initial order of subtask allocation to the assemblers
creativity has long been seen as mysterious process better understanding of the nature of creativity and the processes that underpin it are needed in order to be able to make contributions to the design of creativity support tools in this paper current theories of creativity and models of the creative process are discussed and an empirical study outlined of participants individually and collaboratively undertaking two creative tasks in different domains the tasks write poem and design poster were carried out via two media pen and paper and with the use of computer and software tools schon’s theory of reflection in action is applied to the study results the research has generated software requirements for tools to support these specific creative tasks and also initial design features to enable support for group reflection of evolving creative artefacts
aggressive scaling increases the number of devices we can integrate per square millimeter but makes it increasingly difficult to guarantee that each device fabricated has the intended operational characteristics without careful mitigation component yield rates will fall potentially negating the economic benefits of scaling the fine grained reconfigurability inherent in fpgas is powerful tool that can allow us to drop the stringent requirement that every device be fabricated perfectly in order for component to be useful to exploit inherent fpga reconfigurability while avoiding full cad mapping we propose lightweight techniques compatible with the current single bitstream model that can avoid defective devices reducing yield loss at high defect rates in particular by embedding testing operations and alternative path configurations into the bitstream each fpga can avoid defects by making only simple greedy decisions at bitstream load time with additional tracks above the minimum routable channel width routes can tolerate switch defect rates raising yield from essentially to near
in abstract algebra structure is said to be noetherian if it does not admit infinite strictly ascending chains of congruences in this paper we adapt this notion to first order logic by defining the class of noetherian theories examples of theories in this class are linear arithmetics without ordering and the empty theory containing only unary function symbol interestingly it is possible to design non disjoint combination method for extensions of noetherian theories we investigate sufficient conditions for adding temporal dimension to such theories in such way that the decidability of the satisfiability problem for the quantifier free fragment of the resulting temporal logic is guaranteed this problem is firstly investigated for the case of linear time temporal logic and then generalized to arbitrary modal temporal logics whose propositional relativized satisfiability problem is decidable
although originally designed for large scale electronic publishing xml plays an increasingly important role in the exchange of data on the web in fact it is expected that xml will become the lingua franca of the web eventually replacing html not surprisingly there has been great deal of interest on xml both in industry and in academia nevertheless to date no comprehensive study on the xml web ie the subset of the web made of xml documents only nor on its contents has been made this paper is the first attempt at describing the xml web and the documents contained in it our results are drawn from sample of repository of the publicly available xml documents on the web consisting of about documents our results show that despite its short history xml already permeates the web both in terms of generic domains and geographically also our results about the contents of the xml web provide valuable input for the design of algorithms tools and systems that use xml in one form or another
in chip multiprocessor cmp system the dram system is shared among cores in shared dram system requests from a thread can not only delay requests from other threads by causing bank bus row buffer conflicts but they can also destroy other threads dram bank level parallelism requests whose latencies would otherwise have been overlapped could effectively become serialized as a result both fairness and system throughput degrade and some threads can starve for long time periods this paper proposes fundamentally new approach to designing a shared dram controller that provides quality of service to threads while also improving system throughput our parallelism aware batch scheduler par bs design is based on two key ideas first par bs processes dram requests in batches to provide fairness and to avoid starvation of requests second to optimize system throughput par bs employs parallelism aware dram scheduling policy that aims to process requests from thread in parallel in the dram banks thereby reducing the memory related stall time experienced by the thread par bs seamlessly incorporates support for system level thread priorities and can provide different service levels including purely opportunistic service to threads with different priorities we evaluate the design trade offs involved in par bs and compare it to four previously proposed dram scheduler designs on and core systems our evaluations show that averaged over core workloads par bs improves fairness by and system throughput by compared to the best previous scheduling technique stall time fair memory stfm scheduling based on simple request prioritization rules par bs is also simpler to implement than stfm
appropriation is the process by which people adopt and adapt technologies fitting them into their working practices it is similar to customisation but concerns the adoption patterns of technology and the transformation of practice at deeper level understanding appropriation is key problem for developing interactive systems since it is critical to the success of technology deployment it is also an important research issue since appropriation lies at the intersection of workplace studies and design most accounts of appropriation in the research literature have taken social perspective in contrast this paper explores appropriation in terms of the technical features that support it drawing examples from applications developed as part of novel document management system it develops an initial set of design principles for appropriable technologies these principles are particularly relevant to component based approaches to system design that blur the traditional application boundaries
to extract abstract views of the behavior of an object oriented system for reverse engineering body of research exists that analyzes system’s runtime execution those approaches primarily analyze the control flow by tracing method execution events however they do not capture information flows we address this problem by proposing novel dynamic analysis technique named object flow analysis which complements method execution tracing with an accurate analysis of the runtime flow of objects to exemplify the usefulness of our analysis we present visual approach that allows system engineer to study classes and components in terms of how they exchange objects at runtime we illustrate and validate our approach on two case studies
this paper describes the main cases of multiple pointer interaction and proposes new notation named udp for classifying and comparing these systems technical solution that makes use of two complementary tools is presented in the second part of the paper this implementation can support most cases of multiple pointer interaction it is currently based on the window windowing system and the ubit toolkit
in this paper we focus on the problem of preserving the privacy of sensitive relationships in graph data we refer to the problem of inferring sensitive relationships from anonymized graph data as link reidentification we propose five different privacy preservation strategies which vary in terms of the amount of data removed and hence their utility and the amount of privacy preserved we assume the adversary has an accurate predictive model for links and we show experimentally the success of different link re identification strategies under varying structural characteristics of the data
we define calculus for investigating the interactions between mixin modules and computational effects by combining the purely functional mixin calculus cms with monadic metalanguage supporting the two separate notions of simplification local rewrite rules and computation global evaluation able to modify the store this distinction is important for smoothly integrating the cms rules which are all local with the rules dealing with the imperative features in our calculus mixins can contain mutually recursive computational components which are explicitly computed by means of new mixin operator whose semantics is defined in terms of haskell like recursive monadic binding since we mainly focus on the operational aspects we adopt simple type system like that for haskell that does not detect dynamic errors related to bad recursive declarations involving effects the calculus serves as formal basis for defining the semantics of imperative programming languages supporting first class mixins while preserving the cms equational reasoning
access to realistic complex graph datasets is critical to research on social networking systems and applications simulations on graph data provide critical evaluation of new systems and applications ranging from community detection to spam filtering and social web search due to the high time and resource costs of gathering real graph datasets through direct measurements researchers are anonymizing and sharing small number of valuable datasets with the community however performing experiments using shared real datasets faces three key disadvantages concerns that graphs can be de anonymized to reveal private information increasing costs of distributing large datasets and that small number of available social graphs limits the statistical confidence in the results the use of measurement calibrated graph models is an attractive alternative to sharing datasets researchers can fit graph model to real social graph extract set of model parameters and use them to generate multiple synthetic graphs statistically similar to the original graph while numerous graph models have been proposed it is unclear if they can produce synthetic graphs that accurately match the properties of the original graphs in this paper we explore the feasibility of measurement calibrated synthetic graphs using six popular graph models and variety of real social graphs gathered from the facebook social network ranging from to million edges we find that two models consistently produce synthetic graphs with common graph metric values similar to those of the original graphs however only one produces high fidelity results in our application level benchmarks while this shows that graph models can produce realistic synthetic graphs it also highlights the fact that current graph metrics remain incomplete and some applications expose graph properties that do not map to existing metrics
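a small sketch of the measurement calibrated workflow described above using networkx with a single stand in model (barabasi albert calibrated from average degree) rather than the six models and application level benchmarks of the study the "real" graph below is also only a synthetic placeholder

# illustrative "fit a model to a real graph, regenerate, compare metrics" loop
import networkx as nx

def calibrate_and_generate(real_graph):
    n = real_graph.number_of_nodes()
    avg_deg = 2.0 * real_graph.number_of_edges() / n
    m = max(1, round(avg_deg / 2))            # BA attachment parameter from average degree
    return nx.barabasi_albert_graph(n, m, seed=0)

def compare(real_graph, synth_graph):
    for name, fn in [("avg clustering", nx.average_clustering),
                     ("transitivity", nx.transitivity)]:
        print(name, round(fn(real_graph), 4), "vs", round(fn(synth_graph), 4))

real = nx.powerlaw_cluster_graph(2000, 3, 0.3, seed=1)   # stand-in for a measured social graph
compare(real, calibrate_and_generate(real))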
replication of information across server cluster provides promising way to support popular web sites however web server cluster requires some mechanism for the scheduling of requests to the most available server one common approach is to use the cluster domain name system dns as centralized dispatcher the main problem is that www address caching mechanisms although reducing network traffic only let this dns dispatcher control very small fraction of the requests reaching the web server cluster the non uniformity of the load from different client domains and the high variability of real web workload introduce additional degrees of complexity to the load balancing issue these characteristics make existing scheduling algorithms for traditional distributed systems not applicable to control the load of web server clusters and motivate the research on entirely new dns policies that require some system state information we analyze various dns dispatching policies under realistic situations where state information needs to be estimated with low computation and communication overhead so as to be applicable to web cluster architecture in model of realistic scenarios for the web cluster large set of simulation experiments shows that by incorporating the proposed state estimators into the dispatching policies the effectiveness of the dns scheduling algorithms can improve substantially in particular if compared to the results of dns algorithms not using adequate state information
patent documents contain important research results they are often collectively analyzed and organized in visual way to support decision making however they are lengthy and rich in technical terminology and thus require lot of human effort for analysis automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand this paper describes summarization method for patent surrogate extraction intended to efficiently and effectively support patent mapping which is an important subtask of patent analysis six patent maps were used to evaluate its relative usefulness the experimental results confirm that the machine generated summaries do preserve more important content words than some other patent sections or even than the full patent texts when only few terms are to be considered for classification and mapping the implication is that if one were to determine patent’s category based on only few terms at quick pace one could begin by reading the section summaries generated automatically
multiple instance learning mil provides framework for training discriminative classifier from data with ambiguous labels this framework is well suited for the task of learning object classifiers from weakly labeled image data where only the presence of an object in an image is known but not its location some recent work has explored the application of mil algorithms to the tasks of image categorization and natural scene classification in this paper we extend these ideas in framework that uses mil to recognize and localize objects in images to achieve this we employ state of the art image descriptors and multiple stable segmentations these components combined with powerful mil algorithm form our object recognition system called milss we show highly competitive object categorization results on the caltech dataset to evaluate the performance of our algorithm further we introduce the challenging landmarks dataset collection of photographs of famous landmarks from around the world the results on this new dataset show the great potential of our proposed algorithm
supervised classification is one of the tasks most frequently carried out by so called intelligent systems thus large number of techniques have been developed based on artificial intelligence logic based techniques perceptron based techniques and statistics bayesian networks instance based techniques the goal of supervised learning is to build concise model of the distribution of class labels in terms of predictor features the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known but the value of the class label is unknown this paper describes various classification algorithms and the recent attempt for improving classification accuracy ensembles of classifiers
one of the most important open problems of parallel ltl model checking is to design an on the fly scalable parallel algorithm with linear time complexity such an algorithm would give the optimality we have in sequential ltl model checking in this paper we give partial solution to the problem we propose an algorithm that has the required properties for very rich subset of ltl properties namely those expressible by weak büchi automata
the performance model interchange format pmif provides mechanism for transferring the system model information among performance modeling tools requiring only that the tools either internally support pmif or provide an interface that reads writes model specifications from to file this paper presents the latest version of the specification pmif metamodel defining the information requirements and the corresponding xml schema it defines the semantic properties for pmif xml interchange the prescribed validation order errors and warnings and tool and web service implementation import and export prototypes for two different types of tools prove the concept generally available examples are used for repeatability
understanding how people in organizations appropriate and adapt groupware technologies to local contexts of use is key issue for cscw research since it is critical to the success of these technologies in this paper we argue that the appropriation and adaptation of groupware and other types of advanced cscw technologies is basically problem of sensemaking we analyze how group of technology use mediators orlikowski et al org sci in large multinational company adapted groupware technology virtual workspace to the local organizational context and vice versa by modifying features of the technology providing ongoing support for users and promoting appropriate conventions of use our findings corroborate earlier research on technology use mediation which suggests that such mediators can exert considerable influence on how particular technology will be established and used in an organization however we also find that the process of technology use mediation is much more complex and indeterminate than prior research suggests the reason being we argue that new advanced cscw technologies such as virtual workspaces and other groupware applications challenge the mediators and users sensemaking because the technologies are equivocal and therefore open to many possible and plausible interpretations
peer to peer pp systems are becoming increasingly popular as they enable users to exchange digital information by participating in complex networks such systems are inexpensive easy to use highly scalable and do not require central administration despite their advantages however limited work has been done on employing database systems on top of pp networks here we propose the peerolap architecture for supporting on line analytical processing queries large number of low end clients each containing cache with the most useful results are connected through an arbitrary pp network if query cannot be answered locally ie by using the cache contents of the computer where it is issued it is propagated through the network until peer that has cached the answer is found an answer may also be constructed by partial results from many peers thus peerolap acts as large distributed cache which amplifies the benefits of traditional client side caching the system is fully distributed and can reconfigure itself on the fly in order to decrease the query cost for the observed workload this paper describes the core components of peerolap and presents our results both from simulation and prototype installation running on geographically remote peers
we have explored the challenges of designing domestic services to help family and friends determine mutually agreeable times to call one another in prior study we identified activities in the home that predict availability to an external interruption in our follow up study we used cooperative design activity to see which if any of these home activities the family member is willing to share when engaged in mealtime or leisure routine the data people are willing to reveal is more detailed for mealtime than leisure activities furthermore the shared availability service needs to be accessible throughout the home in either compact portable form or integrated with other services in various room locations accuracy and reliability of the shared information along with device independent caller identification are also essential design requirements while not unique to the home the desire to personalize and present socially acceptable availability status is extremely important
today in both virtualized and non virtualized systems the entire io functionality is based on device drivers they are central to any system structure both anecdotal and informed evidence indicates device drivers as major source of trouble in the classical os and source of scaling and performance issues in virtual io due to trusted intermediary required for the shared io we propose an architecture which virtualizes the entire io subsystem rather than each device and provides device independent io at higher level of abstraction than the traditional io interfaces in our suggested architecture the system robustness is increased by isolating io drivers efficient and scalable io virtualization becomes possible by complete separation of the io and compute function and introducing protection model that does not require trusted intermediary for io
this paper studies how to verify the conformity of program with its specification and proposes novel constraint programming framework for bounded program verification cpbpv the cpbpv framework uses constraint stores to represent both the specification and the program and explores execution paths of bounded length nondeterministically the cpbpv framework detects non conformities and provides counter examples when path of bounded length that refutes some properties exists the input program is partially correct under the boundness restrictions if each constraint store so produced implies the post condition cpbpv does not explore spurious execution paths as it incrementally prunes execution paths early by detecting that the constraint store is not consistent cpbpv uses the rich language of constraint programming to express the constraint store finally cpbpv is parameterized with list of solvers which are tried in sequence starting with the least expensive and less general experimental results often produce orders of magnitude improvements over earlier approaches running times being often independent of the size of the variable domains moreover cpbpv was able to detect subtle errors in some programs for which other frameworks based on bounded model checking have failed
networks are hard to manage and in spite of all the so called holistic management packages things are getting worse we argue that the difficulty of network management can partly be attributed to fundamental flaw in the existing architecture protocols expose all their internal details and hence the complexity of the ever evolving data plane encumbers the management plane guided by this observation in this paper we explore an alternative approach and propose complexity oblivious network management conman network architecture in which the management interface of data plane protocols includes minimal protocol specific information this restricts the operational complexity of protocols to their implementation and allows the management plane to achieve high level policies in structured fashion we built the conman interface of few protocols and management tool that can achieve high level configuration goals based on this interface our preliminary experience with applying this tool to real world vpn configuration indicates the architecture’s potential to alleviate the difficulty of configuration management
in bit processor many of the data values actually used in computations require much narrower data widths in this study we demonstrate that instruction data widths exhibit very strong temporal locality and describe mechanisms to accurately predict data widths to exploit the predictability of data widths we propose multi bit width mbw microarchitecture which when the opportunity arises takes the wires normally used to route the operands and bypass the result of bit instruction and instead uses them for multiple narrow width instructions this technique increases the effective issue width without adding many additional wires by reusing already existing datapaths compared to traditional four wide superscalar processor our best mbw configuration with peak issue rate of eight ipc achieves speedup on the simulated specint benchmarks which performs very well when compared to speedup attainable by processor with perfect data width predictor
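as an illustration of the temporal locality claim a toy last outcome width predictor indexed by instruction address can be sketched as follows (the width classes the default prediction and the trace are invented and real hardware would use a small tagged table rather than a dictionary)

# toy last-outcome data-width predictor: each static instruction predicts that
# its next result needs the same width class as its previous result
def width_class(value):                   # non-negative results only, for simplicity
    b = value.bit_length()
    return 16 if b <= 16 else 32 if b <= 32 else 64

def simulate(trace):
    table, correct = {}, 0
    for pc, result in trace:
        predicted = table.get(pc, 64)     # default to the full width
        actual = width_class(result)
        correct += (predicted == actual)
        table[pc] = actual                # last-outcome update
    return correct / len(trace)

trace = [(0x40, 7), (0x40, 12), (0x44, 1 << 40), (0x40, 9), (0x44, 1 << 41)]
print(round(simulate(trace), 2))          # 0.8 on this tiny trace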
several object oriented database management systems have been implemented without an accompanying theoretical foundation for constraint query specification and processing the pattern based object calculus presented in this article provides such theoretical foundation for describing and processing object oriented databases we view an object oriented database as network of interrelated classes ie the intension and collection of time varying object association patterns ie the extension the object calculus is based on first order logic it provides the formalism for interpreting precisely and uniformly the semantics of queries and integrity constraints in object oriented databases the power of the object calculus is shown in four aspects first associations among objects are expressed explicitly in an object oriented database second the nonassociation operator is included in the object calculus third set oriented operations can be performed on both homogeneous and heterogeneous object association patterns fourth our approach does not assume specific form of database schema proposed formalism is also applied to the design of high level object oriented query and constraint languages
an ad hoc network is formed by group of mobile hosts upon wireless network interface previous research in communication in ad hoc networks has concentrated on routing algorithms which are designed for fully connected networks the traditional approach to communication in disconnected ad hoc network is to let the mobile computer wait for network reconnection passively this method may lead to unacceptable transmission delays we propose an approach that guarantees message transmission in minimal time in this approach mobile hosts actively modify their trajectories to transmit messages we develop algorithms that minimize the trajectory modifications under two different assumptions the movements of all the nodes in the system are known and the movements of the hosts in the system are not known
despite significant efforts to obtain an accurate picture of the internet’s connectivity structure at the level of individual autonomous systems ases much has remained unknown in terms of the quality of the inferred as maps that have been widely used by the research community in this paper we assess the quality of the inferred internet maps through case studies of sample set of ases these case studies allow us to establish the ground truth of connectivity between this set of ases and their directly connected neighbors direct comparison between the ground truth and inferred topology maps yield insights into questions such as which parts of the actual topology are adequately captured by the inferred maps which parts are missing and why and what is the percentage of missing links in these parts this information is critical in assessing for each class of real world networking problems whether the use of currently inferred as maps or proposed as topology models is or is not appropriate more importantly our newly gained insights also point to new directions towards building realistic and economically viable internet topology maps
in information retrieval the linear combination method is very flexible and effective data fusion method since different weights can be assigned to different component systems however it remains an open question which weighting schema is good previously simple weighting schema was very often used for system its weight is assigned as its average performance over group of training queries in this paper we investigate the weighting issue by extensive experiments we find that series of power functions of average performance which can be implemented as efficiently as the simple weighting schema is more effective than the simple weighting schema for data fusion
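a minimal sketch of the weighting scheme discussed above where each system's fusion weight is a power function of its average training performance (the score lists the performance values and the exponent are invented)

# linear-combination fusion with power-of-performance weights
def fuse(runs, avg_perf, power=2.0):
    """runs: {system: {doc: score}}, avg_perf: {system: average effectiveness}."""
    weights = {s: avg_perf[s] ** power for s in runs}
    fused = {}
    for system, scores in runs.items():
        for doc, score in scores.items():
            fused[doc] = fused.get(doc, 0.0) + weights[system] * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

runs = {"sysA": {"d1": 0.9, "d2": 0.4}, "sysB": {"d2": 0.8, "d3": 0.7}}
perf = {"sysA": 0.35, "sysB": 0.25}
print(fuse(runs, perf, power=2.0))        # power=1.0 recovers the simple schema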
given are directed edge labelled graph with distinguished node and regular expression which may contain variables we wish to compute all substitutions phi of symbols for variables together with all nodes such that all paths → are in phi we derive an algorithm for this problem using relational algebra and show how it may be implemented in prolog the motivation for the problem derives from declarative framework for specifying compiler optimisations
we consider an efficient realization of the all reduce operation with large data sizes in cluster environments under the assumption that the reduce operator is associative and commutative we derive tight lower bound of the amount of data that must be communicated in order to complete this operation and propose ring based algorithm that only requires tree connectivity to achieve bandwidth optimality unlike the widely used butterfly like all reduce algorithm that incurs network contention in smp multi core clusters the proposed algorithm can achieve contention free communication in almost all contemporary clusters including smp multi core clusters and ethernet switched clusters with multiple switches we demonstrate that the proposed algorithm is more efficient than other algorithms on clusters with different nodal architectures and networking technologies when the data size is sufficiently large
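the ring based algorithm can be sketched as a reduce scatter pass followed by an all gather pass here simulated sequentially in one process (a real implementation exchanges the same segments as point to point messages between neighbouring nodes which is why only ring connectivity is needed)

# ring all-reduce sketch: reduce-scatter then all-gather over p equal segments
def ring_allreduce(buffers):
    p, n = len(buffers), len(buffers[0])
    bound = [(i * n) // p for i in range(p + 1)]        # segment boundaries

    def segment(i):
        return range(bound[i], bound[i + 1])

    # reduce-scatter: after p-1 steps rank r holds the full sum of segment (r+1) % p
    for t in range(p - 1):
        for rank in range(p):
            src, seg = (rank - 1) % p, (rank - t - 1) % p
            for j in segment(seg):
                buffers[rank][j] += buffers[src][j]
    # all-gather: circulate the completed segments once more around the ring
    for t in range(p - 1):
        for rank in range(p):
            src, seg = (rank - 1) % p, (rank - t) % p
            for j in segment(seg):
                buffers[rank][j] = buffers[src][j]
    return buffers

data = [[float(r + 1)] * 8 for r in range(4)]           # 4 ranks, 8 elements each
print(ring_allreduce(data)[0])                          # every element becomes 1+2+3+4 = 10.0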
we present new techniques for fast accurate and scalable static data race detection in concurrent programs focusing our analysis on linux device drivers allowed us to identify the unique challenges posed by debugging large scale real life code and also pinpointed drawbacks in existing race warning generation methods this motivated the development of new techniques that helped us in improving both the scalability as well as the accuracy of each of the three main steps in race warning generation system the first and most crucial step is the automatic discovery of shared variables towards that end we present new efficient dataflow algorithm for shared variable detection which is more effective than existing correlation based techniques that failed to detect the shared variables responsible for data races in majority of the drivers in our benchmark suite secondly accuracy of race warning generation strongly hinges on the precision of the pointer analysis used to compute aliases for lock pointers we formulate new scalable context sensitive alias analysis that effectively combines divide and conquer strategy with function summarization and is demonstrably more efficient than existing bdd based techniques finally we provide new warning reduction technique that leverages lock acquisition patterns to yield provably better warning reduction than existing lockset based methods
the increasing performance price ratio of computer hardware makes possible to explore distributed approach at code clone analysis this paper presents ccfinder distributed approach at large scale code clone analysis ccfinder has been implemented with pc workstations in our student laboratory and vast collection of open source software with about million lines in total has been analyzed with it in about days the result has been visualized as scatter plot which showed the presence of frequently used code as easy recognizable patterns also ccfinder has been used to analyze single software system against the whole collection in order to explore the presence of code imported from open source software
the authors review object oriented systems from the user’s point of view and discuss problems that need solving they describe how characteristic features of object oriented systems can provide foundation for various computational tasks in particular they address the impact of object models on programming environments expert systems and databases although they use relatively simple example for illustration they believe that the same concepts and techniques can be applied to general applications the authors also discuss related problems and highlight important research directions
with power having become critical issue in the operation of data centers today there has been an increased push towards the vision of energy proportional computing in which no power is used by idle systems very low power is used by lightly loaded systems and proportionately higher power at higher loads unfortunately given the state of the art of today’s hardware designing individual servers that exhibit this property remains an open challenge however even in the absence of redesigned hardware we demonstrate how optimization based techniques can be used to build systems with off the shelf hardware that when viewed at the aggregate level approximate the behavior of energy proportional systems this paper explores the viability and tradeoffs of optimization based approaches using two different case studies first we show how different power saving mechanisms can be combined to deliver an aggregate system that is proportional in its use of server power second we show early results on delivering proportional cooling system for these servers when compared to the power consumed at utilization results from our testbed show that optimization based systems can reduce the power consumed at utilization to for server power and for cooling power
one of the challenges of designing for coarse grain reconfigurable arrays is the need for mature tools this is especially important because of the heterogeneity of the larger more predefined and hence more specialized array elements this work describes the use of genetic algorithm ga to automate the physical binding phase of kernel design we identify the generalizable features of an example platform and discuss suitable ways to harness the binding problem to ga search engine
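a minimal genetic algorithm sketch for a binding style problem is given below where individuals assign kernel operations to array cells and fitness penalises long connections between communicating operations (the cost model operators and parameters are invented and not the paper's)

# toy GA for binding operations of a kernel onto a 4x4 array of cells
import random

OPS = 8
CELLS = [(x, y) for x in range(4) for y in range(4)]
EDGES = [(i, i + 1) for i in range(OPS - 1)]            # toy dataflow: a chain of ops

def cost(binding):                                      # total manhattan length of edges
    return sum(abs(CELLS[binding[a]][0] - CELLS[binding[b]][0]) +
               abs(CELLS[binding[a]][1] - CELLS[binding[b]][1]) for a, b in EDGES)

def random_binding():
    return random.sample(range(len(CELLS)), OPS)        # one distinct cell per operation

def crossover(p1, p2):
    cut = random.randrange(1, OPS)
    return (p1[:cut] + [c for c in p2 if c not in p1[:cut]])[:OPS]

def mutate(b):
    if random.random() < 0.3:
        i = random.randrange(OPS)
        b[i] = random.choice([c for c in range(len(CELLS)) if c not in b])
    return b

random.seed(0)
pop = [random_binding() for _ in range(30)]
for _ in range(100):
    pop.sort(key=cost)
    parents = pop[:10]                                  # elitist selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2))) for _ in range(20)]
print("best binding cost", cost(min(pop, key=cost)))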
program code in computer system can be altered either by malicious security attacks or by various faults in microprocessors at the instruction level all code modifications are manifested as bit flips in this work we present generalized methodology for monitoring code integrity at run time in application specific instruction set processors asips where both the instruction set architecture isa and the underlying microarchitecture can be customized for particular application domain we embed monitoring microoperations in machine instructions thus the processor is augmented with hardware monitor automatically the monitor observes the processor’s execution trace of basic blocks at run time checks whether the execution trace aligns with the expected program behavior and signals any mismatches since microoperations are at lower software architecture level than processor instructions the microarchitectural support for program code integrity monitoring is transparent to upper software levels and no recompilation or modification is needed for the program experimental results show that our microarchitectural support can detect program code integrity compromises with small area overhead and little performance degradation
the semantics for data manipulation of the database language cudl conceptual universal database language designed to manage dynamic database environments are presented this language conforms to the fdb frame database data model offering simple easy and efficient platform for the use of the fdb model otherwise the management and operation of fdb data is laborious and time consuming and it requires from the user very good acquaintance of the proposed model the structures and organisation of it as well as the processes of the management of elements that compose it in this paper we present in depth the semantics of the way of handling the data in order to search and transform information in an fdb data source we present the analysis of simple and complex cases that led us to synthesize valid and simple semantic rules that determine the data manipulation operations the more sophisticated and demanding constructs used in the language for query specification query processing and object manipulation are discussed and evaluated
as technology evolves power dissipation increases and cooling systems become more complex and expensive there are two main sources of power dissipation in processor dynamic power and leakage dynamic power has been the most significant factor but leakage will become increasingly significant in future it is predicted that leakage will shortly be the most significant cost as it grows at about times rate per generation thus reducing leakage is essential for future processor design since large caches occupy most of the area they are one of the leakiest structures in the chip and hence main source of energy consumption for future processors this paper introduces iatac inter access time per access count new hardware technique to reduce cache leakage for caches iatac dynamically adapts the cache size to the program requirements turning off cache lines whose content is not likely to be reused our evaluation shows that this approach outperforms all previous state of the art techniques iatac turns off % of the cache lines across different cache configurations with very small performance degradation of around %
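a toy cache decay simulation in the spirit of the technique is sketched below each line tracks its inter access time and access count and a line idle for longer than an adaptive per line threshold is switched off to cut leakage (the threshold rule the trace and all constants are simplifications not the exact iatac policy)

# simplified per-line decay: turn a line off once its idle time exceeds a
# multiple of its observed average inter-access time
class DecayCache:
    def __init__(self, lines, base_interval=100):
        self.base = base_interval
        self.meta = {i: {"last": 0, "inter": float(base_interval), "count": 0, "on": False}
                     for i in range(lines)}

    def access(self, line, now):
        m = self.meta[line]
        if m["count"]:
            m["inter"] = 0.5 * m["inter"] + 0.5 * (now - m["last"])   # running average
        m.update(last=now, count=m["count"] + 1, on=True)

    def decay(self, now):
        for m in self.meta.values():
            if m["on"] and now - m["last"] > 2 * max(m["inter"], self.base / 10):
                m["on"] = False            # power the line off (its content is lost)

    def lines_on(self):
        return sum(m["on"] for m in self.meta.values())

cache = DecayCache(lines=8)
for line, cycle in [(0, 1), (1, 5), (0, 12), (2, 300), (0, 310)]:
    cache.access(line, cycle)
    cache.decay(cycle)
print("lines still powered:", cache.lines_on())          # 2 of the 3 touched lines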
this paper presents and evaluates number of techniques to improve the execution time of interprocedural pointer analysis in the context of c programs the analysis is formulated as graph of set constraints and solved using worklist algorithm indirections lead to new constraints being added during this procedure the solution process can be simplified by identifying cycles and we present novel online algorithm for doing this we also present difference propagation scheme which avoids redundant work by tracking changes to each solution set the effectiveness of these and other methods are shown in an experimental study over common c programs ranging between to lines of code
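the difference propagation idea can be sketched directly a worklist solver that pushes only the newly added part of each points to set along the constraint graph and adds edges lazily for dereference constraints (the constraint encoding below is a simplified inclusion based model and the online cycle detection and real c handling of the paper are not shown)

# worklist set-constraint solver with difference propagation
from collections import defaultdict, deque

def solve(base, copy_edges, loads, stores):
    """base: {p: initial locations}; copy_edges: (p, q) meaning q ⊇ p;
       loads: (q, p) meaning q ⊇ *p; stores: (p, q) meaning *p ⊇ q."""
    sol, delta, succ = defaultdict(set), defaultdict(set), defaultdict(set)
    for p, q in copy_edges:
        succ[p].add(q)
    work = deque()
    for p, locs in base.items():
        delta[p] |= locs
        work.append(p)
    while work:
        p = work.popleft()
        d = delta.pop(p, set()) - sol[p]          # only the genuinely new part
        if not d:
            continue
        sol[p] |= d
        for loc in d:                             # dereference constraints add edges lazily
            for q, r in loads:                    # q ⊇ *p  =>  edge loc -> q
                if r == p and q not in succ[loc]:
                    succ[loc].add(q)
                    delta[q] |= sol[loc]; work.append(q)
            for r, q in stores:                   # *p ⊇ q  =>  edge q -> loc
                if r == p and loc not in succ[q]:
                    succ[q].add(loc)
                    delta[loc] |= sol[q]; work.append(loc)
        for q in succ[p]:                         # propagate the difference only
            delta[q] |= d
            work.append(q)
    return sol

# example: a = &x; t = &y; b = a; *b = t; c = *a
pts = solve(base={"a": {"x"}, "t": {"y"}},
            copy_edges=[("a", "b")], loads=[("c", "a")], stores=[("b", "t")])
print({k: sorted(v) for k, v in pts.items()})     # c ends up pointing to y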
we present the semantics and proof system for an object oriented language with active objects asynchronous method calls and futures the language based on creol distinguishes itself in that unlike active object models it permits more than one thread of control within an object though unlike java only one thread can be active within an object at given time and rescheduling occurs only at specific release points consequently reestablishing an object’s monitor invariant is possible at specific well defined points in the code the resulting proof system shows that this approach to concurrency is simpler for reasoning than say java’s multithreaded concurrency model from methodological perspective we identify constructs which admit simple proof system and those which require for example interference freedom tests
knowledge is critical resource that organizations use to gain and maintain competitive advantages in the constantly changing business environment organizations must exploit effective and efficient methods of preserving sharing and reusing knowledge in order to help knowledge workers find task relevant information hence an important issue is how to discover and model the knowledge flow kf of workers from their historical work records the objectives of knowledge flow model are to understand knowledge workers task needs and the ways they reference documents and then provide adaptive knowledge support this work proposes hybrid recommendation methods based on the knowledge flow model which integrates kf mining sequential rule mining and collaborative filtering techniques to recommend codified knowledge these kf based recommendation methods involve two phases kf mining phase and kf based recommendation phase the kf mining phase identifies each worker’s knowledge flow by analyzing his her knowledge referencing behavior information needs while the kf based recommendation phase utilizes the proposed hybrid methods to proactively provide relevant codified knowledge for the worker therefore the proposed methods use workers preferences for codified knowledge as well as their knowledge referencing behavior to predict their topics of interest and recommend task related knowledge using data collected from research institute laboratory experiments are conducted to evaluate the performance of the proposed hybrid methods and compare them with the traditional cf method the results of experiments demonstrate that utilizing the document preferences and knowledge referencing behavior of workers can effectively improve the quality of recommendations and facilitate efficient knowledge sharing
distributed termination detection is fundamental problem in parallel and distributed computing and numerous schemes with different performance characteristics have been proposed these schemes while being efficient with regard to one performance metric prove to be inefficient in terms of other metrics significant drawback shared by all previous methods is that on most popular topologies they take time that grows with P to detect and signal termination after its actual occurrence where P is the total number of processing elements detection delay is arguably the most important metric to optimize since it is directly related to the amount of idling of computing resources and to the delay in the utilization of results of the underlying computation in this paper we present novel termination detection algorithm that is simultaneously optimal or near optimal with respect to all relevant performance measures on any topology in particular our algorithm has best case detection delay of Θ(1) and finite optimal worst case detection delay on any topology equal in order terms to the time for an optimal one to all broadcast on that topology which we accurately characterize for an arbitrary topology on k ary n cube tori and meshes the worst case delay is Θ(D) where D is the diameter of the target topology further our algorithm has message and computational complexities of Θ(MD) in the worst case and for most applications Θ(M) in the average case the same as other message efficient algorithms and an optimal space complexity where M is the total number of messages used by the underlying computation we also give scheme using counters that greatly reduces the constant associated with the average message and computational complexities but does not suffer from the counter overflow problems of other schemes finally unlike some previous schemes our algorithm does not rely on first in first out fifo ordering for message communication to work correctly
recent researches on improving the efficiency and user experience of web browsing on handhelds are seeking to solve the problem by re authoring web pages or making adaptations and recommendations according to user preference their basis is good understanding of the relationship between user behaviors and user preference we propose practical method to find user’s interest blocks by machine learning using the combination of significant implicit evidences which is extracted from four aspects of user behaviors display time viewing information items scrolling and link selection we also develop customized web browser for small screen devices to collect user behaviors accurately for evaluation we conduct an on line user study and make statistical analysis based on the dataset which shows that most types of the suggested implicit evidences are significant and viewing information items is the least indicative aspect of user behaviors the dataset is then processed off line to find user’s interest blocks using the proposed method experimental results demonstrate the effectiveness of finding user’s interest blocks by machine learning using the combination of significant implicit evidences further analysis reveals the great effect of users and moderate effect of websites on the usefulness of significant implicit evidences
in this paper we report on an empirical exploration of digital ink and speech usage in lecture presentation we studied the video archives of five master’s level computer science courses to understand how instructors use ink and speech together while lecturing and to evaluate techniques for analyzing digital ink our interest in understanding how ink and speech are used together is to inform the development of future tools for supporting classroom presentation distance education and viewing of archived lectures we want to make it easier to interact with electronic materials and to extract information from them we want to provide an empirical basis for addressing challenging problems such as automatically generating full text transcripts of lectures matching speaker audio with slide content and recognizing the meaning of the instructor’s ink our results include an evaluation of handwritten word recognition in the lecture domain an approach for associating attentional marks with content an analysis of linkage between speech and ink and an application of recognition techniques to infer speaker actions
object oriented program design promotes the reuse of code not only through inheritance and polymorphism but also through building server classes which can be used by many different client classes research on static analysis of object oriented software has focused on addressing the new features of classes inheritance polymorphism and dynamic binding this paper demonstrates how exploiting the nature of object oriented design principles can enable development of scalable static analyses we present an algorithm for computing def use information for single class’s manipulation of objects of other classes which requires that only partial representations of server classes be constructed this information is useful for data flow testing and debugging
distributed persistent memory system is considered which implements form of segmentation with paging within the framework of the single address space paradigm of memory reference peculiar problem of system of this type is the lack of protection of the private information items of any given process against unauthorized access attempts possibly performed by the other processes we present set of mechanisms able to enforce access control over the private virtual space areas these mechanisms guarantee degree of protection comparable to that typical of multiple address space system while preserving the advantages of ease of information sharing typical of the single address space model the resulting environment is evaluated from number of salient viewpoints including ease of distribution and revocation of access rights strategies for virtual space reuse and the storage requirements of the information for memory management
in multi hop wireless networks mhwn packets are routed between source and destination using chain of intermediate nodes chains are fundamental communication structure in mhwns whose behavior must be understood to enable building effective protocols the behavior of chains is determined by number of complex and interdependent processes that arise as the sources of different chain hops compete to transmit their packets on the shared medium in this paper we show that mac level interactions play the primary role in determining the behavior of chains we evaluate the types of chains that occur based on the mac interactions between different links using realistic propagation and packet forwarding models we discover that the presence of destructive interactions due to different forms of hidden terminals does not impact the throughput of an isolated chain significantly however due to the increased number of retransmissions required the amount of bandwidth consumed is significantly higher in chains exhibiting destructive interactions substantially influencing the overall network performance these results are validated by testbed experiments we finally study how different types of chains interfere with each other and discover that well behaved chains in terms of self interference are more resilient to interference from other chains
in recent years it has become increasingly clear that the vision of the semantic web requires uncertain reasoning over rich first order representations markov logic brings the power of probabilistic modeling to first order logic by attaching weights to logical formulas and viewing them as templates for features of markov networks this gives natural probabilistic semantics to uncertain or even inconsistent knowledge bases with minimal engineering effort inference algorithms for markov logic draw on ideas from satisfiability markov chain monte carlo and knowledge based model construction learning algorithms are based on the conjugate gradient algorithm pseudo likelihood and inductive logic programming markov logic has been successfully applied to problems in entity resolution link prediction information extraction and others and is the basis of the open source alchemy system
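the semantics sketched above (a world's probability proportional to the exponentiated weighted count of true formula groundings) is small enough to enumerate for a toy domain the two formulas their weights and the two person domain below are made up for illustration and real markov logic engines such as alchemy of course never enumerate worlds

# brute-force illustration of P(world) ∝ exp(sum_i w_i * n_i(world))
import itertools, math

PEOPLE = ["anna", "bob"]

def implies(a, b):
    return (not a) or b

def weighted_counts(world):
    # F1: Smokes(x) => Cancer(x), weight 1.5
    n1 = sum(implies(world["smokes", x], world["cancer", x]) for x in PEOPLE)
    # F2: Friends(x, y) => (Smokes(x) <=> Smokes(y)), weight 1.1
    n2 = sum(implies(world["friends", x, y], world["smokes", x] == world["smokes", y])
             for x in PEOPLE for y in PEOPLE)
    return [(1.5, n1), (1.1, n2)]

def unnormalized(world):
    return math.exp(sum(w * n for w, n in weighted_counts(world)))

atoms = ([("smokes", x) for x in PEOPLE] + [("cancer", x) for x in PEOPLE] +
         [("friends", x, y) for x in PEOPLE for y in PEOPLE])
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
z = sum(unnormalized(w) for w in worlds)                        # partition function
p = sum(unnormalized(w) for w in worlds if w["cancer", "anna"]) / z
print("P(Cancer(anna)) =", round(p, 3))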
end users are often frustrated by unexpected problems while using networked software leading to frustrated calls to the help desk seeking solutions however trying to locate the cause of these unexpected behaviors is not simple task the key to many network monitoring and diagnosis approaches is using cross layer information but the complex interaction between network layers and usually large amount of collected data prevent it support personnel from determining the root of errors and bottlenecks there is need for the tools that reduce the amount of data to be processed offer systematic exploration of the data and assist whole stack performance analysis in this paper we present visty network stack visualization tool that allows it support personnel to systematically explore network activities at end hosts visty can provide an overview picture of the network stack at any specified time showing how errors in one layer affect the performance of others visty was designed as prototype for more advanced diagnosis tools and also may be used to assist novice users in understanding the network stack and relationships between each layer
the fast development of web services or more broadly service oriented architectures soas has prompted more organizations to move contents and applications out to the web softwares on the web allow one to enjoy variety of services for example translating texts into other languages and converting document from one format to another in this paper we address the problem of maintaining data integrity and confidentiality in web content delivery when dynamic content modifications are needed we propose flexible and scalable model for secure content delivery based on the use of roles and role certificates to manage web intermediaries the proxies coordinate themselves in order to process and deliver contents and the integrity of the delivered content is enforced using decentralized strategy to achieve this we utilize distributed role lookup table and role number based routing mechanism we give an efficient secure protocol ideliver for content processing and delivery and also describe method for securely updating role lookup tables our solution also applies to the security problem in web based workflows for example maintaining the data integrity in automated trading contract authorization and supply chain management in large organizations
the availability of huge system memory even on standard servers generated lot of interest in main memory database engines in data warehouse systems highly compressed column oriented data structures are quite prominent in order to scale with the data volume and the system load many of these systems are highly distributed with shared nothing approach the fundamental principle of all systems is full table scan over one or multiple compressed columns recent research proposed different techniques to speedup table scans like intelligent compression or using an additional hardware such as graphic cards or fpgas in this paper we show that utilizing the embedded vector processing units vpus found in standard superscalar processors can speed up the performance of main memory full table scan by factors this is achieved without changing the hardware architecture and thereby without additional power consumption moreover as on chip vpus directly access the system’s ram no additional costly copy operations are needed for using the new simd scan approach in standard main memory database engines therefore we propose this scan approach to be used as the standard scan operator for compressed column oriented main memory storage we then discuss how well our solution scales with the number of processor cores consequently to what degree it can be applied in multi threaded environments to verify the feasibility of our approach we implemented the proposed techniques on modern intel multi core processor using intel® streaming simd extensions intel® sse in addition we integrated the new simd scan approach into sap® netweaver® business warehouse accelerator we conclude with describing the performance benefits of using our approach for processing and scanning compressed data using vpus in column oriented main memory database systems
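the core idea of evaluating a predicate over many compressed values per instruction can be imitated with numpy as a rough stand in for the sse based scan (the dictionary compressed column the predicate and the sizes are invented and the real operator works on bit packed values with intrinsics inside the engine)

# vectorized scan over a dictionary-compressed column: rewrite the value
# predicate as a predicate on codes, then evaluate it over the whole column
import numpy as np

rng = np.random.default_rng(0)
dictionary = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=np.int32)
codes = rng.integers(0, len(dictionary), size=1_000_000).astype(np.uint8)

def scan_between(codes, dictionary, lo, hi):
    qualifying = np.flatnonzero((dictionary >= lo) & (dictionary <= hi)).astype(np.uint8)
    return np.flatnonzero(np.isin(codes, qualifying))    # matching row ids

row_ids = scan_between(codes, dictionary, 25, 55)
print(len(row_ids), "matching rows")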
atomic broadcast is fundamental problem of distributed systems it states that messages must be delivered in the same order to their destination processes this paper describes solution to this problem in asynchronous distributed systems in which processes can crash and recover consensus based solution to atomic broadcast problem has been designed by chandra and toueg for asynchronous distributed systems where crashed processes do not recover we extend this approach it transforms any consensus protocol suited to the crash recovery model into an atomic broadcast protocol suited to the same model we show that atomic broadcast can be implemented requiring few additional log operations in excess of those required by the consensus the paper also discusses how additional log operations can improve the protocol in terms of faster recovery and better throughput to illustrate the use of the protocol the paper also describes solution to the replica management problem in asynchronous distributed systems in which processes can crash and recover the proposed technique makes bridge between established results on weighted voting and recent results on the consensus problem
the object oriented metrics suite proposed by chidamber and kemerer ck is measurement approach towards improved object oriented design and development practices however existing studies evidence traces of collinearity between some of the metrics and low ranges of other metrics two facts which may endanger the validity of models based on the ck suite as high correlation may be an indicator of collinearity in this paper we empirically determine to what extent high correlations and low ranges might be expected among ck metrics to draw as general conclusions as possible we extract the ck metrics from large data set of public domain projects and we apply statistical meta analysis techniques to strengthen the validity of our results homogeneously through the projects we found moderate to high correlation between some of the metrics and low ranges of other metrics results of this empirical analysis supply researchers and practitioners with three main pieces of advice to avoid the use in prediction systems of ck metrics that have correlation more than to test for collinearity those metrics that present moderate correlations between and to avoid the use as response in continuous parametric regression analysis of the metrics presenting low variance this might therefore suggest that prediction system may not be based on the whole ck metrics suite but only on subset consisting of those metrics that do not present either high correlation or low ranges
existing book readers do not do good job supporting many reading tasks that people perform as ethnographers report that when reading people frequently read from multiple display surfaces in this paper we present our design of dual display book reader and explore how it can be used to interact with electronic documents our design supports embodied interactions like folding flipping and fanning for local lightweight navigation we also show how mechanisms like space filling thumbnails can use the increased display space to aid global navigation lastly the detachable faces in our design can facilitate inter document operations and flexible layout of documents in the workspace semi directed interviews with seven users found that dual displays have the potential to improve the reading experience by supporting several local navigation tasks better than single display device users also identified many reading tasks for which the device would be valuable users did not find the embodied interface particularly useful when reading in our controlled lab setting however
network attached storage nas integrates redundant array of independent disks raid subsystem that consists of multiple disk drives to aggregate storage capacity performance and reliability based on data striping and distribution traditionally the stripe size is an important parameter that has great influence on the raid subsystem performance whereas the performance impact has been changed due to the development of disk drive technologies and some optimization methods based on disk drive access time this paper constructs performance analysis model to exploit the impact of some optimization approaches including sub commands combination storage interface augment and scatter gather on the stripe size of nas the analysis results and experimental validation indicate that due to the evolution of hardware and software the stripe size has negligible performance impact on nas when the disk drives involved are organized in raid pattern
ibm community tools ict is synchronous broadcast messaging system in use by very large globally distributed organization ict is interesting for number of reasons including its scale of use thousands of users per day its usage model of employing large scale broadcast to strangers to initiate small group interactions and the fact that it is synchronous system used across multiple time zones in this paper we characterize the use of ict in its context examine the activities for which it is used the motivations of its users and the values they derive from it we also explore problems with the system and look at the social and technical ways in which users deal with them
the creation and deployment of knowledge repositories for managing sharing and reusing tacit knowledge within an organization has emerged as prevalent approach in current knowledge management practices knowledge repository typically contains vast amounts of formal knowledge elements which generally are available as documents to facilitate users navigation of documents within knowledge repository knowledge maps often created by document clustering techniques represent an appealing and promising approach various document clustering techniques have been proposed in the literature but most deal with monolingual documents ie written in the same language however as result of increased globalization and advances in internet technology an organization often maintains documents in different languages in its knowledge repositories which necessitates multilingual document clustering mldc to create organizational knowledge maps motivated by the significance of this demand this study designs latent semantic indexing lsi based mldc technique capable of generating knowledge maps ie document clusters from multilingual documents the empirical evaluation results show that the proposed lsi based mldc technique achieves satisfactory clustering effectiveness measured by both cluster recall and cluster precision and is capable of maintaining good balance between monolingual and cross lingual clustering effectiveness when clustering multilingual document corpus
we present algorithms for the propositional model counting problem sat the algorithms utilize tree decompositions of certain graphs associated with the given cnf formula in particular we consider primal dual and incidence graphs we describe the algorithms coherently for direct comparison and with sufficient detail for making an actual implementation reasonably easy we discuss several aspects of the algorithms including worst case time and space requirements
the intelligence in wikipedia project at the university of washington is combining self supervised information extraction ie techniques with mixed initiative interface designed to encourage communal content creation ccc since ie and ccc are each powerful ways to produce large amounts of structured information they have been studied extensively but only in isolation by combining the two methods in virtuous feedback cycle we aim for substantial synergy while previous papers have described the details of individual aspects of our endeavor this report provides an overview of the project’s progress and vision
compressing an inverted file can greatly improve query performance of an information retrieval system irs by reducing disk i/os we observe that good document identifier assignment dia can make the document identifiers in the posting lists more clustered and result in better compression as well as shorter query processing time in this paper we tackle the np complete problem of finding an optimal dia to minimize the average query processing time in an irs when the probability distribution of query terms is given we indicate that the greedy nearest neighbor greedy nn algorithm can provide excellent performance for this problem however the greedy nn algorithm is inappropriate if used in large scale irss due to its high complexity where denotes the number of documents and denotes the number of distinct terms in real world irss the distribution of query terms is skewed based on this fact we propose fast heuristic called partition based document identifier assignment pbdia algorithm which can efficiently assign consecutive document identifiers to those documents containing frequently used query terms and improve compression efficiency of the posting lists for those terms this can result in reduced query processing time the experimental results show that the pbdia algorithm can yield competitive performance versus the greedy nn for the dia problem and that this optimization problem has significant advantages for both long queries and parallel information retrieval ir
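A rough sketch of the greedy nearest-neighbour assignment mentioned above: documents that share many terms receive consecutive identifiers so that posting-list d-gaps become small and compress better. The term-overlap similarity, the starting document, and the toy collection are assumptions for illustration only.

def similarity(d1, d2):
    return len(d1 & d2)

def greedy_nn_order(docs):
    """docs: dict name -> set of terms. Returns an id assignment (name -> int)."""
    remaining = set(docs)
    start = min(remaining)          # deterministic starting point for the sketch
    remaining.remove(start)
    order = [start]
    while remaining:
        last = docs[order[-1]]
        nxt = max(remaining, key=lambda d: similarity(docs[d], last))
        remaining.remove(nxt)
        order.append(nxt)
    return {name: i for i, name in enumerate(order)}

def d_gaps(posting, ids):
    ids_sorted = sorted(ids[d] for d in posting)
    return [b - a for a, b in zip([0] + ids_sorted, ids_sorted)]

docs = {"a": {"db", "index"}, "b": {"db", "index", "scan"},
        "c": {"vision", "cnn"}, "d": {"vision", "cnn", "gpu"}}
ids = greedy_nn_order(docs)
print(ids)                                     # similar documents get adjacent ids
print(d_gaps({"a", "b"}, ids), d_gaps({"c", "d"}, ids))   # small gaps compress well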
many techniques for synthesizing digital hardware from like languages have been proposed but none have emerged as successful as verilog or vhdl for register transfer level design this paper looks at two of the fundamental challenges concurrency and timing control
module extraction methods have proved to be effective in improving the performance of some ontology reasoning tasks including finding justifications to explain why an entailment holds in an owl dl ontology however the existing module extraction methods that compute syntactic locality based module for the sub concept in subsumption entailment though ensuring the resulting module to preserve all justifications of the entailment may be insufficient in improving the performance of finding all justifications this is because syntactic locality based module is independent of the super concept in subsumption entailment and always contains all concept role assertions in order to extract smaller modules to further optimize finding all justifications in an owl dl ontology we propose goal directed method for extracting module that preserves all justifications of given entailment experimental results on large ontologies show that module extracted by our method is smaller than the corresponding syntactic locality based module making the subsequent computation of all justifications more scalable and more efficient
in recent years number of business reasons have caused software development to become increasingly distributed remote development of software offers several advantages but it is also fraught with challenges in this paper we report on our study of distributed software development that helped shape research agenda for this field our study has identified four areas where important research questions need to be addressed to make distributed development more effective these areas are collaborative software tools knowledge acquisition and management testing in distributed set up and process and metrics issues we present brief summary of related research in each of these areas and also outline open research issues
with advances in computing and communication technologies in recent years two significant trends have emerged in terms of information management heterogeneity and distribution heterogeneity herein discussed in terms of different types of data not in terms of schematic heterogeneity pertains to information use evolving from operational business data eg accounting payroll and inventory to digital assets communications and content eg documents intellectual property rich media mail and web data information has also become widely distributed both in scale and ownership to manage heterogeneity two major classes of systems have evolved database management systems to manage structured data and content management systems to manage document and rich media information in this paper we compare and contrast these different paradigms we believe it is imperative for any business to exploit value from all information independent of where it resides or its form we also identify the technical challenges and opportunities for bringing these different paradigms closer together
the increased functionality of epc class gen epcgen is making this standard the de facto specification for inexpensive tags in the rfid industry epcgen supports only very basic security tools such as bit pseudo random number generator and bit cyclic redundancy code recently two epcgen compliant protocols that address security issues were proposed in the literature in this paper we analyze these protocols and show that they are not secure and subject to replay impersonation and synchronization attacks we then consider the general issue of supporting security in epcgen compliant protocols and propose two rfid protocols that are secure within the restricted constraints of this standard and an anonymous rfid mutual authentication protocol with forward secrecy that is compliant with the epc class gen standard
wireless sensor networks wsns are emerging as essential and popular ways of providing pervasive computing environments for various applications in all these environments energy constraint is the most critical problem that must be considered clustering is introduced to wsns because of its network scalability energy saving attributes and network topology stabilities however there also exist some disadvantages associated with individual clustering scheme such as additional overheads during cluster head ch selection assignment and cluster construction process in this paper we discuss and compare several aspects and characteristics of some widely explored clustering algorithms in wsns eg clustering timings attributes metrics advantages and disadvantages of corresponding clustering algorithms this paper also presents discussion on the future research topics and the challenges of clustering in wsns
some compilation systems such as offline partial evaluators and selective dynamic compilation systems support staged optimizations staged optimization is one where logically single optimization is broken up into stages with the early stage performing preplanning set up work given any available partial knowledge about the program to be compiled and the final stage completing the optimization the final stage can be much faster than the original optimization by having much of its work performed by the early stages key limitation of current staged optimizers is that they are written by hand sometimes in an ad hoc manner we have developed framework called the staged compilation framework scf for systematically and automatically converting single stage optimizations into staged versions the framework is based on combination of aggressive partial evaluation and dead assignment elimination we have implemented scf in standard ml preliminary evaluation shows that scf can speed up classical optimization of some commonly used functions by up to times and typically between times and times
the problem of finding dense structures in given graph is quite basic in informatics including data mining and data engineering clique is popular model to represent dense structures and widely used because of its simplicity and ease in handling pseudo cliques are natural extension of cliques which are subgraphs obtained by removing small number of edges from cliques we here define pseudo clique by subgraph such that the ratio of the number of its edges compared to that of the clique with the same number of vertices is no less than given threshold value in this paper we address the problem of enumerating all pseudo cliques for given graph and threshold value we first show that it seems to be difficult to obtain polynomial time algorithms using straightforward divide and conquer approaches then we propose polynomial time polynomial delay in precise algorithm based on reverse search we show the efficiency of our algorithm in practice by computational experiments
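A small sketch of the density test in the pseudo-clique definition given above: a vertex subset S qualifies when the number of edges inside S divided by the number of edges of a clique on |S| vertices is at least the threshold. The enumeration algorithm itself (reverse search) is not reproduced here.

def is_pseudo_clique(edges, vertices, theta):
    vs = set(vertices)
    if len(vs) < 2:
        return True
    internal = sum(1 for u, v in edges if u in vs and v in vs)
    possible = len(vs) * (len(vs) - 1) // 2          # edges of the complete graph on |S|
    return internal / possible >= theta

edges = [(1, 2), (2, 3), (3, 1), (3, 4)]             # a triangle plus a pendant vertex
print(is_pseudo_clique(edges, {1, 2, 3}, 0.9))       # True: density 1.0
print(is_pseudo_clique(edges, {1, 2, 3, 4}, 0.9))    # False: density 4/6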
we investigate performance characteristics of secure group communication systems gcss in mobile ad hoc networks that employ intrusion detection techniques for dealing with insider attacks tightly coupled with rekeying techniques for dealing with outsider attacks the objective is to identify optimal settings including the best intrusion detection interval and the best batch rekey interval under which the system lifetime mean time to security failure is maximized while satisfying performance requirements we develop mathematical model based on stochastic petri net to analyze tradeoffs between security and performance properties when given set of parameter values characterizing operational and environmental conditions of gcs instrumented with intrusion detection tightly coupled with batch rekeying we compare our design with baseline system using intrusion detection integrated with individual rekeying to demonstrate the effectiveness
name ambiguity is special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people in this paper we focus on the problem of disambiguating person names within web pages and scientific documents we present an efficient and effective two stage approach to disambiguate names in the first stage two novel topic based models are proposed by extending two hierarchical bayesian text models namely probabilistic latent semantic analysis plsa and latent dirichlet allocation lda our models explicitly introduce new variable for persons and learn the distribution of topics with regard to persons and words after learning an initial model the topic distributions are treated as feature sets and names are disambiguated by leveraging hierarchical agglomerative clustering method experiments on web data and scientific documents from citeseer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and dbscan clustering and could be extended to other research fields we empirically addressed the issue of scalability by disambiguating authors in over papers from the entire citeseer dataset
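A hedged sketch of the second stage described above: once a topic model has produced a topic distribution per name mention, mentions are grouped by hierarchical agglomerative clustering. The L1 distance, average linkage, stopping threshold, and toy topic vectors are illustrative choices, not the paper's exact settings.

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def hac(vectors, threshold=0.5):
    clusters = [[i] for i in range(len(vectors))]
    def dist(c1, c2):                       # average linkage between two clusters
        return sum(l1(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))
    while len(clusters) > 1:
        (i, j), d = min((((a, b), dist(clusters[a], clusters[b]))
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters))),
                        key=lambda t: t[1])
        if d > threshold:                   # stop when the closest pair is too far apart
            break
        clusters[i] += clusters.pop(j)
    return clusters

# three mentions of the same name string: two share a topic profile, one does not
topics = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(hac(topics))   # -> [[0, 1], [2]]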
in this paper we present reconfigurable routing algorithm for mesh network on chip noc dedicated to fault tolerant massively parallel multi processors systems on chip mp soc the routing algorithm can be dynamically reconfigured to adapt to the modification of the micro network topology caused by faulty router this algorithm has been implemented in reconfigurable version of the dspin micro network and evaluated from the point of view of performance penalty on the network saturation threshold and cost extra silicon area occupied by the reconfigurable version of the router
current research activities in the field of deinterlacing include the selection of suitable deinterlacing methods and the estimation of the exact value of missing line this paper proposes spatio temporal domain fuzzy rough sets rule for selecting deinterlacing method that is suitable for regions with high motion or frequent scene changes the proposed algorithm consists of two parts the first part is fuzzy rule based edge direction detection with an edge preserving part that utilizes fuzzy theory to find the most accurate edge direction and interpolates the missing pixels using the introduced gradients in the interpolation the vertical resolution in the deinterlaced image is subjectively concealed the second part of the proposed algorithm is rough sets assisted optimization which selects the most suitable of five different deinterlacing methods and successively builds approximations of the deinterlaced sequence moreover this approach employs size reduction of the database system keeping only the information essential for the process the proposed algorithm is intended not only to be fast but also to reduce deinterlacing artifacts
in this paper we report an empirical study of the photographic portrayal of family members at home adopting social psychological approach and focusing on intergenerational power dynamics our research explores the use of domestic photo displays in family representation parents and their teenagers from eight families in the south of england were interviewed at home about their interpretations of both stored and displayed photos within the home discussions centred on particular photographs found by the participants to portray self and family in different ways the findings show that public displays of digital photos are still curated by mothers of the households but with more difficulty and less control than with analogue photos in addition teenagers both contribute and comply with this curation within the home whilst at the same time developing additional ways of presenting their families and themselves online that are unsupervised by the curator we highlight the conflict of interest that is at play within teen and parent practices and consider the challenges that this presents for supporting the representation of family through the design of photo display technology
much attention has recently been focused on the problem of effectively developing software systems that meet their non functional requirements nfrs architectural frameworks have been proposed as solution to support the design and analysis of nfrs such as performance security adaptability etc the significant benefits of such work include detecting and removing defects earlier reducing development time cost and improving the quality the formal design analysis framework fdaf is an aspect oriented approach that supports the automated translation of extended unified modeling language designs for distributed real time systems into existing formal notations including architecture description languages rapide and armani the analysis of the formalized design is achieved using existing tool support for the formal methods which leverages large body of work in the research community currently fdaf supports the design and analysis of response time and resource utilization performance sub aspects this paper presents the algorithms for translating extended uml diagrams into armani the proofs of correctness of the algorithms and an illustration of the fdaf approach by using the domain name system the armani performance analysis results can provide architects with information indicating whether or not overloaded components exist in the design if such component exists then the architect iteratively refines the uml architecture to meet the clients requirements
this article describes an approach for verifying programs in the presence of data abstraction and information hiding which are key features of modern programming languages with objects and modules this article draws on our experience building and using an automatic program checker and focuses on the property of modular soundness that is the property that the separate verifications of the individual modules of program suffice to ensure the correctness of the composite program we found this desirable property surprisingly difficult to achieve key feature of our methodology for modular soundness is new specification construct the abstraction dependency which reveals which concrete variables appear in the representation of given abstract variable without revealing the abstraction function itself this article discusses in detail two varieties of abstraction dependencies static and dynamic the article also presents new technical definition of modular soundness as monotonicity property of verifiability with respect to scope and uses this technical definition to formally prove the modular soundness of programming discipline for static dependencies
convergent scheduling is general framework for cluster assignment and instruction scheduling on spatial architectures convergent scheduler is composed of independent passes each implementing heuristic that addresses particular problem or constraint the passes share simple common interface that provides spatial and temporal preference for each instruction preferences are not absolute instead the interface allows pass to express the confidence of its preferences as well as preferences for multiple space and time slots pass operates by modifying these preferences by applying series of passes that address all the relevant constraints the convergent scheduler can produce schedule that satisfies all the important constraints because all passes are independent and need to understand only one interface to interact with each other convergent scheduling simplifies the problem of handling multiple constraints and codeveloping different heuristics we have applied convergent scheduling to two spatial architectures the raw processor and clustered vliw machine it is able to successfully handle traditional constraints such as parallelism load balancing and communication minimization as well as constraints due to preplaced instructions which are instructions with predetermined cluster assignment convergent scheduling is able to obtain an average performance improvement of over the existing space time scheduler of the raw processor and an improvement of over state of the art assignment and scheduling techniques on clustered vliw architecture
this paper proposes model comparison algorithm based on model descriptor spatial structure circular descriptor sscd the spatial structure is important in content based model analysis within the sscd the spatial structure of model is described by images and the attribute values of each pixel represent spatial information hence sscd can preserve the global spatial structure of models and is invariant to rotation and scaling in addition by using images to describe the spatial information of models all spatial information of the models can be represented by sscd without redundancy thus sscd can be applied to many scenarios which utilize spatial information in this paper an sscd based model comparison algorithm is presented the proposed algorithm has been tested on model retrieval experiments experimental results demonstrate the effectiveness of the proposed algorithm
the np hard max cover problem requires selecting sets from collection so as to maximize the size of the union this classic problem occurs commonly in many settings in web search and advertising for moderately sized instances greedy algorithm gives an approximation of however the greedy algorithm requires updating scores of arbitrary elements after each step and hence becomes intractable for large datasets we give the first max cover algorithm designed for today’s large scale commodity clusters our algorithm has provably almost the same approximation as greedy but runs much faster furthermore it can be easily expressed in the mapreduce programming paradigm and requires only polylogarithmically many passes over the data our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm
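A compact sketch of the sequential greedy baseline the abstract compares against: repeatedly pick the set that covers the most still-uncovered elements. The distributed MapReduce variant in the paper approximates this behaviour at scale; the collection below is a toy example.

def greedy_max_cover(sets, k):
    """sets: dict name -> set of elements; pick k sets maximising the covered union."""
    covered, chosen = set(), []
    for _ in range(k):
        name = max(sets, key=lambda s: len(sets[s] - covered))
        if not sets[name] - covered:
            break                      # nothing left to gain
        chosen.append(name)
        covered |= sets[name]
    return chosen, covered

collection = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}, "s4": {1, 5}}
print(greedy_max_cover(collection, 2))   # -> (['s3', 's1'], {1, 2, 3, 4, 5, 6, 7})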
we consider recovery from malicious but committed transactions traditional recovery mechanisms do not address this problem except for complete rollbacks which undo the work of good transactions as well as malicious ones and compensating transactions whose utility depends on application semantics we develop an algorithm that rewrites execution histories for the purpose of backing out malicious transactions good transactions that are affected directly or indirectly by malicious transactions complicate the process of backing out undesirable transactions we show that the prefix of rewritten history produced by the algorithm serializes exactly the set of unaffected good transactions the suffix of the rewritten history includes special state information to describe affected good transactions as well as malicious transactions we describe techniques that can extract additional good transactions from this latter part of rewritten history the latter processing saves more good transactions than is possible with dependency graph based approach to recovery
the proliferation of affordable mobile devices with processing and sensing capabilities together with the rapid growth in ubiquitous network connectivity herald an era of mobiscopes networked sensing applications that rely on multiple mobile sensors to accomplish global tasks these distributed sensing systems extend the traditional sensor network model introducing challenges in data management data integrity privacy and network system design although several existing applications fit this description they provide tailored one time solutions to what essentially is the same set of problems it’s time to work toward general architecture that identifies common challenges and provides general methodology for the design of future mobiscopes toward that end this article surveys variety of current and emerging mobile networked sensing applications articulates their common challenges and provides architectural guidelines and design directions for this important category of emerging distributed sensing systems this article is part of special issue on building sensor rich world
current industry standards for describing web services focus on ensuring interoperability across diverse platforms but do not provide good foundation for automating the use of web services representational techniques being developed for the semantic web can be used to augment these standards the resulting web service specifications enable the development of software programs that can interpret descriptions of unfamiliar web services and then employ those services to satisfy user goals owl-s owl for services is set of notations for expressing such specifications based on the semantic web ontology language owl it consists of three interrelated parts profile ontology used to describe what the service does process ontology and corresponding presentation syntax used to describe how the service is used and grounding ontology used to describe how to interact with the service owl-s can be used to automate variety of service related activities involving service discovery interoperation and composition large body of research on owl-s has led to the creation of many open source tools for developing reasoning about and dynamically utilizing web services
we present the application of feature mining techniques to the developmental therapeutics program’s aids antiviral screen database the database consists of compounds which were measured for their capability to protect human cells from hiv infection according to these measurements the compounds were classified as either active moderately active or inactive the distribution of classes is extremely skewed only of the molecules is known to be active and is known to be moderately active given this database we were interested in molecular substructures ie features that are frequent in the active molecules and infrequent in the inactives in data mining terms we focused on features with minimum support in active compounds and maximum support in inactive compounds we analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system molfea molecular feature miner within this framework it is possible to declaratively specify the features of interest such as the frequency of features on possibly different datasets as well as on the generality and syntax of them assuming that the detected substructures are causally related to biochemical mechanisms it should be possible to facilitate the development of new pharmaceuticals with improved activities
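A toy sketch of the levelwise query constraint described above: keep only those substructure features whose support is at least a minimum among active compounds and at most a maximum among inactives. Feature extraction from molecules is outside the scope of the sketch, and the thresholds and compound sets are invented for illustration.

def support(feature, molecules):
    return sum(feature in m for m in molecules) / len(molecules)

def interesting_features(features, actives, inactives, min_act=0.3, max_inact=0.05):
    return [f for f in features
            if support(f, actives) >= min_act and support(f, inactives) <= max_inact]

actives = [{"azido", "thymine"}, {"azido", "ribose"}, {"phenyl"}]
inactives = [{"phenyl"}, {"ribose"}, {"phenyl", "ether"}]
print(interesting_features(["azido", "phenyl"], actives, inactives))  # -> ['azido']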
we examine the use of policy driven data placement services to improve the performance of data intensive petascale applications in high performance distributed computing environments in particular we are interested in using an asynchronous data placement service to stage data in and out of application workflows efficiently as well as to distribute and replicate data according to virtual organization policies we propose data placement service architecture and describe our implementation of one layer of this architecture which provides efficient priority based bulk data transfers
we propose novel method for modular verification of web service compositions we first use symbolic fixpoint computations to derive conditions on the incoming messages and relations among the incoming and outgoing messages of individual bpel web services these pre and post conditions are accumulated and serve as repository of summarizations of individual web services we then compose the summaries of the invoked bpel services to model external invocations resulting in scalable verification approach for web service compositions our technical contributions include an efficient symbolic encoding for modeling the concurrency semantics of systems having both multi threading and message passing and scalable method for summarizing concurrent processes that interact with each other using synchronous message passing along with modular framework that utilizes these summaries for scalable verification
this paper proposes person independent facial expression space pifes to analyze and synthesize facial expressions based on supervised locality preserving projections slpp which aligns different subjects and different intensities of facial expressions on one generalized expression manifold interactive curves of different patterns are generated according to the input facial expression image sequence and target responsive expression images are synthesized for different emotions in order to synthesize arbitrary expressions for new person with natural details novel approach based on local geometry preserving between the input face image and the target expression image is proposed experimental results clearly demonstrate the efficiency of the proposed algorithm
this paper presents partial order reduction algorithm called twophase that generates significantly reduced state space on large class of practical protocols over alternative algorithms in its class the reduced state space generated by twophase preserves all ctl* assertions twophase achieves this reduction by following an alternative implementation of the proviso step in particular twophase avoids the in stack check that other tools use in order to realize the proviso step in this paper we demonstrate that the in stack check is inefficient in practice and demonstrate much simpler alternative method of realizing the proviso twophase can be easily combined with an on the fly model checking algorithm to reduce memory requirements further simple but powerful selective caching scheme can also be easily added to twophase a version of twophase using on the fly model checking and selective caching has been implemented in model checker called pv protocol verifier and is in routine use on large problems pv accepts proper subset of promela and never automaton expressing the ltl assertion to be verified pv has helped us complete full state space search several orders of magnitude faster than all alternative tools available in its class on dozens of real protocols pv has helped us detect bugs in real distributed shared memory cache coherency protocols that were missed during incomplete search using alternate tools
multicarrier communication is promising technique to effectively deliver high data rate and combat delay spread over fading channel and adaptability is an inherent advantage of multicarrier communication systems it can be implemented in online data streams this paper addresses significant problem in multicarrier networks that arises in data streaming scenarios namely today’s data mining is ill equipped to handle data streams effectively and pays little attention to the network stability and the fast response http://www-db.stanford.edu/stream furthermore in analysis of massive data streams the ability to process the data in single pass while using little memory is crucial for often the data can be transmitted faster than it can be stored or accessed from disks to address the question we present an adaptive control theoretic explicit rate er online data mining control algorithm odmca to regulate the sending rate of mined data which accounts for the main memory occupancies of terminal nodes this single pass scheme considers limited memory space to process dynamic data streams and also explores the adaptive capability which is employed in general network computation model for dynamic data streams the proposed method uses distributed proportional integrative plus derivative pid controller combined with data mining where the control parameters can be designed to ensure the stability of the control loop in terms of sending rate of mined data the basic pid approach for the computation network transmission is presented and transformation and schur cohn stability test are used to achieve the stability criterion which ensures the bounded rate allocation without steady state oscillation we further show how the odmca can be used to design controller analyze the theoretical aspects of the proposed algorithm and verify its agreement with the simulations in the lan case and the wan case simulation results show the efficiency of our scheme in terms of high main memory occupancy fast response of the main memory occupancy and of the controlled sending rates
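A minimal discrete PID sketch of the rate-control idea in the abstract: the sending rate of mined data is adjusted from the error between a target main memory occupancy and the measured one. The gains, the target of 0.7, and the one-line buffer model are invented stand-ins; the paper derives its own stability conditions, which are not reproduced here.

class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd, self.setpoint = kp, ki, kd, setpoint
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, measured, dt=1.0):
        err = self.setpoint - measured
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

ctrl = PID(kp=0.5, ki=0.1, kd=0.05, setpoint=0.7)    # target 70% buffer occupancy
rate, occupancy = 10.0, 0.4
for _ in range(5):
    rate = max(0.0, rate + ctrl.update(occupancy))   # controller nudges the sending rate
    occupancy = min(1.0, occupancy + 0.02 * rate)    # crude stand-in for the receiving buffer
    print(round(rate, 2), round(occupancy, 2))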
this paper investigates mobile wireless sensor actuator network application for use in the cattle breeding industry our goal is to prevent fighting between bulls in on farm breeding paddocks by autonomously applying appropriate stimuli when one bull approaches another bull this is an important application because fighting between high value animals such as bulls during breeding seasons causes significant financial loss to producers furthermore there are significant challenges in this type of application because it requires dynamic animal state estimation real time actuation and efficient mobile wireless transmissions we designed and implemented an animal state estimation algorithm based on state machine mechanism for each animal autonomous actuation is performed based on the estimated states of an animal relative to other animals simple yet effective wireless communication model has been proposed and implemented to achieve high delivery rates in mobile environments we evaluated the performance of our design by both simulations and field experiments which demonstrated the effectiveness of our autonomous animal control system
high performance computing is facing an exponential growth in job output dataset sizes this implies significant commitment of supercomputing center resources most notably precious scratch space in handling data staging and offloading however the scratch area is typically managed using simple purge policies without sophisticated end user data services that are required to balance center’s resource consumption and user serviceability end user data services such as offloading are performed using point to point transfers that are unable to reconcile center’s purge and users delivery deadlines unable to adapt to changing dynamics in the end to end data path and are not fault tolerant we propose robust framework for the timely decentralized offload of result data addressing the aforementioned significant gaps in extant direct transfer based offloading the decentralized offload is achieved using an overlay of user specified intermediate nodes and well known landmark nodes these nodes serve as means both to provide multiple data flow paths thereby maximizing bandwidth as well as provide fail over capabilities for the offload we have implemented our techniques within production job scheduler pbs and data transfer tool bittorrent and our evaluation shows that the offloading times can be significantly reduced for gb file while also meeting center user service level agreements
the parameterized verification of concurrent algorithms and protocols has been addressed by variety of recent methods experience shows that there is trade off between techniques which are widely applicable but depend on nontrivial human guidance and fully automated approaches which are tailored for narrow classes of applications in this spectrum we propose new framework based on environment abstraction which exhibits large degree of automation and can be easily adjusted to different fields of application our approach is based on two insights first we argue that natural abstractions for concurrent software are derived from the ptolemaic perspective of human engineer who focuses on single reference process for this class of abstractions we demonstrate soundness of abstraction under very general assumptions second most protocols in given class of protocols for instance cache coherence protocols and mutual exclusion protocols can be modeled by small sets of compound statements these two insights allow us to efficiently build precise abstract models for given protocols which can then be model checked we demonstrate the power of our method by applying it to various well known classes of protocols
network accountability forensic analysis and failure diagnosis are becoming increasingly important for network management and security such capabilities often utilize network provenance the ability to issue queries over network meta data for example network provenance may be used to trace the path message traverses on the network as well as to determine how message data were derived and which parties were involved in its derivation this paper presents the design and implementation of exspan generic and extensible framework that achieves efficient network provenance in distributed environment we utilize the database notion of data provenance to explain the existence of any network state providing versatile mechanism for network provenance to achieve such flexibility at internet scale exspan uses declarative networking in which network protocols can be modeled as continuous queries over distributed streams and specified concisely in declarative query language we extend existing data models for provenance developed in database literature to enable distribution at internet scale and investigate numerous optimization techniques to maintain and query distributed network provenance efficiently the exspan prototype is developed using rapidnet declarative networking platform based on the emerging ns toolkit experiments over simulated network and an actual deployment in testbed environment demonstrate that our system supports wide range of distributed provenance computations efficiently resulting in significant reductions in bandwidth costs compared to traditional approaches
reduction variables are an important class of cross thread dependence that can be parallelized by exploiting the associativity and commutativity of their operation in this paper we define class of shared variables called partial reduction variables prv these variables either cannot be proven to be reductions or they violate the requirements of reduction variable in some way we describe an algorithm that allows the compiler to detect prvs and we also discuss the necessary requirements to parallelize detected prvs based on these requirements we propose an implementation in tls system to parallelize prvs that works by combination of techniques at compile time and in the hardware the compiler transforms the variable under the assumption that the reduction like behavior proven statically will hold true at runtime however if thread reads or updates the shared variable as result of an alias or unlikely control path lightweight hardware mechanism will detect the access and synchronize it to ensure correct execution we implement our compiler analysis and transformation in gcc and analyze its potential on the spec cpu benchmarkswe find that supporting prvs provides up to performance gain over highly optimized tls system and on average performance improvement
we explore in this paper an effective sliding window filtering abbreviatedly as swf algorithm for incremental mining of association rules in essence by partitioning transaction database into several partitions algorithm swf employs filtering threshold in each partition to deal with the candidate itemset generation under swf the cumulative information of mining previous partitions is selectively carried over toward the generation of candidate itemsets for the subsequent partitions algorithm swf not only significantly reduces i/o and cpu cost by the concepts of cumulative filtering and scan reduction techniques but also effectively controls memory utilization by the technique of sliding window partition algorithm swf is particularly powerful for efficient incremental mining for an ongoing time variant transaction database by utilizing proper scan reduction techniques only one scan of the incremented dataset is needed by algorithm swf the cost of swf is in orders of magnitude smaller than those required by prior methods thus resolving the performance bottleneck experimental studies are performed to evaluate performance of algorithm swf it is noted that the improvement achieved by algorithm swf is even more prominent as the incremented portion of the dataset increases and also as the size of the database increases
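A simplified sketch of the partition-and-filter idea behind the algorithm: each partition contributes only those 2-itemsets that are locally frequent, the surviving candidates are accumulated, and a single scan of the full data verifies them. This omits the per-candidate start-partition bookkeeping and window-sliding details of the real algorithm; the threshold and transactions are illustrative.

from itertools import combinations
from collections import Counter

def local_candidates(partition, min_sup):
    counts = Counter(pair for t in partition for pair in combinations(sorted(t), 2))
    need = min_sup * len(partition)
    return {pair for pair, c in counts.items() if c >= need}

def swf_like(partitions, min_sup):
    candidates = set()
    for p in partitions:                            # filtering pass, partition by partition
        candidates |= local_candidates(p, min_sup)
    all_txns = [t for p in partitions for t in p]   # one verification scan over all data
    counts = Counter(pair for t in all_txns for pair in combinations(sorted(t), 2))
    need = min_sup * len(all_txns)
    return {pair for pair in candidates if counts[pair] >= need}

parts = [[{"a", "b", "c"}, {"a", "b"}],
         [{"a", "b"}, {"b", "c"}],
         [{"a", "c"}, {"a", "b", "c"}]]
print(swf_like(parts, min_sup=0.5))   # frequent 2-itemsets over all partitions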
there is controversy as to whether explicit support for pddl like axioms and derived predicates is needed for planners to handle real world domains effectively many researchers have deplored the lack of precise semantics for such axioms while others have argued that it might be best to compile them away we propose an adequate semantics for pddl axioms and show that they are an essential feature by proving that it is impossible to compile them away if we restrict the growth of plans and domain descriptions to be polynomial these results suggest that adding reasonable implementation to handle axioms inside the planner is beneficial for the performance our experiments confirm this suggestion
this chapter overviews the scope goals and timeline of the modeling contest cocome it also describes the input the competing teams received and furthermore explains how the peer reviewing process went ahead and how the evaluation criteria were set with the aim to balance the inherently heterogeneous modeling and expressive power of different component models
focusing on travel videos taken in uncontrolled environments and by amateur photographers we exploit correlation between different modalities to facilitate effective travel video scene detection scenes in travel photos ie content taken at the same scenic spot can be easily determined by examining time information for travel video we extract several keyframes for each video shot then photos and keyframes are represented as sequence of visual word histograms respectively based on this representation we transform scene detection into sequence matching problem after finding the best alignment between two sequences we can determine scene boundaries in videos with the help of that in photos we demonstrate that on average we achieve purity value of if the proposed method is combined with conventional ones we show that not only features of visual words aid in scene detection but also cross media correlation does
much useful information in news reports is often that which is surprising or unexpected in other words we harbour many expectations about the world and when any of these expectations are violated ie made inconsistent by news we have strong indicator of some information that is interesting for us in this paper we present framework for identifying interesting information in news reports by finding interesting inconsistencies an implemented system based on this framework accepts structured news reports as inputs translates each report to logical literal identifies the story of which the report is part looks for inconsistencies between the report the background knowledge and set of expectations classifies and evaluates those inconsistencies and outputs news reports of interest to the user together with associated explanations of why they are interesting
software regression testing occurs continuously during the software development process in order to detect faults as early as possible growing size of test suites on one hand and resource constraints on the other hand necessitates the test case prioritization process test case prioritization techniques schedule test cases for regression testing in an order that increases the chances of early detection of faults some prior techniques used the notion of history based test case prioritization in this paper we present new approach for prioritization using historical test case performance data which considers time and resource constraints this approach directly calculates the priority of each test case using historical information from the previous executions of the test case the results of applying our approach to siemens suite and space program are also presented our results present interesting insights into the effectiveness of the proposed approach in terms of faster fault detection
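A hedged sketch of history-based prioritisation in the spirit of the abstract: each test case gets a score combining its past fault-detection record, how long it has waited since its last run, and its execution cost. The weights, the field names, and the formula are illustrative assumptions, not the paper's own priority equation.

def priority(test, w_fault=0.6, w_age=0.3, w_cost=0.1):
    fault_rate = test["faults_found"] / max(1, test["times_run"])
    age = test["sessions_since_last_run"]
    cost = 1.0 / max(1e-9, test["exec_time"])        # cheaper tests rank slightly higher
    return w_fault * fault_rate + w_age * age + w_cost * cost

suite = [
    {"name": "t1", "faults_found": 3, "times_run": 10, "sessions_since_last_run": 1, "exec_time": 2.0},
    {"name": "t2", "faults_found": 0, "times_run": 10, "sessions_since_last_run": 4, "exec_time": 0.5},
    {"name": "t3", "faults_found": 5, "times_run": 10, "sessions_since_last_run": 0, "exec_time": 5.0},
]
for t in sorted(suite, key=priority, reverse=True):  # schedule order for the next session
    print(t["name"], round(priority(t), 2))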
this article presents new adaptive texture model locally parallel oscillating patterns are modeled with weighted hilbert space defined over local fourier coefficients the weights on the local fourier atoms are optimized to match the local orientation and frequency of the texture we propose an adaptive method to decompose an image into cartoon layer and locally parallel texture layer using this model and total variation cartoon model this decomposition method is then used to denoise an image containing oscillating patterns finally we show how to take advantage of such separation framework to simultaneously inpaint the structure and texture components of an image with missing parts numerical results show that our method improves state of the art algorithms for directional and complex textures
we present secureblox declarative system that unifies distributed query processor with security policy framework secureblox decouples security concerns from system specification allowing easy reconfiguration of system’s security properties to suit given execution environment our implementation of secureblox is series of extensions to logicblox an emerging commercial datalog based platform for enterprise software systems secureblox enhances logicblox to enable distribution and static meta programmability and makes novel use of existing logicblox features such as integrity constraints secureblox allows meta programmability via bloxgenerics language extension for compile time code generation based on the security requirements and trust policies of the deployed environment we present and evaluate detailed use cases in which secureblox enables diverse applications including an authenticated declarative routing protocol with encrypted advertisements and an authenticated and encrypted parallel hash join operation our results demonstrate secureblox’s abilities to specify and implement wide range of different security constructs for distributed systems as well as to enable tradeoffs between performance and security
which active learning methods can we expect to yield good performance in learning binary and multi category logistic regression classifiers addressing this question is natural first step in providing robust solutions for active learning across wide variety of exponential models including maximum entropy generalized linear log linear and conditional random field models for the logistic regression model we re derive the variance reduction method known in experimental design circles as optimality we then run comparisons against different variations of the most widely used heuristic schemes query by committee and uncertainty sampling to discover which methods work best for different classes of problems and why we find that among the strategies tested the experimental design methods are most likely to match or beat random sample baseline the heuristic alternatives produced mixed results with an uncertainty sampling variant called margin sampling and derivative method called qbb mm providing the most promising performance at very low computational cost computational running times of the experimental design methods were bottleneck to the evaluations meanwhile evaluation of the heuristic methods lead to an accumulation of negative results we explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied the results demonstrate need for improved active learning methods that will provide reliable performance at reasonable computational cost
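A brief sketch of the margin-sampling heuristic mentioned above: query the unlabelled example whose top two predicted class probabilities are closest. The toy probability table stands in for a trained multi-class logistic regression model.

def margin(probs):
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]

def pick_query(unlabelled):
    """unlabelled: dict example_id -> list of predicted class probabilities."""
    return min(unlabelled, key=lambda x: margin(unlabelled[x]))

pool = {"x1": [0.90, 0.05, 0.05],    # confident prediction, uninformative to label
        "x2": [0.40, 0.38, 0.22],    # narrow margin: the one to label next
        "x3": [0.60, 0.30, 0.10]}
print(pick_query(pool))   # -> 'x2'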
the miniaturization of transistors in recent technology nodes requires tremendous back end tuning and optimizations making bug fixing at later design stages more expensive therefore it is imperative to find design bugs as early as possible the first defense against bugs is block level testing performed by designers and constrained random simulation is the prevalent method however this method may miss corner case scenarios in this paper we propose an innovative methodology that reuses existing constrained random testbenches for formal bug hunting to support the methodology we present several techniques to enhance rtl symbolic simulation and integrate state of the art word level and boolean level verification techniques into common framework called bughunter from case studies dlx alpha and fir bughunter found more bugs than constrained random simulation using fewer cycles including four new bugs in the verified design previously unknown to the designer the results demonstrate that the proposed techniques provide flexible scalable and robust solution for bug hunting
extensible markup language xml has emerged as medium for interoperability over the internet as the number of documents published in the form of xml is increasing there is need for selective dissemination of xml documents based on user interests in the proposed technique combination of adaptive genetic algorithms and multi class support vector machine svm is used to learn user model based on the feedback from the users the system automatically adapts to the user’s preference and interests the user model and similarity metric are used for selective dissemination of continuous stream of xml documents experimental evaluations performed over wide range of xml documents indicate that the proposed approach significantly improves the performance of the selective dissemination task with respect to accuracy and efficiency
program query languages and pattern detection techniques are an essential part of program analysis and manipulation systems queries and patterns permit the identification of the parts of interest in program’s implementation through representation dedicated to the intent of the system eg call graphs to detect behavioral flaws abstract syntax trees for transformations concrete source code to verify programming conventions etc this requires that developers understand and manage all the different representations and techniques in order to detect various patterns of interest to alleviate this overhead we present logic based language that allows the program’s implementation to be queried using concrete source code templates the queries are matched against combination of structural and behavioral program representations including call graphs points to analysis results and abstract syntax trees the result of our approach is that developers can detect patterns in the queried program using source code excerpts embedded in logic queries which act as prototypical samples of the structure and behavior they intend to match
in this paper we address the subject of large multimedia database indexing for content based retrieval we introduce multicurves new scheme for indexing high dimensional descriptors this technique based on the simultaneous use of moderate dimensional space filling curves has as main advantages the ability to handle high dimensional data dimensions and over to allow the easy maintenance of the indexes inclusion and deletion of data and to adapt well to secondary storage thus providing scalability to huge databases millions or even thousands of millions of descriptors we use multicurves to perform the approximate nearest neighbors search with very good compromise between precision and speed the evaluation of multicurves carried out on large databases demonstrates that the strategy compares well to other up to date nearest neighbor search strategies we also test multicurves on the real world application of image identification for cultural institutions in this application which requires the fast search of large amount of local descriptors multicurves allows dramatic speed up in comparison to the brute force strategy of sequential search without any noticeable precision loss
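A rough sketch of the simultaneous-curves idea as I read it: split each high-dimensional descriptor into lower-dimensional subvectors, key each subvector on its own space-filling (Morton / Z-order) curve kept in a sorted list, and answer approximate nearest-neighbour queries by probing a small neighbourhood on every curve and merging the candidates. The curve count, bit depth, probe width and quantised toy vectors are assumptions, not the paper's parameters.

import bisect

def morton_key(coords, bits=8):
    """Interleave the bits of quantised coordinates into one scalar key."""
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * len(coords) + d)
    return key

def build(vectors, n_curves=2):
    step = len(vectors[0]) // n_curves
    return step, [sorted((morton_key(v[c * step:(c + 1) * step]), i)
                         for i, v in enumerate(vectors))
                  for c in range(n_curves)]

def approx_candidates(step, curves, query, probe=1):
    cand = set()
    for c, keyed in enumerate(curves):
        qk = morton_key(query[c * step:(c + 1) * step])
        pos = bisect.bisect_left(keyed, (qk, -1))
        cand.update(i for _, i in keyed[max(0, pos - probe): pos + probe])
    return cand

vecs = [[3, 5, 200, 7], [4, 5, 198, 9], [250, 0, 3, 3], [2, 6, 199, 8]]
step, curves = build(vecs)
print(approx_candidates(step, curves, [3, 6, 201, 7]))  # candidate ids to re-rank by exact distance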
we present disco an asynchronous neighbor discovery and rendezvous protocol that allows two or more nodes to operate their radios at low duty cycles eg and yet still discover and communicate with one another during infrequent opportunistic encounters without requiring any prior synchronization information the key challenge is to operate the radio at low duty cycle but still ensure that discovery is fast reliable and predictable over range of operating conditions disco nodes pick pair of prime numbers such that the sum of their reciprocals is equal to the desired radio duty cycle each node increments local counter with globally fixed period if node’s local counter value is divisible by either of its primes then the node turns on its radio for one period this protocol ensures that two nodes will have some overlapping radio on time within bounded number of periods even if nodes independently set their own duty cycle once neighbor is discovered and its wakeup schedule known rendezvous is just matter of being awake during the neighbor’s next wakeup period for synchronous rendezvous or during an overlapping wake period for asynchronous rendezvous
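A direct sketch of the counter rule quoted above: each node picks two primes whose reciprocals sum to its duty cycle and wakes whenever its counter is divisible by either prime. The primes, offsets and simulation horizon below are illustrative; the point is that two unsynchronised nodes still share awake slots within a bounded window.

def awake(counter, primes):
    return any(counter % p == 0 for p in primes)

node_a = {"primes": (37, 43), "offset": 0}      # roughly 5% duty cycle: 1/37 + 1/43
node_b = {"primes": (23, 29), "offset": 11}     # roughly 8% duty cycle: 1/23 + 1/29

overlaps = [t for t in range(1, 5000)
            if awake(t + node_a["offset"], node_a["primes"])
            and awake(t + node_b["offset"], node_b["primes"])]
print(overlaps[:5])          # first few slots where both radios are on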
in this work we present study about adaptation on mobile museum guide aiming at investigating the relationships between personality traits and the attitudes toward some basic dimensions of adaptivity each participant was exposed to two simulated systems that realized an adaptive and non adaptive version respectively on each of the dimensions investigated the study showed interesting effects of big five personality traits on acceptance of the adaptivity dimensions in particular conscientiousness creativity and stability locus of control seemed to have limited yet quite selective effect on delegating to the system the choice of follow ups
one of the difficulties in high level synthesis and compiler optimization is obtaining good schedule without knowing the exact computation time of the tasks involved the uncertain computation times of these tasks normally occur when conditional instructions are employed and or inputs of the tasks influence the computation time the relationship between these tasks can be represented as data flow graph where each node models the task associated with probabilistic computation time set of edges represents the dependencies between tasks in this research we study scheduling and optimization algorithms taking into account the probabilistic execution times two novel algorithms called probabilistic retiming and probabilistic rotation scheduling are developed for solving the underlying nonresource and resource constrained scheduling problems respectively experimental results show that probabilistic retiming consistently produces graph with smaller longest path computation time for given confidence level as compared with the traditional retiming algorithm that assumes fixed worst case and average case computation times furthermore when considering the resource constraints and probabilistic environments probabilistic rotation scheduling gives schedule whose length is guaranteed to satisfy given probability requirement this schedule is better than schedules produced by other algorithms that consider worst case and average case scenarios
sensors in smart item environments capture data about product conditions and usage to support business decisions as well as production automation processes challenging issue in this application area is the restricted quality of sensor data due to limited sensor precision and sensor failures moreover data stream processing to meet resource constraints in streaming environments introduces additional noise and decreases the data quality in order to avoid wrong business decisions due to dirty data quality characteristics have to be captured processed and provided to the respective business task however the issue of how to efficiently provide applications with information about data quality is still an open research problem in this article we address this problem by presenting flexible model for the propagation and processing of data quality the comprehensive analysis of common data stream processing operators and their impact on data quality allows fruitful data evaluation and diminishes incorrect business decisions further we propose the data quality model control to adapt the data quality granularity to the data stream interestingness
formal analysis of security protocols based on symbolic models has been very successful in finding flaws in published protocols and proving protocols secure using automated tools an important question is whether this kind of formal analysis implies security guarantees in the strong sense of modern cryptography initiated by the seminal work of abadi and rogaway this question has been investigated and numerous positive results showing this so called computational soundness of formal analysis have been obtained however for the case of active adversaries and protocols that use symmetric encryption computational soundness has remained challenge in this paper we show the first general computational soundness result for key exchange protocols with symmetric encryption along the lines of paper by canetti and herzog on protocols with public key encryption more specifically we develop symbolic automatically checkable criterion based on observational equivalence and show that key exchange protocol that satisfies this criterion realizes key exchange functionality in the sense of universal composability our results hold under standard cryptographic assumptions
the most acute information management challenges today stem from organizations relying on large number of diverse interrelated data sources but having no means of managing them in convenient integrated or principled fashion these challenges arise in enterprise and government data management digital libraries smart homes and personal information management we have proposed dataspaces as data management abstraction for these diverse applications and dataspace support platforms dssps as systems that should be built to provide the required services over dataspaces unlike data integration systems dssps do not require full semantic integration of the sources in order to provide useful services this paper lays out specific technical challenges to realizing dssps and ties them to existing work in our field we focus on query answering in dssps the dssp’s ability to introspect on its content and the use of human attention to enhance the semantic relationships in dataspace
flash memory based solid state disks are fast becoming the dominant form of end user storage devices partly even replacing the traditional hard disks existing two level memory hierarchy models fail to realize the full potential of flash based storage devices we propose two new computation models the general flash model and the unit cost model for memory hierarchies involving these devices our models are simple enough for meaningful algorithm design and analysis in particular we show that broad range of existing external memory algorithms and data structures based on the merging paradigm can be adapted efficiently into the unit cost model our experiments show that the theoretical analysis of algorithms on our models corresponds to the empirical behavior of algorithms when using solid state disks as external memory
this paper describes method for transforming any given set of datalog rules into an efficient specialized implementation with guaranteed worst case time and space complexities and for computing the complexities from the rules the running time is optimal in the sense that only useful combinations of facts that lead to all hypotheses of rule being simultaneously true are considered and each such combination is considered exactly once the associated space usage is optimal in that it is the minimum space needed for such consideration modulo scheduling optimizations that may eliminate some summands in the space usage formula the transformation is based on general method for algorithm design that exploits fixed point computation incremental maintenance of invariants and combinations of indexed and linked data structures we apply the method to number of analysis problems some with improved algorithm complexities and all with greatly improved algorithm understanding and greatly simplified complexity analysis
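As a flavour of what the transformation aims at (an illustrative sketch, not the generated code from the paper), the classic rules path(X,Y) :- edge(X,Y) and path(X,Z) :- path(X,Y), edge(Y,Z) can be turned into a worklist-driven fixed-point computation in which each useful combination of facts is joined exactly once and an index on edges by source node makes each join step cheap.

```python
from collections import defaultdict, deque

def transitive_closure(edges):
    edge_by_src = defaultdict(set)           # index: Y -> {Z : edge(Y, Z)}
    for y, z in edges:
        edge_by_src[y].add(z)

    path = set(edges)                        # path(X,Y) :- edge(X,Y).
    work = deque(path)                       # facts not yet joined with edge/2
    while work:
        x, y = work.popleft()
        for z in edge_by_src[y]:             # path(X,Z) :- path(X,Y), edge(Y,Z).
            if (x, z) not in path:           # incremental maintenance of the result set
                path.add((x, z))
                work.append((x, z))
    return path

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```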
this study proposes the anycast based multimedia distribution architectures with application level context aware capability to specify the most suitable server for various application domains the following three architectures namely the identical heterogeneous and semi heterogeneous candidate architectures are specified for different application purposes the identical candidate architecture in which multiple servers provide clients with the same contents is highly reliable and suitable for real time streaming applications ii the heterogeneous candidate architecture in which different servers provide clients with different contents provides better service diversity than the identical candidate architecture and is suitable for non sequential content service iii the semi heterogeneous candidate architecture comprising one main server and multiple proxy servers in which the main server stores the completed contents and proxy servers store some portions of information sessions has the best system performance and works well under high network traffic loads to obtain quick and smooth multimedia distribution the server selection criteria should consider not only the nearest server but also the network traffic loads and the popularity of requested content ie context aware considerations the proposed architectures based on the characteristics of ipv anycast and context aware operations attempt to find the most suitable server proxy finally the system performance of each of the three proposed architectures is analyzed and evaluated and compared with the non anycast architecture simulation results also indicate that the semi heterogeneous architecture is adaptive in face of changing conditions
as computer systems become increasingly mission critical used in life critical situations and relied upon to protect intellectual property operating system reliability is becoming an ever growing concern in the past mission and life critical embedded systems consisted of simple microcontrollers running small amount of software that could be validated using traditional and informal techniques however with the growth of software complexity traditional techniques for ensuring software reliability have not been able to keep up leading to an overall degradation of reliability this paper argues that microkernels are the best approach for delivering truly trustworthy computer systems in the foreseeable future it presents the nicta operating systems research vision centred around the microkernel and based on four core projects the sel project is designing an improved api for secure microkernel verified will produce full formal verification of the microkernel potoroo combines execution time measurements with static analysis to determine the worst case execution profiles of the kernel and camkes provides component architecture for building systems that use the microkernel through close collaboration with open kernel labs nicta spinoff the research output of these projects will make its way into products over the next few years
this paper describes the relationship between trading network and www network from preferential attachment mechanism perspective this mechanism is known to be the underlying principle in the network evolution and has been incorporated to formulate two famous web page ranking algorithms pagerank and hypertext induced topic search hits we point out the differences between trading network and www network from preferential attachment perspective derive the formulation of hits based ranking algorithm for trading network as direct consequence of the differences and apply the same framework when deriving the formulation back to the hits formulation which turns out to be technique to accelerate its convergence
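For reference, the baseline the abstract starts from is the standard HITS power iteration over a directed link set; the sketch below shows only that baseline (the trading-network variant derived in the paper is not reproduced), and the example link set is made up.

```python
def hits(links, iters=50):
    """Standard HITS: alternately update authority and hub scores and normalise."""
    nodes = {n for edge in links for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        auth = {n: sum(hub[u] for u, v in links if v == n) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        hub = {n: sum(auth[v] for u, v in links if u == n) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

links = {("p1", "p2"), ("p1", "p3"), ("p2", "p3")}
print(hits(links))
```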
this paper describes two exact algorithms for the joint problem of object placement and request routing in content distribution network cdn cdn is technology used to efficiently distribute electronic content throughout an existing internet protocol network the problem consists of replicating content on the proxy servers and routing the requests for the content to suitable proxy server in cdn such that the total cost of distribution is minimized an upper bound on end to end object transfer time is also taken into account the problem is formulated as nonlinear integer programming formulation which is linearized in three different ways two algorithms one based on benders decomposition and the other based on lagrangean relaxation and decomposition are described for the solution of the problem computational experiments are conducted by comparing the proposed linearizations and the two algorithms on randomly generated internet topologies
domain specific generators will increasingly rely on graphical languages for declarative specifications of target applications such languages will provide front ends to generators and related tools to produce customized code on demand critical to the success of this approach will be domain specific design wizards tools that guide users in their selection of components for constructing particular applications in this paper we present the containerstore graphical language its generator and design wizard
in this paper we present an image based framework that acquires the reflectance properties of human face range scan of the face is not required based on morphable face model the system estimates the shape and establishes point to point correspondence across images taken from different viewpoints and across different individuals faces this provides common parameterization of all reconstructed surfaces that can be used to compare and transfer brdf data between different faces shape estimation from images compensates deformations of the face during the measurement process such as facial expressions in the common parameterization regions of homogeneous materials on the face surface can be defined priori we apply analytical brdf models to express the reflectance properties of each region and we estimate their parameters in least squares fit from the image data for each of the surface points the diffuse component of the brdf is locally refined which provides high detail we present results for multiple analytical brdf models rendered at novel orientations and lighting conditions
research in biometrics suggests that the time period specific trait is monitored over ie observing speech or handwriting long enough is useful for identification focusing on this aspect this paper presents data mining analysis of the effect of observation time period on user identification based on online user behavior we show that online identification accuracies improve with pooling user data over sessions and present results that quantify the number of sessions needed to identify users at desired accuracy thresholds we discuss potential applications of this for verification of online user identity particularly as part of multi factor authentication methods
we give the first constant factor approximation algorithm for the asymmetric virtual private network vpn problem with arbitrary concave costs we even show the stronger result that there is always tree solution of cost at most opt and that tree solution of expected cost at most opt can be determined in polynomial time for the case of linear cost we obtain for any fixed epsilon an approximation algorithm whose ratio is a constant plus epsilon times the ratio of the outgoing demand to the ingoing demand where the outgoing demand is at least the ingoing demand furthermore we answer an outstanding open question about the complexity status of the so called balanced vpn problem by proving its np hardness
in this paper we present four approaches to providing highly concurrent tree indices in the context of data shipping client server oodbms architecture the first performs all index operations at the server while the other approaches support varying degrees of client caching and usage of index pages we have implemented the four approaches as well as the pl approach in the context of the shore oodb system at wisconsin and we present experimental results from performance study based on running shore on an ibm sp multicomputer our results emphasize the need for non pl approaches and demonstrate the tradeoffs between pl no caching and the three caching alternatives
jit compilation is model of execution which translates at run time critical parts of the program to low level representation typically jit compiler produces machine code from an intermediate bytecode representation this paper considers hardware jit compiler targeting fpgas which are digital circuits configurable as needed to implement application specific circuits recent fpgas in the xilinx virtex family are particularly attractive for hardware jit because they are reconfigurable at run time they contain both cpus and reconfigurable logic and their architecture strikes balance of features in this paper we discuss the design of hardware architecture and compiler able to dynamically enhance the instruction set with hardware specialized instructions prototype system based on the xilinx virtex family supporting hardware jit compilation is described and evaluated
this paper investigates how the integration of agile methods and user centered design ucd is carried out in practice for this study we have applied grounded theory as suitable qualitative approach to determine what is happening in actual practice the data was collected by semi structured interviews with professionals who have already worked with an integrated agile ucd methodology further data was collected by observing these professionals in their working context and by studying their documents where possible the emerging themes that the study found show that there is an increasing realization of the importance of usability in software development among agile team members the requirements are emerging and both low and high fidelity prototypes based usability tests are highly used in agile teams there is an appreciation of each other’s work from both ucd professionals and developers and both sides can learn from each other
major challenge for dealing with multi perspective specifications and more concretely with merging of several descriptions or views is toleration of incompleteness and inconsistency views may be inconclusive and may have conflicts over the concepts being modeled the desire of being able to tolerate both phenomena introduces the need to evaluate and quantify the significance of detected inconsistency as well as to measure the degree of conflict and uncertainty of the merged view as the specification process evolves we show in this paper to what extent disagreement and incompleteness are closely interrelated and play central role to obtain measure of the level of inconsistency and to define merging operator whose aim is getting the model which best reflects the combined knowledge of all stakeholders we will also propose two kinds of interesting and useful orderings among perspectives which are based on differences of behavior and inconsistency respectively
without the mcnc and ispd benchmarks it would arguably not have been possible for the academic community to make consistent advances in physical design over the last decade while still being used extensively in placement and floorplanning research those benchmarks can no longer be considered representative of today’s and tomorrow’s physical design challenges in order to drive physical design research over the next few years new benchmark suite is being released in conjunction with the ispd placement contest these benchmarks are directly derived from industrial asic designs with circuit sizes ranging from thousand to million placeable objects unlike the ispd benchmarks the physical structure of these designs is completely preserved giving realistic challenging designs for today’s placement tools hopefully these benchmarks will help accelerate new physical design research in placement floorplanning and routing
query processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term retrieval time for inverted lists can be greatly reduced by the use of compression but this adds to the cpu time required here we show that the cpu component of query response time for conjunctive boolean queries and for informal ranked queries can be similarly reduced at little cost in terms of storage by the inclusion of an internal index in each compressed inverted list this method has been applied in retrieval system for collection of nearly two million short documents our experimental results show that the self indexing strategy adds less than to the size of the compressed inverted file which itself occupies less than of the indexed text yet can reduce processing time for boolean queries of terms to under one fifth of the previous cost similarly ranked queries of terms can be evaluated in as little as of the previous time with little or no loss of retrieval effectiveness
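A minimal sketch of the self-indexing idea, assuming uncompressed posting lists: a small internal index (here, plain square-root-spaced skip entries) is kept alongside each long list so that conjunctive queries can jump over runs of postings instead of scanning them one by one; the paper's index lives inside the compressed list, and compression is omitted here for brevity.

```python
import math

def build_skips(postings):
    """Sorted posting list plus sqrt-spaced (position, doc id) skip entries."""
    step = max(1, int(math.sqrt(len(postings))))
    return postings, [(i, postings[i]) for i in range(0, len(postings), step)]

def intersect(short_list, long_indexed):
    postings, skips = long_indexed
    result, lo = [], 0
    for doc in short_list:                       # short_list must be sorted
        for pos, value in skips:                 # jump over whole blocks of the long list
            if value <= doc:
                lo = max(lo, pos)
        while lo < len(postings) and postings[lo] < doc:
            lo += 1
        if lo < len(postings) and postings[lo] == doc:
            result.append(doc)
    return result

long_list = build_skips(list(range(0, 1000, 3)))      # docs 0, 3, 6, ...
print(intersect([9, 10, 300, 999], long_list))        # -> [9, 300, 999]
```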
to effectively model complex applications in which constantly changing situations can be represented database system must be able to support the runtime specification of structural and behavioral nuances for objects on an individual or group basis this paper introduces the role mechanism as an extension of object oriented databases to support unanticipated behavioral oscillations for objects that may attain many types and share single object identity role refers to the ability to represent object dynamics by seamlessly integrating idiosyncratic behavior possibly in response to external events with pre existing object behavior specified at instance creation time in this manner the same object can simultaneously be an instance of different classes which symbolize the different roles that this object assumes the role concept and its underlying linguistic scheme simplify the design requirements of complex applications that need to create and manipulate dynamic objects
despite the widespread adoption of role based access control rbac models new access control models are required for new applications for which rbac may not be especially well suited and for which implementations of rbac do not enable properties of access control policies to be adequately defined and proven to address these issues we propose form of access control model that is based upon the key notion of an event the access control model that we propose is intended to permit the representation of access control requirements in distributed and changing computing environment the proving of properties of access control policies defined in terms of our model and direct implementations for access control checking
virtual or online content creation is no longer an external process done by software developers or professional new media players but is more and more performed by ordinary people in this paper we focus on non professional users to present how different categories of users get involved in the process of content sharing and creation within city community that only few of them are interested in contributing to this community is nothing new in itself instead we want to look at what is needed to encourage them to help us build up virtual replica of the city using an ad hoc application ie the amc application to support them in achieving this goal this mobile city device must have some interactive elements like tags ratings comments etc which are also known as social features that stimulate users to become active members of that particular community by exploring which of these interactive elements are most suitable on mobile devices we hope to define framework to support users in generating content in user friendly way
because of the high impact of high tech digital crime upon our society it is necessary to develop effective information retrieval ir tools to support digital forensic investigations in this paper we propose an ir system for digital forensics that targets emails our system incorporates wordnet ie domain independent ontology for the vocabulary into an extended boolean model ebm by applying query expansion techniques structured boolean queries in backus naur form bnf are utilized to assist investigators in effectively expressing their information requirements we compare the performance of our system on several email datasets with traditional boolean ir system built upon the lucene keyword only model experimental results show that our system yields promising improvement in retrieval performance without the requirement of very accurate query keywords to retrieve the most relevant emails
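A hedged sketch of the query-expansion step only, using NLTK's WordNet interface (the extended Boolean model and the BNF-structured queries from the paper are not reproduced); it assumes the nltk package and the WordNet corpus are installed, and the example term and max_synonyms limit are illustrative.

```python
from nltk.corpus import wordnet as wn

def expand_term(term, max_synonyms=5):
    """Return the original term plus a few WordNet synonyms (lemma names)."""
    synonyms = {term}
    for synset in wn.synsets(term):
        for lemma in synset.lemma_names():
            synonyms.add(lemma.replace("_", " ").lower())
            if len(synonyms) > max_synonyms:
                return sorted(synonyms)
    return sorted(synonyms)

print(expand_term("fraud"))   # e.g. a small set containing 'fraud' and related lemmas
```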
in recent years document clustering has been receiving more and more attention as an important and fundamental technique for unsupervised document organization automatic topic extraction and fast information retrieval or filtering in this paper we propose novel method for clustering documents using regularization unlike traditional globally regularized clustering methods our method first constructs local regularized linear label predictor for each document vector and then combines all those local regularizers with global smoothness regularizer so we call our algorithm clustering with local and global regularization clgr we will show that the cluster memberships of the documents can be achieved by eigenvalue decomposition of sparse symmetric matrix which can be efficiently solved by iterative methods finally our experimental evaluations on several datasets are presented to show the superiority of clgr over traditional document clustering methods
this paper proposes an organized generalization of newman and girvan’s modularity measure for graph clustering optimized via deterministic annealing scheme this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering induced graphs topographic graph clustering provides an alternative to more classical solutions in which standard graph clustering method is applied to build simpler graph that is then represented with graph layout algorithm comparative study on four real world graphs ranging from to vertices shows the interest of the proposed approach with respect to classical solutions and to self organizing maps for graphs
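The quantity being generalised above is standard Newman-Girvan modularity; the sketch below computes it for a given partition of an undirected graph (the deterministic-annealing optimisation and the topology-preserving generalisation from the paper are not reproduced), with a made-up example graph.

```python
def modularity(edges, community_of):
    """Q = sum over communities of (intra-edge fraction) - (expected intra-edge fraction)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for u, v in edges:                      # observed fraction of edges inside communities
        if community_of[u] == community_of[v]:
            q += 1.0 / m
    for c in set(community_of.values()):    # minus the fraction expected from degrees alone
        d_c = sum(d for n, d in degree.items() if community_of[n] == c)
        q -= (d_c / (2.0 * m)) ** 2
    return q

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
print(modularity(edges, {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}))
```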
rfid tags are increasingly being used in supply chain applications due to their potential in engendering supply chain visibility and reducing tracking errors through instantaneous location of objects as well as the tagged object’s immediate ambient conditions during its life time an rfid tag is highly likely to physically pass through different organizations that have disparate information systems with varying information needs we consider management of data in these applications with respect to volume data and query characteristics privacy and security among others and propose means to reduce false positive and false negative reads the beneficial properties of rfid tags can be realized only when they are readable with high degree of accuracy the read rate accuracy in turn depends on several factors including the medium and distance between tag and reader presence of objects that are impenetrable by rf signal orientation of tag with respect to the reader among others we propose three algorithms that address read rate accuracy and illustrate their relative performance using an example scenario
we propose an automatic approach to generate street side photo realistic models from images captured along the streets at ground level we first develop multi view semantic segmentation method that recognizes and segments each image at pixel level into semantically meaningful areas each labeled with specific object class such as building sky ground vegetation and car partition scheme is then introduced to separate buildings into independent blocks using the major line structures of the scene finally for each block we propose an inverse patch based orthographic composition and structure analysis method for façade modeling that efficiently regularizes the noisy and missing reconstructed data our system has the distinct advantage of producing visually compelling results by imposing strong priors of building regularity we demonstrate the fully automatic system on typical city example to validate our methodology
designers usually begin with database to look for historical design solution available experience and techniques through design documents when initiating new design this database is collection of labeled design documents under few of predefined categories however little work has been done on labeling relatively small number of design documents for information organization so that most of design documents in this database can be automatically categorized this paper initiates study on this topic and proposes methodology in four steps design document collection documents labeling finalization of documents labeling and categorization of design database our discussion in this paper focuses on the first three steps the key of this method is to collect relatively small number of design documents for manual labeling operation and unify the effective labeling results as the final labels in terms of labeling agreement analysis and text classification experiment then these labeled documents are utilized as training samples to construct classifiers which can automatically give appropriate labels to each design document with this method design documents are labeled in terms of the consensus of operators understanding and design information can be organized in comprehensive and universally accessible way case study of labeling robotic design documents is used to demonstrate the proposed methodology experimental results show that this method can significantly benefit efficient design information search
this paper addresses the issues related to improving the overall quality of the dynamic candidate link generation for the requirements tracing process for verification and validation and independent verification and validation analysts the contribution of the paper is four fold we define goals for tracing tool based on analyst responsibilities in the tracing process we introduce several new measures for validating that the goals have been satisfied we implement analyst feedback in the tracing process and we present prototype tool that we built retro requirements tracing on target to address these goals we also present the results of study used to assess retro’s support of goals and goal elements that can be measured objectively
in high performance superscalar processor the instruction scheduler often comes with poor scalability and high complexity due to the expensive wakeup operation from detailed simulation based analyses we find that of the wakeup distances between two dependent instructions are short in the range of instructions and are in the range of instructions we apply this wakeup spatial locality to the design of conventional cam based and matrix based wakeup logic respectively by limiting the wakeup coverage to instructions where xc xc for entry segments the proposed wakeup designs confine the wakeup operation in two matrix based or three cam based entry segments no matter how large the issue window size is the experimental results show that for an issue window of entries iw or entries iw the proposed cam based wakeup locality design saves iw and iw of the power consumption reduces iw and iw in the wakeup latency compared to the conventional cam based design with almost no performance loss for the matrix based wakeup logic applying wakeup locality to the design drastically reduces the area cost extensive simulation results including comparisons with previous works show that the wakeup spatial locality is the key element to achieve scalability for future sophisticated instruction schedulers
the k set agreement problem is generalization of the uniform consensus problem each process proposes value and each non faulty process has to decide value such that decided value is proposed value and at most k different values are decided it has been shown that any algorithm that solves the k set agreement problem in synchronous systems that can suffer up to t crash failures requires ⌊t/k⌋+1 rounds in the worst case it has also been shown that it is possible to design early deciding algorithms where no process decides and halts after min(⌊f/k⌋+2, ⌊t/k⌋+1) rounds where f is the number of actual crashes in run this paper explores new direction to solve the k set agreement problem in synchronous system it considers that the system is enriched with base objects denoted has sa objects that allow solving the set agreement problem in set of processes
this research examines the privacy comfort levels of participants if others can view traces of their web browsing activity during week long field study participants used an electronic diary daily to annotate each web page visited with privacy level content categories were used by participants to theoretically specify their privacy comfort for each category and by researchers to partition participants actual browsing the content categories were clustered into groups based on the dominant privacy levels applied to the pages inconsistencies between participants in their privacy ratings of categories suggest that general privacy management scheme is inappropriate participants consistency within categories suggests that personalized scheme may be feasible however more fine grained approach to classification is required to improve results for sites that tend to be general of multiple task purposes or dynamic in content
grids are being used for executing parallel applications over remote resources for executing parallel application on set of grid resources chosen by user or grid scheduler the input data needed by the application is segmented according to the data distribution followed in the application and the data segments are distributed to the grid resources the same input data may be used subsequently by different applications leading to multiple copies replicas of parallel data segments in various grid resources the data needed for parallel application can be gathered from the existing replicas onto the computational resources chosen by the grid scheduler for application execution in this work we have devised novel algorithms for determining nearest replica sites containing data segments needed by parallel application executing on set of resources with the objective of minimizing the time needed for transferring the data segments from the replica sites to the resources we have tested our algorithms on different kinds of experimental setups we find that the best algorithm varies according to the configuration of data servers and clients in all cases our algorithms performed better than the existing algorithms by at least
applications using xml for data representation very often use different xml formats and thus require the transformation of xml data the common approach transforms entire xml documents from one format into another eg by using an xslt stylesheet different from this approach we use an xslt stylesheet in order to transform given xpath query or given xslt query so that we retrieve and transform only that part of the xml document which is sufficient to answer the given query among other things our approach avoids problems of replication saves processing time and in distributed scenarios transportation costs
recent years have seen an overwhelming interest in how people work together as group both the nature of collaboration and research into how people collaborate is complex and multifaceted with different research agendas types of studies and variations in the behavioral data collected better understanding of collaboration is needed in order to be able to make contributions to the design of systems to support collaboration and collaborative tasks in this article we combine relevant literature past research and small scale empirical study of two people individually and collaboratively constructing jigsaws the objective is to make progress towards the goal of generating extensions to an existing task modeling approach task knowledge structures the research described has enabled us to generate requirements for approaches to modeling collaborative tasks and also set of requirements to be taken into account in the design of computer based collaborative virtual jigsaw
enterprise wlans have made dramatic shift towards centralized architectures in the recent past the reasons for such change have been ease of management and better design of various control and security functions the data path of wlans however continues to use the distributed random access model as defined by the popular dcf mechanism of the standard while theoretical results indicate that centrally scheduled data path can achieve higher efficiency than its distributed counterpart the likely complexity of such solution has inhibited practical consideration in this paper we take fresh implementation and deployment oriented view in understanding data path choices in enterprise wlans we perform extensive measurements to characterize the impact of various design choices like scheduling granularity on the performance of centralized scheduler and identify regions where such centralized scheduler can provide the best gains our detailed evaluation with scheduling prototypes deployed on two different wireless testbeds indicates that dcf is quite robust in many scenarios but centralization can play unique role in mitigating hidden terminals scenarios which may occur infrequently but become pain points when they do and exploiting exposed terminals scenarios which occur more frequently and limit the potential of successful concurrent transmissions motivated by these results we design and implement centaur hybrid data path for enterprise wlans that combines the simplicity and ease of dcf with limited amount of centralized scheduling from unique vantage point our mechanisms do not require client cooperation and can support legacy clients
in wireless sensor networks compromised sensor nodes aim to distort the integrity of data by sending false data reports injecting false data during data aggregation and disrupting transmission of aggregated data previously known trust systems rely on general reputation concept to prevent these attacks however this paper presents novel reliable data aggregation and transmission protocol called rdat which is based on the concept of functional reputation protocol rdat improves the reliability of data aggregation and transmission by evaluating each type of sensor node action using respective functional reputation in addition protocol rdat employs fault tolerant reed solomon coding scheme based multi path data transmission algorithm to ensure the reliable data transmission to the base station the simulation results show that protocol rdat significantly improves the reliability of the data aggregation and transmission in the presence of compromised nodes
polarity mining provides an in depth analysis of semantic orientations of text information motivated by its success in the area of topic mining we propose an ontology supported polarity mining ospm approach the approach aims to enhance polarity mining with ontology by providing detailed topic specific information ospm was evaluated in the movie review domain using both supervised and unsupervised techniques results revealed that ospm outperformed the baseline method without ontology support the findings of this study not only advance the state of polarity mining research but also shed light on future research directions
the correctness of the data managed by database systems is vital to any application that utilizes data for business research and decision making purposes to guard databases against erroneous data not reflecting real world data or business rules semantic integrity constraints can be specified during database design current commercial database management systems provide various means to implement mechanisms to enforce semantic integrity constraints at database run time in this paper we give an overview of the semantic integrity support in the most recent sql standard sql and we show to what extent the different concepts and language constructs proposed in this standard can be found in major commercial object relational database management systems in addition we discuss general design guidelines that point out how the semantic integrity features provided by these systems should be utilized in order to implement an effective integrity enforcing subsystem for database
with the increasing popularity of mobile computing devices the need to access information in mobile environments has grown rapidly since the information has to be accessed over wireless networks mobile information systems often have to deal with problems like low bandwidth high delay and frequent disconnections information hoarding is method that tries to overcome these problems by transferring information which the user will probably need in advance the hoarding mechanism that we describe in this paper exploits the location dependence of the information access which is often found in mobile information systems our simulation results show that it is beneficial to do so and that we achieve higher hit ratios than with caching mechanism
the ease of deployment of battery powered and mobile systems is pushing the network edge far from powered infrastructures primary challenge in building untethered systems is offering powerful aggregation points and gateways between heterogeneous end points role traditionally played by powered servers microservers are battery powered in network nodes that play number of roles processing data from clients aggregating data providing responses to queries and acting as network gateway providing qos guarantees for these services can be extremely energy intensive since increased energy consumption translates to shorter lifetime there is need for new way to provide these qos guarantees at minimal energy consumption this paper presents triage tiered hardware and software architecture for microservers triage extends the lifetime of microserver by combining two independent but connected platforms high power platform that provides the capability to execute complex tasks and low power platform that provides high responsiveness at low energy cost the low power platform acts similar to medical triage unit examining requests to find critical ones and scheduling tasks to optimize the use of the high power platform the scheduling decision is based on evaluating each task’s resource requirements using hardware assisted profiling of execution time and energy usage using three microserver services storage network forwarding and query processing we show that triage provides more than increase in microserver lifetime over existing systems while providing probabilistic quality of service guarantees
traditional static checking centers around finding bugs in programs by isolating cases where the language has been used incorrectly these language based checkers do not understand the semantics of software libraries and therefore cannot be used to detect errors in the use of libraries in this paper we introduce stllint program analysis we have implemented for the standard template library and similar generic software libraries and we present the general approach that underlies stllint we show that static checking of library semantics differs greatly from checking of language semantics requiring new representations of program behavior and new algorithms major challenges include checking the use of generic algorithms loop analysis for interfaces and organizing behavioral specifications for extensibility
in this article we discuss some issues that arise when ontologies are used to support corporate application domains such as electronic commerce commerce and some technical problems in deploying ontologies for real world use in particular we focus on issues of ontology integration and the related problem of semantic mapping that is the mapping of ontologies and taxonomies to reference ontologies to preserve semantics along the way we discuss what typically constitutes an ontology architecture we situate the discussion in the domain of business to business bb commerce by its very nature bb commerce must try to interlink buyers and sellers from multiple companies with disparate product description terminologies and meanings thus serving as paradigmatic case for the use of ontologies to support corporate applications
this paper presents several important enhancements to the recently published multilevel placement package mpl the improvements include unconstrained quadratic relaxation on small noncontiguous subproblems at every level of the hierarchy ii improved interpolation declustering based on techniques from algebraic multigrid amg and iii iterated cycles with additional geometric information for aggregation in subsequent cycles the enhanced version of mpl named mpl improves the total wirelength result by about compared to the original version the attractive scalability properties of the mpl run time have been largely retained and the overall run time remains very competitive compared to gordian domino on uniform cell size ibm ispd benchmarks speed up of well over x on large circuits cells or nets is obtained along with an average improvement in total wirelength of about compared to dragon on the same benchmarks speed up of about is obtained at the cost of about increased wirelength on the recently published peko synthetic benchmarks mpl generates surprisingly high quality placements roughly closer to the optimal than those produced by capo and dragon in run time about twice as long as capo’s and about th of dragon’s
we present the groundhog benchmarking suite that evaluates the power consumption of reconfigurable technology for applications targeting the mobile computing domain this benchmark suite includes seven designs one design targets fine grained fpga fabrics allowing for quick state of the art evaluation and six designs are specified at high level allowing them to target range of existing and future reconfigurable technologies each of the six designs can be stimulated with the help of synthetically generated input stimuli created by an open source tool included in the downloadable suite another tool is included to help verify the correctness of each implemented design to demonstrate the potential of this benchmark suite we evaluate the power consumption of two modern industrial fpgas targeting the mobile domain also we show how an academic fpga framework vpr that has been updated for power estimates can be used to estimate the power consumption of different fpga architectures and an open source cad flow mapping to these architectures
in this paper we propose novel nonparametric regression model to generate virtual humans from still images for the applications of next generation environments ng this model automatically synthesizes deformed shapes of characters by using kernel regression with elliptic radial basis functions erbfs and locally weighted regression loess kernel regression with erbfs is used for representing the deformed character shapes and creating lively animated talking faces for preserving patterns within the shapes loess is applied to fit the details with local control the results show that our method effectively simulates plausible movements for character animation including body movement simulation novel views synthesis and expressive facial animation synchronized with input speech therefore the proposed model is especially suitable for intelligent multimedia applications in virtual humans generation
in sensor networks secure localization that is determining sensors locations in hostile untrusted environment is challenging but very important problem that has not yet been addressed effectively this paper presents an attack tolerant localization protocol called verification for iterative localization veil under which sensors cooperatively safeguard the localization service by exploiting the high spatiotemporal correlation existing between adjacent nodes veil realizes adaptive management of profile for normal localization behavior and distributed detection of false locations advertised by attackers by comparing them against the profile of normal behavior our analysis and simulation results show that veil achieves high level tolerance to many critical attacks and is computationally feasible on resource limited sensors
we briefly motivate and present new online bibliography on schema evolution an area which has recently gained much interest in both research and practice
in this paper we consider distributed mechanism to detect and to defend against the low rate tcp attack the low rate tcp attack is recently discovered attack in essence it is periodic short burst that exploits the homogeneity of the minimum retransmission timeout rto of tcp flows and forces all affected tcp flows to backoff and enter the retransmission timeout state when these affected tcp flows timeout and retransmit their packets the low rate attack will again send short burst to force these affected tcp flows to enter rto again therefore these affected tcp flows may be entitled to zero or very low transmission bandwidth this sort of attack is difficult to identify due to large family of attack patterns we propose distributed detection mechanism to identify the low rate attack in particular we use the dynamic time warping approach to robustly and accurately identify the existence of the low rate attack once the attack is detected we use fair resource allocation mechanism to schedule all packets so that the number of affected tcp flows is minimized and provide sufficient resource protection to those affected tcp flows experiments are carried out to quantify the robustness and accuracy of the proposed distributed detection mechanism in particular one can achieve very low false positive negative when compared to legitimate internet traffic our experiments also illustrate the efficiency of the defense mechanism across different attack patterns and network topologies
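The core primitive the detection relies on is the classic dynamic-time-warping distance; the sketch below shows only that distance (how traffic is sampled into sequences and which threshold flags an attack are design choices of the paper and are not reproduced), and the template and observed sequences are made-up example values.

```python
def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

attack_template = [0, 0, 9, 9, 0, 0, 0, 0, 9, 9, 0, 0]   # periodic short bursts
observed        = [0, 1, 8, 9, 1, 0, 0, 1, 9, 8, 0, 0]
print(dtw(observed, attack_template))                     # small value -> similar burst shape
```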
this paper advocates novel approach to the construction of secure software controlling information flow and maintaining integrity via monadic encapsulation of effects this approach is constructive relying on properties of monads and monad transformers to build verify and extend secure software systems we illustrate this approach by construction of abstract operating systems called separation kernels starting from mathematical model of shared state concurrency based on monads of resumptions and state we outline the development by stepwise refinements of separation kernels supporting unix like system calls interdomain communication and formally verified security policy domain separation because monads may be easily and safely represented within any pure higher order typed functional language the resulting system models may be directly realized within language such as haskell
program execution traces are frequently used in industry and academia yet most trace compression algorithms have to be re implemented every time the trace format is changed which takes time is error prone and often results in inefficient solutions this paper describes and evaluates tcgen tool that automatically generates portable customized high performance trace compressors all the user has to do is provide description of the trace format and select one or more predictors to compress the fields in the trace records tcgen translates this specification into source code and optimizes it for the specified trace format and predictor algorithms on average the generated code is faster and compresses better than the six other compression algorithms we have tested for example comparison with sbc one of the best trace compression algorithms in the current literature shows that tcgen’s synthesized code compresses speccpu address traces more decompresses them faster and compresses them faster
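A toy illustration of predictor-based trace compression (not TCGen's generated code): a last-value predictor turns a trace field into a stream of one-bit hits plus the occasional mispredicted value, which a general-purpose compressor can then squeeze much further; the token format and the example address trace are made up.

```python
def compress(values):
    """Last-value predictor: emit a 'hit' marker when the prediction is correct."""
    last, out = None, []
    for v in values:
        out.append(("hit",) if v == last else ("miss", v))
        last = v
    return out

def decompress(tokens):
    last, out = None, []
    for tok in tokens:
        last = last if tok[0] == "hit" else tok[1]
        out.append(last)
    return out

trace = [0x400, 0x400, 0x400, 0x404, 0x404, 0x400]
assert decompress(compress(trace)) == trace
```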
environmental modelling often requires long iterative process of sourcing reformatting analyzing and introducing various types of data into the model much of the data to be analyzed are geospatial data digital terrain models dtm river basin boundaries snow cover from satellite imagery etc and so the modelling workflow typically involves the use of multiple desktop gis and remote sensing software packages with limited compatibility among them recent advances in service oriented architectures soa are allowing users to migrate from dedicated desktop solutions to on line loosely coupled and standards based services which accept source data process them and pass results as basic parameters to other intermediate services and or then to the main model which also may be made available on line this contribution presents service oriented application that addresses the issues of data accessibility and service interoperability for environmental models key model capabilities are implemented as geospatial services which are combined to form complex services and may be reused in other similar contexts this work was carried out under the auspices of the aware project funded by the european programme global monitoring for environment and security gmes we show results of the service oriented application applied to alpine runoff models including the use of geospatial services facilitating discovery access processing and visualization of geospatial data in distributed manner
we present multimodal system for the recognition of manual signs and non manual signals within continuous sign language sentences in sign language information is mainly conveyed through hand gestures manual signs non manual signals such as facial expressions head movements body postures and torso movements are used to express large part of the grammar and some aspects of the syntax of sign language in this paper we propose multichannel hmm based system to recognize manual signs and non manual signals we choose single non manual signal head movement to evaluate our framework when recognizing non manual signals manual signs and non manual signals are processed independently using continuous multidimensional hmms and hmm threshold model experiments conducted demonstrate that our system achieved detection ratio of and reliability measure of
decision tree construction is well studied problem in data mining recently there has been much interest in mining data streams domingos and hulten have presented one pass algorithm for decision tree construction their system uses the hoeffding inequality to achieve probabilistic bound on the accuracy of the constructed tree gama et al have extended vfdt in two directions their system vfdtc can deal with continuous data and uses more powerful classification techniques at tree leaves peng et al present soft discretization method to solve continuous attributes in data mining in this paper we revisit these problems and implement system svfdt for data stream mining we make the following contributions we present binary search trees bst approach for efficiently handling continuous attributes its processing time for inserting values is n log n while vfdt’s processing time is higher we improve the method of getting the best split test point of given continuous attribute comparing to the method used in vfdtc it decreases the processing time from n log n to a lower order comparing to vfdtc svfdt’s candidate split test number decreases to log n and we improve the soft discretization method to increase classification accuracy in data stream mining
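The split decision at the heart of VFDT-style learners discussed above is the Hoeffding-bound test; the sketch below shows only that test (it is not the SVFDT implementation), and the gain values, delta and value_range arguments are illustrative assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta the true mean lies within epsilon of the observed mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, delta=1e-6, value_range=1.0):
    # split when the observed gap between the two best attributes exceeds epsilon,
    # which shrinks as more examples arrive
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

print(should_split(0.30, 0.25, n=1000))    # False: not enough evidence yet
print(should_split(0.30, 0.25, n=20000))   # True: the gap exceeds the shrinking bound
```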
this paper presents tree register allocation which maps the lifetimes of the variables in program into set of trees colors each tree in greedy style which is optimal when there is no spilling and connects dataflow between and within the trees afterward this approach generalizes and subsumes as special cases ssa based linear scan and local register allocation it keeps their simplicity and low throughput cost and exposes wide solution space beyond them its flexibility enables control flow structure and or profile information to be better reflected in the trees this approach has been prototyped in the phoenix production compiler framework preliminary experiments suggest this is promising direction with great potential register allocation based on two special kinds of trees extended basic blocks and the maximal spanning tree are found to be competitive alternatives to ssa based register allocation and they all tend to generate better code than linear scan
proper nouns may be considered the most important query words in information retrieval if the two languages use the same alphabet the same proper nouns can be found in either language however if the two languages use different alphabets the names must be transliterated short vowels are not usually marked on arabic words in almost all arabic documents except very important documents like the muslim and christian holy books moreover most arabic words have syllable consisting of consonant vowel combination cv which means that most arabic words contain short or long vowel between two successive consonant letters that makes it difficult to create english arabic transliteration pairs since some english letters may not be matched with any romanized arabic letter in the present study we present different approaches for extraction of transliteration proper noun pairs from parallel corpora based on different similarity measures between the english and romanized arabic proper nouns under consideration the strength of our new system is that it works well for low frequency proper noun pairs we evaluate the new approaches presented using two different english arabic parallel corpora most of our results outperform previously published results in terms of precision recall and measure
recommender systems are used in various domains to generate personalized information based on personal user data the ability to preserve the privacy of all participants is an essential requirement of the underlying information filtering architectures because the deployed recommender systems have to be accepted by privacy aware users as well as information and service providers existing approaches neglect to address privacy in this multilateral way we have developed an approach for privacy preserving recommender systems based on multiagent system technology which enables applications to generate recommendations via various filtering techniques while preserving the privacy of all participants we describe the main modules of our solution as well as an application we have implemented based on this approach
we present solutions for the mismatch pattern matching problem with don’t cares given text of length and pattern of length with don’t care symbols and bound our algorithms find all the places that the pattern matches the text with at most mismatches we first give logmlogk logn time randomised algorithm which finds the correct answer with high probability we then present new deterministic nk log time solution that uses tools originally developed for group testing taking our derandomisation approach further we develop an approach based on selectors that runs in nkpolylogm time further in each case the location of the mismatches at each alignment is also given at no extra cost
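To pin down the problem being solved, here is a naive quadratic-time baseline under the usual conventions (a wildcard in either string matches anything, and an alignment is reported when at most k positions mismatch); the paper's randomised and selector-based algorithms are far faster and are not reproduced here, and the example strings are made up.

```python
def k_mismatch_with_wildcards(text, pattern, k, wildcard="?"):
    """Report (offset, mismatch count) for every alignment with at most k mismatches."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        mismatches = sum(
            1 for j, p in enumerate(pattern)
            if p != wildcard and text[i + j] != wildcard and text[i + j] != p
        )
        if mismatches <= k:
            hits.append((i, mismatches))
    return hits

print(k_mismatch_with_wildcards("abracadabra", "a?ra", k=1))   # [(0, 0), (7, 0)]
```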
explicit state methods have proven useful in verifying safety critical systems containing concurrent processes that run asynchronously and communicate such methods consist of inspecting the states and transitions of graph representation of the system their main limitation is state explosion which happens when the graph is too large to be stored in the available computer memory several techniques can be used to palliate state explosion such as on the fly verification compositional verification and partial order reductions in this paper we propose new technique of partial order reductions based on compositional confluence detection ccd which can be combined with the techniques mentioned above ccd is based upon generalization of the notion of confluence defined by milner and exploits the fact that synchronizing transitions that are confluent in the individual processes yield confluent transition in the system graph it thus consists of analysing the transitions of the individual process graphs and the synchronization structure to identify such confluent transitions compositionally under some additional conditions the confluent transitions can be given priority over the other transitions thus enabling graph reductions we propose two such additional conditions one ensuring that the generated graph is equivalent to the original system graph modulo branching bisimulation and one ensuring that the generated graph contains the same deadlock states as the original system graph we also describe how ccd based reductions were implemented in the cadp toolbox and present examples and case study in which adding ccd improves reductions with respect to compositional verification and other partial order reductions
xml enabled publish subscribe pub sub systems have emerged as an increasingly important tool for commerce and internet applications in typical pub sub system subscribed users specify their interests in profile expressed in the xpath language each new data content is then matched against the user profiles so that the content is delivered only to the interested subscribers as the number of subscribed users and their profiles can grow very large the scalability of the service is critical to the success of pub sub systems in this article we propose novel scalable filtering system called fist that transforms user profiles of twig pattern expressed in xpath into sequences using prüfer’s method consequently instead of breaking twig pattern into multiple linear paths and matching them separately fist performs holistic matching of twig patterns with each incoming document in bottom up fashion fist organizes the sequences into dynamic hash based index for efficient filtering and exploits the commonality among user profiles to enable shared processing during the filtering phase we demonstrate that the holistic matching approach reduces filtering cost and memory consumption thereby improving the scalability of fist
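FiST sequences XPath twig patterns with Prüfer's method. As a point of reference, here is the textbook construction of a Prüfer sequence for a labeled tree; the paper's actual encoding of twigs and documents is more involved, so treat this only as an illustration of the underlying sequencing idea.

```python
# Classic Pruefer sequence of a labeled tree (nodes 1..n given as an
# adjacency dict). This is only the textbook construction, not FiST's
# exact twig-pattern encoding.
import heapq

def pruefer_sequence(adj):
    adj = {u: set(vs) for u, vs in adj.items()}
    leaves = [u for u in adj if len(adj[u]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)      # smallest-labelled remaining leaf
        parent = adj[leaf].pop()          # its single neighbour
        seq.append(parent)
        adj[parent].discard(leaf)
        if len(adj[parent]) == 1:
            heapq.heappush(leaves, parent)
    return seq

# Tree with edges 1-4, 2-4, 3-4, 4-5  ->  Pruefer sequence [4, 4, 4]
tree = {1: {4}, 2: {4}, 3: {4}, 4: {1, 2, 3, 5}, 5: {4}}
print(pruefer_sequence(tree))
```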
to extend the lifetime of the sensor networks as far as possible while maintaining the quality of network coverage is major concern in the research of coverage control systematic analysis on the relationship between the network lifetime and cover sets alternation is given and by introducing the concept of time weight factor the network lifetime maximization model is presented through the introduction of the solution granularity the network lifetime optimization problem is transformed into the maximization of the number of cover sets and a solution based on nsga ii is proposed compared with the previous method which has the additional requirement that the cover sets be disjoint and results in large number of unused nodes our algorithm allows the sensors to participate in multiple cover sets and thus makes fuller use of the whole sensor nodes to further increase the network lifetime simulation results are presented to verify these approaches
we present in this paper technique for synthesizing virtual woodcuts based on real images woodcuts are an ancient form of art in which an image is printed from block of carved wood our solution is fully automatic but it also allows great deal of user control if desired given an input image our solution will synthesize an artistic woodcut from this image previous work on this topic has mainly relied on simulation of the actual physical interaction process between wood ink and paper whereas we present consistent solution based on four steps image segmentation computation of orientation fields generation of strokes and final rendering although some non photorealistic rendering work could be possibly extended to approximate woodcut effects ours is the first consistent approach targeting woodcuts specifically for images we illustrate the potential of the proposed technique with examples of virtual woodcuts obtained either automatically or user guided
many existing algorithms mine transaction databases of precise data for frequent itemsets however there are situations in which the user is interested in only some tiny portions of all the frequent itemsets and there are also situations in which data in the transaction databases are uncertain this calls for both i constrained mining for finding only those frequent itemsets that satisfy user constraints which express the user interest and ii mining uncertain data in this paper we propose tree based algorithm that effectively mines transaction databases of uncertain data for only those frequent itemsets satisfying the user specified aggregate constraints the algorithm avoids candidate generation and pushes the aggregate constraints inside the mining process which reduces computation and avoids unnecessary constraint checking
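For uncertain transactions, the usual interpretation of frequency is expected support: the sum over transactions of the product of the existential probabilities of the itemset's items, assuming independence. The sketch below checks that definition together with a sample aggregate constraint (sum of item prices under a budget) by brute-force enumeration; it is not the paper's tree-based, candidate-generation-free algorithm, and the prices and thresholds are made up.

```python
# Brute-force sketch: frequent itemsets over uncertain transactions under an
# aggregate constraint. Expected support = sum over transactions of the
# product of the items' existential probabilities (independence assumed).
from itertools import combinations
from math import prod

transactions = [           # each transaction: item -> existential probability
    {"a": 0.9, "b": 0.8, "c": 0.4},
    {"a": 0.7, "c": 0.9},
    {"b": 0.6, "c": 0.5, "d": 0.3},
]
price = {"a": 10, "b": 25, "c": 5, "d": 40}   # illustrative attribute values

def expected_support(itemset):
    return sum(prod(t[i] for i in itemset) for t in transactions
               if all(i in t for i in itemset))

def mine(min_esup=0.5, max_total_price=40):
    items = sorted({i for t in transactions for i in t})
    result = []
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            if sum(price[i] for i in itemset) > max_total_price:
                continue                     # aggregate constraint sum(price) <= budget
            esup = expected_support(itemset)
            if esup >= min_esup:
                result.append((itemset, round(esup, 3)))
    return result

print(mine())
```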
pointer chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains while the traversal of any single pointer chain leads to the serialization of memory operations the traversal of independent pointer chains provides source of memory parallelism this article investigates exploiting such interchain memory parallelism for the purpose of memory latency tolerance using technique called multi chain prefetching previous works roth et al roth and sohi have proposed prefetching simple pointer based structures in multi chain fashion however our work enables multi chain prefetching for arbitrary data structures composed of lists trees and arrays this article makes five contributions in the context of multi chain prefetching first we introduce framework for compactly describing linked data structure lds traversals providing the data layout and traversal code work information necessary for prefetching second we present an off line scheduling algorithm for computing prefetch schedule from the lds descriptors that overlaps serialized cache misses across separate pointer chain traversals our analysis focuses on static traversals we also propose using speculation to identify independent pointer chains in dynamic traversals third we propose hardware prefetch engine that traverses pointer based data structures and overlaps multiple pointer chains according to the computed prefetch schedule fourth we present compiler that extracts lds descriptors via static analysis of the application source code thus automating multi chain prefetching finally we conduct an experimental evaluation of compiler instrumented multi chain prefetching and compare it against jump pointer prefetching luk and mowry prefetch arrays karlsson et al and predictor directed stream buffers psb sherwood et al our results show compiler instrumented multi chain prefetching improves execution time by percent across six pointer chasing kernels from the olden benchmark suite rogers et al and by percent across four specint benchmarks compared to jump pointer prefetching and prefetch arrays multi chain prefetching achieves percent and percent higher performance for the selected olden and specint benchmarks respectively compared to psb multi chain prefetching achieves percent higher performance for the selected olden benchmarks but psb outperforms multi chain prefetching by percent for the selected specint benchmarks an ideal psb with an infinite markov predictor achieves comparable performance to multi chain prefetching coming within percent across all benchmarks finally speculation can enable multi chain prefetching for some dynamic traversal codes but our technique loses its effectiveness when the pointer chain traversal order is highly dynamic
current conceptual workflow models use either informally defined conceptual models or several formally defined conceptual models that capture different aspects of the workflow eg the data process and organizational aspects of the workflow to the best of our knowledge there are no algorithms that can amalgamate these models to yield single view of reality fragmented conceptual view is useful for systems analysis and documentation however it fails to realize the potential of conceptual models to provide convenient interface to automate the design and management of workflows first as step toward accomplishing this objective we propose seam state entity activity model conceptual workflow model defined in terms of set theory second no attempt has been made to the best of our knowledge to incorporate time into conceptual workflow model seam incorporates the temporal aspect of workflows third we apply seam to real life organizational unit’s workflows in this work we show subset of the workflows modeled for this organization using seam we also demonstrate via prototype application how the seam schema can be implemented on relational database management system we present the lessons we learned about the advantages obtained for the organization and for developers who choose to use seam we also present potential pitfalls in using the seam methodology to build workflow systems on relational platforms the information contained in this work is sufficient to allow application developers to utilize seam as methodology to analyze design and construct workflow applications on current relational database management systems the definition of seam as context free grammar definition of its semantics and its mapping to relational platforms should be sufficient also to allow the construction of an automated workflow design and construction tool with seam as the user interface
frequent pattern mining plays an important role in the data mining community since it is usually fundamental step in various mining tasks however maintenance of frequent patterns is very expensive in the incremental database in addition the status of pattern changes with time in other words a frequent pattern may become infrequent and vice versa in order to exactly find all frequent patterns most algorithms have to scan the original database completely whenever an update occurs in this paper we propose a new algorithm itm which stands for incremental transaction mapping for incremental frequent pattern mining without rescanning the whole database it transfers the transaction dataset to the vertical representation such that the incremental dataset can be integrated to the original database easily as demonstrated in our experiments the proposed method is very efficient and suitable for mining frequent patterns in the incremental database
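The key idea the abstract describes is a vertical representation, which makes incremental maintenance cheap because a new batch of transactions only appends transaction ids to the per-item tid sets. The sketch below shows that representation with supports computed by tid-set intersection; it illustrates the data layout only, not the ITM algorithm itself.

```python
# Vertical (item -> tid set) representation: incremental batches are merged
# by set union and supports are recomputed by intersection, with no rescan
# of the original database. Illustration of the layout, not ITM itself.
from collections import defaultdict

def to_vertical(transactions, start_tid=0):
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions, start=start_tid):
        for item in items:
            vertical[item].add(tid)
    return vertical

def merge(base, increment):
    for item, tids in increment.items():
        base[item] |= tids
    return base

def support(vertical, itemset):
    tidsets = [vertical[i] for i in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0

db = to_vertical([{"a", "b"}, {"b", "c"}, {"a", "b", "c"}])
new = to_vertical([{"a", "c"}, {"b", "c"}], start_tid=3)   # incremental batch
merge(db, new)
print(support(db, {"b", "c"}))   # -> 3 (tids 1, 2, 4)
```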
although dynamic program slicing was first introduced to aid in user level debugging applications aimed at improving software quality reliability security and performance have since been identified as candidates for using dynamic slicing however the dynamic dependence graph constructed to compute dynamic slices can easily cause slicing algorithms to run out of memory for realistic program runs in this paper we present the design and evaluation of cost effective dynamic program slicing algorithm this algorithm is based upon dynamic dependence graph representation that is highly compact and rapidly traversable thus the graph can be held in memory and dynamic slices can be quickly computed compact representation is derived by recognizing that all dynamic dependences data and control need not be individually represented we identify sets of dynamic dependence edges between pair of statements that can share single representative edge we further show that the dependence graph can be transformed in manner that increases sharing and sharing can be performed even in the presence of aliasing experiments show that transformed dynamic dependence graphs explicitly represent only of the dependence edges present in the full dynamic dependence graph when the full graph sizes range from to gigabytes in size our compacted graphs range from to megabytes in size average slicing times for our algorithm range from to seconds across several benchmarks from specint
robust statistics approach to curvature estimation on discretely sampled surfaces namely polygon meshes and point clouds is presented the method exhibits accuracy stability and consistency even for noisy non uniformly sampled surfaces with irregular configurations within an estimation framework the algorithm is able to reject noise and structured outliers by sampling normal variations in an adaptively reweighted neighborhood around each point the algorithm can be used to reliably derive higher order differential attributes and even correct noisy surface normals while preserving the fine features of the normal and curvature field the approach is compared with state of the art curvature estimation methods and shown to improve accuracy by up to an order of magnitude across ground truth test surfaces under varying tessellation densities and types as well as increasing degrees of noise finally the benefits of robust statistical estimation of curvature are illustrated by applying it to the popular applications of mesh segmentation and suggestive contour rendering
in this paper we compare the behavior of pointers in programs as approximated by static pointer analysis algorithms with the actual behavior of pointers when these programs are run in order to perform this comparison we have implemented several well known pointer analysis algorithms and we have built an instrumentation infrastructure for tracking pointer values during program execution our experiments show that for number of programs from the spec and spec benchmark suites the pointer information produced by existing scalable static pointer analyses is far worse than the actual behavior observed at run time these results have two implications first tool like ours can be used to supplement static program understanding tools in situations where the static pointer information is too coarse to be usable second feedback directed compiler can use profile data on pointer values to improve program performance by ignoring aliases that do not arise at run time and inserting appropriate run time checks to ensure safety as an example we were able to obtain factor of speedup on frequently executed routine from mksim
we propose new heterogeneous data warehouse architecture where first phase traditional relational olap warehouse coexists with second phase data in compressed form optimized for data mining aggregations and metadata for the entire time frame are stored in the first phase relational database the main advantage of the second phase is its reduced storage requirement which enables very high throughput processing by sequential read only data stream algorithms it becomes feasible to run speed optimized queries and data mining operations on the entire time frame of most granular data the second phase also enables long term data storage and analysis using very efficient compressed format at low storage costs even for historical data the proposed architecture fits existing data warehouse solutions we show the effectiveness of the two phase data warehouse through case study of large web portal
although existing write ahead logging algorithms scale to conventional database workloads their communication and synchronization overheads limit their usefulness for modern applications and distributed systems we revisit write ahead logging with an eye toward finer grained concurrency and an increased range of workloads then remove two core assumptions that pages are the unit of recovery and that timestamps lsns should be stored on each page recovering individual application level objects rather than pages simplifies the handling of systems with object sizes that differ from the page size we show how to remove the need for lsns on the page which in turn enables dma or zero copy for large objects increases concurrency and reduces communication between the application buffer manager and log manager our experiments show that the looser coupling significantly reduces the impact of latency among the components this makes the approach particularly applicable to large scale distributed systems and enables cross pollination of ideas from distributed systems and transactional storage however these advantages come at cost segments are incompatible with physiological redo preventing number of important optimizations we show how allocation enables or prevents mixing of aries pages and physiological redo with segments we present an allocation policy that avoids undesirable interactions that complicate other combinations of aries and lsn free pages and then present proof that both approaches and our combination are correct many optimizations presented here were proposed in the past however we believe this is the first unified approach
numerous machine learning problems involve an investigation of relationships between features in heterogeneous datasets where different classifier can be more appropriate for different regions we propose technique of localized cascade generalization of weak classifiers this technique identifies local regions having similar characteristics and then uses the cascade generalization of local experts to describe the relationship between the data characteristics and the target class we performed comparison with other well known combining methods using weak classifiers as base learners on standard benchmark datasets and the proposed technique was more accurate
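A minimal sketch of the general recipe described above, assuming scikit-learn: partition the data into local regions, and inside each region apply cascade generalization, i.e. append a weak base learner's class probabilities to the features before training a second-level learner. The number of regions and the choice of learners are illustrative, not those used in the paper.

```python
# Localized cascade generalization sketch (assumes scikit-learn): k-means
# finds local regions; within each region a weak base learner's class
# probabilities are appended as features for a second-level learner.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
regions = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

local_experts = {}
for r in range(3):
    idx = regions.labels_ == r
    base = GaussianNB().fit(X[idx], y[idx])
    augmented = np.hstack([X[idx], base.predict_proba(X[idx])])   # cascade step
    top = DecisionTreeClassifier(max_depth=3, random_state=0).fit(augmented, y[idx])
    local_experts[r] = (base, top)

def predict(x):
    r = int(regions.predict(x.reshape(1, -1))[0])    # route to the local region
    base, top = local_experts[r]
    aug = np.hstack([x.reshape(1, -1), base.predict_proba(x.reshape(1, -1))])
    return int(top.predict(aug)[0])

print(predict(X[0]), y[0])
```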
well known problem that limits the practical usage of association rule mining algorithms is the extremely large number of rules generated such large number of rules makes the algorithms inefficient and makes it difficult for the end users to comprehend the discovered rules we present the concept of heavy itemset an itemset is heavy for given support and confidence values if all possible association rules made up of items only in are present we prove simple necessary and sufficient condition for an itemset to be heavy we present formula for the number of possible rules for given heavy itemset and show that heavy itemset compactly represents an exponential number of association rules along with two simple search algorithms we present an efficient greedy algorithm to generate collection of disjoint heavy itemsets in given transaction database we then present modified apriori algorithm that starts with given collection of disjoint heavy itemsets and discovers more heavy itemsets not necessarily disjoint with the given ones
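The definition in the abstract can be checked directly: an itemset is heavy for given support and confidence thresholds if every rule formed from its items meets both thresholds. The brute-force check below tests exactly that on a toy database; the paper's contribution is a much cheaper necessary and sufficient condition plus greedy generation, which this sketch does not reproduce.

```python
# Brute-force check of the heavy-itemset definition: itemset Z is heavy if
# every rule X -> Z\X (X a nonempty proper subset) meets the support and
# confidence thresholds. Toy database and thresholds are illustrative.
from itertools import combinations

db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c"},
      {"a", "b", "c"}, {"b", "c"}]

def support(itemset):
    return sum(itemset <= t for t in db) / len(db)

def is_heavy(itemset, min_sup=0.5, min_conf=0.8):
    itemset = frozenset(itemset)
    if support(itemset) < min_sup:
        return False
    for k in range(1, len(itemset)):
        for antecedent in combinations(itemset, k):
            conf = support(itemset) / support(frozenset(antecedent))
            if conf < min_conf:
                return False
    return True

print(is_heavy({"a", "b", "c"}))   # -> True for this toy database
```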
self organization is bound to greatly affect computer science the simplicity and yet power of self organized models will allow researchers to propose efficient solutions to problems never before thought possible to be addressed efficiently the published works in the field clearly demonstrate the potential of this approach this paper first reviews number of interesting self organization phenomena found in nature then it discusses their potential applicability in several computer science application scenarios
fine grained access controls for xml define access privileges at the granularity of individual xml nodes in this paper we present fine grained access control mechanism for xml data this mechanism exploits the structural locality of access rights as well as correlations among the access rights of different users to produce compact physical encoding of the access control data this encoding can be constructed using single pass over labeled xml database it is block oriented and suitable for use in secondary storage we show how this access control mechanism can be integrated with next of kin nok xml query processor to provide efficient secure query evaluation the key idea is that the structural information of the nodes and their encoded access controls are stored together allowing the access privileges to be checked efficiently our evaluation shows that the access control mechanism introduces little overhead into the query evaluation process
interface toolkits in ordinary application areas let average programmers rapidly develop software resembling other standard applications in contrast toolkits for novel and perhaps unfamiliar application areas enhance the creativity of these programmers by removing low level implementation burdens and supplying appropriate building blocks toolkits give people language to think about these new interfaces which in turn allows them to concentrate on creative designs this is important for it means that programmers can rapidly generate and test new ideas replicate and refine ideas and create demonstrations for others to try to illustrate this important link between toolkits and creativity we describe example groupware toolkits we have built and how people have leveraged them to create innovative interfaces
tumor clustering is becoming powerful method in cancer class discovery nonnegative matrix factorization nmf has shown advantages over other conventional clustering techniques nonetheless there is still considerable room for improving the performance of nmf to this end in this paper gene selection and explicitly enforcing sparseness are introduced into the factorization process particularly independent component analysis is employed to select subset of genes so that the effect of irrelevant or noisy genes can be reduced the nmf and its extensions sparse nmf and nmf with sparseness constraint are then used for tumor clustering on the selected genes series of elaborate experiments are performed by varying the number of clusters and the number of selected genes to evaluate the cooperation between different gene selection settings and nmf based clustering finally the experiments on three representative gene expression datasets demonstrated that the proposed scheme can achieve better clustering results
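A rough sketch of the pipeline described above, assuming scikit-learn: FastICA scores genes by the magnitude of their loadings, NMF is then run on the selected genes, and each sample is assigned to the metagene with its largest coefficient. The synthetic data, the number of selected genes and the number of clusters are placeholders, and the sparseness-constrained NMF variants are not shown.

```python
# Gene selection with ICA followed by NMF-based clustering (scikit-learn).
# Data and parameter choices are illustrative, not the paper's settings.
import numpy as np
from sklearn.decomposition import FastICA, NMF

rng = np.random.default_rng(0)
expression = np.abs(rng.normal(size=(60, 500)))   # 60 tumour samples x 500 genes

# 1. gene selection: keep genes with the largest ICA loadings
ica = FastICA(n_components=5, random_state=0).fit(expression)
gene_scores = np.abs(ica.components_).max(axis=0)
selected = np.argsort(gene_scores)[-100:]         # top 100 genes

# 2. NMF-based clustering on the selected genes
nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(expression[:, selected])    # samples x metagenes
clusters = W.argmax(axis=1)                       # cluster = dominant metagene
print(np.bincount(clusters))
```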
glare arises due to multiple scattering of light inside the camera’s body and lens optics and reduces image contrast while previous approaches have analyzed glare in image space we show that glare is inherently ray space phenomenon by statistically analyzing the ray space inside camera we can classify and remove glare artifacts in ray space glare behaves as high frequency noise and can be reduced by outlier rejection while such analysis can be performed by capturing the light field inside the camera it results in the loss of spatial resolution unlike light field cameras we do not need to reversibly encode the spatial structure of the ray space leading to simpler designs we explore masks for uniform and non uniform ray sampling and show practical solution to analyze the statistics without significantly compromising image resolution although diffuse scattering of the lens introduces low frequency glare we can produce useful solutions in variety of common scenarios our approach handles photography looking into the sun and photos taken without hood removes the effect of lens smudges and reduces loss of contrast due to camera body reflections we show various applications in contrast enhancement and glare manipulation
database processes must be cache efficient to effectively utilize modern hardware in this paper we analyze the importance of temporal locality and the resultant cache behavior in scheduling database operators for in memory block oriented query processing we demonstrate how the overall performance of workload of multiple database operators is strongly dependent on how they are interleaved with each other longer time slices combined with temporal locality within an operator amortize the effects of the initial compulsory cache misses needed to load the operator’s state such as hash table into the cache though running an operator to completion over all of its input results in the greatest amortization of cache misses this is typically infeasible because of the large intermediate storage requirement to materialize all input tuples to an operator we show experimentally that good cache performance can be obtained with smaller buffers whose size is determined at runtime we demonstrate low overhead method of runtime cache miss sampling using hardware performance counters our evaluation considers two common database operators with state aggregation and hash join sampling reveals operator temporal locality and cache miss behavior and we use those characteristics to choose an appropriate input buffer block size the calculated buffer size balances cache miss amortization with buffer memory requirements
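The buffer-sizing trade-off above can be captured with back-of-the-envelope arithmetic: the compulsory misses needed to reload an operator's state are amortized over the tuples of one input block, so larger blocks cost fewer misses per tuple but more memory. The numbers below are invented for illustration; in the paper they come from hardware performance counters sampled at runtime.

```python
# Back-of-the-envelope sketch of the buffer-sizing trade-off: reloading an
# operator's state (e.g. a hash table) costs a fixed number of compulsory
# cache misses, amortised over the tuples processed in one block.
# All numbers are illustrative assumptions.
def per_tuple_misses(block_size, state_reload_misses=50_000, steady_misses=0.8):
    return state_reload_misses / block_size + steady_misses

def smallest_good_block(tolerance=2.0,
                        candidates=(1_000, 5_000, 20_000, 100_000, 1_000_000)):
    """Smallest block whose per-tuple cost is within `tolerance` times the
    run-to-completion cost (an effectively infinite block)."""
    best = per_tuple_misses(10**9)
    return next((b for b in candidates if per_tuple_misses(b) <= tolerance * best),
                candidates[-1])

for b in (1_000, 5_000, 20_000, 100_000, 1_000_000):
    print(b, round(per_tuple_misses(b), 2))
print("chosen block size:", smallest_good_block())
```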
discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains enormous sizes of available datasets and possibly large number of mined patterns demand efficient scalable and parallel algorithms even though number of algorithms have been developed to efficiently parallelize frequent pattern discovery algorithms that are based on the candidate generation and counting framework the problem of parallelizing the more efficient projection based algorithms has received relatively little attention and existing parallel formulations have been targeted only toward shared memory architectures the irregular and unstructured nature of the task graph generated by these algorithms and the fact that these tasks operate on overlapping sub databases makes it challenging to efficiently parallelize these algorithms on scalable distributed memory parallel computing architectures in this paper we present and study variety of distributed memory parallel algorithms for tree projection based frequent sequence discovery algorithm that are able to minimize the various overheads associated with load imbalance database overlap and interprocessor communication our experimental evaluation on processor ibm sp show that these algorithms are capable of achieving good speedups substantially reducing the amount of the required work to find sequential patterns in large databases
recently network on chip noc architectures have gained popularity to address the interconnect delay problem for designing cmp multi core soc systems in deep sub micron technology however almost all prior studies have focused on noc designs since three dimensional integration has emerged to mitigate the interconnect delay problem exploring the noc design space in can provide ample opportunities to design high performance and energy efficient noc architectures in this paper we propose stacked noc router architecture called mira which unlike the routers in previous works is stacked into multiple layers and optimized to reduce the overall area requirements and power consumption we discuss the design details of four layer noc and its enhanced version with additional express channels and compare them against design and baseline design all the designs are evaluated using cycle accurate noc simulator and integrated with the orion power model for performance and power analysis the simulation results with synthetic and application traces demonstrate that the proposed multi layered noc routers can outperform the and naïve designs in terms of performance and power it can achieve up to reduction in power consumption and up to improvement in average latency with synthetic workloads with real workloads these benefits are around and respectively
topology control protocol aims to efficiently adjust the network topology of wireless networks in self adaptive fashion to improve the performance and scalability of networks this is especially essential to large scale multihop wireless networks eg wireless sensor networks fault tolerant topology control has been studied recently in order to achieve both sparseness ie the number of links is linear with the number of nodes and fault tolerance ie can survive certain level of node link failures different geometric topologies were proposed and used as the underlying network topologies for wireless networks however most of the existing topology control algorithms can only be applied to two dimensional networks where all nodes are distributed in plane in practice wireless networks may be deployed in three dimensional space such as under water wireless sensor networks in ocean or mobile ad hoc networks among space shuttles in space this article seeks to investigate self organizing fault tolerant topology control protocols for large scale wireless networks our new protocols not only guarantee connectivity of the network but also ensure the bounded node degree and constant power stretch factor even under minus node failures all of our proposed protocols are localized algorithms which only use one hop neighbor information and constant messages with small time complexity thus it is easy to update the topology efficiently and self adaptively for large scale dynamic networks our simulation confirms our theoretical proofs for all proposed topologies
designing reliable software for sensor networks is challenging because application developers have little visibility into and understanding of the post deployment behavior of code executing on resource constrained nodes in remote and ill reproducible environments to address this problem this paper presents hermes lightweight framework and prototype tool that provides fine grained visibility and control of sensor nodes software at run time hermes’s architecture is based on the notion of interposition which enables it to provide these properties in minimally intrusive manner without requiring any modification to software applications being observed and controlled hermes provides a general extensible and easy to use framework for specifying which software components to observe and control as well as when and how this observation and control is done we have implemented and tested fully functional prototype of hermes for the sos sensor operating system our performance evaluation using real sensor nodes as well as cycle accurate simulation shows that hermes successfully achieves its objective of providing fine grained and dynamic visibility and control without incurring significant resource overheads we demonstrate the utility and flexibility of hermes by using our prototype to design implement and evaluate three case studies debugging and testing deployed sensor network applications performing transparent software updates in sensor nodes and implementing network traffic shaping and resource policing
automatic test data generation leads to the identification of input values on which selected path or selected branch is executed within program path oriented vs goal oriented methods in both cases several approaches based on constraint solving exist but in the presence of pointer variables only path oriented methods have been proposed pointers are responsible for the existence of conditional aliasing problems that usually provoke the failure of the goal oriented test data generation process in this paper we propose an overall constraint based method that exploits the results of an intraprocedural points to analysis and provides two specific constraint combinators for automatically generating goal oriented test data this approach correctly handles multi level stack directed pointers that are mainly used in programs the method has been fully implemented in the test data generation tool inka and first experiences in applying it to variety of existing programs are presented
although examples of tourist guides abound the role of context aware feedback in such systems is an issue that has been insufficiently explored given the potential importance of such feedback this paper investigates from usability perspective two tour guide systems developed for brunel university one with context aware user feedback and the other without an empirical study was undertaken in which each of the applications was assessed through the prism of three usability measurements efficiency effectiveness and satisfaction incorporating the participant feedback gathered as result the paper compares the use of the two applications in order to determine the impact of real time feedback with respect to user location efficiency understood as the time taken by participant to successfully complete task was found to be significantly affected by the use of context aware functionality effectiveness understood as the amount of information participant assimilated from the application was shown not to be impacted by the provision of context aware feedback even though average experiment duration was found to be significantly shorter in this case lastly participants subjective satisfaction when using context aware functionality was shown to be significantly higher than when using the non context aware application
anytime algorithms have been proposed for many different applications eg in data mining their strengths are the ability to first provide result after very short initialization and second to improve their result with additional time therefore anytime algorithms have so far been used when the available processing time varies eg on varying data streams in this paper we propose to employ anytime algorithms on constant data streams ie for tasks with constant time allowance we introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the over all quality of the result with respect to the corresponding budget algorithm we derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets
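To make the anytime idea concrete, the sketch below runs an anytime 1-NN classifier under a constant per-item time budget: it always has a usable best-so-far answer and simply stops refining when the budget expires. The task, data and budget are illustrative and unrelated to the algorithms actually evaluated in the paper.

```python
# Anytime algorithm under a constant time allowance: an anytime 1-NN
# classifier scans reference points, always keeps a best-so-far answer, and
# stops refining when the per-item budget expires. Illustrative only.
import time
import numpy as np

rng = np.random.default_rng(1)
reference = rng.normal(size=(50_000, 16))
labels = rng.integers(0, 3, size=50_000)

def anytime_1nn(query, budget_s=0.002):
    deadline = time.perf_counter() + budget_s
    best_dist, best_label, examined = np.inf, None, 0
    for x, y in zip(reference, labels):
        d = float(np.sum((x - query) ** 2))
        if d < best_dist:
            best_dist, best_label = d, y      # an answer is always available
        examined += 1
        if time.perf_counter() >= deadline:   # constant time allowance used up
            break
    return best_label, examined

label, examined = anytime_1nn(rng.normal(size=16))
print(f"predicted {label} after examining {examined} reference points")
```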
the problem of clustering with side information has received much recent attention and metric learning has been considered as powerful approach to this problem until now various metric learning methods have been proposed for semi supervised clustering although some of the existing methods can use both positive must link and negative cannot link constraints they are usually limited to learning linear transformation ie finding global mahalanobis metric in this paper we propose framework for learning linear and non linear transformations efficiently we use both positive and negative constraints and also the intrinsic topological structure of data we formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem the proposed non linear method can be considered as an efficient kernel learning method that yields an explicit non linear transformation and thus shows out of sample generalization ability experimental results on synthetic and real world data sets show the effectiveness of our metric learning method for semi supervised clustering tasks
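As a minimal illustration of learning a global Mahalanobis metric from positive constraints, the sketch below whitens the covariance of within-pair differences (in the spirit of RCA). The paper's method additionally uses cannot-link constraints, the topological structure of the data and a kernelized non-linear variant, none of which appear here.

```python
# Learn a global Mahalanobis metric from must-link constraints by inverting
# the covariance of within-pair differences (RCA-style sketch).
# Cannot-link constraints and the kernelised variant are omitted.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
must_link = [(i, i + 1) for i in range(0, 40, 2)]     # assumed pairs

diffs = np.array([X[i] - X[j] for i, j in must_link])
within_cov = diffs.T @ diffs / len(diffs)
M = np.linalg.inv(within_cov + 1e-6 * np.eye(4))      # learned metric matrix

def mahalanobis(a, b):
    d = a - b
    return float(np.sqrt(d @ M @ d))

print(mahalanobis(X[0], X[1]), mahalanobis(X[0], X[100]))
```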
this paper presents novel compiler optimizations for reducing synchronization overhead in compiler parallelized scientific codes hybrid programming model is employed to combine the flexibility of the fork join model with the precision and power of the single program multiple data spmd model by exploiting compile time computation partitions communication analysis can eliminate barrier synchronization or replace it with less expensive forms of synchronization we show computation partitions and data communication can be represented as systems of symbolic linear inequalities for high flexibility and precision these optimizations have been implemented in the stanford suif compiler we extensively evaluate their performance using standard benchmark suites experimental results show barrier synchronization is reduced on average and by several orders of magnitude for certain programs
fine grained error or failure detection is often indispensable for precise effective and efficient reactions to runtime problems in this chapter we describe an approach that facilitates automatic generation of efficient runtime detectors for relevant classes of functional problems the technique targets failures that commonly manifest at the boundaries between the components that form the system it employs model based specification language that developers use to capture system level properties extracted from requirements specifications these properties are automatically translated into assertion like checks and inserted in all relevant locations of the systems code the main goals of our research are to define useful classes of system level properties identify errors and failures related to the violations of those properties and produce assertions capable of detecting such violations to this end we analyzed wide range of available software specifications bug reports for implemented systems and other sources of information about the developers intent such as test suites the collected information is organized in catalog of requirements level property descriptions these properties are used by developers to annotate their system design specifications and serve as the basis for automatic assertion generation
we propose framework to allow an agent to cope with inconsistent beliefs and to handle conflicting inferences our approach is based on well established line of research on assumption based argumentation frameworks and defeasible reasoning we propose language to allow defeasible assumptions and context sensitive priorities to be explicitly expressed and reasoned about by the agent our work reveals some interesting problems to conditional priority based argumentation and establishes the fundamental properties of these frameworks we also establish sufficient condition for conditional priority based argumentation to have unique stable extension based on the notion of stratification
the rapid development of networking technologies has made it possible to construct distributed database that involves huge number of sites query processing in such large scale system poses serious challenges beyond the scope of traditional distributed algorithms in this paper we propose new algorithm branca for performing top k retrieval in these environments integrating two orthogonal methodologies semantic caching and routing indexes branca is able to solve query by accessing only small number of servers our algorithmic findings are accompanied with solid theoretical analysis which rigorously proves the effectiveness of branca extensive experiments verify that our technique outperforms the existing methods significantly
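BRANCA itself combines semantic caching with routing indexes; the generic sketch below only shows why contacting servers for their local top-k can suffice, under the simplifying assumption that each object's complete score resides on exactly one server.

```python
# Generic distributed top-k sketch (not the BRANCA algorithm): assuming each
# object's full score lives on exactly one server, the coordinator needs only
# each server's local top-k and merges the candidates with a heap.
import heapq

servers = {                       # server -> {object: score}, illustrative data
    "s1": {"o1": 0.91, "o2": 0.40, "o3": 0.77},
    "s2": {"o4": 0.85, "o5": 0.12},
    "s3": {"o6": 0.95, "o7": 0.66, "o8": 0.30},
}

def local_topk(server, k):
    return heapq.nlargest(k, servers[server].items(), key=lambda kv: kv[1])

def global_topk(k):
    candidates = []
    for name in servers:                          # one round trip per server
        candidates.extend(local_topk(name, k))
    return heapq.nlargest(k, candidates, key=lambda kv: kv[1])

print(global_topk(3))   # -> [('o6', 0.95), ('o1', 0.91), ('o4', 0.85)]
```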
movement patterns of individual entities at the geographic scale are becoming prominent research focus in spatial sciences one pertinent question is how cognitive and formal characterizations of movement patterns relate in other words are mostly qualitative formal characterizations cognitively adequate this article experimentally evaluates movement patterns that can be characterized as paths through conceptual neighborhood graph that is two extended spatial entities changing their topological relationship gradually the central questions addressed are do humans naturally use topology to create cognitive equivalent classes that is is topology the basis for categorizing movement patterns spatially are all topological relations equally salient and does language influence categorization the first two questions are addressed using modification of the endpoint hypothesis stating that movement patterns are distinguished by the topological relation they end in the third question addresses whether language has an influence on the classification of movement patterns that is whether there is difference between linguistic and non linguistic category construction in contrast to our previous findings we were able to document the importance of topology for conceptualizing movement patterns but also reveal differences in the cognitive saliency of topological relations the latter aspect calls for weighted conceptual neighborhood graph to cognitively adequately model human conceptualization processes
turning the current web into semantic web requires automatic approaches for annotation of existing data since manual approaches will not scale in general we here present an approach for automatic generation of logic frames out of tables which subsequently supports the automatic population of ontologies from table like structures the approach consists of methodology an accompanying implementation and thorough evaluation it is based on grounded cognitive table model which is stepwise instantiated by our methodology
photographic supra projection is forensic process that aims to identify missing person from photograph and skull found one of the crucial tasks throughout all this process is the craniofacial superimposition which tries to find good fit between model of the skull and the photo of the face this photographic supra projection stage is usually carried out manually by forensic anthropologists it is thus very time consuming and presents several difficulties in this paper we aim to demonstrate that real coded evolutionary algorithms are suitable approaches to tackle craniofacial superimposition to do so we first formulate this complex task in forensic identification as numerical optimization problem then we adapt three different evolutionary algorithms to solve it two variants of real coded genetic algorithm and the state of the art evolution strategy cma es we also consider an existing binary coded genetic algorithm as baseline results on several superimposition problems of real world identification cases solved by the physical anthropology lab at the university of granada spain are considered to test our proposals
feature oriented programming fop decomposes complex software into features features are main abstractions in design and implementation they reflect user requirements and incrementally refine one another although features crosscut object oriented architectures they fail to express all kinds of crosscutting concerns this weakness is exactly the strength of aspects the main abstraction mechanism of aspect oriented programming aop in this article we contribute systematic evaluation and comparison of both paradigms aop and fop with focus on incremental software development it reveals that aspects and features are not competing concepts in fact aop has several strengths to improve fop in order to implement crosscutting features symmetrically the development model of fop can aid aop in implementing incremental designs consequently we propose the architectural integration of aspects and features in order to profit from both paradigms we introduce aspectual mixin layers amls an implementation approach that realizes this symbiosis subsequent evaluation and case study reveal that amls improve the crosscutting modularity of features as well as aspects become well integrated into incremental development style
empirical evidence suggests that reactive routing systems improve resilience to internet path failures they detect and route around faulty paths based on measurements of path performance this paper seeks to understand why and under what circumstances these techniques are effective to do so this paper correlates end to end active probing experiments loss triggered traceroutes of internet paths and bgp routing messages these correlations shed light on three questions about internet path failures where do failures appear how long do they last how do they correlate with bgp routing instability data collected over months from an internet testbed of topologically diverse hosts suggests that most path failures last less than fifteen minutes failures that appear in the network core correlate better with bgp instability than failures that appear close to end hosts on average most failures precede bgp messages by about four minutes but there is often increased bgp traffic both before and after failures our findings suggest that reactive routing is most effective between hosts that have multiple connections to the internet the data set also suggests that passive observations of bgp routing messages could be used to predict about of impending failures allowing re routing systems to react more quickly to failures
the experiments in this paper apply the idea of prototyping programming language tools from robust semantics we used partial evaluator similix to turn interpreters into inverse interpreters this way we generated inverse interpreters for several small languages including interpreters for turing machines an applied lambda calculus flowchart language and subset of java bytecode limiting factors of online partial evaluation were the polyvariant specialization scheme with its lack of generalization advantages were the availability of higher order values to specialize breadth first tree traversal this application of self applicable partial evaluation is different from the classical futamura projections that tell us how to translate program by specialization of an interpreter
distributed computing is currently undergoing paradigm shift towards large scale dynamic systems where thousands of nodes collaboratively solve computational tasks examples of such emerging systems include autonomous sensor networks data grids wireless mesh network wmn infrastructures and more we argue that speculative computations will be instrumental to successfully performing meaningful computations in such systems moreover solutions deployed in such platforms will need to be as local as possible
in this paper we suggest new way to add multi touch capabilities to an lc screen for this ftir and ir sensing behind the lc screen will be combined using large infrared sensor array mounted behind the lc matrix infrared light in front of the screen which is strong enough to pass through the lc screen’s components can be detected the ftir technology is able to deliver such infrared light to the integrated sensors when touching the screen with the fingers for the prototype the key parameters of the ftir principle were experimentally analyzed to optimize the sensor reactivity the sensor prototype can simultaneously detect multiple touches with an accuracy of around mm and with an update rate of hz
this paper introduces queuing network models for the performance analysis of spmd applications executed on general purpose parallel architectures such as mimd and clusters of workstations the models are based on the pattern of computation communication and operations of typical parallel applications analysis of the models leads to the definition of speedup surfaces which capture the relative influence of processors and parallelism and show the effects of different hardware and software components on the performance since the parameters of the models correspond to measurable program and hardware characteristics the models can be used to anticipate the performance behavior of parallel application as function of the target architecture ie number of processors number of disks topology etc
this paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories content based collaborative and hybrid recommendation approaches this paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications these extensions include among others an improvement of understanding of users and items incorporation of the contextual information into the recommendation process support for multicriteria ratings and provision of more flexible and less intrusive types of recommendations
this paper studies sentiment analysis of conditional sentences the aim is to determine whether opinions expressed on different topics in conditional sentence are positive negative or neutral conditional sentences are one of the commonly used language constructs in text in typical document there are around of such sentences due to the condition clause sentiments expressed in conditional sentence can be hard to determine for example in the sentence if your nokia phone is not good buy this great samsung phone the author is positive about samsung phone but does not express an opinion on nokia phone although the owner of the nokia phone may be negative about it however if the sentence does not have if the first clause is clearly negative although if commonly signifies conditional sentence there are many other words and constructs that can express conditions this paper first presents linguistic analysis of such sentences and then builds some supervised learning models to determine if sentiments expressed on different topics in conditional sentence are positive negative or neutral experimental results on conditional sentences from diverse domains are given to demonstrate the effectiveness of the proposed approach
in vision and graphics advanced object models require not only shape but also surface detail while several scanning devices exist to capture the global shape of an object few methods concentrate on capturing the fine scale detail fine scale surface geometry relief texture such as surface markings roughness and imprints is essential in highly realistic rendering and accurate prediction we present novel approach for measuring the relief texture of specular or partially specular surfaces using specialized imaging device with concave parabolic mirror to view multiple angles in single image laser scanning typically fails for specular surfaces because of light scattering but our method is explicitly designed for specular surfaces also the spatial resolution of the measured geometry is significantly higher than standard methods so very small surface details are captured furthermore spatially varying reflectance is measured simultaneously ie both texture color and texture shape are retrieved
the emergence of numerous data sources online has presented pressing need for more automatic yet accurate data integration techniques for the data returned from querying such sources most works focus on how to extract the embedded structured data more accurately however to eventually provide an integrated access to these query results last but not least step is to combine the extracted data coming from different sources critical task is finding the correspondence of the data fields between the sources problem well known as schema matching query results are small and biased sample set of instances obtained from sources the obtained schema information is thus very implicit and incomplete which often prevents existing schema matching approaches from performing effectively in this paper we develop novel framework for understanding and effectively supporting schema matching on such instance based data especially for integrating multiple sources we view discovering matching as constructing more complete domain schema that best describes the input data with this conceptual view we can leverage various data instances and observed regularities seamlessly with holistic multiple source schema matching to achieve more accurate matching results our experiments show that our framework consistently outperforms baseline pairwise and clustering based approaches raising measure from to and works uniformly well for the surveyed domains
in most processors caches account for the largest fraction of on chip transistors thus being primary candidate for tackling the leakage problem existing architectural solutions usually rely on customized cache structures which are needed to implement some kind of power management policy memory arrays however are carefully developed and finely tuned by foundries and their internal structure is typically non accessible to system designers in this work we focus on the reduction of leakage energy in caches without interfering with their internal design we propose a truly architectural solution that is based on cache sub banking and on the detection and mapping of the application localities detected from profiling of the cache access patterns by customizing the mapping between the application address space and the cache we can expose as much address space idleness as possible thus resulting in shutdown potential which allows significant leakage saving results show leakage energy reduction of up to about on average with marginal impact on miss rate or execution time
in this paper we describe several issues end users may face when developing web mashup applications in visual language tools like yahoo pipes we explore how these problems manifest themselves in the conversations users have in the associated discussion forums and examine the community practices and processes at work in collaborative debugging and problem solving we have noticed two valences of engagement in the community core and peripheral core engagement involves active question asking and answering and contribution of example content peripheral engagement refers to those who read but don’t post and those who post legitimate questions and content but whose posts receive no response we consider what the characteristics are of each of these groups why there is such strong divide and how the periphery functions in the community process
we present an approach that addresses both formal specification and verification as well as runtime enforcement of rbac access control policies including application specific constraints such as separation of duties sod we introduce temporal cal formal language based on and temporal logic which provides domain specific predicates for expressing rbac and sod constraints an aspect oriented language with domain specific concepts for rbac and sod constraints is used for the runtime enforcement of policies enforcement aspects are automatically generated from temporal cal specifications hence avoiding the possibility of errors and inconsistencies that may be introduced when enforcement code is written manually furthermore the use of aspects ensures the modularity of the enforcement code and its separation from the business logic
the existing knowledge discovery and data mining kdd field seldom delivers results that businesses can act on directly this issue's trends and controversies department presents seven short articles reporting on different aspects of domain driven kdd an area that targets the development of effective methodologies and techniques for delivering actionable knowledge in given domain especially business
part structure and articulation are of fundamental importance in computer and human vision we propose using the inner distance to build shape descriptors that are robust to articulation and capture part structure the inner distance is defined as the length of the shortest path between landmark points within the shape silhouette we show that it is articulation insensitive and more effective at capturing part structures than the euclidean distance this suggests that the inner distance can be used as replacement for the euclidean distance to build more accurate descriptors for complex shapes especially for those with articulated parts in addition texture information along the shortest path can be used to further improve shape classification with this idea we propose three approaches to using the inner distance the first method combines the inner distance and multidimensional scaling mds to build articulation invariant signatures for articulated shapes the second method uses the inner distance to build new shape descriptor based on shape contexts the third one extends the second one by considering the texture information along shortest paths the proposed approaches have been tested on variety of shape databases including an articulated shape data set mpeg ce shape kimia silhouettes the eth data set two leaf data sets and human motion silhouette data set in all the experiments our methods demonstrate effective performance compared with other algorithms
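A simple way to see the inner distance is on a rasterized silhouette: the length of the shortest path between two landmark pixels that stays inside the shape, here computed by BFS on a 4-connected binary mask. The shape below is a toy example; the paper works with contour landmarks and uses many such distances to build descriptors.

```python
# Grid approximation of the inner distance: shortest path between two
# landmark pixels constrained to stay inside the silhouette, via BFS on a
# 4-connected binary mask. Shape and landmarks are illustrative.
from collections import deque

shape = [                                  # 1 = inside the silhouette
    "0111110000011111",
    "0111110000011111",
    "0111111111111111",
    "0111110000011111",
    "0111110000011111",
]
mask = [[c == "1" for c in row] for row in shape]

def inner_distance(start, goal):
    rows, cols = len(mask), len(mask[0])
    dist = {start: 0}
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    return float("inf")

# The straight-line gap between the landmarks is 14 columns, but the inner
# path must detour through the central bridge row, so the inner distance is 18.
print(inner_distance((0, 1), (0, 15)))
```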
instant messaging is popular medium for both social and work related communication in this paper we report an investigation of the effect of interpersonal relationship on underlying basic communication characteristics such as messaging rate and duration using large corpus of instant messages our results show that communication characteristics differ significantly for communications between users who are in work relationship and between users who are in social relationship we used our findings to inform the creation of statistical models that predict the relationship between users without the use of message content achieving an accuracy of nearly for one such model we discuss the results of our analyses and potential uses of these models
on line analytical processing olap refers to the technologies that allow users to efficiently retrieve data from the data warehouse for decision support purposes data warehouses tend to be extremely large it is quite possible for data warehouse to be hundreds of gigabytes to terabytes in size queries tend to be complex and ad hoc often requiring computationally expensive operations such as joins and aggregation given this we are interested in developing strategies for improving query processing in data warehouses by exploring the applicability of parallel processing techniques in particular we exploit the natural partitionability of star schema and render it even more efficient by applying dataindexes storage structure that serves both as an index as well as data and lends itself naturally to vertical partitioning of the data dataindexes are derived from the various special purpose access mechanisms currently supported in commercial olap products specifically we propose declustering strategy which incorporates both task and data partitioning and present the parallel star join psj algorithm which provides means to perform star join in parallel using efficient operations involving only rowsets and projection columns we compare the performance of the psj algorithm with two parallel query processing strategies the first is parallel join strategy utilizing the bitmap join index bji arguably the state of the art olap join structure in use today for the second strategy we choose well known parallel join algorithm namely the pipelined hash algorithm to assist in the performance comparison we first develop cost model of the disk access and transmission costs for all three approaches performance comparisons show that the dataindex based approach leads to dramatically lower disk access costs than the bji as well as the hybrid hash approaches in both speedup and scaleup experiments while the hash based approach outperforms the bji in disk access costs with regard to transmission overhead our performance results show that psj and bji outperform the hash based approach overall our parallel star join algorithm and dataindexes form winning combination
while conventional ranking algorithms such as the pagerank rely on the web structure to decide the relevancy of web page learning to rank seeks function capable of ordering set of instances using supervised learning approach learning to rank has gained increasing popularity in information retrieval and machine learning communities in this paper we propose novel nonlinear perceptron method for rank learning the proposed method is an online algorithm and simple to implement it introduces kernel function to map the original feature space into nonlinear space and employs perceptron method to minimize the ranking error by avoiding converging to solution near the decision boundary and alleviating the effect of outliers in the training dataset furthermore unlike existing approaches such as ranksvm and rankboost the proposed method is scalable to large datasets for online learning experimental results on benchmark corpora show that our approach is more efficient and achieves higher or comparable accuracies in instance ranking than state of the art methods such as frank ranksvm and rankboost
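As a rough illustration of pairwise rank learning with a kernelized perceptron (the paper's exact update rule, margin handling and outlier treatment are not reproduced here), the sketch below stores mis-ranked pairs as support vectors of an RBF-kernel scoring function; the kernel width, the toy data and the preference construction are assumptions.

    # a minimal sketch of an online kernel perceptron for pairwise rank learning
    # (illustrative only; not the method proposed in the abstract above).
    import numpy as np

    def rbf(x, z, gamma=1.0):
        return np.exp(-gamma * np.sum((x - z) ** 2))

    class KernelRankPerceptron:
        """f(x) = sum_i alpha_i * K(s_i, x); on a mis-ranked pair (a preferred over b)
        store a with weight +1 and b with weight -1, the standard kernel perceptron update."""
        def __init__(self, gamma=1.0):
            self.gamma = gamma
            self.support = []
            self.alphas = []

        def score(self, x):
            return sum(a * rbf(s, x, self.gamma) for a, s in zip(self.alphas, self.support))

        def update(self, preferred, other):
            if self.score(preferred) - self.score(other) <= 0:   # pair is mis-ranked
                self.support.extend([preferred, other])
                self.alphas.extend([1.0, -1.0])

    # toy usage: the true ranking is by the first feature
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    model = KernelRankPerceptron(gamma=0.5)
    for _ in range(2000):
        i, j = rng.integers(0, len(X), size=2)
        if X[i, 0] == X[j, 0]:
            continue
        a, b = (X[i], X[j]) if X[i, 0] > X[j, 0] else (X[j], X[i])
        model.update(a, b)
    scores = np.array([model.score(x) for x in X])
    print(np.corrcoef(scores, X[:, 0])[0, 1])   # correlation with the true ranking criterion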
tcp congestion control has been designed to ensure internet stability along with fair and efficient allocation of the network bandwidth during the last decade many congestion control algorithms have been proposed to improve the classic tahoe reno tcp congestion control this paper aims at evaluating and comparing three control algorithms which are westwood new reno and vegas tcp using both ns simulations and live internet measurements simulation scenarios are carefully designed in order to investigate goodput fairness and friendliness provided by each of the algorithms results show that westwood tcp is friendly towards new reno tcp and improves fairness in bandwidth allocation whereas vegas tcp is fair but it is not able to grab its bandwidth share when coexisting with reno or in the presence of reverse traffic because of its rtt based congestion detection mechanism finally results show that westwood remarkably improves utilization of wireless links that are affected by losses not due to congestion
in the framework of data interpretation for petroleum exploration this paper makes two contributions for visual exploration aiming to manually segment surfaces embedded in volumetric data resulting from user centered design approach the first contribution dynamic picking is new method of viewing slices dedicated to surface tracking ie fault picking from large seismic data sets the proposed method establishes new paradigm of interaction breaking with the conventional slices method usually used by geoscientists based on the time visualization method dynamic picking facilitates the localization of faults by taking advantage of the intrinsic ability of the human visual system to detect dynamic changes in textured data the second contribution projective slice is focus context visualization technique that offers the advantage of facilitating the anticipation of upcoming slices over the sloping surface from the reported experimental results dynamic picking leads to good compromise between fitting precision and completeness of picking while the projective slice significantly reduces the amount of workload for an equivalent level of precision
while prior relevant research in active object recognition pose estimation has mostly focused on single camera systems we propose two multi camera solutions to this problem that can enhance object recognition rate particularly in the presence of occlusion in the proposed methods multiple cameras simultaneously acquire images from different view angles of an unknown randomly occluded object belonging to set of priori known objects by processing the available information within recursive bayesian framework at each step the recognition algorithms attempt to classify the object if its identity pose can be determined with high confidence level otherwise the algorithms would compute the next most informative camera positions for capturing more images the principal component analysis pca is used to produce measurement vector based on the acquired images occlusions in the images are handled by novel probabilistic modelling approach that can increase the robustness of the recognition process with respect to structured noise the camera positions at each recognition step are selected based on two statistical metrics quantifying the quality of the observations namely the mutual information mi and the cramer rao lower bound crlb while the former has also been used in prior relevant work the latter is new in the context of object recognition extensive monte carlo experiments conducted with two camera system demonstrate the effectiveness of the proposed approaches
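The recursive Bayesian step can be pictured independently of the PCA measurement model and the camera-selection metrics. The sketch below assumes per-view hypothesis likelihoods are already available and simply shows the posterior update plus the classify-or-capture-more decision; the confidence threshold and the toy likelihood values are assumptions.

    # a minimal sketch of the recursive bayesian update used for multi-view recognition;
    # the per-view likelihoods would come from a pca-based measurement model in practice.
    import numpy as np

    def recursive_update(prior, likelihoods_per_view):
        """prior: shape (num_hypotheses,); likelihoods_per_view: iterable of same-shape arrays."""
        posterior = np.asarray(prior, dtype=float)
        for lik in likelihoods_per_view:
            posterior = posterior * np.asarray(lik, dtype=float)
            posterior /= posterior.sum()          # renormalize after each view
        return posterior

    def decide(posterior, confidence=0.9):
        """classify if the best hypothesis is confident enough, otherwise request more views."""
        best = int(np.argmax(posterior))
        return (best if posterior[best] >= confidence else None), float(posterior[best])

    prior = np.ones(4) / 4.0                       # four a-priori known object hypotheses
    views = [np.array([0.2, 0.5, 0.2, 0.1]),       # likelihood of each hypothesis per captured view
             np.array([0.1, 0.7, 0.1, 0.1])]
    print(decide(recursive_update(prior, views)))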
today many companies rely on third party applications and application services for part of their information systems when applications from different parties are used together an integration problem arises similarly cross organisational application integration requires the coordination of distributed processing across several autonomous applications in this paper we describe an integration approach based on an event based coordination paradigm interaction is based on atomic units of interaction called business events each business event mirrors some event in the real world that requires the coordination of actions within number of components the coordination between applications is achieved by having applications specify preconditions for business events as result business event becomes small scale contract between involved applications each application can insert its own clauses into the contract by specifying preconditions moreover formal method for contract analysis is proposed to verify whether the contract is free from contradictions and inconsistencies finally in addition to its contracting aspect the event based communication paradigm entails dispatching and coordination mechanism which offers the additional advantage of complete separation of the coordination aspects from the functionality aspects the paper discusses different alternative architectures for event based coordination with particular emphasis on distributed loosely coupled environments such as web services
we introduce erasure pure type systems an extension to pure type systems with an erasure semantics centered around type constructor indicating parametric polymorphism the erasure phase is guided by lightweight program annotations the typing rules guarantee that well typed programs obey phase distinction between erasable compile time and non erasable run time terms the erasability of an expression depends only on how its value is used in the rest of the program despite this simple observation most languages treat erasability as an intrinsic property of expressions leading to code duplication problems our approach overcomes this deficiency by treating erasability extrinsically because the execution model of epts generalizes the familiar notions of type erasure and parametric polymorphism we believe functional programmers will find it quite natural to program in such setting
we now have incrementally grown databases of text documents ranging back for over decade in areas ranging from personal email to news articles and conference proceedings while accessing individual documents is easy methods for overviewing and understanding these collections as whole are lacking in number and in scope in this paper we address one such global analysis task namely the problem of automatically uncovering how ideas spread through the collection over time we refer to this problem as information genealogy in contrast to bibliometric methods that are limited to collections with explicit citation structure we investigate content based methods requiring only the text and timestamps of the documents in particular we propose language modeling approach and likelihood ratio test to detect influence between documents in statistically well founded way furthermore we show how this method can be used to infer citation graphs and to identify the most influential documents in the collection experiments on the nips conference proceedings and the physics arxiv show that our method is more effective than methods based on document similarity
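One hedged way to picture the language-modeling idea (not the paper's exact formulation) is to score how much better a later document is explained when an earlier document's unigram model is mixed into a background model; the mixture weight, the additive smoothing and the toy documents below are assumptions.

    # a minimal sketch of a content-based influence score between documents: compare the
    # log-likelihood of a later document under (background mixed with an earlier document's
    # unigram model) versus the background model alone.
    from collections import Counter
    import math

    def unigram(tokens, vocab, alpha=0.1):
        counts = Counter(tokens)
        total = len(tokens) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def log_likelihood(tokens, model):
        return sum(math.log(model[w]) for w in tokens)

    def influence_score(earlier, later, background_docs, lam=0.5):
        vocab = set(earlier) | set(later) | {w for d in background_docs for w in d}
        bg = unigram([w for d in background_docs for w in d], vocab)
        src = unigram(earlier, vocab)
        mixed = {w: lam * src[w] + (1 - lam) * bg[w] for w in vocab}
        # positive scores suggest the later document is better explained given the earlier one
        return log_likelihood(later, mixed) - log_likelihood(later, bg)

    background = [["the", "cat", "sat"], ["a", "dog", "ran", "home"]]
    earlier = ["spectral", "clustering", "of", "graphs"]
    later = ["we", "extend", "spectral", "clustering", "to", "graphs"]
    print(influence_score(earlier, later, background))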
method for human body modeling from set of images is proposed this method is based upon the deformation of predefined generic polygonal human mesh towards specific one which should be very similar to the subject when projected on the input images firstly the user defines several feature points on the generic model then rough specific model is obtained via matching the feature points of the model to the corresponding ones of the images and deforming the generic model secondly the reconstruction is improved by matching the silhouettes of the deformed model to those of the images thirdly the result is refined by adopting three filters finally texture mapping and skinning are implemented
this paper explores the idea that redundant operations like type errors commonly flag correctness errors we experimentally test this idea by writing and applying four redundancy checkers to the linux operating system finding many errors we then use these errors to demonstrate that redundancies even when harmless strongly correlate with the presence of traditional hard errors eg null pointer dereferences unreleased locks finally we show how flagging redundant operations gives a way to make specifications fail stop by detecting dangerous omissions
the usage of descriptive data mining methods for predictive purposes is recent trend in data mining research it is well motivated by the understandability of learned models the limitation of the so called horizon effect and by the fact that it is multi task solution in particular associative classification whose main idea is to exploit association rules discovery approaches in classification gathered lot of attention in recent years similar idea is represented by the use of emerging patterns discovery for classification purposes emerging patterns are classes of regularities whose support significantly changes from one class to another and the main idea is to exploit class characterization provided by discovered emerging patterns for class labeling in this paper we propose and compare two distinct emerging patterns based classification approaches that work in the relational setting experiments empirically prove the effectiveness of both approaches and confirm the advantage with respect to associative classification
longest prefix matching lpm is fundamental part of various network processing tasks previously proposed approaches for lpm result in prohibitive cost and power dissipation tcams or in large memory requirements and long lookup latencies tries when considering future line rates table sizes and key lengths eg ipv hash based approaches appear to be an excellent candidate for lpm with the possibility of low power compact storage and latencies however there are two key problems that hinder their practical deployment as lpm solutions first naïve hash tables incur collisions and resolve them using chaining adversely affecting worst case lookup rate guarantees that routers must provide second hash functions cannot directly operate on wildcard bits requirement for lpm and current solutions require either considerably complex hardware or large storage space in this paper we propose novel architecture which successfully addresses for the first time both key problems in hash based lpm making the following contributions we architect an lpm solution based upon recently proposed collision free hashing scheme called bloomier filter by eliminating its false positives in storage efficient way we propose novel scheme called prefix collapsing which provides support for wildcard bits with small additional storage and reduced hardware complexity we exploit prefix collapsing and key characteristics found in real update traces to support fast and incremental updates feature generally not available in collision free hashing schemes
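For orientation only, the sketch below shows the plain software baseline that hash-based LPM schemes start from: one hash table per prefix length, probed longest-first. It is not the proposed Bloomier-filter architecture or the prefix-collapsing hardware; it just makes the structure concrete, since every distinct prefix length needs its own table and every wildcard pattern must be reduced to exact-match probes. All route entries and the IPv4 setting are assumptions for the example.

    # a minimal software sketch of hash-based longest prefix matching (one dict per prefix
    # length, probed from longest to shortest); not the hardware scheme described above.
    import ipaddress

    class HashLPM:
        def __init__(self):
            self.tables = {}                     # prefix length -> {masked prefix int: next hop}

        def insert(self, cidr, next_hop):
            net = ipaddress.ip_network(cidr)
            self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

        def lookup(self, addr):
            a = int(ipaddress.ip_address(addr))
            bits = 32
            for plen in sorted(self.tables, reverse=True):    # longest prefix first
                masked = a & ~((1 << (bits - plen)) - 1)      # zero the wildcard (host) bits
                hop = self.tables[plen].get(masked)
                if hop is not None:
                    return hop
            return None

    fib = HashLPM()
    fib.insert("10.0.0.0/8", "A")
    fib.insert("10.1.0.0/16", "B")
    print(fib.lookup("10.1.2.3"), fib.lookup("10.9.9.9"))     # longer prefix wins for the first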
document clustering is an important tool for text analysis and is used in many different applications we propose to incorporate prior knowledge of cluster membership for document cluster analysis and develop novel semi supervised document clustering model the method models set of documents with weighted graph in which each document is represented as vertex and each edge connecting pair of vertices is weighted with the similarity value of the two corresponding documents the prior knowledge indicates pairs of documents that are known to belong to the same cluster then the prior knowledge is transformed into set of constraints the document clustering task is accomplished by finding the best cuts of the graph under the constraints we apply the model to the normalized cut method to demonstrate the idea and concept our experimental evaluations show that the proposed document clustering model reveals remarkable performance improvements with very limited training samples and hence is very effective semi supervised classification tool
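A minimal way to see the graph-cut formulation (restricted here to a two-way split, which is simpler than the general model) is to boost the edge weights of must-link document pairs and then take the sign of the second eigenvector of the normalized Laplacian; the boost value, the toy similarity matrix and the single constraint below are assumptions.

    # a minimal sketch: encode must-link prior knowledge by boosting edge weights in the
    # document similarity graph, then take a normalized-cut-style two-way split from the
    # second eigenvector of the normalized laplacian.
    import numpy as np

    def constrained_two_way_cut(similarity, must_link, boost=10.0):
        W = np.array(similarity, dtype=float)
        for i, j in must_link:                       # prior knowledge: these pairs share a cluster
            W[i, j] = W[j, i] = max(W[i, j], boost)
        d = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(d)
        L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
        vals, vecs = np.linalg.eigh(L)
        fiedler = vecs[:, 1]                         # eigenvector of the second-smallest eigenvalue
        return (fiedler > 0).astype(int)             # sign split approximates the best two-way cut

    # toy similarity matrix: documents 0-1 and 3-5 form two groups, document 2 is ambiguous;
    # the must-link pair (2, 5) encodes that document 2 belongs with the second group
    S = [[1.0, 0.9, 0.4, 0.1, 0.1, 0.1],
         [0.9, 1.0, 0.4, 0.1, 0.1, 0.1],
         [0.4, 0.4, 1.0, 0.4, 0.4, 0.4],
         [0.1, 0.1, 0.4, 1.0, 0.9, 0.8],
         [0.1, 0.1, 0.4, 0.9, 1.0, 0.7],
         [0.1, 0.1, 0.4, 0.8, 0.7, 1.0]]
    print(constrained_two_way_cut(S, must_link=[(2, 5)]))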
web usage mining possibly used in conjunction with standard approaches to personalization such as collaborative filtering can help address some of the shortcomings of these techniques including reliance on subjective user ratings lack of scalability and poor performance in the face of high dimensional and sparse data however the discovery of patterns from usage data by itself is not sufficient for performing the personalization tasks the critical step is the effective derivation of good quality and useful ie actionable "aggregate usage profiles" from these patterns in this paper we present and experimentally evaluate two techniques based on clustering of user transactions and clustering of pageviews in order to discover overlapping aggregate profiles that can be effectively used by recommender systems for real time web personalization we evaluate these techniques both in terms of the quality of the individual profiles generated as well as in the context of providing recommendations as an integrated part of personalization engine in particular our results indicate that using the generated aggregate profiles we can achieve effective personalization at early stages of users visits to site based only on anonymous clickstream data and without the benefit of explicit input by these users or deeper knowledge about them
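A hedged sketch of the profile-derivation step (not the specific clustering techniques evaluated in the paper): given transaction-to-cluster assignments from any clustering algorithm, an aggregate usage profile keeps the pageviews whose mean weight within the cluster exceeds a threshold; the threshold, the page names and the toy transactions are assumptions.

    # a minimal sketch of deriving aggregate usage profiles from clustered user transactions.
    import numpy as np

    def aggregate_profiles(transactions, labels, page_ids, min_weight=0.5):
        """transactions: (n_transactions, n_pages) weight matrix; labels: cluster id per transaction."""
        T = np.asarray(transactions, dtype=float)
        labels = np.asarray(labels)
        profiles = {}
        for c in sorted(set(labels.tolist())):
            mean_w = T[labels == c].mean(axis=0)
            profiles[c] = {page_ids[p]: round(float(mean_w[p]), 2)
                           for p in np.where(mean_w >= min_weight)[0]}
        return profiles

    pages = ["home", "laptops", "phones", "checkout"]
    tx = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1]]
    print(aggregate_profiles(tx, labels=[0, 0, 1, 1], page_ids=pages))

A recommender can then match the active anonymous session against these profiles (for example by cosine similarity) to produce real-time recommendations.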
the article addresses the problem of copyright protection for motion captured data by designing robust blind watermarking mechanism the mechanism segments motion capture data and identifies clusters of points per segment watermark can be embedded and extracted within these clusters by using proposed extension of quantization index modulation the watermarking scheme is blind in nature and the encoded watermarks are shown to be imperceptible and secure the resulting hiding capacity has bounds based on cluster size the watermarks are shown to be robust against attacks such as uniform affine transformations scaling rotation and translation cropping reordering and noise addition the time complexity for watermark embedding and extraction is estimated as log and log respectively
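The quantization index modulation primitive that the scheme extends can be shown on scalars; the sketch below embeds one bit per value by snapping it onto one of two interleaved quantizer lattices and decodes by nearest lattice. The cluster segmentation and the security aspects of the paper are not reproduced, and the step size and noise level are assumptions.

    # a minimal sketch of scalar quantization index modulation (qim) embedding and extraction.
    import numpy as np

    def qim_embed(value, bit, step=0.5):
        offset = 0.0 if bit == 0 else step / 2.0
        return np.round((value - offset) / step) * step + offset

    def qim_extract(value, step=0.5):
        d0 = abs(value - qim_embed(value, 0, step))   # distance to the "0" lattice
        d1 = abs(value - qim_embed(value, 1, step))   # distance to the "1" lattice
        return 0 if d0 <= d1 else 1

    rng = np.random.default_rng(1)
    samples = rng.normal(size=8)                       # stand-ins for per-cluster features
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = [qim_embed(v, b) for v, b in zip(samples, bits)]
    noisy = [v + rng.normal(scale=0.03) for v in marked]   # mild noise-addition attack
    print([qim_extract(v) for v in noisy] == bits)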
many multimedia applications can benefit from techniques for adapting existing classifiers to data with different distributions one example is cross domain video concept detection which aims to adapt concept classifiers across various video domains in this paper we explore two key problems for classifier adaptation how to transform existing classifier into an effective classifier for new dataset that only has limited number of labeled examples and how to select the best existing classifier for adaptation for the first problem we propose adaptive support vector machines svms as general method to adapt one or more existing classifiers of any type to the new dataset it aims to learn the delta function between the original and adapted classifier using an objective function similar to svms for the second problem we estimate the performance of each existing classifier on the sparsely labeled new dataset by analyzing its score distribution and other meta features and select the classifiers with the best estimated performance the proposed method outperforms several baseline and competing methods in terms of classification accuracy and efficiency in cross domain concept detection in the trecvid corpus
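A minimal sketch of the delta-function idea (a simplification, not the paper's exact adaptive-SVM objective): keep the existing classifier's score fixed and learn a small linear correction on the sparsely labeled new-domain data with a hinge loss via subgradient descent; the regularization, learning rate, toy domains and the stand-in auxiliary classifier are assumptions.

    # a minimal sketch: learn a linear "delta" on top of a fixed auxiliary classifier's score.
    import numpy as np

    def train_delta(X, y, f_aux, lam=0.1, lr=0.05, epochs=500):
        """X: (n, d); y in {-1, +1}; f_aux: callable giving the old classifier's score."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        base = np.array([f_aux(x) for x in X])
        for _ in range(epochs):
            margins = y * (base + X @ w + b)
            active = margins < 1                      # hinge subgradient only on violated margins
            grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            w -= lr * grad_w
            b -= lr * grad_b
        return lambda x: f_aux(x) + x @ w + b         # adapted classifier

    # toy setting: the old classifier has the right direction but is shifted for the new domain
    rng = np.random.default_rng(0)
    X_new = rng.normal(size=(40, 2)) + np.array([1.5, 0.0])
    y_new = np.where(X_new[:, 0] > 1.5, 1, -1)
    old_clf = lambda x: x[0]                          # stand-in for the auxiliary classifier's score
    adapted = train_delta(X_new, y_new, old_clf)
    print(np.mean([np.sign(adapted(x)) == t for x, t in zip(X_new, y_new)]))  # new-domain accuracy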
in this paper we investigate the retrieval performance of monophonic and polyphonic queries made on polyphonic music database we extend the gram approach for full music indexing of monophonic music data to polyphonic music using both rhythm and pitch information we define an experimental framework for comparative and fault tolerance study of various gramming strategies and encoding levels for monophonic queries we focus in particular on query by humming systems and for polyphonic queries on query by example error models addressed in several studies are surveyed for the fault tolerance study our experiments show that different gramming strategies and encoding precision differ widely in their effectiveness we present the results of our study on collection of polyphonic midi encoded music pieces
policies have traditionally been way to specify properties of system in this paper we show how policies can be applied to the organization model for adaptive computational systems omacs in omacs policies may constrain assignments of agents to roles the structure of the goal model for the organization or how an agent may play particular role in this paper we focus on policies limiting system traces this is done to leverage the work already done for specification and verification of properties in concurrent programs we show how traditional policies can be characterized as law policies that is they must always be followed by system in the context of multiagent systems law policies limit the flexibility of the system thus in order to preserve the system flexibility while still being able to guide the system into preferring certain behaviors we introduce the concept of guidance policies these guidance policies need not always be followed when the system cannot continue with the guidance policies they may be suspended we show how this can guide how the system achieves the top level goal while not decreasing flexibility of the system guidance policies are formally defined and since multiple guidance policies can introduce conflicts strategy for resolving conflicts is given
in many real life computer and networking applications the distributions of service times or times between arrivals of requests or both can deviate significantly from the memoryless negative exponential distribution that underpins the product form solution for queueing networks frequently the coefficient of variation of the distributions encountered is well in excess of one which would be its value for the exponential for closed queueing networks with non exponential servers there is no known general exact solution and most if not all approximation methods attempt to account for the general service time distributions through their first two moments we consider two simple closed queueing networks which we solve exactly using semi numerical methods these networks depart from the structure leading to product form solution only to the extent that the service time at single node is non exponential we show that not only the coefficients of variation but also higher order distributional properties can have an important effect on such customary steady state performance measures as the mean number of customers at resource or the resource utilization level in closed network additionally we examine the state that request finds upon its arrival at server which is directly tied to the resulting quality of service although the well known arrival theorem holds exactly only for product form networks of queues some approximation methods assume that it can be applied to reasonable degree also in other closed queueing networks we investigate the validity of this assumption in the two closed queueing models considered our results show that even in the case when there is single non exponential server in the network the state found upon arrival may be highly sensitive to higher order properties of the service time distribution beyond its mean and coefficient of variation this dependence of mean numbers of customers at server on higher order distributional properties is in stark contrast with the situation in the familiar open queue thus our results put into question virtually all traditional approximate solutions which concentrate on the first two moments of service time distributions
security policies are one of the most fundamental elements of computer security this paper uses colored petri net process cpnp to specify and verify security policies in modular way it defines fundamental policy properties ie completeness termination consistency and confluence in petri net terminology and gets some theoretical results according to xacml combiners and property preserving petri net process algebra pppa several policy composition operators are specified and property preserving results are stated for the policy correctness verification
this paper discusses the automatic ontology construction process in digital library traditional automatic ontology construction uses hierarchical clustering to group similar terms and the result hierarchy is usually not satisfactory for human’s recognition human provided knowledge network presents strong semantic features but this generation process is both labor intensive and inconsistent under large scale scenario the method proposed in this paper combines the statistical correction and latent topic extraction of textual data in digital library which produces semantic oriented and owl based ontology the experimental document collection used here is the chinese recorder which served as link between the various missions that were part of the rise and heyday of the western effort to christianize the far east the ontology construction process is described and final ontology in owl format is shown in our result
acquisition of context poses unique challenges to mobile context aware recommender systems the limited resources in these systems make minimizing their context acquisition practical need and the uncertainty in the mobile environment makes missing and erroneous context inputs major concern in this paper we propose an approach based on bayesian networks bns for building recommender systems that minimize context acquisition our learning approach iteratively trims the bn based context model until it contains only the minimal set of context parameters that are important to user in addition we show that two tiered context model can effectively capture the causal dependencies among context parameters enabling recommender system to compensate for missing and erroneous context inputs we have validated our proposed techniques on restaurant recommendation data set and web page recommendation data set in both benchmark problems the minimal sets of context can be reliably discovered for the specific users furthermore the learned bayesian network consistently outperforms the decision tree in overcoming both missing and erroneous context inputs to generate significantly more accurate predictions
source code duplication commonly known as code cloning is considered an obstacle to software maintenance because changes to cloned region often require consistent changes to other regions of the source code research has provided evidence that the elimination of clones may not always be practical feasible or cost effective we present clone management approach that describes clone regions in robust way that is independent from the exact text of clone regions or their location in file and that provides support for tracking clones in evolving software our technique relies on the concept of abstract clone region descriptors crds which describe clone regions using combination of their syntactic structural and lexical information we present our definition of crds and describe clone tracking system capable of producing crds from the output of different clone detection tools notifying developers of modifications to clone regions and supporting updates to the documented clone relationships we evaluated the performance and usefulness of our approach across three clone detection tools and five subject systems and the results indicate that crds are practical and robust representation for tracking code clones in evolving software
distributing data is fundamental problem in implementing efficient distributed memory parallel programs the problem becomes more difficult in environments where the participating nodes are not dedicated to parallel application we are investigating the data distribution problem in non dedicated environments in the context of explicit message passing programs to address this problem we have designed and implemented an extension to mpi called dynamic mpi dyn mpi the key component of dyn mpi is its run time system which efficiently and automatically redistributes data on the fly when there are changes in the application or the underlying environment dyn mpi supports efficient memory allocation precise measurement of system load and computation time and node removal performance results show that programs that use dyn mpi execute efficiently in non dedicated environments including up to almost three fold improvement compared to programs that do not redistribute data and improvement over standard adaptive load balancing techniques
static program slice is an extract of program which can help our understanding of the behavior of the program it has been proposed for use in debugging optimization parallelization and integration of programs this article considers two types of static slices executable and nonexecutable efficient and well founded methods have been developed to construct executable slices for programs without goto statements it would be tempting to assume these methods would apply as well in programs with arbitrary goto statements we show why previous methods do not work in this more general setting and describe our solutions that correctly and efficiently compute executable slices for programs even with arbitrary goto statements our conclusion is that goto statements can be accommodated in generating executable static slices
in dynamic analysis ie execution trace analysis an important problem is to cope with the volume of data to process however in the literature no definitive solution has yet been proposed generally the techniques start by compressing the execution trace before proceeding with the analysis in this paper we propose way to process the uncompressed execution trace using sampling technique then we present the concept of temporally omnipresent class that is the analogy of the noise in signal processing during analysis the omnipresent classes can be filtered out to concentrate only on relevant ones next we present the extension of our sampling technique to the dynamic clustering of classes this is way to recover the components of legacy system we finally show the application of this approach to medium size industrial software system as well as the tool that supports it as conclusion we suggest that our noise reduction and clustering techniques are both efficient and scalable
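The omnipresent-class filter can be sketched directly: cut the sampled trace into windows, count in how many windows each class appears, and drop classes present in nearly all of them before clustering; the window size, the presence threshold and the toy trace are assumptions.

    # a minimal sketch of filtering "temporally omnipresent" classes from a sampled trace.
    from collections import defaultdict

    def omnipresent_classes(trace, window_size=100, presence_threshold=0.9):
        """trace: ordered list of (class_name, method_name) call events."""
        windows = [trace[i:i + window_size] for i in range(0, len(trace), window_size)]
        seen_in = defaultdict(int)
        for w in windows:
            for cls in {cls for cls, _ in w}:
                seen_in[cls] += 1
        return {cls for cls, n in seen_in.items() if n / len(windows) >= presence_threshold}

    def filter_trace(trace, **kw):
        noise = omnipresent_classes(trace, **kw)
        return [event for event in trace if event[0] not in noise], noise

    # toy trace: a Logger class is called everywhere and hides the phase structure
    trace = []
    for cls in ["Parser", "Optimizer", "Emitter"]:
        trace += [(cls, "run"), ("Logger", "log")] * 200
    filtered, noise = filter_trace(trace)
    print(noise, len(trace), len(filtered))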
in order to foster the acceptance of commerce services it is vital to understand the intra industry interactions and dynamics while previous studies utilised mainly individual level perceptions and social influence models to explain the growth of commerce this study offers strategic viewpoint that may partially account for the slow diffusion of commerce services analytic induction is applied and value system analysis is used to identify the differences in the dynamics of the commerce and commerce industries and detect plausible key structural barriers to commerce diffusion
organizing web search results into hierarchical categories facilitates users browsing through web search results especially for ambiguous queries where the potential results are mixed together previous methods on search result classification are usually based on pre training classification model on some fixed and shallow hierarchical categories where only the top two level categories of web taxonomy is used such classification methods may be too coarse for users to browse since most search results would be classified into only two or three shallow categories instead deep hierarchical classifier must provide many more categories however the performance of such classifiers is usually limited because their classification effectiveness can deteriorate rapidly at the third or fourth level of hierarchy in this paper we propose novel algorithm known as deep classifier to classify the search results into detailed hierarchical categories with higher effectiveness than previous approaches given the search results in response to query the algorithm first prunes wide ranged hierarchy into narrow one with the help of some web directories different strategies are proposed to select the training data by utilizing the hierarchical structures finally discriminative naïve bayesian classifier is developed to perform efficient and effective classification as result the algorithm can provide more meaningful and specific class labels for search result browsing than shallow style of classification we conduct experiments to show that the deep classifier can achieve significant improvement over state of the art algorithms in addition with sufficient off line preparation the efficiency of the proposed algorithm is suitable for online application
in this paper we present study of responses to the idea of being recorded by ubicomp recording technology called sensecam this study focused on real life situations in two north american and two european locations we present the findings of this study and their implications specifically how those who might be recorded perceive and react to sensecam we describe what system parameters social processes and policies are required to meet the needs of both the primary users and these secondary stakeholders and how being situated within particular locale can influence responses our results indicate that people would tolerate potential incursions from sensecam for particular purposes furthermore they would typically prefer to be informed about and to consent to recording as well as to grant permission before any data is shared these preferences however are unlikely to instigate request for deletion or other action on their part these results inform future design of recording technologies like sensecam and provide broader understanding of how ubicomp technologies might be taken up across different cultural and political regions
this paper describes method for improving the performance of large direct mapped cache by reducing the number of conflict misses our solution consists of two components an inexpensive hardware device called cache miss lookaside cml buffer that detects conflicts by recording and summarizing history of cache misses and software policy within the operating system’s virtual memory system that removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected using trace driven simulation of applications and the operating system we show that cml buffer enables large direct mapped cache to perform nearly as well as two way set associative cache of equivalent size and speed although with lower hardware cost and complexity
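A software-only sketch of the detection side of this idea (the actual CML buffer is a small hardware structure that records and summarizes miss history rather than storing it exactly): simulate a direct-mapped cache, count misses per cache index, and flag indices that thrash between several pages as remapping candidates. The cache geometry, the threshold and the toy access stream are assumptions.

    # a minimal sketch of conflict-miss detection for a direct-mapped cache.
    from collections import Counter

    CACHE_LINES = 1024          # a direct-mapped cache with 1024 lines
    LINE_BYTES = 32
    PAGE_BYTES = 4096

    def detect_conflicts(addresses, miss_threshold=50):
        tags = [None] * CACHE_LINES
        misses_per_index = Counter()
        pages_per_index = {}
        for addr in addresses:
            line = addr // LINE_BYTES
            index = line % CACHE_LINES
            tag = line // CACHE_LINES
            if tags[index] != tag:                       # miss in the direct-mapped cache
                tags[index] = tag
                misses_per_index[index] += 1
                pages_per_index.setdefault(index, set()).add(addr // PAGE_BYTES)
        # indices with many misses and more than one page mapped to them look like conflicts,
        # and their pages are candidates for dynamic remapping by the virtual memory system
        return {idx: pages for idx, pages in pages_per_index.items()
                if misses_per_index[idx] >= miss_threshold and len(pages) > 1}

    # toy access stream: two pages that collide in the cache are touched alternately
    stream = []
    for _ in range(100):
        stream += [0x00000000, 0x00008000]               # same index, different tags: ping-pong
    print(detect_conflicts(stream))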
content targeted advertising the task of automatically associating ads to web page constitutes key web monetization strategy nowadays further it introduces new challenging technical problems and raises interesting questions for instance how to design ranking functions able to satisfy conflicting goals such as selecting advertisements ads that are relevant to the users and suitable and profitable to the publishers and advertisers in this paper we propose new framework for associating ads with web pages based on genetic programming gp our gp method aims at learning functions that select the most appropriate ads given the contents of web page these ranking functions are designed to optimize overall precision and minimize the number of misplacements by using real ad collection and web pages from newspaper we obtained gain over state of the art baseline method of in average precision further by evolving individuals to provide good ranking estimations gp was able to discover ranking functions that are very effective in placing ads in web pages while avoiding irrelevant ones
we present non photorealistic algorithm for retargeting large images to small size displays particularly on mobile devices this method adapts large images so that important objects in the image are still recognizable when displayed at lower target resolution existing image manipulation techniques such as cropping works well for images containing single important object and down sampling works well for images containing low frequency information however when these techniques are automatically applied to images with multiple objects the image quality degrades and important information may be lost our algorithm addresses the case of multiple important objects in an image the retargeting algorithm segments an image into regions identifies important regions removes them fills the resulting gaps resizes the remaining image and re inserts the important regions our approach lies in constructing topologically constrained epitome of an image based on visual attention model that is both comprehensible and size varying making the method suitable for display critical applications
an organization’s data is often its most valuable asset but today’s file systems provide few facilities to ensure its safety databases on the other hand have long provided transactions transactions are useful because they provide atomicity consistency isolation and durability acid many applications could make use of these semantics but databases have wide variety of nonstandard interfaces for example applications like mail servers currently perform elaborate error handling to ensure atomicity and consistency because it is easier than using dbms transaction oriented programming model eliminates complex error handling code because failed operations can simply be aborted without side effects we have designed file system that exports acid transactions to user level applications while preserving the ubiquitous and convenient posix interface in our prototype acid file system called amino updated applications can protect arbitrary sequences of system calls within transaction unmodified applications operate without any changes but each system call is transaction protected we also built recoverable memory library with support for nested transactions to allow applications to keep their in memory data structures consistent with the file system our performance evaluation shows that acid semantics can be added to applications with acceptable overheads when amino adds atomicity consistency and isolation functionality to an application it performs close to ext amino achieves durability up to percent faster than ext thanks to improved locality
we study sleep wake scheduling for low duty cycle sensor networks our work explicitly considers the effect of synchronization error we focus on widely used synchronization scheme and show that its synchronization error is nonnegligible and using conservative guard time is energy wasteful we formulate an optimization problem that aims to set the capture probability threshold for messages from each individual node such that the expected energy consumption is minimized and the collective quality of service qos over the nodes is guaranteed the problem is nonconvex nonetheless we are able to obtain solution with energy consumption that is provably at most larger than the optimal solution simulations demonstrate the efficacy of our solution
this paper studies the problem of repairing inconsistent databases in the presence of functional dependencies specifically we present repairing strategy where only tuple updates are allowed in order to restore consistency the proposed approach allows us to obtain unique repair which can be computed in polynomial time
one truth holds for the healthcare industry nothing should interfere with the delivery of care given this fact the access control mechanisms used in healthcare to regulate and restrict the disclosure of data are often bypassed this break the glass phenomenon is an established pattern in healthcare organizations and though quite useful and mandatory in emergency situations it represents serious system weakness in this paper we propose an access control solution aimed at better management of exceptions that occur in healthcare our solution is based on the definition of different policy spaces regulating access to patient data and used to balance the rigorous nature of traditional access control systems with the prioritization of care delivery
one of the well known risks of large margin training methods such as boosting and support vector machines svms is their sensitivity to outliers these risks are normally mitigated by using soft margin criterion such as hinge loss to reduce outlier sensitivity in this paper we present more direct approach that explicitly incorporates outlier suppression in the training process in particular we show how outlier detection can be encoded in the large margin training principle of support vector machines by expressing convex relaxation of the joint training problem as semidefinite program one can use this approach to robustly train support vector machine while suppressing outliers we demonstrate that our approach can yield superior results to the standard soft margin approach in the presence of outliers
the design of module system for constructing and maintaining large programs is difficult task that raises number of theoretical and practical issues fundamental issue is the management of the flow of information between program units at compile time via the notion of an interface experience has shown that fully opaque interfaces are awkward to use in practice since too much information is hidden and that fully transparent interfaces lead to excessive interdependencies creating problems for maintenance and separate compilation the "sharing" specifications of standard ml address this issue by allowing the programmer to specify equational relationships between types in separated modules but are not expressive enough to allow the programmer complete control over the propagation of type information between modules these problems are addressed from type theoretic viewpoint by considering calculus based on girard’s system fω the calculus differs from those considered in previous studies by relying exclusively on new form of weak sum type to propagate information at compile time in contrast to approaches based on strong sums which rely on substitution the new form of sum type allows for the specification of equational as well as type and kind information in interfaces this provides complete control over the propagation of compile time information between program units and is sufficient to encode in straightforward way most uses of type sharing specifications in standard ml modules are treated as "first class" citizens and therefore the system supports higher order modules and some object oriented programming idioms the language may be easily restricted to "second class" modules found in ml like languages
when sensor network is deployed it is typically required to support multiple simultaneous missions schemes that assign sensing resources to missions thus become necessary in this article we formally define the sensor mission assignment problem and discuss some of its variants in its most general form this problem is np hard we propose algorithms for the different variants some of which include approximation guarantees we also propose distributed algorithms to assign sensors to missions which we adapt to include energy awareness to extend network lifetime finally we show comprehensive simulation results comparing these solutions to an upper bound on the optimal solution
the problem of products missing from the shelf is major one in the grocery retail sector as it leads to lost sales and decreased consumer loyalty yet the possibilities for detecting and measuring an out of shelf situation are limited in this paper we suggest the employment of machine learning techniques in order to develop rule based decision support system for automatically detecting products that are not on the shelf based on sales and other data results up to now suggest that rules related with the detection of out of shelf products are characterized by acceptable levels of predictive accuracy and problem coverage
for decade hci researchers and practitioners have been developing methods practices and designs for the full range of human experience on the one hand variety of approaches to design such as aesthetic affective and ludic that emphasize particular qualities and contexts of experience and particular approaches to intervening in interactive experience have become focal on the other variety of approaches to understanding users and user experience based on narrative biography and role play have been developed and deployed these developments can be viewed in terms of one of the seminal commitments of hci to know the user empathy has been used as defining characteristic of designer user relationships when design is concerned with user experience in this article we use empathy to help position some emerging design and user experience methodologies in terms of dynamically shifting relationships between designers users and artefacts
mpsoc multi processor system on chip architecture is becoming increasingly used because it can provide designers much more opportunities to meet specific performance and power goals in this paper we propose an mpsoc architecture for implementing real time signal processing in gamma camera based on full analysis of the characteristics of the application we design several algorithms to optimize the systems in terms of processing speed power consumption and area costs etc two types of dsp core have been designed for the integral algorithm and the coordinate algorithm the key parts of signal processing in gamma camera an interconnection synthesis algorithm is proposed to reduce the area cost of the network on chip we implement our mpsoc architecture on fpga and synthesize dsp cores and network on chip using synopsys design compiler with umc µm standard cell library the results show that our technique can effectively accelerate the processing and satisfy the requirements of real time signal processing for image construction
color histograms lack spatial information and are sensitive to intensity variation color distortion and cropping as result images with similar histograms may have totally different semantics the region based approaches are introduced to overcome the above limitations but due to the inaccurate segmentation these systems may partition an object into several regions that may have confused users in selecting the proper regions in this paper we present robust image retrieval based on color histogram of local feature regions lfr firstly the steady image feature points are extracted by using multi scale harris laplace detector then the significant local feature regions are ascertained adaptively according to the feature scale theory finally the color histogram of local feature regions is constructed and the similarity between color images is computed by using the color histogram of lfrs experimental results show that the proposed color image retrieval is more accurate and efficient in retrieving the user interested images especially it is robust to some classic transformations additive noise affine transformation including translation rotation and scale effects partial visibility etc
modern enterprise networks are of sufficient complexity that even simple faults can be difficult to diagnose let alone transient outages or service degradations nowhere is this problem more apparent than in the based wireless access networks now ubiquitous in the enterprise in addition to the myriad complexities of the wired network wireless networks face the additional challenges of shared spectrum user mobility and authentication management not surprisingly few organizations have the expertise data or tools to decompose the underlying problems and interactions responsible for transient outages or performance degradations in this paper we present set of modeling techniques for automatically characterizing the source of such problems in particular we focus on data transfer delays unique to networks media access dynamics and mobility management latency through combination of measurement inference and modeling we reconstruct sources of delay from the physical layer to the transport layer as well as the interactions among them we demonstrate our approach using comprehensive traces of wireless activity in the ucsd computer science building
many web sites have dynamic information objects whose topics change over time classifying these objects automatically and promptly is challenging and important problem for site masters traditional content based and link structure based classification techniques have intrinsic limitations for this task this paper proposes framework to classify an object into an existing category structure by analyzing the users traversals in the category structure the key idea is to infer an object’s topic from the predicted preferences of users when they access the object we compare two approaches using this idea one analyzes collective user behavior and the other each user’s accesses we present experimental results on actual data that demonstrate much higher prediction accuracy and applicability with the latter approach we also analyze the correlation between classification quality and various factors such as the number of users accessing the object to our knowledge this work is the first effort in combining object classification with user access prediction
test suite minimisation techniques seek to reduce the effort required for regression testing by selecting subset of test suites in previous work the problem has been considered as single objective optimisation problem however real world regression testing can be complex process in which multiple testing criteria and constraints are involved this paper presents the concept of pareto efficiency for the test suite minimisation problem the pareto efficient approach is inherently capable of dealing with multiple objectives providing the decision maker with group of solutions that are not dominated by each other the paper illustrates the benefits of pareto efficient multi objective test suite minimisation with empirical studies of two and three objective formulations in which multiple objectives such as coverage and past fault detection history are considered the paper utilises hybrid multi objective genetic algorithm that combines the efficient approximation of the greedy approach with the capability of population based genetic algorithm to produce higher quality pareto fronts
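The Pareto-efficiency notion itself is easy to make concrete even without the hybrid genetic algorithm: score candidate sub-suites on coverage, past fault detection and (negated) cost, and keep only the non-dominated ones for the decision maker. In the sketch below the candidate subsets are random, purely for illustration, and all test data are assumptions.

    # a minimal sketch of the pareto front for multi-objective test suite minimisation.
    import random

    def dominates(a, b):
        """a, b: (coverage, faults_detected, -cost); larger is better in every objective."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def pareto_front(candidates):
        front = []
        for c, score in candidates:
            if not any(dominates(s, score) for _, s in candidates if s != score):
                front.append((c, score))
        return front

    random.seed(0)
    tests = {f"t{i}": (set(random.sample(range(30), 6)),      # statements covered
                       set(random.sample(range(10), 2)),      # past faults detected
                       random.randint(1, 5))                  # execution cost
             for i in range(12)}

    candidates = []
    for _ in range(200):
        subset = frozenset(random.sample(sorted(tests), random.randint(1, 6)))
        cov = set().union(*(tests[t][0] for t in subset))
        faults = set().union(*(tests[t][1] for t in subset))
        cost = sum(tests[t][2] for t in subset)
        candidates.append((subset, (len(cov), len(faults), -cost)))

    for subset, (cov, faults, neg_cost) in pareto_front(candidates)[:5]:
        print(sorted(subset), cov, faults, -neg_cost)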
an under appreciated facet of index search structures is the importance of high performance search within tree internal nodes much attention has been focused on improving node fanout and hence minimizing the tree height bu ll gg lo have discussed the importance of tree page size recent article gl discusses internal node architecture but the subject is buried in single section of the paper in this short note i want to describe the long evolution of good internal node architecture and techniques including an understanding of what problem was being solved during each of the incremental steps that have led to much improved node organizations
recognizing objects in images is an active area of research in computer vision in the last two decades there has been much progress and there are already object recognition systems operating in commercial products however most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model that approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force in the real world objects tend to covary with other objects providing rich collection of contextual associations these contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be this also increases performance by rejecting patterns that look like the target but appear in unlikely places most modeling attempts so far have defined the context of an object in terms of other previously recognized objects the drawback of this approach is that inferring the context becomes as difficult as detecting each object an alternative view of context relies on using the entire scene information holistically this approach is algorithmically attractive since it dispenses with the need for prior step of individual object recognition in this paper we use probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance we view this as significant step toward general purpose machine vision systems
the leading web search engines have spent decade building highly specialized ranking functions for english web pages one of the reasons these ranking functions are effective is that they are designed around features such as pagerank automatic query and domain taxonomies and click through information etc unfortunately many of these features are absent or altered in other languages in this work we show how to exploit these english features for subset of chinese queries which we call linguistically non local lnl lnl chinese queries have minimally ambiguous english translation which also functions as good english query we first show how to identify pairs of chinese lnl queries and their english counterparts from chinese and english query logs then we show how to effectively exploit these pairs to improve chinese relevance ranking our improved relevance ranker proceeds by translating query into english computing cross lingual relational graph between the chinese and english documents and employing the relational ranking method of qin et al to rank the chinese documents our technique gives consistent improvements over state of the art chinese mono lingual ranker on web search data from the microsoft live china search engine
we present ware framework for joint software and hardware modelling and synthesis of multiprocessor embedded systems the framework consists of component based annotated transaction level models for joint modelling of parallel software and multiprocessor hardware and exploration driven methodology for joint software and hardware synthesis the methodology has the advantage of combining real time requirements of software with efficient optimization of hardware performance we describe and apply the methodology to synthesize scheduler of video encoder on the cake multiprocessor moreover experiments show that the framework is scalable while achieving rapid and efficient designs
in this paper we introduce weights into pawlak rough set model to balance the class distribution of data set and develop weighted rough set based method to deal with the class imbalance problem in order to develop the weighted rough set based method we design first weighted attribute reduction algorithm by introducing and extending guiasu weighted entropy to measure the significance of an attribute then weighted rule extraction algorithm by introducing weighted heuristic strategy into lem algorithm and finally weighted decision algorithm by introducing several weighted factors to evaluate extracted rules furthermore in order to estimate the performance of the developed method we compare the weighted rough set based method with several popular methods used for class imbalance learning by conducting experiments with twenty uci data sets comparative studies indicate that in terms of auc and minority class accuracy the weighted rough set based method is better than the re sampling and filtering based methods and is comparable to the decision tree and svm based methods it is therefore concluded that the weighted rough set based method is effective for class imbalance learning
given the fpga based partially reconfigurable systems hardware tasks can be configured into or removed from the fpga fabric without interfering with other tasks running on the same device in such systems the efficiency of task scheduling algorithms directly impacts the overall system performance by using previously proposed scheduling model existing algorithms could not provide an efficient way to find all suitable allocations in addition most of them ignored the single reconfiguration port constraint and inter task dependencies furthermore to the best of our knowledge there is no previous work investigating the impact on the scheduling result of reusing already placed tasks in this paper we focus on online task scheduling and propose task scheduling solution that takes the ignored constraints into account in addition novel reuse and partial reuse approach is proposed the simulation results show that our proposed solution achieves shorter application completion time up to and faster single task response time up to compared to the previously proposed stuffing algorithm
embedded systems require control of many concurrent real time activities leading to system designs that feature variety of hardware peripherals with each providing specific dedicated service these peripherals increase system size cost weight and design time software thread integration sti provides low cost thread concurrency on general purpose processors by automatically interleaving multiple threads of control into one this simplifies hardware to software migration which eliminates dedicated hardware and can help embedded system designers meet design constraints such as size weight and cost we have developed concepts for performing sti and have implemented many in our automated postpass compiler thrint here we present the transformations and examine how well the compiler integrates threads for two display applications we examine the integration procedure the processor load and code memory expansion integration allows reclamation of cpu idle time allowing run time speedups of to
finite state verification eg model checking provides powerful means to detect concurrency errors which are often subtle and difficult to reproduce nevertheless widespread use of this technology by developers is unlikely until tools provide automated support for extracting the required finite state models directly from program source unfortunately the dynamic features of modern languages such as java complicate the construction of compact finite state models for verification in this article we show how shape analysis which has traditionally been used for computing alias information in optimizers can be used to greatly reduce the size of finite state models of concurrent java programs by determining which heap allocated variables are accessible only by single thread and which shared variables are protected by locks we also provide several other state space reductions based on the semantics of java monitors prototype of the reductions demonstrates their effectiveness
we introduce theoretical framework for discovering relationships between two database instances over distinct and unknown schemata this framework is grounded in the context of data exchange we formalize the problem of understanding the relationship between two instances as that of obtaining schema mapping so that minimum repair of this mapping provides perfect description of the target instance given the source instance we show that this definition yields "intuitive" results when applied on database instances derived from each other by basic operations we study the complexity of decision problems related to this optimality notion in the context of different logical languages and show that even in very restricted cases the problem is of high complexity
the process by which students learn to program is major issue in computer science educational research programming is fundamental part of the computer science curriculum but one which is often problematic it seems to be difficult to find an effective method of teaching that is suitable for all students in this research we tried to gain insights into ways of improving our teaching by careful examination of students mistakes the compiler errors that were generated by their programs together with the pattern that was observed in their debugging activities formed the basis of this research we discovered that many students with good understanding of programming do not acquire the skills to debug programs effectively and this is major impediment to their producing working code of any complexity skill at debugging seems to increase programmer’s confidence and we suggest that more emphasis be placed on debugging skills in the teaching of programming
we formalize realistic model for computations over massive data sets the model referred to as the em adversarial sketch model unifies the well studied sketch and data stream models together with cryptographic flavor that considers the execution of protocols in hostile environments and provides framework for studying the complexity of many tasks involving massive data sets the adversarial sketch model consists of several participating parties honest parties whose goal is to compute pre determined function of their inputs and an adversarial party computation in this model proceeds in two phases in the first phase the adversarial party chooses the inputs of the honest parties these inputs are sets of elements taken from large universe and provided to the honest parties in an on line manner in the form of sequence of insert and delete operations once an operation from the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored during this phase the honest parties are not allowed to communicate moreover they do not share any secret information and any public information they share is known to the adversary in advance in the second phase the honest parties engage in protocol in order to compute pre determined function of their inputs in this paper we settle the complexity up to logarithmic factors of two fundamental problems in this model testing whether two massive data sets are equal and approximating the size of their symmetric difference we construct explicit and efficient protocols with sublinear sketches of essentially optimal size poly logarithmic update time during the first phase and poly logarithmic communication and computation during the second phase our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties incrementality and high distance which may be of independent interest
we examine the problem of providing useful feedback about access control decisions to users while controlling the disclosure of the system’s security policies relevant feedback enhances system usability especially in systems where permissions change in unpredictable ways depending on contextual information however providing feedback indiscriminately can violate the confidentiality of system policy to achieve balance between system usability and the protection of security policies we present know framework that uses cost functions to provide feedback to users about access control decisions know honors the policy protection requirements which are represented as meta policy and generates permissible and relevant feedback to users on how to obtain access to resource to the best of our knowledge our work is the first to address the need for useful access control feedback while honoring the privacy and confidentiality requirements of system’s security policy
we present practical algorithms for accelerating geometric queries on models made of nurbs surfaces using programmable graphics processing units gpus we provide generalized framework for using gpus as co processors in accelerating cad operations by attaching the data corresponding to surface normals to surface bounding box structure we can calculate view dependent geometric features such as silhouette curves in real time we make use of additional surface data linked to surface bounding box hierarchies on the gpu to answer queries such as finding the closest point on curved nurbs surface given any point in space and evaluating the clearance between two solid models constructed using multiple nurbs surfaces we simultaneously output the parameter values corresponding to the solution of these queries along with the model space values though our algorithms make use of the programmable fragment processor the accuracy is based on the model space precision unlike earlier graphics algorithms that were based only on image space precision in addition we provide theoretical bounds for both the computed minimum distance values as well as the location of the closest point our algorithms are at least an order of magnitude faster than the commercial solid modeling kernel acis
this paper presents and experimentally evaluates new algorithm for efficient one hop link state routing in full mesh networks prior techniques for this setting scale poorly as each node incurs quadratic communication overhead to broadcast its link state to all other nodes in contrast in our algorithm each node exchanges routing state with only small subset of overlay nodes determined by using quorum system using two round protocol each node can find an optimal one hop path to any other node using only per node communication our algorithm can also be used to find the optimal shortest path of arbitrary length using only logn per node communication the algorithm is designed to be resilient to both node and link failures we apply this algorithm to resilient overlay network ron system and evaluate the results using large scale globally distributed set of internet hosts the reduced communication overhead from using our improved full mesh algorithm allows the creation of all pairs routing overlays that scale to hundreds of nodes without reducing the system’s ability to rapidly find optimal routes
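The abstract does not spell out the quorum construction, so the sketch below uses a simple grid quorum as a hypothetical stand-in (an assumption, not the paper's exact protocol): every node publishes its measured link-state row to its grid row and column, any two quorums intersect, and a rendezvous node holding both published rows can evaluate every one-hop relay on the source's behalf in the second round. With this construction each node exchanges state with only about 2√n other nodes.

```python
# Illustrative grid-quorum sketch (an assumption, not the paper's exact protocol).
import math
import random

def grid_quorum(i, n):
    """Node i's quorum: its row and column in a sqrt(n) x sqrt(n) grid."""
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes n is a perfect square"
    r, c = divmod(i, side)
    return {r * side + k for k in range(side)} | {k * side + c for k in range(side)}

def best_one_hop(u, v, latency, n):
    """Two-round lookup: u's and v's quorums intersect, and a rendezvous node in
    the intersection stores the link-state rows both nodes published, so it can
    evaluate every one-hop path u -> w -> v and return the cheapest option."""
    rendezvous = next(iter(grid_quorum(u, n) & grid_quorum(v, n)))
    relay = min(range(n), key=lambda w: latency[u][w] + latency[w][v])
    cost = min(latency[u][v], latency[u][relay] + latency[relay][v])
    return cost, relay, rendezvous

if __name__ == "__main__":
    n = 16
    random.seed(0)
    latency = [[0 if i == j else random.randint(1, 50) for j in range(n)]
               for i in range(n)]
    print(best_one_hop(0, 13, latency, n))
```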
this article presents the speedups achieved in generic single chip microprocessor system by employing high performance datapath the datapath acts as coprocessor that accelerates computationally intensive kernel sections thereby increasing the overall performance we have previously introduced the datapath which is composed of flexible computational components fccs these components can realize any two level template of primitive operations the automated coprocessor synthesis method from high level software description and its integration to design flow for executing applications on the system is presented for evaluating the effectiveness of our coprocessor approach analytical study in respect to the type of the custom datapath and to the microprocessor architecture is performed the overall application speedups of several real life applications relative to the software execution on the microprocessor are estimated using the design flow these speedups range from to with an average value of while the overhead in circuit area is small the design flow achieved the acceleration of the applications near to theoretical speedup bounds comparison with another high performance datapath showed that the proposed coprocessor achieves smaller area time products by an average of percent for the generated datapaths additionally the fcc coprocessor achieves better performance in accelerating kernels relative to software programmable dsp cores
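The abstract estimates overall application speedups from kernel acceleration on the coprocessor. The paper's exact estimation model is not reproduced here; a standard Amdahl-style bound (an assumption of this sketch) relates the overall speedup to the fraction f of run time spent in the accelerated kernels and the kernel speedup s.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl-style estimate: only the kernel fraction f of the run time is
    accelerated by a factor s; the remainder executes unchanged."""
    f, s = kernel_fraction, kernel_speedup
    return 1.0 / ((1.0 - f) + f / s)

# hypothetical numbers: kernels take 70% of the run time and run 5x faster
print(round(overall_speedup(0.7, 5.0), 2))   # ~2.27x overall
```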
large amount of structured information is buried in unstructured text information extraction systems can extract structured relations from the documents and enable sophisticated sql like queries over unstructured text information extraction systems are not perfect and their output has imperfect precision and recall ie contains spurious tuples and misses good tuples typically an extraction system has set of parameters that can be used as “knobs” to tune the system to be either precision or recall oriented furthermore the choice of documents processed by the extraction system also affects the quality of the extracted relation so far estimating the output quality of an information extraction task has been an ad hoc procedure based mainly on heuristics in this article we show how to use receiver operating characteristic roc curves to estimate the extraction quality in statistically robust way and show how to use roc analysis to select the extraction parameters in principled manner furthermore we present analytic models that reveal how different document retrieval strategies affect the quality of the extracted relation finally we present our maximum likelihood approach for estimating on the fly the parameters required by our analytic models to predict the runtime and the output quality of each execution plan our experimental evaluation demonstrates that our optimization approach predicts accurately the output quality and selects the fastest execution plan that satisfies the output quality restrictions
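As a rough illustration of the ROC idea only (not the article's maximum-likelihood estimator), the sketch below evaluates a labelled sample of candidate tuples under each knob setting, computes the resulting (false-positive rate, true-positive rate) point, and keeps the cheapest setting that still meets a recall target; the extractor, thresholds, and labels are hypothetical.

```python
def roc_point(labels, predictions):
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr

def pick_setting(settings, labels, run_extractor, min_recall=0.8):
    """Return the knob setting with the lowest false-positive rate among those
    whose recall (TPR) reaches min_recall, or None if no setting qualifies."""
    feasible = []
    for s in settings:
        fpr, tpr = roc_point(labels, run_extractor(s))
        if tpr >= min_recall:
            feasible.append((fpr, s))
    return min(feasible)[1] if feasible else None

# toy example: the "knob" is a confidence threshold on extracted tuples
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2]
labels = [True, True, False, True, False, True, False]
extractor = lambda threshold: [s >= threshold for s in scores]
print(pick_setting([0.1, 0.3, 0.5, 0.7], labels, extractor))
```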
we present an empirical study in which we investigated group versus individual performance with collaborative information visualization environments cives the effects of system transparency on users performance and the effects of different collaborative settings on cive usage subjects searched for findings with cives working either alone in collocated dyad using shared electronic whiteboard or in remote dyad using application sharing groups answered more questions correctly and took less time with the more transparent cive than groups using the less transparent cive we interpret our results to mean that groups have better self corrective abilities when the system is transparent we present stage model to explain the collaborative process of using cives which accounts for task type collaborative setting and system transparency
much current software defect prediction work focuses on the number of defects remaining in software system in this paper we present association rule mining based methods to predict defect associations and defect correction effort this is to help developers detect software defects and assist project managers in allocating testing resources more effectively we applied the proposed methods to the sel defect data consisting of more than projects over more than years the results show that for defect association prediction the accuracy is very high and the false negative rate is very low likewise for the defect correction effort prediction the accuracy for both defect isolation effort prediction and defect correction effort prediction are also high we compared the defect correction effort prediction method with other types of methods part and naïve bayes and show that accuracy has been improved by at least percent we also evaluated the impact of support and confidence levels on prediction accuracy false negative rate false positive rate and the number of rules we found that higher support and confidence levels may not result in higher prediction accuracy and sufficient number of rules is precondition for high prediction accuracy
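The mining setup is not fully specified in the abstract, so the following is a minimal support/confidence rule miner over hypothetical per-module defect-type sets, meant only to make the notions of defect-association rule, support, and confidence concrete.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.2, min_confidence=0.7):
    """Toy association-rule miner over sets of defect types (one set per module).
    Returns rules antecedent -> consequent with their support and confidence;
    only single-item antecedents and consequents, enough to illustrate the idea."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    support = {frozenset(c): sum(1 for t in transactions if set(c) <= t) / n
               for k in (1, 2) for c in combinations(sorted(items), k)}
    rules = []
    for a, b in combinations(sorted(items), 2):
        pair = frozenset((a, b))
        if support[pair] < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = support[pair] / support[frozenset((x,))]
            if conf >= min_confidence:
                rules.append((x, y, support[pair], conf))
    return rules

if __name__ == "__main__":
    modules = [{"interface", "data"}, {"interface", "data", "logic"},
               {"interface", "data"}, {"logic"}, {"interface"}]
    for x, y, s, c in mine_rules(modules):
        print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```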
we introduce two families of techniques rubbing and tapping that use zooming to make possible precise interaction on passive touch screens and describe examples of each rub pointing uses diagonal rubbing gesture to integrate pointing and zooming in single handed technique in contrast zoom tapping is two handed technique in which the dominant hand points while the non dominant hand taps to zoom simulating multi touch functionality on single touch display rub tapping is hybrid technique that integrates rubbing with the dominant hand to point and zoom and tapping with the non dominant hand to confirm selection we describe the results of formal user study comparing these techniques with each other and with the well known take off and zoom pointing selection techniques rub pointing and zoom tapping had significantly fewer errors than take off for small targets and were significantly faster than take off and zoom pointing we show how the techniques can be used for fluid interaction in an image viewer and in google maps
xml access control requires the enforcement of highly expressive access control policies to support schema document and object specific protection requirements access control models for xml data can be classified in two major categories node filtering and query rewriting systems the first category includes approaches that use access policies to compute secure user views on xml data sets user queries are then evaluated on those views in the second category of approaches authorization rules are used to transform user queries to be evaluated against the original xml data set the pros and cons for these approaches have been widely discussed in the framework of xml access control standardization activities the aim of this paper is to describe model combining the advantages of these approaches and overcoming their limitations suitable as the basis of standard technique for xml access control enforcement the model specification is given using finite state automata ensuring generality wrt specific implementation techniques
in this paper we explore some of the opportunities and challenges for machine learning on the semantic web the semantic web provides standardized formats for the representation of both data and ontological background knowledge semantic web standards are used to describe meta data but also have great potential as general data format for data communication and data integration within broad range of possible applications machine learning will play an increasingly important role machine learning solutions have been developed to support the management of ontologies for the semi automatic annotation of unstructured data and to integrate semantic information into web mining machine learning will increasingly be employed to analyze distributed data sources described in semantic web formats and to support approximate semantic web reasoning and querying in this paper we discuss existing and future applications of machine learning on the semantic web with strong focus on learning algorithms that are suitable for the relational character of the semantic web’s data structure we discuss some of the particular aspects of learning that we expect will be of relevance for the semantic web such as scalability missing and contradicting data and the potential to integrate ontological background knowledge in addition we review some of the work on the learning of ontologies and on the population of ontologies mostly in the context of textual data
for software information agent operating on behalf of human owner and belonging to community of agents the choice of communicating or not with another agent becomes decision to take since communication generally implies cost since these agents often operate as recommender systems on the basis of dynamic recognition of their human owners behaviour and by generally using hybrid machine learning techniques three main necessities arise in their design namely providing the agent with an internal representation of both interests and behaviour of its owner usually called ontology ii detecting inter ontology properties that can help an agent to choose the most promising agents to be contacted for knowledge sharing purposes iii semi automatically constructing the agent ontology by simply observing the behaviour of the user supported by the agent leaving to the user only the task of defining concepts and categories of interest we present complete mas architecture called connectionist learning and inter ontology similarities cilios for supporting agent mutual monitoring trying to cover all the issues above cilios exploits an ontology model able to represent concepts concept collections functions and causal implications among events in multi agent environment moreover it uses mechanism capable of inducing logical rules representing agent behaviour in the ontology by means of connectionist ontology representation based on neural symbolic networks ie networks whose input and output nodes are associated with logic variables
this paper describes an integrated framework for plug and play soc test automation this framework is based on new approach for wrapper tam co optimization based on rectangle packing we first tailor tam widths to each core’s test data needs we then use rectangle packing to develop an integrated scheduling algorithm that incorporates precedence and power constraints in the test schedule while allowing the soc integrator to designate group of tests as preemptable finally we study the relationship between tam width and tester data volume to identify an effective tam width for the soc we present experimental results for non preemptive preemptive and power constrained test scheduling as well as for effective tam width identification for an academic benchmark soc and three industrial socs
database research traditionally aimed at data management methods and tools in various frameworks now requires broader focus building on recent successes in business applications researchers in database technology need to widen their spectrum of interest to confront new data management opportunities particularly in the context of the internet indeed in the asilomar report on database research experts from industry and academia called for researchers to make it easy for everyone to store organize access and analyze the majority of human information online within the next years
number of recent papers in the networking community study the distance matrix defined by the node to node latencies in the internet and in particular provide number of quite successful distributed approaches that embed this distance into low dimensional euclidean space in such algorithms it is feasible to measure distances among only linear or near linear number of node pairs the rest of the distances are simply not available moreover for applications it is desirable to spread the load evenly among the participating nodes indeed several recent studies use this fully distributed approach and achieve empirically low distortion for all but small fraction of node pairs this is concurrent with the large body of theoretical work on metric embeddings but there is fundamental distinction in the theoretical approaches to metric embeddings full and centralized access to the distance matrix is assumed and heavily used in this paper we present the first fully distributed embedding algorithm with provable distortion guarantees for doubling metrics which have been proposed as reasonable abstraction of internet latencies thus providing some insight into the empirical success of the recent vivaldi algorithm the main ingredient of our embedding algorithm is an improved fully distributed algorithm for more basic problem of triangulation where the triangle inequality is used to infer the distances that have not been measured this problem received considerable attention in the networking community and has also been studied theoretically in we use our techniques to extend ε relaxed embeddings and triangulations to infinite metrics and arbitrary measures and to improve on the approximate distance labeling scheme of talwar
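The triangulation step can be illustrated in a few lines (a generic sketch of the triangle-inequality bounds, not the paper's beacon-selection machinery): if nodes u and v have both measured their latencies to a common set of beacons, those measurements pin the unmeasured distance d(u, v) between a lower and an upper bound.

```python
def triangulate(d_u, d_v):
    """d_u, d_v: dicts mapping beacon -> measured latency from u and from v.
    Returns (lower, upper) bounds on the unmeasured distance d(u, v)
    implied by the triangle inequality over the shared beacons."""
    common = d_u.keys() & d_v.keys()
    if not common:
        return 0.0, float("inf")
    lower = max(abs(d_u[b] - d_v[b]) for b in common)
    upper = min(d_u[b] + d_v[b] for b in common)
    return lower, upper

# toy usage with three shared beacons (hypothetical latencies in ms)
print(triangulate({"b1": 20, "b2": 35, "b3": 12}, {"b1": 28, "b2": 15, "b3": 30}))
```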
we present technique for the analysis and re synthesis of arrangements of stroke based vector elements the capture of an artist’s style by the sole posterior analysis of his her achieved drawing poses formidable challenge such by example techniques could become one of the most intuitive tools for users to alleviate creation process efforts here we propose to tackle this issue from statistical point of view and take specific care of accounting for information usually overlooked in previous research namely the elements very appearance composed of curve like strokes we describe elements by concise set of perceptually relevant features after detecting appearance dominant traits we can generate new arrangements that respect the captured appearance related spatial statistics using multitype point processes our method faithfully reproduces visually similar arrangements and relies on neither heuristics nor post processes to ensure statistical correctness
we present novel protocol for fast multi hop message propagation in the scenario of ad hoc vehicular networks vanet our approach has been designed to gain optimal performance in scenarios that are very likely but not common in literature frov faces asymmetric communications and varying transmission ranges in this scenario it is able to broadcast any message with the minimal number of hops moreover our proposal is scalable with respect to the number of participating vehicles and tolerates vehicles that leave or join the platoon at the current state of development our protocol is optimal in the case of unidimensional roads and we are studying its extension to web of urban roads this paper presents the preliminary results of simulations carried out to verify the feasibility of our proposal
we consider several distributed collaborative key agreement and authentication protocols for dynamic peer groups there are several important characteristics which make this problem different from traditional secure group communication they are distributed nature in which there is no centralized key server collaborative nature in which the group key is contributory ie each group member will collaboratively contribute its part to the global group key and dynamic nature in which existing members may leave the group while new members may join instead of performing individual rekeying operations ie recomputing the group key after every join or leave request we discuss an interval based approach of rekeying we consider three interval based distributed rekeying algorithms or interval based algorithms for short for updating the group key the rebuild algorithm the batch algorithm and the queue batch algorithm performance of these three interval based algorithms under different settings such as different join and leave probabilities is analyzed we show that the interval based algorithms significantly outperform the individual rekeying approach and that the queue batch algorithm performs the best among the three interval based algorithms more importantly the queue batch algorithm can substantially reduce the computation and communication workload in highly dynamic environment we further enhance the interval based algorithms in two aspects authentication and implementation authentication focuses on the security improvement while implementation realizes the interval based algorithms in real network settings our work provides fundamental understanding about establishing group key via distributed and collaborative approach for dynamic peer group
we present an approach to the automatic creation of extractive summaries of literary short stories the summaries are produced with specific objective in mind to help reader decide whether she would be interested in reading the complete story to this end the summaries give the user relevant information about the setting of the story without revealing its plot the system relies on assorted surface indicators about clauses in the short story the most important of which are those related to the aspectual type of clause and to the main entities in story fifteen judges evaluated the summaries on number of extrinsic and intrinsic measures the outcome of this evaluation suggests that the summaries are helpful in achieving the original objective
on chip traffic of many applications exhibits self similar characteristics in this paper we intend to apply network calculus to analyze the delay and backlog bounds for self similar traffic in networks on chips we first prove that self similar traffic can not be constrained by any deterministic arrival curve then we prove that self similar traffic can be constrained by deterministic linear arrival curves rt + b (r: rate, b: burstiness) if an additional parameter excess probability is used to capture its burstiness exceeding the arrival envelope this three parameter model (r, b, excess probability) enables us to apply and extend the results of network calculus to analyze the performance and buffering cost of networks delivering self similar traffic flows assuming the latency rate server model for the network elements we give closed form equations to compute the delay and backlog bounds for self similar traffic traversing series of network elements furthermore we describe performance analysis flow with self similar traffic as input our experimental results using real on chip multimedia traffic traces validate our model and approach
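For the deterministic part of such an analysis the closed forms are standard network calculus (the excess-probability parameter of the three-parameter model is not modelled below, so the bounds should be read as holding for the traffic that stays under the envelope): an arrival curve b + rt crossing a chain of latency-rate servers (R_i, T_i) sees the concatenation of those servers, which is again a latency-rate server with rate min R_i and latency sum T_i, giving delay at most sum T_i + b / min R_i and backlog at most b + r * sum T_i. The numeric values in the example are hypothetical.

```python
def chain_bounds(r, b, servers):
    """Deterministic network-calculus bounds for a flow with arrival curve
    a(t) = b + r*t crossing a chain of latency-rate servers (R_i, T_i).
    The concatenation of latency-rate servers is again latency-rate with
    rate min(R_i) and latency sum(T_i), which yields closed-form bounds."""
    R = min(Ri for Ri, _ in servers)
    T = sum(Ti for _, Ti in servers)
    if r > R:
        raise ValueError("unstable: arrival rate exceeds the bottleneck service rate")
    return {"delay_bound": T + b / R, "backlog_bound": b + r * T}

# e.g. burst b = 8 flits, rate r = 0.2 flits/cycle, three routers (R, T)
print(chain_bounds(r=0.2, b=8, servers=[(1.0, 4), (0.8, 6), (1.0, 5)]))
```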
this paper presents an instruction scheduling and cluster assignment approach for clustered processors the proposed technique makes use of novel representation named the scheduling graph which describes all possible schedules powerful deduction process is applied to this graph reducing at each step the set of possible schedules in contrast to traditional list scheduling techniques the proposed scheme tries to establish relations among instructions rather than assigning each instruction to particular cycle the main advantage is that wrong or poor schedules can be anticipated and discarded earlier in addition cluster assignment of instructions is performed using another novel concept called virtual clusters which define sets of instructions that must execute in the same cluster these clusters are managed during the deduction process to identify incompatibilities among instructions the mapping of virtual to physical clusters is postponed until the scheduling of the instructions has finalized the advantages this novel approach features include accurate scheduling information when assigning and accurate information of the cluster assignment constraints imposed by scheduling decisions we have implemented and evaluated the proposed scheme with superblocks extracted from specint and mediabench the results show that this approach produces better schedules than the previous state of the art speed ups are up to with average speedups ranging from clusters to clusters
organizations maintain informational websites for wired devices the information content of such websites tends to change slowly with time so steady pattern of usage is soon established user preferences both at the individual and at the aggregate level can then be gauged from user access log files we propose heuristic scheme based on simulated annealing that makes use of the aggregate user preference data to re link the pages to improve navigability this scheme is also applicable to the initial design of websites for wireless devices using the aggregate user preference data obtained from parallel wired website and given an upper bound on the number of links per page our methodology links the pages in the wireless website in manner that is likely to enable the typical wireless user to navigate the site efficiently later when log file for the wireless website becomes available the same approach can be used to refine the design further
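A minimal version of such a heuristic (hypothetical cost model and cooling schedule, not the paper's exact scheme) treats each page's out-link set as the state, uses the preference-weighted number of clicks needed to reach target pages from the home page as the cost, and applies a standard simulated-annealing accept/reject rule under the per-page link budget.

```python
import math, random
from collections import deque

def avg_clicks(links, prefs, home=0):
    """Cost: preference-weighted shortest-path length (clicks) from the home
    page to each preferred page; unreachable pages incur a fixed penalty."""
    dist = {home: 0}
    q = deque([home])
    while q:
        u = q.popleft()
        for v in links[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sum(w * dist.get(p, 10) for p, w in prefs.items())

def anneal(n_pages, prefs, max_links=3, steps=5000, t0=2.0, seed=1):
    rng = random.Random(seed)
    links = [set(rng.sample(range(n_pages), max_links)) for _ in range(n_pages)]
    cost = avg_clicks(links, prefs)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6                 # simple linear cooling
        u = rng.randrange(n_pages)
        old = set(links[u])
        links[u] = set(rng.sample(range(n_pages), max_links))   # propose a re-link
        new_cost = avg_clicks(links, prefs)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost                                 # accept the move
        else:
            links[u] = old                                  # reject and restore
    return links, cost

if __name__ == "__main__":
    prefs = {5: 0.4, 9: 0.3, 2: 0.2, 7: 0.1}                # aggregate preferences from logs
    links, cost = anneal(n_pages=12, prefs=prefs)
    print("expected clicks:", round(cost, 2))
```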
in order to let software programs access and use the information and services provided by web sources wrapper programs must be built to provide machine readable view over them although research literature on web wrappers is vast the problem of how to specify the internal logic of complex wrappers in graphical and simple way remains mainly ignored in this paper we propose new language for addressing this task our approach leverages on the existing work on intelligent web data extraction and automatic web navigation as building blocks and uses workflow based approach to specify the wrapper control logic the features included in the language have been decided from the results of study of wide range of real web automation applications from different business areas in this paper we also present the most salient results of the study
the paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems here semantic refers to the domain specific issues both problem and development domains of software system the other dimension structural refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents an advanced information retrieval method latent semantic indexing is used to define semantic similarity measure between software components components within software system are then clustered together using this similarity measure simple structural information ie file organization of the software system is then used to assess the semantic cohesion of the clusters and files with respect to each other the measures are formally defined for general application set of experiments is presented which demonstrates how these measures can assist in the understanding of nontrivial software system namely version of ncsa mosaic
over the past few years researchers have been exploring possibilities for ways in which embedded technologies can enrich children's storytelling experiences in this article we present our research on physical interactive storytelling environments from child's perspective we present the system architecture as well as formative study of the technology's use with children ages we discuss the challenges and opportunities for kindergarten children to become creators of their own physical storytelling interactions
although the slicing of programs written in high level language has been widely studied in the literature relatively few papers have been published on the slicing of binary executable programs the lack of existing solutions for the latter is really hard to understand since the application domain for slicing binaries is similar to that for slicing high level languages furthermore there are special applications of the slicing of programs without source code like source code recovery code transformation and the detection of security critical code fragments in this paper in addition to describing the method of interprocedural static slicing of binaries we discuss how the set of the possible targets of indirect call sites can be reduced by dynamically gathered information our evaluation of the slicing method shows that if indirect function calls are extensively used both the number of edges in the call graph and the size of the slices can be significantly reduced
there is considerable interest in developing runtime infrastructures for programs that can migrate from one host to another mobile programs are appealing because they support efficient utilization of network resources and extensibility of information servers in this paper we present scheduling scheme for allocating resources to mix of real time and non real time mobile programs within this framework both mobile programs and hosts can specify constraints on how cpu should be allocated on the basis of the constraints the scheme constructs scheduling graph on which it applies several scheduling algorithms in case of conflicts between mobile program and host specified constraints the schemes implements policy that resolves the conflicts in favor of the host the resulting scheduling scheme is adaptive flexible and enforces both program and host specified constraints
reiter’s default logic formalizes nonmonotonic reasoning using default assumptions the semantics of given instance of default logic is based on fixpoint equation defining an extension three different reasoning problems arise in the context of default logic namely the existence of an extension the presence of given formula in an extension and the occurrence of formula in all extensions since the end of several complexity results have been published concerning these default reasoning problems for different syntactic classes of formulas we derive in this paper complete classification of default logic reasoning problems by means of universal algebra tools using post’s clone lattice in particular we prove trichotomy theorem for the existence of an extension classifying this problem to be either polynomial np complete or Σ2p complete depending on the set of underlying boolean connectives we also prove similar trichotomy theorems for the two other algorithmic problems in connection with default logic reasoning
wireless sensor network wsn is novel technology in wireless field the main function of this technology is to use sensor nodes to sense important information just like battlefield data and personal health information under the limited resources it is important to avoid malicious damage while information transmits in wireless network so wireless intrusion detection system wids becomes one of important topics in wireless sensor networks the attack behavior of wireless sensor nodes is different to wired attackers in this paper we will propose an isolation table to detect intrusion in hierarchical wireless sensor networks and to estimate the effect of intrusion detection effectively the primary experiment proves the isolation table intrusion detection can prevent attacks effectively
we explore the problem of portable and flexible privacy preserving access rights that permit access to large collection of digital goods privacy preserving access control means that the service provider can neither learn what access rights customer has nor link request to access an item to particular customer thus maintaining privacy of both customer activity and customer access rights flexible access rights allow customer to choose subset of items or groups of items from the repository obtain access to and be charged only for the items selected and portability of access rights means that the rights themselves can be stored on small devices of limited storage space and computational capabilities such as smartcards or sensors and therefore the rights must be enforced using the limited resources available in this paper we present and compare two schemes that address the problem of such access rights we show that much can be achieved if one allows for even negligible amount of false positives items that were not requested by the customer but inadvertently were included in the customer access right representation due to constrained space resources but minimizing false positives is one of many other desiderata that include protection against sharing of false positives information by unscrupulous users providing the users with transaction untraceability and unlinkability and forward compatibility of the scheme our first scheme does not place any constraints on the amount of space available on the limited capacity storage device and searches for the best representation that meets the requirements the second scheme on the other hand has modest requirements on the storage space available but guarantees low rate of false positives with mc storage space available on the smartcard where is the number of items or groups of items included in the subscription and is selectable parameter it achieves rate of false positives of
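The abstract does not name the underlying data structure, but a Bloom filter is one natural way to obtain the trade-off it describes: a few bits per item on the card, no false negatives, and a small tunable false-positive rate of roughly (1 - e^(-k*n/m))^k. The sketch below is therefore an illustration under that assumption, not the paper's actual scheme.

```python
import hashlib

class AccessRightsFilter:
    """Hedged sketch: store a customer's purchased item ids in a Bloom filter so
    a card with m bits can answer "may this item be accessed?" with no false
    negatives and a small false-positive rate."""
    def __init__(self, m_bits, k_hashes):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_access(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

card = AccessRightsFilter(m_bits=1024, k_hashes=5)
for item in ["item17", "item42", "item99"]:     # hypothetical purchased items
    card.add(item)
print(card.may_access("item42"), card.may_access("item7"))  # True, almost surely False
```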
in our research we have been concerned with the question of how to make relevant features of security situations visible to users in order to allow them to make informed decisions regarding potential privacy and security problems as well as regarding potential implications of their actions to this end we have designed technical infrastructures that make visible the configurations activities and implications of available security mechanisms this thus allows users to make informed choices and take coordinated and appropriate actions when necessary this work differs from the more traditional security usability work in that our focus is not only on the usability of security mechanism eg the ease of use of an access control interface but how security can manifest itself as part of people’s interactions with and through information systems ie how people experience and interpret privacy and security situations and are enabled or constrained by existing technological mechanisms to act appropriately in this paper we report our experiences designing developing and testing two technical infrastructures for supporting this approach for usable security
weighted undirected network is growth bounded if the number of nodes at distance around any given node is at most times the number of nodes at distance around the node given weighted undirected network with arbitrary node names and we present routing scheme that routes along paths of stretch and uses with high probability only εo log logn bit routing tables per node
the human face is capable of producing an astonishing variety of expressions for which sometimes the smallest difference changes the perceived meaning considerably producing realistic looking facial animations that are able to transmit this degree of complexity continues to be challenging research topic in computer graphics one important question that remains to be answered is when are facial animations good enough here we present an integrated framework in which psychophysical experiments are used in first step to systematically evaluate the perceptual quality of several different computer generated animations with respect to real world video sequences the first experiment provides an evaluation of several animation techniques exposing specific animation parameters that are important to achieve perceptual fidelity in second experiment we then use these benchmarked animation techniques in the context of perceptual research in order to systematically investigate the spatiotemporal characteristics of expressions third and final experiment uses the quality measures that were developed in the first two experiments to examine the perceptual impact of changing facial features to improve the animation techniques using such an integrated approach we are able to provide important insights into facial expressions for both the perceptual and computer graphics community
role based access control rbac is recognized as an excellent model for access control in an enterprise environment in large enterprises effective rbac administration is major issue arbac is well known solution for decentralized rbac administration arbac authorizes administrative roles by means of role ranges and prerequisite conditions although attractive and elegant in their own right we will see that these mechanisms have significant shortcomings we propose an improved role administration model named arbac to overcome the weaknesses of arbac arbac adopts the organization unit for new user and permission pools independent of role or role hierarchy it uses refined prerequisite condition in addition we present bottom up approach to permission role administration in contrast to the top down approach of arbac
we present an algorithm for interactive deformation of subdivision surfaces including displaced subdivision surfaces and subdivision surfaces with geometric textures our system lets the user directly manipulate the surface using freely selected surface points as handles during deformation the control mesh vertices are automatically adjusted such that the deforming surface satisfies the handle position constraints while preserving the original surface shape and details to best preserve surface details we develop gradient domain technique that incorporates the handle position constraints and detail preserving objectives into the deformation energy for displaced subdivision surfaces and surfaces with geometric textures the deformation energy is highly nonlinear and cannot be handled with existing iterative solvers to address this issue we introduce shell deformation solver which replaces each numerically unstable iteration step with two stable mesh deformation operations our deformation algorithm only uses local operations and is thus suitable for gpu implementation the result is real time deformation system running orders of magnitude faster than the state of the art multigrid mesh deformation solver we demonstrate our technique with variety of examples including examples of creating visually pleasing character animations in real time by driving subdivision surface with motion capture data
an abundant amount of information is created and delivered over electronic media users risk becoming overwhelmed by the flow of information and they lack adequate tools to help them manage the situation information filtering if is one of the methods that is rapidly evolving to manage large information flows the aim of if is to expose users to only information that is relevant to them many if systems have been developed in recent years for various application domains some examples of filtering applications are filters for search results on the internet that are employed in the internet software personal mail filters based on personal profiles listservers or newsgroups filters for groups or individuals browser filters that block non valuable information filters designed to give children access them only to suitable pages filters for commerce applications that address products and promotions to potential customers only and many more the different systems use various methods concepts and techniques from diverse research areas like information retrieval artificial intelligence or behavioral science various systems cover different scope have divergent functionality and various platforms there are many systems of widely varying philosophies but all share the goal of automatically directing the most valuable information to users in accordance with their user model and of helping them use their limited reading time most optimally this paper clarifies the difference between if systems and related systems such as information retrieval ir systems or extraction systems the paper defines framework to classify if systems according to several parameters and illustrates the approach with commercial and academic systems the paper describes the underlying concepts of if systems and the techniques that are used to implement them it discusses methods and measurements that are used for evaluation of if systems and limitations of the current systems in the conclusion we present research issues in the information filtering research arena such as user modeling evaluation standardization and integration with digital libraries and web repositories
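As a concrete and deliberately minimal example of the content-based branch of such systems, the sketch below scores incoming documents against a user-profile term vector with cosine similarity and delivers only those above a threshold; the profile weights and the threshold are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_stream(documents, profile, threshold=0.25):
    """Deliver only documents whose term-frequency vector is similar enough to
    the user's profile vector (a minimal content-based filtering rule)."""
    for doc in documents:
        vec = Counter(doc.lower().split())
        if cosine(vec, profile) >= threshold:
            yield doc

profile = Counter({"information": 3, "filtering": 3, "user": 2, "profile": 2})
docs = ["adaptive information filtering with a user profile",
        "quarterly earnings report for the retail sector"]
print(list(filter_stream(docs, profile)))
```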
this paper presents the prefilter predicate pushdown framework for data stream management system dsms though early predicate evaluation is well known query optimization strategy novel problems arise in high performance dsms in particular query invocation costs are high as compared to the cost of evaluating simple predicates that are often used in high speed stream analysis ii selectivity estimates may become inaccurate over time and iii multiple queries possibly containing common subexpressions must be processed continuously the prefilter addresses these issues by constructing appropriate predicates for early evaluation as soon as new data arrive and before any queries are invoked it also compresses the bit vector representing the outcomes of pushed down predicates over newly arrived tuples and uses the compressed bitmap to efficiently check which queries do not have to be invoked using set of network monitoring queries we show that the performance of the gigascope dsms is significantly improved by the prefilter
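The following sketch illustrates the prefilter idea under simplifying assumptions (conjunctive queries over a handful of cheap atomic predicates, no bitmap compression): shared predicates are evaluated once per arriving tuple, their outcomes are packed into a bitmask, and a registered query is invoked only when all of its required bits are set. Predicate and query names are hypothetical.

```python
PREDICATES = {                      # hypothetical network-monitoring predicates
    "is_tcp":   lambda t: t["proto"] == "tcp",
    "dport_80": lambda t: t["dport"] == 80,
    "big_pkt":  lambda t: t["len"] > 1000,
}
BIT = {name: 1 << i for i, name in enumerate(PREDICATES)}

QUERIES = {                         # query name -> conjunct predicate names
    "http_traffic":  ["is_tcp", "dport_80"],
    "large_http":    ["is_tcp", "dport_80", "big_pkt"],
    "jumbo_packets": ["big_pkt"],
}
QUERY_MASK = {q: sum(BIT[p] for p in ps) for q, ps in QUERIES.items()}

def queries_to_invoke(tuple_):
    mask = 0
    for name, pred in PREDICATES.items():      # each shared predicate evaluated once
        if pred(tuple_):
            mask |= BIT[name]
    return [q for q, m in QUERY_MASK.items() if m & mask == m]

print(queries_to_invoke({"proto": "tcp", "dport": 80, "len": 1400}))
print(queries_to_invoke({"proto": "udp", "dport": 53, "len": 90}))
```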
recently the notion of self similarity has been shown to apply to wide area and local area network traffic in this paper we examine the mechanisms that give rise to the self similarity of network traffic we present hypothesized explanation for the possible self similarity of traffic by using particular subset of wide area traffic traffic due to the world wide web www using an extensive set of traces of actual user executions of ncsa mosaic reflecting over half million requests for www documents we examine the dependence structure of www traffic while our measurements are not conclusive we show evidence that www traffic exhibits behavior that is consistent with self similar traffic models then we show that the self similarity in such traffic can be explained based on the underlying distributions of www document sizes the effects of caching and user preference in file transfer the effect of user think time and the superimposition of many such transfers in local area network to do this we rely on empirically measured distributions both from our traces and from data independently collected at over thirty www sites
in this paper we describe the work devising new technique for role finding to implement role based security administration our results stem from industrial projects where large scale customers wanted to migrate to role based access control rbac based on already existing access rights patterns in their production it systems
structural testing of software requires monitoring the software’s execution to determine which program entities are executed by test suite such monitoring can add considerable overhead to the execution of the program adversely affecting the cost of running test suite thus minimizing the necessary monitoring activity lets testers reduce testing time or execute more test cases basic testing strategy is to cover all statements or branches but more effective strategy is to cover all definition use associations duas in this paper we present novel technique to efficiently monitor duas based on branch monitoring we show how to infer from branch coverage the coverage of many duas while remaining duas are predicted with high accuracy by the same information based on this analysis testers can choose branch monitoring to approximate dua coverage or instrument directly for dua monitoring which is precise but more expensive in this paper we also present tool called dua forensics that we implemented for this technique along with set of empirical studies that we performed using the tool
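A much-simplified illustration of the inference step follows (it is not the DUA-Forensics implementation): each definition-use association is approximated by the branches that must execute for its definition and its use, plus the branches that could kill the definition, and is then classified from a branch-coverage set alone. The DUAs and branch names are hypothetical.

```python
DUAS = {   # hypothetical DUAs: branches required for def and use, plus killing branches
    ("x@b1", "use@b4"): {"required": {"b1", "b4"}, "killing": {"b3"}},
    ("y@b2", "use@b5"): {"required": {"b2", "b5"}, "killing": set()},
}

def classify_duas(covered_branches):
    report = {}
    for dua, info in DUAS.items():
        if not info["required"] <= covered_branches:
            report[dua] = "not covered"          # some required branch never executed
        elif info["killing"] & covered_branches:
            report[dua] = "possibly covered"     # a killing path may have intervened
        else:
            report[dua] = "inferred covered"     # def and use executed, no kill observed
    return report

print(classify_duas({"b1", "b2", "b4"}))
```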
to date automatic handwriting recognition systems are far from being perfect and heavy human intervention is often required to check and correct the results of such systems this post editing process is both inefficient and uncomfortable to the user an example is the transcription of historic documents state of the art handwritten text recognition technology is not suitable to perform this task automatically and expensive paleography expert work is needed to achieve correct transcriptions as an alternative to fully manual transcription and post editing multimodal interactive approach is proposed here where user feedback is provided by means of touchscreen pen strokes and or more traditional keyboard and mouse operation user’s feedback directly allows to improve system accuracy while multimodality increases system ergonomy and user acceptability multimodal interaction is approached in such way that both the main and the feedback data streams help each other to optimize overall performance and usability empirical tests on three cursive handwritten tasks suggest that using this approach considerable amounts of user effort can be saved with respect to both pure manual work and non interactive post editing processing
the development of controls for the execution of concurrent code is non trivial we show how existing discrete event system des theory can be successfully applied to this problem from code without concurrency controls and specification of desired behaviours concurrency control code is generated by applying rigorously proven des theory we guarantee that the control scheme is nonblocking and thus free of both deadlock and livelock and minimally restrictive some conflicts between specifications and source can be automatically resolved without introducing new specifications moreover the approach is independent of specific programming or specification languages two examples using java are presented to illustrate the approach additional applicable des results are discussed as future work
in this paper we present new background estimation algorithm which effectively represents both background and foreground the problem is formulated with labeling problem over patch based markov random field mrf and solved with graph cuts algorithm our method is applied to the problem of mosaic blending considering the moving objects and exposure variations of rotating and zooming camera also to reduce seams in the estimated boundaries we propose simple exposure correction algorithm using intensities near the estimated boundaries
unbeknownst to most users when query is submitted to search engine two distinct searches are performed the organic or algorithmic search that returns relevant web pages and related data maps images etc and the sponsored search that returns paid advertisements while an enormous amount of work has been invested in understanding the user interaction with organic search surprisingly little research has been dedicated to what happens after an ad is clicked situation we aim to correct to this end we define and study the process of context transfer that is the user’s transition from web search to the context of the landing page that follows an ad click we conclude that in the vast majority of cases the user is shown one of three types of pages namely homepage the homepage of the advertiser category browse browse able sub catalog related to the original query and search transfer the search results of the same query re executed on the target site we show that these three types of landing pages can be accurately distinguished using automatic text classification finally using such an automatic classifier we correlate the landing page type with conversion data provided by advertisers and show that the conversion rate ie users response rate to ads varies considerably according to the type we believe our findings will further the understanding of users response to search advertising in general and landing pages in particular and thus help advertisers improve their web sites and help search engines select the most suitable ads
in this article we discuss how to shape mas infrastructure to support an agent oriented role based access control model rbac mas first we introduce the rbac model and show how it can be extended to capture the essential features of agent systems then we extrapolate the core requirements of an infrastructure for rbac mas and depict possible approach based on accs agent coordination contexts the conceptual framework for an rbac mas infrastructure exploiting accs is subsequently formalized through process algebraic description of the main infrastructure entities this is meant to serve as formal specification of both the infrastructure and the language for expressing roles operations and policies in rbac mas
in this paper we study the problem of processing multiple queries in wireless sensor network we focus on multi query optimization at the base station level to minimize the number of radio messages in the sensor network we adopt cost based approach and develop cost model to study the benefit of exploiting common subexpressions in queries we also propose several optimization algorithms for both data acquisition queries and aggregation queries that intelligently rewrite multiple sensor data queries at the base station into synthetic queries to eliminate redundancy among them before they are injected into the wireless sensor network the set of running synthetic queries is dynamically updated by the arrival of new queries as well as the termination of existing queries we validate the effectiveness of our cost model and our experimental results indicate that our multi query optimization strategy can provide significant performance improvements
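The sketch below shows the flavour of the rewriting for data-acquisition queries under a deliberately crude cost model (samples transmitted over a fixed horizon): overlapping queries are merged into one synthetic query over the union of their attributes at the gcd of their periods, and the rewrite is applied only when the cost model says it is cheaper, mirroring the cost-based decision described above. Attribute names, periods, and the cost model are hypothetical.

```python
from math import gcd
from functools import reduce

def merge_acquisition_queries(queries):
    """queries: list of (attributes, period_in_seconds). Returns one synthetic
    query that subsumes them: the union of attributes sampled at the gcd of
    the periods, so the network carries one stream instead of several."""
    attrs = set().union(*(a for a, _ in queries))
    period = reduce(gcd, (p for _, p in queries))
    return attrs, period

def network_cost(attrs, period, horizon=600):
    return (horizon // period) * len(attrs)     # crude cost: samples sent over the horizon

queries = [({"temp", "light"}, 30), ({"temp"}, 60), ({"light", "humidity"}, 60)]
separate = sum(network_cost(a, p) for a, p in queries)
merged = network_cost(*merge_acquisition_queries(queries))
print("separate:", separate, "merged:", merged,
      "-> rewrite" if merged < separate else "-> keep as is")
```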
providing runtime information about generic types that is reifying generics is challenging problem studied in several research papers in the last years this problem is not tackled in current version of the java programming language java which consequently suffers from serious safety and coherence problems the quest for finding effective and efficient solutions to this problem is still open and is further made more complicated by the new mechanism of wildcards introduced in java jse its reification aspects are currently unexplored and pose serious semantics and implementation issues in this paper we discuss an implementation support for wildcard types in java we first analyse the problem from an abstract viewpoint discussing the issues that have to be faced in order to extend an existing reification technique so as to support wildcards namely subtyping capture conversion and wildcards capture in method calls secondly we present an implementation in the context of the ego compiler ego is an approach for efficiently supporting runtime generics at compile time synthetic code is automatically added to the source code by the extended compiler so as to create generic runtime type information on by need basis store it into object instances and retrieve it when necessary in type dependent operations the solution discussed in this paper makes the ego compiler the first reification approach entirely dealing with the present version of the java programming language
in this paper we present novel interactive texture design scheme based on the tile optimization and image composition given small example texture the design process starts with applying an optimized sample patches selection operation to the example texture to obtain set of sample patches then set of omega tiles are constructed from these patches local changes to those tiles are further made by composing their local regions with the texture elements or objects interactively selected from other textures or normal images such select compose process is iterated many times until the desired omega tiles are obtained finally the tiles are tiled together to form large texture our experimental results demonstrate that the proposed technique can be used for designing large variety of versatile textures from single small example texture increasing or decreasing the density of texture elements as well as for synthesizing textures from multiple sources
we study new model of computation called stream checking on graph problems where space limited verifier has to verify proof sequentially ie it reads the proof as stream moreover the proof itself is nothing but reordering of the input data this model has close relationship to many models of computation in other areas such as data streams communication complexity and proof checking and could be used in applications such as cloud computing in this paper we focus on graph problems where the input is sequence of edges we show that checking if graph has perfect matching is impossible to do deterministically using small space to contrast this we show that randomized verifiers are powerful enough to check whether graph has perfect matching or is connected
the analysis of large and complex networks or graphs is becoming increasingly important in many scientific areas including machine learning social network analysis and bioinformatics one natural type of question that can be asked in network analysis is given two sets and of individuals in graph with complete and missing knowledge respectively about property of interest which individuals in are closest to with respect to this property to answer this question we can rank the individuals in such that the individuals ranked highest are most likely to exhibit the property of interest several methods based on weighted paths in the graph and markov chain models have been proposed to solve this task in this paper we show that we can improve previously published approaches by rephrasing this problem as the task of property prediction in graph structured data from positive examples the individuals in and unlabelled data the individuals in and applying an inexpensive iterative neighbourhood’s majority vote based prediction algorithm inmv to this task we evaluate our inmv prediction algorithm and two previously proposed methods using markov chains on three real world graphs in terms of roc auc statistic inmv obtains rankings that are either significantly better or not significantly worse than the rankings obtained from the more complex markov chain based algorithms while achieving reduction in run time of one order of magnitude on large graphs
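The abstract does not give INMV's exact update rule, so the sketch below uses a generic soft neighbourhood vote as a stand-in: seed nodes from the labelled set keep score 1, every other node repeatedly takes the mean of its neighbours' scores, and the final scores rank the unlabelled nodes.

```python
def inmv_scores(adj, positives, iterations=10):
    """adj: node -> set of neighbours; positives: nodes known to have the
    property. Each round, non-seed nodes take the average of their neighbours'
    current scores (a soft majority vote); seeds stay at 1.0."""
    score = {v: (1.0 if v in positives else 0.0) for v in adj}
    for _ in range(iterations):
        new = {}
        for v in adj:
            if v in positives:
                new[v] = 1.0
            elif adj[v]:
                new[v] = sum(score[u] for u in adj[v]) / len(adj[v])
            else:
                new[v] = 0.0
        score = new
    return score

graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"},
         "d": {"b", "e"}, "e": {"d"}}
scores = inmv_scores(graph, positives={"a"})
print(sorted((v for v in graph if v != "a"), key=scores.get, reverse=True))
```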
ranged hash functions generalize hash tables to the setting where hash buckets may come and go over time typical case in distributed settings where hash buckets may correspond to unreliable servers or network connections monotone ranged hash functions are particular class of ranged hash functions that minimize item reassignments in response to churn changes in the set of available buckets the canonical example of monotone ranged hash function is the ring based consistent hashing mechanism of karger et al these hash functions give maximum load of theta mlogm when is the number of items and is the number of buckets the question of whether some better bound could be obtained using more sophisticated hash function has remained open we resolve this question by showing two lower bounds first the maximum load of any randomized monotone ranged hash function is omega radic mlnm when mlogm this bound covers almost all of the nontrivial case because when omega mlogm simple random assignment matches the trivial lower bound of omega we give matching though impractical upper bound that shows that our lower bound is tight over almost all of its range second for randomized monotone ranged hash functions derived from metric spaces there is further trade off between the expansion factor of the metric and the load balance which for the special case of growth restricted metrics gives bound of omega mlogm asymptotically equal to that of consistent hashing these are the first known non trivial lower bounds for ranged hash functions they also explain why in ten years no better ranged hash functions have arisen to replace consistent hashing
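For reference, the canonical monotone ranged hash function discussed above, ring-based consistent hashing, can be sketched in a few lines: buckets and items are hashed onto the same ring and each item is assigned to the first bucket clockwise, so removing a bucket only reassigns the items that bucket held. The bucket and item counts below are arbitrary and the snippet makes no claim about the load bounds proved in the paper.

```python
import bisect, hashlib
from collections import Counter

def h(key):
    return int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Minimal ring-based consistent hashing (no virtual nodes): items and
    buckets share one ring and each item goes to the first bucket clockwise
    from its position; removing a bucket only moves that bucket's items."""
    def __init__(self, buckets):
        self.ring = sorted((h(b), b) for b in buckets)

    def lookup(self, item):
        pos = bisect.bisect(self.ring, (h(item),))
        return self.ring[pos % len(self.ring)][1]

ring = ConsistentHashRing([f"bucket{i}" for i in range(16)])
loads = Counter(ring.lookup(f"item{i}") for i in range(10000))
print("max load:", max(loads.values()), "mean load:", 10000 // 16)
```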
we introduce generalized definition of sld resolution admitting restrictions on atom and or clause selectability instances of these restrictions include delay declarations input consuming unification and guarded clauses in the context of such generalization of sld resolution we offer theoretical framework to reason about programs and queries such that all derivations are successful we provide characterization of those programs and queries which allows to reuse existing methods from the literature on termination and verification of prolog programs
identity federation is key factor for user to access multiple service providers seamlessly many protocols that federate between the providers have been proposed such protocols are operated under the same rules of identity management between the federated providers moreover cross protocol federation that uses different protocols for federation between the providers has also been proposed however the providers cannot federate all user information even though they do so using the cross protocol federation because they do not confirm whether the federated providers obey the same rules or not federation bridge that converts and regulates the interaction messages between the providers is described herein with it the provider can federate based on the rules that user and the providers determine
we present novel spectral based algorithm for clustering categorical data that combines attribute relationship and dimension reduction techniques found in principal component analysis pca and latent semantic indexing lsi the new algorithm uses data summaries that consist of attribute occurrence and co occurrence frequencies to create set of vectors each of which represents cluster we refer to these vectors as candidate cluster representatives the algorithm also uses spectral decomposition of the data summaries matrix to project and cluster the data objects in reduced space we refer to the algorithm as sccadds spectral based clustering algorithm for categorical data using data summaries sccadds differs from other spectral clustering algorithms in several key respects first the algorithm uses the attribute categories similarity matrix instead of the data object similarity matrix as is the case with most spectral algorithms that find the normalized cut of graph of nodes of data objects sccadds scales well for large datasets since in most categorical clustering applications the number of attribute categories is small relative to the number of data objects second non recursive spectral based clustering algorithms typically require means or some other iterative clustering method after the data objects have been projected into reduced space sccadds clusters the data objects directly by comparing them to candidate cluster representatives without the need for an iterative clustering method third unlike standard spectral based algorithms the complexity of sccadds is linear in terms of the number of data objects results on datasets widely used to test categorical clustering algorithms show that sccadds produces clusters that are consistent with those produced by existing algorithms while avoiding the computation of the spectra of large matrices and problems inherent in methods that employ the means type algorithms
web search is the dominant form of information access and everyday millions of searches are handled by mainstream search engines but users still struggle to find what they are looking for and there is much room for improvement in this paper we describe novel and practical approach to web search that combines ideas from personalization and social networking to provide more collaborative search experience we described how this has been delivered by complementing rather than competing with mainstream search engines which offers considerable business potential in google dominated search marketplace
multidatabase system mdbs integrates information from autonomous local databases managed by different database management systems mdbs in distributed environment number of challenges are raised for query optimization in such an mdbs one of the major challenges is that some local optimization information may not be available at the global level we recently proposed query sampling method to derive cost estimation formulas for local databases in an mdbs to use the derived formulas to estimate the costs of queries we need to know the selectivities of the qualifications of the queries unfortunately existing methods for estimating selectivities cannot be used efficiently in an mdbs environment this paper discusses difficulties of estimating selectivities in an mdbs based on the discussion this paper presents an integrated method to estimate selectivities in an mdbs the method integrates and extends several existing methods so that they can be used in an mdbs efficiently it extends christodoulakis’s parametric method so that estimation accuracy is improved and more types of queries can be handled it extends lipton and naughton’s adaptive sampling method so that both performance and accuracy are improved theoretical and experimental results show that the extended lipton and naughton’s method described in this paper can be many times faster than the original one in addition the integrated method uses new piggyback approach to collect and maintain statistics which can reduce the statistic maintenance cost the integrated method is designed for the mdbs in the cords project cords mdbs implementation considerations are also given in the paper
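A rough sketch of adaptive-sampling selectivity estimation in the spirit of the Lipton and Naughton style method mentioned above; the stopping threshold, sample budget, and parameter names are assumptions for illustration, not the paper's extended method.

```python
import random

def adaptive_sample_selectivity(table, predicate, threshold=100, max_samples=1000):
    """Draw random tuples until enough satisfying tuples are seen (or the
    budget runs out), then return the estimated fraction of qualifying tuples."""
    hits = draws = 0
    while hits < threshold and draws < max_samples:
        row = random.choice(table)          # uniform random tuple from the relation
        draws += 1
        if predicate(row):
            hits += 1
    return hits / draws if draws else 0.0   # estimated selectivity

# usage: estimate selectivity of "salary > 50000" on a sampled relation
rows = [{"salary": random.randint(10_000, 100_000)} for _ in range(10_000)]
print(adaptive_sample_selectivity(rows, lambda r: r["salary"] > 50_000))
```

The adaptive element is that cheap, highly selective predicates terminate early, while predicates with many qualifying tuples reach the threshold quickly with a small relative error.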
more and more documents on the world wide web are based on templates on technical level this causes those documents to have quite similar source code and dom tree structure grouping together documents which are based on the same template is an important task for applications that analyse the template structure and need clean training data this paper develops and compares several distance measures for clustering web documents according to their underlying templates combining those distance measures with different approaches for clustering we show which combination of methods leads to the desired result
skyline queries have gained much attention as alternative query semantics with pros eg low query formulation overhead and cons eg large result size and little control over it to overcome the cons subspace skyline queries have been recently studied where users iteratively specify relevant feature subspaces on search space however existing works mainly focus on centralized databases this paper aims to extend subspace skyline computation to distributed environments such as the web where the most important issue is to minimize the cost of accessing vertically distributed objects toward this goal we exploit prior skylines that have overlapped subspaces to the given subspace in particular we develop algorithms for three scenarios when the subspace of prior skylines is superspace subspace or the rest our experimental results validate that our proposed algorithm shows significantly better performance than the state of the art algorithms
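For context, a naive reference implementation of the subspace skyline semantics (dominance restricted to a chosen set of dimensions); the paper's contribution is avoiding such full scans by reusing prior skylines, so this sketch only fixes the semantics being computed.

```python
def dominates(a, b, dims):
    """a dominates b on subspace dims if it is no worse on every dimension and
    strictly better on at least one (smaller values assumed better here)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def subspace_skyline(points, dims):
    # keep points not dominated by any other point on the chosen subspace
    return [p for p in points
            if not any(dominates(q, p, dims) for q in points if q is not p)]

hotels = [(100, 2.0), (80, 3.5), (120, 0.5)]   # (price, distance)
print(subspace_skyline(hotels, dims=[0, 1]))
```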
the formalization of process definitions has been an invaluable aid in many domains however noticeable variations in processes start to emerge as precise details are added to process definitions while each such variation gives rise to different process these processes might more usefully be considered as variants of each other rather than completely different processes this paper proposes that it is beneficial to regard such an appropriately close set of process variants as process family the paper suggests characterization of what might comprise process family and introduces formal approach to defining families based upon this characterization to illustrate this approach we describe case study that demonstrates the different variations we observed in processes that define how dispute resolution is performed at the us national mediation board we demonstrate how our approach supports the definition of this set of process variants as process family
embodied agents are often designed with the ability to simulate human emotion this paper investigates the psychological impact of simulated emotional expressions on computer users with particular emphasis on how mismatched facial and audio expressions are perceived eg happy face with concerned voice in within subjects repeated measures experiment mismatched animations were perceived as more engaging warm concerned and happy when happy or warm face was in the animation as opposed to neutral or concerned face and when happy or warm voice was in the animation as opposed to neutral or concerned voice the results appear to follow cognitive dissonance theory as subjects attempted to make mismatched expressions consistent on both the visual and audio dimensions of animations resulting in confused perceptions of the emotional expressions design implications for affective embodied agents are discussed and future research areas identified
xml data management using relational database systems has been intensively studied in the last few years however in order for such systems to be viable they must support not only queries but also updates over virtual xml views that wrap the relational data while view updating is long standing difficult issue in the relational context the flexible xml data model and nested xml query language both pose additional challenges for view updating this paper addresses the question if for given update over an xml view correct relational update translation exists first we propose clean extended source theory as criteria for determining whether given translation mapping is correct to determine the existence of such correct mapping we classify view update as either un translatable conditionally or unconditionally translatable under given update translation policy this classification depends on several features of the xml view and the update granularity of the update at the view side properties of the view construction and types of duplication appearing in the view these features are represented in the annotated schema graph this is further utilized by our schema driven translatability reasoning algorithm star to classify given update into one of the three above update categories the correctness of the algorithm is proven using our clean extended source theory this technique represents practical approach that can be applied by any existing view update system in industry and academia for analyzing the translatability of given update statement before translation of it is attempted to illustrate the working algorithm we provide concrete case study on the translatability of xml view updates
the feature of continuous interaction in pen based system is critically significant seamless mode switch can effectively enhance the fluency of interaction the interface which incorporated the advantages of seamless and continuous operation has the potential of enhancing the efficiency of operation and concentrating the users attention in this paper we present seamless and continuous operation paradigm based on pen’s multiple input parameters prototype which can support seamless and continuous sc operation is designed to compare the performance with ms word system the subjects were requested to select target components activate the command menus and color the targets with given flowchart in two systems respectively the experiment results report the sc operation paradigm outperformed the standard ways in ms word in both operation speed and cursor footprint length cfl
quality of service routing is at present an active and remarkable research area since most emerging network services require specialized quality of service qos functionalities that cannot be provided by the current qos unaware routing protocols the provisioning of qos based network services is in general terms an extremely complex problem and significant part of this complexity lies in the routing layer indeed the problem of qos routing with multiple additive constraints is known to be np hard thus successful and wide deployment of the most novel network services demands that we thoroughly understand the essence of qos routing dynamics and also that the proposed solutions to this complex problem should be indeed feasible and affordable this article surveys the most important open issues in terms of qos routing and also briefly presents some of the most compelling proposals and ongoing research efforts done both inside and outside the next community to address some of those issues
social metadata are receiving interest from many domains mainly as way to aggregate various patterns in social networks few scholars have however taken the perspective of end users and examined how they utilize social metadata to enrich interpersonal communication the results of study of end user practices of social metadata usage are presented in this article data were gathered from variety of online forums by collecting and analyzing user discussions relating to social metadata supporting features in facebook three hundred and fifteen relevant comments on social metadata usage were extracted the analysis revealed the use of experimental profiles clashes between work and non work related social metadata usage and differences in users social investment causing social dilemmas the study also resulted in developments of theory relating to social metadata and relationship maintenance in conclusion social metadata expand pure attention economy conveying much wider qualitative range of social information
skyline computation is hot topic in database community due to its promising application in multi criteria decision making in sensor network application scenarios skyline is still useful and important in environment monitoring industry control etc to support energy efficient skyline monitoring in sensor networks this paper first presents naïve approach as baseline and then proposes an advanced approach that employs hierarchical thresholds at the nodes the threshold based approach focuses on minimizing the transmission traffic in the network to save the energy consumption finally we conduct extensive experiments to evaluate the proposed approaches on simulated data sets and compare the threshold based approach with the naïve approach experimental results show that the proposed threshold based approach outperforms the naïve approach substantially in energy saving
policies are widely used in many systems and applications recently it has been recognized that yes no response to every scenario is just not enough for many modern systems and applications many policies require certain conditions to be satisfied and actions to be performed before or after decision is made to address this need this paper introduces the notions of provisions and obligations provisions are those conditions that need to be satisfied or actions that must be performed before decision is rendered while obligations are those conditions or actions that must be fulfilled by either the users or the system after the decision this paper formalizes rule based policy framework that includes provisions and obligations and investigates reasoning mechanism within this framework policy decision may be supported by more than one derivation each associated with potentially different set of provisions and obligations called global po set the reasoning mechanism can derive all the global po sets for each specific policy decision and facilitates the selection of the best one based on numerical weights assigned to provisions and obligations as well as on semantic relationships among them the paper also shows the use of the proposed policy framework in security application
working in group consists of setting up an environment that allows the different participants to work together the collaboration has now become a discipline that fascinates the distributed environments as well as the human machine interactions the big challenge of the cscw environments is to be able to give the necessary mechanisms in order to carry out effective collaborative work ie to put the actors together in virtual room which simulates real situation of groupware meeting we would like to present model of architecture that places the actors in situation of virtual grouping centered on the awareness and that spreads on continuum of collaboration representing continuity of the group work augmented by the functional spaces of the clover model we have applied this model of architecture to the european project of tele neurology teneci which offers platform of telecommuting enriched by several functionalities presented to the neurologists to assure telediagnosis in the group
we connect two scenarios in structured learning adapting parser trained on one corpus to another annotation style and projecting syntactic annotations from one language to another we propose quasi synchronous grammar qg features for these structured learning tasks that is we score aligned pair of source and target trees based on local features of the trees and the alignment our quasi synchronous model assigns positive probability to any alignment of any trees in contrast to synchronous grammar which would insist on some form of structural parallelism in monolingual dependency parser adaptation we achieve high accuracy in translating among multiple annotation styles for the same sentence on the more difficult problem of cross lingual parser projection we learn dependency parser for target language by using bilingual text an english parser and automatic word alignments our experiments show that unsupervised qg projection improves on parses trained using only high precision projected annotations and far outperforms by more than absolute dependency accuracy learning an unsupervised parser from raw target language text alone when few target language parse trees are available projection gives boost equivalent to doubling the number of target language trees
in the attention driven image interpretation process an image is interpreted as containing several perceptually attended objects as well as the background the process benefits greatly content based image retrieval task with attentively important objects identified and emphasized an important issue to be addressed in an attention driven image interpretation is to reconstruct several attentive objects iteratively from the segments of an image by maximizing global attention function the object reconstruction is combinational optimization problem with complexity of which is computationally very expensive when the number of segments is large in this paper we formulate the attention driven image interpretation process by matrix representation an efficient algorithm based on the elementary transformation of matrix is proposed to reduce the computational complexity to wn where is the number of runs experimental results on both the synthetic and real data show significantly improved processing speed with an acceptable degradation to the accuracy of object formulation
classification is quite relevant task within data analysis field this task is not trivial task and different difficulties can arise depending on the nature of the problem all these difficulties can become worse when the datasets are too large or when new information can arrive at any time incremental learning is an approach that can be used to deal with the classification task in these cases it must alleviate or solve the problem of limited time and memory resources one emergent approach uses concentration bounds to ensure that decisions are made when enough information supports them iadem is one of the most recent algorithms that use this approach the aim of this paper is to improve the performance of this algorithm in different ways simplifying the complexity of the induced models adding the ability to deal with continuous data improving the detection of noise selecting new criteria for evolving the model including the use of more powerful prediction techniques etc besides these new properties the new system iadem preserves the ability to obtain performance similar to standard learning algorithms independently of the datasets size and it can incorporate new information as the basic algorithm does using short time per example
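The concentration-bound idea can be made concrete with the Hoeffding bound, a common choice in this family of incremental learners; whether IADEM uses exactly this bound and these parameter names is an assumption for illustration.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a variable
    with the given range is within epsilon of the true mean."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def enough_evidence(best, second_best, value_range, delta, n):
    # commit to a model update only when the observed advantage of the best
    # option over the runner-up exceeds epsilon, i.e. enough examples support it
    return (best - second_best) > hoeffding_bound(value_range, delta, n)

print(enough_evidence(best=0.62, second_best=0.55, value_range=1.0,
                      delta=1e-6, n=5000))
```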
this paper describes gburg which generates tiny fast code generators based on finite state machine pattern matching the code generators translate postfix intermediate code into machine instructions in one pass except of course for backpatching addresses stack based virtual machine known as the lean virtual machine lvm tuned for fast code generation is also described gburg translates the two page lvm to specification into code generator that fits entirely in an kb cache and that emits code at mb set on mhz our just in time code generator translates and executes small benchmarks at speeds within factor of two of executables derived from the conventional compile time code generator on which it is based
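A toy illustration of one-pass code generation from postfix intermediate code for a stack virtual machine, in the general setting described above; the opcodes and the emitted mnemonics are invented, and a real gburg-generated matcher is table driven rather than a chain of conditionals.

```python
def gen(postfix_ops):
    """Translate a postfix sequence into stack-machine instructions in one pass."""
    out = []
    for op in postfix_ops:
        if isinstance(op, int):
            out.append(f"PUSH {op}")                  # literal operand
        elif op in ("+", "-", "*"):
            out.append({"+": "ADD", "-": "SUB", "*": "MUL"}[op])  # pop two, push result
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out

print(gen([2, 3, "+", 4, "*"]))   # ['PUSH 2', 'PUSH 3', 'ADD', 'PUSH 4', 'MUL']
```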
web services technologies enable flexible and dynamic interoperation of autonomous software and information systems central challenge is the development of modeling techniques and tools for enabling the semi automatic composition and analysis of these services taking into account their semantic and behavioral properties this paper presents an overview of the fundamental assumptions and concepts underlying current work on service composition and provides sampling of key results in the area it also provides brief tour of several composition models including semantic web services the roman model and the mealy conversation model
in many envisioned mobile ad hoc networks nodes are expected to periodically beacon to advertise their presence in this way they can receive messages addressed to them or participate in routing operations yet these beacons leak information about the nodes and thus hamper their privacy classic remedy consists in each node making use of certified pseudonyms and changing its pseudonym in specific locations called mix zones of course privacy is then higher if the pseudonyms are short lived ie nodes have short distance to confusion but pseudonyms can be costly as they are usually obtained from an external authority in this paper we provide detailed analytical evaluation of the age of pseudonyms based on differential equations we corroborate this model by set of simulations this paper thus provides detailed quantitative framework for selecting the parameters of pseudonym based privacy system in peer to peer wireless networks
we notice an increasing usage of web applications in interactive spaces variant of ubiquitous computing environments interactive spaces feature large and dynamically changing number of devices eg an interactive tv set in the living room that is used with different input devices or an iphone that is dynamically federated to devices in the environment web applications need better way to exploit the resources in the interactive space beyond the standard input devices like mouse and keyboard eg speech recognition device this paper presents mundomonkey web browser extension and programming api for interactive spaces the api follows the event based programming paradigm for allowing web applications and end user scripts to access the interactive space our approach aligns well with the commonly used programming style for web applications we used mundomonkey to customize the interface of web applications to user preferences and the interactive space at hand to our knowledge our approach is the first to address adaptation of the output as well as processing of input data with mundomonkey the customization is performed transparently to the application developer by the end user thereby mundomonkey is an alternative to model driven user interface development approaches
in server proxy client tier networking architecture that is executed in the mobile network proxies should be dynamically assigned to serve mobile hosts according to geographical dependency and the network situation the goal of proxy handoff is to allow mobile host still can receive packets from the corresponding server while its serving proxy is switched from the current one to another one once proxy handoff occurs proper proxy should be selected based on the load balance concern and the network situation concern in this paper tier multimedia mobile transmission platform called mmtp is proposed to solve the proxy handoff problem in the ipv based mobile network proxy handoff scheme based on the application layer anycasting technique is proposed in mmtp proper proxy is selected from number of candidate proxies by using the application layer anycast an experimental environment based on mmtp is also built to analyze the performance metrics of mmtp including load balance among proxies and handoff latency
the global nature of energy creates challenges and opportunities for developing operating system policies to effectively manage energy consumption in battery powered mobile wireless devices the proposed currentcy model creates the framework for the operating system to manage energy as first class resource furthermore currentcy provides powerful mechanism to formulate energy goals and to unify resource management policies across diverse competing applications and spanning device components with very different power characteristics this paper explores the ability of the currentcy model to capture more complex interactions and to express more mature energy goals than previously considered we carry out this exploration in ecosystem an energy centric linux based operating system we extend ecosystem to address four new goals reducing residual battery capacity at the end of the targeted battery lifetime when it is no longer required eg recharging is available dynamic tracking of the energy needs of competing applications for more effective energy sharing reducing response time variation caused by limited energy availability and energy efficient disk management our results show that the currentcy model can express complex energy related goals and behaviors leading to more effective unified management policies than those that develop from per device approaches
one aim of this paper is to improve the logical and ontological rigor of the obo relation ontology by providing axiomatic specifications for logical properties of relations such as partof locatedin connectedto adjacentto attachedto etc all of these relations are currently only loosely specified in obo second aim is to improve the expressive power of the relation ontology by including axiomatic characterizations of qualitative size relations such as roughly the same size as negligible in size with respect to same scale etc these relations are important for comparing anatomical entities in way that is compatible with the normal variations of their geometric properties moreover qualitative size relations are important for distinguishing anatomical entities at different scales unfortunately the formal treatment of these relations is difficult due to their context dependent nature and their inherent vagueness this paper presents formalization that facilitates the separation of ontological aspects that are context independent and non vague from aspects that are context dependent and subject to vagueness third aim is to explicitly take into account the specific temporal properties of all of the relations and to provide formalization that can be used as basis for the formal representation of canonical anatomy as well as of instantiated anatomy all the relations and their properties are illustrated informally using human synovial joint as running example at the formal level the axiomatic theory is developed using isabelle computational system for implementing logical formalisms all proofs are computer verified and the computational representation of the theory is accessible at http://www.ifomis.org/bfo/fol
making cloud services responsive is critical to providing compelling user experience many large scale sites including linkedin digg and facebook address this need by deploying pools of servers that operate purely on in memory state unfortunately current technologies for partitioning requests across these in memory server pools such as network load balancers lead to frustrating programming model where requests for the same state may arrive at different servers leases are well known technique that can provide better programming model by assigning each piece of state to single server however in memory server pools host an extremely large number of items and granting lease per item requires fine grained leasing that is not supported in prior datacenter lease managers this paper presents centrifuge datacenter lease manager that solves this problem by integrating partitioning and lease management centrifuge consists of set of libraries linked in by the in memory servers and replicated state machine that assigns responsibility for data items including leases to these servers centrifuge has been implemented and deployed in production as part of microsoft’s live mesh large scale commercial cloud service in continuous operation since april when cloud services within mesh were built using centrifuge they required fewer lines of code and did not need to introduce their own subtle protocols for distributed consistency as cloud services become ever more complicated this kind of reduction in complexity is an increasingly urgent need
we propose and evaluate empirically the performance of dynamic processor scheduling policy for multiprogrammed shared memory multiprocessors the policy is dynamic in that it reallocates processors from one parallel job to another based on the currently realized parallelism of those jobs the policy is suitable for implementation in production systems in that it interacts well with very efficient user level thread packages leaving to them many low level thread operations that do not require kernel intervention it deals with thread blocking due to user and page faults it ensures fairness in delivering resources to jobs its performance measured in terms of average job response time is superior to that of previously proposed schedulers including those implemented in existing systems it provides good performance to very short sequential eg interactive requests we have evaluated our scheduler and compared it to alternatives using set of prototype implementations running on sequent symmetry multiprocessor using number of parallel applications with distinct qualitative behaviors we have both evaluated the policies according to the major criterion of overall performance and examined number of more general policy issues including the advantage of space sharing over time sharing the processors of multiprocessor and the importance of cooperation between the kernel and the application in reallocating processors between jobs we have also compared the policies according to other criteria important in real implementations in particular fairness and response time to short sequential requests we conclude that combination of performance and implementation considerations makes compelling case for our dynamic scheduling policy
we consider known protocol for reliable multicast in distributed mobile systems where mobile hosts communicate with wired infrastructure by means of wireless technology the original specification of the protocol does not take into consideration any notion of computer security an adversary may eavesdrop on communications between hosts and inject packets over the wireless links we suggest secured version of the protocol providing authenticity and integrity of packets over the wireless links the secure mechanisms introduced rely on two different techniques secure wireless channels and time signature schemes further we outline the formal verification of part of the secured protocol
data that appear to have different characteristics than the rest of the population are called outliers identifying outliers from huge data repositories is very complex task called outlier mining outlier mining has been akin to finding needles in haystack however outlier mining has number of practical applications in areas such as fraud detection network intrusion detection and identification of competitor and emerging business trends in commerce this survey discusses practical applications of outlier mining and provides taxonomy for categorizing related mining techniques comprehensive review of these techniques with their advantages and disadvantages along with some current research issues are provided
in this paper we present study to evaluate the impact of adaptive feedback on the effectiveness of pedagogical agent for an educational computer game we compare version of the game with no agent and two versions with agents that differ only in the accuracy of the student model used to guide the agent’s interventions we found no difference in student learning across the three conditions and we report an analysis to understand the reasons of these results
for distributed database system to function efficiently the fragments of the database need to be located judiciously at various sites across the relevant communications network the problem of allocating these fragments to the most appropriate sites is difficult one to solve however with most approaches available relying on heuristic techniques optimal approaches are usually based on mathematical programming and formulations available for this problem are based on the linearization of nonlinear binary integer programs and have been observed to be ineffective except on very small problems this paper presents new integer programming formulations for the nonredundant version of the fragment allocation problem this formulation is extended to address problems which have both storage and processing capacity constraints the approach is observed to be particularly effective in the presence of capacity restrictions extensive computational tests conducted over variety of parameter values indicate that the reformulations are very effective even on relatively large problems thereby reducing the need for heuristic approaches
shadows the common phenomena in most outdoor scenes bring many problems in image processing and computer vision in this paper we present novel method focusing on extracting shadows from single outdoor image the proposed tricolor attenuation model tam that describes the attenuation relationship between shadow and its non shadow background is derived based on image formation theory the parameters of the tam are fixed by using the spectral power distribution spd of daylight and skylight which are estimated according to planck’s blackbody irradiance law based on the tam multistep shadow detection algorithm is proposed to extract shadows compared with previous methods the algorithm can be applied to process single images gotten in real complex scenes without prior knowledge the experimental results validate the performance of the model
service based applications sbas need to operate in highly dynamic world in which their constituent services could fail or become unavailable monitoring is typically used to identify such failures and if needed to trigger an adaptation of the sba to compensate for those failures however existing monitoring approaches exhibit several limitations monitoring individual services can uncover failures of services yet it remains open whether those individual failures lead to violation of the sba’s requirements which would necessitate an adaptation monitoring the sba can uncover requirements deviations however it will not provide information about the failures leading to this deviation which constitutes important information needed for the adaptation activities even combination of and is limited for instance requirements deviation will only be identified after it has occurred after the execution of the whole sba which then in case of failures might require costly compensation actions in this paper we introduce an approach that addresses those limitations by augmenting monitoring techniques for individual services with formal verification techniques the approach explicitly encodes assumptions that the constituent services of an sba will perform as expected based on those assumptions formal verification is used to assess whether the sba requirements are satisfied and whether violation of those assumptions during run time leads to violation of the sba requirements thereby our approach allows for pro actively deciding whether the sba requirements will be violated based on monitored failures and identifying the specific root cause for the violated requirements
the multiprocessor soc mpsoc revolution is fueled by the need to execute multiple advanced multimedia applications on single embedded computing platform at design time the applications that will run in parallel and their respective user requirements are unknown hence run time manager rtm is needed to match all application needs with the available platform resources and services creating such run time manager requires two decisions first one needs to decide what functionality to implement second one has to decide how to implement this functionality in order to meet boundary conditions like eg real time performance this paper is the first to detail generic view on mpsoc run time management functionality and its design space trade offs we substantiate the run time components and the implementation trade offs with academic state of the art solutions and brief overview of some industrial multiprocessor run time management examples we show clear trend towards more hardware acceleration limited distribution of management functionality over the platform and increasing support for adaptive multimedia applications in addition we briefly detail upcoming run time management research issues
detecting whether computer program code is student’s original work or has been copied from another student or some other source is major problem for many universities detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files but require appropriate similarity functions for good performance we have used particle swarm optimization and genetic programming to evolve similarity functions that are suited to computer program code using training set of plagiarised and non plagiarised programs we have evolved better parameter values for the previously published okapi bm similarity function we have then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure we found that the evolved similarity functions outperformed the human developed okapi bm function we also found that detection system using the evolved functions was more accurate than the best code plagiarism detection system in use today and scales much better to large collections of files the evolutionary computing techniques have been extremely useful in finding similarity functions that advance the state of the art in code plagiarism detection
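For reference, an Okapi BM25-style similarity function of the kind whose parameters evolutionary search could tune (assuming the stripped identifier above is BM25); the default parameter values are just the textbook ones, not the evolved settings reported by the paper.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """doc_tf: term frequencies of the document; df: document frequencies of
    terms in the collection. k1 and b are the tunable free parameters."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        d = df.get(t, 1)
        idf = math.log(1 + (n_docs - d + 0.5) / (d + 0.5))
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```

In a plagiarism-detection setting the "query" would itself be a tokenized program, so the same function scores how strongly two code files resemble each other.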
this paper describes the concept of sensor networks which has been made viable by the convergence of micro electro mechanical systems technology wireless communications and digital electronics first the sensing tasks and the potential sensor networks applications are explored and review of factors influencing the design of sensor networks is provided then the communication architecture for sensor networks is outlined and the algorithms and protocols developed for each layer in the literature are explored open research issues for the realization of sensor networks are also discussed
taxonomic measures of semantic proximity allow us to compute the relatedness of two concepts these metrics are versatile instruments required for diverse applications eg the semantic web linguistics and also text mining however most approaches are only geared towards hand crafted taxonomic dictionaries such as wordnet which only feature limited fraction of real world concepts more specific concepts and particularly instances of concepts ie names of artists locations brand names etc are not covered the contributions of this paper are two fold first we introduce framework based on google and the open directory project odp enabling us to derive the semantic proximity between arbitrary concepts and instances second we introduce new taxonomy driven proximity metric tailored for our framework studies with human subjects corroborate our hypothesis that our new metric outperforms benchmark semantic proximity metrics and comes close to human judgement
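A simple co-occurrence-count proximity in the style of the normalized Google distance, shown only as a stand-in for the kind of web-derived benchmark such a framework competes with; the paper's own taxonomy-driven ODP metric is not reproduced here, and the exponential mapping to a similarity is an assumption.

```python
import math

def cooccurrence_proximity(f_x, f_y, f_xy, n):
    """f_x, f_y: hit counts of the two terms; f_xy: joint hit count; n: corpus size."""
    if f_xy == 0 or f_x == 0 or f_y == 0:
        return 0.0
    ngd = ((max(math.log(f_x), math.log(f_y)) - math.log(f_xy)) /
           (math.log(n) - min(math.log(f_x), math.log(f_y))))
    return math.exp(-2.0 * ngd)   # map the distance to a similarity in (0, 1]

print(cooccurrence_proximity(f_x=120_000, f_y=80_000, f_xy=15_000, n=10**10))
```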
modeling the beyond topical aspects of relevance are currently gaining popularity in ir evaluation for example the discounted cumulated gain dcg measure implicitly models some aspects of higher order relevance via diminishing the value of relevant documents seen later during retrieval eg due to information cumulated redundancy and effort in this paper we focus on the concept of negative higher order relevance nhor made explicit via negative gain values in ir evaluation we extend the computation of dcg to allow negative gain values perform an experiment in laboratory setting and demonstrate the characteristics of nhor in evaluation the approach leads to intuitively reasonable performance curves emphasizing from the user’s point of view the progression of retrieval towards success or failure we discuss normalization issues when both positive and negative gain values are allowed and conclude by discussing the usage of nhor to characterize test collections
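The extension is easy to see in code: nothing in the discounted cumulated gain formula forbids negative gain values, which is what makes the NHOR curves above possible. The gain values and the log base below are whatever the evaluation chooses; this is a generic DCG routine, not the paper's exact normalization.

```python
import math

def dcg_curve(gains, base=2):
    """Cumulative DCG over a ranked list of gain values; gains may be negative,
    so the curve can decrease where harmful documents are retrieved."""
    total, curve = 0.0, []
    for rank, g in enumerate(gains, start=1):
        discount = math.log(rank, base) if rank > base else 1.0
        total += g / discount
        curve.append(total)
    return curve

print(dcg_curve([3, -2, 1, 0, -1]))
```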
this paper addresses the field of unbounded model checking umc based on sat engines where craig interpolants have recently gained wide acceptance as an automated abstraction technique we start from the observation that interpolants can be quite effective on large verification instances as they operate on sat generated refutation proofs interpolants are very good at automatically abstract facts that are not significant for proofs in this work we push forward the new idea of generating abstractions without resorting to sat proofs and to accept reject abstractions whenever they do not fulfill given adequacy constraints we propose an integrated approach smoothly combining the capabilities of interpolation with abstraction and over approximation techniques that do not directly derive from sat refutation proofs the driving idea of this combination is to incrementally generate by refinement an abstract over approximate image built up from equivalences implications ternary and localization abstraction then eventually from sat refutation proofs experimental results derived from the verification of hard problems show the robustness of our approach
the correlation between keywords has been exploited to improve automatic image annotation aia differing from the traditional lexicon or training data based keyword correlation estimation we propose using web scale image semantic space learning to explore the keyword correlation for automatic web image annotation specifically we use the social media web site flickr as web scale image semantic space to determine the annotation keyword correlation graph to smooth the annotation probability estimation to further improve web image annotation performance we present novel constraint piecewise penalty weighted regression model to estimate the semantics of the web image from the corresponding associated text we integrate the proposed approaches into our web image annotation framework and conduct experiments on real web image data set the experimental results show that both of our approaches can improve the annotation performance significantly
in this paper we propose novel approach to reduce dynamic power in set associative caches that leverages on leakage saving proposal namely cache decay we thus open the possibility to unify dynamic and leakage management in the same framework the main intuition is that in decaying cache dead lines in set need not be searched thus rather than trying to predict which cache way holds specific line we predict for each way whether the line could be live in it we access all the ways that possibly contain the live line and we call this way selection in contrast to way prediction way selection cannot be wrong the line is either in the selected ways or not in the cache the important implication is that we have fixed hit time indispensable for both performance and ease of implementation reasons in order to achieve high accuracy in terms of total ways accessed we use decaying bloom filters to track only the live lines in ways dead lines are automatically purged we offer efficient implementations of such autonomously decaying bloom filters using novel quasi static cells our prediction approach grants us high accuracy in narrowing the choice of ways for hits as well as the ability to predict misses known weakness of way prediction
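A software-level toy of a decaying Bloom filter, to make the way-selection idea concrete: entries that are not refreshed (dead lines) fade out on their own, and membership tests never produce false negatives for recently touched keys. Sizes, hash functions, and the decay policy are illustrative; the paper's quasi-static cells are a circuit-level mechanism, not software.

```python
class DecayingBloomFilter:
    def __init__(self, size=256, hashes=3, max_age=4):
        self.size, self.hashes, self.max_age = size, hashes, max_age
        self.cells = [0] * size                 # small counters instead of bits

    def _idx(self, key):
        return [hash((key, i)) % self.size for i in range(self.hashes)]

    def insert(self, key):
        for i in self._idx(key):                # refresh on every access to a live line
            self.cells[i] = self.max_age

    def decay(self):
        # called periodically; untouched entries eventually drop to zero
        self.cells = [max(0, c - 1) for c in self.cells]

    def maybe_contains(self, key):
        # "way selection": access a way only if its filter may hold the live line
        return all(self.cells[i] > 0 for i in self._idx(key))
```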
the dynamic and lossy nature of wireless communication poses major challenges to reliable self organizing multihop networks these non ideal characteristics are more problematic with the primitive low power radio transceivers found in sensor networks and raise new issues that routing protocols must address link connectivity statistics should be captured dynamically through an efficient yet adaptive link estimator and routing decisions should exploit such connectivity statistics to achieve reliability link status and routing information must be maintained in neighborhood table with constant space regardless of cell density we study and evaluate link estimator neighborhood table management and reliable routing protocol techniques we focus on many to one periodic data collection workload we narrow the design space through evaluations on large scale high level simulations to node in depth empirical experiments the most effective solution uses simple time averaged ewma estimator frequency based table management and cost based routing
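A minimal sketch of the time-averaged EWMA link estimator evaluated above; the window mechanics and the smoothing factor alpha are assumptions chosen for illustration.

```python
class EwmaLinkEstimator:
    def __init__(self, alpha=0.4):
        self.alpha = alpha
        self.quality = 0.0                 # estimated packet reception ratio in [0, 1]
        self.received = self.expected = 0

    def packet(self, got_it, was_expected=True):
        # per-packet bookkeeping within the current estimation window
        self.received += int(got_it)
        self.expected += int(was_expected)

    def end_of_window(self):
        # fold the window's observed reception ratio into the running estimate
        if self.expected:
            ratio = self.received / self.expected
            self.quality = (1 - self.alpha) * self.quality + self.alpha * ratio
        self.received = self.expected = 0
        return self.quality
```

Routing then prefers neighbors whose estimated quality gives the lowest expected cost to the sink, which is the cost-based element of the most effective solution described above.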
design of high performance cluster networks routers with quality of service qos guarantees is becoming increasingly important to support variety of multimedia applications many of which have real time constraints most commercial routers which are based on the wormhole switching paradigm can deliver high performance but lack qos provisioning in this paper we present pipelined wormhole router architecture that can provide high and predictable performance for integrated traffic in clusters we consider two different implementations non preemptive model and more aggressive preemptive model we also present the design of network interface card nic based on the virtual interface architecture via design paradigm to support qos in the nic the qos capable router and nic designs are evaluated with mixed workload consisting of best effort traffic multimedia streams and control traffic simulation results of an port router and times mesh network indicate that the preemptive router can provide better performance than the non preemptive router for dynamically changing workloads co evaluation of the qos aware nic with the proposed router models shows significant performance improvement compared to that with traditional nic without any qos support
mpeg is promising standard for the description of multimedia content number of applications based on mpeg media descriptions have been set up for research commercial and industrial applications therefore an efficient storage solution for large amounts of mpeg descriptions is certainly desirable as kind of data centric xml documents mpeg descriptions can be stored in the relational dbms for efficient and effective management the approaches of storing xml data in relational dbms can be classified into two classes of storage model schema conscious and schema oblivious the schema conscious model however cannot support complex xpath based queries efficiently and the schema oblivious approach lacks the flexibility in typed representation and access although the leading database systems have provided functionality for the xml document management none of them can reach all the critical requirements for the mpeg descriptions management in this paper we present new storage approach called ixmdb for mpeg documents storage solution ixmdb integrates the advantages of both the schema conscious method and the schema oblivious method and avoids the main drawbacks from each method the design of ixmdb pays attention to both multimedia information exchange and multimedia data manipulation its features can reach the most critical requirements for the mpeg documents storage and management the translation mechanism for converting xquery to sql and the support of query from multimedia perspective are provided with ixmdb performance studies are conducted by performing set of queries from the xml perspective and from the multimedia perspective the experimental results are presented in the paper and initial results are encouraging
this paper describes an approach for the automated verification of mobile programs mobile systems are characterized by the explicit notion of locations eg sites where they run and the ability to execute at different locations yielding number of security issues we give formal semantics to mobile systems as labeled kripke structures which encapsulate the notion of the location net the location net summarizes the hierarchical nesting of threads constituting mobile program and enables specifying security policies we formalize language for specifying security policies and show how mobile programs can be exhaustively analyzed against any given security policy by using model checking techniques we developed and experimented with prototype framework for analysis of mobile code using the satabs model checker our approach relies on satabs’s support for unbounded thread creation and enhances it with location net abstractions which are essential for verifying large mobile programs our experimental results on various benchmarks are encouraging and demonstrate advantages of the model checking based approach which combines the validation of security properties with other checks such as for buffer overflows
although several process modeling languages allow one to specify processes with multiple start elements the precise semantics of such models are often unclear both from pragmatic and from theoretical point of view this paper addresses the lack of research on this problem and introduces the casu framework from creation activation subscription unsubscription the contribution of this framework is systematic description of design alternatives for the specification of instantiation semantics of process modeling languages we classify six prominent languages by the help of this framework we validate the relevance of the casu framework through empirical investigations involving large set of process models from practice our work provides the basis for the design of new correctness criteria as well as for the formalization of event driven process chains epcs and extension of the business process modeling notation bpmn it complements research such as the workflow patterns
priority driven search is an algorithm for retrieving similar shapes from large database of objects given query object and database of target objects all represented by sets of local shape features the algorithm produces ranked list of the best target objects sorted by how well any subset of features on the query match features on the target object to achieve this goal the system maintains priority queue of potential sets of feature correspondences partial matches sorted by cost function accounting for both feature dissimilarity and the geometric deformation only partial matches that can possibly lead to the best full match are popped off the queue and thus the system is able to find provably optimal match while investigating only small subset of potential matches new methods based on feature distinction feature correspondences at multiple scales and feature difference ranking further improve search time and retrieval performance in experiments with the princeton shape benchmark the algorithm provides significantly better classification rates than previously tested shape matching methods while returning the best matches in few seconds per query
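A schematic version of the priority-queue mechanism described above: partial correspondences are ordered by accumulated cost and only the cheapest is extended, so with non-negative costs the first full match popped is optimal. The cost function, the fixed match length, and the absence of geometric-deformation terms are simplifying assumptions.

```python
import heapq
import itertools

def priority_driven_search(query_feats, target_feats, cost, max_len=None):
    """cost(q, t) >= 0 scores how well target feature t matches query feature q."""
    max_len = max_len or len(query_feats)
    counter = itertools.count()              # tie-breaker so matches are never compared
    heap = [(0.0, next(counter), [])]        # (cost so far, tie, list of (q, t) pairs)
    while heap:
        c, _, match = heapq.heappop(heap)
        if len(match) == max_len:
            return c, match                  # cheapest complete correspondence
        q = query_feats[len(match)]          # next query feature to place
        used = {t for _, t in match}
        for t in target_feats:
            if t not in used:
                heapq.heappush(heap, (c + cost(q, t), next(counter), match + [(q, t)]))
    return None                              # no feasible correspondence
```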
we introduce gesture controllers method for animating the body language of avatars engaged in live spoken conversation gesture controller is an optimal policy controller that schedules gesture animations in real time based on acoustic features in the user’s speech the controller consists of an inference layer which infers distribution over set of hidden states from the speech signal and control layer which selects the optimal motion based on the inferred state distribution the inference layer consisting of specialized conditional random field learns the hidden structure in body language style and associates it with acoustic features in speech the control layer uses reinforcement learning to construct an optimal policy for selecting motion clips from distribution over the learned hidden states the modularity of the proposed method allows customization of character’s gesture repertoire animation of non human characters and the use of additional inputs such as speech recognition or direct user control
the main drawback of existing software artifact management systems is the lack of automatic or semi automatic traceability link generation and maintenance we have improved an artifact management system with traceability recovery tool based on latent semantic indexing lsi an information retrieval technique we have assessed lsi to identify strengths and limitations of using information retrieval techniques for traceability recovery and devised the need for an incremental approach the method and the tool have been evaluated during the development of seventeen software projects involving about students we observed that although tools based on information retrieval provide useful support for the identification of traceability links during software development they are still far to support complete semi automatic recovery of all links the results of our experience have also shown that such tools can help to identify quality problems in the textual description of traced artifacts
we identify two issues with searching literature digital collections within digital libraries there are no effective paper scoring and ranking mechanisms without scoring and ranking system users are often forced to scan large and diverse set of publications listed as search results and potentially miss the important ones topic diffusion is common problem publications returned by keyword based search query often fall into multiple topic areas not all of which are of interest to users this paper proposes new literature digital collection search paradigm that effectively ranks search outputs while controlling the diversity of keyword based search query output topics our approach is as follows first during pre querying publications are assigned into pre specified ontology based contexts and query independent context scores are attached to papers with respect to the assigned contexts when query is posed relevant contexts are selected search is performed within the selected contexts context scores of publications are revised into relevancy scores with respect to the query at hand and the context that they are in and query outputs are ranked within each relevant context this way we minimize query output topic diversity reduce query output size decrease user time spent scanning query results and increase query output ranking accuracy using genomics oriented pubmed publications as the testbed and gene ontology terms as contexts our experiments indicate that the proposed context based search approach produces search results with up to higher precision and reduces the query output size by up to
brighthouse is column oriented data warehouse with an automatically tuned ultra small overhead metadata layer called knowledge grid that is used as an alternative to classical indexes the advantages of column oriented data storage as well as data compression have already been well documented especially in the context of analytic decision support querying this paper demonstrates additional benefits resulting from knowledge grid for compressed column oriented databases in particular we explain how it assists in query optimization and execution by minimizing the need of data reads and data decompression
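A rough illustration of how lightweight per-pack summaries can stand in for indexes: with only min/max metadata, a range predicate classifies each compressed pack of column values as irrelevant, fully relevant, or suspect (needing decompression). The pack granularity and the three-way labels are assumptions loosely modelled on the knowledge-grid idea, not Brighthouse internals.

```python
def classify_packs(pack_stats, lo, hi):
    """pack_stats: dict pack_id -> (min_value, max_value) for one column;
    classify each pack against the predicate lo <= value <= hi."""
    result = []
    for pack_id, (pmin, pmax) in pack_stats.items():
        if pmax < lo or pmin > hi:
            result.append((pack_id, "irrelevant"))   # skip without reading
        elif lo <= pmin and pmax <= hi:
            result.append((pack_id, "relevant"))     # count without decompressing
        else:
            result.append((pack_id, "suspect"))      # must decompress and filter
    return result

print(classify_packs({0: (1, 9), 1: (12, 20), 2: (8, 15)}, lo=10, hi=18))
```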
we report on our experience with the hardware transactional memory htm feature of two pre production revisions of new commercial multicore processor our experience includes number of promising results using htm to improve performance in variety of contexts and also identifies some ways in which the feature could be improved to make it even better we give detailed accounts of our experiences sharing techniques we used to achieve the results we have as well as describing challenges we faced in doing so
effective bug localization is important for realizing automated debugging one attractive approach is to apply statistical techniques on collection of evaluation profiles of program properties to help localize bugs previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors however because many bugs may not be directly associated with these predicates these techniques are often ineffective in localizing bugs relevant control flow paths that may contain bug locations are more informative than stand alone predicates for discovering and understanding bugs in this paper we propose an approach to automatically generate such faulty control flow paths that link many bug predictors together for revealing bugs our approach combines feature selection to accurately select failure related predicates as bug predictors clustering to group correlated predicates and control flow graph traversal in novel way to help generate the paths we have evaluated our approach on code including the siemens test suite and rhythmbox large music management application for gnome our experiments show that the faulty control flow paths are accurate useful for localizing many bugs and helped to discover previously unknown errors in rhythmbox
nowadays due to the lack of face to face contact distance course instructors have real difficulties knowing who their students are how their students behave in the virtual course what difficulties they find what probability they have of passing the subject in short they need to have feedback which helps them to improve the learning teaching process although most learning content management systems lcms offer reporting tool in general these do not show clear vision of each student’s academic progression in this work we propose decision making system which helps instructors to answer these and other questions using data mining techniques applied to data from lcmss databases the goal of this system is that instructors do not require data mining knowledge they only need to request pattern or model interpret the result and take the educational actions which they consider necessary
the early success of link based ranking algorithms was predicated on the assumption that links imply merit of the target pages however today many links exist for purposes other than to confer authority such links bring noise into link analysis and harm the quality of retrieval in order to provide high quality search results it is important to detect them and reduce their influence in this paper method is proposed to detect such links by considering multiple similarity measures over the source pages and target pages with the help of classifier these noisy links are detected and dropped after that link analysis algorithms are performed on the reduced link graph the usefulness of number of features is also tested experiments across query specific datasets show our approach almost doubles the performance of kleinberg’s hits and boosts bharat and henzinger’s imp algorithm by close to in terms of precision it also outperforms previous approach focusing on link farm detection
this paper reports field research of enterprise storage administration we observed that storage administration was complex and not always optimally supported by software tools the findings highlight storage administrator work practices and challenges and inform guidelines for designing software tools that support storage administrators work
runtime monitoring allows programmers to validate for instance the proper use of application interfaces given property specification runtime monitor tracks appropriate runtime events to detect violations and possibly execute recovery code although powerful runtime monitoring inspects only one program run at time and so may require many program runs to find errors therefore in this paper we present ahead of time techniques that can prove the absence of property violations on all program runs or flag locations where violations are likely to occur our work focuses on tracematches an expressive runtime monitoring notation for reasoning about groups of correlated objects we describe novel flow sensitive static analysis for analyzing monitor states our abstraction captures both positive information set of objects could be in particular monitor state and negative information the set is known not to be in state the analysis resolves heap references by combining the results of three points to and alias analyses we also propose machine learning phase to filter out likely false positives we applied set of tracematches to the dacapo benchmark suite and scimark our static analysis rules out all potential points of failure in of the cases and of false positives on average our machine learning algorithm correctly classifies the remaining potential points of failure in all but three of cases the approach revealed defects and suspicious code in three benchmark programs
motivated by contextual advertising systems and other web applications involving efficiency accuracy tradeoffs we study similarity caching here cache hit is said to occur if the requested item is similar but not necessarily equal to some cached item we study two objectives that dictate the efficiency accuracy tradeoff and provide our caching policies for these objectives by conducting extensive experiments on real data we show similarity caching can significantly improve the efficiency of contextual advertising systems with minimal impact on accuracy inspired by the above we propose simple generative model that embodies two fundamental characteristics of page requests arriving to advertising systems namely long range dependences and similarities we provide theoretical bounds on the gains of similarity caching in this model and demonstrate these gains empirically by fitting the actual data to the model
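a minimal sketch of a similarity cache, assuming items are represented as feature vectors and a hit is declared when the cosine similarity to some cached key exceeds a threshold; the threshold, the lru eviction policy and the vector representation are illustrative assumptions, not the paper's caching policies

```python
import math
from collections import OrderedDict

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class SimilarityCache:
    """LRU-style cache that returns a cached value when the query vector is
    sufficiently similar to some cached key (an approximate hit)."""
    def __init__(self, capacity, threshold=0.9):
        self.capacity = capacity
        self.threshold = threshold
        self.entries = OrderedDict()  # tuple(vector) -> value

    def get(self, vector):
        best_key, best_sim = None, -1.0
        for key in self.entries:
            sim = cosine(vector, key)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is not None and best_sim >= self.threshold:
            self.entries.move_to_end(best_key)  # refresh recency
            return self.entries[best_key]
        return None  # miss: caller computes the exact answer and calls put()

    def put(self, vector, value):
        key = tuple(vector)
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```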
responsiveness or the time until person responds to communication can affect the dynamics of conversation as well as participants perceptions of one another in this paper we present careful examination of responsiveness to instant messaging communication showing for example that work fragmentation significantly correlates with faster responsiveness we show also that the presentation of the incoming communication significantly affects responsiveness even more so than indicators that the communication was ongoing suggesting the potential for dynamically influencing responsiveness this work contributes to better understanding of computer mediated communication and to the design of new tools for computer mediated communication
actuation functionality in sensor network enables an unprecedented interaction with the physical environment when used by malicious distributed network however actuation may become potent new attack in this work we explore new general class of actuation attacks which aim to disable the sensing fidelity and dependability of wireless sensor network we propose countermeasure to this denial of service on sensing doss based on controlled level of random mobility we show how the level of mobility may be traded off to suit security needs and energy constraints and to exploit priori knowledge of the environment we demonstrate how this random mobility approach performs under various strengths densities and distributions of the two networks and show that it reduces the number of affected nodes exponentially over time furthermore we discuss how this simple mobility approach renders the network more fault tolerant and resilient in an inherent way without need for the nodes to communicate and aggregate their sensed data
sports and fitness are increasingly attracting the interest of computer science researchers as well as companies in particular recent mobile devices with hardware graphics acceleration offer new still unexplored possibilities this paper investigates the use of mobile guides in fitness activities proposing the mobile personal trainer mopet application mopet uses gps device to monitor user’s position during her physical activity in an outdoor fitness trail it provides navigation assistance by using fitness trail map and giving speech directions moreover mopet provides motivation support and exercise demonstrations by using an embodied virtual trainer called evita evita shows how to correctly perform the exercises along the trail with animations and incites the user to the best of our knowledge our project is the first to employ mobile guide for fitness activities the effects of mopet on motivation as well as its navigational and training support have been experimentally evaluated with users evaluation results encourage the use of mobile guides and embodied virtual trainers in outdoor fitness applications
in this paper we consider object protocols that constrain interactions between objects in program several such protocols have been proposed in the literature for many apis such as jdom jdbc api designers constrain how api clients interact with api objects in practice api clients violate such constraints as evidenced by postings in discussion forums for these apis thus it is important that api designers specify constraints using appropriate object protocols and enforce them the goal of an object protocol is expressed as protocol invariant fundamental properties such as ownership can be expressed as protocol invariants we present language prolang to specify object protocols along with their protocol invariants and tool invcop to check if program satisfies protocol invariant invcop separates the problem of checking if protocol satisfies its protocol invariant called protocol correctness from the problem of checking if program conforms to protocol called program conformance the former is solved using static analysis and the latter using runtime analysis due to this separation errors made in protocol design are detected at higher level of abstraction independent of the program’s source code and performance of conformance checking is improved as protocol correctness has been verified statically we present theoretical guarantees about the way we combine static and runtime analysis and empirical evidence that our tool invcop finds usage errors in widely used apis we also show that statically checking protocol correctness greatly optimizes the overhead of checking program conformance thus enabling api clients to test whether their programs use the api as intended by the api designer
families use range of devices and locations to capture manage and share digital photos as part of their digital photo ecosystem the act of moving media between devices and locations is not always simple though and can easily become time consuming we conducted interviews and design sessions in order to better understand the movement of media in digital photo ecosystems and investigate ways to improve it our results show that users must manage multiple entry points into their ecosystem avoid segmentation in their collections and explicitly select and move photos between desired devices and locations through design sessions we present and evaluate design ideas to overcome these challenges that utilize multipurpose devices always accessible photo collections and sharing from any device these show how automation can be combined with recommendation and user interaction to improve flow within digital photo ecosystems
we present an approach for extracting reliefs and details from relief surfaces we consider relief surface as surface composed of two components base surface and height function which is defined over this base however since the base surface is unknown the decoupling of these components is challenge we show how to estimate robust height function over the base without explicitly extracting the base surface this height function is utilized to separate the relief from the base several applications benefiting from this extraction are demonstrated including relief segmentation detail exaggeration and dampening copying of details from one object to another and curve drawing on meshes
the web has become an important medium for news delivery and consumption fresh content about variety of topics events and places is constantly being created and published on the web by news agencies around the world as intuitively understood by readers and studied in journalism news articles produced by different social groups present different attitudes towards and interpretations of the same news issues in this paper we propose new paradigm for aggregating news articles according to the local news sources associated with the stakeholders of the news issues this new paradigm provides users the capability to aggregate and browse various local points of view about the news issues in which they are interested we implement this paradigm in system called localsavvy localsavvy analyzes the news articles provided by users using knowledge about locations automatically acquired from the web based on the analysis of the news issue the system finds and aggregates local news articles published by official and unofficial news sources associated with the stakeholders moreover opinions from those local social groups are extracted from the retrieved results presented in the summaries and highlighted in the news web pages we evaluate localsavvy with user study the quantitative and qualitative analysis shows that news articles aggregated by localsavvy present relevant and distinct local opinions which can be clearly perceived by the subjects
multi core processors with low communication costs and high availability of execution cores will increase the use of execution and compilation models that use short threads to expose parallelism current branch predictors seek to incorporate large amounts of control flow history to maximize accuracy however when that history is absent the predictor fails to work as intended thus modern predictors are almost useless for threads below certain length using speculative multithreaded spmt architecture as an example of system which generates shorter threads this work examines techniques to improve branch prediction accuracy when new thread begins to execute on different core this paper proposes minor change to the branch predictor that gives virtually the same performance on short threads as an idealized predictor that incorporates unknowable pre history of spawned speculative thread at the same time strong performance on long threads is preserved the proposed technique sets the global history register of the spawned thread to the initial value of the program counter this novel and simple design reduces branch mispredicts by and provides as much as ipc improvement on selected spec benchmarks
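a toy gshare style model of the proposed predictor change, assuming a table of two bit counters indexed by the branch pc xored with a global history register; the only point illustrated is that a freshly spawned thread seeds its history register from its starting program counter instead of starting from an empty history

```python
class GsharePredictor:
    """Toy gshare predictor: 2-bit counters indexed by (PC xor global history)."""
    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)  # initialized weakly not-taken
        self.ghr = 0                    # global history register

    def seed_from_pc(self, spawn_pc):
        # hedged reading of the proposed technique: the spawned thread's
        # history register starts from its initial program counter value
        self.ghr = spawn_pc & self.mask

    def predict(self, pc):
        return self.table[(pc ^ self.ghr) & self.mask] >= 2

    def update(self, pc, taken):
        idx = (pc ^ self.ghr) & self.mask
        if taken:
            self.table[idx] = min(3, self.table[idx] + 1)
        else:
            self.table[idx] = max(0, self.table[idx] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```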
several studies have explored the relationship between the metrics of the object oriented software and the change proneness of the classes this knowledge can be used to help decision making among design alternatives or assess software quality such as maintainability despite the increasing use of complex inheritance relationships and polymorphism in object oriented software there has been less emphasis on developing metrics that capture the aspect of dynamic behavior considering dynamic behavior metrics in conjunction with existing metrics may go long way toward obtaining more accurate predictions of change proneness to address this need we provide the behavioral dependency measure using structural and behavioral information taken from uml design models model based change proneness prediction helps to make high quality software by exploiting design models from the earlier phase of the software development process the behavioral dependency measure has been evaluated on multi version medium size open source project called jflex the results obtained show that the proposed measure is useful indicator and can be complementary to existing object oriented metrics for improving the accuracy of change proneness prediction when the system contains high degree of inheritance relationships and polymorphism
the intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build better metaclassifier via some combination of classifiers we introduce probabilistic method for combining classifiers that considers the context sensitive reliabilities of contributing classifiers the method harnesses reliability indicators variables that provide valuable signal about the performance of classifiers in different situations we provide background present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs and review set of comparative studies undertaken to evaluate the methodology
in this paper we propose re ranking algorithm using post retrieval clustering for content based image retrieval cbir in conventional cbir systems it is often observed that images visually dissimilar to query image are ranked high in retrieval results to remedy this problem we utilize the similarity relationship of the retrieved results via post retrieval clustering in the first step of our method images are retrieved using visual features such as color histogram next the retrieved images are analyzed using hierarchical agglomerative clustering methods hacm and the rank of the results is adjusted according to the distance of cluster from query in addition we analyze the effects of clustering methods query cluster similarity functions and weighting factors in the proposed method we conducted number of experiments using several clustering methods and cluster parameters experimental results show that the proposed method achieves an improvement of retrieval effectiveness of over on average in the average normalized modified retrieval rank anmrr measure
languages that integrate functional and logic programming with complete operational semantics are based on narrowing unification based goal solving mechanism which subsumes the reduction principle of functional languages and the resolution principle of logic languages in this article we present partial evaluation scheme for functional logic languages based on an automatic unfolding algorithm which builds narrowing trees the method is formalized within the theoretical framework established by lloyd and shepherdson for the partial deduction of logic programs which we have generalized for dealing with functional computations generic specialization algorithm is proposed which does not depend on the eager or lazy nature of the narrower being used to the best of our knowledge this is the first generic algorithm for the specialization of functional logic programs we also discuss the relation to work on partial evaluation in functional programming term rewriting systems and logic programming finally we present some experimental results with an implementation of the algorithm which show in practice that the narrowing driven partial evaluator effectively combines the propagation of partial data structures by means of logical variables and unification with better opportunities for optimization thanks to the functional dimension
major asset of modern systems is to dynamically reconfigure systems to cope with failures or component updates nevertheless designing such systems with off the shelf components is hardly feasible components are black boxes that can only interact with others on compatible interfaces part of the problem is solved through software adaptation techniques which compensates mismatches between interfaces our approach aims at using results of software adaptation in order to also provide reconfiguration capabilities to black box components this paper provides two contributions formal framework that unifies behavioural adaptation and structural reconfiguration of components this is used for statically reasoning whether it is possible to reconfigure system and ii two cases of reconfiguration in client server system in which the server is substituted by another one with different behavioural interface and the system keeps on working transparently from the client’s point of view
we present novel architecture for hardware accelerated rendering of point primitives our pipeline implements refined version of ewa splatting high quality method for antialiased rendering of point sampled representations central feature of our design is the seamless integration of the architecture into conventional opengl like graphics pipelines so as to complement triangle based rendering the specific properties of the ewa algorithm required variety of novel design concepts including ternary depth test and using an on chip pipelined heap data structure for making the memory accesses of splat primitives more coherent in addition we developed computationally stable evaluation scheme for perspectively corrected splats we implemented our architecture both on reconfigurable fpga boards and as an asic prototype and we integrated it into an opengl like software implementation our evaluation comprises detailed performance analysis using scenes of varying complexity
although compile time optimizations generally improve program performance degradations caused by individual techniques are to be expected one promising research direction to overcome this problem is the development of dynamic feedback directed optimization orchestration algorithms which automatically search for the combination of optimization techniques that achieves the best program performance the challenge is to develop an orchestration algorithm that finds in an exponential search space solution that is close to the best in acceptable time in this paper we build such fast and effective algorithm called combined elimination ce the key advance of ce over existing techniques is that it takes the least tuning time of the closest alternative while achieving the same program performance we conduct the experiments on both pentium iv machine and sparc ii machine by measuring performance of spec cpu benchmarks under large set of gcc compiler options furthermore through orchestrating small set of optimizations causing the most degradation we show that the performance achieved by ce is close to the upper bound obtained by an exhaustive search algorithm the gap is less than on average
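a schematic of the combined elimination loop described above, assuming a user supplied measure(flags) routine that compiles and times the program under a given on and off option setting; the relative improvement bookkeeping follows the description here, while the toy fake_measure and the option names are illustrative assumptions

```python
def combined_elimination(options, measure):
    """Combined elimination over on/off options; measure(flags) returns run time."""
    def rip(option, baseline, base_time):
        trial = dict(baseline)
        trial[option] = False
        # relative improvement (%) of switching the option off; negative => option hurts
        return (measure(trial) - base_time) / base_time * 100.0

    baseline = {o: True for o in options}   # start with every option enabled
    remaining = list(options)
    while True:
        base_time = measure(baseline)
        rips = {o: rip(o, baseline, base_time) for o in remaining}
        harmful = sorted((o for o in remaining if rips[o] < 0), key=lambda o: rips[o])
        if not harmful:
            return baseline                  # no remaining option degrades performance
        worst = harmful.pop(0)
        baseline[worst] = False              # drop the most harmful option first
        remaining.remove(worst)
        base_time = measure(baseline)
        for o in harmful:                    # re-check the others against the new baseline
            if rip(o, baseline, base_time) < 0:
                baseline[o] = False
                remaining.remove(o)

# toy stand-in for compiling and timing a benchmark: -O2 helps, -funroll hurts here
def fake_measure(flags):
    return 10.0 - (2.0 if flags["-O2"] else 0.0) + (1.5 if flags["-funroll"] else 0.0)

print(combined_elimination(["-O2", "-funroll"], fake_measure))
```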
web applications often use html templates to separate the webpage presentation from its underlying business logic and objects this is now the de facto standard programming model for web application development this paper proposes novel implementation for existing server side template engines flyingtemplate for reduced bandwidth consumption in web application servers and off loading html generation tasks to web clients instead of producing fully generated html page the proposed template engine produces skeletal script which includes only the dynamic values of the template parameters and the bootstrap code that runs on web browser at the client side it retrieves client side template engine and the payload templates separately with the goals of efficiency implementation transparency security and standards compliance in mind we developed flyingtemplate with two design principles effective browser cache usage and reasonable compromises which restrict the template usage patterns and relax the security policies slightly but in controllable way this approach allows typical template based web applications to run effectively with flyingtemplate as an experiment we tested the specweb banking application using flyingtemplate without any other modifications and saw throughput improvements from to in its best mode in addition flyingtemplate can enforce compliance with simple security policy thus addressing the security problems of client server partitioning in the web environment
cutting up complex object into simpler sub objects is fundamental problem in various disciplines in image processing images are segmented while in computational geometry solid polyhedra are decomposed in recent years in computer graphics polygonal meshes are decomposed into sub meshes in this paper we propose novel hierarchical mesh decomposition algorithm our algorithm computes decomposition into the meaningful components of given mesh which generally refers to segmentation at regions of deep concavities the algorithm also avoids over segmentation and jaggy boundaries between the components finally we demonstrate the utility of the algorithm in control skeleton extraction
we address the problem of selection of fragments of xml input data as required for xquery evaluation rather than first selecting individual fragments in isolation and then bringing them together as required by multiple variable bindings we select the tuples of co related fragments at once in one pass over the input our approach is event driven and correspondingly does not require building the input data in memory the tuples as needed for generating the output are reported as early as possible combined with an incremental garbage collection scheme of the buffers our approach allows storing at any time only as much data as is necessarily needed for the query evaluation
we propose similarity based matching technique for the purpose of quasi periodic time series patterns alignment the method is based on combination of two previously published works modified version of the douglas peucker line simplification algorithm dpsimp for data reduction in time series and sea for pattern matching of quasi periodic time series the previously developed sea method was shown to be more efficient than the very popular dtw technique the aim of the obtained aseal method approximate shape exchange algorithm is reduction of the space and time necessary to accomplish alignments comparable to those of the sea method the study shows the effectiveness of the proposed aseal method on ecg signals taken from the massachusetts institute of technology beth israel hospital mit bih database in terms of the correlation factor and alignment quality for savings up to in used samples and processing time reduction up to with respect to those of sea particularly the method is able to deal with very complex alignment situations magnitude time axis shift scaling local variabilities difference in length phase shift arbitrary number of periods in the context of quasi periodic time series among other possible applications the proposed aseal method is novel step toward resolution of the person identification using ecg problem
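a standard recursive douglas peucker simplification, shown here as one plausible reading of the dpsimp data reduction step over a time series treated as (time, value) points; the tolerance value and the toy series are assumptions and the sea alignment itself is not reproduced

```python
def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return ((x0 - x1) ** 2 + (y0 - y1) ** 2) ** 0.5
    return abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / (dx * dx + dy * dy) ** 0.5

def douglas_peucker(points, epsilon):
    """Return a reduced polyline whose deviation from the input stays below epsilon."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# example: thin out a quasi-periodic series before alignment
series = [(t, 0.5 * (t % 7)) for t in range(50)]
print(len(douglas_peucker(series, 0.3)))
```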
we present an object level scalable web framework and discuss our implementation as well as simulation experiences with them this object level scalable web framework automatically monitors access patterns replicates and maintains large multimedia objects among set of geographically distributed web servers without the need for full url replication this framework employs traceable rtsp http redirection approach to avoid cyclic redirections among the geographically distributed web serverswe implemented this web framework on wide area network with set of three web servers one at the technical university of darmstadt germany second one at the virginia polytechnic university usa and the third at the national university of singapore we also carried out series of simulation experiments to analyze the scalability of this object level scalable web framework in this paper we share these implementation and simulation experimental results
due to the emergence of portable devices that must run complex dynamic applications there is need for flexible platforms for embedded systems runtime reconfigurable hardware can provide this flexibility but the reconfiguration latency can significantly decrease the performance when dealing with task graphs runtime support that schedules the reconfigurations in advance can drastically reduce this overhead however executing complex scheduling heuristics at runtime may generate an excessive penalty hence we have developed hybrid design time runtime reconfiguration scheduling heuristic that generates its final schedule at runtime but carries out most computations at design time we have tested our approach in powerpc processor embedded on fpga demonstrating that it generates very small runtime penalty while providing almost as good schedules as full runtime approach
mobile ad hoc networks manet offer convenient basis toward realization of pervasive computing due to its ease of deployment and inherent support for anytime anywhere network access for mobile users however the development of applications over such networks is faced by the challenge of network dynamics attributed to node mobility and the scalability issue group management poses as promising paradigm to ease the development of distributed applications for dynamic mobile networks specifically group management makes transparent the failures due to node mobility and assembles mobile nodes to meet target functional and non functional properties various network level grouping schemes over manet have been investigated over the last couple of years in this paper we introduce the design and implementation of generic group service for manet defined with respect to the various attributes of relevance generic group management is further demonstrated with its support of scalable service discovery in manet
in this paper we present new practical approach to solve the incremental nearest point problem in the plane we used the proposed approach in industrial applications with superior behaviour to the theoretically better solutions the method efficiently avoids the requirement of initial randomization of the input points by splitting the plane in strips using heuristic points in strips are stored either in skip lists or in trees testing of the algorithms at different point distributions shows that our algorithm using proposed heuristic is almost insensitive to distributions of input points which makes the algorithm very attractive for various engineering applications
aspect oriented software development is gaining popularity with the wider adoption of languages such as aspectj to reduce the manual effort of testing aspects in aspectj programs we have developed framework called aspectra that automates generation of test inputs for testing aspectual behavior ie the behavior implemented in pieces of advice or intertype methods defined in aspects to test aspects developers construct base classes into which the aspects are woven to form woven classes our approach leverages existing test generation tools to generate test inputs for the woven classes these test inputs indirectly exercise the aspects to enable aspects to be exercised during test generation aspectra automatically synthesizes appropriate wrapper classes for woven classes to assess the quality of the generated tests aspectra defines and measures aspectual branch coverage branch coverage within aspects to provide guidance for developers to improve test coverage aspectra also defines interaction coverage we have developed tools for automating aspectra’s wrapper synthesis and coverage measurement and applied them on testing subjects taken from variety of sources our experience has shown that aspectra effectively provides tool supports in enabling existing test generation tools to generate test inputs for improving aspectual branch coverage
moving object environments are characterized by large numbers of moving objects and numerous concurrent continuous queries over these objects efficient evaluation of these queries in response to the movement of the objects is critical for supporting acceptable response times in such environments the traditional approach of building an index on the objects data suffers from the need for frequent updates and thereby results in poor performance in fact brute force no index strategy yields better performance in many cases neither the traditional approach nor the brute force strategy achieve reasonable query processing times this paper develops novel techniques for the efficient and scalable evaluation of multiple continuous queries on moving objects our solution leverages two complementary techniques query indexing and velocity constrained indexing vci query indexing relies on incremental evaluation reversing the role of queries and data and exploiting the relative locations of objects and queries vci takes advantage of the maximum possible speed of objects in order to delay the expensive operation of updating an index to reflect the movement of objects in contrast to an earlier technique that requires exact knowledge about the movement of the objects vci does not rely on such information while query indexing outperforms vci it does not efficiently handle the arrival of new queries velocity constrained indexing on the other hand is unaffected by changes in queries we demonstrate that combination of query indexing and velocity constrained indexing enables the scalable execution of insertion and deletion of queries in addition to processing ongoing queries we also develop several optimizations and present detailed experimental evaluation of our techniques the experimental results show that the proposed schemes outperform the traditional approaches by almost two orders of magnitude
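a minimal sketch of the query indexing side of the approach: the continuous range queries are registered on a uniform grid and each incoming object position only probes the queries stored in its cell, so object movement never updates an index; the grid granularity and rectangle representation are assumptions and velocity constrained indexing is not modelled here

```python
from collections import defaultdict

class QueryGridIndex:
    """Index continuous range queries on a uniform grid; probe with object positions."""
    def __init__(self, cell=10.0):
        self.cell = cell
        self.grid = defaultdict(list)   # (cx, cy) -> [(query_id, rect)]

    def _cells(self, rect):
        x1, y1, x2, y2 = rect
        for cx in range(int(x1 // self.cell), int(x2 // self.cell) + 1):
            for cy in range(int(y1 // self.cell), int(y2 // self.cell) + 1):
                yield cx, cy

    def add_query(self, qid, rect):
        for c in self._cells(rect):
            self.grid[c].append((qid, rect))

    def probe(self, x, y):
        """Return ids of queries whose rectangle contains the object position."""
        c = (int(x // self.cell), int(y // self.cell))
        return [qid for qid, (x1, y1, x2, y2) in self.grid[c]
                if x1 <= x <= x2 and y1 <= y <= y2]

index = QueryGridIndex()
index.add_query("q1", (0, 0, 25, 25))
print(index.probe(12.0, 7.5))   # ['q1']
```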
the index selection problem isp concerns the selection of an appropriate index set to minimize the total cost for given workload containing read and update queries since the isp has been proven to be an np hard problem most studies focus on heuristic algorithms to obtain approximate solutions however even approximate algorithms still consume large amount of computing time and disk space because these systems must record all query statements and frequently request from the database optimizers the cost estimation of each query in each considered index this study proposes novel algorithm without repeated optimizer estimations when query is delivered to database system the optimizer evaluates the costs of various query plans and chooses an access path for the query the information from the evaluation stage is aggregated and recorded with limited space the proposed algorithm can recommend indexes according to the readily available information without querying the optimizer again the proposed algorithm was tested in postgresql database system using tpc data experimental results show the effectiveness of the proposed approach
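a deliberately simplified sketch of the idea of recording readily available optimizer information during normal query evaluation and recommending indexes from the aggregate without asking the optimizer again; the per column counters and the uses times scanned rows benefit proxy are assumptions, not the paper's cost bookkeeping

```python
from collections import defaultdict

class IndexAdvisor:
    """Aggregate lightweight per-column statistics observed while queries run
    and recommend indexes without re-invoking the optimizer."""
    def __init__(self):
        self.stats = defaultdict(lambda: {"uses": 0, "scanned_rows": 0})

    def observe(self, table, column, scanned_rows):
        # called once per evaluated predicate, using numbers the optimizer
        # already produced while planning and executing the query
        s = self.stats[(table, column)]
        s["uses"] += 1
        s["scanned_rows"] += scanned_rows

    def recommend(self, top_k=3):
        def benefit(item):
            _, s = item
            return s["uses"] * s["scanned_rows"]   # crude proxy for saved I/O
        ranked = sorted(self.stats.items(), key=benefit, reverse=True)
        return [key for key, _ in ranked[:top_k]]

advisor = IndexAdvisor()
advisor.observe("orders", "customer_id", 120_000)
advisor.observe("orders", "customer_id", 118_000)
advisor.observe("lineitem", "ship_date", 90_000)
print(advisor.recommend(2))
```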
physical gaming is genre of computer games that has recently been made available for the home but what does it mean to bring games home that were originally designed for play in the arcade this paper describes an empirical study that looks at physical gaming and how it finds its place in the home we discuss the findings from this study by organizing them around four topics the adoption of the game its unique spatial needs the tension between visibility and availability of the game and what it means to play among what we describe as the gaming circle or players and non players alike finally we discuss how physical gaming in the home surfaces questions and issues for householders and researchers around adoption gender and both space and place
in this paper we examine the methodological issues involved in constructing test collections of structured documents and obtaining best entry points for the evaluation of the focussed retrieval of document components we describe pilot test of the proposed test collection construction methodology performed on document collection of shakespeare plays in our analysis we examine the effect of query complexity and type on overall query difficulty the use of multiple relevance judges for each query the problem of obtaining exhaustive relevance assessments from participants and the method of eliciting relevance assessments and best entry points our findings indicate that the methodology is indeed feasible in this small scale context and merits further investigation
we consider here scalar aggregation queries in databases that may violate given set of functional dependencies we define consistent answers to such queries to be greatest lower and least upper bounds on the value of the scalar function across all minimal repairs of the database we show how to compute such answers we provide complete characterization of the computational complexity of this problem we also show how tractability can be improved in several special cases one involves novel application of boyce codd normal form and present practical hybrid query evaluation method
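a small worked illustration for one special case, assuming a single functional dependency key determines value and a sum aggregate: every minimal repair keeps exactly one value per conflicting key, so the consistent answer is the interval from the sum of per key minima to the sum of per key maxima; this illustrates the greatest lower and least upper bound semantics and is not the paper's general algorithm

```python
from collections import defaultdict

def consistent_sum_bounds(tuples):
    """tuples: (key, value) pairs possibly violating the FD key -> value.
    Each minimal repair keeps one value per key, so SUM(value) over all repairs
    ranges over [sum of per-key minima, sum of per-key maxima]."""
    groups = defaultdict(set)
    for key, value in tuples:
        groups[key].add(value)
    lower = sum(min(vs) for vs in groups.values())
    upper = sum(max(vs) for vs in groups.values())
    return lower, upper

# toy inconsistent salary table: employee e2 has two conflicting salaries
salaries = [("e1", 50), ("e2", 60), ("e2", 80), ("e3", 40)]
print(consistent_sum_bounds(salaries))   # (150, 170)
```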
we introduce mesh quilting geometric texture synthesis algorithm in which texture sample given in the form of triangle mesh is seamlessly applied inside thin shell around an arbitrary surface through local stitching and deformation we show that such geometric textures allow interactive and versatile editing and animation producing compelling visual effects that are difficult to achieve with traditional texturing methods unlike pixel based image quilting mesh quilting is based on stitching together geometry elements our quilting algorithm finds corresponding geometry elements in adjacent texture patches aligns elements through local deformation and merges elements to seamlessly connect texture patches for mesh quilting on curved surfaces critical issue is to reduce distortion of geometry elements inside the space of the thin shell to address this problem we introduce low distortion parameterization of the shell space so that geometry elements can be synthesized even on very curved objects without the visual distortion present in previous approaches we demonstrate how mesh quilting can be used to generate convincing decorations for wide range of geometric textures
internet traffic primarily consists of packets from elastic flows ie web transfers file transfers and mail whose transmissions are mediated via the transmission control protocol tcp in this paper we develop methodology to process tcp flow measurements in order to analyze throughput correlations among tcp flow classes that can be used to infer congestion sharing in the internet the primary contributions of this paper are development of technique for processing flow records suitable for inferring congested resource sharing evaluation of the use of factor analysis on processed flow records to explore which tcp flow classes might share congested resources and validation of our inference methodology using bootstrap methods and nonintrusive flow level measurements collected at single network site our proposal for using flow level measurements to infer congestion sharing differs significantly from previous research that has employed packet level measurements for making inferences possible applications of our method include network monitoring and root cause analysis of poor performance
automatically detecting bugs in programs has been long held goal in software engineering many techniques exist trading off varying levels of automation thoroughness of coverage of program behavior precision of analysis and scalability to large code bases this paper presents the calysto static checker which achieves an unprecedented combination of precision and scalability in completely automatic extended static checker calysto is interprocedurally path sensitive fully context sensitive and bit accurate in modeling data operations comparable coverage and precision to very expensive formal analyses yet scales comparably to the leading less precise static analysis based tool for similar properties using calysto we have discovered dozens of bugs completely automatically in hundreds of thousands of lines of production open source applications with very low rate of false error reports this paper presents the design decisions algorithms and optimizations behind calysto’s performance
we investigate the problem of reliable computation in the presence of faults that may arbitrarily corrupt memory locations in this framework we consider the problems of sorting and searching in optimal time while tolerating the largest possible number of memory faults in particular we design an O(n log n) time sorting algorithm that can optimally tolerate up to O(sqrt(n log n)) memory faults in the special case of integer sorting we present an algorithm with linear expected running time that can tolerate faults we also present randomized searching algorithm that can optimally tolerate up to O(log n) memory faults in O(log n) expected time and an almost optimal deterministic searching algorithm that can tolerate O((log n)^(1 - epsilon)) faults for any small positive constant epsilon in O(log n) worst case time all these results improve over previous bounds
we are building an intelligent information system to aid users in their investigative tasks such as detecting fraud in such task users must progressively search and analyze relevant information before drawing conclusion in this paper we address how to help users find relevant information during an investigation specifically we present novel approach that can improve information retrieval by exploiting user’s investigative context compared to existing retrieval systems which are either context insensitive or leverage only limited user context our work offers two unique contributions first our system works with users cooperatively to build an investigative context which is otherwise very difficult to capture by machine or human alone second we develop context aware method that can adaptively retrieve and evaluate information relevant to an ongoing investigation experiments show that our approach can improve the relevance of retrieved information significantly as result users can fulfill their investigative tasks more efficiently and effectively
we present an object oriented calculus which allows arbitrary hiding of methods in prototypes even in the presence of binary methods and friend functions this combination of features permits complete control of the interface class exposes to the remainder of program which is of key importance for program readability security and ease of maintenance while still allowing complex interactions with other classes belonging to the same module or software component this result is made possible by the use of views view is name that specifies an interface to an object set of views is attached to each object and method can be invoked either directly or via view of the object
computational complexity has been the primary challenge of many vlsi cad applications the emerging multicore and many core microprocessors have the potential to offer scalable performance improvement how to explore the multicore resources to speed up cad applications is thus natural question but also huge challenge for cad researchers indeed decades of work on general purpose compilation approaches that automatically extracts parallelism from sequential program has shown limited success past work has shown that programming model and algorithm design methods have great influence on usable parallelism in this paper we propose methodology to explore concurrency via nondeterministic transactional algorithm design and to program them on multicore processors for cad applications we apply the proposed methodology to the min cost flow problem which has been identified as the key problem in many design optimizations from wire length optimization in detailed placement to timing constrained voltage assignment concurrent algorithm and its implementation on multicore processors for min cost flow have been developed based on the methodology experiments on voltage island generation in floorplanning demonstrated its efficiency and scalable speedup over different number of cores
we describe the implementation of the magic sets transformation in the starburst extensible relational database system to our knowledge this is the first implementation of the magic sets transformation in relational database system the starburst implementation has many novel features that make our implementation especially interesting to database practitioners in addition to database researchers we use cost based heuristic for determining join orders sips before applying magic we push all equality and non equality predicates using magic replacing traditional predicate pushdown optimizations we apply magic to full sql with duplicates aggregation null values and subqueries we integrate magic with other relational optimization techniques the implementation is extensible our implementation demonstrates the feasibility of the magic sets transformation for commercial relational systems and provides mechanism to implement magic as an integral part of new database system or as an add on to an existing database system
complex queries are becoming commonplace with the growing use of decision support systems these complex queries often have lot of common sub expressions either within single query or across multiple such queries run as batch multiquery optimization aims at exploiting common sub expressions to reduce evaluation cost multi query optimization has hither to been viewed as impractical since earlier algorithms were exhaustive and explore doubly exponential search space in this paper we demonstrate that multi query optimization using heuristics is practical and provides significant benefits we propose three cost based heuristic algorithms volcano sh and volcano ru which are based on simple modifications to the volcano search strategy and greedy heuristic our greedy heuristic incorporates novel optimizations that improve efficiency greatly our algorithms are designed to be easily added to existing optimizers we present performance study comparing the algorithms using workloads consisting of queries from the tpc benchmark the study shows that our algorithms provide significant benefits over traditional optimization at very acceptable overhead in optimization time
in this paper feature preserving mesh hole filling algorithm is realized by the polynomial blending technique we first search for feature points in the neighborhood of the hole these feature points allow us to define the feature curves with missing parts in the hole polynomial blending curve is constructed to complete the missing parts of the feature curves these feature curves divide the original complex hole into small simple sub holes we use the bezier lagrange hybrid patch to fill each sub hole the experimental results show that our mesh hole filling algorithm can effectively restore the original shape of the hole
auto parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple address space digital signal processors dsps this paper develops for the first time complete auto parallelization approach which overcomes these issues it first combines pointer conversion technique with new modulo elimination transformation for program recovery enabling later parallelization stages next it integrates novel data transformation technique that exposes the processor location of partitioned data when this is combined with new address resolution mechanism it generates efficient programs that run on multiple address spaces without using message passing furthermore as dsps do not possess any data cache structure an optimization is presented which transforms the program to both exploit remote data locality and local memory bandwidth this parallelization approach is applied to the dspstone and utdsp benchmark suites giving an average speedup of on four analog devices tigersharc ts processors
scalability and energy management issues are crucial for sensor network databases in this paper we introduce the sharing and partitioning of stream spectrum spass protocol as new approach to provide scalability with respect to the number of sensors and to manage the power consumption efficiently the spectrum of sensor is the range distribution of values read by that sensor close by sensors tend to give similar readings and consequently exhibit similar spectra we propose to combine similar spectra into one global spectrum that is shared by all contributing sensors then the global spectrum is partitioned among the sensors such that each sensor carries out the responsibility of managing partition of the spectrum spectrum sharing and partitioning require continuous coordination to balance the load over the sensors experimental results show that the spass protocol relieves sensor database system from the burden of data acquisition in large scale sensor networks and reduces the per sensor power consumption
there are various applications in wireless sensor networks which require knowing the relative or actual position of the sensor nodes over the past few years there have been different localisation algorithms proposed in the literature the algorithms based on classical multi dimensional scaling mds only require or anchor nodes and can provide higher accuracy than some other schemes in this paper we propose and analyse another type of mds called ordinal mds for localisation in wireless sensor networks ordinal mds differs from classical mds in that it only requires monotonicity constraint between the shortest path distance and the euclidean distance for each pair of nodes we conduct simulation studies under square and shaped topologies with different connectivity levels and number of anchors results show that ordinal mds provides lower position estimation error than classical mds in both hop based and range based scenarios
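a minimal numpy sketch of the classical mds baseline referred to above: double center the squared shortest path distance matrix and keep the top eigenvectors to obtain relative node coordinates; ordinal mds would additionally replace the raw distances with a monotone isotonic fit at each iteration, which is omitted here, and the toy distance matrix is an assumption

```python
import numpy as np

def classical_mds(distance_matrix, dim=2):
    """Relative node coordinates from a matrix of pairwise (shortest-path) distances."""
    D = np.asarray(distance_matrix, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]      # keep the largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * L                 # n x dim relative coordinates

# toy example: four nodes roughly on a unit square
D = [[0.0, 1.0, 1.0, 1.4],
     [1.0, 0.0, 1.4, 1.0],
     [1.0, 1.4, 0.0, 1.0],
     [1.4, 1.0, 1.0, 0.0]]
print(classical_mds(D))
```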
in designing spoken dialogue system developers need to specify the actions system should take in response to user speech input and the state of the environment based on observed or inferred events states and beliefs this is the fundamental task of dialogue management researchers have recently pursued methods for automating the design of spoken dialogue management using machine learning techniques such as reinforcement learning in this paper we discuss how dialogue management is handled in industry and critically evaluate to what extent current state of the art machine learning methods can be of practical benefit to application developers who are deploying commercial production systems in examining the strengths and weaknesses of these methods we highlight what academic researchers need to know about commercial deployment if they are to influence the way industry designs and practices dialogue management
in recent years management of moving objects has emerged as an active topic of spatial access methods various data structures indexes have been proposed to handle queries of moving points for example the well known bx tree uses novel mapping mechanism to reduce the index update costs however almost all the existing indexes for predictive queries are not applicable in certain circumstances when the update frequencies of moving objects become highly variable and when the system needs to balance the performance of updates and queries in this paper we introduce two kinds of novel indexes named by tree and αby tree by associating prediction life period with every moving object the proposed indexes are applicable in the environments with highly variable update frequencies in addition the αby tree can balance the performance of updates and queries depending on balance parameter experimental results show that the by tree and αby tree outperform the bx tree in various conditions
stream clustering algorithms are traditionally designed to process streams efficiently and to adapt to the evolution of the underlying population this is done without assuming any prior knowledge about the data however in many cases certain amount of domain or background knowledge is available and instead of simply using it for the external validation of the clustering results this knowledge can be used to guide the clustering process in non stream data domain knowledge is exploited in the context of semi supervised clustering in this paper we extend the static semi supervised learning paradigm for streams we present denstream density based clustering algorithm for data streams that includes domain information in the form of constraints we also propose novel method for the use of background knowledge in data streams the performance study over number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method to our knowledge this is the first approach to include domain knowledge in clustering for data streams
dependent type theory has several practical applications in the fields of theorem proving program verification and programming language design ivor is haskell library designed to allow easy extending and embedding of type theory based theorem prover in haskell application in this paper we give an overview of the library and show how it can be used to embed theorem proving technology in an implementation of simple functional programming language by using type theory as core representation we can construct and evaluate terms and prove correctness properties of those terms within the same framework ensuring consistency of the implementation and the theorem prover
caches have become increasingly important with the widening gap between main memory and processor speeds small and fast cache memories are designed to bridge this discrepancy however they are only effective when programs exhibit sufficient data locality the performance of the memory hierarchy can be improved by means of data and loop transformations tiling is loop transformation that aims at reducing capacity misses by shortening the reuse distance padding is data layout transformation targeted to reduce conflict misses this article presents an accurate cost model that describes misses across different hierarchy levels and considers the effects of other hardware components such as branch predictors the cost model drives the application of tiling and padding transformations we combine the cost model with genetic algorithm to compute the tile and pad factors that enhance the program performance to validate our strategy we ran experiments for set of benchmarks on large set of modern architectures our results show that this scheme is useful to optimize programs performance when compared to previous approaches we observe that with reasonable compile time overhead our approach gives significant performance improvements for all studied kernels on all architectures
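a small example of the tiling transformation discussed above, written as a blocked matrix multiply where the tile factor is the parameter a cost model or genetic search would choose; padding would instead change the array layout to avoid conflict misses and is not shown

```python
def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiply: 'tile' shortens reuse distances so blocks of A, B
    and C stay resident in cache; the tile factor is what the search tunes."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            C[i][j] += aik * B[k][j]
    return C

# tiny usage check against the untiled result
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(tiled_matmul(A, B, tile=1))   # [[19.0, 22.0], [43.0, 50.0]]
```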
among the computational intelligence techniques employed to solve classification problems fuzzy rule based classification systems frbcss are popular tool because of their interpretable models based on linguistic variables which are easier to understand for the experts or end users the aim of this paper is to enhance the performance of frbcss by extending the knowledge base with the application of the concept of interval valued fuzzy sets ivfss we consider post processing genetic tuning step that adjusts the amplitude of the upper bound of the ivfs to contextualize the fuzzy partitions and to obtain most accurate solution to the problem we analyze the goodness of this approach using two basic and well known fuzzy rule learning algorithms the chi et al method and the fuzzy hybrid genetics based machine learning algorithm we show the improvement achieved by this model through an extensive empirical study with large collection of data sets
the suffix tree or equivalently the enhanced suffix array provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings such as those arising in computational biology here we address the problem of arranging suffix array on disk so that querying is fast in practice we show that the combination of small trie and suffix array like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk based suffix array arrangement construction of our data structure requires only modest processing time on top of that required to build the suffix tree and requires negligible extra memory
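a minimal in-memory sketch of the layout idea, assuming a toy suffix array build, a small index over the first k characters of each suffix that plays the role of the trie, and binary search inside the narrowed block; disk blocking and the enhanced suffix array tables are not modelled

```python
from bisect import bisect_left

def build_suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])   # quadratic toy construction

def build_prefix_index(text, sa, k=2):
    """Map each length-k suffix prefix to its (start, end) range in the suffix array."""
    index = {}
    for pos, suf in enumerate(sa):
        key = text[suf:suf + k]
        start, _ = index.get(key, (pos, pos))
        index[key] = (start, pos + 1)
    return index

def search(text, sa, index, pattern, k=2):
    """Occurrences of pattern: the prefix index narrows the range, binary search finishes."""
    start, end = index.get(pattern[:k], (0, len(sa)))
    block = [text[i:] for i in sa[start:end]]
    lo = bisect_left(block, pattern)
    hits = []
    while lo < len(block) and block[lo].startswith(pattern):
        hits.append(sa[start + lo])
        lo += 1
    return sorted(hits)

text = "mississippi"
sa = build_suffix_array(text)
idx = build_prefix_index(text, sa)
print(search(text, sa, idx, "issi"))   # [1, 4]
```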
the automatic interpretation of case law by computerised natural language processing algorithms remains an elusive challenge this paper proposes less ambitious goal human assisted semantic tagging of case law using basic deontic ontology so that the structured document can be queried independent resource description framework rdf named graphs are used to represent the legal case an implementation of event calculus is used to make inferences over these named graphs we employ our minimal deontic encoding to encode the significant aspects of commercial case from the south african high court
queries containing universal quantification are used in many applications including business intelligence applications and in particular data mining we present comprehensive survey of the structure and performance of algorithms for universal quantification we introduce framework that results in complete classification of input data for universal quantification then we go on to identify the most efficient algorithm for each such class one of the input data classes has not been covered so far for this class we propose several new algorithms thus for the first time we are able to identify the optimal algorithm to use for any given input dataset these two classifications of optimal algorithms and input data are important for query optimization they allow query optimizer to make the best selection when optimizing at intermediate steps for the quantification problem in addition to the classification we show the relationship between relational division and the set containment join and we illustrate the usefulness of employing universal quantifications by presenting novel approach for frequent itemset discovery
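a small sketch of relational division, the textbook way of expressing the universal quantification discussed above (for example which suppliers supply all parts); the table and column names are illustrative

```python
from collections import defaultdict

def divide(supplies, parts):
    """Relational division: return suppliers related to *every* part in `parts`.
    `supplies` is a set of (supplier, part) pairs; the result expresses the
    universal quantifier 'for all parts p, (s, p) is in supplies'."""
    required = set(parts)
    by_supplier = defaultdict(set)
    for supplier, part in supplies:
        by_supplier[supplier].add(part)
    return {s for s, ps in by_supplier.items() if required <= ps}

supplies = {("s1", "p1"), ("s1", "p2"), ("s2", "p1"), ("s3", "p1"), ("s3", "p2")}
print(divide(supplies, {"p1", "p2"}))   # {'s1', 's3'}
```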
multicasting is intended for group oriented computing there are more and more applications where one to many or many to many dissemination is an essential task the multicast service is critical in applications characterized by the close collaboration of teams many applications such as audio video distribution can tolerate loss of data content but many other applications cannot in addition even loss tolerant applications will suffer performance penalty an audio stream may experience short gap or lower fidelity in the presence of loss this paper describes our experience with implementing multicast routing protocol that delivers packets to all intended recipients with high probability due to number of reasons packet delivery cannot be achieved but packet delivery ratios in excess of are possible in most cases
this paper addresses complexity issues for important problems arising with disjunctive databases in particular the complexity of inference of literal and formula from propositional disjunctive database under variety of well known disjunctive database semantics is investigated as well as deciding whether disjunctive database has model under particular semantics the problems are located in appropriate slots of the polynomial hierarchy
as social service in web folksonomy provides the users the ability to save and organize their bookmarks online with social annotations or tags social annotations are high quality descriptors of the web pages topics as well as good indicators of web users interests we propose personalized search framework to utilize folksonomy for personalized search specifically three properties of folksonomy namely the categorization keyword and structure property are explored in the framework the rank of web page is decided not only by the term matching between the query and the web page’s content but also by the topic matching between the user’s interests and the web page’s topics in the evaluation we propose an automatic evaluation framework based on folksonomy data which is able to help lighten the common high cost in personalized search evaluations series of experiments are conducted using two heterogeneous data sets one crawled from delicious and the other from dogear extensive experimental results show that our personalized search approach can significantly improve the search quality
dynamics is an inherent characteristic of computational grids the volatile nodal availability requires grid applications and services be adaptive to changes of the underlying grid topology mobile execution allows mobile users or tasks to relocate across different nodes in the grid this poses new challenges to resource access control resource sharing in the grid coalition environment creates certain temporal and spatial requirements for accesses by mobile entities however there is lack of formal treatment of the impact of mobility on the shared resource access control in this paper we formalize the mobile execution of grid entities by using the mobile code model we introduce shared resource access language sral to model the behaviors of mobile codes sral is structured and composed so that the program of mobile code can be constructed recursively from primitive accesses we define the operational semantics of sral and prove that it is expressive enough for most resource access patterns in particular it is complete in the sense that it can specify any program of regular trace model constraint language srac is defined to specify spatial constraints for shared resource accesses checking if the behavior of mobile code satisfies given spatial constraint can be solved by polynomial time algorithm we apply the duration calculus to express temporal constraints and show the constraint satisfaction problem is decidable as well we extend the role based access control model to specify and enforce our spatio temporal constraints to prove the concept and technical feasibility of our coordinated access control model we implemented it in mobile agent system which emulates mobile execution in grids by software agents
malware detection is crucial aspect of software security current malware detectors work by checking for signatures which attempt to capture the syntactic characteristics of the machine level byte sequence of the malware this reliance on syntactic approach makes current detectors vulnerable to code obfuscations increasingly used by malware writers that alter the syntactic properties of the malware byte sequence without significantly affecting their execution behavior this paper takes the position that the key to malware identification lies in their semantics it proposes semantics based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors our approach uses trace semantics to characterize the behavior of malware as well as that of the program being checked for infection and uses abstract interpretation to hide irrelevant aspects of these behaviors as concrete application of our approach we show that standard signature matching detection schemes are generally sound but not complete the semantics aware malware detector proposed by christodorescu et al is complete with respect to number of common obfuscations used by malware writers and the malware detection scheme proposed by kinder et al and based on standard model checking techniques is sound in general and complete on some but not all obfuscations handled by the semantics aware malware detector
this paper describes component model where the overall semantics of component is included in the interface definition such model is necessary for future computing where programs will run at internet scales and will employ combination of web services grid technologies peer to peer sharing autonomic capabilities and open source implementations the component model is based on packages and supports static and dynamic objects interfaces structures and exceptions the interface definitions provide practical approach to defining functional semantics and include appropriate extensions to provide semantics for security privacy recovery and costs the component model has been implemented in prototype framework and demonstrated in an internet scale example
automatic image annotation automatically labels image content with semantic keywords for instance the relevance model estimates the joint probability of the keyword and the image most of the previous annotation methods assign keywords separately recently the correlation between annotated keywords has been used to improve image annotation however directly estimating the joint probability of set of keywords and the unlabeled image is computationally prohibitive to avoid the computation difficulty we propose heuristic greedy iterative algorithm to estimate the probability of keyword subset being the caption of an image in our approach the correlations between keywords are analyzed by automatic local analysis of text information retrieval in addition new image generation probability estimation method is proposed based on region matching we demonstrate that our iterative annotation algorithm can incorporate the keyword correlations and the region matching approaches handily to improve the image annotation significantly the experiments on the eccv benchmark show that our method outperforms the state of the art continuous feature model mbrm with recall and precision improving and respectively
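A greedy loop of the kind described above might look like the following sketch; the scoring rule (log generation probability plus accumulated keyword co-occurrence) is a simplified stand-in for the paper's estimates, and all names are hypothetical.

```python
import math

# Greedily grow a caption: at each step pick the keyword that best trades
# off its image-generation probability against its correlation with the
# keywords already chosen (simplified scoring, illustrative only).
def greedy_caption(p_word_given_image, cooccur, k=5):
    chosen = []
    candidates = set(p_word_given_image)
    while candidates and len(chosen) < k:
        def score(w):
            corr = sum(cooccur.get((w, c), 0.0) for c in chosen)
            return math.log(p_word_given_image[w] + 1e-9) + corr
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen
```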
the advent of grid environments made feasible the solution of computational intensive problems in reliable and cost effective way as workflow systems carry out more complex and mission critical applications quality of service qos analysis serves to ensure that each application meets user requirements in that frame we present novel algorithm which allows the mapping of workflow processes to grid provided services assuring at the same time end to end provision of qos based on user defined parameters and preferences we also demonstrate the operation of the implemented algorithm and evaluate its effectiveness using grid scenario based on image rendering application
an important problem that confronts peer to peer pp systems is efficient support for content based search in this paper we look at how similarity query in high dimensional spaces can be supported in unstructured pp systems we design an efficient index mechanism named linking identical neighborly partitions linp which takes advantage of both space partitioning and routing indices techniques we evaluate our proposed scheme over various data sets and experimental results show the efficacy of our approach
energy power and area efficiency are critical design concerns for embedded processors much of the energy of typical embedded processor is consumed in the front end since instruction fetching happens on nearly every cycle and involves accesses to large memory arrays such as instruction and branch target caches the use of small front end arrays leads to significant power and area savings but typically results in significant performance degradation this paper evaluates and compares optimizations that improve the performance of embedded processors with small front end caches we examine both software techniques such as instruction re ordering and selective caching and hardware techniques such as instruction prefetching tagless instruction cache and unified caches for instruction and branch targets we demonstrate that building on top of block aware instruction set these optimizations can eliminate the performance degradation due to small front end caches moreover selective combinations of these optimizations lead to an embedded processor that performs significantly better than the large cache design while maintaining the area and energy efficiency of the small cache design
xml has gained popularity for information representation exchange and retrieval as the xml material becomes more abundant the ability to gain knowledge from xml sources decreases due to their heterogeneity and structural irregularity the use of data mining techniques becomes essential to improve xml document handling this paper discusses the capabilities and the process of applying data mining techniques in xml sources
in grid workflow systems checkpoint selection strategy is responsible for selecting checkpoints for conducting temporal verification at the runtime execution stage existing representative checkpoint selection strategies often select some unnecessary checkpoints and omit some necessary ones because they cannot adapt to the dynamics and uncertainty of runtime activity completion duration in this article based on the dynamics and uncertainty of runtime activity completion duration we develop novel checkpoint selection strategy that can adaptively select not only necessary but also sufficient checkpoints specifically we introduce new concept of minimum time redundancy as key reference parameter for checkpoint selection an important feature of minimum time redundancy is that it can adapt to the dynamics and uncertainty of runtime activity completion duration we develop method on how to achieve minimum time redundancy dynamically along grid workflow execution and investigate its relationships with temporal consistency based on the method and the relationships we present our strategy and rigorously prove its necessity and sufficiency the simulation evaluation further demonstrates experimentally such necessity and sufficiency and its significant improvement on checkpoint selection over other representative strategies
as users pan and zoom display content can disappear into off screen space particularly on small screen devices the clipping of locations such as relevant places on map can make spatial cognition tasks harder halo is visualization technique that supports spatial cognition by showing users the location of off screen objects halo accomplishes this by surrounding off screen objects with rings that are just large enough to reach into the border region of the display window from the portion of the ring that is visible on screen users can infer the off screen location of the object at the center of the ring we report the results of user study comparing halo with an arrow based visualization technique with respect to four types of map based route planning tasks when using the halo interface users completed tasks faster while there were no significant differences in error rate for three out of four tasks in our study
the querying and analysis of data streams has been topic of much recent interest motivated by applications from the fields of networking web usage analysis sensor instrumentation telecommunications and others many of these applications involve monitoring answers to continuous queries over data streams produced at physically distributed locations and most previous approaches require streams to be transmitted to single location for centralized processing unfortunately the continual transmission of large number of rapid data streams to central location can be impractical or expensive we study useful class of queries that continuously report the largest values obtained from distributed data streams top monitoring queries which are of particular interest because they can be used to reduce the overhead incurred while running other types of monitoring queries we show that transmitting entire data streams is unnecessary to support these queries and present an alternative approach that reduces communication significantly in our approach arithmetic constraints are maintained at remote stream sources to ensure that the most recently provided top answer remains valid to within user specified error tolerance distributed communication is only necessary on occasion when constraints are violated and we show empirically through extensive simulation on real world data that our approach reduces overall communication cost by an order of magnitude compared with alternatives that offer the same error guarantees
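The constraint idea can be sketched for a single remote source as below; the threshold and slack bookkeeping is a simplified illustration of silent monitoring under an installed bound, not the paper's actual top-k protocol.

```python
# One remote stream source holding a numeric bound installed by the
# coordinator (parameter names are illustrative). The source stays silent
# while its running value respects the bound plus its share of the
# user-specified error tolerance, and reports only on violation.
class RemoteSource:
    def __init__(self, threshold, slack):
        self.threshold = threshold
        self.slack = slack
        self.value = 0.0

    def update(self, delta, notify):
        self.value += delta
        if self.value > self.threshold + self.slack:
            notify(self.value)            # coordinator reassigns constraints
            self.threshold = self.value   # provisional bound until then
```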
method for automated analysis of fault tolerance of distributed systems is presented it is based on stream or data flow model of distributed computation temporal ordering relationships between messages received by component on different channels are not captured by this model this makes the analysis more efficient and forces the use of conservative approximations in analysis of systems whose behavior depends on such inter channel orderings to further support efficient analysis our framework includes abstractions for the contents number and ordering of messages sent on each channel analysis of reliable broadcast protocol illustrates the method
with increasing process variations low vt swapping is an effective technique that can be used to improve timing yield without having to modify design following placement and routing gate criticality defined as the probability that gate lies on critical path forms the basis for existing low vt swapping techniques this paper presents simulation based study that challenges the effectiveness of low vt swapping based on the conventional definition of gate criticality especially as random process variations increase with technology scaling we introduce dominant gate criticality to address the drawbacks of the conventional definition of gate criticality and formulate dominant critical gate ranking in the presence of process variations as an optimization problem simulation results for benchmark circuits from the iscas and opensparc suites to achieve timing yields of and indicate that low vt swapping based on dominant gate criticality reduces leakage power overhead by and for independent and correlated process variations respectively over low vt swapping based on conventional gate criticality
index structures like the suffix tree or the suffix array are of utmost importance in stringology most notably in exact string matching in the last decade research on compressed index structures has flourished because the main problem in many applications is the space consumption of the index it is possible to simulate the matching of pattern against suffix tree on an enhanced suffix array by using range minimum queries or the so called child table in this paper we show that the super cartesian tree of the lcp array with which the suffix array is enhanced very naturally explains the child table more important however is the fact that the balanced parentheses representation of this tree constitutes very natural compressed form of the child table which admits to locate all occ occurrences of pattern of length in log occ time where is the underlying alphabet our compressed child table uses less space than previous solutions to the problem an implementation is available
recent work has shown the feasibility and promise of template independent web data extraction however existing approaches use decoupled strategies attempting to do data record detection and attribute labeling in two separate phases in this paper we show that separately extracting data records and attributes is highly ineffective and propose probabilistic model to perform these two tasks simultaneously in our approach record detection can benefit from the availability of semantics required in attribute labeling and at the same time the accuracy of attribute labeling can be improved when data records are labeled in collective manner the proposed model is called hierarchical conditional random fields it can efficiently integrate all useful features by learning their importance and it can also incorporate hierarchical interactions which are very important for web data extraction we empirically compare the proposed model with existing decoupled approaches for product information extraction and the results show significant improvements in both record detection and attribute labeling
validation of low fidelity prototyping test results is difficult because we cannot claim whether the results are the effect of the prototype itself or the essence of the design concept we try to evaluate however it will cost too much if we implement fully functional prototype for more valid evaluation in this research we provide qualitative and reflective analysis of usability evaluations of text messaging functionality of mobile phone by comparing three types of prototyping techniques paper based and computer based and fully functional prototype this analysis led us to realize how significantly the unique characteristics of each different prototype affect the usability evaluation in different ways we identify what characteristics of each prototype causes the differences in finding usability problems and then suggest key considerations for designing more valid low fidelity prototypes based on this analysis
recognizing patterns in conceptual models is useful for number of purposes like revealing syntactical errors model comparison and identification of business process improvement potentials in this contribution we introduce an approach for the specification and matching of structural patterns in conceptual models unlike existing approaches we do not focus on certain application problem or specific modeling language instead our approach is generic making it applicable for any pattern matching purpose and any conceptual modeling language in order to build sets representing structural model patterns we define operations based on set theory which can be applied to arbitrary sets of model elements and relationships besides conceptual specification of our approach we present prototypical modeling tool that shows its applicability
we introduce two throughput metrics referred to as flow and time sampled throughputs the former gives the throughput statistics of an arbitrary flow while the latter weights these throughput statistics by the flow durations under fair sharing assumptions the latter is shown to coincide with the steady state instantaneous throughput weighted by the number of flows which provides useful means to measure and estimate it we give some generic properties satisfied by both the metrics and illustrate their difference on few examples
recently there has been dramatic increase in the use of xml data to deliver information over the web personal weblogs news web sites and discussion forums are now publishing rss feeds for their subscribers to retrieve new postings as the popularity of personal weblogs and rss feeds grows rapidly rss aggregation services and blog search engines have appeared which try to provide central access point for simpler access and discovery of new content from large number of diverse rss sources in this paper we study how the rss aggregation services should monitor the data sources to retrieve new content quickly using minimal resources and to provide its subscribers with fast news alerts we believe that the change characteristics of rss sources and the general user access behavior pose distinct requirements that make this task significantly different from the traditional index refresh problem for web search engines our studies on collection of rss feeds reveal some general characteristics of the rss feeds and show that with proper resource allocation and scheduling the rss aggregator provides news alerts significantly faster than the best existing approach
the reuse of design patterns in realistic software systems is often result of blending multiple pattern elements together rather than instantiating them in an isolated manner the explicit description of pattern compositions is the key for documenting the structure and the behavior of blended patterns and ii more importantly supporting the reuse of composite patterns across different software projects in this context this paper proposes fine grained composition language for describing varying blends of design patterns based on their structural and behavioural semantics the reusability and expressiveness of the proposed language are assessed through its application to compositions of gof patterns recurrently appearing in three different case studies the openorb middleware the jhotdraw and junit frameworks
program slicing is an effective technique for narrowing the focus of attention to the relevant parts of program during the debugging process however imprecision is problem in static slices since they are based on all possible executions that reach given program point rather than the specific execution under which the program is being debugged dynamic slices based on the specific execution being debugged are precise but incur high run time overhead due to the tracing information that is collected during the program’s execution we present hybrid slicing technique that integrates dynamic information from specific execution into static slice analysis the hybrid slice produced is more precise than the static slice and less costly than the dynamic slice the technique exploits dynamic information that is readily available during debugging namely breakpoint information and the dynamic call graph this information is integrated into static slicing analysis to more accurately estimate the potential paths taken by the program the breakpoints and call return points used as reference points divide the execution path into intervals by associating each statement in the slice with an execution interval hybrid slicing provides information as to when statement was encountered during execution another attractive feature of our approach is that it allows the user to control the cost of hybrid slicing by limiting the amount of dynamic information used in computing the slice we implemented the hybrid slicing technique to demonstrate the feasibility of our approach
networks and networked applications depend on several pieces of configuration information to operate correctly such information resides in routers firewalls and end hosts among other places incorrect information or misconfiguration could interfere with the running of networked applications this problem is particularly acute in consumer settings such as home networks where there is huge diversity of network elements and applications coupled with the absence of network administrators to address this problem we present netprints system that leverages shared knowledge in population of users to diagnose and resolve misconfigurations basically if user has working network configuration for an application or has determined how to rectify problem we would like this knowledge to be made available automatically to another user who is experiencing the same problem netprints accomplishes this task by applying decision tree based learning on working and nonworking configuration snapshots and by using network traffic based problem signatures to index into configuration changes made by users to fix problems we describe the design and implementation of netprints and demonstrate its effectiveness in diagnosing variety of home networking problems reported by users
equivalence relations can be used to reduce the state space of system model thereby permitting more efficient analysis we study backward stochastic bisimulation in the context of model checking continuous time markov chains against continuous stochastic logic csl properties while there are simple csl properties that are not preserved when reducing the state space of continuous time markov chain using backward stochastic bisimulation we show that the equivalence can nevertheless be used in the verification of practically significant class of csl properties we consider an extension of these results to markov reward models and continuous stochastic reward logic furthermore we identify the logical properties for which the requirement on the equality of state labeling sets normally imposed on state equivalences in model checking context can be omitted from the definition of the equivalence resulting in better state space reduction
code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches however traditional code placement techniques have less benefit in the presence of trace cache that alters the placement of instructions in the instruction cache moreover as pipelines have become deeper to accommodate increasing clock rates branch misprediction penalties have become significant impediment to performance we evaluate pattern history table partitioning feedback directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables on spec cpu benchmarks running on an intel pentium branch mispredictions are reduced by up to and on average this reduction yields speedup of up to and on average by contrast branch alignment previous code placement technique yields only up to speedup and less than on average
significant aspect in applying the reflexion method is the mapping of components found in the source code onto the conceptual components defined in the hypothesized architecture to date this mapping is established manually which requires lot of work for large software systems in this paper we present new approach in which clustering techniques are applied to support the user in the mapping activity the result is semi automated mapping technique that accommodates the automatic clustering of the source model with the user’s hypothesized knowledge about the system’s architecture this paper describes three case studies in which the semi automated mapping technique called hugme has been applied successfully to extend partial map of real world software applications in addition the results of another case study from an earlier publication are summarized which lead to comparable results we evaluated the extended versions of two automatic software clustering techniques namely mqattract and countattract with oracle mappings we closely study the influence of the degree of completeness of the existing mapping and other controlling variables of the technique to make reliable suggestions both clustering techniques were able to achieve mapping quality where more than of the automatic mapping decisions turned out to be correct moreover the experiments indicate that the attraction function countattract based on local coupling and cohesion is more suitable for semi automated mapping than the approach mqattract based on global assessment of coupling and cohesion
we introduce ewall an experimental visual analytics environment for the support of remote collaborative sense making activities ewall is designed to foster and support object focused thinking where users represent and understand information as objects construct and recognize contextual relationships among objects as well as communicate through objects ewall also offers unified infrastructure for the implementation and testing of computational agents that consolidate user contributions and manage the flow of information among users through the creation and management of virtual transactive memory ewall users operate their individual graphical interfaces to collect abstract organize and comprehend task relevant information relative to their areas of expertise first type of computational agents infers possible relationships among information items through the analysis of the spatial and temporal organization and collaborative use of information all information items and relationships converge in shared database second type of computational agents evaluates the contents of the shared database and provides individual users with customized selection of potentially relevant information learning mechanism allows the computational agents to adapt to particular users and circumstances ewall is designed to enable individual users to navigate vast amounts of shared information effectively and help remotely dispersed team members combine their contributions work independently without diverting from common objectives and minimize the necessary amount of verbal communication
raid redundant arrays of independent disk level is popular paradigm which uses parity to protect against single disk failures major shortcoming of raid is the small write penalty ie the cost of updating parity when data block is modified read modify writes rmw and reconstruct writes rcw are alternative methods for updating small data and parity blocks we use queuing formulation to determine conditions under which one method outperforms the other our analysis shows that in the case of raid and more generally disk arrays with check disks tolerating disk failures rcw outperforms rmw for higher values of and we note that clustered raid and variable scope of parity protection methods favor reconstruct writes dynamic scheme to determine the more desirable policy based on the availability of appropriate cached blocks is proposed
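The two parity-update methods can be contrasted with a toy example in which integers stand in for disk blocks and XOR is the parity code; this only illustrates the cost difference that the queuing analysis compares.

```python
def read_modify_write(old_data, new_data, old_parity):
    # RMW: read old data and old parity, XOR out the old value and XOR in
    # the new one -- two reads and two writes regardless of stripe width.
    return old_parity ^ old_data ^ new_data

def reconstruct_write(new_data, other_blocks):
    # RCW: read all the other data blocks in the stripe and recompute the
    # parity from scratch -- preferable when most of the stripe is written.
    parity = new_data
    for block in other_blocks:
        parity ^= block
    return parity
```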
we formulate an intraprocedural information flow analysis algorithm for sequential heap manipulating programs we prove correctness of the algorithm and argue that it can be used to verify some naturally occurring examples in which information flow is conditional on some hoare like state predicates being satisfied because the correctness of information flow analysis is typically formulated in terms of noninterference of pairs of computations the algorithm takes as input program together with two state assertions as postcondition and generates two state preconditions together with verification conditions to process heap manipulations and while loops the algorithm must additionally be supplied object flow invariants as well as loop flow invariants which are themselves two state and possibly conditional
one of the potent personalization technologies powering the adaptive web is collaborative filtering collaborative filtering cf is the process of filtering or evaluating items through the opinions of other people cf technology brings together the opinions of large interconnected communities on the web supporting filtering of substantial quantities of data in this chapter we introduce the core concepts of collaborative filtering its primary uses for users of the adaptive web the theory and practice of cf algorithms and design decisions regarding rating systems and acquisition of ratings we also discuss how to evaluate cf systems and the evolution of rich interaction interfaces we close the chapter with discussions of the challenges of privacy particular to cf recommendation service and important open research questions in the field
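As a concrete illustration of the core idea (not tied to any particular system discussed in the chapter), a user-based collaborative filter can be sketched in a few lines: compute similarities between users' rating vectors and predict an unseen rating as a similarity-weighted average.

```python
import math

def cosine_sim(r1, r2):
    # r1, r2: {item: rating} dictionaries for two users
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    num = sum(r1[i] * r2[i] for i in common)
    den = math.sqrt(sum(v * v for v in r1.values())) * \
          math.sqrt(sum(v * v for v in r2.values()))
    return num / den if den else 0.0

def predict(ratings, user, item):
    # ratings: {user: {item: rating}}; weight neighbors by similarity
    sims = [(cosine_sim(ratings[user], r), r[item])
            for u, r in ratings.items() if u != user and item in r]
    norm = sum(abs(s) for s, _ in sims)
    return sum(s * v for s, v in sims) / norm if norm else None
```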
the concurrency control cc method employed can be critical to the performance of transaction processing systems conventional locking suffers from the blocking phenomenon where waiting transactions continue to hold locks and block other transactions from progressing in high data contention environment as an increasing number of transactions wait larger number of lock requests get blocked and fewer lock requests can get through the proposed scheme reduces the blocking probability by deferring the blocking behavior of transactions to the later stages of their execution by properly balancing the blocking and abort effects the proposed scheme can lead to better performance than either the conventional locking or the optimistic concurrency control occ schemes at all data and resource contention levels we consider both static and dynamic approaches to determine when to switch from the nonblocking phase to the blocking phase an analytical model is developed to estimate the performance of this scheme and determine the optimal operating or switching point the accuracy of the analytic model is validated through detailed simulation
the address sequence on the processor memory bus can reveal abundant information about the control flow of program this can lead to critical information leakage such as encryption keys or proprietary algorithms addresses can be observed by attaching hardware device on the bus that passively monitors the bus transaction such side channel attacks should be given rising attention especially in distributed computing environment where remote servers running sensitive programs are not within the physical control of the client two previously proposed hardware techniques tackled this problem through randomizing address patterns on the bus one proposal permutes set of contiguous memory blocks under certain conditions while the other approach randomly swaps two blocks when necessary in this paper we present an anatomy of these attempts and show that they impose great pressure on both the memory and the disk this leaves them less scalable in high performance systems where the bandwidth of the bus and memory are critical resources we propose lightweight solution to alleviating the pressure without compromising the security strength the results show that our technique can reduce the memory traffic by factor of compared with the prior scheme while keeping almost the same page fault rate as baseline system with no security protection
we investigate quantified interpreted systems semantics to model multi agent systems in which the agents can reason about individuals their properties and relationships among them the semantics naturally extends interpreted systems to first order by introducing domain of individuals we present first order epistemic language interpreted on this semantics and prove soundness and completeness of the quantified modal system qs an axiomatisation for these structures finally we exemplify the use of the logic by modeling message passing systems relevant class of interpreted systems analysed in epistemic logic
network intrusion detection has been generally dealt with using sophisticated software and statistical analysis although sometimes it has to be done by administrators either by detecting the intruders in real time or by revising network logs making this tedious and time consuming task to support this intrusion detection analysis has been carried out using visual auditory or tactile sensory information in computer interfaces however little is known about how to best integrate the sensory channels for analyzing intrusion detection alarms in the past we proposed set of ideas outlining the benefits of enhancing intrusion detection alarms with multimodal interfaces in this paper we present simplified sound assisted attack mitigation system enhanced with auditory channels results indicate that the resulting intrusion detection system effectively generates distinctive sounds upon series of simple attack scenarios consisting of denial of service and port scanning
naive bayes is an effective and efficient learning algorithm in classification in many applications however an accurate ranking of instances based on the class probability is more desirable unfortunately naive bayes has been found to produce poor probability estimates numerous techniques have been proposed to extend naive bayes for better classification accuracy of which selective bayesian classifiers sbc langley sage tree augmented naive bayes tan friedman et al nbtree kohavi boosted naive bayes elkan and aode webb et al achieve remarkable improvement over naive bayes in terms of classification accuracy an interesting question is do these techniques also produce accurate ranking in this paper we first conduct systematic experimental study on their efficacy for ranking then we propose new approach to augmenting naive bayes for generating accurate ranking called hidden naive bayes hnb in an hnb hidden parent is created for each attribute to represent the influences from all other attributes and thus more accurate ranking is expected hnb inherits the structural simplicity of naive bayes and can be easily learned without structure learning our experiments show that hnb outperforms naive bayes sbc boosted naive bayes nbtree and tan significantly and performs slightly better than aode in ranking
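For reference, the class-probability estimate that plain naive Bayes ranks instances by can be sketched as below; HNB refines each per-attribute conditional with a weighted hidden parent, a refinement omitted in this simplified sketch.

```python
# Laplace-smoothed naive Bayes posterior used for ranking (simplified;
# the hidden-parent terms of HNB are not modeled here).
def nb_score(instance, priors, cond_counts, class_counts, n_values):
    # priors[c]: P(c); cond_counts[c][i][v]: count of attribute i = v in class c
    # class_counts[c]: instances of class c; n_values[i]: distinct values of i
    scores = {}
    for c in priors:
        p = priors[c]
        for i, v in enumerate(instance):
            p *= (cond_counts[c][i].get(v, 0) + 1.0) / (class_counts[c] + n_values[i])
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}   # rank by these posteriors
```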
recent advances in wireless communications and positioning devices have generated tremendous amount of interest in the continuous monitoring of spatial queries however such applications can incur heavy burden on the data owner do due to very frequent location updates database outsourcing is viable solution whereby the do delegates its database functionality to service provider sp that has the infrastructure and resources to handle the high workload in this framework authenticated query processing enables the clients to verify the correctness of the query results that are returned by the sp in addition to correctness the dynamic nature of the monitored data requires the provision for temporal completeness ie the clients must be able to verify that there are no missing results in between data updates this paper constitutes the first work that deals with the authentication of continuous spatial queries focusing on ranges we first introduce baseline solution bsl that achieves correctness and temporal completeness but incurs false transmissions that is the sp has to notify clients whenever there is data update even if it does not affect their results then we propose csa mechanism that minimizes the processing and transmission overhead through an elaborate indexing scheme and virtual caching mechanism finally we derive analytical models to optimize the performance of our methods and evaluate their effectiveness through extensive experiments
in this paper we propose protection scheme for geometry data this field has got hardly any attention despite the fact that data has severe industrial impact on the success of product or business with current reproduction techniques the loss of model virtually enables malicious party to create copies of product we propose protection mechanism for data that provides fine grained access control preview capabilities and reduced processing cost
vehicular ad hoc networks vanets have received large attention in recent times there are many applications bringing out the new research challenges in the vanet environment such as vehicle to vehicle communication road side vehicle networking etc road side data access in the two intersecting roads of downtown area in vanet is the main consideration of this paper caching is common technique for saving network traffic and enhancing response time especially in wireless environments where bandwidth is often scarce resource in this paper novel road side information station ris data distribution model with cache scheme in vanet for the data access in the two intersecting roads of downtown area is proposed the performance of the proposed scheme is tested through simulations the simulation results reveal that the proposed scheme can significantly improve the response time and the network performance
in this article we show that keeping track of history enables significant improvements in the communication complexity of dynamic network protocols we present communication optimal maintenance of spanning tree in dynamic network the amortized on the number of topological changes message complexity is where is the number of nodes in the network the message size used by the algorithm is log |id| where |id| is the size of the name space of the nodes typically log |id| equals log previous algorithms that adapt to dynamic networks involved omega messages per topological change inherently paying for re computation of the tree from scratch spanning trees are essential components in many distributed algorithms some examples include broadcast dissemination of messages to all network nodes multicast reset general adaptation of static algorithms to dynamic networks routing termination detection and more thus our efficient maintenance of spanning tree implies the improvement of algorithms for these tasks our results are obtained using novel technique to save communication node uses information received in the past in order to deduce present information from the fact that certain messages were not sent by the node’s neighbor this technique is one of our main contributions
understanding the graph structure of the internet is crucial step for building accurate network models and designing efficient algorithms for internet applications yet obtaining this graph structure can be surprisingly difficult task as edges cannot be explicitly queried for instance empirical studies of the network of internet protocol ip addresses typically rely on indirect methods like traceroute to build what are approximately single source all destinations shortest path trees these trees only sample fraction of the network’s edges and paper by lakhina et al found empirically that the resulting sample is intrinsically biased further in simulations they observed that the degree distribution under traceroute sampling exhibits power law even when the underlying degree distribution is poisson in this article we study the bias of traceroute sampling mathematically and for very general class of underlying degree distributions explicitly calculate the distribution that will be observed as example applications of our machinery we prove that traceroute sampling finds power law degree distributions in both delta regular and poisson distributed random graphs thus our work puts the observations of lakhina et al on rigorous footing and extends them to nearly arbitrary degree distributions
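The bias can be reproduced with a toy simulation in the spirit of the observation above: take a single-source BFS (shortest-path) tree as a stand-in for traceroute sampling and compare the degrees it observes with the true degrees. This is an illustration only, not the article's analytical machinery.

```python
from collections import deque, Counter

# Build the BFS tree from one source and count, for each node, how many
# tree edges touch it -- the "degree" a traceroute-style sample would see.
def bfs_tree_degrees(adj, source):
    parent, seen, queue = {}, {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u
                queue.append(v)
    observed = Counter()
    for child, par in parent.items():
        observed[child] += 1
        observed[par] += 1
    return observed   # compare against the true degrees len(adj[v])
```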
this paper focuses on the algorithmic aspects for the hardware software hw sw partitioning which searches reasonable composition of hardware and software components which not only satisfies the constraint of hardware area but also optimizes the execution time the computational model is extended so that all possible types of communications can be taken into account for the hw sw partitioning also new dynamic programming algorithm is proposed on the basis of the computational model in which source data rather than speedup in previous work of basic scheduling blocks are directly utilized to calculate the optimal solution the proposed algorithm runs in for code fragments and the available hardware area simulation results show that the proposed algorithm solves the hw sw partitioning without increase in running time compared with the algorithm cited in the literature
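Leaving out the communication costs that the paper's extended model explicitly accounts for, the core area-constrained choice can be sketched as a knapsack-style dynamic program; the block data and area budget below are hypothetical.

```python
# Choose hardware or software for each block to minimize total execution
# time within the hardware area budget (communication costs omitted).
def partition(blocks, area_budget):
    # blocks: list of (sw_time, hw_time, hw_area) triples
    INF = float("inf")
    best = [INF] * (area_budget + 1)
    best[0] = 0.0
    for sw_t, hw_t, area in blocks:
        nxt = [INF] * (area_budget + 1)
        for a in range(area_budget + 1):
            if best[a] == INF:
                continue
            nxt[a] = min(nxt[a], best[a] + sw_t)          # software: no area
            if a + area <= area_budget:                   # hardware: uses area
                nxt[a + area] = min(nxt[a + area], best[a] + hw_t)
        best = nxt
    return min(best)   # minimum total time over all feasible area uses
```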
binary translation has been widely used as object code migration across different architectures most current works are targeted towards running an existing old architecture binary version of complex application on newer architecture and so availability of resources is not problem in this paper we propose technique called rabit for emulating newer architecture’s binary on an older one rabit will allow the consumers to emulate the applications developed for newer hardware on their older machines by incurring some performance trade offs this way the needy consumers can take advantage of the newer software features on their existing machines before they decide to upgrade them to later models in situations where the newer hardware is in place our technique can be used to run applications redundantly on the older machine for dependability analysis it also provides the cpu architects with an instruction level simulator to test the design and stability of a new cpu before the new hardware is actually in place rabit’s design consists of translator an interpreter and a set of operating system services to deal with the less amount of resources like registers in the older architecture as compared to the newer ones we present new register allocation algorithm the feasibility of the technique is established by simulating rabit for its performance
points lines and regions are the three basic entities for constituting vector based objects in spatial databases many indexing methods tree tree quad tree pmr tree grid file tree and so on have been widely discussed for handling point or region data these traditional methods can efficiently organize point or region objects in space into hashing or hierarchical directory they provide efficient access methods to meet the requirement of accurate retrievals however two problems are encountered when their techniques are applied to deal with line segments the first is that representing line segments by means of point or region objects cannot exactly and properly preserve the spatial information about the proximities of line segments the second problem is derived from the large dead space and overlapping areas in external and internal nodes of the hierarchical directory caused by the use of rectangles to enclose line objects in this paper we propose an indexing structure for line segments based on tree to remedy these two problems through the experimental results we demonstrate that our approach has significant improvement over the storage efficiency in addition the retrieval efficiency has also been significantly improved as compared to the method using tree index scheme these improvements derive mainly from the proposed data processing techniques and the new indexing method
traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines in recent years scientific applications have become increasingly data intensive this is especially true in the fields of astronomy and high energy physics furthermore the lowered cost of disks and commodity machines has led to dramatic increase in the amount of free disk space spread across machines in cluster this space is not being exploited by traditional distributed computing tools in this paper we have evaluated ways to improve the data management capabilities of condor popular distributed computing system we have augmented the condor system by providing the capability to store data used and produced by workflows on the disks of machines in the cluster we have also replaced the condor matchmaker with new workflow planning framework that is cognizant of dependencies between jobs in workflow and exploits these new data storage capabilities to produce workflow schedules we show that our data caching and workflow planning framework can significantly reduce response times for data intensive workflows by reducing data transfer over the network in cluster we also consider ways in which this planning framework can be made adaptive in dynamic multi user failure prone environment
new randomized asynchronous shared memory data structure is given for implementing an approximate counter that can be incremented once by each of processes in model that allows up to minus crash failures for any fixed epsilon the counter achieves relative error of delta with high probability at the cost of delta log epsilon register operations per increment and epsilon delta log epsilon register operations per read the counter combines randomized sampling for estimating large values with an expander for estimating small values this is the first counter implementation that is sublinear in the number of processes and works despite strong adversary scheduler that can observe internal states of processes an application of the improved counter is an improved protocol for solving randomized shared memory consensus which reduces the best previously known individual work complexity from log to an optimal resolving one of the last remaining open problems concerning consensus in this model
database integration is currently solved only for the case of simple structures semantics is mainly neglected it is known but often neglected that database integration cannot be automated system integration is far more difficult both integrations can only be performed if number of assumptions can be made for the integrated system instead of integrating systems entirely cooperation or collaboration of systems can be developed and used we propose in this paper the extension of the view cooperation approach to database collaboration
multi user applications generally lag behind in features or compatibility with single user applications as result users are often not motivated to abandon their favorite single user applications for groupware features that are less frequently used well accepted approach collaboration transparency is able to convert off the shelf single user applications into groupware without modifying the source code however existing systems have been largely striving to develop generic application sharing mechanisms and undesirably force users to share the same application in cooperative work in this paper we analyze this problem and present novel approach called intelligent collaboration transparency to addressing this problem our approach allows for heterogeneous application sharing by considering the particular semantics of the applications and the collaboration task in question
in this paper we study the localization problem in large scale underwater sensor networks the adverse aqueous environments the node mobility and the large network scale all pose new challenges and most current localization schemes are not applicable we propose hierarchical approach which divides the whole localization process into two sub processes anchor node localization and ordinary node localization many existing techniques can be used in the former for the ordinary node localization process we propose distributed localization scheme which novelly integrates dimensional euclidean distance estimation method with recursive location estimation method simulation results show that our proposed solution can achieve high localization coverage with relatively small localization error and low communication overhead in large scale dimensional underwater sensor networks
the snapshot object is an important tool for constructing wait free asynchronous algorithms we relate the snapshot object to the lattice agreement decision problem it is shown that any algorithm for solving lattice agreement can be transformed into an implementation of snapshot object the overhead cost of this transformation is only linear number of read and write operations on atomic single writer multi reader registers the transformation uses an unbounded amount of shared memory we present deterministic algorithm for lattice agreement that used log operations on processor test set registers plus operations on atomic single writer multi reader registers the shared objects are used by the algorithm in dynamic mode that is the identity of the processors that access each of the shared objects is determined dynamically during the execution of the algorithm by randomized implementation of processors test set registers from atomic registers this algorithm implies randomized algorithm for lattice agreement that uses an expected number of operations on dynamic atomic single writer multi reader registers combined with our transformation this yields implementations of atomic snapshots with the same complexity
this work presents new approach for ranking documents in the vector space model the novelty lies in two fronts first patterns of term co occurrence are taken into account and are processed efficiently second term weights are generated using data mining technique called association rules this leads to new ranking mechanism called the set based vector model the components of our model are no longer index terms but index termsets where termset is set of index terms termsets capture the intuition that semantically related terms appear close to each other in document they can be efficiently obtained by limiting the computation to small passages of text once termsets have been computed the ranking is calculated as function of the termset frequency in the document and its scarcity in the document collection experimental results show that the set based vector model improves average precision for all collections and query types evaluated while keeping computational costs small for the gigabyte trec collection the set based vector model leads to gain in average precision figures of percent and percent for disjunctive and conjunctive queries respectively with respect to the standard vector space model these gains increase to percent and percent respectively when proximity information is taken into account query processing times are larger but on average still comparable to those obtained with the standard vector model increases in processing time varied from percent to percent our results suggest that the set based vector model provides correlation based ranking formula that is effective with general collections and computationally practical
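A much-simplified scoring sketch in the spirit of the set-based model follows; proper mining of termsets with association rules over small passages is replaced here by brute-force unigrams and pairs, and the weighting is only indicative.

```python
import math
from itertools import combinations

# Score a document by the query termsets it contains: weight grows with the
# termset frequency in the document and with the termset's rarity in the
# collection (doc_freq maps termset tuples to document frequencies).
def set_based_score(query_terms, doc_terms, doc_freq, n_docs):
    score = 0.0
    for size in (1, 2):                                    # unigram and pair termsets
        for ts in combinations(sorted(set(query_terms)), size):
            if all(t in doc_terms for t in ts):
                tf = min(doc_terms[t] for t in ts)         # termset frequency
                df = doc_freq.get(ts, 1)                   # termset scarcity
                score += (1 + math.log(tf)) * math.log(1 + n_docs / df)
    return score
```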
an adaptive information filtering system monitors document stream to identify the documents that match information needs specified by user profiles as the system filters it also refines its knowledge about the user’s information needs based on long term observations of the document stream and periodic feedback training data from the user low variance profile learning algorithms such as rocchio work well at the early stage of filtering when the system has very few training data low bias profile learning algorithms such as logistic regression work well at the later stage of filtering when the system has accumulated enough training data however an empirical system needs to work well consistently at all stages of filtering process this paper addresses this problem by proposing new technique to combine different text classification algorithms via constrained maximum likelihood bayesian prior this technique provides trade off between bias and variance and the combined classifier may achieve consistent good performance at different stages of filtering we implemented the proposed technique to combine two complementary classification algorithms rocchio and logistic regression the new algorithm is shown to compare favorably with rocchio logistic regression and the best methods in the trec and trec adaptive filtering tracks
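As a loose illustration of the bias-variance trade-off being managed (and not the paper's constrained maximum-likelihood formulation), the two scores could simply be interpolated with a weight that grows as user feedback accumulates:

```python
# Hypothetical interpolation: trust the low-bias logistic regression score
# more as feedback accumulates, the low-variance rocchio score early on.
def combined_score(rocchio_score, logreg_score, n_feedback, prior_strength=20):
    w = n_feedback / (n_feedback + prior_strength)
    return (1 - w) * rocchio_score + w * logreg_score
```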
by studying the behavior of several programs that crash due to memory errors we observed that locating the errors can be challenging because significant propagation of corrupt memory values can occur prior to the point of the crash in this article we present an automated approach for locating memory errors in the presence of memory corruption propagation our approach leverages the information revealed by program crash when crash occurs this reveals subset of the memory corruption that exists in the execution by suppressing nullifying the effect of this known corruption during execution the crash is avoided and any remaining hidden corruption may then be exposed by subsequent crashes the newly exposed corruption can then be suppressed in turn by iterating this process until no further crashes occur the first point of memory corruption and the likely root cause of the program failure can be identified however this iterative approach may terminate prematurely since programs may not crash even when memory corruption is present during execution to address this we show how crashes can be exposed in an execution by manipulating the relative ordering of particular variables within memory by revealing crashes through this variable re ordering the effectiveness and applicability of the execution suppression approach can be improved we describe set of experiments illustrating the effectiveness of our approach in consistently and precisely identifying the first points of memory corruption in executions that fail due to memory errors we also discuss baseline software implementation of execution suppression that incurs an average overhead of and describe how to reduce this overhead to through hardware support
the fact that instructions in programs often produce repetitive results has motivated researchers to explore various techniques such as value prediction and value reuse to exploit this behavior value prediction improves the available instruction level parallelism ilp in superscalar processors by allowing dependent instructions to be executed speculatively after predicting the values of their input operands value reuse on the other hand tries to eliminate redundant computation by storing the previously produced results of instructions and skipping the execution of redundant instructions previous value reuse mechanisms use single instruction or naturally formed instruction group such as basic block trace or function as the reuse unit these naturally formed instruction groups are readily identifiable by the hardware at runtime without compiler assistance however the performance potential of value reuse mechanism depends on its reuse detection time the number of reuse opportunities and the amount of work saved by skipping each reuse unit since larger instruction groups typically have fewer reuse opportunities than smaller groups but they provide greater benefit for each reuse detection process it is very important to find the balance point that provides the largest overall performance gain in this paper we propose new mechanism called subblock reuse subblocks are created by slicing basic blocks either dynamically or with compiler guidance the dynamic approaches use the number of instructions numbers of inputs and outputs or the presence of store instructions to determine the subblock boundaries the compiler assisted approach slices basic blocks using data flow considerations to balance the reuse granularity and the number of reuse opportunities the results show that subblocks which can produce up to percent speedup if reused properly are better candidates for reuse units than basic blocks although subblock reuse with compiler assistance has substantial and consistent potential to improve the performance of superscalar processors this scheme is not always the best performer subblocks restricted to two consecutive instructions demonstrate surprisingly good performance potential as well
we examine whether traditional automated annotation system can be improved by using background knowledge traditional means any machine learning approach together with image analysis techniques we use as baseline for our experiments the work done by yavlinsky et al who deployed non parametric density estimation we observe that probabilistic image analysis by itself is not enough to describe the rich semantics of an image our hypothesis is that more accurate annotations can be produced by introducing additional knowledge in the form of statistical co occurrence of terms this is provided by the context of images that otherwise independent keyword generation would miss we test our algorithm with two different datasets corel and imageclef for the corel dataset we obtain significantly better results while our algorithm appears in the top quartile of all methods submitted in imageclef
we describe methods to search with query by example in known domain for information in an unknown domain by exploiting web search engines relational search is an effective way to obtain information in an unknown field for users for example if an apple user searches for microsoft products similar apple products are important clues for the search even if the user does not know keywords to search for specific microsoft products the relational search returns product name by querying simply an example of apple products more specifically given tuple containing three terms such as apple ipod microsoft the term zune can be extracted from the web search results where apple is to ipod what microsoft is to zune as previously proposed relational search requires huge text corpus to be downloaded from the web the results are not up to date and the corpus has high construction cost we introduce methods for relational search by using web search indices we consider methods based on term co occurrence on lexico syntactic patterns and on combinations of the two approaches our experimental results showed that the combination methods got the highest precision and clarified the characteristics of the methods
sensor scheduling plays critical role for energy efficiency of wireless sensor networks traditional methods for sensor scheduling use either sensing coverage or network connectivity but rarely both in this paper we deal with challenging task without accurate location information how do we schedule sensor nodes to save energy and meet both constraints of sensing coverage and network connectivity our approach utilizes an integrated method that provides statistical sensing coverage and guaranteed network connectivity we use random scheduling for sensing coverage and then turn on extra sensor nodes if necessary for network connectivity our method is totally distributed is able to dynamically adjust sensing coverage with guaranteed network connectivity and is resilient to time asynchrony we present analytical results to disclose the relationship among node density scheduling parameters coverage quality detection probability and detection delay analytical and simulation results demonstrate the effectiveness of our joint scheduling method
this paper presents new congestion minimization technique for standard cell global placement the most distinct feature of this approach is that it does not follow the traditional estimate then eliminate strategy instead it avoids the excessive usage of routing resources by the local nets so that more routing resources are available for the uncertain global nets the experimental results show that our new technique sparse achieves better routability than the traditional total wire length bounding box guided placers which had been shown to deliver the best routability results among the placers optimizing different cost functions another feature of sparse is the capability of allocating white space implicitly sparse exploits the well known empirical rent’s rule and is able to improve the routability even more in the presence of white space compared to the most recent academic routability driven placer dragon sparse is able to produce solutions with equal or better routability
we start with the usual paradigm in electronic commerce customer bob wants to buy from merchant alice however bob wishes to enjoy maximal privacy while alice needs to protect her sensitive data bob should be able to remain anonymous throughout the entire process from turning on his computer to final delivery and even after sale maintenance services ideally he should even be able to hide from alice what he is interested in buying conversely alice should not have to reveal anything unnecessary about her catalogue especially prices for fear that she might in fact be dealing with hostile competitor masquerading as customer for this purpose we introduce the blind electronic commerce paradigm to offer an integrated solution to the dual conundrum of ensuring bob’s privacy as well as protecting alice’s sensitive information
real time animation of human like characters is an active research area in computer graphics the conventional approaches have however hardly dealt with the rhythmic patterns of motions which are essential in handling rhythmic motions such as dancing and locomotive motions in this paper we present novel scheme for synthesizing new motion from unlabelled example motions while preserving their rhythmic pattern our scheme first captures the motion beats from the example motions to extract the basic movements and their transitions based on those data our scheme then constructs movement transition graph that represents the example motions given an input sound signal our scheme finally synthesizes novel motion in an on line manner while traversing the motion transition graph which is synchronized with the input sound signal and also satisfies kinematic constraints given explicitly and implicitly through experiments we have demonstrated that our scheme can effectively produce variety of rhythmic motions
in this paper we describe novel approach for searching large data sets from mobile phone existing interfaces for mobile search require keyword text entry and are not suited for browsing our alternative uses hybrid model to de emphasize tedious keyword entry in favor of iterative data filtering we propose navigation and selection of hierarchical metadata facet navigation with incremental text entry to further narrow the results we conducted formative evaluation to understand the relative advantages of keyword entry versus facet navigation for both browse and search tasks on the phone we found keyword entry to be more powerful when the name of the search target is known while facet navigation is otherwise more effective and strongly preferred
in online communities or blogospheres the users publish their posts and then their posts get feedback actions from other users in the form of comment trackback or recommendation these interactions form graph in which the vertices represent set of users while the edges represent set of feedbacks thus the problem of users rankings can be approached in terms of the analysis of the social relationships between the users themselves within this graph pagerank and hits have often been applied for users rankings especially for users reputation but there has been no consideration of the fact that the user’s sociability can affect the user’s reputation to address this problem in this paper we newly propose two different factors that affect the score of every user the user’s reputation and the user’s sociability furthermore we present novel schemes that effectively and separately can estimate the reputation and the sociability of the users our experimental results show that our schemes can effectively separate the user’s pure reputation from the user’s sociability pure reputation as it stands alone or when it is combined with sociability is capable of producing more optimal user ranking results than can the previous works
coloring the nodes of graph with small number of colors is one of the most fundamental problems in theoretical computer science in this paper we study graph coloring in distributed setting processors of distributed system are nodes of an undirected graph there is an edge between two nodes whenever the corresponding processors can directly communicate with each other we assume that distributed coloring algorithms start with an initial coloring of in the paper we prove new strong lower bounds for two special kinds of coloring algorithms for algorithms which run for single communication round ie every node of the network can only send its initial color to all its neighbors we show that the number of colors of the computed coloring has to be at least log log log if such one round algorithms are iteratively applied to reduce the number of colors step by step we prove time lower bound of log log to obtain an coloring the best previous lower bounds for the two types of algorithms are log log and log respectively
traditional ranking mainly focuses on one type of data source and effective modeling still relies on sufficiently large number of labeled or supervised examples however in many real world applications in particular with the rapid growth of the web ranking over multiple interrelated heterogeneous domains becomes common situation where in some domains we may have large amount of training data while in some other domains we can only collect very little one important question is if there is not sufficient supervision in the domain of interest how could one borrow labeled information from related but heterogenous domain to build an accurate model this paper explores such an approach by bridging two heterogeneous domains via the latent space we propose regularized framework to simultaneously minimize two loss functions corresponding to two related but different information sources by mapping each domain onto shared latent space capturing similar and transferable concepts we solve this problem by optimizing the convex upper bound of the non continuous loss function and derive its generalization bound experimental results on three different genres of data sets demonstrate the effectiveness of the proposed approach
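the sketch below shows one possible instantiation of the shared latent space idea, assuming linear per domain projections, a shared scoring vector and squared losses weighted by a parameter alpha; all of these modelling choices are assumptions for illustration rather than the paper's formulation

# Sketch of bridging two domains via a shared latent space: each domain has its
# own projection into a common k-dim space, a shared scoring vector ranks items
# there, and both losses are minimized jointly so the label-rich source domain
# regularizes the label-poor target domain.
import numpy as np

def joint_fit(Xs, ys, Xt, yt, k=10, alpha=0.1, lr=0.01, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    Ps = rng.normal(scale=0.1, size=(Xs.shape[1], k))   # source projection
    Pt = rng.normal(scale=0.1, size=(Xt.shape[1], k))   # target projection
    s = rng.normal(scale=0.1, size=k)                   # shared scoring vector
    for _ in range(iters):
        rs = Xs @ Ps @ s - ys                           # source residuals
        rt = Xt @ Pt @ s - yt                           # target residuals (few labels)
        g_s = Ps.T @ Xs.T @ rs / len(ys) + alpha * (Pt.T @ Xt.T @ rt / len(yt))
        Ps -= lr * np.outer(Xs.T @ rs / len(ys), s)
        Pt -= lr * alpha * np.outer(Xt.T @ rt / len(yt), s)
        s -= lr * g_s
    return Ps, Pt, s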
in this paper we detail an anatomically inspired physically based model of the human torso designed for the visual simulation of respiration using mixed system of rigid and deformable parts motion related to breath is signature movement of the human body and an indicator for life but it has been largely overlooked by the graphics community novel composition of biological components is necessary to capture the key characteristics of breathing motion visible in the human trunk because the movement is generated fundamentally through the combination of both rigid bone and soft tissue our approach uses simple physically based muscle element which is used throughout to drive the motion of the ribs and diaphragm as well as in other muscles like those of the abdomen to produce passive resistance in addition we describe an implementation of straightforward method for preserving incompressible volume in deformable bodies to use in approximating the motion of the abdomen related to breath through the careful construction of this anatomically based torso control for respiration becomes the generation of periodic contraction signals for minimal set of two muscle groups we show the flexibility of our approach through the animation of several breathing styles using our system
current microprocessors aggressively exploit instruction level parallelism ilp through techniques such as multiple issue dynamic scheduling and non blocking reads recent work has shown that memory latency remains significant performance bottleneck for shared memory multiprocessor systems built of such processors this paper provides the first study of the effectiveness of software controlled non binding prefetching in shared memory multiprocessors built of state of the art ilp based processors we find that software prefetching results in significant reductions in execution time to for three out of five applications on an ilp system however compared to previous generation system software prefetching is significantly less effective in reducing the memory stall component of execution time on an ilp system consequently even after adding software prefetching memory stall time accounts for over of the total execution time in four out of five applications on our ilp system this paper also investigates the interaction of software prefetching with memory consistency models on ilp based multiprocessors in particular we seek to determine whether software prefetching can equalize the performance of sequential consistency sc and release consistency rc we find that even with software prefetching for three out of five applications rc provides significant reduction in execution time to compared to sc
we propose generic and robust framework for news video indexing which we founded on broadcast news production model we identify within this model four production phases each providing useful metadata for annotation in contrast to semiautomatic indexing approaches which exploit this information at production time we adhere to an automatic data driven approach to that end we analyze digital news video using separate set of multimodal detectors for each production phase by combining the resulting production derived features into statistical classifier ensemble the framework facilitates robust classification of several rich semantic concepts in news video rich meaning that concepts share many similarities in their production process experiments on an archive of hours of news video from the trecvid benchmark show that combined analysis of production phases yields the best results in addition we demonstrate that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state of the art
prior knowledge is critical resource for design especially when designers are striving to generate new ideas for complex problems systems that improve access to relevant prior knowledge and promote reuse can improve design efficiency and outcomes unfortunately such systems have not been widely adopted indicating that user needs in this area have not been adequately understood in this paper we report the results of contextual inquiry into the practices of and attitudes toward knowledge management and reuse during early design the study consisted of interviews and surveys with professional designers in the creative domains novel aspect of our work is the focus on early design which differs from but complements prior works focus on knowledge reuse during later design and implementation phases our study yielded new findings and implications that if applied will help bring the benefits of knowledge management systems and reuse into early design activity
in this paper we evaluate the memory system behavior of two distinctly different implementations of the unix operating system dec’s ultrix monolithic system and mach with cmu’s unix server microkernel based system in our evaluation we use combined system and user memory reference traces of thirteen industry standard workloads we show that the microkernel based system executes substantially more non idle system instructions for an equivalent workload than the monolithic system furthermore the average instruction for programs running on mach has higher cost in terms of memory cycles per instruction than on ultrix in the context of our traces we explore number of popular assertions about the memory system behavior of modern operating systems paying special attention to the effect that mach’s microkernel architecture has on system performance our results indicate that many but not all of the assertions are true and that few while true have only negligible impact on real system performance
an assumption behind new interface approaches that employ physical means of interaction is that these can leverage users prior knowledge from the real world making them intuitive or natural to use this paper presents user study of tangible augmented reality which shows that physical input tools can invite wide variety of interaction behaviours and raise unmatched expectations about how to interact children played with interactive sequences in an augmented book using physical paddles to control the main characters our analysis focuses on how knowledge and skills that children have from the physical world succeed or fail to apply in the interaction with this application we found that children expected the digital augmentations to behave and react analogous to physical objects encouraged by the ability to act in space and the digital visual feedback the affordances of the paddles as physical interaction devices invited actions that the system could not detect or interpret in effect children often struggled to understand what it was in their actions that made the system react
cats concurrency analysis tool suite is designed to satisfy several criteria it must analyze implementation level ada source code and check user specified conditions associated with program source code it must be modularized in fashion that supports flexible composition with other tool components including integration with variety of testing and analysis techniques and its performance and capacity must be sufficient for analysis of real application programs meeting these objectives together is significantly more difficult than meeting any of them alone we describe the design and rationale of cats and report experience with an implementation the issues addressed here are primarily practical concerns for modularizing and integrating tools for analysis of actual source programs we also report successful application of cats to major subsystems of nontoy highly concurrent user interface system
this paper reviews well known fingerprint matching algorithm that uses an orientation based minutia descriptor it introduces set of improvements to the algorithm that increase the accuracy and speed using the same features the most significant improvement is in the global minutiae matching step reducing the number of local matching minutiae and using multiple minutiae pairs for fingerprint alignment we conduct series of experiments over the four databases of fvc showing that the modified algorithm outperforms its predecessor and other algorithms proposed in the literature
shadow memory is used by dynamic program analysis tools to store metadata for tracking properties of application memory the efficiency of mapping between application memory and shadow memory has substantial impact on the overall performance of such analysis tools however traditional memory mapping schemes that work well on bit architectures cannot easily port to bit architectures due to the much larger bit address space this paper presents ems an efficient memory shadowing scheme for bit architectures by taking advantage of application reference locality and unused regions in the bit address space ems provides fast and flexible memory mapping scheme without relying on any underlying platform features or requiring any specific shadow memory size our experiments show that ems is able to reduce the runtime shadow memory translation overhead to on average which almost halves the overhead of the fastest bit shadow memory system we are aware of
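a direct translation scheme of the kind such shadow memory systems rely on can be summarised in a few lines of arithmetic; the constants below (shadow base address and scale) are illustrative assumptions, and a python dict stands in for raw memory

# Sketch of a direct shadow-memory translation of the kind 64-bit shadowing
# schemes aim to make cheap: shadow_addr = (app_addr >> SCALE) + SHADOW_BASE,
# mapping each application chunk to a metadata slot in an otherwise unused
# region of the address space. Constants are illustrative, not the EMS layout.
SHADOW_BASE = 0x7FFF_8000_0000       # assumed-unused region (illustrative)
SCALE = 3                            # 8 application bytes -> 1 shadow byte

def shadow_address(app_addr: int) -> int:
    return (app_addr >> SCALE) + SHADOW_BASE

def set_meta(shadow, app_addr, value):
    shadow[shadow_address(app_addr)] = value   # shadow: dict standing in for raw memory

def get_meta(shadow, app_addr):
    return shadow.get(shadow_address(app_addr), 0)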
approximate string matching problem is common and often repeated task in information retrieval and bioinformatics this paper proposes generic design of programmable array processor architecture for wide variety of approximate string matching algorithms to gain high performance at low cost further we describe the architecture of the array and the architecture of the cell in detail in order to efficiently implement for both the preprocessing and searching phases of most string matching algorithms further the architecture performs approximate string matching for complex patterns that contain don’t care complement and classes symbols we also simulate and evaluate the proposed architecture on field programmable gate array fpga device using the jhdl tool for synthesis and the xilinx foundation tools for mapping placement and routing finally our programmable implementation achieves about times faster execution than desktop computer with pentium ghz for all algorithms when the length of the pattern is
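the kind of computation the proposed array accelerates can be sketched as a dynamic programming edit distance in which a pattern position may be a literal, a don't care, a character class or a complement class; the plain python below shows the searching recurrence in software and is only an illustration, not the hardware design

# Approximate string searching (Sellers-style DP) with extended pattern symbols:
# a literal, a don't-care, a class, or a complement class per pattern position.
# A systolic array would evaluate one column per cell per cycle; this is software.
def pos_matches(pattern_sym, ch):
    kind, val = pattern_sym              # ('lit','a') / ('any',None) / ('class',set) / ('not',set)
    if kind == 'lit':   return ch == val
    if kind == 'any':   return True
    if kind == 'class': return ch in val
    if kind == 'not':   return ch not in val

def approx_search(pattern, text):
    m, n = len(pattern), len(text)
    prev = [0] * (n + 1)                 # a match may start anywhere in the text
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pos_matches(pattern[i - 1], text[j - 1]) else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # match / substitution
        prev = cur
    return min(prev)                     # best edit distance of pattern vs any substring

# example: pattern a?c with a don't-care in the middle
# approx_search([('lit','a'), ('any',None), ('lit','c')], "xxabcxx") == 0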
this paper presents ongoing work on using data mining clustering to support the evaluation of software systems maintainability as input for our analysis we employ software measurement data extracted from java source code we propose two steps clustering process which facilitates the assessment of system’s maintainability at first and subsequently an in cluster analysis in order to study the evolution of each cluster as the system’s versions pass by the process is evaluated on apache geronimo jee open source application server the evaluation involves analyzing several versions of this software system in order to assess its evolution and maintainability over time the paper concludes with directions for future work
ubiquitous information environment can be achieved by the mobile computing technologies in this environment users carrying their portable computers can retrieve local or remote information anywhere and at anytime data broadcast with its advantages has become powerful means to disseminate data in wireless communications indexing methods for the broadcast data have been proposed to speedup access time and reduce power consumption however the influence of access failures has not been discussed for the error prone mobile environment the occurrence of access failures is often due to disconnections handoffs and communication noises in this paper based on the distributed indexing scheme we propose an adaptive access method which tolerates the access failures the basic idea is to use index replication to recover from the access failures one mechanism named search range is provided to dynamically record the range where the desired data item may exist according to the search range an unfinished search can be efficiently resumed by finding an available index replicate performance analysis is given to show the benefits of the method also the concept of version bits is applied to deal with the updates of the broadcast data
following recent work of clarkson we translate the coreset framework to the problems of finding the point closest to the origin inside polytope finding the shortest distance between two polytopes perceptrons and soft as well as hard margin support vector machines svm we prove asymptotically matching upper and lower bounds on the size of coresets stating that coresets of size do always exist as and that this is best possible the crucial quantity is what we call the excentricity of polytope or pair of polytopes additionally we prove linear convergence speed of gilbert’s algorithm one of the earliest known approximation algorithms for polytope distance and generalize both the algorithm and the proof to the two polytope case interestingly our coreset bounds also imply that we can for the first time prove matching upper and lower bounds for the sparsity of perceptron and svm solutions
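gilbert's algorithm, whose convergence is analysed above, can be sketched compactly for the minimum norm point case: repeatedly move the current point toward the vertex that is extreme in the direction of the negative current point, and record the vertices actually touched as a coreset; the numpy code below is a minimal illustration, not the paper's analysis

# Gilbert's algorithm for the polytope-distance problem (single polytope /
# minimum-norm-point case): the set of vertices visited acts as a coreset.
import numpy as np

def gilbert_min_norm(P, iters=1000, eps=1e-9):
    # P: (num_vertices, dim) array of polytope vertices
    x = P[np.argmin(np.linalg.norm(P, axis=1))].astype(float)
    support = set()
    for _ in range(iters):
        i = int(np.argmin(P @ x))            # vertex minimizing <x, p>
        p = P[i]
        gap = float(x @ (x - p))             # duality-gap style stopping test
        if gap <= eps:
            break
        support.add(i)
        d = x - p
        t = min(1.0, gap / float(d @ d))     # exact line search on the segment [x, p]
        x = x - t * d
    return x, support                        # approx. closest point and its coreset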
increased network speeds coupled with new services delivered via the internet have increased the demand for intelligence and flexibility in network systems this paper argues that both can be provided by new hardware platforms comprised of heterogeneous multi core systems with specialized communication support we present and evaluate an experimental network service platform that uses an emergent class of devices network processors as its communication support coupled via dedicated interconnect to host processor acting as computational core software infrastructure spanning both enables the dynamic creation of application specific services on the network processor mediated by middleware and controlled by kernel level communication support experimental evaluations use pentium iv based computational core coupled with an ixp network processor the sample application services run on both include an image manipulation application and application level multicasting
combinatorial design theory is very active area of mathematical research with many applications in communications and information theory computer science statistics engineering and life sciences as one of the fundamental discrete structures combinatorial designs are used in fields as diverse as error correcting codes statistical design of experiments cryptography and information security mobile and wireless communications group testing algorithms in dna screening software and hardware testing and interconnection networks this monograph provides tutorial on combinatorial designs which gives an overview of the theory furthermore the application of combinatorial designs to authentication and secrecy codes is described in depth this close relationship of designs with cryptography and information security was first revealed in shannon’s seminal paper on secrecy systems we bring together in one source foundational and current contributions concerning design theoretic constructions and characterizations of authentication and secrecy codes
image clustering an important technology for image processing has been actively researched for long period of time especially in recent years with the explosive growth of the web image clustering has even been critical technology to help users digest the large amount of online visual information however as far as we know many previous works on image clustering only used either low level visual features or surrounding texts but rarely exploited these two kinds of information in the same framework to tackle this problem we proposed novel method named consistent bipartite graph co partitioning in this paper which can cluster web images based on the consistent fusion of the information contained in both low level features and surrounding texts in particular we formulated it as constrained multi objective optimization problem which can be efficiently solved by semi definite programming sdp experiments on real world web image collection showed that our proposed method outperformed the methods only based on low level features or surrounding texts
recently many new applications such as sensor data monitoring and mobile device tracking raise up the issue of uncertain data management compared to certain data the data in the uncertain database are not exact points which instead often locate within region in this paper we study the ranked queries over uncertain data in fact ranked queries have been studied extensively in traditional database literature due to their popularity in many applications such as decision making recommendation raising and data mining tasks many proposals have been made in order to improve the efficiency in answering ranked queries however the existing approaches are all based on the assumption that the underlying data are exact or certain due to the intrinsic differences between uncertain and certain data these methods are designed only for ranked queries in certain databases and cannot be applied to uncertain case directly motivated by this we propose novel solutions to speed up the probabilistic ranked query prank over the uncertain database specifically we introduce two effective pruning methods spatial and probabilistic to help reduce the prank search space then we seamlessly integrate these pruning heuristics into the prank query procedure extensive experiments have demonstrated the efficiency and effectiveness of our proposed approach in answering prank queries in terms of both wall clock time and the number of candidates to be refined
this paper presents segmentation algorithm for triangular mesh data the proposed algorithm uses iterative merging of adjacent triangle pairs based on their orientations the oversegmented regions are merged again in an iterative region merging process finally the noisy boundaries of each region are refined the boundaries of each region contain perceptually important geometric information of the entire mesh model according to the purpose of the segmentation the proposed mesh segmentation algorithm supports various types of segmentation by controlling parameters
replicated state machines are an important and widely studied methodology for tolerating wide range of faults unfortunately while replicas should be distributed geographically for maximum fault tolerance current replicated state machine protocols tend to magnify the effects of high network latencies caused by geographic distribution in this paper we examine how to use speculative execution at the clients of replicated service to reduce the impact of network and protocol latency we first give design principles for using client speculation with replicated services such as generating early replies and prioritizing throughput over latency we then describe mechanism that allows speculative clients to make new requests through replica resolved speculation and predicated writes we implement detailed case study that applies this approach to standard byzantine fault tolerant protocol pbft for replicated nfs and counter services client speculation trades in maximum throughput to decrease the effective latency under light workloads letting us speed up run time on single client micro benchmarks when the client is co located with the primary on macro benchmark reduced latency gives the client speedup of up to
in this paper we present novel protocol for disseminating data in broadcast environments such that view consistency useful correctness criterion for broadcast environments is guaranteed our protocol is based on concurrency control information that is constructed by the server and is broadcasted at the beginning of each broadcast cycle the concurrency control information mainly captures read from relations among update transactions salient feature of the protocol is that the concurrency control information is small in size but precise enough for reducing unnecessary abortion of mobile transactions the small sized concurrency control information implies low communication overhead on broadcasting system in addition the computation overheads imposed by the algorithm on the server and the clients are low we also address the reliability issue of wireless communication and the incorporation of prefetching mechanism into our protocol simulation results demonstrate the superiority of our protocol in comparison with existing methods furthermore we have extended our protocol to deal with local view consistency which requires that all mobile transactions submitted by the same client observe the same serial order of update transactions
explaining the causes of infeasibility of boolean formulas has practical applications in numerous fields such as artificial intelligence repairing inconsistent knowledge bases formal verification abstraction refinement and unbounded model checking and electronic design diagnosing and correcting infeasibility minimal unsatisfiable subformulas muses provide useful insights into the causes of infeasibility an unsatisfiable formula often has many muses based on the application domain however muses with specific properties might be of interest in this paper we tackle the problem of finding smallest cardinality mus smus of given formula an smus provides succinct explanation of infeasibility and is valuable for applications that are heavily affected by the size of the explanation we present baseline algorithm for finding an smus founded on earlier work for finding all muses and new branch and bound algorithm called digger that computes strong lower bound on the size of an smus and splits the problem into more tractable subformulas in recursive search tree using two benchmark suites we experimentally compare digger to the baseline algorithm and to an existing incomplete genetic algorithm approach digger is shown to be faster in nearly all cases it is also able to solve far more instances within given runtime limit than either of the other approaches
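as a point of reference for what digger improves on, the sketch below finds a smallest cardinality mus by brute force: clause subsets are tried in increasing size and the first unsatisfiable one is returned, which is then automatically minimal; the exhaustive sat check and the enumeration are illustrative only and exponential in general

# Brute-force SMUS baseline: smallest unsatisfiable subset of clauses.
# Clauses use DIMACS-style integer literals (positive = variable, negative = negation).
from itertools import combinations, product

def satisfiable(clauses, num_vars):
    for assign in product([False, True], repeat=num_vars):
        if all(any((lit > 0) == assign[abs(lit) - 1] for lit in c) for c in clauses):
            return True
    return not clauses                      # empty formula is satisfiable

def smallest_mus(clauses, num_vars):
    for k in range(1, len(clauses) + 1):
        for subset in combinations(clauses, k):
            if not satisfiable(list(subset), num_vars):
                return list(subset)         # smallest, hence also minimal
    return None                             # the whole formula is satisfiable

# example: {x1} and {not x1} form an SMUS of size 2
print(smallest_mus([(1,), (-1,), (1, 2)], num_vars=2))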
simultaneous multithreading smt allows multiple threads to supply instructions to the instruction pipeline of superscalar processor because threads share processor resources an smt system is inherently different from multiprocessor system and therefore utilizing multiple threads on an smt processor creates new challenges for database implementers we investigate three thread based techniques to exploit smt architectures on memory resident data first we consider running independent operations in separate threads technique applied to conventional multi processor systems second we describe novel implementation strategy in which individual operators are implemented in multi threaded fashion finally we introduce new data structure called work ahead set that allows us to use one of the threads to aggressively preload data into the cache we evaluate each method with respect to its performance implementation complexity and other measures we also provide guidance regarding when and how to best utilize the various threading techniques our experimental results show that by taking advantage of smt technology we achieve to improvement in throughput over single threaded implementations on in memory database operations
loop fusion improves data locality and reduces synchronization in data parallel applications however loop fusion is not always legal even when legal fusion may introduce loop carried dependences which prevent parallelism in addition performance losses result from cache conflicts in fused loops in this paper we present new techniques to allow fusion of loop nests in the presence of fusion preventing dependences maintain parallelism and allow the parallel execution of fused loops with minimal synchronization and eliminate cache conflicts in fused loops we describe algorithms for implementing these techniques in compilers the techniques are evaluated on processor ksr multiprocessor and on processor convex spp multiprocessor the results demonstrate performance improvements for both kernels and complete applications the results also indicate that careful evaluation of the profitability of fusion is necessary as more processors are used
key challenge for dynamic web service selection is that web services are typically highly configurable and service requesters often have dynamic preferences on service configurations current approaches such as ws agreement describe web services by enumerating the various possible service configurations an inefficient approach when dealing with numerous service attributes with large value spaces we model web service configurations and associated prices and preferences more compactly using utility function policies which also allows us to draw from multi attribute decision theory methods to develop an algorithm for optimal service selection in this paper we present an owl ontology for the specification of configurable web service offers and requests and flexible and extensible framework for optimal service selection that combines declarative logic based matching rules with optimization methods such as linear programming assuming additive price preference functions experimental results indicate that our algorithm introduces an overhead of only around sec compared to random service selection while giving optimal results the overhead as percentage of total time decreases as the number of offers and configurations increase
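because the assumed utility and price functions are additive, an optimal configuration of an offer can be chosen attribute by attribute instead of enumerating the whole configuration space; the sketch below illustrates this observation with simple python dictionaries, which are stand ins rather than ws agreement documents or the paper's owl ontology

# Optimal configuration under additive utility/price functions: pick the best
# value per attribute independently, then pick the offer with the best net score.
def best_configuration(offer, utility, price):
    # offer:   dict attribute -> iterable of allowed values
    # utility: dict attribute -> (value -> utility score)
    # price:   dict attribute -> (value -> price contribution)
    config, net = {}, 0.0
    for attr, values in offer.items():
        best = max(values, key=lambda v: utility[attr](v) - price[attr](v))
        config[attr] = best
        net += utility[attr](best) - price[attr](best)
    return config, net

def select_offer(offers, utility, price):
    scored = [(best_configuration(o, utility, price), name) for name, o in offers.items()]
    (config, net), name = max(scored, key=lambda t: t[0][1])
    return name, config, net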
ranking documents with respect to users information needs is challenging task due in part to the dynamic nature of users interest with respect to query which can change over time in this paper we propose an innovative method for characterizing the interests of community of users at specific point in time and for using this characterization to alter the ranking of documents retrieved for query by generating community interest vector civ for given query we measure the community interest by computing score in specific document or web page retrieved by the query this score is based on continuously updated set of recent daily or past few hours user oriented text data when applying our method in ranking yahoo buzz results the civ score improves relevant results by as determined by real world user evaluation
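a minimal sketch of the re ranking step, assuming the community interest vector is a term frequency vector built from recently popular text and blended with the original retrieval score by a weight alpha; the data source and the blending rule are assumptions for illustration

# Community interest vector (CIV) re-ranking sketch: cosine similarity between
# each retrieved document and a term vector built from recent community text,
# blended with the original retrieval score.
import math
from collections import Counter

def civ_from_recent_text(recent_docs):
    civ = Counter()
    for doc in recent_docs:
        civ.update(doc.lower().split())
    return civ

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, civ, alpha=0.5):
    # results: list of (doc_text, base_score)
    rescored = [(alpha * base + (1 - alpha) * cosine(Counter(doc.lower().split()), civ), doc)
                for doc, base in results]
    return [doc for _, doc in sorted(rescored, reverse=True)]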
spectral compression of the geometry of triangle meshes achieves good results in practice but there has been little or no theoretical support for the optimality of this compression we show that for certain classes of geometric mesh models spectral decomposition using the eigenvectors of the symmetric laplacian of the connectivity graph is equivalent to principal component analysis on that class when equipped with natural probability distribution our proof treats connected one and two dimensional meshes with fixed convex boundaries and is based on an asymptotic approximation of the probability distribution in the two dimensional case the key component of the proof is that the laplacian is identical up to constant factor to the inverse covariance matrix of the distribution of valid mesh geometries hence spectral compression is optimal in the mean square error sense for these classes of meshes under some natural assumptions on their distribution
when messages which are to be sent point to point in network become available at irregular intervals decision must be made each time new message becomes available as to whether it should be sent immediately or if it is better to wait for more messages and send them all together because of physical properties of the networks certain minimum amount of time must elapse in between the transmission of two packets thus whereas waiting delays the transmission of the current data sending immediately may delay the transmission of the next data to become available even more we propose new quality measure and derive optimal deterministic and randomized algorithms for this on line problem
this paper discusses online power aware routing in large wireless ad hoc networks for applications where the message sequence is not known we seek to optimize the lifetime of the network we show that online power aware routing does not have constant competitive ratio to the off line optimal algorithm we develop an approximation algorithm called max min zpmin that has good empirical competitive ratio to ensure scalability we introduce second online algorithm for power aware routing this hierarchical algorithm is called zone based routing our experiments show that its performance is quite good
our society is increasingly moving towards richer forms of information exchange where mobility of processes and devices plays prominent role this tendency has prompted the academic community to study the security problems arising from such mobile environments and in particular the security policies regulating who can access the information in question in this paper we describe calculus for mobile processes and propose mechanism for specifying access privileges based on combination of the identity of the users seeking access their credentials and the location from which they seek it within reconfigurable nested structure we define baci boxed ambient calculus extended with distributed role based access control mechanism where each ambient controls its own access policy process in baci is associated with an owner and set of activated roles that grant permissions for mobility and communication the calculus includes primitives to activate and deactivate roles the behavior of these primitives is determined by the process’s owner its current location and its currently activated roles we consider two forms of security violations that our type system prevents attempting to move into an ambient without having the authorizing roles granting entry activated and trying to use communication port without having the roles required for access activated we accomplish and by giving static type system an untyped transition semantics and typed transition semantics we then show that well typed program never violates the dynamic security checks
recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of user’s likes and dislikes most existing recommender systems use collaborative filtering methods that base recommendations on other users preferences by contrast content based methods use information about an item itself to make suggestions this approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations we describe content based book recommending system that utilizes information extraction and machine learning algorithm for text categorization initial experimental results demonstrate that this approach can produce accurate recommendations
we present fully automatic face recognition algorithm and demonstrate its performance on the frgc data our algorithm is multimodal and performs hybrid feature based and holistic matching in order to achieve efficiency and robustness to facial expressions the pose of face along with its texture is automatically corrected using novel approach based on single automatically detected point and the hotelling transform novel spherical face representation sfr is used in conjunction with the sift descriptor to form rejection classifier which quickly eliminates large number of candidate faces at an early stage for efficient recognition in case of large galleries the remaining faces are then verified using novel region based matching approach which is robust to facial expressions this approach automatically segments the eyes forehead and the nose regions which are relatively less sensitive to expressions and matches them separately using modified icp algorithm the results of all the matching engines are fused at the metric level to achieve higher accuracy we use the frgc benchmark to compare our results to other algorithms which used the same database our multimodal hybrid algorithm performed better than others by achieving and verification rates at far and identification rates of and for probes with neutral and non neutral expression respectively
server and storage clustering has become popular platform for hosting large scale online services elements of the service clustering support are often constructed using centralized or hierarchical architectures in order to meet performance and policy objectives desired by online applications for instance central executive node can be employed to make efficient resource management decisions based on complete view of cluster wide resource availability as well as request demands functionality symmetric software architecture can enhance the robustness of cluster based network services due to its inherent absence of vulnerability points however such design must satisfy performance requirements and policy objectives desired by online services this paper argues for the improved robustness of functionally symmetric architectures and presents the designs of two specific clustering support elements energy conserving server consolidation and service availability management our emulation and experimentation on server cluster show that the proposed designs do not significantly compromise the system performance and policy objectives compared with the centralized approaches
soft errors are becoming common problem in current systems due to the scaling of technology that results in the use of smaller devices lower voltages and power saving techniques in this work we focus on soft errors that can occur in the objects created in heap memory and investigate techniques for enhancing the immunity to soft errors through various object duplication schemes the idea is to access the duplicate object when the checksum associated with the primary object indicates an error we implemented several duplication based schemes and conducted extensive experiments our results clearly show that this spectrum of schemes enable us to balance the tradeoffs between error rate and heap space consumption
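the access policy described above (read the duplicate when the checksum of the primary indicates an error) can be sketched as a small wrapper class; the python below illustrates the policy only, using crc32 as the checksum, and is not the in heap object layout studied in the paper

# Duplication-based soft-error tolerance sketch: keep primary and duplicate
# copies of an object's payload plus a checksum of the primary; a read that
# fails the checksum falls back to the duplicate and repairs the primary.
import zlib

class DuplicatedObject:
    def __init__(self, payload: bytes):
        self._primary = bytearray(payload)
        self._duplicate = bytearray(payload)
        self._checksum = zlib.crc32(self._primary)

    def write(self, data: bytes):
        self._primary = bytearray(data)
        self._duplicate = bytearray(data)
        self._checksum = zlib.crc32(self._primary)

    def read(self) -> bytes:
        if zlib.crc32(self._primary) == self._checksum:
            return bytes(self._primary)
        # checksum mismatch: primary corrupted, repair it from the duplicate
        self._primary = bytearray(self._duplicate)
        self._checksum = zlib.crc32(self._primary)
        return bytes(self._primary)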
the main purpose of topic detection and tracking tdt is to detect group and organize newspaper articles reporting on the same event since an event is reported occurrence at specific time and place and the unavoidable consequences tdt can benefit from an explicit use of time and place information in this work we focused on place information using time information as in the previous research news articles were analyzed for their characteristics of place information and new topic tracking method was proposed to incorporate the analysis results on place information experiments show that appropriate use of place information extracted automatically from news articles indeed helps event tracking that identify news articles reporting on the same events
we motivate the design of typed assembly language tal and present type preserving translation from system f to tal the typed assembly language we present is based on conventional risc assembly language but its static type system provides support for enforcing high level language abstractions such as closures tuples and user defined abstract data types the type system ensures that well typed programs cannot violate these abstractions in addition the typing constructs admit many low level compiler optimizations our translation to tal is specified as sequence of type preserving transformations including cps and closure conversion phases type correct source programs are mapped to type correct assembly language key contribution is an approach to polymorphic closure conversion that is considerably simpler than previous work the compiler and typed assembly language provide fully automatic way to produce certified code suitable for use in systems where untrusted and potentially malicious code must be checked for safety before execution
transductive support vector machine tsvm is method for semi supervised learning in order to further improve the classification accuracy and robustness of tsvm in this paper we make use of self training technique to ensemble tsvms and classify testing samples by majority voting the experiment results on uci datasets show that the classification accuracy and robustness of tsvm could be improved by our approach
discrete event system specification devs has been widely used to describe hierarchical models of discrete systems devs has also been used successfully to model with real time constraints in this paper we introduce methodology to verify real time devs models and describe the methodology by using case study of devs model of an elevator system our methodology applies recent advances in theoretical model checking to devs models the methodology also handles the cases where theoretical approach is not feasible to cross the gap between abstract timed automata models and the complexity of the devs real time implementation by empirical software engineering methods the case study is system composed of an elevator along with an elevator controller and we show how the methodology can be applied to real case like this one in order to improve the quality of such real time applications
field association fa terms are limited set of discriminating terms that can specify document fields document fields can be decided efficiently if there are many relevant fa terms in those documents an earlier approach built fa terms dictionary using www search engine but there were irrelevant selected fa terms in that dictionary because that approach extracted fa terms from the whole documents this paper proposes new approach for extracting fa terms using passage portions of document text technique rather than extracting them from the whole documents this approach extracts fa terms more accurately than the earlier approach the proposed approach is evaluated for articles from the large tagged corpus according to experimental results it turns out that by using the new approach about more relevant fa terms are appended to the earlier fa term dictionary and around irrelevant fa terms are deleted moreover precision and recall are achieved and respectively using the new approach
we propose framework for examining trust in the storage stack based on different levels of trustworthiness present across different channels of information flow we focus on corruption in one of the channels the data channel and as case study we apply type aware corruption techniques to examine windows ntfs behavior when on disk pointers are corrupted we find that ntfs does not verify on disk pointers thoroughly before using them and that even established error handling techniques like replication are often used ineffectively our study indicates the need to more carefully examine how trust is managed within modern file systems
we present coverage metric which evaluates the testing of set of interacting concurrent processes existing behavioral coverage metrics focus almost exclusively on the testing of individual processes however the vast majority of practical hardware descriptions are composed of many processes which must correctly interact to implement the system coverage metrics which evaluate processes separately are unlikely to model the range of design errors which manifest themselves when components are integrated to build system metric which models component interactions is essential to enable validation techniques to scale with growing design complexity we describe the effectiveness of our metric and provide results to demonstrate that coverage computation using our metric is tractable
to improve the lifetime performance of multicore chip with simple cores we propose the core cannibalization architecture cca chip with cca provisions fraction of the cores as cannibalizable cores ccs in the absence of hard faults the ccs function just like normal cores in the presence of hard faults the ccs can be cannibalized for spare parts at the granularity of pipeline stages we have designed and laid out cca chips composed of multiple openrisc cores our results show that cca improves the chips lifetime performances compared to chips without cca
modifying motion capture to satisfy the constraints of new animation is difficult when contact is involved and critical problem for animation of hands the compliance with which character makes contact also reveals important aspects of the movement’s purpose we present new technique called interaction capture for capturing these contact phenomena we capture contact forces at the same time as motion at high rate and use both to estimate nominal reference trajectory and joint compliance unlike traditional methods our method estimates joint compliance without the need for motorized perturbation devices new interactions can then be synthesized by physically based simulation we describe novel position based linear complementarity problem formulation that includes friction breaking contact and the compliant coupling between contacts at different fingers the technique is validated using data from previous work and our own perturbation based estimates
k-ary trees are fundamental data structure in many text processing algorithms eg text searching the traditional pointer based representation of trees is space consuming and hence only relatively small trees can be kept in main memory nowadays however many applications need to store huge amount of information in this paper we present succinct representation for dynamic k-ary trees of n nodes requiring n log k + o(n log k) bits of space which is close to the information theoretic lower bound unlike alternative representations where the operations on the tree can usually be computed in O(log n) time our data structure is able to take advantage of asymptotically smaller values of k supporting the basic operations parent and child in O(log k + log log n) time which is O(log n) time whenever log k is of the order of log n insertions and deletions of leaves in the tree are supported in amortized time our representation also supports more specialized operations like subtreesize and depth and provides new trade off allowing faster updates in amortized time versus the structure of raman and rao at the cost of slower basic operations
fastest three dimensional surface reconstruction algorithms from point clouds require knowledge of the surface normals the accuracy of state of the art methods depends on the precision of estimated surface normals surface normals are estimated by assuming that the surface can be locally modelled by plane as was proposed by hoppe et al thus current methods for estimating surface normals are prone to introduce artifacts at the geometric edges or corners of the objects in this paper an algorithm for normal estimation with neighborhood reorganization nenr is presented our proposal changes the characteristics of the neighborhood in places with corners or edges by assuming locally plane piecewise surface the results obtained by nenr improve the quality of the normals with respect to the state of the art algorithms the new neighborhood computed by nenr uses only those points that belong to the same plane and they are the nearest neighbors experiments on synthetic and real data showed an improvement on the geometric edges of reconstructed surfaces when our algorithm is used
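for reference, the local plane fitting baseline attributed to hoppe et al can be sketched as a pca of the k nearest neighbors of each point, taking the eigenvector of the smallest eigenvalue as the normal; the neighborhood reorganization step of nenr is not reproduced here

# Baseline local-plane normal estimation: PCA over each point's k nearest
# neighbors, normal = eigenvector of the smallest eigenvalue of the covariance.
import numpy as np

def estimate_normals(points, k=10):
    # points: (n, 3) array
    normals = np.zeros_like(points, dtype=float)
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[1:k + 1]]        # skip the point itself
        centered = nbrs - nbrs.mean(axis=0)
        cov = centered.T @ centered
        eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]                   # smallest-eigenvalue direction
    return normals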
we address the problem of cyclic termgraph rewriting we propose new framework where rewrite rules are tuples of the form such that and are termgraphs representing the left hand and the right hand sides of the rule is mapping from the nodes of to those of and is partial function from nodes of to nodes of the mapping describes how incident edges of the nodes in are connected in it is not required to be graph morphism as in classical algebraic approaches of graph transformation the role of is to indicate the parts of to be cloned copied furthermore we introduce notion of heterogeneous pushout and define rewrite steps as heterogeneous pushouts in given category among the features of the proposed rewrite systems we quote the ability to perform local and global redirection of pointers addition and deletion of nodes as well as cloning and collapsing substructures
in this article we propose new approach for querying and indexing database of trees with specific applications to xml datasets our approach relies on representing both the queries and the data using sequential encoding and then subsequently employing an innovative variant of the longest common subsequence lcs matching algorithm to retrieve the desired results key innovation here is the use of series of inter linked early pruning steps coupled with simple index structure that enable us to reduce the search space and eliminate large number of false positive matches prior to applying the more expensive lcs matching algorithm additionally we also present mechanisms that enable the user to specify constraints on the retrieved output and show how such constraints can be pushed deep into the retrieval process leading to improved response times mechanisms supporting the retrieval of approximate matches are also supported when compared with state of the art approaches the query processing time of our algorithms is shown to be up to two to three orders of magnitude faster on several real datasets on realistic query workloads finally we show that our approach is suitable for emerging multi core server architectures when retrieving data for more expensive queries
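A minimal sketch of the core matching idea described above: query and data trees are assumed to be flattened into label sequences, a cheap label-count filter prunes candidates, and the classic LCS dynamic program confirms containment. The index structure and the interlinked pruning steps of the paper are not reproduced; names and the filter are illustrative.

from collections import Counter

def lcs_length(q, s):
    # classic dynamic-programming longest common subsequence
    m, n = len(q), len(s)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if q[i - 1] == s[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def query(query_seq, db_seqs):
    # cheap pruning before the expensive LCS: a data sequence can only contain
    # the query if it holds every query label at least as often
    qc = Counter(query_seq)
    hits = []
    for sid, seq in db_seqs.items():
        sc = Counter(seq)
        if all(sc[label] >= c for label, c in qc.items()):
            if lcs_length(query_seq, seq) == len(query_seq):
                hits.append(sid)
    return hits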
computer programs can only run reliably if the underlying operating system is free of errors in this paper we evaluate from practitioner’s point of view the utility of the popular software model checker blast for revealing errors in linux kernel code the emphasis is on important errors related to memory safety in and locking behaviour of device drivers our conducted case studies show that while blast’s abstraction and refinement techniques are efficient and powerful the tool has deficiencies regarding usability and support for analysing pointers which are likely to prevent kernel developers from using it
performance prediction methods can help software architects to identify potential performance problems such as bottlenecks in their software systems during the design phase in such early stages of the software life cycle only little information is available about the system’s implementation and execution environment however these details are crucial for accurate performance predictions performance completions close the gap between available high level models and required low level details using model driven technologies transformations can include details of the implementation and execution environment into abstract performance models however existing approaches do not consider the relation of actual implementations and performance models used for prediction furthermore they neglect the broad variety of possible implementations and middleware platforms possible configurations and possible usage scenarios in this paper we establish formal relation between generated performance models and generated code ii introduce design and application process for parametric performance completions and iii develop parametric performance completion for message oriented middleware according to our method parametric performance completions are independent of specific platform reflect performance relevant software configurations and capture the influence of different usage scenarios to evaluate the prediction accuracy of the completion for message oriented middleware we conducted real world case study with the specjms benchmark http://www.spec.org/jms the observed deviation of measurements and predictions was below to
the widespread deployment of recommender systems has led to user feedback of varying quality while some users faithfully express their true opinion many provide noisy ratings which can be detrimental to the quality of the generated recommendations the presence of noise can violate modeling assumptions and may thus lead to instabilities in estimation and prediction even worse malicious users can deliberately insert attack profiles in an attempt to bias the recommender system to their benefit robust statistics is an area within statistics where estimation methods have been developed that deteriorate more gracefully in the presence of unmodeled noise and slight departures from modeling assumptions in this work we study how such robust statistical methods in particular estimators can be used to generate stable recommendations even in the presence of noise and spam to that extent we present robust matrix factorization algorithm and study its stability we conclude that estimators do not add significant stability to recommendation however the presented algorithm can outperform existing recommendation algorithms in its recommendation quality
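A minimal sketch of robust matrix factorization in the spirit described above, using a Huber-type M-estimator inside a stochastic gradient update so that large (noisy or malicious) rating residuals are down-weighted. The paper's exact estimator, update rule and hyperparameters may differ; all values here are illustrative.

import numpy as np

def huber_grad(r, delta=1.0):
    # derivative of the Huber loss: quadratic for small residuals,
    # linear (capped) for large, noisy or spammy ones
    return r if abs(r) <= delta else delta * np.sign(r)

def robust_mf(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=30):
    # ratings: list of (user, item, value) triples
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            e = huber_grad(r - U[u] @ V[i])
            U[u], V[i] = (U[u] + lr * (e * V[i] - reg * U[u]),
                          V[i] + lr * (e * U[u] - reg * V[i]))
    return U, V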
high performing on chip instruction caches are crucial to keep fast processors busy unfortunately while on chip caches are usually successful at intercepting instruction fetches in loop intensive engineering codes they are less able to do so in large systems codes to improve the performance of the latter codes the compiler can be used to lay out the code in memory for reduced cache conflicts interestingly such an operation leaves the code in state that can be exploited by new type of instruction prefetching guarded sequential prefetching the idea is that the compiler leaves hints in the code as to how the code was laid out then at run time the prefetching hardware detects these hints and uses them to prefetch more effectively this scheme can be implemented very cheaply one bit encoded in control transfer instructions and prefetch module that requires minor extensions to existing next line sequential prefetchers furthermore the scheme can be turned off and on at run time with the toggling of bit in the tlb the scheme is evaluated with simulations using complete traces from processor machine overall for kbyte primary instruction caches guarded sequential prefetching removes on average of the instruction misses remaining in an operating system with an optimized layout speeding up the operating system by moreover the scheme is more cost effective and robust than existing sequential prefetching techniques
most recommender systems present recommended products in lists to the user by doing so much information is lost about the mutual similarity between recommended products we propose to represent the mutual similarities of the recommended products in two dimensional map where similar products are located close to each other and dissimilar products far apart as dissimilarity measure we use an adaptation of gower’s similarity coefficient based on the attributes of product two recommender systems are developed that use this approach the first the graphical recommender system uses description given by the user in terms of product attributes of an ideal product the second system the graphical shopping interface allows the user to navigate towards the product she wants we show prototype application of both systems to mp players
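A minimal sketch of a Gower-style dissimilarity over mixed product attributes, the kind of measure this abstract adapts: numeric attributes contribute a range-normalised difference and categorical attributes a 0/1 mismatch. The paper's adaptation and the construction of the two-dimensional map (e.g. via multidimensional scaling) are not reproduced; the example products and ranges are made up.

def gower_dissimilarity(a, b, numeric_ranges):
    # a, b: dicts of attribute -> value; numeric_ranges: attribute -> (min, max)
    # numeric attributes contribute |a - b| / range, categorical ones 0 or 1
    total, count = 0.0, 0
    for attr in a:
        if attr in numeric_ranges:
            lo, hi = numeric_ranges[attr]
            total += abs(a[attr] - b[attr]) / (hi - lo) if hi > lo else 0.0
        else:
            total += 0.0 if a[attr] == b[attr] else 1.0
        count += 1
    return total / count

# example: two portable players described by price, storage and colour
p1 = {"price": 120, "storage_gb": 8, "colour": "black"}
p2 = {"price": 180, "storage_gb": 16, "colour": "white"}
print(gower_dissimilarity(p1, p2, {"price": (50, 400), "storage_gb": (1, 64)}))
# about 0.43 with these illustrative ranges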
we introduce new algorithm to compute the spatial join of two or more spatial data sets when indexes are not available on them size separation spatial join sj imposes hierarchical decomposition of the data space and in contrast with previous approaches requires no replication of entities from the input data sets thus its execution time depends only on the sizes of the joined data sets we describe sj and present an analytical evaluation of its i/o and processor requirements comparing them with those of previously proposed algorithms for the same problem we show that sj has relatively simple cost estimation formulas that can be exploited by query optimizer sj can be efficiently implemented using software already present in many relational systems in addition we introduce dynamic spatial bitmaps dsb new technique that enables sj to dynamically or statically exploit bitmap query processing techniques finally we present experimental results for prototype implementation of sj involving real and synthetic data sets for variety of data distributions our experimental results are consistent with our analytical observations and demonstrate the performance benefits of sj over alternative approaches that have been proposed recently
the assessment of routing protocols for mobile wireless networks is difficult task because of the networks dynamic behavior and the absence of benchmarks however some of these networks such as intermittent wireless sensors networks periodic or cyclic networks and some delay tolerant networks dtns have more predictable dynamics as the temporal variations in the network topology can be considered as deterministic which may make them easier to study recently graph theoretic model the evolving graphs was proposed to help capture the dynamic behavior of such networks in view of the construction of least cost routing and other algorithms the algorithms and insights obtained through this model are theoretically very efficient and intriguing however there is no study about the use of such theoretical results into practical situations therefore the objective of our work is to analyze the applicability of the evolving graph theory in the construction of efficient routing protocols in realistic scenarios in this paper we use the ns network simulator to first implement an evolving graph based routing protocol and then to use it as benchmark when comparing the four major ad hoc routing protocols aodv dsr olsr and dsdv interestingly our experiments show that evolving graphs have the potential to be an effective and powerful tool in the development and analysis of algorithms for dynamic networks with predictable dynamics at least in order to make this model widely applicable however some practical issues still have to be addressed and incorporated into the model like adaptive algorithms we also discuss such issues in this paper as result of our experience
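To make the evolving-graph idea concrete, the sketch below computes an earliest-arrival (foremost) journey when every link carries the list of time steps at which it is up, assuming unit traversal time per hop. It only illustrates the model; the least-cost algorithms of the cited work and the ns integration are not reproduced.

import heapq

def foremost_journey(schedule, src, dst, t0=0):
    # schedule: dict (u, v) -> sorted list of time steps at which the link is up
    # returns the earliest arrival time at dst when starting from src at t0
    best = {src: t0}
    heap = [(t0, src)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == dst:
            return t
        if t > best.get(u, float("inf")):
            continue
        for (a, b), times in schedule.items():
            if a != u:
                continue
            # earliest time >= t at which the edge (u, b) is available
            nxt = next((s for s in times if s >= t), None)
            if nxt is not None and nxt + 1 < best.get(b, float("inf")):
                best[b] = nxt + 1          # traversing the edge takes one step
                heapq.heappush(heap, (nxt + 1, b))
    return None

sched = {("s", "a"): [0, 5], ("a", "d"): [3], ("s", "d"): [9]}
print(foremost_journey(sched, "s", "d"))   # -> 4 (s->a at t=0, a->d at t=3)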
loop tiling and communication optimization such as message pipelining and aggregation can achieve optimized and robust memory performance by proactively managing storage and data movement in this paper we generalize these techniques to pointer based data structures pbdss our approach dynamic pointer alignment dpa has two components the compiler decomposes program into non blocking threads that operate on specific pointers and labels thread creation sites with their corresponding pointers at runtime an explicit mapping from pointers to dependent threads is updated at thread creation and is used to dynamically schedule both threads and communication such that threads using the same objects execute together communication overlaps with local work and messages are aggregated we have implemented dpa to optimize remote reads to global pbdss on parallel machines our empirical results on the force computation phases of two applications that use sophisticated pbdss barnes hut and fmm show that dpa achieves good absolute performance and speedups by enabling tiling and communication optimization on the cray td
model interchange approaches support the analysis of software architecture and design by enabling variety of tools to automatically exchange performance models using common schema this paper builds on one of those interchange formats the software performance model interchange format pmif and extends it to support the performance analysis of real time systems specifically it addresses real time system designs expressed in the construction and composition language ccl and their transformation into the pmif for additional performance analyses this paper defines extensions and changes to the pmif meta model and schema required for real time systems it describes transformations for both simple best case models and more detailed models of concurrency and synchronization case study demonstrates the techniques and compares performance results from several analyses
constraint specifies relation or condition that must be maintained in system it is common for single user graphic system to specify some constraints and provide methods to satisfy these constraints automatically constraints are even more useful in collaborative systems which can confine and coordinate concurrent operations but satisfying constraints in the presence of concurrency in collaborative systems is difficult in this article we discuss the issues and techniques in maintaining constraints in collaborative systems in particular we also proposed novel strategy that is able to maintain both constraints and system consistency in the face of concurrent operations the strategy is independent of the execution orders of concurrent operations and able to retain the effects of all operations in resolving constraint violation the proposed strategy has been implemented in collaborative genetic software engineering system called cogse for maintaining the tree structure constraint specific issues related to cogse are also discussed in detail
embedded systems are being deployed as part of critical infrastructures and are vulnerable to malicious attacks due to internet accessibility intrusion detection systems have been proposed to protect computer systems from unauthorized penetration detecting an attack early on pays off since further damage is avoided and in some cases resilient recovery could be adopted this is especially important for embedded systems deployed in critical infrastructures such as power grids etc where timely intervention could save catastrophes an intrusion detection system monitors dynamic program behavior against normal program behavior and raises an alert when an anomaly is detected the normal behavior is learnt by the system through training and profiling however all current intrusion detection systems are purely software based and thus suffer from large performance degradation due to constant monitoring operations inserted in application code due to the potential performance overheads software based solutions cannot monitor program behavior at very fine level of granularity thus leaving potential security holes as shown in the literature another important drawback of such methods is that they are unable to detect intrusions in near real time and the time lag could prove disastrous in real time embedded systems in this paper we propose hardware based approach to verify program execution paths of target applications dynamically and to detect anomalous executions with hardware support our approach offers multiple advantages over software based solutions including minor performance degradation much stronger detection capability larger variety of attacks get detected and zero latency reaction upon an anomaly for near real time detection and thus much better security
the next generation of software systems will be highly distributed component based and service oriented they will need to operate in unattended mode and possibly in hostile environments will be composed of large number of replaceable components discoverable at run time and will have to run on multitude of unknown and heterogeneous hardware and network platforms this paper focuses on qos management in service oriented architectures in which service providers sp provide set of interrelated services to service consumers and qos broker mediates qos negotiations between sps and consumers the main contributions of this paper are the description of an architecture that includes qos broker and service provider software components ii the specification of secure protocol for qos negotiation with the support of qos broker iii the specification of an admission control mechanism used by sps iv report on the implementation of the qos broker and sps and the experimental validation of the ideas presented in the paper
consider content distribution network consisting of set of sources repositories and clients where the sources and the repositories cooperate with each other for efficient dissemination of dynamic data in this system necessary changes are pushed from sources to repositories and from repositories to clients so that they are automatically informed about the changes of interest clients and repositories associate coherence requirements with data item denoting the maximum permissible deviation of the value of known to them from the value at the source given list of data item coherence served by each repository and set of client data item coherence requests we address the following problem how do we assign clients to the repositories so that the fidelity that is the degree to which client coherence requirements are met is maximized in this paper we first prove that the client assignment problem is np hard given the closeness of the client repository assignment problem and the matching problem in combinatorial optimization we have tailored and studied two available solutions to the matching problem from the literature max flow min cost and ii stable marriages our empirical results using real world dynamic data show that the presence of coherence requirements adds new dimension to the client repository assignment problem an interesting result is that in update intensive situations better fidelity can be delivered to the clients by attempting to deliver data to some of the clients at coherence lower than what they desire consequence of this observation is the necessity for quick adaptation of the delivered vs desired data coherence with respect to the changes in the dynamics of the system we develop techniques for such adaptation and show their impressive performance
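One of the two tailored matching solutions mentioned above is stable marriages; a minimal Gale-Shapley sketch is given below, assuming equal numbers of clients and repositories, complete preference lists and unit repository capacity. The coherence-aware preference construction and the capacities of the actual system are not modelled.

def stable_match(client_prefs, repo_prefs):
    # client_prefs: client -> ordered list of repositories (most preferred first)
    # repo_prefs:   repo   -> ordered list of clients (used as a rank table)
    rank = {r: {c: i for i, c in enumerate(p)} for r, p in repo_prefs.items()}
    free = list(client_prefs)
    nxt = {c: 0 for c in client_prefs}        # next repo each client proposes to
    engaged = {}                              # repo -> client
    while free:
        c = free.pop()
        r = client_prefs[c][nxt[c]]
        nxt[c] += 1
        if r not in engaged:
            engaged[r] = c
        elif rank[r][c] < rank[r][engaged[r]]:
            free.append(engaged[r])           # repo prefers the new proposer
            engaged[r] = c
        else:
            free.append(c)                    # proposal rejected, try next repo
    return {c: r for r, c in engaged.items()}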
we examine the problem of minimizing feedbacks in reliable wireless broadcasting by pairing rateless coding with extreme value theory our key observation is that in broadcast environment this problem resolves into estimating the maximum number of packets dropped among many receivers rather than for each individual receiver with rateless codes this estimation relates to the number of redundant transmissions needed at the source in order for all receivers to correctly decode message with high probability we develop and analyze two new data dissemination protocols called random sampling rs and full sampling with limited feedback fslf based on the moment and maximum likelihood estimators in extreme value theory both protocols rely on single round learning phase requiring the transmission of few feedback packets from small subset of receivers with fixed overhead we show that fslf has the desirable property of becoming more accurate as the receivers' population gets larger our protocols are channel agnostic in that they do not require a priori knowledge of iid packet loss probabilities which may vary among receivers we provide simulations and an improved full scale implementation of the rateless deluge over the air programming protocol on sensor motes as demonstration of the practical benefits of our protocols which translate into about latency and energy consumption savings
unstructured meshes are often used in simulations and imaging applications they provide advanced flexibility in modeling abilities but are more difficult to manipulate and analyze than regular data this work provides novel approach for the analysis of unstructured meshes using feature space clustering and feature detection analyzing and revealing underlying structures in data involve operators on both spatial and functional domains slicing concentrates more on the spatial domain while iso surfacing or volume rendering concentrate more on the functional domain nevertheless many times it is the combination of the two domains which provides real insight on the structure of the data in this work combined feature space is defined on top of unstructured meshes in order to search for structure in the data point in feature space includes the spatial coordinates of the point in the mesh domain and all chosen attributes defined on the mesh distance measures between points in feature space is defined enabling the utilization of clustering using the mean shift procedure previously used for images on unstructured meshes feature space analysis is shown to be useful for feature extraction for data exploration and partitioning
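A minimal sketch of the mean-shift procedure on combined feature vectors (spatial coordinates concatenated with mesh attributes), using a flat kernel with an illustrative bandwidth. The unstructured-mesh specifics and the feature-detection steps of the paper are omitted.

import numpy as np

def mean_shift(features, bandwidth=1.0, iters=20):
    # features: (n, d) array mixing spatial coordinates and mesh attributes;
    # every point is repeatedly shifted to the mean of the data points lying
    # within the bandwidth of its current position (flat kernel)
    pts = features.copy()
    for _ in range(iters):
        for i in range(len(pts)):
            d = np.linalg.norm(features - pts[i], axis=1)
            window = features[d < bandwidth]
            if len(window):
                pts[i] = window.mean(axis=0)
    return pts   # points that converged to the same mode form one cluster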
model checking has proven to be useful analysis technique not only for concurrent systems but also for the genetic regulatory networks grns that govern the functioning of living cells the applications of model checking in systems biology have revealed that temporal logics should be able to capture both branching time and fairness properties at the same time they should have user friendly syntax easy to employ by non experts in this paper we define ctrl computation tree regular logic an extension of ctl with regular expressions and fairness operators that attempts to match these criteria ctrl subsumes both ctl and ltl and has reduced set of temporal operators indexed by regular expressions inspired from the modalities of pdl propositional dynamic logic we also develop translation of ctrl into hmlr hennessy milner logic with recursion an equational variant of the modal calculus this has allowed us to obtain an on the fly model checker with diagnostic for ctrl by directly reusing the verification technology available in the cadp toolbox we illustrate the application of the ctrl model checker by analyzing the grn controlling the carbon starvation response of escherichia coli
we propose new approach for developing and deploying distributed systems in which nodes predict distributed consequences of their actions and use this information to detect and avoid errors each node continuously runs state exploration algorithm on recent consistent snapshot of its neighborhood and predicts possible future violations of specified safety properties we describe new state exploration algorithm consequence prediction which explores causally related chains of events that lead to property violation this paper describes the design and implementation of this approach termed crystalball we evaluate crystalball on randtree bulletprime paxos and chord distributed system implementations we identified new bugs in mature mace implementations of three systems furthermore we show that if the bug is not corrected during system development crystalball is effective in steering the execution away from inconsistent states at runtime
the authors study the adaptation of an optimistic time warp kernel to cross cluster computing on the grid wide area communication the primary source of overhead is offloaded onto dedicated routing processes this allows the simulation processes to run at full speed and thus significantly decreases the performance gap caused by the wide area distribution further improvements are obtained by employing message aggregation on the wide area links and using distributed global virtual time algorithm the authors achieve many of their objectives for cellular automaton simulation with lazy cancellation and moderate communication high communication rates especially with aggressive cancellation present challenge this is confirmed by the experiments with synthetic loads even then satisfactory speedup can be achieved provided that the computational grain of events is large enough
we explicitly analyze the trajectories of learning near singularities in hierarchical networks such as multilayer perceptrons and radial basis function networks which include permutation symmetry of hidden nodes and show their general properties such symmetry induces singularities in their parameter space where the fisher information matrix degenerates and odd learning behaviors especially the existence of plateaus in gradient descent learning arise due to the geometric structure of singularity we plot dynamic vector fields to demonstrate the universal trajectories of learning near singularities the singularity induces two types of plateaus the on singularity plateau and the near singularity plateau depending on the stability of the singularity and the initial parameters of learning the results presented in this letter are universally applicable to wide class of hierarchical models detailed stability analysis of the dynamics of learning in radial basis function networks and multilayer perceptrons will be presented in separate work
the compositional computation of pareto points in multi dimensional optimization problems is an important means to efficiently explore the optimization space this paper presents symbolic pareto calculator spac for the algebraic computation of multidimensional trade offs spac uses bdds as representation for solution sets and operations on them the tool can be used in multi criteria optimization and design space exploration of embedded systems the paper describes the design and implementation of pareto algebra operations and it shows that bdds can be used effectively in pareto optimization
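For reference, the Pareto-minimisation operation itself can be sketched explicitly as below (all objectives minimised); the contribution of the tool is the symbolic BDD representation of such sets and of the algebra's operations, which this explicit-set sketch does not attempt to reproduce.

def pareto_minimal(points):
    # points: list of tuples, all objectives to be minimized
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# example trade-off space (latency, energy, area)
configs = [(3, 5, 2), (2, 6, 2), (3, 4, 3), (4, 4, 1), (2, 6, 3)]
print(pareto_minimal(configs))   # (2, 6, 3) is dominated by (2, 6, 2) and dropped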
in this paper we develop compilation techniques for the realization of applications described in high level language hll onto runtime reconfigurable architecture the compiler determines hyper operations hyperops that are subgraphs of data flow graph of an application and comprise elementary operations that have strong producer consumer relationship these hyperops are hosted on computation structures that are provisioned on demand at runtime we also report compiler optimizations that collectively reduce the overheads of data driven computations in runtime reconfigurable architectures on an average hyperops offer reduction in total execution time and reduction in management overheads as compared to using basic blocks as coarse grained operations we show that hyperops formed using our compiler are suitable to support data flow software pipelining
tree based access methods for moving objects are hardly applicable in practice due mainly to excessive space requirements and high management costs to overcome the limitations of such tree based access methods we propose new index structure called aim adaptive cell based index for moving objects the aim is cell based multiversion access structure adopting an overlapping technique the aim refines cells adaptively to handle regional data skew which may change its locations over time through the extensive performance studies we observed that the aim consumed at most of the space required by tree based methods and achieved higher query performance compared with tree based methods
feature ranking is kind of feature selection process which ranks the features based on their relevances and importance with respect to the problem this topic has been well studied in supervised classification area however very few works are done for unsupervised clustering under the condition that labels of all instances are unknown beforehand thus feature ranking for unsupervised clustering is challenging task due to the absence of labels of instances for guiding the computations of the relevances of features this paper explores the feature ranking approach within the unsupervised clustering area we propose novel consensus unsupervised feature ranking approach termed as unsupervised feature ranking from multiple views frmv the frmv method firstly obtains multiple rankings of all features from different views of the same data set and then aggregates all the obtained feature rankings into single consensus one experimental results on several real data sets demonstrate that frmv is often able to identify better feature ranking when compared with that obtained by single feature ranking approach
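A minimal sketch of the aggregation step: per-view feature rankings are combined into a single consensus ranking, here with a simple Borda-style score. The actual FRMV view generation and aggregation rule may differ; the feature names are made up.

def consensus_ranking(rankings):
    # rankings: list of rankings, each an ordered list of feature names
    # (best first); Borda-style: a feature scores (m - position) in each view
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for pos, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0) + (m - pos)
    return sorted(scores, key=scores.get, reverse=True)

views = [["f3", "f1", "f2"], ["f1", "f3", "f2"], ["f3", "f2", "f1"]]
print(consensus_ranking(views))   # -> ['f3', 'f1', 'f2']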
even after several decades of research modeling is considered an art with high liability to produce incorrect abstractions of real world systems therefore validation and verification of simulation models is considered an indispensable method to establish the credibility of developed models in the process of parallelizing or distributing given credible simulation model bias is introduced possibly leading to serious errors in simulation results depending on the mechanisms used for parallelization or distribution separate validation of the parallel or distributed model is required necessary first step for such validation is an understanding of the sources of bias that might occur through parallelization or distribution of simulation model the intention of this paper is to give an overview of the various types of bias and to give formal definition of the bias and its quantification
document properties are compelling infrastructure on which to develop document management applications property based approach avoids many of the problems of traditional hierarchical storage mechanisms reflects document organizations meaningful to user tasks provides means to integrate the perspectives of multiple individuals and groups and does this all within uniform interaction framework document properties can reflect not only categorizations of documents and document use but also expressions of desired system activity such as sharing criteria replication management and versioning augmenting property based document management systems with active properties that carry executable code enables the provision of document based services on property infrastructure the combination of document properties as uniform mechanism for document management and active properties as way of delivering document services represents new paradigm for document management infrastructures the placeless documents system is an experimental prototype developed to explore this new paradigm it is based on the seamless integration of user specific active properties we present the fundamental design approach explore the challenges and opportunities it presents and show how our architecture deals with them
current search engines can hardly cope adequately with fuzzy predicates defined by complex preferences the biggest problem of search engines implemented with standard sql is that sql does not directly understand the notion of preferences preference sql extends sql by preference model based on strict partial orders presented in more detail in the companion paper kie where preference queries behave like soft selection constraints several built in base preference types and the powerful pareto operator combined with the adherence to declarative sql programming style guarantees great programming productivity the preference sql optimizer does an efficient re writing into standard sql including high level implementation of the skyline operator for pareto optimal sets this pre processor approach enables seamless application integration making preference sql available on all major sql platforms several commercial bc portals are powered by preference sql its benefits comprise cooperative query answering and smart customer advice leading to higher customer satisfaction and shorter development times of personalized search engines we report practical experiences ranging from commerce and comparison shopping to large scale performance test for job portal
chip multiprocessors cmps are expected to be the building blocks for future computer systems while architecting these emerging cmps is challenging problem on its own programming them is even more challenging as the number of cores accommodated in chip multiprocessors increases network on chip noc type communication fabrics are expected to replace traditional point to point buses most of the prior software related work so far targeting cmps focus on performance and power aspects however as technology scales components of cmp are being increasingly exposed to both transient and permanent hardware failures this paper presents and evaluates compiler directed power performance aware reliability enhancement scheme for network on chip noc based chip multiprocessors cmps the proposed scheme improves on chip communication reliability by duplicating messages traveling across cmp nodes such that for each original message its duplicate uses different set of communication links as much as possible to satisfy performance constraint in addition our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the noc to satisfy power constraint our results show that the proposed approach is very effective in improving on chip network reliability without causing excessive power or performance degradation in our experiments we also evaluate the performance oriented and energy oriented versions of our compiler directed reliability enhancement scheme and compare it to two pure hardware based fault tolerant routing schemes
over the past decade we have witnessed the evolution of wireless sensor networks with advancements in hardware design communication protocols resource efficiency and other aspects recently there has been much focus on mobile sensor networks and we have even seen the development of small profile sensing devices that are able to control their own movement although it has been shown that mobility alleviates several issues relating to sensor network coverage and connectivity many challenges remain among these the need for position estimation is perhaps the most important not only is localization required to understand sensor data in spatial context but also for navigation key feature of mobile sensors in this paper we present survey on localization methods for mobile wireless sensor networks we provide taxonomies for mobile wireless sensors and localization including common architectures measurement techniques and localization algorithms we conclude with description of real world mobile sensor applications that require position estimation
an experimental comparison of large number of different image descriptors for content based image retrieval is presented many of the papers describing new techniques and descriptors for content based image retrieval describe their newly proposed methods as most appropriate without giving an in depth comparison with all methods that were proposed earlier in this paper we first give an overview of large variety of features for content based image retrieval and compare them quantitatively on four different tasks stock photo retrieval personal photo collection retrieval building retrieval and medical image retrieval for the experiments five different publicly available image databases are used and the retrieval performance of the features is analyzed in detail this allows for direct comparison of all features considered in this work and furthermore will allow comparison of newly proposed features to these in the future additionally the correlation of the features is analyzed which opens the way for simple and intuitive method to find an initial set of suitable features for new task the article concludes with recommendations which features perform well for what type of data interestingly the often used but very simple color histogram performs well in the comparison and thus can be recommended as simple baseline for many applications
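Since the comparison singles out the plain colour histogram as a strong baseline, here is a minimal sketch of that descriptor with histogram intersection as the similarity; the bin count is an illustrative choice.

import numpy as np

def colour_histogram(image, bins=8):
    # image: (h, w, 3) uint8 array; joint RGB histogram, L1-normalised
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def histogram_intersection(h1, h2):
    # classic similarity for colour histograms: sum of bin-wise minima
    return float(np.minimum(h1, h2).sum())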
we investigate the problem of designing scalable overlay network to support decentralized topic based pub sub communication we introduce new optimization problem called minimum topic connected overlay min tco that captures the tradeoff between the scalability of the overlay in terms of the nodes fanout and the message forwarding overhead incurred by the communicating parties roughly the min tco problem is as follows given collection of nodes and their subscriptions connect the nodes using the minimum possible number of edges so that for each topic message published on could reach all the nodes interested in by being forwarded by only the nodes interested in we show that the decision version of min tco is np complete and present polynomial algorithm that approximates the optimal solution within logarithmic factor with respect to the number of edges in the constructed overlay we further prove that this approximation ratio is almost tight by showing that no polynomial algorithm can approximate min tco within constant factor unless p = np we show experimentally that on typical inputs the fanout of the overlay constructed by our approximation algorithm is significantly lower than that of the overlays built by the existing algorithms and that its running time is just small fraction of the analytical worst case bound as min tco can be shown to capture several important aspects of most known overlay based pub sub implementations our study sheds light on the inherent limitations of the existing systems as well as provides an insight into the best possible feasible solution finally we introduce flexible framework that generalizes min tco and formalizes most similar overlay design problems that occur in scalable pub sub systems we also briefly discuss several examples of such problems and show some results with respect to their complexity
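A minimal sketch in the spirit of the greedy construction behind such logarithmic approximations: repeatedly add the edge that most reduces the total number of per-topic connected components until every topic's subscribers are connected. This illustrates the idea only; it is not the paper's algorithm or analysis and it is far from optimised.

from itertools import combinations

def components(nodes, edges):
    # number of connected components of the graph (nodes, edges) via union-find
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

def greedy_tco(subscriptions):
    # subscriptions: node -> set of topics; returns an edge set such that the
    # subscribers of every topic induce a connected subgraph
    topics = set().union(*subscriptions.values())
    nodes = list(subscriptions)
    def deficit(edges):
        total = 0
        for t in topics:
            members = [n for n in nodes if t in subscriptions[n]]
            sub_edges = [(u, v) for u, v in edges
                         if t in subscriptions[u] and t in subscriptions[v]]
            total += components(members, sub_edges)
        return total
    chosen = []
    while deficit(chosen) > len(topics):      # stop at one component per topic
        best = min(combinations(nodes, 2),
                   key=lambda e: deficit(chosen + [e]))
        chosen.append(best)
    return chosen

subs = {"a": {1, 2}, "b": {1}, "c": {2}, "d": {1, 2}}
print(greedy_tco(subs))   # three edges, e.g. [('a', 'd'), ('a', 'b'), ('a', 'c')]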
in order to organise and manage geospatial and georeferenced information on the web making them convenient for searching and browsing digital portal known as portal has been designed and implemented compared to other digital libraries portal is unique for several of its features it maintains metadata resources in xml with flexible resource schemas logical groupings of metadata resources as projects and layers are possible to allow the entire meta data collection to be partitioned differently for users with different information needs these metadata resources can be displayed in both the classification based and map based interfaces provided by portal portal further incorporates both query module and an annotation module for users to search metadata and to create additional knowledge for sharing respectively portal also includes resource classification module that categorizes resources into one or more hierarchical category trees based on user defined classification schemas this paper gives an overview of the portal design and implementation the portal features will be illustrated using collection of high school geography examination related resources
this paper presents preliminary work towards maturity model for system documentation the documentation maturity model dmm is specifically targeted towards assessing the quality of documentation used in aiding program understanding software engineers and technical writers produce such documentation during regular product development lifecycles the documentation can also be recreated after the fact via reverse engineering the dmm has both process and product components this paper focuses on the product quality aspects
vulnerability analysis is concerned with the problem of identifying weaknesses in computer systems that can be exploited to compromise their security in this paper we describe new approach to vulnerability analysis based on model checking our approach involves formal specification of desired security properties an example of such property is no ordinary user can overwrite system log files an abstract model of the system that captures its security related behaviors this model is obtained by composing models of system components such as the file system privileged processes etc verification procedure that checks whether the abstract model satisfies the security properties and if not produces execution sequences also called exploit scenarios that lead to violation of these properties an important benefit of model based approach is that it can be used to detect known and as yet unknown vulnerabilities this capability contrasts with previous approaches such as those used in cops and satan which mainly address known vulnerabilities this paper demonstrates our approach by modelling simplified version of unix based system and analyzing this system using model checking techniques to identify nontrivial vulnerabilities key contribution of this paper is to show that such an automated analysis is feasible in spite of the fact that the system models are infinite state systems our techniques exploit some of the latest techniques in model checking such as constraint based implicit representation of state space together with domain specific optimizations that are appropriate in the context of vulnerability analysis clearly realistic unix system is much more complex than the one that we have modelled in this paper nevertheless we believe that our results show automated and systematic vulnerability analysis of realistic systems to be feasible in the near future as model checking techniques continue to improve
we consider the problem of indexing set of objects moving in dimensional spaces along linear trajectories simple external memory indexing scheme is proposed to efficiently answer general range queries the following are examples of the queries that can be answered by the proposed method report all moving objects that will pass between two given points within specified time interval ii become within given distance from some or all of given set of other moving objects our scheme is based on mapping the objects to dual space where queries about moving objects are transformed into polyhedral queries concerning their speeds and initial locations we then present simple method for answering such polyhedral queries based on partitioning the space into disjoint regions and using tree to index the points in each region by appropriately selecting the boundaries of each region we guarantee an average search time that matches known lower bound for the problem specifically for fixed if the coordinates of given set of points are statistically independent the proposed technique answers polyhedral queries on the average in log i/os using space where is the block size and is the number of reported points our approach is novel in that while it provides theoretical upper bound on the average query time it avoids the use of complicated data structures making it an effective candidate for practical applications the proposed index is also dynamic in the sense that it allows object insertion and deletion in an amortized update cost of log i/os experimental results are presented to show the superiority of the proposed index over other methods based on trees
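A one-dimensional illustration of the dual-space mapping described above: an object moving as x(t) = x0 + v*t is the point (v, x0) in the dual plane, and a range query over a time interval becomes a small set of linear (polyhedral) conditions on that point. The partitioning and B-tree indexing of the paper are not reproduced; the objects and query values are made up.

def in_range_query(v, x0, xl, xh, t1, t2):
    # object position x(t) = x0 + v * t; report it if it lies inside [xl, xh]
    # at some instant of [t1, t2]; since the trajectory is linear in t this
    # is a test of a few linear inequalities on the dual point (v, x0)
    a, b = x0 + v * t1, x0 + v * t2          # positions at the interval ends
    lo, hi = min(a, b), max(a, b)
    return not (hi < xl or lo > xh)

objects = {"o1": (2.0, 0.0), "o2": (-1.0, 10.0), "o3": (0.0, 100.0)}
hits = [name for name, (v, x0) in objects.items()
        if in_range_query(v, x0, xl=3.0, xh=6.0, t1=1.0, t2=2.0)]
print(hits)   # -> ['o1']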
massive scale self administered networks like peer to peer and sensor networks have data distributed across thousands of participant hosts these networks are highly dynamic with short lived hosts being the norm rather than an exception in recent years researchers have investigated best effort algorithms to efficiently process aggregate queries eg sum count average minimum and maximum on these networks unfortunately query semantics for best effort algorithms are ill defined making it hard to reason about guarantees associated with the result returned in this paper we specify correctness condition single site validity with respect to which the above algorithms are best effort we present class of algorithms that guarantee validity in dynamic networks experiments on real life and synthetic network topologies validate performance of our algorithms revealing the hitherto unknown price of validity
behaviour analysis should form an integral part of the software development process this is particularly important in the design of concurrent and distributed systems where complex interactions can cause unexpected and undesired system behaviour we advocate the use of compositional approach to analysis the software architecture of distributed program is represented by hierarchical composition of subsystems with interacting processes at the leaves of the hierarchy compositional reachability analysis cra exploits the compositional hierarchy for incrementally constructing the overall behaviour of the system from that of its subsystems in the tracta cra approach both processes and properties reflecting system specifications are modelled as state machines property state machines are composed into the system and violations are detected on the global reachability graph obtained the property checking mechanism has been specifically designed to deal with compositional techniques tracta is supported by an automated tool compatible with our environment for the development of distributed applications
automatic circuit placement has received renewed interest recently given the rapid increase of circuit complexity increase of interconnect delay and potential sub optimality of existing placement algorithms in this paper we present generalized force directed algorithm embedded in mpl multilevel framework our new algorithm named mpl produces the shortest wirelength among all published placers with very competitive runtime on the ibm circuits used in the new contributions and enhancements are we develop new analytical placement algorithm using density constrained minimization formulation which can be viewed as generalization of the force directed method in we analyze and identify the advantages of our new algorithm over the force directed method we successfully incorporate the generalized force directed algorithm into multilevel framework which significantly improves wirelength and speed compared to capo our algorithm mpl produces shorter wirelength and is faster compared to dragon mpl has shorter wirelength and is faster compared to fengshui it has shorter wirelength and is faster compared to the ultra fast placement algorithm fastplace mpl produces shorter wirelength but is slower fast mode of mpl mpl fast can produce shorter wirelength than fast place and is only slower moreover mpl fast has demonstrated better scalability than fastplace
ontoweaver is our conceptual modelling methodology and tool that support the specification and implementation of customized web applications it relies on number of different types of ontologies to declaratively describe all aspects of web application this paper focuses on the ontoweaver customization framework which exploits user model customization rule model and declarative site model to enable the design and development of customized web applications at conceptual level ontoweaver makes use of the jess inference engine to reason upon the site specifications and their underlying site ontologies according to the customization rules and the valuable user profiles to provide customization support in an intelligent way the ontology based approach enables the target web applications to be represented in an exchangeable format hence the management and maintenance of web applications can be carried out at conceptual level without having to worry about the implementation details likewise the declarative nature of the site specifications and the generic customization framework allow the specification of customization requirements to be carried out at the conceptual level
it is well known that multiprocessor systems are vastly more difficult to program than systems that support sequential programming models in paper this author argued that six important principles for supporting modular software construction are often violated by the architectures proposed for multiprocessor computer systems the fresh breeze project concerns the architecture and design of multiprocessor chip that can achieve superior performance while honoring these six principlesthe envisioned multiprocessor chip will incorporate three ideas that are significant departures from mainstream thinking about multiprocessor architecture simultaneous multithreading has been shown to have performance advantages relative to contemporary superscalar designs this advantage can be exploited through use of programming model that exposes parallelism in the form of multiple threads of computation the value of shared address space is widely appreciated through the use of bit pointers the conventional distinction between memory and the file system can be abolished this can provide superior execution environment in support of program modularity and software reuse as well as supporting multi user data protection and security that is consistent with modular software structure no memory update cycle free heap data items are created used and released but never modified once created the allocation release and garbage collection of fixed size chunks of memory will be implemented by efficient hardware mechanisms major benefit of this choice is that the multiprocessor cache coherence problem vanishes any object retrieved from the memory system is immutable in addition it is easy to prevent the formation of pointer cycles simplifying the design of memory management support
existing solutions for fault tolerant routing in interconnection networks either work for only one given regular topology or require slow and costly network reconfigurations that do not allow full and continuous network access in this paper we present froots routing method for fault tolerance in topology flexible network technologies our method is based on redundant paths and can handle single dynamic faults without sending control messages other than those that are needed to inform the source nodes of the failing component used in modus with local rerouting the source nodes need not be informed and no control messages are necessary for the network to stay connected despite single fault in fault free networks under nonuniform traffic our routing method performs comparable to or even better than topology specific routing algorithms in regular networks like meshes and tori froots does not require any other features in the switches or end nodes than flexible routing table and modest number of virtual channels for that reason it can be directly applied to several present day technologies like infiniband and advanced switching
drastically increasing involvement of computer science in different aspects of human life and sciences and the reciprocal dependency these sciences have developed on computer science and technology have deployed extremely challenging grounds for software architecture and design as discipline the requirements related to the management of complexity which is in the essence of these new domains combined with sizes which are orders of magnitude larger than the conventional business applications necessitate development of new paradigms since we are in fact beyond the age of writing one program by one group which takes care of one type of issue for one class of users the new paradigms should guarantee some type of technical pluralism which allows indefinite number of people addressing indefinite aspects of complex clusters of issues in an ongoing effort over indefinite amount of time the technical basis should provide for easy and ideally automated integration of all such efforts this means the capability of random program design and integration based on supporting and unifying conceptual framework in contributing to these principles nuclear process oriented analysis and modeling npoam presented the capability of random modeling and design while establishing itself on the supporting framework of abstraction oriented frames in an ongoing research to further modularize and streamline this methodology this paper presents the idea and method of application of npoam to agent based systems the accomplishment of this goal is substantiated through double implementation effort one in an agent simulation environment and the other through the use of industrial strength modeling and application development tools this paper also extends the agent oriented framework to propose new concept named quasi agents which is essentially related to mostly deterministic environments and offers examples of quasi agents in implementation
many key predistribution techniques have been developed recently to establish pairwise keys between sensor nodes in wireless sensor networks to further improve these schemes researchers have also proposed to take advantage of the sensors expected locations and discovered locations to help the predistribution of the keying materials however in many cases it is very difficult to deploy sensor nodes at their expected locations or guarantee the correct location discovery at sensor nodes in hostile environments in this article group based deployment model is developed to improve key predistribution in this model sensor nodes are only required to be deployed in groups the critical observation in the article is that the sensor nodes in the same group are usually close to each other after deployment this deployment model is practical it greatly simplifies the deployment of sensor nodes while still providing an opportunity to improve key predistribution specifically the article presents novel framework for improving key predistribution using the group based deployment knowledge this framework does not require the knowledge of the sensors expected or discovered locations and is thus suitable for applications where it is difficult to deploy the sensor nodes at their expected locations or correctly estimate the sensors locations after deployment to seek practical key predistribution schemes the article presents two efficient instantiations of this framework hash key based scheme and polynomial based scheme the evaluation shows that these two schemes are efficient and effective for pairwise key establishment in sensor networks they can achieve much better performance than the previous key predistribution schemes when the sensor nodes are deployed in groups
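A minimal sketch of the polynomial-based building block (in the style of Blundo et al.) that underlies the polynomial-based instantiation: a symmetric bivariate polynomial modulo a prime is predistributed as per-node shares, and two nodes derive the same pairwise key by evaluating their shares at each other's id. The group-based distribution of such polynomials and the hash-key-based scheme are not reproduced; the prime, threshold and node ids are illustrative.

import random

P = 2_147_483_647          # a prime; keys are computed modulo P
T = 3                      # polynomial degree (collusion threshold)

def symmetric_poly(seed=0):
    # random symmetric coefficient matrix c[i][j] = c[j][i]
    rnd = random.Random(seed)
    c = [[0] * (T + 1) for _ in range(T + 1)]
    for i in range(T + 1):
        for j in range(i, T + 1):
            c[i][j] = c[j][i] = rnd.randrange(P)
    return c

def share(c, node_id):
    # the share stored on a node is g(y) = f(node_id, y): T + 1 coefficients
    return [sum(c[i][j] * pow(node_id, i, P) for i in range(T + 1)) % P
            for j in range(T + 1)]

def pairwise_key(my_share, other_id):
    return sum(my_share[j] * pow(other_id, j, P)
               for j in range(len(my_share))) % P

c = symmetric_poly()
ka = pairwise_key(share(c, 17), 42)
kb = pairwise_key(share(c, 42), 17)
assert ka == kb            # both nodes derive the same key f(17, 42)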
a locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects such that for two objects x and y, Pr_{h in F}[h(x) = h(y)] = sim(x, y), where sim(x, y) is some similarity function defined on the collection of objects such a scheme leads to compact representation of objects so that similarity of objects can be estimated from their compact sketches and also leads to efficient algorithms for approximate nearest neighbor search and clustering min wise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure sim(A, B) = |A ∩ B| / |A ∪ B| we show that rounding algorithms for lps and sdps used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects based on this insight we construct new locality sensitive hashing schemes for (1) a collection of vectors with the distance between u and v measured by θ(u, v) / π where θ(u, v) is the angle between u and v this yields a sketching scheme for estimating the cosine similarity measure between two vectors as well as a simple alternative to minwise independent permutations for estimating set similarity (2) a collection of distributions on n points in a metric space with distance between distributions measured by the earth mover distance emd a popular distance measure in graphics and vision our hash functions map distributions to points in the metric space such that for distributions P and Q, EMD(P, Q) ≤ E_{h in F}[d(h(P), h(Q))] ≤ O(log n log log n) · EMD(P, Q)
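Minimal sketches of the two hash families referred to above: salted min-hashing as a stand-in for min-wise independent permutations (signature agreement estimates the set similarity |A ∩ B| / |A ∪ B|) and random-hyperplane hashing for vectors (bit agreement estimates 1 − θ(u, v) / π). The earth mover distance construction is not reproduced; signature sizes and seeds are illustrative.

import random
import numpy as np

def minhash_signature(s, n_hashes=64, seed=0):
    # Pr[min-hash values agree] ≈ |A ∩ B| / |A ∪ B|; Python's hash() with a
    # salt is an illustrative stand-in for min-wise independent permutations
    # (note it is not stable across interpreter runs)
    rnd = random.Random(seed)
    salts = [rnd.random() for _ in range(n_hashes)]
    return [min(hash((salt, x)) for x in s) for salt in salts]

def simhash_signature(v, n_bits=64, seed=0):
    # Pr[bits agree] = 1 - θ(u, v) / π  (random-hyperplane rounding)
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, len(v)))
    return (planes @ np.asarray(v) >= 0).astype(int)

def agreement(sig_a, sig_b):
    # fraction of matching signature entries, i.e. the similarity estimate
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)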
personal information management pim is an activity in which an individual stores personal information items to retrieve them later in former article we suggested the user subjective approach theoretical approach proposing design principles with which pim systems can systematically use subjective attributes of information items in this consecutive article we report on study that tested the approach by exploring the use of subjective attributes ie project importance and context in current pim systems and its dependence on design characteristics participants were personal computer users tools included questionnaire semistructured interview that was transcribed and analyzed and screen captures taken from this subsample results indicate that participants tended to use subjective attributes when the design encouraged them to however when the design discouraged such use they either found their own alternative ways to use them or refrained from using them altogether this constitutes evidence in support of the user subjective approach as it implies that current pim systems do not allow for sufficient use of subjective attributes the article also introduces seven novel system design schemes suggested by the authors which demonstrate how the user subjective principles can be implemented
several process metamodels exist each of them presents different viewpoint of the same information systems engineering process however there are no existing correspondences between them we propose method to build unified fitted and multi viewpoint process metamodels for information systems engineering our method is based on process domain metamodel that contains the main concepts of information systems engineering process field this process domain metamodel helps selecting the needed metamodel concepts for particular situational context our method is also based on patterns to refine the process metamodel the process metamodel can then be instantiated according to the organisation’s needs the resulting method is represented as pattern system
one of the simplest and yet most consistently well performing set of classifiers is the naive bayes models special class of bayesian network models however these models rely on the naive assumption that all the attributes used to describe an instance are conditionally independent given the class of that instance to relax this independence assumption we have in previous work proposed family of models called latent classification models lcms lcms are defined for continuous domains and generalize the naive bayes model by using latent variables to model class conditional dependencies between the attributes in addition to providing good classification accuracy the lcm has several appealing properties including relatively small parameter space making it less susceptible to over fitting in this paper we take first step towards generalizing lcms to hybrid domains by proposing an lcm for domains with binary attributes we present algorithms for learning the proposed model and we describe variational approximation based inference procedure finally we empirically compare the accuracy of the proposed model to the accuracy of other classifiers for number of different domains including the problem of recognizing symbols in black and white images
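For reference, the naive Bayes baseline for binary attributes that the latent classification model generalises can be sketched as a Bernoulli model with Laplace smoothing, as below; the latent-variable extension and the variational inference procedure of the paper are not reproduced.

import numpy as np

def train_bernoulli_nb(X, y, alpha=1.0):
    # X: (n, d) 0/1 matrix, y: (n,) class labels; Laplace-smoothed estimates
    classes = np.unique(y)
    priors = {c: (y == c).mean() for c in classes}
    theta = {c: (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
             for c in classes}
    return classes, priors, theta

def predict(x, classes, priors, theta):
    # pick the class with the highest log posterior under the independence
    # assumption the latent classification models are designed to relax
    def log_post(c):
        p = theta[c]
        return (np.log(priors[c])
                + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))
    return max(classes, key=log_post)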
this paper analyzes the relationships between derived types and taxonomic constraints the objectives are to see which taxonomic constraints are entailed by derivation rules and to analyze how taxonomic constraints can be satisfied in presence of derived types we classify derived entity types into several classes the classification reveals the taxonomic constraints entailed in each case these constraints must be base constraints defined in the taxonomy or derivable from them we show how the base taxonomic constraints can be satisfied either by the derivation rules or the whole schema or by enforcement we also show that our results extend naturally to taxonomies of relationship types our results are general and could be incorporated into many conceptual modeling environments and tools the expected benefits are an improvement in the verification of the consistency between taxonomic constraints and derivation rules and guide to the information system designer for the determination of the taxonomic constraints that must be enforced in the final system
static power consumption has become significant factor of the total power consumption in system circuit level switching techniques reduce static power consumption by exploiting idle periods in processor components and placing them into low power modes or turning them off completely in this paper we propose modified automaton based list scheduling technique that augments the circuit level techniques by reducing the number of transitions between power modes thereby increasing the length of idle periods in resource units our scheduler uses global resource usage vector and the usage vector of the last issued instruction to select instruction from the ready list such that resource units common to the last issued instruction and the selected instruction are continuously active without creating transition into low power mode we estimate the power consumed in resource units using an energy model parameterized by the number of idle cycles active cycles and transitions we have implemented our algorithm in gcc and our simulations for different classes of benchmarks using an arm simulator with single and multi issue indicate an average energy savings of in resource units
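A toy sketch of the selection step described above (not the gcc implementation): each ready instruction carries a set of resource units it occupies, and the scheduler greedily prefers the instruction whose units overlap most with those of the last issued instruction, so shared units stay active instead of toggling power modes. The unit names and the simple greedy criterion are illustrative assumptions.

```python
def select_next(ready_list, last_usage):
    """Greedy selection step: prefer the ready instruction whose resource
    usage overlaps most with the last issued instruction, so shared units
    stay continuously active instead of toggling power modes."""
    def overlap(instr):
        return len(instr["units"] & last_usage)
    return max(ready_list, key=overlap)

ready = [
    {"op": "add", "units": {"alu"}},
    {"op": "mul", "units": {"mult"}},
    {"op": "load", "units": {"lsu", "alu"}},
]
last = {"alu"}          # units used by the previously issued instruction
print(select_next(ready, last)["op"])   # picks an instruction that reuses the ALU
```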
sensor nodes in distributed sensor network can fail due to variety of reasons eg harsh environmental conditions sabotage battery failure and component wear out since many wireless sensor networks are intended to operate in an unattended manner after deployment failing nodes cannot be replaced or repaired during field operation therefore by designing the network to be fault tolerant we can ensure that wireless sensor network can perform its surveillance and tracking tasks even when some nodes in the network fail in this paper we describe fault tolerant self organization scheme that designates set of backup nodes to replace failed nodes and maintain backbone for coverage and communication the proposed scheme does not require centralized server for monitoring node failures and for designating backup nodes to replace failed nodes it operates in fully distributed manner and it requires only localized communication this scheme has been implemented on top of an energy efficient self organization technique for sensor networks the proposed fault tolerance node selection procedure can tolerate large number of node failures using only localized communication without losing either sensing coverage or communication connectivity
image retrieval is an active research area in image processing pattern recognition and computer vision relevance feedback has been widely accepted in the field of content based image retrieval cbir as method to boost the retrieval performance recently many researchers have employed support vector machines svms for relevance feedback this paper presents fuzzy support vector machine fsvm that is more robust to the four major problems encountered by the conventional svms small size of samples biased hyperplane over fitting and real time to improve the performance dominant color descriptor dcd is also proposed experimental results based on set of corel images demonstrate that the proposed system performs much better than the previous methods it achieves high accuracy and reduces the processing time greatly
the prism model of engineering processes and an architecture which captures this model in its various components are described the architecture has been designed to hold product software process description the life cycle of which is supported by an explicit representation of higher level or meta process description the central part of this paper describes the nine step prism methodology for building and tailoring process models and gives several scenarios to support this description in prism process models are built using hybrid process modeling language that is based on high level petri net formalism and rules an important observation is that this environment should be seen as an infrastructure for carrying out the more difficult task of creating sound process models
multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks however applying this mechanism to switch based parallel systems is non trivial in this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch based parallel systems first we discuss issues related to such implementation deadlock freedom replication mechanisms header encoding and routing next we demonstrate how an existing central buffer based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing similarly implementing multidestination worms on an input buffer based switch architecture is discussed both of these implementations are evaluated against each other as well as against software based scheme using the central buffer organization simulation experiments under range of traffic multiple multicast bimodal varying degree of multicast and message length and system size are used for evaluation the study demonstrates the superiority of the central buffer based switch architecture it also indicates that under bimodal traffic the central buffer based hardware multicast implementation affects background unicast traffic less adversely compared to software based multicast implementation thus multidestination message passing can easily be applied to switch based parallel systems to deliver good collective communication performance
denial of service dos attacks continue to affect the availability of critical systems on the internet the existing dos problem is enough to merit significant research dedicated to analyzing and classifying dos attacks in the internet context however no such research exists for dos attacks in the domain of content based publish subscribe cps systems despite cps being at the forefront of business process execution application integration and event processing applications this can be attributed to the lack of structure and understanding of key issues in the area of dos in cps systems in this paper we propose to address these problems by presenting taxonomy for classifying dos characteristics and concerns new to cps systems our taxonomy is motivated by number of experimental results that were obtained using our cps middleware implementation and that highlight fundamental dos concerns in this domain finally we discuss some example dos attacks in detail with respect to our taxonomy and experimental results we find that localization message content complexity and filter statefulness are the key cps characteristics to consider when designing dos resilient cps systems
existing pseudo relevance feedback methods typically perform averaging over the top retrieved documents but ignore an important statistical dimension the risk or variance associated with either the individual document models or their combination treating the baseline feedback method as black box and the output feedback model as random variable we estimate posterior distribution for the feedback model by resampling given query’s top retrieved documents using the posterior mean or mode as the enhanced feedback model we then perform model combination over several enhanced models each based on slightly modified query sampled from the original query we find that resampling documents helps increase individual feedback model precision by removing noise terms while sampling from the query improves robustness worst case performance by emphasizing terms related to multiple query aspects the result is meta feedback algorithm that is both more robust and more precise than the original strong baseline method
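A schematic sketch of the document-resampling idea, assuming a bag-of-words representation and a plain frequency-based baseline feedback model standing in for the black-box method; it shows only the bootstrap-and-average step, not the paper's exact estimator or the query-sampling part.

```python
import random
from collections import Counter

def feedback_model(docs):
    """Baseline feedback treated as a black box: here, just normalized
    term frequencies pooled over the given documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

def resampled_feedback(top_docs, num_samples=20, seed=0):
    """Bootstrap-resample the top documents, fit the baseline model on each
    resample, and average the resulting models (a posterior-mean estimate)."""
    rng = random.Random(seed)
    models = []
    for _ in range(num_samples):
        sample = [rng.choice(top_docs) for _ in top_docs]
        models.append(feedback_model(sample))
    vocab = {term for model in models for term in model}
    return {term: sum(m.get(term, 0.0) for m in models) / num_samples
            for term in vocab}

# usage on three tiny "documents" already tokenized into term lists
docs = [["cheap", "flights", "rome"], ["rome", "hotels"], ["flights", "to", "rome"]]
print(sorted(resampled_feedback(docs).items(), key=lambda kv: -kv[1])[:3])
```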
routing table storage demands pose significant obstacle for large scale network simulation on demand computation of routes can alleviate those problems for models that do not require representation of routing dynamics however policy based routes as used at the interdomain level of the internet through the bgp protocol are significantly more difficult to compute on demand than shortest path intradomain routes due to the semantics of policy based routing and the possibility of routing divergence we exploit recent theoretical results on bgp routing convergence and measurement results on typical use of bgp routing policies to formulate model of typical use and an algorithm for on demand computation of routes that is guaranteed to terminate and produces the same routes as bgp we show empirically that this scheme can reduce memory usage by orders of magnitude and simultaneously reduce the route computation time compared to detailed model of the bgp protocol
uml sequence diagrams are commonly used to represent the interactions among collaborating objects reverse engineered sequence diagrams are constructed from existing code and have variety of uses in software development maintenance and testing in static analysis for such reverse engineering an open question is how to represent the intraprocedural flow of control from the code using the control flow primitives of uml we propose simple uml extensions that are necessary to capture general flow of control the paper describes an algorithm for mapping reducible exception free intraprocedural control flow graph to uml using the proposed extensions we also investigate the inherent tradeoffs of different problem solutions and discuss their implications for reverse engineering tools this work is substantial step towards providing high quality tool support for effective and efficient reverse engineering of uml sequence diagrams
application migration is key enabling technology component of mobile computing that allows rich semantics involving location awareness trust and timeliness of information processing by moving the application where the data is seamlessness is one of the key properties of mobile computing and downtime must be eliminated minimized during the migration to achieve seamlessness but migration involves large overheads dominant of which are the overheads due to serialization and de serialization to achieve seamless migration an application state could be pre serialized during the program’s execution and upon migration the serialized data could be transmitted and de serialized to get the execution started previous approach to this problem removed dead state but still suffered from large migration overheads due to serialization on demand that could lead to an unacceptable downtime in this work we develop static compiler analysis plus runtime assisted framework to decrease the migration overhead to almost zero while minimizing the degradation in the program’s performance we achieve such goal by deciding which data to be pre serialized through analysis and pre serializing the state in the program safe state is kept that would allow immediate migration upon the arrival of an interrupt while minimizing frequent pre serialization when the migration interrupt comes in the serialized data can be transmitted directly to the destination machine this allows an application to resume its execution at the destination machine with almost no interruption only small amount of non serialized data needs to be serialized during migration the optimization serializes the data in such way that maximal number of functions can execute without interruption after migration our experiments with multimedia applications show that the migration latency is significantly reduced leading to small downtime thus the contribution of the paper is to provide an efficient methodology to perform seamless migration while limiting the overhead
the distributed computing column covers the theory of systems that are composed of number of interacting computing elements these include problems of communication and networking databases distributed shared memory multiprocessor architectures operating systems verification internet and the web this issue consists of the paper incentives and internet computation by joan feigenbaum and scott shenker many thanks to them for contributing to this issue
augmented tabletops have recently attracted considerable attention in the literature however little has been known about the effects that these interfaces have on learning tasks in this paper we report on the results of an empirical study that explores the usage of tabletop systems in an expressive collaborative learning task in particular we focus on measuring the difference in learning outcomes at individual and group levels between students using two interfaces traditional computer and augmented tabletop with tangible input no significant effects of the interface on individual learning gain were found however groups using traditional computer learned significantly more from their partners than those using tabletop interface further analysis showed an interaction effect of the condition and the group heterogeneity on learning outcomes we also present our qualitative findings in terms of how group interactions and strategy differ in the two conditions
query rewrite qrw optimizations apply algebraic transformations to an input sql query producing a rewritten sql query the two queries are semantically equivalent ie they produce the same result but the execution of the rewritten query is generally faster than that of the original folding views derived tables applying transitive closure on predicates and converting outer joins to inner joins are some examples of qrw optimizations in this paper we carefully analyze the interactions among number of rewrite rules and show how this knowledge is used to devise triggering mechanism in the new teradata extensible qrw subsystem thereby enabling efficient application of the rewrite rules we also present results from experimental studies that show that as compared to conventional recognize act cycle strategy exploiting these interactions yields significant reduction in the time and space cost of query optimization while producing the same re written queries
this article presents new technology called interactive query management iqm designed for supporting flexible query management in decision support systems and recommender systems iqm aims at guiding user to refine query to structured repository of items when it fails to return manageable set of products two failure conditions are considered here when query returns either too many products or no product at all in the former case iqm uses feature selection methods to suggest some features that if used to further constrain the current query would greatly reduce the result set size in the latter case the culprits of the failure are determined by relaxation algorithm and explained to the user enumerating the constraints that if relaxed would solve the no results problem as consequence the user can understand the causes of the failure and decide what is the best query relaxation after having presented iqm we illustrate its empirical evaluation we have conducted two types of experiments with real users and offline simulations both validation procedures show that iqm can repair large percentage of user queries and keep alive the human computer interaction until the user information goals are satisfied
text classification techniques mostly rely on single term analysis of the document data set while more concepts especially the specific ones are usually conveyed by set of terms to achieve more accurate text classifier more informative features including frequent co occurring words in the same sentence and their weights are particularly important in such scenarios in this paper we propose novel approach using sentential frequent itemsets a concept that comes from association rule mining for text classification which views sentence rather than document as transaction and uses variable precision rough set based method to evaluate each sentential frequent itemset’s contribution to the classification experiments over the reuters and newsgroup corpus are carried out which validate the practicability of the proposed system
we propose practical public key encryption scheme whose security against chosen ciphertext attacks can be reduced in the standard model to the assumption that factoring is intractable
in this paper we study the following problem we are given certain region to monitor and requirement on the degree of coverage doc of to meet by network of deployed sensors the latter will be dropped by moving vehicle which can release sensors at arbitrary points within the node spatial distribution when sensors are dropped at certain point is modeled by probability density function the network designer is allowed to choose an arbitrary set of drop points and to release an arbitrary number of sensors at each point given this setting we consider the problem of determining the optimal grid deployment strategy ie the drop strategy in which release points are arranged in grid such that the doc requirement is fulfilled and the total number of deployed nodes is minimum this problem is relevant whenever manual node deployment is impossible or overly expensive and partially controlled deployment is the only feasible choice the main contribution of this paper is an accurate study of the inter relationships between environmental conditions doc requirement and cost of the deployment in particular we show that for given value of sigma and doc requirement optimal grid deployment strategies can be easily identified
we view the problem of estimating the defect content of document after an inspection as machine learning problem the goal is to learn from empirical data the relationship between certain observable features of an inspection such as the total number of different defects detected and the number of defects actually contained in the document we show that some features can carry significant nonlinear information about the defect content therefore we use nonlinear regression technique neural networks to solve the learning problem to select the best among all neural networks trained on given data set one usually reserves part of the data set for later cross validation in contrast we use technique which leaves the full data set for training this is an advantage when the data set is small we validate our approach on known empirical inspection data set for that benchmark our novel approach clearly outperforms both linear regression and the current standard methods in software engineering for estimating the defect content such as capture recapture the validation also shows that our machine learning approach can be successful even when the empirical inspection data set is small
we study the classic mathematical economics problem of bayesian optimal mechanism design where principal aims to optimize expected revenue when allocating resources to self interested agents with preferences drawn from known distribution in single parameter settings ie where each agent’s preference is given by single private value for being served and zero for not being served this problem is solved unfortunately these single parameter optimal mechanisms are impractical and rarely employed and furthermore the underlying economic theory fails to generalize to the important relevant and unsolved multi dimensional setting ie where each agent’s preference is given by multiple values for each of the multiple services available in contrast to the theory of optimal mechanisms we develop theory of sequential posted price mechanisms where agents in sequence are offered take it or leave it prices we prove that these mechanisms are approximately optimal in single dimensional settings these posted price mechanisms avoid many of the properties of optimal mechanisms that make the latter impractical furthermore these mechanisms generalize naturally to multi dimensional settings where they give the first known approximations to the elusive optimal multi dimensional mechanism design problem in particular we solve multi dimensional multi unit auction problems and generalizations to matroid feasibility constraints the constant approximations we obtain range from to for all but one case our posted price sequences can be computed in polynomial time this work can be viewed as an extension and improvement of the single agent algorithmic pricing work of to the setting of multiple agents where the designer has combinatorial feasibility constraints on which agents can simultaneously obtain each service
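A minimal multi-unit sketch of a sequential posted-price mechanism as described above; the uniform value distribution, the single posted price of 0.5, and the supply of three identical units are assumptions for illustration, and choosing near-optimal price sequences from the value distributions is the subject of the paper itself.

```python
import random

def sequential_posted_prices(values, prices, units):
    """Offer agents, in sequence, a take-it-or-leave-it price; an agent
    buys iff its private value is at least its offered price and supply
    remains. Returns total revenue and the prices paid."""
    revenue, paid = 0.0, []
    for value, price in zip(values, prices):
        if units == 0:
            break
        if value >= price:
            revenue += price
            paid.append(price)
            units -= 1
    return revenue, paid

# usage: 10 agents with values drawn uniformly from [0, 1], a supply of 3
# identical units, and a single posted price of 0.5 offered to everyone
random.seed(0)
values = [random.random() for _ in range(10)]
print(sequential_posted_prices(values, prices=[0.5] * 10, units=3))
```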
commerce and intranet search systems require newly arriving content to be indexed and made available for search within minutes or hours of arrival applications such as file system and email search demand even faster turnaround from search systems requiring new content to become available for search almost instantaneously however incrementally updating inverted indices which are the predominant datastructure used in search engines is an expensive operation that most systems avoid performing at high rates we present jiti just in time indexing component that allows searching over incoming content nearly as soon as that content reaches the system jiti’s main idea is to invest less in the preprocessing of arriving data at the expense of tolerable latency in query response time it is designed for deployment in search systems that maintain large main index and that rebuild smaller stop press indices once or twice an hour jiti augments such systems with instant retrieval capabilities over content arriving in between the stop press builds main design point is for jiti to demand few computational resources in particular ram and our experiments consisted of injecting several documents and queries per second concurrently into the system over half hour long periods we believe that there are search applications for which the combination of the workloads we experimented with and the response times we measured present viable solution to pressing problem
recently the relationship between abstract interpretation and program specialization has received lot of scrutiny and the need has been identified to extend program specialization techniques so as to make use of more refined abstract domains and operators this article clarifies this relationship in the context of logic programming by expressing program specialization in terms of abstract interpretation based on this novel specialization framework along with generic correctness results for computed answers and finite failure under sld resolution is developed this framework can be used to extend existing logic program specialization methods such as partial deduction and conjunctive partial deduction to make use of more refined abstract domains it is also shown how this opens up the way for new optimizations finally as shown in the paper the framework also enables one to prove correctness of new or existing specialization techniques in simpler manner the framework has already been applied in the literature to develop and prove correct specialization algorithms using regular types which in turn have been applied to the verification of infinite state process algebras
this paper presents new technique for realtime rendering refraction and caustics effects the algorithm can directly render complex objects represented by polygonal meshes without any precalculation and allows the objects to be deformed dynamically through user interactions also caustic patterns are rendered in the depth texture space we accurately trace the photons path and calculate the energy carried by the photons as result the caustic patterns are calculated without post processing and temporal filtering over neighboring frames our technique can handle both the convex objects and concave objects for the convex objects the ray convex surface intersection is calculated by using binary search algorithm for the concave objects the ray concave surface intersection is done by linear search followed by binary search refinement step the caustics can be also rendered for non uniform deformation of both refractive object and receiver surface allowing the interactive change of light and camera in terms of position and direction
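The binary-search step for convex objects lends itself to a small sketch. In the paper the inside/outside information comes from the rendered geometry on the GPU; here a hypothetical analytic membership test for a unit sphere stands in for it, and the iteration count is an arbitrary choice.

```python
def ray_surface_intersection(origin, direction, inside, t_in, t_out, iters=32):
    """Binary search along the ray origin + t * direction for the surface
    crossing, assuming the point at t_in is inside the (convex) object and
    the point at t_out is outside; `inside` is a point-membership test."""
    point_at = lambda t: tuple(o + t * d for o, d in zip(origin, direction))
    for _ in range(iters):
        mid = 0.5 * (t_in + t_out)
        if inside(point_at(mid)):
            t_in = mid
        else:
            t_out = mid
    return point_at(0.5 * (t_in + t_out))

# usage: intersect a ray from the origin along +x with a unit sphere
inside_sphere = lambda p: sum(c * c for c in p) <= 1.0
print(ray_surface_intersection((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                               inside_sphere, t_in=0.0, t_out=2.0))
```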
the growing availability of information on the web has raised challenging problem can web based information system tailor itself to different user requirements with the ultimate goal of personalizing and improving the users experience in accessing the contents of website this paper proposes new approach to website personalization based on the exploitation of user browsing interests together with content and usage similarities among web pages the outcome is the delivery of page recommendations which are strictly related to the navigational purposes of visitors and their actual location within the cyberspace of the website our approach has been used effectively for developing non invasive system which allows web users to navigate through potentially interesting pages without having basic knowledge of the website structure
we develop detailed area and energy models for on chip interconnection networks and describe tradeoffs in the design of efficient networks for tiled chip multiprocessors using these detailed models we investigate how aspects of the network architecture including topology channel width routing strategy and buffer size affect performance and impact area and energy efficiency we simulate the performance of variety of on chip networks designed for tiled chip multiprocessors implemented in an advanced vlsi process and compare area and energy efficiencies estimated from our models we demonstrate that the introduction of second parallel network can increase performance while improving efficiency and evaluate different strategies for distributing traffic over the subnetworks drawing on insights from our analysis we present concentrated mesh topology with replicated subnetworks and express channels which provides improvement in area efficiency and improvement in energy efficiency over other networks evaluated in this study
traditionally direct marketing companies have relied on pre testing to select the best offers to send to their audience companies systematically dispatch the offers under consideration to limited sample of potential buyers rank them with respect to their performance and based on this ranking decide which offers to send to the wider population though this pre testing process is simple and widely used recently the industry has been under increased pressure to further optimize learning in particular when facing severe time and learning space constraints the main contribution of the present work is to demonstrate that direct marketing firms can exploit the information on visual content to optimize the learning phase this paper proposes two phase learning strategy based on cascade of regression methods that takes advantage of the visual and text features to improve and accelerate the learning process experiments in the domain of commercial multimedia messaging service mms show the effectiveness of the proposed methods and significant improvement over traditional learning techniques the proposed approach can be used in any multimedia direct marketing domain in which offers comprise both visual and text component
most fault tolerant schemes for wireless sensor networks focus on power failures or crash faults little attention has been paid to the data inconsistency failures which occur when the binary contents of data packet are changed during processing in this event faulty node may produce incorrect data and transmit them to other sensors hence erroneous results are propagated throughout the entire network and the sink may make inappropriate decisions as result accordingly this study proposes mechanism which can both tolerate and locate data inconsistency failures in sensor networks node disjoint paths and an automatic diagnosis scheme are utilized to identify the faulty sensor nodes the proposed mechanism was implemented with the ns simulator the evaluation results demonstrate the ability of the mechanism to identify faulty nodes efficiently and with limited overheads using the scheme more than of the inconsistent data packets were successfully detected by the sink
content based image retrieval with relevant feedback has been widely adopted as the query model of choice for improved effectiveness in image retrieval the effectiveness of this solution however depends on the efficiency of the feedback mechanism current methods rely on searching the database stored on disks in each round of relevance feedback this strategy incurs long delay making relevance feedback less friendly to the user especially for very large databases thus scalability is limitation of existing solutions in this paper we propose an in memory relevance feedback technique to substantially reduce the delay associated with feedback processing and therefore improve system usability our new data independent dimensionality reduction technique is used to compress the metadata to build small in memory database to support relevance feedback operations with minimal disk accesses we compare the performance of this approach with conventional relevance feedback techniques in terms of computation efficiency and retrieval accuracy the results indicate that the new technique substantially reduces response time for user feedback while maintaining the quality of the retrieval
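The abstract does not spell out the data-independent reduction, so the sketch below uses a random Gaussian projection, one standard data-independent technique, purely to illustrate compressing feature vectors into a small in-memory table; the dimensions and scaling are assumptions.

```python
import numpy as np

def random_projection(features, target_dim, seed=0):
    """Data-independent dimensionality reduction: multiply by a fixed random
    Gaussian matrix (scaled so that Euclidean distances are approximately
    preserved), producing compact vectors that fit in main memory."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    proj = rng.standard_normal((d, target_dim)) / np.sqrt(target_dim)
    return features @ proj

# usage: compress 10,000 image feature vectors from 512 to 32 dimensions
features = np.random.default_rng(1).standard_normal((10_000, 512))
compact = random_projection(features, target_dim=32)
print(compact.shape)   # (10000, 32)
```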
this article presents practical solution for the cyclic debugging of nondeterministic parallel programs the solution consists of combination of record replay with automatic on the fly data race detection this combination enables us to limit the record phase to the more efficient recording of the synchronization operations while deferring the time consuming data race detection to the replay phase as the record phase is highly efficient there is no need to switch it off hereby eliminating the possibility of heisenbugs because tracing can be left on all the time this article describes an implementation of the tools needed to support recplay
this paper studies the problem of identifying comparative sentences in text documents the problem is related to but quite different from sentiment opinion sentence identification or classification sentiment classification studies the problem of classifying document or sentence based on the subjective opinion of the author an important application area of sentiment opinion identification is business intelligence as product manufacturer always wants to know consumers opinions on its products comparisons on the other hand can be subjective or objective furthermore comparison is not concerned with an object in isolation instead it compares the object with others an example opinion sentence is the sound quality of cd player is poor an example comparative sentence is the sound quality of cd player is not as good as that of cd player clearly these two sentences give different information their language constructs are quite different too identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation which may even be more important than opinions on each individual object this paper proposes to study the comparative sentence identification problem it first categorizes comparative sentences into different types and then presents novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents experiment results using three types of documents news articles consumer reviews of products and internet forum postings show precision of and recall of more detailed results are given in the paper
given finite state machine checking sequence is an input sequence that is guaranteed to lead to failure if the implementation under test is faulty and has no more states than the specification machine there has been much interest in the automated generation of short checking sequence from finite state machine however such sequences can contain reset transitions whose use can adversely affect both the cost of applying the checking sequence and the effectiveness of the checking sequence thus we sometimes want checking sequence with minimum number of reset transitions rather than shortest checking sequence this paper describes new algorithm for generating checking sequence based on distinguishing sequence that minimises the number of reset transitions used
generative coordination is one of the most prominent coordination models for implementing open systems due to its spatial and temporal decoupling recently coordination community efforts have been trying to integrate security mechanisms into this model aiming to improve its robustness in this context this paper presents the bts coordination model which provides byzantine fault tolerant tuple space byzantine faults are commonly used to represent both process crashes and intrusions as far as we know bts is the first coordination model that supports this dependability level
financial distress prediction of companies is a hot topic that has attracted the interest of managers investors auditors and employees case based reasoning cbr is methodology for problem solving it is an imitation of human beings actions in real life when employing cbr in financial distress prediction it can not only provide explanations for its prediction but also advise how the company can get out of distress based on solutions of similar cases in the past this research puts forward multiple case based reasoning system by majority voting multi cbr mv for financial distress prediction four independent cbr models deriving from euclidean metric manhattan metric grey coefficient metric and outranking relation metric are employed to generate the system of multi cbr pre classifications of the former four independent cbrs are combined to generate the final prediction by majority voting we employ two kinds of majority voting ie pure majority voting pmv and weighted majority voting wmv correspondingly there are two deriving multi cbr systems ie multi cbr pmv and multi cbr wmv in the experiment min max normalization was used to scale all data into a specific range the technique of grid search was utilized to get optimal parameters under the assessment of leave one out cross validation loo cv and hold out data sets were used to assess predictive performance of models with data collected from shanghai and shenzhen stock exchanges experiment was carried out to compare performance of the two multi cbr mv systems with their composing cbrs and statistical models empirical results were satisfying which testified the feasibility and validity of the proposed multi cbr mv for listed companies financial distress prediction in china
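A minimal sketch of the voting combination, using only the euclidean and manhattan components (the grey-coefficient and outranking-relation metrics are omitted), synthetic 0/1 labels, and an arbitrary k; it illustrates the multi-CBR majority-vote structure rather than the authors' system.

```python
import numpy as np

def knn_predict(train_x, train_y, query, metric, k=5):
    """One CBR component: retrieve the k most similar past cases under the
    given metric and return the majority label among them (labels are 0/1)."""
    dists = np.array([metric(query, x) for x in train_x])
    nearest = train_y[np.argsort(dists)[:k]]
    return int(round(nearest.mean()))

euclidean = lambda a, b: np.linalg.norm(a - b)
manhattan = lambda a, b: np.abs(a - b).sum()

def multi_cbr_vote(train_x, train_y, query, metrics, weights=None):
    """Combine the component predictions by (optionally weighted) majority."""
    votes = np.array([knn_predict(train_x, train_y, query, m) for m in metrics])
    weights = np.ones(len(metrics)) if weights is None else np.array(weights)
    return int(np.average(votes, weights=weights) >= 0.5)

# usage with two of the four metrics mentioned above and synthetic data
rng = np.random.default_rng(0)
train_x = rng.standard_normal((40, 6))
train_y = (train_x[:, 0] > 0).astype(int)      # 1 = distressed, 0 = healthy
print(multi_cbr_vote(train_x, train_y, rng.standard_normal(6),
                     metrics=[euclidean, manhattan]))
```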
the value of using static code attributes to learn defect predictors has been widely debated prior work has explored issues like the merits of mccabes versus halstead versus lines of code counts for generating defect predictors we show here that such debates are irrelevant since how the attributes are used to build predictors is much more important than which particular attributes are used also contrary to prior pessimism we show that such defect predictors are demonstrably useful and on the data studied here yield predictors with mean probability of detection of percent and mean false alarms rates of percent these predictors would be useful for prioritizing resource bound exploration of code that has yet to be inspected
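A small sketch of the kind of pipeline the paper argues matters (how attributes are fed to the learner rather than which attributes are chosen); the log transform, the GaussianNB learner from scikit-learn, and the made-up module metrics are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_defect_predictor(attributes, defective):
    """Log-transform the static code attributes (the kind of preprocessing
    choice argued to matter more than the attribute set) and fit a simple
    probabilistic classifier."""
    X = np.log1p(attributes)          # rows = modules, cols = static metrics
    return GaussianNB().fit(X, defective)

# usage with made-up modules described by (LOC, cyclomatic complexity, Halstead volume)
attrs = np.array([[120, 4, 800], [30, 1, 90], [950, 25, 7000], [60, 2, 200]])
labels = np.array([0, 0, 1, 0])       # 1 = a defect was reported for the module
model = train_defect_predictor(attrs, labels)
print(model.predict_proba(np.log1p(np.array([[400, 12, 2500]])))[:, 1])
```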
specifying exact query concepts has become increasingly challenging to end users this is because many query concepts eg those for looking up multimedia object can be hard to articulate and articulation can be subjective in this study we propose query concept learner that learns query criteria through an intelligent sampling process our concept learner aims to fulfill two primary design objectives it has to be expressive in order to model most practical query concepts and it must learn concept quickly and with small number of labeled data since online users tend to be too impatient to provide much feedback to fulfill the first goal we model query concepts in cnf which can express almost all practical query concepts to fulfill the second design goal we propose our maximizing expected generalization algorithm mega which converges to target concepts quickly by its two complementary steps sample selection and concept refinement we also propose divide and conquer method that divides the concept learning task into subtasks to achieve speedup we notice that task must be divided carefully or search accuracy may suffer through analysis and mining results we observe that organizing image features in multiresolution manner and minimizing intragroup feature correlation can speed up query concept learning substantially while maintaining high search accuracy through examples analysis experiments and prototype implementation we show that mega converges to query concepts significantly faster than traditional methods
the world wide web has become one of the most important information repositories however information in web pages is free from standards in presentation and lacks being organized in good format it is challenging work to extract appropriate and useful information from web pages currently many web extraction systems called web wrappers either semi automatic or fully automatic have been developed in this paper some existing techniques are investigated then our current work on web information extraction is presented in our design we have classified the patterns of information into static and non static structures and use different technique to extract the relevant information in our implementation patterns are represented with xsl files and all the extracted information is packaged into machine readable format of xml
distributed object oriented environments have become important platforms for parallel and distributed service frameworks among distributed object oriented software net remoting provides language layer of abstractions for performing parallel and distributed computing in net environments in this paper we present our methodologies in supporting net remoting over meta clustered environments we take the advantage of the programmability of network processors to develop the content based switch for distributing workloads generated from remote invocations in net our scheduling mechanisms include stateful supports for net remoting services in addition we also propose scheduling policy to incorporate work flow models as the models are now incorporated in many of tools of grid architectures the result of our experiment shows that the improvement of eft is from to when compared to ett and is from to when compared to rr while the stateful task ratio is our schemes are effective in supporting the switching of net remoting computations over meta cluster environments
designing tensor fields in the plane and on surfaces is necessary task in many graphics applications such as painterly rendering pen and ink sketching of smooth surfaces and anisotropic remeshing in this article we present an interactive design system that allows user to create wide variety of symmetric tensor fields over surfaces either from scratch or by modifying meaningful input tensor field such as the curvature tensor our system converts each user specification into basis tensor field and combines them with the input field to make an initial tensor field however such field often contains unwanted degenerate points which cannot always be eliminated due to topological constraints of the underlying surface to reduce the artifacts caused by these degenerate points our system allows the user to move degenerate point or to cancel pair of degenerate points that have opposite tensor indices these operations provide control over the number and location of the degenerate points in the field we observe that tensor field can be locally converted into vector field so that there is one to one correspondence between the set of degenerate points in the tensor field and the set of singularities in the vector field this conversion allows us to effectively perform degenerate point pair cancellation and movement by using similar operations for vector fields in addition we adapt the image based flow visualization technique to tensor fields therefore allowing interactive display of tensor fields on surfaces we demonstrate the capabilities of our tensor field design system with painterly rendering pen and ink sketching of surfaces and anisotropic remeshing
this paper presents an abstraction called guardian for exception handling in distributed and concurrent systems that use coordinated exception handling this model addresses two fundamental problems with distributed exception handling in group of asynchronous processes the first is to perform recovery when multiple exceptions are concurrently signaled the second is to determine the correct context in which process should execute its exception handling actions several schemes have been proposed in the past to address these problems these are based on structuring distributed program as atomic actions based on conversations or transactions and resolving multiple concurrent exceptions into single one the guardian in distributed program represents the abstraction of global exception handler which encapsulates rules for handling concurrent exceptions and directing each process to the semantically correct context for executing its recovery actions its programming primitives and the underlying distributed execution model are presented here in contrast to the existing approaches this model is more basic and can be used to implement or enhance the existing schemes using several examples we illustrate the capabilities of this model finally its advantages and limitations are discussed in contrast to existing approaches
basex is an early adopter of the upcoming xquery full text recommendation this paper presents some of the enhancements made to the xml database to fully support the language extensions the system’s data and index structures are described and implementation details are given on the xquery compiler which supports sequential scanning index based and hybrid processing of full text queries experimental analysis and an insight into visual result presentation of query results conclude the presentation
there are number of transportation applications that require the use of heuristic shortest path algorithm rather than one of the standard optimal algorithms this is primarily due to the requirements of some transportation applications where shortest paths need to be quickly identified either because an immediate response is required eg in vehicle route guidance systems or because the shortest paths need to be recalculated repeatedly eg vehicle routing and scheduling for this reason number of heuristic approaches have been advocated for decreasing the computation time of the shortest path algorithm this paper presents survey review of various heuristic shortest path algorithms that have been developed in the past the goal is to identify the main features of different heuristic strategies develop unifying classification framework and summarize relevant computational experience
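As one concrete example of the heuristic strategies being surveyed, a compact A*-style search: with an admissible heuristic it stays optimal, while a more aggressive heuristic trades path quality for speed. The toy road network and the zero heuristic (which reduces the search to Dijkstra) are illustrative.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Heuristic shortest path search: expand nodes in order of g(n) + h(n),
    where g is the cost so far and h estimates the remaining cost."""
    frontier = [(heuristic(start), 0.0, start, [start])]
    best = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g >= best.get(node, float("inf")):
            continue                  # an equal-or-better path was already expanded
        best[node] = g
        for neighbor, cost in graph.get(node, []):
            heapq.heappush(frontier, (g + cost + heuristic(neighbor),
                                      g + cost, neighbor, path + [neighbor]))
    return None

# usage on a toy road network with a zero heuristic
graph = {"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0), ("d", 4.0)],
         "c": [("d", 1.0)]}
print(a_star(graph, "a", "d", heuristic=lambda n: 0.0))  # (4.0, ['a','b','c','d'])
```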
application level intrusion detection systems usually rely on the immunological approach in this approach the application behavior is compared at runtime with previously learned application profile of the sequence of system calls it is allowed to emit unfortunately this approach cannot detect anything but control flow violation and thus remains helpless in detecting the attacks that aim pure application data in this paper we propose an approach that would enhance the detection of such attacks our proposal relies on data oriented behavioral model that builds the application profile out of dynamically extracted invariant constraints on the application data items
digital pen systems originally designed to digitize annotations made on physical paper are evolving to permit wider variety of applications although the type and quality of pen feedback eg haptic audio and visual have huge impact on advancing the digital pen technology dynamic visual feedback has yet to be fully investigated in parallel miniature projectors are an emerging technology with the potential to enhance visual feedback for small mobile computing devices in this paper we present the penlight system which is testbed to explore the interaction design space and its accompanying interaction techniques in digital pen embedded with spatially aware miniature projector using our prototype that simulates miniature projection via standard video projector we visually augment paper documents giving the user immediate access to additional information and computational tools we also show how virtual ink can be managed in single and multi user environments to aid collaboration and data management user evaluation with professional architects indicated promise of our proposed techniques and their potential utility in the paper intensive domain of architecture
similarity search has been studied in domain of time series data mining and it is an important technique in stream mining since sampling rates of streams are frequently different and their time period varies in practical situations the method which deals with time warping such as dynamic time warping dtw is suitable for measuring similarity however finding pairs of similar subsequences between co evolving sequences is difficult due to increase of the complexity because dtw is method for detecting sequences that are similar to given query sequence in this paper we focus on the problem of finding pairs of similar subsequences and periodicity over data streams we propose method to detect similar subsequences in streaming fashion our approach for measuring similarity relies on proposed scoring function that incrementally updates score which is suitable for data stream processing we also present an efficient algorithm based on the scoring function our experiments on real and synthetic data demonstrate that our method detects the pairs of qualifying subsequence correctly and that it is dramatically faster than the existing method
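The abstract does not give the scoring function, so the sketch below uses a hypothetical local-alignment-style score that is updated one sample at a time: a cell extends the best neighboring score by a reward that is large when the two samples are close and is clipped at zero so a match may start anywhere. It only illustrates what "incrementally updating a score" can look like in a streaming setting, not the paper's actual function.

```python
def update_scores(prev_row, x_t, y_values, bonus=1.0):
    """Process one new sample x_t of stream X against the samples of stream Y:
    score[j] extends the best neighboring score by a reward that is high when
    x_t and y_j are close, clipped at zero so matches can start anywhere."""
    row = [0.0] * len(y_values)
    for j, y in enumerate(y_values):
        reward = bonus - abs(x_t - y)
        best_prev = max(prev_row[j],
                        prev_row[j - 1] if j > 0 else 0.0,
                        row[j - 1] if j > 0 else 0.0)
        row[j] = max(0.0, best_prev + reward)
    return row

# usage: feed stream X one sample at a time; high-scoring cells mark the
# ends of candidate similar subsequence pairs
y_values = [0.0, 0.1, 0.9, 1.0, 0.2]
prev = [0.0] * len(y_values)
for x_t in [0.8, 1.0, 0.1]:
    prev = update_scores(prev, x_t, y_values)
    print([round(s, 2) for s in prev])
```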
in this paper we propose two information theoretic techniques for efficiently trading off the location update and paging costs associated with mobility management in wireless cellular networks previous approaches attempt to always accurately convey mobiles movement sequence and hence cannot reduce the signaling cost below the entropy bound our proposed techniques however exploit rate distortion theory to arbitrarily reduce the update cost at the expense of an increase in the corresponding paging overhead to this end we describe two location tracking algorithms based on spatial quantization and temporal quantization which first quantize the movement sequence into smaller set of codewords and then report compressed representation of the codeword sequence while the spatial quantization algorithm clusters individual cells into registration areas the more powerful temporal quantization algorithm groups sets of consecutive movement patterns the quantizers themselves are adaptive and periodically reconfigure to accommodate changes in the mobiles movement pattern simulation study with synthetic as well as real movement traces for both single system and multi system cellular networks demonstrate that the proposed algorithms can reduce the mobiles update frequency to updates day with reasonable paging cost low computational complexity storage overhead and codebook updates
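A bare-bones sketch of the spatial-quantization idea only: cells are grouped into registration areas (the codewords) and an update is reported only when the area changes. The fixed clustering is an assumption; the paper's quantizers are adaptive and rate-distortion driven.

```python
def quantize_updates(cell_sequence, cell_to_area):
    """Spatial quantization: map each visited cell to its registration area
    (the codeword) and emit a location update only when the area changes,
    trading paging precision for fewer updates."""
    updates = []
    last_area = None
    for cell in cell_sequence:
        area = cell_to_area[cell]
        if area != last_area:
            updates.append(area)
            last_area = area
    return updates

# usage: 8 cells grouped into 2 registration areas (an assumed clustering)
cell_to_area = {c: 0 if c < 4 else 1 for c in range(8)}
movement = [0, 1, 1, 2, 5, 6, 5, 3, 0]
print(quantize_updates(movement, cell_to_area))   # [0, 1, 0]
```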
guide to the tools and core technologies for merging information from disparate sources
spanner of an undirected unweighted graph is subgraph that approximates the distance metric of the original graph with some specified accuracy specifically we say that a subgraph H ⊆ G is an f-spanner of G if any two vertices at distance d in G are at distance at most f(d) in H there is clearly some trade off between the sparsity of H and the distortion function f though the nature of the optimal trade off is still poorly understood in this article we present simple modular framework for constructing sparse spanners that is based on interchangeable components called connection schemes by assembling connection schemes in different ways we can recreate the additive and spanners of aingworth et al and baswana et al and give spanners whose multiplicative distortion quickly tends toward our results rival the simplicity of all previous algorithms and provide substantial improvements up to doubly exponential reduction in edge density over the comparable spanners of elkin and peleg and thorup and zwick
re identification is major privacy threat to public datasets containing individual records many privacy protection algorithms rely on generalization and suppression of quasi identifier attributes such as zip code and birthdate their objective is usually syntactic sanitization for example k anonymity requires that each quasi identifier tuple appear in at least k records while l diversity requires that the distribution of sensitive attributes for each quasi identifier have high entropy the utility of sanitized data is also measured syntactically by the number of generalization steps applied or the number of records with the same quasi identifier in this paper we ask whether generalization and suppression of quasi identifiers offer any benefits over trivial sanitization which simply separates quasi identifiers from sensitive attributes previous work showed that k anonymous databases can be useful for data mining but k anonymization does not guarantee any privacy by contrast we measure the tradeoff between privacy how much can the adversary learn from the sanitized records and utility measured as accuracy of data mining algorithms executed on the same sanitized records for our experimental evaluation we use the same datasets from the uci machine learning repository as were used in previous research on generalization and suppression our results demonstrate that even modest privacy gains require almost complete destruction of the data mining utility in most cases trivial sanitization provides equivalent utility and better privacy than k anonymity l diversity and similar methods based on generalization and suppression
self adaptive component based architectures facilitate the building of systems capable of dynamically adapting to varying execution context such dynamic adaptation is particularly relevant in the domain of ubiquitous computing where numerous and unexpected changes of the execution context prevail in this paper we introduce an extension of the music component based planning framework that optimizes the overall utility of applications when such changes occur in particular we focus on changes in the service provider landscape in order to plug in interchangeably components and services providing the functionalities defined by the component framework the dynamic adaptations are operated automatically for optimizing the application utility in given execution context our resulting planning framework is described and validated on motivating scenario of the music project
this paper presents decentralized variant of david gifford’s classic weighted voting scheme for managing replicated data weighted voting offers familiar consistency model and supports on line replica reconfiguration these properties make it good fit for applications in the pervasive computing domain by distributing versioned metadata along with data replicas and managing access to both data and metadata with the same quorums our algorithm supports peer to peer environment with dynamic device membership our algorithm has been implemented as part of database called oasis that was designed for pervasive environments
most existing hypermedia authoring systems are intended for use on desktop computers these systems are typically designed for the creation of documents and therefore employ authoring mechanisms in contrast authoring systems for nontraditional multimedia hypermedia experiences for virtual or augmented worlds focus mainly on creating separate media objects and embedding them within the user’s surroundings as result linking these media objects to create hypermedia is tedious manual task to address this issue we present an authoring tool for creating and editing linked hypermedia narratives that are interwoven with wearable computer user’s surrounding environment our system is designed for use by authors who are not programmers and allows them to preview their results on desktop workstation as well as with an augmented or virtual reality system
concurrency is pervasive in large systems unexpected interference among threads often results in heisenbugs that are extremely difficult to reproduce and eliminate we have implemented tool called chess for finding and reproducing such bugs when attached to program chess takes control of thread scheduling and uses efficient search techniques to drive the program through possible thread interleavings this systematic exploration of program behavior enables chess to quickly uncover bugs that might otherwise have remained hidden for long time for each bug chess consistently reproduces an erroneous execution manifesting the bug thereby making it significantly easier to debug the problem chess scales to large concurrent programs and has found numerous bugs in existing systems that had been tested extensively prior to being tested by chess chess has been integrated into the test frameworks of many code bases inside microsoft and is used by testers on daily basis
data items are often associated with location in which they are present or collected and their relevance or influence decays with their distance aggregate values over such data thus depend on the observing location where the weight given to each item depends on its distance from that location we term such aggregation spatially decaying spatially decaying aggregation has numerous applications individual sensor nodes collect readings of an environmental parameter such as contamination level or parking spot availability the nodes then communicate to integrate their readings so that each location obtains contamination level or parking availability in its neighborhood nodes in p2p network could use summary of content and properties of nodes in their neighborhood in order to guide search in graphical databases such as web hyperlink structure properties such as subject of pages that can reach or be reached from page using link traversals provide information on the page we formalize the notion of spatially decaying aggregation and develop efficient algorithms for fundamental aggregation functions including sums and averages random sampling heavy hitters quantiles and norms
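The definition lends itself to a direct, brute-force reference computation, sketched below with an assumed exponential decay function; the paper's contribution is computing such aggregates efficiently, which this naive version does not attempt.

```python
import math

def decayed_sum(items, observer, decay):
    """Spatially decaying aggregation at a given observer location: each
    item's value is weighted by a non-increasing function of its distance
    from the observer before summing."""
    total = 0.0
    for location, value in items:
        distance = math.dist(location, observer)
        total += value * decay(distance)
    return total

# usage: contamination readings at known coordinates, weighted by proximity
readings = [((0.0, 0.0), 3.0), ((1.0, 1.0), 5.0), ((4.0, 0.0), 10.0)]
exp_decay = lambda d: math.exp(-d)
print(decayed_sum(readings, observer=(0.0, 0.0), decay=exp_decay))
```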
in model driven engineering mde software development is centred around formal description model of the proposed software system and other software artifacts are derived directly from the model we are investigating semantically configurable mde in which specifiers are able to configure the semantics of their models the goal of this work is to provide modelling environment that offers flexible configurable modelling notations so that specifiers are better able to represent their ideas and yet still provides the types of analysis tools and code generators normally associated with model driven engineering in this paper we present semantically configurable code generator generator which creates java code generator for modelling notation given the notation’s semantics expressed as set of parameter values we are able to simulate multiple different model based code generators though at present the performance of our generated code is about an order of magnitude slower than that produced by commercial grade generators
as microprocessor technology continues to scale into the nanometer regime recent studies show that interconnect delay will be limiting factor for performance and multiple cycles will be necessary to communicate global signals across the chip thus longer interconnects need to be pipelined and the impact of the extra latency along wires needs to be considered during early micro architecture design exploration in this paper we address this problem and make the following contributions floorplan driven micro architecture evaluation methodology considering interconnect pipelining at given target frequency by selectively optimizing architecture level critical paths use of micro architecture performance sensitivity models to weight micro architectural critical paths during floorplanning and optimize them for higher performance methodology to study the impact of frequency scaling on micro architecture performance with consideration of interconnect pipelining for sample micro architecture design space we show that considering interconnect pipelining can increase the estimated performance against no wire pipelining approach between to we also demonstrate the value of the methodology in exploring the target frequency of the processor
while the demand for college graduates with computing skills continues to rise such skills no longer equate to mere programming skills modern day computing jobs demand design communication and collaborative work skills as well since traditional instructional methods in computing education tend to focus on programming skills we believe that fundamental rethinking of computing education is in order we are exploring new studio based pedagogy that actively engages undergraduate students in collaborative design oriented learning adapted from architectural education the studio based instructional model emphasizes learning activities in which students construct personalized solutions to assigned computing problems and present solutions to their instructors and peers for feedback and discussion within the context of design crits we describe and motivate the studio based approach review previous efforts to apply it to computer science education and propose an agenda for multi institutional research into the design and impact of studio based instructional models we invite educators to participate in community of research and practice to advance studio based learning in computing education
in this paper we present jedd language extension to java that supports convenient way of programming with binary decision diagrams bdds the jedd language abstracts bdds as database style relations and operations on relations and provides static type rules to ensure that relational operations are used correctly the paper provides description of the jedd language and reports on the design and implementation of the jedd translator and associated runtime system of particular interest is the approach to assigning attributes from the high level relations to physical domains in the underlying bdds which is done by expressing the constraints as sat problem and using modern sat solver to compute the solution further runtime system is defined that handles memory management issues and supports browsable profiling tool for tuning the key bdd operations the motivation for designing jedd was to support the development of whole program analyses based on bdds and we have used jedd to express five key interrelated whole program analyses in our soot compiler framework we provide some examples of this application and discuss our experiences using jedd
we present blas bi labeling based system for efficiently processing complex xpath queries over xml data blas uses labeling to process queries involving consecutive child axes and labeling to process queries involving descendant axes traversal the xml data is stored in labeled form and indexed to optimize descendant axis traversals three algorithms are presented for translating complex xpath queries to sql expressions and two alternate query engines are provided experimental results demonstrate that the blas system has substantial performance improvement compared to traditional xpath processing using labeling
dataflow analyses for concurrent programs differ from their single threaded counterparts in that they must account for shared memory locations being overwritten by concurrent threads existing dataflow analysis techniques for concurrent programs typically fall at either end of spectrum at one end the analysis conservatively kills facts about all data that might possibly be shared by multiple threads at the other end precise thread interleaving analysis determines which data may be shared and thus which dataflow facts must be invalidated the former approach can suffer from imprecision whereas the latter does not scale we present radar framework that automatically converts dataflow analysis for sequential programs into one that is correct for concurrent programs radar uses race detection engine to kill the dataflow facts generated and propagated by the sequential analysis that become invalid due to concurrent writes our approach of factoring all reasoning about concurrency into race detection engine yields two benefits first to obtain analyses for code using new concurrency constructs one need only design suitable race detection engine for the constructs second it gives analysis designers an easy way to tune the scalability and precision of the overall analysis by only modifying the race detection engine we describe the radar framework and its implementation using pre existing race detection engine we show how radar was used to generate concurrent version of null pointer dereference analysis and we analyze the result of running the generated concurrent analysis on several benchmarks
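the division of labour described above can be sketched in a few lines the toy non null analysis the fact encoding and the stand in race oracle below are assumptions made for illustration not the radar implementation

    # a sequential transfer function produces dataflow facts, and a stand-in
    # race oracle kills facts about variables that may be written concurrently
    def sequential_transfer(facts, stmt):
        """toy non-null analysis: x = new generates nonnull(x), anything else kills it."""
        var, rhs = stmt
        facts = {f for f in facts if f[1] != var}
        if rhs == "new":
            facts.add(("nonnull", var))
        return facts

    def race_oracle(var, point):
        # hypothetical race engine verdict: p may be overwritten concurrently at point 2
        return var == "p" and point == 2

    def radar_transfer(facts, stmt, point):
        facts = sequential_transfer(facts, stmt)
        return {f for f in facts if not race_oracle(f[1], point)}

    facts = set()
    for point, stmt in enumerate([("p", "new"), ("q", "new"), ("r", "new")]):
        facts = radar_transfer(facts, stmt, point)
    print(facts)   # nonnull(p) was killed by the race oracle at point 2

swapping in a different race detection engine only changes race_oracle which is exactly the tuning knob the abstract describes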
linear ccd sensor reads temporal data from ccd array continuously and forms image profile compared to most of the sensors in the current sensor networks that output temporal signals it delivers more information such as color shape and event of flowing scene on the other hand it abstracts passing objects in the profile without heavy computation and transmits much less data than video this paper revisits the capabilities of the sensors in data processing compression and streaming in the framework of wireless sensor network we focus on several unsolved issues such as sensor setting shape analysis robust object extraction and real time background adapting to ensure long term sensing and visual data collection via networks all the developed algorithms are executed in constant complexity for reducing the sensor and network burden sustainable visual sensor network can thus be established in large area to monitor passing objects and people for surveillance traffic assessment invasion alarming etc
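the constant complexity background adaptation mentioned above can be pictured as a per pixel exponential blend the learning rate threshold and synthetic profile below are illustrative choices not the paper's parameters

    import numpy as np

    # running background model for a 1-d ccd profile: blend the new profile in,
    # and mark pixels that deviate strongly from the adapted background
    def update_background(background, profile, alpha=0.05):
        return (1.0 - alpha) * background + alpha * profile

    def extract_objects(background, profile, threshold=20.0):
        return np.abs(profile.astype(float) - background) > threshold

    background = np.full(512, 100.0)                 # current background level
    profile = np.full(512, 100.0)
    profile[200:220] = 180.0                         # a passing object
    mask = extract_objects(background, profile)
    background = update_background(background, profile)
    print(int(mask.sum()), "foreground pixels")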
the introduction of covering generalized rough sets has made substantial contribution to the traditional theory of rough sets the notion of attribute reduction can be regarded as one of the strongest and most significant results in rough sets however the efforts made on attribute reduction of covering generalized rough sets are far from sufficient in this work covering reduction is examined and discussed we initially construct new reduction theory by redefining the approximation spaces and the reducts of covering generalized rough sets this theory is applicable to all types of covering generalized rough sets and generalizes some existing reduction theories moreover the currently insufficient reducts of covering generalized rough sets are improved by the new reduction we then investigate in detail the procedures to get reducts of covering the reduction of covering also provides technique for data reduction in data mining
time synchronisation is one of the most important and fundamental middleware services for wireless sensor networks however there is an apparent disconnect between existing time synchronisation implementations and the actual needs of current typical sensor network applications to address this problem we formulate set of canonical time synchronisation services distilled from actual applications and propose set of general application programming interfaces for providing them we argue that these services can be implemented using simple time stamping primitive called elapsed time on arrival eta and we provide two such implementations the routing integrated time synchronisation rits is an extension of eta over multiple hops it is reactive time synchronisation protocol that can be used to correlate multiple event detections at one or more locations to within microseconds rapid time synchronisation rats is proactive timesync protocol that utilises rits to achieve network wide synchronisation with microsecond precision and rapid convergence our work demonstrates that it is possible to build high performance timesync services using the simple eta primitive and suggests that more complex mechanisms may be unnecessary to meet the needs of many real world sensor network applications
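a toy rendition of the elapsed time on arrival primitive is given below it assumes unsynchronised node clocks and ignores propagation and medium access delays that a real implementation must timestamp away the classes and numbers are invented for the example

    # each hop adds the locally measured residence time of the message to an
    # accumulated age; the final receiver subtracts the age from its own clock
    class Node:
        def __init__(self, clock_offset):
            self.offset = clock_offset            # unknown to the protocol

        def local_time(self, true_time):
            return true_time + self.offset

    def hop(node, msg, arrive_true, depart_true):
        msg["age"] += node.local_time(depart_true) - node.local_time(arrive_true)
        return msg

    a, b, c = Node(5.0), Node(-3.0), Node(12.0)
    msg = {"age": 0.0}                             # event happens on a at true time 0
    msg = hop(a, msg, arrive_true=0.0, depart_true=0.4)
    msg = hop(b, msg, arrive_true=0.4, depart_true=0.9)
    estimate = c.local_time(0.9) - msg["age"]      # event time in c's own timebase
    print(estimate, "==", c.local_time(0.0))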
group awareness is an important part of synchronous collaboration and support for group awareness can greatly improve groupware usability however it is still difficult to build groupware that supports group awareness to address this problem we have developed the maui toolkit java toolkit with broad suite of awareness enhanced ui components the toolkit contains both extensions of standard swing widgets and groupware specific components such as telepointers all components have added functionality for collecting distributing and visualizing group awareness information the toolkit packages components as javabeans allowing wide code reuse easy integration with ides and drag and drop creation of working group aware interfaces the toolkit provides the first ever set of ui widgets that are truly collaboration aware and provides them in way that greatly simplifies the construction and testing of rich groupware interfaces
conventional camera captures blurred versions of scene information away from the plane of focus camera systems have been proposed that allow for recording all focus images or for extracting depth but to record both simultaneously has required more extensive hardware and reduced spatial resolution we propose simple modification to conventional camera that allows for the simultaneous recovery of both high resolution image information and depth information adequate for semi automatic extraction of layered depth representation of the image our modification is to insert patterned occluder within the aperture of the camera lens creating coded aperture we introduce criterion for depth discriminability which we use to design the preferred aperture pattern using statistical model of images we can recover both depth information and an all focus image from single photographs taken with the modified camera layered depth map is then extracted requiring user drawn strokes to clarify layer assignments in some cases the resulting sharp image and layered depth map can be combined for various photographic applications including automatic scene segmentation post exposure refocusing or re rendering of the scene from an alternate viewpoint
we address design of computer support for work and its coordination at the danish broadcasting corporation we propose design solutions based upon participatory design techniques and ethnographically inspired analysis within full scale design project the project exemplifies an ambitious yet realistic design practice that provides sound basis for organisational decision making and for technical and organizational development and implementation we focus on cooperative aspects within and among the editorial units and between editorial units and the editorial board we discuss technical and organisational aspects of the design seen in light of recent cscw concepts including coordination and computational coordination mechanisms technologies of accountability and workflow from within and without
in this paper we study the maintenance of frequent patterns in the context of the generator representation the generator representation is concise and lossless representation of frequent patterns we effectively maintain the generator representation by systematically expanding its negative generator border in the literature very little work has addressed the maintenance of the generator representation to illustrate the proposed maintenance idea new algorithm is developed to maintain the generator representation for support threshold adjustment our experimental results show that the proposed algorithm is significantly faster than other state of the art algorithms this proposed maintenance idea can also be extended to other representations of frequent patterns as demonstrated in this paper
the rules of classical logic may be formulated in pairs corresponding to de morgan duals rules about are dual to rules about line of work including that of filinski griffin parigot danos joinet and schellinx selinger and curien and herbelin has led to the startling conclusion that call by value is the de morgan dual of call by name this paper presents dual calculus that corresponds to the classical sequent calculus of gentzen in the same way that the lambda calculus of church corresponds to the intuitionistic natural deduction of gentzen the paper includes crisp formulations of call by value and call by name that are obviously dual no similar formulations appear in the literature the paper gives cps translation and its inverse and shows that the translation is both sound and complete strengthening result in curien and herbelin
cooperative multi agent systems mas are ones in which several agents attempt through their interaction to jointly solve tasks or to maximize utility due to the interactions among the agents multi agent problem complexity can rise rapidly with the number of agents or their behavioral sophistication the challenge this presents to the task of programming solutions to mas problems has spawned increasing interest in machine learning techniques to automate the search and optimization process we provide broad survey of the cooperative multi agent learning literature previous surveys of this area have largely focused on issues common to specific subareas for example reinforcement learning rl or robotics in this survey we attempt to draw from multi agent learning work in spectrum of areas including rl evolutionary computation game theory complex systems agent modeling and robotics we find that this broad view leads to division of the work into two categories each with its own special issues applying single learner to discover joint solutions to multi agent problems team learning or using multiple simultaneous learners often one per agent concurrent learning additionally we discuss direct and indirect communication in connection with learning plus open issues in task decomposition scalability and adaptive dynamics we conclude with presentation of multi agent learning problem domains and list of multi agent learning resources
direction based spatial relationships are critical in many domains including geographic information systems gis and image interpretation they are also frequently used as selection conditions in spatial queries in this paper we explore the processing of object based direction queries and propose new open shape based strategy oss oss models the direction region as an open shape and converts the processing of the direction predicates into the processing of topological operations between open shapes and closed geometry objects the proposed strategy oss makes it unnecessary to know the boundary of the embedding world and also eliminates the computation related to the world boundary oss reduces both io and cpu costs by greatly improving the filtering effectiveness our experimental evaluation shows that oss consistently outperforms classical range query strategies rqs while the degree of performance improvement varies by several parameters experimental results also demonstrate that oss is more scalable than rqs for large data sets
capturing detailed surface geometry currently requires specialized equipment such as laser range scanners which despite their high accuracy leave gaps in the surfaces that must be reconciled with photographic capture for relighting applications using only standard digital camera and single view we present method for recovering models of predominantly diffuse textured surfaces that can be plausibly relit and viewed from any angle under any illumination our multiscale shape from shading technique uses diffuse lit flash lit image pairs to produce an albedo map and textured height field using two lighting conditions enables us to subtract one from the other to estimate albedo in the absence of flash lit image of surface for which we already have similar exemplar pair we approximate both albedo and diffuse shading images using histogram matching our depth estimation is based on local visibility unlike other depth from shading approaches all operations are performed on the diffuse shading image in image space and we impose no constant albedo restrictions an experimental validation shows our method works for broad range of textured surfaces and viewers are frequently unable to identify our results as synthetic in randomized presentation furthermore in side by side comparisons subjects found rendering of our depth map equally plausible to one generated from laser range scan we see this method as significant advance in acquiring surface detail for texturing using standard digital camera with applications in architecture archaeological reconstruction games and special effects
the internet suspend resume model of mobile computing cuts the tight binding between pc state and pc hardware by layering virtual machine on distributed storage isr lets the vm encapsulate execution and user customization state distributed storage then transports that state across space and time this article explores the implications of isr for an infrastructure based approach to mobile computing it reports on experiences with three versions of isr and describes work in progress toward the openisr version
sensornets are being deployed and increasingly brought on line to share data as it is collected sensornet republishing is the process of transforming on line sensor data and sharing the filtered aggregated or improved data with others we explore the need for data provenance in this system to allow users to understand how processed results are derived and detect and correct anomalies we describe our sensornet provenance system exploring design alternatives and quantifying storage trade offs in the context of city sized temperature monitoring application in that application our link approach outperforms other alternatives in reducing the storage requirement and our incremental compression scheme saves storage further up to
simultaneous multithreading smt represents fundamental shift in processor capability smt’s ability to execute multiple threads simultaneously within single cpu offers tremendous potential performance benefits however the structure and behavior of software affects the extent to which this potential can be achieved consequently just like the earlier arrival of multiprocessors the advent of smt processors prompts needed re evaluation of software that will run on them this evaluation is complicated since smt adopts architectural features and operating costs of both its predecessors uniprocessors and multiprocessors the crucial task for researchers is to determine which software structures and policies multi processor uniprocessor or neither are most appropriate for smt this paper evaluates how smt’s changes to the underlying hardware affect server software and in particular smt’s effects on memory allocation and synchronization using detailed simulation of an smt server implemented in three different thread models we find that the default policies often provided with multiprocessor operating systems produce unacceptably low performance for each area that we examine we identify better policies that combine techniques from both uniprocessors and multi processors we also uncover vital aspect of multi threaded synchronization interaction with operating system thread scheduling that previous research on smt synchronization had overlooked overall our results demonstrate how few simple changes to applications run time support libraries can dramatically boost the performance of multi threaded servers on smt without requiring modifications to the applications themselves
abstract like text summarisation requires means of producing novel summary sentences in order to improve the grammaticality of the generated sentence we model global sentence level syntactic structure we couch statistical sentence generation as spanning tree problem in order to search for the best dependency tree spanning set of chosen words we also introduce new search algorithm for this task that models argument satisfaction to improve the linguistic validity of the generated tree we treat the allocation of modifiers to heads as weighted bipartite graph matching or assignment problem well studied problem in graph theory using bleu to measure performance on string regeneration task we found an improvement illustrating the benefit of the spanning tree approach armed with an argument satisfaction model
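the allocation of modifiers to heads as an assignment problem can be sketched with a standard hungarian algorithm solver the score matrix below is invented rows stand for modifiers and columns for head slots and higher scores mean more plausible attachments

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    scores = np.array([
        [0.9, 0.2, 0.1],     # plausibility of attaching modifier 0 to each head slot
        [0.3, 0.8, 0.4],
        [0.2, 0.5, 0.7],
    ])
    # linear_sum_assignment minimises total cost, so negate the scores to maximise them
    rows, cols = linear_sum_assignment(-scores)
    for m, h in zip(rows, cols):
        print("modifier", m, "-> head slot", h, "score", scores[m, h])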
fan cloud is set of triangles that can be used to visualize and work with point clouds it is fast to compute and can replace triangular mesh representation we discuss visualization multiresolution reduction refinement and selective refinement algorithms for triangular meshes can also be applied to fan clouds they become even simpler because fans are not interrelated this localness of fan clouds is one of their main advantages no remeshing is necessary for local or adaptive refinement and reduction
adding byzantine tolerance to large scale distributed systems is considered non practical the time message and space requirements are very high recently researchers have investigated the broadcast problem in the presence of local byzantine adversary the local adversary cannot control more than neighbors of any given node this paper proves sufficient conditions as to when the synchronous byzantine consensus problem can be solved in the presence of local adversary moreover we show that for family of graphs the byzantine consensus problem can be solved using relatively small number of messages and with time complexity proportional to the diameter of the network specifically for family of bounded degree graphs with logarithmic diameter logn time and logn messages furthermore our proposed solution requires constant memory space at each node
compositional multi agent system design is methodological perspective on multi agent system design based on the software engineering principles process and knowledge abstraction compositionality reuse specification and verification this paper addresses these principles from generic perspective in the context of the compositional development method desire an overview is given of reusable generic models design patterns for different types of agents problem solving methods and tasks and reasoning patterns examples of supporting tools are described
this article presents novel constraint based motion editing technique on the basis of animator specified kinematic and dynamic constraints the method converts given captured or animated motion to physically plausible motion in contrast to previous methods using spacetime optimization we cast the motion editing problem as constrained state estimation problem based on the per frame kalman filter framework the method works as filter that sequentially scans the input motion to produce stream of output motion frames at stable interactive rate animators can tune several filter parameters to adjust to different motions turn the constraints on or off based on their contributions to the final result or provide rough sketch kinematic hint as an effective way of producing the desired motion experiments on various systems show that the technique processes the motions of human with degrees of freedom at about fps when only kinematic constraints are applied and at about fps when both kinematic and dynamic constraints are applied experiments on various types of motion show that the proposed method produces remarkably realistic animations
in this paper we investigate knowledge reasoning within simple framework called knowledge structure we use variable forgetting as basic operation for one agent to reason about its own or other agents knowledge in our framework two notions namely agents observable variables and the weakest sufficient condition play important roles in knowledge reasoning given background knowledge base and set of observable variables oi for each agent we show that the notion of an agent knowing a formula can be defined as the weakest sufficient condition of that formula over oi under the background knowledge base moreover we show how to capture the notion of common knowledge by using generalized notion of weakest sufficient condition also we show that public announcement operator can be conveniently dealt with via our notion of knowledge structure further we explore the computational complexity of the problem whether an epistemic formula is realized in knowledge structure in the general case this problem is pspace hard however for some interesting subcases it can be reduced to co np finally we discuss possible applications of our framework in some interesting domains such as the automated analysis of the well known muddy children puzzle and the verification of the revised needham schroeder protocol we believe that there are many scenarios where the natural presentation of the available information about knowledge is under the form of knowledge structure what makes it valuable compared with the corresponding multi agent kripke structure is that it can be much more succinct
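the two central notions can be made concrete with a small semantic sketch formulas are encoded as python predicates over truth assignments forgetting a variable existentially quantifies it away and the weakest sufficient condition over the observables is obtained by forgetting the hidden variables from the background theory conjoined with the negated target and negating the result the variable names and the tiny background theory are assumptions for the example not the paper's framework

    from itertools import product

    VARS = ["s", "o1", "o2"]          # s is hidden, o1 and o2 are observable (assumed names)

    def assignments(vs):
        return [dict(zip(vs, bits)) for bits in product([False, True], repeat=len(vs))]

    def forget(phi, var):
        """exists-quantify var away: phi[var=true] or phi[var=false]."""
        return lambda a: phi({**a, var: True}) or phi({**a, var: False})

    def wsc(target, background, hidden):
        """weakest condition on the remaining variables that, with background, entails target."""
        bad = lambda a: background(a) and not target(a)
        for v in hidden:
            bad = forget(bad, v)
        return lambda a: not bad(a)

    background = lambda a: a["o1"] == a["s"]          # the observable o1 reports the secret s
    knows_s = wsc(lambda a: a["s"], background, ["s"])
    assert all(knows_s(a) == a["o1"] for a in assignments(VARS))
    print("the agent observing o1 knows s exactly when o1 holds")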
this paper proposes behavioral footprinting new dimension of worm profiling based on worm infection sessions worm’s infection session contains number of steps eg for probing exploitation and replication that are exhibited in certain order in every successful worm infection behavioral footprinting complements content based signature by enriching worm’s profile which will be used in worm identification an important task in post worm attack investigation and recovery we propose an algorithm to extract worm’s behavioral footprint from the worm’s traffic traces our evaluation with number of real worms and their variants confirms the existence of worms behavioral footprints and demonstrates their effectiveness in worm identification
users past search behaviour provides rich context that an information retrieval system can use to tailor its search results to suit an individual’s or community’s information needs in this paper we present an investigation of the variability in search behaviours for the same queries in close knit community by examining web proxy cache logs over period of nine months we extracted set of queries that had been issued by at least ten users our analysis indicates that overall users clicked on highly ranked and relevant pages but they tend to click on different sets of pages examination of the query reformulation history revealed that users often have different search intents behind the same query we identify three major causes for the community’s interaction behaviour differences the variance of task the different intents expressed with the query and the snippet and characteristics of retrieved documents based on our observations we identify opportunities to improve the design of different search and delivery tools to better support community and individual search experience
in this paper we consider the evolution of structure within large online social networks we present series of measurements of two such networks together comprising in excess of five million people and ten million friendship links annotated with metadata capturing the time of every event in the life of the network our measurements expose surprising segmentation of these networks into three regions singletons who do not participate in the network isolated communities which overwhelmingly display star structure and giant component anchored by well connected core region which persists even in the absence of stars we present simple model of network growth which captures these aspects of component structure the model follows our experimental results characterizing users as either passive members of the network inviters who encourage offline friends and acquaintances to migrate online and linkers who fully participate in the social evolution of the network
we give sqrt log n approximation algorithm for the sparsest cut edge expansion balanced separator and graph conductance problems this improves the log n approximation of leighton and rao we use well known semidefinite relaxation with triangle inequality constraints central to our analysis is geometric theorem about projections of point sets in rd whose proof makes essential use of phenomenon called measure concentration we also describe an interesting and natural approximate certificate for graph’s expansion which involves embedding an n node expander in it with appropriate dilation and congestion we call this an expander flow
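for reference the relaxation referred to above is commonly written in the following form here for the c balanced separator variant with unit vectors triangle inequalities on squared distances and a spreading constraint the exact normalisation differs between presentations so this is a sketch rather than a restatement of the paper

    \begin{aligned}
    \min\quad & \tfrac{1}{4}\sum_{\{i,j\}\in E}\lVert v_i - v_j\rVert^2 \\
    \text{s.t.}\quad & \lVert v_i\rVert^2 = 1 && \forall i,\\
    & \lVert v_i - v_j\rVert^2 + \lVert v_j - v_k\rVert^2 \ge \lVert v_i - v_k\rVert^2 && \forall i,j,k,\\
    & \sum_{i<j}\lVert v_i - v_j\rVert^2 \ge 4c(1-c)\,n^2 .
    \end{aligned}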
in this paper we discuss how light and temperature information can be designed to affect feedforward in tangible user interface tui in particular we focus on temperature which has not been widely considered as mode of information representation in feedback or feedforward we describe prototype that implements both information modes in tui finally we outline user study in which these modes are explored as feedforward coaching devices for decision making task the expected outcomes are an understanding of the role of temperature as information for feedforward in tuis and set of design guidelines for designers of tangibles working with these physical characteristics
existing estimation approaches for multi dimensional databases often rely on the assumption that data distribution in small region is uniform which seldom holds in practice moreover their applicability is limited to specific estimation tasks under certain distance metric this paper develops the power method comprehensive technique applicable to wide range of query optimization problems under various metrics the power method eliminates the local uniformity assumption and is accurate even in scenarios where existing approaches completely fail furthermore it performs estimation by evaluating only one simple formula with minimal computational overhead extensive experiments confirm that the power method outperforms previous techniques in terms of accuracy and applicability to various optimization scenarios
speed and scalability are two essential issues in data mining and knowledge discovery this paper proposed mathematical programming model that addresses these two issues and applied the model to credit classification problems the proposed multi criteria convex quadric programming mcqp model is highly efficient computing time complexity and scalable to massive problems size of because it only needs to solve linear equations to find the global optimal solution kernel functions were introduced to the model to solve nonlinear problems in addition the theoretical relationship between the proposed mcqp model and svm was discussed
atomicity is key correctness specification for multithreaded programs prior dynamic atomicity analyses include precise tools which report an error if and only if the observed trace is not serializable and imprecise tools which generalize from the observed trace to report errors that might occur on other traces but which may also report false alarms this paper presents sidetrack lightweight online dynamic analysis that generalizes from the observed trace without introducing the potential for false alarms if sidetrack reports an error then some feasible trace of the source program is not serializable experimental results show that this generalization ability increases the number of atomicity violations detected by sidetrack by
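the serializability question that such tools decide per trace can be illustrated with a plain conflict graph check build an edge whenever two accesses from different atomic blocks touch the same location and at least one writes then report a violation if the graph has a cycle the trace format and example are inventions for the sketch not sidetrack's input format

    from collections import defaultdict

    def conflict_serializable(trace):
        """trace: list of (block_id, action, location) with action in {'r', 'w'}."""
        edges = defaultdict(set)
        for i, (b1, a1, l1) in enumerate(trace):
            for b2, a2, l2 in trace[i + 1:]:
                if b1 != b2 and l1 == l2 and "w" in (a1, a2):
                    edges[b1].add(b2)             # b1's access happens before b2's

        def has_cycle(node, on_path, done):
            if node in on_path:
                return True
            if node in done:
                return False
            on_path.add(node)
            if any(has_cycle(n, on_path, done) for n in edges[node]):
                return True
            on_path.remove(node)
            done.add(node)
            return False

        done = set()
        return not any(has_cycle(b, set(), done) for b in list(edges))

    trace = [("A", "r", "x"), ("B", "w", "x"), ("A", "w", "x")]
    print(conflict_serializable(trace))           # False: A and B conflict in both directions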
this paper presents game model of gym training system where the behavior of the system is specified using languages developed originally for reactive system design which drive game engine the approach makes it possible to describe behaviors of different parts of the system using different reactive system design languages and tools it thus provides framework for integrating the model behavior to obtain an executable game model of the entire system among the advantages of this approach is the ability to use existing analysis tools to understand the game behavior at design time and run time the ability to easily modify the behavior and the use of visual languages to allow various stakeholders to be involved in early stages of building the game finally we suggest integrating future games and game design methods into the emerging field of biological modeling to which reactive system design has recently been successfully applied
aggregate nearest neighbor ann queries developing from nearest neighbor nn queries are relatively new query type in spatial database and data mining ann queries return the object that minimizes an aggregate distance function with respect to set of query points because of the multiple query points ann queries are much more complex than nn queries for optimizing the query processing and improving the query efficiency many ann query algorithms utilize pruning strategies in this paper we propose two point projecting based ann query algorithms which can efficiently prune the data points without indexing we project the query points onto special line on which we analyze their distribution and then prune the search space unlike many other algorithms based on the data index mechanisms our algorithms avoid the curse of dimensionality and are effective and efficient in both high dimensional space and metric space we conduct experimental studies using both real dataset and synthetic datasets to compare and evaluate their efficiencies
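the prune then verify flavour of such algorithms can be sketched without any index using a triangle inequality lower bound through the centroid of the query points this is not the projection based method of the paper only a small illustration of how candidate points are discarded before their full aggregate distance is computed

    import math

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def ann_sum(points, queries):
        cx = sum(q[0] for q in queries) / len(queries)
        cy = sum(q[1] for q in queries) / len(queries)
        centroid = (cx, cy)
        spread = sum(dist(centroid, q) for q in queries)
        best, best_cost = None, float("inf")
        for p in points:
            # lower bound: sum_i d(p, q_i) >= n * d(p, centroid) - sum_i d(centroid, q_i)
            if len(queries) * dist(p, centroid) - spread >= best_cost:
                continue                          # pruned without touching every query point
            cost = sum(dist(p, q) for q in queries)
            if cost < best_cost:
                best, best_cost = p, cost
        return best, best_cost

    points = [(0, 0), (5, 5), (2, 1), (9, 9)]
    queries = [(1, 1), (3, 2), (2, 4)]
    print(ann_sum(points, queries))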
the trend towards multicore processors and graphic processing units is increasing the need for software that can take advantage of parallelism writing correct parallel programs using threads however has proven to be quite challenging due to nondeterminism the threads of parallel application may be interleaved nondeterministically during execution which can lead to nondeterministic results some interleavings may produce the correct result while others may not we have previously proposed an assertion framework for specifying that regions of parallel program behave deterministically despite nondeterministic thread interleaving the framework allows programmers to write assertions involving pairs of program states arising from different parallel schedules we propose an algorithm to dynamically infer likely deterministic specifications for parallel programs given set of inputs and schedules we have implemented our specification inference algorithm for java and have applied it to number of previously examined java benchmarks we were able to automatically infer specifications largely equivalent to or stronger than our manual assertions from our previous work we believe that the inference of deterministic specifications can aid in understanding and documenting the deterministic behavior of parallel programs moreover an unexpected deterministic specification can indicate to programmer the presence of erroneous or unintended behavior
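the inference idea generalises easily to a toy setting run the same region under many schedules with the same input record the resulting states and keep the state components that agree across all runs the simulated two thread region below is an invention for the example not the authors' benchmarks

    import itertools

    def run(order):
        """simulate two threads that each increment a shared counter twice and append to a log."""
        counter, log = 0, []
        for tid in order:
            counter += 1
            log.append(tid)
        return {"counter": counter, "log": tuple(log)}

    states = [run(order) for order in itertools.permutations([0, 0, 1, 1])]
    deterministic = {key for key in states[0]
                     if all(s[key] == states[0][key] for s in states)}
    print(deterministic)    # {'counter'}: the counter is deterministic, the interleaving log is not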
the lf logical framework codifies methodology for representing deductive systems such as programming languages and logics within dependently typed lambda calculus in this methodology the syntactic and deductive apparatus of system is encoded as the canonical forms of associated lf types an encoding is correct adequate if and only if it defines compositional bijection between the apparatus of the deductive system and the associated canonical forms given an adequate encoding one may establish metatheoretic properties of deductive system by reasoning about the associated lf representation the twelf implementation of the lf logical framework is convenient and powerful tool for putting this methodology into practice twelf supports both the representation of deductive system and the mechanical verification of proofs of metatheorems about it the purpose of this article is to provide an up to date overview of the lf calculus the lf methodology for adequate representation and the twelf methodology for mechanizing metatheory we begin by defining variant of the original lf language called canonical lf in which only canonical forms long βη normal forms are permitted this variant is parameterized by subordination relation which enables modular reasoning about lf representations we then give an adequate representation of simply typed lambda calculus in canonical lf both to illustrate adequacy and to serve as an object of analysis using this representation we formalize and verify the proofs of some metatheoretic results including preservation determinacy and strengthening each example illustrates significant aspect of using lf and twelf for formalized metatheory
the max planck institute for informatics mpi inf is one of institutes of the max planck society germany’s premier scientific organization for foundational research with numerous nobel prizes in natural sciences and medicine mpi inf hosts about researchers including graduate students and comprises research groups on algorithms and complexity programming logics computational biology and applied algorithmics computer graphics and databases and information systems dbis this report gives an overview of the dbis group’s mission and ongoing research
dealing with inconsistencies and change in requirements engineering is known to be difficult problem we propose formal integrated approach to inconsistency handling and requirements evolution with focus on providing automated support we define novel representation scheme that is expressive and able to maintain several key semantic distinctions based on this scheme we define toolkit of inconsistency handling techniques we define principled process for evolving such specifications with minimal computational cost and user intervention finally we describe the reform system which implements some of these techniques
the continuous broadcast of data together with an index structure is an effective way of disseminating data in wireless mobile environment the index allows mobile client to tune in only when relevant data is available on the channel and leads to reduced power consumption for the clients this paper investigates the execution of queries on broadcasted index trees when query execution corresponds to partial traversal of the tree queries exhibiting this behavior include range queries and nearest neighbor queries we present two broadcast schedules for index trees and two query algorithms executed by mobile clients our solutions simultaneously minimize tuning time and latency and adapt to the client’s available memory experimental results using real and synthetic data compare results for broadcast with node repetition to one without node repetition and they show how priority based data management can help reduce tuning time and latency
designing thermal management strategies that reduce the impact of hot spots and on die temperature variations at low performance cost is very significant challenge for multiprocessor system on chips mpsocs in this work we present proactive mpsoc thermal management approach which predicts the future temperature and adjusts the job allocation on the mpsoc to minimize the impact of thermal hot spots and temperature variations without degrading performance in addition we implement and compare several reactive and proactive management strategies and demonstrate that our proactive temperature aware mpsoc job allocation technique is able to dramatically reduce the adverse effects of temperature at very low performance cost we show experimental results using simulator as well as an implementation on an ultrasparc system
for two universal sets and we define the concept of solitary set for any binary relation from to through the solitary sets we study the further properties that are interesting and valuable in the theory of rough sets as an application of crisp rough set models in two universal sets we find solutions of the simultaneous boolean equations by means of rough set methods we also study the connection between rough set theory and dempster shafer theory of evidence in particular we extend some results to arbitrary binary relations on two universal sets not just serial binary relations we consider the similar problems in fuzzy environment and give an example of application of fuzzy rough sets in multiple criteria decision making in the case of clothes
browser and proxy server caching are effective and relatively inexpensive methods of improving web performance most existing research considers caching to occur independently at the browser and the proxy server when the browser and the proxy server cache independently documents may get duplicated across the two levels this paper analyzes the impact of document duplication on the performance of several browser proxy caching policies we first derive an exact expression and an accurate approximation for the delay under joint browser proxy caching policy in which no duplication is permitted this policy is compared to base or benchmark policy in which caching occurs independently at the two levels and hence duplication of documents is freely permitted we next propose more general caching policy in which controlled amount of duplication is permitted this policy is analyzed and an exact expression and an approximate expression for performance are derived finally simulation study is performed to confirm the accuracy of the theoretical results and extend these results for situations that are difficult to analyze mathematically
semantic web search is new application of recent advances in information retrieval ir natural language processing artificial intelligence and other fields the powerset group in microsoft develops semantic search engine that aims to answer queries not only by matching keywords but by actually matching meaning in queries to meaning in web documents compared to typical keyword search semantic search can pose additional engineering challenges for the back end and infrastructure designs of these the main challenge addressed in this paper is how to lower query latencies to acceptable interactive levels index based semantic search requires more data processing such as numerous synonyms hypernyms multiple linguistic readings and other semantic information both on queries and in the index in addition some of the algorithms can be super linear such as matching co references across document consequently many semantic queries can run significantly slower than the same keyword query users however have grown to expect web search engines to provide near instantaneous results and slow search engine could be deemed unusable even if it provides highly relevant results it is therefore imperative for any search engine to meet its users interactivity expectations or risk losing them our approach to tackle this challenge is to exploit data parallelism in slow search queries to reduce their latency in multi core systems although all search engines are designed to exploit parallelism at the single node level this usually translates to throughput oriented task parallelism this paper focuses on the engineering of two latency oriented approaches coarse and fine grained and compares them to the task parallel approach we use powerset’s deployed search engine to evaluate the various factors that affect parallel performance workload overhead load balancing and resource contention we also discuss heuristics to selectively control the degree of parallelism and consequent overhead on query by query level our experimental results show that using fine grained parallelism with these dynamic heuristics can significantly reduce query latencies compared to fixed coarse granularity parallelization schemes although these results were obtained on and optimized for powerset’s semantic search they can be readily generalized to wide class of inverted index search engines
in this paper we study the problem of schema exchange natural extension of the data exchange problem to an intensional level to this end we first introduce the notion of schema template tool for the representation of class of schemas sharing the same structure we then define the schema exchange notion as the problem of i taking schema that matches source template and ii generating new schema for target template on the basis of set of dependencies defined over the two templates this framework allows the definition once and for all of generic transformations that work for several schemas method for the generation of correct solution of the schema exchange problem is proposed and number of general results are given we also show how it is possible to generate automatically data exchange setting from schema exchange solution this allows the definition of queries to migrate data from source database into the one obtained as result of schema exchange
energy efficient computing has recently become hot research area many works have been carried out on conserving energy but few of them consider energy efficiency in grid computing this paper proposes energy efficient resource management in mobile grid the objective of energy efficient resource management in mobile grid is to maximize the utility of the mobile grid which is denoted as the sum of grid application utility the utility function models benefits of application and system by using nonlinear optimization theory energy efficient resource management in mobile grid can be formulated as multi objective optimization problem in order to derive distributed algorithm to solve global optimization problem in mobile grid we decompose the problem into sub problems the proposed energy efficient resource management algorithm decomposes the optimization problem via iterative method to test the performance of the proposed algorithm the simulations are conducted to compare proposed energy efficient resource management algorithm with other energy aware scheduling algorithm
xml publishing has been an emerging technique for transforming portions of relational database into an xml document for example to facilitate interoperability between heterogeneous applications such applications may update the xml document and the source relational database must be updated accordingly in this paper we consider such xml documents as possibly recursively defined xml views of relations we propose new optimization techniques to efficiently support xml view updates specified via an xpath expression with recursion and complex filters the main novelties of our techniques are we propose space efficient relational encoding of recursive xml views and we push the bulk of update processing inside relational database specifically compressed representation of the xml views is stored as extended shared inlining relations space efficient and updatable hop index is used to optimize xpath evaluation on xml views updates of the xml views are evaluated on these relations and index view update translation is handled by heuristic procedure inside relational database as opposed to previous middleware approaches we present an experimental study to demonstrate the effectiveness of our proposed techniques
there are many current classifications and taxonomies relating to computer security one missing classification is the trustworthiness of information being received by the security system which we define this new classification along with timeliness of detection and security level of the security system present motivation for hardware based security solutions including hardware is not an automatic solution to the limitations of software solutions advantages are only gained from hardware through design that ensures at least first hand information dedicated monitors explicit hardware communication dedicated storage and dedicated security processors
this paper presents an experimental research that focuses on collaboration in multi player game the aim of the project is to study the cognitive impacts of awareness tools ie artifacts that allow users of collaborative system to be aware of what is going on in the joint virtual environment the focus is on finding an effect on performance as well as on the representation an individual builds of what his partner knows plans and intends to do ie mutual modeling we find that using awareness tools has significant effect by improving task performance however the players who were provided with this tool did not show any improvement of their mutual modeling further analysis on contrasted groups revealed that there was an effect of the awareness tool on mutual modeling for players who spent large amount of time using the tool
we present confocal stereo new method for computing shape by controlling the focus and aperture of lens the method is specifically designed for reconstructing scenes with high geometric complexity or fine scale texture to achieve this we introduce the confocal constancy property which states that as the lens aperture varies the pixel intensity of visible in focus scene point will vary in scene independent way that can be predicted by prior radiometric lens calibration the only requirement is that incoming radiance within the cone subtended by the largest aperture is nearly constant first we develop detailed lens model that factors out the distortions in high resolution slr cameras mp or more with large aperture lenses eg this allows us to assemble an aperture focus image afi for each pixel that collects the undistorted measurements over all apertures and focus settings in the afi representation confocal constancy reduces to color comparisons within regions of the afi and leads to focus metrics that can be evaluated separately for each pixel we propose two such metrics and present initial reconstruction results for complex scenes as well as for scene with known ground truth shape
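the per pixel use of confocal constancy can be pictured with a toy aperture focus image here simplified so that the in focus setting is the one whose intensities stay constant across apertures after normalising out exposure the synthetic afi array is an invention and the real method relies on a calibrated radiometric lens model rather than this shortcut

    import numpy as np

    def focus_metric(afi):
        """per focus setting: spread across apertures after factoring out exposure (lower = better)."""
        normalised = afi / afi.mean(axis=0, keepdims=True)
        return normalised.std(axis=0)

    rng = np.random.default_rng(0)
    afi = rng.uniform(90, 110, size=(5, 7))    # 5 apertures x 7 focus settings for one pixel
    afi[:, 3] = 100.0                          # the in-focus column behaves predictably (here: constant)
    best_focus = int(np.argmin(focus_metric(afi)))
    print("estimated focus setting:", best_focus)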
while we should celebrate our success at evolving many vital aspects of the human technology interactive experience we question the scope of this progress step back with us for moment what really matters everyday life spans wide range of emotions and experiences from improving productivity and efficiency to promoting wonderment and daydreaming but our research and designs do not reflect this important life balance the research we undertake and the applications we build employ technology primarily for improving tasks and solving problems our claim is that our successful future technological tools the one we really want to cohabitate with will be those that incorporate the full range of life experiences in this paper we present wonderment as design concept introduce novel toolkit based on mobile phone technology for promoting non experts to participate in the creating of new objects of wonderment and finally describe probe style interventions used to inform the design of specific object of wonderment based on urban sounds and ringtones called hullabaloo
feature models are popular variability modeling notation used in product line engineering automated analyses of feature models such as consistency checking and interactive or offline product selection often rely on translating models to propositional logic and using satisfiability sat solvers efficiency of individual satisfiability based analyses has been reported previously we generalize and quantify these studies with series of independent experiments we show that previously reported efficiency is not incidental unlike with the general sat instances which fall into easy and hard classes the instances induced by feature modeling are easy throughout the spectrum of realistic models in particular the phenomenon of phase transition is not observed for realistic feature models our main practical conclusion is general encouragement for researchers to continued development of sat based methods to further exploit this efficiency in future
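the translation the experiments rely on can be seen on a tiny model below a car feature model with a mandatory engine an alternative group and an optional feature is encoded as clauses and checked by brute force enumeration a real analysis would hand the same clauses to a sat solver the model and the encoding conventions are assumptions for the example

    from itertools import product

    features = ["car", "engine", "electric", "gasoline", "navigation"]
    clauses = [
        ["car"],                                   # root feature is selected
        ["-engine", "car"], ["-car", "engine"],    # engine is mandatory under car
        ["-electric", "engine"], ["-gasoline", "engine"],
        ["-engine", "electric", "gasoline"],       # alternative group: at least one choice
        ["-electric", "-gasoline"],                # alternative group: at most one choice
        ["-navigation", "car"],                    # optional feature implies its parent
    ]

    def satisfied(clause, assignment):
        return any(assignment[lit.lstrip("-")] != lit.startswith("-") for lit in clause)

    def models():
        for bits in product([False, True], repeat=len(features)):
            a = dict(zip(features, bits))
            if all(satisfied(c, a) for c in clauses):
                yield a

    print(sum(1 for _ in models()), "valid products")   # 4: two engine choices times optional navigation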
the use of the grid for interactive applications imposes new requirements on tools supporting performance analysis which are not well addressed by tools available in the area of parallel and distributed programming the two most important demands are an on line mode of operation and the ability to measure user defined application specific metrics such as for example the response time of specific user interaction the paper starts with an analysis of these requirements which is based on real world applications developed in the european crossgrid project the most important feature of these applications is the presence of person in the main computing loop who inspects intermediate results and controls the further behavior of the application based on the results of the requirements analysis new performance analysis tool gpm is currently being implemented the paper outlines the main ideas as well as some design details of gpm with focus on the concept of user defined metrics the possible problems of this concept and some of its implementation aspects
data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data discovering associations between items in large database is one such data mining activity in finding associations support is used as an indicator as to whether an association is interesting in this paper we discuss three alternative interest measures for associations any confidence all confidence and bond we prove that the important downward closure property applies to both all confidence and bond we show that downward closure does not hold for any confidence we also prove that if associations have minimum all confidence or minimum bond then those associations will have given lower bound on their minimum support and the rules produced from those associations will have given lower bound on their minimum confidence as well however associations that have that minimum support and likewise their rules that have minimum confidence may not satisfy the minimum all confidence or minimum bond constraint we describe the algorithms that efficiently find all associations with minimum all confidence or minimum bond and present some experimental results
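the three measures are easy to state directly over a transaction database the sketch below follows the usual reading any confidence as the best rule confidence obtainable from the itemset all confidence as the worst and bond as support relative to the transactions touched by any item of the set the toy transactions are invented and supports are absolute counts

    def supp(itemset, transactions):
        return sum(1 for t in transactions if itemset <= t)

    def measures(itemset, transactions):
        s = supp(itemset, transactions)
        singles = [supp({i}, transactions) for i in itemset]
        # confidence of a rule (X \ {i}) -> i is supp(X) / supp(X \ {i})
        subsets = [supp(itemset - {i}, transactions) for i in itemset]
        covered = sum(1 for t in transactions if itemset & t)   # transactions with any item of the set
        return {
            "any_confidence": s / min(subsets),
            "all_confidence": s / max(singles),
            "bond": s / covered,
        }

    transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
    print(measures({"a", "b"}, transactions))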
the definition of methodologies for checking database constraint satisfiability ie the absence of contradictions independently of any database state is fundamental and critical problem that has been marginally addressed in the literature in this paper sound and complete algorithm is proposed for checking the satisfiability of specific class of database integrity constraints in simplified object oriented model such class includes cardinality constraints set and bag attributes and explicit integrity constraints involving comparison operators the algorithm conceived to support database schema design allows the designer to distinguish among five different kinds of contradictions each identified by an ad hoc procedure
this paper exhibits the power of programming with dependent types by dint of embedding three domain specific languages cryptol language for cryptographic protocols small data description language and relational algebra each example demonstrates particular design patterns inherent to dependently typed programming documenting these techniques paves the way for further research in domain specific embedded type systems
there have been many approaches proposed for role mining however the problems solved often differ due to lack of consensus on the formal definition of the role mining problem in this paper we provide detailed analysis of the requirements for role mining the existing definitions of role mining and the methods used to assess role mining results given basic assumptions on how access control configurations are generated we propose novel definition of the role mining problem that fulfills the requirements that real world enterprises typically have in this way we recast role mining as prediction problem
time is one of the most relevant topics in ai it plays major role in several areas ranging from logical foundations to applications of knowledge based systems in this paper we survey wide range of research in temporal representation and reasoning without committing ourselves to the point of view of any specific application the organization of the paper follows the commonly recognized division of the field in two main subfields reasoning about actions and change and reasoning about temporal constraints we give an overview of the basic issues approaches and results in these two areas and outline relevant recent developments furthermore we briefly analyze the major emerging trends in temporal representation and reasoning as well as the relationships with other well established areas such as temporal databases and logic programming
real time process algebra enhanced with specific constructs for handling cryptographic primitives is proposed to model cryptographic protocols in simple way we show that some security properties such as authentication and secrecy can be re formulated in this timed setting moreover we show that they can be seen as suitable instances of general information flow like scheme called timed generalized non deducibility on compositions tgndc parametric wrt the observational semantics of interest we show that when considering timed trace semantics there exists most powerful hostile environment or enemy that can try to compromise the protocol moreover we present couple of compositionality results for tgndc one of which is time dependent and show their usefulness by means of case study
online market intelligence omi in particular competitive intelligence for product pricing is very important application area for web data extraction however omi presents non trivial challenges to data extraction technology sophisticated and highly parameterized navigation and extraction tasks are required on the fly data cleansing is necessary in order to identify identical products from different suppliers it must be possible to smoothly define data flow scenarios that merge and filter streams of extracted data stemming from several web sites and store the resulting data into data warehouse where the data is subjected to market intelligence analytics finally the system must be highly scalable in order to be able to extract and process massive amounts of data in short time lixto www.lixto.com company offering data extraction tools and services has been providing omi solutions for several customers in this paper we show how lixto has tackled each of the above challenges by improving and extending its original data extraction software most importantly we show how high scalability is achieved through cloud computing this paper also features case study from the computers and electronics market
details in mesh animations are difficult to generate but they have great impact on visual quality in this work we demonstrate practical software system for capturing such details from multi view video recordings given stream of synchronized video images that record human performance from multiple viewpoints and an articulated template of the performer our system captures the motion of both the skeleton and the shape the output mesh animation is enhanced with the details observed in the image silhouettes for example performance in casual loose fitting clothes will generate mesh animations with flowing garment motions we accomplish this with fast pose tracking method followed by nonrigid deformation of the template to fit the silhouettes the entire process takes less than sixteen seconds per frame and requires no markers or texture cues captured meshes are in full correspondence making them readily usable for editing operations including texturing deformation transfer and deformation model learning
understanding the implementation of certain feature of system requires identification of the computational units of the system that contribute to this feature in many cases the mapping of features to the source code is poorly documented in this paper we present semiautomatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior the mapping is in general not injective that is computational unit may contribute to several features our technique allows for the distinction between general and specific computational units with respect to given set of features for set of features it also identifies jointly and distinctly required computational units the presented technique combines dynamic and static analyses to rapidly focus on the system’s parts that relate to specific set of features dynamic information is gathered based on set of scenarios invoking the features rather than assuming one to one correspondence between features and scenarios as in earlier work we can now handle scenarios that invoke many features furthermore we show how our method allows incremental exploration of features while preserving the mental map the analyst has gained through the analysis
in this paper steganalysis technique is proposed for pixel value differencing method this steganographic method which is immune against conventional attacks performs the embedding in the difference of the values of pixel pairs therefore the histogram of the differences of an embedded image is different as compared with cover image number of characteristics are identified in the difference histogram that show meaningful alterations when an image is embedded five distinct multilayer perceptrons neural networks are trained to detect different levels of embedding every image is fed to all networks and voting system categorizes the image as stego or cover the implementation results indicate success in correct categorization of the test images that contained more than embedding furthermore using neural network an estimator is presented which gives an estimate of the amount of the mpvd embedding in an image implementation of the estimator showed an average accuracy of in the estimation of the amount of embedding
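a small python sketch of the kind of difference histogram such a detector can build assuming horizontally adjacent pixel pairs the exact pairing scheme and the features fed to the neural networks in the paper may differ

```python
import numpy as np

def difference_histogram(image):
    """histogram of differences between horizontally adjacent pixel pairs
    (illustrative; the paper's pairing scheme and features may differ)"""
    img = np.asarray(image, dtype=np.int16)
    diffs = (img[:, 1::2] - img[:, 0::2]).ravel()    # non-overlapping pairs
    hist, _ = np.histogram(diffs, bins=np.arange(-255, 257))
    return hist / hist.sum()    # normalized histogram fed to the classifiers

gray = np.random.randint(0, 256, size=(64, 64))
features = difference_histogram(gray)
```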
nowadays nearly every field worker has mobile phone and or another kind of mobile device and an abundant hardware infrastructure for mobile business applications exists the common advances in the area of wireless communication technologies facilitate the realization of mobile business applications however the utilization of this infrastructure is an expensive task due to resource restrictions and heterogeneity of mobile devices specialized applications have to be developed for each kind of mobile device based on the service oriented paradigm we designed reconfigurable middleware for mobile devices which deals with the respective heterogeneity service oriented architecture soa allows to invoke services running on remote devices whereas the tier application approach is separating applications into particular tiers up to now the location of each tier in multi tier architectures was static with our approach the runtime redistribution of tiers is possible for each user application or device without additional implementation efforts in this paper we present the middleware design and our prototype recso we describe in depth the service invocation and with scenario show the advantages of our approach
we propose an energy balanced allocation of real time application onto single hop cluster of homogeneous sensor nodes connected with multiple wireless channels an epoch based application consisting of set of communicating tasks is considered each sensor node is equipped with discrete dynamic voltage scaling dvs the time and energy costs of both computation and communication activities are considered we propose both an integer linear programming ilp formulation and polynomial time phase heuristic our simulation results show that for small scale problems with tasks up to lifetime improvement is achieved by the ilp based approach compared with the baseline where no dvs is used also the phase heuristic achieves up to of the system lifetime obtained by the ilp based approach for large scale problems with tasks up to lifetime improvement can be achieved by the phase heuristic we also incorporate techniques for exploring the energy latency tradeoffs of communication activities such as modulation scaling which leads to lifetime improvement in our simulations simulations were further conducted for two real world problems lu factorization and fast fourier transformation fft compared with the baseline where neither dvs nor modulation scaling is used we observed up to lifetime improvement for the lu factorization algorithm and up to improvement for fft
automatic categorization is viable method to deal with the scaling problem on the world wide web for web site classification this paper proposes the use of web pages linked with the home page in different manner from the sole use of home pages in previous research to implement our proposed method we derive scheme for web site classification based on the nearest neighbor nn approach it consists of three phases web page selection connectivity analysis web page classification and web site classification given web site the web page selection chooses several representative web pages using connectivity analysis the nn classifier next classifies each of the selected web pages finally the classified web pages are extended to classification of the entire web site to improve performance we supplement the nn approach with feature selection method and term weighting scheme using markup tags and also reform its document document similarity measure in our experiments on korean commercial web directory the proposed system using both home page and its linked pages improved the performance of micro averaging breakeven point by compared with an ordinary classification which uses home page only
context and motivation creativity is indispensable for software systems to deliver progress and competitive advantage for stakeholders yet it is rarely supported in requirements processes question problem this paper investigated integration of two software tools one for generating requirements with scenarios the other for supporting people to think creatively while finding and collecting information the effectiveness of the integration was investigated principal ideas results the technical integration is described and an evaluation is reported contribution results reveal some effect on the novelty of the requirements generated and have implications for the design of tools to support creative requirements processes
the so called redundancy based approach to question answering represents successful strategy for mining answers to factoid questions such as who shot abraham lincoln from the world wide web through contrastive and ablation experiments with aranea system that has performed well in several trec qa evaluations this work examines the underlying assumptions and principles behind redundancy based techniques specifically we develop two theses that stable characteristics of data redundancy allow factoid systems to rely on external black box components and that despite embodying data driven approach redundancy based methods encode substantial amount of knowledge in the form of heuristics overall this work attempts to address the broader question of what really matters and to provide guidance for future researchers
in this paper we present temporal epistemic logic called μtel which generalizes μ calculus by introducing knowledge modality and cooperation modality similar to μ calculus μtel is succinct and expressive language it is shown that temporal modalities such as always sometime and until and knowledge modalities such as everyone knows and common knowledge can be expressed in such logic furthermore we study the model checking technique and its complexity finally we use μtel and its model checking algorithm to study the well known trains and controller problem
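for reference the standard mu calculus encodings of the temporal modalities mentioned above are sketched below in latex the concrete syntax of μtel and its knowledge modalities may differ

```latex
% standard mu-calculus encodings of the temporal operators
% (illustrative; mu-TEL's own syntax may differ)
\begin{align*}
\mathbf{AG}\,\varphi &\equiv \nu X.\,\varphi \wedge \mathbf{AX}\,X \quad\text{(always)} \\
\mathbf{EF}\,\varphi &\equiv \mu X.\,\varphi \vee \mathbf{EX}\,X \quad\text{(sometime)} \\
\mathbf{E}[\varphi\,\mathbf{U}\,\psi] &\equiv \mu X.\,\psi \vee (\varphi \wedge \mathbf{EX}\,X) \quad\text{(until)}
\end{align*}
```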
parallel corpora have become an essential resource for work in multilingual natural language processing in this article we report on our work using the strand system for mining parallel text on the world wide web first reviewing the original algorithm and results and then presenting set of significant enhancements these enhancements include the use of supervised learning based on structural features of documents to improve classification performance new content based measure of translational equivalence and adaptation of the system to take advantage of the internet archive for mining parallel text from the web on large scale finally the value of these techniques is demonstrated in the construction of significant parallel corpus for low density language pair
service oriented architecture with underlying technologies like web services and web service orchestration opens new vistas for integration among business processes operating in heterogeneous environments however such dynamic collaborations require highly secure environment at each respective business partner site existing web services standards address the issue of security only on the service provider platform the partner platforms to which sensitive information is released have till now been neglected remote attestation is relatively new field of research which enables an authorized party to verify that trusted environment actually exists on partner platform to incorporate this novel concept in to the web services realm new mechanism called ws attestation has been proposed this mechanism provides structural paradigm upon which more fine grained solutions can be built in this paper we present novel framework behavioral attestation for web services in which xacml is built on top of ws attestation in order to enable more flexible remote attestation at the web services level we propose new type of xacml policy called xacml behavior policy which defines the expected behavior of partner platform existing web service standards are used to incorporate remote attestation at the web services level and prototype is presented which implements xacml behavior policy using low level attestation techniques
templates in web sites hurt search engine retrieval performance especially in content relevance and link analysis current template removal methods suffer from processing speed and scalability when dealing with large volume web pages in this paper we propose novel two stage template detection method which combines template detection and removal with the index building process of search engine first web pages are segmented into blocks and blocks are clustered according to their style features second similar contents sharing the common layout style are detected during the index building process the blocks with similar layout style and content are identified as templates and deleted our experiment on eight popular web sites shows that our method achieves faster than shingle and sst methods with close accuracy
the term autonomic networking refers to network level software systems capable of self management according to the principles outlined by the autonomic computing initiative autonomicity is widely recognized as crucial property to harness the growing complexity of current networked systems in this paper we present review of state of the art techniques for the automated creation and evolution of software with application to network level functionalities the main focus of the survey are biologically inspired bottom up approaches in which complexity is grown from interactions among simpler units first we review evolutionary computation highlighting aspects that apply to the automatic optimization of computer programs in online dynamic environments then we review chemical computing discussing its suitability as execution model for autonomic software undergoing self optimization by code rewriting last we survey approaches inspired by embryology in which artificial entities undergo developmental process the overview is completed by an outlook into the major technical challenges for the application of the surveyed techniques to autonomic systems
most of current facial animation editing techniques are frame based approaches ie manually edit one keyframe every several frames which is ineffective time consuming and prone to editing inconsistency in this paper we present novel facial editing style learning framework that is able to learn constraint based gaussian process model from small number of facial editing pairs and then it can be effectively applied to automate the editing of the remaining facial animation frames or transfer editing styles between different animation sequences comparing with the state of the art multiresolution based mesh sequence editing technique our approach is more flexible powerful and adaptive our approach can dramatically reduce the manual efforts required by most of current facial animation editing approaches
with the increasing importance of search in guiding today’s web traffic more and more effort has been spent to create search engine spam since link analysis is one of the most important factors in current commercial search engines ranking systems new kinds of spam aiming at links have appeared building link farms is one technique that can deteriorate link based ranking algorithms in this paper we present algorithms for detecting these link farms automatically by first generating seed set based on the common link set between incoming and outgoing links of web pages and then expanding it links between identified pages are re weighted providing modified web graph to use in ranking page importance experimental results show that we can identify most link farm spam pages and the final ranking results are improved for almost all tested queries
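a minimal python sketch of the seed generation idea described above flagging pages whose incoming and outgoing link sets overlap heavily the graph representation and threshold are illustrative assumptions

```python
def link_farm_seeds(in_links, out_links, min_common=3):
    """flag pages whose incoming and outgoing link sets share many members
    (illustrative seed-generation step; the threshold is an assumption)"""
    seeds = set()
    for page, outs in out_links.items():
        common = set(outs) & set(in_links.get(page, ()))
        if len(common) >= min_common:
            seeds.add(page)
    return seeds

# toy graph: p1 exchanges links with p2, p3, p4 -> likely link farm member
out_links = {"p1": {"p2", "p3", "p4"}, "p2": {"p1"}}
in_links = {"p1": {"p2", "p3", "p4"}, "p2": {"p1"}}
print(link_farm_seeds(in_links, out_links))    # {'p1'}
```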
refinement is functionality addition to software project that can affect multiple dispersed implementation entities functions classes etc in this paper we examine large scale refinements in terms of fundamental object oriented technique called collaboration based design we explain how collaborations can be expressed in existing programming languages or can be supported with new language constructs which we have implemented as extensions to the java language we present specific expression of large scale refinements called mixin layers and demonstrate how it overcomes the scalability difficulties that plagued prior work we also show how we used mixin layers as the primary implementation technique for building an extensible java compiler jts
this paper presents novel knowledge based method for measuring semantic similarity in support of applications aimed at organizing and retrieving relevant textual information we show how quantitative context may be established for what is essentially qualitative in nature by effecting topological transformation of the lexicon into metric space where distance is well defined we illustrate the technique with simple example and report on promising experimental results with significant word similarity problem
association mining techniques search for groups of frequently co occurring items in market basket type of data and turn these groups into business oriented rules previous research has focused predominantly on how to obtain exhaustive lists of such associations however users often prefer quick response to targeted queries for instance they may want to learn about the buying habits of customers that frequently purchase cereals and fruits to expedite the processing of such queries we propose an approach that converts the market basket database into an itemset tree experiments indicate that the targeted queries are answered in time that is roughly linear in the number of market baskets also the construction of the itemset tree has space and time requirements some useful theoretical properties are proven
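a toy python sketch of answering a targeted support query over a prefix tree of sorted transactions the node layout below is an illustrative simplification and not the exact itemset tree of the paper

```python
class Node:
    def __init__(self):
        self.count = 0        # transactions ending exactly at this node
        self.children = {}    # item -> Node

def insert(root, transaction):
    """insert a transaction with its items in a canonical sorted order"""
    node = root
    for item in sorted(transaction):
        node = node.children.setdefault(item, Node())
    node.count += 1

def subtree_count(node):
    return node.count + sum(subtree_count(c) for c in node.children.values())

def support_count(node, target):
    """count transactions containing every item of the sorted target tuple
    (simple traversal; the real itemset tree prunes far more aggressively)"""
    if not target:
        return subtree_count(node)
    total = 0
    for item, child in node.children.items():
        if item < target[0]:
            total += support_count(child, target)
        elif item == target[0]:
            total += support_count(child, target[1:])
        # item > target[0]: target[0] cannot occur deeper on this sorted path
    return total

root = Node()
for t in [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a"}]:
    insert(root, t)
print(support_count(root, ("a", "b")))    # -> 2
```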
any successful solution to using multicore processors to scale general purpose program performance will have to contend with rising intercore communication costs while exposing coarse grained parallelism recently proposed pipelined multithreading pmt techniques have been demonstrated to have general purpose applicability and are also able to effectively tolerate inter core latencies through pipelined interthread communication these desirable properties make pmt techniques strong candidates for program parallelization on current and future multicore processors and understanding their performance characteristics is critical to their deployment to that end this paper evaluates the performance scalability of general purpose pmt technique called decoupled software pipelining dswp and presents thorough analysis of the communication bottlenecks that must be overcome for optimal dswp scalability
scenarios are widely used as requirements and the quality of requirements is an important factor in the efficiency and success of development project the informal nature of scenarios requires that analysts do much manual work with them and much tedious and detailed effort is needed to make collection of scenarios well defined relatively complete minimal and coherent we discuss six aspects of scenarios having inherent structure on which automated support may be based and the results of using such support this automated support frees analysts to concentrate on tasks requiring human intelligence resulting in higher quality scenarios for better system requirements two studies validating the work are presented
virtual data centers allow the hosting of virtualized infrastructures networks storage machines that belong to several customers on the same physical infrastructure virtualization theoretically provides the capability for sharing the infrastructure among different customers in reality however this is rarely if ever done because of security concerns major challenge in allaying such concerns is the enforcement of appropriate customer isolation as specified by high level security policies at the core of this challenge is the correct configuration of all shared resources on multiple machines to achieve this overall security objective to address this challenge this paper presents security architecture for virtual data centers based on virtualization and trusted computing technologies our architecture aims at automating the instantiation of virtual infrastructure while automatically deploying the corresponding security mechanisms this deployment is driven by global isolation policy and thus guarantees overall customer isolation across all resources we have implemented prototype of the architecture based on the xen hypervisor
as modern software based systems and applications gain in versatility and functionality the ability to manage inconsistent resources and service disparate user requirements becomes increasingly imperative furthermore as systems increase in complexity rectification of system faults and recovery from malicious attacks become more difficult labor intensive expensive and error prone these factors have actuated research dealing with the concept of self healing systems self healing systems attempt to heal themselves in the sense of recovering from faults and regaining normative performance levels independently the concept derives from the manner in which biological system heals wound such systems employ models whether external or internal to monitor system behavior and use inputs obtained therefrom to adapt themselves to the run time environment researchers have approached this concept from several different angles this paper surveys research in this field and proposes strategy of synthesis and classification
one of the key ideas underlying web services is that of allowing the combination of existing services published on the web into new service that achieves some higher level functionality and satisfies some business goals as the manual development of the new composite service is recognized as difficult and error prone task the automated synthesis of the composition is considered one of the key challenges in the field of web services in this paper we will present survey of existing approaches for the synthesis of web service compositions we will then focus on specific approach the astro approach which has been shown to support complex composition requirements and to be applicable in real domains in the paper we will present the formal framework behind the astro approach we will present the implementation of the framework and its integration within commercial toolkit for developing web services we will finally evaluate the approach on real world composition domain
given an node graph and subset of terminal nodes the np hard steiner tree problem is to compute minimum size tree which spans the terminals all the known algorithms for this problem which improve on trivial time enumeration are based on dynamic programming and require exponential space motivated by the fact that exponential space algorithms are typically impractical in this paper we address the problem of designing faster polynomial space algorithms our first contribution is simple polynomial space logk time algorithm based on variant of the classical tree separator theorem this improves on trivial enumeration for roughly combining the algorithm above for small with an improved branching strategy for large we obtain an time polynomial space algorithm the refined branching is based on charging mechanism which shows that for large values of convenient local configurations of terminals and non terminals must exist the analysis of the algorithm relies on the measure conquer approach the non standard measure used here is linear combination of the number of nodes and number of non terminals as byproduct of our work we also improve the exponential space time complexity of the problem from to
preliminary framework termed as dlgo that enables editable and portable personal digital libraries is presented for mobile offline users of digital libraries dlgo can package digital libraries into mobile storage devices such as flash drives along with needed application softwares eg wiki and dbms de compress contents of digital libraries to address storage constraints of mobile users when needed enables users to add delete and update entities of digital libraries using wiki framework and share sync edited contents with other dlgo users and the server using web services and rss framework
technological advances and increasingly complex and dynamic application behavior argue for revisiting mechanisms that adapt logical cache block size to application characteristics this approach to bridging the processor memory performance gap has been studied before but mostly via trace driven simulation looking only at caches given changes in hardware software technology we revisit the general approach we propose transparent phase adaptive low complexity mechanism for superloading and evaluate it on full system simulator for spec cpu codes targeting benefits instruction and data fetches we investigate cache blocks of confirming that no fixed size performs well for all applications differences range from between best and worst fixed block sizes our scheme obtains performance similar to the per application best static block size in few cases we minimally decrease performance compared to the best static size but best size varies per application and rarely matches real hardware we generally improve performance over best static choices by up to phase adaptability particularly benefits multiprogrammed workloads with conflicting locality characteristics yielding performance gains of our approach also outperforms next line and delta prefetching
implicit representation of graphs is coding of the structure of graphs using distinct labels so that adjacency between any two vertices can be decided by inspecting their labels alone all previous implicit representations of planar graphs were based on the classical three forests decomposition technique aka schnyder’s trees yielding asymptotically to log bit label representation where is the number of vertices of the graph we propose new implicit representation of planar graphs using asymptotically log bit labels as byproduct we have an explicit construction of graph with vertices containing all vertex planar graphs as induced subgraph the best previous size of such induced universal graph was more generally for graphs excluding fixed minor we construct log log log implicit representation for treewidth graphs we give log log log implicit representation improving the log representation of kannan naor and rudich stoc our representations for planar and treewidth graphs are easy to implement all the labels can be constructed in log time and support constant time adjacency testing
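the flavour of such labeling schemes can be illustrated with the classical forest decomposition idea mentioned above each vertex stores its own id plus the ids of its parents in a constant number of forests and adjacency is decided from two labels alone the python below is a simplified illustration of that classical style not the new scheme proposed in the paper

```python
def make_labels(parents_per_forest, n):
    """label(v) = (v, tuple of v's parents, one per forest); planar graphs
    admit edge decompositions into a constant number of forests"""
    return {v: (v, tuple(f.get(v) for f in parents_per_forest)) for v in range(n)}

def adjacent(label_u, label_v):
    # u and v are adjacent iff one is the parent of the other in some forest
    u, u_parents = label_u
    v, v_parents = label_v
    return v in u_parents or u in v_parents

# toy example: the 4-cycle 0-1-2-3-0 split into two forests (child -> parent)
forests = [{1: 0, 2: 1, 3: 2}, {0: 3}]
labels = make_labels(forests, 4)
print(adjacent(labels[0], labels[1]), adjacent(labels[0], labels[2]))  # True False
```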
in this paper we propose new similarity measure to compute the pairwise similarity of text based documents based on suffix tree document model by applying the new suffix tree similarity measure in group average agglomerative hierarchical clustering gahc algorithm we developed new suffix tree document clustering algorithm nstc experimental results on two standard document clustering benchmark corpus ohsumed and rcv indicate that the new clustering algorithm is very effective document clustering algorithm comparing with the results of traditional word term weight tf idf similarity measure in the same gahc algorithm nstc achieved an improvement of on the average of measure score furthermore we apply the new clustering algorithm in analyzing the web documents in online forum communities topic oriented clustering algorithm is developed to help people in assessing classifying and searching the web documents in large forum community
schematic heterogeneity arises when information that is represented as data under one schema is represented within the schema as metadata in another schematic heterogeneity is an important class of heterogeneity that arises frequently in integrating legacy data in federated or data warehousing applications traditional query languages and view mechanisms are insufficient for reconciling and translating data between schematically heterogeneous schemas higher order query languages that permit quantification over schema labels have been proposed to permit querying and restructuring of data between schematically disparate schemas we extend this work by considering how these languages can be used in practice specifically we consider restricted class of higher order views and show the power of these views in integrating legacy structures our results provide insights into the properties of restructuring transformations required to resolve schematic discrepancies in addition we show how the use of these views permits schema browsing and new forms of data independence that are important for global information systems furthermore these views provide framework for integrating semi structured and unstructured queries such as keyword searches into structured querying environment we show how these views can be used with minimal extensions to existing query engines we give conditions under which higher order view is usable for answering query and provide query translation algorithms
mobile business research has arguably grown to become one of the most topical and complex ebusiness research areas in recent years as result researchers face plethora of interdisciplinary research challenges understanding the range of these challenges and confronting them requires coordinated research efforts backed up by holistic guiding approach this paper aims at contributing to the future of mobile business research by proposing roadmap to systematise and guide future research efforts providing methodical outlook to open research issues across all dimensions defining mobile business and prioritises future research in each dimension in the form of short medium and long term research challenges
although many efficient concurrency control protocols have been proposed for real time database systems they are mainly restricted for systems with single type of real time transactions or mixed set of soft real time and non real time transactions only their performance objective usually aims at the minimization of the number of missed deadlines of soft real time transactions or to guarantee the deadline satisfaction of hard real time transactions so far there is still lack of any good study on the design of concurrency control strategies for mixed real time database systems mrtdbs which consist of both hard and soft real time transactions and together with non real time transactions due to the very different performance requirements of hard and soft real time transactions existing real time concurrency control protocols may not be suitable to mrtdbs in this paper we propose strategies for resolving data conflicts between different types of transactions in an mrtdbs so that the performance requirements of each individual transaction type can be satisfied and at the same time the overall system performance can be improved the performance of the proposed strategies is evaluated and compared with real time optimistic approach which has been shown to give better performance than the lock based protocols for soft and firm real time transactions
web services provide promising framework for developing interoperable software components that interact with each other across organizational boundaries for this framework to be successful the client and the server for service have to interact with each other based on the published service interface specification if either the client or the server deviate from the interface specification the client server interaction will lead to errors we present framework for checking interface conformance for web services given an interface specification we automatically generate web service server stubs for client verification and drivers for server verification and then use these stubs and drivers to check the conformance of the client and server to the interface specification we implemented this framework by using interface grammars as the interface specification language we developed an interface compiler that automatically generates stub or driver from given interface grammar we conducted case study by applying these techniques to the amazon commerce service
as mathematical models are increasingly adopted for corporate decision making problems arise in departmental information sharing and collaboration around model management systems when multiple departments are involved in decision making problems it is necessary to consider the individual conditions of the multiple departments as whole from an entire organizational perspective however in functionally decentralized organizations the operational data is usually dispersed in individual departments thus the departmental decisions are made separately on the basis of limited information and perspective this paper proposes an object oriented data model for developing collaborative model management system that facilitates not only sharing mathematical models among multiple departments but also coordinating and propagating ongoing changes in the models on real time basis prototype system is developed at kaist on commercial object oriented database system called objectstore using programming language
the increasing adoption of web services is one of the most important technological trends in contemporary business organizations this trend is motivated by claims about the ability of web services to facilitate information technology it flexibility improve information management and even lead to competitive advantage however as the move towards web services is gaining momentum research about their organizational consequences remains mostly conceptual this exploratory study empirically investigates whether the implementation of web services applications is associated with these technological informational and strategic impacts field study approach is employed to collect cross sectional data from it managers in israel data analysis generally supports the research hypotheses showing that the implementation of web services applications positively affects the flexibility of it infrastructure resources and information flexibility the results also show that specific implementation an enterprise information portal also has positive effects on the flexibility of it infrastructure capabilities information quality and it based competitive advantage finally the results demonstrate the magnitude of the organizational impacts of web services applications by comparing them to those of non web erp systems the implications of the findings for practice and research are discussed
semistructured data is modeled as rooted labeled graph the simplest kinds of queries on such data are those which traverse paths described by regular path expressions more complex queries combine several regular path expressions with complex data restructuring and with sub queries this article addresses the problem of efficient query evaluation on distributed semistructured databases in our setting the nodes of the database are distributed over fixed number of sites and the edges are classified into local with both ends in the same site and cross edges with ends in two distinct sites efficient evaluation in this context means that the number of communication steps is fixed independent on the data or the query and that the total amount of data sent depends only on the number of cross links and of the size of the query’s result we give such algorithms in three different settings first for the simple case of queries consisting of single regular expression second for all queries in calculus for graphs based on structural recursion which in addition to regular path expressions can perform nontrivial restructuring of the graph and third for class of queries we call select where queries that combine pattern matching and regular path expressions with data restructuring and subqueries this article also includes discussion on how these methods can be used to derive efficient view maintenance algorithms
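the single regular path expression case can be illustrated with the standard product construction searching over pairs of a graph node and an automaton state the python sketch below is a local single site version that omits the distributed bookkeeping described above

```python
from collections import deque

def eval_regular_path(graph, start_nodes, automaton, start_state, final_states):
    """nodes reachable via a path whose label sequence is accepted by the automaton
    graph: node -> list of (label, node); automaton: (state, label) -> state
    (local single-site sketch; the distributed algorithm partitions this search)"""
    seen = {(n, start_state) for n in start_nodes}
    queue = deque(seen)
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in final_states:
            results.add(node)
        for label, nxt in graph.get(node, ()):
            nxt_state = automaton.get((state, label))
            if nxt_state is not None and (nxt, nxt_state) not in seen:
                seen.add((nxt, nxt_state))
                queue.append((nxt, nxt_state))
    return results

# toy query a.b* starting at node 0
graph = {0: [("a", 1)], 1: [("b", 2)], 2: [("b", 1)]}
automaton = {(0, "a"): 1, (1, "b"): 1}
print(eval_regular_path(graph, {0}, automaton, 0, {1}))    # {1, 2}
```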
we present the verification of the machine level implementation of conservative variant of the standard mark sweep garbage collector in hoare style program logic the specification of the collector is given on machine level memory model using separation logic and is strong enough to preserve the safety property of any common mutator program our verification is fully implemented in the coq proof assistant and can be packed immediately as foundational proof carrying code package our work makes important attempt toward building fully certified production quality garbage collectors
grid computing is newly emerging technology aimed at large scale resource sharing and global area collaboration it is the next step in the evolution of parallel and distributed computing due to the largeness and complexity of the grid system its performance and reliability are difficult to model analyze and evaluate this paper presents model that relaxes some assumptions made in prior research on distributed systems that were inappropriate for grid computing the paper proposes virtual tree structured model of the grid service this model simplifies the physical structure of grid service allows service performance execution time to be efficiently evaluated and takes into account data dependence and failure correlation based on the model an algorithm for evaluating the grid service time distribution and the service reliability indices is suggested the algorithm is based on graph theory and probability theory illustrative examples and real case study of the biogrid are presented
current pen input mainly utilizes the position of the pen tip and occasionally button press other possible device parameters such as rolling the pen around its longitudinal axis are rarely used we explore pen rolling as supporting input modality for pen based interaction through two studies we are able to determine the parameters that separate intentional pen rolling for the purpose of interaction from incidental pen rolling caused by regular writing and drawing and the parameter range within which accurate and timely intentional pen rolling interactions can occur building on our experimental results we present an exploration of the design space of rolling based interaction techniques which showcase three scenarios where pen rolling interactions can be useful enhanced stimulus response compatibility in rotation tasks multi parameter input and simplified mode selection
in rudolph kalman published his now famous article describing recursive solution to the discrete data linear filtering problem kalman new approach to linear filtering and prediction problems transactions of the asme journal of basic engineering since that time due in large part to advances in digital computing the kalman filter has been the subject of extensive research and applications particularly in the area of autonomous or assisted navigation the purpose of this paper is to acknowledge the approaching th anniversary of the kalman filter with look back at the use of the filter for human motion tracking in virtual reality vr and augmented reality ar in recent years there has been an explosion in the use of the kalman filter in vr ar in fact at technical conferences related to vr these days it would be unusual to see paper on tracking that did not use some form of kalman filter or draw comparisons to those that do as such rather than attempt comprehensive survey of all uses of the kalman filter to date what follows focuses primarily on the early discovery and subsequent period of evolution of the kalman filter in vr along with few examples of modern commercial systems that use the kalman filter this paper begins with very brief introduction to the kalman filter brief look at the origins of vr little about tracking in vr in particular the work and conditions that gave rise to the use of the filter and then the evolution of the use of the filter in vr
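for readers who want the very brief introduction the paper mentions the standard discrete kalman filter predict and update equations are written below in latex with the usual state transition matrix F observation matrix H and noise covariances Q and R

```latex
% standard discrete kalman filter (predict then update), usual notation
\begin{align*}
\hat{x}_{k|k-1} &= F\,\hat{x}_{k-1|k-1} \\
P_{k|k-1} &= F\,P_{k-1|k-1}\,F^{\top} + Q \\
K_k &= P_{k|k-1} H^{\top}\bigl(H P_{k|k-1} H^{\top} + R\bigr)^{-1} \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k\bigl(z_k - H\,\hat{x}_{k|k-1}\bigr) \\
P_{k|k} &= (I - K_k H)\,P_{k|k-1}
\end{align*}
```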
collecting relevance judgments qrels is an especially challenging part of building an information retrieval test collection this paper presents novel method for creating test collections by offering substitute for relevance judgments our method is based on an old idea in ir single information need can be represented by many query articulations we call different articulations of particular need query aspects by combining the top documents retrieved by single system for multiple query aspects we build judgment free qrels whose rank ordering of ir systems correlates highly with rankings based on human relevance judgments
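a small python sketch of the pooling idea described above combining the top documents retrieved for each aspect of a single information need and treating documents returned by several aspects as pseudo relevant the thresholds are illustrative assumptions

```python
from collections import Counter

def aspect_qrels(aspect_rankings, top_k=10, min_aspects=2):
    """build judgment-free qrels: a document counts as pseudo-relevant when it
    appears in the top-k results of at least min_aspects query aspects
    (thresholds are illustrative assumptions, not the paper's exact values)"""
    votes = Counter()
    for ranking in aspect_rankings:            # one ranked list per query aspect
        votes.update(set(ranking[:top_k]))
    return {doc for doc, v in votes.items() if v >= min_aspects}

aspects = [["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d3", "d5"]]
print(aspect_qrels(aspects, top_k=3, min_aspects=2))    # {'d2', 'd3'}
```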
java language has become very popular in the last few years due to its portability java applications are adopted in distributed environment where heterogeneous resources cooperate in this context security is fundamental issue because each resource could execute applications that have been developed by possibly unknown third parties this paper recalls several solutions for improving the java native security support in particular it discusses an approach for history based access control of java applications this paper also describes the application of this solution to two common use cases grid computing and mobile devices such as mobile phones or pdas
this paper describes tinkersheets paper based interface to tangible simulations the proposed interface combines the advantages of form based input and paper form based input allows to set an arbitrary number of parameters using paper as medium for the interface keeps the interaction modality consistently physical tinkersheets are also used as an output screen to display summarized information about the simulation user study conducted in an authentic context shows how the characteristics of the interface shape real world usage we also describe how the affordances of this control and visualization interface support the co design of interaction with end users
in service oriented systems constellation of services cooperate sharing potentially sensitive information and responsibilities cooperation is only possible if the different participants trust each other as trust may depend on many different factors in flexible framework for trust management tm trust must be computed by combining different types of information in this paper we describe the tas tm framework which integrates independent tm systems into single trust decision point the tm framework supports intricate combinations whilst still remaining easily extensible it also provides unified trust evaluation interface to the authorization framework of the services we demonstrate the flexibility of the approach by integrating three distinct tm paradigms reputation based tm credential based tm and key performance indicator tm finally we discuss privacy concerns in tm systems and the directions to be taken for the definition of privacy friendly tm architecture
central goal of collaborative filtering cf is to rank items by their utilities with respect to individual users in order to make personalized recommendations traditionally this is often formulated as rating prediction problem however it is more desirable for cf algorithms to address the ranking problem directly without going through an extra rating prediction step in this paper we propose the probabilistic latent preference analysis plpa model for ranking predictions by directly modeling user preferences with respect to set of items rather than the rating scores on individual items from user’s observed ratings we extract his preferences in the form of pairwise comparisons of items which are modeled by mixture distribution based on bradley terry model an em algorithm for fitting the corresponding latent class model as well as method for predicting the optimal ranking are described experimental results on real world data sets demonstrated the superiority of the proposed method over several existing cf algorithms based on rating predictions in terms of ranking performance measure ndcg
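the bradley terry building block used above can be written as below in latex where theta denotes a positive item parameter and the mixture over latent classes z follows the usual latent class construction the exact parameterization in the paper may differ

```latex
% bradley-terry probability that item i is preferred over item j,
% mixed over latent classes z (parameterization is illustrative)
\begin{align*}
P(i \succ j \mid z) &= \frac{\theta_{zi}}{\theta_{zi} + \theta_{zj}}, \qquad \theta_{zi} > 0 \\
P(i \succ j \mid \text{user}) &= \sum_{z} P(z \mid \text{user})\; P(i \succ j \mid z)
\end{align*}
```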
we study the problem of extracting flattened tuple data from streaming hierarchical xml data tuple extraction queries are essentially xml pattern queries with multiple extraction nodes their typical applications include mapping based xml transformation and integrated set based processing of xml and relational data holistic twig joins are known for the optimal matching of xml pattern queries on parsed indexed xml data naive application of the holistic twig joins to streaming xml data incurs unnecessary disk i/os we adapt the holistic twig joins for tuple extraction queries on streaming xml with two novel features first we use the block and trigger technique to consume streaming xml data in best effort fashion without compromising the optimality of holistic matching second to reduce peak buffer sizes and overall running times we apply query path pruning and existential match pruning techniques to aggressively filter irrelevant incoming data we compare our solution with the direct competitor turboxpath and other alternative approaches that use full fledged query engines such as xquery or xslt engines for tuple extraction the experiments using real world xml data and queries demonstrated that our approach outperformed its competitors by up to orders of magnitude and exhibited almost linear scalability our solution has been demonstrated extensively to ibm customers and will be included in customer engagement applications in healthcare
diva novel environment for group work is presented this prototype virtual office environment provides support for communication cooperation and awareness in both the synchronous and asynchronous modes smoothly integrated into simple and intuitive interface which may be viewed as replacement for the standard graphical user interface desktop in order to utilize the skills that people have acquired through years of shared work in real offices diva is modeled after the standard office abstracting elements of physical offices required to support collaborative work people rooms desks and documents
we describe multi purpose image classifier that can be applied to wide variety of image classification tasks without modifications or fine tuning and yet provide classification accuracy comparable to state of the art task specific image classifiers the proposed image classifier first extracts large set of image features including polynomial decompositions high contrast features pixel statistics and textures these features are computed on the raw image transforms of the image and transforms of transforms of the image the feature values are then used to classify test images into set of pre defined image classes this classifier was tested on several different problems including biological image classification and face recognition although we cannot make claim of universality our experimental results show that this classifier performs as well or better than classifiers developed specifically for these image classification tasks our classifier’s high performance on variety of classification problems is attributed to large set of features extracted from images and an effective feature selection and weighting algorithm sensitive to specific image classification problems the algorithms are available for free download from http://www.openmicroscopy.org
real world road planning applications often result in the formulation of new variations of the nearest neighbor nn problem requiring new solutions in this paper we study an unexplored form of nn queries named optimal sequenced route osr query in both vector and metric spaces osr strives to find route of minimum length starting from given source location and passing through number of typed locations in particular order imposed on the types of the locations we first transform the osr problem into shortest path problem on large planar graph we show that classic shortest path algorithm such as dijkstra’s is impractical for most real world scenarios therefore we propose lord light threshold based iterative algorithm which utilizes various thresholds to prune the locations that cannot belong to the optimal route then we propose lord an extension of lord which uses tree to examine the threshold values more efficiently finally for applications that cannot tolerate the euclidean distance as estimation and require exact distance measures in metric spaces eg road networks we propose pne that progressively issues nn queries on different point types to construct the optimal route for the osr query our extensive experiments on both real world and synthetic datasets verify that our algorithms significantly outperform disk based variation of the dijkstra approach in terms of processing time up to two orders of magnitude and required workspace up to reduction on average
we present bayesian technique for the reconstruction and subsequent decimation of surface models from noisy sensor data the method uses oriented probabilistic models of the measurement noise and combines them with feature enhancing prior probabilities over surfaces when applied to surface reconstruction the method simultaneously smooths noisy regions while enhancing features such as corners when applied to surface decimation it finds models that closely approximate the original mesh when rendered the method is applied in the context of computer animation where it finds decimations that minimize the visual error even under nonrigid deformations
the analysis and exploration of multidimensional and multivariate data is still one of the most challenging areas in the field of visualization in this paper we describe an approach to visual analysis of an especially challenging set of problems that exhibit complex internal data structure we describe the interactive visual exploration and analysis of data that includes several usually large families of function graphs fi we describe analysis procedures and practical aspects of the interactive visual analysis specific to this type of data with emphasis on the function graph characteristic of the data we adopted the well proven approach of multiple linked views with advanced interactive brushing to assess the data standard views such as histograms scatterplots and parallel coordinates are used to jointly visualize data we support iterative visual analysis by providing means to create complex composite brushes that span multiple views and that are constructed using different combination schemes we demonstrate that engineering applications represent challenging but very applicable area for visual analytics as case study we describe the optimization of fuel injection system in diesel engines of passenger cars
in this article we address the problem of target detection in wireless sensor networks wsns we formulate the target detection problem as line set intersection problem and use integral geometry to analytically characterize the probability of target detection for both stochastic and deterministic deployments compared to previous work we analyze wsns where sensors have heterogeneous sensing capabilities for the stochastic case we evaluate the probability that the target is detected by at least sensors and compute the free path until the target is first detected for the deterministic case we show an analogy between the target detection problem and the problem of minimizing the average symbol error probability in digital modulation schemes motivated by this analogy we propose heuristic sensor placement algorithm called date that makes use of well known signal constellations for determining good wsn constellations we also propose heuristic called cdate for connected wsn constellations that yields high target detection probability
clustering is one of the most effective means to enhance the performance of object base applications consequently many proposals exist for algorithms computing good object placements depending on the application profile however in an effective object base reorganization tool the clustering algorithm is only one constituent in this paper we report on our object base reorganization tool that covers all stages of reorganizing the objects the application profile is determined by monitoring tool the object placement is computed from the monitored access statistics utilizing variety of clustering algorithms and finally the reorganization tool restructures the object base accordingly the costs as well as the effectiveness of these tools is quantitatively evaluated on the basis of the oo benchmark
in reality not all users require instant video access people are used to planning ahead including when they wish to watch videos this fact triggered the creation of new video scheduling paradigm named scheduled video delivery svd it lures clients into making requests as early as possible in order to increase stream scheduling flexibility and pricing based resource utilization the existing svd single channel request scheduling algorithms such as medf and lpf have already successfully demonstrated the feasibility of svd in this paper we propose the multi channel multicast delivery mcmd stream scheduling algorithm for svd with mcmd client can receive segments of the same requested video being transmitted on channels its patching and multi channel receiving capabilities further improve resource utilization and the degree of multicast in delivering continuous media the performance study shows that mcmd has lower rejection rate and higher channel utilization than medf and lpf on svd both with and without price incentives
whole program path wpp is complete control flow trace of program’s execution recently larus showed that although wpp is expected to be very large of mbytes it can be greatly compressed to of mbytes and therefore saved for future analysis while the compression algorithm proposed by larus is highly effective the compression is accompanied with loss in the ease with which subsets of information can be accessed in particular path traces pertaining to particular function cannot generally be obtained without examining the entire compressed wpp representation to solve this problem we advocate the application of compaction techniques aimed at providing easy access to path traces on per function basis we present wpp compaction algorithm in which the wpp is broken in to path traces corresponding to individual function calls all of the path traces for given function are stored together as block ability to construct the complete wpp from individual path traces is preserved by maintaining dynamic call graph the compaction is achieved by eliminating redundant path traces that result from different calls to function and by replacing sequence of static basic block ids that correspond to dynamic basic block by single id we transform compacted wpp representation into timestamped wpp twpp representation in which the path traces are organized from the perspective of dynamic basic blocks twpp representation also offers additional opportunities for compaction experiments show that our algorithm compacts the wpps by factors ranging from to at the same time information is organized in highly accessible form which speeds up the responses to queries requesting the path traces of given function by over orders of magnitude
this paper describes the miss classification table simple mechanism that enables the processor or memory controller to identify each cache miss as either conflict miss or capacity non conflict miss the miss classification table works by storing part of the tag of the most recently evicted line of cache set if the next miss to that cache set has matching tag it is identified as conflict miss this technique correctly identifies of misses in the worst case several applications of this information are demonstrated including improvements to victim caching next line prefetching cache exclusion and pseudo associative cache this paper also presents the adaptive miss buffer amb which combines several of these techniques targeting each miss with the most appropriate optimization all within single small miss buffer the amb’s combination of techniques achieves better performance than any single technique alone
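a behavioural python sketch of the table described above one partial tag per cache set recorded at eviction time and compared against the next miss to that set the field widths and interface are illustrative assumptions

```python
class MissClassificationTable:
    """one partial-tag entry per cache set; a miss whose partial tag matches the
    most recently evicted line's tag is classified as a conflict miss
    (behavioural sketch; field widths are illustrative assumptions)"""

    def __init__(self, num_sets, partial_tag_bits=8):
        self.entries = [None] * num_sets
        self.mask = (1 << partial_tag_bits) - 1

    def on_eviction(self, set_index, evicted_tag):
        # remember a few bits of the tag of the line just evicted from this set
        self.entries[set_index] = evicted_tag & self.mask

    def classify_miss(self, set_index, miss_tag):
        # conflict miss if the missing line is the one most recently evicted
        if self.entries[set_index] == (miss_tag & self.mask):
            return "conflict"
        return "capacity/non-conflict"

mct = MissClassificationTable(num_sets=4)
mct.on_eviction(set_index=2, evicted_tag=0x1a3)
print(mct.classify_miss(2, 0x1a3), mct.classify_miss(2, 0x2b4))
```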
discovering users specific and implicit geographic intention in web search can greatly help satisfy users information needs we build geo intent analysis system that uses minimal supervision to learn model from large amounts of web search logs for this discovery we build city language model which is probabilistic representation of the language surrounding the mention of city in web queries we use several features derived from these language models to identify users implicit geo intent and pinpoint the city corresponding to this intent determine whether the geo intent is localized around the users current geographic location predict cities for queries that have mention of an entity that is located in specific place experimental results demonstrate the effectiveness of using features derived from the city language model we find that the system has over precision and more than accuracy for the task of detecting users implicit city level geo intent the system achieves more than accuracy in determining whether implicit geo queries are local geo queries neighbor region geo queries or none of these the city language model can effectively retrieve cities in location specific queries with high precision and recall human evaluation shows that the language model predicts city labels for location specific queries with high accuracy
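a minimal python sketch of scoring a query against per city unigram language models with add one smoothing and picking the best city the smoothing and toy data are illustrative simplifications of the system described above

```python
import math
from collections import Counter

def train_city_lm(queries_mentioning_city):
    """unigram counts of the language surrounding a city in web queries"""
    counts = Counter(w for q in queries_mentioning_city for w in q.split())
    return counts, sum(counts.values())

def score(query, lm, vocab_size):
    counts, total = lm
    # add-one smoothed log probability of the query under the city model
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in query.split())

city_lms = {"paris": train_city_lm(["eiffel tower tickets", "louvre hours"]),
            "boston": train_city_lm(["red sox schedule", "fenway park tours"])}
vocab = {w for counts, _ in city_lms.values() for w in counts}
query = "eiffel tower opening hours"
best = max(city_lms, key=lambda c: score(query, city_lms[c], len(vocab)))
print(best)    # paris
```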
the current methods to describe the shape of three dimensional objects can be classified into two groups methods following the composition of primitives approach and descriptions based on procedural shape representations as acquisition device returns an agglomeration of elementary objects eg laser scanner returns points the model acquisition pipeline always starts with composition of primitives due to the semantic information carried with generative description procedural model provides valuable metadata that make up the basis for digital library services retrieval indexing and searching an important challenge in computer graphics in the field of cultural heritage is to build bridge between the generative and the explicit geometry description combining both worlds the accuracy and systematics of generative models with the realism and the irregularity of real world data first step towards semantically enriched data description is reconstruction algorithm based on decreasing exponential fitting this approach is robust towards outliers and multiple dataset mixtures it does not need preceding segmentation and is able to fit generative shape template to point cloud identifying the parameters of shape
program enhancement refers to adding new functionality to an existing program we argue that repetitive program enhancement tasks can be expressed as patterns and that the application of such enhancement patterns can be automated this paper presents novel approach to pattern oriented automated enhancement of object oriented programs our approach augments the capabilities of an aspect compiler to capture the programmer’s intent to enhance program in response to the programmer referencing piece of functionality that is non existent our approach automatically synthesizes aspect code to supply the required functionality transparently to improve flexibility and facilitate reuse the synthesis and application of the new functionality is guided by declarative when then rules concisely expressed using rule base our extensible automated program enhancement system called drivel extends the aspectj compiler with aspect generating capabilities the generation is controlled using the drools rules engine to validate our approach and automated tool we have created collection of enhancement libraries and used drivel to apply them to the libx edition builder large scale widely used web application drivel automatically enhanced the libx edition builder’s xml processing modules with structural navigation capabilities and caching eliminating the need to implement this functionality by hand
we briefly review the current state of play in the area of agent based software engineering and then consider what next we discuss range of marketing activities that will together help in making people from other communities aware of work in this area we then outline number of research topics that are seen as vital to the future of the field although as always more research is needed recent progress in both research and industrial adoption has been most encouraging and the future of agent based software engineering looks bright
this paper summarizes our experience designing and implementing bitvault content addressable retention platform for large volumes of reference data seldom changing information that needs to be retained for long time bitvault uses smart bricks as the building block to lower the hardware cost the challenges are to keep management costs low in system that scales from one brick to tens of thousands to ensure reliability and to deliver simple design our design incorporates peer to peer pp technologies for self managing and self healing and uses massively parallel repair to reduce system vulnerability to data loss the simplicity of the architecture relies on an eventually reliable membership service provided by perfect one hop distributed hash table dht its object driven repair model yields last replica recall guarantee independent of the failure scenario so long as the last copy of data object remains in the system that data can be retrieved and its replication degree can be restored prototype has been implemented theoretical analysis simulations and experiments have been conducted to validate the design of bitvault
recent research projects have experimented with controlling production system equipment through web service interfaces however orchestrating these web services to accomplish complicated production task can be difficult because the states of the production equipment and the set of available web services may change this paper proposes an approach in which the production equipment and their states are modelled with an ontology and the semantic model is dynamically analyzed to determine which action should next be taken in the production process similarly the available web services are described with semantic models and the descriptions are retrieved dynamically to find the web services capable of performing the required actions while web ontology language owl is used in describing equipment statuses the owl ontology which is based on owl is used for describing web services once the semantic service descriptions have been analyzed to find the appropriate web services the services are invoked using their syntactic wsdl descriptions
automata theoretic decision procedures for solving model checking and satisfiability problems for temporal dynamic and description logics have flourished during the past decades in the paper we define an exptime decision procedure based on the emptiness problem of büchi automata on infinite trees for the very expressive information logic sim designed for reasoning about information systems this logic involves modal parameters satisfying certain properties to capture the relevant nominals at the formula level boolean expressions and nominals at the modal level an implicit intersection operation for relations and universal modality the original combination of known techniques allows us to solve the open question related to the exptime completeness of sim furthermore we discuss how variants of sim can be treated similarly although the decidability status of some of them is still unknown
many webd sites do not offer sufficient assistance to especially novice users in navigating the virtual world find objects places of interests and learn how to interact with them this paper aims at helping the webd content creator to face this problem by proposing the adoption of guided tours of virtual worlds as an effective user aid and ii describing novel tool that provides automatic code generation for adding such guided tours to vrml worlds finally we will show how the tool has been used in the development of an application concerning computer science museum
performing inlining of routines across file boundaries is known to yield significant run time performance improvements in this paper we present scalable cross module inlining framework that reduces the compiler’s memory footprint file thrashing and overall compile time instead of using the call site ordering generated by the analysis phase the transformation phase dynamically produces new inlining order depending on the resource constraints of the system we introduce dependences among call sites and affinity among source files based on the inlines performed we discuss the implementation of our technique and show how it substantially reduces compile time and memory usage without sacrificing any run time performance
in large and often distributed environments where access control information may be shared across multiple sites the combination of individual specifications in order to define coherent access control policy is of fundamental importance in order to ensure non ambiguous behaviour formal languages often relying on first order logic have been developed for the description of access control policies we propose in this paper formalisation of policy composition by means of term rewriting we show how in this setting we are able to express wide range of policy combinations and reason about them modularity properties of rewrite systems can be used to derive the correctness of the global policy ie that every access request has an answer and this answer is unique
supporting one handed thumb operation of touchscreen based mobile devices presents challenging tradeoff between visual expressivity and ease of interaction thumbspace and shift two new application independent software based interaction techniques address this tradeoff in significantly different ways thumbspace addresses distant objects while shift addresses small object occlusion we present two extensive comparative user studies the first compares thumbspace and shift to peripheral hardware directional pad and scrollwheel and direct touchscreen input for selecting objects while standing and walking the data favored the shift design overall but suggested thumbspace is promising for distant objects our second study examines the benefits and learnability of combining shift and thumbspace on device with larger screen we found their combined use offered users better overall speed and accuracy in hitting small targets mm than using either method alone
decentralized storage systems aggregate the available disk space of participating computers to provide large storage facility these systems rely on data redundancy to ensure durable storage despite node failures however existing systems either assume independent node failures or they rely on introspection to carefully place redundant data on nodes with low expected failure correlation unfortunately node failures are not independent in practice and constructing an accurate failure model is difficult in large scale systems at the same time malicious worms that propagate through the internet pose real threat of large scale correlated failures such rare but potentially catastrophic failures must be considered when attempting to provide highly durable storage in this paper we describe glacier distributed storage system that relies on massive redundancy to mask the effect of large scale correlated failures glacier is designed to aggressively minimize the cost of this redundancy in space and time erasure coding and garbage collection reduce the storage cost aggregation of small objects and loosely coupled maintenance protocol for redundant fragments minimizes the messaging cost in one configuration for instance our system can provide six nines durable storage despite correlated failures of up to of the storage nodes at the cost of an elevenfold storage overhead and an average messaging overhead of only messages per node and minute during normal operation glacier is used as the storage layer for an experimental serverless email system
quantiles are crucial type of order statistics in databases extensive research has been focused on maintaining space efficient structure for approximate quantile computation as the underlying dataset is updated the existing solutions however are designed to support only the current most updated snapshot of the dataset queries on the past versions of the data cannot be answered this paper studies the problem of historical quantile search the objective is to enable approximate quantile retrieval on any snapshot of the dataset in history the problem is very important in analyzing the evolution of distribution monitoring the quality of services query optimization in temporal databases and so on we present the first formal results in the literature first we prove novel theoretical lower bound on the space cost of supporting approximate historical quantile queries the bound reveals the fundamental difference between answering quantile queries about the past and those about the present time second we propose structure for finding approximate historical quantiles and show that it consumes more space than the lower bound by only square logarithmic factor extensive experiments demonstrate that in practice our technique performs much better than predicted by theory in particular the quantiles it returns are remarkably more accurate than the theoretical precision guarantee
linear discriminant analysis lda has been popular method for extracting features which preserve class separability it has been widely used in many fields of information processing however the computation of lda involves dense matrices eigen decomposition which can be computationally expensive both in time and memory specifically lda has o(mnt) time complexity and requires o(mn + mt + nt) memory where m is the number of samples n is the number of features and t = min(m, n) when both m and n are large it is infeasible to apply lda in this paper we propose novel algorithm for discriminant analysis called spectral regression discriminant analysis srda by using spectral graph analysis srda casts discriminant analysis into regression framework which facilitates both efficient computation and the use of regularization techniques specifically srda only needs to solve set of regularized least squares problems and there is no eigenvector computation involved which is huge save of both time and memory our theoretical analysis shows that srda can be computed with o(ms) time and o(ms) memory where s (s ≤ n) is the average number of non zero features in each sample extensive experimental results on four real world data sets demonstrate the effectiveness and efficiency of our algorithm
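The regression view can be illustrated with a simplified sketch. This is not the authors' implementation; the ridge formulation and the centered class-indicator responses below are a common textbook variant used only to show how the eigendecomposition is replaced by regularized least-squares solves.

```python
# Simplified illustration of the spectral-regression idea behind SRDA:
# instead of an eigendecomposition of dense scatter matrices, solve one
# regularized least-squares problem per class-indicator response vector.
import numpy as np

def srda_like_projection(X, y, alpha=1.0):
    """X: (m, n) samples; y: (m,) integer class labels. Returns an (n, c-1) projection."""
    X = X - X.mean(axis=0)                      # center the data
    classes = np.unique(y)
    # class-indicator responses, centered (drop one to avoid the trivial direction)
    Y = np.stack([(y == c).astype(float) for c in classes[:-1]], axis=1)
    Y = Y - Y.mean(axis=0)
    # ridge regression: (X^T X + alpha I) W = X^T Y
    A = X.T @ X + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(A, X.T @ Y)
    return W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(3, 1, (50, 20))])
y = np.array([0] * 50 + [1] * 50)
W = srda_like_projection(X, y)
Z = (X - X.mean(axis=0)) @ W
print(Z[y == 0].mean(), Z[y == 1].mean())       # well-separated 1-D class means
```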
increasingly number of applications rely on or can potentially benefit from analysis and monitoring of data streams to support processing of streaming data in grid environment we have been developing middleware system called gates grid based adaptive execution on streams our target applications are those involving high volume data streams and requiring distributed processing of data arising from distributed set of sources this paper addresses the problem of resource allocation in the gates system though resource discovery and resource allocation have been active topics in grid community the pipelined processing and real time constraint required by distributed streaming applications pose new challenges we present resource allocation algorithm that is based on minimal spanning trees we evaluate the algorithm experimentally and demonstrate that it results in configurations that are very close to optimal and significantly better than most other possible configurations
direct volume rendering dvr is of increasing diagnostic value in the analysis of data sets captured using the latest medical imaging modalities the deployment of dvr in everyday clinical work however has so far been limited one contributing factor is that current transfer function tf models can encode only small fraction of the user’s domain knowledge in this paper we use histograms of local neighborhoods to capture tissue characteristics this allows domain knowledge on spatial relations in the data set to be integrated into the tf as first example we introduce partial range histograms in an automatic tissue detection scheme and present its effectiveness in clinical evaluation we then use local histogram analysis to perform classification where the tissue type certainty is treated as second tf dimension the result is an enhanced rendering where tissues with overlapping intensity ranges can be discerned without requiring the user to explicitly define complex multidimensional tf
user interface design by sketching as well as other sketching activities typically involves sketching objects through representations that should combine meaningfulness for the end users and easiness for the recognition engines to investigate this relationship multi platform user interface design tool has been developed that enables designers to sketch design ideas in multiple levels of fidelity with multi stroke gestures supporting widget representations and commands usability analysis of these activities as they are submitted to recognition engine suggests that the level of fidelity the amount of constraints imposed on the representations and the visual difference of representations positively impact the sketching activity as whole implications for further sketch representations in user interface design and beyond are provided based on usability guidelines
the poirot project is four year effort to develop an architecture that integrates the products of number of targeted reasoning and learning components to produce executable representations of demonstrated web service workflow processes to do this it combines contributions from multiple trace analysis interpretation and learning methods guided by meta control regime that reviews explicit learning hypotheses and posts new learning goals and internal learning subtasks poirot’s meta controller guides the activity of its components through largely distinct phases of processing from trace interpretation to inductive learning hypotheses combination and experimental evaluation in this paper we discuss the impact that various kinds of inference during the trace interpretation phase can have on the quality of the learned models
in the typical database system an execution is correct if it is equivalent to some serial execution this criterion called serializability is unacceptable for new database applications which require long duration transactions we present new transaction model which allows correctness criteria more suitable for these applications this model combines three enhancements to the standard model nested transactions explicit predicates and multiple versions these features yield the name of the new model nested transactions with predicates and versions or nt pv the modular nature of the nt pv model allows straightforward representation of simple systems it also provides formal framework for describing complex interactions the most complex interactions the model allows can be captured by protocol which exploits all of the semantics available to the nt pv model an example of these interactions is shown in case application the example shows how system based on the nt pv model is superior to both standard database techniques and unrestricted systems in both correctness and performance
terrain rendering is an important factor in the rendering of virtual scenes if they are large and detailed digital terrains can represent huge amount of data and therefore of graphical primitives to render in real time in this paper we present an efficient technique for out of core rendering of pseudo infinite terrains the full terrain height field is divided into regular tiles which are streamed and managed adaptively each visible tile is then rendered using precomputed triangle strip patch selected in an adaptive way according to an importance metric thanks to these two levels of adaptivity our approach can be seen as cross platform technique to render terrains on any kind of devices from slow handheld to powerful desktop pc by only exploiting the device capacity to draw as much triangles as possible for target frame rate and memory space
as computer systems penetrate deeper into our lives and handle private data safety critical applications and transactions of high monetary value efforts to breach their security also assume significant dimensions way beyond an amateur hacker’s play until now security was always an afterthought this is evident in regular updates to antivirus software patches issued by vendors after software bugs are discovered etc however increasingly we are realizing the need to incorporate security during the design of system be it software or hardware we invoke this philosophy in the design of hardware based system to enable protection of program’s data during execution in this paper we develop general framework that provides security assurance against wide class of security attacks our work is based on the observation that program’s normal or permissible behavior with respect to data accesses can be characterized by various properties we present hardware software approach wherein such properties can be encoded as data attributes and enforced as security policies during program execution these policies may be application specific eg access control for certain data structures compiler generated eg enforcing that variables are accessed only within their scope or universally applicable to all programs eg disallowing writes to unallocated memory we show how an embedded system architecture can support such policies by enhancing the memory hierarchy to represent the attributes of each datum as security tags that are linked to it throughout its lifetime and adding configurable hardware checker that interprets the semantics of the tags and enforces the desired security policies we evaluated the effectiveness of the proposed architecture in enforcing various security policies for several embedded benchmark applications our experiments in the context of the simplescalar framework demonstrate that the proposed solution ensures run time validation of application defined data properties with minimal execution time overheads
plethora of reaching techniques intended for moving objects between locations distant to the user have recently been proposed and tested one of the most promising techniques is the radar view up till now the focus has been mostly on how user can interact efficiently with given radar map not on how these maps are created and maintained it is for instance unclear whether or not users would appreciate the possibility of adapting such radar maps to particular tasks and personal preferences in this paper we address this question by means of prolonged user study with the sketch radar prototype the study demonstrates that users do indeed modify the default maps in order to improve interactions for particular tasks it also provides insights into how and why the default physical map is modified
recently improving the energy efficiency of hpc machines has become important as result interest in using power scalable clusters where frequency and voltage can be dynamically modified has increased on power scalable clusters one opportunity for saving energy with little or no loss of performance exists when the computational load is not perfectly balanced this situation occurs frequently as balancing load between nodes is one of the long standing problems in parallel and distributed computing in this paper we present system called jitter which reduces the frequency on nodes that are assigned less computation and therefore have slack time this saves energy on these nodes and the goal of jitter is to attempt to ensure that they arrive just in time so that they avoid increasing overall execution time for example in aztec from the asci purple suite our algorithm uses less energy while increasing execution time by only
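The just-in-time intuition can be captured in a few lines. The sketch below is a simplification (the frequency levels are invented and compute time is assumed to scale inversely with frequency): a node with slack picks the lowest frequency that still lets it finish by the time the critical-path node does.

```python
# Minimal sketch of the slack-based idea: a node whose per-iteration compute
# time is shorter than the slowest node's can run at a lower frequency and
# still arrive "just in time" at the next synchronization point.
# Frequency levels and timings are made up for illustration.
def pick_frequency(my_compute_time, slowest_compute_time, freq_levels):
    """Return the lowest available frequency that still meets the deadline.
    Assumes compute time scales inversely with frequency (a simplification)."""
    f_max = max(freq_levels)
    deadline = slowest_compute_time                 # set by the critical-path node
    for f in sorted(freq_levels):                   # try the slowest frequency first
        if my_compute_time * (f_max / f) <= deadline:
            return f
    return f_max

freq_levels = [0.8, 1.0, 1.4, 1.8, 2.0]             # GHz, hypothetical
print(pick_frequency(0.6, 1.0, freq_levels))        # lightly loaded node slows down
print(pick_frequency(1.0, 1.0, freq_levels))        # critical-path node stays at max
```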
the expressiveness of various object oriented languages is investigated with respect to their ability to create new objects we focus on database method schemas dms model capturing the data manipulation capabilities of large class of deterministic methods in object oriented databases the results clarify the impact of various language constructs on object creation several new constructs based on expanded notions of deep equality are introduced in particular we provide tractable construct which yields language complete with respect to object creation the new construct is also relevant to query complexity for example it allows expressing in polynomial time some queries like counting requiring exponential space in dms alone
coordinating entities in networked environment has always been significant challenge for software developers in recent years however it has become even more difficult because devices have increasingly rich capabilities combining an ever larger range of technologies networking multimedia sensors etc to address this challenge we propose language based approach to covering the life cycle of applications coordinating networked entities our approach covers the characterization of the networked environment the specification of coordination applications the verification of networked environment and its deployment it is carried out in practice by domain specific language named pantaxou this paper presents the domain specific language pantaxou dedicated to the development of applications for networked heterogeneous entities pantaxou has been used to specify number of coordination scenarios in areas ranging from home automation to telecommunications the language semantics has been formally defined and compiler has been developed the compiler verifies the coherence of coordination scenario and generates coordination code in java
although the api of software framework should stay stable in practice it often changes during maintenance when deploying new framework version such changes may invalidate plugins modules that used one of its previous versions while manual plugin adaptation is expensive and error prone automatic adaptation demands cumbersome specifications which the developers are reluctant to write and maintain based on the history of structural framework changes refactorings in our previous work we formally defined how to automatically derive an adaptation layer that shields plugins from framework changes in this paper we make our approach practical two case studies of unconstrained api evolution show that our approach scales in large number of adaptation scenarios and compared to other adaptation techniques the evaluation of our logic based tool comeback demonstrates that it can adapt efficiently most of the problem causing api refactorings
in this paper we address the problem of database selection for xml document collections that is given set of collections and user query how to rank the collections based on their goodness to the query goodness is determined by the relevance of the documents in the collection to the query we consider keyword queries and support lowest common ancestor lca semantics for defining query results where the relevance of each document to query is determined by properties of the lca of those nodes in the xml document that contain the query keywords to avoid evaluating queries against each document in collection we propose maintaining in preprocessing phase information about the lcas of all pairs of keywords in document and use it to approximate the properties of the lca based results of query to improve storage and processing efficiency we use appropriate summaries of the lca information based on bloom filters we address both boolean and weighted version of the database selection problem our experimental results show that our approach incurs low errors in the estimation of the goodness of collection and provides rankings that are very close to the actual ones
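A minimal sketch of the summary idea follows; the hash choices, filter size, and the scoring function are illustrative stand-ins, not the paper's. Keyword pairs from a collection are inserted into a Bloom filter, and database selection scores a collection by how many query keyword pairs its summary claims to contain.

```python
# Minimal sketch of summarizing per-collection keyword-pair information in a
# Bloom filter so that database selection can test pairs without touching the
# documents. The scoring here is a toy: it only counts how many query keyword
# pairs a collection's summary claims to cover.
import hashlib
from itertools import combinations

class BloomFilter:
    def __init__(self, num_bits=4096, num_hashes=3):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def summarize_collection(keyword_pairs):
    bf = BloomFilter()
    for a, b in keyword_pairs:
        bf.add(tuple(sorted((a, b))))
    return bf

def score(bf, query_keywords):
    pairs = combinations(sorted(set(query_keywords)), 2)
    return sum(1 for p in pairs if p in bf)

summary = summarize_collection([("xml", "query"), ("query", "ranking")])
print(score(summary, ["xml", "query", "ranking"]))   # 2 of the 3 pairs match
```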
representing the appearances of surfaces illuminated from different directions has long been an active research topic while many representation methods have been proposed the relationships and conversion between different representations have been less well researched these relationships are important as they provide an insight as to the different capabilities of the surface representations and means by which they may be converted to common formats for computer graphic applications in this paper we introduce single mathematical framework and use it to express three commonly used surface texture relighting representations surface gradients gradient polynomial texture maps ptm and eigen base images eigen the framework explicitly reveals the relations between the three methods and from this we propose set of conversion methods we use rough surface textures illuminated from directions for our experiments and perform both quantitative and qualitative assessments to evaluate the conversion methods the quantitative assessment uses normalized root mean squared error as metric to compare the original images and those produced by proposed representation methods the qualitative assessment is based on psychophysical experiments and non parametric statistics the results from the two assessments are consistent and show that the original eigen representation produces the best performance the second best performances are achieved by the original ptm representation and the conversion between polynomial texture maps ptm and eigen base images eigen while the performances of other representations are not significantly different
mapping specification has been recognised as critical bottleneck to the large scale deployment of data integration systems mapping is description using which data structured under one schema are transformed into data structured under different schema and is central to data integration and data exchange systems in this paper we argue that the classical approach of correspondence identification followed by manual mapping generation can be simplified through the removal of the second step by judicious refinement of the correspondences captured as step in this direction we present in this paper model for schematic correspondences that builds on and extends the classification proposed by kim et al to cater for the automatic derivation of mappings and present an algorithm that shows how correspondences specified in the model proposed can be used for deriving schema mappings the approach is illustrated using case study from integration in proteomics
role is commonly used concept in software development but concept with divergent definitions this paper discusses the characteristics of roles in software organizations and contrasts such organization roles with other player centric conceptions roles in organizations have their own identity and do not depend on role players for their existence in software terms such roles are first class runtime entities rather than just design concepts we define characteristic properties of both roles and players in organizational contexts and show how the boundary between role and its player varies depending on the level of autonomy the player is allowed we show how roles can facilitate the separation of structure from process facilitating greater adaptivity in software the problem of preservation of state in role based organizations is also discussed possible implementation strategies for both roles and players are discussed and illustrated with various role oriented approaches to building software organizations
clock is classical cache replacement policy dating back to that was proposed as low complexity approximation to lru on every cache hit the policy lru needs to move the accessed item to the most recently used position at which point to ensure consistency and correctness it serializes cache hits behind single global lock clock eliminates this lock contention and hence can support high concurrency and high throughput environments such as virtual memory for example multics unix bsd aix and databases for example db unfortunately clock is still plagued by disadvantages of lru such as disregard for frequency susceptibility to scans and low performance as our main contribution we propose simple and elegant new algorithm namely clock with adaptive replacement car that has several advantages over clock it is scan resistant ii it is self tuning and it adaptively and dynamically captures the recency and frequency features of workload iii it uses essentially the same primitives as clock and hence is low complexity and amenable to high concurrency implementation and iv it outperforms clock across wide range of cache sizes and workloads the algorithm car is inspired by the adaptive replacement cache arc algorithm and inherits virtually all advantages of arc including its high performance but does not serialize cache hits behind single global lock as our second contribution we introduce another novel algorithm namely car with temporal filtering cart that has all the advantages of car but in addition uses certain temporal filter to distill pages with long term utility from those with only short term utility
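For reference, the baseline CLOCK policy that CAR and CART build on can be sketched in a few lines (this is plain CLOCK, not CAR or CART): a hit only sets a reference bit, so the hit path needs no LRU-style list reordering behind a global lock.

```python
# Sketch of the baseline CLOCK policy that CAR/CART build on (this is not
# CAR itself): a hit only sets a reference bit, so no LRU-style list
# reordering is needed on the hit path.
class Clock:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []            # circular buffer of [page, ref_bit]
        self.index = {}            # page -> slot position
        self.hand = 0

    def access(self, page):
        if page in self.index:
            self.pages[self.index[page]][1] = 1     # hit: just set the reference bit
            return 'hit'
        if len(self.pages) < self.capacity:
            self.index[page] = len(self.pages)
            self.pages.append([page, 1])
            return 'miss'
        while True:                                 # find a victim with ref bit 0
            victim, ref = self.pages[self.hand]
            if ref:
                self.pages[self.hand][1] = 0        # give it a second chance
                self.hand = (self.hand + 1) % self.capacity
            else:
                del self.index[victim]
                self.pages[self.hand] = [page, 1]
                self.index[page] = self.hand
                self.hand = (self.hand + 1) % self.capacity
                return 'miss'

cache = Clock(capacity=3)
for p in ['a', 'b', 'c', 'a', 'd', 'a']:
    print(p, cache.access(p))
```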
search methods in dynamic networks usually cannot rely on stable topology from which shortest or otherwise optimized paths through the network are derived when no reliable search indices or routing tables are provided other methods like flooding or random walks have to be considered to explore the network these approaches can exploit partially available information on network paths but the search effort naturally increases with the lack of precise paths due to network dynamics the problem is especially relevant for wireless technology with strict limitation on power consumption we compare the efficiency of random walks and flooding for exploring networks of small to medium size several scenarios are considered including partial path information support for search transient analysis and bound are applied in order to evaluate the messaging overhead
multithreaded programs are difficult to get right because of unexpected interaction between concurrently executing threads traditional testing methods are inadequate for catching subtle concurrency errors which manifest themselves late in the development cycle and post deployment model checking or systematic exploration of program behavior is promising alternative to traditional testing methods however it is difficult to perform systematic search on large programs as the number of possible program behaviors grows exponentially with the program size confronted with this state explosion problem traditional model checkers perform iterative depth bounded search although effective for message passing software iterative depth bounding is inadequate for multithreaded software this paper proposes iterative context bounding new search algorithm that systematically explores the executions of multithreaded program in an order that prioritizes executions with fewer context switches we distinguish between preempting and nonpreempting context switches and show that bounding the number of preempting context switches to small number significantly alleviates the state explosion without limiting the depth of explored executions we show both theoretically and empirically that context bounded search is an effective method for exploring the behaviors of multithreaded programs we have implemented our algorithm in two model checkers and applied it to number of real world multithreaded programs our implementation uncovered previously unknown bugs in our benchmarks each of which was exposed by an execution with at most preempting context switches our initial experience with the technique is encouraging and demonstrates that iterative context bounding is significant improvement over existing techniques for testing multithreaded programs
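The search order can be illustrated with a toy enumerator. The sketch below is not the model checker itself; threads are reduced to straight-line lists of atomic step labels. It charges one unit only for preempting context switches (switching away from a thread that could still run) and prunes once a given preemption bound is exceeded.

```python
# Toy sketch of context-bounded scheduling: enumerate interleavings of
# straight-line threads while bounding the number of *preempting* context
# switches. Switching away from a finished thread is free. This models the
# search order, not a real model checker.
def schedules(threads, preemption_bound):
    """threads: list of lists of step labels."""
    results = []

    def explore(positions, current, preemptions, trace):
        if all(positions[t] == len(threads[t]) for t in range(len(threads))):
            results.append(tuple(trace))
            return
        for t in range(len(threads)):
            if positions[t] == len(threads[t]):
                continue                              # thread t already finished
            cost = 0
            if current is not None and t != current and \
               positions[current] < len(threads[current]):
                cost = 1                              # preempting context switch
            if preemptions + cost > preemption_bound:
                continue
            positions[t] += 1
            trace.append(threads[t][positions[t] - 1])
            explore(positions, t, preemptions + cost, trace)
            trace.pop()
            positions[t] -= 1

    explore([0] * len(threads), None, 0, [])
    return results

threads = [['a1', 'a2'], ['b1', 'b2']]
for bound in (0, 1, 2):
    print(bound, len(schedules(threads, bound)))      # schedule count grows with bound
```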
recent work introduced novel peer to peer application that leverages content sharing and aggregation among the peers to diagnose misconfigurations on desktop pc this application poses interesting challenges in preserving privacy of user configuration data and in maintaining integrity of troubleshooting results in this paper we provide much more rigorous cryptographic and yet practical solution for preserving privacy and we investigate and analyze solutions for ensuring integrity
online feedback based rating systems are gaining popularity dealing with collaborative unfair ratings in such systems has been recognized as an important but difficult problem this problem is challenging especially when the number of honest ratings is relatively small and unfair ratings can contribute to significant portion of the overall ratings in addition the lack of unfair rating data from real human users is another obstacle toward realistic evaluation of defense mechanisms in this paper we propose set of methods that jointly detect smart and collaborative unfair ratings based on signal modeling based on the detection framework of trust assisted rating aggregation system is developed furthermore we design and launch rating challenge to collect unfair rating data from real human users the proposed system is evaluated through simulations as well as experiments using real attack data compared with existing schemes the proposed system can significantly reduce the impact from collaborative unfair ratings
we address the problem of academic conference homepage understanding for the semantic web this problem consists of three labeling tasks labeling conference function pages function blocks and attributes different from traditional information extraction tasks the data in academic conference homepages has complex structural dependencies across multiple web pages in addition there are logical constraints in the data in this paper we propose unified approach constrained hierarchical conditional random fields to accomplish the three labeling tasks simultaneously in this approach complex structural dependencies can be well described also the constrained viterbi algorithm in the inference process can avoid logical errors experimental results on real world conference data have demonstrated that this approach performs better than cascaded labeling methods by in measure and that the constrained inference process can improve the accuracy by based on the proposed approach we develop prototype system of use oriented semantic academic conference calendar the user simply needs to specify what conferences he she is interested in subsequently the system finds extracts and updates the semantic information from the web and then builds calendar automatically for the user the semantic conference data can be used in other applications such as finding sponsors and finding experts the proposed approach can be used in other information extraction tasks as well
today many peer to peer pp simulation frameworks feature variety of recent years research outcomes as modular building blocks allowing others to easily reuse these blocks in further simulations and approach more advanced issues more rapidly however the efforts in the field of pp load balancing have been excluded from that development so far as the proposed techniques often impose too many dependencies between the load balancing algorithms and the application to be put in loosely coupled components this paper discusses how load balancing algorithms that rely on the virtual server concept can be separated from the application and run in modular container with unified communication interface we discuss design fundamentals for such load balancing container based on variety of existing load balancing techniques and present our implementation for the oversim framework with our work load balancing becomes reusable building block for pp applications which contributes to the process of building rich and modular simulation environments
model programs are used as high level behavioral specifications typically representing abstract state machines for modeling reactive systems one uses input output model programs where the action vocabulary is divided between two conceptual players the input player and the output player the players share the action vocabulary and make moves that are labeled by actions according to their respective model programs conformance between the two model programs means that the output input player only makes output input moves that are allowed by the input output players model program in bounded game the total number of moves is fixed here model programs use background theory containing linear arithmetic sets and tuples we formulate the bounded game conformance checking problem or bgc as theorem proving problem modulo this background theory and analyze its complexity
mismatch and overload are the two fundamental issues regarding the effectiveness of information filtering both term based and pattern phrase based approaches have been employed to address these issues however they all suffer from some limitations with regard to effectiveness this paper proposes novel solution that includes two stages an initial topic filtering stage followed by stage involving pattern taxonomy mining the objective of the first stage is to address mismatch by quickly filtering out probable irrelevant documents the threshold used in the first stage is motivated theoretically the objective of the second stage is to address overload by applying pattern mining techniques to rationalize the data relevance of the reduced document set after the first stage substantial experiments on rcv show that the proposed solution achieves encouraging performance
software testing is critical part of software development as new test cases are generated over time due to software modifications test suite sizes may grow significantly because of time and resource constraints for testing test suite minimization techniques are needed to remove those test cases from suite that due to code modifications over time have become redundant with respect to the coverage of testing requirements for which they were generated prior work has shown that test suite minimization with respect to given testing criterion can significantly diminish the fault detection effectiveness fde of suites we present new approach for test suite reduction that attempts to use additional coverage information of test cases to selectively keep some additional test cases in the reduced suites that are redundant with respect to the testing criteria used for suite minimization with the goal of improving the fde retention of the reduced suites we implemented our approach by modifying an existing heuristic for test suite minimization our experiments show that our approach can significantly improve the fde of reduced test suites without severely affecting the extent of suite size reduction
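The selective-retention idea can be sketched as a small heuristic. The coverage sets and the greedy strategy below are simplified illustrative stand-ins for the paper's modified minimization heuristic: reduce greedily for the primary criterion, then keep otherwise-redundant tests that still add coverage for a secondary criterion.

```python
# Minimal sketch of the idea: first pick a reduced suite greedily for the
# primary criterion, then selectively keep otherwise-redundant tests that add
# coverage for a secondary criterion. Heuristic and data are illustrative.
def reduce_suite(primary, secondary):
    """primary/secondary: dict test -> set of covered requirements."""
    uncovered = set().union(*primary.values())
    reduced = []
    while uncovered:                                  # classic greedy minimization
        best = max(primary, key=lambda t: len(primary[t] & uncovered))
        if not primary[best] & uncovered:
            break
        reduced.append(best)
        uncovered -= primary[best]
    # second pass: keep redundant tests that still add secondary coverage
    secondary_covered = set().union(*(secondary[t] for t in reduced))
    for t in primary:
        if t not in reduced and secondary[t] - secondary_covered:
            reduced.append(t)
            secondary_covered |= secondary[t]
    return reduced

primary = {'t1': {1, 2}, 't2': {2, 3}, 't3': {3}}
secondary = {'t1': {'b1'}, 't2': {'b2'}, 't3': {'b3'}}
print(reduce_suite(primary, secondary))   # t3 is redundant for primary but is kept
```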
static analysis has gained much attention over the past few years in applications such as bug finding and program verification as software becomes more complex and componentized it is common for software systems and applications to be implemented in multiple languages there is thus strong need for developing analysis tools for multi language software we introduce technique called analysis preserving language transformation aplt that enables the analysis of multi language software and also allows analysis tools for one language to be applied to programs written in another aplt preserves data and control flow information needed to perform static analyses but allows the translation to deviate from the original program’s semantics in ways that are not pertinent to the particular analysis we discuss major technical difficulties in building such translator using to java translator as an example we demonstrate the feasibility and effectiveness of aplt using two usage cases analysis of the java runtime native methods and reuse of java analysis tools for our preliminary results show that control and data flow equivalent model for native methods can eliminate unsoundness and produce reliable results and that aplt enables seamless reuse of analysis tools for checking high level program properties
alarm management has been around for decades in telecom solutions we have seen various efforts to define standardised alarm interfaces the research community has focused on various alarms correlation strategies still after years of effort in industry and research alike network administrators are flooded with alarms alarms are suffering from poor information quality and the costs of alarm integration have not decreased in this paper we explore the concept of alarm we define alarm and alarm type concepts by investigating the different definitions currently in use in standards and research efforts based on statistical alarm data from mobile operator we argue that operational and capital expenditures would decrease if alarm sources would apply to our alarm model
in this work first we present grid resource discovery protocol that discovers computing resources without the need for resource brokers to track existing resource providers the protocol uses scoring mechanism to aggregate and rank resource provider assets and internet router data tables called grid routing tables for storage and retrieval of the assets then we discuss the simulation framework used to model the protocol and the results of the experimentation the simulator utilizes simulation engine core that can be reused for other network protocol simulators considering time management event distribution and simulated network infrastructure the techniques for constructing the simulation core code using clr are also presented in this paper
rank aware query processing has emerged as key requirement in modern applications in these applications efficient and adaptive evaluation of top queries is an integral part of the application semantics in this article we introduce rank aware query optimization framework that fully integrates rank join operators into relational query engines the framework is based on extending the system dynamic programming algorithm in both enumeration and pruning we define ranking as an interesting physical property that triggers the generation of rank aware query plans unlike traditional join operators optimizing for rank join operators depends on estimating the input cardinality of these operators we introduce probabilistic model for estimating the input cardinality and hence the cost of rank join operator to our knowledge this is the first effort in estimating the needed input size for optimal rank aggregation algorithms costing ranking plans is key to the full integration of rank join operators in real world query processing engines since optimal execution strategies picked by static query optimizers lose their optimality due to estimation errors and unexpected changes in the computing environment we introduce several adaptive execution strategies for top queries that respond to these unexpected changes and costing errors our reactive reoptimization techniques change the execution plan at runtime to significantly enhance the performance of running queries since top query plans are usually pipelined and maintain complex ranking state altering the execution strategy of running ranking query is an important and challenging task we conduct an extensive experimental study to evaluate the performance of the proposed framework the experimental results are twofold we show the effectiveness of our cost based approach of integrating ranking plans in dynamic programming cost based optimizers and we show significant speedup up to percent when using our adaptive execution of ranking plans over the state of the art mid query reoptimization strategies
this paper explores the scalability of the stream processor architecture along the instruction data and thread level parallelism dimensions we develop detailed vlsi cost and processor performance models for multi threaded stream processor and evaluate the tradeoffs in both functionality and hardware costs of mechanisms that exploit the different types of parallelism we show that the hardware overhead of supporting coarse grained independent threads of control is depending on machine parameters we also demonstrate that the performance gains provided are of smaller magnitude for set of numerical applications we argue that for stream applications with scalable parallel algorithms the performance is not very sensitive to the control structures used within large range of area efficient architectural choices we evaluate the specific effects on performance of scaling along the different parallelism dimensions and explain the limitations of the ilp dlp and tlp hardware mechanisms
this paper addresses the problem of evaluating ranked top queries with expensive predicates as major dbmss now all support expensive user defined predicates for boolean queries we believe such support for ranked queries will be even more important first ranked queries often need to model user specific concepts of preference relevance or similarity which call for dynamic user defined functions second middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per object queries third fuzzy joins are inherently expensive as they are essentially user defined operations that dynamically associate multiple relations these predicates being dynamically defined or externally accessed cannot rely on index mechanisms to provide zero time sorted output and must instead require per object probe to evaluate the current standard sort merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects before sorting and merging them to produce top answers to minimize expensive probes we thus develop the formal principle of necessary probes which determines if probe is absolutely required we then propose algorithm mpro which by implementing the principle is provably optimal with minimal probe cost further we show that mpro can scale well and can be easily parallelized our experiments using both real estate benchmark database and synthetic datasets show that mpro enables significant probe reduction which can be orders of magnitude faster than the standard scheme using complete probing
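The necessary-probe principle is easy to illustrate for a toy scoring function. The sketch below assumes the overall score is the average of one cheap score and the expensive predicate scores (an assumption made only for illustration); it keeps objects ordered by a score upper bound and only ever probes the object at the head of that order, since no top-k answer can be certified without it.

```python
# Toy sketch of the necessary-probe idea: keep objects ordered by a score
# upper bound (known cheap score plus the maximum possible value of the
# unprobed expensive predicates); the object at the head of that order must
# be probed next, because no top-k answer can be certified without it.
import heapq

def top_k(objects, expensive_preds, k):
    """objects: dict id -> cheap score in [0, 1]; expensive_preds: list of
    functions id -> score in [0, 1]; overall score = average of all scores."""
    n = 1 + len(expensive_preds)
    # heap of (-upper_bound, id, number of predicates probed so far, partial sum)
    heap = [(-(s + len(expensive_preds)) / n, oid, 0, s) for oid, s in objects.items()]
    heapq.heapify(heap)
    results, probes = [], 0
    while heap and len(results) < k:
        neg_ub, oid, done, partial = heapq.heappop(heap)
        if done == len(expensive_preds):
            results.append((oid, -neg_ub))            # fully evaluated: safe to output
        else:
            probes += 1                                # a necessary probe
            partial += expensive_preds[done](oid)
            done += 1
            ub = (partial + (len(expensive_preds) - done)) / n
            heapq.heappush(heap, (-ub, oid, done, partial))
    return results, probes

objects = {'a': 0.9, 'b': 0.5, 'c': 0.1}
preds = [lambda oid: {'a': 0.8, 'b': 0.9, 'c': 0.2}[oid]]
print(top_k(objects, preds, k=1))                     # 'a' wins with a single probe
```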
in this paper we propose new shared memory model transactional memory coherence and consistency tcc tcc provides a model in which atomic transactions are always the basic unit of parallel work communication memory coherence and memory reference consistency tcc greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores along with their complexities tcc hardware must combine all writes from each transaction region in program into single packet and broadcast this packet to the permanent shared memory state atomically as large block this simplifies the coherence hardware because it reduces the need for small low latency messages and completely eliminates the need for conventional snoopy cache coherence protocols as multiple speculatively written versions of cache line may safely coexist within the system meanwhile automatic hardware controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously the cost of this simplified scheme is higher interprocessor bandwidth to explore the costs and benefits of tcc we study the characteristics of an optimal transaction based memory system and examine how different design parameters could affect the performance of real systems across spectrum of applications the tcc model itself did not limit available parallelism most applications are easily divided into transactions requiring only small write buffers on the order of kb the broadcast requirements of tcc are high but are well within the capabilities of cmps and small scale smps with high speed interconnects
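The transactional programming model can be mimicked in software to make it concrete. The sketch below is a software analogy only (the paper proposes a hardware scheme); the class and method names are invented, and conflict detection is simplified to checking the read set against write sets committed in the meantime.

```python
# Software analogy of the TCC programming model (the paper describes a
# hardware scheme): each transaction buffers its writes locally and, at
# commit, publishes them as one atomic block; a transaction whose read set
# overlaps a write set committed in the meantime is retried.
class TCCLikeMemory:
    def __init__(self):
        self.mem = {}
        self.commit_log = []                   # list of committed write sets

    def run_transaction(self, body, max_retries=10):
        for _ in range(max_retries):
            reads, writes = set(), {}
            start_version = len(self.commit_log)

            def read(addr):
                reads.add(addr)
                return writes.get(addr, self.mem.get(addr, 0))

            def write(addr, value):
                writes[addr] = value           # buffered, not yet visible

            body(read, write)
            # commit: check whether any write committed meanwhile touched our reads
            conflicting = any(reads & ws for ws in self.commit_log[start_version:])
            if not conflicting:
                self.mem.update(writes)        # publish the whole packet atomically
                self.commit_log.append(set(writes))
                return True
        return False

mem = TCCLikeMemory()
mem.run_transaction(lambda read, write: write('x', read('x') + 1))
mem.run_transaction(lambda read, write: write('x', read('x') + 1))
print(mem.mem['x'])                            # 2
```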
the performance analysis of heterogeneous multiprocessor systems is becoming increasingly difficult due to the steadily growing complexity of software and hardware components to cope with these increasing requirements analytic methods have been proposed the automatic generation of analytic system models that faithfully represent real system implementations has received relatively little attention however in this paper an approach is presented in which an analytic system model is automatically generated from the same specification that is also used for system synthesis analytic methods for performance analysis of system can thus be seamlessly integrated into the multi processor design flow which lays sound foundation for designing systems with predictable performance
software pipelining is an instruction scheduling technique that exploits the instruction level parallelism ilp available in loops by overlapping operations from various successive loop iterations the main drawback of aggressive software pipelining techniques is their high register requirements if the requirements exceed the number of registers available in the target architecture some steps need to be applied to reduce the register pressure incurring some performance degradation reduce iteration overlapping or spilling some lifetimes to memory in the first part of this paper we propose set of heuristics to improve the spilling process and to better decide between adding spill code or directly decreasing the execution rate of iterations the experimental evaluation over large number of representative loops and for processor configuration reports an increase in performance by factor of and reduction of memory traffic by factor of in the second part of this paper we analyze the use of backtracking and propose novel approach for simultaneous instruction scheduling and register spilling in modulo scheduling mirs modulo scheduling with integrated register spilling the experimental evaluation reports an increase in performance by factor of and reduction of the memory traffic by factor of or an additional and with regard to the proposal in the first part of the paper these improvements are achieved at the expense of reasonable increase in the compilation time
we show how polynomial path orders can be employed efficiently in conjunction with weak innermost dependency pairs to automatically certify polynomial runtime complexity of term rewrite systems and the polytime computability of the functions computed the established techniques have been implemented and we provide ample experimental data to assess the new method
the aim of this paper is to illustrate how plausibility description logic called dlp can be exploited for reasoning about information sources characterized by heterogeneous data representation formats the paper first introduces dlp syntax and semantics then dlp based approach is illustrated for inferring complex knowledge patterns from information sources being heterogeneous in their formats and structure degrees finally it is described how inferred knowledge might be taken advantage of for constructing user profiles to be exploited in various application scenarios among these that of improving the quality of web search tools is described in detail
the advantages and positive effects of multiple coordinated views on search performance have been documented in several studies this paper describes the implementation of multiple coordinated views within the media watch on climate change domain specific news aggregation portal available at www.ecoresearch.net/climate that combines portfolio of semantic services with visual information exploration and retrieval interface the system builds contextualized information spaces by enriching the content repository with geospatial semantic and temporal annotations and by applying semi automated ontology learning to create controlled vocabulary for structuring the stored information portlets visualize the different dimensions of the contextualized information spaces providing the user with multiple views on the latest news media coverage context information facilitates access to complex datasets and helps users navigate large repositories of web documents currently the system synchronizes information landscapes domain ontologies geographic maps tag clouds and just in time information retrieval agents that suggest similar topics and nearby locations
in this paper we discuss configurability as form of appropriation work we suggest that making technology work requires an awareness of the multiple dimensions of configurability carried out by numerous actors within and outside of the organizations in which new technologies are introduced in efforts to support cooperative work through discussion of the introduction of wireless call system into hospital we provide an overview of these dimensions organisational relations space and technology relations connectivity direct engagement and configurability as part of technology use and work and we suggest that in increasingly complex technological and organisational contexts greater attention will need to be focused on these dimensions of configurability in order to make things work
role play can be powerful educational tool especially when dealing with social or ethical issues however while other types of education activity have been routinely technology enhanced for some time the specific problems of supporting educational role play with technology have only begun to be tackled recently within the ecircus project we have designed framework for technology enhanced role play with the aim of educating adolescents about intercultural empathy this work was influenced by related fields such as intelligent virtual agents interactive narrative and pervasive games in this paper we will describe the different components of our role play technology by means of prototype implementation of this technology the orient showcase furthermore we will present some preliminary results of our first evaluation trials of orient
in this paper we propose model for representing and predicting distances in large scale networks by matrix factorization the model is useful for network distance sensitive applications such as content distribution networks topology aware overlays and server selections our approach overcomes several limitations of previous coordinates based mechanisms which cannot model sub optimal routing or asymmetric routing policies we describe two algorithms singular value decomposition svd and nonnegative matrix factorization nmf for representing matrix of network distances as the product of two smaller matrices with such representation we build scalable system internet distance estimation service ides that predicts large numbers of network distances from limited numbers of measurements extensive simulations on real world data sets show that ides leads to more accurate efficient and robust predictions of latencies in large scale networks than previous approaches
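A small numerical sketch can make the factorization view concrete. The code below is illustrative only; it uses a dense synthetic matrix rather than the paper's measurement-based system. An asymmetric distance matrix is factored with a truncated SVD so that each host gets an outgoing and an incoming coordinate vector, and a distance is predicted by their inner product.

```python
# Minimal sketch of the matrix-factorization view of network distances: a
# (possibly asymmetric) distance matrix D is approximated as X @ Y.T with
# low rank, so each host keeps an "outgoing" vector x_i and an "incoming"
# vector y_j, and d(i, j) is predicted as x_i . y_j. Uses truncated SVD.
import numpy as np

def factor_distances(D, rank):
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    X = U[:, :rank] * np.sqrt(s[:rank])          # outgoing coordinates
    Y = Vt[:rank, :].T * np.sqrt(s[:rank])       # incoming coordinates
    return X, Y

rng = np.random.default_rng(1)
# synthetic low-rank-ish "distance" matrix (asymmetric on purpose)
A, B = rng.uniform(0, 1, (30, 4)), rng.uniform(0, 1, (30, 4))
D = A @ B.T + rng.normal(0, 0.01, (30, 30))
X, Y = factor_distances(D, rank=4)
print(np.abs(D - X @ Y.T).mean())                # small reconstruction error
```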
the problem of implementing self stabilizing timestamps with bounded values is investigated and solution is found which is applied to the exclusion problem and to the multiwriter atomic register problem thus we get self stabilizing solutions to these two well known problems new type of weak timestamps is identified here and some evidence is brought to show its usefulness
information and specifically web pages may be organized indexed searched and navigated using various metadata aspects such as keywords categories themes and also space while categories and keywords are up for interpretation space represents an unambiguous aspect to structure information the basic problem of providing spatial references to content is solved by geocoding task that relates identifiers in texts to geographic co ordinates this work presents methodology for the semiautomatic geocoding of persistent web pages in the form of collaborative human intervention to improve on automatic geocoding results while focusing on the greek language and related web pages the developed techniques are universally applicable the specific contributions of this work are automatic geocoding algorithms for phone numbers addresses and place name identifiers and ii web browser extension providing map based interface for manual geocoding and updating the automatically generated results with the geocoding of web page being stored as respective annotations in central repository this overall mechanism is especially suited for persistent web pages such as wikipedia to illustrate the applicability and usefulness of the overall approach specific geocoding examples of greek web pages are presented
dimensionality reduction is one of the widely used techniques for data analysis however it is often hard to get demanded low dimensional representation with only the unlabeled data especially for the discriminative task in this paper we put forward novel problem of transferred dimensionality reduction which is to do unsupervised discriminative dimensionality reduction with the help of related prior knowledge from other classes in the same type of concept we propose an algorithm named transferred discriminative analysis to tackle this problem it uses clustering to generate class labels for the target unlabeled data and uses dimensionality reduction for them joint with prior labeled data to do subspace selection these two steps run adaptively to find better discriminative subspace and get better clustering results simultaneously the experimental results on both constrained and unconstrained face recognition demonstrate significant improvements of our algorithm over the state of the art methods
we have implemented the first copying garbage collector that permits continuous unimpeded mutator access to the original objects during copying the garbage collector incrementally replicates all accessible objects and uses mutation log to bring the replicas up to date with changes made by the mutator an experimental implementation demonstrates that the costs of using our algorithm are small and that bounded pause times of milliseconds can be readily achieved
an ad hoc data format is any nonstandard semi structured data format for which robust data processing tools are not easily available in this paper we present anne new kind of markup language designed to help users generate documentation and data processing tools for ad hoc text data more specifically given new ad hoc data source an anne programmer edits the document to add number of simple annotations which serve to specify its syntactic structure annotations include elements that specify constants optional data alternatives enumerations sequences tabular data and recursive patterns the anne system uses combination of user annotations and the raw data itself to extract context free grammar from the document this context free grammar can then be used to parse the data and transform it into an xml parse tree which may be viewed through browser for analysis or debugging purposes in addition the anne system generates pads ml description which may be saved as lasting documentation of the data format or compiled into host of useful data processing tools in addition to designing and implementing anne we have devised semantic theory for the core elements of the language this semantic theory describes the editing process which translates raw unannotated text document into an annotated document and the grammar extraction process which generates context free grammar from an annotated document we also present an alternative characterization of system behavior by drawing upon ideas from the field of relevance logic this secondary characterization which we call relevance analysis specifies direct relationship between unannotated documents and the context free grammars that our system can generate from them relevance analysis allows us to prove important theorems concerning the expressiveness and utility of our system
the osam kbms is knowledge base management system or the so called next generation database management system for non traditional data knowledge intensive applications in order to define query and manipulate knowledge base as well as to write codes to implement any application system we have developed an object oriented knowledge base programming language called to serve as the high level interface of osam kbms this paper presents the design of its implementation and its supporting kbms developed at the database systems research and development center of the university of florida
we present scalable end to end system for vision based monitoring of natural environments and illustrate its use for the analysis of avian nesting cycles our system enables automated analysis of thousands of images where manual processing would be infeasible we automate the analysis of raw imaging data using statistics that are tailored to the task of interest these features are representation to be fed to classifiers that exploit spatial and temporal consistencies our testbed can detect the presence or absence of bird with an accuracy of percent count eggs with an accuracy of percent and detect the inception of the nesting stage within day our results demonstrate the challenges and potential benefits of using imagers as biological sensors an exploration of system performance under varying image resolution and frame rate suggests that an in situ adaptive vision system is technically feasible
manual opacity transfer function editing for volume rendering can be difficult and counter intuitive process this paper proposes logarithmically scaled editor and argues that such scale relates the height of the transfer function to the rendered intensity of region of particular density in the volume almost directly resulting in much improved simpler manual transfer function editing
an interval of sequential process is sequence of consecutive events of this process the set of intervals defined on distributed computation defines an abstraction of this distributed computation and the traditional causality relation on events induces relation on the set of intervals that we call precedence an important question is then is the interval based abstraction associated with distributed computation consistent to answer this question this paper introduces consistency criterion named interval consistency ic intuitively this criterion states that an interval based abstraction of distributed computation is consistent if its precedence relation does not contradict the sequentiality of each process more formally ic is defined as property of precedence graph interestingly the ic criterion can be operationally characterized in terms of timestamps whose values belong to lattice the paper uses this characterization to design versatile protocol that given intervals defined by daemon whose behavior is unpredictable breaks them in nontrivial manner in order to produce an abstraction satisfying the ic criterion applications to communication induced checkpointing are suggested
heap allocation with copying garbage collection is general storage management technique for programming languages it is believed to have poor memory system performance to investigate this we conducted an in depth study of the memory system performance of heap allocation for memory systems found on many machines we studied the performance of mostly functional standard ml programs which made heavy use of heap allocation we found that most machines support heap allocation poorly however with the appropriate memory system organization heap allocation can have good performance the memory system property crucial for achieving good performance was the ability to allocate and initialize new object into the cache without penalty this can be achieved by having subblock placement with subblock size of one word with write allocate policy along with fast page mode writes or write buffer for caches with subblock placement the data cache overhead was under for or larger data cache without subblock placement the overhead was often higher than
we describe the conceptual model of sorac data modeling system developed at the university of rhode island sorac supports both semantic objects and relationships and provides tool for modeling databases needed for complex design domains sorac’s set of built in semantic relationships permits the schema designer to specify enforcement rules that maintain constraints on the object and relationship types sorac then automatically generates code to maintain the specified enforcement rules producing schema that is compatible with ontos this facilitates the task of the schema designer who no longer has to ensure that all methods on object classes correctly maintain necessary constraints in addition explicit specification of enforcement rules permits automated analysis of enforcement propagations we compare the interpretations of relationships within the semantic and object oriented models as an introduction to the mixed model that sorac supports next the set of built in sorac relationship types is presented in terms of the enforcement rules permitted on each relationship type we then use the modeling requirements of an architectural design support system called archobjects to demonstrate the capabilities of sorac the implementation of the current sorac prototype is also briefly discussed
the rapid advancements of networking technology have boosted potential bandwidth to the point that the cabling is no longer the bottleneck rather the bottlenecks lie at the crossing points the nodes of the network where data traffic is intercepted or forwarded as result there has been tremendous interest in speeding those nodes making the equipment run faster by means of specialized chips to handle data trafficking the network processor is the blanket name thrown over such chips in their varied forms to date no performance data exist to aid in the decision of what processor architecture to use in next generation network processor our goal is to remedy this situation in this study we characterize both the application workloads that network processors need to support as well as emerging applications that we anticipate may be supported in the future then we consider the performance of three sample benchmarks drawn from these workloads on several state of the art processor architectures including an aggressive out of order speculative super scalar processor fine grained multithreaded processor single chip multiprocessor and simultaneous multithreaded processor smt the network interface environment is simulated in detail and our results indicate that smt is the architecture best suited to this environment
we propose new approach to the estimation of query result sizes for join queries the technique which we have called systematic sampling syssmp is novel variant of the sampling based approach key novelty of the systematic sampling is that it exploits the sortedness of data the result of this is that the sample relation obtained well represents the underlying frequency distribution of the join attribute in the original relation we first develop theoretical foundation for systematic sampling which suggests that the method gives more representative sample than the traditional simple random sampling subsequent experimental analysis on range of synthetic relations confirms that the quality of sample relations yielded by systematic sampling is higher than those produced by the traditional simple random sampling to ensure that sample relations produced by systematic sampling indeed assist in computing more accurate query result sizes we compare systematic sampling with the most efficient simple random sampling called cross using variety of relation configurations the results obtained validate that systematic sampling uses the same amount of sampling but still provides more accurate query result sizes than cross furthermore the extra sampling cost incurred by the use of systematic sampling pays off in cheaper query execution cost at run time
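a toy python sketch of the sampling step and a naive scale up estimator built on it (the paper's estimator and cost model are more involved; the step sizes and the uniform scale up are my assumptions): the relation is ordered on the join attribute and every step-th tuple is taken from a random starting offset, so the sample tracks the attribute's frequency distribution

```python
import random
from collections import Counter

def systematic_sample(relation, join_attr, step):
    """Order tuples on the join attribute, then take every step-th tuple
    starting from a random offset."""
    ordered = sorted(relation, key=lambda t: t[join_attr])
    start = random.randrange(step)
    return ordered[start::step]

def estimate_join_size(R, S, join_attr, step_r, step_s):
    """Join the two samples and scale up by the inverse sampling fractions."""
    r = systematic_sample(R, join_attr, step_r)
    s = systematic_sample(S, join_attr, step_s)
    s_freq = Counter(t[join_attr] for t in s)
    sample_join_size = sum(s_freq[t[join_attr]] for t in r)
    return sample_join_size * step_r * step_s

# usage: R and S as lists of dicts, e.g. estimate_join_size(R, S, "a", 10, 10)
```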
irregular algorithms are organized around pointer based data structures such as graphs and trees and they are ubiquitous in applications recent work by the galois project has provided systematic approach for parallelizing irregular applications based on the idea of optimistic or speculative execution of programs however the overhead of optimistic parallel execution can be substantial in this paper we show that many irregular algorithms have structure that can be exploited and present three key optimizations that take advantage of algorithmic structure to reduce speculative overheads we describe the implementation of these optimizations in the galois system and present experimental results to demonstrate their benefits to the best of our knowledge this is the first system to exploit algorithmic structure to optimize the execution of irregular programs
in this paper we describe dense motion segmentation method for wide baseline image pairs unlike many previous methods our approach is able to deal with deforming motions and large illumination changes by using bottom up segmentation strategy the method starts from sparse set of seed matches between the two images and then proceeds to quasi dense matching which expands the initial seed regions by using local propagation then the quasi dense matches are grouped into coherently moving segments by using local bending energy as the grouping criterion the resulting segments are used to initialize the motion layers for the final dense segmentation stage where the geometric and photometric transformations of the layers are iteratively refined together with the segmentation which is based on graph cuts our approach provides wider range of applicability than the previous approaches which typically require rigid planar motion model or motion with small disparity in addition we model the photometric transformations in spatially varying manner our experiments demonstrate the performance of the method with real images involving deforming motion and large changes in viewpoint scale and illumination
an inherent dilemma exists in the design of high functionality applications such as repositories of reusable software components in order to be useful high functionality applications have to provide large number of features creating huge learning problems for users we address this dilemma by developing intelligent interfaces that support learning on demand by enabling users to learn new features when they are needed during work we support learning on demand with information delivery by identifying learning opportunities of which users might not be aware the challenging issues in implementing information delivery are discussed and techniques to address them are illustrated with the codebroker system codebroker supports java programmers in learning reusable software components in the context of their normal development environments and practice by proactively delivering task relevant and personalized information evaluations of the system have shown its effectiveness in supporting learning on demand
feature selection as preprocessing step to machine learning has been very effective in reducing dimensionality removing irrelevant data increasing learning accuracy and improving result comprehensibility traditional feature selection methods resort to random sampling in dealing with data sets with huge number of instances in this paper we introduce the concept of active feature selection and investigate selective sampling approach to active feature selection in filter model setting we present formalism of selective sampling based on data variance and apply it to widely used feature selection algorithm relief further we show how it realizes active feature selection and reduces the required number of training instances to achieve time savings without performance deterioration we design objective evaluation measures of performance conduct extensive experiments using both synthetic and benchmark data sets and observe consistent and significant improvement we suggest some further work based on our study and experiments
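a small python sketch of the idea of plugging a selective sampler into relief (two class case); the variance guided sampler shown is a simplified stand in of my own, not the paper's formalism based on data variance partitioning, and features are assumed numeric and comparably scaled

```python
import numpy as np

def relief_weights(X, y, sample_idx):
    """Basic two-class Relief: for each chosen instance find its nearest hit
    (same class) and nearest miss (other class) and update feature weights."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    for i in sample_idx:
        diff = np.abs(X - X[i])            # per-feature distances to X[i]
        dist = diff.sum(axis=1)
        dist[i] = np.inf                   # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += diff[miss] - diff[hit]
    return w / len(sample_idx)

def variance_guided_sample(X, m, seed=0):
    """Stand-in for selective sampling: bias instance choice toward
    high-variance regions by weighting each instance with its squared
    distance from the data mean."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    dev = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    return rng.choice(len(X), size=m, replace=False, p=dev / dev.sum())
```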
we investigate techniques for analysis and retrieval of object trajectories we assume that trajectory is sequence of two or three dimensional points trajectory datasets are very common in environmental applications mobility experiments video surveillance and are especially important for the discovery of certain biological patterns such kind of data usually contain great amount of noise that makes all previously used metrics fail therefore here we formalize non metric similarity functions based on the longest common subsequence lcss which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences stretching of sequences in time is allowed as well as global translating of the sequences in space efficient approximate algorithms that compute these similarity measures are also provided we compare these new methods to the widely used euclidean and dynamic time warping distance functions for real and synthetic data and show the superiority of our approach especially under the strong presence of noise we prove weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries finally we present experimental results that validate the accuracy and efficiency of our approach
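a short python sketch of the delta epsilon constrained lcss recurrence that such similarity functions build on (the translation search and the indexing structure are not shown, and treating trajectories as equal rate point sequences is an assumption)

```python
def lcss(A, B, eps, delta):
    """Longest common subsequence for trajectories: points A[i], B[j] match
    when every coordinate differs by less than eps and the index offset
    |i - j| is at most delta (the allowed stretching in time)."""
    n, m = len(A), len(B)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = all(abs(a - b) < eps for a, b in zip(A[i - 1], B[j - 1]))
            if close and abs(i - j) <= delta:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

def lcss_similarity(A, B, eps, delta):
    """Normalized similarity in [0, 1]; robust to noisy outlying points
    because unmatched points simply do not contribute."""
    return lcss(A, B, eps, delta) / min(len(A), len(B))
```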
recent work has presented the design and implementation of software library named dymelor supporting transparent log restore facilities for optimistic simulation objects with generic memory layout this library offers the possibility to allocate deallocate memory chunks via standard api and performs log restore of the object state via pack unpack techniques exploiting ad hoc meta data concisely identifying the object state layout at each point in simulation time in this paper we complement such library with software architecture offering the following additional advantages run time identification of chunk updates within the dynamic memory map ii reduced checkpoint latency and increased effectiveness in memory usage thanks to log restore facilities based on periodic snapshots of the whole simulation object state taken via the incremental copy of the modified dirty chunks only our approach is based on software instrumentation techniques suited for linux and the elf format targeting memory update references performed by the application level software and on lightweight run time monitoring mechanism providing minimal overhead while tracking the exact memory addresses and the size of memory areas dirtied by the execution of each event also our design has been oriented to portability across bit and bit intel compliant architectures thus covering wide spectrum of off the shelf machines
in this paper we propose new communication synthesis approach targeting systems with sequential communication media scm since scms require that the reading sequence and writing sequence must have the same order different transmission orders may have dramatic impact on the final performance however the problem of determining the best possible communication order for scms is not adequately addressed by prior work the goal of our work is to consider behaviors in communication synthesis for scm detect appropriate transmission order to optimize latency automatically transform the behavior descriptions and automatically generate driver routines and glue logics to access physical channels our algorithm named scoop successfully achieves these goals by behavior and communication co optimization compared to the results without optimization we can achieve an average improvement in total latency on set of real life benchmarks
we report on an extension of haskell with open type level functions and equality constraints that unifies earlier work on gadts functional dependencies and associated types the contribution of the paper is that we identify and characterise the key technical challenge of entailment checking and we give novel decidable sound and complete algorithm to solve it together with some practically important variants our system is implemented in ghc and is already in active use
due to the significant progress in automated verification there are often several techniques for particular verification problem in many circumstances different techniques are complementary each technique works well for different type of input instances unfortunately it is not clear how to choose an appropriate technique for specific instance of problem in this work we argue that this problem selection of a technique and tuning its parameter values should be considered as a standalone problem verification meta search we propose several classifications of models of asynchronous system and discuss applications of these classifications in the context of explicit finite state model checking
in the data warehouse environment the concept of materialized view is nowadays common and important in an objective of efficiently supporting olap query processing materialized views are generally derived from select project join of several base relations these materialized views need to be updated when the base relations change since the propagation of updates to the views may impose significant overhead it is very important to update the warehouse views efficiently though various view maintenance strategies have been discussed so far they typically require too much access to base relations resulting in the performance degradation in this paper we propose an efficient incremental view maintenance strategy called delta propagation that can minimize the total size of base relations accessed by analyzing the properties of base relations we first define the delta expression and delta propagation tree which are core concepts of the strategy then dynamic programming algorithm that can find the optimal delta expression is proposed we also present various experimental results that show the usefulness and efficiency of the strategy
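to make the flavor of delta based maintenance concrete, a minimal python sketch of the standard delta identity for a two relation join view under insertions (the paper's delta expressions and propagation tree generalize this to several base relations and choose which of them to access; the relation and attribute names are illustrative)

```python
def join(R, S, attr):
    """Natural join of two lists of dict tuples on a single attribute."""
    return [{**r, **s} for r in R for s in S if r[attr] == s[attr]]

def delta_view_insertions(R_old, S_old, dR, dS, attr):
    """Delta of the view V = R join S when dR and dS are inserted:
    dV = (dR join S_old) + (R_new join dS), so the full view is never
    recomputed and base-relation access is limited to the terms shown."""
    R_new = R_old + dR
    return join(dR, S_old, attr) + join(R_new, dS, attr)

# usage: V_new = V_old + delta_view_insertions(R_old, S_old, dR, dS, "k")
```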
this paper addresses the maximal lifetime scheduling for sensor surveillance systems with sensors to target given set of sensors and targets in an euclidean plane sensor can watch only one target at time and target should be watched by at least sensors at any time our task is to schedule sensors to watch targets and pass data to the base station such that the lifetime of the surveillance system is maximized where the lifetime is the duration up to the time when there exists one target that cannot be watched by sensors or data cannot be forwarded to the base station due to the depletion of energy of the sensor nodes we propose an optimal solution to find the target watching schedule for sensors that achieves the maximal lifetime our solution consists of three steps computing the maximal lifetime of the surveillance system and workload matrix by using linear programming techniques decomposing the workload matrix into sequence of schedule matrices that can achieve the maximal lifetime and determining the sensor surveillance trees based on the above obtained schedule matrices which specify the active sensors and the routes to pass sensed data to the base station this is the first time in the literature that this scheduling problem of sensor surveillance systems has been formulated and the optimal solution has been found we illustrate our optimal method by numeric example and experiments in the end
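a hedged linear programming sketch of the kind of formulation involved in the first step, in my own notation: x_{s,t} is the total time sensor s watches target t, L the lifetime, E_s the initial energy of sensor s, e the energy drawn per unit watching time, and k the required number of simultaneous watchers per target; the paper's actual formulation also charges the energy spent forwarding data toward the base station

```latex
\begin{align*}
\text{maximize}\quad   & L \\
\text{subject to}\quad & \textstyle\sum_{s\,:\,s \text{ can watch } t} x_{s,t} \;\ge\; k\,L
                         && \text{for each target } t,\\
                       & \textstyle\sum_{t} x_{s,t} \;\le\; L
                         && \text{each sensor watches one target at a time},\\
                       & \textstyle\sum_{t} e\,x_{s,t} \;\le\; E_s
                         && \text{energy budget of sensor } s,\\
                       & x_{s,t}\ge 0,\; L\ge 0 .
\end{align*}
```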
reference counting memory management is often advocated as technique for reducing or avoiding the pauses associated with tracing garbage collection we present some measurements to remind the reader that classic reference count implementations may in fact exhibit longer pauses than tracing collectors we then analyze reference counting with lazy deletion the standard technique for avoiding long pauses by deferring deletions and associated reference count decrements usually to allocation time our principal result is that if each reference count operation is constrained to take constant time then the overall space requirements can be increased by factor of in the worst case where is the ratio between the size of the largest and smallest allocated object this bound is achievable but probably large enough to render this design point useless for most real time applications we show that this space cost can largely be avoided if allocating an byte object is allowed to additionally perform reference counting work
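a toy python sketch of the lazy deletion scheme being analyzed (my own simplification, not the paper's cost model): a decrement that reaches zero only enqueues the object, and each allocation performs a bounded amount of deferred reclamation work

```python
class LazyRefCountHeap:
    """Toy reference-counted heap with lazy deletion."""

    def __init__(self):
        self.refcount = {}     # object id -> reference count
        self.children = {}     # object id -> ids of referenced objects
        self.zero_queue = []   # dead objects awaiting reclamation
        self.next_id = 0

    def allocate(self, child_ids=(), work_per_alloc=2):
        # bounded deferred reclamation before allocating
        for _ in range(work_per_alloc):
            if not self.zero_queue:
                break
            dead = self.zero_queue.pop()
            for c in self.children.pop(dead, ()):
                self.dec(c)                # may enqueue further dead objects
            del self.refcount[dead]
        oid = self.next_id
        self.next_id += 1
        self.refcount[oid] = 0             # caller must inc() when storing a reference
        self.children[oid] = list(child_ids)
        for c in child_ids:
            self.inc(c)
        return oid

    def inc(self, oid):
        self.refcount[oid] += 1

    def dec(self, oid):
        self.refcount[oid] -= 1
        if self.refcount[oid] == 0:
            self.zero_queue.append(oid)    # defer freeing and child decrements
```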
as configurable processing advances elements from the traditional approaches of both hardware and software development can be combined by incorporating customized application specific computational resources into the processor’s architecture especially in the case of field programmable gate array based systems with soft processors so as to enhance the performance of embedded applications this paper explores the use of several different microarchitectural alternatives to increase the performance of edge detection algorithms which are of fundamental importance for the analysis of dna microarray images optimized application specific hardware modules are combined with efficient parallelized software in an embedded soft core based multi processor it is demonstrated that the performance of one common edge detection algorithm namely sobel can be boosted remarkably by exploiting the architectural extensions offered by the soft processor in conjunction with the execution of carefully selected application specific instruction set extensions on custom made accelerating co processor connected to the processor core we introduce new approach that makes this methodology noticeably more efficient across various applications from the same domain which are often similar in structure with flexibility to update the processing algorithms an improvement reaching one order of magnitude over all software solutions could be obtained in support of this flexibility an effective adaptation of this approach is demonstrated which performs real time analysis of extracted microarray data the proposed reconfigurable multi core prototype has been exploited with minor changes to achieve almost speedup
as the field of sensor networks matures research in this area is focusing not only on fixed networks but also on mobile sensor networks for many reasons both technical and logistical such networks will often be very sparse for all or part of their operation sometimes functioning more as disruption tolerant networks dtns while much work has been done on localization methods for densely populated fixed networks most of these methods are inefficient or ineffective for sparse mobile networks where connections can be infrequent while some mobile networks rely on fixed location beacons or per node onboard gps these methods are not always possible due to cost power and other constraints in this paper we present the low density collaborative ad hoc localization estimation locale system for sparse sensor networks in locale each node estimates its own position and collaboratively refines that location estimate by updating its prediction based on neighbors it encounters nodes also estimate as probability density function the likelihood their prediction is accurate we evaluate locale’s collaborative localization both through real implementations running on sensor nodes as well as through simulations of larger systems we consider scenarios of varying density down to neighbors per communication attempt as well as scenarios that demonstrate locale’s resilience in the face of extremely inaccurate individual nodes overall our algorithms yield up to median of better accuracy for location estimation compared to existing approaches in addition by allowing nodes to refine location estimates collaboratively locale also reduces the need for fixed location beacons ie gps enabled beacon towers by as much as
in real time collaborative systems replicated objects shared by users are subject to concurrency constraints in order to satisfy these various algorithms qualified as optimistic have been proposed that exploit the semantic properties of operations to serialize concurrent operations and achieve copy convergence of replicated objects their drawback is that they either require condition on user’s operations which is hard to verify when possible to ensure or they need undoing then redoing operations in some situations the main purpose of this paper is to present two new algorithms that overcome these drawbacks they are based upon the implementation of continuous global order which enables that condition to be released and simplifies the operation integration process in the second algorithm thanks to deferred broadcast of operations to other sites this process becomes even more simplified
in akl and taylor cryptographic solution to problem of access control in hierarchy acm transactions on computer systems first suggested the use of cryptographic techniques to enforce access control in hierarchical structures due to its simplicity and versatility the scheme has been used for more than twenty years to implement access control in several different domains including mobile agent environments and xml documents however despite its use over time the scheme has never been fully analyzed with respect to security and efficiency requirements in this paper we provide new results on the akl taylor scheme and its variants more precisely we provide rigorous analysis of the akl taylor scheme we consider different key assignment strategies and prove that the corresponding schemes are secure against key recovery we show how to obtain different tradeoffs between the amount of public information and the number of steps required to perform key derivation in the proposed schemes we also look at the mackinnon et al and harn and lin schemes and prove they are secure against key recovery we describe an akl taylor based key assignment scheme with time dependent constraints and prove the scheme efficient flexible and secure we propose general construction which is of independent interest yielding key assignment scheme offering security wrt key indistinguishability given any key assignment scheme which guarantees security against key recovery finally we show how to use our construction along with our assignment strategies and tradeoffs to obtain an akl taylor scheme secure wrt key indistinguishability requiring constant amount of public information
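for concreteness, a toy python sketch of the original akl taylor assignment the paper analyzes (not the paper's new strategies or proofs): each class gets a distinct prime, the public value t_i multiplies the primes of every class that i cannot access, the secret key is k_i = k0^{t_i} mod M, and a class derives an accessible key by raising its own key to t_j / t_i; in practice M is an rsa style modulus with secret factorization, and the access relation is assumed transitively closed

```python
from math import prod

def first_primes(n):
    """The first n primes, by trial division."""
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def akl_taylor_assign(classes, can_access, k0, M):
    """can_access[i] = set of classes i may read (including i itself),
    assumed transitively closed. Returns public values t and secret keys k."""
    p = dict(zip(classes, first_primes(len(classes))))
    t = {i: prod(p[j] for j in classes if j not in can_access[i]) for i in classes}
    k = {i: pow(k0, t[i], M) for i in classes}
    return t, k

def derive_key(k_i, t_i, t_j, M):
    """Class i derives class j's key when t_i divides t_j: k_j = k_i^(t_j/t_i) mod M."""
    assert t_j % t_i == 0, "class j is not accessible from class i"
    return pow(k_i, t_j // t_i, M)

# usage (tiny illustrative modulus only):
# t, k = akl_taylor_assign(["root", "a"], {"root": {"root", "a"}, "a": {"a"}}, k0=7, M=3233)
# derive_key(k["root"], t["root"], t["a"], 3233) == k["a"]
```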
with the increase of available public data sources and the interest for analyzing them privacy issues are becoming the eye of the storm in many applications the vast amount of data collected on human beings and organizations as result of cyberinfrastructure advances or that collected by statistical agencies for instance has made traditional ways of protecting social science data obsolete this has given rise to different techniques aimed at tackling this problem and at the analysis of limitations in such environments such as the seminal study by aggarwal of anonymization techniques and their dependency on data dimensionality the growing accessibility to high capacity storage devices allows keeping more detailed information from many areas while this enriches the information and conclusions extracted from this data it poses serious problem for most of the previous work presented up to now regarding privacy focused on quality and paying little attention to performance aspects in this workshop we want to gather researchers in the areas of data privacy and anonymization together with researchers in the area of high performance and very large data volumes management we seek to collect the most recent advances in data privacy and anonymization ie anonymization techniques statistic disclosure techniques privacy in machine learning algorithms privacy in graphs or social networks etc and those in high performance and data management ie algorithms and structures for efficient data management parallel or distributed systems etc
this paper presents systematic approach to matching categories of query language interfaces with the requirements of certain user types the method is based on trend model of query language development on the dimensions of functional capabilities and usability from the trend model the following are derived classification scheme for query languages criterion hierarchy for query language evaluation comprehensive classification scheme of query language users and their requirements and preliminary recommendations for allocating language classes to user types the method integrates the results of existing human factors studies and provides structured framework for future research in this area current and expected developments are exemplified by the description of new generation database query languages in practical query language selection problem the results of this paper can be used for preselecting suitable query language types the final selection decision will also depend on organization specific factors such as the available database management system hardware and software strategies and financial system costs
many integrated circuit systems particularly in the multimedia and telecom domains are inherently data dominant for this class of systems large part of the power consumption is due to the data storage and data transfer moreover significant part of the chip area is occupied by memory the computation of the memory size is an important step in the system level exploration in the early stage of designing an optimized for area and or power memory architecture for this class of systems this paper presents novel nonscalar approach for computing exactly the minimum size of the data memory for high level procedural specifications of multidimensional signal processing applications in contrast with all the previous works which are estimation methods this approach can perform exact memory computations even for applications with numerous and complex array references and also with large numbers of scalars
this paper evaluates the potential contributions of natural language processing to requirements engineering we present selective history of the relationship between requirements engineering re and natural language processing nlp and briefly summarize relevant recent trends in nlp the paper outlines basic issues in re and how they relate to interactions between nlp front end and system development processes we suggest some improvements to nlp that may be possible in the context of re and conclude with an assessment of what should be done to improve likelihood of practical impact in this direction
image retrieval with dynamically extracted features compares user defined regions of interest with all sections of the archived images image elements outside the selected region are ignored thus objects can be found regardless of the specific environment several wavelet and gabor based methods for template matching are proposed the improved retrieval flexibility requires immense computational resources which can be satisfied by utilisation of powerful parallel architectures therefore cluster based architecture for efficient image retrieval is discussed in the second part of the article techniques for the partitioning of the image information parallel execution of the queries and strategies for workload balancing are explained by considering the parallel image database cairo as an example the quality and efficiency of the retrieval is examined by number of experiments
the paradigm shift in processor design from monolithic processors to multicore has renewed interest in programming models that facilitate parallelism while multicores are here today the future is likely to witness architectures that use reconfigurable fabrics fpgas as coprocessors fpgas provide an unmatched ability to tailor their circuitry per application leading to better performance at lower power unfortunately the skills required to program fpgas are beyond the expertise of skilled software programmers this paper shows how to bridge the gap between programming software vs hardware we introduce lime new object oriented language that can be compiled for the jvm or into synthesizable hardware description language lime extends java with features that provide way to carry oo concepts into efficient hardware we detail an end to end system from the language down to hardware synthesis and demonstrate lime program running on both conventional processor and in an fpga
we present transport protocol whose goal is to reduce power consumption without compromising delivery requirements of applications to meet its goal of energy efficiency our transport protocol contains mechanisms to balance end to end vs local retransmissions minimizes acknowledgment traffic using receiver regulated rate based flow control combined with selected acknowledgements and in network caching of packets and aggressively seeks to avoid any congestion based packet loss within recently developed ultra low power multi hop wireless network system extensive simulations and experimental results demonstrate that our transport protocol meets its goal of preserving the energy efficiency of the underlying network
this paper presents an exploratory study of college age students using two way push to talk cellular radios we describe the observed and reported use of cellular radio by the participants we discuss how the half duplex lightweight cellular radio communication was associated with reduced interactional commitment which meant the cellular radios could be used for wide range of conversation styles one such style intermittent conversation is characterized by response delays intermittent conversation is surprising in an audio medium since it is typically associated with textual media such as instant messaging we present design implications of our findings
it is not uncommon for modern systems to be composed of variety of interacting services running across multiple machines in such way that most developers do not really understand the whole system as abstraction is layered atop abstraction developers gain the ability to compose systems of extraordinary complexity with relative ease however many software properties especially those that cut across abstraction layers become very difficult to understand in such compositions the communication patterns involved the privacy of critical data and the provenance of information can be difficult to find and understand even with access to all of the source code the goal of data flow tomography is to use the inherent information flow of such systems to help visualize the interactions between complex and interwoven components across multiple layers of abstraction in the same way that the injection of short lived radioactive isotopes help doctors trace problems in the cardiovascular system the use of data tagging can help developers slice through the extraneous layers of software and pin point those portions of the system interacting with the data of interest to demonstrate the feasibility of this approach we have developed prototype system in which tags are tracked both through the machine and in between machines over the network and from which novel visualizations of the whole system can be derived we describe the system level challenges in creating working system tomography tool and we qualitatively evaluate our system by examining several example real world scenarios
the need for network storage has been increasing rapidly owing to the widespread use of the internet in organizations and the shortage of local storage space due to the increasing size of applications and databases proliferation of network storage systems entails significant increase in the number of storage objects eg files stored the number of concurrent clients and the size and number of storage objects transferred between the systems and their clients performance eg client perceived latency of these systems becomes major concern previous research has explored techniques for scaling up the number of storage servers involved to enhance the performance of network storage systems however adding servers to improve system performance is an expensive solution moreover for wan based network storage system the bottleneck for its performance improvement typically is not caused by the load of storage servers but by the network traffic between clients and storage servers this paper introduces an internet based network storage system named netshark and proposes caching based performance enhancement solution for such system the proposed performance enhancement solution is validated using simulation
in this paper we propose scheme that combines type inference and run time checking to make existing programs type safe we describe the ccured type system which extends that of by separating pointer types according to their usage this type system allows both pointers whose usage can be verified statically to be type safe and pointers whose safety must be checked at run time we prove type soundness result and then we present surprisingly simple type inference algorithm that is able to infer the appropriate pointer kinds for existing programs our experience with the ccured system shows that the inference is very effective for many programs as it is able to infer that most or all of the pointers are statically verifiable to be type safe the remaining pointers are instrumented with efficient run time checks to ensure that they are used safely the resulting performance loss due to run time checks is which is several times better than comparable approaches that use only dynamic checking using ccured we have discovered programming bugs in established programs such as several specint benchmarks
network data is ubiquitous encoding collections of relationships between entities such as people places genes or corporations while many resources for networks of interesting entities are emerging most of these can only annotate connections in limited fashion although relationships between entities are rich it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large real world corpora in this paper we present novel probabilistic topic model to analyze text corpora and infer descriptions of its entities and of relationships between those entities we develop variational methods for performing approximate inference on our model and demonstrate that our model can be practically deployed on large corpora such as wikipedia we show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions
we introduce the scape method shape completion and animation for people data driven method for building human shape model that spans variation in both subject shape and pose the method is based on representation that incorporates both articulated and non rigid deformations we learn pose deformation model that derives the non rigid surface deformation as function of the pose of the articulated skeleton we also learn separate model of variation based on body shape our two models can be combined to produce surface models with realistic muscle deformation for different people in different poses when neither appear in the training set we show how the model can be used for shape completion generating complete surface mesh given limited set of markers specifying the target shape we present applications of shape completion to partial view completion and motion capture animation in particular our method is capable of constructing high quality animated surface model of moving person with realistic muscle deformation using just single static scan and marker motion capture sequence of the person
ownership types enforce encapsulation in object oriented programs by ensuring that objects cannot be leaked beyond objects that own them existing ownership programming languages either do not support parametric polymorphism type genericity or attempt to add it on top of ownership restrictions generic ownership provides per object ownership on top of sound generic imperative language the resulting system not only provides ownership guarantees comparable to established systems but also requires few additional language mechanisms due to full reuse of parametric polymorphism we formalise the core of generic ownership highlighting that only restriction of this calls and owner subtype preservation are required to achieve deep ownership finally we describe how ownership generic java ogj was implemented as minimal extension to generic java in the hope of bringing ownership types into mainstream programming
community wireless mesh networks wmns are increasingly being deployed for providing cheap low maintenance internet access for the successful adoption of wmns as last mile technology we argue that guarantee of per client fairness is critical specifically wmns should support bitrate for bucks service model similar to other popular access technologies such as cable dsl we analyze the effectiveness of both off the shelf and theoretically optimal approaches towards providing such service we propose the apollo system that outperforms both these approaches apollo seamlessly integrates three synergistic components theory guided service planning and subscription rate based admission control to enforce the planned service and novel distributed light weight fair scheduling scheme to deliver the admitted traffic we evaluate apollo using simulations and testbed experiments
in this paper we address the problem of shape analysis for concurrent programs we present new algorithms based on abstract interpretation for automatically verifying properties of programs with an unbounded number of threads manipulating an unbounded shared heap our algorithms are based on new abstract domain whose elements represent thread quantified invariants ie invariants satisfied by all threads we exploit existing abstractions to represent the invariants thus our technique lifts existing abstractions by wrapping universal quantification around elements of the base abstract domain such abstractions are effective because they are thread modular eg they can capture correlations between the local variables of the same thread as well as correlations between the local variables of thread and global variables but forget correlations between the states of distinct threads the exact nature of the abstraction of course depends on the base abstraction lifted in this style we present techniques for computing sound transformers for the new abstraction by using transformers of the base abstract domain we illustrate our technique in this paper by instantiating it to the boolean heap abstraction producing quantified boolean heap abstraction we have implemented an instantiation of our technique with canonical abstraction as the base abstraction and used it to successfully verify linearizability of data structures in the presence of an unbounded number of threads
backbone has been used extensively in various aspects eg routing route maintenance broadcast scheduling for wireless networks previous methods are mostly designed to minimize the backbone size however in many applications it is desirable to construct backbone with small cost when each wireless node has cost of being in the backbone in this paper we first show that previous methods specifically designed to minimize the backbone size may produce backbone with large cost we then propose an efficient distributed method to construct weighted sparse backbone with low cost we prove that the total cost of the constructed backbone is within small constant factor of the optimum for homogeneous networks when either the nodes costs are smooth or the network maximum node degree is bounded we also show that with small modification the constructed backbone is efficient for unicast the total cost or hop of the least cost or hop path connecting any two nodes using backbone is no more than or times of the least cost or hop path in the original communication graph as side product we give an efficient overlay based multicast structure whose total cost is no more than times of the minimum when the network is modeled by udg our theoretical results are corroborated by our simulation studies
regression testing is an expensive but necessary maintenance activity performed on modified software to provide confidence that changes are correct and do not adversely affect other portions of the software regression test selection technique chooses from an existing test set tests that are deemed necessary to validate modified software we present new technique for regression test selection our algorithms construct control flow graphs for procedure or program and its modified version and use these graphs to select tests that execute changed code from the original test suite we prove that under certain conditions the set of tests our technique selects includes every test from the original test suite that can expose faults in the modified procedure or program under these conditions our algorithms are safe moreover although our algorithms may select some tests that cannot expose faults they are at least as precise as other safe regression test selection algorithms unlike many other regression test selection algorithms our algorithms handle all language constructs and all types of program modifications we have implemented our algorithms initial empirical studies indicate that our technique can significantly reduce the cost of regression testing modified software
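a minimal python illustration of the selection step only (the actual algorithms derive the set of changed entities by traversing the control flow graphs of the original and modified versions in parallel, which is not shown; the trace representation used here is an assumption)

```python
def select_regression_tests(coverage, changed):
    """Keep every test whose recorded execution trace over the original
    program touches an entity (e.g. a CFG node or edge) whose code changed."""
    changed = set(changed)
    return [t for t, entities in coverage.items() if changed & set(entities)]

# example: tests mapped to the CFG edges they executed
coverage = {"t1": {("entry", "n1"), ("n1", "n2")},
            "t2": {("entry", "n1"), ("n1", "n3")}}
print(select_regression_tests(coverage, [("n1", "n3")]))   # -> ['t2']
```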
there are many applications that demand large quantities of natural looking motion it is difficult to synthesize motion that looks natural particularly when it is people who must move in this paper we present framework that generates human motions by cutting and pasting motion capture data selecting collection of clips that yields an acceptable motion is combinatorial problem that we manage as randomized search of hierarchy of graphs this approach can generate motion sequences that satisfy variety of constraints automatically the motions are smooth and human looking they are generated in real time so that we can author complex motions interactively the algorithm generates multiple motions that satisfy given set of constraints allowing variety of choices for the animator it can easily synthesize multiple motions that interact with each other using constraints this framework allows the extensive re use of motion capture data for new purposes
earth science research and applications usually use distributed geospatial information processing dgip services and powerful computing capabilities to extract information and knowledge from large volumes of distributed geospatial data conceptually such processing can be abstracted into logical model that utilizes geospatial domain knowledge to produce new geospatial products using this idea the geo tree concept and the proposed geospatial abstract information model aim have been used to develop grid workflow engine complying with geospatial standards and the business process execution language upon user’s request the engine generates virtual geospatial data information knowledge products from existing dgip data and services this article details how to define and describe the aim in xml format describe the process logically with an aim including the geospatial semantic logic conceptually describe the process of producing particular geospatial product step by step from raw geospatial data instantiate aim as concrete grid service workflow by selecting the optimal service instances and data sets and design grid workflow engine to execute the concrete workflows to produce geospatial products to verify the advantages and applicability of this grid enabled virtual geospatial product system its performance is evaluated and sample application is provided
feature selection is an important aspect of solving data mining and machine learning problems this paper proposes feature selection method for the support vector machine svm learning like most feature selection methods the proposed method ranks all features in decreasing order of importance so that more relevant features can be identified it uses novel criterion based on the probabilistic outputs of svm this criterion termed feature based sensitivity of posterior probabilities fspp evaluates the importance of specific feature by computing the aggregate value over the feature space of the absolute difference of the probabilistic outputs of svm with and without the feature the exact form of this criterion is not easily computable and approximation is needed four approximations fspp fspp are proposed for this purpose the first two approximations evaluate the criterion by randomly permuting the values of the feature among samples of the training data they differ in their choices of the mapping function from standard svm output to its probabilistic output fspp uses simple threshold function while fspp uses sigmoid function the second two directly approximate the criterion but differ in the smoothness assumptions of criterion with respect to the features the performance of these approximations used in an overall feature selection scheme is then evaluated on various artificial problems and real world problems including datasets from the recent neural information processing systems nips feature selection competition fspp show good performance consistently with fspp being the best overall by slight margin the performance of fspp is competitive with some of the best performing feature selection methods in the literature on the datasets that we have tested its associated computations are modest and hence it is suitable as feature selection method for svm applications
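a hedged rendering of the criterion in my own notation, where x^{(\setminus j)} denotes an input with feature j neutralized (in the first two approximations, by randomly permuting its values across training samples) and \hat{P} is the svm output mapped through a threshold or a sigmoid:

```latex
\mathrm{FSPP}(j)
  \;=\; \int \bigl|\hat{P}(y\!=\!1 \mid x) - \hat{P}(y\!=\!1 \mid x^{(\setminus j)})\bigr|\, p(x)\,dx
  \;\approx\; \frac{1}{n}\sum_{i=1}^{n}
      \bigl|\hat{P}(y\!=\!1 \mid x_i) - \hat{P}(y\!=\!1 \mid x_i^{(\setminus j)})\bigr| .
```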
in order for collective communication routines to achieve high performance on different platforms they must be able to adapt to the system architecture and use different algorithms for different situations current message passing interface mpi implementations such as mpich and lam mpi are not fully adaptable to the system architecture and are not able to achieve high performance on many platforms in this paper we present system that produces efficient mpi collective communication routines by automatically generating topology specific routines and using an empirical approach to select the best implementations our system adapts to given platform and constructs routines that are customized for the platform the experimental results show that the tuned routines consistently achieve high performance on clusters with different network topologies
in our prior work we explored cache organization providing architectural support for distinguishing between memory references that exhibit spatial and temporal locality and mapping them to separate caches that work showed that using separate data caches for indexed or stream data and scalar data items could lead to substantial improvements in terms of cache misses in addition such separation allowed for the design of caches that could be tailored to meet the properties exhibited by different data items in this paper we investigate the interaction between three established methods split cache victim cache and stream buffer since significant amounts of compulsory and conflict misses are avoided the size of each cache ie array and scalar as well as the combined cache capacity can be reduced our results show that on average reduction in miss rates over the base configuration
there is an increasing quantity of data with uncertainty arising from applications such as sensor network measurements record linkage and as output of mining algorithms this uncertainty is typically formalized as probability density functions over tuple values beyond storing and processing such data in dbms it is necessary to perform other data analysis tasks such as data mining we study the core mining problem of clustering on uncertain data and define appropriate natural generalizations of standard clustering optimization criteria two variations arise depending on whether point is automatically associated with its optimal center or whether it must be assigned to fixed cluster no matter where it is actually located for uncertain versions of means and median we show reductions to their corresponding weighted versions on data with no uncertainties these are simple in the unassigned case but require some care for the assigned version our most interesting results are for uncertain center which generalizes both traditional center and median objectives we show variety of bicriteria approximation algorithms one picks kε logn centers and achieves approximation to the best uncertain centers another picks centers and achieves constant factor approximation collectively these results are the first known guaranteed approximation algorithms for the problems of clustering uncertain data
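one way to see the flavor of the reduction for the unassigned uncertain k means case (my own illustration via the standard bias variance identity, not the paper's full argument): with \mu_i = \mathbb{E}[X_i], the expected cost of serving uncertain point X_i from a center c decomposes as

```latex
\mathbb{E}\,\lVert X_i - c\rVert^2
  \;=\; \lVert \mu_i - c\rVert^2 \;+\; \mathbb{E}\,\lVert X_i - \mu_i\rVert^2 ,
```

so the total objective equals an ordinary (weighted) k means objective over the certain points \mu_i plus a constant equal to the total variance; the assigned variant, where a point must stay with one fixed cluster wherever it materializes, needs the extra care the abstract mentions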
wireless sensor network wsn consists of groups of spatially distributed networked sensors used to cooperatively monitor physical environmental conditions these sensors are usually strongly resource constrained hence the network makes use of base stations nodes with robust disk storage energy and capacity of processing in wsn collected data are passed from sensor to sensor until the base station is reached query processing mechanism for wsns should be able to handle common conditions such as failures resource limitations eg energy and memory the existence of large amounts of data streams and mobility of the sensors an efficient strategy for dealing with such conditions is to introduce adaptability when processing queries adaptive query engines and query operators algorithms can adjust their behavior in response to conditions eg less energy memory availability which may occur when processing queries in this paper we propose an adaptive query processing mechanism for wsns in order to achieve this goal we propose generic data model to enable logical views over data streams so that the proposed query engine can see tuples of virtual relations rather than raw data streams flowing through the wsn ii introduce sql like query language called sensor network query language snql which enables users to express declarative queries and dynamically change parameters in queries clauses and iii propose two adaptive query operators called adaga adaptive aggregation algorithm for sensor networks and adapt adaptive join operator adaga is responsible for processing in network aggregation in the sensor nodes whereas adapt processes join operations in the base station of wsn both operators are able to dynamically adjust their behavior according to memory and energy usage in sensor nodes adaga and in the base station adapt experimental results presented in this paper prove the efficiency of the proposed query mechanism
this paper introduces paradigm for describing and specifying tangible user interfaces tuis the proposed token and constraints tac paradigm captures the core components of tuis while addressing many of the conceptual challenges unique to building these interfaces the paradigm enables the description of broad range of tuis by providing common set of constructs thus the tac paradigm lays the foundation for high level description language and software toolkit for tuis we evaluate the proposed paradigm by testing its ability to specify wide variety of existing tuis
the sporadic server ss overcomes the major limitations of other resource reservation fixed priority based techniques but it also presents some drawbacks mainly related to an increased scheduling overhead and not so efficient behavior during overrun situations in this paper we introduce and prove the effectiveness of an improved ss with reduced overhead and fairer handling of server overrun situations we also show how this can be efficiently exploited to provide temporal isolation in multiprocessor platform adapting already existing schedulability tests
web based cooperative systems hardly use approved user interface concepts for the design of interactive systems and thereby aggravate the interaction of the users with the system and also with each other in this article we describe how the flexibility and usability of such systems can particularly be improved by supporting direct manipulation techniques for navigation as well as tailoring the new functionality for tailoring and navigation is complemented by new forms of visualizing synchronous awareness information in web based systems we show this exemplarily by retrofitting the web based collaborative system cure however the necessary concepts can easily be transferred to other web based systems
this paper presents an equational formulation of an object oriented data model in this model database is represented as system of equations over set of oid’s and database query is transformation of system of equations into another system of equations during the query processing our model maintains an equivalence relation over oid’s that relates oid’s corresponding to the same real world entity by this mechanism the model achieves declarative set based query language and views for objects with identity moreover the query primitives are designed so that queries including object traversal can be evaluated in data parallel fashion
destination management systems dms is perfect application area for semantic web and pp technologies since tourism information dissemination and exchange are the key backbones of tourism destination management dms should take advantage of pp technologies and semantic web services interoperability ontologies and semantic annotation rdf based pp networks allow complex and extendable descriptions of resources instead of fixed and limited ones and they provide query facilities against these metadata instead of simple keyword based searches the layered adaptive semantic based dms ladms and peer to peer pp project aims at providing semantic based tourism destination information by combining the pp paradigm with semantic web technologies in this paper we propose metadata model encoding semantic tourism destination information in an rdf based pp network architecture the model combines ontological structures with information for tourism destinations and peers
identifying and reacting to malicious or anomalous ip traffic is significant challenge for network operators automated real time responses have been simplistic and require followup actions by technically specialised employees we describe system where off the shelf game engine technology enables collaborative network control through familiar interaction metaphors by translating network events into visually orthogonal activities anomalous behaviour is targeted by the managers as players using in game techniques such as shooting or healing resulting in defensive actions such as updates to firewall’s access control list being instantiated behind the scenes
we present cartoon animation style rendering method for water animation in an effort to capture and represent crucial features of water observed in traditional cartoon animation we propose cartoon water shader the proposed rendering method is modified phong illumination model augmented by the optical properties that ray tracing provides we also devise metric that automatically changes between refraction and reflection based on the angle between the normal vector of the water surface and the camera direction an essential characteristic in cartoon water animation is the use of flow lines we produce water flow regions with water flow shader assuming that an input to our system is result of an existing fluid simulation the input mesh contains proper geometric properties the water flow lines can be recovered by computing the curvature from the input geometry through which ridges and valleys are easily identified
many of recent studies have proved the tail equivalence result P(V > x) ∼ P(B > (1 − ρ)x) as x → ∞ for the egalitarian processor sharing system where B resp V is the service requirement resp sojourn time of customer for ps with load ρ in this paper we consider time shared systems in which the server capacity is shared by existing customers in proportion to dynamic weights assigned to customers we consider two systems in which the weight of customer depends on its age attained service and in which the weight depends on the residual processing time rpt we allow for parameterized family of weight functions such that the weight associated with customer that has received service of x units or has rpt of x units is ω(x) = x^α for some −∞ < α < ∞ we then study the sojourn time of customer under such scheduling discipline and provide conditions on α for tail equivalence to hold true and also give the value of the corresponding constant as function of α
the research and industrial communities have made great strides in developing sophisticated defect detection tools based on static analysis to date most of the work in this area has focused on developing novel static analysis algorithms but has neglected study of other aspects of static analysis tools particularly user interfaces in this work we present novel user interface toolkit called path projection that helps users visualize navigate and understand program paths common component of many tools error reports we performed controlled user study to measure the benefit of path projection in triaging error reports from locksmith data race detection tool for we found that path projection improved participants time to complete this task without affecting accuracy while participants felt path projection was useful and strongly preferred it to more standard viewer
common practice in work groups is to share links to interesting web pages moreover passages in these web pages are often cut and pasted and used in various other contexts in this paper we report how we explore the idea of paragraph fingerprinting to achieve the goal of annotate once appear anywhere in social annotation system called spartagus this work was motivated by the prominence of redundant contents with different urls on the web and shared documents that are read and re read within enterprises our technique attaches users annotations to the contents of paragraphs enabling annotations to move along with the paragraphs within dynamic live pages and travel across page boundary to other pages as long as the paragraph contents remain intact we also describe how we use paragraph fingerprinting to facilitate the social sharing of information nuggets among our users
query processing is one of the most important mechanisms for data management and there exist mature techniques for effective query optimization and efficient query execution the vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for acid properties however the emergence of new computing paradigms such as grid and cloud computing the increasingly large volumes of data commonly processed the need to support data driven research intensive data analysis and new scenarios such as processing data streams on the fly or querying web services the fact that the metadata fed to optimizers are often missing at compile time and the growing interest in novel optimization criteria such as monetary cost or energy consumption create unique set of new requirements for query processing systems these requirements cannot be met by modern techniques in their entirety although interesting solutions and efficient tools have already been developed for some of them in isolation next generation query processors are expected to combine features addressing all of these issues and consequently lie at the confluence of several research initiatives this paper aims to present vision for such processors to explain their functionality requirements and to discuss the open issues along with their challenges
in this state of the art report we review advances in distributed component technologies such as the enterprise java beans specification and the corba component model we assess the state of industrial practice in the use of distributed components we show several architectural styles for whose implementation distributed components have been used successfully we review the use of iterative and incremental development processes and the notion of model driven architecture we then assess the state of the art in research into novel software engineering methods and tools for the modelling reasoning and deployment of distributed components the open problems identified during this review result in the formulation of research agenda that will contribute to the systematic engineering of distributed systems based on component technologies
new breed of web application dubbed ajax is emerging in response to limited degree of interactivity in large grain stateless web interactions at the heart of this new approach lies single page interaction model that facilitates rich interactivity also push based solutions from the distributed systems are being adopted on the web for ajax applications the field is however characterized by the lack of coherent and precisely described set of architectural concepts as consequence it is rather difficult to understand assess and compare the existing approaches we have studied and experimented with several ajax frameworks trying to understand their architectural properties in this paper we summarize four of these frameworks and examine their properties and introduce the spiar architectural style which captures the essence of ajax applications we describe the guiding software engineering principles and the constraints chosen to induce the desired properties the style emphasizes user interface component development intermediary delta communication between client server components and push based event notification of state changes through the components to improve number of properties such as user interactivity user perceived latency data coherence and ease of development in addition we use the concepts and principles to discuss various open issues in ajax frameworks and application development
the internet has dramatically changed how people sell and buy goods in recent years we have seen the emergence of electronic marketplaces that leverage information technology to create more efficient markets such as online auctions to bring together buyers and sellers with greater effectiveness at massive scale despite the growing interest and importance of such marketplaces our understanding of how the design of the marketplace affects buyer and seller behavior at the individual level and the market effectiveness at the aggregate level is still quite limited this paper presents detailed case study of currently operational massive scale online auction marketplace the main focus is to gain initial insights into the effects of the design of the marketplace the results of the study point to several important considerations and implications not only for the design of online marketplaces but also for the design of large scale websites where effective locating of information is key to user success
with the increasing processing speeds there is shift away from the paradigm of centralized sequential storage systems towards distributed and network based storage systems further with the new imaging and real time multimedia applications it is becoming more than ever important to design powerful efficient and scalable systems in this paper the requirements of storage subsystems in multimedia environment were presented the storage system components relating to those requirements were analyzed current solutions were surveyed and classified then we proposed approaches to improve storage subsystem performance for multimedia the first approach applies constrained layout currently used for single disk model to multi disk system the second calls for using striping unit that meets both media and storage system optimization criteria the third uses pool of buffers instead of single buffer per stream
automated extraction of bibliographic information from journal articles is key to the affordable creation and maintenance of citation databases such as medline newly required bibliographic field in this database is investigator names names of people who have contributed to the research addressed in the article but who are not listed as authors since the number of such names is often large several score or more their manual entry is prohibitive the automated extraction of these names is problem in named entity recognition ner but differs from typical ner due to the absence of normal english grammar in the text containing the names in addition since medline conventions require names to be expressed in particular format it is necessary to identify both first and last names of each investigator an additional challenge we seek to automate this task through two machine learning approaches support vector machine and structural svm both of which show good performance at the word and chunk levels in contrast to traditional svm structural svm attempts to learn sequence by using contextual label features in addition to observational features it outperforms svm at the initial learning stage without using contextual observation features however with the addition of these contextual features from neighboring tokens svm performance improves to match or slightly exceed that of the structural svm
this paper describes an inter procedural technique for computing symbolic bounds on the number of statements procedure executes in terms of its scalar inputs and user defined quantitative functions of input data structures such computational complexity bounds for even simple programs are usually disjunctive non linear and involve numerical properties of heaps we address the challenges of generating these bounds using two novel ideas we introduce proof methodology based on multiple counter instrumentation each counter can be initialized and incremented at potentially multiple program locations that allows given linear invariant generation tool to compute linear bounds individually on these counter variables the bounds on these counters are then composed together to generate total bounds that are non linear and disjunctive we also give an algorithm for automating this proof methodology our algorithm generates complexity bounds that are usually precise not only in terms of the computational complexity but also in terms of the constant factors next we introduce the notion of user defined quantitative functions that can be associated with abstract data structures eg length of list height of tree etc we show how to compute bounds in terms of these quantitative functions using linear invariant generation tool that has support for handling uninterpreted functions we show application of this methodology to commonly used data structures namely lists list of lists trees bit vectors using examples from microsoft product code we observe that few quantitative functions for each data structure are usually sufficient to allow generation of symbolic complexity bounds of variety of loops that iterate over these data structures and that it is straightforward to define these quantitative functions the combination of these techniques enables generation of precise computational complexity bounds for real world examples drawn from microsoft product code and stl library code for some of which it is non trivial to even prove termination such automatically generated bounds are very useful for early detection of egregious performance problems in large modular codebases that are constantly being changed by multiple developers who make heavy use of code written by others without good understanding of their implementation complexity
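A hand-instrumented toy illustration of the multiple-counter idea described above: each counter is initialized and incremented at a few program points, gets its own linear bound in terms of quantitative functions of the input (here len() and an assumed user-defined total_length()), and the individual bounds compose into a total bound. The procedure and the bounds are illustrative, not drawn from the paper.

```python
# Hedged sketch of multiple-counter instrumentation for complexity bounds.

def total_length(list_of_lists):          # user-defined quantitative function
    return sum(len(inner) for inner in list_of_lists)

def flatten(list_of_lists):
    c1 = c2 = 0                           # instrumentation counters
    out = []
    for inner in list_of_lists:
        c1 += 1                           # bounded by len(list_of_lists)
        for x in inner:
            c2 += 1                       # bounded by total_length(list_of_lists)
            out.append(x)
    # the per-counter linear bounds compose into a bound on all loop iterations
    assert c1 + c2 <= len(list_of_lists) + total_length(list_of_lists)
    return out

flatten([[1, 2], [], [3]])
```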
overlays have enabled several new and popular distributed applications such as akamai kazaa and bittorrent however the lack of an overlay aware network stack has hindered the widespread use of general purpose overlay packet delivery services in this paper we describe the design and implementation of oasis system and toolkit that enables legacy operating systems to access overlay based packet delivery services oasis combines set of ideas network address translation name resolution packet capture dynamic code execution to provide greater user choice we are in the process of making the oasis toolkit available for public use specifically to ease the development of planetlab based packet delivery services
we propose physically based algorithm for synthesizing sounds synchronized with brittle fracture animations motivated by laboratory experiments we approximate brittle fracture sounds using time varying rigid body sound models we extend methods for fracturing rigid materials by proposing fast quasistatic stress solver to resolve near audio rate fracture events energy based fracture pattern modeling and estimation of crack related fracture impulses multipole radiation models provide scalable sound radiation for complex debris and level of detail control to reduce sound model generation costs for complex fracture debris we propose precomputed rigid body soundbanks comprised of precomputed ellipsoidal sound proxies examples and experiments are presented that demonstrate plausible and affordable brittle fracture sounds
ml modules and haskell type classes have proven to be highly effective tools for program structuring modules emphasize explicit configuration of program components and the use of data abstraction type classes emphasize implicit program construction and ad hoc polymorphism in this paper we show how the implicitly typed style of type class programming may be supported within the framework of an explicitly typed module language by viewing type classes as particular mode of use of modules this view offers harmonious integration of modules and type classes where type class features such as class hierarchies and associated types arise naturally as uses of existing module language constructs such as module hierarchies and type components in addition programmers have explicit control over which type class instances are available for use by type inference in given scope we formalize our approach as harper stone style elaboration relation and provide sound type inference algorithm as guide to implementation
we propose novel objective function for discriminatively tuning log linear machine translation models our objective explicitly optimizes the bleu score of expected gram counts the same quantities that arise in forest based consensus and minimum bayes risk decoding methods our continuous objective can be optimized using simple gradient ascent however computing critical quantities in the gradient necessitates novel dynamic program which we also present here assuming bleu as an evaluation measure our objective function has two principle advantages over standard max bleu tuning first it specifically optimizes model weights for downstream consensus decoding procedures an unexpected second benefit is that it reduces overfitting which can improve test set bleu scores when using standard viterbi decoding
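A hedged rendering, in our own notation rather than the paper's, of the objective sketched above: BLEU applied to expected n-gram counts under the model distribution over derivations of a source sentence, maximised by gradient ascent over the log-linear feature weights.

```latex
% Hedged sketch (our notation): expected-count BLEU objective.
\[
  \Theta^{*} \;=\; \arg\max_{\Theta}\;
  \mathrm{BLEU}\!\Big(\mathbb{E}_{d \sim p_{\Theta}(\,\cdot\,\mid f)}\big[c_{n}(d)\big],\; r\Big)
\]
% where c_n(d) are the n-gram counts of the translation encoded by derivation d
% and r is the reference translation for source sentence f.
```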
the applicability of term rewriting to program transformation is limited by the lack of control over rule application and by the context free nature of rewrite rules the first problem is addressed by languages supporting user definable rewriting strategies the second problem is addressed by the extension of rewriting strategies with scoped dynamic rewrite rules dynamic rules are defined at run time and can access variables available from their definition context rules defined within rule scope are automatically retracted at the end of that scope in this paper we explore the design space of dynamic rules and their application to transformation problems the technique is formally defined by extending the operational semantics underlying the program transformation language stratego and illustrated by means of several program transformations in stratego including constant propagation bound variable renaming dead code elimination function inlining and function specialization
this paper concentrates on the retrieval aspect in adaptive hypermedia ah traditionally ah research concentrates on applications that are closed in the sense that they assume fixed content elements certain applications ask for an extension of the contents considered with data obtained through information retrieval ir this paper addresses this issue of opening up ah applications and gives insight into research that applies techniques from ir and from the semantic web sw for the embedding of ir in ah we look at this issue in the context of an abstract reference model aham and concrete implementation framework aha the goal of this research is to define framework for ah with extended ir functionality we address the relevant issues for this framework characterized by the application of concepts from the sw paradigm leading to an enriched notion of concept relevancy
computing all pairs distances in graph is fundamental problem of computer science but there has been status quo with respect to the general problem of weighted directed graphs in contrast there has been growing interest in the area of algorithms for approximate shortest paths leading to many interesting variations of the original problem in this article we trace some of the fundamental developments like spanners and distance oracles their underlying constructions as well as their applications to the approximate all pairs shortest paths
in this paper we consider techniques for disseminating dynamic data such as stock prices and real time weather information from sources to set of repositories we focus on the problem of maintaining coherency of dynamic data items in network of cooperating repositories we show that cooperation among repositories where each repository pushes updates of data items to other repositories helps reduce system wide communication and computation overheads for coherency maintenance however contrary to intuition we also show that increasing the degree of cooperation beyond certain point can in fact be detrimental to the goal of maintaining coherency at low communication and computational overheads we present techniques to derive the optimal degree of cooperation among repositories ii to construct an efficient dissemination tree for propagating changes from sources to cooperating repositories and iii to determine when to push an update from one repository to another for coherency maintenance we evaluate the efficacy of our techniques using real world traces of dynamically changing data items specifically stock prices and show that careful dissemination of updates through network of cooperating repositories can substantially lower the cost of coherency maintenance
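A minimal sketch of push-based coherency maintenance along a dissemination tree: a repository forwards a new value to a dependent only when the change since the value it last pushed there exceeds that dependent's coherency tolerance. The class layout, field names, and the scalar data item are illustrative assumptions, not the paper's protocol.

```python
# Hedged sketch: selective push of updates through a dissemination tree.
class Repository:
    def __init__(self, name, tolerance):
        self.name = name
        self.tolerance = tolerance        # max deviation this repository accepts
        self.value = None
        self.children = []                # list of [child, last_value_pushed]

    def add_child(self, child):
        self.children.append([child, None])

    def receive(self, new_value):
        self.value = new_value
        for entry in self.children:
            child, last_pushed = entry
            if last_pushed is None or abs(new_value - last_pushed) > child.tolerance:
                entry[1] = new_value      # push only when coherency would be violated
                child.receive(new_value)

# source -> r1 -> r2 with coherency tolerances 0.5 and 2.0
source, r1, r2 = Repository("src", 0.0), Repository("r1", 0.5), Repository("r2", 2.0)
source.add_child(r1); r1.add_child(r2)
for price in [100.0, 100.3, 101.0, 103.5]:
    source.receive(price)
print(r1.value, r2.value)
```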
as cmos feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance eg deep pipelining become too expensive and power hungry chip multiprocessors cmps become an exciting new direction by which system designers can deliver increased performance exploiting parallelism in such designs is the key to high performance and we find that parallelism must be exploited at multiple levels of the system the thread level parallelism that has become popular in many designs fails to exploit all the levels of available parallelism in many workloads for cmp systems we describe the cell broadband engine and the multiple levels at which its architecture exploits parallelism data level instruction level thread level memory level and compute transfer parallelism by taking advantage of opportunities at all levels of the system this cmp revolutionizes parallel architectures to deliver previously unattained levels of single chip performance we describe how the heterogeneous cores allow to achieve this performance by parallelizing and offloading computation intensive application code onto the synergistic processor element spe cores using heterogeneous thread model with spes we also give an example of scheduling code to be memory latency tolerant using software pipelining techniques in the spe
the functional programming community has shown some interest in spreadsheets but surprisingly no one seems to have considered making standard spreadsheet such as excel work with standard functional programming language such as haskell in this paper we show one way that this can be done our hope is that by doing so we might get spreadsheet programmers to give functional programming try
sharing structured data today requires standardizing upon single schema then mapping and cleaning all of the data this results in single queriable mediated data instance however for settings in which structured data is being collaboratively authored by large community eg in the sciences there is often lack of consensus about how it should be represented what is correct and which sources are authoritative moreover such data is seldom static it is frequently updated cleaned and annotated the orchestra collaborative data sharing system develops new architecture and consistency model for such settings based on the needs of data sharing in the life sciences in this paper we describe the basic architecture and implementation of the orchestra system and summarize some of the open challenges that arise in this setting
we present two interfaces to support one handed thumb use for pdas and cell phones both use scalable user interface scui techniques to support multiple devices with different resolutions and aspect ratios the designs use variations of zooming interface techniques to provide multiple views of application data applens uses tabular fisheye to access nine applications while launchtile uses pure zoom to access thirty six applications we introduce two sets of thumb gestures each representing different philosophies for one handed interaction we conducted two studies to evaluate our designs in the first study we explored whether users could learn and execute the applens gesture set with minimal training participants performed more accurately and efficiently using gestures for directional navigation than using gestures for object interaction in the second study we gathered user reactions to each interface as well as comparative preferences with minimal exposure to each design most users favored applens’s tabular fisheye interface
this paper shows how wikipedia and the semantic knowledge it contains can be exploited for document clustering we first create concept based document representation by mapping the terms and phrases within documents to their corresponding articles or concepts in wikipedia we also developed similarity measure that evaluates the semantic relatedness between concept sets for two documents we test the concept based representation and the similarity measure on two standard text document datasets empirical results show that although further optimizations could be performed our approach already improves upon related techniques
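A minimal sketch of a concept-based document representation: terms are mapped to concept identifiers through a lookup table standing in for the Wikipedia term-to-article mapping (the table, term list, and cosine comparison are illustrative assumptions, not the paper's similarity measure), and documents are compared over concept frequency vectors.

```python
# Hedged sketch: concept-vector representation and cosine similarity.
from collections import Counter
import math

TERM_TO_CONCEPT = {            # illustrative stand-in for a Wikipedia mapping
    "car": "Automobile", "automobile": "Automobile",
    "engine": "Engine", "motor": "Engine", "road": "Road",
}

def concept_vector(text):
    return Counter(TERM_TO_CONCEPT[t] for t in text.lower().split()
                   if t in TERM_TO_CONCEPT)

def cosine(a, b):
    dot = sum(a[c] * b.get(c, 0) for c in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

d1 = concept_vector("the car engine stalled on the road")
d2 = concept_vector("a motor in an automobile")
print(cosine(d1, d2))          # nonzero despite almost no shared surface terms
```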
the java platform requires that out of bounds array accesses produce runtime exceptions in general this requires dynamic bounds check each time an array element is accessed however if it can be proven that the array index is within the bounds of the array the check can be eliminated we present new algorithm based on extended static single assignment essa form that builds constraint system representing control flow qualified linear constraints among program variables derived from program statements our system then derives relationships among variables and provides verifiable proof of its conclusions this proof can be verified by runtime system to minimize the analysis’s performance impact our system simultaneously considers both control flow and data flow when analyzing the constraint system handles general linear inequalities instead of simple difference constraints and provides verifiable proofs for its claims we present experimental results demonstrating that this method eliminates more bounds checks and when combined with runtime verification results in lower runtime cost than prior work our algorithm improves benchmark performance by up to nearly over the baseline safetsa system
we present study of the effects of disk and memory corruption on file system data integrity our analysis focuses on sun’s zfs modern commercial offering with numerous reliability mechanisms through careful and thorough fault injection we show that zfs is robust to wide range of disk faults we further demonstrate that zfs is less resilient to memory corruption which can lead to corrupt data being returned to applications or system crashes our analysis reveals the importance of considering both memory and disk in the construction of truly robust file and storage systems
in recent years olap technologies have become one of the important applications in the database industry in particular the datacube operation proposed in receives strong attention among researchers as fundamental research topic in the olap technologies the datacube operation requires computation of aggregations on all possible combinations of each dimension attribute as the number of dimensions increases it becomes very expensive to compute datacubes because the required computation cost grows exponentially with the increase of dimensions parallelization is very important factor for fast datacube computation however we cannot obtain sufficient performance gain in the presence of data skew even if the computation is parallelized in this paper we present dynamic load balancing strategy which enables us to extract the effectiveness of parallizing datacube computation sufficiently we perform experiments based on simulations and show that our strategy performs well
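A minimal sketch of the datacube operation itself, aggregating a measure over every subset of the dimension attributes (2^d group-bys); the toy relation and the sequential computation are illustrative, and the paper's skew-aware dynamic load balancing across processors is not reproduced here.

```python
# Hedged sketch: sequential datacube (all group-bys) over a toy relation.
from itertools import combinations
from collections import defaultdict

rows = [  # (product, region, month, sales)
    ("pen", "east", "jan", 10), ("pen", "west", "jan", 7),
    ("ink", "east", "feb", 3),  ("pen", "east", "feb", 5),
]
dims = ("product", "region", "month")

cube = {}
for r in range(len(dims) + 1):
    for group in combinations(range(len(dims)), r):
        agg = defaultdict(int)
        for row in rows:
            key = tuple(row[i] for i in group)
            agg[key] += row[3]                      # SUM(sales)
        cube[tuple(dims[i] for i in group)] = dict(agg)

print(cube[("product",)])        # {('pen',): 22, ('ink',): 3}
print(cube[()])                  # {(): 25}
```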
parallel bottom up evaluation provides an alternative for the efficient evaluation of logic programs existing parallel evaluation strategies are neither effective nor efficient in determining the data to be transmitted among processors in this paper we propose different strategy for general datalog programs that is based on the partitioning of data rather than that of rule instantiations the partition and processing schemes defined in this paper are more general than those in existing strategies parallel evaluation algorithm is given based on the semi naive bottom up evaluation notion of potential usefulness is recognized as data transmission criterion to reduce both effectively and efficiently the amount of data transmitted heuristics and algorithms are proposed for designing the partition and processing schemes for given program results from an experiment show that the strategy proposed in this paper has many promising features
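A minimal sketch of the semi-naive bottom-up evaluation that the parallel strategy above builds on, shown for the classic transitive-closure program path(X,Y) :- edge(X,Y) and path(X,Z) :- path(X,Y), edge(Y,Z); only tuples derived in the previous round (the delta) are joined with edge. The data partitioning across processors is not shown.

```python
# Hedged sketch: semi-naive bottom-up evaluation of transitive closure.
edge = {(1, 2), (2, 3), (3, 4)}

path = set(edge)          # first rule: every edge is a path
delta = set(edge)
while delta:
    # join only the newly derived tuples with edge
    new = {(x, z) for (x, y) in delta for (y2, z) in edge if y == y2}
    delta = new - path
    path |= delta

print(sorted(path))        # includes (1, 3), (2, 4), (1, 4)
```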
when domain novices are in cc kuhlthau’s stage the exploration stage of researching an assignment they often do not know their information need this causes them to go back to stage the topic selection stage when they are selecting keywords to formulate their query to an information retrieval ir system our hypothesis is that instead of going backward they should be going forward toward goal state the performance of the task for which they are seeking the information if they can somehow construct their goal state into query this forward looking query better operationalizes their information need than does topic based query for domain novice undergraduates seeking information for course essay we define their task as selecting high impact essay structure which will put the student’s learning on display for the course instructor who will evaluate the essay we report study of first year history undergraduate students which tested the use and effectiveness of essay type as task focused query formulation device we randomly assigned history undergraduates to an intervention group and control group the dependent variable was essay quality based on an evaluation of the student’s essay by research team member and the marks given to the student’s essay by the course instructor we found that conscious or formal consideration of essay type is inconclusive as basis of task focused query formulation device for ir
according to our experience in real world projects we still observe deficiencies of current methods for object oriented analysis ooa especially in respect to the early elicitation and definition of requirements therefore we used object oriented technology and hypertext to develop practical approach with tool support that tightly combines ooa with requirements definition this novel approach is compatible with virtually any ooa method while more work needs to be done especially for supporting the process of requirements definition the observed deficiencies and current limitations of existing ooa methods are addressed and partly removed through this combination we have applied our approach in real world projects and our experience suggests the usefulness of this approach essentially its use leads to more complete and structured definition of the requirements and consequently we derive some recommendations for practitioners
accurate query cost estimation is crucial to query optimization in multidatabase system several estimation techniques for static environment have been suggested in the literature to develop cost model for dynamic environment we recently introduced multistate query sampling method it has been shown that this technique is promising in estimating the cost of query run in any given contention state for dynamic environment in this paper we study new problem on how to estimate the cost of large query that may experience multiple contention states following the discussion of limitations for two simple approaches ie single state analysis and average cost analysis we propose two novel techniques to tackle this challenge the first one called fractional analysis is suitable for gradually and smoothly changing environment while the second one called the probabilistic approach is developed for rapidly and randomly changing environment the former estimates query cost by analyzing its fractions and the latter estimates query cost based on markov chain theory the related issues including cost formula development error analysis and comparison among different approaches are discussed experiments demonstrate that the proposed techniques are quite promising in solving the new problem
sharing structured data in pp network is challenging problem especially in the absence of mediated schema the standard practice of answering consecutively rewritten query along the propagation path often results in significant loss of information on the opposite the use of mediated schemas requires human interaction and global agreement both during creation and maintenance in this paper we present groupeer an adaptive automated approach to both issues in the context of unstructured pp database overlays by allowing peers to individually choose which rewritten version of query to answer and evaluate the received answers information rich sources left hidden otherwise are discovered gradually the overlay is restructured as semantically similar peers are clustered together experimental results show that our technique produces very accurate answers and builds clusters that are very close to the optimal ones by contacting very small number of nodes in the overlay
extraction of three dimensional structure of scene from stereo images is problem that has been studied by the computer vision community for decades early work focused on the fundamentals of image correspondence and stereo geometry stereo research has matured significantly throughout the years and many advances in computational stereo continue to be made allowing stereo to be applied to new and more demanding problems in this paper we review recent advances in computational stereo focusing primarily on three important topics correspondence methods methods for occlusion and real time implementations throughout we present tables that summarize and draw distinctions among key ideas and approaches where available we provide comparative analyses and we make suggestions for analyses yet to be done
the convergence of the grid and peer to peer pp worlds has led to many solutions that try to efficiently solve the problem of resource discovery on grids some of these solutions are extensions of pp dht based networks we believe that these systems are not flexible enough when the indexed data are very dynamic ie the values of the resource attributes change very frequently over time this is common case for grid metadata like cpu loads queue occupation etc moreover since common requests for grid resources may be expressed as multi attribute range queries we think that the dht based pp solutions are poorly flexible and efficient in handling them in this paper we present two pp systems both are based on routing indexes which are used to efficiently route queries and update messages in the presence of highly variable data the first system uses tree shaped overlay network the second one is an evolution of the first and is based on two level hierarchical network topology where tree topologies must only be maintained at the lower level of the hierarchy ie within the various node groups making up the network the main goal of the second organization is to achieve simpler maintenance of the overall pp graph topology by preserving the good properties of the tree shaped topology we discuss the results of extensive simulation studies aimed at assessing the performance and scalability of the proposed approaches we also analyze how the network topologies affect the propagation of query and update messages
the primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data however many of the current formalizations of data mining algorithms have not quite reached this goal one of the reasons for this is that the focus on using purely automated techniques has imposed several constraints on data mining algorithms for example any data mining problem such as clustering or association rules requires the specification of particular problem formulations objective functions and parameters such systems fail to take the user’s needs into account very effectively this makes it necessary to keep the user in the loop in way which is both efficient and interpretable one unique way of achieving this is by leveraging human visual perceptions on intermediate data mining results such system combines the computational power of computer and the intuitive abilities of human to provide solutions which cannot be achieved by either this paper will discuss number of recent approaches to several data mining algorithms along these lines
autonomous mobile programs amps offer novel decentralised load management technology where periodic use is made of cost models to decide where to execute in network in this paper we demonstrate how sequential programs can be automatically converted into amps the amps are generated by an automatic continuation cost analyser that replaces iterations with costed autonomous mobility skeletons cams that encapsulate autonomous mobility the cams cost model uses an entirely novel continuation cost semantics to predict both the cost of the current iteration and the continuation cost of the remainder of the program we show that cams convey significant performance advantages eg reducing execution time by up to that the continuation cost models are consistent with the existing amp cost models and that the overheads of collecting and utilising the continuation costs are relatively small we discuss example amps generated by the analyser and demonstrate that they have very similar performance to hand costed cams programs
in this research we investigated whether learning process has unique information searching characteristics the results of this research show that information searching is learning process with unique searching characteristics specific to particular learning levels in laboratory experiment we studied the searching characteristics of participants engaged in searching tasks we classified the searching tasks according to anderson and krathwohl’s taxonomy of the cognitive learning domain research results indicate that applying and analyzing the middle two of the six categories generally take the most searching effort in terms of queries per session topics searched per session and total time searching interestingly the lowest two learning categories remembering and understanding exhibit searching characteristics similar to the highest order learning categories of evaluating and creating our results suggest the view of web searchers having simple information needs may be incorrect instead we discovered that users applied simple searching expressions to support their higher level information needs it appears that searchers rely primarily on their internal knowledge for evaluating and creating information needs using search primarily for fact checking and verification overall results indicate that learning theory may better describe the information searching process than more commonly used paradigms of decision making or problem solving the learning style of the searcher does have some moderating effect on exhibited searching characteristics the implication of this research is that rather than solely addressing searcher’s expressed information need searching systems can also address the underlying learning need of the user
parametric polymorphism has become common feature of mainstream programming languages but software component architectures have lagged behind and do not support it we examine the problem of providing parametric polymorphism with components combined from different programming languages we have investigated how to resolve different binding times and parametrization semantics in range of representative languages and have identified common ground that can be suitably mapped to different language bindings we present generic component architecture extension that provides support for parameterized components and that can be easily adapted to work on top of various software component architectures in use today eg corba dcom jni we have implemented and tested this architecture on top of corba we also present generic interface definition language gidl an extension to corba idl supporting generic types and we describe language bindings for java and aldor we explain our implementation of gidl consisting of gidl to idl compiler and tools for generating linkage code under the language bindings we demonstrate how this architecture can be used to access stl and aldor’s basicmath libraries in multi language environment and discuss our mappings in the context of automatic library interface generation
we study data driven web applications provided by web sites interacting with users or applications the web site can access an underlying database as well as state information updated as the interaction progresses and receives user input the structure and contents of web pages as well as the actions to be taken are determined dynamically by querying the underlying database as well as the state and inputs the properties to be verified concern the sequences of events inputs states and actions resulting from the interaction and are expressed in linear or branching time temporal logics the results establish under what conditions automatic verification of such properties is possible and provide the complexity of verification this brings into play mix of techniques from logic and model checking
most real world data is heterogeneous and richly interconnected examples include the web hypertext bibliometric data and social networks in contrast most statistical learning methods work with flat data representations forcing us to convert our data into form that loses much of the link structure the recently introduced framework of probabilistic relational models prms embraces the object relational nature of structured data by capturing probabilistic interactions between attributes of related entities in this paper we extend this framework by modeling interactions between the attributes and the link structure itself an advantage of our approach is unified generative model for both content and relational structure we propose two mechanisms for representing probabilistic distribution over link structures reference uncertainty and existence uncertainty we describe the appropriate conditions for using each model and present learning algorithms for each we present experimental results showing that the learned models can be used to predict link structure and moreover the observed link structure can be used to provide better predictions for the attributes in the model
variety of computer graphics applications sample surfaces of shapes in regular grid without making the sampling rate adaptive to the surface curvature or sharp features triangular meshes that interpolate or approximate these samples usually exhibit relatively big error around the insensitive sampled sharp features this paper presents robust general approach conducting bilateral filters to recover sharp edges on such insensitive sampled triangular meshes motivated by the impressive results of bilateral filtering for mesh smoothing and denoising we adopt it to govern the sharpening of triangular meshes after recognizing the regions that embed sharp features we recover the sharpness geometry through bilateral filtering followed by iteratively modifying the given mesh’s connectivity to form single wide sharp edges that can be easily detected by their dihedral angles we show that the proposed method can robustly reconstruct sharp edges on feature insensitive sampled meshes
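A one-dimensional sketch of the bilateral weighting that such mesh filters adapt: each sample is replaced by an average of its neighbours weighted by both spatial distance and difference in value, so smooth regions are denoised while large jumps, the 1-D analogue of sharp edges, are preserved. The signal, window radius, and the two sigmas are illustrative assumptions; this is not the paper's mesh algorithm.

```python
# Hedged sketch: 1-D bilateral filter (spatial weight x range weight).
import numpy as np

def bilateral_1d(signal, radius=3, sigma_s=2.0, sigma_r=0.2):
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2)) *
             np.exp(-((signal[idx] - signal[i]) ** 2) / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out

rng = np.random.default_rng(0)
noisy_step = np.r_[np.zeros(20), np.ones(20)] + 0.05 * rng.standard_normal(40)
print(bilateral_1d(noisy_step)[18:22])   # the step stays sharp, the noise shrinks
```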
data mining on uncertain data stream has attracted lot of attentions because of the widely existed imprecise data generated from variety of streaming applications in recent years the main challenge of mining uncertain data streams stems from the strict space and time requirements of processing arriving tuples in high speed when new tuples arrive the number of the possible world instances will increase exponentially related to the volume of the data stream as one of the most important mining task how to devise clustering algorithms has been studied intensively on deterministic data streams whereas the work on the uncertain data streams still remains rare this paper proposes novel solution for clustering on uncertain data streams in point probability model where the existence of each tuple is uncertain detailed analysis and the thorough experimental reports both on synthetic and real data sets illustrate the advantages of our new method in terms of effectiveness and efficiency
algorithms for learning to rank web documents usually assume document’s relevance is independent of other documents this leads to learned ranking functions that produce rankings with redundant results in contrast user studies have shown that diversity at high ranks is often preferred we present two online learning algorithms that directly learn diverse ranking of documents based on users clicking behavior we show that these algorithms minimize abandonment or alternatively maximize the probability that relevant document is found in the top positions of ranking moreover one of our algorithms asymptotically achieves optimal worst case performance even if users interests change
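A ranked-bandit style sketch of learning a diverse ranking online from clicks: one epsilon-greedy bandit per rank position, rewarded only when its chosen document receives the click. The user-click callback, epsilon value, and reward definition are illustrative assumptions, not the paper's exact algorithms or guarantees.

```python
# Hedged sketch: one bandit per rank position, updated from click feedback.
import random

class EpsGreedy:
    def __init__(self, docs, eps=0.1):
        self.eps = eps
        self.n = {d: 0 for d in docs}      # pulls per document
        self.s = {d: 0.0 for d in docs}    # accumulated reward per document
    def choose(self, exclude):
        cands = [d for d in self.n if d not in exclude]
        if random.random() < self.eps:
            return random.choice(cands)
        return max(cands, key=lambda d: self.s[d] / self.n[d] if self.n[d] else 1.0)
    def update(self, d, reward):
        self.n[d] += 1
        self.s[d] += reward

def present_and_learn(bandits, user_clicks, k):
    ranking = []
    for pos in range(k):
        ranking.append(bandits[pos].choose(exclude=set(ranking)))
    clicked = user_clicks(ranking)         # position index of the click, or None
    for pos, d in enumerate(ranking):
        bandits[pos].update(d, 1.0 if clicked == pos else 0.0)
    return ranking

# usage: bandits = [EpsGreedy(corpus_docs) for _ in range(k)]; call per user visit
```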
inducing association rules is one of the central tasks in data mining applications quantitative association rules induced from databases describe rich and hidden relationships to be found within data that can prove useful for various application purposes eg market basket analysis customer profiling and others although association rules are quite widely used in practice thorough analysis of the related computational complexity is missing this paper intends to provide contribution in this setting to this end we first formally define quantitative association rule mining problems which include boolean association rules as special case we then analyze computational complexity of such problems the general problem as well as some interesting special cases are considered
we analyze the statistical properties of the coverage of point target moving in straight line in non uniform dynamic sensor field sensor locations form spatial point process the environmental variation is captured by making the sensor locations form non homogeneous spatial poisson process with fixed spatially varying density function the sensing areas of the sensors are circles of iid radii the availability of each node is modeled by an independent valued continuous time markov chain this gives markov non homogeneous poisson boolean model for which we perform coverage analysis we first obtain coverage of the target at an arbitrary time instant we then obtain coverage statistics of the target during the time interval we also provide an asymptotically tight closed form approximation for the duration for which the target is not covered in numerical results illustrate the analysis the environmental variation can also be captured by modeling the density function as spatial random process resulting in the point process being two dimensional cox process for this model we discuss issues in the coverage analysis
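A hedged sketch, in our own notation, of the single-instant coverage computation that such an analysis builds on: at a fixed time the availability chains thin the sensor process independently, giving a standard Boolean-model vacancy probability. The time-interval statistics require the joint Markov dynamics and are not captured by this formula.

```latex
% Hedged sketch (our notation): single-instant vacancy probability after
% thinning the non-homogeneous Poisson process of density \lambda(x) by the
% stationary on-probability p_{on}; R is the i.i.d. sensing radius.
\[
  \Pr\{\text{target at } z \text{ not covered}\}
  \;=\;
  \exp\!\Big(-\,p_{on}\int_{\mathbb{R}^2} \lambda(x)\,
        \Pr\{R \ge \lVert x - z\rVert\}\,dx\Big)
\]
```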
given query tuple the dynamic skyline query retrieves the tuples that are not dynamically dominated by any other in the data set with respect to tuple dynamically dominates another wrt if it has closer to q’s values in all attributes and has strictly closer to q’s value in at least one the dynamic skyline query can be treated as standard skyline query subject to the transformation of all tuples values in this work we make the observation that results to past dynamic skyline queries can help reduce the computation cost for future queries to this end we propose caching mechanism for dynamic skyline queries and devise cache aware algorithm our extensive experimental evaluation demonstrates the efficiency of this mechanism compared to standard techniques without caching
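A minimal sketch making the dynamic dominance definition above concrete, with a brute-force dynamic skyline for small data sets; the caching mechanism and the cache-aware algorithm of the paper are not reproduced here, and the example points are illustrative.

```python
# Hedged sketch: dynamic dominance test and brute-force dynamic skyline.

def dyn_dominates(a, b, q):
    # a dominates b w.r.t. q: closer to q in every attribute, strictly in one
    closer_eq = all(abs(ai - qi) <= abs(bi - qi) for ai, bi, qi in zip(a, b, q))
    strictly = any(abs(ai - qi) < abs(bi - qi) for ai, bi, qi in zip(a, b, q))
    return closer_eq and strictly

def dynamic_skyline(points, q):
    return [p for p in points
            if not any(dyn_dominates(o, p, q) for o in points if o != p)]

pts = [(1, 5), (2, 2), (4, 1), (5, 5)]
print(dynamic_skyline(pts, q=(3, 3)))   # -> [(2, 2)]
```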
set containment join is join between set valued attributes of two relations whose join condition is specified using the subset ⊆ operator set containment joins are deployed in many database applications even those that do not support set valued attributes in this article we propose two novel partitioning algorithms called the adaptive pick and sweep join apsj and the adaptive divide and conquer join adcj which allow computing set containment joins efficiently we show that apsj outperforms previously suggested algorithms for many data sets often by an order of magnitude we present detailed analysis of the algorithms and study their performance on real and synthetic data using an implemented testbed
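A minimal signature-based sketch of a set containment join: each set is summarised by a bitmap signature, and sig(r) & ~sig(s) == 0 is a cheap necessary condition for r ⊆ s that prunes most pairs before the exact subset test. This is a generic baseline for illustration, not the APSJ or ADCJ partitioning of the paper; the signature width and hash are assumptions.

```python
# Hedged sketch: signature filter plus exact verification for R join-subset S.
SIG_BITS = 32

def signature(s):
    sig = 0
    for e in s:
        sig |= 1 << (hash(e) % SIG_BITS)
    return sig

def containment_join(R, S):
    sigs_S = [(signature(s), s) for s in S]
    out = []
    for r in R:
        sig_r = signature(r)
        for sig_s, s in sigs_S:
            if sig_r & ~sig_s == 0 and r <= s:     # filter first, then verify
                out.append((r, s))
    return out

R = [frozenset({1, 2}), frozenset({2, 5})]
S = [frozenset({1, 2, 3}), frozenset({4, 5})]
print(containment_join(R, S))    # only ({1, 2}, {1, 2, 3}) qualifies
```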
large motion data sets often contain many variants of the same kind of motion but without appropriate tools it is difficult to fully exploit this fact this paper provides automated methods for identifying logically similar motions in data set and using them to build continuous and intuitively parameterized space of motions to find logically similar motions that are numerically dissimilar our search method employs novel distance metric to find close motions and then uses them as intermediaries to find more distant motions search queries are answered at interactive speeds through precomputation that compactly represents all possibly similar motion segments once set of related motions has been extracted we automatically register them and apply blending techniques to create continuous space of motions given function that defines relevant motion parameters we present method for extracting motions from this space that accurately possess new parameters requested by the user our algorithm extends previous work by explicitly constraining blend weights to reasonable values and having run time cost that is nearly independent of the number of example motions we present experimental results on test data set of frames or about ten minutes of motion sampled at hz
rough sets theory has proved to be useful mathematical tool for dealing with the vagueness and granularity in information tables classical definitions of lower and upper approximations were originally introduced with reference to an indiscernibility relation however indiscernibility relation is still restrictive for many applications many real world problems deal with assignment of some objects to some preference ordered decision classes and the objects are described by finite set of qualitative attributes and quantitative attributes in this paper we construct the indiscernibility relation for the subset of nominal attributes the outranking relation for the subset of ordinal attributes and the similarity relation for the subset of quantitative attributes then the global binary relation is generated by the intersection of indiscernibility relation outranking relation and similarity relation new definitions of lower and upper approximations of the upward and downward unions of decision classes are proposed based on the global relation we also prove that the lower and upper approximation operations satisfy the properties of rough inclusion complementarity identity of boundaries and monotonicity
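A minimal sketch of classical lower and upper approximations from an indiscernibility relation, to ground the definitions the paper extends; the outranking and similarity relations for ordinal and quantitative attributes, and their intersection, follow the same pattern with a different relatedness test. The toy information table is an illustrative assumption.

```python
# Hedged sketch: lower/upper approximations w.r.t. an indiscernibility relation.

def related(u, v, objects, attrs):
    # classical indiscernibility: equal values on all selected attributes
    return all(objects[u][a] == objects[v][a] for a in attrs)

def approximations(objects, attrs, target):
    lower, upper = set(), set()
    for u in objects:
        cls = {v for v in objects if related(u, v, objects, attrs)}  # class of u
        if cls <= target:
            lower.add(u)
        if cls & target:
            upper.add(u)
    return lower, upper

objects = {"o1": {"colour": "red"}, "o2": {"colour": "red"}, "o3": {"colour": "blue"}}
target = {"o1", "o3"}          # a decision class
print(approximations(objects, ["colour"], target))   # lower {'o3'}, upper all three
```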
electronic commerce is now an established vital part of the world economy however this economic sector is currently endangered by consumers well founded concern for the privacy of their information recent surveys indicate that this concern is beginning to alter consumers spending habits the world wide web consortium is well aware of these concerns and has produced the platform for privacy preferences protocol as mechanism to help consumers protect their online privacy this mechanism relies on the use of machine readable privacy policies posted on website and interpreted by client side browser extension however recent surveys indicate that adoption of this technology is stagnating on the server side and work on an updated version of the platform has been halted we use signaling theory as framework to model the likely future evolution of the platform in an effort to gauge whether it will flourish or wither as technology we find that signaling theory predicts the collapse of the platform for privacy preferences protocol however we also find theoretical and empirical grounds to predict that government intervention can drive adoption of the platform on the server side which may in turn bootstrap user adoption of this technology
clickthrough rate bid and cost per click are known to be among the factors that impact the rank of an ad shown on search result page search engines can benefit from estimating ad clickthrough in order to determine the quality of ads and maximize their revenue in this paper methodology is developed to estimate ad clickthrough rate by exploring user queries and clickthrough logs as we demonstrate the average ad clickthrough rate depends to substantial extent on the rank position of ads and on the total number of ads displayed on the page this observation is utilized by baseline model to calculate the expected clickthrough rate for various ads we further study the impact of query intent on the clickthrough rate where query intent is predicted using combination of query features and the content of search engine result pages the baseline model and the query intent model are compared for the purpose of calculating the expected ad clickthrough rate our findings suggest that such factors as the rank of an ad the number of ads displayed on the result page and query intent are effective in estimating ad clickthrough rate
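a minimal sketch of the baseline idea described above: estimate an expected clickthrough rate by averaging observed clicks per combination of ad rank and number of ads displayed on the page; the log layout is a hypothetical simplification.

    from collections import defaultdict

    # hypothetical log rows: (ad_rank, ads_on_page, clicked) with clicked in {0, 1}
    def baseline_ctr(log_rows):
        clicks = defaultdict(int)
        views = defaultdict(int)
        for rank, n_ads, clicked in log_rows:
            key = (rank, n_ads)
            views[key] += 1
            clicks[key] += clicked
        # expected clickthrough rate for each (rank, number of ads) combination
        return {key: clicks[key] / views[key] for key in views}

    # usage: baseline_ctr([(1, 3, 1), (2, 3, 0), (1, 3, 0)]) -> {(1, 3): 0.5, (2, 3): 0.0}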
in an uncertain database every object is associated with probability density function which describes the likelihood that appears at each position in multidimensional workspace this article studies two types of range retrieval fundamental to many analytical tasks specifically nonfuzzy query returns all the objects that appear in search region rq with at least certain probability tq on the other hand given an uncertain object fuzzy search retrieves the set of objects that are within distance epsilon from with no less than probability tq the core of our methodology is novel concept of probabilistically constrained rectangle which permits effective pruning validation of nonqualifying qualifying data we develop new index structure called the tree for minimizing the query overhead our algorithmic findings are accompanied with thorough theoretical analysis which reveals valuable insight into the problem characteristics and mathematically confirms the efficiency of our solutions we verify the effectiveness of the proposed techniques with extensive experiments
the size of human fingers and the lack of sensing precision can make precise touch screen interactions difficult we present set of five techniques called dual finger selections which leverage the recent development of multi touch sensitive displays to help users select very small targets these techniques facilitate pixel accurate targeting by adjusting the control display ratio with secondary finger while the primary finger controls the movement of the cursor we also contribute clicking technique called simpress which reduces motion errors during clicking and allows us to simulate hover state on devices unable to sense proximity we implemented our techniques on multi touch tabletop prototype that offers computer vision based tracking in our formal user study we tested the performance of our three most promising techniques stretch menu and slider against our baseline offset on four target sizes and three input noise levels all three chosen techniques outperformed the control technique in terms of error rate reduction and were preferred by our participants with stretch being the overall performance and preference winner
we propose new web page transformation method to facilitate web browsing on handheld devices such as personal digital assistants pdas in our approach an original web page that does not fit on the screen is transformed into set of subpages each of which fits on the screen this transformation is done through slicing the original page into page blocks iteratively with several factors considered these factors include the size of the screen the size of each page block the number of blocks in each transformed page the depth of the tree hierarchy that the transformed pages form as well as the semantic coherence between blocks we call the tree hierarchy of the transformed pages an sp tree in an sp tree an internal node consists of textually enhanced thumbnail image with hyperlinks and leaf node is block extracted from subpage of the original web page we adaptively adjust the fanout and the height of the sp tree so that each thumbnail image is clear enough for users to read while at the same time the number of clicks needed to reach leaf page is few through this transformation algorithm we preserve the contextual information in the original web page and reduce scrolling we have implemented this transformation module on proxy server and have conducted usability studies on its performance our system achieved shorter task completion time compared with that of transformations from the opera browser in nine of ten tasks the average improvement on familiar pages was percent the average improvement on unfamiliar pages was percent subjective responses were positive
the methods most heavily used by search engines to answer conjunctive queries on binary relations such as one associating keywords with web pages are based on computing the intersection of postings lists stored as sorted arrays and using variants of binary search we show that succinct representation of the binary relation permits much better results while using less space than traditional methods we apply our results not only to conjunctive queries on binary relations but also to queries on semi structured documents such as xml documents or file system indexes using variant of an adaptive algorithm used to solve conjunctive queries on binary relations
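for reference, a minimal sketch of the traditional baseline mentioned above, intersecting two postings lists stored as sorted arrays by probing the longer list with binary search; the succinct-representation approach that the abstract proposes is not reproduced here.

    import bisect

    def intersect_postings(short, long):
        # both lists are sorted document ids; probe the longer list by binary search
        result = []
        lo = 0
        for doc in short:
            i = bisect.bisect_left(long, doc, lo)
            if i < len(long) and long[i] == doc:
                result.append(doc)
            lo = i  # later probes can start where this one ended
        return result

    # usage: intersect_postings([3, 7, 9], [1, 3, 4, 7, 8, 10]) -> [3, 7]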
certified binary is value together with proof that the value satisfies given specification existing compilers that generate certified code have focused on simple memory and control flow safety rather than more advanced properties in this paper we present general framework for explicitly representing complex propositions and proofs in typed intermediate and assembly languages the new framework allows us to reason about certified programs that involve effects while still maintaining decidable typechecking we show how to integrate an entire proof system the calculus of inductive constructions into compiler intermediate language and how the intermediate language can undergo complex transformations cps and closure conversion while preserving proofs represented in the type system our work provides foundation for the process of automatically generating certified binaries in type theoretic framework
in program debugging finding failing run is only the first step what about correcting the fault can we automate the second task as well as the first the autofix tool automatically generates and validates fixes for software faults the key insights behind autofix are to rely on contracts present in the software to ensure that the proposed fixes are semantically sound and on state diagrams using an abstract notion of state based on the boolean queries of class out of faults found by an automatic testing tool in two widely used eiffel libraries autofix proposes successful fixes for faults submitting some of these faults to experts shows that several of the proposed fixes are identical or close to fixes proposed by humans
despite their widespread adoption role based access control rbac models exhibit certain shortcomings that make them less than ideal for deployment in for example distributed access control in the distributed case standard rbac assumptions eg of relatively static access policies managed by human users with complete information available about users and job functions do not necessarily apply moreover rbac is restricted in the sense that it is based on one type of ascribed status an assignment of user to role in this article we introduce the status based access control sbac model for distributed access control the sbac model or family of models is based on the notion of users having an action status as well as an ascribed status user’s action status is established in part from history of events that relate to the user this history enables changing access policy requirements to be naturally accommodated the approach can be implemented as an autonomous agent that reasons about the events actions and history of events and actions which relates to requester for access to resources in order to decide whether the requester is permitted the access sought we define number of algebras for composing sbac policies algebras that exploit the language that we introduce for sbac policy representation identification based logic programs the sbac model is richer than rbac models and the policies that can be represented in our approach are more expressive than the policies admitted by number of monotonic languages that have been hitherto described for representing distributed access control requirements our algebras generalize existing algebras that have been defined for access policy composition we also describe an approach for the efficient implementation of sbac policies
is it possible to bring the benefits of rigorous software engineering methodologies to end users end users create software when they use spreadsheet systems web authoring tools and graphical languages when they write educational simulations spreadsheets and dynamic business web applications unfortunately however errors are pervasive in end user software and the resulting impact is sometimes enormous growing number of researchers and developers are working on ways to make the software created by end users more reliable this workshop brings together researchers who are addressing this topic with industry representatives who are deploying end user programming applications to facilitate sharing of real world problems and solutions
symbolic trajectory evaluation ste is powerful technique for hardware model checking it is based on three valued symbolic simulation using the values zero one and unknown where the unknown value is used to abstract away values of the circuit nodes most ste tools are bdd based and use dual rail representation for the three possible values of circuit nodes sat based ste tools typically use two variables for each circuit node to comply with the dual rail representation in this work we present novel three valued circuit sat based algorithm for ste the ste problem is translated into circuit sat instance solution for this instance implies contradiction between the circuit and the ste assertion an unsat instance implies either that the assertion holds or that the model is too abstract to be verified in case of too abstract model we propose refinement automatically we implemented our three valued circuit sat based ste algorithm and applied it successfully to several ste examples
disclosure analysis and control are critical to protect sensitive information in statistical databases when some statistical moments are released generic question in disclosure analysis is whether data snooper can deduce any sensitive information from available statistical moments to address this question we consider various types of possible disclosure based on the exact bounds that snooper can infer about any protected moments from available statistical moments we focus on protecting static moments in two dimensional tables and obtain the following results for each type of disclosure we reveal the distribution patterns of protected moments that are subject to disclosure based on the disclosure patterns we design efficient algorithms to discover all protected moments that are subject to disclosure also based on the disclosure patterns we propose efficient algorithms to eliminate all possible disclosures by combining minimum number of available moments we also discuss the difficulties of executing disclosure analysis and control in high dimensional tables
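as one concrete illustration of exact bounds a snooper can infer from released moments (a standard example, not necessarily the bounds derived in the paper): if a two dimensional table of nonnegative counts releases its row sums \( r_i \), column sums \( c_j \) and grand total \( N \), every protected cell \( n_{ij} \) is bounded by the classical fréchet bounds

    \[ \max(0,\; r_i + c_j - N) \;\le\; n_{ij} \;\le\; \min(r_i,\, c_j) \]

and a disclosure of the kinds discussed above occurs when this interval is narrow enough to pin down a sensitive value.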
motivated by dennis dams studies of over and underapproximation of state transition systems we define logical relation calculus for galois connection building the calculus lets us define overapproximating galois connections in terms of lower powersets and underapproximating galois connections in terms of upper powersets using the calculus we synthesize dams most precise over and underapproximating transition systems and obtain proofs of their soundness and best precision as corollaries of abstract interpretation theory as bonus the calculus yields logic that corresponds to the variant of hennessy milner logic used in dams results following from corollary we have that dams most precise approximations soundly validate most properties that hold true for the corresponding concrete system these results bind together abstract interpretation and abstract model checking as intended by dams
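for readers unfamiliar with the terminology, the calculus above manipulates galois connections in the standard order theoretic sense: a pair of monotone maps \( \alpha : C \to A \) (abstraction) and \( \gamma : A \to C \) (concretization) between posets such that

    \[ \alpha(c) \sqsubseteq_A a \iff c \sqsubseteq_C \gamma(a) \qquad \text{for all } c \in C,\ a \in A \]

the over- and underapproximating connections discussed above arise from instantiating the concrete side with lower and upper powersets respectively.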
understanding intents from search queries can improve user’s search experience and boost site’s advertising profits query tagging via statistical sequential labeling models has been shown to perform well but annotating the training set for supervised learning requires substantial human effort domain specific knowledge such as semantic class lexicons reduces the amount of needed manual annotations but much human effort is still required to maintain these as search topics evolve over time this paper investigates semi supervised learning algorithms that leverage structured data html lists from the web to automatically generate semantic class lexicons which are used to improve query tagging performance even with far less training data we focus our study on understanding the correct objectives for the semi supervised lexicon learning algorithms that are crucial for the success of query tagging prior work on lexicon acquisition has largely focused on the precision of the lexicons but we show that precision is not important if the lexicons are used for query tagging more adequate criterion should emphasize trade off between maximizing the recall of semantic class instances in the data and minimizing the confusability this ensures that the similar levels of precision and recall are observed on both training and test set hence prevents over fitting the lexicon features experimental results on retail product queries show that enhancing query tagger with lexicons learned with this objective reduces word level tagging errors by up to compared to the baseline tagger that does not use any lexicon features in contrast lexicons obtained through precision centric learning algorithm even degrade the performance of tagger compared to the baseline furthermore the proposed method outperforms one in which semantic class lexicons have been extracted from database
mobile visualisation of map based information is difficult task designers of such systems must contend with the limitations of mobile devices in terms of hardware screen size and input mechanisms these problems are exacerbated by the nature of spatial data where large information space needs to be presented and manipulated on small screen in this paper prototype adaptive mobile map based visualisation system called mediamaps is presented mediamaps allows users to capture location tag sort and browse multimedia in map based view mediamaps was designed to adapt the information visualised the map based visualisations and the supporting user interface the results of an international field study in which participants used mediamaps on their personal mobile phones for three week period are also presented these results show that the adaptations implemented achieved high levels of accuracy and user satisfaction and successfully addressed some of the limitations of mobile map based visualisation
advances in processor architecture and technology have resulted in workstations in the mips range as well newer local area networks such as atm promise ten to hundred fold increase in throughput much reduced latency greater scalability and greatly increased reliability when compared to current lans such as ethernet we believe that these new network and processor technologies will permit tighter coupling of distributed systems at the hardware level and that distributed systems software should be designed to benefit from that tighter coupling in this paper we propose an alternative way of structuring distributed systems that takes advantage of communication model based on remote network access reads and writes to protected memory segments a key feature of the new structure directly supported by the communication model is the separation of data transfer and control transfer this is in contrast to the structure of traditional distributed systems which are typically organized using message passing or remote procedure call rpc in rpc style systems data and control are inextricably linked all rpcs must transfer both data and control even if the control transfer is unnecessary we have implemented our model on decstation hardware connected by an atm network we demonstrate how separating data transfer and control transfer can eliminate unnecessary control transfers and facilitate tighter coupling of the client and server this has the potential to increase performance and reduce server load which supports scaling in the face of an increasing number of clients for example for small set of file server operations our analysis shows decrease in server load when we switched from communications mechanism requiring both control transfer and data transfer to an alternative structure based on pure data transfer
we study the usefulness of lookahead in on line server routing problems if an on line algorithm is not only informed about the requests released so far but also has limited ability to foresee future requests what is the improvement that can be achieved in terms of the competitive ratio we consider several on line server routing problems in this setting such as the on line traveling salesman and the on line traveling repairman problem we show that the influence of lookahead can change considerably depending on the particular objective function and metric space considered
wsml presents framework encompassing different language variants rooted in description logics and logic programming so far the precise relationships between these variants have not been investigated we take the nonmonotonic first order autoepistemic logic which generalizes both description logics and logic programming and extend it with frames and concrete domains to capture all features of wsml we call this novel formalism ff ael we consider two forms of language layering for wsml namely loose and strict layering where the latter enforces additional restrictions on the use of certain language constructs in the rule based language variants in order to give additional guarantees about the layering finally we demonstrate that each wsml variant semantically corresponds to its target formalism ie wsml dl corresponds to shiq wsml rule to the stable model semantics for logic programs the well founded semantics can be seen as an approximation and wsml core to dhl without nominals horn subset of shiq
the wide adoption of the internet has made it convenient and low cost platform for large scale data collection however privacy has been the one issue that concerns internet users much more than reduced costs and ease of use when sensitive information are involved respondents in online data collection are especially reluctant to provide truthful response and the conventional practice to employ trusted third party to collect the data is unacceptable in these situations researchers have proposed various anonymity preserving data collection techniques in recent years but the current methods are generally unable to resist malicious attacks adequately and they are not sufficiently scalable for the potentially large numbers of respondents involved in online data collections in this paper we present an efficient anonymity preserving data collection protocol that is suitable for mutually distrusting respondents to submit their responses to an untrusted data collector our protocol employs the onion route approach to unlink the responses from the respondents to preserve anonymity our experimental results show that the method is highly efficient and robust for online data collection scenarios that involve large numbers of respondents
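a minimal sketch of the onion route layering the protocol builds on, not the paper's actual protocol: the respondent wraps a response in one encryption layer per relay node so that each node can remove only its own layer; the node keys and the use of fernet symmetric encryption are illustrative assumptions.

    from cryptography.fernet import Fernet

    # hypothetical per-node symmetric keys, one per relay on the route
    node_keys = [Fernet.generate_key() for _ in range(3)]

    def wrap(response: bytes, keys):
        # encrypt for the last hop first, then wrap outward toward the first hop
        data = response
        for key in reversed(keys):
            data = Fernet(key).encrypt(data)
        return data

    def peel(onion: bytes, keys):
        # each node, in route order, removes exactly one layer
        data = onion
        for key in keys:
            data = Fernet(key).decrypt(data)
        return data

    assert peel(wrap(b"survey answer", node_keys), node_keys) == b"survey answer"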
the web enables broad dissemination of information and services however the ways in which sites are designed can either facilitate or impede users benefit from these resources we present longitudinal study of web site design from to we analyze over quantitative measures of interface aspects eg amount of text on pages numbers and types of links consistency accessibility etc for pages and over sites that received ratings from internet professionals we examine characteristics of highly rated sites and provide three perspectives on the evolution of web site design patterns descriptions of design patterns during each time period changes in design patterns across the three time periods and comparisons of design patterns to those that are recommended in the relevant literature ie texts by recognized experts and user studies we illustrate how design practices conform to or deviate from recommended practices and the consequent implications we show that the most glaring deficiency of web sites even for sites that are highly rated is their inadequate accessibility in particular for browser scripts tables and form elements
step wise refinement is powerful paradigm for developing complex program from simple program by adding features incrementally we present the ahead algebraic hierarchical equations for application design model that shows how step wise refinement scales to synthesize multiple programs and multiple non code representations ahead shows that software can have an elegant hierarchical mathematical structure that is expressible as nested sets of equations we review tool set that supports ahead as demonstration of its viability we have bootstrapped ahead tools solely from equational specifications generating java and non java artifacts automatically task that was accomplished only by ad hoc means previously
applications for wireless sensor networks wsns are being spread to areas in which the contextual parameters modeling the environment are changing over the application lifespan whereas software adaptation has been identified as an effective approach for addressing context aware applications the existing work on wsns fails to support context awareness and mostly focuses on developing techniques to reprogram the whole sensor node rather than reconfiguring particular portion of the sensor application software therefore enabling adaptivity in the higher layers of wsn architecture such as the middleware and application layers beside the consideration in the lower layers becomes of high importance in this paper we propose distributed component based middleware approach named wisekit to enable adaptation and reconfiguration of wsn applications in particular this proposal aims at providing an abstraction to facilitate development of adaptive wsn applications as resource availability is the main concern of wsns the preliminary evaluation shows that our middleware approach promises lightweight fine grained and communication efficient model of application adaptation with very limited memory and energy overhead
this paper presents new approach to cluster web images images are first processed to extract signal features such as color in hsv format and quantized orientation web pages referring to these images are processed to extract textual features keywords and feature reduction techniques such as stemming stop word elimination and zipf’s law are applied all visual and textual features are used to generate association rules hypergraphs are generated from these rules with features used as vertices and discovered associations as hyperedges twenty two objective interestingness measures are evaluated on their ability to prune non interesting rules and to assign weights to hyperedges then hypergraph partitioning algorithm is used to generate clusters of features and simple scoring function is used to assign images to clusters tree distance based evaluation measure is used to evaluate the quality of image clustering with respect to manually generated ground truth our experiments indicate that combining textual and content based features results in better clustering as compared to signal only or text only approaches online steps are done in real time which makes this approach practical for web images furthermore we demonstrate that statistical interestingness measures such as correlation coefficient laplace kappa and measure result in better clustering compared to traditional association rule interestingness measures such as support and confidence
in this paper an evolutionary clustering technique is described that uses new point symmetry based distance measure the algorithm is therefore able to detect both convex and non convex clusters kd tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point adaptive mutation and crossover probabilities are used the proposed ga with point symmetry gaps distance based clustering algorithm is able to detect any type of clusters irrespective of their geometrical shape and overlapping nature as long as they possess the characteristic of symmetry gaps is compared with existing symmetry based clustering technique sbkm its modified version and the well known means algorithm sixteen data sets with widely varying characteristics are used to demonstrate its superiority for real life data sets anova and manova statistical analyses are performed
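a minimal sketch of a point symmetry based distance evaluated with a kd tree, assuming the common formulation in which the point reflected about a candidate cluster centre is matched against its nearest data points; the choice of two neighbours and the exact combination with the euclidean term are assumptions, not necessarily the paper's definition.

    import numpy as np
    from scipy.spatial import cKDTree

    def point_symmetry_distance(x, centre, tree, k=2):
        # reflect x about the centre and see how close existing data points
        # lie to the reflection; small values indicate a symmetric counterpart
        x, centre = np.asarray(x, float), np.asarray(centre, float)
        reflected = 2.0 * centre - x
        dists, _ = tree.query(reflected, k=k)   # kd-tree nearest-neighbour search
        return float(np.mean(dists)) * float(np.linalg.norm(x - centre))

    # usage sketch on random data
    data = np.random.rand(200, 2)
    tree = cKDTree(data)
    d = point_symmetry_distance(data[0], data.mean(axis=0), tree)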
alloy specifications are used to define lightweight models of systems we present alchemy which compiles alloy specifications into implementations that execute against persistent databases alchemy translates subset of alloy predicates into imperative update operations and it converts facts into database integrity constraints that it maintains automatically in the face of these imperative actions in addition to presenting the semantics and an algorithm for this compilation we present the tool and outline its application to non trivial specification we also discuss lessons learned about the relationship between alloy specifications and imperative implementations
we optimally place intrusion detection system ids sensors and prioritize ids alerts using attack graph analysis we begin by predicting all possible ways of penetrating network to reach critical assets the set of all such paths through the network constitutes an attack graph which we aggregate according to underlying network regularities reducing the complexity of analysis we then place ids sensors to cover the attack graph using the fewest number of sensors this minimizes the cost of sensors including effort of deploying configuring and maintaining them while maintaining complete coverage of potential attack paths the sensor placement problem we pose is an instance of the np hard minimum set cover problem we solve this problem through an efficient greedy algorithm which works well in practice once sensors are deployed and alerts are raised our predictive attack graph allows us to prioritize alerts based on attack graph distance to critical assets
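a minimal sketch of the greedy covering step described above: each candidate sensor location covers a set of attack graph elements (for instance, edges on predicted attack paths), and locations are picked one at a time to cover as many still uncovered elements as possible; the input encoding is hypothetical.

    def greedy_sensor_placement(coverage):
        # coverage: dict mapping candidate location -> set of attack-graph
        # elements it can monitor
        uncovered = set().union(*coverage.values())
        chosen = []
        while uncovered:
            # pick the location covering the most still-uncovered elements
            best = max(coverage, key=lambda loc: len(coverage[loc] & uncovered))
            gain = coverage[best] & uncovered
            if not gain:
                break
            chosen.append(best)
            uncovered -= gain
        return chosen

    # usage: greedy_sensor_placement({"s1": {1, 2}, "s2": {2, 3}, "s3": {3}}) -> ["s1", "s2"]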
the continuous increase in electrical and computational power in data centers has been driving many research approaches under the green it main theme however most of this research focuses on reducing energy consumption considering hardware components and data center building features like servers distribution and cooling flow on the contrary this paper points out that energy consumption is also service quality problem and presents an energy aware design approach for building service based applications to this effect techniques are provided to measure service costs combining quality of service qos requirements and green performance indicators gpi in order to obtain better tradeoff between energy efficiency and performance for each user
large scale computing environments such as teragrid distributed asci supercomputer das and gridpsila have been using resource co allocation to execute applications on multiple sites their schedulers work with requests that contain imprecise estimations provided by users this lack of accuracy generates fragments inside the scheduling queues that can be filled by rescheduling both local and multi site requests current resource co allocation solutions rely on advance reservations to ensure that users can access all the resources at the same time these coallocation requests cannot be rescheduled if they are based on rigid advance reservations in this work we investigate the impact of rescheduling co allocation requests based on flexible advance reservations and processor remapping the metascheduler can modify the start time of each job component and remap the number of processors they use in each site the experimental results show that local jobs may not fill all the fragments in the scheduling queues and hence rescheduling co allocation requests reduces response time of both local and multi site jobs moreover we have observed in some scenarios that processor remapping increases the chances of placing the tasks of multi site jobs into single cluster thus eliminating the inter cluster network overhead
good distance metric for the input data is crucial in many pattern recognition and machine learning applications past studies have demonstrated that learning metric from labeled samples can significantly improve the performance of classification and clustering algorithms in this paper we investigate the problem of learning distance metric that measures the semantic similarity of input data for regression problems the particular application we consider is human age estimation our guiding principle for learning the distance metric is to preserve the local neighborhoods based on specially designed distance as well as to maximize the distances between data that are not in the same neighborhood in the semantic space without any assumption about the structure and the distribution of the input data we show that this can be done by using semidefinite programming furthermore the low level feature space can be mapped to the high level semantic space by linear transformation with very low computational cost experimental results on the publicly available fg net database show that the learned metric correctly discovers the semantic structure of the data even when the amount of training data is small and significant improvement over the traditional euclidean metric for regression can be obtained using the learned metric most importantly simple regression methods such as nearest neighbors knn combined with our learned metric become quite competitive and sometimes even superior in terms of accuracy when compared with the state of the art human age estimation approaches
mobile devices are increasingly powerful in media storage and rendering the prevalent request of decent video browsing on mobile devices is demanding however one limitation comes from the size and aspect constraints of display to display video on small screen rendering process probably undergoes sort of retargeting to fit into the target display and keep the most of original video information in this paper we formulate video retargeting as the problem of finding an optimal trajectory for cropping window to go through the video capturing the most salient region to scale towards proper display on the target to measure the visual importance of every pixel we utilize the local spatial temporal saliency st saliency and face detection results the spatiotemporal movement of the cropping window is modeled in graph where smoothed trajectory is resolved by max flow min cut method in global optimization manner based on the horizontal vertical projections and graph based method the trajectory estimation of each shot can be conducted within one second also the process of merging trajectories is employed to capture more saliency in video experimental results on diverse video contents have shown that our approach is efficient and subjective evaluation shows that the retargeted video has gained desirable user satisfaction
geographically replicating popular objects in the internet speeds up content distribution at the cost of keeping the replicas consistent and up to date the overall effectiveness of replication can be measured by the total communication cost consisting of client accesses and consistency management both of which depend on the locations of the replicas this paper investigates the problem of placing replicas under the widely used ttl based consistency scheme polynomial time algorithm is proposed to compute the optimal placement of given number of replicas in network the new replica placement scheme is compared using real internet topologies and web traces against two existing approaches which do not consider consistency management or assume invalidation based consistency scheme the factors affecting their performance are identified and discussed
hierarchical models have been extensively studied in various domains however existing models assume fixed model structures or incorporate structural uncertainty generatively in this paper we propose dynamic hierarchical markov random fields dhmrfs to incorporate structural uncertainty in discriminative manner dhmrfs consist of two parts structure model and class label model both are defined as exponential family distributions conditioned on observations dhmrfs relax the independence assumption as made in directed models as exact inference is intractable variational method is developed to learn parameters and to find the map model structure and label assignment we apply the model to real world web data extraction task which automatically extracts product items for sale on the web the results show promise
the main goal of this paper is to apply rewriting termination technology enjoying quite mature set of termination results and tools to the problem of proving automatically the termination of concurrent systems under fairness assumptions we adopt the thesis that concurrent system can be naturally modeled as rewrite system and develop theoretical approach to systematically transform under reasonable assumptions fair termination problems into ordinary termination problems of associated relations to which standard rewriting termination techniques and tools can be applied our theoretical results are combined into practical proof method for proving fair termination that can be automated and can be supported by current termination tools we illustrate this proof method with some concrete examples and briefly comment on future extensions
new algorithm for interactive image segmentation is proposed besides the traditional appearance and gradient information new generic shape prior gsp knowledge which implies the location and the shape information of the object is combined into the framework the gsp can be further categorized into the regional and the contour gsp to fit the interactive application where hierarchical graph cut based optimization procedure is established for its global optimization using the regional gsp to obtain good global segmentation results and the local one using the contour gsp to refine boundaries of global results moreover the global optimization is based on superpixels which significantly reduce the computational complexity but preserve necessary image structures the local one only considers subset pixels around contour segment they both speed up the system results show our method performs better on both speed and accuracy
since its birth the internet has always been characterised by two fold aspect of distributed information repository to store publish and retrieve program and data files and an interaction medium including variety of communication services an important current trend consists of merging the two aspects described above and envision the internet as globally distributed computing platform where communication and computation can be freely intertwined however traditional distributed programming models fall short in this context due to the peculiar characteristics of the internet on the one hand internet services are decentralised and unreliable on the other hand even more important mobility either of users devices or application components is going to impact the internet in the near future since internet applications are intrinsically interactive and collaborative the definition of an appropriate coordination model and its integration in forthcoming internet programming languages are key issues to build applications including mobile entities we sketch the main features that such model should present then we survey and discuss some coordination models for internet programming languages eventually outlining open issues and promising research directions
tool support for refactoring code written in mainstream languages such as and is currently lacking due to the complexity introduced by the mandatory preprocessing phase that forms part of the compilation cycle the definition and use of macros complicates the notions of scope and of identifier boundaries the concept of token equivalence classes can be used to bridge the gap between the language proper semantic analysis and the nonpreprocessed source code the cscout toolchest uses the developed theory to analyze large interdependent program families web based interactive front end allows the precise realization of rename and remove refactorings on the original source code in addition cscout can convert programs into portable obfuscated format or store complete and accurate representation of the code and its identifiers in relational database
many commercial database management systems eg db oracle etc make use of histograms of the value distribution of individual attributes of relations in order to make good selections of query execution plans these histograms contain partial information about the actual distribution such as which attribute values occur most frequently and how often each one occurs and what value occurs at the kth quantile when the values are sorted in this paper we quantitatively assess the information gain or uncertainty reduction due to each of these types of histogram information both individually and in combination correspondingly we observe how the accuracy of estimating frequencies of individual values improves with the availability of each of these types of histogram information we suggest guidelines for constructing histograms tailored to each individual attribute depending on the characteristics of its attribute value distribution
in the past decades parallel systems have been used widely to support scientific and commercial applications new data centers today employ huge quantities of systems which consume large amount of energy most large scale systems have an array of hard disks working in parallel to meet performance requirements traditional energy conservation techniques attempt to place disks into low power states when possible in this paper we propose novel strategy which aims to significantly conserve energy while reducing average response times this goal is achieved by making use of buffer disks in parallel systems to accumulate small writes to form log which can be transferred to data disks in batch way we develop an algorithm dynamic request allocation algorithm for writes or daraw to energy efficiently allocate and schedule write requests in parallel system daraw is able to improve parallel energy efficiency by the virtue of leveraging buffer disks to serve majority of incoming write requests thereby keeping data disks in low power state for longer period times buffered requests are then written to data disks at predetermined time experimental results show that daraw can significantly reduce energy dissipation in parallel systems without adverse impacts on performance
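a minimal sketch of the buffer disk idea described above, not the daraw algorithm itself: small writes are appended to a log on a buffer disk so the data disks can stay in a low power state, and the log is flushed in a batch once a size threshold or a predetermined time has passed; all names and thresholds are hypothetical.

    import time

    class BufferDiskLog:
        """accumulate small writes on a buffer disk and flush them in batches"""

        def __init__(self, flush_bytes=4 << 20, flush_interval=60.0):
            self.log = []                     # buffered (data_disk, offset, data) records
            self.bytes = 0
            self.flush_bytes = flush_bytes    # hypothetical size threshold
            self.flush_interval = flush_interval
            self.last_flush = time.monotonic()

        def write(self, data_disk, offset, data):
            self.log.append((data_disk, offset, data))
            self.bytes += len(data)
            if (self.bytes >= self.flush_bytes or
                    time.monotonic() - self.last_flush >= self.flush_interval):
                self.flush()

        def flush(self):
            # a real system would wake the data disks here and replay the log;
            # this sketch only clears the buffer
            self.log.clear()
            self.bytes = 0
            self.last_flush = time.monotonic()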
the paper presents system for automatic geo registered real time reconstruction from video of urban scenes the system collects video streams as well as gps and inertia measurements in order to place the reconstructed models in geo registered coordinates it is designed using current state of the art real time modules for all processing steps it employs commodity graphics hardware and standard cpu’s to achieve real time performance we present the main considerations in designing the system and the steps of the processing pipeline our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab to account for the large dynamic range of outdoor videos the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real time performance the required accuracy for many applications is achieved with two step stereo reconstruction process exploiting the redundancy across frames we show results on real video sequences comprising hundreds of thousands of frames
dlv is the state of the art system for evaluating disjunctive answer set programs as in most answer set programming asp systems its implementation is divided in grounding part and propositional model finding part in this paper we focus on the latter which relies on an algorithm using backtracking search recently dlv has been enhanced with backjumping techniques which also involve reason calculus recording causes for the truth or falsity of atoms during the search this reason calculus allows for looking back in the search process for identifying areas in the search space in which no answer set will be found we can also define heuristics which make use of the information about reasons preferring literals that were the reasons of more inconsistent branches of the search tree this heuristics thus use information gathered earlier in the computation and are therefore referred to as look back heuristics in this paper we formulate suitable look back heuristics and focus on the experimental evaluation of the look back techniques that we have implemented in dlv obtaining the system dlv we have conducted thorough experimental analysis considering both randomly generated and structured instances of the qbf problem the canonical problem for the complexity classes and any problem in these classes can be expressed uniformly using asp and can therefore be solved by dlv we have also evaluated the same benchmark using native qbf solvers which were among the best solvers in recent qbf evaluations the comparison shows that dlv endowed with look back techniques is competitive with the best available qbf solvers on such instances
the growing processor memory performance gap causes the performance of many codes to be limited by memory accesses if known to exist in an application strided memory accesses forming streams can be targeted by optimizations such as prefetching relocation remapping and vector loads undetected they can be significant source of memory stalls in loops existing stream detection mechanisms either require special hardware which may not gather statistics for subsequent analysis or are limited to compile time detection of array accesses in loops formally little treatment has been accorded to the subject the concept of locality fails to capture the existence of streams in program’s memory accesses the contributions of this paper are as follows first we define spatial regularity as means to discuss the presence and effects of streams second we develop measures to quantify spatial regularity and we design and implement an on line parallel algorithm to detect streams and hence regularity in running applications third we use examples from real codes and common benchmarks to illustrate how derived stream statistics can be used to guide the application of profile driven optimizations overall we demonstrate the benefits of our novel regularity metric as an instrument to detect potential for code optimizations affecting memory performance
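a minimal sketch of one way to quantify the presence of strided streams discussed above: group an address trace by the issuing instruction, take differences between consecutive addresses, and report the fraction of the trace that follows each instruction's dominant stride; the trace format and the resulting ratio are illustrative, not the paper's regularity measure.

    from collections import Counter, defaultdict

    # hypothetical trace: iterable of (pc, address) pairs in program order
    def stride_regularity(trace):
        by_pc = defaultdict(list)
        for pc, addr in trace:
            by_pc[pc].append(addr)
        regular, total, streams = 0, 0, {}
        for pc, addrs in by_pc.items():
            deltas = [b - a for a, b in zip(addrs, addrs[1:])]
            if not deltas:
                continue
            stride, count = Counter(deltas).most_common(1)[0]
            streams[pc] = stride          # dominant stride for this instruction
            regular += count
            total += len(deltas)
        return (regular / total if total else 0.0), streams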
in field such as cardiology data used for clinical studies is not only alphanumeric but can also be composed of images or signals multimedia data warehouse then must be studied in order to provide an efficient environment for the analysis of this data the analysis environment must include appropriate processing methods in order to compute or extract the knowledge embedded into raw data traditional multidimensional models have static structure which members of dimensions are computed in unique way however multimedia data is often characterized by descriptors that can be obtained by various computation modes we define these computation modes as functional versions of the descriptors we propose functional multiversion multidimensional model by integrating the concept of version of dimension this concept defines dimensions with members computed according to various functional versions this new approach integrates different computation modes of these members into the proposed model in order to allow the user to select the best representation of data in this paper conceptual model is formally defined and prototype for this study is presented multimedia data warehouse in the medical field has been implemented on therapeutic study on acute myocardial infarction
this paper presents an energy management policy for reconfigurable clusters running multi tier application exploiting dvs together with multiple sleep states we develop theoretical analysis of the corresponding power optimization problem and design an algorithm around the solution moreover we rigorously investigate selection of the optimal number of spare servers for each power state problem that has only been approached in an ad hoc manner in current policies to validate our results and policies we implement them on an actual multi tier server cluster where nodes support all power management techniques considered experimental results using realistic dynamic workloads based on the tpcw benchmark show that exploiting multiple sleep states results in significant additional cluster wide energy savings up to with little or no performance degradation
cost analysis aims at obtaining information about the execution cost of programs this paper studies cost relation systems crss the sets of recursive equations used in cost analysis in order to capture the execution cost of programs in terms of the size of their input arguments we investigate the notion of crs from general perspective which is independent of the particular cost analysis framework our main contributions are we provide formal definition of execution cost and of crs which is not tied to particular programming language we present the notion of sound crs ie which correctly approximates the cost of the corresponding program we identify the differences with recurrence relation systems its possible applications and the new challenges that they bring about our general framework is illustrated by instantiating it to cost analysis of java bytecode haskell and prolog
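as a generic illustration of what such recursive cost equations look like (not an example taken from the paper), a loop doing a constant amount of work per iteration over an input of size \( n \) gives rise to a cost relation system of the form

    \[ C(n) = c_0 \quad (n \le 0), \qquad C(n) = c_1 + C(n-1) \quad (n > 0) \]

whose closed form \( C(n) = c_0 + c_1 n \) bounds the execution cost in terms of the size of the input argument.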
the run time binding of web services has been recently put forward in order to support rapid and dynamic web service compositions with the growing number of alternative web services that provide the same functionality but differ in quality parameters the service composition becomes decision problem on which component services should be selected such that user’s end to end qos requirements eg availability response time and preferences eg price are satisfied although very efficient local selection strategy falls short in handling global qos requirements solutions based on global optimization on the other hand can handle global constraints but their poor performance renders them inappropriate for applications with dynamic and real time requirements in this paper we address this problem and propose solution that combines global optimization with local selection techniques to benefit from the advantages of both worlds the proposed solution consists of two steps first we use mixed integer programming mip to find the optimal decomposition of global qos constraints into local constraints second we use distributed local selection to find the best web services that satisfy these local constraints the results of experimental evaluation indicate that our approach significantly outperforms existing solutions in terms of computation time while achieving close to optimal results
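a minimal sketch of the second, local selection step described above: once the global qos constraints have been decomposed by the mip into per task local constraints, each task independently picks the candidate service with the best utility among those satisfying its local constraints; the data layout and utility function here are hypothetical.

    def select_services(candidates, local_constraints, utility):
        # candidates: {task: [service, ...]} where each service is a dict of qos
        #   values, e.g. {"price": 3.0, "latency": 120, "availability": 0.99}
        # local_constraints: {task: predicate(service) -> bool} from the mip step
        # utility: function(service) -> float to maximize
        selection = {}
        for task, services in candidates.items():
            feasible = [s for s in services if local_constraints[task](s)]
            if not feasible:
                return None  # constraints too tight; would trigger re-decomposition
            selection[task] = max(feasible, key=utility)
        return selection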
electronic marketplaces are an important research theme on the information systems landscape in this paper we examine twelve years of research on electronic marketplaces in leading information systems journals the research articles are classified according to five conceptual high level groupings electronic markets theory system perspective with focus on the technology or functionality with the system adoption and implementation issues organisational implications and broader commerce issues the findings show an increase in electronic marketplace research over the twelve years the analysis of the literature highlights two distinct issues that researchers in the information systems discipline need to address the first is the lack of research on the fundamental questions on the nature of electronic markets and their efficiency if information systems research does not address this question then it will not be seen as tackling critical issues by those outside of the discipline the second is the relative lack of articles on the organisational implications of adopting and managing electronic marketplaces these include the organisational benefits costs and risks of trading through marketplaces and strategies and methodologies for managing organisational participation both issues can be addressed by increasing the number of macro studies examining efficiencies in electronic markets
we take cross layer design approach to study rate control in multihop wireless networks due to the lossy nature of wireless links the data rate of given flow becomes smaller and smaller along its routing path as result the data rate received successfully at the destination node the effective rate is typically lower than the transmission rate at the source node the injection rate in light of this observation we treat each flow as leaky pipe flow and introduce the notion of effective utility associated with the effective rate not the injection rate of each flow we then explore rate control through effective network utility maximization enum in this study two network models are studied in this paper enum with link outage constraints with maximum error rate at each link enum with path outage constraints where there exists an end to end outage requirement for each flow for both models we explicitly take into account the thinning feature of data flows and devise distributed hop by hop rate control algorithms accordingly our numerical examples corroborate that higher effective network utility and better fairness can be achieved by the enum algorithms than the standard num
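the thinning of a leaky pipe flow can be written down explicitly: if flow \( f \) injects at rate \( x_f \) and traverses links \( l \) with loss rates \( p_l \), then under an independence assumption (ours, made only to give the product its simple form) the rate surviving to the destination is

    \[ x_f^{\mathrm{eff}} \;=\; x_f \prod_{l \in \mathrm{path}(f)} (1 - p_l) \]

and the enum formulation maximizes \( \sum_f U_f\bigl(x_f^{\mathrm{eff}}\bigr) \) subject to the link or path outage constraints, instead of the usual \( \sum_f U_f(x_f) \).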
modern transactional response time sensitive applications have run into practical limits on the size of garbage collected heaps the heap can only grow until gc pauses exceed the response time limits sustainable scalable concurrent collection has become feature worth paying for azul systems has built custom system cpu chip board and os specifically to run garbage collected virtual machines the custom cpu includes read barrier instruction the read barrier enables highly concurrent no stop the world phases parallel and compacting gc algorithm the pauseless algorithm is designed for uninterrupted application execution and consistent mutator throughput in every gc phase beyond the basic requirement of collecting faster than the allocation rate the pauseless collector is never in rush to complete any gc phase no phase places an undue burden on the mutators nor do phases race to complete before the mutators produce more work portions of the pauseless algorithm also feature self healing behavior which limits mutator overhead and reduces mutator sensitivity to the current gc state we present the pauseless gc algorithm the supporting hardware features that enable it and data on the overhead efficiency and pause times when running sustained workload
making vital disk data recoverable even in the event of os compromises has become necessity in view of the increased prevalence of os vulnerability exploits over the recent years we present the design and implementation of secure disk system svsds that performs selective flexible and transparent versioning of stored data at the disk level in addition to versioning svsds actively enforces constraints to protect executables and system log files most existing versioning solutions that operate at the disk level are unaware of the higher level abstractions of data and hence are not customizable we evolve hybrid solution that combines the advantages of disk level and file system level versioning systems thereby ensuring security while at the same time allowing flexible policies we implemented and evaluated software level prototype of svsds in the linux kernel and it shows that the space and performance overheads associated with selective versioning at the disk level are minimal
community based question answering cqa services have accumulated millions of questions and their answers over time in the process of accumulation cqa services assume that questions always have unique best answers however with an in depth analysis of questions and answers on cqa services we find that the assumption cannot be true according to the analysis at least of the cqa best answers are reusable when similar questions are asked again but no more than of them are indeed the unique best answers we conduct the analysis by proposing taxonomies for cqa questions and answers to better reuse the cqa content we also propose applying automatic summarization techniques to summarize answers our results show that question type oriented summarization techniques can improve cqa answer quality significantly
we study the algorithmic and structural properties of very large realistic social contact networks we consider the social network for the city of portland oregon usa developed as part of the transims episims project at the los alamos national laboratory the most expressive social contact network is bipartite graph with two types of nodes people and locations edges represent people visiting locations on typical day three types of results are presented our empirical results show that many basic characteristics of the dataset are well modeled by random graph approach suggested by fan chung graham and lincoln lu the cl model with power law degree distribution ii we obtain fast approximation algorithms for computing basic structural properties such as clustering coefficients and shortest paths distribution we also study the dominating set problem for such networks this problem arose in connection with optimal sensor placement for disease detection we present fast approximation algorithm for computing near optimal dominating sets iii given the close approximations provided by the cl model to our original dataset and the large data volume we investigate fast methods for generating such random graphs we present methods that can generate such random network in near linear time and show that these variants asymptotically share many key features of the cl model and also match the portland social network the structural results have been used to study the impact of policy decisions for controlling large scale epidemics in urban environments
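a minimal sketch of the chung lu (cl) model referenced above, in its simplest quadratic time form: given a target expected degree \( w_i \) per node, each edge is included independently with probability proportional to \( w_i w_j \); the near linear time generators the abstract investigates avoid examining every pair, which this sketch does not attempt.

    import random

    def chung_lu_graph(weights, seed=None):
        # weights: list of expected degrees, one per node
        rng = random.Random(seed)
        total = float(sum(weights))
        n = len(weights)
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                p = min(1.0, weights[i] * weights[j] / total)  # cl edge probability
                if rng.random() < p:
                    edges.append((i, j))
        return edges

    # usage: edges = chung_lu_graph([5, 3, 3, 2, 1, 1], seed=42)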
most rdbmss maintain set of histograms for estimating the selectivities of given queries these selectivities are typically used for cost based query optimization while the problem of building an accurate histogram for given attribute or attribute set has been well studied little attention has been given to the problem of building and tuning set of histograms collectively for multidimensional queries in self managed manner based only on query feedback in this paper we present sash self adaptive set of histograms that addresses the problem of building and maintaining set of histograms sash uses novel two phase method to automatically build and maintain itself using query feedback information only in the online tuning phase the current set of histograms is tuned in response to the estimation error of each query in an online manner in the restructuring phase new and more accurate set of histograms replaces the current set of histograms the new set of histograms attribute sets and memory distribution is found using information from batch of query feedback we present experimental results that show the effectiveness and accuracy of our approach
several approaches to semantic web services including owls swsf and wsmo have been proposed in the literature with the aim to enable automation of various tasks related to web services including discovery contracting enactment monitoring and mediation the ability to specify processes and to reason about them is central to these initiatives in this paper we analyze the wsmo choreography model which is based on abstract state machines asms and propose methodology for generating wsmo choreography from visual specifications we point out the limitations of the current wsmo model and propose faithful extension that is based on concurrent transaction logic ctr the advantage of ctr based model is that it uniformly captures number of aspects that previously required separate mechanisms or were not captured at all these include process specification contracting for services service enactment and reasoning
this paper presents new approach to using dynamic information flow analysis to detect attacks against application software the approach can be used to reveal and under some conditions to prevent attacks that violate specified information flow policy or exhibit known information flow signature when used in conjunction with automatic cluster analysis the approach can also reveal novel attacks that exhibit unusual patterns of information flows set of prototype tools implementing the approach have been developed for java byte code programs case studies in which this approach was applied to several subject programs are described
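the bytecode level tools described above are not reproduced here; the following is only a minimal sketch, with hypothetical class and function names, of the underlying dynamic information flow (taint tracking) idea: values from untrusted sources carry a label that propagates through operations and is checked at a sensitive sink against a policy

    class Tainted(str):
        """Sketch: a string that carries a taint label so flows from untrusted
        sources into sensitive sinks can be detected at run time."""
        def __new__(cls, value, source="untrusted"):
            obj = super().__new__(cls, value)
            obj.source = source
            return obj
        def __add__(self, other):            # taint propagates through concatenation
            return Tainted(str(self) + str(other), self.source)
        def __radd__(self, other):
            return Tainted(str(other) + str(self), self.source)

    def sensitive_sink(sql, policy_allows=lambda src: False):
        # flag a policy violation when tainted data reaches the sink
        if isinstance(sql, Tainted) and not policy_allows(sql.source):
            raise RuntimeError("information-flow policy violation from " + sql.source)
        print("executing:", sql)

    user_input = Tainted("'; DROP TABLE users; --")
    try:
        sensitive_sink("SELECT * FROM t WHERE name = " + user_input)
    except RuntimeError as e:
        print(e)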
it is well known that multi chart parameterizations introduce seams over meshes causing serious problems for applications like texture filtering relief mapping and simulations in the texture domain here we present two techniques collectively known as continuity mapping that together make any multi chart parameterization seamless traveler’s map is used for solving the spatial discontinuities of multi chart parameterizations in texture space thanks to bidirectional mapping between areas outside the charts and the corresponding areas inside and sewing the seams addresses the sampling mismatch at chart boundaries using set of stitching triangles that are not true geometry but merely evaluated on per fragment basis to perform consistent linear interpolation between non adjacent texel values continuity mapping does not require any modification of the artist provided textures or models it is fully automatic and achieves continuity with small memory and computational costs
we present self healing on demand geographic path routing protocol ogpr for mobile ad hoc networks ogpr is an efficient stateless and scalable routing protocol that inherits the best of the three well known techniques for routing in ad hoc networks viz greedy forwarding reactive route discovery and source routing in ogpr protocol source nodes utilize the geographic topology information obtained during the location request phase to establish geographic paths to their respective destinations geographic paths decouple node id’s from the paths and are immune to changes in the network topology further they help nodes avoid dead ends due to greedy forwarding to utilize geographic paths even in sparser networks ogpr uses path healing mechanism that helps geographic paths adapt according to the network topology we present extensions to ogpr protocol to cope with networks containing unidirectional links further we present results from an extensive simulation study using glomosim simulation results show that ogpr achieves higher percentage packet delivery and lower control overhead compared to combination of gpsr gls protocols aodv and dsr under wide range of network scenarios
this work addresses the problem of processing continuous nearest neighbor nn queries for moving objects trajectories when the exact position of given object at particular time instant is not known but is bounded by an uncertainty region as has already been observed in the literature the answers to continuous nn queries in spatio temporal settings are time parameterized in the sense that the objects in the answer vary over time incorporating uncertainty in the model yields additional attributes that affect the semantics of the answer to this type of queries in this work we formalize the impact of uncertainty on the answers to the continuous probabilistic nn queries provide compact structure for their representation and efficient algorithms for constructing that structure we also identify syntactic constructs for several qualitative variants of continuous probabilistic nn queries for uncertain trajectories and present efficient algorithms for their processing
lock free algorithms have been developed to avoid various problems associated with using locks to control access to shared data structures instead of preventing interference between processes using mutual exclusion lock free algorithms must ensure correct behaviour in the presence of interference while this avoids the problems with locks the resulting algorithms are typically more intricate than lock based algorithms and allow more complex interactions between processes the result is that even when the basic idea is easy to understand the code implementing lock free algorithms is typically very subtle hard to understand and hard to get right in this paper we consider the well known lock free queue implementation due to michael and scott and show how slightly simplified version of this algorithm can be derived from an abstract specification via series of verifiable refinement steps reconstructing design history in this way allows us to examine the kinds of design decisions that underlie the algorithm as described by michael and scott and to explore the consequences of some alternative design choices our derivation is based on refinement calculus with concurrent composition combined with reduction approach based on that proposed by lipton lamport cohen and others which we have previously used to derive scalable stack algorithm the derivation of michael and scott’s queue algorithm introduces some additional challenges because it uses helper mechanism which means that part of an enqueue operation can be performed by any process also in simulation proof the treatment of dequeue on an empty queue requires the use of backward simulation
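for orientation, a sketch of the michael and scott queue structure follows, including the helping step mentioned above; python has no user level compare and swap, so the cas helper below is only an illustrative stand in and is not atomic, which means this sketch shows the algorithm's shape rather than a usable lock free implementation

    class Node:
        __slots__ = ("value", "next")
        def __init__(self, value=None):
            self.value, self.next = value, None

    def cas(obj, field, expected, new):
        # illustrative only: NOT atomic in Python; on real hardware this is one CAS
        if getattr(obj, field) is expected:
            setattr(obj, field, new)
            return True
        return False

    class MSQueue:
        """Sketch of the Michael-Scott queue discussed above (dummy head node)."""
        def __init__(self):
            dummy = Node()
            self.head = self.tail = dummy

        def enqueue(self, value):
            node = Node(value)
            while True:
                tail = self.tail
                nxt = tail.next
                if tail is self.tail:                      # tail still consistent?
                    if nxt is None:
                        if cas(tail, "next", None, node):  # link node at the end
                            cas(self, "tail", tail, node)  # swing tail (may be helped)
                            return
                    else:
                        cas(self, "tail", tail, nxt)       # help: tail was lagging

        def dequeue(self):
            while True:
                head, tail = self.head, self.tail
                nxt = head.next
                if head is self.head:
                    if head is tail:
                        if nxt is None:
                            return None                    # queue empty
                        cas(self, "tail", tail, nxt)       # help swing lagging tail
                    else:
                        value = nxt.value
                        if cas(self, "head", head, nxt):   # advance head past dummy
                            return value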
the wavelet decomposition is proven tool for constructing concise synopses of large data sets that can be used to obtain fast approximate answers existing research studies focus on selecting an optimal set of wavelet coefficients to store so as to minimize some error metric without however seeking to reduce the size of the wavelet coefficients themselves in many real data sets the existence of large spikes in the data values results in many large coefficient values lying on paths of conceptual tree structure known as the error tree to exploit this fact we introduce in this paper novel compression scheme for wavelet synopses termed hierarchically compressed wavelet synopses that fully exploits hierarchical relationships among coefficients in order to reduce their storage our proposed compression scheme allows for larger number of coefficients to be stored for given space constraint thus resulting in increased accuracy of the produced synopsis we propose optimal approximate and greedy algorithms for constructing hierarchically compressed wavelet synopses that minimize the sum squared error while not exceeding given space budget extensive experimental results on both synthetic and real world data sets validate our novel compression scheme and demonstrate the effectiveness of our algorithms against existing synopsis construction algorithms
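as background for the scheme above, here is a small sketch, under assumed function names, of the conventional wavelet synopsis baseline: a haar decomposition followed by retention of the B coefficients with the largest normalized magnitude; the hierarchically compressed synopses described above go further by also compressing how the retained coefficients themselves are stored, which this sketch does not attempt

    import math

    def haar_decompose(data):
        """Haar transform of a length-2^k vector; returns [overall average]
        followed by detail coefficients, coarsest level first."""
        coeffs = list(map(float, data))
        details = []
        while len(coeffs) > 1:
            half = len(coeffs) // 2
            avg = [(coeffs[2*i] + coeffs[2*i+1]) / 2.0 for i in range(half)]
            det = [(coeffs[2*i] - coeffs[2*i+1]) / 2.0 for i in range(half)]
            details = det + details           # prepend so coarser levels come first
            coeffs = avg
        return coeffs + details               # [c0, d_coarse, ..., d_fine]

    def top_b_synopsis(wavelet, B):
        """Keep the B coefficients with largest normalized magnitude, which
        minimizes the sum squared error; everything else is implicitly zero."""
        n = len(wavelet)
        def norm(idx, val):
            if idx == 0:
                return abs(val) * math.sqrt(n)
            level = idx.bit_length() - 1      # detail level, 0 = coarsest
            return abs(val) * math.sqrt(n / (2 ** level))
        ranked = sorted(range(n), key=lambda i: norm(i, wavelet[i]), reverse=True)
        return {i: wavelet[i] for i in ranked[:B]}

    w = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])   # -> [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
    synopsis = top_b_synopsis(w, B=3)              # average plus two largest details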
loop fusion is commonly used to improve the instruction level parallelism of loops for high performance embedded computing systems loop fusion however is not always directly applicable because the fusion prevention dependencies may exist among loops most of the existing techniques still have limitations in fully exploiting the advantages of loop fusion in this paper we present general loop fusion technique for loops or nested loops based on the loop dependency graph model retiming and multi dimensional retiming concepts we show that any model loop can be legally fused using our legalizing fusion technique polynomial time algorithms are developed to solve the loop fusion problem for model loops considering both timing and code size of the final code our technique produces the final code and calculates the resultant code size directly from the retiming values the experimental results show that our loop fusion technique always significantly reduces the schedule length
we consider dynamic compact routing in metrics of low doubling dimension given set of nodes in metric space with nodes joining leaving and moving we show how to maintain set of links that allows compact routing on the graph given constant and dynamic node set with normalized diameter in metric of doubling dimension we achieve dynamic graph with maximum degree log and an optimal stretch compact name independent routing scheme on with log bit storage at each node moreover the amortized number of messages for node joining leaving and moving is polylogarithmic in the normalized diameter and the cost total distance traversed by all messages generated of node move operation is proportional to the distance the node has traveled times polylog factor we can also show similar bounds for stretch compact dynamic labeled routing scheme one important application of our scheme is that it also provides node location scheme for mobile ad hoc networks with the same characteristics as our name independent scheme above namely optimal stretch for lookup polylogarithmic storage overhead and degree at the nodes and locality sensitive node move join leave operations we also show how to extend our dynamic compact routing scheme to address the more general problem of devising locality sensitive distributed hash tables dhts in dynamic networks of low doubling dimension our proposed dht scheme also has optimal stretch polylogarithmic storage overhead and degree at the nodes locality sensitive publish unpublish and node move join leave operations
we present bit vector algorithm for the optimal and economical placement of computations within flow graphs which is as efficient as standard uni directional analyses the point of our algorithm is the decomposition of the bi directional structure of the known placement algorithms into sequence of backward and forward analysis which directly implies the efficiency result moreover the new compositional structure opens the algorithm for modification two further uni directional analysis components exclude any unnecessary code motion this laziness of our algorithm minimizes the register pressure which has drastic effects on the run time behaviour of the optimized programs in practice where an economical use of registers is essential
current studies on the storage of xml data are focused on either the efficient mapping of xml data onto an existing rdbms or the development of native xml storage some native xml storages store each xml node in parsed object form clustering which means the physical arrangement of objects can be an important factor in improving the performance in this storage model in this paper we propose clustering method that stores data nodes in an xml document into the native xml storage the proposed clustering method uses path similarities between data nodes which can reduce page i/os required for query processing in addition we propose query processing method using signatures that facilitate the cluster level access on the stored data to benefit from the proposed clustering method this method can process path query by accessing only small number of clusters and thus need not use all of the clusters hence enabling the path query to be processed efficiently by skipping unnecessary data finally we compare the performance of the proposed method with that of the existing ones our results show that the performance of xml storage can be improved by using proper clustering method
even though interaction is an important part of information visualization infovis it has garnered relatively low level of attention from the infovis community few frameworks and taxonomies of infovis interaction techniques exist but they typically focus on low level operations and do not address the variety of benefits interaction provides after conducting an extensive review of infovis systems and their interactive capabilities we propose seven general categories of interaction techniques widely used in infovis select explore reconfigure encode abstract elaborate filter and connect these categories are organized around user’s intent while interacting with system rather than the low level interaction techniques provided by system the categories can act as framework to help discuss and evaluate interaction techniques and hopefully lay an initial foundation toward deeper understanding and science of interaction
image morphing has been extensively studied in computer graphics and it can be summarized as follows given two input images morphing algorithms produce sequence of inbetween images which transforms the source image into the target image in visually pleasant way in this paper we propose an algorithm based on recent advances from texture from sample ideas which synthesizes metamorphosis sequence targeted specifically for textures we use the idea of binary masks or texton masks to control and drive the morphing sequence our solution provides an automatic mapping from source texels into target texels and thus guarantees coherent and visually smooth transition from the source texture into the target texture we compare our morphing results with prior work and with simple interpolation between corresponding pixels in the same source and target images
profiling can accurately analyze program behavior for select data inputs we show that profiling can also predict program locality for inputs other than profiled ones here locality is defined by the distance of data reuse studying whole program data reuse may reveal global patterns not apparent in short distance reuses or local control flow however the analysis must meet two requirements to be useful the first is efficiency it needs to analyze all accesses to all data elements in full size benchmarks and to measure distance of any length and in any required precision the second is prediction based on few training runs it needs to classify patterns as regular and irregular and for regular ones it should predict their changing behavior for other inputs in this paper we show that these goals are attainable through three techniques approximate analysis of reuse distance originally called lru stack distance pattern recognition and distance based sampling when tested on integer and floating point programs from spec and other benchmark suites our techniques predict with on average accuracy for data inputs up to hundreds of times larger than the training inputs based on these results the paper discusses possible uses of this analysis
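to make the reuse distance notion above concrete, here is a minimal exact computation of lru stack distance for a small trace; this brute force version is quadratic in the worst case, whereas the approximate analysis described above is what makes measurement scale to full size benchmarks, so treat the code only as a definition by example

    from collections import OrderedDict

    def reuse_distances(trace):
        """Exact reuse (LRU stack) distance per access: the number of distinct
        elements touched since the previous access to the same element,
        infinity on the first access."""
        stack = OrderedDict()              # most recently used element is last
        distances = []
        for addr in trace:
            if addr in stack:
                keys = list(stack.keys())
                dist = len(keys) - keys.index(addr) - 1   # distinct elements since last use
                del stack[addr]
            else:
                dist = float("inf")
            stack[addr] = True
            distances.append(dist)
        return distances

    print(reuse_distances("abcab"))        # [inf, inf, inf, 2, 2]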
networks on chips nocs have evolved as the communication design paradigm of future systems on chips socs in this work we target the noc design of complex socs with heterogeneous processor memory cores providing quality of service qos for the application we present an integrated approach to mapping of cores onto noc topologies and physical planning of nocs where the position and size of the cores and network components are computed our design methodology automates noc mapping physical planning topology selection topology optimization and instantiation bridging an important design gap in building application specific nocs we also present methodology to guarantee qos for the application during the mapping physical planning process by satisfying the delay jitter constraints and real time constraints of the traffic streams experimental studies show large area savings up to bandwidth savings up to and network component savings up to in buffer count in number of wires in switch ports compared to traditional design approaches
in the context of open source development or software evolution developers often face test suites which have been developed with no apparent rationale and which may need to be augmented or refined to ensure sufficient dependability or even reduced to meet tight deadlines we refer to this process as the re engineering of test suites it is important to provide both methodological and tool support to help people understand the limitations of test suites and their possible redundancies so as to be able to refine them in cost effective manner to address this problem in the case of black box category partition testing we propose methodology and tool based on machine learning that has shown promising results on case study involving students as testers
we consider the scheduling of sporadic real time task system on an identical multiprocessor though pfair algorithms are theoretically optimal for such task systems in practice their runtime overheads can significantly reduce the amount of useful work that is accomplished on the other hand if all deadlines need to be met then every known non pfair algorithm requires restrictions on total system utilization that can approach approximately of the available processing capacity this may be overkill for soft real time systems which can tolerate occasional or bounded deadline misses ie bounded tardiness in this paper we derive tardiness bounds under preemptive and non preemptive global edf when the total system utilization is not restricted except that it not exceed the available processing capacity hence processor utilization can be improved for soft real time systems on multiprocessors our tardiness bounds depend on the total system utilization and per task utilizations and execution costs the lower these values the lower the tardiness bounds as final remark we note that global edf may be superior to partitioned edf for multiprocessor based soft real time systems in that the latter does not offer any scope to improve system utilization even if bounded tardiness can be tolerated
cluster based routes help data transmission by acting as backbone paths in wireless ad hoc networks some mobile users form cluster to obtain common multimedia services however node mobility does not guarantee multimedia transmission in network with dynamic topology this study first presents novel measure of cluster based route’s stability according to prediction of connection probability at specific time in future helping transmit multimedia streams across many clusters in network the selected route is sufficiently stable to enable cluster based mobile users to receive time consuming multimedia streaming in future secondly inter cluster connectivity maintenance schemes including forward and backward connectivity methods to protect nonstop multimedia streams are developed to maintain inter cluster links finally analytical and evaluation results indicate that the cluster based route chosen according to the connection probability scheme is more effective than routes chosen by other flat on demand routing protocols for transmitting multimedia streaming in ad hoc networks
successful online communities have complex cooperative arrangements articulations of work and integration practices they require technical infrastructure to support broad division of labor yet the research literature lacks empirical studies that detail which types of work are valued by participants in an online community content analysis of wikipedia barnstars personalized tokens of appreciation given to participants reveals wide range of valued work extending far beyond simple editing to include social support administrative actions and types of articulation work our analysis develops theoretical lens for understanding how wiki software supports the creation of articulations of work we give implications of our results for communities engaged in large scale collaborations
finding good rest position for the disk head is very important for the performance of hard disk it has been shown in the past that rest positions obtained through anticipatory movements of the disk head can indeed improve response time but practical algorithms have not been described yet in this paper we describe software technique for performing anticipatory movements of the disk head in particular we show that by partitioning the disk controller memory into part used for caching and part used for predictive movements lower times as compared with the usual read ahead cache configurations are obtained through trace driven simulations we show in fact that significant improvements in the disk times can be obtained as compared to standard disk caching since the technique should be realized at the firmware level in the disk controller and no hardware modifications are needed the implementation cost is low
the object oriented oo paradigm has become increasingly popular in recent years researchers agree that although maintenance may turn out to be easier for oo systems it is unlikely that the maintenance burden will completely disappear one approach to controlling software maintenance costs is the utilization of software metrics during the development phase to help identify potential problem areas many new metrics have been proposed for oo systems but only few of them have been validated the purpose of this research is to empirically explore the validation of three existing oo design complexity metrics and specifically to assess their ability to predict maintenance time this research reports the results of validating three metrics interaction level il interface size is and operation argument complexity oac controlled experiment was conducted to investigate the effect of design complexity as measured by the above metrics on maintenance time each of the three metrics by itself was found to be useful in the experiment in predicting maintenance performance
the ability to do fine grain power management via local voltage selection has shown much promise via the use of voltage frequency islands vfis vfi based designs combine the advantages of using fine grain speed and voltage control for reducing energy requirements while allowing for maintaining performance constraints we propose hardware based technique to dynamically change the clock frequencies and potentially voltages of vfi system driven by the dynamic workload this technique tries to change the frequency of synchronous island such that it will have efficient power utilization while satisfying performance constraints we propose hardware design that can be used to change the frequencies of various synchronous islands interconnected together by mixed clock mixed voltage fifo interfaces results show up to power savings for the set of benchmarks considered with no loss in throughput
the use of contextual information in building concept detectors for digital media has caught the attention of the multimedia community in the recent years generally speaking any information extracted from image headers or tags or from large collections of related images and used at classification time can be considered as contextual such information being discriminative in its own right when combined with pure content based detection systems using pixel information can improve the overall recognition performance significantly in this paper we describe framework for probabilistically modeling geographical information using geographical information systems gis database for event and activity recognition in general purpose consumer images such as those obtained from flickr the proposed framework discriminatively models the statistical saliency of geo tags in describing an activity or event our work leverages the inherent patterns of association between events and their geographical venues we use descriptions of small local neighborhoods to form bags of geo tags as our representation statistical coherence is observed in such descriptions across wide range of event classes and across many different users in order to test our approach we identify certain classes of activities and events wherein people commonly participate and take pictures images and corresponding metadata for the identified events and activities are obtained from flickr we employ visual detectors obtained from columbia university columbia which perform pure visual event and activity recognition in our experiments we present the performance advantage obtained by combining contextual gps information with pixel based detection systems
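the saliency model above is not reproduced here; the following is only a simple stand in, with hypothetical class and tag names, showing how bags of geo tags drawn from a gis neighborhood description can drive event classification, using a multinomial naive bayes scorer as the illustrative contextual model

    import math
    from collections import Counter, defaultdict

    class GeoTagEventClassifier:
        """Sketch: multinomial naive Bayes over bags of geo-tags; a simple
        stand-in for the contextual model described above, not that model."""
        def __init__(self, alpha=1.0):
            self.alpha = alpha
            self.tag_counts = defaultdict(Counter)     # event -> tag counts
            self.event_counts = Counter()
            self.vocab = set()

        def fit(self, samples):                        # samples: [(event, [tags]), ...]
            for event, tags in samples:
                self.event_counts[event] += 1
                self.tag_counts[event].update(tags)
                self.vocab.update(tags)

        def predict(self, tags):
            total = sum(self.event_counts.values())
            best, best_lp = None, -math.inf
            for event, n in self.event_counts.items():
                lp = math.log(n / total)
                denom = sum(self.tag_counts[event].values()) + self.alpha * len(self.vocab)
                for t in tags:
                    lp += math.log((self.tag_counts[event][t] + self.alpha) / denom)
                if lp > best_lp:
                    best, best_lp = event, lp
            return best

    clf = GeoTagEventClassifier()
    clf.fit([("sailing", ["marina", "pier", "yacht_club"]),
             ("skiing", ["ski_area", "lodge", "mountain"])])
    print(clf.predict(["pier", "marina"]))             # -> "sailing"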
with the fast increase in web activities web data mining has recently become an important research topic and is receiving significant amount of interest from both academic and industrial environments while existing methods are efficient for the mining of frequent path traversal patterns from the access information contained in log file these approaches are likely to over evaluate associations explicitly most previous studies of mining path traversal patterns are based on the model of uniform support threshold where single support threshold is used to determine frequent traversal patterns without taking into consideration such important factors as the length of pattern the positions of web pages and the importance of particular pattern etc as result low support threshold will lead to lots of uninteresting patterns derived whereas high support threshold may cause some interesting patterns with lower supports to be ignored in view of this this paper broadens the horizon of frequent path traversal pattern mining by introducing flexible model of mining web traversal patterns with dynamic thresholds specifically we study and apply the markov chain model to provide the determination of support threshold of web documents and further by properly employing some effective techniques devised for joining reference sequences the proposed algorithm dynamic threshold miner dtm not only possesses the capability of mining with dynamic thresholds but also significantly improves the execution efficiency as well as contributes to the incremental mining of web traversal patterns performance of algorithm dtm and the extension of existing methods is comparatively analyzed with synthetic and real web logs it is shown that the option of algorithm dtm is very advantageous in reducing the number of unnecessary rules produced and leads to prominent performance improvement
to separate object motion from camera motion in an aerial video consecutive frames are registered at their planar background feature points are selected in consecutive frames and those that belong to the background are identified using the projective constraint corresponding background feature points are then used to register and align the frames by aligning video frames at the background and knowing that objects move against the background means to detect and track moving objects is provided only scenes with planar background are considered in this study experimental results show improvement in registration accuracy when using the projective constraint to determine the registration parameters as opposed to finding the registration parameters without the projective constraint
while the problem of high performance packet classification has received great deal of attention in recent years the research community has yet to develop algorithmic methods that can overcome the drawbacks of tcam based solutions this paper introduces hybrid approach which partitions the filter set into subsets that are easy to search efficiently the partitioning strategy groups filters that are close to one another in tuple space which makes it possible to use information from single field lookups to limit the number of subsets that must be searched we can trade off running time against space consumption by adjusting the coarseness of the tuple space partition we find that for two dimensional filter sets the method finds the best matching filter with just four hash probes while limiting the memory space expansion factor to about we also introduce novel method for longest prefix matching lpm which we use as component of the overall packet classification algorithm our lpm method uses small amount of on chip memory to speedup the search of an off chip data structure but uses significantly less on chip memory than earlier methods based on bloom filters
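as a point of reference for the lpm component above, here is a sketch of the basic longest prefix match building block it improves on: one exact match hash table per prefix length, probed from longest to shortest; the on chip filtering that avoids most off chip probes in the scheme above is deliberately omitted, and all names are assumptions

    class PrefixTable:
        """Sketch: longest prefix matching with one hash table per prefix
        length, probed from longest to shortest.  The scheme described above
        adds a small on-chip filter to skip most of these probes."""
        def __init__(self):
            self.tables = {}                       # prefix length -> {prefix bits: next hop}

        def insert(self, prefix_bits, next_hop):   # prefix_bits, e.g. "1011"
            self.tables.setdefault(len(prefix_bits), {})[prefix_bits] = next_hop

        def lookup(self, addr_bits):
            for length in sorted(self.tables, reverse=True):
                hop = self.tables[length].get(addr_bits[:length])
                if hop is not None:
                    return hop                     # longest match wins
            return None                            # no match / default route

    t = PrefixTable()
    t.insert("10", "A")
    t.insert("1011", "B")
    print(t.lookup("10110001"))                    # -> "B" (length-4 prefix beats length-2)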
the user interfaces of today’s development environments have bento box design that partitions information into separate areas this design makes it difficult to stay oriented in the open documents and to synthesize information shown in different areas code canvas takes new approach by providing an infinite zoomable surface for software development canvas both houses editable forms of all of project’s documents and allows multiple layers of visualization over those documents by uniting the content of project and information about it onto single surface code canvas is designed to leverage spatial memory to keep developers oriented and to make it easy to synthesize information
we present logically qualified data types abbreviated to liquid types system that combines hindley milner type inference with predicate abstraction to automatically infer dependent types precise enough to prove variety of safety properties liquid types allow programmers to reap many of the benefits of dependent types namely static verification of critical properties and the elimination of expensive run time checks without the heavy price of manual annotation we have implemented liquid type inference in dsolve which takes as input an ocaml program and set of logical qualifiers and infers dependent types for the expressions in the ocaml program to demonstrate the utility of our approach we describe experiments using dsolve to statically verify the safety of array accesses on set of ocaml benchmarks that were previously annotated with dependent types as part of the dml project we show that when used in conjunction with fixed set of array bounds checking qualifiers dsolve reduces the amount of manual annotation required for proving safety from of program text to under
continuous advancements in semiconductor technology enable the design of complex systems on chips socs composed of tens or hundreds of ip cores at the same time the applications that need to run on such platforms have become increasingly complex and have tight power and performance requirements achieving satisfactory design quality under these circumstances is only possible when both computation and communication refinement are performed efficiently in an automated and synergistic manner consequently formal and disciplined system level design methodologies are in great demand for future multiprocessor design this article provides broad overview of some fundamental research issues and state of the art solutions concerning both computation and communication aspects of system level design the methodology we advocate consists of developing abstract application and platform models followed by application mapping onto the target platform and then optimizing the overall system via performance analysis in addition communication refinement step is critical for optimizing the communication infrastructure in this multiprocessor setup finally simulation and prototyping can be used for accurate performance evaluation purposes
since the physical topology of mobile ad hoc networks manets is generally unstable an appealing approach is the construction of stable and robust virtual topology or backbone virtual backbone can play important roles related to routing and connectivity management in this paper the problem of providing such virtual backbone with low overhead is investigated in particular we propose an approach called virtual grid architecture vga that can be applied to both homogeneous and heterogeneous manets we study the performance tradeoffs between the vga clustering approach and an optimal clustering based on an integer linear program ilp formulation many properties of the vga clustering approach eg vga size route length over vga and clustering overhead are also studied and quantified analytical as well as simulation results show that average route length over vga and vga cardinality tend to be close to optimal the results also show that the overhead of creating and maintaining vga is greatly reduced and thus the routing performance is improved significantly to illustrate two hierarchical routing techniques that operate on top of vga are presented and evaluated performance evaluation shows that vga clustering approach albeit simple is able to provide more stable long lifetime routes deliver more packets and accept more calls
most aggregation functions are limited to either categorical or numerical values but not both values in this paper we define three concepts of aggregation function and introduce novel method to aggregate multiple instances that consists of both the categorical and numerical values we show how these concepts can be implemented using clustering techniques in our experiment we discretize continuous values before applying the aggregation function on relational datasets with the empirical results obtained we demonstrate that our transformation approach using clustering techniques as means of aggregating multiple instances of attribute’s values can compete with existing multi relational techniques such as progol and tilde in addition the effect of the number of interval for discretization on the classification performance is also evaluated
research on information extraction from web pages wrapping has seen much activity recently particularly systems implementations but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping in this paper we first study monadic datalog over trees as wrapping language we show that this simple language is equivalent to monadic second order logic mso in its ability to specify wrappers we believe that mso has the right expressiveness required for web information extraction and propose mso as yardstick for evaluating and comparing wrappers along the way several other results on the complexity of query evaluation and query containment for monadic datalog over trees are established and simple normal form for this language is presented using the above results we subsequently study the kernel fragment elog minus of the elog wrapping language used in the lixto system visual wrapper generator curiously elog minus exactly captures mso yet is easier to use indeed programs in this language can be entirely visually specified
there has been considerable interest in random projections an approximate algorithm for estimating distances between pairs of points in high dimensional vector space let in rn be our points in dimensions the method multiplies by random matrix in rd reducing the dimensions down to just for speeding up the computation typically consists of entries of standard normal it is well known that random projections preserve pairwise distances in the expectation achlioptas proposed sparse random projections by replacing the entries in with entries in with probabilities achieving threefold speedup in processing time we recommend using of entries in with probabilities for achieving significant fold speedup with little loss in accuracy
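the exact constants are elided in the line above, so the following sketch uses the generic sparse projection construction with entries +sqrt(s), 0, -sqrt(s) drawn with probabilities 1/(2s), 1 - 1/s, 1/(2s); s = 3 corresponds to the achlioptas style projection, and the very sparse variant uses a much larger s, the precise choice being an assumption of this example rather than a restatement of the paper

    import numpy as np

    def sparse_random_projection(X, k, s, seed=0):
        """Sketch: project an n x D matrix X to k dimensions with a sparse
        random matrix whose entries are +sqrt(s), 0, -sqrt(s) with
        probabilities 1/(2s), 1 - 1/s, 1/(2s); E[r] = 0 and E[r^2] = 1, so
        squared distances are preserved in expectation after the 1/sqrt(k)
        scaling."""
        n, D = X.shape
        rng = np.random.default_rng(seed)
        R = rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
                       size=(D, k),
                       p=[1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)])
        return X @ R / np.sqrt(k)

    X = np.random.default_rng(1).normal(size=(100, 10000))
    Y = sparse_random_projection(X, k=200, s=3)
    # pairwise distances in Y approximate those in X in expectation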
in the literature there exist two types of cache consistency maintenance algorithms for mobile computing environments stateless and stateful in stateless approach the server is unaware of the cache contents at mobile user mu even though stateless approaches employ simple database management schemes they lack scalability and ability to support user disconnectedness and mobility on the other hand stateful approach is scalable for large database systems at the cost of nontrivial overhead due to server database management in this paper we propose novel algorithm called scalable asynchronous cache consistency scheme saccs which inherits the positive features of both stateless and stateful approaches saccs provides weak cache consistency for unreliable communication eg wireless mobile environments with small stale cache hit probability it is also highly scalable algorithm with minimum database management overhead the properties are accomplished through the use of flag bits at the server cache sc and mu cache muc an identifier id in muc for each entry after its invalidation and estimated time to live ttl for each cached entry as well as rendering of all valid entries of muc to uncertain state when an mu wakes up the stale cache hit probability is analyzed and also simulated under the rayleigh fading model of error prone wireless channels comprehensive simulation results show that the performance of saccs is superior to those of other existing stateful and stateless algorithms in both single and multicell mobile environments
greater availability and affordability of wireless technology has led to an increase in the number of wireless sensor network wsn applications where sense data is collected at central user point commonly outside the network geographically and topologically for processing resource constraints on nodes in the network coupled with considerable redundancy in the data generated mean that applications have to be developed with an eye to maximising energy efficiency in order to extend network life in network computation and in particular in network information extraction from data has been promoted as technique for achieving this aim in network processing however has been limited to systems where at most simple aggregate queries are evaluated the results of which are communicated to the outside world there is currently little research into how the idea of in network processing can be extended and implemented to allow more complex queries to be resolved within the network this paper examines key applicative query based approaches that utilise in network processing for query resolution identifies their strengths and limitations and puts forward ideas for facilitating in network complex query processing in wsns finally preliminary results of experiments where in network attribute based logical abstractions are used for processing complex queries are presented
this paper presents novel method of generating set of texture tiles from samples which can be seamlessly tiled into arbitrary size textures in real time compared to existing methods our approach is simpler and more advantageous in eliminating visual seams that may exist in each tile of the existing methods especially when the samples have elaborate features or distinct colors texture tiles generated by our approach can be regarded as single colored tiles on each orthogonal direction border which are easier for tiling and more suitable for sentence tiling experimental results demonstrate the feasibility and effectiveness of our approach
we propose ferry an architecture that extensively yet wisely exploits the underlying distributed hash table dht overlay structure to build an efficient and scalable platform for content based publish subscribe pub sub services ferry aims to host any and many content based pub sub services any pub sub service with unique scheme can run on top of ferry and multiple pub sub services can coexist on top of ferry for each pub sub service ferry does not need to maintain or dynamically generate any dissemination tree instead it exploits the embedded trees in the underlying dht to deliver events thereby imposing little overhead ferry can support pub sub scheme with large number of event attributes to deal with skewed distribution of subscriptions and events ferry uses one hop subscription push and attribute partitioning to balance load
nuca caches are cache memories that thanks to banked organization broadcast search and promotion demotion mechanism are able to tolerate the increasing wire delay effects introduced by technology scaling as consequence they will outperform conventional caches uca uniform cache architectures in future generation cores due to the promotion demotion mechanism we have found that in nuca cache the distribution of hits on the ways varies across applications as well as across different execution phases within single application in this paper we show how such behavior can be utilized to improve nuca power efficiency as well as to decrease its access latencies in particular we propose new nuca structure called way adaptable nuca cache in which the number of active ie powered on ways is dynamically adapted to the need of the running application our initial evaluation shows that consistent reduction of both the average number of active ways in average and the number of bank access requests in average is achieved without significantly affecting the ipc
the goal of this work is to develop text and speech translation system from spanish to basque this pair of languages shows quite odd characteristics as they differ extraordinarily in both morphology and syntax thus attractive challenges in machine translation are involved nevertheless since both languages share official status in the basque country the underlying motivation is not only academic but also practical finite state transducers were adopted as basic translation models the main contribution of this work involves the study of several techniques to improve probabilistic finite state transducers by means of additional linguistic knowledge two methods to cope with both linguistics and statistics were proposed the first one performed morphological analysis in an attempt to benefit from atomic meaningful units when it comes to rendering the meaning from one language to the other the second approach aimed at clustering words according to their syntactic role and used such phrases as translation unit from the latter approach phrase based finite state transducers arose as natural extension of classical ones the models were assessed under restricted domain task very repetitive and with small vocabulary experimental results showed that both morphological and syntactical approaches outperformed the baseline under different test sets and architectures for speech translation
this paper presents novel mesh simplification algorithm it decouples the simplification process into two phases shape analysis and edge contraction in the analysis phase it imposes hierarchical structure on surface mesh by uniform hierarchical partitioning marks the importance of each vertex in the hierarchical structure and determines the affected regions of each vertex at the hierarchical levels in the contraction phase it also divides the simplification procedure into two steps half edge contraction and optimization in the first step memoryless quadric metric error and the importance of vertices in the hierarchical structure are combined to determine one operation of half edge contraction in the second step it repositions the vertices in the half edge simplified mesh by minimizing the multilevel synthesized quadric error on the corresponding affected regions from the immediately local to the more global the experiments illustrate the competitive results
we describe four usability enhancing interfaces to citidel aimed at improving the user experience and supporting personalized information access by targeted communities these comprise multimodal interaction facility with capability for out of turn input interactive visualizations for exploratory analysis translation center exposing multilingual interfaces as well as traditional usability enhancements pilot studies demonstrate the resulting improvements in quality as measured across number of metrics
the available set of potential features in real world databases is sometimes very large and it can be necessary to find small subset for classification purposes one of the most important techniques in data pre processing for classification is feature selection less relevant or highly correlated features decrease in general the classification accuracy and enlarge the complexity of the classifier the goal is to find reduced set of features that reveals the best classification accuracy for classifier rule based fuzzy models can be acquired from numerical data and be used as classifiers as rule based structures revealed to be useful qualitative description for classification systems this work uses fuzzy models as classifiers this paper proposes an algorithm for feature selection based on two cooperative ant colonies which minimizes two objectives the number of features and the classification error two pheromone matrices and two different heuristics are used for these objectives the performance of the method is compared with other features selection methods achieving equal or better performance
models can help software engineers to reason about design time decisions before implementing system this paper focuses on models that deal with non functional properties such as reliability and performance to build such models one must rely on numerical estimates of various parameters provided by domain experts or extracted by other similar systems unfortunately estimates are seldom correct in addition in dynamic environments the value of parameters may change over time we discuss an approach that addresses these issues by keeping models alive at run time and feeding bayesian estimator with data collected from the running system which produces updated parameters the updated model provides an increasingly better representation of the system by analyzing the updated model at run time it is possible to detect or predict if desired property is or will be violated by the running implementation requirement violations may trigger automatic reconfigurations or recovery actions aimed at guaranteeing the desired goals we illustrate working framework supporting our methodology and apply it to an example in which web service orchestrated composition is modeled through discrete time markov chain numerical simulations show the effectiveness of the approach
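a minimal sketch of the keep the model alive idea above, under assumed state names and priors: a beta bernoulli update of a dtmc transition probability from run time observations, followed by recomputation of the probability of reaching a success state; this is only an illustration of the general approach, not the paper's framework

    import numpy as np

    def posterior_success_prob(prior_a, prior_b, successes, failures):
        """Bayesian (Beta-Bernoulli) estimate of a DTMC transition probability:
        posterior mean after run-time observations."""
        return (prior_a + successes) / (prior_a + prior_b + successes + failures)

    def absorption_probability(P, start, target, absorbing):
        """Probability of eventually reaching `target` from `start` in a DTMC,
        solving (I - Q) x = b over the transient states."""
        n = len(P)
        transient = [s for s in range(n) if s not in absorbing]
        idx = {s: i for i, s in enumerate(transient)}
        Q = np.array([[P[i][j] for j in transient] for i in transient])
        b = np.array([P[i][target] for i in transient])
        x = np.linalg.solve(np.eye(len(transient)) - Q, b)
        return x[idx[start]]

    # states: 0 = invoke service, 1 = success (absorbing), 2 = failure (absorbing)
    p_ok = posterior_success_prob(prior_a=9, prior_b=1, successes=180, failures=20)
    P = [[0.0, p_ok, 1.0 - p_ok],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    print("estimated reliability:", absorption_probability(P, start=0, target=1, absorbing={1, 2}))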
data warehouse collects and maintains large amount of data from multiple distributed and autonomous data sources often the data in it is stored in the form of materialized views in order to provide fast access to the integrated data however maintaining certain level consistency of warehouse data with the source data is challenging in distributed multiple source environment transactions containing multiple updates at one or more sources further complicate the consistency issue following the four level consistency definition of view in warehouse we first present complete consistency algorithm for maintaining spj type materialized views incrementally our algorithm speeds up the view refreshment time provided that some extra moderate space in the warehouse is available we then give variant of the proposed algorithm by taking the update frequencies of sources into account we finally discuss the relationship between view’s certain level consistency and its refresh time it is difficult to propose an incremental maintenance algorithm such that the view is always kept at certain level consistency with the source data and the view’s refresh time is as fast as possible we trade off these two factors by giving an algorithm with faster view refresh time while the view maintained by the algorithm is strong consistency rather than complete consistency with the source data
real time update of access control policies that is updating policies while they are in effect and enforcing the changes immediately and automatically is necessary for many dynamic environments examples of such environments include disaster relief and war zone in such situations system resources may need re configuration or operational modes may change necessitating change of policies for the system to continue functioning the policies must be changed immediately and the modified policies automatically enforced in this paper we propose solution to this problem we consider real time update of access control policies in the context of database system in our model database consists of set of objects that are read and updated through transactions access to the data objects are controlled by access control policies which are stored in the form of policy objects we consider an environment in which different kinds of transactions execute concurrently some of these may be transactions updating policy objects updating policy objects while they are deployed can lead to potential security problems we propose algorithms that not only prevent such security problems but also ensure serializable execution of transactions the algorithms differ on the degree of concurrency provided and the kinds of policies each can update
we propose novel mrf based model for deformable image matching also known as registration the deformation is described by field of discrete variables representing displacements of blocks of pixels discontinuities in the deformation are prohibited by imposing hard pairwise constraints in the model exact maximum posteriori inference is intractable and we apply linear programming relaxation technique we show that when reformulated in the form of two coupled fields of and displacements the problem leads to simpler relaxation to which we apply the sequential tree reweighted message passing trw algorithm wainwright kolmogorov this enables image registration with large displacements at single scale we employ fast message updates for special type of interaction as was proposed by felzenszwalb and huttenlocher for the max product belief propagation bp and introduce few independent speedups in contrast to bp the trw allows us to compute per instance approximation ratios and thus to evaluate the quality of the optimization the performance of our technique is demonstrated on both synthetic and real world experiments
the mantis multimodal system for networks of in situ wireless sensors provides new multithreaded embedded operating system integrated with general purpose single board hardware platform to enable flexible and rapid prototyping of wireless sensor networks the key design goals of mantis are ease of use ie small learning curve that encourages novice programmers to rapidly prototype novel sensor networking applications in software and hardware as well as flexibility so that expert researchers can leverage or develop advanced software features and hardware extensions to suit the needs of advanced research in wireless sensor networks
as social networks and rich media sharing are increasingly converging end user concerns regarding to whom how and why to direct certain digital content emerge between the pure private contribution and the pure public contribution exists large research and design space of semi public content and relationships the theoretical framework of gift giving correlates to semi public contributions in that it envelopes social relationships concerns for others and reciprocity and was consequently adopted in order to reveal and classify qualitative semi public end user concerns with content contribution the data collection was performed through online ethnographic methods in large photo sharing network the main data collection method used was forum message elicitation combined with referential methods such as interviews and application observation and usage the analysis of data resulted in descriptions concerning end user intentions to address dynamic recipient groupings the intentions to control the level of publicness of both digital content and its related social metadata tags contacts comments and links to other networks and the conclusion that users often refrained from providing material unless they felt able to control its direction
in this paper we propose to extend peer to peer semantic wikis with personal semantic annotations semantic wikis are one of the most successful semantic web applications in semantic wikis wiki pages are annotated with semantic data to facilitate the navigation information retrieving and ontology emerging semantic data represents the shared knowledge base which describes the common understanding of the community however in collaborative knowledge building process the knowledge is basically created by individuals who are involved in social process therefore it is fundamental to support personal knowledge building in differentiated way currently there are no available semantic wikis that support both personal and shared understandings in order to overcome this problem we propose p2p collaborative knowledge building process and extend semantic wikis with personal annotation facilities to express personal understanding in this paper we detail the personal semantic annotation model and show its implementation in p2p semantic wikis we also detail an evaluation study which shows that personal annotations demand less cognitive efforts than semantic data and are very useful to enrich the shared knowledge base
effective caching in the domain name system dns is critical to its performance and scalability existing dns only supports weak cache consistency by using the time to live ttl mechanism which functions reasonably well in normal situations however maintaining strong cache consistency in dns as an indispensable exceptional handling mechanism has become more and more demanding for three important objectives to quickly respond and handle exceptions such as sudden and dramatic internet failures caused by natural and human disasters to adapt increasingly frequent changes of internet protocol ip addresses due to the introduction of dynamic dns techniques for various stationed and mobile devices on the internet and to provide fine grain controls for content delivery services to timely balance server load distributions with agile adaptation to various exceptional internet dynamics strong dns cache consistency improves the availability and reliability of internet services in this paper we first conduct extensive internet measurements to quantitatively characterize dns dynamics then we propose proactive dns cache update protocol dnscup running as middleware in dns name servers to provide strong cache consistency for dns the core of dnscup is an optimal lease scheme called dynamic lease to keep track of the local dns name servers we compare dynamic lease with other existing lease schemes through theoretical analysis and trace driven simulations based on the dns dynamic update protocol we build dnscup prototype with minor modifications to the current dns implementation our system prototype demonstrates the effectiveness of dnscup and its easy and incremental deployment on the internet
body area networks ban is key enabling technology in healthcare such as remote health monitoring an important security issue during bootstrap phase of the ban is to securely associate group of sensor nodes to patient and generate necessary secret keys to protect the subsequent wireless communications due to the ad hoc nature of the ban and the extreme resource constraints of sensor devices providing secure fast efficient and user friendly secure sensor association is challenging task in this paper we propose lightweight scheme for secure sensor association and key management in ban group of sensor nodes having no prior shared secrets before they meet establish initial trust through group device pairing gdp which is an authenticated group key agreement protocol where the legitimacy of each member node can be visually verified by human various kinds of secret keys can be generated on demand after deployment the gdp supports batch deployment of sensor nodes to save setup time does not rely on any additional hardware devices and is mostly based on symmetric key cryptography while allowing batch node addition and revocation we implemented gdp on sensor network testbed and evaluated its performance experimental results show that gdp indeed achieves the expected design goals
large scale scientific investigation often includes collaborative data exploration among geographically distributed researchers the tools used for this exploration typically include some communicative component and this component often forms the basis for insight and idea sharing among collaborators minimizing the tool interaction required to locate interesting communications is therefore of paramount importance we present the design of novel visualization interface for representing the communications among multiple collaborating authors and detail the benefits of our approach versus traditional methods our visualization integrates directly with the existing data exploration interface we present our system in the context of an international research effort conducting collaborative analysis of accelerator simulations
the peak power consumption of hardware components affects their power supply packaging and cooling requirements when the peak power consumption is high the hardware components or the systems that use them can become expensive and bulky given that components and systems rarely if ever actually require peak power it is highly desirable to limit power consumption to less than peak power budget based on which power supply packaging and cooling infrastructure can be more intelligently provisioned in this paper we study dynamic approaches for limiting the power consumption of main memories specifically we propose four techniques that limit consumption by adjusting the power states of the memory devices as function of the load on the memory subsystem our simulations of applications from three benchmarks demonstrate that our techniques can consistently limit power to pre established budget two of the techniques can limit power with very low performance degradation our results also show that when using these superior techniques limiting power is at least as effective an energy conservation approach as state of the art techniques explicitly designed for performance aware energy conservation these latter results represent departure from current energy management research and practice
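a minimal sketch of the general mechanism above, adjusting device power states as a function of load so that estimated power stays under a budget; the state names, per state wattages and greedy policy are assumptions for illustration and do not correspond to any of the four techniques evaluated in the paper

    # hypothetical per-device power draw (watts) in each state
    POWER = {"active": 0.30, "standby": 0.10, "nap": 0.02}

    def enforce_power_budget(load_per_device, budget):
        """Sketch of one budget-enforcement step: keep the most heavily loaded
        devices active and demote the rest until estimated power fits the budget."""
        order = sorted(range(len(load_per_device)),
                       key=lambda d: load_per_device[d], reverse=True)
        states = {d: "nap" for d in order}                 # start fully demoted
        power = POWER["nap"] * len(order)
        for d in order:                                    # promote hottest devices first
            upgraded = power - POWER[states[d]] + POWER["active"]
            if upgraded <= budget:
                power = upgraded
                states[d] = "active"
            else:
                break
        return states, power

    states, est = enforce_power_budget([0.9, 0.1, 0.6, 0.05], budget=0.75)
    print(states, est)    # devices 0 and 2 active, others napping, within 0.75 W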
in many core cmp architectures the cache coherence protocol is key component since it can add requirements of area and power consumption to the final design and therefore it could restrict severely its scalability area constraints limit the use of precise sharing codes to small or medium scale cmps power constraints make it impractical to use broadcast based protocols for large scale cmps token cmp and dico cmp are cache coherence protocols that have been recently proposed to avoid the indirection problem of traditional directory based protocols however token cmp is based on broadcasting requests to all tiles while dico cmp adds precise sharing code to each cache entry in this work we address the traffic area trade off for these indirection aware protocols in particular we propose and evaluate several implementations of dico cmp which differ in the amount of coherence information that they must store our evaluation results show that our proposals entail good traffic area trade off by halving the traffic requirements compared to token cmp and considerably reducing the area storage required by dico cmp
this paper describes framework that allows user to synthesize human motion while retaining control of its qualitative properties the user paints timeline with annotations like walk run or jump from vocabulary which is freely chosen by the user the system then assembles frames from motion database so that the final motion performs the specified actions at specified times the motion can also be forced to pass through particular configurations at particular times and to go to particular position and orientation annotations can be painted positively for example must run negatively for example may not run backwards or as don’t care the system uses novel search method based around dynamic programming at several scales to obtain solution efficiently so that authoring is interactive our results demonstrate that the method can generate smooth natural looking motion the annotation vocabulary can be chosen to fit the application and allows specification of composite motions run and jump simultaneously for example the process requires collection of motion data that has been annotated with the chosen vocabulary this paper also describes an effective tool based around repeated use of support vector machines that allows user to annotate large collection of motions quickly and easily so that they may be used with the synthesis algorithm
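A heavily simplified, single-scale sketch of the dynamic-programming step: pick one database clip per painted timeline segment so that annotation mismatch plus transition cost is minimized. The cost functions `mismatch` and `transition` are illustrative placeholders; the paper's multi-scale search and constraint handling are not reproduced.

```python
def assemble_motion(segment_labels, clips, mismatch, transition):
    """Choose one clip per timeline segment minimizing annotation mismatch
    plus clip-to-clip transition cost (classic DP over a chain).

    segment_labels: list of annotation sets painted on the timeline
    clips:          candidate clip ids from the motion database
    mismatch(clip, labels) and transition(prev_clip, clip) are user-supplied.
    """
    INF = float("inf")
    n, m = len(segment_labels), len(clips)
    cost = [[INF] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]

    for j, c in enumerate(clips):
        cost[0][j] = mismatch(c, segment_labels[0])
    for i in range(1, n):
        for j, c in enumerate(clips):
            for k, p in enumerate(clips):
                cand = cost[i - 1][k] + transition(p, c) + mismatch(c, segment_labels[i])
                if cand < cost[i][j]:
                    cost[i][j], back[i][j] = cand, k

    # Backtrack the cheapest sequence of clips.
    j = min(range(m), key=lambda j: cost[n - 1][j])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return [clips[j] for j in reversed(path)]
```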
the goal of giving well defined meaning to information is currently shared by endeavors such as the semantic web as well as by current trends within knowledge management they all depend on the large scale formalization of knowledge and on the availability of formal metadata about information resources however the question how to provide the necessary formal metadata in an effective and efficient way is still not solved to satisfactory extent certainly the most effective way to provide such metadata as well as formalized knowledge is to let humans encode them directly into the system but this is neither efficient nor feasible furthermore as current social studies show individual knowledge is often less powerful than the collective knowledge of certain community as potential way out of the knowledge acquisition bottleneck we present novel methodology that acquires collective knowledge from the world wide web using the google api in particular we present pankow concrete instantiation of this methodology which is evaluated in two experiments one with the aim of classifying novel instances with regard to an existing ontology and one with the aim of learning sub superconcept relations
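The core of the methodology can be illustrated by counting pattern occurrences on the web and picking the best-supported concept for an instance. The patterns below are generic Hearst-style examples and `web_hit_count` is a hypothetical callback standing in for whatever search API is used; this is a hedged sketch, not the actual PANKOW pattern set.

```python
def classify_instance(instance, candidate_concepts, web_hit_count):
    """Rank candidate ontology concepts for an instance by aggregated
    web pattern counts.

    web_hit_count(phrase) is a hypothetical callback returning the number
    of web hits for a phrase; plug in whichever search API is available.
    """
    # Illustrative lexico-syntactic patterns relating an instance to a concept.
    patterns = [
        "{concept}s such as {instance}",
        "{instance} is a {concept}",
        "{instance} and other {concept}s",
        "{concept}s like {instance}",
    ]
    scores = {}
    for concept in candidate_concepts:
        scores[concept] = sum(
            web_hit_count(p.format(concept=concept, instance=instance))
            for p in patterns
        )
    best = max(scores, key=scores.get)
    return best, scores
```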
near duplicate keyframes ndks are important visual cues to link news stories from different tv channel time language etc however the quadratic complexity required for ndk detection renders it intractable in large scale news video corpus to address this issue we propose temporal semantic and visual partitioning model to divide the corpus into small overlapping partitions by exploiting domain knowledge and corpus characteristics this enables us to efficiently detect ndks in each partition separately and then link them together across partitions we divide the corpus temporally into sequential partitions and semantically into news story genre groups and within each partition we visually group potential ndks by using asymmetric hierarchical means clustering on our proposed semi global image features in each visual group we detect ndk pairs by exploiting our proposed sift based fast keypoint matching scheme based on local color information of keypoints finally the detected ndk groups in each partition are linked up via transitivity propagation of ndks shared by different partitions the testing on trecvid corpus with keyframes shows that our proposed approach could result in multifold increase in speed as compared to the best reported approach and complete the ndk detection in manageable time with satisfactory accuracy
most of the previously proposed image based rendering approaches rely on large number of samples accurate depth information or geometric proxies it is therefore challenge to apply them to render novel views for dynamic scenes as the required information is difficult to obtain in real time the proposed approach requires only sparsely sampled data two new interpolation techniques are presented in this paper the combination of which can produce reasonable rendering results for sparsely sampled real scenes the first one which is color matching based interpolation searches for possible physical point along the testing ray using color information in nearby reference images the second technique which is disparity matching based interpolation tries to find the closest intersection between the testing ray and the disparity surfaces defined by nearby reference images both approaches are designed as backward rendering techniques and can be combined to produce robust results our experiments suggest that the proposed approach is capable of handling complex dynamic real scenes offline
multi agent systems where the members are developed by parties with competing interests and where there is no access to member’s internal state are often classified as open the specification of open agent systems of this sort is largely seen as design time activity moreover there is no support for run time specification modification due to environmental social or other conditions however it is often required to revise the specification during the system execution to address this requirement we present an infrastructure for dynamic specifications that is specifications that may be modified at run time by the agents the infrastructure consists of well defined procedures for proposing modification of the rules of the game as well as decision making over and enactment of proposed modifications we employ the action language to formalise dynamic specifications and the causal calculator implementation of to execute the specifications we illustrate our infrastructure by presenting dynamic specification of resource sharing protocol
so what is all this dtn research about anyway sceptics ask why are there no dtn applications or why is dtn performance so miserable this article attempts to address some of these complaints we present suggestions of expectations for applications and metrics for performance which suggest more tolerant view of research in the area
this paper presents fuzzy qualitative representation of conventional trigonometry with the goal of bridging the gap between symbolic cognitive functions and numerical sensing control tasks in the domain of physical systems especially in intelligent robotics fuzzy qualitative coordinates are defined by replacing unit circle with fuzzy qualitative circle cartesian translation and orientation are defined by their normalized fuzzy partitions conventional trigonometric functions rules and the extensions to triangles in euclidean space are converted into their counterparts in fuzzy qualitative coordinates using fuzzy logic and qualitative reasoning techniques this approach provides promising representation transformation interface to analyze general trigonometry related physical systems from an artificial intelligence perspective fuzzy qualitative trigonometry has been implemented as matlab toolbox named xtrig in terms of tuple fuzzy numbers examples are given throughout the paper to demonstrate the characteristics of fuzzy qualitative trigonometry one of the examples focuses on robot kinematics and also explains how contributions could be made by fuzzy qualitative trigonometry to the intelligent connection of low level sensing control tasks to high level cognitive tasks
recently there has been significant theoretical progress towards fixed parameter algorithms for the dominating set problem of planar graphs it is known that the problem on planar graph with vertices and dominating number can be solved in time using tree branch decomposition based algorithms in this paper we report computational results of fomin and thilikos algorithm which uses the branch decomposition based approach the computational results show that the algorithm can solve the dominating set problem of large planar graphs in practical time and memory space for the class of graphs with small branchwidth for the class of graphs with large branchwidth the size of instances that can be solved by the algorithm in practice is limited to about one thousand edges due to memory space bottleneck the practical performances of the algorithm coincide with the theoretical analysis of the algorithm the results of this paper suggest that the branch decomposition based algorithms can be practical for some applications on planar graphs
method of automatic abstraction is presented that uses proofs of unsatisfiability derived from sat based bounded model checking as guide to choosing an abstraction for unbounded model checking unlike earlier methods this approach is not based on analysis of abstract counterexamples the performance of this approach on benchmarks derived from microprocessor verification indicates that sat solvers are quite effective in eliminating logic that is not relevant to given property moreover benchmark results suggest that when bounded model checking successfully terminates and the problem is unsatisfiable the number of state variables in the proof of unsatisfiability tends to be small in almost all cases tested when bounded model checking succeeded unbounded model checking of the resulting abstraction also succeeded
we present ecce and logen two partial evaluators for prolog using the online and offline approach respectively we briefly present the foundations of these tools and discuss various applications we also present new implementations of these tools carried out in ciao prolog in addition to command line interface new user friendly web interfaces were developed these enable non expert users to specialise logic programs using web browser without the need for local installation
current data repositories include variety of data types including audio images and time series state of the art techniques for indexing such data and doing query processing rely on transformation of data elements into points in multidimensional feature space indexing and query processing then take place in the feature space in this paper we study algorithms for finding relationships among points in multidimensional feature spaces specifically algorithms for multidimensional joins like joins of conventional relations correlations between multidimensional feature spaces can offer valuable information about the data sets involved we present several algorithmic paradigms for solving the multidimensional join problem and we discuss their features and limitations we propose generalization of the size separation spatial join algorithm named multidimensional spatial join msj to solve the multidimensional join problem we evaluate msj along with several other specific algorithms comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets our experimental results indicate that msj which is based on space filling curves consistently yields good performance across wide range of dimensionalities
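To illustrate the space-filling-curve ingredient, here is a correct but simplified epsilon-distance join in which points are hashed into grid cells keyed by interleaved-bit (Z-order) codes and each point is probed against neighboring cells. It conveys how curve-based keys group nearby points, but it deliberately omits the size-separation levels of the actual MSJ algorithm; non-negative coordinates are assumed.

```python
from collections import defaultdict
from itertools import product
import math

def zorder_key(cell, bits=16):
    """Interleave the bits of the integer cell coordinates (Morton code).
    Assumes non-negative cell coordinates."""
    key = 0
    for b in range(bits):
        for d, c in enumerate(cell):
            key |= ((c >> b) & 1) << (b * len(cell) + d)
    return key

def epsilon_join(points_a, points_b, eps):
    """All pairs (a, b) with euclidean distance <= eps, via a grid whose
    cells are keyed by a space-filling-curve code. Simplified illustration."""
    dims = len(points_a[0])
    cell_of = lambda p: tuple(int(math.floor(x / eps)) for x in p)

    grid = defaultdict(list)
    for b in points_b:
        grid[zorder_key(cell_of(b))].append(b)

    result = []
    for a in points_a:
        ca = cell_of(a)
        # Probe the cell containing a and all neighboring cells.
        for offset in product((-1, 0, 1), repeat=dims):
            key = zorder_key(tuple(c + o for c, o in zip(ca, offset)))
            for b in grid.get(key, []):
                if math.dist(a, b) <= eps:
                    result.append((a, b))
    return result
```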
sensor networks consist of autonomous wireless sensor nodes that are networked together in an ad hoc fashion the tiny nodes are equipped with substantial processing capabilities enabling them to combine and compress their sensor data the aim is to limit the amount of network traffic and as such conserve the nodes limited battery energy however due to the small packet payload the mac header is significant and energy costly overhead to remedy this we propose novel scheme for mac address assignment the two key features which make our approach unique are the exploitation of spatial address reuse and an encoded representation of the addresses in data packets to assign the addresses we develop purely distributed algorithm that relies solely on local message exchanges other salient features of our approach are the ability to handle unidirectional links and the excellent scalability of both the assignment algorithm and address representation in typical scenarios the mac overhead is reduced by factor of three compared to existing approaches
this paper presents fast and accurate routing demand estimation called rudy and its efficient integration in force directed quadratic placer to optimize placements for routability rudy is based on rectangular uniform wire density per net and accurately models the routing demand of circuit as determined by the wire distribution after final routing unlike published routing demand estimation rudy depends neither on bin structure nor on certain routing model to estimate the behavior of router therefore rudy is independent of the router our fast and robust force directed quadratic placer is based on generic demand and supply model and is guided by the routing demand estimation rudy to optimize placements for routability this yields placer which simultaneously reduces the routing demand in congested regions and increases the routing supply there therefore our placer fully utilizes the potential to optimize the routability this results in the best published routed wirelength of the ibmv benchmark suite until now in detail our approach outperforms mpl rooster and aplace by and respectively compared by the cpu times which rooster needs to place this benchmark our routability optimization placer is eight times faster
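A small sketch of the RUDY idea as described: each net contributes a rectangular uniform wire density, estimated as its half-perimeter wirelength spread over its bounding box. The evaluation grid below is only for producing a demand map to inspect; the per-net estimate itself needs no bin structure, and the grid resolution and wire width are assumptions.

```python
def rudy_map(nets, grid_w, grid_h, wire_width=1.0):
    """Illustrative RUDY-style routing-demand map.

    nets: list of nets, each a list of (x, y) pin coordinates given in
          grid units within [0, grid_w) x [0, grid_h).
    """
    demand = [[0.0] * grid_w for _ in range(grid_h)]
    for pins in nets:
        xs, ys = [p[0] for p in pins], [p[1] for p in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        hpwl = (x1 - x0) + (y1 - y0)                 # half-perimeter wirelength
        area = max((x1 - x0) * (y1 - y0), 1e-9)      # bounding-box area
        density = wire_width * hpwl / area           # uniform density of this net
        for gy in range(int(y0), int(y1) + 1):
            for gx in range(int(x0), int(x1) + 1):
                if 0 <= gx < grid_w and 0 <= gy < grid_h:
                    demand[gy][gx] += density
    return demand
```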
we propose two approximation algorithms for identifying communities in dynamic social networks communities are intuitively characterized as unusually densely knit subsets of social network this notion becomes more problematic if the social interactions change over time aggregating social networks over time can radically misrepresent the existing and changing community structure recently we have proposed an optimization based framework for modeling dynamic community structure also we have proposed an algorithm for finding such structure based on maximum weight bipartite matching in this paper we analyze its performance guarantee for special case where all actors can be observed at all times in such instances we show that the algorithm is small constant factor approximation of the optimum we use similar idea to design an approximation algorithm for the general case where some individuals are possibly unobserved at times and to show that the approximation factor increases twofold but remains constant regardless of the input size this is the first algorithm for inferring communities in dynamic networks with provable approximation guarantee we demonstrate the general algorithm on real data sets the results confirm the efficiency and effectiveness of the algorithm in identifying dynamic communities
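The matching step mentioned in the abstract can be sketched as a maximum-weight bipartite matching between the community sets of consecutive snapshots, with pair weights given by member overlap. This is only the matching ingredient under that simple weighting assumption, not the full optimization framework or its approximation analysis.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_communities(groups_t, groups_t1):
    """Match communities observed at time t with those at time t+1 by
    maximum-weight bipartite matching on member overlap.

    groups_t, groups_t1: lists of sets of member ids.
    Returns matched index pairs (i, j) with nonzero overlap.
    """
    weights = np.zeros((len(groups_t), len(groups_t1)))
    for i, g in enumerate(groups_t):
        for j, h in enumerate(groups_t1):
            weights[i, j] = len(g & h)
    # linear_sum_assignment minimizes cost, so negate to maximize overlap.
    rows, cols = linear_sum_assignment(-weights)
    return [(i, j) for i, j in zip(rows, cols) if weights[i, j] > 0]
```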
motivated by the importance of accurate identification for range of applications this paper compares and contrasts the effective and efficient classification of network based applications using behavioral observations of network traffic and those using deep packet inspection importantly throughout our work we are able to make comparison with data possessing an accurate independently determined ground truth that describes the actual applications causing the network traffic observed in unique study in both the spatial domain comparing across different network locations and in the temporal domain comparing across number of years of data we illustrate the decay in classification accuracy across range of application classification mechanisms further we document the accuracy of spatial classification without training data possessing spatial diversity finally we illustrate the classification of udp traffic we use the same classification approach for both stateful flows tcp and stateless flows based upon udp importantly we demonstrate high levels of accuracy greater than for the worst circumstance regardless of the application
practical analysis tools for distributed authorization need to answer quickly and accurately the question who can access this resource dap delegation with acyclic paths is distributed authorization framework introduced in that tries to inter operate better with standard pki mechanisms while retaining some of the benefits of new trust management schemes dap has an acyclicity requirement which makes it more difficult to answer the question quickly in this paper we use technique borrowed from compiler optimization dominator tree problem decomposition to overcome this limitation of dap with fast heuristic we show through simulation the heuristic’s performance in realistic federated resource management scenario we also show how this heuristic can be complemented by clone analysis techniques that exploit similarities between principals to further improve performance we are currently using the heuristic and clone analysis in practice in design analysis security tool
materialized views can be maintained by submitting maintenance queries to the data sources however the query results may be erroneous due to concurrent source updates state of the art maintenance strategies typically apply compensations to resolve such conflicts and assume all source schemata remain stable over time in loosely coupled dynamic environment the sources may autonomously change not only their data but also their schema or semantics consequently either the maintenance or the compensation queries may be broken unlike compensation based approaches found in the literature we instead model the complete materialized view maintenance process as view maintenance transaction vm_transaction this way the anomaly problem can be rephrased as the serializability of vm_transactions to achieve vm_transaction serializability we propose multiversion concurrency control algorithm called txnwrap which is shown to be the appropriate design for loosely coupled environments with autonomous data sources txnwrap is complementary to the maintenance algorithms proposed in the literature since it removes concurrency issues from consideration allowing the designer to focus on the maintenance logic we show several optimizations of txnwrap in particular space optimizations on versioned data materialization and parallel maintenance scheduling with these optimizations txnwrap even outperforms state of the art view maintenance solutions in terms of refresh time further several design choices of txnwrap are studied each having its respective advantages for certain environmental settings correctness proof based on transaction theory for txnwrap is also provided last we have implemented txnwrap the experimental results confirm that txnwrap achieves predictable performance under varying rate of concurrency
cost estimation is one of the most important but most difficult tasks in software project management many methods have been proposed for software cost estimation analogy based estimation abe which is essentially case based reasoning cbr approach is one popular technique to improve the accuracy of abe method several studies have been focusing on the adjustments to the original solutions however most published adjustment mechanisms are based on linear forms and are restricted to numerical type of project features on the other hand software project datasets often exhibit non normal characteristics with large proportions of categorical features to explore the possibilities for better adjustment mechanism this paper proposes artificial neural network ann for non linear adjustment to abe nabe with the learning ability to approximate complex relationships and incorporating the categorical features the proposed nabe is validated on four real world datasets and compared against the linear adjusted abes cart ann and swr subsequently eight artificial datasets are generated for systematic investigation on the relationship between model accuracies and dataset properties the comparisons and analysis show that non linear adjustment could generally extend abe’s flexibility on complex datasets with large number of categorical features and improve the accuracies of adjustment techniques
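A minimal sketch of analogy-based estimation with a learned non-linear adjustment: retrieve the k nearest historical projects, take their mean effort as the base estimate, and let a small neural network learn a correction from feature differences. The feature encoding, the choice of MLP, and all hyper-parameters are assumptions; this is not the paper's NABE model.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPRegressor

def nabe_style_estimate(X_hist, y_hist, x_new, k=3):
    """X_hist: numpy array of numerically encoded project features,
    y_hist: numpy array of efforts, x_new: feature vector of the new project."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_hist)

    # Training data for the adjustment: gap between each project's actual
    # effort and its own k-analogy estimate, paired with feature differences.
    diffs, gaps = [], []
    for i, x in enumerate(X_hist):
        _, idx = nn.kneighbors([x])
        neigh = [j for j in idx[0] if j != i][:k]
        base = y_hist[neigh].mean()
        diffs.append(x - X_hist[neigh].mean(axis=0))
        gaps.append(y_hist[i] - base)
    adj = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
    adj.fit(np.array(diffs), np.array(gaps))

    # New estimate = analogy mean + learned non-linear adjustment.
    _, idx = nn.kneighbors([x_new])
    neigh = idx[0][:k]
    base = y_hist[neigh].mean()
    return base + adj.predict([x_new - X_hist[neigh].mean(axis=0)])[0]
```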
automatic text classification tc is essential for information sharing and management its ideal goals are to achieve high quality tc accepting almost all documents that should be accepted ie high recall and rejecting almost all documents that should be rejected ie high precision unfortunately the ideal goals are rarely achieved making automatic tc not suitable for those applications in which classifier’s erroneous decision may incur high cost and or serious problems one way to pursue the ideal is to consult users to confirm the classifier’s decisions so that potential errors may be corrected however its main challenge lies on the control of the number of confirmations which may incur heavy cognitive load on the users we thus develop an intelligent and classifier independent confirmation strategy iccom empirical evaluation shows that iccom may help various kinds of classifiers to achieve very high precision and recall by conducting fewer confirmations the contributions are significant to the archiving and recommendation of critical information since identification of possible tc errors those that require confirmation is the key to process information more properly
this paper addresses the automatic transformation of financial statements into conceptual graph interchange format cgif the method mainly involves extracting relevant financial performance indicators parsing it to obtain syntactic sentence structure and to generate the cgif for the extracted text the required components for the transformation are detailed out with an illustrative example the paper also discusses the potential manipulation of the resulting cgif for knowledge discovery and more precisely for deviation detection
distributed computing is very broad and active research area comprising fields such as cluster computing computational grids desktop grids and peer to peer p2p systems studies in this area generally resort to simulations which enable reproducible results and make it possible to explore wide ranges of platform and application scenarios in this context network simulation is certainly the most critical part many packet level network simulators are available and enable high accuracy simulation but they lead to prohibitively long simulation times therefore many simulation frameworks have been developed that simulate networks at higher levels thus enabling fast simulation but losing accuracy one such framework simgrid uses flow level approach that approximates the behavior of tcp networks including tcp’s bandwidth sharing properties preliminary study of the accuracy loss by comparing it to popular packet level simulators has been proposed in and in which regimes in which simgrid’s accuracy is comparable to that of these packet level simulators are identified in this article we revisit this study reproduce these experiments and provide deeper analysis that enables us to greatly improve simgrid’s range of validity
watching tv is practice many people enjoy and feel comfortable with while watching tv programme users can be offered the opportunity to while making annotations create their own edited versions of the programme in this scenario it is challenge to allow the user to add comments in ubiquitous transparent way in this paper we exploit the concept of end user live editing of interactive video programmes by detailing an environment where users are able to live edit video using the itv remote control we contextualise our approach in the context of the brazilian interactive digital tv platform
we consider constructive preference elicitation for decision aid systems in applications such as configuration or electronic catalogs we are particularly interested in supporting decision tradeoff where preferences are revised in response to the available outcomes in several user involved decision aid systems we designed in the past we were able to observe three generic tradeoff strategies that people like to use we show how preference model based on soft constraints is well suited for supporting these strategies such framework provides an agile preference model particularly powerful for preference revision during tradeoff analysis we further show how to integrate the constraint based preference model with an interaction model called example critiquing we report on user studies which show that this model offers significant advantages over the commonly used ranked list model especially when the decision problem becomes complex
computational social choice is an interdisciplinary field of study at the interface of social choice theory and computer science promoting an exchange of ideas in both directions on the one hand it is concerned with the application of techniques developed in computer science such as complexity analysis or algorithm design to the study of social choice mechanisms such as voting procedures or fair division algorithms on the other hand computational social choice is concerned with importing concepts from social choice theory into computing for instance the study of preference aggregation mechanisms is also very relevant to multiagent systems in this short paper we give general introduction to computational social choice by proposing taxonomy of the issues addressed by this discipline together with some illustrative examples and an incomplete bibliography
it is commonly accepted that coordination is key characteristic of multi agent systems and that in turn the capability of coordinating with others constitutes centrepiece of agenthood however the key elements of coordination models mechanisms and languages for multi agent systems are still subject to considerable debate this paper provides brief overview of different approaches to coordination in multi agent systems it will then show how these approaches relate to current efforts working towards paradigm for smart next generation distributed systems where coordination is based on the concept of agreement between computational agents
computing frequent itemsets is one of the most prominent problems in data mining we study the following related problem called freqsat in depth given some itemset interval pairs does there exist database such that for every pair the frequency of the itemset falls into the interval this problem is shown to be np complete the problem is then further extended to include arbitrary boolean expressions over items and conditional frequency expressions in the form of association rules we also show that unless equals np the related function problem find the best interval for an itemset under some frequency constraints cannot be approximated efficiently furthermore it is shown that freqsat is recursively axiomatizable but that there cannot exist an axiomatization of finite arity
the need for retrieval based not on the attribute values but on the very data content has recently led to rise of the metric based similarity search the computational complexity of such retrieval and large volumes of processed data call for distributed processing which allows to achieve scalability in this paper we propose chord distributed data structure for metric based similarity search the structure takes advantage of the idea of vector index method idistance in order to transform the issue of similarity searching into the problem of interval search in one dimension the proposed peer to peer organization based on the chord protocol distributes the storage space and parallelizes the execution of similarity queries promising features of the structure are validated by experiments on the prototype implementation and two real life datasets
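The iDistance-style transformation mentioned in the abstract can be sketched directly: each point is assigned to its closest reference point and mapped to a one-dimensional key, and a similarity range query becomes a union of one-dimensional intervals, one per affected partition. The separation constant `c` and the parameter names are assumptions.

```python
import math

def idistance_key(point, references, c=1000.0):
    """Map a multidimensional point to one dimension: find its closest
    reference point i and return i * c + dist(point, reference_i)."""
    dists = [math.dist(point, r) for r in references]
    i = min(range(len(references)), key=lambda j: dists[j])
    return i * c + dists[i]

def range_query_intervals(query, radius, references, partition_radii, c=1000.0):
    """One-dimensional intervals that must be scanned for a range query:
    partition i is affected if the query ball reaches into its sphere."""
    intervals = []
    for i, (ref, r_max) in enumerate(zip(references, partition_radii)):
        d = math.dist(query, ref)
        if d - radius <= r_max:
            lo = max(d - radius, 0.0)
            hi = min(d + radius, r_max)
            intervals.append((i * c + lo, i * c + hi))
    return intervals
```

In a distributed setting these intervals would be routed to the peers responsible for the corresponding key ranges, which is what lets the overlay parallelize similarity queries.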
linking or matching databases is becoming increasingly important in many data mining projects as linked data can contain information that is not available otherwise or that would be too expensive to collect manually main challenge when linking large databases is the classification of the compared record pairs into matches and non matches in traditional record linkage classification thresholds have to be set either manually or using an em based approach more recently developed classification methods are mainly based on supervised machine learning techniques and thus require training data which is often not available in real world situations or has to be prepared manually in this paper novel two step approach to record pair classification is presented in first step example training data of high quality is generated automatically and then used in second step to train supervised classifier initial experimental results on both real and synthetic data show that this approach can outperform traditional unsupervised clustering and even achieve linkage quality almost as good as fully supervised techniques
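A minimal sketch of the two-step idea: first automatically pick high-quality seed examples (pairs whose averaged similarity is very high are treated as matches, very low as non-matches), then train a supervised classifier on those seeds and label the remaining pairs. The thresholds and the choice of an SVM are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def two_step_classify(pair_vectors, high=0.95, low=0.05):
    """pair_vectors: array of shape (n_pairs, n_comparisons) with
    similarity values in [0, 1]. Returns 1 for match, 0 for non-match.
    Assumes both clearly similar and clearly dissimilar pairs exist."""
    pair_vectors = np.asarray(pair_vectors, dtype=float)
    avg_sim = pair_vectors.mean(axis=1)

    # Step 1: automatic selection of high-quality training seeds.
    seed_mask = (avg_sim >= high) | (avg_sim <= low)
    seed_labels = (avg_sim[seed_mask] >= high).astype(int)

    # Step 2: train a supervised classifier on the seeds, label everything.
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(pair_vectors[seed_mask], seed_labels)
    return clf.predict(pair_vectors)
```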
in order to improve the retrieval accuracy of content based image retrieval systems research focus has been shifted from designing sophisticated low level feature extraction algorithms to reducing the semantic gap between the visual features and the richness of human semantics this paper attempts to provide comprehensive survey of the recent technical achievements in high level semantic based image retrieval major recent publications are included in this survey covering different aspects of the research in this area including low level image feature extraction similarity measurement and deriving high level semantic features we identify five major categories of the state of the art techniques in narrowing down the semantic gap using object ontology to define high level concepts using machine learning methods to associate low level features with query concepts using relevance feedback to learn users intention generating semantic template to support high level image retrieval fusing the evidences from html text and the visual content of images for www image retrieval in addition some other related issues such as image test bed and retrieval performance evaluation are also discussed finally based on existing technology and the demand from real world applications few promising future research directions are suggested
in this paper we investigate the problem of locating mobile facility at or near the center of set of clients that move independently continuously and with bounded velocity it is shown that the euclidean center of the clients may move with arbitrarily high velocity relative to the maximum client velocity this motivates the search for strategies for moving facility so as to closely approximate the euclidean center while guaranteeing low relative velocity we present lower bounds and efficient competitive algorithms for the exact and approximate maintenance of the euclidean center for set of moving points in the plane these results serve to accurately quantify the intrinsic velocity approximation quality tradeoff associated with the maintenance of the mobile euclidean center
we consider the problem of how to best parallelize range queries in massive scale distributed database in traditional systems the focus has been on maximizing parallelism for example by laying out data to achieve the highest throughput however in massive scale database such as our pnuts system or bigtable maximizing parallelism is not necessarily the best strategy the system has more than enough servers to saturate single client by returning results faster than the client can consume them and when there are multiple concurrent queries maximizing parallelism for all of them will cause disk contention reducing everybody’s performance how can we find the right parallelism level for each query in order to achieve high consistent throughput for all queries we propose an adaptive approach with two aspects first we adaptively determine the ideal parallelism for single query execution which is the minimum number of parallel scanning servers needed to satisfy the client depending on query selectivity client load client server bandwidth and so on second we adaptively schedule which servers will be assigned to different query executions to minimize disk contention on servers and ensure that all queries receive good performance our scheduler can be tuned based on different policies such as favoring short versus long queries or high versus low priority queries an experimental study demonstrates the effectiveness of our techniques in the pnuts system
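A back-of-the-envelope sketch of the first adaptive aspect: the ideal parallelism is the minimum number of scanning servers that keeps a single client saturated, since more only add contention. All parameter names are illustrative assumptions rather than the system's actual model.

```python
import math

def ideal_parallelism(client_rate, server_scan_rate, selectivity,
                      bandwidth, record_size, max_servers):
    """client_rate:      records/s the client can consume
    server_scan_rate:    records/s one server can scan
    selectivity:         fraction of scanned records returned to the client
    bandwidth:           client-server link capacity in bytes/s
    record_size:         bytes per returned record
    """
    # Rate at which one server delivers useful records to the client.
    per_server_delivery = server_scan_rate * selectivity
    # The network may cap what the client can actually receive.
    effective_client_rate = min(client_rate, bandwidth / record_size)
    k = math.ceil(effective_client_rate / per_server_delivery)
    return max(1, min(k, max_servers))
```

The second aspect, scheduling which servers serve which query, would then spread these k-server assignments across the machine pool so that concurrent scans do not contend for the same disks.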
pervasive computing deployments are increasingly using sensor networks to build instrumented environments that provide local data to immersed mobile applications these applications demand opportunistic and unpredictable interactions with local devices while this direct communication has the potential to reduce both overhead and latency it deviates significantly from existing uses of sensor networks that funnel information to static central collection point this pervasive computing driven perspective demands new communication abstractions that enable the required direct communication among mobile applications and sensors this paper presents the scene abstraction which allows immersed applications to create dynamic distributed data structures over the immersive sensor network scene is created based on application requirements properties of the underlying network and properties of the physical environment this paper details our work on defining scenes providing an abstract model an implementation and an evaluation
the limited resources of mobile devices make providing real time realistic graphics locally challenging task recent research focuses either on remote rendering which is not good in interaction or on simple local rendering which is not good in rendering quality to address this challenge this paper presents new multiresolution object space point based rendering approach for local rendering on mobile devices the approach uses hierarchical clustering to create hierarchy of bounding volumes in addition we use curvature sampling to further reduce the number of sample points and give rapid lod selection algorithm and then use view independent object space surface splatting as the rendering primitive which can provide good rendering quality experimental results show that this approach uses less time and achieves better rendering quality for mobile devices
skeletal trees are commonly used in order to express geometric properties of the shape accordingly tree edit distance is used to compute dissimilarity between two given shapes we present new tree edit based shape matching method which uses recent coarse skeleton representation the coarse skeleton representation allows us to represent both shapes and shape categories in the form of depth trees consequently we can easily integrate the influence of the categories into shape dissimilarity measurements the new dissimilarity measure gives better within group versus between group separation and it mimics the asymmetric nature of human similarity judgements
the advent of xml as universal exchange format and of web services as basis for distributed computing has fostered the emergence of new class of documents dynamic xml documents these are xml documents where some data is given explicitly while other parts are given only intensionally by means of embedded calls to web services that can be called to generate the required information by the sole presence of web services dynamic documents already include inherently some form of distributed computation higher level of distribution that also allows fragments of dynamic documents to be distributed and or replicated over several sites is highly desirable in today’s web architecture and in fact is also relevant for regular non dynamic documents the goal of this paper is to study new issues raised by the distribution and replication of dynamic xml data our study has originated in the context of the active xml system but the results are applicable to many other systems supporting dynamic xml data starting from data model and query language we describe complete framework for distributed and replicated dynamic xml documents we provide comprehensive cost model for query evaluation and show how it applies to user queries and service calls finally we describe an algorithm that for given peer chooses data and services that the peer should replicate to improve the efficiency of maintaining and querying its dynamic data
in this paper we present set of studies designed to explore japanese young people’s practices around leisure outings how they are discovered planned coordinated and conducted and the resources they use to support these practices tokyo youth have wealth of leisure opportunities and tools to choose from they are technologically savvy and are in the vanguard of those for whom the new mobile internet technologies are available we characterize typical leisure outings described by our study participants how they are structured and the tools used to support them we found that discovery of leisure options tends to occur serendipitously often through personal recommendations from friends and family for leisure research and planning the internet is the tool of choice but accessed via pc not the mobile phone or keitai which is primarily used to communicate and coordinate not to search for information these and related findings suggest some emerging issues and opportunities for the design of future leisure support technologies
spanner of graph is spanning subgraph in which the distance between every pair of vertices is at most times their distance in the sparsest spanner problem asks to find for given graph and an integer spanner of with the minimum number of edges on general vertex graphs the problem is known to be np hard for all and even more it is np hard to approximate it with ratio logn for every for the problem remains np hard for planar graphs and up to now the approximability status of the problem on planar graphs considered to be open in this note we resolve this open issue by showing that the sparsest spanner problem admits polynomial time approximation scheme ptas for every actually our results hold for much wider class of graphs namely on the class of apex minor free graphs which contains the classes of planar and bounded genus graphs
this paper is call to arms for the community to take up van bush’s original challenge of effecting transformation of scholarly communications and record keeping it argues for the necessity of an interactive scholarly communication research agenda by briefly reviewing the rapid development of alternative authoring and publishing models seven dimensions of interactive communication that delineate design space for the area are described previous work and existing new media are used to initially populate the design space and show opportunities for new research directions vkb spaces synchrony padls and walden’s paths are used as foils for describing new media for interactive scholarly communication this leads to brief discussion of uncovered areas in the design space and open research questions community developed framework for future interactive scholarly communications would be major contribution and is put forth as the overall goal
data storage has become important issue in sensor networks as large amount of collected data need to be archived for future information retrieval this paper introduces storage nodes to store the data collected from the sensors in their proximities the storage nodes alleviate the heavy load of transmitting all the data to central place for archiving and reduce the communication cost induced by the network query this paper considers the storage node placement problem aiming to minimize the total energy cost for gathering data to the storage nodes and replying queries we examine deterministic placement of storage nodes and present optimal algorithms based on dynamic programming further we give stochastic analysis for random deployment and conduct simulation evaluation for both deterministic and random placements of storage nodes
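To illustrate the dynamic-programming flavor of deterministic placement, here is a toy one-dimensional variant: sensors sit on a line, each forwards its raw data to the storage node serving its contiguous segment, and k storage locations are chosen to minimize total weighted transport distance (weighted k-median on a line). This simplified cost model is an assumption and does not reproduce the paper's energy model or query cost.

```python
def place_storage_nodes(pos, rate, k):
    """pos[i]: position of sensor i on a line, rate[i]: its data rate,
    k: number of storage nodes. Returns the minimum total cost."""
    n = len(pos)
    INF = float("inf")

    def seg_cost(a, b):
        # Best single storage location (at some sensor site) for sensors a..b.
        # Naive O(n^2) evaluation, kept simple for clarity.
        return min(
            sum(rate[i] * abs(pos[i] - pos[m]) for i in range(a, b + 1))
            for m in range(a, b + 1)
        )

    # dp[j][i] = min cost of serving the first i sensors with j storage nodes.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            for l in range(j - 1, i):  # last segment covers sensors l..i-1
                if dp[j - 1][l] < INF:
                    dp[j][i] = min(dp[j][i], dp[j - 1][l] + seg_cost(l, i - 1))
    return dp[k][n]
```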
in this paper we investigate efficient strategies for supporting on demand information dissemination and gathering in large scale wireless sensor networks in particular we propose comb needle discovery support model resembling an ancient method use comb to help find needle in sand or haystack the model combines push and pull for information dissemination and gathering the push component features data duplication in linear neighborhood of each node the pull component features dynamic formation of an on demand routing structure resembling comb the comb needle model enables us to investigate the cost of spectrum of push and pull combinations for supporting query and discovery in large scale sensor networks our result shows that the optimal routing structure depends on the frequency of query occurrence and the spatial temporal frequency of related events in the network the benefit of balancing push and pull for information discovery is demonstrated
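An illustrative cost model, not the paper's exact analysis, makes the push-pull balance concrete: on an n x n grid, each event pushes its data along a vertical needle of a chosen length, and each query sweeps a comb whose teeth are that same spacing apart so every needle crosses some tooth. Balancing the two terms gives a spacing that grows with the square root of the query-to-event frequency ratio, matching the abstract's point that the optimal structure depends on both frequencies.

```python
import math

def comb_needle_cost(n, f_event, f_query, spacing):
    """Per-unit-time cost under the simplified model described above."""
    push_cost_per_event = spacing                  # duplicate data along the needle
    pull_cost_per_query = n + n * n / spacing      # spine plus n/spacing teeth of length n
    return f_event * push_cost_per_event + f_query * pull_cost_per_query

def best_spacing(n, f_event, f_query):
    """Minimizer of the model above: spacing ~ n * sqrt(f_query / f_event),
    clamped to the grid size."""
    s = n * math.sqrt(f_query / max(f_event, 1e-12))
    return min(max(s, 1.0), float(n))
```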
the proprietary nature of existing content delivery networks cdns means they are closed and do not naturally cooperate resulting in islands of cdns finding ways for distinct cdns to coordinate and cooperate with other cdns is necessary to achieve better overall service as perceived by end users at lower cost in this paper we present an architecture to support peering arrangements among cdn providers based on virtual organization vo model our approach promotes peering among providers reduces expenditure while upholding user perceived performance this is achieved through proper policy management of negotiated service level agreements slas among peers in addition scalability and resource sharing among cdns is improved through effective peering thus evolving past the current landscape where islands of cdns exist we also show analytically that significant performance improvement can be achieved through the peering of cdns
in well spaced point set when there is bounding hypercube the voronoi cells all have bounded aspect ratio ie the distance from the voronoi site to the farthest point in the voronoi cell divided by the distance to the nearest neighbor in the set is bounded by small constant well spaced point sets satisfy some important geometric properties and yield quality voronoi or simplicial meshes that can be important in scientific computations in this paper we consider the dynamic well spaced point sets problem which requires computing the well spaced superset of dynamically changing input set eg as input points are inserted or deleted we present dynamic algorithm that allows inserting deleting points into from the input in worst case log time where is the geometric spread natural measure that is bounded by log when input points are represented by log size words we show that the runtime of the dynamic update algorithm is optimal in the worst case our algorithm generates size optimal outputs the resulting output sets are never more than constant factor larger than the minimum size necessary preliminary implementation indicates that the algorithm is indeed fast in practice to the best of our knowledge this is the first time and size optimal dynamic algorithm for well spaced point sets
the terminology of machine learning and data mining methods does not always allow simple match between practical problems and methods while some problems look similar from the user’s point of view but require different methods to be solved some others look very different yet they can be solved by applying the same methods and tools choosing appropriate machine learning methods for problem solving in practice is therefore largely matter of experience and it is not realistic to expect simple look up table with matches between problems and methods however some guidelines can be given and collection that summarizes other people’s experience can also be helpful small number of definitions characterize the tasks that are performed by large proportion of methods most of the variation in methods is concerned with differences in data types and algorithmic aspects of methods in this paper we summarize the main task types and illustrate how wide variety of practical problems are formulated in terms of these tasks the match between problems and tasks is illustrated with collection of example applications with the aim of helping to express new practical problems as machine learning tasks some tasks can be decomposed into subtasks allowing wider variety of matches between practical problems and combinations of methods we review the main principles for choosing between alternatives and illustrate this with large collection of applications we believe that this provides some guidelines
existing data mining algorithms on graphs look for nodes satisfying specific properties such as specific notions of structural similarity or specific measures of link based importance while such analyses for predetermined properties can be effective in well understood domains sometimes identifying an appropriate property for analysis can be challenge and focusing on single property may neglect other important aspects of the data in this paper we develop foundation for mining the properties themselves we present theoretical framework defining the space of graph properties variety of mining queries enabled by the framework techniques to handle the enormous size of the query space and an experimental system called miner that demonstrates the utility and feasibility of property mining
this paper presents novel segmentation method to assist the rigging of articulated bodies the method computes coarse to fine hierarchy of segments ordered by the level of detail the results are invariant to deformations and numerically robust to noise irregular tessellations and topological short circuits the segmentation is based on two key ideas first it exploits the multiscale properties of the diffusion distance on surfaces and then it introduces new definition of medial structures composing bijection between medial structures and segments our method computes this bijection through simple and fast iterative approach and applies it to triangulated meshes
deformations of shapes and distances between shapes are an active research topic in computer vision we propose an energy of infinitesimal deformations of continuous and dimensional shapes that is based on the elastic energy of deformed objects this energy defines shape metric which is inherently invariant with respect to euclidean transformations and yields very natural deformations which preserve details we compute shortest paths between planar shapes based on elastic deformations and apply our approach to the modeling of dimensional shapes
this paper exploits the tradeoff between data quality and energy consumption to extend the lifetime of wireless sensor networks to obtain an aggregate form of sensor data with precision guarantees the precision constraint is partitioned and allocated to individual sensor nodes in coordinated fashion our key idea is to differentiate the precisions of data collected from different sensor nodes to balance their energy consumption three factors affecting the lifetime of sensor nodes are identified the changing pattern of sensor readings the residual energy of sensor nodes and the communication cost between the sensor nodes and the base station we analyze the optimal precision allocation in terms of network lifetime and propose an adaptive scheme that dynamically adjusts the precision constraints at the sensor nodes the adaptive scheme also takes into consideration the topological relations among sensor nodes and the effect of in network aggregation experimental results using real data traces show that the proposed scheme significantly improves network lifetime compared to existing methods
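A hedged heuristic sketch of differentiated precision allocation: give a larger share of the total error budget (a looser precision) to nodes whose readings change quickly, whose batteries are low, or whose path to the base station is expensive, so that they report less often and drain less energy. The weighting formula is an assumption for illustration, not the paper's optimal or adaptive allocation.

```python
def allocate_precisions(total_error_budget, change_rate, residual_energy, comm_cost):
    """All arguments except the budget are dicts keyed by node id.
    Returns per-node error bounds that sum to total_error_budget
    (appropriate, for example, for a SUM aggregate where bounds add)."""
    weights = {
        node: change_rate[node] * comm_cost[node] / max(residual_energy[node], 1e-9)
        for node in change_rate
    }
    total_w = sum(weights.values())
    return {node: total_error_budget * w / total_w for node, w in weights.items()}
```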
simulating how the global internet behaves is an immensely challenging undertaking because of the network’s great heterogeneity and rapid change the heterogeneity ranges from the individual links that carry the network’s traffic to the protocols that interoperate over the links the mix of different applications used at site and the levels of congestion seen on different links we discuss two key strategies for developing meaningful simulations in the face of these difficulties searching for invariants and judiciously exploring the simulation parameter space we finish with brief look at collaborative effort within the research community to develop common network simulator
open hypermedia systems ohs aim to provide efficient dissemination adaptation and integration of hyperlinked multimedia resources content available in peer to peer p2p networks could add significant value to ohs provided that challenges for efficient discovery and prompt delivery of rich and up to date content are successfully addressed this paper proposes an architecture that enables the operation of ohs over p2p overlay network of ohs servers based on semantic annotation of peer ohs servers and of multimedia resources that can be obtained through the link services of the ohs the architecture provides efficient resource discovery semantic query based subscriptions over this p2p network can enable access to up to date content while caching at certain peers enables prompt delivery of multimedia content advanced query resolution techniques are employed to match different parts of subscription queries subqueries these subscriptions can be shared among different interested peers thus increasing the efficiency of multimedia content dissemination
java middleware mechanisms such as java rmi or corba implementations do not support thread coordination over the network synchronizing on remote objects does not work correctly and thread identity is not preserved for executions spanning multiple machines the current approaches dealing with the problem suffer from one of two weaknesses either they require new middleware mechanism making them less portable or they add overhead to the execution to propagate thread identifier through all method calls in this paper we present an approach that works with an unmodified middleware implementation yet does not impose execution overhead the key to our technique is the bytecode transformation of only stub routines instead of the entire client application we argue that this approach is portable and can be applied to almost any middleware mechanism at the same time we show that compared to past techniques our approach eliminates an overhead of applications from the spec jvm suite
software products are often built from commercial off the shelf cots components when new releases of these components are made available for integration and testing source code is usually not provided by the vendors various regression test selection techniques have been developed and have been shown to be cost effective however the majority of these test selection techniques rely on source code for change identification and impact analysis in our research we have evolved regression test selection rts process called integrated black box approach for component change identification bacci for cots based applications bacci reduces the test suite based upon changes in the binary code of the cots component using the firewall regression test selection method in this paper we present the pallino tool pallino statically analyzes binary code to identify the code change and the impact of these changes based on the output of pallino and the original test suit testers can determine the regression test cases needed to cover the application glue code which is affected by the changed areas in the new version of the cots component three case studies examining total of fifteen component releases were conducted on abb internal products with the help of pallino rts via the bacci process can be completed in about one to two person hours for each release of the case studies the total size of application and component for each release is about kloc pallino is extensible and can be modified to support other rts methods for cots components currently pallino works on components in common object file format or portable executable formats
recently graphics processing units or gpus have become viable alternative as commodity parallel hardware for general purpose computing due to their massive data parallelism high memory bandwidth and improved general purpose programming interface in this paper we explore the use of gpu on the grid file traditional multidimensional access method considering the hardware characteristics of gpus we design massively multi threaded gpu based grid file for static memory resident multidimensional point data moreover we propose hierarchical grid file variant to handle data skews efficiently our implementations on the nvidia gtx graphics card are able to achieve two to eight times higher performance than their cpu counterparts on single pc
we present new family of join algorithms called ripple joins for online processing of multi table aggregation queries in relational database management system dbms such queries arise naturally in interactive exploratory decision support applications traditional offline join algorithms are designed to minimize the time to completion of the query in contrast ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available as measured by the length of confidence interval ripple joins are adaptive adjusting their behavior during processing in accordance with the statistical properties of the data ripple joins also permit the user to dynamically trade off the two key performance factors of on line aggregation the time between successive updates of the running aggregate and the amount by which the confidence interval length decreases at each update we show how ripple joins can be implemented in an existing dbms using iterators and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join aspect ratio parameters in experiments with an initial implementation of our algorithms in the postgres dbms the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers
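A toy square ripple join for a running SUM estimate: at each step one new random tuple is drawn from each relation and joined against everything seen so far, and the running sum over the sampled pairs is scaled up to the full cross product. The confidence-interval formula here is a deliberate simplification of the paper's estimators, and a square aspect ratio is assumed throughout.

```python
import math
import random

def square_ripple_sum(R, S, predicate, value, steps, z=1.96):
    """Yields (running estimate, half confidence-interval width) for
    SUM(value(r, s)) over pairs with predicate(r, s) true."""
    R, S = list(R), list(S)
    random.shuffle(R); random.shuffle(S)
    seen_r, seen_s, samples = [], [], []

    for k in range(min(steps, len(R), len(S))):
        r, s = R[k], S[k]
        # Join the new tuples against all previously seen tuples (and each other).
        for s_old in seen_s + [s]:
            samples.append(value(r, s_old) if predicate(r, s_old) else 0.0)
        for r_old in seen_r:
            samples.append(value(r_old, s) if predicate(r_old, s) else 0.0)
        seen_r.append(r); seen_s.append(s)

        n_pairs = len(seen_r) * len(seen_s)          # equals len(samples)
        mean = sum(samples) / n_pairs
        estimate = mean * len(R) * len(S)            # scale up to all pairs
        var = sum((x - mean) ** 2 for x in samples) / max(n_pairs - 1, 1)
        half_width = z * len(R) * len(S) * math.sqrt(var / n_pairs)
        yield estimate, half_width
```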
in this chapter we present an overview of web personalization process viewed as an application of data mining requiring support for all the phases of typical data mining cycle these phases include data collection and pre processing pattern discovery and evaluation and finally applying the discovered knowledge in real time to mediate between the user and the web this view of the personalization process provides added flexibility in leveraging multiple data sources and in effectively using the discovered models in an automatic personalization system the chapter provides detailed discussion of host of activities and techniques used at different stages of this cycle including the preprocessing and integration of data from multiple sources as well as pattern discovery techniques that are typically applied to this data we consider number of classes of data mining algorithms used particularly forweb personalization including techniques based on clustering association rule discovery sequential pattern mining markov models and probabilistic mixture and hidden latent variable models finally we discuss hybrid data mining frameworks that leverage data from variety of channels to provide more effective personalization solutions
over the past few years several technological advances have been made to enable locating people in indoor settings where way finding is something we do on daily basis in similar way as it happened with gps and today’s popular outdoor navigation systems indoor navigation is set to become one of the first truly ubiquitous services that will make our living and working environments intelligent two critical characteristics of human way finding are destination choice and path selection this work focuses on the latter which traditionally has been assumed to be the result of minimizing procedures such as selecting the shortest path the quickest or the least costly path however these path approximations are not necessarily the most natural paths taking advantage of context aware information sources this paper presents an easy to deploy context aware indoor navigation system together with an efficient spatial representation and novel approach for path adaptation to help people find their destination according to their preferences and contextual information we tested our system in one building with several users first to obtain an assessment of preference values and later to compare how the paths suggested by our system correspond to those people would actually follow the positive results of this evaluation confirm the suitability of our models and algorithms
we consider real time rendering of dynamic glossy objects with realistic shadows under distant all frequency environment lighting previous prt approaches pre compute light transport for fixed scene and cannot account for cast shadows on highly glossy objects occluded by dynamic neighbors in this paper we extend double and triple product integrals to generalized multi function product integral we represent shading integral at each vertex as the product integral of multiple functions involving the lighting brdf local visibility and dynamic occlusions our main contribution is new mathematical representation and analysis of multi function product integral in the wavelet domain we show that multi function product integral in the primal corresponds to the summation of the product of basis coefficients and integral coefficients we propose novel generalized haar integral coefficient theorem to evaluate arbitrary haar integral coefficients we present an efficient sub linear algorithm to render dynamic glossy objects under time variant all frequency lighting and arbitrary view conditions in few seconds on commodity cpu orders of magnitude faster than previous techniques to further accelerate shadow computation we propose just in time radiance transfer jrt technique jrt is new generalization to prt for dynamic scenes it is compact and flexible and supports glossy materials by pre computing radiance transfer vectors at runtime we demonstrate rendering dynamic view dependent all frequency shadows in real time
prior to their deployment on an embedded system operating systems are commonly tailored to reduce code size and improve runtime performance program specialization is promising match for this process it is predictable and modular and it allows the reuse of previously implemented specializations specialization engine for embedded systems must overcome three main obstacles reusing existing compilers for embedded systems ii supporting specialization on resource limited system and iii coping with dynamic applications by supporting specialization on demand in this article we describe runtime specialization infrastructure that addresses these problems our solution proposes specialization in two phases of which the former generates specialized templates and the latter uses dedicated compiler to generate efficient native code ii virtualization mechanism that facilitates specialization of code at remote location iii an api and supporting os extensions that allow applications to produce manage and dispose of specialized code we evaluate our work through two case studies the tcp ip implementation of linux and ii the tux embedded web server we report appreciable improvements in code size and performance we also quantify the overhead of specialization and argue that specialization server can scale to support sizable workload
we address the verification problem of networks of communicating pushdown systems modeling communicating parallel programs with procedure calls processes in such networks can read the control state of the other processes according to given communication structure specifying the observability rights between processes the reachability problem of such models is undecidable in general first we define class of networks that effectively preserves recognizability hence its reachability problem is decidable then we consider networks where the communication structure can change dynamically during the execution according to phase graph the reachability problem for these dynamic networks being undecidable in general we define subclass for which it becomes decidable then we consider reachability when the switches in the communication structures are bounded we show that this problem is undecidable even for one switch then we define natural class of models for which this problem is decidable this class can be used in the definition of an efficient semi decision procedure for the analysis of the general model of dynamic networks our techniques allowed us to find bugs in two versions of windows nt bluetooth driver
resource location or discovery is key issue for grid systems in which applications are composed of hardware and software resources that need to be located classical approaches to grid resource location are either centralized or hierarchical and will prove inefficient as the scale of grid systems rapidly increases on the other hand the peer to peer pp paradigm emerged as successful model that achieves scalability in distributed systems one possibility would be to borrow existing methods from the pp paradigm and to adapt them to grid systems taking into consideration the existing differences several such attempts have been made during the last couple of years this paper aims to serve as review of the most promising grid systems that use pp techniques to facilitate resource discovery in order to perform qualitative comparison of the existing approaches and to draw conclusions about their advantages and weaknesses future research directions are also discussed
today’s top high performance computing systems run applications with hundreds of thousands of processes contain hundreds of storage nodes and must meet massive requirements for capacity and performance these leadership class systems face daunting challenges to deploying scalable systems in this paper we present case study of the challenges to performance and scalability on intrepid the ibm blue gene system at the argonne leadership computing facility listed in the top fastest supercomputers of intrepid runs computational science applications with intensive demands on the system we show that intrepid’s file and storage system sustain high performance under varying workloads as the applications scale with the number of processes
regression testing is an important activity that can account for large proportion of the cost of software maintenance one approach to reducing the cost of regression testing is to employ selective regression testing technique that chooses subset of test suite that was used to test the software before the modifications then uses this subset to test the modified software selective regression testing techniques reduce the cost of regression testing if the cost of selecting the subset from the test suite together with the cost of running the selected subset of test cases is less than the cost of rerunning the entire test suite rosenblum and weyuker recently proposed coverage based predictors for use in predicting the effectiveness of regression test selection strategies using the regression testing cost model of leung and white rosenblum and weyuker demonstrated the applicability of these predictors by performing case study involving versions of the kornshell to further investigate the applicability of the rosenblum weyuker rw predictor additional empirical studies have been performed the rw predictor was applied to number of subjects using two different selective regression testing tools dejavu and testtube these studies support two conclusions first they show that there is some variability in the success with which the predictors work and second they suggest that these results can be improved by incorporating information about the distribution of modifications it is shown how the rw prediction model can be improved to provide such an accounting
cost information can be exploited in variety of contexts including parallelizing compilers autonomic grids and real time systems in this paper we introduce novel type and effect system the sized time system that is capable of determining upper bounds for both time and space costs and which we initially intend to apply to determining good granularity for parallel tasks the analysis is defined for simple strict higher order and polymorphic functional language incorporating arbitrarily sized list data structures the inference algorithm implementing this analysis constructs cost and size terms for expressions plus constraints over free size and cost variables in those terms that can be solved to produce information for higher order functions the paper presents both the analysis and the inference algorithm providing examples that illustrate the primary features of the analysis
how do problem domains impact software features we mine software code bases to relate problem domains characterized by imports to code features such as complexity size or quality the resulting predictors take the specific imports of component and predict its size complexity and quality metrics in an experiment involving plug ins of the eclipse project we found good prediction accuracy for most metrics since the predictors rely only on import relationships and since these are available at design time our approach allows for early estimation of crucial software metrics
boundary objects are critical but understudied theoretical construct in cscw through field study of aircraft technical support we examined the role of boundary objects in the practical achievement of safety by service engineers their resolution of repair requests was preserved in the organization’s memory via three compound boundary objects these crystallizations did not manifest static interpretation but instead were continually reinterpreted in light of meta negotiations this suggests design implications for organizational memory systems which can more fluidly represent the meta negotiations surrounding boundary objects
this paper presents new approach for generating coarse level approximations of topologically complex models dramatic topology reduction is achieved by converting model to and from volumetric representation our approach produces valid error bounded models and supports the creation of approximations that do not interpenetrate the original model either being completely contained in the input solid or bounding it several simple to implement versions of our approach are presented and discussed we show that these methods perform significantly better than other surface based approaches when simplifying topologically rich models such as scene parts and complex mechanical assemblies
condensed representations of patterns are at the core of many data mining works and there are lot of contributions handling data described by items in this paper we tackle sequential data and we define an exact condensed representation for sequential patterns according to the frequency based measures these measures are often used typically in order to evaluate classification rules furthermore we show how to infer the best patterns according to these measures ie the patterns which maximize them these patterns are immediately obtained from the condensed representation so that this approach is easily usable in practice experiments conducted on various datasets demonstrate the feasibility and the interest of our approach
data provenance is essential in applications such as scientific computing curated databases and data warehouses several systems have been developed that provide provenance functionality for the relational data model these systems support only subset of sql severe limitation in practice since most of the application domains that benefit from provenance information use complex queries such queries typically involve nested subqueries aggregation and or user defined functions without support for these constructs provenance management system is of limited use in this paper we address this limitation by exploring the problem of provenance derivation when complex queries are involved more precisely we demonstrate that the widely used definition of why provenance fails in the presence of nested subqueries and show how the definition can be modified to produce meaningful results for nested subqueries we further present query rewrite rules to transform an sql query into query propagating provenance the solution introduced in this paper allows us to track provenance information for far wider subset of sql than any of the existing approaches we have incorporated these ideas into the perm provenance management system engine and used it to evaluate the feasibility and performance of our approach
disjunctive logic programming dlp is an advanced formalism for knowledge representation and reasoning the language of dlp is very expressive and supports the representation of problems of high computational complexity specifically all problems in the complexity class Σ2P = NP^NP the dlp encoding of large variety of problems is often very concise simple and elegant in this paper we explain the computational process commonly performed by dlp systems with focus on search space pruning which is crucial for the efficiency of such systems we present two suitable operators for pruning fitting’s and well founded discuss their peculiarities and differences with respect to efficiency and effectiveness we design an intelligent strategy for combining the two operators exploiting the advantages of both we implement our approach in dlv the state of the art dlp system and perform some experiments these experiments show interesting results and evidence how the choice of the pruning operator affects the performance of dlp systems
higher order logic proof systems combine functional programming with logic providing functional programmers with comfortable setting for the formalization of programs specifications and proofs however possibly unfamiliar aspect of working in such an environment is that formally establishing program termination is necessary in many cases termination can be automatically proved but there are useful programs that diverge and others that always terminate but have difficult termination proofs we discuss techniques that support the expression of such programs as logical functions
one of the most important tasks in sensor networks is to determine the physical location of sensory nodes as they may not all be equipped with gps receivers in this paper we propose localization method for wireless sensor networks wsns using single mobile beacon the sensor locations are maintained as probability distributions that are sequentially updated using monte carlo sampling as the mobile beacon moves over the deployment area our method relieves much of the localization tasks from the less powerful sensor nodes themselves and relies on the more powerful beacon to perform the calculation we discuss the monte carlo sampling steps in the context of the localization using single beacon for various types of observations such as ranging angle of arrival aoa connectivity and combinations of those we also discuss the communication protocol that relays the observation data to the beacon and the localization result back to the sensors we consider security issues in the localization process and the necessary steps to guard against the scenario in which small number of sensors are compromised our simulation shows that our method is able to achieve less than localization error and over coverage with very sparse network of degree less than while achieving significantly better results if network connectivity increases
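the sketch below shows one Monte Carlo update of a sensor's location belief from a single range measurement to the mobile beacon, as an illustration of the sequential sampling idea above; the gaussian range-noise model, parameter names, and resampling scheme are assumptions rather than the paper's exact formulation

import numpy as np

def mc_range_update(particles, beacon_xy, measured_range, sigma=1.0):
    # particles: (N, 2) array of candidate sensor positions, initially uniform over the area
    d = np.linalg.norm(particles - np.asarray(beacon_xy), axis=1)
    w = np.exp(-0.5 * ((d - measured_range) / sigma) ** 2)   # likelihood of each particle
    w_sum = w.sum()
    if w_sum == 0:                                           # degenerate case: keep old belief
        return particles
    w /= w_sum
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]                                    # resampled belief

# usage: apply one update per beacon position as the beacon moves; the particle
# mean (or mode) serves as the sensor's location estimate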
information obtained by merging data extracted from problem reporting systems such as bugzilla and versioning systems such as concurrent version system cvs is widely used in quality assessment approaches this paper attempts to shed some light on threats and difficulties faced when trying to integrate information extracted from mozilla cvs and bug repositories indeed the heterogeneity of mozilla bug reports often dealing with non defect issues and lacking traceable information may undermine validity of quality assessment approaches relying on repositories integration in the reported mozilla case study we observed that available integration heuristics are unable to recover thousands of traceability links furthermore bugzilla classification mechanisms do not enforce distinction between different kinds of maintenance activities obtained evidence suggests that large amount of information is lost we conjecture that to benefit from cvs and problem reporting systems more systematic issue classification and more reliable traceability mechanisms are needed
although augmented reality technology was first developed over forty years ago there has been little survey work giving an overview of recent research in the field this paper reviews the ten year development of the work presented at the ismar conference and its predecessors with particular focus on tracking interaction and display research it provides roadmap for future augmented reality research which will be of great value to this relatively young field and also for helping researchers decide which topics should be explored when they are beginning their own studies in the area
tags or observable features shared by group of similar agents are effectively used in real and artificial societies to signal intentions and can be used to infer unobservable properties and choose appropriate behaviors use of tags to select partners has been shown to produce stable cooperation in agent populations playing the prisoner’s dilemma game existing tag mechanisms however can promote cooperation only if that requires identical actions from all group members we propose more general tag based interaction scheme that facilitates and supports significantly richer coordination between agents our work is motivated by previous research that showed the ineffectiveness of current tag schemes for solving games requiring divergent actions the mechanisms proposed here not only solve those problems but are effective for other general sum games we argue that these general purpose tag mechanisms allow new application possibilities of multiagent learning algorithms as they allow an agent to reuse its learned knowledge about one agent when interacting with other agents sharing the same observable features
method inlining is well known and effective optimization technique for object oriented programs in the context of dynamic compilation method inlining can be used as an adaptive optimization in order to eliminate the overhead of frequently executed calls this work presents an implementation of method inlining in the cacao virtual machine on stack replacement is used for installing optimized code and for deoptimizing code when optimistic assumptions of the optimizer are broken by dynamic class loading three inlining heuristics are compared using empirical results from set of benchmark programs the best heuristic eliminates up to of all executed calls and improves execution time up to
direction based spatial relationships are critical in many domains including geographic information systems gis and image interpretation they are also frequently used as selection conditions in spatial queries in this paper we explore the processing of queries based on object orientation based directional relationships new open shape based strategy oss is proposed oss converts the processing of the direction predicates to the processing of topological operations between open shapes and closed geometry objects since oss models the direction region as an open shape it does not need to know the boundary of the embedding world and also eliminates the computation related to the world boundary the experimental evaluations show that oss consistently outperforms classical range query strategy both in io and cpu cost this paper is summary of the results
we present general framework for resource discovery composition and substitution in mobile ad hoc networks exploiting knowledge representation techniques key points of the proposed approach are reuse of discovery information at network layer in order to build fully unified semantic based discovery and routing framework use of semantic annotations in order to perform the orchestration of elementary resources for building personalized services adopting concept covering procedure and to allow the automatic substitution of no more suitable available components using ns simulator we evaluated performances of the proposed framework with reference to disaster recovery scenario in particular the impact of the number of available services and active clients has been investigated in various mobility conditions and for several service covering threshold levels obtained results show that the proposed framework is highly scalable given that its overall performance is improved by increasing the number of active clients the traffic load due to clients is negligible also for very small number of available service providers very high hit ratios can be reached increasing the number of providers can lead to hit ratios very close to at the expense of an increased traffic load finally the effectiveness of cross layer interaction between routing and resource discovery protocols has been also evaluated and discussed
scientific workflow systems have become necessary tool for many applications enabling the composition and execution of complex analysis on distributed resources today there are many workflow systems often with overlapping functionality key issue for potential users of workflow systems is the need to be able to compare the capabilities of the various available tools there can be confusion about system functionality and the tools are often selected without proper functional analysis in this paper we extract taxonomy of features from the way scientists make use of existing workflow systems and we illustrate this feature set by providing some examples taken from existing workflow systems the taxonomy provides end users with mechanism by which they can assess the suitability of workflow in general and how they might use these features to make an informed choice about which workflow system would be good choice for their particular application
describing relational data sources ie databases by means of ontologies constitutes the foundation of most of the semantic based approaches to data access and integration in spite of the importance of the task this is mostly carried out manually and to the best of our knowledge not much research has been devoted to its automatisation in this paper we introduce an automatic procedure for building ontologies starting from the integrity constraints present in the relational sources our work builds upon the wide literature on database schema reverse engineering however we adapt these techniques to the specific purpose of reusing the extracted schemata or ontologies in the context of semantic data access in particular we ensure that the underlying data sources can be queried through the ontologies and the extracted ontologies can be used for semantic integration using recently developed techniques in this area in order to represent the extracted ontology we adopt variant of the dlr lite description logic because of its ability to express the mostly used modelling constraints and its nice computational properties the connection with the relational data sources is captured by means of sound views moreover the adoption of this formal language enables us to prove that the extracted ontologies preserve the semantics of the integrity constraints in the relational sources therefore there is no data loss and the extracted ontology constitutes faithful wrapper of the relational sources
face detection is key component in numerous computer vision applications most face detection algorithms achieve real time performance by some form of dimensionality reduction of the input data such as principal component analysis in this paper we are exploring the emerging method of random projections rp data independent linear projection method for dimensionality reduction in the context of face detection the benefits of using random projections include computational efficiency that can be obtained by implementing matrix multiplications with small number of integer additions or subtractions the computational savings are of great significance in resource constrained environments such as wireless video sensor networks experimental results suggest that rp can achieve performance that is comparable to that obtained with traditional dimensionality reduction techniques for face detection using support vector machines
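a minimal sketch of the random projection step is given below; a dense gaussian projection matrix is used here for simplicity, whereas the abstract points out that integer add/subtract implementations (e.g. sparse ±1 matrices) are possible, and the function and parameter names are illustrative assumptions

import numpy as np

def random_projection(X, k, seed=0):
    # X: (n_samples, d) face image patches flattened to vectors; k: reduced dimension
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)   # data-independent projection matrix
    return X @ R                                   # (n_samples, k) reduced features

# the reduced features would then be fed to a face / non-face classifier such as an svm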
sequential behavior and sequence learning are essential to intelligence often the elements of sequences exhibit an internal structure that can elegantly be represented using relational atoms applying traditional sequential learning techniques to such relational sequences requires one either to ignore the internal structure or to live with combinatorial explosion of the model complexity this chapter briefly reviews relational sequence learning and describes several techniques tailored towards realizing this such as local pattern mining techniques hidden markov models conditional random fields dynamic programming and reinforcement learning
in pipelined channel interconnection network multiple bits may be simultaneously in flight on single wire this allows the cycle time of the network to be independent of the wire lengths significantly affecting the network design trade offs this paper investigates the design and performance of pipelined channel ary cube networks with particular emphasis on the choice of dimensionality and radix networks are investigated under the constant link width constant node size and constant bisection constraints we find that the optimal dimensionality of pipelined channel networks is higher than that of nonpipelined channel networks with the difference being greater under looser wiring constraints their radix should remain roughly constant as network size is grown decreasing slightly for some unidirectional tori and increasing slightly for some bi directional meshes pipelined channel networks are shown to provide lower latency and higher bandwidth than their nonpipelined channel counterparts especially for high dimensional networks the paper also investigates the effects of switching overhead and message lengths indicating where results agree with and differ from previous results obtained for nonpipelined channel networks
data and knowledge grids represent emerging and attracting application scenarios for grid computing and pose novel and previously unrecognized challenges to the research community basically data and knowledge grids are founded on high performance grid infrastructures and add to the latter meaningful data and knowledge oriented abstractions and metaphors that perfectly marry with innovative requirements of modern complex intelligent information systems to this end service oriented architectures and paradigms are the most popular ones for grids and on the whole represent an active and widely recognized area of grid computing research in this paper we introduce the so called grid based rtsoa frameworks which essentially combine grid computing with real time service management and execution paradigms and lay the basis for novel research perspectives in data intensive science grid applications with real time bound constraints this novel framework is then specialized to the particular context of data transformation services over grids which play relevant role for both data and knowledge grids finally we complete the main contribution of the paper with rigorous theoretical model for efficiently supporting grid based rtsoa frameworks with particular emphasis to the context of data transformation services over grids along with its preliminary experimental assessment
in this paper we present method of handling the visualization of heterogeneous event traffic that is generated by intrusion detection sensors log files and other event sources on computer network from the point of view of detecting multistage attack paths that are of importance we perform aggregation and correlation of these events based on their semantic content to generate attack tracks that are displayed to the analyst in real time our tool called the event correlation for cyber attack recognition system ec cars enables the analyst to distinguish and separate an evolving multistage attack from the thousands of events generated on network we focus here on presenting the environment and framework for multistage attack detection using eccars along with screenshots that demonstrate its capabilities
in this paper we propose an efficient method for mining all frequent inter transaction patterns the method consists of two phases first we devise two data structures dat list which stores the item information used to find frequent inter transaction patterns and an itp tree which stores the discovered frequent inter transaction patterns in the second phase we apply an algorithm called itp miner inter transaction patterns miner to mine all frequent inter transaction patterns by using the itp tree the algorithm requires only one database scan and can localize joining pruning and support counting to small number of dat lists the experiment results show that the itp miner algorithm outperforms the fiti first intra then inter algorithm by one order of magnitude
to improve the performance of embedded processors an effective technique is collapsing critical computation subgraphs as application specific instruction set extensions and executing them on custom functional units the problem with this approach is the immense cost and the long times required to design new processor for each application as solution to this issue we propose an adaptive extensible processor in which custom instructions cis are generated and added after chip fabrication to support this feature custom functional units are replaced by reconfigurable matrix of functional units fus systematic quantitative approach is used for determining the appropriate structure of the reconfigurable functional unit rfu we also introduce an integrated framework for generating mappable cis on the rfu using this architecture performance is improved by up to with an average improvement of compared to issue in order risc processor by partitioning the configuration memory detecting similar subset cis and merging small cis the size of the configuration memory is reduced by
we present new technique for removing unnecessary synchronization operations from statically compiled java programs our approach improves upon current efforts based on escape analysis as it can eliminate synchronization operations even on objects that escape their allocating threads it makes use of compact equivalence class based representation that eliminates the need for fixed point operations during the analysis we describe and evaluate the performance of an implementation in the marmot native java compiler for the benchmark programs examined the optimization removes of the dynamic synchronization operations in single threaded programs and in multi threaded programs at low cost in additional compilation time and code growth
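the union-find sketch below only illustrates the flavor of an equivalence-class-based representation, merging objects whose values may flow together and marking a class as thread-shared when any member escapes, so that synchronization is kept only for shared classes; it is a much-simplified stand-in under assumed names and is not the paper's actual analysis

class EquivClasses:
    # illustrative equivalence-class store with path halving; no fixed-point iteration needed
    def __init__(self):
        self.parent, self.shared = {}, {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.shared.setdefault(x, False)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):                  # e.g. on an assignment a = b
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
            self.shared[rb] = self.shared[rb] or self.shared[ra]

    def mark_shared(self, x):               # e.g. value stored into a static field
        self.shared[self.find(x)] = True

    def needs_sync(self, x):                # keep synchronization only for shared classes
        return self.shared[self.find(x)]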
execution omission errors are known to be difficult to locate using dynamic analysis these errors lead to failure at runtime because of the omission of execution of some statements that would have been executed if the program had no errors since dynamic analysis is typically designed to focus on dynamic information arising from executed statements and statements whose execution is omitted do not produce dynamic information detection of execution omission errors becomes challenging task for example while dynamic slices are very effective in capturing faulty code for other types of errors they fail to capture faulty code in presence of execution omission errors to address this issue relevant slices have been defined to consider certain static dependences called potential dependences in addition to dynamic dependences however due to the conservative nature of static analysis overly large slices are produced in this paper we propose fully dynamic solution to locating execution omission errors using dynamic slices we introduce the notion of implicit dependences which are dependences that are normally invisible to dynamic slicing due to the omission of execution of some statements we design dynamic method that forces the execution of the omitted code by switching outcomes of relevant predicates such that those implicit dependences are exposed and become available for dynamic slicing dynamic slices can be computed and effectively pruned to produce fault candidate sets containing the execution omission errors we solve two main problems verifying the existence of single implicit dependence through predicate switching and recovering the implicit dependences in demand driven manner such that small number of verifications are required before the root cause is captured our experiments show that the proposed technique is highly effective in capturing execution omission errors
in this paper we discuss the scenario of petroleum engineering projects of petrobras large brazilian governmental oil gas company based on this scenario we propose set of application requirements and system architecture to guide the construction of collaborative engineering environment cee for assisting the control and execution of large and complex industrial projects in oil and gas industry the environment is composed by the integration of three different technologies of distributed group work workflow management system wfms multimedia collaborative system mmcs and collaborative virtual environments cve
the effective exploitation of multi san smp clusters and the use of generic clusters to support complex information systems require new approaches multi san smp clusters introduce new levels of parallelism and traditional environments are mainly used to run scientific computations in this paper we present novel approach to the exploitation of clusters that allows integrating in unique metaphor the representation of physical resources the modelling of applications and the mapping of application into physical resources the proposed abstractions favoured the development of an api that allows combining and benefiting from the shared memory message passing and global memory paradigms
many failures in long term collaboration occur because of lack of activity awareness activity awareness is broad concept that involves awareness of synchronous and asynchronous interactions over extended time periods we describe procedure to evaluate activity awareness and collaborative activities in controlled setting the activities used are modeled on real world collaborations documented earlier in field study we developed an experimental method to study these activity awareness problems in the laboratory participants worked on simulated long term project in the laboratory over multiple experimental sessions with confederate who partially scripted activities and probes we present evidence showing that this method represents valid model of real collaboration based on participants active engagement lively negotiation and awareness difficulties we found that having the ability to define reproduce and systematically manipulate collaborative situations allowed us to assess the effect of realistic conditions on activity awareness in remote collaboration
collecting data is very easy now owing to fast computers and ease of internet access it raises the problem of the curse of dimensionality to supervised classification problems in our previous work an intra prototype inter class separability ratio ipicsr model is proposed to select relevant features for semi supervised classification problems in this work new margin based feature selection model is proposed based on the ipicsr model for supervised classification problems owing to the nature of supervised classification problems more accurate class separating margin could be found by the classifier we adopt this advantage in the new intra prototype class margin separability ratio ipcmsr model experimental results are promising when compared to several existing methods using uci datasets
satellite software has to deal with specific needs including high integrity and dynamic updates in order to deal with these requirements we propose working at higher level of abstraction thanks to model driven engineering doing so involves many technologies including model manipulation code generation and verification before we can implement the approach there is need for further research in these areas eg about meta transformations in order to maintain several consistent related code generators we highlight such issues in regard to the current state of the art
many computations exhibit trade off between execution time and quality of service video encoder for example can often encode frames more quickly if it is given the freedom to produce slightly lower quality video developer attempting to optimize such computations must navigate complex trade off space to find optimizations that appropriately balance quality of service and performance we present new quality of service profiler that is designed to help developers identify promising optimization opportunities in such computations in contrast to standard profilers which simply identify time consuming parts of the computation quality of service profiler is designed to identify subcomputations that can be replaced with new and potentially less accurate subcomputations that deliver significantly increased performance in return for acceptably small quality of service losses our quality of service profiler uses loop perforation which transforms loops to perform fewer iterations than the original loop to obtain implementations that occupy different points in the performance quality of service trade off space the rationale is that optimizable computations often contain loops that perform extra iterations and that removing iterations then observing the resulting effect on the quality of service is an effective way to identify such optimizable subcomputations our experimental results from applying our implemented quality of service profiler to challenging set of benchmark applications show that it can enable developers to identify promising optimization opportunities and deliver successful optimizations that substantially increase the performance with only small quality of service losses
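the snippet below is a minimal illustration of loop perforation itself, running only every k-th iteration and extrapolating the result; the loop body, the extrapolation, and the quality metric are assumptions, and a profiler in this spirit would time the original and perforated variants and report the quality loss for each candidate loop

def perforated_sum(items, compute, perforation=2):
    # run only every `perforation`-th iteration of a perforable reduction loop
    total, used = 0.0, 0
    for i in range(0, len(items), perforation):
        total += compute(items[i])
        used += 1
    return total * len(items) / max(used, 1)   # extrapolate to keep the result on the same scale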
this work is motivated by strong market demand for the replacement of nor flash memory with nand flash memory to cut down the cost of many embedded system designs such as mobile phones different from lru related caching or buffering studies we are interested in prediction based prefetching based on given execution traces of application executions an implementation strategy is proposed for the storage of the prefetching information with limited sram and run time overheads an efficient prediction procedure is presented based on information extracted from application executions to reduce the performance gap between nand flash memory and nor flash memory in reads with the behavior of target application extracted from set of collected traces we show that data access to nor flash memory can respond effectively over the proposed implementation
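a minimal sketch of trace-driven, prediction-based prefetching is shown below: learn from collected traces which page most often follows each page, then prefetch that successor at run time; the table layout, cache representation, and function names are illustrative assumptions rather than the proposed implementation strategy

from collections import Counter, defaultdict

def build_successor_table(trace):
    # trace: sequence of page identifiers recorded from application executions
    counts = defaultdict(Counter)
    for prev, nxt in zip(trace, trace[1:]):
        counts[prev][nxt] += 1
    # keep only the most likely successor per page to bound the in-SRAM table size
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

def on_access(page, table, cache, read_page):
    if page not in cache:
        cache[page] = read_page(page)              # demand read from flash
    nxt = table.get(page)
    if nxt is not None and nxt not in cache:
        cache[nxt] = read_page(nxt)                # prefetch the predicted successor
    return cache[page]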
the pypy project seeks to prove both on research and practical level the feasibility of constructing virtual machine vm for dynamic language in dynamic language in this case python the aim is to translate ie compile the vm to arbitrary target environments ranging in level from posix to smalltalk squeak via java and cli net while still being of reasonable efficiency within these environments a key tool to achieve this goal is the systematic reuse of the python language as system programming language at various levels of our architecture and translation process for each level we design corresponding type system and apply generic type inference engine for example the garbage collector is written in style that manipulates simulated pointer and address objects and when translated to these operations become level pointer and address instructions
we address the problem of replaying an application dialog between two hosts the ability to accurately replay application dialogs is useful in many security oriented applications such as replaying an exploit for forensic analysis or demonstrating an exploit to third party a central challenge in application dialog replay is that the dialog intended for the original host will likely not be accepted by another without modification for example the dialog may include or rely on state specific to the original host such as its hostname known cookie etc in such cases straight forward byte by byte replay to different host with different state eg different hostname than the original observed dialog participant will likely fail these state dependent protocol fields must be updated to reflect the different state of the different host for replay to succeed we formally define the replay problem we present solution which makes novel use of program verification techniques such as theorem proving and weakest pre condition by employing these techniques we create the first sound solution to the replay problem replay succeeds whenever our approach yields an answer previous techniques though useful are based on unsound heuristics we implement prototype of our techniques called replayer which we use to demonstrate the viability of our approach
faced with growing knowledge management needs enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources in existing information integration solutions the application needs to formulate the sql logic to retrieve the needed structured data on one hand and identify set of keywords to retrieve the related unstructured data on the other this paper proposes novel approach wherein the application specifies its information needs using only sql query on the structured data and this query is automatically translated into set of keywords that can be used to retrieve relevant unstructured data we describe the techniques used for obtaining these keywords from the query result and ii additional related information in the underlying database we further show that these techniques achieve high accuracy with very reasonable overheads
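a minimal sketch of deriving keywords from a structured query's result, to be issued against an unstructured repository such as a full-text index, is given below; the column choice, tokenization, and frequency-based weighting are illustrative assumptions and not the paper's technique

import re
from collections import Counter

def keywords_from_rows(rows, text_columns, top_k=5):
    # rows: list of dicts returned by the sql query; text_columns: columns worth mining
    counts = Counter()
    for row in rows:
        for col in text_columns:
            for tok in re.findall(r"[a-z]+", str(row.get(col, "")).lower()):
                if len(tok) > 2:
                    counts[tok] += 1
    return [w for w, _ in counts.most_common(top_k)]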
we present algorithms for testing language inclusion between tree automata in time where is deterministic bottom up or top down we extend our algorithms for testing inclusion of automata for unranked trees in deterministic dtds or deterministic edtds with restrained competition in time previous algorithms were less efficient or less general
conference mining has been an important problem discussed these days for the purpose of academic recommendation previous approaches mined conferences by using network connectivity or by using semantics based intrinsic structure of the words present between documents modeling from document level dl while ignoring semantics based intrinsic structure of the words present between conferences in this paper we address this problem by considering semantics based intrinsic structure of the words present in conferences richer semantics by modeling from conference level cl we propose generalized topic modeling approach based on latent dirichlet allocation lda named as conference mining conmin by using it we can discover topically related conferences conferences correlations and conferences temporal topic trends experimental results show that proposed approach significantly outperformed baseline approach in discovering topically related conferences and finding conferences correlations because of its ability to produce less sparse topics
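the sketch below illustrates conference-level topic modeling with plain lda, pooling each conference's papers into one bag of words; gensim is used here only as a convenient stand-in and the corpus layout, parameters, and similarity step are assumptions, not the paper's conmin model

from gensim import corpora
from gensim.models import LdaModel

def conference_topics(conference_docs, num_topics=20):
    # conference_docs: {conference_name: [token, token, ...]} pooled over its papers
    names = list(conference_docs)
    texts = [conference_docs[n] for n in names]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, passes=10)
    # topic distribution per conference; cosine similarity between these vectors gives
    # a simple notion of topically related conferences
    topic_vecs = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]
    return names, topic_vecs, lda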
sentiment classification aims to automatically predict sentiment polarity eg positive or negative of users publishing sentiment data eg reviews blogs although traditional classification algorithms can be used to train sentiment classifiers from manually labeled text data the labeling work can be time consuming and expensive meanwhile users often use some different words when they express sentiment in different domains if we directly apply classifier trained in one domain to other domains the performance will be very low due to the differences between these domains in this work we develop general solution to sentiment classification when we do not have any labels in target domain but have some labeled data in different domain regarded as source domain in this cross domain sentiment classification setting to bridge the gap between the domains we propose spectral feature alignment sfa algorithm to align domain specific words from different domains into unified clusters with the help of domain independent words as bridge in this way the clusters can be used to reduce the gap between domain specific words of the two domains which can be used to train sentiment classifiers in the target domain accurately compared to previous approaches sfa can discover robust representation for cross domain data by fully exploiting the relationship between the domain specific and domain independent words via simultaneously co clustering them in common latent space we perform extensive experiments on two real world datasets and demonstrate that sfa significantly outperforms previous approaches to cross domain sentiment classification
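the sketch below conveys only the spectral step behind this kind of feature alignment: co-occurrence between domain-independent (pivot) words and domain-specific words defines a bipartite graph whose top singular vectors yield clusters aligning the domain-specific words across domains; the normalization, parameters, and names are assumptions and not necessarily the authors' exact sfa formulation

import numpy as np

def align_features(M, k=10):
    # M: (n_specific, n_pivot) co-occurrence counts between domain-specific words (rows)
    # and domain-independent pivot words (columns), pooled from both domains
    row = M.sum(axis=1, keepdims=True)
    col = M.sum(axis=0, keepdims=True)
    Mn = M / (np.sqrt(row + 1e-12) * np.sqrt(col + 1e-12))   # symmetric-style normalization
    U, s, Vt = np.linalg.svd(Mn, full_matrices=False)
    return U[:, :k]   # k-dimensional aligned representation of each domain-specific word

# a sentiment classifier trained on source-domain documents augmented with these aligned
# cluster features can then be applied to target-domain documents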
we study the problem of generating database and parameters for given parameterized sql query satisfying given test condition we introduce formal background theory that includes arithmetic tuples and sets and translate the generation problem into satisfiability or model generation problem modulo the background theory we use the satisfiability modulo theories smt solver in the concrete implementation we describe an application of model generation in the context of the database unit testing framework of visual studio
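a hedged sketch of smt-based generation is shown below for a query of the shape "SELECT * FROM t WHERE age > @p": the solver is asked for a parameter value and a row that make the test condition true; the z3 python bindings and the toy constraints are stand-ins for the solver integration described, not the framework's actual encoding of arithmetic, tuples, and sets

from z3 import Int, Solver, sat

def generate_test_input():
    p = Int('p')                    # query parameter @p
    age = Int('age')                # column value of the generated row
    s = Solver()
    s.add(age > p)                  # test condition: the generated row must satisfy the predicate
    s.add(p >= 0, p <= 120)         # illustrative domain constraints
    if s.check() == sat:
        m = s.model()
        return {'param_p': m[p].as_long(), 'row_age': m[age].as_long()}
    return None                     # condition unsatisfiable: no database/parameters exist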
vast amount of documents in the web have duplicates which is challenge for developing efficient methods that would compute clusters of similar documents in this paper we use an approach based on computing closed sets of attributes having large support large extent as clusters of similar documents the method is tested in series of computer experiments on large public collections of web documents and compared to other established methods and software such as biclustering on same datasets practical efficiency of different algorithms for computing frequent closed sets of attributes is compared
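a brute-force sketch of the closure idea is given below: the closure of an attribute set is the set of attributes common to all documents containing it, and documents sharing a closed set with large support form a candidate near-duplicate cluster; seeding closures from whole documents and the support threshold are simplifying assumptions, whereas the paper mines frequent closed sets with dedicated algorithms

def closure(attrs, docs):
    # docs: list of attribute sets (e.g. word shingles) describing each document
    containing = [d for d in docs if attrs <= d]
    if not containing:
        return frozenset(attrs), []
    common = frozenset.intersection(*map(frozenset, containing))
    return common, containing

def duplicate_clusters(docs, min_support=2):
    seen, clusters = set(), []
    for d in docs:
        closed, members = closure(set(d), docs)
        if closed not in seen and len(members) >= min_support:
            seen.add(closed)
            clusters.append(members)   # documents sharing this closed attribute set
    return clusters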
recent research indicates that prediction based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memory multiprocessors important commercial applications also show sensitivity to coherence latency which will become more acute in the future as technology scales therefore it is important to investigate prediction of memory coherence activity in the context of commercial workloads this paper studies trace based downgrade predictor dgp for predicting last stores to shared cache blocks and pattern based consumer set predictor csp for predicting subsequent readers we evaluate this class of predictors for the first time on commercial applications and demonstrate that our dgp correctly predicts of last stores memory sharing patterns in commercial workloads are inherently non repetitive hence csp cannot attain high coverage we perform an opportunity study of dgp enhanced through competitive underlying predictors and in commercial and scientific applications demonstrate potential to increase coverage up to
subgroup discovery is the task of finding subgroups of population which exhibit both distributional unusualness and high generality due to the non monotonicity of the corresponding evaluation functions standard pruning techniques cannot be used for subgroup discovery requiring the use of optimistic estimate techniques instead so far however optimistic estimate pruning has only been considered for the extremely simple case of binary target attribute and up to now no attempt was made to move beyond suboptimal heuristic optimistic estimates in this paper we show that optimistic estimate pruning can be developed into sound and highly effective pruning approach for subgroup discovery based on precise definition of optimality we show that previous estimates have been tight only in special cases thereafter we present tight optimistic estimates for the most popular binary and multi class quality functions and present family of increasingly efficient approximations to these optimal functions as we show in empirical experiments the use of our newly proposed optimistic estimates can lead to speed up of an order of magnitude compared to previous approaches
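for concreteness, the sketch below shows optimistic-estimate pruning for the binary weighted relative accuracy quality q(s) = n_s/N (p_s - p0): if the best conceivable refinement keeps all tp positives of s and drops every negative, its quality is tp/N (1 - p0), so a subgroup whose estimate falls below the current top-k threshold can be pruned together with all refinements; this standard binary-case bound is given only as illustration, while the paper's contribution is tight estimates for a much wider range of binary and multi-class quality functions

def wracc(tp, n, N, p0):
    # weighted relative accuracy of a subgroup with n covered examples, tp of them positive
    return (n / N) * (tp / n - p0) if n else 0.0

def optimistic_estimate(tp, N, p0):
    # best quality any refinement could reach: keep all positives, drop all negatives
    return (tp / N) * (1 - p0)

def can_prune(tp, N, p0, threshold):
    return optimistic_estimate(tp, N, p0) <= threshold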
the notion of feature is heavily used in software engineering especially for software product lines however this notion appears to be confusing mixing various aspects of problem and solution in this paper we attempt to clarify the notion of feature in the light of zave and jackson’s framework for requirements engineering by redefining problem level feature as set of related requirements specifications and domain assumptions the three types of statements central to zave and jackson’s framework we also revisit the notion of feature interaction this clarification work opens new perspectives on formal description and verification of software product lines an important benefit of the approach is to enable an early identification of feature interactions taking place in the systems environment notoriously challenging problem the approach is illustrated through proof of concept prototype tool and applied to smart home example
the cscw conference is celebrating its th birthday this is perfect time to analyze the coherence of the field to examine whether it has solid core or sub communities and to identify various patterns of its development in this paper we analyze the structure of the cscw conference using structural analysis of the citation graph of cscw and related publications we identify the conference’s core and most prominent clusters we also define measure to identify chasm papers namely papers cited significantly more outside the conference than within and analyze such papers
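a minimal sketch of the chasm idea is given below, comparing how often a paper is cited from outside the conference versus from within it; the specific ratio and threshold are illustrative assumptions, not the measure defined in the paper

def chasm_score(paper, citations, venue_of, conference="CSCW"):
    # citations: list of (citing_paper, cited_paper); venue_of: paper -> venue name
    inside = sum(1 for a, b in citations if b == paper and venue_of.get(a) == conference)
    outside = sum(1 for a, b in citations if b == paper and venue_of.get(a) != conference)
    return outside / (inside + 1)     # high score: cited mostly outside the conference

def chasm_papers(papers, citations, venue_of, min_score=3.0):
    return [p for p in papers if chasm_score(p, citations, venue_of) >= min_score]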
performance monitoring in most distributed systems provides minimal guidance for tuning problem diagnosis and decision making stardust is monitoring infrastructure that replaces traditional performance counters with end to end traces of requests and allows for efficient querying of performance metrics such traces better inform key administrative performance challenges by enabling for example extraction of per workload per resource demand information and per workload latency graphs this paper reports on our experience building and using end to end tracing as an on line monitoring tool in distributed storage system using diverse system workloads and scenarios we show that such fine grained tracing can be made efficient less than overhead and is useful for on and off line analysis of system behavior these experiences make case for having other systems incorporate such an instrumentation framework
in this article we consider model oriented formal specification languages we generate test cases by performing symbolic execution over model and from the test cases obtain java program this java program acts as test driver and when it is run in conjunction with the implementation then testing is performed in an automatic manner our approach makes the testing cycle fully automatic the main contribution of our work is that we perform automatic testing even when the models are non deterministic
there are many interface schemes that allow users to work at and move between focused and contextual views of dataset we review and categorize these schemes according to the interface mechanisms used to separate and blend views the four approaches are overview detail which uses spatial separation between focused and contextual views zooming which uses temporal separation focus context which minimizes the seam between views by displaying the focus within the context and cue based techniques which selectively highlight or suppress items within the information space critical features of these categories and empirical evidence of their success are discussed the aim is to provide succinct summary of the state of the art to illuminate both successful and unsuccessful interface strategies and to identify potentially fruitful areas for further work
program analyses and optimizations of java programs require reference information that determines the instances that may be accessed through dereferences reference information can be computed using reference analysis this paper presents set of studies that evaluate the precision of two existing approaches for identifying instances and one approach for computing reference information in reference analysis the studies use dynamic reference information collected during run time as lower bound approximation to the precise reference information the studies measure the precision of an existing approach by comparing the information computed using the approach with the lower bound approximation the paper also presents case studies that attempt to identify the cases under which an existing approach is not effective the presented studies provide information that may guide the usage of existing reference analysis techniques and the development of new reference analysis techniques
area efficiency is one of the major considerations in constraint aware hardware software partitioning process this paper focuses on the algorithmic aspects for hardware software partitioning with the objective of minimizing area utilization under the constraints of execution time and power consumption an efficient heuristic algorithm running in log is proposed by extending the method devised for solving the knapsack problem also an exact algorithm based on dynamic programming is proposed to produce the optimal solution for small sized problems simulation results show that the proposed heuristic algorithm yields very good approximate solutions while dramatically reducing the execution time
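the sketch below is a greedy, sort-based heuristic in the spirit of knapsack approximations: tasks are moved to hardware in decreasing order of how much execution time and power they save per unit of area, stopping once the time and power constraints are met; the task fields, the ranking function, and the stopping rule are assumptions and do not reproduce the paper's heuristic or its dynamic-programming exact algorithm

def partition(tasks, time_budget, power_budget):
    # each task: dict with sw_time, hw_time, sw_power, hw_power, hw_area
    sw_time = sum(t['sw_time'] for t in tasks)
    sw_power = sum(t['sw_power'] for t in tasks)

    def density(t):
        gain = (t['sw_time'] - t['hw_time']) + (t['sw_power'] - t['hw_power'])
        return gain / max(t['hw_area'], 1e-9)

    order = sorted(range(len(tasks)), key=lambda i: density(tasks[i]), reverse=True)  # dominant sort step
    hw, area = [], 0.0
    for i in order:
        if sw_time <= time_budget and sw_power <= power_budget:
            break                         # constraints already met: stop spending area
        t = tasks[i]
        hw.append(i)
        area += t['hw_area']
        sw_time -= t['sw_time'] - t['hw_time']
        sw_power -= t['sw_power'] - t['hw_power']
    return hw, area, sw_time, sw_power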
motivated by frequently repeated activities of negotiating similar sales contracts and inadequate studies of business to business bb negotiation processes we formulate meta model of negotiation based on practical meta model for contract template and template variables to allow flexible support for variety of negotiation processes based on our meta models we develop an effective implementation framework with contemporary web services technology we illustrate our methodology with three typical kinds of sales negotiation processes namely bargaining auction and request for proposals rfps as result bb business to customer bc or even customer to customer cc negotiation can be systematically supported in unified pragmatic framework for both human and programmatic access
we introduce an extended tableau calculus for answer set programming asp the proof system is based on the asp tableaux defined in the work by gebser and schaub tableau calculi for answer set programming in proceedings of the nd international conference on logic programming iclp etalle and truszczynski eds lecture notes in computer science vol springer with an added extension rule we investigate the power of extended asp tableaux both theoretically and empirically we study the relationship of extended asp tableaux with the extended resolution proof system defined by tseitin for sets of clauses and separate extended asp tableaux from asp tableaux by giving polynomial length proof for family of normal logic programs Πn for which asp tableaux has exponential length minimal proofs with respect to additionally extended asp tableaux imply interesting insight into the effect of program simplification on the lengths of proofs in asp closely related to extended asp tableaux we empirically investigate the effect of redundant rules on the efficiency of asp solving
ibm db is truly hybrid commercial database system that combines xml and relational data it provides native support for xml storage and indexing and query evaluation support for xquery by building hybrid system the designers of db were able to use the existing sql query evaluation and optimization techniques to develop similar methods for xquery however sql and xquery are sufficiently different that new optimization techniques can and are being developed in the new xquery domain this paper describes few such techniques all based on static rewrites of xquery expressions
this paper describes computer aided software engineering case tool that helps designers analyze and fine tune the timing properties of their embedded real time software existing case tools focus on the software specification and design of embedded systems however they provide little if any support after the software has been implemented even if the developer used case tool to design the system their system most likely does not meet the specifications on the first try this paper includes guidelines for implementing analyzable code profiling real time system filtering and extracting measured data analyzing the data and interactively predicting the effect of changes to the real time system the tool is necessary first step towards automating the debugging and fine tuning of an embedded system’s temporal properties
the number of sensor networks deployed for manifold of applications is expected to increase dramatically in the coming few years advances in wireless communications and the growing interest in wireless networks are spurring this this growth will not only simplify the access to sensor information but will also motivate the creation of numerous new information paradoxically this growth will make the task of getting meaningful information from disparate sensor nodes not trivial one on the one hand traffic overheads and the increased probabilities of hardware failures make it very difficult to maintain an always on ubiquitous service on the other hand the heterogeneity of the sensor nodes makes finding extracting and aggregating data at the processing elements and sink nodes much harder these two issues in addition of course to the distribution dynamicity accuracy and reliability issues impose the need for more efficient and reliable techniques for information integration of data collected from sensor nodes in this paper we first address the issues related to data integration in wireless sensor networks with respect to heterogeneity dynamicity and distribution at both the technology and application levels second we present and discuss query processing algorithm which makes use of the semantic knowledge about sensor networks expressed in the form of integrity constraints to reduce network traffic overheads improve scalability and extensibility of wireless networks and increase the stability and reliability of networks against hardware and software failures third we discuss uniform interface to data collected from sensor nodes that will map sensor specific data to the global information source based on context exported by the data integration system
the separation of concerns is fundamental principle in software engineering crosscutting concerns are concerns that do not align with hierarchical and block decomposition supported by mainstream programming languages in the past crosscutting concerns have been studied mainly in the context of object orientation feature orientation is novel programming paradigm that supports the de composition of crosscutting concerns in system with hierarchical block structure in two case studies we explore the problem of crosscutting concerns in functional programming and propose two solutions based on feature orientation
emerging critical issues for flash memory storage systems especially with regard to implementation within many embedded systems are the programmed nature of data transfers and their energy efficient nature we propose an request mechanism in the memory technology device mtd layer to exploit the programmed based data transfers for flash memory storage systems we propose to revise the waiting function in the memory technology device mtd layer to relieve the microprocessor from busy waiting in order to make more cpu cycles available for other tasks an energy efficient mechanism based on the request mechanism is also presented for multi bank flash memory storage systems which particularly focuses on switching the power state of each flash memory bank we demonstrate that the energy efficient request mechanism not only saves more cpu cycles to execute other tasks but also reduces the energy consumption of flash memory based on experiments incorporating realistic system workloads
much of the research that deals with understanding the real world and representing it in conceptual model uses some form of the entity relationship model as means of representation this research proposes an ontology for classifying relationship verb phrases based upon the domain and context of the application within which the relationship appears the classification categories to which the verb phrases are mapped were developed based upon prior research in databases ontologies and linguistics the usefulness of the ontology for comparing relationships when used in conjunction with an entity ontology is discussed together these ontologies can be effective in comparing two conceptual database designs for integration and validation empirical testing of the ontology on number of relationships from different application domains and contexts illustrates the usefulness of the research
cluster based storage systems are popular for data intensive applications and it is desirable yet challenging to provide incremental expansion and high availability while achieving scalability and strong consistency this paper presents the design and implementation of self organizing storage cluster called sorrento which targets data intensive workload with highly parallel requests and low write sharing patterns sorrento automatically adapts to storage node joins and departures and the system can be configured and maintained incrementally without interrupting its normal operation data location information is distributed across storage nodes using consistent hashing and the location protocol differentiates small and large data objects for access efficiency it adopts versioning to achieve single file serializability and replication consistency in this paper we present experimental results to demonstrate features and performance of sorrento using microbenchmarks application benchmarks and application trace replay
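a minimal consistent hashing sketch of the data location idea mentioned here node names virtual node count and hash function are illustrative assumptions rather than sorrento internals a node join or departure only remaps the keys adjacent to its points on the ring which is what allows incremental reconfiguration

# minimal consistent-hashing sketch for distributing data-location
# information across storage nodes; virtual node count and hash choice
# are illustrative, not the system's actual parameters
import bisect, hashlib

class ConsistentHashRing:
    def __init__(self, vnodes=64):
        self.vnodes = vnodes
        self.ring = []          # sorted list of (hash, node) points
    def _h(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._h(f"{node}#{i}"), node))
    def remove_node(self, node):
        self.ring = [p for p in self.ring if p[1] != node]
    def locate(self, obj_id):
        # the object is owned by the first virtual node clockwise
        # from the object's hash position on the ring
        h = self._h(obj_id)
        idx = bisect.bisect(self.ring, (h, chr(0x10FFFF)))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing()
for n in ("node-a", "node-b", "node-c"):
    ring.add_node(n)
print(ring.locate("/data/object-42"))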
we characterize participatory design pd as maturing area of research and as an evolving practice among design professionals although pd has been applied outside of technology design here we focus on pd in relation to the introduction of computer based systems at work we discuss three main issues addressed by pd researchers the politics of design the nature of participation and method tools and techniques for participation we also report on the conditions for the transfer of pd results to workers user groups and design professionals that have characterized pd over time and across geopolitical terrains the topic of the sustainability of pd within an organizational context is also considered the article concludes with discussion of common issues explored within pd and cscw and frames directions for continuing dialogue between researchers and practitioners from the two fields the article draws on review of pd and cscw literatures as well as on our own research and practical experiences
professionals in the field of speech technology are often constrained by lack of speech corpora that are important to their research and development activities these corpora exist within the archives of various businesses and institutions however these entities are often prevented from sharing their data due to privacy rules and regulations efforts to scrub this data to make it shareable can result in data that has been either inadequately protected or data that has been rendered virtually unusable due to the loss resulting from suppression this work attempts to address these issues by developing scientific workflow that combines proven techniques in data privacy with controlled audio distortion resulting in corpora that have been adequately protected with minimal information loss
wireless sensor networks link the physical and digital worlds enabling both surveillance as well as scientific exploration in both cases on line detection of interesting events can be accomplished with continuous queries cqs in data stream management system dsms however the quality of service requirements of detecting these events are different for different monitoring applications the cqs for detecting anomalous events eg fire flood have stricter response time requirements over cqs which are for logging and keeping statistical information of physical phenomena in this work we are proposing the continuous query class cqc scheduler new scheduling policy which employs two level scheduling that is able to handle different ranks of cq classes it provides the lowest response times for classes of critical cqs while at the same time keeping reasonable response times for the other classes down the rank we have implemented cqc in the aqsios prototype dsms and evaluated it against existing scheduling policies under different workloads
current loop buffer organizations for very large instruction word processors are essentially centralized as consequence they are energy inefficient and their scalability is limited to alleviate this problem we propose clustered loop buffer organization where the loop buffers are partitioned and functional units are logically grouped to form clusters along with two schemes for buffer control which regulate the activity in each cluster furthermore we propose design time scheme to generate clusters by analyzing an application profile and grouping closely related functional units the simulation results indicate that the energy consumed in the clustered loop buffers is on average percent lower than the energy consumed in an uncompressed centralized loop buffer scheme percent lower than centralized compressed loop buffer scheme and percent lower than randomly clustered loop buffer scheme
seed sets are of significant importance for trust propagation based anti spamming algorithms eg trustrank conventional approaches require manual evaluation to construct seed set which restricts the seed set to be small in size since it would cost too much and may even be impossible to construct very large seed set manually the small sized seed set can cause detrimental effect on the final ranking results thus it is desirable to automatically expand an initial seed set to much larger one in this paper we propose the first automatic seed set expansion algorithm ase which expands small seed set by selecting reputable seeds that are found and guaranteed to be reputable through joint recommendation link structure experimental results on the webspam dataset show that with the same manual evaluation efforts ase can automatically obtain large number of reputable seeds with high precision thus significantly improving the performance of the baseline algorithm in terms of both reputable site promotion and spam site demotion
in this article we study the trade offs in designing efficient caching systems for web search engines we explore the impact of different approaches such as static vs dynamic caching and caching query results vs caching posting lists using query log spanning whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers we propose new algorithm for static caching of posting lists which outperforms previous methods we also study the problem of finding the optimal way to split the static cache between answers and posting lists finally we measure how the changes in the query log influence the effectiveness of static caching given our observation that the distribution of the queries changes slowly over time our results and observations are applicable to different levels of the data access hierarchy for instance for memory disk layer or broker remote server layer
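a sketch of one simple way to fill a static posting list cache offline from a query log terms are ranked by query frequency divided by posting list length and admitted greedily until the budget is used the data structures and the toy numbers are illustrative not the paper's experimental setup

# sketch of a static posting-list cache filled offline from a query log:
# rank each term by query frequency divided by posting-list length and
# admit greedily until the cache budget is exhausted
def build_static_cache(term_freq, posting_len, cache_budget):
    """term_freq: term -> occurrences in the query log
       posting_len: term -> posting-list size (same units as the budget)"""
    ranked = sorted(term_freq,
                    key=lambda t: term_freq[t] / posting_len[t],
                    reverse=True)
    cache, used = set(), 0
    for term in ranked:
        if used + posting_len[term] <= cache_budget:
            cache.add(term)
            used += posting_len[term]
    return cache

cache = build_static_cache(
    term_freq={"news": 900, "python": 400, "zebra": 5},
    posting_len={"news": 300, "python": 80, "zebra": 10},
    cache_budget=120)
print(cache)   # {'python', 'zebra'} under this toy budget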
the goal of model driven engineering is to raise the level of abstraction by shifting the focus to models as result complex software development activities move to the modelling level as well one such activity is model refactoring technique for restructuring the models in order to improve some quality attributes of the models as first contribution of this paper we argue and show that refactoring model is enabled by inconsistency detection and resolution inconsistencies in or between models occur since models typically describe software system from different viewpoints and on different levels of abstraction second contribution of this paper is rule based inconsistency resolution which enables reuse of different inconsistency resolutions across model refactorings and manages the flow of inconsistency resolution steps automatically
several benchmarks for measuring the memory performance of hpc systems along dimensions of spatial and temporal memory locality have recently been proposed however little is understood about the relationships of these benchmarks to real applications and to each other we propose methodology for producing architecture neutral characterizations of the spatial and temporal locality exhibited by the memory access patterns of applications we demonstrate that the results track intuitive notions of locality on several synthetic and application benchmarks we employ the methodology to analyze the memory performance components of the hpc challenge benchmarks the apex map benchmark and their relationships to each other and other benchmarks and applications we show that this analysis can be used to both increase understanding of the benchmarks and enhance their usefulness by mapping them along with applications to space along axes of spatial and temporal locality
the popularity of social bookmarking sites has made them prime targets for spammers many of these systems require an administrator’s time and energy to manually filter or remove spam here we discuss the motivations of social spam and present study of automatic detection of spammers in social tagging system we identify and analyze six distinct features that address various properties of social spam finding that each of these features provides for helpful signal to discriminate spammers from legitimate users these features are then used in various machine learning algorithms for classification achieving over accuracy in detecting social spammers with false positives these promising results provide new baseline for future efforts on social spam we make our dataset publicly available to the research community
the literature on web browsing indicates that older adults exhibit number of deficiencies when compared with younger users but have we perhaps been looking at the question in the wrong way when considering technology skills of older users what are the strengths of older users that can be leveraged to support technology use this paper considers cognitive aging with respect to distinctions in abilities that decline and those that do not with age look at specific abilities and their interactions may serve to help designers create software that meets the needs of older users
we developed an inquiry technique which we called paratype based on experience prototyping and event contingent experience sampling to survey people in real life situations about ubiquitous computing ubicomp technology we used this tool to probe the opinions of the conversation partners of users of the personal audio loop memory aid that can have strong impact on their privacy we present the findings of this study and their implications specifically the need to broaden public awareness of ubicomp applications and the unfitness of traditional data protection guidelines for tackling the privacy issues of many ubicomp applications we also point out benefits and methodological issues of paratypes and discuss why they are particularly fit for studying certain classes of mobile and ubicomp applications
data exchange deals with inserting data from one database into another database having different schema fagin et al have shown that among the universal solutions of solvable data exchange problem there exists up to isomorphism unique most compact one the core and have convincingly argued that this core should be the database to be materialized they stated as an important open problem whether the core can be computed in polynomial time in the general setting where the mapping between the source and target schemas is given by source to target constraints that are arbitrary tuple generating dependencies tgds and target constraints consisting of equality generating dependencies egds and weakly acyclic set of tgds in this article we solve this problem by developing new methods for efficiently computing the core of universal solution this positive result shows that data exchange based on cores is feasible and applicable in very general setting in addition to our main result we use the method of hypertree decompositions to derive new algorithms and upper bounds for query containment checking and computing cores of arbitrary database instances we also show that computing the core of data exchange problem is fixed parameter intractable with respect to number of relevant parameters and that computing cores is np complete if the rule bodies of target tgds are augmented by special predicate that distinguishes null value from constant data value
recently adaptive random testing through iterative partitioning ip art has been proposed as random testing method that is more effective than pure random testing besides this it is supposed to be equally effective as very good random testing techniques namely distance based adaptive random testing and restricted random testing while only having between linear and quadratic runtime in the present paper it is investigated what influence the ratio of width and height of rectangular input domain has on the effectiveness of various adaptive random testing methods based on our findings an improved version of ip art is proposed the effectiveness of the new method is also analyzed for various ratios of width and height of the input domain
model checking is emerging as practical tool for detecting logical errors in early stages of system design we investigate the model checking of hierarchical nested systems ie finite state machines whose states themselves can be other machines this nesting ability is common in various software design methodologies and is available in several commercial modeling tools the straightforward way to analyze hierarchical machine is to flatten it thus incurring an exponential blow up and apply model checking tool on the resulting ordinary fsm we show that this flattening can be avoided we develop algorithms for verifying linear time requirements whose complexity is polynomial in the size of the hierarchical machine we address also the verification of branching time requirements and provide efficient algorithms and matching lower bounds
gossip protocols have been successfully applied in the last few years to address wide range of functionalities so far however very few software frameworks have been proposed to ease the development and deployment of these gossip protocols to address this issue this paper presents gossipkit an event driven framework that provides generic and extensible architecture for the development of re configurable gossip oriented middleware gossipkit is based on generic interaction model for gossip protocols and relies on fine grained event mechanism to facilitate configuration and reconfiguration and promote code reuse
matrix clocks are generalization of the notion of vector clocks that allows the local representation of causal precedence to reach into an asynchronous distributed computation’s past with depth where is an integer maintaining matrix clocks correctly in system of nodes requires that every message be accompanied by nx numbers which reflects an exponential dependency of the complexity of matrix clocks upon the desired depth we introduce novel type of matrix clock one that requires only nx numbers to be attached to each message while maintaining what for many applications may be the most significant portion of the information that the original matrix clock carries in order to illustrate the new clock’s applicability we demonstrate its use in the monitoring of certain resource sharing computations
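for reference a sketch of the classic depth two matrix clock that this abstract generalizes the compressed clock proposed in the paper is not reproduced here only the standard piggyback and merge rules

# classic depth-two matrix clock sketch (the structure the abstract
# generalizes): node i keeps clock[i][j] = what i knows about j's
# knowledge; the compressed clock of the paper is not reproduced here
class MatrixClock:
    def __init__(self, node_id, n):
        self.i = node_id
        self.clock = [[0] * n for _ in range(n)]
    def local_event(self):
        self.clock[self.i][self.i] += 1
    def send(self):
        self.local_event()
        return [row[:] for row in self.clock]   # piggyback a copy
    def receive(self, sender, m):
        n = len(self.clock)
        # take the component-wise maximum of both matrices ...
        for a in range(n):
            for b in range(n):
                self.clock[a][b] = max(self.clock[a][b], m[a][b])
        # ... then record what we now directly know about every node
        for b in range(n):
            self.clock[self.i][b] = max(self.clock[self.i][b], m[sender][b])
        self.local_event()

a, b = MatrixClock(0, 3), MatrixClock(1, 3)
b.receive(0, a.send())
print(b.clock)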
the problem of finding specified pattern in time series database ie query by content has received much attention and is now relatively mature field in contrast the important problem of enumerating all surprising or interesting patterns has received far less attention this problem requires meaningful definition of surprise and an efficient search technique all previous attempts at finding surprising patterns in time series use very limited notion of surprise and or do not scale to massive datasets to overcome these limitations we propose novel technique that defines pattern surprising if the frequency of its occurrence differs substantially from that expected by chance given some previously seen data this notion has the advantage of not requiring the user to explicitly define what is surprising pattern which may be hard or perhaps impossible to elicit from domain expert instead the user gives the algorithm collection of previously observed normal data our algorithm uses suffix tree to efficiently encode the frequency of all observed patterns and allows markov model to predict the expected frequency of previously unobserved patterns once the suffix tree has been constructed measure of surprise for all the patterns in new database can be determined in time and space linear in the size of the database we demonstrate the utility of our approach with an extensive experimental evaluation
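a small sketch of the surprise idea a plain dictionary of substring counts stands in for the suffix tree and a simple markov style estimate supplies the expected frequency of patterns never seen in the reference data the exact estimator and normalization used in the paper may differ

# count substring frequencies of a discretized reference series and
# estimate the expected frequency of an unseen pattern with a
# markov-style estimate; surprise compares observed and expected rates
from collections import defaultdict

def substring_counts(s, max_len):
    counts = defaultdict(int)
    for i in range(len(s)):
        for l in range(1, max_len + 1):
            if i + l <= len(s):
                counts[s[i:i + l]] += 1
    return counts

def expected_count(pattern, ref_counts):
    if pattern in ref_counts:
        return ref_counts[pattern]
    # markov estimate: E[w] = c(w[:-1]) * c(w[1:]) / c(w[1:-1])
    num = ref_counts.get(pattern[:-1], 0) * ref_counts.get(pattern[1:], 0)
    den = ref_counts.get(pattern[1:-1], 0)
    return num / den if den else 0.0

def surprise(pattern, new_counts, ref_counts, ref_len, new_len):
    observed = new_counts.get(pattern, 0) / new_len
    expected = expected_count(pattern, ref_counts) / ref_len
    return observed - expected

ref = "abcabcabcabc"
new = "abcabxabcabc"
rc, nc = substring_counts(ref, 3), substring_counts(new, 3)
print(surprise("abx", nc, rc, len(ref), len(new)))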
this paper introduces virtual reality vr based automated end user behavioural assessment tool required during product and system development process in the proposed method while vr is used as means of interactive system prototype data collection and analysis are handled by an event based system which is adopted from human computer interaction and data mining literature the overall objective of the study is developing an intelligent support system where physical prototyping and the needs for human involvement as instructors data collectors and analysts during the user assessment studies are eliminated the proposed method is tested on product design example using subjects the experimental results clearly demonstrate the effectiveness of the proposed user assessment tool
one of the challenging problems that web service technology faces is the ability to effectively discover services based on their capabilities we present an approach to tackling this problem in the context of description logics dls we formalize service discovery as new instance of the problem of rewriting concepts using terminologies we call this new instance the best covering problem we provide formalization of the best covering problem in the framework of dl based ontologies and propose hypergraph based algorithm to effectively compute best covers of given request we propose novel matchmaking algorithm that takes as input service request or query and an ontology of services and finds set of services called best cover of the request whose descriptions contain as much common information with the request as possible and as little extra information with respect to the request as possible we have implemented the proposed discovery technique and used the developed prototype in the context of the multilingual knowledge based european electronic marketplace mkbeem project
testing large scale distributed systems is challenge because some errors manifest themselves only after distributed sequence of events that involves machine and network failures ds is checker that allows developers to specify predicates on distributed properties of deployed system and that checks these predicates while the system is running when ds finds problem it produces the sequence of state changes that led to the problem allowing developers to quickly find the root cause developers write predicates in simple and sequential programming style while ds checks these predicates in distributed and parallel manner to allow checking to be scalable to large systems and fault tolerant by using binary instrumentation ds works transparently with legacy systems and can change predicates to be checked at runtime an evaluation with deployed systems shows that ds can detect non trivial correctness and performance bugs at runtime and with low performance overhead less than
constraint based placement tools and their use in diagramming tools has been investigated for decades one of the most important and natural placement constraints in diagrams is that their graphic elements do not overlap however non overlap of objects especially non convex objects is difficult to solve and in particular to solve sufficiently rapidly for direct manipulation here we present the first practical approach for solving non overlap of possibly non convex objects in conjunction with other placement constraints such as alignment and distribution our methods are based on approximating the non overlap constraint by smoothly changing linear approximation we have found that this in combination with techniques for lazy addition of constraints is rapid enough to support direct manipulation in reasonably sized diagrams
semantic spaces encode similarity relationships between objects as function of position in mathematical space this paper discusses three different formulations for building semantic spaces which allow the automatic annotation and semantic retrieval of images the models discussed in this paper require that the image content be described in the form of series of visual terms rather than as continuous feature vector the paper also discusses how these term based models compare to the latest state of the art continuous feature models for auto annotation and retrieval
applying coordination mechanisms to handle interdependencies that exist between agents in multi agent systems mass is an important issue in this paper two levels mas modeling scheme and language to describe mas plan based on interdependencies between agents plans are proposed initially generic study of possible interdependencies between agents in mass is presented followed by the formal modeling using colored petri nets of coordination mechanisms for those dependencies these mechanisms control the dependencies between agents to avoid unsafe interactions where individual agents plans are merged into global multi agent plan this separation managed by the coordination mechanisms offers more powerful modularity in mass modeling
television is increasingly viewed through computers in the form of downloaded or streamed content yet computer based television consumption has received little attention in hci in this paper we describe study of the uses and practices of tech savvy college students studying their television consumption through the internet we find that users personalize their viewing but that tv is still richly social experience not as communal watching but instead through communication around television programs we explore new possibilities for technology based interaction around television
in recent years it is increasingly common to see application specific instruction set processors asips used in embedded system designs these asips can offer the ability of customizing hardware computation accelerators for an application domain along with instruction set extensions ises the customized accelerators can significantly improve the performance of embedded processors which has already been exemplified in previous research work and industrial products however these accelerators in asips can only accelerate the applications that are compiled with ises those applications compiled without ises cannot benefit from the hardware accelerators at all in this paper we propose using software dynamic binary translation to overcome this problem ie dynamically utilizing the accelerators unlike static approach dynamically utilizing accelerator poses many new problems this paper comprehensively explores the techniques and design choices for solving these problems and demonstrates the effectiveness by the results of experiments
this article is survey of methods for measuring agreement among corpus annotators it exposes the mathematics and underlying assumptions of agreement coefficients covering krippendorff’s alpha as well as scott’s pi and cohen’s kappa discusses the use of coefficients in several annotation tasks and argues that weighted alpha like coefficients traditionally less used than kappa like measures in computational linguistics may be more appropriate for many corpus annotation tasks but that their use makes the interpretation of the value of the coefficient even harder
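a worked example of cohen's kappa one of the coefficients surveyed here observed agreement is corrected by the agreement expected if each annotator labelled independently according to their own marginal distribution the weighted alpha like coefficients discussed in the article generalize the chance model and add distance weights between categories

# cohen's kappa for two annotators over the same items:
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
from collections import Counter

def cohens_kappa(ann1, ann2):
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    labels = set(c1) | set(c2)
    # chance agreement assumes each annotator labels independently
    # according to their own marginal label distribution
    chance = sum((c1[l] / n) * (c2[l] / n) for l in labels)
    return (observed - chance) / (1 - chance)

ann1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(ann1, ann2), 3))   # ~0.333 for this toy data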
cloud computing offers users the ability to access large pools of computational and storage resources on demand multiple commercial clouds already allow businesses to replace or supplement privately owned it assets alleviating them from the burden of managing and maintaining these facilities however there are issues that must be addressed before this vision of utility computing can be fully realized in existing systems customers are charged based upon the amount of resources used or reserved but no guarantees are made regarding the application level performance or quality of service qos that the given resources will provide as cloud providers continue to utilize virtualization technologies in their systems this can become problematic in particular the consolidation of multiple customer applications onto multicore servers introduces performance interference between collocated workloads significantly impacting application qos to address this challenge we advocate that the cloud should transparently provision additional resources as necessary to achieve the performance that customers would have realized if they were running in isolation accordingly we have developed clouds qos aware control framework that tunes resource allocations to mitigate performance interference effects clouds uses online feedback to build multi input multi output mimo model that captures performance interference interactions and uses it to perform closed loop resource management in addition we utilize this functionality to allow applications to specify multiple levels of qos as application states for such applications clouds dynamically provisions underutilized resources to enable elevated qos levels thereby improving system efficiency experimental evaluations of our solution using benchmark applications illustrate the benefits performance interference is mitigated completely when feasible and system utilization is improved by up to using states
we show that type system based on the intuitionistic modal logic provides an expressive framework for specifying and analyzing computation stages in the context of typed λ calculi and functional languages we directly demonstrate the sense in which our →□ calculus captures staging and also give conservative embedding of nielson and nielson’s two level functional language in our functional language mini ml□ thus proving that binding time correctness is equivalent to modal correctness on this fragment in addition mini ml□ can also express immediate evaluation and sharing of code across multiple stages thus supporting run time code generation as well as partial evaluation
the performance of sensor network may be best judged by the quality of application specific information return the actual sensing performance of deployed sensor network depends on several factors which cannot be accounted at design time such as environmental obstacles to sensing we propose the use of mobility to overcome the effect of unpredictable environmental influence and to adapt to run time dynamics now mobility with its dependencies such as precise localization and navigation is expensive in terms of hardware resources and energy constraints and may not be feasible in compact densely deployed and widespread sensor nodes we present method based on low complexity and low energy actuation primitives which are feasible for implementation in sensor networks we prove how these primitives improve the detection capabilities with theoretical analysis extensive simulations and real world experiments the significant coverage advantage recurrent in our investigation justifies our own and other parallel ongoing work in the implementation and refinement of self actuated systems
in the age of information and knowledge the role of business companies is not only just to sell products and services but also to educate their consumers educating consumers with extremely diversified backgrounds about technical knowledge is an important part of competitive advantage to succeed modern companies need to develop the best education technologies and teaching philosophies intelligent agents can play the right role in the personalised education of online consumers because intelligent agents can learn the consumers learning preferences and styles and recommend the best learning strategies educated and informed consumers are smarter and more confident in their buying decisions this paper investigates using intelligent agents to support consumer learning and proposes an agent based consumer learning support framework
quality of service qos information for web services is essential to qos aware service management and composition currently most qos aware solutions assume that the qos for component services is readily available and that the qos for composite services can be computed from the qos for component services the issue of how to obtain the qos for component services has largely been overlooked in this paper we tackle this fundamental issue we argue that most of qos metrics can be observed computed based on service operations we present the design and implementation of high performance qos monitoring system the system is driven by qos observation model that defines it and business level metrics and associated evaluation formulas integrated into the soa infrastructure at large the monitoring system can detect and route service operational events systemically further model driven hybrid compilation interpretation approach is used in metric computation to process service operational events and maintain metrics efficiently experiments suggest that our system can support high event processing throughput and scales to the number of cpus
dynamic branch prediction plays key role in delivering high performance in the modern microprocessors the cycles between the prediction of branch and its execution constitute the branch misprediction penalty because misprediction can be detected only after the branch executes branch misprediction penalty depends not only on the depth of the pipeline but also on the availability of branch operands fetched branches belonging to the dependence chains of loads that miss in the data cache exhibit very high misprediction penalty due to the delay in the execution resulting from unavailability of operands we call these the long latency branches it has been speculated that predicting such branches accurately or identifying such mispredicted branches before they execute would be beneficial in this paper we show that in traditional pipeline the frequency of mispredicted long latency branches is extremely small therefore predicting all these branches correctly does not offer any performance improvement architectures that allow checkpoint assisted speculative load retirement fetch large number of branches belonging to the dependence chains of the speculatively retired loads accurate prediction of these branches is extremely important for staying on the correct path we show that even if all the branches belonging to the dependence chains of the loads that miss in the data cache are predicted correctly only four applications out of twelve control speculation sensitive applications selected from the specint and biobench suites exhibit visible performance improvement this is an upper bound on the achievable performance improvement in these architectures this article concludes that it may not be worth designing specialized hardware to improve the prediction accuracy of the long latency branches
many functional programs with accumulating parameters are contained in the class of macro tree transducers we present program transformation technique that can be used to solve the efficiency problems due to creation and consumption of intermediate data structures in compositions of such functions where classical deforestation techniques fail to do so given two macro tree transducers under appropriate restrictions we construct single macro tree transducer that implements the composition of the two original ones the imposed restrictions are more liberal than those in the literature on macro tree transducer composition thus generalising previous results
the aim of this paper is to study the effect of local memory hierarchy and communication network exploitation on message sending and the influence of this effect on the decomposition of regular applications in particular we have considered two different parallel computers cray te and an sgi origin in both systems the bandwidth reduction due to non unit stride memory access is quite significant and could be more important than the reduction due to contention in the network these conclusions affect the choice of optimal decompositions for regular domains problems thus although traditional decompositions lead to lower inherent communication to computation ratios and could exploit more efficiently the interconnection network lower dimensional decompositions are found to be more efficient due to the data decomposition effects on the spatial locality of the messages to be communicated this increasing importance of local optimisations has also been shown using well known communication computation overlapping technique which increases execution time instead of reducing it as we could expect due to poor cache memory exploitation
typechecking consists of statically verifying whether the output of an xml transformation always conforms to an output type for documents satisfying given input type in this general setting both the input and output schema as well as the transformation are part of the input for the problem however scenarios where the input or output schema can be considered to be fixed are quite common in practice in the present work we investigate the computational complexity of the typechecking problem in the latter setting
attribute grammars add specification of static semantic properties to context free grammars which in turn describe the syntactic structure of program units however context free grammars cannot express programming in the large features common in modern programming languages including unordered collections of units included units and sharing of included units we present extensions to context free grammars and corresponding extensions to attribute grammars suitable for defining such features we explain how batch and incremental attribute evaluation algorithms can be adapted to support these extensions resulting in uniform approach to intraunit and interunit static semantic analysis and translation of multiunit programs
identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications especially of those with desire to be undetectable the diminished effectiveness of port based identification and the overheads of deep packet inspection approaches motivate us to classify traffic by exploiting distinctive flow characteristics of applications when they communicate on network in this paper we explore this latter approach and propose semi supervised classification method that can accommodate both known and unknown applications to the best of our knowledge this is the first work to use semi supervised learning techniques for the traffic classification problem our approach allows classifiers to be designed from training data that consists of only few labeled and many unlabeled flows we consider pragmatic classification issues such as longevity of classifiers and the need for retraining of classifiers our performance evaluation using empirical internet traffic traces that span month period shows that high flow and byte classification accuracy ie greater than can be achieved using training data that consists of small number of labeled and large number of unlabeled flows presence of mice and elephant flows in the internet complicates the design of classifiers especially of those with high byte accuracy and necessitates the use of weighted sampling techniques to obtain training flows and retraining of classifiers is necessary only when there are non transient changes in the network usage characteristics as proof of concept we implement prototype offline and realtime classification systems to demonstrate the feasibility of our approach
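a sketch of the cluster then label idea behind this kind of semi supervised classification flows are clustered on their statistics using both labeled and unlabeled examples and each cluster inherits the majority application among its labeled members the features number of clusters and toy data are illustrative assumptions

# cluster labeled and unlabeled flows together on flow statistics,
# then map each cluster to the majority application among its labeled
# members; feature choice and k are illustrative
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def train(flows, labels, k=4):
    """flows: (n, d) array of flow features (e.g. packet sizes, durations)
       labels: list of length n, application name or None if unlabeled"""
    assignments = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(flows)
    cluster_label = {}
    for c in range(k):
        labeled = [labels[i] for i in range(len(labels))
                   if assignments[i] == c and labels[i] is not None]
        # clusters with no labeled member stay "unknown" and can be
        # flagged for later labeling
        cluster_label[c] = Counter(labeled).most_common(1)[0][0] if labeled else "unknown"
    return assignments, cluster_label

rng = np.random.default_rng(0)
web = rng.normal([500, 0.2], [20, 0.05], size=(20, 2))
p2p = rng.normal([1400, 30.0], [50, 5.0], size=(20, 2))
flows = np.vstack([web, p2p])
labels = ["web"] * 3 + [None] * 17 + ["p2p"] * 3 + [None] * 17
_, mapping = train(flows, labels, k=2)
print(mapping)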
state of the art statistical nlp systems for variety of tasks learn from labeled training data that is often domain specific however there may be multiple domains or sources of interest on which the system must perform for example spam filtering system must give high quality predictions for many users each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam rather than learning separate models for each domain we explore systems that learn across multiple domains we develop new multi domain online learning framework based on parameter combination from multiple classifiers our algorithms draw from multi task learning and domain adaptation to adapt multiple source domain classifiers to new target domain learn across multiple similar domains and learn across large number of disparate domains we evaluate our algorithms on two popular nlp domain adaptation tasks sentiment classification and spam filtering
time synchronization is fundamental middleware service for any distributed system wireless sensor networks make extensive use of synchronized time in many contexts eg data fusion tdma schedules synchronized sleep periods etc we propose time synchronization method relevant for wireless sensor networks the solution features minimal complexity in network bandwidth storage as well as processing and can achieve good accuracy especially relevant for sensor networks it also provides tight deterministic bounds on offset and clock drift method for synchronizing the entire network is presented the performance of the algorithm is analyzed theoretically and validated on realistic testbed the results show that the proposed algorithm outperforms existing algorithms in terms of precision and resource requirements
using large set of human segmented natural images we study the statistics of region boundaries we observe several power law distributions which likely arise from both multi scale structure within individual objects and from arbitrary viewing distance accordingly we develop scale invariant representation of images from the bottom up using piecewise linear approximation of contours and constrained delaunay triangulation to complete gaps we model curvilinear grouping on top of this graphical geometric structure using conditional random field to capture the statistics of continuity and different junction types quantitative evaluations on several large datasets show that our contour grouping algorithm consistently dominates and significantly improves on local edge detection
negative bias temperature instability nbti which reduces the lifetime of pmos transistors is becoming growing reliability concern for sub micrometer cmos technologies parametric variation introduced by nano scale device fabrication inaccuracy can exacerbate the pmos transistor wear out problem and further reduce the reliable lifetime of microprocessors in this work we propose microarchitecture design techniques to combat the combined effect of nbti and process variation pv on the reliability of high performance microprocessors experimental evaluation shows our proposed process variation aware pv aware nbti tolerant microarchitecture design techniques can considerably improve the lifetime of reliability operation while achieving an attractive trade off with performance and power
quantitative association rule qar mining has been recognized an influential research problem over the last decade due to the popularity of quantitative databases and the usefulness of association rules in real life unlike boolean association rules bars which only consider boolean attributes qars consist of quantitative attributes which contain much richer information than the boolean attributes however the combination of these quantitative attributes and their value intervals always gives rise to the generation of an explosively large number of itemsets thereby severely degrading the mining efficiency in this paper we propose an information theoretic approach to avoid unrewarding combinations of both the attributes and their value intervals being generated in the mining process we study the mutual information between the attributes in quantitative database and devise normalization on the mutual information to make it applicable in the context of qar mining to indicate the strong informative relationships among the attributes we construct mutual information graph mi graph whose edges are attribute pairs that have normalized mutual information no less than predefined information threshold we find that the cliques in the mi graph represent majority of the frequent itemsets we also show that frequent itemsets that do not form clique in the mi graph are those whose attributes are not informatively correlated to each other by utilizing the cliques in the mi graph we devise an efficient algorithm that significantly reduces the number of value intervals of the attribute sets to be joined during the mining process extensive experiments show that our algorithm speeds up the mining process by up to two orders of magnitude most importantly we are able to obtain most of the high confidence qars whereas the qars that are not returned by mic are shown to be less interesting
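a sketch of the mutual information pruning step discretized attributes are compared pairwise the mutual information is normalized here by the smaller entropy which is one common choice and not necessarily the paper's definition and cliques of the resulting graph are read off as candidate attribute sets

# pairwise mutual information over discretized attributes, thresholded
# into a graph whose cliques suggest informative attribute sets
import math
from collections import Counter
from itertools import combinations
import networkx as nx

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mi_graph(columns, threshold=0.3):
    """columns: dict attribute-name -> list of discretized values"""
    g = nx.Graph()
    g.add_nodes_from(columns)
    for a, b in combinations(columns, 2):
        mi = mutual_information(columns[a], columns[b])
        norm = min(entropy(columns[a]), entropy(columns[b])) or 1.0
        if mi / norm >= threshold:
            g.add_edge(a, b)
    return g

data = {"age":    [1, 1, 2, 2, 3, 3],
        "income": [1, 1, 2, 2, 3, 3],
        "zip":    [1, 2, 1, 2, 1, 2]}
g = mi_graph(data)
print(list(nx.find_cliques(g)))   # age and income form a clique, zip stays apart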
grid workflow is complex and typical grid application but owing to the highly dynamic feature of grid environments resource unavailability is increasingly becoming severe and poses great challenges to grid workflow scheduling though fault recovery mechanism adopted in grid system guarantee the completion of jobs to some extent but wasting system resources to overcome the shortcoming this paper proposes markov chain based grid node availability prediction model which can efficiently predict grid nodes availability in the future without adding significant overhead based on this model the paper presents grid workflow scheduling based on reliability cost rcgs the performance evaluation results demonstrate that rcgs improves the dependability of workflow execution and success ratio of tasks with low reliability cost
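a minimal two state markov sketch of the availability prediction idea transition probabilities are estimated from a node's observed up down history and used to score the chance the node stays available for the next few monitoring intervals the state space and the way rcgs folds this into reliability cost are simplified assumptions

# two-state markov availability predictor: estimate transition
# probabilities from up/down history and predict the chance the node
# stays available for the next h monitoring intervals
import numpy as np

def fit_transitions(history):
    """history: sequence of 1 (available) / 0 (unavailable) observations"""
    counts = np.ones((2, 2))            # laplace smoothing
    for prev, cur in zip(history, history[1:]):
        counts[prev][cur] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def prob_available_for(history, h):
    p = fit_transitions(history)
    # probability the node is up for all of the next h intervals,
    # starting from its current state
    state = history[-1]
    prob = 1.0
    for _ in range(h):
        prob *= p[state][1]
        state = 1
    return prob

history = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
print(round(prob_available_for(history, h=3), 3))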
microarchitecturally integrated on chip networks or micronets are candidates to replace busses for processor component interconnect in future processor designs for micronets tight coupling between processor microarchitecture and network architecture is one of the keys to improving processor performance this paper presents the design implementation and evaluation of the trips operand network opn the trips opn is dynamically routed mesh micronet that is integrated into the trips microprocessor core the trips opn is used for operand passing register file and primary memory system we discuss in detail the opn design including the unique features that arise from its integration with the processor core such as its connection to the execution unit’s wakeup pipeline and its in flight mis speculated traffic removal we then evaluate the performance of the network under synthetic and realistic loads finally we assess the processor performance implications of opn design decisions with respect to the end to end latency of opn packets and the opn’s bandwidth
requirements engineering is concerned with the identification of high level goals to be achieved by the system envisioned the refinement of such goals the operationalization of goals into services and constraints and the assignment of responsibilities for the resulting requirements to agents such as humans devices and programs goal refinement and operationalization is complex process which is not well supported by current requirements engineering technology ideally some form of formal support should be provided but formal methods are difficult and costly to apply at this stage this paper presents an approach to goal refinement and operationalization which is aimed at providing constructive formal support while hiding the underlying mathematics the principle is to reuse generic refinement patterns from library structured according to strengthening weakening relationships among patterns the patterns are once for all proved correct and complete they can be used for guiding the refinement process or for pointing out missing elements in refinement the cost inherent to the use of formal method is thus reduced significantly tactics are proposed to the requirements engineer for grounding pattern selection on semantic criteria the approach is discussed in the context of the multi paradigm language used in the kaos method this language has an external semantic net layer for capturing goals constraints agents objects and actions together with their links and an inner formal assertion layer that includes real time temporal logic for the specification of goals and constraints some frequent refinement patterns are highlighted and illustrated through variety of examples the general principle is somewhat similar in spirit to the increasingly popular idea of design patterns although it is grounded on formal framework here
significant effort is currently invested in application integration enabling business processes of different companies to interact and form complex multiparty processes web service standards based on wsdl web service definition language have been adopted as process to process communication paradigms however the conceptual modeling of applications using web services has not yet been addressed interaction with web services is often specified at the level of the source code thus web service interfaces are buried within programmatic specification in this article we argue that web services should be considered first class citizens in the specification of web applications thus service enabled web applications should benefit from the high level modeling and automatic code generation techniques that have long been advocated for web application design and implementation to this end we extend declarative model for specifying data intensive web applications in two directions high level modeling of web services and their interactions with the web applications which use them and ii modeling and specification of web applications implementing new complex web services our approach is fully implemented within case tool allowing the high level modeling and automatic deployment of service enabled web applications
we propose novel approach based on coinductive logic to specify type systems of programming languages the approach consists in encoding programs in horn formulas which are interpreted wrt their coinductive herbrand model we illustrate the approach by first specifying standard type system for small object oriented language similar to featherweight java then we define an idealized type system for variant of the language where type annotations can be omitted the type system involves infinite terms and proof trees not representable in finite way thus providing theoretical limit to type inference of object oriented programs since only sound approximations of the system can be implemented approximation is naturally captured by the notions of subtyping and subsumption indeed rather than increasing the expressive power of the system as it usually happens here subtyping is needed for approximating infinite non regular types and proof trees with regular ones
in high performance general purpose workstations and servers the workload can be typically constituted of both sequential and parallel applications shared bus shared memory multiprocessor can be used to speed up the execution of such workload in this environment the scheduler takes care of the load balancing by allocating ready process on the first available processor thus producing process migration process migration and the persistence of private data into different caches produce an undesired sharing named passive sharing the copies due to passive sharing produce useless coherence traffic on the bus and coping with such problem may represent challenging design problem for these machines many protocols use smart solutions to limit the overhead to maintain coherence among shared copies none of these studies treats passive sharing directly although some indirect effect is present while dealing with the other kinds of sharing affinity scheduling can alleviate this problem but this technique does not adapt to all load conditions especially when the effects of migration are massive we present simple coherence protocol that eliminates passive sharing using information from the compiler that is normally available in operating system kernels we evaluate the performance of this protocol and compare it against other solutions proposed in the literature by means of enhanced trace driven simulation we evaluate the complexity in terms of the number of protocol states additional bus lines and required software support our protocol further limits the coherence maintaining overhead by using information about access patterns to shared data exhibited in parallel applications
the finite mixture model is widely used in various statistical learning problems however the model obtained may contain large number of components making it inefficient in practical applications in this paper we propose to simplify the mixture model by minimizing an upper bound of the approximation error between the original and the simplified model under the use of the distance measure this is achieved by first grouping similar components together and then performing local fitting through function approximation the simplified model obtained can then be used as replacement of the original model to speed up various algorithms involving mixture models during training eg bayesian filtering belief propagation and testing eg kernel density estimation support vector machine svm testing encouraging results are observed in the experiments on density estimation clustering based image segmentation and simplification of svm decision functions
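a sketch of simplification by grouping and moment matching for a one dimensional gaussian mixture nearby components are merged into a single component that preserves the group's weight mean and variance the grouping rule and the error bound minimized in the paper are replaced by a simple mean distance threshold

# merge nearby 1-d gaussian components by moment matching so the
# simplified mixture keeps each group's weight, mean and variance
import numpy as np

def simplify_mixture(weights, means, variances, merge_dist=1.0):
    """1-d gaussian mixture given as parallel arrays"""
    order = np.argsort(means)
    groups, current = [], [order[0]]
    for idx in order[1:]:
        if means[idx] - means[current[-1]] <= merge_dist:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)

    new_w, new_m, new_v = [], [], []
    for g in groups:
        w = weights[g].sum()
        m = np.dot(weights[g], means[g]) / w
        # moment matching: keep the group's overall second moment
        v = np.dot(weights[g], variances[g] + (means[g] - m) ** 2) / w
        new_w.append(w); new_m.append(m); new_v.append(v)
    return np.array(new_w), np.array(new_m), np.array(new_v)

w = np.array([0.3, 0.3, 0.4])
mu = np.array([0.0, 0.5, 5.0])
var = np.array([1.0, 1.0, 2.0])
print(simplify_mixture(w, mu, var))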
pseudo relevance feedback is an effective technique for improving retrieval results traditional feedback algorithms use whole feedback document as unit to extract words for query expansion which is not optimal as document may cover several different topics and thus contain much irrelevant information in this paper we study how to effectively select from feedback documents those words that are focused on the query topic based on positions of terms in feedback documents we propose positional relevance model prm to address this problem in unified probabilistic way the proposed prm is an extension of the relevance model to exploit term positions and proximity so as to assign more weights to words closer to query words based on the intuition that words closer to query words are more likely to be related to the query topic we develop two methods to estimate prm based on different sampling processes experiment results on two large retrieval datasets show that the proposed prm is effective and robust for pseudo relevance feedback significantly outperforming the relevance model in both document based feedback and passage based feedback
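a sketch of the position based weighting intuition every position in a feedback document receives weight from a gaussian kernel centred on query term occurrences and each term accumulates the weight of the positions it occupies kernel width normalization and the estimation methods of the actual prm are simplified assumptions

# weight document positions by proximity to query-term occurrences and
# accumulate those weights per term as candidate expansion weights
import math
from collections import defaultdict

def positional_term_weights(doc_tokens, query_terms, sigma=5.0):
    q_pos = [i for i, t in enumerate(doc_tokens) if t in query_terms]
    weights = defaultdict(float)
    for i, term in enumerate(doc_tokens):
        # proximity of position i to all query-term positions
        prox = sum(math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for j in q_pos)
        weights[term] += prox
    total = sum(weights.values()) or 1.0
    return {t: w / total for t, w in weights.items()}

doc = ("solar panel efficiency depends on temperature and the inverter "
       "while unrelated text about cooking appears much later").split()
print(sorted(positional_term_weights(doc, {"solar", "panel"}).items(),
             key=lambda kv: -kv[1])[:5])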
most software product lines are first specified as an architecture high level description of what the overall family system structure is to be like and from which individual product architectures can be generated this structure however must be mapped onto implementation code stored in software configuration management system for it to be useful beyond the definitional phase of product line development various solutions have been developed to date but the recent emergence of change based approaches to product line architecture description challenges these existing solutions in this paper we characterize those challenges and present an alternative solution that relies on mapping to change based software configuration management system we motivate this choice discuss why it is more appropriate and carefully lay out its strengths and weaknesses relative to the existing state of the art
this paper addresses bottleneck problem in mobile file systems the propagation of updated large files from weakly connected client to its servers it proposes an efficient mechanism called operation shipping or operation based update propagation in the new mechanism the client ships the user operation that updated the large files rather than the files themselves across the weak network in contrast existing file systems use value shipping and ship the files the user operation is sent to surrogate client that is strongly connected to the servers the surrogate replays the user operation regenerates the files checks whether they are identical to the originals and if so sends the files to the servers on behalf of the client care has been taken such that the new mechanism does not compromise correctness or server scalability for example we show how forward error correction fec can restore minor reexecution discrepancies and thus make operation shipping work with more applications operation shipping can be further classified into two types application transparent and application aware their feasibilities and benefits have been demonstrated by the design implementation and evaluation of prototype extension to the coda file system in our controlled experiments operation shipping achieved substantial performance improvements network traffic reductions from times to nearly times and speedups in the range of times to nearly times
this paper presents conceptual navigation and navcon an architecture that implements this navigation in world wide web pages navcon architecture makes use of ontology as metadata to contextualise user’s search for information conceptual navigation is technique to browse websites within context context filters relevant retrieved information and it drives user’s navigation through paths that meet his needs based on ontologies navcon automatically inserts conceptual links in web pages these links permit the users to access graph representing concepts and their relationships browsing this graph it is possible to reach documents associated with user’s desired ontology concept
we present online algorithms to extract social context social spheres are labeled locations of significance represented as convex hulls extracted from gps traces colocation is determined from bluetooth and gps to extract social rhythms patterns in time duration place and people corresponding to real world activities social ties are formulated from proximity and shared spheres and rhythms quantitative evaluation is performed for million samples over man months applications are presented with assessment of perceived utility socio graph video and photo browser with filters for social metadata and jive blog browser that uses rhythms to discover similarity between entries automatically
in this paper we address the question of how we can identify hosts that will generate links to web spam detecting such spam link generators is important because almost all new spam links are created by them by monitoring spam link generators we can quickly find emerging web spam that can be used for updating existing spam filters in order to classify spam link generators we investigate various link based features including modified pagerank scores based on white and spam seeds and these scores of neighboring hosts an online learning algorithm is used to handle large scale data and the effectiveness of various features is examined experiments on three yearly archives of japanese web show that we can predict spam link generators with reasonable performance
an important issue in data warehouse development is the selection of set of views to materialize in order to accelerate on line analytical processing queries given certain space and maintenance time constraints existing methods provide good results but their high execution cost limits their applicability for large problems in this paper we explore the application of randomized local search algorithms to the view selection problem the efficiency of the proposed techniques is evaluated using synthetic datasets which cover wide range of data and query distributions the results show that randomized search methods provide near optimal solutions in limited time being robust to data and query skew furthermore they can be easily adapted for various versions of the problem including the simultaneous existence of size and time constraints and view selection in dynamic environments the proposed heuristics scale well with the problem size and are therefore particularly useful for real life warehouses which need to be analyzed by numerous business perspectives
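the flavour of randomized local search for this problem is captured by the sketch below, which adds or drops candidate views under a space limit and keeps the best configuration seen; the benefit function and move strategy are hypothetical placeholders for the cost model used in the paper

import random

def local_search(views, sizes, benefit, space_limit, iterations=10000):
    # views: list of view ids; sizes: dict view -> size
    # benefit: callable evaluating a set of views; returns the best set found
    selected, best, best_benefit = set(), set(), 0.0
    for _ in range(iterations):
        v = random.choice(views)
        candidate = set(selected)
        if v in candidate:
            candidate.remove(v)                      # try dropping a view
        elif sum(sizes[u] for u in candidate) + sizes[v] <= space_limit:
            candidate.add(v)                         # try adding a view if it fits
        else:
            continue
        if benefit(candidate) >= benefit(selected):  # accept non-worsening moves
            selected = candidate
        if benefit(selected) > best_benefit:
            best, best_benefit = set(selected), benefit(selected)
    return best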
in video on demand vod systems as the size of the buffer allocated to user requests increases initial latency and memory requirements increase hence the buffer size must be minimized the existing static buffer allocation scheme however determines the buffer size based on the assumption that the system is in the fully loaded state thus when the system is in partially loaded state the scheme allocates buffer larger than necessary to user request this paper proposes dynamic buffer allocation scheme that allocates to user requests buffers of the minimum size in partially loaded state as well as in the fully loaded state the inherent difficulty in determining the buffer size in the dynamic buffer allocation scheme is that the size of the buffer currently being allocated is dependent on the number of and the sizes of the buffers to be allocated in the next service period we solve this problem by the predict and enforce strategy where we predict the number and the sizes of future buffers based on inertia assumptions and enforce these assumptions at runtime any violation of these assumptions is resolved by deferring service to the violating new user request until the assumptions are satisfied since the size of the current buffer is dependent on the sizes of the future buffers the size is represented by recurrence equation we provide solution to this equation which can be computed at the system initialization time for runtime efficiency we have performed extensive analysis and simulation the results show that the dynamic buffer allocation scheme reduces initial latency averaged over the number of user requests in service from one to the maximum capacity to a fraction of that for the static one and by reducing the memory requirement increases the number of concurrent user requests to about times that of the static one when averaged over the amount of system memory available these results demonstrate that the dynamic buffer allocation scheme significantly improves the performance and capacity of vod systems
software architectures have the potential to substantially improve the development and evolution of large complex multi lingual multi platform long running systems however in order to achieve this potential specific techniques for architecture based modeling analysis and evolution must be provided furthermore one cannot fully benefit from such techniques unless support for mapping an architecture to an implementation also exists this paper motivates and presents one such approach which is an outgrowth of our experience with systems developed and evolved according to the architectural style we describe an architecture description language adl specifically designed to support architecture based evolution and discuss the kinds of evolution the language supports we then describe component based environment that enables modeling analysis and evolution of architectures expressed in the adl as well as mapping of architectural models to an implementation infrastructure the architecture of the environment itself can be evolved easily to support multiple adls kinds of analyses architectural styles and implementation platforms our approach is fully reflexive the environment can be used to describe analyze evolve and partially implement itself using the very adl it supports an existing architecture is used throughout the paper to provide illustrations and examples
conventional mobile social services such as loopt and google latitude rely on two classes of trusted relationships participants trust centralized server to manage their location information and trust between users is based on existing social relationships unfortunately these assumptions are not secure or general enough for many mobile social scenarios centralized servers cannot always be relied upon to preserve data confidentiality and users may want to use mobile social services to establish new relationships to address these shortcomings this paper describes smile privacy preserving missed connections service in which the service provider is untrusted and users are not assumed to have pre established social relationships with each other at high level smile uses short range wireless communication and standard cryptographic primitives to mimic the behavior of users in existing missed connections services such as craigslist trust is founded solely on anonymous users ability to prove to each other that they shared an encounter in the past we have evaluated smile using protocol analysis an informal study of craigslist usage and experiments with prototype implementation and found it to be both privacy preserving and feasible
we develop supervised dimensionality reduction method called lorentzian discriminant projection ldp for feature extraction and classification our method represents the structures of sample data by manifold which is furnished with lorentzian metric tensor different from classic discriminant analysis techniques ldp uses distances from points to their within class neighbors and global geometric centroid to model new manifold to detect the intrinsic local and global geometric structures of data set in this way both the geometry of group of classes and global data structures can be learnt from the lorentzian metric tensor thus discriminant analysis in the original sample space reduces to metric learning on lorentzian manifold we also establish the kernel tensor and regularization extensions of ldp in this paper the experimental results on benchmark databases demonstrate the effectiveness of our proposed method and the corresponding extensions
the lack of memory safety in many popular programming languages including and has been cause for great concern in the realm of software reliability verification and more recently system security major portion of known security attacks against software systems can be attributed to this shortcoming including the well known stack overflow heap overflow and format string attacks despite their limitations the flexibility performance and ease of use of these languages have made them the choice of most embedded software developers researchers have proposed various techniques to enhance programs for memory safety however they are all subject to severe performance penalties making their use impractical in most scenarios in this paper we present architectural enhancements to enable efficient memory safe execution of software on embedded processors the key insight behind our approach is to extend embedded processors with hardware that significantly accelerates the execution of the additional computations involved in memory safe execution specifically we design custom instructions to perform various kinds of memory safety checks and augment the instruction set of state of the art extensible processor xtensa from tensilica inc to implement them we demonstrate the application of the proposed architectural enhancements using ccured an existing tool for type safe retrofitting of programs the tool uses type inferencing engine that is built around strong type safety theory and is provably safe simulations of memory safe versions of popular embedded benchmarks on cycle accurate simulator modeling typical embedded system configuration indicate an average performance improvement of and maximum of when using the proposed architecture these enhancements entail minimal less than hardware overhead to the base processor our approach is completely automated and applicable to any program making it promising and practical approach for addressing the growing security and reliability concerns in embedded software
tens and eventually hundreds of processing cores are projected to be integrated onto future microprocessors making the global interconnect key component to achieving scalable chip performance within given power envelope while cmos compatible nanophotonics has emerged as leading candidate for replacing global wires beyond the nm timeframe on chip optical interconnect architectures proposed thus far are either limited in scalability or are dependent on comparatively slow electrical control networks in this paper we present phastlane hybrid electrical optical routing network for future large scale cache coherent multicore microprocessors the heart of the phastlane network is low latency optical crossbar that uses simple predecoded source routing to transmit cache line sized packets several hops in single clock cycle under contentionless conditions when contention exists the router makes use of electrical buffers and if necessary high speed drop signaling network overall phastlane achieve better network performance than state of the art electrical baseline while consuming less network power
this article focuses on the optimization of pcdm parallel two dimensional delaunay mesh generation application and its interaction with parallel architectures based on simultaneous multithreading smt processors we first present the step by step effect of series of optimizations on performance these optimizations improve the performance of pcdm by up to factor of six they target issues that very often limit the performance of scientific computing codes we then evaluate the interaction of pcdm with real smt based smp system using both high level metrics such as execution time and low level information from hardware performance counters
routing delays dominate other delays in current fpga designs we have proposed novel globally asynchronous locally synchronous gals fpga architecture called the gapla to deal with this problem in the gapla architecture the fpga area is divided into locally synchronous blocks and the communications between them are through asynchronous interfaces an automatic design flow is developed for the gapla architecture starting from behavioral description design is partitioned into smaller modules and fit to gapla synchronous blocks the asynchronous communications between modules are then synthesized the cad flow is parameterized in modeling the gapla architecture by manipulating the parameters we could study different factors of the designed gapla architecture our experimental results show an average of performance improvement could be achieved by the gapla architecture
we present non photorealistic rendering approach to capture and convey shape features of real world scenes we use camera with multiple flashes that are strategically positioned to cast shadows along depth discontinuities in the scene the projective geometric relationship of the camera flash setup is then exploited to detect depth discontinuities and distinguish them from intensity edges due to material discontinuities we introduce depiction methods that utilize the detected edge features to generate stylized static and animated images we can highlight the detected features suppress unnecessary details or combine features from multiple images the resulting images more clearly convey the structure of the imaged scenes we take very different approach to capturing geometric features of scene than traditional approaches that require reconstructing model this results in method that is both surprisingly simple and computationally efficient the entire hardware software setup can conceivably be packaged into self contained device no larger than existing digital cameras
database storage managers have long been able to efficiently handle multiple concurrent requests until recently however computer contained only few single core cpus and therefore only few transactions could simultaneously access the storage manager’s internal structures this allowed storage managers to use non scalable approaches without any penalty with the arrival of multicore chips however this situation is rapidly changing more and more threads can run in parallel stressing the internal scalability of the storage manager systems optimized for high performance at limited number of cores are not assured similarly high performance at higher core count because unanticipated scalability obstacles arise we benchmark four popular open source storage managers shore berkeleydb mysql and postgresql on modern multicore machine and find that they all suffer in terms of scalability we briefly examine the bottlenecks in the various storage engines we then present shore mt multithreaded and highly scalable version of shore which we developed by identifying and successively removing internal bottlenecks when compared to other dbms shore mt exhibits superior scalability and times higher absolute throughput than its peers we also show that designers should favor scalability to single thread performance and highlight important principles for writing scalable storage engines illustrated with real examples from the development of shore mt
in this paper multiobjective mo learning approach to image feature extraction is described where pareto optimal interest point ip detectors are synthesized using genetic programming gp ips are image pixels that are unique robust to changes during image acquisition and convey highly descriptive information detecting such features is ubiquitous to many vision applications eg object recognition image indexing stereo vision and content based image retrieval in this work candidate ip operators are automatically synthesized by the gp process using simple image operations and arithmetic functions three experimental optimization criteria are considered the repeatability rate the amount of global separability between ips and the information content captured by the set of detected ips the mo gp search considers pareto dominance relations between candidate operators perspective that has not been contemplated in previous research devoted to this problem the experimental results suggest that ip detection is an ill posed problem for which single globally optimum solution does not exist we conclude that the evolved operators outperform and dominate in the pareto sense all previously man made designs
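the pareto dominance bookkeeping used by such a search is simple to state in code; the sketch below filters candidate operators scored on the three criteria above, all assumed to be maximised, and is generic mo machinery rather than the gp synthesis itself

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly better in one
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    # candidates: list of (operator, (repeatability, separability, information))
    front = []
    for op, score in candidates:
        if not any(dominates(other, score) for _, other in candidates if other != score):
            front.append((op, score))
    return front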
memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone several promising software techniques have been shown to address this problem successfully in specific situations however the generality of these software approaches has been limited because current architectures do not provide fine grained low overhead mechanism for observing and reacting to memory behavior directly to fill this need we propose new class of memory operations called informing memory operations which essentially consist of memory operation combined either implicitly or explicitly with conditional branch and link operation that is taken only if the reference suffers cache miss we describe two different implementations of informing memory operations one based on cache outcome condition code and another based on low overhead traps and find that modern in order issue and out of order issue superscalar processors already contain the bulk of the necessary hardware support we describe how number of software based memory optimizations can exploit informing memory operations to enhance performance and look at cache coherence with fine grained access control as case study our performance results demonstrate that the runtime overhead of invoking the informing mechanism on the alpha and mips processors is generally small enough to provide considerable flexibility to hardware and software designers and that the cache coherence application has improved performance compared to other current solutions we believe that the inclusion of informing memory operations in future processors may spur even more innovative performance optimizations
in this paper we propose secure flexible robust and fully distributed signature service for ad hoc groups in order to provide the service we use new threshold scheme that allows to share secret key among the current group members the novelty of the scheme is in that it easily and efficiently enables dynamic increase of the threshold according to the needs of the group so that the service provides both adaptiveness to the level of threat the ad hoc group is subject to and availability we prove the correctness of the protocol and evaluate its efficiency the changes to the threshold are performed by using protocol that is efficient in terms of interactions among nodes and per node required resources resulting suitable even for resource constrained settings finally the same proposed scheme allows to detect nodes that attempt to disrupt the service providing invalid contributions to the distributed signature service
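for readers unfamiliar with the underlying building block, a standard (t, n) shamir sharing over a prime field is sketched below; this is textbook secret sharing, not the paper's dynamic threshold increase protocol, and the prime and interfaces are illustrative

import random

PRIME = 2 ** 127 - 1  # illustrative mersenne prime, large enough for demo secrets

def split(secret, threshold, shares):
    # evaluate a random degree (threshold - 1) polynomial at x = 1 .. shares
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, shares + 1)]

def reconstruct(points):
    # lagrange interpolation at x = 0; needs at least `threshold` distinct shares
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret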
user interface ui plasticity denotes ui adaptation to the context of use user platform physical and social environments while preserving usability in this article we focus on the use of model driven engineering and demonstrate how the intrinsic flexibility of this approach can be exploited by designers for ui prototyping as well as by end users in real settings for doing so the models developed at design time which convey high level design decisions are still available at run time as result an interactive system is not limited to set of linked pieces of code but is graph of models that evolves expresses and maintains multiple perspectives on the system from top level tasks to the final ui simplified version of home heating control system is used to illustrate our approach and technical implementation
models are subject to wide variety of processing operations such as compression simplification or watermarking which may introduce some geometric artifacts on the shape the main issue is to maximize the compression simplification ratio or the watermark strength while minimizing these visual degradations however few algorithms exploit the human visual system to hide these degradations while perceptual attributes could be quite relevant for this task particularly the masking effect defines the fact that one visual pattern can hide the visibility of another in this context we introduce an algorithm for estimating the roughness of mesh as local measure of geometric noise on the surface indeed textured or rough region is able to hide geometric distortions much better than smooth one our measure is based on curvature analysis on local windows of the mesh and is independent of the resolution connectivity of the object the accuracy and the robustness of our measure together with its relevance regarding visual masking have been demonstrated through extensive comparisons with state of the art and subjective experiment two applications are also presented in which the roughness is used to lead and improve respectively compression and watermarking algorithms
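a drastically simplified reading of such a roughness measure is the local variance of mean curvature over a one-ring neighbourhood, as sketched below; the curvature values and adjacency are assumed given and the multi-scale windowed analysis of the paper is not reproduced

def roughness(curvature, neighbours):
    # curvature: vertex -> mean curvature; neighbours: vertex -> list of adjacent vertices
    rough = {}
    for v, ring in neighbours.items():
        window = [curvature[v]] + [curvature[u] for u in ring]
        mean = sum(window) / len(window)
        rough[v] = sum((c - mean) ** 2 for c in window) / len(window)
    return rough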
this paper shows how basic human values are related to behavior patterns of the usage and production of mobile multimedia content for these purposes we applied an interview technique called laddering technique referring to the means end theory these in depth interviews establish relations between product characteristics attributes user behaviors consequences and basic values and user goals we carried out interviews with respondents we found that the entertainment of other people the exchange of content the desire to save time and strategies to influence one’s mood are the main driving forces for multimedia usage those are strongly related to basic values like social recognition pleasure and happiness as well as to ambition it is shown that usability aspects like an intuitive ui are strongly related to the users desire for being effective and ambitious summarizing we report the method’s applicability in the realm of mobile hci
metric learning is powerful approach for semi supervised clustering in this paper metric learning method considering both pairwise constraints and the geometrical structure of data is introduced for semi supervised clustering at first smooth metric is found based on an optimization problem using positive constraints as supervisory information then an extension of this method employing both positive and negative constraints is introduced as opposed to the existing methods the extended method has the capability of considering both positive and negative constraints while considering the topological structure of data the proposed metric learning method can improve performance of semi supervised clustering algorithms experimental results on real world data sets show the effectiveness of this method
in this paper we propose fully automatic dynamic scratch pad memory spm management technique for instructions our technique loads required code segments into the spm on demand at runtime our approach is based on postpass analysis and optimization techniques and it handles the whole program including libraries the code mapping is determined by solving mixed integer linear programming formulation that approximates our demand paging technique we increase the effectiveness of demand paging by extracting from functions natural loops that are smaller in size and have higher instruction fetch count the postpass optimizer analyzes the object files of an application and transforms them into an application binary image that enables demand paging to the spm we evaluate our technique on eleven embedded applications and compare it to processor core with an instruction cache in terms of its performance and energy consumption the cache size is about of the executed code size and the spm size is chosen such that its die area is equal to that of the cache the experimental results show that on average the processor core and memory subsystem’s energy consumption can be reduced by and the performance improved by moreover in comparison with the optimal static placement strategy our technique reduces energy consumption by and improves performance by on average
memory encryption has become common approach to providing secure processing environment but current schemes suffer from extra performance and storage overheads this paper presents predecryption as method of providing this security with less overhead by using well known prefetching techniques to retrieve data from memory and perform decryption before it is needed by the processor our results tested mostly on spec benchmarks show that using our predecryption scheme can actually result in no increase in execution time despite an extra cycle decryption latency per memory block access
this paper reviews issues concerning the design of adaptive protocols for parallel discrete event simulation pdes the need for adaptive protocols are motivated in the background of the synchronization problem that has driven much of the research in this field traditional conservative and optimistic protocols and their hybrid variants are also discussed adaptive synchronization protocols are reviewed with special reference to their characteristics regarding the aspects of the simulation state that influence the adaptive decisions and the control parameters used finally adaptive load management and scheduling strategies and their relationship to the synchronization protocol are discussed
fundamental problem of finding applications that are highly relevant to development tasks is the mismatch between the high level intent reflected in the descriptions of these tasks and low level implementation details of applications to reduce this mismatch we created an approach called exemplar executable examples archive for finding highly relevant software projects from large archives of applications after programmer enters natural language query that contains high level concepts eg mime data sets exemplar uses information retrieval and program analysis techniques to retrieve applications that implement these concepts our case study with professional java programmers shows that exemplar is more effective than sourceforge in helping programmers to quickly find highly relevant applications
envisioning new generation of sensor network applications in healthcare and workplace safety we seek mechanisms that provide timely and reliable transmissions of mission critical data inspired by the physics in magnetism we propose simple diffusion based data dissemination mechanism referred to as the magnetic diffusion md in that the data sink functioning like the magnet propagates the magnetic charge to set up the magnetic field under the influence of the magnetic field the sensor data functioning like the metallic nails are attracted towards the sink we compare md to the state of the art mechanisms and find that md performs the best in timely delivery of data achieves high data reliability in the presence of network dynamics and yet works as energy efficiently as the state of the art these suggest that md is an effective data dissemination solution to the mission critical applications
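one illustrative reading of the magnetic analogy is a hop-count gradient flooded from the sink, with data forwarded greedily towards stronger field values; the decay model below is an assumption for the sketch, not the exact md protocol

from collections import deque

def build_field(adjacency, sink, initial_charge=1.0, decay=0.5):
    # adjacency: node -> list of neighbours; returns node -> field strength
    field = {sink: initial_charge}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in field:
                field[nb] = field[node] * decay
                queue.append(nb)
    return field

def next_hop(adjacency, field, node):
    # forward data towards the neighbour with the strongest field
    return max(adjacency[node], key=lambda nb: field.get(nb, 0.0))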
fast algorithms can be created for many graph problems when instances are confined to classes of graphs that are recursively constructed this article first describes some basic conceptual notions regarding the design of such fast algorithms and then the coverage proceeds through several recursive graph classes specific classes include trees series parallel graphs terminal graphs treewidth graphs trees partial trees jackknife graphs pathwidth graphs bandwidth graphs cutwidth graphs branchwidth graphs halin graphs cographs cliquewidth graphs nlc graphs hb graphs and rankwidth graphs the definition of each class is provided typical algorithms are applied to solve problems on instances of most classes relationships between the classes are also discussed
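a small example of the kind of linear-time algorithm these recursive classes admit is maximum weight independent set on a tree, solved by dynamic programming over rooted subtrees as sketched below

def max_weight_independent_set(children, weight, root):
    # children: node -> list of child nodes; weight: node -> weight
    take, skip = {}, {}
    def solve(v):
        take[v] = weight[v]
        skip[v] = 0
        for c in children.get(v, []):
            solve(c)
            take[v] += skip[c]                    # if v is in the set its children are not
            skip[v] += max(take[c], skip[c])      # otherwise take the better option per child
        return max(take[v], skip[v])
    return solve(root)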
we explore how to manage database workloads that contain mixture of oltp like queries that run for milliseconds as well as business intelligence queries and maintenance tasks that last for hours as data warehouses grow in size to petabytes and complex analytic queries play greater role in day to day business operations factors such as inaccurate cardinality estimates data skew and resource contention all make it notoriously difficult to predict how such queries will behave before they start executing however traditional workload management assumes that accurate expectations for the resource requirements and performance characteristics of workload are available at compile time and relies on such information in order to make critical workload management decisions in this paper we describe our approach to dealing with inaccurate predictions first we evaluate the ability of workload management algorithms to handle workloads that include unexpectedly long running queries second we describe new and more accurate method for predicting the resource usage of queries before runtime we have carried out an extensive set of experiments and report on few of our results
software productivity has been steadily increasing over the past years but not enough to close the gap between the demands placed on the software industry and what the state of the practice can deliver nothing short of an order of magnitude increase in productivity will extricate the software industry from its perennial crisis several decades of intensive research in software engineering and artificial intelligence left few alternatives but software reuse as the only realistic approach to bring about the gains of productivity and quality that the software industry needs in this paper we discuss the implications of reuse on the production with an emphasis on the technical challenges software reuse involves building software that is reusable by design and building with reusable software software reuse includes reusing both the products of previous software projects and the processes deployed to produce them leading to wide spectrum of reuse approaches from the building blocks reusing products approach on one hand to the generative or reusable processor reusing processes on the other we discuss the implication of such approaches on the organization control and method of software development and discuss proposed models for their economic analysis software reuse benefits from methodologies and tools to build more readily reusable software and locate evaluate and tailor reusable software the last being critical for the building blocks approach both sets of issues are discussed in this paper with focus on application generators and oo development for the first and thorough discussion of retrieval techniques for software components component composition or bottom up design and transformational systems for the second we conclude by highlighting areas that in our opinion are worthy of further investigation
the parallelization of simulink applications is currently responsibility of the system designer and the superscalar execution of the processors state of the art simulink compilers excel at producing reliable and production quality embedded code but fail to exploit the natural concurrency available in the programs and to effectively use modern multi core architectures the reason may be that many simulink applications are replete with loop carried dependencies that inhibit most parallel computing techniques and compiler transformations in this paper we introduce the concept of strands that allow the data dependencies to be broken while preserving the original semantics of the simulink program our fully automatic compiler transformations create concurrent representation of the program and thread level parallelism for multi core systems is planned and orchestrated to improve single processor performance we also exploit fine grain equation level parallelism by level order scheduling inside each thread our strand transformation has been implemented as an automatic transformation in proprietary compiler and with realistic aeronautic model executed in two processors leads to an up to times speedup over uniprocessor execution while the existing manual parallelization method achieves times speedup
in this special interest group sig we plan to focus on discussions and activities surrounding the design of technologies to support families many researchers and designers study domestic routines to inform technology design create novel interactive systems and evaluate these systems through real world use bringing together researchers designers and practitioners interested in technologies for families at sig provides forum for discussing shared interests including methods for gaining an understanding of the user metrics for evaluating interventions and shared definitions of the concept of the family
various resources such as files and memory are associated with certain protocols about how they should be accessed for example memory cell that has been allocated should be eventually deallocated and after the deallocation the cell should no longer be accessed igarashi and kobayashi recently proposed general type based method to check whether program follows such resource access policies but their analysis was not precise enough for certain programs in this paper we refine their type based analysis by introducing new notion of time regions the resulting analysis combines the merits of two major previous approaches to type based analysis of resource usage linear type based and effect based approaches
the approach to representation and presentation of knowledge used in aries an environment to experiment with support for analysts in modeling target domains and in entering and formalizing system requirements is described to effectively do this aries must manage variety of notations so that analysts can enter information in natural manner and aries can present it back in different notations and from different viewpoints to provide this functionality single highly expressive internal representation is used for all information in the system the system architecture separates representation and presentation in order to localize consistency and propagation issues the presentation architecture is tailored to be flexible enough so that new notations can be easily introduced on top of the underlying representation presentation knowledge is coupled to specification evolution knowledge thereby leveraging common representations for both in order to provide automated focusing support to users who need informative guidance in creating and modifying specifications
we propose novel approach to the study of internet topology in which we use an optimization framework to model the mechanisms driving incremental growth while previous methods of topology generation have focused on explicit replication of statistical properties such as node hierarchies and node degree distributions our approach addresses the economic tradeoffs such as cost and performance and the technical constraints faced by single isp in its network design by investigating plausible objectives and constraints in the design of actual networks observed network properties such as certain hierarchical structures and node degree distributions can be expected to be the natural by product of an approximately optimal solution chosen by network designers and operators in short we advocate here essentially an approach to network topology design modeling and generation that is based on the concept of highly optimized tolerance hot in contrast with purely descriptive topology modeling this opens up new areas of research that focus on the causal forces at work in network design and aim at identifying the economic and technical drivers responsible for the observed large scale network behavior as result the proposed approach should have significantly more predictive power than currently pursued efforts and should provide scientific foundation for the investigation of other important problems such as pricing peering or the dynamics of routing protocols
the objective of this work is to authenticate individuals based on the appearance of their faces this is difficult pattern recognition problem because facial appearance is generally greatly affected by the changes in the way face is illuminated by the camera viewpoint and partial occlusions for example due to eye wear we describe fully automatic algorithm that systematically addresses each of these challenges the main novelty is an algorithm for decision level fusion of two types of imagery one acquired in the visual and one acquired in infrared electromagnetic spectrum specifically we examine the effects of preprocessing of data in each domain ii the fusion of holistic and local facial appearance and iii propose an algorithm for combining the similarity scores in visual and thermal spectra in the presence of prescription glasses and significant pose variations using small number of training images our system achieved high correct identification rate of on freely available data set containing extreme illumination changes
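the decision level fusion step can be illustrated with a short sketch that min-max normalises the two score sets and combines them with a weighted sum; the weight and normalisation are assumptions, not the combination rule of the paper, and both dictionaries are assumed to cover the same gallery identities

def normalise(scores):
    # min-max normalisation of a dict of similarity scores
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in scores.items()}

def fuse(visual_scores, thermal_scores, alpha=0.6):
    # scores: gallery identity -> similarity; returns the best matching identity
    vis, thr = normalise(visual_scores), normalise(thermal_scores)
    fused = {k: alpha * vis[k] + (1 - alpha) * thr[k] for k in vis}
    return max(fused, key=fused.get)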
this paper presents method for automatic real time face and mouth recognition using radial basis function neural networks rbfnn the proposed method uses the motion information to localize the face region and the face region is processed in ycbcr color space to determine the locations of the eyes the center of the mouth is determined relative to the locations of the eyes facial and mouth features are extracted using multiscale morphological erosion and dilation operations respectively the facial features are extracted relative to the locations of the eyes and mouth features are extracted relative to the locations of the eyes and mouth the facial and mouth features are given as input to radial basis function neural networks the rbfnn is used to recognize person in video sequences using face and mouth modalities the evidence from face and mouth modalities are combined using weighting rule and the result is used for identification and authentication the performance of the system using facial and mouth features is evaluated in real time in the laboratory environment and the system achieves recognition rate rr of and an equal error rate eer of about for subjects the performance of the system is also evaluated for xmvts database and the system achieves recognition rate rr of an equal error rate eer of about for subjects
in this paper new corner detector is proposed based on evolution difference of scale space which can well reflect the change of the domination feature between the evolved curves in gaussian scale space we use difference of gaussian dog to represent these scale evolution differences of planar curves and the response function of the corners is defined as the norm of dog characterizing the scale evolution differences the proposed dog detector not only employs both the low scale and the high one for detecting the candidate corners but also assures the lowest computational complexity among the existing boundary based detectors finally based on acu and error index criteria the comprehensive performance evaluation of the proposed detector is performed and the results demonstrate that the present detector allows very strong response for corner position and possesses better detection and localization performance and robustness against noise
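the response construction can be sketched as smoothing the curvature signal of a closed curve at two scales and taking the magnitude of their difference, with corners as local maxima; the scales and the threshold are illustrative assumptions, not the paper's parameter choices

import numpy as np

def dog_response(curvature, sigma_low=2.0, sigma_high=4.0):
    # curvature: 1-d array sampled along a closed curve; returns the dog response
    n = len(curvature)
    def smooth(sigma):
        radius = int(3 * sigma)
        xs = np.arange(-radius, radius + 1)
        kernel = np.exp(-xs ** 2 / (2.0 * sigma ** 2))
        kernel /= kernel.sum()
        padded = np.concatenate([curvature[-radius:], curvature, curvature[:radius]])
        return np.convolve(padded, kernel, mode="same")[radius:radius + n]
    return np.abs(smooth(sigma_low) - smooth(sigma_high))

def corners(response, threshold):
    # candidate corners are local maxima of the response above a threshold
    return [i for i in range(1, len(response) - 1)
            if response[i] > threshold and response[i] >= response[i - 1]
            and response[i] >= response[i + 1]]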
this paper presents the research results of an ongoing technology transfer project carried out in cooperation between the university of salerno and small software company the project is aimed at developing and transferring migration technology to the industrial partner the partner should be enabled to migrate monolithic multi user cobol legacy systems to multi tier web based architecture the assessment of the legacy systems of the partner company revealed that these systems had very low level of decomposability with spaghetti like code and embedded control flow and database accesses within the user interface descriptions for this reason it was decided to adopt an incremental migration strategy based on the reengineering of the user interface using web technology on the transformation of interactive legacy programs into batch programs and the wrapping of the legacy programs middleware framework links the new web based user interface with the wrapped legacy system an eclipse plug in named melis migration environment for legacy information systems was also developed to support the migration process both the migration strategy and the tool have been applied to two essential subsystems of the most business critical legacy system of the partner company
the biases of individual algorithms for non parametric document clustering can lead to non optimal solutions ensemble clustering methods may overcome this limitation but have not been applied to document collections this paper presents comparison of strategies for non parametric document ensemble clustering
clustering is descriptive data mining task aiming to group the data into homogeneous groups this paper presents novel evolutionary algorithm nocea that efficiently and effectively clusters massive numerical databases nocea evolves individuals of variable length consisting of disjoint and axis aligned hyper rectangular rules with homogeneous data distribution the antecedent part of the rules includes an interval like condition for each dimension novel quantisation algorithm imposes regular multi dimensional grid structure onto the data space to reduce the search combinations due to quantisation the boundaries of the intervals are encoded as integer values the evolutionary search is guided by simple data coverage maximisation function the enormous data space is effectively explored by task specific recombination and mutation operators producing candidate solutions with no overlapping rules parsimony generalisation operator shortens the discovered knowledge by replacing adjacent rules with more generic ones nocea employs special homogeneity operator that enforces quasi uniform data distribution in the space enclosed by the candidate rules after convergence the discovered knowledge undergoes simplification to perform subspace clustering and to assemble the clusters results using real world datasets are included to show that nocea has several attractive properties for clustering including comprehensible output in the form of disjoint and homogeneous rules the ability to discover clusters of arbitrary shape density size and data coverage ability to perform effective subspace clustering near linear scalability with the database size data and cluster dimensionality and substantial potential for task parallelism speedup of on processors real world example is detailed study of the seismicity along the african eurasian arabian plate boundaries
while zoomable user interfaces can improve the usability of applications by easing data access drawback is that some users tend to become lost after they have zoomed in previous studies indicate that this effect could be related to individual differences in spatial ability to overcome such orientation problems many desktop applications feature an additional overview window showing miniature of the entire information space small devices however have very limited screen real estate and incorporating an overview window often means pruning the size of the detail view considerably given this context we report the results of user study in which participants solved search tasks by using two zoomable scatterplot applications on pda one of the applications featured an overview the other relied solely on the detail view in contrast to similar studies for desktop applications there was no significant difference in user preference between the interfaces on the other hand participants solved search tasks faster without the overview this indicates that on small screens larger detail view can outweigh the benefits gained from an overview window individual differences in spatial ability did not have significant effect on task completion times although results suggest that participants with higher spatial ability were slowed down by the overview more than low spatial ability users
we consider finding descriptive labels for anonymous structured datasets such as those produced by state of the art web wrappers we give probabilistic model to estimate the affinity between attributes and labels and describe method that uses web search engine to populate the model we discuss method for finding good candidate labels for unlabeled datasets ours is the first unsupervised labeling method that does not rely on mining the html pages containing the data experimental results with data from different domains show that our methods achieve high accuracy even with very few search engine accesses
we present novel network on chip based architecture for future programmable chips fpgas key challenge for fpga design is supporting numerous highly variable design instances with good performance and low cost our architecture minimizes the cost of supporting wide range of design instances with given throughput requirements by balancing the amount of efficient hard coded noc infrastructure and the allocation of soft networking resources at configuration time although traffic patterns are design specific the physical link infrastructure is performance bottleneck and hence should be hard coded it is therefore important to employ routing schemes that allow for high flexibility to efficiently accommodate different traffic patterns during configuration we examine the required capacity allocation for supporting collection of typical traffic patterns on such chips under number of routing schemes we propose new routing scheme weighted ordered toggle wot and show that it allows high design flexibility with low infrastructure cost moreover wot utilizes simple small area on chip routers and has low memory demands
the direct access file system dafs is new fast and lightweight remote file system protocol dafs targets the data center by addressing the performance and functional needs of clusters of application servers we call this the local file sharing environment file access performance is improved by utilizing direct access transports such as infiniband remote direct data placement and the virtual interface architecture dafs also enhances file sharing semantics compared to prior network file system protocols applications using dafs through user space library can bypass operating system overhead further improving performance we present performance measurements of an ip based dafs network demonstrating the dafs protocol’s lower client cpu requirements over commodity gigabit ethernet we also provide the first multiprocessor scaling results for well known application gnu gzip converted to use dafs
profiling is technique of gathering program statistics in order to aid program optimization in particular it is an essential component of compiler optimization for the extraction of instruction level parallelism code instrumentation has been the most popular method of profiling however real time interactive and transaction processing applications suffer from the high execution time overhead imposed by software instrumentation this paper suggests the use of hardware dedicated to the task of profiling the hardware proposed consists of set of counters the profile buffer profile collection method that combines the use of hardware the compiler and operating system support is described three methods for profile buffer indexing address mapping selective indexing and compiler indexing are presented that allow this approach to produce accurate profiling information with very little execution slowdown the profile information obtained is applied to prominent compiler optimization namely superblock scheduling the resulting instruction level parallelism approaches that obtained through the use of perfect profile information
we describe the design and current status of our effort to implement the programming model of nested data parallelism into the glasgow haskell compiler we extended the original programming model and its implementation both of which were first popularised by the nesl language in terms of expressiveness as well as efficiency our current aim is to provide convenient programming environment for smp parallelism and especially multicore architectures preliminary benchmarks show that we are at least for some programs able to achieve good absolute performance and excellent speedups
we present new algorithm of mining sequential patterns in data stream in recent years data stream emerges as new data type in many applications when processing data stream the memory is fixed new stream elements flow continuously the stream data can not be paused or completely stored we develop lsp tree data structure to store the discovered sequential patterns the experimental results show that our proposal is able to mine sequential patterns from stream data at rather low cost
the deployment of infrastructure less ad hoc networks is suffering from the lack of applications in spite of active research over decade this problem can be solved to certain extent by porting successful legacy internet applications and protocols to the ad hoc network domain session initiation protocol sip is designed to provide the signaling support for multimedia applications such as internet telephony instant messaging presence etc sip relies on the infrastructure of the internet and an overlay of centralized sip servers to enable the sip endpoints discover each other and establish session by exchanging sip messages however such an infrastructure is unavailable in ad hoc networks in this paper we propose two approaches to solve this problem and enable sip based session setup in ad hoc networks loosely coupled approach where the sip endpoint discovery is decoupled from the routing procedure and ii tightly coupled approach which integrates the endpoint discovery with fully distributed cluster based routing protocol that builds virtual topology for efficient routing simulation experiments show that the tightly coupled approach performs better for relatively static multihop wireless networks than the loosely coupled approach in terms of the latency in sip session setup the loosely coupled approach on the other hand generally performs better in networks with random node mobility the tightly coupled approach however has lower control overhead in both the cases
we describe strategy for enabling existing commodity operating systems to recover from unexpected run time errors in nearly any part of the kernel including core kernel components our approach is dynamic and request oriented it isolates the effects of fault to the requests that caused the fault rather than to static kernel components this approach is based on notion of recovery domains an organizing principle to enable rollback of state affected by request in multithreaded system with minimal impact on other requests or threads we have applied this approach on and of the linux kernel and it required lines of changed or new code the other changes are all performed by simple instrumentation pass of compiler our experiments show that the approach is able to recover from otherwise fatal faults with minimal collateral impact during recovery event
we address the challenge of training visual concept detectors on web video as available from portals such as youtube in contrast to high quality but small manually acquired training sets this setup permits us to scale up concept detection to very large training sets and concept vocabularies on the downside web tags are only weak indicators of concept presence and web video training data contains lots of non relevant content so far there are two general strategies to overcome this label noise problem both targeted at discarding non relevant training content manual refinement supported by active learning sample selection an automatic refinement using relevance filtering in this paper we present highly efficient approach combining these two strategies in an interleaved setup manually refined samples are directly used to improve relevance filtering which again provides good basis for the next active learning sample selection our results demonstrate that the proposed combination called active relevance filtering outperforms both purely automatic filtering and manual one based on active learning for example by using manual labels per concept an improvement of over an automatic filtering is achieved and over active learning by annotating only of weak positive samples in the training set performance comparable to training on ground truth labels is reached
in the interference scheduling problem one is given set of communication requests described by source destination pairs of nodes from metric space the nodes correspond to devices in wireless network each pair must be assigned power level and color such that the pairs in each color class can communicate simultaneously at the specified power levels the feasibility of simultaneous communication within color class is defined in terms of the signal to interference plus noise ratio sinr that compares the strength of signal at receiver to the sum of the strengths of other signals the objective is to minimize the number of colors as this corresponds to the time needed to schedule all requests we introduce an instance based measure of interference denoted by that enables us to improve on previous results for the interference scheduling problem we prove upper and lower bounds in terms of on the number of steps needed for scheduling set of requests for general power assignments we prove lower bound of log log steps where denotes the aspect ratio of the metric when restricting to the two dimensional euclidean space as previous work the bound improves to log δ alternatively when restricting to linear power assignments the lower bound improves even to the lower bounds are complemented by an efficient algorithm computing schedule for linear power assignments using only log steps more sophisticated algorithm computes schedule using even only log steps for dense instances in the two dimensional euclidean space this gives constant factor approximation for scheduling under linear power assignments which shows that the price for using linear and hence energy efficient power assignments is bounded by factor of log delta in addition we extend these results for single hop scheduling to multi hop scheduling and combined scheduling and routing problems where our analysis generalizes previous results towards general metrics and improves on the previous approximation factors
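the per colour class feasibility test described above can be written directly from the sinr definition; the sketch below uses a standard power over distance to the alpha path loss model, which is an assumption of the illustration rather than part of the paper's metric space formulation

def sinr_feasible(links, powers, alpha=3.0, beta=1.0, noise=1e-9):
    # links: list of (sender_xy, receiver_xy) pairs scheduled in one colour class
    # powers: parallel list of assigned power levels
    def gain(p, q):
        d = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
        return 1.0 / (d ** alpha)
    for i, (si, ri) in enumerate(links):
        signal = powers[i] * gain(si, ri)
        interference = sum(powers[j] * gain(sj, ri)
                           for j, (sj, _) in enumerate(links) if j != i)
        if signal < beta * (noise + interference):
            return False
    return True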
this paper studies phase singularities pss for image representation we show that pss calculated with laguerre gauss filters contain important information and provide useful tool for image analysis pss are invariant to image translation and rotation we introduce several invariant features to characterize the core structures around pss and analyze the stability of pss to noise addition and scale change we also study the characteristics of pss in scale space which lead to method to select key scales along phase singularity curves we demonstrate two applications of pss object tracking and image matching in object tracking we use the iterative closest point algorithm to determine the correspondences of pss between two adjacent frames the use of pss allows us to precisely determine the motions of tracked objects in image matching we combine pss and scale invariant feature transform sift descriptor to deal with the variations between two images and examine the proposed method on benchmark database the results indicate that our method can find more correct matching pairs with higher repeatability rates than some well known methods
multi dimensional expressions mdx provide an interface for asking several related olap queries simultaneously an interesting problem is how to optimize the execution of an mdx query given that most data warehouses maintain set of redundant materialized views to accelerate olap operations number of greedy and approximation algorithms have been proposed for different versions of the problem in this paper we evaluate experimentally their performance concluding that they do not scale well for realistic workloads motivated by this fact we develop two novel greedy algorithms our algorithms construct the execution plan in top down manner by identifying in each step the most beneficial view instead of finding the most promising query we show by extensive experimentation that our methods outperform the existing ones in most cases
we present shape deformation approach which preserves volume prevents self intersections and allows for exact control of the deformation impact the volume preservation and prevention of self intersections are achieved by utilizing the method of vector field based shape deformations this method produces physically plausible deformations efficiently by integrating formally constructed divergence free vector fields where the region of influence is described by implicitly defined shapes we introduce an implicit representation of deformation boundaries which allows for an exact control of the deformation by placing the boundaries directly on the shape surface the user can specify precisely where the shape should be deformed and where not the simple polygonal representation of the boundaries allows for gpu implementation which is able to deform high resolution meshes in real time
service oriented applications feature interactions among several participants over the network mechanisms such as correlation sets and two party sessions have been proposed in the literature to separate messages sent to different instances of the same service this paper presents process calculus featuring dynamically evolving multiparty sessions to model interactions that spread over several participants the calculus also provides primitives for service definition invocation and for structured communication in order to highlight the interactions among the different concepts several examples from the soc area show the suitability of our approach
broadcast encryption be deals with secure transmission of message to group of users such that only an authorized subset of users can decrypt the message some of the most effective be schemes in the literature are the tree based schemes of complete subtree cs and subset difference sd the key distribution trees in these schemes are traditionally constructed without considering user preferences in fact these schemes can be made significantly more efficient when user profiles are taken into account in this paper we consider this problem and study how to construct the cs and sd trees more efficiently according to user profiles we first analyze the relationship between the transmission cost and the user profile distribution and prove number of key results in this aspect then we propose several optimization algorithms which can reduce the bandwidth requirement of the cs and sd schemes significantly this reduction becomes even more significant when number of free riders can be allowed in the system
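as background for how transmission cost depends on the key tree, the sketch below computes a complete subtree cover for an array-indexed full binary tree: the non-revoked users are covered by the maximal subtrees containing no revoked leaf, and the cover size is the number of ciphertexts broadcast. the tree layout and function names are assumptions for illustration; the paper's profile-aware tree construction is not shown.

```python
def cs_cover(num_leaves, revoked):
    """Complete Subtree cover for a full binary tree stored in array
    form: root = 1, children of node v are 2v and 2v+1, and the leaves
    are indices num_leaves .. 2*num_leaves-1 (num_leaves a power of 2).

    Returns the roots of the subtrees that cover exactly the
    non-revoked users."""
    revoked_leaves = {num_leaves + r for r in revoked}
    # mark every ancestor of a revoked leaf (the Steiner tree of revoked users)
    marked = set()
    for leaf in revoked_leaves:
        v = leaf
        while v >= 1 and v not in marked:
            marked.add(v)
            v //= 2
    if not marked:                 # nobody revoked: one subtree (the root)
        return [1]
    cover = []
    stack = [1]
    while stack:
        v = stack.pop()
        if v >= num_leaves:        # a marked leaf is a revoked user, skip it
            continue
        for child in (2 * v, 2 * v + 1):
            if child in marked:
                stack.append(child)        # descend along the Steiner tree
            else:
                cover.append(child)        # clean subtree hanging off it
    return cover

# example: 8 users, users 2 and 5 (0-based) revoked
print(cs_cover(8, [2, 5]))   # four subtrees cover the six non-revoked users
```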
the current schemes for security policy interoperation in multidomain environments are based on a centralized mediator where the mediator may be a bottleneck for maintaining the policies and mediating cross domain resource access control in this paper we present a mediator free scheme for secure policy interoperation in our scheme policy interoperation is performed by the individual domains for which a distributed multi domain policy model is proposed and distributed algorithms are given to create such cross domain policies specifically the policies are distributed to each domain and we ensure that the policies are consistent and each domain keeps the complete policies it should know
this paper describes several techniques designed to improve protocol latency and reports on their effectiveness when measured on modern risc machine employing the dec alpha processor we found that the memory system which has long been known to dominate network throughput is also key factor in protocol latency as result improving instruction cache effectiveness can greatly reduce protocol processing overheads an important metric in this context is the memory cycles per instructions mcpi which is the average number of cycles that an instruction stalls waiting for memory access to complete the techniques presented in this paper reduce the mcpi by factor of to in analyzing the effectiveness of the techniques we also present detailed study of the protocol processing behavior of two protocol stacks tcp ip and rpc on modern risc processor
static compiler optimizations can hardly cope with the complex run time behavior and hardware components interplay of modern processor architectures multiple architectural phenomena occur and interact simultaneously which requires the optimizer to combine multiple program transformations whether these transformations are selected through static analysis and models runtime feedback or both the underlying infrastructure must have the ability to perform long and complex compositions of program transformations in flexible manner existing compilers are ill equipped to perform that task because of rigid phase ordering fragile selection rules using pattern matching and cumbersome expression of loop transformations on syntax trees moreover iterative optimization emerges as pragmatic and general means to select an optimization strategy via machine learning and operations research searching for the composition of dozens of complex dependent parameterized transformations is challenge for iterative approaches the purpose of this article is threefold to facilitate the automatic search for compositions of program transformations introducing richer framework which improves on classical polyhedral representations suitable for iterative optimization on simpler structured search space to illustrate using several examples that syntactic code representations close to the operational semantics hamper the composition of transformations and that complex compositions of transformations can be necessary to achieve significant performance benefits the proposed framework relies on unified polyhedral representation of loops and statements the key is to clearly separate four types of actions associated with program transformations iteration domain schedule data layout and memory access functions modifications the framework is implemented within the open orc compiler aiming for native ia amd and ia code generation along with source to source optimization of fortran and
software evolution research inherently has several resource intensive logistical constraints archived project artifacts such as those found in source code repositories and bug tracking systems are the principal source of input data analysis specific facts such as commit metadata or the location of design patterns within the code must be extracted for each change or configuration of interest the results of this resource intensive fact extraction phase must be stored efficiently for later use by more experimental types of research tasks such as algorithm or model refinement in order to perform any type of software evolution research each of these logistical issues must be addressed and an implementation to manage it created in this paper we introduce kenyon system designed to facilitate software evolution research by providing common set of solutions to these common logistical problems we have used kenyon for processing source code data from systems of varying sizes and domains archived in different types of software configuration management systems we present our experiences using kenyon with these systems and also describe kenyon’s usage by students in graduate seminar class
we propose methodology for building practical robust query classification system that can identify thousands of query classes with reasonable accuracy while dealing in real time with the query volume of commercial web search engine we use blind feedback technique given query we determine its topic by classifying the web search results retrieved by the query motivated by the needs of search advertising we primarily focus on rare queries which are the hardest from the point of view of machine learning yet in aggregation account for considerable fraction of search engine traffic empirical evaluation confirms that our methodology yields considerably higher classification accuracy than previously reported we believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to better user experience
fine grained program power behavior is useful in both evaluating power optimizations and observing power optimization opportunities detailed power simulation is time consuming and often inaccurate physical power measurement is faster and objective however fine grained measurement generates enormous amounts of data in which locating important features is difficult while coarse grained measurement sacrifices important detail we present program power behavior characterization infrastructure that identifies program phases selects representative interval of execution for each phase and instruments the program to enable precise power measurement of these intervals to get their time dependent power behavior we show that the representative intervals accurately model the fine grained time dependent behavior of the program they also accurately estimate the total energy of program our compiler infrastructure allows for easy mapping between measurement result and its corresponding source code we improve the accuracy of our technique over previous work by using edge vectors ie counts of traversals of control flow edges instead of basic block vectors as well as incorporating event counters into our phase classification we validate our infrastructure through the physical power measurement of spec cpu integer benchmarks on an intel pentium system we show that using edge vectors reduces the error of estimating total program energy by over using basic block vectors and using edge vectors plus event counters reduces the error of estimating the fine grained time dependent power profile by over using basic block vectors
as the complexity of processor architectures increases there is widening gap between peak processor performance and sustained processor performance so that programs now tend to exploit only fraction of available performance while there is tremendous amount of literature on program optimizations compiler optimizations lack efficiency because they are plagued by three flaws they often implicitly use simplified if not simplistic models of processor architecture they usually focus on single processor component eg cache and ignore the interactions among multiple components the most heavily investigated components eg caches sometimes have only small impact on overall performance through the in depth analysis of simple program kernel we want to show that understanding the complex interactions between programs and the numerous processor architecture components is both feasible and critical to design efficient program optimizations
in this paper we propose framework to combine knowledge management and context aware and pervasive computing emphasizing on synchronization and adaptation issues of workflow processes in mobile settings the key aspect of the proposed framework is to enable adaptive two way interaction between context aware systems and users in mobile settings in contrast to existing concepts we aim at capturing active feedback from users which should contribute to the organizational memory after being reviewed evaluated and classified thus users would not only act as consumers but also as suppliers of relevant information and knowledge to the system and other users in addition the concept includes existing approaches to adapt to for instance different quality of service levels in order to provide maximum level of local autonomy we suggest using the adaptation concept to also support adjustments to cross cultural differences in perceiving and communicating information and knowledge our work is motivated by the need for distributed context aware and pervasive computing framework to support maintenance and administration tasks related to the international monitoring system ims of the ctbto prepcom an international organization located in vienna austria
this survey paper identifies some trends in the application implementation technology and processor architecture areas taxonomy which captures the influence of these trends on processor microsystems is presented and the communication needs of various classes of these architectures is also briefly surveyed we observe trend toward on chip networked microsystems derived from logically and physically partitioning the processor architecture partitioning the architecture logically enables the parallelism offered by growing application workloads to be well exploited partitioning the architecture physically enables the scaling properties of the underlying implementation technology to continue providing increased performance and not be encumbered by chip crossing wire delay which no longer is negligible the impact on future research directions of this paradigm shift in the way microsystems are designed and intraconnected is briefly highlighted
many general purpose object oriented scripting languages are dynamically typed which provides flexibility but leaves the programmer without the benefits of static typing including early error detection and the documentation provided by type annotations this paper describes diamondback ruby druby tool that blends ruby’s dynamic type system with static typing discipline druby provides type language that is rich enough to precisely type ruby code we have encountered without unneeded complexity when possible druby infers static types to discover type errors in ruby programs when necessary the programmer can provide druby with annotations that assign static types to dynamic code these annotations are checked at run time isolating type errors to unverified code we applied druby to suite of benchmarks and found several bugs that would cause run time type errors druby also reported number of warnings that reveal questionable programming practices in the benchmarks we believe that druby takes major step toward bringing the benefits of combined static and dynamic typing to ruby and other object oriented languages
in typical concept location queries the location is sometimes given by terms that cannot be found in gazetteers or geographic databases such terms usually describe vague geographical regions but might also include more general terms like mining or theme parks in which case the corresponding geographic footprint is less obvious in the present paper we describe our approach to deal with such vague location specifications in geographic queries roughly we determine geographic representation for these location specifications from toponyms found in the top documents resulting from query using the terms describing the location in this paper we describe an efficient process to derive the geographic representation for such situations at query time furthermore we present experiments depicting the performance of our approach as well as the result quality our approach allows for an efficient execution of queries such as camping ground near theme park it can also be used as standalone application giving visual impression of the geographic footprint of arbitrary terms
various online social networks osns have been developed rapidly on the internet researchers have analyzed different properties of such osns mainly focusing on the formation and evolution of the networks as well as the information propagation over the networks in knowledge sharing osns such as blogs and question answering systems issues on how users participate in the network and how users generate contribute knowledge are vital to the sustained and healthy growth of the networks however related discussions have not been reported in the research literature in this work we empirically study workloads from three popular knowledge sharing osns including blog system social bookmark sharing network and question answering social network to examine these properties our analysis consistently shows that users posting behavior in these networks exhibits strong daily and weekly patterns but the user active time in these osns does not follow exponential distributions the user posting behavior in these osns follows stretched exponential distributions instead of power law distributions indicating the influence of small number of core users cannot dominate the network the distributions of user contributions on high quality and effort consuming contents in these osns have smaller stretch factors for the stretched exponential distribution our study provides insights into user activity patterns and lays out an analytical foundation for further understanding various properties of these osns
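one rough way to check a stretched exponential claim on a contribution ranking: if contributions follow a stretched exponential, log(rank) is approximately linear in the contribution raised to the stretch factor c, so a grid search over c for the best linear fit gives a crude estimate of the stretch factor. the sketch below uses synthetic data and a simple least-squares fit; it illustrates the distributional form only and is not the authors' fitting procedure.

```python
import numpy as np

def stretched_exp_fit(values, c_grid=np.linspace(0.1, 1.0, 91)):
    """Rough fit of a stretched exponential rank distribution:
    log(rank_i) ~ a - b * x_i**c for contributions x sorted descending.
    Returns the stretch factor c giving the best linear fit (highest R^2)."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    log_rank = np.log(np.arange(1, len(x) + 1))
    best_c, best_r2 = None, -np.inf
    for c in c_grid:
        xc = x ** c
        # least-squares line: log_rank = a - b * xc
        A = np.vstack([np.ones_like(xc), -xc]).T
        coef, *_ = np.linalg.lstsq(A, log_rank, rcond=None)
        pred = A @ coef
        ss_res = np.sum((log_rank - pred) ** 2)
        ss_tot = np.sum((log_rank - log_rank.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot
        if r2 > best_r2:
            best_c, best_r2 = c, r2
    return best_c, best_r2

# example with synthetic heavy-tailed contribution counts
# rng = np.random.default_rng(0)
# posts = rng.weibull(0.5, size=5000) * 100
# print(stretched_exp_fit(posts))
```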
this paper approaches the incremental view maintenance problem from an algebraic perspective we construct the algebraic structure of ring of databases and use it as the foundation of the design of query calculus that allows to express powerful aggregate queries the query calculus inherits key properties of the ring such as having normal form of polynomials and being closed under computing inverses and delta queries the th delta of polynomial query of degree without nesting is purely function of the update not of the database this gives rise to method of eliminating expensive query operators such as joins from programs that perform incremental view maintenance the main result is that for non nested queries each individual aggregate value can be incrementally maintained using constant amount of work this is not possible for nonincremental evaluation
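the delta idea behind the incremental maintenance result can be illustrated on a simple aggregate join view: for a count over R(a,b) joined with S(b,c), the delta with respect to an insertion into R is a lower-degree query over S alone, so keeping a couple of per-key auxiliary aggregates makes each update constant work per affected key. this is a hand-rolled illustration, not the ring construction or query calculus of the paper; names are hypothetical.

```python
from collections import defaultdict

class SumJoinView:
    """Incrementally maintain V = |R(a,b) join S(b,c)| (a join-size view).
    The per-key counts play the role of lower-degree delta queries: an
    update touches only them, never the whole database."""

    def __init__(self):
        self.r_count = defaultdict(int)   # b -> |{(a,b) in R}|
        self.s_count = defaultdict(int)   # b -> |{(b,c) in S}|
        self.view = 0

    def insert_r(self, a, b):
        # delta of V for +R(a,b) depends only on S, not on R
        self.view += self.s_count[b]
        self.r_count[b] += 1

    def insert_s(self, b, c):
        # symmetric delta for +S(b,c)
        self.view += self.r_count[b]
        self.s_count[b] += 1

# usage
v = SumJoinView()
v.insert_r(1, 'x'); v.insert_s('x', 7); v.insert_r(2, 'x')
print(v.view)   # 2 joining pairs, maintained without re-running the join
```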
regular expression re matching has important applications in the areas of xml content distribution and network security in this paper we present the end to end design of high performance re matching system our system combines the processing efficiency of deterministic finite automata dfa with the space efficiency of non deterministic finite automata nfa to scale to hundreds of res in experiments with real life re data on data streams we found that bulk of the dfa transitions are concentrated around few dfa states we exploit this fact to cache only the frequent core of each dfa in memory as opposed to the entire dfa which may be exponential in size further we cluster res such that res whose interactions cause an exponential increase in the number of states are assigned to separate groups this helps to improve cache hits by controlling the overall dfa size to the best of our knowledge ours is the first end to end system capable of matching res at high speeds and in their full generality through clever combination of re grouping and static and dynamic caching it is able to perform re matching at high speeds even in the presence of limited memory through experiments with real life data sets we show that our re matching system convincingly outperforms state of the art network intrusion detection tool with support for efficient re matching
dynamic slicing is a widely used technique for program analysis debugging and comprehension however the reported slice is often too large to be inspected by the programmer in this work we address this deficiency by hierarchically applying dynamic slicing at various levels of granularity the basic observation is to divide a program execution trace into phases with data control dependencies inside each phase being suppressed only the inter phase dependencies are presented to the programmer the programmer then zooms into one of these phases which is further divided into sub phases and analyzed we also discuss how our ideas can be used to augment debugging methods other than slicing such as fault localization a recently proposed trace comparison method for software debugging
we propose new framework for solving general shallow wave equations gswe in order to efficiently simulate water flows on solid surfaces under shallow wave assumptions within this framework we develop implicit schemes for solving the external forces applied to water including gravity and surface tension we also present two way coupling method to model interactions between fluid and floating rigid objects water flows in this system can be simulated not only on planar surfaces by using regular grids but also on curved surfaces directly without surface parametrization the experiments show that our system is fast stable physically sound and straightforward to implement on both cpus and gpus it is capable of simulating variety of water effects including shallow waves water drops rivulets capillary events and fluid floating rigid body coupling because the system is fast we can also achieve real time water drop control and shape design
statistical parsers have become increasingly accurate to the point where they are useful in many natural language applications however estimating parsing accuracy on wide variety of domains and genres is still challenge in the absence of gold standard parse trees in this paper we propose technique that automatically takes into account certain characteristics of the domains of interest and accurately predicts parser performance on data from these new domains as result we have cheap no annotation involved and effective recipe for measuring the performance of statistical parser on any given domain
when upgrading storage systems the key is migrating data from old storage subsystems to the new ones for achieving a data layout able to deliver high performance increased capacity and strong data availability while preserving the effectiveness of its location method however achieving such a data layout is not trivial when handling a redundancy scheme because the migration algorithm must guarantee both data and redundancy will not be allocated on the same disk the orthogonal redundancy for instance delivers strong data availability for distributed disk arrays but this scheme is basically focused on homogeneous and static environments and a technique that moves the overall data layout called re striping is applied when upgrading it this paper presents a deterministic placement approach for distributing orthogonal redundancy on distributed heterogeneous disk arrays which is able to adapt on line the storage system to the capacity performance demands by only moving a fraction of the data layout the evaluation reveals that our proposal achieves data layouts delivering an improved performance and increased capacity while keeping the effectiveness of the redundancy scheme even after several migrations finally it keeps the complexity of the data management at an acceptable level
it has been recognized only recently that like databases web sites need models and schemes data intensive web sites are best developed using multi level design approach proceeding from data design via navigation design to web page design modern web based information systems are no longer static in nature rather they are dynamic besides querying they support workflow tasks and commerce transactions the design of such systems needs to consider the underlying business process next to the data their integrated design has been mainly treated in an ad hoc way so far in this paper we present three level schema architecture for the conceptual design of dynamic web based information systems we employ an object oriented approach that integrates data and process management and complements previous approaches for the design of data intensive web sites
some index structures have been redesigned to minimize the cache misses and improve their cpu cache performances the cache sensitive tree and recently developed cache sensitive tree are the most well known cache conscious index structures their performance evaluations however were made in single core cpu machines nowadays even the desktop computers are equipped with multi core cpu processors in this paper we present an experimental performance study to show how cache conscious trees perform on different types of cpu processors that are available in the market these days
benchmarks set standards for innovation in computer architecture research and industry product development consequently it is of paramount importance that these workloads are representative of real world applications however composing such representative workloads poses practical challenges to application analysis teams and benchmark developers real world workloads are intellectual property and vendors hesitate to share these proprietary applications and porting and reducing these applications to benchmarks that can be simulated in tractable amount of time is nontrivial task in this paper we address this problem by proposing technique that automatically distills key inherent behavioral attributes of proprietary workload and captures them into miniature synthetic benchmark clone the advantage of the benchmark clone is that it hides the functional meaning of the code but exhibits similar performance characteristics as the target application moreover the dynamic instruction count of the synthetic benchmark clone is substantially shorter than the proprietary application greatly reducing overall simulation time for spec cpu the simulation time reduction is over five orders of magnitude compared to entire benchmark execution using set of benchmarks representative of general purpose scientific and embedded applications we demonstrate that the power and performance characteristics of the synthetic benchmark clone correlate well with those of the original application across wide range of microarchitecture configurations
we introduce a novel approach to the smart execution of scenario based models of reactive systems such as those resulting from the multi modal inter object language of live sequence charts lscs our approach finds multiple execution paths from a given state of the system and allows the user to interactively traverse them the method is based on translating the problem of finding a superstep of execution into a problem in the ai planning domain and using a known planning algorithm which we have had to modify and strengthen for our purposes
the strongest tradition of ir systems evaluation has focused on system effectiveness more recently there has been growing interest in evaluation of interactive ir systems balancing system and user oriented evaluation criteria in this paper we shift the focus to considering how ir systems and particularly digital libraries can be evaluated to assess and improve their fit with users broader work activities taking this focus we answer different set of evaluation questions that reveal more about the design of interfaces user system interactions and how systems may be deployed in the information working context the planning and conduct of such evaluation studies share some features with the established methods for conducting ir evaluation studies but come with shift in emphasis for example greater range of ethical considerations may be pertinent we present the pret rapporter framework for structuring user centred evaluation studies and illustrate its application to three evaluation studies of digital library systems
server selection is an important subproblem in distributed information retrieval dir but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content in contrast realistic dir applications may feature much more varied collections in particular personal metasearch novel application of dir which includes all of user’s online resources may involve collections which vary in size by several orders of magnitude and which have highly varied data we describe number of algorithms for server selection and consider their effectiveness when collections vary widely in size and are represented by imperfect samples we compare the algorithms on personal metasearch testbed comprising calendar email mailing list and web collections where collection sizes differ by three orders of magnitude we then explore the effect of collection size variations using four partitionings of the trec ad hoc data used in many other dir experiments kullback leibler divergence previously considered poorly effective performs better than expected in this application other techniques thought to be effective perform poorly and are not appropriate for this problem strong correlation with size based rankings for many techniques may be responsible
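a minimal version of kullback-leibler-based server selection, one of the techniques compared: build a smoothed unigram model from each collection sample, score each collection by the negated kl divergence between the query's term distribution and that model, and rank collections by score. smoothing constants, the sample data, and the names below are assumptions.

```python
import math
from collections import Counter

def collection_model(sample_terms, vocab, mu=0.5):
    """Smoothed unigram model P(t | collection sample)."""
    counts = Counter(sample_terms)
    total = sum(counts.values())
    return {t: (counts[t] + mu) / (total + mu * len(vocab)) for t in vocab}

def kl_score(query_terms, model):
    """Negative KL divergence between the query term distribution and the
    collection model; higher means the collection looks more promising."""
    q = Counter(query_terms)
    qlen = sum(q.values())
    score = 0.0
    for t, c in q.items():
        p_q = c / qlen
        score -= p_q * math.log(p_q / model[t])
    return score

def rank_collections(query_terms, samples):
    vocab = set(query_terms)
    for terms in samples.values():
        vocab.update(terms)
    scored = [(kl_score(query_terms, collection_model(terms, vocab)), name)
              for name, terms in samples.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# usage: rank a toy calendar/email/web testbed for a query
samples = {
    "calendar": "meeting room monday agenda meeting".split(),
    "email":    "meeting reply attachment budget report".split(),
    "web":      "budget report annual pdf download".split(),
}
print(rank_collections("budget report".split(), samples))
```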
algorithmic tools for searching and mining the web are becoming increasingly sophisticated and vital in this context algorithms that use and exploit structural information about the web perform better than generic methods in both efficiency and reliability we present an extensive characterization of the graph structure of the web with view to enabling high performance applications that make use of this structure in particular we show that the web emerges as the outcome of number of essentially independent stochastic processes that evolve at various scales striking consequence of this scale invariance is that the structure of the web is fractal cohesive subregions display the same characteristics as the web at large an understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales we describe potential applications of this line of research to optimized algorithm design for web scale data analysis
active network nodes allow for non trivial processing of data streams these complex network applications typically benefit from protection between their components for fault tolerance or security however fine grained memory protection introduces bottlenecks in communication among components this paper describes memory protection in expert an os for programmable network elements which re examines thread tunnelling as way of allowing these complex applications to be split over multiple protection domains we argue that previous problems with tunnelling are symptoms of overly general designs and we demonstrate minimal domain crossing primitive which nevertheless achieves the majority of benefits possible from tunnelling
this paper proposes methodology that allows users to control character’s motion interactively but continuously inspired by the work of gleicher et al gskj we propose semi automatic method to build fat graphs where node corresponds to pose and its incoming and outgoing edges represent the motion segments starting from and ending at similar poses group of edges is built into fat edge that parameterizes similar motion segments into blendable form employing the existing motion transition and blending methods our run time system allows users to control character interactively in continuous parameter spaces with conventional input devices such as joysticks and the mice the capability of the proposed methodology is demonstrated through several applications although our method has some limitations on motion repertories and qualities it can be adapted to number of real world applications including video games and virtual reality applications
program code compression is an emerging research activity that is having an impact in several production areas such as networking and embedded systems this is because the reduced sized code can have positive impact on network traffic and embedded system costs such as memory requirements and power consumption although code size reduction is relatively new research area numerous publications already exist on it the methods published usually have different motivations and variety of application contexts they may use different principles and their publications often use diverse notations to our knowledge there are no publications that present good overview of this broad range of methods and give useful assessment this article surveys twelve methods and several related works appearing in some papers published up to now we provide extensive assessment criteria for evaluating the methods and offer basis for comparison we conclude that it is fairly hard to make any fair comparisons of the methods or draw conclusions about their applicability
in this paper we propose layered tcp ltcp for short a set of simple modifications to the congestion window response of tcp to make it more scalable in high speed networks ltcp modifies the tcp flow to behave as a collection of virtual flows to achieve more efficient bandwidth probing the number of virtual flows emulated is determined based on the dynamic network conditions by using the concept of virtual layers such that the convergence properties and rtt unfairness behavior is maintained similar to that of tcp in this paper we provide the intuition and the design for the ltcp protocol modifications and evaluation results based on ns simulations and a linux implementation our results show that ltcp has promising convergence properties is about an order of magnitude faster than tcp in utilizing high bandwidth links employs few parameters and retains aimd characteristics
the explosive growth of embedded electronics is bringing information and control systems of increasing complexity to every aspect of our lives the most challenging designs are safety critical systems such as transportation systems eg airplanes cars and trains industrial plants and health care monitoring the difficulties reside in accommodating constraints both on functionality and implementation the correct behavior must be guaranteed under diverse states of the environment and potential failures implementation has to meet cost size and power consumption requirements the design is therefore subject to extensive mathematical analysis and simulation however traditional models of information systems do not interface well with the continuously evolving nature of the environment in which these devices operate thus in practice different mathematical representations have to be mixed to analyze the overall behavior of the system hybrid systems are particular class of mixed models that focus on the combination of discrete and continuous subsystems there is wealth of tools and languages that have been proposed over the years to handle hybrid systems however each tool makes different assumptions on the environment resulting in somewhat different notions of hybrid system this makes it difficult to share information among tools thus the community cannot maximally leverage the substantial amount of work that has been directed to this important topic in this paper we review and compare hybrid system tools by highlighting their differences in terms of their underlying semantics expressive power and mathematical mechanisms we conclude our review with comparative summary which suggests the need for unifying approach to hybrid systems design as step in this direction we make the case for semantic aware interchange format which would enable the use of joint techniques make formal comparison between different approaches possible and facilitate exporting and importing design representations
this is survey of results about versions of fine hierarchies and many one reducibilities that appear in different parts of theoretical computer science these notions and related techniques play crucial role in understanding complexity of finite and infinite computations we try not only to present the corresponding notions and facts from the particular fields but also to identify the unifying notions techniques and ideas
we present parametric higher order abstract syntax phoas new approach to formalizing the syntax of programming languages in computer proof assistants based on type theory like higher order abstract syntax hoas phoas uses the meta language’s binding constructs to represent the object language’s binding constructs unlike hoas phoas types are definable in general purpose type theories that support traditional functional programming like coq’s calculus of inductive constructions we walk through how coq can be used to develop certified executable program transformations over several statically typed functional programming languages formalized with phoas that is each transformation has machine checked proof of type preservation and semantic preservation our examples include cps translation and closure conversion for simply typed lambda calculus cps translation for system and translation from language with ml style pattern matching to simpler language with no variable arity binding constructs by avoiding the syntactic hassle associated with first order representation techniques we achieve very high degree of proof automation
this paper aims at discussing and classifying the various ways in which the object paradigm is used in concurrent and distributed contexts we distinguish among the library approach the integrative approach and the reflective approach the library approach applies object oriented concepts as they are to structure concurrent and distributed systems through class libraries the integrative approach consists of merging concepts such as object and activity message passing and transaction etc the reflective approach integrates class libraries intimately within an object based programming language we discuss and illustrate each of these and point out their complementary levels and goals
we introduce framework of multiple viewpoint systems for describing and designing systems that use more than one representation or set of relevance judgments on the same collection viewpoint is any representational scheme on some collection of data objects together with mechanism for accessing this content multiple viewpoint system allows searcher to pose queries to one viewpoint and then change to another viewpoint while retaining sense of context multiple viewpoint systems are well suited to alleviate vocabulary mismatches and to take advantage of the possibility of combining evidence we discuss some of the issues that arise in designing and using such systems and illustrate the concepts with several examples
the border gateway protocol bgp controls inter domain routing in the internet bgp is vulnerable to many attacks since routers rely on hearsay information from neighbors secure bgp bgp uses dsa to provide route authentication and mitigate many of these risks however many performance and deployment issues prevent bgp’s real world deployment previous work has explored improving bgp processing latencies but space problems such as increased message size and memory cost remain the major obstacles in this paper we design aggregated path authentication schemes by combining two efficient cryptographic techniques signature amortization and aggregate signatures we propose six constructions for aggregated path authentication that substantially improve efficiency of bgp’s path authentication on both speed and space criteria our performance evaluation shows that the new schemes achieve such an efficiency that they may overcome the space obstacles and provide real world practical solution for bgp security
for blind web users completing tasks on the web can be frustrating each step can require time consuming linear search of the current web page to find the needed interactive element or piece of information existing interactive help systems and the playback components of some programming by demonstration tools identify the needed elements of page as they guide the user through predefined tasks obviating the need for linear search on each step we introduce trailblazer system that provides an accessible non visual interface to guide blind users through existing how to knowledge formative study indicated that participants saw the value of trailblazer but wanted to use it for tasks and web sites for which no existing script was available to address this trailblazer offers suggestion based help created on the fly from short user provided task description and an existing repository of how to knowledge in an evaluation on tasks the correct prediction was contained within the top suggestions of the time
focused web crawlers collect topic related web pages from the internet using learning and semi supervised learning theories this study proposes an online semi supervised clustering approach for topical web crawlers sctwc to select the most topic related url to crawl based on the scores of the urls in the unvisited list the scores are calculated based on the fuzzy class memberships and the values of the unlabelled urls experimental results show that sctwc increases the crawling performance
we present novel method for shape acquisition based on mobile structured light unlike classical structured light methods in which static projector illuminates the scene with dynamic illumination patterns mobile structured light employs moving projector translated at constant velocity in the direction of the projector’s horizontal axis emitting static or dynamic illumination for our approach time multiplexed mix of two signals is used wave pattern enabling the recovery of point projector distances for each point observed by the camera and de bruijn pattern used to uniquely encode sparse subset of projector pixels based on this information retrieved on per camera pixel basis we are able to estimate sparse reconstruction of the scene as this sparse set of camera scene correspondences is sufficient to recover the camera location and orientation within the scene we are able to convert the dense set of point projector distances into dense set of camera depths effectively providing us with dense reconstruction of the observed scene we have verified our technique using both synthetic and real world data our experiments display the same level of robustness as previous mobile structured light methods combined with the ability to accurately estimate dense scene structure and accurate camera projector motion without the need for prior calibration
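the de bruijn part of the projected pattern can be generated with the standard lyndon-word (fkm) construction: in a k-ary de bruijn sequence of order n every window of n consecutive symbols is unique, which is what lets a sparse subset of projector pixels be identified from a single camera observation. the color alphabet and window length below are illustrative assumptions, not the parameters used in the paper.

```python
def de_bruijn(k, n):
    """k-ary De Bruijn sequence of order n (FKM algorithm):
    every length-n window over the cyclic sequence is unique."""
    a = [0] * k * n
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

# e.g. 3 stripe colors; any window of 3 stripes identifies its position
colors = "RGB"
code = de_bruijn(3, 3)                       # length 27, all 27 windows distinct
stripe_pattern = "".join(colors[c] for c in code)

def decode_window(window, code, n=3):
    """Return the stripe index whose n-window matches the observed colors."""
    doubled = code + code[:n - 1]            # handle cyclic wrap-around
    for i in range(len(code)):
        if doubled[i:i + n] == window:
            return i
    return None
```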
this article advances strategic proposal that would enable future research in data knowledge engineering and natural language processing to take broader range of meanings into account than can be derived from analyzing text with current methods focusing on syntactic and semantic text meanings it advocates drawing on the knowledge and common sense meanings that users acquire from nl interactions in their lifeworld ie the life meanings that are created and shared in social communities three philosophical language perspectives are described to derive research program for incorporating life meaning based methods into contemporary dke nlp research
reducing power consumption for server class computers is important since increased energy usage causes more heat dissipation greater cooling requirements reduced computational density and higher operating costs for typical data center storage accounts for percent of energy consumption conventional server class raids cannot easily reduce power because loads are balanced to use all disks even for light loads we have built the power aware raid paraid which reduces energy use of commodity server class disks without specialized hardware paraid uses skewed striping pattern to adapt to the system load by varying the number of powered disks by spinning disks down during light loads paraid can reduce power consumption while still meeting performance demands by matching the number of powered disks to the system load reliability is achieved by limiting disk power cycles and using different raid encoding schemes based on our five disk prototype paraid uses up to percent less power than conventional raids while achieving similar performance and reliability
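the load-to-powered-disks idea can be sketched as a simple controller: estimate utilization of the currently spinning disks and shift one disk up or down when utilization crosses hysteresis thresholds, while rate-limiting shifts to bound disk power cycles. thresholds, capacities, and names below are illustrative assumptions, not the paraid policy itself.

```python
import time

class GearController:
    """Pick how many disks to keep spinning based on measured load.

    Hysteresis (up_thresh > down_thresh) and a minimum time between gear
    shifts limit disk power cycles, which is what protects reliability
    in a power-aware array."""

    def __init__(self, min_disks=2, max_disks=5,
                 up_thresh=0.8, down_thresh=0.3, min_shift_interval=300.0):
        self.active = max_disks
        self.min_disks, self.max_disks = min_disks, max_disks
        self.up, self.down = up_thresh, down_thresh
        self.min_interval = min_shift_interval
        self.last_shift = -float("inf")

    def update(self, demand_iops, per_disk_iops=150.0, now=None):
        now = time.monotonic() if now is None else now
        utilization = demand_iops / (self.active * per_disk_iops)
        if now - self.last_shift < self.min_interval:
            return self.active                 # too soon to shift again
        if utilization > self.up and self.active < self.max_disks:
            self.active += 1                   # spin one more disk up
            self.last_shift = now
        elif utilization < self.down and self.active > self.min_disks:
            self.active -= 1                   # spin one disk down
            self.last_shift = now
        return self.active

# usage: feed periodic load samples, read back the gear (disk count)
gc = GearController()
for t, load in enumerate([700, 650, 200, 180, 150, 600]):
    print(t, gc.update(load, now=t * 400.0))
```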
hydrastor is scalable secondary storage solution aimed at the enterprise market the system consists of back end architectured as grid of storage nodes built around distributed hash table and front end consisting of layer of access nodes which implement traditional file system interface and can be scaled in number for increased performance this paper concentrates on the back end which is to our knowledge the first commercial implementation of scalable high performance content addressable secondary storage delivering global duplicate elimination per block user selectable failure resiliency self maintenance including automatic recovery from failures with data and network overlay rebuilding the back end programming model is based on an abstraction of sea of variable sized content addressed immutable highly resilient data blocks organized in dag directed acyclic graph this model is exported with low level api allowing clients to implement new access protocols and to add them to the system on line the api has been validated with an implementation of the file system interface the critical factor for meeting the design targets has been the selection of proper data organization based on redundant chains of data containers we present this organization in detail and describe how it is used to deliver required data services surprisingly the most complex to deliver turned out to be on demand data deletion followed not surprisingly by the management of data consistency and integrity
grids provide uniform access to aggregations of heterogeneous resources and services such as computers networks and storage owned by multiple organizations however such dynamic environment poses many challenges for application composition and deployment in this paper we present the design of the gridbus grid resource broker that allows users to create applications and specify different objectives through different interfaces without having to deal with the complexity of grid infrastructure we present the unique requirements that motivated our design and discuss how these provide flexibility in extending the functionality of the broker to support different low level middlewares and user interfaces we evaluate the broker with different job profiles and grid middleware and conclude with the lessons learnt from our development experience
a fundamental problem in program analysis and optimization concerns the discovery of structural similarities between different sections of a given program and or across different programs specifically there is a need to find topologically identical segments within compiler intermediate representations irs such topological isomorphism has many applications for example finding isomorphic sub trees within different expression trees points to common computational resources that can be shared when targeting application specific hardware isomorphism in the control flow graph can be used for the discovery of custom instructions for customizable processors discovering isomorphism in context call trees during program execution is invaluable to several jit compiler optimizations thus all these different applications rely on the fundamental ability to find topologically identical segments within a given tree or graph representation in this paper we present a generic formulation of the subtree isomorphism problem that is more powerful than previous proposals we prove that an optimal quadratic time solution exists for this problem we employ a dynamic programming based algorithm to efficiently enumerate all isomorphic sub trees within given reference trees and also demonstrate its efficacy in a production compiler
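one common way to enumerate topologically identical subtrees across expression trees is bottom-up canonical hashing: two nodes receive the same canonical id exactly when their rooted subtrees are structurally identical with matching operator labels, so grouping nodes by id exposes every set of isomorphic occurrences. this generic sketch is not the paper's quadratic-time dynamic-programming formulation; names are hypothetical.

```python
from collections import defaultdict

class Node:
    def __init__(self, op, children=()):
        self.op = op
        self.children = list(children)

def canonical_ids(roots, commutative=()):
    """Assign equal ids to topologically identical (label-matching)
    subtrees across all given expression trees."""
    table, ids = {}, {}
    def visit(node):
        child_keys = [visit(c) for c in node.children]
        if node.op in commutative:          # order-insensitive operators
            child_keys.sort()
        key = (node.op, tuple(child_keys))
        if key not in table:
            table[key] = len(table)
        ids[id(node)] = table[key]
        return table[key]
    for r in roots:
        visit(r)
    return ids

def shared_subtrees(roots, **kw):
    """Group nodes by canonical id; groups of size > 1 are candidates
    for sharing a common computational resource."""
    ids = canonical_ids(roots, **kw)
    groups = defaultdict(list)
    def collect(node):
        groups[ids[id(node)]].append(node)
        for c in node.children:
            collect(c)
    for r in roots:
        collect(r)
    return [g for g in groups.values() if len(g) > 1]

# usage: (a*b)+(a*b) shares the multiply and the leaves
mul1 = Node('*', [Node('a'), Node('b')])
mul2 = Node('*', [Node('a'), Node('b')])
expr = Node('+', [mul1, mul2])
print(len(shared_subtrees([expr])))   # 3 groups: the '*' nodes, the 'a's, the 'b's
```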
the research domain of process discovery aims at constructing process model eg petri net which is an abstract representation of an execution log such model should be able to reproduce the log under consideration and be independent of the number of cases in the log in this paper we present process discovery algorithm where we use concepts taken from the language based theory of regions well known petri net research area we identify number of shortcomings of this theory from the process discovery perspective and we provide solutions based on integer linear programming
this paper concerns mixed initiative interaction between users and agents after classifying agents according to their task and their interactivity with the user the critical aspects of delegation based interaction are outlined then masma an agent system for distributed meeting scheduling is described and the solutions developed to control interaction are explained in detail the issues addressed concern the agent capability of adapting its behavior to the user it is supporting the solution adopted to control the shift of initiative between personal agents their users and other agents in the environment the availability of features eg the inspection mechanism that endow the user with further level of control to enhance his sense of trust in the agent
this article compares several page ordering strategies for web crawling under several metrics the objective of these strategies is to download the most important pages early during the crawl as the coverage of modern search engines is small compared to the size of the web and it is impossible to index all of the web for both theoretical and practical reasons it is relevant to index at least the most important pages we use data from actual web pages to build web graphs and execute crawler simulator on those graphs as the web is very dynamic crawling simulation is the only way to ensure that all the strategies considered are compared under the same conditions we propose several page ordering strategies that are more efficient than breadth first search and strategies based on partial pagerank calculations
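the simulation setup reduces to replaying a fixed web graph under different frontier-priority functions and tracking how quickly cumulative page importance grows. the sketch below compares breadth-first ordering with ordering by in-links discovered so far, a cheap stand-in for partial pagerank; the toy graph, seed, and importance values are assumptions.

```python
import heapq
from collections import deque

def crawl(graph, importance, seed, priority=None, budget=None):
    """Simulate a crawl over a static graph.

    graph:      page -> list of out-linked pages
    importance: page -> importance score (e.g. true pagerank)
    priority:   None for breadth-first, or a function
                (page, inlinks_seen) -> sortable key (smaller = sooner)
    Returns cumulative importance after each download."""
    budget = budget or len(graph)
    seen, inlinks = {seed}, {seed: 0}
    frontier = deque([seed]) if priority is None else [(0, seed)]
    gained, total = [], 0.0
    while frontier and len(gained) < budget:
        page = frontier.popleft() if priority is None else heapq.heappop(frontier)[1]
        total += importance.get(page, 0.0)
        gained.append(total)
        for nxt in graph.get(page, []):
            inlinks[nxt] = inlinks.get(nxt, 0) + 1
            if nxt not in seen:
                seen.add(nxt)
                if priority is None:
                    frontier.append(nxt)
                else:
                    # priority fixed at discovery time (a simplification)
                    heapq.heappush(frontier, (priority(nxt, inlinks[nxt]), nxt))
    return gained

# toy comparison: more in-links discovered so far -> crawl sooner
graph = {"s": ["a", "b"], "a": ["hub"], "b": ["hub"], "hub": ["c", "d"],
         "c": [], "d": []}
pr = {"hub": 0.5, "s": 0.1, "a": 0.1, "b": 0.1, "c": 0.1, "d": 0.1}
print(crawl(graph, pr, "s"))                                   # breadth-first
print(crawl(graph, pr, "s", priority=lambda p, k: -k))         # in-link ordering
```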
westwood tcp is sender side only modification of the classic tahoe reno tcp that has been recently proposed to improve fairness and efficiency of tcp the key idea of westwood tcp is to perform an end to end estimate of the bandwidth available for tcp connection by properly counting and filtering the stream of ack packets this estimate is used to adaptively decrease the congestion window and slow start threshold after congestion episode in this way westwood tcp substitutes the classic multiplicative decrease paradigm with the adaptive decrease paradigm in this paper we report experimental results that have been obtained running linux implementations of westwood westwood and reno tcp to ftp data over an emulated wan and over internet connections spanning continental and intercontinental distances in particular collected measurements show that the bandwidth estimation algorithm employed by westwood nicely tracks the available bandwidth whereas the tcp westwood bandwidth estimation algorithm greatly overestimates the available bandwidth because of ack compression live internet measurements also show that westwood tcp improves the goodput wrt tcp reno finally computer simulations using ns have been developed to test westwood westwood and reno in controlled scenarios these simulations show that westwood improves fairness and goodput wrt reno
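the heart of the end-to-end estimate is a low-pass filter over per-ack bandwidth samples (bytes acknowledged divided by the inter-ack interval); the filtering is what keeps ack compression from inflating the estimate. the sketch below uses a generic exponentially weighted filter with an illustrative time constant rather than the exact westwood or westwood+ coefficients, and the adaptive-decrease step is shown only schematically.

```python
class BandwidthEstimator:
    """End-to-end bandwidth estimate from the ACK stream.

    Each ACK contributes a raw sample bytes_acked / delta_t; a low-pass
    filter whose gain depends on the sample spacing smooths out bursts
    such as ACK compression."""

    def __init__(self, tau=1.0):
        self.tau = tau            # filter time constant (seconds)
        self.bwe = 0.0            # filtered estimate (bytes/s)
        self.last_ack_time = None

    def on_ack(self, now, bytes_acked):
        if self.last_ack_time is None:
            self.last_ack_time = now
            return self.bwe
        dt = max(now - self.last_ack_time, 1e-6)
        sample = bytes_acked / dt
        alpha = dt / (self.tau + dt)        # longer gaps weigh samples more
        self.bwe = (1 - alpha) * self.bwe + alpha * sample
        self.last_ack_time = now
        return self.bwe

    def after_loss(self, rtt_min, mss):
        """Adaptive decrease: derive ssthresh from the estimate instead of
        halving the window (the adaptive-decrease idea in the abstract)."""
        return max(2 * mss, self.bwe * rtt_min)

# usage: 1448-byte ACKs every 10 ms -> estimate approaches ~144.8 KB/s
est = BandwidthEstimator(tau=0.5)
for i in range(200):
    bwe = est.on_ack(now=i * 0.01, bytes_acked=1448)
print(round(bwe))
```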
in functional languages such as obj cafeobj and maude symbols are given strategy annotations that specify the order in which subterms are evaluated syntactically strategy annotations are given either as lists of natural numbers or as lists of integers associated to function symbols whose absolute values refer to the arguments of the corresponding symbol positive index prescribes the evaluation of an argument whereas negative index means evaluation on demand these on demand indices have been proposed to support laziness in obj like languages while strategy annotations containing only natural numbers have been implemented and investigated to some extent regarding for example termination confluence and completeness fully general annotations including positive and negative indices have been disappointingly under explored to date in this paper we first point out number of problems of current proposals for handling on demand strategy annotations then we propose solution to these problems by keeping an accurate track of annotations along the evaluation sequences we formalize this solution as suitable extension of the evaluation strategy of obj like languages which only consider annotations given as natural numbers to on demand strategy annotations our on demand evaluation strategy ode overcomes the drawbacks of previous proposals and also has better computational properties for instance we show how to use this strategy for computing head normal forms we also introduce transformation which allows us to prove the termination of the new evaluation strategy by using standard rewriting techniques finally we present two interpreters of the new strategy together with some encouraging experiments which demonstrate the usefulness of our approach
in this paper we address the issue of optimizing the per request cost and maximizing the number of requests that can be served by networked system that demands staging of data at vantage sites in order to provide guaranteed quality of services generation of multicast trees with end to end delay constraints is recommended to minimize the costs since such an issue has been proved to be np complete in networked environment we proposed two efficient and practically realizable heuristic algorithms referred to as source initiated scheduling sis algorithm and client initiated scheduling cis algorithm to solve the problem in polynomial time both sis and cis algorithms judiciously combine the concept of qos constrained multicast routing and network caching so that the copies of data to be staged can be dynamically cached in the network these strategies are carefully designed to consider the underlying resource constraints imposed by the dynamic network environments such as link bandwidth availability data availability on the network and the storage capabilities of site we analyze and quantify the performance under several influencing parameters such as link availability cache capacity and the data availability simulation results show that both of the proposed algorithms are able to reduce the service cost and achieve high acceptance ratio
previous research has documented the fragmented nature of software development work to explain this in more detail we analyzed software developers day to day information needs we observed seventeen developers at large software company and transcribed their activities in minute sessions we analyzed these logs for the information that developers sought the sources that they used and the situations that prevented information from being acquired we identified twenty one information types and cataloged the outcome and source when each type of information was sought the most frequently sought information included awareness about artifacts and coworkers the most often deferred searches included knowledge about design and program behavior such as why code was written particular way what program was supposed to do and the cause of program state developers often had to defer tasks because the only source of knowledge was unavailable coworkers
topic detection and tracking and topic segmentation play an important role in capturing the local and sequential information of documents previous work in this area usually focuses on single documents although similar multiple documents are available in many domains in this paper we introduce novel unsupervised method for shared topic detection and topic segmentation of multiple similar documents based on mutual information mi and weighted mutual information wmi that is combination of mi and term weights the basic idea is that the optimal segmentation maximizes mi or wmi our approach can detect shared topics among documents it can find the optimal boundaries in document and align segments among documents at the same time it also can handle single document segmentation as special case of the multi document segmentation and alignment our methods can identify and strengthen cue terms that can be used for segmentation and partially remove stop words by using term weights based on entropy learned from multiple documents our experimental results show that our algorithm works well for the tasks of single document segmentation shared topic detection and multi document segmentation utilizing information from multiple documents can tremendously improve the performance of topic segmentation and using wmi is even better than using mi for the multi document segmentation
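the objective can be made concrete for the single-document case: given sentences as bags of terms and a fixed number of segments, choose boundaries that maximize the mutual information between the term variable and the segment variable. the sketch below brute-forces the boundary choice (the paper's setting would replace this with dynamic programming or iterative refinement and would add term weighting for wmi); all names and the toy document are illustrative.

```python
import itertools
import math
from collections import Counter

def mutual_information(sentences, boundaries):
    """I(term; segment) for a segmentation given as sorted boundary
    indices (a boundary b means a new segment starts at sentence b)."""
    cuts = [0] + list(boundaries) + [len(sentences)]
    joint = Counter()                      # (term, segment) -> count
    for seg, (lo, hi) in enumerate(zip(cuts, cuts[1:])):
        for sent in sentences[lo:hi]:
            for term in sent:
                joint[(term, seg)] += 1
    n = sum(joint.values())
    p_term, p_seg = Counter(), Counter()
    for (t, s), c in joint.items():
        p_term[t] += c / n
        p_seg[s] += c / n
    mi = 0.0
    for (t, s), c in joint.items():
        p = c / n
        mi += p * math.log(p / (p_term[t] * p_seg[s]))
    return mi

def best_segmentation(sentences, num_segments):
    """Exhaustively pick num_segments-1 boundaries maximizing MI."""
    best, best_mi = None, -1.0
    for bounds in itertools.combinations(range(1, len(sentences)), num_segments - 1):
        mi = mutual_information(sentences, bounds)
        if mi > best_mi:
            best, best_mi = bounds, mi
    return best, best_mi

# usage: two topics, sentences given as lists of terms
doc = [["cache", "miss"], ["cache", "line"], ["cache", "memory"],
       ["tcp", "ack"], ["tcp", "loss"], ["ack", "loss"]]
print(best_segmentation(doc, 2))     # expected boundary at sentence 3
```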
the design implementation and testing of virtual environments is complicated by the concurrency and real time features of these systems therefore the development of formal methods for modeling and analysis of virtual environments is highly desirable in the past petri net models have led to good empirical results in the automatic verification of concurrent and real time systems we applied a timed extension of petri nets to modeling and analysis of the cave virtual environment at the university of illinois at chicago we report on our time petri net model and on empirical studies that we conducted with the cabernet toolset from politecnico di milano our experiments uncovered a flaw in the way a shared buffer is used by cave processes due to an erroneous synchronization on the buffer different cave walls can simultaneously display images based on different input information we conclude from our empirical studies that petri net based tools can effectively support the development of reliable virtual environments
this paper presents unique multi core security architecture based on efi this architecture combines secure efi environment with insecure os so that it supports secure and reliable bootstrap hardware partition encryption service as well as real time security monitoring and inspection with this architecture secure efi environment provides users with management console to authenticate monitor and audit insecure os here an insecure os is general purpose os such as linux or windows in which user can perform ordinary jobs without obvious limitation and performance degradation this architecture also has unique capability to protect authentication rules and secure information such as encrypted data even if the security ability of an os is compromised prototype was designed and implemented experiment and test results show great performance merits for this new architecture
theoretical results suggest that in order to learn the kind of complicated functions that can represent high level abstractions eg in vision language and other ai level tasks one may need deep architectures deep architectures are composed of multiple levels of non linear operations such as in neural nets with many hidden layers or in complicated propositional formulae re using many sub formulae searching the parameter space of deep architectures is difficult task but learning algorithms such as those for deep belief networks have recently been proposed to tackle this problem with notable success beating the state of the art in certain areas this monograph discusses the motivations and principles regarding learning algorithms for deep architectures in particular those exploiting as building blocks unsupervised learning of single layer models such as restricted boltzmann machines used to construct deeper models such as deep belief networks
most studies on tilt based interaction can be classified as point designs that demonstrate the utility of wrist tilt as an input medium tilt parameters are tailored to suit the specific interaction at hand in this paper we systematically analyze the design space of wrist based interactions and focus on the level of control possible with the wrist in first study we investigate the various factors that can influence tilt control separately along the three axes of wrist movement flexion extension pronation supination and ulnar radial deviation results show that users can control comfortably at least levels on the pronation supination axis and that using quadratic mapping function for discretization of tilt space significantly improves user performance across all tilt axes we discuss the findings of our results in the context of several interaction techniques and identify several general design recommendations
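the sketch below illustrates what a quadratic mapping for discretizing tilt space might look like the angle range level count and orientation of the curve are illustrative assumptions rather than the calibration reported in the study

```python
# minimal sketch, assuming a 0-60 degree tilt range and 8 levels (illustrative values).
def tilt_to_level(angle_deg, max_angle=60.0, levels=8, quadratic=True):
    """map a wrist tilt angle to one of `levels` discrete selection levels."""
    t = max(0.0, min(1.0, angle_deg / max_angle))   # normalise to [0, 1]
    if quadratic:
        t = t * t   # widen the angular range of the low levels near the rest pose
    return min(levels - 1, int(t * levels))

for a in (5, 15, 30, 45, 60):
    print(a, "deg -> level", tilt_to_level(a), "(quadratic) vs",
          tilt_to_level(a, quadratic=False), "(linear)")
```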
virtually all proposals for querying xml include class of query we term containment queries it is also clear that in the foreseeable future substantial amount of xml data will be stored in relational database systems this raises the question of how to support these containment queries the inverted list technology that underlies much of information retrieval is well suited to these queries but should we implement this technology in separate loosely coupled ir engine or using the native tables and query execution machinery of the rdbms with option more than twenty years of work on rdbms query optimization query execution scalability and concurrency control and recovery immediately extend to the queries and structures that implement these new operations but all this will be irrelevant if the performance of option lags that of by too much in this paper we explore some performance implications of both options using native implementations in two commercial relational database systems and in special purpose inverted list engine our performance study shows that while rdbmss are generally poorly suited for such queries under conditions they can outperform an inverted list engine our analysis further identifies two significant causes that differentiate the performance of the ir and rdbms implementations the join algorithms employed and the hardware cache utilization our results suggest that contrary to most expectations with some modifications native implementations in an rdbms can support this class of query much more efficiently
within the context of web the word intelligence is often connected with the visions of semantic web and web one of the main characteristics of semantic web lies in the fact that information is annotated with metadata and this gives the opportunity of organizing knowledge extracting new knowledge and performing some basic operations like query answering or inference reasoning following this argument the advent of the semantic web is often claimed to bring about substantial progress in web accessibility which is part of the inclusion concept web sites favoring massive information sharing could as well be of great importance for inclusion enabling new forms of social interaction collective intelligence and new patterns of interpersonal communication benefits could be substantial also for people with activity limitations the paper tries to highlight the possible roles and convergence of web and semantic web in favoring inclusion it highlights the fact that examples of applications of these concepts to the inclusion domain are few and limited to the accessibility field
we present psychophysical experiment to determine the effectiveness of perceptual shape cues for rigidly moving objects in an interactive highly dynamic task we use standard non photorealistic npr techniques to carefully separate and study shape cues common to many rendering systems our experiment is simple to implement engaging and intuitive for participants and sensitive enough to detect significant differences between individual shape cues we demonstrate our experimental design with user study in that study participants are shown moving objects of which are designated targets rendered in different shape from styles participants select targets projected onto touch sensitive table we find that simple lambertian shading offers the best shape cue in our user study followed by contours and lastly texturing further results indicate that multiple shape cues should be used with care as these may not behave additively
the ability to interactively control viewpoint while watching video is an exciting application of image based rendering the goal of our work is to render dynamic scenes with interactive viewpoint control using relatively small number of video cameras in this paper we show how high quality video based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image based modeling and rendering algorithms once these video streams have been processed we can synthesize any intermediate view between cameras at any time with the potential for space time manipulation in our approach we first use novel color segmentation based stereo algorithm to generate high quality photoconsistent correspondences across all camera views mattes for areas near depth discontinuities are then automatically extracted to reduce artifacts during view synthesis finally novel temporal two layer compressed representation that handles matting is developed for rendering at interactive rates
searching and presenting rich data using mobile devices is hard given their inherent limitations one approach for alleviating these limitations is device symbiosis whereby the interaction with one’s personal mobile device is augmented by additionally engaging with more capable infrastructure devices such as kiosks and displays the celadon framework previously developed by our team builds upon device symbiosis for delivering zone based services through mobile and infrastructure devices in public spaces such as shopping malls train stations and theme parks an approach for rich data visualization that is gaining wide popularity is mashups in this paper we describe user defined mashups general methodology that combines device symbiosis and automated creation of mashups we have applied this methodology to build system that enables celadon users to flexibly interact with rich zone information through their mobile devices leveraging large public displays our system bridges public and personal devices data and services
in defining large complex access control policies one would like to compose sub policies perhaps authored by different organizations into single global policy existing policy composition approaches tend to be ad hoc and do not explain whether too many or too few policy combinators have been defined we define an access control policy as four valued predicate that maps accesses to either grant deny conflict or unspecified these correspond to the four elements of the belnap bilattice functions on this bilattice are then extended to policies to serve as policy combinators we argue that this approach provides simple and natural semantic framework for policy composition with minimal but functionally complete set of policy combinators we define derived higher level operators that are convenient for the specification of access control policies and enable the decoupling of conflict resolution from policy composition finally we propose basic query language and show that it can reduce important analyses eg conflict analysis to checks of policy refinement
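the sketch below illustrates the four valued view of policies described above with a pointwise combinator and a separate conflict resolution step the two operators shown are illustrative choices and not the complete combinator set defined in the paper

```python
# minimal sketch, assuming: a policy is a function request -> verdict, and the
# two combinators below are stand-ins for the paper's operator set.
GRANT, DENY, CONFLICT, UNSPEC = "grant", "deny", "conflict", "unspecified"

def knowledge_join(a, b):
    """combine two verdicts: agreement stands, disagreement becomes conflict."""
    if a == UNSPEC:
        return b
    if b == UNSPEC:
        return a
    return a if a == b else CONFLICT

def deny_overrides(verdict):
    """one possible conflict-resolution step, decoupled from composition."""
    return DENY if verdict in (CONFLICT, UNSPEC) else verdict

def compose(p, q):
    """pointwise composition of two policies."""
    return lambda request: knowledge_join(p(request), q(request))

hr_policy  = lambda r: GRANT if r["role"] == "manager" else UNSPEC
sec_policy = lambda r: DENY  if r["resource"] == "payroll" else UNSPEC

combined = compose(hr_policy, sec_policy)
request = {"role": "manager", "resource": "payroll"}
print(combined(request), "->", deny_overrides(combined(request)))
```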
ringing is the most disturbing artifact in the image deconvolution in this paper we present progressive inter scale and intra scale non blind image deconvolution approach that significantly reduces ringing our approach is built on novel edge preserving deconvolution algorithm called bilateral richardson lucy brl which uses large spatial support to handle large blur we progressively recover the image from coarse scale to fine scale inter scale and progressively restore image details within every scale intra scale to perform the inter scale deconvolution we propose joint bilateral richardson lucy jbrl algorithm so that the recovered image in one scale can guide the deconvolution in the next scale in each scale we propose an iterative residual deconvolution to progressively recover image details the experimental results show that our progressive deconvolution can produce images with very little ringing for large blur kernels
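as background the sketch below shows the classic richardson lucy iteration that the bilateral and progressive variants above build on the kernel iteration count and toy image are illustrative assumptions and the edge preserving and inter scale extensions are not reproduced

```python
# minimal sketch of plain richardson-lucy deconvolution (not the paper's BRL/JBRL).
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(blurred, psf, iterations=30, eps=1e-8):
    """classic multiplicative richardson-lucy update."""
    estimate = np.full_like(blurred, blurred.mean())
    psf_mirror = psf[::-1, ::-1]
    for _ in range(iterations):
        reblurred = fftconvolve(estimate, psf, mode="same")
        ratio = blurred / (reblurred + eps)
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")
    return estimate

# toy example: blur a synthetic image with a box kernel, then deconvolve it
sharp = np.zeros((64, 64))
sharp[24:40, 24:40] = 1.0
psf = np.ones((7, 7)) / 49.0
blurred = fftconvolve(sharp, psf, mode="same")
restored = richardson_lucy(blurred, psf)
print("mse blurred :", float(np.mean((blurred - sharp) ** 2)))
print("mse restored:", float(np.mean((restored - sharp) ** 2)))
```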
this article focuses on real time image correction techniques that enable projector camera systems to display images onto screens that are not optimized for projections such as geometrically complex colored and textured surfaces it reviews hardware accelerated methods like pixel precise geometric warping radiometric compensation multi focal projection and the correction of general light modulation effects online and offline calibration as well as invisible coding methods are explained novel attempts in super resolution high dynamic range and high speed projection are discussed these techniques open variety of new applications for projection displays some of them will also be presented in this report
multidatabase system provides integrated access to heterogeneous autonomous local databases in distributed system an important problem in current multidatabase systems is identification of semantically similar data in different local databases the summary schemas model ssm is proposed as an extension to multidatabase systems to aid in semantic identification the ssm uses global data structure to abstract the information available in multidatabase system this abstracted form allows users to use their own terms imprecise queries when accessing data rather than being forced to use system specified terms the system uses the global data structure to match the user’s terms to the semantically closest available system terms simulation of the ssm is presented to compare imprecise query processing with corresponding query processing costs in standard multidatabase system the costs and benefits of the ssm are discussed and future research directions are presented
several computer based tools have been developed to support cooperative work the majority of these tools rely on the traditional input devices available on standard computer systems ie keyboard and mouse this paper focuses on the use of gestural interaction for cooperative scenarios discussing how it is more suited for some tasks and hypothesizing on how users cooperatively decide on which tasks to perform based on the available input modalities and task characteristics an experiment design is presented to validate the proposed hypothesis the preliminary evaluation results also presented support this hypothesis
bit address spaces are increasingly important for modern applications but they come at price pointers use twice as much memory reducing the effective cache capacity and memory bandwidth of the system compared to bit address spaces this paper presents sophisticated automatic transformation that shrinks pointers from bits to bits the approach is macroscopic ie it operates on an entire logical data structure in the program at time it allows an individual data structure instance or even subset thereof to grow up to bytes in size and can compress pointers to some data structures but not others together these properties allow efficient usage of large bit address space we also describe but have not implemented dynamic version of the technique that can transparently expand the pointers in an individual data structure if it exceeds the gb limit for collection of pointer intensive benchmarks we show that the transformation reduces peak heap sizes substantially by to for several of these benchmarks and improves overall performance significantly in some cases
most commerce websites rely on title keyword search to accurately retrieve the items for sale in particular category we have found that the titles of many items on ebay are shortened or not very specific which leads to ineffective results when searched one possible solution is to recommend the sellers relevant and informative terms for title expansion without any change of search function the related technique has been explored in previous work such as query expansion and keyword suggestion in this paper we study the effect of term suggestion on title based search frequently used approach co occurrence is tested on dataset collected from ebay website wwwebaycom besides for suggestion algorithm we take into account three particular features in our application scenario including concept term description relevance and chance to be viewed although the experiments are conducted on ebay data we believe that considering commerce particularities will help us to customize the suggestion according to the requirements of web commerce
in theory the expressive power of an aspect language should be independent of the aspect deployment approach whether it is static or dynamic weaving however in the area of strictly statically typed and compiled languages such as or there seems to be feedback from the weaver implementation to the language level dynamic aspect languages offer noticeably fewer features than their static counterparts especially means for generic aspect implementations are missing as they are very difficult to implement in dynamic weavers this hinders reusability of aspects and the application of aop to scenarios where both runtime and compile time adaptation is required our solution to overcome these limitations is based on novel combination of static and dynamic weaving techniques which facilitates the support of typical static language features such as generic advice in dynamic weavers for compiled languages in our implementation the same aspectc aspect code can now be woven statically or dynamically into the squid web proxy providing flexibility and best of breed for many aop based adaptation scenarios
the paper summarises our experiences teaching formal program specification and verification using the specification language jml and the automated program verification tool esc java this technology has proven to be mature and simple enough to introduce students to formal methods even undergraduate students with no prior knowledge of formal methods and even only very basic knowledge of java programming however there are some limitations on the kind of examples that can be comfortably tackled
technical infrastructure for storing querying and managing rdf data is key element in the current semantic web development systems like jena sesame or the ics forth rdf suite are widely used for building semantic web applications currently none of these systems supports the integrated querying of distributed rdf repositories we consider this major shortcoming since the semantic web is distributed by nature in this paper we present an architecture for querying distributed rdf repositories by extending the existing sesame system we discuss the implications of our architecture and propose an index structure as well as algorithms for query processing and optimization in such distributed context
this paper presents an architectural framework and algorithms for engineering dynamic real time distributed systems using commercial off the shelf technologies in the proposed architecture real time system application is developed in general purpose programming language further the architectural level description of the system such as composition and interconnections of application software and hardware and the operational requirements of the system such as timeliness and survivability are specified in system description language the specification of the system is automatically translated into an intermediate representation ir that models the system in platform independent manner the ir is augmented with dynamic measurements of the system by language runtime system to produce dynamic system model the dynamic model is used by resource management middleware strategies to perform resource management that achieves the timeliness and survivability requirements the middleware techniques achieve the timeliness and survivability requirements through runtime monitoring and failure detection diagnosis and dynamic resource allocation we present two classes of algorithms predictive and availability based for performing resource allocation to validate the viability of the approach we use real time benchmark application that functionally approximates dynamic real time command and control systems the benchmark is specified in the system description language and the effectiveness of the architecture in achieving its design goals is examined through set of experiments the experimental characterizations illustrate that the middleware is able to achieve the desired timeliness requirements during number of load situations furthermore the results indicate that availability based allocation algorithms perform resource allocation less frequently whereas the predictive algorithms give better steady state performance for the application
the empirical assessment of test techniques plays an important role in software testing research one common practice is to seed faults in subject software either manually or by using program that generates all possible mutants based on set of mutation operators the latter allows the systematic repeatable seeding of large numbers of faults thus facilitating the statistical analysis of fault detection effectiveness of test suites however we do not know whether empirical results obtained this way lead to valid representative conclusions focusing on four common control and data flow criteria block decision use and use this paper investigates this important issue based on middle size industrial program with comprehensive pool of test cases and known faults based on the data available thus far the results are very consistent across the investigated criteria as they show that the use of mutation operators is yielding trustworthy results generated mutants can be used to predict the detection effectiveness of real faults applying such mutation analysis we then investigate the relative cost and effectiveness of the above mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection test suite size and control data flow coverage although such questions have been partially investigated in previous studies we can use large number of mutants which helps decrease the impact of random variation in our analysis and allows us to use different analysis approach our results are then compared with published studies plausible reasons for the differences are provided and the research leads us to suggest way to tune the mutation analysis process to possible differences in fault detection probabilities in specific environment
motivated by psychophysiological investigations on the human auditory system bio inspired two dimensional auditory representation of music signals is exploited that captures the slow temporal modulations although each recording is represented by second order tensor ie matrix third order tensor is needed to represent music corpus non negative multilinear principal component analysis nmpca is proposed for the unsupervised dimensionality reduction of the third order tensors the nmpca maximizes the total tensor scatter while preserving the non negativity of auditory representations an algorithm for nmpca is derived by exploiting the structure of the grassmann manifold the nmpca is compared against three multilinear subspace analysis techniques namely the non negative tensor factorization the high order singular value decomposition and the multilinear principal component analysis as well as their linear counterparts ie the non negative matrix factorization the singular value decomposition and the principal components analysis in extracting features that are subsequently classified by either support vector machine or nearest neighbor classifiers three different sets of experiments conducted on the gtzan and the ismir genre datasets demonstrate the superiority of nmpca against the aforementioned subspace analysis techniques in extracting more discriminating features especially when the training set has small cardinality the best classification accuracies reported in the paper exceed those obtained by the state of the art music genre classification algorithms applied to both datasets
switching activity and instruction cycles are two of the most important factors in power dissipation when the supply voltage is fixed this paper studies the scheduling and assignment problems that minimize the total energy caused by both instruction processing and switching activities for applications with loops on multi core multi functional unit multi fu architectures an algorithm empls energy minimization with probability using loop scheduling is proposed to minimize the total energy while satisfying timing constraint with guaranteed probability we perform scheduling and assignment simultaneously our approach shows better performance than the approaches that consider scheduling and assignment in separate phases compared with previous work our algorithm exhibits significant improvement in total energy reduction
software customers want both sufficient product quality and agile response to requirements changes formal software requirements tracing helps to systematically determine the impact of changes and to keep track of development artifacts that need to be re tested when requirements change however full tracing of all requirements on the most detailed level can be very expensive and time consuming in the paper an initial tracing activity model is introduced along with framework that allows measuring the expected cost and benefit of tracing approaches in feasibility study subset of the activities belonging to the model has been applied to compare three tracing strategies agile just in time tracing and fully formal tracing the study focused on re testing and it has been performed in the context of an industry project where the customer was large financial service provider in the study the model was found useful to capture costs and benefits of the tracing activities and to compare different strategies combination of tracing approaches proved helpful in balancing agility and formalism
we present decentralized algorithm for online clustering analysis used for anomaly detection in self monitoring distributed systems in particular we demonstrate the monitoring of network of printing devices that can perform the analysis without the use of external computing resources ie in network analysis we also show how to ensure the robustness of the algorithm in terms of anomaly detection accuracy in the face of failures of the network infrastructure on which the algorithm runs further we evaluate the tradeoff in terms of overhead necessary for ensuring this robustness and present method to reduce this overhead while maintaining the detection accuracy of the algorithm
this paper introduces the tablemouse new cursor manipulation interaction technology for tabletop computing specifically designed to support multiple users operating on large horizontal displays the tablemouse is low cost absolute positioning device utilising visually tracked infrared light emitting diodes for button state position orientation and unique identification information the supporting software infrastructure is designed to support up to tablemouse devices simultaneously each with an individual system cursor this paper introduces the device and software infrastructure and presents two applications exposing its functionality formal benchmarking was performed against the traditional mouse for its performance and accuracy
since web workloads are known to vary dynamically with time in this paper we argue that dynamic resource allocation techniques are necessary to provide guarantees to web applications running on shared data centers to address this issue we use system architecture that combines online measurements with prediction and resource allocation techniques to capture the transient behavior of the application workloads we model server resource using time domain description of generalized processor sharing gps server this model relates application resource requirements to their dynamically changing workload characteristics the parameters of this model are continuously updated using an online monitoring and prediction framework this framework uses time series analysis techniques to predict expected workload parameters from measured system metrics we then employ constrained non linear optimization technique to dynamically allocate the server resources based on the estimated application requirements the main advantage of our techniques is that they capture the transient behavior of applications while incorporating nonlinearity in the system model we evaluate our techniques using simulations with synthetic as well as real world web workloads our results show that these techniques can judiciously allocate system resources especially under transient overload conditions
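the sketch below illustrates the overall loop described above predict near term workload from recent measurements and derive gps server weights from the predictions the exponentially weighted predictor and the proportional weight rule are simple stand ins for the time series models and constrained optimization used in the paper

```python
# minimal sketch, assuming: per-application request-rate histories, an EWMA
# predictor, and proportional GPS weights with a floor (all illustrative stand-ins).
def ewma_predict(history, alpha=0.5):
    """one-step-ahead workload estimate from a list of measured request rates."""
    estimate = history[0]
    for x in history[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate

def gps_weights(histories, min_share=0.05):
    """proportional gps weights with a minimum share, normalised to sum to one."""
    demand = {app: ewma_predict(h) for app, h in histories.items()}
    total = sum(demand.values())
    raw = {app: max(min_share, d / total) for app, d in demand.items()}
    norm = sum(raw.values())
    return {app: w / norm for app, w in raw.items()}

measured = {"shop":  [120, 150, 180, 240],   # ramping up
            "blog":  [60, 55, 65, 60],       # steady
            "admin": [5, 4, 6, 5]}           # background
print(gps_weights(measured))
```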
while user interface ui prototyping is generally considered useful it may often be too expensive and time consuming this problem becomes even more severe through the ubiquitous use of variety of devices such as pcs mobile phones and pdas since each of these devices has its own specifics that require special user interface instead of developing ui prototypes directly we propose specifying one interaction design from which uis can be automatically generated for multiple devices our implemented approach uses communicative acts which derive from speech act theory and carry desired intentions in interactions models of communicative acts ui domain objects and interaction sequences comprise interaction design specifications in our approach and are based on metamodel that we have defined we support the development of such models through an ide which is coupled with the ui generator this allows new form of ui prototyping where the effects of each model change can be seen immediately in the automatically generated uis for every device at once
we define quality of service qos and cost model for communications in systems on chip soc and derive related network on chip noc architecture and design process soc inter module communication traffic is classified into four classes of service signaling for inter module control signals real time representing delay constrained bit streams rd wr modeling short data access and block transfer handling large data bursts communication traffic of the target soc is analyzed by means of analytic calculations and simulations and qos requirements delay and throughput for each service class are derived customized quality of service noc qnoc architecture is derived by modifying generic network architecture the customization process minimizes the network cost in area and power while maintaining the required qos the generic network is based on two dimensional planar mesh and fixed shortest path based multiclass wormhole routing once communication requirements of the target soc are identified the network is customized as follows the soc modules are placed so as to minimize spatial traffic density unnecessary mesh links and switching nodes are removed and bandwidth is allocated to the remaining links and switches according to their relative load so that link utilization is balanced the result is low cost customized qnoc for the target soc which guarantees that qos requirements are met
we propose novel technique for melting and burning solid materials including the simulation of the resulting liquid and gas the solid is simulated with traditional mesh based techniques triangles or tetrahedra which enable robust handling of both deformable and rigid objects collision and self collision rolling friction stacking etc the subsequently created liquid or gas is simulated with modern grid based techniques including vorticity confinement and the particle level set method the main advantage of our method is that state of the art techniques are used for both the solid and the fluid without compromising simulation quality when coupling them together or converting one into the other for example we avoid modeling solids as eulerian grid based fluids with high viscosity or viscoelasticity which would preclude the handling of thin shells self collision rolling etc thus our method allows one to achieve new effects while still using their favorite algorithms and implementations for simulating both solids and fluids whereas other coupling algorithms require major algorithm and implementation overhauls and still fail to produce rich coupling effects eg melting and burning solids
software instrumentation is widely used technique for parallel program performance evaluation debugging steering and visualization with increasing sophistication of parallel tool development technologies and broadening of application areas where these tools are being used runtime data collection and management activities are growing in importance we use the term instrumentation system is to refer to components that support these activities in state of the art parallel tool environments an is consists of local instrumentation servers an instrumentation system manager and transfer protocol the overheads and perturbation effects attributed to an is must be accounted for to ensure correct and efficient representation of program behavior especially for on line and real time environments moreover an is is key facilitator of integration of tools in an environment in this paper we define the primary components of an is and their roles in an integrated environment and classify iss according to selected features we introduce structured approach to plan design model evaluate implement and validate an is the approach provides means to formally address domain specific requirements the modeling and evaluation processes are illustrated in the context of three distinctive is case studies for picl paradyn and vista valuable feedback on performance effects of is parameters and policies can assist developers in making design decisions early in the software development cycle additionally use of structured software engineering methods can support the mapping of an abstract is model to an implementation of the is
temporal information has been regarded as key vehicle for sorting and grouping home photos into albums associated with events while time based browsing might be adequate for relatively small photo collection query and retrieval would be very useful to find relevant photos of an event in large collection in this paper we propose the use of temporal events for organizing and representing home photos using structured document formalism and hence new way to retrieve photos of an event using both image content and temporal context we describe hierarchical model of temporal events and the algorithm to construct it from collection of home photos in particular we compute metadata of node from the metadata of its children recursively to facilitate content based and context based matching between query and an event with semantic content representation extracted using visual keywords and extended conceptual graphs we demonstrate the effectiveness of photo retrieval on time stamped heterogeneous home photos with very promising results
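the sketch below illustrates grouping time stamped photos into events and propagating child metadata up to the event node the fixed time gap threshold and the keyword union roll up are illustrative assumptions rather than the hierarchical algorithm of the paper

```python
# minimal sketch, assuming a flat event list, a fixed 3-hour gap threshold and
# keyword sets as the only metadata (all illustrative choices).
from datetime import datetime, timedelta

def build_events(photos, gap=timedelta(hours=3)):
    """photos: list of (timestamp, set-of-visual-keywords)."""
    events = []
    for ts, keywords in sorted(photos):
        if events and ts - events[-1]["end"] <= gap:
            node = events[-1]
            node["end"] = ts
            node["photos"].append((ts, keywords))
            node["keywords"] |= keywords     # propagate child metadata upward
        else:
            events.append({"start": ts, "end": ts,
                           "photos": [(ts, keywords)],
                           "keywords": set(keywords)})
    return events

photos = [(datetime(2004, 6, 5, 10, 0), {"beach", "people"}),
          (datetime(2004, 6, 5, 10, 40), {"beach", "boat"}),
          (datetime(2004, 6, 6, 19, 15), {"indoor", "cake"})]
for e in build_events(photos):
    print(e["start"], "-", e["end"], sorted(e["keywords"]))
```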
as the disparity between processor and main memory performance grows the number of execution cycles spent waiting for memory accesses to complete also increases as result latency hiding techniques are critical for improved application performance on future processors we present microarchitecture scheme which detects and adapts to varying spatial locality dynamically adjusting the amount of data fetched on cache miss the spatial locality detection table introduced in this paper facilitates the detection of spatial locality across adjacent cached blocks results from detailed simulations of several integer programs show significant speedups the improvements are due to the reduction of conflict and capacity misses by utilizing small blocks and small fetch sizes when spatial locality is absent and the prefetching effect of large fetch sizes when spatial locality exists
recently two dimensional locality preserving projections dlpp was proposed to extract features directly from image matrices based on locality preserving criterion though dlpp has been applied in many domains including face and palmprint recognition it still has several disadvantages the nearest neighbor graph fails to model the intrinsic manifold structure inside the image large dimensionality training space affects the calculation efficiency and too many coefficients are needed for image representation these problems inspire us to propose an improved dlpp idlpp for recognition in this paper the modifications of the proposed idlpp mainly focus on two aspects firstly the nearest neighbor graph is constructed in which each node corresponds to column inside the matrix instead of the whole image to better model the intrinsic manifold structure secondly dpca is implemented in the row direction prior to dlpp in the column direction to reduce the calculation complexity and the final feature dimensions by using the proposed idlpp we achieve better recognition performance in both accuracy and speed furthermore owing to the robustness of gabor filter against variations the improved dlpp based on the gabor features idlppg can further enhance the recognition rate experimental results on the two palmprint databases of our lab demonstrate the effectiveness of the proposed method
contemporary data warehouses now represent some of the world’s largest databases as these systems grow in size and complexity however it becomes increasingly difficult for brute force query processing approaches to meet the performance demands of end users certainly improved indexing and more selective view materialization are helpful in this regard nevertheless with warehouses moving into the multi terabyte range it is clear that the minimization of external memory accesses must be primary performance objective in this paper we describe the cache natively multi dimensional caching framework designed specifically to support sophisticated warehouse olap environments cache is based upon an in memory version of the tree that has been extended to support buffer pages rather than disk blocks key strength of the cache is that it is able to utilize multi dimensional fragments of previous query results so as to significantly minimize the frequency and scale of disk accesses moreover the new caching model directly accommodates the standard relational storage model and provides mechanisms for pro active updates that exploit the existence of query hot spots the current prototype has been evaluated as component of the sidera dbms shared nothing parallel olap server designed for multi terabyte analytics experimental results demonstrate significant performance improvements relative to simpler alternatives
data cubes support powerful data analysis method called the range sum query the range sum query is widely used in finding trends and in discovering relationships among attributes in diverse database applications range sum query computes aggregate information over an online analytical processing olap data cube in specified query ranges existing techniques for range sum queries on data cubes use an additional cube called the prefix sum cube pc to store the cumulative sums of data causing high space overhead this space overhead not only leads to extra costs for storage devices but also causes additional propagations of updates and longer access time on physical devices in this paper we present new cube representation called the pc pool which drastically reduces the space of the pc in large data warehouse the pc pool decreases the update propagation caused by the dependency between values in cells of the pc we develop an effective algorithm which finds dense sub cubes from large data cube we perform an extensive experiment with diverse data sets and examine the space reduction and performance of our proposed method with respect to various dimensions of the data cube and query sizes experimental results show that our method reduces the space of the pc while having reasonable query performance
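for reference the sketch below shows the plain prefix sum cube technique that the pc pool representation above sets out to compress a two dimensional case is used for brevity and the dense sub cube selection of the paper is not reproduced

```python
# minimal sketch of a 2-d prefix-sum cube: any range sum is answered with a
# constant number of lookups via inclusion-exclusion.
import numpy as np

def prefix_sum_cube(cube):
    return cube.cumsum(axis=0).cumsum(axis=1)

def range_sum(pc, lo_r, hi_r, lo_c, hi_c):
    """sum over cube[lo_r:hi_r+1, lo_c:hi_c+1] using four cells of the pc."""
    total = pc[hi_r, hi_c]
    if lo_r > 0:
        total -= pc[lo_r - 1, hi_c]
    if lo_c > 0:
        total -= pc[hi_r, lo_c - 1]
    if lo_r > 0 and lo_c > 0:
        total += pc[lo_r - 1, lo_c - 1]
    return total

cube = np.arange(16).reshape(4, 4)
pc = prefix_sum_cube(cube)
assert range_sum(pc, 1, 2, 1, 3) == cube[1:3, 1:4].sum()
print(range_sum(pc, 1, 2, 1, 3))
```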
we describe meta querying system for databases containing queries in addition to ordinary data in the context of such databases meta query is query about queries representing stored queries in xml and using the standard xml manipulation language xslt as sublanguage we show that just few features need to be added to sql to turn it into fully fledged meta query language the good news is that these features can be directly supported by extensible database technology
we consider the problem of evaluating continuous selection queries over sensor generated values in the presence of faults small sensors are fragile have finite energy and memory and communicate over lossy medium hence tuples produced by them may not reach the querying node resulting in an incomplete and ambiguous answer as any of the non reporting sensors may have produced tuple which was lost we develop protocol fault tolerant evaluation of continuous selection queries fate csq which guarantees user requested level of quality in an efficient manner when many faults occur this may not be achievable in that case we aim for the best possible answer under the query’s time constraints fate csq is designed to be resilient to different kinds of failures our design decisions are based on an analytical model of different fault tolerance strategies based on feedback and retransmission additionally we evaluate fate csq and competing protocols with realistic simulation parameters under variety of conditions demonstrating its good performance
re finding common web task is difficult when previously viewed information is modified moved or removed for example if person finds good result using the query breast cancer treatments she expects to be able to use the same query to locate the same result again while re finding could be supported by caching the original list caching precludes the discovery of new information such as in this case new treatment options people often use search engines to simultaneously find and re find information the research engine is designed to support both behaviors in dynamic environments like the web by preserving only the memorable aspects of result list study of result list memory shows that people forget lot the research engine takes advantage of these memory lapses to include new results where old results have been forgotten
traditionally ad hoc networks have been viewed as connected graph over which end to end routing paths had to be established mobility was considered necessary evil that invalidates paths and needs to be overcome in an intelligent way to allow for seamless communication between nodes however it has recently been recognized that mobility can be turned into useful ally by making nodes carry data around the network instead of transmitting them this model of routing departs from the traditional paradigm and requires new theoretical tools to model its performance mobility assisted protocol forwards data only when appropriate relays encounter each other and thus the time between such encounters called hitting or meeting time is of high importance in this paper we derive accurate closed form expressions for the expected encounter time between different nodes under commonly used mobility models we also propose mobility model that can successfully capture some important real world mobility characteristics often ignored in popular mobility models and calculate hitting times for this model as well finally we integrate these results with general theoretical framework that can be used to analyze the performance of mobility assisted routing schemes we demonstrate that derivative results concerning the delay of various routing schemes are very accurate under all the mobility models examined hence this work helps in better understanding the performance of various approaches in different settings and can facilitate the design of new improved protocols
many popular database management systems implement multiversion concurrency control algorithm called snapshot isolation rather than providing full serializability based on locking there are well known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that would maintain consistency if run serially until now the only way to prevent these anomalies was to modify the applications by introducing explicit locking or artificial update conflicts following careful analysis of conflicts between all pairs of transactions this article describes modification to the concurrency control algorithm of database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications thus providing serializable isolation the new algorithm preserves the properties that make snapshot isolation attractive including that readers do not block writers and vice versa an implementation of the algorithm in relational dbms is described along with benchmark and performance study showing that the throughput approaches that of snapshot isolation in most cases
in this paper we explore the impact of caching during search in the context of the recent framework of and or search in graphical models specifically we extend the depth first and or branch and bound tree search algorithm to explore an and or search graph by equipping it with an adaptive caching scheme similar to good and no good recording furthermore we present best first search algorithms for traversing the same underlying and or search graph and compare both algorithms empirically we focus on two common optimization problems in graphical models finding the most probable explanation mpe in belief networks and solving weighted csps wcsp in an extensive empirical evaluation we demonstrate conclusively the superiority of the memory intensive and or search algorithms on variety of benchmarks
modern mobile phones and pdas are equipped with positioning capabilities eg gps users can access public location based services eg google maps and ask spatial queries although communication is encrypted privacy and confidentiality remain major concerns since the queries may disclose the location and identity of the user commonly spatial anonymity is employed to hide the query initiator among group of users however existing work either fails to guarantee privacy or exhibits unacceptably long response time in this paper we propose mobihide peer to peer system for anonymous location based queries which addresses these problems mobihide employs the hilbert space filling curve to map the locations of mobile users to space the transformed locations are indexed by chord based distributed hash table which is formed by the mobile devices the resulting peer to peer system is used to anonymize query by mapping it to random group of users that are consecutive in the space compared to existing state of the art mobihide does not provide theoretical anonymity guarantees for skewed query distributions nevertheless it achieves strong anonymity in practice and it eliminates system hotspots our experimental evaluation shows that mobihide has good load balancing and fault tolerance properties and is applicable to real life scenarios with numerous mobile users
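the sketch below illustrates the hilbert curve grouping idea described above mapping two dimensional locations to one dimensional hilbert indices and taking users consecutive in that order around the querier the grid resolution the random offset rule and the data layout are illustrative assumptions and the chord based distributed index of the system is not modelled

```python
# minimal sketch, assuming a 64x64 location grid and a random start offset so
# the querier's position inside the group is not fixed (illustrative choices).
import random

def hilbert_index(n, x, y):
    """map grid cell (x, y) on an n x n grid (n a power of two) to its
    position along the hilbert curve (standard bit-twiddling construction)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                          # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def anonymity_group(users, querier_id, k, grid=64):
    """pick k users consecutive in hilbert order around the querier."""
    ranked = sorted(users, key=lambda u: hilbert_index(grid, u[1], u[2]))
    ids = [u[0] for u in ranked]
    i = ids.index(querier_id)
    start = max(0, min(len(ids) - k, i - random.randrange(k)))
    return ids[start:start + k]

users = [("u%d" % i, random.randrange(64), random.randrange(64)) for i in range(20)]
print(anonymity_group(users, "u7", k=5))
```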
displaying scanned book pages in web browser is difficult due to an array of characteristics of the common user’s configuration that compound to yield text that is degraded and illegibly small for books which contain only text this can often be solved by using ocr or manual transcription to extract and present the text alone or by magnifying the page and presenting it in scrolling panel books with rich illustrations especially children’s picture books present greater challenge because their enjoyment is dependent on reading the text in the context of the full page with its illustrations we have created two novel prototypes for solving this problem by magnifying just the text without magnifying the entire page we present the results of user study of these techniques users found our prototypes to be more effective than the dominant interface type for reading this kind of material and in some cases even preferable to the physical book itself
the unified modeling language uml is family of design notations that is rapidly becoming de facto standard software design language uml provides variety of useful capabilities to the software designer including multiple interrelated design views semiformal semantics expressed as uml meta model and an associated language for expressing formal logic constraints on design elements the primary goal of this work is an assessment of uml’s expressive power for modeling software architectures in the manner in which number of existing software architecture description languages adls model architectures this paper presents two strategies for supporting architectural concerns within uml one strategy involves using uml as is while the other incorporates useful features of existing adls as uml extensions we discuss the applicability strengths and weaknesses of the two strategies the strategies are applied on three adls that as whole represent broad cross section of present day adl capabilities one conclusion of our work is that uml currently lacks support for capturing and exploiting certain architectural concerns whose importance has been demonstrated through the research and practice of software architectures in particular uml lacks direct support for modeling and exploiting architectural styles explicit software connectors and local and global architectural constraints
mutual anonymity system enables communication between client and service provider without revealing their identities in general the anonymity guarantees made by the protocol are enhanced when large number of participants are recruited into the anonymity system peer to peer pp systems are able to attract large number of nodes and hence are highly suitable for anonymity systems however the churn changes in system membership within pp networks poses significant challenge for low bandwidth reliable anonymous communication in these networks this paper presents muon protocol to achieve mutual anonymity in unstructured pp networks muon leverages epidemic style data dissemination to deal with churn simulation results and security analysis indicate that muon provides mutual anonymity in networks with high churn while maintaining predictable latencies high reliability and low communication overhead
speculative partial redundancy elimination spre uses execution profiles to improve the expected performance of programs we show how the problem of placing expressions to achieve the optimal expected performance can be mapped to particular kind of network flow problem and hence solved by well known techniques our solution is sufficiently efficient to be used in practice furthermore the objective function may be chosen so that reduction in space requirements is the primary goal and execution time is secondary one surprising result is that an explosion in size may occur if speed is the sole goal and consideration of space usage is therefore important
in recent years high dynamic range textures hdrts have been frequently used in real time applications and video games to enhance realism unfortunately hdrts consume considerable amount of memory and efficient compression methods are not straightforward to implement on modern gpus we propose framework for efficient hdrt compression using tone mapping and its dual inverse tone mapping in our method encoding is performed by compressing the dynamic range using tone mapping operator followed by traditional encoding method for low dynamic range imaging our decoding method decodes the low dynamic range image and expands its range with the inverse tone mapping operator we present results using the photographic tone reproduction tone mapping operator and its inverse encoded with stc running in real time on current programmable gpu hardware resulting in compressed hdrts at bits per pixel bpp using fast shader program for decoding we show how our approach is favorable compared to other existing methods
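the sketch below illustrates the encode decode idea described above compress the dynamic range with a global reinhard style photographic operator store the result as an ordinary eight bit texture and expand it with the inverse operator at decode time the plain eight bit quantization stands in for the block compressed ldr codec used on the gpu and the key value is an illustrative assumption

```python
# minimal sketch, assuming: a global photographic operator L/(1+L) on luminance,
# 8-bit quantisation as a stand-in for the LDR block codec, key = 0.18.
import numpy as np

def encode(hdr_luminance, key=0.18):
    log_avg = np.exp(np.mean(np.log(hdr_luminance + 1e-6)))
    scaled = key * hdr_luminance / log_avg
    ldr = scaled / (1.0 + scaled)                 # tone map to [0, 1)
    return np.round(ldr * 255).astype(np.uint8), log_avg

def decode(ldr_bytes, log_avg, key=0.18):
    ldr = ldr_bytes.astype(np.float64) / 255.0
    ldr = np.clip(ldr, 0.0, 254.5 / 255.0)        # keep the inverse finite
    scaled = ldr / (1.0 - ldr)                    # inverse tone mapping
    return scaled * log_avg / key

hdr = np.exp(np.random.default_rng(1).uniform(-4, 4, size=(8, 8)))
compressed, log_avg = encode(hdr)
restored = decode(compressed, log_avg)
print("max relative error:", float(np.max(np.abs(restored - hdr) / hdr)))
```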
opportunistic grids are class of computational grids that can leverage the idle processing and storage capacity of shared workstations in laboratories companies and universities to perform useful computation oppstore is middleware that allows using the free disk space of machines from an opportunistic grid for the distributed storage of application data but when machines depart from the grid it is necessary to reconstruct the fragments that were stored in those machines depending on the amount of stored data and the rate of machine departures the generated traffic may make the distributed storage of data infeasible in this work we present and evaluate fragment recovery mechanism that makes it viable to achieve redundancy and large data scale in dynamic environment
as most blogs and traditional media support rss or atom feeds the news feed technology becomes increasingly prevalent taking advantage of ubiquitous news feeds we design feedex news feed exchange system forming distribution overlay network nodes in feedex not only fetch feed documents from the servers but also exchange them with neighbors among many benefits of collaborative feed exchange we focus on the low overhead scalable delivery mechanism that increases the availability of news feeds our design of feedex is incentive compatible so that nodes are encouraged into cooperating rather than free riding in addition for better design of feedex we analyze the data collected from feeds for days and present relevant statistics about news feed publishing including the distributions of feed size entry lifetime and publishing rate our experimental evaluation using planetlab machines which fetch from real world feed servers shows that feedex is an efficient system in many respects even when node fetches feed documents as infrequently as every hours it captures more than of the total entries published and those captured entries are available within minutes on average after published at the servers by contrast stand alone applications in the same condition show of entry coverage and hours of time lag the efficient delivery of feedex is achieved with low communication overhead as each node receives only document exchange calls and document checking calls per minute on average
analysis of technology and application trends reveals growing imbalance in the peak compute to memory capacity ratio for future servers at the same time the fraction contributed by memory systems to total datacenter costs and power consumption during typical usage is increasing in response to these trends this paper re examines traditional compute memory co location on single system and details the design of new general purpose architectural building block memory blade that allows memory to be disaggregated across system ensemble this remote memory blade can be used for memory capacity expansion to improve performance and for sharing memory across servers to reduce provisioning and power costs we use this memory blade building block to propose two new system architecture solutions page swapped remote memory at the virtualization layer and block access remote memory with support in the coherence hardware that enable transparent memory expansion and sharing on commodity based systems using simulations of mix of enterprise benchmarks supplemented with traces from live datacenters we demonstrate that memory disaggregation can provide substantial performance benefits on average in memory constrained environments while the sharing enabled by our solutions can improve performance per dollar by up to when optimizing memory provisioning across multiple servers
mobile ad hoc networks manets follow unique organizational and behavioral logic manets characteristics such as their dynamic topology coupled with the characteristics of the wireless communication medium make quality of service provisioning difficult challenge this paper presents new approach based on mobile routing backbone for supporting quality of service qos in manets in real life manets nodes will possess different communication capabilities and processing characteristics hence we aim to identify those nodes whose capabilities and characteristics will enable them to take part in the mobile routing backbone and efficiently participate in the routing process moreover the route discovery mechanism we developed for the mobile routing backbone dynamically distributes traffic within the network according to current network traffic levels and nodes processing loads simulation results show that our solution improves network throughput and packet delivery ratio by directing traffic through lowly congested regions of the network that are rich in resources moreover our protocol incurs lower communication overheads than aodv ad hoc on demand distance vector routing protocol when searching for routes in the network
this paper presents the popularity analysis performed on the wwwlnees video on demand service its principal special characteristics are the wide range of subjects the type of contents offered and the daily introduction of several new videos all of this and its similarity with the majority of the digital news services make it an interesting case study the analysis pays attention to all the elements which can influence content popularity such as the subject the video characteristics and the new content introduction policy moreover the results are checked in different timescales and periods of time
large and increasing gap exists between processor and memory speeds in scalable cache coherent multiprocessors to cope with this situation programmers and compiler writers must increasingly be aware of the memory hierarchy as they implement software tools to support memory performance tuning have however been hobbled by the fact that it is difficult to observe the caching behavior of running program little hardware support exists specifically for observing caching behavior furthermore what support does exist is often difficult to use for making fine grained observations about program memory behavior our work observes that in multiprocessor the actions required for memory performance monitoring are similar to those required for enforcing cache coherence in fact we argue that on several machines the coherence communication system itself can be used as machine support for performance monitoring we have demonstrated this idea by implementing the flashpoint memory performance monitoring tool flashpoint is implemented as special performance monitoring coherence protocol for the stanford flash multiprocessor by embedding performance monitoring into cache coherence scheme based on programmable controller we can gather detailed per data structure memory statistics with less than slowdown compared to unmonitored program executions we present results on the accuracy of the data collected and on how flashpoint performance scales with the number of processors
generalization from string to trees and from languages to translations is given of the classical result that any regular language can be learned from examples it is shown that for any deterministic top down tree transformation there exists sample set of polynomial size with respect to the minimal transducer which allows to infer the translation until now only for string transducers and for simple relabeling tree transducers similar results had been known learning of deterministic top down tree transducers dtops is far more involved because dtop can copy delete and permute its input subtrees thus complex dependencies of labeled input to output paths need to be maintained by the algorithm first myhill nerode theorem is presented for dtops which is interesting on its own this theorem is then used to construct learning algorithm for dtops finally it is shown how our result can be applied to xml transformations eg xslt programs for this new dtd based encoding of unranked trees by ranked ones is presented over such encodings dtops can realize many practically interesting xml transformations which cannot be realized on firstchild next sibling encodings
the performance analysis of dynamic routing algorithms in interconnection networks of parallel computers has thus far predominantly been done by simulation studies limitation of simulation studies is that they usually only hold for specific combinations of network routing algorithm and traffic pattern in this paper we derive saturation point results for the class of homogeneous traffic patterns and large class of routing functions on meshes we show that the best possible saturation point on mesh is half the best possible saturation point on torus we also show that if we restrict ourselves to homogeneous routing functions the worst possible saturation point on mesh is again half the best possible saturation point finally we present class of homogeneous routing functions containing the well known cube routing function which are all optimal for all homogeneous traffic patterns
the world wide web is new advertising medium that corporations use to increase their exposure to consumers very large websites whose content is derived from source database need to maintain freshness that reflects changes that are made to the base data this issue is particularly significant for websites that present fast changing information such as stock exchange information and product information in this article we formally define and study the freshness of website that is refreshed by scheduled set of queries that fetch fresh data from the databases we propose several online scheduling algorithms and compare the performance of the algorithms on the freshness metric we show that maximizing the freshness of website is np hard problem and that the scheduling algorithm mief performs better than the other proposed algorithms our conclusion is verified by empirical results
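as a concrete illustration of the online refresh-scheduling problem above, the sketch below greedily runs the query with the highest staleness-removed-per-cost ratio; this is a hypothetical baseline, not the paper's mief algorithm, and the query cost and page fields are assumptions.

```python
# Hypothetical greedy refresh scheduler: NOT the paper's mief policy, just a
# sketch of online scheduling for website freshness. Each query has a run cost
# (seconds) and a set of pages it refreshes; a page accumulates staleness
# since its last refresh.

def schedule_refresh(queries, horizon):
    """queries: list of dicts {id, cost, pages}; returns (time, query) order."""
    last_refresh = {}           # page -> time of last refresh
    now, order = 0.0, []
    while now < horizon:
        def gain(q):            # staleness removed per unit of query cost
            stale = sum(now - last_refresh.get(p, 0.0) for p in q["pages"])
            return stale / q["cost"]
        best = max(queries, key=gain)
        now += best["cost"]
        for p in best["pages"]:
            last_refresh[p] = now
        order.append((now, best["id"]))
    return order

if __name__ == "__main__":
    qs = [{"id": "stocks", "cost": 2.0, "pages": ["quotes", "indices"]},
          {"id": "news",   "cost": 5.0, "pages": ["front", "sports", "world"]}]
    for t, qid in schedule_refresh(qs, horizon=20.0):
        print(f"t={t:5.1f}  ran {qid}")
```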
database centric information systems are critical to the operations of large organisations in particular they often process large amount of data with stringent performance requirements currently however there is lack of systematic approaches to evaluating and predicting their performance when they are subject to an exorbitant growth of workload in this paper we introduce such systematic approach that combines benchmarking production system monitoring and performance modelling bmm to address this issue the approach helps the performance analyst to understand the system’s operating environment and quantify its performance characteristics under varying load conditions via monitoring and benchmarking based on such realistic measurements modelling techniques are used to predict the system performance our experience of applying bmm to real world system demonstrates the capability of bmm in predicting the performance of existing and enhanced software architectures in planning for its capacity growth
the increased use of video data sets for multimedia based applications has created demand for strong video database support including efficient methods for handling the content based query and retrieval of video data video query processing presents significant research challenges mainly associated with the size complexity and unstructured nature of video data video query processor must support video operations for search by content and streaming new query types and the incorporation of video methods and operators in generating optimizing and executing query plans in this paper we address these query processing issues in two contexts first as applied to the video data type and then as applied to the stream data type we first present the query processing functionality of the vdbms video database management system as framework designed to support the full range of functionality for video as an abstract data type we describe two query operators for the video data type which implement the rank join and stop after algorithms as videos may be considered streams of consecutive image frames video query processing can be expressed as continuous queries over video data streams the stream data type was therefore introduced into the vdbms system and system functionality was extended to support general data streams from this viewpoint we present an approach for defining and processing streams including video through the query execution engine we describe the implementation of several algorithms for video query processing expressed as continuous queries over video streams such as fast forward region based blurring and left outer join we include description of the window join algorithm as core operator for continuous query systems and discuss shared execution as an optimization approach for stream query processing
in this paper we propose model based quality of service qos routing scheme for ieee ad hoc networks unlike most of qos routing schemes in the literature the proposed scheme provides stochastic end to end delay guarantees instead of average delay guarantees to delay sensitive bursty traffic sources via cross layer design approach the scheme selects the routes based on geographical on demand ad hoc routing protocol and checks the availability of network resources by using traffic source and link layer channel modeling taking into consideration the ieee characteristics and node interactions our scheme extends the well developed effective bandwidth theory and its dual effective capacity concept to multihop ieee ad hoc networks extensive computer simulations demonstrate that the proposed scheme is effective in satisfying the end to end delay bound to probabilistic limit
standard ml is statically typed programming language that is suited for the construction of both small and large programs programming in the small is captured by standard ml’s core language programming in the large is captured by standard ml’s modules language that provides constructs for organising related core language definitions into self contained modules with descriptive interfaces while the core is used to express details of algorithms and data structures modules is used to express the overall architecture of software system the modules and core languages are stratified in the sense that modules may not be manipulated as ordinary values of the core this is limitation since it means that the architecture of program cannot be reconfigured according to run time demands we propose novel and practical extension of the language that allows modules to be manipulated as first class values of the core language
the backends of today’s internet services rely heavily on caching at various layers both to provide faster service to common requests and to reduce load on back end components cache placement is especially challenging given the diversity of workloads handled by widely deployed internet services this paper presents tool an analysis technique that automatically optimizes cache placement our experiments have shown that near optimal cache placements vary significantly based on input distribution
like other unstructured decision problems selection of external trustworthy objects is challenging particularly in virtual organization vo novel methods are desired to filter out invalid information as well as insecure programs this paper presents new conceptual approach to support selection of objects it is two level decision model which helps vo participant determine whether an external object can be accepted based on the object’s quality and security features this hierarchical decision making process complies with both practical evidence and theoretical decision models its underlying concepts are logically sound and comprehensible we illustrate the approaches using software selection
how much can smart combinatorial algorithms improve web search engines to address this question we will describe three algorithms that have had positive impact on web search engines the pagerank algorithm algorithms for finding near duplicate web pages and algorithms for index server loadbalancing
querying large scale graph structured data with twig patterns is attracting growing interest generally twig pattern could have an extremely large potentially exponential number of matches in graph retrieving and returning to the user this many answers may both incur high computational overhead and overwhelm the user in this paper we propose two efficient algorithms dp and dp for retrieving top ranked twig pattern matches from large graphs our first algorithm dp is able to retrieve exact top ranked answer matches from potentially exponentially many matches in time and space linear in the size of our data inputs even in the worst case further beyond the linear cost result of dp our second algorithm dp could take far less than linear time and space cost in practice to the best of our knowledge our algorithms are the first to have these performance properties our experimental results demonstrate the high performance of both algorithms on large datasets we also analyze and compare the performance trade off between dp and dp from the theoretical and practical viewpoints
personalization is one of the keys for the success of web services in this paper we present sean server for adaptive news an adaptive system for the personalized access to news servers on the www the aims of the system are (i) to select the sections topics and news in the server that are most relevant for each user (ii) to customize the detail level of each news item to the user’s characteristics and (iii) to select the advertisements that are most appropriate for each page and user in the paper we discuss the functionalities of the system and we present the choices we made in its design in particular we focus on the techniques we adopted for structuring the news archive for creating and maintaining the user model and for generating the personalized hypertext for browsing the news server
this paper presents novel feature selection method for classification of high dimensional data such as those produced by microarrays it includes partial supervision to smoothly favor the selection of some dimensions genes on new dataset to be classified the dimensions to be favored are previously selected from similar datasets in large microarray databases hence performing inductive transfer learning at the feature level this technique relies on feature selection method embedded within regularized linear model estimation practical approximation of this technique reduces to linear svm learning with iterative input rescaling the scaling factors depend on the selected dimensions from the related datasets the final selection may depart from those whenever necessary to optimize the classification objective experiments on several microarray datasets show that the proposed method both improves the selected gene lists stability with respect to sampling variation as well as the classification performances
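the rescaling idea above can be pictured with the sketch below: a linear svm is retrained while the input dimensions are rescaled by a blend of the learned weights and prior gene-relevance scores transferred from related datasets. the blending rule, the alpha parameter, and the top-50 cutoff are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np
from sklearn.svm import LinearSVC

def rescaled_svm(X, y, prior_relevance, iters=10, alpha=0.5, C=1.0):
    """Sketch of linear-SVM feature selection with iterative input rescaling.
    prior_relevance: nonnegative per-feature scores favored by related
    datasets (partial supervision); alpha blends prior and learned weights."""
    scale = np.ones(X.shape[1])
    for _ in range(iters):
        clf = LinearSVC(C=C, dual=False).fit(X * scale, y)
        w = np.abs(clf.coef_).ravel() * scale          # weight in original space
        # new scaling: mix normalized |w| with the transferred prior
        w_norm = w / (w.max() + 1e-12)
        p_norm = prior_relevance / (prior_relevance.max() + 1e-12)
        scale = alpha * p_norm + (1 - alpha) * w_norm
    selected = np.argsort(-scale)[:50]                 # e.g. top-50 genes
    return clf, scale, selected
```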
many applications that make use of sensor networks require secure communication because asymmetric key solutions are difficult to implement in such resource constrained environment symmetric key methods coupled with priori key distribution schemes have been proposed to achieve the goals of data secrecy and integrity these approaches typically assume that all nodes are similar in terms of capabilities and hence deploy the same number of keys in all sensors in network to provide the aforementioned protections in this paper we demonstrate that probabilistic unbalanced distribution of keys throughout the network that leverages the existence of small percentage of more capable sensor nodes can not only provide an equal level of security but also reduce the consequences of node compromise to fully characterize the effects of the unbalanced key management system we design implement and measure the performance of complementary suite of key establishment protocols known as liger using their predeployed keys nodes operating in isolation from external networks can securely and efficiently establish keys with each other should resources such as backhaul link to key distribution center kdc become available networks implementing liger automatically incorporate and benefit from such facilities detailed experiments demonstrate that the unbalanced distribution in combination with the multimodal liger suite offers robust and practical solution to the security needs in sensor networks
users often try to accumulate information on topic of interest from multiple information sources in this case user’s informational need might be expressed in terms of an available relevant document eg web page or an mail attachment rather than query database search engines are mostly adapted to the queries manually created by the users in case user’s informational need is expressed in terms of document we need algorithms that map keyword queries automatically extracted from this document to the database content in this paper we analyze the impact of selected document and database statistics on the effectiveness of keyword disambiguation for manually created as well as automatically extracted keyword queries our evaluation is performed using set of user queries from the aol query log and set of queries automatically extracted from wikipedia articles both executed against the internet movie database imdb our experimental results show that knowledge of the document context is crucial in order to extract meaningful keyword queries statistics which enable effective disambiguation of user queries are not sufficient to achieve the same quality for the automatically extracted requests
context aware computing is key paradigm of ubiquitous computing in which applications automatically adapt their operations to dynamic context data from multiple sources managing number of distributed sources middleware that facilitates the development of context aware applications must provide uniform view of all these sources to the applications local schemas of context data from individual sources need to be matched into set of global schemas in the middleware upon which applications can issue context queries to acquire data in this paper we study this problem of schema matching for context aware computing we propose multi criteria algorithm to determine candidate attribute matches between two schemas the algorithm adaptively adjusts the priorities of different criteria based on previous matching results to improve the efficiency and accuracy of succeeding operations we further develop an algorithm to categorize new local schema into one of the global schemas whenever possible via shared attribute dictionary our results based on schemas from real world websites demonstrate the good matching accuracy achieved by our algorithms
this work introduces distributed branch and bound algorithm to be run on computational grids grids are often organized in hierarchical fashion clusters of processors connected via high speed links while the clusters themselves are geographically distant and connected through slower links our algorithm does not employ the usual master worker paradigm and it considers the hierarchical structure of grids in its load balance and fault tolerance procedures this algorithm was applied over an existing code for the steiner problem in graphs experiments on real grid conditions have demonstrated its efficiency and scalability
bursty application patterns together with transfer limited storage devices combine to create major bottleneck on parallel systems this paper explores the use of time series models to forecast application request times then prefetching requests during computation intervals to hide latency experimental results with intensive scientific codes show performance improvements compared to standard unix prefetching strategies
the research in materialization of derived data elements has dealt so far with the if issue that is the question whether to physically store derived data elements in the active database area there has been some research on the how issue in this paper we deal with the when issue devising an optimization model to determine the optimal materialization strategy the decision problem confronted by the optimization model is more complex than to materialize or not to materialize the decision problem deals with devising the materialization strategy that consists of set of interdependent decisions about each derived data element each decision relates to two issues should the value of derived data element be persistent and what is the required level of consistency of derived value with respect to its derivers for each derived data element the decision is based on both its local properties complexity of derivation update and retrieval frequencies etc and its interdependencies with other derived values the optimization model is based on heuristic algorithm that finds local optimum which is global optimum in many cases and on monitor that obtains feedback about the actual database performance this optimization model is general and is not specific to any data model our experimental results show that predictor for the optimal solution cannot be obtained in any intuitive or analytic way due to the complexity of the involved considerations thus there is no obvious way to achieve these results without using the optimization model this fact is strong motivation for applying such an optimization model our experimental results further indicate that the optimization model is useful in the sense that the system performance with respect to the applications goal function is substantially improved compared to any universal materialization policy
this paper presents novel approach for using clickthrough data to learn ranked retrieval functions for web search results we observe that users searching the web often perform sequence or chain of queries with similar information need using query chains we generate new types of preference judgments from search engine logs thus taking advantage of user intelligence in reformulating queries to validate our method we perform controlled user study comparing generated preference judgments to explicit relevance judgments we also implemented real world search engine to test our approach using modified ranking svm to learn an improved ranking function from preference data our results demonstrate significant improvements in the ranking given by the search engine the learned rankings outperform both static ranking function as well as one trained without considering query chains
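one way to picture the preference generation from query chains is sketched below: a document clicked for a later query in a chain is taken to be preferred over documents skipped for an earlier query with the same information need. the log format and this single rule are illustrative assumptions; the paper derives several richer preference types.

```python
from collections import namedtuple

# Hypothetical log record: results is the ranked list shown for the query,
# clicked is the subset the user actually clicked.
Query = namedtuple("Query", "qid results clicked")

def chain_preferences(chain):
    """Generate (query, better_doc, worse_doc) preference triples from a chain."""
    prefs = []
    for i, earlier in enumerate(chain[:-1]):
        skipped = [d for d in earlier.results if d not in earlier.clicked]
        for later in chain[i + 1:]:
            for good in later.clicked:
                prefs.extend((earlier.qid, good, bad)
                             for bad in skipped if bad != good)
    return prefs

if __name__ == "__main__":
    chain = [Query("laptop", ["d1", "d2", "d3"], clicked=[]),
             Query("laptop 14 inch", ["d4", "d2"], clicked=["d4"])]
    print(chain_preferences(chain))   # d4 preferred over d1, d2, d3 for "laptop"
```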
modern distributed systems involving large number of nonstationary clients mobile hosts mh connected via unreliable low bandwidth communication channels are very prone to frequent disconnections this disconnection may occur because of different reasons the clients may voluntarily switch off to save battery power or client may be involuntarily disconnected due to its own movement in mobile network hand off wireless link failures etc mobile computing environment is characterized by slow wireless links and relatively underprivileged hosts with limited battery powers still when data at the server changes the client hosts must be made aware of this fact in order for them to invalidate their cache otherwise the host would continue to answer queries with the cached values returning incorrect data the nature of the physical medium coupled with the fact that disconnections from the network are very frequent in mobile computing environments demand cache invalidation strategy with minimum possible overheads in this paper we present new cache maintenance scheme called as the objective of the proposed scheme is to minimize the overhead for the mhs to validate their cache upon reconnection to allow stateless servers and to minimize the bandwidth requirement the general approach is to use asynchronous invalidation messages and to buffer invalidation messages from servers at the mh’s home location cache hlc while the mh is disconnected from the network and redeliver these invalidation messages to the mh when it gets reconnected to the network use of asynchronous invalidation messages minimizes access latency buffering of invalidation messages minimizes the overhead of validating mh’s cache after each disconnection and use of hlc off loads the overhead of maintaining state of mh’s cache from the servers the mh can be disconnected from the server either voluntarily or involuntarily we capture the effects of both by using single parameter the percentage of time mobile host is disconnected from the network we demonstrate the efficacy of our scheme through simulation and performance modeling in particular we show that the average data access latency and the number of uplink requests by mh decrease by using the proposed strategy at the cost of using buffer space at the hlc we provide analytical comparison between our proposed scheme and the existing scheme for cache management in mobile environment extensive experimental results are provided to compare the schemes in terms of performance metrics like latency number of uplink requests etc under both high and low rate of change of data at servers for various values of the parameter mathematical model for the scheme is developed which matches closely with the simulation results
this paper proposes new concept of polycube splines and develops novel modeling techniques for using the polycube splines in solid modeling and shape computing polycube splines are essentially novel variant of manifold splines which are built upon the polycube map serving as its parametric domain our rationale for defining spline surfaces over polycubes is that polycubes have rectangular structures everywhere over their domains except very small number of corner points the boundary of polycubes can be naturally decomposed into set of regular structures which facilitate tensor product surface definition gpu centric geometric computing and image based geometric processing we develop algorithms to construct polycube maps and show that the introduced polycube map naturally induces the affine structure with finite number of extraordinary points besides its intrinsic rectangular structure the polycube map may approximate any original scanned data set with very low geometric distortion so our method for building polycube splines is both natural and necessary as its parametric domain can mimic the geometry of modeled objects in topologically correct and geometrically meaningful manner we design new data structure that facilitates the intuitive and rapid construction of polycube splines in this paper we demonstrate the polycube splines with applications in surface reconstruction and shape computing
given the importance of parallel mesh generation in large scale scientific applications and the proliferation of multilevel smt based architectures it is imperative to obtain insight on the interaction between meshing algorithms and these systems we focus on parallel constrained delaunay mesh pcdm generation we exploit coarse grain parallelism at the subdomain level and fine grain at the element level this multigrain data parallel approach targets clusters built from low end commercially available smts our experimental evaluation shows that current smts are not capable of executing fine grain parallelism in pcdm however experiments on simulated smt indicate that with modest hardware support it is possible to exploit fine grain parallelism opportunities the exploitation of fine grain parallelism results to higher performance than pure mpi implementation and closes the gap between the performance of pcdm and the state of the art sequential mesher on single physical processor our findings extend to other adaptive and irregular multigrain parallel algorithms
in this paper we describe wizard of oz woz user study of an augmented reality ar interface that uses multimodal input mmi with natural hand interaction and speech commands our goal is to use woz study to help guide the creation of multimodal ar interface which is most natural to the user in this study we used three virtual object arranging tasks with two different display types head mounted display and desktop monitor to see how users used multimodal commands and how different ar display conditions affect those commands the results provided valuable insights into how people naturally interact in multimodal ar scene assembly task for example we discovered the optimal time frame for fusing speech and gesture commands into single command we also found that display type did not produce significant difference in the type of commands used using these results we present design recommendations for multimodal interaction in ar environments
combining table top and tangible user interfaces is relatively new research area many problems remain to be solved before users can benefit from tangible interactive table top systems this paper presents our effort in this direction robotable is an interactive table top system that enables users to naturally and intuitively manipulate robots the goal of this research is to develop software framework for human robot interaction which combines table top tangible objects artificial intelligence and physics simulations and demonstrate the framework with game applications
an important class of queries in moving object databases involves trajectories we propose to divide trajectory predicates into topological and non topological parts extend the intersection model of egenhofer franzosa to step evaluation strategy for trajectory queries filter step refinement step and tracing step the filter and refinement steps are similar to region searches as in spatial databases approximations of trajectories are typically used in evaluating trajectory queries in earlier studies minimum bounding boxes mbrs are used to approximate trajectory segments which allow index structures to be built eg tb trees and trees the use of mbrs hinders the efficiency since mbrs are very coarse approximations especially for trajectory segments to overcome this problem we propose new type of approximations minimum bounding octagon prism mbop we extend tree to new index structure octagon prism tree op tree for mbops of trajectory segments we conducted experiments to evaluate efficiency of op trees in performing region searches and trajectory queries the results show that op trees improve region searches significantly over synthetic trajectory data sets compared to tb trees and trees and can significantly reduce the evaluation cost of trajectory queries compared to tb trees
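the advantage of the octagon footprint over an mbr can be seen in a small sketch: in addition to the usual x/y bounds, bounds along the two 45-degree directions are kept, which prunes the empty corners of diagonal trajectory segments. this shows only the 2d footprint; the mbop of the paper additionally extrudes it along the time axis.

```python
# 2D footprint of a bounding octagon: x/y bounds plus bounds on x+y and x-y.

def bounding_octagon(points):
    xs = [x for x, y in points];  ys = [y for x, y in points]
    ss = [x + y for x, y in points];  ds = [x - y for x, y in points]
    return {"xmin": min(xs), "xmax": max(xs),
            "ymin": min(ys), "ymax": max(ys),
            "smin": min(ss), "smax": max(ss),   # x + y bounds
            "dmin": min(ds), "dmax": max(ds)}   # x - y bounds

def octagon_contains(o, p):
    x, y = p
    return (o["xmin"] <= x <= o["xmax"] and o["ymin"] <= y <= o["ymax"]
            and o["smin"] <= x + y <= o["smax"]
            and o["dmin"] <= x - y <= o["dmax"])

if __name__ == "__main__":
    seg = [(0, 0), (1, 1), (2, 2), (3, 3)]        # a diagonal trajectory segment
    o = bounding_octagon(seg)
    print(octagon_contains(o, (0, 3)))            # False: mbr corner is pruned away
```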
expertise to assist people on complex tasks is often in short supply one solution to this problem is to design systems that allow remote experts to help multiple people simultaneously as first step towards building such system we studied experts attention and communication as they assisted two novices at the same time in co located setting we compared simultaneous instruction when the novices are being instructed to do the same task or different tasks using machine learning we attempted to identify speech markers of upcoming attention shifts that could serve as input to remote assistance system
digital watermarking is already used to establish the copyright of graphics audio and text and is now increasingly important for the protection of geometric data as well watermarking polygonal models in the spectral domain gives protection against similarity transformation mesh smoothing and additive random noise attacks however drawbacks exist in analyzing the eigenspace of laplacian matrices in this paper we generalize an existing spectral decomposition and propose new spatial watermarking technique based on this generalization while inserting the watermark we avoid the cost of finding the eigenvalues and eigenvectors of laplacian matrix in spectral decomposition instead we use linear operators derived from scaling functions that are generated from chebyshev polynomials experimental results show how the cost of inserting and detecting watermarks can be traded off against robustness under attacks like additive random noise and affine transformation
the redundancy of succinct data structure is the difference between the space it uses and the appropriate information theoretic lower bound we consider the problem of representing binary sequences and strings succinctly using small redundancy we improve the redundancy required to support the important operations of rank and select efficiently for binary sequences and for strings over small alphabets we also show optimal density sensitive upper and lower bounds on the redundancy for systematic encodings of binary sequences
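for readers unfamiliar with the operations, the toy structure below implements rank and select over a bit vector using block-level counts; it illustrates the interface only and makes no attempt at the succinct redundancy bounds discussed above.

```python
class RankSelect:
    """Toy rank/select over a Python bit list with block-level counts."""
    def __init__(self, bits, block=64):
        self.bits, self.block = bits, block
        self.prefix = [0]
        for i in range(0, len(bits), block):
            self.prefix.append(self.prefix[-1] + sum(bits[i:i + block]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        b, r = divmod(i, self.block)
        return self.prefix[b] + sum(self.bits[b * self.block: b * self.block + r])

    def select1(self, k):
        """Position of the k-th 1 (1-based), or -1 if there is none."""
        lo, hi = 0, len(self.bits)
        while lo < hi:                      # binary search on rank1
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo if lo < len(self.bits) and self.rank1(lo + 1) == k else -1

bv = RankSelect([1, 0, 1, 1, 0, 0, 1])
assert bv.rank1(4) == 3 and bv.select1(3) == 3
```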
hash tables are fundamental data structures that optimally answer membership queries suppose client stores elements in hash table that is outsourced at remote server so that the client can save space or achieve load balancing authenticating the hash table functionality ie verifying the correctness of queries answered by the server and ensuring the integrity of the stored data is crucial because the server lying outside the administrative control of the client can be malicious we design efficient and secure protocols for optimally authenticating membership queries on hash tables for any fixed constants constant time requiring nε logκε time to treat updates yet keeping the communication and verification costs constant this is the first construction for authenticating hash table with constant query cost and sublinear update cost our solution employs the rsa accumulator in nested way over the stored data strictly improving upon previous accumulator based solutions our construction applies to two concrete data authentication models and lends itself to scheme that achieves different trade offs namely constant update time and nε logκε query time for fixed and an experimental evaluation of our solution shows very good scalability
in this paper we address the problem of integrating independent and possibly heterogeneous data warehouses problem that has received little attention so far but that arises very often in practice we start by tackling the basic issue of matching heterogeneous dimensions and provide number of general properties that dimension matching should fulfill we then propose two different approaches to the problem of integration that try to enforce matchings satisfying these properties the first approach refers to scenario of loosely coupled integration in which we just need to identify the common information between data sources and perform join operations over the original sources the goal of the second approach is the derivation of materialized view built by merging the sources and refers to scenario of tightly coupled integration in which queries are performed against the view we also illustrate architecture and functionality of practical system that we have developed to demonstrate the effectiveness of our integration strategies
wide range of database applications manage time varying information existing database technology currently provides little support for managing such data the research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model store and query temporal data this paper introduces the reader to temporal data management surveys state of the art solutions to challenging aspects of temporal data management and points to research directions
suffix trees are among the most important data structures in stringology with number of applications in flourishing areas like bioinformatics their main problem is space usage which has triggered much research striving for compressed representations that are still functional smaller suffix tree representation could fit in faster memory outweighing by far the theoretical slowdown brought by the space reduction we present novel compressed suffix tree which is the first achieving at the same time sublogarithmic complexity for the operations and space usage that asymptotically goes to zero as the entropy of the text does the main ideas in our development are compressing the longest common prefix information totally getting rid of the suffix tree topology and expressing all the suffix tree operations using range minimum queries and novel primitive called next previous smaller value in sequence our solutions to those operations are of independent interest
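the primitives mentioned above, range minimum queries and next/previous smaller values over the lcp array, are easy to state on an uncompressed array; the stack-based sketch below only makes the operations concrete and does not reproduce the compressed representation.

```python
# Stand-in implementations of the primitives over a plain lcp array.

def nsv(lcp):
    """nsv[i] = smallest j > i with lcp[j] < lcp[i] (len(lcp) if none)."""
    out, stack = [len(lcp)] * len(lcp), []
    for i, v in enumerate(lcp):
        while stack and lcp[stack[-1]] > v:
            out[stack.pop()] = i
        stack.append(i)
    return out

def psv(lcp):
    """psv[i] = largest j < i with lcp[j] < lcp[i] (-1 if none)."""
    out, stack = [-1] * len(lcp), []
    for i, v in enumerate(lcp):
        while stack and lcp[stack[-1]] >= v:
            stack.pop()
        out[i] = stack[-1] if stack else -1
        stack.append(i)
    return out

def rmq(lcp, i, j):
    """Index of a minimum of lcp[i..j] (linear-scan stand-in)."""
    return min(range(i, j + 1), key=lcp.__getitem__)

lcp = [0, 1, 3, 2, 0, 2, 1]
print(nsv(lcp), psv(lcp), rmq(lcp, 1, 3))
```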
transductive video concept detection is an effective way to handle the lack of sufficient labeled videos however another issue the multi label interdependence is not essentially addressed in the existing transductive methods most solutions only applied the transductive single label approach to detect each individual concept separately but ignoring the concept relation or simply imposed the smoothness assumption over the multiple labels for each video without indeed exploring the interdependence between the concepts on the other hand the semi supervised extension of supervised multi label classifiers such as correlative multi label support vector machines is usually intractable and hence impractical due to the quite expensive computational cost in this paper we propose an effective transductive multi label classification approach which simultaneously models the labeling consistency between the visually similar videos and the multi label interdependence for each video in an integrated framework we compare the performance between the proposed approach and several representative transductive single label and supervised multi label classification approaches for the video concept detection task over the widely used trecvid data set the comparative results demonstrate the superiority of the proposed approach
body sensor networks are emerging as promising platform for remote human monitoring with the aim of extracting bio kinematic parameters from distributed body worn sensors these systems require collaboration of sensor nodes to obtain relevant information from an overwhelmingly large volume of data clearly efficient data reduction techniques and distributed signal processing algorithms are needed in this paper we present data processing technique that constructs motion transcripts from inertial sensors and identifies human movements by taking collaboration between the nodes into consideration transcripts of basic motions called primitives are built to reduce the complexity of the sensor data this model leads to distributed algorithm for segmentation and action recognition we demonstrate the effectiveness of our framework using data collected from five normal subjects performing ten transitional movements the results clearly illustrate the effectiveness of our framework in particular we obtain classification accuracy of with only one sensor node involved in the classification process
the method of reservoir based sampling is often used to pick an unbiased sample from data stream large portion of the unbiased sample may become less relevant over time because of evolution an analytical or mining task eg query estimation which is specific to only the sample points from recent time horizon may provide very inaccurate result this is because the size of the relevant sample reduces with the horizon itself on the other hand this is precisely the most important case for data stream algorithms since recent history is frequently analyzed in such cases we show that an effective solution is to bias the sample with the use of temporal bias functions the maintenance of such sample is non trivial since it needs to be dynamically maintained without knowing the total number of points in advance we prove some interesting theoretical properties of large class of memory less bias functions which allow for an efficient implementation of the sampling algorithm we also show that the inclusion of bias in the sampling process introduces maximum requirement on the reservoir size this is nice property since it shows that it may often be possible to maintain the maximum relevant sample with limited storage requirements we not only illustrate the advantages of the method for the problem of query estimation but also show that the approach has applicability to broader data mining problems such as evolution analysis and classification
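for the memory-less (exponential) bias case, a sketch of a biased reservoir is shown below: the reservoir capacity is bounded by 1/lambda, and each arriving point either overwrites a random slot (with probability equal to the current fill fraction) or is appended. parameter names and the demo stream are illustrative.

```python
import random

class BiasedReservoir:
    """Sketch of reservoir sampling with a memory-less (exponential) temporal
    bias ~ exp(-lam * age). The maximum reservoir size is bounded by 1/lam,
    matching the maximum-space property noted in the abstract."""
    def __init__(self, lam, rng=random.Random(0)):
        self.capacity = int(1.0 / lam)
        self.sample = []
        self.rng = rng

    def add(self, point):
        fill = len(self.sample) / self.capacity
        if self.sample and self.rng.random() < fill:
            # overwrite: older points survive with geometrically decaying odds
            self.sample[self.rng.randrange(len(self.sample))] = point
        else:
            self.sample.append(point)

r = BiasedReservoir(lam=0.01)          # reservoir never exceeds 100 points
for t in range(10_000):
    r.add(t)
print(len(r.sample), min(r.sample))    # size <= 100, contents skewed toward recent t
```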
open evolvable systems design requires methodological and conceptual paradigm different from the conventional software design evolvable systems research has established itself as new research field but the content is more domain oriented than universal consequently major contributions toward substantiation of that universal methodological and conceptual paradigm are yet to come in this paper we present new perspective and method for the general purpose design of evolvable systems the paper presents the attributes of the evolvable systems and discusses the distinction between evolvable systems and conventional software design as well as the methodological ramifications we pose and address the question of what is an efficient methodology for designing system for which we do not know the boundaries we present our version of process oriented modeling as the key method in the high level design of evolvable systems and show its utilization in implementation of one modeling case of complex evolvable system the dna replication process we also present the dynamic aspects of the design process management and pre code verifications in the framework of quantified controls and simulations
many data center virtualization solutions such as vmware esx employ content based page sharing to consolidate the resources of multiple servers page sharing identifies virtual machine memory pages with identical content and consolidates them into single shared page this technique implemented at the host level applies only between vms placed on given physical host in multi server data center opportunities for sharing may be lost because the vms holding identical pages are resident on different hosts in order to obtain the full benefit of content based page sharing it is necessary to place virtual machines such that vms with similar memory content are located on the same hosts in this paper we present memory buddies memory sharing aware placement system for virtual machines this system includes memory fingerprinting system to efficiently determine the sharing potential among set of vms and compute more efficient placements in addition it makes use of live migration to optimize vm placement as workloads change we have implemented prototype memory buddies system with vmware esx server and present experimental results on our testbed as well as an analysis of an extensive memory trace study evaluation of our prototype using mix of enterprise and commerce applications demonstrates an increase of data center capacity ie number of vms supported of while imposing low overhead and scaling to as many as thousand servers
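a minimal sketch of the fingerprinting step, assuming full per-page hashing (a real system would sample or use compact summaries): the sharing potential of two vms is estimated as the overlap of their page-hash multisets, which a placement algorithm can then try to maximize per host.

```python
import hashlib
from collections import Counter

PAGE = 4096  # bytes per memory page

def fingerprint(memory_image):
    """Hash every page of a VM memory image (bytes) into a multiset of digests."""
    pages = (memory_image[i:i + PAGE] for i in range(0, len(memory_image), PAGE))
    return Counter(hashlib.sha1(p).digest() for p in pages)

def sharing_potential(fp_a, fp_b):
    """Pages that could collapse into one copy if the two VMs share a host."""
    return sum((fp_a & fp_b).values())

vm1 = b"\x00" * PAGE * 3 + b"A" * PAGE
vm2 = b"\x00" * PAGE * 2 + b"B" * PAGE
print(sharing_potential(fingerprint(vm1), fingerprint(vm2)))  # 2 shared zero pages
```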
we present hubble system that operates continuously to find internet reachability problems in which routes exist to destination but packets are unable to reach the destination hubble monitors at minute granularity the data path to prefixes that cover of the internet’s edge address space key enabling techniques include hybrid passive active monitoring approach and the synthesis of multiple information sources that include historical data with these techniques we estimate that hubble discovers of the reachability problems that would be found with pervasive probing approach while issuing only as many probes we also present the results of three week study conducted with hubble we find that the extent of reachability problems both in number and duration is much greater than we expected with problems persisting for hours and even days and many of the problems do not correlate with bgp updates in many cases multi homed as is reachable through one provider but probes through another terminate using spoofed packets we isolated the direction of failure in of cases we analyzed and found all problems to be exclusively on the forward path from the provider to the destination snapshot of the problems hubble is currently monitoring can be found at http://hubble.cs.washington.edu
one of the most common programming errors is the use of variable before its definition this undefined value may produce incorrect results memory violations unpredictable behaviors and program failure to detect this kind of error two approaches can be used compile time analysis and run time checking however compile time analysis is far from perfect because of complicated data and control flows as well as arrays with non linear indirection subscripts etc on the other hand dynamic checking although supported by hardware and compiler techniques is costly due to heavy code instrumentation while information available at compile time is not taken into account this paper presents combination of an efficient compile time analysis and source code instrumentation for run time checking all kinds of variables are checked by pips fortran research compiler for program analyses transformation parallelization and verification uninitialized array elements are detected by using imported array region an efficient inter procedural array data flow analysis if exact array regions cannot be computed and compile time information is not sufficient array elements are initialized to special value and their utilization is accompanied by value test to assert the legality of the access in comparison to the dynamic instrumentation our method greatly reduces the number of variables to be initialized and to be checked code instrumentation is only needed for some array sections not for the whole array tests are generated as early as possible in addition programs can be proved to be free from used before set errors statically at compile time or on the contrary have real undefined errors experiments on spec cfp show encouraging results on analysis cost and run time overheads
advances in magnetic recording technology have resulted in rapid increase in disk capacities but improvements in the mechanical characteristics of disks have been quite modest for example the access time to random disk blocks has decreased by mere factor of two while disk capacities have increased by several orders of magnitude high performance oltp applications subject disks to very demanding workload since they require high access rates to randomly distributed disk blocks and gain limited benefit from caching and prefetching we address this problem by re evaluating the performance of some well known disk scheduling methods before proposing and evaluating extensions to them variation to cscan takes into account rotational latency so that the service time of further requests is reduced variation to satf considers the sum of service times of several successive requests in scheduling the next request so that the arm is moved to temporal neighborhood with many requests the service time of further requests is discounted since their immediate processing is not guaranteed variation to the satf policy prioritizes reads with respect to writes and processes winner write requests conditionally ie when the ratio of their service time to that of the winner read request is smaller than certain threshold we review previous work to put our work into the proper perspective and discuss plans for future work
the evolution of distributed computing and applications has put new challenges on models architectures and systems to name just one reconciling uncertainty with predictability is required by today’s simultaneous pressure on increasing the quality of service of applications and on degrading the assurance given by the infrastructure this challenge can be mapped onto more than one facet such as time or security or others in this paper we explore the time facet reviewing past and present of distributed systems models and making the case for the use of hybrid vs homogeneous models as key to overcoming some of the difficulties faced when asynchronous models uncertainty meet timing specifications predictability the wormholes paradigm is described as the first experiment with hybrid distributed systems models
in this paper we present an approach to solve the drawbacks of manual composition of software components our approach is applied within the jcolibri framework for building case based reasoning cbr applications we propose system design process based on reusing templates obtained from previously designed cbr systems templates store the control flow of the cbr applications and include semantic annotations conceptualizing its behavior and expertise we use cbr ontology to formalize syntactical semantical and pragmatical aspects of the reusable components of the framework the ontology vocabulary facilitates an annotation process of the components and allows to reason about their composition facilitating the semi automatic configuration of complex systems from their composing pieces
an augmented reality ar book is an application that applies ar technologies to physical books for providing new experience to users in this paper we propose new marker less tracking method for the ar book the main goal of the tracker is not only to recognize many pages but also to compute dof camera pose as result we can augment different virtual contents according to the corresponding page for this purpose we use multi core programming approach that separates the page recognition module from the tracking module in the page recognition module highly distinctive scale invariant features transform sift features are used in the tracking module coarse to fine approach is exploited for fast frame to frame matching our tracker provides more than frames per second in addition to the tracker we explain multi layer based data structure for maintaining the ar book gui based authoring tool is also shown to validate feasibility of the tracker and data structures the proposed algorithm would be helpful to create various ar applications that require multiple planes tracking
reasoning about program variables as sets of “values” leads to simple accurate and intuitively appealing notion of program approximation this paper presents approach for the compile time analysis of ml programs to develop the core ideas of the analysis we consider simple untyped call by value functional language starting with an operational semantics for the language we develop an approximate “set based” operational semantics which formalizes the intuition of treating program variables as sets the key result of the paper is an algorithm for computing the set based approximation of program we then extend this analysis in natural way to deal with arrays arithmetic exceptions and continuations we briefly describe our experience with an implementation of this analysis for ml programs
we propose very simple randomized algorithms to compute sparse overlay networks for geometric random graphs modelling wireless communication networks the algorithms generate in constant time sparse overlay network that with high probability is connected and spans the whole network moreover by making use of the power of choice paradigm the maximum degree can be made as small as log log where is the size of the network we show the usefulness of this kind of overlays by giving new protocol for the classical broadcast problem where source is to send message to the whole network our experimental evaluation shows that our approach outperforms the well known gossiping approach in all situations where the cost of message can be charged to the pair sender receiver ie to the edge connecting the two this includes sensor networks
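a sketch of the power-of-choice construction, under assumed parameters: each node probes d random neighbours within its radio radius and attaches to the one with the smallest current degree, which is what keeps the maximum overlay degree small.

```python
import math, random

def build_overlay(points, radius, d=2, rng=random.Random(1)):
    """Power-of-d-choices overlay on a geometric random graph (sketch)."""
    degree = [0] * len(points)
    edges = set()
    for u, (ux, uy) in enumerate(points):
        inrange = [v for v, (vx, vy) in enumerate(points)
                   if v != u and math.hypot(ux - vx, uy - vy) <= radius]
        if not inrange:
            continue
        candidates = rng.sample(inrange, min(d, len(inrange)))
        v = min(candidates, key=lambda w: degree[w])   # power of choice
        if (u, v) not in edges and (v, u) not in edges:
            edges.add((u, v))
            degree[u] += 1
            degree[v] += 1
    return edges, max(degree)

pts = [(random.random(), random.random()) for _ in range(500)]
edges, dmax = build_overlay(pts, radius=0.1)
print(len(edges), "overlay edges, max degree", dmax)
```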
uncertainty sampling is an effective method for performing active learning that is computationally efficient compared to other active learning methods such as loss reduction methods however unlike loss reduction methods uncertainty sampling cannot minimize total misclassification costs when errors incur different costs this paper introduces method for performing cost sensitive uncertainty sampling that makes use of self training we show that even when misclassification costs are equal this self training approach results in faster reduction of loss as function of number of points labeled and more reliable posterior probability estimates as compared to standard uncertainty sampling we also show why other more naive methods of modifying uncertainty sampling to minimize total misclassification costs will not always work well
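the combination can be sketched as follows, with a logistic model standing in for the classifier: confidently predicted unlabeled points are temporarily self-labeled, the model is refit, and the next query is the point whose best possible prediction still carries the largest expected misclassification cost. the cost values and confidence threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pick_query(X_lab, y_lab, X_unlab, cost_fp=1.0, cost_fn=5.0, confident=0.95):
    """Sketch of cost-sensitive uncertainty sampling with self-training."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    p = clf.predict_proba(X_unlab)[:, 1]

    # self-training: temporarily adopt labels for confidently predicted points
    sure = (p >= confident) | (p <= 1 - confident)
    if sure.any():
        X_aug = np.vstack([X_lab, X_unlab[sure]])
        y_aug = np.concatenate([y_lab, (p[sure] >= 0.5).astype(int)])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
        p = clf.predict_proba(X_unlab)[:, 1]

    # expected cost of the cheaper prediction; query the most "expensive" point
    exp_cost = np.minimum(p * cost_fn, (1 - p) * cost_fp)
    return int(np.argmax(exp_cost))
```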
moore’s law predicts that fabrication processes will soon yield over billion transistors on single die over the past twenty years the gap between the amount of available real estate on chip and designer productivity has widened and continues to grow so that designers are less able to make effective use of the increasing number of transistors on chip the primary problem in system on chip soc design is no longer the limit on the number of resources rather the development of new methodologies and system level design tools is the challenge that lies ahead as these have become essential to keeping costs low in the planning and construction of these systems designers have realized that the consideration of the overwhelming details at register transfer level rtl early in development restricts design space exploration inhibits trade off evaluation and results in increased time to market in order to effectively utilize the available resources the entry point of design flows in recent methodologies is at higher levels of abstraction with consideration to significantly fewer details of the final hardware implementation in recent years we have also seen the introduction of single chip multiprocessors as new tools and methodologies emerge for abstract system design proposed solutions to the arising problems in the performance of single chip multiprocessors can be implemented and evaluated more efficiently for example the amount of on chip memory for such parallel systems continues to steadily rise but so does the amount of power used by the memory system in this work we apply novel design methodology and tools to single chip multicore architecture considering alternatives for power reduction of the storage components through system level modeling
information integration is becoming critical problem for both businesses and individuals the data especially the one that comes from the web is naturally incomplete that is some data values may be unknown or lost because of communication problems hidden due to privacy considerations at the same time research in virtual integration in the community focusses on null free sources and addresses limited forms of incompleteness only in our work we aim to extend current results on virtual integration by considering various forms of incompleteness at the level of the sources the integrated database and the queries we call this incomplete information integration or iii more specifically we aim to extend current query answering techniques for local and global as view integration to integration of tables with sql nulls codd tables etc we also aim to consider incomplete answers as natural extension of the classical approach our main research issues are semantics of iii ii semantics of query answering in iii iii complexity of query answering and iv algorithms possibly approximate to compute the answers
free theorems feature prominently in the field of program transformation for pure functional languages such as haskell however somewhat disappointingly the semantic properties of so based transformations are often established only very superficially this paper is intended as case study showing how to use the existing theoretical foundations and formal methods for improving the situation to that end we investigate the correctness issue for new transformation rule in the short cut fusion family this destroy build rule provides certain reconciliation between the competing foldr build and destroy unfoldr approaches to eliminating intermediate lists our emphasis is on systematically and rigorously developing the rule’s correctness proof even while paying attention to semantic aspects like potential nontermination and mixed strict nonstrict evaluation
the mining of frequent patterns in databases has been studied for several years but few reports have discussed for fault tolerant ft pattern mining ft data mining is more suitable for extracting interesting information from real world data that may be polluted by noise in particular the increasing amount of today’s biological databases requires such data mining technique to mine important data eg motifs in this paper we propose the concept of proportional ft mining of frequent patterns the number of tolerable faults in proportional ft pattern is proportional to the length of the pattern two algorithms are designed for solving this problem the first algorithm named ft bottomup applies an ft apriori heuristic and finds all ft patterns with any number of faults the second algorithm ft levelwise divides all ft patterns into several groups according to the number of tolerable faults and mines the content patterns of each group in turn by applying our algorithm on real data two reported epitopes of spike proteins of sars cov can be found in our resulting itemset and the proportional ft data mining is better than the fixed ft data mining for this application
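a naive rendering of proportional fault tolerance is given below: a transaction supports a pattern if it misses at most floor(delta * |pattern|) of its items, and frequent patterns are enumerated bottom-up. the delta value and brute-force enumeration are placeholders; the paper's ft-bottomup and ft-levelwise algorithms prune far more aggressively.

```python
from itertools import combinations

def ft_support(pattern, transactions, delta=0.25):
    """Proportional fault-tolerant support: a transaction counts if it misses
    at most floor(delta * |pattern|) items of the pattern (delta assumed)."""
    tol = int(delta * len(pattern))
    need = len(pattern) - tol
    return sum(1 for t in transactions if len(set(pattern) & t) >= need)

def ft_frequent(items, transactions, minsup, maxlen=4, delta=0.25):
    """Naive bottom-up enumeration of proportional FT-frequent patterns."""
    out = []
    for k in range(1, maxlen + 1):
        for pat in combinations(sorted(items), k):
            s = ft_support(pat, transactions, delta)
            if s >= minsup:
                out.append((pat, s))
    return out

T = [set("abcd"), set("abce"), set("abde"), set("acde")]
print(ft_frequent(set("abcde"), T, minsup=4, delta=0.25))
```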
we adopt the untyped imperative object calculus of abadi and cardelli as minimal setting in which to study problems of compilation and program equivalence that arise when compiling object oriented languages we present both big step and small step substitution based operational semantics for the calculus our first two results are theorems asserting the equivalence of our substitution based semantics with closure based semantics like that given by abadi and cardelli our third result is direct proof of the correctness of compilation to stack based abstract machine via small step decompilation algorithm our fourth result is that contextual equivalence of objects coincides with form of mason and talcott’s ciu equivalence the latter provides tractable means of establishing operational equivalences finally we prove correct an algorithm used in our prototype compiler for statically resolving method offsets this is the first study of correctness of an object oriented abstract machine and of operational equivalence for the imperative object calculus
the problem of statically assigning nonpartitioned files in parallel system has been extensively investigated basic workload characteristic assumption of most existing solutions to the problem is that there exists strong inverse correlation between file access frequency and file size in other words the most popular files are typically small in size while the large files are relatively unpopular recent studies on the characteristics of web proxy traces suggested however the correlation if any is so weak that it can be ignored hence the following two questions arise naturally first can existing algorithms still perform well when the workload assumption does not hold? second if not can one develop new file assignment strategy that is immune to the workload assumption? to answer these questions we first evaluate the performance of three well known file assignment algorithms with and without the workload assumption respectively next we develop novel static nonpartitioned file assignment strategy for parallel systems called static round robin sor which is immune to the workload assumption comprehensive experimental results show that sor consistently improves the performance in terms of mean response time over the existing schemes
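a minimal sketch of a static round-robin style assignment, assuming expected load is access rate times service time (the exact ordering criterion of sor is not spelled out in the abstract): files are sorted by load and dealt out to disks in round-robin order so that heavy files are spread evenly.

```python
def static_round_robin(files, n_disks):
    """files: list of (name, access_rate, service_time); returns per-disk lists."""
    # assumed load metric: access_rate * service_time
    ranked = sorted(files, key=lambda f: f[1] * f[2], reverse=True)
    disks = [[] for _ in range(n_disks)]
    for i, f in enumerate(ranked):
        disks[i % n_disks].append(f[0])   # deal files out round-robin
    return disks

files = [("a", 10, 2.0), ("b", 50, 0.1), ("c", 5, 8.0), ("d", 20, 1.0)]
for d, names in enumerate(static_round_robin(files, 2)):
    print("disk", d, names)
```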
many object recognition and localization techniques utilize multiple levels of local representations these local feature representations are common and one way to improve the efficiency of algorithms that use them is to reduce the size of the local representations there has been previous work on selecting subsets of image features but the focus here is on systematic study of the feature selection problem we have developed combinatorial characterization of the feature subset selection problem that leads to general optimization framework this framework optimizes multiple objectives and allows the encoding of global constraints the features selected by this algorithm are able to achieve improved performance on the problem of object localization we present dataset of synthetic images along with ground truth information which allows us to precisely measure and compare the performance of feature subset algorithms our experiments show that subsets of image features produced by our method stable bounded canonical sets sbcs outperform subsets produced by means clustering ga and threshold based methods for the task of object localization under occlusion
in this paper pair programming is empirically investigated from the perspective of developer personalities and temperaments and how they affect pair effectiveness controlled experiment was conducted to investigate the impact of developer personalities and temperaments on communication pair performance and pair viability collaboration the experiment involved undergraduate students and the objective was to compare pairs of heterogeneous developer personalities and temperaments with pairs of homogeneous personalities and temperaments in terms of pair effectiveness pair effectiveness is expressed in terms of pair performance measured by communication velocity design correctness and passed acceptance tests and pair collaboration viability measured by developers satisfaction knowledge acquisition and participation the results have shown that there is important difference between the two groups indicating better communication pair performance and pair collaboration viability for the pairs with heterogeneous personalities and temperaments in order to provide an objective assessment of the differences between the two groups of pairs number of statistical tests and stepwise discriminant analysis were used
grid warping is placement strategy based on novel physical analogy rather than move the gates to optimize their location it elastically deforms model of the chip surface on which the gates have been coarsely placed via standard quadratic solve although the original warping idea works well for cell based placement it works poorly for mixed size placements with large fixed macrocells the new problem is how to avoid elastically deforming gates into illegal overlaps with these background objects we develop new lightweight mechanism called geometric hashing which relocates gates to avoid these overlaps but is efficient enough to embed directly in the nonlinear warping optimization results from new placer warp running on the ispd benchmark suite show both good quality and scalability
as the size and complexity of web sites expands dramatically it has become increasingly challenging to design web sites where web surfers can easily find the information they seek in this article we address the design of the portal page of web site which serves as the homepage of web site or default web portal we define an important research problem hyperlink selection selecting from large set of hyperlinks in given web site limited number of hyperlinks for inclusion in portal page the objective of hyperlink selection is to maximize the efficiency effectiveness and usage of web site’s portal page we propose heuristic approach to hyperlink selection linkselector which is based on relationships among hyperlinks structural relationships that can be extracted from an existing web site and access relationships that can be discovered from web log we compared the performance of linkselector with that of the current practice of hyperlink selection ie manual hyperlink selection by domain experts using data obtained from the university of arizona web site results showed that linkselector outperformed the current manual selection method
replica location service rls allows registration and discovery of data replicas in earlier work we proposed an rls framework and described the performance and scalability of an rls implementation in globus toolkit version in this paper we present peer to peer replica location service rls with properties of self organization fault tolerance and improved scalability rls uses the chord algorithm to self organize prls servers and exploits the chord overlay network to replicate rls mappings adaptively our performance measurements demonstrate that update and query latencies increase at logarithmic rate with the size of the rls network while the overhead of maintaining the rls network is reasonable our simulation results for adaptive replication demonstrate that as the number of replicas per mapping increases the mappings are more evenly distributed among rls nodes we introduce predecessor replication scheme and show it reduces query hotspots of popular mappings by distributing queries among nodes
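the placement idea can be illustrated with a toy consistent hashing model: a mapping is stored at the successor of its hashed key and replicated at a few predecessors so queries for a popular mapping are spread over several nodes; this sketch is not the p rls implementation measured above.

```python
# toy ring model of successor placement plus predecessor replication.
import bisect
import hashlib

RING = 2 ** 32

def h(x):
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % RING

def placement(node_ids, key, k_pred=2):
    """Return the successor node for `key` plus `k_pred` predecessor replicas."""
    points = sorted((h(n), n) for n in node_ids)
    ids = [p for p, _ in points]
    succ_idx = bisect.bisect_left(ids, h(key)) % len(points)
    # chosen[0] is the successor, the rest are the preceding nodes on the ring
    return [points[(succ_idx - i) % len(points)][1] for i in range(k_pred + 1)]

print(placement(["node%d" % i for i in range(8)], "lfn://dataset/file42"))
```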
the paradigm of human computation seeks to harness human abilities to solve computational problems or otherwise perform distributed work that is beyond the scope of current ai technologies one aspect of human computation has become known as games with purpose and seeks to elicit useful computational work in fun typically multi player games human computation also encompasses distributed work or peer production systems such as wikipedia and question and answer forums in this short paper we survey existing game theoretic models for various human computation designs and outline research challenges in advancing theory that can enable better design
modern computer architectures increasingly depend on mechanisms that estimate future control flow decisions to increase performance mechanisms such as speculative execution and prefetching are becoming standard architectural mechanisms that rely on control flow prediction to prefetch and speculatively execute future instructions at the same time computer programmers are increasingly turning to object oriented languages to increase their productivity these languages commonly use run time dispatching to implement object polymorphism dispatching is usually implemented using an indirect function call which presents challenges to existing control flow prediction techniques we have measured the occurrence of indirect function calls in collection of programs we show that although it is more important to predict branches accurately indirect call prediction is also an important factor in some programs and will grow in importance with the growth of object oriented programming we examine the improvement offered by compile time optimizations and static and dynamic prediction techniques and demonstrate how compilers can use existing branch prediction mechanisms to improve performance in programs using these methods with the programs we examined the number of instructions between mispredicted breaks in control can be doubled on existing computers
the number of successful attacks on the internet shows that it is very difficult to guarantee the security of online servers over extended periods of time breached server that is not detected in time may return incorrect query answers to users in this article we introduce authentication schemes for users to verify that their query answers from an online server are complete ie no qualifying tuples are omitted and authentic ie all the result values are legitimate we introduce scheme that supports range selection projection as well as primary key foreign key join queries on relational databases we also present authentication schemes for single and multi attribute range aggregate queries the schemes complement access control mechanisms that rewrite queries dynamically and are computationally secure we have implemented the proposed schemes and experiment results showed that they are practical and feasible schemes with low overheads
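as a loosely related illustration only, one generic way to let clients check completeness of a range query is to authenticate each pair of neighbouring keys and return the tuples just outside the queried range; the sketch below uses an hmac as a stand in for the owner's signature and is not the schemes proposed in this article.

```python
# generic chaining sketch: every adjacent pair of keys (in sorted order,
# with sentinels) carries an authentication tag, so a client can detect
# any qualifying tuple that the server silently drops from a range answer.
import hmac, hashlib

KEY = b"owner-secret"  # assumed key for the sketch; a real scheme would sign

def link_tag(k1, k2):
    return hmac.new(KEY, f"{k1}|{k2}".encode(), hashlib.sha256).hexdigest()

def publish(sorted_keys):
    """Owner side: one tag per adjacent pair of keys (including sentinels)."""
    chain = [-float("inf")] + sorted_keys + [float("inf")]
    return {(a, b): link_tag(a, b) for a, b in zip(chain, chain[1:])}

def verify_range(answer, lo, hi, tags):
    """Client side: answer must include the boundary keys just outside [lo, hi]."""
    if not (answer[0] < lo and answer[-1] > hi):
        return False
    return all(tags.get((a, b)) == link_tag(a, b) for a, b in zip(answer, answer[1:]))

keys = [3, 8, 15, 20, 27]
tags = publish(keys)
print(verify_range([-float("inf"), 3, 8, 15, 20], lo=1, hi=16, tags=tags))  # True
print(verify_range([-float("inf"), 3, 15, 20], lo=1, hi=16, tags=tags))     # False, 8 omitted
```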
we consider the problem of compact routing with slack in networks of low doubling dimension namely we seek name independent routing schemes with stretch and polylogarithmic storage at each node since existing lower bound precludes such scheme we relax our guarantees to allow for small fraction of nodes to have large storage say size of log bits or ii small fraction of source destination pairs to have larger but still constant stretch in this paper given any constant any polylog and any connected edge weighted undirected graph with doubling dimension log log and arbitrary node names we present stretch name independent routing scheme for with polylogarithmic packet header size and with nodes storing polylogarithmic size routing tables each and the remaining δn nodes storing nlog bit routing tables each name independent routing scheme for with polylogarithmic storage and packet header size and with stretch for source nodes and for the remaining source nodes these results are to be contrasted with our lower bound from podc where we showed that stretch is asymptotically optimal for name independent compact routing schemes in networks of constant doubling dimension
recent research on skyline queries has attracted much interest in the database and data mining community the concept of dominant relationship analysis has been commonly used in the context of skyline computation due to its importance in many applications current methods have only considered so called min max hard attributes like price and quality which user wants to minimize or maximize however objects can also have temporal attribute which can be used to represent relevant constraints on the query results in this paper we introduce novel skyline query types taking into account not only min max hard attributes but also temporal attribute and the relationships between these different attribute types we find the interrelated connection between the time evolving attributes and the dominant relationship based on this discovery we define the novel dominant relationship based on temporal aggregation and use it to analyze the problem of positioning product in competitive market while the time frame is required we propose new and efficient method to process temporal aggregation dominant relationship queries using corner transformation our experimental evaluation using real dataset and various synthetic datasets demonstrates that the new query types are indeed meaningful and the proposed algorithms are efficient and scalable
there are number of data dependence tests that have been proposed in the literature the most widely used approximate data dependence tests are the banerjee inequality and the gcd test in this paper we consider parallelization for microprocessors with the multimedia extensions for the short simd parallelism extraction it is essential that if dependency exists then the distance between memory references is greater than or equal to the number of data processed in the simd register this implies that some loops that could not be vectorized on traditional vector processors can still be parallelized for the short simd execution in this paper we present an accurate and simple method that can filter out data dependences with sufficiently large distance between memory references for linear array references within nested loop the presented method is suitable for use in dependence analyzer that is organized as series of tests progressively increasing in accuracy as replacement for the gcd or banerjee tests
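the core of the filtering test can be sketched for a single pair of linear references a[s*i + c1] and a[s*i + c2] in the same loop: either no integer dependence distance exists or the distance is at least the simd width; this toy check is only an illustration, not the full analyzer described above.

```python
# simple distance filter for one pair of linear array references.
def simd_safe(s, c1, c2, w):
    """True if the two references cannot conflict within a vector of w iterations."""
    if s == 0:
        return c1 != c2          # constant addresses: conflict only if equal
    diff = c2 - c1
    if diff % s != 0:
        return True              # no integer dependence distance at all
    distance = abs(diff // s)
    return distance == 0 or distance >= w

# a[i] and a[i+1] conflict at distance 1, so a 4-wide vector is unsafe,
# while a[i] and a[i+5] keep a distance of 5 >= 4 and can be vectorized
print(simd_safe(s=1, c1=0, c2=1, w=4))  # False
print(simd_safe(s=1, c1=0, c2=5, w=4))  # True
```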
improper management of software evolution compounded by imprecise and changing requirements along with the short time to market requirement commonly leads to lack of up to date specifications this can result in software that is characterized by bugs anomalies and even security threats software specification mining is new technique to address this concern by inferring specifications automatically in this paper we propose novel api specification mining architecture called smartic specification mining architecture with trace filtering and clustering to improve the accuracy robustness and scalability of specification miners this architecture is constructed based on two hypotheses erroneous traces should be pruned from the input traces to miner and clustering related traces will localize inaccuracies and reduce over generalization in learning correspondingly smartic comprises four components an erroneous trace filtering block related trace clustering block learner and merger we show through experiments that the quality of specification mining can be significantly improved using smartic
an approach is described for checking the methods of class against full specification it shares with traditional model checking the idea of exhausting the entire space of executions within some finite bounds and with traditional verification the idea of modular analysis in which method is analyzed in isolation for all possible calling contexts the analysis involves an automatic two phase reduction first to an intermediate form in relational logic using new encoding described here and second to boolean formula using existing techniques which is then handed to an off the shelf sat solver a variety of implementations of the java collections framework’s list interface were checked against existing jml specifications the analysis revealed bugs in the implementations as well as errors in the specifications themselves
we present system that allows for changing the major camera parameters after the acquisition of an image using the high dynamic range composition technique and additional range information captured with small and low cost time of flight camera our setup enables us to set the main parameters of virtual camera system and to compute the resulting image hence the aperture size and shape exposure time as well as the focus can be changed in postprocessing step since the depth of field computation is sensitive to proper range data it is essential to process the color and depth data in an integrated manner we use non local filtering approach to denoise and upsample the range data the same technique is used to infer missing information regarding depth and color which occur due to the parallax between both cameras as well as due to the lens camera model that we use to simulate the depth of field in physically correct way
bytecode verification is key point in the security chain of the java platform this feature is only optional in many embedded devices since the memory requirements of the verification process are too high in this article we propose an approach that significantly reduces the use of memory by serial parallel decomposition of the verification into multiple specialized passes the algorithm reduces the type encoding space by operating on different abstractions of the domain of types the results of our evaluation show that this bytecode verification can be performed directly on small memory systems the method is formalized in the framework of abstract interpretation
in this paper we present novel system modeling language which targets primarily the development of source level multiprocessor memory aware optimizations in contrast to previous system modeling approaches this approach tries to model the whole system and especially the memory hierarchy in structural and semantically accessible way previous approaches primarily support generation of simulators or retargetable code selectors and thus concentrate on pure behavioral models or describe only the processor instruction set in semantically accessible way simple database like interface is offered to the optimization developer which in conjunction with the maccv framework enables rapid development of source level architecture independent optimizations
sensor networks exhibit unique funneling effect which is product of the distinctive many to one hop by hop traffic pattern found in sensor networks and results in significant increase in transit traffic intensity collision congestion packet loss and energy drain as events move closer toward the sink while network eg congestion control and application techniques eg aggregation can help counter this problem they cannot fully alleviate it we take different but complementary approach to solving this problem than found in the literature and present the design implementation and evaluation of localized sink oriented funneling mac capable of mitigating the funneling effect and boosting application fidelity in sensor networks the funneling mac is based on csma ca being implemented network wide with localized tdma algorithm overlaid in the funneling region ie within small number of hops from the sink in this sense the funneling mac represents hybrid mac approach but does not have the scalability problems associated with the network wide deployment of tdma the funneling mac is sink oriented because the burden of managing the tdma scheduling of sensor events in the funneling region falls on the sink node and not on resource limited sensor nodes and it is localized because tdma only operates locally in the funneling region close to the sink and not across the complete sensor field we show through experimental results from mica testbed that the funneling mac mitigates the funneling effect improves throughput loss and energy efficiency and importantly significantly outperforms other representative protocols such as mac and more recent hybrid tdma csma mac protocols such as mac
parallel corpora are crucial for training smt systems however for many language pairs they are available only in very limited quantities for these language pairs huge portion of phrases encountered at run time will be unknown we show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases our results show that augmenting state of the art smt system with paraphrases leads to significantly improved coverage and translation quality for training corpus with sentence pairs we increase the coverage of unique test set unigrams from to with more than half of the newly covered items accurately translated as opposed to none in current approaches
an error that occurs in microkernel operating system service can potentially result in state corruption and service failure simple restart of the failed service is not always the best solution for reliability blindly restarting service which maintains client related state such as session information results in the loss of this state and affects all clients that were using the service curios represents novel os design that uses lightweight distribution isolation and persistence of os service state to mitigate the problem of state loss during restart the design also significantly reduces error propagation within client related state maintained by an os service this is achieved by encapsulating services in separate protection domains and granting access to client related state only when required for request processing fault injection experiments show that it is possible to recover from between and of manifested errors in os services such as the file system network timer and scheduler while maintaining low performance overheads
suppose you have passion for items of certain type and you wish to start recommender system around those items you want system like amazon or epinions but for cookie recipes local theater or microbrew beer how can you set up your recommender system without assembling complicated algorithms large software infrastructure large community of contributors or even full catalog of items wikilens is open source software that enables anyone anywhere to start community maintained recommender around any type of item we introduce five principles for community maintained recommenders that address the two key issues community contribution of items and associated information and finding items of interest since all recommender communities start small we look at feasibility and utility in the small world one with few users few items few ratings we describe the features of wikilens which are based on our principles and give lessons learned from two years of experience running wikilensorg
the results of location dependent queries ldq queries generally depend on the current locations of query issuers many mechanisms eg broadcast scheme hoarding or caching policy have been developed to improve system performance and provide better service which are specialized for ldqs it is necessary to design them considering geographical adjacency and characteristics of target area in ldq for this reason this paper proposes the caching policy and broadcast scheme in which these features are reflected to develop the caching policy suitable for ldq in urban area we apply the moving distance of mh to our caching policy moreover based on the adjacency of data in ldq our broadcast scheme uses space filling curve to cluster data we evaluate the performance of the caching policy measuring the workload of mhs and the correctness of ldq results and the performance of the broadcast scheme measuring the average setup time of mhs in our experiments finally we expect that our caching policy provides more correct answers when executing ldq in local cache and leads to significant improvement in the workload of mhs it also seems quite probable that our broadcast scheme leads to improvement of battery life of the mh
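a small sketch of the broadcast ordering idea, using a z order (morton) code as one common space filling curve to place spatially adjacent items near each other in the broadcast cycle; the curve and parameters here are assumptions, not necessarily those used in the paper.

```python
# z-order linearization: interleave coordinate bits so nearby points
# receive nearby broadcast positions.
def morton(x, y, bits=16):
    """Interleave the bits of x and y into a single z-order key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return code

def broadcast_order(items):
    """items: list of (item_id, x, y); return ids sorted along the z-order curve."""
    return [item_id for item_id, x, y in sorted(items, key=lambda t: morton(t[1], t[2]))]

print(broadcast_order([("a", 5, 5), ("b", 60, 60), ("c", 6, 4), ("d", 61, 58)]))
# ['a', 'c', 'd', 'b'] -- the two spatially close pairs stay adjacent
```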
recently there has been growing interest in random sampling from online hidden databases these databases reside behind form like web interfaces which allow users to execute search queries by specifying the desired values for certain attributes and the system responds by returning few eg top tuples that satisfy the selection conditions sorted by suitable scoring function in this paper we consider the problem of uniform random sampling over such hidden databases key challenge is to eliminate the skew of samples incurred by the selective return of highly ranked tuples to address this challenge all state of the art samplers share common approach they do not use overflowing queries this is done in order to avoid favoring highly ranked tuples and thus incurring high skew in the retrieved samples however not considering overflowing queries substantially impacts sampling efficiency in this paper we propose novel sampling techniques which do leverage overflowing queries as result we are able to significantly improve sampling efficiency over the state of the art samplers while at the same time substantially reduce the skew of generated samples we conduct extensive experiments over synthetic and real world databases to illustrate the superiority of our techniques over the existing ones
most text clustering techniques are based on words and or phrases weights in the text such representation is often unsatisfactory because it ignores the relationships between terms and considers them as independent features in this paper new semantic similarity based model ssbm is proposed the semantic similarity based model computes semantic similarities by utilizing wordnet as an ontology the proposed model captures the semantic similarities between documents that contain semantically similar terms but not necessarily syntactically identical the semantic similarity based model assigns new weight to document terms reflecting the semantic relationships between terms that co occur literally in the document our model in conjunction with the extended gloss overlaps measure and the adapted lesk algorithm solves ambiguity synonymy problems that are not detected using traditional term frequency based text mining techniques the proposed model is evaluated on the reuters and the newsgroups text collections datasets the performance is assessed in terms of the fmeasure purity and entropy quality measures the obtained results show promising performance improvements compared to the traditional term based vector space model vsm as well as other existing methods that include semantic similarity measures in text clustering
in information extraction uncertainty is ubiquitous for this reason it is useful to provide users querying extracted data with explanations for the answers they receive providing the provenance for tuples in query result partially addresses this problem in that provenance can explain why tuple is in the result of query however in some cases explaining why tuple is not in the result may be just as helpful in this work we focus on providing provenance style explanations for non answers and develop mechanism for providing this new type of provenance our experience with an information extraction prototype suggests that our approach can provide effective provenance information that can help user resolve their doubts over non answers to query
memory errors continue to be major source of software failure to address this issue we present meds memory error detection system system for detecting memory errors within binary executables the system can detect buffer overflow uninitialized data reads double free and deallocated memory access errors and vulnerabilities it works by using static analysis to prove memory accesses safe if memory access cannot be proven safe meds falls back to run time analysis the system exceeds previous work with dramatic reductions in false positives as well as covering all memory segments stack static heap
this paper proposes real time robust and effective tracking framework for visual servoing applications the algorithm is based on the fusion of visual cues and on the estimation of transformation either homography or pose the parameters of this transformation are estimated using non linear minimization of unique criterion that integrates information both on the texture and the edges of the tracked object the proposed tracker is more robust and performs well in conditions where methods based on single cue fail the framework has been tested for object motion estimation and pose computation the method presented in this paper has been validated on several video sequences as well as in visual servoing experiments considering various objects results show the method to be robust to occlusions or textured backgrounds and suitable for visual servoing applications
the database summarization system coined saintetiq provides multi resolution summaries of structured data stored into a centralized database summaries are computed online with conceptual hierarchical clustering algorithm however most companies work in distributed legacy environments and consequently the current centralized version of saintetiq is either not feasible privacy preserving or not desirable resource limitations to address this problem we propose new algorithms to generate single summary hierarchy given two distinct hierarchies without scanning the raw data the greedy merging algorithm gma takes all leaves of both hierarchies and generates the optimal partitioning for the considered data set with regards to cost function compactness and separation then hierarchical organization of summaries is built by agglomerating or dividing clusters such that the cost function may emphasize local or global patterns in the data thus we obtain two different hierarchies according to the performed optimisation however this approach breaks down due to its exponential time complexity two alternative approaches with constant time complexity wrt the number of data items are proposed to tackle this problem the first one called merge by incorporation algorithm mia relies on the saintetiq engine whereas the second approach named merge by alignment algorithm maa consists in rearranging summaries by levels in top down manner then we compare those approaches using an original quality measure in order to quantify how good our merged hierarchies are finally an experimental study using real data sets shows that merging processes mia and maa are efficient in terms of computational time
interactive applications require simplifications to lighting geometry and material properties that preclude many effects encountered in the physical world until recently only the most simplistic reflections and refractions could be performed interactively but state of the art research has lifted some restrictions on such materials this paper builds upon this work but examines reflection and refraction from the light’s viewpoint to achieve interactive caustics from point sources our technique emits photons from the light and stores the results in image space similar to shadow map we then examine various techniques for gathering these photons comparing their advantages and disadvantages for rendering caustics these approaches run interactively on modern gpus work in conjunction with existing techniques for rendering specular materials and produce images competitive with offline renderings using comparable numbers of photons
test collections are essential to evaluate information retrieval ir systems the relevance assessment set has been recognized as the key bottleneck in test collection building especially on very large sized document collections this paper addresses the problem of efficiently selecting documents to be included in the assessment set we will show how machine learning techniques can fit this task this leads to smaller pools than traditional round robin pooling thus reduces significantly the manual assessment workload experimental results on trec collections consistently demonstrate the effectiveness of our approach according to different evaluation criteria
search engine logs are an emerging new type of data that offers interesting opportunities for data mining existing work on mining such data has mostly attempted to discover knowledge at the level of queries eg query clusters in this paper we propose to mine search engine logs for patterns at the level of terms through analyzing the relations of terms inside query we define two novel term association patterns ie context sensitive term substitutions and term additions and propose new methods for mining such patterns from search engine logs these two patterns can be used to address the mis specification and under specification problems of ineffective queries experiment results on real search engine logs show that the mined context sensitive term substitutions can be used to effectively reword queries and improve their accuracy while the mined context sensitive term addition patterns can be used to support query refinement in more effective way
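a toy sketch of mining context sensitive term substitutions: two consecutive queries in a session that differ in exactly one position are taken as evidence that one term was substituted for another in that context; the session grouping and counting are illustrative assumptions rather than the paper's exact method.

```python
# count (old_term, new_term, context) triples from pairs of consecutive queries.
from collections import Counter

def substitutions(sessions):
    """sessions: list of lists of queries (each query a tuple of terms)."""
    subs = Counter()
    for queries in sessions:
        for q1, q2 in zip(queries, queries[1:]):
            if len(q1) != len(q2):
                continue
            diffs = [i for i, (a, b) in enumerate(zip(q1, q2)) if a != b]
            if len(diffs) == 1:                    # exactly one term was replaced
                i = diffs[0]
                context = q1[:i] + ("_",) + q1[i + 1:]
                subs[(q1[i], q2[i], context)] += 1
    return subs

log = [[("cheap", "flights", "paris"), ("cheap", "tickets", "paris")],
       [("cheap", "flights", "paris"), ("cheap", "tickets", "paris")]]
print(substitutions(log).most_common(1))
# [(('flights', 'tickets', ('cheap', '_', 'paris')), 2)]
```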
the importance of software maintenance in managing the life cycle costs of system cannot be overemphasized beyond point however it is better to replace system rather than maintain it we derive model and operating policy that reduces the sum of maintenance and replacement costs in the useful life of software system the main goal is to compare uniform occurring at fixed time intervals versus flexible occurring at varying planned time intervals policies for maintenance and replacement the model draws from the empirical works of earlier researchers to consider inclusion of user requests for maintenance scale economies in software maintenance efficiencies derived from replacing old software technology with new software technology and the impact of software reuse on replacement and maintenance results from our model show that the traditional practice of maintaining or replacing software system at uniform time intervals may not be optimal we also find that an increase in software reuse leads to more frequent replacement but the number of maintenance activities is not significantly impacted
intrusion detection id is one of network security engineers most important tasks textual command line and visual interfaces are two common modalities used to support engineers in id we conducted controlled experiment comparing representative textual and visual interface for id to develop deeper understanding about the relative strengths and weaknesses of each we found that the textual interface allows users to better control the analysis of details of the data through the use of rich powerful and flexible commands while the visual interface allows better discovery of new attacks by offering an overview of the current state of the network with this understanding we recommend designing hybrid interface that combines the strengths of textual and visual interfaces for the next generation of tools used for intrusion detection
dynamic code generation allows programmers to use run time information in order to achieve performance and expressiveness superior to those of static code the tick language is superset of ansi that supports efficient and high level use of dynamic code generation provides dynamic code generation at the level of expressions and statements and supports the composition of dynamic code at run time these features enable programmers to add dynamic code generation to existing code incrementally and to write important applications such as just in time compilers easily the article presents many examples of how can be used to solve practical problems the tcc compiler is an efficient portable and freely available implementation of the language tcc allows programmers to trade dynamic compilation speed for dynamic code quality in some applications it is most important to generate code quickly while in others code quality matters more than compilation speed the overhead of dynamic compilation is on the order of to cycles per generated instruction depending on the level of dynamic optimization measurements show that the use of dynamic code generation can improve performance by almost an order of magnitude two to four fold speedups are common in most cases the overhead of dynamic compilation is recovered in under uses of the dynamic code sometimes it can be recovered within one use
maintaining traceability links among software artifacts is particularly important for many software engineering tasks even though automatic traceability link recovery tools are successful in identifying the semantic connections among software artifacts produced during software development no existing traceability link management approach can effectively and automatically deal with software evolution we propose technique to automatically manage traceability link evolution and update the links in evolving software our novel technique called incremental latent semantic indexing ilsi allows for the fast and low cost lsi computation for the update of traceability links by analyzing the changes to software artifacts and by reusing the result from the previous lsi computation before the changes we present our ilsi technique and describe complete automatic traceability link evolution management tool tlem that is capable of interactively and quickly updating traceability links in the presence of evolving software artifacts we report on our empirical evaluation with various experimental studies to assess the performance and usefulness of our approach
this paper describes dialogueview tool for annotating dialogues with utterance boundaries speech repairs speech act tags and hierarchical discourse blocks the tool provides three views of dialogue wordview which shows the transcribed words time aligned with the audio signal utteranceview which shows the dialogue line by line as if it were script for movie and blockview which shows an outline of the dialogue the different views provide different abstractions of what is occurring in the dialogue abstraction helps users focus on what is important for different annotation tasks for example for annotating speech repairs utterance boundaries and overlapping and abandoned utterances the tool provides the exact timing information for coding speech act tags and hierarchical discourse structure broader context is created by hiding such low level details which can still be accessed if needed we find that the different abstractions allow users to annotate dialogues more quickly without sacrificing accuracy the tool can be configured to meet the requirements of variety of annotation schemes
pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines selective re execution is technique that can reduce the penalty of mis speculations by re executing only instructions affected by the mis speculation instead of all instructions in this paper we introduce new selective re execution mechanism that exploits the properties of dataflow like explicit data graph execution edge architecture to support efficient mis speculation recovery while scaling to window sizes of thousands of instructions with high performance this distributed selective re execution dsre protocol permits multiple speculative waves of computation to be traversing dataflow graph simultaneously with commit wave propagating behind them to ensure correct execution we evaluate one application of this protocol to provide efficient recovery for load store dependence speculation unlike traditional dataflow architectures which resorted to single assignment memory semantics the dsre protocol combines dataflow execution with speculation to enable high performance and conventional sequential memory semantics our experiments show that the dsre protocol results in an average speedup over the best dependence predictor proposed to date and obtains of the performance possible with perfect oracle directing the issue of loads
most algorithms for mining spatial co locations adopt an apriori like approach to generate size prevalence co locations after size prevalence co locations however generating and storing the co locations and table instances is costly novel order clique based approach for mining maximal co locations is proposed in this paper the efficiency of the approach is achieved by two techniques the spatial neighbor relationships and the size prevalence co locations are compressed into extended prefix tree structures which allows the order clique based approach to mine candidate maximal co locations and co location instances and the co location instances do not need to be stored after computing some characteristics of the corresponding co location which significantly reduces the execution time and space required for mining maximal co locations the performance study shows that the new method is efficient for mining both long and short co location patterns and is faster than some other methods in particular the join based method and the join less method
aimed at verifying safety properties and improving simulation coverage for hybrid systems models of embedded control software we propose technique that combines numerical simulation and symbolic methods for computing state sets we consider systems with linear dynamics described in the commercial modeling tool simulink stateflow given an initial state and discrete time simulation trajectory our method computes set of initial states that are guaranteed to be equivalent to where two initial states are considered to be equivalent if the resulting simulation trajectories contain the same discrete components at each step of the simulation we illustrate the benefits of our method on two case studies one case study is benchmark proposed in the literature for hybrid systems verification and another is simulink demo model from mathworks
we present the lixto project which is both research project in database theory and commercial enterprise that develops web data extraction wrapping and web service definition software we discuss the project’s main motivations and ideas in particular the use of logic based framework for wrapping then we present theoretical results on monadic datalog over trees and on elog its close relative which is used as the internal wrapper language in the lixto system these results include both characterization of the expressive power and the complexity of these languages we describe the visual wrapper specification process in lixto and various practical aspects of wrapping we discuss work on the complexity of query languages for trees that was inseminated by our theoretical study of logic based languages for wrapping then we return to the practice of wrapping and the lixto transformation server which allows for streaming integration of data extracted from web pages this is natural requirement in complex services based on web wrapping finally we discuss industrial applications of lixto and point to open problems for future study
image clustering is an important research topic which contributes to wide range of applications traditional image clustering approaches are based on image content features only while content features alone can hardly describe the semantics of the images in the context of web images are no longer assumed homogeneous and flat distributed but are richly structured there are two kinds of reinforcements embedded in such data the reinforcement between attributes of different data types intra type links reinforcements and the reinforcement between object attributes and the inter type links inter type links reinforcements unfortunately most of the previous works addressing relational data failed to fully explore the reinforcements in this paper we propose reinforcement clustering framework to tackle this problem it reinforces images and texts attributes via inter type links and inversely uses these attributes to update these links the iterative reinforcing nature of this framework promises the discovery of the semantic structure of images which is the basis of image clustering experimental results show the effectiveness of our proposed framework
distance measures along with shape features are the most critical components in shape based model retrieval system given shape feature an optimal distance measure will vary per query per user or per database no single fixed distance measure would be satisfactory all the time this paper focuses on method to adapt distance measure to the database to be queried by using learning based dimension reduction algorithms we experimentally compare six such dimension reduction algorithms both linear and non linear for their efficacy in the context of shape based model retrieval we tested the efficacy of these methods by applying them to five global shape features among the dimension reduction methods we tested non linear manifold learning algorithms performed better than the other eg linear algorithms such as principal component analysis performance of the best performing combination is roughly the same as the top finisher in the shrec contest
choosing weak isolation level such as read committed is understood as trade off where less isolation means that higher performance is gained but there is an increased possibility that data integrity will be lost previously one side of this trade off has been carefully studied quantitatively there are well known metrics for performance such as transactions per minute standardized benchmarks that measure these in controlled way and analytic models that can predict how performance is influenced by system parameters like multiprogramming level this paper contributes to quantifying the other aspect of the trade off we define novel microbenchmark that measures how rapidly integrity violations are produced at different isolation levels for simple set of transactions we explore how this rate is impacted by configuration factors such as multiprogramming level or contention frequency for the isolation levels in multi version platforms snapshot isolation and the multiversion variant of read committed we offer simple probabilistic model that predicts the rate of integrity violations in our microbenchmark from configuration parameters we validate the predictive model against measurements from the microbenchmark the model identifies region of the configuration space where surprising inversion occurs for these parameter settings more integrity violations happen with snapshot isolation than with multi version read committed even though the latter is considered lower isolation level
many stateful services use the replicated state machine approach for high availability in this approach service runs on multiple machines to survive machine failures this paper describes smart new technique for changing the set of machines where such service runs ie migrating the service smart improves upon existing techniques in three important ways first smart allows migrations that replace non failed machines thus smart enables load balancing and lets an automated system replace failed machines such autonomic migration is an important step toward full autonomic operation in which administrators play minor role and need not be available twenty four hours day seven days week second smart can pipeline concurrent requests useful performance optimization third prior published migration techniques are described in insufficient detail to admit implementation whereas our description of smart is complete in addition to describing smart we also demonstrate its practicality by implementing it evaluating our implementation’s performance and using it to build consistent replicated migratable file system our experiments demonstrate the performance advantage of pipelining concurrent requests and show that migration has only minor and temporary effect on performance
access requests to keys stored into data structure often exhibit locality of reference in practice such regularity can be modeled eg by working sets in this paper we study to what extent can the existence of working sets be taken advantage of in splay trees in order to reduce the number of costly splay operations we monitor for information on the current working set and its change we introduce simple algorithm which attempts to splay only when necessary under worst case analysis the algorithm guarantees an amortized logarithmic bound in empirical experiments it is more efficient than randomized splay trees and at most more efficient than the original splay tree we also briefly analyze the usefulness of the commonly used zipf’s distribution as general model of locality of reference
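the splay only when necessary policy can be sketched as a small recency window acting as a working set monitor, with the costly splay performed only when the accessed key falls outside the window; the window size and decision rule below are illustrative assumptions, not the paper's exact algorithm.

```python
# working set monitor: the tree would splay an accessed node only when
# should_splay() returns True, i.e. the key is outside the recent window.
from collections import OrderedDict

class WorkingSetMonitor:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.recent = OrderedDict()  # keys in order of last access

    def should_splay(self, key):
        """Return True (and update the window) when `key` is outside the working set."""
        inside = key in self.recent
        if inside:
            self.recent.move_to_end(key)
        else:
            self.recent[key] = True
            if len(self.recent) > self.capacity:
                self.recent.popitem(last=False)  # evict the least recently used key
        return not inside

monitor = WorkingSetMonitor(capacity=2)
print([monitor.should_splay(k) for k in ["a", "b", "a", "c", "a"]])
# [True, True, False, True, False] -- only out-of-working-set accesses trigger a splay
```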
digital mementos are increasingly problematic as people acquire large amounts of digital belongings that are hard to access and often forgotten based on fieldwork with families we designed new type of embodied digital memento the fm radio it allows families to access and play sonic mementos of their previous holidays we describe our underlying design motivation where recordings are presented as series of channels on an old fashioned radio user feedback suggests that the device met our design goals being playful and intriguing easy to use and social it facilitated family interaction and allowed ready access to mementos thus sharing many of the properties of physical mementos that we intended to trigger
the management of moving objects has been intensively studied in recent years wide and increasing range of database applications has to deal with spatial objects whose position changes continuously over time called moving objects the main interest of these applications is to efficiently store and query the positions of these continuously moving objects to achieve this goal index structures are required the main proposals of index structures for moving objects deal with unconstrained dimensional movement constrained movement is special and very important case of object movement for example cars move in roads and trains in railroads in this paper we propose new index structure for moving objects on networks the mon tree we describe two network models that can be indexed by the mon tree the first model is edge oriented ie the network consists of nodes and edges and there is polyline associated with each edge the second one is more suitable for transportation networks and is route oriented ie the network consists of routes and junctions in this model polyline also serves as representation of the routes we propose the index in terms of the basic algorithms for insertion and querying we test our proposal in an extensive experimental evaluation with generated data sets using as underlying networks the roads of germany in our tests the mon tree shows good scalability and outperforms the competing index structures in updating index creation as well as in querying
it is becoming increasingly important that geographical information system delivers high performance to efficiently store retrieve and process the voluminous data that it needs to handle it is necessary to employ processing and storage parallelism for scalable long term solutions with the demise of many custom built parallel machines it is imperative that we use off the shelf technology to provide this parallelism closely coupled network of workstations is viable alternative this paper shows that distributed index structure spanning the workstations can provide an efficient shared storage structure that can be used to get to the geographic information distributed amongst the individual disks and memories of the workstations this goal can be attained without significantly compromising on the time taken to build this structure
dependence graphs and memoization can be used to efficiently update the output of program as the input changes dynamically recent work has studied techniques for combining these approaches to effectively dynamize wide range of applications toward this end various theoretical results were given in this paper we describe the implementation of library based on these ideas and present experimental results on the efficiency of this library on variety of applications the results of the experiments indicate that the approach is effective in practice often requiring orders of magnitude less time than recomputing the output from scratch we believe this is the first experimental evidence that incremental computation of any type is effective in practice for reasonably broad set of applications
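a very small sketch of combining memoization with dependence tracking: each memoized call records which inputs it read and changing an input invalidates only the cached results that depended on it; this conveys the idea, not the library's actual interface.

```python
# memoization plus a dependence map from inputs to the cached calls that read them.
class Incremental:
    def __init__(self, inputs):
        self.inputs = dict(inputs)
        self.cache = {}          # call key -> result
        self.deps = {}           # input name -> set of call keys that read it

    def read(self, name, call_key):
        self.deps.setdefault(name, set()).add(call_key)
        return self.inputs[name]

    def call(self, fn, *args):
        key = (fn.__name__, args)
        if key not in self.cache:
            self.cache[key] = fn(self, key, *args)
        return self.cache[key]

    def update(self, name, value):
        self.inputs[name] = value
        for key in self.deps.pop(name, set()):
            self.cache.pop(key, None)            # recompute lazily on the next call

def total(inc, key, a, b):
    return inc.read(a, key) + inc.read(b, key)

inc = Incremental({"x": 1, "y": 2, "z": 10})
print(inc.call(total, "x", "y"))   # 3, computed
print(inc.call(total, "x", "y"))   # 3, served from the cache
inc.update("y", 5)                 # invalidates only results that read y
print(inc.call(total, "x", "y"))   # 6, recomputed
```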
string matching plays central role in packet inspection applications such as intrusion detection anti virus anti spam and web filtering since they are computation and memory intensive software matching algorithms are insufficient to meet the high speed performance thus offloading packet inspection to dedicated hardware seems inevitable this paper presents scalable automaton matching sam coprocessor that uses aho corasick ac algorithm with two parallel acceleration techniques root indexing and pre hashing the root indexing can match multiple bytes in one single matching and the pre hashing can be used to avoid bitmap ac matching which is cycle consuming operation in the platform based soc implementation of the xilinx ml fpga the proposed hardware architecture can achieve almost gbps and support over patterns for virus which is the largest pattern set from among the existing works on the average the performance of sam is times faster than the original bitmap ac furthermore sam is feasible for either internal or external memory architecture the internal memory architecture provides high performance while the external memory architecture provides high scalability in term of the number of patterns
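the pre hashing principle can be shown in software with a cheap filter over the first two bytes of each pattern that rules out most text positions before exact matching is attempted; the real sam design does this in hardware together with root indexing, so this is only a toy illustration.

```python
# cheap two-byte prefilter in front of exact matching.
def build_prefilter(patterns):
    return {p[:2] for p in patterns if len(p) >= 2}

def scan(text, patterns):
    prefilter = build_prefilter(patterns)
    hits = []
    for i in range(len(text) - 1):
        if text[i:i + 2] in prefilter:                # cheap pre-hash test
            for p in patterns:                        # fall back to exact matching
                if text.startswith(p, i):
                    hits.append((i, p))
    return hits

print(scan(b"xxEVILcodeBADxx", [b"EVIL", b"BAD"]))    # [(2, b'EVIL'), (10, b'BAD')]
```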
large number of embedded systems include bit microcontrollers for their energy efficiency and low cost multi bank memory architecture is commonly applied in bit microcontrollers to increase the size of memory without extending address buses to switch among different memory banks special instruction bank selection is used how to minimize the number of bank selection instructions inserted is important to reduce code size for embedded systems in this paper we consider how to insert the minimum number of bank selection instructions in program to achieve feasibility program can be represented by control flow graph cfg we prove that it is np hard to insert the minimum number of bank selection instructions if all the variables are pre assigned to memory banks therefore we introduce approximation algorithm using rounding method when the cfg is tree or the out degree of each node in the cfg is at most two we show that we can insert the bank selection instructions optimally in polynomial time we then consider the case when there are some nodes that do not access any memory bank and design dynamic programming method to compute the optimal insertion strategy when the cfg is tree experimental result shows the proposed techniques can reduce bank selection instructions significantly on partitioned memory architecture
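to see why placing bank selection instructions is an optimization problem, the toy dynamic program below works on a tree shaped cfg with at most one access per node and a known set of banks; it illustrates the trade off of switching once at a parent versus once per child and is not the rounding or dp algorithms of the paper.

```python
# tree DP: f(node, entry_bank) = fewest bank switches needed in the subtree,
# choosing the bank held when control leaves the node for its children.
from functools import lru_cache

def min_switches(tree, bank_of, banks, root, start_bank):
    """tree: node -> list of children; bank_of: node -> required bank or None."""

    @lru_cache(maxsize=None)
    def f(node, entry_bank):
        best = float("inf")
        need = bank_of.get(node)
        for exit_bank in banks:
            cost, cur = 0, entry_bank
            if need is not None and cur != need:
                cost += 1            # switch so the access hits the right bank
                cur = need
            if cur != exit_bank:
                cost += 1            # optional extra switch before the children
            cost += sum(f(c, exit_bank) for c in tree.get(node, []))
            best = min(best, cost)
        return best

    return f(root, start_bank)

# n0 accesses nothing and its three children all need bank 1:
# switching once at n0 (1 switch) beats switching inside each child (3 switches)
tree = {"n0": ["n1", "n2", "n3"]}
bank_of = {"n1": 1, "n2": 1, "n3": 1}
print(min_switches(tree, bank_of, banks=(0, 1), root="n0", start_bank=0))  # 1
```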
broadcast disk technology has become popular method for data dissemination in wireless information systems however factors such as intentional periodic disconnection by the mobile hosts mh result in further needs for managing access modes so as to let the mobile hosts procure highest benefit through the disconnections we address this issue in this paper and propose solutions for the mh to retrieve the broadcast data our performance results show that the proposed algorithms indeed achieve almost the optimal performance but require only of the cost of an algorithm that is otherwise designed
there has been good deal of progress made recently toward the efficient parallelization of individual phases of single queries in multiprocessor database systems in this paper we devise and experimentally evaluate number of scheduling algorithms designed to handle multiple parallel queries scheduling in this context implies the determination of both processor allotments and temporal processor assignments to individual queries and query phases one of these algorithms performs the best in our experiments this algorithm is hierarchical in nature in the first phase good quality precedence based schedule is created for each individual query and each possible number of processors this component employs dynamic programming in the second phase the results of the first phase are used to create an overall schedule of the full set of queries this component is based on previously published work on nonprecedence based malleable scheduling even though the problem we are considering is np hard in the strong sense the multiple query schedules generated by our hierarchical algorithm are seen experimentally to achieve high quality results
with applications in biology the world wide web and several other areas mining of graph structured objects has received significant interest recently one of the major research directions in this field is concerned with predictive data mining in graph databases where each instance is represented by graph some of the proposed approaches for this task rely on the excellent classification performance of support vector machines to control the computational cost of these approaches the underlying kernel functions are based on frequent patterns in contrast to these approaches we propose kernel function based on natural set of cyclic and tree patterns independent of their frequency and discuss its computational aspects to practically demonstrate the effectiveness of our approach we use the popular nci hiv molecule dataset our experimental results show that cyclic pattern kernels can be computed quickly and offer predictive performance superior to recent graph kernels based on frequent patterns
clustering has been widely used in wireless ad hoc networks for various purposes such as routing broadcasting and qos many clustering algorithms have been proposed however most of them implicitly assume that nodes behave honestly in the clustering process in practice there might be some malicious nodes trying to manipulate the clustering process to make them serve as clusterheads which can obtain some special power eg eavesdropping more messages in this paper we present secure weighted clustering algorithm swca swca uses the weighted clustering algorithm wca for clustering and tesla for efficiently authenticating packets we propose novel neighbor verification scheme to check whether the values of election related features eg node degree are forged by malicious nodes also we theoretically analyze the probability for malicious node to tamper node degree without being detected and derive lower bound on the probability finally simulation results show that swca is secure but still has comparable performance with wca to the best of our knowledge swca is the first algorithm considering the security of hop type clustering in this type only the clusterhead can communicate with ordinary members directly in ad hoc networks
many scientific disciplines such as biology and astronomy are now collecting large amounts of raw data systematically and storing it centrally in relational databases for shared use by their communities in many cases such systems accept arbitrary sql queries submitted using web form typical database management systems do not provide support for enforcing resource access control policies to guard against expensive or pointless queries submitted accidentally by legitimate users or intentionally by attackers instead such systems typically employ timeouts to halt queries that do not terminate within reasonable period of time this approach can limit misuse but cannot prevent it moreover it does not provide useful feedback for legitimate users whose queries exceed the time limit in this paper we study language based technique for bounding the time and space resource usage of database queries we introduce cost semantics for simple core database query language define type based analysis for estimating an upper bound on the asymptotic running time of query and prove its soundness we also discuss prototype implementation which we have used to analyze typical sql queries submitted to the sdss skyserver astronomical database
after bounded update to database first order incremental evaluation system abbreviated foies derives the new answer to an expensive database query by applying first order query on the old answer and perhaps some stored auxiliary relations the auxiliary relations are also maintained in first order foies can be deterministic or nondeterministic depending on whether its stored auxiliary relations are defined by deterministic or nondeterministic mappings from databases in this paper we study the impact of the determinism restriction on foies and we compare nondeterminism with determinism in foies it turns out that nondeterministic foies are more powerful than the deterministic ones deterministic foies using auxiliary relations with arity are shown to be strictly weaker than their nondeterministic counterparts for each and it is shown that there is simple query which has nondeterministic foies with binary auxiliary relations but does not have any deterministic foies with auxiliary relations of any arity strict arity hierarchy of deterministic foies is established for the small arities interestingly the deterministic foies arity hierarchy collapses to ary when limited to queries over unary relations
in this paper we investigate how existing theoretical contributions on usable security can serve to guide the design of specific system we illustrate how going through this theoretically informed concrete design process also provides the basis for complementing existing theoretical contributions the system we have designed is system taking advantage of pervasive computing technology to offer hotel guests access to their personal digital materials while in hotel room the design is based on two ideas novel to usable security namely falsification and the singleton invariant
online aggregation is promising solution to achieving fast early responses for interactive ad hoc queries that compute aggregates on large amount of data essential to the success of online aggregation is good non blocking join algorithm that enables both high early result rates with statistical guarantees and ii fast end to end query times we analyze existing non blocking join algorithms and find that they all provide sub optimal early result rates and those with fast end to end times achieve them only by further sacrificing their early result rates we propose new non blocking join algorithm partitioned expanding ripple join pr join which achieves considerably higher early result rates than previous non blocking joins while also delivering fast end to end query times pr join performs separate ripple like join operations on individual hash partitions where the width of ripple expands multiplicatively over time this contrasts with the non partitioned fixed width ripples of block ripple join assuming as in previous non blocking join studies that the input relations are in random order pr join ensures representative early results that are amenable to statistical guarantees we show both analytically and with real machine experiments that pr join achieves over an order of magnitude higher early result rates than previous non blocking joins we also discuss the benefits of using flash based ssd for temporary storage showing that pr join can then achieve close to optimal end to end performance finally we consider the joining of finite data streams that arrive over time and find that pr join achieves similar or higher result rates than rpj the state of the art algorithm specialized for that domain
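as a hedged sketch of the general non blocking join idea only, the following python generator interleaves two inputs and emits matches as soon as they are found; it omits pr join's hash partitioning, expanding ripple widths and statistical machinery, and all names here are hypothetical

```python
from collections import defaultdict

def symmetric_hash_join(stream_r, stream_s, key_r, key_s):
    """alternately consume tuples from both inputs and emit join results as
    soon as a match is seen, instead of blocking until one input is fully read."""
    seen_r, seen_s = defaultdict(list), defaultdict(list)
    iters = [iter(stream_r), iter(stream_s)]
    done = [False, False]
    side = 0
    while not all(done):
        if not done[side]:
            try:
                t = next(iters[side])
            except StopIteration:
                done[side] = True
            else:
                if side == 0:
                    k = key_r(t)
                    seen_r[k].append(t)
                    for s in seen_s.get(k, []):
                        yield (t, s)
                else:
                    k = key_s(t)
                    seen_s[k].append(t)
                    for r in seen_r.get(k, []):
                        yield (r, t)
        side = 1 - side

# usage: for pair in symmetric_hash_join(r_rows, s_rows, lambda r: r[0], lambda s: s[0]): ...
```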
this paper presents framework for power efficient detection in embedded sensor systems state detection is structured as decision tree classifier that dynamically orders the activation and adjusts the sampling rate of the sensors termed groggy wakeup such that only the data necessary to determine the system state is collected at any given time this classifier can be tuned to trade off accuracy and power in structured parameterized fashion an embedded instantiation of these classifiers including real time sensor control is described an application based on wearable gait monitor provides quantitative support for this framework the decision tree classifiers achieved roughly identical detection accuracies to those obtained using support vector machines while drawing three times less power both simulation and real time operation of the classifiers demonstrate that our multi tiered classifier determines states as accurately as single trigger binary wakeup system while drawing as little as half as much power and with only negligible increase in latency
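a minimal sketch of the tiered wakeup idea, assuming hypothetical sensors and thresholds; the paper's decision tree classifiers and dynamic sampling rate control are far richer than this two tier cascade

```python
def tiered_detect(sample_cheap, sample_expensive, low_thresh, high_thresh):
    """tier 1 uses a low-power sensor; the costlier sensor is woken only when
    the cheap reading is inconclusive (between the two thresholds)."""
    x = sample_cheap()
    if x < low_thresh:
        return "inactive"        # confident decision, expensive sensors stay off
    if x > high_thresh:
        return "active"
    # tier 2: only now power up the expensive sensor for a finer decision
    y = sample_expensive()
    return "active" if y > high_thresh else "inactive"
```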
this paper describes the architectural considerations in the design of web services bb application component design is presented which exploits the postulated advantages of object oriented and web services technologies our main focus has been to design interim measures to overcome the limitations of current web services architecture and standards we discuss our interim design with reference to our approach to load balancing and the design of application based error handling to support both the load balancing requirements and the web services architecture the design issues surrounding application based concurrency control in bb web services environment are also described our concurrency control approach presents practical adaptation of pessimistic scheme supporting nested transaction model
the algebra tna of generalised temporal database model supporting temporal relations nested to any finite depth is presented the temporal nested relations consist of temporal nested attributes which are formed from temporal attributes together with the corresponding time varying attributes therefore the temporal dimension of the model is nested and is not integral with the corresponding time dependent value all the operations of the algebra are defined recursively and are proved to be closed in particular considering the natural join operation for temporal nested relations different cases are presented distinguished by the types and the nesting levels of the common attributes that participate in the natural join operation
materialized aggregate views represent set of redundant entities in data warehouse that are frequently used to accelerate on line analytical processing olap due to the complex structure of the data warehouse and the different profiles of the users who submit queries there is need for tools that will automate and ease the view selection and management processes in this article we present dynamat system that manages dynamic collections of materialized aggregate views in data warehouse at query time dynamat utilizes dedicated disk space for storing computed aggregates that are further engaged for answering new queries queries are executed independently or can be bundled within multiquery expression in the latter case we present an execution mechanism that exploits dependencies among the queries and the materialized set to further optimize their execution during updates dynamat reconciles the current materialized view selection and refreshes the most beneficial subset of it within given maintenance window we show how to derive an efficient update plan with respect to the available maintenance window the different update policies for the views and the dependencies that exist among them
many applications employ sensors for monitoring entities such as temperature and wind speed centralized database tracks these entities to enable query processing due to continuous changes in these values and limited resources eg network bandwidth and battery power it is often infeasible to store the exact values at all times similar situation exists for moving object environments that track the constantly changing locations of objects in this environment it is possible for database queries to produce incorrect or invalid results based upon old data however if the degree of error or uncertainty between the actual value and the database value is controlled one can place more confidence in the answers to queries more generally query answers can be augmented with probabilistic estimates of the validity of the answers in this paper we study probabilistic query evaluation based upon uncertain data classification of queries is made based upon the nature of the result set for each class we develop algorithms for computing probabilistic answers we address the important issue of measuring the quality of the answers to these queries and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries extensive experiments are performed to examine the effectiveness of several data update policies
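to make the idea of probabilistic answers concrete, here is a small python sketch for one query class only, a range query over values known up to a uniform uncertainty interval; the uniform model and the probability threshold are assumptions, not the paper's full query classification

```python
def prob_in_range(lo, hi, a, b):
    """probability that a value uniformly distributed on [lo, hi]
    falls inside the query range [a, b]."""
    if hi <= lo:                               # degenerate interval: exact value
        return 1.0 if a <= lo <= b else 0.0
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)

def probabilistic_range_query(objects, a, b, threshold=0.0):
    """return (object id, probability) pairs whose probability of lying in
    [a, b] exceeds the threshold; objects maps an id to its uncertainty interval."""
    answers = []
    for oid, (lo, hi) in objects.items():
        p = prob_in_range(lo, hi, a, b)
        if p > threshold:
            answers.append((oid, p))
    return answers
```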
we study the problem of maintaining sketches of recent elements of data stream motivated by applications involving network data we consider streams that are asynchronous in which the observed order of data is not the same as the time order in which the data was generated the notion of recent elements of stream is modeled by the sliding timestamp window which is the set of elements with timestamps that are close to the current time we design algorithms for maintaining sketches of all elements within the sliding timestamp window that can give provably accurate estimates of two basic aggregates the sum and the median of stream of numbers the space taken by the sketches the time needed for querying the sketch and the time for inserting new elements into the sketch are all polylog with respect to the maximum window size and the values of the data items in the window our sketches can be easily combined in lossless and compact way making them useful for distributed computations over data streams previous works on sketching recent elements of data stream have all considered the more restrictive scenario of synchronous streams where the observed order of data is the same as the time order in which the data was generated our notion of recency of elements is more general than that studied in previous work and thus our sketches are more robust to network delays and asynchrony
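for intuition only, an exact in memory baseline of the sliding timestamp window semantics is sketched below; unlike the paper's sketches it stores every element rather than using polylog space, but it shows how asynchronous arrivals are handled by timestamp rather than by arrival order

```python
import heapq

class SlidingTimestampSum:
    """exact (non-sketch) baseline: keeps every element whose timestamp is
    within the last `window` time units, regardless of arrival order."""
    def __init__(self, window):
        self.window = window
        self.heap = []          # min-heap ordered by timestamp
        self.total = 0

    def insert(self, timestamp, value):
        heapq.heappush(self.heap, (timestamp, value))
        self.total += value

    def query_sum(self, now):
        # evict elements whose timestamps have fallen out of the window
        while self.heap and self.heap[0][0] <= now - self.window:
            _, v = heapq.heappop(self.heap)
            self.total -= v
        return self.total
```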
we present storage management layer that facilitates the implementation of parallel information retrieval systems and related applications on networks of workstations the storage management layer automates the process of adding and removing nodes and implements dispersed mirroring strategy to improve reliability when nodes are added and removed the document collection managed by the system is redistributed for load balancing purposes the use of dispersed mirroring minimizes the impact of node failures and system modifications on query performance
log linear and maximum margin models are two commonly used methods in supervised machine learning and are frequently used in structured prediction problems efficient learning of parameters in these models is therefore an important problem and becomes key factor when learning from very large data sets this paper describes exponentiated gradient eg algorithms for training such models where eg updates are applied to the convex dual of either the log linear or max margin objective function the dual in both the log linear and max margin cases corresponds to minimizing convex function with simplex constraints we study both batch and online variants of the algorithm and provide rates of convergence for both cases in the max margin case eg updates are required to reach given accuracy in the dual in contrast for log linear models only log updates are required for both the max margin and log linear cases our bounds suggest that the online eg algorithm requires factor of less computation to reach desired accuracy than the batch eg algorithm where is the number of training examples our experiments confirm that the online algorithms are much faster than the batch algorithms in practice we describe how the eg updates factor in convenient way for structured prediction problems allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing we perform extensive evaluation of the algorithms comparing them to bfgs and stochastic gradient descent for log linear models and to svm struct for max margin models the algorithms are applied to multi class problem as well as to more complex large scale parsing task in all these settings the eg algorithms presented here outperform the other methods
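a minimal illustration of the generic exponentiated gradient step on the probability simplex, not the paper's dual formulation for structured log linear or max margin models; the step size and the toy objective below are hypothetical

```python
import numpy as np

def eg_update(w, gradient, eta):
    """one exponentiated-gradient step on the probability simplex:
    multiply each coordinate by exp(-eta * gradient) and renormalise."""
    w_new = w * np.exp(-eta * gradient)
    return w_new / w_new.sum()

# toy usage: minimise a quadratic over the simplex
w = np.ones(4) / 4
target = np.array([0.7, 0.1, 0.1, 0.1])
for _ in range(100):
    grad = 2 * (w - target)          # gradient of ||w - target||^2
    w = eg_update(w, grad, eta=0.5)
print(w)                             # approaches the target distribution
```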
text categorization systems often induce document classifiers from pre classified examples by the use of machine learning techniques the circumstance that each example document can belong to many different classes often leads to impractically high computational costs that sometimes grow exponentially in the number of features looking for ways to reduce these costs we explored the possibility of running baseline induction algorithm separately for subsets of features obtaining set of classifiers to be combined for the specific case of classifiers that return not only class labels but also confidences in these labels we investigate here few alternative fusion techniques including our own mechanism that was inspired by the dempster shafer theory the paper describes the algorithm and in our specific case study compares its performance to that of more traditional mechanisms
one important issue the designer of scalable shared memory multiprocessor must deal with is the amount of extra memory required to store the directory information it is desirable that the directory memory overhead be kept as low as possible and that it scales very slowly with the size of the machine unfortunately current directory architectures provide scalability at the expense of performance this work presents scalable directory architecture that significantly reduces the size of the directory for large scale configurations of multiprocessor without degrading performance first we propose multilayer clustering as an effective approach to reduce the width of directory entries based on this concept we derive three new compressed sharing codes some of them with space complexity of order log log n for an n node system then we present novel two level directory architecture to eliminate the penalty caused by compressed directories in general the proposed organization consists of small full map first level directory which provides precise information for the most recently referenced lines and compressed second level directory which provides in excess information for all the lines the proposals are evaluated based on extensive execution driven simulations using rsim of node cc numa multiprocessor results demonstrate that system with two level directory architecture achieves the same performance as multiprocessor with big and nonscalable full map directory with very significant reduction of the memory overhead
aspect oriented programming aop has started to achieve industry adoption for custom programs and some adoption in frameworks such as the spring framework aspect oriented programming provides many benefits it can increase the scope of concerns that can be captured cleanly it has explicit language support and the separation provided by aop provides an elegant mechanism for custom solutions in this paper we present model for aop testing this includes model for risk assessment an associated fault model and aop testing patterns we also propose further opportunities for research in the area for automated aop risk assessment and testing at aptsi applied technology solutions inc we have been applying aop in the creation of our soasense framework and in our consulting engagements we are seeing adoption typically in classical aop areas such as logging error handling audit events etc in these scenarios having reliable aop implementation is critical for example having an audit event not occur for service call due to faulty join point definition can have severe legal implications we need solution that provides reliability is repeatable and enables us to assess risk
increased chip temperature has been known to cause severe reliability problems and to significantly increase leakage power the register file has been previously shown to exhibit the highest temperature compared to all other hardware components in modern high end embedded processor which makes it particularly susceptible to faults and elevated leakage power we show that this is mostly due to the highly clustered register file accesses where set of few registers physically placed close to each other are accessed with very high frequency we propose compile time temperature aware register reallocation methodologies for breaking such groups of registers and to uniformly distribute the accesses to the register file this is achieved with no performance and no hardware overheads we show that the underlying problem is np hard and subsequently introduce and evaluate two efficient algorithmic heuristics our extensive experimental study demonstrates the efficiency of the proposed methodology
we present new approach based on graph transformation to incremental specification of the operational execution semantics of visual languages the approach combines editing rules with two meta models one to define the concrete syntax and one for the static semantics we introduce the notion of action patterns defining basic actions eg consuming or producing token in transition based semantics in way similar to graph transformation rules the application of action patterns to static semantics editing rule produces meta rule to be paired with the firing of the corresponding syntactic rule to incrementally build an execution rule an execution rule is thus tailored to any active element eg transition in petri net model in the model examples from petri nets state automata and workflow languages illustrate these ideas
in this paper we present the consequences of unifying the representation of the schema and the instance levels of an object programming language to the formal representation of object model the uniform representation of schema and instance levels of object languages is achieved as in the frame based knowledge representation languages by representing them using uniform set of modeling constructs we show that using such an approach the structural part of the object language model can be described in clear manner providing the simple means for the description of the main constructs of the structural model and the relationships among them further we study the consequences of releasing the boundary between the schema and the instance levels of an object programming language by allowing the definition of objects which include data from both levels we show that few changes are needed in order to augment the previously presented formal definition of the structural part of object language to represent the extended object model
routing protocols in sensor networks maintain information on neighbor states and potentially many other factors in order to make informed decisions challenges arise both in performing accurate and adaptive information discovery and processing analyzing the gathered data to extract useful features and correlations to address such challenges this paper explores using supervised learning techniques to make informed decisions in the context of wireless sensor networks we investigate the design space of both offline learning and online learning and use link quality estimation as case study to evaluate their effectiveness for this purpose we present metricmap metric based collection routing protocol atop mintroute that derives link quality using classifiers learned in the training phase when the traditional etx approach fails the offline learning approach is evaluated on node sensor network testbed and our results show that metricmap can achieve up to improvement over mintroute in data delivery rate for high data rate situations with no negative impact on other performance metrics we also explore the possibility of using online learning in this paper
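as a hedged illustration of the offline learning step only, and not metricmap's actual feature set or training pipeline, one could fit an off the shelf decision tree on labelled link measurements; the features, labels and numbers below are invented

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# hypothetical offline training data: per-link features such as rssi, lqi and
# recent delivery ratio, labelled good (1) or bad (0) from ground-truth traces
X = np.array([[-72, 98, 0.91],
              [-90, 60, 0.22],
              [-80, 85, 0.75],
              [-95, 50, 0.10]])
y = np.array([1, 0, 1, 0])

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[-78, 88, 0.80]]))   # estimate quality of an unseen link
```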
in recent years we have witnessed the emergence and establishment of research in sensor network security the majority of the literature has focused on discovering numerous vulnerabilities and attacks against sensor networks along with suggestions for corresponding countermeasures however there has been little guidance for understanding the holistic nature of sensor network security for practical deployments in this paper we discuss these concerns and propose taxonomy composed of the security properties of the sensor network the threat model and the security design space in particular we try to understand the application layer goals of sensor network and provide guide to research challenges that need to be addressed in order to prioritize our defenses against threats to application layer goals
enterprises rely critically on the timely and sustained delivery of information to support this need we augment information flow middleware with new functionality that provides high levels of availability to distributed applications while at the same time maximizing the utility end users derive from such information specifically the paper presents utility driven proactive availability management techniques to offer information flows that dynamically self determine their availability requirement based on high level utility specifications flows that can trade recovery time for performance based on the perceived stability of and failure predictions early alarm for the underlying system and methods based on real world case studies to deal with both transient and non transient failures utility driven proactive availability management is integrated into information flow middleware and used with representative applications experiments reported in the paper demonstrate middleware capability to self determine availability guarantees to offer improved performance versus statically configured system and to be resilient to wide range of faults
the capabilities of xslt processing are widely used to transform xml documents into target xml documents these target xml documents conform to output schemas of the used xslt stylesheet output schemas of xslt stylesheets can be used for static analysis of the used xslt stylesheet to automatically detect the xslt stylesheet of target xml documents or to reason on the output schema without access to the target xml documents in this paper we develop an approach to automatically determining the output schema of an xslt stylesheet we also describe several application scenarios of output schemas the experimental evaluation shows that our prototype can determine the output schemas of nearly all typical xslt stylesheets and the improvements in preciseness in several application scenarios when using output schemas in comparison to when not using output schemas
the use of asynchronous duty cycling in wireless sensor network mac protocols is common since it can greatly reduce energy consumption and requires no clock synchronization however existing systems using asynchronous duty cycling do not efficiently support broadcast based communication that may be used for example in route discovery or in network wide queries or information dissemination in this paper we present the design and evaluation of adb asynchronous duty cycle broadcasting new protocol for efficient multihop broadcast in wireless sensor networks using asynchronous duty cycling adb differs from traditional multihop broadcast protocols that operate above the mac layer in that it is integrated with the mac layer to exploit information only available at this layer rather than treating the data transmission from node to all of its neighbors as the basic unit of progress for the multihop broadcast adb dynamically optimizes the broadcast at the level of transmission to each individual neighbor of node as the neighbors asynchronously wakeup we evaluate adb both through ns simulations and through measurements in testbed of micaz motes using tinyos and compare its performance to multihop broadcast based on mac and on ri mac in both evaluations adb substantially reduced energy consumption network load and delivery latency compared to other protocols while achieving over delivery ratio
java bytecode verification forms the basis for java based internet security and needs rigorous description one important aspect of bytecode verification is to check if java virtual machine jvm program is statically well typed so far several formal specifications have been proposed to define what the static well typedness means this paper takes step further and presents chaotic fixpoint iteration which represents family of fixpoint computation strategies to compute least type for each jvm program within finite number of iteration steps since transfer function in the iteration is not monotone we choose to follow the example of nonstandard fixpoint theorem which requires that all transfer functions are increasing and monotone in case the bigger element is already fixpoint the resulting least type is the artificial top element if and only if the jvm program is not statically well typed the iteration is standard and close to sun’s informal specification and most commercial bytecode verifiers
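to make the iteration scheme concrete, here is a generic worklist style fixpoint sketch in python; it is neither sun's verifier nor the paper's jvm type domain, just the shape of chaotic iteration over a finite lattice with increasing transfer functions

```python
def chaotic_fixpoint(nodes, successors, transfer, join, bottom):
    """generic worklist (chaotic) iteration: repeatedly apply transfer
    functions and join results at successors until no value changes;
    terminates on a finite lattice with increasing transfer functions."""
    value = {n: bottom for n in nodes}
    worklist = list(nodes)
    while worklist:
        n = worklist.pop()
        new = transfer(n, value[n])
        for m in successors(n):
            merged = join(value[m], new)
            if merged != value[m]:
                value[m] = merged
                worklist.append(m)      # re-process nodes whose input changed
    return value
```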
media analysis for video indexing is witnessing an increasing influence of statistical techniques examples of these techniques include the use of generative models as well as discriminant techniques for video structuring classification summarization indexing and retrieval there is increasing emphasis on reducing the amount of supervision and user interaction needed to construct and utilize the semantic models this paper highlights the statistical learning techniques in semantic multimedia indexing and retrieval in particular the gamut of techniques from supervised to unsupervised systems will be demonstrated
wireless infrastructures are increasingly diverse complex and difficult to manage those who restrict themselves to homogeneous managed campus or corporate networks are vanishing breed in the wild users are confronted with many overlapping infrastructures with broad variety of strengths and weaknesses such diversity of infrastructure is both challenge and an opportunity the challenge lies in presenting the alternatives to applications and users in way that provides the best possible utility to both however by managing these many alternatives we can provide significant benefits exploiting multiple networks concurrently and planning future transmissions intelligently
recently researchers have proposed modeling register allocation as an integer linear programming ip problem and solving it optimally for general purpose processors and for dedicated embedded systems compared with traditional graph coloring approaches the ip based allocators can improve program’s performance however the solution times are much slower this paper presents an lp based optimal register allocator which is much faster than previous work we present several local and global reduction techniques to identify locations in program’s control flow graph where spill decisions and register deallocation decisions are unnecessary for optimal register allocation we propose hierarchical reduction approach to efficiently remove the corresponding redundant decisions and constraints from the ip model this allocator is built into the gnu compiler and is evaluated experimentally using the specnt benchmarks the results show that the improved ip model is much simpler the number of constraints produced is almost linear with the function size the optimal allocation time is much faster with speedup factor of about for hard allocation problems
an approach to cope with the impossibility of solving agreement problems in asynchronous systems made up of processes and prone to process crashes is to use failure detectors an orthogonal approach that has been used is to consider conditions that restrict the possible inputs to such problem this paper considers system with both failure detectors and conditions the aim is to identify the failure detector class that abstracts away the synchrony needed to solve set agreement for given condition three main contributions are presented the first is new class of failure detectors denoted φty the processes can invoke primitive queryy with set of process ids roughly speaking queryy returns true only when all processes in have crashed provided it is shown that the classic chandra and toueg’s failure detectors are incomparable to the φty failure detectors the second contribution is generic condition based protocol for φty that solves set agreement it can be instantiated with any legal condition and solves set agreement for max termination is guaranteed for inputs in condition is legal if and only if it can be used to solve fault tolerant asynchronous consensus variant of the protocol that terminates always is described finally corresponding lower bound is presented showing that there is no φty based set agreement protocol for legal conditions with max
we present and evaluate the idea of adaptive processor cache management specifically we describe novel and general scheme by which we can combine any two cache management algorithms eg lru lfu fifo random and adaptively switch between them closely tracking the locality characteristics of given program the scheme is inspired by recent work in virtual memory management at the operating system level which has shown that it is possible to adapt over two replacement policies to provide an aggregate policy that always performs within constant factor of the better component policy hardware implementation of adaptivity requires very simple logic but duplicate tag structures to reduce the overhead we use partial tags which achieve good performance with small hardware cost in particular adapting between lru and lfu replacement policies on an way kb cache yields improvement in average cpi on applications that exhibit non negligible miss ratio our approach increases total cache storage by but it still provides slightly better performance than conventional way set associative kb cache which requires more storage
workloads generated by the real world parallel applications that are executed on multicomputer have strong effect on the performance of its interconnection network the hardware fabric supporting communication among individual processors existing multicomputer networks have been primarily designed and analysed under the assumption that the workload follows the non bursty poisson arrival process as step towards obtaining clear understanding of network performance under various workloads this paper presents new analytical model for computing message latency in wormhole switched torus networks in the presence of bursty traffic based on the well known markov modulated poisson process mmpp in order to derive the model the approach for accurately capturing the properties of the composite mmpps is applied to characterize traffic on network channels moreover general method has been proposed for calculating the probability of virtual channel occupancy when the traffic on network channels follows multi state mmpp process simulation experiments reveal that the model exhibits good degree of accuracy
large scale wireless sensor applications require that the wireless sensor network wsn be sufficiently intelligent to make decisions or tradeoffs within partially uncertain environment such intelligence is difficult to model because the sensor nodes are resource bounded especially on energy supply within this paper we model wsn as two level multiagent architecture and introduce the concept of energy aware and utility based belief desire intention bdi agents we subsequently formalise the fuzzy set based belief generating and theoretically deduce the rules of the meta level reasoning of wsn we further apply practical reasoning algorithm which enables cost awareness and status awareness within wsn tileworld based simulation and subsequent comparison of the effectiveness of bold cautious and energy aware and utility based agents demonstrates that an energy aware and utility based bdi agent outperforms other traditional agents especially when the environment is highly dynamic
in this paper we focus on the problem of designing very fast parallel algorithms for the convex hull and the vector maxima problems in three dimensions that are output size sensitive our algorithms achieve log logn log parallel time and optimal log work with high probability in the crcw pram where and are the input and output size respectively these bounds are independent of the input distribution and are faster than the previously known algorithms we also present an optimal speed up with respect to the input size only sublogarithmic time algorithm that uses superlinear number of processors for vector maxima in three dimensions
the geometry of space curve is described in terms of euclidean invariant frame field metric connection torsion and curvature here the torsion and curvature of the connection quantify the curve geometry in order to retain stable and reproducible description of that geometry such that it is slightly affected by non uniform protrusions of the curve linearised euclidean shortening flow is proposed semi discretised versions of the flow subsequently physically realise concise and exact semi discrete curve geometry imposing special ordering relations the torsion and curvature in the curve geometry can be retrieved on multi scale basis not only for simply closed planar curves but also for open branching intersecting and space curves of non trivial knot type in the context of the shortening flows we revisit the maximum principle the semi group property and the comparison principle normally required in scale space theories we show that our linearised flow satisfies an adapted maximum principle and that its green’s functions possess semi group property we argue that the comparison principle in the case of knots can obstruct topological changes being in contradiction with the required curve simplification principle our linearised flow paradigm is not hampered by this drawback semi all non symmetric knots tend to trivial ones being infinitely small circles in plane finally the differential and integral geometry of the multi scale representation of the curve geometry under the flow is quantified by endowing the scale space of curves with an appropriate connection and calculating related torsion and curvature aspects this multi scale modern geometric analysis forms therewith an alternative for curve description methods based on entropy scale space theories
recently the re ranking algorithms have been quite popular for web search and data mining however one of the issues is that those algorithms treat the content and link information individually inspired by graph based machine learning algorithms we propose novel and general framework to model the re ranking algorithm by regularizing the smoothness of ranking scores over the graph along with regularizer on the initial ranking scores which are obtained by the base ranker the intuition behind the model is the global consistency over the graph similar entities are likely to have the same ranking scores with respect to query our approach simultaneously incorporates the content with other explicit or implicit link information in latent space graph then an effective unified re ranking algorithm is performed on the graph with respect to the query to illustrate our methodology we apply the framework to literature retrieval and expert finding applications on dblp bibliography data we compare the proposed method with the initial language model method and another pagerank style re ranking method also we evaluate the proposed method with varying graphs and settings experimental results show that the improvement in our proposed method is consistent and promising
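as an illustrative stand in for the paper's regularization framework, a manifold ranking style iteration that smooths scores over a similarity graph while anchoring them to the base ranker can be written as follows; the graph w, the mixing weight alpha and the iteration count are assumptions

```python
import numpy as np

def rerank(W, base_scores, alpha=0.85, iters=50):
    """propagate ranking scores over a similarity graph W while staying close
    to the base ranker's scores (manifold-ranking-style iteration)."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))          # symmetric normalisation D^-1/2 W D^-1/2
    y = base_scores.astype(float)
    f = y.copy()
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * y  # smoothness term + fit to base ranker
    return f
```

the fixed point balances the two intuitions in the abstract: neighbouring entities in the graph receive similar scores, while the initial ranking is not discarded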
current information systems are more and more complex they require more interactions between different components and users so ensuring system security must not be limited to using an access control model but also it is primordial to deal with information flows in system thus an important function of security policy is to enforce access to different system elements and supervise information flows simultaneously several works have been undertaken to join together models of access control and information flow unfortunately beyond the fact that the reference model they use is blp which is quite rigid these research works suggest non integrated models which do nothing but juxtapose access control and information flow controls or are based on misuse of mapping between mls and rbac models in this paper we suggest to formalize dte model in order to use it as solution for flexible information flow control then we integrate it into an unique access control model expressive enough to handle access and flow control security rules the expressivity of the orbac model makes this integration possible and quite natural
we provide theoretical proof showing that under proportional noise model the discrete eight point algorithm behaves similarly to the differential eight point algorithm when the motion is small this implies that the discrete algorithm can handle arbitrarily small motion for general scene as long as the noise decreases proportionally with the amount of image motion and the proportionality constant is small enough this stability result extends to all normalized variants of the eight point algorithm using simulations we show that given arbitrarily small motions and proportional noise regime the normalized eight point algorithms outperform their differential counterparts by large margin using real data we show that in practical small motion problems involving optical flow these discrete structure from motion sfm algorithms also provide better estimates than their differential counterparts even when the motion magnitudes reach sub pixel level the better performance of these normalized discrete variants means that there is much to recommend them as differential sfm algorithms that are linear and normalized
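for concreteness, a standard normalised (discrete) eight point estimate of the fundamental matrix is sketched below with numpy; this is the textbook hartley normalised variant, not the authors' differential comparison or noise analysis

```python
import numpy as np

def normalize(pts):
    """translate points to their centroid and scale so the mean distance
    from the origin is sqrt(2) (hartley normalisation)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def eight_point(x1, x2):
    """normalised eight-point estimate of the fundamental matrix from
    n >= 8 point correspondences (x1, x2 are n-by-2 arrays)."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, S, Vt2 = np.linalg.svd(F)                 # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0]) @ Vt2
    return T2.T @ F @ T1                         # undo the normalisation
```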
succinct full text self index is data structure built on text tt tn which takes little space ideally close to that of the compressed text permits efficient search for the occurrences of pattern pp pm in and is able to reproduce any text substring so the self index replaces the text several remarkable self indexes have been developed in recent years many of those take space proportional to nh or nhk bits where hk is the kth order empirical entropy of the time to count how many times does occur in ranges from to log in this paper we present new self index called rlfm index for run length fm index that counts the occurrences of in in time when the alphabet size is polylog the rlfm index requires nhk log σ bits of space for any α log σ n and constant previous indexes that achieve counting time either require more than nh bits of space or require that we also show that the rlfm index can be enhanced to locate occurrences in the text and display text substrings in time independent of σ in addition we prove close relationship between the kth order entropy of the text and some regularities that show up in their suffix arrays and in the burrows wheeler transform of this relationship is of independent interest and permits bounding the space occupancy of the rlfm index as well as that of other existing compressed indexes finally we present some practical considerations in order to implement the rlfm index we empirically compare our index against the best existing implementations and show that it is practical and competitive against those in passing we obtain competitive implementation of an existing theoretical proposal that can be seen as simplified rlfm index and explore other practical ideas such as huffman shaped wavelet trees
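to ground the counting functionality, a plain (non run length) fm index style backward search can be sketched in python; it uses naive suffix sorting and uncompressed occurrence tables, so it only illustrates the query logic, not the rlfm index's space bounds

```python
from collections import Counter

def bwt_via_suffixes(text):
    """naive burrows-wheeler transform (fine for illustration, not large texts)."""
    text += "\0"                                   # unique smallest sentinel
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)

def count_occurrences(text, pattern):
    """backward search: count occurrences of pattern using only the bwt,
    the C array and per-character occurrence counts."""
    bwt = bwt_via_suffixes(text)
    alphabet = sorted(set(bwt))
    counts = Counter(bwt)
    C, total = {}, 0                               # C[c] = #chars smaller than c
    for c in alphabet:
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):                   # occ[c][i] = #c in bwt[:i]
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):                    # shrink the suffix-array range
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

print(count_occurrences("abracadabra", "abra"))    # 2
```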
we define challenging and meaningful benchmark for genericity in language processing namely the notion of generic program refactoring we provide the first implementation of the benchmark based on functional strategic programming in haskell we use the basic refactoring of abstraction extraction as the running example our implementation comes as functional programming framework with hot spots for the language specific ingredients for refactoring eg means for abstraction construction and destruction and recognisers for name analysis the language parametric framework can be instantiated for various rather different languages eg java prolog haskell or xml schema
to maintain high reliability and availability system level diagnosis should be considered for the multiprocessor systems the self diagnosis problem of hypermesh emerging potential optical interconnection networks for multiprocessor systems is solved in this paper we derive that the precise one step diagnosability of hypermesh is based on the principle of cycle decomposition one step fault diagnosis algorithm for hypermesh which runs in nn time also is described
existing methods for handling pointer variables during dataflow analyses can make such analyses inefficient in both time and space because the data flow analyses must store and propagate large sets of data facts that are introduced by dereferences of pointer variables this article presents equivalence analysis general technique to improve the efficiency of data flow analyses in the presence of pointer variables the technique identifies equivalence relations among the memory locations accessed by procedure and ensures that two equivalent memory locations share the same set of data facts in procedure and in the procedures that are called by that procedure thus data flow analysis needs to compute the data flow information for only representative memory location in an equivalence class the data flow information for other memory locations in the equivalence class can be derived from that of the representative memory location the article also shows the extension to an interprocedural slicing algorithm that uses equivalence analysis to improve the efficiency of the algorithm our empirical studies suggest that equivalence analysis may effectively improve the efficiency of many data flow analyses
evaluating novel networked protocols and services requires subjecting the target system to realistic internet conditions however there is no common understanding of what is required to capture such realism conventional wisdom suggests that competing background traffic will influence service and protocol behavior once again however there is no understanding of what aspects of background traffic are important and the extent to which services are sensitive to these characteristics earlier work shows that internet traffic demonstrates significant burstiness at range of time scales unfortunately existing systems evaluations either do not consider background traffic or employ simple synthetic models eg based on poisson arrivals that do not capture these burstiness properties in this paper we show that realistic background traffic has qualitatively different impact on application and protocol behavior than simple traffic models one conclusion from our work is that applications should be evaluated under range of background traffic characteristics to determine the relative merits of applications and to understand behavior in corner cases of live deployment
study at large it company shows that mobile information workers frequently migrate work across devices here smartphones desktop pcs laptops while having multiple devices provides new opportunities to work in the face of changing resource deprivations the management of devices is often problematic the most salient problems are posed by the physical effort demanded by various management tasks anticipating what data or functionality will be needed and aligning these efforts with work mobility and social situations workers strategies of coping with these problems center on two interwoven activities the physical handling of devices and cross device synchronization these aim at balancing risk and effort in immediate and subsequent use workers also exhibit subtle ways to handle devices in situ appropriating their physical and operational properties the design implications are discussed
we consider the problem of storing an ordered dictionary data structure over distributed set of nodes in contrast to traditional sequential data structures distributed data structures should ideally have low congestion we present novel randomized data structure called family tree to solve this problem family tree has optimal expected congestion uses only constant amount of state per node and supports searches and node insertion deletion in expected log time on system with nodes furthermore family tree supports keys from any ordered domain because the keys are not hashed searches have good locality in the sense that intermediate nodes on the search path have keys that are not far outside of the range between the source and destination
current face recognition techniques rely heavily on the large size and representativeness of the training sets and most methods suffer degraded performance or fail to work if there is only one training sample per person available this so called one sample problem is challenging issue in face recognition in this paper we propose novel feature extraction method named uniform pursuit to address the one sample problem the underlying idea is that most recognition errors are due to the confusions between faces that look very similar and thus one can reduce the risk of recognition error by mapping the close class prototypes to be distant ie uniforming the pairwise distances between different class prototypes specifically the up method pursues in the whitened pca space the low dimensional projections that reduce the local confusion between the similar faces the resulting low dimensional transformed features are robust against the complex image variations such as those caused by lighting and aging standardized procedure on the large scale feret and frgc databases is applied to evaluate the one sample problem experimental results show that the robustness accuracy and efficiency of the proposed up method compare favorably to the state of the art one sample based methods
the constraint satisfaction problem csp is ubiquitous in artificial intelligence it has wide applicability ranging from machine vision and temporal reasoning to planning and logic programming this paper attempts systematic and coherent review of the foundations of the techniques for constraint satisfaction it discusses in detail the fundamental principles and approaches this includes an initial definition of the constraint satisfaction problem graphical means of problem representation conventional tree search solution techniques and pre processing algorithms which are designed to make subsequent tree search significantly easier
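as a concrete companion to the tree search discussion, a plain chronological backtracking solver for binary csps is sketched below; the variable ordering, the lack of propagation and the toy graph colouring instance are all illustrative choices

```python
def solve_csp(variables, domains, constraints, assignment=None):
    """chronological backtracking: assign variables one at a time and undo an
    assignment as soon as it violates a constraint.
    constraints: dict mapping (x, y) -> predicate(val_x, val_y)."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        ok = True
        for (x, y), pred in constraints.items():
            if x == var and y in assignment and not pred(value, assignment[y]):
                ok = False
            if y == var and x in assignment and not pred(assignment[x], value):
                ok = False
        if ok:
            assignment[var] = value
            result = solve_csp(variables, domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]                      # backtrack
    return None

# toy usage: colour a triangle graph with three colours
variables = ["a", "b", "c"]
domains = {v: ["red", "green", "blue"] for v in variables}
neq = lambda u, v: u != v
constraints = {("a", "b"): neq, ("b", "c"): neq, ("a", "c"): neq}
print(solve_csp(variables, domains, constraints))
```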
securing wireless mobile ad hoc networks manets is challenging due to the lack of centralized authority and poor connectivity key distribution mechanism is central to any public key management scheme we propose novel key distribution scheme for manets that exploits the routing infrastructure to effectively chain peer nodes together keying material propagates along these virtual chains via message relaying mechanism we show that the proposed approach results in key distribution scheme with low implementation complexity ideally suited for stationary ad hoc networks and manets with low to high mobility the proposed scheme uses mobility as an aid to fuel the rate of bootstrapping the routing security but in contrast to existing schemes does not become dependent on mobility the key dissemination occurs completely on demand security associations are only established as needed by the routing protocol we show through simulations that the scheme’s communication and computational overhead has negligible impact on network performance
data stream is continuous and high speed flow of data items high speed refers to the phenomenon that the data rate is high relative to the computational power the increasing focus of applications that generate and receive data streams stimulates the need for online data stream analysis tools mining data streams is real time process of extracting interesting patterns from high speed data streams mining data streams raises new problems for the data mining community in terms of how to mine continuous high speed data items that you can only have one look at in this paper we propose algorithm output granularity as solution for mining data streams algorithm output granularity is the amount of mining results that fits in main memory before any incremental integration we show the application of the proposed strategy to build efficient clustering frequent items and classification techniques the empirical results for our clustering algorithm are presented and discussed which demonstrate acceptable accuracy coupled with efficiency in running time
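purely to illustrate the output granularity idea, and not the paper's exact lightweight clustering algorithm, the sketch below clusters a stream in one pass and merges the two closest centres whenever a memory budget on the number of centres is exceeded; the distance threshold and the budget are hypothetical knobs

```python
import math

def _centroid(sum_vec, count):
    return [s / count for s in sum_vec]

def _merge_closest(centers):
    """merge the two closest centres to stay within the memory budget."""
    best = None
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            d = math.dist(_centroid(*centers[i]), _centroid(*centers[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    centers[i][0] = [a + b for a, b in zip(centers[i][0], centers[j][0])]
    centers[i][1] += centers[j][1]
    del centers[j]

def one_pass_cluster(stream, distance_threshold, max_centers):
    """single-pass clustering keeping at most max_centers (the output granularity)."""
    centers = []                       # each centre is [sum_vector, count]
    for x in stream:
        if centers:
            j = min(range(len(centers)),
                    key=lambda i: math.dist(_centroid(*centers[i]), x))
            if math.dist(_centroid(*centers[j]), x) <= distance_threshold:
                centers[j][0] = [a + b for a, b in zip(centers[j][0], x)]
                centers[j][1] += 1
                continue
        centers.append([list(x), 1])
        if len(centers) > max_centers:
            _merge_closest(centers)
    return [_centroid(vec, c) for vec, c in centers]
```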
as animated characters increasingly become vital parts of virtual environments then the engines that drive these characters increasingly become vital parts of virtual environment software this paper gives an overview of the state of the art in character engines and proposes taxonomy of the features that are commonly found in them this taxonomy can be used as tool for comparison and evaluation of different engines in order to demonstrate this we use it to compare three engines the first is cald the most commonly used open source engine we also introduce two engines created by the authors piavca and halca the paper ends with brief discussion of some other popular engines
metric temporal logic mtl is widely studied real time extension of linear temporal logic in this paper we survey results about the complexity of the satisfiability and model checking problems for fragments of mtl with respect to different semantic models we show that these fragments have widely differing complexities from polynomial space to non primitive recursive and even undecidable however we show that the most commonly occurring real time properties such as invariance and bounded response can be expressed in fragments of mtl for which model checking if not satisfiability can be decided in polynomial or exponential space
we present method for extracting lattice from near regular texture our method demands minimal user intervention needing single mouse click to select typical texton the algorithm follows four step approach first an estimate of texton size is obtained by considering the spacing of peaks in the auto correlation of the texture second sample of the image around the user selected texton is correlated with the image third the resulting correlation surface is converted to map of potential texton centres using non maximal suppression finally the maxima are formed into graph by connecting potential texton centres we have found the method robust in the face of significant changes in pixel intensity and geometric structure between textons
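a hedged sketch of steps two and three only, patch correlation followed by non maximal suppression, assuming numpy and scipy are available; the threshold, the suppression radius and the crude normalisation are placeholders rather than the paper's method

```python
import numpy as np
from scipy.signal import correlate2d

def candidate_texton_centres(gray, patch, threshold=0.6, suppress=5):
    """correlate a user-selected texton patch with the image and keep local
    maxima of the (crudely normalised) correlation surface."""
    g = gray - gray.mean()
    p = patch - patch.mean()
    corr = correlate2d(g, p, mode="same")
    corr = corr / (np.abs(corr).max() + 1e-9)
    centres = []
    h, w = corr.shape
    for y in range(suppress, h - suppress):
        for x in range(suppress, w - suppress):
            window = corr[y - suppress:y + suppress + 1,
                          x - suppress:x + suppress + 1]
            # non-maximal suppression: keep only strong local maxima
            if corr[y, x] >= threshold and corr[y, x] == window.max():
                centres.append((y, x))
    return centres
```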
this study examined how two communication media mail and instant messaging affected communication outcomes and more specifically how these two media influenced the relationship between flow experience and communication outcomes an experiment was conducted on college campus using student subjects communication outcomes were collected using questionnaire data were analyzed using mancova multivariate analysis of covariance and discriminant analysis playfulness was used as covariate the analysis showed that the mail group appeared to have higher communication quality and effectiveness significant relationship was found to exist between flow and communication outcomes when the communication medium was mail but no significant relationship was found to exist when the communication medium was instant messaging playfulness covariate affected the relationship between the media type and communication outcomes
ensemblue is distributed file system for personal multimedia that incorporates both general purpose computers and consumer electronic devices ceds ensemblue leverages the capabilities of few general purpose computers to make ceds first class clients of the file system it supports namespace diversity by translating between its distributed namespace and the local namespaces of ceds it supports extensibility through persistent queries robust event notification mechanism that leverages the underlying cache consistency protocols of the file system finally it allows mobile clients to self organize and share data through device ensembles our results show that these features impose little overhead yet they enable the integration of emerging platforms such as digital cameras mp players and dvrs
we present metamorph system and framework for generating vertical deep web search engines in knowledge based way the approach enables the separation between the roles of higher skilled ontology engineer and less skilled service engineer which adds new web sources in an intuitive semi automatic manner using the proven lixto suite one part of the framework is the understanding process for complex web search forms and the generation of an ontological representation of each form and its intrinsic run time dependencies based on these representations unified meta form and matchings from the meta form to the individual search forms and vice versa are created taking into account different form element types contents and labels we discuss several aspects of the metamorph ontology which focuses especially on the interaction semantics of web forms and give short account of our semi automatic tagging system
semantic event recognition based only on vision cues is challenging problem this problem is particularly acute when the application domain is unconstrained still images available on the internet or in personal repositories in recent years it has been shown that metadata captured with pictures can provide valuable contextual cues complementary to the image content and can be used to improve classification performance with the recent geotagging phenomenon an important piece of metadata available with many geotagged pictures now on the world wide web is gps information in this study we obtain satellite images corresponding to picture location data and investigate their novel use to recognize the picture taking environment as if through third eye above the object additionally we combine this inference with classical vision based event detection methods and study the synergistic fusion of the two approaches we employ both color and structure based visual vocabularies for characterizing ground and satellite images respectively training of satellite image classifiers is done using multiclass adaboost engine while the ground image classifiers are trained using svms modeling and prediction involve some of the most interesting semantic event activity classes encountered in consumer pictures including those that occur in residential areas commercial areas beaches sports venues and parks the powerful fusion of the complementary views achieves significant performance improvement over the ground view baseline with integrated gps capable cameras on the horizon we believe that our line of research can revolutionize event recognition and media annotation in years to come
continuous queries over data streams typically produce large volumes of result streams to scale up the system one should carefully study the problem of delivering the result streams to the end users which unfortunately is often overlooked in existing systems in this paper we leverage distributed publish subscribe system dpss scalable data dissemination infrastructure for efficient stream query result delivery to take advantage of dpss’s multicast like data dissemination architecture one has to exploit the common contents among different result streams and maximize the sharing of their delivery hence we propose to merge the user queries into few representative queries whose results subsume those of the original ones and disseminate the result streams of these representative queries through the dpss to realize this approach we study the stream query containment theories and propose efficient query grouping and merging algorithms the proposed approach is non intrusive and hence can be easily implemented as middleware to be incorporated into existing stream processing systems prototype is developed on top of an open source stream processing system and results of an extensive performance study on real datasets verify the efficacy of the proposed techniques
the problem of analyzing and classifying conceptual schemas is becoming increasingly important due to the availability of large number of schemas related to existing applications the purposes of schema analysis and classification activities can be different to extract information on intensional properties of legacy systems in order to restructure or migrate to new architectures to build libraries of reference conceptual components to be used in building new applications in given domain and to identify information flows and possible replication of data in an organization this article proposes set of techniques for schema analysis and classification to be used separately or in combination the techniques allow the analyst to derive significant properties from schemas with human intervention limited as far as possible in particular techniques for associating descriptors with schemas for abstracting reference conceptual schemas based on schema clustering and for determining schema similarity are presented methodology for systematic schema analysis is illustrated with the purpose of identifying and abstracting into reference components the similar and potentially reusable parts of set of schemas experiences deriving from the application of the proposed techniques and methodology on large set of entity relationship conceptual schemas of information systems in the italian public administration domain are described
this paper presents algorithms for reducing the communication overhead for parallel programs that use dynamically allocated data structures the framework consists of an analysis phase called possible placement analysis and transformation phase called communication selection the fundamental idea of possible placement analysis is to find all possible points for insertion of remote memory operations remote reads are propagated upwards whereas remote writes are propagated downwards based on the results of the possible placement analysis the communication selection transformation selects the best place for inserting the communication and determines if pipelining or blocking of communication should be performed the framework has been implemented in the earth mccat optimizing parallelizing compiler and experimental results are presented for five pointer intensive benchmarks running on the earth manna distributed memory parallel architecture these experiments show that the communication optimization can provide performance improvements of up to over the unoptimized benchmarks
virtual and mixed reality environments vmre often imply full body human computer interaction scenarios we used public multimodal mixed reality installation the synthetic oracle and between groups design to study the effects of implicit eg passively walking or explicit eg pointing interaction modes on the users emotional and engagement experiences and we assessed it using questionnaires additionally real time arm motion data was used to categorize the user behavior and to provide interaction possibilities for the explicit interaction group the results show that the online behavior classification corresponded well to the users interaction mode in addition contrary to the explicit interaction the engagement ratings from implicit users were positively correlated with valence but were uncorrelated with arousal ratings interestingly arousal levels were correlated with different behaviors displayed by the visitors depending on the interaction mode hence this study confirms that the activity level and behavior of users modulates their experience and that in turn the interaction mode modulates their behavior thus these results show the importance of the selected interaction mode when designing users experiences in vmre
abstract this paper presents system for the offline recognition of large vocabulary unconstrained handwritten texts the only assumption made about the data is that it is written in english this allows the application of statistical language models in order to improve the performance of our system several experiments have been performed using both single and multiple writer data lexica of variable size from to words have been used the use of language models is shown to improve the accuracy of the system when the lexicon contains words the error rate is reduced by sim percent for single writer data and by sim percent for multiple writer data our approach is described in detail and compared with other methods presented in the literature to deal with the same problem an experimental setup to correctly deal with unconstrained text recognition is proposed
open world software is paradigm which allows to develop distributed and heterogeneous software systems they can be built by integrating already developed third party services which use to declare qos values eg related to performance it is true that these qos values are subject to some uncertainties consequently the performance of the systems using these services may unexpectedly decrease challenge for this kind of software is to self adapt its behavior as response to changes in the availability or performance of the required services in this paper we develop an approach to model self reconfigurable open world software systems with stochastic petri nets moreover we develop strategies for system to gain new state where it can recover its availability or even improve its performance through an example we apply these strategies and evaluate them to discover suitable reconfigurations for the system results will announce appropriate strategies for system performance enhancement
recently single chip multiprocessor cmp is becoming an attractive architecture for improving throughput of program execution in cmps multiple processor cores share several hardware resources such as cache memory and memory bus therefore the resource contention significantly degrades performance of each thread and also loses fairness between threads in this paper we propose dynamic frequency and voltage scaling dvfs algorithm for improving total instruction throughput fairness and energy efficiency of cmps the proposed technique periodically observes the utilization ratio of shared resources and controls the frequency and the voltage of each processor core individually to balance the ratio between threads we evaluate our technique and the evaluation results show that fairness between threads are greatly improved by the technique moreover the total instruction throughput increases in many cases while reducing energy consumption
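as a rough illustration of the kind of control loop described above, the following sketch throttles cores that consume more than their fair share of a shared resource and speeds up the others; the Core class, the discrete frequency table and the tolerance are hypothetical stand-ins, not the authors' mechanism

```python
# Hypothetical sketch of a periodic DVFS controller that balances shared-resource
# utilization between cores; the Core class and its fields are illustrative only,
# not the interface of any real CMP or of the proposed technique.

FREQ_STEPS = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]  # GHz, assumed discrete operating points

class Core:
    def __init__(self, name, freq_idx=len(FREQ_STEPS) - 1):
        self.name = name
        self.freq_idx = freq_idx          # index into FREQ_STEPS
        self.shared_accesses = 0          # e.g. shared cache/bus accesses last interval
    def utilization_ratio(self, total_accesses):
        return self.shared_accesses / total_accesses if total_accesses else 0.0

def rebalance(cores, tolerance=0.1):
    """Throttle cores above their fair share of the shared resource and speed up
    cores below it, one frequency step per control interval."""
    total = sum(c.shared_accesses for c in cores)
    fair_share = 1.0 / len(cores)
    for c in cores:
        ratio = c.utilization_ratio(total)
        if ratio > fair_share * (1 + tolerance) and c.freq_idx > 0:
            c.freq_idx -= 1               # lower f (and V) for the aggressive core
        elif ratio < fair_share * (1 - tolerance) and c.freq_idx < len(FREQ_STEPS) - 1:
            c.freq_idx += 1               # raise f for the starved core
        c.shared_accesses = 0             # reset counters for the next interval

cores = [Core("core0"), Core("core1")]
cores[0].shared_accesses, cores[1].shared_accesses = 900, 100
rebalance(cores)
print([(c.name, FREQ_STEPS[c.freq_idx]) for c in cores])
```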
caching can reduce the bandwidth requirement in wireless computing environment as well as minimize the energy consumption of wireless portable computers to facilitate mobile clients in ascertaining the validity of their cache content servers periodically broadcast cache invalidation reports that contain information of data that has been updated however as mobile clients may operate in doze or even totally disconnected mode to conserve energy it is possible that some reports may be missed and the clients are forced to discard the entire cache content in this paper we reexamine the issue of designing cache invalidation strategies we identify the basic issues in designing cache invalidation strategies from the solutions to these issues large set of cache invalidation schemes can be constructed we evaluate the performance of four representative algorithms two of which are known algorithms ie dual report cache invalidation and bit sequences while the other two are their counterparts that exploit selective tuning namely selective dual report cache invalidation and bit sequences with bit count our study shows that the two proposed schemes are not only effective in salvaging the cache content but consume significantly less energy than their counterparts while the selective dual report cache invalidation scheme performs best in most cases it is inferior to the bit sequences with the bit count scheme under high update rates
several negative results are proved about the ability to type check queries in the only existing proposed standard for object oriented databases the first of these negative results is that it is not possible to type check oql queries in the type system underlying the odmg object model and its definition language odl the second negative result is that oql queries cannot be type checked in the type system of the java binding of the odmg standard either solution proposed in this paper is to extend the odmg object model with explicit support for parametric polymorphism universal type quantification these results show that java cannot be viable database programming language unless extended with parametric polymorphism this is why type checking oql queries presents no problem for the type system of the binding of the odmg standard however type system that is strictly more powerful than any of the type systems of the odmg standard is required in order to properly type ordered collections and indices the required form of polymorphism is bounded type quantification constrained genericity and even bounded polymorphism further result is that neither static nor the standard dynamic object oriented type checking is possible for java oql in spite of the fact that java oql combines features of two strongly and mostly statically typed languages contrary to one of the promises of object oriented database technology this result shows that the impedance mismatch does not disappear in the odmg standard type safe reflective technique is proposed for overcoming this mismatch
in knowledge discovery in text database extracting and returning subset of information highly relevant to user’s query is critical task in broader sense this is essentially identification of certain personalized patterns that drives such applications as web search engine construction customized text summarization and automated question answering related problem of text snippet extraction has been previously studied in information retrieval in these studies common strategies for extracting and presenting text snippets to meet user needs either process document fragments that have been delimitated priori or use sliding window of fixed size to highlight the results in this work we argue that text snippet extraction can be generalized if the user’s intention is better utilized it overcomes the rigidness of existing approaches by dynamically returning more flexible start end positions of text snippets which are also semantically more coherent this is achieved by constructing and using statistical language models which effectively capture the commonalities between document and the user intention experiments indicate that our proposed solutions provide effective personalized information extraction services
contemporary information systems eg wfm erp crm scm and bb systems record business events in so called event logs business process mining takes these logs to discover process control data organizational and social structures although many researchers are developing new and more powerful process mining techniques and software vendors are incorporating these in their software few of the more advanced process mining techniques have been tested on real life processes this paper describes the application of process mining in one of the provincial offices of the dutch national public works department responsible for the construction and maintenance of the road and water infrastructure using variety of process mining techniques we analyzed the processing of invoices sent by the various subcontractors and suppliers from three different perspectives the process perspective the organizational perspective and the case perspective for this purpose we used some of the tools developed in the context of the prom framework the goal of this paper is to demonstrate the applicability of process mining in general and our algorithms and tools in particular
considerable research has been performed in applying run time reconfigurable component models to the domain of wireless sensor networks the ability to dynamically deploy and reconfigure software components has clear advantages in sensor network deployments which are typically large in scale and expected to operate for long periods in the face of node mobility dynamic environmental conditions and changing application requirements to date research on component and binding models for sensor networks has primarily focused on the development of specialized component models that are optimized for use in resource constrained environments however current approaches impose significant overhead upon developers and tend to use inflexible binding models based on remote procedure calls to address these concerns we introduce novel component and binding model for networked embedded systems looci looci components are designed to impose minimal additional overhead on developers furthermore looci components use novel event based binding model that allows developers to model rich component interactions while providing support for easy interception re wiring and re use prototype implementation of our component and binding model has been realised for the sunspot platform our preliminary evaluation shows that looci has an acceptable memory footprint and imposes minimal overhead on developers
we consider configuration of wireless sensor networks where certain functions must be automatically assigned to sensor nodes such that the properties of sensor node eg remaining energy network neighbors match the requirements of the assigned function essentially sensor nodes take on certain roles in the network as result of configuration to help developers with such configuration tasks for variety of applications we propose generic role assignment as programming abstraction where roles and rules for their assignment can be easily specified using configuration language we present such role specification language and distributed algorithms for role assignment according to such specifications we evaluate our approach and show that efficient and robust generic role assignment is practically feasible for wireless sensor networks
in this paper we present novel approach for inducing word alignments from sentence aligned data we use conditional random field crf discriminative model which is estimated on small supervised training set the crf is conditioned on both the source and target texts and thus allows for the use of arbitrary and overlapping features over these data moreover the crf has efficient training and decoding processes which both find globally optimal solutions we apply this alignment model to both french english and romanian english language pairs we show how large number of highly predictive features can be easily incorporated into the crf and demonstrate that even with only few hundred word aligned training sentences our model improves over the current state of the art with alignment error rates of and for the two tasks respectively
large clusters of mutual dependence can cause problems for comprehension testing and maintenance this paper introduces the concept of coherent dependence clusters techniques for their efficient identification visualizations to better understand them empirical results concerning their practical significance as the paper will show coherent dependence clusters facilitate fine grained analysis of the subtle relationships between clusters of dependence
privacy preserving distributed olap is becoming critical challenge for next generation business intelligence bi scenarios due to the natural suitability of olap in analyzing distributed massive bi repositories in multidimensional and multigranularity manner in particular in these scenarios xml formatted bi repositories play dominant role due to the well known amenities of xml in modeling and representing distributed business data however while privacy preserving distributed data mining has been widely investigated very few efforts have focused on the problem of effectively and efficiently supporting privacy preserving olap over distributed collections of xml documents in order to fill this gap we propose novel secure multiparty computation smc based privacy preserving olap framework for distributed collections of xml documents the framework has many novel features ranging from nice theoretical properties to an effective and efficient protocol the efficiency of our approach has been validated by an experimental evaluation over distributed collections of synthetic xml documents
we propose new document summarization algorithm which is personalized the key idea is to rely on the attention reading time of individual users spent on single words in document as the essential clue the prediction of user attention over every word in document is based on the user’s attention during his previous reads which is acquired via vision based commodity eye tracking mechanism once the user’s attentions over small collection of words are known our algorithm can predict the user’s attention over every word in the document through word semantics analysis our algorithm then summarizes the document according to user attention on every individual word in the document with our algorithm we have developed document summarization prototype system experiment results produced by our algorithm are compared with the ones manually summarized by users as well as by commercial summarization software which clearly demonstrates the advantages of our new algorithm for user oriented document summarization
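a minimal sketch of the idea above, assuming gaze durations are already available for a few words and using a toy character-overlap similarity in place of the paper's word semantics analysis; all names, thresholds and the sample document are illustrative

```python
# Hedged sketch of attention-driven extractive summarization. The eye-tracking
# input, the toy "semantic similarity" (character-overlap) and the sentence
# scoring rule are placeholders, not the authors' actual models.
import re

def similarity(w1, w2):
    """Toy stand-in for word-semantics similarity: Jaccard overlap of character sets."""
    a, b = set(w1), set(w2)
    return len(a & b) / len(a | b)

def predict_attention(words, observed):
    """Spread observed per-word reading times to unobserved words by similarity."""
    predicted = {}
    for w in words:
        if w in observed:
            predicted[w] = observed[w]
        else:
            sims = [(similarity(w, o), t) for o, t in observed.items()]
            weight = sum(s for s, _ in sims) or 1.0
            predicted[w] = sum(s * t for s, t in sims) / weight
    return predicted

def summarize(document, observed_attention, n_sentences=2):
    sentences = [s.strip() for s in re.split(r"[.!?]", document) if s.strip()]
    scored = []
    for s in sentences:
        words = s.lower().split()
        att = predict_attention(words, observed_attention)
        scored.append((sum(att.values()) / len(words), s))
    return [s for _, s in sorted(scored, reverse=True)[:n_sentences]]

doc = "Eye tracking reveals attention. Attention guides summaries. Weather was fine."
observed = {"attention": 0.9, "eye": 0.7}   # hypothetical gaze durations
print(summarize(doc, observed, n_sentences=1))
```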
this paper presents new pattern recognition framework for face recognition based on the combination of radon and wavelet transforms which is invariant to variations in facial expression and illumination it is also robust to zero mean white noise the technique computes radon projections in different orientations and captures the directional features of face images further the wavelet transform applied on radon space provides multiresolution features of the facial images being the line integral radon transform improves the low frequency components that are useful in face recognition for classification the nearest neighbor classifier has been used experimental results using feret orl yale and yaleb databases show the superiority of the proposed method with some of the existing popular algorithms
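an illustrative pipeline in the spirit of the described method, assuming scikit-image and PyWavelets for the radon and wavelet steps; the number of orientations, the wavelet, keeping only the approximation band and the euclidean nearest neighbour are assumptions rather than the paper's exact settings

```python
# Illustrative sketch: Radon projections in several orientations, a 2D wavelet
# decomposition of the Radon space, and a nearest-neighbour match on the resulting
# signature. Requires numpy, scikit-image and PyWavelets; parameters are assumed.
import numpy as np
import pywt
from skimage.transform import radon

def face_signature(image, n_angles=60, wavelet="db2", level=2):
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(image, theta=theta, circle=False)       # directional features
    coeffs = pywt.wavedec2(sinogram, wavelet, level=level)   # multiresolution features
    approx = coeffs[0]                                       # low-frequency band
    return approx.ravel() / (np.linalg.norm(approx) + 1e-12)

def nearest_neighbour(query, gallery):
    """gallery: list of (label, signature); returns the label of the closest one."""
    dists = [(np.linalg.norm(query - sig), label) for label, sig in gallery]
    return min(dists)[1]

rng = np.random.default_rng(0)
gallery_imgs = {"alice": rng.random((64, 64)), "bob": rng.random((64, 64))}
gallery = [(name, face_signature(img)) for name, img in gallery_imgs.items()]
probe = gallery_imgs["alice"] + 0.05 * rng.random((64, 64))  # noisy probe image
print(nearest_neighbour(face_signature(probe), gallery))
```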
this paper presents delivery framework for streaming media with advertisements and an associated pricing model the delivery model combines the benefits of periodic broadcasting and stream merging the advertisements revenues are used to subsidize the price of the media content the pricing is determined based on the total ads viewing time moreover this paper presents three modified scheduling policies that are well suited to the proposed delivery framework and analyzes their effectiveness through simulation
we propose semi automatic tool termight that supports the construction of bilingual glossaries termight consists of two components which address the two subtasks in glossary construction preparing monolingual list of all technical terms in source language document and finding the translations for these terms in parallel source ndash target documents as first step in each component the tool extracts automatically candidate terms and candidate translations based on term extraction and word alignment algorithms it then performs several additional preprocessing steps which greatly facilitate human post editing of the candidate lists these steps include grouping and sorting of candidates and associating example concordance lines with each candidate finally the data prepared in preprocessing is presented to the user via an interactive interface which supports quick post editing operations termight was deployed by translators at at business translation services formerly at language line services leading to very high rates of semi automatic glossary construction
the emerging peer to peer pp model has become very powerful and attractive paradigm for developing internet scale systems for sharing resources including files and documents the distributed nature of these systems where nodes are typically located across different networks and domains inherently hinders the efficient retrieval of information in this paper we consider the effects of topologically aware overlay construction techniques on efficient pp keyword search algorithms we present the peer fusion pfusion architecture that aims to efficiently integrate heterogeneous information that is geographically scattered on peers of different networks our approach builds on work in unstructured pp systems and uses only local knowledge our empirical results using the pfusion middleware architecture and data sets from akamai’s internet mapping infrastructure akamai the active measurement project nlanr and the text retrieval conference trec show that the architecture we propose is both efficient and practical
synthesizing architectural requirements from an application viewpoint can help in making important architectural design decisions towards building large scale parallel machines in this paper we quantify the link bandwidth requirement on binary hypercube topology for set of five parallel applications we use an execution driven simulator called spasm to collect data points for system sizes that are feasible to be simulated these data points are then used in regression analysis for projecting the link bandwidth requirements for larger systems the requirements are projected as function of the following system parameters number of processors cpu clock speed and problem size these results are also used to project the link bandwidths for other network topologies our study quantifies the link bandwidth that has to be made available to limit the network overhead in an application to specified tolerance level the results show that typical link bandwidths mbytes sec found in current commercial parallel architectures such as intel paragon and cray td would have fairly low network overhead for the applications considered in this study for two of the applications this overhead is negligible for the other applications this overhead can be limited to about of the execution time provided the problem sizes are increased commensurate with the processor clock speed the technique presented can be useful to system architect to synthesize the bandwidth requirements for realizing well balanced parallel architectures
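a sketch of the projection step under an assumed power-law model fitted by least squares on log-transformed simulation points; the data values below are synthetic and the model form is our assumption, not the paper's regression

```python
# Sketch of projecting link bandwidth requirements from simulable system sizes to
# larger ones. The power-law model and the synthetic points are assumptions only.
import numpy as np

# (processors, clock MHz, problem size) -> required link bandwidth (MB/s), synthetic
samples = np.array([
    [16,  100, 1.0e6, 18.0],
    [32,  100, 1.0e6, 24.0],
    [64,  100, 2.0e6, 33.0],
    [128, 200, 2.0e6, 61.0],
    [256, 200, 4.0e6, 82.0],
])
X = np.log(samples[:, :3])
y = np.log(samples[:, 3])
A = np.hstack([X, np.ones((len(X), 1))])      # log B = a*log P + b*log f + g*log N + c
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def projected_bandwidth(processors, clock_mhz, problem_size):
    x = np.array([np.log(processors), np.log(clock_mhz), np.log(problem_size), 1.0])
    return float(np.exp(x @ coef))

print(projected_bandwidth(1024, 400, 8.0e6))  # projection for a larger system
```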
an increasing number of households are equipped with large number of tv sets and more and more of them are large high resolution displays furthermore we see the integration of web browsing and email functionalities in these devices which are then often controlled via wireless mouse and keyboard the latter were designed for usage at desk rather than for person sitting on their sofa in living room therefore this paper investigates the usage of pda as replacement which can be used for controlling remote cursor and for text input the results of the experimental comparison of these input devices show as expected the superiority of mouse and keyboard as the study participants were very experienced with them surprising results were the task completion time and usability satisfaction when using the mobile device these results show the applicability of using mobile device for controlling an application on remote screen using mobile device provides the advantages that every person can eg use their own mobile phone or that these devices can be used in multi user scenarios
the bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values among pipeline stages and multiple alus previous superscalar designs implemented this interconnect using centralized structures that do not scale with increasing ilp demands in search of scalability recent microprocessor designs in industry and academia exhibit trend toward distributed resources such as partitioned register files banked caches multiple independent compute pipelines and even multiple program counters some of these partitioned microprocessor designs have begun to implement bypassing and operand transport using point to point interconnects we call interconnects optimized for scalar data transport whether centralized or distributed scalar operand networks although these networks share many of the challenges of multiprocessor networks such as scalability and deadlock avoidance they have many unique requirements including ultra low latency few cycles versus tens of cycles and ultra fast operation operand matching this paper discusses the unique properties of scalar operand networks sons examines alternative ways of implementing them and introduces the astro taxonomy to distinguish between them it discusses the design of two alternative networks in the context of the raw microprocessor and presents timing area and energy statistics for real implementation the paper also presents tuple performance model for sons and analyzes their performance sensitivity to network properties for ilp workloads
in multi level cache such as those used for web caching hit at level leads to the caching of the requested object in all intermediate caches on the reverse path levels this paper shows that simple modification to this de facto behavior in which only the level cache gets to store copy can lead to significant performance gains the modified caching behavior is called leave copy down lcd it has the merit of being able to avoid the amplification of replacement errors and also the unnecessary repetitious caching of the same objects at multiple levels simulation results against other cache interconnections show that when lcd is applied under typical web workloads it reduces the average hit distance we construct an approximate analytic model for the case of lcd interconnection of lru caches and use it to gain better insight as to why the lcd interconnection yields an improved performance
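the difference between the default behaviour and lcd can be sketched as follows; the lru policy, cache sizes and request trace are illustrative assumptions

```python
# Minimal sketch contrasting the default "copy on every level" behaviour with
# leave-copy-down (LCD): on a hit at level l, only the next cache toward the
# client stores a copy. Cache sizes and the LRU policy are assumptions.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.store = capacity, OrderedDict()
    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)
            return True
        return False
    def put(self, key):
        self.store[key] = True
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict the LRU item

def request(levels, key, lcd=True):
    """levels[0] is closest to the client; the origin server sits past levels[-1]."""
    for hit_level, cache in enumerate(levels):
        if cache.get(key):
            break
    else:
        hit_level = len(levels)                # miss everywhere: served by the origin
    if lcd:
        if hit_level > 0:
            levels[hit_level - 1].put(key)     # the copy moves down one level per hit
    else:
        for cache in levels[:hit_level]:
            cache.put(key)                     # default: replicate on every level below
    return hit_level

hierarchy = [LRUCache(2), LRUCache(4), LRUCache(8)]
for obj in ["a", "b", "a", "a", "c", "a"]:
    print(obj, request(hierarchy, obj, lcd=True))
```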
clustering or partitioning is crucial step between logic synthesis and physical design in the layout of large scale design design verified at the logic synthesis level may have timing closure problems at post layout stages due to the emergence of multiple clock period interconnects consequently trade off between clock frequency and throughput may be needed to meet the design requirements in this paper we find that the processing rate defined as the product of frequency and throughput of sequential system is upper bounded by the reciprocal of its maximum cycle ratio which is only dependent on the clustering we formulate the problem of processing rate optimization as seeking an optimal clustering with the minimal maximum cycle ratio in general graph and present an iterative algorithm to solve it since our algorithm avoids binary search and is essentially incremental it has the potential of being combined with other optimization techniques experimental results validate the efficiency of our algorithm
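restating the stated bound in symbols, with notation that is ours rather than the paper's: the processing rate, clock frequency times throughput, cannot exceed the reciprocal of the maximum cycle ratio of the clustered design

```latex
R \;=\; f \cdot T \;\le\; \frac{1}{\displaystyle\max_{C \in \mathcal{C}} \frac{d(C)}{r(C)}}
```

where the cycle set, delays and register counts are assumed as follows: \mathcal{C} ranges over the directed cycles of the clustered register-to-register graph, d(C) is the total delay along cycle C and r(C) the number of registers on C; since the ratio depends only on the clustering, seeking a clustering with minimal maximum cycle ratio maximizes the achievable processing rate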
pervasive computing with its focus on users and their tasks rather than on computing devices and technology provides an attractive vision for the future of computing but while hardware and networking infrastructure to realize this vision are increasingly becoming reality precious few applications run in this infrastructure we believe that this lack of applications can be attributed to three characteristics that are inadequately addressed by existing systems first devices are heterogeneous ranging from wearable devices to conventional computers second network connectivity often is limited and intermittent and third interactions typically involve several autonomous administrative domains in this paper we introduce system architecture that directly addresses these challenges our architecture is targeted at application developers and administrators and it supports mobile computations persistent storage and resource discovery within single comprehensive framework
lists are pervasive data structure in functional programs the generality and simplicity of their structure makes them expensive hindley milner type inference and partial evaluation are all that is needed to optimise this structure yielding considerable improvements in space and time consumption for some interesting programs this framework is applicable to many data types and their optimised representations such as lists and parallel implementations of bags or arrays and quadtrees
the probabilistic stream model was introduced by jayram et al it is generalization of the data stream model that is suited to handling probabilistic data where each item of the stream represents probability distribution over set of possible events therefore probabilistic stream determines distribution over a potentially exponential number of classical deterministic streams where each item is deterministically one of the domain values designing efficient aggregation algorithms for probabilistic data is crucial for handling uncertainty in data centric applications such as olap such algorithms are also useful in variety of other settings including analyzing search engine traffic and aggregation in sensor networks we present algorithms for computing commonly used aggregates on a probabilistic stream we present the first one pass streaming algorithms for estimating the expected mean of probabilistic stream improving upon results in next we consider the problem of estimating frequency moments for probabilistic data we propose general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams applying this approach we extend classical data stream algorithm to obtain one pass algorithm for estimating the second frequency moment we present the first known streaming algorithms for estimating the number of distinct items on probabilistic streams our work also gives an efficient one pass algorithm for estimating the median of probabilistic stream
we present language mechanisms for polymorphic extensible records and their exact dual polymorphic sums with extensible first class cases these features make it possible to easily extend existing code with new cases in fact such extensions do not require any changes to code that adheres to particular programming style using that style individual extensions can be written independently and later be composed to form larger components these language mechanisms provide solution to the expression problem we study the proposed mechanisms in the context of an implicitly typed purely functional language polyr we give type system for the language and provide rules for phase transformation first into an explicitly typed calculus with record polymorphism and finally to efficient index passing code the first phase eliminates sums and cases by taking advantage of the duality with records we implement version of polyr extended with imperative features and pattern matching we call this language mlpolyr programs in mlpolyr require no type annotations the implementation employs reconstruction algorithm to infer all types the compiler generates machine code currently for powerpc and optimizes the representation of sums by eliminating closures generated by the dual construction
this paper introduces novel unsupervised constraint driven learning algorithm for identifying named entity ne transliterations in bilingual corpora the proposed method does not require any annotated data or aligned corpora instead it is bootstrapped using simple resource romanization table we show that this resource when used in conjunction with constraints can efficiently identify transliteration pairs we evaluate the proposed method on transliterating english nes to three different languages chinese russian and hebrew our experiments show that constraint driven learning can significantly outperform existing unsupervised models and achieve competitive results to existing supervised models
we explore automation of protein structural classification using supervised machine learning methods on set of pairs of protein domains up to sequence identity consisting of three secondary structure elements fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn for pair of protein domains the deepest common structural level within the scop hierarchy given one dimensional representation of the domain structures this representation encapsulates evolutionary information in terms of sequence identity and structural information characterising the secondary structure elements and lengths of the respective domains the evaluation is performed in two steps first selecting the best performing base learners and subsequently evaluating boosted and bagged meta learners the boosted random forest collection of decision trees is found to be the most accurate with cross validated accuracy of and measures of and for classification of proteins to the class fold super family and family levels in the scop hierarchy the meta learning regime especially boosting improved performance by more accurately classifying the instances from less populated classes
in this work we explore new family of coarse grain reconfigurable architecture called brick which is capable of mapping complete expressions and pipelines into one processing element with multiple input multiple output characteristics while provided with centralized control unit to synchronize the operation of each processing element pe each pe has heterogeneous alus specialized in particular type of operation these alus can be interconnected to implement complex expressions either sequential or combinational increasing computational density and utilization rate of the reconfigurable array preliminary synthesis results and application examples show that efficient mappings can be achieved with brick
in the near future small intelligent devices will be deployed in homes plantations oceans rivers streets and highways to monitor the environment these devices require time synchronization so voice and video data from different sensor nodes can be fused and displayed in meaningful way at the sink instead of time synchronization between just the sender and receiver or within local group of sensor nodes some applications require the sensor nodes to maintain similar time within certain tolerance throughout the lifetime of the network the time diffusion synchronization protocol tdp is proposed as network wide time synchronization protocol it allows the sensor network to reach an equilibrium time and maintains small time deviation tolerance from the equilibrium time in addition it is analytically shown that the tdp enables time in the network to converge also simulations are performed to validate the effectiveness of tdp in synchronizing the time throughout the network and balancing the energy consumed by the sensor nodes
the shortest remaining processing time srpt scheduling discipline is optimal and its superior performance compared with the policies that do not use the knowledge of job sizes can be quantified using mean value analysis as well as our new asymptotic distributional limits for the relatively smaller heavy tailed jobs however the main difficulty in implementing srpt in large practical systems eg web servers is that its complexity grows with the number of jobs in the queue hence in order to lower the complexity it is natural to approximate srpt by grouping the arrivals into fixed small number of classes containing jobs of approximately equal size and then serve the classes of smaller jobs with higher priorities in this paper we design novel adaptive grouping mechanism based on relative size comparison of newly arriving job to the preceding arrivals specifically if the newly arriving job is smaller than and larger than of the previous jobs it is routed into class the excellent performance of this mechanism even for small number of classes is demonstrated using both the asymptotic queueing analysis under heavy tails and extensive simulations we also discuss refinements of the comparison grouping mechanism that improve the accuracy of job classification at the expense of small additional complexity
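the relative-size routing rule can be sketched as follows; the quantile boundaries, window length and pareto workload are illustrative assumptions, not the authors' parameters

```python
# Sketch of the relative-size grouping idea: a new job is classified by comparing
# it with a window of preceding arrivals and routing it to the class whose quantile
# band its empirical rank falls into. Boundaries and workload are assumed values.
from collections import deque
import bisect
import random

class AdaptiveGrouper:
    def __init__(self, boundaries=(0.5, 0.9), history=1000):
        self.boundaries = list(boundaries)    # class k covers ranks in (b[k-1], b[k]]
        self.recent = deque(maxlen=history)   # sizes of preceding arrivals
    def classify(self, size):
        if not self.recent:
            rank = 0.0
        else:
            smaller = sum(1 for s in self.recent if s < size)
            rank = smaller / len(self.recent) # fraction of previous jobs smaller
        self.recent.append(size)
        return bisect.bisect_left(self.boundaries, rank)  # 0 = smallest jobs, highest priority

random.seed(1)
grouper = AdaptiveGrouper()
jobs = [random.paretovariate(1.2) for _ in range(20)]     # heavy-tailed job sizes
for job in jobs:
    print(f"size={job:8.2f} -> class {grouper.classify(job)}")
```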
given the proliferation of layered multicore and smt based architectures it is imperative to deploy and evaluate important multi level scientific computing codes such as meshing algorithms on these systems we focus on parallel constrained delaunay mesh pcdm generation we exploit coarse grain parallelism at the subdomain level medium grain at the cavity level and fine grain at the element level this multi grain data parallel approach targets clusters built from commercially available smts and multicore processors the exploitation of the coarser degree of granularity facilitates scalability both in terms of execution time and problem size on loosely coupled clusters the exploitation of medium grain parallelism allows performance improvement at the single node level our experimental evaluation shows that the first generation of smt cores is not capable of taking advantage of fine grain parallelism in pcdm many of our experimental findings with pcdm extend to other adaptive and irregular multigrain parallel algorithms as well
graphics hardware is undergoing change from fixed function pipelines to more programmable organizations that resemble general purpose stream processors in this paper we show that certain general algorithms not normally associated with computer graphics can be mapped to such designs specifically we cast nonlinear optimization as data streaming process that is well matched to modern graphics processors our framework is particularly well suited for solving image based modeling problems since it can be used to represent large and diverse class of these problems using common formulation we successfully apply this approach to two distinct image based modeling problems light field mapping approximation and fitting the lafortune model to spatial bidirectional reflectance distribution functions comparing the performance of the graphics hardware implementation to cpu implementation we show more than fold improvement
this paper presents dynamic testing method that exploits automata learning to systematically test black box systems almost without prerequisites based on interface descriptions our method successively explores the system under test sut while it at the same time extrapolates behavioral model this is in turn used to steer the further exploration process due to the applied learning technique our method is optimal in the sense that the extrapolated models are most concise in consistently representing all the information gathered during the exploration using the learnlib our framework for automata learning our method can elegantly be combined with numerous optimizations of the learning procedure various choices of model structures and last but not least with the option to dynamically interactively enlarge the alphabet underlying the learning process all these features will be illustrated using as case study the web application mantis bug tracking system widely used in practice we will show how the dynamic testing procedure proceeds and how the behavioral models arise that concisely summarize the current testing effort it has turned out that these models besides steering the automatic exploration process are ideal for user guidance and to support analyses that improve the system understanding
content pollution is pervasive in the current peer to peer file sharing systems many previous reputation models have been proposed to address this problem however such models strongly rely on the participants feedback in this paper we bring forward new holistic mechanism which integrates the reputation model inherent file source based information and the statistical data reflecting the diffusion state to defend against pollution attack first we deploy redundancy mechanism to assure that the file requester receives the correct indices that accord with the information published by the file provider second we complement the reputation information with the diffusion data to help the file requester select the authentic file for downloading finally we introduce block oriented probabilistic verification protocol to help the file requester discern the polluted files during the downloading with low cost we perform simulation which shows that our holistic mechanism can perform very well and converge to high accuracy rapidly even in highly malicious environment
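a minimal sketch of block-oriented probabilistic verification during a download; the sampling probability, block size and hash function are illustrative assumptions

```python
# Sketch: each incoming block is checked against its published digest with
# probability p, so pollution is likely to be caught early at a fraction of the
# full verification cost. Probability, block size and hash are assumed values.
import hashlib
import random

def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def download(blocks, published_digests, p=0.25, rng=random.Random(7)):
    """Returns (ok, checks_done); aborts as soon as a sampled block mismatches."""
    checks = 0
    for i, block in enumerate(blocks):
        if rng.random() < p:                      # probabilistic spot check
            checks += 1
            if digest(block) != published_digests[i]:
                return False, checks              # polluted block detected
    return True, checks

clean = [bytes([i]) * 1024 for i in range(40)]    # 40 blocks of 1 KiB each
published = [digest(b) for b in clean]
polluted = list(clean)
polluted[13] = b"\x00" * 1024                     # attacker replaces one block
print(download(clean, published))                 # clean file passes the spot checks
print(download(polluted, published))              # detection is probabilistic per block
```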
three dimensional stacking of silicon layers is emerging as promising solution to handle the design complexity and heterogeneity of systems on chips socs networks on chips nocs are necessary to efficiently handle the interconnect complexity designing power efficient nocs for socs that satisfy the application performance requirements while satisfying the technology constraints is big challenge in this work we address this problem and present synthesis approach for designing power performance efficient nocs we present methods to determine the best topology compute paths and perform placement of the noc components in each layer we perform experiments on varied realistic soc benchmarks to validate the methods and also perform comparative study of the resulting noc designs with optimized mesh topologies the nocs designed by our synthesis method results in large interconnect power reduction average of and latency reduction average of when compared to traditional noc designs
as the accuracy of biometrics improves it is getting increasingly hard to push the limits using single modality in this paper unified approach that fuses three dimensional facial and ear data is presented an annotated deformable model is fitted to the data and geometry image is extracted wavelet coefficients are computed from the geometry image and used as biometric signature the method is evaluated using the largest publicly available database and achieves rank one recognition rate the state of the art accuracy of the multimodal fusion is attributed to the low correlation between the individual differentiability of the two modalities
the rapid increase of world wide web users and the development of services with high bandwidth requirements have caused the substantial increase of response times for users on the internet web latency would be significantly reduced if browser proxy or web server software could make predictions about the pages that user is most likely to request next while the user is viewing the current page and prefetch their content in this paper we study predictive prefetching on totally new web system architecture this is system that provides two levels of caching before information reaches the clients this work analyses prefetching on wide area network with the above mentioned characteristics we first provide structured overview of predictive prefetching and show its wide applicability to various computer systems the wan that we refer to is the grnet academic network in greece we rely on log files collected at the network’s transparent cache primary caching point located at grnet’s edge connection to the internet we present the parameters that are most important for prefetching on grnet’s architecture and provide preliminary results of an experimental study quantifying the benefits of prefetching on the wan our experimental study includes the evaluation of two prediction algorithms a most popular document algorithm and variation of the ppm prediction by partial matching prediction algorithm our analysis clearly shows that predictive prefetching can improve web response times inside the grnet wan without substantial increase in network traffic due to prefetching
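a small sketch of a ppm-style next-request predictor of the kind evaluated in the study; the model order, the back-off to shorter contexts and the toy access log are assumptions, not the exact variation used on the grnet traces

```python
# Sketch of a PPM-style next-request predictor: counts of "which page followed this
# context" are kept for contexts of length k..1, and prediction backs off to shorter
# contexts when a longer one is unseen. Order and log data are illustrative.
from collections import defaultdict, Counter

class PPMPredictor:
    def __init__(self, max_order=2):
        self.max_order = max_order
        self.models = {k: defaultdict(Counter) for k in range(1, max_order + 1)}
    def train(self, access_log):
        for k in range(1, self.max_order + 1):
            for i in range(len(access_log) - k):
                ctx = tuple(access_log[i:i + k])
                self.models[k][ctx][access_log[i + k]] += 1
    def predict(self, recent):
        """Return the most likely next page, backing off from long to short contexts."""
        for k in range(min(self.max_order, len(recent)), 0, -1):
            ctx = tuple(recent[-k:])
            if ctx in self.models[k]:
                return self.models[k][ctx].most_common(1)[0][0]
        return None

log = ["index", "news", "sports", "index", "news", "weather", "index", "news", "sports"]
ppm = PPMPredictor(max_order=2)
ppm.train(log)
print(ppm.predict(["index", "news"]))   # candidate page to prefetch
```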
the increasing variability in manufacturing process parameters is expected to lead to significant performance degradation in deep submicron technologies multiple voltage frequency island vfi design styles with fine grained process variation aware clocking have recently been shown to possess increased immunity to manufacturing process variations in this article we propose theoretical framework that allows designers to quantify the performance improvement that is to be expected if they were to migrate from fully synchronous design to the proposed multiple vfi design style specifically we provide techniques to efficiently and accurately estimate the probability distribution of the execution rate or throughput of both single and multiple vfi systems under the influence of manufacturing process variations finally using an mpeg encoder benchmark we demonstrate how the proposed analysis framework can be used by designers to make architectural decisions such as the granularity of vfi domain partitioning based on the throughput constraints their systems are required to satisfy
number of studies have shown the abundance of unused spectrum in the tv bands this is in stark contrast to the overcrowding of wireless devices in the ism bands recent trend to alleviate this disparity is the design of cognitive radios which constantly sense the spectrum and opportunistically utilize unused frequencies in the tv bands in this paper we introduce the concept of time spectrum block to model spectrum reservation and use it to present theoretical formalization of the spectrum allocation problem in cognitive radio networks we present centralized and distributed protocol for spectrum allocation and show that these protocols are close to optimal in most scenarios we have implemented the distributed protocol in qualnet and show that our analysis closely matches the simulation results
user centricity is significant concept in federated identity management fim as it provides for stronger user control and privacy however several notions of user centricity in the fim community render its semantics unclear and hamper future research in this area therefore we consider user centricity abstractly and establish comprehensive taxonomy encompassing user control architecture and usability aspects of user centric fim we highlight the various mechanisms to achieve the properties identified in the taxonomy we show how these mechanisms may differ based on the underlying technologies which in turn result in different trust assumptions we classify the technologies into two predominant variants of user centric fim systems with significant feature sets we distinguish credential focused systems which advocate offline identity providers and long term credentials at user’s client and relationship focused systems which rely on the relationships between users and online identity providers that create short term credentials during transactions note that these two notions of credentials are quite different the former encompasses cryptographic credentials as defined by lysyanskaya et al in selected areas in cryptography lncs vol and the latter encompasses federation tokens as used in today’s fim protocols like liberty we raise the question where user centric fim systems may go within the limitations of the user centricity paradigm as well as beyond them firstly we investigate the existence of universal user centric fim system that can achieve superset of security and privacy properties as well as the characteristic features of both predominant classes secondly we explore the feasibility of reaching beyond user centricity that is allowing user of user centric fim system to again give away user control by means of an explicit act of delegation we do neither claim solution for universal user centric systems nor for the extension beyond the boundaries of user centricity however we establish starting point for both ventures by leveraging the properties of credential focused fim system
in the past few years the application of aspect oriented software development aosd technologies has helped improve the development integration deployment evolution and quality of object oriented and other software for growing community of software developers the concern manipulation environment cme is an open source eclipse project that targets aspect oriented technologies the cme contains task oriented tools for usage approaches that apply aspect orientation in different development and deployment scenarios the cme also provides component and framework level support for building aspect oriented tools for variety of types of software artifacts
the notion of certain answers arises when one queries incompletely specified databases eg in data integration and exchange scenarios or databases with missing information while in the relational case this notion is well understood there is no natural analog of it for xml queries that return documents we develop an approach to defining certain answers for such xml queries and apply it in the settings of incomplete information and xml data exchange we first revisit the relational case and show how to present the key concepts related to certain answers in new model theoretic language this new approach naturally extends to xml we prove number of generic application independent results about computability and complexity of certain answers produced by it we then turn our attention to pattern based xml query language with trees as outputs and present technique for computing certain answers that relies on the notion of basis of set of trees we show how to compute such bases for documents with nulls and for documents arising in data exchange scenarios and provide complexity bounds while in general complexity of query answering in xml data exchange could be high we exhibit natural class of xml schema mappings for which not only query answering but also many static analysis problems can be solved efficiently
we present technique for interactive rendering of glossy objects in complex and dynamic lighting environments that captures interreflections and all frequency shadows our system is based on precomputed radiance transfer and separable brdf approximation we factor glossy brdfs using separable decomposition and keep only few low order approximation terms each consisting of purely view dependent and purely light dependent component in the precomputation step for every vertex we sample its visibility and compute direct illumination transport vector corresponding to each brdf term we use modern graphics hardware to accelerate this step and further compress the data using nonlinear wavelet approximation the direct illumination pass is followed by one or more interreflection passes each of which gathers compressed transport vectors from the previous pass to produce global illumination transport vectors to render at run time we dynamically sample the lighting to produce light vector also represented in wavelet basis we compute the inner product of the light vector with the precomputed transport vectors and the results are further combined with the brdf view dependent components to produce vertex colors we describe acceleration of the rendering algorithm using programmable graphics hardware and discuss the limitations and trade offs imposed by the hardware
automatic word alignment plays critical role in statistical machine translation unfortunately the relationship between alignment quality and statistical machine translation performance has not been well understood in the recent literature the alignment task has frequently been decoupled from the translation task and assumptions have been made about measuring alignment quality for machine translation which it turns out are not justified in particular none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate aer result in significant increases in translation performance this paper explains this state of affairs and presents steps towards measuring alignment quality in way which is predictive of statistical machine translation performance
in this paper we present novel scenario driven role engineering process for rbac roles the scenario concept is of central significance for the presented approach due to the strong human factor in role engineering scenarios are good means to drive the process we use scenarios to derive permissions and to define tasks our approach considers changeability issues and enables the straightforward incorporation of changes into affected models finally we discuss the experiences we gained by applying the scenario driven role engineering process in three case studies
as cluster based web servers are increasingly adopted to host variety of network based services improving the performance of such servers has become critical to satisfy the customers demands especially the user response time is an important factor so that clients feel satisfied with the web services in this paper we investigate the feasibility of minimizing the response time of server by exploiting the advantages of both user level communication and coscheduling we thus propose coscheduled server model based on the recently proposed distributed press web server where the remote cache accesses can be coscheduled on different nodes to reduce the response time we experiment with this concept using two known coscheduling techniques called dynamic coscheduling dcs and dcs with immediate blocking we have developed comprehensive simulation testbed that captures the underlying communication layer in cluster the characteristics of various coscheduling algorithms and the characteristics of the distributed server model to estimate the average delay and throughput with different system configurations the accuracy of the via communication layer and the dcs mechanism is verified using measurements on node linux cluster extensive simulation of four server models press over via coscheduled press model with dcs with dcs and blocking and adaptive using node cluster configurations indicates that the average response time of distributed server can be minimized significantly by coscheduling the communicating processes the use of the dcs scheme reduced the average latency by up to four times compared to the press over via model that uses only user level communication
this paper proposes an infrastructure and related algorithms for the controlled and cooperative updates of xml documents key components of the proposed system are set of xml based languages for specifying access control policies and the path that the document must follow during its update such path can be fully specified before the update process begins or can be dynamically modified by properly authorized subjects while being transmitted our approach is fully distributed in that each party involved in the process can verify the correctness of the operations performed until that point on the document without relying on central authority more importantly the recovery procedure also does not need the participation of central authority our approach is based on the use of some special control information that is transmitted together with the document and suite of protocols we formally specify the structure of such control information and the protocols we also analyze security and complexity of the proposed protocols
tree structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications although similarity search on textual data has been extensively studied searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees especially for large numbers of trees in this paper we propose to transform tree structured data into strings with one to one mapping we prove that the edit distance of the corresponding strings forms bound for the similarity measures between trees including tree edit distance largest common subtrees and smallest common super trees based on the theoretical analysis we can employ any existing algorithm of approximate string search for effective similarity search on trees moreover we embed the bound into filter and refine framework for facilitating similarity search on tree structured data the experimental results show that our algorithm achieves high performance and outperforms state of the art methods significantly our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets
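a hedged sketch of the filter-and-refine idea; the preorder string encoding, the bound constant alpha and the stubbed exact tree distance are illustrative assumptions, not the paper's mapping or proofs

```python
# Sketch: each tree is linearized with a one-to-one traversal encoding, a cheap
# string edit distance prunes candidates (assuming it bounds the tree distance up
# to a constant alpha), and only survivors go to the expensive exact computation.
def encode(tree):
    """Preorder encoding: ('a', [children]) -> 'a(' ... ')', a one-to-one mapping."""
    label, children = tree
    return label + "(" + "".join(encode(c) for c in children) + ")"

def edit_distance(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def similarity_search(query_tree, trees, tau, exact_tree_distance, alpha=1.0):
    """Filter with the string distance, then refine survivors with the exact
    (expensive) tree distance supplied by the caller."""
    q = encode(query_tree)
    candidates = [t for t in trees if edit_distance(q, encode(t)) <= alpha * tau]
    return [t for t in candidates if exact_tree_distance(query_tree, t) <= tau]

t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("b", []), ("c", [])])
print(edit_distance(encode(t1), encode(t2)))   # cheap filtering distance
```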
developing security critical systems is difficult and there are many well known examples of security weaknesses exploited in practice thus sound methodology supporting secure systems development is urgently needed we present an extensible verification framework for verifying uml models for security requirements in particular it includes various plugins performing different security analyses on models of the security extension umlsec of uml here we concentrate on an automated theorem prover binding to verify security properties of umlsec models which make use of cryptography such as cryptographic protocols the work aims to contribute towards usage of uml for secure systems development in practice by offering automated analysis routines connected to popular case tools we present an example of such an application where our approach found and corrected several serious design flaws in an industrial biometric authentication system
anton special purpose parallel machine currently under construction is the result of significant hardware software codesign effort that relied heavily on an architectural simulator one of this simulator’s many important roles is to support the development of embedded software software that runs on anton’s asics which is challenging for several reasons first the anton asic is heterogeneous multicore system on chip with three types of embedded cores tightly coupled to special purpose hardware units second standard asic configuration contains total of distinct embedded cores all of which must be explicitly modeled within the simulator third portion of the embedded software is dynamically generated at simulation time this paper discusses the various ways in which the anton simulator addresses these challenges we use hardware abstraction layer that allows embedded software source code to be compiled without modification for either the simulation host or the hardware target we report on the effectiveness of embedding golden model testbenches within the simulator to verify embedded software as it runs we also describe our hardware software cosimulation strategy for dynamically generated embedded software finally we use methodology that we refer to as concurrent mixed level simulation to model embedded cores within massively parallel systems these techniques allow the anton simulator to serve as an efficient platform for embedded software development
the establishment of localization system is an important task in wireless sensor networks due to the geographic correlation of the sensed data location information is commonly used to name the gathered data address nodes and regions and also improve the performance of many geographic algorithms depending on the localization algorithm different error behaviors eg mean probability distribution and correlation can be exhibited by the sensor network the process of understanding and analysing this behavior is the first step toward mathematical model of the localization error furthermore this knowledge can also be used to propose improvements to these systems in this work we divide the localization systems into three components distance estimation position computation and the localization algorithm we show how each component can affect the final error of the system in this work we concentrate on the third component the localization algorithm the error behaviors of three known localization algorithms are evaluated together in similar scenarios so the different behaviors of the localization error can be identified and analysed the influence of these errors in geographic algorithms is also analysed showing the importance of understanding the error behavior and the importance of geographic algorithms which consider the inaccuracy of position estimations
information diffusion viral marketing and collective classification all attempt to model and exploit the relationships in network to make inferences about the labels of nodes variety of techniques have been introduced and methods that combine attribute information and neighboring label information have been shown to be effective for collective labeling of the nodes in network however in part because of the correlation between node labels that the techniques exploit it is easy to find cases in which once misclassification is made incorrect information propagates throughout the network this problem can be mitigated if the system is allowed to judiciously acquire the labels for small number of nodes unfortunately under relatively general assumptions determining the optimal set of labels to acquire is intractable here we propose an acquisition method that learns the cases when given collective classification algorithm makes mistakes and suggests acquisitions to correct those mistakes we empirically show on both real and synthetic datasets that this method significantly outperforms greedy approximate inference approach viral marketing approach and approaches based on network structural measures such as node degree and network clustering in addition to significantly improving accuracy with just small amount of labeled data our method is tractable on large networks
agile software development methods are quite popular nowadays and are being adopted at an increasing rate in the industry every year however these methods are still lacking usability awareness in their development lifecycle and the integration of usability user centered design ucd into agile methods is not adequately addressed this paper presents the preliminary results of recently conducted online survey regarding the current state of the integration of agile methods and usability ucd world wide response of practitioners was received the results show that the majority of practitioners perceive that the integration of agile methods with usability ucd has added value to their adopted processes and to their teams has resulted in the improvement of usability and quality of the product developed and has increased the satisfaction of the end users of the product developed the top most used hci techniques are low fidelity prototyping conceptual designs observational studies of users usability expert evaluations field studies personas rapid iterative testing and laboratory usability testing
opportunistic sensing allows applications to task mobile devices to measure context in target region for example one could leverage sensor equipped vehicles to measure traffic or pollution levels on particular street or users mobile phones to locate bluetooth enabled objects in their neighborhood in most proposed applications context reports include the time and location of the event putting the privacy of users at increased risk even if report has been anonymized the accompanying time and location can reveal sufficient information to deanonymize the user whose device sent the report we propose anonysense general purpose architecture for leveraging users mobile devices for measuring context while maintaining the privacy of the users anonysense features multiple layers of privacy protection framework for nodes to receive tasks anonymously novel blurring mechanism based on tessellation and clustering to protect users privacy against the system while reporting context and anonymous report aggregation to improve the users privacy against applications receiving the context we outline the architecture and security properties of anonysense and focus on evaluating our tessellation and clustering algorithm against real mobility traces
this paper proposes fast and efficient method for producing physically based animations of the ice melting phenomenon including thermal radiation as well as thermal diffusion and convective thermal transfer our method adopts simple color function called the vof volume of fluid with advection to track the free surface which enables straightforward simulation of the phase changes such as ice melting although advection of functions that vary abruptly such as the step function causes numerical problems we have solved these by the rcip rational constrained interpolation profile method we present an improvement to control numerical diffusion and to render anti aliased surfaces the method also introduces technique analogous to photon mapping for calculating thermal radiation by the photon mapping method tuned for heat calculation the thermal radiation phenomenon in scene is solved efficiently by storing thermal energy in each photon here we report the results of several ice melting simulations produced by our method
access control policies for xml typically use regular path expressions such as xpath for specifying the objects for access control policies however such access control policies impose burden on the engines for xml query languages to relieve this burden we introduce static analysis for xml access control given an access control policy query expression and an optional schema static analysis determines if this query expression is guaranteed not to access elements or attributes that are permitted by the schema but hidden by the access control policy static analysis can be performed without evaluating any query expression against an actual database run time checking is required only when static analysis is unable to determine whether to grant or deny access requests nice side effect of static analysis is query optimization access denied expressions in queries can be evaluated to empty lists at compile time we have built prototype of static analysis for xquery and shown the effectiveness and scalability through experiments
recently the applications of web usage mining are more and more concentrated on finding valuable user behaviors from web navigation record data where the sequential pattern model has been well adapted however with the growth of the explored user behaviors the decision makers will be more and more interested in unexpected behaviors but not only in those already confirmed in this paper we present our approach user that finds unexpected sequences and implication rules from sequential data with user defined beliefs for mining unexpected behaviors from web access logs our experiments with the belief bases constructed from explored user behaviors show that our approach is useful to extract unexpected behaviors for improving the web site structures and user experiences
in this paper we give an overview on some algorithms for learning automata starting with biermann’s and angluin’s algorithms we describe some of the extensions catering for specialized or richer classes of automata furthermore we survey their recent application to verification problems
visual data mining strategy lies in tightly coupling the visualizations and analytical processes into one data mining tool that takes advantage of the strengths from multiple sources we present concrete cooperation between automatic algorithms interactive algorithms and visualization methods the first kind of cooperation is an interactive decision tree algorithm ciad it allows the user to be helped by an automatic algorithm based on support vector machine svm to optimize the interactive split performed in the current tree node or to compute the best split in an automatic mode another effective cooperation is visualization algorithm used to explain the results of svm algorithm the same visualization method can also be used to help the user in the parameter tuning step for the input of automatic svm algorithms then we present methods using both automatic and interactive methods to deal with very large datasets the obtained results suggest that this is promising way to deal with very large datasets
network infrastructures ni such as the internet grid smart spaces and enterprise computing environments usually consist of computing nodes that are stationary provide the backbone for environment sensing and high performance computing and communication ni in addition may have various types of application software for performing resource intensive computation on the other hand recent advances in the embedded systems and wireless communication technologies have increased the flexibility of using mobile devices for various practical applications mobile devices mostly execute application software that improves the personal productivity of the user however despite the rapid technology advances mobile devices are expected to be always resource poor in comparison with the computing resources in the nis on the other hand the computing resources in an ni cannot readily add the flexibility to individual users due to their fixed location and size it is therefore desirable to combine the respective strengths of mobile devices and network infrastructures ni whenever possible dynamic integration is the process by which mobile device can detect communicate with and use the required resources in nearby nis in an application transparent way the benefit of dynamic integration is that the applications in both mobile device and ni can interoperate with each other as if mobile device itself is an integral part of the ni or vice versa in this paper context sensitive middleware called reconfigurable context sensitive middleware rcsm is presented for addressing this dynamic integration problem novel feature of rcsm is that its dynamic integration mechanism is context sensitive and as such the integration between the application software in mobile device and an ni can be restricted to specific contexts such as particular location or particular time rcsm furthermore provides transparency over the dynamic resource discovery and networking aspects so that application level cohesion can be easily achieved the integration process does not force any development time restrictions on the application software in an ni our experimental results based on the implementation of rcsm in integrated ad hoc and infrastructure based ieee test bed environment indicate that the integration process is lightweight and results in reasonably high performance in pda like devices and desktop pcs
similarity join correlating fragments in xml documents which are similar in structure and content can be used as the core algorithm to support data cleaning and data integration tasks for this reason built in support for such an operator in an xml database management system xdbms is very attractive however similarity assessment is especially difficult on xml datasets because structure besides textual information may embody variations in xml documents representing the same real world entity moreover the similarity computation is considerably more expensive for tree structured objects and should therefore be prime optimization candidate in this paper we explore and optimize tree based similarity joins and analyze their performance and accuracy when embedded in native xdbmss
despite general awareness of the importance of keeping one’s system secure and widespread availability of consumer security technologies actual investment in security remains highly variable across the internet population allowing attacks such as distributed denial of service ddos and spam distribution to continue unabated by modeling security investment decision making in established eg weakest link best shot and novel games eg weakest target and allowing expenditures in self protection versus self insurance technologies we can examine how incentives may shift between investment in public good protection and private good insurance subject to factors such as network size type of attack loss probability loss magnitude and cost of technology we can also characterize nash equilibria and social optima for different classes of attacks and defenses in the weakest target game an interesting result is that for almost all parameter settings more effort is exerted at nash equilibrium than at the social optimum we may attribute this to the strategic uncertainty of players seeking to self protect at just slightly above the lowest protection level
we study cross selling operations in call centers the following questions are addressed how many customer service representatives are required staffing and when should cross selling opportunities be exercised control in way that will maximize the expected profit of the center while maintaining prespecified service level target we tackle these questions by characterizing control and staffing schemes that are asymptotically optimal in the limit as the system load grows large our main finding is that threshold priority control in which cross selling is exercised only if the number of callers in the system is below certain threshold is asymptotically optimal in great generality the asymptotic optimality of threshold priority reduces the staffing problem to solution of simple deterministic problem in one regime and to simple search procedure in another we show that our joint staffing and control scheme is nearly optimal for large systems furthermore it performs extremely well even for relatively small systems
our interest in the global information sharing process is motivated by the advances in communication and computation technologies the marriage between the two technologies and the almost limitless amount of information available on the network within the scope of the global information sharing process when user’s request potentially mobile is directed to public data broadcasting has been suggested as an effective mechanism to access data the effectiveness of the schemes to retrieve public data is determined by their ability to reduce the access latency and power consumed by the mobile unit various indexing techniques can be used to further improve the effectiveness of retrieving broadcast data this paper addresses the application of object indexing in parallel broadcast channels in addition to further reduce access latency it proposes several scheduling schemes to order accesses to the data objects on parallel channels the proposed schemes are simulated and analyzed our simulation results indicate that the employment of indexing scheme and proper scheduling of object retrieval along the parallel channels drastically reduces both the access latency and power consumption at the mobile unit
today’s integrated development environments ides are hampered by their dependence on files and file based editing we propose novel user interface that is based on collections of lightweight editable fragments called bubbles which when grouped together form concurrently visible working sets we describe the design of prototype ide user interface for java based on working sets
software system user requirements tend to change and evolve over time the uml activity diagrams are useful language for modeling system processes additionally designers must often maintain activity diagrams incrementally this paper presents the cdade tool which can help designers detect conflicts in activity diagram evolution the cdade tool is composed of ontologies metadata and conflict detection rules speech act theory is used to reveal evolutionary change in the activity diagrams the cdade prototype and case study of electronic commerce are presented to demonstrate and validate the feasibility and effectiveness of the cdade tool
simulating human hair is recognized as one of the most difficult tasks in computer animation in this paper we show that the kirchhoff equations for dynamic inextensible elastic rods can be used for accurately predicting hair motion these equations fully account for the nonlinear behavior of hair strands with respect to bending and twisting we introduce novel deformable model for solving them each strand is represented by super helix ie piecewise helical rod which is animated using the principles of lagrangian mechanics this results in realistic and stable simulation allowing large time steps our second contribution is an in depth validation of the super helix model carried out through series of experiments based on the comparison of real and simulated hair motions we show that our model efficiently handles wide range of hair types with high level of realism
desktop grids use the computing network and storage resources from idle desktop pcs distributed over multiple lans or the internet to compute large variety of resource demanding distributed applications while these applications need to access compute store and circulate large volumes of data little attention has been paid to data management in such large scale dynamic heterogeneous volatile and highly distributed grids in most cases data management relies on ad hoc solutions and providing general approach is still challenging issue new class of data management service is desirable to deal with such variety of file transfer protocols as client server pp or the new and emerging cloud storage service to address this problem we propose the bitdew framework programmable environment for automatic and transparent data management on computational desktop grids this paper describes the bitdew programming interface its architecture and the performance evaluation of its runtime components bitdew relies on specific set of metadata to drive key data management operations namely life cycle distribution placement replication and fault tolerance with high level of abstraction the bitdew runtime environment is flexible distributed service architecture that integrates modular pp components such as dhts distributed hash tables for distributed data catalog and collaborative transport protocols for data distribution we explain how to plug in new or existing protocols and we give evidence of the versatility of the framework by implementing http ftp and bittorrent protocols and access to the amazon and ibp wide area storage we describe the mechanisms used to provide asynchronous and reliable multi protocol transfers through several examples we describe how application programmers and bitdew users can exploit bitdew’s features we report on performance evaluation using micro benchmarks various usage scenarios and data intensive bioinformatics application both in the grid context and on the internet the performance evaluation demonstrates that the high level of abstraction and transparency is obtained with reasonable overhead while offering the benefit of scalability performance and fault tolerance with little programming cost
runtime stacks are critical components of any modern software they are used to implement powerful control structures such as function call return stack cutting and unwinding coroutines and thread context switch stack operations however are very hard to reason about there are no known formal specifications for certifying style setjmp longjmp stack cutting and unwinding or weak continuations in many proof carrying code pcc systems return code pointers and exception handlers are treated as general first class functions as in continuation passing style even though both should have more limited scopes in this paper we show that stack based control abstractions follow much simpler pattern than general first class code pointers we present simple but flexible hoare style framework for modular verification of assembly code with all kinds of stack based control abstractions including function call return tail call setjmp longjmp weak continuation stack cutting stack unwinding multi return function call coroutines and thread context switch instead of presenting specific logic for each control structure we develop all reasoning systems as instances of generic framework this allows program modules and their proofs developed in different pcc systems to be linked together our system is fully mechanized we give the complete soundness proof and full verification of several examples in the coq proof assistant
detection of near duplicate documents is an important problem in many data mining and information filtering applications when faced with massive quantities of data traditional techniques relying on direct inter document similarity computation are often not feasible given the time and memory performance constraints on the other hand fingerprint based methods such as match while very attractive computationally can be unstable even to small perturbations of document content which causes signature fragmentation we focus on match and present randomization based technique of increasing its signature stability with the proposed method consistently outperforming traditional match by as high as in terms of the relative improvement in near duplicate recall importantly the large gains in detection accuracy are offset by only small increases in computational requirements we also address the complementary problem of spurious matches which is particularly important for match when fingerprinting long documents our discussion is supported by experiments involving large web page and email datasets
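a minimal sketch of signature based near duplicate detection with randomized lexicons, in the spirit of the technique described above; the hash function, the number of lexicons and the drop fraction are placeholders rather than the paper's settings

```python
import hashlib
import random

def fingerprint(doc_terms, lexicon):
    """Hash the sorted set of document terms that fall inside a lexicon."""
    kept = sorted(set(doc_terms) & lexicon)
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()

def randomized_lexicons(base_lexicon, k, drop_fraction, seed=0):
    """Derive k perturbed lexicons by randomly dropping a fraction of terms."""
    rng = random.Random(seed)
    base = sorted(base_lexicon)
    keep = int(len(base) * (1.0 - drop_fraction))
    return [set(rng.sample(base, keep)) for _ in range(k)]

def signature(doc_terms, lexicons):
    """A document's signature is its set of per-lexicon fingerprints."""
    return {fingerprint(doc_terms, lex) for lex in lexicons}

def near_duplicates(sig_a, sig_b):
    """Flag documents as near-duplicates if any fingerprint matches, which
    makes the scheme robust to small perturbations of document content."""
    return bool(sig_a & sig_b)
```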
this paper defines novel relatedness measure by conditional query explores snippets in various web domains as corpora and evaluates the relatedness measure on three famous benchmarks including wordsimilarity miller charles and rubenstein goodenough datasets conditional query qy on web domain estimates frequency fy by querying x to the search engine results of y the dependency score is defined in terms of the frequencies fy and fx and the content overlap of the search results of x and y combined by various operations transfer function projects the dependency score onto the mutual dependency of x and y two transfer functions based on poisson and gompertz models are considered gompertz model reports the correlation score in the wordsimilarity dataset gompertz model also shows the best performance among all the web based approaches in rubenstein goodenough and miller charles datasets
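a toy illustration of the transfer function idea, assuming a gompertz shaped map from a raw dependency score to a bounded relatedness value; the constants and the exact dependency score used in the paper are not reproduced, so every parameter below is a placeholder

```python
import math

def gompertz(score, a=1.0, b=5.0, c=1.0):
    """Gompertz transfer: monotone S-shaped map of a raw dependency score
    onto [0, a). The parameters a, b, c are placeholders, not the paper's."""
    return a * math.exp(-b * math.exp(-c * score))

def dependency_score(f_x_given_y, f_y_given_x):
    """Toy symmetric combination of two conditional frequencies; the paper
    additionally folds in snippet content overlap, omitted here."""
    return 0.5 * (f_x_given_y + f_y_given_x)

# toy usage with made-up conditional frequencies
print(gompertz(dependency_score(0.8, 0.6)))
```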
java virtual machine jvm crashes are often due to an invalid memory reference to the jvm heap before the bug that caused the invalid reference can be fixed its location must be identified it can be in either the jvm implementation or the native library written in language invoked from java applications to help system engineers identify the location we implemented feature using page protection that prevents threads executing native methods from referring to the jvm heap this feature protects the jvm heap during native method execution and when native method execution refers to the jvm heap invalidly it interrupts the execution by generating page fault exception and then reports the location where the page fault exception was generated this helps the system engineer to identify the location of the bug in the native library the runtime overhead for using this feature averaged based on an estimation using the specjvm specjbb and jfcmark benchmark suites
dataflow analyses sacrifice path sensitivity for efficiency and lead to false positives when used for verification predicate refinement based model checking methods are path sensitive but must perform many expensive iterations to find all the relevant facts about program not all of which are naturally expressed and analyzed using predicates we show how to join these complementary techniques to obtain efficient and precise versions of any lattice based dataflow analysis using predicated lattices predicated lattice partitions the program state according to set of predicates and tracks lattice element for each partition the resulting dataflow analysis is more precise than the eager dataflow analysis without the predicates in addition we automatically infer predicates to rule out imprecisions the result is dataflow analysis that can adaptively refine its precision we then instantiate this generic framework using symbolic execution lattice which tracks pointer and value information precisely we give experimental evidence that our combined analysis is both more precise than the eager analysis in that it is sensitive enough to prove various properties as well as much faster than the lazy analysis as many relevant facts are eagerly computed thus reducing the number of iterations this results in an order of magnitude improvement in the running times from purely lazy analysis
large scale online communities need to manage the tension between critical mass and information overload slashdot is news and discussion site that has used comment rating to allow massive participation while providing mechanism for users to filter content by default comments with low ratings are hidden of users who changed the defaults more than three times as many chose to use ratings for filtering or sorting as chose to suppress the use of comment ratings nearly half of registered users however never strayed from the default filtering settings suggesting that the costs of exploring and selecting custom filter settings exceed the expected benefit for many users we recommend leveraging the efforts of the users that actively choose filter settings to reduce the cost of changing settings for all other users one strategy is to create static schemas that capture the filtering preferences of different groups of readers another strategy is to dynamically set filtering thresholds for each conversation thread based in part on the choices of previous readers for predicting later readers choices the choices of previous readers are far more useful than content features such as the number of comments or the ratings of those comments
in this paper we explore new data mining capability that involves mining path traversal patterns in distributed information providing environment where documents or objects are linked together to facilitate interactive access our solution procedure consists of two steps first we derive an algorithm to convert the original sequence of log data into set of maximal forward references by doing so we can filter out the effect of some backward references which are mainly made for ease of traveling and concentrate on mining meaningful user access sequences second we derive algorithms to determine the frequent traversal patterns ie large reference sequences from the maximal forward references obtained two algorithms are devised for determining large reference sequences one is based on some hashing and pruning techniques and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required performance of these two methods is comparatively analyzed it is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement sensitivity analysis on various parameters is conducted
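a minimal sketch of the first step only, converting one navigation session into its maximal forward references by cutting the current path at each backward reference; the hashing and pruning algorithms used for the second step are not shown

```python
def maximal_forward_references(session):
    """Split one navigation session into maximal forward references.

    A backward reference (revisiting a page already on the current forward
    path) ends the current maximal forward reference; traversal resumes
    from the revisited page."""
    path = []          # current forward path
    result = []
    extending = True   # are we currently extending a forward path?
    for page in session:
        if page in path:
            # backward reference: emit the path if it was just extended,
            # then backtrack to the revisited page
            if extending and len(path) > 1:
                result.append(list(path))
            path = path[:path.index(page) + 1]
            extending = False
        else:
            path.append(page)
            extending = True
    if extending and len(path) > 1:
        result.append(list(path))
    return result

# toy session: yields ABCD, ABEGH, ABEGW, AOU, AOV as maximal forward references
print(maximal_forward_references(list("ABCDCBEGHGWAOUOV")))
```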
in this paper we propose hierarchical discriminative approach for human action recognition it consists of feature extraction with mutual motion pattern analysis and discriminative action modeling in the hierarchical manifold space hierarchical gaussian process latent variable model hgplvm is employed to learn the hierarchical manifold space in which motion patterns are extracted cascade crf is also presented to estimate the motion patterns in the corresponding manifold subspace and the trained svm classifier predicts the action label for the current observation using motion capture data we test our method and evaluate how body parts make effect on human action recognition the results on our test set of synthetic images are also presented to demonstrate the robustness
we propose new features and algorithms for automating web page classification tasks such as content recommendation and ad blocking we show that the automated classification of web pages can be much improved if instead of looking at their textual content we consider each link’s url and the visual placement of those links on referring page these features are unusual rather than being scalar measurements like word counts they are tree structured describing the position of the item in tree we develop model and algorithm for machine learning using such tree structured features we apply our methods in automated tools for recognizing and blocking web advertisements and for recommending interesting news stories to reader experiments show that our algorithms are both faster and more accurate than those based on the text content of web documents
the routing of traffic between internet domains or autonomous systems ass task known as interdomain routing is currently handled by the border gateway protocol bgp in this paper we address the problem of interdomain routing from mechanism design point of view the application of mechanism design principles to the study of routing is the subject of earlier work by nisan and ronen and hershberger and suri in this paper we formulate and solve version of the routing mechanism design problem that is different from the previously studied version in three ways that make it more accurately reflective of real world interdomain routing we treat the nodes as strategic agents rather than the links our mechanism computes lowest cost routes for all source destination pairs and payments for transit nodes on all of the routes rather than computing routes and payments for only one source destination pair at time as is done in we show how to compute our mechanism with distributed algorithm that is straightforward extension to bgp and causes only modest increases in routing table size and convergence time in contrast with the centralized algorithms used in this approach of using an existing protocol as substrate for distributed computation may prove useful in future development of internet algorithms generally not only for routing or pricing problems our design and analysis of strategy proof bgp based routing mechanism provides new promising direction in distributed algorithmic mechanism design which has heretofore been focused mainly on multicast cost sharing
the availability of automatic tools for inferring semantics of database schemes is useful to solve several database design problems such as that of obtaining cooperative information systems or data warehouses from large sets of data sources in this context main problem is to single out similarities or dissimilarities among scheme objects interscheme properties this paper presents graph based techniques for uniform derivation of interscheme properties including synonymies homonymies type conflicts and subscheme similarities these techniques are characterized by common core the computation of maximum weight matchings on some bipartite weighted graphs derived using suitable metrics to measure semantic closeness of objects the techniques have been implemented in system prototype several experiments conducted with it and in part accounted for in the paper confirmed the effectiveness of our approach
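a minimal sketch of the common core named above, assuming a precomputed pairwise semantic closeness matrix (how those weights are derived is the paper's metric and is not reproduced); scipy's assignment solver computes the maximum weight bipartite matching between the objects of two schemes

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_scheme_objects(objects_a, objects_b, closeness):
    """Pair objects of two schemes via a maximum weight bipartite matching.

    closeness[i][j] is an assumed precomputed semantic-closeness weight
    between objects_a[i] and objects_b[j] (e.g. in [0, 1])."""
    weights = np.asarray(closeness, dtype=float)
    rows, cols = linear_sum_assignment(weights, maximize=True)
    return [(objects_a[i], objects_b[j], weights[i, j]) for i, j in zip(rows, cols)]

# toy usage: object names and weights are illustrative only
pairs = match_scheme_objects(
    ["employee", "dept", "salary"],
    ["worker", "department", "wage", "office"],
    [[0.9, 0.2, 0.1, 0.0],
     [0.1, 0.8, 0.0, 0.3],
     [0.2, 0.1, 0.7, 0.1]],
)
print(pairs)
```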
as the web grows more and more data has become available under dynamic forms of publication such as legacy databases accessed by an html form the so called hidden web in situations such as this integration of this data relies more and more on the fast generation of agents that can automatically fetch pages for further processing as result there is an increasing need for tools that can help users generate such agents in this paper we describe method for automatically generating agents to collect hidden web pages this method uses pre existing data repository for identifying the contents of these pages and takes advantage of some patterns that can be found among web sites to identify the navigation paths to follow to demonstrate the accuracy of our method we discuss the results of number of experiments carried out with sites from different domains
motion analysis of complex signals is particularly important and difficult topic as classical computer vision and image processing methodologies either based on some extended conservation hypothesis or regularity conditions may show their inherent limitations an important example of such signals are those coming from the remote sensing of the oceans in those signals the inherent complexities of the acquired phenomenon fluid in the regime of fully developed turbulence fdt are made even more challenging by the alterations coming from the acquisition process sun glint haze missing data etc the importance of understanding and computing vector fields associated with motion in the oceans or in the atmosphere eg cloud motion raises some fundamental questions and the need for deriving motion analysis and understanding algorithms that match the physical characteristics of the acquired signals among these questions one of the most fundamental is to understand what classical methodologies eg such as the various implementations of the optical flow are missing and how their drawbacks can be mitigated in this paper we show that the fundamental problem of motion evaluation in complex and turbulent acquisitions can be tackled using new multiscale characterizations of transition fronts the use of appropriate paradigms coming from statistical physics can be combined with some specific signal processing evaluation of the microcanonical cascade associated with turbulence this leads to radically new methods for computing motion fields in these signals these methods are first assessed on the results of oceanic circulation model and then applied on real data
there is an increased use of software in safety critical systems trend that is likely to continue in the future although traditional system safety techniques are applicable to software intensive systems there are new challenges emerging in this report we will address four issues we believe will pose challenges in the future first the nature of safety is continuing to be widely misunderstood and known system safety techniques are not applied second our ability to demonstrate certify that safety requirements have been met is inadequate third modeling and automated tools for example code generation and automated testing are introduced in the hope of increasing productivity this reliance on tools rather than people however introduces new and poorly understood problems finally safety critical systems are increasingly relying on data configuration data or databases incorrect data could have catastrophic and widespread consequences
dynamic optimizer is runtime software system that groups program’s instruction sequences into traces optimizes those traces stores the optimized traces in software based code cache and then executes the optimized code in the code cache to maximize performance the vast majority of the program’s execution should occur in the code cache and not in the different aspects of the dynamic optimization system in the past designers of dynamic optimizers have used the spec benchmark suite to justify their use of simple code cache management schemes in this paper we show that the problem and importance of code cache management changes dramatically as we move from spec with its relatively small number of dynamically generated code traces to large interactive windows applications we also propose and evaluate new cache management algorithm based on generational code caches that results in an average miss rate reduction of over a unified cache which translates into fewer instructions spent in the dynamic optimizer the algorithm categorizes code traces based on their expected lifetimes and groups traces with similar lifetimes together in separate storage areas using this algorithm short lived code traces can easily be removed from code cache without introducing fragmentation and without suffering the performance penalties associated with evicting long lived code traces
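a toy sketch of the generational idea: traces predicted to be short lived go into a small area that can be flushed as a unit, while traces that survive are promoted to a long lived area; the sizes, the lifetime predictor and the promotion policy below are placeholders, not the paper's

```python
class GenerationalCodeCache:
    """Toy generational code cache: short-lived traces live in a nursery
    that is flushed as a unit (no fragmentation); traces that keep getting
    executed are promoted to a long-lived region. Sizes and thresholds are
    illustrative placeholders."""

    def __init__(self, nursery_slots=64, long_lived_slots=512):
        self.nursery = {}        # trace_id -> code
        self.long_lived = {}     # trace_id -> code
        self.exec_counts = {}
        self.nursery_slots = nursery_slots
        self.long_lived_slots = long_lived_slots

    def insert(self, trace_id, code):
        if len(self.nursery) >= self.nursery_slots:
            self.flush_nursery()
        self.nursery[trace_id] = code
        self.exec_counts[trace_id] = 0

    def execute(self, trace_id, promote_threshold=8):
        self.exec_counts[trace_id] = self.exec_counts.get(trace_id, 0) + 1
        # promote hot traces: they are expected to live long
        if (trace_id in self.nursery
                and self.exec_counts[trace_id] >= promote_threshold
                and len(self.long_lived) < self.long_lived_slots):
            self.long_lived[trace_id] = self.nursery.pop(trace_id)

    def flush_nursery(self):
        # evicting the whole short-lived area never touches long-lived
        # traces and leaves no holes behind
        self.nursery.clear()
```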
when experts participate in collaborative systems tension may arise between them and novice contributors in particular when experts perceive novices as bother or threat the experts may express territoriality behaviors communicating ownership of target of interest in this paper we describe the results of user study of mobile social tagging system deployed within museum gallery to group of novices and experts collaboratively tagging part of the collection we observed that experts express greater feelings of ownership towards their contributions to the system and the museum in general experts were more likely than novices to participate at higher rates and to negatively evaluate contributions made by others we suggest number of design strategies to balance experts expressions of territoriality so as to motivate their participation while discouraging exclusionary behaviors
uniform interface allows programs to be written relatively independently of specific services and yet work with wide variety of the services available in distributed environment ideally the interface provides this uniform access without excessive complexity in the interface or loss of performance however uniform interface does not arise from careful design of individual system interfaces alone it requires explicit definition in this paper the uio uniform system interface that has been used for the past five years in the distributed operating system is described with the focus on the key design issues this interface provides several extensions beyond the interface of unix including support for record locking atomic transactions and replication as well as attributes that indicate whether optional semantics and operations are available experience in using and implementing this interface with variety of different services is described along with the performance of both local and network it is concluded that the uio interface provides uniform system interface with significant functionality wide applicability and no significant performance penalty
web document could be seen to be composed of textual content as well as social metadata of various forms eg anchor text search query and social annotation both of which are valuable to indicate the semantic content of the document however due to the free nature of the web the two streams of web data suffer from the serious problems of noise and sparseness which have actually become the major challenges to the success of many web mining applications previous work has shown that it could enhance the content of web document by integrating anchor text and search query in this paper we study the problem of exploring emergent social annotation for document enhancement and propose novel reinforcement framework to generate social representation of document in contrast to prior work textual content and social annotation are enhanced simultaneously in our framework which is achieved by exploiting kind of mutual reinforcement relationship behind them two convergent models social content model and social annotation model are symmetrically derived from the framework to represent enhanced textual content and enhanced social annotation respectively the enhanced document is referred to as social document or sdoc in that it could embed complementary viewpoints from many web authors and many web visitors in this sense the document semantics is enhanced exactly by exploring social wisdom we build the framework on large delicious data and evaluate it through three typical web mining applications annotation classification and retrieval experimental results demonstrate that social representation of web document could boost the performance of these applications significantly
data mining is supposed to be an iterative and exploratory process in this context we are working on project with the overall objective of developing practical computing environment for the human centered exploratory mining of frequent sets one critical component of such an environment is the support for the dynamic mining of constrained frequent sets of items constraints enable users to impose certain focus on the mining process dynamic means that in the middle of the computation users are able to change such as tighten or relax the constraints and or ii change the minimum support threshold thus having decisive influence on subsequent computations in real life situation the available buffer space may be limited thus adding another complication to the problem in this article we develop an algorithm called dcf for dynamic constrained frequent set computation this algorithm is enhanced with few optimizations exploiting lightweight structure called segment support map it enables dcf to obtain sharper bounds on the support of sets of items and to ii better exploit properties of constraints furthermore when handling dynamic changes to constraints dcf relies on the concept of delta member generating function which generates precisely the sets of items that satisfy the new but not the old constraints our experimental results show the effectiveness of these enhancements
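an illustration of how a segment based summary can upper bound the support of a candidate itemset before it is counted, which is the kind of sharper bound the segment support map is used for; the partitioning into segments and the exact bounds used by dcf are not reproduced here

```python
from collections import defaultdict

def build_segment_support_map(transactions, segment_of):
    """segment_of maps an item to its segment id (e.g. items grouped by
    support range); for each segment we count the transactions containing
    at least one of its items."""
    seg_support = defaultdict(int)
    for t in transactions:
        for seg in {segment_of(item) for item in t}:
            seg_support[seg] += 1
    return seg_support

def support_upper_bound(itemset, seg_support, segment_of):
    """Any transaction containing the whole itemset contains at least one
    item from every segment the itemset touches, so the minimum of those
    segment counts upper-bounds the itemset's support."""
    segs = {segment_of(item) for item in itemset}
    return min(seg_support.get(s, 0) for s in segs)

# toy usage: segments defined by first letter (illustrative only)
txns = [{"apple", "beer"}, {"apple", "bread"}, {"banana"}]
seg = lambda item: item[0]
m = build_segment_support_map(txns, seg)
print(support_upper_bound({"apple", "banana"}, m, seg))  # a bound, not the exact support
```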
we address the problem of designing practical energy efficient protocols for data collection in wireless sensor networks using predictive modeling prior work has suggested several approaches to capture and exploit the rich spatio temporal correlations prevalent in wsns during data collection although shown to be effective in reducing the data collection cost those approaches use simplistic correlation models and further ignore many idiosyncrasies of wsns in particular the broadcast nature of communication our proposed approach is based on approximating the joint probability distribution over the sensors using undirected graphical models ideally suited to exploit both the spatial correlations and the broadcast nature of communication we present algorithms for optimally using such model for data collection under different communication models and for identifying an appropriate model to use for given sensor network experiments over synthetic and real world datasets show that our approach significantly reduces the data collection cost
ws bpel applications are kind of service oriented application they use xpath extensively to integrate loosely coupled workflow steps however xpath may extract wrong data from the xml messages received resulting in erroneous results in the integrated process surprisingly although xpath plays key role in workflow integration inadequate research has been conducted to address the important issues in software testing this paper tackles the problem it also demonstrates novel transformation strategy to construct artifacts we use the mathematical definitions of xpath constructs as rewriting rules and propose data structure called xpath rewriting graph xrg which not only models how an xpath is conceptually rewritten but also tracks individual rewritings progressively we treat the mathematical variables in the applied rewriting rules as if they were program variables and use them to analyze how information may be rewritten in an xpath conceptually we thus develop an algorithm to construct xrgs and novel family of data flow testing criteria to test ws bpel applications experiment results show that our testing approach is promising
the paper presents methods that we have implemented to improve the quality of the def uses reported for dynamically allocated locations the methods presented are based on the ruggieri murtagh naming scheme for dynamically created locations we expand upon this scheme to name dynamically allocated locations for some user written allocation routines using this expanded naming scheme we introduce an inexpensive non iterative and localized calculation of extended must alias analysis to handle dynamically allocated locations and show how this information can be used to improve def use information this is the first attempt to specify must alias information for names which represent set of dynamically allocated locations empirical results are presented to illustrate the usefulness of our method we consider this work step towards developing practical re engineering tools for
we present comprehensive system for weather data visualization weather data are multivariate and contain vector fields formed by wind speed and direction several well established visualization techniques such as parallel coordinates and polar systems are integrated into our system we also develop various novel methods including circular pixel bar charts embedded into polar systems enhanced parallel coordinates with shape axis and weighted complete graphs our system was used to analyze the air pollution problem in hong kong and some interesting patterns have been found
the discovery of characteristic rules is well known data mining task and has led to several successful applications however because of the descriptive nature of characteristic rules typically very large number of them is discovered during the mining stage this makes monitoring and control of these rules in practice extremely costly and difficult therefore selection of the most promising subset of rules is desirable some heuristic rule selection methods have been proposed in the literature that deal with this issue in this paper we propose an integer programming model to solve the problem of optimally selecting the most promising subset of characteristic rules moreover the proposed technique enables control of user defined level of overall quality of the model in combination with maximum reduction of the redundancy extant in the original ruleset we use real world data to empirically evaluate the benefits and performance of the proposed technique against the well known rulecover heuristic results demonstrate that the proposed integer programming techniques are able to significantly reduce the number of retained rules and the level of redundancy in the final ruleset moreover the results demonstrate that the overall quality in terms of the discriminant power of the final ruleset slightly increases if integer programming methods are used
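a minimal set cover style formulation of rule selection, stated with the pulp package (an assumed convenience, not part of the paper): retain the fewest rules that still cover every example covered by the full ruleset; the user defined quality level of the paper's full model is omitted

```python
# requires the PuLP package (bundled CBC solver); a simplified formulation only
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

def select_rules(rule_coverage):
    """rule_coverage: dict rule_id -> set of example ids the rule covers.
    Returns a minimum-size subset of rules covering the same examples."""
    covered = set().union(*rule_coverage.values())
    prob = LpProblem("rule_selection", LpMinimize)
    x = {r: LpVariable(f"x_{r}", cat=LpBinary) for r in rule_coverage}
    prob += lpSum(x.values())                      # minimize retained rules
    for e in covered:                              # keep every example covered
        prob += lpSum(x[r] for r, ex in rule_coverage.items() if e in ex) >= 1
    prob.solve()
    return [r for r, var in x.items() if var.value() == 1]

# toy usage with made-up coverage sets
print(select_rules({"r1": {1, 2}, "r2": {2, 3}, "r3": {1, 2, 3}}))
```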
we define two new classes of shared memory objects ratifiers which detect agreement and conciliators which ensure agreement with some probability we show that consensus can be solved by an alternating sequence of these objects and observe that most known randomized consensus algorithms have this structure we give deterministic valued ratifier for an unbounded number of processes that uses lg log log space and individual work we also give randomized conciliator for any number of values in the probabilistic write model with processes that guarantees agreement with constant probability while using one multiwriter register log expected individual work and expected total work combining these objects gives consensus protocol for the probabilistic write model that uses log individual work and n log total work no previous protocol in this model uses sublinear individual work or linear total work for constant
this paper describes the design and implementation of scalable run time system and an optimizing compiler for unified parallel upc an experimental evaluation on bluegene distributed memory machine demonstrates that the combination of the compiler with the runtime system produces programs with performance comparable to that of efficient mpi programs and good performance scalability up to hundreds of thousands of processors our runtime system design solves the problem of maintaining shared object consistency efficiently in distributed memory machine our compiler infrastructure simplifies the code generated for parallel loops in upc through the elimination of affinity tests eliminates several levels of indirection for accesses to segments of shared arrays that the compiler can prove to be local and implements remote update operations through lower cost asynchronous message the performance evaluation uses three well known benchmarks hpc randomaccess hpc stream and nas cg to obtain scaling and absolute performance numbers for these benchmarks on up to processors the full bluegene machine these results were used to win the hpc challenge competition at sc in seattle wa demonstrating that pgas languages support both productivity and performance
this paper addresses the problem of evaluating ranked top queries with expensive predicates as major dbmss now all support expensive user defined predicates for boolean queries we believe such support for ranked queries will be even more important first ranked queries often need to model user specific concepts of preference relevance or similarity which call for dynamic user defined functions second middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per object queries third ranked queries often accompany boolean ranking conditions which may turn predicates into expensive ones as the index structure on the predicate built on the base table may be no longer effective in retrieving the filtered objects in order fourth fuzzy joins are inherently expensive as they are essentially user defined operations that dynamically associate multiple relations these predicates being dynamically defined or externally accessed cannot rely on index mechanisms to provide zero time sorted output and must instead require per object probe to evaluate to enable probe minimization we develop the problem as cost based optimization of searching over potential probe schedules in particular we decouple probe scheduling into object and predicate scheduling problems and develop an analytical object scheduling optimization and dynamic predicate scheduling optimization which combined together form cost effective probe schedule
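a minimal sketch of upper bound driven probing: each object carries an upper bound on its final score, the object with the best bound is probed next, and an object is returned once it is fully evaluated while still holding the best bound; the cost based choice of which predicate to probe next, which is part of the paper's optimization, is fixed arbitrarily here and all names are illustrative

```python
import heapq

def topk_expensive_predicates(objects, cheap_score, predicates, combine, k):
    """objects: comparable ids; cheap_score(o): zero-cost partial score;
    predicates: list of expensive functions p(o) -> score in [0, 1];
    combine(scores): monotone scoring function, fed 1.0 for every unprobed
    predicate to obtain an upper bound on the final score."""
    heap = [(-combine([cheap_score(o)] + [1.0] * len(predicates)), o, 0, [cheap_score(o)])
            for o in objects]
    heapq.heapify(heap)
    results = []
    while heap and len(results) < k:
        neg_ub, o, probed, scores = heapq.heappop(heap)
        if probed == len(predicates):
            results.append((o, -neg_ub))            # fully evaluated, best bound
            continue
        scores = scores + [predicates[probed](o)]   # probe one more predicate
        ub = combine(scores + [1.0] * (len(predicates) - probed - 1))
        heapq.heappush(heap, (-ub, o, probed + 1, scores))
    return results
```

the stopping argument is that a fully probed object popped from the top of the heap has an exact score at least as large as every remaining upper bound, hence at least as large as every remaining exact score, so no further probes are necessary for it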
gossip based mechanisms are touted for their simplicity limited resource usage robustness to failures and tunable system behavior these qualities make gossiping an ideal mechanism for storage systems that are responsible for maintaining and updating data in the midst of failures and limited resources eg intermittent network connectivity limited bandwidth constrained communication range or limited battery power we focus on persistent storage systems that unlike mere caches are responsible for both the durability and the consistency of data examples of such systems may be encountered in many different environments in particular wide area networks constrained by limited bandwidth wireless sensor networks characterized by limited resources and mobile ad hoc networks suffering from intermittent connectivity in this paper we demonstrate the qualities of gossiping in these three respective environments
greedy routing and face routing route data by using location information of nodes to solve scalability problem incurred in table driven routing greedy routing efficiently routes data in dense networks but it does not guarantee message delivery face routing has been designed to achieve guaranteed message delivery face routing however is not efficient in terms of routing path length in this paper we present skipping face routing sfr protocol to reduce the face traversal cost incurred in the existing approaches in sfr we specify set of sufficient conditions so that each node can determine if it can skip some intermediate nodes during face traversing based solely on the neighbour information of the node resulting in reduced total number of transmissions by using simulation studies we show that sfr significantly reduces the communication cost and traversal time required in face traversal compared with the existing approaches
to improve performance and reduce power processor designers employ advances that shrink feature sizes lower voltage levels reduce noise margins and increase clock rates however these advances make processors more susceptible to transient faults that can affect correctness while reliable systems typically employ hardware techniques to address soft errors software techniques can provide lower cost and more flexible alternative this paper presents novel software only transient fault detection technique called swift swift efficiently manages redundancy by reclaiming unused instruction level resources present during the execution of most programs swift also provides high level of protection and performance with an enhanced control flow checking mechanism we evaluate an implementation of swift on an itanium which demonstrates exceptional fault coverage with reasonable performance cost compared to the best known single threaded approach utilizing an ecc memory system swift demonstrates average speedup
as people leave on the web their opinions on products and services they have used it has become important to develop methods of semi automatically classifying and gauging them the task of analyzing such data collectively called customer feedback data is known as opinion mining opinion mining consists of several steps and multiple techniques have been proposed for each step in this paper we survey and analyze various techniques that have been developed for the key tasks of opinion mining on the basis of our survey and analysis of the techniques we provide an overall picture of what is involved in developing software system for opinion mining
increased device density and working set size are driving rise in cache capacity which comes at the cost of high access latency based on the characteristic of shared data which is accessed frequently and consumes little capacity novel two level directory organization is proposed to minimize the cache access time in this paper in this scheme small fast directory is used to offer fast hits for great fraction of memory accesses detailed simulation results show that on core tiled chip multiprocessor this approach reduces average access latency by compared to the general cache organization and improves the overall performance by on average
decision tree construction is well studied problem in data mining recently there has been much interest in mining streaming data domingos and hulten have presented one pass algorithm for decision tree construction their work uses hoeffding inequality to achieve probabilistic bound on the accuracy of the tree constructed in this paper we revisit this problem we make the following two contributions we present numerical interval pruning nip approach for efficiently processing numerical attributes our results show an average of reduction in execution times we exploit the properties of the gain function entropy and gini to reduce the sample size required for obtaining given bound on the accuracy our experimental results show reduction in the number of data instances required
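a minimal sketch of the hoeffding style split test that one pass tree construction relies on, splitting when the observed gain gap between the two best attributes exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)); the tighter gain specific sample bounds and the numerical interval pruning proposed in the paper are not reproduced

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta the true mean of a
    quantity with range R lies within epsilon of the mean of n observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n, tie_break=0.05):
    """Split once the observed gain gap is statistically significant, or
    once epsilon is so small that the two attributes are effectively tied."""
    eps = hoeffding_epsilon(value_range, delta, n)
    return (best_gain - second_best_gain > eps) or (eps < tie_break)

# toy usage: information gain for a binary class lies in [0, 1]
print(should_split(0.30, 0.21, value_range=1.0, delta=1e-6, n=2000))
```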
the reconstruction of objects from point cloud is based on sufficient separation of the points representing objects of interest from the points of other unwanted objects this operation called segmentation is discussed in this paper we present an interactive unstructured point cloud segmentation based on graph cut method where the cost function is derived from euclidean distance of point cloud points the graph topology and direct point cloud segmentation are the novel parts of our work the segmentation is presented on real application the terrain reconstruction of complex miniature paper model the langweil model of prague
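a minimal sketch of a graph cut over a point cloud, assuming networkx is available: neighbour edges get capacities that decay with euclidean distance, user marked foreground and background seeds are tied to the terminals, and a minimum cut separates the object from the rest; the k nearest neighbour topology and the capacity function are placeholders for the paper's graph construction

```python
import math
import networkx as nx

def segment_point_cloud(points, fg_seeds, bg_seeds, k=6, sigma=0.05):
    """points: list of (x, y, z); fg_seeds/bg_seeds: indices marked by the
    user. Returns the set of point indices on the foreground side of a min cut."""
    def dist(a, b):
        return math.dist(points[a], points[b])

    g = nx.DiGraph()
    for i in range(len(points)):
        # connect each point to its k nearest neighbours (toy O(n^2) search)
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: dist(i, j))[:k]
        for j in neighbours:
            # capacity decays with the euclidean distance between the points
            w = math.exp(-dist(i, j) ** 2 / (2 * sigma ** 2))
            g.add_edge(i, j, capacity=w)
            g.add_edge(j, i, capacity=w)
    for i in fg_seeds:
        g.add_edge("source", i, capacity=float("inf"))   # hard foreground seeds
    for i in bg_seeds:
        g.add_edge(i, "sink", capacity=float("inf"))     # hard background seeds
    _, (fg_side, _) = nx.minimum_cut(g, "source", "sink")
    return {i for i in fg_side if i != "source"}
```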
cartesian product network is obtained by applying the cross operation on two graphs in this paper we study the problem of constructing the maximum number of edge disjoint spanning trees abbreviated to edsts in cartesian product networks let G = (VG, EG) be graph having n1 edsts and F = (VF, EF) be graph having n2 edsts two methods are proposed for constructing edsts in the cartesian product of G and F denoted by G × F the graph G has t1 = |EG| - n1(|VG| - 1) more edges than are necessary for constructing n1 edsts in it and the graph F has t2 = |EF| - n2(|VF| - 1) more edges than are necessary for constructing n2 edsts in it by assuming that t1 ≥ n2 and t2 ≥ 1 our first construction shows that n1 + n2 edsts can be constructed in G × F our second construction does not need any assumption and it constructs n1 + n2 - 1 edsts in G × F by applying the proposed methods it is easy to construct the maximum numbers of edsts in many important cartesian product networks such as hypercubes tori generalized hypercubes mesh connected trees and hyper petersen networks
in today’s fast changing business environment flexible process aware information systems paiss are required to allow companies to rapidly adjust their business processes to changes in the environment however increasing flexibility in large paiss usually leads to less guidance for its users and consequently requires more experienced users to allow for flexible systems with high degree of support intelligent user assistance is required in this paper we propose recommendation service which when used in combination with flexible paiss can support end users during process execution by giving recommendations on possible next steps recommendations are generated based on similar past process executions by considering the specific optimization goals in this paper we also evaluate the proposed recommendation service by means of experiments
this paper proposes an efficient method the frequent items ultrametric trees fiut for mining frequent itemsets in database fiut uses special frequent items ultrametric tree fiu tree structure to enhance its efficiency in obtaining frequent itemsets compared to related work fiut has four major advantages first it minimizes overhead by scanning the database only twice second the fiu tree is an improved way to partition database which results from clustering transactions and significantly reduces the search space third only frequent items in each transaction are inserted as nodes into the fiu tree for compressed storage finally all frequent itemsets are generated by checking the leaves of each fiu tree without traversing the tree recursively which significantly reduces computing time fiut was compared with fp growth well known and widely used algorithm and the simulation results showed that the fiut outperforms the fp growth in addition further extensions of this approach and their implications are discussed
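The two database scans described above can be sketched independently of the FIU-tree itself; the following is a toy illustration (not the authors' data structure) of how the first scan finds frequent items and the second keeps only those items per transaction, grouping transactions by the number k of frequent items they contain. The transactions and threshold are made up.

```python
from collections import Counter

def two_scan_prune(transactions, min_support):
    # scan 1: count the support of every item
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in support.items() if c >= min_support}
    # scan 2: keep only the frequent items of each transaction and group the
    # resulting itemsets by their length k (roughly what the k-FIU-trees index)
    grouped = {}
    for t in transactions:
        kept = tuple(sorted(set(t) & frequent))
        if kept:
            grouped.setdefault(len(kept), []).append(kept)
    return frequent, grouped

txs = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
print(two_scan_prune(txs, min_support=2))
```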
horizontal partitioning is logical database design technique which facilitates efficient execution of queries by reducing the irrelevant objects accessed given set of most frequently executed queries on class the horizontal partitioning generates horizontal class fragments each of which is subset of object instances of the class that meet the queries requirements there are two types of horizontal class partitioning namely primary and derived primary horizontal partitioning of class is performed using predicates of queries accessing the class derived horizontal partitioning of class is the partitioning of class based on the horizontal partitioning of another class we present algorithms for both primary and derived horizontal partitioning and discuss some issues in derived horizontal partitioning and present their solutions there are two important aspects for supporting database operations on partitioned database namely fragment localization for queries and object migration for updates fragment localization deals with identifying the horizontal fragments that contribute to the result of the query and object migration deals with migrating objects from one class fragment to another due to updates we provide novel solutions to these two problems and finally we show the utility of horizontal partitioning for query processing
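A toy sketch of primary horizontal partitioning only (not the paper's algorithms): each fragment collects the object instances that satisfy one of the predicates drawn from the frequent queries, with a catch-all fragment for the rest. The class instances, predicate names, and first-match policy are illustrative assumptions.

```python
def primary_horizontal_partition(objects, predicates):
    # predicates: list of (name, callable) pairs derived from frequent queries
    fragments = {name: [] for name, _ in predicates}
    fragments["rest"] = []
    for obj in objects:
        placed = False
        for name, pred in predicates:
            if pred(obj):
                fragments[name].append(obj)
                placed = True
                break  # simple scheme: the first matching predicate wins
        if not placed:
            fragments["rest"].append(obj)
    return fragments

# hypothetical class instances and query predicates
accounts = [{"id": 1, "balance": 50}, {"id": 2, "balance": 5000}, {"id": 3, "balance": 200}]
preds = [("low", lambda a: a["balance"] < 100), ("high", lambda a: a["balance"] >= 1000)]
print(primary_horizontal_partition(accounts, preds))
```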
we initiate the study of scenarios that combine online decision making with interaction between non cooperative agents to this end we introduce online games that model such scenarios as non cooperative games and lay the foundations for studying this model roughly speaking an online game captures systems in which independent agents serve requests in common environment the requests arrive in an online fashion and each is designated to be served by different agent the cost incurred by serving request is paid for by the serving agent and naturally the agents seek to minimize the total cost they pay since the agents are independent it is unlikely that some central authority can enforce policy or an algorithm centralized or distributed on them and thus the agents can be viewed as selfish players in non cooperative game in this game the players have to choose as strategy an online algorithm according to which requests are served to further facilitate the game theoretic approach we suggest the measure of competitive analysis as the players decision criterion as the expected result of non cooperative games is an equilibrium the question of finding the equilibria of game is of central importance and thus it is the central issue we concentrate on in this paper we study some natural examples for online games in order to obtain general insights and develop generic techniques we present an abstract model for the study of online games generalizing metrical task systems we suggest method for constructing equilibria in this model and further devise techniques for implementing it
data exchange deals with the following problem given an instance over a source schema a specification of the relationship between the source and the target and dependencies on the target construct an instance over a target schema that satisfies the given relationships and dependencies recently for data exchange settings without target dependencies libkin pods introduced a new concept of solutions based on the closed world assumption so called cwa solutions and showed that in some respects this new notion behaves better than the standard notion of solutions considered in previous papers on data exchange the present paper extends libkin’s notion of cwa solutions to data exchange settings with target dependencies we show that when restricting attention to data exchange settings with weakly acyclic target dependencies this new notion behaves similarly as before the core is the unique minimal cwa solution and computing cwa solutions as well as certain answers to positive queries is possible in polynomial time and can be ptime hard however there may be more than one maximal cwa solution and going beyond the class of positive queries we obtain that there are conjunctive queries with just one inequality for which evaluating the certain answers is conp hard finally we consider the existence of cwa solutions problem while the problem is tractable for data exchange settings with weakly acyclic target dependencies it turns out to be undecidable for general data exchange settings as a consequence we obtain that also the existence of universal solutions problem is undecidable in general
we propose video editing system that allows user to apply time coherent texture to surface depicted in the raw video from single uncalibrated camera including the surface texture mapping of texture image and the surface texture synthesis from texture swatch our system avoids the construction of shape model and instead uses the recovered normal field to deform the texture so that it plausibly adheres to the undulations of the depicted surface the texture mapping method uses the nonlinear least squares optimization of spring model to control the behavior of the texture image as it is deformed to match the evolving normal field through the video the texture synthesis method uses coarse optical flow to advect clusters of pixels corresponding to patches of similarly oriented surface points these clusters are organized into minimum advection tree to account for the dynamic visibility of clusters we take rather crude approach to normal recovering and optical flow estimation yet the results are robust and plausible for nearly diffuse surfaces such as faces and shirts
many applications require the discovery of items which occur frequently within multiple distributed data streams past solutions for this problem either require a high degree of error tolerance or can only provide results periodically in this paper we introduce a new algorithm designed for continuously tracking frequent items over distributed data streams providing either exact or approximate answers we tested the efficiency of our method using two real world data sets the results indicated a significant reduction in communication cost when compared to naïve approaches and an existing efficient algorithm called top monitoring since our method does not rely upon approximations to reduce communication overhead and is explicitly designed for tracking frequent items our method also shows increased quality in its tracking results
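For orientation only, here is a naïve baseline for the same setting, not the algorithm proposed above: each site keeps exact local counts and a coordinator merges them in one round to report items whose global count reaches a threshold. Reducing exactly this kind of communication is what a tracking algorithm aims at; the site contents and threshold are made up.

```python
from collections import Counter

class Site:
    def __init__(self):
        self.counts = Counter()
    def observe(self, item):
        self.counts[item] += 1

def global_frequent(sites, threshold):
    # coordinator-side merge of per-site counters (one full round of communication)
    total = Counter()
    for s in sites:
        total.update(s.counts)  # this exchange is what smarter protocols minimize
    return {item: c for item, c in total.items() if c >= threshold}

a, b = Site(), Site()
for x in "xxxyz":
    a.observe(x)
for x in "xyyy":
    b.observe(x)
print(global_frequent([a, b], threshold=4))
```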
the multiprocessor scheduling of collections of real time jobs is considered sufficient tests are derived for feasibility analysis of collection of sporadic jobs where job migration between processors is forbidden the fixed priority scheduling of real time jobs with job migration is analyzed and sufficient tests of schedulability are obtained for the deadline monotonic dm and the earliest deadline first edf scheduling algorithms the feasibility and schedulability tests of this paper may be applied even when the collection of jobs is incompletely specified the applicability of these tests to the scheduling of collections of jobs that are generated by systems of recurrent real time tasks is discussed in particular sufficient conditions for the dm scheduling of sporadic task systems are derived and compared to previously known tests
conventional oblivious routing algorithms are either not application aware or assume that each flow has its own private channel to ensure deadlock avoidance we present framework for application aware routing that assures deadlock freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph arbitrary minimal routes can be made deadlock free through appropriate static channel allocation when two or more channels are available given bandwidth estimates for flows we present mixed integer linear programming milp approach and heuristic approach for producing deadlock free routes that minimize maximum channel load the heuristic algorithm is calibrated using the milp algorithm and evaluated on number of benchmarks through detailed network simulation our framework can be used to produce application aware routes that target the minimization of latency number of flows through link bandwidth or any combination thereof
this paper presents high availability system architecture called indra an integrated framework for dependable and revivable architecture that enhances multicore processor or cmp with novel security and fault recovery mechanisms indra represents the first effort to create remote attack immune self healing network services using the emerging multicore processors by exploring the property of tightly coupled multicore system indra pioneers several concepts it creates hardware insulation establishes finegrained fault monitoring exploits monitoring backup concurrency and facilitates fast recovery services with minimal performance impact in addition indra’s fault exploit monitoring is implemented in software rather than in hardware logic thereby providing better flexibility and upgradability to provide efficient service recovery and thus improve service availability we propose novel delta state backup and recovery on demand mechanism in indra that substantially outperforms conventional checkpointing schemes we demonstrate and evaluate indra’s capability and performance using real network services and cycle level architecture simulator as indicated by our performance results indra is highly effective in establishing more dependable system with high service availability using emerging multicore processors
wire delays are a major concern for current and forthcoming processors one approach to attack this problem is to divide the processor into semi independent units referred to as clusters a cluster usually consists of a local register file and a subset of the functional units while the data cache remains centralized however as technology evolves the latency of such a centralized cache will increase leading to an important performance impact in this paper we propose to include flexible low latency buffers in each cluster in order to reduce the performance impact of higher cache latencies the reduced number of entries in each buffer permits the design of flexible ways to map data from and to these buffers the proposed buffers are managed by the compiler which is responsible to decide which memory instructions make use of them effective instruction scheduling techniques are proposed to generate code that exploits these buffers results for the mediabench benchmark suite show that the performance of a clustered vliw processor with a unified data cache is improved by when such buffers are used in addition the proposed architecture also shows significant advantages over both multivliw processors and clustered processors with a word interleaved cache two state of the art designs with a distributed data cache
in this work we propose new model for multimedia documents suitable for government applications that provides different representations of the same multimedia contents allowing to solve open problems related to the technology evolution different documental format and access rights the model constitutes the starting point for an information system capable of managing documental streams integrating and processing different multimedia data types and providing facilities for indexing storage and retrieval together with long term preservation strategies we have implemented prototypal version of the system that realises the described information retrieval and presentation tasks for juridical documents
a process based on argumentation theory is described for classifying very noisy data more specifically a process founded on a concept called arguing from experience is described whereby several software agents argue about the classification of a new example given individual case bases containing previously classified examples two arguing from experience protocols are described padua which has been applied to binary classification problems and pisa which has been applied to multi class problems evaluation of both padua and pisa indicates that they operate with equal effectiveness to other classification systems in the absence of noise however the systems outperform comparable systems given very noisy data
this paper presents a review of rfid based approaches used for the development of smart spaces and smart objects we explore approaches that enable rfid technology to make the transition from the recognized applications such as retail to ubiquitous computing in which computers and technology fade into the background of day to day life in this paper we present the case for the use of rfid technology as a key technology of ubiquitous computing due to its ability to embed itself in everyday objects and spaces frameworks to support the operation of rfid based smart objects and spaces are discussed and key design concepts identified conceptual frameworks based on academic research and deployed frameworks based on real world implementations are reviewed and the potential for rfid as a truly ubiquitous technology is considered and presented
the ability to check memory references against their associated array buffer bounds helps programmers to detect programming errors involving address overruns early on and thus avoid many difficult bugs down the line this paper proposes a novel approach called boud to the array bounds checking problem that exploits the debug register hardware in modern cpus boud allocates a debug register to monitor accesses to an array or buffer within a loop so that accesses stepping outside the array’s or buffer’s bound will trigger a breakpoint exception because the number of debug registers is typically small in cases when hardware bounds checking is not possible boud falls back to software bounds checking although boud can effectively eliminate per array reference software checking overhead in most cases it still incurs a fixed set up overhead for each use of an array within a loop this paper presents the detailed design and implementation of the boud compiler and a comprehensive evaluation of various performance tradeoffs associated with the proposed array bounds checking technique for the set of real world network applications we tested including apache sendmail bind etc the latency penalty of boud’s bounds checking mechanism is between to respectively when compared with the vanilla gcc compiler which does not perform any bounds checking
this paper presents novel method for shape analysis which can deal with complex expressions in language it supports taking addresses of fields and stack variables the concept of abstract evaluation path aep is proposed which is generated from the expression in the language aep is used to refine the abstract shape graph asg to get set of more precise asgs on which the semantics of the statement can be defined easily the results can be used to determine shape invariants and detect memory leak conservatively prototype has been implemented and the results of the experiment are shown
clusters of workstations cows are becoming increasingly popular as a cost effective alternative to parallel computers in these systems processors are connected using irregular topologies providing the wiring flexibility scalability and incremental expansion capability required in this environment myrinet is one of the most popular interconnection networks for cows myrinet uses source routing and wormhole switching the up down routing algorithm is used to build the network routes on the other hand in myrinet network behavior is controlled by the software running at the network interfaces hence new features such as new routing algorithms can be added by only changing this software in previous work we proposed the in transit buffer itb mechanism to improve the performance of source routing based networks the itb mechanism temporarily ejects packets from the network at some intermediate hosts and later reinjects them into the network performing a special kind of virtual cut through switching at these hosts we applied this mechanism to up down routing in order to remove the down → up forbidden channel dependences that prevented minimal routing between every pair of hosts results showed that network throughput can be more than doubled on medium sized switch networks in this paper we analyze in depth the effect of using itbs in the network showing that they not only serve for guaranteeing minimal routing but also that they are a powerful mechanism able to balance network traffic and reduce network contention to demonstrate these capabilities we apply the itb mechanism to improved routing schemes such as dfs and smart routing these routing algorithms without itbs are able to improve the performance of up down by percent and percent respectively for switch network the evaluation results show that when itbs are used together with these improved routing algorithms network throughput achieved by dfs and smart routing can still be improved by percent and percent respectively however smart routing requires time to compute the routing tables that rapidly grows with network size it being impossible in practice to build networks with more than switches this high computational cost is mainly motivated by the need of obtaining deadlock free routing tables however when itbs are used one can decouple the stages of computing routing tables and breaking cycles moreover as stated above itbs can be used to reduce network contention in this way in this paper we also propose a completely new routing algorithm that tries to balance network traffic by using a simple and low time consuming strategy the proposed algorithm guarantees deadlock freedom and reduces network contention with the use of itbs the evaluation results show that our algorithm obtains unprecedented throughputs in switch networks tripling the original up down and almost doubling smart routing
over the past nine years the formal methods group at the ibm haifa research laboratory has made steady progress in developing tools and techniques that make the power of model checking accessible to the community of hardware designers and verification engineers to the point where it has become an integral part of the design cycle of many teams we discuss our approach to the problem of integrating formal methods into an industrial design cycle and point out those techniques which we have found to be especially effective in an industrial setting
we present streamflow a new multithreaded memory manager designed for low overhead high performance memory allocation while transparently favoring locality streamflow enables low overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to that of custom sequential allocators it favors the transparent exploitation of temporal and spatial object access locality and reduces allocator induced cache conflicts and false sharing all using a unified design based on segregated heaps streamflow introduces an innovative design which uses only synchronization free operations in the most common case of local allocations and deallocations while requiring minimal non blocking synchronization in the less common case of remote deallocations spatial locality at the cache and page level is favored by eliminating small object headers reducing allocator induced conflicts via contiguous allocation of page blocks in physical memory reducing allocator induced false sharing by using segregated heaps and achieving better tlb performance and fewer page faults via the use of superpages combining these locality optimizations with the drastic reduction of synchronization and latency overhead allows streamflow to perform comparably with optimized sequential allocators and outperform on a shared memory system with four two way smt processors four state of the art multiprocessor allocators by sizeable margins the allocation intensive sequential and parallel benchmarks used in our experiments represent a variety of behaviors including mostly local object allocation deallocation patterns and producer consumer allocation deallocation patterns
spatial joins are one of the most important operations for combining spatial objects of several relations in this paper spatial join processing is studied in detail for extended spatial objects in two dimensional data space we present an approach for spatial join processing that is based on three steps first a spatial join is performed on the minimum bounding rectangles of the objects returning a set of candidates various approaches for accelerating this step of join processing have been examined at last year’s conference bks in this paper we focus on the problem of how to compute the answers from the set of candidates which is handled by the following two steps first of all sophisticated approximations are used to identify answers as well as to filter out false hits from the set of candidates for this purpose we investigate various types of conservative and progressive approximations in the last step the exact geometry of the remaining candidates has to be tested against the join predicate the time required for computing spatial join predicates can essentially be reduced when objects are adequately organized in main memory in our approach objects are first decomposed into simple components which are exclusively organized by a main memory resident spatial data structure overall we present a complete approach to spatial join processing on complex spatial objects the performance of the individual steps of our approach is evaluated with data sets from real cartographic applications the results show that our approach reduces the total execution time of the spatial join by factors
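A minimal sketch of the filter-and-refine structure described above: the filter step tests minimum bounding rectangles for overlap to produce candidates, and the refinement step applies an exact geometric predicate to remove false hits. The rectangle layout, the toy geometries, and the stand-in exact predicate are invented, and the approximation-based intermediate step is omitted.

```python
def mbrs_intersect(a, b):
    # a, b: (xmin, ymin, xmax, ymax) minimum bounding rectangles
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(r, s, exact_predicate):
    # filter step: MBR intersection produces candidate pairs
    candidates = [(i, j) for i, (mbr_r, _) in enumerate(r)
                         for j, (mbr_s, _) in enumerate(s)
                         if mbrs_intersect(mbr_r, mbr_s)]
    # refinement step: the (possibly expensive) exact test removes false hits
    return [(i, j) for i, j in candidates if exact_predicate(r[i][1], s[j][1])]

# toy usage: geometries are just vertex lists; the "exact" test checks for a shared vertex
r = [((0, 0, 2, 2), [(1, 1)]), ((5, 5, 6, 6), [(5, 5)])]
s = [((1, 1, 3, 3), [(1, 1)]), ((8, 8, 9, 9), [(8, 8)])]
print(spatial_join(r, s, lambda g1, g2: bool(set(g1) & set(g2))))
```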
the effort required to complete software projects is often estimated completely or partially using the judgment of experts whose assessment may be biased in general such bias as there is seems to be towards estimates that are overly optimistic the degree of bias varies from expert to expert and seems to depend on both conscious and unconscious processes one possible approach to reduce this bias towards over optimism is to combine the judgments of several experts this paper describes an experiment in which experts with different backgrounds combined their estimates in a group discussion first software professionals were asked to provide individual estimates of the effort required for a software development project subsequently they formed five estimation groups each consisting of four experts each of these groups agreed on a project effort estimate via the pooling of knowledge in discussion we found that the groups submitted less optimistic estimates than the individuals interestingly the group discussion based estimates were closer to the effort expended on the actual project than the average of the individual expert estimates were ie the group discussions led to better estimates than mechanical averaging of the individual estimates the groups’ ability to identify a greater number of the activities required by the project is among the possible explanations for this reduction of bias
the problem of assessing the significance of data mining results on high dimensional datasets has been studied extensively in the literature for problems such as mining frequent sets and finding correlations significance testing can be done by standard statistical tests such as chi square or other methods however the results of such tests depend only on the specific attributes and not on the dataset as whole moreover the tests are difficult to apply to sets of patterns or other complex results of data mining algorithms in this article we consider simple randomization technique that deals with this shortcoming the approach consists of producing random datasets that have the same row and column margins as the given dataset computing the results of interest on the randomized instances and comparing them to the results on the actual data this randomization technique can be used to assess the results of many different types of data mining algorithms such as frequent sets clustering and spectral analysis to generate random datasets with given margins we use variations of markov chain approach which is based on simple swap operation we give theoretical results on the efficiency of different randomization methods and apply the swap randomization method to several well known datasets our results indicate that for some datasets the structure discovered by the data mining algorithms is expected given the row and column margins of the datasets while for other datasets the discovered structure conveys information that is not captured by the margin counts
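A small sketch of the swap operation itself, which is the building block of this randomization: flipping the four corners of a 2x2 submatrix with a specific 1/0 pattern leaves every row sum and column sum unchanged. The matrix and the number of swaps are toy values, and real uses would run a much longer Markov chain.

```python
import random

def swap_randomize(matrix, swaps, seed=0):
    # repeatedly apply the swap operation to a 0/1 matrix: pick rows r1, r2 and
    # columns c1, c2 with m[r1][c1] = m[r2][c2] = 1 and m[r1][c2] = m[r2][c1] = 0,
    # then flip all four entries; every swap preserves all row and column margins
    m = [row[:] for row in matrix]
    rng = random.Random(seed)
    n_rows, n_cols = len(m), len(m[0])
    done = 0
    while done < swaps:
        r1, r2 = rng.randrange(n_rows), rng.randrange(n_rows)
        c1, c2 = rng.randrange(n_cols), rng.randrange(n_cols)
        if m[r1][c1] == 1 and m[r2][c2] == 1 and m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
            done += 1
    return m

data = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]  # toy binary dataset
print(swap_randomize(data, swaps=10))
```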
hybrid consistency a new consistency condition for shared memory multiprocessors attempts to capture the guarantees provided by contemporary high performance architectures it combines the expressiveness of strong consistency conditions eg sequential consistency linearizability and the efficiency of weak consistency conditions eg pipelined ram causal memory memory access operations are classified as either strong or weak a global ordering of strong operations at different processes is guaranteed but there is very little guarantee on the ordering of weak operations at different processes except for what is implied by their interleaving with the strong operations a formal and precise definition of this condition is given an efficient implementation of hybrid consistency on distributed memory machines is presented in this implementation weak operations are executed instantaneously while the response time for strong operations is linear in the network delay it is proven that this is within a constant factor of the optimal time bounds to motivate hybrid consistency it is shown that weakly consistent memories do not support non cooperative in particular non centralized algorithms for mutual exclusion
the design of ad hoc mobile applications often requires the availability of a consistent view of the application state among the participating hosts such views are important because they simplify both the programming and verification tasks we argue that preventing the occurrence of unannounced disconnection is essential to constructing and maintaining a consistent view in the ad hoc mobile environment in this light we provide the specification for a partitionable group membership service supporting ad hoc mobile applications and propose a protocol for implementing the service a unique property of this partitionable group membership is that messages sent between group members are guaranteed to be delivered successfully given appropriate system assumptions this property is preserved over time despite movement and frequent disconnections the protocol splits and merges groups and maintains a logical connectivity graph based on a notion of safe distance an implementation of the protocol in java is available for testing this work is used in an implementation of lime a middleware for mobility that supports transparent sharing of data in both wired and ad hoc wireless environments
traditional indexes aim at optimizing the node accesses during query processing which however does not necessarily minimize the total cost due to the possibly large number of random accesses in this paper we propose general framework for adaptive indexes that improve overall query cost the performance gain is achieved by allowing index nodes to contain variable number of disk pages update algorithms dynamically re structure adaptive indexes depending on the data and query characteristics extensive experiments show that adaptive and trees significantly outperform their conventional counterparts while incurring minimal update overhead
the service oriented architecture enables the development of flexible large scale applications in open environments by dynamically combining web services nevertheless current techniques fail to address the problem of selecting adequate services to meet service consumer needs service selection must take into account non functional parameters especially the quality of service qos in this work we propose a web service selection approach based on qos attributes we extend ws policy to represent qos policies we apply ontological concepts to ws policy in order to enable semantic matching publishing qos policies is also examined in this work we propose to extend the uddi registry to handle qos based policies
cascades of boosted ensembles have become popular in the object detection community following their highly successful introduction in the face detector of viola and jones since then researchers have sought to improve upon the original approach by incorporating new methods along variety of axes eg alternative boosting methods feature sets etc nevertheless key decisions about how many hypotheses to include in an ensemble and the appropriate balance of detection and false positive rates in the individual stages are often made by user intervention or by an automatic method that produces unnecessarily slow detectors we propose novel method for making these decisions which exploits the shape of the stage roc curves in ways that have been previously ignored the result is detector that is significantly faster than the one produced by the standard automatic method when this algorithm is combined with recycling method for reusing the outputs of early stages in later ones and with retracing method that inserts new early rejection points in the cascade the detection speed matches that of the best hand crafted detector we also exploit joint distributions over several features in weak learning to improve overall detector accuracy and explore ways to improve training time by aggressively filtering features
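As a point of reference for the stage-level decisions discussed above, the sketch below shows the conventional way a single cascade stage threshold is chosen from validation scores to hit a target per-stage detection rate, along with the false positive rate that choice implies; it is not the ROC-shape-based method proposed in the abstract, and the scores are fabricated.

```python
def stage_threshold(pos_scores, neg_scores, target_detection_rate):
    # choose the highest threshold that still accepts at least the target fraction
    # of positive windows, then report the fraction of negatives that slip through
    pos = sorted(pos_scores)
    n = len(pos)
    best = pos[0]  # accepting everything always meets the target
    for k in range(n):  # k = number of lowest-scoring positives we would reject
        if (n - k) / n >= target_detection_rate:
            best = pos[k]
    detection = sum(s >= best for s in pos_scores) / n
    false_positive = sum(s >= best for s in neg_scores) / len(neg_scores)
    return best, detection, false_positive

positives = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2, 0.95, 0.85]
negatives = [0.5, 0.45, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.6]
print(stage_threshold(positives, negatives, target_detection_rate=0.9))
```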
in today’s world where distributed systems form many of our critical infrastructures dependability outages are becoming increasingly common in many situations it is necessary to not just detect a failure but also to diagnose the failure ie to identify the source of the failure diagnosis is challenging since high throughput applications with frequent interactions between the different components allow fast error propagation it is desirable to consider applications as black boxes for the diagnostic process in this paper we propose a monitor architecture for diagnosing failures in large scale network protocols the monitor only observes the message exchanges between the protocol entities pes remotely and does not access internal protocol state at runtime it builds a causal graph between the pes based on their communication and uses this together with a rule base of allowed state transition paths to diagnose the failure the tests used for the diagnosis are based on the rule base and are assumed to have imperfect coverage the hierarchical monitor framework allows distributed diagnosis handling failures at individual monitors the framework is implemented and applied to a reliable multicast protocol executing on our campus wide network fault injection experiments are carried out to evaluate the accuracy and latency of the diagnosis
the web has become the primary medium for accessing information and for conducting many types of online transactions including shopping paying bills making travel plans etc the primary mode of interaction over the web is via graphical browsers designed for visual navigation sighted users can visually segment web pages and quickly identify relevant information on the contrary screen readers the dominant assistive technology used by visually impaired individuals function by speaking out the screen’s content serially consequently users with visual impairments are forced to listen to the information in web pages sequentially thereby experiencing considerable information overload this problem becomes even more prominent when conducting online transactions that often involve a number of steps spanning several pages thus there is a large gap in web accessibility between individuals with visual impairments and their sighted counterparts in this paper we describe our ongoing work on this problem we have developed several techniques that synergistically couple web content analysis the user’s browsing context process modeling and machine learning to bridge this divide these techniques include context directed browsing that uses link context to find relevant information as users move from page to page change detection that separates the interface from the implementation of web pages and helps users find relevant information in changing web content and process modeling that helps users find concepts relevant in web transactions we describe these three techniques within the context of our hearsay non visual web browser
accurate identification of network applications is important for many network activities the traditional port based technique has become much less effective since many new applications no longer use well known fixed port numbers in this paper we propose a novel profile based approach to identifying traffic flows belonging to the target application in contrast to the method used in previous studies of classifying traffic based on statistics of individual flows we build behavioral profiles of the target application which describe dominant patterns in the application based on the behavior profiles a two level matching method is used to identify new traffic we first determine whether a host participates in the target application by comparing its behavior with the profiles subsequently we compare each flow of the host with those patterns in the application profiles to determine which flows belong to this application we demonstrate the effectiveness of our method on campus traffic traces our results show that one can identify popular p2p applications with very high accuracy
this paper presents novel adaptive voltage scheme based on lookahead circuit that checks the transmitter buffer for data transitions the advanced knowledge of incoming data patterns is used to adjust the link swing voltage improving delay and energy performance in the presented example system transition detection circuit is used to check the transmitter buffer for rising transitions in cycle in cycle when rising transition is detected higher supply voltage is applied to the driver for small portion of the clock cycle to boost the rising edge delay improving link performance lower voltage is used for all other transmissions improving the delay performance of falling edge transitions and the link energy dissipation for ghz link frequency the proposed approach improves energy dissipation by compared to traditional two inverter buffer an energy savings of up to is achieved compared to previously proposed dual voltage scheme
searching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases recommendation systems help users find items of interest of particular type such as movies or restaurants but are still somewhat awkward to use our solution is to take advantage of the complementary strengths of personalized recommendation systems and dialogue systems creating personalized aides we present system the adaptive place advisor that treats item selection as an interactive conversational process with the program inquiring about item attributes and the user responding individual long term user preferences are unobtrusively obtained in the course of normal recommendation dialogues and used to direct future conversations with the same user we present novel user model that influences both item search and the questions asked during conversation we demonstrate the effectiveness of our system in significantly reducing the time and number of interactions required to find satisfactory item as compared to control group of users interacting with non adaptive version of the system
empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance in contrast conventional compilers use models of programs and machines to choose these parameters it is widely believed that model driven optimization does not compete with empirical optimization but few quantitative comparisons have been done to date to make such a comparison we replaced the empirical optimization engine in atlas a system for generating a dense numerical linear algebra library called the blas with a model driven optimization engine that used detailed models to estimate values for optimization parameters and then measured the relative performance of the two systems on three different hardware platforms our experiments show that model driven optimization can be surprisingly effective and can generate code whose performance is comparable to that of code generated by empirical optimizers for the blas
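The contrast between the two approaches can be illustrated for a single parameter, the tile size of a blocked matrix multiply: the "empirical" path times a few candidate values on the machine, while the "model" path derives one value from an assumed cache capacity. This is a toy analogy, not ATLAS's search or the paper's models; the cache size, element size, matrix size, and candidate list are assumptions.

```python
import random
import time

def blocked_matmul(a, b, n, tile):
    # straightforward tiled matrix multiply of two n x n matrices
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

def empirical_tile(n, candidates):
    # empirical selection: time each candidate tile size on random inputs
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for t in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, n, t)
        timings[t] = time.perf_counter() - start
    return min(timings, key=timings.get)

def model_tile(cache_bytes=32 * 1024, elem_bytes=8):
    # model-driven selection: largest tile such that three tiles fit in an assumed L1 cache
    t = 1
    while 3 * (t + 1) ** 2 * elem_bytes <= cache_bytes:
        t += 1
    return t

print("model-chosen tile:", model_tile())
print("empirically chosen tile:", empirical_tile(n=96, candidates=[8, 16, 32]))
```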
type systems for secure information flow aim to prevent a program from leaking information from variables classified as high to variables classified as low in this work we extend such a type system to address encryption and decryption our intuition is that encrypting a high plaintext yields a low ciphertext we argue that well typed polynomial time programs in our system satisfy a computational probabilistic noninterference property provided that the encryption scheme is ind cca secure as part of our proof we first consider secure information flow in a language with a random assignment operator but no encryption we establish a result that may be of independent interest namely that well typed probabilistically total programs with random assignments satisfy probabilistic noninterference we establish this result using a weak probabilistic bisimulation
novel and very simple correct by construction top down methodology for high utilization mixed size placement is presented the polarbear algorithm combines recursive cut size driven partitioning with fast and scalable legalization of every placement subproblem generated by every partitioning the feedback provided by the legalizer at all stages of partitioning improves final placement quality significantly on standard ibm benchmarks and dramatically on low white space adaptations of them compared to feng shui and capo polarbear is the only tool that can consistently find high quality placements for benchmarks with less than white space with white space at polarbear beats capo by in average total wirelength while feng shui frequently fails to find legal placements altogether with white space polarbear still beats capo by and feng shui by in average total wirelength in comparable run times
the choice of an interconnection network for parallel computer depends on large number of performance factors which are very often application dependent we propose performance evaluation and comparison methodology this methodology is applied to recently introduced class of interconnection networks multistage chordal ring based multistage interconnection network these networks are compared to the well known omega network of comparable architectural characteristics the methodology is expected to serve in the evaluation of the use of multistage interconnection networks as an intercommunication medium in today’s multi processor systems
we propose a technique for measuring the structural similarity of semistructured documents based on entropy after extracting the structural information from two documents we use either ziv lempel encoding or ziv merhav cross parsing to determine the entropy and consequently the similarity between the documents to the best of our knowledge this is the first true linear time approach for evaluating structural similarity in an experimental evaluation we demonstrate that the results of our algorithm in terms of clustering quality are on par with or even better than existing approaches
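The entropy intuition can be approximated with an off-the-shelf compressor; the sketch below uses zlib's LZ77-based compression in a normalized compression distance over structural serializations, which is only an approximation of the idea rather than the Ziv-Lempel or Ziv-Merhav procedures used in the paper. The extract_structure helper and the tag sequences are hypothetical stand-ins for whatever structural encoding is actually extracted.

```python
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def structural_distance(struct_a: str, struct_b: str) -> float:
    # normalized compression distance over the structural serializations:
    # small when compressing one document's structure helps compress the other's
    a, b = struct_a.encode(), struct_b.encode()
    ca, cb, cab = compressed_size(a), compressed_size(b), compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def extract_structure(tags):
    # placeholder: a document's structure as a flat sequence of element tags
    return "/".join(tags)

# toy tag sequences; real documents give more meaningful distances
doc1 = extract_structure(["html", "body", "div", "p", "p", "div", "p"])
doc2 = extract_structure(["html", "body", "div", "p", "p", "div", "p", "p"])
doc3 = extract_structure(["rss", "channel", "item", "title", "link"])
print(structural_distance(doc1, doc2), structural_distance(doc1, doc3))
```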
systemc based design methodology has been widely adopted for heterogeneous multiprocessor system on chip mpsoc design however systemc is hardware oriented language and it is not the standard language used by designers to specify complex applications at algorithm level on the other hand simulink is popular choice for algorithm designer to specify complex system but there are few design tools to implement simulink models on mpsoc to deal with the increasing complexity of embedded applications and mpsoc architectures concurrent hardware software design and verification at different abstraction levels is an essential technique in this paper we present simulink systemc based multiprocessor soc design flow that enables mixed hardware software refinement and simulation at different abstraction levels in addition to opening new facilities like communication mapping exploration and interconnection component refinement we applied the proposed approach for software and communication architecture refinement for three multimedia applications mp motion jpeg and
an intrusion detection system ids can be key component of security incident response within organizations traditionally intrusion detection research has focused on improving the accuracy of idss but recent work has recognized the need to support the security practitioners who receive the ids alarms and investigate suspected incidents to examine the challenges associated with deploying and maintaining an ids we analyzed interviews with it security practitioners who have worked with idss and performed participatory observations in an organization deploying network ids we had three main research questions what do security practitioners expect from an ids what difficulties do they encounter when installing and configuring an ids and how can the usability of an ids be improved our analysis reveals both positive and negative perceptions that security practitioners have for idss as well as several issues encountered during the initial stages of ids deployment in particular practitioners found it difficult to decide where to place the ids and how to best configure it for use within distributed environment with multiple stakeholders we provide recommendations for tool support to help mitigate these challenges and reduce the effort of introducing an ids within an organization
desktop grids have evolved to combine peer to peer and grid computing techniques to improve the robustness reliability and scalability of job execution infrastructures however efficiently matching incoming jobs to available system resources and achieving good load balance in fully decentralized and heterogeneous computing environment is challenging problem in this paper we extend our prior work with new decentralized algorithm for maintaining approximate global load information and job pushing mechanism that uses the global information to push jobs towards underutilized portions of the system the resulting system more effectively balances load and improves overall system throughput through comparative analysis of experimental results across different system configurations and job profiles performed via simulation we show that our system can reliably execute grid applications on distributed set of resources both with low cost and with good load balance
in this paper we present an example based motion synthesis technique that generates continuous streams of high fidelity controllable motion for interactive applications such as video games our method uses new data structure called parametric motion graph to describe valid ways of generating linear blend transitions between motion clips dynamically generated through parametric synthesis in realtime our system specifically uses blending based parametric synthesis to accurately generate any motion clip from an entire space of motions by blending together examples from that space the key to our technique is using sampling methods to identify and represent good transitions between these spaces of motion parameterized by continuously valued parameter this approach allows parametric motion graphs to be constructed with little user effort because parametric motion graphs organize all motions of particular type such as reaching to different locations on shelf using single parameterized graph node they are highly structured facilitating fast decision making for interactive character control we have successfully created interactive characters that perform sequences of requested actions such as cartwheeling or punching
there is growing evidence that for wide variety of database workloads and system configurations locking based concurrency control outperforms other types of concurrency control strategies however in the presence of increased data contention locking protocols such as two phase locking perform poorly in this paper we analyze family of locking based protocols that employ new relationship between locks called ordered sharing using centralized database simulation model we demonstrate that these protocols exhibit comparable performance to that of traditional locking based protocols when data contention is low and they exhibit superior performance when data contention is high furthermore we show that the performance of these protocols improves as resources become more plentiful this is particularly significant because the performance of two phase locking degrades as result of data contention not resource contention thus introducing additional resources improves the performance of the proposed protocols though it does not benefit two phase locking significantly
the aim of this paper is to establish a methodological foundation for human computer interaction hci researchers aiming to assess trust between people interacting via computer mediated communication cmc technology the most popular experimental paradigm currently employed by hci researchers is social dilemma games based on the prisoner’s dilemma pd a technique originating from economics hci researchers employing this experimental paradigm currently interpret the rate of cooperation measured in the form of collective pay off as the level of trust the technology allows its users to develop we argue that this interpretation is problematic since the game’s synchronous nature models only very specific trust situations furthermore experiments that are based on pd games cannot model the complexity of how trust is formed in the real world since they neglect factors such as ability and benevolence in conclusion we recommend means of improving social dilemma experiments by using asynchronous trust games collecting a broader range of data in particular qualitative data and increased use of longitudinal studies
the association rule mining one of the most popular data mining techniques is to find the frequent itemsets which occur commonly in transaction database of the various association algorithms the apriori is the most popular one and its implementation technique to improve the performance has been continuously developed during the past decade in this paper we propose bitmap based association rule technique called bar in order to drastically improve the performance of the apriori algorithm compared to the latest apriori implementation our approach can improve the performance by nearly up to two orders of magnitude this gain comes mainly from the following characteristics of bar bitmap based implementation paradigm reduction of redundant bitmap and operations and an efficient implementation of bitmap and and bit counting operation by exploiting the advanced cpu technology including simd and sw prefetching we will describe the basic concept of bar approach and its optimization techniques and will show through experimental results how each of the above characteristics of bar can contribute the performance improvement
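The core bitmap idea can be shown in a few lines: each item gets a bitmap with one bit per transaction, and the support of an itemset is the population count of the AND of its items' bitmaps. This only illustrates the representation; the redundant-operation elimination and SIMD/prefetching optimizations mentioned above are out of scope, and the transactions are made up. Python's arbitrary-precision ints serve as the bitmaps.

```python
def build_bitmaps(transactions):
    # bit t of bitmaps[item] is set iff item appears in transaction t
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps

def support(itemset, bitmaps):
    # support of a non-empty itemset = popcount of the AND of its bitmaps
    items = list(itemset)
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")

txs = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
bm = build_bitmaps(txs)
print(support(["a", "c"], bm), support(["a", "b", "c"], bm))  # 3 2
```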
instruction schedulers employ code motion as means of instruction reordering to enable scheduling of instructions at points where the resources required for their execution are available in addition driven by the profiling data schedulers take advantage of predication and speculation for aggressive code motion across conditional branches optimization algorithms for partial dead code elimination pde and partial redundancy elimination pre employ code sinking and hoisting to enable optimization however unlike instruction scheduling these optimization algorithms are unaware of resource availability and are incapable of exploiting profiling information speculation and predication in this paper we develop data flow algorithms for performing the above optimizations with the following characteristics opportunities for pre and pde enabled by hoisting and sinking are exploited ii hoisting and sinking of code statement is driven by availability of functional unit resources iii predication and speculation is incorporated to allow aggressive hoisting and sinking and iv path profile information guides predication and speculation to enable optimization
an important problem in many computer vision tasks is the separation of an object from its background one common strategy is to estimate appearance models of the object and background region however if the appearance is spatially varying simple homogeneous models are often inaccurate gaussian mixture models can take multimodal distributions into account yet they still neglect the positional information in this paper we propose localised mixture models lmms and evaluate this idea in the scope of model based tracking by automatically partitioning the fore and background into several subregions in contrast to background subtraction methods this approach also allows for moving backgrounds experiments with rigid object and the humaneva ii benchmark show that tracking is remarkably stabilised by the new model
we study a generalization of the $k$-median problem with respect to an arbitrary dissimilarity measure $D$ given a finite set $P$ our goal is to find a set $C$ of size $k$ such that the sum of errors $\sum_{p \in P} \min_{c \in C} D(p, c)$ is minimized the main result in this paper can be stated as follows there exists a $(1+\epsilon)$-approximation algorithm for the $k$-median problem with respect to $D$ with running time linear in $n$ for fixed $k$ and $\epsilon$ if the $1$-median problem can be approximated within a factor of $(1+\epsilon)$ by taking a random sample of constant size and solving the $1$-median problem on the sample exactly using this characterization we obtain the first linear time $(1+\epsilon)$-approximation algorithms for the $k$-median problem in an arbitrary metric space with bounded doubling dimension for the kullback leibler divergence relative entropy for mahalanobis distances and for some special cases of bregman divergences moreover we obtain previously known results for the euclidean $k$-median problem and the euclidean $k$-means problem in a simplified manner our results are based on a new analysis of an algorithm from
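A toy instance of the sampling idea for the special case k = 1 under Euclidean distance: solve the median problem on a small random sample (here simply by trying every sample point as the center) and evaluate the chosen center on the full point set. It is meant only to make the "random sample of constant size" condition concrete and is not the paper's algorithm or analysis; the data are synthetic.

```python
import math
import random

def cost(points, center):
    # 1-median objective: sum of Euclidean distances to the chosen center
    return sum(math.dist(p, center) for p in points)

def sample_median(points, sample_size, seed=0):
    # approximate the 1-median: pick the best center among a small random sample
    sample = random.Random(seed).sample(points, sample_size)
    return min(sample, key=lambda c: cost(sample, c))

random.seed(1)
cluster = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
center = sample_median(cluster, sample_size=20)
print(center, cost(cluster, center))
```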
instruction set customization is an effective way to improve processor performance critical portions of application data flow graphs are collapsed for accelerated execution on specialized hardware collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file while custom instructions can be effective the time and cost of designing a new processor for each application is immense to overcome this roadblock this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general purpose processor hardware accelerators are added to the processor to execute the collapsed subgraphs a simple microarchitectural interface is provided to support a plug and play model for integrating a wide range of accelerators into a pre designed and verified processor core the accelerators are exploited using an approach of static identification and dynamic realization the compiler is responsible for identifying profitable subgraphs while the hardware handles discovery mapping and execution of compatible subgraphs this paper presents the design of a plug and play transparent accelerator system and evaluates the cost performance implications of the design
this paper discusses some of the issues involved in implementing a shared address space programming model on large scale distributed memory multiprocessors while such a programming model can be implemented on both shared memory and message passing architectures we argue that the transparent coherent caching of global data provided by many shared memory architectures is of crucial importance because message passing mechanisms are much more efficient than shared memory loads and stores for certain types of interprocessor communication and synchronization operations however we argue for building multiprocessors that efficiently support both shared memory and message passing mechanisms we describe an architecture alewife that integrates support for shared memory and message passing through a simple interface we expect the compiler and runtime system to cooperate in using appropriate hardware mechanisms that are most efficient for specific operations we report on both integrated and exclusively shared memory implementations of our runtime system and two applications the integrated runtime system drastically cuts down the cost of communication incurred by the scheduling load balancing and certain synchronization operations we also present preliminary performance results comparing the two systems
in this paper we present a hybrid modeling framework for creating complex objects incrementally our system relies on an extended csg tree that assembles skeletal implicit primitives triangle meshes and point set models in a coherent fashion we call this structure the hybridtree editing operations are performed by exploiting the complementary abilities of implicit and polygonal mesh surface representations in a completely transparent way for the user implicit surfaces are powerful for combining shapes with boolean and blending operations while triangle meshes are well suited for local deformations such as ffd and fast visualization our system can handle point sampled geometry through a mesh surface reconstruction algorithm the hybridtree may be evaluated through four kinds of queries depending on whether the implicit or explicit formulation is required field function and gradient at a given point in space point membership classification and polygonization every kind of query is achieved automatically in a specific and optimized fashion for every node of the hybridtree
this paper proposes a lightweight fusion method for general recursive function definitions compared with existing proposals our method has several significant practical features it works for general recursive functions on general algebraic data types it does not produce extra runtime overhead except for possible code size increase due to the success of fusion and it is readily incorporated in standard inlining optimization this is achieved by extending the ordinary inlining process with a new fusion law that transforms a term of the form f (fix g (λx. e)) to a new fixed point term fix h (λx. e′) by promoting the function f through the fixed point operator this is a sound syntactic transformation rule that is not sensitive to the types of f and g this property makes our method applicable to a wide range of functions including those with multiple parameters in both curried and uncurried forms although this method does not guarantee any form of completeness it fuses typical examples discussed in the literature and others that involve accumulating parameters either in foldl like specific forms or in general recursive forms without any additional machinery in order to substantiate our claim we have implemented our method in a compiler although it is preliminary it demonstrates practical feasibility of this method
program slicing has been mainly studied in the context of imperative languages where it has been applied to many software engineering tasks like program understanding maintenance debugging testing code reuse etc this paper introduces the first forward slicing technique for multi paradigm declarative programs in particular we show how program slicing can be defined in terms of online partial evaluation our approach clarifies the relation between both methodologies and provides simple way to develop program slicing tools from existing partial evaluators
in this paper we propose multimedia categorization framework that is able to exploit information across different parts of multimedia document eg web page pdf microsoft office document for example web news page is composed of text describing some event eg car accident and picture containing additional information regarding the real extent of the event eg how damaged the car is or providing evidence corroborating the text part the framework handles multimedia information by considering not only the document’s text and image data but also the layout structure which determines how given text block is related to particular image the novelties and contributions of the proposed framework are support of heterogeneous types of multimedia documents document graph representation method and the computation of cross media correlations moreover we applied the framework to the tasks of categorising web news feed data and our results show significant improvement over single medium based framework
recent years have seen significant increase in the usage of computers and their capabilities to communicate with each other with this has come the need for more security and firewalls have proved themselves an important piece of the overall architecture as the body of rules they implement actually realises the security policy of their owners unfortunately there is little help for their administrators to understand the actual meaning of the firewall rules this work shows that formal logic is an important tool in this respect because it is particularly apt at modelling real world situations and its formalism is conducive to reasoning about such model as consequence logic may be used to prove the properties of the models it represents and is sensible way to go in order to create those models on computers to automate such activities we describe here prototype which includes description of network and the body of firewall rules applied to its components we were able to detect number of anomalies within the rule set inexistent elements eg hosts or services on destination components redundancies in rules defining the same action for network and hosts belonging to it irrelevance as rules would involve traffic that would not pass through filtering device and contradiction in actions applied to elements or to network and its hosts the prototype produces actual firewall rules as well generated from the model and expressed in the syntax of ipchains and cisco’s pix
multimedia on demand mod has grown dramatically in popularity especially in the domains of education business and entertainment therefore the investigation of various alternatives to improve the performance of mod servers has become major research focus the performance of these servers can be enhanced significantly by servicing multiple requests from common set of resources the exploited degrees of resource sharing depend greatly on how servers schedule the waiting requests by scheduling the requests intelligently server can support more concurrent customers and can reduce their waiting times for service in this paper we provide detailed analysis of existing scheduling policies and propose two new policies called quantized first come first serve qfcfs and enhanced minimum idling maximum loss iml we demonstrate the effectiveness of these policies through simulation and show that they suit different patterns of customer waiting tolerance
in recent years many tone mapping operators tmos have been presented in order to display high dynamic range images hdri on typical display devices tmos compress the luminance range while trying to maintain contrast the dual of tone mapping inverse tone mapping expands low dynamic range image ldri into hdri hdris contain broader range of physical values that can be perceived by the human visual system the majority of today’s media is stored in low dynamic range inverse tone mapping operators itmos could thus potentially revive all of this content for use in high dynamic range display and image based lighting we propose an approximate solution to this problem that uses median cut to find the areas considered of high luminance and subsequently apply density estimation to generate an expand map in order to extend the range in the high luminance areas using an inverse photographic tone reproduction operator
this paper describes deskjockey system to provide users with additional display space by projecting information on passive physical surfaces in the environment the current deskjockey prototype utilizes projected desk and allows information to be moved easily between active and passive displays using world in miniature interaction metaphor four week in situ field study was conducted to compare usage of deskjockey with typical multiple monitor use the results revealed potential for utilizing passive physical surfaces in this manner and demonstrated that this type of display space has distinctive affordances and benefits which enhance traditional display space
the study reported here tested the efficacy of an information retrieval system output summary and visualization scheme for undergraduates taking vietnam war history who were in kuhlthau’s stage of researching history essay the visualization scheme consisted of the undergraduate’s own visualization of his or her essay topic drawn by the student on the bottom half of sheet of paper and visualization of the information space determined by index term counting on the top half of the same page to test the visualization scheme students enrolled in vietnam war history course were randomly assigned to either the visualization scheme group who received high recall search output or the nonvisualization group who received high precision search output the dependent variable was the mark awarded the essay by the course instructor there was no significant difference between the mean marks for the two groups we were pleasantly surprised with this result given the bad reputation of high recall as practical search strategy we hypothesize that more proactive visualization system is needed that takes the student through the process of using the visualization scheme including steps that induce student cognition about task subject objectives
this paper describes the scalable hyperlink store distributed in memory database for storing large portions of the web graph shs is an enabler for research on structural properties of the web graph as well as new link based ranking algorithms previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs by contrast this work focuses on the systems issues of building such database specifically it describes how to build hyperlink database that is fast scalable fault tolerant and incrementally updateable
fluid models of ip networks are based on set of ordinary differential equations that provide an abstract deterministic description of the average network dynamics when ip networks operate close to saturation fluid models were proved to provide reliable performance estimates instead when the network load is well below saturation standard fluid models lead to wrong performance predictions since all buffers are forecasted to be always empty so that the packet discard probability is predicted to be zero these incorrect predictions are due to the fact that fluid models being deterministic in nature do not account for the random traffic variations that may induce temporary congestion of some network elements in this paper we discuss three different approaches to describe random traffic variations in fluid models considering randomness at both the flow and packet levels with these approaches fluid models allow reliable results to be obtained also in the case of ip networks that operate well below their saturation load numerical results are presented to prove the accuracy and the versatility of the proposed approaches considering both stationary and non stationary traffic regimes
calculating with graphs and relations has many applications in the analysis of software systems for example the detection of design patterns or patterns of problematic design and the computation of design metrics these applications require an expressive query language in particular for the detection of graph patterns and an efficient evaluation of the queries even for large graphs in this paper we introduce rml simple language for querying and manipulating relations based on predicate calculus and crocopat an interpreter for rml programs rml is general because it enables the manipulation not only of graphs ie binary relations but of relations of arbitrary arity crocopat executes rml programs efficiently because it internally represents relations as binary decision diagrams data structure that is well known as compact representation of large relations in computer aided verification we evaluate rml by giving example programs for several software analyses and crocopat by comparing its performance with calculators for binary relations prolog system and relational database management system
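a plain set sketch of the kind of relational queries such language expresses crocopat itself stores relations as binary decision diagrams so this only shows the operations not the data structure and the call graph below is invented

# a binary relation over program entities, e.g. a call graph
calls = {("main", "parse"), ("parse", "lex"), ("main", "report")}

def compose(r, s):
    # relational composition: (a, c) whenever (a, b) in r and (b, c) in s
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def transitive_closure(r):
    tc = set(r)
    while True:
        new = tc | compose(tc, r)
        if new == tc:
            return tc
        tc = new

# example query: all functions reachable from main
reachable_from_main = {b for (a, b) in transitive_closure(calls) if a == "main"}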
submodular function maximization is central problem in combinatorial optimization generalizing many important problems including max cut in directed undirected graphs and in hypergraphs certain constraint satisfaction problems maximum entropy sampling and maximum facility location problems unlike submodular minimization submodular maximization is np hard in this paper we give the first constant factor approximation algorithm for maximizing any non negative submodular function subject to multiple matroid or knapsack constraints we emphasize that our results are for non monotone submodular functions in particular for any constant we present approximation for the submodular maximization problem under matroid constraints and approximation algorithm for this problem subject to knapsack constraints is any constant we improve the approximation guarantee of our algorithm to for partition matroid constraints this idea also gives approximation for maximizing monotone submodular function subject to partition matroids which improves over the previously best known guarantee of
usenet is popular distributed messaging and file sharing service servers in usenet flood articles over an overlay network to fully replicate articles across all servers however replication of usenet’s full content requires that each server pay the cost of receiving and storing over tbyte day this paper presents the design and implementation of usenetdht usenet system that allows set of cooperating sites to keep shared distributed copy of usenet articles usenetdht consists of client facing usenet nntp front ends and distributed hash table dht that provides shared storage of articles across the wide area this design allows participating sites to partition the storage burden rather than replicating all usenet articles at all sites usenetdht requires dht that maintains durability despite transient and permanent failures and provides high storage performance these goals can be difficult to provide simultaneously even in the absence of failures verifying adequate replication levels of large numbers of objects can be resource intensive and interfere with normal operations this paper introduces passing tone new replica maintenance algorithm for dhash that minimizes the impact of monitoring replication levels on memory and disk resources by operating with only pairwise communication passing tone’s implementation provides performance by using data structures that avoid disk accesses and enable batch operations microbenchmarks over local gigabit network demonstrate that the total system throughput scales linearly as servers are added providing mbyte of write bandwidth and mbyte of read bandwidth per server usenetdht is currently deployed on server network at sites running passing tone over the wide area this network supports our research laboratory’s live mbyte usenet feed and mbyte of synthetic read traffic these results suggest dht based design may be viable way to redesign usenet and globally reduce costs
as the internet infrastructure grows to support variety of services its legacy protocols are being overloaded with new functions such as traffic engineering today operators engineer such capabilities through clever but manual parameter tuning in this paper we propose back end support tool for large scale parameter configuration that is based on efficient parameter state space search techniques and on line simulation the framework is useful when the network protocol performance is sensitive to its parameter settings and its performance can be reasonably modeled in simulation in particular our system imports the network topology relevant protocol models and latest monitored traffic patterns into simulation that runs on line in network operations center noc each simulation evaluates the network performance for particular setting of protocol parameters we propose an efficient large dimensional parameter state space search technique called recursive random search rrs each sample point chosen by rrs results in single simulation an important feature of this framework is its flexibility it allows arbitrary choices in terms of the simulation engines used eg ns ssfnet network protocols to be simulated eg ospf bgp and in the specification of the optimization objectives we demonstrate the flexibility and relevance of this framework in three scenarios joint tuning of the red buffer management parameters at multiple bottlenecks traffic engineering using ospf link weight tuning and outbound load balancing of traffic at peering transit points using bgp localpref parameter
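a rough sketch of the flavor of recursive random search sample the whole parameter box then repeatedly re sample inside a shrinking box around the best point found this is a simplification for illustration not the published rrs algorithm and every name and constant is ours

import random

def rrs(objective, bounds, samples=50, rounds=6, shrink=0.5):
    # start from one random point in the full parameter box
    best_x = [random.uniform(lo, hi) for lo, hi in bounds]
    best_y = objective(best_x)
    box = list(bounds)
    for _ in range(rounds):
        for _ in range(samples):
            x = [random.uniform(lo, hi) for lo, hi in box]
            y = objective(x)            # each sample would be one simulation run
            if y < best_y:
                best_x, best_y = x, y
        # recenter a smaller box around the best point found so far
        box = [(max(lo, bx - (hi - lo) * shrink / 2),
                min(hi, bx + (hi - lo) * shrink / 2))
               for (lo, hi), bx in zip(box, best_x)]
    return best_x, best_y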
this paper describes goal centric approach for effectively maintaining critical system qualities such as security performance and usability throughout the lifetime of software system in goal centric traceability gct non functional requirements and their interdependencies are modeled as softgoals in softgoal interdependency graph sig probabilistic network model is then used to dynamically retrieve links between classes affected by functional change and elements within the sig these links enable developers to identify potentially impacted goals to analyze the level of impact on those goals to make informed decisions concerning the implementation of the proposed change and finally to develop appropriate risk mitigating strategies this paper also reports experimental results for the link retrieval and illustrates the gct process through an example of change applied to road management system
as system on chip soc designs become more complex it becomes increasingly harder to design communication architectures which satisfy design constraints manually traversing the vast communication design space for constraint driven synthesis is not feasible anymore in this paper we propose an approach that automates the synthesis of bus based communication architectures for systems characterized by possibly several throughput constraints our approach accurately and effectively prunes the large communication design space to synthesize feasible low cost bus architecture which satisfies the constraints in design
the multitasking virtual machine called from now on simply mvm is modification of the java virtual machine it enables safe secure and scalable multitasking safety is achieved by strict isolation of applications from one another resource control augments security by preventing some denial of service attacks improved scalability results from an aggressive application of the main design principle of mvm share as much of the runtime as possible among applications and replicate everything else the system can be described as no compromise approach all the known apis and mechanisms of the java programming language are available to applications mvm is implemented as series of carefully tuned modifications to the java hotspot virtual machine including the dynamic compiler this paper presents the design of mvm focusing on several novel and general techniques an in runtime design of lightweight isolation an extension of copying generational garbage collector to provide best effort management of portion of the heap space and transparent and automated mechanism for safe execution of user level native code mvm demonstrates that multitasking in safe language can be accomplished with high degree of protection without constraining the language and with competitive performance characteristics
block sorting is an innovative compression mechanism introduced in by burrows and wheeler it involves three steps permuting the input one block at time through the use of the burrows wheeler transform bwt applying move to front mtf transform to each of the permuted blocks and then entropy coding the output with huffman or arithmetic coder until now block sorting implementations have assumed that the input message is sequence of characters in this paper we extend the block sorting mechanism to word based models we also consider other transformations as an alternative to mtf and are able to show improved compression results compared to mtf for large files of text the combination of word based modelling bwt and mtf like transformations allows excellent compression effectiveness to be attained within reasonable resource costs
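a minimal character level sketch of the first two block sorting steps that the paper generalizes to word based models entropy coding is omitted and the rotation based bwt is quadratic for clarity rather than the suffix based implementations used in practice

def bwt(s, end="\0"):
    s = s + end                                   # unique terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)      # last column of sorted rotations

def mtf(data, alphabet):
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))             # move the symbol to the front
    return out

permuted = bwt("banana")
ranks = mtf(permuted, sorted(set(permuted)))      # small integers skewed toward zero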
statistical machine translation systems are usually trained on large amounts of bilingual text used to learn translation model and also large amounts of monolingual text in the target language used to train language model in this article we explore the use of semi supervised model adaptation methods for the effective use of monolingual data from the source language in order to improve translation quality we propose several algorithms with this aim and present the strengths and weaknesses of each one we present detailed experimental evaluations on the french english europarl data set and on data from the nist chinese english large data track we show significant improvement in translation quality on both tasks
software verification is an important and difficult problem many static checking techniques for software require annotations from the programmer in the form of method specifications and loop invariants this annotation overhead particularly of loop invariants is significant hurdle in the acceptance of static checking we reduce the annotation burden by inferring loop invariants automatically our method is based on predicate abstraction an abstract interpretation technique in which the abstract domain is constructed from given set of predicates over program variables novel feature of our approach is that it infers universally quantified loop invariants which are crucial for verifying programs that manipulate unbounded data such as arrays we present heuristics for generating appropriate predicates for each loop automatically the programmer can specify additional predicates as well we also present an efficient algorithm for computing the abstraction of set of states in terms of collection of predicates experiments on kloc program show that our approach can automatically infer the necessary predicates and invariants for all but of the routines that contain loops
high level query constructs help greatly improve the clarity of programs and the productivity of programmers and are being introduced to increasingly more languages however the use of high level queries in programming languages can come at cost to program efficiency because these queries are expensive and may be computed repeatedly on slightly changed inputs for efficient computation in practical applications powerful method is needed to incrementally maintain query results with respect to updates to query parameters this paper describes general and powerful method for automatically generating incremental implementations of high level queries over objects and sets in object oriented programs where query may contain arbitrary set enumerators field selectors and additional conditions the method can handle any update to object fields and addition and removal of set elements and generate coordinated maintenance code and invocation mechanisms to ensure that query results are computed correctly and efficiently our implementation and experimental results for example queries and updates confirm the effectiveness of the method
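a toy sketch of the coordinated maintenance code such a method would generate for one query over a set of objects here the result is the set of owners of items whose size exceeds a limit all names are hypothetical and a field update would be handled as a remove followed by an add

class Item:
    def __init__(self, owner, size):
        self.owner, self.size = owner, size

class BigOwnersQuery:
    """incrementally maintains {item.owner for item in items if item.size > limit}"""
    def __init__(self, limit):
        self.limit = limit
        self.counts = {}                          # multiset of owners of matching items

    def on_add(self, item):                       # invoked when an item enters the set
        if item.size > self.limit:
            self.counts[item.owner] = self.counts.get(item.owner, 0) + 1

    def on_remove(self, item):                    # invoked when an item leaves the set
        if item.size > self.limit:
            self.counts[item.owner] -= 1
            if self.counts[item.owner] == 0:
                del self.counts[item.owner]

    def result(self):
        return set(self.counts)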
rough set theory is useful tool for data mining it is based on equivalence relations and has been extended to covering based generalized rough set this paper studies three kinds of covering generalized rough sets for dealing with the vagueness and granularity in information systems first we examine the properties of approximation operations generated by covering in comparison with those of the pawlak’s rough sets then we propose concepts and conditions for two coverings to generate an identical lower approximation operation and an identical upper approximation operation after the discussion on the interdependency of covering lower and upper approximation operations we address the axiomization issue of covering lower and upper approximation operations in addition we study the relationships between the covering lower approximation and the interior operator and also the relationships between the covering upper approximation and the closure operator finally this paper explores the relationships among these three types of covering rough sets
in this paper we investigate the combination of four machine learning methods for text categorization using dempster’s rule of combination these methods include support vector machine svm knn nearest neighbor knn model based approach knnm and rocchio we first present general representation of the outputs of different classifiers in particular modeling it as piece of evidence by using novel evidence structure called focal element triplet furthermore we investigate an effective method for combining pieces of evidence derived from classifiers generated by fold cross validation finally we evaluate our methods on the newsgroup and reuters benchmark data sets and perform the comparative analysis with majority voting in combining multiple classifiers along with the previous result our experimental results show that the best combined classifier can improve the performance of the individual classifiers and dempster’s rule of combination outperforms majority voting in combining multiple classifiers
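a small sketch of dempster's rule of combination with mass functions represented as dicts from frozenset focal elements to masses the two classifier outputs below are invented and the degenerate fully conflicting case is not handled

from itertools import product

def combine(m1, m2):
    """dempster's rule: multiply masses, keep non-empty intersections, renormalize"""
    raw, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                    # mass falling on the empty set
    return {s: w / (1.0 - conflict) for s, w in raw.items()}

# two classifiers expressing partial belief over the frame {spam, ham}
m_svm = {frozenset({"spam"}): 0.7, frozenset({"spam", "ham"}): 0.3}
m_knn = {frozenset({"ham"}): 0.4, frozenset({"spam", "ham"}): 0.6}
print(combine(m_svm, m_knn))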
this paper explores the relationship between personal identity and the act of appropriating digital objects in the home specifically do it yourself to inform the design of empowering products it reports ongoing research and provides preliminary analysis of the steampunk movement as case study for personal appropriation appropriation identity design guidelines are provided as result of the data analysis
application level nondeterminism can lead to inconsistent state that defeats the purpose of replication as fault tolerance strategy we present midas new approach for living with nondeterminism in distributed replicated middleware applications midas exploits the static program analysis of the application’s source code prior to replica deployment and the online compensation of replica divergence even as replicas execute we identify the sources of nondeterminism within the application discriminate between actual and superficial nondeterminism and track the propagation of actual nondeterminism we evaluate our techniques for the active replication of servers using micro benchmarks that contain various sources multi threading system calls and propagation of nondeterminism
we consider weighted tree automata with discounting over commutative semirings for their behaviors we establish kleene theorem and an mso logic characterization we introduce also weighted muller tree automata with discounting over the max plus and the min plus semirings and we show their expressive equivalence with two fragments of weighted mso sentences
the work performed by publish subscribe system can conceptually be divided into subscription processing and notification dissemination traditionally research in the database and networking communities has focused on these aspects in isolation the interface between the database server and the network is often overlooked by previous research at one extreme database servers are directly responsible for notifying individual subscribers at the other extreme updates are injected directly into the network and the network is solely responsible for processing subscriptions and forwarding notifications these extremes are unsuitable for complex and stateful subscription queries primary goal of this paper is to explore the design space between the two extremes and to devise solutions that incorporate both database side and network side considerations in order to reduce the communication and server load and maintain system scalability our techniques apply to broad range of stateful query types and we present solutions for several of them our detailed experiments based on real and synthetic workloads with varying characteristics and link level network simulation show that by exploiting the query semantics and building an appropriate interface between the database and the network it is possible to achieve orders of magnitude savings in network traffic at low server side processing cost
students are characterized by different learning styles focusing on different types of information and processing this information in different ways one of the desirable characteristics of web based education system is that all the students can learn despite their different learning styles to achieve this goal we have to detect how students learn reflecting or acting steadily or in fits and starts intuitively or sensitively in this work we evaluate bayesian networks at detecting the learning style of student in web based education system the bayesian network models different aspects of student behavior while he or she works with this system then it infers his or her learning styles according to the modeled behaviors the proposed bayesian model was evaluated in the context of an artificial intelligence web based course the results obtained are promising as regards the detection of students learning styles different levels of precision were found for the different dimensions or aspects of learning style
in this paper we illustrate scalable parallel performance for the timewarp synchronization protocol on the and variants of the ibm bluegene supercomputer scalable time warp performance for models that communicate large percentage of the event population over the network has not been shown on more than handful of processors we present our design for robust performing time warp simulator over variety of communication loads and extremely large processor counts up to for the phold benchmark model using processors our time warp simulator produces peak committed event rate of billion events per second at remote events and billion events per second at remote events the largest ever reported additionally for the transmission line matrix tlm model which approximates maxwell’s equations for electromagnetic wave propagation we report committed event rate in excess of million on processors with million grid lps the tlm model is particularly challenging given the bursty and cubic growth in event generation overall these performance results indicate that scalable time warp performance is obtainable on high processor counts over wide variety of event scheduling behaviors and not limited to relatively low non bursty rates of off processor communications
in traditional information flow type systems the security policy is often formalized as noninterference properties however noninterference alone is too strong to express security properties useful in practice if we allow downgrading in such systems it is challenging to formalize the security policy as an extensional property of the system this paper presents generalized framework of downgrading policies such policies can be specified in simple and tractable language and can be statically enforced by mechanisms such as type systems the security guarantee is then formalized as concise extensional property using program equivalences this relaxed noninterference generalizes traditional pure noninterference and precisely characterizes the information released due to downgrading
the effectiveness of instruction reuse ir technique to eliminate redundant computations at run time is limited by the fact that performance gain seldom exceeds and is dependent on the criticality of instructions being reused in this paper we focus on the power aspect of ir and propose resultbus optimization that exploits communication reuse to reduce the power dissipated over high capacitance resultbus the effectiveness of this optimization depends on the number of result producing instructions that are reused and improves overall power and energy delay product edp by over base ir policy for entry reuse buffer rb as domain specific study we examine the impact of multithreading on ir in the context of packet header processing applications specifically sharing the rb among threads can lead to either constructive or destructive interference thereby increasing or decreasing the amount of ir that can be uncovered further packet header processing applications are unique in the sense that repetition in data values within flows are quite prevalent which can be exploited to improve ir we find that an architecture that uses this flow information to govern accesses to the rb improves ir by as much as for header processing kernels
in this paper we provide comparative study of content based copy detection methods which include research literature methods based on salient point matching surf discrete cosine and wavelet transforms color histograms biologically motivated visual matching and other methods in our evaluation we focus on large scale applications especially on performance in the context of search engines for web images we assess the scalability of the tested methods by investigating the detection accuracy relative to descriptor size description time per image and matching time per image for testing original images altered by diverse set of realistic transformations are embedded in collection of one million web images
we present an improved slicing algorithm for java the best algorithm known so far first presented in is not always precise if nested objects are used as actual parameters the new algorithm presented in this paper always generates correct and precise slices but is more expensive in general we describe the algorithms and their treatment of objects as parameters in particular we present new safe criterion for termination of unfolding nested parameter objects we then compare the two algorithms by providing measurements for benchmark of java and javacard programs
the next decade will afford us computer chips with to of cores on single piece of silicon contemporary operating systems have been designed to operate on single core or small number of cores and hence are not well suited to manage and provide operating system services at such large scale if multicore trends continue the number of cores that an operating system will be managing will continue to double every months the traditional evolutionary approach of redesigning os subsystems when there is insufficient parallelism will cease to work because the rate of increasing parallelism will far outpace the rate at which os designers will be capable of redesigning subsystems the fundamental design of operating systems and operating system data structures must be rethought to put scalability as the prime design constraint this work begins by documenting the scalability problems of contemporary operating systems these studies are used to motivate the design of factored operating system fos fos is new operating system targeting manycore systems with scalability as the primary design constraint where space sharing replaces time sharing to increase scalability we describe fos which is built in message passing manner out of collection of internet inspired services each operating system service is factored into set of communicating servers which in aggregate implement system service these servers are designed much in the way that distributed internet services are designed but instead of providing high level internet services these servers provide traditional kernel services and replace traditional kernel data structures in factored spatially distributed manner fos replaces time sharing with space sharing in other words fos’s servers are bound to distinct processing cores and by doing so do not fight with end user applications for implicit resources such as tlbs and caches we describe how fos’s design is well suited to attack the scalability challenge of future multicores and discuss how traditional application operating systems interfaces can be redesigned to improve scalability
the development of simple and intuitive interactive deformation techniques for point based models is essential if they have to find wide spread application in different domains in this paper we describe an interactive technique for deforming the surface of point based model by adapting physically based mesh free shape deformation formulation to work with input from an electronic glove each finger tip of the glove forms point probe whose movement into or away from the surface is used as directed force at the surface point for deforming the model after the glove is spatially registered with the point based model one or more fingers can be simultaneously used for deforming the model
this paper is concerned with rank aggregation the task of combining the ranking results of individual rankers at meta search previously rank aggregation was performed mainly by means of unsupervised learning to further enhance ranking accuracies we propose employing supervised learning to perform the task using labeled data we refer to the approach as supervised rank aggregation we set up general framework for conducting supervised rank aggregation in which learning is formalized as an optimization that minimizes disagreements between ranking results and the labeled data as case study we focus on markov chain based rank aggregation in this paper the optimization for markov chain based methods is not convex optimization problem however and thus is hard to solve we prove that we can transform the optimization problem into that of semidefinite programming and solve it efficiently experimental results on meta searches show that supervised rank aggregation can significantly outperform existing unsupervised methods
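a sketch of an unsupervised markov chain aggregation baseline of the kind the supervised method then learns to weight from item i the chain moves to item j when a majority of the input rankers prefer j the smoothing constant the power iteration count and the example rankings are ours

import numpy as np

def mc_aggregate(rankings, items, damping=0.95, steps=200):
    n = len(items)
    idx = {x: k for k, x in enumerate(items)}
    P = np.zeros((n, n))
    for i in items:
        for j in items:
            if i != j and sum(r.index(j) < r.index(i) for r in rankings) > len(rankings) / 2:
                P[idx[i], idx[j]] = 1.0 / n        # jump to the majority-preferred item
        P[idx[i], idx[i]] = 1.0 - P[idx[i]].sum()  # otherwise stay put
    P = damping * P + (1.0 - damping) / n          # smoothing keeps the chain ergodic
    pi = np.linalg.matrix_power(P, steps)[0]       # approximate stationary distribution
    return sorted(items, key=lambda x: -pi[idx[x]])

print(mc_aggregate([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]], ["a", "b", "c"]))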
we explore the use of modern recommender system technology to address the problem of learning software applications before describing our new command recommender system we first define relevant design considerations we then discuss month user study we conducted with professional users to evaluate our algorithms which generated customized recommendations for each user analysis shows that our item based collaborative filtering algorithm generates times as many good suggestions as existing techniques in addition we present prototype user interface to ambiently present command recommendations to users which has received promising initial user feedback
the conceptual design of the extract transform load etl processes is crucial burdensome and challenging procedure that takes place at the early phases of data warehouse project several models have been proposed for the conceptual design and representation of etl processes but all share two inconveniences they require intensive human effort from the designers to create them as well as technical knowledge from the business people to understand them in previous work we have relaxed the former difficulty by working on the automation of the conceptual design leveraging semantic web technology in this paper we build upon our previous results and we tackle the second issue by investigating the application of natural language generation techniques to the etl environment in particular we provide method for the representation of conceptual etl design as narrative which is the most natural means of communication and does not require knowledge of any specific model we discuss how linguistic techniques can be used for the establishment of common application vocabulary finally we present flexible and customizable template based mechanism for generating natural language representations for the etl process requirements and operations
how do users accept and use for long period of time location based services lbs on their mobile handsets friendzone suite of mobile location based community services has been launched the services included instant messaging and locator im location based chat and anonymous instant messaging aim with supporting privacy management a month usage survey of more than users most of them young adults followed by user interviews is reported herein the results indicate that aim is the most popular and used service more than im with lower use of chat the interviews showed that young adults are interested in immediate stimulations and therefore use aim which could lead them to face to face meetings in addition im is limited to one carrier and hence is less attractive lastly young adults using this service are more interested in sharing their location than in their privacy
this paper addresses the problem of query optimization for dynamic databases in distributed environments where data frequently change their values an adaptive query optimization algorithm is proposed to evaluate queries rather than constructing full plan for an access path and executing it the algorithm constructs partial plan executes it updates the statistics and constructs new partial plan since partial plan is constructed based on the latest statistics the algorithm is adaptive to data modifications and errors from the statistics the algorithm extends the sdd algorithm by considering local processing cost as well as communication cost whereas the sdd algorithm only uses semi joins to reduce communication cost the algorithm reduces it with joins as well it is proved that the adaptive algorithm is more efficient than the sdd algorithm
one major problem of existing methods to mine data streams is that it makes ad hoc choices to combine most recent data with some amount of old data to search the new hypothesis the assumption is that the additional old data always helps produce more accurate hypothesis than using the most recent data only we first criticize this notion and point out that using old data blindly is not better than gambling in other words it helps increase the accuracy only if we are lucky we discuss and analyze the situations where old data will help and what kind of old data will help the practical problem on choosing the right example from old data is due to the formidable cost to compare different possibilities and models this problem will go away if we have an algorithm that is extremely efficient to compare all sensible choices with little extra cost based on this observation we propose simple efficient and accurate cross validation decision tree ensemble method
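a sketch of the selection idea assuming scikit learn as a dependency whether a slice of old data is mixed in is decided by cross validation on the candidate training sets rather than assumed this shows only the choice step not the full ensemble method and all names are ours

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def pick_training_set(X_new, y_new, X_old, y_old, folds=5):
    # compare the most recent data alone against recent plus old data
    candidates = {
        "new_only": (X_new, y_new),
        "new_plus_old": (np.vstack([X_new, X_old]), np.concatenate([y_new, y_old])),
    }
    scores = {name: cross_val_score(DecisionTreeClassifier(), X, y, cv=folds).mean()
              for name, (X, y) in candidates.items()}
    return max(scores, key=scores.get), scores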
this paper describes scsolver geometric constraint solver based on adaptive sampling of an underlying constraint space the solver is demonstrated on the computation of the offset to surface as well as the computation of the bisector between two surfaces the adaptive constraint sampling generates solution manifold through generalized dual contouring approach appropriate for higher dimensional problems experimental results show that the scsolver approach can compute solutions for complex input geometry at interactive rates for each example application
join processing in the streaming environment has many practical applications such as data cleaning and outlier detection due to the inherent uncertainty in the real world data it has become an increasingly important problem to consider the join processing on uncertain data streams where the incoming data at each timestamp are uncertain and imprecise different from the static databases processing uncertain data streams has its own requirements such as the limited memory small response time and so on to tackle the challenges with respect to efficiency and effectiveness in this paper we formalize the problem of join on uncertain data streams usj which can guarantee the accuracy of usj answers over uncertain data and propose effective pruning methods to filter out false alarms we integrate the pruning methods into an efficient query procedure for incrementally maintaining usj answers extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our approaches
we develop framework for learning generic expressive image priors that capture the statistics of natural scenes and can be used for variety of machine vision tasks the approach provides practical method for learning high order markov random field mrf models with potential functions that extend over large pixel neighborhoods these clique potentials are modeled using the product of experts framework that uses non linear functions of many linear filter responses in contrast to previous mrf approaches all parameters including the linear filters themselves are learned from training data we demonstrate the capabilities of this field of experts model with two example applications image denoising and image inpainting which are implemented using simple approximate inference scheme while the model is trained on generic image database and is not tuned toward specific application we obtain results that compete with specialized techniques
efficient search for nearest neighbors to given location point called knn query is an important problem arising in variety of sensor network applications in this paper we investigate in network query processing strategies under knn query processing framework in location aware wireless sensor networks set of algorithms namely the geo routing tree the knn boundary tree and the itinerary based knn algorithms are designed in accordance with the global infrastructure based local infrastructure based and infrastructure free strategies respectively they have distinctive performance characteristics and are desirable under different contexts we evaluate the performance of these algorithms under several sensor network scenarios and application requirements and identify the conditions under which the various approaches are preferable
business process modeling and enactment are notoriously complex especially in open settings where business partners are autonomous requirements must be continually finessed and exceptions frequently arise because of real world or organizational problems traditional approaches which attempt to capture processes as monolithic flows have proven inadequate in addressing these challenges we propose business protocols as components for developing business processes protocol is an abstract modular publishable specification of an interaction among different roles to be played by different participants when instantiated with the participants internal policies protocols yield concrete business processes protocols are reusable and refinable thus simplifying business process design we show how protocols and their composition are theoretically founded in the pi calculus
replication aims to improve accessibility shorter response time and fault tolerance when data is associated with geographical location in the network and valid only within region around that location the benefits from replication will apply only within this region in mobile ad hoc networks manets nodes move in and out of region and can even leave the network completely which leads to frequent changing of replica holders as mobile nodes have usually constrained processing power and memory replica holders need to be selected carefully in such networks to reduce communication overhead this paper proposes solution for replication of location dependent data in mobile ad hoc networks it will be shown that an improvement of in hit ratio is achieved in accessing data items with only moderate increase in total traffic generated the scalability of the solution with regards to the increase in the number of nodes or data items in the network will also be shown to be good
buffer resizing and buffer insertion are two transformation techniques for the performance optimization of elastic systems different approaches for each technique have already been proposed in the literature both techniques increase the storage capacity and can potentially contribute to improve the throughput of the system each technique offers different trade off between area cost and latency this paper presents method that combines both techniques to achieve the maximum possible throughput while minimizing the cost of the implementation the provided method is based on mixed integer linear programming set of experiments is designed to show the feasibility of the approach
level set based approaches are widely used for image segmentation and object tracking as these methods are usually driven by low level cues such as intensity colour texture and motion they are not sufficient for many problems to improve the segmentation and tracking results shape priors were introduced into level set based approaches shape priors are generated by presenting many views a priori but in many applications this a priori information is not available in this paper we present level set based segmentation and tracking method that builds the shape model incrementally from new aspects obtained by segmentation or tracking in addition in order to tolerate errors during the segmentation process we present robust active shape model which provides robust shape prior in each level set iteration step for the tracking we use simple decision function to maintain the desired topology for multiple regions we can even handle full occlusions and objects which are temporarily hidden in containers by combining the decision function and our shape model our experiments demonstrate the improvement of the level set based segmentation and tracking using an active shape model and the advantages of our incremental robust method over standard approaches
incremental view maintenance is well known topic that has been addressed in the literature as well as implemented in database products yet incremental refresh has been studied in depth only for subset of the aggregate functions in this paper we propose general incremental maintenance mechanism that applies to all aggregate functions including those that are not distributive over all operations this class of functions is of great interest and includes min max stddev correlation regression xml constructor and user defined functions we optimize the maintenance of such views in two ways first by only recomputing the set of affected groups second we extend the incremental infrastructure with work areas to support the maintenance of functions that are algebraic we further optimize computation when multiple dissimilar aggregate functions are computed in the same view and for special cases such as the maintenance of min max which are incrementally maintainable over insertions we also address the important problem of incremental maintenance of views containing super aggregates including materialized olap cubes we have implemented our algorithm on prototype version of ibm db udb and an experimental evaluation proves the validity of our approach
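a minimal sketch of the work area idea for one algebraic function standard deviation kept per group as count sum and sum of squares so the view can be refreshed from deltas without rescanning the base table the class and method names are ours

import math

class StdDevArea:
    """work area for incrementally maintaining a sample standard deviation"""
    def __init__(self):
        self.n = 0
        self.s = 0.0
        self.ss = 0.0

    def insert(self, v):            # apply an inserted row's value
        self.n += 1
        self.s += v
        self.ss += v * v

    def delete(self, v):            # apply a deleted row's value
        self.n -= 1
        self.s -= v
        self.ss -= v * v

    def value(self):                # recompute the aggregate from the work area
        if self.n < 2:
            return 0.0
        var = (self.ss - self.s * self.s / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))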
xpath is widely used as an xml query language and is embedded in xquery expressions and in xslt stylesheets in this paper which is an extended version of sven groppe stefan bottcher jinghua groppe xpath query simplification with regard to the elimination of intersect and except operators in rd international workshop on xml schema and data management xsdm in conjunction with ieee icde atlanta usa we propose rule set which logically simplifies xpath queries by using heuristic method in order to improve the processing time furthermore we show how to substitute the xpath intersect and except operators in given xpath query with computed filter expressions performance evaluation comparing the execution times of the original xpath queries which contain the intersect and except operators and of the queries that are the result of our simplification approach shows that depending on the used query evaluator and on the original query performance improvements of factor of up to are possible additionally we prove that xpath is closed under complementation and first order complete
network on chip is new design paradigm for designing core based system on chip it features high degree of reusability and scalability in this paper we propose switch which employs the latency insensitive concepts and applies the round robin scheduling techniques to achieve high communication resource utilization based on the assumptions of the mesh network topology constructed by the switch this work not only models the communication and the contention effect of the network but develops communication driven task binding algorithm that employs the divide and conquer strategy to map applications onto the multiprocessor system on chip the algorithm attempts to derive binding of tasks such that the overall system throughput is maximized to compare with the task binding without consideration of communication and contention effect the experimental results demonstrate that the overall improvement of the system throughput is for test cases
backup is cumbersome and expensive individual users almost never back up their data and backup is significant cost in large organizations this paper presents pastiche simple and inexpensive backup system pastiche exploits excess disk capacity to perform peer to peer backup with no administrative costs each node minimizes storage overhead by selecting peers that share significant amount of data it is easy for common installations to find suitable peers and peers with high overlap can be identified with only hundreds of bytes pastiche provides mechanisms for confidentiality integrity and detection of failed or malicious peers pastiche prototype suffers only overhead for modified andrew benchmark and restore performance is comparable to cross machine copy
erasure coding can reduce the space and bandwidth overheads of redundancy in fault tolerant data storage and delivery systems but it introduces the fundamental difficulty of ensuring that all erasure coded fragments correspond to the same block of data without such assurance different block may be reconstructed from different subsets of fragments this paper develops technique for providing this assurance without the bandwidth and computational overheads associated with current approaches the core idea is to distribute with each fragment what we call homomorphic fingerprints these fingerprints preserve the structure of the erasure code and allow each fragment to be independently verified as corresponding to specific block we demonstrate homomorphic fingerprinting functions that are secure efficient and compact
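a toy demonstration of the homomorphic property using a polynomial evaluation fingerprint over a prime field and a single parity fragment formed as an elementwise sum the constants and the trivial linear code are ours not the paper's construction

# fingerprint of a block = polynomial evaluation sum(x_i * R**i) mod P, which is
# linear in the block, so it commutes with the linear combinations of an erasure code
P, R = (1 << 61) - 1, 123456789          # prime modulus and an arbitrary evaluation point

def fingerprint(block):                   # block is a list of field elements
    acc = 0
    for x in reversed(block):
        acc = (acc * R + x) % P           # Horner evaluation of the block polynomial
    return acc

data = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]
parity = [sum(col) % P for col in zip(*data)]            # elementwise-sum "code word"

# the fragment's fingerprint can be checked against the data fingerprints alone
assert fingerprint(parity) == sum(fingerprint(b) for b in data) % P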
we study deterministic broadcasting in radio networks in the recently introduced framework of network algorithms with advice we concentrate on the problem of trade offs between the number of bits of information size of advice available to nodes and the time in which broadcasting can be accomplished in particular we ask what is the minimum number of bits of information that must be available to nodes of the network in order to broadcast very fast for networks in which constant time broadcast is possible under complete knowledge of the network we give tight answer to the above question bits of advice are sufficient but bits are not in order to achieve constant broadcasting time in all these networks this is in sharp contrast with geometric radio networks of constant broadcasting time we show that in these networks constant number of bits suffices to broadcast in constant time for arbitrary radio networks we present broadcasting algorithm whose time is inversely proportional to the size of the advice
visibly pushdown languages are an interesting subclass of deterministic context free languages that can model nonregular properties of interest in program analysis such class properly contains typical classes of parenthesized languages such as parenthesis bracketed balanced and input driven languages it is closed under boolean operations and has decidable decision problems such as emptiness inclusion and universality we study the membership problem for visibly pushdown languages and show that it can be solved in time linear in both the size of the input grammar and the length of the input word the algorithm relies on reduction to the reachability problem for game graphs we also discuss the time complexity of the membership problem for the class of balanced languages which is the largest among those cited above besides the intrinsic theoretical interest we further motivate our main result showing an application to the validation of xml documents against schema and document type definitions dtds
this study aims at finding out which attributes people actually recall about their own documents electronic and paper and what are the characteristics of their recall in order to provide recommendations on how to improve tools allowing users to retrieve their electronic files more effectively and more easily an experiment was conducted with fourteen participants at their workplace they were asked first to recall features about one or several of their own work documents and secondly to retrieve these documents the difficulties encountered by the participants in retrieving their electronic documents support the need for better retrieval tools more specifically results of the recall task indicate which attributes are candidates for facilitating file retrieval and how search tools should use these attributes
we propose new wavelet compression algorithm based on the rate distortion optimization for densely sampled triangular meshes exploiting the normal remesher of guskov et al the proposed algorithm includes wavelet transform and an original bit allocation optimizing the quantization of the wavelet coefficients the allocation process minimizes the reconstruction error for given bit budget as distortion measure we use the mean square error of the normal mesh quantization expressed according to the quantization error of each subband we show that this metric is suitable criterion to evaluate the reconstruction error ie the geometric distance between the input mesh and the quantized normal one moreover to design fast bit allocation we propose model based approach depending on distribution of the wavelet coefficients compared to the state of the art methods for normal meshes our algorithm provides improvements in coding performance up to db compared to the original zerotree coder
software development teams exchange source code in shared repositories these repositories are kept consistent by having developers follow commit policy such as program edits can be committed only if all available tests succeed such policies may result in long intervals between commits increasing the likelihood of duplicative development and merge conflicts furthermore commit policies are generally not automatically enforceable we present program analysis to identify committable changes that can be released early without causing failures of existing tests even in the presence of failing tests in developer’s local workspace the algorithm can support relaxed commit policies that allow early release of changes reducing the potential for merge conflicts in experiments using several versions of non trivial software system with failing tests newly enabled commit policies were shown to allow significant percentage of changes to be committed
it is consensus in microarray analysis that identifying potential local patterns characterized by coherent groups of genes and conditions may shed light on the discovery of previously undetectable biological cellular processes of genes as well as macroscopic phenotypes of related samples in order to simultaneously cluster genes and conditions we have previously developed fast co clustering algorithm minimum sum squared residue co clustering mssrcc which employs an alternating minimization scheme and generates what we call co clusters in checkerboard structure in this paper we propose specific strategies that enable mssrcc to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms the strategies include binormalization deterministic spectral initialization and incremental local search we assess the effects of various strategies on both synthetic gene expression datasets and real human cancer microarrays and provide empirical evidence that mssrcc with the proposed strategies performs better than existing co clustering and clustering algorithms in particular the combination of all the three strategies leads to the best performance furthermore we illustrate coherence of the resulting co clusters in checkerboard structure where genes in co cluster manifest the phenotype structure of corresponding specific samples and evaluate the enrichment of functional annotations in gene ontology go
this paper introduces new method of generating flat patterns from triangulated surface by opening the bending configuration of each winged triangle pair the flattening can be divided into four steps first triangulated surface is modeled with mass spring system that simulates the surface deformation during the flattening second an unwrapping force field is built to drive the mass spring system to developable configuration through the numerical integration third velocity redistribution procedure is initiated to average velocity variances among the particles finally the mass spring system is forced to collide with plane and the final pattern is generated after all the winged triangle pairs are spread onto the colliding plane to retain the size and area of the original surface strain control mechanism is introduced to keep the springs from over elongation or over shrinkage at each time step
data stream management systems dsmss do not statically respond to issued queries rather they continuously produce result streams to standing queries and often operate in context where any interruption can lead to data loss support for schema evolution in continuous query processing is currently unaddressed in this work we address evolution in dsmss by proposing semantics for three evolution primitives add attribute and drop attribute schema evolution and alter data data evolution we characterize how subset of commonly used query operators in dsms act on and propagate these primitives
we present opal light weight framework for interactively locating missing web pages http status code opal is an example of in vivo preservation harnessing the collective behavior of web archives commercial search engines and research projects for the purpose of preservation opal servers learn from their experiences and are able to share their knowledge with other opal servers by mutual harvesting using the open archives initiative protocol for metadata harvesting oai pmh using cached copies that can be found on the web opal creates lexical signatures which are then used to search for similar versions of the web page we present the architecture of the opal framework discuss reference implementation of the framework and present quantitative analysis of the framework that indicates that opal could be effectively deployed
major issue of activity recognition in sensor networks is automatically recognizing user’s high level goals accurately from low level sensor data traditionally solutions to this problem involve the use of location based sensor model that predicts the physical locations of user from the sensor data this sensor model is often trained offline incurring large amount of calibration effort in this article we address the problem using goal based segmentation approach in which we automatically segment the low level user traces that are obtained cheaply by collecting the signal sequences as user moves in wireless environments from the traces we discover primitive signal segments that can be used for building probabilistic activity model to recognize goals directly major advantage of our algorithm is that it can reduce significant amount of human effort in calibrating the sensor data while still achieving comparable recognition accuracy we present our theoretical framework for activity recognition and demonstrate the effectiveness of our new approach using the data collected in an indoor wireless environment
aspect oriented programming aop fosters the coding of tangled concerns in separated units that are then woven together in the executable system unfortunately the oblivious nature of the weaving process makes it difficult to figure out the augmented system behavior it is difficult for example to understand the effect of change just by reading the source code in this paper we focus on detecting the run time impact of the editing actions on given set of test cases our approach considers two versions of an aspectj program and test case our tool implemented on top of the abc weaver and the ajana framework is able to map semantic changes to the atomic editing changes in the source code
heterogeneous clusters claim for new models and algorithms in this paper new parallel computational model is presented the model based on the loggp model has been extended to be able to deal with heterogeneous parallel systems for that purpose the loggp’s scalar parameters have been replaced by vector and matrix parameters to take into account the different nodes features the work presented here includes the parametrization of real cluster which illustrates the impact of node heterogeneity over the model’s parameters finally the paper presents some experiments that can be used for assessing the method’s validity together with the main conclusions and future work
this paper focuses on evaluation of the effectiveness of optimization at various layers of the io path such as the file system the device driver scheduler and the disk drive itself io performance is enhanced via effective block allocation at the file system request merging and reordering at the device driver and additional complex request reordering at the disk drive our measurements show that effective combination of these optimization forms yields superior performance under specific workloads in particular the impact on io performance of technological advances in modern disk drives ie reduction on head positioning times and deployment of complex request scheduling is shown for example if the outstanding requests in the io subsystem can all be accommodated by the disk queue buffer then disk level request scheduling is as effective as to close any gaps in the performance between io request schedulers at the device driver level even more for disk drives with write through caches large queue depths improve overall io throughput and when combined with the best performing disk scheduling algorithm at the device driver level perform comparably with an io subsystem where disks have write back caches
there has been little design consideration given to ease the navigation through long chat archive in limited screen display by incorporating graphical and user centered design messages can be presented in logical grouping for navigation ease and efficient tracking of specific messages in long chat archive this paper explores usable interface design for mobile group chat systems via navigation and visualisation to track messages that results in minimal key presses and fast message retrieval additionally we incorporate avatars and emoticons in user identification and human embodiment to facilitate ease of understanding of the messages contents
in this paper we study the problem of packing sequence of objects into bins the objects are all either divisible or indivisible and occur in accordance with certain probability distribution we would like to find the average number of entries wasted in bin if objects are indivisible and the probability of splitting the last object in bin if objects are divisible we solve this problem under unified formulation by modeling packing process as markov chain whose state transition probabilities are derived from an application of the partitions of integers an application of this study to instruction cache design shows that line size of bytes has minimized the probability of splitting the last instruction in cache line for micro op cache design line size of four entries has minimized the number of entries wasted per line
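A small Monte-Carlo sketch of the indivisible-object case: objects drawn from an assumed size distribution are packed into fixed-size lines, and the average number of entries wasted per closed line is estimated empirically. This only approximates what the paper derives in closed form via a Markov chain over integer partitions; the size distribution and line size below are illustrative.

```python
# Monte-Carlo sketch of the packing question for indivisible objects: how many
# entries per fixed-size line are wasted, on average, when the next object does
# not fit and must start a new line? This approximates empirically what the
# paper derives analytically via a Markov chain over partitions of integers.
import random

def avg_wasted_entries(line_size, size_dist, trials=100_000, seed=42):
    """size_dist: list of (object_size, probability) pairs (assumed distribution)."""
    rng = random.Random(seed)
    sizes, probs = zip(*size_dist)
    wasted = 0
    lines = 0
    used = 0
    for _ in range(trials):
        s = rng.choices(sizes, weights=probs, k=1)[0]
        if used + s > line_size:          # indivisible: object moves to a new line
            wasted += line_size - used
            lines += 1
            used = 0
        used += s
    return wasted / max(lines, 1)

if __name__ == "__main__":
    # e.g. objects of 1..4 entries, uniform probabilities (purely illustrative)
    dist = [(1, 0.25), (2, 0.25), (3, 0.25), (4, 0.25)]
    print(avg_wasted_entries(line_size=8, size_dist=dist))
```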
focus context techniques such as fisheye lenses are used to navigate and manipulate objects in multi scale worlds they provide in place magnification of region without requiring users to zoom the whole representation and consequently lose context their adoption is however hindered by usability problems mostly due to the nature of the transition between focus and context existing transitions are often based on physical metaphor magnifying glass fisheye rubber sheet and are almost always achieved through single dimension space we investigate how other dimensions namely time and translucence can be used to achieve more efficient transitions we present an extension to carpendale’s framework for unifying presentation space accommodating these new dimensions we define new lenses in that space called sigma lenses and compare them to existing lenses through experiments based on generic task focus targeting results show that one new lens the speed coupled flattening lens significantly outperforms all others
always on video provides rich awareness for distance separated coworkers yet video can threaten privacy especially when it captures telecommuters working at home we evaluated video blurring an image masking method long touted to balance privacy and awareness results show that video blurring is unable to balance privacy with awareness for risky situations reactions by participants suggest that other popular image masking techniques will be problematic as well the design implication is that image masking techniques will not suffice for privacy protection in video based telecommuting situations other context aware privacy protecting strategies are required as illustrated in our prototype context aware home media space
this paper investigates wrapper induction from web sites whose layout may change over time we formulate the reinduction as an incremental learning problem and identify that wrapper induction from an incomplete label is key problem to be solved we propose novel algorithm for incrementally inducing lr wrappers and show that this algorithm asymptotically identifies the correct wrapper as the number of tuples is increased this property is used to propose lr wrapper reinduction algorithm this algorithm requires examples to be provided exactly once and thereafter the algorithm can detect the layout changes and reinduce wrappers automatically in experimental studies we observe that the reinduction algorithm is able to achieve near perfect performance
we address the problem of computing approximate answers to continuous sliding window joins over data streams when the available memory may be insufficient to keep the entire join state one approximation scenario is to provide maximum subset of the result with the objective of losing as few result tuples as possible an alternative scenario is to provide random sample of the join result eg if the output of the join is being aggregated we show formally that neither approximation can be addressed effectively for sliding window join of arbitrary input streams previous work has addressed only the maximum subset problem and has implicitly used frequency based model of stream arrival we address the sampling problem for this model more importantly we point out broad class of applications for which an age based model of stream arrival is more appropriate and we address both approximation scenarios under this new model finally for the case of multiple joins being executed with an overall memory constraint we provide an algorithm for memory allocation across the joins that optimizes combined measure of approximation in all scenarios considered all of our algorithms are implemented and experimental results demonstrate their effectiveness
strongly dynamic software systems are difficult to verify by strongly dynamic we mean that the actors in such systems change dynamically that the resources used by such systems are dynamically allocated and deallocated and that for both sets no bounds are statically known in this position paper we describe the progress we have made in automated verification of strongly dynamic systems using abstract interpretation with three valued logical structures we then enumerate number of challenges that must be tackled in order for such techniques to be widely adopted
in contrast to traditional database queries query on stream data is continuous in that it is periodically evaluated over fractions sliding windows of the data stream this introduces challenges beyond those encountered when processing traditional queries over traditional dbms database management system the answer to an aggregate query is usually much smaller than the answer to similar non aggregate query making query processing condensative current proposals for declarative query languages over data streams do not support such condensative processing nor is it yet well understood what query constructs and what semantics should be adopted for continuous query languages in order to make existing stream query languages more expressive novel stream query language csql condensative stream query language is proposed over sequence based stream model ma nutt it is shown that the sequence model supports precise tuple based semantics that is lacking in previous time based models and thereby provides formal semantics to understand and reason about continuous queries csql supports sliding window operators found in previous languages and possesses declarative semantics that allows one to specify and reason about the different meanings of the frequency by which query returns answer tuples which are beyond previous query languages over streams in addition novel condensative stream algebra is defined by extending an existing stream algebra with new frequency operator to capture the condensative property it is shown that condensative stream algebra enables the generation of efficient continuous query plans and can be used to validate query optimisation finally it is shown via an experimental study that the proposed operators are effective and efficient in practice
efficient scheduling of workflow applications represented by weighted directed acyclic graphs dag on set of heterogeneous processors is essential for achieving high performance the optimization problem is np complete in general few heuristics for scheduling on heterogeneous systems have been proposed recently however few of them consider the case where processors have different capabilities in this paper we present novel list scheduling based algorithm to deal with this situation the algorithm sdc has two distinctive features first the algorithm takes into account the effect of percentage of capable processors pcp when assigning the task node weights for two task nodes with same average computation cost our weight assignment policy tends to give higher weight to the task with small pcp secondly during the processor selection phase the algorithm adjusts the effective earliest finish time strategy by incorporating the average communication cost between the current scheduling node and its children comparison study shows that our algorithm performs better than related work overall
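The PCP-sensitive weight assignment can be illustrated with a tiny sketch: a task's rank weight is its average cost over the processors able to run it, scaled up when the fraction of capable processors is small. The exact scaling formula below is an illustrative assumption, not the SDC formula from the paper.

```python
# Sketch of the weight-assignment idea described above: a task's rank weight is
# its average computation cost over the processors that can actually execute it,
# scaled up when the percentage of capable processors (PCP) is small. The exact
# scaling formula below is an illustrative assumption, not the paper's SDC formula.

def task_weight(costs):
    """costs: list with one entry per processor; None means 'not capable'."""
    capable = [c for c in costs if c is not None]
    pcp = len(capable) / len(costs)                 # fraction of capable processors
    avg_cost = sum(capable) / len(capable)
    return avg_cost / pcp                           # smaller PCP -> larger weight

if __name__ == "__main__":
    # two tasks with the same average cost but different numbers of capable processors
    t1 = [10, 10, 10, 10]          # every processor capable
    t2 = [10, 10, None, None]      # only half the processors capable
    print(task_weight(t1), task_weight(t2))   # t2 receives the higher weight
```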
this paper discusses the prototypical implementation of an ambient display and the results of an empirical study in retail store it presents the context of shopping as an application area for ambient intelligence ami technologies the prototype consists of an ambient store map that enhances the awareness of customer activity the results of our study indicate potentials and challenges for an improvement of the shopping experience with ami technologies based on our findings we discuss challenges and future developments for applying ami technologies to shopping environments
an increasing number of enterprises outsource their it services to third parties who can offer these services for much lower cost due to economy of scale quality of service is major concern in outsourcing in particular query integrity which means that query results returned by the service provider are both correct and complete must be assured previous work requires clients to manage data locally to audit the results sent back by the server or database engine to be modified for generating authenticated results in this paper we introduce novel integrity audit mechanism that eliminates these costly requirements in our approach we insert small amount of records into an outsourced database so that the integrity of the system can be effectively audited by analyzing the inserted records in the query results we study both randomized and deterministic approaches for generating the inserted records as how these records are generated has significant implications for storage and performance furthermore we show that our method is provably secure which means it can withstand any attacks by an adversary whose computation power is bounded our analytical and empirical results demonstrate the effectiveness of our method
different from traditional deadlock avoidance schemes deadlock detection and recovery based routing algorithms in wormhole networks have gained attention due to low hardware complexity and high routing adaptability by its nature current deadlock detection techniques based on time out accompany unignorable number of false deadlock detections especially in heavily loaded network or with long packet size and may mark more than one packet in deadlock as deadlocked this would saturate the resources allocated for recovery making deadlock recovery schemes less viable this paper proposes simple but more accurate deadlock detection scheme which is less dependent on the time out value the proposed scheme uses control packet to find potential cyclic dependency between packets and presumes deadlock only upon finding such dependency the suggested scheme considerably reduces the probability of false deadlock detections over previous schemes thus enabling more efficient deadlock recovery and higher network throughput simulation results are provided to demonstrate the efficiency of the proposed scheme
static analysis designers must carefully balance precision and efficiency in our experience many static analysis tools are built around an elegant core algorithm but that algorithm is then extensively tweaked to add just enough precision for the coding idioms seen in practice without sacrificing too much efficiency there are several downsides to adding precision in this way the tool’s implementation becomes much more complicated it can be hard for an end user to interpret the tool’s results and as software systems vary tremendously in their coding styles it may require significant algorithmic engineering to enhance tool to perform well in particular software domain in this paper we present mix novel system that mixes type checking and symbolic execution the key aspect of our approach is that these analyses are applied independently on disjoint parts of the program in an off the shelf manner at the boundaries between nested type checked and symbolically executed code regions we use special mix rules to communicate information between the off the shelf systems the resulting mixture is provably sound analysis that is more precise than type checking alone and more efficient than exclusive symbolic execution in addition we also describe prototype implementation mixy for mixy checks for potential null dereferences by mixing null non null type qualifier inference system with symbolic executor
online analytical processing is powerful framework for the analysis of organizational data olap is often supported by logical structure known as data cube multidimensional data model that offers an intuitive array based perspective of the underlying data supporting efficient indexing facilities for multi dimensional cube queries is an issue of some complexity in practice the difficulty of the indexing problem is exacerbated by the existence of attribute hierarchies that sub divide attributes into aggregation layers of varying granularity in this paper we present hierarchy and caching framework that supports the efficient and transparent manipulation of attribute hierarchies within parallel rolap environment experimental results verify that when compared to the non hierarchical case very little overhead is required to handle streams of arbitrary hierarchical queries
we define new fixpoint modal logic the visibly pushdown calculus vp as an extension of the modal calculus the models of this logic are execution trees of structured programs where the procedure calls and returns are made visible this new logic can express pushdown specifications on the model that its classical counterpart cannot and is motivated by recent work on visibly pushdown languages we show that our logic naturally captures several interesting program specifications in program verification and dataflow analysis this includes variety of program specifications such as computing combinations of local and global program flows pre post conditions of procedures security properties involving the context stack and interprocedural dataflow analysis properties the logic can capture flow sensitive and inter procedural analysis and it has constructs that allow skipping procedure calls so that local flows in procedure can also be tracked the logic generalizes the semantics of the modal calculus by considering summaries instead of nodes as first class objects with appropriate constructs for concatenating summaries and naturally captures the way in which pushdown models are model checked the main result of the paper is that the model checking problem for vp is effectively solvable against pushdown models with no more effort than that required for weaker logics such as ctl we also investigate the expressive power of the logic vp we show that it encompasses all properties expressed by corresponding pushdown temporal logic on linear structures caret as well as by the classical calculus this makes vp the most expressive known program logic for which algorithmic software model checking is feasible in fact the decidability of most known program logics calculus temporal logics ltl and ctl caret etc can be understood by their interpretation in the monadic second order logic over trees this is not true for the logic vp making it new powerful tractable program logic
certification of modeling and simulation applications poses significant technical challenges for program managers engineers and practitioners certification is becoming increasingly more important as applications are used more and more for military training complex system design evaluation based acquisition problem solving and critical decision making certification very complex process involves the measurement and evaluation of hundreds of qualitative and quantitative elements mandates subject matter expert evaluation and requires the integration of different evaluations planning and managing such measurements and evaluations requires unifying methodology and should not be performed in an ad hoc manner this paper presents such methodology the methodology consists of the following body of methods rules and postulates employment of subject matter experts construction of hierarchy of indicators relative criticality weighting of indicators using the analytic hierarchy process using rule based expert knowledge base with an object oriented specification language assignment of crisp fuzzy and nominal scores for the indicators aggregation of indicator scores graphical representation of the indicator scores and weights hypertext certification report and interpretation of the results the methodology can be used for certification of any kind of application either throughout the development life cycle or after the development is completed
the java security architecture includes dynamic mechanism for enforcing access control checks the so called stack inspection process while the architecture has several appealing features access control checks are all implemented via dynamic method calls this is highly nondeclarative form of specification that is hard to read and that leads to additional run time overhead this article develops type systems that can statically guarantee the success of these checks our systems allow security properties of programs to be clearly expressed within the types themselves which thus serve as static declarations of the security policy we develop these systems using systematic methodology we show that the security passing style translation proposed by wallach et al as dynamic implementation technique also gives rise to static security aware type systems by composition with conventional type systems to define the latter we use the general hm framework and easily construct several constraint and unification based type systems
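A minimal dynamic sketch of the security-passing-style idea underlying these type systems: the set of enabled permissions is threaded through calls and intersected with each component's static grants, so a permission check only consults the passed set instead of walking the stack. The permissions, grant table and functions below are hypothetical illustrations; this is neither the Java implementation nor the paper's static systems.

```python
# Minimal dynamic sketch of security-passing style: instead of walking the call
# stack at check time, each call threads an explicit set of currently enabled
# permissions, intersected with the callee's statically granted permissions.
# Permission names and functions are hypothetical.

GRANTS = {
    "applet_code":  {"read_tmp"},
    "library_code": {"read_tmp", "read_home"},
}

class SecurityError(Exception):
    pass

def check(perms, needed):
    if needed not in perms:
        raise SecurityError(f"missing permission: {needed}")

def read_file(perms, path):
    # library code: its grants are intersected with what the caller passed in
    perms = perms & GRANTS["library_code"]
    check(perms, "read_home" if path.startswith("/home") else "read_tmp")
    return f"contents of {path}"

def applet_main(perms):
    perms = perms & GRANTS["applet_code"]
    print(read_file(perms, "/tmp/scratch"))      # allowed
    print(read_file(perms, "/home/user/secret")) # raises: applet never had read_home

if __name__ == "__main__":
    try:
        applet_main(frozenset({"read_tmp", "read_home"}))
    except SecurityError as e:
        print("denied:", e)
```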
the relationship between possible and supported models of unstratified indefinite deductive databases is studied when disjunction is interpreted inclusively possible and supported models are shown to coincide under suitable definition of supportedness and the concept of supported cover is introduced and shown to characterise possible models and facilitate top down query processing and compilation under the possible model semantics the properties and query processing of deductive databases under the possible model semantics is compared and contrasted with the perfect model semantics
we show how classical datalog semantics can be used directly and very simply to provide semantics to syntactic extension of datalog with methods classes inheritance overloading and late binding several approaches to resolution are considered implemented in the model and formally compared they range from resolution in style to original kinds of resolution suggested by the declarative nature of the language we show connections to view specification and further extension allowing runtime derivation of the class hierarchy
tablet pcs are gaining popularity but many users particularly older ones still struggle with pen based interaction one type of error drifting occurs when users accidentally hover over an adjacent menu causing their focus menu to close and the adjacent one to open in this paper we propose two approaches to address drifting the first tap requires an explicit tap to switch menus and thus eliminates the possibility of drift the second glide uses distance threshold to delay switching and thereby reduce the likelihood of drift we performed comparative evaluation of our approaches with control interface tap was effective at reducing drifts for both groups but it was only popular among older users glide surprisingly did not show any performance improvement additional research is needed to determine if the negative findings for glide are result of the particular threshold used or reflect fundamental flaw in the glide approach
kernel is described called the link kernel for high performance interprocess communication among shared memory multiprocessors using an ethernet the link kernel provides asynchronous multicast communication service without protocol overhead links are created by listeners and processes may be simultaneously talkers and listeners processes access links through global link table multicast addresses are site independent so communication is independent of process locations in the network communication is asynchronous so much of the overhead of message management is masked by parallel computation
various security models have been proposed in recent years for different purposes each of these aims to ease administration by introducing new types of security policies and models this increases the complexity system administrator is faced with ultimately the resources expended in choosing amongst all of these models lead to less efficient administration in this paper we propose new access control paradigm which is already well established in virus and spam protection as partial delegation of administration to external expertise centres well known vulnerabilities can be filtered out and known sources of attacks can be automatically blocked we describe how partial outsourcing can be achieved in secure way framework which enables this process has already been developed
blast is an automatic verification tool for checking temporal safety properties of programs blast is based on lazy predicate abstraction driven by interpolation based predicate discovery the blast specification language specifies program properties at two levels of precision at the lower level monitor automata are used to specify temporal safety properties of program executions traces at the higher level relational reachability queries over program locations are used to combine lower level trace properties the two level specification language can be used to break down verification task into several independent calls of the model checking engine in this way each call to the model checker may have to analyze only part of the program or part of the specification and may thus succeed in reduction of the number of predicates needed for the analysis in addition the two level specification language provides means for structuring and maintaining specifications
motivated by applications in software verification we explore automated reasoning about the non disjoint combination of theories of infinitely many finite structures where the theories share set variables and set operations we prove combination theorem and apply it to show the decidability of the satisfiability problem for class of formulas obtained by applying propositional connectives to formulas belonging to boolean algebra with presburger arithmetic with quantifiers over sets and integers weak monadic second order logic over trees with monadic second order quantifiers two variable logic with counting quantifiers ranging over elements the bernays schönfinkel ramsey class of first order logic with equality with quantifier prefix and the quantifier free logic of multisets with cardinality constraints
many studies have shown that collaboration is still badly supported in software development environments sdes this is why we try to benefit from theory developed in social and human sciences the activity theory to better understand the cooperative human activities in which sd is realized this paper particularly focuses on the experience crystallization principle to propose new solutions while enhancing the support for collaboration in the widely used eclipse ide
continuous nearest neighbor cknn query is one of the most fundamental queries in the field of spatio temporal databases given time interval cknn query is to retrieve the nearest neighbors knns of moving user at each time instant within existing methods for processing cknn query however assume that each object moves with fixed direction and or fixed speed in this paper we relieve this assumption by allowing both the moving speed and the moving direction of each object to vary this uncertainty on speed and direction of moving object would increase the complexity of processing cknn query we thoroughly analyze the involved issues incurred by this uncertainty and propose continuous possible knn cpknn algorithm to effectively find the objects that could be the knns these objects are termed the possible knns pknns in this paper probability based model is designed accordingly to quantify the possibility of each pknn being the knn in addition we design pknn updating mechanism to rapidly evaluate the new query result when object updates occur comprehensive experiments are conducted to demonstrate the effectiveness and the efficiency of the proposed approach
the connected coverage is one of the most important problems in wireless sensor networks however most existing approaches to connected coverage require knowledge of accurate location information this paper solves challenging problem without accurate location information how to schedule sensor nodes to save energy and meet both constraints of sensing area coverage and network connectivity our solution is based on the theoretical analysis of the sensing area coverage property of minimal dominating set we establish the relationship between point coverage and area coverage and derive the upper and lower bound that point coverage is equivalent to area coverage in random geometric graphs based on the analytical results and the existing algorithms which construct the connected dominating set an energy efficient connected coverage protocol eeccp is proposed extensive simulation studies show that the proposed connected coverage protocol can effectively maintain both high quality sensing coverage and connectivity for long time
we extend our correspondence between evaluators and abstract machines from the pure setting of the calculus to the impure setting of the computational calculus we show how to derive new abstract machines from monadic evaluators for the computational calculus starting from generic evaluator parameterized by monad and monad specifying computational effect we inline the components of the monad in the generic evaluator to obtain an evaluator written in style that is specific to this computational effect we then derive the corresponding abstract machine by closure converting cps transforming and defunctionalizing this specific evaluator we illustrate the construction with the identity monad obtaining the cek machine and with lifted state monad obtaining variant of the cek machine with error and state in addition we characterize the tail recursive stack inspection presented by clements and felleisen as lifted state monad this enables us to combine this stack inspection monad with other monads and to construct abstract machines for languages with properly tail recursive stack inspection and other computational effects the construction scales to other monads including one more properly dedicated to stack inspection than the lifted state monad and other monadic evaluators
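For reference, a compact sketch of the pure call-by-value baseline of such a derivation: a CEK-style machine with closures and defunctionalized continuations. The monadic layers (state, errors, stack inspection) discussed above are not reproduced here, and the term and continuation encodings are assumptions of this sketch, not the paper's.

```python
# Minimal sketch of a CEK-style abstract machine for the pure call-by-value
# lambda calculus: closures for environments, defunctionalized continuations for
# the evaluation context. Only the pure baseline is shown.

def run(term):
    # term ::= ('var', x) | ('lam', x, body) | ('app', f, a)
    env, kont = {}, ('halt',)
    while True:
        if term[0] == 'app':                      # evaluate the operator, remember the operand
            term, kont = term[1], ('arg', term[2], env, kont)
            continue
        # otherwise the control string denotes a value: a variable lookup or a closure
        value = env[term[1]] if term[0] == 'var' else ('clo', term[1], term[2], env)
        while True:                               # plug the value into the continuation
            if kont[0] == 'halt':
                return value
            if kont[0] == 'arg':                  # operator done; evaluate the operand
                _, arg, kenv, k = kont
                term, env, kont = arg, kenv, ('fun', value, k)
                break
            if kont[0] == 'fun':                  # operand done; enter the closure body
                _, (_, x, body, cenv), k = kont
                term, env, kont = body, {**cenv, x: value}, k
                break

if __name__ == "__main__":
    # (\x. x) (\y. y)  evaluates to the closure for \y. y
    identity = ('lam', 'y', ('var', 'y'))
    prog = ('app', ('lam', 'x', ('var', 'x')), identity)
    print(run(prog))
```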
ensuring the correctness of multithreaded programs is difficult due to the potential for unexpected interactions between concurrent threads much previous work has focused on detecting race conditions but the absence of race conditions does not by itself prevent undesired thread interactions we focus on the more fundamental non interference property of atomicity method is atomic if its execution is not affected by and does not interfere with concurrently executing threads atomic methods can be understood according to their sequential semantics which significantly simplifies formal and informal correctness arguments this paper presents dynamic analysis for detecting atomicity violations this analysis combines ideas from both lipton’s theory of reduction and earlier dynamic race detectors experience with prototype checker for multithreaded java code demonstrates that this approach is effective for detecting errors due to unintended interactions between threads in particular our atomicity checker detects errors that would be missed by standard race detectors and it produces fewer false alarms on benign races that do not cause atomicity violations our experimental results also indicate that the majority of methods in our benchmarks are atomic supporting our hypothesis that atomicity is standard methodology in multithreaded programming
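A minimal sketch of the reduction-based check: lock acquires are treated as right movers, releases as left movers, race-free accesses as both movers and racy accesses as non movers, and a method trace is conservatively judged atomic when it matches R* B* N? B* L*. The event encoding and race classification below are placeholders, not the paper's checker.

```python
# Minimal sketch of a reduction-based (Lipton-style) atomicity check for one
# method's event trace: lock acquires are right movers (R), releases are left
# movers (L), race-free accesses are both movers (B), racy accesses are
# non movers (N). Dropping both movers, an atomic trace must match R* N? L*.
# The event format here is a hypothetical stand-in.

def classify(event, held, racy_vars):
    kind, arg = event
    if kind == 'acquire':
        held.add(arg);  return 'R'
    if kind == 'release':
        held.discard(arg);  return 'L'
    # a read/write is a both mover unless the variable is involved in a race
    return 'N' if arg in racy_vars and not held else 'B'

def is_atomic(trace, racy_vars=frozenset()):
    held = set()
    phase = 'R'                      # phases advance R -> L, never backwards
    for ev in trace:
        m = classify(ev, held, racy_vars)
        if m == 'B':
            continue
        if m == 'R':
            if phase != 'R':
                return False
        elif m == 'N':
            if phase == 'L':
                return False
            phase = 'L'              # at most one non mover, then only left movers
        else:                        # 'L'
            phase = 'L'
    return True

if __name__ == "__main__":
    ok  = [('acquire', 'm'), ('write', 'x'), ('release', 'm')]
    bad = [('acquire', 'm'), ('release', 'm'), ('acquire', 'm'), ('release', 'm')]
    print(is_atomic(ok, racy_vars={'y'}), is_atomic(bad))
```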
outdoor community mesh networks based on ieee have seen tremendous growth in the recent past the current understanding is that wireless link performance in these settings is inherently unpredictable due to multipath delay spread consequently researchers have focused on developing intelligent routing techniques to achieve the best possible performance in this paper we are specifically interested in mesh networks in rural locations we first present detailed measurements to show that the phy layer in these settings is indeed stable and predictable there is strong correlation between the error rate and the received signal strength we show that interference and not multipath fading is the primary cause of unpredictable performance this is in sharp contrast with current widespread knowledge from prior studies furthermore we corroborate our view with fresh analysis of data presented in these prior studies while our initial measurements focus on we then use two different phy technologies as well operating in the ghz ism band and these show similar results too based on our results we argue that outdoor rural mesh networks can indeed be built with the link abstraction being valid this has several design implications including at the mac and routing layers and opens up fresh perspective on wide range of technical issues in this domain
design patterns are reusable abstractions in object oriented software however using current mainstream programming languages these elements can only be expressed extra linguistically as prose pictures and prototypes we believe that this is not inherent in the patterns themselves but evidence of lack of expressivity in the languages of today we expect that in the languages of the future the code parts of design patterns will be expressible as reusable library components indeed we claim that the languages of tomorrow will suffice the future is not far away all that is needed in addition to commonly available features are higher order and datatype generic constructs these features are already or nearly available now we argue the case by presenting higher order datatype generic programs capturing origami small suite of patterns for recursive data structures
natural interfaces for cad applications based on sketching devices have been explored to some extent former approaches used techniques to perform the recognition process like invariant features extracted with image analysis techniques such as neural networks statistical learning or fuzzy logic currently more flexible and robust techniques are being introduced which consider other information such as context data and other relationships however this kind of interface is still not widespread because it still lacks scalability and reliability for interpreting user inputs
we give lower bound of xa on the quantum query complexity for finding fixed point of discrete brouwer function over grid xa our lower bound is nearly tight as grover search can be used to find fixed point with quantum queries our result establishes nearly tight bound for the computation of dimensional approximate brouwer fixed points defined by scarf and by hirsch papadimitriou and vavasis it can be extended to the quantum model for sperner lemma in any dimensions the quantum query complexity of finding panchromatic cell in sperner coloring of triangulation of dimensional simplex with cells is xa for this result improves the bound of xa of friedl ivanyos santha and verhoeven more significantly our result provides quantum separation of local search and fixed point computation over for aaronson local search algorithm for grid xa using aldous sampling and grover search makes quantum queries thus the quantum query model over for strictly separates these two fundamental search problems
this paper discusses new approaches to interaction design for communication of art in the physical museum space in contrast to the widespread utilization of interactive technologies in cultural heritage and natural science museums it is generally challenge to introduce technology in art museums without disturbing the domain of the art works to explore the possibilities of communicating art through the use of technology and to minimize disturbance of the artworks we apply four main approaches in the communication gentle audio augmentation of art works conceptual affinity of art works and remote interactive installations using the body as an interaction device consistent audio visual cues for interaction opportunities the paper describes the application of these approaches for communication of inspirational material for mariko mori exhibition the installations are described and argued for experiences with the interactive communication are discussed based on qualitative and quantitative evaluations of visitor reactions it is concluded that the installations are received well by the visitors who perceived exhibition and communication as holistic user experience with seamless interactive communication
let ge be an integer we show that any undirected and unweighted graph on vertices has subgraph with kn edges such that for any two vertices isin if delta then delta furthermore we show that such subgraphs can be constructed in mn time where and are the number of edges and vertices in the original graph we also show that it is possible to construct weighted graph with kn edges such that for every isin if delta then delta le delta these are the first such results with additive error terms of the form ie additive error terms that are sublinear in the distance being approximated
this work envisions common design methodology applicable for every interconnect level and based on early wire characterization to provide faster convergence to feasible and robust design we claim that such novel design methodology is vital for upcoming nanometer technologies where increased variations in both device characteristics and interconnect parameters introduce tedious design closure problems the proposed methodology has been successfully applied to the wire synthesis of network on chip interconnect to i achieve given delay and noise goals and ii attain more power efficient design with respect to existing techniques
security is crucial aspect in any modern software system to ensure security in the final product security requirements must be considered in the entire software development process we evaluate in this paper how security requirements can be integrated into the analysis phase of an object oriented software development process our approach is model driven by providing models for security aspects related to the models for functional requirements we investigate how the security models can be generated from the functional models we give graph based formal semantics to the security models and present verification concepts which ensure the security requirements in the models
virtual functions make code easier for programmers to reuse but also make it harder for compilers to analyze we investigate the ability of three static analysis algorithms to improve programs by resolving virtual function calls thereby reducing compiled code size and reducing program complexity so as to improve both human and automated program understanding and analysis in measurements of seven programs of significant size to lines of code each we found that on average the most precise of the three algorithms resolved of the virtual function calls and reduced compiled code size by this algorithm is very fast it analyzes source lines per second on an mhz powerpc because of its accuracy and speed this algorithm is an excellent candidate for inclusion in production compilers
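The simplest of such resolution analyses, class hierarchy analysis, can be sketched in a few lines: a virtual call is devirtualized when exactly one method implementation is reachable from the static receiver type's subtree. The class hierarchy below is made up, and the paper's more precise algorithms are not reproduced here.

```python
# Minimal sketch of class hierarchy analysis (CHA) for virtual call resolution:
# a virtual call on static receiver type T can be devirtualized when exactly one
# implementation of the method is reachable from T's subtree. Class names are made up.

PARENT     = {"Circle": "Shape", "Square": "Shape", "Shape": None}
SUBCLASSES = {"Shape": ["Circle", "Square"], "Circle": [], "Square": []}
METHODS    = {"Shape": {"area"}, "Circle": {"area"}, "Square": set()}  # declared overrides

def impl_of(cls, method):
    """The class whose definition of `method` an instance of cls would actually run."""
    while cls is not None and method not in METHODS.get(cls, set()):
        cls = PARENT[cls]
    return cls

def possible_targets(receiver_type, method):
    """Implementations reachable from any class in receiver_type's subtree."""
    targets, stack = set(), [receiver_type]
    while stack:
        cls = stack.pop()
        targets.add(impl_of(cls, method))
        stack.extend(SUBCLASSES.get(cls, []))
    return targets - {None}

def resolve(receiver_type, method):
    targets = possible_targets(receiver_type, method)
    return targets.pop() if len(targets) == 1 else None   # None: keep virtual dispatch

if __name__ == "__main__":
    print(resolve("Square", "area"))   # -> 'Shape'  (single target: devirtualize)
    print(resolve("Shape",  "area"))   # -> None     (Shape.area or Circle.area)
```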
we present system architecture for the th generation of pc class programmable graphics processing units gpus the new pipeline features significant additions and changes to the prior generation pipeline including new programmable stage capable of generating additional primitives and streaming primitive data to memory an expanded common feature set for all of the programmable stages generalizations to vertex and image memory resources and new storage formats we also describe structural modifications to the api runtime and shading language to complement the new pipeline we motivate the design with descriptions of frequently encountered obstacles in current systems throughout the paper we present rationale behind prominent design choices and alternatives that were ultimately rejected drawing on insights collected during multi year collaboration with application developers and hardware designers
contemporary software development is based on global sharing of software component libraries as result programmers spend much time reading reference documentation rather than writing code making library reference documentation central programming tool traditionally reference documentation is designed for textbooks even though it may be distributed online however the computer provides new dimensions of change evolution and adaptation that can be utilized to support efficiency and quality in software development what is difficult to determine is how the electronic text dimensions best can be utilized in library reference documentation this article presents study of the design of electronic reference documentation for software component libraries results are drawn from study in an industrial environment based on the use of an experimental electronic reference documentation called dynamic javadoc or djavadoc used in real work situation for months the results from interviews with programmers indicate that the electronic library reference documentation does not require adaptation or evolution on an individual level more importantly reference documentation should facilitate the transfer of code from documentation to source files and also support the integration of multiple documentation sources
in some applications wireless sensor networks wsns operate in very harsh environments and nodes become subject to increased risk of damage sometimes wsn suffers from the simultaneous failure of multiple sensors and gets partitioned into disjoint segments restoring network connectivity in such case is crucial in order to avoid negative effects on the application given that wsns often operate unattended in remote areas the recovery should be autonomous this paper promotes an effective strategy for restoring the connectivity among these segments by populating the least number of relay nodes finding the optimal count and position of relay nodes is np hard and heuristics are thus pursued we propose distributed algorithm for optimized relay node placement using minimum steiner tree dorms since in autonomously operating wsns it is infeasible to perform network wide analysis to diagnose where segments are located dorms moves relay nodes from each segment toward the center of the deployment area as soon as those relays become in range of each other the partitioned segments resume operation dorms further model such initial inter segment topology as steiner tree in order to minimize the count of required relays disengaged relays can return to their respective segments to resume their pre failure duties we analyze dorms mathematically and explain the beneficial aspects of the resulting topology with respect to connectivity and traffic balance the performance of dorms is validated through extensive simulation experiments
we present system on chip soc testing approach that integrates test data compression test access mechanism test wrapper design and test scheduling an efficient linear feedback shift register lfsr reseeding technique is used as the compression engine all cores on the soc share single on chip lfsr at any clock cycle one or more cores can simultaneously receive data from the lfsr seeds for the lfsr are computed from the care bits for the test cubes for multiple cores we also propose scan slice based scheduling algorithm that attempts to maximize the number of care bits the lfsr can produce at each clock cycle such that the overall test application time tat is minimized this scheduling method is static in nature because it requires predetermined test cubes we also present dynamic scheduling method that performs test compression during test generation experimental results for international symposium on circuits and systems and international workshop on logic and synthesis benchmark circuits as well as industrial circuits show that optimum tat which is determined by the largest core can often be achieved by the static method if structural information is available for the cores the dynamic method is more flexible particularly since the performance of the static compression method depends on the nature of the predetermined test cubes
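The reseeding step can be illustrated with a toy: a test cube is mostly don't-cares with a few care bits, and the goal is a seed whose LFSR expansion matches every care bit. Production flows solve a linear system over GF(2) for this; here brute force over a small seed space with an assumed tap polynomial stands in for that solve.

```python
# Tiny sketch of LFSR reseeding for test compression: a test cube is mostly
# don't-cares (None) with a few care bits, and we look for a short seed whose
# LFSR expansion matches every care bit. Real flows solve a linear system over
# GF(2); brute force over a small seed space stands in for that here.
from itertools import product

def lfsr_expand(seed, taps, length):
    """Fibonacci LFSR: emit `length` bits starting from `seed` (sequence of 0/1)."""
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

def find_seed(test_cube, taps, seed_len):
    for seed in product((0, 1), repeat=seed_len):
        bits = lfsr_expand(seed, taps, len(test_cube))
        if all(c is None or c == b for c, b in zip(test_cube, bits)):
            return seed
    return None

if __name__ == "__main__":
    # 16-bit test cube with 5 care bits; 8-bit LFSR with (assumed) taps
    cube = [None, 1, None, None, 0, None, None, 1,
            None, None, None, 0, None, None, 1, None]
    print(find_seed(cube, taps=(0, 2, 3, 5), seed_len=8))
```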
we observe that the principal typing property of type system is the enabling technology for modularity and separate compilation we use this technology to formulate modular and polyvariant closure analysis based on the rank intersection types annotated with control flow information modularity manifests itself in syntax directed annotated type inference algorithm that can analyse program fragments containing free variables principal typing property is used to formalise it polyvariance manifests itself in the separation of different behaviours of the same function at its different uses this is formalised via the rank intersection types as the rank intersection type discipline types at least all core ml programs our analysis can be used in the separate compilation of such programs
program executing on low end embedded system such as smart card faces scarce memory resources and fixed execution time constraints we demonstrate that factorization of common instruction sequences in java bytecode allows the memory footprint to be reduced on average to of its original size with minimal execution time penalty while preserving java compatibility our solution requires only few modifications which are straightforward to implement in any jvm used in low end embedded system
this paper addresses an energy saving scheduling scheme of periodic real time tasks with the capability of dynamic voltage and frequency scaling on the lightly loaded multi core platform containing more processing cores than running tasks first it is shown that the problem of minimizing energy consumption of real time tasks is np hard even on the lightly loaded multi core platform next heuristic scheduling scheme is proposed to find an energy efficient schedule with low time complexity while meeting the deadlines of real time tasks the scheme exploits overabundant cores to reduce energy consumption using parallel execution and turns off the power of unused or rarely used cores evaluation shows that the proposed scheme saves up to energy consumption of the existing method executing each task on separate core
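A hedged sketch of a heuristic in this spirit: with more cores than tasks, each periodic task gets its own core, the core runs at the slowest discrete frequency whose capacity covers the task's utilization (so implicit deadlines are still met), and every unused core is powered off. The frequency levels and the cubic dynamic-power model are illustrative assumptions, not the paper's scheme.

```python
# Sketch of an energy-oriented heuristic of the kind described above: with more
# cores than tasks, give each periodic task its own core, scale that core's
# frequency down to the task's utilization, and power off every unused core.
# The frequency levels and cubic power model are illustrative assumptions.

FREQ_LEVELS = (0.25, 0.5, 0.75, 1.0)     # normalized available frequencies

def schedule(tasks, num_cores):
    """tasks: list of (wcet_at_fmax, period). Returns per-core (freq, task) picks."""
    assert len(tasks) <= num_cores, "lightly loaded platform assumed"
    plan = []
    for wcet, period in tasks:
        u = wcet / period                              # utilization at f_max
        freq = min(f for f in FREQ_LEVELS if u <= f)   # slowest feasible level
        plan.append((freq, (wcet, period)))
    plan += [("off", None)] * (num_cores - len(tasks)) # power off the rest
    return plan

def dynamic_energy(plan, horizon=1.0):
    # E ~ f^3 * busy_time (illustrative convex power model)
    total = 0.0
    for freq, task in plan:
        if task is None:
            continue
        wcet, period = task
        busy = (wcet / freq) * (horizon / period)      # execution stretches at low f
        total += (freq ** 3) * busy
    return total

if __name__ == "__main__":
    tasks = [(1.0, 10.0), (2.0, 5.0), (1.0, 4.0)]
    plan = schedule(tasks, num_cores=8)
    print(plan)
    print("energy:", dynamic_energy(plan, horizon=20.0))
```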
existing hierarchical summarization techniques fail to provide synopses good in terms of relative error metrics this paper introduces multiplicative synopses summarization paradigm tailored for effective relative error summarization this paradigm is inspired from previous hierarchical index based summarization schemes but goes beyond them by altering their underlying data representation mechanism existing schemes have decomposed the summarized data based on sums and differences of values resulting in what we call additive synopses we argue that the incapacity of these models to handle relative error metrics stems exactly from this additive nature of their representation mechanism we substitute this additive nature by multiplicative one we argue that this is more appropriate for achieving low relative error data approximations we develop an efficient linear time dynamic programming scheme for one dimensional multiplicative synopsis construction under general relative error based metrics and special scheme for the case of maximum relative error we generalize our schemes to higher data dimensionality and we show surprising additional benefit gained by our special scheme for maximum relative error in this case in our experimental study we verify the higher efficacy of our model on relative error oriented summarization problems
we present novel photographic technique called dual photography which exploits helmholtz reciprocity to interchange the lights and cameras in scene with video projector providing structured illumination reciprocity permits us to generate pictures from the viewpoint of the projector even though no camera was present at that location the technique is completely image based requiring no knowledge of scene geometry or surface properties and by its nature automatically includes all transport paths including shadows inter reflections and caustics in its simplest form the technique can be used to take photographs without camera we demonstrate this by capturing photograph using projector and photo resistor if the photo resistor is replaced by camera we can produce dataset that allows for relighting with incident illumination using an array of cameras we can produce slice of the reflectance field that allows for relighting with arbitrary light fields since an array of cameras can operate in parallel without interference whereas an array of light sources cannot dual photography is fundamentally more efficient way to capture such dataset than system based on multiple projectors and one camera as an example we show how dual photography can be used to capture and relight scenes
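The identity at the heart of the technique is linear light transport: if the camera image is c = T p for a projector pattern p, Helmholtz reciprocity says the image "seen from the projector" under camera-side lighting l is Tᵀ l. A toy numpy illustration with a made-up transport matrix (standing in for one measured with structured illumination):

```python
# Toy numpy illustration of the core identity behind dual photography: if the
# camera image is c = T @ p for a projector pattern p (T = light transport
# matrix), then by Helmholtz reciprocity the image "seen from the projector"
# under camera-side illumination l is T.T @ l. The tiny random T below stands
# in for a transport matrix measured with structured illumination.
import numpy as np

rng = np.random.default_rng(0)
num_cam_pixels, num_proj_pixels = 6, 4

T = rng.random((num_cam_pixels, num_proj_pixels)) * 0.2   # assumed measured transport

# primal: floodlit projector, image recorded at the camera
p = np.ones(num_proj_pixels)
camera_image = T @ p

# dual: "floodlight" placed at the camera, image synthesized at the projector
l = np.ones(num_cam_pixels)
dual_image = T.T @ l

print("camera image:", np.round(camera_image, 3))
print("dual image:  ", np.round(dual_image, 3))
```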
we analyze the bounded reachability problem of programs that use abstract data types and set comprehensions such programs are common as high level executable specifications of complex protocols we prove decidability and undecidability results of restricted cases of the problem and extend the satisfiability modulo theories approach to support analysis of set comprehensions over tuples and bag axioms we use the solver for our implementation and experiments and we use asml as the modeling language
in information integration systems sources may have diverse and limited query capabilities to obtain maximum information from these restrictive sources to answer query one can access sources that are not specified in the query ie off query sources in this article we propose query planning framework to answer queries in the presence of limited access patterns in the framework query and source descriptions are translated to recursive datalog program we then solve optimization problems in this framework including how to decide whether accessing off query sources is necessary how to choose useful sources for query and how to test query containment we develop algorithms to solve these problems and thus construct an efficient program to answer query
concurrent multipath transfer cmt uses the stream control transmission protocol’s sctp multihoming feature to distribute data across multiple end to end paths in multihomed sctp association we identify three negative side effects of reordering introduced by cmt that must be managed before efficient parallel transfer can be achieved unnecessary fast retransmissions by sender overly conservative congestion window cwnd growth at sender and increased ack traffic due to fewer delayed acks by receiver we propose three algorithms which augment and or modify current sctp to counter these side effects presented with several choices as to where sender should direct retransmissions of lost data we propose five retransmission policies for cmt we demonstrate spurious retransmissions in cmt with all five policies and propose changes to cmt to allow the different policies cmt is evaluated against appstripe which is an idealized application that stripes data over multiple paths using multiple sctp associations the different cmt retransmission policies are then evaluated with varied constrained receive buffer sizes in this foundation work we operate under the strong assumption that the bottleneck queues on the end to end paths used in cmt are independent
the goal in domain adaptation is to train model using labeled data sampled from domain different from the target domain on which the model will be deployed we exploit unlabeled data from the target domain to train model that maximizes likelihood over the training sample while minimizing the distance between the training and target distribution our focus is conditional probability models used for predicting label structure given input based on features defined jointly over and we propose practical measures of divergence between the two domains based on which we penalize features with large divergence while improving the effectiveness of other less deviant correlated features empirical evaluation on several real life information extraction tasks using conditional random fields crfs show that our method of domain adaptation leads to significant reduction in error
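A sketch of the idea on a logistic-regression stand-in for the CRF: estimate a per-feature divergence between the labeled source sample and the unlabeled target sample (here just the absolute difference of empirical means, an illustrative choice) and regularize high-divergence features more strongly so the model leans on features that transfer. This is not the paper's divergence measure or training procedure.

```python
# Sketch of divergence-penalized training on a logistic-regression stand-in for
# the CRF: features whose distributions differ between source and target domains
# receive a larger regularization penalty.
import numpy as np

def divergence(X_src, X_tgt):
    # illustrative per-feature divergence: |difference of empirical means|
    return np.abs(X_src.mean(axis=0) - X_tgt.mean(axis=0))

def train(X_src, y_src, X_tgt, lr=0.1, base_reg=0.01, epochs=300):
    d = divergence(X_src, X_tgt)
    reg = base_reg * (1.0 + d)            # per-feature penalty grows with divergence
    w = np.zeros(X_src.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_src @ w)))
        grad = X_src.T @ (p - y_src) / len(y_src) + reg * w
        w -= lr * grad
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(size=(200, 3))
    y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(float)
    X_tgt = X_src.copy()
    X_tgt[:, 1] += 2.0                    # feature 1 shifts in the target domain
    w = train(X_src, y_src, X_tgt)
    print(np.round(w, 3))                 # the weight on feature 1 is held back
```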
this paper proposes new model for the partitioning and scheduling of specification on partially dynamically reconfigurable hardware although this problem can be solved optimally only by tackling its subproblems jointly the exceeding complexity of such task leads to decomposition into two phases the partitioning phase is based on new graph theoretic approach which aims to obtain near optimality even if performed independently from the subsequent phase for the scheduling phase new integer linear programming formulation and heuristic approach are developed both take into account configuration prefetching and module reuse the experimental results show that the proposed method compares favorably with existing solutions
how should and how can software be managed what is the management concept or paradigm software professionals if they think about management of software at all think in terms of configuration management this is not method for over all software management it merely controls software items versions this is much too fine level of granularity management begins with accurate and timely information managers tend to view software as something unfortunately very necessary but troubling because they have very little real information about it and control is still nebulous at best accountants view software as an incomprehensible intangible neither wholly an expense nor really an asset they do not have nor do they produce information concerning it their data concerning software barely touches on direct outlays and contains no element of effort part of this disorientation is the basic confusion between business software and engineering software this gordian knot must be opened it needs to be made much more clear this article shows direction how such clarity may be achieved
as means of transmitting not only data but also code encapsulated within functions higher order channels provide an advanced form of task parallelism in parallel computations in the presence of mutable references however they pose safety problem because references may be transmitted to remote threads where they are no longer valid this paper presents an ml like parallel language with type safe higher order channels by type safety we mean that no value written to channel contains references or equivalently that no reference escapes via channel from the thread where it is created the type system uses typing judgment that is capable of deciding whether the value to which term evaluates contains references or not the use of such typing judgment also makes it easy to achieve another desirable feature of channels channel locality that associates every channel with unique thread for serving all values addressed to it our type system permits mutable references in sequential computations and also ensures that mutable references never interfere with parallel computations thus it provides both flexibility in sequential programming and ease of implementing parallel computations
multimedia and complex data are usually queried by similarity predicates whereas there are many works dealing with algorithms to answer basic similarity predicates there are no generic algorithms able to efficiently handle complex similarity queries combining several basic similarity predicates in this work we propose simple and effective set of algorithms that can be combined to answer complex similarity queries and set of algebraic rules useful to rewrite similarity query expressions into an adequate format for those algorithms those rules and algorithms allow relational database management systems to turn complex queries into efficient query execution plans we present experiments that highlight interesting scenarios they show that the proposed algorithms are orders of magnitude faster than the traditional similarity algorithms moreover they are linearly scalable considering the database size
collaborative filtering aims at helping users find items they should appreciate from huge catalogues in that field we can distinguish user based item based and model based approaches for each of them many options play crucial role for their performance and in particular the similarity function defined between users or items the number of neighbors considered for user or item based approaches the number of clusters for model based approaches using clustering and the prediction function used in this paper we review the main collaborative filtering methods proposed in the literature and compare them on the same widely used real dataset called movielens and using the same widely used performance measure called mean absolute error mae this study thus allows us to highlight the advantages and drawbacks of each approach and to propose some default options that we think should be used when using given approach or designing new one
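as a concrete illustration of the ingredients listed above the following minimal python sketch implements one common default choice a user based predictor with cosine similarity over co rated items a neighborhood of size k and the mae measure the specific similarity neighborhood size and prediction rule are assumptions for illustration not the exact options compared in the study

```python
# minimal user-based collaborative filtering sketch; assumes a dense
# user x item rating matrix where 0 means "not rated"
import numpy as np

def cosine_sim(u, v):
    mask = (u > 0) & (v > 0)                 # co-rated items only
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict(ratings, user, item, k=20):
    sims = np.array([cosine_sim(ratings[user], ratings[v])
                     for v in range(ratings.shape[0])])
    sims[user] = 0.0                          # never use the user as their own neighbor
    cand = np.where(ratings[:, item] > 0)[0]  # users who rated this item
    cand = cand[np.argsort(-sims[cand])][:k]  # keep the k most similar of them
    if len(cand) == 0 or sims[cand].sum() <= 0:
        return float(ratings[ratings > 0].mean())   # fall back to the global mean
    return float(sims[cand] @ ratings[cand, item] / sims[cand].sum())

def mae(ratings, test_triples, k=20):
    errors = [abs(predict(ratings, u, i, k) - r) for u, i, r in test_triples]
    return sum(errors) / len(errors)
```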
we introduce novel sensor fusion approach for automated initialization of marker less tracking systems it is not limited in tracking range and working environment given model of the objects or the real scene this is achieved based on statistical analysis and probabilistic estimation of the uncertainties of the tracking sensors the explicit representation of the error distribution allows the fusion of different sensor data this methodology was applied to an augmented reality system using mobile camera and several stationary tracking sensors and can be easily extended to the case of any additional sensor in order to solve the initialization problem we adapt modify and integrate advanced techniques such as plenoptic viewing intensity based registration and icp thereby the registration error is minimized in object space rather than in image space experimental results show how complex objects can be registered efficiently and accurately to single image
code size is an important concern in embedded systems vliw architectures are popular for embedded systems but often increase code size by requiring nops to be inserted into the code to satisfy instruction placement constraints existing vliw instruction schedulers target run time but not code size indeed current schedulers often increase code size by generating compensation copies of instructions when moving them across basic block boundaries our approach for the first time uses the power of scheduling instructions across blocks to reduce code size and not just run time for certain class of vliws we therefore show that trace scheduling previously synonymous with increased code size can in fact be used to reduce code size on such vliws our scheduler uses cost model driven back tracking approach that starts with an optimal algorithm for searching the solution space in exponential time but then also employs branch and bound techniques and non optimal heuristics to keep the compile time reasonable within factor of our method reduces the code size for our benchmarks by versus the best existing across block scheduler while being within of its run time
person seeking another person’s attention is normally able to quickly assess how interruptible the other person currently is such assessments allow behavior that we consider natural socially appropriate or simply polite this is in sharp contrast to current computer and communication systems which are largely unaware of the social situations surrounding their usage and the impact that their actions have on these situations if systems could model human interruptibility they could use this information to negotiate interruptions at appropriate times thus improving human computer interaction this article presents series of studies that quantitatively demonstrate that simple sensors can support the construction of models that estimate human interruptibility as well as people do these models can be constructed without using complex sensors such as vision based techniques and therefore their use in everyday office environments is both practical and affordable although currently based on demographically limited sample our results indicate substantial opportunity for future research to validate these results over larger groups of office workers our results also motivate the development of systems that use these models to negotiate interruptions at socially appropriate times
this paper describes program representation and algorithms for realizing novel structural testing methodology that not only focuses on addressing the complex features of object oriented languages but also incorporates the structure of object oriented software into the approach the testing methodology is based on the construction of contextual def use associations which provide context to each definition and use of an object testing based on contextual def use associations can provide increased test coverage by identifying multiple unique contextual def use associations for the same context free association such testing methodology promotes more thorough and focused testing of the manipulation of objects in object oriented programs this paper presents technique for the construction of contextual def use associations as well as detailed examples illustrating their construction an analysis of the cost of constructing contextual def use associations with this approach and description of prototype testing tool that shows how the theoretical contributions of this work can be useful for structural test coverage
we are developing virtuoso system for distributed computing using virtual machines vms virtuoso must be able to mix batch and interactive vms on the same physical hardware while satisfying constraints on responsiveness and compute rates for each workload vsched is the component of virtuoso that provides this capability vsched is an entirely user level tool that interacts with the stock linux kernel running below any type of virtual machine monitor to schedule vms indeed any process using periodic real time scheduling model this abstraction allows compute rate and responsiveness constraints to be straightforwardly described using period and slice within the period and it allows for fast and simple admission control this paper makes the case for periodic real time scheduling for vm based computing environments and then describes and evaluates vsched it also applies vsched to scheduling parallel workloads showing that it can help bsp application maintain fixed stable performance despite externally caused load imbalance
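the period and slice abstraction suggests a very small admission test the sketch below assumes a single cpu and a plain utilization bound of one which is an illustrative simplification rather than vsched's actual admission control

```python
# a minimal period/slice admission-control sketch (single cpu, utilization <= 1)
def admit(existing, period, slice_):
    """existing: list of (period, slice) pairs already admitted, in ms."""
    util = sum(s / p for p, s in existing) + slice_ / period
    return util <= 1.0

# usage: a vm asking for 30 ms of cpu every 100 ms, given two hypothetical admitted vms
vms = [(100.0, 20.0), (50.0, 10.0)]
print(admit(vms, period=100.0, slice_=30.0))   # True: 0.2 + 0.2 + 0.3 <= 1
```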
wireless sensor networks can revolutionise soil ecology by providing measurements at temporal and spatial granularities previously impossible this paper presents our first steps towards fulfilling that goal by developing and deploying two experimental soil monitoring networks at urban forests in baltimore md the nodes of these networks periodically measure soil moisture and temperature and store the measurements in local memory raw measurements are incrementally retrieved by sensor gateway and persistently stored in database the database also stores calibrated versions of the collected data the measurement database is available to third party applications through various web services interfaces at high level the deployments were successful in exposing high level variations of soil factors however we have encountered number of challenging technical problems need for low level programming at multiple levels calibration across space and time and sensor faults these problems must be addressed before sensor networks can fulfil their potential as high quality instruments that can be deployed by scientists without major effort or cost
medical cbir content based image retrieval applications pose unique challenges but at the same time offer many new opportunities on one hand while one can easily understand news or sports videos medical image is often completely incomprehensible to untrained eyes on the other hand semantics in the medical domain is much better defined and there is vast accumulation of formal knowledge representations that could be exploited to support semantic search for any specialty areas in medicine in this paper however we will not dwell on any one particular specialty area but rather address the question of how to support scalable semantic search across the whole of medical cbir field what are the advantages to take and gaps to fill what are the key enabling technologies and the critical success factor from an industrial point of view in terms of enabling technologies we discuss three aspects anatomical disease and contextual semantics and their representations using ontologies scalable image analysis and tagging algorithms and ontological reasoning and its role in guiding and improving image analysis and retrieval more specifically for ontological representation of medical imaging semantics we discuss the potential use of fma radlex icd and aim for scalable image analysis we present learning based anatomy detection and segmentation framework using distribution free priors it is easily adaptable to different anatomies and different imaging modalities
internet search companies sell advertisement slots based on users search queries via an auction advertisers have to determine how to place bids on the keywords of their interest in order to maximize their return for given budget this is the budget optimization problem the solution depends on the distribution of future queries in this paper we formulate stochastic versions of the budget optimization problem based on natural probabilistic models of distribution over future queries and address two questions that arise evaluation given solution can we evaluate the expected value of the objective function optimization can we find solution that maximizes the objective function in expectation our main results are approximation and complexity results for these two problems in our three stochastic models in particular our algorithmic results show that simple prefix strategies that bid on all cheap keywords up to some level are either optimal or good approximations for many cases we show other cases to be np hard
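a prefix strategy of the kind mentioned can be pictured with the following sketch which bids on the cheapest keywords first until the expected spend would exceed the budget the expected click counts and costs are hypothetical inputs and the code is not the paper's algorithm

```python
# a minimal prefix-strategy sketch: bid on all cheap keywords up to some level
def prefix_strategy(keywords, budget):
    """keywords: list of (cost_per_click, expected_clicks)."""
    chosen, spend = [], 0.0
    for cpc, clicks in sorted(keywords):          # cheapest keywords first
        cost = cpc * clicks                       # expected spend for this keyword
        if spend + cost > budget:
            break                                 # the prefix ends here
        chosen.append((cpc, clicks))
        spend += cost
    return chosen, spend

kws = [(0.10, 500), (0.25, 300), (0.40, 200), (1.00, 100)]   # hypothetical data
print(prefix_strategy(kws, budget=150.0))                    # keeps the two cheapest
```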
the emerging group oriented mobile commerce services are receiving significant interest among researchers developers wireless service providers and users some of these services including mobile auctions mobile financial services and multi party interactive games are transaction oriented and will require the network and protocol support for managing transactions in this paper we focus on technical challenges of managing transactions in group oriented mobile commerce services by presenting framework which includes requirements membership management and support for dependable transactions more specifically we present several group oriented mobile services characterize transaction requirements of group oriented commerce services present protocols for membership management to support both distributed and centralized processing and present multi network access and agent based system for dependable transactions
the ladis workshop on large scale distributed systems brought together leaders from the commercial cloud computing community with researchers working on variety of topics in distributed computing the dialog yielded some surprises some hot research topics seem to be of limited near term importance to the cloud builders while some of their practical challenges seem to pose new questions to us as systems researchers this brief note summarizes our impressions
this paper presents efficient algorithms for broadcasting on heterogeneous switch based networks of workstations hnow by two partitioned sub networks in an hnow many multiple speed types of workstations have different send and receive overheads previous research has found that routing by two sub networks in now can significantly increase system’s performance proc th international conference on computer communications and networks pp similarly ebs and vbbs proc th ieee international symposium on computer and communication pp designed by applying the concept of fastest nodes first can be executed in n log n time where n is the number of workstations this paper proposes two schemes two ebs and two vbbs for broadcasting in an hnow these two schemes divide an hnow into two sub networks that are routed concurrently and combine ebs and vbbs to broadcast in an hnow based on simulation results two vbbs outperforms ebs vbbs vbbswf proc th ieee international symposium on computer and communication pp the postorder recursive doubling proc merged ipps spdp conference pp and the optimal scheduling tree proc parallel and distributed processing symposium generated by dynamic programming in an hnow
we show how to efficiently obtain linear a priori bounds on the heap space consumption of first order functional programs the analysis takes space reuse by explicit deallocation into account and also furnishes an upper bound on the heap usage in the presence of garbage collection it covers wide variety of examples including for instance the familiar sorting algorithms for lists including quicksort the analysis relies on type system with resource annotations linear programming lp is used to automatically infer derivations in this enriched type system we also show that integral solutions to the linear programs derived correspond to programs that can be evaluated without any operating system support for memory management the particular integer linear programs arising in this way are shown to be feasibly solvable under mild assumptions
in this paper the concepts of set valued homomorphism and strong set valued homomorphism of ring are introduced and related properties are investigated the notions of generalized lower and upper approximation operators constructed by means of set valued mapping which is generalization of the notion of lower and upper approximation of ring are provided we also propose the notion of generalized lower and upper approximations with respect to an ideal of ring which is an extended notion of the rough ideal in a ring introduced lately by davvaz roughness in rings information sciences and discuss some significant properties of them
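for readers unfamiliar with the construction the generalized approximations induced by a set valued mapping T from a ring R into the power set of a ring S are usually written as below this is a minimal sketch of the standard definitions and not necessarily the exact formulation used in the paper

```latex
% generalized lower and upper approximations induced by T : R -> P(S)
\underline{T}(A) \;=\; \{\, x \in R \;:\; T(x) \subseteq A \,\},
\qquad
\overline{T}(A) \;=\; \{\, x \in R \;:\; T(x) \cap A \neq \emptyset \,\},
\qquad A \subseteq S
```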
load balancing has been key concern for traditional multiprocessor systems the emergence of computational grids extends this challenge to deal with more serious problems such as scalability heterogeneity of computing resources and considerable transfer delay in this paper we present dynamic and decentralized load balancing algorithm for computationally intensive jobs on heterogeneous distributed computing platform the time spent by job in the system is considered as the main issue that needs to be minimized our main contributions are our algorithm uses site desirability for processing power and transfer delay to guide load assignment and redistribution our transfer and location policies are combination of two specific strategies that are performance driven to minimize execution cost these two policies are the instantaneous distribution policy idp and the load adjustment policy lap the communication overhead involved in information collection is reduced using mutual information feedback the simulation results show that our proposed algorithm outperforms conventional approaches over wide range of system parameters
at present most of the state of the art solutions for xml access controls are either document level access control techniques that are too limited to support fine grained security enforcement view based approaches that are often expensive to create and maintain or impractical proposals that require substantial security related support from underlying xml databases in this paper we take different approach that assumes no security support from underlying xml databases and examine three alternative fine grained xml access control solutions namely primitive pre processing and post processing approaches in particular we advocate pre processing method called qfilter that uses non deterministic finite automata nfa to rewrite user’s query such that any parts violating access control rules are pruned we show the construction and execution of qfilter and demonstrate its superiority to other competing methods
obliq is lexically scoped untyped interpreted language that supports distributed object oriented computation obliq objects have state and are local to site obliq computations can roam over the network while maintaining network connections distributed lexical scoping is the key mechanism for managing distributed computation
many future shared memory multiprocessor servers will both target commercial workloads and use highly integrated glueless designs implementing low latency cache coherence in these systems is difficult because traditional approaches either add indirection for common cache to cache misses directory protocols or require totally ordered interconnect traditional snooping protocols unfortunately totally ordered interconnects are difficult to implement in glueless designs an ideal coherence protocol would avoid indirections and interconnect ordering however such an approach introduces numerous protocol races that are difficult to resolve we propose new coherence framework to enable such protocols by separating performance from correctness performance protocol can optimize for the common case ie absence of races and rely on the underlying correctness substrate to resolve races provide safety and prevent starvation we call the combination token coherence since it explicitly exchanges and counts tokens to control coherence permissions this paper develops tokenb specific token coherence performance protocol that allows glueless multiprocessor to both exploit low latency unordered interconnect like directory protocols and avoid indirection like snooping protocols simulations using commercial workloads show that our new protocol can significantly outperform traditional snooping and directory protocols
to our best knowledge all existing graph pattern mining algorithms can only mine either closed maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the minimum description length principle in some applications in this paper we study new problem of frequent subgraph mining called frequent connected graph generator mining which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of apriori property for graph generators whereas we still present an efficient solution fogger for this new problem by exploring some properties of graph generators two effective pruning techniques backward edge pruning and forward edge pruning are proposed to prune the branches of the well known dfs code enumeration tree that do not contain graph generators to further improve the efficiency an effective index structure adi is also devised to facilitate the subgraph isomorphism checking we experimentally evaluate various aspects of fogger using both real and synthetic datasets our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space and fogger is efficient and scalable in terms of the base size of input databases meanwhile the performance study for graph generator based classification model shows that generator based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph based model
in this paper we develop recommendation framework to connect image content with communities in online social media the problem is important because users are looking for useful feedback on their uploaded content but finding the right community for feedback is challenging for the end user social media are characterized by both content and community hence in our approach we characterize images through three types of features visual features user generated text tags and social interaction user communication history in the form of comments recommendation framework based on learning latent space representation of the groups is developed to recommend the most likely groups for given image the model was tested on large corpus of flickr images comprising images our method outperforms the baseline method with mean precision and mean recall importantly we show that fusing image content text tags with social interaction features outperforms the case of only using image content or tags
self optimization is one of the defining characteristics of an autonomic computing system for complex system such as the database management system dbms to be self optimizing it should recognize properties of its workload and be able to adapt to changes in these properties over time the workload type for example is key to tuning dbms and may vary over the system’s normal processing cycle continually monitoring dbms using special tool called workload classifier in order to detect changes in the workload type can inevitably impose significant overhead that may degrade the overall performance of the system instead the dbms should selectively monitor the workload during some specific periods recommended by the psychic skeptic prediction psp framework that we introduce in this work the psp framework allows the dbms to forecast major shifts in the workload by combining off line and on line prediction methods we integrate the workload classifier with the psp framework in order to come up with an architecture by which the autonomous dbms can tune itself efficiently our experiments show that this approach is effective and resilient as the prediction framework adapts gracefully to changes in the workload patterns
developing user friendly transformation tools for converting all or part of given relational database into xml has not received enough consideration this paper presents flexible user interface called virex visual relational to xml which facilitates converting selected portion of given underlying relational database into xml virex works even when the catalogue of the underlying relational database is missing for the latter case virex extracts the required catalogue information by analyzing the underlying database content from the catalogue information whether available or extracted virex derives and displays on the screen graph similar to the entity relationship diagram virex provides user friendly interface to specify on the graph certain factors to be considered while converting relational data into xml such factors include selecting the relations attributes to be converted into xml specifying predicate to be satisfied by the information to be converted into xml deciding on the order of nesting between the relations to be converted into xml all of these are specified by sequence of mouse clicks with minimum keyboard input as result virex displays on the screen the xml schema that satisfies the specified characteristics and generates the xml document from the underlying relational database finally virex is essential to optimize the amount of information to be transferred over network by giving the user the flexibility to specify the amount of relational data to be converted into xml also virex can be used to teach xml to beginners
acontextuality of the mobile phone often leads to caller’s uncertainty over callee’s current state which in turn often hampers mobile collaboration we are interested in re designing smartphone’s contact book to provide cues of the current situations of others contextcontacts presents several meaningful automatically communicated situation cues of trusted others its interaction design follows social psychological findings on how people make social attributions based on impoverished cues on how self disclosure of cues is progressively and interactionally managed and on how mobility affects interaction through cues we argue how our design choices support mobile communication decisions and group coordinations by promoting awareness as result the design is very minimal and integrated in an unremarkable manner to previously learned usage patterns with the phone first laboratory and field evaluations indicate important boundary conditions for and promising avenues toward more useful and enjoyable mobile awareness applications
we describe new attack against web authentication which we call dynamic pharming dynamic pharming works by hijacking dns and sending the victim’s browser malicious javascript which then exploits dns rebinding vulnerabilities and the name based same origin policy to hijack legitimate session after authentication has taken place as result the attack works regardless of the authentication scheme used dynamic pharming enables the adversary to eavesdrop on sensitive content forge transactions sniff secondary passwords etc to counter dynamic pharming attacks we propose two locked same origin policies for web browsers in contrast to the legacy same origin policy which regulates cross object access control in browsers using domain names the locked same origin policies enforce access using servers certificates and public keys we show how our policies help two existing web authentication mechanisms client side ssl and ssl only cookies resist both pharming and stronger active attacks also we present deployability analysis of our policies based on study of ssl domains our results suggest one of our policies can be deployed today and interoperate seamlessly with the vast majority of legacy web servers for our other policy we present simple incrementally deployable opt in mechanism for legacy servers using policy files and show how web sites can use policy files to support self signed and untrusted certificates shared subdomain objects and key updates
we propose the model of nested words for representation of data with both linear ordering and hierarchically nested matching of items examples of data with such dual linear hierarchical structure include executions of structured programs annotated linguistic data and html xml documents nested words generalize both words and ordered trees and allow both word and tree operations we define nested word automata finite state acceptors for nested words and show that the resulting class of regular languages of nested words has all the appealing theoretical properties that the classical regular word languages enjoy deterministic nested word automata are as expressive as their nondeterministic counterparts the class is closed under union intersection complementation concatenation kleene star prefixes and language homomorphisms membership emptiness language inclusion and language equivalence are all decidable and definability in monadic second order logic corresponds exactly to finite state recognizability we also consider regular languages of infinite nested words and show that the closure properties mso characterization and decidability of decision problems carry over the linear encodings of nested words give the class of visibly pushdown languages of words and this class lies between balanced languages and deterministic context free languages we argue that for algorithmic verification of structured programs instead of viewing the program as context free language over words one should view it as regular language of nested words or equivalently visibly pushdown language and this would allow model checking of many properties such as stack inspection pre post conditions that are not expressible in existing specification logics we also study the relationship between ordered trees and nested words and the corresponding automata while the analysis complexity of nested word automata is the same as that of classical tree automata they combine both bottom up and top down traversals and enjoy expressiveness and succinctness benefits over tree automata
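the acceptor model can be pictured with a very small sketch each position of a nested word is tagged as a call a return or an internal symbol calls push a hierarchical state and the matching return consumes it the toy automaton below is hypothetical and only illustrates the mechanics for a well matched word

```python
# a minimal deterministic nested-word-automaton acceptor sketch
def run_nwa(word, q0, delta_int, delta_call, delta_ret, accepting):
    """word: list of (tag, symbol) with tag in {'call', 'ret', 'int'};
    assumes the word is well matched (every return has a pending call)."""
    q, stack = q0, []
    for tag, a in word:
        if tag == 'int':
            q = delta_int[(q, a)]
        elif tag == 'call':
            q, p = delta_call[(q, a)]
            stack.append(p)            # hierarchical state pushed at the call
        else:                          # the matching return pops and uses it
            p = stack.pop()
            q = delta_ret[(q, p, a)]
    return q in accepting and not stack

# toy automaton: accepts well-matched words over calls '<', returns '>', internals 'a'
delta_int  = {('q', 'a'): 'q'}
delta_call = {('q', '<'): ('q', 'P')}
delta_ret  = {('q', 'P', '>'): 'q'}
w = [('call', '<'), ('int', 'a'), ('call', '<'), ('ret', '>'), ('ret', '>')]
print(run_nwa(w, 'q', delta_int, delta_call, delta_ret, {'q'}))  # True
```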
nfc and rfid technologies have found their way into current mobile phones and research has presented variety of applications using nfc rfid tags for interaction between physical objects and mobile devices since this type of interaction is widely novel for most users there is considerable initial inhibition threshold for them in order to get novice users started with this physical interaction and its applications we have designed different ways to increase the learnability and guidance of such applications their effectiveness was evaluated in qualitative and quantitative user study with participants who interacted with nfc equipped posters in different ways we report on the types of usage errors observed and show that future designs of nfc rfid based mobile applications should consider using dedicated start tag for interaction
it is not easy to obtain the right information from the web for particular web user or group of users due to the obstacle of automatically acquiring web user profiles the current techniques do not provide satisfactory structures for mining web user profiles this paper presents novel approach for this problem the objective of the approach is to automatically discover ontologies from data sets in order to build complete concept models for web user information needs it also proposes method for capturing evolving patterns to refine discovered ontologies in addition the process of assessing relevance in ontology is established this paper provides both theoretical and experimental evaluations for the approach the experimental results show that all objectives we expect for the approach are achievable
most computer users find organizing large amounts of personal information problematic often hierarchies are the sole means to do it however users can remember broader range of autobiographic contextual data about their personal items unfortunately it can seldom be used to manage and retrieve them even when this is possible it is often done by asking users to fill in values for arbitrary properties in dialog boxes or wizards we propose that narrative based interfaces can be natural and effective way to help users recall relevant autobiographic data about their personal items and convey it to the computer using quill narrative based personal document retrieval interface as case study we show how such an interface can be designed we demonstrate the approach’s validity based on set of user studies discussing how the problems raised by the evaluation of such an interface were overcome
skewness has been one of the major problems not only in parallel relational database systems but also in parallel object oriented database systems to improve performance of object oriented query processing careful and intelligent skew handling for load balancing must be established depending on the parallel machine environment whether it is shared memory or shared nothing architecture load balancing can be achieved through physical or logical data re distribution it is not the aim of this paper to propose or to investigate skew handling methods but rather to analyze the impact of load balancing to query execution scheduling strategies our analysis shows that when load balancing is achieved serial execution scheduling is preferable to parallel execution scheduling strategy in other words allocating full resources to sub query seems to be better than dividing resources to multiple sub queries
authorship identification can be viewed as text categorization task however in this task the most frequent features appear to be the most important discriminators there is usually shortage of training texts and the training texts are rarely evenly distributed over the authors to cope with these problems we propose tensors of second order for representing the stylistic properties of texts our approach requires the calculation of much fewer parameters in comparison to the traditional vector space representation we examine various methods for building appropriate tensors taking into account that similar features should be placed in the same neighborhood based on an existing generalization of svm able to handle tensors we perform experiments on corpora controlled for genre and topic and show that the proposed approach can effectively handle cases where only limited training texts are available
wireless sensor networks have received lot of attention recently due to their wide applications such as target tracking environment monitoring and scientific exploration in dangerous environments it is usually necessary to have cluster of sensor nodes share common view of local clock time so that all these nodes can coordinate in some important applications such as time slotted mac protocols power saving protocols with sleep listen modes etc however all the clock synchronization techniques proposed for sensor networks assume benign environments they cannot survive malicious attacks in hostile environments fault tolerant clock synchronization techniques are potential candidates to address this problem however existing approaches are all resource consuming and suffer from message collisions in most of cases this paper presents novel fault tolerant clock synchronization scheme for clusters of nodes in sensor networks where the nodes in each cluster can communicate through broadcast the proposed scheme guarantees an upper bound of clock difference between any nonfaulty nodes in cluster provided that the malicious nodes are no more than one third of the cluster unlike the traditional fault tolerant clock synchronization approaches the proposed technique does not introduce collisions between synchronization messages nor does it require costly digital signatures
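the kind of guarantee stated above is usually obtained by discarding extreme clock readings the sketch below shows the classical fault tolerant averaging idea for n readings with at most f faulty ones where n > 3f it illustrates why the one third bound appears and is not the broadcast based protocol proposed in the paper

```python
# classical fault-tolerant averaging sketch: drop the f smallest and f largest
# readings, then take the midpoint of what survives
def fault_tolerant_estimate(readings, f):
    assert len(readings) > 3 * f, "requires n > 3f readings"
    kept = sorted(readings)[f:len(readings) - f]   # discard f extremes on each side
    return (kept[0] + kept[-1]) / 2.0              # midpoint of the surviving range

clocks = [100.2, 100.1, 99.9, 100.0, 250.0, 99.8, 100.3]  # one malicious outlier
print(fault_tolerant_estimate(clocks, f=1))                # close to 100.1
```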
scientific applications often perform complex computational analyses that consume and produce large data sets we are concerned with data placement policies that distribute data in ways that are advantageous for application execution for example by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability in particular we propose to study the relationship between data placement services and workflow management systems in this paper we explore the interactions between two services used in large scale science today we evaluate the benefits of prestaging data using the data replication service versus using the native data stage in mechanisms of the pegasus workflow management system we use the astronomy application montage for our experiments and modify it to study the effect of input data size on the benefits of data prestaging as the size of input data sets increases prestaging using data placement service can significantly improve the performance of the overall analysis
register integration or just integration is register renaming discipline that implements instruction reuse via physical register sharing initially developed to perform squash reuse the integration mechanism can exploit more reuse scenarios here we describe three extensions to the original design that expand its applicability and boost its performance impact first we extend squash reuse to general reuse whereas squash reuse maintains the concept of an instruction instance owning its output register we allow multiple instructions to simultaneously share single register next we replace the pc indexing scheme with an opcode based indexing scheme that exposes more integration opportunities finally we introduce an extension called reverse integration in which we speculatively create integration entries for the inverses of operations for instance when renaming an add we create an entry for the inverse subtract reverse integration allows us to reuse operations that the program itself has not executed yet we use reverse integration to implement speculative memory bypassing for stack pointer based loads register fills and restores our evaluation shows that these extensions increase the integration rate the number of retired instructions that integrate older results and bypass the execution engine to an average of on the spec integer benchmarks on way superscalar processor with an aggressive memory system this translates into an average ipc improvement of the fact that integrating instructions completely bypass the execution engine raises the possibility of using integration as low complexity substitute for execution bandwidth and issue buffering our experiments show that such trade off is possible enabling range of ipc complexity designs
multi hop infrastructure wireless mesh networks offer increased reliability coverage and reduced equipment costs over their single hop counterpart wireless lans equipping wireless routers with multiple radios further improves the capacity by transmitting over multiple radios simultaneously using orthogonal channels efficient channel assignment and routing is essential for throughput optimization of mesh clients efficient channel assignment schemes can greatly relieve the interference effect of close by transmissions effective routing schemes can alleviate potential congestion on any gateways to the internet thereby improving per client throughput unlike previous heuristic approaches we mathematically formulate the joint channel assignment and routing problem taking into account the interference constraints the number of channels in the network and the number of radios available at each mesh router we then use this formulation to develop solution for our problem that optimizes the overall network throughput subject to fairness constraints on allocation of scarce wireless capacity among mobile clients we show that the performance of our algorithms is within constant factor of that of any optimal algorithm for the joint channel assignment and routing problem our evaluation demonstrates that our algorithm can effectively exploit the increased number of channels and radios and it performs much better than the theoretical worst case bounds
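a schematic and heavily simplified skeleton of such a joint formulation is sketched below maximizing a common scaling factor lambda of the client demands d_u subject to flow conservation an interference constraint over an interference set I(e) and a per node radio budget R_u all symbols are notation introduced here for illustration and not the paper's exact formulation

```latex
% schematic joint channel-assignment / routing skeleton; all symbols assumed here
\begin{aligned}
\max\quad & \lambda \\
\text{s.t.}\quad
 & \sum_{c}\Big(\sum_{e\in\mathrm{in}(u)} f_{e,c}-\sum_{e\in\mathrm{out}(u)} f_{e,c}\Big) \;=\; \lambda\, d_u
   && \forall\,\text{client } u \ \ \text{(fair share of demand)}\\
 & \sum_{e'\in I(e)} \frac{f_{e',c}}{\mathrm{cap}(e')} \;\le\; 1
   && \forall\, e,\ c \ \ \text{(interference within } I(e)\text{)}\\
 & \sum_{c} x_{u,c} \;\le\; R_u,\qquad f_{e,c} \;\le\; \mathrm{cap}(e)\, x_{u,c}
   && \forall\, u,\ e \text{ incident to } u \ \ \text{(radio budget)}
\end{aligned}
```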
parameterization of mesh data is important for many graphics applications in particular for texture mapping remeshing and morphing closed manifold genus meshes are topologically equivalent to sphere hence this is the natural parameter domain for them parameterizing triangle mesh onto the sphere means assigning position on the unit sphere to each of the mesh vertices such that the spherical triangles induced by the mesh connectivity are not too distorted and do not overlap satisfying the non overlapping requirement is the most difficult and critical component of this process we describe generalization of the method of barycentric coordinates for planar parameterization which solves the spherical parameterization problem prove its correctness by establishing connection to spectral graph theory and show how to compute these parameterizations
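the barycentric style relaxation behind such a construction can be sketched as follows every free vertex is repeatedly moved to the average of its neighbors and re projected onto the unit sphere uniform weights a fixed anchor set and plain gauss seidel sweeps are simplifying assumptions and not the paper's exact method

```python
# minimal spherical barycentric-relaxation sketch
import numpy as np

def spherical_relax(pos, neighbors, anchors, iters=200):
    """pos: dict vertex -> unit 3-vector (np.array), neighbors: dict vertex -> list
    of adjacent vertices, anchors: vertices kept fixed to pin down the embedding."""
    for _ in range(iters):
        for v, nbrs in neighbors.items():
            if v in anchors:
                continue
            avg = np.mean([pos[u] for u in nbrs], axis=0)   # barycentric average
            n = np.linalg.norm(avg)
            if n > 1e-12:
                pos[v] = avg / n                            # project back onto the sphere
    return pos
```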
prime implicates and prime implicants have proven relevant to number of areas of artificial intelligence most notably abductive reasoning and knowledge compilation the purpose of this paper is to examine how these notions might be appropriately extended from propositional logic to the modal logic we begin the paper by considering number of potential definitions of clauses and terms for the different definitions are evaluated with respect to set of syntactic semantic and complexity theoretic properties characteristic of the propositional definition we then compare the definitions with respect to the properties of the notions of prime implicates and prime implicants that they induce while there is no definition that perfectly generalizes the propositional notions we show that there does exist one definition which satisfies many of the desirable properties of the propositional case in the second half of the paper we consider the computational properties of the selected definition to this end we provide sound and complete algorithms for generating and recognizing prime implicates and we show the prime implicate recognition task to be pspace complete we also prove upper and lower bounds on the size and number of prime implicates while the paper focuses on the logic all of our results hold equally well for multi modal and for concept expressions in the description logic alc
routing protocols for disruption tolerant networks dtns use variety of mechanisms including discovering the meeting probabilities among nodes packet replication and network coding the primary focus of these mechanisms is to increase the likelihood of finding path with limited information and so these approaches have only an incidental effect on such routing metrics as maximum or average delivery delay in this paper we present rapid an intentional dtn routing protocol that can optimize specific routing metric such as the worst case delivery delay or the fraction of packets that are delivered within deadline the key insight is to treat dtn routing as resource allocation problem that translates the routing metric into per packet utilities that determine how packets should be replicated in the system we evaluate rapid rigorously through prototype deployed over vehicular dtn testbed of buses and simulations based on real traces to our knowledge this is the first paper to report on routing protocol deployed on real outdoor dtn our results suggest that rapid significantly outperforms existing routing protocols for several metrics we also show empirically that for small loads rapid is within of the optimal performance
while object oriented database management systems are already arriving in the marketplace their formal foundations are still under development in this paper one central aspect of such foundations formal models for object oriented databases is considered it is discussed why formal model is desirable what it is supposed to comprise in this context structural as well as behavioral part and how this can be achieved to this end the central ingredients which are shared by many proposed models are presented in some detail this carries over to design issues for database descriptions in an object oriented model for which two distinct strategies are outlined finally the question is discussed whether the modeling concepts described are indeed the ones that the applications which originally triggered the merger of database technology with object oriented concepts need our argument is that this is only partially the case and two promising directions for future work are sketched
current methods of using lexical features in machine translation have difficulty in scaling up to realistic mt tasks due to prohibitively large number of parameters involved in this paper we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in state of the art hierarchical mt system the features used in this work are non terminal labels non terminal length distribution source string context and source dependency lm scores the effectiveness of our techniques is demonstrated by significant improvements over strong base line on arabic to english translation improvements in lower cased bleu are on nist mt and on mt newswire data on decoding output on chinese to english translation the improvements are on mt and on mt newswire data
the annotation of web sites in social bookmarking systems has become popular way to manage and find information on the web the community structure of such systems attracts spammers recent post pages popular pages or specific tag pages can be manipulated easily as result searching or tracking recent posts does not deliver quality results annotated in the community but rather unsolicited often commercial web sites to retain the benefits of sharing one’s web content spam fighting mechanisms that can face the flexible strategies of spammers need to be developed classical approach in machine learning is to determine relevant features that describe the system’s users train different classifiers with the selected features and choose the one with the most promising evaluation results in this paper we will transfer this approach to social bookmarking setting to identify spammers we will present features considering the topological semantic and profile based information which people make public when using the system the dataset used is snapshot of the social bookmarking system bibsonomy and was built over the course of several months when cleaning the system from spam based on our features we will learn large set of different classification models and compare their performance our results represent the groundwork for first application in bibsonomy and for the building of more elaborate spam detection mechanisms
we give polynomial time algorithm to find shortest contractible cycle ie closed walk without repeated vertices in graph embedded in surface this answers question posed by hutchinson in contrast we show that finding shortest contractible cycle through given vertex is np hard we also show that finding shortest separating cycle in an embedded graph is np hard this answers question posed by mohar and thomassen
we develop new approach that uses the ordered weighted averaging owa operator in the selection of financial products in doing so we introduce the ordered weighted averaging distance owad operator and the ordered weighted averaging adequacy coefficient owaac operator these aggregation operators are very useful for decision making problems because they establish comparison between an ideal alternative and available options in order to find the optimal choice the objective of this new model is to manipulate the attitudinal character of previous methods based on distance measures so that the decision maker can select financial products according to his or her degree of optimism which is also known as the orness measure the main advantage of using the owa operator is that we can generate parameterized family of aggregation operators between the maximum and the minimum thus the analysis developed in the decision process by the decision maker is much more complete because he or she is able to select the particular case in accordance with his or her interests in the aggregation process the paper ends with an illustrative example that shows results obtained by using different types of aggregation operators in the selection of financial products
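the owad aggregation itself is simple to state the componentwise distances between the ideal alternative and a candidate are reordered decreasingly and combined with the owa weight vector so that where the weight mass sits encodes the decision maker's degree of optimism the example weights and attribute values below are hypothetical

```python
# minimal ordered weighted averaging distance (owad) sketch
def owad(ideal, option, weights):
    # componentwise distances, reordered decreasingly before weighting
    dists = sorted((abs(a - b) for a, b in zip(ideal, option)), reverse=True)
    assert len(weights) == len(dists) and abs(sum(weights) - 1.0) < 1e-9
    return sum(w * d for w, d in zip(weights, dists))

ideal  = [0.9, 0.8, 1.0, 0.7]            # hypothetical ideal financial product
option = [0.6, 0.8, 0.7, 0.9]            # hypothetical available option
print(owad(ideal, option, [0.4, 0.3, 0.2, 0.1]))   # weights on large distances: pessimistic
```

shifting the weight mass toward the smallest ordered distances would give a more optimistic max-like attitude which is the parameterization between minimum and maximum mentioned above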
deforming surfaces such as cloth can be generated through physical simulation morphing and even video capture such data is currently very difficult to alter after the generation process is complete and data generated for one purpose generally cannot be adapted to other uses such adaptation would be extremely useful however being able to take cloth captured from flapping flag and attach it to character to make cape or enhance the wrinkles on simulated garment would greatly enhance the usability and re usability of deforming surface data in addition it is often necessary to cleanup or tweak simulation results doing this by editing each frame individually is very time consuming and tedious process extensive research has investigated how to edit and re use skeletal motion capture data but very little has addressed completely non rigid deforming surfaces we have developed novel method that now makes it easy to edit such arbitrary deforming surfaces our system enables global signal processing direct manipulation multiresolution embossing and constraint editing on arbitrarily deforming surfaces such as simulated cloth motion captured cloth morphs and other animations the foundation of our method is novel time varying multiresolution transform which adapts to the changing geometry of the surface in temporally coherent manner
we pose the correspondence problem as one of energy based segmentation in this framework correspondence assigns each pixel in an image to exactly one of several non overlapping regions and it also computes displacement function for each region the framework is better able to capture the scene geometry than the more direct formulation of matching pixels in two or more images particularly when the surfaces in the scene are not fronto parallel to illustrate the framework we present specific correspondence algorithm that minimizes an energy functional by alternating between segmenting the image into number of non overlapping regions using the multiway cut algorithm of boykov veksler and zabih and finding the affine parameters describing the displacement of the pixels in each region after convergence final step escapes local minima due to over segmentation the basic algorithm is extended in two ways using ground control points to detect long thin regions and warping segmentation results to efficiently process image sequences experiments on real images show the algorithm’s ability to find an accurate segmentation and displacement map as well as discontinuities and creases on wide variety of stereo and motion imagery
we present novel approach to the analysis of the reliability of component based system that takes into account an important architectural attribute namely the error propagation probability this is the probability that an error arising somewhere in the system propagates to other components possibly up to the system output as we show in the paper this attribute may heavily affect decisions on crucial architectural choices nonetheless it is often neglected in modeling the reliability of component based systems our modeling approach provides useful support to the reliability engineering of component based systems since it can be used to drive several significant tasks such as placing error detection and recovery mechanisms ii focusing the design implementation and selection efforts on critical components iii devising cost effective testing strategies we illustrate the approach on an atm example system
the video databases have become popular in various areas due to the recent advances in technology video archive systems need user friendly interfaces to retrieve video frames in this paper user interface based on natural language processing nlp to video database system is described the video database is based on content based spatio temporal video data model the data model is focused on the semantic content which includes objects activities and spatial properties of objects spatio temporal relationships between video objects and also trajectories of moving objects can be queried with this data model in this video database system natural language interface enables flexible querying the queries which are given as english sentences are parsed using link parser the semantic representations of the queries are extracted from their syntactic structures using information extraction techniques the extracted semantic representations are used to call the related parts of the underlying video database system to return the results of the queries not only exact matches but similar objects and activities are also returned from the database with the help of the conceptual ontology module this module is implemented using distance based method of semantic similarity search on the semantic domain independent ontology wordnet
we present an efficient algorithm for collision detection between static rigid objects using dual bounding volume hierarchy which consists of an oriented bounding box obb tree enhanced with bounding spheres this approach combines the compactness of obbs and the simplicity of spheres the majority of distant objects are separated using the simpler sphere tests the remaining objects are in close proximity where some separation axes are significantly more effective than others we select from among the potential separating axes for obbs experimental results show that our algorithm achieves considerable speedup in most cases with respect to the existing obb algorithms
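the cheap first test order can be sketched as follows a sphere sphere check rejects most distant pairs before the separating axis test on the oriented boxes is run the box representation and the enclosing sphere radius used here are illustrative choices and not the exact scheme of the paper

```python
# minimal sphere-then-obb collision test sketch
import numpy as np

def spheres_disjoint(c1, r1, c2, r2):
    return np.linalg.norm(c1 - c2) > r1 + r2

def obbs_overlap(c1, e1, R1, c2, e2, R2, eps=1e-9):
    """centers c, half-extent vectors e, 3x3 rotations R (columns are box axes);
    standard 15-axis separating axis test."""
    t = c2 - c1
    axes = [R1[:, i] for i in range(3)] + [R2[:, j] for j in range(3)]
    axes += [np.cross(R1[:, i], R2[:, j]) for i in range(3) for j in range(3)]
    for a in axes:
        n = np.linalg.norm(a)
        if n < eps:                       # near-parallel edge pair, skip this axis
            continue
        a = a / n
        ra = sum(e1[i] * abs(a @ R1[:, i]) for i in range(3))
        rb = sum(e2[j] * abs(a @ R2[:, j]) for j in range(3))
        if abs(a @ t) > ra + rb:
            return False                  # separating axis found
    return True

def collide(c1, e1, R1, c2, e2, R2):
    r1, r2 = np.linalg.norm(e1), np.linalg.norm(e2)   # bounding spheres of the boxes
    if spheres_disjoint(c1, r1, c2, r2):
        return False                      # cheap rejection handles most distant pairs
    return obbs_overlap(c1, e1, R1, c2, e2, R2)

I, c0 = np.eye(3), np.zeros(3)
print(collide(c0, np.array([1., 1., 1.]), I,
              np.array([1.5, 0., 0.]), np.array([1., 1., 1.]), I))  # True
```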
we describe new class of utility maximization scheduling problem with precedence constraints the disconnected staged scheduling problem dssp dssp is nonpreemptive multiprocessor deadline scheduling problem that arises in several commercially important applications including animation rendering protein analysis and seismic signal processing dssp differs from most previously studied deadline scheduling problems because the graph of precedence constraints among tasks within jobs is disconnected with one component per job another difference is that in practice we often lack accurate estimates of task execution times and so purely offline solutions are not possible however we do know the set of jobs and their precedence constraints up front and therefore some offline planning is possible our solution decomposes dssp into an offline job selection phase followed by an online task dispatching phase we model the former as knapsack problem and explore several solutions to it describe new dispatching algorithm for the latter and compare both with existing methods our theoretical results show that while dssp is np hard and inapproximable in general our two phase scheduling method guarantees good performance bound for many special cases our empirical results include an evaluation of scheduling algorithms on real animation rendering workload we present characterization of this workload in companion paper the workload records eight weeks of activity on cpu cluster used to render portions of the full length animated feature film shrek in we show that our improved scheduling algorithms can substantially increase the aggregate value of completed jobs compared to existing practices our new task dispatching algorithm lcpf performs well by several metrics including job completion times as well as the aggregate value of completed jobs
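the offline job selection phase modeled as a knapsack can be pictured with the following sketch which picks the subset of jobs whose total estimated demand fits the available capacity and whose total value is maximal integer demands and an exact dynamic program are simplifying assumptions made here for illustration

```python
# minimal 0/1-knapsack job-selection sketch
def select_jobs(jobs, capacity):
    """jobs: list of (value, demand) with integer demand, capacity: integer budget."""
    best = [0.0] * (capacity + 1)
    take = [[False] * (capacity + 1) for _ in jobs]
    for j, (value, demand) in enumerate(jobs):
        for cap in range(capacity, demand - 1, -1):   # iterate downward: each job once
            if best[cap - demand] + value > best[cap]:
                best[cap] = best[cap - demand] + value
                take[j][cap] = True
    chosen, cap = [], capacity                        # trace back the chosen jobs
    for j in range(len(jobs) - 1, -1, -1):
        if take[j][cap]:
            chosen.append(j)
            cap -= jobs[j][1]
    return best[capacity], chosen

# hypothetical jobs as (value, estimated cpu-hours) with 50 cpu-hours available
print(select_jobs([(60, 10), (100, 20), (120, 30)], capacity=50))  # (220.0, [2, 1])
```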
collecting and annotating exemplary cases is costly and critical task that is required in early stages of any classification process reducing labeling cost without degrading accuracy calls for compromise solution which may be achieved with active learning common active learning approaches focus on accuracy and assume the availability of pre labeled set of exemplary cases covering all classes to learn this assumption does not necessarily hold in this paper we study the capabilities of new active learning approach confidence in rapidly covering the case space when compared to the traditional active learning confidence criterion when the representativeness assumption is not met experimental results also show that confidence reduces the number of queries required to achieve complete class coverage and tends to improve or maintain classification error
the ml module system provides powerful parameterization facilities but lacks the ability to split mutually recursive definitions across modules and provides insufficient support for incremental programming promising approach to solve these issues is ancona and zucca’s mixin module calculus cms however the straightforward way to adapt it to ml fails because it allows arbitrary recursive definitions to appear at any time which ml does not otherwise support in this article we enrich cms with refined type system that controls recursive definitions through the use of dependency graphs we then develop and prove sound separate compilation scheme directed by dependency graphs that translates mixin modules down to call by value lambda calculus extended with nonstandard let rec construct
scratchpad memory has been introduced as replacement for cache memory as it improves the performance of certain embedded systems additionally it has also been demonstrated that scratchpad memory can significantly reduce the energy consumption of the memory hierarchy of embedded systems this is significant as the memory hierarchy consumes substantial proportion of the total energy of an embedded system this paper deals with optimization of the instruction memory scratchpad based on novel methodology that uses metric which we call the concomitance this metric is used to find basic blocks which are executed frequently and in close proximity in time once such blocks are found they are copied into the scratchpad memory at appropriate times this is achieved using special instruction inserted into the code at appropriate places for set of benchmarks taken from mediabench our scratchpad system consumed just avg of the energy of the cache system and avg of the energy of the state of the art scratchpad system while improving the overall performance compared to the state of the art method the number of instructions copied into the scratchpad memory from the main memory is reduced by
we demonstrate that shape contexts can be used to quickly prune search for similar shapes we present two algorithms for rapid shape retrieval representative shape contexts performing comparisons based on small number of shape contexts and shapemes using vector quantization in the space of shape contexts to obtain prototypical shape pieces
we present novel language model based approach to re ranking an initially retrieved list so as to improve precision at top ranks our model integrates whole document information with that induced from passages specifically inter passage inter document and query based similarities are integrated in our model empirical evaluation demonstrates the effectiveness of our approach
current anomaly detection schemes focus on control flow monitoring recently chen et al discovered that large category of attacks tamper program data but do not alter control flows these attacks are not only realistic but are also as important as classical attacks tampering control flows detecting these attacks is critical issue but has received little attention so far in this work we propose an intrusion detection scheme with both compiler and micro architecture support detecting data tampering directly the compiler first identifies program regions in which the data should not be modified as per program semantics then the compiler performs an analysis to determine the conditions for modification of variables in different program regions and conveys this information to the hardware and the hardware checks the data accesses based on the information if the compiler asserts that the data should not be modified but there is an attempt to do so at runtime an attack is detected the compiler starts with basic scheme achieving maximum data protection but such scheme also suffers from high performance overhead we then attempt to reduce the performance overhead through different optimization techniques our experiments show that our scheme achieves strong memory protection with tight control over the performance degradation thus our major contribution is to provide an efficient scheme to detect data tampering while minimizing the overhead
in recent years query answering over description logic dl knowledge bases has been receiving increasing attention and various methods and techniques have been presented for this problem in this paper we consider knots which are an instance of the mosaic technique from modal logic when annotated with suitable query information knots are flexible tool for query answering that allows for solving the problem in simple and intuitive way the knot approach yields optimal complexity bounds as we illustrate on the dls $\mathcal{ALCH}$ and $\mathcal{ALCHI}$ and can be easily extended to accommodate other constructs
in this paper we describe new density biased sampling algorithm it exploits spatial indexes and the local density information they preserve to provide improved quality of sampling result and fast access to elements of the dataset it attains improved sampling quality with respect to factors like skew noise or dimensionality moreover it has the advantage of efficiently handling dynamic updates and it requires low execution times the performance of the proposed method is examined experimentally the comparative results illustrate its superiority over existing methods
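the sketch below is hypothetical illustration of density biased sampling using simple grid as stand in for the local density information that spatial index would provide the cell size exponent and target fraction are assumptions not parameters from the paper

# hypothetical sketch of density biased sampling over a grid
import random
from collections import defaultdict

def density_biased_sample(points, cell_size, target_fraction, exponent=0.5):
    """Sample points with probability decreasing in local (grid-cell) density,
    so sparse regions are better represented than under uniform sampling.
    points: iterable of coordinate tuples."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)
    n = len(points)
    # weight each cell as density^(-exponent); scale so the expected sample
    # size is roughly target_fraction * n
    weights = {k: len(v) ** (-exponent) for k, v in cells.items()}
    total = sum(len(v) * weights[k] for k, v in cells.items())
    scale = target_fraction * n / total
    sample = []
    for k, members in cells.items():
        accept_prob = min(1.0, scale * weights[k])
        sample.extend(p for p in members if random.random() < accept_prob)
    return sample
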
attributed graphs are increasingly more common in many application domains such as chemistry biology and text processing central issue in graph mining is how to collect informative subgraph patterns for given learning task we propose an iterative mining method based on partial least squares regression pls to apply pls to graph data sparse version of pls is developed first and then it is combined with weighted pattern mining algorithm the mining algorithm is iteratively called with different weight vectors creating one latent component per one mining call our method graph pls is efficient and easy to implement because the weight vector is updated with elementary matrix calculations in experiments our graph pls algorithm showed competitive prediction accuracies in many chemical datasets and its efficiency was significantly superior to graph boosting gboost and the naive method based on frequent graph mining
naive bayes and logistic regression perform well in different regimes while the former is very simple generative model which is efficient to train and performs well empirically in many applications the latter is discriminative model which often achieves better accuracy and can be shown to outperform naive bayes asymptotically in this paper we propose novel hybrid model partitioned logistic regression which has several advantages over both naive bayes and logistic regression this model separates the original feature space into several disjoint feature groups individual models on these groups of features are learned using logistic regression and their predictions are combined using the naive bayes principle to produce robust final estimation we show that our model is better both theoretically and empirically in addition when applying it in practical application email spam filtering it improves the normalized auc score at false positive rate by and compared to naive bayes and logistic regression when using the exact same training examples
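a minimal sketch of the partitioned logistic regression idea is given below assuming scikit learn is available one logistic model is fit per feature group and the group log odds are combined under the naive bayes independence assumption the feature grouping is supplied by the caller and is not derived from the paper

# minimal sketch: per-group logistic models combined naive-bayes style
import numpy as np
from sklearn.linear_model import LogisticRegression

class PartitionedLR:
    def __init__(self, feature_groups):
        self.groups = feature_groups          # list of column-index lists
        self.models = []
        self.prior_logodds = 0.0

    def fit(self, X, y):
        p = np.clip(np.mean(y), 1e-9, 1 - 1e-9)
        self.prior_logodds = np.log(p / (1 - p))
        self.models = [LogisticRegression(max_iter=1000).fit(X[:, g], y)
                       for g in self.groups]
        return self

    def predict_proba(self, X):
        # assuming groups are conditionally independent given the class,
        # sum the group log-odds and subtract the prior counted once per extra group
        logodds = -(len(self.groups) - 1) * self.prior_logodds
        for model, g in zip(self.models, self.groups):
            pr = np.clip(model.predict_proba(X[:, g])[:, 1], 1e-9, 1 - 1e-9)
            logodds = logodds + np.log(pr / (1 - pr))
        return 1.0 / (1.0 + np.exp(-logodds))   # probability of the positive class
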
following recent interest in the study of computer science problems in game theoretic setting we consider the well known bin packing problem where the items are controlled by selfish agents each agent is charged with cost according to the fraction of the used bin space its item requires that is the cost of the bin is split among the agents proportionally to their sizes thus the selfish agents prefer their items to be packed in bin that is as full as possible the social goal is to minimize the number of the bins used the social cost in this case is therefore the number of bins used in the packing a pure nash equilibrium is packing where no agent can obtain smaller cost by unilaterally moving his item to different bin while other items remain in their original positions strong nash equilibrium is packing where there exists no subset of agents all agents in which can profit from jointly moving their items to different bins we say that all agents in subset profit from moving their items to different bins if all of them have strictly smaller cost as result of moving while the other items remain in their positions we measure the quality of the equilibria using the standard measures poa and pos that are defined as the worst case worst best asymptotic ratio between the social cost of pure nash equilibrium and the cost of an optimal packing respectively we also consider the recently introduced measures spoa and spos that are defined similarly to the poa and the pos but consider only strong nash equilibria we give nearly tight lower and upper bounds of and respectively on the poa of the bin packing game improving upon previous result by bilò and establish the fact that pos we show that the bin packing game admits strong nash equilibrium and that spoa spos we prove that this value is equal to the approximation ratio of natural greedy algorithm for bin packing
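the following sketch only illustrates the equilibrium notion used above it checks whether given packing is pure nash equilibrium under proportional cost sharing the function name capacity and tolerance are illustrative assumptions

# hypothetical check: is a packing a pure Nash equilibrium under proportional sharing
def is_pure_nash(bins, capacity=1.0, eps=1e-12):
    # bins: list of lists of item sizes; an item of size s in a bin of load L pays s/L
    loads = [sum(b) for b in bins]
    for i, b in enumerate(bins):
        for s in b:
            current_cost = s / loads[i]
            for j, load_j in enumerate(loads):
                if j == i:
                    continue
                if load_j + s <= capacity + eps and s / (load_j + s) < current_cost - eps:
                    return False            # profitable unilateral deviation exists
    return True

# two half-empty bins are not stable: either item would rather join the other bin
print(is_pure_nash([[0.4], [0.4]]))   # False
print(is_pure_nash([[0.4, 0.4]]))     # True
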
semantic web applications share large portion of development effort with database driven web applications existing approaches for development of these database driven applications cannot be directly applied to semantic web data due to differences in the underlying data model we develop mapping approach that embeds semantic web data into object oriented languages and thereby enables reuse of existing web application frameworks we analyse the relation between the semantic web and the web and survey the typical data access patterns in semantic web applications we discuss the mismatch between object oriented programming languages and semantic web data for example in the semantics of class membership inheritance relations and object conformance to schemas we present activerdf an object oriented api for managing rdf data that offers full manipulation and querying of rdf data does not rely on schema and fully conforms to rdf semantics activerdf can be used with different rdf data stores adapters have been implemented to generic sparql endpoints sesame jena redland and yars and new adapters can be added easily we demonstrate the usage of activerdf and its integration with the popular ruby on rails framework which enables rapid development of semantic web applications
we propose new synchronization protocol devised to support multiplayer online games mogs over peer to peer architectures the dissemination of game events is performed through an overlay network peers are kept synchronized thanks to an optimistic mechanism which is able to drop obsolete events ie events that lose their importance as the game goes on and to allow different processing orders for non correlated events ie events which do not represent competing actions in the virtual world to allow fast identification of obsolete events gossip protocol is added to the synchronization mechanism which is in charge of spreading in background information on generated game events results coming from extensive simulations confirm the viability of our approach
operating system memory managers fail to consider the population of read versus write pages in the buffer pool or outstanding requests when writing dirty pages to disk or network file systems this leads to bursty patterns which stall processes reading data and reduce the efficiency of storage we address these limitations by adaptively allocating memory between write buffering and read caching and by writing dirty pages to disk opportunistically before the operating system submits them for write back we implement and evaluate our methods within the linux operating system and show performance gains of more than for mixed read write workloads
in order to generate high quality code for modern processors compiler must aggressively schedule instructions maximizing resource utilization for execution efficiency for compiler to produce such code it must avoid structural hazards by being aware of the processor’s available resources and of how these resources are utilized by each instruction unfortunately the most prevalent approach to constructing such scheduler manually discovering and specifying this information is both tedious and error prone this paper presents new approach which when given processor or processor model automatically determines this information after establishing that the problem of perfectly determining processor’s structural hazards through probing is not solvable this paper proposes heuristic algorithm that discovers most of this information in practice this can be used either to alleviate the problems associated with manual creation or to verify an existing specification scheduling with these automatically derived structural hazards yields almost all of the performance gain achieved using perfect hazard information
there is growing recognition that programming platforms should support the decomposition of programs into components independent units of compiled code that are explicitly linked to form complete programs this paper describes how to formulate general component system for nominally typed object oriented language supporting first class generic types simply by adding appropriate annotations and syntactic sugar the fundamental semantic building blocks for constructing type checking and manipulating components are provided by the underlying first class generic type system to demonstrate the simplicity and utility of this approach to supporting components we have designed and implemented an extension of java called component nextgen cgen cgen which is based on the sun java javac compiler is backward compatible with existing code and runs on current java virtual machines
we have recently begun to see hardware support for the tabletop user interface offering number of new ways for humans to interact with computers tabletops offer great potential for face to face social interaction advances in touch technology and computer graphics provide natural ways to directly manipulate virtual objects which we can display on the tabletop surface such an interface has the potential to benefit wide range of the population and it is important that we design for usability and learnability with diverse groups of people this paper describes the design of sharepic multiuser multi touch gestural collaborative digital photograph sharing application for tabletop and our evaluation with both young adult and elderly user groups we describe the guidelines we have developed for the design of tabletop interfaces for range of adult users including elders and the user interface we have built based on them novel aspects of the interface include design strongly influenced by the metaphor of physical photographs placed on the table with interaction techniques designed to be easy to learn and easy to remember in our evaluation we gave users the final task of creating digital postcard from collage of photographs and performed realistic think aloud with pairs of novice participants learning together from tutorial script
the processing of data streams in general and the mining of such streams in particular have recently attracted considerable attention in various research fields key problem in stream mining is to extend existing machine learning and data mining methods so as to meet the increased requirements imposed by the data stream scenario including the ability to analyze incoming data in an online incremental manner to observe tight time and memory constraints and to appropriately respond to changes of the data characteristics and underlying distributions amongst others this paper considers the problem of classification on data streams and develops an instance based learning algorithm for that purpose the experimental studies presented in the paper suggest that this algorithm has number of desirable properties that are not at least not as whole shared by currently existing alternatives notably our method is very flexible and thus able to adapt to an evolving environment quickly point of utmost importance in the data stream context at the same time the algorithm is relatively robust and thus applicable to streams with different characteristics
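the sketch below is simplified stand in for an instance based stream classifier it keeps bounded sliding window case base and classifies by nearest neighbours the fixed size window policy is an assumption and does not reproduce the case editing rules of the paper

# simplified instance-based (k-NN) classifier over a data stream
from collections import deque, Counter
import math

class StreamKNN:
    def __init__(self, k=5, window=1000):
        self.k = k
        self.case_base = deque(maxlen=window)   # (features, label) pairs

    def _dist(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def predict(self, x):
        if not self.case_base:
            return None
        nearest = sorted(self.case_base, key=lambda c: self._dist(c[0], x))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn_one(self, x, y):
        # test-then-train protocol typical for stream evaluation
        prediction = self.predict(x)
        self.case_base.append((x, y))           # oldest case is evicted automatically
        return prediction
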
texture atlas is an efficient color representation for paint systems the model to be textured is decomposed into charts homeomorphic to discs each chart is parameterized and the unfolded charts are packed in texture space existing texture atlas methods for triangulated surfaces suffer from several limitations requiring them to generate large number of small charts with simple borders the discontinuities between the charts cause artifacts and make it difficult to paint large areas with regular patterns in this paper our main contribution is new quasi conformal parameterization method based on least squares approximation of the cauchy riemann equations the so defined objective function minimizes angle deformations and we prove the following properties the minimum is unique independent of similarity in texture space independent of the resolution of the mesh and cannot generate triangle flips the function is numerically well behaved and can therefore be very efficiently minimized our approach is robust and can parameterize large charts with complex borders we also introduce segmentation methods to decompose the model into charts with natural shapes and new packing algorithm to gather them in texture space we demonstrate our approach applied to paint both scanned and modeled data sets
much research has focused on content based image retrieval cbir methods that can be automated in image classification and query processing in this paper we propose blob centric image retrieval scheme based on the blobworld representation the blob centric scheme consists of several newly proposed components including an image classification method an image browsing method based on semantic hierarchy of representative blobs and blob search method based on multidimensional indexing we present the database structures and their maintenance algorithms for these components and conduct performance comparison of three image retrieval methods the naive method the representative blobs method and the indexed blobs method our quantitative analysis shows significant reduction in query response time by using the representative blobs method and the indexed blobs method
continuations can be used to explain wide variety of control behaviours including calling returning procedures raising handling exceptions labelled jumping goto statements process switching coroutines and backtracking however continuations are often manipulated in highly stylised way and we show that all of these bar backtracking in fact use their continuations linearly this is formalised by taking target language for cps transforms that has both intuitionistic and linear function types
detecting strong conjunctive predicates is fundamental problem in debugging and testing distributed programs strong conjunctive predicate is logical statement to represent the desired event of the system therefore if the predicate is not true an error may occur because the desired event does not happen recently several reported detection algorithms reveal the problem of unbounded state queue growth since the system may generate huge amount of execution states in very short time in order to solve this problem this paper introduces the notion of removable states which can be disregarded in the sense that detection results still remain correct fully distributed algorithm is developed in this paper to perform the detection in an online manner based on the notion of removable states the time complexity of the detection algorithm is improved as the number of states to be evaluated is reduced
collecting program’s execution profile is important for many reasons code optimization memory layout program debugging and program comprehension path based execution profiles are more detailed than count based execution profiles since they present the order of execution of the various blocks in program modules procedures basic blocks etc recently online string compression techniques have been employed for collecting compact representations of sequential program executions in this paper we show how similar approach can be taken for shared memory parallel programs our compaction scheme yields one to two orders of magnitude compression compared to the uncompressed parallel program trace on some of the splash benchmarks our compressed execution traces contain detailed information about synchronization and control data flow which can be exploited for post mortem analysis in particular information in our compact execution traces are useful for accurate data race detection detecting unsynchronized shared variable accesses that occurred in the execution
traditional models of information retrieval assume documents are independently relevant but when the goal is retrieving diverse or novel information about topic retrieval models need to capture dependencies between documents such tasks require alternative evaluation and optimization methods that operate on different types of relevance judgments we define faceted topic retrieval as particular novelty driven task with the goal of finding set of documents that cover the different facets of an information need faceted topic retrieval system must be able to cover as many facets as possible with the smallest number of documents we introduce two novel models for faceted topic retrieval one based on pruning set of retrieved documents and one based on retrieving sets of documents through direct optimization of evaluation measures we compare the performance of our models to mmr and the probabilistic model due to zhai et al on set of topics annotated with facets showing that our models are competitive
capacity limitation is one of the fundamental issues in wireless mesh networks this paper addresses capacity improvement issues in multiradio multi channel wireless mesh networks our objective is to find both dynamic and static channel assignments and corresponding link schedules that maximize the network capacity we focus on determining the highest gain we can achieve from increasing the number of radios and channels under certain traffic demands we consider two different types of traffic demands one is expressed in the form of data size vector and the other is in the form of data rate vector for the first type of traffic demand our objective is to minimize the number of time slots to transport all the data for the second type of traffic demand our objective is to satisfy the bandwidth requirement as much as possible we perform trade off analysis between network performance and hardware cost based on the number of radios and channels in different topologies this work provides valuable insights for wireless mesh network designers during network planning and deployment
frequently proposed solution to node misbehavior in mobile ad hoc networks is to use reputation systems but in ephemeral networks new breed of mobile networks where contact times between nodes are short and neighbors change frequently reputations are hard to build in this case local revocation is faster and more efficient alternative in this paper we define game theoretic model to analyze the various local revocation strategies we establish and prove the conditions leading to subgame perfect equilibria we also derive the optimal parameters for voting based schemes then we design protocol based on our analysis and the practical aspects that cannot be captured in the model with realistic simulations on ephemeral networks we compare the performance and economic costs of the different techniques
real world data especially when generated by distributed measurement infrastructures such as sensor networks tends to be incomplete imprecise and erroneous making it impossible to present it to users or feed it directly into applications the traditional approach to dealing with this problem is to first process the data using statistical or probabilistic models that can provide more robust interpretations of the data current database systems however do not provide adequate support for applying models to such data especially when those models need to be frequently updated as new data arrives in the system hence most scientists and engineers who depend on models for managing their data do not use database systems for archival or querying at all at best databases serve as persistent raw data store in this paper we define new abstraction called model based views and present the architecture of mauvedb the system we are building to support such views just as traditional database views provide logical data independence model based views provide independence from the details of the underlying data generating mechanism and hide the irregularities of the data by using models to present consistent view to the users mauvedb supports declarative language for defining model based views allows declarative querying over such views using sql and supports several different materialization strategies and techniques to efficiently maintain them in the face of frequent updates we have implemented prototype system that currently supports views based on regression and interpolation using the apache derby open source dbms and we present results that show the utility and performance benefits that can be obtained by supporting several different types of model based views in database system
in comparison to maps mobile maps involve volumetric instead of flat representation of space realistic instead of symbolic representation of objects more variable views that are directional and bound to first person perspective more degrees of freedom in movement and dynamically changing object details we conducted field experiment to understand the influence of these qualities on mobile spatial task where buildings shown on the map were to be localized in the real world the representational differences were reflected in how often users interact with the physical environment and in when they are more likely to physically turn and move the device instead of using virtual commands maps direct users into using reliable and ubiquitous environmental cues like street names and crossings and better affords the use of pre knowledge and bodily action to reduce cognitive workload both acclaimed virtues of mobile maps rapid identification of objects and ego centric alignment worked poorly due to reasons we discuss however with practice some users learned to shift to like strategies and could thereby improve performance we conclude with discussion of how representational differences in mobile maps affect strategies of embodied interaction
the intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build better metaclassifier via some combination of classifiers we introduce probabilistic method for combining classifiers that considers the context sensitive reliabilities of contributing classifiers the method harnesses reliability indicators which are variables that provide signals about the performance of classifiers in different situations we provide background present procedures for building metaclassifiers that take into consideration both reliability indicators and classifier outputs and review set of comparative studies undertaken to evaluate the methodology
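as rough stand in for the idea the following stacking style sketch feeds both base classifier outputs and reliability indicator features to meta level model it is not the probabilistic model of the paper and it assumes scikit learn is available

# stacking-style stand-in: a meta-model over base outputs plus reliability indicators
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_metaclassifier(base_outputs, reliability_indicators, labels):
    # base_outputs: (n, m) scores from m base classifiers
    # reliability_indicators: (n, r) context signals (e.g. document length)
    meta_features = np.hstack([base_outputs, reliability_indicators])
    return LogisticRegression(max_iter=1000).fit(meta_features, labels)

def meta_predict(meta_model, base_outputs, reliability_indicators):
    meta_features = np.hstack([base_outputs, reliability_indicators])
    return meta_model.predict_proba(meta_features)[:, 1]
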
monodic first order temporal logic is fragment of first order temporal logic for which sound and complete calculi have been devised one such calculus is ordered fine grained resolution with selection which is implemented in the theorem prover temp however the architecture of temp cannot guarantee the fairness of its derivations in this paper we present an architecture for resolution based monodic first order temporal logic prover that can ensure fair derivations and we describe the implementation of this fair architecture in the theorem prover tspass
in this paper we define and study new opinionated text data analysis problem called latent aspect rating analysis lara which aims at analyzing opinions expressed about an entity in an online review at the level of topical aspects to discover each individual reviewer’s latent opinion on each aspect as well as the relative emphasis on different aspects when forming the overall judgment of the entity we propose novel probabilistic rating regression model to solve this new text mining problem in general way empirical experiments on hotel review data set show that the proposed latent rating regression model can effectively solve the problem of lara and that the detailed analysis of opinions at the level of topical aspects enabled by the proposed model can support wide range of application tasks such as aspect opinion summarization entity ranking based on aspect ratings and analysis of reviewers rating behavior
password based three party authenticated key exchange protocols are extremely important to secure communications and are now extensively adopted in network communications these protocols allow users to communicate securely over public networks simply by using easy to remember passwords in considering authentication between server and user this study categorizes password based three party authenticated key exchange protocols into explicit server authentication and implicit server authentication the former must achieve mutual authentication between server and users while executing the protocol while the latter only achieves authentication among users this study presents two novel simple and efficient three party authenticated key exchange protocols one protocol provides explicit server authentication and the other provides implicit server authentication the proposed protocols do not require server public keys additionally both protocols have proven secure in the random oracle model compared with existing protocols the proposed protocols are more efficient and provide greater security
power leakage constitutes an increasing fraction of the total power consumption in modern semiconductor technologies recent research efforts indicate that architectures compilers and software can be optimized so as to reduce the switching power also known as dynamic power in microprocessors this has led to interest in using architecture and compiler optimization to reduce leakage power also known as static power in microprocessors in this article we investigate compiler analysis techniques that are related to reducing leakage power the architecture model in our design is system with an instruction set to support the control of power gating at the component level our compiler provides an analysis framework for utilizing instructions to reduce the leakage power we present framework for analyzing data flow for estimating the component activities at fixed points of programs whilst considering pipeline architectures we also provide equations that can be used by the compiler to determine whether employing power gating instructions in given program blocks will reduce the total energy requirements as the duration of power gating on components when executing given program routines is related to the number and complexity of program branches we propose set of scheduling policies and evaluate their effectiveness we performed experiments by incorporating our compiler analysis and scheduling policies into suif compiler tools and by simulating the energy consumptions on wattch toolkits the experimental results demonstrate that our mechanisms are effective in reducing leakage power in microprocessors
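a hedged illustration of the kind of break even reasoning described above is sketched below gating component over an idle region pays off only if the leakage energy saved exceeds the energy cost of switching the component off and on again all parameter names and units are placeholders not figures or equations from the article

# break-even test for inserting a power-gating pair around an idle region
def power_gating_saves_energy(idle_cycles, cycle_time_ns,
                              leakage_power_mw, gate_overhead_nj):
    # mW * ns = pJ, so multiply by 1e-3 to express the saving in nJ
    leakage_saved_nj = leakage_power_mw * idle_cycles * cycle_time_ns * 1e-3
    return leakage_saved_nj > gate_overhead_nj

def min_idle_cycles_to_gate(cycle_time_ns, leakage_power_mw, gate_overhead_nj):
    # smallest idle interval (in cycles) for which gating pays off
    per_cycle_saving_nj = leakage_power_mw * cycle_time_ns * 1e-3
    return gate_overhead_nj / per_cycle_saving_nj
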
we provide smoothed analysis of hoare’s find algorithm and we revisit the smoothed analysis of quicksort hoare’s find algorithm often called quickselect is an easy to implement algorithm for finding the th smallest element of sequence while the worst case number of comparisons that hoare’s find needs is the average case number is we analyze what happens between these two extremes by providing smoothed analysis of the algorithm in terms of two different perturbation models additive noise and partial permutations in the first model an adversary specifies sequence of numbers of and then each number is perturbed by adding random number drawn from the interval we prove that hoare’s find needs $\Theta\big(\frac{\cdot}{\cdot}\sqrt{\cdot}\big)$ comparisons in expectation if the adversary may also specify the element that we would like to find furthermore we show that hoare’s find needs fewer comparisons for finding the median in the second model each element is marked with probability and then random permutation is applied to the marked elements we prove that the expected number of comparisons to find the median is in $\Omega\big(\frac{n}{p}\log(\cdot)\big)$ which is again tight finally we provide lower bounds for the smoothed number of comparisons of quicksort and hoare’s find for the median of three pivot rule which usually yields faster algorithms than always selecting the first element the pivot is the median of the first middle and last element of the sequence we show that median of three does not yield significant improvement over the classic rule the lower bounds for the classic rule carry over to median of three
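for reference the sketch below is plain implementation of hoare’s find with the median of three pivot rule discussed above it is purely illustrative and carries none of the smoothed analysis

# Hoare's find (quickselect) with the median-of-three pivot rule
import random

def median_of_three(seq, lo, hi):
    # index of the median of the first, middle and last element of seq[lo..hi]
    mid = (lo + hi) // 2
    trio = sorted([(seq[lo], lo), (seq[mid], mid), (seq[hi], hi)])
    return trio[1][1]

def quickselect(values, k):
    """Return the k-th smallest element (0-based) of values."""
    seq = list(values)                      # work on a copy
    assert 0 <= k < len(seq)
    lo, hi = 0, len(seq) - 1
    while True:
        if lo == hi:
            return seq[lo]
        p = median_of_three(seq, lo, hi)
        seq[p], seq[hi] = seq[hi], seq[p]   # move the chosen pivot to the end
        pivot, store = seq[hi], lo
        for i in range(lo, hi):             # partition around the pivot
            if seq[i] < pivot:
                seq[i], seq[store] = seq[store], seq[i]
                store += 1
        seq[store], seq[hi] = seq[hi], seq[store]
        if k == store:
            return seq[store]
        elif k < store:
            hi = store - 1
        else:
            lo = store + 1

# example: the median of a lightly perturbed sequence
data = [x + random.random() for x in range(11)]
print(quickselect(data, len(data) // 2))
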
in this article novel local experts organization leo model for processing tree structures with its application of natural scene images classification is presented instead of relatively poor representation of image features in flat vector form we proposed to extract the features and encode them into binary tree representation the proposed leo model is used to generalize this tree representation in order to perform the classification task the capabilities of the proposed leo model are evaluated in simulations running under different image scenarios experimental results demonstrate that the leo model is consistent in terms of robustness amongst the other tested classifiers
this paper describes the issue of piconet interconnection for bluetooth technology these larger networks known as scatternets have the potential to increase networking flexibility and facilitate new applications while the bluetooth specification permits piconet interconnection the creation operation and maintenance of scatternets remains open in this paper the research contributions in this arena are brought together to give an overview of the state of the art first operation of the bluetooth system is explained followed by the mechanism for link formation then the issue of piconet interconnection is considered in detail processes for network formation routing and intra and inter piconet scheduling are explained and classified finally the research issues arising are outlined
variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available these areas include text processing of internet documents gene expression array analysis and combinatorial chemistry the objective of variable selection is three fold improving the prediction performance of the predictors providing faster and more cost effective predictors and providing better understanding of the underlying process that generated the data the contributions of this special issue cover wide range of aspects of such problems providing better definition of the objective function feature construction feature ranking multivariate feature selection efficient search methods and feature validity assessment methods
we present an interactive and accurate collision detection algorithm for deformable polygonal objects based on the streaming computational model our algorithm can detect all possible pairwise primitive level intersections between two severely deforming models at highly interactive rates in our streaming computational model we consider set of axis aligned bounding boxes aabbs that bound each of the given deformable objects as an input stream and perform massively parallel pairwise overlapping tests onto the incoming streams as result we are able to prevent performance stalls in the streaming pipeline that can be caused by expensive indexing mechanism required by bounding volume hierarchy based streaming algorithms at runtime as the underlying models deform over time we employ novel streaming algorithm to update the geometric changes in the aabb streams moreover in order to get only the computed result ie collision results between aabbs without reading back the entire output streams we propose streaming en decoding strategy that can be performed in hierarchical fashion after determining overlapped aabbs we perform primitive level eg triangle intersection checking on serial computational model such as cpus we implemented the entire pipeline of our algorithm using off the shelf graphics processors gpus such as nvidia geforce gtx for streaming computations and intel dual core processors for serial computations we benchmarked our algorithm with different models of varying complexities ranging from up to triangles under various deformation motions and the timings were obtained as $\sim$ fps depending on the complexity of models and their relative configurations finally we made comparisons with well known gpu based collision detection algorithm cullide and observed about three times performance improvement over the earlier approach we also made comparisons with sw based aabb culling algorithm and observed about two times improvement
using software based technique to dynamically decompress selected code fragments during program execution
we consider policies that are described by regular expressions finite automata or formulae of linear temporal logic ltl such policies are assumed to describe situations that are problematic and thus should be avoided given trace pattern ie sequence of action symbols and variables where the variables stand for unknown ie not observed sequences of actions we ask whether potentially violates given policy ie whether the variables in can be replaced by sequences of actions such that the resulting trace belongs to we also consider the dual case where the regular policy is supposed to describe all the admissible situations here we want to know whether always adheres to the given policy ie whether all instances of belong to we determine the complexity of the violation and the adherence problem depending on whether trace patterns are linear or not and on whether the policy is assumed to be fixed or not
the impetus behind semantic web research remains the vision of supplementing availability with utility that is the world wide web provides availability of digital media but the semantic web will allow presently available digital media to be used in unseen ways an example of such an application is multimedia retrieval at present there are vast amounts of digital media available on the web once this media gets associated with machine understandable metadata the web can serve as potentially unlimited supplier for multimedia web services which could populate themselves by searching for keywords and subsequently retrieving images or articles which is precisely the type of system that is proposed in this paper such system requires solid interoperability central ontology semantic agent search capabilities and standards specifically this paper explores this cross section of image annotation and semantic web services models the web service components that constitute such system discusses the sequential cooperative execution of these semantic web services and introduces intelligent storage of image semantics as part of semantic link space
when program uses software transactional memory stm to synchronize accesses to shared memory the performance often depends on which stm implementation is used implementations vary greatly in their underlying mechanisms in the features they provide and in the assumptions they make about the common case consequently the best choice of algorithm is workload dependent worse yet for workload composed of multiple phases of execution the best choice of implementation may change during execution we present low overhead system for adapting between stm implementations like previous work our system enables adaptivity between different parameterizations of given algorithm and it allows adapting between the use of transactions and coarse grained locks in addition we support dynamic switching between fundamentally different stm implementations we also explicitly support irrevocability retry based condition synchronization and privatization through series of experiments we show that our system introduces negligible overhead we also present candidate use of dynamic adaptivity as replacement for contention management when using adaptivity in this manner stm implementations can be simplified to great degree without lowering throughput or introducing risk of pathological slowdown even for challenging workloads
many distributed applications can make use of large background transfers transfers of data that humans are not waiting for to improve availability reliability latency or consistency however given the rapid fluctuations of available network bandwidth and changing resource costs due to technology trends hand tuning the aggressiveness of background transfers risks complicating applications being too aggressive and interfering with other applications and being too timid and not gaining the benefits of background transfers our goal is for the operating system to manage network resources in order to provide simple abstraction of near zero cost background transfers our system tcp nice can provably bound the interference inflicted by background flows on foreground flows in restricted network model and our microbenchmarks and case study applications suggest that in practice it interferes little with foreground flows reaps large fraction of spare network bandwidth and simplifies application construction and deployment for example in our prefetching case study application aggressive prefetching improves demand performance by factor of three when nice manages resources but the same prefetching hurts demand performance by factor of six under standard network congestion control
we study clustering problems in the streaming model where the goal is to cluster set of points by making one pass or few passes over the data using small amount of storage space our main result is randomized algorithm for the median problem which produces constant factor approximation in one pass using storage space poly log this is significant improvement over the previous best algorithm which yielded approximation using $n^{\varepsilon}$ space next we give streaming algorithm for the median problem with an arbitrary distance function we also study algorithms for clustering problems with outliers in the streaming model here we give bicriterion guarantees producing constant factor approximations by increasing the allowed fraction of outliers slightly
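the sketch below is much simplified stand in for the divide and conquer flavour of streaming clustering each chunk is reduced to weighted representatives and only the representatives are clustered at the end it works on one dimensional points for brevity and does not reproduce the paper's algorithm or its approximation guarantee

# simplified streaming clustering: cluster chunks, keep weighted representatives
import random

def weighted_median(pairs):
    # pairs: list of (value, weight); returns the weighted median value
    pairs = sorted(pairs)
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]

def kmedian_heuristic(points, weights, k, iters=10):
    # Lloyd-style heuristic for weighted 1-D k-median (no approximation proof)
    if len(points) <= k:
        return list(points)
    centers = random.sample(list(points), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v, w in zip(points, weights):
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[j].append((v, w))
        centers = [weighted_median(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def assign_weights(points, centers):
    counts = [0.0] * len(centers)
    for v in points:
        counts[min(range(len(centers)), key=lambda c: abs(v - centers[c]))] += 1.0
    return counts

def streaming_kmedian(stream, k, chunk_size=1000):
    reps, rep_w, chunk = [], [], []
    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            centers = kmedian_heuristic(chunk, [1.0] * len(chunk), k)
            reps += centers
            rep_w += assign_weights(chunk, centers)
            chunk = []
    if chunk:
        centers = kmedian_heuristic(chunk, [1.0] * len(chunk), k)
        reps += centers
        rep_w += assign_weights(chunk, centers)
    return kmedian_heuristic(reps, rep_w, k)   # cluster the retained summary
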
in this paper we propose an approach to automatic compiler parallelization based on language extensions that is applicable to broader range of program structures and application domains than in past work as complement to ongoing work on high productivity languages for explicit parallelism the basic idea in this paper is to make sequential languages more amenable to compiler parallelization by adding enforceable declarations and annotations specifically we propose the addition of annotations and declarations related to multidimensional arrays points regions array views parameter intents array and object privatization pure methods absence of exceptions and gather reduce computations in many cases these extensions are also motivated by best practices in software engineering and can also contribute to performance improvements in sequential code detailed case study of the java grande forum benchmark suite illustrates the obstacles to compiler parallelization in current object oriented languages and shows that the extensions proposed in this paper can be effective in enabling compiler parallelization the results in this paper motivate future work on building an automatically parallelizing compiler for the language extensions proposed in this paper
we present system for refocusing images and videos of dynamic scenes using novel single view depth estimation method our method for obtaining depth is based on the defocus of sparse set of dots projected onto the scene in contrast to other active illumination techniques the projected pattern of dots can be removed from each captured image and its brightness easily controlled in order to avoid under or over exposure the depths corresponding to the projected dots and color segmentation of the image are used to compute an approximate depth map of the scene with clean region boundaries the depth map is used to refocus the acquired image after the dots are removed simulating realistic depth of field effects experiments on wide variety of scenes including close ups and live action demonstrate the effectiveness of our method
exact query reformulation using views in positive relational languages is well understood and has variety of applications in query optimization and data sharing generalizations to larger fragments of the relational algebra ra specifically support for the difference operator would increase the options available for query reformulation and also apply to view adaptation updating materialized view in response to modified view definition and view maintenance unfortunately most questions about queries become undecidable in the presence of difference negation we present novel way of managing this difficulty via an excursion through non standard semantics relations where tuples are annotated with positive or negative integers we show that under semantics ra queries have normal form as single difference of positive queries and this leads to the decidability of equivalence in most real world settings with difference it is possible to convert the queries to this normal form we give sound and complete algorithm that explores all reformulations of an ra query under semantics using set of ra views finitely bounding the search space with simple and natural cost model we investigate related complexity questions and we also extend our results to queries with built in predicates relations are interesting in their own right because they capture updates and data uniformly however our algorithm turns out to be sound and complete also for bag semantics albeit necessarily only for subclass of ra this subclass turns out to be quite large and covers generously the applications of interest to us we also show subclass of ra where reformulation and evaluation under semantics can be combined with duplicate elimination to obtain the answer under set semantics
we present new method for the efficient simulation of large bodies of water especially effective when three dimensional surface effects are important similar to traditional two dimensional height field approach most of the water volume is represented by tall cells which are assumed to have linear pressure profiles in order to avoid the limitations typically associated with height field approach we simulate the entire top surface of the water volume with state of the art fully three dimensional navier stokes free surface solver our philosophy is to use the best available method near the interface in the three dimensional region and to coarsen the mesh away from the interface for efficiency we coarsen with tall thin cells as opposed to octrees or amr because they maintain good resolution horizontally allowing for accurate representation of bottom topography
precise static type analysis is important to make available dynamic aspects of object oriented programs oops approximately known at compile time many techniques have been proposed for static type analysis depending upon the tradeoff of cost and precision the techniques may generate spurious possible types for particular dynamic dispatch which makes the static type analysis imprecise in this paper we propose symbolic execution based type analysis technique that analyzes the dynamic type inter procedurally by keeping the flow of the program in consideration we analyze test cases with different class hierarchies the proposed technique was capable of resolving the target method for most of the dynamic dispatches at reduced computational cost
catalogs of periodic variable stars contain large numbers of periodic light curves photometric time series data from the astrophysics domain separating anomalous objects from well known classes is an important step towards the discovery of new classes of astronomical objects most anomaly detection methods for time series data assume either single continuous time series or set of time series whose periods are aligned light curve data precludes the use of these methods as the periods of any given pair of light curves may be out of sync one may use an existing anomaly detection method if prior to similarity calculation one performs the costly act of aligning two light curves an operation that scales poorly to massive data sets this paper presents pcad an unsupervised anomaly detection method for large sets of unsynchronized periodic time series data that outputs ranked list of both global and local anomalies it calculates its anomaly score for each light curve in relation to set of centroids produced by modified means clustering algorithm our method is able to scale to large data sets through the use of sampling we validate our method on both light curve data and other time series data sets we demonstrate its effectiveness at finding known anomalies and discuss the effect of sample size and number of centroids on our results we compare our method to naive solutions and existing time series anomaly detection methods for unphased data and show that pcad’s reported anomalies are comparable to or better than all other methods finally astrophysicists on our team have verified that pcad finds true anomalies that might be indicative of novel astrophysical phenomena
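a simplified stand in for the centroid based anomaly ranking is sketched below each series is scored by its distance to the nearest cluster centroid and the highest scoring series are returned the phase invariant similarity and the modified clustering of pcad are not reproduced and scikit learn is assumed

# simplified centroid-based anomaly ranking for fixed-length series
import numpy as np
from sklearn.cluster import KMeans

def centroid_anomaly_ranking(X, n_centroids=10, top=20):
    # X: (n_series, n_points) phased and resampled series as plain vectors
    km = KMeans(n_clusters=n_centroids, n_init=10, random_state=0).fit(X)
    dist_to_nearest = km.transform(X).min(axis=1)      # anomaly score per series
    order = np.argsort(-dist_to_nearest)               # most anomalous first
    return order[:top], dist_to_nearest[order[:top]]
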
matrix pattern oriented least squares support vector classifier matlssvc can directly classify matrix patterns and has superior classification performance compared with its vector version least squares support vector classifier lssvc especially for images however it can be found that the classification performance of matlssvc is matrixization dependent ie heavily relying on the reshaping ways from the original vector or matrix pattern to another matrix thus it is difficult to determine which reshaping way is fittest for classification on the other hand the changeable and different reshaping ways can naturally give birth to set of matlssvcs with diversity and it is the diversity that provides means to build an ensemble of classifiers in this paper we exactly exploit the diversity of the changeable reshaping ways and borrow adaboost to construct an adaboost matlssvc ensemble named adamatlssvc our contributions are that the proposed adamatlssvc can greatly avoid the matrixization dependent problem on single matlssvc different from the ensemble principle of the original adaboost that uses single type of classifiers as its base components the proposed adamatlssvc is on top of multiple types of matlssvcs in different reshapings since adamatlssvc adopts multiple matrix representations of the same pattern it can provide complementarity among different matrix representation spaces adamatlssvc mitigates the selection of the regularization parameter which are all validated in the experiments here
proof carrying code provides trust in mobile code by requiring certificates that ensure the code adherence to specific conditions the prominent approach to generate certificates for compiled code is certifying compilation that automatically generates certificates for simple safety properties in this work we present certificate translation novel extension for standard compilers that automatically transforms formal proofs for more expressive and complex properties of the source program to certificates for the compiled code the article outlines the principles of certificate translation instantiated for nonoptimizing compiler and for standard compiler optimizations in the context of an intermediate rtl language
cooperative bug isolation cbi is feedback directed approach to improving software quality developers provide instrumented applications to the general public and then use statistical methods to mine returned data for information about the root causes of failure thus users and developers form feedback loop of continuous software improvement given cbi’s focus on statistical methods and dynamic data collection it is not clear how static program analysis can most profitably be employed we discuss current uses of static analysis during cbi instrumentation and failure modeling we propose novel ways in which static analysis could be applied at various points along the cbi feedback loop from fairly concrete low level optimization opportunities to hybrid failure modeling approaches that may cut across current static dynamic statistical boundaries
since the publication of brin and page’s paper on pagerank many in the web community have depended on pagerank for the static query independent ordering of web pages we show that we can significantly outperform pagerank using features that are independent of the link structure of the web we gain further boost in accuracy by using data on the frequency at which users visit web pages we use ranknet ranking machine learning algorithm to combine these and other static features based on anchor text and domain characteristics the resulting model achieves static ranking pairwise accuracy of vs for pagerank or for random
meta learning has been successfully applied to acquire knowledge used to support the selection of learning algorithms each training example in meta learning ie each meta example is related to learning problem and stores the experience obtained in the empirical evaluation of set of candidate algorithms when applied to the problem the generation of good set of meta examples can be costly process depending for instance on the number of available learning problems and the complexity of the candidate algorithms in this work we proposed the active meta learning in which active learning techniques are used to reduce the set of meta examples by selecting only the most relevant problems for meta example generation in an implemented prototype we evaluated the use of two different active learning techniques applied in two different meta learning tasks the performed experiments revealed significant gain in the meta learning performance when the active techniques were used to support the meta example generation
collaborations over distance must contend with the loss of the rich subtle interactions that co located teams use to coordinate their work previous research has suggested that one consequence of this loss is that cross site work will take longer than comparable single site work we use both survey data and data from the change management system to measure the extent of delay in multi site software development organization we also measure site interdependence differences in same site and cross site communication patterns and analyze the relationship of these variables to delay our results show significant relationship between delay in cross site work and the degree to which remote colleagues are perceived to help out when workloads are heavy this result is particularly troubling in light of the finding that workers generally believed they were as helpful to their remote colleagues as to their local colleagues we discuss implications of our findings for collaboration technology for distributed organizations
much of research in data mining and machine learning has led to numerous practical applications spam filtering fraud detection and user query intent analysis have relied heavily on machine learned classifiers and resulted in improvements in robust classification accuracy combining multiple classifiers aka ensemble learning is well studied and has been known to improve effectiveness of classifier to address two key challenges in ensemble learning learning weights of individual classifiers and the combination rule of their weighted responses this paper proposes novel ensemble classifier enlr that computes weights of responses from discriminative classifiers and combines their weighted responses to produce single response for test instance the combination rule is based on aggregating weighted responses where weight of an individual classifier is inversely based on their respective variances around their responses here variance quantifies the uncertainty of the discriminative classifiers parameters which in turn depends on the training samples as opposed to other ensemble methods where the weight of each individual classifier is learned as part of parameter learning and thus the same weight is applied to all testing instances our model is actively adjusted as individual classifiers become confident in their decision for test instance our empirical experiments on various data sets demonstrate that our combined classifier produces effective results when compared with single classifier our novel classifier shows statistically significant better accuracy when compared to well known ensemble methods bagging and adaboost in addition to robust accuracy our model is extremely efficient dealing with high volumes of training samples due to the independent learning paradigm among its multiple classifiers it is simple to implement in distributed computing environment such as hadoop
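the sketch below illustrates one way to weight responses inversely to an uncertainty estimate computed per test instance the bernoulli style variance proxy is an assumption of this sketch and the exact derivation used by enlr is not reproduced scikit learn is assumed

# variance-weighted combination of bootstrap-trained logistic models
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def train_ensemble(X, y, n_models=10, seed=0):
    rng = np.random.RandomState(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = resample(X, y, random_state=rng)
        models.append(LogisticRegression(max_iter=1000).fit(Xb, yb))
    return models

def ensemble_predict(models, X, eps=1e-6):
    probs = np.stack([m.predict_proba(X)[:, 1] for m in models])   # shape (m, n)
    weights = 1.0 / (probs * (1.0 - probs) + eps)    # lower uncertainty -> higher weight
    return (weights * probs).sum(axis=0) / weights.sum(axis=0)
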
the problem of summarizing multi dimensional data into lossy synopses supporting the estimation of aggregate range queries has been deeply investigated in the last three decades several summarization techniques have been proposed based on different approaches such as histograms wavelets and sampling the aim of most of the works in this area was to devise techniques for constructing effective synopses enabling range queries to be estimated trading off the efficiency of query evaluation with the accuracy of query estimates in this paper the use of summarization is investigated in more specific context where privacy issues are taken into account in particular we study the problem of constructing privacy preserving synopses that is synopses preventing sensitive information from being extracted while supporting safe analysis tasks in this regard we introduce probabilistic framework enabling the evaluation of the quality of the estimates which can be obtained by user owning the summary data based on this framework we devise technique for constructing histogram based synopses of multi dimensional data which provide as much accurate as possible answers for given workload of safe queries while preventing high quality estimates of sensitive information from being extracted
in many applications attribute and relationship data are available carrying complementary information about real world entities in such cases joint analysis of both types of data can yield more accurate results than classical clustering algorithms that either use only attribute data or only relationship graph data the connected center ckc has been proposed as the first joint cluster analysis model to discover clusters which are cohesive on both attribute and relationship data however it is well known that prior knowledge on the number of clusters is often unavailable in applications such as community identification and hotspot analysis in this paper we introduce and formalize the problem of discovering an priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data called connected clusters cxc problem true clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data and internally connected in terms of relationship data different from classical attribute based clustering methods the neighborhood of clusters is not defined in terms of attribute data but in terms of relationship data to efficiently solve the cxc problem we present jointclust an algorithm which adopts dynamic two phase approach in the first phase we find so called cluster atoms we provide probability analysis for this phase which gives us probabilistic guarantee that each true cluster is represented by at least one of the initial cluster atoms in the second phase these cluster atoms are merged in bottom up manner resulting in dendrogram the final clustering is determined by our objective function our experimental evaluation on several real datasets demonstrates that jointclust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters
this paper studies the effect of bisimulation minimisation in model checking of monolithic discrete time and continuous time markov chains as well as variants thereof with rewards our results show that as for traditional model checking enormous state space reductions up to logarithmic savings may be obtained in contrast to traditional model checking in many cases the verification time of the original markov chain exceeds the quotienting time plus the verification time of the quotient we consider probabilistic bisimulation as well as versions thereof that are tailored to the property to be checked
this paper describes three practical techniques for authenticating the code and other execution state of an operating system using the services of the tpm and hypervisor the techniques trade off detailed reporting of the os code and configuration with the manageability and comprehensibility of reported configurations such trade offs are essential because the complexity and diversity of modern general purpose operating systems makes simple code authentication schemes using code hashes or certificates infeasible
editing recorded motions to make them suitable for different sets of environmental constraints is general and difficult open problem in this paper we solve significant part of this problem by modifying full body motions with an interactive randomized motion planner our method is able to synthesize collision free motions for specified linkages of multiple animated characters in synchrony with the characters full body motions the proposed method runs at interactive speed for dynamic environments of realistic complexity we demonstrate the effectiveness of our interactive motion editing approach with two important applications motion correction to remove collisions and synthesis of realistic object manipulation sequences on top of locomotion
representation independence formally characterizes the encapsulation provided by language constructs for data abstraction and justifies reasoning by simulation representation independence has been shown for variety of languages and constructs but not for shared references to mutable state indeed it fails in general for such languages this article formulates representation independence for classes in an imperative object oriented language with pointers subclassing and dynamic dispatch class oriented visibility control recursive types and methods and simple form of module an instance of class is considered to implement an abstraction using private fields and so called representation objects encapsulation of representation objects is expressed by restriction called confinement on aliasing representation independence is proved for programs satisfying the confinement condition static analysis is given for confinement that accepts common designs such as the observer and factory patterns the formalization takes into account not only the usual interface between client and class that provides an abstraction but also the interface often called “protected” between the class and its subclasses
we present two experiments on the use of non speech audio at an interactive multi touch multi user tabletop display we first investigate the use of two categories of reactive auditory feedback affirmative sounds that confirm user actions and negative sounds that indicate errors our results show that affirmative auditory feedback may improve one’s awareness of group activity at the expense of one’s awareness of his or her own activity negative auditory feedback may also improve group awareness but simultaneously increase the perception of errors for both the group and the individual in our second experiment we compare two methods of associating sounds to individuals in co located environment specifically we compare localized sound where each user has his or her own speaker to coded sound where users share one speaker but the waveform of the sounds are varied so that different sound is played for each user results of this experiment reinforce the presence of tension between group awareness and individual focus found in the first experiment user feedback suggests that users are more easily able to identify who caused sound when either localized or coded sound is used but that they are also more able to focus on their individual work our experiments show that in general auditory feedback can be used in co located collaborative applications to support either individual work or group awareness but not both simultaneously depending on how it is presented
this paper describes our experiences in defining the processes associated with preparing and administrating chemotherapy and then using those process definitions as the basis for analyses aimed at finding and correcting defects the work is collaboration between medical professionals from major regional cancer center and computer science researchers the work uses the little jil language to create precise process definitions the propel system to specify precise process requirements and the flavers system to verify that the process definitions adhere to the requirement specifications the paper describes how these technologies were applied to successfully identify defects in the chemotherapy process although this work is still ongoing early experiences suggest that this approach can help reduce medical errors and improve patient safety the work has also helped us to learn about the desiderata for process definition and analysis technologies both of which are expected to be broadly applicable to other domains
spatial queries in high dimensional spaces have been studied extensively recently among them nearest neighbor queries are important in many settings including spatial databases find the closest cities and multimedia databases find the most similar images previous analyses have concluded that nearest neighbor search is hopeless in high dimensions due to the notorious curse of dimensionality here we show that this may be overpessimistic we show that what determines the search performance at least for tree like structures is the intrinsic dimensionality of the data set and not the dimensionality of the address space referred to as the embedding dimensionality the typical and often implicit assumption in many previous studies is that the data is uniformly distributed with independence between attributes however real data sets overwhelmingly disobey these assumptions rather they typically are skewed and exhibit intrinsic fractal dimensionalities that are much lower than their embedding dimension eg due to subtle dependencies between attributes in this paper we show how the hausdorff and correlation fractal dimensions of data set can yield extremely accurate formulas that can predict the performance to within one standard deviation on multiple real and synthetic data sets the practical contributions of this work are our accurate formulas which can be used for query optimization in spatial and multimedia databases the major theoretical contribution is the deflation of the dimensionality curse our formulas and our experiments show that previous worst case analyses of nearest neighbor search in high dimensions are overpessimistic to the point of being unrealistic the performance depends critically on the intrinsic fractal dimensionality as opposed to the embedding dimension that the uniformity and independence assumptions incorrectly imply
we investigate the problem of ranking the answers to database query when many tuples are returned in particular we present methodologies to tackle the problem for conjunctive and range queries by adapting and applying principles of probabilistic models from information retrieval for structured data our solution is domain independent and leverages data and workload statistics and correlations we evaluate the quality of our approach with user survey on real database furthermore we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results which demonstrate the feasibility of our ranking system
in recent years the technological advances in mapping genes have made it increasingly easy to store and use wide variety of biological data such data are usually in the form of very long strings for which it is difficult to determine the most relevant features for classification task for example typical dna string may be millions of characters long and there may be thousands of such strings in database in many cases the classification behavior of the data may be hidden in the compositional behavior of certain segments of the string which cannot be easily determined apriori another problem which complicates the classification task is that in some cases the classification behavior is reflected in global behavior of the string whereas in others it is reflected in local patterns given the enormous variation in the behavior of the strings over different data sets it is useful to develop an approach which is sensitive to both the global and local behavior of the strings for the purpose of classification for this purpose we will exploit the multi resolution property of wavelet decomposition in order to create scheme which can mine classification characteristics at different levels of granularity the resulting scheme turns out to be very effective in practice on wide range of problems
static approach is proposed to study secure composition of services we extend the lambda calculus with primitives for selecting and invoking services that respect given security requirements security critical code is enclosed in policy framings with possibly nested local scope policy framings enforce safety and liveness properties the actual run time behaviour of services is over approximated by type and effect system types are standard and effects include the actions with possible security concerns as well as information about which services may be invoked at run time an approximation is model checked to verify policy framings within their scopes this allows for removing any run time execution monitor and for determining the plans driving the selection of those services that match the security requirements on demand
this report presents initial results in the area of software testing and analysis produced as part of the software engineering impact project the report describes the historical development of runtime assertion checking including description of the origins of and significant features associated with assertion checking mechanisms and initial findings about current industrial use future report will provide more comprehensive assessment of development practice for which we invite readers of this report to contribute information
extracting and fusing discriminative features in fingerprint matching especially in distorted fingerprint matching is challenging task in this paper we introduce two novel features to deal with nonlinear distortion in fingerprints one is finger placement direction which is extracted from fingerprint foreground and the other is ridge compatibility which is determined by the singular values of the affine matrix estimated by some matched minutiae and their associated ridges both of them are fixed length and easy to be incorporated into matching score in order to improve the matching performance we combine these two features with orientation descriptor and local minutiae structure which are used to measure minutiae similarity to achieve fingerprint matching in addition we represent minutiae set as graph and use graph connect component and iterative robust least square irls to detect creases and remove spurious minutiae close to creases experimental results on fvc db and db demonstrate that the proposed algorithm could obtain promising results the equal error rates eer are and on db and db respectively
for some time there has been increasing interest in the problem of monitoring the occurrence of topics in stream of events such as stream of news articles this has led to different models of bursts in these streams ie periods of elevated occurrence of events today there are several burst definitions and detection algorithms and their differences can produce very different results in topic streams these definitions also share fundamental problem they define bursts in terms of an arrival rate this approach is limiting other stream dimensions can matter we reconsider the idea of bursts from the standpoint of simple kind of physics instead of focusing on arrival rates we reconstruct bursts as dynamic phenomenon using kinetics concepts from physics mass and velocity and derive momentum acceleration and force from these we refer to the result as topic dynamics permitting hierarchical expressive model of bursts as intervals of increasing momentum as sample application we present topic dynamics model for the large pubmed medline database of biomedical publications using the mesh medical subject heading topic hierarchy we show our model is able to detect bursts for mesh terms accurately as well as efficiently
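the kinetics analogy can be made concrete with a few lines of code; in the toy sketch below we treat the per-period event count as mass and its discrete change as velocity, which is our own reading of the analogy rather than the exact pubmed/mesh model fitted in the paper.

```python
# toy sketch of topic dynamics: mass = per-period count, velocity = its change,
# momentum = mass * velocity, force = mass * acceleration; bursts are read off
# as intervals of increasing momentum. this is an illustration only.
import numpy as np

def topic_dynamics(counts):
    mass = np.asarray(counts, dtype=float)
    velocity = np.gradient(mass)          # change in mass per period (our assumption)
    momentum = mass * velocity
    acceleration = np.gradient(velocity)
    force = mass * acceleration
    return momentum, acceleration, force

def burst_mask(momentum):
    """boolean mask marking periods where momentum is increasing."""
    return np.concatenate(([False], np.diff(momentum) > 0))
```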
this paper presents new algorithm for recommender systems applied to smart carts as the customers pass through the store’s aisles they place their desired products in their cart baskets in many instances the customers pick an item from the shelf and place it in the basket but after while they find similar item with different specifications the difference may be in price quality weight or other factors in our proposed plan based on the customer’s decision from choosing the first item and replacing it with another item from the same group there will be an attempt to identify the customer’s taste and accordingly recommend third fourth etc item that might also meet his her needs the complete algorithm is introduced as systematic procedure and the implementation results are shown the proposed recommender system is designed based on the features of smart carts we have simulated part of the smart cart in an application named nikshiri shop using and sql
in an out of order issue processor instructions are dynamically reordered and issued to function units in their data ready order rather than their original program order to achieve high performance the logic that facilitates dynamic issue is one of the most power hungry and time critical components in typical out of order issue processor this paper develops cooperative hardware software technique to reduce complexity and energy consumption of the issue logic the proposed scheme is based on the observation that not all instructions in program require the same amount of dynamic reordering instructions that belong to basic blocks for which the compiler can perform near optimal scheduling do not need any intra block instruction reordering but require only inter block instruction overlap in contrast blocks where the compiler is limited by artificial dependences and memory misses require both intra block and inter block instruction reordering the proposed reorder sensitive issue scheme utilizes novel compile time analyzer to evaluate the quality of schedules generated by the static scheduler and to estimate the dynamic reordering requirement of instructions within each basic block at the micro architecture level we propose novel issue queue that exploits the varying dynamic scheduling requirement of basic blocks to lower the power dissipation and complexity of the dynamic issue hardware an evaluation of the technique on several spec integer benchmarks indicates that we can reduce the energy consumption in the issue queue on average by with only performance degradation additionally the proposed issue hardware is significantly less complex when compared to conventional monolithic out of order issue queue providing the potential for high clock speeds
local network coding is growing in prominence as technique to facilitate greater capacity utilization in multi hop wireless networks specific objective of such local network coding techniques has been to explicitly minimize the total number of transmissions needed to carry packets across each wireless hop while such strategy is certainly useful we argue that in lossy wireless environments better use of local network coding is to provide higher levels of redundancy even at the cost of increasing the number of transmissions required to communicate the same information in this paper we show that the design space for effective redundancy in local network coding is quite large which makes optimal formulations of the problem hard to realize in practice we present detailed exploration of this design space and propose suite of algorithms called clone that can lead to further throughput gains in multi hop wireless scenarios through careful analysis simulations and detailed implementation on real testbed we show that some of our simplest clone algorithms can be efficiently implemented in today’s wireless hardware to provide factor of two improvement in throughput for example scenarios while other more effective clone algorithms require additional advances in hardware processing speeds to be deployable in practice
it is challenging problem to interactively deform densely sampled complex objects this paper proposes an easy but efficient approach to it by using coarse control meshes to embed the target objects the control mesh can be efficiently deformed by various existing methods and then the target object can be accordingly deformed by interpolation one of the simplest interpolation methods is to use the barycentric coordinates which however generates apparent first order discontinuity artifacts across the boundary due to its piecewise linear property to avoid such artifacts this paper introduces modified barycentric interpolation modified bi technique the central idea is to add local transformation at each control vertex for interpolation so that we can minimize the first order discontinuity by optimizing the local transformations we also minimize the second order derivatives of the interpolation function to avoid undesired vibrations while focusing on deforming objects embedded in tetrahedron meshes the proposed method is applicable to image objects embedded in planar triangular meshes the experimental results in both and demonstrated the success and advantages of the proposed method
we present new mobile interaction model called double side multi touch based on mobile device that receives simultaneous multi touch input from both the front and the back of the device this new double sided multi touch mobile interaction model enables intuitive finger gestures for manipulating objects and user interfaces on screen
this paper introduces new framework for human contour tracking and action sequence recognition given gallery of labeled human contour sequences we define each contour as word and encode all of them into contour dictionary this dictionary will be used to translate the video to this end contour graph is constructed by connecting all the neighboring contours then the motion in video is viewed as an instance of random walks on this graph as result we can avoid explicitly parameterizing the contour curves and modeling the dynamical system for contour updating in such work setting there are only few state variables to be estimated when using sequential monte carlo smc approach to realize the random walks in addition the walks on the graph also perform sequence comparisons implicitly with those in the predefined gallery from which statistics about class label are evaluated for action recognition experiments on diving tracking and recognition illustrate the validity of our method
resource efficient checkpoint processors have been shown to recover to an earlier safe state very fast yet in order to complete the misprediction recovery they also need to reexecute the code segment between the recovered checkpoint and the mispredicted instruction this paper evaluates two novel reuse methods which accelerate reexecution paths by reusing the results of instructions and the outcome of branches obtained during the first run the paper also evaluates in the context of checkpoint processors two other reuse methods targeting trivial and repetitive arithmetic operations reuse approach combining all four methods requires an area of mm consumes mw and improves the energy delay product by and for the integer and floating point benchmarks respectively
we study the expressive power of variants of klaim an experimental language with programming primitives for network aware programming that combines the process algebra approach with the coordination oriented one klaim has proved to be suitable for programming wide range of distributed applications with agents and code mobility and has been implemented on the top of runtime system written in java in this paper the expressivity of its constructs is tested by distilling from it few more and more foundational languages and by studying the encoding of each of them into simpler one the expressive power of the considered calculi is finally tested by comparing one of them with asynchronous calculus
this paper presents finding and technique on program behavior prediction the finding is that surprisingly strong statistical correlations exist among the behaviors of different program components eg loops and among different types of program level behaviors eg loop trip counts versus data values furthermore the correlations can be beneficially exploited they help resolve the proactivity adaptivity dilemma faced by existing program behavior predictions making it possible to gain the strengths of both approaches the large scope and earliness of offline profiling based predictions and the cross input adaptivity of runtime sampling based predictions the main technique contributed by this paper centers on new concept seminal behaviors enlightened by the existence of strong correlations among program behaviors we propose regression based framework to automatically identify small set of behaviors that can lead to accurate prediction of other behaviors in program we call these seminal behaviors by applying statistical learning techniques the framework constructs predictive models that map from seminal behaviors to other behaviors enabling proactive and cross input adaptive prediction of program behaviors the prediction helps commercial compiler the ibm xl compiler generate code that runs up to faster on average demonstrating the large potential of correlation based techniques for program optimizations
model of data flow analysis and fixed point iteration solution procedures is presented the faulty incremental iterative algorithm is introduced examples of the imprecision of restarting iteration from the intraprocedural and interprocedural domains are given some incremental techniques which calculate precise data flow information are summarized
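for readers unfamiliar with the underlying model, the sketch below shows a plain worklist fixed point iteration for a forward data flow problem; the faulty incremental and restarting variants discussed above are deliberately not reproduced, and the interface is our own simplification.

```python
# generic worklist fixed-point solver for a forward data flow problem,
# shown only to make the iteration model concrete.
def solve_dataflow(cfg, transfer, join, init):
    """cfg: {node: [successor, ...]}, transfer: (node, fact) -> fact,
    join: (fact, fact) -> fact, init: starting fact (and identity for join)"""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    facts = {n: init for n in cfg}          # facts[n] holds the incoming fact of n
    worklist = list(cfg)
    while worklist:
        n = worklist.pop()
        in_fact = init
        for p in preds[n]:
            in_fact = join(in_fact, transfer(p, facts[p]))
        if in_fact != facts[n]:
            facts[n] = in_fact
            worklist.extend(cfg[n])         # successors must be revisited
    return facts

# example: reaching-definitions-like sets with union as join
cfg = {"entry": ["a"], "a": ["b"], "b": ["a", "exit"], "exit": []}
gen = {"entry": set(), "a": {"d1"}, "b": {"d2"}, "exit": set()}
print(solve_dataflow(cfg, lambda n, f: f | gen[n], lambda x, y: x | y, frozenset()))
```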
gaining rapid overview of an emerging scientific topic sometimes called research fronts is an increasingly common task due to the growing amount of interdisciplinary collaboration visual overviews that show temporal patterns of paper publication and citation links among papers can help researchers and analysts to see the rate of growth of topics identify key papers and understand influences across subdisciplines this article applies novel network visualization tool based on meaningful layouts of nodes to present research fronts and show citation links that indicate influences across research fronts to demonstrate the value of two dimensional layouts with multiple regions and user control of link visibility we conducted design oriented preliminary case study with domain experts over month period the main benefits were being able to easily identify key papers and see the increasing number of papers within research front and to quickly see the strength and direction of influence across related research fronts
in order to realize the dream of global personal communications the integration of terrestrial and satellite communications networks becomes mandatory in such environments multimedia applications such as video on demand services will become more popular this paper proposes an architecture based on combination of quasi geostationary orbit satellite systems and terrestrial networks for building large scale and efficient vod system hybrid network made of fixed and mobile nodes is considered the key idea of the architecture is to service fixed nodes according to the neighbors buffering policy recently proposed scheme for vod delivery while mobile nodes are served directly from the local server to allow users to receive their vod applications with higher degree of mobility issues related to mobility management are discussed and simple scheme is proposed to guarantee smooth streaming of video data the importance of the proposed architecture is verified by numerical results in case of requests coming from fixed nodes within the reach of terrestrial networks analytical results elucidate the good performance of the architecture in terms of both increasing the system capacity and reducing the disk bandwidth requirements conducted simulations indicate how efficient the proposed system is in smoothening handoffs
in this paper we show how to establish reliable and efficient high level communication system in randomly deployed network of sensors equipped with directional antennas this high level communication system enables the programming of the sensor network using high level communication functionalities without the burden of taking care of their physical capacities low range unidirectional links single frequency presence of collisions etc the high level communication functionalities we offer include point to point communication point to area communication and one to all communication the basic idea to implement this system is to simulate virtual network that emerges from the ad hoc network using self organization self discovery and collaborative methods we also analyse the efficiency scalability and robustness of the proposed protocols
this paper investigates how the vision of the semantic web can be carried over to the realm of email we introduce general notion of semantic email in which an email message consists of an rdf query or update coupled with corresponding explanatory text semantic email opens the door to wide range of automated email mediated applications with formally guaranteed properties in particular this paper introduces broad class of semantic email processes for example consider the process of sending an email to program committee asking who will attend the pc dinner automatically collecting the responses and tallying them up we define both logical and decision theoretic models where an email process is modeled as set of updates to data set on which we specify goals via certain constraints or utilities we then describe set of inference problems that arise while trying to satisfy these goals and analyze their computational tractability in particular we show that for the logical model it is possible to automatically infer which email responses are acceptable wrt set of constraints in polynomial time and for the decision theoretic model it is possible to compute the optimal message handling policy in polynomial time finally we discuss our publicly available implementation of semantic email and outline research challenges in this realm
automated cad model simplification plays an important role in effectively utilizing physics based simulation during the product realization process currently rich body of literature exists that describe many successful techniques for fully automatic or semi automatic simplification of cad models for wide variety of applications the purpose of this paper is to compile list of the techniques that are relevant for physics based simulations problems and to characterize them based on their attributes we have classified them into the following four categories techniques based on surface entity based operators volume entity based operators explicit feature based operators and dimension reduction operators this paper also presents the necessary background information in the cad model representation to assist the new readers we conclude the paper by outlining open research directions in this field
mapreduce is programming model and an associated implementation for processing and generating large datasets that is amenable to broad variety of real world tasks users specify the computation in terms of map and reduce function and the underlying runtime system automatically parallelizes the computation across large scale clusters of machines handles machine failures and schedules inter machine communication to make efficient use of the network and disks programmers find the system easy to use more than ten thousand distinct mapreduce programs have been implemented internally at google over the past four years and an average of one hundred thousand mapreduce jobs are executed on google’s clusters every day processing total of more than twenty petabytes of data per day
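the programming model is easy to show with the canonical word count example; the toy sequential driver below only mimics the map, shuffle and reduce phases and stands in for the distributed runtime described above.

```python
# classic word-count example written in the map/reduce programming style;
# the sequential driver is only an illustration of the model, not the
# distributed implementation.
from collections import defaultdict

def map_fn(_, line):
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):      # map phase
            groups[k].append(v)              # shuffle: group intermediate values by key
    output = []
    for k, vs in groups.items():             # reduce phase
        output.extend(reduce_fn(k, vs))
    return output

print(run_mapreduce([(0, "to be or not to be")], map_fn, reduce_fn))
```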
while standardization efforts for xml query languages have been progressing researchers and users increasingly focus on the database technology that has to deliver on the new challenges that the abundance of xml documents poses to data management validation performance evaluation and optimization of xml query processors are the upcoming issues following long tradition in database research we provide framework to assess the abilities of an xml database to cope with broad range of different query types typically encountered in real world scenarios the benchmark can help both implementors and users to compare xml databases in standardized application scenario to this end we offer set of queries where each query is intended to challenge particular aspect of the query processor the overall workload we propose consists of scalable document database and concise yet comprehensive set of queries which covers the major aspects of xml query processing ranging from textual features to data analysis queries and ad hoc queries we complement our research with results we obtained from running the benchmark on several xml database platforms these results are intended to give first baseline and illustrate the state of the art
with the advances in web service techniques new collaborative applications have emerged like supply chain arrangements and coalition in government agencies in such applications the collaborating parties are responsible for managing and protecting resources entrusted to them access control decisions thus become collaborative activity in which global policy must be enforced by set of collaborating parties without compromising the autonomy or confidentiality requirements of these parties unfortunately none of the conventional access control systems meets these new requirements to support collaborative access control in this paper we propose novel policy based access control model our main idea is based on the notion of policy decomposition and we propose an extension to the reference architecture for xacml we present algorithms for decomposing global policy and efficiently evaluating requests
recently there has been tendency for the research community to move away from closed hypermedia systems towards open hypermedia link services which allow third parties to produce applications so that they are hypertext enabled this paper explores the frontiers of this trend by examining the minimum responsibility of an application to co operate with the underlying link service and in the limiting case where the application has not been enabled in any way it explores the properties and qualities of hypermedia systems that can be produced tool the universal viewer which allows the microcosm hypermedia system to co operate with applications which have not been enabled is introduced and case study is presented which demonstrates the functionality that may be achieved using entirely third party applications most of which have not been enabled
xml languages such as xquery xslt and sql xml employ xpath as the search and extraction language xpath expressions often define complicated navigation resulting in expensive query processing especially when executed over large collections of documents in this paper we propose framework for exploiting materialized xpath views to expedite processing of xml queries we explore class of materialized xpath views which may contain xml fragments typed data values full paths node references or any combination thereof we develop an xpath matching algorithm to determine when such views can be used to answer user query containing xpath expressions we use the match information to identify the portion of an xpath expression in the user query which is not covered by the xpath view finally we construct possibly multiple compensation expressions which need to be applied to the view to produce the query result experimental evaluation using our prototype implementation shows that the matching algorithm is very efficient and usually accounts for small fraction of the total query compilation time
current intrusion detection and prevention systems seek to detect wide class of network intrusions eg dos attacks worms port scans at network vantage points unfortunately even today many ids systems we know of keep per connection or per flow state to detect malicious tcp flows thus it is hardly surprising that these ids systems have not scaled to multigigabit speeds by contrast both router lookups and fair queuing have scaled to high speeds using aggregation via prefix lookups or diffserv thus in this paper we initiate research into the question as to whether one can detect attacks without keeping per flow state we will show that such aggregation while making fast implementations possible immediately causes two problems first aggregation can cause behavioral aliasing where for example good behaviors can aggregate to look like bad behaviors second aggregated schemes are susceptible to spoofing by which the intruder sends attacks that have appropriate aggregate behavior we examine wide variety of dos and scanning attacks and show that several categories bandwidth based claim and hold port scanning can be scalably detected in addition to existing approaches for scalable attack detection we propose novel data structure called partial completion filters pcfs that can detect claim and hold attacks scalably in the network we analyze pcfs both analytically and using experiments on real network traces to demonstrate how we can tune pcfs to achieve extremely low false positive and false negative probabilities
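to give a flavour of detection without per flow state, the sketch below keeps only a small array of aggregated open-minus-close counters indexed by a hash of the flow id; it is a deliberately simplified single-stage structure of our own and not the partial completion filter design analysed in the paper.

```python
# very loosely inspired by aggregated per-bucket counters for spotting
# claim-and-hold behaviour without per-flow state; a simplified sketch,
# not the pcf data structure evaluated above.
import hashlib

class CompletionCounter:
    def __init__(self, num_buckets=1024):
        self.buckets = [0] * num_buckets

    def _bucket(self, flow_id):
        digest = hashlib.sha1(flow_id.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.buckets)

    def opened(self, flow_id):     # e.g. connection request observed
        self.buckets[self._bucket(flow_id)] += 1

    def closed(self, flow_id):     # e.g. matching completion observed
        self.buckets[self._bucket(flow_id)] -= 1

    def suspicious(self, threshold=50):
        """buckets whose opens far exceed closes may hide claim-and-hold flows"""
        return [i for i, c in enumerate(self.buckets) if c > threshold]
```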
the real time immersive network simulation environment rinse simulator is being developed to support large scale network security preparedness and training exercises involving hundreds of players and modeled network composed of hundreds of local area networks lans the simulator must be able to present realistic rendering of network behavior as attacks are launched and players diagnose events and try counter measures to keep network services operating the authors describe the architecture and function of rinse and outline how techniques such as multiresolution traffic modeling multiresolution attack models and new routing simulation methods are used to address the scalability challenges of this application they also describe in more detail new work on cpu memory models necessary for the exercise scenarios and latency absorption technique that will help when extending the range of client tools usable by the players
we present an architecture of decoupled processors with memory hierarchy consisting only of scratch pad memories and main memory this architecture exploits the more efficient pre fetching of decoupled processors that make use of the parallelism between address computation and application data processing which mainly exists in streaming applications this benefit combined with the ability of scratch pad memories to store data with no conflict misses and low energy per access contributes significantly for increasing the system’s performance the application code is split in two parallel programs the first runs on the access processor and computes the addresses of the data in the memory hierarchy the second processes the application data and runs on the execute processor processor with limited address space just the register file addresses each transfer of any block in the memory hierarchy up to the execute processor’s register file is controlled by the access processor and the dma units this strongly differentiates this architecture from traditional uniprocessors and existing decoupled processors with cache memory hierarchies the architecture is compared in performance with uniprocessor architectures with scratch pad and cache memory hierarchies and the existing decoupled architectures showing its higher normalized performance the reason for this gain is the efficiency of data transferring that the scratch pad memory hierarchy provides combined with the ability of the decoupled processors to eliminate memory latency using memory management techniques for transferring data instead of fixed prefetching methods experimental results show that the performance is increased up to almost times compared to uniprocessor architectures with scratch pad and up to times compared to the ones with cache the proposed architecture achieves the above performance without having penalties in energy delay product costs
the application of decentralized reputation systems is promising approach to ensure cooperation and fairness as well as to address random failures and malicious attacks in mobile ad hoc networks however they are potentially vulnerable to liars with our work we provide first step to analyzing robustness of reputation system based on deviation test using mean field approach to our stochastic process model we show that liars have no impact unless their number exceeds certain threshold phase transition we give precise formulae for the critical values and thus provide guidelines for an optimal choice of parameters
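the deviation test itself is simple enough to state in a few lines; the sketch below uses our own parameter names and a plain weighted merge, and is only an illustration of the test, not the stochastic model analysed above.

```python
# bare-bones deviation test: a second-hand reputation report is folded in
# only if it does not deviate too much from the node's own first-hand
# estimate; reports from liars that fail the test are simply ignored.
def deviation_test(own_estimate, reported_value, threshold=0.2):
    return abs(own_estimate - reported_value) <= threshold

def merge_report(own_estimate, reported_value, weight=0.1, threshold=0.2):
    """return the updated local estimate after seeing one report."""
    if deviation_test(own_estimate, reported_value, threshold):
        return (1 - weight) * own_estimate + weight * reported_value
    return own_estimate
```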
static analysis tools report software defects that may or may not be detected by other verification methods two challenges complicating the adoption of these tools are spurious false positive warnings and legitimate warnings that are not acted on this paper reports automated support to help address these challenges using logistic regression models that predict the foregoing types of warnings from signals in the warnings and implicated code because examining many potential signaling factors in large software development settings can be expensive we use screening methodology to quickly discard factors with low predictive power and cost effectively build predictive models our empirical evaluation indicates that these models can achieve high accuracy in predicting accurate and actionable static analysis warnings and suggests that the models are competitive with alternative models built without screening
in this paper we present clustering and indexing paradigm called clindex for high dimensional search spaces the scheme is designed for approximate similarity searches where one would like to find many of the data points near target point but where one can tolerate missing few near points for such searches our scheme can find near points with high recall in very few ios and perform significantly better than other approaches our scheme is based on finding clusters and then building simple but efficient index for them we analyze the trade offs involved in clustering and building such an index structure and present extensive experimental results
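the cluster-then-index idea can be sketched quickly: group the points, then answer an approximate query by scanning only the clusters whose centroids are closest; the kmeans-based sketch below is our own stand-in, not the clindex structure itself.

```python
# rough sketch of cluster-then-index approximate similarity search;
# assumes len(data) >= n_clusters. not the clindex implementation.
import numpy as np
from sklearn.cluster import KMeans

class ClusterIndex:
    def __init__(self, data, n_clusters=32, seed=0):
        self.data = np.asarray(data, dtype=float)
        self.km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(self.data)

    def query(self, q, k=10, clusters_to_scan=2):
        q = np.asarray(q, dtype=float)
        d_cent = np.linalg.norm(self.km.cluster_centers_ - q, axis=1)
        nearest = np.argsort(d_cent)[:clusters_to_scan]     # few candidate clusters
        cand = self.data[np.isin(self.km.labels_, nearest)] # scan only those clusters
        d = np.linalg.norm(cand - q, axis=1)
        return cand[np.argsort(d)[:k]]                      # approximate k nearest points
```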
the demand for automatically annotating and retrieving medical images is growing faster than ever in this paper we present novel medical image retrieval method for special medical image retrieval problem where the images in the retrieval database can be annotated into one of the pre defined labels even more user may query the database with an image that is close to but not exactly what he she expects the retrieval consists of the deducible retrieval and the traditional retrieval the deducible retrieval is special semantic retrieval and is to retrieve the label that user expects while the traditional retrieval is to retrieve the images in the database which belong to this label and are most similar to the query image in appearance the deducible retrieval is achieved using semi supervised semantic error correcting output codes semi secc the active learning method is also exploited to further reduce the number of the required ground truthed training images relevance feedbacks rfs are used in both retrieval steps in the deducible retrieval rf acts as short term memory feedback and helps identify the label that user expects in the traditional retrieval rf acts as long term memory feedback and helps ground truth the unlabelled training images in the database the experimental results on imageclef annotation data set clearly show the strength and the promise of the presented methods
the gaussian elimination algorithm is in fact an algorithm family common implementations contain at least six mostly independent design choices generic implementation can easily be parametrized by all these design choices but this usually leads to slow and bloated code using metaocaml’s staging facilities we show how we can produce natural and type safe implementation of gaussian elimination which exposes its design choices at code generation time so that these choices can effectively be specialized away and where the resulting code is quite efficient
distributed denial of service ddos attacks currently represent serious threat to the appropriate operation of internet services we propose an ip traceback system to be deployed at the level of autonomous systems ases to deal with this threat our proposed as level ip traceback system contrasts with previous work as it requires no prior knowledge of the network topology while allowing single packet traceback and incremental deployment we also investigate and evaluate the strategic placement of our systems showing that the partial deployment offered by our proposed system provides relevant results in ip traceback rendering it feasible for large scale networks such as the internet
in dynamic environments such as the world wide web changing document collection query population and set of search services demands frequent repetition of search effectiveness relevance evaluations reconstructing static test collections such as in trec requires considerable human effort as large collection sizes demand judgments deep into retrieved pools in practice it is common to perform shallow evaluations over small numbers of live engines often pairwise engine vs engine without system pooling although these evaluations are not intended to construct reusable test collections their utility depends on conclusions generalizing to the query population as whole we leverage the bootstrap estimate of the reproducibility probability of hypothesis tests in determining the query sample sizes required to ensure this finding they are much larger than those required for static collections we propose semiautomatic evaluation framework to reduce this effort we validate this framework against manual evaluation of the top ten results of ten web search engines across queries in navigational and informational tasks augmenting manual judgments with pseudo relevance judgments mined from web taxonomies reduces both the chances of missing correct pairwise conclusion and those of finding an errant conclusion by approximately percent
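as a concrete, heavily simplified version of the bootstrap reproducibility estimate, the sketch below resamples queries and reports how often the sign of the mean score difference between two engines is preserved; it only checks the sign rather than re-running a full significance test, so it is an illustration of the statistic rather than the framework evaluated above.

```python
# simplified bootstrap estimate of how often an "engine a beats engine b"
# conclusion reproduces over resampled query sets; not the full framework.
import random

def reproducibility(score_a, score_b, trials=1000, seed=0):
    rng = random.Random(seed)
    n = len(score_a)
    diffs = [a - b for a, b in zip(score_a, score_b)]   # per-query score differences
    wins = 0
    for _ in range(trials):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]   # resample queries
        if sum(sample) > 0:                                    # a still ahead on the resample
            wins += 1
    return wins / trials
```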
portable standards compliant systems software is usually associated with unavoidable overhead from the standards prescribed interface for example consider the posix threads standard facility for using thread specific data tsd to implement multithreaded code the first tsd reference must be preceded by pthread_getspecific typically implemented as function or macro with instructions this paper proposes method that uses the runtime specialization facility of the tempo program specializer to convert such unavoidable source code into simple memory references of one or two instructions for execution consequently the source code remains standard compliant and the executed code’s performance is similar to direct global variable access measurements show significant performance gains over range of code sizes random number generator lines of shows speedup of times on sparc and times on pentium time converter lines was sped up by and percent respectively and parallel genetic algorithm system lines was sped up by and percent
similarity join algorithms find pairs of objects that lie within certain distance epsilon of each other algorithms that are adapted from spatial join techniques are designed primarily for data in vector space and often employ some form of multidimensional index for these algorithms when the data lies in metric space the usual solution is to embed the data in vector space and then make use of multidimensional index such an approach has number of drawbacks when the data is high dimensional as we must eventually find the most discriminating dimensions which is not trivial in addition although the maximum distance between objects increases with dimension the ability to discriminate between objects in each dimension does not these drawbacks are overcome via the introduction of new method called quickjoin that does not require multidimensional index and instead adapts techniques used in distance based indexing for use in method that is conceptually similar to the quicksort algorithm formal analysis is provided of the quickjoin method experiments show that the quickjoin method significantly outperforms two existing techniques
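the quicksort analogy can be made concrete with a short recursive sketch: pick pivots, split the objects by distance, recurse on each side, and separately join the window of objects within epsilon of the split boundary; this is our own simplified variant and omits the optimisations of the published quickjoin algorithm.

```python
# conceptual sketch of a quicksort-like, partition-based similarity join
# (all pairs within distance eps); simplified, not the actual quickjoin.
import random

def nested_loop_join(objs_a, objs_b, dist, eps, out):
    for a in objs_a:
        for b in objs_b:
            if a is not b and dist(a, b) <= eps:
                out.add(tuple(sorted((a, b))))

def quickjoin(objs, dist, eps, out, small=16):
    if len(objs) <= small:
        nested_loop_join(objs, objs, dist, eps, out)   # brute-force base case
        return
    p1, p2 = random.sample(objs, 2)                    # two pivots define a split radius
    r = dist(p1, p2) / 2.0
    low = [o for o in objs if dist(o, p1) <= r]
    high = [o for o in objs if dist(o, p1) > r]
    if not low or not high:                            # degenerate split: fall back
        nested_loop_join(objs, objs, dist, eps, out)
        return
    win_low = [o for o in low if dist(o, p1) >= r - eps]    # within eps of the boundary
    win_high = [o for o in high if dist(o, p1) <= r + eps]
    quickjoin(low, dist, eps, out, small)
    quickjoin(high, dist, eps, out, small)
    nested_loop_join(win_low, win_high, dist, eps, out)     # cross-boundary pairs

# example usage on 1-d points with absolute difference as the metric
pts = [random.uniform(0, 100) for _ in range(200)]
pairs = set()
quickjoin(pts, lambda a, b: abs(a - b), 1.0, pairs)
```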
the semantic web is an extension of the current web where information would have precisely defined meaning based on knowledge representation languages the current w3c standard for representing knowledge is the web ontology language owl owl is based on description logics which is popular knowledge representation formalism although dls are quite expressive they feature limitations with respect to what can be said about vague knowledge which appears in several applications consequently fuzzy extensions to owl and dls have gained considerable attention in the current paper we study fuzzy extensions of the semantic web language owl first we present the abstract syntax and semantics of rather elementary fuzzy extension of owl creating fuzzy owl owl more importantly we use this extension to provide an investigation on the semantics of several owl axioms and more precisely for those which in classical dls can be expressed in different but equivalent ways moreover we present translation method which reduces inference problems of owl into inference problems of expressive fuzzy description logics in order to provide reasoning support through fuzzy dls finally we present two further fuzzy extensions of owl based on fuzzy subsumption and fuzzy nominals
general purpose system on chip platforms consisting of configurable components are emerging as an attractive alternative to traditional customized solutions eg asics custom socs owing to their flexibility time to market advantage and low engineering costs however the adoption of such platforms in many high volume markets eg wireless handhelds is limited by concerns about their performance and energy efficiency this paper addresses the problem of enabling the use of configurable platforms in domains where custom approaches have traditionally been used we introduce dynamic platform management methodology for customizing configurable general purpose platform at run time to help bridge the performance and energy efficiency gap with custom approaches the proposed technique uses software layer that detects time varying processing requirements imposed by set of applications and dynamically optimizes architectural parameters and platform components dynamic platform management enables superior application performance more efficient utilization of platform resources and improved energy efficiency as compared to statically optimized platform without requiring any modifications to the underlying hardware we illustrate dynamic platform management by applying it to the design of dual access umts wlan security processing system implemented on general purpose configurable platform experiments demonstrate that compared to statically optimized design on the same platform the proposed techniques enable up to improvements in security processing throughput while achieving savings in energy consumption on average
in this paper we design implement and evaluate adaptguard software service for guarding adaptive systems such as qos adaptive servers from instability caused by software anomalies and faults adaptive systems are of growing importance due to the need to adjust performance to larger range of changing environmental conditions without human intervention such systems however implicitly assume model of system behavior that may be violated causing adaptation loops to perform poorly or fail the purpose of adaptguard is simple in the absence of an priori model of the adaptive software system anticipate system instability attribute it correctly to the right runaway adaptation loop and disconnect it replacing it with conservative but stable open loop control until further notice we evaluate adaptguard by injecting various software faults into adaptive systems that are managed by typical adaptation loops results demonstrate that it can successfully anticipate instability caused by the injected faults and recover from performance degradation further case study is presented using an apache web server serving multiple classes of traffic performance anomaly is demonstrated caused by unexpected interactions between an admission controller and the linux anti livelock mechanism in the absence of model that describes this mechanism adaptguard is able to correctly attribute the unexpected problem to the right runaway loop and fix it
the detection of high level concepts in video data is an essential processing step of video retrieval system the meaning and the appearance of certain events or concepts are strongly related to contextual information for example the appearance of semantic concepts such as eg entertainment or news anchors is determined by the used editing layout which usually is typical for certain broadcasting station in recent years supervised machine learning approaches have been extensively used to learn and detect high level concepts in video shots the class of semi supervised learning methods incorporates unlabeled data in the learning process transductive learning is subclass of semi supervised learning in the transductive setting all training samples are labeled but the unlabeled test samples are considered in the learning process as well up to now transductive learning has not been applied for the purpose of video indexing and retrieval in this paper we propose transductive learning realized by transductive support vector machines tsvm for the detection of those high level concepts whose appearance is strongly related to particular video for each video and each concept transductive model is learned separately and adapted to the appearance of specific concept in the particular test video experimental results on trecvid video data demonstrate the feasibility of the proposed transductive learning approach for several high level concepts
we present multivalued mu calculus an expressive logic to specify knowledge and time in multi agent systems we show that the general method of translation from multivalued to two valued de morgan algebras can be extended to mv mu calculus model checking this way we can reduce the model checking problem for mv mu calculus to several instances of the model checking problem for two valued mu calculus as result properties involving mv mu calculus or its subsets like mv ctlk or mv ctl can be verified using any of the available model checking algorithms three simple examples are shown to exemplify possible applications of multivalued logics of knowledge and time
explanation is an important capability for usable intelligent systems including intelligent agents and cognitive models embedded within simulations and other decision support systems explanation facilities help users understand how and why an intelligent system possesses given structure and set of behaviors prior research has resulted in number of approaches to provide explanation capabilities and identified some significant challenges we describe designs that can be reused to create intelligent agents capable of explaining themselves the designs include ways to provide ontological mechanistic and operational explanations these designs inscribe lessons learned from prior research and provide guidance for incorporating explanation facilities into intelligent systems the designs are derived from both prior research on explanation tool design and from the empirical study reported here on the questions users ask when working with an intelligent system we demonstrate the use of these designs through examples implemented using the herbal high level cognitive modeling language these designs can help build better agents they support creating more usable and more affordable intelligent agents by encapsulating prior knowledge about how to generate explanations in concise representations that can be instantiated or adapted by agent developers
general transform called the geometric transform get that models the appearance inside closed contour is proposed the proposed get is functional of an image intensity function and region indicator function derived from closed contour it can be designed to combine the shape and appearance information at different resolutions and to generate models invariant to deformation articulation or occlusion by choosing appropriate functionals and region indicator functions the get unifies radon transform trace transform and class of image warpings by varying the region indicator and the types of features used for appearance modeling five novel types of gets are introduced and applied to fingerprinting the appearance inside contour they include the gets based on level set shape matching feature curves and the get invariant to occlusion and multiresolution get mrget applications of get to pedestrian identity recognition human body part segmentation and image synthesis are illustrated the proposed approach produces promising results when applied to fingerprinting the appearance of human and body parts despite the presence of nonrigid deformations and articulated motion
existing approaches for protecting sensitive information stored outsourced at external honest but curious servers are typically based on an overlying layer of encryption that is applied on the whole information or use combination of fragmentation and encryption the computational load imposed by encryption makes such approaches not suitable for scenarios with lightweight clients in this paper we address this issue and propose novel model for enforcing privacy requirements on the outsourced information which departs from encryption the basic idea of our approach is to store small portion of the data just enough to break sensitive associations on the client which is trusted being under the data owner control while storing the remaining information in clear form at the external honest but curious server we model the problem and provide solution for it aiming at minimizing the data stored at the client we also illustrate the execution of queries on the fragmented information
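The abstract above describes keeping a small portion of the data at the client just to break sensitive associations, with everything else stored in the clear at the server. Below is a minimal sketch, assuming a hypothetical greedy hitting-set heuristic: given sensitive associations as attribute pairs, pick a small client-side attribute set that intersects every association. The attribute names, associations, and the greedy selection are illustrative assumptions, not the paper's actual minimization algorithm.

    # minimal sketch of client/server fragmentation, assuming a greedy
    # hitting-set heuristic; attribute names and associations are hypothetical
    def fragment(attributes, sensitive_associations):
        """Choose a small client-side fragment that intersects every
        sensitive association; remaining attributes stay at the server."""
        client = set()
        uncovered = [set(a) for a in sensitive_associations]
        while uncovered:
            # pick the attribute that breaks the most remaining associations
            best = max(attributes, key=lambda x: sum(x in a for a in uncovered))
            client.add(best)
            uncovered = [a for a in uncovered if best not in a]
        server = [a for a in attributes if a not in client]
        return sorted(client), server

    attrs = ["name", "dob", "zip", "illness", "salary"]
    sensitive = [("name", "illness"), ("name", "salary"), ("dob", "zip")]
    print(fragment(attrs, sensitive))   # e.g. (['dob', 'name'], ['zip', 'illness', 'salary'])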
thunderdome is system for collaboratively measuring upload bandwidths in ad hoc peer to peer systems it works by scheduling bandwidth probes between pairs of hosts wherein each pairwise exchange reveals the upload constraint of one participant using the abstraction of bandwidth tournaments unresolved hosts are successively paired with each other until every peer knows its upload bandwidth to recover from measurement errors that corrupt its tournament schedule thunderdome aggregates multiple probe results for each host avoiding pathological bandwidth estimations that would otherwise occur in systems with heterogeneous bandwidth distributions for scalability the coordination of probes is distributed across the hosts simulations on empirical and analytic bandwidth distributions validated with wide area planetlab experiments show that thunderdome efficiently yields upload bandwidth estimates that are robust to measurement error
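A minimal sketch of the bandwidth-tournament idea described above, under the simplifying assumption that a pairwise probe reveals the smaller upload of the pair, so the slower host is resolved and the faster one advances to the next round. The probe model, the random pairing, and the handling of the final host are assumptions; the error-recovery and result-aggregation machinery from the abstract is omitted.

    import random

    # toy tournament: a pairwise probe is limited by the smaller upload,
    # so that host learns its bound and drops out; the faster host advances
    def tournament(upload):                      # upload: host -> true upload bw
        unresolved, resolved = list(upload), {}
        while len(unresolved) > 1:
            random.shuffle(unresolved)
            winners = []
            for a, b in zip(unresolved[::2], unresolved[1::2]):
                probe = min(upload[a], upload[b])        # measured pair throughput
                loser, winner = (a, b) if upload[a] <= upload[b] else (b, a)
                resolved[loser] = probe
                winners.append(winner)
            if len(unresolved) % 2:                      # odd host advances unprobed
                winners.append(unresolved[-1])
            unresolved = winners
        # final host: would need one extra probe against a faster resolved peer
        resolved[unresolved[0]] = upload[unresolved[0]]
        return resolved

    print(tournament({"h1": 5.0, "h2": 1.0, "h3": 3.0, "h4": 8.0}))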
tap is storage cache sequential prefetching and caching technique to improve the read ahead cache hit rate and system response time unique feature of tap is the use of table to detect sequential access patterns in the workload and to dynamically determine the optimum prefetch cache size when compared to some popular prefetching techniques tap gives better hit rate and response time while using read cache that is often an order of magnitude smaller than that needed by other techniques tap is especially efficient when the workload consists of interleaved requests from various applications where only some of the applications are accessing their data sequentially for example tap achieves the same hit rate as the other techniques with cache length that is times smaller than the cache needed by other techniques when the interleaved workload consists of sequential application data and random application data
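A minimal sketch of table-based sequential-stream detection, assuming a hypothetical fixed-size table that remembers the "next expected block" after each recent request; a table hit indicates a sequential stream and would trigger a prefetch. The dynamic sizing of the prefetch cache described in the abstract is not shown.

    from collections import OrderedDict

    class SequentialDetector:
        """Toy detection table: remembers the next block address expected
        after each recent request; a hit means the stream looks sequential."""
        def __init__(self, table_size=64):
            self.table = OrderedDict()          # expected next address -> True
            self.table_size = table_size

        def access(self, block):
            sequential = self.table.pop(block, False)   # was this address expected?
            self.table[block + 1] = True                # expect the next block
            if len(self.table) > self.table_size:
                self.table.popitem(last=False)          # evict oldest entry (FIFO)
            return sequential                           # True -> issue prefetch

    d = SequentialDetector()
    for b in [100, 101, 102, 500, 103]:
        print(b, d.access(b))                           # sequential accesses hit the table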
abstract in this paper we propose an enhanced concurrency control algorithm that maximizes the concurrency of multidimensional index structures the factors that deteriorate the concurrency of index structures are node splits and minimum bounding region mbr updates in multidimensional index structures the properties of our concurrency control algorithm are as follows first to increase the concurrency by avoiding lock coupling during mbr updates we propose the plc partial lock coupling technique second new mbr update method is proposed it allows searchers to access nodes where mbr updates are being performed finally our algorithm holds exclusive latches not during whole split time but only during physical node split time that occupies the small part of whole split process for performance evaluation we implement the proposed concurrency control algorithm and one of the existing link technique based algorithms on midas iii that is storage system of bada iv dbms we show through various experiments that our proposed algorithm outperforms the existing algorithm in terms of throughput and response time also we propose recovery protocol for our proposed concurrency control algorithm the recovery protocol is designed to assure high concurrency and fast recovery
concurrency analysis is static analysis technique that determines whether two statements or operations in shared memory program may be executed by different threads concurrently concurrency relationships can be derived from the partial ordering among statements imposed by synchronization constructs thus analyzing barrier synchronization is at the core of concurrency analyses for many parallel programming models previous concurrency analyses for programs with barriers commonly assumed that barriers are named or textually aligned this assumption may not hold for popular parallel programming models such as openmp where barriers are unnamed and can be placed anywhere in parallel region ie they may be textually unaligned we present in this paper the first interprocedural concurrency analysis that can handle openmp and in general programs with unnamed and textually unaligned barriers we have implemented our analysis for openmp programs written in and have evaluated the analysis on programs from the npb and specomp benchmark suites
digital libraries dls are among the most complex kinds of information systems due in part to their intrinsic multi disciplinary nature nowadays dls are built within monolithic tightly integrated and generally inflexible systems or by assembling disparate components together in an ad hoc way with resulting problems in interoperability and adaptability more importantly conceptual modeling requirements analysis and software engineering approaches are rarely supported making it extremely difficult to tailor dl content and behavior to the interests needs and preferences of particular communities in this paper we address these problems in particular we present sl declarative language for specifying and generating domain specific digital libraries sl is based on the formal theory for digital libraries and enables high level specification of dls in five complementary dimensions including the kinds of multimedia information the dl supports stream model how that information is structured and organized structural model different logical and presentational properties and operations of dl components spatial model the behavior of the dl scenario model and the different societies of actors and managers of services that act together to carry out the dl behavior societal model the practical feasibility of the approach is demonstrated by the presentation of sl digital library generator for the marian digital library system
partial redundancy elimination pre is general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redundant code elimination in this paper we address the problem of performing this optimization interprocedurally we use interprocedural partial redundancy elimination for placement of communication and communication preprocessing statements while compiling for distributed memory parallel machines
multi mode network typically consists of multiple heterogeneous social actors among which various types of interactions could occur identifying communities in multi mode network can help understand the structural properties of the network address the data shortage and unbalanced problems and assist tasks like targeted marketing and finding influential actors within or between groups in general network and the membership of groups often evolve gradually in dynamic multi mode network both actor membership and interactions can evolve which poses challenging problem of identifying community evolution in this work we try to address this issue by employing the temporal information to analyze multi mode network spectral framework and its scalability issue are carefully studied experiments on both synthetic data and real world large scale networks demonstrate the efficacy of our algorithm and suggest its generality in solving problems with complex relationships
there is general consensus on the importance of good requirements engineering re for achieving high quality software the modeling and analysis of requirements have been the main challenges during the development of complex systems although semi formal scenario driven approaches have raised the awareness and use of requirement engineering techniques mostly because of their intuitive representation scenarios are well established approach to describe functional requirements uncovering hidden requirements and trade offs as well as validating and verifying requirements the ability to perform quantitative analysis at the requirements level supports the detection of design errors during the early stages of software development life cycle and helps reduce the cost of later redesign activities in order to achieve this goal non functional aspects and in particular time related aspects have to be incorporated at the software requirement phase this is essential in order to correctly model and analyze time dependent applications at early stages in system development the widespread interest in time modeling and analysis techniques provides the major motivation for our paper the objective of the article is to provide readers with sufficient knowledge about existing timed scenario approaches to guide them in making informed decisions as to when and how time aspects can be incorporated in their development process in order to support this process we present comprehensive classification evaluation and comparison of time based scenario notations in order to evaluate these existing notations we introduce set of eleven time related criteria and apply them to categorize and compare forty seven scenario construction approaches
contextual search refers to proactively capturing the information need of user by automatically augmenting the user query with information extracted from the search context for example by using terms from the web page the user is currently browsing or file the user is currently editing we present three different algorithms to implement contextual search for the web the first query rewriting qr augments each query with appropriate terms from the search context and uses an off the shelf web search engine to answer this augmented query the second rank biasing rb generates representation of the context and answers queries using custom built search engine that exploits this representation the third iterative filtering meta search ifm generates multiple subqueries based on the user query and appropriate terms from the search context uses an off the shelf search engine to answer these subqueries and re ranks the results of the subqueries using rank aggregation methods we extensively evaluate the three methods using contexts and over human relevance judgments of search results we show that while qr works surprisingly well the relevance and recall can be improved using rb and substantially more using ifm thus qr rb and ifm represent cost effective design spectrum for contextual search
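A minimal sketch of the query-rewriting (QR) idea only, assuming plain term-frequency scoring over the context page; the stopword list, term limit, and tokenization are arbitrary choices, and the RB and IFM variants are not shown.

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

    def rewrite_query(query, context_text, k=3):
        """Augment the query with the k most frequent context terms
        that are not stopwords and not already in the query."""
        terms = re.findall(r"[a-z]+", context_text.lower())
        counts = Counter(t for t in terms
                         if t not in STOPWORDS and t not in query.lower().split())
        return query + " " + " ".join(t for t, _ in counts.most_common(k))

    page = "Reviews of digital cameras: sensor size, lens quality and battery life."
    print(rewrite_query("best camera", page))   # the augmented query goes to any web search engine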
major bottleneck for the efficient management of personal photographic collections is the large gap between low level image features and high level semantic contents of images this paper proposes and evaluates two methodologies for making appropriate re use of natural language photographic annotations for extracting references to people location and objects and propagating any location references encountered to previously unannotated images the evaluation identifies the strengths of each approach and shows extraction and propagation results with promising accuracy
an often heard complaint about hearing aids is that their amplification of environmental noise makes it difficult for users to focus on one particular speaker in this paper we present new prototype attentive hearing aid aha based on viewpointer wearable calibration free eye tracker with aha users need only look at the person they are listening to to amplify that voice in their hearing aid we present preliminary evaluation of the use of eye input by hearing impaired users for switching between simultaneous speakers we compared eye input with manual source selection through pointing and remote control buttons results show eye input was faster than selection by pointing and faster than button selection in terms of recall of the material presented eye input performed better than traditional hearing aids better than buttons and better than pointing participants rated eye input as highest in the easiest most natural and best overall categories
query processing in large scale unstructured pp networks is crucial part of operating such systems in order to avoid expensive flooding of the network during query processing so called routing indexes are used each peer maintains such an index for its neighbors it provides compact representation data summary of data accessible via each neighboring peer an important problem in this context is to keep these data summaries up to date without paying high maintenance costs in this paper we investigate the problem of maintaining distributed data summaries in pp based environments without global knowledge and central instances based on classification of update propagation strategies we discuss several approaches to reduce maintenance costs and present results from an experimental evaluation
we study the problem of scheduling repetitive real time tasks with the earliest deadline first edf policy that can guarantee the given maximal temperature constraint we show that the traditional scheduling approach ie to repeat the schedule that is feasible through the range of one hyper period does not apply any more then we present necessary and sufficient conditions for real time schedules to guarantee the maximal temperature constraint based on these conditions novel scheduling algorithm is proposed for developing the appropriate schedule that can ensure the maximal temperature guarantee finally we use experiments to evaluate the performance of our approach
message dispatch in object oriented programming oop involves target method lookup in dispatch table tree reflective environment builds dispatch data structure at runtime as types can be added at runtime hence algorithms for reflective environments require dynamic data structure for dispatch in this paper we propose tree based algorithm for multiple dispatch in reflective runtime environment new classes can be added to the system at runtime proposed algorithm performs lookup in time proportional to log times the polymorphic arguments where is number of classes in system proposed algorithm uses type safe approach for multimethod lookup resolving ambiguities we compare performance of the proposed algorithm with the dispatch mechanism in commonly used virtual reflexive systems eg java and microsoft’s common language runtime ms clr in respect of efficiency and type safety
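The abstract describes lookup over the dynamic types of the polymorphic arguments. Below is a simplified illustration of multiple dispatch in Python: a flat table keyed by type tuples, with fallback ordered by each argument's method resolution order. It is not the paper's balanced-tree structure (so it does not reproduce the stated logarithmic bound); the class names and registration API are assumptions for the example.

    from itertools import product

    class MultiMethod:
        """Toy multimethod table keyed by tuples of argument types."""
        def __init__(self):
            self.table = {}                         # (type1, type2, ...) -> func

        def register(self, *types):
            def deco(func):
                self.table[types] = func
                return func
            return deco

        def __call__(self, *args):
            # try type tuples from most to least specific along each argument's MRO
            for combo in product(*(type(a).__mro__ for a in args)):
                if combo in self.table:
                    return self.table[combo](*args)
            raise TypeError("no applicable method")

    collide = MultiMethod()

    class Shape: pass
    class Circle(Shape): pass
    class Square(Shape): pass

    @collide.register(Circle, Circle)
    def _(a, b): return "circle/circle"

    @collide.register(Shape, Shape)
    def _(a, b): return "generic shapes"

    print(collide(Circle(), Circle()))   # circle/circle
    print(collide(Circle(), Square()))   # falls back to generic shapes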
in this paper we address the problem of reducing the communication cost and hence the energy costs incurred in data gathering applications of sensor network environmental data depicts huge amount of correlation in both the spatial and temporal domains we exploit these temporal spatial correlations to address the aforementioned problem more specifically we propose framework that partitions the physical sensor network topology into number of feature regions each sensor node builds data model that represents the underlying structure of the data representative node in each feature region communicates only the model coefficients to the sink which then uses them to answer queries the temporal and spatial similarity has special meaning in outlier cleaning too we use modified score technique to precisely label the outliers and use the spatial similarity to confirm whether the outliers are due to true change in the phenomenon under study or due to faulty sensor nodes
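The abstract mentions a modified score technique for labeling outliers; a common formulation of that idea is the modified z-score based on the median and the median absolute deviation (MAD), sketched below under that assumption (the 0.6745 scaling constant and 3.5 threshold are the usual defaults). The spatial-confirmation step across neighboring nodes is only indicated by a comment.

    import statistics

    def modified_z_scores(values):
        """Modified z-score: 0.6745 * (x - median) / MAD."""
        med = statistics.median(values)
        mad = statistics.median(abs(x - med) for x in values)
        if mad == 0:
            return [0.0] * len(values)
        return [0.6745 * (x - med) / mad for x in values]

    def label_outliers(values, threshold=3.5):
        return [abs(z) > threshold for z in modified_z_scores(values)]

    readings = [21.1, 21.3, 20.9, 21.2, 35.0, 21.0]
    print(label_outliers(readings))
    # a flagged reading would then be checked against spatially correlated
    # neighbors to decide between a real change in the phenomenon and a faulty sensor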
in an effort to reduce costs and improve staffing options companies are going offshore to staff software projects although advances in communication technologies make it easier to share information growing number of studies have highlighted the disadvantages and hidden costs of distributed teams this paper reports the results of study that used social network analysis to study the informal communication patterns in three successful global software teams the results indicated that technical leaders acted as brokers to coordinate work across tasks and sites in self organizing sub groups personal networks awareness of tasks and accessibility influenced communication which in turn affected the social climate of the team we discuss the implications of these results for technologies that support awareness and for management practices especially in globally distributed work contexts
we present new method of detecting privacy violations in the context of database publishing our method defines published view to preserve the privacy of secret query if and return no tuples in common over all possible database instances we then establish necessary and sufficient conditions that characterize when preserves the privacy of in terms of the projected inequalities in the queries both for conjunctive queries and queries with negation we also show that integrity constraints have an effect on privacy and derive test for ensuring privacy preservation in the presence of fd constraints the issue of privacy preservation in the presence of multiple views is investigated and we show that it can be reduced to the single view case for suitably chosen view
mapping problem space features into solution space features is fundamental configuration problem in software product line engineering configuration problem is defined as generating the most optimal combination of software features given requirements specification and given set of configuration rules current approaches however provide little support for expressing complex configuration rules between problem and solution space that support incomplete requirements specifications in this paper we propose an approach to model complex configuration rules based on generalization of the concept of problem solution feature interactions these are interactions between solution space features that only arise in specific problem contexts the use of an existing tool to support our approach is also discussed we use the dlv answer set solver to express particular configuration problem as logic program whose answer set corresponds to the optimal combinations of solution space features we motivate and illustrate our approach with case study in the field of managing dynamic adaptations in distributed software where the goal is to generate an optimal protocol for accommodating given adaptation
this paper introduces beira an area based map user interface for location based contents recently various web map services are widely used to search for location based contents however browsing large number of contents that are arranged on map as points may be troublesome we tackle this issue by using area based representations instead of points aoi area of interest which is core concept of beira is an arbitrary shaped area boundary with text summary information with aoi users can instantly grasp area characteristics without examining each point aoi is deduced by performing geo semantic co clustering of location based contents geo semantic co clustering takes both geographic and semantic features of contents into account we confirm that the ratio of the geo semantic blend is the key to deducing an appropriate boundary we further propose and evaluate location aware term weighting to obtain an informative summary
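A minimal sketch of the geo-semantic blend idea: a combined distance that mixes geographic distance with semantic (text) dissimilarity via a blend ratio alpha, which could then be fed to any standard clustering routine. The cosine-based semantic distance, the value of alpha, and the toy contents are assumptions; the actual co-clustering and summarization in beira are not reproduced, and in practice the two distances would need to be normalized to comparable scales.

    import math
    from collections import Counter

    def geo_distance(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def semantic_distance(text_a, text_b):
        """1 - cosine similarity over simple bag-of-words vectors."""
        a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
        dot = sum(a[w] * b[w] for w in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return 1.0 - (dot / norm if norm else 0.0)

    def blended_distance(item_a, item_b, alpha=0.5):
        """alpha weights geography against semantics; the ratio is the key knob."""
        return (alpha * geo_distance(item_a["loc"], item_b["loc"])
                + (1 - alpha) * semantic_distance(item_a["text"], item_b["text"]))

    a = {"loc": (0.0, 0.0), "text": "ramen noodle shop"}
    b = {"loc": (0.2, 0.1), "text": "ramen restaurant"}
    print(blended_distance(a, b, alpha=0.3))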
validation of rtl descriptions remains one of the principal bottlenecks in the circuit design process random simulation based methods for functional validation suffer from fundamental limitations and may be inappropriate or too expensive in fact for some circuits large number of vectors is required in order to make the circuit reach hard to test constructs and obtain accurate values for their testability in this work we present static non simulation based method for the determination of the controllability of rtl constructs that is efficient and gives accurate feedback to the designers in what regards the presence of hard to control constructs in their rtl code the method takes as input verilog rtl description solves the chapman kolmogorov equations that describe the steady state of the circuit and outputs the computed values for the controllability of the rtl constructs to avoid the exponential blow up that results from writing one equation for each circuit state and solving the resulting system of equations an approximation method is used we present results showing that the approximation is effective and describe how the method can be used to bias random test generator in order to achieve higher coverage using smaller number of vectors
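For reference, the steady-state (Chapman-Kolmogorov) system underlying the method is the standard stationary-distribution formulation for a state-transition model; reading off a construct's controllability as accumulated steady-state probability is a schematic statement consistent with the abstract, and the paper's approximation for avoiding the exponential number of states is not shown.

    % stationary distribution \pi over circuit states with transition matrix P
    \pi P = \pi, \qquad \sum_{s} \pi_s = 1, \qquad \pi_s \ge 0
    % controllability of an RTL construct c: probability mass of the states
    % that exercise it
    \mathrm{Cont}(c) = \sum_{s \,:\, s \models c} \pi_s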
the embodied and situated approach to artificial intelligence ai has matured and become viable alternative to traditional computationalist approaches with respect to the practical goal of building artificial agents which can behave in robust and flexible manner under changing real world conditions nevertheless some concerns have recently been raised with regard to the sufficiency of current embodied ai for advancing our scientific understanding of intentional agency while from an engineering or computer science perspective this limitation might not be relevant it is of course highly relevant for ai researchers striving to build accurate models of natural cognition we argue that the biological foundations of enactive cognitive science can provide the conceptual tools that are needed to diagnose more clearly the shortcomings of current embodied ai in particular taking an enactive perspective points to the need for ai to take seriously the organismic roots of autonomous agency and sense making we identify two necessary systemic requirements namely constitutive autonomy and adaptivity which lead us to introduce two design principles of enactive ai it is argued that the development of such enactive ai poses significant challenge to current methodologies however it also provides promising way of eventually overcoming the current limitations of embodied ai especially in terms of providing fuller models of natural embodied cognition finally some practical implications and examples of the two design principles of enactive ai are also discussed
embedded processor performance is dependent on both the underlying architecture and the compiler optimisations applied however designing both simultaneously is extremely difficult to achieve due to the time constraints designers must work under therefore current methodology involves designing compiler and architecture in isolation leading to sub optimal performance of the final product this paper develops novel approach to this co design space problem for any microarchitectural configuration we automatically predict the performance that an optimising compiler would achieve without actually building it once trained single run of on the new architecture is enough to make prediction with just error rate this allows the designer to accurately choose an architectural configuration with knowledge of how an optimising compiler will perform on it we use this to find the best optimising compiler architectural configuration in our co design space and demonstrate that it achieves an average performance improvement and energy savings of compared to the baseline leading to an energy delay ed value of
we study here deterministic broadcasting in geometric radio networks grn whose nodes have complete knowledge of the network nodes of grn are deployed in the euclidean plane and each of them can transmit within some range assigned to it we adopt model in which ranges of nodes are non uniform and they are drawn from the predefined interval min max all our results are in the conflict embodied model where receiving node must be in the range of exactly one transmitting node in order to receive the message we derive several lower and upper bounds on the time of deterministic broadcasting in grns in terms of the number of nodes distribution of nodes ranges and the eccentricity of the source node ie the maximum length of shortest directed path from the source node to another node in the network in particular we show that log rounds are required to accomplish broadcasting in some grn where each node has the transmission range set either to or to we also prove that the bound log is almost tight providing broadcasting procedure that works in this type of grn in time log n in grns with wider choice of positive node ranges from min max we show that broadcasting requires omega min log max/min log rounds and that it can be accomplished in log max/min rounds subsuming the best currently known upper bound max/min provided in we also study the problem of simulation of minimum energy broadcasting in arbitrary grns we show that energy optimal broadcasting that can be completed in rounds in conflict free model may require up to additional rounds in the conflict embodied model this lower bound should be seen as separation result between conflict free and conflict embodied geometric radio networks finally we also prove that any hop broadcasting algorithm with the energy consumption cal in grn can be simulated within h log rounds in the conflict embodied model using energy cal where is the ratio between the largest and the shortest euclidean distance between pair of nodes in the network
based on our recent work on the development of trust model for recommender agents and qualitative survey we explore the potential of building users trust with explanation interfaces we present the major results from the survey which provided roadmap identifying the most promising areas for investigating design issues for trust inducing interfaces we then describe set of general principles derived from an in depth examination of various design dimensions for constructing explanation interfaces which most contribute to trust formation we present results of significant scale user study which indicate that the organization based explanation is highly effective in building users trust in the recommendation interface with the benefit of increasing users intention to return to the agent and save cognitive effort
indoor location determination has emerged as significant research topic due to the wide spread deployment of wireless local area networks wlans and the demand for context aware services inside buildings however prediction accuracy remains primary issue surrounding the practicality of wlan based location determination systems this study proposes novel scheme that utilizes mobile user orientation information to improve prediction accuracy theoretically if the precise orientation of user can be identified then the location determination system can predict that user’s location with high degree of accuracy by using the training data of this specific orientation in reality mobile user’s orientation can be estimated only by comparing variations in received signal strength and nevertheless the predicted orientation may be incorrect incorrect orientation information causes the accuracy of the entire system to decrease therefore this study presents an accumulated orientation strength algorithm which can utilize uncertain estimated orientation information to improve prediction accuracy implementation of this system is based on the bayesian model and the experimental results indeed show the effectiveness of our proposed approach
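A minimal sketch of the Bayesian location estimate with orientation weighting: the posterior over candidate locations combines, for each training orientation, a likelihood of the observed signal strengths weighted by an accumulated orientation strength. The Gaussian likelihood, the toy radio map, and the particular weighting shown are assumptions, not the paper's exact algorithm.

    import math

    def gaussian(x, mu, sigma=4.0):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    def locate(observation, radio_map, orientation_strength):
        """observation: {ap: rss}; radio_map: {loc: {orientation: {ap: mean rss}}};
        orientation_strength: {orientation: weight in [0, 1]} (sums to 1)."""
        posterior = {}
        for loc, by_orient in radio_map.items():
            score = 0.0
            for orient, means in by_orient.items():
                lik = 1.0
                for ap, rss in observation.items():
                    lik *= gaussian(rss, means.get(ap, -95.0))
                score += orientation_strength.get(orient, 0.0) * lik
            posterior[loc] = score
        total = sum(posterior.values()) or 1.0
        return {loc: p / total for loc, p in posterior.items()}

    radio_map = {"roomA": {"N": {"ap1": -40, "ap2": -70}},
                 "roomB": {"N": {"ap1": -75, "ap2": -45}}}
    print(locate({"ap1": -42, "ap2": -68}, radio_map, {"N": 1.0}))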
desired capability of automatic problem solvers is that they can explain the results such explanations should justify that the solution proposed by the problem solver arises from the known domain knowledge in this paper we discuss how explanations can be used in case based reasoning cbr in order to justify the results in classification tasks and also for solving new problems we particularly focus on explanations derived from building symbolic description of the similar aspects among cases moreover we show how symbolic descriptions of similarity can be exploited in the different processes of cbr namely retrieve reuse revise and retain
sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions attitudes and feelings expressed in text this paper proposes novel probabilistic modeling framework based on latent dirichlet allocation lda called joint sentiment topic model jst which detects sentiment and topic simultaneously from text unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training the proposed jst model is fully unsupervised the model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy preliminary experiments have shown promising results achieved by jst
as object oriented model becomes the trend of database technology there is need to convert relational to object oriented database system to improve productivity and flexibility the changeover includes schema translation data conversion and program conversion this paper describes methodology for integrating schema translation and data conversion schema translation involves semantic reconstruction and the mapping of relational schema into object oriented schema data conversion involves unloading tuples of relations into sequential files and reloading them into object oriented classes files the methodology preserves the constraints of the relational database by mapping the equivalent data dependencies
three dimensional shape searching is problem of current interest in several different fields most techniques have been developed for particular domain and reduce shape into simpler shape representation the techniques developed for particular domain will also find applications in other domains we classify and compare various shape searching techniques based on their shape representations brief description of each technique is provided followed by detailed survey of the state of the art the paper concludes by identifying gaps in current shape search techniques and identifies directions for future research
as result of the growing popularity of wireless sensor networks wsns secure group communication becomes an important research issue for network security because of the popularity of group oriented applications in wsns such as electronic monitoring and collaborative sensing the secure group key agreement protocol design is crucial for achieving secure group communications as we all know most security technologies are currently deployed in wired networks and are not fully applicable to wsns involving mobile nodes with limited capability in we proposed blom’s matrix based group key management protocol bkm with robust continuity for wsns unfortunately in this paper we present that the bkm has security weakness in which participants cannot confirm that their contributions were actually involved in the group key establishment this is an important property of group key agreement therefore we propose verified group key agreement protocol bka for resource constrained wsns we show that the proposed protocol produces contributory group key agreement we demonstrate that the proposed protocol is perfect key management protocol and is well suited for resource constrained wsns
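For context, a minimal sketch of the underlying Blom key-predistribution scheme that the matrix-based protocol builds on: a trusted setup picks a secret symmetric matrix D over a prime field and a public matrix G, node i stores row i of (D G)^T as its private share, and two nodes derive the same pairwise key from one private row and the other's public column. The small field and matrices below are toy values, and the verification/contributory additions of the proposed protocol are not shown.

    # toy Blom key predistribution over GF(p); parameters are illustrative only
    p = 101                                    # small prime field
    G = [[1, 1, 1], [1, 2, 3], [1, 4, 9]]      # public (Vandermonde-style) matrix
    D = [[17, 5, 3], [5, 23, 7], [3, 7, 11]]   # secret symmetric matrix (setup only)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
                 for j in range(len(Y[0]))] for i in range(len(X))]

    A = [list(row) for row in zip(*matmul(D, G))]   # A = (D*G)^T, row i is node i's secret

    def pairwise_key(i, j):
        """node i combines its private row with node j's public column."""
        return sum(A[i][k] * G[k][j] for k in range(len(G))) % p

    assert pairwise_key(0, 2) == pairwise_key(2, 0)    # both nodes compute the same key
    print(pairwise_key(0, 2))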
in this paper we study hardware supported compiler directed hscd cache coherence scheme which can be implemented on large scale multiprocessor using off the shelf microprocessors such as the cray td the scheme can be adapted to various cache organizations including multiword cache lines and byte addressable architectures several system related issues including critical sections interthread communication and task migration have also been addressed the cost of the required hardware support is minimal and proportional to the cache size the necessary compiler algorithms including intra and interprocedural array data flow analysis have been implemented on the polaris parallelizing compiler from our simulation study using the perfect club benchmarks we found that in spite of the conservative analysis made by the compiler for four of six benchmark programs tested the proposed hscd scheme outperforms the full map hardware directory scheme up to percent while the hardware scheme outperforms the hscd scheme in the remaining two applications up to percent given its comparable performance and reduced hardware cost the proposed scheme can be viable alternative for large scale multiprocessors such as the cray td which rely on users to maintain data coherence
current search engines do not in general perform well with longer more verbose queries one of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness in this paper we develop and evaluate technique that uses query dependent corpus dependent and corpus independent features for automatic extraction of key concepts from verbose queries we show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency finally we propose probabilistic model for integrating the weighted key concepts identified by our method into query and demonstrate that this integration significantly improves retrieval effectiveness for large set of natural language description queries derived from trec topics on several newswire and web collections
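A minimal sketch contrasting an idf-only baseline with a weighted combination of several features per candidate concept, as the abstract describes; the feature set, weights, and document-frequency numbers below are placeholders, not the learned values from the paper.

    import math

    def idf(term, collection_df, n_docs):
        return math.log(n_docs / (1 + collection_df.get(term, 0)))

    def concept_score(concept, features, weights):
        """Weighted combination of query-dependent, corpus-dependent and
        corpus-independent features (placeholder features and weights)."""
        return sum(weights[name] * features[name](concept) for name in weights)

    # hypothetical feature set for candidate concepts in a verbose query
    collection_df, n_docs = {"cancer": 120_000, "effects": 900_000}, 25_000_000
    features = {
        "idf": lambda c: idf(c, collection_df, n_docs),
        "query_tf": lambda c: 1.0,            # frequency of the concept in the query
        "in_title_field": lambda c: 0.0,      # toy corpus-dependent signal
    }
    weights = {"idf": 0.6, "query_tf": 0.3, "in_title_field": 0.1}
    for c in ["cancer", "effects"]:
        print(c, round(concept_score(c, features, weights), 3))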
we introduce the notion of ranking robustness which refers to property of ranked list of documents that indicates how stable the ranking is in the presence of uncertainty in the ranked documents we propose statistical measure called the robustness score to quantify this notion our initial motivation for measuring ranking robustness is to predict topic difficulty for content based queries in the ad hoc retrieval task our results demonstrate that the robustness score is positively and consistently correlated with average precision of content based queries across variety of trec test collections though our focus is on prediction under the ad hoc retrieval task we observe an interesting negative correlation with query performance when our technique is applied to named page finding queries which are fundamentally different kind of queries side effect of this different behavior of the robustness score between the two types of queries is that the robustness score is also found to be good feature for query classification
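The robustness score is described above only as stability of the ranking under uncertainty in the ranked documents, so the sketch below uses a generic proxy under that reading: perturb the documents, re-rank, and average the rank correlation (Spearman) with the original ranking. The perturbation model, scoring function, and use of Spearman correlation are assumptions; the paper's actual measure differs in its details.

    import random

    def spearman(rank_a, rank_b):
        """Spearman rank correlation between two rankings of the same items."""
        n = len(rank_a)
        pos_b = {doc: i for i, doc in enumerate(rank_b)}
        d2 = sum((i - pos_b[doc]) ** 2 for i, doc in enumerate(rank_a))
        return 1 - 6 * d2 / (n * (n * n - 1))

    def robustness_proxy(ranked_docs, score_fn, perturb_fn, trials=10):
        """Average rank correlation between the original ranking and the
        rankings obtained after perturbing the documents."""
        total = 0.0
        for _ in range(trials):
            perturbed = [perturb_fn(d) for d in ranked_docs]
            scores = list(map(score_fn, perturbed))
            reranked = [d for d, _ in sorted(zip(ranked_docs, scores),
                                             key=lambda x: -x[1])]
            total += spearman(ranked_docs, reranked)
        return total / trials

    # toy usage: documents are bags of words, scores count query-term matches,
    # perturbation randomly drops words
    query = {"ranking", "robustness"}
    docs = [frozenset({"ranking", "robustness", "query"}),
            frozenset({"ranking", "web"}),
            frozenset({"sports"})]
    score = lambda d: len(query & d)
    perturb = lambda d: {w for w in d if random.random() > 0.3}
    print(robustness_proxy(docs, score, perturb))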
information presented in website is usually organized into certain logical structure that is intuitive to users it would be useful to model websites with such logical structure so that extraction of web data from these sites can be performed in simple and efficient manner however the recognition and reconstruction of such logical structure by software agent is not straightforward due to the complex hyper link structure among webpages and the html formatting within each webpage in this paper we propose the wiccap data model data model that maps websites from their physical structure into commonly perceived logical views to enable easy and rapid creation of such data models we have implemented visual tool called the mapping wizard to facilitate and automate the process of producing wiccap data models using the tool the time required to construct logical representation for given website is significantly reduced
aspects require access to the join point context in order to select and adapt join points for this purpose current aspect oriented systems offer large number of pointcut constructs that provide access to join point information that is local to the join point context like parameters in method call join points however these systems are quite miserly with non local information that cannot directly be derived from the local execution context recently there have been some proposals that offer access to some kind of non local information one such proposal is the path expression pointcut that permits to abstract over non local object information path pointcuts expose non local objects that are specified in corresponding path expression patterns in this paper we show recurrent situations where developers need to access the whole object paths and consequently they add workarounds other than pointcut constructs to get the required accesses then we present and study an extension to the path expression pointcuts to permit exposing the object paths and show how this extension overcomes the problem
object technology has been widely acclaimed as offering revolution in computing that will resolve myriad of problems inherent in developing and managing organizational information processing capabilities although its foundations arose in computer programming languages object technology has implications for wide range of business computing activities including programming analysis and design information management and information sharing we examine six fundamental research frontiers in each activity common business classes organizational barriers applications and tools reuse and object management standards testing and metrics and technology investment the cross product of the business computing activities with these fundamental research frontiers yields taxonomy within which to position the research needed to realize the promises offered by object technology
with the advent of new fcc policies on spectrum allocation for next generation wireless devices we have rare opportunity to redesign spectrum access protocols to support demanding latency sensitive applications such as high def media streaming in home networks given their low tolerance for traffic delays and disruptions these applications are ill suited for traditional contention based csma protocols in this paper we explore an alternative approach to spectrum access that relies on frequency agile radios to perform interference free transmission across orthogonal frequencies we describe jello mac overlay where devices sense and occupy unused spectrum without central coordination or dedicated radio for control we show that over time spectrum fragmentation can significantly reduce usable spectrum in the system jello addresses this using two complementary techniques online spectrum defragmentation where active devices periodically migrate spectrum usage and non contiguous access which allows single flow to utilize multiple spectrum fragments our prototype on an node gnu radio testbed shows that jello significantly reduces spectrum fragmentation and provides high utilization while adapting to client flows changing traffic demands
holistic approach to modelling embedded systems is advocated many aspects of system should be analysed in isolation to keep the task manageable but they often influence each other during integration in way that the desired system becomes unrealisable tool supported approach that aims at integrated models of different concerns based on formal methods is suggested to solve this problem this approach uses creol which is language designed for object oriented modelling of distributed systems we report on ongoing work on the design and the implementation of tools that support modelling validation and verification we focus on sensor networks which are distributed system that consists of many embedded devices with tight constraints on computational power energy availability and timeliness the described tools are compiler that performs static checks and optimisations an interpreter that defines formal semantics and prototypical ltl model checker this supports seamless development with formal methods
over the last few years social network systems have greatly increased users involvement in online content creation and annotation since such systems usually need to deal with large amount of multimedia data it becomes desirable to realize an interactive service that minimizes tedious and time consuming manual annotation in this paper we propose an interactive online platform that is capable of performing semi automatic image annotation and tag recommendation for an extensive online database first when the user marks specific object in an image the system performs an object duplicate detection and returns the search results with images containing similar objects then the annotation of the object can be performed in two ways in the tag recommendation process the system recommends tags associated with the object in images of the search results among which the user can accept some tags for the object in the given image in the tag propagation process when the user enters his her tag for the object it is propagated to images in the search results different techniques to speed up the process of indexing and retrieval are presented in this paper and their effectiveness demonstrated through set of experiments considering various classes of objects
many important workloads today such as web hosted services are limited not by processor core performance but by interactions among the cores the memory system devices and the complex software layers that tie these components together architects designing future systems for these workloads are challenged to identify performance bottlenecks because as in any concurrent system overheads in one component may be hidden due to overlap with other operations these overlaps span the user kernel and software hardware boundaries making traditional performance analysis techniques inadequate we present methodology for identifying end to end critical paths across software and simulated hardware in complex networked systems by modeling systems as collections of state machines interacting via queues we can trace critical paths through multiplexed processing engines identify when resources create bottlenecks including abstract resources such as flow control credits and predict the benefit of eliminating bottlenecks by increasing hardware speeds or expanding available resources we implement our technique in full system simulator and analyze tcp microbenchmark web server the linux tcp ip stack and an ethernet controller from single run of the microbenchmark our tool within minutes correctly identifies series of bottlenecks and predicts the performance of hypothetical systems in which these bottlenecks are successively eliminated culminating in total speedup of x we then validate these predictions through hours of additional simulation and find them to be accurate within we also analyze the web server find it to be cpu bound and predict the performance of system with an additional core within
maximum margin criterion mmc based feature extraction is more efficient than linear discriminant analysis lda for calculating the discriminant vectors since it does not need to calculate the inverse within class scatter matrix however mmc ignores the discriminative information within the local structures of samples and the structural information embedding in the images in this paper we develop novel criterion namely laplacian bidirectional maximum margin criterion lbmmc to address the issue we formulate the image total laplacian matrix image within class laplacian matrix and image between class laplacian matrix using the sample similar weight that is widely used in machine learning the proposed lbmmc based feature extraction computes the discriminant vectors by maximizing the difference between image between class laplacian matrix and image within class laplacian matrix in both row and column directions experiments on the feret and yale face databases show the effectiveness of the proposed lbmmc based feature extraction method
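With the image within-class and between-class Laplacian matrices written L_w and L_b, the criterion described above can be summarized, under the usual trace formulation of maximum margin criteria, as maximizing their difference in both the row and column directions. This is a schematic statement consistent with the abstract, not the paper's exact notation; the symbols W, V and the tilde for the column-direction matrices are assumptions.

    % projections W (row direction) and V (column direction)
    J(W) = \operatorname{tr}\!\left( W^{\top} (L_b - L_w)\, W \right), \qquad
    J(V) = \operatorname{tr}\!\left( V^{\top} (\tilde{L}_b - \tilde{L}_w)\, V \right)
    % the discriminant vectors are leading eigenvectors of L_b - L_w
    % (resp. \tilde{L}_b - \tilde{L}_w), so no inverse of the within-class matrix is needed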
we introduce parameterized pattern queries as new paradigm to extend traditional pattern expressions over sequence databases parameterized pattern is essentially string made of constant symbols or variables where variables can be matched against any symbol of the input string parameterized patterns allow concise and expressive definition of regular expressions that would be very complex to describe without variables they can also be used to express additional constraints to relax pattern expressions by allowing more freedom and finally to cluster patterns in order to minimize the number of symbols comparisons
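A minimal sketch of matching a parameterized pattern (constants plus variables) against a sequence: each variable can bind to any single symbol, but repeated occurrences of the same variable must bind consistently. The uppercase-means-variable syntax is an assumption for the example, and the constraint-relaxation and query-processing aspects from the abstract are not shown.

    def match(pattern, window):
        """Match a parameterized pattern against a window of equal length.
        Uppercase pattern symbols are variables; bindings must be consistent."""
        if len(pattern) != len(window):
            return None
        bindings = {}
        for p, s in zip(pattern, window):
            if p.isupper():                       # variable symbol
                if bindings.setdefault(p, s) != s:
                    return None                   # inconsistent re-binding
            elif p != s:                          # constant must match exactly
                return None
        return bindings

    def find_matches(pattern, sequence):
        n, out = len(pattern), []
        for i in range(len(sequence) - n + 1):
            b = match(pattern, sequence[i:i + n])
            if b is not None:
                out.append((i, b))
        return out

    # e.g. "aXbX" requires the same symbol at the second and fourth positions
    print(find_matches(list("aXbX"), list("acbcaabda")))   # [(0, {'X': 'c'})]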
we present visualization design to enhance the ability of an administrator to detect and investigate anomalous traffic between local network and external domains central to the design is parallel axes view which displays netflow records as links between two machines or domains while employing variety of visual cues to assist the user we describe several filtering options that can be employed to hide uninteresting or innocuous traffic such that the user can focus his or her attention on the more unusual network flows this design is implemented in the form of visflowconnect prototype application which we used to study the effectiveness of our visualization approach using visflowconnect we were able to discover variety of interesting network traffic patterns some of these were harmless normal behavior but some were malicious attacks against machines on the network
few existing visualization systems can handle large data sets with hundreds of dimensions since high dimensional data sets cause clutter on the display and large response time in interactive exploration in this paper we present significantly improved multidimensional visualization approach named value and relation var display that allows users to effectively and efficiently explore large data sets with several hundred dimensions in the var display data values and dimension relationships are explicitly visualized in the same display by using dimension glyphs to explicitly represent values in dimensions and glyph layout to explicitly convey dimension relationships in particular pixel oriented techniques and density based scatterplots are used to create dimension glyphs to convey values multidimensional scaling jigsaw map hierarchy visualization techniques and an animation metaphor named rainfall are used to convey relationships among dimensions rich set of interaction tools has been provided to allow users to interactively detect patterns of interest in the var display prototype of the var display has been fully implemented the case studies presented in this paper show how the prototype supports interactive exploration of data sets of several hundred dimensions user study evaluating the prototype is also reported in this paper
there are number of challenges facing the high performance computing hpc community including increasing levels of concurrency threads cores nodes deeper and more complex memory hierarchies register cache disk network mixed hardware sets cpus and gpus and increasing scale tens or hundreds of thousands of processing elements assessing the performance of complex scientific applications on specialised high performance computing architectures is difficult in many cases traditional computer benchmarking is insufficient as it typically requires access to physical machines of equivalent or similar specification and rarely relates to the potential capability of an application technique known as application performance modelling addresses many of these additional requirements modelling allows future architectures and or applications to be explored in mathematical or simulated setting thus enabling hypothetical questions relating to the configuration of potential future architecture to be assessed in terms of its impact on key scientific codes this paper describes the warwick performance prediction warpp simulator which is used to construct application performance models for complex industry strength parallel scientific codes executing on thousands of processing cores the capability and accuracy of the simulator is demonstrated through its application to scientific benchmark developed by the united kingdom atomic weapons establishment awe the results of the simulations are validated for two different hpc architectures each case demonstrating greater than accuracy for run time prediction simulation results collected from runs on standard pc are provided for up to processor cores it is also shown how the addition of operating system jitter to the simulator can improve the quality of the application performance model results
many statistical techniques have been proposed to predict fault proneness of program modules in software engineering choosing the best candidate among many available models involves performance assessment and detailed comparison but these comparisons are not simple due to the applicability of varying performance measures classifying software module as fault prone implies the application of some verification activities thus adding to the development cost misclassifying module as fault free carries the risk of system failure also associated with cost implications methodologies for precise evaluation of fault prediction models should be at the core of empirical software engineering research but have attracted sporadic attention in this paper we overview model evaluation techniques in addition to many techniques that have been used in software engineering studies before we introduce and discuss the merits of cost curves using the data from public repository our study demonstrates the strengths and weaknesses of performance evaluation techniques and points to conclusion that the selection of the best model cannot be made without considering project cost characteristics which are specific in each development environment
we present an interactive system that stylizes an input video into painterly animation the system consists of two phases the first is an video parsing phase that extracts and labels semantic objects with different material properties skin hair cloth and so on in the video and then establishes robust correspondence between frames for discriminative image features inside each object the second painterly rendering phase performs the stylization based on the video semantics and feature correspondence compared to the previous work the proposed method advances painterly animation in three aspects firstly we render artistic painterly styles using rich set of example based brush strokes these strokes placed in multiple layers and passes are automatically selected according to the video semantics secondly we warp brush strokes according to global object deformations so that the strokes appear to be tightly attached to the object surfaces thirdly we propose series of novel techniques to reduce the scintillation effects results applying our system to several video clips show that it produces expressive oil painting animations
software quality is defined as the degree to which software component or system meets specified requirements and specifications assessing software quality in the early stages of design and development is crucial as it helps reduce effort time and money however the task is difficult since most software quality characteristics such as maintainability reliability and reusability cannot be directly and objectively measured before the software product is deployed and used for certain period of time nonetheless these software quality characteristics can be predicted from other measurable software quality attributes such as complexity and inheritance many metrics have been proposed for this purpose in this context we speak of estimating software quality characteristics from measurable attributes for this purpose software quality estimation models have been widely used these take different forms statistical models rule based models and decision trees however data used to build such models is scarce in the domain of software quality as result the accuracy of the built estimation models deteriorates when they are used to predict the quality of new software components in this paper we propose search based software engineering approach to improve the prediction accuracy of software quality estimation models by adapting them to new unseen software products the method has been implemented and favorable result comparisons are reported in this work
hair is major feature of digital characters unfortunately it has complex geometry which challenges standard modeling tools some dedicated techniques exist but creating realistic hairstyle still takes hours complementary to user driven methods we here propose an image based approach to capture the geometry of hair the novelty of this work is that we draw information from the scattering properties of the hair that are normally considered hindrance to do so we analyze image sequences from fixed camera with moving light source we first introduce novel method to compute the image orientation of the hairs from their anisotropic behavior this method is proven to subsume and extend existing work while improving accuracy this image orientation is then raised into orientation by analyzing the light reflected by the hair fibers this part relies on minimal assumptions that have been proven correct in previous work finally we show how to use several such image sequences to reconstruct the complete hair geometry of real person results are shown to illustrate the fidelity of the captured geometry to the original hair this technique paves the way for new approach to digital hair generation
with its th biannual anniversary conference participatory design pd is leaving its teens and must now be considered ready to join the adult world in this article we encourage the pd community to think big pd should engage in large scale information systems development and opt for pd approach applied throughout design and organizational implementation to pursue this aim we extend the iterative pd prototyping approach by emphasizing pd experiments as transcending traditional prototyping by evaluating fully integrated systems exposed to real work practices incorporating improvisational change management including anticipated emergent and opportunity based change and extending initial design and development into sustained and ongoing stepwise implementation that constitutes an overall technology driven organizational change the extended approach is exemplified through large scale pd experiment in the danish healthcare sector we reflect on our experiences from this experiment and discuss four challenges pd must address in dealing with large scale systems development
when writing program generator requires considerable intellectual effort it is valuable to amortize that effort by using the generator to build more than one application when program generator serves multiple clients however the implementor must address pragmatic questions that implementors of single use program generators can ignore in how many languages should generated code be written how should code be packaged what should the interfaces to the client code look like how should user control variations this paper elaborates on these questions by means of case studies of the new jersey machine code toolkit the lambda rtl translator and the asdl program generator it is hoped that the paper will stimulate the development of better techniques most urgently needed are standard way to support multiple target languages and simple clear way to control interfaces to generated code
phase change memory pcm is an emerging memory technology for future computing systems compared to other non volatile memory alternatives pcm is more matured to production and has faster read latency and potentially higher storage density the main roadblock precluding pcm from being used in particular in the main memory hierarchy is its limited write endurance to address this issue recent studies proposed to either reduce pcm’s write frequency or use wear leveling to evenly distribute writes although these techniques can extend the lifetime of pcm most of them will not prevent deliberately designed malicious codes from wearing it out quickly furthermore all the prior techniques did not consider the circumstances of compromised os and its security implication to the overall pcm design compromised os will allow adversaries to manipulate processes and exploit side channels to accelerate wear out in this paper we argue that pcm design not only has to consider normal wear out under normal application behavior most importantly it must take the worst case scenario into account with the presence of malicious exploits and compromised os to address the durability and security issues simultaneously in this paper we propose novel low cost hardware mechanism called security refresh to avoid information leak by constantly migrating their physical locations inside the pcm obfuscating the actual data placement from users and system software it uses dynamic randomized address mapping scheme that swaps data using random keys upon each refresh due the hardware overhead is tiny without using any table the best lifetime we can achieve under the worst case malicious attack is more than six years also our scheme incurs around performance degradation for normal program operations
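A minimal sketch of the randomized remapping idea only: each refresh round draws a fresh random key and the effective physical block is derived from the logical block address and the key, so hot lines keep migrating and the mapping needs no table. The XOR-based remap, region size, and key rotation shown here are simplifying assumptions; the paper's actual swap schedule, region organization, and interplay with wear leveling are not reproduced.

    import secrets

    class SecurityRefreshSketch:
        """Toy keyed remapping: physical block = logical block XOR current key.
        A new key per refresh round re-shuffles where every block lives."""
        def __init__(self, n_blocks):
            assert n_blocks & (n_blocks - 1) == 0, "power-of-two region"
            self.mask = n_blocks - 1
            self.key = secrets.randbits(n_blocks.bit_length() - 1)

        def remap(self, logical_block):
            return (logical_block ^ self.key) & self.mask

        def refresh(self):
            # in hardware the data would be migrated incrementally between the
            # old and new mappings; here we only rotate the key
            self.key = secrets.randbits(self.mask.bit_length())

    region = SecurityRefreshSketch(1024)
    print([region.remap(b) for b in (0, 1, 2, 3)])
    region.refresh()
    print([region.remap(b) for b in (0, 1, 2, 3)])   # same blocks, new physical locations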
graphics applications often need to manipulate numerous graphical objects stored as polygonal models mesh simplification is an approach to vary the levels of visual details as appropriate thereby improving on the overall performance of the applications different mesh simplification algorithms may cater for different needs producing diversified types of simplified polygonal model as result testing mesh simplification implementations is essential to assure the quality of the graphics applications however it is very difficult to determine the oracles or expected outcomes of mesh simplification for the verification of test results reference model is an implementation closely related to the program under test is it possible to use such reference models as pseudo oracles for testing mesh simplification programs if so how effective are they this paper presents fault based pattern classification methodology called pat to address the questions in pat we train the classifier using black box features of samples from reference model and its fault based versions in order to test samples from the subject program we evaluate pat using four implementations of mesh simplification algorithms as reference models applied to open source three dimensional polygonal models empirical results reveal that the use of reference model as pseudo oracle is effective for testing the implementations of resembling mesh simplification algorithms however the results also show tradeoff when compared with simple reference model the use of resembling but sophisticated reference model is more effective and accurate but less robust
we solve the problem of integrating modulo scheduling with instruction selection including cluster assignment instruction scheduling and register allocation with optimal spill code generation and scheduling our method is based on integer linear programming we prove that our algorithm delivers optimal results in finite time for certain class of architectures we believe that these results are interesting both from theoretical point of view and as reference point when devising heuristic methods
temporal data dependencies are high level linguistic constructs that define relationships among values of data elements in temporal databases these constructs enable the support of schema versioning as well as the definition of consistency requirements for single time point and among values in different time points in this paper we present multiagent update process in database with temporal data dependencies and schema versioning the update process supports the evolution of dependencies over time and the use of temporal operators within temporal data dependencies the temporal dependency language is presented along with the temporal dependency graph which serves as the executable data structure thorough discussion of the feasibility performance and consistency of the presented model is provided
in this paper we present an approach to semantic based web service discovery and prototypical tool based on syntactic and structural schema matching it is based on matching an input ontology describing service request to web services descriptions at the syntactic level through web services description language wsdl or at the semantic level through service ontologies described with languages such as ontology web language for services owl web services modelling ontology wsmo semantic web services framework swsf and web services description language semantics wsdl the different input schemas wsdl descriptions ontology web language owl ontologies owl wsmo swsf and wsdl components are represented in uniform way by means of directed rooted graphs where nodes represent schema elements connected by directed links of different types eg for containment and referential relationships on this uniform internal representation number of matching algorithms operate including structural based algorithms children matcher leaves matcher graph and subgraph isomorphism and syntactical ones edit distance levenshtein distance or ld and synonym matcher through the wordnet synonyms thesaurus
in recent years time information is more and more important in collaborative filtering cf based recommender system because many systems have collected rating data for long time and time effects in user preference is stronger in this paper we focus on modeling time effects in cf and analyze how temporal features influence cf there are four main types of time effects in cf time bias the interest of whole society changes with time user bias shifting user may change his her rating habit over time item bias shifting the popularity of items changes with time user preference shifting user may change his her attitude to some types of items in this work these four time effects are used by factorized model which is called timesvd moreover many other time effects are used by simple methods our time dependent models are tested on netflix data from nov to dec experimental results show that prediction accuracy in cf can be improved significantly by using time information
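The four bias-type time effects described above can be illustrated with a small baseline model. The sketch below is a minimal, hypothetical example (not the paper's timeSVD model): it adds a per-time-bin item bias on top of static user and item biases and fits everything by stochastic gradient descent; all names, bin counts, and learning rates are assumptions.

```python
import numpy as np

# Minimal time-aware baseline predictor (illustrative, not timeSVD++):
# rating ~ mu + b_u + b_i + b_i,Bin(t), where the item bias drifts per time bin.

N_BINS = 30  # coarse item-bias time bins over the dataset's time span

def time_bin(t, t_min, t_max, n_bins=N_BINS):
    """Map a timestamp to a bin index for item-bias drift."""
    span = max(t_max - t_min, 1)
    return min(int((t - t_min) / span * n_bins), n_bins - 1)

class TimeAwareBaseline:
    def __init__(self, n_users, n_items, t_min, t_max):
        self.mu = 0.0
        self.bu = np.zeros(n_users)                 # static user bias
        self.bi = np.zeros(n_items)                 # static item bias
        self.bi_bin = np.zeros((n_items, N_BINS))   # item bias per time bin
        self.t_min, self.t_max = t_min, t_max

    def predict(self, u, i, t):
        b = time_bin(t, self.t_min, self.t_max)
        return self.mu + self.bu[u] + self.bi[i] + self.bi_bin[i, b]

    def fit(self, ratings, lr=0.005, reg=0.02, epochs=20):
        """ratings: list of (user, item, timestamp, rating) tuples."""
        self.mu = np.mean([r for _, _, _, r in ratings])
        for _ in range(epochs):
            for u, i, t, r in ratings:
                b = time_bin(t, self.t_min, self.t_max)
                err = r - self.predict(u, i, t)
                self.bu[u] += lr * (err - reg * self.bu[u])
                self.bi[i] += lr * (err - reg * self.bi[i])
                self.bi_bin[i, b] += lr * (err - reg * self.bi_bin[i, b])
```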
ontology languages such as owl are being widely used as the semantic web movement gains momentum with the proliferation of the semantic web more and more large scale ontologies are being developed in real world applications to represent and integrate knowledge and data there is an increasing need for measuring the complexity of these ontologies in order for people to better understand maintain reuse and integrate them in this paper inspired by the concept of software metrics we propose suite of ontology metrics at both the ontology level and class level to measure the design complexity of ontologies the proposed metrics are analytically evaluated against weyuker’s criteria we have also performed empirical analysis on public domain ontologies to show the characteristics and usefulness of the metrics we point out possible applications of the proposed metrics to ontology quality control we believe that the proposed metric suite is useful for managing ontology development projects
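As a rough illustration of what ontology-level and class-level complexity measurements can look like, the following sketch computes a few simple counts and depths over a toy subclass hierarchy. The specific metrics and the graph representation are assumptions for illustration, not the metric suite proposed in the paper.

```python
from collections import defaultdict

# Toy class hierarchy: child -> parent subclass edges plus per-class property counts.
subclass_of = {
    "Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Animal": "Thing",
}
properties = defaultdict(int, {"Animal": 2, "Mammal": 1, "Dog": 3, "Cat": 2})

def depth(cls):
    """Depth of a class: number of subclass edges up to the root."""
    d = 0
    while cls in subclass_of:
        cls = subclass_of[cls]
        d += 1
    return d

classes = set(subclass_of) | set(subclass_of.values())
ontology_metrics = {
    "num_classes": len(classes),
    "avg_depth": sum(depth(c) for c in classes) / len(classes),
    "avg_properties_per_class": sum(properties[c] for c in classes) / len(classes),
}
class_metrics = {c: {"depth": depth(c), "properties": properties[c]} for c in classes}
print(ontology_metrics)
```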
ownership types characterize the topology of objects in the heap through characterization of the context to which an object belongs they have been used to support reasoning memory management concurrency etc subtyping is traditionally invariant wrt contexts which has often proven inflexible in some situations recent work has introduced restricted forms of subtype variance and unknown context but in rather ad hoc and restricted way we develop jo calculus which supports parameterisation of types as well as contexts and allows variant subtyping of contexts based on existential quantification jo is more expressive general and uniform than previous works which add variance to ownership languages our explicit use of existential types makes the connection to type theoretic foundations from existential types more transparent we prove type soundness for jo and extend it to jo deep which enforces the owners as dominators property
cwi and university of twente used pf tijah flexible xml retrieval system to evaluate structured document retrieval multimedia retrieval and entity ranking tasks in the context of inex for the retrieval of textual and multimedia elements in the wikipedia data we investigated various length priors and found that biasing towards longer elements than the ones retrieved by our language modelling approach can be useful for retrieving images in isolation we found that their associated text is very good source of evidence in the wikipedia collection for the entity ranking task we used random walks to model multi step relevance propagation from the articles describing entities to all related entities and further and obtained promising results
among the many face matching techniques that have been developed are variants of facial curve matching which reduce the amount of face data to one or few curves the face’s central profile for instance proved to work well however the selection of the optimal set of curves and the best way to match them has not been researched systematically we propose face matching framework that allows profile and contour based face matching using this framework we evaluate profile and contour types including those described in the literature and select subsets of facial curves for effective and efficient face matching with set of eight geodesic contours we achieve mean average precision map of and recognition rate rr on the face retrieval track of the shape retrieval contest shrec and map of and rr on the university of notre dame und test set face matching with these curves is time efficient and performs better than other sets of facial curves and depth map comparison
the information resources on the web are vast but much of the web is based on browsing paradigm that requires someone to actively seek information instead one would like to have information agents that continuously attend to one’s personal information needs such agents need to be able to extract the relevant information from web sources integrate data across sites and execute efficiently in networked environment in this paper describe the technologies we have developed to rapidly construct and deploy information agents on the web this includes wrapper learning to convert online sources into agent friendly resources query planning and record linkage to integrate data across different sites and streaming dataflow execution to efficiently execute agent plans also describe how we applied this work within the electric elves project to deploy set of agents for continuous monitoring of travel itineraries
we present the first machine checked correctness proof for information flow control ifc based on program dependence graphs pdgs ifc based on slicing and pdgs is flow sensitive context sensitive and object sensitive thus offering more precision than traditional approaches while the method has been implemented and successfully applied to realistic java programs only manual proof of fundamental correctness property was available so far the new proof is based on new correctness proof for intraprocedural pdgs and program slices both proofs are formalized in isabelle hol they rely on abstract structures and properties instead of concrete syntax and definitions carrying the correctness proof over to any given language or dependence definition reduces to just showing that it fulfills the necessary preconditions thus eliminating the need to develop another full proof we instantiate the framework with both simple while language and java bytecode as well as with three different control dependence definitions thus we obtain ifc correctness proofs for the price of
branch mispredictions can have major performance impact on high performance processors multipath execution has recently been introduced to help limit the misprediction penalties incurred by branches that are difficult to predict this paper presents efficient instruction fetch architecture designs for these multipath processor execution cores we evaluate number of design trade offs for the first level instruction cache and the multipath pc fetch arbiter furthermore we evaluate the effect of additional bandwidth limitations imposed by the processor frontend pipeline our results show that instruction fetch support for efficient multipath execution can be achieved with realizable hardware implementations in addition we show that the best performing instruction fetch designs for multipath execution and multithreaded processors are likely to differ since both designs optimize the processor for different performance goals minimal execution time vs maximal throughput
the quality and impact of academic web sites is of interest to many audiences including the scholars who use them and web educators who need to identify best practice several large scale european union research projects have been funded to build new indicators for online scientific activity reflecting recognition of the importance of the web for scholarly communication in this paper we address the key question of whether higher rated scholars produce higher impact web sites using the united kingdom as case study and measuring scholars quality in terms of university wide average research ratings methodological issues concerning the measurement of the online impact are discussed leading to the adoption of counts of links to university’s constituent single domain web sites from an aggregated counting metric the findings suggest that universities with higher rated scholars produce significantly more web content but with similar average online impact higher rated scholars therefore attract more total links from their peers but only by being more prolific refuting earlier suggestions it can be surmised that general web publications are very different from scholarly journal articles and conference papers for which scholarly quality does associate with citation impact this has important implications for the construction of new web indicators for example that online impact should not be used to assess the quality of small groups of scholars even within single discipline
large cluster systems with thousands of nodes have become cost effective alternative to traditional supercomputers in these systems cluster nodes are interconnected using high degree switches regular direct network topologies including tori ary cubes and meshes are among adapted choices for interconnecting these high degree switches we propose general fault tolerant routing scheme applicable for regular direct interconnection networks satisfying some interconnection conditions the scheme is based on the availability of efficiently identifiable disjoint routes between network nodes the proposed scheme is first presented in general terms for any interconnection topology satisfying the presented connectivity conditions the scheme is then illustrated on two examples of interconnection topologies namely the binary hypercube and the ary cube
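The notion of efficiently identifiable disjoint routes can be illustrated on the binary hypercube, one of the two example topologies. The sketch below, under simplifying assumptions and with illustrative names, enumerates the shortest routes obtained by correcting the differing address bits in rotated orders (these routes are internally node disjoint) and picks one that avoids known faulty nodes; it is not the paper's general scheme.

```python
# Simplified illustration of disjoint-route selection in a binary n-cube.

def routes(src, dst, n):
    """Yield one candidate shortest route per rotation of the differing dimensions."""
    diff = [i for i in range(n) if (src >> i) & 1 != (dst >> i) & 1]
    for r in range(len(diff)):
        order = diff[r:] + diff[:r]
        node, path = src, [src]
        for dim in order:
            node ^= 1 << dim          # flip one differing bit per hop
            path.append(node)
        yield path

def fault_free_route(src, dst, n, faulty):
    for path in routes(src, dst, n):
        if not (set(path[1:-1]) & faulty):   # intermediate nodes must be healthy
            return path
    return None                               # no healthy shortest route found

print(fault_free_route(0b000, 0b110, 3, faulty={0b010}))   # -> [0, 4, 6]
```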
we present an image based technique to relight real objects illuminated by incident light field representing the illumination of an environment by exploiting the richness in angular and spatial variation of the light field objects can be relit with high degree of realism we record photographs of an object illuminated from various positions and directions using projector mounted on gantry as moving light source the resulting basis images are used to create subset of the full reflectance field of the object using this reflectance field we can create an image of the object relit with any incident light field and observed from fixed camera position to maintain acceptable recording times and reduce the amount of data we propose an efficient data acquisition method since the object can be relit with incident light field illumination effects encoded in the light field such as shafts of shadow or spot light effects can be realized
this paper presents novel shape analysis algorithm with local reasoning that is designed to analyze heap structures with structural invariants such as doubly linked lists the algorithm abstracts and analyzes one single heap cell at time in order to maintain the structural invariants the analysis uses local heap abstraction that models the sub heap consisting of one cell and its immediate neighbors the proposed algorithm can successfully analyze standard doubly linked list manipulations
we approach mosaicing as camera tracking problem within known parameterized surface from video of camera moving within surface we compute mosaic representing the texture of that surface flattened onto planar image our approach works by defining warp between images as function of surface geometry and camera pose globally optimizing this warp to maximize alignment across all frames determines the camera trajectory and the corresponding flattened mosaic image in contrast to previous mosaicing methods which assume planar or distant scenes or controlled camera motion our approach enables mosaicing in cases where the camera moves unpredictably through proximal surfaces such as in medical endoscopy applications
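One way to make the global-optimization idea concrete is to write the warp alignment objective explicitly; the formulation below is illustrative notation consistent with the description (a sum of pairwise photometric errors over overlapping frames, minimized over camera poses), not necessarily the paper's exact cost function.

```latex
% Illustrative global alignment objective. W(x; S, p_i) maps mosaic
% coordinates x on the known surface S into frame i under camera pose p_i;
% \Omega_{ij} is the overlap region of frames i and j.
\min_{p_1,\dots,p_N}\;
\sum_{i=1}^{N}\sum_{j \in \mathcal{N}(i)}\;
\sum_{x \in \Omega_{ij}}
\big\| I_i\!\big(W(x; S, p_i)\big) - I_j\!\big(W(x; S, p_j)\big) \big\|^2
```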
sram and dram cells have been the predominant technologies used to implement memory cells in computer systems each one having its advantages and shortcomings sram cells are faster and require no refresh since reads are not destructive in contrast dram cells provide higher density and minimal leakage energy since there are no paths within the cell from vdd to ground recently dram cells have been embedded in logic based technology thus overcoming the speed limit of typical dram cells in this paper we propose an bit macrocell that implements one static cell and dynamic cells this cell is aimed at being used in an way set associative first level data cache our study shows that in four way set associative cache with this macrocell compared to an sram based with the same capacity leakage is reduced by about and area more than half with minimal impact on performance architectural mechanisms have also been devised to avoid refresh logic experimental results show that no performance is lost when the retention time is larger than processor cycles in addition the proposed delayed writeback policy that avoids refreshing performs similar amount of writebacks than conventional cache with the same organization so no power wasting is incurred
we present an extension of view dependent texture mapping vdtm allowing rendering of complex geometric meshes at high frame rates without usual blurring or skinning artifacts we combine hybrid geometric and image based representation of given object to speed up rendering at the cost of little loss of visual accuracy during precomputation step we store an image based version of the original mesh by simply and quickly computing textures from viewpoints positioned around it by the user during the rendering step we use these textures in order to map on the fly colors and geometric details onto the surface of low polygon count version of the mesh real time rendering is achieved while combining up to three viewpoints at time using pixel shaders no parameterization of the mesh is needed and occlusion effects are taken into account while computing on the fly the best viewpoints for given pixel moreover the integration of this method in common real time rendering systems is straightforward and allows applying self shadowing as well as other buffer effects
the condition based approach for consensus solvability consists of identifying sets of input vectors called conditions for which there exists an asynchronous protocol solving consensus despite the occurrence of up to process crashes this paper investigates cf the largest set of conditions which allow us to solve the consensus problem in an asynchronous shared memory system the first part of the paper shows that cf is made up of hierarchy of classes of conditions cf where is parameter called degree of the condition starting with min and ending with where cf cf we prove that each one is strictly contained in the previous one cf cf various properties of the hierarchy are also derived it is shown that class can be characterized in two equivalent but complementary ways one is convenient for designing protocols while the other is for analyzing the class properties the paper also defines linear family of conditions that can be used to derive many specific conditions in particular for each two natural conditions are presented the second part of the paper is devoted to the design of efficient condition based protocols generic condition based protocol is presented this protocol can be instantiated with any condition cf and requires at most log shared memory read write operations per process in the synchronization part of the protocol thus the value represents the difficulty of the class cf an improvement of the protocol for the conditions in cf is also presented
we recently introduced an efficient multiresolution structure for distributing and rendering very large point sampled models on consumer graphics platforms the structure is based on hierarchy of precomputed object space point clouds that are combined coarse to fine at rendering time to locally adapt sample densities according to the projected size in the image the progressive block based refinement nature of the rendering traversal exploits on board caching and object based rendering apis hides out of core data access latency through speculative prefetching and lends itself well to incorporate backface view frustum and occlusion culling as well as compression and view dependent progressive transmission the resulting system allows rendering of complex out of core models at high frame rates over rendered points second supports network streaming and is fundamentally simple to implement we demonstrate the efficiency of the approach on number of very large models stored on local disks or accessed through consumer level broadband network including massive samples isosurface generated by compressible turbulence simulation and samples model of michelangelo’s st matthew many of the details of our framework were presented in previous study we here provide more thorough exposition but also significant new material including the presentation of higher quality bottom up construction method and additional qualitative and quantitative results
model checking is suitable formal technique to analyze parallel programs execution in an industrial context because automated tools can be designed and operated with very limited knowledge of the underlying techniques however the specification must be given using dedicated notations that are not always familiar to engineers so far model checking on uml raises complex problems that will not be solved immediately this paper proposes an approach to perform transformation of source code programs into petri nets suitable specification for model checking to overcome the complexity of the resulting specification we focus on specific aspects of the program so several transformations can be performed to verify some aspects of the processed programs parts of this approach could be reused by intrusion detection systems
one of the surprising developments in the area of program verification is how ideas introduced by logicians in the early part of the th century ended up yielding by the century industrial standard property specification languages this development was enabled by the equally unlikely transformation of the mathematical machinery of automata on infinite words introduced in the early for second order logic into effective algorithms for model checking tools this paper attempts to trace the tangled threads of this development
modern programs make extensive use of reusable software libraries for example study of number of large java applications shows that between percnt and percnt of the classes in those applications use container classes defined in the javautil package given this extensive code reuse in java programs it is important for the interfaces of reusable classes to be well documented an interface is well documented if it satisfies the following requirements the documentation completely describes how to use the interface the documentation is clear the documentation is unambiguous and any deviation between the documentation and the code is machine detectable unfortunately documentation in natural language which is the norm does not satisfy the above requirements formal specifications can satisfy them but they are difficult to develop requiring significant effort on the part of programmers to address the practical difficulties with formal specifications we describe and evaluate tool to help programmers write and debug algebraic specifications given an algebraic specification of class our interpreter generates prototype that can be used within an application like regular java class when running an application that uses the prototype the interpreter prints error messages that tell the developer in which way the specification is incomplete or inconsistent with hand coded implementation of the class we use case studies to demonstrate the usefulness of our system
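To make the idea of executing an algebraic specification as a test oracle concrete, the toy sketch below phrases a few stack axioms as executable checks and runs them against a hand-coded class on random inputs. The class, the axiom phrasing, and the checking loop are illustrative assumptions and do not reflect the tool's actual specification language or interpreter.

```python
import random

class IntStack:
    def __init__(self):
        self.items = []
    def push(self, x):
        self.items.append(x)
        return self
    def pop(self):
        self.items.pop()
        return self
    def top(self):
        return self.items[-1]
    def size(self):
        return len(self.items)

def axiom_top_of_push(s, x):
    return s.push(x).top() == x                # top(push(s, x)) = x

def axiom_size_of_push(s, x):
    n = s.size()
    return s.push(x).size() == n + 1           # size(push(s, x)) = size(s) + 1

def axiom_pop_of_push(s, x):
    n = s.size()
    return s.push(x).pop().size() == n         # pop(push(s, x)) behaves like s

def random_stack():
    s = IntStack()
    for v in random.sample(range(1000), random.randint(0, 5)):
        s.push(v)
    return s

for axiom in (axiom_top_of_push, axiom_size_of_push, axiom_pop_of_push):
    ok = all(axiom(random_stack(), random.randint(0, 1000)) for _ in range(100))
    print(axiom.__name__, "held" if ok else "VIOLATED")
```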
new approach to polygonal approximation is presented in this paper it starts from an initial set of dominant points break points where the integral square error from given shape is zero the proposed algorithm iteratively deletes most redundant dominant points till required approximation is achieved stabilization algorithm after elimination of each dominant point ensures high quality of approximation results of proposed algorithm are compared with classical algorithms the proposed algorithm has additional benefits like polygonal approximation with any number of dominant points and up to any error value and robustness of results
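The overall strategy can be sketched in a few lines: keep deleting the interior dominant point whose removal adds the least integral square error until the requested number of points remains. The code below is a minimal illustration with assumed names; it omits the paper's stabilization step performed after each deletion.

```python
import numpy as np

def seg_error(points, i, j):
    """Sum of squared distances of points[i..j] to the chord points[i]-points[j]."""
    p, q = points[i], points[j]
    dx, dy = q - p
    norm = np.hypot(dx, dy)
    seg = points[i:j + 1] - p
    if norm == 0:
        return float(np.sum(seg ** 2))
    dists = np.abs(dx * seg[:, 1] - dy * seg[:, 0]) / norm
    return float(np.sum(dists ** 2))

def approximate(points, target):
    points = np.asarray(points, dtype=float)
    keep = list(range(len(points)))            # indices of current dominant points
    while len(keep) > target:
        # cost of deleting each interior point = error of the merged segment
        costs = [seg_error(points, keep[k - 1], keep[k + 1])
                 for k in range(1, len(keep) - 1)]
        keep.pop(1 + int(np.argmin(costs)))    # drop the cheapest interior point
    return points[keep]

shape = [(0, 0), (1, 0.1), (2, 0), (3, 1), (3.1, 2), (3, 3), (2, 3), (0, 3)]
print(approximate(shape, target=5))
```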
fundamental feature of the software process consists in its own stochastic nature convenient approach for extracting the stochastic dynamics of process from log data is that of modelling the process as markov model in this way the discovery of the short medium range dynamics of the process is cast in terms of the learning of markov models of different orders ie in terms of learning the corresponding transition matrices in this paper we show that the use of full bayesian approach in the learning process helps providing robustness against statistical noise and over fitting as the size of transition matrix grows exponentially with the order of the model we give specific model model similarity definition and the corresponding calculation procedure to be used in model to sequence or sequence to sequence conformance assessment this similarity definition could also be applied to other inferential tasks such as unsupervised process learning
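The core estimation step can be illustrated for a first-order model: counting transitions in the event log and smoothing them with a symmetric Dirichlet prior gives a posterior-mean transition matrix that is robust to sparse counts. The sketch below uses assumed state names and a toy log; the paper's higher-order models and similarity measure are not reproduced.

```python
import numpy as np

def transition_matrix(sequences, states, alpha=1.0):
    """Posterior-mean transition matrix under a symmetric Dirichlet(alpha) prior."""
    idx = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[idx[a], idx[b]] += 1
    return (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)

log = [["design", "code", "test", "code", "test", "release"],
       ["design", "code", "test", "release"]]
states = ["design", "code", "test", "release"]
print(np.round(transition_matrix(log, states, alpha=0.5), 2))
```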
this paper characterizes the family of truthful double sided auctions despite the importance of double sided auctions to market design to date no characterization of truthful double sided auctions was made this paper characterizes truthful mechanisms for double sided auctions by generalizing roberts classic result to show that truthful double sided auctions must almost be affine maximizers our main result of characterizing double sided auctions required the creation of new set of tools reductions that preserve economic properties this paper utilizes two such reductions truth preserving reduction and non affine preserving reduction the truth preserving reduction is used to reduce the double sided auction to special case of combinatorial auction to make use of the impossibility result proved in intuitively our proof shows that truthful double sided auctions are as hard to design as truthful combinatorial auctions two important concepts are developed in addition to the main result first the form of reduction used in this paper is of independent interest as it provides means for comparing mechanism design problems by design difficulty second we define the notion of extension of payments which given set of payments for some players finds payments for the remaining players the extension payments maintain the truthful and affine maximization properties
we devised novel statistical technique for the identification of the translation equivalents of source words obtained by transformation rule based translation trt the effectiveness of the technique called frequency based identification of translation equivalents fite was tested using biological and medical cross lingual spelling variants and out of vocabulary oov words in spanish english and finnish english trt the results showed that depending on the source language and frequency corpus fite trt the identification of translation equivalents from trt’s translation set by means of the fite technique may achieve high translation recall in the case of the web as the frequency corpus translation recall was percnt percnt for spanish english fite trt for both language pairs fite trt achieved high translation precision percnt percnt the technique also reliably identified native source language words source words that cannot be correctly translated by trt dictionary based clir augmented with fite trt performed substantially better than basic dictionary based clir where oov keys were kept intact fite trt with web document frequencies was the best technique among several fuzzy translation matching approaches tested in cross language retrieval experiments we also discuss the application of fite trt in the automatic construction of multilingual dictionaries
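The identification step can be sketched as follows: the transformation rules generate a set of candidate target-language forms, and the candidate with the highest frequency in a target-language corpus (a stand-in here for web document frequencies) is kept, with a minimum-frequency threshold used to reject untranslatable source words. All rules, frequencies, and thresholds below are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

corpus_freq = Counter({"haemoglobin": 120, "hemoglobin": 950, "hemoglobine": 3})

RULES = [("ina", "in"), ("emo", "aemo"), ("ine", "in")]   # toy source->target rewrites

def generate_candidates(source_word):
    """Stand-in for transformation-rule-based translation (TRT)."""
    cands = {source_word}
    for r in range(1, len(RULES) + 1):
        for subset in combinations(RULES, r):
            w = source_word
            for src, dst in subset:
                w = w.replace(src, dst)
            cands.add(w)
    return cands

def fite(source_word, min_freq=5):
    best = max(generate_candidates(source_word), key=lambda c: corpus_freq.get(c, 0))
    return best if corpus_freq.get(best, 0) >= min_freq else None   # None: untranslatable

print(fite("hemoglobina"))   # Spanish source form -> "hemoglobin"
```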
interest point detection has wide range of applications such as image retrieval and object recognition given an image many previous interest point detectors first assign interest strength to each image point using certain filtering technique and then apply non maximum suppression scheme to select set of interest point candidates however we observe that non maximum suppression tends to over suppress good candidates for weakly textured image such as face image we propose new candidate selection scheme that chooses image points whose zero first order intensities can be clustered into two imbalanced classes in size as candidates our tests of repeatability across image rotations and lighting conditions show the advantage of imbalance oriented selection we further present new face recognition application facial identity representability evaluation to show the value of imbalance oriented selection
in this paper we propose an approach for recovering structural design patterns from object oriented source code the recovery process is organized in two phases in the first phase the design pattern instances are identified at coarse grained level by considering the design structure only and exploiting parsing technique used for visual language recognition then the identified candidate patterns are validated by fine grained source code analysis phase the recognition process is supported by tool namely design pattern recovery environment which allowed us to assess the retrieval effectiveness of the proposed approach on six public domain programs and libraries
several programming languages guarantee that array subscripts are checked to ensure they are within the bounds of the array while this guarantee improves the correctness and security of array based code it adds overhead to array references this has been an obstacle to using higher level languages such as java for high performance parallel computing where the language specification requires that all array accesses must be checked to ensure they are within bounds this is because in practice array bounds checking in scientific applications may increase execution time by more than factor of previous research has explored optimizations to statically eliminate bounds checks but the dynamic nature of many scientific codes makes this difficult or impossible our approach is instead to create compiler and operating system infrastructure that does not generate explicit bounds checks it instead places arrays inside of index confinement regions icrs which are large isolated mostly unmapped virtual memory regions any array reference outside of its bounds will cause protection violation this provides implicit bounds checking our results show that when applying this infrastructure to high performance computing programs written in java the overhead of bounds checking relative to program with no bounds checks is reduced from an average of percnt to an average of percnt
while automatic image annotation remains an actively pursued research topic enhancement of image search through its use has not been extensively explored we propose an annotation driven image retrieval approach and argue that under number of different scenarios this is very effective for semantically meaningful image search in particular our system is demonstrated to effectively handle cases of partially tagged and completely untagged image databases multiple keyword queries and example based queries with or without tags all in near realtime because our approach utilizes extra knowledge from training dataset it outperforms state of the art visual similarity based retrieval techniques for this purpose novel structure composition model constructed from beta distributions is developed to capture the spatial relationship among segmented regions of images this model combined with the gaussian mixture model produces scalable categorization of generic images the categorization results are found to surpass previously reported results in speed and accuracy our novel annotation framework utilizes the categorization results to select tags based on term frequency term saliency and wordnet based measure of congruity to boost salient tags while penalizing potentially unrelated ones bag of words distance measure based on wordnet is used to compute semantic similarity the effectiveness of our approach is shown through extensive experiments
disk subsystem is known to be major contributor to overall power consumption of high end parallel systems past research proposed several architectural level techniques to reduce disk power by taking advantage of idle periods experienced by disks while such techniques have been known to be effective in certain cases they share common drawback they operate in reactive manner ie they control disk power by observing past disk activity eg idle and active periods and estimating future ones consequently they can miss opportunities for saving power and incur significant performance penalties due to inaccuracies in predicting idle and active times motivated by this observation this paper proposes and evaluates compiler driven approach to reducing disk power consumption of array based scientific applications executing on parallel architectures the proposed approach exposes disk layout information to compiler allowing it to derive disk access pattern ie the order in which parallel disks are accessed this paper demonstrates two uses of this information first we can implement proactive disk power management ie we can select the most appropriate powersaving strategy and disk preactivation strategy based on the compiler predicted future idle and active periods of parallel disks second we can restructure the application code to increase length of idle disk periods which leads to better exploitation of available power saving capabilities we implemented both these approaches within an optimizing compiler and tested their effectiveness using set of benchmark codes from the spec suite and disk power simulator our results show that the compiler driven disk power management is very promising the experimental results also reveal that while proactive disk power management is very effective code restructuring for disk power achieves additional energy savings across all the benchmarks tested and these savings are very close to optimal savings that can be obtained through an ilp integer linear programming based scheme
application specific extensions to the computational capabilities of processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications hardware in the form of new function units or coprocessors and the corresponding instructions are added to baseline processor to meet the critical computational demands of target application in this paper the design of system to automate the instruction set customization process is presented dataflow graph design space exploration engine efficiently identifies computation subgraphs to create custom hardware and compiler subgraph matching framework seamlessly exploits this hardware we demonstrate the effectiveness of this system across range of application domains and study the applicability of the custom hardware across an entire application domain generalization techniques are presented which enable the application specific hardware to be more effectively used across domain
due to the popularity of knowledge discovery and data mining in practice as well as among academic and corporate professionals association rule mining is receiving increasing attention the authors present the recent progress achieved in mining quantitative association rules causal rules exceptional rules negative association rules association rules in multi databases and association rules in small databases this book is written for researchers professionals and students working in the fields of data mining data analysis machine learning and knowledge discovery in databases and for anyone who is interested in association rule mining
in wireless data networks such as the wap system the cached data may be time sensitive and thus strong consistency must be maintained ie the data presented to the user at the wap handset must be the same as that in the origin server in this paper strongly consistent cached data access algorithm probability based callback pcb in short is proposed for such networks in the pcb upon an update arrival the action taken by the server is not deterministic the server can either invalidate the cached data entry in the client or send the updated data entry to the client the pcb scheme can make good tradeoff between communication cost and access delay which is extremely difficult for most of the existing cache access schemes besides the pcb scheme possesses excellent universal adaptability and thus can adapt to the inherent heterogeneity of wireless networks and applications we analytically model the pcb scheme and derive closed form analytical formulae for the mean communication cost per data entry access and the mean access delay under general assumption on distributions of the inter update and inter access times it is demonstrated that the existing push and callback schemes are special cases of the pcb scheme
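The probabilistic server-side decision can be sketched directly: upon an update, the server pushes the new value to a caching client with probability p and invalidates otherwise, so tuning p trades communication cost against access delay. The classes and parameter below are illustrative assumptions, not the paper's protocol machinery or analysis.

```python
import random

class PCBServer:
    def __init__(self, p_push):
        self.p_push = p_push
        self.store = {}
        self.clients = []          # clients caching entries, kept simple here

    def update(self, key, value):
        self.store[key] = value
        for client in self.clients:
            if random.random() < self.p_push:
                client.cache[key] = value          # callback with fresh data
            else:
                client.cache.pop(key, None)        # invalidate; client refetches later

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, key):
        if key not in self.cache:                  # miss -> strongly consistent fetch
            self.cache[key] = self.server.store[key]
        return self.cache[key]

server = PCBServer(p_push=0.3)
client = Client(server)
server.clients.append(client)
server.update("quote", 101.5)
print(client.read("quote"))
```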
method for generating polynomial invariants of imperative programs is presented using the abstract interpretation framework it is shown that for programs with polynomial assignments an invariant consisting of conjunction of polynomial equalities can be automatically generated for each program point the proposed approach takes into account tests in conditional statements as well as in loops insofar as they can be abstracted into polynomial equalities and disequalities the semantics of each program statement is given as transformation on polynomial ideals merging of execution paths is defined as the intersection of the polynomial ideals associated with each path for loop junctions family of widening operators based on selecting polynomials up to certain degree is proposed the presented method has been implemented and successfully tried on many programs heuristics employed in the implementation to improve its efficiency are discussed and tables providing details about its performance are included
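A small worked example shows the kind of fact such an analysis establishes: for the loop body x := x + 1; y := y + x started from x = y = 0, the polynomial equality 2y - x^2 - x = 0 is an inductive invariant. The check below verifies inductiveness by substitution with sympy; it is a hand-checked instance, not the ideal-based algorithm itself.

```python
import sympy as sp

x, y = sp.symbols("x y")
inv = 2*y - x**2 - x            # candidate polynomial invariant

# Effect of one loop iteration as a simultaneous substitution on polynomials.
x1 = x + 1
y1 = y + x1
inv_after = inv.subs({x: x1, y: y1}, simultaneous=True)

print(sp.simplify(inv_after - inv))   # 0 -> the loop body preserves the invariant
print(inv.subs({x: 0, y: 0}))         # 0 -> the invariant holds initially
```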
the concept of serializability has been the traditionally accepted correctness criterion in database systems however in multidatabase systems mdbss ensuring global serializability is difficult task the difficulty arises due to the heterogeneity of the concurrency control protocols used by the participating local database management systems dbmss and the desire to preserve the autonomy of the local dbmss in general solutions to the global serializability problem result in executions with low degree of concurrency the alternative relaxed serializability may result in data inconsistency in this article we introduce systematic approach to relaxing the serializability requirement in mdbs environments our approach exploits the structure of the integrity constraints and the nature of transaction programs to ensure consistency without requiring executions to be serializable we develop simple yet powerful classification of mdbss based on the nature of integrity constraints and transaction programs for each of the identified models we show how consistency can be preserved by ensuring that executions are two level serializable lsr lsr is correctness criterion for mdbs environments weaker than serializability what makes our approach interesting is that unlike global serializability ensuring lsr in mdbs environments is relatively simple and protocols to ensure lsr permit high degree of concurrency furthermore we believe the range of models we consider cover many practical mdbs environments to which the results of this article can be applied to preserve database consistency
researchers have had great success using motion capture tools for controlling avatars in virtual worlds another current of virtual reality research has focused on building collaborative environments connected by networks the present paper combines these tendencies to describe an open source software system that uses motion capture tools as input devices for realtime collaborative virtual environments important applications of our system lie in the realm of simulating interactive multiparticipant physical activities like sport and dance several challenges and their respective solutions are outlined first we describe the infrastructure necessary to handle full body articulated avatars as driven by motion capture equipment including calibration and avatar creation next we outline the pc cluster solution chosen to render our worlds exploring methods of data sharing and synchronization both within the pc cluster nodes and between different sites in the distributed system finally virtual sports require physics and we describe the simulation algorithms used
customizing architectures for particular applications is promising approach to yield highly energy efficient designs for embedded systems this work explores the benefits of architectural customization for class of embedded architectures typically used in energy and area constrained application domains such as sensor nodes and multimedia processing we implement process flow that performs an automatic synthesis and evaluation of the different architectures based on runtime profiles of applications and determines an efficient architecture with consideration for both energy and area constraints an expressive architectural model used by our engine is introduced that takes advantage of efficient opcode allocation several memory addressing modes and operand types by profiling embedded benchmarks from variety of sensor and multimedia applications we show that the energy savings resulting from various architectural optimizations relative to the base architectures eg mips and msp are significant and can reach percnt depending on the application we then identify the set of architectures that achieves near optimal savings for group of applications finally we propose the use of heterogeneous isa processors implementing those architectures as solution to capitalize on energy savings provided by application customization while executing range of applications efficiently
linear typing schemes can be used to guarantee non interference and so the soundness of in place update with respect to functional semantics but linear schemes are restrictive in practice and more restrictive than necessary to guarantee soundness of in place update this limitation has prompted research into static analysis and more sophisticated typing disciplines to determine when in place update may be safely used or to combine linear and non linear schemes here we contribute to this direction by defining new typing scheme that better approximates the semantic property of soundness of in place update for functional semantics we begin from the observation that some data are used only in read only context after which it may be safely re used before being destroyed formalising the in place update interpretation in machine model semantics allows us to refine this observation motivating three usage aspects apparent from the semantics that are used to annotate function argument types the aspects are used destructively used read only but shared with result and used read only and not shared with the result the main novelty is aspect which allows linear value to be safely read and even aliased with result of function without being consumed this novelty makes our type system more expressive than previous systems for functional languages in the literature the system remains simple and intuitive but it enjoys strong soundness property whose proof is non trivial moreover our analysis features principal types and feasible type reconstruction as shown in konečný in types workshop nijmegen proceedings springer verlag
design of stable software architectures has increasingly been deep challenge to software developers due to the high volatility of their concerns and respective design decisions architecture stability is the ability of the high level design units to sustain their modularity properties and not succumb to modifications architectural aspects are new modularity units aimed at improving design stability through the modularization of otherwise crosscutting concerns however there is no empirical knowledge about the positive and negative influences of aspectual decompositions on architecture stability this paper presents an exploratory analysis of the influence exerted by aspect oriented composition mechanisms in the stability of architectural modules addressing typical crosscutting concerns such as error handling and security our investigation encompassed comparative analysis of aspectual and non aspectual decompositions based on different architectural styles applied to an evolving multi agent software architecture in particular we assessed various facets of components and compositions stability through such alternative designs of the same multi agent system using conventional quantitative indicators we have also investigated the key characteristics of aspectual decompositions that led to in stabilities being observed in the target architectural options the evaluation focused upon number of architecturally relevant changes that are typically performed through real life maintenance tasks
skm snp snp markers detection program is proposed to identify set of relevant snps for the association between disease and multiple marker genotypes we employ subspace categorical clustering algorithm to compute weight for each snp in the group of patient samples and the group of normal samples and use the weights to identify the subsets of relevant snps that categorize these two groups the experiments on both schizophrenia and parkinson disease data sets containing genome wide snps are reported to demonstrate the program results indicate that our method can find some relevant snps that categorize the disease samples the online skm snp program is available at http wwwmathhkbueduhk mng skm snp skm snphtml
we study the problem of estimating selectivity of approximate substring queries its importance in databases is ever increasing as more and more data are input by users and are integrated with many typographical errors and different spelling conventions to begin with we consider edit distance for the similarity between pair of strings based on information stored in an extended gram table we propose two estimation algorithms mof and lbs for the task the latter extends the former with ideas from set hashing signatures the experimental results show that mof is light weight algorithm that gives fairly accurate estimations however if more space is available lbs can give better accuracy than mof and other baseline methods next we extend the proposed solution to other similarity predicates sql like operator and jaccard similarity
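As a baseline for intuition (not the paper's MOF or LBS algorithms), one can bound the selectivity of an edit-distance predicate from a gram table alone: any string within edit distance tau of the query must contain at least one of any tau + 1 non-overlapping query grams, so their total frequency is an upper-bound estimate. The sketch below uses an assumed gram length and toy data.

```python
from collections import defaultdict

Q = 3  # gram length (illustrative)

def grams(s, q=Q):
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def build_gram_table(strings):
    table = defaultdict(int)
    for s in strings:
        for g in set(grams(s)):
            table[g] += 1                      # number of strings containing each gram
    return table

def estimate(query, tau, table):
    g = grams(query)
    chosen, used = [], set()
    # pick tau + 1 non-overlapping grams greedily, preferring rare ones
    for i in sorted(range(len(g)), key=lambda i: table.get(g[i], 0)):
        if all(abs(i - j) >= Q for j in used):
            chosen.append(g[i]); used.add(i)
        if len(chosen) == tau + 1:
            break
    return sum(table.get(x, 0) for x in chosen)   # upper bound on matching strings

data = ["jonathan", "johnathan", "jonatan", "nathaniel", "mary"]
print(estimate("jonathan", tau=1, table=build_gram_table(data)))
```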
recently security issues in wireless sensor networks become more important many mechanisms have been proposed for solving varying types of malicious attacks however few of them discussed malicious packet modifying attacks mpa the mpa come from some malicious nodes that modify contents of data packets while relaying once mpa occur the sink node may make wrong decisions according to the incorrect packets in this paper an overhearing based detection mechanism obd is presented for detecting the occurrence of mpa both obd and traditional two path detection mechanism were successfully implemented for comparison using the ns the simulation measured the metrics in successful detection rate end to end delay power consumption and detection latency compared to the two path detection mechanism the overhearing based mechanism not only incurred less overhead but provided more accurate failure detection performance
multi agent systems are known to be an adequate design paradigm to build cooperative information systems the efficient and effective use of agent technology in organizations requires structuring the design of cooperative information systems upon protocols to properly capture and implement the concepts involved in the operation of an organization protocols have to meet three requirements being able to take into account and to integrate three interdependent and complementary concerns of organizations the informational organizational and behavioral dimensions dealing with the deontic aspects obligations permissions and prohibitions of interaction rules supporting concurrency openness and reliability petri net pn dialects are formalisms known to be well adapted to model protocols and to cope easily with the last requirement moreover they cover all the protocol engineering life cycle specification analysis and simulation including the implementation thanks to their operational semantics however existing pn dialects do not deal simultaneously with the first two requirements in this paper new petri net based formalism called organizational petri nets ogpn is proposed ogpn satisfies the three previous requirements in formal and coherent framework it also provides process to design and develop ogpn models the advantages of this formalism are an easy integration of the designed protocols in organizations the possibility to simulate these protocols before their deployment and the possibility to analyze their behavioral properties thus ogpn is serious candidate formalism to specify protocols in cooperative information systems and may be included in agent oriented methodologies like gaia or moise
like hardware embedded software faces stringent design constraints undergoes extremely aggressive optimization and therefore has similar need for verifying the functional equivalence of two versions of design eg before and after an optimization the concept of cutpoints was breakthrough in the formal equivalence verification of combinational circuits and is the key enabling technology behind its successful commercialization we introduce an analogous idea for formally verifying the equivalence of structurally similar combinational software ie software routines that compute result and return terminate rather than executing indefinitely we have implemented proof of concept cutpoint approach in our prototype verification tool for the ti cx family of vliw dsps and our experiments show large improvements in runtime and memory usage
we introduce new notion of bisimulation for showing contextual equivalence of expressions in an untyped lambda calculus with an explicit store and in which all expressed values including higher order values are storable our notion of bisimulation leads to smaller and more tractable relations than does the method of sumii and pierce in particular our method allows one to write down bisimulation relation directly in cases where requires an inductive specification and where the principle of local invariants is inapplicable our method can also express examples with higher order functions in contrast with the most widely known previous methods which are limited in their ability to deal with such examples the bisimulation conditions are derived by manually extracting proof obligations from hypothetical direct proof of contextual equivalence
semantic search has been one of the motivations of the semantic web since it was envisioned we propose model for the exploitation of ontology based knowledge bases to improve search over large document repositories in our view of information retrieval on the semantic web search engine returns documents rather than or in addition to exact values in response to user queries for this purpose our approach includes an ontology based scheme for the semiautomatic annotation of documents and retrieval system the retrieval model is based on an adaptation of the classic vector space model including an annotation weighting algorithm and ranking algorithm semantic search is combined with conventional keyword based retrieval to achieve tolerance to knowledge base incompleteness experiments are shown where our approach is tested on corpora of significant scale showing clear improvements with respect to keyword based search
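The combination of semantic and keyword evidence can be sketched as a weighted sum of two vector-space scores, which is one simple way to tolerate knowledge-base incompleteness: unannotated documents still receive a keyword score. The weighting, vectors, and data below are illustrative assumptions rather than the paper's annotation-weighting and ranking algorithms.

```python
import numpy as np

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def rank(docs, concept_query, keyword_query, alpha=0.6):
    """docs: {doc_id: (concept_vector, keyword_vector)}; alpha weighs the semantic score."""
    scored = []
    for doc_id, (cvec, kvec) in docs.items():
        score = alpha * cosine(concept_query, cvec) + (1 - alpha) * cosine(keyword_query, kvec)
        scored.append((score, doc_id))
    return sorted(scored, reverse=True)

docs = {
    "d1": (np.array([0.9, 0.1, 0.0]), np.array([0.2, 0.7])),
    "d2": (np.array([0.0, 0.0, 0.0]), np.array([0.9, 0.3])),  # unannotated document
}
print(rank(docs, concept_query=np.array([1.0, 0.0, 0.0]),
           keyword_query=np.array([1.0, 0.0])))
```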
in this paper we present new graph based frame work for collaborative place object and part recognition in indoor environments we consider scene to be an undirected graphical model composed of place node object nodes and part nodes with undirected links our key contribution is the introduction of collaborative place and object recognition we call it as the hierarchical context in this paper instead of object only or causal relation of place to objects we unify the hierarchical context and the well known spatial context into complete hierarchical graphical model hgm in the hgm object and part nodes contain labels and related pose information instead of only label for robust inference of objects the most difficult problems of the hgm are learning and inferring variable graph structures we learn the hgm in piecewise manner instead of by joint graph learning for tractability since the inference includes variable structure estimation with marginal distribution of each node we approximate the pseudo likelihood of marginal distribution using multimodal sequential monte carlo with weights updated by belief propagation data driven multimodal hypothesis and context based pruning provide the correct inference for successful recognition issues related to object recognition are also considered and several state of the art methods are incorporated the proposed system greatly reduces false alarms using the spatial and hierarchical contexts we demonstrate the feasibility of the hgm based collaborative place object and part recognition in actual large scale environments for guidance applications places objects
geographic routing is useful and scalable point to point communication primitive for wireless sensor networks however previous work on geographic routing makes the unrealistic assumption that all the nodes in the network are awake during routing this overlooks the common deployment scenario where sensor nodes are duty cycled to save energy in this paper we investigate several important aspects of geographic routing over duty cycled nodes first we extend existing geographic routing algorithms to handle the highly dynamic networks resulting from duty cycling second we provide the first formal analysis of the performance of geographic routing on duty cycled nodes third we use this analysis to develop an efficient decentralized sleep scheduling algorithm for reducing the number of awake nodes while maintaining both network coverage and tunable target routing latency finally we evaluate via simulation the performance of our approach versus running existing geographic routing algorithms on sensors duty cycled according to previous sleep scheduling algorithms our results show perhaps surprisingly that network of duty cycled nodes can have slightly better routing performance than static network that uses comparable energy our results further show that compared to previous algorithms our sleep scheduling algorithm significantly improves routing latency and network lifetime
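The basic adaptation of greedy geographic forwarding to duty-cycled nodes can be sketched in a few lines: each hop forwards to the awake neighbor that is closest to the destination and still makes geographic progress. Names and coordinates below are illustrative, and recovery from local minima (e.g., face routing) is omitted.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def next_hop(node, dest, neighbors, awake):
    """neighbors: {node_id: (x, y)}; awake: set of node ids currently on."""
    candidates = [(nid, pos) for nid, pos in neighbors.items() if nid in awake]
    if not candidates:
        return None                      # buffer the packet or wait for a wake-up
    best_id, best_pos = min(candidates, key=lambda item: dist(item[1], dest))
    # forward only if the chosen neighbor makes geographic progress
    return best_id if dist(best_pos, dest) < dist(node, dest) else None

here = (0.0, 0.0)
dest = (10.0, 0.0)
neighbors = {"a": (1.0, 1.0), "b": (2.0, -0.5), "c": (-1.0, 0.0)}
print(next_hop(here, dest, neighbors, awake={"a", "c"}))   # -> "a"
```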
there has recently been significant increase in the number of community based question and answer services on the web where people answer other people’s questions these services rapidly build up large archives of questions and answers and these archives are valuable linguistic resource one of the major tasks in question and answer service is to find questions in the archive that are semantically similar to user’s question this enables high quality answers from the archive to be retrieved and removes the time lag associated with community based system in this paper we discuss methods for question retrieval that are based on using the similarity between answers in the archive to estimate probabilities for translation based retrieval model we show that with this model it is possible to find semantically similar questions with relatively little word overlap
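A translation-based retrieval model of the kind discussed can be sketched as follows: the probability of a query given an archived question multiplies, over query words, a mixture of word-translation probabilities (which the paper estimates from answer-answer similarity; a toy table is used here) and a smoothed collection model. All probabilities and weights below are illustrative.

```python
from collections import Counter

P_TRANS = {            # P(query_word | archived_question_word), toy values
    ("fix", "repair"): 0.4, ("fix", "fix"): 0.6,
    ("laptop", "notebook"): 0.5, ("laptop", "laptop"): 0.5,
}
COLLECTION = Counter(["fix", "repair", "laptop", "notebook", "screen", "battery"])
LAMBDA = 0.8           # weight of the translation model vs. collection smoothing

def p_word_given_question(w, question_words):
    trans = sum(P_TRANS.get((w, t), 0.0) for t in question_words) / len(question_words)
    background = COLLECTION[w] / sum(COLLECTION.values())
    return LAMBDA * trans + (1 - LAMBDA) * background

def score(query_words, question_words):
    s = 1.0
    for w in query_words:
        s *= p_word_given_question(w, question_words)
    return s

archive = {"q1": ["repair", "notebook"], "q2": ["battery", "laptop"]}
query = ["fix", "laptop"]
print(sorted(archive, key=lambda qid: score(query, archive[qid]), reverse=True))
```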
although model checking has proven remarkably effective in detecting errors in hardware designs its success in the analysis of software specifications has been limited model checking algorithms for hardware verification commonly use binary decision diagrams bdds to represent predicates involving the many boolean variables commonly found in hardware descriptions unfortunately bdd representations may be less effective for analyzing software specifications which usually contain not only booleans but variables spanning wide range of data types further software specifications typically have huge sometimes infinite state spaces that cannot be model checked directly using conventional symbolic methods one promising but largely unexplored approach to model checking software specifications is to apply mathematically sound abstraction methods such methods extract reduced model from the specification thus making model checking feasible currently users of model checkers routinely analyze reduced models but often generate the models in ad hoc ways as result the reduced models may be incorrectthis paper an expanded version of bharadwaj and heitmeyer describes how one can model check complete requirements specification expressed in the scr software cost reduction tabular notation unlike previous approaches which applied model checking to mode transition tables with boolean variables we use model checking to analyze properties of complete scr specification with variables ranging over many data types the paper also describes two sound and under certain conditions complete methods for producing abstractions from requirements specifications these abstractions are derived from the specification and the property to be analyzed finally the paper describes how scr requirements specifications can be translated into the languages of spin an explicit state model checker and smv symbolic model checker and presents the results of model checking two sample scr specifications using our abstraction methods and the two model checkers
new approach to broadcast in wormhole routed two and three dimensional torus networks is proposed the underlying network is assumed to support only deterministic dimension ordered unicast routing the approach extends the graph theoretical concept of dominating nodes by accounting for the relative distance insensitivity of the wormhole routing switching strategy the proposed algorithm also takes advantage of an all port communication architecture which allows each node to simultaneously transmit messages on different outgoing channels the resulting broadcast operation is based on tree structure that uses multiple levels of extended dominating nodes edns performance results are presented that confirm the advantage of this method over other approaches
the current state of the art in visualization research places strong emphasis on different techniques to derive insight from disparate types of data however little work has investigated the visualization process itself the information content of the visualization process the results history and relationships between those results is addressed by this work characterization of the visualization process is discussed leading to general model of the visualization exploration process the model based upon new parameter derivation calculus can be used for automated reporting analysis or visualized directly an xml based language for expressing visualization sessions using the model is also described these sessions can then be shared and reused by collaborators the model along with the xml representation provides an effective means to utilize the information within the visualization process to further data exploration
the random like filling strategy pursuing high compression for today’s popular test compression schemes introduces large test power to achieve high compression in conjunction with reducing test power for multiple scan chain designs is even harder and very few works were dedicated to solve this problem this paper proposes and demonstrates multilayer data copy mdc scheme for test compression as well as test power reduction for multiple scan chain designs the scheme utilizes decoding buffer which supports fast loading using previous loaded data to achieve test data compression and test power reduction at the same time the scheme can be applied automatic test pattern generation atpg independently or to be incorporated in an atpg to generate highly compressible and power efficient test sets experiment results on benchmarks show that test sets generated by the scheme had large compression and power saving with only small area design overhead
aspect oriented software development aosd has emerged as new approach to develop software systems by improving their structure reuse maintenance and evolution properties it is being applied to all stages of the software life cycle in this paper we present the prisma approach which introduces aosd in software architectures prisma is characterized by integrating aspects as first order citizens of software architectures this paper shows how the prisma methodology is applied to develop case study of the tele operation system domain we illustrate how the prisma approach can improve the development and maintenance processes of these kinds of industrial systems
the ability to harness heterogeneous dynamically available grid resources is attractive to typically resource starved computational scientists and engineers as in principle it can increase by significant factors the number of cycles that can be delivered to applications however new adaptive application structures and dynamic runtime system mechanisms are required if we are to operate effectively in grid environments to explore some of these issues in practical setting the authors are developing an experimental framework called cactus that incorporates both adaptive application structures for dealing with changing resource characteristics and adaptive resource selection mechanisms that allow applications to change their resource allocations eg via migration when performance falls outside specified limits the authors describe the adaptive resource selection mechanisms and describe how they are used to achieve automatic application migration to better resources following performance degradation the results provide insights into the architectural structures required to support adaptive resource selection in addition the authors suggest that the cactus worm affords many opportunities for grid computing
statistical database statdb retrieves only aggregate results as opposed to individual tuples this paper investigates the construction of privacy preserving statdb that can accurately answer an infinite number of counting queries and ii effectively protect privacy against an adversary that may have acquired all the previous query results the core of our solutions is novel technique called dynamic anonymization specifically given query we on the fly compute tailor made anonymized version of the microdata which maximizes the precision of the query result privacy preservation is achieved by ensuring that the combination of all the versions deployed to process the past queries does not allow accurate inference of sensitive information extensive experiments with real data confirm that our technique enables highly effective data analysis while offering strong privacy guarantees
it is now common for web sites to use active web content such as flash silverlight or java applets to support rich interactive applications for many mobile devices however supporting active content is problematic first the physical resource requirements of the browser plug ins that execute active content may exceed the capabilities of the device second plug ins are simply not available for many devices finally active code and the plug ins that execute it often contain security flaws potentially exposing user’s device or private data to harm this paper explores proxy based approach for transparently supporting active web content on mobile devices our approach uses proxy to splice active content out of web pages and replace it with an ajax based remote display component the spliced active content executes within remote sandbox on the proxy but it appears embedded in the web page on the mobile device’s browser to demonstrate the viability of this approach we have designed implemented and evaluated flashproxy by using flashproxy any mobile web browser that supports javascript transparently inherits the ability to access sites that contain flash programs the major challenge in flashproxy is in trapping and handling interactions between the flash program and its execution environment including browser interactions flashproxy uses binary rewriting of flash bytecode to interpose on such interactions redirecting them through javascript based rpc layer to the user’s browser our evaluation of flashproxy shows that it is transparent performant and compatible with nearly all flash programs that we examined
recently several approaches for nn nearest neighbour search in multimedia retrieval for example the va file and the lpc file have been proposed to resolve the problem called curse of dimensionality they are called filtering approaches because they first filter out the irrelevant objects by scanning all object approximations and compute the distances between the query and remaining objects to find out exact nn in this approach since all approximations are scanned in the filtering step the efficiency of computation in the filtering process must be seriously considered otherwise it would be very hard to be used in the real applications with lot of multimedia objects this paper proposes an efficient indexing mechanism for nn search to speed up this filtering process using novel indexing structure called hierarchical bitmap in which each object is represented as bitmap of size 2d where d is the dimension of the object’s feature vector that is each feature value of an object is approximated with two bits that represent whether it is relatively high low or neither compared to the corresponding feature value of other objects by performing xor operation between two bitmaps we can calculate the lower bound of lp distance between two feature vectors this mechanism can be hierarchically applied to generate multiple bitmaps of vector in order to raise the filtering rate
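A minimal sketch of the filtering idea, assuming a per-dimension high/low/neither code derived from per-dimension thresholds. It uses symbolic codes instead of the paper's literal two-bit bitmap and xor operation, but it shows why mismatching high/low codes yield a valid lower bound on the Lp distance; the threshold choice and helper names are assumptions.

```python
def encode(vec, low_thr, high_thr):
    """Two bits per dimension in spirit: 'H' if relatively high, 'L' if
    relatively low, 'N' otherwise (thresholds assumed to come from the data
    set, e.g. per-dimension quantiles)."""
    return [('H' if v >= high_thr[i] else 'L' if v <= low_thr[i] else 'N')
            for i, v in enumerate(vec)]

def lp_lower_bound(code_a, code_b, low_thr, high_thr, p=2):
    """A dimension where one object is coded H and the other L must differ by
    at least (high_thr - low_thr) there, so summing those gaps lower-bounds
    the true Lp distance (other dimensions contribute zero)."""
    total = 0.0
    for i, (a, b) in enumerate(zip(code_a, code_b)):
        if {a, b} == {'H', 'L'}:
            total += (high_thr[i] - low_thr[i]) ** p
    return total ** (1.0 / p)

# toy usage with two 2-dimensional vectors
low, high = [0.2, 0.2], [0.8, 0.8]
a = encode([0.9, 0.5], low, high)   # ['H', 'N']
b = encode([0.1, 0.5], low, high)   # ['L', 'N']
print(lp_lower_bound(a, b, low, high, p=2))   # the vectors are at least 0.6 apart
```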
viable ontology engineering methodology requires supporting domain experts in gradually building and managing increasingly complex versions of ontological elements and their converging and diverging interrelationships contexts are necessary to formalise and reason about such dynamic wealth of knowledge however context dependencies introduce many complexities in this article we introduce formal framework for supporting context dependency management processes based on the dogma framework and methodology for scalable ontology engineering key notions are set of context dependency operators which can be combined to manage complex context dependencies like articulation application specialisation and revision dependencies in turn these dependencies can be used in context driven ontology engineering processes tailored to the specific requirements of collaborative communities this is illustrated by real world case of interorganisational competency ontology engineering
we give an O(n log n) time method for finding best link piecewise linear function approximating an n-point planar data set using the well known uniform metric to measure the error ε ≥ 0 of the approximation our method is based upon new characterizations of such functions which we exploit to design an efficient algorithm using plane sweep in “ε space” followed by several applications of the parametric searching technique the previous best running time for this problem was
nowadays the need for incorporating preference querying in database technology is very important issue in variety of applications ranging from commerce to personalized search engines lot of recent research work has been dedicated to this topic in the artificial intelligence and database communities several formalisms allowing preference reasoning and specification have been proposed in the ai field on the other hand in the database field the interest has been focused mainly in extending standard sql with preference facilities in order to provide personalized query answering in this paper we propose to build bridge between these two approaches by using logic formalism originally designed to specify and reason with preference in order to extend sql with conditional preference constructors such constructors allow to express large class of preference statements with ceteris paribus semantics
in recent years immunization strategies have been developed for stopping epidemics in complex network like environments yet it still remains challenge for existing strategies to deal with dynamically evolving networks that contain community structures though they are ubiquitous in the real world in this paper we examine the performances of an autonomy oriented distributed search strategy for tackling such networks the strategy is based on the ideas of self organization and positive feedback from autonomy oriented computing aoc our experimental results have shown that autonomous entities in this strategy can collectively find and immunize most highly connected nodes in dynamic community based network within few steps
understanding of embodied interaction in the context of walk through displays and designing for it is very limited this study examined children’s intuitive embodied interaction with large semi visible projective walk through display and space around it using observation we identified several interaction patterns for passing staying and moving inside the screen using whole body and its parts for manipulating surface and content on the screen and ways of expanding the actual interaction environment outside of the projected screen we summarize the interaction patterns in the form of palette for rich embodied interaction with projected walk through displays
in this paper we describe research that has been on going within our group for the past four years on semantically smart disk systems semantically smart system goes beyond typical block based storage systems by extracting higher level information from the stream of traffic to disk doing so enables new and interesting pieces of functionality to be implemented within low level storage systems we first describe the development of our efforts over the past four years highlighting the key technologies needed to build semantically smart systems as well as the main weaknesses of our approach we then discuss future directions in the design and implementation of smarter storage systems
we study the problem of bulk inserting records into tables in system that horizontally range partitions data over large cluster of shared nothing machines each table partition contains contiguous portion of the table’s key range and must accept all records inserted into that range examples of such systems include bigtable at google and pnuts at yahoo during bulk inserts into an existing table if most of the inserted records end up going into small number of data partitions the obtained throughput may be very poor due to ineffective use of cluster parallelism we propose novel approach in which planning phase is invoked before the actual insertions by creating new partitions and intelligently distributing partitions across machines the planning phase ensures that the insertion load will be well balanced since there is tradeoff between the cost of moving partitions and the resulting throughput gain the planning phase must minimize the sum of partition movement time and insertion time we show that this problem is variation of np hard bin packing reduce it to problem of packing vectors and then give solution with provable approximation guarantees we evaluate our approach on prototype system deployed on cluster of machines and show that it yields significant improvements over more naïve techniques
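A toy sketch of the planning idea under stated assumptions: partitions with a projected insertion load are assigned greedily to the least-loaded machine, charging a fixed movement cost when a partition must leave its current machine. The real planner is a vector-packing formulation with approximation guarantees, which this greedy stand-in does not reproduce; all names and costs are illustrative.

```python
import heapq

def plan_partition_placement(partition_loads, machines, move_cost):
    """Greedy placement: heaviest partitions first, each to the machine with
    the least accumulated (insertion + movement) cost so far."""
    heap = [(0.0, m) for m in machines]          # (accumulated cost, machine)
    heapq.heapify(heap)
    placement = {}
    for part, load, current in sorted(partition_loads, key=lambda x: -x[1]):
        acc, m = heapq.heappop(heap)
        cost = load + (move_cost if m != current else 0.0)
        placement[part] = m
        heapq.heappush(heap, (acc + cost, m))
    return placement

# usage: (partition id, projected insert load, machine currently holding it)
parts = [("p1", 50, "m1"), ("p2", 30, "m1"), ("p3", 20, "m2")]
print(plan_partition_placement(parts, ["m1", "m2", "m3"], move_cost=5))
```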
contextual advertising also called content match refers to the placement of small textual ads within the content of generic web page it has become significant source of revenue for publishers ranging from individual bloggers to major newspapers at the same time it is an important way for advertisers to reach their intended audience this reach depends on the total number of exposures of the ad impressions and its click through rate ctr that can be viewed as the probability of an end user clicking on the ad when shown these two orthogonal critical factors are both difficult to estimate and even individually can still be very informative and useful in planning and budgeting advertising campaigns in this paper we address the problem of forecasting the number of impressions for new or changed ads in the system producing such forecasts even within large margins of error is quite challenging ad selection in contextual advertising is complicated process based on tens or even hundreds of page and ad features the publishers content and traffic vary over time and the scale of the problem is daunting over course of week it involves billions of impressions hundreds of millions of distinct pages hundreds of millions of ads and varying bids of other competing advertisers we tackle these complexities by simulating the presence of given ad with its associated bid over weeks of historical data we obtain an impression estimate by counting how many times the ad would have been displayed if it were in the system over that period of time we estimate this count by an efficient two level search algorithm over the distinct pages in the data set experimental results show that our approach can accurately forecast the expected number of impressions of contextual ads in real time we also show how this method can be used in tools for bid selection and ad evaluation
in this paper we further develop novel approach for modelling heterogeneous objects containing entities of various dimensions and representations within cellular functional framework based on the implicit complex notion we provide brief description for implicit complexes and describe their structure including both the geometry and topology of cells of different types then the paper focuses on the development of algorithms for set theoretic operations on heterogeneous objects represented by implicit complexes we also describe step by step procedure for the construction of hybrid model using these operations finally we present case study showing how to construct hybrid model integrating both boundary and function representations our examples also illustrate modelling with attributes and dynamic modelling
subsequence similarity matching in time series databases is an important research area for many applications this paper presents new approximate approach for automatic online subsequence similarity matching over massive data streams with simultaneous on line segmentation and pruning algorithm over the incoming stream the resulting piecewise linear representation of the data stream features high sensitivity and accuracy the similarity definition is based on permutation followed by metric distance function which provides the similarity search with flexibility sensitivity and scalability also the metric based indexing methods can be applied for speed up to reduce the system burden the event driven similarity search is performed only when there is potential event the query sequence is the most recent subsequence of piecewise data representation of the incoming stream which is automatically generated by the system the retrieved results can be analyzed in different ways according to the requirements of specific applications this paper discusses an application for future data movement prediction based on statistical information experiments on real stock data are performed the correctness of trend predictions is used to evaluate the performance of subsequence similarity matching
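The following is a generic sliding-window segmentation sketch, not the paper's own on-line segmentation and pruning algorithm: the current segment grows while a line through its end points stays within a given error of every buffered point. Timestamps are assumed strictly increasing, and the error rule is an assumption chosen for brevity.

```python
def sliding_window_segments(stream, max_error):
    """Emit piecewise linear segments as (start_point, end_point) pairs."""
    segments, buf = [], []
    for t, v in stream:
        buf.append((t, v))
        if len(buf) >= 3:
            (t0, v0), (t1, v1) = buf[0], buf[-1]
            slope = (v1 - v0) / (t1 - t0)
            # worst deviation of any buffered point from the end-point line
            err = max(abs(v0 + slope * (pt - t0) - pv) for pt, pv in buf)
            if err > max_error:
                segments.append((buf[0], buf[-2]))   # close segment before this point
                buf = [buf[-2], buf[-1]]             # start a new one
    if len(buf) > 1:
        segments.append((buf[0], buf[-1]))
    return segments

# toy usage on a synthetic sawtooth stream
data = [(i, float(i % 5)) for i in range(20)]
print(sliding_window_segments(data, max_error=1.0))
```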
cmps enable simultaneous execution of multiple applications on the same platforms that share cache resources diversity in the cache access patterns of these simultaneously executing applications can potentially trigger inter application interference leading to cache pollution whereas large cache can ameliorate this problem the issues of larger power consumption with increasing cache size amplified at sub nm technologies makes this solution prohibitive in this paper in order to address the issues relating to power aware performance of caches we propose caching structure that addresses the following definition of application specific cache partitions as an aggregation of caching units molecules the parameters of each molecule namely size associativity and line size are chosen so that the power consumed by it and access time are optimal for the given technology application specific resizing of cache partitions with variable and adaptive associativity per cache line way size and variable line size replacement policy that is transparent to the partition in terms of size heterogeneity in associativity and line size through simulation studies we establish the superiority of molecular cache caches built as aggregations of molecules that offers power advantage over that of an equivalently performing traditional cache
ensuring the consistency and the availability of replicated data in highly mobile ad hoc networks is challenging task because of the lack of backbone infrastructure previous work provides strong data guarantees by limiting the motion and the speed of the mobile nodes during the entire system lifetime and by relying on assumptions that are not realistic for most mobile applications we provide small set of mobility constraints that are sufficient to ensure strong data guarantees and that can be applied when nodes move along unknown paths and speed and are sparsely distributed in the second part of the paper we analyze the problem of conserving energy while ensuring strong data guarantees using quorum system techniques we devise condition necessary for quorum system to guarantee data consistency and data availability under our mobility model this condition shows the unsuitability of previous quorum systems and is the basis for novel class of quorum systems suitable for highly mobile networks called mobile dissemination quorum mdq systems we also show mdq system that is provably optimal in terms of communication cost by proposing an efficient implementation of read write atomic shared memory the suitability of our mobility model and mdq systems is validated through simulations using the random waypoint model and the restricted random waypoint on city section finally we apply our results to assist routing and coordinate the low duty cycle of mobile nodes while maintaining network connectivity
we present novel method to model and synthesize variation in motion data given few examples of particular type of motion as input we learn generative model that is able to synthesize family of spatial and temporal variants that are statistically similar to the input examples the new variants retain the features of the original examples but are not exact copies of them we learn dynamic bayesian network model from the input examples that enables us to capture properties of conditional independence in the data and model it using multivariate probability distribution we present results for variety of human motion and handwritten characters we perform user study to show that our new variants are less repetitive than typical game and crowd simulation approaches of re playing small number of existing motion clips our technique can synthesize new variants efficiently and has small memory requirement
although code transformations are routinely applied to improve the performance of programs for both scalar and parallel machines the properties of code improving transformations are not well understood in this article we present framework that enables the exploration both analytically and experimentally of properties of code improving transformations the major component of the framework is specification language gospel for expressing the conditions needed to safely apply transformation and the actions required to change the code to implement the transformation the framework includes technique that facilitates an analytical investigation of code improving transformations using the gospel specifications it also contains tool genesis that automatically produces transformer that implements the transformations specified in gospel we demonstrate the usefulness of the framework by exploring the enabling and disabling properties of transformations we first present analytical results on the enabling and disabling properties of set of code transformations including both traditional and parallelizing transformations and then describe experimental results showing the types of transformations and the enabling and disabling interactions actually found in set of programs
we consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked xml documents evaluating keyword search queries over hierarchical xml documents as opposed to conceptually flat html documents introduces many new challenges first xml keyword search queries do not always return entire documents but can return deeply nested xml elements that contain the desired keywords second the nested structure of xml implies that the notion of ranking is no longer at the granularity of document but at the granularity of an xml element finally the notion of keyword proximity is more complex in the hierarchical xml data model in this paper we present the xrank system that is designed to handle these novel features of xml keyword search our experimental results show that xrank offers both space and performance benefits when compared with existing approaches an interesting feature of xrank is that it naturally generalizes hyperlink based html search engine such as google xrank can thus be used to query mix of html and xml documents
building on the internet ip platforms available on mobile devices and personal computers pcs we implemented service called weconnect that facilitates broadcast of content via dedicated and personal media channels weconnect includes simple tools for integrating content into media mixes and provides content viewers for ubiquitous access to weconnect channels via mobile desktop and other ip enabled devices in this paper we present an exploratory user study based on deployment of the weconnect service among individuals in close relationships the study focuses on the user perception and experience with the always on channels delivering personalized content we started with images text and animations as familiar and easily accessible media we observed high level of reciprocity in creating and exchanging expressive content and need for persistence reuse and notification of content delivery the users voiced their enthusiasm for receiving personalized media across mobile and desktop devices the always on nature of the weconnect channels raised new requirements for the service design to assist content producers with creating streams of personalized media efficiently and to enable recipients to view the content flexibly
unified database framework that will enable better comprehension of ranked xml retrieval is still challenge in the xml database field we propose logical algebra named score region algebra that enables transparent specification of information retrieval ir models for xml databases the transparency is achieved by possibility to instantiate various retrieval models using abstract score functions within algebra operators while logical query plan and operator definitions remain unchanged our algebra operators model three important aspects of xml retrieval element relevance score computation element score propagation and element score combination to illustrate the usefulness of our algebra we instantiate four different well known ir scoring models and combine them with different score propagation and combination functions we implemented the algebra operators in prototype system on top of low level database kernel the evaluation of the system is performed on collection of ieee articles in xml format provided by inex we argue that state of the art xml ir models can be transparently implemented using our score region algebra framework on top of any low level physical database engine or existing rdbms allowing more systematic investigation of retrieval model behavior
depth of field dof the range of scene depths that appear sharp in photograph poses fundamental tradeoff in photography wide apertures are important to reduce imaging noise but they also increase defocus blur recent advances in computational imaging modify the acquisition process to extend the dof through deconvolution because deconvolution quality is tight function of the frequency power spectrum of the defocus kernel designs with high spectra are desirable in this paper we study how to design effective extended dof systems and show an upper bound on the maximal power spectrum that can be achieved we analyze defocus kernels in the light field space and show that in the frequency domain only low dimensional manifold contributes to focus thus to maximize the defocus spectrum imaging systems should concentrate their limited energy on this manifold we review several computational imaging systems and show either that they spend energy outside the focal manifold or do not achieve high spectrum over the dof guided by this analysis we introduce the lattice focal lens which concentrates energy at the low dimensional focal manifold and achieves higher power spectrum than previous designs we have built prototype lattice focal lens and present extended depth of field results
this paper introduces programming language that makes it convenient to compose large software systems combining their features in modular way supports nested intersection building on earlier work on nested inheritance in the language jx nested inheritance permits modular type safe extension of package including nested packages and classes while preserving existing type relationships nested intersection enables composition and extension of two or more packages combining their types and behavior while resolving conflicts with relatively small amount of code the utility of is demonstrated by using it to construct two composable extensible frameworks compiler framework for java and peer to peer networking system both frameworks support composition of extensions for example two compilers adding different domain specific features to java can be composed to obtain compiler for language that supports both sets of features
sbse techniques have been widely applied to requirements selection and prioritization problems in order to ascertain suitable set of requirements for the next release of system unfortunately it has been widely observed that requirements tend to be changed as the development process proceeds and what is suitable for today may not serve well into the future though sbse has been widely applied to requirements analysis there has been no previous work that seeks to balance the requirements needs of today with those of the future this paper addresses this problem it introduces multi objective formulation of the problem which is implemented using multi objective pareto optimal evolutionary algorithms the paper presents the results of experiments on both synthetic and real world data
an aspect oriented programming aop based approach is proposed to perform context aware service composition on the fly it realises context aware composition by semantically weaving context into static web service composition context weaver is implemented based on the proposed approach the proposed semantic weaving allows services to be composed in systematic way with changing context
in multilabel learning each instance in the training set is associated with set of labels and the task is to output label set whose size is unknown a priori for each unseen instance in this paper this problem is addressed in the way that neural network algorithm named bp mll ie backpropagation for multilabel learning is proposed it is derived from the popular backpropagation algorithm through employing novel error function capturing the characteristics of multilabel learning ie the labels belonging to an instance should be ranked higher than those not belonging to that instance applications to two real world multilabel learning problems ie functional genomics and text categorization show that the performance of bp mll is superior to that of some well established multilabel learning algorithms
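A small sketch of a pairwise ranking error of the kind described: every (relevant, irrelevant) label pair whose network outputs are ordered badly is penalised exponentially, so relevant labels are pushed to rank higher. Treat the exact normalisation and functional form as assumptions rather than the paper's precise definition.

```python
import math

def bpmll_style_error(outputs, relevant, irrelevant):
    """Average exp(-(c_k - c_l)) over all pairs k in relevant, l in irrelevant,
    where outputs[j] is the network output for label j."""
    pairs = [(k, l) for k in relevant for l in irrelevant]
    if not pairs:
        return 0.0
    return sum(math.exp(-(outputs[k] - outputs[l])) for k, l in pairs) / len(pairs)

# toy usage: four labels, labels 0 and 2 are relevant for this instance
outputs = [0.9, -0.2, 0.4, -0.7]
print(bpmll_style_error(outputs, relevant=[0, 2], irrelevant=[1, 3]))
```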
the modelcamera is system for inside looking out modeling of static room size indoor scenes the modelcamera implements an interactive modeling pipeline based on real time dense color and sparse depth the operator scans the scene with device that acquires video stream augmented with depth samples per frame the system registers the frames using the depth and color data and integrates them into an evolving model that is displayed continually the operator selects views by checking the display for missing or undersampled surfaces and aiming the camera at them model is built from thousands of frames
time series data is common in many settings including scientific and financial applications in these applications the amount of data is often very large we seek to support prediction queries over time series data prediction relies on model building which can be too expensive to be practical if it is based on large number of data points we propose to use statistical tests of hypotheses to choose proper subset of data points to use for given prediction query interval this involves two steps choosing proper history length and choosing the number of data points to use within this history further we use an conscious skip list data structure to provide samples of the original data set based on the statistics collected for query workload which we model as probability mass function pmf over query intervals we devise randomized algorithm that selects set of pre built models pm’s to construct subject to some maintenance cost constraint when there are updates given this set of pm’s we discuss interesting query processing strategies for not only point queries but also range aggregation and join queries we conduct comprehensive empirical study on real world datasets to verify the effectiveness of our approaches and algorithms
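As a rough illustration of using a statistical criterion to pick the history length, the sketch below scores candidate history lengths on a held-out suffix and keeps the shortest one whose error is within one standard error of the best. The trivial mean predictor and the one-standard-error rule are stand-ins for the paper's models and hypothesis tests, and every name here is an assumption.

```python
import statistics

def choose_history_length(series, candidates, holdout=20):
    """Return the shortest candidate history length that is not clearly worse
    than the best-performing one on a held-out suffix."""
    train, test = series[:-holdout], series[-holdout:]
    results = {}
    for h in candidates:
        pred = statistics.fmean(train[-h:])                 # trivial model on last h points
        errs = [abs(x - pred) for x in test]
        results[h] = (statistics.fmean(errs),
                      statistics.stdev(errs) / len(errs) ** 0.5)
    best_err = min(err for err, _ in results.values())
    for h in sorted(candidates):
        err, se = results[h]
        if err <= best_err + se:                            # crude stand-in for a test
            return h
    return max(candidates)

# toy usage on a synthetic periodic series
series = [float(x % 7) for x in range(300)]
print(choose_history_length(series, candidates=[10, 50, 200]))
```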
we present simple and practical approach for segmenting unoccluded items in scene by actively casting shadows by items we refer to objects or part of objects enclosed by depth edges our approach utilizes the fact that under varying illumination un occluded items will cast shadows on occluded items or background but will not be shadowed themselves we employ an active illumination approach by taking multiple images under different illumination directions with illumination source close to the camera our approach ignores the texture edges in the scene and uses only the shadow and silhouette information to determine the occlusions we show that such segmentation does not require the estimation of depth map or information which can be cumbersome expensive and often fails due to the lack of texture and presence of specular objects in the scene our approach can handle complex scenes with self shadows and specularities results on several real scenes along with the analysis of failure cases are presented
code generators for realistic application domains are not directly verifiable in practice in the certifiable code generation approach the generator is extended to generate logical annotations ie pre and postconditions and loop invariants along with the programs allowing fully automated program proofs of different safety properties however this requires access to the generator sources and remains difficult to implement and maintain because the annotations are cross cutting concerns both on the object level ie in the generated code and on the meta level ie in the generator here we describe new generic post generation annotation inference algorithm that circumvents these problems we exploit the fact that the output of code generator is highly idiomatic so that patterns can be used to describe all code constructs that require annotations the patterns are specific to the idioms of the targeted code generator and to the safety property to be shown but the algorithm itself remains generic it is based on pattern matcher used to identify instances of the idioms and build property specific abstracted control flow graph and graph traversal that follows the paths from the use nodes backwards to all corresponding definitions annotating the statements along these paths this core is instantiated for two generators and successfully applied to automatically certify initialization safety for range of generated programs
modularity of ontologies is currently an active research field and many different notions of module have been proposed in this paper we review the fundamental principles of modularity and identify formal properties that robust notion of modularity should satisfy we explore these properties in detail in the contexts of description logic and classical predicate logic and put them into the perspective of well known concepts from logic and modular software specification such as interpolation forgetting and uniform interpolation we also discuss reasoning problems related to modularity
we propose new architecture for on demand media streaming centered around the peer to peer pp paradigm the key idea of the architecture is that peers share some of their resources with the system as peers contribute resources to the system the overall system capacity increases and more clients can be served the proposed architecture employs several novel techniques to use the often underutilized peers resources which makes the proposed architecture both deployable and cost effective aggregate contributions from multiple peers to serve requesting peer so that supplying peers are not overloaded make good use of peer heterogeneity by assigning relatively more work to the powerful peers and organize peers in network aware fashion such that nearby peers are grouped into logical entity called cluster the network aware peer organization is validated by statistics collected and analyzed from real internet data the main benefit of the network aware peer organization is that it allows us to develop efficient searching to locate nearby suppliers and dispersion to disseminate new files into the system algorithms we present network aware searching and dispersion algorithms that result in fast dissemination of new media files ii reduction of the load on the underlying network and iii better streaming service; we demonstrate the potential of the proposed architecture for large scale on demand media streaming service through an extensive simulation study on large internet like topologies starting with limited streaming capacity hence low cost the simulation shows that the capacity rapidly increases and many clients can be served this occurs for all studied arrival patterns including constant rate arrivals flash crowd arrivals and poisson arrivals furthermore the simulation shows that reasonable client side initial buffering of is sufficient to ensure full quality playback even in the presence of peer failures
we present generic framework to evaluate patterns obtained from transactional web data streams whose underlying distribution changes with time the evolving nature of the data makes it very difficult to determine whether there is structure in the data stream and whether this structure is being learned this challenge arises in applications such as mining online store transactions summarizing dynamic document collections and profiling web traffic we propose to evaluate this hard instance of unsupervised learning using continuous assessment of the predictive power of the learned patterns with specific examples that borrow concepts from supervised learning we present results from experiments with synthetic data the newsgroups dataset web clickstream data and custom collection of rss news feeds
we investigate the role of cycle structures ie subsets of clauses of the form ¬l1 ∨ l2, ¬l1 ∨ l3, ¬l2 ∨ ¬l3 in the quality of the lower bound lb of modern maxsat solvers given cycle structure we have two options i use the cycle structure just to detect inconsistent subformulas in the underestimation component and ii replace the cycle structure with ¬l1, l1 ∨ ¬l2 ∨ ¬l3, ¬l1 ∨ l2 ∨ l3 by applying maxsat resolution and at the same time change the behaviour of the underestimation component we first show that it is better to apply maxsat resolution to cycle structures occurring in inconsistent subformulas detected using unit propagation or failed literal detection we then propose heuristic that guides the application of maxsat resolution to cycle structures during failed literal detection and evaluate this heuristic by implementing it in maxsatz obtaining new solver called maxsatz our experiments on weighted maxsat and partial maxsat instances indicate that maxsatz substantially improves maxsatz on many hard random crafted and industrial instances
perceptually based computer graphics techniques attempt to take advantage of limitations in the human visual system to improve system performance this paper investigates the distortions caused by the implementation of technique known as region warping from the human visual perception perspective region warping was devised in conjunction with other techniques to facilitate priority rendering for virtual reality address recalculation pipeline arp system the arp is graphics display architecture designed to reduce user head rotational latency in immersive head mounted display hmd virtual reality priority rendering was developed for use with the arp system to reduce the overall rendering load large object segmentation region priority rendering and region warping are techniques that have been introduced to assist priority rendering and to further reduce the overall rendering load region warping however causes slight distortions to appear in the graphics while this technique might improve system performance the human experience and perception of the system cannot be neglected this paper presents results of two experiments that address issues raised by our previous studies in particular these experiments investigate whether anti aliasing and virtual environments with different scene complexities might affect user’s visual perception of region warping distortions
stereotyping is technique used in many information systems to represent user groups and or to generate initial individual user models however there has been lack of evidence on the accuracy of their use in representing users we propose formal evaluation method to test the accuracy or homogeneity of the stereotypes that are based on users explicit characteristics using the method the results of an empirical testing on common user stereotypes of information retrieval ir systems are reported the participants memberships in the stereotypes were predicted using discriminant analysis based on their ir knowledge the actual membership and the predicted membership of each stereotype were compared the data show that librarians ir professionals is an accurate stereotype in representing its members while some others such as undergraduate students and social sciences humanities users are not accurate stereotypes the data also demonstrate that based on the user’s ir knowledge stereotype can be made more accurate or homogeneous the results show the promise that our method can help better detect the differences among stereotype members and help with better stereotype design and user modeling we assume that accurate stereotypes have better performance in user modeling and thus the system performance; limitations and future directions of the study are discussed
the goal of graph clustering is to partition vertices in large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity graph clustering techniques are very useful for detecting densely connected groups in large graph many existing graph clustering methods mainly focus on the topological structure for clustering but largely ignore the vertex properties which are often heterogenous in this paper we propose novel graph clustering algorithm sa cluster based on both structural and attribute similarities through unified distance measure our method partitions large graph associated with attributes into clusters so that each cluster contains densely connected subgraph with homogeneous attribute values an effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity theoretical analysis is provided to show that sa cluster is converging extensive experimental results demonstrate the effectiveness of sa cluster through comparison with the state of the art graph clustering and summarization methods
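A toy sketch of a unified structural-plus-attribute distance. The Jaccard neighbourhood term, the attribute-disagreement term, and the fixed weights are illustrative assumptions; SA-Cluster itself works on an attribute-augmented graph with a random-walk distance and learns the weight contributions automatically.

```python
def unified_distance(u, v, adj, attrs, w_struct, w_attr):
    """Weighted combination of a structural distance (1 - Jaccard similarity
    of neighbourhoods) and an attribute distance (fraction of attributes on
    which the two vertices disagree)."""
    nu, nv = adj[u], adj[v]
    jaccard = len(nu & nv) / len(nu | nv) if (nu | nv) else 0.0
    disagree = sum(attrs[u][k] != attrs[v][k] for k in attrs[u]) / len(attrs[u])
    return w_struct * (1 - jaccard) + w_attr * disagree

# toy usage on a three-vertex graph with one attribute per vertex
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
attrs = {"a": {"topic": "db"}, "b": {"topic": "db"}, "c": {"topic": "ir"}}
print(unified_distance("a", "b", adj, attrs, w_struct=0.5, w_attr=0.5))
```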
we present hml type inference system that supports full first class polymorphism where few annotations are needed only function parameters with polymorphic type need to be annotated hml is simplification of mlf where only flexibly quantified types are used this makes the types easier to work with from programmers perspective and simplifies the implementation of the type inference algorithm still hml retains much of the expressiveness of mlf it is robust with respect to small program transformations and has simple specification of the type rules with an effective type inference algorithm that infers principal types small reference implementation with many examples is available at http://research.microsoft.com/users/daan/pubs.html
component based architectures are the traditional approach to reconcile application specific optimization with reusable abstractions in sensor networks however they frequently overwhelm the application designer with the range of choices in component selection and composition we introduce component framework that reduces this complexity it provides well defined content based publish subscribe service but allows the application designer to adapt the service by making orthogonal choices about the communication protocol components for subscription and notification delivery the supported data attributes and set of service extension components we present tinycops our implementation of the framework in tinyos and demonstrate its advantages by showing experimental results for different application configurations on two sensor node platforms in large scale indoor testbed
state space lumping is one of the classical means to fight the state space explosion problem in state based performance evaluation and verification particularly when numerical algorithms are applied to analyze markov model one often observes that those algorithms do not scale beyond systems of moderate size to alleviate this problem symbolic lumping algorithms have been devised to effectively reduce very large but symbolically represented markov models to moderate size explicit representations this lumping step partitions the markov model in such way that any numerical analysis carried out on the lumped model is guaranteed to produce exact results for the original system but even this lumping preprocessing may fail due to time or memory limitations this paper discusses the two main approaches to symbolic lumping and combines them to improve on their respective limitations the algorithm automatically converts between known symbolic partition representations in order to provide trade off between memory consumption and runtime we show how to apply this algorithm for the lumping of markov chains but the same techniques can be adapted in straightforward way to other models like markov reward models labeled transition systems or interactive markov chains
the initiation of interaction in face to face environments is gradual process and takes place in rich information landscape of awareness attention and social signals one of the main benefits of this process is that people can be more sensitive to issues of privacy and interruption while they are moving towards interaction however on line communication tools do not provide this subtlety and often lead to unwanted interruptions we have developed prototype message system called openmessenger om that adds the idea of gradual initiation of interaction to on line communication openmessenger provides multiple levels of awareness about people and provides notification to those about whom information is being gathered openmessenger allows people to negotiate interaction in richer fashion than is possible with any other current messaging system preliminary evaluation data suggest the utility of the approach but also shows that there are number of issues yet to be resolved in this area
large scale applications require the efficient exchange of data across their distributed components including data from heterogeneous sources and to widely varying clients inherent to such data exchanges are discrepancies among the data representations used by sources clients or intermediate application components eg due to natural mismatches or due to dynamic component evolution and requirements to route combine or otherwise manipulate data as it is being transferred as result there is an ever growing need for data conversion services handled by stubs in application servers by middleware or messaging services by the operating system or by the network this paper’s goal is to demonstrate and evaluate the ability of modern network processors to efficiently address data compatibility issues when data is in transit between application level services toward this end we present the design and implementation of network level execution environment that permits systems to dynamically deploy and configure application level data conversion services into the network infrastructure experimental results obtained with prototype implementation on intel’s ixp network processors include measurements of xml like data format conversions implemented with efficient binary data formats
applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing programmers have traditionally managed this concurrency using locks mutex based synchronization unfortunately lock based synchronization often leads to deadlocks makes fine grained synchronization difficult hinders composition of atomic primitives and provides no support for error recovery transactions avoid many of these problems and therefore promise to ease concurrent programming; we describe software transactional memory stm system that is part of mcrt an experimental multi core runtime the mcrt stm implementation uses number of novel algorithms and supports advanced features such as nested transactions with partial aborts conditional signaling within transaction and object based conflict detection for applications the mcrt stm exports interfaces that can be used from programs directly or as target for compilers translating higher level linguistic constructs; we present detailed performance analysis of various stm design tradeoffs such as pessimistic versus optimistic concurrency undo logging versus write buffering and cache line based versus object based conflict detection we also show mcas implementation that works on arbitrary values coexists with the stm and can be used as more efficient form of transactional memory to provide baseline we compare the performance of the stm with that of fine grained and coarse grained locking using number of concurrent data structures on processor smp system we also show our stm performance on non synthetic workload the linux sendmail application
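To make the write-buffering design point concrete, here is a toy optimistic transaction with a private write buffer and commit-time read-set validation. It is only a sketch and omits the atomic commit, nesting, conditional signaling, and conflict-detection machinery the system above provides; all names are assumptions.

```python
class WriteBufferTx:
    """Reads record the version they observed, writes go to a private buffer,
    and commit re-validates the read set before publishing the buffer."""
    def __init__(self, store):
        self.store = store            # shared map: key -> (value, version)
        self.read_set = {}            # key -> version observed
        self.write_buf = {}           # key -> buffered new value

    def read(self, key):
        if key in self.write_buf:     # read-your-own-writes
            return self.write_buf[key]
        value, version = self.store[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_buf[key] = value

    def commit(self):
        # NOTE: assumed to run atomically here; a real STM enforces this.
        for key, seen in self.read_set.items():
            if self.store[key][1] != seen:
                return False          # conflict: abort, caller may retry
        for key, value in self.write_buf.items():
            _, version = self.store.get(key, (None, 0))
            self.store[key] = (value, version + 1)
        return True

# toy usage: transfer between two "accounts"
store = {"a": (100, 0), "b": (0, 0)}
tx = WriteBufferTx(store)
tx.write("a", tx.read("a") - 10)
tx.write("b", tx.read("b") + 10)
print(tx.commit(), store)
```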
in this article we describe the sweep shake system novel low interaction cost approach to supporting the spontaneous discovery of geo located information by sweeping mobile device around their environment users browse for interesting information related to points of interest we built mobile haptic prototype which encourages the user to explore their surroundings to search for location information helping them discover this by providing directional vibrotactile feedback once potential targets are selected the interaction is extended to offer an hierarchy of information levels with simple method for filtering and selecting desired types of data for each geo tagged location we describe and motivate our approach and present short field trial to situate our design in real environment followed by more detailed user study that compares it against an equivalent visual based system
middleware components are becoming increasingly important as applications share computational resources in distributed environments such as high end clusters with ever larger number of processors computational grids and increasingly large server farms one of the main challenges in such environments is to achieve scalability of synchronization in general concurrency services arbitrate resource requests in distributed systems but concurrency protocols currently lack scalability adding such guarantees enables resource sharing and computing with distributed objects in systems with large number of nodes the objective of our work is to enhance middleware services to provide scalability of synchronization and to support state replication in distributed systems we have designed and implemented middleware protocol in support of these objectives its essence is peer to peer protocol for multi mode hierarchical locking which is applicable to transaction style processing and distributed agreement we demonstrate high scalability combined with low response times in high performance cluster environments our technical contribution is novel fully decentralized hierarchical locking protocol to enhance concurrency in distributed resource allocation following the specification of general concurrency services for large scale data and object repositories our experiments on an ibm sp show that the number of messages approaches an asymptote at node from which point on the message overhead is in the order of messages per request depending on system parameters at the same time response times increase linearly with proportional increase in requests and consequently higher concurrency levels specifically in the range of up to nodes response times under ms are observed for critical sections that are one th the size of noncritical code the high degree of scalability and responsiveness of our protocol is due in large to high level of concurrency upon resolving requests combined with dynamic path compression for request propagation paths our approach is not only applicable to corba its principles are shown to provide benefits to general distributed concurrency services and transaction models besides its technical strengths our approach is intriguing due to its simplicity and its wide applicability ranging from large scale clusters to server style computing
it is crucial to detect zero day polymorphic worms and to generate signatures at network gateways or honeynets so that we can prevent worms from propagating at their early phase however most existing network based signatures are specific to exploit and can be easily evaded in this paper we propose generating vulnerability driven signatures at network level without any host level analysis of worm execution or vulnerable programs as the first step we design network based length based signature generator lesg for the worms exploiting buffer overflow vulnerabilities the signatures generated are intrinsic to buffer overflows and are very difficult for attackers to evade we further prove the attack resilience bounds even under worst case attacks with deliberate noise injection moreover lesg is fast and noise tolerant and has efficient signature matching evaluation based on real world vulnerabilities of various protocols and real network traffic demonstrates that lesg is promising in achieving these goals
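A minimal sketch of the length-based idea, assuming messages can be parsed into named protocol fields: pick, per field, a length threshold that no benign message exceeds but every worm sample does, reflecting the buffer-overflow intuition that the exploit must ship an over-long field. The helper names and the separation rule are illustrative and ignore LESG's field scoring, noise bounds, and resilience analysis.

```python
def learn_length_signature(benign_msgs, worm_msgs, field_parser):
    """Return {field: threshold} where a message matches the signature if its
    field has length >= threshold."""
    signature = {}
    fields = field_parser(benign_msgs[0]).keys()
    for f in fields:
        benign_max = max(len(field_parser(m).get(f, "")) for m in benign_msgs)
        worm_hits = sum(len(field_parser(m).get(f, "")) > benign_max for m in worm_msgs)
        if worm_hits == len(worm_msgs):      # this field's length separates the pools
            signature[f] = benign_max + 1
    return signature

# toy usage with a made-up key=value protocol
parse = lambda m: dict(kv.split("=", 1) for kv in m.split("&"))
benign = ["user=al&cmd=get", "user=bob&cmd=put"]
worm = ["user=" + "A" * 200 + "&cmd=get"]
print(learn_length_signature(benign, worm, parse))   # flags the over-long user field
```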
variety of dynamic aspect oriented language constructs are proposed in recent literature with corresponding compelling use cases such constructs demonstrate the need to dynamically adapt the set of join points intercepted at fine grained level the notion of morphing aspects and continuous weaving is motivated by this need we propose an intermediate language model called nu that extends object oriented intermediate language models with two fine grained deployment primitives bind and remove these primitives offer higher level of abstraction as compilation target for dynamic aspect oriented language constructs thereby making it easier to support such constructs we present the design and implementation of the nu model in the sun hotspot vm an industrial strength virtual machine which serves to show the feasibility of the intermediate language design our implementation uses dedicated caching mechanisms to significantly reduce the amortized costs of join point dispatch our evaluation shows that the cost of supporting dynamic deployment model can be reduced to as little as we demonstrate the potential utility of the intermediate language design by expressing variety of aspect oriented source language constructs of dynamic flavor such as caesarj’s deploy history based pointcuts and control flow constructs in terms of the nu model
one difficulty in software maintenance is that the relationship between observed program behavior and source code is not always clear in this paper we are concerned specifically with the maintenance of graphical user interfaces guis user interface code can crosscut the decomposition of applications making guis hard to maintain popular approach to develop and maintain guis is to use what you see is what you get editors they allow developers to work directly with graphical design view instead of scattered source elements unfortunately gui editors are limited by their ability to statically reconstruct dynamic collaborations between objects in this paper we investigate the combination of hybrid dynamic and static approach to allow for view based maintenance of guis dynamic analysis reconstructs object relationships providing concrete context in which maintenance can be performed static checking restricts that only changes in the design view which can meaningfully be translated back to source are allowed we implemented prototype ide plug in and evaluate our approach by applying it to five open source projects
we study the interaction between the aimd additive increase multiplicative decrease multi socket congestion control and bottleneck router with drop tail buffer we consider the problem in the framework of deterministic hybrid models first we show that trajectories always converge to limiting cycles we characterize the cycles necessary and sufficient conditions for the absence of multiple jumps in the same cycle are obtained then we propose an analytical framework for the optimal choice of the router buffer size we formulate this problem as multi criteria optimization problem in which the lagrange function corresponds to linear combination of the average goodput and the average delay in the queue our analytical results are confirmed by simulations performed with matlab simulink
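A minimal Python sketch of the buffer-sizing trade-off is given below: a toy discrete-time fluid simulation of a few AIMD flows sharing a drop-tail buffer, scored by a linear combination of average goodput and average queueing delay. The dynamics, constants and the weight w are illustrative assumptions, not the hybrid model or the Lagrange formulation analyzed in the paper.

    # toy fluid-style simulation of AIMD flows sharing a drop-tail buffer;
    # all constants and the weighted objective are illustrative stand-ins
    def simulate(buffer_size, n_flows=3, capacity=100.0, steps=5000, dt=0.01):
        rates = [10.0 * (i + 1) for i in range(n_flows)]   # per-flow sending rates
        queue, served, delay_sum = 0.0, 0.0, 0.0
        for _ in range(steps):
            queue += sum(rates) * dt
            if queue > buffer_size:                # drop-tail: overflow is dropped and
                queue = buffer_size                # every flow backs off multiplicatively
                rates = [r / 2.0 for r in rates]
            out = min(queue, capacity * dt)        # link drains at most capacity*dt
            queue -= out
            served += out
            delay_sum += queue / capacity          # queueing delay of current backlog
            rates = [r + 1.0 * dt for r in rates]  # additive increase
        return served / (steps * dt), delay_sum / steps

    # buffer sizing as a weighted trade-off between goodput and delay
    w = 50.0
    for B in (5.0, 20.0, 80.0):
        g, d = simulate(B)
        print(f"B={B:5.1f}  goodput={g:6.1f}  delay={d:.4f}  objective={g - w * d:.1f}")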
model driven development mdd is widely used to develop modern business applications mdd involves creating models at different levels of abstractions starting with models of domain concepts these abstractions are successively refined using transforms to design level models and eventually code level artifacts although many tools exist that support transform creation and verification tools that help users in understanding and using transforms are rare in this paper we present an approach for assisting users in understanding model transformations and debugging their input models we use automated program analysis techniques to analyze the transform code and compute constraints under which transformation may fail or be incomplete these code level constraints are mapped to the input model elements to generate model level rules the rules can be used to validate whether an input model violates transform constraints and to support general user queries about transformation we have implemented the analysis in tool called xylem we present empirical results which indicate that our approach can be effective in inferring useful rules and the rules let users efficiently diagnose failing transformation without examining the transform source code
since most sensor networks expose funnelling effect caused by the congestion phenomenon where routing transit flows towards the base station through distinctive multihop many to one pattern we propose hybrid efficient routing protocol herp with power awareness algorithm for sink oriented mobile ad hoc sensor networks to boost the network fidelity and prolong the network lifetime of mobile communications the results show that herp can boost network fidelity up to under intense traffic in localised sink oriented mobile ad hoc sensor network and effectively prolong the network lifetime of mobile communications
existing workflow management systems assume that scientists have well specified workflow design before the execution in reality lot of scientific discoveries are made as result of dynamic process where scientists keep proposing new hypotheses and verifying them through multiple tries of various experiments before achieving successful experimental results consequently not all the experiments in workflow execution have necessarily contributed to the final result in this paper we investigate the problem of effectively reproducing the results of previous scientific workflow executions by discovering the critical experiments leading to the success and the logical constraints on their execution order relational schema and sql queries have been designed for effectively recording the workflow execution log efficiently identifying the critical experiments from the log and recommending experiment reproduction strategies to users furthermore we propose optimization techniques for evaluating such sql queries according to the unique characteristics of the log data experimental evaluations demonstrate the performance speedup of our approach
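The sketch below illustrates, with invented table names and data, how a relational log plus a recursive SQL query could record runs and walk back from the final successful experiment to the critical runs needed for reproduction; it is an assumed schema, not the one designed in the paper.

    # hypothetical sqlite sketch of a workflow execution log and a recursive
    # query that collects the critical predecessors of the final result
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE run (id INTEGER PRIMARY KEY, experiment TEXT, status TEXT);
    CREATE TABLE uses (run_id INTEGER, input_run_id INTEGER);  -- data dependencies
    INSERT INTO run VALUES (1,'prepare','success'),(2,'try_modelA','failure'),
                           (3,'try_modelB','success'),(4,'analyze','success');
    INSERT INTO uses VALUES (3,1),(4,3);
    """)

    rows = db.execute("""
    WITH RECURSIVE critical(id) AS (
        SELECT id FROM run WHERE experiment = 'analyze' AND status = 'success'
        UNION
        SELECT u.input_run_id FROM uses u JOIN critical c ON u.run_id = c.id
    )
    SELECT r.id, r.experiment FROM run r JOIN critical c ON r.id = c.id ORDER BY r.id;
    """).fetchall()
    print(rows)   # the failed 'try_modelA' run is not needed for reproduction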
the typical software engineering course consists of lectures in which concepts and theories are conveyed along with small toy software engineering project which attempts to give students the opportunity to put this knowledge into practice although both of these components are essential neither one provides students with adequate practical knowledge regarding the process of software engineering namely lectures allow only passive learning and projects are so constrained by the time and scope requirements of the academic environment that they cannot be large enough to exhibit many of the phenomena occurring in real world software engineering processes to address this problem we have developed problems and programmers an educational card game that simulates the software engineering process and is designed to teach those process issues that are not sufficiently highlighted by lectures and projects we describe how the game is designed the mechanics of its game play and the results of an experiment we conducted involving students playing the game
dlove distributed links over variables evaluation is new model for specifying and implementing virtual reality and other next generation or non wimp user interfaces our approach matches the parallel and continuous structure of these interfaces by combining data flow or constraint like component with an event based component for discrete interactions moreover because the underlying constraint graph naturally lends itself to parallel computation dlove provides for the constraint graph to be partitioned and executed in parallel across several machines for improved performance with our system one can write program designed for single machine but can execute it in distributed environment with minor code modifications the system also supports mechanics for implementing or transforming single user programs into multi user programs we present experiments demonstrating how dlove improves performance by dramatically increasing the validity of the rendered frames we also present performance measures to measure statistical skew in the frames which we believe is more suitable for interactive systems than traditional measures of parallel systems such as throughput or frame rate because they fail to capture the freshness of each rendered frame
in typical content based image retrieval cbir system query results are set of images sorted by feature similarities with respect to the query however images with high feature similarities to the query may be very different from the query in terms of semantics this is known as the semantic gap we introduce novel image retrieval scheme cluster based retrieval of images by unsupervised learning clue which tackles the semantic gap problem based on hypothesis semantically similar images tend to be clustered in some feature space clue attempts to capture semantic concepts by learning the way that images of the same semantics are similar and retrieving image clusters instead of set of ordered images clustering in clue is dynamic in particular clusters formed depend on which images are retrieved in response to the query therefore the clusters give the algorithm as well as the users semantic relevant clues as to where to navigate clue is general approach that can be combined with any real valued symmetric similarity measure metric or nonmetric thus it may be embedded in many current cbir systems experimental results based on database of about images from corel demonstrate improved performance
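A minimal sketch of cluster-based retrieval on synthetic features: retrieve the query's nearest neighbours, then group them into clusters before presentation. scikit-learn's k-means is used here only as a convenient stand-in for the paper's clustering step, and all data and parameters are invented.

    # cluster the retrieved neighbourhood instead of returning a flat ranked list
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    features = rng.random((1000, 16))        # one feature vector per database image
    query = rng.random(16)

    k = 60                                    # size of the initial neighbourhood
    dist = np.linalg.norm(features - query, axis=1)
    neighbours = np.argsort(dist)[:k]         # ids of the k most similar images

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
        features[neighbours])
    clusters = {c: neighbours[labels == c].tolist() for c in range(4)}
    for c, imgs in clusters.items():
        print(f"cluster {c}: {len(imgs)} images, e.g. {imgs[:3]}")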
polycube spline has been formulated elegantly that can unify splines and manifold splines to define new class of shape representations for surfaces of arbitrary topology by using polycube map as its parametric domain in essence the data fitting quality using polycube splines hinges upon the construction of underlying polycube maps yet existing methods for polycube map construction exhibit some disadvantages for example existing approaches for polycube map construction either require projection of points from surface to its polycube approximation which is therefore very difficult to handle the cases when two shapes differ significantly or compute the map by conformally deforming the surfaces and polycubes to the common canonical domain and then construct the map using function composition which is challenging to control the location of singularities and makes it hard for the data fitting and hole filling processes later on this paper proposes novel framework of user controllable polycube maps which can overcome disadvantages of the conventional methods and is much more efficient and accurate the current approach allows users to directly select the corner points of the polycubes on the original surfaces then construct the polycube maps by using the new computational tool of discrete euclidean ricci flow we develop algorithms for computing such polycube maps and show that the resulting user controllable polycube map serves as an ideal parametric domain for constructing spline surfaces and other applications the location of singularities can be interactively placed where no important geometric features exist experimental results demonstrate that the proposed polycube maps introduce lower area distortion and retain small angle distortion as well and subsequently make the entire hole filling process much easier to accomplish
wide area sensing services enable users to query data collected from multitudes of widely distributed sensors in this paper we consider the novel distributed database workload characteristics of these services and present idp an online adaptive data placement and replication system tailored to this workload given hierarchical database idp automatically partitions it among set of networked hosts and replicates portions of it idp makes decisions based on measurements of access locality within the database read and write load for individual objects within the database proximity between queriers and potential replicas and total load on hosts participating in the database our evaluation of idp under real and synthetic workloads including flash crowds of queriers demonstrates that in comparison with previously studied replica placement techniques idp reduces average response times for user queries by up to factor of and reduces network traffic for queries updates and data movements by up to an order of magnitude
floorplacement has attracted attention as placement formulation for designs with thousands or millions of soft macro blocks in this paper we investigate the standard block approach where soft blocks are shaped to have uniform height rather than wide range of different sizes this allows many macro blocks to be treated as standard cells simplifying the problem to one of ordinary mixed size placement we obtain high quality results for suite of recent benchmarks and also present novel legalization algorithms that are more robust than the widely used mixed size tetris approach
we present technique for the easy acquisition of realistic materials and mesostructures without acquiring the actual brdf the method uses the observation that under certain circumstances the mesostructure of surface can be acquired independently of the underlying brdf the acquired data can be used directly for rendering with little preprocessing rendering is possible using an offline renderer but also using graphics hardware where it achieves real time frame rates compelling results are achieved for wide variety of materials
in this paper we focus on the role that photo viewing plays within large distributed enterprise we describe the results of an analysis of users viewing behavior through log activity and semi structured interviews with respect to photo sharing application embedded within an internal social networking site specifically we investigate how these forms of expression can assist in the transmission of the norms and values associated with the culture of the organization through impression formation we conclude by discussing how photos might act as resource for newcomers to learn about the various aspects of the organizational culture and offer design suggestions for photo viewing systems within organizations
current denial of service dos attacks are directed towards specific victim the research community has devised several countermeasures that protect the victim host against undesired traffic we present coremelt new attack mechanism where attackers only send traffic between each other and not towards victim host as result none of the attack traffic is unwanted the coremelt attack is powerful because among attackers there are connections which cause significant damage in the core of the network we demonstrate the attack based on simulations within real internet topology using realistic attacker distributions and show that attackers can induce significant amount of congestion
this paper proposes novel natural facial expression recognition method that recognizes sequence of dynamic facial expression images using the differential active appearance model aam and manifold learning as follows first the differential aam features dafs are computed by the difference of the aam parameters between an input face image and reference neutral expression face image second manifold learning embeds the dafs on the smooth and continuous feature space third the input facial expression is recognized through two steps computing the distances between the input image sequence and gallery image sequences using directed hausdorff distance dhd and selecting the expression by majority voting of nearest neighbors nn sequences in the gallery the dafs are robust and efficient for the facial expression analysis due to the elimination of the inter person camera and illumination variations since the dafs treat the neutral expression image as the reference image the neutral expression image must be found effectively this is done via the differential facial expression probability density model dfepdm using the kernel density approximation of the positively directional dafs changing from neutral to angry happy surprised and negatively directional dafs changing from angry happy surprised to neutral then face image is considered to be the neutral expression if it has the maximum dfepdm in the input sequences experimental results show that the dafs improve the facial expression recognition performance over conventional aam features by and the sequence based nn classifier provides facial expression recognition performance on the facial expression database fed
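The sequence-matching step can be illustrated with a small numpy sketch on synthetic data: differential features as parameter vectors minus a neutral reference, sequence comparison via the directed Hausdorff distance, and majority voting over the k nearest gallery sequences. AAM fitting and the probability-density model for locating the neutral frame are omitted, and all data here is invented.

    # directed Hausdorff distance + k-NN majority voting over differential features
    import numpy as np
    from collections import Counter

    def dafs(seq, neutral):
        return seq - neutral                       # differential features per frame

    def directed_hausdorff(A, B):
        # max over frames of A of the distance to the closest frame of B
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return d.min(axis=1).max()

    def classify(query, gallery, k=3):
        scored = sorted((directed_hausdorff(query, g), label) for g, label in gallery)
        votes = Counter(label for _, label in scored[:k])
        return votes.most_common(1)[0][0]

    rng = np.random.default_rng(1)
    neutral = rng.random(8)
    gallery = [(dafs(rng.random((20, 8)) + shift, neutral), label)
               for label, shift in [("happy", 0.5), ("angry", -0.5), ("surprised", 1.5)]
               for _ in range(4)]
    query = dafs(rng.random((20, 8)) + 0.5, neutral)
    print(classify(query, gallery))                # expected to vote "happy"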
in this paper we present new global router nthu route for modern designs nthu route is based on iterative rip ups and reroutes and several techniques are proposed to enhance our global router these techniques include history based cost function which helps to distribute overflow during iterative rip ups and reroutes an adaptive multi source multi sink maze routing method to improve the wirelength of maze routing congested region identification method to specify the order for nets to be ripped up and rerouted and refinement process to further reduce overflow when iterative history based rip ups and reroutes reach bottleneck compared with two state of the art works on ispd benchmarks nthu route outperforms them in both overflow and wirelength for the much larger designs from the ispd benchmark suite our solution quality is better than or comparable to the best results reported in the ispd routing contest
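A hypothetical sketch of the history-based rip-up-and-reroute idea, in the spirit of negotiated congestion: each grid edge is priced by a base cost plus accumulated history plus an overflow penalty, and two-pin nets are rerouted iteratively with Dijkstra. The grid, capacities, constants and cost form are invented for illustration and are not NTHU-Route's actual functions.

    import heapq
    from collections import defaultdict

    W, H, CAP = 8, 8, 2          # grid size and per-edge capacity (assumed)
    usage = defaultdict(int)      # current number of nets using each edge
    history = defaultdict(float)  # accumulated congestion history per edge

    def edge(a, b):
        return (a, b) if a < b else (b, a)

    def cost(a, b):
        e = edge(a, b)
        over = max(0, usage[e] + 1 - CAP)          # overflow if we add one more net
        return 1.0 + history[e] + 10.0 * over      # base wirelength + history + penalty

    def route(src, dst):
        # Dijkstra maze routing on the grid with the congestion-aware cost
        dist, prev = {src: 0.0}, {}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue
            x, y = u
            for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= v[0] < W and 0 <= v[1] < H:
                    nd = d + cost(u, v)
                    if nd < dist.get(v, float("inf")):
                        dist[v], prev[v] = nd, u
                        heapq.heappush(pq, (nd, v))
        path, n = [dst], dst
        while n != src:
            n = prev[n]
            path.append(n)
        return list(reversed(path))

    nets = [((0, 0), (7, 7)), ((0, 7), (7, 0)), ((0, 3), (7, 3))]
    routes = {}
    for it in range(5):                      # iterative rip-up and reroute
        for i, (s, t) in enumerate(nets):
            if i in routes:                  # rip up: release the old route's edges
                for a, b in zip(routes[i], routes[i][1:]):
                    usage[edge(a, b)] -= 1
            routes[i] = route(s, t)
            for a, b in zip(routes[i], routes[i][1:]):
                usage[edge(a, b)] += 1
        for e, u in usage.items():           # history grows on still-overflowed edges
            if u > CAP:
                history[e] += 1.0
    print(sum(max(0, u - CAP) for u in usage.values()), "total overflow")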
angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents time series images and scientific data although similarity searches based on euclidean distance have been extensively studied in the database community processing of angular similarity searches has been relatively untouched problems due to mismatch in the underlying geometry as well as the high dimensionality of the data make current techniques either inapplicable or their use results in poor performance this brings up the need for effective indexing methods for angular similarity queries we first discuss how to efficiently process such queries and propose effective access structures suited to angular similarity measures in particular we propose two classes of access structures namely angular sweep and cone shell which perform different types of quantization based on the angular orientation of the data objects we also develop query processing algorithms that utilize these structures as dense indices the proposed techniques are shown to be scalable with respect to both dimensionality and the size of the data our experimental results on real data sets from various applications show two to three orders of magnitude of speedup over the current techniques
concurrent programming is conceptually harder to undertake and to understand than sequential programming because programmer has to manage the coexistence and coordination of multiple concurrent activities to alleviate this task several high level approaches to concurrent programming have been developed for some high level programming approaches prototyping for facilitating early evaluation of new ideas is central goal prototyping is used to explore the essential features of proposed system through practical experimentation before its actual implementation to make the correct design choices early in the process of software development approaches to prototyping concurrent applications with very high level programming systems intend to alleviate the development in different ways early experimentation with alternate design choices or problem decompositions for concurrent applications is suggested to make concurrent programming easier this paper presents survey of programming languages and systems for prototyping concurrent applications to review the state of the art in this area the surveyed approaches are classified with respect to the prototyping process
schema evolution is problem that is faced by long lived data when schema changes existing persistent data can become inaccessible unless the database system provides mechanisms to access data created with previous versions of the schema most existing systems that support schema evolution focus on changes local to individual types within the schema thereby limiting the changes that the database maintainer can perform we have developed model of type changes involving multiple types the model describes both type changes and their impact on data by defining derivation rules to initialize new data based on the existing data the derivation rules can describe local and nonlocal changes to types to capture the intent of large class of type change operations we have built system called tess type evolution software system that uses this model to recognize type changes by comparing schemas and then produces transformer that can update data in database to correspond to newer version of the schema
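A tiny sketch of derivation rules driving a data transformer: each rule states how to initialise a field of the new schema from an existing record. The schemas, rule format and records are invented; the real system derives such rules by comparing schema versions.

    # derivation rules map (type, new field) -> function of the old record
    OLD_SCHEMA = {"Person": ["name", "street", "city"]}           # documentation only
    NEW_SCHEMA = {"Person": ["name", "address"], "Address": ["street", "city"]}

    RULES = {
        ("Person", "name"):    lambda old: old["name"],
        ("Person", "address"): lambda old: {"street": old["street"], "city": old["city"]},
    }

    def transform(type_name, old_record):
        return {field: RULES[(type_name, field)](old_record)
                for field in NEW_SCHEMA[type_name]}

    old_db = [{"name": "ada", "street": "main st", "city": "springfield"}]
    new_db = [transform("Person", r) for r in old_db]
    print(new_db)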
we present and evaluate cache oblivious algorithm for stencil computations which arise for example in finite difference methods our algorithm applies to arbitrary stencils in dimensional spaces on an ideal cache of size our algorithm saves factor of cache misses compared to naive algorithm and it exploits temporal locality optimally throughout the entire memory hierarchy we evaluate our algorithm in terms of the number of cache misses and demonstrate that the memory behavior agrees with our theoretical predictions our experimental evaluation is based on finite difference solution of heat diffusion problem as well as gauss seidel iteration and dimensional lbmhd program both reformulated as cache oblivious stencil computations
we introduce multipresenter novel presentation system designed to work on very large display spaces multiple displays or physically large high resolution displays multipresenter allows presenters to organize and present pre made and dynamic presentations that take advantage of very large display space accessed from personal laptop presenters can use the extra space to provide long term persistency of information to the audience our design deliberately separates content generation authoring from the presentation of content we focus on supporting presentation flow and variety of presentation styles ranging from automated scripted sequences of pre made slides to highly dynamic ad hoc and non linear content by providing smooth transition between these styles presenters can easily alter the flow of content during presentation to adapt to an audience or to change emphasis in response to emerging interests we describe our goals rationale and the design process providing detailed description of the current version of the system and discuss our experience using it throughout one semester first year computer science course
the research presented in this paper introduces relative representation of trajectories in space and time the objective is to represent space the way it is perceived by moving observer acting in the environment and to provide complementary view to the usual absolute vision of space trajectories are characterized from the perception of moving observer where relative positions and relative velocities are the basic primitives this allows for formal identification of elementary trajectory configurations and their relationships with the regions that compose the environment the properties of the model are studied including transitions and composition tables these properties characterize trajectory transitions by the underlying processes that semantically qualify them the approach provides representation that might help the understanding of trajectory patterns in space and time
periodic broadcast protocols enable efficient streaming of highly popular media files to large numbers of concurrent clients most previous periodic broadcast protocols however assume that all clients can receive at the same rate and also assume that reception bandwidth is not time varying in this article we first develop new periodic broadcast protocol optimized heterogeneous periodic broadcast ohpb that can be optimized for given population of clients with heterogeneous reception bandwidths and quality of service requirements the ohpb protocol utilizes an optimized segment size progression determined by solving linear optimization model that takes as input the client population characteristics and an objective function such as mean client startup delay we then develop generalization of the ohpb linear optimization model that allows optimal server bandwidth allocation among multiple concurrent ohpb broadcasts wherein each media file and its clients may have different characteristics finally we propose complementary client protocols employing work ahead buffering of data during playback so as to enable more uniform playback quality when the reception bandwidth is time varying
in this paper we study the following problem given database and set of queries we want to find set of views that can compute the answers to the queries such that the amount of space in bytes required to store the viewset is minimum on the given database we also handle problem instances where the input has set of database instances as described by an oracle that returns the sizes of view relations for given view definitions this problem is important for applications such as distributed databases data warehousing and data integration we explore the decidability and complexity of the problem for workloads of conjunctive queries we show that results differ significantly depending on whether the workload queries have self joins further for queries without self joins we describe very compact search space of views which contains all views in at least one optimal viewset we present techniques for finding minimum size viewset for single query without self joins by using the shape of the query and its constraints and validate the approach by extensive experiments
dynamic capacity provisioning is useful technique for handling the multi time scale variations seen in internet workloads in this article we propose novel dynamic provisioning technique for multi tier internet applications that employs flexible queuing model to determine how much of the resources to allocate to each tier of the application and combination of predictive and reactive methods that determine when to provision these resources both at large and small time scales we propose novel data center architecture based on virtual machine monitors to reduce provisioning overheads our experiments on forty machine xen linux based hosting platform demonstrate the responsiveness of our technique in handling dynamic workloads in one scenario where flash crowd caused the workload of three tier application to double our technique was able to double the application capacity within five minutes thus maintaining response time targets our technique also reduced the overhead of switching servers across applications from several minutes to less than second while meeting the performance targets of residual sessions
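A minimal provisioning sketch: each server of a tier is approximated as an M/M/1 queue (an assumption made here for brevity; the paper uses a more flexible queueing model), the smallest server count meeting the response-time target is computed, and the predictive and reactive estimates are combined by taking their maximum. All rates and targets are invented.

    import math

    def servers_needed(arrival_rate, service_rate, target_response):
        # per-server mean response time 1/(mu - lambda_i) <= target
        # => lambda_i <= mu - 1/target, so split the arrivals accordingly
        per_server = service_rate - 1.0 / target_response
        if per_server <= 0:
            raise ValueError("target is unreachable even on an idle server")
        return max(1, math.ceil(arrival_rate / per_server))

    predicted_rate = 400.0        # requests/s forecast for the next interval
    observed_rate = 850.0         # requests/s seen right now (e.g. a flash crowd)
    mu, target = 120.0, 0.05      # per-server service rate and response time goal

    proactive = servers_needed(predicted_rate, mu, target)
    reactive = servers_needed(observed_rate, mu, target)
    print("provision", max(proactive, reactive), "servers for this tier")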
we are interested in verifying dynamic properties of finite state reactive systems under fairness assumptions by model checking the systems we want to verify are specified through top down refinement process in order to deal with the state explosion problem we have proposed in previous works to partition the reachability graph and to perform the verification on each part separately moreover we have defined class called bmod of dynamic properties that are verifiable by parts whatever the partition we decide if property belongs to bmod by looking at the form of the büchi automaton that accepts not however when property belongs to bmod the property → where is fairness assumption does not necessarily belong to bmod in this paper we propose to use the refinement process in order to build the parts on which the verification has to be performed we then show that with such partition if property is verifiable by parts and if is the expression of the fairness assumptions on system then the property → is still verifiable by parts this approach is illustrated by its application to the chip card protocol equals using the engineering design language
this paper addresses the synthesis of labyrinthine and maze structures which are represented as curves on manifolds the curves evolve based on simulation controlled by spatially varying parameters defined by texture maps we introduce the graphics community to the fascinating area of maze art and present model for the automatic generation of organic looking labyrinths and mazes we also present framework based on regions and patterns for the interactive artistic control of npr algorithms such as ours that evolve in both space and time in the context of labyrinths the framework provides the designer with control over both the path complexity and visual aesthetics as the curves evolve the resulting labyrinths and mazes range from mathematically simple to intricately complex visual structures applications of the resulting curves include npr difficult to counterfeit imagery environmental design and architecture computer games and parameterization of manifolds
region based memory management is an alternative to standard tracing garbage collection that makes operation such as memory deallocation explicit but verifiably safe in this article we present new compiler intermediate language called the capability language cl that supports region based memory management and enjoys provably safe type systems unlike previous region based type system region lifetimes need not be lexically scoped and yet the language may be checked for safety without complex analyses therefore our type system may be deployed in settings such as extensible operating systems where both the performance and safety of untrusted code is important the central novelty of the language is the use of static capabilities to specify the permissibility of various operations such as memory access and deallocation in order to ensure capabilities are relinquished properly the type system tracks aliasing information using form of bounded quantification moreover unlike previous work on region based type systems the proof of soundness of our type system is relatively simple employing only standard syntactic techniques in order to show how our language may be used in practice we show how to translate variant of tofte and talpin’s high level type and effects system for region based memory management into our language when combined with known region inference algorithms this translation provides way to compile source level languages to cl
most analysts start with an overview of the data before gradually refining their view to be more focused and detailed multiscale pan and zoom systems are effective because they directly support this approach however generating abstract overviews of large data sets is difficult and most systems take advantage of only one type of abstraction visual abstraction furthermore these existing systems limit the analyst to single zooming path on their data and thus to single set of abstract views this paper presents formalism for describing multiscale visualizations of data cubes with both data and visual abstraction and method for independently zooming along one or more dimensions by traversing zoom graph with nodes at different levels of detail as an example of how to design multiscale visualizations using our system we describe four design patterns using our formalism these design patterns show the effectiveness of multiscale visualization of general relational databases
we present novel method for acquisition modeling compression and synthesis of realistic facial deformations using polynomial displacement maps our method consists of an analysis phase where the relationship between motion capture markers and detailed facial geometry is inferred and synthesis phase where novel detailed animated facial geometry is driven solely by sparse set of motion capture markers for analysis we record the actor wearing facial markers while performing set of training expression clips we capture real time high resolution facial deformations including dynamic wrinkle and pore detail using interleaved structured light scanning and photometric stereo next we compute displacements between neutral mesh driven by the motion capture markers and the high resolution captured expressions these geometric displacements are stored in polynomial displacement map which is parameterized according to the local deformations of the motion capture dots for synthesis we drive the polynomial displacement map with new motion capture data this allows the recreation of large scale muscle deformation medium and fine wrinkles and dynamic skin pore detail applications include the compression of existing performance data and the synthesis of new performances our technique is independent of the underlying geometry capture system and can be used to automatically generate high frequency wrinkle and pore details on top of many existing facial animation systems
overlay networks among cooperating hosts have recently emerged as viable solution to several challenging problems including multicasting routing content distribution and peer to peer services application level overlays however incur performance penalty over router level solutions this paper quantifies and explains this performance penalty for overlay multicast trees via internet experimental data simulations and theoretical models we compare number of overlay multicast protocols with respect to overlay tree structure and underlying network characteristics experimental data and simulations illustrate that the mean number of hops and mean per hop delay between parent and child hosts in overlay trees generally decrease as the level of the host in the overlay tree increases overlay multicast routing strategies overlay host distribution and internet topology characteristics are identified as three primary causes of the observed phenomenon we show that this phenomenon yields overlay tree cost savings our results reveal that the normalized cost is for small where is the total number of hops in all overlay links is the average number of hops on the source to receiver unicast paths and is the number of members in the overlay multicast session this can be compared to an ip multicast cost proportional to to
studying program behavior is central component in architectural designs in this paper we study and exploit one aspect of program behavior the behavior repetition to expedite simulation detailed architectural simulation can be long and computationally expensive various alternatives are commonly used to simulate much smaller instruction stream to evaluate design choices using reduced input set or simulating only small window of the instruction stream in this paper we propose to reduce the amount of detailed simulation by avoiding simulating repeated code sections that demonstrate stable behavior by characterizing program behavior repetition and using the information to select subset of instructions for detailed simulation we can significantly speed up the process without affecting the accuracy in most cases simulation time of full length spec cpu benchmarks is reduced from hundreds of hours to few hours the average error incurred is only about or less for range of metrics
the challenge for the development of next generation software is the successful management of the complex computational environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives self adapting numerical software sans systems are intended to meet this significant challenge the process of arriving at an efficient numerical solution of problems in computational science involves numerous decisions by numerical expert attempts to automate such decisions distinguish three levels algorithmic decision management of the parallel environment and processor specific tuning of kernels additionally at any of these levels we can decide to rearrange the user’s data in this paper we look at number of efforts at the university of tennessee to investigate these areas
search engines continue to struggle to provide everyday users with service capable of delivering focussed results that are relevant to their information needs moreover traditional search engines really only provide users with starting point for their information search that is upon selecting page from search result list the interaction between user and search engine is effectively over and the user must continue their search alone in this article we argue that comprehensive search service needs to provide the user with more help both at the result list level and beyond and we outline some recommendations for intelligent web search support we introduce the searchguide web search support system and we describe how it fulfils the requirements for search support system providing evaluation results where applicable
recent litigation and intense regulatory focus on secure retention of electronic records have spurred rush to introduce write once read many worm storage devices for retaining business records such as electronic mail however simply storing records in worm storage is insufficient to ensure that the records are trustworthy ie able to provide irrefutable proof and accurate details of past events specifically some form of index is needed for timely access to the records but unless the index is maintained securely the records can in effect be hidden or altered even if stored in worm storage in this paper we systematically analyze the requirements for establishing trustworthy inverted index to enable keyword based search queries we propose novel scheme for efficient creation of such an index and demonstrate through extensive simulations and experiments with an enterprise keyword search engine that the scheme can achieve online update speeds while maintaining good query performance in addition we present secure index structure for multi keyword queries that supports insert lookup and range queries in time logarithmic in the number of documents
we present an efficient stereoscopic rendering algorithm supporting interactive navigation through large scale voxel based environments in this algorithm most of the pixel values of the right image are derived from the left image by fast warping based on specific stereoscopic projection geometry an accelerated volumetric ray casting then fills the remaining gaps in the warped right image our algorithm has been parallelized on multiprocessor by employing effective task partitioning schemes and achieved high cache coherency and load balancing we also extend our stereoscopic rendering to include view dependent shading and transparency effects we have applied our algorithm in two virtual navigation systems flythrough over terrain and virtual colonoscopy and reached interactive stereoscopic rendering rates of more than frames per second on processor sgi challenge
we present new external memory multiresolution surface representation for massive polygonal meshes previous methods for building such data structures have relied on resampled surface data or employed memory intensive construction algorithms that do not scale well our proposed representation combines efficient access to sampled surface data with access to the original surface the construction algorithm for the surface representation exhibits memory requirements that are insensitive to the size of the input mesh allowing it to process meshes containing hundreds of millions of polygons the multiresolution nature of the surface representation has allowed us to develop efficient algorithms for view dependent rendering approximate collision detection and adaptive simplification of massive meshes the empirical performance of these algorithms demonstrates that the underlying data structure is powerful and flexible tool for operating on massive geometric data
one of the surprising developments in the area of program verification is how ideas introduced originally by logicians in the ended up yielding by an industrial standard property specification language called psl this development was enabled by the equally unlikely transformation of the mathematical machinery of automata on infinite words introduced in the early for second order arithmetics into effective algorithms for model checking tools this paper attempts to trace the tangled threads of this development
this paper presents communication network targeted for complex system on chip soc and network on chip noc designs the heterogeneous ip block interconnection hibi aims at maximum efficiency and minimum energy per transmitted bit combined with quality of service qos in transfers other features include support for hierarchical topologies with several clock domains flexible scalability and runtime reconfiguration of network parameters hibi is intended for integrating coarse grain components such as intellectual property ip blocks that have size of thousands of gates hibi has been implemented in vhdl and systemc and synthesized on several cmos technologies and on fpga bit wrapper requires gates and runs with mhz on technology which shows that only minimal area overhead is paid for the advanced features the area and frequency results are well comparable to other noc proposals furthermore data transfers are shown to approach the maximum theoretical performance for protocol efficiency hibi network is accompanied with design framework with tools for optimizing the system through automated design space exploration
as commonly acceptable standard for guiding web markup documents xml allows the internet users to create multimedia documents of their preferred structures and share with other people the creation of various multimedia document structures typically as trees implies that some kinds of conversion mechanisms are needed for people using different structures to understand each other this paper presents visual approach to the representation and validation of multimedia document structures specified in xml and transformation of one structure to another the underlying theory of our approach is context sensitive graph grammar formalism the paper demonstrates the conciseness and expressiveness of the graph grammar formalism an example xml structure is provided and its graph grammar representation validation and transformation to multimedia representation are presented
disruption tolerant networks dtns differ from other types of networks in that capacity is created by the movements of network participants this implies that understanding and influencing the participants motions can have significant impact on network performance in this paper we introduce the routing protocol mora which learns structure in the movement patterns of network participants and uses it to enable informed message passing we also propose the introduction of autonomous agents as additional participants in dtns these agents adapt their movements in response to variations in network capacity and demand we use multi objective control methods from robotics to generate motions capable of optimizing multiple network performance metrics simultaneously we present experimental evidence that these strategies individually and in conjunction result in significant performance improvements in dtns
verifying concurrent programs is challenging since the number of thread interleavings that need to be explored can be huge even for moderate programs we present cartesian semantics that reduces the amount of nondeterminism in concurrent programs by delaying unnecessary context switches using this semantics we construct novel dynamic partial order reduction algorithm we have implemented our algorithm and evaluate it on small set of benchmarks our preliminary experimental results show significant potential saving in the number of explored states and transitions
we port verification techniques for device drivers from the windows domain to linux combining several tools and techniques into one integrated tool chain building on ideas from microsoft’s static driver verifier sdv project we extend their specification language and combine its implementation with the public domain bounded model checker cbmc as new verification back end we extract several api conformance rules from linux documentation and formulate them in the extended language slicx thus sdv style verification of temporal safety specifications is brought into the public domain in addition we show that slicx together with cbmc can be used to simulate preemption in multi threaded code and to find race conditions and to prove the absence of deadlocks and memory leaks
index compression techniques are known to substantially decrease the storage requirements of text retrieval system as side effect they may increase its retrieval performance by reducing disk overhead despite this advantage developers sometimes choose to store index data in uncompressed form in order to not obstruct random access into each index term’s postings list in this paper we show that index compression does not harm random access performance in fact we demonstrate that in some cases random access into term’s postings list may be realized more efficiently if the list is stored in compressed form instead of uncompressed this is regardless of whether the index is stored on disk or in main memory since both types of storage hard drives and ram do not support efficient random access in the first place
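The sketch below shows one way random access into a compressed postings list can work: document gaps are variable-byte encoded, and a small table of synchronisation points lets decoding start near an arbitrary position rather than at the beginning. The layout and block size are illustrative choices, not the exact structures evaluated in the paper.

    # gap + variable-byte compression with sync points for random access
    def vbyte_encode(n):
        out = bytearray()
        while True:
            out.insert(0, n & 0x7F)
            if n < 128:
                break
            n >>= 7
        out[-1] |= 0x80                      # high bit marks the last byte
        return bytes(out)

    def compress(postings, block=4):
        data, sync = bytearray(), []         # sync[i] = (byte offset, docid before block)
        prev = 0
        for i, doc in enumerate(postings):
            if i % block == 0:
                sync.append((len(data), prev))
            data += vbyte_encode(doc - prev) # store gaps, not absolute ids
            prev = doc
        return bytes(data), sync, block

    def access(data, sync, block, index):
        offset, doc = sync[index // block]   # jump to the enclosing sync point
        for _ in range(index % block + 1):   # then decode at most `block` gaps
            gap = 0
            while True:
                byte = data[offset]; offset += 1
                gap = (gap << 7) | (byte & 0x7F)
                if byte & 0x80:
                    break
            doc += gap
        return doc

    postings = [3, 7, 18, 25, 130, 1000, 1002, 4096, 9000]
    data, sync, block = compress(postings)
    assert all(access(data, sync, block, i) == postings[i] for i in range(len(postings)))
    print(len(data), "bytes for", len(postings), "postings")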
statistic estimation such as output size estimation of operators is well studied subject in the database research community mainly for the purpose of query optimization the assumption however is that queries are ad hoc and therefore the emphasis has been on capturing the data distribution when long standing continuous queries on changing database are concerned more direct approach namely building an estimation model for each operator is possible in this paper we propose novel learning based method our method consists of two steps the first step is to design dedicated feature extraction algorithm that can be used incrementally to obtain feature values from the underlying data the second step is to use data mining algorithm to generate an estimation model based on the feature values extracted from the historical data to illustrate the approach this paper studies the case of similarity based searches over streaming time series experimental results show this approach provides accurate statistic estimates with low overhead
nandfs is flash file system that exposes memory performance tradeoff to system integrators the file system can be configured to use large amount of ram in which case it delivers excellent performance in particular when nandfs is configured with the same amount of ram that yaffs uses the performance of the two file systems is comparable yaffs is file system that is widely used in embedded linux and other embedded environments but yaffs and other state of the art flash file systems allocate ram dynamically and do not provide the system builder with way to limit the amount of memory that they allocate nandfs on the other hand allows the system builder to configure it to use specific amount of ram the performance of nandfs degrades when the amount of ram it uses shrinks but the degradation is graceful not catastrophic nandfs is able to provide this flexibility thanks to novel data structure that combines coarse grained logical to physical mapping with log structured file system
many techniques for power management employed in advanced rtl synthesis tools rely explicitly or implicitly on observability don’t care odc conditions in this paper we present systematic approach to maximizing the effectiveness of these techniques by generating power friendly rtl descriptions in behavioral synthesis tool we first introduce the concept of behavior level observability and investigate its relation with observability under given schedule using an extension of boolean algebra we then propose an efficient algorithm to compute behavior level observability on data flow graph our algorithm exploits knowledge about select and boolean instructions and allows certain forms of other knowledge once uncovered to be considered for stronger observability conditions we also describe behavioral synthesis flow where behavior level observability is used to guide the scheduler toward maximizing the likelihood that execution of power hungry instructions will be avoided under latency constraint experimental results show that our approach is able to reduce total power and it outperforms previous method in by on average on set of real world designs to the best of our knowledge this is the first work to use comprehensive behavioral level observability analysis to guide optimizations in behavioral synthesis
program traces can be used to drive visualisations of reusable components but such traces can be gigabytes in size are very expensive to generate and are hard to extract information from we have developed solution to this problem an xml data storage environment xdse for storing xml based program traces in native xml database we use xquery to extract information from the program traces and the results are then transformed into understandable visualisations
this paper expands on the scope of correlation filters to show their usefulness for reliable face recognition towards that end we propose adaptive and robust correlation filters arcf and describe their usefulness for reliable face authentication using recognition by parts strategies arcf provide information that involves both appearance and location the cluster and strength of the arcf correlation peaks indicate the confidence of the face authentication made if any the development of arcf motivated by mace filters and adaptive beam forming from radar sonar is driven by tikhonov regularization the adaptive aspect of arcf comes from their derivation using both training and test data similar to transduction while the robust aspect benefits from the correlation peak optimization to decrease their sensitivity to noise and distortions the comparative advantages of arcf are motivated explained and illustrated vis à vis competing correlation filters experimental evidence shows the feasibility and reliability of arcf vis à vis occlusion disguise and illumination expression and temporal variability the generalization ability of arcf is further illustrated when decision making thresholds learned priori from one data base eg feret carry over to face images from another data base eg ar
we present and evaluate framework parexc to reduce the runtime penalties of compiler generated runtime checks an obvious approach is to use idle cores of modern multi core cpus to parallelize the runtime checks this could be accomplished by parallelizing the application and in this way implicitly parallelizing the checks or by parallelizing the checks only parallelizing an application is rarely easy and frameworks that simplify the parallelization eg like software transactional memory stm can introduce considerable overhead parexc is based on alternative we compare it with an approach using transactional memory based alternative our experience shows that parexc is not only more efficient than the stm based solution but the manual effort for an application developer to integrate parexc is lower parexc has in contrast to similar frameworks two noteworthy features that permit more efficient parallelization of checks speculative variables and the ability to add checks by static instrumentation
tomorrow’s microprocessors will be able to handle multiple flows of control applications that exhibit task level parallelism tlp and can be decomposed into parallel tasks will perform well on these platforms tlp arises when task is independent of its neighboring code traditional parallel compilers exploit one variety of tlp loop level parallelism llp where loop iterations are executed in parallel llp can overwhelmingly be found in numeric typically fortran programs with regular patterns of data accesses in contrast irregular applications typified by general purpose integer applications exhibit little llp as they tend to access data in irregular patterns through pointers without pointer disambiguation to analyze data access dependences traditional parallel compilers cannot parallelize these irregular applications and ensure correct execution we focus on different variety of tlp namely speculative task parallelism stp stp arises when task either leaf procedure non leaf procedure or an entire loop is control and memory independent of its preceding code and thus could be executed in parallel two sections of code are memory independent when neither contains store to memory location that the other accesses to exploit stp we assume hypothetical speculative machine that supports speculative futures parallel programming construct that executes task early on different thread or processor with mechanisms for resolving incorrect speculation when the task is not after all independent this allows us to speculatively parallelize code when there is high probability of independence but no guarantee figure illustrates stp showing task in the dynamic instruction stream of an irregular application that has no memory access conflicts with group of instructions that precede the shorter of and determines the overlap of memory independent instructions as seen in figures and in the absence of any register dependences and may be executed in parallel resulting in shorter execution time it is hard for traditional parallel compilers of pointer based languages to expose this parallelism the goals of this paper are to identify such regions as and within irregular applications and to find the number of instructions that may thus be removed from the critical path this number represents the maximum stp when the cost of exploiting stp is zero because the biggest barrier to detecting independence in irregular codes is memory disambiguation we identify memory independent tasks using profile based approach and measure the amount of stp by estimating the amount of memory independent instructions those tasks expose we vary the level of control dependence and memory dependence to investigate their effect on the amount of memory independence we find we profile at different memory granularities and introduce synchronization to expose higher levels of memory independence across this variety of speculation assumptions to of dynamic instructions are within tasks that are found to be memory independent this was on the specint benchmarks set of irregular applications for which traditional methods of parallelization are ineffective
given large collection of sparse vector data in high dimensional space we investigate the problem of finding all pairs of vectors whose similarity score as determined by function such as cosine distance is above given threshold we propose simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning we show the approach efficiently handles variety of datasets across wide setting of similarity thresholds with large speedups over previous state of the art approaches
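A compact sketch of threshold-based all-pairs cosine similarity over an inverted index, showing only the basic candidate generation and score accumulation; the pruning and ordering optimisations that make the published algorithm fast are deliberately omitted, and the example vectors are invented.

    import math
    from collections import defaultdict

    def normalise(vec):
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {d: w / norm for d, w in vec.items()}

    def all_pairs(vectors, threshold):
        index = defaultdict(list)            # dimension -> [(vector id, weight)]
        results = []
        for i, vec in enumerate(vectors):
            scores = defaultdict(float)      # dot products with earlier vectors
            for dim, w in vec.items():
                for j, wj in index[dim]:
                    scores[j] += w * wj
            results += [(j, i, s) for j, s in scores.items() if s >= threshold]
            for dim, w in vec.items():       # index the current vector for later ones
                index[dim].append((i, w))
        return results

    docs = [{"a": 2.0, "b": 1.0}, {"a": 2.0, "b": 1.1}, {"c": 5.0}, {"b": 1.0, "c": 1.0}]
    vectors = [normalise(d) for d in docs]
    for i, j, sim in all_pairs(vectors, 0.9):
        print(f"({i}, {j}) cosine = {sim:.3f}")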
an important issue any organization or individual has to face when managing data containing sensitive information is the risk that can be incurred when releasing such data even though data may be sanitized before being released it is still possible for an adversary to reconstruct the original data using additional information thus resulting in privacy violations to date however systematic approach to quantify such risks is not available in this paper we develop framework based on statistical decision theory that assesses the relationship between the disclosed data and the resulting privacy risk we model the problem of deciding which data to disclose in terms of deciding which disclosure rule to apply to database we assess the privacy risk by taking into account both the entity identification and the sensitivity of the disclosed information furthermore we prove that under some conditions the estimated privacy risk is an upper bound on the true privacy risk finally we relate our framework with the anonymity disclosure method the proposed framework makes the assumptions behind anonymity explicit quantifies them and extends them in several natural directions
this paper describes the active memory abstraction for memory system simulation in this abstraction designed specifically for on the fly simulation memory references logically invoke user specified function depending upon the reference’s type and accessed memory block state active memory allows simulator writers to specify the appropriate action on each reference including no action for the common case of cache hits because the abstraction hides implementation details implementations can be carefully tuned for particular platforms permitting much more efficient on the fly simulation than the traditional trace driven abstraction our sparc implementation fast cache executes simple data cache simulations two or three times faster than highly tuned trace driven simulator and only to times slower than the original program fast cache implements active memory by performing fast table look up of the memory block state taking as few as cycles on supersparc for the no action case modeling the effects of fast cache’s additional lookup instructions qualitatively shows that fast cache is likely to be the most efficient simulator for miss ratios between and
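A hypothetical Python illustration of the active-memory idea: every reference looks up the state of its memory block in a table and dispatches a user-specified handler, with no action as the cheap common case. The block size, states, handlers and trace are invented, and this is ordinary interpreted code rather than the tuned lookup sequences the paper describes.

    # per-block state table dispatching user-specified reference handlers
    BLOCK = 32
    HIT, MISS = 0, 1
    state = {}                               # block address -> state
    misses = 0

    def on_miss(block):                      # user-specified action for this state
        global misses
        misses += 1
        state[block] = HIT                   # simulate a fill into the cache

    def on_hit(block):                       # common case: no action
        pass

    handlers = {MISS: on_miss, HIT: on_hit}

    def reference(addr):
        block = addr // BLOCK
        handlers[state.get(block, MISS)](block)

    for a in [0, 4, 8, 64, 68, 0, 64, 256]:  # a tiny reference trace
        reference(a)
    print("misses:", misses)                  # 3 distinct blocks touched -> 3 misses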
variant parametric types have been introduced to provide flexible subtyping mechanism for generic types and are recently being developed into java wildcards shipped worldwide with the jdk release the two approaches which are strictly related retain safety by providing rather peculiar and non trivial mechanisms to restrict access to class functionalities methods and fields in this paper we aim at studying unified framework to describe this issue in detail and to facilitate the understanding and exploitation of this new programming concept our work is both technical and conceptual on the one hand we provide formal rules to access restriction and specialise them for the two approaches so as to emphasise similarities and differences on the other hand we show that such rules promote natural description and understanding of access restriction in terms of the ability of instances of generic class to produce consume elements of the abstracted type
this paper proposes novel hypergraph skeletal representation for shape based on formal derivation of the generic structure of its medial axis by classifying each skeletal point by its order of contact we show that generically the medial axis consists of five types of points which are then organized into sheets curves and points sheets manifolds with boundary which are the locus of bitangent spheres with regular tangency ak notation means distinct fold tangencies of the sphere of contact as explained in the text two types of curves the intersection curve of three sheets and the locus of centers of tritangent spheres and the boundary of sheets which are the locus of centers of spheres whose radius equals the larger principal curvature ie higher order contact points and two types of points centers of quad tangent spheres and centers of spheres with one regular tangency and one higher order tangency aa the geometry of the medial axis thus consists of sheets bounded by one type of curve on their free end which corresponds to ridges on the surface and attached to two other sheets at another type of curve which support generalized cylinder description the curves can only end in aa points where they must meet an curve the curves meet together in fours at an point this formal result leads to compact representation for shape referred to as the medial axis hypergraph representation consisting of nodes and points links between pairs of nodes and curves and hyperlinks between groups of links sheets the description of the local geometry at nodes by itself is sufficient to capture qualitative aspects of shapes in analogy to we derive pointwise reconstruction formula to reconstruct surface from this medial axis hypergraph together with the radius function thus this information completely characterizes shape and lays the theoretical foundation for its use in recognition morphing design and manipulation of shapes
how to help reusers retrieve components efficiently and conveniently is critical to the success of the component based software development cbsd in the literature many research efforts have been devoted to the improvement of component retrieval mechanisms although various retrieval methods have been proposed nowadays retrieving software component by the description text is still prevalent in most real world scenarios therefore the quality of the component description text is vital for the component retrieval unfortunately the descriptions of components often contain improper or even noisy information which could deteriorate the effectiveness of the retrieval mechanism to alleviate the problem in this paper we propose an approach which can improve the component description by leveraging user query logs the key idea of our approach is to refine the description of component by extracting proper information from the user query logs two different strategies are proposed to carry out the information extraction the first strategy extracts information for component only from its own related query logs whereas our second strategy further takes logs from similar components into consideration we performed an experimental study on two different data sets to evaluate the effectiveness of our approach the experimental results demonstrate that by using either extraction strategy our approach can improve retrieval performance and our approach can be more effective by leveraging the second strategy which utilizes logs from similar components
in recent years workflow management systems have become an accepted technology to support automation in process centric environments lately organizations concentrate more and more on their core business processes while outsourcing supporting processes to other organizations thereby forming virtual enterprises the organizations forming the virtual enterprise operate in bb commerce setting in which provider organizations perform services for consumer organizations to apply workflow management technology in these virtual enterprises current workflow management systems need to be extended to offer support for cross organizational processes transaction support already considered an important issue in intra organizational workflow management systems must be extended to deal with the cross organizational aspects as well this paper presents high level compensation based transaction model and flexible architecture to support this transaction model as required by cross organizational workflow processes characteristic of the model is the flexibility in rollback semantics by combining rollback modes and rollback scopes this is supported by dynamically composed architecture that is configured using the agreements that are specified in an electronic contract that has been established between the participating organizations the transaction model supported by the dynamically composed architecture is implemented in prototype system based on commercial workflow management technology
the publish subscribe architectural style has recently emerged as promising approach to tackle the dynamism of modern distributed applications the correctness of these applications does not only depend on the behavior of each component in isolation but the interactions among components and the delivery infrastructure play key roles this paper presents the first results on considering the validation of these applications in probabilistic setting we use probabilistic model checking techniques on stochastic models to tackle the uncertainty that is embedded in these systems the communication infrastructure ie the transmission channels and the publish subscribe middleware are modeled directly by means of probabilistic timed automata application components are modeled by using statechart diagrams and then translated into probabilistic timed automata the main elements of the approach are described through an example
the emergent behavior of complex systems which arises from the interaction of multiple entities can be difficult to validate especially when the number of entities or their relationships grows this validation requires understanding of what happens inside the system in the case of multi agent systems which are complex systems as well this understanding requires analyzing and interpreting execution traces containing agent specific information deducing how the entities relate to each other guessing which acquaintances are being built and how the total amount of data can be interpreted the paper introduces some techniques which have been applied in developments made with an agent oriented methodology ingenias which provides framework for modeling complex agent oriented systems these techniques can be regarded as intelligent data analysis techniques all of which are oriented towards providing simplified representations of the system these techniques range from raw data visualization to clustering and extraction of association rules
the development of complex software systems satisfying performance requirements is achievable only by spending careful attention to performance goals throughout the lifecycle and especially from its very beginning unified modeling language uml is quickly becoming standard notation for specification and design of software systems uml offers several diagrams for separating concerns of different system views and this feature is helpful to derive early performance models that take into account combined data from these diagrams in this paper we introduce methodology performance incremental validation in uml prima uml aimed at generating queueing network based performance model from uml diagrams that are usually available early in the software lifecycle prima uml is incremental in that it combines information extracted from and annotated into different uml diagrams to piecewise build the performance model besides this is not black box approach as the methodology is open to embed information coming from other uml diagrams possibly in late lifecycle phases for detailing refining or domain tailoring the performance model this work contributes to encompassing the performance validation task as an integrated activity within the development process of complex systems we apply the methodology to quite simple example to show how effective it can be to get early performance insights
we present novel algorithm for accurate high quality point rendering which is based on the formulation of splatting using homogeneous coordinates in contrast to previous methods this leads to perspective correct splat shapes avoiding artifacts such as holes caused by the affine approximation of the perspective projection further our algorithm implements the ewa resampling filter hence providing high image quality with anisotropic texture filtering we also present an extension of our rendering primitive that facilitates the display of sharp edges and corners finally we describe an efficient implementation of the entire point rendering pipeline using vertex and fragment programs of current gpus
superimposed coding method frame sliced signature file is proposed and the performance of this method is studied and compared with that of other signature file methods the response time of the method is improved due to its ability to effectively partition the signature file so that fewer random disk accesses are required on both retrieval and insertion while the good characteristics of conventional square file ie low space overhead low maintenance cost and the write once property are retained the generalized version of the method is shown to be unified framework for several popular signature file methods including the sequential signature file ssf method bit sliced signature file bssf method and its enhanced version of b’ssf prototype system was implemented on unix workstations with the language experimental results on mb database consisting of technical reports and mb database with technical reports are presented
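The following toy sketch illustrates the superimposed-coding principle behind signature files in general, not the frame-sliced layout or the parameters studied in the abstract above: every word sets a few bits of a fixed-width signature, and queries are answered by bitwise containment, which admits false drops but never false dismissals. The signature width and bits-per-word values are made up.

```python
# Toy superimposed-coding signature file (illustrative parameters only):
# each word sets a few bits of a fixed-width signature; a query matches a
# record if all query bits are set in the record's signature.
import hashlib

F, M = 64, 3                               # signature width in bits, bits per word

def word_bits(word):
    h = hashlib.sha1(word.encode()).digest()
    return {h[i] % F for i in range(M)}    # up to m bit positions per word

def signature(words):
    sig = 0
    for w in words:
        for b in word_bits(w):
            sig |= 1 << b
    return sig

docs = {"r1": ["signature", "file", "retrieval"],
        "r2": ["query", "processing"]}
sigs = {doc: signature(ws) for doc, ws in docs.items()}

query = signature(["signature", "retrieval"])
candidates = [doc for doc, s in sigs.items() if query & ~s == 0]  # query bits subset of doc bits
print(candidates)                          # ['r1'] plus any false drops
```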
we present the design and implementation of highly scalable and easily deployed middleware system that provides performance isolated execution environments for client and server application functionality the java active extensions system allows clients or servers to extend their operation by hosting portions of their codes called extensions at network vantage points for improved performance and reliability and by providing them with quality of service via rate based resource reservations this system is especially useful for wireless resource limited clients which can remotely locate filters caches monitors buffers etc to act on their behalf and improve interactions with servers servers also benefit by moving some of their services close to their clients eg those near common base station to reduce latency and improve bandwidth in both cases the client’s or server’s extended functionality executes with specified fraction of the remote system’s processor the system design is based on scalable distributed architecture that allows for incremental hardware growth and is highly deployable as it runs entirely at user level including its rate based scheduling system
since the very beginnings of cryptography many centuries ago key management has been one of the main challenges in cryptographic research in case of group of players wanting to share common key many schemes exist in the literature managing groups where all players are equal or proposing solutions where the group is structured as hierarchy this paper presents the first key management scheme suitable for hierarchy where no central authority is needed and permitting to manage graph representing the hierarchical group with possibly several roots this is achieved by using hmac and non hierarchical group key agreement scheme in an intricate manner and introducing the notion of virtual node
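The abstract above does not spell out the construction, so the snippet below only illustrates the general flavour of HMAC-based key derivation down a hierarchy: a child key is derived from a parent key and the child's public label, so holders of an ancestor key can recompute descendant keys but not the reverse. The labels and the root secret are hypothetical; this is not the paper's scheme.

```python
# Generic HMAC-based key derivation along hierarchy edges (illustrative only):
# child keys are computed from a parent key and a public child label.
import hmac, hashlib

def child_key(parent_key: bytes, child_label: str) -> bytes:
    return hmac.new(parent_key, child_label.encode(), hashlib.sha256).digest()

root = b"root-secret"                     # e.g. output of a group key agreement
dept = child_key(root, "dept-A")
team = child_key(dept, "team-A1")

# an ancestor can rebuild the whole chain from labels alone
assert child_key(child_key(root, "dept-A"), "team-A1") == team
print(team.hex()[:16])
```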
while reasoners are year after year scaling up in the classical time invariant domain of ontological knowledge reasoning upon rapidly changing information has been neglected or forgotten on the contrary processing of data streams has been largely investigated and specialized stream database management systems exist in this paper by coupling reasoners with powerful reactive throughput efficient stream management systems we introduce the concept of stream reasoning we expect future realization of such concept to have high impact on the future internet because it enables reasoning in real time at throughput and with reactivity not obtained in previous works
this paper provides comprehensive treatment of index management in transaction systems we present method called aries im algorithm for recovery and isolation exploiting semantics for index management for concurrency control and recovery of trees aries im guarantees serializability and uses write ahead logging for recovery it supports very high concurrency and good performance by treating as the lock of key the same lock as the one on the corresponding record data in data page eg at the record level not acquiring in the interest of permitting very high concurrency commit duration locks on index pages even during index structure modification operations smos like page splits and page deletions and allowing retrievals inserts and deletes to go on concurrently with smos during restart recovery any necessary redos of index changes are always performed in page oriented fashion ie without traversing the index tree and during normal processing and restart recovery whenever possible undos are performed in page oriented fashion aries im permits different granularities of locking to be supported in flexible manner subset of aries im has been implemented in the os extended edition database manager since the locking ideas of aries im have general applicability some of them have also been implemented in sql ds and the vm shared file system even though those systems use the shadow page technique for recovery
this paper explores optimization techniques of the synchronization mechanisms for mpsocs based on complex interconnect network on chip targeted at future power efficient systems the proposed solution is based on the idea of locally performing synchronization operations which require the continuous polling of shared variable thus featuring large contention eg spin locks we introduce hw module the synchronization operation buffer sb which queues and manages the requests issued by the processors experimental validation has been carried out by using grapes cycle accurate performance power simulation platform for processor target architecture we show that the proposed solution achieves up to performance improvement and energy saving with respect to synchronization based on directory based coherence protocol
we propose flexible interaction mechanism for cbir by enabling relevance feedback inside images through drawing strokes user’s interest is obtained from an easy to use user interface and fused seamlessly with traditional feedback information in semi supervised learning framework retrieval performance is boosted due to more precise description of the query concept region segmentation is also improved based on the collected strokes and further enhances the retrieval precision we implement our system flexible image search tool fist based on the ideas above experiments on two real world data sets demonstrate the effectiveness of our approach
hypervisors are increasingly utilized in modern computer systems ranging from pcs to web servers and data centers aside from server applications hypervisors are also becoming popular target for implementing many security systems since they provide small and easy to secure trusted computing base this paper presents novel way of using hypervisors to protect application data privacy even when the underlying operating system is not trustable each page in virtual address space is rendered to user applications according to the security context the application is running in the hypervisor encrypts and decrypts each memory page requested depending on the application’s access permission to the page the main result of this system is the complete removal of the operating system from the trust base for user applications data privacy to reduce the runtime overhead of the system two optimization techniques are employed we use page frame replication to reduce the number of cryptographic operations by keeping decrypted versions of page frame we also employ lazy synchronization to minimize overhead due to an update to one of the replicated page frame our system is implemented and evaluated by modifying the xen hypervisor showing that it increases the application execution time only by for cpu and memory intensive workloads
most people consider database to be merely data repository that supports data storage and retrieval actually database contains rich inter related multi typed data and information forming one or set of gigantic interconnected heterogeneous information networks much knowledge can be derived from such information networks if we systematically develop an effective and scalable database oriented information network analysis technology in this tutorial we introduce database oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency facilitate data integration and generate interesting knowledge this tutorial presents an organized picture on how to turn database into one or set of organized heterogeneous information networks how information networks can be used for data cleaning data consolidation and data quality improvement how to discover various kinds of knowledge from information networks how to perform olap in information networks and how to transform database data into knowledge by information network analysis moreover we present interesting case studies on real datasets including dblp and flickr and show how interesting and organized knowledge can be generated from database oriented information networks
support vector machine svm is novel classifier based on the statistical learning theory to increase the performance of classification the approach of svm with kernel is usually used in classification tasks in this study we first attempted to investigate the performance of svm with kernel several kernel functions polynomial rbf summation and multiplication were employed in the svm and the feature selection approach developed by hermes and buhmann feature selection for support vector machines in proceedings of the international conference on pattern recognition icpr was utilized to determine the important features then hypertension diagnosis case was implemented and anthropometrical factors related to hypertension were selected implementation results show that the performance of combined kernel approach is better than the single kernel approach compared with backpropagation neural network method svm based method was found to have better performance based on two epidemiological indices namely sensitivity and specificity
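As a rough illustration of the kernel-combination idea mentioned above, the sketch below builds a summation (or multiplication) of a polynomial and an RBF kernel and trains an SVM on the precomputed Gram matrix. The kernel parameters and the toy data are invented; this is not the paper's experimental setup.

```python
# Sketch: combine polynomial and RBF kernels by summation (or product) and
# train an SVM on the precomputed Gram matrix. Parameters are illustrative.
import numpy as np
from sklearn.svm import SVC

def poly_kernel(A, B, degree=2, c=1.0):
    return (A @ B.T + c) ** degree

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)

K_sum = poly_kernel(X, X) + rbf_kernel(X, X)      # summation kernel
# K_prod = poly_kernel(X, X) * rbf_kernel(X, X)   # multiplication kernel

clf = SVC(kernel="precomputed").fit(K_sum, y)
K_test = poly_kernel(X[:5], X) + rbf_kernel(X[:5], X)
print(clf.predict(K_test))
```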
the goal of query focused summarization is to extract summary for given query from the document collection although much work has been done for this problem there are still many challenging issues first the length of the summary is predefined by for example the number of word tokens or the number of sentences second a query usually asks for information of several perspectives or topics however existing methods cannot capture topical aspects with respect to the query in this paper we propose novel approach by combining statistical topic model and affinity propagation specifically the topic model called qlda can simultaneously model documents and the query moreover the affinity propagation can automatically discover key sentences from the document collection without predefining the length of the summary experimental results on duc and duc data sets show that our approach is effective and the summarization performance is better than baseline methods
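The snippet below sketches only the exemplar-selection half of the idea above: affinity propagation over a sentence similarity matrix picks key sentences without fixing the summary length in advance. A plain TF-IDF cosine similarity stands in for the qLDA topic model, and the sentences are invented, so this is an illustration of the clustering step rather than the paper's system.

```python
# Exemplar sentences via affinity propagation on a precomputed similarity
# matrix; TF-IDF cosine similarity stands in for the topic model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AffinityPropagation

sentences = [
    "the summary length does not need to be fixed in advance",
    "affinity propagation discovers exemplar sentences automatically",
    "a query may touch several topical aspects of the collection",
    "topic models capture the topical aspects of documents and the query",
    "exemplar sentences form the query focused summary",
]
tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = (tfidf @ tfidf.T).toarray()          # cosine similarity

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(similarity)
summary = [sentences[i] for i in ap.cluster_centers_indices_]
print(summary)                                    # exemplar sentences, count chosen by AP
```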
we have designed and implemented vmgl virtual machine monitor vmm independent graphics processing unit gpu independent and cross platform opengl virtualization solution vmgl allows applications executing within virtual machines vms to leverage hardware rendering acceleration thus solving problem that has limited virtualization of growing class of graphics intensive applications vmgl also provides applications running within vms with suspend and resume capabilities across gpus from different vendors our experimental results from number of graphics intensive applications show that vmgl provides excellent rendering performance within or better of that obtained with native graphics hardware acceleration further vmgl’s performance is two orders of magnitude better than that of software rendering the commonly available alternative today for graphics intensive applications running in virtualized environments our results confirm vmgl’s portability across vmware workstation and xen on vt and non vt hardware and across linux with and without paravirtualization freebsd and solaris our results also show that the resource demands of vmgl align well with the emerging trend of multi core processors
we prove secure concrete and practical two round authenticated message exchange protocol which reflects the authentication mechanisms for web services discussed in various standardization documents the protocol consists of single client request and subsequent server response and works under the realistic assumptions that the responding server is long lived has bounded memory and may be reset occasionally the protocol is generic in the sense that it can be used to implement securely any service based on authenticated message exchange because request and response can carry arbitrary payloads our security analysis is computational analysis in the bellare rogaway style and thus provides strong guarantees it is novel from technical point of view since we extend the bellare rogaway framework by timestamps and payloads with signed parts
the wide adoption of xml has increased the interest on data models that are based on tree structured data querying capabilities are provided through tree pattern queries tpqs the need for querying tree structured data sources when their structure is not fully known and the need to integrate multiple data sources with different tree structures have driven recently the suggestion of query languages that relax the complete specification of tree pattern assigning semantics to the queries of these languages so that they return meaningful answers is challenging issue in this paper we introduce query language which allows the specification of partial tree pattern queries ptpqs the structure in ptpq can be flexibly specified fully partially or not at all we define index graphs which summarize the structural information of data trees using index graphs we show that ptpqs can be evaluated through the generation of an equivalent set of complete tpqs we suggest an original approach that exploits the set of complete tpqs of ptpq to assign meaningful semantics to the ptpq language in contrast to previous approaches that operate locally on the data to compute meaningful answers usually by computing lowest common ancestors our approach operates globally on index graphs to detect meaningful complete tpqs we implemented and experimentally evaluated our approach on dblp based data sets with irregularities its comparison to previous ones shows that it succeeds in finding all the meaningful answers when the others fail perfect recall further it outperforms approaches with similar recall in excluding meaningless answers better precision finally it is superior to and scales better than the only previous approach that allows for structural constraints in the queries our approach generates tpqs and therefore it can be easily implemented on top of an xquery engine
we present new limited form of interprocedural analysis called field analysis that can be used by compiler to reduce the costs of modern language features such as object oriented programming automatic memory management and run time checks required for type safety unlike many previous interprocedural analyses our analysis is cheap and does not require access to the entire program field analysis exploits the declared access restrictions placed on fields in modular language eg field access modifiers in java in order to determine useful properties of fields of an object we describe our implementation of field analysis in the swift optimizing compiler for java as well set of optimizations that exploit the results of field analysis these optimizations include removal of run time tests compile time resolution of method calls object inlining removal of unnecessary synchronization and stack allocation our results demonstrate that field analysis is efficient and effective speedups average on wide range of applications with some times reduced by up to compile time overhead of field analysis is about
it is widely recognised that paper remains pervasive resource for collaboration and yet there has been uncertain progress in developing technologies that aim to enhance paper documents with computational capabilities in this article we discuss the design of technology that interweaves developments in hardware and materials electronics and software and seeks to create new affinities between digital content and paper the design of the technology drew from findings from naturalistic studies of the uses of paper particularly when considering how users might interact with the augmented technology we briefly review these studies and discuss the results of an evaluation of the emerging technology analysis of the fine details of the conduct of participants in these assessments suggest how even when we design simple forms of interaction with device these can be shaped and transformed by the participation and collaboration of others
we present physics based approach to synthesizing motion of virtual character in dynamically varying environment our approach views the motion of responsive virtual character as sequence of solutions to the constrained optimization problem formulated at every time step this framework allows the programmer to specify active control strategies using intuitive kinematic goals significantly reducing the engineering effort entailed in active body control our optimization framework can incorporate changes in the character’s surroundings through synthetic visual sensory system and create significantly different motions in response to varying environmental stimuli our results show that our approach is general enough to encompass wide variety of highly interactive motions
an xml to relational mapping scheme consists of procedure for shredding documents into relational databases procedure for publishing databases back as documents and set of constraints the databases must satisfy in previous work we defined two notions of information preservation for mapping schemes losslessness which guarantees that any document can be reconstructed from its corresponding database and validation which requires every legal database to correspond to valid document we also described one information preserving mapping scheme called edge and showed that under reasonable assumptions losslessness and validation are both undecidable this leads to the question we study in this paper how to design mapping schemes that are information preserving we propose to do it by starting with scheme known to be information preserving and applying to it equivalence preserving transformations written in weakly recursive ilog we study an instance of this framework the lilo algorithm and show that it provides significant performance improvements over edge and introduces constraints that are efficiently enforced in practice
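To make the shredding direction of such mapping schemes concrete, here is a minimal edge-style flattening of an XML document into a single relation of (source, ordinal, tag, target, text) tuples. It is only a rough illustration of the idea behind an edge mapping; the publishing procedure, the constraints, and the lilo transformations discussed above are not modeled, and the column layout is an assumption.

```python
# Minimal edge-style shredding: every parent-child edge of the XML tree
# becomes one tuple (source_id, ordinal, tag, target_id, text).
import itertools
import xml.etree.ElementTree as ET

def shred(xml_text):
    counter = itertools.count()
    rows = []

    def visit(elem):
        node = next(counter)                     # assign an id to this element
        for ordinal, child in enumerate(elem):
            child_id = visit(child)              # recurse, get the child's id
            rows.append((node, ordinal, child.tag, child_id,
                         (child.text or "").strip()))
        return node

    visit(ET.fromstring(xml_text))
    return rows

for row in shred("<book><title>XML</title><author>a</author></book>"):
    print(row)        # e.g. (0, 0, 'title', 1, 'XML'), (0, 1, 'author', 2, 'a')
```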
most partial evaluators do not take the availability of machine level resources such as registers or cache into consideration when making their specialization decisions the resulting resource contention can lead to severe performance degradation causing in extreme cases the specialized code to run slower than the unspecialized code in this paper we consider how resource considerations can be incorporated within partial evaluator we develop an abstract formulation of the problem show that optimal resource bounded partial evaluation is np complete and discuss simple heuristics that can be used to address the problem in practice
unification is one of the key procedures in first order theorem provers most first order theorem provers use the robinson unification algorithm although its complexity is in the worst case exponential the algorithm is easy to implement and examples on which it may show exponential behaviour are believed to be atypical more sophisticated algorithms such as the martelli and montanari algorithm offer polynomial complexity but are harder to implement very little is known about the practical performance of unification algorithms in theorem provers previous case studies have been conducted on small numbers of artificially chosen problems and compared term to term unification while the best theorem provers perform set of terms to term unification using term indexing to evaluate the performance of unification in the context of term indexing we made large scale experiments over the tptp library containing thousands of problems using the compit methodology our results confirm that the robinson algorithm is the most efficient one in practice they also reveal main sources of inefficiency in other algorithms we present these results and discuss various modifications of unification algorithms
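For reference, a textbook-style Robinson unifier with an occurs check is shown below; it is a plain term-to-term unifier over an assumed term representation, not the indexed set-of-terms-to-term variant evaluated above.

```python
# Robinson-style unification with occurs check.
# Terms are ('var', name) or ('fun', symbol, [args]); substitution is a dict.

def walk(t, subst):
    while t[0] == 'var' and t[1] in subst:
        t = subst[t[1]]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t[0] == 'var':
        return t[1] == v
    return any(occurs(v, a, subst) for a in t[2])

def unify(s, t, subst=None):
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s[0] == 'var':
        if s == t:
            return subst
        if occurs(s[1], t, subst):
            return None                       # occurs-check failure
        return {**subst, s[1]: t}
    if t[0] == 'var':
        return unify(t, s, subst)
    if s[1] != t[1] or len(s[2]) != len(t[2]):
        return None                           # symbol or arity clash
    for a, b in zip(s[2], t[2]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# f(X, g(X)) unified with f(a, Y):  X -> a, Y -> g(X)
a, x, y = ('fun', 'a', []), ('var', 'X'), ('var', 'Y')
print(unify(('fun', 'f', [x, ('fun', 'g', [x])]), ('fun', 'f', [a, y])))
```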
real time active database systems rtadbss have attracted considerable amount of research attention in the very recent past and number of important applications have been identified for such systems such as telecommunications network management automated air traffic control automated financial trading process control and military command and control systems in spite of the recognized importance of this area very little research has been devoted to exploring the dynamics of transaction processing in rtadbss concurrency control cc constitutes an integral part of any transaction processing strategy and thus deserves special attention in this paper we study cc strategies in rtadbss and postulate number of cc algorithms these algorithms exploit the special needs and features of rtadbss and are shown to deliver substantially superior performance to conventional real time cc algorithms
with the advent of internet and world wide web www technologies distance education learning or web based learning has enabled new era of education there are number of issues that have significant impact on distance education including those from educational sociological and psychological perspectives rather than attempting to cover exhaustively all the related perspectives in this survey article we focus on the technological issues number of technology issues are discussed including distributed learning collaborative learning distributed content management mobile and situated learning and multimodal interaction and augmented devices for learning although we have tried to include the state of the art technologies and systems here it is anticipated that many new ones will emerge in the near future as such we point out several emerging issues and technologies that we believe are promising for the purpose of highlighting important directions for future research
this paper addresses the inherent unreliability and instability of worker nodes in large scale donation based distributed infrastructures such as pp and grid systems we present adaptive scheduling techniques that can mitigate this uncertainty and significantly outperform current approaches in this work we consider nodes that execute tasks via donated computational resources and may behave erratically or maliciously we present model in which reliability is not binary property but statistical one based on nodes prior performance and behavior we use this model to construct several reputation based scheduling algorithms that employ estimated reliability ratings of worker nodes for efficient task allocation our scheduling algorithms are designed to adapt to changing system conditions as well as non stationary node reliability through simulation we demonstrate that our algorithms can significantly improve throughput while maintaining very high success rate of task completion our results suggest that reputation based scheduling can handle wide variety of worker populations including non stationary behavior with overhead that scales well with system size we also show that our adaptation mechanism allows the application designer fine grain control over desired performance metrics
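A deliberately simple stand-in for the reputation-based scheduling idea, not the paper's algorithms: each worker's reliability estimate is a smoothed success rate updated after every task, and tasks go to the worker with the best current estimate. The decay factor, the prior, and the simulated worker behaviours are all made up.

```python
# Toy reputation-based scheduler: reliability scores are exponentially
# smoothed success rates, and each task goes to the most reputable worker.
import random

class Worker:
    def __init__(self, name, true_reliability, decay=0.9):
        self.name, self.p, self.decay = name, true_reliability, decay
        self.score = 0.5                       # neutral prior reputation

    def run(self, task):
        ok = random.random() < self.p          # hidden true behaviour
        self.score = self.decay * self.score + (1 - self.decay) * (1.0 if ok else 0.0)
        return ok

def schedule(tasks, workers):
    done = 0
    for task in tasks:
        worker = max(workers, key=lambda w: w.score)   # pick most reputable
        done += worker.run(task)
    return done

random.seed(1)
workers = [Worker("w1", 0.95), Worker("w2", 0.4), Worker("w3", 0.7)]
print(schedule(range(200), workers), "of 200 tasks succeeded")
```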
this paper describes scalable low complexity alternative to the conventional load store queue lsq for superscalar processors that execute load and store instructions speculatively and out of order prior to resolving their dependences whereas the lsq requires associative and age prioritized searches for each access we propose that an address indexed store forwarding cache sfc perform store to load forwarding and that an address indexed memory disambiguation table mdt perform memory disambiguation neither structure includes cam the sfc behaves as small cache accessed speculatively and out of order by both loads and stores because the sfc does not rename in flight stores to the same address violations of memory anti and output dependences can cause in flight loads to obtain incorrect values from the sfc therefore the mdt uses sequence numbers to detect and recover from true anti and output memory dependence violations we observe empirically that loads and stores that violate anti and output memory dependences are rarely on program critical path and that the additional cost of enforcing predicted anti and output dependences among these loads and stores is minimal in conjunction with scheduler that enforces predicted anti and output dependences the mdt and sfc yield performance equivalent to that of large lsq that has similar or greater circuit complexity the sfc and mdt are scalable structures that yield high performance and lower dynamic power consumption than the lsq and they are well suited for checkpointed processors with large instruction windows
the successful deployment of digital technologies by humanities scholars presents computer scientists with number of unique scientific and technological challenges the task seems particularly daunting because issues in the humanities are presented in abstract language demanding the kind of subtle interpretation often thought to be beyond the scope of artificial intelligence and humanities scholars themselves often disagree about the structure of their disciplines the future of humanities computing depends on having tools for automatically discovering complex semantic relationships among different parts of corpus digital library tools for the humanities will need to be capable of dynamically tracking the introduction of new ideas and interpretations and applying them to older texts in ways that support the needs of scholars and students this paper describes the design of new algorithms and the adjustment of existing algorithms to support the automated and semi automated management of domain rich metadata for an established digital humanities project the stanford encyclopedia of philosophy our approach starts with hand built formal ontology that is modified and extended by combination of automated and semi automated methods thus becoming dynamic ontology we assess the suitability of current information retrieval and information extraction methods for the task of automatically maintaining the ontology we describe novel measure of term relatedness that appears to be particularly helpful for predicting hierarchical relationships in the ontology we believe that our project makes further contribution to information science by being the first to harness the collaboration inherent in expert maintained dynamic reference work to the task of maintaining and verifying formal ontology we place special emphasis on the task of bringing domain expertise to bear on all phases of the development and deployment of the system from the initial design of the software and ontology to its dynamic use in fully operational digital reference work
library of algorithms developed as algorithmic cyberfilms is presented algorithmic cyberfilms are new type of software components for presentation specification programming and automatic code generation of computational algorithms the algorithmic cyberfilm format is implemented as set of multimedia frames and scenes and each component is represented by frames of algorithmic skeletons representing dynamical features of an algorithm by frames of integrated view providing static features of the algorithm in compact format and by corresponding template codes supporting the program generation we developed library which is collection of basic and advanced algorithms taught at universities including computation on grids trees and graphs in this paper we present basic constructs of visual languages which are used for representing cyberfilms as well as for demonstrating the library components we also provide general overview of the library and its features in addition we discuss results of experiments which were conducted to verify the usability of the library components and their usefulness in education
the join operation is one of the fundamental relational database query operations it facilitates the retrieval of information from two different relations based on cartesian product of the two relations the join is one of the most difficult operations to implement efficiently as no predefined links between relations are required to exist as they are with network and hierarchical systems the join is the only relational algebra operation that allows the combining of related tuples from relations on different attribute schemes since it is executed frequently and is expensive much research effort has been applied to the optimization of join processing in this paper the different kinds of joins and the various implementation techniques are surveyed these different methods are classified based on how they partition tuples from different relations some require that all tuples from one be compared to all tuples from another other algorithms only compare some tuples from each in addition some techniques perform an explicit partitioning whereas others are implicit
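One of the explicit-partitioning techniques of the kind surveyed above, shown in miniature: an in-memory hash join that partitions the smaller relation into a hash table on the join attribute and probes it with the larger relation. The relations and attribute names are illustrative.

```python
# In-memory hash join: build a hash table on the join key from one relation,
# then probe it with the other relation and emit merged tuples.

def hash_join(build_rel, probe_rel, build_key, probe_key):
    table = {}
    for row in build_rel:                         # build phase
        table.setdefault(row[build_key], []).append(row)
    for row in probe_rel:                         # probe phase
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

employees = [{"emp": "ann", "dept_id": 1}, {"emp": "bob", "dept_id": 2}]
departments = [{"dept_id": 1, "dept": "db"}, {"dept_id": 2, "dept": "os"}]

for joined in hash_join(departments, employees, "dept_id", "dept_id"):
    print(joined)     # {'dept_id': 1, 'dept': 'db', 'emp': 'ann'} ...
```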
vectorization has been an important method of using data level parallelism to accelerate scientific workloads on vector machines such as cray for the past three decades in the last decade it has also proven useful for accelerating multi media and embedded applications on short simd architectures such as mmx sse and altivec most of the focus has been directed at innermost loops effectively executing their iterations concurrently as much as possible outer loop vectorization refers to vectorizing level of loop nest other than the innermost which can be beneficial if the outer loop exhibits greater data level parallelism and locality than the innermost loop outer loop vectorization has traditionally been performed by interchanging an outer loop with the innermost loop followed by vectorizing it at the innermost position more direct unroll and jam approach can be used to vectorize an outer loop without involving loop interchange which can be especially suitable for short simd architectures in this paper we revisit the method of outer loop vectorization paying special attention to properties of modern short simd architectures we show that even though current optimizing compilers for such targets do not apply outer loop vectorization in general it can provide significant performance improvements over innermost loop vectorization our implementation of direct outer loop vectorization available in gcc achieves speedup factors of and on average across set of benchmarks compared to and achieved by innermost loop vectorization when running on cell be spu and powerpc processors respectively moreover outer loop vectorization provides new reuse opportunities that can be vital for such short simd architectures including efficient handling of alignment we present an optimization tapping such opportunities capable of further boosting the performance obtained by outer loop vectorization to achieve average speedup factors of and
missing values issue in databases is an important problem because missing values bias the information provided by the usual data mining methods in this paper we are searching for mining patterns satisfying correct properties in presence of missing values it means that these patterns must satisfy the properties in the corresponding complete database we focus on free patterns thanks to new definition of this property suitable for incomplete data and compatible with the usual one we certify that the extracted free patterns in an incomplete database also satisfy this property in the corresponding complete database moreover this approach enables us to provide an anti monotone criterion with respect to the pattern inclusion and thus design an efficient level wise algorithm which extracts correct free patterns in presence of missing values
sensor networks are widely used in many applications to collaboratively collect information from the physical environment in these applications the exploration of the relationship and linkage of sensing data within multiple regions can be naturally expressed by joining tuples in these regions however the highly distributed and resource constraint nature of the network makes join challenging query in this paper we address the problem of processing join query among different regions progressively and energy efficiently in sensor networks the proposed algorithm peja progressive energy efficient join algorithm adopts an event driven strategy to output the joining results as soon as possible and alleviates the storage shortage problem in the in network nodes it also installs filters in the joining regions to prune unmatchable tuples in the early processing phase saving lots of unnecessary transmissions extensive experiments on both synthetic and real world data sets indicate that the peja scheme outperforms other join algorithms and it is effective in reducing the number of transmissions and the delay of query results during the join processing
concurrent garbage collectors require write barriers to preserve consistency but these barriers impose significant direct and indirect costs while there has been lot of work on optimizing write barriers we present the first study of their elision in concurrent collector we show conditions under which write barriers are redundant and describe how these conditions can be applied to both incremental update or snapshot at the beginning barriers we then evaluate the potential for write barrier elimination with trace based limit study which shows that significant percentage of write barriers are redundant on average of incremental barriers and of snapshot barriers are unnecessary
we introduce novel spatial join operator the ring constrained join rcj given two sets p and q of spatial points the result of rcj consists of pairs (p q) where p ∈ P and q ∈ Q satisfying an intuitive geometric constraint the smallest circle enclosing p and q contains no other points in P ∪ Q this new operation has important applications in decision support eg placing recycling stations at fair locations between restaurants and residential complexes clearly rcj is defined based on geometric constraint but not on distances between points thus our operation is fundamentally different from the conventional distance joins and closest pairs problems we are not aware of efficient processing algorithms for rcj in the literature brute force solution requires computational cost quadratic to input size and it does not scale well for large datasets in view of this we develop efficient tree based algorithms for computing rcj by exploiting the characteristics of the geometric constraint we evaluate experimentally the efficiency of our methods on synthetic and real spatial datasets the results show that our proposed algorithms scale well with the data size and have robust performance across different data distributions
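The brute-force baseline mentioned above can be stated directly: for every candidate pair the smallest enclosing circle is the one with diameter pq, so the predicate is just a test that no other point falls strictly inside that circle. The sketch below only expresses the predicate on made-up points (treating boundary points as allowed); the paper's tree-based algorithms are the scalable alternative.

```python
# Brute-force ring-constrained join: accept (p, q) if no other point lies
# strictly inside the circle whose diameter is the segment pq.
from itertools import product
from math import dist

def ring_constrained_join(P, Q):
    others = set(P) | set(Q)
    result = []
    for p, q in product(P, Q):
        if p == q:
            continue
        center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        radius = dist(p, q) / 2
        if all(dist(r, center) >= radius - 1e-9
               for r in others if r not in (p, q)):
            result.append((p, q))
    return result

P = [(0, 0), (4, 0)]
Q = [(1, 0), (5, 5)]
print(ring_constrained_join(P, Q))
# (0, 0)-(5, 5) is ruled out because other points fall inside its circle
```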
we prove that with high probability skip graph contains regular expander as subgraph and estimate the quality of the expansion via simulations as consequence skip graphs contain large connected component even after an adversarial deletion of nodes we show how the expansion property could be used to sample node in the skip graph in highly efficient manner we also show that the expansion property could be used to load balance the skip graph quickly finally it is shown that the skip graph could serve as an unstructured pp system thus it is good candidate for hybrid pp system
enumerating maximal biclique subgraphs from graph is computationally challenging problem in this paper we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph for an undirected graph g without self loops we prove that i the number of closed patterns in the adjacency matrix of g is even and ii for every maximal biclique subgraph there always exists unique pair of closed patterns that matches the two vertex sets of the subgraph therefore the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns which are algorithms extensively studied in the data mining field however this direct use of existing algorithms causes duplicated enumeration to achieve high efficiency we propose an mn time delay algorithm for non duplicated enumeration in particular for enumerating those maximal bicliques with large size where m and n are the number of edges and vertices of the graph respectively we evaluate the high efficiency of our algorithm by comparing it to state of the art algorithms on many graphs
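The correspondence used above can be illustrated by brute force on a bipartite toy graph: every closed column set is an intersection of row neighbourhoods, and pairing it with its row closure gives a maximal biclique. The sketch is exponential in the number of rows, unlike the polynomial-delay algorithm of the paper, and the graph is invented.

```python
# Brute-force maximal bicliques of a bipartite graph via closed patterns:
# intersect neighbourhoods of every row subset, then take the row closure.
from itertools import combinations

adj = {"r1": {"c1", "c2"}, "r2": {"c1", "c2", "c3"}, "r3": {"c2", "c3"}}

def maximal_bicliques(adj):
    found = set()
    rows = list(adj)
    for k in range(1, len(rows) + 1):
        for subset in combinations(rows, k):
            cols = set.intersection(*(adj[r] for r in subset))
            if not cols:
                continue
            closure = frozenset(r for r in rows if cols <= adj[r])
            found.add((closure, frozenset(cols)))   # duplicates collapse here
    return found

for rows, cols in sorted(maximal_bicliques(adj), key=lambda x: sorted(x[0])):
    print(sorted(rows), sorted(cols))
# prints the four maximal bicliques of this toy graph
```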
in this paper we propose complete framework for geometry modeling and processing that uses only fast geodesic computations the basic building block for these techniques is novel greedy algorithm to perform uniform or adaptive remeshing of triangulated surface our other contributions include parameterization scheme based on barycentric coordinates an intrinsic algorithm for computing geodesic centroidal tessellations and fast and robust method to flatten genus surface patch on large meshes more than vertices our techniques speed up computation by over one order of magnitude in comparison to classical remeshing and parameterization methods our methods are easy to implement and do not need multilevel solvers to handle complex models that may contain poorly shaped triangles
unary inclusion dependencies are database constraints expressing subset relationships the decidability of implication for these dependencies together with embedded implicational dependencies such as functional dependencies are investigated as shown by casanova et al the unrestricted and finite implication problems are different for the class of functional and unary inclusion dependencies also for this class and for any fixed finite implication has no ary complete axiomatization for both of these problems complete axiomatizations and polynomial time decision procedures are provided linear time for unrestricted implication and cubic time for finite implication it follows that functional and unary inclusion dependencies form semantically natural class of first order sentences with equality which although not finitely controllable is efficiently solvable and docile generalizing from these results it is shown that the interaction between functional and inclusion dependencies characterizes unrestricted implication of unary inclusion and all embedded implicational dependencies finite implication of unary inclusion and all full implicational dependencies finite implication of unary inclusion and all embedded tuple generating dependencies as direct consequence of this analysis most of the applications of dependency implication are extended within polynomial time to database design problems involving unary inclusion dependencies such examples are tests for lossless joins and tests for complementarity of projective views finally if one additionally requires that
current mobile phone technologies have fostered the emergence of new generation of mobile applications such applications allow users to interact and share information opportunistically when their mobile devices are in physical proximity or close to fixed installations it has been shown how mobile applications such as collaborative filtering and location based services can take advantage of ad hoc connectivity to use physical proximity as filter mechanism inherent to the application logic we discuss the different modes of information sharing that arise in such settings based on the models of persistence and synchronisation we present platform that supports the development of applications that can exploit these modes of ad hoc information sharing and by means of an example show how such an application can be realised based on the supported event model
most virtual channel routers have multiple virtual channels to mitigate the effects of head of line blocking when there are more flows than virtual channels at link packets or flows must compete for channels either in dynamic way at each link or by static assignment computed before transmission starts in this paper we present methods that statically allocate channels to flows at each link when oblivious routing is used and ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available we then experimentally explore the performance trade offs of static and dynamic virtual channel allocation for various oblivious routing methods including dor romm valiant and novel bandwidth sensitive oblivious routing scheme bsorm through judicious separation of flows static allocation schemes often exceed the performance of dynamic allocation schemes
an instance of the maximum constraint satisfaction problem max csp is finite collection of constraints on set of variables and the goal is to assign values to the variables that maximises the number of satisfied constraints max csp captures many well known problems such as maxk sat and max cut and is consequently np hard thus it is natural to study how restrictions on the allowed constraint types or constraint language affect the complexity and approximability of max csp the pcp theorem is equivalent to the existence of constraint language for which max csp has hard gap at location ie it is np hard to distinguish between satisfiable instances and instances where at most some constant fraction of the constraints are satisfiable all constraint languages for which the csp problem ie the problem of deciding whether all constraints can be satisfied is currently known to be np hard have certain algebraic property we prove that any constraint language with this algebraic property makes max csp have hard gap at location which in particular implies that such problems cannot have ptas unless np we then apply this result to max csp restricted to single constraint type this class of problems contains for instance max cut and max dicut assuming
inspired by the recently introduced framework of and or search spaces for graphical models we propose to augment multi valued decision diagrams mdd with and nodes in order to capture function decomposition structure and to extend these compiled data structures to general weighted graphical models eg probabilistic models we present the and or multi valued decision diagram aomdd which compiles graphical model into canonical form that supports polynomial eg solution counting belief updating or constant time eg equivalence of graphical models queries we provide two algorithms for compiling the aomdd of graphical model the first is search based and works by applying reduction rules to the trace of the memory intensive and or search algorithm the second is inference based and uses bucket elimination schedule to combine the aomdds of the input functions via the apply operator for both algorithms the compilation time and the size of the aomdd are in the worst case exponential in the treewidth of the graphical model rather than pathwidth as is known for ordered binary decision diagrams obdds we introduce the concept of semantic treewidth which helps explain why the size of decision diagram is often much smaller than the worst case bound we provide an experimental evaluation that demonstrates the potential of aomdds
compiling concurrent programs to run on sequential processor presents difficult tradeoff between execution time and size of generated code on one hand the process based approach to compilation generates reasonable sized code but incurs significant execution overhead due to concurrency on the other hand the automata based approach incurs much smaller execution overhead but can result in code that is several orders of magnitude larger this paper proposes way of combining the two approaches so that the performance of the automata based approach can be achieved without suffering the code size increase due to it the key insight is that the best of the two approaches can be achieved by using symbolic execution similar to the automata based approach to generate code for the commonly executed paths referred to as fast paths and using the process based approach to generate code for the rest of the program we demonstrate the effectiveness of this approach by implementing our techniques in the esp compiler and applying them to set of filter programs and to vmmc network firmware
loop fusion and loop shifting are important transformations for improving data locality to reduce the number of costly accesses to off chip memories since exploring the exact platform mapping for all the loop transformation alternatives is time consuming process heuristics steered by improved data locality are generally used however pure locality estimates do not sufficiently take into account the hierarchy of the memory platform this paper presents fast incremental technique for hierarchical memory size requirement estimation for loop fusion and loop shifting at the early loop transformations design stage as the exact memory platform is often not yet defined at this stage we propose platform independent approach which reports the pareto optimal trade off points for scratch pad memory size and off chip memory accesses the estimation comes very close to the actual platform mapping experiments on realistic test vehicles confirm that it helps the designer or tool to find the interesting loop transformations that should then be investigated in more depth afterward
barriers locks and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race free parallel programs often times these operations are placed suboptimally either because of conservative assumptions about the program or merely for code simplicity we propose speculative synchronization which applies the philosophy behind thread level speculation tls to explicitly parallel applications speculative threads execute past active barriers busy locks and unset flags instead of waiting the proposed hardware checks for conflicting accesses and if violation is detected offending speculative thread is rolled back to the synchronization point and restarted on the fly tls’s principle of always keeping safe thread is key to our proposal in any speculative barrier lock or flag the existence of one or more safe threads at all times guarantees forward progress even in the presence of access conflicts or speculative buffer overflow our proposal requires simple hardware and no programming effort furthermore it can coexist with conventional synchronization at run time we use simulations to evaluate compiler and hand parallelized applications our results show reduction in the time lost to synchronization of on average and reduction in overall program execution time of on average
this paper presents system for designing freeform surfaces with collection of curves the user first creates rough model by using sketching interface unlike previous sketching systems the user drawn strokes stay on the model surface and serve as handles for controlling the geometry the user can add remove and deform these control curves easily as if working with line drawing the curves can have arbitrary topology they need not be connected to each other for given set of curves the system automatically constructs smooth surface embedding by applying functional optimization our system provides real time algorithms for both control curve deformation and the subsequent surface optimization we show that one can create sophisticated models using this system which have not yet been seen in previous sketching or functional optimization systems
to be an effective platform for high performance distributed applications off the shelf object request broker orb middleware such as corba must preserve communication layer quality of service qos properties both vertically ie network interface ↔ application layer and horizontally ie end to end however conventional network interfaces subsystems and middleware interoperability protocols are not well suited for applications that possess stringent throughput latency and jitter requirements it is essential therefore to develop vertically and horizontally integrated orb endsystems that can be configured flexibly to support high performance network interfaces and subsystems and used transparently by performance sensitive applications this paper provides three contributions to research on high performance support for qos enabled orb middleware first we outline the key research challenges faced by high performance orb endsystem developers second we describe how our real time rio subsystem and pluggable protocol framework enables orb endsystems to preserve high performance network interface qos up to applications running on off the shelf hardware and software third we illustrate empirically how highly optimized orb middleware can be integrated with real time subsystem to reduce latency bounds on communication between high priority clients without unduly penalizing low priority and best effort clients our results demonstrate how it is possible to develop orb endsystems that are both highly flexible and highly efficient
background test first programming is regarded as one of the software development practices that can make unit tests to be more rigorous thorough and effective in fault detection code coverage measures can be useful as indicators of the thoroughness of unit test suites while mutation testing turned out to be effective at finding faults objective this paper presents an experiment in which test first vs test last programming practices are examined with regard to branch coverage and mutation score indicator of unit tests method student subjects were randomly assigned to test first and test last groups in order to further reduce pre existing differences among subjects and to get more sensitive measure of our experimental effect multivariate analysis of covariance was performed results multivariate tests results indicate that there is no statistically significant difference between test first and test last practices on the combined dependent variables ie branch coverage and mutation score indicator even if we control for the pre test results the subjects experience and when the subjects who showed deviations from the assigned programming technique are excluded from the analysis conclusion according to the preliminary results presented in this paper the benefits of the test first practice in this specific context can be considered minor limitation it is probably the first ever experimental evaluation of the impact of test first programming on mutation score indicator of unit tests and further experimentation is needed to establish evidence
we introduce point based algorithm for computing and rendering stream surfaces and path surfaces of flow the points are generated by particle tracing and an even distribution of those particles on the surfaces is achieved by selective particle removal and creation texture based surface flow visualization is added to show inner flow structure on those surfaces we demonstrate that our visualization method is designed for steady and unsteady flow alike both the path surface component and the texture based flow representation are capable of processing time dependent data finally we show that our algorithms lend themselves to an efficient gpu implementation that allows the user to interactively visualize and explore stream surfaces and path surfaces even when seed curves are modified and even for time dependent vector fields
the major concern of educators is how to enhance the outcome of education better education media used to assist teaching has constantly been sought by researchers in the educational technology domain virtual reality vr has been identified as one of them many have agreed that vr could help to improve performance and conceptual understanding on specific range of task however there is limited understanding of how vr could enhance the learning outcomes this paper reviews types of vr that have been used for learning the theoretical framework for vr learning environment and instructional design for vr based learning environment further research is suggested for vr based learning environment
server providers that support commerce applications as service for multiple commerce web sites traditionally use tiered server architecture this architecture includes an application tier to process requests for dynamically generated content how this tier is provisioned can significantly impact provider’s profit margin in this article we study methods to provision servers in the application serving tier that increase server provider’s profits first we examine actual traces of request arrivals to the application tier of an commerce site and show that the arrival process is effectively poisson next we construct an optimization problem in the context of set of application servers modeled as ps queueing systems and derive three simple methods that approximate the allocation that maximizes profits simulation results demonstrate that our approximation methods achieve profits that are close to optimal and are significantly higher than those achieved via simple heuristics
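The sketch below (Python) illustrates one way such a provisioning decision can be approached; it is not the paper's derivation nor its three approximation methods. Each application server is treated as an M/M/1-PS queue with mean response time 1/(mu - lambda_per_server), and a greedy loop hands each spare server to the site with the largest marginal profit gain. All arrival rates, revenues, delay costs, and server costs in the example are hypothetical.

    # Minimal sketch (not the paper's approximation methods): greedily assign a
    # pool of application servers to sites, modeling each server as an M/M/1-PS
    # queue whose mean response time is 1/(mu - lambda_per_server).

    def mean_response(lam, mu, n):
        """Mean response time when n PS servers evenly share a Poisson stream."""
        per_server = lam / n
        return float("inf") if per_server >= mu else 1.0 / (mu - per_server)

    def site_profit(lam, mu, n, revenue, delay_cost, server_cost):
        """Per-request revenue minus a delay-proportional cost and server rental."""
        t = mean_response(lam, mu, n)
        if t == float("inf"):
            return float("-inf")
        return lam * (revenue - delay_cost * t) - server_cost * n

    def provision(sites, mu, pool, server_cost):
        """sites: list of (lam, revenue, delay_cost). Returns servers per site."""
        # start from the smallest allocation that keeps every site stable
        alloc = [int(lam // mu) + 1 for lam, _, _ in sites]
        spare = pool - sum(alloc)
        for _ in range(max(spare, 0)):
            # give the next server to the site with the largest marginal gain
            gains = [
                site_profit(lam, mu, n + 1, r, c, server_cost)
                - site_profit(lam, mu, n, r, c, server_cost)
                for (lam, r, c), n in zip(sites, alloc)
            ]
            best = max(range(len(sites)), key=lambda i: gains[i])
            if gains[best] <= 0:
                break  # an extra server no longer pays for itself
            alloc[best] += 1
        return alloc

    # hypothetical workload: two sites sharing a pool of six servers
    print(provision([(8.0, 1.0, 0.5), (3.0, 2.0, 0.5)], mu=5.0, pool=6, server_cost=0.2))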
the pervasive internet and the massive deployment of sensor devices have led to huge heterogeneous distributed system connecting millions of data sources and customers together fra on the one hand mediation systems bgl dnjt using xml as an exchange language have been proposed to federate data across distributed heterogeneous data sources on the other hand work msfc aml bgs ndk has been done to integrate data from sensors the challenge is now to integrate data coming from both classical data dbms web sites xml files and dynamic data sensors in the context of an ad hoc network and finally to adapt queries and results to match the client profile we propose to use the tgv model tdnl tdnla as mobile agent to query sources across devices sources and terminal in the context of rescue coordination system this work is integrated in the padawan project
recent work on querying data streams has focused on systems where newly arriving data is processed and continuously streamed to the user in real time in many emerging applications however ad hoc queries and or intermittent connectivity also require the processing of data that arrives prior to query submission or during period of disconnection for such applications we have developed psoup system that combines the processing of ad hoc and continuous queries by treating data and queries symmetrically allowing new queries to be applied to old data and new data to be applied to old queries psoup also supports intermittent connectivity by separating the computation of query results from the delivery of those results psoup builds on adaptive query processing techniques developed in the telegraph project at uc berkeley in this paper we describe psoup and present experiments that demonstrate the effectiveness of our approach
the branch misprediction penalty is major performance limiter and major cause of wasted energy in high performance processors the diverge merge processor reduces this penalty by dynamically predicating wide range of hard to predict branches at runtime in an energy efficient way that doesn’t significantly increase hardware complexity or require major isa changes
splatting based rendering techniques are currently the best choice for efficient high quality rendering of point based geometries however such techniques are not suitable for large magnification especially when the object is under sampled this paper improves the rendering quality of pure splatting techniques using fast dynamic up sampling algorithm for point based geometry our algorithm is inspired by interpolatory subdivision surfaces where the geometry is refined iteratively at each step the refined geometry is that from the previous step enriched by new set of points the point insertion procedure uses three operators local neighborhood selection operator refinement operator adding new points and smoothing operator even though our insertion procedure makes the analysis of the limit surface complicated and does not guarantee its continuity it remains very efficient for high quality real time point rendering indeed while providing an increased rendering quality especially for large magnification our algorithm needs no other preprocessing nor any additional information beyond that used by any splatting technique this extended version real time point cloud refinement in proceedings of eurographics symposium on point based graphics pp contains details on crease handling and more comparisons to other smoothing operators
we present novel culling algorithm that uses deforming non penetration filters to improve the performance of continuous collision detection ccd algorithms the underlying idea is to use simple and effective filter that reduces both the number of false positives and the elementary tests between the primitives this filter is derived from the coplanarity condition and can be easily combined with other methods used to accelerate ccd we have implemented the algorithm and tested its performance on many non rigid simulations in practice we can reduce the number of false positives significantly and improve the overall performance of ccd algorithms by
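The following sketch illustrates a coplanarity-style filter in the same spirit, though it is not the paper's exact deforming non-penetration filter. For four linearly moving points, the signed tetrahedron volume is a cubic in t; if its Bernstein coefficients over [0,1] all share one sign, the cubic cannot vanish and the elementary test can be culled conservatively.

    # Simplified coplanarity-style culling filter (an illustration, not the
    # paper's exact formulation).  Four points moving linearly over a time step
    # can only come into contact at an instant where they are coplanar, i.e.
    # where their signed tetrahedron volume -- a cubic in t -- vanishes.

    def sub(a, b):      return tuple(x - y for x, y in zip(a, b))
    def cross(a, b):    return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])
    def dot(a, b):      return sum(x*y for x, y in zip(a, b))
    def det3(u, v, w):  return dot(cross(u, v), w)   # scalar triple product

    def can_cull(p, q):
        """p, q: start/end positions of four points (lists of four 3-tuples)."""
        # relative vectors from the first point are linear in t: e(t) = e0 + t*ev
        e0 = [sub(p[i], p[0]) for i in (1, 2, 3)]
        e1 = [sub(q[i], q[0]) for i in (1, 2, 3)]
        ev = [sub(b, a) for a, b in zip(e0, e1)]
        u0, v0, w0 = e0
        u1, v1, w1 = ev
        # monomial coefficients of the cubic signed volume V(t)
        c0 = det3(u0, v0, w0)
        c1 = det3(u1, v0, w0) + det3(u0, v1, w0) + det3(u0, v0, w1)
        c2 = det3(u1, v1, w0) + det3(u1, v0, w1) + det3(u0, v1, w1)
        c3 = det3(u1, v1, w1)
        # Bernstein coefficients on [0, 1]; V(t) stays between their min and max,
        # so a uniform sign means no coplanarity event and the pair can be culled
        b = (c0, c0 + c1/3.0, c0 + 2*c1/3.0 + c2/3.0, c0 + c1 + c2 + c3)
        return all(x > 0 for x in b) or all(x < 0 for x in b)

    # a vertex passing through a static triangle's plane cannot be culled
    print(can_cull([(0,0,1),(0,0,0),(1,0,0),(0,1,0)],
                   [(0,0,-1),(0,0,0),(1,0,0),(0,1,0)]))   # False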
this paper describes an approach to the use of citation links to improve the scientific paper classification performance in this approach we develop two refinement functions linear label refinement llr and probabilistic label refinement plr to model the citation link structures of the scientific papers for refining the class labels of the documents obtained by the content based naive bayes classification method the approach with the two new refinement models is examined and compared with the content based naive bayes method on standard paper classification data set with increasing training set sizes the results suggest that both refinement models can significantly improve the system performance over the content based method for all the training set sizes and that plr is better than llr when the training examples are sufficient
users are storing ever increasing amounts of information digitally driven by many factors including government regulations and the public’s desire to digitally record their personal histories unfortunately many of the security mechanisms that modern systems rely upon such as encryption are poorly suited for storing data for indefinitely long periods of time it is very difficult to manage keys and update cryptosystems to provide secrecy through encryption over periods of decades worse an adversary who can compromise an archive need only wait for cryptanalysis techniques to catch up to the encryption algorithm used at the time of the compromise in order to obtain secure data to address these concerns we have developed potshards an archival storage system that provides long term security for data with very long lifetimes without using encryption secrecy is achieved by using provably secure secret splitting and spreading the resulting shares across separately managed archives providing availability and data recovery in such system can be difficult thus we use new technique approximate pointers in conjunction with secure distributed raid techniques to provide availability and reliability across independent archives to validate our design we developed prototype potshards implementation which has demonstrated normal storage and retrieval of user data using indexes the recovery of user data using only the pieces user has stored across the archives and the reconstruction of an entire failed archive
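As a minimal illustration of provably secure secret splitting (not POTSHARDS' actual scheme, which additionally uses approximate pointers and distributed RAID across separately managed archives), an n-of-n XOR split is information-theoretically secure: any n-1 shares are uniformly random and reveal nothing about the stored data.

    # Minimal n-of-n XOR secret splitting; all n shares are needed to recover
    # the data, and any subset of fewer shares is statistically independent of it.
    import secrets

    def split(data: bytes, n: int) -> list[bytes]:
        shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
        last = bytearray(data)
        for share in shares:
            for i, b in enumerate(share):
                last[i] ^= b           # last share = data XOR all random shares
        return shares + [bytes(last)]

    def combine(shares: list[bytes]) -> bytes:
        out = bytearray(len(shares[0]))
        for share in shares:
            for i, b in enumerate(share):
                out[i] ^= b
        return bytes(out)

    secret = b"archival record"
    assert combine(split(secret, 4)) == secret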
in data outsourcing model data owners engage third party data servers called publishers to manage their data and process queries on their behalf as these publishers may be untrusted or susceptible to attacks they could produce incorrect query results to users in this paper we introduce an authentication scheme for outsourced multi dimensional databases with the proposed scheme users can verify that their query answers from publisher are complete ie no qualifying tuples are omitted and authentic ie all the result values are legitimate in addition our scheme guarantees minimality ie no non answer points are returned in the plain our scheme supports window range knn and rnn queries on multi dimensional databases we have implemented the proposed scheme and our experimental results on knn queries show that our approach is practical scheme with low overhead
current weak consistency semantics provide worst case guarantees to clients these guarantees fail to adequately describe systems that provide varying levels of consistency in the face of distinct failure modes or that achieve better than worst case guarantees during normal execution the inability to make precise statements about consistency throughout system’s execution represents lost opportunity to clearly understand client application requirements and to optimize systems and services appropriately in this position paper we motivate the need for and introduce the concept of consistability unified metric of consistency and availability consistability offers means of describing specifying and discussing how much consistency usually consistent system provides and how often it does so we describe our initial results of applying consistability reasoning to keyvalue store we are developing and to other recent distributed systems we also discuss the limitations of our consistability definition
we have developed web repository crawler that is used for reconstructing websites when backups are unavailable our crawler retrieves web resources from the internet archive google yahoo and msn we examine the challenges of crawling web repositories and we discuss strategies for overcoming some of these obstacles we propose three crawling policies which can be used to reconstruct websites we evaluate the effectiveness of the policies by reconstructing websites and comparing the results with live versions of the websites we conclude with our experiences reconstructing lost websites on behalf of others and discuss plans for improving our web repository crawler
the internet is an increasingly important source of information about political candidates and issues voters and candidates alike are taking advantage of wide variety of internet tools many internet users continue to be unskilled at the use of search tools and find restricted subset of available information simple assistive technology was developed to help voters formulate queries this query prosthesis changed the search and browsing patterns of participants in mock voting study and provides guidelines for more comprehensive interfaces to help voters an argument is made for greater application of design science in digital government research
after the debunking of some myths about why pp overlays are not feasible in sensornets many such solutions have been proposed none of the existing pp overlays for sensornets provide energy level applications and services for this purpose and based on the efficient pp method presented in we design novel pp overlay for energy level discovery in sensornet the so called eldt energy level distributed tree sensor nodes are mapped to peers based on their energy level as the energy levels change the sensor nodes would have to move from one peer to another and this operation is the most crucial for the efficient scalability of the proposed system similarly as the energy level of sensor node becomes extremely low that node may want to forward its task to another node with the desired energy level the adaptation of the pp index presented in guarantees the best known query performance of the above operation we experimentally verify this performance via an appropriate simulator we have designed for this purpose
distributed applications provide numerous advantages related to software performance reliability interoperability and extensibility this paper focuses on distributed java programs built with the help of the remote method invocation rmi mechanism we consider points to analysis for such applications points to analysis determines the objects pointed to by reference variable or reference object field such information plays fundamental role as prerequisite for many other static analyses we present the first theoretical definition of points to analysis for rmi based java applications and we present an algorithm for implementing flow and context insensitive points to analysis for such applications we also discuss the use of points to information for computing call graph information for understanding data dependencies due to remote memory locations and for identifying opportunities for improving the performance of object serialization at remote calls the work described in this paper solves one key problem for static analysis of rmi programs and provides starting point for future work on improving the understanding testing verification and performance of rmi based software
the problem of rewriting query using materialized view is studied for well known fragment of xpath that includes the following three constructs wildcards descendant edges and branches in earlier work determining the existence of rewriting was shown to be conp hard but no tight complexity bound was given while it was argued that sigma is an upper bound the proof was based on results that have recently been refuted consequently the exact complexity and even decidability of this basic problem has been unknown and there have been no practical rewriting algorithms if the query and the view use all the three constructs mentioned above it is shown that under fairly general conditions there are only two candidates for rewriting and hence the problem can be practically solved by two containment tests in particular under these conditions determining the existence of rewriting is conp complete the proofs utilize various novel techniques for reasoning about xpath patterns for the general case the exact complexity remains unknown but it is shown that the problem is decidable
process oriented composition languages such as bpel allow web services to be composed into more sophisticated services using workflow process however such languages exhibit some limitations with respect to modularity and flexibility they do not provide means for well modularized specification of crosscutting concerns such as logging persistence auditing and security they also do not support the dynamic adaptation of composition at runtime in this paper we advocate an aspect oriented approach to web service composition and present the design and implementation of aobpel an aspect oriented extension to bpel we illustrate through examples how aobpel makes the composition specification more modular and the composition itself more flexible and adaptable
in this article we present interactive focus and context visualizations for augmented reality ar applications we demonstrate how visualizations are used to affect the user’s perception of hidden objects by presenting contextual information in the area of augmentation we carefully overlay synthetic data on top of the real world imagery by taking into account the information that is about to be occluded furthermore we present operations to control the amount of augmented information additionally we developed an interaction tool based on the magic lens technique which allows for interactive separation of focus from context we integrated our work into rendering framework developed on top of the studierstube augmented reality system we finally show examples to demonstrate how our work benefits ar
the inefficiency of integration processes as an abstraction of workflow based integration tasks is often reasoned by low resource utilization and significant waiting times for external systems with the aim to overcome these problems we proposed the concept of process vectorization there instance based integration processes are transparently executed with the pipes and filters execution model here the term vectorization is used in the sense of processing sequence vector of messages by one standing process although it has been shown that process vectorization achieves significant throughput improvement this concept has two major drawbacks first the theoretical performance of vectorized integration process mainly depends on the performance of the most cost intensive operator second the practical performance strongly depends on the number of available threads in this paper we present an advanced optimization approach that addresses the mentioned problems therefore we generalize the vectorization problem and explain how to vectorize process plans in cost based manner due to the exponential complexity we provide heuristic computation approach and formally analyze its optimality in conclusion of our evaluation the message throughput can be significantly increased compared to both the instance based execution as well as the rule based process vectorization
we present new algorithm for simplifying the shape of objects by manipulating their medial axis transform mat from an unorganized set of boundary points our algorithm computes the mat decomposes the axis into parts then selectively removes subset of these parts in order to reduce the complexity of the overall shape the result is simplified mat that can be used for variety of shape operations in addition polygonal surface of the resulting shape can be directly generated from the filtered mat using robust surface reconstruction method the algorithm presented is shown to have number of advantages over other existing approaches
we study logic programs with arbitrary abstract constraint atoms called c atoms as theoretical means to analyze program properties we investigate the possibility of unfolding these programs to logic programs composed of ordinary atoms this approach reveals some structural properties of program with c atoms and enables characterization of these properties based on the known properties of the transformed program furthermore this approach leads to straightforward definition of answer sets for disjunctive programs with c atoms where c atom may appear in the head of rule as well as in the body we also study the complexities for various classes of logic programs with c atoms
consider the setting where panel of judges is repeatedly asked to partially rank sets of objects according to given criteria and assume that the judges expertise depends on the objects domain learning to aggregate their rankings with the goal of producing better joint ranking is fundamental problem in many areas of information retrieval and natural language processing amongst others however supervised ranking data is generally difficult to obtain especially if coming from multiple domains therefore we propose framework for learning to aggregate votes of constituent rankers with domain specific expertise without supervision we apply the learning framework to the settings of aggregating full rankings and aggregating top lists demonstrating significant improvements over domain agnostic baseline in both cases
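A simple unsupervised baseline in the same spirit, not the paper's framework: weight each judge by its average pairwise agreement with the other judges, then aggregate with a weighted Borda count. The rankings in the example are hypothetical.

    # Unsupervised weighted rank aggregation baseline: judges who agree more
    # with their peers receive more weight, then a weighted Borda count is used.
    from itertools import combinations

    def pairwise_agreement(r1, r2):
        """Fraction of item pairs ordered the same way by two full rankings."""
        pos1 = {x: i for i, x in enumerate(r1)}
        pos2 = {x: i for i, x in enumerate(r2)}
        pairs = list(combinations(pos1, 2))
        same = sum((pos1[a] < pos1[b]) == (pos2[a] < pos2[b]) for a, b in pairs)
        return same / len(pairs)

    def aggregate(rankings):
        """rankings: list of full rankings (lists) over the same items."""
        k = len(rankings)
        weights = [
            sum(pairwise_agreement(rankings[i], rankings[j])
                for j in range(k) if j != i) / (k - 1)
            for i in range(k)
        ]
        items = rankings[0]
        n = len(items)
        score = {x: 0.0 for x in items}
        for w, r in zip(weights, rankings):
            for pos, x in enumerate(r):
                score[x] += w * (n - 1 - pos)     # Borda points, weighted by judge
        return sorted(items, key=lambda x: -score[x])

    print(aggregate([["a", "b", "c", "d"],
                     ["a", "c", "b", "d"],
                     ["d", "c", "b", "a"]]))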
by means of passive optical motion capture real people can be authentically animated and photo realistically textured to import real world characters into virtual environments however surface reflectance properties must also be known we describe video based modeling approach that captures human shape and motion as well as reflectance characteristics from handful of synchronized video recordings the presented method is able to recover spatially varying surface reflectance properties of clothes from multiview video footage the resulting model description enables us to realistically reproduce the appearance of animated virtual actors under different lighting conditions as well as to interchange surface attributes among different people eg for virtual dressing our contribution can be used to create renditions of real world people under arbitrary novel lighting conditions on standard graphics hardware
in this article we propose techniques for modeling and rendering of heterogeneous translucent materials that enable acquisition from measured samples interactive editing of material attributes and real time rendering the materials are assumed to be optically dense such that multiple scattering can be approximated by diffusion process described by the diffusion equation for modeling heterogeneous materials we present the inverse diffusion algorithm for acquiring material properties from appearance measurements this modeling algorithm incorporates regularizer to handle the ill conditioning of the inverse problem an adjoint method to dramatically reduce the computational cost and hierarchical gpu implementation for further speedup to render an object with known material properties we present the polygrid diffusion algorithm which solves the diffusion equation with boundary condition defined by the given illumination environment this rendering technique is based on representation of an object by polygrid grid with regular connectivity and an irregular shape which facilitates solution of the diffusion equation in arbitrary volumes because of the regular connectivity our rendering algorithm can be implemented on the gpu for real time performance we demonstrate our techniques by capturing materials from physical samples and performing real time rendering and editing with these materials
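For reference, the steady-state diffusion approximation commonly used for optically dense, heterogeneous media has the form below; the paper's precise boundary treatment and its discretization on the polygrid may differ.

\[
\nabla \cdot \big(\kappa(\mathbf{x})\,\nabla\phi(\mathbf{x})\big) \;-\; \sigma_a(\mathbf{x})\,\phi(\mathbf{x}) \;=\; -\,q(\mathbf{x}),
\qquad
\kappa(\mathbf{x}) \;=\; \frac{1}{3\big(\sigma_a(\mathbf{x}) + \sigma'_s(\mathbf{x})\big)},
\]

where \(\phi\) is the fluence, \(\sigma_a\) the spatially varying absorption coefficient, \(\sigma'_s\) the reduced scattering coefficient, \(\kappa\) the diffusion coefficient, and \(q\) the source term induced by the illumination at the boundary.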
one approach to model checking program source code is to view model checker as target machine in this setting program source code is translated to model checker’s input language using process that shares much in common with program compilation for example well defined intermediate program representations are used to stage the translation through series of analyses and optimizing transformations and target specific details are isolated in code generation modules in this paper we present the bandera intermediate representation bir guarded assignment transformation system language that has been designed to support the translation of java programs to variety of model checkers bir includes constructs such as inheritance dynamic creation of data and locking primitives that are designed to model the semantics of java primitives bir also includes several non deterministic choice constructs that support abstraction in modeling and specification of properties of dynamic heap structures we have developed bir based tool infrastructure that has been applied to develop customized analysis frameworks for several different input languages using different model checking tools we present bir’s type system and operational semantics in sufficient detail to support similar applications by other researchers this semantics details several state space reductions and state space search variations we describe the translation of java to bir and how bir is translated to the input languages of several model checkers
this paper proposes novel distributed differential evolution algorithm namely distributed differential evolution with explorative exploitative population families dde eepf in dde eepf the sub populations are grouped into two families sub populations belonging to the first family have constant population size are arranged according to ring topology and employ migration mechanism acting on the individuals with the best performance this first family of sub populations has the role of exploring the decision space and constituting an external evolutionary framework the second family is composed of sub populations with dynamic population size the size is progressively reduced the sub populations belonging to the second family are highly exploitative and are supposed to quickly detect solutions with high performance the solutions generated by the second family then migrate to the first family in order to verify its viability and effectiveness the dde eepf has been run on set of various test problems and compared to four distributed differential evolution algorithms numerical results show that the proposed algorithm is efficient for most of the analyzed problems and outperforms on average all the other algorithms considered in this study
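For context, the canonical DE/rand/1/bin generation step that distributed variants such as DDE-EEPF build on is sketched below; the sub-population families, dynamic population sizing, and migration policy of the paper are omitted, and the control parameters f and cr are the usual hypothetical defaults.

    # Canonical DE/rand/1/bin step (sub-populations and migration omitted).
    import random

    def de_step(pop, fitness, f=0.8, cr=0.9, bounds=(-5.0, 5.0)):
        """One generation of differential evolution on a list of real vectors."""
        dim, lo, hi = len(pop[0]), *bounds
        new_pop = []
        for i, target in enumerate(pop):
            # pick three distinct individuals different from the target
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            mutant = [min(hi, max(lo, a[d] + f * (b[d] - c[d]))) for d in range(dim)]
            # binomial crossover with one guaranteed mutant coordinate
            j_rand = random.randrange(dim)
            trial = [mutant[d] if (random.random() < cr or d == j_rand) else target[d]
                     for d in range(dim)]
            # greedy selection (minimization)
            new_pop.append(trial if fitness(trial) <= fitness(target) else target)
        return new_pop

    # toy run on the sphere function
    sphere = lambda x: sum(v * v for v in x)
    pop = [[random.uniform(-5, 5) for _ in range(10)] for _ in range(30)]
    for _ in range(100):
        pop = de_step(pop, sphere)
    print(min(map(sphere, pop)))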
lightweight photo sharing particularly via mobile devices is fast becoming common communication medium used for maintaining presence in the lives of friends and family how should such systems be designed to maximize this social presence while maintaining simplicity an experimental photo sharing system was developed and tested that compared to current systems offers highly simplified group centric sharing automatic and persistent people centric organization and tightly integrated desktop and mobile sharing and viewing in an experimental field study the photo sharing behaviors of groups of family or friends were studied using their normal photo sharing methods and with the prototype sharing system results showed that users found photo sharing easier and more fun shared more photos and had an enhanced sense of social presence when sharing with the experimental system results are discussed in the context of design principles for the rapidly increasing number of lightweight photo sharing systems
establishing interschema semantic knowledge between corresponding elements in cooperating owl based multi information server grid environment requires deep knowledge not only about the structure of the data represented in each server but also about the commonly occurring differences in the intended semantics of this data the same information could be represented in various incompatible structures and more importantly the same structure could be used to represent data with many diverse and incompatible semantics in grid environment interschema semantic knowledge can only be detected if both the structural and semantic properties of the schemas of the cooperating servers are made explicit and formally represented in way that computer system can process unfortunately very often there is lack of such knowledge and the underlying grid information servers iss schemas being semantically weak as consequence of the limited expressiveness of traditional data models do not help the acquisition of this knowledge the solution to overcome this limitation is primarily to upgrade the semantic level of the is local schemas through semantic enrichment process by augmenting the local schemas of grid iss to semantically enriched schema models then to use these models in detecting and representing correspondences between classes belonging to different schemas in this paper we investigate the possibility of using owl based domain ontologies both for building semantically rich schema models and for expressing interschema knowledge and reasoning about it we believe that the use of owl rdf in this setting has two important advantages on the one hand it enables semantic approach for interschema knowledge specification by concentrating on expressing conceptual and semantic correspondences between both the conceptual intensional definition and the set of instances extension of classes represented in different schemas on the other hand it is exactly this semantic nature of our approach that allows us to devise reasoning mechanisms for discovering and reusing interschema knowledge when the need arises to compare and combine it
this paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy this is significant for many applications such as image classification where obtaining classification labels is expensive while large unlabeled examples are easily available we investigate an expectation maximization em algorithm for learning from labeled and unlabeled data the reason why unlabeled data boosts learning accuracy is because it provides the information about the joint probability distribution theoretical argument shows that the more unlabeled examples are combined in learning the more accurate the result we then introduce em algorithm based on the combination of em with bootstrap method to exploit the large unlabeled data while avoiding prohibitive cost experimental results over both synthetic and real data sets show that the proposed approach has satisfactory performance
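A minimal sketch of the basic EM loop over labeled plus unlabeled count data with a multinomial naive Bayes model is given below (the bootstrap-based variant described in the abstract is not reproduced); class responsibilities of the unlabeled documents act as soft labels in each M-step.

    # EM with multinomial naive Bayes over labeled (X_l, y_l) and unlabeled X_u
    # document-term count matrices; a didactic sketch, not the paper's exact method.
    import numpy as np

    def em_naive_bayes(X_l, y_l, X_u, n_classes, n_iter=20, alpha=1.0):
        def fit(X, R):
            log_prior = np.log(R.sum(axis=0) / R.sum())
            counts = R.T @ X + alpha                   # soft term counts + smoothing
            log_cond = np.log(counts / counts.sum(axis=1, keepdims=True))
            return log_prior, log_cond

        def predict_proba(X, log_prior, log_cond):
            logp = X @ log_cond.T + log_prior
            logp -= logp.max(axis=1, keepdims=True)    # numerical stability
            p = np.exp(logp)
            return p / p.sum(axis=1, keepdims=True)

        R_l = np.eye(n_classes)[np.asarray(y_l)]       # hard labels as one-hot rows
        params = fit(X_l, R_l)
        for _ in range(n_iter):
            R_u = predict_proba(X_u, *params)          # E-step on unlabeled data
            params = fit(np.vstack([X_l, X_u]),        # M-step on the union
                         np.vstack([R_l, R_u]))
        return params

    # toy usage with random counts (shapes only; real inputs would be documents)
    rng = np.random.default_rng(0)
    log_prior, log_cond = em_naive_bayes(rng.poisson(1.0, (20, 30)),
                                         rng.integers(0, 2, 20),
                                         rng.poisson(1.0, (200, 30)), n_classes=2)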
connectivity primarily graph theoretic concept helps define the fault tolerance of wireless sensor networks wsns in the sense that it enables the sensors to communicate with each other so their sensed data can reach the sink on the other hand sensing coverage an intrinsic architectural feature of wsns plays an important role in meeting application specific requirements for example to reliably extract relevant data about sensed field sensing coverage and network connectivity are not quite orthogonal concepts in fact it has been proven that connectivity strongly depends on coverage and hence considerable attention has been paid to establish tighter connection between them although only loose lower bound on network connectivity of wsns is known in this article we investigate connectivity based on the degree of sensing coverage by studying k covered wsns where every location in the field is simultaneously covered or sensed by at least k sensors property known as k coverage where k is the degree of coverage we observe that to derive network connectivity of k covered wsns it is necessary to compute the sensor spatial density required to guarantee k coverage more precisely we propose to use model called the reuleaux triangle to characterize k coverage with the help of helly’s theorem and the analysis of the intersection of sensing disks of sensors using deterministic approach we show that the sensor spatial density to guarantee k coverage of convex field is proportional to k and inversely proportional to the sensing range of the sensors we also prove that network connectivity of k covered wsns is higher than their sensing coverage furthermore we propose new measure of fault tolerance for k covered wsns called conditional fault tolerance based on the concepts of conditional connectivity and forbidden faulty sensor set that includes all the neighbors of given sensor we prove that k covered wsns can sustain large number of sensor failures provided that the faulty sensor set does not include forbidden faulty sensor set
the usage control model ucon has been proposed to augment traditional access control models by integrating authorizations obligations and conditions and providing the properties of decision continuity and attribute mutability several recent works have applied ucon to support security requirements in different computing environments such as resource sharing in collaborative computing systems and data control in remote platforms in this paper we identify two individual but interrelated problems of the original ucon model and recent implementations oversimplifying the concept of usage session of the model and the lack of comprehensive ongoing enforcement mechanism of implementations we extend the core ucon model with continuous usage sessions thus extensively augmenting the expressiveness of obligations in ucon and then propose general continuity enhanced and configurable usage control enforcement engine finally we explain how our approach can satisfy flexible security requirements with an implemented prototype for healthcare information system
checkpointing and rollback recovery are widely used techniques for achieving fault tolerance in distributed systems in this paper we present novel checkpointing algorithm which has the following desirable features process can independently initiate consistent global checkpointing by saving its current state called tentative checkpoint other processes come to know about consistent global checkpoint initiation through information piggy backed with the application messages or limited control messages if necessary when process comes to know about new consistent global checkpoint initiation it takes tentative checkpoint after processing the message not before processing the message as in existing communication induced checkpointing algorithms after process takes tentative checkpoint it starts logging the messages sent and received in memory when process comes to know that every other process has taken tentative checkpoint corresponding to current consistent global checkpoint initiation it flushes the tentative checkpoint and the message log to the stable storage the tentative checkpoints together with the message logs stored in the stable storage form consistent global checkpoint two or more processes can concurrently initiate consistent global checkpointing by taking new tentative checkpoint in that case the tentative checkpoints taken by all these processes will be part of the same consistent global checkpoint the sequence numbers assigned to checkpoints by process increase monotonically checkpoints with the same sequence number form consistent global checkpoint we also present the performance evaluation of our algorithm
this paper presents new interactive rendering and display technique for complex scenes with expensive shading such as global illumination our approach combines sparsely sampled shading points and analytically computed discontinuities edges to interactively generate high quality images the edge and point image is new compact representation that combines edges and points such that fast table driven interpolation of pixel shading from nearby point samples is possible while respecting discontinuities the edge and point renderer is extensible permitting the use of arbitrary shaders to collect shading samples shading discontinuities such as silhouettes and shadow edges are found at interactive rates our software implementation supports interactive navigation and object manipulation in scenes that include expensive lighting effects such as global illumination and geometrically complex objects for interactive rendering we show that high quality images of these scenes can be rendered at frames per second on desktop pc speedup of over ray tracer computing single sample per pixel
future distributed applications will need to support computing devices with wide range of capabilities varying network connectivity increasing mobility of users and wide variation in load placed by clients on services this paper presents dacia framework for building adaptive distributed applications in dacia distributed applications are viewed as consisting of connected components that typically implement data streaming processing and filtering functions dacia provides mechanisms for run time reconfiguration of applications to allow them to adapt to the changing operating environments components can be moved to different hosts during execution while maintaining communication connectivity with other components new components can also be introduced along data paths for example to provide compression on low bandwidth connections keeping communication overheads low is significant challenge in designing component based services dacia is designed so that communication costs among co located components are similar to those of procedure calls performance results as well as examples of adaptive services that can be built using dacia are presented
the objective of this qualitative study was to understand the complex practice of software testing and based on this knowledge to develop process improvement propositions that could concurrently reduce development and testing costs and improve software quality first survey of testing practices was conducted and organizational units ous were interviewed from this sample five ous were further selected for an in depth case study the study used grounded theory as its research method and the data was collected from theme based interviews the analysis yielded improvement propositions that included enhanced testability of software components efficient communication and interaction between development and testing early involvement of testing and risk based testing the connective and central improvement proposition was that testing ought to adapt to the business orientation of the ou other propositions were integrated around this central proposition the results of this study can be used in improving development and testing processes
in parallel processor systems the performance of individual processors is key factor in overall performance processor performance is strongly affected by the behavior of cache memory in that high hit rates are essential for high performance hit rates are lowered when collisions on placing lines in the cache force cache line to be replaced before it has been used to best effect spatial cache collisions occur if data structures and data access patterns are misaligned we describe mathematical scheme to improve alignment and enhance performance in applications which have moderate to large numbers of arrays where various dimensionalities are involved in localized computation and array access patterns are sequential these properties are common in many computational modeling applications furthermore the scheme provides single solution when an application is targeted to run on various numbers of processors in power of two sizes the applicability of the proposed scheme is demonstrated on testbed code for an air quality modeling problem
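A toy sketch of the underlying idea, not the paper's mathematical alignment scheme: pad an array's leading dimension so that the row stride no longer shares a large power-of-two factor with the cache's set-wrap size, which keeps column walks from repeatedly hitting the same cache sets. The cache geometry constants below are hypothetical.

    # Pick a padded leading dimension whose byte stride shares at most one
    # cache line's worth of power-of-two structure with the set-wrap size.
    from math import gcd

    LINE = 64      # bytes per cache line (hypothetical geometry)
    SETS = 512     # number of cache sets
    ELEM = 8       # bytes per double-precision element

    def padded_leading_dim(n):
        """Smallest leading dimension >= n whose row stride avoids set conflicts."""
        wrap = LINE * SETS            # strides that are multiples of this value
        dim = n                       # map successive rows to the same sets
        while gcd(dim * ELEM, wrap) > LINE:
            dim += 1
        return dim

    # a 1024x1024 array of doubles has a row stride equal to the wrap size, so a
    # column walk keeps hitting one set; a one-element pad removes the conflict
    print(padded_leading_dim(1024))   # -> 1025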
sensor based interaction has enabled variety of new creative practices with ubiquitous computing designing for creative user experience with sensor based devices benefits from new opportunities as well as new challenges we propose design approach where surrounding context information is brought to the foreground to become resource for interaction available at hand and in real time to the users we illustrate this approach with our project context photography as design case context photography consists of taking still pictures that capture not only incoming light but also some of the additional context surrounding the scene with real time context information visually affecting the pictures as they are taken based on the design and use of our context camera prototypes this paper brings insight into implications of our approach to the design of sensor based ubiquitous computing systems for creative purposes
making case adaptation practical is longstanding challenge for case based reasoning one of the impediments to widespread use of automated case adaptation is the adaptation knowledge bottleneck the adaptation process may require extensive domain knowledge which may be difficult or expensive for system developers to provide this paper advances new approach to addressing this problem proposing that systems mine their adaptation knowledge as needed from pre existing large scale knowledge sources available on the world wide web the paper begins by discussing the case adaptation problem opportunities for adaptation knowledge mining and issues for applying the approach it then presents an initial illustration of the method in case study of the testbed system webadapt webadapt applies the approach in the travel planning domain using opencyc wikipedia and the geonames gis database as knowledge sources for generating substitutions experimental results suggest the promise of the approach especially when information from multiple sources is combined
aggregation of system wide information in large scale distributed systems such as pp systems and grids can be unfairly influenced by nodes that are selfish colluding with each other or are offline most of the time we present avcol which uses probabilistic and gossip style techniques to provide availability aware aggregation concretely avcol is the first aggregation system that implements any arbitrary global predicate that explicitly specifies any node’s probability of inclusion in the global aggregate as mathematical function of that node’s availability ie percentage time online probabilistically tolerates large numbers of selfish nodes and large groups of colluders and scales well with hundreds to thousands of nodes avcol uses several unique design decisions per aggregation tree construction where nodes are allowed limited but flexible probabilistic choice of parents or children probabilistic aggregation along trees and auditing of nodes both during aggregation as well as in gossip style ie periodically we have implemented avcol and we experimentally evaluated it using real life churn traces our evaluation and our mathematical analysis show that avcol satisfies arbitrary predicates scales well and withstands variety of selfish and colluding attacks
we explore using hashing to pack sparse data into compact table while retaining efficient random access specifically we design perfect multidimensional hash function one that is precomputed on static data to have no hash collisions because our hash function makes single reference to small offset table queries always involve exactly two memory accesses and are thus ideally suited for parallel simd evaluation on graphics hardware whereas prior hashing work strives for pseudorandom mappings we instead design the hash function to preserve spatial coherence and thereby improve runtime locality of reference we demonstrate numerous graphics applications including vector images texture sprites alpha channel compression parameterized textures painting simulation and collision detection
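A tiny 2-D construction in the spirit of perfect spatial hashing is sketched below; it is not the paper's algorithm or table-sizing rule, but it shows the two-access query structure: a point p is stored at (p + offset[p mod r]) mod m, and offsets are assigned greedily, most crowded offset-table cells first.

    # Sketch of a perfect spatial hash for a sparse set of 2-D grid cells.
    import itertools, random

    def build(points, m, r):
        """points: set of (x, y) cells. Returns (m x m hash table, offset table)."""
        H = {}                                    # slot -> stored point
        offsets = {}                              # offset-table cell -> (dx, dy)
        buckets = {}
        for p in points:
            buckets.setdefault((p[0] % r, p[1] % r), []).append(p)
        # place the most crowded offset-table cells first
        for cell, pts in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
            for dx, dy in itertools.product(range(m), range(m)):
                placed = {((x + dx) % m, (y + dy) % m): (x, y) for x, y in pts}
                if len(placed) == len(pts) and not (placed.keys() & H.keys()):
                    offsets[cell] = (dx, dy)      # collision-free offset found
                    H.update(placed)
                    break
            else:
                raise ValueError("no collision-free offset; enlarge m or r")
        return H, offsets

    def lookup(p, H, offsets, m, r):
        # exactly two table reads: one offset-table cell, one hash-table slot;
        # the stored point is returned so the caller can confirm membership
        dx, dy = offsets.get((p[0] % r, p[1] % r), (0, 0))
        return H.get(((p[0] + dx) % m, (p[1] + dy) % m))

    random.seed(1)
    pts = {(random.randrange(80), random.randrange(80)) for _ in range(25)}
    H, off = build(pts, m=16, r=5)
    assert all(lookup(p, H, off, 16, 5) == p for p in pts)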
the pc desktop is very rich repository of personal information efficiently capturing user’s interests in this paper we propose new approach towards an automatic personalization of web search in which the user specific information is extracted from such local desktops thus allowing for an increased quality of user profiling while sharing less private information with the search engine more specifically we investigate the opportunities to select personalized query expansion terms for web search using three different desktop oriented approaches summarizing the entire desktop data summarizing only the desktop documents relevant to each user query and applying natural language processing techniques to extract dispersive lexical compounds from relevant desktop resources our experiments with the google api showed at least the latter two techniques to produce very strong improvement over current web search
we discuss image sense discrimination isd and apply method based on spectral clustering using multimodal features from the image and text of the embedding web page we evaluate our method on new data set of annotated web images retrieved with ambiguous query terms experiments investigate different levels of sense granularity as well as the impact of text and image features and global versus local text features
the integration of heterogeneous databases affects two main problems schema integration and instance integration at both levels mapping from local elements to global elements is specified and various conflicts caused by the heterogeneity of the sources have to be resolved for the detection and resolution of instance level conflicts we propose an interactive example driven approach the basic idea is to combine an interactive query tool similar to query by example with facilities for defining and applying integration operations this integration approach is supported by multidatabase query language which provides special mechanisms for conflict resolution the foundations of these mechanisms are introduced and their usage in instance integration and reconciliation is presented in addition we discuss basic techniques for supporting the detection of instance level conflicts
one of the main advantages of using scientific workflow management system swfms to orchestrate data flows among scientific activities is to control and register the whole workflow execution the execution of activities within workflow with high performance computing hpc presents challenges in swfms execution control current solutions leave the scheduling to the hpc queue system since the workflow execution engine does not run on remote clusters swfms are not aware of the parallel strategy of the workflow execution consequently remote execution control and provenance registry of the parallel activities is very limited from the swfms side this work presents set of components to be included in the workflow specification of any swfms to control parallelization of activities as mtc in addition these components can gather provenance data during remote workflow execution through these mtc components the parallelization strategy can be registered and reused and provenance data can be uniformly queried we have evaluated our approach by performing parameter sweep parallelization in solving the incompressible navier stokes equations experimental results show the performance gains with the additional benefits of distributed provenance support
agent systems based on the belief desire and intention model of rao and georgeff have been used for number of successful applications however it is often difficult to learn how to apply such systems due to the complexity of both the semantics of the system and the computational model in addition there is gap between the semantics and the concepts that are presented to the programmer in this paper we address these issues by re casting the foundations of such systems into logic programming framework in particular we show how the integration of backward and forward chaining techniques for linear logic provides natural starting point for this investigation we discuss how the integrated system provides for the interaction between the proactive and reactive parts of the system and we discuss several aspects of this interaction in particular one perhaps surprising outcome is that goals and plans may be thought of as declarative and procedural aspects of the same concept we also discuss the language design issues for such system and particularly the way in which the potential choices for rule evaluation in forward chaining manner is crucial to the behaviour of the system
green computing is new paradigm of designing the computer system which considers not only the processing performance but also the energy efficiency power management is one of the approaches in green computing to reduce the power consumption in distributed computing system in this paper we first propose an optimal power management opm used by batch scheduler in server farm this opm observes the state of server farm and makes the decision to switch the operation mode ie active or sleep of the server to minimize the power consumption while the performance requirements are met an optimization problem based on constrained markov decision process cmdp is formulated and solved to obtain an optimal decision of opm given that opm is used in the server farm then an assignment of users to the server farms by job broker is considered this assignment is to ensure that the cost due to power consumption and network transportation is minimized the performance of the system is extensively evaluated the result shows that with opm the job waiting time can be maintained below the maximum threshold while the power consumption is much smaller than that without opm
we present technique for learning clothing models that enables the simultaneous animation of thousands of detailed garments in real time this surprisingly simple conditional model learns and preserves the key dynamic properties of cloth motion along with folding details our approach requires no priori physical model but rather treats training data as black box we show that the models learned with our method are stable over large time steps and can approximately resolve cloth body collisions we also show that within class of methods no simpler model covers the full range of cloth dynamics captured by ours our method bridges the current gap between skinning and physical simulation combining benefits of speed from the former with dynamic effects from the latter we demonstrate our approach on variety of apparel worn by male and female human characters performing varied set of motions typically used in video games eg walking running jumping etc
this paper considers dos attacks on dns wherein attackers flood the nameservers of zone to disrupt resolution of resource records belonging to the zone and consequently any of its sub zones we propose minor change in the caching behavior of dns resolvers that can significantly alleviate the impact of such attacks in our proposal dns resolvers do not completely evict cached resource records whose ttl has expired rather such resource records are stored in separate stale cache if during the resolution of query resolver does not receive any response from the nameservers that are responsible for authoritatively answering the query it can use the information stored in the stale cache to answer the query in effect the stale cache is the part of the global dns database that has been accessed by the resolver and represents an insurance policy that the resolver uses only when the relevant dns servers are unavailable we analyze day dns trace to quantify the benefits of stale cache under different attack scenarios further while the proposed change to dns resolvers also changes dns semantics we argue that it does not adversely impact any of the fundamental dns characteristics such as the autonomy of zone operators and hence is very simple and practical candidate for mitigating the impact of dos attacks on dns
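A minimal sketch of the proposed resolver behavior (illustrative only, not a production resolver): records whose ttl has expired are demoted to a separate stale cache and are served only when the authoritative nameservers fail to respond.

    # Resolver with a stale cache: expired records are kept aside and used only
    # when the upstream nameservers are unreachable (e.g. under a DoS attack).
    import time

    class StaleCacheResolver:
        def __init__(self, upstream):
            self.upstream = upstream   # callable: name -> (records, ttl); may raise
            self.cache = {}            # name -> (records, expiry)
            self.stale = {}            # name -> records whose ttl has expired

        def resolve(self, name):
            now = time.time()
            if name in self.cache:
                records, expiry = self.cache[name]
                if now < expiry:
                    return records                  # normal cache hit
                self.stale[name] = records          # demote instead of evicting
                del self.cache[name]
            try:
                records, ttl = self.upstream(name)  # ask the nameservers
            except Exception:
                if name in self.stale:              # servers unreachable: fall back
                    return self.stale[name]
                raise
            self.cache[name] = (records, now + ttl)
            self.stale.pop(name, None)              # fresh data supersedes stale
            return records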
the application of thesauri in networked environments is seriously hampered by the challenges of introducing new concepts and terminology into the formal controlled vocabulary which is critical for enhancing its retrieval capability the author describes an automated process of adding new terms to thesauri as entry vocabulary by analyzing the association between words and phrases extracted from bibliographic titles and subject descriptors in the metadata record subject descriptors are terms assigned from controlled vocabularies of thesauri to describe the subjects of the objects eg books articles represented by the metadata records the investigated approach uses corpus of metadata for scientific and technical publications in which the titles contain substantive words for key topics the three steps of the method are extracting words and phrases from the title field of the metadata applying method to identify and select the specific and meaningful keywords based on the associated controlled vocabulary terms from the thesaurus used to catalog the objects and inserting selected keywords into the thesaurus as new terms most of them are in hierarchical relationships with the existing concepts thereby updating the thesaurus with new terminology that is being used in the literature the effectiveness of the method was demonstrated by an experiment with the chinese classification thesaurus cct and bibliographic data in china machine readable cataloging record marc format cnmarc provided by peking university library this approach is equally effective in large scale collections and in other languages
there are several problem areas that must be addressed when applying randomization to unit testing as yet no general fully automated solution that works for all units has been proposed we therefore have developed rute java package intended to help programmers do randomized unit testing in java in this paper we describe rute and illustrate how it supports the development of per unit solutions for the problems of randomized unit testing we report on an experiment in which we applied rute to the standard java treemap class measuring the efficiency and effectiveness of the technique we also illustrate the use of randomized testing in experimentation by adapting rute so that it generates randomized minimal covering test suites and measuring the effectiveness of the test suites generated
the creation of generic and modular query optimization and processing infrastructure can provide significant benefits to xml data management key pieces of such an infrastructure are the physical operators that are available to the execution engine to turn queries into execution plans such operators to be efficient need to implement sophisticated algorithms for logical xpath or xquery operations moreover to enable cost based optimizer to choose among them correctly it is also necessary to provide cost models for such operator implementations in this paper we present two novel families of algorithms for xpath physical operators called lookup lu and sort merge based sm along with detailed cost models our algorithms have significantly better performance compared to existing techniques over any one of variety of different xml storage systems that provide set of common primitive access methods to substantiate the robustness and efficiency of our physical operators we evaluate their individual performance over four different xml storage engines against operators that implement existing xpath processing techniques we also demonstrate the performance gains for twig processing of using plans consisting of our operators compared to state of the art holistic technique specifically twigstack additionally we evaluate the precision of our cost models and we conduct an analysis of the sensitivity of our algorithms and cost models to variety of parameters
during the past few years two main approaches have been taken to improve the performance of software shared memory implementations relaxing consistency models and providing fine grained access control their performance tradeoffs however were not well understood this paper studies these tradeoffs on platform that provides access control in hardware but runs coherence protocols in software we compare the performance of three protocols across four coherence granularities using applications on node cluster of workstations our results show that no single combination of protocol and granularity performs best for all the applications the combination of sequentially consistent sc protocol and fine granularity works well with of the applications the combination of multiple writer home based lazy release consistency hlrc protocol and page granularity works well with out of the applications for applications that suffer performance losses in moving to coarser granularity under sequential consistency the performance can usually be regained quite effectively using relaxed protocols particularly hlrc we also find that the hlrc protocol performs substantially better than single writer lazy release consistent sw lrc protocol at coarse granularity for many irregular applications for our applications and platform when we use the original versions of the applications ported directly from hardware coherent shared memory we find that the sc protocol with byte granularity performs best on average however when the best versions of the applications are compared the balance shifts in favor of hlrc at page granularity
an important component of higher level fusion is knowledge discovery one form of knowledge is set of relationships between concepts this paper addresses the automated discovery of ontological knowledge representations such as taxonomies thesauri from imagery based data multi target classification is used to transform each source data point into set of conceptual predictions from pre defined lexicon this classification pre processing produces co occurrence data that is suitable for input to an ontology learning algorithm neural network with an associative incremental learning nail algorithm processes this co occurrence data to find relationships between elements of the lexicon thus uncovering the knowledge structure hidden in the dataset the efficacy of this approach is demonstrated on dataset created from satellite imagery of metropolitan region the flexibility of the nail algorithm is illustrated by employing it on an additional dataset comprised of topic categories from text document collection the usefulness of the knowledge structure discovered from the imagery data is illustrated via construction of bayesian network which produces an inference engine capable of exploiting the learned knowledge model effective automation of knowledge discovery in an information fusion context has considerable potential for aiding the development of machine based situation awareness capabilities
this chapter presents number of different aspects related to particular kind of large and complex networks wireless sensor network wsn consists of large number of nodes that individually have limited computing power and information their interaction is strictly local but their task is to build global structures and pursue global objectives dealing with wsns requires mixture of theory and practice ie combination of algorithmic foundations with simulations and experiments that has been the subject of our project swarmnet in the first part we describe number of fundamental algorithmic issues boundary recognition without node coordinates clustering routing and energy constrained flows the second part deals with the simulation of large scale wsns we describe the most important challenges and how they can be tackled with our network simulator shawn
widespread recognition of the usefulness of graphical user interfaces guis has established their importance as critical components of today’s software guis have characteristics different from traditional software and conventional testing techniques do not directly apply to guis this paper’s focus is on coverage criteria for guis important rules that provide an objective measure of test quality we present new coverage criteria to help determine whether gui has been adequately tested these coverage criteria use events and event sequences to specify measure of test adequacy since the total number of permutations of event sequences in any non trivial gui is extremely large the gui’s hierarchical structure is exploited to identify the important event sequences to be tested gui is decomposed into gui components each of which is used as basic unit of testing representation of gui component called an event flow graph identifies the interaction of events within component and intra component criteria are used to evaluate the adequacy of tests on these events the hierarchical relationship among components is represented by an integration tree and inter component coverage criteria are used to evaluate the adequacy of test sequences that cross components algorithms are given to construct event flow graphs and an integration tree for given gui and to evaluate the coverage of given test suite with respect to the new coverage criteria case study illustrates the usefulness of the coverage report to guide further testing and an important correlation between event based coverage of gui and statement coverage of its software’s underlying code
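As a rough illustration of event-based coverage, the sketch below computes event coverage and length-two event-sequence coverage for a toy event-flow graph; the graph, the component, and the test suite are invented examples, not the paper's case study.

```python
# Minimal sketch of event-based coverage measurement over an event-flow graph.
# The graph, component, and test suite below are hypothetical examples.

def event_flow_pairs(efg):
    """All feasible length-2 event sequences (edges) of an event-flow graph."""
    return {(e, f) for e, followers in efg.items() for f in followers}

def coverage(efg, test_suite):
    """Fraction of events and of length-2 event sequences exercised by the tests."""
    required_events = set(efg)
    required_pairs = event_flow_pairs(efg)
    covered_events, covered_pairs = set(), set()
    for test in test_suite:                      # a test is a sequence of events
        covered_events.update(test)
        covered_pairs.update(zip(test, test[1:]))
    return (len(covered_events & required_events) / len(required_events),
            len(covered_pairs & required_pairs) / len(required_pairs))

# Example: a tiny "File" component with three events.
efg = {"Open": {"Save", "Close"}, "Save": {"Save", "Close"}, "Close": set()}
tests = [["Open", "Save", "Close"], ["Open", "Close"]]
print(coverage(efg, tests))   # event coverage 1.0, pair coverage 0.75
```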
the cilkview scalability analyzer is software tool for profiling estimating scalability and benchmarking multithreaded cilk applications cilkview monitors logical parallelism during an instrumented execution of the cilk application on single processing core as cilkview executes it analyzes logical dependencies within the computation to determine its work and span critical path length these metrics allow cilkview to estimate parallelism and predict how the application will scale with the number of processing cores in addition cilkview analyzes scheduling overhead using the concept of burdened dag which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel subcomputations cilkview employs the pin dynamic instrumentation framework to collect metrics during serial execution of the application code it operates directly on the optimized code rather than on debug version metadata embedded by the cilk compiler in the binary executable identifies the parallel control constructs in the executing application this approach introduces little or no overhead to the program binary in normal runs cilkview can perform real time scalability benchmarking automatically producing gnuplot compatible output that allows developers to compare an application’s performance with the tool’s predictions if the program performs beneath the range of expectation the programmer can be confident in seeking cause such as insufficient memory bandwidth false sharing or contention rather than inadequate parallelism or insufficient grain size
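A minimal sketch of the work/span bookkeeping such a tool reports, run on a hand-built computation DAG; the strand costs, the DAG, and the simple speedup bound are illustrative assumptions rather than Cilkview's actual instrumentation.

```python
# Minimal sketch of work/span analysis on a computation DAG.
from functools import lru_cache

costs = {"a": 4, "b": 3, "c": 5, "d": 2}          # work of each strand (invented)
succ  = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def work(costs):
    """T1: total work = sum of all strand costs."""
    return sum(costs.values())

def span(costs, succ):
    """T_inf: length of the longest (critical) path through the DAG."""
    @lru_cache(maxsize=None)
    def longest_from(v):
        return costs[v] + max((longest_from(w) for w in succ[v]), default=0)
    return max(longest_from(v) for v in costs)

t1, tinf = work(costs), span(costs, succ)
parallelism = t1 / tinf
for p in (1, 2, 4, 8):
    # Simple upper bound on achievable speedup on p cores (ignores scheduling burden).
    print(p, min(p, parallelism))
```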
this survey concerns the role of data structures for compactly storing and representing various types of information in localized and distributed fashion traditional approaches to data representation are based on global data structures which require access to the entire structure even if the sought information involves only small and local set of entities in contrast localized data representation schemes are based on breaking the information into small local pieces or labels selected in way that allows one to infer information regarding small set of entities directly from their labels without using any additional global information the survey concentrates mainly on combinatorial and algorithmic techniques such as adjacency and distance labeling schemes and interval schemes for routing and covers complexity results on various applications focusing on compact localized schemes for message routing in communication networks
thinsight is novel optical sensing system fully integrated into thin form factor display capable of detecting multiple fingers placed on or near the display surface we describe this new hardware in detail and demonstrate how it can be embedded behind regular lcd allowing sensing without degradation of display capability with our approach fingertips and hands are clearly identifiable through the display the approach of optical sensing also opens up the exciting possibility for detecting other physical objects and visual markers through the display and some initial experiments are described we also discuss other novel capabilities of our system interaction at distance using ir pointing devices and ir based communication with other electronic devices through the display major advantage of thinsight over existing camera and projector based optical systems is its compact thin form factor making such systems even more deployable we therefore envisage using thinsight to capture rich sensor data through the display which can be processed using computer vision techniques to enable both multi touch and tangible interaction
we present family of discrete isometric bending models ibms for triangulated surfaces in space these models are derived from an axiomatic treatment of discrete laplace operators using these operators to obtain linear models for discrete mean curvature from which bending energies are assembled under the assumption of isometric surface deformations we show that these energies are quadratic in surface positions the corresponding linear energy gradients and constant energy hessians constitute an efficient model for computing bending forces and their derivatives enabling fast time integration of cloth dynamics with two to three fold net speedup over existing nonlinear methods and near interactive rates for willmore smoothing of large meshes
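One plausible instance of such a quadratic bending energy, written in assumed notation not taken from the abstract (a discrete Laplace operator L and a lumped mass matrix M):

```latex
% A sketch of a quadratic, isometry-based bending energy with constant Hessian.
\[
  E(\mathbf{x}) \;=\; \tfrac{1}{2}\,\mathbf{x}^{\top} Q\, \mathbf{x},
  \qquad Q \;=\; L^{\top} M^{-1} L ,
\]
\[
  \nabla E(\mathbf{x}) \;=\; Q\,\mathbf{x},
  \qquad
  \nabla^{2} E(\mathbf{x}) \;=\; Q \quad \text{(constant in } \mathbf{x}\text{)} .
\]
```

Because Q does not depend on the surface positions, the gradient is linear and the Hessian constant, which is what makes the fast time integration described above possible.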
views stored in data warehouse need to be kept current as recomputing the views is very expensive incremental maintenance algorithms are required over recent years several incremental maintenance algorithms have been proposed none of the proposed algorithms handle the general case of relational expressions involving aggregate and outerjoin operators efficiently in this article we develop the change table technique for incrementally maintaining general view expressions involving relational and aggregate operators we show that the change table technique outperforms the previously proposed techniques by orders of magnitude the developed framework easily extends to efficiently maintaining view expressions containing outerjoin operators we prove that the developed change table technique is an optimal incremental maintenance scheme for given view expression tree under some reasonable assumptions
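As a toy illustration of the change-table idea for aggregate views, the sketch below maintains a SUM/COUNT group-by view from batches of (group, delta-sum, delta-count) rows; the schema and change-table format are assumptions for illustration only.

```python
# Minimal sketch of change-table style maintenance for a SUM/COUNT group-by view.
from collections import defaultdict

view = defaultdict(lambda: [0, 0])        # group -> [sum, count]

def refresh(view, change_table):
    """Apply a change table of (group, delta_sum, delta_count) rows to the view."""
    for group, d_sum, d_cnt in change_table:
        s, c = view[group]
        s, c = s + d_sum, c + d_cnt
        if c == 0:
            del view[group]               # group disappeared from the base data
        else:
            view[group] = [s, c]

# Base insertions, then an incremental batch expressed as a change table.
refresh(view, [("books", 120, 3), ("music", 40, 2)])
refresh(view, [("books", -30, -1), ("music", 10, 1), ("video", 25, 1)])
print(dict(view))   # {'books': [90, 2], 'music': [50, 3], 'video': [25, 1]}
```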
in the conventional steganographic framework covert message is hidden within larger seemingly innocent message we argue that this framework must be extended in order to adequately model the means and goals of modern collaborative systems whereas messages are static objects with single creator collaborative systems are dynamically changed by multiple entities according to rules and patterns specific to the given system the primary contribution of this paper is to frame the general question when can one steganographically embed collaborative system into collaborative system as case study we develop system for embedding simple wiki into the flickr photo sharing service we develop techniques for steganographically embedding collaborative system whose update rules are quite different than those of the host service
lazy programs are beautiful but they are slow because they build many thunks simple measurements show that most of these thunks are unnecessary they are in fact always evaluated or are always cheap in this paper we describe optimistic evaluation an evaluation strategy that exploits this observation optimistic evaluation complements compile time analyses with run time experiments it evaluates thunk speculatively but has an abortion mechanism to back out if it makes bad choice run time adaptation mechanism records expressions found to be unsuitable for speculative evaluation and arranges for them to be evaluated more lazily in the future we have implemented optimistic evaluation in the glasgow haskell compiler the results are encouraging many programs speed up significantly some improve dramatically and none go more than slower
dynamic structure discrete event system specification dsdevs is an advanced modeling formalism that allows devs models and their couplings to be dynamically changed the modeling power and advantages of dsdevs have been well studied however the performance aspect is generally overlooked this paper provides comprehensive performance measurement of dsdevs for large scale cellular space model we consider the modeling and simulation layers for performance analysis and carry out performance measurement based on token ring model and fire spread model the results show that ds modeling can improve simulation performance for large scale cellular space models due to the fact that it makes the simulation focus only on those active models and thus is more efficient than when the entire cellular space is loaded on the other hand the ds overhead cannot be ignored and can become significant and even dominant when large number of cells are dynamically added or deleted
memory is scarce resource during embedded system design increasing memory often increases packaging costs cooling costs size and power consumption this article presents crames novel and efficient software based ram compression technique for embedded systems the goal of crames is to dramatically increase effective memory capacity without hardware or application design changes while maintaining high performance and low energy consumption to achieve this goal crames takes advantage of an operating system’s virtual memory infrastructure by storing swapped out pages in compressed format it dynamically adjusts the size of the compressed ram area protecting applications capable of running without it from performance or energy consumption penalties in addition to compressing working data sets crames also enables efficient in ram filesystem compression thereby further increasing ram capacity crames was implemented as loadable module for the linux kernel and evaluated on battery powered embedded system experimental results indicate that crames is capable of doubling the amount of ram available to applications running on the original system hardware execution time and energy consumption for broad range of examples are rarely affected when physical ram is reduced to percent of its original quantity crames enables the target embedded system to support the same applications with reasonable performance and energy consumption penalties on average percent and percent while without crames those applications either may not execute or suffer from extreme performance degradation or instability in addition to presenting novel framework for dynamic data memory compression and in ram filesystem compression in embedded systems this work identifies the software based compression algorithms that are most appropriate for use in low power embedded systems
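A minimal user-level sketch of the underlying idea, storing swapped-out pages in compressed form and decompressing them on demand; this is not the CRAMES kernel module, and the page size, compressor, and policy are illustrative assumptions.

```python
# Minimal sketch of a compressed swap area: swapped-out pages are kept in
# compressed form and decompressed on fault.
import zlib

PAGE_SIZE = 4096
compressed_area = {}                 # page number -> compressed bytes

def swap_out(page_no, data: bytes):
    assert len(data) == PAGE_SIZE
    compressed_area[page_no] = zlib.compress(data, 6)

def swap_in(page_no) -> bytes:
    return zlib.decompress(compressed_area.pop(page_no))

def stored_bytes():
    """Current footprint of the compressed area."""
    return sum(len(blob) for blob in compressed_area.values())

page = (b"hello world " * 400)[:PAGE_SIZE]
swap_out(7, page)
print(stored_bytes(), "bytes hold one", PAGE_SIZE, "byte page")
print(swap_in(7) == page)            # True
```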
in this paper general framework for the parallel bdi model suitable for dynamic environments is proposed it is parallel agent architecture that supports the following agent abilities at architecture level the ability to monitor the environment at all times and respond to emergencies timely the ability to reconsider and re schedule goals intentions and actions in reaction to unexpected or new information the ability to perform multiple actions at once the ability to perceive deliberate and act simultaneously the ability to prioritize the deliberations and intention executions we define the functions and the operations of the processing units in the agent and how these units interact cooperate and synchronize with each other with the advances in semiconductor technology which allow multiple processing units to be implemented on the same silicon chip parallel bdi agent will be an effective way to enable it to perform in dynamically changing environment when the arrival rate of events is high we illustrate the working of parallel agent under the general framework with an agent simulating the behaviour of vessel captain navigating in sea then the performance of parallel agent is evaluated against several versions of sequential agents the issue of how much parallelism and how to configure parallel agent based on the general framework are studied by experiments with different configurations of the parallel agent
we explore syntactic approach to sentence compression in the biomedical domain grounded in the context of result presentation for related article search in the pubmed search engine by automatically trimming inessential fragments of article titles system can effectively display more results in the same amount of space our implemented prototype operates by applying sequence of syntactic trimming rules over the parse trees of article titles two separate studies were conducted using corpus of manually compressed examples from medline an automatic evaluation using bleu and summative evaluation involving human assessors experiments show that syntactic approach to sentence compression is effective in the biomedical domain and that the presentation of compressed article titles supports accurate interest judgments decisions by users as to whether an article is worth examining in more detail
various approaches for keyword search in different settings eg relational databases and xml actually deal with the problem of enumerating fragments for given set of keywords fragment is subtree of the given data graph such that contains all the keywords of and no proper subtree of has this property there are three types of fragments directed undirected and strong this paper describes efficient algorithms for enumerating fragments specifically for all three types of fragments algorithms are given for enumerating all fragments with polynomial delay and polynomial space it is shown how these algorithms can be enhanced to enumerate fragments in heuristic order for directed fragments and acyclic data graphs an algorithm is given for enumerating with polynomial delay in the order of increasing weight ie the ranked order assuming that is of fixed size
this paper makes two contributions first we introduce model for evaluating the performance of data allocation and replication algorithms in distributed databases the model is comprehensive in the sense that it accounts for cost for communication cost and because of reliability considerations for limits on the minimum number of copies of the object the model captures existing replica management algorithms such as read one write all quorum consensus etc these algorithms are static in the sense that in the absence of failures the copies of each object are allocated to fixed set of processors in modern distributed databases particularly in mobile computing environments processors will dynamically store objects in their local database and will relinquish them therefore as second contribution of this paper we introduce an algorithm for automatic dynamic allocation of replicas to processors then using the new model we compare the performance of the traditional read one write all static allocation algorithm to the performance of the dynamic allocation algorithm as result we obtain the relationship between the communication cost and cost for which static allocation is superior to dynamic allocation and the relationships for which dynamic allocation is superior
in many data analysis tasks one is often confronted with very high dimensional data feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering classification and retrieval in this paper we consider the feature selection problem in unsupervised learning scenario which is particularly difficult due to the absence of class labels that would guide the search for relevant information the feature selection problem is essentially combinatorial optimization problem which is computationally expensive traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature these approaches neglect the possible correlation between different features and thus can not produce an optimal feature subset inspired from the recent developments on manifold learning and regularized models for subset selection we propose in this paper new approach called multi cluster feature selection mcfs for unsupervised feature selection specifically we select those features such that the multi cluster structure of the data can be best preserved the corresponding optimization problem can be efficiently solved since it only involves sparse eigen problem and regularized least squares problem extensive experimental results over various real life data sets have demonstrated the superiority of the proposed algorithm
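A compact sketch of this style of multi-cluster feature selection: build a k-nearest-neighbor graph, take the low eigenvectors of its Laplacian as a flat cluster embedding, regress each embedding dimension on the features with an L1 penalty, and score features by their largest coefficient magnitude. The parameter values and the use of scikit-learn's Lasso are assumptions for illustration.

```python
# Minimal sketch of multi-cluster feature selection via spectral embedding
# plus L1-regularized regression. Parameter values are illustrative.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.linear_model import Lasso
from scipy.sparse.csgraph import laplacian

def mcfs_scores(X, n_clusters=3, n_neighbors=5, alpha=0.01):
    # 1. k-NN affinity graph and its symmetrized, normalized Laplacian.
    W = kneighbors_graph(X, n_neighbors, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)
    L = laplacian(W, normed=True).toarray()
    # 2. Flat embedding: eigenvectors for the smallest eigenvalues of L.
    _, evecs = np.linalg.eigh(L)
    Y = evecs[:, :n_clusters]
    # 3. Sparse regression of each embedding dimension on the original features.
    coefs = np.vstack([
        Lasso(alpha=alpha, max_iter=5000).fit(X, Y[:, j]).coef_
        for j in range(n_clusters)
    ])
    # 4. Feature score = largest |coefficient| across embedding dimensions.
    return np.abs(coefs).max(axis=0)

X = np.random.RandomState(0).rand(100, 20)
scores = mcfs_scores(X)
print(np.argsort(scores)[::-1][:5])   # indices of the top-scoring features
```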
this paper describes robust mechanism for transmitting meshes over the internet tcp ip is an excellent means for reliable transport over the internet however multi user real time graphics applications may find tcp transmission disadvantageous when reception of mesh is time critical to improve speed one could use an unreliable transmission protocol yet typical mesh compression schemes increase the fragility of the mesh to lossy transmission in this paper we develop hybrid method of transmitting meshes over the internet built upon progressive mesh technology the hybrid method transmits important visual detail in lossless manner but trades off loss of visually less important detail for transmission speed tests of the method in lossy network environment show that the method improves the transmission time of the mesh with little degradation in quality
run time errors in concurrent programs are generally due to the wrong usage of synchronization primitives such as monitors conventional validation techniques such as testing become ineffective for concurrent programs since the state space increases exponentially with the number of concurrent processes in this paper we propose an approach in which the concurrency control component of concurrent program is formally specified it is verified automatically using model checking and the code for concurrency control component is automatically generated we use monitors as the synchronization primitive to control access to shared resource by multiple concurrent processes since our approach decouples the concurrency control component from the rest of the implementation it is scalable we demonstrate the usefulness of our approach by applying it to case study on airport ground traffic control we use the action language to specify the concurrency control component of system action language is specification language for reactive software systems it is supported by an infinite state model checker that can verify systems with boolean enumerated and unbounded integer variables our code generation tool automatically translates the verified action language specification into java monitor our translation algorithm employs symbolic manipulation techniques and the specific notification pattern to generate an optimized monitor class by eliminating the context switch overhead introduced as result of unnecessary thread notification using counting abstraction we show that we can automatically verify the monitor specifications for arbitrary number of threads
database query engines typically rely upon query size estimators in order to evaluate the potential cost of alternate query plans in multi dimensional database systems such as those typically found in large data warehousing environments these selectivity estimators often take the form of multi dimensional histograms but while single dimensional histograms have proven to be quite accurate even in the presence of data skew the multi dimensional variations have generally been far less reliable in this paper we present new histogram model that is based upon an tree space partitioning the localization of the tree boxes is in turn controlled by hilbert space filling curve while series of efficient area equalization heuristics restructures the initial boxes to provide improved bucket representation experimental results demonstrate significantly improved estimation accuracy relative to state of the art alternatives as well as superior consistency across variety of record distributions
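The sketch below illustrates the general recipe of space-filling-curve histograms: order the tuples along a curve, cut the order into equal-count runs, and keep a bounding box and count per bucket for uniformity-based estimation. For simplicity it uses Morton (z) order as a stand-in for the Hilbert curve and omits the tree construction and area-equalization heuristics.

```python
# Minimal sketch of a multi-dimensional histogram built by ordering points along
# a space-filling curve and cutting the order into equal-count buckets.
import numpy as np

def morton_key(x, y, bits=16):
    """Interleave the bits of two integer coordinates (z-order key)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def build_histogram(points, n_buckets=8, bits=16):
    pts = np.asarray(points, dtype=float)
    span = np.ptp(pts, axis=0) + 1e-12
    scaled = ((pts - pts.min(0)) / span * (2**bits - 1)).astype(int)
    order = np.argsort([morton_key(x, y, bits) for x, y in scaled])
    buckets = []
    for chunk in np.array_split(pts[order], n_buckets):
        buckets.append((chunk.min(0), chunk.max(0), len(chunk)))  # box + count
    return buckets

def estimate(buckets, qlo, qhi):
    """Estimate how many points fall in the query box, assuming uniformity."""
    qlo, qhi = np.asarray(qlo, float), np.asarray(qhi, float)
    total = 0.0
    for lo, hi, count in buckets:
        inter = np.minimum(hi, qhi) - np.maximum(lo, qlo)
        if np.all(inter >= 0):
            vol = np.prod(np.maximum(hi - lo, 1e-12))
            total += count * np.prod(inter) / vol
    return total

pts = np.random.RandomState(1).rand(2000, 2)
hist = build_histogram(pts)
print(estimate(hist, (0.2, 0.2), (0.5, 0.5)), "vs true",
      np.sum(np.all((pts >= (0.2, 0.2)) & (pts <= (0.5, 0.5)), axis=1)))
```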
for continuous valued information systems the attribute values of objects for the same attribute represent not only their ordinal relationship but also their relative distances therefore the classical rough set model is not suitable for deducing attribute reductions and optimal decision rules for continuous valued information systems though some discretization methods are proposed to transform the continuous valued information systems into discrete ones those methods are too categorical and may lead to loss of information in some cases to solve such information loss problem we propose tolerance rough set model in this paper with given level the proposed model can divide universe into some maximal tolerance classes also two types of lower and upper approximations are defined accordingly then the reductions of the maximal tolerance class and optimal decision rules based on the proposed attribute descriptors are defined and the approximate discernibility function for the maximal tolerance class is constructed and used to compute all the corresponding optimal decision rules via boolean reasoning techniques finally the general reductions and consistent reductions for continuous valued information systems are discussed
we study the problem of exploiting parallelism from search based ai systems on share nothing platforms ie platforms where different machines do not have access to any form of shared memory we propose novel environment representation technique called stack splitting which is modification of the well known stack copying technique that enables the efficient exploitation of or parallelism from ai systems on distributed memory machines stack splitting coupled with appropriate scheduling strategies leads to reduced communication during distributed execution and effective distribution of larger grain sized work to processors the novel technique can also be implemented on shared memory machines and it is quite competitive in this paper we present distributed implementation of or parallelism based on stack splitting including results our results suggest that stack splitting is an effective technique for obtaining high performance parallel ai systems on shared memory as well as distributed memory multiprocessors
wearable computing and smart clothing have attracted lot of attention in the last years for variety of applications it can be seen as potential future direction of mobile user interfaces in this paper we concentrate on usability and applicability issues concerned with capacitive touch input on clothing to be able to perform user studies we built generic platform for attaching eg capacitive sensors of different types on top of that several prototypes of wearable accessories and clothing and implemented various application scenarios we report on two studies we undertook with these implementations with user group randomly sampled at shopping mall we provide significant set of guidelines and lessons learned that emerged from our experiences and those studies thus developers of similar projects have to put major efforts into minimizing the delay between button activation and feedback and to make location and identification of controls and their function as simple and quick as possible issues that have to be treated in all designs include the requirement of one handed interaction and that even for minimal functionality to find general solution with regard to layout and button to function mapping is hardly possible additionally in order to generate satisfactory user experience good usability must be combined with aesthetical factors
this paper presents web graph representation based on compact tree structure that takes advantage of large empty areas of the adjacency matrix of the graph our results show that our method is competitive with the best alternatives in the literature offering very good compression ratio bits per link while permitting fast navigation on the graph to obtain direct as well as reverse neighbors microseconds per neighbor delivered moreover it allows for extended functionality not usually considered in compressed graph representations
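To convey the idea of exploiting empty areas of the adjacency matrix, the sketch below builds a pointer-based quadtree that stores only non-empty quadrants and answers direct-link queries by descent; an actual k2-tree-style structure packs the same tree into bit arrays to reach a few bits per link, which this toy version does not attempt.

```python
# Minimal sketch of a quadtree over the adjacency matrix that keeps only
# non-empty quadrants; empty regions cost nothing.

def build(edges, size):
    """size must be a power of two; edges is a set of (u, v) pairs."""
    if not edges:
        return None
    if size == 1:
        return True                                  # a single present link
    half = size // 2
    quads = [set() for _ in range(4)]
    for u, v in edges:
        q = (u >= half) * 2 + (v >= half)
        quads[q].add((u % half, v % half))
    return [build(q, half) for q in quads]

def has_edge(node, u, v, size):
    if node is None:
        return False
    if size == 1:
        return True
    half = size // 2
    q = (u >= half) * 2 + (v >= half)
    return has_edge(node[q], u % half, v % half, half)

edges = {(0, 1), (1, 2), (5, 7), (6, 0)}
root = build(edges, 8)
print(has_edge(root, 5, 7, 8), has_edge(root, 5, 6, 8))   # True False
```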
one of the main reasons for the failure of many software projects is the late discovery of mismatch between the customers expectations and the pieces of functionality implemented in the delivered system at the root of such mismatch is often set of poorly defined incomplete under specified and inconsistent requirements test driven development has recently been proposed as way to clarify requirements during the initial elicitation phase by means of acceptance tests that specify the desired behavior of the system the goal of the work reported in this paper is to empirically characterize the contribution of acceptance tests to the clarification of the requirements coming from the customer we focused on fit tables way to express acceptance tests which can be automatically translated into executable test cases we ran two experiments with students from university of trento and politecnico of torino to assess the impact of fit tables on the clarity of requirements we considered whether fit tables actually improve requirement understanding and whether this requires any additional comprehension effort experimental results show that fit helps in the understanding of requirements without requiring significant additional effort
ifra an acronym for instruction footprint recording and analysis overcomes major challenges associated with very expensive step in post silicon validation of processors pinpointing bug location and the instruction sequence that exposes the bug from system failure such as crash special on chip recorders inserted in processor during design collect instruction footprints special information about flows of instructions and what the instructions did as they passed through various microarchitectural blocks of the processor the recording is done concurrently during the normal operation of the processor in post silicon system validation setup upon detection of system failure the recorded information is scanned out and analyzed offline for bug localization special self consistency based program analysis techniques together with the test program binary of the application executed during post silicon validation are used for this purpose major benefits of using ifra over traditional techniques for post silicon bug localization are it does not require full system level reproduction of bugs and it does not require full system level simulation hence it can overcome major hurdles that limit the scalability of traditional post silicon validation methodologies simulation results on complex superscalar processor demonstrate that ifra is effective in accurately localizing electrical bugs with chip level area impact
this paper presents an extremely lightweight dynamic voltage and frequency scaling technique targeted towards modern multi tasking systems the technique utilizes processors runtime statistics and an online learning algorithm to estimate the best suited voltage and frequency setting at any given point in time we implemented the proposed technique in linux running on an intel pxax platform and performed experiments in both single and multi task environments our measurements show that we can achieve the maximum energy savings of and reduce the implementation overhead by factor of when compared to state of the art techniques
web service technology provides an infrastructure for developing distributed systems and performing electronic business operations within and across organizational boundaries it is still evolving currently it is lacking mechanisms to deal with quality of service qos service consumer requirements may include functional and non functional aspects the web services description language wsdl and universal description discovery integration uddi standards support the specification publication and discovery of web services based only on functional aspects the goal of this paper is to propose an approach for supporting web service interactions brokers are employed to facilitate the partnership establishment between service consumers and providers they select services in uddi registries according to consumer functional and non functional requirements the main contributions of this paper are an extension to the web services policy framework ws policy standard to complement wsdl descriptions with semantics enriched qos policies using the ontology web language owl and able rule language arl standards and an extension to the uddi standard to include qos policies
we address the problem of automatic interpretation of non exaggerated human facial and body behaviours captured in video we illustrate our approach by three examples we introduce canonical correlation analysis cca and matrix canonical correlation analysis mcca for capturing and analyzing spatial correlations among non adjacent facial parts for facial behaviour analysis we extend canonical correlation analysis to multimodality correlation for behaviour inference using both facial and body gestures we model temporal correlation among human movement patterns in wider space using mixture of multi observation hidden markov model for human behaviour profiling and behavioural anomaly detection
spam is highly pervasive in pp file sharing systems and is difficult to detect automatically before actually downloading file due to the insufficient and biased description of file returned to client as query result to alleviate this problem we propose probing technique to collect more complete feature information of query results from the network and apply feature based ranking for automatically detecting spam in pp query result sets furthermore we examine the tradeoff between the spam detection performance and the network cost different ways of probing are explored to reduce the network cost experimental results show that the proposed techniques successfully decrease the amount of spam by in the top results and by in the top results with reasonable cost
in data collection applications of low end sensor networks major challenge is ensuring reliability without significant goodput degradation short hops over high quality links minimize per hop transmissions but long routes may cause congestion and load imbalance longer links can be exploited to build shorter routes but poor links may have high energy cost there exists complex interplay among routing performance reliability goodput energy efficiency link estimation congestion control and load balancing we design routing architecture arbutus that exploits this interplay and perform an extensive experimental evaluation on testbeds of berkeley motes
how does the web look how could we tell an abnormal social network from normal one these and similar questions are important in many fields where the data can intuitively be cast as graph examples range from computer networks to sociology to biology and many more indeed any relation in database terminology can be represented as graph lot of these questions boil down to the following “how can we generate synthetic but realistic graphs” to answer this we must first understand what patterns are common in real world graphs and can thus be considered mark of normality realism this survey gives an overview of the incredible variety of work that has been done on these problems one of our main contributions is the integration of points of view from physics mathematics sociology and computer science further we briefly describe recent advances on some related and interesting graph problems
when commerce companies merge there is need to integrate their local schema into uniform global source that is easily generated and maintained in this paper we explore the incremental maintenance of global xml schema against updates to local schemas with the use of three simple operations add remove and change these operations are designed to work as an extension to the axis model which currently does not have way to maintain the global schema once an underlying source schema is updated
this paper presents dynamic feedback technique that enables computations to adapt dynamically to different execution environments compiler that uses dynamic feedback produces several different versions of the same source code each version uses different optimization policy the generated code alternately performs sampling phases and production phases each sampling phase measures the overhead of each version in the current environment each production phase uses the version with the least overhead in the previous sampling phase the computation periodically resamples to adjust dynamically to changes in the environment we have implemented dynamic feedback in the context of parallelizing compiler for object based programs the generated code uses dynamic feedback to automatically choose the best synchronization optimization policy our experimental results show that the synchronization optimization policy has significant impact on the overall performance of the computation that the best policy varies from program to program that the compiler is unable to statically choose the best policy and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy we have also performed theoretical analysis which provides under certain assumptions guaranteed optimality bound for dynamic feedback relative to hypothetical and unrealizable optimal algorithm that uses the best policy at every point during the execution
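A minimal sketch of the alternation between sampling and production phases: time every generated version during a short sampling phase, then run the cheapest one until the next resampling point. The dummy versions, phase lengths, and timing mechanism are illustrative assumptions.

```python
# Minimal sketch of dynamic feedback: periodic sampling phases time every code
# version, and production phases run whichever version was cheapest last.
import time

def version_coarse(data):   # stand-in for one generated policy version
    return sum(x * x for x in data)

def version_fine(data):     # stand-in for another generated policy version
    return sum(map(lambda x: x * x, data))

VERSIONS = [version_coarse, version_fine]

def run_adaptive(batches, sample_every=10):
    best = VERSIONS[0]
    for i, batch in enumerate(batches):
        if i % sample_every == 0:                    # sampling phase
            timings = []
            for version in VERSIONS:
                t0 = time.perf_counter()
                version(batch)
                timings.append((time.perf_counter() - t0, version))
            best = min(timings, key=lambda t: t[0])[1]
        else:                                        # production phase
            best(batch)

run_adaptive([list(range(10000))] * 100)
```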
wireless sensor networks wsn are envisioned to have significant impacts on many applications in scenarios such as first responder systems and administrative applications data sinks are frequently being mobile in this research we study one to many and many to many data communications from stationary sensors to mobile sinks which is defined as mobile multicasting here we firstly propose track and transmit tnt simple lightweight approach to providing mobile multicasting service in wsn its rationale is based on the fact that the mobile sinks will stamp their movement traces in the networks while moving continuously from one location to another then we propose priced tnt ptnt which improves the forwarding efficiency of tnt compared to very lightweight mobile multicasting vlm simulations show that both tnt and ptnt are able to dramatically suppress the control overheads as well as achieve better data delivery ratio and acceptable delay performances
recently there has been considerable interest in mining spatial colocation patterns from large spatial datasets spatial colocations represent the subsets of spatial events whose instances are frequently located together in nearby geographic area most studies of spatial colocation mining require the specification of minimum prevalent threshold to find the interesting patterns however it is difficult for users to provide appropriate thresholds without prior knowledge about the task specific spatial data we propose different framework for spatial colocation pattern mining finding most prevalent colocated event sets where is the desired number of event sets with the highest interest measure values per each pattern size we developed an algorithm for mining most prevalent colocation patterns experimental results with real data show that our algorithmic design is computationally effective
on chip caches consume significant fraction of the energy in current microprocessors as result architectural circuit level techniques such as block buffering and sub banking have been proposed and shown to be very effective in reducing the energy consumption of on chip caches while there has been some work on evaluating the energy and performance impact of different block buffering schemes we are not aware of software solutions to take advantage of on chip cache block buffers this article presents compiler based approach that modifies code and variable layout to take better advantage of block buffering the proposed technique is aimed at class of embedded codes that make heavy use of scalar variables unlike previous work that uses only storage pattern optimization or only access pattern optimization we propose an integrated approach that uses both code restructuring which affects the access sequence and storage pattern optimization which determines the storage layout of variables we use graph based formulation of the problem and present solution for determining suitable variable placements and accompanying access pattern transformations the proposed technique has been implemented using an experimental compiler and evaluated using set of complete programs the experimental results demonstrate that our approach leads to significant energy savings based on these results we conclude that compiler support is complementary to architecture and circuit based techniques to extract the best energy behavior from cache subsystem that employs block buffering
distance estimation is important for localization and multitude of other tasks in wireless sensor networks we propose new scheme for distance estimation based on the comparison of neighborhood lists it is inspired by the observation that distant nodes have fewer neighbors in common than close ones unlike many distance estimation schemes it relies neither on special hardware nor on unreliable measurements of physical wireless communication properties like rssi additionally the approach benefits from message exchange by other protocols and requires single additional message exchange for distance estimation we will show that the approach is universally applicable and works with arbitrary radio hardware we discuss related work and present the new approach in detail including its mathematical foundations we demonstrate the performance of our approach by presenting various simulation results
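One way such an estimator could look under a unit-disk, uniform-density assumption: the expected fraction of common neighbors equals the normalized overlap area of two radio disks, which can be inverted numerically for the distance. The radio range r and the disk model are assumptions, not details taken from the paper.

```python
# Minimal sketch of neighborhood-list distance estimation under a unit-disk,
# uniform-density model: invert the expected common-neighbor fraction.
import math

def overlap_fraction(d, r):
    """Area of intersection of two radius-r disks at distance d, over pi*r^2."""
    if d >= 2 * r:
        return 0.0
    lens = 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)
    return lens / (math.pi * r * r)

def estimate_distance(neighbors_a, neighbors_b, r):
    common = len(set(neighbors_a) & set(neighbors_b))
    frac = common / max(len(set(neighbors_a)), 1)
    lo, hi = 0.0, 2 * r                       # overlap_fraction decreases in d
    for _ in range(60):                       # bisection to invert it
        mid = (lo + hi) / 2
        if overlap_fraction(mid, r) > frac:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two nodes sharing 12 of node A's 20 neighbors, radio range 10 units.
print(round(estimate_distance(range(20), range(8, 28), r=10.0), 2))
```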
with the exponential growth in size of geometric data it is becoming increasingly important to make effective use of multilevel caches limited disk storage and bandwidth as result recent work in the visualization community has focused either on designing sequential access compression schemes or on producing cache coherent layouts of uncompressed meshes for random access unfortunately combining these two strategies is challenging as they fundamentally assume conflicting modes of data access in this paper we propose novel order preserving compression method that supports transparent random access to compressed triangle meshes our decompression method selectively fetches from disk decodes and caches in memory requested parts of mesh we also provide general mesh access api for seamless mesh traversal and incidence queries while the method imposes no particular mesh layout it is especially suitable for cache oblivious layouts which minimize the number of decompression requests and provide high cache utilization during access to decompressed in memory portions of the mesh moreover the transparency of our scheme enables improved performance without the need for application code changes we achieve compression rates on the order of and significantly improved performance due to reduced data transfer to demonstrate the benefits of our method we implement two common applications as benchmarks by using cache oblivious layouts for the input models we observe times overall speedup compared to using uncompressed meshes
every piece of textual data is generated as method to convey its authors opinion regarding specific topics authors deliberately organize their writings and create links ie references acknowledgments for better expression thereafter it is of interest to study texts as well as their relations to understand the underlying topics and communities although many efforts exist in the literature in data clustering and topic mining they are not applicable to community discovery on large document corpus for several reasons first few of them consider both textual attributes as well as relations second scalability remains significant issue for large scale datasets additionally most algorithms rely on set of initial parameters that are hard to capture and tune motivated by the aforementioned observations hierarchical community model is proposed in the paper which distinguishes community cores from affiliated members we present our efforts to develop scalable community discovery solution for large scale document corpus our proposal tries to quickly identify potential cores as seeds of communities through relation analysis to eliminate the influence of initial parameters an innovative attribute based core merge process is introduced so that the algorithm promises to return consistent communities regardless of initial parameters experimental results suggest that the proposed method has high scalability to corpus size and feature dimensionality with more than topical precision improvement compared with popular clustering techniques
contextual structure servers and versioning servers share similar goal in allowing different views on stored structure according to the viewer’s perspective in this paper we argue that generic contextual model can be used to facilitate versioning in order to prove our hypothesis we have drawn on our experiences with ohp version to extend fohm’s contextual model
ad hoc environments are subject to tight security and architectural constraints which call for distributed adaptive robust and efficient solutions in this paper we propose distributed signature protocol for large scale long lived ad hoc networks the proposed protocol is based on rsa and new secret sharing scheme the nodes of the network are uniformly partitioned into classes and the nodes belonging to the same class are provided with the same share any nodes belonging to different classes can collectively issue signature without any interaction the scheme is at least as secure as any threshold scheme ie an adversary can neither forge signature nor disrupt the computation unless it has compromised at least nodes belonging to different classes moreover an attempt to disrupt the distributed service by providing fake signature share would reveal the cheating node further it is possible to easily increase the level of security by shifting from to scheme for reasonable choice of parameter involving just fraction of the nodes so that the scheme is adaptive to the level of threat that the ad hoc network is subject to finally the distributed signature protocol is efficient the number of messages sent and received for generating signature as well as to increase the level of security is small and both computations and memory required are small as well all the authors have been partially funded by the web minds project supported by the italian miur under the firb program roberto di pietro is also partially supported by the cnr isti pisa in the framework of the “satnex ii” noe project contract
we define and design succinct indexes for several abstract data types adts the concept is to design auxiliary data structures that occupy asymptotically less space than the information theoretic lower bound on the space required to encode the given data and support an extended set of operations using the basic operators defined in the adt as opposed to succinct integrated data index encodings the main advantage of succinct indexes is that we make assumptions only on the adt through which the main data is accessed rather than the way in which the data is encoded this allows more freedom in the encoding of the main data in this paper we present succinct indexes for various data types namely strings binary relations and multi labeled trees given the support for the interface of the adts of these data types we can support various useful operations efficiently by constructing succinct indexes for them when the operators in the adts are supported in constant time our results are comparable to previous results while allowing more flexibility in the encoding of the given data using our techniques we design succinct encoding that represents string of length over an alphabet of size sigma using nhk lg sigma bits to support access rank select operations in lg lg sigma time we also design succinct text index using nhk lg sigma bits that supports pattern matching queries in lg lg sigma occ lg epsilon nlg lg sigma time for given pattern of length previous results on these two problems either have lg sigma factor instead of lg lg sigma in terms of running time or are not compressible
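To make the separation between the index and the data concrete, the toy structure below builds a rank/select directory for a bit string purely through an access operator, in the spirit of an index layered on top of an ADT; unlike a real succinct index it makes no attempt to keep the directory within o(n) bits.

```python
# Minimal sketch of an auxiliary rank/select index built only through an
# "access" operator supplied by the underlying data representation.

class RankSelectIndex:
    def __init__(self, access, n, block=64):
        self.access, self.n, self.block = access, n, block
        self.prefix = [0]                       # ones seen before each block
        ones = 0
        for i in range(n):
            if i % block == 0 and i > 0:
                self.prefix.append(ones)
            ones += access(i)
        self.total_ones = ones

    def rank1(self, i):
        """Number of 1 bits in positions [0, i)."""
        b = i // self.block
        ones = self.prefix[b]
        for j in range(b * self.block, i):
            ones += self.access(j)
        return ones

    def select1(self, k):
        """Position of the k-th 1 bit (1-based), by binary search on rank."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1]
idx = RankSelectIndex(lambda i: bits[i], len(bits), block=4)
print(idx.rank1(5), idx.select1(3))   # 3 3
```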
finding the position of mobile agent in wide distributed system still represents an open research issue the various existing mobile agent platforms implement ad hoc naming and location policies studied to address the requirements of the design choices made this paper proposes naming scheme and location protocol of general validity for mobile agents able to effectively meet all the typical requirements of mobile agent environments and thus easy to integrate into different platforms the paper identifies the main characteristics which an agent naming scheme and location protocol of general validity should have and suggests some properties and parameters to be taken into account to evaluate the effectiveness of naming schemes and location protocols then we propose human readable agent naming scheme based on the distributed environment outlined in masif and suitable location finding protocol called the search by path chase both of them are compared with some of the solutions already provided using the properties and the parameters suggested the performances are finally evaluated by means of set of measurements
recent studies have demonstrated that it is possible to perform public key cryptographic operations on the resource constrained sensor platforms however the significant resource consumption imposed by public key cryptographic operations makes such mechanisms easy targets of denial of service dos attacks for example if digital signatures such as ecdsa are used directly for broadcast authentication without further protection an attacker can simply broadcast forged packets and force the receiving nodes to perform large number of unnecessary signature verifications eventually exhausting their battery power this paper studies how to deal with such dos attacks when signatures are used for broadcast authentication in sensor networks in particular this paper presents two filtering techniques group based filter and key chain based filter to handle dos attacks against signature verification both methods can significantly reduce the number of unnecessary signature verifications that sensor node has to perform the analytical results also show that these two techniques are efficient and effective for resource constrained sensor networks
this paper makes the case that operating system designs for sensor networks should focus on the coordination of resource management decisions across the network rather than merely on individual nodes we motivate this view by describing the challenges inherent to achieving globally efficient use of sensor network resources especially when the network is subject to unexpected variations in both load and resource availability we present peloton new distributed os for sensor networks that provides mechanisms for representing distributed resource allocations efficient state sharing across nodes and decentralized management of network resources we outline the peloton os architecture and present three sample use cases to illustrate its design
in this paper we address the issue of learning to rank for document retrieval in the task model is automatically created with some training data and then is utilized for ranking of documents the goodness of model is usually evaluated with performance measures such as map mean average precision and ndcg normalized discounted cumulative gain ideally learning algorithm would train ranking model that could directly optimize the performance measures with respect to the training data existing methods however are only able to train ranking models by minimizing loss functions loosely related to the performance measures for example ranking svm and rankboost train ranking models by minimizing classification errors on instance pairs to deal with the problem we propose novel learning algorithm within the framework of boosting which can minimize loss function directly defined on the performance measures our algorithm referred to as adarank repeatedly constructs weak rankers on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions we prove that the training process of adarank is exactly that of enhancing the performance measure used experimental results on four benchmark datasets show that adarank significantly outperforms the baseline methods of bm ranking svm and rankboost
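A minimal sketch of an AdaRank-style round structure with single features as weak rankers and average precision as the performance measure; the synthetic data, number of rounds, and weak-ranker family are illustrative assumptions.

```python
# Minimal sketch of a boosting-for-ranking loop that re-weights queries by a
# performance measure (average precision) at each round.
import numpy as np

def average_precision(scores, labels):
    order = np.argsort(scores)[::-1]
    rel = np.asarray(labels)[order]
    if rel.sum() == 0:
        return 0.0
    prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((prec * rel).sum() / rel.sum())

def adarank(queries, n_rounds=10):
    """queries: list of (feature_matrix, relevance_labels) per query."""
    m, n_feat = len(queries), queries[0][0].shape[1]
    weights = np.full(m, 1.0 / m)              # one weight per query
    alphas, chosen = [], []
    for _ in range(n_rounds):
        # 1. Pick the feature (weak ranker) best on the re-weighted queries.
        perf = np.array([[average_precision(X[:, f], y) for X, y in queries]
                         for f in range(n_feat)])
        f_best = int(np.argmax(perf @ weights))
        e = perf[f_best]
        # 2. Weak-ranker weight.
        alpha = 0.5 * np.log((weights @ (1 + e)) / (weights @ (1 - e) + 1e-12))
        alphas.append(alpha); chosen.append(f_best)
        # 3. Re-weight queries by the combined ranker's current performance.
        combined = [average_precision(X[:, chosen] @ np.array(alphas), y)
                    for X, y in queries]
        weights = np.exp(-np.array(combined))
        weights /= weights.sum()
    return chosen, alphas

rng = np.random.RandomState(0)
queries = [(rng.rand(30, 5), (rng.rand(30) > 0.7).astype(int)) for _ in range(20)]
print(adarank(queries, n_rounds=5))
```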
applications with widely shared data do not perform well on cc numa multiprocessors due to the hot spots they create in the system in this paper we address this problem by enhancing the memory controller with forwarding mechanism capable of hiding the read latency of widely shared data while potentially decreasing the memory and network contention based on the influx of requests the memory anticipates the next read references and forwards the data in advance to the processors to identify the set of processors the data is to be forwarded to we use heuristic based on the spatial locality of memory blocks to increase the forwarding effectiveness and minimize the number of messages we incorporate simple filters combined with feedback mechanism we also show that further improvements are possible using combined software prefetching hardware forwarding approach our experimental results obtained with detailed execution driven simulator with ilp processors show significant improvements in execution time up to
an ad hoc network is formed when two or more wireless nodes agree to forward packets on behalf of each other as the wireless range of such nodes is severely limited the nodes mutually cooperate with their neighbours in order to extend the overall communication range of the network dynamic source routing dsr is one of the commonly used protocols used in establishing ad hoc networks the network keeps on functioning smoothly when each node executes the routing protocol in the correct manner however along with benevolent nodes there may always be some malicious and selfish nodes present in the network that try to disrupt distort or disturb the network traffic in this paper we propose novel and pragmatic scheme for establishing and sustaining trustworthy routes in the network each node maintains trust levels for its immediate neighbours based upon their current actions nodes also share these trust levels reputations to get ancillary information about other nodes in the network in order to minimise control packet overhead we have integrated the trust sharing mechanism with the dsr route discovery process in unique manner that augments the protocol’s performance in the presence of malicious nodes
this paper presents novel prototype hierarchy based clustering phc framework for the organization of web collections it solves simultaneously the problem of categorizing web collections and interpreting the clustering results for navigation by utilizing prototype hierarchies and the underlying topic structures of the collections phc is modeled as multi criterion optimization problem based on minimizing the hierarchy evolution maximizing category cohesiveness and inter hierarchy structural and semantic resemblance the flexible design of metrics enables phc to be general framework for applications in various domains in the experiments on categorizing collections of distinct domains phc achieves improvement in over the state of the art techniques further experiments provide insights on performance variations with abstract and concrete domains completeness of the prototype hierarchy and effects of different combinations of optimization criteria
an important feature of database support for expert systems is the ability of the database to answer queries regarding the existence of path from one node to another in the directed graph underlying some database relation given just the database relation answering such query is time consuming but given the transitive closure of the database relation table look up suffices we present an indexing scheme that permits the storage of the pre computed transitive closure of database relation in compressed form the existence of specified tuple in the closure can be determined from this compressed store by single look up followed by an index comparison we show how to add nodes and arcs to the compressed closure incrementally we also suggest how this compression technique can be used to reduce the effort required to compute the transitive closure
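A small sketch of the interval-labeling flavor of compressed transitive closures, restricted to the tree-structured case: one DFS assigns each node a [low, post] interval and a reachability query becomes a single interval test; handling general directed graphs requires a set of intervals per node, which this sketch omits.

```python
# Minimal sketch of interval labels as a compressed transitive closure for trees.

def label(tree, root):
    """tree: node -> list of children. Returns node -> (low, post)."""
    labels, counter = {}, [0]
    def dfs(u):
        low = counter[0] + 1
        for child in tree.get(u, []):
            dfs(child)
        counter[0] += 1
        labels[u] = (low, counter[0])
        return labels[u]
    dfs(root)
    return labels

def reaches(labels, u, v):
    lo, post = labels[u]
    return lo <= labels[v][1] <= post      # single look-up plus interval test

tree = {"a": ["b", "c"], "b": ["d", "e"], "c": []}
labels = label(tree, "a")
print(reaches(labels, "a", "e"), reaches(labels, "c", "d"))   # True False
```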
in this paper we propose novel computational method to infer visual saliency in images the computational method is based on the idea that salient objects should have local characteristics that are different than the rest of the scene being edges color or shape and that these characteristics can be combined to infer global information the proposed approach is fast does not require any learning and the experimentation shows that it can enhance interesting objects in images improving the state of the art performance on public dataset
the publication and reuse of intellectual resources using the web technologies provide no support for us to clip out any portion of web pages to combine them together for their local reuse nor to distribute the newly composed object for its reuse by other people such composition requires both layout composition and functional composition both of them should be performed only through direct manipulation this paper shows how the meme media architecture is applied to the web to provide such support for us this makes the web work as shared repository not only for publishing intellectual resources but also for their collaborative reediting and furthermore for the flexible federation of intellectual resources federation here denotes ad hoc definition and or execution of interoperation among intellectual resources we will propose general framework for clipping arbitrary web contents as live objects and for the recombination and linkage of such clips based on both the original and some user defined relationships among them this framework allows us to define federations of intellectual resources over the web dynamically through direct manipulation
we present the results of two controlled studies comparing layered surface visualizations under various texture conditions the task was to estimate surface normals measured by accuracy of hand set surface normal probe single surface visualization was compared with the two surfaces case under conditions of no texture and with projected grid textures variations in relative texture spacing on top and bottom surfaces were compared as well as opacity of the top surface significant improvements are found for the textured cases over non textured surfaces either larger or thinner top surface textures and lower top surface opacities are shown to give less bottom surface error top surface error appears to be highly resilient to changes in texture given the results we also present an example of how appropriate textures might be useful in volume visualization
this paper aims to address the face recognition problem with wide variety of views we proposed tensor subspace analysis and view manifold modeling based multi view face recognition algorithm by improving the tensorface based one tensor subspace analysis is applied to separate the identity and view information of multi view face images to model the nonlinearity in view subspace novel view manifold is introduced to tensorface thus uniform multi view face model is achieved to deal with the linearity in identity subspace as well as the nonlinearity in view subspace meanwhile parameter estimation algorithm is developed to solve the view and identity factors automatically the new face model yields improved facial recognition rates against the traditional tensorface based method
this paper presents worm it new intrusion tolerant group communication system with membership service and view synchronous atomic multicast primitive the system is intrusion tolerant in the sense that it behaves correctly even if some nodes are corrupted and become malicious it is based on novel approach that enhances the environment with special secure distributed component used by the protocols to execute securely few crucial operations using this approach we manage to bring together two important features worm it tolerates the maximum number of malicious members possible it does not have to detect the failure of primary members problem in previous intrusion tolerant group communication systems
with the exponential growth of web contents recommender system has become indispensable for discovering new information that might interest web users despite their success in the industry traditional recommender systems suffer from several problems first the sparseness of the user item matrix seriously affects the recommendation quality second traditional recommender systems ignore the connections among users which loses the opportunity to provide more accurate and personalized recommendations in this paper aiming at providing more realistic and accurate recommendations we propose factor analysis based optimization framework to incorporate the user trust and distrust relationships into the recommender systems the contributions of this paper are three fold we elaborate how user distrust information can benefit the recommender systems in terms of the trust relations distinct from previous trust aware recommender systems which are based on some heuristics we systematically interpret how to constrain the objective function with trust regularization the experimental results show that the distrust relations among users are as important as the trust relations the complexity analysis shows our method scales linearly with the number of observations while the empirical analysis on large epinions dataset proves that our approaches perform better than the state of the art approaches
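The paper's exact factor-analysis formulation is not given in the abstract, so the following is only a minimal sketch of the general idea it builds on: matrix factorization whose objective adds a regularizer pulling a user's latent factors toward trusted users and pushing them away from distrusted ones. The data, hyperparameters, and plain gradient descent loop are all assumptions for illustration.

# Sketch: matrix factorization with a trust/distrust regularizer.
# Loss = sum over observed ratings (r_ui - U_u . V_i)^2
#        + lam * (||U||^2 + ||V||^2)
#        + alpha * sum over trust pairs    ||U_u - U_t||^2
#        - beta  * sum over distrust pairs ||U_u - U_d||^2  (beta kept small)
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 6, 3
ratings = {(0, 1): 4.0, (0, 2): 2.0, (1, 1): 5.0, (2, 3): 3.0, (3, 4): 1.0}
trust = [(0, 1), (2, 3)]      # hypothetical: user 0 trusts user 1, ...
distrust = [(0, 3)]           # hypothetical: user 0 distrusts user 3

U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
lam, alpha, beta, lr = 0.1, 0.5, 0.1, 0.05

for _ in range(200):
    dU, dV = lam * U, lam * V
    for (u, i), r in ratings.items():
        err = U[u] @ V[i] - r
        dU[u] += err * V[i]
        dV[i] += err * U[u]
    for u, t in trust:          # pull trusted users' factors together
        dU[u] += alpha * (U[u] - U[t])
        dU[t] += alpha * (U[t] - U[u])
    for u, d in distrust:       # push distrusted users' factors apart
        dU[u] -= beta * (U[u] - U[d])
        dU[d] -= beta * (U[d] - U[u])
    U -= lr * dU
    V -= lr * dV

print("predicted rating for user 0, item 3:", round(float(U[0] @ V[3]), 2))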
one of the challenges with using mobile touch screen devices is that they do not provide tactile feedback to the user thus the user is required to look at the screen to interact with these devices in this paper we present semfeel tactile feedback system which informs the user about the presence of an object where she touches on the screen and can offer additional semantic information about that item through multiple vibration motors that we attached to the backside of mobile touch screen device semfeel can generate different patterns of vibration such as ones that flow from right to left or from top to bottom to help the user interact with mobile device through two user studies we show that users can distinguish ten different patterns including linear patterns and circular pattern at approximately accuracy and that semfeel supports accurate eyes free interactions
bundle adjustment constitutes large nonlinear least squares problem that is often solved as the last step of feature based structure and motion estimation computer vision algorithms to obtain optimal estimates due to the very large number of parameters involved general purpose least squares algorithm incurs high computational and memory storage costs when applied to bundle adjustment fortunately the lack of interaction among certain subgroups of parameters results in the corresponding jacobian being sparse fact that can be exploited to achieve considerable computational savings this article presents sba publicly available software package for realizing generic bundle adjustment with high efficiency and flexibility regarding parameterization
electronic markets dispute resolution and negotiation protocols are three types of application domains that can be viewed as open agent societies key characteristics of such societies are agent heterogeneity conflicting individual goals and unpredictable behavior members of such societies may fail to or even choose not to conform to the norms governing their interactions it has been argued that systems of this type should have formal declarative verifiable and meaningful semantics we present theoretical and computational framework being developed for the executable specification of open agent societies we adopt an external perspective and view societies as instances of normative systems in this article we demonstrate how the framework can be applied to specifying and executing contract net protocol the specification is formalized in two action languages the language and the event calculus and executed using respective software implementations the causal calculator and the society visualizer we evaluate our executable specification in the light of the presented case study discussing the strengths and weaknesses of the employed action languages for the specification of open agent societies
in this paper we present design and an analysis of customized crossbar schedulers for reconfigurable on chip crossbar networks in order to alleviate the scalability problem in conventional crossbar network we propose adaptive schedulers on customized crossbar ports specifically we present scheduler with weighted round robin arbitration scheme that takes into account the bandwidth requirements of specific applications in addition we propose the sharing of schedulers among multiple ports in order to reduce the implementation cost the proposed schedulers arbitrate on demand at design time interconnects and adhere to the link bandwidth requirements where physical topologies are identical to logical topologies for given applications considering conventional crossbar schedulers as reference designs comparative performance analysis is conducted the hardware scheduler modules are parameterized experiments with practical applications show that our custom schedulers occupy up to less area and maintain better performance compared to the reference schedulers
software testing helps ensure not only that the software under development has been implemented correctly but also that further development does not break it if developers introduce new defects into the software these should be detected as early and inexpensively as possible in the development cycle to help optimize which tests are run at what points in the design cycle we have built echelon test prioritization system which prioritizes the application’s given set of tests based on what changes have been made to the program echelon builds on the previous work on test prioritization and proposes practical binary code based approach that scales well to large systems echelon utilizes binary matching system that can accurately compute the differences at basic block granularity between two versions of the program in binary form echelon utilizes fast simple and intuitive heuristic that works well in practice to compute what tests will cover the affected basic blocks in the program echelon orders the given tests to maximally cover the affected program so that defects are likely to be found quickly and inexpensively although the primary focus in echelon is on program changes other criteria can be added in computing the priorities echelon is part of test effectiveness infrastructure that runs under the windows environment it is currently being integrated into the microsoft software development process echelon has been tested on large microsoft product binaries the results show that echelon is effective in ordering tests based on changes between two program versions
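Echelon's binary-matching machinery is beyond a short sketch, but the prioritization step it feeds — greedily ordering tests so that the changed basic blocks are covered as early as possible — can be illustrated with hypothetical coverage data (the heuristic below is a generic greedy ordering, not necessarily Echelon's exact one).

# Sketch: greedy test prioritization by coverage of changed basic blocks.
# Tests covering the most not-yet-covered impacted blocks are scheduled first.
def prioritize(tests, coverage, changed_blocks):
    """tests: list of test ids; coverage: test id -> set of blocks it covers."""
    remaining = set(changed_blocks)
    order = []
    pool = list(tests)
    while pool:
        # pick the test covering the most still-uncovered changed blocks
        best = max(pool, key=lambda t: len(coverage[t] & remaining))
        if not coverage[best] & remaining:
            order.extend(pool)          # nothing left to gain; keep original order
            break
        order.append(best)
        remaining -= coverage[best]
        pool.remove(best)
    return order

coverage = {                      # hypothetical block-level coverage data
    "t1": {"b1", "b2"},
    "t2": {"b3"},
    "t3": {"b2", "b3", "b4"},
}
changed = {"b2", "b3", "b4"}      # blocks reported as modified by binary matching
print(prioritize(["t1", "t2", "t3"], coverage, changed))   # ['t3', 't1', 't2']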
we introduce new dissimilarity function for ranked lists the expected weighted hoeffding distance that has several advantages over current dissimilarity measures for ranked search results first it is easily customized for users who pay varying degrees of attention to websites at different ranks second unlike existing measures such as generalized kendall’s tau it is based on true metric preserving meaningful embeddings when visualization techniques like multi dimensional scaling are applied third our measure can effectively handle partial or missing rank information while retaining probabilistic interpretation finally the measure can be made computationally tractable and we give highly efficient algorithm for computing it we then apply our new metric with multi dimensional scaling to visualize and explore relationships between the result sets from different search engines showing how the weighted hoeffding distance can distinguish important differences in search engine behavior that are not apparent with other rank distance metrics such visualizations are highly effective at summarizing and analyzing insights on which search engines to use what search strategies users can employ and how search results evolve over time we demonstrate our techniques using collection of popular search engines representative set of queries and frequently used query manipulation methods
despite major advances in formal verification simulation continues to be the dominant workhorse for functional verification abstraction guided simulation has long been promising framework for leveraging the power of formal techniques to help simulation reach difficult target states assertion violations or coverage targets model checking smaller abstracted version of the design avoids complexity blow up yet computes approximate distances from any state of the actual design to the target these approximate distances are used during random simulation to guide the simulator unfortunately the performance of previous work has been unreliable sometimes great sometimes poor the problem is the guidance strategy because the abstract distances are approximate greedy strategy will get stuck in local optima previous works expanded the search horizon to try to avoid dead ends we explore such heuristics and find that they tend to perform poorly adding too much search overhead for limited ability to escape dead ends based on these experiments we propose new guidance strategy which pursues more global search and is better able to avoid getting stuck experiments show that our new guidance strategy is highly effective in most cases that are hard for random simulation and beyond the capacity of formal verification
we present an experimental software repository system that provides organization storage management and access facilities for reusable software components the system intended as part of an applications development environment supports the representation of information about requirements designs and implementations of software and offers facilities for visual presentation of the software objects this article details the features and architecture of the repository system the technical challenges and the choices made for the system development along with usage scenario that illustrates its functionality the system has been developed and evaluated within the context of the ithaca project technology integration software engineering project sponsored by the european communities through the esprit program aimed at developing an integrated reuse centered application development and support environment based on object oriented techniques
we propose maximal figure of merit mfom learning approach for robust classifier design which directly optimizes performance metrics of interest for different target classifiers the proposed approach embedding the decision functions of classifiers and performance metrics into an overall training objective learns the parameters of classifiers in decision feedback manner to effectively take into account both positive and negative training samples thereby reducing the required size of positive training data it has three desirable properties it is performance metric oriented learning the optimized metric is consistent in both training and evaluation sets and it is more robust and less sensitive to data variation and can handle insufficient training data scenarios we evaluate it on text categorization task using the reuters dataset training an based binary tree classifier using mfom we observed significantly improved performance and enhanced robustness compared to the baseline and svm especially for categories with insufficient training samples the generality for designing other metrics based classifiers is also demonstrated by comparing precision recall and based classifiers the results clearly show consistency of performance between the training and evaluation stages for each classifier and mfom optimizes the chosen metric
in this paper we investigate data driven synthesis approach to constructing geometric surface models we provide methods with which user can search large database of meshes to find parts of interest cut the desired parts out of the meshes with intelligent scissoring and composite them together in different ways to form new objects the main benefit of this approach is that it is both easy to learn and able to produce highly detailed geometric models the conceptual design for new models comes from the user while the geometric details come from examples in the database the focus of the paper is on the main research issues motivated by the proposed approach interactive segmentation of surfaces shape based search to find models with parts matching query and composition of parts to form new models we provide new research contributions on all three topics and incorporate them into prototype modeling system experience with our prototype system indicates that it allows untrained users to create interesting and detailed models
software test environments stes provide means of automating the test process and integrating testing tools to support required testing capabilities across the test process specifically stes may support test planning test management test measurement test failure analysis test development and test execution the software architecture of an ste describes the allocation of the environment’s functions to specific implementation structures an ste’s architecture can facilitate or impede modifications such as changes to processing algorithms data representation or functionality performance and reusability are also subject to architecturally imposed constraints evaluation of an ste’s architecture can provide insight into modifiability extensibility portability and reusability of the ste this paper proposes reference architecture for stes its analytical value is demonstrated by using saam software architectural analysis method to compare three software test environments protest ii prolog test environment version ii taos testing with analysis and oracle support and cite convex integrated test environment
random testing is not only useful testing technique in itself but also plays core role in many other testing methods hence any significant improvement to random testing has an impact throughout the software testing community recently adaptive random testing art was proposed as an effective alternative to random testing this paper presents synthesis of the most important research results related to art in the course of our research and through further reflection we have realised how the techniques and concepts of art can be applied in much broader context which we present here we believe such ideas can be applied in variety of areas of software testing and even beyond software testing amongst these ideas we particularly note the fundamental role of diversity in test case selection strategies we hope this paper serves to provoke further discussions and investigations of these ideas
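ART comes in several variants; the sketch below is the fixed-size-candidate-set form often used to explain it: each new test input is the candidate farthest, by some distance over the input domain, from all previously executed tests. It is one concrete realisation of the diversity idea the abstract highlights, with a hypothetical two-dimensional numeric input domain.

# Sketch: fixed-size-candidate-set adaptive random testing (FSCS-ART)
# over a 2-D numeric input domain.
import random
import math

def art_tests(n_tests, n_candidates=10, domain=(0.0, 100.0)):
    executed = [(random.uniform(*domain), random.uniform(*domain))]
    while len(executed) < n_tests:
        candidates = [(random.uniform(*domain), random.uniform(*domain))
                      for _ in range(n_candidates)]
        # choose the candidate whose nearest executed test is farthest away
        best = max(candidates,
                   key=lambda c: min(math.dist(c, e) for e in executed))
        executed.append(best)
    return executed

random.seed(1)
for test_input in art_tests(5):
    print(tuple(round(v, 1) for v in test_input))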
virtualization is often used in cloud computing platforms for its several advantages in efficiently managing resources however virtualization raises certain additional challenges and one of them is lack of power metering for virtual machines vms power management requirements in modern data centers have led to most new servers providing power usage measurement in hardware and alternate solutions exist for older servers using circuit and outlet level measurements however vm power cannot be measured purely in hardware we present solution for vm power metering named joulemeter we build power models to infer power consumption from resource usage at runtime and identify the challenges that arise when applying such models for vm power metering we show how existing instrumentation in server hardware and hypervisors can be used to build the required power models on real platforms with low error our approach is designed to operate with extremely low runtime overhead while providing practically useful accuracy we illustrate the use of the proposed metering capability for vm power capping technique to reduce power provisioning costs in data centers experiments are performed on server traces from several thousand production servers hosting microsoft’s real world applications such as windows live messenger the results show that not only does vm power metering allow virtualized data centers to achieve the same savings that non virtualized data centers achieved through physical server power capping but also that it enables further savings in provisioning costs with virtualization
this paper proposes novel framework for music content indexing and retrieval the music structure information ie timing harmony and music region content is represented by the layers of the music structure pyramid we begin by extracting this layered structure information we analyze the rhythm of the music and then segment the signal proportional to the inter beat intervals thus the timing information is incorporated in the segmentation process which we call beat space segmentation to describe harmony events we propose two layer hierarchical approach to model the music chords we also model the progression of instrumental and vocal content as acoustic events after information extraction we propose vector space modeling approach which uses these events as the indexing terms in query by example music retrieval query is represented by vector of the statistics of the gram events we then propose two effective retrieval models hard indexing scheme and soft indexing scheme experiments show that the vector space modeling is effective in representing the layered music information achieving top retrieval accuracy using sec music clips as the queries the soft indexing outperforms hard indexing in general
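The acoustic modelling is out of scope here; the sketch only shows the indexing idea the abstract describes — treating n-grams of discretized events as terms in a vector-space model and ranking clips against the query's n-gram statistics. The event sequences, the bigram choice, and the cosine ranking are assumptions for illustration rather than the paper's exact retrieval models.

# Sketch: vector-space indexing of music clips by n-grams of discrete events
# (e.g. chord or acoustic-event labels), ranked by cosine similarity.
import math
from collections import Counter

def ngram_vector(events, n=2):
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical per-clip event sequences produced by the extraction front end
clips = {
    "clip1": ["C", "G", "Am", "F", "C", "G"],
    "clip2": ["Dm", "G", "C", "C", "Dm", "G"],
    "clip3": ["C", "G", "Am", "F", "Am", "F"],
}
index = {name: ngram_vector(seq) for name, seq in clips.items()}

query = ngram_vector(["C", "G", "Am", "F"])          # query-by-example snippet
ranking = sorted(index, key=lambda name: -cosine(query, index[name]))
print(ranking)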
this paper reports on an exploratory experimental study of the relationships between physical movement and desired visual information in the performance of video mediated collaborative tasks in the real world by geographically distributed groups twenty three pairs of participants one helper and one worker linked only by video and audio participated in lego construction task in one of three experimental conditions fixed scene camera helper controlled pan tilt zoom camera and dedicated operator controlled camera worker motion was tracked in space for all three conditions as were all camera movements results suggest performance benefits for the operator controlled condition and the relationships between camera position movement and worker action are explored to generate preliminary theoretical and design implications
collaborative filtering is concerned with making recommendations about items to users most formulations of the problem are specifically designed for predicting user ratings assuming past data of explicit user ratings is available however in practice we may only have implicit evidence of user preference and furthermore better view of the task is of generating top list of items that the user is most likely to like in this regard we argue that collaborative filtering can be directly cast as relevance ranking problem we begin with the classic probability ranking principle of information retrieval proposing probabilistic item ranking framework in the framework we derive two different ranking models showing that despite their common origin different factorizations reflect two distinctive ways to approach item ranking for the model estimations we limit our discussions to implicit user preference data and adopt an approximation method introduced in the classic text retrieval model ie the okapi bm formula to effectively decouple frequency counts and presence absence counts in the preference data furthermore we extend the basic formula by proposing the bayesian inference to estimate the probability of relevance and non relevance which largely alleviates the data sparsity problem apart from theoretical contribution our experiments on real data sets demonstrate that the proposed methods perform significantly better than other strong baselines
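The paper's probabilistic derivation is not reproduced here; the sketch only shows the BM25-flavoured ingredient the abstract mentions — saturating raw interaction counts and discounting ubiquitous items before ranking unseen items for a user from implicit feedback. The data, the overlap similarity, and the particular weighting constants are assumptions.

# Sketch: ranking items for a user from implicit feedback, with BM25-style
# saturation of frequency counts (f -> f*(k+1)/(f+k)) and an IDF-like weight
# that discounts items almost everybody interacts with.
import math
from collections import defaultdict

# hypothetical implicit feedback: user -> {item: interaction count}
feedback = {
    "u1": {"a": 5, "b": 1},
    "u2": {"a": 4, "c": 2},
    "u3": {"b": 3, "c": 1, "d": 2},
}

def bm25_weight(count, k=1.2):
    return count * (k + 1.0) / (count + k)

n_users = len(feedback)
item_users = defaultdict(set)
for u, items in feedback.items():
    for i in items:
        item_users[i].add(u)

def score_items(target_user):
    """Score unseen items by summing weighted evidence from overlapping users."""
    seen = feedback[target_user]
    scores = defaultdict(float)
    for other, items in feedback.items():
        if other == target_user:
            continue
        sim = len(seen.keys() & items.keys())   # simple overlap similarity
        if sim == 0:
            continue
        for item, count in items.items():
            if item in seen:
                continue
            n_i = len(item_users[item])
            idf = math.log(1.0 + (n_users - n_i + 0.5) / (n_i + 0.5))
            scores[item] += sim * bm25_weight(count) * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(score_items("u1"))   # ranked list of unseen items for u1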
agent technology is emerging as new software paradigm in the areas of distributed computing the use of multiple agents is common technique in agent based systems in distributed agent systems it is often required for two agents to communicate securely over public network authentication and key exchange are fundamental for establishing secure communication channels over public insecure networks password based protocols for authenticated key exchange are designed to work even when user authentication is done via the use of passwords drawn from small known set of values there have been many protocols proposed over the years for password authenticated key exchange in the three party scenario in which two agents attempt to establish secret key interacting with one same authentication server however little has been done for password authenticated key exchange in the more general and realistic four party setting where two clients or two agents trying to establish secret key are registered with different authentication servers in this paper we propose new protocol designed carefully for four party password authenticated key exchange that requires each agent only to remember password shared with its authentication server
the increased use of scanned geometry for applications in computer graphics and hardcopy output has highlighted the need for general robust algorithms for reconstruction of watertight models given partial polygonal meshes as input we present an algorithm for hole filling based on decomposition of space into atomic volumes which are each determined to be either completely inside or completely outside the model by defining the output model as the union of interior atomic volumes we guarantee that the resulting mesh is watertight individual volumes are labeled as inside or outside by computing minimum cost cut of graph representation of the atomic volume structure patching all the holes simultaneously in globally sensitive manner user control is provided to select between multiple topologically distinct yet still valid ways of filling holes finally we use an octree decomposition of space to provide output sensitive computation time we demonstrate the ability of our algorithm to fill complex non planar holes in large meshes obtained from scanning devices
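Building the atomic-volume decomposition requires real geometry processing, but the labeling step — deciding inside or outside for each atomic volume with a minimum s-t cut — can be sketched on a toy adjacency graph. The volumes, capacities, and terminal attachments below are hypothetical stand-ins for the costs the paper derives from the mesh.

# Sketch: labeling atomic volumes as inside/outside with a minimum s-t cut.
# Nodes are hypothetical atomic volumes; edge capacities stand in for the
# cost of separating two adjacent volumes (e.g. exposed hole area).
import networkx as nx

G = nx.DiGraph()

def add_sym(u, v, cap):
    G.add_edge(u, v, capacity=cap)
    G.add_edge(v, u, capacity=cap)

# terminal nodes: OUT (known exterior) and IN (known interior)
add_sym("OUT", "v1", 10.0)   # v1 touches the known exterior
add_sym("IN", "v4", 10.0)    # v4 is sealed off by existing mesh
add_sym("v1", "v2", 1.0)     # cheap to cut: likely hole boundary
add_sym("v2", "v3", 4.0)
add_sym("v3", "v4", 6.0)

cut_value, (outside, inside) = nx.minimum_cut(G, "OUT", "IN")
print("cut cost:", cut_value)
print("outside volumes:", sorted(outside - {"OUT"}))
print("inside volumes:", sorted(inside - {"IN"}))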
this paper re examines the soft error effect caused by cosmic radiation in sub nm technologies considering the impact of process variation number of statistical natures of transient faults are found more sophisticated than their static ones we apply the state of the art statistical learning algorithm to tackle the complexity of these natures and build compact yet accurate generation and propagation models for transient fault distributions statistical analysis framework for soft error rate ser is also proposed on the basis of these models experimental results show that the proposed framework can obtain improved ser estimation compared to the static approaches
static analysis may cause state space explosion problem in this paper we explore differential equation model that makes the task of verifying software architecture properties much more efficient we demonstrate how ordinary differential equations can be used to verify application specific properties of an architecture description without hitting this problem an architecture behavior can be modeled by group of ordinary differential equations containing some control parameters where the control parameters are used to represent deterministic nondeterministic choices each equation describes the state change by checking the conditions associated with the control parameters we can check whether an equation model is feasible after solving feasible equation model based on the solution behavior and the state variable representation we can analyze properties of the architecture wright architecture description of the gas station problem has been used as the example to illustrate our method all of the equations have been computed with matlab tool
the paper presents new approach to database preferences queries where preferences are represented in possibilistic logic manner using symbolic weights the symbolic weights may be processed without assessing their precise value which leaves the freedom for the user to not specify any priority among the preferences the user may also enforce partial ordering between them if necessary the approach can be related to the processing of fuzzy queries whose components are conditionally weighted in terms of importance here importance levels are symbolically processed and refinements of both pareto ordering and minimum ordering are used the representational power of the proposed setting is stressed while the approach is compared with database best operator like methods and with the cp net approach developed in artificial intelligence the paper also provides structured and rather broad overview of the different lines of research in the literature dealing with the handling of preferences in database queries
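Leaving the possibilistic weighting aside, the Pareto-ordering ingredient the abstract builds on can be illustrated with a plain skyline (Pareto-optimal) filter: keep the answers not dominated on every preference criterion, which is the natural reading when symbolic weights are left unordered. The tuples and satisfaction degrees below are hypothetical.

# Sketch: Pareto-optimal (skyline) filter over query answers, where each
# preference criterion yields a satisfaction degree in [0, 1] and no
# priority among criteria is assumed.
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(answers):
    """answers: list of (tuple_id, satisfaction_vector)."""
    front = []
    for tid, sat in answers:
        if not any(dominates(other, sat) for _, other in answers if other != sat):
            front.append((tid, sat))
    return front

# hypothetical satisfaction degrees for two preferences (e.g. price, distance)
answers = [("h1", (0.9, 0.4)), ("h2", (0.7, 0.7)), ("h3", (0.6, 0.3))]
print(pareto_front(answers))   # h3 is dominated by h2; h1 and h2 are incomparable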
information fusion can assist in the development of sensor network applications by merging capabilities raw data and decisions from multiple sensors through distributed and collaborative integration algorithms in this paper we introduce multi layered middleware driven multi agent interoperable architecture for distributed sensor networks that bridges the gap between the programmable application layer consisting of software agents and the physical layer consisting of sensor nodes we adopt an energy efficient fault tolerant approach for collaborative information processing among multiple sensor nodes using mobile agent based computing model in this model the sink base station deploys mobile agents that migrate from node to node following certain itinerary either pre determined or determined on the fly and fuse the information data locally at each node this way the intelligence is distributed throughout the network edge and communication cost is reduced to make the sensor network energy efficient we evaluate the performance of our mobile agent based approach as well as that of the traditional client server based computing model vis a vis energy consumption and execution time through both analytical study and simulation we draw important conclusions based on our findings finally we consider collaborative target classification application supported by our architectural framework to illustrate the efficacy of the mobile agent based computing model
in the past several attempts have been made to deal with the state space explosion problem by equipping depth first search dfs algorithm with state cache or by avoiding collision detection thereby keeping the state hash table at fixed size most of these attempts are tailored specifically for dfs and are often not guaranteed to terminate and or to exhaustively visit all the states in this paper we propose general framework of hierarchical caches which can also be used by breadth first searches bfs our method based on an adequate sampling of bfs levels during the traversal guarantees that the bfs terminates and traverses all transitions of the state space we define several static or adaptive configurations of hierarchical caches and we study experimentally their effectiveness on benchmark examples of state spaces and on several communication protocols using generic implementation of the cache framework that we developed within the cadp toolbox
this paper introduces novel examplar based inpainting algorithm through investigating the sparsity of natural image patches two novel concepts of sparsity at the patch level are proposed for modeling the patch priority and patch representation which are two crucial steps for patch propagation in the examplar based inpainting approach first patch structure sparsity is designed to measure the confidence of patch located at the image structure eg the edge or corner by the sparseness of its nonzero similarities to the neighboring patches the patch with larger structure sparsity will be assigned higher priority for further inpainting second it is assumed that the patch to be filled can be represented by the sparse linear combination of candidate patches under the local patch consistency constraint in framework of sparse representation compared with the traditional examplar based inpainting approach structure sparsity enables better discrimination of structure and texture and the patch sparse representation forces the newly inpainted regions to be sharp and consistent with the surrounding textures experiments on synthetic and natural images show the advantages of the proposed approach
we define type system for cows formalism for specifying and combining services while modelling their dynamic behaviour our types permit to express policies constraining data exchanges in terms of sets of service partner names attachable to each single datum service programmers explicitly write only the annotations necessary to specify the wanted policies for communicable data while type inference system statically derives the minimal additional annotations that ensure consistency of services initial configuration then the language dynamic semantics only performs very simple checks to authorize or block communication we prove that the type system and the operational semantics are sound as consequence we have the following data protection property services always comply with the policies regulating the exchange of data among interacting services we illustrate our approach through simplified but realistic scenario for service based electronic marketplace
planar parameterization of surfaces is useful for many applications the applicability of the parameterization depends on how well it preserves the surface geometry angles distances and areas for most surface meshes there is no parameterization which preserves the geometry exactly the distortion usually increases with the rise in surface complexity for highly complicated surfaces the distortion can become so strong as to make the parameterization unusable for application’s purposes solution is to partition the surface or introduce cuts in way which will reduce the distortion this article presents new method for cutting seams in mesh surfaces the addition of seams reduces the surface complexity and hence reduces the metric distortion produced by the parameterization seams often introduce additional constraints on the application for which the parameterization is used hence their length should be minimal the presented method minimizes the seam length while reducing the parameterization distortion to acceptable level
the proxy signature schemes allow proxy signers to sign messages on behalf of an original signer company or an organization such schemes have been suggested for use in number of applications particularly in distributed computing where delegation of rights is quite common most of proxy signature schemes previously proposed in the literature are based on discrete logarithms or from pairings in shao proposed the first two proxy signature schemes based on rsa though being very efficient they have no formal security proofs in this paper we provide formal security proofs under strong security model in the random oracle model after minor modification
runtime monitoring support serves as foundation for the important tasks of providing security performing debugging and improving performance of applications runtime monitoring typically requires the maintenance of meta data associated with each of the application’s original memory location which are held in corresponding shadow memory locations each original memory instruction omi is then accompanied by additional shadow memory instructions smis that manipulate the meta data associated with the memory location often the smis associated with omis are symmetric in that original stores loads are accompanied by shadow stores loads unfortunately existing shadow memory implementations need thread serialization to ensure that omis and smis are executed atomically naturally this is not an efficient approach especially in the now ubiquitous multiprocessors in this paper we present an efficient shadow memory implementation that handles symmetric shadow instructions by coupling the coherency of shadow memory with the coherency of the main memory we ensure that the smis execute atomically with their corresponding omis we also couple the allocation of application memory pages with its associated shadow pages for enabling fast translation of original addresses into corresponding shadow memory addresses our experiments show that the overheads of run time monitoring tasks are significantly reduced in comparison to previous software implementations
novel behavioral detection framework is proposed to detect mobile worms viruses and trojans instead of the signature based solutions currently available for use in mobile devices first we propose an efficient representation of malware behaviors based on key observation that the logical ordering of an application’s actions over time often reveals the malicious intent even when each action alone may appear harmless then we generate database of malicious behavior signatures by studying more than distinct families of mobile viruses and worms targeting the symbian os the most widely deployed handset os and their variants next we propose two stage mapping technique that constructs these signatures at run time from the monitored system events and api calls in symbian os we discriminate the malicious behavior of malware from the normal behavior of applications by training classifier based on support vector machines svms our evaluation on both simulated and real world malware samples indicates that behavioral detection can identify current mobile viruses and worms with more than accuracy we also find that the time and resource overheads of constructing the behavior signatures from low level api calls are acceptably low for their deployment in mobile devices
as organizations scale up their collective knowledge increases and the potential for serendipitous collaboration between members grows dramatically however finding people with the right expertise or interests becomes much more difficult semi structured social media such as blogs forums and bookmarking present viable platform for collaboration if enough people participate and if shared content is easily findable within the trusted confines of an organization users can trade anonymity for rich identity that carries information about their role location and position in its hierarchy this paper describes watercooler tool that aggregates shared internal social media and cross references it with an organization’s directory we deployed watercooler in large global enterprise and present the results of preliminary user study despite the lack of complete social networking affordances we find that watercooler changed users perceptions of their workplace made them feel more connected to each other and the company and redistributed users attention outside their own business groups
mutual exclusion is fundamental distributed coordination problem shared memory mutual exclusion research focuses on local spin algorithms and uses the remote memory references rmrs metric recent proof established an log lower bound on the number of rmrs incurred by processes as they enter and exit the critical section matching an upper bound by yang and anderson both these bounds apply for algorithms that only use read and write operations the lower bound of only holds for deterministic algorithms however the question of whether randomized mutual exclusion algorithms using reads and writes only can achieve sub logarithmic expected rmr complexity remained open this paper answers this question in the affirmative we present two strong adversary randomized local spin mutual exclusion algorithms in both algorithms processes incur log log log expected rmrs per passage in every execution our first algorithm has sub optimal worst case rmr complexity of log log log our second algorithm is variant of the first that can be combined with deterministic algorithm such as to obtain log worst case rmr complexity the combined algorithm thus achieves sub logarithmic expected rmr complexity while maintaining optimal worst case rmr complexity our upper bounds apply for both the cache coherent cc and the distributed shared memory dsm models
the autofeed system automatically extracts data from semistructured web sites previously researchers have developed two types of supervised learning approaches for extracting web data methods that create precise site specific extraction rules and methods that learn less precise site independent extraction rules in either case significant training is required autofeed follows third more ambitious approach in which unsupervised learning is used to analyze sites and discover their structure our method relies on set of heterogeneous experts each of which is capable of identifying certain types of generic structure each expert represents its discoveries as hints based on these hints our system clusters the pages and identifies semi structured data that can be extracted to identify good clustering we use probabilistic model of the hint generation process this paper summarizes our formulation of the fully automatic web extraction problem our clustering approach and our results on set of experiments
we present new parallel sparse lu factorization algorithm and code the algorithm uses column preordering partial pivoting unsymmetric pattern multifrontal approach our baseline sequential algorithm is based on umfpack but is somewhat simpler and is often somewhat faster than umfpack version our parallel algorithm is designed for shared memory machines with small or moderate number of processors we tested it on up to processors we experimentally compare our algorithm with superlumt an existing shared memory sparse lu factorization with partial pivoting superlumt scales better than our new algorithm but our algorithm is more reliable and is usually faster more specifically on matrices that are costly to factor our algorithm is usually faster on up to processors and is usually faster on and we were not able to run superlumt on the main contribution of this article is showing that the column preordering partial pivoting unsymmetric pattern multifrontal approach developed as sequential algorithm by davis in several recent versions of umfpack can be effectively parallelized
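The parallel multifrontal algorithm itself is far beyond a few lines; the snippet below only shows the consumer-side pattern such factorizations serve: factor a sparse matrix once with a column-preordering, partial-pivoting LU (here SciPy's sequential SuperLU wrapper, used purely as a stand-in) and reuse the factors for many right-hand sides. The matrix is a synthetic example.

# Sketch: reusing a sparse LU factorization for many solves.
# scipy.sparse.linalg.splu wraps sequential SuperLU and is only a stand-in
# for the parallel factorization discussed in the abstract.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# a sparse, diagonally dominant test matrix
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csc")

lu = spla.splu(A)                 # factor once (column permutation + partial pivoting)
for k in range(3):                # ...then solve cheaply for several right-hand sides
    b = np.zeros(n)
    b[k] = 1.0
    x = lu.solve(b)
    print(f"rhs {k}: residual = {np.linalg.norm(A @ x - b):.2e}")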
many high performance dsp processors employ multi bank on chip memory to improve performance and energy consumption this architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel however making effective use of multi bank memory remains difficult considering the combined effect of performance and energy requirement this paper studies the scheduling and assignment problem about how to minimize the total energy consumption while satisfying the timing constraint with heterogeneous multi bank memory for applications with loop an algorithm tasl type assignment and scheduling for loops is proposed the algorithm uses bank type assignment with the consideration of variable partition to find the best configuration for both memory and alu the experimental results show that the average improvement on energy saving is significant by using tasl
method is presented for tracking objects as they transform rigidly in space within sparse range image sequence the method operates in discrete space and exploits the coherence across image frames that results from the relationship between known bounds on the object’s velocity and the sensor frame rate these motion bounds allow the interframe transformation space to be reduced to reasonable and indeed tiny size comprising only tens or hundreds of possible states the tracking problem is in this way cast into classification framework effectively trading off localization precision for runtime efficiency and robustness the method has been implemented and tested extensively on variety of freeform objects within sparse range data stream comprising only few hundred points per image it has been shown to compare favorably against continuous domain iterative closest point icp tracking methods performing both more efficiently and more robustly hybrid method has also been implemented that executes small number of icp iterations following the initial discrete classification phase this hybrid method is both more efficient than the icp alone and more robust than either the discrete classification method or the icp separately
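As a toy version of tracking-as-classification, the sketch below discretizes the bounded interframe motion into a small set of candidate rigid 2-D transforms, scores each against the new sparse points with nearest-neighbour distances, and picks the best. The actual method operates on 3-D range data and is considerably more refined; the object, bounds, and grid resolution here are assumptions.

# Sketch: tracking by classification over a discretized transform space (2-D toy).
# Velocity bounds keep the candidate set tiny; each candidate transform is scored
# by how well it maps the model points onto the newly observed sparse points.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
model = rng.uniform(-1.0, 1.0, size=(60, 2))        # hypothetical object points

def apply(points, angle, tx, ty):
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + np.array([tx, ty])

# simulated next frame: the object moved by a small, bounded transform
true_motion = (0.10, 0.05, -0.02)
observed = apply(model, *true_motion)[::3]           # sparse observation

# candidate transforms: a small grid within the known motion bounds
angles = np.linspace(-0.15, 0.15, 7)
shifts = np.linspace(-0.1, 0.1, 5)
tree = cKDTree(observed)

best, best_cost = None, np.inf
for a in angles:
    for tx in shifts:
        for ty in shifts:
            cost = tree.query(apply(model, a, tx, ty))[0].mean()
            if cost < best_cost:
                best, best_cost = (a, tx, ty), cost

print("estimated interframe transform:", tuple(round(v, 2) for v in best))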
compiler for multi threaded object oriented programs needs information about the sharing of objects for variety of reasons to implement optimizations to issue warnings to add instrumentation to detect access violations that occur at runtime an object use graph oug statically captures accesses from different threads to objects an oug extends the heap shape graph hsg which is compile time abstraction for runtime objects nodes and their reference relations edges an oug specifies for specific node in the hsg partial order of events relevant to the corresponding runtime object relevant events include read and write access object escape thread start and join ougs have been implemented in java compiler initial experience shows that ougs are effective to identify object accesses that potentially conflict at runtime and isolate accesses that never cause problem at runtime the capabilities of ougs are compared with an advanced program analysis that has been used for lock elimination for the set of benchmarks investigated here ougs report only fraction of shared objects as conflicting and reduce the number of compile time reports in terms of allocation sites of conflicting objects by average for benchmarks of up to kloc the time taken to construct ougs is with one exception in the order of seconds the information collected in the oug has been used to instrument java programs with checks for object races ougs provide precise information about object sharing and static protection so runtime instrumentation that checks those cases that cannot be disambiguated at compile time is sparse and the total runtime overhead of checking for object races is only average
in this paper we present metadata based recommendation algorithms addressing two scenarios within social desktop communities recommendation of resources from the co worker’s desktop and recommendation of metadata for enriching the own annotation layer together with the algorithms we present first evaluation results as well as empirical evaluations showing that metadata based recommendations can be used in such distributed social desktop communities
sensor networks consist of devices that make various observations in the environment and communicate these observations to central processing unit from where users can access collected data in this regard users interpretation of collected data highly depends on the reported location of the sensor making an observation gps is an established technology to enable precise location information when deployed in open field yet resource constraints and size issues prohibit its use in small sensor nodes that are designed to be cost efficient instead locations are estimated using number of approaches to date however the focus of such estimations was based on individual accuracy of sensor locations in isolation to the complete network in this paper we discuss problems with such approaches in terms of data management and analysis we propose novel location estimation algorithm called quad quadrant based localization to enable representative topology information in particular quad makes use of relative distances from landmark points to determine the quadrant node resides in and refines estimations according to neighbour provided information quad makes use of uncertainty levels in estimates to further assist data analysis our experiment results suggest significant improvements in individual accuracy prior to optional refinements drastic improvements are achieved in the overall topology using refinements
long scenes can be imaged by mosaicing multiple images from cameras scanning the scene we address the case of video camera scanning scene while moving in long path eg scanning city street from driving car or scanning terrain from low flying aircraft robust approach to this task is presented which is applied successfully to sequences having thousands of frames even when using hand held camera examples are given on few challenging sequences the proposed system consists of two components motion and depth computation ii mosaic rendering in the first part direct method is presented for computing motion and dense depth robustness of motion computation has been increased by limiting the motion model for the scanning camera an iterative graph cuts approach with planar labels and flexible similarity measure allows the computation of dense depth for the entire sequence in the second part new minimal aspect distortion mad mosaicing uses depth to minimize the geometrical distortions of long panoramic images in addition to mad mosaicing interactive visualization using slits is also demonstrated
in mixed focus collaboration users continuously switch between individual and group work we have developed new two person interaction mechanism coupled tele desktops that is arguably not biased towards individual or group work we evaluate this mechanism and the general idea of mixed focus collaboration using new quantitative framework consisting of set of precisely defined coupling modes determining the extent of individual and group work and the times spent in durations of and number of transitions among these modes we describe new visualization scheme for compactly displaying these metrics in an individual collaborative session we use this framework to characterize about forty six person hours of use of coupled tele desktops most of which involved collaborative use of ui builder our results include quantitative motivation for coupled tele desktops and several new quantitative observations and quantification of several earlier qualitative observations regarding mixed focus collaboration
as global interconnection bus is critical for chip performance in deep submicron technology reducing bus routing vias will facilitate the lithography and give bus routing higher yield and also higher performance in this paper we present floorplan revising method to minimize the number of reducible routing vias with controllable loss on the chip area and wirelength therefore it is easy to make proper tradeoff between via reduction and revising loss experiments show that our method reaches and reduction of routing vias which is close to and runs fast besides our revising is friendly to all third party floorplanners which can be applied to any existing floorplans to reduce vias it is also scalable to larger benchmarks
the probabilistic roadmap algorithm is leading heuristic for robot motion planning it is extremely efficient in practice yet its worst case convergence time is unbounded as function of the input’s combinatorial complexity we prove smoothed polynomial upper bound on the number of samples required to produce an accurate probabilistic roadmap and thus on the running time of the algorithm in an environment of simplices this sheds light on its widespread empirical success
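The smoothed-analysis result is purely theoretical; for readers unfamiliar with the algorithm it analyzes, here is a bare-bones probabilistic roadmap in a hypothetical 2-D world with circular obstacles, using uniform sampling, k-nearest connections, and a straight-line local planner.

# Sketch: a minimal probabilistic roadmap (PRM) in a 2-D disc-obstacle world.
import math
import random

obstacles = [((5.0, 5.0), 2.0), ((2.0, 7.0), 1.0)]   # hypothetical (center, radius)

def collision_free(p):
    return all(math.dist(p, c) > r for c, r in obstacles)

def segment_free(p, q, step=0.1):
    n = max(1, int(math.dist(p, q) / step))
    return all(collision_free((p[0] + (q[0] - p[0]) * t / n,
                               p[1] + (q[1] - p[1]) * t / n)) for t in range(n + 1))

def build_prm(n_samples=200, k=8, size=10.0):
    random.seed(2)
    nodes = []
    while len(nodes) < n_samples:                    # sample collision-free configurations
        p = (random.uniform(0, size), random.uniform(0, size))
        if collision_free(p):
            nodes.append(p)
    edge_set = set()
    for p in nodes:                                  # connect each node to its k nearest
        for q in sorted(nodes, key=lambda m: math.dist(p, m))[1:k + 1]:
            if segment_free(p, q):
                edge_set.add(frozenset((p, q)))
    return nodes, edge_set

nodes, edges = build_prm()
print("roadmap nodes:", len(nodes), "edges:", len(edges))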
we present ripples system which enables visualizations around each contact point on touch display and through these visualizations provides feedback to the user about successes and errors of their touch interactions our visualization system is engineered to be overlaid on top of existing applications without requiring the applications to be modified in any way and functions independently of the application’s responses to user input ripples reduces the fundamental problem of ambiguity of feedback when an action results in an unexpected behaviour this ambiguity can be caused by wide variety of sources we describe the ambiguity problem and identify those sources we then define set of visual states and transitions needed to resolve this ambiguity of use to anyone designing touch applications or systems we then present the ripples implementation of visualizations for those states and the results of user study demonstrating user preference for the system and demonstrating its utility in reducing errors
in sensor networks sensors are prone to be captured by attackers because they are usually deployed in unattended surroundings if an adversary compromises sensor he she uses the keys from the compromised sensor to uncover the keys of other sensors therefore it is very important to renew the keys of sensors in periodic or reactive manner even though many group key renewal schemes for distributed key renewals have been proposed they expose some flaws first they employ single group key in cluster so that the compromise of one sensor discloses the group key second they evict the compromised nodes by updating the compromised keys with non compromised keys this eviction scheme is useless when the non compromised keys are exhausted due to the increase of compromised nodes in this paper we propose lightweight key renewal scheme which evicts the compromised nodes clearly by reforming clusters without compromised nodes besides in cluster each member employs pairwise key for communication with its ch cluster head so that our scheme is tolerable against sensor compromise our simulation results prove that the proposed scheme is more tolerable against the compromise of sensors and it is more energy saving than the group key renewal schemes
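The cluster re-formation protocol is specific to the paper; the fragment below only illustrates the pairwise-key ingredient: each member derives a key shared with its cluster head from a pre-loaded node secret, so compromising one sensor does not expose the keys of the others. The HMAC-based derivation and the nonce handling are assumptions, not the paper's construction.

# Sketch: per-member pairwise keys with the cluster head, derived with HMAC.
# The derivation function and key-distribution details are assumptions; the
# point is only that each member ends up with a distinct key, so one
# compromised member does not reveal the traffic of the rest of the cluster.
import hmac
import hashlib
import os

def derive_pairwise_key(node_secret: bytes, cluster_head_id: bytes, round_nonce: bytes) -> bytes:
    """Key shared by one member and its cluster head for the current round."""
    return hmac.new(node_secret, cluster_head_id + round_nonce, hashlib.sha256).digest()

# hypothetical deployment: each node is pre-loaded with its own secret
node_secrets = {f"n{i}": os.urandom(16) for i in range(4)}
round_nonce = os.urandom(8)          # broadcast when the cluster is (re)formed

cluster_head = "n0"
keys = {nid: derive_pairwise_key(sec, cluster_head.encode(), round_nonce)
        for nid, sec in node_secrets.items() if nid != cluster_head}
print({nid: k.hex()[:16] for nid, k in keys.items()})   # distinct key per member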
this article presents dynamic feedback technique that enables computations to adapt dynamically to different execution environments compiler that uses dynamic feedback produces several different versions of the same source code each version uses different optimization policy the generated code alternately performs sampling phases and production phases each sampling phase measures the overhead of each version in the current environment each production phase uses the version with the least overhead in the previous sampling phase the computation periodically resamples to adjust dynamically to changes in the environment we have implemented dynamic feedback in the context of parallelizing compiler for object based programs the generated code uses dynamic feedback to automatically choose the best synchronization optimization policy our experimental results show that the synchronization optimization policy has significant impact on the overall performance of the computation that the best policy varies from program to program that the compiler is unable to statically choose the best policy and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy we have also performed theoretical analysis which provides under certain assumptions guaranteed optimality bound for dynamic feedback relative to hypothetical and unrealizable optimal algorithm that uses the best policy at every point during the execution
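Leaving the compiler machinery aside, the sampling/production alternation the abstract describes has a simple runtime shape, sketched here for an arbitrary set of interchangeable code versions. The versions, phase lengths, and workload are hypothetical; real generated code would of course switch between compiled policies, not Python lambdas.

# Sketch: dynamic feedback at runtime. During a sampling phase every version of
# the computation is timed; the production phase then runs the fastest one, and
# the program periodically resamples to adapt to changes in the environment.
import time

def run_with_dynamic_feedback(versions, work_items, sample_runs=3, production_runs=20):
    it = iter(work_items)
    done = False
    while not done:
        # sampling phase: measure the overhead of each version
        timings = {}
        for name, fn in versions.items():
            start = time.perf_counter()
            for _ in range(sample_runs):
                try:
                    fn(next(it))
                except StopIteration:
                    done = True
            timings[name] = time.perf_counter() - start
        best = min(timings, key=timings.get)
        print("sampling chose:", best)
        # production phase: use the best version until the next sampling phase
        for _ in range(production_runs):
            try:
                versions[best](next(it))
            except StopIteration:
                done = True
                break

# hypothetical interchangeable versions of the same computation
versions = {
    "policy_a": lambda x: sum(i for i in range(x)),   # straightforward loop
    "policy_b": lambda x: x * (x - 1) // 2,           # closed form, same result
}
run_with_dynamic_feedback(versions, [10_000] * 200)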
although cryptographic protocols are typically analyzed in isolation they are used in combinations if protocol when analyzed alone was shown to meet some security goals will it still meet those goals when executed together with second protocol not necessarily for every some undermine its goals we use the strand space authentication test principles to suggest criterion to ensure preserves goals this criterion strengthens previous proposals security goals for are expressed in language $\mathcal{L}$ in classical logic strand spaces provide the models for $\mathcal{L}$ certain homomorphisms among models for $\mathcal{L}$ preserve the truth of the security goals this gives way to extract from counterexample to goal that uses both protocols counterexample using only the first protocol this model theoretic technique using homomorphisms among models to prove results about syntactically defined set of formulas appears to be novel for protocol analysis
high retrieval precision in content based image retrieval can be attained by adopting relevance feedback mechanisms the main difficulties in exploiting relevance information are the gap between user perception of similarity and the similarity computed in the feature space used for the representation of image content and ii the availability of few training data users typically label few dozen of images at present svm are extensively used to learn from relevance feedback due to their capability of effectively tackling the above difficulties however the performances of svm depend on the tuning of number of parameters in this paper different approach based on the nearest neighbor paradigm is proposed each image is ranked according to relevance score depending on nearest neighbor distances this approach is proposed both in low level feature spaces and in dissimilarity spaces where image are represented in terms of their dissimilarities from the set of relevant images reported results show that the proposed approach allows recalling higher percentage of images with respect to svm based techniques
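The exact scoring rule is paraphrased from the abstract rather than quoted, so treat the formula below as an assumption: each database image is scored by its distance to the nearest non-relevant feedback image divided by the sum of its distances to the nearest relevant and nearest non-relevant ones, which rewards images that sit close to the relevant examples. Feature vectors are hypothetical.

# Sketch: nearest-neighbour relevance score for relevance feedback.
# score(x) = d(x, nearest non-relevant) /
#            (d(x, nearest relevant) + d(x, nearest non-relevant))
# Higher scores mean the image sits closer to the relevant examples.
import numpy as np

def nn_relevance_scores(database, relevant, non_relevant):
    database = np.asarray(database, dtype=float)
    rel = np.asarray(relevant, dtype=float)
    non = np.asarray(non_relevant, dtype=float)
    d_rel = np.linalg.norm(database[:, None, :] - rel[None, :, :], axis=2).min(axis=1)
    d_non = np.linalg.norm(database[:, None, :] - non[None, :, :], axis=2).min(axis=1)
    return d_non / (d_rel + d_non + 1e-12)

# hypothetical low-level feature vectors
database = [[0.1, 0.2], [0.8, 0.9], [0.4, 0.4], [0.9, 0.1]]
relevant = [[0.15, 0.25]]            # images the user marked as relevant
non_relevant = [[0.85, 0.85]]        # images the user marked as non-relevant
ranking = np.argsort(-nn_relevance_scores(database, relevant, non_relevant))
print("ranked database indices:", ranking.tolist())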
the state explosion problem remains major hurdle in applying symbolic model checking to large hardware designs state space abstraction having been essential for verifying designs of industrial complexity is typically manual process requiring considerable creativity and insight in this article we present an automatic iterative abstraction refinement methodology that extends symbolic model checking in our method the initial abstract model is generated by an automatic analysis of the control structures in the program to be verified abstract models may admit erroneous or spurious counterexamples we devise new symbolic techniques that analyze such counterexamples and refine the abstract model correspondingly we describe asmv prototype implementation of our methodology in nusmv practical experiments including large fujitsu ip core design with about latches and lines of smv code confirm the effectiveness of our approach
power consumption of the information and communication technology sector ict has recently become key challenge in particular actions to improve energy efficiency of internet service providers isps are becoming imperative to this purpose in this paper we focus on reducing the power consumption of access nodes in an isp network by controlling the amount of service capacity each network device has to offer to meet the actual traffic demand more specifically we propose green router router implementing congestion control technique named active window management awm coupled with new capacity scaling algorithm named energy aware service rate tuner handling earth the awm characteristics allow to detect whether waste of energy is playing out whereas earth is aimed at invoking power management primitives at the hardware level to precisely control the current capacity of access nodes and consequently their power consumption we test the benefits of the awm earth mechanism on realistic scenario results show that the capacity scaling technique can save up to of power consumption while guaranteeing quality of service and traffic demand constraints
if current technology scaling trends hold leakage power dissipation will soon become the dominant source of power consumption caches because of the fact that they account for the largest fraction of on chip transistors in most modern processors are primary candidate for attacking the leakage problem while there has been flurry of research in this area over the last several years major question remains unanswered what is the total potential of existing architectural and circuit techniques to address this important design concern in this paper we explore the limits in which existing circuit and architecture technologies may address this growing problem we first formally propose parameterized model that can determine the optimal leakage savings based on the perfect knowledge of the address trace by carefully applying the sleep and drowsy modes we find that the total leakage power from the instruction cache data cache and unified cache may be reduced to mere and percent respectively of the unoptimized case we further study how such model can be extended to obtain the optimal leakage power savings for different cache configurations
in this paper we show novel method for modelling behaviours of security protocols using networks of communicating automata in order to verify them with sat based bounded model checking these automata correspond to executions of the participants as well as to their knowledge about letters given bounded number of sessions we can verify both correctness or incorrectness of security protocol proving either reachability or unreachability of an undesired state we exemplify all our notions on the needham schroeder public key authentication protocol nspk and show experimental results for checking authentication using the verification tool verics
these lecture notes introduce libraries for datatype generic programming in haskell we introduce three characteristic generic programming libraries lightweight implementation of generics and dynamics extensible and modular generics for the masses and scrap your boilerplate we show how to use them to use and write generic programs in the case studies for the different libraries we introduce generic components of medium sized application which assists student in solving mathematical exercises
analytical models of deterministic routing in common wormhole routed networks such as the hypercube have been widely reported in the literature however all these models have been discussed for the uniform traffic pattern the performance of deterministic routing under other important non uniform communication patterns such as hot spots has often been analyzed through simulation the main advantage of the analytical approach over simulation is that the analytical models can be used to obtain performance results for large systems that are infeasible by simulation due to the excessive computation demands on conventional computers this paper presents the first analytical model of deterministic routing in the hypercube in the presence of hot spot traffic simulation results confirm that the proposed model predicts message latency with reasonable degree of accuracy under different traffic conditions
the emerging pervasive computing services will eventually lead to the establishment of context marketplace where context consumers will be able to obtain the information they require by plethora of context providers in this marketplace several aspects need to be addressed such as support for flexible federation among context stakeholders enabling them to share data when required efficient query handling based on navigational spatial or semantic criteria performance optimization especially when management of mobile physical objects is required and enforcement of privacy and security protection techniques concerning the sensitive context information maintained or traded this paper presents mechanisms that address the aforementioned requirements these mechanisms establish robust spatially enhanced distributed context management framework and have already been designed and carefully implemented
query processing for data streams raises challenges that cannot be directly handled by existing database management systems dbms most related work in the literature mainly focuses on developing techniques for dedicated data stream management system dsms these systems typically either do not permit joining data streams with conventional relations or simply convert relations to streams before joining in this paper we present techniques to process queries that join data streams with relations without treating relations as special streams we focus on typical type of such queries called star streaming joins we process these queries based on the semantics of sliding window joins over data streams and apply load shedding approximation when system resources are limited recently proposed window join approximation based on importance semantics for data streams is extended in this paper to maximize the total importance of the approximation result of star streaming join both online and offline approximation algorithms are discussed our experimental results demonstrate that the presented techniques are quite promising in processing star streaming joins to achieve the maximum total importance of their approximation results
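a toy rendering of a sliding window star streaming join with importance based load shedding, under simplifying assumptions (a single fact stream, dimension relations held as in memory dicts, a tuple count budget standing in for real resource limits); names are illustrative only

    from collections import deque

    def star_streaming_join(stream, dimensions, window_size, importance, budget):
        """Toy star streaming join with importance-based load shedding.

        `stream` yields (timestamp, fact) pairs where fact is a dict;
        `dimensions` maps a join attribute to a dict keyed by join value.
        When the sliding window holds more than `budget` facts, the least
        important one is shed."""
        window = deque()
        for ts, fact in stream:
            while window and window[0][0] <= ts - window_size:
                window.popleft()                       # expire old facts
            window.append((ts, fact))
            if len(window) > budget:                   # load shedding
                survivors = sorted(window, key=lambda p: importance(p[1]))[1:]
                window = deque(sorted(survivors, key=lambda p: p[0]))
            # join the new fact against every dimension relation
            result = dict(fact)
            for attr, table in dimensions.items():
                row = table.get(fact.get(attr))
                if row is None:
                    result = None
                    break
                result.update(row)
            if result is not None:
                yield result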
the problem considered in this article is how does one go about discovering and designing intelligent systems the solution to this problem is considered in the context of what is known as wisdom technology wistech an important computing and reasoning paradigm for intelligent systems rough granular approach to wistech is proposed for developing one of its possible foundations the proposed approach is in sense the result of the evolution of computation models developed in the rasiowa pawlak school we also present long term program for implementation of what is known as wisdom engine the program is defined in the framework of cooperation of many research development institutions and is based on wistech network wn organization
texture analysis is one possible method to detect features in biomedical images during texture analysis texture related information is found by examining local variations in image brightness dimensional haralick texture analysis is method that extracts local variations along space and time dimensions and represents them as collection of fourteen statistical parameters however the application of the haralick method on large time dependent and image datasets is hindered by computation and memory requirements this paper presents parallel implementation of haralick texture analysis on pc clusters we present performance evaluation of our implementation on cluster of pcs our results show that good performance can be achieved for this application via combined use of task and data parallelism
pki has history of very poor support for revocation it is both too expensive and too coarse grained so that private keys which are compromised or otherwise become invalid remain in use long after they should have been revoked this paper considers instant revocation or revocations which take place within second or two a new revocation scheme certificate push revocation cpr is described which can support instant revocation cpr can be hundreds to thousands of times more internet bandwidth efficient than traditional and widely deployed schemes it also achieves significant improvements in cryptographic overheads its costs are essentially independent of the number of queries encouraging widespread use of pki authentication although explored in the context of instant revocation cpr is even more efficient both in relative and absolute terms when used with coarser grain non instant revocations
classic direct mechanisms require full utility revelation from agents which can be very difficult in practical multi attribute settings in this work we study partial revelation within the framework of one shot mechanisms each agent’s type space is partitioned into finite set of partial types and agents should report the partial type within which their full type lies classic result implies that implementation in dominant strategies is impossible in this model we first show that relaxation to bayes nash implementation does not circumvent the problem we then propose class of partial revelation mechanisms that achieve approximate dominant strategy implementation and describe computationally tractable algorithm for myopically optimizing the partitioning of each agent’s type space to reduce manipulability and social welfare loss this allows for the automated design of one shot partial revelation mechanisms with worst case guarantees on both manipulability and efficiency
in this paper we focus on new problem in database integration attribute correspondence identification in multilingual schemas and give rule based method for the problem attribute correspondence identification in multilingual schemas involves the study of integrating schemas designed in various languages and returns the correspondences among those schemas we first analyze the problem through two schemas of large financial corporate of china based on the relationships of the attribute names of the schemas method of name based attribute correspondence identification is proposed and we give computer aided system to deal with the problem according to the method the components and identifying procedure of the method have been discussed in detail in the paper we have implemented prototype and the tool shows its effectiveness in application
the problem of re ranking initial retrieval results exploring the intrinsic structure of documents is widely researched in information retrieval ir and has attracted considerable amount of time and study however one of the drawbacks is that those algorithms treat queries and documents separately furthermore most of the approaches are predominantly built upon graph based methods which may ignore some hidden information among the retrieval set this paper proposes novel document re ranking method based on latent dirichlet allocation lda which exploits the implicit structure of the documents with respect to original queries rather than relying on graph based techniques to identify the internal structure the approach tries to find the latent structure of topics or concepts in the initial retrieval set then we compute the distance between queries and initial retrieval results based on latent semantic information deduced empirical results demonstrate that the method can comfortably achieve significant improvement over various baseline systems
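a small sketch of the re ranking idea, assuming topic distributions for the query and the initially retrieved documents have already been inferred with lda; the symmetric kl distance and the linear combination with the initial scores are assumptions made for illustration, not the paper's exact formulation

    import numpy as np

    def lda_rerank(query_topics, doc_topics, initial_scores, alpha=0.5):
        """Re-rank an initial retrieval list using LDA topic distributions.

        Documents whose topic mixture is close to the query's (small
        symmetric KL divergence) are promoted; `alpha` blends the topic
        score with the original retrieval score."""
        def sym_kl(p, q, eps=1e-12):
            p, q = np.asarray(p) + eps, np.asarray(q) + eps
            return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

        topic_dist = np.array([sym_kl(query_topics, d) for d in doc_topics])
        topic_score = 1.0 / (1.0 + topic_dist)            # closer -> higher
        combined = alpha * np.asarray(initial_scores) + (1 - alpha) * topic_score
        return np.argsort(combined)[::-1]                 # best documents first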
with the growing demand in learning numerous research works have been done to enhance teaching quality in learning environments among these studies researchers have indicated that adaptive learning is critical requirement for promoting the learning performance of students adaptive learning provides adaptive learning materials learning strategies and or courses according to student’s learning style hence the first step for achieving adaptive learning environments is to identify students learning styles this paper proposes learning style classification mechanism to classify and then identify students learning styles the proposed mechanism improves nearest neighbor nn classification and combines it with genetic algorithms ga to demonstrate the viability of the proposed mechanism the proposed mechanism is implemented on an open learning management system the learning behavioral features of elementary school students are collected and then classified by the proposed mechanism the experimental results indicate that the proposed classification mechanism can effectively classify and identify students learning styles
this paper proposes novel web page segmentation method for mobile browsing aiming to break web page into visually and semantically coherent units fitted to the limited screen size of mobile devices we intend to simulate humans perceptive process guided by four general laws in gestalt theory namely proximity similarity closure and simplicity we also present an application of adapting web pages to mobile terminals based on segmentation experimental results show that the proposed method is efficient and can greatly improve segmentation accuracy
to exploit the similarity information hidden in the hyperlink structure of the web this paper introduces algorithms scalable to graphs with billions of vertices on distributed architecture the similarity of multistep neighborhoods of vertices are numerically evaluated by similarity functions including simrank recursive refinement of cocitation and psimrank novel variant with better theoretical characteristics our methods are presented in general framework of monte carlo similarity search algorithms that precompute an index database of random fingerprints and at query time similarities are estimated from the fingerprints we justify our approximation method by asymptotic worst case lower bounds we show that there is significant gap between exact and approximate approaches and suggest that the exact computation in general is infeasible for large scale inputs we were the first to evaluate simrank on real web data on the stanford webbase graph of pages the quality of the methods increased significantly in each refinement step until step four
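the fingerprint idea can be illustrated as follows: precompute short reversed random walks per vertex and estimate simrank from the first step at which two walks meet; the sketch below uses independent walks per vertex, whereas the paper's estimators couple the walks, so treat it only as a simplified monte carlo variant

    import random

    def fingerprints(reverse_graph, walk_length, num_walks, seed=0):
        """Precompute one reversed random walk per vertex per fingerprint set.

        `reverse_graph[v]` lists the in-neighbours of v; returns a list of
        dicts mapping vertex -> walk (tuple of vertices)."""
        rng = random.Random(seed)
        index = []
        for _ in range(num_walks):
            walks = {}
            for v in reverse_graph:
                walk, cur = [v], v
                for _ in range(walk_length):
                    preds = reverse_graph.get(cur, [])
                    if not preds:
                        break
                    cur = rng.choice(preds)
                    walk.append(cur)
                walks[v] = tuple(walk)
            index.append(walks)
        return index

    def estimate_simrank(index, u, v, c=0.6):
        """Estimate SimRank(u, v) as the average of c**(first meeting step)."""
        total = 0.0
        for walks in index:
            wu, wv = walks[u], walks[v]
            for step in range(1, min(len(wu), len(wv))):
                if wu[step] == wv[step]:
                    total += c ** step
                    break
        return total / len(index)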
lists multisets and sets are well known data structures whose usefulness is widely recognized in various areas of computer science they have been analyzed from an axiomatic point of view with parametric approach in dovier et al where the relevant unification algorithms have been developed in this article we extend these results considering more general constraints namely equality and membership constraints and their negative counterparts
the most intuitive memory model for shared memory multithreaded programming is sequential consistency sc but it disallows the use of many compiler and hardware optimizations thereby impacting performance data race free drf models such as the proposed memory model guarantee sc execution for data race free programs but these models provide no guarantee at all for racy programs compromising the safety and debuggability of such programs to address the safety issue the java memory model which is also based on the drf model provides weak semantics for racy executions however this semantics is subtle and complex making it difficult for programmers to reason about their programs and for compiler writers to ensure the correctness of compiler optimizations we present the drfx memory model which is simple for programmers to understand and use while still supporting many common optimizations we introduce memory model mm exception which can be signaled to halt execution if program executes without throwing this exception then drfx guarantees that the execution is sc if program throws an mm exception during an execution then drfx guarantees that the program has data race we observe that sc violations can be detected in hardware through lightweight form of conflict detection furthermore our model safely allows aggressive compiler and hardware optimizations within compiler designated program regions we formalize our memory model prove several properties about this model describe compiler and hardware design suitable for drfx and evaluate the performance overhead due to our compiler and hardware requirements
this paper presents cgal kernel for algorithms manipulating spheres circles and circular arcs the paper makes three contributions first the mathematics underlying two non trivial predicates are presented second the design of the kernel concept is developed and the connexion between the mathematics and this design is established in particular we show how two different frameworks can be combined one for the general setting and one dedicated to the case where all the objects handled lie on reference sphere finally an assessment about the efficacy of the spherical kernel is made through the calculation of the exact arrangement of circles on sphere on average while computing arrangements with few degeneracies on sample molecular models it is shown that certifying the result incurs modest factor of two with respect to calculations using plain double arithmetic
how do we conceptualize social awareness and what support is needed to develop and maintain social awareness in flexible work settings the paper begins by arguing the relevance of designing for social awareness in flexible work it points out how social awareness is suspended in the field of tension that exists between the ephemerality and continuity of social encounters exploring ways to construct identity through relationships by means of social encounters notably those that are accidental and unforced we probe into this issue through design research in particular we present three exploratory prototyping processes in an open office setting examining the concepts of shared calendar personal panels and ambient awareness cues field studies conducted in parallel have contributed to conceptual deconstruction of cscw concepts resulting in focus on cues to relatedness to belonging and to care analyzing these three prototypes in their microcosmic usage setting results in specific recommendations for the three types of applications with respect to social awareness the experiences indicate that the metaphors shared mirror and breadcrumbs are promising foundations on which to base further design we present these analyses and suggest that the metaphors work because of their ability to map experiences from the physical space into conceptual experiences we conclude that social awareness in flexible work must be constructed indirectly presenting itself as an option rather than as consequence of being able to overhear and oversee
best practices currently state that the security requirements and security architectures of distributed software intensive systems should be based on security risk assessments which have been designed from security patterns are implemented in security standards and are tool supported throughout their development life cycle web service based information systems uphold inter enterprise relations through the internet and this technology has been revealed as the reference solution with which to implement service oriented architectures in this paper we present the application of the process for web service security pwssec developed by the authors to real web service based case study the manner in which security in inter organizational information systems can be analyzed designed and implemented by applying pwssec which combines risk analysis and management along with security architecture and standard based approach is also shown we additionally present tool built to provide support to the pwssec process
uncertain data is inherent in few important applications such as environmental surveillance and mobile object tracking top queries also known as ranking queries are often natural and useful in analyzing uncertain data in those applications in this paper we study the problem of answering probabilistic threshold top queries on uncertain data which computes uncertain records taking probability of at least to be in the top list where is user specified probability threshold we present an efficient exact algorithm fast sampling algorithm and poisson approximation based algorithm an empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold top queries and the efficiency of our methods
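as a concrete reference point, a monte carlo version of a probabilistic threshold top k query over independent uncertain tuples is shown below; the paper's contribution is an efficient exact algorithm plus sampling and poisson approximation algorithms, so this is only the shortest possible illustration of the query semantics

    import random

    def prob_topk_threshold(records, k, p_threshold, num_samples=10000, seed=0):
        """Monte Carlo evaluation of a probabilistic threshold top-k query.

        `records` is a list of (id, score, probability) tuples under a
        tuple-independence assumption; a record qualifies if it appears in
        the top-k by score with probability >= p_threshold over sampled
        possible worlds."""
        rng = random.Random(seed)
        hits = {rid: 0 for rid, _, _ in records}
        for _ in range(num_samples):
            world = [(rid, score) for rid, score, prob in records
                     if rng.random() < prob]
            world.sort(key=lambda x: -x[1])
            for rid, _ in world[:k]:
                hits[rid] += 1
        return [rid for rid, count in hits.items()
                if count / num_samples >= p_threshold]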
column oriented database system architectures invite re evaluation of how and when data in databases is compressed storing data in column oriented fashion greatly increases the similarity of adjacent records on disk and thus opportunities for compression the ability to compress many adjacent tuples at once lowers the per tuple cost of compression both in terms of cpu and space overheads in this paper we discuss how we extended store column oriented dbms with compression sub system we show how compression schemes not traditionally used in row oriented dbmss can be applied to column oriented systems we then evaluate set of compression schemes and show that the best scheme depends not only on the properties of the data but also on the nature of the query workload
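run length encoding is one of the column oriented schemes such a compression sub system typically evaluates, since sorted columns produce long runs of identical values; the sketch below is generic and not tied to the system described above

    def rle_encode(column):
        """Run-length encode a column; most effective when the column is
        stored in sorted order, as is common in a column store."""
        runs = []
        for value in column:
            if runs and runs[-1][0] == value:
                runs[-1][1] += 1
            else:
                runs.append([value, 1])
        return [(v, n) for v, n in runs]

    def rle_decode(runs):
        """Expand (value, count) runs back into the original column."""
        for value, count in runs:
            for _ in range(count):
                yield value

    # example: rle_encode(["us", "us", "us", "uk", "uk"]) -> [("us", 3), ("uk", 2)]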
in this paper we present method for fast energy minimization of virtual garments our method is based upon the idea of multiresolution particle system when garments are approximately positioned around virtual character their spring energy may be high which will cause instability or at least long execution time of the simulation an energy minimization algorithm is needed if fixed resolution is used it will require many iterations to reduce its energy even though the complexity of each iteration is with high resolution mass spring system this minimization process can take whole day the hierarchical method presented in this paper is used to reduce significantly the execution time of the minimization process the garments are firstly discretized in several resolutions once the lowest resolution particles system is minimized in short time higher resolution model is derived then minimized the procedure is iterated up to the highest resolution but at this stage the energy to minimize is already much lower so that minimization takes reasonable time
security typed languages enforce secrecy or integrity policies by type checking this paper investigates continuation passing style cps as means of proving that such languages enforce noninterference and as first step towards understanding their compilation we present low level secure calculus with higher order imperative features and linear continuations linear continuations impose stack discipline on the control flow of programs this additional structure in the type system lets us establish strong information flow security property called noninterference we prove that our cps target language enjoys the noninterference property and we show how to translate secure high level programs to this low level language this noninterference proof is the first of its kind for language with higher order functions and state
relational schemas consisting of relation schemes key dependencies and key based inclusion dependencies referential integrity constraints are considered schemas of this form are said to be entity relationship eer convertible if they can be associated with an eer schema procedure that determines whether relational schema is eer convertible is developed normal form is proposed for relational schemas representing eer object structures for eer convertible relational schemas the corresponding normalization procedure is presented the procedures can be used for analyzing the semantics of existing relational databases and for converting relational database schemas into object oriented database schemas
this study proposes hybrid fuzzy time series model with two advanced methods cumulative probability distribution approach cpda and rough set rule induction to forecast stock markets to improve forecasting accuracy three refining processes of fuzzy time series are provided in the proposed model using cpda to discretize the observations in training datasets based on the characteristics of data distribution generating rules fuzzy logical relationships by rough set algorithm and producing forecasting results based on rule support values from rough set algorithm to verify the forecasting performance of the proposed model in detail two empirical stock markets taiex and nyse are used as evaluating databases two other methodologies proposed by chen and yu are used as comparison models and two different evaluation methods moving windows are used the proposed model shows greatly improved performance in stock market forecasting compared to other fuzzy time series models
we have improved the performance of the mach operating system by redesigning its internal thread and interprocess communication facilities to use continuations as the basis for control transfer compared to previous versions of mach our new system consumes less space per thread cross address space remote procedure calls execute faster exception handling runs over faster in addition to improving system performance we have used continuations to generalize many control transfer optimizations that are common to operating systems and have recast those optimizations in terms of single implementation methodology this paper describes our experiences with using continuations in the mach operating system
we propose an in network data centric storage indcs scheme for answering ad hoc queries in sensor networks previously proposed in network storage ins schemes suffered from storage hot spots that are formed if either the sensors locations are not uniformly distributed over the coverage area or the distribution of sensor readings is not uniform over the range of possible reading values our tree based data centric storage kddcs scheme maintains the invariant that the storage of events is distributed reasonably uniformly among the sensors kddcs is composed of set of distributed algorithms whose running time is within poly log factor of the diameter of the network the number of messages any sensor has to send as well as the bits in those messages is poly logarithmic in the number of sensors load balancing in kddcs is based on defining and distributively solving theoretical problem that we call the weighted split median problem in addition to analytical bounds on kddcs individual algorithms we provide experimental evidence of our scheme’s general efficiency as well as its ability to avoid the formation of storage hot spots of various sizes unlike all previous indcs schemes
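the load balancing primitive named above can be pictured with a centralized weighted median; the paper solves a distributed weighted split median, so the following is only a sequential illustration of what that primitive computes

    def weighted_median(points):
        """Return a value m such that at most half of the total weight lies
        strictly below m and at most half lies strictly above it.

        `points` is a list of (value, weight) pairs; this centralized form
        only illustrates the primitive, not the paper's distributed
        algorithm."""
        points = sorted(points)
        total = sum(w for _, w in points)
        acc = 0.0
        for value, weight in points:
            acc += weight
            if acc >= total / 2:
                return value
        raise ValueError("empty input")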
one task in which inaccurate measurements are often used is location discovery process where the nodes in network determine their locations we have focused on location discovery as the primary target of our study since many sensor network tasks are dependent on location information we demonstrate the benefits of location error analysis for system software and applications in wireless sensor networks the technical highlight of our work is statistically validated parameterized model of location errors that can be used to evaluate the impact of location discovery algorithm on subsequent tasks we prove that the distribution of location error can be approximated with family of weibull distributions then we show that while performing the location discovery task the nodes in network can estimate the parameters of the distribution finally we describe how applications can use the estimated statistical parameters to estimate the confidence intervals for their results ii organize resource consumption to achieve optimal results in presence of estimated magnitude of error
research on anticipatory behavior in adaptive learning systems continues to gain more recognition and appreciation in various research disciplines this book provides an overarching view on anticipatory mechanisms in cognition learning and behavior it connects the knowledge from cognitive psychology neuroscience and linguistics with that of artificial intelligence machine learning cognitive robotics and others this introduction offers an overview over the contributions in this volume highlighting their interconnections and interrelations from an anticipatory behavior perspective we first clarify the main foci of anticipatory behavior research next we present taxonomy of how anticipatory mechanisms may be beneficially applied in cognitive systems with relation to the taxonomy we then give an overview over the book contributions the first chapters provide surveys on currently known anticipatory brain mechanisms anticipatory mechanisms in increasingly complex natural languages and an intriguing challenge for artificial cognitive systems next conceptualizations of anticipatory processes inspired by cognitive mechanisms are provided the conceptualizations lead to individual predictive challenges in vision and processing of event correlations over time next anticipatory mechanisms in individual decision making and behavioral execution are studied finally the book offers systems and conceptualizations of anticipatory processes related to social interaction
the automatic creation of models of urban spaces has become very active field of research this has been inspired by recent applications in the location awareness on the internet as demonstrated in maps.live.com and similar websites the level of automation in creating city models has increased considerably and has benefited from an increase in the redundancy of the source imagery namely digital aerial photography in this paper we argue that the next big step forward is to replace photographic texture by an interpretation of what the texture describes and to achieve this fully automatically one calls the result semantic knowledge for example we want to know that certain part of the image is car person building tree shrub window door instead of just collection of points or triangles with superimposed photographic texture we investigate object recognition methods to make this next big step we demonstrate an early result of using the on line variant of boosting algorithm to indeed detect cars in aerial digital imagery to satisfactory and useful level of completeness and we show that we can use this semantic knowledge to produce improved orthophotos we expect that also the models will be improved by the knowledge of cars
automatic image annotation has been hot pursuit among multimedia researchers of late modest performance guarantees and limited adaptability often restrict its applicability to real world settings we propose tagging over time to push the technology toward real world applicability of particular interest are online systems that receive user provided images and feedback over time with user focus possibly changing and evolving the framework consists of principled probabilistic approach to meta learning which acts as go between for black box annotation system and the users inspired by inductive transfer the approach attempts to harness available information including the black box model’s performance the image representations and the wordnet ontology being computationally lightweight this meta learner efficiently re trains over time to improve and or adapt to changes the black box annotation model is not required to be re trained allowing computationally intensive algorithms to be used we experiment with standard image datasets and real world data streams using two existing annotation systems as black boxes both batch and online annotation settings are experimented with it is observed that the addition of this meta learning layer produces much improved results that outperform best known results for the online setting the approach produces progressively better annotation with time significantly outperforming the black box as well as the static form of the meta learner on real world data
connections is file system search tool that combines traditional content based search with context information gathered from user activity by tracing file system calls connections can identify temporal relationships between files and use them to expand and reorder traditional content search results doing so improves both recall reducing false positives and precision reducing false negatives for example connections improves the average recall from to and precision from to on the first ten results when averaged across all recall levels connections improves precision from to connections provides these benefits with only modest increases in average query time seconds indexing time seconds daily and index size under of the user’s data set
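a rough sketch of the two steps the abstract describes, building temporal relationships from a (time sorted) file access trace and using them to expand content based hits; the data structures and the simple append based re ordering are assumptions for illustration, not the tool's implementation

    from collections import defaultdict

    def build_relation_graph(trace, window_seconds):
        """Link files accessed within a short time window of each other.

        `trace` is a time-sorted list of (timestamp, filename) pairs."""
        graph = defaultdict(set)
        for i, (t_i, f_i) in enumerate(trace):
            for t_j, f_j in trace[i + 1:]:
                if t_j - t_i > window_seconds:
                    break
                if f_i != f_j:
                    graph[f_i].add(f_j)
                    graph[f_j].add(f_i)
        return graph

    def expand_results(content_hits, graph):
        """Append files temporally related to the content-based hits,
        keeping the original hits first (a crude stand-in for re-ranking)."""
        expanded = list(content_hits)
        seen = set(content_hits)
        for hit in content_hits:
            for related in graph.get(hit, ()):
                if related not in seen:
                    expanded.append(related)
                    seen.add(related)
        return expanded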
this paper examines an under explored area of digital photography namely photo display using examples from study undertaken with six families we examine photo displays on mantelpieces sideboards and hallway walls and in home offices using the examples we make case relating to the material properties of photo displays suggesting that families routinely and often unintentionally express something of themselves in the ways they display their photos the very ideas of family and home we suggest are tightly interwoven with the methods of photo display this position is used to offer up some early design considerations for digital photo displays we outline some basic properties that might be designed around and contend that the ideas of family and home impose constraints on which of these properties might be best combined and exploited we also present three design concepts to illustrate how we have been developing this position
laser pointer can be powerful tool for robot control however in the past their use in the field of robotics has been limited to simple target designation without exploring their potential as versatile input devices this paper proposes to create laser pointer based user interface for giving various instructions to robot by applying stroke gesture recognition to the laser’s trajectory through this interface the user can draw stroke gestures using laser pointer to specify target objects and commands for the robot to execute accordingly this system which includes lasso and dwelling gestures for object selection stroke gestures for robot operation and push button commands for movement cancellation has been refined from its prototype form through several user study evaluations our results suggest that laser pointers can be effective not only for target designation but also for specifying command and target location for robot to perform
this paper describes the concepts used in the implementation of dbdsgn an experimental physical design tool for relational databases developed at the ibm san jose research laboratory given workload for system consisting of set of sql statements and their execution frequencies dbdsgn suggests physical configurations for efficient performance each configuration consists of set of indices and an ordering for each table workload statements are evaluated only for atomic configurations of indices which have only one index per table costs for any configuration can be obtained from those of the atomic configurations dbdsgn uses information supplied by the system optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations the tool finds efficient solutions to the index selection problem if we assume the cost estimates supplied by the optimizer are the actual execution costs it finds the optimal solution optionally heuristics can be used to reduce execution time the approach taken by dbdsgn in solving the index selection problem for multiple table statements significantly reduces the complexity of the problem dbdsgn’s principles were used in the relational design tool rdt an ibm product based on dbdsgn which performs design for sql ds relational system based on system system actually uses dbdsgn’s suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from system using new sql statement the explain statement this illustrates how system can export model of its internal assumptions and behavior so that other systems such as tools can share this model
hair models for computer graphics consist of many curves representing individual hair fibers in current practice these curves are generated by ad hoc random processes and in close up views their arrangement appears plainly different from real hair to begin improving this situation this paper presents new method for measuring the detailed arrangement of fibers in hair assembly many macrophotographs with shallow depth of field are taken of sample of hair sweeping the plane of focus through the hair’s volume the shallow depth of field helps isolate the fibers and reduces occlusion several sweeps are performed with the hair at different orientations resulting in multiple observations of most of the clearly visible fibers the images are filtered to detect the fibers and the resulting feature data from all images is used jointly in hair growing process to construct smooth curves along the observed fibers finally additional hairs are generated to fill in the unseen volume inside the hair the method is demonstrated on both straight and wavy hair with results suitable for realistic close up renderings these models provide the first views we know of into the arrangement of hair fibers in real hair assemblies
theory for the design of deadlock free adaptive routing algorithms for wormhole networks was proposed in this theory supplies the sufficient conditions for an adaptive routing algorithm to be deadlock free even when there are cyclic dependencies between channels also two design methodologies were proposed multicast communication refers to the delivery of the same message from one source node to an arbitrary number of destination nodes tree like routing scheme is not suitable for hardware supported multicast in wormhole networks because it produces many headers for each message drastically increasing the probability of message being blocked path based multicast routing model was proposed in for multicomputers with mesh and hypercube topologies in this model messages are not replicated at intermediate nodes this paper develops the theoretical background for the design of deadlock free adaptive multicast routing algorithms this theory is valid for wormhole networks using the path based routing model it is also valid when messages with single destination and multiple destinations are mixed together the new channel dependencies produced by messages with several destinations are studied also two theorems are proposed developing conditions to verify that an adaptive multicast routing algorithm is deadlock free even when there are cyclic dependencies between channels as an example the multicast routing algorithms presented in are extended so that they can take advantage of the alternative paths offered by the network
hybrid cellular and wireless ad hoc network architectures are currently considered to be promising alternative solutions to the standalone cellular or ad hoc network architectures in this paper we propose an efficient hash table based node identification htni method using which bandwidth for various flows can be reserved in such network environments bandwidth reservation depends on the type of the traffic and its priorities we define bandwidth reservation factor for use in such hybrid network environments we propose cross layer based architecture for bandwidth reservation to maintain quality of service qos we use priority re allocation method for flows which starve for long time the proposed method is useful for finding the position of nodes with low communication cost
this paper analyzes the main sources of power consumption in networks on chip noc based systems analytical power models of global interconnection links are studied at different levels of abstraction additionally power measurement experiments are performed for different types of routers based on this study we propose new topology based methodology to optimize the power consumption of complex noc based systems at early design phases the efficiency of the proposed methodology is verified through case study of an mpeg video application experimental results show promising improvement in power consumption average number of hops and number of global links compared to the best known related work
conditional functional dependencies cfds have recently been proposed as useful integrity constraint to summarize data semantics and identify data inconsistencies cfd augments functional dependency fd with pattern tableau that defines the context ie the subset of tuples in which the underlying fd holds while many aspects of cfds have been studied including static analysis and detecting and repairing violations there has not been prior work on generating pattern tableaux which is critical to realize the full potential of cfds this paper is the first to formally characterize good pattern tableau based on naturally desirable properties of support confidence and parsimony we show that the problem of generating an optimal tableau for given fd is np complete but can be approximated in polynomial time via greedy algorithm for large data sets we propose an on demand algorithm providing the same approximation bound that outperforms the basic greedy algorithm in running time by an order of magnitude for ordered attributes we propose the range tableau as generalization of pattern tableau which can achieve even more parsimony the effectiveness and efficiency of our techniques are experimentally demonstrated on real data
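a greedy flavour of tableau generation can be sketched as follows, using constant patterns drawn from the data, a confidence filter and marginal support ordering; this mirrors the support confidence parsimony trade off in the abstract but is not the paper's exact algorithm or its approximation guarantee

    from collections import defaultdict

    def greedy_tableau(tuples, lhs, rhs, confidence_min, support_goal):
        """Greedy construction of a pattern tableau for the FD lhs -> rhs.

        Candidates are constant patterns (one per distinct lhs value); a
        pattern is kept only if the FD holds on its matching tuples with
        confidence >= confidence_min, and patterns are added by marginal
        support until support_goal tuples are covered."""
        groups = defaultdict(list)
        for t in tuples:
            groups[tuple(t[a] for a in lhs)].append(t)

        candidates = []
        for key, group in groups.items():
            counts = defaultdict(int)
            for t in group:
                counts[t[rhs]] += 1
            confidence = max(counts.values()) / len(group)
            if confidence >= confidence_min:
                candidates.append((key, group))

        tableau, covered = [], 0
        for key, group in sorted(candidates, key=lambda kg: -len(kg[1])):
            tableau.append(dict(zip(lhs, key)))
            covered += len(group)
            if covered >= support_goal:
                break
        return tableau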
mobile code and mobile agents are generally associated with security vulnerabilities rather than with increased security this paper describes an approach in which mobile agents are confined in order to allow content providers to retain control over how their data is exported while allowing agents to search the full content of this data locally this approach offers increased control and security compared to the traditional client server technologies commonly used for building distributed systems we describe new system called mansion which implements confinement of mobile agents and describe number of applications of the confinement model to illustrate its potential
the authors present multi user implementation of bi level process modeling language pml most process modeling formalisms are well suited to one of two levels of specification but not both some concentrate on global control flow and synchronization these languages make it easy to define the broad outline of process but harder to refine the process by expressing constraints and policies on individual tools and data other process formalisms are inherently local it is easy to define constraints but far from straightforward to express control flow combining global and local formalisms is proposed to produce bi level formalisms suitable for expressing and enacting large scale processes the new pml is called the activity structures language the activity structures language integrates global constrained expressions with local rules its implementation on top of the marvel rule based environment is described
many real time database applications arise in electronic financial services safety critical installations and military systems where enforcing security is crucial to the success of the enterprise for real time database systems supporting applications with firm deadlines we investigate here the performance implications in terms of killed transactions of guaranteeing multilevel secrecy in particular we focus on the concurrency control cc aspects of this issue our main contributions are the following first we identify which among the previously proposed real time cc protocols are capable of providing covert channel free security second using detailed simulation model we profile the real time performance of representative set of these secure cc protocols for variety of security classified workloads and system configurations our experiments show that prioritized optimistic cc protocol opt wait provides the best overall performance third we propose and evaluate novel dual cc approach that allows the real time database system to simultaneously use different cc mechanisms for guaranteeing security and for improving real time performance by appropriately choosing these different mechanisms concurrency control protocols that provide even better performance than opt wait are designed finally we propose and evaluate guard an adaptive admission control policy designed to provide fairness with respect to the distribution of killed transactions across security levels our experiments show that guard efficiently provides close to ideal fairness for real time applications that can tolerate covert channel bandwidths of up to one bit per second
an ever increasing amount of information on the web today is available only through search interfaces the users have to type in set of keywords in search form in order to access the pages from certain web sites these pages are often referred to as the hidden web or the deep web since there are no static links to the hidden web pages search engines cannot discover and index such pages and thus do not return them in the results however according to recent studies the content provided by many hidden web sites is often of very high quality and can be extremely valuable to many users in this paper we study how we can build an effective hidden web crawler that can autonomously discover and download pages from the hidden web since the only entry point to hidden web site is query interface the main challenge that hidden web crawler has to face is how to automatically generate meaningful queries to issue to the site here we provide theoretical framework to investigate the query generation problem for the hidden web and we propose effective policies for generating queries automatically our policies proceed iteratively issuing different query in every iteration we experimentally evaluate the effectiveness of these policies on real hidden web sites and our results are very promising for instance in one experiment one of our policies downloaded more than of hidden web site that contains million documents after issuing fewer than queries
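one simple greedy policy in the spirit of the abstract picks, at each iteration, the keyword that looks most productive given the documents downloaded so far; the frequency heuristic and function names below are assumptions, and the paper's policies also account for overlap with already retrieved pages and for query cost

    from collections import Counter

    def next_query(downloaded_docs, issued, stopwords=frozenset()):
        """Pick the next keyword to send to a hidden-web search form.

        Greedy heuristic: choose the term occurring in the most
        already-downloaded documents that has not been issued yet."""
        freq = Counter()
        for doc in downloaded_docs:
            freq.update(set(doc.lower().split()) - stopwords - issued)
        if not freq:
            return None
        term, _ = freq.most_common(1)[0]
        return term

    def crawl(seed_query, search, max_queries):
        """Iteratively query the site; `search(q)` is assumed to return the
        text of the documents matching query q."""
        issued, corpus = set(), []
        query = seed_query
        for _ in range(max_queries):
            if query is None:
                break
            issued.add(query)
            corpus.extend(search(query))
            query = next_query(corpus, issued)
        return corpus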
as the commercial usage of erlang increases so does the need for mature development and testing tools this paper aims to evaluate the available tools with their shortcomings strengths and commercial usability compared to common practices in other languages to identify the needs of erlang developers in this area we published an online survey advertising it in various media the results of this survey and additional research in this field is presented through the comparison of tools and the requirements of the developers the paper identifies paths for future development
caches contribute to much of microprocessor system’s power and energy consumption numerous new cache architectures such as phased pseudo set associative way predicting reactive associative way shutdown way concatenating and highly associative are intended to reduce power and or energy but they all impose some performance overhead we have developed new cache architecture called way halting cache that reduces energy further than previously mentioned architectures while imposing no performance overhead our way halting cache is four way set associative cache that stores the four lowest order bits of all ways tags into fully associative memory which we call the halt tag array the lookup in the halt tag array is done in parallel with and is no slower than the set index decoding the halt tag array predetermines which tags cannot match due to their low order bits mismatching further accesses to ways with known mismatching tags are then halted thus saving power our halt tag array has an additional feature of using static logic only rather than dynamic logic used in highly associative caches making our cache simpler to design with existing tools we provide data from experiments on benchmarks drawn from powerstone mediabench and spec based on our layouts in micron cmos technology on average we obtained percent savings of memory access related energy over conventional four way set associative cache we show that savings are greater than previous methods and nearly twice that of highly associative caches while imposing no performance overhead and only percent cache area overhead
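a behavioural sketch of the way halting lookup, with the halt tag array and the full tag array modelled as plain lists; in hardware the low order bit comparison happens in parallel with set index decoding, which a software model cannot show

    def way_halting_lookup(halt_tags, full_tags, data, set_index, tag, low_bits=4):
        """Sketch of a way-halting lookup for a 4-way set-associative cache.

        `halt_tags[set_index][way]` holds the low-order tag bits kept in the
        small halt tag array; only ways whose low bits match proceed to the
        full tag compare and data read, the rest are halted to save energy."""
        mask = (1 << low_bits) - 1
        live_ways = [w for w, low in enumerate(halt_tags[set_index])
                     if low == (tag & mask)]             # halt the other ways
        for way in live_ways:
            if full_tags[set_index][way] == tag:
                return data[set_index][way]              # hit
        return None                                      # miss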
algorithms for learning minimal separating dfa of two disjoint regular languages have been proposed and adapted for different applications one of the most important applications is learning minimal contextual assumptions in automated compositional verification we propose in this paper an efficient learning algorithm called that learns and generates minimal separating dfa our algorithm has quadratic query complexity in the product of sizes of the minimal dfa’s for the two input languages in contrast the most recent algorithm of gupta et al has an exponential query complexity in the sizes of the two dfa’s moreover experimental results show that our learning algorithm significantly outperforms all existing algorithms on randomly generated example problems we describe how our algorithm can be adapted for automated compositional verification the adapted version is evaluated on the ltsa benchmarks and compared with other automated compositional verification approaches the result shows that our algorithm surpasses others in of benchmark problems
to meet the demands of modern architectures optimizing compilers must incorporate an ever larger number of increasingly complex transformation algorithms since code transformations may often degrade performance or interfere with subsequent transformations compilers employ predictive heuristics to guide optimizations by predicting their effects priori unfortunately the unpredictability of optimization interaction and the irregularity of today’s wide issue machines severely limit the accuracy of these heuristics as result compiler writers may temper high variance optimization with overly conservative heuristics or may exclude these optimizations entirely while this process results in compiler capable of generating good average code quality across the target benchmark set it is at the cost of missed optimization opportunities in individual code segments to replace predictive heuristics researchers have proposed compilers which explore many optimization options selecting the best one posteriori unfortunately these existing iterative compilation techniques are not practical for reasons of compile time and applicability in this paper we present the optimization space exploration ose compiler organization the first practical iterative compilation strategy applicable to optimizations in general purpose compilers instead of replacing predictive heuristics ose uses the compiler writer’s knowledge encoded in the heuristics to select small number of promising optimization alternatives for given code segment compile time is limited by evaluating only these alternatives for hot code segments using general compile time performance estimator an ose enhanced version of intel’s highly tuned aggressively optimizing production compiler for ia yields significant performance improvement more than in some cases on itanium for spec codes
for many applications such as friend finder buddy tracking and location mapping in mobile wireless networks or information sharing and cooperative caching in mobile ad hoc networks it is often important to be able to identify whether given set of moving objects is close to each other or close to given point of demarcation to achieve this continuously available location position information of thousands of mobile objects must be correlated against each other to identify whether fixed set of objects is in certain proximity relation which if satisfied would be signaled to the objects or any interested party in this paper we state this problem referring to it as the location constraint matching problem and present and evaluate solutions for solving it we introduce two types of location constraints to model the proximity relations and experimentally validate that our solution scales to the processing of hundreds of thousands of constraints and moving objects
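the two proximity relations mentioned above can be written down directly; the all pairs evaluation below is the naive check, whereas the paper is about matching hundreds of thousands of such constraints against moving objects scalably

    import math

    def within_mutual_range(positions, radius):
        """First constraint type: are all objects pairwise within `radius`
        of each other? `positions` maps object id -> (x, y)."""
        ids = list(positions)
        return all(math.dist(positions[a], positions[b]) <= radius
                   for i, a in enumerate(ids) for b in ids[i + 1:])

    def within_point_range(positions, point, radius):
        """Second constraint type: are all objects within `radius` of a
        fixed point of demarcation?"""
        return all(math.dist(p, point) <= radius for p in positions.values())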
the success of kernel methods including support vector machines svms strongly depends on the design of appropriate kernels while initially kernels were designed in order to handle fixed length data their extension to unordered variable length data became more than necessary for real pattern recognition problems such as object recognition and bioinformatics we focus in this paper on object recognition using new type of kernel referred to as context dependent objects seen as constellations of local features interest points regions etc are matched by minimizing an energy function mixing fidelity term which measures the quality of feature matching neighborhood criterion which captures the object geometry and regularization term we will show that the fixed point of this energy is context dependent kernel cdk which also satisfies the mercer condition experiments conducted on object recognition show that when plugging our kernel in svms we clearly outperform svms with context free kernels
this chapter surveys the interaction between active rules and integrity constraints first we analyze the static case following the sql standard committee point of view which up to date represents the state of the art then we consider the case of dynamic constraints for which we use temporal logic formalism finally we discuss the applicability limitations and partial solutions found when attempting to ensure the satisfaction of dynamic constraints
the secure release of identity attributes is key enabler for electronic business interactions integrity and confidentiality of identity attributes are two key requirements in such context users should also have the maximum control possible over the release of their identity attributes and should state under which conditions these attributes can be disclosed moreover users should disclose only the identity attributes that are actually required for the transactions at hand in this paper we present an approach for the controlled release of identity attributes that addresses such requirements the approach is based on the integration of trust negotiation and minimal credential disclosure techniques trust negotiation supports selective and incremental disclosure of identity attributes while minimal credential disclosure guarantees that only the attributes necessary to complete the on line interactions are disclosed
input validation is essential for any software that deals with input from its external environment it forms major part of such software that has intensive interaction with its environment through the integration of invariant and empirical properties for implementing input validation this paper proposes novel approach for the automation of the following tasks from processing the source code of program verification of existence of input validation generation of test cases to test and demonstrate all the input validations classification of each validation into the various types defined along with its test case generated all the empirical properties in the theory have been validated statistically based on open source systems our evaluation shows that the proposed approach can help in both testing of input validation features and verifying the adequacy of input control
this paper develops and evaluates new share based scheduling algorithms for differentiated service quality in network services such as network storage servers this form of resource control makes it possible to share server among multiple request flows with probabilistic assurance that each flow receives specified minimum share of server’s capacity to serve requests this assurance is important for safe outsourcing of services to shared utilities such as storage service providers our approach interposes share based request dispatching on the network path between the server and its clients two new scheduling algorithms are designed to run within an intermediary eg network switch where they enforce fair sharing by throttling request flows and reordering requests these algorithms are adaptations of start time fair queuing sfq for servers with configurable degree of internal concurrency third algorithm request windows rw bounds the outstanding requests for each flow independently it is amenable to decentralized implementation but may restrict concurrency under light load the analysis and experimental results show that these new algorithms can enforce shares effectively when the shares are not saturated and that they provide acceptable performance isolation under saturation although the evaluation uses storage service as an example interposed request scheduling is non intrusive and views the server as black box so it is useful for complex services with no internal support for differentiated service quality
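the sfq adaptation can be grounded with the textbook single server form of start time fair queuing shown below; the concurrency aware and request windows variants in the paper extend this basic tagging scheme, so this is a baseline sketch rather than their algorithms

    import heapq

    class StartTimeFairQueue:
        """Minimal start-time fair queuing (SFQ) dispatcher.

        A request of cost c from flow f gets start tag max(v, last_finish[f])
        and finish tag start + c / weight[f]; requests are dispatched in
        increasing start-tag order and virtual time v advances to the start
        tag of the dispatched request."""
        def __init__(self, weights):
            self.weights = weights
            self.last_finish = {f: 0.0 for f in weights}
            self.v = 0.0
            self.queue = []     # heap of (start_tag, seq, flow, request)
            self.seq = 0

        def submit(self, flow, request, cost):
            start = max(self.v, self.last_finish[flow])
            self.last_finish[flow] = start + cost / self.weights[flow]
            heapq.heappush(self.queue, (start, self.seq, flow, request))
            self.seq += 1

        def dispatch(self):
            if not self.queue:
                return None
            start, _, flow, request = heapq.heappop(self.queue)
            self.v = start                  # advance virtual time
            return flow, request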
with today’s ever increasing demands on software software developers must produce software that can be changed without the risk of degrading the software architecture one way to address software changes is to characterize their causes and effects software change characterization mechanism allows developers to characterize the effects of change using different criteria eg the cause of the change the type of change that needs to be made and the part of the system where the change must take place this information then can be used to illustrate the potential impact of the change this paper presents systematic literature review of software architecture change characteristics the results of this systematic review were used to create the software architecture change characterization scheme saccs this report addresses key areas involved in making changes to software architecture saccs’s purpose is to identify the characteristics of software change that will have an impact on the high level software architecture
this paper proposes and implements rigorous method for studying the dynamic behaviour of aspectj programs as part of this methodology several new metrics specific to aspectj programs are proposed and tools for collecting the relevant metrics are presented the major tools consist of modified version of the aspectj compiler that tags bytecode instructions with an indication of the cause of their generation such as particular feature of aspectj and modified version of the dynamic metrics collection tool which is composed of jvmpi based trace generator and an analyzer which propagates tags and computes the proposed metrics this dynamic propagation is essential and thus this paper contributes not only new metrics but also non trivial ways of computing them we furthermore present set of benchmarks that exercise wide range of aspectj’s features and the metrics that we measured on these benchmarks the results provide guidance to aspectj users on how to avoid efficiency pitfalls to aspectj implementors on promising areas for future optimization and to tool builders on ways to understand the runtime behaviour of aspectj
this paper presents chunking based discriminative approach to full parsing we convert the task of full parsing into series of chunking tasks and apply conditional random field crf model to each level of chunking the probability of an entire parse tree is computed as the product of the probabilities of individual chunking results the parsing is performed in bottom up manner and the best derivation is efficiently obtained by using depth first search algorithm experimental results demonstrate that this simple parsing framework produces fast and reasonably accurate parser
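a minimal sketch of the bottom up scheme described above assuming a hypothetical chunk_level callable that stands in for a trained crf chunker and returns a coarser node list together with the probability of that chunking pass so that the probability of the entire derivation is the product of the per level probabilities this is an illustration only and not the paper’s parser

from math import prod

def parse_bottom_up(tokens, chunk_level):
    # the probability of the entire parse is the product of the probabilities of
    # the individual chunking passes, as stated in the abstract above
    nodes, level_probs = list(tokens), []
    while len(nodes) > 1:                 # keep chunking until a single root remains
        nodes, p = chunk_level(nodes)     # one crf-style chunking pass over this level
        level_probs.append(p)
    return nodes[0], prod(level_probs)

# toy stand-in for a trained chunker: pairs adjacent nodes with a fixed confidence
def toy_chunker(nodes):
    return [tuple(nodes[i:i + 2]) for i in range(0, len(nodes), 2)], 0.9

tree, prob = parse_bottom_up(["the", "cat", "sat", "down"], toy_chunker)
print(tree, prob)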
software code caches are increasingly being used to amortize the runtime overhead of tools such as dynamic optimizers simulators and instrumentation engines the additional memory consumed by these caches along with the data structures used to manage them limits the scalability of dynamic tool deployment inter process sharing of code caches significantly improves the ability to efficiently apply code caching tools to many processes simultaneously in this paper we present method of code cache sharing among processes for dynamic tools operating on native applications our design also supports code cache persistence for improved cold code execution in short lived processes or long initialization sequences sharing raises security concerns and we show how to achieve sharing without risk of privilege escalation and with read only code caches and associated data structures we evaluate process shared and persisted code caches implemented in the dynamorio industrial strength dynamic instrumentation engine where we achieve two thirds reduction in both memory usage and startup time
grid information service gis stores information about the resources of distributed computing environment and answers questions about it we are developing rgis gis system based on the relational data model rgis users can write sql queries that search for complex compositions of resources that meet collective requirements executing these queries can be very expensive however in response we introduce the nondeterministic query an extension to the select statement which allows the user and rgis to trade off between the query’s running time and the number of results the results are random sample of the deterministic results which we argue is sufficient and appropriate herein we describe rgis the nondeterministic query extension and its implementation our evaluation shows that meaningful tradeoff between query time and results returned is achievable and that the tradeoff can be used to keep query time largely independent of query complexity
in multiagent semi competitive environments competitions and cooperations can both exist as agents compete with each other they have incentives to lie sometimes agents can increase their utilities by cooperating with each other then they have incentives to tell the truth therefore being receiver an agent needs to decide whether or not to trust the received message to help agents make this decision some of the existing models make use of trust or reputation only which means agents choose to believe or cooperate with the trustworthy senders or senders with high reputation however trustworthy agent may only bring little benefit another way to make the decision is to use expected utility however agents who only believe messages with high expected utilities can be cheated easily to solve the problems this paper introduces the trust model which makes use of trust expected utility and also agents attitudes towards risk to make decisions on the other hand being sender an agent needs to decide whether or not to be honest to help agents make this decision this paper introduces the honesty model which is symmetric to the trust model in addition we introduce an adaptive strategy to the trust honesty model which enables agents to learn from and adapt to the environment simulations show that agents with the adaptive trust honesty model perform much better than agents which only use trust or expected utility to make the decision
we present de novo hierarchical simulation framework for first principles based predictive simulations of materials and their validation on high end parallel supercomputers and geographically distributed clusters in this framework high end chemically reactive and non reactive molecular dynamics md simulations explore wide solution space to discover microscopic mechanisms that govern macroscopic material properties into which highly accurate quantum mechanical qm simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution the framework includes an embedded divide and conquer edc algorithmic framework for the design of linear scaling simulation algorithms with minimal bandwidth complexity and tight error control the edc framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph based event tracking tunable hierarchical cellular decomposition parallelization framework then maps the edc algorithms onto petaflops computers while achieving performance tunability through hierarchy of parameterized cell data computation structures as well as its implementation using hybrid grid remote procedure call message passing threads programming high end computing platforms such as ibm bluegene sgi altix and the nsf teragrid provide an excellent test ground for the framework on these platforms we have achieved unprecedented scales of quantum mechanically accurate and well validated chemically reactive atomistic simulations billion atom fast reactive force field md and million atom trillion grid points quantum mechanical md in the framework of the edc density functional theory on adaptive multigrids in addition to billion atom non reactive space time multiresolution md with the parallel efficiency as high as on dual processor bluegene nodes we have also achieved an automated execution of hierarchical qm md simulation on grid consisting of supercomputer centers in the us and japan in total of processor hours in which the number of processors change dynamically on demand and resources are allocated and migrated dynamically in response to faults furthermore performance portability has been demonstrated on wide range of platforms such as bluegene altix and amd opteron based linux clusters
this paper introduces new image based approach to capturing and modeling highly specular transparent or translucent objects we have built system for automatically acquiring high quality graphical models of objects that are extremely difficult to scan with traditional scanners the system consists of turntables set of cameras and lights and monitors to project colored backdrops we use multi background matting techniques to acquire alpha and environment mattes of the object from multiple viewpoints using the alpha mattes we reconstruct an approximate shape of the object we use the environment mattes to compute high resolution surface reflectance field we also acquire low resolution surface reflectance field using the overhead array of lights both surface reflectance fields are used to relight the objects and to place them into arbitrary environments our system is the first to acquire and render transparent and translucent objects such as glass of beer from arbitrary viewpoints under novel illumination
adopting software product line approach allows companies to realise significant improvements in time to market cost productivity and system quality fundamental problem in software product line engineering is the fact that product line of industrial size can easily incorporate several thousand variation points the scale and interdependencies can lead to variability management and product derivation tasks that are extremely complex to manage this paper investigates visualisation techniques to support and improve the effectiveness of these tasks
plagiarism in universities has always been difficult problem to overcome various tools have been developed over the past few years to help teachers detect plagiarism in students work by being able to categorize the multitude of plagiarism detection tools it is possible to estimate their capabilities advantages and disadvantages in this article we consider modern plagiarism software solutions paying attention mostly to desktop systems intended for plagiarism detection in program code we also estimate the speed and reliability of different plagiarism detection systems that are currently available
we propose novel versatile gesture input device called the mcube to support both desktop and hand held interactions in ubiquitous computing environments it allows for desktop interactions by moving the device on planar surface like computer mouse by lifting the device from the surface users can seamlessly continue handheld interactions in the same application since mcube is single completely wireless device it can be carried and used for different display platforms we explore the use of multiple sensors to support wide range of tasks namely gesture commands multi dimensional manipulation and navigation and tool selections on pie menu this paper addresses the design and implementation of the device with set of design principles and demonstrates its exploratory interaction techniques we also discuss the results of user evaluation and future directions
heterogeneous multiprocessors are increasingly important in the multi core era due to their potential for high performance and energy efficiency in order for software to fully realize this potential the step that maps computations to processing elements must be as automated as possible however the state of the art approach is to rely on the programmer to specify this mapping manually and statically this approach is not only labor intensive but also not adaptable to changes in runtime environments like problem sizes and hardware software configurations in this study we propose adaptive mapping fully automatic technique to map computations to processing elements on cpu gpu machine we have implemented it in our experimental heterogeneous programming system called qilin our results show that by judiciously distributing works over the cpu and gpu automatic adaptive mapping achieves reduction in execution time and reduction in energy consumption than static mappings on average for set of important computation benchmarks we also demonstrate that our technique is able to adapt to changes in the input problem size and system configuration
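a simplified sketch of the general idea of adaptive mapping as described above and not the actual qilin implementation a small fraction of the work is profiled on each processing element and the remaining work is split in proportion to the measured throughput run_on_cpu and run_on_gpu are hypothetical callables and the profiling fraction is a made up parameter

import time

def adaptive_split(items, run_on_cpu, run_on_gpu, profile_fraction=0.05):
    # profile a small sample on each processing element (assumes enough items)
    k = max(1, int(len(items) * profile_fraction))
    t0 = time.perf_counter(); run_on_cpu(items[:k]);      t_cpu = time.perf_counter() - t0
    t0 = time.perf_counter(); run_on_gpu(items[k:2 * k]); t_gpu = time.perf_counter() - t0
    cpu_rate, gpu_rate = k / t_cpu, k / t_gpu
    # split the remaining work in proportion to the measured rates
    rest = items[2 * k:]
    cut = int(len(rest) * cpu_rate / (cpu_rate + gpu_rate))
    run_on_cpu(rest[:cut])
    run_on_gpu(rest[cut:])
    return cut, len(rest) - cut

slow = lambda xs: [time.sleep(0.001) for _ in xs]    # stand-in "cpu" kernel
fast = lambda xs: [time.sleep(0.0002) for _ in xs]   # stand-in "gpu" kernel
print(adaptive_split(list(range(400)), slow, fast))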
formal methods and testing are two important approaches that assist in the development of high quality software while traditionally these approaches have been seen as rivals in recent years new consensus has developed in which they are seen as complementary this article reviews the state of the art regarding ways in which the presence of formal specification can be used to assist testing
modularity is an important principle of software design it is directly associated with software understandability maintainability and reusability however as software systems evolve old code segments are modified removed and new code segments are added the original modular design of the program might be distorted one of the factors that can affect the modularity of the system is the introduction of code clones portion of source code that is identical or similar to another in the software evolution process this paper applies clone detection techniques to study the modularity of linux the code clones are first identified using an automatic tool then each clone set is analyzed by domain expert to classify it into one of the three clone concern categories singular concern crosscutting concern and partial concern different approaches to dealing with these different categories of code clones are suggested in order to improve modularity
recent advances in polyhedral compilation technology have made it feasible to automatically transform affine sequential loop nests for tiled parallel execution on multi core processors however for multi statement input programs with statements of different dimensionalities such as cholesky or lu decomposition the parallel tiled code generated by existing automatic parallelization approaches may suffer from significant load imbalance resulting in poor scalability on multi core systems in this paper we develop completely automatic parallelization approach for transforming input affine sequential codes into efficient parallel codes that can be executed on multi core system in load balanced manner in our approach we employ compile time technique that enables dynamic extraction of inter tile dependences at run time and dynamic scheduling of the parallel tiles on the processor cores for improved scalable execution our approach obviates the need for programmer intervention and re writing of existing algorithms for efficient parallel execution on multi cores we demonstrate the usefulness of our approach through comparisons using linear algebra computations lu and cholesky decomposition
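a generic illustration of run time scheduling over an inter tile dependence graph and not the compiler generated code described above tiles become ready as their predecessor tiles complete and are handed to a thread pool deps and work are hypothetical inputs

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def schedule_tiles(deps, work, workers=4):
    # deps maps each tile to the set of tiles it depends on; work(tile) is the tile body
    remaining = {t: len(d) for t, d in deps.items()}    # count of unmet dependences
    dependents = defaultdict(list)
    for t, d in deps.items():
        for p in d:
            dependents[p].append(t)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(work, t): t for t, c in remaining.items() if c == 0}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                tile = pending.pop(fut)
                fut.result()                            # propagate any exception
                for succ in dependents[tile]:
                    remaining[succ] -= 1
                    if remaining[succ] == 0:            # all predecessors finished
                        pending[pool.submit(work, succ)] = succ

# toy 2x2 tile space with lu/cholesky style wavefront dependences
deps = {(0, 0): set(), (0, 1): {(0, 0)}, (1, 0): {(0, 0)}, (1, 1): {(0, 1), (1, 0)}}
schedule_tiles(deps, lambda t: print("running tile", t))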
the issue of data broadcast has received much attention in mobile computing periodic broadcast of frequently requested data can reduce the workload of the up link channel and facilitate data access for the mobile user many approaches have been proposed to schedule data items for broadcasting however the issues of accessing multiple data items on the broadcast channel are less discussed two problems are discussed in this paper that is deciding the content of the broadcast channel based on the queries from the clients and scheduling the data items to be broadcast we will show that these two problems are np complete different heuristics to these problems are presented and compared through performance evaluations
despite significant progress in the theory and practice of program analysis analyzing properties of heap data has not reached the same level of maturity as the analysis of static and stack data the spatial and temporal structure of stack and static data is well understood while that of heap data seems arbitrary and is unbounded we devise bounded representations that summarize properties of the heap data this summarization is based on the structure of the program that manipulates the heap the resulting summary representations are certain kinds of graphs called access graphs the boundedness of these representations and the monotonicity of the operations to manipulate them make it possible to compute them through data flow analysis an important application that benefits from heap reference analysis is garbage collection where currently liveness is conservatively approximated by reachability from program variables as consequence current garbage collectors leave lot of garbage uncollected fact that has been confirmed by several empirical studies we propose the first ever end to end static analysis to distinguish live objects from reachable objects we use this information to make dead objects unreachable by modifying the program this application is interesting because it requires discovering data flow information representing complex semantics in particular we formulate the following new analyses for heap data liveness availability and anticipability and propose solution methods for them together they cover various combinations of directions of analysis ie forward and backward and confluence of information ie union and intersection our analysis can also be used for plugging memory leaks in plus plus languages
in this paper we propose matching algorithm for measuring the structural similarity between an xml document and dtd the matching algorithm by comparing the document structure against the one the dtd requires is able to identify commonalities and differences differences can be due to the presence of extra elements with respect to those the dtd requires and to the absence of required elements the evaluation of commonalities and differences gives rise to numerical rank of the structural similarity moreover in the paper some applications of the matching algorithm are discussed specifically the matching algorithm is exploited for the classification of xml documents against set of dtds the evolution of the dtd structure the evaluation of structural queries the selective dissemination of xml documents and the protection of xml document contents
recent advances in software and architectural support for server virtualization have created interest in using this technology in the design of consolidated hosting platforms since virtualization enables easier and faster application migration as well as secure co location of antagonistic applications higher degrees of server consolidation are likely to result in such virtualization based hosting platforms vhps we identify key shortcoming in existing virtual machine monitors vmms that proves to be an obstacle in operating hosting platforms such as internet data centers under conditions of such high consolidation cpu schedulers that are agnostic to the communication behavior of modern multi tier applications we develop new communication aware cpu scheduling algorithm to alleviate this problem we implement our algorithm in the xen vmm and build prototype vhp on cluster of servers our experimental evaluation with realistic internet server applications and benchmarks demonstrates the performance cost benefits and the wide applicability of our algorithms for example the tpc benchmark exhibited improvements in average response times of up to for variety of consolidation scenarios streaming media server hosted on our prototype vhp was able to satisfactorily service up to times as many clients as one running on the default xen
the work described here initially formed part of triangulation exercise to establish the effectiveness of the query term order algorithm it subsequently proved to be reliable indicator for summarising english web documents we utilised the human summaries from the document understanding conference data and generated queries automatically for testing the qto algorithm six sentence weighting schemes that made use of query term frequency and qto were constructed to produce system summaries and this paper explains the process of combining and balancing the weighting components the summaries produced were evaluated by the rouge metric and the results showed that using qto in weighting combination resulted in the best performance we also found that using combination of more weighting components always produced improved performance compared to any single weighting component
number of web cache related algorithms such as replacement and prefetching policies rely on specific characteristics present in the sequence of requests for efficient performance further there is an increasing need to synthetically generate long traces of web requests for studying the performance of algorithms and systems related to the web these reasons motivate us to obtain simple and accurate model of web request traces our markovian model precisely captures the degrees to which temporal correlations and document popularity influence web trace requests we describe mathematical procedure to extract the model parameters from real traces and generate synthetic traces using these parameters this procedure is verified by standard statistical analysis we also validate the model by comparing the hit ratios for real traces and their synthetic counterparts under various caching algorithms as an important by product the model provides guidelines for designing efficient replacement algorithms we obtain optimal algorithms given the parameters of the model we also introduce spectrum of practicable high performance algorithms that adapt to the degree of temporal correlation present in the request sequence and discuss related implementation concerns
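an illustrative generator only and not the fitted model of the paper it mimics the two factors named above by re requesting a recently seen document with some probability for temporal correlation and otherwise drawing from a zipf like popularity distribution all parameter names and defaults are hypothetical

import random

def zipf_draw(n_docs, alpha, rng):
    # popularity-driven reference (recomputing weights each call keeps the sketch short)
    weights = [1.0 / (i + 1) ** alpha for i in range(n_docs)]
    return rng.choices(range(n_docs), weights=weights, k=1)[0]

def synthetic_trace(length, n_docs=1000, alpha=0.8, p_repeat=0.3,
                    history=50, seed=0):
    rng = random.Random(seed)
    recent, trace = [], []
    for _ in range(length):
        if recent and rng.random() < p_repeat:
            doc = rng.choice(recent)              # temporally correlated re-reference
        else:
            doc = zipf_draw(n_docs, alpha, rng)   # popularity-driven reference
        trace.append(doc)
        recent = (recent + [doc])[-history:]      # bounded recency window
    return trace

print(synthetic_trace(10))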
this paper introduces class of join algorithms termed join for joining multiple infinite data streams join addresses the infinite nature of the data streams by joining stream data items that lie within sliding window and that match certain join condition in addition to its general applicability in stream query processing join can be used to track the motion of moving object or detect the propagation of clouds of hazardous material or pollution spills over time in sensor network environment we describe two new algorithms for join and address variations and local global optimizations related to specifying the nature of the window constraints to fulfill the posed queries the performance of the proposed algorithms is studied experimentally in prototype stream database system using synthetic data streams and real time series data tradeoffs of the proposed algorithms and their advantages and disadvantages are highlighted given variations in the aggregate arrival rates of the input data streams and the desired response times per query
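a generic sliding window join sketch under the assumptions that tuples are triples of timestamp key and payload and that the join condition is key equality it illustrates window expiration and probing but is not the specific algorithms of the paper

from collections import deque

def window_join(stream_a, stream_b, window):
    # each arrival first expires tuples that fell outside the time window and
    # then probes the other stream's window for matching keys
    win_a, win_b = deque(), deque()
    for side, tup in merge_by_time(stream_a, stream_b):
        ts, key, _ = tup
        for w in (win_a, win_b):
            while w and w[0][0] < ts - window:      # expire out-of-window tuples
                w.popleft()
        own, other = (win_a, win_b) if side == "a" else (win_b, win_a)
        for match in other:
            if match[1] == key:                     # join condition: equal keys
                yield tup, match
        own.append(tup)

def merge_by_time(a, b):
    # merge the two streams into one timestamp-ordered sequence
    tagged = [("a", t) for t in a] + [("b", t) for t in b]
    return sorted(tagged, key=lambda x: x[1][0])

a = [(1, "x", "a1"), (4, "y", "a2"), (9, "x", "a3")]
b = [(2, "x", "b1"), (5, "x", "b2"), (10, "y", "b3")]
print(list(window_join(a, b, window=3)))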
we present generative model based method for recovering both the shape and the reflectance of the surface of scene from multiple images assuming that illumination conditions are known in advance based on variational framework and via gradient descents the algorithm minimizes simultaneously and consistently global cost functional with respect to both shape and reflectance contrary to previous works which consider specific individual scenarios our method applies to number of scenarios multiview stereovision multiview photometric stereo and multiview shape from shading in addition our approach naturally combines stereo silhouette and shading cues in single framework and unlike most previous methods dealing with only lambertian surfaces the proposed method considers general dichromatic surfaces
multicast services are demanded by variety of applications many applications require anonymity during their communication however there has been very little work on anonymous multicasting and such services are not available yet due to the fundamental differences between multicast and unicast the solutions proposed for anonymity in unicast communications cannot be directly applied to multicast applications in this paper we define the anonymous multicast system and propose mutual anonymous multicast mam protocol including the design of unicast mutual anonymity protocol and construction and optimization of an anonymous multicast tree mam is self organizing and completely distributed we define the attack model in an anonymous multicast system and analyze the anonymity degree we also evaluate the performance of mam by comprehensive simulations
current text based mobile group chatting systems hinder navigation ease through long chat archive in limited screen display moreover tracking messages sent by specific chatter is cumbersome and time consuming hence graphical based usable interface that aids navigation and message tracking through minimal key pressed and enhances user expression via avatars employment is proposed the research outcomes typified that there was significant linear relationship between user interface and usability on text based and graphical based usable interface on mobile chat moreover the experimental evaluation results indicated that text based usability could be improved by creating interface that encourages usages whereas the graphical based usable mobile chat is augmented by crafting user friendly interface that enhances user satisfaction encourages usages and promotes navigation ease the empirical findings and results exemplified that the potential use of graphical based usable mobile chat as substitution to the text based that presently has poor reception and is under utilised in commercial arena
we present novel distributed algorithm for the maximal independent set mis problem on growth bounded graphs gbg our deterministic algorithm finishes in log time being the number of nodes in light of linial’s log lower bound our algorithm is asymptotically optimal our algorithm answers prominent open problems in the ad hoc sensor network domain for instance it solves the connected dominating set problem for unit disk graphs in log time exponentially faster than the state of the art algorithm with new extension our algorithm also computes delta coloring in log time where delta is the maximum degree of the graph
traditional networks are surprisingly fragile and difficult to manage the problem can partly be attributed to the exposition of too many details of the controlled objects leading to the deluge of complexity in control plane and the absence of network wide views leading to the blindness of network management to address these problems this paper decomposes the necessary network management information into three parts the basic information the cross layer association and global information and new controlled object description model is presented in the trustworthy and controllable network control architecture which separates the functionality of control and management of network from the data plane of ip network and constructs the formal control and management plane of ip network the new model identifies and abstracts the controlled objects with object oriented approach based on this model cross layer database is built to store the different layer control objects and to present cross layer association view processing mechanism to process the original information is presented for global network state view and control plane is constructed to realize network control the control information description model restricts the complexity of the controlled objects to their own implementation by abstraction and alleviates the difficulty of network management the cross layer association view and the global network state view compose the network wide views the network wide views realize the visibility and improve the manageability of network finally we present examples to indicate that the model alleviates the complexity of configuration management
traditional means of data processing management information systems and decision support systems cannot meet new demand ushered in with the evolution of mini micro computers modern computer end user especially modern decision maker needs single pool of information that may be geographically dispersed therefore new combination of technologies is needed for coping with this new demand the purpose of this paper is to develop unified methodology for distributed system design with distributed databases the distributed systems designed under this unified methodology can satisfy geographical data independence in addition to logical and physical data independence in the traditional sense
in this paper we present technique of compressing bitmap indexes for application in data warehouses the developed compression technique called run length huffman rlh is based on the run length encoding and on the huffman encoding rlh was implemented and experimentally compared to the well known word aligned hybrid bitmap compression technique that has been reported to provide the shortest query execution time the experiments discussed in this paper show that rlh offers shorter query response times than wah for certain cardinalities of indexed attributes moreover bitmaps compressed with rlh are smaller than corresponding bitmaps compressed with wah additionally we propose modified rlh called rlh which is designed to better support bitmap updates
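a minimal sketch of the two building blocks named above namely run length encoding of a bitmap followed by huffman coding of the resulting runs it illustrates the combination only and is not the exact rlh codec

import heapq
from collections import Counter
from itertools import groupby

def run_lengths(bits):
    # bitmap string -> list of (bit value, run length) symbols
    return [(b, len(list(g))) for b, g in groupby(bits)]

def huffman_codes(symbols):
    # build a huffman code over the run-length symbols (ties broken by a counter)
    heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], i] + lo[2:] + hi[2:])
        i += 1
    return {s: code for s, code in heap[0][2:]}

bitmap = "000000" "11" "00000000" "11111" "0000"
runs = run_lengths(bitmap)
codes = huffman_codes(runs)
encoded = "".join(codes[r] for r in runs)
print(runs, codes, encoded, sep="\n")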
this paper discusses technological interventions in support of planners citizens and other stakeholders in envisioning and negotiating an urban project set of prototypal tools including tangible user interface has been developed that allow users to create and manipulate visual auditory scenes and mesh these scenes with the real environment of an urban site the paper discusses how to support all users different types of stakeholders in the collaborative creation of mixed reality configurations as an integral part of expressing their ideas about an urban project distinguishing between different types and levels of cooperation it also looks into how to use mixed reality tools for enhancing an already highly developed representational culture
document summarization plays an increasingly important role with the exponential growth of documents on the web many supervised and unsupervised approaches have been proposed to generate summaries from documents however these approaches seldom simultaneously consider summary diversity coverage and balance issues which to large extent determine the quality of summaries in this paper we consider extract based summarization emphasizing the following three requirements diversity in summarization which seeks to reduce redundancy among sentences in the summary sufficient coverage which focuses on avoiding the loss of the document’s main information when generating the summary and balance which demands that different aspects of the document need to have about the same relative importance in the summary we formulate the extract based summarization problem as learning mapping from set of sentences of given document to subset of the sentences that satisfies the above three requirements the mapping is learned by incorporating several constraints in structure learning framework and we explore the graph structure of the output variables and employ structural svm for solving the resulted optimization problem experiments on the duc data sets demonstrate significant performance improvements in terms of and rouge metrics
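not the structural svm formulation of the paper but a simple greedy mmr style baseline that makes the coverage versus diversity trade off concrete each step picks the sentence most similar to the document while penalizing similarity to sentences already selected the lambda_ weight and the bag of words similarity are illustrative choices only

from collections import Counter
from math import sqrt

def cosine(a, b):
    # cosine similarity between two bag-of-words counters
    dot = sum(a[w] * b[w] for w in a if w in b)
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_summary(sentences, k=2, lambda_=0.7):
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = Counter(w for v in vecs for w in v.elements())    # whole-document vector
    chosen = []
    while len(chosen) < k and len(chosen) < len(sentences):
        def score(i):
            cover = cosine(vecs[i], doc)                                # coverage term
            redund = max((cosine(vecs[i], vecs[j]) for j in chosen), default=0.0)
            return lambda_ * cover - (1 - lambda_) * redund             # diversity penalty
        best = max((i for i in range(len(sentences)) if i not in chosen), key=score)
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]

docs = ["the cat sat on the mat", "a cat sat on a mat", "dogs chase the ball in the park"]
print(greedy_summary(docs, k=2))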
we present new model of the homogeneous bssrdf based on large scale simulations our model captures the appearance of materials that are not accurately represented using existing single scattering models or multiple isotropic scattering models eg the diffusion approximation we use an analytic function to model the hemispherical distribution of exitant light at point on the surface and table of parameter values of this function computed at uniformly sampled locations over the remaining dimensions of the bssrdf domain this analytic function is expressed in elliptic coordinates and has six parameters which vary smoothly with surface position incident angle and the underlying optical properties of the material albedo mean free path length phase function and the relative index of refraction our model agrees well with measured data and is compact requiring only mb to represent the full spatial and angular distribution of light across wide spectrum of materials in practice rendering single material requires only about kb to represent the bssrdf
with the availability of chip multiprocessor cmp and simultaneous multithreading smt machines extracting thread level parallelism from sequential program has become crucial for improving performance however many sequential programs cannot be easily parallelized due to the presence of dependences to solve this problem different solutions have been proposed some of them make the optimistic assumption that such dependences rarely manifest themselves at runtime however when this assumption is violated the recovery causes very large overhead other approaches incur large synchronization or computation overhead when resolving the dependences consequently for loop with frequently arising cross iteration dependences previous techniques are not able to speed up the execution in this paper we propose compiler technique which uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross iteration dependences the key idea is to generate multiple versions of loop iteration based on multiple predictions of values of variables involved in cross iteration dependences ie live in variables these speculative versions and the preceding loop iteration are executed in separate memory states simultaneously after the execution if one of these versions is correct ie its predicted values are found to be correct then we merge its state and the state of the preceding iteration because the dependence between the two iterations is correctly resolved the memory states of other incorrect versions are completely discarded based on this idea we further propose runtime adaptive scheme that not only gives good performance but also achieves better cpu utilization we conducted experiments on benchmark programs on real machine the results show that our technique can achieve speedup on average across all used benchmarks
dynamic voltage scaling dvs and dynamic power management dpm are the two main techniques for reducing the energy consumption of embedded systems the effectiveness of both dvs and dpm needs to be considered in the development of an energy management policy for system that consists of both dvs enabled and dpm enabled components the characteristics of the power source also have to be explicitly taken into account in this paper we propose policy to maximize the operational lifetime of dvs dpm enabled embedded system powered by fuel cell battery fc hybrid source we show that the lifetime of the system is determined by the fuel consumption of the fuel cell fc and that the fuel consumption can be minimized by combination of load energy minimization policy and an optimal fuel flow control policy the proposed method when applied to randomized task trace demonstrated superior performance compared to competing policies based on dvs and or dpm
sorting is one of the most important and well studied problems in computer science many good algorithms are known which offer various trade offs in efficiency simplicity memory use and other factors however these algorithms do not take into account features of modern computer architectures that significantly influence performance caches and branch predictors are two such features and while there has been significant amount of research into the cache performance of general purpose sorting algorithms there has been little research on their branch prediction properties in this paper we empirically examine the behavior of the branches in all the most common sorting algorithms we also consider the interaction of cache optimization on the predictability of the branches in these algorithms we find insertion sort to have the fewest branch mispredictions of any comparison based sorting algorithm that bubble and shaker sort operate in fashion that makes their branches highly unpredictable that the unpredictability of shellsort’s branches improves its caching behavior and that several cache optimizations have little effect on mergesort’s branch mispredictions we find also that optimizations to quicksort for example the choice of pivot have strong influence on the predictability of its branches we point out simple way of removing branch instructions from classic heapsort implementation and also show that unrolling loop in cache optimized heapsort implementation improves the predictability of its branches finally we note that when sorting random data two level adaptive branch predictors are usually no better than simpler bimodal predictors this is despite the fact that two level adaptive predictors are almost always superior to bimodal predictors in general
rendering throughput has reached level that enables novel approach to level of detail lod control in terrain rendering we introduce the geometry clipmap which caches the terrain in set of nested regular grids centered about the viewer the grids are stored as vertex buffers in fast video memory and are incrementally refilled as the viewpoint moves this simple framework provides visual continuity uniform frame rate complexity throttling and graceful degradation moreover it allows two new exciting real time functionalities decompression and synthesis our main dataset is gb height map of the united states compressed image pyramid reduces the size by remarkable factor of so that it fits entirely in memory this compressed data also contributes normal maps for shading as the viewer approaches the surface we synthesize grid levels finer than the stored terrain using fractal noise displacement decompression synthesis and normal map computations are incremental thereby allowing interactive flight at frames sec
key generation from biometrics has been studied intensively in recent years linking key with certain biometric enhances the strength of identity authentication but the state of the art key generation systems are far away from practicality due to low accuracy the special manner of biometric matching makes single feature based key generation system difficult to obtain high recognition accuracy integrating more features into key generation system may be potential solution to improve the system performance in this paper we propose fingerprint based key generation system under the framework of fuzzy extractor by fusing two kinds of features minutia based features and image based features three types of sketch including minutiae based sketch modified biocode based sketch and combined feature based sketch are constructed to deal with the feature differences our system is tested on fvc db and db and the experimental results show that the fusion scheme effectively improves the system performance compared with the systems based only on minutiae or modified biocode
recent advances in low power sensing devices coupled with the widespread availability of wireless ad hoc networks have fueled the development of sensor networks these are typically deployed over wide areas to gather data in the environment and monitor events of interest the ability to run spatial queries is extremely useful for sensor networks spatial query execution has been extensively studied in the context of centralized spatial databases however because of the energy and bandwidth limitation of sensor nodes these solutions are not directly applicable to the sensor network in this paper we propose scalable and distributed way of spatial query execution in sensor networks we develop distributed spatial index over the sensor nodes that is used in processing spatial queries in distributed fashion we evaluate the behavior of our approach and show that our mechanism provides an efficient and scalable way to run spatial queries over sparse and dense sensor networks
the amount of information available online is increasing exponentially while this information is valuable resource its sheer volume limits its value many research projects and companies are exploring the use of personalized applications that manage this deluge by tailoring the information presented to individual users these applications all need to gather and exploit some information about individuals in order to be effective this area is broadly called user profiling this chapter surveys some of the most popular techniques for collecting information about users representing and building user profiles in particular explicit information techniques are contrasted with implicitly collected user information using browser caches proxy servers browser agents desktop agents and search logs we discuss in detail user profiles represented as weighted keywords semantic networks and weighted concepts we review how each of these profiles is constructed and give examples of projects that employ each of these techniques finally brief discussion of the importance of privacy protection in profiling is presented
reverse nearest neighbor rnn queries are of particular interest in wide range of applications such as decision support systems profile based marketing data streaming document databases and bioinformatics the earlier approaches to solve this problem mostly deal with two dimensional data however most of the above applications inherently involve high dimensions and high dimensional rnn problem is still unexplored in this paper we propose an approximate solution to answer rnn queries in high dimensions our approach is based on the strong correlation in practice between nn and rnn it works in two phases in the first phase the nn of query point is found and in the next phase they are further analyzed using novel type of query boolean range query brq experimental results show that brq is much more efficient than both nn and range queries and can be effectively used to answer rnn queries performance is further improved by running multiple brq simultaneously the proposed approach can also be used to answer other variants of rnn queries such as rnn of order bichromatic rnn and matching query which has many applications of its own our technique can efficiently answer nn rnn and its variants with approximately same number of as running nn query
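a brute force illustration of the two phase idea described above and not the indexed solution of the paper candidates come from the knn of the query and a candidate p is kept when the query lies within the distance to p’s own kth nearest neighbour which is what the boolean range style check decides

from math import dist

def knn(points, q, k, exclude=None):
    # brute-force k nearest neighbours of q, optionally excluding one point
    cand = [p for p in points if p != exclude]
    return sorted(cand, key=lambda p: dist(p, q))[:k]

def approx_rnn(points, q, k):
    result = []
    for p in knn(points, q, k):                    # phase 1: knn of the query as candidates
        kth = knn(points, p, k, exclude=p)[-1]     # p's k-th nearest neighbour
        if dist(p, q) <= dist(p, kth):             # phase 2: range check around p
            result.append(p)
    return result

pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
print(approx_rnn(pts, q=(0.4, 0.4), k=2))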
we present technique that controls the peak power consumption of high density server by implementing feedback controller that uses precise system level power measurement to periodically select the highest performance state while keeping the system within fixed power constraint control theoretic methodology is applied to systematically design this control loop with analytic assurances of system stability and controller performance despite unpredictable workloads and running environments in real server we are able to control power over second period to within and over an second period to within conventional servers respond to power supply constraint situations by using simple open loop policies to set safe performance level in order to limit peak power consumption we show that closed loop control can provide higher performance under these conditions and implement this technique on an ibm bladecenter hs server experimental results demonstrate that closed loop control provides up to higher application performance compared to open loop control and up to higher performance compared to widely used ad hoc technique
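a simplified proportional style feedback loop showing the structure of such a controller and not the control theoretic design of the paper measure_power and set_pstate are hypothetical platform hooks and the budget gain and toy power model are made up numbers

def power_cap_loop(measure_power, set_pstate, budget_watts, periods,
                   n_states=8, gain=0.05):
    # start at the highest performance state and adjust once per control period
    state = n_states - 1
    for _ in range(periods):
        error = measure_power(state) - budget_watts
        # proportional correction: being over budget lowers the performance state
        state -= int(round(gain * error))
        state = max(0, min(n_states - 1, state))
        set_pstate(state)
    return state

# toy power model: each performance state draws 20w more than the previous one
model = lambda s: 100 + 20 * s
print(power_cap_loop(model, lambda s: None, budget_watts=180, periods=10))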
type based protection mechanisms in jvm like environment must be administrated by the code consumer at the bytecode level unfortunately formulating sound static type system for the full jvm bytecode language can be daunting task it is therefore counter productive for the designer of bytecode level type system to address the full complexity of the vm environment in the early stage of design in this work lightweight modelling tool featherweight jvm is proposed to facilitate the early evaluation of bytecode level type based protection mechanisms and specifically their ability to enforce security motivated stack invariants and confinement properties rather than modelling the execution of specific bytecode stream featherweight jvm is nondeterministic event model that captures all the possible access event sequences that may be generated by jvm like environment when well typed bytecode programs are executed the effect of deploying type based protection mechanism can be modelled by safety policy that constrains the event sequences produced by the vm model to evaluate the effectiveness of the protection mechanism security theorems in the form of state invariants can then be proved in the policy guarded vm model to demonstrate the utility of the proposed approach vitek et al confined types has been formulated as safety policy for the featherweight jvm and corresponding confinement theorem has been established to reduce class loading overhead capability based reformulation of confined types is then studied and is shown to preserve the confinement theorem this paper thus provides first evidence on the utility of featherweight jvm in providing early feedback to the designer of type based protection mechanisms for jvm like environments
we describe how to express constraints in functional semantic data model which has working implementation in an object database we trace the development of such constraints from being integrity checks embedded in procedural code to being something declarative and self contained combining data access and computation that can be moved around into other contexts in intelligent distributed systems we see this as paralleling and extending the original vision of functions as values in functional programming systems it is greatly helped by using referentially transparent functional formalisation we illustrate these ideas by showing how constraints can move around within database systems colan angelic daplex being transformed for various uses or even moved out into other systems and fused into specification for configuration problem we look forward to future directions involving agents
in many applications xml documents need to be modelled as graphs the query processing of graph structured xml documents brings new challenges in this paper we design method based on labelling scheme for structural queries processing on graph structured xml documents we give each node some labels the reachability labelling scheme by extending an interval based reachability labelling scheme for dag by rakesh et al we design labelling schemes to support the judgements of reachability relationships for general graphs based on the labelling schemes we design graph structural join algorithms to answer the structural queries with only ancestor descendant relationship efficiently for the processing of subgraph query we design subgraph join algorithm with efficient data structure the subgraph join algorithm can process subgraph queries with various structures efficiently experimental results show that our algorithms have good performance and scalability
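a simplified illustration of the interval based reachability labelling idea the scheme above builds on every dag node receives a postorder number and a set of intervals covering exactly the postorder numbers of its descendants so u reaches v exactly when v’s number falls inside one of u’s intervals this does not cover the paper’s extension to general graphs or its join algorithms

def reachability_labels(adj):
    # adj maps each node to its list of children in a dag
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    post, order, seen = {}, [], set()

    def dfs(u):
        seen.add(u)
        for v in adj.get(u, ()):
            if v not in seen:
                dfs(v)
        post[u] = len(post)       # postorder number
        order.append(u)

    for u in sorted(nodes):
        if u not in seen:
            dfs(u)

    labels = {}
    for u in order:               # children finish before their parents
        nums = {post[u]}
        for v in adj.get(u, ()):
            nums |= labels[v]
        labels[u] = nums          # exactly the postorder numbers reachable from u
    intervals = {}
    for u, nums in labels.items():
        runs, s = [], sorted(nums)
        lo = prev = s[0]
        for x in s[1:]:
            if x != prev + 1:     # close the current maximal interval
                runs.append((lo, prev)); lo = x
            prev = x
        runs.append((lo, prev))
        intervals[u] = runs
    return post, intervals

def reaches(u, v, post, intervals):
    return any(lo <= post[v] <= hi for lo, hi in intervals[u])

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
post, intervals = reachability_labels(dag)
print(intervals, reaches("a", "d", post, intervals), reaches("b", "c", post, intervals))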
visual context provides cues about an object’s presence position and size within an observed scene which are used to increase the performance of object detection techniques however state of the art methods for context aware object detection could decrease the initial performance we discuss the reasons for failure and propose concept that overcomes these limitations by introducing novel technique for integrating visual context and object detection therefore we apply the prior probability function of an object detector that maps the detector’s output to probabilities together with an appropriate contextual weighting probabilistic framework is established in addition we present an extension to state of the art methods to learn scale dependent visual context information and show how this increases the initial performance the standard methods and our proposed extensions are compared on novel demanding image data set results show that visual context facilitates object detection methods
multicast operation is an important operation in multicomputer communication systems and can be used to support several collective communication operations significant performance improvement can be achieved by supporting multicast operations at the hardware level in this paper we propose an asynchronous tree based multicasting atbm technique for multistage interconnection networks mins the deadlock issues in tree based multicasting in mins are analyzed first to examine the main causes of deadlocks an atbm framework is developed in which deadlocks are prevented by serializing the initiations of tree operations that have potential to create deadlocks these tree operations are identified through grouping algorithm the atbm approach is not only simple to implement but also provides good communication performance using minimal overheads in terms of additional hardware requirements and synchronization delay using the atbm framework algorithms are developed for both unidirectional and bidirectional multistage interconnection networks the performances of the proposed algorithms are evaluated through simulation experiments the results indicate that the proposed hardware based atbm scheme reduces the communication latency when compared to the software multicasting approach proposed earlier
this paper presents distributed systems foundation dsf common platform for distributed systems research and development it can run distributed algorithm written in java under multiple execution modes simulation massive multi tenancy and real deployment dsf provides set of novel features to facilitate testing and debugging including chaotic timing test and time travel debugging with mutable replay unlike existing research prototypes that offer advanced debugging features by hacking programming tools dsf is written entirely in java without modifications to any external tools such as jvm java runtime library compiler linker system library os or hypervisor this simplicity stems from our goal of making dsf not only research prototype but more importantly production tool experiments show that dsf is efficient and easy to use dsf’s massive multi tenancy mode can run os level threads in single jvm to concurrently execute as opposed to simulate dht nodes in real time
we describe simple strategy to achieve translation performance improvements by combining output from identical statistical machine translation systems trained on alternative morphological decompositions of the source language combination is done by means of minimum bayes risk decoding over shared best list when translating into english from two highly inflected languages such as arabic and finnish we obtain significant improvements over simply selecting the best morphological decomposition
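a minimal sketch of minimum bayes risk selection over a pooled best list assuming hypotheses with log scores have already been gathered from the component systems a simple unigram f1 gain stands in for the bleu based gain that would normally be used

from collections import Counter
from math import exp

def unigram_f1(hyp, ref):
    # crude stand-in gain function: unigram precision/recall f1
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if not overlap:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_select(nbest):
    # nbest: list of (hypothesis, log score) pooled from all component systems
    z = sum(exp(s) for _, s in nbest)
    posteriors = [exp(s) / z for _, s in nbest]
    def expected_gain(h):
        return sum(p * unigram_f1(h, ref) for (ref, _), p in zip(nbest, posteriors))
    return max((h for h, _ in nbest), key=expected_gain)

pooled = [("the cat sat on the mat", -1.0),
          ("a cat sat on the mat", -1.2),
          ("the cat is sitting on a mat", -2.0)]
print(mbr_select(pooled))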
the purpose of this study is to demonstrate the benefit of using common data mining techniques on survey data where statistical analysis is routinely applied the statistical survey is commonly used to collect quantitative information about an item in population statistical analysis is usually carried out on survey data to test hypothesis we report in this paper an application of data mining methodologies to breast feeding survey data which have been conducted and analysed by statisticians the purpose of the research is to study the factors leading to deciding whether or not to breast feed new born baby various data mining methods are applied to the data feature or variable selection is conducted to select the most discriminative and least redundant features using an information theory based method and statistical approach decision tree and regression approaches are tested on classification tasks using features selected risk pattern mining method is also applied to identify groups with high risk of not breast feeding the success of data mining in this study suggests that using data mining approaches will be applicable to other similar survey data the data mining methods which enable search for hypotheses may be used as complementary survey data analysis tool to traditional statistical analysis
currently message passing mp and shared address space sas are the two leading parallel programming paradigms mp has been standardized with mpi and is the more common and mature approach however code development can be extremely difficult especially for irregularly structured computations sas offers substantial ease of programming but may suffer from performance limitations due to poor spatial locality and high protocol overhead in this paper we compare the performance of and the programming effort required for six applications under both programming models on processor pc smp cluster platform that is becoming increasingly attractive for high end scientific computing our application suite consists of codes that typically do not exhibit scalable performance under shared memory programming due to their high communication to computation ratios and or complex communication patterns results indicate that sas can achieve about half the parallel efficiency of mpi for most of our applications while being competitive for the others hybrid mpi sas strategy shows only small performance advantage over pure mpi in some cases finally improved implementations of two mpi collective operations on pc smp clusters are presented
transient faults that arise in large scale software systems can often be repaired by re executing the code in which they occur ascribing meaningful semantics for safe re execution in multi threaded code is not obvious however for thread to correctly re execute region of code it must ensure that all other threads which have witnessed its unwanted effects within that region are also reverted to meaningful earlier state if not done properly data inconsistencies and other undesirable behavior may result however automatically determining what constitutes consistent global checkpoint is not straightforward since thread interactions are dynamic property of the program in this paper we present safe and efficient checkpointing mechanism for concurrent ml cml that can be used to recover from transient faults we introduce new linguistic abstraction called stabilizers that permits the specification of per thread monitors and the restoration of globally consistent checkpoints global states are computed through lightweight monitoring of communication events among threads eg message passing operations or updates to shared variables our checkpointing abstraction provides atomicity and isolation guarantees during state restoration ensuring restored global states are safe our experimental results on several realistic multithreaded server style cml applications including web server and windowing toolkit show that the overheads to use stabilizers are small and lead us to conclude that they are viable mechanism for defining safe checkpoints in concurrent functional programs our experiments conclude with case study illustrating how to build open nested transactions from our checkpointing mechanism
one of the main benefits of component based architectures is their support for reuse the port and interface definitions of architectural components facilitate the construction of complex functionality by composition of existing components for such composition means for sufficient verification either by testing or formal verification are necessary however the overwhelming complexity of the interaction of distributed real time components usually excludes that testing alone can provide the required coverage when integrating legacy component in this paper we present scheme on how embedded legacy components can be tackled for the embedded legacy components initially behavioral model is derived from the interface description of the architectural model this is in the subsequent steps enriched by an incremental synthesis using formal verification techniques for the systematic generation of component tests the proposed scheme results in an effective combination of testing and formal verification while verification is employed to tackle the inherently subtle interaction of the distributed real time components which could not be covered by testing local testing of the components guided by the verification results is employed to derive refined behavioral models the approach further has two outstanding benefits it can pin point real failures without false negatives right from the beginning it can also prove the correctness of the integration without learning the whole legacy component using the restrictions of the integration context
parallel satisfiability testing algorithm called parallel modoc is presented parallel modoc is based on modoc which is based on propositional model elimination with an added capability to prune away certain branches that cannot lead to successful subrefutation the pruning information is encoded in partial truth assignment called an autarky parallel modoc executes multiple instances of modoc as separate processes and allows processes to cooperate by sharing lemmas and autarkies as they are found when modoc process finds new autarky or new lemma it makes the information available to other modoc processes via blackboard combining autarkies generally is not straightforward because two autarkies found by two separate processes may have conflicting assignments the paper presents an algorithm to combine two arbitrary autarkies to form larger autarky experimental results show that for many of the formulas parallel modoc achieves speedup greater than the number of processors formulas that could not be solved in an hour by modoc were often solved by parallel modoc in the order of minutes and in some cases in seconds
an ad hoc network is composed of mobile nodes without the presence of fixed infrastructure communications among nodes are accomplished by forwarding data packets for each other on hop by hop basis along the current connection to the destination node in particular vehicle to vehicle communications have been studied in recent years to improve driver safety as more of such applications of high mobility ad hoc networks emerge it is critical that the routing protocol employed is capable of efficiently coping with the high frequency of broken links ie robust with respect to high mobility this paper presents comprehensive comparative study in city environment of eight representative routing protocols for wireless mobile ad hoc networks and inter vehicular networks developed in recent years in city environment communication protocols need to adapt to fast moving nodes eg vehicles on streets and to large obstacles eg office buildings in this paper we elaborate upon extensive simulation results based on various network scenarios and discuss the strengths and weaknesses of these techniques with regard to their support for highly mobile nodes
traditional caching policies are known to perform poorly for storage server caches one promising approach to solving this problem is to use hints from the storage clients to manage the storage server cache previous hinting approaches are ad hoc in that predefined reaction to specific types of hints is hard coded into the caching policy with ad hoc approaches it is difficult to ensure that the best hints are being used and it is difficult to accommodate multiple types of hints and multiple client applications in this paper we propose client informed caching clic generic hint based policy for managing storage server caches clic automatically interprets hints generated by storage clients and translates them into server caching policy it does this without explicit knowledge of the application specific hint semantics we demonstrate using trace based simulation of database workloads that clic outperforms hint oblivious and state of the art hint aware caching policies we also demonstrate that the space required to track and interpret hints is small
while microprocessor designers turn to multicore architectures to sustain performance expectations the dramatic increase in parallelism of such architectures will put substantial demands on off chip bandwidth and make the memory wall more significant than ever this paper demonstrates that one profitable application of multicore processors is the execution of many similar instantiations of the same program we identify that this model of execution is used in several practical scenarios and term it as multi execution often each such instance utilizes very similar data in conventional cache hierarchies each instance would cache its own data independently we propose the mergeable cache architecture that detects data similarities and merges cache blocks resulting in substantial savings in cache storage requirements this leads to reductions in off chip memory accesses and overall power usage and increases in application performance we present cycle accurate simulation results of benchmarks from spec to demonstrate that our technique provides scalable solution and leads to significant speedups due to reductions in main memory accesses for cores running similar executions of the same application and sharing an exclusive mb way cache the mergeable cache shows speedup in execution by on average ranging from to while posing an overhead of only on cache area and on power when it is used
this paper describes generic tableau algorithm which is the basis for general customizable method for producing oracles from temporal logic specifications generic argument gives semantic rules with which to build the semantic tableau for specification parameterizing the tableau algorithm by semantic rules permits it to easily accommodate variety of temporal operators and provides clean mechanism for fine tuning the algorithm to produce efficient oraclesthe paper develops conditions to ensure that set of rules results in correct tableau procedure it gives sample rules for variety of linear time temporal operators and shows how rules are tailored to reduce the size of an oracle
in this article we introduce sketching reality the process of converting freehand sketch into realistic looking model we apply this concept to architectural designs as the sketch is being drawn our system periodically interprets its geometry by identifying new junctions edges and faces and then analyzing the extracted topology the user can add detailed geometry and textures through sketches as well this is possible through the use of databases that match partial sketches to models of detailed geometry and textures the final product is realistic texture mapped model of the building we show variety of buildings that have been created using this system
recent proliferation of computing devices has brought attention to heterogeneous collaborative systems where key challenges arise from the resource limitations and disparities sharing data across disparate devices makes it necessary to employ mechanisms for adapting the original data and presenting it to the user in the best possible way however this could represent major problem for effective collaboration since users may find it difficult to reach consensus with everyone working with individually tailored data this paper presents novel approach to controlling the coupling of heterogeneous collaborative systems by combining concepts from complex systems and data adaptation techniques the key idea is that data must be adapted to each individual’s preferences and resource capabilities to support and promote collaboration this adaptation must be interdependent and adaptation performed by one individual should influence the adaptation of the others these influences are defined according to the user’s roles and collaboration requirements we model the problem as distributed optimization problem so that the most useful data both for the individual and the group as whole is scheduled for each user while satisfying their preferences their resource limitations and their mutual influences we show how this approach can be applied in collaborative design application and how it can be extended to other applications
tree pattern matching is fundamental problem that has wide range of applications in web data management xml processing and selective data dissemination in this paper we develop efficient algorithms for the tree homeomorphism problem ie the problem of matching tree pattern with exclusively transitive descendant edges we first prove that deciding whether there is tree homeomorphism is logspace complete improving on the current logcfl upper bound as our main result we develop practical algorithm for the tree homeomorphism decision problem that is both space and time efficient the algorithm is in logdcfl and space consumption is strongly bounded while the running time is linear in the size of the data tree this algorithm immediately generalizes to the problem of matching the tree pattern against all subtrees of the data tree preserving the mentioned efficiency properties
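the following naive Python sketch captures only the decision semantics of the problem above (every pattern edge is a descendant edge); it is a direct reading of the problem definition, not the paper's space-efficient logdcfl algorithm.

```python
# pattern and data trees are (label, [children]) tuples; a pattern node must
# map to an equally labeled data node, and each pattern child must embed at
# some strictly deeper data node.
def embeds(pattern, data):
    memo = {}

    def descendants(node):
        for child in node[1]:
            yield child
            yield from descendants(child)

    def match(p, d):
        key = (id(p), id(d))
        if key not in memo:
            memo[key] = p[0] == d[0] and all(
                any(match(c, below) for below in descendants(d)) for c in p[1])
        return memo[key]

    # the pattern root itself may be mapped to any node of the data tree
    return any(match(pattern, d) for d in (data, *descendants(data)))
```

for example embeds(('a', [('b', [])]), ('r', [('a', [('c', [('b', [])])])])) returns True because a b node occurs somewhere below an a node.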
for given query raised by specific user the query suggestion technique aims to recommend relevant queries which potentially suit the information needs of that user due to the complexity of the web structure and the ambiguity of users inputs most of the suggestion algorithms suffer from the problem of poor recommendation accuracy in this paper aiming at providing semantically relevant queries for users we develop novel effective and efficient two level query suggestion model by mining clickthrough data in the form of two bipartite graphs user query and query url bipartite graphs extracted from the clickthrough data based on this we first propose joint matrix factorization method which utilizes two bipartite graphs to learn the low rank query latent feature space and then build query similarity graph based on the features after that we design an online ranking algorithm to propagate similarities on the query similarity graph and finally recommend latent semantically relevant queries to users experimental analysis on the clickthrough data of commercial search engine shows the effectiveness and the efficiency of our method
bittorrent is widely popular peer to peer content distribution protocol unveiling patterns of resource demand and supply in its usage is paramount to inform operators and designers of bittorrent and of future content distribution systems this study examines three bittorrent content sharing communities regarding resource demand and supply the resulting characterization is significantly broader and deeper than previous bittorrent investigations it compares multiple bittorrent communities and investigates aspects that have not been characterized before such as aggregate user behavior and resource contention the main findings are three fold i resource demand more accurate model for the peer arrival rate over time is introduced contributing to workload synthesis and analysis additionally torrent popularity distributions are found to be non heavy tailed which has implications on the design of bittorrent caching mechanisms ii resource supply small set of users contributes most of the resources in the communities but the set of heavy contributors changes over time and is typically not responsible for most resources used in the distribution of an individual file these results imply some level of robustness can be expected in bittorrent communities and direct resource allocation efforts iii relation between resource demand and supply users that provide more resources are also those that demand more from it also the distribution of file usually experiences resource contention although the communities achieve high rates of served requests
the paper presents deterministic distributed algorithm that given constructs in rounds spanner of edges for every node unweighted graph if is not available to the nodes then our algorithm executes in rounds and still returns spanner with edges previous distributed solutions achieving such optimal stretch size trade off either make use of randomization providing performance guarantees in expectation only or perform in log� rounds and all require priori knowledge of based on this algorithm we propose second deterministic distributed algorithm that for every constructs spanner of edges in rounds without any prior knowledge on the graph our algorithms are complemented with lower bounds which hold even under the assumption that is known to the nodes it is shown that any randomized distributed algorithm requires rounds in expectation to compute spanner of edges for it is also shown that for every any randomized distributed algorithm that constructs spanner with fewer than edges in at most nε expected rounds must stretch some distances by an additive factor of n� in other words while additive stretched spanners with edges may exist eg for they cannot be computed distributively in sub polynomial number of rounds in expectation
due to the availability of huge amount of textual data from variety of sources users of internationally distributed information regions need effective methods and tools that enable them to discover retrieve and categorize relevant information in whatever language and form it may have been stored this drives convergence of numerous interests from diverse research communities focusing on the issues related to multilingual text categorization in this work we implemented and measured the performance of the leading supervised and unsupervised approaches for multilingual text categorization we selected support vector machines svm as representative of supervised techniques as well as latent semantic indexing lsi and self organizing maps som techniques as our selective ones of unsupervised methods for system implementation the preliminary results show that our platform models including both supervised and unsupervised learning methods have the potentials for multilingual text categorization
the economic and social demand for ubiquitous and multifaceted electronic systems in combination with the unprecedented opportunities provided by the integration of various manufacturing technologies is paving the way to new class of heterogeneous integrated systems with increased performance and connectedness and providing us with gateways to the living world this paper surveys design requirements and solutions for heterogeneous systems and addresses design technologies for realizing them
communication overhead is the key obstacle to reaching hardware performance limits the majority is associated with software overhead significant portion of which is attributed to message copying to reduce this copying overhead we have devised techniques that do not require copying received message in order for it to be bound to its final destination rather late binding mechanism which involves address translation and dedicated cache facilitates fast access to received messages by the consuming process thread we have introduced two policies namely direct to cache transfer dtct and lazy dtct that determine whether message after it is bound needs to be transferred into the data cache we have studied the proposed methods in simulation and have shown their effectiveness in reducing access times to message payloads by the consuming process
to create new flexible system for volume illustration we have explored the use of wang cubes the extension of wang tiles we use small sets of wang cubes to generate large variety of nonperiodic illustrative patterns and texture which otherwise would be too large to use in real applications we also develop direct volume rendering framework with the generated patterns and textures our framework can be used to render volume datasets effectively and variety of rendering styles can be achieved with less storage specifically we extend the nonperiodic tiling process of wang tiles to wang cubes and modify it for multipurpose tiling we automatically generate isotropic wang cubes consisting of patterns or textures to simulate various illustrative effects anisotropic wang cubes are generated to yield patterns by using the volume data curvature and gradient information we also extend the definition of wang cubes into set of different sized cubes to provide multiresolution volume rendering finally we provide both coherent geometry based and texture based rendering frameworks that can be integrated with arbitrary feature exploration methods
empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering structured query language and understanding complex and possibly fast evolving data schemas in this tutorial we give an overview of the state of the art techniques for supporting keyword search on structured and semi structured data including query result definition ranking functions result generation and top query processing snippet generation result clustering query cleaning performance optimization and search quality evaluation various data models will be discussed including relational data xml data graph structured data data streams and workflows we also discuss applications that are built upon keyword search such as keyword based database selection query generation and analytical processing finally we identify the challenges and opportunities of future research to advance the field
this paper presents novel approach for an efficient yet accurate estimation technique for power consumption and performance of embedded and general purpose applications our approach is adaptive in nature and is based on detecting sections of code characterized by high temporal locality also called hotspots in the execution profile of the benchmark being executed on target processor the technique itself is architecture and input independent and can be used for both embedded as well as for general purpose processors we have implemented hybrid simulation engine which can significantly shorten the simulation time by using on the fly profiling for critical sections of the code and by reusing this information during power performance estimation for the rest of the code by using this strategy we were able to achieve up to better accuracy compared to flat non adaptive sampling scheme and simulation speed up of up to with maximum error of for performance and for total energy on wide variety of media and general purpose applications
efficient and visually compelling reproduction of effects due to multiple scattering in participating media remains one of the most difficult tasks in computer graphics although several fast techniques were recently developed most of them work only for special types of media for example uniform or sufficiently dense or require extensive precomputation in this paper we present lighting model for the general case of inhomogeneous medium and demonstrate its implementation on programmable graphics hardware it is capable of producing high quality imagery at interactive frame rates with only mild assumptions about medium scattering properties and moderate amount of simple precomputation
automated refactoring tools are an essential part of software developer’s toolbox they are most useful for gradually improving large existing code bases and it is essential that they work reliably since even simple refactoring may affect many different parts of program and the programmer should not have to inspect every individual change to ensure that the transformation went as expected even extensively tested industrial strength refactoring engines however are fraught with many bugs that lead to incorrect non behaviour preserving transformations we argue that software refactoring tools are prime candidate for mechanical verification offering significant challenges but also the prospect of tangible benefits for real world software development
join queries having heavy cost are necessary to data stream management system in the sensor network in this paper we propose an optimization algorithm for multiple continuous join operators over data streams using heuristic strategy first we propose solution of building the global shared query execution plan second we solve the problems of updating window size and routing for join result our experimental results show that the proposed protocol can provide better throughputs than previous methods
we present the secure communication library seal source can be downloaded from http wwwcsecuhkeduhk cslui ansrlab software seal linux based language application programming interface api library that implements secure group key agreement algorithms that allow communication group to periodically renew common secret group key for secure and private communication the group key agreement protocols satisfy several important characteristics distributed property ie no centralized key server is needed collaborative property ie every group member contributes to the group key and dynamic property ie group members can join or leave the group without impairing the efficiency of the group key generation using seal we developed testing tool termed gauger to evaluate the performance of the group key agreement algorithms in both wired and wireless lans according to different levels of membership dynamics we show that our implementation achieves robustness when there are group members leaving the communication group in the middle of rekeying operation we also developed secure chat room application termed chatter to illustrate the usage of seal our seal implementation demonstrates the effectiveness of group key agreement in real network settings
real time surveillance systems telecommunication systems and other dynamic environments often generate tremendous potentially infinite volume of stream data the volume is too huge to be scanned multiple times much of such data resides at rather low level of abstraction whereas most analysts are interested in relatively high level dynamic changes such as trends and outliers to discover such high level characteristics one may need to perform on line multi level multi dimensional analytical processing of stream data in this paper we propose an architecture called streamcube to facilitate on line multi dimensional multi level analysis of stream data for fast online multi dimensional analysis of stream data three important techniques are proposed for efficient and effective computation of stream cubes first tilted time frame model is proposed as multi resolution model to register time related data the more recent data are registered at finer resolution whereas the more distant data are registered at coarser resolution this design reduces the overall storage of time related data and adapts nicely to the data analysis tasks commonly encountered in practice second instead of materializing cuboids at all levels we propose to maintain small number of critical layers flexible analysis can be efficiently performed based on the concept of observation layer and minimal interesting layer third an efficient stream data cubing algorithm is developed which computes only the layers cuboids along popular path and leaves the other cuboids for query driven on line computation based on this design methodology stream data cube can be constructed and maintained incrementally with reasonable amount of memory computation cost and query response time this is verified by our substantial performance study
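a minimal sketch of the tilted time frame idea follows, assuming hypothetical granularities (quarters, hours, days, months) and numeric measures that can simply be summed when rolled up; the actual frame design may use different granularities and aggregates.

```python
# recent measurements are kept at fine resolution and rolled up into coarser
# units as they age, so total storage stays small and bounded.
class TiltedTimeFrame:
    LEVELS = [("quarter", 4), ("hour", 24), ("day", 31), ("month", 12)]

    def __init__(self):
        self.frames = [[] for _ in self.LEVELS]

    def register(self, value):
        self._push(0, value)

    def _push(self, level, value):
        frame = self.frames[level]
        frame.append(value)
        _, capacity = self.LEVELS[level]
        if len(frame) == capacity:
            rolled = sum(frame)              # e.g. 4 quarters become one hourly total
            frame.clear()
            if level + 1 < len(self.frames):
                self._push(level + 1, rolled)
```

registering sixteen quarter values, for instance, leaves the quarter frame empty and the hour frame holding four hourly totals.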
in this paper we present framework for design space exploration of network processor that incorporates parameterisation power and cost analysis this method utilises multi objective evolutionary algorithms and object oriented analysis and design using this approach an engineer specifies certain hard and soft performance requirements for multi processor system and allows it to be generated automatically by competitive evolution optimisation thus obviating the need for detailed design to make the proposal concrete we use the intel ixp network processor as baseline complex system design and show how various improvements can be made to this architecture by evolutionary competitive design various approaches to multi objective optimisation darwin lamarck baldwin etc are compared and contrasted in their ability to generate architectures meeting various constraints we also present an assessment of proposed architecture with reference to four different packet processing roles the merits of an island clocking scheme versus common clocking scheme are also discussed our paper highlights the flexibility that this framework bestows on the designer along with the potential to achieve cost savings and performance improvement
cql continuous query language is supported by the stream prototype data stream management system dsms at stanford cql is an expressive sql based declarative language for registering continuous queries against streams and stored relations we begin by presenting an abstract semantics that relies only on black box mappings among streams and relations from these mappings we define precise and general interpretation for continuous queries cql is an instantiation of our abstract semantics using sql to map from relations to relations window specifications derived from sql to map from streams to relations and three new operators to map from relations to streams most of the cql language is operational in the stream system we present the structure of cql’s query execution plans as well as details of the most important components operators interoperator queues synopses and sharing of components among multiple operators and queries examples throughout the paper are drawn from the linear road benchmark recently proposed for dsmss we also curate public repository of data stream applications that includes wide variety of queries expressed in cql the relative ease of capturing these applications in cql is one indicator that the language contains an appropriate set of constructs for data stream processing
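the two mapping directions named above can be illustrated with a small Python sketch; the window width, tuple representation, and Istream-style operator below are simplifications for exposition, not the STREAM system's implementation.

```python
from collections import deque

class SlidingWindow:                       # stream -> relation (time-based window)
    def __init__(self, width):
        self.width = width
        self.buf = deque()                 # (timestamp, tuple)

    def insert(self, ts, tup):
        self.buf.append((ts, tup))
        while self.buf and self.buf[0][0] <= ts - self.width:
            self.buf.popleft()             # expire tuples older than the window
        return {t for _, t in self.buf}    # relation contents at time ts

class IStream:                             # relation -> stream (newly inserted tuples)
    def __init__(self):
        self.prev = set()

    def step(self, relation):
        inserted = relation - self.prev
        self.prev = set(relation)
        return inserted
```

a Dstream or Rstream style operator would be analogous, emitting the deleted tuples or the whole relation at each instant.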
when integrating data from autonomous sources exact matches of data items that represent the same real world object often fail due to lack of common keys yet in many cases structural information is available and can be used to match such data as running example we use residential address information addresses are hierarchical structures and are present in many databases often they are the best if not only relationship between autonomous data sources typically the matching has to be approximate since the representations in the sources differ we propose pq grams to approximately match hierarchical information from autonomous sources we define the pq gram distance between ordered labeled trees as an effective and efficient approximation of the well known tree edit distance we analyze the properties of the pq gram distance and compare it with the edit distance and alternative approximations experiments with synthetic and real world data confirm the analytic results and the scalability of our approach
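the construction sketched below follows the usual pq-gram formulation (p ancestors, q consecutive children, dummy label '*') together with the standard bag-overlap form of the distance; the parameter defaults p=2 and q=3 are illustrative choices, not values prescribed above.

```python
from collections import Counter, deque

def pq_profile(tree, p=2, q=3):
    # tree is a (label, [children]) tuple; the profile is a bag of label tuples
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = (stem + (label,))[-p:]
        stem = ('*',) * (p - len(stem)) + stem          # pad missing ancestors
        if not children:
            profile[stem + ('*',) * q] += 1
        else:
            base = deque(('*',) * q, maxlen=q)          # sliding window over children
            for child in children:
                base.append(child[0])
                profile[stem + tuple(base)] += 1
                visit(child, stem)
            for _ in range(q - 1):                      # trailing dummy children
                base.append('*')
                profile[stem + tuple(base)] += 1

    visit(tree, ())
    return profile

def pq_gram_distance(t1, t2, p=2, q=3):
    a, b = pq_profile(t1, p, q), pq_profile(t2, p, q)
    inter = sum((a & b).values())                       # bag intersection
    union = sum((a + b).values())                       # bag (disjoint) union
    return 1.0 - 2.0 * inter / union if union else 0.0
```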
we present new programming model guesstimate for developing collaborative distributed systems the model allows atomic isolated operations that transform system from consistent state to consistent state and provides shared transactional store for collection of such operations executed by various machines in distributed system in addition to committed state which is identical in all machines in the distributed system guesstimate allows each machine to have replicated local copy of the state called guesstimated state so that operations on shared state can be executed locally without any blocking while also guaranteeing that eventually all machines agree on the sequences of operations executed thus each operation is executed multiple times once at the time of issue when it updates the guesstimated state of the issuing machine once when the operation is committed atomically to the committed state of all machines and several times in between as the guesstimated state converges toward the committed state while we expect the results of these executions of the operation to be identical most of the time in the class of applications we study it is possible for an operation to succeed the first time when it is executed on the guesstimated state and fail when it is committed guesstimate provides facilities that allow the programmer to deal with this potential discrepancy this paper presents our programming model its operational semantics its realization as an api in and our experience building collaborative distributed applications with this model
protocols that govern the interactions between software components are popular means to support the construction of correct component based systems previous studies have however almost exclusively focused on static component systems that are not subject to evolution evolution of component based systems with explicit interaction protocols can be defined quite naturally using aspects in the sense of aop that modify component protocols major question then is whether aspect based evolutions preserve fundamental correctness properties such as compatibility and substitutability relations between software components in this paper we discuss how such correctness properties can be proven in the presence of aspect languages that allow matching of traces satisfying interaction protocols and enable limited modifications to protocols we show how common evolutions of distributed components can be modeled using vpa based aspects and be proven correct directly in terms of properties of operators of the aspect language we first present several extensions to an existing language for vpa based aspects that facilitate the evolution of component systems we then discuss different proof techniques for the preservation of composition properties of component based systems that are subject to evolution using protocol modifying aspects
transactional memory tm thread level speculation tls and checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple cooperating speculative threads in these environments correctly maintaining data dependences across threads requires mechanisms for disambiguating addresses across threads invalidating stale cache state and making committed state visible these mechanisms are both conceptually involved and hard to implement in this paper we present bulk novel approach to simplify these mechanisms the idea is to hash encode thread’s access information in concise signature and then support in hardware signature operations that efficiently process sets of addresses such operations implement the mechanisms described bulk operations are inexact but correct and provide substantial conceptual and implementation simplicity we evaluate bulk in the context of tls using specint codes and tm using multithreaded java workloads despite its simplicity bulk has competitive performance with more complex schemes we also find that signature configuration is key design parameter
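a minimal sketch of the signature idea follows, assuming a 1024-bit vector and two simple multiplicative hashes (both arbitrary choices): addresses are hash-encoded into a fixed-size bit vector, so membership and conflict tests become bitwise operations that may report false positives but never false negatives, matching the inexact-but-correct property described above.

```python
class Signature:
    BITS = 1024
    SEEDS = (0x9E3779B9, 0x85EBCA6B)       # two simple multiplicative hash functions

    def __init__(self):
        self.bits = 0

    def insert(self, addr):
        for seed in self.SEEDS:
            self.bits |= 1 << ((addr * seed) % self.BITS)

    def contains(self, addr):              # possible member (false positives allowed)
        return all((self.bits >> ((addr * seed) % self.BITS)) & 1
                   for seed in self.SEEDS)

    def intersects(self, other):           # conservative cross-thread conflict test
        return (self.bits & other.bits) != 0

    def union(self, other):                # e.g. merging access sets on commit
        merged = Signature()
        merged.bits = self.bits | other.bits
        return merged
```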
we develop notion of higher order connector towards supporting the systematic construction of architectural connectors for software design higher order connector takes connectors as parameters and allows for services such as security protocols and fault tolerance mechanisms to be superposed over the interactions that are handled by the connectors passed as actual arguments the notion is first illustrated over community parallel program design language that we have been using for formalizing aspects of architectural design formal algebraic semantics is then presented which is independent of any architectural description language finally we discuss how our results can impact software design methods and tools
compilation for embedded processors can be either aggressive time consuming cross compilation or just in time embedded and usually dynamic the heuristics used in dynamic compilation are highly constrained by limited resources time and memory in particular recent results on the ssa form open promising directions for the design of new register allocation heuristics for embedded systems and especially for embedded compilation in particular heuristics based on tree scan with two separated phases one for spilling then one for coloring coalescing seem good candidates for designing memory friendly fast and competitive register allocators still also because of the side effect on power consumption the minimization of loads and stores overhead spilling problem is an important issue this paper provides an exhaustive study of the complexity of the spill everywhere problem in the context of the ssa form unfortunately contrary to our initial hopes many of the questions we raised lead to np completeness results we identify some polynomial cases but that are impractical in jit context nevertheless they can give hints to simplify formulations for the design of aggressive allocators
this paper describes new method for self calibration of camera with constant internal parameters under circular motion using one sequence and two images captured with different camera orientations unlike the previous method in which three circular motion sequences are needed with known motion the new method computes the rotation angles and the projective reconstructions of the sequence and the images with circular constraint enforced which is called circular projective reconstruction using factorization based method it is then shown that the images of the circular points of each circular projective reconstruction can be readily obtained subsequently the image of the absolute conic and the calibration matrix of the camera can be determined experiments on both synthetic and real image sequence are given showing the accuracy and robustness of the new algorithm
the lack of positive results on supervised domain adaptation for wsd has cast some doubts on the utility of hand tagging general corpora and thus developing generic supervised wsd systems in this paper we show for the first time that our wsd system trained on general source corpus bnc and the target corpus obtains up to error reduction when compared to system trained on the target corpus alone in addition we show that as little as of the target corpus when supplemented with the source corpus is sufficient to obtain the same results as training on the full target data the key for success is the use of unlabeled data with svd combination of kernels and svm
when building service oriented systems it is often the case that existing web services do not perfectly match user requirements in target systems to achieve smooth integration and high reusability of web services mechanisms to support automated evolution of web services are highly in demand this paper advocates achieving the above evolution by applying highly automated aspect oriented adaptation approach to the underlying components of web services by generating and then applying the adaptation aspects under designed weaving process according to specific adaptation requirements an expandable library of reusable adaptation aspects at multiple abstraction levels has been developed prototype tool is developed to scale up the approach
localization is crucial to many applications in wireless sensor networks in this article we propose range free anchor based localization algorithm for mobile wireless sensor networks that builds upon the monte carlo localization algorithm we concentrate on improving the localization accuracy and efficiency by making better use of the information sensor node gathers and by drawing the necessary location samples faster to do so we constrain the area from which samples are drawn by building box that covers the region where anchors radio ranges overlap this box is the region of the deployment area where the sensor node is localized simulation results show that localization accuracy is improved by minimum of and by maximum of average for varying node speeds when considering nodes with knowledge of at least three anchors the coverage is also strongly affected by speed and its improvement ranges from to average finally the processing time is reduced by for similar localization accuracy
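the anchor-box constraint described above can be sketched as follows; r is the assumed radio range, and the rejection step keeps only samples actually inside every anchor's range (the weighting and resampling steps of monte carlo localization are omitted).

```python
import random

def anchor_box(anchors, r):
    # anchors: list of (x, y) positions of anchors heard this round
    xmin = max(x - r for x, _ in anchors)
    xmax = min(x + r for x, _ in anchors)
    ymin = max(y - r for _, y in anchors)
    ymax = min(y + r for _, y in anchors)
    return None if xmin > xmax or ymin > ymax else (xmin, xmax, ymin, ymax)

def draw_samples(anchors, r, n=50, max_tries=5000):
    box = anchor_box(anchors, r)
    if box is None:
        return []                          # inconsistent observations
    xmin, xmax, ymin, ymax = box
    samples = []
    for _ in range(max_tries):
        if len(samples) == n:
            break
        x, y = random.uniform(xmin, xmax), random.uniform(ymin, ymax)
        # keep only candidates within radio range of every heard anchor
        if all((x - ax) ** 2 + (y - ay) ** 2 <= r * r for ax, ay in anchors):
            samples.append((x, y))
    return samples
```

because the box is much smaller than the whole deployment area, fewer random draws are rejected, which is where the faster sampling comes from.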
contextual advertising on web pages has become very popular recently and it poses its own set of unique text mining challenges often advertisers wish to either target or avoid some specific content on web pages which may appear only in small part of the page learning for these targeting tasks is difficult since most training pages are multi topic and need expensive human labeling at the sub document level for accurate training in this paper we investigate ways to learn for sub document classification when only page level labels are available these labels only indicate if the relevant content exists in the given page or not we propose the application of multiple instance learning to this task to improve the effectiveness of traditional methods we apply sub document classification to two different problems in contextual advertising one is sensitive content detection where the advertiser wants to avoid content relating to war violence pornography etc even if they occur only in small part of page the second problem involves opinion mining from review sites the advertiser wants to detect and avoid negative opinion about their product when positive negative and neutral sentiments co exist on page in both these scenarios we present experimental results to show that our proposed system is able to get good block level labeling for free and improve the performance of traditional learning methods
current relational databases have been developed in order to improve the handling of stored data however there are some types of information that have to be analysed for which no suitable tools are available these new types of data can be represented and treated as constraints allowing set of data to be represented through equations inequations and boolean combinations of both to this end constraint databases were defined and some prototypes were developed since there are aspects that can be improved we propose new architecture called labelled object relational constraint database lorcdb this provides more expressiveness since the database is adapted in order to support more types of data instead of the data having to be adapted to the database in this paper the projection operator of sql is extended so that it works with linear and polynomial constraints and variables of constraints in order to optimize query evaluation efficiency some strategies and algorithms have been used to obtain an efficient query plan most work on constraint databases uses spatiotemporal data as case studies however this paper proposes model based diagnosis since it is highly potential research area and model based diagnosis permits more complicated queries than spatiotemporal examples our architecture permits the queries over constraints to be defined over different sets of variables by using symbolic substitution and elimination of variables
the traditional assumption about memory is that read returns the value written by the most recent write however in shared memory multiprocessor several processes independently and simultaneously submit reads and writes resulting in partial order of memory operations in this partial order the definition of most recent write may be ambiguous memory consistency models have been developed to specify what values may be returned by read given that memory operations may only be partially ordered before this work consistency models were defined independently each model followed set of rules which was separate from the rules of every other model in our work we have defined set of four consistency properties any subset of the four properties yields set of rules which constitute consistency model every consistency model previously described in the literature can be defined based on our four properties therefore we present these properties as unified theory of shared memory consistency our unified theory provides several benefits first we claim that these four properties capture the underlying structure of memory consistency that is the goal of memory consistency is to ensure certain declarative properties which can be intuitively understood by programmer and hence allow him or her to write correct program our unified theory provides uniform formal definition of all previously described consistency models and in addition some combinations of properties produce new models that have not yet been described we believe these new models will prove to be useful because they are based on declarative properties which programmers desire to be enforced finally we introduce the idea of selecting consistency model as an on line activity before our work shared memory program would run start to finish under single consistency model our unified theory allows the consistency model to change as the program runs while maintaining consistent definition of what values may be returned by each read
we consider the problem of dynamically assigning application sessions of mobile users or user groups to service points such assignments must balance the trade off between two conflicting goals on the one hand we would like to connect user to the closest server in order to reduce network costs and service latencies on the other hand we would like to minimize the number of costly session migrations or handoffs between service points we tackle this problem using two approaches first we employ algorithmic online optimization to obtain algorithms whose worst case performance is within factor of the optimal next we extend them with opportunistic heuristics that achieve near optimal practical average performance and scalability we conduct case studies of two settings where such algorithms are required wireless mesh networks with mobile users and wide area groupware applications with or without mobility
in this paper we discuss which properties of formally verified component are preserved when the component is changed due to an adaption to new use more specifically we will investigate when temporal logic property of an object class is preserved under modification or extension of the class with new features to this end we use the slicing technique from program analysis which provides us with representation of the dependencies within the class in the form of program dependence graph this graph can be used to determine the effect of change to the class’s behaviour and thus to the validity of temporal logic formula
information integration systems provide uniform interfaces to varieties of heterogeneous information sources for query answering in such systems the current generation of query answering algorithms in local as view source centric information integration systems all produce what has been thought of as the best obtainable answer given the circumstances that the source centric approach introduces incomplete information into the virtual global relations however this best obtainable answer does not include all information that can be extracted from the sources because it does not allow partial information neither does the best obtainable answer allow for composition of queries meaning that querying result of previous query will not be equivalent to the composition of the two queries in this paper we provide foundation for information integration based on the algebraic theory of incomplete information our framework allows us to define the semantics of partial facts and introduce the notion of the exact answer that is the answer that includes partial facts we show that querying under the exact answer semantics is compositional we also present two methods for actually computing the exact answer the first method is tableau based and it is generalization of the inverse rules approach the second much more efficient method is generalization of the rewriting approach and it is based on partial containment mappings introduced in the paper
this article presents the design implementation and evaluation of enviromic low cost experimental prototype of novel distributed acoustic monitoring storage and trace retrieval system designed for disconnected operation our intended use of acoustic monitoring is to study animal populations in the wild since permanent connection to the outside world is not assumed and due to the relatively large size of audio traces the system must optimally exploit available resources such as energy and network storage capacity towards that end we design prototype and evaluate distributed algorithms for coordinating acoustic recording tasks reducing redundancy of data stored by nearby sensors filtering out silence and balancing storage utilization in the network for experimentation purposes we implement enviromic on tinyos based platform and systematically evaluate its performance through both indoor testbed experiments and an outdoor deployment results demonstrate up to four fold improvement in effective storage capacity of the network compared to uncoordinated recording
we investigate the price of selfish routing in non cooperative networks in terms of the coordination and bicriteria ratios in the recently introduced game theoretic network model of koutsoupias and papadimitriou we present the first thorough study of this model for general monotone families of cost functions and for cost functions from queueing theory our main results can be summarized as follows we give precise characterization of cost functions having bounded unbounded coordination ratio for example cost functions that describe the expected delay in queueing systems have an unbounded coordination ratio we show that an unbounded coordination ratio implies additionally an extremely high performance degradation under bicriteria measures we demonstrate that the price of selfish routing can be as high as bandwidth degradation by factor that is linear in the network size we separate the game theoretic integral allocation model from the fractional flow model by demonstrating that even very small in fact negligible amount of integrality can lead to dramatic performance degradation we unify recent results on selfish routing under different objectives by showing that an unbounded coordination ratio under the min max objective implies an unbounded coordination ratio under the average cost or total latency objective and vice versa our special focus lies on cost functions describing the behavior of web servers that can open only limited number of tcp connections in particular we compare the performance of queueing systems that serve all incoming requests with servers that reject requests in case of overload from the result presented in this paper we conclude that queuing systems without rejection cannot give any reasonable guarantee on the expected delay of requests under selfish routing even when the injected load is far away from the capacity of the system in contrast web server farms that are allowed to reject requests can guarantee high quality of service for every individual request stream even under relatively high injection rates
packet classification is crucial for the internet to provide more value added services and guaranteed quality of service besides hardware based solutions many software based classification algorithms have been proposed however classifying at gbps speed or higher is challenging problem and it is still one of the performance bottlenecks in core routers in general classification algorithms face the same challenge of balancing between high classification speed and low memory requirements this paper proposes modified recursive flow classification rfc algorithm bitmap rfc which significantly reduces the memory requirements of rfc by applying bitmap compression technique to speed up classification we experiment on exploiting the architectural features of many core and multithreaded architecture from algorithm design to algorithm implementation as result bitmap rfc strikes good balance between speed and space it can not only keep high classification speed but also reduce memory space significantly this paper investigates the main npu software design aspects that have dramatic performance impacts on any npu based implementations memory space reduction instruction selection data allocation task partitioning and latency hiding we experiment with an architecture aware design principle to guarantee the high performance of the classification algorithm on an npu implementation the experimental results show that the bitmap rfc algorithm achieves gbps speed or higher and has good scalability on intel ixp np
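the bitmap compression idea for one equivalence-class table can be sketched as below: a bitmap marks where a run of identical entries starts, and a lookup becomes a popcount plus one array read; real implementations chunk the bitmap and precompute per-chunk popcounts, which this sketch omits.

```python
class BitmapTable:
    def __init__(self, table):
        # table: list of equivalence-class IDs with long runs of repeats
        self.bitmap = 0          # bit i set means a new run starts at position i
        self.values = []         # one stored value per run
        prev = object()
        for i, v in enumerate(table):
            if v != prev:
                self.bitmap |= 1 << i
                self.values.append(v)
                prev = v

    def lookup(self, index):
        # count run starts at positions <= index; that count minus one is the run id
        mask = (1 << (index + 1)) - 1
        run = bin(self.bitmap & mask).count("1") - 1
        return self.values[run]
```

for example BitmapTable([5, 5, 5, 7, 7, 9]).lookup(4) returns 7 while storing only three values instead of six.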
the results of the cade atp system competition casc are presented
heterogeneous multi processor soc platforms bear the potential to optimize conflicting performance flexibility and energy efficiency constraints as imposed by demanding signal processing and networking applications however in order to take advantage of the available processing and communication resources an optimal mapping of the application tasks onto the platform resources is of crucial importance in this paper we propose systemc based simulation framework which enables the quantitative evaluation of application to platform mappings by means of an executable performance model key element of our approach is configurable event driven virtual processing unit to capture the timing behavior of multi processor multi threaded mp soc platforms the framework features an xml based declarative construction mechanism of the performance model to significantly accelerate the navigation in large design spaces the capabilities of the proposed framework in terms of design space exploration is presented by case study of commercially available mp soc platform for networking applications focussing on the application to architecture mapping our introduced framework highlights the potential for optimization of an efficient design space exploration environment
most of the research on text categorization has focused on classifying text documents into set of categories with no structural relationships among them flat classification however in many information repositories documents are organized in hierarchy of categories to support thematic search by browsing topics of interests the consideration of the hierarchical relationship among categories opens several additional issues in the development of methods for automated document classification questions concern the representation of documents the learning process the classification process and the evaluation criteria of experimental results they are systematically investigated in this paper whose main contribution is general hierarchical text categorization framework where the hierarchy of categories is involved in all phases of automated document classification namely feature selection learning and classification of new document an automated threshold determination method for classification scores is embedded in the proposed framework it can be applied to any classifier that returns degree of membership of document to category in this work three learning methods are considered for the construction of document classifiers namely centroid based naïve bayes and svm the proposed framework has been implemented in the system webclassiii and has been tested on three datasets yahoo dmoz rcv which present variety of situations in terms of hierarchical structure experimental results are reported and several conclusions are drawn on the comparison of the flat vs the hierarchical approach as well as on the comparison of different hierarchical classifiers the paper concludes with review of related work and discussion of previous findings vs our findings
many online communities are emerging that like wikipedia bring people together to build community maintained artifacts of lasting value calvs motivating people to contribute is key problem because the quantity and quality of contributions ultimately determine calv’s value we pose two related research questions how does intelligent task routing matching people with work affect the quantity of contributions how does reviewing contributions before accepting them affect the quality of contributions field experiment with contributors shows that simple intelligent task routing algorithms have large effects we also model the effect of reviewing contributions on the value of calvs the model predicts and experimental data shows that value grows more slowly with review before acceptance it also predicts surprisingly that calv will reach the same final value whether contributions are reviewed before or after they are made available to the community
with many daily tasks now performed on the internet productivity and efficiency in working with web pages have become transversal necessities for all users many of these tasks involve the inputting of user information obligating the user to interact with webform research has demonstrated that productivity depends largely on users personal characteristics implying that it will vary from user to user the webform development process must therefore include modeling of its intended users to ensure the interface design is appropriate taking all potential users into account is difficult however primarily because their identity is unknown and some may be effectively excluded by the final design such discrimination can be avoided by incorporating rules that allow webforms to adapt automatically to the individual user’s characteristics the principal one being the person’s culture in this paper we report two studies that validate this option we begin by determining the relationships between user’s cultural dimension scores and their behavior when faced with webform we then validate the notion that rules based on these relationships can be established for the automatic adaptation of webform in order to reduce the time taken to complete it we conclude that the automatic webform adaptation to the cultural dimensions of users improves their performance
statistical machine translation smt treats the translation of natural language as machine learning problem by examining many samples of human produced translation smt algorithms automatically learn how to translate smt has made tremendous strides in less than two decades and new ideas are constantly introduced this survey presents tutorial overview of the state of the art we describe the context of the current research and then move to formal problem description and an overview of the main subproblems translation modeling parameter estimation and decoding along the way we present taxonomy of some different approaches within these areas we conclude with an overview of evaluation and discussion of future directions
handling the evolving permanent contact of deformable objects leads to collision detection problem of high computing cost situations in which this type of contact happens are becoming more and more present with the increasing complexity of virtual human models especially for the emerging medical applications in this context we propose novel collision detection approach to deal with situations in which soft structures are in constant but dynamic contact which is typical of biological elements our method proceeds in two stages first in preprocessing stage mesh is chosen under certain conditions as reference mesh and is spherically sampled in the collision detection stage the resulting table is exploited for each vertex of the other mesh to obtain in constant time its signed distance to the fixed mesh the two working hypotheses for this approach to succeed are typical of the deforming anatomical systems we target first the two meshes retain layered configuration with respect to central point and second the fixed mesh tangential deformation is bounded by the spherical sampling resolution within this context the proposed approach can handle large relative displacements reorientations and deformations of the mobile mesh we illustrate our method in comparison with other techniques on biomechanical model of the human hip joint
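a simplified sketch of the two-stage scheme follows, assuming the reference mesh is approximated by its vertices and the table keeps one radius per direction bin; the sampling of the actual surface and the handling of bounded tangential deformation are not reproduced here.

```python
import math

def direction_bin(v, center, n_theta=64, n_phi=128):
    # map a 3D point to a (theta, phi) bin around the central point, plus its radius
    dx, dy, dz = (v[i] - center[i] for i in range(3))
    r = math.sqrt(dx * dx + dy * dy + dz * dz) or 1e-12
    theta = math.acos(max(-1.0, min(1.0, dz / r)))            # [0, pi]
    phi = math.atan2(dy, dx) % (2 * math.pi)                  # [0, 2*pi)
    return (min(int(theta / math.pi * n_theta), n_theta - 1),
            min(int(phi / (2 * math.pi) * n_phi), n_phi - 1)), r

def build_radius_table(ref_vertices, center):
    # preprocessing: surface radius of the reference mesh per direction bin
    # (approximated here by the farthest sampled vertex in each bin)
    table = {}
    for v in ref_vertices:
        bin_id, r = direction_bin(v, center)
        table[bin_id] = max(table.get(bin_id, 0.0), r)
    return table

def signed_distance(v, center, table):
    # collision query: one table lookup per vertex of the mobile mesh
    bin_id, r = direction_bin(v, center)
    surface_r = table.get(bin_id)
    return None if surface_r is None else r - surface_r       # < 0 means penetration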
web sites must often service wide variety of clients thus it is inevitable that web site will allow some visitors to find their information quickly while other visitors have to follow many links to get to the information that they need worse as web sites evolve they may get worse over time so that all visitors have to follow many links to find the information that they need this paper describes an extensible system that analyzes web logs to find and exploit opportunities for improving the navigation of web site the system is extensible in that the inefficiencies that it finds and eliminates are not predetermined to search for new kind of inefficiency web site administrators can provide pattern in language designed specifically for this that finds and eliminates the new inefficiency
this paper proposes new single rate multicast congestion control scheme named pgmtcc which has been implemented and investigated in pgm the primary idea of pgmtcc is to extend sack tcp congestion control mechanism to multicast in order to make multicast perform almost the same as sack tcp under all kinds of network conditions to achieve this goal first of all the sender should accurately select receiver with the worst throughput as representative acker by simplified equation of tcp throughput then the sack tcp congestion control mechanism with some modifications to be adapted to multicast is deployed to take charge of congestion control between the sender and the acker moreover in our scheme the problem of the feedback suppression is considered and solved by selective suppression mechanism of feedback ns is used to test and investigate the performance of our scheme as expected pgmtcc performs almost like sack tcp under all kinds of conditions we believe that it is tcp friendly robust and scalable
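the abstract does not give the exact simplified throughput equation, so the sketch below uses the widely known approximation T ≈ MSS / (RTT · sqrt(2p/3)) to rank receivers and pick the worst one as the acker; the actual formula used in pgmtcc may differ.

```python
import math

def tcp_throughput(mss_bytes, rtt_s, loss_rate):
    # simplified steady-state TCP throughput estimate (bytes per second)
    if loss_rate <= 0:
        return float("inf")          # no observed loss: treat as unconstrained
    return mss_bytes / (rtt_s * math.sqrt(2.0 * loss_rate / 3.0))

def pick_acker(receivers):
    # receivers: iterable of (receiver_id, mss_bytes, rtt_s, loss_rate);
    # the receiver with the lowest estimated throughput becomes the acker
    worst = min(receivers, key=lambda r: tcp_throughput(r[1], r[2], r[3]))
    return worst[0]
```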
hierarchical text categorization htc is the task of generating usually by means of supervised learning algorithms text classifiers that operate on hierarchically structured classification schemes notwithstanding the fact that most large sized classification schemes for text have hierarchical structure so far the attention of text classification researchers has mostly focused on algorithms for flat classification ie algorithms that operate on non hierarchical classification schemes these algorithms once applied to hierarchical classification problem are not capable of taking advantage of the information inherent in the class hierarchy and may thus be suboptimal in terms of efficiency and or effectiveness in this paper we propose treeboostmh multi label htc algorithm consisting of hierarchical variant of adaboostmh very well known member of the family of boosting learning algorithms treeboostmh embodies several intuitions that had arisen before within htc eg the intuitions that both feature selection and the selection of negative training examples should be performed locally ie by paying attention to the topology of the classification scheme it also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated locally all these intuitions are embodied within treeboostmh in an elegant and simple way ie by defining treeboostmh as recursive algorithm that uses adaboostmh as its base step and that recurs over the tree structure we present the results of experimenting treeboostmh on three htc benchmarks and discuss analytically its computational cost
to defend against multi step intrusions in high speed networks efficient algorithms are needed to correlate isolated alerts into attack scenarios existing correlation methods usually employ an in memory index for fast searches among received alerts with finite memory the index can only be built on limited number of alerts inside sliding window knowing this fact an attacker can prevent two attack steps from both falling into the sliding window by either passively delaying the second step or actively injecting bogus alerts between the two steps in either case the correlation effort is defeated in this paper we first address the above issue with novel queue graph qg approach instead of searching all the received alerts for those that prepare for new alert we only search for the latest alert of each type the correlation between the new alert and other alerts is implicitly represented using the temporal order between alerts consequently our approach can correlate alerts that are arbitrarily far away and it has linear in the number of alert types time complexity and quadratic memory requirement then we extend the basic qg approach to unified method to hypothesize missing alerts and to predict future alerts finally we propose compact representation for the result of alert correlation empirical results show that our method can fulfill correlation tasks faster than an ids can report alerts hence the method is promising solution for administrators to monitor and predict the progress of intrusions and thus to take appropriate countermeasures in timely manner
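the bookkeeping behind the queue-graph idea can be sketched as follows; prepares_for is a hypothetical map from an alert type to the types it prepares for, and only the latest alert of each type is retained, which is what bounds memory by the number of alert types rather than the number of alerts.

```python
class QueueGraphCorrelator:
    def __init__(self, prepares_for):
        self.prepares_for = prepares_for   # alert type -> set of types it prepares for
        self.latest = {}                   # alert type -> most recent alert of that type

    def process(self, alert):
        # alert is a dict with at least 'type' and 'time'
        links = []
        for prev_type, prev_alert in self.latest.items():
            if alert['type'] in self.prepares_for.get(prev_type, set()):
                links.append((prev_alert, alert))      # prev alert prepares for the new one
        self.latest[alert['type']] = alert             # older alerts of this type are dropped
        return links
```

because correlation only consults one stored alert per type, an attacker cannot push an earlier step out of scope by delaying the next step or by flooding bogus alerts in between.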
our ability to generate ever larger increasingly complex data has established the need for scalable methods that identify and provide insight into important variable trends and interactions query driven methods are among the small subset of techniques that are able to address both large and highly complex datasets this paper presents new method that increases the utility of query driven techniques by visually conveying statistical information about the trends that exist between variables in query in this method correlation fields created between pairs of variables are used with the cumulative distribution functions of variables expressed in user’s query this integrated use of cumulative distribution functions and correlation fields visually reveals with respect to the solution space of the query statistically important interactions between any three variables and allows for trends between these variables to be readily identified we demonstrate our method by analyzing interactions between variables in two flame front simulations
similarity search is widely used in multimedia retrieval systems to find the most similar ones for given object some similarity measures however are not metric leading to existing metric index structures cannot be directly used to address this issue we propose simulated annealing based technique to derive optimized mapping functions that transfer non metric measures into metric and still preserve the original similarity orderings then existing metric index structures can be used to speed up similarity search by exploiting the triangular inequality property the experimental study confirms the efficacy of our approach
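a minimal sketch of the general idea under stated assumptions: the candidate mappings are order preserving power transforms f(d) = d**p and the annealing energy is the number of sampled triples that still violate the triangle inequality; the paper's actual family of mapping functions and objective may differ

    import math, random

    def violations(dist, pts, p, samples=2000):
        # count sampled triples that violate the triangle inequality after mapping
        bad = 0
        for _ in range(samples):
            a, b, c = random.sample(pts, 3)
            ab, bc, ac = dist(a, b) ** p, dist(b, c) ** p, dist(a, c) ** p
            if ac > ab + bc:
                bad += 1
        return bad

    def anneal(dist, pts, steps=200, t0=1.0):
        p = 1.0
        e = best_e = violations(dist, pts, p)
        best_p = p
        for i in range(steps):
            t = t0 * (1 - i / steps) + 1e-3
            cand = max(1e-3, p + random.gauss(0, 0.1))    # monotone for any p > 0
            ce = violations(dist, pts, cand)
            if ce < e or random.random() < math.exp((e - ce) / (t * 50 + 1e-9)):
                p, e = cand, ce
                if e < best_e:
                    best_p, best_e = p, e
        return best_p

    # toy non-metric: squared euclidean distance violates the triangle inequality
    pts = [(random.random(), random.random()) for _ in range(100)]
    sqdist = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    print("optimized exponent:", anneal(sqdist, pts))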
large body of research literature has focused on improving the performance of longest prefix match ip lookup more recently embedded memory based architectures have been proposed which deliver very high lookup and update throughput these architectures often use pipeline of embedded memories where each stage stores single or set of levels of the lookup trie stream of lookup requests is issued into the pipeline one every cycle in order to achieve high throughput most recently baboescu et al have proposed novel architecture which uses circular memory pipeline and dynamically maps parts of the lookup trie to different stages in this paper we extend this approach with an architecture called circular adaptive and monotonic pipeline camp which is based upon the key observation that circular pipeline allows decoupling the number of pipeline stages from the number of levels in the trie this provides much more flexibility in mapping nodes of the lookup trie to the stages the flexibility in turn improves the memory utilization and also reduces the total memory and power consumption the flexibility comes at cost however since the requests are issued at an arbitrary stage they may get blocked if their entry stage is busy in an extreme case request may block for time equal to the pipeline depth which may severely affect the pipeline utilization we show that fairly straightforward techniques can ensure nearly full utilization of the pipeline these techniques coupled with an adaptive mapping of trie nodes to the circular pipeline create pipelined architecture which can operate at high rates irrespective of the trie size
recently keyword search has attracted great deal of attention in xml database it is hard to directly improve the relevancy of xml keyword search because lots of keyword matched nodes may not contribute to the results to address this challenge in this paper we design an adaptive xml keyword search approach called xbridge that can derive the semantics of keyword query and generate set of effective structured queries by analyzing the given keyword query and the schemas of xml data sources to efficiently answer keyword query we only need to evaluate the generated structured queries over the xml data sources with any existing xquery search engine in addition we extend our approach to process top keyword search based on the execution plan to be proposed the quality of the returned answers can be measured using the context of the keyword matched nodes and the contents of the nodes together the effectiveness and efficiency of xbridge is demonstrated with an experimental performance study on real xml data
the rapid advancement of world wide web web technology and constant need for attractive websites produce pages that hinder visually impaired users we assert that understanding how sighted users browse web pages can provide important information that will enhance web accessibility especially for visually impaired users we present an eye tracking study where sighted users browsing behaviour on nine web pages was investigated to determine how the page’s visual clutter is related to sighted users browsing patterns the results show that salient elements attract users attention first users spend more time on the main content of the page and users tend to fixate on the first three or four items on the menu lists common gaze patterns begin at the salient elements of the page move to the main content header right column and left column of the page and finish at the footer area we argue that the results should be used as the initial step for proposing guidelines that assist in designing and transforming web pages for an easier and faster access for visually impaired users
specification matching is technique that has been used to retrieve reusable components from reuse libraries the relationship between query specification and library specification is typically based on refinement where library specification matches query specification if the library specification is more detailed than the query specification reverse engineering is process of analyzing components and component interrelationships in order to construct descriptions of system at higher level of abstraction in this paper we define the concept of an abstraction match as basis for reverse engineering and show how the abstraction match can be used to facilitate process for generalizing specifications finally we apply the specification generalization technique to portion of nasa jpl ground based mission control system for unmanned flight systems
we introduce new programming language construct interactors supporting the agent oriented view that programming is dialog between simple self contained autonomous building blocks we define interactors as an abstraction of answer generation and refinement in logic engines resulting in expressive language extension and metaprogramming patterns as first step toward declarative semantics we sketch pure prolog specification showing that interactors can be expressed at source level in relatively simple and natural way interactors extend language constructs like ruby python and multiple coroutining block returns through yield statements and they can emulate the action of fold operations and monadic constructs in functional languages using the interactor api we describe at source level language extensions like dynamic databases and algorithms involving generation of infinite answer streams
while implicit relevance feedback irf algorithms exploit users interactions with information to customize support offered to users of search systems it is unclear how individual and task differences impact the effectiveness of such algorithms in this paper we describe study on the effect on retrieval performance of using additional information about the user and their search tasks when developing irf algorithms we tested four algorithms that use document display time to estimate relevance and tailored the threshold times ie the time distinguishing relevance from non relevance to the task the user combination of both or neither interaction logs gathered during longitudinal naturalistic study of online information seeking behavior are used as stimuli for the algorithms the findings show that tailoring display time thresholds based on task information improves irf algorithm performance but doing so based on user information worsens performance this has implications for the development of effective irf algorithms
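a minimal sketch of a display time based implicit relevance rule with task tailored thresholds, which is the flavour of algorithm the study compares; the per task median threshold and the field names are assumptions made only to keep the example self contained

    from statistics import median
    from collections import defaultdict

    def task_thresholds(log):
        # log entries: {"task": ..., "display_time": seconds}
        times = defaultdict(list)
        for entry in log:
            times[entry["task"]].append(entry["display_time"])
        return {task: median(ts) for task, ts in times.items()}

    def predict_relevant(entry, thresholds, default=30.0):
        # a page is judged relevant if it was displayed longer than the
        # threshold learned for its task type
        return entry["display_time"] >= thresholds.get(entry["task"], default)

    log = [
        {"task": "fact finding", "display_time": 12.0},
        {"task": "fact finding", "display_time": 45.0},
        {"task": "browsing", "display_time": 90.0},
        {"task": "browsing", "display_time": 200.0},
    ]
    th = task_thresholds(log)
    print(th, predict_relevant({"task": "browsing", "display_time": 120.0}, th))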
recently there has been growing interest in join query evaluation for scenarios in which inputs arrive at highly variable and unpredictable rates in such scenarios the focus shifts from completing the computation as soon as possible to producing prefix of the output as soon as possible to handle this shift in focus most solutions to date rely upon some combination of streaming binary operators and on the fly execution plan reorganization in contrast we consider the alternative of extending existing symmetric binary join operators to handle more than two inputs toward this end we have completed prototype implementation of multi way join operator which we term the mjoin operator and explored its performance our results show that in many instances the mjoin produces outputs sooner than any tree of binary operators additionally since mjoins are completely symmetric with respect to their inputs they can reduce the need for expensive runtime plan reorganization this suggests that supporting multiway joins in single symmetric streaming operator may be useful addition to systems that support queries over input streams from remote sites
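a minimal sketch of a symmetric multi-way hash join in the spirit of the mjoin operator described above: every arriving tuple is inserted into its own hash table and immediately probed against all other inputs, so output tuples are produced as soon as their last contributing input arrives; the key function and schema are illustrative

    from collections import defaultdict
    from itertools import product

    class MJoin:
        def __init__(self, n_inputs, key):
            self.key = key
            self.tables = [defaultdict(list) for _ in range(n_inputs)]

        def insert(self, which, tup):
            # insert the new tuple into its own hash table, then probe all other
            # inputs symmetrically; results are emitted as soon as they complete
            k = self.key(tup)
            self.tables[which][k].append(tup)
            others = [self.tables[i][k] for i in range(len(self.tables)) if i != which]
            if not all(others):
                return []
            return [(tup,) + combo for combo in product(*others)]

    mj = MJoin(3, key=lambda t: t[0])
    mj.insert(0, ("k1", "a"))
    mj.insert(1, ("k1", "b"))
    print(mj.insert(2, ("k1", "c")))   # joined as soon as the last input arrives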
to deal with the problem of insufficient labeled data in video object classification one solution is to utilize additional pairwise constraints that indicate the relationship between two examples ie whether these examples belong to the same class or not in this paper we propose discriminative learning approach which can incorporate pairwise constraints into conventional margin based learning framework different from previous work that usually attempts to learn better distance metrics or estimate the underlying data distribution the proposed approach can directly model the decision boundary and thus require fewer model assumptions moreover the proposed approach can handle both labeled data and pairwise constraints in unified framework in this work we investigate two families of pairwise loss functions namely convex and nonconvex pairwise loss functions and then derive three pairwise learning algorithms by plugging in the hinge loss and the logistic loss functions the proposed learning algorithms were evaluated using people identification task on two surveillance video data sets the experiments demonstrated that the proposed pairwise learning algorithms considerably outperform the baseline classifiers using only labeled data and two other pairwise learning algorithms with the same amount of pairwise constraints
etl extract transform load processing is filling an increasingly critical role in analyzing business data and in taking appropriate business actions based on the results as the volume of business data to be analyzed increases and quick responses are more critical for business success there are strong demands for scalable high performance etl processors in this paper we evaluate distributed data stream processing engine called system for those purposes based on the original motivation of building system as data stream processing engine we first perform qualitative study to see if the programming model of system is suitable for representing an etl workflow second we did performance studies with representative etl scenario through our series of experiments we found that the spade programming model and its runtime environment naturally fits the requirements of handling massive amounts of etl data in highly scalable manner
in order to display web pages designed for desktop sized monitors some small screen web browsers provide single column or thumbnail views both have limitations single column views affect page layouts and require users to scroll significantly more thumbnail views tend to reduce contained text beyond readability so differentiating visually similar areas requires users to zoom in this paper we present summary thumbnails thumbnail views enhanced with readable text fragments summary thumbnails help users identify viewed material and distinguish between visually similar areas in our user study participants located content in web pages about faster and with lower error rates when using the summary thumbnail interface than when using the single column interface and zoomed less than when using the thumbnail interface nine of the eleven participants preferred summary thumbnails over both the thumbnail and single column interfaces
the process of resource distribution and load balance of distributed pp network can be described as the process of mining supplement frequent patterns sfps from query transaction database with given minimum support minsup and minimum share support minsharesup each sfp includes core frequent pattern bfp used to draw other frequent or sub frequent items latter query returns subset of sfp as the result to realize the sfps mining this paper proposes the structure of sfp tree along with relative mining algorithms the main contribution includes describes the concept of supplement frequent pattern proposes the sfp tree along with frequency ascending order header table fp tree afp tree and conditional mix pattern tree cmp tree proposes the sfps mining algorithms based on sfp tree and conducts the performance experiment on both synthetic and real datasets the result shows the effectiveness and efficiency of the sfps mining algorithm based on sfp tree
with an increasingly mobile society and the worldwide deployment of mobile and wireless networks the wireless infrastructure can support many current and emerging healthcare applications this could fulfill the vision of pervasive healthcare or healthcare to anyone anytime and anywhere by removing locational time and other restraints while increasing both the coverage and the quality in this paper we present applications and requirements of pervasive healthcare wireless networking solutions and several important research problems the pervasive healthcare applications include pervasive health monitoring intelligent emergency management system pervasive health care data access and ubiquitous mobile telemedicine one major application in pervasive healthcare termed comprehensive health monitoring is presented in significant details using wireless networking solutions of wireless lans ad hoc wireless networks and cellular gsm infrastructure oriented networks many interesting challenges of comprehensive wireless health monitoring including context awareness reliability and autonomous and adaptable operation are also presented along with several high level solutions several interesting research problems have been identified and presented for future research
we present rule based framework for the development of scalable parallel high performance simulations for broad class of scientific applications with particular emphasis on continuum mechanics we take pragmatic approach to our programming abstractions by implementing structures that are used frequently and have common high performance implementations on distributed memory architectures the resulting framework borrows heavily from rule based systems for relational database models however limiting the scope to those parts that have obvious high performance implementation using our approach we demonstrate predictable performance behavior and efficient utilization of large scale distributed memory architectures on problems of significant complexity involving multiple disciplines
we have developed an automated confluence prover for term rewriting systems trss this paper presents theoretical and technical ingredients that have been used in our prover distinctive feature of our prover is incorporation of several divide and conquer criteria such as those for commutative toyama layer preserving ohlebusch and persistent aoto toyama combinations for trs to which direct confluence criteria do not apply the prover decomposes it into components and tries to apply direct confluence criteria to each component then the prover combines these results to infer the non confluence of the whole system to the best of our knowledge an automated confluence prover based on such an approach has been unknown
we present language for specifying web service interfaces web service interface puts three kinds of constraints on the users of the service first the interface specifies the methods that can be called by client together with types of input and output parameters these are called signature constraints second the interface may specify propositional constraints on method calls and output values that may occur in web service conversation these are called consistency constraints third the interface may specify temporal constraints on the ordering of method calls these are called protocol constraints the interfaces can be used to check first if two or more web services are compatible and second if web service can be safely substituted for web service the algorithm for compatibility checking verifies that two or more interfaces fulfill each others constraints the algorithm for substitutivity checking verifies that service demands fewer and fulfills more constraints than service
reconfigurable processors provide means to flexible and energy aware computing in this paper we present new scheme for runtime energy minimization remis as part of dynamically reconfigurable processor that is exposed to run time varying constraints like performance and footprint ie amount of reconfigurable fabric the scheme chooses an energy minimizing set of so called special instructions considering leakage dynamic and reconfiguration energy and then power gates temporarily unused subset of the special instruction set we provide comprehensive evaluation for different technologies ranging from nm to nm and thereby show that our scheme is technology independent ie it is beneficial for various technologies alike by means of an video encoder we demonstrate that for certain performance constraints our scheme applied to our in house reconfigurable processor achieves an allover energy saving of up to avg compared to performance maximizing scheme we also demonstrate that our scheme is equally beneficial to various other state of the art reconfigurable processor architectures like molen where it achieves energy savings of up to avg at nm we have employed an encoder within this paper as an application in order to demonstrate the strengths of our scheme since the complexity and run time unpredictability present challenging scenario for state of the art architectures
ontologies provide powerful tool for distributed agent based information systems however in their raw form they can be difficult for users to interact with directly different query architectures use structured query languages as an interface but these still require the users to have an expert understanding of the underlying ontologies by using an open hypermedia model as an interface to an ontological information space users can interact with such system using familiar browsing and navigation techniques which are translated into queries over the underlying information coupled with dynamic document generation this allows complicated queries to be made without the user having to interact directly with the ontologies our key contribution is notion of hypermedia links between concepts and queries within an ontological information space this approach is demonstrated with dynamic cv application built around the sofar agent framework and the fundamental open hypermedia model fohm in addition to abstracting the interface open hypermedia allows alternative linkbases to be used to represent different query recipes providing different views and navigational experiences to the user
there has recently been large effort in using unlabeled data in conjunction with labeled data in machine learning semi supervised learning and active learning are two well known techniques that exploit the unlabeled data in the learning process in this work the active learning is used to query label for an unlabeled data on top of semi supervised classifier this work focuses on the query selection criterion the proposed criterion selects the example for which the label change results in the largest pertubation of other examples label experimental results show the effectiveness of the proposed query selection criterion in comparison to existing techniques
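a minimal sketch of the query selection criterion on top of a deliberately simple semi supervised base learner (1 nearest labelled neighbour): for each unlabelled candidate we hypothesise each possible label and count how many other unlabelled examples change prediction, then query the candidate with the largest such perturbation; the base learner and distance are stand-ins, not the ones used in the paper

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def predict(x, labelled):
        # labelled: list of (point, label); 1-nearest-labelled-neighbour rule
        return min(labelled, key=lambda pl: dist(x, pl[0]))[1]

    def perturbation(candidate, labels, labelled, unlabelled):
        # how many other unlabelled points change prediction when the
        # candidate's hypothesised label flips between the possible classes
        preds = []
        for lab in labels:
            aug = labelled + [(candidate, lab)]
            preds.append([predict(u, aug) for u in unlabelled if u is not candidate])
        return sum(1 for cols in zip(*preds) if len(set(cols)) > 1)

    def select_query(labelled, unlabelled, labels=(0, 1)):
        return max(unlabelled,
                   key=lambda c: perturbation(c, labels, labelled, unlabelled))

    labelled = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
    unlabelled = [(5.0, 5.1), (1.0, 1.0), (9.0, 9.0), (5.2, 5.0)]
    print(select_query(labelled, unlabelled))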
gui systems are becoming increasingly popular thanks to their ease of use when compared against traditional systems however gui systems are often challenging to test due to their complexity and special features traditional testing methodologies are not designed to deal with the complexity of gui systems using these methodologies can result in increased time and expense in our proposed strategy gui system will be divided into two abstract tiers the component tier and the system tier on the component tier flow graph will be created for each gui component each flow graph represents set of relationships between the pre conditions event sequences and post conditions for the corresponding component on the system tier the components are integrated to build up viewpoint of the entire system tests on the system tier will interrogate the interactions between the components this method for gui testing is simple and practical we will show the effectiveness of this approach by performing two empirical experiments and describing the results found
means is one of the most popular and widespread partitioning clustering algorithms due to its superior scalability and efficiency typically the means algorithm treats all features fairly and sets weights of all features equally when evaluating dissimilarity however meaningful clustering phenomenon often occurs in subspace defined by specific subset of all features to address this issue this paper proposes novel feature weight self adjustment fwsa mechanism embedded into means in order to improve the clustering quality of means in the fwsa mechanism finding feature weights is modeled as an optimization problem to simultaneously minimize the separations within clusters and maximize the separations between clusters with this objective the adjustment margin of feature weight can be derived based on the importance of the feature to the clustering quality at each iteration in means all feature weights are adaptively updated by adding their respective adjustment margins number of synthetic and real data are experimented on to show the benefits of the proposed fwas mechanism in addition when compared to recent similar feature weighting work the proposed mechanism illustrates several advantages in both the theoretical and experimental results
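a minimal numpy sketch of k means with self adjusting feature weights; the adjustment margin is summarised here as the normalised ratio of between cluster to within cluster separation per feature, which follows the stated objective but is not necessarily the exact fwsa formula

    import numpy as np

    def weighted_kmeans_fwsa(X, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        centers = X[rng.choice(n, k, replace=False)]
        w = np.full(d, 1.0 / d)
        for _ in range(iters):
            # assignment under the weighted squared distance
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
            labels = dists.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
            # per-feature within-cluster (a) and between-cluster (b) separations
            gmean = X.mean(axis=0)
            a = np.zeros(d)
            b = np.zeros(d)
            for j in range(k):
                Cj = X[labels == j]
                if len(Cj) == 0:
                    continue
                a += ((Cj - centers[j]) ** 2).sum(axis=0)
                b += len(Cj) * (centers[j] - gmean) ** 2
            margin = b / (a + 1e-12)
            w = w + margin / margin.sum()      # add the adjustment margin
            w = w / w.sum()                    # keep weights normalised
        return labels, w

    X = np.vstack([np.random.randn(50, 2) + [0, 5],
                   np.random.randn(50, 2) + [0, -5]])
    X = np.hstack([X, np.random.randn(100, 1) * 3])   # third feature is noise
    labels, w = weighted_kmeans_fwsa(X, 2)
    print("feature weights:", np.round(w, 3))   # weight of the noisy feature shrinks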
although processor design verification consumes ever increasing resources many design defects still slip into production silicon in few cases such bugs have caused expensive chip recalls to truly improve productivity hardware bugs should be handled like system software ones with vendors periodically releasing patches to fix hardware in the field based on an analysis of serious design defects in current amd intel ibm and motorola processors this paper proposes and evaluates phoenix novel field programmable on chip hardware that detects and recovers from design defects phoenix taps key logic signals and based on downloaded defect signatures combines the signals into conditions that flag defects on defect detection phoenix flushes the pipeline and either retries or invokes customized recovery handler phoenix induces negligible slowdown while adding only area and wire overheads phoenix detects all the serious defects that are triggered by concurrent control signals moreover it recovers from most of them and simplifies recovery for the rest finally we present an algorithm to automatically size phoenix for new processors
increasing communication demands of processor and memory cores in systems on chips socs necessitate the use of networks on chip noc to interconnect the cores an important phase in the design of nocs is the mapping of cores onto the most suitable topology for given application in this paper we present sunmap tool for automatically selecting the best topology for given application and producing mapping of cores onto that topology sunmap explores various design objectives such as minimizing average communication delay area power dissipation subject to bandwidth and area constraints the tool supports different routing functions dimension ordered minimum path traffic splitting and uses floorplanning information early in the topology selection process to provide feasible mappings the network components of the chosen noc are automatically generated using cycle accurate systemc soft macros from pipes architecture sunmap automates noc selection and generation bridging an important design gap in building nocs several experimental case studies are presented in the paper which show the rich design space exploration capabilities of sunmap
regression testing is critical activity which occurs during the maintenance stage of the software lifecycle however it requires large amounts of test cases to assure the attainment of certain degree of quality as result test suite sizes may grow significantly to address this issue test suite reduction techniques have been proposed however suite size reduction may lead to significant loss of fault detection efficacy to deal with this problem greedy algorithm is presented in this paper this algorithm attempts to select test case which satisfies the maximum number of testing requirements while having minimum overlap in requirements coverage with other test cases in order to evaluate the proposed algorithm experiments have been conducted on the siemens suite and the space program the results demonstrate the effectiveness of the proposed algorithm by retaining the fault detection capability of the suites while achieving significant suite size reduction
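a minimal sketch of the stated greedy rule: at each step pick the test that satisfies the most still uncovered requirements and, on ties, the one with the least overlap with requirements already covered; the suite and requirement names are illustrative

    def reduce_suite(suite):
        # suite: {test_name: set of requirements it satisfies}
        selected, covered = [], set()
        all_reqs = set().union(*suite.values())
        while covered != all_reqs:
            def score(t):
                reqs = suite[t]
                new = len(reqs - covered)          # maximise new coverage
                overlap = len(reqs & covered)      # minimise redundant coverage
                return (new, -overlap)
            best = max((t for t in suite if t not in selected), key=score)
            if not suite[best] - covered:
                break                              # nothing new can be covered
            selected.append(best)
            covered |= suite[best]
        return selected

    suite = {
        "t1": {"r1", "r2", "r3"},
        "t2": {"r3", "r4"},
        "t3": {"r4", "r5"},
        "t4": {"r1", "r5"},
    }
    print(reduce_suite(suite))   # ['t1', 't3'] covers all five requirements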
we consider query optimization techniques for data intensive pp applications we show how to adapt an old technique from deductive databases namely query sub query qsq to setting where autonomous and distributed peers share large volumes of interrelated data we illustrate the technique with an important telecommunication problem the diagnosis of distributed telecom systems we show that the problem can be modeled using datalog programs and ii it can benefit from the large battery of optimization techniques developed for datalog in particular we show that simple generic use of the extension of qsq achieves an optimization as good as that previously provided by dedicated diagnosis algorithms furthermore we show that it allows solving efficiently much larger class of system analysis problems
we propose novel and efficient surface matching approach for reassembling broken solids as well as for matching assembly components using cluster trees of oriented points the method rapidly scans through the space of all possible contact poses of the fragments to be re assembled using tree search strategy which neither relies on any surface features nor requires an initial solution the new method first decomposes each point set into binary tree structure using hierarchical clustering algorithm subsequently the fragments are matched pairwise by descending the cluster trees simultaneously in depth first fashion in contrast to the reassemblage of pottery and thin walled artifacts this paper addresses the problem of matching broken solids on the basis of their fracture surfaces which are assumed to be reasonable large our proposed contact area maximization is powerful common basis for most surface matching tasks which can be adapted to numerous special applications the suggested approach is very robust and offers an outstanding efficiency
emerging software development environments are characterized by heterogeneity they are composed of diverse object stores user interfaces and tools this paper presents an approach for providing hypertext services in this heterogeneous setting central notions of the approach include the following anchors are established with respect to interactive views of objects rather than the objects themselves composable ary links can be established between anchors on different views of objects stored in distinct object bases viewers and objects may be implemented in different programming languages afforded by client server architecture multiple concurrently active viewers enable multimedia hypertext services the paper describes the approach and presents an architecture which supports it experience with the chimera prototype and its relationship to other systems is described
identity based negotiations are convenient protocols to closely control users personal data that empower users to negotiate the trust of unknown counterparts by carefully governing the disclosure of their identities such type of negotiations presents however unique challenges mainly caused by the way identity attributes are distributed and managed in this paper we present novel approach for conducting long running negotiations in the context of digital identity management systems we propose some major extensions to an existing trust negotiation protocol to support negotiations that are conducted during multiple sessions to the best of our knowledge this is the first time protocol for conducting trust negotiations over multiple sessions is presented
we show how model checking and symbolic execution can be used to generate test inputs to achieve structural coverage of code that manipulates complex data structures we focus on obtaining branch coverage during unit testing of some of the core methods of the red black tree implementation in the java treemap library using the java pathfinder model checker three different test generation techniques will be introduced and compared namely straight model checking of the code model checking used in black box fashion to generate all inputs up to fixed size and lastly model checking used during white box test input generation the main contribution of this work is to show how efficient white box test input generation can be done for code manipulating complex data taking into account complex method preconditions
we define logic programs with defaults and argumentation theories new framework that unifies most of the earlier proposals for defeasible reasoning in logic programming we present model theoretic semantics and study its reducibility and well behavior properties we use the framework as an elegant and flexible foundation to extend and improve upon generalized courteous logic programs gclp one of the popular forms of defeasible reasoning the extensions include higher order and object oriented features of hilog and logic the improvements include much simpler incremental reasoning algorithms and more intuitive behavior the framework and its courteous family instantiation were implemented as an extension to the flora system
the combination of evidence can increase retrieval effectiveness in this paper we investigate the effectiveness of decision mechanism for the selective combination of evidence for web information retrieval and particularly for topic distillation we introduce two measures of query’s broadness and use them to select an appropriate combination of evidence for each query the results from our experiments show that there is statistically significant association between the output of the decision mechanism and the relative effectiveness of the different combinations of evidence moreover we show that the proposed methodology can be applied in an operational setting where relevance information is not available by setting the decision mechanism’s thresholds automatically
this paper introduces the mtree index algorithm special purpose xml xpath index designed to meet the needs of the hierarchical xpath query language with the increasing importance of xml xpath and xquery several methods have been proposed for creating xml structure indexes and many variants using relational technology have been proposed this work proposes new xml structure index called mtree which is designed to be optimal for traversing all xpath axes the primary feature of mtree lies in its ability to provide the next subtree root node in document order for all axes to each context node in mtree is special purpose xpath index structure that matches the special purpose query requirements for xpath this approach is in contrast to other approaches that map the problem domain into general purpose index structures such as tree that must reconstruct the xml tree from those structures for every query mtree supports modification operations such as insert and delete mtree has been implemented both in memory and on disk and performance results using xmark benchmark data are presented showing up to two orders of magnitude improvement over other well known implementations
the term grammar based software describes software whose input can be specified by context free grammar this grammar may occur explicitly in the software in the form of an input specification to parser generator or implicitly in the form of hand written parser or other input verification routines grammar based software includes not only programming language compilers but also tools for program analysis reverse engineering software metrics and documentation generation such tools often play crucial role in automated software development and ensuring their completeness and correctness is vital prerequisite for their use in this paper we propose strategy for the construction of test suites for grammar based software and illustrate this strategy using the iso cpp grammar we use the concept of rule coverage as pivot for the reduction of implementation based and specification based test suites and demonstrate significant decrease in the size of these suites to demonstrate the validity of the approach we use the reduced test suite to analyze three grammar based tools for cpp we compare the effectiveness of the reduced test suite with the original suite in terms of code coverage and fault detection
as the industry moves toward larger scale chip multiprocessors the need to parallelize applications grows high inter thread communication delays exacerbated by over stressed high latency memory subsystems and ever increasing wire delays require parallelization techniques to create partially or fully independent threads to improve performance unfortunately developers and compilers alike often fail to find sufficient independent work of this kind recently proposed pipelined streaming techniques have shown significant promise for both manual and automatic parallelization these techniques have wide scale applicability because they embrace inter thread dependences albeit acyclic dependences and tolerate long latency communication of these dependences this paper addresses the lack of architectural support for this type of concurrency which has blocked its adoption and hindered related language and compiler research we observe that both manual and automatic techniques create high frequency streaming threads with communication occurring every to instructions even while easily tolerating inter thread transit delays high frequency communication makes thread performance very sensitive to intrathread delays from the repeated execution of the communication operations using this observation we define the design space and evaluate several mechanisms to find better trade off between performance and operating system hardware and design costs from this we find light weight streaming aware enhancement to conventional memory subsystems that doubles the speed of these codes and is within of the best performing but heavy weight hardware solution
the importance of tiles or blocks in scientific computing cannot be overstated many algorithms both iterative and recursive can be expressed naturally if tiles are represented explicitly from the point of view of performance tiling either as code or data layout transformation is one of the most effective ways to exploit locality which is must to achieve good performance in current computers because of the significant difference in speed between processor and memory furthermore tiles are also useful to express data distribution in parallel computations however despite the importance of tiles most languages do not support them directly this gives place to bloated programs populated with numerous subscript expressions which make the code difficult to read and coding mistakes more likely this paper discusses hierarchically tiled arrays htas data type which facilitates the easy manipulation of tiles in object oriented languages with emphasis on two new features dynamic partitioning and overlapped tiling these features facilitate the expression of locality and communication while maintaining the same performance of algorithms written using conventional languages
although curvature estimation from given mesh or regularly sampled point set is well studied problem it is still challenging when the input consists of cloud of unstructured points corrupted by misalignment error and outlier noise such input is ubiquitous in computer vision in this paper we propose three pass tensor voting algorithm to robustly estimate curvature tensors from which accurate principal curvatures and directions can be calculated our quantitative estimation is an improvement over the previous two pass algorithm where only qualitative curvature estimation sign of gaussian curvature is performed to overcome misalignment errors our improved method automatically corrects input point locations at subvoxel precision which also rejects outliers that are uncorrectable to adapt to different scales locally we define the radiushit of curvature tensor to quantify estimation accuracy and applicability our curvature estimation algorithm has been proven with detailed quantitative experiments performing better in variety of standard error metrics percentage error in curvature magnitudes absolute angle difference in curvature direction in the presence of large amount of misalignment noise
body sensor network bsn is network of sensors deployed on person’s body usually for health care monitoring since the sensors collect personal medical data security and privacy are important components in body sensor network at the same time the collected data has to be readily available in the event of an emergency in this paper we present ibe lite lightweight identity based encryption suitable for sensors and developed protocols based on ibe lite for bsn
an important aspect of database processing in parallel computer systems is the use of data parallel algorithms several parallel algorithms for the relational database join operation in hypercube multicomputer system are given the join algorithms are classified as cycling or global partitioning based on the tuple distribution method employed the various algorithms are compared under common framework using time complexity analysis as well as an implementation on node ncube hypercube system in general the global partitioning algorithms demonstrate better speedup however the cycling algorithm can perform better than the global algorithms in specific situations viz when the difference in input relation cardinalities is large and the hypercube dimension is small the usefulness of the data redistribution operation in improving the performance of the join algorithms in the presence of uneven data partitions is examined the results indicate that redistribution significantly decreases the join algorithm execution times for unbalanced partitions
crucial aspect of peer to peer pp systems is that of providing incentives for users to contribute their resources to the system without such incentives empirical data show that majority of the participants act as free riders as result substantial amount of resource goes untapped and frequently pp systems devolve into client server systems with attendant issues of performance under high load we propose to address the free rider problem by introducing the notion of pp contract in it peers are made aware of the benefits they receive from the system as function of their contributions in this paper we first describe utility based framework to determine the components of the contract and formulate the associated resource allocation problem we consider the resource allocation problem for flash crowd scenario and show how the contract mechanism implemented using centralized server can be used to quickly create pseudoservers that can serve out the requests we then study decentralized implementation of the pp contract scheme in which each node implements the contract based on local demand we show that in such system other than contributing storage and bandwidth to serve out requests it is also important that peer nodes function as application level routers to connect pools of available pseudoservers we study the performance of the distributed implementation with respect to the various parameters including the terms of the contract and the triggers to create pseudoservers and routers
software engineers frequently update cots components integrated in component based systems and can often chose among many candidates produced by different vendors this paper tackles both the problem of quickly identifying components that are syntactically compatible with the interface specifications but badly integrate in target systems and the problem of automatically generating regression test suites the technique proposed in this paper to automatically generate compatibility and prioritized test suites is based on behavioral models that represent component interactions and are automatically generated while executing the original test suites on previous versions of target systems
dynamic graphs or sequence of graphs attract much attention recently in this paper as first step towards finding significant patterns hidden in dynamic graphs we consider the problem of mining successive sequence of subgraphs which appear frequently in long sequence of graphs in addition to exclude insignificant patterns we take into account the mutual dependency measured by correlation coefficient among the components in patterns an algorithm named corsss which utilizes the generality ordering of patterns effectively is developed for enumerating all frequent and correlated patterns the effectiveness of corsss is confirmed through the experiments using real datasets
data exchange is the problem of taking data structured under source schema and creating an instance of target schema that reflects the source data as accurately as possible given source instance there may be many solutions to the data exchange problem that is many target instances that satisfy the constraints of the data exchange problem in an earlier article we identified special class of solutions that we call universal universal solution has homomorphisms into every possible solution and hence is most general possible solution nonetheless given source instance there may be many universal solutions this naturally raises the question of whether there is best universal solution and hence best solution for data exchange we answer this question by considering the well known notion of the core of structure notion that was first studied in graph theory and has also played role in conjunctive query processing the core of structure is the smallest substructure that is also homomorphic image of the structure all universal solutions have the same core up to isomorphism we show that this core is also universal solution and hence the smallest universal solution the uniqueness of the core of universal solution together with its minimality make the core an ideal solution for data exchange we investigate the computational complexity of producing the core well known results by chandra and merlin imply that unless equals np there is no polynomial time algorithm that given structure as input returns the core of that structure as output in contrast in the context of data exchange we identify natural and fairly broad conditions under which there are polynomial time algorithms for computing the core of universal solution we also analyze the computational complexity of the following decision problem that underlies the computation of cores given two graphs and is the core of earlier results imply that this problem is both np hard and conp hard here we pinpoint its exact complexity by establishing that it is dp complete problem finally we show that the core is the best among all universal solutions for answering existential queries and we propose an alternative semantics for answering queries in data exchange settings
this paper presents pads policy architecture for building distributed storage systems policy architecture has two aspects first common set of mechanisms that allow new systems to be implemented simply by defining new policies second structure for how policies themselves should be specified in the case of distributed storage systems pads defines data plane that provides fixed set of mechanisms for storing and transmitting data and maintaining consistency information pads requires designer to define control plane policy that specifies the system specific policy for orchestrating flows of data among nodes pads then divides control plane policy into two parts routing policy and blocking policy the pads prototype defines concise interface between the data and control planes it provides declarative language for specifying routing policy and it defines simple interface for specifying blocking policy we find that pads greatly reduces the effort to design implement and modify distributed storage systems in particular by using pads we were able to quickly construct dozen significant distributed storage systems spanning large portion of the design space using just few dozen policy rules to define each system
for the optical packet switching routers to be widely deployed in the internet the size of packet buffers on routers has to be significantly small such small buffer networks rely on traffic with low levels of burstiness to avoid buffer overflows and packet losses we present pacing system that proactively shapes traffic in the edge network to reduce burstiness our queue length based pacing uses an adaptive pacing on single queue and paces traffic indiscriminately where deployed in this work we show through analysis and simulation that this pacing approach introduces bounded delay and that it effectively reduces traffic burstiness we also show that it can achieve higher throughput than end system based pacing
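a minimal discrete time sketch of queue length based pacing: a single pacing queue drains at a rate that grows with its occupancy and is capped by the link rate, which smooths bursts while keeping the added queueing delay bounded; the particular rate law and parameters are assumptions, not the ones analysed in the paper

    def simulate(arrivals, r_min=1.0, r_max=10.0, q_ref=20, steps=200):
        queue, sent, q_trace = 0.0, [], []
        for t in range(steps):
            queue += arrivals(t)
            # pacing rate grows with queue length, bounded by the link rate r_max;
            # a longer queue drains faster, so the added delay stays bounded
            rate = min(r_max, r_min + (r_max - r_min) * min(1.0, queue / q_ref))
            out = min(queue, rate)
            queue -= out
            sent.append(out)
            q_trace.append(queue)
        return sent, q_trace

    # bursty arrivals: long idle periods followed by large bursts
    bursty = lambda t: 40 if t % 40 == 0 else 0
    sent, q = simulate(bursty)
    print("peak departure rate:", max(sent), " max queue:", max(q))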
distributed system builders are faced with the task of meeting variety of requirements on the global behaviour of the target system such as stability fault tolerance and failure recovery concurrency control commitment and consistency of replicated data the subset of these requirements relevant to particular application we call its coherence constraint the coherence constraint may be very difficult to enforce existing operating system services do not provide the system builder with an adequate platform for addressing coherence although some systems address other aspects of coherence for example isis addresses the fault tolerance issue even recent developments in micro kernels such as mach and chorus which have concentrated on supporting the shared memory abstraction still leave the systems builder to bridge significant gap between os services and basic coherence requirements the variety of coherence requirements has given rise to welter of mechanisms having familial resemblance yet lacking real conceptual integration consequently the distributed application programmer treats each requirement in isolation often resulting in costly solutions which are nevertheless obscure and idiosyncratic such problems have been observed in the context of object based programming environments such as argus clouds and others they are confirmed by our own experience with persistent object store transaction mechanism using nfs oriented file locking this paper describes an approach to distributed coherence enforcement based upon rollback the approach is optimistic in the sense that violations of coherence are resolved rather than prevented rollback is the agent of this resolution support for coherence is provided by units of distributed computation called transactions this transaction mechanism is highly controllable being designed to support advanced database requirements involving non atomic transactions as well as conventional atomic transactions cf the transaction service is underpinned by rollback to provide the synchronisation supported in turn by stable checkpointing and an integrated ipc protocol the approach raises two key issues the first is the problem of disseminating rollback properly through distributed system the second arises because computational progress does not occur monotonically in physical time but along its own virtual time axis and concerns the interaction of these two time axes
write caches using fast non volatile storage are now widely used in modern storage controllers since they enable hiding latency on writes effective algorithms for write cache management are extremely important since in raid due to read modify write and parity updates each write may cause up to four separate disk seeks while read miss causes only single disk seek and ii typically write cache size is much smaller than the read cache size proportion of is typical write caching policy must decide what data to destage on one hand to exploit temporal locality we would like to destage data that is least likely to be re written soon with the goal of minimizing the total number of destages this is normally achieved using caching algorithm such as lrw least recently written however read cache has very small uniform cost of replacing any data in the cache whereas the cost of destaging depends on the state of the disk heads hence on the other hand to exploit spatial locality we would like to destage writes so as to minimize the average cost of each destage this can be achieved by using disk scheduling algorithm such as cscan that destages data in the ascending order of the logical addresses at the higher level of the write cache in storage controller we observe that lrw and cscan focus respectively on exploiting either temporal or spatial locality but not both simultaneously we propose new algorithm namely wise ordering for writes wow for write cache management that effectively combines and balances temporal and spatial locality our experimental set up consisted of an ibm xseries dual processor server running linux that is driving software raid or raid array using workload akin to storage performance council’s widely adopted spc benchmark in cache sensitive configuration on raid wow delivers peak throughput that is higher than cscan and higher than lrw in cache insensitive configuration on raid wow and cscan deliver peak throughput that is higher than lrw for random write workload with nearly misses on raid with cache size of kb pages mb wow and cscan deliver peak throughput that is higher than lrw in summary wow has better or comparable peak throughput to the best of cscan and lrw across wide gamut of write cache sizes and workload configurations in addition even at lower throughputs wow has lower average response times than cscan and lrw
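a minimal sketch of a destage policy that combines cscan's ascending address sweep with a clock style recency bit, in the spirit of the temporal plus spatial combination described above; the real wow policy and its data structures may differ in detail

    import bisect

    class WriteCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.addrs = []          # sorted logical addresses of dirty pages
            self.recency = {}        # addr -> recency bit
            self.hand = 0            # destage pointer into the sorted order

        def write(self, addr):
            if addr in self.recency:
                self.recency[addr] = True        # write hit: mark recently written
            else:
                bisect.insort(self.addrs, addr)
                self.recency[addr] = False
                if len(self.addrs) > self.capacity:
                    self.destage_one()

        def destage_one(self):
            # sweep in ascending address order, wrapping around; skip (and clear)
            # recently written pages so hot pages stay in the cache longer
            while True:
                if self.hand >= len(self.addrs):
                    self.hand = 0
                addr = self.addrs[self.hand]
                if self.recency[addr]:
                    self.recency[addr] = False
                    self.hand += 1
                else:
                    del self.addrs[self.hand]
                    del self.recency[addr]
                    return addr                  # page handed to the disk scheduler

    cache = WriteCache(capacity=4)
    for a in [50, 10, 90, 10, 70, 30]:
        cache.write(a)
    print(cache.addrs)   # recently re-written page 10 survives, page 30 is destaged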
we describe collaborative efforts among group of knowledge representation kr experts domain scientists and scientific information managers in developing knowledge models for ecological and environmental concepts the development of formal structured approaches to kr used by the group ie ontologies can be informed by evidence marshalled from unstructured approaches to kr and semantic tagging already in use by the community
ml style modules are valuable in the development and maintenance of large software systems unfortunately none of the existing languages support them in fully satisfactory manner the official sml definition does not allow higher order functors so module that refers to externally defined functors cannot accurately describe its import interface macqueen and tofte extended sml with fully transparent higher order functors but their system does not have type theoretic semantics thus fails to support fully syntactic signatures the systems of manifest types and translucent sums support fully syntactic signatures but they may propagate fewer type equalities than fully transparent functors this paper presents module calculus that supports both fully transparent higher order functors and fully syntactic signatures and thus true separate compilation we give simple type theoretic semantics to our calculus and show how to compile it into an omega like lambda calculus extended with existential types
with the proliferation of wifi technology many wifi networks are accessible from vehicles on the road making vehicular wifi access realistic however several challenges exist long latency to establish connection to wifi access point ap lossy link performance and frequent disconnections due to mobility we argue that people drive on familiar routes frequently and thus the mobility and connectivity related information along their drives can be predicted with good accuracy using historical information such as gps tracks with timestamps rf fingerprints and link and network layer addresses of visible aps we exploit such information to develop new handoff and data transfer strategies the handoff strategy reduces the connection establishment latency and also uses pre scripted handoffs triggered by change in vehicle location the data transfer strategy speeds up download performance by using prefetching on the aps yet to be encountered experimental performance evaluation reveals that the predictability of mobility and connectivity is high enough to be useful in such protocols in our experiments with vehicular client accessing road side aps the handoff strategy improves download performance by roughly factor of relative to the state of the art the data transfer strategy further improves this performance by another factor of
with recommender systems users receive items recommended on the basis of their profile new users experience the cold start problem as their profile is very poor the system performs very poorly in this paper classical new user cold start techniques are improved by exploiting the cold user data ie the user data that is readily available eg age occupation location etc in order to automatically associate the new user with better first profile relying on the existing community spaces model rule based induction process is used and recommendation process based on the level of agreement principle is defined the experiments show that the quality of recommendations compares to that obtained after classical new user technique while the new user effort is smaller as no initial ratings are asked
structural computing grew from the trend in hypertext research towards generalised systems it asserts the primacy of structure over data as philosophy it has been compared to structuralism in anthropology and linguistics and has given birth to new trend in systems design known as multiple open services mos the fundamental open hypermedia model fohm is an alternative approach to generalised hypertext that views the various hypertext domains as continuous rather than discrete its relationships to structural computing structuralism and mos have never been fully explored this paper examines these relationships we explore how fohm might be implemented in mos environments and describe the data border the point where structure meets data we then use this to explore how fohm and generalised hypermedia are related to structuralism and structural computing
this paper presents study comparing two zoomable user interfaces with overviews zuios against classic zoomable user interface zui in the context of user navigation of large information spaces on mobile devices the study aims at exploring if an overview is worth the space it uses as an orientation tool during navigation of an information space and ii if part of the lost space can be recovered by switching to wireframe visualization of the overview and dropping semantic information in it the study takes into consideration search tasks on three types of information space namely maps diagrams and web pages that widely differ in structural complexity results suggest that overviews bring enough benefit to justify the used space if they highlight relevant semantic information that users can exploit during search and ii the structure of the considered information space does not provide appropriate orientation cues
support vector machines svms have been adopted by many data mining and information retrieval applications for learning mining or query concept and then retrieving the top k best matches to the concept however when the data set is large naively scanning the entire data set to find the top matches is not scalable in this work we propose kernel indexing strategy to substantially prune the search space and thus improve the performance of top k queries our kernel indexer kdx takes advantage of the underlying geometric properties and quickly converges on an approximate set of top k instances of interest more importantly once the kernel eg gaussian kernel has been selected and the indexer has been constructed the indexer can work with different kernel parameter settings eg gamma and sigma without performance compromise through theoretical analysis and empirical studies on wide variety of data sets we demonstrate kdx to be very effective an earlier version of this paper appeared in the siam international conference on data mining this version differs from the previous submission in providing detailed cost analysis under different scenarios specifically designed to meet the varying needs of accuracy speed and space requirements developing an approach for insertion and deletion of instances presenting the specific computations as well as the geometric properties used in performing the same and providing detailed algorithms for each of the operations necessary to create and use the index structure
set value attributes are concise and natural way to model complex data sets modern object relational systems support set value attributes and allow various query capabilities on them in this paper we initiate formal study of indexing techniques for set value attributes based on similarity for suitably defined notions of similarity between sets such techniques are necessary in modern applications such as recommendations through collaborative filtering and automated advertising our techniques are probabilistic and approximate in nature as design principle we create structures that make use of well known and widely used data structuring techniques as means to ease integration with existing infrastructure we show how the problem of indexing collection of sets based on similarity can be reduced to the problem of indexing suitably encoded in way that preserves similarity binary vectors in hamming space thus reducing the problem to one of similarity query processing in hamming space then we introduce and analyze two data structure primitives that we use in cooperation to perform similarity query processing in hamming space we show how the resulting indexing technique can be optimized for properties of interest by formulating constraint optimization problems based on the space one is willing to devote for indexing finally we present experimental results from prototype implementation of our techniques using real life datasets exploring the accuracy and efficiency of our overall approach as well as the quality of our solutions to problems related to the optimization of the indexing scheme
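The reduction described above can be made concrete with a small, hedged sketch (Python; the characteristic-vector encoding and the toy sets below are illustrative assumptions, not necessarily the paper's actual encoding): sets over a known universe are mapped to binary vectors so that Hamming distance equals the size of the symmetric difference, a standard set-dissimilarity measure closely tied to Jaccard similarity.

```python
# Minimal sketch: encode sets over an ordered universe as 0/1 vectors so that
# Hamming distance between vectors equals |s ^ t| (symmetric difference size).

def to_vector(s, universe):
    """Characteristic 0/1 vector of set s over an ordered universe."""
    return [1 if u in s else 0 for u in universe]

def hamming(v, w):
    return sum(a != b for a, b in zip(v, w))

def jaccard(s, t):
    return len(s & t) / len(s | t) if s | t else 1.0

universe = sorted({"a", "b", "c", "d", "e"})
s, t = {"a", "b", "c"}, {"b", "c", "d"}
vs, vt = to_vector(s, universe), to_vector(t, universe)

# Hamming distance = |s ^ t|; identical sets get distance 0, disjoint sets the largest.
print(hamming(vs, vt), len(s ^ t), round(jaccard(s, t), 3))
```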
in this paper we present novel exploratory visual analytic system called tiara text insight via automated responsive analytics which combines text analytics and interactive visualization to help users explore and analyze large collections of text given collection of documents tiara first uses topic analysis techniques to summarize the documents into set of topics each of which is represented by set of keywords in addition to extracting topics tiara derives time sensitive keywords to depict the content evolution of each topic over time to help users understand the topic based summarization results tiara employs several interactive text visualization techniques to explain the summarization results and seamlessly link such results to the original text we have applied tiara to several real world applications including email summarization and patient record analysis to measure the effectiveness of tiara we have conducted several experiments our experimental results and initial user feedback suggest that tiara is effective in aiding users in their exploratory text analytic tasks
consider message passing system of processors in which each processor holds one piece of data initially the goal is to compute an associative and commutative reduction function on the pieces of data and to make the result known to all the processors this operation is frequently used in many message passing systems and is typically referred to as global combine census computation or gossiping this paper explores the problem of global combine in the multiport postal model this model is characterized by three parameters the number of processors the number of ports per processor and the communication latency in this model in every round each processor can send distinct messages to other processors and it can receive messages that were sent from other processors rounds earlier this paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages
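The paper's optimal algorithm for the multiport postal model is not reproduced here; purely as a reference point for what a global combine computes, below is a sketch (Python) of the classic recursive-doubling all-reduce in a simpler synchronous, single-port, unit-latency setting, where every processor ends up holding the reduction of all inputs.

```python
# Illustrative baseline only: recursive-doubling all-reduce for a power-of-two
# processor count in a synchronous, single-port, unit-latency model. The paper's
# algorithm targets the more general multiport postal model and is not shown.
import operator

def allreduce_recursive_doubling(values, op=operator.add):
    p = len(values)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two processor count"
    state = list(values)          # state[i] = partial result held by processor i
    rounds, d = 0, 1
    while d < p:
        new_state = list(state)
        for i in range(p):        # in each round, processor i exchanges with partner i ^ d
            new_state[i] = op(state[i], state[i ^ d])
        state = new_state
        rounds += 1
        d *= 2
    return state, rounds          # every processor ends with the full reduction

result, rounds = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
print(result, rounds)             # all entries equal 36, log2(8) = 3 rounds
```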
we propose an algorithm for placing tasks of data flows for streaming systems onto servers within message oriented middleware where certain tasks can be replicated our work is centered on the idea that certain transformations are stateless and can therefore be replicated replication in this case can cause workloads to be partitioned among multiple machines thus enabling message processing to be parallelized and lead to improvements in performance we propose guided replication approach for this purpose that iteratively computes the optimal placement of replicas where each subsequent iteration of the algorithm takes as input optimal solutions computed in the previous run as result the system performance is consistently improved which eventually converges as shown in simulation results we demonstrate through simulation experiments with both simple and complex task flow graphs and network topologies that introducing our replication mechanism can lead to improvements in runtime performance when system resources are scarce the benefits of applying our replication mechanism are even greater
this paper presents innovative program transformations for the efficient and accurate profiling of java programs the profiling is based on deterministic sampling mechanism that exploits the number of executed jvm bytecode instructions to trigger user defined profiling agent in order to process samples of the call stack the instrumentation is entirely portable profiles are reproducible and the sampling rate can be dynamically tuned moderate overhead and high profile accuracy make the profiling framework attractive for developers of complex systems such as application servers
in this paper we present systematic experimental study of the effect of inter cell interference on ieee performance with increasing penetration of wifi into residential areas and usage in ad hoc conference settings chaotic unplanned deployments are becoming the norm rather than an exception these networks often operate many nearby access points and stations on the same channel either due to lack of coordination or insufficient available channels thus inter cell interference is common but not well understood according to conventional wisdom the efficiency of an network is determined by the number of active clients surprisingly we find that with typical tcp dominant workload cumulative system throughput is characterized by the number of interfering access points rather than the number of clients we find that due to tcp flow control the number of backlogged stations in such network equals twice the number of access points thus single access point network proved very robust even with over one hundred clients multiple interfering access points however lead to an increase in collisions that reduces throughput and affects volume of traffic in the network
wireless sensor networks wsns have attracted intense interest due to their extensible capability in this paper we attempt to answer fundamental but practical question how should we deploy these nodes in most current designs sensor nodes are randomly or uniformly distributed because of their simplicity however the node deployment has great impact on the performance of wsns instead of maintaining the coverage for some snapshots of wsn it is essential that we can provide continuous coverage in the whole lifecycle of the wsn we will exhibit the weakness of the uniform distribution by disclosing the fatal sink routing hole problem to address this problem we propose non uniform power aware distribution scheme our analysis and simulation results show that the power aware deployment scheme can significantly improve the long term network connectivity and service quality
this paper presents novel approach towards automated highlight generation of broadcast sports video sequences from its extracted events and semantic concepts sports video is hierarchically divided into temporal partitions namely megaslots slots and semantic entities namely concepts and events the proposed method extracts event sequence from video and classifies each sequence into concept by sequential association mining the extracted concepts and events within the concepts are selected according to their degree of importance to include those in the highlights parameter degree of abstraction is proposed which gives choice to the user about how concisely the extracted concepts should be produced for specified highlight duration we have successfully extracted highlights from recorded video of cricket match and compared our results with the manually generated highlights by sports television channel
blinking is one of the most important cues for forming person impressions we focus on the eye blinking rate of avatars and investigate its effect on viewer subjective impressions two experiments are conducted the stimulus avatars included humans with generic reality male and female cartoon style humans male and female animals and unidentified life forms that were presented as second animation with various blink rates and blinks min subjects rated their impressions of the presented stimulus avatars on seven point semantic differential scale the results showed significant effect of the avatar’s blinking on viewer impressions and it was larger with the human style avatars than the others the results also lead to several implications and guidelines for the design of avatar representation blink animation of blinks min with human style avatar produces the friendliest impression the higher blink rates ie blinks min give inactive impressions while the lower blink rates ie blinks min give intelligent impressions through these results guidelines are derived for managing attractiveness of avatar by changing the avatar’s blinking rate
we present new approach for dynamic lod processing for geometry images gis in the graphics processing unit gpu gi mipmap is constructed from scanned model then mipmap selector map is created from the camera position’s information and the gi mipmap using the mipmap selector map and the current camera position some parts of the gi mipmap are discarded using one pass shader program that selects the mipmap level that must be displayed and calculates backface culling since our method does not require any extra spatial data structure conceptually it is very simple since lod selection is one pass algorithm performed on the gpu it is also fast
the demand for quickly delivering new applications is increasingly becoming business imperative today however application development is often done in an ad hoc manner resulting in poor reuse of software assets and longer time to delivery web services have received much interest due to their potential in facilitating seamless business to business or enterprise application integration web service composition system can help automate the process from specifying business process functionalities to developing executable workflows that capture non functional eg quality of service qos requirements to deploying them on runtime infrastructure intuitively web services can be viewed as software components and the process of web service composition similar to software synthesis in addition service composition needs to address the build time and runtime issues of the integrated application thereby making it more challenging and practical problem than software synthesis however current solutions based on business web services using wsdl bpel soap etc or semantic web services using ontologies goal directed reasoning etc are both piecemeal and insufficient we formulate the web service composition problem and describe the first integrated system for composing web services end to end ie from specification to deployment the proposed solution is based on novel two staged composition approach that addresses the information modeling aspects of web services provides support for contextual information while composing services employs efficient decoupling of functional and non functional requirements and leads to improved scalability and failure handling we also present synthy prototype of the service composition system and demonstrate its effectiveness with the help of an application scenario from the telecom domain
on chip networks are critical to the scaling of future multi core processors the challenge for on chip network is to reduce the cost including power consumption and area while providing high performance such as low latency and high bandwidth although much research on on chip networks has focused on improving the performance of on chip networks they have often relied on router microarchitecture adopted from off chip networks as result the on chip network architecture will not scale properly because of design complexity in this paper we propose low cost on chip network router microarchitecture which is different from the commonly assumed baseline router microarchitecture we reduce the cost of on chip networks by partitioning the crossbar prioritizing packets in flight to simplify arbitration and reducing the amount of buffers we show that by introducing intermediate buffers to decouple the routing in the and the dimensions high performance can be achieved with the proposed low cost router microarchitecture by removing the complexity of baseline router microarchitecture the low cost router microarchitecture can also approach the ideal latency in on chip networks however the prioritized switch arbitration simplifies the router but creates starvation for some nodes we show how delaying the rate at which credits are returned upstream can be used to implement distributed starvation avoidance mechanism to provide fairness our evaluations show that the proposed low cost router can reduce the area by and the power consumption by compared with baseline router microarchitecture that achieves similar throughput
in this paper we present the results of comparative study that explores the potential benefits of using embodied interaction to help children aged to learn abstract concepts related to musical sounds forty children learned to create musical sound sequences using an interactive sound making environment half the children used version of the system that instantiated body based metaphor in the mapping layer connecting body movements to output sounds the remaining children used version of the same environment that did not instantiate metaphor in the mapping layer in general children were able to more accurately demonstrate sound sequences in the embodied metaphor based system version however we observed that children often resorted to spatial rather than body based metaphors and that the mapping must be easily discoverable as well as metaphorical to provide benefit
the primary goal of web usage mining is the discovery of patterns in the navigational behavior of web users standard approaches such as clustering of user sessions and discovering association rules or frequent navigational paths do not generally provide the ability to automatically characterize or quantify the unobservable factors that lead to common navigational patterns it is therefore necessary to develop techniques that can automatically discover hidden semantic relationships among users as well as between users and web objects probabilistic latent semantic analysis plsa is particularly useful in this context since it can uncover latent semantic associations among users and pages based on the co occurrence patterns of these pages in user sessions in this paper we develop unified framework for the discovery and analysis of web navigational patterns based on plsa we show the flexibility of this framework in characterizing various relationships among users and web objects since these relationships are measured in terms of probabilities we are able to use probabilistic inference to perform variety of analysis tasks such as user segmentation page classification as well as predictive tasks such as collaborative recommendations we demonstrate the effectiveness of our approach through experiments performed on real world data sets
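As an illustration of the latent-factor machinery such a framework builds on (not the paper's full framework), here is a compact EM sketch for PLSA over a hypothetical session-by-page count matrix; the matrix, factor count, and iteration count are made-up values.

```python
# A minimal PLSA EM sketch over a hypothetical session-by-page count matrix N.
# It estimates P(z), P(session|z), P(page|z); analysis and recommendation tasks
# would be built on top of such latent-factor estimates.
import numpy as np

rng = np.random.default_rng(0)
N = np.array([[4, 3, 0, 0],        # rows: user sessions, cols: pages (toy data)
              [5, 2, 1, 0],
              [0, 0, 3, 4],
              [0, 1, 4, 5]], dtype=float)
n_u, n_p = N.shape
K = 2                               # number of latent factors ("tasks")

Pz  = np.full(K, 1.0 / K)
Puz = rng.random((n_u, K)); Puz /= Puz.sum(axis=0)   # P(session|z)
Ppz = rng.random((n_p, K)); Ppz /= Ppz.sum(axis=0)   # P(page|z)

for _ in range(100):
    # E-step: responsibilities P(z|u,p), shape (n_u, n_p, K)
    joint = Pz[None, None, :] * Puz[:, None, :] * Ppz[None, :, :]
    post = joint / joint.sum(axis=2, keepdims=True)
    # M-step: re-estimate parameters from expected counts
    weighted = N[:, :, None] * post
    Puz = weighted.sum(axis=1); Puz /= Puz.sum(axis=0)
    Ppz = weighted.sum(axis=0); Ppz /= Ppz.sum(axis=0)
    Pz  = weighted.sum(axis=(0, 1)); Pz /= Pz.sum()

print(np.round(Ppz, 2))             # pages separate into the two latent factors
```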
information flow type systems provide an elegant means to enforce confidentiality of programs using the proof assistant isabelle hol we have specified an information flow type system for concurrent language featuring primitives for scheduling and shown that typable programs are non interfering for possibilistic notion of non interference the development which constitutes to our best knowledge the first machine checked account of non interference for concurrent language takes advantage of the proof assistant facilities to structure the proofs about different views of the programming language and to identify the relationships among them and the type system our language and type system generalize previous work of boudol and castellani theoretical computer science in particular by including arrays and lifting several convenient but unnecessary conditions in the syntax and type system of the work of boudol and castellani we illustrate the generality of our language and the usefulness of our type system with medium size example
the emerging edge services architecture promises to improve the availability and performance of web services by replicating servers at geographically distributed sites key challenge in such systems is data replication and consistency so that edge server code can manipulate shared data without suffering the availability and performance penalties that would be incurred by accessing traditional centralized database this article explores using distributed object architecture to build an edge service data replication system for an e commerce application the tpc benchmark which simulates an online bookstore we take advantage of application specific semantics to design distributed objects that each manages specific subset of shared information using simple and effective consistency models our experimental results show that by slightly relaxing consistency within individual distributed objects our application realizes both high availability and excellent performance for example in one experiment we find that our object based edge server system provides five times better response time over traditional centralized cluster architecture and factor of nine improvement over an edge service system that distributes code but retains centralized database
over the last seven years we have developed static analysis methods to recover good approximation to the variables and dynamically allocated memory objects of stripped executable and to track the flow of values through them it is relatively easy to track the effects of an instruction operand that refers to global address ie an access to global variable or that uses stack frame offset ie an access to local scalar variable via the frame pointer or stack pointer in our work our algorithms are able to provide useful information for close to of such direct uses and defs it is much harder for static analysis algorithm to track the effects of an instruction operand that uses non stack frame register these indirect uses and defs correspond to accesses to an array or dynamically allocated memory object in one study our approach recovered useful information for only of indirect uses and of indirect defs however using the technique described in this paper the algorithm recovered useful information for of indirect uses and of indirect defs
computer system security is traditionally regarded as primarily technological concern the fundamental questions to which security researchers address themselves are those of the mathematical guarantees that can be made for the performance of various communication and computational challenges however in our research we focus on different question for us the fundamental security question is one that end users routinely encounter and resolve for themselves many times day the question of whether system is secure enough for their immediate needs in this paper we will describe our explorations of this issue in particular we will draw on three major elements of our research to date the first is empirical investigation into everyday security practices looking at how people manage security as practical day to day concern and exploring the context in which security decisions are made this empirical work provides foundation for our reconsideration of the problems of security to large degree as an interactional problem the second is our systems approach based on visualization and event based architectures this technical approach provides broad platform for investigating security and interaction based on set of general principles the third is our initial experiences in prototype deployment of these mechanisms in an application for peer to peer file sharing in face to face collaborative settings we have been using this application as the basis of an initial evaluation of our technology in support of everyday security practices in collaborative workgroups
query optimization in ibm’s system rx the first truly relational xml hybrid data management system requires accurate selectivity estimation of path value pairs ie the number of nodes in the xml tree reachable by given path with the given text value previous techniques have been inadequate because they have focused mainly on the tag labeled paths tree structure of the xml data for most real xml data the number of distinct string values at the leaf nodes is orders of magnitude larger than the set of distinct rooted tag paths hence the real challenge lies in accurate selectivity estimation of the string predicates on the leaf values reachable via given path in this paper we present cxhist novel workload aware histogram technique that provides accurate selectivity estimation on broad class of xml string based queries cxhist builds histogram in an on line manner by grouping queries into buckets using their true selectivity obtained from query feedback the set of queries associated with each bucket is summarized into feature distributions these feature distributions mimic bayesian classifier that is used to route query to its associated bucket during selectivity estimation we show how cxhist can be used for two general types of path string queries exact match queries and substring match queries experiments using prototype show that cxhist provides accurate selectivity estimation for both exact match queries and substring match queries
within personalized marketing recommendation issue known as multicampaign assignment is to overcome critical problem known as the multiple recommendation problem which occurs when running several personalized campaigns simultaneously this paper mainly deals with the hardness of multicampaign assignment which is treated as very challenging problem in marketing the objective in this problem is to find customer campaign matrix which maximizes the effectiveness of multiple campaigns under some constraints we present realistic response suppression function which is designed to be more practical and explain how this can be learned from historical data moreover we provide proof that this more realistic version of the problem is np hard thus justifying the use of heuristics presented in previous work
the fields of user modeling and natural language processing have been closely linked since the early days of user modeling natural language systems consult user models in order to improve their understanding of users requirements and to generate appropriate and relevant responses at the same time the information natural language systems obtain from their users is expected to increase the accuracy of their user models in this paper we review natural language systems for generation understanding and dialogue focusing on the requirements and limitations these systems and user models place on each other we then propose avenues for future research
workflow systems have traditionally focused on the so called production processes which are characterized by predefinition high volume and repetitiveness recently the deployment of workflow systems in non traditional domains such as collaborative applications learning and cross organizational process integration has put forth new requirements for flexible and dynamic specification however this flexibility cannot be offered at the expense of control critical requirement of business processes in this paper we will present foundation set of constraints for flexible workflow specification these constraints are intended to provide an appropriate balance between flexibility and control the constraint specification framework is based on the concept of pockets of flexibility which allows ad hoc changes and or building of workflows for highly flexible processes basically our approach is to provide the ability to execute on the basis of partially specified model where the full specification of the model is made at runtime and may be unique to each instance the verification of dynamically built models is essential whereas ensuring that the model conforms to specified constraints does not pose great difficulty ensuring that the constraint set itself does not carry conflicts and redundancy is an interesting and challenging problem in this paper we will provide discussion on both the static and dynamic verification aspects we will also briefly present chameleon prototype workflow engine that implements these concepts
development of languages for specifying or modelling problems is an important direction in constraint modelling to provide greater abstraction and modelling convenience these languages are becoming more syntactically rich leading to variety of questions about their expressive power in this paper we consider the expressiveness of essence specification language with rich variety of syntactic features we identify natural fragments of essence that capture the complexity classes np all levels sigmai of the polynomial time hierarchy and all levels nexp of the nondeterministic exponential time hierarchy the union of these classes is the very large complexity class elementary one goal is to begin to understand which features play role in the high expressive power of the language and which are purely features of convenience we also discuss the formalization of arithmetic in essence and related languages notion of capturing np search which is slightly different than that of capturing np and conjectured limit to the expressive power of essence our study is an application of descriptive complexity theory and illustrates the value of taking logic based view of modelling and specification languages
as xml becomes widely used dealing with redundancies in xml data has become an increasingly important issue redundantly stored information can lead not just to higher data storage cost but also to increased costs for data transfer and data manipulation furthermore such data redundancies can lead to potential update anomalies rendering the database inconsistent one way to avoid data redundancies is to employ good schema design based on known functional dependencies in fact several recent studies have focused on defining the notion of xml functional dependencies xml fds to capture xml data redundancies we observe further that xml databases are often casually designed and xml fds may not be determined in advance under such circumstances discovering xml data redundancies in terms of fds from the data itself becomes necessary and is an integral part of the schema refinement process in this paper we present the design and implementation of the first system discoverxfd for efficient discovery of xml data redundancies it employs novel xml data structure and introduces new class of partition based algorithms discoverxfd can not only be used for the previous definitions of xml functional dependencies but also for more comprehensive notion we develop in this paper capable of detecting redundancies involving set elements while maintaining clear semantics experimental evaluations using real life and benchmark datasets demonstrate that our system is practical and scales well with increasing data size
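As a simplified flat-relational analogue of the partition idea (DiscoverXFD itself works on XML data with set elements and is not reproduced here), checking whether an FD X -> Y holds reduces to grouping tuples by their X-values and testing that each group carries a single Y-value:

```python
# Simplified, flat-relation analogue of partition-based FD checking:
# X -> Y holds iff every group of tuples agreeing on X also agrees on Y.
from collections import defaultdict

def fd_holds(rows, lhs, rhs):
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)      # partition by the LHS attributes
        groups[key].add(tuple(row[a] for a in rhs))
    return all(len(vals) == 1 for vals in groups.values())

rows = [
    {"isbn": "1", "title": "xml basics", "price": 30},
    {"isbn": "1", "title": "xml basics", "price": 35},   # same isbn, two prices
    {"isbn": "2", "title": "databases",  "price": 50},
]
print(fd_holds(rows, ["isbn"], ["title"]))   # True  -> redundancy candidate
print(fd_holds(rows, ["isbn"], ["price"]))   # False
```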
in data integration efforts portal development in particular much development time is devoted to entity resolution often advanced similarity measurement techniques are used to remove semantic duplicates or solve other semantic conflicts it proves impossible however to automatically get rid of all semantic problems an often used rule of thumb states that about of the development effort is devoted to semi automatically resolving the remaining hard cases in an attempt to significantly decrease human effort at data integration time we have proposed an approach that strives for good enough initial integration which stores any remaining semantic uncertainty and conflicts in probabilistic database the remaining cases are to be resolved with user feedback during query time the main contribution of this paper is an experimental investigation of the effects and sensitivity of rule definition threshold tuning and user feedback on the integration quality we claim that our approach indeed reduces development effort and not merely shifts the effort by showing that setting rough safe thresholds and defining only few rules suffices to produce good enough initial integration that can be meaningfully used and that user feedback is effective in gradually improving the integration quality
spatio temporal databases deal with geometries changing over time in general geometries cannot only change in discrete steps but continuously and we are talking about moving objects if only the position in space of an object is relevant then moving point is basic abstraction if also the extent is of interest then the moving region abstraction captures moving as well as growing or shrinking regions we propose new line of research where moving points and moving regions are viewed as space time or higher dimensional entities whose structure and behavior is captured by modeling them as abstract data types such types can be integrated as base attribute data types into relational object oriented or other dbms data models they can be implemented as data blades cartridges etc for extensible dbmss we expect these spatio temporal data types to play similarly fundamental role for spatio temporal databases as spatial data types have played for spatial databases the paper explains the approach and discusses several fundamental issues and questions related to it that need to be clarified before delving into specific designs of spatio temporal algebras
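A minimal sketch of what a moving point abstract data type could look like when embedded in a host language (Python here; the piecewise-linear interpolation and the sample trajectory are illustrative assumptions, not the design the paper proposes):

```python
# Sketch of a "moving point" ADT: a space-time entity queried by time, here
# represented by piecewise-linear interpolation between timestamped positions.
from bisect import bisect_right

class MovingPoint:
    def __init__(self, samples):
        # samples: list of (t, x, y), kept sorted by time t
        self.samples = sorted(samples)

    def position_at(self, t):
        ts = [s[0] for s in self.samples]
        if t <= ts[0]:
            return self.samples[0][1:]
        if t >= ts[-1]:
            return self.samples[-1][1:]
        i = bisect_right(ts, t)
        (t0, x0, y0), (t1, x1, y1) = self.samples[i - 1], self.samples[i]
        a = (t - t0) / (t1 - t0)
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

mp = MovingPoint([(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 5.0)])
print(mp.position_at(5))    # (5.0, 0.0)
print(mp.position_at(15))   # (10.0, 2.5)
```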
the view selection problem is to choose set of views to materialize over database schema such that the cost of evaluating set of workload queries is minimized and such that the views fit into prespecified storage constraint the two main applications of the view selection problem are materializing views in database to speed up query processing and selecting views to materialize in data warehouse to answer decision support queries in addition view selection is core problem for intelligent data placement over wide area network for data integration applications and data management for ubiquitous computing we describe several fundamental results concerning the view selection problem we consider the problem for views and workloads that consist of equality selection project and join queries and show that the complexity of the problem depends crucially on the quality of the estimates that query optimizer has on the size of the views it is considering to materialize when query optimizer has good estimates of the sizes of the views we show somewhat surprising result namely that an optimal choice of views may involve number of views that is exponential in the size of the database schema on the other hand when an optimizer uses standard estimation heuristics we show that the number of necessary views and the expression size of each view are polynomially bounded
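The paper's results concern the complexity of view selection rather than any particular heuristic; purely to make the problem setup concrete, here is a sketch of the folklore greedy strategy that picks views by estimated savings per unit of storage until the space budget is exhausted (view names, sizes, and savings are hypothetical optimizer estimates, and the sketch ignores interactions between views):

```python
# Illustration of the problem setup only: greedily pick views maximizing
# estimated query-cost savings per unit of storage within a space budget.
def greedy_view_selection(views, budget):
    # views: {name: (size, savings)} with hypothetical optimizer estimates
    chosen, remaining, space_left = [], dict(views), budget
    while True:
        fits = {v: (sz, sv) for v, (sz, sv) in remaining.items() if sz <= space_left}
        if not fits:
            return chosen
        best = max(fits, key=lambda v: fits[v][1] / fits[v][0])
        chosen.append(best)
        space_left -= fits[best][0]
        del remaining[best]

views = {"v_orders_by_day": (40, 300), "v_top_items": (25, 240), "v_full_join": (90, 400)}
print(greedy_view_selection(views, budget=100))   # ['v_top_items', 'v_orders_by_day']
```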
program profile attributes run time costs to portions of program’s execution most profiling systems suffer from two major deficiencies first they only apportion simple metrics such as execution frequency or elapsed time to static syntactic units such as procedures or statements second they aggressively reduce the volume of information collected and reported although aggregation can hide striking differences in program behavior this paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data flow analysis flow and context sensitivity to report more context for measurements this paper extends our previous work on efficient path profiling to flow sensitive profiling which associates hardware performance metrics with path through procedure in addition it describes data structure the calling context tree that efficiently captures calling contexts for procedure level measurements our measurements show that the spec benchmarks execute small number of hot paths that account for of their data cache misses moreover these hot paths are concentrated in few routines which have complex dynamic behavior
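A small sketch of the calling context tree idea: one node per distinct call path from the root, with a metric accumulated at each node, so the same procedure reached through different contexts is kept separate (the recorded costs below are hypothetical stand-ins for sampled hardware-counter values):

```python
# Sketch of a calling context tree (CCT): one node per distinct call path,
# with a metric accumulated per node.
class CCTNode:
    def __init__(self, name):
        self.name = name
        self.metric = 0
        self.children = {}          # callee name -> CCTNode

    def child(self, name):
        return self.children.setdefault(name, CCTNode(name))

    def dump(self, indent=0):
        print(" " * indent + f"{self.name}: {self.metric}")
        for c in self.children.values():
            c.dump(indent + 2)

root = CCTNode("main")

def record(call_path, cost):
    """Attribute `cost` to the node reached by following `call_path` from main."""
    node = root
    for frame in call_path:
        node = node.child(frame)
    node.metric += cost

# Same procedure `lookup`, two different calling contexts -> two CCT nodes.
record(["parse", "lookup"], 120)
record(["render", "lookup"], 30)
record(["parse", "lookup"], 80)
root.dump()
```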
datalog can be used to specify variety of class analyses for object oriented programs as variations of common framework in this framework the result of analysing class is set of datalog clauses whose least fixpoint is the information analysed for modular class analysis of program fragments is then expressed as the resolution of open datalog programs we provide theory for the partial resolution of sets of open clauses and define number of operators for reducing such open clauses
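To make the datalog view concrete, a minimal naive least-fixpoint evaluator is sketched below (Python); the paper's open-program resolution machinery is not modeled, and the calls/reach facts are a toy stand-in for clauses a class analysis might emit.

```python
# Minimal naive least-fixpoint evaluation of Datalog-style rules.
def naive_fixpoint(facts, rules):
    """facts: set of (pred, args-tuple); rules: list of (head, body) where each
    atom is (pred, args-tuple) and upper-case strings in args are variables."""
    db = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            for subst in match_body(body, db, {}):
                atom = substitute(head, subst)
                if atom not in db:
                    derived.add(atom)
        if not derived:
            return db
        db |= derived

def match_body(body, db, subst):
    if not body:
        yield subst
        return
    pred, args = body[0]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        s = dict(subst)
        if all(unify(a, f, s) for a, f in zip(args, fact_args)):
            yield from match_body(body[1:], db, s)

def unify(term, value, subst):
    if isinstance(term, str) and term.isupper():    # upper-case = variable
        if term in subst:
            return subst[term] == value
        subst[term] = value
        return True
    return term == value

def substitute(atom, subst):
    pred, args = atom
    return (pred, tuple(subst.get(a, a) for a in args))

# Toy flavour of a class analysis: transitive may-call reachability.
facts = {("calls", ("a", "b")), ("calls", ("b", "c"))}
rules = [(("reach", ("X", "Y")), [("calls", ("X", "Y"))]),
         (("reach", ("X", "Z")), [("calls", ("X", "Y")), ("reach", ("Y", "Z"))])]
print(sorted(naive_fixpoint(facts, rules)))
```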
process oriented support of collaborative work is an important challenge today at first glance workflow management systems wfms seem to be very suitable tools for realizing team work processes however such processes have to be frequently adapted eg due to process optimizations or when process goals change unfortunately runtime adaptability still seems to be an unsolvable problem for almost all existing wfms usually process changes can be accomplished by modifying corresponding graphical workflow wf schema especially for long running processes however it is extremely important that such changes can be propagated to already running wf instances as well but without causing inconsistencies and errors the paper presents general and comprehensive correctness criterion for ensuring compliance of in progress wf instances with modified wf schema for different kinds of wf schema changes it is precisely stated which rules and which information are needed at minimum for satisfying this criterion
selecting software technologies for software projects represents challenge to software engineers it is known that software projects differ from each other by presenting different characteristics that can complicate the selection of such technologies this is not different when considering model based testing there are many approaches with different characteristics described in the technical literature that can be used in software projects however there is no indication as to how they can fit software project therefore strategy to select model based testing approaches for software projects called porantim is fully described in this paper porantim is based on body of knowledge describing model based testing approaches and their characterization attributes identified by secondary and primary experimental studies and process to guide by adequacy and impact criteria regarding the use of this sort of software technology that can be used by software engineers to select model based testing approaches for software projects
sustainable operation of battery powered wireless embedded systems such as sensor nodes is key challenge and considerable research effort has been devoted to energy optimization of such systems environmental energy harvesting in particular solar based has emerged as viable technique to supplement battery supplies however designing an efficient solar harvesting system to realize the potential benefits of energy harvesting requires an in depth understanding of several factors for example solar energy supply is highly time varying and may not always be sufficient to power the embedded system harvesting components such as solar panels and energy storage elements such as batteries or ultracapacitors have different voltage current characteristics which must be matched to each other as well as the energy requirements of the system to maximize harvesting efficiency further battery non idealities such as self discharge and round trip efficiency directly affect energy usage and storage decisions the ability of the system to modulate its power consumption by selectively deactivating its sub components also impacts the overall power management architecture this paper describes key issues and tradeoffs which arise in the design of solar energy harvesting wireless embedded systems and presents the design implementation and performance evaluation of heliomote our prototype that addresses several of these issues experimental results demonstrate that heliomote which behaves as plug in to the berkeley crossbow motes and autonomously manages energy harvesting and storage enables near perpetual harvesting aware operation of the sensor node
accurate measurement of network bandwidth is important for network management applications as well as flexible internet applications and protocols which actively manage and dynamically adapt to changing utilization of network resources extensive work has focused on two approaches to measuring bandwidth measuring it hop by hop and measuring it end to end along path unfortunately best practice techniques for the former are inefficient and techniques for the latter are only able to observe bottlenecks visible at end to end scope in this paper we develop end to end probing methods which can measure bottleneck capacity bandwidth along arbitrary targeted subpaths of path in the network including subpaths shared by set of flows we evaluate our technique through ns simulations then provide comparative internet performance evaluation against hop by hop and end to end techniques we also describe number of applications which we foresee as standing to benefit from solutions to this problem ranging from network troubleshooting and capacity provisioning to optimizing the layout of application level overlay networks to optimized replica placement
kruppa equation based camera self calibration is one of the classical problems in computer vision most state of the art approaches directly solve the quadratic constraints derived from kruppa equations which are computationally intensive and difficult to obtain initial values in this paper we propose new initialization algorithm by estimating the unknown scalar in the equation thus the camera parameters can be computed linearly in closed form and then refined iteratively via global optimization techniques we prove that the scalar can be uniquely recovered from the infinite homography and propose practical method to estimate the homography from physical or virtual plane located at far distance to the camera extensive experiments on synthetic and real images validate the effectiveness of the proposed method
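For reference, the classical Kruppa constraint with its unknown scale factor, in the standard formulation (this is textbook material rather than the paper's derivation; F is the fundamental matrix, e' the epipole in the second image, omega* = K K^T the dual image of the absolute conic, and lambda the scalar the abstract refers to):

```latex
% Classical Kruppa constraint, standard formulation (stated for reference):
% F: fundamental matrix, e': epipole in the second image,
% \omega^{*} = K K^{\top}: dual image of the absolute conic,
% \lambda: the unknown scalar the abstract refers to.
\[
  F\,\omega^{*}\,F^{\top} \;=\; \lambda\,[e']_{\times}\,\omega^{*}\,[e']_{\times}^{\top}
\]
% Eliminating \lambda gives the usual quadratic constraints on \omega^{*};
% estimating \lambda directly keeps the system linear in the entries of \omega^{*},
% which is the style of initialization the abstract describes.
```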
we present new approach to building secure systems in our approach which we call model driven security designers specify system models along with their security requirements and use tools to automatically generate system architectures from the models including complete configured access control infrastructures rather than fixing one particular modeling language for this process we propose general schema for constructing such languages that combines languages for modeling systems with languages for modeling security we present several instances of this schema that combine both syntactically and semantically different uml modeling languages with security modeling language for formalizing access control requirements from models in the combined languages we automatically generate access control infrastructures for server based applications built from declarative and programmatic access control mechanisms the modeling languages and generation process are semantically well founded and are based on an extension of role based access control we have implemented this approach in uml based case tool and report on experiments
digital inking systems accept pen based input from the user process and archive the resulting data as digital ink however the reviewing techniques currently available for such systems are limited in this paper we formalize operators that model the user interaction during digital ink capture such operators can be applied in situations where it is important to have customized view of the inking activity we describe the implementation of player that allows the user by selecting the desired operators to interact with digitally annotated documents while reviewing them
we present polynomial time randomized algorithm for global value numbering our algorithm is complete when conditionals are treated as non deterministic and all operators are treated as uninterpreted functions we are not aware of any complete polynomial time deterministic algorithm for the same problem the algorithm does not require symbolic manipulations and hence is simpler to implement than the deterministic symbolic algorithms the price for these benefits is that there is probability that the algorithm can report false equality we prove that this probability can be made arbitrarily small by controlling various parameters of the algorithm our algorithm is based on the idea of random interpretation which relies on executing program on number of random inputs and discovering relationships from the computed values the computations are done by giving random linear interpretations to the operators in the program both branches of conditional are executed at join points the program states are combined using random affine combination we discuss ways in which this algorithm can be made more precise by using more accurate interpretations for the linear arithmetic operators and other language constructs
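A toy rendering of the random-interpretation idea (Python; this is not the paper's full algorithm and omits the affine join of program states at merge points): each uninterpreted operator is given a random affine interpretation over a large prime field, so syntactically identical terms always evaluate equal while distinct terms collide only with small probability.

```python
# Toy random interpretation: each uninterpreted operator becomes a random
# affine function over a prime field; equal values across a few random runs
# suggest term equality, with a small, controllable false-positive probability.
import random

P = (1 << 61) - 1                       # large prime modulus
random.seed(1)

class RandomOp:
    def __init__(self):
        self.a, self.b, self.c = (random.randrange(1, P) for _ in range(3))
    def __call__(self, x, y):
        return (self.a * x + self.b * y + self.c) % P

f = RandomOp()                          # one fixed random interpretation per operator
g = RandomOp()

def value_number(runs=2):
    """Fingerprint each expression by its values over several random inputs."""
    exprs = {
        "t1": lambda x, y: f(g(x, y), x),
        "t2": lambda x, y: f(g(x, y), x),      # same term as t1 -> same values
        "t3": lambda x, y: f(x, g(x, y)),      # different term -> differs w.h.p.
    }
    samples = [(random.randrange(P), random.randrange(P)) for _ in range(runs)]
    return {name: tuple(e(x, y) for x, y in samples) for name, e in exprs.items()}

vals = value_number()
print(vals["t1"] == vals["t2"], vals["t1"] == vals["t3"])   # True False
```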
parity games are player games of perfect information and infinite duration that have important applications in automata theory and decision procedures validity as well as model checking for temporal logics in this paper we investigate practical aspects of solving parity games the main contribution is suggestion on how to solve parity games efficiently in practice we present generic solver that intertwines optimisations with any of the existing parity game algorithms which is only called on parts of game that cannot be solved faster by simpler methods this approach is evaluated empirically on series of benchmarking games from the aforementioned application domains showing that using this approach vastly speeds up the solving process as side effect we obtain the surprising observation that zielonka’s recursive algorithm is the best parity game solver in practice
we propose novel algorithm to register multiple point sets within common reference frame using manifold optimization approach the point sets are obtained with multiple laser scanners or mobile scanner unlike most prior algorithms our approach performs an explicit optimization on the manifold of rotations allowing us to formulate the registration problem as an unconstrained minimization on constrained manifold this approach exploits the lie group structure of so and the simple representation of its associated lie algebra so in terms of our contributions are threefold we present new analytic method based on singular value decompositions that yields closed form solution for simultaneous multiview registration in the noise free scenario secondly we use this method to derive good initial estimate of solution in the noise free case this initialization step may be of use in any general iterative scheme finally we present an iterative scheme based on newton’s method on so that has locally quadratic convergence we demonstrate the efficacy of our scheme on scan data taken both from the digital michelangelo project and from scans extracted from models and compare it to some of the other well known schemes for multiview registration in all cases our algorithm converges much faster than the other approaches in some cases orders of magnitude faster and generates consistently higher quality registrations
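The closed-form multiview initialization and the Newton iteration on the rotation manifold are not reproduced here; the sketch below shows only the classical single-pair building block, the SVD-based orthogonal Procrustes solution for the rotation aligning two centered, corresponding point sets (the test data is synthetic):

```python
# Classic single-pair building block (orthogonal Procrustes / Kabsch): the
# SVD-based closed-form rotation aligning corresponding, centered point sets.
import numpy as np

def best_rotation(P, Q):
    """Rotation R (3x3) minimizing ||R @ P - Q||_F for centered 3xN point sets."""
    H = Q @ P.T
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    return U @ D @ Vt

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 20))
P -= P.mean(axis=1, keepdims=True)                 # assume centered correspondences
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
Q = R_true @ P

R_est = best_rotation(P, Q)
print(np.allclose(R_est, R_true))                  # True
```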
by using data mining techniques the data stored in data warehouse dw can be analyzed for the purpose of uncovering and predicting hidden patterns within the data so far different approaches have been proposed to accomplish the conceptual design of dws by following the multidimensional md modeling paradigm in previous work we have proposed uml profile for dws enabling the specification of main md properties at conceptual level this paper presents novel approach to integrating data mining models into multidimensional models in order to accomplish the conceptual design of dws with association rules ar to this goal we extend our previous work by providing another uml profile that allows us to specify association rules mining models for dw at conceptual level in clear and expressive way the main advantage of our proposal is that the association rules rely on the goals and user requirements of the data warehouse instead of the traditional method of specifying association rules by considering only the final database implementation structures such as tables rows or columns in this way ars are specified in the early stages of dw project thus reducing the development time and cost finally in order to show the benefits of our approach we have implemented the specified association rules on commercial database management server
we present novel keyword search scheme for file sharing applications based on content distance addressing cda through cda we are able to associate node ids with content distances and thus reduce several complex keyword queries to routing problems on structured overlays unlike traditional approaches we require neither set intersection process nor replication for query processing as result of content distance addressing in this paper we present the design and theoretical analysis of cda for keyword search as well as simulation results using real world parameters
information visualizations often make permanent changes to the user interface with the aim of supporting specific tasks however permanent visualization cannot support the variety of tasks found in realistic work settings equally well we explore interaction techniques that transiently visualize information near the user’s focus of attention transient visualizations support specific contexts of use without permanently changing the user interface and aim to seamlessly integrate with existing tools and to decrease distraction examples of transient visualizations for document search map zoom outs fisheye views of source code and thesaurus access are presented we provide an initial validation of transient visualizations by comparing transient overview for maps to permanent visualization among users of these visualizations all but four preferred the transient visualization however differences in time and error rates were insignificant on this background we discuss the potential of transient visualizations and future directions
software architecture styles for developing multiuser applications are usually defined at conceptual level abstracting such low level issues of distributed implementation as code replication caching strategies and concurrency control policies ultimately such conceptual architectures must be cast into code the iterative design inherent in interactive systems implies that significant evolution will take place at the conceptual level equally however evolution occurs at the implementation level in order to tune performance this paper introduces dragonfly software architecture style that maintains tight bidirectional link between conceptual and implementation software architectures allowing evolution to be performed at either level dragonfly has been implemented in the java based telecomputing developer tcd toolkit
in this paper we propose new interconnection mechanism for network line cards we project that the packet storage needs for the next generation networks will be much higher such that the number of memory modules required to store the packets will be more than can be directly connected to the network processor npu in other words the npu pins are limited and they do not scale well with the growing number of memory modules and processing elements employed on the network line cards as result we propose to explore more suitable off chip interconnect and communication mechanisms that will replace the existing systems and that will provide extraordinarily high throughput in particular we investigate if the packet switched ary cube networks can be solution to the best of our knowledge this is the first time the ary cube networks are used on board we investigate multiple ary cube based interconnects and include variation of ary cube interconnect called the mesh all of the ary cube interconnects include multiple highly efficient techniques to route switch and control packet flows in order to minimize congestion spots and packet loss within the interconnects we explore the tradeoffs between implementation constraints and performance performance results show that ary cube topologies significantly outperform the existing line card interconnects and they are able to sustain higher traffic loads furthermore the mesh reaches the highest performance results of all interconnects and allows future scalability to adopt more memories and or processors to increase the line card’s processing power
wikis have proved to be very effective collaboration and knowledge management tools in large variety of fields thanks to their simplicity and flexible nature another important development for the internet is the emergence of powerful mobile devices supported by fast and reliable wireless networks the combination of these developments begs the question of how to extend wikis on mobile devices and how to leverage mobile devices rich modalities to supplement current wikis realizing that composing and consuming through auditory channel is the most natural and efficient way for mobile device user this paper explores the use of audio as the medium of wiki our work as the first step towards this direction creates framework called mobile audio wiki which facilitates asynchronous audio mediated collaboration on the move in this paper we present the design of mobile audio wiki as part of such design we propose an innovative approach for light weight audio content annotation system for enabling group editing versioning and cross linking among audio clips to elucidate the novel collaboration model introduced by mobile audio wiki its four usage modes are identified and presented in storyboard format finally we describe the initial design for presentation and navigation of mobile audio wiki
swarm based systems are class of multi agent systems mas of particular interest because they exhibit emergent behaviour through self organisation they are biology inspired but find themselves applicable to wide range of domains with some of them characterised as mission critical it is therefore implied that the use of formal framework and methods would facilitate modelling of mas in such way that the final product is fully tested and safety properties are verified one way to achieve this is by defining new formalism to specify mas something which could precisely fit the purpose but requires significant period to formally prove the validation power of the method the alternative is to use existing formal methods thus exploiting their legacy in this paper we follow the latter approach we present operas an open framework that facilitates formal modelling of mas through employing existing formal methods we describe how particular instance of this framework namely operas xc could integrate the most prominent characteristics of finite state machines and biological computation systems such as machines and systems respectively we demonstrate how the resulting method can be used to formally model swarm system and discuss the flexibility and advantages of this approach
current schema matching approaches still have to improve for very large and complex schemas such schemas are increasingly written in the standard language w3c xml schema especially in business applications the high expressive power and versatility of this schema language in particular its type system and support for distributed schemas and name spaces introduce new issues in this paper we study some of the important problems in matching such large xml schemas we propose fragment oriented match approach to decompose large match problem into several smaller ones and to reuse previous match results at the level of schema fragments
we present the first report of automatic sentiment summarization in the legal domain this work is based on processing set of legal questions with system consisting of semi automatic web blog search module and fastsum fully automatic extractive multi document sentiment summarization system we provide quantitative evaluation results of the summaries using legal expert reviewers we report baseline evaluation results for query based sentiment summarization for legal blogs on five point scale average responsiveness and linguistic quality are slightly higher than with human inter rater agreement at to the best of our knowledge this is the first evaluation of sentiment summarization in the legal blogosphere
software development environment supports complex network of items of at least the following major types people policies laws resources processes and results such items may need to be changed on an on going basis the authors have designed in the prism project model of changes and two supporting change related environment infrastructures with the following key features separation of changes to the described items from the changes to the environmental facilities encapsulating these items facility called the dependency structure for describing various items and their interdependencies and for identifying the items affected by given change facility called the change structure for classifying recording and analyzing change related data and for making qualitative judgments of the consequences of change identification of the many distinct properties of change and built in mechanism for providing feedback the author’s approach to the problem of change and its rationale is described
over the past few years we have seen the proliferation of internet based services ranging from search engines and map services to video on demand servers all of these kinds of services need to be able to provide guarantees of availability and scalability to their users with millions of users on the internet today these services must have the capacity to handle large number of clients and remain available even in the face of extremely high load in this paper we present generic architecture for supporting such internet applications we provide substrate for scalable network services sns on top of which application developers can design their services without worrying about the details of service management we back our design with three real world services web distillation proxy proxy based web browser for pdas and an mbone archive server
ring interconnects may be an attractive solution for future chip multiprocessors because they can enable faster links than buses and simpler switches than arbitrary switched interconnects moreover ring naturally orders requests sufficiently to enable directory less coherence but not in the total order that buses provide for snooping coherence existing cache coherence protocols for rings either establish total ordering point ordering point or use greedy order greedy order with unbounded retries in this work we propose new class of ring protocols ringorder in which requests complete in ring position order to achieve two benefits first ring order improves performance relative to ordering point by activating requests immediately instead of waiting for them to reach the ordering point second it improves performance stability relative to greedy order by not using retries thus the new ring order combines the best of ordering point good performance stability with the best of greedy order good average performance
how can authoring tools help authors create complex innovative hypertext narrative structures tools for creating hypertext fiction typically represent such narratives in the form of nodes and links however existing tools are not particularly helpful when an author wants to create story with more complex structure such as story told from multiple points of view in this paper we describe our work to develop hypedyn new hypertext authoring tool that provides alternative representations designed to make it easier to create complex hypertext story structures as an initial exploration the tool has been designed to support authoring of interactive multiple points of view stories in order to describe the tool we describe simplified transformation of rashomon into progressively more interactive narrative along the way we identify useful new representations mechanisms and visualizations for helping the author we conclude with some thoughts about the design of interactive storytelling authoring tools in general
vague information is common in many database applications due to internet scale data dissemination such as those data arising from sensor networks and mobile communications we have formalized the notion of vague relation in order to model vague data in our previous work in this paper we utilize functional dependencies fds which are the most fundamental integrity constraints that arise in practice in relational databases to maintain the consistency of vague relation the problem we tackle is given vague relation over schema and set of fds over what is the best approximation of with respect to when taking into account the median membership and the imprecision membership thresholds using these two thresholds of vague set we define the notion of mi overlap between vague sets and merge operation on satisfaction of an fd in is defined in terms of values being mi overlapping we show that lien’s and atzeni’s axiom system is sound and complete for fds being satisfied in vague relations we study the chase procedure for vague relation over named vchase as means to maintain consistency of with respect to our main result is that the output of the procedure is the most object precise approximation of with respect to the complexity of vchase is polynomial time in the sizes of and
the widespread adoption of xml holds the promise that document structure can be exploited to specify precise database queries however users may have only limited knowledge of the xml structure and may be unable to produce correct xquery expression especially in the context of heterogeneous information collection the default is to use keyword based search and we are all too familiar with how difficult it is to obtain precise answers by these means we seek to address these problems by introducing the notion of meaningful query focus mqf for finding related nodes within an xml document mqf enables users to take full advantage of the preciseness and efficiency of xquery without requiring perfect knowledge of the document structure such schema free xquery is potentially of value not just to casual users with partial knowledge of schema but also to experts working in data integration or data evolution in such context schema free query once written can be applied universally to multiple data sources that supply similar content under different schemas and applied forever as these schemas evolve our experimental evaluation found that it is possible to express wide variety of queries in schema free manner and efficiently retrieve correct results over broad diversity of schemas furthermore the evaluation of schema free query is not expensive using novel stack based algorithm we developed for computing mqf the overhead is from to times the execution time of an equivalent schema aware query the evaluation cost of schema free queries can be further reduced by as much as using selectivity based algorithm we develop to enable the integration of mqf operation into the query pipeline
accurate next web page prediction benefits many applications business in particular the most widely used techniques for this purpose are markov model association rules and clustering however each of these techniques has its own limitations especially when it comes to accuracy and space complexity this paper presents an improved prediction accuracy and state space complexity by using novel approaches that combine clustering association rules and markov models the three techniques are integrated together to maximise their strengths the integration model has been shown to achieve better prediction accuracy than individual and other integrated models
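To make the integrated prediction idea above more concrete, here is a minimal sketch of just one ingredient, a first-order Markov predictor trained on user sessions; the clustering and association-rule components and the way the paper combines the three are not reproduced, and all names below are hypothetical.

```python
from collections import defaultdict

# First-order Markov next-page predictor over user sessions (toy sketch).
def train_markov(sessions):
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, current_page):
    followers = counts.get(current_page)
    if not followers:
        return None  # a combined model would fall back to rules/cluster statistics here
    return max(followers, key=followers.get)

sessions = [["home", "products", "cart"],
            ["home", "products", "reviews"],
            ["home", "products", "cart", "checkout"]]
model = train_markov(sessions)
print(predict_next(model, "products"))  # -> "cart"
```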
we consider an business web server system where the network traffic exhibits self similarity we demonstrate that traditional techniques are unsuitable for predicting the network performance under such traffic conditions instead we propose and demonstrate novel decomposition approximation technique that helps predict delays more accurately and thus is better suited for capacity planning and network design when compared to traditional queueing network analyzers we also consider several strategies for mitigating the effect of self similarity and conclude that admission control holds the greatest potential for improving service we provide an approximation technique for computing the admission control parameter values numerical results and suggestions for future work are discussed
previous implementations of generic rewriting libraries have number of limitations they require the user to either adapt the datatype on which rewriting is applied or the rewriting rules are specified as functions which makes it hard or impossible to document test and analyse them we describe library that demonstrates how to overcome these limitations by defining rules in terms of datatypes and show how to use type indexed datatype to automatically extend datatype for syntax trees with case for metavariables we then show how rewrite rules can be implemented without any knowledge of how the datatype is extended with metavariables we use haskell extended with associated type synonyms to implement both type indexed datatypes and generic functions we analyse the performance of our library and compare it with other approaches to generic rewriting
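The library described above relies on Haskell type-indexed datatypes to extend a syntax tree with a metavariable case; Python cannot express that, so the following is only an analogy showing what "rules as data with metavariables" buys you: rules become plain values that can be inspected, tested and documented. All names are invented for illustration.

```python
# Toy term rewriting where rules are data and metavariables are explicit values.
class Var:                       # metavariable occurring in a rewrite rule
    def __init__(self, name): self.name = name

def match(pattern, term, env):
    if isinstance(pattern, Var):
        env[pattern.name] = term          # no nonlinear-pattern check in this toy
        return env
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term) and pattern[0] == term[0]):
        for p, t in zip(pattern[1:], term[1:]):
            if match(p, t, env) is None:
                return None
        return env
    return env if pattern == term else None

def instantiate(pattern, env):
    if isinstance(pattern, Var):
        return env[pattern.name]
    if isinstance(pattern, tuple):
        return (pattern[0],) + tuple(instantiate(p, env) for p in pattern[1:])
    return pattern

# rule: add(x, zero) -> x
lhs, rhs = ("add", Var("x"), ("zero",)), Var("x")
env = match(lhs, ("add", ("succ", ("zero",)), ("zero",)), {})
print(instantiate(rhs, env))     # ('succ', ('zero',))
```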
in cost sharing problem several participants with unknown preferences vie to receive some good or service and each possible outcome has known cost cost sharing mechanism is protocol that decides which participants are allocated good and at what prices three desirable properties of cost sharing mechanism are incentive compatibility meaning that participants are motivated to bid their true private value for receiving the good budget balance meaning that the mechanism recovers its incurred cost with the prices charged and economic efficiency meaning that the cost incurred and the value to the participants are traded off in an optimal way these three goals have been known to be mutually incompatible for thirty years nearly all the work on cost sharing mechanism design by the economics and computer science communities has focused on achieving two of these goals while completely ignoring the third we introduce novel measures for quantifying efficiency loss in cost sharing mechanisms and prove simultaneous approximate budget balance and approximate efficiency guarantees for mechanisms for wide range of cost sharing problems including all submodular and steiner tree problems our key technical tool is an exact characterization of worst case efficiency loss in moulin mechanisms the dominant paradigm in cost sharing mechanism design
our goal is to use the vast repositories of available open source code to generate specific functions or classes that meet user’s specifications the key words here are specifications and generate we let users specify what they are looking for as precisely as possible using keywords class or method signatures test cases contracts and security constraints our system then uses an open set of program transformations to map retrieved code into what the user asked for this approach is implemented in prototype system for java with web interface
user session based testing of web applications gathers user sessions to create and continually update test suites based on real user input in the field to support this approach during maintenance and beta testing phases we have built an automated framework for testing web based software that focuses on scalability and evolving the test suite automatically as the application’s operational profile changes this paper reports on the automation of the replay and oracle components for web applications which pose issues beyond those in the equivalent testing steps for traditional stand alone applications concurrency nondeterminism dependence on persistent state and previous user sessions complex application infrastructure and large number of output formats necessitate developing different replay and oracle comparator operators which have tradeoffs in fault detection effectiveness precision of analysis and efficiency we have designed implemented and evaluated set of automated replay techniques and oracle comparators for user session based testing of web applications this paper describes the issues algorithms heuristics and an experimental case study with user sessions for two web applications from our results we conclude that testers performing user session based testing should consider their expectations for program coverage and fault detection when choosing replay and oracle technique
in this paper we explore how to add pointing input capabilities to very small screen devices on first sight touchscreens seem to allow for particular compactness because they integrate input and screen into the same physical space the opposite is true however because the user’s fingers occlude contents and prevent precision we argue that the key to touch enabling very small devices is to use touch on the device backside in order to study this we have created prototype device we simulate screens smaller than that by masking the screen we present user study in which participants completed pointing task successfully across display sizes when using back of device interface the touchscreen based control condition enhanced with the shift technique in contrast failed for screen diagonals below inch we present four form factor concepts based on back of device interaction and provide design guidelines extracted from second user study
learning is currently rapidly expanding domain provoked by the fast advances of mobile technologies different applications and systems are developed continuously here we address the hoarding problem which is weakly explored before but is particularly important issue in the mobile domain and solution should be included in every system with large quantity of data hoarding is the process of automatically selecting learning content which is to be prepared and prefetched on the mobile device’s local memory for the following offline session we describe the hoarding problem and the strategy to solve it with the goal of providing an efficient hoarding solution
reflection plays major role in the programming of generic applications however it introduces an interpretation layer which is detrimental to performance solution consists of relying on partial evaluation to remove this interpretation layer this paper deals with improving standard partial evaluator in order to handle the java reflection api the improvements basically consist of taking type information into account when distinguishing between static and dynamic data as well as introducing two new specialization actions reflection actions benchmarks using the serialization framework show the benefits of the approach
earlier research on gender effects with software features intended to help problem solvers in end user debugging environments has shown that females are less likely to use unfamiliar software features this poses serious problem because these features may be key to helping them with debugging problems contrasting this with research documenting males inclination for tinkering in unfamiliar environments the question arises as to whether encouraging tinkering with new features would help females overcome the factors such as low self efficacy that led to the earlier results in this paper we present an experiment with males and females in an end user debugging setting and investigate how tinkering behavior impacts several measures of their debugging success our results show that the factors of tinkering reflection and self efficacy can combine in multiple ways to impact debugging effectiveness differently for males than for females
this paper describes how information visualization techniques can be used to monitor web based collaborative platform and to support workplace awareness by providing global overview of the activities an innovative prototype is described its originality relies on using some enclosure based visualization methods in the context of activities monitoring which is rather unusual in addition new layout is described for representing data trees the use of the system is illustrated with the case of eu funded network of excellence
there is correspondence between classical logic and programming language calculi with first class continuations with the addition of control delimiters prompts the continuations become composable and the calculi are believed to become more expressive we formalise that the addition of prompts corresponds to the addition of single dynamically scoped variable modelling the special top level continuation from type perspective the dynamically scoped variable requires effect annotations from logic perspective the effect annotations can be understood in standard logic extended with the dual of implication namely subtraction
the explosion of multimedia data necessitates effective and efficient ways for us to get access to our desired ones in this article we draw an analogy between image retrieval and text retrieval and propose visual phrase based approach to retrieve images containing desired objects object based image retrieval the visual phrase is defined as pair of frequently co occurred adjacent local image patches and is constructed using data mining we design methods on how to construct visual phrase and how to index search images based on visual phrase we demonstrate experiments to show our visual phrase based approach can be very efficient and more effective than current visual word based approach
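As a rough illustration of the visual phrase construction described above, the sketch below mines frequently co-occurring pairs of adjacent visual words, assuming patches have already been quantized into words and adjacency is given per image; the quantization, indexing and ranking stages are omitted and the data layout is hypothetical.

```python
from collections import Counter

# Mine "visual phrases": pairs of adjacent visual words frequent across images.
def mine_visual_phrases(images, min_support=2):
    pair_counts = Counter()
    for words, neighbours in images:           # neighbours: list of (i, j) patch index pairs
        seen = set()
        for i, j in neighbours:
            pair = tuple(sorted((words[i], words[j])))
            if pair not in seen:               # count each pair at most once per image
                seen.add(pair)
                pair_counts[pair] += 1
    return {p for p, c in pair_counts.items() if c >= min_support}

imgs = [(["w1", "w2", "w3"], [(0, 1), (1, 2)]),
        (["w2", "w1", "w9"], [(0, 1)])]
print(mine_visual_phrases(imgs))               # {('w1', 'w2')}
```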
many collaboration features in software development tools draw on lightweight technologies such as tagging and wikis we propose to study the role of emergent knowledge structures created through these features using mixed methods approach we investigate which processes emergent knowledge structures support and how tool support can leverage them
data can be distinguished according to volume variable types and distribution and each of these characteristics imposes constraints upon the choice of applicable algorithms for their visualisation this has led to an abundance of often disparate algorithmic techniques previous work has shown that hybrid algorithmic approach can be successful in addressing the impact of data volume on the feasibility of multidimensional scaling mds this paper presents system and framework in which user can easily explore algorithms as well as their hybrid conjunctions and the data flowing through them visual programming and novel algorithmic architecture let the user semiautomatically define data flows and the coordination of multiple views of algorithmic and visualisation components we propose that our approach has two main benefits significant improvements in run times of mds algorithms can be achieved and intermediate views of the data and the visualisation program structure can provide greater insight and control over the visualisation process
hierarchical classification framework is proposed for discriminating rare classes in imprecise domains characterized by rarity of both classes and cases noise and low class separability the devised framework couples the rules of rule based classifier with as many local probabilistic generative models these are trained over the coverage of the corresponding rules to better catch those globally rare cases classes that become less rare in the coverage two novel schemes for tightly integrating rule based and probabilistic classification are introduced that classify unlabeled cases by considering multiple classifier rules as well as their local probabilistic counterparts an intensive evaluation shows that the proposed framework is competitive and often superior in accuracy wrt established competitors while overcoming them in dealing with rare classes
mismatch and overload are two fundamental issues regarding the efficiency of web information gathering to provide satisfactory solution this paper presents web information gathering system that encapsulates two phases the filtering and sophisticated data processing the objective of the filtering is to quickly filter out most irrelevant data in order to avoid mismatch the phase of the sophisticated data processing can use more sophisticated techniques without carefully considering time complexities the second phase is for solving the problem of the information overload
the growing complexity of customizable single chip multiprocessors is requiring communication resources that can only be provided by highly scalable communication infrastructure this trend is exemplified by the growing number of network on chip noc architectures that have been proposed recently for system on chip soc integration developing noc based systems tailored to particular application domain is crucial for achieving high performance energy efficient customized solutions the effectiveness of this approach largely depends on the availability of an ad hoc design methodology that starting from high level application specification derives an optimized noc configuration with respect to different design objectives and instantiates the selected application specific on chip micronetwork automatic execution of these design steps is highly desirable to increase soc design productivity this paper illustrates complete synthesis flow called netchip for customized noc architectures that partitions the development work into major steps topology mapping selection and generation and provides proper tools for their automatic execution sunmap xpipescompiler the entire flow leverages the flexibility of fully reusable and scalable network components library called xpipes consisting of highly parameterizable network building blocks network interface switches switch to switch links that are design time tunable and composable to achieve arbitrary topologies and customized domain specific noc architectures several experimental case studies are presented in the paper showing the powerful design space exploration capabilities of the proposed methodology and tools
current critical systems often use lot of floating point computations and thus the testing or static analysis of programs containing floating point operators has become priority however correctly defining the semantics of common implementations of floating point is tricky because semantics may change according to many factors beyond source code level such as choices made by compilers we here give concrete examples of problems that can appear and solutions for implementing in analysis software
in this paper we present the design implementation and evaluation of iplane scalable service providing accurate predictions of internet path performance for emerging overlay services unlike the more common black box latency prediction techniques in use today iplane adopts structural approach and predicts end to end path performance by composing the performance of measured segments of internet paths for the paths we observed this method allows us to accurately and efficiently predict latency bandwidth capacity and loss rates between arbitrary internet hosts we demonstrate the feasibility and utility of the iplane service by applying it to several representative overlay services in use today content distribution swarming peer to peer filesharing and voice over ip in each case using iplane’s predictions leads to improved overlay performance
communication in wireless sensor networks uses the majority of sensor’s limited energy using aggregation in wireless sensor network reduces the overall communication cost security in wireless sensor networks entails many different challenges traditional end to end security is not suitable for use with in network aggregation corrupted sensor has access to the data and can falsify results additively homomorphic encryption allows for aggregation of encrypted values with the result being the same as the result when unencrypted data was aggregated using public key cryptography digital signatures can be used to achieve integrity we propose new algorithm using homomorphic encryption and additive digital signatures to achieve confidentiality integrity and availability for in network aggregation in wireless sensor networks we prove that our digital signature algorithm which is based on the elliptic curve digital signature algorithm ecdsa is as secure as ecdsa
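To illustrate what "additively homomorphic" buys in-network aggregation, here is a deliberately simple key-stream style sketch (ciphertext = message plus per-node pad modulo M); it is not the construction used in the paper, omits the ECDSA-based signature part entirely, and the key values are hypothetical.

```python
# Toy additively homomorphic encryption: sums of ciphertexts decrypt to sums of plaintexts.
M = 2 ** 32

def encrypt(m, k):
    return (m + k) % M

def aggregate(ciphertexts):
    return sum(ciphertexts) % M          # performed hop by hop inside the network

def decrypt(agg, keys):
    return (agg - sum(keys)) % M         # sink knows every node's pad

readings = [17, 23, 5]
keys = [123456, 987654, 555555]          # hypothetical per-node pads
cts = [encrypt(m, k) for m, k in zip(readings, keys)]
print(decrypt(aggregate(cts), keys))     # 45
```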
fast estimates for aggregate queries are useful in database query optimization approximate query answering and online query processing hence there has been lot of focus on selectivity estimation that is computing summary statistics on the underlying data and using that to answer aggregate queries fast and to reasonable approximation we present two sets of results for range aggregate queries which are amongst the most common queries first we focus on histogram as summary statistics and present algorithms for constructing histograms that are provably optimal or provably approximate for range queries these algorithms take pseudo polynomial time these are the first known optimality or approximation results for arbitrary range queries previously known results were optimal only for restricted range queries such as equality queries hierarchical or prefix range queries second we focus on wavelet based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries no previously known wavelet based methods have this property we perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods
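The dynamic-programming flavour behind optimal histogram construction can be sketched on the simpler classic sum-of-squared-errors objective; the paper analyses range-query error measures instead, so this is only an illustration of the technique, not of its exact algorithm.

```python
# Optimal partition of a sorted value sequence into B buckets minimizing SSE (O(n^2 B) DP).
def optimal_histogram(values, buckets):
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):                        # error of one bucket over values[i:j]
        s, sq, cnt = prefix[j] - prefix[i], prefix_sq[j] - prefix_sq[i], j - i
        return sq - s * s / cnt

    INF = float("inf")
    dp = [[INF] * (buckets + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for b in range(1, buckets + 1):
            for i in range(b - 1, j):
                dp[j][b] = min(dp[j][b], dp[i][b - 1] + sse(i, j))
    return dp[n][buckets]

print(optimal_histogram([1, 1, 1, 9, 9, 9, 5, 5], 3))   # 0.0: a perfect 3-bucket split exists
```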
real life data is often dirty and costs billions of pounds to businesses worldwide each year this paper presents promising approach to improving data quality it effectively detects and fixes inconsistencies in real life data based on conditional dependencies an extension of database dependencies by enforcing bindings of semantically related data values it accurately identifies records from unreliable data sources by leveraging relative candidate keys an extension of keys for relations by supporting similarity and matching operators across relations in contrast to traditional dependencies that were developed for improving the quality of schema the revised constraints are proposed to improve the quality of data these constraints yield practical techniques for data repairing and record matching in uniform framework
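A minimal, hypothetical illustration of the conditional-dependency idea above: "for records with country UK, zip determines city". The sketch only flags violations; choosing how to repair them, and the record-matching side with relative candidate keys, are the harder parts the paper addresses.

```python
# Detect violations of a conditional functional dependency: condition => lhs determines rhs.
def cfd_violations(records, condition, lhs, rhs):
    seen, bad = {}, []
    for r in records:
        if all(r.get(k) == v for k, v in condition.items()):
            key = r[lhs]
            if key in seen and seen[key] != r[rhs]:
                bad.append(r)
            else:
                seen.setdefault(key, r[rhs])
    return bad

rows = [{"country": "UK", "zip": "EH1", "city": "Edinburgh"},
        {"country": "UK", "zip": "EH1", "city": "London"},
        {"country": "US", "zip": "EH1", "city": "Anything"}]
print(cfd_violations(rows, {"country": "UK"}, "zip", "city"))   # flags the London record
```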
the tautology problem is the problem to prove the validity of statements in this paper we present calculus for this undecidable problem on graphical conditions prove its soundness investigate the necessity of each deduction rule and discuss practical aspects concerning an implementation as we use the framework of weak adhesive hlr categories the calculus is applicable to number of replacement capable structures such as petri nets graphs or hypergraphs
current advances in digital technology promote capturing and storing more digital photos than ever while photo collections are growing in size the amount of time that can be devoted to viewing managing and sharing digital photos remains constant photo decision making and selection has been identified as key to addressing this concern after conducting exploratory research on photo decision making including wide scale survey of user behaviors detailed contextual inquiries and longer term diary studies pixaura was designed to address problems that emerged from our research specifically pixaura aims to bridge the gap between importing source photos and sharing them with others by supporting tentative decision making within the selection process for this experience the system incorporates certain core elements flexibility to experiment with relationships between photos and groups of photos the ability to closely couple photos while sharing only subset of those photos and tight connection between the photo selection and photo sharing space
the probabilistic latent semantic indexing model introduced by hofmann has engendered applications in numerous fields notably document classification and information retrieval in this context the fisher kernel was found to be an appropriate document similarity measure however the kernels published so far contain unjustified features some of which hinder their performances furthermore plsi is not generative for unknown documents shortcoming usually remedied by folding them in the plsi parameter space this paper contributes on both points by introducing new rigorous development of the fisher kernel for plsi addressing the role of the fisher information matrix and uncovering its relation to the kernels proposed so far and proposing novel and theoretically sound document similarity which avoids the problem of folding in unknown documents for both aspects experimental results are provided on several information retrieval evaluation sets
it can be very difficult to debug impure code let alone prove its correctness to address these problems we provide functional specification of three central components of peyton jones’s awkward squad teletype io mutable state and concurrency by constructing an internal model of such concepts within our programming language we can test debug and reason about programs that perform io as if they were pure in particular we demonstrate how our specifications may be used in tandem with quickcheck to automatically test complex pointer algorithms and concurrent programs
clustering can play critical role in increasing the performance and lifetime of wireless networks the facility location problem is general abstraction of the clustering problem and this paper presents the first constant factor approximation algorithm for the facility location problem on unit disk graphs udgs commonly used model for wireless networks in this version of the problem connection costs are not metric ie they do not satisfy the triangle inequality because connecting to non neighbor costs in non metric settings the best approximation algorithms guarantee a log n factor approximation but we are able to use structural properties of udgs to obtain constant factor approximation our approach combines ideas from the primal dual algorithm for facility location due to jain and vazirani jacm with recent results on the weighted minimum dominating set problem for udgs huang et al comb opt we then show that the facility location problem on udgs is inherently local and one can solve local subproblems independently and combine the solutions in simple way to obtain good solution to the overall problem this leads to distributed version of our algorithm in the local model that runs in constant rounds and still yields constant factor approximation even if the udg is specified without geometry we are able to combine recent results on maximal independent sets and clique partitioning of udgs to obtain a log n approximation that runs in log rounds
this paper addresses design exploration for protocols that are employed in systems with availability consistency tradeoffs distributed data is modelled as states of objects replicated across network and whose updates require satisfaction of integrity constraints over multiple objects upon detection of partition such network will continue to provide delivery of services in parallel partitions but only for updates with non critical integrity constraints once the degraded mode ends the parallel network partitions are reconciled to arrive at one partition using formal treatment of the reconciliation process three algorithms are proposed and studied in terms of their influence on service outage duration the longer the reconciliation time the lower is system availability since the interval in which no services are provided is longer however the reconciliation time in turn is affected by the time to construct the post partition system state the shorter the construction time the higher is the number of updates that took place in the degraded mode but that will not be taken up in the reconciled partition this will lead to longer interval for rejecting redoing these operations and thereby increase reconciliation time
wireless sensor networks are capable of carrying out surveillance missions for various applications in remote areas without human interventions an essential issue of sensor networks is to search for the balance between the limited battery supply and the desired lifetime of network operations beside data communication between sensors maintaining sufficient surveillance or sensing coverage over target region by coordination within the network is critical for many sensor networks due to the limited supply of energy source for each sensor this paper presents novel sensor network coverage maintenance protocol called coverage aware sensor engagement case to efficiently maintain the required degree of sensing coverage by activating small number of sensors while putting the others to sleep mode different from other coverage maintenance protocols case schedules active inactive sensing states of sensor according to the sensor’s contribution to the network sensing coverage therefore preserving the expected behavior of the sensor network coverage contribution of each sensor is quantitatively measured by metric called coverage merit by activating sensors with relatively large coverage merit and deactivating those with small coverage merit case effectively achieves energy conservation while maintaining sufficient sensor network coverage we provide simulation results to show that case considerably improves the energy efficiency of coverage maintenance with low communication overhead
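A very rough sketch of the coverage-merit idea from the abstract above: a sensor's merit is estimated as the fraction of sample points in its sensing disk that no already-active sensor covers, and sensors whose merit falls below a threshold may sleep. The Monte Carlo sampling, the greedy loop and the 0.2 threshold are illustrative choices, not the CASE protocol itself.

```python
import math, random

# Estimate coverage merit of one sensor against the currently active set.
def coverage_merit(sensor, active, radius, samples=500):
    rng = random.Random(0)
    uncovered = 0
    for _ in range(samples):
        ang = rng.uniform(0, 2 * math.pi)
        r = radius * math.sqrt(rng.random())            # uniform point inside the disk
        p = (sensor[0] + r * math.cos(ang), sensor[1] + r * math.sin(ang))
        if all(math.dist(a, p) > radius for a in active):
            uncovered += 1
    return uncovered / samples

sensors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
active = []
for s in sensors:
    if coverage_merit(s, active, radius=1.0) > 0.2:     # hypothetical activation threshold
        active.append(s)
print(active)   # the sensor at (0.1, 0.0) adds little coverage and can sleep
```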
providing accurate and scalable solutions to map low level perceptual features to high level semantics is critical for multimedia information organization and retrieval in this paper we propose confidence based dynamic ensemble cde to overcome the shortcomings of the traditional static classifiers in contrast to the traditional models cde can make dynamic adjustments to accommodate new semantics to assist the discovery of useful low level features and to improve class prediction accuracy we depict two key components of cde multi level function that asserts class prediction confidence and the dynamic ensemble method based upon the confidence function through theoretical analysis and empirical study we demonstrate that cde is effective in annotating large scale real world image datasets
we initiate the cryptographic study of order preserving symmetric encryption ope primitive suggested in the database community by agrawal et al sigmod for allowing efficient range queries on encrypted data interestingly we first show that straightforward relaxation of standard security notions for encryption such as indistinguishability against chosen plaintext attack ind cpa is unachievable by practical ope scheme instead we propose security notion in the spirit of pseudorandom functions prfs and related primitives asking that an ope scheme look as random as possible subject to the order preserving constraint we then design an efficient ope scheme and prove its security under our notion based on pseudorandomness of an underlying blockcipher our construction is based on natural relation we uncover between random order preserving function and the hypergeometric probability distribution in particular it makes black box use of an efficient sampling algorithm for the latter
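As a purely didactic illustration of an order-preserving mapping (and emphatically not the blockcipher/hypergeometric construction proven secure in the paper), one can derive, from a secret key, a pseudorandom increasing map from a small domain into a larger range; all names below are invented and the sketch offers no real security.

```python
import hmac, hashlib, random

# Toy order-preserving mapping: a keyed pseudorandom subset of the range, used in sorted order.
def ope_table(key, domain_size, range_size):
    seed = hmac.new(key, b"ope", hashlib.sha256).digest()
    rng = random.Random(seed)
    return sorted(rng.sample(range(range_size), domain_size))   # plaintext m maps to table[m]

table = ope_table(b"secret key", domain_size=16, range_size=1000)
enc = lambda m: table[m]
assert enc(3) < enc(7) < enc(12)        # order is preserved, enabling range queries
print(enc(3), enc(7), enc(12))
```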
disk performance is increasingly limited by its head positioning latencies ie seek time and rotational delay to reduce the head positioning latencies we propose novel technique that dynamically places copies of data in file system’s free blocks according to the disk access patterns observed at runtime as one or more replicas can now be accessed in addition to their original data block choosing the nearest replica that provides fastest access can significantly improve performance for disk operations we implemented and evaluated prototype based on the popular ext file system in our prototype since the file system layout is modified only by using the free unused disk space hence the name free space file system or fs users are completely oblivious to how the file system layout is modified in the background they will only notice performance improvements over time for wide range of workloads running under linux fs is shown to reduce disk access time by as result of shorter seek time and shorter rotational delay making overall user perceived performance improvement the reduced disk access time also leads to energy savings per access
database systems work hard to tune performance but do not always achieve the full performance potential of modern disk systems their abstracted view of storage components hides useful device specific characteristics such as disk track boundaries and advanced built in firmware algorithms this paper presents new storage manager architecture called lachesis that exploits and adapts to observable device specific characteristics in order to achieve and sustain high performance for dss queries lachesis achieves efficiency nearly equivalent to sequential streaming even in the presence of competing random traffic in addition lachesis simplifies manual configuration and restores the optimizer’s assumptions about the relative costs of different access patterns expressed in query plans experiments using ibm db traces as well as prototype implementation show that lachesis improves standalone dss performance by on average more importantly when running concurrently with an on line transaction processing oltp workload lachesis improves dss performance by up to while oltp also exhibits speedup
feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability scalability and possibly accuracy of the resulting models feature selection has traditionally been studied in supervised learning situations with some estimate of accuracy used to evaluate candidate subsets however we often cannot apply supervised learning for lack of training signal for these cases we propose new feature selection approach based on clustering number of heuristic criteria can be used to estimate the quality of clusters built from given feature subset rather than combining such criteria we use elsa an evolutionary local selection algorithm that maintains diverse population of solutions that approximate the pareto front in multi dimensional objective space each evolved solution represents feature subset and number of clusters two representative clustering algorithms means and em are applied to form the given number of clusters based on the selected features experimental results on both real and synthetic data show that the method can consistently find approximate pareto optimal solutions through which we can identify the significant features and an appropriate number of clusters this results in models with better and clearer semantic relevance
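The multi-objective bookkeeping behind the evolutionary feature selection described above can be sketched as a Pareto filter: keep only feature subsets not dominated on two objectives, here hypothetically "few features" and "high cluster quality". The evolutionary operators (ELSA) and the k-means/EM evaluation that would produce the quality scores are omitted.

```python
# Keep the Pareto front of feature subsets over (num_features, cluster_quality).
def dominates(a, b):
    # fewer features and higher quality are both better
    return a != b and a[0] <= b[0] and a[1] >= b[1]

def pareto_front(evaluated):
    # evaluated: {feature_subset: (num_features, quality)}
    return {s: v for s, v in evaluated.items()
            if not any(dominates(w, v) for w in evaluated.values())}

scores = {("f1",): (1, 0.55), ("f1", "f3"): (2, 0.70),
          ("f2", "f3"): (2, 0.60), ("f1", "f2", "f3"): (3, 0.71)}
print(pareto_front(scores))   # ("f2", "f3") is dominated and dropped
```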
there has been considerable research in join operation in relational databases in this paper we introduce the concept of web join for combining hyperlinked web data web join is one of the web algebraic operator in our web warehousing system called whoweda warehouse of web data similar to its relational counterpart it can be used to gather useful composite information from two web tables the significance of web join perhaps can be best realized when we wish to combine data from web site where some of the information in the web site is no longer available due to changes to the site web join operation can be constraint free or constraint driven depending on the absence or presence of join conditions in this paper we focus our discussion on constraint driven web join operation ie web join operation in the presence of user specified join conditions specifically we discuss the syntax semantics and algorithm of web join operator
simulating hand drawn illustration techniques can succinctly express information in manner that is communicative and informative we present framework for an interactive direct volume illustration system that simulates traditional stipple drawing by combining the principles of artistic and scientific illustration we explore several feature enhancement techniques to create effective interactive visualizations of scientific and medical datasets we also introduce rendering mechanism that generates appropriate point lists at all resolutions during an automatic preprocess and modifies rendering styles through different combinations of these feature enhancements the new system is an effective way to interactively preview large complex volume datasets in concise meaningful and illustrative manner volume stippling is effective for many applications and provides quick and efficient method to investigate volume models
domain specific web search engines are effective tools for reducing the difficulty experienced when acquiring information from the web existing methods for building domain specific web search engines require human expertise or specific facilities however we can build domain specific search engine simply by adding domain specific keywords called keyword spices to the user’s input query and forwarding it to general purpose web search engine keyword spices can be effectively discovered from web documents using machine learning technologies this paper will describe domain specific web search engines that use keyword spices for locating recipes restaurants and used cars
we develop and implement an optimal broadcast algorithm for fully connected processor networks under bidirectional communication model in which each processor can simultaneously send message to one processor and receive message from another possibly different processor for any number of processors the algorithm requires logp communication rounds to broadcast blocks of data from root processor to the remaining processors meeting the lower bound in the model for data of size assuming that sending and receiving data of size takes time bm the best running time that can be achieved by the division of into equal sized blocks is logp bm the algorithm uses regular circulant graph communication pattern and degenerates into binomial tree broadcast when the number of blocks to be broadcast is one the algorithm is furthermore well suited to fully connected clusters of smp symmetric multi processor nodes the algorithm is implemented as part of an mpi message passing interface library we demonstrate significant practical bandwidth improvements of up to factor over several other commonly used broadcast algorithms on both small smp cluster and node nec sx vector supercomputer
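The abstract notes that the algorithm degenerates to a binomial tree broadcast when only one block is sent; the sketch below computes just that special-case communication schedule (who sends to whom in each round, root fixed at rank 0). The pipelined circulant-graph pattern for many blocks and the MPI integration are not reproduced.

```python
# Binomial-tree broadcast schedule for p processors, root = 0.
def binomial_broadcast_schedule(p):
    rounds, have, k = [], {0}, 0
    while len(have) < p:
        sends = []
        for src in sorted(have):
            dst = src ^ (1 << k)             # partner differs in bit k
            if dst < p and dst not in have:
                sends.append((src, dst))
        for _, dst in sends:
            have.add(dst)
        rounds.append(sends)
        k += 1
    return rounds

for rnd, sends in enumerate(binomial_broadcast_schedule(6)):
    print("round", rnd, sends)
# round 0 [(0, 1)] / round 1 [(0, 2), (1, 3)] / round 2 [(0, 4), (1, 5)]
```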
photo tourism is platform that allows users to transform unstructured online digital photos into experience nowadays image sensors are being extensively used to allow images to be taken automatically and remotely which facilitates the opportunity for live update of photo mosaics in this paper we present novel framework for live photo mosaic with group of wireless image sensor nodes where the image data aggregation is accomplished in an efficient and distributed way essentially we propose to conduct clustering and data compression at wireless image sensor network level while conserving the completeness of the feature point information for reconstruction toward the realization of the whole system we have built image sensor prototypes with commodity cameras and we validated our approach by in depth analysis extensive simulations and field experiments
we present framework that allows translation of predicated code into the static single assignment ssa form and simplifies application of the ssa based optimizations to predicated code in particular we represent predicate join points in the program by the psi functions similar to the phi functions of the basic ssa the ssa based optimizations such as constant propagation can be applied to predicated code by simply specifying additional rules for processing the psi functions we present efficient algorithms for constructing and then for removing the psi functions at the end of ssa processing our algorithm for translating out of the psi ssa splits predicated live ranges into smaller live ranges active under disjoint predicates the experimental evaluation on set of predicated benchmarks demonstrates efficiency of our approach
we present algorithmic complexity and implementation results concerning real root isolation of polynomial of degree with integer coefficients of bit size using sturm habicht sequences and the bernstein subdivision solver in particular we unify and simplify the analysis of both methods and we give an asymptotic complexity bound in soft o notation that is polynomial in the degree d and the bit size tau this matches the best known bounds for binary subdivision solvers moreover we generalize this to cover the non square free polynomials and show that within the same complexity we can also compute the multiplicities of the roots we also consider algorithms for sign evaluation comparison of real algebraic numbers and simultaneous inequalities and we improve the known bounds at least by factor of d finally we present our implementation in synaps and some preliminary experiments on various data sets
wireless sensor networks have recently been suggested for many surveillance applications such as object monitoring path protection or area coverage since the sensors themselves are important and critical objects in the network natural question is whether they need certain level of protection so as to resist the attacks targeting on them directly if this is necessary then who should provide this protection and how it can be done we refer to the above problem as self protection as we believe the sensors themselves are the best and often the only candidates to provide such protection in this article we for the first time present formal study on the self protection problems in wireless sensor networks we show that if we simply focus on enhancing the quality of field or object covering the sensors might not necessarily be self protected which in turn makes the system extremely vulnerable we then investigate different forms of self protections and show that the problems are generally np complete we develop efficient approximation algorithms for centrally controlled sensors we further extend the algorithms to fully distributed implementation and introduce smart sleep scheduling algorithm that minimizes the energy consumption
in both commercial and defense sectors compelling need is emerging for rapid yet secure dissemination of information to the concerned actors traditional approaches to information sharing that rely on security labels eg multi level security mls suffer from at least two major drawbacks first static security labels do not account for tactical information whose value decays over time second mls like approaches have often ignored information transform semantics when deducing security labels eg output security label max over all input security labels while mls like label deduction appears to be conservative we argue that this approach can result in both underestimation and overestimation of security labels we contend that overestimation may adversely throttle information flows while underestimation incites information misuse and leakage in this paper we present novel calculus approach to securely share tactical information we model security metadata as vector half space as against lattice in mls like approach that supports three operators and the value operator maps metadata vector into time sensitive scalar value the operators and support arithmetic on the metadata vector space that are homomorphic with the semantics of information transforms we show that it is unfortunately impossible to achieve strong homomorphism without incurring exponential metadata expansion we use splines class of compact parametric curves to develop concrete realizations of our metadata calculus that satisfy weak homomorphism without suffering from metadata expansion and quantify the tightness of values estimates in the proposed approach
for admission control in real time multimedia systems buffer space disk bandwidth and network bandwidth must be considered the cbr based mechanisms do not use system resources effectively since media data is usually encoded with vbr compression techniques we propose an admission control mechanism based on vbr data model that has dynamic period length in our mechanism the period can be adaptively changed to maximize the performance considering both disk bandwidth and buffer space to compare the performance extensive simulations are conducted on rr scan and gss schemes which have the dynamic period length and the static period length
an object of database is called hot item if there is sufficiently large population of other objects in the database that are similar to it in other words hot items are objects within dense region of other objects and provide basis for many density based data mining techniques intuitively objects that share their attribute values with lot of other objects could be potentially interesting as they show typical occurrence of objects in the database also there are lot of application domains eg sensor databases traffic management or recognition systems where objects have vague and uncertain attributes we propose an approach for the detection of potentially interesting objects hot items of an uncertain database in probabilistic way an efficient algorithm is presented which detects hot items where to each object confidence value is assigned that reflects the likelihood that it is hot item in an experimental evaluation we show that our method can compute the results very efficiently compared to its competitors
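One simple way to make the confidence value above concrete: if each other object is similar to the candidate independently with some probability, the confidence that it is a hot item is the probability that at least m of them are similar, a Poisson-binomial tail computable by dynamic programming. The paper's similarity model over uncertain attributes is richer; this sketch only shows the probabilistic aggregation step.

```python
# Probability that at least m of the independent "similar" events occur (Poisson-binomial tail).
def hot_item_confidence(similarity_probs, m):
    dp = [1.0]                              # dp[k] = P(exactly k similar objects so far)
    for p in similarity_probs:
        new = [0.0] * (len(dp) + 1)
        for k, q in enumerate(dp):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        dp = new
    return sum(dp[m:])

print(hot_item_confidence([0.9, 0.8, 0.4, 0.1], m=2))   # ≈ 0.84
```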
we present new approach to anomaly based network intrusion detection for web applications this approach is based on dividing the input parameters of the monitored web application in two groups the regular and the irregular ones and applying new method for anomaly detection on the regular ones based on the inference of regular language we support our proposal by realizing sphinx an anomaly based intrusion detection system based on it thorough benchmarks show that sphinx performs better than current state of the art systems both in terms of false positives false negatives as well as needing shorter training period
multi view modeling and separation of concerns are widely used to decrease the design complexity of the large scale software system to ensure the correctness and consistency of multi view requirement models the formal verification technology should be applied to the model driven development process however there still lacks unified theory foundation and tool supports for the rigorous modeling approach to solve these problems we implemented an integrated modeling and verification environment tmda trustable mda based on the theory of utp in tmda developers model system requirements with uml static and dynamic models and verify the correctness and consistency of different models multidimensional model is proposed which supports the consistency verification liveness and safety property verification ocl constraints and ltl formula verification bank atm system bas is introduced to demonstrate how to utilize tmda for design and verification
wireless sensor networks come of age and start moving out of the laboratory into the field as the number of deployments is increasing the need for an efficient and reliable code update mechanism becomes pressing reasons for updates are manifold ranging from fixing software bugs to retasking the whole sensor network the scale of deployments and the potential physical inaccessibility of individual nodes asks for wireless software management scheme in this paper we present an efficient code update strategy which utilizes the knowledge of former program versions to distribute mere incremental changes using small set of instructions delta of minimal size is generated this delta is then disseminated throughout the network allowing nodes to rebuild the new application based on their currently running code the asymmetry of computational power available during the process of encoding pc and decoding sensor node necessitates careful balancing of the decoder complexity to respect the limitations of today’s sensor network hardware we provide seamless integration of our work into deluge the standard tinyos code dissemination protocol the efficiency of our approach is evaluated by means of testbed experiments showing significant reduction in message complexity and thus faster updates
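The encode/decode asymmetry described above can be illustrated with a copy/insert delta: the resource-rich encoder decides which byte ranges of the old image can be reused, while the constrained decoder only copies and appends. This is a hedged sketch in the spirit of the approach, not the paper's encoder; real sensor-node deltas work on binary images and add framing, versioning and integrity checks.

```python
import difflib

# Produce a small copy/insert instruction list from old to new, and replay it.
def encode_delta(old, new):
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))     # reuse bytes already on the node
        else:
            ops.append(("insert", new[j1:j2]))    # ship only the new bytes
    return ops

def decode_delta(old, ops):                        # cheap enough for a sensor node
    out = b""
    for op in ops:
        out += old[op[1]:op[1] + op[2]] if op[0] == "copy" else op[1]
    return out

old = b"led_on(); sample(); send();"
new = b"led_on(); sample(); filter(); send();"
delta = encode_delta(old, new)
assert decode_delta(old, delta) == new
print(delta)
```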
the number and the importance of web applications have increased rapidly over the last years at the same time the quantity and impact of security vulnerabilities in such applications have grown as well since manual code reviews are time consuming error prone and costly the need for automated solutions has become evident in this paper we address the problem of vulnerable web applications by means of static source code analysis to this end we present novel precise alias analysis targeted at the unique reference semantics commonly found in scripting languages moreover we enhance the quality and quantity of the generated vulnerability reports by employing novel iterative two phase algorithm for fast and precise resolution of file inclusions we integrated the presented concepts into pixy high precision static analysis tool aimed at detecting cross site scripting vulnerabilities in php scripts to demonstrate the effectiveness of our techniques we analyzed three web applications and discovered vulnerabilities both the high analysis speed as well as the low number of generated false positives show that our techniques can be used for conducting effective security audits
common approach to content based image retrieval is to use example images as queries images in the collection that have low level features similar to the query examples are returned in response to the query in this paper we explore the use of image regions as query examples we compare the retrieval effectiveness of using whole images single regions and multiple regions as examples we also compare two approaches for combining shape features an equal weight linear combination and classification using machine learning algorithms we show that using image regions as query examples leads to higher effectiveness than using whole images and that an equal weight linear combination of shape features is simpler and at least as effective as using machine learning algorithm
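The equal-weight linear combination baseline mentioned above is simple enough to sketch directly: each feature yields a normalised distance between the query region and a database image, and the final score is their unweighted average. Feature extraction is assumed to have happened elsewhere, and the feature names and vectors below are hypothetical.

```python
# Rank images by the unweighted average of per-feature distances to the query region.
def combined_distance(query_feats, image_feats, distance_fns):
    ds = [fn(query_feats[name], image_feats[name]) for name, fn in distance_fns.items()]
    return sum(ds) / len(ds)

l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b)) / len(a)
distance_fns = {"colour": l1, "texture": l1, "shape": l1}

query = {"colour": [0.2, 0.8], "texture": [0.5, 0.5], "shape": [0.9, 0.1]}
db = {"img1": {"colour": [0.25, 0.75], "texture": [0.5, 0.5], "shape": [0.8, 0.2]},
      "img2": {"colour": [0.9, 0.1], "texture": [0.1, 0.9], "shape": [0.2, 0.8]}}
ranking = sorted(db, key=lambda i: combined_distance(query, db[i], distance_fns))
print(ranking)   # ['img1', 'img2']
```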
duplicate detection is the problem of detecting different entries in data source representing the same real world entity while research abounds in the realm of duplicate detection in relational data there is yet little work for duplicates in other more complex data models such as xml in this paper we present generalized framework for duplicate detection dividing the problem into three components candidate definition defining which objects are to be compared duplicate definition defining when two duplicate candidates are in fact duplicates and duplicate detection specifying how to efficiently find those duplicates using this framework we propose an xml duplicate detection method dogmatix which compares xml elements based not only on their direct data values but also on the similarity of their parents children structure etc we propose heuristics to determine which of these to choose as well as similarity measure specifically geared towards the xml data model an evaluation of our algorithm using several heuristics validates our approach
in this paper we advocate the use of multi dimensional modal logics as framework for knowledge representation and in particular for representing spatio temporal information we construct two dimensional logic capable of describing topological relationships that change over time this logic called pstl propositional spatio temporal logic is the cartesian product of the well known temporal logic ptl and the modal logic su which is the lewis system augmented with the universal modality although it is an open problem whether the full pstl is decidable we show that it contains decidable fragments into which various temporal extensions both point based and interval based of the spatial logic rcc can be embedded we consider known decidability and complexity results that are relevant to computation with multi dimensional formalisms and discuss possible directions for further research
in order to meet the increasing scale and users requirements for the distributed object computing doc systems their infrastructures are highly desirable to be redesigned based on the principles of immune system and the evolution mechanisms learned from an antibody network model novel evolutionary framework for doc doc is proposed the antibody network model as well as the evolution process including clonal proliferation immune elimination and immune memory is studied then doc framework based on the antibody network is proposed whose simulation platform is designed and implemented on the platform the evolutionary features are studied by diversity and stability of antibodies and genotypes detection and elimination of antigens effect of immune memory and tendency of eliminated and stimulated antigens the experiment results show that the proposed framework can achieve the evolution ability and the promising performance which are critical to doc systems doc is extendable for the future design of distributed object middleware such as websphere application server and bea weblogic application server
this paper presents new algorithm for segmentation of triangulated freeform surfaces using geometric quantities and morse theory the method consists of two steps initial segmentation and refinement first the differential geometry quantities are estimated on triangular meshes with which the meshes are classified into different surface types the initial segmentation is obtained by grouping the topologically adjacent meshes with the same surface types based on region growing the critical points of triangular meshes are then extracted with morse theory and utilized to further determine the boundaries of initial segments finally the region growing process starting from each critical point is performed to achieve refined segmentation the experimental results on several models demonstrate the effectiveness and usefulness of this segmentation method
recent years have seen advances in building large internet scale index structures generally known as structured overlays early structured overlays realized distributed hash tables dhts which are ill suited for anything but exact queries the need to support range queries necessitates systems that can handle uneven load distributions however such systems suffer from practical problems including poor latency disproportionate bandwidth usage at participating peers or unrealistic assumptions on peers homogeneity in terms of available storage or bandwidth resources in this article we consider system that is not only able to support uneven load distributions but also to operate in heterogeneous environments where each peer can autonomously decide how much of its resources to contribute to the system we provide the theoretical foundations of realizing such network and present newly proposed system oscar based on these principles oscar can construct efficient overlays given arbitrary load distributions by employing novel scalable network sampling technique the simulations of our system validate the theory and evaluate oscar’s performance under typical challenges encountered in real life large scale networked systems including participant heterogeneity faults and skewed and dynamic load distributions thus the oscar distributed index fills in an important gap in the family of structured overlays bringing into life practical internet scale index which can play crucial role in enabling data oriented applications distributed over wide area networks
in this article we study layout and circuit implementations of input lookup table lut for via configurable structured asic we present new lut circuit and several layout designs we also propose method to improve the delay of any logic function with fewer inputs lut being able to realize all the input functions enables us to synthesize circuit using both standard cell synthesizer and an fpga technology mapper such as flowmap our study shows that circuits synthesized using standard cell synthesizer usually achieve better timing than that obtained by flowmap our study further shows that the well known lut implemented with multiplexers achieves better timing area and power dissipation our methodology can also be employed to study look up tables with more inputs
verisoft is tool for systematically exploring the state spaces of systems composed of several concurrent processes executing arbitrary code written in full fledged programming languages such as or the state space of concurrent system is directed graph that represents the combined behavior of all concurrent components in the system by exploring its state space verisoft can automatically detect coordination problems between the processes of concurrent system we report in this paper our analysis with verisoft of the heart beat monitor hbm telephone switching application developed at lucent technologies the hbm of telephone switch determines the status of different elements connected to the switch by measuring propagation delays of messages transmitted via these elements this information plays an important role in the routing of data in the switch and can significantly impact switch performance we discuss the steps of our analysis of the hbm using verisoft because no modeling of the hbm code is necessary with this tool the total elapsed time before being able to run the first tests was on the order of few hours instead of several days or weeks that would have been needed for the error prone modeling phase required with traditional model checkers or theorem provers we then present the results of our analysis since verisoft automatically generates executes and evaluates thousands of tests per minute and has complete control over nondeterminism our analysis revealed hbm behavior that is virtually impossible to detect or test in traditional lab testing environment specifically we discovered flaws in the existing documentation on this application and unexpected behaviors in the software itself these results are being used as the basis for the redesign of the hbm software in the next commercial release of the switching software
this paper presents two approaches based on metabolic and stochastic systems together with their associated analysis methods for modelling biological systems and illustrates their use through two case studies
in this paper we study how to efficiently perform set similarity joins in parallel using the popular mapreduce framework we propose staged approach for end to end set similarity joins we take as input set of records and output set of joined records based on set similarity condition we efficiently partition the data across nodes in order to balance the workload and minimize the need for replication we study both self join and join cases and show how to carefully control the amount of data kept in main memory on each node we also propose solutions for the case where even if we use the most fine grained partitioning the data still does not fit in the main memory of node we report results from extensive experiments on real datasets synthetically increased in size to evaluate the speedup and scaleup properties of the proposed algorithms using hadoop
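a minimal single machine sketch of the idea (not the paper's exact multi stage mapreduce algorithm): the map phase is mimicked by partitioning records on their tokens and the reduce phase by verifying candidate pairs with jaccard similarity; the record contents and the 0.7 threshold are illustrative assumptions

from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b)

def set_similarity_self_join(records, threshold=0.7):
    # "map" phase: emit (token, record id) so that records sharing a token
    # land in the same partition (a real job would use prefix filtering
    # to emit far fewer tokens per record)
    partitions = defaultdict(list)
    for rid, tokens in records.items():
        for tok in tokens:
            partitions[tok].append(rid)
    # "reduce" phase: verify candidate pairs inside each partition
    joined = set()
    for rids in partitions.values():
        for r1, r2 in combinations(sorted(set(rids)), 2):
            if jaccard(records[r1], records[r2]) >= threshold:
                joined.add((r1, r2))
    return joined

records = {1: {"a", "b", "c"}, 2: {"a", "b", "c", "d"}, 3: {"x", "y"}}
print(set_similarity_self_join(records))   # {(1, 2)}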
one of the benefits of finite state verification fsv tools such as model checkers is that counterexample is provided when the property cannot be verified not all counterexamples however are equally useful to the analysts trying to understand and localize the fault often counterexamples are so long that they are hard to understand thus it is important for fsv tools to find short counterexamples and to do so quickly commonly used search strategies such as breadth first and depth first search do not usually perform well in both of these dimensions in this paper we investigate heuristic guided search strategies for the fsv tool flavers and propose novel two stage counterexample search strategy we describe an experiment showing that this two stage strategy when combined with appropriate heuristics is extremely effective at quickly finding short counterexamples for large set of verification problems
users of online services are increasingly wary that their activities could disclose confidential information on their business or personal activities it would be desirable for an online document service to perform text retrieval for users while protecting the privacy of their activities in this article we introduce privacy preserving similarity based text retrieval scheme that prevents the server from accurately reconstructing the term composition of queries and documents and anonymizes the search results from unauthorized observers at the same time our scheme preserves the relevance ranking of the search server and enables accounting of the number of documents that each user opens the effectiveness of the scheme is verified empirically with two real text corpora
in video on demand vod system in order to guarantee smooth playback of video stream sufficient resources such as disk input output bandwidth network bandwidth have to be reserved in advance thus given limited resources the number of simultaneous streams that can be supported by video server is restricted due to its mechanical nature the disk subsystem is generally the performance bottleneck of vod system and there have been number of caching algorithms to overcome the disk bandwidth limitation in this paper we propose novel caching strategy referred to as client assisted interval caching cic scheme to balance the requirements of bandwidth and cache capacity in cost effective way the cic scheme tends to use the cache memory available in clients to serve the first few blocks of streams so as to dramatically reduce the demand on the bandwidth of the server our objective is to maximize the number of requests that can be supported by the system and minimize the overall system cost simulations are carried out to study the performance of our proposed strategy under various conditions the experimental results show the superiority of the cic scheme over the traditional interval caching ic scheme with respect to request accepted ratio and average servicing cost per stream
web prefetching is based on web caching and attempts to reduce user perceived latency unlike on demand caching web prefetching fetches objects and stores them in advance hoping that the prefetched objects are likely to be accessed in the near future and such accesses would be satisfied from the caches rather than by retrieving the objects from the web server this paper reviews the popular prefetching algorithms based on popularity good fetch apl characteristic and lifetime and then makes the following contributions the paper proposes family of linear time prefetching algorithms objective greedy prefetching wherein each algorithm greedily prefetches those web objects that most significantly improve the performance as per the targeted metric the hit rate greedy and bandwidth greedy algorithms are shown to be optimal for their respective objective metrics linear time optimal prefetching algorithm that maximizes the metric as the performance measure is proposed the paper shows the results of performance analysis via simulations comparing the proposed algorithms with the existing algorithms in terms of the respective objectives the hit rate bandwidth and the metrics the proposed prefetching algorithms are seen to provide better objective based performance than any existing algorithms further greedy performs almost as well as optimal
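a toy sketch of the objective greedy idea for the hit rate objective, assuming made up access probabilities and sizes; the real algorithms also weigh object lifetime and update rate

def hit_rate_greedy(objects, budget_bytes):
    # greedily prefetch the objects that add the most expected hits per byte
    # of prefetch budget (a toy stand-in for objective-greedy prefetching)
    chosen, used = [], 0
    for name, (p_access, size) in sorted(
            objects.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
        if used + size <= budget_bytes:
            chosen.append(name)
            used += size
    return chosen

objects = {"index.html": (0.40, 20_000),
           "logo.png":   (0.35, 5_000),
           "video.mp4":  (0.10, 900_000)}
print(hit_rate_greedy(objects, budget_bytes=50_000))  # ['logo.png', 'index.html']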
malware writers and detectors have been running an endless battle self defense is the weapon most malware writers prepare against malware detectors malware writers have tried to evade the improved detection techniques of anti virus av products packing and code obfuscation are two popular evasion techniques when these techniques are applied to malwares they are able to change their instruction sequence while maintaining their intended function we propose detection mechanism defeating these self defense techniques to improve malware detection since an obfuscated malware is able to change the syntax of its code while preserving its semantics the proposed mechanism uses the semantic invariant we convert the api call sequence of the malware into graph commonly known as call graph to extract the semantic of the malware the call graph can be reduced to code graph used for semantic signatures of the proposed mechanism we show that the code graph can represent the characteristics of program exactly and uniquely next we evaluate the proposed mechanism by experiment the mechanism has a detection ratio of real world malwares and detects metamorphic malwares that can evade av scanners in this paper we show how to analyze malwares by extracting program semantics using static analysis it is shown that the proposed mechanism provides high possibility of detecting malwares even when they attempt self protection
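a hedged sketch of the semantic signature idea only: successive api calls become edges of a directed graph, so syntactic changes such as loop unrolling leave the signature unchanged; the reduction from call graph to the paper's code graph is omitted

def call_graph(api_calls):
    # directed edges between successive api calls, used here as a crude
    # semantic signature of a trace
    return set(zip(api_calls, api_calls[1:]))

original = ["OpenFile", "ReadFile", "ReadFile", "Encrypt", "Send"]
unrolled = ["OpenFile", "ReadFile", "ReadFile", "ReadFile", "Encrypt", "Send"]
print(call_graph(original) == call_graph(unrolled))   # True: loop unrolling
                                                      # leaves the signature unchanged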
in this paper we show how pervasive technologies can be employed on public display advertisement scenario to enable behavioral self adaptation of content we show this through myads system capable of exploiting pervasive technologies to autonomously adapt the advertisement process to the trends of interests detected among the audience in venue after describing the rationale the architecture and the prototype of myads we describe the advantages brought by the use of such system in terms of impact on the audience and economic efficiency the comparison of myads performances with different advertisement selection techniques confirms the validity of our advertisement model and our prototype in particular as means for maximising product awareness in an audience and for enhancing economic efficiency
in multi label learning each training example is associated with set of labels and the task is to predict the proper label set for the unseen example due to the tremendous exponential number of possible label sets the task of learning from multi label examples is rather challenging therefore the key to successful multi label learning is how to effectively exploit correlations between different labels to facilitate the learning process in this paper we propose to use bayesian network structure to efficiently encode the conditional dependencies of the labels as well as the feature set with the feature set as the common parent of all labels to make it practical we give an approximate yet efficient procedure to find such network structure with the help of this network multi label learning is decomposed into series of single label classification problems where classifier is constructed for each label by incorporating its parental labels as additional features label sets of unseen examples are predicted recursively according to the label ordering given by the network extensive experiments on broad range of data sets validate the effectiveness of our approach against other well established methods
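a sketch of the per label classification step only, using a hand fixed parent structure in place of the learned bayesian network and scikit-learn decision trees on toy data

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# toy data: 2 features, 3 labels; the assumed structure makes label 1 depend
# on label 0, and label 2 on labels 0 and 1 (stand-in for a learned network)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 2], [2, 2]])
Y = np.array([[1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 1]])
parents = {0: [], 1: [0], 2: [0, 1]}

# one classifier per label, with its parent labels appended as extra features
models = {}
for lbl in sorted(parents):
    feats = np.hstack([X, Y[:, parents[lbl]]]) if parents[lbl] else X
    models[lbl] = DecisionTreeClassifier(random_state=0).fit(feats, Y[:, lbl])

def predict(x):
    # predict labels recursively, following the ordering given by the structure
    pred = {}
    for lbl in sorted(parents):
        feats = np.hstack([x, [pred[p] for p in parents[lbl]]]).reshape(1, -1)
        pred[lbl] = int(models[lbl].predict(feats)[0])
    return pred

print(predict(np.array([1, 1])))   # {0: 1, 1: 1, 2: 1} on this toy data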
recent and effective approach to probabilistic inference calls for reducing the problem to one of weighted model counting wmc on propositional knowledge base specifically the approach calls for encoding the probabilistic model typically bayesian network as propositional knowledge base in conjunctive normal form cnf with weights associated to each model according to the network parameters given this cnf computing the probability of some evidence becomes matter of summing the weights of all cnf models consistent with the evidence number of variations on this approach have appeared in the literature recently that vary across three orthogonal dimensions the first dimension concerns the specific encoding used to convert bayesian network into cnf the second dimension relates to whether weighted model counting is performed using search algorithm on the cnf or by compiling the cnf into structure that renders wmc polytime operation in the size of the compiled structure the third dimension deals with the specific properties of network parameters local structure which are captured in the cnf encoding in this paper we discuss recent work in this area across the above three dimensions and demonstrate empirically its practical importance in significantly expanding the reach of exact probabilistic inference we restrict our discussion to exact inference and model counting even though other proposals have been extended for approximate inference and approximate model counting
while large scale taxonomies especially for web pages have been in existence for some time approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification we argue that this stems from three causes increasing sparsity of training data at deeper nodes in the taxonomy error propagation where mistake made high in the hierarchy cannot be recovered and increasingly complex decision surfaces in higher nodes in the hierarchy while prior research has focused on the first problem we introduce methods that target the latter two problems first by biasing the training distribution to reduce error propagation and second by propagating up first guess expert information in bottom up manner before making refined top down choice finally we present an empirical study demonstrating that the suggested changes lead to improvements in scores versus an accepted competitive baseline hierarchical svms
the satisfiability problem sat as one of the six basic core np complete problems has been the deserving object of many studies in the last two decades stochastic local search and genetic algorithms are two current state of the art techniques for solving the sats gasat and sat waga are two current state of the art genetic algorithms for solving sats besides the discrete lagrange multiplier dlm and the exponentiated subgradient esg algorithms are the current state of the art local search algorithms for solving sats in this paper we compare dlm and esg with gasat and sat waga we show that the performance of the local search based algorithms dlm and esg is better than the performance of the genetic based algorithms gasat and sat waga we further analyze the results of the comparisons we hope these comparisons shed light for researchers working on genetic and local search algorithms help them understand the reasons for these algorithms performance and behaviour and guide new researchers who are in the process of choosing research direction
explicit referencing is mechanism for enabling deictic gestures in on line communication little is known about the impact of er on distance problem solving in this paper we report on study where student pairs had to solve problem collaboratively at distance using chat tools that differed in the way user may relate an utterance to the task context results indicate that team performance is improved by explicit referencing mechanisms however when explicit referencing is implemented in way that is detrimental to the linearity of the conversation resulting in the visual dispersion or scattering of messages its use has negative consequences for collaborative work at distance the role of linear message history in the collaboration mechanisms was as important as that of explicit referencing
p2p computing is gaining more and more attention from both academia and industrial communities for its potential to reconstruct current distributed applications on the internet however the basic dht based p2p systems support only exact match queries ranked queries produce results that are ordered by certain computed scores which have become widely used in many applications relying on relational databases where users do not expect exact answers to their queries but instead ranked set of the objects that best match their preferences by combining p2p computing and ranked query processing this paper addresses the problem of providing ranked queries support in peer to peer p2p networks and introduces efficient algorithms to solve this problem considering that the existing algorithms for ranked queries consume an excessive amount of bandwidth when they are applied directly into the scenario of p2p networks we propose two new algorithms psel for ranked selection queries and pjoin for ranked join queries psel and pjoin reduce bandwidth cost by pruning irrelevant tuples before query processing performance of the proposed algorithms is validated by extensive experiments
complex queries over high speed data streams often need to rely on approximations to keep up with their input the research community has developed rich literature on approximate streaming algorithms for this application many of these algorithms produce samples of the input stream providing better properties than conventional random sampling in this paper we abstract the stream sampling process and design new stream sample operator we show how it can be used to implement wide variety of algorithms that perform sampling and sampling based aggregations also we show how to implement the operator in gigascope high speed stream database specialized for ip network monitoring applications as an example study we apply the operator within such an enhanced gigascope to perform subset sum sampling which is of great interest for ip network management we evaluate this implementation on live high speed internet traffic data stream and find that the operator is flexible versatile addition to gigascope suitable for tuning and algorithm engineering and the operator imposes only small evaluation overhead this is the first operational implementation we know of for wide variety of stream sampling algorithms at line speed within data stream management system
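a generic stream sample operator sketch based on reservoir sampling feeding a sampling based aggregate; this is not gigascope's operator nor the subset sum sampling studied in the paper

import random

class ReservoirSample:
    # generic stream-sample operator: keeps a uniform sample of size k
    # from an unbounded stream using reservoir sampling
    def __init__(self, k, seed=0):
        self.k, self.n, self.sample = k, 0, []
        self.rng = random.Random(seed)

    def insert(self, item):
        self.n += 1
        if len(self.sample) < self.k:
            self.sample.append(item)
        else:
            j = self.rng.randrange(self.n)
            if j < self.k:
                self.sample[j] = item

    def estimate_sum(self):
        # scale the sample mean back up to the full stream length
        return sum(self.sample) / len(self.sample) * self.n

rng = random.Random(1)
op = ReservoirSample(k=100)
for _ in range(10_000):
    op.insert(rng.randint(40, 1500))        # simulated packet sizes
print(round(op.estimate_sum()))             # rough estimate of total bytes seen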
concerns over the scalability of tcp’s end to end approach to congestion control and its aimd congestion adaptation have led to proposals for router based congestion control specifically active queue management aqm in this paper we present an end to end alternative to aqm new congestion detection and reaction mechanism for tcp based on measurements of one way transit times of tcp segments within tcp connection our design called sync tcp places timestamps in tcp headers measures variation in one way transit times and uses these measurements as form of early congestion notification we demonstrate empirically that sync tcp provides better throughput and http response time performance than tcp reno sync tcp provides better early congestion detection and reaction than the adaptive random early detection with explicit congestion notification aqm mechanism sync tcp’s congestion detection and adaptation mechanisms are robust against clock drift sync tcp is an incrementally deployable protocol sync tcp connections can co exist with tcp reno connections in network and the performance of tcp reno connections are improved with the addition of even small percentage of sync tcp connections
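a toy illustration of the underlying signal only: one way transit times computed from sender timestamps, with congestion flagged when a smoothed transit time rises well above the minimum seen so far; the smoothing factor and threshold are arbitrary assumptions, not sync tcp's actual mechanism

def detect_congestion(send_ts, recv_ts, alpha=0.25, rise=1.2):
    # flag early congestion when the smoothed one-way transit time rises
    # noticeably above the minimum observed so far (toy heuristic)
    flags, smoothed, min_owtt = [], None, float("inf")
    for s, r in zip(send_ts, recv_ts):
        owtt = r - s
        smoothed = owtt if smoothed is None else (1 - alpha) * smoothed + alpha * owtt
        min_owtt = min(min_owtt, owtt)
        flags.append(smoothed > rise * min_owtt)
    return flags

send = [0, 10, 20, 30, 40, 50]
recv = [5, 15, 26, 39, 55, 70]          # queueing delay building up
print(detect_congestion(send, recv))    # [False, False, False, True, True, True]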
finding suitable fragment of interest in nonschematic xml document with simple keyword search is complex task to deal with this problem this paper proposes theoretical framework with focus on an algebraic query model having novel query semantics based on this semantics xml fragments that look meaningful to keyword based query are effectively retrieved by the operations defined in the model in contrast to earlier work our model supports filters for restricting the size of query result which otherwise may contain large number of potentially irrelevant fragments we introduce class of filters having special property that enables significant reduction in query processing cost many practically useful filters fall in this class and hence the proposed model can be efficiently applied to real world xml documents several other issues regarding algebraic manipulation of the operations defined in our query model are also formally discussed
automatic video search based on semantic concept detectors has recently received significant attention since the number of available detectors is much smaller than the size of human vocabulary one major challenge is to select appropriate detectors to respond to user queries in this paper we propose novel approach that leverages heterogeneous knowledge sources for domain adaptive video search first instead of utilizing wordnet as most existing works do we exploit the context information associated with flickr images to estimate query detector similarity the resulting measurement named flickr context similarity fcs reflects the co occurrence statistics of words in image context rather than textual corpus starting from an initial detector set determined by fcs our approach novelly transfers semantic context learned from test data domain to adaptively refine the query detector similarity the semantic context transfer process provides an effective means to cope with the domain shift between external knowledge source eg flickr context and test data which is critical issue in video search to the best of our knowledge this work represents the first research aiming to tackle the challenging issue of domain change in video search extensive experiments on textual queries over trecvid data sets demonstrate the effectiveness of semantic context transfer for domain adaptive video search results also show that the fcs is suitable for measuring query detector similarity producing better performance than various other popular measures
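a generic co occurrence similarity over image tag sets, shown as a rough stand in for a flickr context style measure; the actual fcs formula is not reproduced here and the tag sets are made up

from math import sqrt

def cooccurrence_similarity(term, detector, tagged_images):
    # cosine-style co-occurrence score between a query term and a detector
    # name, computed over image tag sets (generic stand-in, not the FCS formula)
    has_t = {i for i, tags in enumerate(tagged_images) if term in tags}
    has_d = {i for i, tags in enumerate(tagged_images) if detector in tags}
    if not has_t or not has_d:
        return 0.0
    return len(has_t & has_d) / sqrt(len(has_t) * len(has_d))

images = [{"soccer", "ball", "field"}, {"soccer", "stadium"},
          {"beach", "ball"}, {"stadium", "crowd"}]
print(cooccurrence_similarity("soccer", "stadium", images))  # 0.5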
systematic approach is given for symbolically caching intermediate results useful for deriving incremental programs from non incremental programs we exploit number of program analysis and transformation techniques centered around effective caching based on its utilization in deriving incremental programs in order to increase the degree of incrementality not otherwise achievable by using only the return values of programs that are of direct interest our method can be applied straightforwardly to provide systematic approach to program improvement via caching
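a tiny illustration of the general idea of caching intermediate results so that a computation can be updated incrementally rather than recomputed; it is not the paper's derivation method

class IncrementalMean:
    # caches the running sum and count so that adding one element updates
    # the mean in O(1) instead of re-averaging the whole list
    def __init__(self):
        self._sum, self._count = 0.0, 0

    def add(self, x):
        self._sum += x          # cached intermediate result
        self._count += 1
        return self._sum / self._count

m = IncrementalMean()
for x in [4, 8, 6]:
    print(m.add(x))             # 4.0, 6.0, 6.0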
in some real world applications the data can be represented naturally in special kind of graphs in which each vertex consists of set of structured data such as item sets sequences and so on one of the typical examples is metabolic pathways in bioinformatics metabolic pathway is represented in graph structured data in which each vertex corresponds to an enzyme described by set of various kinds of properties such as amino acid sequence enzyme number and so on we call this kind of complex graphs multi structured graphs in this paper we propose an algorithm named fmg for mining frequent patterns in multi structured graphs in fmg while the external structure will be expanded by the same mechanism of conventional graph miners the internal structure will be enumerated by the algorithms suitable for its structure in addition fmg employs novel pruning techniques to exclude uninteresting patterns the preliminary experimental results with real datasets show the effectiveness of the proposed algorithm
research on networks on chips nocs has spanned over decade and its results are now visible in some products thus the seminal idea of using networking technology to address the chip level interconnect problem has been shown to be correct moreover as technology scales down in geometry and chips scale up in complexity nocs become the essential element to achieve the desired levels of performance and quality of service while curbing power consumption levels design and timing closure can only be achieved by sophisticated set of tools that address noc synthesis optimization and validation
the web of data keeps growing rapidly however the full exploitation of this large amount of structured data faces numerous challenges like usability scalability imprecise information needs and data change we present semplore an ir based system that aims at addressing these issues semplore supports intuitive faceted search and complex queries both on text and structured data it combines imprecise keyword search and precise structured query in unified ranking scheme scalable query processing is supported by leveraging inverted indexes traditionally used in ir systems this is combined with novel block based index structure to support efficient index update when data changes the experimental results show that semplore is an efficient and effective system for searching the web of data and can be used as basic infrastructure for web scale semantic web search engines
dual contouring dc is feature preserving isosurfacing method that extracts crack free surfaces from both uniform and adaptive octree grids we present an extension of dc that further guarantees that the mesh generated is manifold even under adaptive simplification our main contribution is an octree based topology preserving vertex clustering algorithm for adaptive contouring the contoured surface generated by our method contains only manifold vertices and edges preserves sharp features and possesses much better adaptivity than those generated by other isosurfacing methods under topologically safe simplification
this paper describes mean field approach to defining and implementing policy based system administration the concepts of regulation and optimization are used to define the notion of maintenance these are then used to evaluate stable equilibria of system configuration that are associated with sustainable policies for system management stable policies are thus associated with fixed points of mapping that describes the evolution of the system in general such fixed points are the solutions of strategic games consistent system policy is not sufficient to guarantee compliance the policy must also be implementable and maintainable the paper proposes two types of model to understand policy driven management of human computer systems i average dynamical descriptions of computer system variables which provide quantitative basis for decision and ii competitive game theoretical descriptions that select optimal courses of action by generalizing the notion of configuration equilibria it is shown how models can be formulated and simple examples are given
workflow management systems wfms are often used to support the automated execution of business processes in today’s networked environment it is not uncommon for organizations representing different business partners to collaborate for providing value added services and products as such workflows representing the business processes in this loosely coupled dynamic and ad hoc coalition environment tend to span across the organizational boundaries as result it is not viable to employ single centralized wfms to control the execution of the inter organizational workflow due to limited scalability availability and performance to this end in this paper we present decentralized workflow model where inter task dependencies are enforced without requiring to have centralized wfms in our model workflow is divided into partitions called self describing workflows and handled by light weight workflow management component called the workflow stub located at each organization we present performance study by considering different types of workflows with varying degrees of parallelism our performance results indicate that decentralized workflow management indeed enjoys significant gain in performance over its centralized counterpart in cases where there is less parallelism
we present method for stochastic fiber tract mapping from diffusion tensor mri dt mri implemented on graphics hardware from the simulated fibers we compute connectivity map that gives an indication of the probability that two points in the dataset are connected by neuronal fiber path bayesian formulation of the fiber model is given and it is shown that the inversion method can be used to construct plausible connectivity an implementation of this fiber model on the graphics processing unit gpu is presented since the fiber paths can be stochastically generated independently of one another the algorithm is highly parallelizable this allows us to exploit the data parallel nature of the gpu fragment processors we also present framework for the connectivity computation on the gpu our implementation allows the user to interactively select regions of interest and observe the evolving connectivity results during computation results are presented from the stochastic generation of over fiber steps per iteration at interactive frame rates on consumer grade graphics hardware
one aim of component based software engineering cbse is to enable the prediction of extra functional properties such as performance and reliability utilising well defined composition theory nowadays such theories and their accompanying prediction methods are still in maturation stage several factors influencing extra functional properties need additional research to be understood special problem in cbse stems from its specific development process software components should be specified and implemented independent from their later context to enable reuse thus extra functional properties of components need to be specified in parametric way to take different influence factors like the hardware platform or the usage profile into account in our approach we use the palladio component model pcm to specify component based software architectures in parametric way this model offers direct support of the cbse development process by dividing the model creation among the developer roles in this paper we present our model and simulation tool based on it which is capable of making performance predictions within case study we show that the resulting prediction accuracy can be sufficient to support the evaluation of architectural design decisions
as baseline for software development correct and complete requirements definition is one foundation of software quality previously novel approach to static testing of software requirements was proposed in which requirements definitions are tested on set of task scenarios by examining software behaviour in each scenario described by an activity list such descriptions of software behaviour can be generated automatically from requirements models this paper investigates various testing methods for selecting test scenarios data flow state transition and entity testing methods are studied variety of test adequacy criteria and their combinations are formally defined and the subsume relations between the criteria are proved empirical studies of the testing methods and the construction of prototype testing tool are reported
transfinite mean value interpolation has recently emerged as simple and robust way to interpolate function defined on the boundary of planar domain in this paper we study basic properties of the interpolant including sufficient conditions on the boundary of the domain to guarantee interpolation when the boundary function is continuous then by deriving the normal derivative of the interpolant and of mean value weight function we construct transfinite hermite interpolant and discuss various applications
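for orientation, a commonly cited form of the mean value interpolant on a convex planar domain (the paper's precise conditions and the hermite extension are not restated here)

\[
u(x) \;=\; \frac{\displaystyle\int_{0}^{2\pi} \frac{f\bigl(y(\theta,x)\bigr)}{\lVert y(\theta,x)-x\rVert}\,d\theta}{\displaystyle\int_{0}^{2\pi} \frac{d\theta}{\lVert y(\theta,x)-x\rVert}}, \qquad x \in \Omega,
\]

where y(theta, x) denotes the point where the ray from x in direction theta meets the boundary of the domain and f is the prescribed boundary data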
the research issue of broadcasting has attracted considerable amount of attention in mobile computing system by utilizing broadcast channels server continuously and repeatedly broadcasts data to mobile users these broadcast channels are also known as broadcast disks from which mobile users can retrieve data using broadcasting mobile users can obtain the data of interest efficiently and only need to wait for the required data to appear on the broadcast channel the goal of designing proper data allocation on the broadcast disks is to reduce the average expected delay of all data items we explore in this paper the problem of generating hierarchical broadcast programs with the data access frequencies and the number of broadcast disks in broadcast disk array given specifically we first transform the problem of generating hierarchical broadcast programs into the one of constructing channel allocation tree with variant fanout by exploiting the feature of tree generation with variant fanout we develop heuristic algorithm vfk to minimize the expected delay of data items in the broadcast program in order to evaluate the solution quality obtained by algorithm vfk and compare its resulting broadcast program with the optimal one we devise an algorithm opt based on guided search to obtain the optimal solution performance of these algorithms is comparatively analyzed sensitivity analysis on several parameters including the number of data items and the number of broadcast disks is conducted it is shown by our simulation results that by exploiting the feature of variant fanout in constructing the channel allocation tree the solution obtained by algorithm vfk is of very high quality and is in fact very close to the optimal one produced by algorithm opt moreover algorithm vfk scales very well which is important for it to be of practical use in generating hierarchical broadcast programs dynamically in mobile computing environment
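a toy sketch to make the objective concrete (this is not algorithm vfk): under the standard model an item broadcast with period p has expected wait p/2, and a naive heuristic places the hottest items on the fastest disks; the frequencies, periods and capacity below are assumptions

def average_expected_delay(disks, freq):
    # disks: list of (item list, broadcast period in slots); an item broadcast
    # every p slots has expected wait p/2 under the standard model
    return sum(freq[i] * period / 2 for items, period in disks for i in items)

def greedy_allocate(freq, periods, capacity):
    # toy heuristic (not algorithm VFK): put the hottest items on the
    # fastest (shortest-period) disks, up to each disk's capacity
    hot_first = sorted(freq, key=freq.get, reverse=True)
    disks = []
    for period in sorted(periods):
        items, hot_first = hot_first[:capacity], hot_first[capacity:]
        disks.append((items, period))
    return disks

freq = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
disks = greedy_allocate(freq, periods=[4, 16], capacity=2)
print(disks)                                # [(['a', 'b'], 4), (['c', 'd'], 16)]
print(average_expected_delay(disks, freq))  # 0.5*2 + 0.25*2 + 0.15*8 + 0.10*8 = 3.5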
to deal with the problem of too many results returned from an e-commerce web database in response to user query this paper proposes novel approach to rank the query results based on the user query we speculate how much the user cares about each attribute and assign corresponding weight to it then for each tuple in the query result each attribute value is assigned score according to its desirableness to the user these attribute value scores are combined according to the attribute weights to get final ranking score for each tuple tuples with the top ranking scores are presented to the user first our ranking method is domain independent and requires no user feedback experimental results demonstrate that this ranking method can effectively capture user’s preferences
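a minimal scoring sketch: assumed attribute weights are combined with per value desirability scores to rank tuples; how the paper actually infers the weights and value scores from the query is not reproduced

def rank_tuples(tuples, weights, value_score, top_k=2):
    # score each tuple as the weighted sum of its attribute-value scores
    # and return the top-k (toy version of attribute-weighted ranking)
    def score(t):
        return sum(weights[a] * value_score[a](t[a]) for a in weights)
    return sorted(tuples, key=score, reverse=True)[:top_k]

cars = [{"price": 8000, "mileage": 90000},
        {"price": 12000, "mileage": 30000},
        {"price": 9000, "mileage": 60000}]
weights = {"price": 0.7, "mileage": 0.3}              # assumed: price matters most
value_score = {"price":   lambda p: 1 - p / 20000,    # cheaper is better
               "mileage": lambda m: 1 - m / 150000}   # lower mileage is better
print(rank_tuples(cars, weights, value_score))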
in this work we investigate the use of directional antennas and beam steering techniques to improve performance of links in the context of communication between a moving vehicle and roadside aps to this end we develop framework called mobisteer that provides practical approaches to perform beam steering mobisteer can operate in two modes cached mode where it uses prior radio survey data collected during idle drives and online mode where it uses probing the goal is to select the best ap and beam combination at each point along the drive given the available information so that the throughput can be maximized for the cached mode an optimal algorithm for ap and beam selection is developed that factors in all overheads we provide extensive experimental results using commercially available eight element phased array antenna in the experiments we use controlled scenarios with our own aps in two different multipath environments as well as in situ scenarios where we use aps already deployed in an urban region to demonstrate the performance advantage of using mobisteer over using an equivalent omni directional antenna we show that mobisteer improves the connectivity duration as well as phy layer data rate due to better snr provisioning in particular mobisteer improves the throughput in the controlled experiments by factor of in in situ experiments it improves the connectivity duration by more than factor of and average snr by about db
service providers have begun to offer multimedia on demand services to residential estates by installing isolated small scale multimedia servers at individual estates such an arrangement allows the service providers to operate without relying on high speed large capacity metropolitan area network which is still not available in many countries unfortunately installing isolated servers could incur very high server costs as each server requires spare bandwidth to cope with fluctuations in user demand in this paper we explore the feasibility of linking up several small multimedia servers to limited capacity network and allowing servers with idle retrieval bandwidth to help out servers that are temporarily overloaded the goal is to minimize the waiting time for service to begin we identify four characteristics of load sharing in distributed multimedia system that differentiate it from load balancing in conventional distributed system we then introduce gwq load sharing algorithm that fits and exploits these characteristics it puts all servers pending requests in global queue from which server with idle capacity obtains additional jobs the performance of the algorithm is captured by an analytical model which we validate through simulations both the analytical and simulation models show that the algorithm vastly reduces wait times at the servers the analytical model also provides guidelines for capacity planning finally we propose an enhanced gwq algorithm that allows server to reclaim active local requests that are being serviced remotely simulation experiments indicate that the scheduling decisions of gwq are optimal in the sense that it enables the distributed servers to approximate the performance of large centralized server
in the everyday exercise of controlling their locomotion humans rely on their optic flow of the perceived environment to achieve collision free navigation in crowds in spite of the complexity of the environment made of numerous obstacles humans demonstrate remarkable capacities in avoiding collisions cognitive science work on human locomotion states that relatively succinct information is extracted from the optic flow to achieve safe locomotion in this paper we explore novel vision based approach of collision avoidance between walkers that fits the requirements of interactive crowd simulation by simulating humans based on cognitive science results we detect future collisions as well as the level of danger from visual stimuli the motor response is twofold reorientation strategy prevents future collision whereas deceleration strategy prevents imminent collisions several examples of our simulation results show that the emergence of self organized patterns of walkers is reinforced using our approach the emergent phenomena are visually appealing more importantly they improve the overall efficiency of the walkers traffic and avoid improbable locking situations
it is difficult for instructors of cs1 and cs2 courses to get accurate answers to such critical questions as how long are students spending on programming assignments or what sorts of errors are they making at the same time students often have no idea of where they stand with respect to the rest of the class in terms of time spent on an assignment or the number or types of errors that they encounter in this paper we present tool called retina which collects information about students programming activities and then provides useful and informative reports to both students and instructors based on the aggregation of that data retina can also make real time recommendations to students in order to help them quickly address some of the errors they make in addition to describing retina and its features we also present some of our initial findings during two trials of the tool in real classroom setting
to date realistic isp topologies have not been accessible to the research community leaving work that depends on topology on an uncertain footing in this paper we present new internet mapping techniques that have enabled us to directly measure router level isp topologies our techniques reduce the number of required traces compared to brute force all to all approach by three orders of magnitude without significant loss in accuracy they include the use of bgp routing tables to focus the measurements exploiting properties of ip routing to eliminate redundant measurements better alias resolution and the use of dns to divide each map into pops and backbone we collect maps from ten diverse isps using our techniques and find that our maps are substantially more complete than those of earlier internet mapping efforts we also report on properties of these maps including the size of pops distribution of router outdegree and the inter domain peering structure as part of this work we release our maps to the community
this paper introduces the notion of physical hypermedia addressing the problem of organizing material in mixed digital and physical environments based on empirical studies we propose concepts for collectional actions and meta data actions and present prototypes combining principles from augmented reality and hypermedia to support organization of mixtures of digital and physical materials our prototype of physical hypermedia system is running on an augmented architect’s desk and digital walls utilizing radio frequency identifier rfid tags as well as visual tags tracked by cameras it allows users to tag physical materials and have these tracked by readers antennas that may become pervasive in our work environments in the physical hypermedia system we work with three categories of rfid tags simple object tags collectional tags and tooltags invoking operations such as grouping and linking of physical material in addition we utilize visual artoolkit tags for linking and navigating models on physical desk our primary application domain is architecture and design and so we discuss the use of augmented collectional artifacts primarily for this domain
mutual exclusion and concurrency are two fundamental and essentially opposite features in distributed systems however in some applications such as computer supported cooperative work cscw we have found it necessary to impose mutual exclusion on different groups of processes in accessing resource while allowing processes of the same group to share the resource to our knowledge no such design issue has been previously raised in the literature in this paper we address this issue by presenting new problem called congenial talking philosophers to model group mutual exclusion we also propose several criteria to evaluate solutions of the problem and to measure their performance finally we provide an efficient and highly concurrent distributed algorithm for the problem in shared memory model where processes communicate by reading from and writing to shared variables the distributed algorithm meets the proposed criteria and has performance similar to some naive but centralized solutions to the problem
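a naive centralized sketch of the group mutual exclusion requirement itself (same group sharing, cross group exclusion); the paper's algorithm is distributed over shared read write variables and addresses fairness and concurrency criteria that this sketch ignores

import threading

class GroupMutex:
    # naive centralized monitor: threads of the same group may share the
    # resource while threads of other groups are excluded (starvation is
    # possible here, unlike in a solution meeting the paper's criteria)
    def __init__(self):
        self._cond = threading.Condition()
        self._group = None     # group currently using the resource
        self._inside = 0       # number of threads currently inside

    def enter(self, group):
        with self._cond:
            while self._group is not None and self._group != group:
                self._cond.wait()
            self._group = group
            self._inside += 1

    def leave(self):
        with self._cond:
            self._inside -= 1
            if self._inside == 0:
                self._group = None
                self._cond.notify_all()

# usage: m = GroupMutex(); m.enter("forum_a"); ...use shared resource...; m.leave()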
internet content is increasingly available for mobile users mobile devices are capable of delivering not only full web pages but also other web content such as podcasts and rss feeds today cost speed of data transfer and network coverage are among the biggest remaining concerns for mobile web users it would be useful to have the interesting content readily available on the device when the user wants to access it battery and memory limitations together with the cost of data transfer require delicate content prefetching as loading content carelessly could result in overloading of device resources and huge phone bill this article introduces an auto update concept that lets the users control the costs and the mobile device resources by means of high level profiles that hide underlying complexity from the users we tested the usefulness of the auto update prototype in small scale field study and the results indicate that auto update is especially useful for prefetching feeds
extracting facts from software source code forms the foundation for any software analysis experience shows however that extracting facts from programs written in wide range of programming and application languages is labour intensive and error prone we present defacto new technique for fact extraction it amounts to annotating the context free grammar of language of interest with fact annotations that describe how to extract elementary facts for language elements such as for instance declaration or use of variable procedure or method call or control flow statements once the elementary facts have been extracted we use relational techniques to further enrich them and to perform the actual software analysis we motivate and describe our approach sketch prototype implementation and assess it using various examples comparison with other fact extraction methods indicates that our fact extraction descriptions are considerably smaller than those of competing methods
this paper presents neural computing model that can automatically extract motion qualities from live performance the motion qualities are in terms of laban movement analysis lma effort factors the model inputs both motion capture and video projections the output is classification of motion qualities that are detected in the input the neural nets are trained with professional lma notators to ensure valid analysis and have achieved an accuracy of about in motion quality recognition the combination of this system with the emote motion synthesis system provides capability for automating both observation and analysis processes to produce natural gestures for embodied communicative agents
in the last few years lot of attention has been paid to the specification and subsequent manipulation of schema mappings problem which is of fundamental importance in metadata management there have been many achievements in this area and semantics have been defined for operators on schema mappings such as composition and inverse however little research has been pursued towards providing formal tools to compare schema mappings in terms of their ability to transfer data and avoid storing redundant information which has hampered the development of foundations for more complex operators as many of them involve these notions in this paper we address the problem of providing foundations for metadata management by developing an order to compare the amount of information transferred by schema mappings from this order we derive several other criteria to compare mappings we provide tools to deal with these criteria and we show their usefulness in defining and studying schema mapping operators more precisely we show how the machinery developed can be used to study the extract and merge operators that have been identified as fundamental for the development of metadata management framework we also use our machinery to provide simpler proofs for some fundamental results regarding the inverse operator and we give an effective characterization for the decidability of the well known schema evolution problem
we review the current status of ethnography in systems design we focus particularly on new approaches to and understandings of ethnography that have emerged as the computer has moved out of the workplace these seek to implement different order of ethnographic study to that which has largely been employed in design to date in doing so they reconfigure the relationship ethnography has to systems design replacing detailed empirical studies of situated action with studies that provide cultural interpretations of action and critiques of the design process itself we hold these new approaches to and understandings of ethnography in design up to scrutiny with the purpose of enabling designers to appreciate the differences between new and existing approaches to ethnography in systems design and the practical implications this might have for design
mobile device based human centric sensing and user state recognition provide rich contextual information for various mobile applications and services however continuously capturing this contextual information consumes significant amount of energy and drains mobile device battery quickly in this paper we propose computationally efficient algorithm to obtain the optimal sensor sampling policy under the assumption that the user state transition is markovian this markov optimal policy minimizes user state estimation error while satisfying given energy consumption budget we first compare the markov optimal policy with uniform periodic sensing for markovian user state transitions and show that the improvements obtained depend upon the underlying state transition probabilities we then apply the algorithm to two different sets of real experimental traces pertaining to user motion change and inter user contacts and show that the markov optimal policy leads to an approximately improvement over the naive uniform sensing policy
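a toy simulation of the problem setup only: a two state markov user state, uniform periodic sampling under an energy budget, and estimation error measured as disagreement with the last sampled state; the paper's markov optimal policy is not implemented here

import random

def simulate_uniform_sampling(p_stay=0.9, steps=10_000, budget=0.2, seed=0):
    # two-state Markov user state; sample every 1/budget steps and estimate
    # the state in between as the last sampled value
    rng = random.Random(seed)
    period = round(1 / budget)
    state, estimate, errors = 0, 0, 0
    for t in range(steps):
        if rng.random() > p_stay:          # Markov transition
            state = 1 - state
        if t % period == 0:                # spend energy: take a sample
            estimate = state
        errors += (estimate != state)
    return errors / steps

print(simulate_uniform_sampling())   # estimation error rate, roughly 0.16 here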
cache coherence in shared memory multiprocessor systems has been studied mostly from an architecture viewpoint often by means of aggregating metrics in many cases aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications better understanding would be given by source code correlations of not only aggregate events but also finer granularity metrics directly linked to high level source code constructs such as source lines and data structures in this paper we explore novel application centric approach to studying coherence traffic we develop coherence analysis framework based on incremental coherence simulation of actual reference traces we provide tool support to extract these reference traces and synchronization information from openmp threads at runtime using dynamic binary rewriting of the application executable these traces are fed to ccsim our cache coherence simulator the novelty of ccsim lies in its ability to relate low level cache coherence metrics such as coherence misses and their causative invalidations to high level source code constructs including source code locations and data structures we explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters our quantitative results show that cache coherence traffic can be simulated with considerable degree of accuracy for spmd programs as the invalidation traffic closely matches the corresponding hardware performance counters detailed high level coherence statistics are very useful in detecting isolating and understanding coherence bottlenecks we use ccsim with several well known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall clock execution time
mining maximal frequent itemsets is one of the most fundamental problems in data mining in this paper we study the complexity theoretic aspects of maximal frequent itemset mining from the perspective of counting the number of solutions we present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in database of transactions given an arbitrary support threshold is #p complete thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is np hard this result is of particular interest since the associated decision problem of checking the existence of maximal frequent itemset is in p we also extend our complexity analysis to other similar data mining problems dealing with complex data structures such as sequences trees and graphs which have attracted intensive research interests in recent years normally in these problems partial order among frequent patterns can be defined in such way as to preserve the downward closure property with maximal frequent patterns being those without any successor with respect to this partial order we investigate several variants of these mining problems in which the patterns of interest are subsequences subtrees or subgraphs and show that the associated problems of counting the number of maximal frequent patterns are all either #p complete or #p hard
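a brute force enumerator that makes the counting problem concrete on tiny inputs; it is exponential and purely illustrative, and the hardness result indicates that no efficient general method is expected

from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    # brute force: enumerate all frequent itemsets, keep those with no
    # frequent proper superset (exponential; illustration only)
    items = sorted({i for t in transactions for i in t})
    frequent = [set(c) for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if sum(1 for t in transactions if set(c) <= t) >= min_support]
    return [s for s in frequent if not any(s < t for t in frequent)]

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(maximal_frequent_itemsets(db, min_support=2))
# three maximal frequent itemsets: {a,b}, {a,c}, {b,c}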
similar code may exist in large software projects due to some common software engineering practices such as copying and pasting code and version programming although previous work has studied syntactic equivalence and small scale coarse grained program level and function level semantic equivalence it is not known whether significant fine grained code level semantic duplications exist detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding maintenance and optimization in this paper we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size down to an executable statement our notion of functional equivalence is based on input and output behavior inspired by schwartz’s randomized polynomial identity testing we develop our core algorithm using automated random testing candidate code fragments are automatically extracted from the input program and random inputs are generated to partition the code fragments based on their output values on the generated inputs we implemented the algorithm and conducted large scale empirical evaluation of it on the linux kernel our results show that there exist many functionally equivalent code fragments that are syntactically different ie they are unlikely due to copying and pasting code the algorithm also scales to million line programs it was able to analyze the linux kernel with several days of parallel processing
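a minimal sketch of the behavioral partitioning step: candidate fragments (represented here as python callables) are grouped by their outputs on randomly generated inputs; extraction of fragments from real source code and input generation for arbitrary types are omitted

import random
from collections import defaultdict

def partition_by_behavior(fragments, n_tests=100, seed=0):
    # group callables whose outputs agree on the same randomly generated inputs
    rng = random.Random(seed)
    inputs = [rng.randint(-1000, 1000) for _ in range(n_tests)]
    buckets = defaultdict(list)
    for name, f in fragments.items():
        signature = tuple(f(x) for x in inputs)
        buckets[signature].append(name)
    return list(buckets.values())

fragments = {
    "double_shift": lambda x: x << 1,    # syntactically different but
    "double_mul":   lambda x: 2 * x,     # functionally equivalent
    "square":       lambda x: x * x,
}
print(partition_by_behavior(fragments))  # [['double_shift', 'double_mul'], ['square']]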
the asymptotic robustness of estimators as function of rarity parameter in the context of rare event simulation is often qualified by properties such as bounded relative error bre and logarithmic efficiency le also called asymptotic optimality however these properties do not suffice to ensure that moments of order higher than one are well estimated for example they do not guarantee that the variance of the empirical variance remains under control as function of the rarity parameter we study generalizations of the bre and le properties that take care of this limitation they are named bounded relative moment of order k brm-k and logarithmic efficiency of order k le-k where k ≥ 1 is an arbitrary real number we also introduce and examine stronger notion called vanishing relative centered moment of order k and exhibit examples where it holds these properties are of interest for various estimators including the empirical mean and the empirical variance we develop sufficient lyapunov type conditions for these properties in setting where state dependent importance sampling is is used to estimate first passage time probabilities we show how these conditions can guide us in the design of good is schemes that enjoy convenient asymptotic robustness properties in the context of random walks with light tailed and heavy tailed increments as another illustration we study the hierarchy between these robustness properties and few others for model of highly reliable markovian system hrms where the goal is to estimate the failure probability of the system in this setting for popular class of is schemes we show that brm-k and le-k are equivalent and that these properties become strictly stronger when k increases we also obtain necessary and sufficient condition for brm-k in terms of quantities that can be readily computed from the parameters of the model
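one consistent way to write these notions down, with rarity parameter epsilon and mu(epsilon) the expectation of the estimator X(epsilon); the paper's exact definitions may differ in technical details

\[
\text{bre:}\ \limsup_{\varepsilon\to 0}\frac{\operatorname{Var}[X(\varepsilon)]}{\mu(\varepsilon)^{2}}<\infty,
\qquad
\text{brm-}k\text{:}\ \limsup_{\varepsilon\to 0}\frac{\mathbb{E}\bigl[X(\varepsilon)^{k}\bigr]}{\mu(\varepsilon)^{k}}<\infty,
\]
\[
\text{vrcm-}k\text{:}\ \lim_{\varepsilon\to 0}\frac{\mathbb{E}\bigl[\lvert X(\varepsilon)-\mu(\varepsilon)\rvert^{k}\bigr]}{\mu(\varepsilon)^{k}}=0,
\qquad
\text{le-}k\text{:}\ \lim_{\varepsilon\to 0}\frac{\ln \mathbb{E}\bigl[X(\varepsilon)^{k}\bigr]}{k\,\ln \mu(\varepsilon)}=1.
\]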
although simulation possesses an established background and offers tremendous promise for designing and analyzing complex production systems manufacturing industry has been less successful in using it as decision support tool especially in the conceptual phase of factory design this paper presents how simplification and aggregation strategies are incorporated in modeling simulation and analysis tool with the aim of supporting decision making in conceptual phase conceptual modeling is guided by framework using an object library with generic drag and drop system components and system control objects data inputs are simplified by the use of effective process time distributions and novel aggregation method for product mix cycle time differences the resulting specification is handled through web service interface by the modeling system architecture which automatically generates simulation model and analysis case studies confirm breakthrough in project time reduction without appreciable effects on the model’s fidelity
this paper provides survey of security features in modern programming languages for computer science instructors we present the role that type safety and capabilities provide for the building of secure systems and how language systems allow designers to model security issues that once were part and parcel of operating systems or that can not be modeled by the latter
concurrency and distribution pose algorithmic and implementation challenges in developing reliable distributed systems making the field an excellent testbed for evaluating programming language and verification paradigms several specialized domain specific languages and extensions of memory unsafe languages were proposed to aid distributed system development we present an alternative to these approaches showing that modern higher order strongly typed memory safe languages provide an excellent vehicle for developing and debugging distributed systems we present opis functional reactive approach for developing distributed systems in objective caml an opis protocol description consists of reactive function called event function describing the behavior of distributed system node the event functions in opis are built from pure functions as building blocks composed using the arrow combinators such architecture aids reasoning about event functions both informally and using interactive theorem provers for example it facilitates simple termination arguments given protocol description developer can use higher order library functions of opis to deploy the distributed system run the distributed system in network simulator with full replay capabilities apply explicit state model checking to the distributed system detecting undesirable behaviors and do performance analysis on the system we describe the design and implementation of opis and present our experience in using opis to develop peer to peer overlay protocols including the chord distributed hash table and the cyclon random gossip protocol we found that using opis results in high programmer productivity and leads to easily composable protocol descriptions opis tools were effective in helping identify and eliminate correctness and performance problems during distributed system development
we describe new ways to simulate party communication protocols to get protocols with potentially smaller communication we show that every communication protocol that communicates bits and reveals bits of information about the inputs to the participating parties can be simulated by new protocol involving at most ci bits of communication if the protocol reveals bits of information about the inputs to an observer that watches the communication in the protocol we show how to carry out the simulation with bits of communication these results lead to direct sum theorem for randomized communication complexity ignoring polylogarithmic factors we show that for worst case computation computing copies of function requires times the communication required for computing one copy of the function for average case complexity given any distribution on inputs computing copies of the function on inputs sampled independently according to requires times the communication for computing one copy if is product distribution computing copies on independent inputs sampled according to requires times the communication required for computing the function we also study the complexity of computing the sum or parity of evaluations of and obtain results analogous to those above
digital library mediators allow interoperation between diverse information services in this paper we describe flexible and dynamic mediator infrastructure that allows mediators to be composed from set of modules blades each module implements particular mediation function such as protocol translation query translation or result merging all the information used by the mediator including the mediator logic itself is represented by an rdf graph we illustrate our approach using mediation scenario involving dienst and server and we discuss the potential advantages and weaknesses of our framework
the logic of bunched implications bi introduced by o’hearn and pym is substructural logic which freely combines additive and multiplicative implications boolean bi bbi denotes bi with classical interpretation of additives and its model is the commutative monoid we show that when the monoid is finitely generated and propositions are recursively defined or the monoid is infinitely generated and propositions are restricted to generator propositions the model checking problem is undecidable in the case of finitely related monoid and generator propositions the model checking problem is expspace complete
we present the architecture of scalable and dynamic intermediary infrastructure for developing and deploying advanced edge computing services by using cluster of heterogeneous machines our main goal is to address the challenges of the next generation internet services scalability high availability fault tolerance and robustness moreover secs offers an easy on the fly and per user configuration of services the architecture is based on ibm’s web based intermediaries wbi
complex relationships frequently referred to as semantic associations are the essence of the semantic web query and retrieval of semantic associations has been an important task in many analytical and scientific activities such as detecting money laundering and querying for metabolic pathways in biochemistry we believe that support for semantic path queries should be an integral component of rdf query languages in this paper we present sparqler novel extension of the sparql query language which adds the support for semantic path queries the proposed extension fits seamlessly within the overall syntax and semantics of sparql and allows easy and natural formulation of queries involving wide variety of regular path patterns in rdf graphs sparqler’s path patterns can capture many low level details of the queried associations we also present an implementation of sparqler and its initial performance results our implementation is built over brahms our own rdf storage system
recent research literature on sensor network databases has focused on finding ways to perform in network aggregation of sensor readings to reduce the message cost however with these techniques information about the state at particular location is lost in many applications such as visualization finite element analysis and cartography constructing field from all sensor readings is very important however requiring all sensors to report their readings to centralized station adversely impacts the life span of the sensor network in this paper we focus on modeling sensor networks as field deployed in physical space and exploiting in network surface simplification techniques to reduce the message cost in particular we propose two schemes for performing in network surface simplification namely hierarchical approach and triangulation based approach we focus on quad tree based method and decimation method for the two approaches respectively the quad tree based method employs an incremental refinement process during reconstruction using increasingly finer levels of detail sent by selected sensors it has guaranteed error bound the decimation method starts with triangulation of all sensors and probabilistically selects sensors not to report to prevent error accumulation to demonstrate the performance the two simplification techniques are compared with the naive approach of having all sensors report experimental results show that both techniques provide substantial message savings compared to the naive algorithm usually requiring less than as many messages and less than for some data sets furthermore though the decimation algorithm does not provide guaranteed error bound for our experiments less than of the interpolated values exceeded the given bound
many parallel applications exhibit unpredictable communication between threads leading to contention for shared objects the choice of contention management strategy impacts strongly the performance and scalability of these applications spinning provides maximum performance but wastes significant processor resources while blocking based approaches conserve processor resources but introduce high overheads on the critical path of computation under situations of high or changing load the operating system complicates matters further with arbitrary scheduling decisions which often preempt lock holders leading to long serialization delays until the preempted thread resumes execution we observe that contention management is orthogonal to the problems of scheduling and load management and propose to decouple them so each may be solved independently and effectively to this end we propose load control mechanism which manages the number of active threads in the system separately from any contention which may exist by isolating contention management from damaging interactions with the os scheduler we combine the efficiency of spinning with the robustness of blocking the proposed load control mechanism results in stable high performance for both lightly and heavily loaded systems requires no special privileges or modifications at the os level and can be implemented as library which benefits existing code
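A minimal sketch of the load-control idea described above, assuming a hypothetical LoadControlledLock wrapper; the name, the use of a counting semaphore for admission control, and the plain inner lock are illustrative choices, not the paper's implementation. The point is only that the admission cap is enforced independently of how the inner lock resolves contention.

```python
import threading

# Hedged sketch: cap the number of active threads (load control) separately from
# the contention management of the inner lock. Names are illustrative assumptions.
class LoadControlledLock:
    def __init__(self, max_active):
        self._admission = threading.Semaphore(max_active)  # load control
        self._lock = threading.Lock()                       # contended resource

    def run_critical(self, work):
        with self._admission:      # admit at most max_active threads into the system
            with self._lock:       # then resolve contention (could spin instead)
                return work()

if __name__ == "__main__":
    counter = [0]
    lcl = LoadControlledLock(max_active=4)

    def worker():
        for _ in range(1000):
            lcl.run_critical(lambda: counter.__setitem__(0, counter[0] + 1))

    threads = [threading.Thread(target=worker) for _ in range(16)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter[0])  # expected 16000
```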
more and more web users ask for contents and services highly tailored to their particular contexts of use especially due to the increasing affordability of new and powerful mobile communication devices they also appreciate the availability of ubiquitous access independent from the device actually in use due to such premises traditional software design methods need to be extended and new issues and requirements need to be addressed for supporting context aware access to services and applications in this paper we propose model driven approach towards adaptive context aware web applications accompanied by general purpose execution framework enabling active context awareness whereas conventional adaptive hypermedia systems address the problem of adapting html pages in response to user generated requests in this work we especially stress the importance of user independent context triggered adaptivity actions this finally leads us to interpret the context as an active actor operating independently from users during their navigations
evolutionary methods have been used to repair programs automatically with promising results however the fitness function used to achieve these results was based on few simple test cases and is likely too simplistic for larger programs and more complex bugs we focus here on two aspects of fitness evaluation efficiency and precision efficiency is an issue because many programs have hundreds of test cases and it is costly to run each test on every individual in the population moreover the precision of fitness functions based on test cases is limited by the fact that program either passes test case or does not which leads to fitness function that can take on only few distinct values this paper investigates two approaches to enhancing fitness functions for program repair incorporating test suite selection to improve efficiency and formal specifications to improve precision we evaluate test suite selection on programs improving running time for automated repair by we evaluate program invariants using the fitness distance correlation fdc metric demonstrating significant improvements and smoother evolution of repairs
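On the precision side, the fitness distance correlation metric mentioned above is the standard Pearson-style statistic over paired fitness/distance samples; a minimal sketch follows, where the pairing of candidate repairs with distances to a known repair is an assumed setup, not taken from the paper.

```python
import statistics

# Hedged sketch: fitness-distance correlation over paired samples,
# FDC = cov(f, d) / (std(f) * std(d)).
def fitness_distance_correlation(fitnesses, distances):
    n = len(fitnesses)
    mf, md = statistics.fmean(fitnesses), statistics.fmean(distances)
    cov = sum((f - mf) * (d - md) for f, d in zip(fitnesses, distances)) / n
    return cov / (statistics.pstdev(fitnesses) * statistics.pstdev(distances))

# toy usage: fitness correlates positively with distance to the known repair
print(fitness_distance_correlation([1, 2, 3, 4], [0.9, 2.1, 2.9, 4.2]))
```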
image computation is fundamental step in formal verification of sequential systems including sequential equivalence checking and symbolic model checking since conventional reduced ordered binary decision diagram robdd based methods can potentially suffer from memory explosion there has been growing interest in using automatic test pattern generation atpg boolean satisfiability sat based techniques in recent years while atpg has been successful for computing pre image image computation presents very different set of problems in this paper we present novel backtracing based atpg technique for forward image computation we carefully alter the atpg engine to compute the image cubes and store them incrementally in zero suppressed binary decision diagram zbdd in order to improve the efficiency of image computation we propose three heuristics i gate observability based decision selection heuristics to accelerate atpg ii search state based learning techniques supported with proof for correctness and iii on the fly state set minimization techniques to reduce the size of computed image set experimental results on iscas and itc benchmark circuits show that we can achieve orders of magnitude improvement over obdd based and sat based techniques
in recent years explosively growing information makes the users confused in making decisions among various kinds of products such as music movies books etc as result it is challenging issue to help the user identify what she or he prefers to this end so called recommender systems are proposed to discover the implicit interests in user’s mind based on the usage logs however the existing recommender systems suffer from the problems of cold start first rater sparsity and scalability to alleviate such problems we propose novel recommender namely frsa fusion of rough set and average category rating that integrates multiple contents and collaborative information to predict user’s preferences based on the fusion of rough set and average category rating through the integrated mining of multiple contents and collaborative information our proposed recommendation method can successfully reduce the gap between the user’s preferences and the automated recommendations the empirical evaluations reveal that the proposed method frsa can associate the recommended items with user’s interests more effectively than other existing well known ones in terms of accuracy
in privacy preserving data mining there is need to consider on line data collection applications in client server to user csu model in which trusted server can help clients create and disseminate anonymous data existing privacy preserving data publishing ppdp and privacy preserving data collection ppdc methods do not sufficiently address the needs of these applications in this paper we present novel ppdc method that lets respondents clients use generalization to create anonymous data in the csu model generalization is widely used for ppdp but has not been used for ppdc we propose new probabilistic privacy measure to model distribution attack and use it to define the respondent’s problem rp for finding an optimal anonymous tuple we show that rp is np hard and present heuristic algorithm for it our method is compared with number of existing ppdc and ppdp methods in experiments based on two uci datasets and two utility measures preliminary results show that our method can better protect against the distribution attack and provide good balance between privacy and data utility
palamidessi has shown that the calculus with mixed choice is powerful enough to solve the leader election problem on symmetric ring of processes we show that this is also possible in the calculus of mobile ambients ma without using communication or restriction following palamidessi’s methods we deduce that there is no encoding satisfying certain conditions from ma into ccs we also show that the calculus of boxed ambients is more expressive than its communication free fragment
hyperlog is declarative graph based language that supports database querying and update it visualizes schema information data and query output as sets of nested graphs which can be stored browsed and queried in uniform way thus the user need only be familiar with very small set of syntactic constructs hyperlog queries consist of set of graphs that are matched against the database database updates are supported by means of programs consisting of set of rules this paper discusses the formulation evaluation expressiveness and optimization of hyperlog queries and programs we also describe prototype implementation of the language and we compare and contrast our approach with work in number of related areas including visual database languages graph based data models database update languages and production rule systems
different approaches and tools have been proposed to support change impact analysis ie the identification of the potential consequences of change or the estimation of what needs to be modified to accomplish change however just few empirical studies of software developers actual change impact analysis approaches have been reported in the literature to minimize this gap this paper describes an empirical study of two software development teams it describes through the presentation of ethnographic data the strategies used by software developers to handle the effect of software dependencies and changes in their work the concept of impact management is proposed as an analytical framework to present these practices and is used to suggest avenues for future research in change impact analysis techniques
resource allocation is key aspect of shared testbed infrastructures such as planetlab and emulab despite their many differences both types of testbed have many resource allocation issues in common in this paper we explore issues related to designing general resource allocation interface that is sufficient for wide variety of testbeds current and future our explorations are informed by our experience developing and running emulab’s assign resource allocator and the sword resource discoverer our experience with the planetlab and emulab testbeds and our projection of future testbed needs
with the advent of multicore and many core architectures we are facing problem that is new to parallel computing namely the management of hierarchical parallel caches one major limitation of all earlier models is their inability to model multicore processors with varying degrees of sharing of caches at different levels we propose unified memory hierarchy model that addresses these limitations and is an extension of the mhg model developed for single processor with multi memory hierarchy we demonstrate that our unified framework can be applied to number of multicore architectures for variety of applications in particular we derive lower bounds on memory traffic between different levels in the hierarchy for financial and scientific computations we also give multicore algorithms for financial application that exhibits constant factor optimal amount of memory traffic between different cache levels we implemented the algorithm on multicore system with two quad core intel xeon ghz processors having total of cores our algorithms outperform compiler optimized and auto parallelized code by factor of up to
in this paper we outline an approach for network based information access and exploration in contrast to existing methods the presented framework allows for the integration of both semantically meaningful information as well as loosely coupled information fragments from heterogeneous information repositories the resulting bisociative information networks bisonets together with explorative navigation methods facilitate the discovery of links across diverse domains in addition to such chains of evidence they enable the user to go back to the original information repository and investigate the origin of each link ultimately resulting in the discovery of previously unknown connections between information entities of different domains subsequently triggering new insights and supporting creative discoveries
graph searching is one of the most popular tools for analyzing the chase for powerful and hostile software agent called the intruder by set of software agents called the searchers in network the existing solutions for the graph searching problem suffer however from serious drawback they are mostly centralized and assume global synchronization mechanism for the searchers in particular the search strategy for every network is computed based on the knowledge of the entire topology of the network and the moves of the searchers are controlled by centralized mechanism that decides at every step which searcher has to move and what movement it has to perform this paper addresses the graph searching problem in distributed setting we describe distributed protocol that enables searchers with logarithmic size memory to clear any network in fully decentralized manner the search strategy for the network in which the searchers are launched is computed online by the searchers themselves without knowing the topology of the network in advance it performs in an asynchronous environment ie it implements the necessary synchronization mechanism in decentralized manner in every network our protocol performs connected strategy using at most searchers where is the minimum number of searchers required to clear the network in monotone connected way using strategy computed in the centralized and synchronous setting
there is growing interest in the networked sensing community in the technique of macroprogramming where the end user can design system using high level description without worrying about the node level details since the burden of customizing the code to the target architecture is moved to the compiler that translates the high level description to generate node level codes research on the issues involved in compilation of such program assumes importance in this paper we list some issues that need to be resolved by the designers of compiler for such macroprogramming framework including the decisions to be made in the choice of an abstraction the design of the runtime system and the generating of the code for each node we discuss some solution techniques that we are currently exploring to solve the above problems
traditional compiler techniques developed for sequential programs do not guarantee the correctness sequential consistency of compiler transformations when applied to parallel programs this is because traditional compilers for sequential programs do not account for the updates to shared variable by different threads we present concurrent static single assignment cssa form for parallel programs containing cobegin coend and parallel do constructs and post wait synchronization primitives based on the cssa form we present copy propagation and dead code elimination techniques also global value numbering technique that detects equivalent variables in parallel programs is presented by using global value numbering and the cssa form we extend classical common subexpression elimination redundant load store elimination and loop invariant detection to parallel programs without violating sequential consistency these optimization techniques are the most commonly used techniques for sequential programs by extending these techniques to parallel programs we can guarantee the correctness of the optimized program and maintain single processor performance in multiprocessor environment
data integration is the problem of combining data residing at different sources and providing the user with unified view of these data the problem of designing data integration systems is important in current real world applications and is characterized by number of issues that are interesting from theoretical point of view this document presents an overview of the material to be presented in tutorial on data integration the tutorial is focused on some of the theoretical issues that are relevant for data integration special attention will be devoted to the following aspects modeling data integration application processing queries in data integration dealing with inconsistent data sources and reasoning on queries
in the multiapplicative context of smart cards strict control of underlying information flow between applications is highly desired in this paper we propose model to improve information flow usability in such systems by limiting the overhead for adding information flow security to java virtual machine we define domain specific language for defining security policies describing the allowed information flow inside the card the applications are certified at loading time with respect to information flow security policies we illustrate our approach on the loyaltycard multiapplicative smart card involving four loyalty applications sharing fidelity points
future generations of chip multiprocessors cmp will provide dozens or even hundreds of cores inside the chip writing applications that benefit from the massive computational power offered by these chips is not going to be an easy task for mainstream programmers who are used to sequential algorithms rather than parallel ones this paper explores the possibility of using transactional memory tm in openmp the industrial standard for writing parallel programs on shared memory architectures for and fortran one of the major complexities in writing openmp applications is the use of critical regions locks atomic regions and barriers to synchronize the execution of parallel activities in threads tm has been proposed as mechanism that abstracts some of the complexities associated with concurrent access to shared data while enabling scalable performance the paper presents first proof of concept implementation of openmp with tm some language extensions to openmp are proposed to express transactions these extensions are implemented in our source to source openmp mercurium compiler and our software transactional memory stm runtime system nebelung that supports the code generated by mercurium hardware transactional memory htm or hardware assisted stm hastm are seen as possible paths to make the tandem tm openmp more scalable in the evaluation section we show the preliminary results the paper finishes with set of open issues that still need to be addressed either in openmp or in the hardware software implementations of tm
with the development of inexpensive storage devices space usage is no longer bottleneck for computer users however the increasingly large amount of personal information poses critical problem to those users traditional file organization in hierarchical directories may not be suited to the effective management of personal information because it ignores the semantic associations therein and bears no connection with the applications that users will run to address such limitations we present our vision of semantic desktop which relies on the use of ontologies to annotate and organize data and on the concept of personal information application pia which is associated with user’s task the pia designer is the tool that is provided for building variety of pias consisting of views eg text list table graph which are spatially arranged and display interrelated fragments of the overall personal information the semantic organization of the data follows layered architecture that models separately the personal information the domain data and the application data the network of concepts that ensues from extensive annotation and explicit associations lends itself well to rich browsing capabilities and to the formulation of expressive database like queries these queries are also the basis for the interaction among views of the pias in the same desktop or in networked desktops in the latter case the concept of desktop service provides for semantic platform for the integration of information across different desktops and the web in this paper we present in detail the semantic organization of the information the overall system architecture and implementation aspects queries and their processing pias and the pia designer including usability studies on the designer and the concepts of semantic navigation in desktop and of interoperation in network of desktops
power analysis tools are an integral component of any current power sign off methodology the performance of design’s power grid affects the timing and functionality of circuit directly impacting the overall performance ensuring power grid robustness implies taking into account among others static and dynamic effects of voltage drop ground bounce and electromigration this type of verification is usually done by simulation targeting worst case scenario where devices switching almost simultaneously could impose stern current demands on the power grid while determination of the exact worst case switching conditions from the grid perspective is usually not practical the choice of simulation stimuli has critical effect on the results of the analysis targeting safe but unrealistic settings could lead to pessimistic results and costly overdesigns in terms of die area in this article we describe software tool that generates reasonable realistic set of stimuli for simulation the approach proposed accounts for timing and spatial restrictions that arise from the circuit’s netlist and placement and generates an approximation to the worst case condition the resulting stimuli indicate that only fraction of the gates change in any given timing window leading to more robust verification methodology especially in the dynamic case generating such stimuli is akin to performing standard static timing analysis so the tool fits well within conventional design frameworks furthermore the tool can be used for hotspot detection in early design stages
relevance feedback is an effective scheme bridging the gap between high level semantics and low level features in content based image retrieval cbir in contrast to previous methods which rely on labeled images provided by the user this article attempts to enhance the performance of relevance feedback by exploiting unlabeled images existing in the database concretely this article integrates the merits of semisupervised learning and active learning into the relevance feedback process in detail in each round of relevance feedback two simple learners are trained from the labeled data that is images from user query and user feedback each learner then labels some unlabeled images in the database for the other learner after retraining with the additional labeled data the learners reclassify the images in the database and then their classifications are merged images judged to be positive with high confidence are returned as the retrieval result while those judged with low confidence are put into the pool which is used in the next round of relevance feedback experiments show that using semisupervised learning and active learning simultaneously in cbir is beneficial and the proposed method achieves better performance than some existing methods
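A highly simplified sketch of one feedback round in this spirit: the two simple learners are assumed here to be nearest-centroid scorers under two different metrics (the abstract does not specify the learners), each labeling its most confident unlabeled images for the other before the pool is reranked.

```python
# Hedged sketch of a co-inference round combining semisupervised and active learning
# for relevance feedback; learners, metrics, and confidence rules are illustrative.
def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]

def make_scorer(metric):
    def scorer(image, pos_c, neg_c):
        # relevance score: distance to negative centroid minus distance to positive one
        return metric(image, neg_c) - metric(image, pos_c)
    return scorer

euclidean = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))

def feedback_round(pos, neg, unlabeled, per_learner=1):
    for scorer in (make_scorer(euclidean), make_scorer(manhattan)):
        pc, nc = centroid(pos), centroid(neg)
        ranked = sorted(unlabeled, key=lambda v: scorer(v, pc, nc), reverse=True)
        pos, neg = pos + ranked[:per_learner], neg + ranked[-per_learner:]
        unlabeled = ranked[per_learner:-per_learner]   # rest stays unlabeled
    return pos, neg, unlabeled

pos, neg, pool = feedback_round([(1.0, 1.0)], [(9.0, 9.0)],
                                [(2.0, 1.5), (8.0, 8.5), (5.0, 5.0), (1.5, 2.0)])
print(pos, neg)
```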
in many sensor network applications it is critical for the base station to know the delivery or execution status of its broadcast messages or commands one straightforward way to do so is to let every sensor node send an authenticated acknowledgement ack to the bs directly however this naive solution is highly communication inefficient and may result in severe ack implosion near the bs in this paper we propose communication efficient scheme to provide secure feedback service in sensor networks in our basic scheme we use ack aggregation to reduce the ack traffic meanwhile we store audit information for each aggregation operation so that the bs can use the audit information to locate errors in the network we further improve the basic scheme by constructing balanced aggregation tree to reduce localization delay and using bloom filters to reduce storage requirement in each sensor for storing audit information we analyze the performance of the proposed scheme and show it achieves good bandwidth gain over the naive approach
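Below is a minimal Bloom filter of the kind that could hold compact per-aggregation audit information as described above; the bit-array size, hash construction, and key format are illustrative assumptions rather than the paper's parameters.

```python
import hashlib

# Hedged sketch: a minimal Bloom filter; sizes and hashing are illustrative only.
class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("ack:node17:round3")                     # record an audit entry
print("ack:node17:round3" in bf)                # True
print("ack:node99:round3" in bf)                # False (with high probability)
```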
in this paper the multiple piecewise constant mpc active contour model is extended to deal with multiphase case this proposed multiphase model can be effectively optimized by solving the minimum cuts problem of specially devised multilayer graph based on the proposed energy functional and its graph cuts optimization an interactively multiphase partition method for image segmentation is presented the user places some scribbles with different colors on the image according to the practical application demand and each group of scribbles with the same color corresponds to potential image region the distribution of each region can be learned from the input scribbles with some particular color then the corresponding multilayer graph can be constructed and its minimum cuts can be computed to determine the segmentation result of the image numerical experiments show that the proposed interactively multiphase segmentation method can accurately segment the image into different regions according to the input scribbles with different color
we present an approach for the verification of spatial properties with spin we first extend one of spin’s main property specification mechanisms ie the linear time temporal logic ltl with spatial connectives that allow us to restrict the reasoning of the behaviour of system to some components of the system only for instance one can express whether the system can reach certain state from which subset of processes can evolve alone until some property is fulfilled we give model checking algorithm for the logic and propose how spin can be minimally extended to include the algorithm we also discuss potential improvements to mitigate the exponential complexity introduced by spatial connectives finally we present some experiments that compare our spin extension with spatial model checker for the calculus
intuition is often not good guide to know which testing strategies will work best there is no substitute for experimental analysis based on objective criteria how many faults strategy finds and how fast random testing is an example of an idea that intuitively seems simplistic or even dumb but when assessed through such criteria can yield better results than seemingly smarter strategies the efficiency of random testing is improved if the generated inputs are evenly spread across the input domain this is the idea of adaptive random testing art art was initially proposed for numerical inputs on which notion of distance is immediately available to extend the ideas to the testing of object oriented software we have developed notion of distance between objects and new testing strategy called artoo which selects as inputs objects that have the highest average distance to those already used as test inputs artoo has been implemented as part of tool for automated testing of object oriented software we present the artoo concepts their implementation and set of experimental results of its application analysis of the results shows in particular that compared to directed random strategy artoo reduces the number of tests generated until the first fault is found in some cases by as much as two orders of magnitude artoo also uncovers faults that the random strategy does not find in the time allotted and its performance is more predictable
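The core adaptive random testing selection step can be sketched as follows, assuming a numeric input domain and a plain absolute-difference distance; ARTOO's object distance over object-oriented inputs is richer than this, so the snippet only illustrates the "pick the candidate farthest on average from previously used inputs" idea.

```python
import random

# Hedged sketch of the ART candidate-selection step with an illustrative distance.
def art_pick(candidates, used, distance):
    if not used:
        return random.choice(candidates)
    avg = lambda c: sum(distance(c, u) for u in used) / len(used)
    return max(candidates, key=avg)      # farthest on average from used inputs

used = []
for _ in range(5):
    candidates = [random.uniform(0, 100) for _ in range(10)]
    used.append(art_pick(candidates, used, lambda a, b: abs(a - b)))
print(sorted(round(x, 1) for x in used))  # inputs tend to spread across [0, 100]
```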
this paper presents principled framework for efficient processing of ad hoc top ranking aggregate queries which provide the groups with the highest aggregates as results essential support of such queries is lacking in current systems which process the queries in naïve materialize group sort scheme that can be prohibitively inefficient our framework is based on three fundamental principles the upper bound principle dictates the requirements of early pruning and the group ranking and tuple ranking principles dictate group ordering and tuple ordering requirements they together guide the query processor toward provably optimal tuple schedule for aggregate query processing we propose new execution framework to apply the principles and requirements we address the challenges in realizing the framework and implementing new query operators enabling efficient group aware and rank aware query plans the experimental study validates our framework by demonstrating orders of magnitude performance improvement in the new query plans compared with the traditional plans
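A toy sketch of the upper-bound-driven pruning idea for top-k SUM groups appears below; the bound used (group size times the largest in-group value, valid for non-negative values) and the simple group ordering are assumptions for illustration, whereas the framework above covers more general aggregates, bounds, and tuple schedules.

```python
from collections import defaultdict

# Hedged sketch: visit groups in decreasing order of an upper bound on their
# aggregate and stop once the bound cannot beat the current k-th best sum.
def topk_sum_groups(tuples, k):
    groups = defaultdict(list)
    for g, v in tuples:
        groups[g].append(v)
    bound = {g: len(vs) * max(vs) for g, vs in groups.items()}   # illustrative bound
    results = []
    for g in sorted(groups, key=lambda g: -bound[g]):            # group ranking
        kth_best = min(results)[0] if len(results) == k else float("-inf")
        if bound[g] <= kth_best:
            break                                                # remaining groups pruned
        agg = sum(groups[g])                                     # materialize only this group
        results = sorted(results + [(agg, g)], reverse=True)[:k]
    return results

data = [("a", 5), ("a", 4), ("b", 9), ("b", 1), ("c", 2), ("c", 2)]
print(topk_sum_groups(data, k=2))   # [(10, 'b'), (9, 'a')]; group 'c' is pruned
```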
the effectiveness of current software development strategies such as model driven development mdd depends largely on the quality of their primary artefacts ie software models as the standard modelling language for software systems is the unified modelling language uml quality assurance of uml models is major research field in computer science understandability ie model’s ability to be easily understood is one model quality property that is currently heavily under investigation in particular researchers are searching for the factors that determine an uml model’s understandability and are looking for ways to manipulate these factors this paper presents an empirical study investigating the effect that structural complexity has on the understandability of one particular type of uml model ie the statechart diagram based on data collected in family of three experiments we have identified three dimensions of structural complexity that affect understandability the size and control flow complexity of the statechart in terms of features such as the number of states events guards and state transitions ii the actions that are performed when entering or leaving state iii the sequence of actions that is performed while staying within state based on these structural complexity dimensions we have built an understandability prediction model using regression technique that is specifically recommended for data obtained through repeated measures design our test results show that each of the underlying structural complexity dimensions has significant impact on the understandability of statechart diagram
sensor network is network consisting of small inexpensive low powered sensor nodes that communicate to complete common task sensor nodes are characterized by having limited communication and computation capabilities energy and storage they often are deployed in hostile environments creating demand for encryption and authentication of the messages sent between them due to severe resource constraints on the sensor nodes efficient key distribution schemes and secure communication protocols with low overhead are desired in this paper we present an asynchronous group key distribution scheme with no time synchronization requirements the scheme decreases the number of key updates by providing them on an as needed basis according to the amount of network traffic we evaluate the cc radio security mechanism and show how to use it as basis to implement secure group communication using our proposed group key distribution scheme
when comparing discrete probability distributions natural measures of similarity are not distances but rather are information divergences such as kullback leibler and hellinger this paper considers some of the issues related to constructing small space sketches of distributions in the data stream model concept related to dimensionality reduction such that these measures can be approximated from the sketches related problems for distances are reasonably well understood via series of results by johnson and lindenstrauss contemp math alon et al comput syst sci indyk ieee symposium on foundations of computer science pp and brinkman and charikar ieee symposium on foundations of computer science pp in contrast almost no analogous results are known to date about constructing sketches for the information divergences used in statistics and learning theory our main result is an impossibility result that shows that no small space sketches exist for the multiplicative approximation of any commonly used divergences and bregman divergences with the notable exceptions of and where small space sketches exist we then present data stream algorithms for the additive approximation of wide range of information divergences throughout our emphasis is on providing general characterizations
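For reference, the two divergences named above, for distributions p and q on n outcomes, under one common normalization of the squared Hellinger distance:

```latex
\[
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i},
\qquad
H^2(p, q) = \frac{1}{2} \sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 .
\]
```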
traditionally the performance of distributed algorithms has been measured in terms of time and message complexity message complexity concerns the number of messages transmitted over all the edges during the course of the algorithm however in energy constrained ad hoc wireless networks eg sensor networks energy is critical factor in measuring the efficiency of distributed algorithm transmitting message between two nodes has an associated cost energy and moreover this cost can depend on the two nodes eg the distance between them among other things thus in addition to the time and message complexity it is important to consider energy complexity that accounts for the total energy associated with the messages exchanged among the nodes in distributed algorithm this paper addresses the minimum spanning tree mst problem fundamental problem in distributed computing and communication networks we study energy efficient distributed algorithms for the euclidean mst problem assuming random distribution of nodes we show non trivial lower bound of log on the energy complexity of any distributed mst algorithm we then give an energy optimal distributed algorithm that constructs an optimal mst with energy complexity log on average and log log log with high probability this is an improvement over the previous best known bound on the average energy complexity of log our energy optimal algorithm exploits novel property of the giant component of sparse random geometric graphs all of the above results assume that nodes do not know their geometric coordinates if the nodes know their own coordinates then we give an algorithm with energy complexity which is the best possible that gives an approximation to the mst
we present the formal semantics of future in scheme like language which has both side effects and first class continuations correctness is established by proving that programs annotated by future have the same observable behaviour as their non annotated counterparts even though evaluation may be parallel
we review query log of hundreds of millions of queries that constitute the total query traffic for an entire week of general purpose commercial web search service previously query logs have been studied from single cumulative view in contrast our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day we examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre categorized by human editors this represents of the query traffic we show that query traffic from particular topical categories differs both from the query stream as whole and from other categories this analysis provides valuable insight for improving retrieval effectiveness and efficiency it is also relevant to the development of enhanced query disambiguation routing and caching algorithms
while deployment and practical on site testing remains the ultimate touchstone for sensor network code good simulation tools can help curtail in field troubleshooting time unfortunately current simulators are successful only at evaluating system performance and exposing manifestations of errors they are not designed to diagnose the root cause of the exposed anomalous behavior this paper presents diagnostic simulator implemented as an extension to tossim it i allows the user to ask questions such as why is some specific bad behavior occurring and ii conjectures on possible causes of the user specified behavior when it is encountered during simulation the simulator works by logging event sequences and states produced in regular simulation run it then uses sequence extraction and frequent pattern analysis techniques to recognize sequences and states that are possible root causes of the user defined undesirable behavior to evaluate the effectiveness of the tool we have implemented the directed diffusion protocol and used our tool during the development process during this process the tool was able to uncover two design bugs that were not addressed in the original protocol the manifestations of these two bugs were the same but the causes of failure were completely different one was triggered by node reboot and the other was triggered by an overflow of timestamps generated by the local clock the case study demonstrates success scenario for diagnostic simulation
monitoring and diagnosing software based on requirement models is problem that has recently received lot of attention in field of requirement engineering in this context wang et al propose framework that uses goal models to diagnose failures in software at different levels of granularity in this paper we extend wang’s framework to monitor and diagnose malicious attacks our extensions include the addition of anti goals to model attacker intentions as well as context based modeling of the domain within which our system operates the extended framework has been implemented and evaluated through series of experiments intended to test its scalability
an iconic image database is collection of symbolic images where each image is collection of labeled point features called icons method is presented to support fast position independent similarity search in an iconic database for symbolic images where the similarity condition involves finding icon pairs that satisfy specific spatial relationship this is achieved by introducing an index data structure based on space which corresponds to the cartesian product of separation ie inter icon distance and some representation of relative spatial orientation in this space each pairing of two icons is represented by single point and all pairs with the same separation and relative orientation regardless of absolute position map to the same point similarly all icon pairs with the same separation but different relative orientations map to points on line parallel to the axis while all pairs with different separations but the same relative orientation map to points on line parallel to the axis using such an index database search for icon pairs with given spatial relationship or range is accomplished by examining the subarea of the index space into which desired pairs would map this index space can be organized using well known spatial database techniques such as quadtrees or trees although the size of such an index grows only linearly with respect to the number of images in the collection it grows quadratically with the average number of icons in an image scheme is described to reduce the size of the index by pruning away subset of the pairs at the cost of incurring additional work when searching the database this pruning is governed by parameter whose variation provides continuous range of trade offs between index size and search time
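A small sketch of how icon pairs map into the (separation, relative orientation) index space described above; the labels and coordinates are made up, and the resulting points would then be organized with a quadtree or similar spatial structure as the abstract suggests.

```python
import math
from itertools import combinations

# Hedged sketch: map every icon pair in a symbolic image to a point in the
# (separation, relative-orientation) index space; data are illustrative.
def pair_index_points(icons):
    """icons: dict label -> (x, y); yields ((label_a, label_b), (separation, angle))."""
    for (la, (xa, ya)), (lb, (xb, yb)) in combinations(sorted(icons.items()), 2):
        separation = math.hypot(xb - xa, yb - ya)
        orientation = math.atan2(yb - ya, xb - xa)   # relative orientation in radians
        yield (la, lb), (separation, orientation)

image = {"lake": (0, 0), "camp": (3, 4), "road": (6, 0)}
for pair, point in pair_index_points(image):
    print(pair, point)
```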
data stream classification is hot topic in data mining research the great challenge is that the class priors may evolve along the data sequence algorithms have been proposed to estimate the dynamic class priors and adjust the classifier accordingly however the existing algorithms do not perform well on prior estimation due to the lack of samples from the target distribution sample size has great effects in parameter estimation and small sample effects greatly contaminate the estimation performance in this paper we propose novel parameter estimation method called transfer estimation transfer estimation makes use of samples not only from the target distribution but also from similar distributions we apply this new estimation method to the existing algorithms and obtain an improved algorithm experiments on both synthetic and real data sets show that the improved algorithm outperforms the existing algorithms on both class prior estimation and classification
domain specific techniques take advantage of the commonalities among applications developed within certain domain they are known to improve quality and productivity by incorporating domain knowledge and previous project experiences and promote reuse this paper describes six domain specific software engineering techniques for developing multimedia applications within the digital library domain we provide examples of each technique from several projects in which they were used how the techniques are used within general software engineering practice in particular mbase how the techniques address some of the particular challenges of multimedia software engineering and the positive impacts we have measured resulting from their use within graduate level software engineering course
in this paper we introduce new surface representation the displaced subdivision surface it represents detailed surface model as scalar valued displacement over smooth domain surface our representation defines both the domain surface and the displacement function using unified subdivision framework allowing for simple and efficient evaluation of analytic surface properties we present simple automatic scheme for converting detailed geometric models into such representation the challenge in this conversion process is to find simple subdivision surface that still faithfully expresses the detailed model as its offset we demonstrate that displaced subdivision surfaces offer number of benefits including geometry compression editing animation scalability and adaptive rendering in particular the encoding of fine detail as scalar function makes the representation extremely compact
gifford and others proposed an effect typing discipline to delimit the scope of computational effects within program while moggi and others proposed monads for much the same purpose here we marry effects to monads uniting two previously separate lines of research in particular we show that the type region and effect system of talpin and jouvelot carries over directly to an analogous system for monads including type and effect reconstruction algorithm the same technique should allow one to transpose any effect systems into corresponding monad system
our work addresses the spatiotemporally varying nature of data traffic in environmental monitoring and surveillance applications by employing network controlled mobile basestation mb we present simple energy efficient data collection protocol for wireless sensor networks wsns in contrast to the existing mb based solutions where wsn nodes buffer data passively until visited by an mb our protocol maintains an always on multihop connectivity to the mb by means of an efficient distributed tracking mechanism this allows the nodes to forward their data in timely fashion avoiding latencies due to long term buffering our protocol progressively relocates the mb closer to the regions that produce higher data rates and reduces the average weighted multihop traffic enabling energy savings using the convexity of the cost function we prove that our local and greedy protocol is in fact optimal
entity matching is crucial and difficult task for data integration entity matching frameworks provide several methods and their combination to effectively solve different match tasks in this paper we comparatively analyze proposed frameworks for entity matching our study considers both frameworks which do or do not utilize training data to semi automatically find an entity matching strategy to solve given match task moreover we consider support for blocking and the combination of different match algorithms we further study how the different frameworks have been evaluated the study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations the proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations
the aim of this paper is to show how the generic approach to connector architectures presented in the first part of this work can be applied to given modeling formalism to define architectural component and connector notions associated to that formalism starting with review of the generic approach in this second part of the paper we consider two modeling formalisms elementary petri nets and csp as main results we show that both cases satisfy the axioms of our component framework so that the results concerning the semantics of architectures can be applied moreover small case study in terms of petri nets is presented in order to show how the results can be applied to connector architecture based on petri nets
the skyline of dimensional dataset consists of all points not dominated by others the incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module however existing theoretical work on this problem is limited to the case where all dimensions are independent of each other which rarely holds for real datasets the state of the art log sampling ls technique simply applies theoretical results for independent dimensions to non independent data anyway sometimes leading to large estimation errors to solve this problem we propose novel kernel based kb approach that approximates the skyline cardinality with nonparametric methods extensive experiments with various real datasets demonstrate that kb achieves high accuracy even in cases where ls fails at the same time despite its numerical nature the efficiency of kb is comparable to that of ls furthermore we extend both ls and kb to the dominant skyline which is commonly used instead of the conventional skyline for high dimensional data
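To make the dominance notion concrete, here is a brute-force skyline computation (smaller taken as better on every dimension); note the paper is about estimating the skyline's cardinality, not about computing the skyline itself, so this is context only.

```python
# Hedged sketch: skyline = points not dominated by any other point.
def dominates(p, q):
    # p dominates q if p is no worse in every dimension and strictly better in one
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 9), (3, 3), (2, 8), (5, 1), (4, 4), (1, 8)]
print(skyline(pts))   # [(3, 3), (5, 1), (1, 8)]
```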
this paper reports on study that explored ways of inventing and devising movement for use in the design of movement based interaction with video based motion sensing technologies methods that dancers trained in movement improvisation and performance making used to choreograph movement were examined as sources of potential methods for technology designers the findings enabled us to develop methods and tools for creating and structuring new movements based on felt experience and the creative potential of the moving body these methods and tools contribute to the ongoing development of design methodology underpinned by the principle of making strange by making strange we mean ways of unsettling habitual perceptions and conceptions of the moving body to arrive at fresh appreciations and perspectives for design that are anchored in the sensing feeling and moving body
the availability of low cost hardware has increased the development of distributed systems the allocation of resources in these systems may be optimised through the use of load balancing algorithm the load balancing algorithms are responsible for the homogeneous distribution of the occupation in the environment with the aim of obtaining gains on the final performance this paper presents and analyses new load balancing algorithm that is based on logical computer tree hierarchy the analysis is made using results obtained by simulator and prototype of this algorithm the algorithm simulations show significant performance gain by lowering the response times and the number of messages that pass through the communication system to perform the load balancing operations after good results were obtained by the simulations a prototype was built and it validated these results
large multi processor systems on chip use networks on chip with high degree of reusability and scalability for message communication therefore network infrastructure is crucial element affecting the overall system performance on the other hand technology improvements may lead to much energy consumption in micro routers of an on chip network this necessitates an exhaustive analysis of nocs for future designs this paper presents comprehensive analytical model to predict message latency for different data flows traversing across the network this model considers channel buffers of multiple flits which were not previously studied in noc context also architectural descriptions of the overall consumed power in the network components are extracted considering message arrival and service rates the results obtained from simulation experiments confirm that the proposed performance and power models exhibit good accuracy for various network configurations and workloads
networks on chip noc have emerged as the design paradigm for scalable system on chip communication infrastructure growing number of applications often with firm frt or soft real time srt requirements are integrated on the same chip to provide time related guarantees noc resources are reserved eg by non work conserving time division multiplexing tdm traditionally reservations are made on per communication channel basis thus providing frt guarantees to individual channels for srt applications this strategy is overly restrictive as slack bandwidth is not used to improve performance in this paper we introduce the concept of channel trees where time slots are reserved for sets of communication channels by employing work conserving arbitration within tree we exploit the inherent single threaded behaviour of the resource at the root of the tree resulting in drastic reduction in both average case latency and tdm table size we show how channel trees enable us to halve the latter in car entertainment soc and reduce the average latency by as much as in mobile phone soc by applying channel trees to an decoder soc we increase processor utilisation by
distributed processors must balance communication and concurrency when dividing instructions among the processors key factors are the available concurrency criticality of dependence chains and communication penalties the amount of concurrency determines the importance of the other factors if concurrency is high wider distribution of instructions is likely to tolerate the increased operand routing latencies if concurrency is low mapping dependent instructions close to one another is likely to reduce communication costs that contribute to the critical path this paper explores these tradeoffs for distributed explicit dataflow graph execution edge architectures that execute blocks of dataflow instructions atomically runtime block mapper assigns instructions from single thread to distributed hardware resources cores based on compiler assigned instruction identifiers we explore two approaches fixed strategies that map all blocks to the same number of cores and adaptive strategies that vary the number of cores for each block the results show that best fixed strategy varies based on the cores issue width simple adaptive strategy improves performance over the best fixed strategies for single and dual issue cores but its benefits decrease as the cores issue width increases these results show that by choosing an appropriate runtime block mapping strategy average performance can be increased by while simultaneously reducing average operand communication by saving energy as well as improving performance these results indicate that runtime block mapping is promising mechanism for balancing communication and concurrency in distributed processors
new progressive lossless triangular mesh encoder is proposed in this work which can encode any triangular mesh with an arbitrary topological structure given mesh the quantized vertices are first partitioned into an octree ot structure which is then traversed from the root and gradually to the leaves during the traversal each cell in the tree front is subdivided into eight childcells for each cell subdivision both local geometry and connectivity changes are encoded where the connectivity coding is guided by the geometry coding furthermore prioritized cell subdivision is performed in the tree front to provide better rate distortion rd performance experiments show that the proposed mesh coder outperforms the kd tree algorithm in both geometry and connectivity coding efficiency for the geometry coding part the range of improvement is typically around but may go up to for meshes with highly regular geometry data and or tight clustering of vertices
this article provides some new insight into the properties of four well established classifier paradigms namely support vector machines svm classifiers based on mixture density models cmm fuzzy classifiers fcl and radial basis function neural networks rbf it will be shown that these classifiers can be formulated in way such that they are functionally equivalent or at least highly similar the interpretation of specific classifier as being an svm cmm fcl or rbf then only depends on the objective function and the optimization algorithm used to adjust the parameters the properties of these four paradigms however are very different discriminative classifier such as an svm is expected to have optimal generalization capabilities on new data generative classifier such as cmm also aims at modeling the processes from which the observed data originate and comprehensible classifier such as an fcl is intended to be parameterized and understood by human domain experts we will discuss the advantages and disadvantages of these properties and show how they can be measured numerically in order to compare these classifiers in such way the article aims at supporting practitioner in assessing the properties of classifier paradigms and in selecting or combining certain paradigms for given application problem
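One hedged way to make the claimed functional similarity concrete (my rephrasing, not the article's exact formulation) is to note that, with Gaussian kernels, basis functions, mixture components, or membership functions, all four paradigms realize discriminants of the same form:

```latex
% A hedged illustration, not the article's exact derivation: an SVM with an RBF
% kernel, an RBF network, a CMM with Gaussian components, and a fuzzy classifier
% with Gaussian membership functions all realize discriminant functions of the form
\[
  f(\mathbf{x}) \;=\; \sum_{j=1}^{m} w_j
      \exp\!\left(-\frac{\lVert \mathbf{x}-\boldsymbol{\mu}_j \rVert^{2}}{2\sigma_j^{2}}\right) + b ,
\]
% and differ mainly in how the centres \mu_j, widths \sigma_j and weights w_j are
% obtained: margin maximization for the SVM, likelihood/EM for the CMM, expert- or
% data-driven rule construction for the FCL, and least-squares training for the RBF net.
```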
feedback directed optimization has developed into an increasingly important tool in designing optimizing compilers based upon profiling memory distance analysis has shown much promise in predicting data locality and memory dependences and has seen use in locality based optimizations and memory disambiguation in this paper we apply form of memory distance called store distance to the problem of memory disambiguation in out of order issue processors store distance is defined as the number of store references between load and the previous store accessing the same memory location by generating representative store distance for each load instruction we can apply compiler micro architecture cooperative scheme to direct run time load speculation using store distance the processor can in most cases accurately determine on which specific store instruction load depends according to its store distance annotation our experiments show that the proposed store distance method performs much better than the previous distance based memory disambiguation scheme and yields performance very close to perfect memory disambiguation the store distance based scheme also outperforms the store set technique with relatively small predictor space and achieves performance comparable to that of entry store set implementation for both floating point and integer programs
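A minimal sketch, assuming a simple (kind, pc, address) reference trace, of how store distances could be collected per load instruction; a representative value per load PC (for example the maximum or a high percentile) would then become the compiler annotation. The trace format and names are hypothetical.

```python
from collections import defaultdict

def store_distances(trace):
    """For each load in a (kind, pc, addr) trace, record the number of store
    references executed since the most recent store to the same address."""
    store_count = 0                  # total stores seen so far
    last_store_at = {}               # addr -> value of store_count when it was last written
    per_load_pc = defaultdict(list)  # load pc -> observed store distances
    for kind, pc, addr in trace:
        if kind == "store":
            store_count += 1
            last_store_at[addr] = store_count
        elif kind == "load" and addr in last_store_at:
            per_load_pc[pc].append(store_count - last_store_at[addr])
    return per_load_pc

trace = [("store", 0x10, 0xA0), ("store", 0x14, 0xB0),
         ("load", 0x20, 0xA0),   ("store", 0x18, 0xC0),
         ("load", 0x20, 0xA0)]
print(dict(store_distances(trace)))   # {0x20: [1, 2]}
```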
in this paper we propose linear model based general framework to combine best parse outputs from multiple parsers the proposed framework leverages on the strengths of previous system combination and re ranking techniques in parsing by integrating them into linear model as result it is able to fully utilize both the logarithm of the probability of each best parse tree from each individual parser and any additional useful features for feature weight tuning we compare the simulated annealing algorithm and the perceptron algorithm our experiments are carried out on both the chinese and english penn treebank syntactic parsing task by combining two state of the art parsing models head driven lexicalized model and latent annotation based un lexicalized model experimental results show that our scores of on chinese and on english outperform the previously best reported systems by and respectively
in this paper we present cosmopen reverse engineering tool optimized for the behavioural analysis of complex layered software cosmopen combines cheap and non intrusive observation techniques with versatile graph manipulation engine by programming different graph manipulation scripts the 'focal length' of our tool can be adapted to different abstraction levels we illustrate how our tool can be used to extract high level behavioural models from complex multi threaded platform gnu linux corba middleware
requirements views such as coverage and status views are an important asset for monitoring and managing software development projects we have developed method that automates the process of reconstructing these views and we have built tool reqanalyst that supports this method this paper presents an investigation as to which extent requirements views can be automatically generated in order to monitor requirements in industrial practice the paper focuses on monitoring the requirements in test categories and test cases in order to retrieve the necessary data an information retrieval technique called latent semantic indexing was used the method was applied in an industrial study number of requirements views were defined and experiments were carried out with different reconstruction settings for generating these views finally we explored how these views can help the developers during the software development process
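A hedged sketch of the reconstruction step using off-the-shelf scikit-learn components (TF-IDF, truncated SVD as LSI, cosine similarity); the documents, the number of latent dimensions and the link threshold are illustrative, not the settings used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical requirement and test-case descriptions
requirements = {"R1": "the system shall encrypt stored user passwords",
                "R2": "the system shall export monthly reports as pdf"}
test_cases   = {"T1": "verify that passwords are stored encrypted",
                "T2": "check pdf export of the monthly report"}

corpus = list(requirements.values()) + list(test_cases.values())
tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # LSI projection

n_req = len(requirements)
sims = cosine_similarity(lsi[:n_req], lsi[n_req:])
for i, rid in enumerate(requirements):
    for j, tid in enumerate(test_cases):
        if sims[i, j] > 0.7:                       # illustrative reconstruction threshold
            print(f"{rid} is covered by {tid} (similarity {sims[i, j]:.2f})")
```

Varying the threshold and the number of latent dimensions corresponds to the different reconstruction settings explored in the experiments.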
in this paper we conduct an in depth evaluation of broad spectrum of scheduling alternatives for clusters these include the widely used batch scheduling local scheduling gang scheduling all prior communication driven coscheduling algorithms dynamic coscheduling dcs spin block sb periodic boost pb and co ordinated coscheduling cc and newly proposed hybrid coscheduling algorithm on node myrinet connected linux cluster performance and energy measurements using several nas llnl and anl benchmarks on the linux cluster provide several interesting conclusions first although batch scheduling is currently used in most clusters all blocking based coscheduling techniques such as sb cc and hybrid and the gang scheduling can provide much better performance even in dedicated cluster platform second in contrast to some of the prior studies we observe that blocking based schemes like sb and hybrid can provide better performance than spin based techniques like pb on linux platform third the proposed hybrid scheduling provides the best performance energy behavior and can be implemented on any cluster with little effort all these results suggest that blocking based coscheduling techniques are viable candidates to be used in clusters for significant performance energy benefits
sometimes people cannot remember the names or locations of things on their computer but they can remember what other things are associated with them we created feldspar the first system that fully supports this associative retrieval of personal information on the computer feldspar’s contributions include an intuitive user interface that allows users to find information by interactively and incrementally specifying multiple levels of associations as retrieval queries such as find the file from the person who met at an event in may and algorithms for collecting the association information and for providing answers to associative queries in real time user study showed that feldspar is easy to use and suggested that it might be faster than conventional browsing and searching for these kinds of retrieval tasks feldspar could be an important addition to search and browsing tools
in this paper we review studies of the growth of the internet and technologies that are useful for information search and retrieval on the web we present data on the internet from several different sources eg current as well as projected number of users hosts and web sites although numerical figures vary overall trends cited by the sources are consistent and point to exponential growth in the past and in the coming decade hence it is not surprising that about of internet users surveyed claim using search engines and search services to find specific information the same surveys show however that users are not satisfied with the performance of the current generation of search engines the slow retrieval speed communication delays and poor quality of retrieved results eg noise and broken links are commonly cited problems we discuss the development of new techniques targeted to resolve some of the problems associated with web based information retrieval and speculate on future trends
the increasing availability of new types of interaction platforms raises number of issues for designers and developers there is need for new methods and tools to support development of nomadic applications which can be accessed through variety of devices this paper presents solution based on the use of three levels of abstractions that allows designers to focus on the relevant logical aspects and avoid dealing with plethora of low level details we have defined number of transformations able to obtain user interfaces from such abstractions taking into account the available platforms and their interaction modalities while preserving usability the transformations are supported by an authoring tool teresa which provides designers and developers with various levels of automatic support and several possibilities for tailoring such transformations to their needs
in this paper we investigate the problem of active learning the partition of the dimensional hypercube into cubes where the th cube has color the model we are using is exact learning via color evaluation queries without equivalence queries as proposed by the work of fine and mansour we give randomized algorithm solving this problem in mlogn expected number of queries which is tight while its expected running time is logn furthermore we generalize the problem to allow partitions of the cube into monochromatic parts where each part is the union of cubes we give two randomized algorithms for the generalized problem the first uses logn expected number of queries which is almost tight with the lower bound however its naïve implementation requires an exponential running time in the second more practical algorithm achieves better running time complexity of tilde however it may fail to learn the correct partition with an arbitrarily small probability and it requires slightly more expected number of queries tilde mn where the tilde represents poly logarithmic factor in
device drivers commonly execute in the kernel to achieve high performance and easy access to kernel services however this comes at the price of decreased reliability and increased programming difficulty driver programmers are unable to use user mode development tools and must instead use cumbersome kernel tools faults in kernel drivers can cause the entire operating system to crash user mode drivers have long been seen as solution to this problem but suffer from either poor performance or new interfaces that require rewrite of existing drivers this paper introduces the microdrivers architecture that achieves high performance and compatibility by leaving critical path code in the kernel and moving the rest of the driver code to user mode process this allows data handling operations critical to performance to run at full speed while management operations such as initialization and configuration run at reduced speed in user level to achieve compatibility we present driverslicer tool that splits existing kernel drivers into kernel level component and user level component using small number of programmer annotations experiments show that as much as of driver code can be removed from the kernel without affecting common case performance and that only percent of the code requires annotations
we introduce cinematographic video production system to create movie like attractive footage from our indoor daily life since the system is designed for ordinary users in non studio environments it is composed of standard hardware components provides simple interface and works in near real time of frames sec the proposed system reconstructs visual hull from acquired multiple videos and then generates final videos from the model by referring to the camera shots used in film making the proposed method utilizes reliability to compensate for errors that may have occurred in non studio environments and to produce the most natural scene from the reconstructed model by using virtual camera control system even non experts can easily convert the model to movies that look as if they were created by experienced filmmakers
to solve consensus distributed systems have to be equipped with oracles such as failure detector leader capability or random number generator for each oracle various consensus algorithms have been devised some of these algorithms are indulgent toward their oracle in the sense that they never violate consensus safety no matter how the underlying oracle behaves this paper presents simple and generic indulgent consensus algorithm that can be instantiated with any specific oracle and be as efficient as any ad hoc consensus algorithm initially devised with that oracle in mind the key to combining genericity and efficiency is to factor out the information structure of indulgent consensus executions within new distributed abstraction which we call lambda interestingly identifying this information structure also promotes fine grained study of the inherent complexity of indulgent consensus we show that instantiations of our generic algorithm with specific oracles or combinations of them match lower bounds on oracle efficiency zero degradation and one step decision we show however that no leader or failure detector based consensus algorithm can be at the same time zero degrading and configuration efficient moreover we show that leader based consensus algorithms that are oracle efficient are inherently zero degrading but some failure detector based consensus algorithms can be both oracle efficient and configuration efficient these results highlight some of the fundamental trade offs underlying each oracle
information stored in logs of computer system is of crucial importance to gather forensic evidence of investigated actions or attacks against the system analysis of this information should be rigorous and credible hence it lends itself to formal methods we propose model checking approach to the formalization of the forensic analysis of logs the set of logs of certain system is modeled as tree whose labels are events extracted from the logs in order to provide structure to these events we express each event as term of term algebra the signature of the algebra is carefully chosen to include all relevant information necessary to conduct the analysis properties of the model are expressed as formulas of logic having dynamic linear temporal and modal characteristics moreover we provide tableau based proof system for this logic upon which model checking algorithm can be developed in order to illustrate the proposed approach the windows auditing system is studied the properties that we capture in our logic include invariant properties of system forensic hypotheses and generic or specific attack signatures moreover we discuss the admissibility of forensics hypotheses and the underlying verification issues
simultaneous multithreaded smt processors use data caches which are dynamically shared between threads depending on the processor workload sharing the data cache may harm performance due to excessive cache conflicts way to overcome this problem is to physically partition the cache between threads unfortunately partitioning the cache requires additional hardware and may lead to lower utilisation of the cache in certain workloads it is therefore important to consider software mechanisms to implicitly partition the cache between threads by controlling the locations in the cache in which each thread can load data this paper proposes standard program transformations for partitioning the shared data caches of smt processors if and only if there are conflicts between threads in the shared cache at runtime we propose transformations based on dynamic tiling the key idea is to use two tile sizes in the program one for single threaded execution mode and one suitable for multithreaded execution mode and switch between tile sizes at runtime our transformations combine dynamic tiling with either copying or storing arrays in block layout the paper presents an implementation of these transformations along with runtime mechanisms for detecting cache contention between threads and react to it on the fly our experimental results show that for regular perfect loop nests these transformations provide substantial performance improvements
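The gist of the two-tile-size idea can be sketched as follows (a toy Python analogue, not the actual compiler transformation): one tile size for single-threaded mode, a smaller one for multithreaded mode, selected at run time by a contention check that is just a stub here.

```python
import numpy as np

TILE_SINGLE, TILE_MULTI = 64, 32          # hypothetical tile sizes for the two execution modes

def cache_contention_detected():
    """Stub for the runtime check (in the real scheme, derived from cache-miss feedback)."""
    return True

def tiled_matmul(A, B):
    """Blocked matrix multiply whose tile size is chosen at run time."""
    n = A.shape[0]
    C = np.zeros((n, n))
    T = TILE_MULTI if cache_contention_detected() else TILE_SINGLE
    for ii in range(0, n, T):
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                C[ii:ii+T, jj:jj+T] += A[ii:ii+T, kk:kk+T] @ B[kk:kk+T, jj:jj+T]
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)
assert np.allclose(tiled_matmul(A, B), A @ B)   # correctness is independent of the tile size
```

The paper's transformations additionally combine this switch with copying or block-layout storage; the sketch only shows the run-time tile-size selection.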
research has investigated mappings among data sources under two perspectives on one side there are studies of practical tools for schema mapping generation these focus on algorithms to generate mappings based on visual specifications provided by users on the other side we have theoretical researches about data exchange these study how to generate solution ie target instance given set of mappings usually specified as tuple generating dependencies however despite the fact that the notion of core of data exchange solution has been formally identified as an optimal solution there are yet no mapping systems that support core computations in this paper we introduce several new algorithms that contribute to bridge the gap between the practice of mapping generation and the theory of data exchange we show how given mapping scenario it is possible to generate an executable script that computes core solutions for the corresponding data exchange problem the algorithms have been implemented and tested using common runtime engines to show that they guarantee very good performances orders of magnitudes better than those of known algorithms that compute the core as post processing step
below we present an information theoretic method for proving the amount of information leaked by programs formalized using the hol theorem prover the advantages of this approach are that the analysis is quantitative and therefore capable of expressing partial leakage and that proofs are performed using the hol theorem prover and are therefore guaranteed to be logically and mathematically consistent with the formalization the applicability of this methodology to proving privacy properties of privacy enhancing technologies is demonstrated by proving the anonymity of the dining cryptographers protocol to the best of the author’s knowledge this is the first machine verified proof of privacy of the dining cryptographers protocol for an unbounded number of participants and quantitative metric for privacy
defocus matting is fully automatic and passive method for pulling mattes from video captured with coaxial cameras that have different depths of field and planes of focus nonparametric sampling can accelerate the video matting process from minutes to seconds per frame in addition super resolution technique efficiently bridges the gap between mattes from high resolution video cameras and those from low resolution cameras off center matting pulls mattes for an external high resolution camera that doesn’t share the same center of projection as the low resolution cameras used to capture the defocus matting data
typical collaborative filtering recommenders cf do not provide any chance for users to choose or evaluate the bases for recommendation once the system evaluates group of users as being similar to target user her information is tailored by unknown people’s taste as cultural event recommender pittcult provides way for users to rate the trustworthiness of other users then according to those ratings recommendation is generated this paper explains why trust based recommendation is necessary and how studies using pittcult cope with the problems of the existing cf
this paper presents design method for fuzzy rule based systems that performs data modeling consistently according to the symbolic relations expressed by the rules the focus of the model is the interpretability of the rules and the model’s accuracy such that it can be used as tool for data understanding the number of rules is defined by the eigenstructure analysis of the similarity matrix which is computed from data the rule induction algorithm runs clustering algorithm on the dataset and associates one rule to each cluster each rule is selected among all possible combinations of one dimensional fuzzy sets as the one nearest to cluster’s center the rules are weighted in order to improve the classifier performance and the weights are computed by bounded quadratic optimization problem the model complexity is minimized in structure selection search performed by genetic algorithm that selects simultaneously the most representative subset of variables and also the number of fuzzy sets in the fuzzy partition of the selected variables the resulting model is evaluated on set of benchmark datasets for classification problems the results show that the proposed approach produces accurate and yet compact fuzzy classifiers the resulting model is also evaluated from an interpretability point of view showing how the rule weights provide additional information to help data understanding and model exploitation
this paper presents an observational study of face to face university classrooms and provides preliminary design principles for improving interactivity in webcast presentations despite the fact that participation and interaction patterns appear to depend heavily on presentation style and class size useful patterns were observed and analyzed design principles presented include the need to support rapid changes in floor control multiple types of presentation technologies and the subtleties of awareness between the audience and presenter
great majority of program paths are found to be infeasible which in turn make static analysis overly conservative as static analysis plays central part in many software engineering activities knowledge about infeasible program paths can be used to greatly improve the performance of these activities especially structural testing and coverage analysis in this paper we present an empirical approach to the problem of infeasible path detection we have discovered that many infeasible paths exhibit some common properties which are caused by four code patterns including identical complement decision mutually exclusive decision check then do and looping by flag pattern through realizing these properties from source code many infeasible paths can be precisely detected binomial tests have been conducted which give strong statistical evidences to support the validity of the empirical properties our experimental results show that even with some limitations in the current prototype tool the proposed approach accurately detects of all the infeasible paths
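A hedged sketch of how the identical/complement decision pattern could be checked purely syntactically on a single path, without any constraint solving; conditions are plain strings and the negation handling is deliberately naive.

```python
def negate(cond):
    """Syntactic negation used only for this illustration."""
    return cond[1:].strip() if cond.startswith("!") else "!" + cond

def infeasible_by_decision_correlation(path):
    """path is a list of (condition, branch_taken) pairs along one program path.
    Flag the path if the same condition (or its syntactic complement) would have
    to evaluate both to true and to false -- the identical/complement decision
    pattern described above."""
    required = {}                                # condition -> truth value it must have
    for cond, taken in path:
        base, value = (negate(cond), not taken) if cond.startswith("!") else (cond, taken)
        if base in required and required[base] != value:
            return True                          # contradictory requirements => infeasible
        required[base] = value
    return False

print(infeasible_by_decision_correlation([("x > 0", True), ("x > 0", False)]))   # True
print(infeasible_by_decision_correlation([("x > 0", True), ("!x > 0", True)]))   # True
print(infeasible_by_decision_correlation([("x > 0", True), ("y > 0", False)]))   # False
```

The other patterns named above (mutually exclusive decisions, check-then-do, looping by flag) need lightweight data-flow information in addition to this purely syntactic correlation check.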
with the aim of improving the performance of centroid text classifier we attempt to make use of the advantages of error correcting output codes ecoc strategy the framework is to decompose one multi class problem into multiple binary problems and then learn the individual binary classification problems by centroid classifier however this kind of decomposition incurs considerable bias for centroid classifier which results in noticeable degradation of performance for centroid classifier in order to address this issue we use model refinement strategy to adjust this so called bias the basic idea is to take advantage of misclassified examples in the training data to iteratively refine and adjust the centroids of text data the experimental results reveal that model refinement strategy can dramatically decrease the bias introduced by ecoc and the combined classifier is comparable to or even better than svm classifier in performance
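A minimal sketch of the combination described above: binary centroid classifiers for the ECOC columns plus a refinement loop that nudges the centroids using misclassified training examples; the update rule, learning rate and code matrix are illustrative, not the paper's exact formulation.

```python
import numpy as np

def train_binary_centroids(X, y, epochs=10, eta=0.2):
    """Centroid classifier for labels in {-1,+1}, refined by dragging the correct
    centroid toward each misclassified example and pushing the other one away."""
    c_pos = X[y == 1].mean(axis=0)
    c_neg = X[y == -1].mean(axis=0)
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if np.dot(x, c_pos) > np.dot(x, c_neg) else -1
            if pred != label:
                if label == 1:
                    c_pos += eta * x; c_neg -= eta * x
                else:
                    c_neg += eta * x; c_pos -= eta * x
    return c_pos, c_neg

def ecoc_predict(x, classifiers, code_matrix):
    """code_matrix[k] is the +/-1 codeword of class k; pick the class whose codeword
    is closest (Hamming distance) to the concatenated binary decisions."""
    bits = np.array([1 if np.dot(x, cp) > np.dot(x, cn) else -1 for cp, cn in classifiers])
    return int(np.argmin([(bits != code).sum() for code in code_matrix]))

# toy demo with three classes and one-vs-all codewords
X = np.array([[3., 0, 0], [2.5, .2, 0], [0, 3., 0], [0, 2.7, .1], [0, 0, 3.], [.1, 0, 2.8]])
y = np.array([0, 0, 1, 1, 2, 2])
code = np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
clfs = [train_binary_centroids(X, np.where(code[y][:, j] == 1, 1, -1)) for j in range(3)]
print([ecoc_predict(x, clfs, code) for x in X])      # expected: [0, 0, 1, 1, 2, 2]
```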
many applications of knowledge discovery and data mining such as rule discovery for semantic query optimization database integration and decision support require the knowledge to be consistent with the data however databases usually change over time and make machine discovered knowledge inconsistent useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database updates this paper defines this notion of robustness in the context of relational databases and describes how robustness of first order horn clause rules can be estimated experimental results show that our estimation approach can accurately identify robust rules we also present rule antecedent pruning algorithm that improves the robustness and applicability of machine discovered rules to demonstrate the usefulness of robustness estimation
there are two kinds of approaches for termination analysis of logic programs transformational and direct ones direct approaches prove termination directly on the basis of the logic program transformational approaches transform logic program into term rewrite system trs and then analyze termination of the resulting trs instead thus transformational approaches make all methods previously developed for trss available for logic programs as well however the applicability of most existing transformations is quite restricted as they can only be used for certain subclasses of logic programs most of them are restricted to well moded programs in this paper we improve these transformations such that they become applicable for any definite logic program to simulate the behavior of logic programs by trss we slightly modify the notion of rewriting by permitting infinite terms we show that our transformation results in trss which are indeed suitable for automated termination analysis in contrast to most other methods for termination of logic programs our technique is also sound for logic programming without occur check which is typically used in practice we implemented our approach in the termination prover aprove and successfully evaluated it on large collection of examples
interpreted languages have become increasingly popular due to demands for rapid program development ease of use portability and safety beyond the general impression that they are slow however little has been documented about the performance of interpreters as class of applications this paper examines interpreter performance by measuring and analyzing interpreters from both software and hardware perspectives as examples we measure the mipsi java perl and tcl interpreters running an array of micro and macro benchmarks on dec alpha platform our measurements of these interpreters relate performance to the complexity of the interpreter’s virtual machine and demonstrate that native runtime libraries can play key role in providing good performance from an architectural perspective we show that interpreter performance is primarily function of the interpreter itself and is relatively independent of the application being interpreted we also demonstrate that high level interpreters’ demands on processor resources are comparable to those of other complex compiled programs such as gcc we conclude that interpreters as class of applications do not currently motivate special hardware support for increased performance
we present framework for formally proving that the composition of the behaviors of the different parts of complex real time system ensures desired global specification of the overall system the framework is based on simple compositional rely guarantee circular inference rule plus methodology concerning the integration of the different parts into whole system the reference specification language is the trio metric linear temporal logic the novelty of our approach with respect to existing compositional frameworks most of which do not deal explicitly with real time requirements consists mainly in its generality and abstraction from any assumptions about the underlying computational model and from any semantic characterizations of the temporal logic language used in the specification moreover the framework deals equally well with continuous and discrete time it is supported by tool implemented on top of the proof checker pvs to perform deduction based verification through theorem proving of modular real time axiom systems as an example of application we show the verification of real time version of the old fashioned but still relevant benchmark of the dining philosophers problem
architectural patterns provide proven solutions to recurring design problems that arise in system context major challenge for modeling patterns in system design is effectively expressing pattern variability however modeling pattern variability in system design remains challenging task mainly because of the infinite pattern variants addressed by each architectural pattern this paper is an attempt to solve this problem by categorizing the solution participants of patterns more precisely we identify variable participants that lead to specializations within individual pattern variants and participants that appear over and over again in the solution specified by several patterns with examples and case study we demonstrate the successful applicability of this approach for designing systems using the uml extension mechanism we offer extensible architectural modeling constructs that can be used for modeling several pattern variants
conventional performance evaluation mechanisms focus on dedicated systems grid computing infrastructure on the other hand is shared collaborative environment constructed on virtual organizations each organization has its own resource management policy and usage pattern the non dedicated characteristic of grid computing prevents the leverage of conventional performance evaluation systems in this study we introduce the grid harvest service ghs performance evaluation and task scheduling system for solving large scale applications in shared environment ghs is based on novel performance prediction model and set of task scheduling algorithms ghs supports three classes of task scheduling single task parallel processing and meta task experimental results show that ghs provides satisfactory solution for performance prediction and task scheduling of large applications and has real potential
this paper presents an approach for the recommendation of items represented by different kinds of features the motivation behind our research is that often in online catalogues items to be recommended are described both by textual features and by non textual features for example books on amazoncom are described by title authors abstract but also by price and year of publication both types of features are useful to decide whether the item should be recommended to the customer we propose an approach which integrates non standard inference services and naïve bayes profiling system able to analyze the textual features of the items by advanced natural language processes and to learn semantic user profiles exploited in the recommendation process
the problem of computing craig interpolants for propositional sat formulas has recently received lot of interest mainly for its applications in formal verification however propositional logic is often not expressive enough for representing many interesting verification problems which can be more naturally addressed in the framework of satisfiability modulo theories smt although some works have addressed the topic of generating interpolants in smt the techniques and tools that are currently available have some limitations and their performance still does not exploit the full power of current state of the art smt solvers in this paper we try to close this gap we present several techniques for interpolant generation in smt which overcome the limitations of the current generators mentioned above and which take full advantage of state of the art smt technology these novel techniques can lead to substantial performance improvements wrt the currently available tools we support our claims with an extensive experimental evaluation of our implementation of the proposed techniques in the mathsat smt solver
multi database mining using local pattern analysis could be considered as an approximate method of mining multiple large databases thus it might be required to enhance the quality of knowledge synthesized from multiple databases also many decision making applications are directly based on the available local patterns in different databases the quality of synthesized knowledge decision based on local patterns in different databases could be enhanced by incorporating more local patterns in the knowledge synthesizing processing activities thus the available local patterns play crucial role in building efficient multi database mining applications we represent patterns in condensed form by employing coding called acp coding it allows us to consider more local patterns by lowering further the user inputs like minimum support and minimum confidence the proposed coding enables more local patterns participate in the knowledge synthesizing processing activities and thus the quality of synthesized knowledge based on local patterns in different databases gets enhanced significantly at given pattern synthesizing algorithm and computing resource
in free viewpoint video the viewer can interactively choose his viewpoint in space to observe the action of dynamic real world scene from arbitrary perspectives the human body and its motion plays central role in most visual media and its structure can be exploited for robust motion estimation and efficient visualization this paper describes system that uses multi view synchronized video footage of an actor’s performance to estimate motion parameters and to interactively re render the actor’s appearance from any viewpoint the actor’s silhouettes are extracted from synchronized video frames via background segmentation and then used to determine sequence of poses for human body model by employing multi view texturing during rendering time dependent changes in the body surface are reproduced in high detail the motion capture subsystem runs offline is non intrusive yields robust motion parameter estimates and can cope with broad range of motion the rendering subsystem runs at real time frame rates using ubiquitous graphics hardware yielding highly naturalistic impression of the actor the actor can be placed in virtual environments to create composite dynamic scenes free viewpoint video allows the creation of camera fly throughs or viewing the action interactively from arbitrary perspectives
recent work has shown the effectiveness of leveraging layout and tag tree structure for segmenting webpages and labeling html elements however how to effectively segment and label the text contents inside html elements is still an open problem since many text contents on webpage are often text fragments and not strictly grammatical traditional natural language processing techniques that typically expect grammatical sentences are no longer directly applicable in this paper we examine how to use layout and tag tree structure in principled way to help understand text contents on webpages we propose to segment and label the page structure and the text content of webpage in joint discriminative probabilistic model in this model semantic labels of page structure can be leveraged to help text content understanding and semantic labels of the text phrases can be used in page structure understanding tasks such as data record detection thus integration of both page structure and text content understanding leads to an integrated solution of webpage understanding experimental results on research homepage extraction show the feasibility and promise of our approach
this paper addresses the problem of parallel transposition of large out of core arrays although algorithms for out of core matrix transposition have been widely studied previously proposed algorithms have sought to minimise the number of operations and the in memory permutation time we propose an algorithm that directly targets the improvement of overall transposition time the characteristics of the system are used to determine the read write and communication block sizes such that the total execution time is minimised we also provide solution to the array redistribution problem for arrays on disk the solutions to the sequential transposition problem and the parallel array redistribution problem are then combined to obtain an algorithm for the parallel out of core transposition problem
the information technology revolution has transformed all aspects of our society including critical infrastructures and led significant shift from their old and disparate business models based on proprietary and legacy environments to more open and consolidated ones supervisory control and data acquisition scada systems have been widely used not only for industrial processes but also for some experimental facilities due to the nature of open environments managing scada systems should meet various security requirements since system administrators need to deal with large number of entities and functions involved in critical infrastructures in this paper we identify necessary access control requirements in scada systems and articulate access control policies for the simulated scada systems we also attempt to analyze and realize those requirements and policies in the context of role based access control that is suitable for simplifying administrative tasks in large scale enterprises
the loosely coupled relationships between visualization and analytical data mining dm techniques represent the majority of the current state of art in visual data mining dm modeling is typically an automatic process with very limited forms of guidance from users conceptual model of the visualization support to dm modeling process and novel interactive visual decision tree ivdt classification process have been proposed in this paper with the aim of exploring humans pattern recognition ability and domain knowledge to facilitate the knowledge discovery process an ivdt for categorical input attributes has been developed and experimented on subjects to test three hypotheses regarding its potential advantages the experimental results suggested that compared to the automatic modeling process as typically applied in current decision tree modeling tools ivdt process can improve the effectiveness of modeling in terms of producing trees with relatively high classification accuracies and small sizes enhance users understanding of the algorithm and give them greater satisfaction with the task
we propose lambda text eta calculus which is second order polymorphic call by value calculus with extensional universal types unlike product types or function types in call by value extensional universal types are genuinely right adjoint to the weakening ie equality and equality hold for not only values but all terms we give monadic style categorical semantics so that the results can be applied also to languages like haskell to demonstrate validity of the calculus we construct concrete models for the calculus in generic manner exploiting relevant parametricity on such models we can obtain reasonable class of monads consistent with extensional universal types this class admits polynomial like constructions and includes non termination exception global state input output and list non determinism
this paper studies the relationship between storage requirements and performance storage related dependences inhibit optimizations for locality and parallelism techniques such as renaming and array expansion can eliminate all storage related dependences but do so at the expense of increased storage this paper introduces the universal occupancy vector uov for loops with regular stencil of dependences the uov provides schedule independent storage reuse pattern that introduces no further dependences other than those implied by true flow dependences uov mapped code requires less storage than full array expansion and only slightly more storage than schedule dependent minimal storage we show that determining whether vector is uov is np complete however an easily constructed but possibly nonminimal uov can be used we also present branch and bound algorithm which finds the minimal uov while still maintaining legal uov at all times our experimental results show that the use of uov mapped storage coupled with tiling for locality achieves better performance than tiling after array expansion and accommodates larger problem sizes than untilable storage optimized code furthermore storage mapping based on the uov introduces negligible runtime overhead
pacer is gesture based interactive paper system that supports fine grained paper document content manipulation through the touch screen of cameraphone using the phone’s camera pacer links paper document to its digital version based on visual features it adopts camera based phone motion detection for embodied gestures eg marquees underlines and lassos with which users can flexibly select and interact with document details eg individual words symbols and pixels the touch input is incorporated to facilitate target selection at fine granularity and to address some limitations of the embodied interaction such as hand jitter and low input sampling rate this hybrid interaction is coupled with other techniques such as semi real time document tracking and loose physical digital document registration offering gesture based command system we demonstrate the use of pacer in various scenarios including work related reading maps and music score playing preliminary user study on the design has produced encouraging user feedback and suggested future research for better understanding of embodied vs touch interaction and one vs two handed interaction
future computer systems will integrate tens of multithreaded processor cores on single chip die resulting in hundreds of concurrent program threads sharing system resources these designs will be the cornerstone of improving throughput in high performance computing and server environments however to date appropriate systems software operating system run time system and compiler technologies for these emerging machines have not been adequately explored future processors will require sophisticated hardware monitoring units to continuously feed back resource utilization information to allow the operating system to make optimal thread co scheduling decisions and also to software that continuously optimizes the program itself nevertheless in order to continually and automatically adapt systems resources to program behaviors and application needs specific run time information must be collected to adequately enable dynamic code optimization and operating system scheduling generally run time optimization is limited by the time required to collect profiles the time required to perform optimization and the inherent benefits of any optimization or decisions initial techniques for effectively utilizing run time information for dynamic optimization and informed thread scheduling in future multithreaded architectures are presented
we present novel algorithm for interface generation of software components given component our algorithm uses learning techniques to compute permissive interface representing legal usage of the component unlike our previous work this algorithm does not require knowledge about the component’s environment furthermore in contrast to other related approaches our algorithm computes permissive interfaces even in the presence of non determinism in the component our algorithm is implemented in the javapathfinder model checking framework for uml statechart components we have also added support for automated assume guarantee style compositional verification in javapathfinder using component interfaces we report on the application of the approach to interface generation for flight software components
one of the flagship applications of partial evaluation is compilation and compiler generation however partial evaluation is usually expressed as source to source transformation for high level languages whereas realistic compilers produce object code we close this gap by composing partial evaluator with compiler by automatic means our work is successful application of several meta computation techniques to build the system both in theory and in practice the composition is an application of deforestation or fusion the result is run time code generation system built from existing components its applications are numerous for example it allows the language designer to perform interpreter based experiments with source to source version of the partial evaluator before building realistic compiler which generates object code automatically
context of hyperlink or link context is defined as the terms that appear in the text around hyperlink within web page link contexts have been applied to variety of web information retrieval and categorization tasks topical or focused web crawlers have special reliance on link contexts these crawlers automatically navigate the hyperlinked structure of the web while using link contexts to predict the benefit of following the corresponding hyperlinks with respect to some initiating topic or theme using topical crawlers that are guided by support vector machine we investigate the effects of various definitions of link contexts on the crawling performance we find that crawler that exploits words both in the immediate vicinity of hyperlink as well as the entire parent page performs significantly better than crawler that depends on just one of those cues also we find that crawler that uses the tag tree hierarchy within web pages provides effective coverage we analyze our results along various dimensions such as link context quality topic difficulty length of crawl training data and topic domain the study was done using multiple crawls over topics covering millions of pages allowing us to derive statistically strong results
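A hedged sketch of the best-performing cue combination reported above, namely words in a fixed window around the anchor plus the words of the entire parent page; the scoring function is a stand-in for a trained SVM decision value and all weights are hypothetical.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def link_context(page_text, anchor_text, window=10):
    """Return (words near the anchor, all parent-page words) for one hyperlink."""
    tokens = tokenize(page_text)
    anchor = tokenize(anchor_text)
    near = set()
    for i in range(len(tokens)):
        if tokens[i:i + len(anchor)] == anchor:          # locate the anchor text
            near.update(tokens[max(0, i - window): i + len(anchor) + window])
    return near, set(tokens)

def link_score(near, page, topic_weights, vicinity_boost=2.0):
    """Stand-in for an SVM decision value: per-word topic weights, with words in
    the immediate vicinity of the link counted more heavily."""
    return (vicinity_boost * sum(topic_weights.get(w, 0.0) for w in near)
            + sum(topic_weights.get(w, 0.0) for w in page))

page = "recent advances in reinforcement learning ... see the tutorial on policy gradients here"
weights = {"reinforcement": 1.2, "learning": 0.8, "policy": 1.0, "gradients": 0.9, "tutorial": 0.3}
near, page_words = link_context(page, "here")
print(link_score(near, page_words, weights))
```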
register allocation has gained renewed attention in the recent past several authors propose separation of the problem into decoupled sub tasks including spilling allocation assignment and coalescing this approach is largely motivated by recent advances in ssa based register allocation that suggest that decomposition does not significantly degrade the overall allocation quality the algorithmic challenges of intra procedural spilling have been neglected so far and very crude heuristics were employed in this work we introduce the constrained min cut cmc problem for solving the spilling problem we provide an integer linear program formulation for computing an optimal solution of cmc and we devise progressive lagrangian solver that is viable for production compilers our experiments with speck and mibench show that optimal solutions are feasible even for very large programs and that heuristics leave significant potential behind for small register files
unlike the connected sum in classical topology its digital version is shown to have some intrinsic feature in this paper we study both the digital fundamental group and the euler characteristic of connected sum of digital closed surfaces
the synthesis for dynamic flowing water is of relatively high practical value in design of virtual reality computer games digital movies and scientific computing etc on one hand the physical model cannot help people to produce the photorealistic and easily edited flowing scene on the other hand digital products can be used to show the flowing scene in the world easily this paper presents novel algorithm for synthesizing dynamic water scene based on sample video to obtain video textons we analyze the sample video automatically using dynamic textures model then we utilize linear dynamic system lds to represent the characteristic of each texton by further hard constraints we synthesize new video for dynamic water flow which is prolonged and non fuzzy in vision we provide test examples to demonstrate the effectiveness and efficiency of our proposed method
publish subscribe is common messaging paradigm used for asynchronous communication between applications synchronous publish subscribe middleware exist but are less common because they must address two main performance difficulties the first being that message dissemination involves larger delays and the second being that resources remain locked for much longer period of time hermes transaction service hts is such middleware which is capable of treating group of publications as transaction in this paper we propose design for transactional publish subscribe middleware based on hts accordingly we name the middleware tops transaction oriented publish subscribe we present the detailed functionality and architecture of tops and its differences with hts to demonstrate the advantages of the tops middleware we describe how different strategies of replication may be implemented all using the middleware proposed
the wait free hierarchy classifies multiprocessor synchronization primitives according to their power to solve consensus the classification is based on assigning number to each synchronization primitive where is the maximal number of processes for which deterministic wait free consensus can be solved using instances of the primitive and read write registers conditional synchronization primitives such as compare and swap and load linked store conditional can implement deterministic wait free consensus for any number of processes they have consensus number and are thus considered to be among the strongest synchronization primitives compare and swap and load linked store conditional have consequently become the synchronization primitives of choice and have been implemented in hardware in many multiprocessor architectures this paper shows that though they are strong in the context of consensus conditional synchronization primitives are not efficient in terms of memory space for implementing many key objects our results hold for starvation free implementations of mutual exclusion and for wait free implementations of large class of concurrent objects that we call visible roughly visible is class that includes all objects that support some operation that must perform visible write before it terminates visible includes many useful objects some examples are counters stacks queues swap fetch and add and single writer snapshot objects we show that at least conditional registers are required by any such implementation even if registers are of unbounded size we also obtain tradeoffs between time and space for process wait free implementations of any one time object in visible all these results hold for both deterministic and randomized implementations starvation free mutual exclusion and wait free implementations of some objects in visible eg counters swap and fetch and add can be implemented by non conditional primitives thus we believe that basing multiprocessor strong synchronization solely on conditional synchronization primitives might not be the best design choice
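As a side illustration of why conditional primitives sit at the top of the hierarchy (this shows only the consensus-number part, not the paper's space lower bounds), a single compare-and-swap cell solves wait-free consensus for any number of processes; the Python lock below merely emulates the atomicity of the hardware primitive.

```python
import threading

class CompareAndSwapRegister:
    """Software emulation of a hardware compare-and-swap cell."""
    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()    # stands in for the atomicity of the primitive

    def cas(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        return self._value

class Consensus:
    """Wait-free n-process consensus from one CAS cell: the first successful
    CAS fixes the decision, every other process simply reads it."""
    def __init__(self):
        self.cell = CompareAndSwapRegister(initial=None)

    def propose(self, value):
        self.cell.cas(None, value)
        return self.cell.read()

cons, decisions = Consensus(), []
threads = [threading.Thread(target=lambda v=v: decisions.append(cons.propose(v)))
           for v in ("red", "green", "blue")]
for t in threads: t.start()
for t in threads: t.join()
print(decisions)      # all three entries are the same value
```

The paper's point is the flip side of this strength: for visible objects such as counters, stacks and queues, implementations built only from such conditional primitives need many cells, so the example should not be read as an argument for CAS-only designs.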
even as modern computing systems allow the manipulation and distribution of massive amounts of information users of these systems are unable to manage the confidentiality of their data in practical fashion conventional access control security mechanisms cannot prevent the illegitimate use of privileged data once access is granted for example information provided by user during an online purchase may be covertly delivered to malicious third parties by an untrustworthy web browser existing information flow security mechanisms do provide this assurance but only for programmer specified policies enforced during program development as static analysis on special purpose type safe languages not only are these techniques not applicable to many commonly used programs but they leave the user with no defense against malicious programmers or altered binaries in this paper we propose rifle runtime information flow security system designed from the user’s perspective by addressing information flow security using architectural support rifle gives users practical way to enforce their own information flow security policy on all programs we prove that contrary to statements in the literature run time systems like rifle are no less secure than existing language based techniques using model of the architectural framework and binary translator we demonstrate rifle’s correctness and illustrate that the performance cost is reasonable
we model the multi path routing with selfish nodes as an auction and provide novel solution from the game theoretical perspective by adapting the idea of generalized second price gsp payment originating from internet advertising business and developing pertinent policies for multi hop networks we design mechanism that results in nash equilibria rather than the traditional strategyproofness which alleviates the over payment problem of the widely used vickrey clarke groves vcg payment mechanism we first provide rigorous theoretical analysis of the proposed mechanism showing the equilibrium behavior and bounds of the over payment alleviation and then evaluate the effectiveness of this protocol through extensive simulations
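A hedged sketch of the payment rule in its reverse-auction reading: the cheapest relays win and each is paid the next-highest ask, the procurement analogue of generalized second price; reserve prices, tie handling and the paper's multi-hop policies are omitted.

```python
def gsp_allocate(bids, num_winners):
    """bids: {node: asking price}. The num_winners cheapest relays win and each is
    paid the next-higher ask (generalized-second-price style), which typically
    pays out less than the VCG payments the paper compares against."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1])          # cheapest first
    winners = ranked[:num_winners]
    payments = {}
    for i, (node, ask) in enumerate(winners):
        nxt = ranked[i + 1][1] if i + 1 < len(ranked) else ask   # next-highest ask
        payments[node] = nxt
    return payments

print(gsp_allocate({"a": 3, "b": 5, "c": 4, "d": 9}, num_winners=2))   # {'a': 4, 'c': 5}
```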
the software engineering communities frequently propose new software engineering technologies such as new development techniques programming languages and tools without rigorous scientific evaluation one way to evaluate software engineering technologies is through controlled experiments where the effects of the technology can be isolated from confounding factors ie establishing cause effect relationships for practical and financial reasons however such experiments are often quite unrealistic typically involving students in class room environment solving small pen and paper tasks common criticism of the results of the experiments is their lack of external validity ie that the results are not valid outside the experimental conditions to increase the external validity of the experimental results the experiments need to be more realistic the realism can be increased using professional developers as subjects who conduct larger experimental tasks in their normal work environment however the logistics involved in running such experiments are tremendous more specifically the experimental materials eg questionnaires task descriptions code and tools must be distributed to each programmer the progress of the experiment needs to be controlled and monitored and the results of the experiment need to be collected and analyzed to support this logistics for large scale controlled experiments we have developed web based experiment support environment called sese this paper describes sese its development and the experiences from using it to conduct large controlled experiment in industry
program comprehension is very important activity during the development and the maintenance of programs this activity has been actively studied in the past decades to present software engineers with the most accurate and hopefully most useful pieces of information on the organisation algorithms executions evolution and documentation of program yet only few work tried to understand concretely how software engineers obtain and use this information software engineers mainly use sight to obtain information about program usually from source code or class diagrams therefore we use eye tracking to collect data about the use of class diagrams by software engineers during program comprehension we introduce new visualisation technique to aggregate and to present the collected data we also report the results and surprising insights gained from two case studies
java applications rely on just in time jit compilers or adaptive compilers to generate and optimize binary code at runtime to boost performance in conventional java virtual machines jvm however the binary code is typically written into the data cache and then is loaded into the instruction cache through the shared cache or memory which is not efficient in terms of both time and energy in this paper we study three hardware based code caching strategies to write and read the dynamically generated code faster and more energy efficiently our experimental results indicate that writing code directly into the instruction cache can improve the performance of variety of java applications by on average and up to also the overall energy dissipation of these java programs can be reduced by on average
many enterprises have been devoting significant portion of their budget to new product development npd in order to distinguish their products from those of their competitors and to make them better fit the needs and wants of customers hence businesses should develop products that fulfill the customer demands since this will increase the enterprise’s competitiveness and it is an essential criterion to earning higher loyalties and profits this paper presents the product map obtained from data mining results which investigates the relationships among customer demands product characteristics and transaction records using the apriori algorithm as methodology of association rules for data mining the product map shows that different knowledge patterns and rules can be extracted from customers to develop new cosmetic products and possible marketing solutions accordingly this paper suggests that the cosmetics industry should extract customer knowledge from the demand side and use this as knowledge resource on its supply chain for new product development
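A miniature, self-contained version of the Apriori step behind such a product map; the transactions, thresholds and item names are invented for illustration only.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Return {frozenset(items): support} for all itemsets meeting min_support."""
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) / n for c in level}
        kept = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(kept)
        level = {a | b for a in kept for b in kept
                 if len(a | b) == len(a) + 1}            # candidate generation
    return frequent

def rules(frequent, min_conf=0.8):
    """Yield (antecedent, consequent, confidence) for rules above min_conf."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_conf:
                    yield set(lhs), set(itemset - lhs), conf

# hypothetical transactions mixing customer demands and product characteristics
baskets = [{"sensitive skin", "fragrance free", "moisturizer"},
           {"sensitive skin", "fragrance free", "cleanser"},
           {"oily skin", "mattifying", "cleanser"},
           {"sensitive skin", "fragrance free", "moisturizer"}]
for lhs, rhs, conf in rules(apriori(baskets, 0.5), 0.8):
    print(f"{lhs} => {rhs}  (confidence {conf:.2f})")
```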
this article initiates the study of testing properties of directed graphs in particular the article considers the most basic property of directed graphs acyclicity because the choice of representation affects the choice of algorithm the two main representations of graphs are studied for the adjacency matrix representation most appropriate for dense graphs testing algorithm is developed that requires query and time complexity of where is distance parameter independent of the size of the graph the algorithm which can probe the adjacency matrix of the graph accepts every graph that is acyclic and rejects with probability at least every graph whose adjacency matrix should be modified in at least fraction of its entries so that it becomes acyclic for the incidence list representation most appropriate for sparse graphs an lower bound is proved on the number of queries and the time required for testing where is the set of vertices in the graph along with acyclicity this article considers the property of strong connectivity contrasting upper and lower bounds are proved for the incidence list representation in particular if the testing algorithm can query on both incoming and outgoing edges at each vertex then it is possible to test strong connectivity in time and query complexity on the other hand if the testing algorithm only has access to outgoing edges then queries are required to test for strong connectivity
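To make the adjacency-matrix tester concrete, here is a hedged sketch of a property tester of the kind described above: it probes only the induced subgraphs on small random vertex samples, always accepts acyclic graphs, and rejects when a sampled subgraph contains a cycle. The sample size and trial count as functions of the distance parameter are illustrative, not the paper's bounds.

```python
import random

def has_cycle(adj, verts):
    # DFS-based cycle check on the directed subgraph induced by `verts`.
    color = {v: 0 for v in verts}  # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        color[u] = 1
        for v in verts:
            if adj[u][v]:
                if color[v] == 1 or (color[v] == 0 and dfs(v)):
                    return True
        color[u] = 2
        return False
    return any(color[v] == 0 and dfs(v) for v in verts)

def test_acyclic(adj, eps, trials=20):
    """Property-testing sketch for acyclicity in the adjacency-matrix model:
    only a small portion of the matrix is probed. The sample size in terms of
    the distance parameter eps is an assumption, not the paper's bound."""
    n = len(adj)
    k = min(n, max(2, int(2 / eps)))
    for _ in range(trials):
        sample = random.sample(range(n), k)
        if has_cycle(adj, sample):
            return False          # found a witness cycle: reject
    return True                   # accept (acyclic graphs are always accepted)

dag = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
cyc = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(test_acyclic(dag, 0.1), test_acyclic(cyc, 0.1))  # True, False
```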
the conditional connectivity and the conditional fault diameter of crossed cube are studied in this work the conditional connectivity is the connectivity of an interconnection network with conditional faults where each node has at least one fault free neighbor based on this requirement the conditional connectivity of crossed cube is shown to be extending this result the conditional fault diameter of crossed cube is also shown to be cq as set of node failures this indicates that the conditional fault diameter of crossed cube is increased by three compared to the fault free diameter of crossed cube the conditional fault diameter of crossed cube is approximately half that of the hypercube in this respect the crossed cube is superior to the hypercube
we study the satisfiability problem associated with xpath in the presence of dtds this is the problem of determining given query in an xpath fragment and dtd whether or not there exists an xml document that conforms to the dtd and on which the answer of the query is nonempty we consider variety of xpath fragments widely used in practice and investigate the impact of different xpath operators on the satisfiability analysis we first study the problem for negation free xpath fragments with and without upward axes recursion and data value joins identifying which factors lead to tractability and which to np completeness we then turn to fragments with negation but without data values establishing lower and upper bounds in the absence and in the presence of upward modalities and recursion we show that with negation the complexity ranges from pspace to exptime moreover when both data values and negation are in place we find that the complexity ranges from nexptime to undecidable furthermore we give finer analysis of the problem for particular classes of dtds exploring the impact of various dtd constructs identifying tractable cases as well as providing the complexity in the query size alone finally we investigate the problem for xpath fragments with sibling axes exploring the impact of horizontal modalities on the satisfiability analysis
in this paper we investigate to which extent the elimination of class of redundant clauses in sat instances could improve the efficiency of modern satisfiability provers since testing whether sat instance does not contain any redundant clause is np complete logically incomplete but polynomial time procedure to remove redundant clauses is proposed as pre treatment of sat solvers it relies on the use of the linear time unit propagation technique and often allows for significant performance improvements of the subsequent satisfiability checking procedure for really difficult real world instances
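A minimal sketch of the kind of unit-propagation redundancy check described above: a clause is dropped when propagating the negations of its literals over the remaining clauses yields a conflict, since the remaining clauses then already entail it. The signed-integer clause encoding is an assumption for illustration.

```python
def unit_propagate(clauses, assignment):
    """Literals are nonzero ints (negative = negation). Returns False on conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return False                     # conflict derived
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0   # unit clause: force its literal
                changed = True
    return True

def remove_redundant(clauses):
    """Drop clause C when unit propagation on the rest, under the negation of
    C's literals, derives a conflict (so the rest already entails C)."""
    kept = list(clauses)
    for clause in list(kept):
        rest = [c for c in kept if c is not clause]
        assumption = {abs(l): (l < 0) for l in clause}  # falsify every literal of C
        if unit_propagate(rest, assumption) is False:
            kept = rest
    return kept

print(remove_redundant([[1], [-1, 2], [1, 2]]))  # [[1], [-1, 2]]: last clause is implied
```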
in this paper we investigate an application of feature clustering for word sense disambiguation and propose semisupervised feature clustering algorithm compared with other feature clustering methods eg supervised feature clustering it can infer the distribution of class labels over unseen features unavailable in training data labeled data by the use of the distribution of class labels over seen features available in training data thus it can deal with both seen and unseen features in feature clustering process our experimental results show that feature clustering can aggressively reduce the dimensionality of feature space while still maintaining state of the art sense disambiguation accuracy furthermore when combined with semi supervised wsd algorithm semi supervised feature clustering outperforms other dimensionality reduction techniques which indicates that using unlabeled data in learning process helps to improve the performance of feature clustering and sense disambiguation
we report on detailed study of the application and effectiveness of program analysis based on abstract interpretation to automatic program parallelization we study the case of parallelizing logic programs using the notion of strict independence we first propose and prove correct methodology for the application in the parallelization task of the information inferred by abstract interpretation using parametric domain the methodology is generic in the sense of allowing the use of different analysis domains number of well known approximation domains are then studied and the transformation into the parametric domain defined the transformation directly illustrates the relevance and applicability of each abstract domain for the application both local and global analyzers are then built using these domains and embedded in complete parallelizing compiler then the performance of the domains in this context is assessed through number of experiments comparatively wide range of aspects is studied from the resources needed by the analyzers in terms of time and memory to the actual benefits obtained from the information inferred such benefits are evaluated both in terms of the characteristics of the parallelized code and of the actual speedups obtained from it the results show that data flow analysis plays an important role in achieving efficient parallelizations and that the cost of such analysis can be reasonable even for quite sophisticated abstract domains furthermore the results also offer significant insight into the characteristics of the domains the demands of the application and the trade offs involved
large scale distributed hash tables dht are typically implemented without respect to node location or characteristics thus producing physically long routes and squandering network resources some systems have integrated round trip times through proximity aware identifier selection pis proximity aware route selection prs and proximity aware neighbor selection pns while prs and pns tend to optimize existing systems pis deterministically selects node identifiers based on physical node location leading to loss of scalability and robustness the trade off between the scalability and robustness gained from dht’s randomness and the better allocation of network resources that comes with location aware deterministically structured dht make it difficult to design system that is both robust and scalable and resource conserving we present initial ideas for the construction of small world dht which mitigates this trade off by retaining scalability and robustness while effectively integrating round trip times and additional node quality with the help of vivaldi network coordinates
fifo queues have over the years been the subject of significant research such queues are used as buffers both in variety of applications and in recent years as key tool in buffering data in high speed communication networks overall the most popular dynamic memory lock free fifo queue algorithm in the literature remains the ms queue algorithm of michael and scott unfortunately this algorithm as well as many others offers no more parallelism than that provided by allowing concurrent accesses to the head and tail in this paper we present the baskets queue new highly concurrent lock free linearizable dynamic memory fifo queue the baskets queue introduces new form of parallelism among enqueue operations that creates baskets of mixed order items instead of the standard totally ordered list the operations in different baskets can be executed in parallel surprisingly however the end result is linearizable fifo queue and in fact we show that basket queue based on the ms queue outperforms the original ms queue algorithm in various benchmarks
wifi radios in smart phones consume significant amount of power when active the standard allows these devices to save power through an energy conserving power save mode psm however depending on the psm implementation strategies used by the clients access points aps we find competing background traffic results in one or more of the following negative consequences significant increase up to in client’s energy consumption decrease in wireless network capacity due to unnecessary retransmissions and unfairness in this paper we propose napman network assisted power management for wifi devices that addresses the above issues napman leverages ap virtualization and new energy aware fair scheduling algorithm to minimize client energy consumption and unnecessary retransmissions while ensuring fairness among competing traffic napman is incrementally deployable via software updates to the ap and does not require any changes to the protocol or the mobile clients our prototype implementation improves the energy savings on smart phone by up to under varied settings of background traffic while ensuring fairness
context management systems are expected to administrate large volumes of spatial and non spatial information in geographical disperse domains in particular when these systems cover wide areas such as cities countries or even the entire planet the design of scalable storage retrieval and propagation mechanisms is paramount this paper elaborates on mechanisms that address advanced requirements including support for distributed context databases management efficient query handling innovative management of mobile physical objects and optimization strategies for distributed context data dissemination these mechanisms establish robust spatially enhanced distributed context management framework that has already been designed and carefully implemented and thoroughly evaluated
in this paper we describe middleware component supporting flexible user interaction for networked home appliances which is simple mechanism to fill the gap between traditional user interface systems and advanced user interaction devices our system enables us to control appliances in uniform way at any place and the system allows us to select suitable input and output devices according to our preferences and situations our system is based on the stateless thin client system and translates input and output interaction events according to user interaction devices we also show the effectiveness of our approach in our home computing systems
this paper takes fresh look at the problem of precise verification of heap manipulating programs using first order satisfiability modulo theories smt solvers we augment the specification logic of such solvers by introducing the logic of interpreted sets and bounded quantification for specifying properties of heap manipulating programs our logic is expressive closed under weakest preconditions and efficiently implementable on top of existing smt solvers we have created prototype implementation of our logic over the solvers simplify and and used our prototype to verify many programs our preliminary experience is encouraging the completeness and the efficiency of the decision procedure is clearly evident in practice and has greatly improved the user experience of the verifier
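The paper's logic of interpreted sets is not reproduced here; as a hedged illustration of discharging a verification condition with bounded (range-restricted) quantification on top of an existing SMT solver, the sketch below uses the z3 Python bindings on a made-up array-update step and invariant.

```python
# pip install z3-solver
from z3 import Array, IntSort, Ints, Store, Select, ForAll, Implies, And, Not, Solver, unsat

a = Array("a", IntSort(), IntSort())
i, j, v = Ints("i j v")

# Invariant before the step: every cell in [0, i) is positive (bounded quantifier).
pre = ForAll([j], Implies(And(0 <= j, j < i), Select(a, j) > 0))
# Made-up program step: a[i] = v with v > 0, then the bound advances to i + 1.
a_post = Store(a, i, v)
post = ForAll([j], Implies(And(0 <= j, j < i + 1), Select(a_post, j) > 0))

# Verification condition: pre /\ v > 0 /\ i >= 0  ==>  post.
s = Solver()
s.add(Not(Implies(And(pre, v > 0, i >= 0), post)))
print("verified" if s.check() == unsat else "not verified")
```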
discovering patterns with great significance is an important problem in data mining discipline an episode is defined to be partially ordered set of events for consecutive and fixed time intervals in sequence most of previous studies on episodes consider only frequent episodes in sequence of events called simple sequence in real world we may find set of events at each time slot in terms of various intervals hours days weeks etc we refer to such sequences as complex sequences mining frequent episodes in complex sequences has more extensive applications than that in simple sequences in this paper we discuss the problem on mining frequent episodes in complex sequence we extend previous algorithm minepi to minepi for episode mining from complex sequences furthermore memory anchored algorithm called emma is introduced for the mining task experimental evaluation on both real world and synthetic data sets shows that emma is more efficient than minepi
peer to peer networks have become very popular on the internet with millions of peers all over the world sharing large volumes of data in the assistive healthcare sector it is likely that p2p networks will develop that interconnect and allow the controlled sharing of patient databases of various hospitals clinics and research laboratories however the sheer scale of these networks has made it difficult to gather statistics that could be used for building new features in this paper we present technique to obtain estimations of the number of distinct values matching query on the network we evaluate the technique experimentally and provide set of results that demonstrate its effectiveness as well as its flexibility in supporting variety of queries and applications
application profiling the process of monitoring an application to determine the frequency of execution within specific regions is an essential step within the design process for many software and hardware systems profiling is often critical step within hardware software partitioning utilized to determine the critical kernels of an application in this paper we present non intrusive dynamic application profiler daprof capable of profiling an executing application by monitoring the application’s short backwards branches function calls function returns as well as efficiently detecting context switches to provide accurate characterization of the frequently executed loops within multitasked applications daprof can accurately profile multiple tasks within software application with accuracy using as little as additional area compared to an arm processor
one of the most difficult jobs in designing communication and multimedia chips is to design and verify complex complementary circuit pair in which circuit transforms information into format that is suitable for transmission and storage while its complementary circuit recovers this information in order to ease this job we propose novel two step approach to synthesize the complementary circuit from the original circuit fully automatically first we assume that the circuit satisfies parameterized complementary assumption which means its input can be recovered from its output under some parameter setting we check this assumption with sat solver and find out proper values of these parameters second with parameter values and the sat instance obtained in the first step we build the complementary circuit with an efficient satisfying assignments enumeration technique that is specially designed for circuits with lots of xor gates to illustrate its usefulness and efficiency we run our algorithm on several complex encoders from industrial projects including pcie and ethernet and successfully generate correct complementary circuits for them
this paper presents an analysis of feature oriented and aspect oriented modularization approaches with respect to variability management as needed in the context of system families this analysis serves two purposes on the one hand our analysis of the weaknesses of feature oriented approaches foas for short emphasizes the importance of crosscutting modularity as supported by the aspect oriented concepts of pointcut and advice on the other hand by pointing out some of aspectj’s weaknesses and by demonstrating how caesar language which combines concepts from both aspectj and foas is more effective in this context we also demonstrate the power of appropriate support for layer modules
spatio temporal databases deal with geometries changing over time in general geometries do not only change discretely but continuously hence we are dealing with moving objects in the past few moving object data models and query languages have been proposed each of them supports either historical movements or future movements but not both together consequently queries that start in the past and extend into the future cannot be supported to model both historical and future movements of an object two separate concepts with different properties are required and extra attention is necessary to avoid their conflicts furthermore current definitions of moving objects are too general and vague it is unclear how moving object is allowed to move through space and time for instance the continuity or discontinuity of motion is not specified in this paper we propose new moving object data model called balloon model which provides integrated support for both historical and future movements of moving objects as part of the model we provide formal definitions of moving objects with respect to their past and future movements all kinds of queries including past queries future queries and queries that start in the past and end in the future are supported in our model
over the past decade our group has approached interaction design from an industrial design point of view in doing so we focus on branch of design called formgiving whilst formgiving is somewhat of neologism in english many other european languages do have separate word for form related design including german gestaltung danish formgivning swedish formgivning and dutch vormgeving traditionally formgiving has been concerned with such aspects of objects as form colour texture and material in the context of interaction design we have come to see formgiving as the way in which objects appeal to our senses and motor skills in this paper we first describe our approach to interaction design of electronic products we start with how we have been first inspired and then disappointed by the gibsonian perception movement how we have come to see both appearance and actions as carriers of meaning and how we see usability and aesthetics as inextricably linked we then show number of interaction concepts for consumer electronics with both our initial thinking and what we learnt from them finally we discuss the relevance of all this for tangible interaction we argue that in addition to data centred view it is also possible to take perceptual motor centred view on tangible interaction in this view it is the rich opportunities for differentiation in appearance and action possibilities that make physical objects open up new avenues to meaning and aesthetics in interaction design
in this paper we explore simple and general approach for developing parallel algorithms that lead to good cache complexity on parallel machines with private or shared caches the approach is to design nested parallel algorithms that have low depth span critical path length and for which the natural sequential evaluation order has low cache complexity in the cache oblivious model we describe several cache oblivious algorithms with optimal work polylogarithmic depth and sequential cache complexities that match the best sequential algorithms including the first such algorithms for sorting and for sparse matrix vector multiply on matrices with good vertex separators using known mappings our results lead to low cache complexities on shared memory multiprocessors with single level of private caches or single shared cache we generalize these mappings to multi level cache hierarchies of private or shared caches implying that our algorithms also have low cache complexities on such hierarchies the key factor in obtaining these low parallel cache complexities is the low depth of the algorithms we propose
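Not one of the paper's algorithms, but a minimal illustration of the cache-oblivious divide-and-conquer style it builds on: a recursive matrix transpose whose natural sequential order touches small blocks at every scale without knowing cache parameters, and whose two recursive calls are independent and could run as nested parallel tasks. The cutoff value is an implementation convenience, not a tuning to any particular cache.

```python
def transpose(a, b, r0, c0, rows, cols, cutoff=16):
    """Cache-oblivious transpose sketch: writes b[j][i] = a[i][j] on the block
    starting at (r0, c0). Always splitting the larger dimension keeps working
    sets small at every scale without knowing the cache size; the two recursive
    calls are independent, so a nested-parallel runtime could run them in parallel."""
    if rows <= cutoff and cols <= cutoff:
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                b[j][i] = a[i][j]
    elif rows >= cols:
        half = rows // 2
        transpose(a, b, r0, c0, half, cols, cutoff)
        transpose(a, b, r0 + half, c0, rows - half, cols, cutoff)
    else:
        half = cols // 2
        transpose(a, b, r0, c0, rows, half, cutoff)
        transpose(a, b, r0, c0 + half, rows, cols - half, cutoff)

n = 100
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[0] * n for _ in range(n)]
transpose(a, b, 0, 0, n, n)
assert all(b[j][i] == a[i][j] for i in range(n) for j in range(n))
```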
canonical distributed optimization problem is solving covering packing linear program in distributed environment with fast convergence and low communication and space overheads in this paper we consider the following covering and packing problems which are the dual of each other passive commodity monitoring minimize the total cost of monitoring devices used to measure the network traffic on all paths maximum throughput multicommodity flow maximize the total value of the flow with bounded edge capacities we present the first known distributed algorithms for both of these problems that converge to approximate solutions in poly logarithmic time with communication and space overheads that depend on the maximal path length but are almost independent of the size of the entire network previous distributed solutions achieving similar approximations required convergence time communication or space overheads that depend polynomially on the size of the entire network the sequential simulation of our algorithm is more efficient than the fastest known approximation algorithms for multicommodity flows eg garg könemann when the maximal path length is small
deterministic parsing guided by treebank induced classifiers has emerged as simple and efficient alternative to more complex models for data driven parsing we present systematic comparison of memory based learning mbl and support vector machines svm for inducing classifiers for deterministic dependency parsing using data from chinese english and swedish together with variety of different feature models the comparison shows that svm gives higher accuracy for richly articulated feature models across all languages albeit with considerably longer training times the results also confirm that classifier based deterministic parsing can achieve parsing accuracy very close to the best results reported for more complex parsing models
the basic unit of testing in an object oriented program is class although there has been much recent research on testing of classes most of this work has focused on black box approaches however since black box testing techniques may not provide sufficient code coverage they should be augmented with code based or white box techniques dataflow testing is code based testing technique that uses the dataflow relations in program to guide the selection of tests existing dataflow testing techniques can be applied both to individual methods in class and to methods in class that interact through messages but these techniques do not consider the dataflow interactions that arise when users of class invoke sequences of methods in an arbitrary order we present new approach to class testing that supports dataflow testing for dataflow interactions in class for individual methods in class and methods that send messages to other methods in the class our technique is similar to existing dataflow testing techniques for methods that are accessible outside the class and can be called in any order by users of the class we compute dataflow information and use it to test possible interactions between these methods the main benefit of our approach is that it facilitates dataflow testing for an entire class by supporting dataflow testing of classes we provide opportunities to find errors in classes that may not be uncovered by black box testing our technique is also useful for determining which sequences of methods should be executed to test class even in the absence of specification finally as with other code based testing techniques large portion of our technique can be automated
the capability of olap database software systems to handle data complexity comes at high price for analysts presenting them combinatorially vast space of views of relational database we respond to the need to deploy technologies sufficient to allow users to guide themselves to areas of local structure by casting the space of views of an olap database as combinatorial object of all projections and subsets and view discovery as an search process over that lattice we equip the view lattice with statistical information theoretical measures sufficient to support combinatorial optimization process we outline hop chaining as particular view discovery algorithm over this object wherein users are guided across permutation of the dimensions by searching for successive two dimensional views pushing seen dimensions into an increasingly large background filter in spiraling search process we illustrate this work in the context of data cubes recording summary statistics for radiation portal monitors at us ports
affective and human centered computing are two areas related to hci which have attracted attention during the past years one of the reasons that this may be attributed to is the plethora of devices able to record and process multimodal input from the part of the users and adapt their functionality to their preferences or individual habits thus enhancing usability and becoming attractive to users less accustomed with conventional interfaces in the quest to receive feedback from the users in an unobtrusive manner the visual and auditory modalities allow us to infer the users emotional state combining information both from facial expression recognition and speech prosody feature extraction in this paper we describe multi cue dynamic approach in naturalistic video sequences contrary to strictly controlled recording conditions of audiovisual material the current research focuses on sequences taken from nearly real world situations recognition is performed via simple recurrent network which lends itself well to modeling dynamic events in both user’s facial expressions and speech moreover this approach differs from existing work in that it models user expressivity using dimensional representation of activation and valence instead of detecting the usual universal emotions which are scarce in everyday human machine interaction the algorithm is deployed on an audiovisual database which was recorded simulating human human discourse and therefore contains less extreme expressivity and subtle variations of number of emotion labels
in this paper we propose geocross simple yet novel event driven geographic routing protocol that removes cross links dynamically to avoid routing loops in urban vehicular ad hoc networks vanets geocross exploits the natural planar feature of urban maps without resorting to cumbersome planarization its feature of dynamic loop detection makes geocross suitable for highly mobile vanet we have shown that in pathologic cases geocross’s packet delivery ratio pdr is consistently higher than greedy perimeter stateless routing’s gpsr’s and greedy perimeter coordinator routing’s gpcr’s we have also shown that caching geocross cache provides the same high pdr but uses fewer hops
this paper presents novel computational framework based on dynamic spherical volumetric simplex splines for simulation of genus zero real world objects in this framework we first develop an accurate and efficient algorithm to reconstruct the high fidelity digital model of real world object with spherical volumetric simplex splines which can represent with accuracy geometric material and other properties of the object simultaneously with the tight coupling of lagrangian mechanics the dynamic volumetric simplex splines representing the object can accurately simulate its physical behavior because it can unify the geometric and material properties in the simulation the visualization can be directly computed from the object’s geometric or physical representation based on the dynamic spherical volumetric simplex splines during simulation without interpolation or resampling we have applied the framework for biomechanic simulation of brain deformations such as brain shifting during the surgery and brain injury under blunt impact we have compared our simulation results with the ground truth obtained through intra operative magnetic resonance imaging and the real biomechanic experiments the evaluations demonstrate the excellent performance of our new technique presented in this paper
program entities such as branches def use pairs and call sequences are used in diverse software development tasks reducing set of entities to small representative subset through subsumption saves monitoring overhead focuses the developer’s attention and provides insights into the complexity of program previous work has solved this problem for entities of the same type and only for some types in this paper we introduce novel and general approach for subsumption of entities of any type based on predicate conditions we discuss applications of this technique and address future steps
this paper describes the architecture and implementation of the alphaserver gs cache coherent non uniform memory access multiprocessor developed at compaq the alphaserver gs architecture is specifically targeted at medium scale multiprocessing with to processors each node in the design consists of four alpha processors up to gb of coherent memory and an aggressive io subsystem the current implementation supports up to such nodes for total of processors while snoopy based designs have been stretched to medium scale multiprocessors by some vendors providing sufficient snoop bandwidth remains major challenge especially in systems with aggressive processors at the same time directory protocols targeted at larger scale designs lead to number of inherent inefficiencies relative to snoopy designs key goal of the alphaserver gs architecture has been to achieve the best of both worlds partly by exploiting the bounded scale of the target systems this paper focuses on the unique design features used in the alphaserver gs to efficiently implement coherence and consistency the guiding principle for our directory based protocol is to address correctness issues related to rare protocol races without burdening the common transaction flows our protocol exhibits lower occupancy and lower message counts compared to previous designs and provides more efficient handling of hop transactions furthermore our design naturally lends itself to elegant solutions for deadlock livelock starvation and fairness the alphaserver gs architecture also incorporates couple of innovative techniques that extend previous approaches for efficiently implementing memory consistency models these techniques allow us to generate commit events which are used for ordering purposes well in advance of formulating the reply to transaction furthermore the separation of the commit event allows time critical replies to bypass inbound requests without violating ordering properties even though our design specifically targets medium scale servers many of the same techniques can be applied to larger scale directory based and smaller scale snoopy based designs finally we evaluate the performance impact of some of the above optimizations and present few competitive benchmark results
in order to provide an increasing number of functionalities and benefit from sophisticated and application tailored services from the network distributed applications are led to integrate an ever widening range of networking technologies as these applications become more complex this requirement for network heterogeneity is becoming crucial issue in their development although progress has been made in the networking community in addressing such needs through the development of network overlays we claim in this paper that the middleware community has been slow to integrate these advances into middleware architectures and hence to provide the foundational bedrock for heterogeneous distributed applications in response we propose our open overlays framework this framework which is part of wider middleware architecture accommodates overlay plug ins allows physical nodes to support multiple overlays supports the stacking of overlays to create composite protocols and adopts declarative approach to configurable deployment and dynamic reconfigurability the framework has been in development for number of years and supports an extensive range of overlay plug ins including popular protocols such as chord and pastry we report on our experiences with the open overlays framework evaluate it in detail and illustrate its application in detailed case study of network heterogeneity
we describe new type of graphical user interface widget known as tracking menu tracking menu consists of cluster of graphical buttons and as with traditional menus the cursor can be moved within the menu to select and interact with items however unlike traditional menus when the cursor hits the edge of the menu the menu moves to continue tracking the cursor thus the menu always stays under the cursor and close at hand in this paper we define the behavior of tracking menus show unique affordances of the widget present variety of examples and discuss design characteristics we examine one tracking menu design in detail reporting on usability studies and our experience integrating the technique into commercial application for the tablet pc while user interface issues on the tablet pc such as preventing round trips to tool palettes with the pen inspired tracking menus the design also works well with standard mouse and keyboard configuration
drawing from the uses and gratifications theory this study explores the influences of informativeness entertainment and interactivity on the digital multimedia broadcasting dmb user dimensions such as attitude usage and satisfaction particularly usage and satisfaction are explored as the consequences of attitude towards dmb while informativeness entertainment and interactivity are the antecedents of attitude towards dmb this relational model was tested with structural equation modelling sem approach sem results indicated that the uses and gratifications theory explains users attitudes towards dmb dmb users who perceive it as entertaining and informative generally show positive attitude towards dmb in addition the interactivity of dmb has significant relation with the users positive attitudes towards dmb finally this study found that dmb users with positive attitude towards dmb watch its content more often and feel more satisfied
prefetching has been widely used to improve system performance in mobile environments since prefetching consumes system resources such as bandwidth and power it is important to consider the system overhead when designing prefetching schemes in this paper we propose cache miss initiated prefetch cmip scheme to address this issue the cmip scheme relies on two prefetch sets the always prefetch set and the miss prefetch set the always prefetch set consists of the data that should always be prefetched if it is possible the miss prefetch set consists of the data that are closely related to the cache missed data item when cache miss happens instead of sending an uplink request to only ask for the cache missed data item the client requests several data items which are within the miss prefetch set to reduce future cache misses note that the client can ask for more than one data item by an uplink request with very little additional cost thus prefetching several data items in one uplink request can save additional uplink requests we propose novel algorithms to mine the association rules and use them to construct the prefetch sets detailed experiments are used to evaluate the performance of the proposed scheme compared to the uir scheme cao scalable low latency cache invalidation strategy for mobile environments ieee transactions on knowledge and data engineering and the uir scheme without prefetch our cmip scheme can greatly improve the system performance in terms of cache hit ratio reduced uplink requests and additional traffic
we present smooth interpretation method to systematically approximate numerical imperative programs by smooth mathematical functions this approximation facilitates the use of numerical search techniques like gradient descent for program analysis and synthesis the method extends to programs the notion of gaussian smoothing popular signal processing technique that filters out noise and discontinuities from signal by taking its convolution with gaussian function in our setting gaussian smoothing executes program according to probabilistic semantics the execution of program on an input after gaussian smoothing can be summarized as follows first apply gaussian perturbation to the input so that the perturbed input is random variable following normal distribution centered at the original input then compute and return the expected output of the program on this perturbed input computing the expectation explicitly would require the execution of the program on all possible inputs but smooth interpretation bypasses this requirement by using form of symbolic execution to approximate the effect of gaussian smoothing on the program the result is an efficient but approximate implementation of gaussian smoothing of programs smooth interpretation has the effect of attenuating features of program that impede numerical searches of its input space for example discontinuities resulting from conditional branches are replaced by continuous transitions we apply smooth interpretation to the problem of synthesizing values of numerical control parameters in embedded control applications this problem is naturally formulated as one of numerical optimization the goal is to find parameter values that minimize the error between the resulting program and programmer provided behavioral specification solving this problem by directly applying numerical optimization techniques is often impractical due to the discontinuities in the error function by eliminating these discontinuities smooth interpretation makes it possible to search the parameter space efficiently by means of simple gradient descent our experiments demonstrate the value of this strategy in synthesizing parameters for several challenging programs including models of an automated gear shift and pid controller
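The paper performs the smoothing symbolically; the sketch below is only a Monte Carlo stand-in that estimates the Gaussian-smoothed error of a tiny made-up program with a conditional branch and then runs plain gradient descent on that smoothed objective. The program, specification, sigma, step size, and sample counts are all assumptions for illustration.

```python
import random

def program(theta, x):
    # A made-up "program" with a conditional branch, hence a discontinuous output.
    return 1.0 if x > theta else 0.0

def smoothed_error(theta, sigma=0.5, samples=2000):
    """Monte Carlo estimate of the Gaussian-smoothed squared error against a
    made-up spec: the branch should fire exactly when x > 0.3."""
    total = 0.0
    for _ in range(samples):
        t = random.gauss(theta, sigma)            # Gaussian perturbation of the parameter
        x = random.uniform(-1.0, 1.0)
        spec = 1.0 if x > 0.3 else 0.0
        total += (program(t, x) - spec) ** 2
    return total / samples

# Gradient descent on the smoothed objective via finite differences;
# the raw (unsmoothed) error would have zero gradient almost everywhere.
theta, step, h = -0.8, 0.5, 0.05
for _ in range(100):
    grad = (smoothed_error(theta + h) - smoothed_error(theta - h)) / (2 * h)
    theta -= step * grad
print("synthesized parameter ~", round(theta, 2))  # should drift toward roughly 0.3
```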
high quality monte carlo image synthesis requires the ability to importance sample realistic brdf models however analytic sampling algorithms exist only for the phong model and its derivatives such as lafortune and blinn phong this paper demonstrates an importance sampling technique for wide range of brdfs including complex analytic models such as cook torrance and measured materials which are being increasingly used for realistic image synthesis our approach is based on compact factored representation of the brdf that is optimized for sampling we show that our algorithm consistently offers better efficiency than alternatives that involve fitting and sampling lafortune or blinn phong lobe and is more compact than sampling strategies based on tabulating the full brdf we are able to efficiently create images involving multiple measured and analytic brdfs under both complex direct lighting and global illumination
we propose the class of visibly pushdown languages as embeddings of context free languages that is rich enough to model program analysis questions and yet is tractable and robust like the class of regular languages in our definition the input symbol determines when the pushdown automaton can push or pop and thus the stack depth at every position we show that the resulting class vpl of languages is closed under union intersection complementation renaming concatenation and kleene and problems such as inclusion that are undecidable for context free languages are exptime complete for visibly pushdown automata our framework explains unifies and generalizes many of the decision procedures in the program analysis literature and allows algorithmic verification of recursive programs with respect to many context free properties including access control properties via stack inspection and correctness of procedures with respect to pre and post conditions we demonstrate that the class vpl is robust by giving two alternative characterizations logical characterization using the monadic second order mso theory over words augmented with binary matching predicate and correspondence to regular tree languages we also consider visibly pushdown languages of infinite words and show that the closure properties mso characterization and the characterization in terms of regular trees carry over the main difference with respect to the case of finite words turns out to be determinizability nondeterministic büchi visibly pushdown automata are strictly more expressive than deterministic muller visibly pushdown automata
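A small sketch of the defining mechanism: the partition of the input alphabet into call, return, and internal symbols alone decides when the automaton pushes or pops. The example alphabet, transitions, and the extra requirement of no pending calls at acceptance are illustrative choices, not part of the general definition.

```python
# Visibly pushdown automaton sketch: the symbol's class (call / return / internal)
# alone determines whether the stack is pushed or popped.
CALLS, RETURNS, INTERNAL = {"<"}, {">"}, {"a"}
BOTTOM = "#"

def run_vpa(word, transitions, start, accepting):
    """transitions maps (state, symbol) -> (state, pushed_symbol) for calls,
    (state, symbol, popped_symbol) -> state for returns, and
    (state, symbol) -> state for internal symbols."""
    state, stack = start, [BOTTOM]
    for c in word:
        if c in CALLS:                      # push exactly one stack symbol
            state, gamma = transitions[(state, c)]
            stack.append(gamma)
        elif c in RETURNS:                  # pop exactly one stack symbol
            gamma = stack.pop() if len(stack) > 1 else BOTTOM  # bottom is never popped
            state = transitions[(state, c, gamma)]
        else:                               # internal: stack untouched
            state = transitions[(state, c)]
    # Illustrative acceptance: final state reached and no pending calls left.
    return state in accepting and len(stack) == 1

# Example: well-nested words over "<", ">", "a" (matched calls and returns).
trans = {
    ("q", "<"): ("q", "X"),
    ("q", ">", "X"): "q",
    ("q", ">", BOTTOM): "reject",
    ("q", "a"): "q",
    ("reject", "<"): ("reject", "X"),
    ("reject", ">", "X"): "reject",
    ("reject", ">", BOTTOM): "reject",
    ("reject", "a"): "reject",
}
print(run_vpa("<a<a>>a", trans, "q", {"q"}))   # True: well nested
print(run_vpa("<a>>", trans, "q", {"q"}))      # False: unmatched return
```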
lock free objects offer significant performance and reliability advantages over conventional lock based objects however the lack of an efficient portable lock free method for the reclamation of the memory occupied by dynamic nodes removed from such objects is major obstacle to their wide use in practice this paper presents hazard pointers memory management methodology that allows memory reclamation for arbitrary reuse it is very efficient as demonstrated by our experimental results it is suitable for user level applications as well as system programs without dependence on special kernel or scheduler support it is wait free it requires only single word reads and writes for memory access in its core operations it allows reclaimed memory to be returned to the operating system in addition it offers lock free solution for the aba problem using only practical single word instructions our experimental results on multiprocessor system show that the new methodology offers equal and more often significantly better performance than other memory management methods in addition to its qualitative advantages regarding memory reclamation and independence of special hardware support we also show that lock free implementations of important object types using hazard pointers offer comparable performance to that of efficient lock based implementations under no contention and no multiprogramming and outperform them by significant margins under moderate multiprogramming and or contention in addition to guaranteeing continuous progress and availability even in the presence of thread failures and arbitrary delays
we present study of the automatic classification of web user navigation patterns and propose novel approach to classifying user navigation patterns and predicting users future requests the approach is based on the combined mining of web server logs and the contents of the retrieved web pages the textual content of web pages is captured through extraction of character n grams which are combined with web server log files to derive user navigation profiles the approach is implemented as an experimental system and its performance is evaluated based on two tasks classification and prediction the system achieves the classification accuracy of nearly and the prediction accuracy of about which is about higher than the classification accuracy by mining web server logs alone this approach may be used to facilitate better web personalization and website organization
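A hedged sketch of the feature-construction step described above: character n-grams extracted from retrieved page text are combined with fields from web server log entries into a single navigation profile. The log fields, the choice n = 3, and the sample data are assumptions, not the paper's exact setup.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams of the page text (n = 3 is an assumed setting)."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def navigation_profile(log_entries, page_text):
    """Combine web-server-log fields with content n-grams into one feature
    dictionary; the log fields used here are illustrative."""
    features = Counter()
    for entry in log_entries:                      # e.g. parsed log records
        features["url:" + entry["url"]] += 1
        features["ref:" + entry.get("referrer", "-")] += 1
    visited_text = " ".join(page_text.get(e["url"], "") for e in log_entries)
    for gram, count in char_ngrams(visited_text).items():
        features["gram:" + gram] += count
    return features

log = [{"url": "/books/sci-fi", "referrer": "/home"}, {"url": "/cart"}]
pages = {"/books/sci-fi": "Science fiction best sellers", "/cart": "Your shopping cart"}
print(navigation_profile(log, pages).most_common(5))
```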
we present an approach for deriving syntactic word clusters from parsed text grouping words according to their unlexicalized syntactic contexts we then explore the use of these syntactic clusters in leveraging large corpus of trees generated by high accuracy parser to improve the accuracy of another parser based on different formalism for representing different level of sentence structure in our experiments we use phrase structure trees to produce syntactic word clusters that are used by predicate argument dependency parser significantly improving its accuracy
online transactions eg buying book on the web typically involve number of steps spanning several pages conducting such transactions under constrained interaction modalities as exemplified by small screen handhelds or interactive speech interfaces the primary mode of communication for visually impaired individuals is strenuous fatigue inducing activity but usually one needs to browse only small fragment of web page to perform transactional step such as form fillout selecting an item from search results list etc we exploit this observation to develop an automata based process model that delivers only the relevant page fragments at each transactional step thereby reducing information overload on such narrow interaction bandwidths we realize this model by coupling techniques from content analysis of web documents automata learning and statistical classification the process model and associated techniques have been incorporated into guide prototype system that facilitates online transactions using speech keyboard interface guide speech or with limited display size handhelds guide mobile performance of guide and its user experience are reported
serialization of threads due to critical sections is fundamental bottleneck to achieving high performance in multithreaded programs dynamically such serialization may be unnecessary because these critical sections could have safely executed concurrently without locks current processors cannot fully exploit such parallelism because they do not have mechanisms to dynamically detect such false inter thread dependences we propose speculative lock elision sle novel micro architectural technique to remove dynamically unnecessary lock induced serialization and enable highly concurrent multithreaded execution the key insight is that locks do not always have to be acquired for correct execution synchronization instructions are predicted as being unnecessary and elided this allows multiple threads to concurrently execute critical sections protected by the same lock misspeculation due to inter thread data conflicts is detected using existing cache mechanisms and rollback is used for recovery successful speculative elision is validated and committed without acquiring the locks sle can be implemented entirely in microarchitecture without instruction set support and without system level modifications is transparent to programmers and requires only trivial additional hardware support sle can provide programmers fast path to writing correct high performance multithreaded programs
the elicitation modeling and analysis of requirements have consistently been one of the main challenges during the development of complex systems telecommunication systems belong to this category of systems due to the worldwide distribution and the heterogeneity of today’s telecommunication networks scenarios and use cases have become popular for capturing and analyzing requirements however little research has been done that compares different approaches and assesses their suitability for the telecommunications domain this paper defines evaluation criteria and then reviews fifteen scenario notations in addition twenty six approaches for the construction of design models from scenarios are briefly compared
in this paper we present new form of revocable lock that streamlines the construction of higher level concurrency abstractions such as atomic multi word heap updates the key idea is to expose revocation by displacing the previous lock holder’s execution to safe address this provides mutual exclusion without needing to block threads this brings many simplifications often removing the need for dynamic memory management and letting us strip operations from common case execution paths as well as streamlining algorithms design our results show that the technique leads to improved performance and scalability across range of levels of contention
most compiler optimizations and software productivity tools rely on information about the effects of pointer dereferences in program the purpose of points to analysis is to compute this information safely and as accurately as is practical unfortunately accurate points to information is difficult to obtain for large programs because the time and space requirements of the analysis become prohibitive we consider the problem of scaling flow and context insensitive points to analysis to large programs perhaps containing hundreds of thousands of lines of code our approach is based on variable substitution transformation which is performed off line ie before standard points to analysis is performed the general idea of variable substitution is that set of variables in program can be replaced by single representative variable thereby reducing the input size of the problem our main contribution is linear time algorithm which finds particular variable substitution that maintains the precision of the standard analysis and is also very effective in reducing the size of the problem we report our experience in performing points to analysis on large programs including some industrial sized ones experiments show that our algorithm can reduce the cost of andersen’s points to analysis substantially on average it reduced the running time by and the memory cost by relative to an efficient baseline implementation of the analysis
standard approaches to stereo correspondence have difficulty when scene structure does not lie in or near the frontal parallel plane in part because an orientation disparity as well as positional disparity is introduced we propose correspondence algorithm based on differential geometry that takes explicit advantage of both disparities the algorithm relates the differential structure position tangent and curvature of curves in the left and right images to the frenet approximation of the space curve compatibility function is defined via transport of the frenet frames and they are matched by relaxing this compatibility function on overlapping neighborhoods along the curve the remaining false matches are concurrently eliminated by model of near and far neurons derived from neurobiology examples on scenes with complex structures are provided
concurrency libraries can facilitate the development of multi threaded programs by providing concurrent implementations of familiar data types such as queues or sets there exist many optimized algorithms that can achieve superior performance on multiprocessors by allowing concurrent data accesses without using locks unfortunately such algorithms can harbor subtle concurrency bugs moreover they require memory ordering fences to function correctly on relaxed memory models to address these difficulties we propose verification approach that can exhaustively check all concurrent executions of given test program on relaxed memory model and can verify that they are observationally equivalent to sequential execution our checkfence prototype automatically translates the implementation code and the test program into sat formula hands the latter to standard sat solver and constructs counter example traces if there exist incorrect executions applying checkfence to five previously published algorithms we were able to find several bugs some not previously known and determine how to place memory ordering fences for relaxed memory models
rapidly emerging it underpins decision support systems but research has yet to explain the it management challenges of such rapid change this study sought to understand the problems of rapid it change and the interrelationships between them it used structured interviews with it professionals and useable responses from survey of such professionals after identifying five problem categories it proposed and tested theory that vendor competitiveness leads to poor quality incompatibility and management confusion and these increase training demands besides the theory the research contributes survey instrument and focal points to help it managers better provide the infrastructure for dss
as sensor networks operate over long periods of deployment in difficult to reach places their requirements may change or new code may need to be uploaded to them the current state of the art protocols deluge and mnp for network reprogramming perform the code dissemination in multihop manner using three way handshake where metadata is exchanged prior to code exchange to suppress redundant transmissions the code image is also pipelined through the network at the granularity of pages in this article we propose protocol called freshet for optimizing the energy for code upload and speeding up the dissemination if multiple sources of code are available the energy optimization is achieved by equipping each node with limited nonlocal topology information which it uses to determine the time when it can go to sleep since code is not being distributed in its vicinity the protocol to handle multiple sources provides loose coupling of nodes to source and disseminates code in waves each originating at source with mechanism to handle collisions when the waves meet the protocol’s performance with respect to reliability delay and energy consumed is demonstrated through analysis simulation and implementation on the berkeley mote platform
we use features extracted from sources such as versioning and issue tracking systems to predict defects in short time frames of two months our multivariate approach covers aspects of software projects such as size team structure process orientation complexity of existing solution difficulty of problem coupling aspects time constraints and testing data we investigate the predictability of several severities of defects in software projects are defects with high severity difficult to predict are prediction models for defects that are discovered by internal staff similar to models for defects reported from the field we present both an exact numerical prediction of future defect numbers based on regression models as well as classification of software components as defect prone based on the decision tree we create models to accurately predict short term defects in study of applications composed of more than classes and lines of code the model quality is assessed based on fold cross validation
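A minimal sketch of the two model types mentioned above, a regression model for defect counts and a decision tree for defect-prone classification, each evaluated with 10-fold cross-validation via scikit-learn; the features and data are synthetic placeholders, not the study's project data.

```python
# pip install scikit-learn numpy
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-component features (size, churn, coupling, ...).
X = rng.random((200, 6))
defect_counts = (X @ np.array([3, 0, 5, 1, 0, 2]) + rng.normal(0, 0.5, 200)).round()
defect_prone = (defect_counts > np.median(defect_counts)).astype(int)

# Exact numerical prediction of defect counts (regression model), 10-fold CV.
reg_scores = cross_val_score(LinearRegression(), X, defect_counts, cv=10, scoring="r2")
# Classification of components as defect-prone (decision tree), 10-fold CV.
clf_scores = cross_val_score(DecisionTreeClassifier(max_depth=4, random_state=0),
                             X, defect_prone, cv=10, scoring="accuracy")
print("regression R^2:", reg_scores.mean().round(2))
print("decision-tree accuracy:", clf_scores.mean().round(2))
```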
power consumption has emerged as key design concern across the entire computing range from low end embedded systems to high end supercomputers understanding the power characteristics of microprocessor under design requires careful study using variety of workloads these workloads range from benchmarks that represent typical behavior up to hand tuned stress benchmarks so called stressmarks that stress the microprocessor to its extreme power consumption this paper closes the gap between these two extremes by studying techniques for the automated identification of stress patterns worst case application behaviors in typical workloads for doing so we borrow from sampled simulation theory and we provide two key insights first although representative sampling is slightly less effective in characterizing average behavior than statistical sampling it is substantially more effective in finding stress patterns second we find that threshold clustering is better alternative than k means clustering which is typically used in representative sampling for finding stress patterns overall we can identify extreme energy and power behaviors in microprocessor workloads with three orders of magnitude speedup with an error of few percent on average
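As a hedged illustration of the threshold-clustering alternative mentioned above, here is a generic single-pass (leader-style) threshold clustering sketch; the interval features, distance metric, and threshold are assumptions, and this is not claimed to be the paper's exact procedure.

```python
import math

def threshold_cluster(points, threshold):
    """Single-pass threshold ("leader") clustering: a point joins the first
    cluster whose representative is within `threshold`, otherwise it starts a
    new cluster. Outlying points (candidate stress patterns) end up in their
    own small clusters instead of being averaged away."""
    representatives, clusters = [], []
    for p in points:
        for idx, rep in enumerate(representatives):
            if math.dist(p, rep) <= threshold:
                clusters[idx].append(p)
                break
        else:
            representatives.append(p)
            clusters.append([p])
    return clusters

# Interval-level feature vectors, e.g. (IPC, cache-miss rate); values made up.
samples = [(1.1, 0.02), (1.0, 0.03), (1.2, 0.02), (0.3, 0.40), (1.05, 0.025)]
for cluster in threshold_cluster(samples, threshold=0.2):
    print(cluster)   # the (0.3, 0.40) outlier forms its own singleton cluster
```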
camera view invariant object retrieval is an important issue in many traditional and emerging applications such as security surveillance computer aided design cad virtual reality and place recognition one straightforward method for camera view invariant object retrieval is to consider all the possible camera views of objects however capturing and maintaining such views require an enormous amount of time and labor in addition all camera views should be indexed for reasonable retrieval performance which requires extra storage space and maintenance overhead in the case of shape based object retrieval such overhead could be relieved by considering the symmetric shape feature of most objects in this paper we propose new shape based indexing and matching scheme of real or rendered objects for camera view invariant object retrieval in particular in order to remove redundant camera views to be indexed we propose camera view skimming scheme which includes mirror shape pairing and ii camera view pruning according to the symmetrical patterns of object shapes since our camera view skimming scheme considerably reduces the number of camera views to be indexed it could relieve the storage requirement and improve the matching speed without sacrificing retrieval accuracy through various experiments we show that our proposed scheme can achieve excellent performance
stack inspection is mechanism for programming secure applications in the presence of code from various protection domains run time checks of the call stack allow method to obtain information about the code that directly or indirectly invoked it in order to make access control decisions this mechanism is part of the security architecture of java and the net common language runtime central problem with stack inspection is to determine to what extent the local checks inserted into the code are sufficient to guarantee that global security property is enforced further problem is how such verification can be carried out in an incremental fashion incremental analysis is important for avoiding re analysis of library code every time it is used and permits the library developer to reason about the code without knowing its context of deployment we propose technique for inferring interfaces for stack inspecting libraries in the form of secure calling context for methods by secure calling context we mean pre condition on the call stack sufficient for guaranteeing that execution of the method will not violate given global property the technique is constraint based static program analysis implemented via fixed point iteration over an abstract domain of linear temporal logic properties
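A toy model of the stack-inspection mechanism itself, not of the paper's interface-inference analysis: each frame carries a protection domain, and a check walks the stack until it either finds an unauthorized domain or reaches a frame that asserted privilege. The domains, permissions, and privileged flag are made up for illustration.

```python
# Toy stack inspection: each call-stack frame carries the protection domain of
# the code it executes; a permission check walks the stack from the most recent
# frame and fails as soon as it meets a domain lacking the permission, stopping
# early at a frame that explicitly asserted privilege (doPrivileged-style).
GRANTS = {
    "system":      {"read_file", "write_file"},
    "trusted_lib": {"read_file"},
    "applet":      set(),
}

def check_permission(stack, permission):
    for domain, privileged in reversed(stack):    # most recent frame first
        if permission not in GRANTS[domain]:
            raise PermissionError(f"{domain} lacks {permission}")
        if privileged:                            # privilege assertion cuts off the walk
            return
    return

# applet -> trusted_lib -> system: denied, the applet is still on the stack.
try:
    check_permission([("applet", False), ("trusted_lib", False), ("system", False)], "read_file")
except PermissionError as e:
    print("denied:", e)

# trusted_lib asserts privilege, so the applet below it is not consulted.
check_permission([("applet", False), ("trusted_lib", True), ("system", False)], "read_file")
print("granted")
```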
motivated by the success of large margin methods in supervised learning maximum margin clustering mmc is recent approach that aims at extending large margin methods to unsupervised learning however its optimization problem is nonconvex and existing mmc methods all rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programs sdp though sdp is convex and standard solvers are available they are computationally very expensive and only small data sets can be handled to make mmc more practical we avoid sdp relaxations and propose in this paper an efficient approach that performs alternating optimization directly on the original nonconvex problem key step to avoid premature convergence in the resultant iterative procedure is to change the loss function from the hinge loss to the laplacian square loss so that overconfident predictions are penalized experiments on number of synthetic and real world data sets demonstrate that the proposed approach is more accurate much faster hundreds to tens of thousands of times faster and can handle data sets that are hundreds of times larger than the largest data set reported in the mmc literature
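the following sketch illustrates the kind of alternating optimization described above for two clusters with square loss; the linear model, ridge style regularization and the median thresholding used to keep the clusters balanced are assumptions made for illustration, not the authors' exact formulation

```python
# minimal sketch of alternating optimization for two-cluster maximum margin
# clustering with a square loss; linear model and balance handling are
# illustrative assumptions
import numpy as np

def mmc_alternate(X, n_iter=50, lam=1.0):
    """Alternate between fitting a linear scorer with a square loss (labels
    fixed) and relabeling points by thresholding the scores (model fixed)."""
    n, d = X.shape
    y = np.where(X[:, 0] >= np.median(X[:, 0]), 1.0, -1.0)   # crude init
    w = np.zeros(d)
    for _ in range(n_iter):
        # labels fixed: regularized least squares has a closed-form solution
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        f = X @ w
        # model fixed: threshold at the median so both clusters keep roughly
        # half the points, ruling out the trivial all-one labeling
        y_new = np.where(f >= np.median(f), 1.0, -1.0)
        if np.array_equal(y_new, y):
            break
        y = y_new
    return y, w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    labels, w = mmc_alternate(X)
    print("cluster sizes:", int((labels > 0).sum()), int((labels < 0).sum()))
```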
scenarios are increasingly recognized as an effective means for eliciting validating and documenting software requirements this paper concentrates on the use of scenarios for requirements elicitation and explores the process of inferring formal specifications of goals and requirements from scenario descriptions scenarios are considered here as typical examples of system usage they are provided in terms of sequences of interaction steps between the intended software and its environment such scenarios are in general partial procedural and leave required properties about the intended system implicit in the end such properties need to be stated in explicit declarative terms for consistency completeness analysis to be carried out a formal method is proposed for supporting the process of inferring specifications of system goals and requirements inductively from interaction scenarios provided by stakeholders the method is based on learning algorithm that takes scenarios as examples counterexamples and generates set of goal specifications in temporal logic that covers all positive scenarios while excluding all negative ones the output language in which goals and requirements are specified is the kaos goal based specification language the paper also discusses how the scenario based inference of goal specifications is integrated in the kaos methodology for goal based requirements engineering in particular the benefits of inferring declarative specifications of goals from operational scenarios are demonstrated by examples of formal analysis at the goal level including conflict analysis obstacle analysis the inference of higher level goals and the derivation of alternative scenarios that better achieve the underlying goals
we present some decidability and undecidability results for subsets of the blenx language process calculi based programming language developed for modelling biological processes we show that for core subset of the language which considers only communication primitives termination is decidable moreover we prove that by adding either global priorities or events to this core language we obtain turing equivalent languages the proof is through encodings of random access machines rams well known turing equivalent formalism into our subsets of blenx all the encodings are shown to be correct
businesses and their supporting software evolve to accommodate the constant revision and re negotiation of commercial goals and to intercept the potential of new technology we have adopted the term co evolution to describe the concept of the business and the software evolving sympathetically but at potentially different rates more generally we extend co evolution to accommodate wide informatics systems that are assembled from parts that co evolve with each other and their environment and whose behavior is potentially emergent typically these are long lived systems in which dynamic co evolution whereby system evolves as part of its own execution in reaction to both expected and unexpected events is the only feasible option for change examples of such systems include continuously running business process models sensor nets grid applications self adapting tuning systems peer to peer routing systems control systems autonomic systems and pervasive computing applications the contribution of this paper comprises study of the intrinsic nature of dynamic co evolving systems the derivation of set of intrinsic requirements description of model and set of technologies new and extant to meet these intrinsic requirements and illustrations of how these technologies may be implemented within an architecture description language archware adl and conventional programming language java the model and technologies address three topics structuring for dynamic co evolution incremental design and adapting dynamic co evolving systems the combination yields framework that can describe the system’s specification the executing software and the reflective evolutionary mechanisms within single computational domain in which all three may evolve in tandem
many machine learning technologies such as support vector machines boosting and neural networks have been applied to the ranking problem in information retrieval however since originally the methods were not developed for this task their loss functions do not directly link to the criteria used in the evaluation of ranking specifically the loss functions are defined on the level of documents or document pairs in contrast to the fact that the evaluation criteria are defined on the level of queries therefore minimizing the loss functions does not necessarily imply enhancing ranking performances to solve this problem we propose using query level loss functions in learning of ranking functions we discuss the basic properties that query level loss function should have and propose query level loss function based on the cosine similarity between ranking list and the corresponding ground truth we further design coordinate descent algorithm referred to as rankcosine which utilizes the proposed loss function to create generalized additive ranking model we also discuss whether the loss functions of existing ranking algorithms can be extended to query level experimental results on the datasets of trec web track ohsumed and commercial web search engine show that with the use of the proposed query level loss function we can significantly improve ranking accuracies furthermore we found that it is difficult to extend the document level loss functions to query level loss functions
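a minimal sketch of such query level loss is shown below: each query contributes one term based on the cosine similarity between its predicted scores and its ground truth relevance, so queries with many documents do not dominate the objective; the linear scoring model and toy data are assumptions for illustration

```python
# minimal sketch of a query-level loss based on cosine similarity between the
# predicted ranking scores and the ground-truth relevance of one query
import numpy as np

def query_cosine_loss(scores, relevance, eps=1e-12):
    """1 - cos(scores, relevance): zero when the predicted scores point in the
    same direction as the ground truth for this query."""
    num = float(np.dot(scores, relevance))
    den = np.linalg.norm(scores) * np.linalg.norm(relevance) + eps
    return 1.0 - num / den

def total_loss(weight, queries):
    # one term per query, so queries with many documents do not dominate
    return sum(query_cosine_loss(X @ weight, y) for X, y in queries)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    queries = [(rng.normal(size=(5, 3)), rng.random(5)) for _ in range(4)]
    print(total_loss(rng.normal(size=3), queries))
```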
the semijoin has been used as an effective operator in reducing data transmission and processing over network that allows forward size reduction of relations and intermediate results generated during the processing of distributed query the authors propose relational operator two way semijoin which enhances the semijoin with backward size reduction capability for more cost effective query processing pipelined n way join algorithm for joining the reduced relations residing on n sites is introduced the main advantage of this algorithm is that it eliminates the need for transferring and storing intermediate results among the sites set of experiments showing that the proposed algorithm outperforms all known conventional join algorithms that generate intermediate results is included
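the sketch below shows the basic idea of the forward and backward reduction on two relations held at different sites; the in memory representation and column names are illustrative assumptions, and real implementations would ship only join column values between sites

```python
# minimal sketch of a two-way semijoin on two relations represented as lists
# of dicts; column names and data are illustrative only
def two_way_semijoin(r, s, col):
    # forward: ship join values of r to the site of s and keep only matching s
    r_vals = {t[col] for t in r}
    s_reduced = [t for t in s if t[col] in r_vals]
    # backward: ship back the values that actually matched, reducing r as well
    matched = {t[col] for t in s_reduced}
    r_reduced = [t for t in r if t[col] in matched]
    return r_reduced, s_reduced

if __name__ == "__main__":
    r = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
    s = [{"id": 2, "b": 10}, {"id": 3, "b": 20}, {"id": 4, "b": 30}]
    print(two_way_semijoin(r, s, "id"))
```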
it has long been realized that social network of scientific collaborations provides window on patterns of collaboration within the academic community investigations and studies about static and dynamic properties of co authorship network have also been done in the recent years however the emphasis of most of the research is on the analysis about the macroscopic structure of the whole network or community over time such as distance diameter shrinking and densification phenomenon and microscopic formation analysis of links groups or communities over time but in fact how an individual or community grows over time may not only provide new view point to mine copious and valuable information about scientific networks but also reveal important factors that influence the growth process in this paper from temporal and microscopic analytical perspective we propose method to trace scientific individual’s and community’s growth process based on community’s evolution path in combination with quantifiable measurements during the process of tracing we find that the lifespan of community is related to the ability of altering its membership moreover and complementary to this we find that the lifespan of community is also related to the ability of maintaining its core members meaning that community may last for longer lifespan if its core members are much more stable meanwhile we also trace the growth process of research individuals based on the evolution of communities
current paper based interfaces such as papiercraft provide very little feedback and this limits the scope of possible interactions so far there has been little systematic exploration of the structure constraints and contingencies of feedback mechanisms in paper based interaction systems for paper only environments we identify three levels of feedback discovery feedback eg to aid with menu learning status indication feedback eg for error detection and task feedback eg to aid in search task using three modalities visual tactile and auditory which can be easily implemented on pen sized computer we introduce conceptual matrix to guide systematic research on pen top feedback for paper based interfaces using this matrix we implemented multimodal pen prototype demonstrating the potential of our approach we conducted an experiment that confirmed the efficacy of our design in helping users discover new interface and identify and correct their errors
we present new access control mechanism for p2p networks with distributed enforcement called p2p access control system pacs pacs enforces powerful access control models like rbac with administrative delegation inside p2p network in pure p2p manner which is not possible in any of the currently used p2p access control mechanisms pacs uses client side enforcement to support the replication of confidential data to avoid single point of failure at the time of privilege enforcement we use threshold cryptography to distribute the enforcement among the participants our analysis of the expected number of messages and the computational effort needed in pacs shows that its increased flexibility comes with an acceptable additional overhead
detection based semantics does not differentiate between event detection and event occurrence and has been used for detecting events in most of the active systems that support event condition action rules however this is limitation for many applications that require interval based semantics in this article we formalize the detection of snoop an event specification language event operators using interval based semantics termed snoopib in various event consumption modes we show how events are detected using event detection graphs and present few representative algorithms to detect snoopib operators and comment on their implementation in the context of sentinel an active object oriented dbms
current systems that publish relational data as nested xml views are passive in the sense that they can only respond to user initiated queries over the nested views in this article we propose an active system whereby users can place triggers on unmaterialized nested views of relational data in this architecture we present scalable and efficient techniques for processing triggers over nested views by leveraging existing support for sql triggers over flat relations in commercial relational databases we have implemented our proposed techniques in the context of the quark xml middleware system our performance results indicate that our proposed techniques are feasible approach to supporting triggers over nested views of relational data
we propose temporal dependency called trend dependency td which captures significant family of data evolution regularities an example of such regularity is salaries of employees generally do not decrease tds compare attributes over time using operators such as ≤ ≥ and ≠ we define satisfiability problem that is the dual of the logical implication problem for tds and we investigate the computational complexity of both problems as tds allow expressing meaningful trends mining them from existing databases is interesting for the purpose of td mining td satisfaction is characterized by support and confidence measures we study the problem tdmine given temporal database mine the tds that conform to given template and whose support and confidence exceed certain threshold values the complexity of tdmine is studied as well as algorithms to solve the problem
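as illustration, the sketch below measures support and confidence of one trend dependency of the form id(=) implies salary(≤) over two snapshots of toy employee relation; the definitions of support and confidence used here are simplified stand ins for the formal ones

```python
# minimal sketch of support/confidence for the trend dependency
# "salaries of employees generally do not decrease" over two snapshots
def td_support_confidence(rel_t1, rel_t2):
    """rel_t1, rel_t2: lists of (emp_id, salary) tuples at two time points.
    TD checked: same emp_id at t1 and t2  =>  salary at t1 <= salary at t2."""
    total = len(rel_t1) * len(rel_t2)                       # all tuple pairs
    matched = [(s1, s2) for e1, s1 in rel_t1 for e2, s2 in rel_t2 if e1 == e2]
    satisfied = sum(1 for s1, s2 in matched if s1 <= s2)
    support = satisfied / total if total else 0.0
    confidence = satisfied / len(matched) if matched else 0.0
    return support, confidence

if __name__ == "__main__":
    t1 = [("e1", 50), ("e2", 60), ("e3", 70)]
    t2 = [("e1", 55), ("e2", 58), ("e3", 75), ("e4", 40)]
    print(td_support_confidence(t1, t2))   # confidence 2/3: e2's salary dropped
```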
memory devices often consume more energy than microprocessors in current portable embedded systems but their energy consumption changes significantly with the type of transaction data values and access timing as well as depending on the total number of transactions these variabilities mean that an innovative tool and framework are required to characterize modern memory devices running in embedded system architectures we introduce an energy measurement and characterization platform for memory devices and demonstrate an application to multilevel cell mlc flash memories in which we discover significant value dependent programming energy variations we introduce an energy aware data compression method that minimizes the flash programming energy rather than the size of the compressed data which is formulated as an entropy coding with unequal bit pattern costs deploying probabilistic approach we derive energy optimal bit pattern probabilities and expected values of the bit pattern costs which are applicable to the large amounts of compressed data typically found in multimedia applications then we develop an energy optimal prefix coding that uses integer linear programming and construct prefix code table from consideration of pareto optimal energy consumption we can make tradeoffs between data size and programming energy such as percent energy savings for percent area overhead
emerging high rate applications imaging structural monitoring acoustic localization will need to transport large volumes of data concurrently from several sensors these applications are also loss intolerant key requirement for such applications then is protocol that reliably transports sensor data from many sources to one or more sinks without incurring congestion collapse in this paper we discuss rcrt rate controlled reliable transport protocol suitable for constrained sensor nodes rcrt uses end to end explicit loss recovery but places all the congestion detection and rate adaptation functionality in the sinks this has two important advantages efficiency and flexibility because sinks make rate allocation decisions they are able to achieve greater efficiency since they have more comprehensive view of network behavior for the same reason it is possible to alter the rate allocation decisions for example from one that ensures that all nodes get the same rate to one that ensures that nodes get rates in proportion to their demands without modifying sensor code at all we evaluate rcrt extensively on wireless sensor network testbed and show that rcrt achieves more than twice the rate achieved by recently proposed interference aware distributed rate control protocol ifrc
exact match queries wildcard match queries and mismatch queries are widely used in various molecular biology applications including the searching of ests expressed sequence tags and dna transcription factors in this paper we suggest an efficient indexing and processing mechanism for such queries our indexing method places sliding window at every possible location of dna sequence and extracts its signature by considering the occurrence frequency of each nucleotide it then stores set of signatures using multi dimensional index such as the tree also by assigning weight to each position of window it prevents signatures from being concentrated around few spots in indexing space our query processing method converts query sequence into multi dimensional rectangle and searches the index for the signatures overlapping with the rectangle experiments with real biological data sets have revealed that the proposed approach is many times faster than the previous one in performing exact match and wildcard match queries and several orders of magnitude faster in performing mismatch queries
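the sketch below shows how sliding windows of dna sequence could be turned into weighted nucleotide frequency signatures; the window length, the position dependent weights and the use of plain list instead of multidimensional index are illustrative assumptions

```python
# minimal sketch of turning a DNA sequence into windowed frequency signatures;
# window length and weights are illustrative assumptions
def window_signature(window, weights):
    """Weighted occurrence frequencies of A, C, G, T inside one window."""
    sig = {base: 0.0 for base in "ACGT"}
    for pos, base in enumerate(window):
        sig[base] += weights[pos]
    total = sum(weights)
    return tuple(sig[b] / total for b in "ACGT")

def build_signatures(sequence, w=8):
    weights = [1.0 + 0.1 * p for p in range(w)]   # position-dependent weights
    return [(i, window_signature(sequence[i:i + w], weights))
            for i in range(len(sequence) - w + 1)]

if __name__ == "__main__":
    # at query time the query would become a rectangle in this signature space
    # and overlapping signatures would be fetched as candidates
    sigs = build_signatures("ACGTACGTTTGACCA")
    print(sigs[0])
```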
in this paper we present middleware architecture for coordination services in sensor networks that facilitates interaction between groups of sensors which monitor different environmental events it sits on top of the native routing infrastructure and exports the abstraction of mobile communication endpoints maintained at the locations of such events single logical destination is created and maintained for every environmental event of interest such destinations are uniquely labeled and can be used for communication by application level algorithms for coordination and sensory data management between the different event locales for example they may facilitate coordination in distributed intrusion scenario among nodes in the vicinity of the intruders we evaluate our middleware architecture using glomosim wireless network simulator our results illustrate the success of our architecture in maintaining event related communication endpoints we provide an analysis of how architectural and network dependent parameters affect our performance additionally we provide proof of concept implementation on real sensor network testbed berkeley’s mica motes
stepwise refinement is at the core of many approaches to synthesis and optimization of hardware and software systems for instance it can be used to build synthesis approach for digital circuits from high level specifications it can also be used for post synthesis modification such as in engineering change orders ecos therefore checking if system modeled as set of concurrent processes is refinement of another is of tremendous value in this paper we focus on concurrent systems modeled as communicating sequential processes csp and show their refinements can be validated using insights from translation validation automated theorem proving and relational approaches to reasoning about programs the novelty of our approach is that it handles infinite state spaces in fully automated manner we have implemented our refinement checking technique and have applied it to variety of refinements we present the details of our algorithm and experimental results as an example we were able to automatically check an infinite state space buffer refinement that cannot be checked by current state of the art tools such as fdr we were also able to check the data part of an industrial case study on the ep system
query logs record the queries and the actions of the users of search engines and as such they contain valuable information about the interests the preferences and the behavior of the users as well as their implicit feedback to search engine results mining the wealth of information available in the query logs has many important applications including query log analysis user profiling and personalization advertising query recommendation and more in this paper we introduce the query flow graph graph representation of the interesting knowledge about latent querying behavior intuitively in the query flow graph directed edge from query qi to query qj means that the two queries are likely to be part of the same search mission any path over the query flow graph may be seen as searching behavior whose likelihood is given by the strength of the edges along the path the query flow graph is an outcome of query log mining and at the same time useful tool for it we propose methodology that builds such graph by mining time and textual information as well as aggregating queries from different users using this approach we build real world query flow graph from large scale query log and we demonstrate its utility in concrete applications namely finding logical sessions and query recommendation we believe however that the usefulness of the query flow graph goes beyond these two applications
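a minimal sketch of building such graph from query log is given below: consecutive queries issued by the same user within session gap contribute to directed edge whose weight counts co occurrences; the 30 minute gap and the raw count weight are assumptions standing in for the learned chaining probability described above

```python
# minimal sketch of building a query flow graph from (user, time, query) logs
from collections import defaultdict

def build_query_flow_graph(log, session_gap=30 * 60):
    log = sorted(log)                        # sort by (user, timestamp)
    edges = defaultdict(int)
    prev_user, prev_time, prev_query = None, None, None
    for user, ts, query in log:
        if user == prev_user and ts - prev_time <= session_gap and query != prev_query:
            edges[(prev_query, query)] += 1  # consecutive queries, same mission?
        prev_user, prev_time, prev_query = user, ts, query
    return dict(edges)

if __name__ == "__main__":
    log = [("u1", 0, "jaguar"), ("u1", 40, "jaguar price"),
           ("u1", 90, "jaguar dealer"), ("u2", 10, "python docs")]
    print(build_query_flow_graph(log))
```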
this paper proposes declarative description of user interfaces that abstracts from low level implementation details in particular the user interfaces specified in our framework are executable as graphical user interfaces for desktop applications as well as web user interfaces via standard web browsers thus our approach combines the advantages of existing user interface technologies in flexible way without demands on the programmer’s side we sketch an implementation of this concept in the declarative multi paradigm programming language curry and show how the integrated functional and logic features of curry are exploited to enable high level implementation of this concept
this paper describes an on chip coma cache coherency protocol to support the microthread model of concurrent program composition the model gives sound basis for building multi core computers as it captures concurrency abstracts communication and identifies resources such as processor groups explicitly and where mapping and scheduling is performed dynamically the result is model where binary compatibility is guaranteed over arbitrary numbers of cores and where backward binary compatibility is also assured we present the design of memory system with relaxed synchronisation and consistency constraints that matches the characteristics of this model we exploit an on chip coma organisation which provides flexible and transparent partitioning between processors and memory this paper describes the coherency protocol and consistency model and describes work undertaken on the validation of the model and the development of co simulator to the microgrid cmp emulator
we consider conversational recommender system based on example critiquing where some recommendations are suggestions aimed at stimulating preference expression to acquire an accurate preference model user studies show that suggestions are particularly effective when they present additional opportunities to the user according to the look ahead principle this paper proposes strategy for producing suggestions that exploits prior knowledge of preference distributions and can adapt relative to users reactions to the displayed examples we evaluate the approach with simulations using data acquired by previous interactions with real users in two different settings we measured the effects of prior knowledge and adaptation strategies with satisfactory results
time series based prediction methods have wide range of uses in embedded systems many os algorithms and applications require accurate prediction of demand and supply of resources however configuring prediction algorithms is not easy since the dynamics of the underlying data requires continuous observation of the prediction error and dynamic adaptation of the parameters to achieve high accuracy current prediction methods are either too costly to implement on resource constrained devices or their parameterization is static making them inappropriate and inaccurate for wide range of datasets this paper presents nwslite prediction utility that addresses these shortcomings on resource restricted platforms
in large scale distributed systems such as grids an agreement between client and service provider specifies service level objectives both as expressions of client requirements and as provider assurances ideally these objectives are expressed in high level service or application specific manner rather than requiring clients to detail the necessary resources resource providers on the other hand expect low level resource specific performance criteria that are uniform across applications and can easily be interpreted and provisioned this paper presents framework for grid service management that addresses this gap between high level specification of client performance objectives and existing resource management infrastructures it identifies three levels of abstraction for resource requirements that service provider needs to manage namely detailed specification of raw resources virtualization of heterogeneous resources as abstract resources and performance objectives at an application level the paper also identifies three key functions for managing service level agreements namely translation of resource requirements across abstraction layers arbitration in allocating resources to client requests and aggregation and allocation of resources from multiple lower level resource managers one or more of these key functions may be present at each abstraction layer of service level manager thus the composition of these functions across resource abstraction layers enables modeling of wide array of management scenarios we present framework that supports these functions it uses the service metadata and or service performance models to map client requirements to resource capabilities it uses business value associated with objectives in allocation decisions to arbitrate between competing requests and it allocates resources based on previously negotiated agreements
sport video data is growing rapidly as result of the maturing digital technologies that support digital video capture faster data processing and large storage however semi automatic content extraction and annotation scalable indexing model and effective retrieval and browsing still pose the most challenging problems for maximizing the usage of large video databases this article will present the findings from comprehensive work that proposes scalable and extensible sports video retrieval system with two major contributions in the area of sports video indexing and retrieval the first contribution is new sports video indexing model that utilizes semi schema based indexing scheme on top of an object relationship approach this indexing model is scalable and extensible as it enables gradual index construction which is supported by ongoing development of future content extraction algorithms the second contribution is set of novel queries which are based on xquery to generate dynamic and user oriented summaries and event structures the proposed sports video retrieval system has been fully implemented and populated with soccer tennis swimming and diving video the system has been evaluated against users to demonstrate and confirm its feasibility and benefits the experimental sports genres were specifically selected to represent the four main categories of sports domain period set point time race and performance based sports thus the proposed system should be generic and robust for all types of sports
many recent complex object database systems support the concepts of object identity and object identifier following an object identifier to access the referenced object is called navigation operation and is an essential operation in dealing with complex objects navigation operation is difficult operation to implement efficiently since every navigation operation inherently causes one disk access operation scheme to notably accelerate the navigation operation among sea of complex objects by increasing the effective number of objects in one disk page is proposed the main concept of the presented technique is threefold the first idea is to store cached value within complex object that is referencing another complex object the second is that when the referenced object is to be updated the update propagation is delayed until the time when the cached value is referenced the third is to utilize hashed table on main memory to efficiently validate the consistency between the cached values and the original values
this paper introduces novel method called reference based string alignment rbsa that speeds up retrieval of optimal subsequence matches in large databases of sequences under the edit distance and the smith waterman similarity measure rbsa operates using the assumption that the optimal match deviates by relatively small amount from the query an amount that does not exceed prespecified fraction of the query length rbsa has an exact version that guarantees no false dismissals and can handle large queries efficiently an approximate version of rbsa is also described that achieves significant additional improvements over the exact version with negligible losses in retrieval accuracy rbsa performs filtering of candidate matches using precomputed alignment scores between the database sequence and set of fixed length reference sequences at query time the query sequence is partitioned into segments of length equal to that of the reference sequences for each of those segments the alignment scores between the segment and the reference sequences are used to efficiently identify relatively small number of candidate subsequence matches an alphabet collapsing technique is employed to improve the pruning power of the filter step in our experimental evaluation rbsa significantly outperforms state of the art biological sequence alignment methods such as q grams blast and bwt
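the sketch below illustrates reference based filtering with simple triangle inequality style lower bound on the edit distance; it is an illustrative stand in for the precomputed alignment scores and candidate generation described above, not the actual rbsa filter

```python
# minimal sketch of reference-based filtering for matching under edit distance;
# the triangle-inequality lower bound is an illustrative stand-in
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def filter_candidates(query_segment, candidates, references, delta):
    q_scores = [edit_distance(query_segment, r) for r in references]
    survivors = []
    for cand in candidates:
        c_scores = [edit_distance(cand, r) for r in references]  # precomputed offline
        lower_bound = max(abs(q - c) for q, c in zip(q_scores, c_scores))
        if lower_bound <= delta:              # cannot be pruned, verify later
            survivors.append(cand)
    return survivors

if __name__ == "__main__":
    refs = ["ACGTACGT", "TTTTAAAA"]
    cands = ["ACGTACGA", "GGGGGGGG", "ACGTTCGT"]
    print(filter_candidates("ACGTACGT", cands, refs, delta=2))
```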
this paper presents an overview on the last years of technical advances in the field of character and document recognition representative developments in each decade are described then key technical developments in the specific area of kanji recognition in japan are highlighted the main part of the paper discusses robustness design principles which have proven to be effective to solve complex problems in postal address recognition included are the hypothesis driven principle deferred decision multiple hypotheses principle information integration principle alternative solution principle and perturbation principle finally future prospects the long tail phenomena and promising new applications are discussed
to realize the full potential of human simulations in interactive environments we need controllers that have the ability to respond appropriately to unexpected events in this paper we create controllers for the trip recovery responses that occur during walking two strategies have been identified in human responses to tripping impact from an obstacle during early swing leads to an elevating strategy in which the swing leg is lifted over the obstacle and impact during late swing leads to lowering strategy in which swing leg is positioned immediately in front of the obstacle and then the other leg is swung forward and positioned in front of the body to allow recovery from the fall we design controllers for both strategies based on the available biomechanical literature and data captured from human subjects in the laboratory we evaluate our controllers by comparing simulated results and actual responses obtained from motion capture system
this paper describes visual shape descriptor based on the sectors and shape context of contour lines to represent the image local features used for image matching the proposed descriptor consists of two component feature vectors first the local region is separated into sectors and their gradient magnitude and orientation values are extracted feature vector is then constructed from these values second local shape features are obtained using the shape context of contour lines another feature vector is then constructed from these contour lines the proposed approach calculates the local shape feature without needing to consider the edges this can overcome the difficulty associated with textured images and images with ill defined edges the combination of two component feature vectors makes the proposed descriptor more robust to image scale changes illumination variations and noise the proposed visual shape descriptor outperformed other descriptors in terms of matching accuracy performing better than sift pca sift gloh and the shape context
numerous attacks such as worms phishing and botnets threaten the availability of the internet the integrity of its hosts and the privacy of its users core element of defense against these attacks is anti virus av software service that detects removes and characterizes these threats the ability of these products to successfully characterize these threats has far reaching effects from facilitating sharing across organizations to detecting the emergence of new threats and assessing risk in quarantine and cleanup in this paper we examine the ability of existing host based anti virus products to provide semantically meaningful information about the malicious software and tools or malware used by attackers using large recent collection of malware that spans variety of attack vectors eg spyware worms spam we show that different av products characterize malware in ways that are inconsistent across av products incomplete across malware and that fail to be concise in their semantics to address these limitations we propose new classification technique that describes malware behavior in terms of system state changes eg files written processes created rather than in sequences or patterns of system calls to address the sheer volume of malware and diversity of its behavior we provide method for automatically categorizing these profiles of malware into groups that reflect similar classes of behaviors and demonstrate how behavior based clustering provides more direct and effective way of classifying and analyzing internet malware
in this paper we present an approach to automatically detect high impact coding errors in large java applications which use frameworks these high impact errors cause serious performance degradation and outages in real world production environments are very time consuming to detect and potentially cost businesses thousands of dollars based on years of experience working with ibm customer production systems we have identified a large number of high impact coding patterns from which we have been able to distill small set of pattern detection algorithms these algorithms use deep static analysis thus moving problem detection earlier in the development cycle from production to development additionally we have developed an automatic false positive filtering mechanism based on domain specific knowledge to achieve level of usability acceptable to ibm field engineers our approach also provides necessary contextual information around the sources of the problems to help in problem remediation we outline how our approach to problem determination can be extended to multiple programming models and domains we have implemented this problem determination approach in the saber tool and have used it successfully to detect many serious code defects in several large commercial applications this paper shows results from four such applications that contained numerous coding defects
functional query is query whose answer is always defined and unique ie it is either true or false in all models it has been shown that the expressive powers of the various types of stable models when restricted to the class of datalog functional queries do not in practice go beyond those of well founded semantics except for the least undefined stable models which instead capture the whole boolean hierarchy bh in this paper we present functional language which by means of disciplined use of negation achieves the desired level of expressiveness up to bh although the semantics of the new language is partial all atoms in the source program are defined and possibly undefined atoms are introduced in rewriting phase to increase the expressive power we show that the language satisfies desirable properties better than classical languages with unstratified negation and stable model semantics we present an algorithm for the evaluation of functional queries and we show that exponential time resolution is required for hard problems only finally we present the architecture of prototype of the language which has been developed
modern information systems require temporal and privilege consuming usage of digital objects to meet these requirements we present new access control model times based usage control tucon tucon extends traditional and temporal access control models with times based usage control by defining the maximum times that privilege can be exercised when the usage times of privilege is consumed to zero or the time interval of the usage is expired the privilege exercised on the object is automatically revoked by the system formal definitions of tucon actions and rules are presented in this paper and the implementation of tucon is discussed
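a minimal sketch of such times based check is shown below: privilege carries remaining usage count and an expiry time and is treated as revoked once either is exhausted; the class and field names are illustrative, not the tucon formalism

```python
# minimal sketch of a times-based usage control check; names and fields are
# illustrative assumptions
import time
from dataclasses import dataclass

@dataclass
class Privilege:
    subject: str
    obj: str
    right: str
    remaining_uses: int
    expires_at: float          # unix timestamp

    def exercise(self, now=None):
        now = time.time() if now is None else now
        if now > self.expires_at or self.remaining_uses <= 0:
            return False       # automatically revoked: expired or consumed
        self.remaining_uses -= 1
        return True

if __name__ == "__main__":
    p = Privilege("alice", "report.pdf", "read", remaining_uses=2,
                  expires_at=time.time() + 3600)
    print([p.exercise() for _ in range(3)])   # [True, True, False]
```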
gals globally asynchronous locally synchronous system typically consists of collection of sequential deterministic components that execute concurrently and communicate using slow or unreliable channels this paper proposes general approach for modelling and verifying gals systems using combination of synchronous languages for the sequential components and process calculi for communication channels and asynchronous concurrency this approach is illustrated with an industrial case study provided by airbus tftpudp communication protocol between plane and the ground which is modelled using the eclipse topcased workbench for model driven engineering and then analysed formally using the cadp verification and performance evaluation toolbox
legacy systems constitute valuable assets to the organizations that own them and today there is an increased demand to make them accessible through the world wide web to support e commerce activities as result the problem of legacy interface migration is becoming very important in the context of the cellest project we have developed new process for migrating legacy user interfaces to web accessible platforms instead of analyzing the application code to extract model of its structure the cellest process analyzes traces of the system user interaction to model the behavior of the application’s user interface the produced state transition model specifies the unique legacy interface screens as states and the possible commands leading from one screen to another as transitions between the states the interface screens are identified as clusters of similar in appearance snapshots in the recorded trace next the syntax of each transition command is extracted as the pattern shared by all the transition instances found in the trace this user interface model is used as the basis for constructing models of the tasks performed by the legacy application users these task models are subsequently used to develop new web accessible interface front ends for executing these tasks in this paper we discuss the cellest method for reverse engineering state transition model of the legacy interface we illustrate it with examples we discuss the results of our experimentation with it and we discuss how this model can be used to support the development of new interface front ends
nonprofit social service organizations provide the backbone of social support infrastructure in the us and around the world as the ecology of information exchange moves evermore digital nonprofit organizations with limited resources and expertise struggle to keep pace we present qualitative investigation of two nonprofit outreach centers providing service to the homeless in us metropolitan city despite similar goals shared by these organizations apparent differences in levels of computerization volunteerism and organizational structure demonstrate the challenges in attempting to adopt technology systems when resources and technical expertise are highly constrained
this paper explores the benefits and limitations of using inspector executor approach for software distributed shared memory sdsm systems the role of the inspector is to obtain description of the address space accessed during the execution of parallel loops the information collected by the inspector will enable the runtime to optimize the movement of shared data that will happen during the executor phase this paper addresses the main issues that have been considered to embed an inspector executor model in sdsm system amount of data collected by the inspector the accurateness of this data when the loop has data and or control dependences and the computational overhead introduced the paper also includes description of the sdsm system where the inspector executor model has been embedded the proposal is evaluated with four applications from the nas benchmark suite the evaluation shows that the accuracy of the inspection and the small overheads introduced by the approach allow its use in sdsm system
this paper presents practical evaluation and comparison of three state of the art parallel functional languages the evaluation is based on implementations of three typical symbolic computation programs with performance measured on beowulf class parallel architecture we assess three mature parallel functional languages pmls system for implicitly parallel execution of ml programs gph mainly implicit parallel extension of haskell and eden more explicit parallel extension of haskell designed for both distributed and parallel execution while all three languages employ completely implicit approach to communication each language takes different approach to specifying and controlling parallelism ranging from explicit identification of processes as language constructs eden through annotation of potential parallelism gph to automatic detection of parallel skeletons in sequential code pmls we present detailed performance measurements of all three systems on widely available parallel architecture beowulf cluster of low cost commodity workstations we use three representative symbolic applications matrix multiplication algorithm an exact linear system solver and simple ray tracer our results show how moderate speedups can be achieved with little or no changes to the sequential code and that parallel performance can be significantly improved even within our high level model of parallel functional programming by controlling key aspects of the program such as load distribution and thread granularity
shape segmentations designed for different applications show significant variation in the composition of their parts in this paper we introduce the segmentation and labeling of shape based on the simultaneous optimization of multiple heterogeneous objectives that capture application specific segmentation criteria we present number of efficient objective functions that capture useful shape adjectives compact flat narrow perpendicular etc segmentation descriptions within our framework combine multiple such objective functions with optional labels to define each part the optimization problem is simplified by proposing weighted voronoi partitioning as compact and continuous parametrization of spatially embedded shape segmentations separation of spatially close but geodesically distant parts is made possible using multi dimensional scaling prior to voronoi partitioning optimization begins with an initial segmentation found using the centroids of k means clustering of surface elements this partition is automatically labeled to optimize heterogeneous part objectives and the voronoi centers and their weights optimized using generalized pattern search we illustrate our framework using several diverse segmentation applications consistent segmentations with semantic labels bounding volume hierarchies for path tracing and automatic rig and clothing transfer between animation characters
an extended method of rough sets called method of weighted equivalence classes is applied to data table containing imprecise values expressed in possibility distribution an indiscernibility degree between objects is calculated family of weighted equivalence classes is obtained via indiscernible classes from binary relation for indiscernibility between objects each equivalence class in the family is accompanied by possibilistic degree to which it is an actual one by using the family of weighted equivalence classes we derive lower approximation and an upper approximation these approximations coincide with those obtained from methods of possible worlds therefore the method of weighted equivalence classes is justified
the generation of frequent itemsets is an essential and time consuming step in mining association rules most of the studies adopt the apriori based approach which requires great effort in generating candidate itemsets and needs multiple database accesses recent studies indicate that fp tree approach has been utilized to avoid the generation of candidate itemsets and scan transaction database only twice but they work with more complicated data structure besides it needs to adjust the structure of fp tree when it is applied to incremental mining application it is necessary to adjust the position of an item upward or downward in the structure of fp tree when new transaction increases or decreases the accumulation of the item the process of the adjustment of the structure of fp tree is the bottleneck of the fp tree in incremental mining application therefore algorithms for efficient mining of frequent patterns are in urgent demand this paper aims to improve both time and space efficiency in mining frequent itemsets and incremental mining application we propose novel qsd quick simple decomposition algorithm using simple decompose principle which is derived from minimal heap tree we can discover the frequent itemsets quickly under one database scan meanwhile qsd algorithm doesn’t need to scan database and reconstruct data structure again when database is updated or minimum support is varied it can be applied to on line incremental mining applications without any modification comprehensive experiments have been conducted to assess the performance of the proposed algorithm the experimental results show that the qsd algorithm outperforms previous algorithms
the paper addresses the problem of indexing data for nearest neighbors nn search given collection of data objects and similarity measure the searching goal is to find quickly the most similar objects to given query object we present top down indexing method that employs widely used scheme of indexing algorithms it starts with the whole set of objects at the root of an indexing tree and iteratively splits data at each level of indexing hierarchy in the paper two different data models are considered in the first objects are represented by vectors from multi dimensional vector space the second more general is based on an assumption that objects satisfy only the axioms of metric space we propose an iterative k means algorithm for tree node splitting in case of vector space and an iterative approximate centers algorithm in case when only metric space is provided the experiments show that the iterative k means splitting procedure accelerates significantly nn searching over the one step procedure used in other indexing structures such as gnat ss tree and tree and that the relevant representation of tree node is an important issue for the performance of the search process we also combine different search pruning criteria used in bst ght and gnat structures into one and show that such combination outperforms significantly each single pruning criterion the experiments are performed for benchmark data sets of the size up to several hundreds of thousands of objects the indexing tree with the k means splitting procedure and the combined search criteria is particularly effective for the largest tested data sets for which this tree accelerates searching up to several thousands times
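the sketch below shows the top down construction for the vector space case: each node is split with k means and the children are built recursively until leaf size is reached; the plain lloyd iterations, branching factor and leaf size are assumptions, and the metric space variant would use approximate centers instead

```python
# minimal sketch of a top-down indexing tree built by recursive k-means splits
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def build_tree(X, k=3, leaf_size=32):
    if len(X) <= leaf_size:
        return {"leaf": X}
    labels, centers = kmeans(X, k)
    if max(int((labels == j).sum()) for j in range(k)) == len(X):
        return {"leaf": X}            # degenerate split, stop recursing
    children = [build_tree(X[labels == j], k, leaf_size) for j in range(k)]
    return {"centers": centers, "children": children}

if __name__ == "__main__":
    X = np.random.default_rng(2).normal(size=(500, 4))
    print(list(build_tree(X).keys()))   # ['centers', 'children']
```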
there has been good deal of progress made recently towards the efficient parallelization of individual phases of single queries in multiprocessor database systems in this paper we devise and evaluate number of scheduling algorithms designed to handle multiple parallel queries one of these algorithms emerges as clear winner this algorithm is hierarchical in nature in the first phase good quality precedence based schedule is created for each individual query and each possible number of processors this component employs dynamic programming in the second phase the results of the first phase are used to create an overall schedule of the full set of queries this component is based on previously published work on nonprecedence based malleable scheduling even though the problem we are considering is np hard in the strong sense the multiple query schedules generated by our hierarchical algorithm are seen experimentally to achieve results which are close to optimal
complete authentication system based on fusion of face and hand biometrics is presented and evaluated in this paper the system relies on low cost real time sensor which can simultaneously acquire pair of depth and color images of the scene by combining facial and hand geometry features we are able to provide highly reliable user authentication robust to appearance and environmental variations the design of the proposed system addresses two basic requirements of biometric technologies dependable performance under real world conditions along with user convenience experimental evaluation on an extensive database recorded in real working environment demonstrates the superiority of the proposed multimodal scheme against unimodal classifiers in the presence of numerous appearance and environmental variations thus making the proposed system an ideal solution for wide range of real world applications from high security to personalization of services and attendance control
we revisit the well known group membership problem and show how it can be considered special case of simple problem the set membership problem in the set membership problem processes maintain set whose elements are drawn from an arbitrary universe they can request the addition or removal of elements to from that set and they agree on the current value of the set group membership corresponds to the special case where the elements of the set happen to be processes we exploit this new way of looking at group membership to give simple and succinct specification of this problem and to outline simple implementation approach based on the state machine paradigm this treatment of group membership separates several issues that are often mixed in existing specifications and or implementations of group membership we believe that this separation of concerns greatly simplifies the understanding of this problem
stealthy malware such as botnets and spyware are hard to detect because their activities are subtle and do not disrupt the network in contrast to dos attacks and aggressive worms stealthy malware however does communicate to exfiltrate data to the attacker to receive the attacker’s commands or to carry out those commands moreover since malware rarely infiltrates only single host in large enterprise these communications should emerge from multiple hosts within coarse temporal proximity to one another in this paper we describe system called tamd pronounced tamed with which an enterprise can identify candidate groups of infected computers within its network tamd accomplishes this by finding new communication aggregates involving multiple internal hosts ie communication flows that share common characteristics we describe characteristics for defining aggregates including flows that communicate with the same external network that share similar payload and or that involve internal hosts with similar software platforms and justify their use in finding infected hosts we also detail efficient algorithms employed by tamd for identifying such aggregates and demonstrate particular configuration of tamd that identifies new infections for multiple bot and spyware examples within traces of traffic recorded at the edge of university network this is achieved even when the number of infected hosts comprise only a small fraction of all internal hosts in the network
commercial web search engines have to process user queries over huge web indexes under tight latency constraints in practice to achieve low latency large result caches are employed and portion of the query traffic is served using previously computed results moreover search engines need to update their indexes frequently to incorporate changes to the web after every index update however the content of cache entries may become stale thus decreasing the freshness of served results in this work we first argue that the real problem in today’s caching for large scale search engines is not eviction policies but the ability to cope with changes to the index ie cache freshness we then introduce novel algorithm that uses time to live value to set cache entries to expire and selectively refreshes cached results by issuing refresh queries to back end search clusters the algorithm prioritizes the entries to refresh according to heuristic that combines the frequency of access with the age of an entry in the cache in addition for setting the rate at which refresh queries are issued we present mechanism that takes into account idle cycles of back end servers evaluation using real workload shows that our algorithm can achieve hit rate improvements as well as reduction in average hit ages an implementation of this algorithm is currently in production use at yahoo
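the sketch below illustrates the two mechanisms described above, expiring entries with time to live and ranking stale entries for refresh by the product of access frequency and age; the data structures and scoring are simplified assumptions, not the production implementation

```python
# minimal sketch of a TTL-based result cache with frequency*age refresh ranking
import heapq
import time

class RefreshingCache:
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}                  # query -> [results, cached_at, hits]

    def get(self, query, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(query)
        if entry is None or now - entry[1] > self.ttl:
            return None                  # miss or expired: go to the back end
        entry[2] += 1
        return entry[0]

    def put(self, query, results, now=None):
        self.store[query] = [results, time.time() if now is None else now, 0]

    def refresh_candidates(self, budget, now=None):
        """Queries to re-issue during idle back-end cycles, ranked by
        access frequency times age of the cached entry."""
        now = time.time() if now is None else now
        scored = [((hits + 1) * (now - cached_at), q)
                  for q, (_, cached_at, hits) in self.store.items()]
        return [q for _, q in heapq.nlargest(budget, scored)]

if __name__ == "__main__":
    cache = RefreshingCache(ttl=300)
    cache.put("weather", ["result-1", "result-2"])
    cache.get("weather")
    print(cache.refresh_candidates(budget=1))   # ['weather']
```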
finding frequent itemsets is the most costly task in association rule mining outsourcing this task to service provider brings several benefits to the data owner such as cost relief and less commitment to storage and computational resources mining results however can be corrupted if the service provider i is honest but makes mistakes in the mining process ii is lazy and reduces costly computation returning incomplete results or iii is malicious and contaminates the mining results we address the integrity issue in the outsourcing process ie how the data owner verifies the correctness of the mining results for this purpose we propose and develop an audit environment which consists of database transformation method and result verification method the main component of our audit environment is an artificial itemset planting aip technique we provide theoretical foundation on our technique by proving its appropriateness and showing probabilistic guarantees about the correctness of the verification process through analytical and experimental studies we show that our technique is both effective and efficient
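the sketch below conveys the audit idea in simplified form: transactions over artificial items with known frequent itemsets are planted into the outsourced database, and the returned result is checked against the expected artificial itemsets; the planting scheme and verification test are toy stand ins for the aip construction and its probabilistic guarantees

```python
# minimal sketch of verifying outsourced frequent-itemset mining by planting
# artificial itemsets whose answer is known in advance
from itertools import combinations

def plant(db, artificial_items=("#A1", "#A2", "#A3"), copies=10):
    # with enough copies relative to the minimum support, every non-empty
    # subset of the artificial items must appear in a correct answer
    planted = [list(artificial_items)] * copies
    return db + planted, set(artificial_items)

def expected_artificial_itemsets(artificial_items):
    items = sorted(artificial_items)
    return {frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)}

def verify(returned_itemsets, artificial_items):
    """Check that exactly the planted artificial itemsets were reported."""
    reported = {fs for fs in returned_itemsets if fs & artificial_items}
    return reported == expected_artificial_itemsets(artificial_items)

if __name__ == "__main__":
    db = [["milk", "bread"], ["bread", "beer"]] * 20
    outsourced_db, art = plant(db)
    # pretend the provider returned only one of the planted itemsets
    answer = {frozenset(["#A1"]), frozenset(["milk", "bread"])}
    print(verify(answer, art))   # False -> incomplete result detected
```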
the most difficult problem in automatic clustering is the determination of the total number of final clusters k in the present paper new method for finding k is proposed and is compared with previously developed methods the proposed method is based on the minimization of functional of the form sum over clusters of n^rn dist^rdist where n is the number of shapes and textures in cluster dist is the intra cluster distance and rn and rdist are two parameters controlling the grain of the clustering the proposed method provides almost perfect clustering for the kimia kimia mpeg shape databases subset of brodatz full brodatz and uiuctex texture databases and provides better results than all previously proposed methods for automatic clustering
micro blogs relatively new phenomenon provide new communication channel for people to broadcast information that they likely would not share otherwise using existing channels eg email phone im or weblogs micro blogging has become popular quite quickly raising its potential for serving as new informal communication medium at work providing variety of impacts on collaborative work eg enhancing information sharing building common ground and sustaining feeling of connectedness among colleagues this exploratory research project is aimed at gaining an in depth understanding of how and why people use twitter popular micro blogging tool and exploring micro blog’s potential impacts on informal communication at work
statistical debugging uses dynamic instrumentation and machine learning to identify predicates on program state that are strongly predictive of program failure prior approaches have only considered simple atomic predicates such as the directions of branches or the return values of function calls we enrich the predicate vocabulary by adding complex boolean formulae derived from these simple predicates we draw upon three valued logic static program structure and statistical estimation techniques to efficiently sift through large numbers of candidate boolean predicate formulae we present qualitative and quantitative evidence that complex predicates are practical precise and informative furthermore we demonstrate that our approach is robust in the face of incomplete data provided by the sparse random sampling that typifies postdeployment statistical debugging
we present linear time algorithm that given flowgraph and tree checks whether is the dominator tree of also we prove that there exist two spanning trees of and such that for any vertex the paths from to in and intersect only at the vertices that dominate the proof is constructive and our algorithm can build the two spanning trees in linear time simpler versions of our two algorithms run in o(m alpha(m,n)) time where n is the number of vertices and m is the number of arcs in the graph the existence of such two spanning trees implies that we can order the calculations of the iterative algorithm for finding dominators proposed by allen and cocke so that it builds the dominator tree in single iteration
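For context, the classic iterative dominator computation that the last sentence refers to can be sketched as follows; the graph representation and the simple fixed-point loop are illustrative simplifications, and the paper's contribution is an ordering under which one pass suffices.

```python
def iterative_dominators(successors, entry):
    """Classic iterative dominator computation in the style attributed to
    Allen and Cocke: dom(entry) = {entry} and, for every other vertex v,
    dom(v) = {v} union the intersection of dom(p) over the predecessors p,
    iterated to a fixed point. `successors` maps each vertex to its
    successor list."""
    vertices = set(successors) | {w for ws in successors.values() for w in ws}
    preds = {v: set() for v in vertices}
    for v, ws in successors.items():
        for w in ws:
            preds[w].add(v)
    dom = {v: set(vertices) for v in vertices}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for v in vertices - {entry}:
            if not preds[v]:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

# toy flowgraph: entry 'r' with a diamond meeting at 'd'
cfg = {"r": ["a", "b"], "a": ["d"], "b": ["d"], "d": []}
doms = iterative_dominators(cfg, "r")
assert doms["d"] == {"r", "d"}   # neither branch alone dominates 'd'
```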
mining frequent patterns in transaction databases time series databases and many other kinds of databases has been studied popularly in data mining research most of the previous studies adopt an apriori like candidate set generation and test approach however candidate set generation is still costly especially when there exist large number of patterns and or long patterns in this study we propose novel frequent pattern tree fp tree structure which is an extended prefix tree structure for storing compressed crucial information about frequent patterns and develop an efficient fp tree based mining method fp growth for mining the complete set of frequent patterns by pattern fragment growth efficiency of mining is achieved with three techniques large database is compressed into condensed smaller data structure fp tree which avoids costly repeated database scans our fp tree based mining adopts pattern fragment growth method to avoid the costly generation of large number of candidate sets and partitioning based divide and conquer method is used to decompose the mining task into set of smaller tasks for mining confined patterns in conditional databases which dramatically reduces the search space our performance study shows that the fp growth method is efficient and scalable for mining both long and short frequent patterns and is about an order of magnitude faster than the apriori algorithm and also faster than some recently reported new frequent pattern mining methods
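To make the compression step concrete, here is a minimal sketch of FP-tree construction: infrequent items are dropped, each transaction is reordered by descending global frequency, and the reordered transactions are inserted into a shared prefix tree with a header table linking the nodes of each item. The mining phase (pattern-fragment growth over conditional pattern bases) is omitted, and the class names are illustrative.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Build an FP-tree: keep only frequent items, order each transaction by
    descending global frequency, and insert it into a shared prefix tree."""
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    frequent = {i for i, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)          # item -> node links, used by the mining phase
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header
```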
this paper considers the problem of rewriting queries using views by means of tolerant method the approach proposed is based on an approximate matching between the constraints from the query and those from the views in the case where both the query and the views contain arithmetical constraints expressed as intervals in such context the answers obtained are not certain anymore but only more or less probable an algorithm which retrieves the top rewritings of given query is described experimentations are reported which show that the extra cost induced by the approximate nature of the rewriting process is perfectly acceptable
dealing with inconsistencies is one of the main challenges in data integration systems where data stored in the local sources may violate integrity constraints specified at the global level recently declarative approaches have been proposed to deal with such problem existing declarative proposals do not take into account preference assertions specified between sources when trying to solve inconsistency on the other hand the designer of an integration system may often include in the specification preference rules indicating the quality of data sources in this paper we consider local as view integration systems and propose method that allows one to assign formal semantics to data integration system whose declarative specification includes information on source preferences to the best of our knowledge our approach is the first one to consider in declarative way information on source quality for dealing with inconsistent data in local as view integration systems
open answer set programming oasp is an extension of answer set programming where one may ground program with an arbitrary superset of the program’s constants we define fixed point logic fpl extension of clark’s completion such that open answer sets correspond to models of fpl formulas and identify syntactic subclass of programs called loosely guarded programs whereas reasoning with general programs in oasp is undecidable the fpl translation of loosely guarded programs falls in the decidable loosely guarded fixed point logic mu gf moreover we reduce normal closed asp to loosely guarded oasp enabling for the first time characterization of an answer set semantics by mu lgf formulas we further extend the open answer set semantics for programs with generalized literals such generalized programs gps have interesting properties for example the ability to express infinity axioms we restrict the syntax of gps such that both rules and generalized literals are guarded via translation to guarded fixed point logic we deduce exptime completeness of satisfiability checking in such guarded gps ggps bound ggps are restricted ggps with exptime complete satisfiability checking but still sufficiently expressive to optimally simulate computation tree logic ctl we translate datalog lite programs to ggps establishing equivalence of ggps under an open answer set semantics alternation free mu gf and datalog lite
it is often too expensive to compute and materialize complete high dimensional data cube computing an iceberg cube which contains only aggregates above certain thresholds is an effective way to derive nontrivial multi dimensional aggregations for olap and data mining in this paper we study efficient methods for computing iceberg cubes with some popularly used complex measures such as average and develop methodology that adopts weaker but anti monotonic condition for testing and pruning search space in particular for efficient computation of iceberg cubes with the average measure we propose top average pruning method and extend two previously studied methods apriori and buc to top apriori and top buc to further improve the performance an interesting hypertree structure called tree is designed and new iceberg cubing method called top cubing is developed our performance study shows that top buc and top cubing are two promising candidates for scalable computation and top cubing has better performance in most cases
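The weaker, anti-monotonic test at the heart of top-k average pruning can be sketched as follows; k and the average threshold come from the iceberg condition, and the function is an illustrative reading of the pruning rule rather than the authors' exact formulation.

```python
import heapq

def topk_average_prunable(cell_values, k, avg_threshold):
    """Weaker but anti-monotonic pruning test: if even the average of the k
    largest measure values in a cell falls below the iceberg threshold, then
    no sub-cell that must still contain at least k tuples can reach
    avg >= threshold, so the cell and all its descendants can be pruned."""
    top_k = heapq.nlargest(k, cell_values)
    return not top_k or sum(top_k) / len(top_k) < avg_threshold
```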
query expansion is an information retrieval technique in which new query terms are selected to improve search performance although useful terms can be extracted from documents whose relevance is already known it is difficult to get enough of such feedback from user in actual use we propose query expansion method that performs well even if user makes practically minimum effort that is chooses only single relevant document to improve searches in these conditions we made two refinements to well known query expansion method one uses transductive learning to obtain pseudo relevant documents thereby increasing the total number of source documents from which expansion terms can be extracted the other is modified parameter estimation method that aggregates the predictions of multiple learning trials to sort candidate terms for expansion by importance experimental results show that our method outperforms traditional methods and is comparable to state of the art method
dynamic meshes are associated with voluminous data and need to be encoded for efficient storage and transmission we study the impact of vertex clustering on registration based dynamic mesh coding where compact mesh motion representation is achieved by computing correspondences for the mesh segments from the temporal reference to obtain high compression performance clustering algorithms segment the mesh into smaller pieces and the compression performance is directly related to how effectively these pieces can describe the mesh motion in this paper we demonstrate that the use of efficient vertex clustering schemes in the compression framework can bring about improvement in compression performance
the web continues to grow at phenomenal rate and the amount of information on the web is overwhelming finding the relevant information remains big challenge due to its wide distribution its openness and high dynamics the web is complex system for which we have to imagine mechanisms of content maintaining filtering and organizing that are able to deal with its evolving dynamics and distribution integrating mechanisms of self organization of the web content is an attractive perspective to match with these requirements self organized complex systems can be programmed using situated multi agent systems with coupling between the agents social organization and spatial organization this paper explores the web from complex adaptive system cas perspective it reviews some characteristic behaviors of cass and shows how the web exhibits similar behaviors we propose model and prototype of system that addresses the dynamic web content organization adopting the cas vision and using the multi agent paradigm
unauthorized re use of code by students is widespread problem in academic institutions and raises liability issues for industry manual plagiarism detection is time consuming and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories while there are practical text based plagiarism detection systems capable of working with large collections this is not the case for code based plagiarism detection in this paper we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment through detailed empirical evaluation on small and large collections of programs we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular jplag and moss systems
recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining small subset of frequent subgraphs that are representative discriminative or significant the main motivation behind that is to cope with the scalability problem that the graph mining algorithms suffer when mining databases of large graphs another motivation is to obtain succinct output set that is informative and useful in the same spirit researchers also proposed sampling based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs in this work we propose generic sampling framework that is based on metropolis hastings algorithm to sample the output space of frequent subgraphs our experiments on various sampling strategies show the versatility utility and efficiency of the proposed sampling approach
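A generic Metropolis-Hastings walk over a pattern space might look like the sketch below; the neighbor-proposal function and the score (for example favoring larger, more discriminative, or more representative subgraphs) are placeholders for whatever sampling strategy is plugged into the framework, not the paper's exact choices.

```python
import random

def metropolis_hastings_sample(start, neighbors, score, steps, seed=0):
    """Generic Metropolis-Hastings walk: propose a uniformly random neighbor,
    correct for the asymmetric proposal with the ratio of neighborhood sizes,
    and accept with the usual MH probability."""
    rng = random.Random(seed)
    current = start
    for _ in range(steps):
        nbrs = neighbors(current)
        if not nbrs:
            continue
        candidate = rng.choice(nbrs)
        # q(current | candidate) / q(candidate | current) for uniform proposals
        q_ratio = len(nbrs) / max(1, len(neighbors(candidate)))
        accept = min(1.0, (score(candidate) / max(score(current), 1e-12)) * q_ratio)
        if rng.random() < accept:
            current = candidate
    return current
```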
data mining can extract important knowledge from large data collections but sometimes these collections are split among various parties privacy concerns may prevent the parties from directly sharing the data the irony is that data mining results rarely violate privacy the objective of data mining is to generalize across populations rather than reveal information about individuals thus the true problem is not data mining but how data mining is done this paper presents new scalable algorithm for discovering closed frequent itemsets in distributed environment using commutative encryption to ensure privacy concerns we address secure mining of association rules over horizontally partitioned data
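One common way to realize the commutative-encryption step is Pohlig-Hellman-style modular exponentiation, sketched below with a deliberately small toy modulus; the prime, the hashing of itemsets, and the message-passing order are assumptions for illustration, not a secure deployment.

```python
import hashlib
import secrets

# Commutative encryption via modular exponentiation (Pohlig-Hellman style),
# a minimal sketch of the primitive used for privacy-preserving union of
# locally frequent itemsets. The modulus below is a toy prime for
# illustration only; a real deployment needs a large safe prime and keys
# coprime to p - 1.
P = 2**64 - 59  # assumption: small toy prime, NOT cryptographically adequate

def site_key():
    return secrets.randbelow(P - 2) + 1       # each site keeps its key secret

def encode(itemset):
    digest = hashlib.sha256(repr(sorted(itemset)).encode()).digest()
    return int.from_bytes(digest, "big") % P

def encrypt(value, key):
    return pow(value, key, P)                  # E_k1(E_k2(x)) == E_k2(E_k1(x))

# Each site encrypts its own candidate itemsets and passes the ciphertexts on
# so every other site applies its key too; equal itemsets then collide no
# matter in which order the keys were applied, so the union can be formed
# without revealing which site contributed which itemset.
x = encode({"bread", "milk"})
k1, k2 = site_key(), site_key()
assert encrypt(encrypt(x, k1), k2) == encrypt(encrypt(x, k2), k1)
```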
downward and upward simulations form sound and jointly complete methodology for verifying relational data refinement in state based specification languages such as and in previous work we showed how both downward and upward simulation conditions can be discharged using ctl model checker the approach was implemented in the sal tool suite given the retrieve relation each of the simulation conditions can be proven fully automatically it has been recognised however that finding retrieve relations is often very hard in this paper we show how it is feasible to use the sal model checkers to also generate retrieve relations
character skinning determines how the shape of the surface geometry changes as function of the pose of the underlying skeleton in this paper we describe skinning templates which define common deformation behaviors for common joint types this abstraction allows skinning solutions to be shared and reused and they allow user to quickly explore many possible alternatives for the skinning behavior of character the skinning templates are implemented using cage based deformations which offer flexible design space within which to develop reusable skinning behaviors we demonstrate the interactive use of skinning templates to quickly explore alternate skinning behaviors for models
in this paper we propose bayesian mixture model in which we introduce context variable with dirichlet prior in bayesian framework to model multiple topics in text and then perform clustering it is novel unsupervised text learning algorithm to cluster large scale web data for parameter estimation we adopt maximum likelihood ml and the em algorithm to estimate the model parameters and employ the bic principle to determine the number of clusters experimental results show that the method we propose distinctly outperforms baseline algorithms
previous work on semantics based multi stage programming msp language design focused on homogeneous designs where the generating and the generated languages are the same homogeneous designs simply add hygienic quasi quotation and evaluation mechanism to base language an apparent disadvantage of this approach is that the programmer is bound to both the expressivity and performance characteristics of the base language this paper proposes practical means to avoid this by providing specialized translations from subsets of the base language to different target languages this approach preserves the homogeneous look of multi stage programs and more importantly the static guarantees about the generated code in addition compared to an explicitly heterogeneous approach it promotes reuse of generator source code and systematic exploration of the performance characteristics of the target languages to illustrate the proposed approach we design and implement translation to subset of suitable for numerical computation and show that it preserves static typing the translation is implemented and evaluated with several benchmarks the implementation is available in the online distribution of metaocaml
efficient estimation of population size is common requirement for many wireless sensor network applications examples include counting the number of nodes alive in the network and measuring the scale and shape of physically correlated events these tasks must be accomplished at extremely low overhead due to the severe resource limitation of sensor nodes which poses challenge for large scale sensor networks in this article we design novel measurement technique flake based on sparse sampling that is generic in that it is applicable to arbitrary wireless sensor networks wsn it can be used to efficiently evaluate system size scale of event and other global aggregating or summation information of individual nodes over the whole network at low communication cost this functionality is useful in many applications but hard to achieve when each node has only limited local knowledge of the network therefore flake is composed of two main components to solve this problem one is the injected random data dissemination sampling method the other is sparse sampling algorithm based on inverse sampling upon which it improves by achieving target variance with small error and low communication cost flake uses approximately uniform random data dissemination and sparse sampling in sensor networks which is an unstructured and localized method finally we provide experimental results demonstrating the effectiveness of our algorithm on both small scale and large scale wsns our measurement technique appears to be a practical and appropriate choice
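The inverse-sampling component can be sketched as follows, assuming a dissemination phase has already placed a known number of tokens on uniformly random nodes; the callback and parameter names are illustrative, not the paper's interfaces.

```python
import random

def estimate_population(sample_node, num_marked, target_marked_hits, seed=0):
    """Inverse-sampling size estimator: draw uniform node samples until
    `target_marked_hits` token holders have been seen, then scale the known
    token count by the observed marked fraction. The stopping rule controls
    the estimator's variance. `sample_node` is an assumed callback returning
    True when the sampled node holds a token."""
    rng = random.Random(seed)
    draws, hits = 0, 0
    while hits < target_marked_hits:
        draws += 1
        if sample_node(rng):
            hits += 1
    return num_marked * draws / hits

# toy usage: a network of 5,000 nodes holding 200 disseminated tokens
population = 5_000
marked = set(random.Random(1).sample(range(population), 200))
estimate = estimate_population(lambda rng: rng.randrange(population) in marked, 200, 50)
```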
the proliferation of heterogeneous devices and diverse networking technologies demands flexible models to guarantee the quality of service qos at the application session level which is common behavior of many network centric applications eg web browsing and instant messaging several qos models have been proposed for heterogeneous wired wireless environments however we envision that the missing part which is also big challenge is taking energy scarce resource for mobile and energy constrained devices into consideration in this paper we propose novel energy aware qos model qos for application sessions that might span multiple protocol domains which will be common in the future internet rather than an exception the model provides qos guarantee by dynamically selecting and adapting application protocols to the best of our knowledge our model is the first attempt to address qos adaptation at the application session level by introducing new qos metric called session lifetime to show the effectiveness of the proposed scheme we have implemented two case studies web browsing from pocket pc to regular web server and an instant messaging application between two pocket pcs in the former case study our approach outperforms the conventional approach without energy aware qos by more than in terms of the session lifetime in the second case study we also successfully extend the session lifetime to the value negotiated by two pocket pcs with very diverse battery capacities
the increasing demand for mobility in our society poses various challenges to traffic engineering computer science in general and artificial intelligence and multiagent systems in particular as is often the case it is not possible to provide additional capacity so that more efficient use of the available transportation infrastructure is necessary this relates closely to multiagent systems as many problems in traffic management and control are inherently distributed also many actors in transportation system fit very well the concept of autonomous agents the driver the pedestrian the traffic expert in some cases also the intersection and the traffic signal controller can be regarded as an autonomous agent however the agentification of transportation system is associated with some challenging issues the number of agents is high typically agents are highly adaptive they react to changes in the environment at individual level but cause an unpredictable collective pattern and act in highly coupled environment therefore this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning this paper has two main objectives to present problems methods approaches and practices in traffic engineering especially regarding traffic signal control and ii to highlight open problems and challenges so that future research in multiagent systems can address them
today’s web applications require instruments and techniques able to face their complexity which has noticeably increased at the expense of productivity and quality factors number of design methodologies have been proposed in the process of trying to provide developers with languages and tools to abstract and capture web applications under orthogonal views like data navigation and presentation while the different modeling language constructs can be unified in common metamodel consistency among the distinct concerns is guaranteed by less formal relations usually they are based on name conventions and or ad hoc tool support that could affect reuse and maintenance ratings of specifications in order to define rigorous and explicit correspondences between the artifacts produced during system development this paper proposes the exploitation of dedicated weaving models the approach aims at providing structural mappings that do not interfere with the definition of the views on either side achieving clear separation between them and their connections furthermore following the everything is model principle this work can enable the use of general purpose theories and tools for example model transformations can be applied to evaluate the given specifications or to derive alternative descriptions like webile or webml
the current evolution of information technology leads to the increase of automatic data processing over multiple information systems the data we deal with concerns sensitive information about users or groups of users typical problem in this context concerns the disclosure of confidential identity data to tackle this difficulty we consider in this paper the context of hippocratic multi agent systems himas model designed for the privacy management in this context we propose common content language combining meta policies and application context data on one hand and on the other hand an interaction protocol for the exchange of sensitive data based on this proposal agents providing sensitive data are able to check the compliance of the consumers to the himas principles the protocol that we propose is validated on distributed calendar management application
the definition of choreography specification languages for service oriented systems poses important challenges mainstream approaches tend to focus on procedural aspects leading to over constrained and over specified models because of such drawback declarative languages are gaining popularity as better way to model service choreographies similar issue was met in the multi agent systems domain where declarative approaches based on social semantics have been used to capture the nature of agent interaction without over constraining their behaviour in this work we present an integrated framework capable to cover the entire cycle of specification and verification of choreographies by mixing approaches coming from the service oriented computing and multi agent systems research domains sciff is the underlying logic programming framework for modelling and verifying interaction in open systems the use of sciff brings us two main advantages it allows us to capture within single framework different aspects of choreography ranging from constraints on the flow of messages to effects and commitments resulting from their exchange it provides an operational model that can be exploited to perform variety of verification tasks
this paper considers qltl quantitative analogue of ltl and presents algorithms for model checking qltl over quantitative versions of kripke structures and markov chains
software frequently needs to adapt its behavior at run time to respond to changes in its execution environment different software components may use different approaches to adaptation composing single adaptive system from existing adaptive components requires an adaptation infrastructure to integrate and arbitrate adaptive behaviors this paper proposes model for such an infrastructure and describes the design and operation of prototype implementation the prototype uses technique called transparent shaping to modify existing components so that they can report events of interest to its core and implement appropriate responses the architecture and communication infrastructure of the prototype are described followed by case study in which it is used to construct an adaptive multimedia conferencing application from otherwise incompatible components
when database query has large number of results the user can only be shown one page of results at time one popular approach is to rank results such that the best results appear first this approach is well suited for information retrieval and for some database queries such as similarity queries or under specified or keyword queries with known or guessable user preferences however standard database query results comprise set of tuples with no associated ranking it is typical to allow users the ability to sort results on selected attributes but no actual ranking is defined an alternative approach is not to try to show the estimated best results on the first page but instead to help users learn what is available in the whole result set and direct them to finding what they need we present datalens framework that generates the most representative data points to display on the first page without sorting or ranking ii allows users to drill down to more similar items in hierarchical fashion and iii dynamically adjusts the representatives based on the user’s new query conditions to the best of our knowledge datalens is the first to allow hierarchical database result browsing and searching at the same time
interface grammars are formalism for expressing constraints on sequences of messages exchanged between two components in this paper we extend interface grammars with an automated translation of xml schema definitions present in wsdl documents into interface grammar rules given an interface grammar we can then automatically generate either parser to check that sequence of messages generated by web service client is correct with respect to the interface specification or sentence generator producing compliant message sequences to check that the web service responds to them according to the interface specification by doing so we can validate and generate both messages and sequences of messages in uniform manner moreover we can express constraints where message structure and control flow cannot be handled separately
the paper investigates the relationship between analytical capabilities in the plan source make and deliver area of the supply chain and its performance using information system support and business process orientation as moderators structural equation modeling employs sample of companies from different industries from the usa europe canada brazil and china the findings suggest the existence of statistically significant relationship between analytical capabilities and performance the moderation effect of information systems support is considerably stronger than the effect of business process orientation the results provide better understanding of the areas where the impact of business analytics may be the strongest
several multiagent reinforcement learning marl algorithms have been proposed to optimize agents decisions due to the complexity of the problem the majority of the previously developed marl algorithms assumed agents either had some knowledge of the underlying game such as nash equilibria and or observed other agents actions and the rewards they received we introduce new marl algorithm called the weighted policy learner wpl which allows agents to reach nash equilibrium ne in benchmark player action games with minimum knowledge using wpl the only feedback an agent needs is its own local reward the agent does not observe other agents actions or rewards furthermore wpl does not assume that agents know the underlying game or the corresponding nash equilibrium a priori we experimentally show that our algorithm converges in benchmark two player two action games we also show that our algorithm converges in the challenging shapley’s game where previous marl algorithms failed to converge without knowing the underlying game or the ne furthermore we show that wpl outperforms the state of the art algorithms in more realistic setting of agents interacting and learning concurrently an important aspect of understanding the behavior of marl algorithm is analyzing the dynamics of the algorithm how the policies of multiple learning agents evolve over time as agents interact with one another such an analysis not only verifies whether agents using given marl algorithm will eventually converge but also reveals the behavior of the marl algorithm prior to convergence we analyze our algorithm in two player two action games and show that symbolically proving wpl’s convergence is difficult because of the non linear nature of wpl’s dynamics unlike previous marl algorithms that had either linear or piece wise linear dynamics instead we numerically solve wpl’s dynamics differential equations and compare the solution to the dynamics of previous marl algorithms
we present offline ram compression an automated source to source transformation that reduces program’s data size statically allocated scalars pointers structures and arrays are encoded and packed based on the results of whole program analysis in the value set and pointer set domains we target embedded software written in that relies heavily on static memory allocation and runs on harvard architecture microcontrollers supporting just few kb of on chip ram on collection of embedded applications for avr microcontrollers our transformation reduces ram usage by an average of in addition to reduction through dead data elimination pass that is also driven by our whole program analysis for total ram savings of we also developed a technique for giving developers access to flexible spectrum of tradeoffs between ram consumption rom consumption and cpu efficiency this technique is based on model for estimating the cost benefit ratio of compressing each variable and then selectively compressing only those variables that present good value proposition in terms of the desired tradeoffs
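The core packing arithmetic is easy to illustrate: once whole-program analysis bounds a variable's value set, the variable can be stored as a small code plus a decode table. The sketch below uses Python for brevity even though the actual transformation targets C sources, and the function names are illustrative assumptions.

```python
def bits_needed(value_set):
    """Bits required to index a variable whose whole-program value set is
    known: store a small code instead of the full word, plus a decode table."""
    return max(1, (len(value_set) - 1).bit_length())

def build_codec(value_set):
    """Illustrative codec a source-to-source pass could emit for one variable."""
    decode = sorted(value_set)                      # table placed in ROM
    encode = {v: i for i, v in enumerate(decode)}   # applied at each store
    return encode, decode

# example: a 16-bit variable proven to take only 5 distinct values
encode, decode = build_codec({0, 1, 4, 9, 250})
assert bits_needed({0, 1, 4, 9, 250}) == 3          # 3 bits instead of 16
assert decode[encode[9]] == 9                       # applied at each load
```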
we present trace semantics for language of parallel programs which share access to mutable data we introduce resource sensitive logic for partial correctness based on recent proposal of o’hearn adapting separation logic to the concurrent setting the logic allows proofs of parallel programs in which ownership of critical data such as the right to access update or deallocate pointer is transferred dynamically between concurrent processes we prove soundness of the logic using novel local interpretation of traces which allows accurate reasoning about ownership we show that every provable program is race free
exploiting thread level parallelism is paramount in the multicore era transactions enable programmers to expose such parallelism by greatly simplifying the multi threaded programming model virtualized transactions unbounded in space and time are desirable as they can increase the scope of transactions use and thereby further simplify programmer’s job however hardware support is essential to support efficient execution of unbounded transactions in this paper we introduce page based transactional memory to support unbounded transactions we combine transaction bookkeeping with the virtual memory system to support fast transaction conflict detection commit abort and to maintain transactions speculative data
since the amount of information is rapidly growing there is an overwhelming interest in efficient network computing systems including grids public resource computing systems pp systems and cloud computing in this paper we take detailed look at the problem of modeling and optimization of network computing systems for parallel decision tree induction methods firstly we present comprehensive discussion on mentioned induction methods with special focus on their parallel versions next we propose generic optimization model of network computing system that can be used for distributed implementation of parallel decision trees to illustrate our work we provide results of numerical experiments showing that the distributed approach enables significant improvement of the system throughput
schema integration is the problem of creating unified target schema based on set of existing source schemas that relate to each other via specified correspondences the unified schema gives standard representation of the data thus offering way to deal with the heterogeneity in the sources in this paper we develop method and design tool that provide adaptive enumeration of multiple interesting integrated schemas and easy to use capabilities for refining the enumerated schemas via user interaction our method is departure from previous approaches to schema integration which do not offer systematic exploration of the possible integrated schemas the method operates at logical level where we recast each source schema into graph of concepts with has relationships we then identify matching concepts in different graphs by taking into account the correspondences between their attributes for every pair of matching concepts we have two choices merge them into one integrated concept or keep them as separate concepts we develop an algorithm that can systematically output without duplication all possible integrated schemas resulting from the previous choices for each integrated schema the algorithm also generates mapping from the source schemas to the integrated schema that has precise information preserving properties furthermore we avoid full enumeration by allowing users to specify constraints on the merging process based on the schemas produced so far these constraints are then incorporated in the enumeration of the subsequent schemas the result is an adaptive and interactive enumeration method that significantly reduces the space of alternative schemas and facilitates the selection of the final integrated schema
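The enumeration itself can be pictured as iterating over the binary merge-or-keep choices for each pair of matching concepts, as in the rough sketch below; duplicate elimination, mapping generation, and the user constraints of the actual method are omitted, and the names are illustrative.

```python
from itertools import product

def enumerate_integrated_schemas(matching_pairs):
    """Each pair of matching concepts is a binary choice: merge into one
    integrated concept or keep the two concepts separate, so candidate
    integrated schemas correspond to assignments over these choices. This
    sketch only yields the raw choice assignments (the set of merged pairs)."""
    for choices in product((True, False), repeat=len(matching_pairs)):
        yield {pair for pair, merge in zip(matching_pairs, choices) if merge}

# e.g. two matching pairs give four candidate schemas
for merged in enumerate_integrated_schemas([("order", "purchase"), ("client", "customer")]):
    print(sorted(merged))
```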
in part ii of this paper firstly we study the relationship between the afs axiomatic fuzzy set and fca formal concept analysis which has become powerful theory for data analysis information retrieval and knowledge discovery and some algebraic homomorphisms between the afs algebras and the concept lattices are established then the numerical approaches to determining membership functions proposed in part of this paper are used to study the fuzzy description and data clustering problems by mimicking human reasoning process finally illustrative examples show that the framework of afs theory offers far more flexible and effective approach to artificial intelligence system analysis and design with applications to knowledge acquisition and representations in practice
multicast is an important collective communication operation on multicomputer systems in which the same message is delivered from source node to an arbitrary number of destination nodes the star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network in this paper we first address dual hamiltonian path based routing model with two virtual channels based on two hamiltonian paths hps and network partitioning strategy for wormhole routed star graph networks then we propose three efficient multicast routing schemes on basis of such model all of the three proposed schemes are proved deadlock free the first scheme network selection based dual path routing selects subnetworks that are constructed either by the first hp or by the second hp for dual path routing the second one optimum dual path routing selects subnetworks with optimum routing path for dual path routing the third scheme two phase optimum dual path routing includes two phases source to relay and relay to destination finally experimental results are given to show that our proposed three routing schemes outperform the unicast based the hp and the single hp based dual path routing schemes significantly
model driven engineering mde is an approach to develop software systems by creating models and applying automated transformations to them to ultimately generate the implementation for target platform although the main focus of mde is on the generation of code it is also necessary to support the analysis of the designs with respect to quality attributes such as performance to complement the model to implementation path of mde approaches an mde tool infrastructure should provide what we call model driven analysis this paper describes an approach to model driven analysis based on reasoning frameworks in particular it describes performance reasoning framework that can transform design into model suitable for analysis of real time performance properties with different evaluation procedures including rate monotonic analysis and simulation the concepts presented in this paper have been implemented in the pacc starter kit development environment that supports code generation and analysis from the same models
excessive supply voltage drops in circuit may lead to significant circuit performance degradation and even malfunction to handle this problem existing power delivery aware placement algorithms model voltage drops as an optimization objective we observe that directly minimizing voltage drops in an objective function might not resolve voltage drop violations and might even cause problems in power integrity convergence to remedy this deficiency in this paper we propose new techniques to incorporate device power spreading forces into mixed size analytical placement framework unlike the state of the art previous work that handles the worst voltage drop spots one by one our approach simultaneously and globally spreads all the blocks with voltage drop violations to desired locations directly to minimize the violations to apply the power force we model macro current density and power rails for our placement framework to derive desired macro cell locations to further improve the solution quality we propose an efficient mathematical transformation to adjust the power force direction and magnitude experimental results show that our approach can substantially improve the voltage drops wirelength and runtime over the previous work
in this paper we propose new method for approximating an unorganized set of points scattered over piecewise smooth surface by triangle mesh the method is based on the garland heckbert local quadric error minimization strategy first an adaptive spherical cover and auxiliary points corresponding to the cover elements are generated then the intersections between the spheres of the cover are analyzed and the auxiliary points are connected finally the resulting mesh is cleaned from non manifold parts the method allows us to control the approximation accuracy process noisy data and reconstruct sharp edges and corners further the vast majority of the triangles of the generated mesh have their aspect ratios close to optimal thus our approach integrates the mesh reconstruction smoothing decimation feature restoration and remeshing stages together
the architecture of multi agent system can naturally be viewed as computational organisation for this reason we believe organisational abstractions should play central role in the analysis and design of such systems to this end the concepts of agent roles and role models are increasingly being used to specify and design multi agent systems however this is not the full picture in this paper we introduce three additional organisational concepts organisational rules organisational structures and organisational patterns that we believe are necessary for the complete specification of computational organisations we view the introduction of these concepts as step towards comprehensive methodology for agent oriented systems
due to the rapid growth in personal image collections there is increasing interest in automatic detection of near duplicates in this paper we propose novel fast near duplicate detection framework that takes advantage of heterogeneous features like exif data global image histogram and local features to improve the accuracy of local feature matching we have developed structure matching algorithm that takes into account the local feature’s neighborhood which can effectively reject mismatches in addition we developed computation sensitive cascade framework to combine stage classifiers trained on different feature spaces with different computational cost this method can quickly accept easily identified duplicates using only cheap features without the need to extract more sophisticated but expensive ones compared with existing approaches our experiments show very promising results using our new approach in terms of both efficiency and effectiveness
the reverse engineering community has recognized the importance of interoperability the cooperation of two or more systems to enable the exchange and utilization of data and has noted that the current lack of interoperability is contributing factor to the lack of adoption of available infrastructures to address the problems of interoperability and reproducing previous results we present an infrastructure that supports interoperability among reverse engineering tools and applications we present the design of our infrastructure including the hierarchy of schemas that captures the interactions among graph structures we also develop and utilize our implementation which is designed using gxl based pipe filter architecture to perform case study that demonstrates the feasibility of our infrastructure
the internet enables global sharing of data across organizational boundaries distributed file systems facilitate data sharing in the form of remote file access however traditional access control mechanisms used in distributed file systems are intended for machines under common administrative control and rely on maintaining centralized database of user identities they fail to scale to large user base distributed across multiple organizations we provide survey of decentralized access control mechanisms in distributed file systems intended for large scale in both administrative domains and users we identify essential properties of such access control mechanisms we analyze both popular production and experimental distributed file systems in the context of our survey
this roadmap describes ways that researchers in four areas specification languages program generation correctness by construction and programming languages might help further the goal of verified software it also describes what advances the verified software grand challenge might anticipate or demand from work in these areas that is the roadmap is intended to help foster collaboration between the grand challenge and these research areas a common goal for research in these areas is to establish language designs and tool architectures that would allow multiple annotations and tools to be used on single program in the long term researchers could try to unify these annotations and integrate such tools
we present new approach to enhancing answer set programming asp with constraint processing techniques which allows for solving interesting constraint satisfaction problems in asp we show how constraints on finite domains can be decomposed into logic programs such that unit propagation achieves arc bound or range consistency experiments with our encodings demonstrate their computational impact
in peer to peer pp file sharing system the faithful delivery of an authentic file depends on the authenticity of the file advertisement as well as the authenticity of the shared file advertised we present the index authenticity problem in distributed pp indexing scheme and propose to employ secure index verification scheme which allows querying requestor to securely verify the authenticity of file advertisements in the query response from distrusted index peer with unforgeable proofs of indexing correctness in order to combat index poisoning attacks targeting index authenticity solution based on signature and bloom filter bf and its cost efficiency analysis are given
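For reference, a minimal Bloom filter of the kind such a scheme could use to summarize advertised files is sketched below; the sizes and hashing scheme are illustrative choices, not the paper's construction.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item set in a fixed-size
    bit array; membership tests may yield false positives but never false
    negatives, which is what makes it usable as a compact index summary."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```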
this paper describes the framework of the statcan daily translation extraction system sdtes computer system that maps and compares web based translation texts of statistics canada statcan news releases in the statcan publication the daily the goal is to extract translations for translation memory systems for translation terminology building for cross language information retrieval and for corpus based machine translation systems three years of officially published statistical news release texts at http://www.statcan.ca were collected to compose the statcan daily data bank the english and french texts in this collection were roughly aligned using the gale church statistical algorithm after this boundary markers of text segments and paragraphs were adjusted and the gale church algorithm was run second time for more fine grained text segment alignment to detect misaligned areas of texts and to prevent mismatched translation pairs from being selected key textual and structural properties of the mapped texts were automatically identified and used as anchoring features for comparison and misalignment detection the proposed method has been tested with web based bilingual materials from five other canadian government websites results show that the sdtes model is very efficient in extracting translations from published government texts and very accurate in identifying mismatched translations with parameters tuned the text mapping part can be used to align corpus data collected from official government websites and the text comparing component can be applied in prepublication translation quality control and in evaluating the results of statistical machine translation systems
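The length-based cost behind the Gale-Church pass can be sketched as below, using the commonly cited default constants (length-ratio mean c, variance s2, and a prior for the 1-1 alignment type); treat the exact values and the simplified tail computation as assumptions rather than the system's implementation.

```python
import math

def gale_church_cost(len_src, len_tgt, prior=0.89, c=1.0, s2=6.8):
    """Length-based alignment cost in the spirit of Gale-Church: the length
    difference is normalized into a rough z-score and turned into a negative
    log probability, combined with a prior for the alignment type.
    Lower cost means a more plausible translation pair."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    delta = (len_tgt - c * len_src) / math.sqrt(max(1.0, len_src) * s2)
    tail = math.erfc(abs(delta) / math.sqrt(2))   # two-sided tail probability
    return -math.log(max(tail, 1e-300)) - math.log(prior)

# a plausible pair scores lower than an implausible one
assert gale_church_cost(100, 108) < gale_church_cost(100, 160)
```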
epilepsy affects over three million americans of all ages despite recent advances more than of individuals with epilepsy never achieve adequate control of their seizures the use of small portable non invasive seizure monitor could benefit these individuals tremendously however in order for such device to be suitable for long term wear it must be both comfortable and lightweight typical state of the art non invasive seizure onset detection algorithms require scalp electrodes to be placed on the head these electrodes are used to generate data streams called channels the large number of electrodes is inconvenient for the patient and processing channels can consume considerable amount of energy problem for battery powered device in this paper we describe an automated way to construct detectors that use fewer channels and thus fewer electrodes starting from an existing technique for constructing channel patient specific detectors we use machine learning to automatically construct reduced channel detectors we evaluate our algorithm on data from patients used in an earlier study on average our algorithm reduced the number of channels from to while decreasing the mean fraction of seizure onsets detected from to for out of the patients there was no degradation in the detection rate while the average detection latency increased from to the average rate of false alarms per hour decreased from to we also describe prototype implementation of single channel eeg monitoring device built using off the shelf components and use this implementation to derive an energy consumption model using fewer channels reduced the average energy consumption by which amounts to increase in battery lifetime finally we show how additional energy savings can be realized by using low power screening detector to rule out segments of data that are obviously not seizures though this technique does not reduce the number of electrodes needed it does reduce the energy consumption by an additional
this paper shows that type graph obtained via polymorphic type inference harbors explicit directional flow paths between functions these flow paths arise from the instantiations of polymorphic types and correspond to call return sequences in first order programs we show that flow information can be computed efficiently while considering only paths with well matched call return sequences even in the higher order case furthermore we present practical algorithm for inferring type instantiation graphs and provide empirical evidence to the scalability of the presented techniques by applying them in the context of points to analysis for programs
alarm correlation analysis system is a useful method and tool for analyzing alarms and finding the root cause of faults in telecommunication networks recently the application of association rules mining has become an important research area in alarm correlation analysis in this paper we propose novel association rules mining based alarm correlation analysis system arm acas to find interesting association rules between alarm events in order to mine some infrequent but important items arm acas first uses neural network to classify the alarms with different levels in addition arm acas also exploits an optimization technique with the weighted frequent pattern tree structure to improve the mining efficiency the system is both efficient and practical in discovering significant relationships of alarms as illustrated by experiments performed on simulated and real world datasets
dramatic shift is underway in how organizations use computer storage this shift will have profound impact on storage system design the requirement for storage of traditional transactional data is being supplemented by the necessity to store information for long periods in total of petabytes of storage was allocated worldwide for information that required long term retention and this amount is expected to grow to an estimated petabytes by in this paper we review the requirements for long term storage of data and describe an innovative approach for developing highly scalable and flexible archive storage system using commercial off the shelf cots components such system is expected to be capable of preserving data for decades providing efficient policy based management of the data and allowing efficient search and access to data regardless of data content or location
provisioning storage system requires balancing the costs of the solution with the benefits that the solution will provide previous provisioning approaches have started with fixed set of requirements and the goal of automatically finding minimum cost solutions to meet them such approaches neglect the cost benefit analysis of the purchasing decision purchasing storage system involves an extensive set of trade offs between metrics such as purchase cost performance reliability availability power etc increases in one metric have consequences for others and failing to account for these trade offs can lead to poor return on the storage investment using collection of storage acquisition and provisioning scenarios we show that utility functions enable this cost benefit structure to be conveyed to an automated provisioning tool enabling the tool to make appropriate trade offs between different system metrics including performance data protection and purchase cost
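A utility function of the kind advocated above can be as simple as a weighted combination of the metrics the purchaser cares about; the linear form, the field names, and the weights below are illustrative assumptions, not a prescribed formulation.

```python
def provisioning_utility(config, weights):
    """Turn a candidate storage configuration into a single comparable number
    so an automated tool can trade performance and availability against
    purchase cost and power."""
    return (weights["performance"] * config["iops"]
            + weights["availability"] * config["availability"]
            - weights["cost"] * config["purchase_cost"]
            - weights["power"] * config["watts"])

candidates = [
    {"iops": 50_000, "availability": 0.999,  "purchase_cost": 120_000, "watts": 900},
    {"iops": 80_000, "availability": 0.9999, "purchase_cost": 200_000, "watts": 1500},
]
weights = {"performance": 1.0, "availability": 50_000.0, "cost": 0.5, "power": 10.0}
best = max(candidates, key=lambda c: provisioning_utility(c, weights))
```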
shopping in supermarkets is becoming an increasingly interactive experience as stores integrate technologies to support shoppers while shopping is an essential and routine type of consumer behaviour emerging technologies possess the qualities to change our behaviour and patterns while shopping this paper describes cast context aware shopping trolley designed to support the shopping activity in supermarket through context awareness and the acquiring of user attention the design is based on understandings of supermarket shopping needs and behaviour derived from previous studies the system supports customers in finding and purchasing products from shopping list field evaluation showed that cast affected the shopping behaviour and experience in several ways eg more uniform behaviour in terms of product sequence collection and ease of finding products however it saved no significant time in the shopping activity
key challenge in web services security is the design of effective access control schemes that can adequately meet the unique security challenges posed by the web services paradigm despite the recent advances in web based access control approaches applicable to web services there remain issues that impede the development of effective access control models for web services environment amongst them are the lack of context aware models for access control and reliance on identity or capability based access control schemes additionally the unique service access control features required in web services technology are not captured in existing schemes in this paper we motivate the design of an access control scheme that addresses these issues and propose an extended trust enhanced version of our xml based role based access control rbac framework that incorporates trust and context into access control we outline the configuration mechanism needed to apply our model to the web services environment and provide service access control specification the paper presents an example service access policy composed using our framework and also describes the implementation architecture for the system
since its introduction answer set programming has been generalized in many directions to cater to the needs of real world applications as one of the most general classical approaches answer sets of arbitrary propositional theories can be defined as models in the equilibrium logic of pearce fuzzy answer set programming on the other hand extends answer set programming with the capability of modeling continuous systems in this paper we combine the expressiveness of both approaches and define answer sets of arbitrary fuzzy propositional theories as models in fuzzification of equilibrium logic we show that the resulting notion of answer set is compatible with existing definitions when the syntactic restrictions of the corresponding approaches are met we furthermore locate the complexity of the main reasoning tasks at the second level of the polynomial hierarchy finally as an illustration of its modeling power we show how fuzzy equilibrium logic can be used to find strong nash equilibria
object motion during camera exposure often leads to noticeable blurring artifacts proper elimination of this blur is challenging because the blur kernel is unknown varies over the image as function of object velocity and destroys high frequencies in the case of motions along direction eg horizontal we show that these challenges can be addressed using camera that moves during the exposure through the analysis of motion blur as space time integration we show that parabolic integration corresponding to constant sensor acceleration leads to motion blur that is invariant to object velocity thus single deconvolution kernel can be used to remove blur and create sharp images of scenes with objects moving at different speeds without requiring any segmentation and without knowledge of the object speeds apart from motion invariance we prove that the derived parabolic motion preserves image frequency content nearly optimally that is while static objects are degraded relative to their image from static camera reliable reconstruction of all moving objects within given velocities range is made possible we have built prototype camera and present successful deblurring results over wide variety of human motions
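The velocity-invariance argument can be checked numerically with a short space-time integration, sketched below: a sensor with parabolic displacement yields nearly the same blur kernel, up to a spatial shift, for objects moving at different speeds well within the swept velocity range. The parameter values are arbitrary illustrations, not the prototype's settings.

```python
import numpy as np

def parabolic_blur_kernel(object_velocity, accel=2.0, exposure=1.0,
                          num_samples=200_000, bins=64, x_range=(-0.6, 0.1)):
    """Approximate the blur point spread function by histogramming the object's
    position relative to a sensor undergoing constant acceleration (parabolic
    displacement) over the exposure."""
    t = np.linspace(-exposure / 2, exposure / 2, num_samples)
    sensor = 0.5 * accel * t ** 2                     # parabolic sensor path
    relative = object_velocity * t - sensor           # object position on the sensor
    relative -= object_velocity ** 2 / (2 * accel)    # remove velocity-dependent shift
    hist, _ = np.histogram(relative, bins=bins, range=x_range, density=True)
    return hist

k_static = parabolic_blur_kernel(0.0)
k_moving = parabolic_blur_kernel(0.3)
# near the kernel peak the two histograms nearly coincide, with differences
# confined to low-amplitude tails, which is why a single deconvolution kernel
# can deblur objects moving at different speeds
```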
recent work has highlighted the importance of the constraint based mining paradigm in the context of frequent itemsets associations correlations sequential patterns and many other interesting patterns in large databases constraint pushing techniques have been developed for mining frequent patterns and associations with antimonotonic monotonic and succinct constraints in this paper we study constraints which cannot be handled with existing theory and techniques in frequent pattern mining for example constraints such as avg(s) θ v, median(s) θ v and sum(s) θ v where s can contain items of arbitrary values, θ ∈ {≤, ≥} and v is a real number are customarily regarded as tough constraints in that they cannot be pushed inside an algorithm such as apriori we develop notion of convertible constraints and systematically analyze classify and characterize this class we also develop techniques which enable them to be readily pushed deep inside the recently developed fp growth algorithm for frequent itemset mining results from our detailed experiments show the effectiveness of the techniques developed
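The key conversion trick, illustrated for the average constraint, is to fix a value-descending item order so the constraint behaves anti-monotonically along the pattern-growth search; the sketch below is an illustrative reading of the pruning test, not the authors' implementation.

```python
def prefix_satisfies_avg(prefix_values, threshold):
    """With items explored in value-descending order, the average over a
    prefix can only decrease as further (smaller-valued) items are appended,
    so avg(s) >= threshold becomes convertible anti-monotonic: a prefix that
    already violates it can be pruned together with all of its extensions."""
    return sum(prefix_values) / len(prefix_values) >= threshold

values = {"a": 9.0, "b": 5.0, "c": 2.0, "d": 1.0}   # value-descending item order
prefix = [values["a"], values["b"], values["c"]]     # avg = 5.33
print(prefix_satisfies_avg(prefix, 6.0))             # False -> prune this branch
```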
software development for sensor network is made difficult by resource constrained sensor devices distributed system complexity communication unreliability and high labor cost simulation as useful tool provides an affordable way to study algorithmic problems with flexibility and controllability however in exchange for speed simulation often trades detail that ultimately limits its utility in this paper we propose new development paradigm simulation based augmented reality in which simulation is used to enhance development on physical hardware by seamlessly integrating running simulated network with physical deployment in way that is transparent to each the advantages of such an augmented network include the ability to study large sensor network with limited hardware and the convenience of studying part of the physical network with simulation’s debugging profiling and tracing capabilities we implement the augmented reality system based on sensor network simulator with high fidelity and high scalability key to the design are super sensor nodes which are half virtual and half physical that interconnect simulation and physical network with fine grained traffic forwarding and accurate time synchronization our results detail the overhead associated with integrating live and simulated networks and the timing accuracy between virtual and physical parts of the network we also discuss various application scenarios for our system
scalable resource discovery services form the core of directory and other middleware services scalability requirements preclude centralized solutions the need to have directory services that are highly robust and that can scale with the number of resources and the performance of individual nodes points to peer to peer pp architectures as promising approach the resource location problem can be simply stated as given resource name find the location of node or nodes that manage the resource we call this the deterministic location problem in very large network it is clearly not feasible to contact all nodes to locate resource therefore we modify the problem statement to given resource name find with given probability the location of node or nodes that manage the resource we call this probabilistic location approach we present protocol that solves this problem and develop an analytical model to compute the probability that directory entry is found the fraction of peers involved in search and the average number of hops required to find directory entry numerical results clearly show that the proposed approach achieves high probability of finding the entry while involving relatively small fraction of the total number of peers the analytical results are further validated by results obtained from an implementation of the proposed protocol in cluster of workstations
this paper focuses on the problem of how to allow source to send message without revealing its physical location and proposes an anti localization routing protocol alar to achieve anonymous delivery in delay disruption tolerant networks the objectives of alar are to minimize the probability of data source being localized and to maximize the destination’s probability of receiving the message alar can protect the sender’s location privacy through message fragmentation and forwarding each segment to different receivers alar is validated on two real world human mobility datasets this study indicates that alar increases the sender’s anonymity performance by over in different adversary densities with reduction in delivery ratio
the increasing number of software based attacks has attracted substantial efforts to prevent applications from malicious interference for example trusted computing tc technologies have been recently proposed to provide strong isolation on application platforms on the other hand today pervasively available computing cycles and data resources have enabled various distributed applications that require collaboration among different application processes these two conflicting trends grow in parallel while much existing research focuses on one of these two aspects few authors have considered simultaneously providing strong isolation as well as collaboration convenience particularly in the tc environment however none of these schemes is transparent that is they require modifications either of legacy applications or the underlying operating system os in this paper we propose the securebus sb architecture aiming to provide strong isolation and flexible controlled information flow and communication between processes at runtime since sb is application and os transparent existing applications can run without changes to commodity os’s furthermore sb enables the enforcement of general access control policies which is required but difficult to achieve for typical legacy applications to study its feasibility and performance overhead we have implemented prototype system based on user mode linux our experimental results show that sb can effectively achieve its design goals
design patterns aim at improving reusability and variability of object oriented software despite notable success aspect oriented programming aop has been discussed recently to improve the design pattern implementations in another line of research it has been noticed that feature oriented programming fop is related closely to aop and that fop suffices in many situations where aop is commonly used in this paper we explore the assumed duality between aop and fop mechanisms as case study we use the aspect oriented design pattern implementations of hannemann and kiczales we observe that almost all of the aspect oriented design pattern implementations can be transformed straightforwardly into equivalent feature oriented design patterns for further investigations we provide set of general rules how to transform aspect oriented programs into feature oriented programs
in this paper we describe set of compiler analyses and an implementation that automatically map sequential and un annotated program into pipelined implementation targeted for an fpga with multiple external memories for this purpose we extend array data flow analysis techniques from parallelizing compilers to identify pipeline stages required inter pipeline stage communication and opportunities to find minimal program execution time by trading communication overhead with the amount of computation overlap in different stages using the results of this analysis we automatically generate application specific pipelined fpga hardware designs we use sample image processing kernel to illustrate these concepts our algorithm finds solution in which transmitting row of an array between pipeline stages per communication instance leads to speedup of over an implementation that communicates the entire array at once
as on line community activities have increased exponentially the need for group recommendation system has also become more and more imperative although the traditional recommendation system has achieved great success in supporting individuals purchasing decisions it is not suitable for supporting group purchasing decisions because its input can neither include items ratings given by groups nor can it generate recommendations for groups therefore this study proposes novel group recommendation system to satisfy this demand the system is designed based on the framework of collaborative filtering especially we use genetic algorithm to predict the possible interactions among group members so that we can correctly estimate the rating that group of members might give to an item the experimental results show that the proposed system can give satisfactory and high quality group recommendations
xml and xquery semantics are very sensitive to the order of the produced output although pattern tree based algebraic approaches are becoming more and more popular for evaluating xml there is no universally accepted technique which can guarantee both correct output order and choice of efficient alternative plans we address the problem using hybrid collections of trees that can be either sets or sequences or something in between each such collection is coupled with an ordering specification that describes how the trees are sorted full partial or no order this provides us with formal basis for developing query plan having parts that maintain no order and parts with partial or full order it turns out that duplicate elimination introduces some of the same issues as order maintenance it is expensive and single collection type does not always provide all the flexibility required to optimize this properly to solve this problem we associate with each hybrid collection duplicate specification that describes the presence or absence of duplicate elements in it we show how to extend an existing bulk tree algebra tlc to use ordering and duplicate specifications and produce correctly ordered results we also suggest some optimizations enabled by the flexibility of our approach and experimentally demonstrate the performance increase due to them
two novel concurrency algorithms for abstract data types are presented that ensure serializability of transactions it is proved that both algorithms ensure local atomicity property called dynamic atomicity the algorithms are quite general permitting operations to be both partial and nondeterministic the results returned by operations can be used in determining conflicts thus allowing higher levels of concurrency than otherwise possible the descriptions and proofs encompass recovery as well as concurrency control the two algorithms use different recovery methods one uses intentions lists and the other uses undo logs it is shown that conflict relations that work with one recovery method do not necessarily work with the other general correctness condition that must be satisfied by the combination of recovery method and conflict relation is identified
application development for distributed computing grids can benefit from tools that variously hide or enable application level management of critical aspects of the heterogeneous environment as part of an investigation of these issues we have developed mpich grid enabled implementation of the message passing interface mpi that allows user to run mpi programs across multiple computers at the same or different sites using the same commands that would be used on parallel computer this library extends the argonne mpich implementation of mpi to use services provided by the globus toolkit for authentication authorization resource allocation executable staging and as well as for process creation monitoring and control various performance critical operations including startup and collective operations are configured to exploit network topology information the library also exploits mpi constructs for performance management for example the mpi communicator construct is used for application level discovery of and adaptation to both network topology and network quality of service mechanisms we describe the mpich design and implementation present performance results and review application experiences including record setting distributed simulations
we investigate single view algorithms as an alternative to multi view algorithms for weakly supervised learning for natural language processing tasks without natural feature split in particular we apply co training self training and em to one such task and find that both self training and fs em new variation of em that incorporates feature selection outperform co training and are comparatively less sensitive to parameter changes
population protocols are model presented recently for networks with very large possibly unknown number of mobile agents having small memory this model has certain advantages over alternative models such as dtn for such networks however it was shown that the computational power of this model is limited to semi linear predicates only hence various extensions were suggested we present model that enhances the original model of population protocols by introducing weak notion of speed of the agents this enhancement allows us to design fast converging protocols with only weak requirements for example suppose that there are different types of agents say agents attached to sick animals and to healthy animals two meeting agents just need to be able to estimate which of them is faster eg using their types but not to actually know the speeds of their types then using the new model we study the gathering problem in which there is an unknown number of anonymous agents that have values they should deliver to base station without replications we develop efficient protocols step by step searching for an optimal solution and adapting to the size of the available memory the protocols are simple though their analysis is somewhat involved we also present more involved result lower bound on the length of the worst execution for any protocol our proofs introduce several techniques that may prove useful also in future studies of time in population protocols
event detection is an essential application in wireless sensor networks wsns especially for the monitoring of physical world while most previous research focuses on the detection of local events in this paper we propose novel algorithms to energy efficiently detect events of large and global scale we divide global event into regional events so the detection algorithm can be executed distributively inside the network our approach also takes advantage of the temporal correlations of sensing data to gain energy efficiency it uses bound suppression mechanism to set bounds and suppresses silent regional events cutting down the transmissions simulation results show that the proposed geda continuous global event detection algorithm is efficient to reduce the cost of transmissions and it gains more than of cost reduction compared with the previous detection approaches
new generations of embedded devices following the trend found in personal computers are becoming computationally powerful current embedded scenario presents large amount of complex and heterogeneous functionalities which have been forcing designers to create novel solutions to increase the performance of embedded processors while at the same time maintain power dissipation as low as possible former embedded devices could have been designed to execute defined application set nowadays in the new generation of these devices some applications are unknown at design time for example in portable phones the client is able to download new applications during the product lifetime hence traditional designs can fail to deliver the required performance while executing an application behavior that has not been previously defined on the other hand reconfigurable architectures appear to be possible solution to increase the processor performance but their employment in embedded devices faces two main design constraints power and area in this work we propose an asip reconfigurable development flow that aggregates design area optimization and run time technique that reduces energy consumption the coupling of both methods builds an area optimized reconfigurable architecture to provide high performance and energy efficient execution of defined application set moreover thanks to the adaptability provided by the reconfigurable asip approach the execution of new application not foreseen at design time still shows high speedups rates with low energy consumption
two major trends in the digital design industry are the increase in system complexity and the increasing importance of short design times the rise in design complexity is motivated by consumer demand for higher performance products as well as increases in integration density which allow more functionality to be placed on a single chip consequence of this rise in complexity is significant increase in the amount of simulation required to design digital systems simulation time typically scales as the square of the increase in system complexity short design times are important because once design has been conceived there is limited time window in which to bring the system to market while its performance is competitive simulation serves many purposes during the design cycle of digital system in the early stages of design high level simulation is used for performance prediction and analysis in the middle of the design cycle simulation is used to develop the software algorithms and refine the hardware in the later stages of design simulation is used to make sure performance targets are reached and to verify the correctness of the hardware and software the different simulation objectives require varying levels of modeling detail to keep design time to minimum it is critical to structure the simulation environment to make it possible to trade off simulation performance for model detail in flexible manner that allows concurrent hardware and software development in this paper we describe the different simulation methodologies for developing complex digital systems and give examples of one such simulation environment the rest of this paper is organized as follows in section we describe and classify the various simulation methodologies that are used in digital system design and describe how they are used in the various stages of the design cycle in section we provide examples of the methodologies we describe a sophisticated simulation environment used to develop large asic for the stanford flash multiprocessor
this paper provides survey of simple network management protocol snmp related performance studies over the last years variety of such studies have been published performance benchmarking of snmp like all benchmarking studies is non trivial task that requires substantial effort to be performed well and achieve its purpose in many cases existing studies have employed different techniques metrics scenarios and parameters the reason for this diversity is the absence of common framework for snmp performance analysis without such framework results of snmp related performance studies cannot easily be compared extended or reused it is therefore important to start research activity to define such framework such research activity should start from analysing previous studies on this topic to reveal their employed methods in this survey we examine these studies by classifying and discussing them we present techniques approaches and metrics employed by these studies to quantify the performance of snmp based applications
understanding the relationship among different distance measures is helpful in choosing proper one for particular application in this paper we compare two commonly used distance measures in vector models namely euclidean distance eud and cosine angle distance cad for nearest neighbor nn queries in high dimensional data spaces using theoretical analysis and experimental results we show that the retrieval results based on eud are similar to those based on cad when dimension is high we have applied cad for content based image retrieval cbir retrieval results show that cad works no worse than eud which is commonly used distance measure for cbir while providing other advantages such as naturally normalized distance
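a small self-contained experiment in the spirit of the comparison above (not the paper's setup; the data here is uniform random and real image features will behave differently) that measures how much the top-k nearest-neighbour sets under euclidean distance and cosine angle distance overlap as dimensionality grows:

    import numpy as np

    def knn_overlap(dim, n=2000, k=10, seed=0):
        rng = np.random.default_rng(seed)
        data = rng.random((n, dim))          # hypothetical feature vectors in [0,1]^dim
        query = rng.random(dim)
        eud = np.linalg.norm(data - query, axis=1)
        cad = 1.0 - (data @ query) / (np.linalg.norm(data, axis=1) * np.linalg.norm(query))
        top_eud = set(np.argsort(eud)[:k])
        top_cad = set(np.argsort(cad)[:k])
        return len(top_eud & top_cad) / k    # fraction of shared top-k neighbours

    for d in (2, 16, 128, 1024):
        print(d, knn_overlap(d))             # overlap tends to rise with the dimension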
we suggest modeling software package flaws bugs by assuming eventual byzantine behavior of the package we assume that if program is started in predefined initial state it will exhibit legal behavior for period of time but will eventually become byzantine we assume that this behavior pattern can be attributed to the fact that the manufacturer had performed sufficient package tests for limited time scenarios restarts are useful for recovering such systems we suggest general yet practical framework and paradigm for the monitoring and restarting of systems where the framework and paradigm are based on theoretical foundation an autonomic recoverer that monitors and initiates system recovery is proposed it is designed to handle task given specific task requirements in the form of predicates and actions directed acyclic graph subsystem hierarchical structure is used by consistency monitoring procedure for achieving gracious recovery the existence and correct functionality of the autonomic recovery is guaranteed by the use of self stabilizing kernel resident anchor process the autonomic recoverer uses new scheme for liveness assurance via on line monitoring that complements known schemes for on line safety assurance
constraint based rule miners find all rules in given data set meeting user specified constraints such as minimum support and confidence we describe new algorithm that directly exploits all user specified constraints including minimum support minimum confidence and new constraint that ensures every mined rule offers predictive advantage over any of its simplifications our algorithm maintains efficiency even at low supports on data that is dense eg relational tables previous approaches such as apriori and its variants exploit only the minimum support constraint and as result are ineffective on dense data due to combinatorial explosion of "frequent itemsets"
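the third constraint can be made concrete with a small check (a naive illustration of the idea, not the mining algorithm itself; the transactions and the min_imp margin are made up): a rule antecedent → target is kept only if its confidence beats every simplification obtained by dropping antecedent items, including the empty antecedent.

    from itertools import combinations

    def confidence(antecedent, target, transactions):
        covered = [t for t in transactions if antecedent <= t]
        if not covered:
            return 0.0
        return sum(1 for t in covered if target in t) / len(covered)

    def has_predictive_advantage(antecedent, target, transactions, min_imp=0.0):
        conf = confidence(antecedent, target, transactions)
        for r in range(len(antecedent)):                 # all proper subsets, empty set included
            for sub in combinations(antecedent, r):
                if confidence(frozenset(sub), target, transactions) + min_imp >= conf:
                    return False                         # a simpler rule already does as well
        return True

    # toy transactions (hypothetical): c co-occurs with the conjunction a and b
    T = [frozenset(t) for t in (["a", "b", "c"], ["a", "b", "c"], ["a", "d"],
                                ["b", "d"], ["a", "b", "c"], ["d"])]
    print(has_predictive_advantage(frozenset({"a", "b"}), "c", T, min_imp=0.1))   # True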
this article introduces method for partial matching of surfaces represented by triangular meshes our method matches surface regions that are numerically and topologically dissimilar but approximately similar regions we introduce novel local surface descriptors which efficiently represent the geometry of local regions of the surface the descriptors are defined independently of the underlying triangulation and form compatible representation that allows matching of surfaces with different triangulations to cope with the combinatorial complexity of partial matching of large meshes we introduce the abstraction of salient geometric features and present method to construct them salient geometric feature is compound high level feature of nontrivial local shapes we show that relatively small number of such salient geometric features characterizes the surface well for various similarity applications matching salient geometric features is based on indexing rotation invariant features and voting scheme accelerated by geometric hashing we demonstrate the effectiveness of our method with number of applications such as computing self similarity alignments and subparts similarity
this paper describes flexible and easily extensible predicate abstraction based approach to the verification of stl usage and observes the advantages of verifying programs in terms of high level data structures rather than low level pointer manipulations we formalize the semantics of the stl by means of hoare style axiomatization the verification requires an operational model conservatively approximating the semantics given by the standard our results show advantages in terms of errors detected and false positives avoided over previous attempts to analyze stl usage due to the power of the abstraction engine and model checker
computation performed in many typical aspects involve side effects in purely functional setting adding such aspects using techniques such as monadification will generally lead to crosscutting changes this paper presents an approach to provide side effecting aspects for purely lazy functional languages in user transparent fashion we propose simple yet direct state manipulation construct for developing side effecting aspects and devise systematic monadification scheme to translate the woven code to purely monadic style functional code
debugging data races in parallel applications is difficult task error causing data races may appear to vanish due to changes in an application’s optimization level thread scheduling whether or not debugger is used and other effects further many race conditions cause incorrect program behavior only in rare scenarios and may lie undetected during software testing tools exist today that do decent job in finding data races in multi threaded applications some data race detection tools are very efficient and can detect data races with less than performance penalty most such tools however do not provide enough information to the user require recompilation or impose other usage restrictions other tools such as the one considered in this paper intel’s thread checker provide users with plenty of useful information and can be used with any application binary but have high overheads often over it is the goal of this paper to speed up thread checker by filtering out the vast majority of memory references that are highly unlikely to be involved in data races in our work we develop filters that filter of all memory references from the data race detection algorithm resulting in speedups of with an average improvement of
this paper compares two possible implementations of multithreaded architecture and proposes new architecture combining the flexibility of the first with the low hardware complexity of the second we present performance and step by step complexity analysis of two design alternatives of multithreaded architecture dynamic inter thread resource scheduling and static resource allocation we then introduce new multithreaded architecture based on new scheduling mechanism called the "semi static" we show that with two concurrent threads the dynamic scheduling processor achieves from to higher performance at the cost of much more complicated design this paper indicates that for relatively high number of execution resources the complexity of the dynamic scheduling logic will inevitably require design compromises moreover high chip wide communication time and an incomplete bypassing network will limit the dynamic scheduling and reduce its performance advantage on the other hand static scheduling architecture achieves low resource utilization the semi static architecture utilizes compiler techniques to exploit patterns of program parallelism and introduces new hardware mechanism in order to achieve performance close to dynamic scheduling without significantly increasing the static hardware complexity the semi static architecture statically assigns part of the functional units but dynamically schedules the most performance critical functional units on medium grain basis
we introduce an abductive method for coherent integration of independent datasources the idea is to compute list of data facts that should be inserted to the amalgamated database or retracted from it in order to restore its consistency this method is implemented by an abductive solver called system that applies sldnfa resolution on meta theory that relates different possibly contradicting input databases we also give pure model theoretic analysis of the possible ways to recover consistent data from an inconsistent database in terms of those models of the database that exhibit as minimal inconsistent information as reasonably possible this allows us to characterize the recovered databases in terms of the preferred ie most consistent models of the theory the outcome is an abductive based application that is sound and complete with respect to corresponding model based preferential semantics and to the best of our knowledge is more expressive thus more general than any other implementation of coherent integration of databases
this paper describes methodology for the development of www applications and tool environment specifically tailored for the methodology the methodology and the development environment are based upon models and techniques already used in the hypermedia information systems and software engineering fields adapted and blended in an original mix the foundation of the proposal is the conceptual design of www applications using hdm lite notation for the specification of structure navigation and presentation semantics the conceptual schema is then translated into "traditional" database schema which describes both the organization of the content and the desired navigation and presentation features the www pages can therefore be dynamically generated from the database content following the navigation requests of the user case environment called autoweb system offers set of software tools which assist the design and the execution of www application in all its different aspects real life experiences of the use of the methodology and of the autoweb system in both the industrial and academic context are reported
swift is new principled approach to building web applications that are secure by construction in modern web applications some application functionality is usually implemented as client side code written in javascript moving code and data to the client can create security vulnerabilities but currently there are no good methods for deciding when it is secure to do so swift automatically partitions application code while providing assurance that the resulting placement is secure and efficient application code is written as java like code annotated with information flow policies that specify the confidentiality and integrity of web application information the compiler uses these policies to automatically partition the program into javascript code running in the browser and java code running on the server to improve interactive performance code and data are placed on the client side however security critical code and data are always placed on the server code and data can also be replicated across the client and server to obtain both security and performance max flow algorithm is used to place code and data in way that minimizes client server communication
we explore the use of signatures ie partial truth tables generated via bit parallel functional simulation during soft error analysis and logic synthesis we first present signature based cad framework that incorporates tools for the logic level analysis of soft error rate and for signature based design for reliability sider we observe that the soft error rate ser of logic circuit is closely related to various testability parameters such as signal observability and probability we show that these parameters can be computed very efficiently in linear time by means of signatures consequently anser evaluates logic masking two to three orders of magnitude faster than other ser evaluators while maintaining accuracy anser can also compute ser efficiently in sequential circuits by approximating steady state probabilities and sequential signal observabilities in the second part of this paper we incorporate anser into logic synthesis design flows aimed at reliable circuit design sider identifies and exploits redundancy already present in circuit via signature comparison to decrease ser we show that sider reduces ser by with only area overhead we also describe second signature based synthesis strategy that employs local rewriting to simultaneously improve area and decrease ser this technique yields reduction in ser with area decrease we show that combining the two synthesis approaches can result in further area reliability improvements
with the increasing occurrence of temporal and spatial data in present day database applications the interval data type is adopted by more and more database systems for an efficient support of queries that contain selections on interval attributes as well as simple valued attributes eg numbers strings at the same time special index structures are required supporting both types of predicates in combination based on the relational interval tree we present various indexing schemes that support such combined queries and can be integrated in relational database systems with minimum effort experiments on different query types show superior performance for the new techniques in comparison to competing access methods
phase change memory pcm is an emerging memory technology that can increase main memory capacity in cost effective and power efficient manner however pcm cells can endure only maximum of writes making pcm based system have lifetime of only few years under ideal conditions furthermore we show that non uniformity in writes to different cells reduces the achievable lifetime of pcm system by writes to pcm cells can be made uniform with wear leveling unfortunately existing wear leveling techniques require large storage tables and indirection resulting in significant area and latency overheads we propose start gap simple novel and effective wear leveling technique that uses only two registers by combining start gap with simple address space randomization techniques we show that the achievable lifetime of the baseline gb pcm based system is boosted from with no wear leveling to of the theoretical maximum while incurring total storage overhead of less than bytes and obviating the latency overhead of accessing large tables we also analyze the security vulnerabilities for memory systems that have limited write endurance showing that under adversarial settings pcm based system can fail in less than one minute we provide simple extension to start gap that makes pcm based systems robust to such malicious attacks
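a minimal sketch of a start-gap style remapping, assuming n logical lines backed by n+1 physical lines and a gap line that advances by one position every psi writes (the two registers and the update rule below are my reconstruction of the general idea, and the address-space randomization layer is omitted):

    class StartGap:
        def __init__(self, n, psi=100):
            self.n, self.psi = n, psi
            self.start, self.gap = 0, n          # the two registers
            self.phys = [None] * (n + 1)         # physical line n begins as the gap
            self.writes = 0

        def _pa(self, la):                       # logical -> physical address
            pa = (la + self.start) % self.n
            return pa + 1 if pa >= self.gap else pa

        def write(self, la, value):
            self.phys[self._pa(la)] = value
            self.writes += 1
            if self.writes % self.psi == 0:
                self._move_gap()

        def read(self, la):
            return self.phys[self._pa(la)]

        def _move_gap(self):
            if self.gap == 0:                    # gap wraps from the top back to the bottom
                self.phys[0] = self.phys[self.n]
                self.gap, self.start = self.n, (self.start + 1) % self.n
            else:                                # shift the neighbouring line into the gap
                self.phys[self.gap] = self.phys[self.gap - 1]
                self.gap -= 1

    # sanity check: data stays addressable while the mapping slowly rotates
    m = StartGap(n=8, psi=1)
    for i in range(8):
        m.write(i, f"line{i}")
    for _ in range(100):
        m.write(3, "hot")                        # hammer one logical line
    assert all(m.read(i) == f"line{i}" for i in range(8) if i != 3)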
this paper proposes new lossy to lossless progressive compression scheme for triangular meshes based on wavelet multiresolution theory for irregular meshes although remeshing techniques obtain better compression ratios for geometric compression this approach can be very effective when one wants to keep the connectivity and geometry of the processed mesh completely unchanged the simplification is based on the solving of an inverse problem optimization of both the connectivity and geometry of the processed mesh improves the approximation quality and the compression ratio of the scheme at each resolution level we show why this algorithm provides an efficient means of compression for both connectivity and geometry of meshes and it is illustrated by experimental results on various sets of reference meshes where our algorithm performs better than previously published approaches for both lossless and progressive compression
storage plays pivotal role in the performance of many applications optimizing disk architectures is design time as well as run time issue and requires balancing between performance power and capacity the design space is large and there are many knobs that can be used to optimize disk drive behavior here we present sensitivity based optimization for disk architectures soda which leverages results from digital circuit design using detailed models of the electro mechanical behavior of disk drives and suite of realistic workloads we show how soda can aid in design and runtime optimization
classification rules are convenient method of expressing regularities that exist within databases they are particularly useful when we wish to find patterns that describe defined class of interest ie for the task of partial classification or nugget discovery in this paper we address the problems of finding classification rules from databases containing nominal and ordinal attributes the number of rules that can be formulated from database is usually potentially vast due to the effect of combinatorial explosion this means that generating all rules in order to find the best rules according to some stated criteria is usually impractical and alternative strategies must be used in this paper we present an algorithm that delivers clearly defined set of rules the pc optimal set this set describes the interesting associations in database but excludes many rules that are simply minor variations of other rules the algorithm addresses the problems of combinatorial explosion and is capable of finding rules from databases comprising nominal and ordinal attributes in order to find the pc optimal set efficiently novel pruning functions are used in the search that take advantage of the properties of the pc optimal set our main contribution is method of on the fly pruning based on exploiting the relationship between pc optimal sets and ordinal data we show that using these methods results in very considerable increase in efficiency allowing the discovery of useful rules from many databases
as design complexities and circuit densities are increasing the detailed routing dr problem is becoming more and more challenging problem due to the high complexity of dr algorithms it is very important to start the routing process with clean solutions rather than starting with suboptimal routes and trying to fix them in an iterative process in this paper we propose an escape routing algorithm that can optimize routing of set of nets around their terminals for this we first propose polynomial time algorithm that guarantees to find the optimal escape routing solution for set of nets when the track structures are uniform then we use this algorithm as baseline and study the general problem with arbitrary track structures for this we propose novel multicommodity flow mcf model that has one to one correspondence with the escape routing problem this mcf model is novel in the sense that the interdependence and contention between different flow commodities is minimal using this model we propose lagrangian relaxation based algorithm to solve the escape problem our experiments demonstrate that this algorithm improves the overall routability significantly by reducing the number of nets that require rip up and reroute
we enhance the narrowing driven partial evaluation scheme for lazy functional logic programs with the computation of symbolic costs the enhanced scheme allows us to estimate the effects of the program transformer in precise framework and moreover to quantify these effects the considered costs are "symbolic" in the sense that they measure the number of basic operations performed during computation rather than actual execution times our scheme may serve as basis to develop speedup analyses and cost guided transformers cost augmented partial evaluator which demonstrates the usefulness of our approach has been implemented in the multi paradigm language curry
the dynamic execution layer interface dell offers the following unique capability it provides fine grain control over the execution of programs by allowing its clients to observe and optionally manipulate every single instruction at run time just before it runs dell accomplishes this by opening up an interface to the layer between the execution of software and hardware to avoid the slowdown dell caches private copy of the executed code and always runs out of its own private cache in addition to giving powerful control to clients dell opens up caching and linking to ordinary emulators and just in time compilers which then get the reuse benefits of the same mechanism for example emulators themselves can also use other clients to mix emulation with already existing services native code and other emulators this paper describes the basic aspects of dell including the underlying caching and linking mechanism the hardware abstraction mechanism ham the binary level translation blt infrastructure and the application programming interface api exposed to the clients we also cover some of the services that clients could offer through the dell such as isa emulation software patching and sandboxing finally we consider case study of emulation in detail the emulation of pocketpc system on the lx st embedded vliw processor in this case dell enables us to achieve near native performance and to mix and match native and emulated code
in this article face recognition algorithm aimed at mimicking the human ability to differentiate people is proposed for each individual we first compute projection line that maximizes his or her dissimilarity to all other people in the user database facial identity is thus encoded in the dissimilarity pattern composed by all the projection coefficients of an individual against all other enrolled user identities facial recognition is achieved by calculating the dissimilarity pattern of an unknown individual with that of each enrolled user as the proposed algorithm is composed of different one dimensional projection lines it easily allows adding or removing users by simply adding or removing the corresponding projection lines in the system ideally to minimize the influence of these additions removals the user group should be representative enough of the general population experiments on three widely used databases xmvts ar and equinox show consistently good results the proposed algorithm achieves equal error rate eer and half total error rate hter values in the ranges of and respectively our approach yields results comparable to the top two winners in recent contests reported in the literature
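one plausible instantiation of the per-user projection idea (a regularized one-vs-rest fisher-style direction; the paper's exact dissimilarity criterion, features and data are not reproduced here, and the toy "faces" below are synthetic):

    import numpy as np

    def enroll(features_by_user, reg=1e-3):
        """features_by_user: list of (n_i, d) arrays, one per enrolled user (hypothetical data)."""
        d = features_by_user[0].shape[1]
        lines = []
        for i, feats in enumerate(features_by_user):
            rest = np.vstack([f for j, f in enumerate(features_by_user) if j != i])
            sw = np.cov(feats, rowvar=False) + np.cov(rest, rowvar=False) + reg * np.eye(d)
            w = np.linalg.solve(sw, feats.mean(0) - rest.mean(0))   # one-vs-rest direction
            lines.append(w / np.linalg.norm(w))
        return np.array(lines)                       # one projection line per user

    def dissimilarity_pattern(lines, x):
        return lines @ x                             # probe projected onto every user's line

    def identify(lines, gallery_patterns, x):
        p = dissimilarity_pattern(lines, x)
        return int(np.argmin(np.linalg.norm(gallery_patterns - p, axis=1)))

    # toy usage with random feature vectors standing in for face features
    rng = np.random.default_rng(0)
    users = [rng.normal(loc=i, scale=1.0, size=(20, 16)) for i in range(5)]
    L = enroll(users)
    G = np.array([dissimilarity_pattern(L, u.mean(0)) for u in users])
    probe = users[3][0]
    print(identify(L, G, probe))                     # should print 3 for this toy data

adding or removing a user only adds or removes the corresponding row of L, which mirrors the enrolment flexibility described in the abstract.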
various query methods for video search exist because of the semantic gap each method has its limitations we argue that for effective retrieval query methods need to be combined at retrieval time however switching query methods often involves change in query and browsing interface which puts heavy burden on the user in this paper we propose novel method for fast and effective search through large video collections by embedding multiple query methods into single browsing environment to that end we introduced the notion of query threads which contain shot based ranking of the video collection according to some feature based similarity measure on top of these threads we define several thread based visualizations ranging from fast targeted search to very broad exploratory search with the fork browser as the balance between fast search and video space exploration we compare the effectiveness and efficiency of the forkbrowser with the crossbrowser on the trecvid interactive search task results show that different query methods are needed for different types of search topics and that the forkbrowser requires significantly fewer user interactions to achieve the same result as the crossbrowser in addition both browsers rank among the best interactive retrieval systems currently available
in this paper we propose methodological approach for the model driven development of secure xml databases db this proposal is within the framework of midas model driven methodology for the development of web information systems based on the model driven architecture mda proposed by the object management group omg the xml db development process in midas proposes using the data conceptual model as platform independent model pim and the xml schema model as platform specific model psm with both of these represented in uml in this work such models will be modified so as to be able to add security aspects if the stored information is considered as critical on the one hand the use of uml extension to incorporate security aspects at the conceptual level of secure db development pim is proposed on the other the previously defined xml schema profile will be modified the purpose being to incorporate security aspects at the logical level of the secure xml db development psm in addition to all this the semi automatic mappings from pim to psm for secure xml db will be defined
in glueless shared memory multiprocessors where cache coherence is usually maintained using directory based protocol the fast access to the on chip components caches and network router among others contrasts with the much slower main memory unfortunately directory based protocols need to obtain the sharing status of every memory block before coherence actions can be performed this information has traditionally been stored in main memory and therefore these cache coherence protocols are far from being optimal in this work we propose two alternative designs for the last level private cache of glueless shared memory multiprocessors the lightweight directory and the sglum cache our proposals completely remove directory information from main memory and store it in the home node’s cache thus reducing both the number of accesses to main memory and the directory memory overhead the main characteristics of the lightweight directory are its simplicity and the significant improvement in the execution time for most applications its drawback however is that the performance of some particular applications could be degraded on the other hand the sglum cache offers more modest improvements in execution time for all the applications by adding some extra structures that cope with the cases in which the lightweight directory fails
rule based optimizers and optimizer generators use rules to specify query transformations rules act directly on query representations which typically are based on query algebras but most algebras complicate rule formulation and rules over these algebras must often resort to calling to externally defined bodies of code code makes rules difficult to formulate prove correct and reason about and therefore compromises the effectiveness of rule based systems in this paper we present kola combinator based algebra designed to simplify rule formulation kola is not user language and kola’s variable free queries are difficult for humans to read but kola is an effective internal algebra because its combinator style makes queries manipulable and structurally revealing as result rules over kola queries are easily expressed without the need for supplemental code we illustrate this point first by showing some transformations that despite their simplicity require head and body routines when expressed over algebras that include variables we show that these transformations are expressible without supplemental routines in kola we then show complex transformations of class of nested queries expressed over kola nested query optimization while having been studied before has seriously challenged the rule based paradigm
presently massively parallel processors mpps are available only in few commercial models sequence of three asci teraflops mpps has appeared before the new millennium this paper evaluates six mpp systems through stap benchmark experiments the stap is radar signal processing benchmark which exploits regularly structured spmd data parallelism we reveal the resource scaling effects on mpp performance along orthogonal dimensions of machine size processor speed memory capacity messaging latency and network bandwidth we show how to achieve balanced resources scaling against enlarged workload problem size among three commercial mpps the ibm sp shows the highest speed and efficiency attributed to its well designed network with middleware support for single system image the cray td demonstrates high network bandwidth with good numa memory hierarchy the intel paragon trails far behind due to slow processors used and excessive latency experienced in passing messages our analysis projects the lowest stap speed on the asci red compared with the projected speed of two asci blue machines this is attributed to slow processors used in asci red and the mismatch between its hardware and software the blue pacific shows the highest potential to deliver scalable performance up to thousands of nodes the blue mountain is designed to have the highest network bandwidth our results suggest limit on the scalability of the distributed shared memory dsm architecture adopted in blue mountain the scaling model offers quantitative method to match resource scaling with problem scaling to yield truly scalable performance the model helps mpp designers optimize the processors memory network and subsystems of an mpp for mpp users the scaling results can be applied to partition large workload for spmd execution or to minimize the software overhead in collective communication or remote memory update operations finally our scaling model is assessed to evaluate mpps with benchmarks other than stap
the emergence of monitoring applications has precipitated the need for data stream management systems dsmss which constantly monitor incoming data feeds through registered continuous queries in order to detect events of interest in this article we examine the problem of how to schedule multiple continuous queries cqs in dsms to optimize different quality of service qos metrics we show that unlike traditional online systems scheduling policies in dsmss that optimize for average response time will be different from policies that optimize for average slowdown which is more appropriate metric to use in the presence of heterogeneous workload towards this we propose policies to optimize for the average case performance for both metrics additionally we propose hybrid scheduling policy that strikes fine balance between performance and fairness by looking at both the average and worst case performance for both metrics we also show how our policies can be adaptive enough to handle the inherent dynamic nature of monitoring applications furthermore we discuss how our policies can be efficiently implemented and extended to exploit sharing in optimized multi query plans and multi stream cqs finally we experimentally show using real data that our policies consistently outperform currently used ones
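the difference between the two metrics is easy to state: response time is completion minus arrival, while slowdown divides that by the tuple's ideal processing cost, so waiting is weighed relative to how cheap the work was (the numbers below are made up):

    def response_time(arrival, completion):
        return completion - arrival

    def slowdown(arrival, completion, ideal_cost):
        return (completion - arrival) / ideal_cost

    # a cheap tuple that waited long versus an expensive tuple served promptly
    print(response_time(0.0, 5.0), slowdown(0.0, 5.0, ideal_cost=0.5))   # 5.0, 10.0
    print(response_time(0.0, 6.0), slowdown(0.0, 6.0, ideal_cost=5.0))   # 6.0, 1.2

the first tuple has the better response time but the far worse slowdown, which is why a policy optimizing one metric need not optimize the other under a heterogeneous workload.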
growing importance of distributed data mining techniques has recently attracted attention of researchers in multiagent domain in this paper we present novel framework multiagent learning framework malef designed for both the agent based distributed machine learning as well as data mining proposed framework is based on the exchange of meta level descriptions of individual learning process online reasoning about learning success and learning progress this paper illustrates how malef framework can be used in practical system in which different learners use different datasets hypotheses and learning algorithms we describe our experimental results obtained using this system and review related work on the subject
we present technique for using infeasible program paths to automatically infer range predicates that describe properties of unbounded array segments first we build proofs showing the infeasibility of the paths using axioms that precisely encode the high level but informal rules with which programmers reason about arrays next we mine the proofs for craig interpolants which correspond to predicates that refute the particular counterexample path by embedding the predicate inference technique within counterexample guided abstraction refinement cegar loop we obtain method for verifying data sensitive safety properties whose precision is tailored in program and property sensitive manner though the axioms used are simple we show that the method suffices to prove variety of array manipulating programs that were previously beyond automatic model checkers
object oriented programming promotes reuse of classes in multiple contexts thus class is designed and implemented with several usage scenarios in mind some of which possibly open and generic correspondingly the unit testing of classes cannot make too strict assumptions on the actual method invocation sequences since these vary from application to application in this paper genetic algorithm is exploited to automatically produce test cases for the unit testing of classes in generic usage scenario test cases are described by chromosomes which include information on which objects to create which methods to invoke and which values to use as inputs the proposed algorithm mutates them with the aim of maximizing given coverage measure the implementation of the algorithm and its application to classes from the java standard library are described
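a condensed sketch of the chromosome and mutation idea (the actual tool targets java classes and measures structural coverage via instrumentation; here the class under test is a plain python list and the coverage proxy simply counts distinct method/outcome pairs, all of which are my simplifications):

    import random

    METHODS = [("append", 1), ("pop", 0), ("remove", 1), ("index", 1)]   # (name, arity)

    def random_gene(rng):
        return (rng.randrange(len(METHODS)), rng.randint(-5, 5))         # (method, input value)

    def random_chromosome(rng, length=6):
        return [random_gene(rng) for _ in range(length)]

    def mutate(chrom, rng, p=0.2):
        return [random_gene(rng) if rng.random() < p else g for g in chrom]

    def fitness(chrom):
        obj, covered = [], set()                     # the object under test
        for m, arg in chrom:
            name, arity = METHODS[m]
            try:
                getattr(obj, name)(*([arg] if arity else []))
                covered.add((name, "ok"))
            except Exception as exc:                 # exceptional behaviours count as coverage too
                covered.add((name, type(exc).__name__))
        return len(covered)

    rng = random.Random(0)
    pop = [random_chromosome(rng) for _ in range(20)]
    for _ in range(30):                              # simple truncation-selection loop
        pop.sort(key=fitness, reverse=True)
        pop = pop[:10] + [mutate(c, rng) for c in pop[:10]]
    print([(METHODS[m][0], a) for m, a in pop[0]], fitness(pop[0]))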
recent research on multi party communications shows how multi agent communications can take advantage of the complexity of the human communication process the salient point is the very nature of the communication channels which enable humans to focus their attention on ambient communications as well as to direct their own communications for multi agent systems the difficulty is the routing of messages according to both the needs of the sender and the needs of the potential recipients this difficulty is compounded by the necessity of taking into account the context of this communication this article proposes an architecture for the environment as active support for interaction model easi which is based on classification data model and supports multiparty communication our proposition has been implemented and the functional description of the environment is given
local image descriptor robust to the common photometric transformations blur illumination noise and jpeg compression and geometric transformations rotation scaling translation and viewpoint is crucial to many image understanding and computer vision applications in this paper the representation and matching power of region descriptors are to be evaluated common set of elliptical interest regions is used to evaluate the performance the elliptical regions are further normalized to be circular with fixed size the normalized circular regions will become affine invariant up to rotational ambiguity here new distinctive image descriptor to represent the normalized region is proposed which primarily comprises the zernike moment zm phase information an accurate and robust estimation of the rotation angle between pair of normalized regions is then described and used to measure the similarity between two matching regions the discriminative power of the new zm phase descriptor is compared with five major existing region descriptors sift gloh pca sift complex moments and steerable filters based on the precision recall criterion the experimental results involving more than million region pairs indicate the proposed zm phase descriptor has generally speaking the best performance under the common photometric and geometric transformations both quantitative and qualitative analyses on the descriptor performances are given to account for the performance discrepancy first the key factor for its striking performance is that the zm phase provides an accurate estimate of the rotation angle between two matching regions second the feature dimensionality and feature orthogonality also affect the descriptor performance third the zm phase is more robust under the nonuniform image intensity fluctuation finally time complexity analysis is provided
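the rotation behaviour the descriptor exploits can be stated compactly (this is the standard zernike moment property, not a transcription of the paper's estimator): if f'(\rho, \phi) = f(\rho, \phi - \theta) is a rotated copy of a normalized region, its zernike moments satisfy

    Z'_{nm} = Z_{nm} \, e^{-j m \theta}

so the magnitudes are rotation invariant while the phases shift linearly in the repetition m, and the relative rotation between two matching regions can be estimated from phase differences, e.g. \hat{\theta} = \frac{1}{m}\left(\arg Z_{nm} - \arg Z'_{nm}\right) \pmod{2\pi/m}, combined over several moments for robustness.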
oop style requires programmers to organize their code according to objects or nouns using natural language as metaphor causing program’s actions verbs to become scattered during implementation we define an action oriented identifier graph aoig to reconnect the scattered actions in an oop system an oop system with an aoig will essentially support the dynamic virtual remodularization of oop code into an action oriented view we have developed an algorithm to automatically construct an aoig and an implementation of the construction process to automatically construct an aoig we use natural language processing nlp techniques to process the natural language clues left by programmers in source code and comments and we connect code segments through the actions that they perform using reasonably sized program we present several applications of an aoig feature location working set recovery and aspect mining which demonstrate how the aoig can be used by software engineering tools to combat the tyranny of the dominant decomposition
we present method for parameterizing subdivision surfaces in an as rigid as possible fashion while much work has concentrated on parameterizing polygon meshes little if any work has focused on subdivision surfaces despite their popularity we show that polygon parameterization methods produce suboptimal results when applied to subdivision surfaces and describe how these methods may be modified to operate on subdivision surfaces we also describe method for creating extended charts to further reduce the distortion of the parameterization finally we demonstrate how to take advantage of the multi resolution structure of subdivision surfaces to accelerate convergence of our optimization
in business creating common concept set for business integration interoperation and interaction has to consider the heterogeneity reality of different interpretations from multiple concept providers maintaining semantic consistency between multiple concept providers is difficult problem to solve this problem this paper first reviewed the existing technologies of collaborative editing systems and consistency maintenance in the areas of both cscw and business based on the discussion of existing technologies it then proposes novel chces approach which divides collaborative editing system into two layers in topology and introduces four strategies to edit common concepts between the two layers set of operations is designed which demonstrates the solution
automatic content based image categorization is challenging research topic and has many practical applications images are usually represented as bags of feature vectors and the categorization problem is studied in the multiple instance learning mil framework in this paper we propose novel learning technique which transforms the mil problem into standard supervised learning problem by defining feature vector for each image bag specifically the feature vectors of the image bags are grouped into clusters and each cluster is given label using these labels each instance of an image bag can be replaced by corresponding label to obtain bag of cluster labels data mining can then be employed to uncover common label patterns for each image category these label patterns are converted into bags of feature vectors and they are used to transform each image bag in the data set into feature vector such that each vector element is the distance of the image bag to distinct pattern bag with this new image representation standard supervised learning algorithms can be applied to classify the images into the pre defined categories our experimental results demonstrate the superiority of the proposed technique in categorization accuracy as compared to state of the art methods
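a minimal sketch of the first transformation described above, using scikit-learn (the label-pattern mining and distance-to-pattern-bag steps of the paper are omitted, and the toy bags below are synthetic): instances from all image bags are clustered, each bag becomes a histogram over cluster labels, and a standard supervised learner is trained on the resulting fixed-length vectors.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    def bags_to_vectors(bags, k=8, seed=0):
        all_instances = np.vstack(bags)
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_instances)
        vecs = []
        for bag in bags:
            hist = np.bincount(km.predict(bag), minlength=k).astype(float)
            vecs.append(hist / hist.sum())           # normalised bag-of-cluster-labels vector
        return np.array(vecs), km

    # toy data: category-1 bags contain one distinctive instance, category-0 bags do not
    rng = np.random.default_rng(0)
    bags, labels = [], []
    for y in (0, 1) * 20:
        n = rng.integers(5, 12)
        bag = rng.normal(0, 1, size=(n, 10))
        if y:
            bag[0] += 4.0                            # the "positive" instance
        bags.append(bag); labels.append(y)

    X, km = bags_to_vectors(bags)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print(clf.score(X, labels))                      # training accuracy on the toy data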
the decentralised cooperative and self organising nature of peer to peer pp systems help to mitigate and even overcome many challenges which overwhelm the traditional client server approaches on the other hand these very characteristics also introduce some novel issues in pp environments one of the critical issues is how to build the trust relationship within pp systems in this paper we first discuss the desired properties that need to be considered while building trust in pp systems then we analyse two types of attacks both the ones mitigated by as well as the ones aimed at trust systems after this we divide the previous research work on building trust in pp systems into two broad categories that is reputation based and trade based we then review and discuss the advances in this area based on this classification finally we point out some potential research directions in building trust securely in pp systems
common representation used in text categorization is the bag of words model aka unigram model learning with this particular representation involves typically some preprocessing eg stopwords removal stemming this results in one explicit tokenization of the corpus in this work we introduce logistic regression approach where learning involves automatic tokenization this allows us to weaken the priori required knowledge about the corpus and results in tokenization with variable length word or character grams as basic tokens we accomplish this by solving logistic regression using gradient ascent in the space of all ngrams we show that this can be done very efficiently using branch and bound approach which chooses the maximum gradient ascent direction projected onto single dimension ie candidate feature although the space is very large our method allows us to investigate variable length gram learning we demonstrate the efficiency of our approach compared to state of the art classifiers used for text categorization such as cyclic coordinate descent logistic regression and support vector machines
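the abstract above selects variable length gram features by repeatedly taking the maximum gradient ascent direction projected onto a single dimension, the sketch below illustrates that greedy coordinate step on made up toy documents, without the branch and bound pruning that makes the full search over the gram space efficient, so it is an illustration rather than the authors' implementation

```python
import math
from collections import defaultdict

# illustrative sketch (not the authors' implementation): coordinate-wise gradient
# ascent for logistic regression where every substring up to n_max characters is a
# candidate feature; at each step only the n-gram with the steepest gradient gets
# its weight updated, mimicking the "max gradient projected onto a single
# dimension" idea (the branch-and-bound pruning itself is omitted)

def ngrams(text, n_max=3):
    counts = defaultdict(int)
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def train(docs, labels, steps=50, lr=0.1, n_max=3):
    feats = [ngrams(d, n_max) for d in docs]          # candidate feature space
    w = defaultdict(float)
    for _ in range(steps):
        # residuals y - p(y=1|x) under the current weights
        resid = []
        for f, y in zip(feats, labels):
            z = sum(w[g] * c for g, c in f.items() if g in w)
            resid.append(y - 1.0 / (1.0 + math.exp(-z)))
        # gradient of the log-likelihood w.r.t. every candidate n-gram
        grad = defaultdict(float)
        for f, r in zip(feats, resid):
            for g, c in f.items():
                grad[g] += r * c
        best = max(grad, key=lambda g: abs(grad[g]))   # steepest single dimension
        w[best] += lr * grad[best]
    return dict(w)

if __name__ == "__main__":
    docs = ["cheap pills online", "meeting notes attached", "buy cheap now", "project meeting today"]
    print(train(docs, [1, 0, 1, 0]))
```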
in this paper we present graphical query language for xml the language based on simple form of graph grammars permits us to extract data and reorganize information in new structure as with most of the current query languages for xml queries consist of two parts one extracting subgraph and one constructing the output graph the semantics of queries is given in terms of graph grammars the use of graph grammars makes it possible to define in simple way the structural properties of both the subgraph that has to be extracted and the graph that has to be constructed we provide an example driven comparison of our language wrt other xml query languages and show the effectiveness and simplicity of our approach
many approaches have been proposed for digital system verification either based on simulation strategies or on formal verification techniques both of them show advantages and drawbacks and new mixed approaches have been presented in order to improve the verification process specifically the adoption of formal methods still lacks coverage metrics to let the verification engineer get measure of which portion of the circuit is already covered by the written properties thus far and which parts still need to be addressed the present paper describes new simulation based methodology aimed at measuring the error coverage achieved by temporal assertions proved by model checking the approach has been applied to the description of protocol converter block and some preliminary results are presented in the paper
this paper shows how to analytically calculate the statistical properties of the errors in estimated parameters the basic tools to achieve this aim include first order approximation perturbation techniques such as matrix perturbation theory and taylor series this analysis applies for general class of parameter estimation problems that can be abstracted as linear or linearized homogeneous equation of course there may be many reasons why one might wish to have such estimates here we concentrate on the situation where one might use the estimated parameters to carry out some further statistical fitting or optimal refinement in order to make the problem concrete we take homography estimation as specific problem in particular we show how the derived statistical errors in the homography coefficients allow improved approaches to refining these coefficients through subspace constrained homography estimation chen and suter in int comput vis indeed having derived the statistical properties of the errors in the homography coefficients before subspace constrained refinement we do two things we verify the correctness through statistical simulations but we also show how to use the knowledge of the errors to improve the subspace based refinement stage comparison with the straightforward subspace refinement approach without taking into account the statistical properties of the homography coefficients shows that our statistical characterization of these errors is both correct and useful
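the perturbation analysis in the abstract above rests on the standard first order (delta method) propagation rule, stated here in its generic textbook form rather than with the paper's homography specific expressions

```latex
\hat{\theta} = f(x), \qquad
\hat{\theta} \;\approx\; f(x_0) + J\,(x - x_0), \quad
J = \left.\frac{\partial f}{\partial x}\right|_{x_0}, \qquad
\Sigma_{\hat{\theta}} \;\approx\; J\,\Sigma_x\,J^{\top}
```

here x are the noisy measurements with covariance Σx and the jacobian J of the estimator maps measurement noise into the covariance of the estimated parameters, which is the quantity the abstract then exploits during subspace constrained refinement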
we propose to bridge the discrepancy between data representations in memory and those favored by the simd processor by customizing the low level address mapping to achieve this we employ the extended single affiliation multiple stride sams parallel memory scheme at an appropriate level in the memory hierarchy this level of memory provides both array of structures aos and structure of arrays soa views for the structured data to the processor appearing to have maintained multiple layouts for the same data with such multi layout memory optimal simdization can be achieved our synthesis results using tsmc nm cmos technology indicate that the sams multi layout memory system has efficient hardware implementation with critical path delay of less than ns and moderate hardware overhead experimental evaluation based on modified ibm cell processor model suggests that our approach is able to decrease the dynamic instruction count by up to for selection of real applications and kernels under the same conditions the total execution time can be reduced by up to
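the aos versus soa distinction behind the sams multi layout memory can be illustrated in software, the hypothetical numpy sketch below keeps two separate copies of the same particle data, one as an array of structures and one as a structure of arrays, whereas the hardware scheme in the abstract provides both views over a single copy

```python
import numpy as np

# array of structures (AoS): one record per particle, fields interleaved in memory
aos = np.zeros(1024, dtype=[("x", np.float32), ("y", np.float32), ("z", np.float32)])

# structure of arrays (SoA): one contiguous array per field, the layout SIMD loads prefer
soa = {"x": np.zeros(1024, np.float32),
       "y": np.zeros(1024, np.float32),
       "z": np.zeros(1024, np.float32)}

# the same computation expressed against both views
aos_norm = np.sqrt(aos["x"]**2 + aos["y"]**2 + aos["z"]**2)   # strided field access
soa_norm = np.sqrt(soa["x"]**2 + soa["y"]**2 + soa["z"]**2)   # unit-stride access
assert np.allclose(aos_norm, soa_norm)
```

the soa view gives unit stride access per field which suits vector units, while the aos view keeps each record contiguous which suits record oriented code, the sams scheme in the abstract avoids having to pick one layout in memory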
efficient reduction algorithms are crucial to many large scale parallel scientific applications while previous algorithms constrain processing to the host cpu we explore and utilise the processors in modern cluster network interface cards nics we present the design issues solutions analytical models and experimental evaluations of family of nic based reduction algorithms through experiments on the alc cluster at lawrence livermore national laboratory which connects dual cpu nodes with the quadrics qsnet interconnect we find nic based reductions to be more efficient than host based implementations at large scale our nic based reductions are more than twice as fast as the host based production level mpi implementation
we present method to enhance fault localization for software systems based on frequent pattern mining algorithm our method is based on large set of test cases for given set of programs in which faults can be detected the test executions are recorded as function call trees based on test oracles the tests can be classified into successful and failing tests frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions this information is used to rank functions according to their likelihood of containing fault the ranking suggests an order in which to examine the functions during fault analysis we validate our approach experimentally using subset of siemens benchmark programs
data items that arrive online as streams typically have attributes which take values from one or more hierarchies time and geographic location source and destination ip addresses etc providing an aggregate view of such data is important for summarization visualization and analysis we develop an aggregate view based on certain organized sets of large valued regions heavy hitters corresponding to hierarchically discounted frequency counts we formally define the notion of hierarchical heavy hitters hhhs we first consider computing approximate hhhs over data stream drawn from single hierarchical attribute we formalize the problem and give deterministic algorithms to find them in single pass over the input in order to analyze wider range of realistic data streams eg from ip traffic monitoring applications we generalize this problem to multiple dimensions here the semantics of hhhs are more complex since child node can have multiple parent nodes we present online algorithms that find approximate hhhs in one pass with provable accuracy guarantees the product of hierarchical dimensions forms mathematical lattice structure our algorithms exploit this structure and so are able to track approximate hhhs using only small fixed number of statistics per stored item regardless of the number of dimensions we show experimentally using real data that our proposed algorithms yield outputs which are very similar virtually identical in many cases to offline computations of the exact solutions whereas straightforward heavy hitters based approaches give significantly inferior answer quality furthermore the proposed algorithms result in an order of magnitude savings in data structure size while performing competitively
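for the single hierarchy case the discounted semantics in the abstract can be shown with a small exact (non streaming) sketch, a prefix is reported once its count of as yet unreported descendants reaches phi times the stream length and reported mass is not charged to ancestors, the paper's one pass algorithms replace these exact counts with bounded error summaries

```python
from collections import Counter

def hhh_exact(paths, phi):
    """exact discounted hierarchical heavy hitters over a single hierarchy
    paths: iterable of root-to-leaf tuples, e.g. ("10", "10.1", "10.1.2")
    phi:   report threshold as a fraction of the total count"""
    counts = Counter(tuple(p) for p in paths)
    n = sum(counts.values())
    threshold = phi * n
    hhh = {}
    max_depth = max(len(p) for p in counts)
    # deepest prefixes first, so mass already reported is discounted from ancestors
    for depth in range(max_depth, 0, -1):
        for path in [p for p in counts if len(p) == depth]:
            c = counts[path]
            if c >= threshold:
                hhh[path] = c                 # report as an HHH, do not propagate
            elif depth > 1:
                counts[path[:-1]] += c        # roll unreported mass up to the parent
    return hhh

if __name__ == "__main__":
    stream = ([("10", "10.1", "10.1.2")] * 60 +
              [("10", "10.1", "10.1.3")] * 25 +
              [("10", "10.2", "10.2.9")] * 15)
    print(hhh_exact(stream, phi=0.2))
```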
data warehousing is corporate strategy that needs to integrate information from several sources of separately developed database management systems dbmss future dbms of data warehouse should provide adequate facilities to manage wide range of information arising from such integration we propose that the capabilities of database languages should be enhanced to manipulate user defined data orderings since business queries in an enterprise usually involve order we extend the relational model to incorporate partial orderings into data domains and describe the ordered relational model we have already defined and implemented minimal extension of sql called osql which allows querying over ordered relational databases one of the important facilities provided by osql is that it allows users to capture the underlying semantics of the ordering of the data for given application herein we demonstrate that osql aided with package discipline can be an effective means to manage the inter related operations and the underlying data domains of wide range of advanced applications that are vital in data warehousing such as temporal incomplete and fuzzy information we present the details of the generic operations arising from these applications in the form of three osql packages called osqltime osqlincomp and osqlfuzzy
in this article we propose state dependent importance sampling heuristics to estimate the probability of population overflow in jackson queueing networks these heuristics capture state dependence along the boundaries when one or more queues are empty which is crucial for the asymptotic efficiency of the change of measure the approach does not require difficult and often intractable mathematical analysis and is not limited by storage and computational requirements involved in adaptive importance sampling methodologies particularly for large state space experimental results on tandem parallel feed forward and feedback networks with moderate number of nodes suggest that the proposed heuristics may yield asymptotically efficient estimators possibly with bounded relative error when applied to queueing networks wherein no other state independent importance sampling techniques are known to be efficient the heuristics are robust and remain effective for larger networks moreover insights drawn from the basic networks considered in this article help understand sample path behavior along the boundaries conditional on reaching the rare event of interest this is key to the application of the methodology to networks of more general topologies it is hoped that empirical findings and insights in this paper will encourage more research on related practical and theoretical issues
communication coalescing is static optimization that can reduce both communication frequency and redundant data transfer in compiler generated code for regular data parallel applications we present an algorithm for coalescing communication that arises when generating code for regular data parallel applications written in high performance fortran hpf to handle sophisticated computation partitionings our algorithm normalizes communication before attempting coalescing we experimentally evaluate our algorithm which is implemented in the dhpf compiler in the compilation of hpf versions of the nas application benchmarks sp bt and lu our normalized coalescing algorithm improves the performance and scalability of compiler generated code for these benchmarks by reducing the communication volume up to compared to simpler coalescing strategy and enables us to match the communication volume and frequency in hand optimized mpi implementations of these codes
with the rapid growth of commerce product reviews on the web have become an important information source for customers decision making when they intend to buy some product as the reviews are often too many for customers to go through how to automatically classify them into different sentiment orientation categories ie positive negative has become research problem in this paper based on fisher’s discriminant ratio an effective feature selection method is proposed for product review text sentiment classification in order to validate the validity of the proposed method we compared it with other methods respectively based on information gain and mutual information while support vector machine is adopted as the classifier in this paper subexperiments are conducted by combining different feature selection methods with kinds of candidate feature sets under review documents of cars the experimental results indicate that the fisher’s discriminant ratio based on word frequency estimation has the best performance with value while the candidate features are the words which appear in both positive and negative texts
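the score used in the abstract above is fisher's discriminant ratio computed per candidate feature, in its common two class form it is

```latex
F(w) \;=\; \frac{\bigl(\mu_{+}(w) - \mu_{-}(w)\bigr)^{2}}{\sigma_{+}^{2}(w) + \sigma_{-}^{2}(w)}
```

where μ± and σ±² are the mean and variance of feature w over the positive and negative review classes, features are ranked by F(w) and the highest scoring ones are kept, the exact word frequency based estimator used in the paper may differ in detail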
in this paper we present fast and versatile algorithm which can rapidly perform variety of nearest neighbor searches efficiency improvement is achieved by utilizing the distance lower bound to avoid the calculation of the distance itself if the lower bound is already larger than the global minimum distance at the preprocessing stage the proposed algorithm constructs lower bound tree lb tree by agglomeratively clustering all the sample points to be searched given query point the lower bound of its distance to each sample point can be calculated by using the internal node of the lb tree to reduce the amount of lower bounds actually calculated the winner update search strategy is used for traversing the tree for further efficiency improvement data transformation can be applied to the sample and the query points in addition to finding the nearest neighbor the proposed algorithm can also provide the nearest neighbors progressively ii find the nearest neighbors within specified distance threshold and iii identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor our experiments have shown that the proposed algorithm can save substantial computation particularly when the distance of the query point to its nearest neighbor is relatively small compared with its distance to most other samples which is the case for many object recognition problems
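a flat single level version of the lower bound idea is sketched below, each sample point stores its distance r to its cluster centre so for a query q the triangle inequality gives |d(q, c) − r| as a lower bound on d(q, p), and any point whose bound already exceeds the best distance found so far is skipped, the paper applies this hierarchically over an lb tree with a winner update traversal and optional data transformation, none of which is reproduced here

```python
import math, random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_clusters(points, k=8):
    """crude clustering for illustration: assign each point to the nearest of k
    randomly chosen centres and remember its radius (distance to that centre)"""
    centres = random.sample(points, k)
    clusters = [[] for _ in centres]
    for p in points:
        i = min(range(k), key=lambda j: dist(p, centres[j]))
        clusters[i].append((p, dist(p, centres[i])))
    return centres, clusters

def nearest(query, centres, clusters):
    best, best_d = None, float("inf")
    # visit clusters in order of centre distance (a flat winner-update flavour)
    order = sorted(range(len(centres)), key=lambda i: dist(query, centres[i]))
    for i in order:
        dq_c = dist(query, centres[i])
        for p, r in clusters[i]:
            lower_bound = abs(dq_c - r)       # triangle-inequality lower bound
            if lower_bound >= best_d:
                continue                      # prune without computing dist(query, p)
            d = dist(query, p)
            if d < best_d:
                best, best_d = p, d
    return best, best_d

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(2000)]
    print(nearest((0.5, 0.5), *build_clusters(pts)))
```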
translation model size is growing at pace that outstrips improvements in computing power and this hinders research on many interesting models we show how an algorithmic scaling technique can be used to easily handle very large models using this technique we explore several large model variants and show an improvement bleu on the nist chinese english task this opens the door for work on variety of models that are much less constrained by computational limitations
the task of separating genuine attacks from false alarms in large intrusion detection infrastructures is extremely difficult the number of alarms received in such environments can easily enter into the millions of alerts per day the overwhelming noise created by these alarms can cause genuine attacks to go unnoticed as means of highlighting these attacks we introduce host ranking technique utilizing alarm graphs rather than enumerate all potential attack paths as in attack graphs we build and analyze graphs based on the alarms generated by the intrusion detection sensors installed on network given that the alarms are predominantly false positives the challenge is to identify separate and ideally predict future attacks in this paper we propose novel approach to tackle this problem based on the pagerank algorithm by elevating the rank of known attackers and victims we are able to observe the effect that these hosts have on the other nodes in the alarm graph using this information we are able to discover previously overlooked attacks as well as defend against future intrusions
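the ranking step can be pictured as personalized pagerank over the alarm graph with the teleport vector biased toward known attackers and victims so that their influence propagates to neighbouring hosts, the sketch below assumes a plain adjacency list graph and illustrative host names, it is not the paper's system

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """graph: dict node -> list of successor nodes (hosts linked by shared alarms)
    seeds: known attackers/victims whose rank is elevated via the teleport vector"""
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:                        # dangling node: give its mass to the teleport set
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
            else:
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
        rank = new
    return sorted(rank.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    g = {"attacker": ["h1", "h2"], "h1": ["h3"], "h2": ["h3"], "h3": [], "h4": ["h1"]}
    print(personalized_pagerank(g, seeds={"attacker"}))
```

hosts that sit close to the seeded attackers in the alarm graph end up with elevated rank, which is the effect the abstract uses to surface previously overlooked attacks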
being able to activate triggers at timepoints reached or after time intervals elapsed has been acknowledged by many authors as valuable functionality of dbms recently the interest in time based triggers has been renewed in the context of data stream monitoring however up till now sql triggers react to data changes only even though research proposals and prototypes have been supporting several other event types in particular time based ones since long we therefore propose seamless extension of the sql trigger concept by time based triggers focussing on semantic issues arising from such an extension
as models are elevated to first class artifacts within the software development lifecycle the task of construction and evolution of large scale system models becomes manually intensive effort that can be very time consuming and error prone to address these problems this dissertation abstract presents model transformation approach there are three main features of this research first tasks of model construction and evolution are specified in model transformation language called the embedded constraint language second core transformation engine called saw is used to perform model transformation in an automated manner by executing the ecl transformation specification finally testing and debugging tools at the modeling level are provided to assist in detecting errors in the model transformation
scientific publications written in natural language still play central role as our knowledge source however due to the flood of publications obtaining comprehensive view even on topic of limited scope from stack of publications is becoming an arduous task examples are presented from our recent experiences in the materials science field where information is not shared among researchers studying different materials and different methods to overcome the limitation we propose structured keywords method to reinforce the functionality of future library
due to the growing importance of the world wide web archiving it has become crucial for preserving useful source of information to maintain web archive up to date crawlers harvest the web by iteratively downloading new versions of documents however it is frequent that crawlers retrieve pages with unimportant changes such as advertisements which are continually updated hence web archive systems waste time and space for indexing and storing useless page versions also querying the archive can take more time due to the large set of useless page versions stored thus an effective method is required to know accurately when and how often important changes between versions occur in order to efficiently archive web pages our work focuses on addressing this requirement through new web archiving approach that detects important changes between page versions this approach consists in archiving the visual layout structure of web page represented by semantic blocks this work seeks to describe the proposed approach and to examine various related issues such as using the importance of changes between versions to optimize web crawl scheduling the major interesting research questions that we would like to address in the future are introduced
in this paper we use real server and personal computer workloads to systematically analyze the true performance impact of various optimization techniques including read caching sequential prefetching opportunistic prefetching write buffering request scheduling striping and short stroking we also break down disk technology improvement into four basic effects faster seeks higher rpm linear density improvement and increase in track density and analyze each separately to determine its actual benefit in addition we examine the historical rates of improvement and use the trends to project the effect of disk technology scaling as part of this study we develop methodology for replaying real workloads that more accurately models arrivals and that allows the rate to be more realistically scaled than previously we find that optimization techniques that reduce the number of physical os are generally more effective than those that improve the efficiency in performing the os sequential prefetching and write buffering are particularly effective reducing the average read and write response time by about and respectively our results suggest that reliable method for improving performance is to use larger caches up to and even beyond of the storage used for given workload our analysis shows that disk technology improvement at the historical rate increases performance by about per year if the disk occupancy rate is kept constant and by about per year if the same number of disks are used we discover that the actual average seek time and rotational latency are respectively only about and of the specified values we also observe that the disk head positioning time far dominates the data transfer time suggesting that to effectively utilize the available disk bandwidth data should be reorganized such that accesses become more sequential
let be set of points in euclidean space and let well known result of johnson and lindenstrauss states that there is projection of onto subspace of dimension mathcal epsilon log such that distances change by at most factor of we consider an extension of this result our goal is to find an analogous dimension reduction where not only pairs but all subsets of at most points maintain their volume approximately more precisely we require that sets of size preserve their volumes within factor of we show that this can be achieved using mathcal max frac epsilon epsilon log dimensions this in particular means that for mathcal log epsilon we require no more dimensions asymptotically than the special case handled by johnson and lindenstrauss our work improves on result of magen that required as many as mathcal epsilon log dimensions and is tight up to factor of mathcal epsilon another outcome of our work is an alternative and greatly simplified proof of the result of magen showing that all distances between points and affine subspaces spanned by small number of points are approximately preserved when projecting onto mathcal epsilon log dimensions
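the classical pairwise guarantee that the abstract above generalizes is the johnson lindenstrauss lemma, n points can be projected onto O(ε⁻² log n) dimensions while every pairwise distance is preserved within a 1 ± ε factor, the sketch below demonstrates only this pairwise special case with a gaussian random projection, not the volume preserving extension

```python
import numpy as np

def jl_project(points, eps, seed=0):
    """project n points in R^d to k = O(eps^-2 log n) dimensions with a gaussian
    random matrix; by the classical JL lemma all pairwise distances are preserved
    within a 1 +/- eps factor with high probability"""
    n, d = points.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))     # the constant 8 is one common choice
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ R

def pairwise_dists(A):
    sq = np.sum(A ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    iu = np.triu_indices(len(A), k=1)
    return np.sqrt(np.maximum(d2[iu], 0.0))

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(200, 1000))
    Y = jl_project(X, eps=0.25)
    ratio = pairwise_dists(Y) / pairwise_dists(X)
    print(Y.shape, float(ratio.min()), float(ratio.max()))   # distortion stays near 1
```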
we propose type system for programming language with memory allocation deallocation primitives which prevents memory related errors such as double frees and memory leaks the main idea is to augment pointer types with fractional ownerships which express both capabilities and obligations to access or deallocate memory cells by assigning an ownership to each pointer type constructor rather than to variable our type system can properly reason about list tree manipulating programs furthermore thanks to the use of fractions as ownerships the type system admits polynomial time type inference algorithm which serves as an algorithm for automatic verification of lack of memory related errors prototype verifier has been implemented and tested for programs
since sequential languages such as fortran and are more machine independent than current parallel languages it is highly desirable to develop powerful parallelization tools which can generate parallel codes automatically or semiautomatically targeting different parallel architectures array data flow analysis is known to be crucial to the success of automatic parallelization such an analysis should be performed interprocedurally and symbolically and it often needs to handle the predicates represented by if conditions unfortunately such powerful program analysis can be extremely time consuming if not carefully designed how to enhance the efficiency of this analysis to practical level remains an issue largely untouched to date this paper presents techniques for efficient interprocedural array data flow analysis and documents experimental results of its implementation in research parallelizing compiler our techniques are based on guarded array regions and the resulting tool runs faster by one or two orders of magnitude than other similarly powerful tools
text classification is major data mining task an advanced text classification technique is known as partially supervised text classification which can build text classifier using small set of positive examples only this leads to our curiosity whether it is possible to find set of features that can be used to describe the positive examples therefore users do not even need to specify set of positive examples as the first step in this paper we formalize it as new problem called hot bursty events detection to detect bursty events from text stream which is sequence of chronologically ordered documents here bursty event is set of bursty features and is considered as potential category to build text classifier it is important to know that the hot bursty events detection problem we study in this paper is different from tdt topic detection and tracking which attempts to cluster documents as events using clustering techniques in other words our focus is on detecting set of bursty features for bursty event in this paper we propose new novel parameter free probabilistic approach called feature pivot clustering our main technique is to fully utilize the time information to determine set of bursty features which may occur in different time windows we detect bursty events based on the feature distributions there is no need to tune or estimate any parameters we conduct experiments using real life data major english newspaper in hong kong and show that the parameter free feature pivot clustering approach can detect the bursty events with high success rate
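a much simplified stand in for the burst detection step is sketched below, each feature's per window document frequency is compared against its overall mean and the feature is flagged as bursty in windows where it deviates strongly, unlike the approach in the abstract this uses an explicit threshold and is not parameter free, and the feature pivot clustering of co bursting features into events is omitted

```python
from collections import Counter, defaultdict

def bursty_features(windows, z=2.5):
    """windows: list of time windows, each a list of documents, each document a
    list of tokens; returns {feature: [indices of windows where it bursts]}
    (simplified z-score test, not the parameter-free probabilistic model)"""
    per_window = []
    for docs in windows:
        df = Counter(tok for doc in docs for tok in set(doc))   # document frequency
        per_window.append({t: df[t] / max(len(docs), 1) for t in df})
    vocab = set().union(*per_window) if per_window else set()
    bursts = defaultdict(list)
    for t in vocab:
        freqs = [w.get(t, 0.0) for w in per_window]
        mean = sum(freqs) / len(freqs)
        std = (sum((f - mean) ** 2 for f in freqs) / len(freqs)) ** 0.5
        for i, f in enumerate(freqs):
            if std > 0 and (f - mean) / std >= z:
                bursts[t].append(i)             # feature t bursts in window i
    return dict(bursts)
```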
we introduce the idea of optimisation validation which is to formally establish that an instance of an optimising transformation indeed improves with respect to some resource measure this is related to but in contrast with translation validation which aims to establish that particular instance of transformation undertaken by an optimising compiler is semantics preserving our main setting is program logic for subset of java bytecode which is sound and complete for resource annotated operational semantics the latter employs resource algebras for measuring dynamic costs such as time space and more elaborate examples we describe examples of optimisation validation that we have formally verified in isabelle hol using the logic we also introduce type and effect system for measuring static costs such as code size which is proved consistent with the operational semantics
aspect oriented programming aop has been successfully applied to application code thanks to techniques such as java bytecode instrumentation unfortunately with current technology such as aspectj aspects cannot be woven into standard java class libraries this restriction is particularly unfortunate for aspects that would benefit from complete bytecode coverage such as profiling or debugging aspects in this paper we present an adaptation of the popular aspectj weaver that is able to weave aspects also into standard java class libraries we evaluate our approach with existing profiling aspects which now cover all bytecode executing in the virtual machine in addition we present new aspect for memory leak detection that also benefits from our approach
question classification systems play an important role in question answering systems and can be used in wide range of other domains the goal of question classification is to accurately assign labels to questions based on expected answer type most approaches in the past have relied on matching questions against hand crafted rules however rules require laborious effort to create and often suffer from being too specific statistical question classification methods overcome these issues by employing machine learning techniques we empirically show that statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning furthermore we examine the role different syntactic and semantic features have on performance we find that semantic features tend to increase performance more than purely syntactic features finally we analyze common causes of misclassification error and provide insight into ways they may be overcome
new junction characterization and validation method is proposed junction branches of volumetric objects are extracted at interest points in image using topologically constrained grouping process this is followed by structural validation and position refinement of extracted junctions an interesting feature of the proposed method is that all types of junctions are described uniformly and extracted using the same generic process for instance the size of the interest regions is kept constant despite local variations in contour density and curvature validation rate of real junctions is high and most false hypotheses are properly rejected an experimental evaluation illustrates the capabilities of the proposed method in demanding situations
model checking has for years been advertised as way of ensuring the correctness of complex software systems however there exist surprisingly few critical studies of the application of model checking to industrial scale software systems by people other than the model checker’s own authors in this paper we report our experience in applying the spin model checker to the validation of the failover protocols of commercial telecommunications system while we conclude that model checking is not yet ready for such applications we find that current research in the model checking community is working to address the difficulties we encountered
sensor networks are fundamentally constrained by the difficulty and energy expense of delivering information from sensors to sink our work has focused on garnering additional significant energy improvements by devising computationally efficient lossless compression algorithms on the source node these reduce the amount of data that must be passed through the network and to the sink and thus have energy benefits that are multiplicative with the number of hops the data travels through the network currently if sensor system designers want to compress acquired data they must either develop application specific compression algorithms or use off the shelf algorithms not designed for resource constrained sensor nodes this paper discusses the design issues involved with implementing adapting and customizing compression algorithms specifically geared for sensor nodes while developing sensor lzw lzw and some simple but effective variations to this algorithm we show how different amounts of compression can lead to energy savings on both the compressing node and throughout the network and that the savings depends heavily on the radio hardware to validate and evaluate our work we apply it to datasets from several different real world deployments and show that our approaches can reduce energy consumption by up to factor of across the network
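for reference a plain textbook lzw encoder is given below, the sensor lzw variants described in the abstract adapt this basic scheme to sensor nodes with a small fixed size dictionary, a mini cache and block wise resets, none of which is shown here

```python
def lzw_compress(data: bytes):
    """textbook LZW: emit dictionary codes for the longest known prefixes,
    growing the dictionary as new byte sequences are seen"""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                              # extend the current match
        else:
            out.append(dictionary[w])           # emit code for the longest match
            dictionary[wc] = next_code          # learn the new sequence
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

if __name__ == "__main__":
    sample = b"ABABABABABABAB" * 4
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes")
```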
functional validation is major bottleneck in pipelined processor design simulation using functional test vectors is the most widely used form of processor validation while existing model checking based approaches have proposed several promising ideas for efficient test generation many challenges remain in applying them to realistic pipelined processors the time and resources required for test generation using existing model checking based techniques can be extremely large this paper presents an efficient test generation technique using decompositional model checking the contribution of the paper is the development of both property and design decomposition procedures for efficient test generation of pipelined processors our experimental results using multi issue mips processor demonstrate several orders of magnitude reduction in memory requirement and test generation time
quality of service qos is becoming an issue of increasing importance in distributed multimedia systems this article presents an overview of the qos issues involved in the design of such systems qos related parameters can be identified in all system components we will discuss qos parameters found in communication protocols operating systems multimedia databases file servers as well as those directly concerned with the human user the management of qos parameters can be seen as an overall negotiation process involving various aspects such as dynamic adaptation and translation of parameters between different abstraction levels first we present some approaches to this management found in the literature then we describe features of our own qos project
recommender system for scientific scholarly articles that is both hybrid content and collaborative filtering based and multi dimensional across metadata categories such as subject hierarchies journal clusters and keyphrases can improve scientists ability to discover new knowledge from digital library providing users with an interface which enables the filtering of recommendations across these multiple dimensions can simultaneously provide explanations for the recommendations and increase the user’s control over how the recommender behaves
the continuous growth of media databases necessitates development of novel visualization and interaction techniques to support management of these collections we present videotater an experimental tool for tablet pc that supports the efficient and intuitive navigation selection segmentation and tagging of video our veridical representation immediately signals to the user where appropriate segment boundaries should be placed and allows for rapid review and refinement of manually or automatically generated segments finally we explore distribution of modalities in the interface by using multiple timeline representations pressure sensing and tag painting erasing metaphor with the pen
in secure information flow analysis the classic denning restrictions allow program’s termination to be affected by the values of its variables resulting in potential information leaks in an effort to quantify such leaks in this work we study simple imperative language with random assignments we consider stripping operation on programs and establish fundamental relationship between the behavior of well typed program and of its stripped version to prove this relationship we introduce new notion of fast probabilistic simulation on markov chains as an application we prove that under the denning restrictions well typed probabilistic programs are guaranteed to satisfy an approximate probabilistic noninterference property provided that their probability of nontermination is small
despite proven successful in previous projects the use of formal methods for enhancing quality of software is still not used in its full potential in industry we argue that seamless support for formal verification in high level specification tool enhances the attractiveness of using formal approach for increasing software quality commercial complex event processing cep engines often have support for modelling debugging and testing cep applications however the possibility of utilizing formal analysis is not considered we argue that using formal approach for verifying cep system can be performed without expertise in formal methods in this paper prototype tool rex is presented with support for specifying both cep systems and correctness properties of the same application in high level graphical language the specified cep applications are seamlessly transformed into timed automata representation together with the high level properties for automatic verification in the model checker uppaal
we present novel surface reconstruction algorithm that can recover high quality surfaces from noisy and defective data sets without any normal or orientation information set of new techniques are introduced to afford extra noise tolerability robust orientation alignment reliable outlier removal and satisfactory feature recovery in our algorithm sample points are first organized by an octree the points are then clustered into set of monolithically singly oriented groups the inside outside orientation of each group is determined through robust voting algorithm we locally fit an implicit quadric surface in each octree cell the locally fitted implicit surfaces are then blended to produce signed distance field using the modified shepard’s method we develop sophisticated iterative fitting algorithms to afford improved noise tolerance both in topology recognition and geometry accuracy furthermore this iterative fitting algorithm coupled with local model selection scheme provides reliable sharp feature recovery mechanism even in the presence of bad input
malicious software in form of internet worms computer viruses and trojan horses poses major threat to the security of networked systems the diversity and amount of its variants severely undermine the effectiveness of classical signature based detection yet variants of malware families share typical behavioral patterns reflecting its origin and purpose we aim to exploit these shared patterns for classification of malware and propose method for learning and discrimination of malware behavior our method proceeds in three stages behavior of collected malware is monitored in sandbox environment based on corpus of malware labeled by an anti virus scanner malware behavior classifier is trained using learning techniques and discriminative features of the behavior models are ranked for explanation of classification decisions experiments with different heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method especially in detecting novel instances of malware families previously not recognized by commercial anti virus software
data centers are the most critical infrastructure of companies demanding higher and higher levels of quality of service qos in terms of availability and scalability at the core of data centers are multi tier architectures providing service to applications replication is heavily used in this infrastructure for either availability or scalability but typically not for both combined additionally most approaches replicate single tier making the non replicated tiers potential bottlenecks and single points of failure in this paper we present novel approach that provides both availability and scalability for multi tier applications the approach uses replicated cache that takes into account both the application server tier middle tier and the database back end the underlying replicated cache protocol fully embeds the replication logic in the application server the protocol exhibits good scalability as shown by our evaluation based on the new industrial benchmark for jee multi tier systems specjappserver
this paper presents the design and an evaluation of mondrix version of the linux kernel with mondriaan memory protection mmp mmp is combination of hardware and software that provides efficient fine grained memory protection between multiple protection domains sharing linear address space mondrix uses mmp to enforce isolation between kernel modules which helps detect bugs limits their damage and improves kernel robustness and maintainability during development mmp exposed two kernel bugs in common heavily tested code and during fault injection experiments it prevented three of five file system corruptions the mondrix implementation demonstrates how mmp can bring memory isolation to modules that already exist in large software application it shows the benefit of isolation for robustness and error detection and prevention while validating previous claims that the protection abstractions mmp offers are good fit for software this paper describes the design of the memory supervisor the kernel module which implements permissions policy we present an evaluation of mondrix using full system simulation of large kernel intensive workloads experiments with several benchmarks where mmp was used extensively indicate the additional space taken by the mmp data structures reduce the kernel’s free memory by less than and the kernel’s runtime increases less than relative to an unmodified kernel
ranking function is instrumental in affecting the performance of search engine designing and optimizing search engine’s ranking function remains daunting task for computer and information scientists recently genetic programming gp machine learning technique based on evolutionary theory has shown promise in tackling this very difficult problem ranking functions discovered by gp have been found to be significantly better than many of the other existing ranking functions however current gp implementations for ranking function discovery are all designed utilizing the vector space model in which the same term weighting strategy is applied to all terms in document this may not be an ideal representation scheme at the individual query level considering the fact that many query terms should play different roles in the final ranking in this paper we propose novel nonlinear ranking function representation scheme and compare this new design to the well known vector space model we theoretically show that the new representation scheme subsumes the traditional vector space model representation scheme as special case and hence allows for additional flexibility in term weighting we test the new representation scheme with the gp based discovery framework in personalized search information routing context using trec web corpus the experimental results show that the new ranking function representation design outperforms the traditional vector space model for gp based ranking function discovery
in this paper we describe paper based interface which combines the physical real with the digital world while interacting with real paper printouts users can seamlessly work with digital whiteboard at the same time users are able to send data from real paper to the digital world by picking up the content eg images from real printouts and drop it on the digital surface the reverse direction for transferring data from the whiteboard to the real paper is supported through printouts of the whiteboard page that are enhanced with integrated anoto patterns we present four different interaction techniques that show the potential of this paper and digital world combination moreover we describe the workflow of our system that bridges the gap between the two worlds in detail
modal logic has good claim to being the logic of choice for describing the reactive behaviour of systems modelled as coalgebras logics with modal operators obtained from so called predicate liftings have been shown to be invariant under behavioural equivalence expressivity results stating that conversely logically indistinguishable states are behaviourally equivalent depend on the existence of separating sets of predicate liftings for the signature functor at hand here we provide classification result for predicate liftings which leads to an easy criterion for the existence of such separating sets and we give simple examples of functors that fail to admit expressive normal or monotone modal logics respectively or in fact an expressive unary modal logic at all we then move on to polyadic modal logic where modal operators may take more than one argument formula we show that every accessible functor admits an expressive polyadic modal logic moreover expressive polyadic modal logics are unlike unary modal logics compositional
in this paper we propose discretization based schemes to preserve privacy in time series data mining traditional research on preserving privacy in data mining focuses on time invariant privacy issues with the emergence of time series data mining traditional snapshot based privacy issues need to be extended to be multi dimensional with the addition of time dimension in this paper we defined three threat models based on trust relationship between the data miner and data providers we propose three different schemes for these three threat models the proposed schemes are extensively evaluated against public available time series data sets our experiments show that proposed schemes can preserve privacy with cost of reduction in mining accuracy for most data sets proposed schemes can achieve low privacy leakage with slight reduction in classification accuracy we also studied effect of parameters of proposed schemes in this paper
new type of shared object called timed register is proposed and used to design indulgent timing based algorithms timed register generalizes the notion of an atomic register as follows if process invokes two consecutive operations on the same timed register which are read followed by write then the write operation is executed only if it is invoked at most time units after the read operation where is defined as part of the read operation in this context timing based algorithm is an algorithm whose correctness relies on the existence of bound such that any pair of consecutive constrained read and write operations issued by the same process on the same timed register are separated by at most time units an indulgent algorithm is an algorithm that always guarantees the safety properties and ensures the liveness property as soon as the timing assumptions are satisfied the usefulness of this new type of shared object is demonstrated by presenting simple and elegant indulgent timing based algorithms that solve the mutual exclusion exclusion adaptive renaming test set and consensus problems interestingly timed registers are universal objects in systems with process crashes and transient timing failures ie they allow building any concurrent object with sequential specification the paper also suggests connections with schedulers and contention managers
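a minimal sketch of the described semantics, a write by a process is applied only if it follows that process's preceding read on the register by at most the delay chosen as part of that read, the use of wall clock time and a per process table below are implementation choices for illustration and are not taken from the paper

```python
import time

class TimedRegister:
    """sketch of a timed register: if a process's two consecutive operations on
    this register are a read followed by a write, the write takes effect only if
    it arrives within delta time units of the read (delta chosen by the read)"""
    def __init__(self, initial=None):
        self.value = initial
        self.last_read = {}                    # process id -> (read time, delta)

    def read(self, pid, delta):
        self.last_read[pid] = (time.monotonic(), delta)
        return self.value

    def write(self, pid, value):
        entry = self.last_read.pop(pid, None)  # consume the pending constraint, if any
        if entry is None:
            self.value = value                 # unconstrained write always succeeds
            return True
        t, delta = entry
        if time.monotonic() - t <= delta:
            self.value = value                 # timely constrained write succeeds
            return True
        return False                           # too late: the write has no effect
```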
many intractable problems have been shown to become tractable if the treewidth of the underlying structure is bounded by constant an important tool for deriving such results is courcelle’s theorem which states that all properties defined by monadic second order mso sentences are fixed parameter tractable with respect to the treewidth arnborg et al extended this result to counting problems defined via mso properties however the mso description of problem is of course not an algorithm consequently proving the fixed parameter tractability of some problem via courcelle’s theorem can be considered as the starting point rather than the endpoint of the search for an efficient algorithm gottlob et al have recently presented new approach via monadic datalog to actually devise efficient algorithms for decision problems whose tractability follows from courcelle’s theorem in this paper we extend this approach and apply it to some fundamental counting problems in logic and artificial intelligence
the information explosion in today’s electronic world has created the need for information filtering techniques that help users filter out extraneous content to identify the right information they need to make important decisions recommender systems are one approach to this problem based on presenting potential items of interest to user rather than requiring the user to go looking for them in this paper we propose recommender system that recommends research papers of potential interest to authors known to the citeseer database for each author participating in the study we create user profile based on their previously published papers based on similarities between the user profile and profiles for documents in the collection additional papers are recommended to the author we introduce novel way of representing the user profiles as trees of concepts and an algorithm for computing the similarity between the user profiles and document profiles using tree edit distance measure experiments with group of volunteers show that our concept based algorithm provides better recommendations than traditional vector space model based technique
spam in the form of link spam and click spam has become major obstacle in the effective functioning of ranking and reputation systems even in the absence of spam difficulty in eliciting feedback and self reinforcing nature of ranking systems are known problems in this paper we make case for sharing with users the revenue generated by such systems as incentive to provide useful feedback and present an incentive based ranking scheme in realistic model of user behavior which addresses the above problems we give an explicit ranking algorithm based on user feedback our incentive structure and ranking algorithm ensure that there is profitable arbitrage opportunity for the users of the system in correcting the inaccuracies of the ranking the system is oblivious to the source of inaccuracies benign or malicious thus making it robust to spam as well as the problems of eliciting feedback and self reinforcement
we introduce methodology for evaluating network intrusion detection systems using an observable attack space which is parameterized representation of type of attack that can be observed in particular type of log data using the observable attack space for log data that does not include payload eg netflow data we evaluate the effectiveness of five proposed detectors for bot harvesting and scanning attacks in terms of their ability even when used in conjunction to deter the attacker from reaching his goals we demonstrate the ranges of attack parameter values that would avoid detection or rather that would require an inordinately high number of false alarms in order to detect them consistently
an innovative event based data stream compression and mining model is presented in this paper the main novelty of our approach with respect to traditional data stream compression approaches relies on the semantics of the application in driving the compression process by identifying interested events occurring in the unbounded stream this puts the basis for novel class of intelligent applications over data streams where the knowledge on actual streams is integrated with and correlated to the knowledge related to expired events that are considered critical for the target application scenario
the use of random linear network coding nc has significantly simplified the design of opportunistic routing or protocols by removing the need of coordination among forwarding nodes for avoiding duplicate transmissions however nc based or protocols face new challenge how many coded packets should each forwarder transmit to avoid the overhead of feedback exchange most practical existing nc based or protocols compute offline the expected number of transmissions for each forwarder using heuristics based on periodic measurements of the average link loss rates and the etx metric although attractive due to their minimal coordination overhead these approaches may suffer significant performance degradation in dynamic wireless environments with continuously changing levels of channel gains interference and background traffic in this paper we propose ccack new efficient nc based or protocol ccack exploits novel cumulative coded acknowledgment scheme that allows nodes to acknowledge network coded traffic to their upstream nodes in simple way oblivious to loss rates and with practically zero overhead in addition the cumulative coded acknowledgment scheme in ccack enables an efficient credit based rate control algorithm our evaluation shows that compared to more state of the art nc based or protocol ccack improves both throughput and fairness by up to and respectively with average improvements of and respectively
the dynamic nature of some self adaptive software systems can result in potentially unpredictable adaptations which may be detrimental to overall system dependability by diminishing trust in the adaptation process this paper describes our initial work with architectural runtime configuration management in order to improve dependability and overall system usefulness by maintaining record of reconfigurations and providing support for architectural recovery operations our approach fully decoupled from self adaptive systems themselves and the adaptation management processes governing their changes provides for better adaptation visibility and self adaptive process dependability we elaborate on the vision for our overall approach present early implementation and testing results from prototyping efforts and discuss our future plans
we evaluate an approach for mobile smart objects to cooperate with projector camera systems to achieve interactive projected displays on their surfaces without changing their appearance or function smart objects describe their appearance directly to the projector camera system enabling vision based detection based on their natural appearance this detection is significant challenge as objects differ in appearance and appear at varying distances and orientations with respect to tracking camera we investigate four detection approaches representing different appearance cues and contribute three experimental studies analysing the impact on detection performance firstly of scale and rotation secondly the combination of multiple appearance cues and thirdly the use of context information from the smart object we find that the training of appearance descriptions must coincide with the scale and orientations providing the best detection performance that multiple cues provide clear performance gain over single cue and that context sensing masks distractions and clutter further improving detection performance
in this paper we present an approach to develop parallel applications based on aspect oriented programming we propose collection of aspects to implement group communication mechanisms on parallel applications in our approach parallelisation code is developed by composing the collection into the application core functionality the approach requires fewer changes to sequential applications to parallelise the core functionality than current alternatives and yields more modular code the paper presents the collection and shows how the aspects can be used to develop efficient parallel applications
computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases during the drug development process these techniques are used to solve number of classification problems such as predicting whether or not chemical compound has the desired biological activity is toxic or nontoxic and filtering out drug like compounds from large compound libraries this paper presents substructure based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set the advantage of this approach is that during classification model construction all relevant substructures are available allowing the classifier to intelligently select the most discriminating ones the computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection experimental evaluation on eight different classification problems shows that our approach is computationally scalable and on average outperforms existing schemes by percent to percent
traditional vector architectures often lack virtual memory support because it is difficult to support fast and precise exceptions for these machines in this paper we propose new exception handling model for vector architectures based on software restart markers which divide the program into idempotent regions of code within region the processor can commit instruction results to the architectural state in any order if an exception occurs the machine jumps immediately to the exception handler and kills ongoing instructions to restart execution the operating system has just to begin execution at the start of the region this approach avoids the area and energy overhead to buffer uncommitted vector unit state that would otherwise be required with high performance precise exception mechanism but still provides simple exception handling interface for the operating system our scheme also removes the requirement of preserving vector register file contents in the event of context switch we show that using our approach causes an average performance reduction of less than across variety of benchmarks compared with vector machine that does not support virtual memory
high resolution matrix assisted laser desorption ionization time of flight mass spectrometry has recently shown promise as screening tool for detecting discriminatory peptide protein patterns the major computational obstacle in finding such patterns is the large number of mass charge peaks features biomarkers data points in spectrum to tackle this problem we have developed methods for data preprocessing and biomarker selection the preprocessing consists of binning baseline correction and normalization an algorithm extended markov blanket is developed for biomarker detection which combines redundant feature removal and discriminant feature selection the biomarker selection couples with support vector machine to achieve sample prediction from high resolution proteomic profiles our algorithm is applied to recurrent ovarian cancer study that contains platinum sensitive and platinum resistant samples after treatment experiments show that the proposed method performs better than other feature selection algorithms in particular our algorithm yields good performance in terms of both sensitivity and specificity as compared to other methods
in this paper we present an object oriented modeling approach for specifying and analyzing real time systems two models are used one is an object interaction model that describes the system hierarchy and causal interactions among objects the other is an object specification model that coherently specifies the structural behavioral and control aspects of objects our approach is applied to simple illustrative example the modeling of house heating system
exploiting spatial and temporal locality is essential for obtaining high performance on modern computers writing programs that exhibit high locality of reference is difficult and error prone compiler researchers have developed loop transformations that allow the conversion of programs to exploit locality recently transformations that change the memory layouts of multi dimensional arrays called data transformations have been proposed unfortunately both data and loop transformations have some important drawbacks in this work we present an integrated framework that uses loop and data transformations in concert to exploit the benefits of both approaches while minimizing the impact of their disadvantages our approach works inter procedurally on acyclic call graphs uses profile data to eliminate layout conflicts and is unique in its capability of resolving conflicting layout requirements of different references to the same array in the same nest and in different nests for regular array based applications the optimization technique presented in this paper has been implemented in source to source translator we evaluate its performance using standard benchmark suites and several math libraries complete programs with large input sizes experimental results show that our approach reduces the overall execution times of original codes by on the average this reduction comes from three important characteristics of the technique namely resolving layout conflicts between references to the same array in loop nest determining suitable order to propagate layout modifications across loop nests and propagating layouts between different procedures in the program all in unified framework
the capacity needs of online services are mainly determined by the volume of user loads for large scale distributed systems running such services it is quite difficult to match the capacities of various system components in this paper novel and systematic approach is proposed to profile services for resource optimization and capacity planning we collect resource consumption related measurements from various components across distributed systems and further search for constant relationships between these measurements if such relationships always hold under various workloads over time we consider them as invariants of the underlying system after extracting many invariants from the system given any volume of user loads we can follow these invariant relationships sequentially to estimate the capacity needs of individual components by comparing the current resource configurations against the estimated capacity needs we can discover the weakest points that may deteriorate system performance operators can consult such analytical results to optimize resource assignments and remove potential performance bottlenecks in this paper we propose several algorithms to support capacity analysis and guide operator’s capacity planning tasks our algorithms are evaluated with real systems and experimental results are also included to demonstrate the effectiveness of our approach
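The following sketch illustrates the invariant-search idea under stated assumptions: each metric is a series collected under varying load, an invariant is a linear relationship whose relative residuals stay below a tolerance, and capacity needs are projected by following an invariant from a hypothetical future load. The metric names, data, and thresholds are made up for illustration.

```python
import numpy as np

metrics = {
    "frontend_reqs": np.array([100, 200, 300, 400, 500], float),
    "db_queries":    np.array([310, 590, 905, 1190, 1510], float),
    "cache_hits":    np.array([80, 150, 260, 300, 430], float),
}

def fit_invariant(x, y, tol=0.05):
    """Fit y ~ a*x and keep it as an invariant if relative residuals stay below tol."""
    a = np.dot(x, y) / np.dot(x, x)
    rel_err = np.abs(y - a * x) / np.maximum(np.abs(y), 1e-9)
    return a if np.all(rel_err < tol) else None

invariants = {}
names = list(metrics)
for i in range(len(names)):
    for j in range(len(names)):
        if i != j:
            a = fit_invariant(metrics[names[i]], metrics[names[j]])
            if a is not None:
                invariants[(names[i], names[j])] = a

# Follow an invariant to project capacity needs at a hypothetical future load level.
future_reqs = 2000.0
if ("frontend_reqs", "db_queries") in invariants:
    print("estimated db_queries:",
          invariants[("frontend_reqs", "db_queries")] * future_reqs)
```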
we present specification in the action language of brewka’s reconstruction of theory of formal disputation originally proposed by rescher the focus is on the procedural aspects rather than the adequacy of this particular protocol for the conduct of debate and the resolution of disputes the specification is structured in three separate levels covering i the physical capabilities of the participant agents ii the rules defining the protocol itself specifying which actions are proper and timely according to the protocol and their effects on the protocol state and iii the permissions prohibitions and obligations of the agents and the sanctions and enforcement strategies that deal with non compliance also included is mechanism by which an agent may object to an action by another participant and an optional silence implies consent principle although comparatively simple brewka’s protocol is thus representative of wide range of other more complex argumentation and dispute resolution procedures that have been proposed finally we show how the causal calculator implementation of can be used to animate the specification and to investigate and verify properties of the protocol
data aggregation plays an important role in the design of scalable systems allowing the determination of meaningful system wide properties to direct the execution of distributed applications in the particular case of wireless sensor networks data collection is often only practicable if aggregation is performed several aggregation algorithms have been proposed in the last few years exhibiting different properties in terms of accuracy speed and communication tradeoffs nonetheless existing approaches are found lacking in terms of fault tolerance in this paper we introduce novel fault tolerant averaging based data aggregation algorithm it tolerates substantial message loss link failures while competing algorithms in the same class can be affected by single lost message the algorithm is based on manipulating flows in the graph theoretical sense that are updated using idempotent messages providing it with unique robustness capabilities furthermore evaluation results obtained by comparing it with other averaging approaches have revealed that it outperforms them in terms of time and message complexity
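The sketch below is a schematic single-process simulation of the flow idea described above, under the assumption that each link carries an antisymmetric flow value and that a node's estimate is its input minus the net flow it has pushed out; the exact update rule and message format of the paper's algorithm may differ. It only illustrates why flow messages are idempotent and why a lost message leaves the global sum intact.

```python
values = {0: 12.0, 1: 2.0, 2: 4.0}        # sensor inputs; true average is 6.0
links = [(0, 1), (1, 2)]                  # a 3-node chain
flow = {(i, j): 0.0 for i, j in links}
flow.update({(j, i): 0.0 for i, j in links})

def estimate(i):
    # local estimate = own input minus the net flow already pushed to neighbours
    return values[i] - sum(f for (a, _), f in flow.items() if a == i)

for _ in range(50):                       # repeated pairwise exchanges
    for i, j in links:
        target = (estimate(i) + estimate(j)) / 2.0
        # i sends j the flow value that equalises both estimates; resending the same
        # value is harmless (idempotent), and a lost message just keeps the old flow,
        # which still conserves the global sum of inputs.
        flow[(i, j)] += estimate(i) - target
        flow[(j, i)] = -flow[(i, j)]

print([round(estimate(i), 3) for i in values])   # all close to 6.0
```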
networks have remained challenge for information retrieval and visualization because of the rich set of tasks that users want to accomplish this paper offers an abstract content actor network data model classification of tasks and tool to support them the netlens interface was designed around the abstract content actor network data model to allow users to pose series of elementary queries and iteratively refine visual overviews and sorted lists this enables the support of complex queries that are traditionally hard to specify netlens is general and scalable in that it applies to any data set that can be represented with our abstract data model this paper describes the use of netlens with subset of the acm digital library consisting of about papers from the chi conference written by about authors and reports on usability study with nine participants
we propose simulation technique for elastically deformable objects based on the discontinuous galerkin finite element method dg fem in contrast to traditional fem it overcomes the restrictions of conforming basis functions by allowing for discontinuous elements with weakly enforced continuity constraints this added flexibility enables the simulation of arbitrarily shaped convex and non convex polyhedral elements while still using simple polynomial basis functions for the accurate strain integration over these elements we propose an analytic technique based on the divergence theorem being able to handle arbitrary elements eventually allows us to derive simple and efficient techniques for volumetric mesh generation adaptive mesh refinement and robust cutting
the variety in email related tasks as well as the increase in daily email load has created need for automated email management tools in this paper we provide an empirical evaluation of representational schemes and retrieval strategies for email in particular we study the impact of both textual and non textual email content for case representation applied to email task management our first contribution is stack an email representation based on stacking multiple casebases are created each using different case representation related with attributes corresponding to semi structured email content nn classifier is applied to each casebase and the output is used to form new case representation our second contribution is new evaluation method allowing the creation of random chronological stratified train test trials that respect both temporal and class distribution aspects crucial for the email domain the enron corpus was used to create dataset for the email deletion prediction task evaluation results show significant improvements with stack over single casebase retrieval and multiple casebases retrieval combined using majority vote
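A small illustration of the stacked representation is sketched below, assuming two casebases (stand-ins for textual and non-textual email attributes), a k-NN model per casebase, and a logistic meta-learner over their outputs; the synthetic data, the choice of k, and the in-sample training of the meta-level are simplifications, and the chronological stratified evaluation described above is not reproduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X_text = rng.normal(size=(n, 5))          # stand-in for textual email features
X_meta = rng.normal(size=(n, 3))          # stand-in for sender/date/thread features
y = (X_text[:, 0] + X_meta[:, 0] > 0).astype(int)   # synthetic "deleted?" label

# level 1: one k-NN classifier per casebase
level1 = [KNeighborsClassifier(n_neighbors=5).fit(Xb, y) for Xb in (X_text, X_meta)]
# level-1 outputs form the new (stacked) case representation
X_stack = np.column_stack([m.predict_proba(Xb)[:, 1]
                           for m, Xb in zip(level1, (X_text, X_meta))])
# level 2: meta-learner over the stacked representation (in-sample, for brevity)
level2 = LogisticRegression().fit(X_stack, y)
print("training accuracy:", level2.score(X_stack, y))
```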
in this paper we address the problem of skewed data in topic tracking the small number of stories labeled positive as compared to negative stories and propose method for estimating effective training stories for the topic tracking task for small number of labeled positive stories we use bilingual comparable ie english and japanese corpora together with the edr bilingual dictionary and extract story pairs consisting of positive and associated stories to overcome the problem of large number of labeled negative stories we classified them into clusters this is done using semisupervised clustering algorithm combining means with em the method was tested on the tdt english corpus and the results showed that the system works well when the topic under tracking is talking about an event originating in the source language country even for small number of initial positive training stories
this paper provides general overview of creating scenarios for energy policies using bayesian network bn models bn is useful tool to analyze the complex structures which allows observation of the current structure and basic consequences of any strategic change this research will propose decision model that will support the researchers in forecasting and scenario analysis fields the proposed model will be implemented in case study for turkey the choice of the case is based on complexities of renewable energy resource rich country turkey is heavy energy importer discussing new investments domestic resources could be evaluated under different scenarios aiming at sustainability the achievements of this study will open new vision for the decision makers in energy sector
distributed wireless systems dwss are emerging as the enabler for next generation wireless applications there is consensus that dws based applications such as pervasive computing sensor networks wireless information networks and speech and data communication networks will form the backbone of the next technological revolution simultaneously with great economic industrial consumer and scientific potential dwss pose numerous technical challenges among them two are widely considered as crucial autonomous localized operation and minimization of energy consumption we address the fundamental problem of how to maximize the lifetime of the network using only local information while preserving network connectivity we start by introducing the care free sleep cs theorem that provides provably optimal conditions for node to go into sleep mode while ensuring that global connectivity is not affected the cs theorem is the basis for an efficient localized algorithm that decides which nodes will go into sleep mode and for how long we have also developed mechanisms for collecting neighborhood information and for the coordination of distributed energy minimization protocols the effectiveness of the approach is demonstrated using comprehensive study of the performance of the algorithm over wide range of network parameters another important highlight is the first mathematical and monte carlo analysis that establishes the importance of considering nodes within small number of hops in order to preserve energy
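As a hedged illustration of localized sleep decisions, the sketch below tests whether a node's awake neighbours remain mutually reachable using only links among themselves (2-hop information); this is a generic sufficient condition for preserving connectivity, not necessarily the CS theorem's provably optimal condition, and the topology is a toy example.

```python
from collections import deque

def can_sleep(node, adjacency):
    """adjacency: dict node -> set of awake neighbours (2-hop knowledge assumed)."""
    neighbours = adjacency[node]
    if len(neighbours) <= 1:
        return True                       # a leaf cannot disconnect anyone else
    # BFS over the neighbours using only links among the neighbours themselves
    start = next(iter(neighbours))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency[v] & neighbours:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == neighbours

adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
print(can_sleep("a", adjacency))   # True: b and c are directly linked
print(can_sleep("b", adjacency))   # False: d would be cut off
```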
the java virtual machine executes bytecode programs that may have been sent from other possibly untrusted locations on the network since the transmitted code may be written by malicious party or corrupted during network transmission the java virtual machine contains bytecode verifier to check the code for type errors before it is run as illustrated by reported attacks on java run time systems the verifier is essential for system security however no formal specification of the bytecode verifier exists in the java virtual machine specification published by sun in this paper we develop such specification in the form of type system for subset of the bytecode language the subset includes classes interfaces constructors methods exceptions and bytecode subroutines we also present type checking algorithm and prototype bytecode verifier implementation and we conclude by discussing other applications of this work for example we show how to extend our formal system to check other program properties such as the correct use of object locks
three dimensional integrated circuits ics provide an attractive solution for improving circuit performance such solutions must be embedded in an electrothermally conscious design methodology since ics generate significant amount of heat per unit volume in this paper we propose temperature aware global routing algorithm with insertion of thermal vias and thermal wires to lower the effective thermal resistance of the material thereby reducing chip temperature since thermal vias and thermal wires take up lateral routing space our algorithm utilizes sensitivity analysis to judiciously allocate their usage and iteratively resolve contention between routing and thermal vias and thermal wires experimental results show that our routing algorithm can effectively reduce the peak temperature and alleviate routing congestion
in this paper we model probabilistic packet marking ppm schemes for ip traceback as an identification problem of large number of markers each potential marker is associated with distribution on tags which are short binary strings to mark packet marker follows its associated distribution in choosing the tag to write in the ip header since there are large number of for example over markers what the victim receives are samples from mixture of distributions essentially traceback aims to identify individual distribution contributing to the mixture guided by this model we propose random packet marking rpm scheme that uses simple but effective approach rpm does not require sophisticated structure relationship among the tags and employs hop by hop reconstruction similar to ams simulations show improved scalability and traceback accuracy over prior works for example in large network with over nodes markers induce of false positives in terms of edges identification using the ams marking scheme while rpm lowers it to the effectiveness of rpm demonstrates that with prior knowledge of neighboring nodes simple and properly designed marking scheme suffices in identifying large number of markers with high accuracy
sciff is declarative language based on abductive logic programming that accommodates forward rules predicate definitions and constraints over finite domain variables its abductive declarative semantics can be related to that of deontic operators its operational specification is the sound and complete sciff proof procedure defined as set of transition rules implemented and integrated into reasoning and verification tool variation of the sciff proof procedure sciff can be used for static verification of contract properties the use of sciff for business contract specification and verification is demonstrated in concrete scenario encoding of sciff contract rules in ruleml accommodates integration of sciff with architectures for business contracts
in capital market surveillance an emerging trend is that group of hidden manipulators collaborate with each other to manipulate three trading sequences buy orders sell orders and trades through carefully arranging their prices volumes and time in order to mislead other investors affect the instrument movement and thus maximize personal benefits if the focus is on only one of the above three sequences in attempting to analyze such hidden group based behavior or if they are merged into one sequence as per an investor the coupling relationships among them indicated through trading actions and their prices volumes times would be missing and the resulting findings would have high probability of mismatching the genuine fact in business therefore typical sequence analysis approaches which mainly identify patterns on single sequence cannot be used here this paper addresses novel topic namely coupled behavior analysis in hidden groups in particular we propose coupled hidden markov models hmm based approach to detect abnormal group based trading behaviors the resulting models cater for multiple sequences from group of people interactions among them sequence item properties and significant change among coupled sequences we demonstrate our approach in detecting abnormal manipulative trading behaviors on orderbook level stock data the results are evaluated against alerts generated by the exchange’s surveillance system from both technical and computational perspectives it shows that the proposed coupled and adaptive hmms outperform standard hmm only modeling any single sequence or the hmm combining multiple single sequences without considering the coupling relationship further work on coupled behavior analysis including coupled sequence event analysis hidden group analysis and behavior dynamics is critical
in order to meet the high throughput requirements of applications exhibiting high ilp vliw asips may increasingly include large numbers of functional units fus unfortunately switching data through register files shared by large numbers of fus quickly becomes dominant cost performance factor suggesting that clustering smaller number of fus around local register files may be beneficial even if data transfers are required among clusters with such machines in mind we propose compiler transformation predicated switching which enables aggressive speculation while leveraging the penalties associated with inter cluster communication to achieve gains in performance based on representative benchmarks we demonstrate that this novel technique is particularly suitable for application specific clustered machines aimed at supporting high ilp as compared to state of the art approaches
we propose simple solution to the problem of efficient stack evaluation of lru multiprocessor cache memories with arbitrary set associative mapping it is an extension of the existing stack evaluation techniques for all set associative lru uniprocessor caches special marker entries are used in the stack to represent data blocks or lines deleted by an invalidation based cache coherence protocol method of marker splitting is employed when data block below marker in the stack is accessed using this technique one pass trace evaluation of memory access trace yields hit ratios for all cache sizes and set associative mappings of multiprocessor caches in single pass over memory reference trace simulation experiments on some multiprocessor trace data show an order of magnitude speed up in simulation time using this one pass technique
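For orientation, here is a minimal one-pass LRU stack simulation for a uniprocessor fully associative cache: a single traversal of the trace yields hit ratios for every cache size at once. The marker entries and marker splitting that handle coherence invalidations in the multiprocessor scheme above are deliberately omitted.

```python
from collections import defaultdict

def stack_distances(trace):
    stack, hits = [], defaultdict(int)    # hits[d]: accesses with stack distance d
    for block in trace:
        if block in stack:
            d = stack.index(block)        # distance from the top (0 = most recent)
            hits[d] += 1
            stack.pop(d)
        stack.insert(0, block)            # move/push the block to the MRU position
    return hits, len(trace)

trace = ["a", "b", "c", "a", "b", "d", "a", "c"]
hits, n = stack_distances(trace)
for size in (1, 2, 3, 4):
    h = sum(c for d, c in hits.items() if d < size)
    print(f"cache of {size} blocks: hit ratio {h / n:.2f}")
```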
focused places of photo act as significant cue for image concept discovery and quality assessment therefore to find them is an important issue in this paper we design focusing degree detector by which focusing degree map is generated for photograph the results could be used to obtain focused places of photographs as concrete example of their applications image retrieval and image quality assessment are investigated in this work the experimental results show that the retrieval algorithm based on this detector and map can get more accurate retrieval results and the proposed assessment algorithm has high ability to discriminate photos from low quality to high quality
shared nothing clusters are well known and cost effective approach to database server scalability in particular with highly intensive read only workloads typical of many tier web based applications the common reliance on centralized component and simplistic propagation strategy employed by mainstream solutions however leads to poor scalability with traditional on line transaction processing oltp where the update ratio is high such approaches also pose an additional obstacle to high availability while introducing single point of failure more recently database replication protocols based on group communication have been shown to overcome such limitations expanding the applicability of shared nothing clusters to more demanding transactional workloads these take simultaneous advantage of total order multicast and transactional semantics to improve on mainstream solutions however none has yet been widely deployed in general purpose database management system in this paper we argue that major hurdle for their acceptance is that these proposals have disappointing performance with specific subsets of real world workloads such limitations are deep rooted and working around them requires in depth understanding of protocols and changes to applications we address this issue with novel protocol that combines multiple transaction execution mechanisms and replication techniques and then show how it avoids the identified pitfalls experimental results are obtained with workload based on the industry standard tpc benchmark
when using rewrite techniques for termination analysis of programs one of the main problems is pre defined data types like integers we extend term rewriting by built in integers and adapt the dependency pair framework to prove termination of integer term rewriting automatically
the peer to peer pp approaches utilize the passive means of delivering large amounts of content due to the ease of content delivery and small amount of effort required for file transmission and failure control by the file provider however pp methods incur network congestion due to the large number of uncontrolled connections established by those retrieving content additionally pp methods potentially stop file providers from exchanging data with other devices because of congested outbound traffic therefore existing pp approaches are infeasible for large scale organizations as congested uncontrolled network is not ideal in business environment related studies demonstrated that the active method for content delivery can minimize outbound congestion for the provider but it needs fault tolerance this study presents novel application called enterprise peer to peer epp which integrates passive and active approaches using epp the outbound traffic from sender on an intranet can be controlled efficiently because the transmission between provider and many receivers can be changed from passive to active when necessary therefore we suggest organizations use epp for the delivery of content
data dissemination in decentralized networks is often realized by using some form of swarming technique swarming enables nodes to gather dynamically in order to fulfill certain task collaboratively and to exchange resources typically pieces of files or packets of multimedia data stream as in most distributed systems swarming applications face the problem that the nodes in network have heterogeneous capabilities or act selfishly we investigate the problem of efficient live data dissemination eg tv streams in swarms the live streams should be distributed in such way that only nodes with sufficiently large contributions to the system are able to fully receive it even in the presence of freeloading nodes or nodes that upload substantially less than required to sustain the multimedia stream in contrast uncooperative nodes cannot properly receive the data stream as they are unable to fill their data buffers in time incentivizing fair sharing of resources if the number of selfish nodes increases our emulation results reveal that the situation steadily deteriorates for them while obedient nodes continue to receive virtually all packets in time
we consider the problem of private efficient data mining of vertically partitioned databases each of several parties holds column of data matrix vector and the parties want to investigate the componentwise combination of their vectors the parties want to minimize communication and local computation while guaranteeing privacy in the sense that no party learns more than necessary sublinear communication private protocols have primarily been studied only in the two party case in contrast this work focuses on multi party settings first we give efficient private multiparty protocols for sampling row of the data matrix and for computing arbitrary functions of random row where the row index is additively shared among two or more parties these results can be used to obtain private approximation protocols for several useful combination functionalities moreover these results have some interesting consequences for the general problem of reducing sublinear communication secure multiparty computation to two party private information retrieval pir second we give protocols for computing approximations summaries of the componentwise sum minimum and maximum of the columns here while providing weaker privacy guarantee where the approximation may leak up to the entire output vector our protocols are extremely efficient in particular the required cryptographic overhead compared to non private solutions is polylogarithmic in the number of rows
we consider the use of database cluster for high performance support of online analytical processing olap applications olap intra query parallelism can be obtained by partitioning the database tables across cluster nodes we propose to combine physical and virtual partitioning into partitioning scheme called adaptive hybrid partitioning ahp ahp requires less disk space while allowing for load balancing we developed prototype for olap parallel query processing in database clusters using ahp our experiments on node database cluster using the tpc benchmark demonstrate linear and super linear speedup thus ahp can reduce significantly the execution time of typical olap queries
java will include type system called jsr that supports parametric polymorphism or generic classes this will bring many benefits to java programmers not least because current java practice makes heavy use of logically generic classes including container classes translation of java source code into semantically equivalent jsr source code requires two steps parameterization adding type parameters to class definitions and instantiation adding the type arguments at each use of parameterized class parameterization need be done only once for class whereas instantiation must be performed for each client of which there are potentially many more therefore this work focuses on the instantiation problem we present technique to determine sound and precise jsr types at each use of class for which generic type specification is available our approach uses precise and context sensitive pointer analysis to determine possible types at allocation sites and set constraint based analysis that incorporates guarded or conditional constraints to choose consistent types for both allocation and declaration sites the technique handles all features of the jsr type system notably the raw types that provide backward compatibility we have implemented our analysis in tool that automatically inserts type parameters into java code and we report its performance when applied to number of real world java programs
we present timing driven partitioning and simulated annealing based placement algorithms together with detailed routing tool for fpga integration the circuit is first divided into layers with limited number of inter layer vias and then placed on individual layers while minimizing the delay of critical paths we use our tool as platform to explore the potential benefits in terms of delay and wire length that technologies can offer for fpga fabrics experimental results show on average total decrease of in wire length and in delay can be achieved over traditional chips when five layers are used in integration
the exploitation of parallelism among traces ie hot paths of execution in programs is novel approach to the automatic parallelization of java programs and it has many advantages however to date the extent to which parallelism exists among traces in programs has not been made clear the goal of this study is to measure the amount of trace level parallelism in several java programs we extend the jupiter java virtual machine with simulator that models an abstract parallel system we use this simulator to measure trace level parallelism we further use it to examine the effects of the number of processors trace window size and communication type and cost on performance our results indicate that enough trace level parallelism exists for modest number of processors thus we conclude that trace based parallelization is potentially viable approach to improve the performance of java programs
various types of security goals such as authentication or confidentiality can be defined as policies for service oriented architectures typically in manual fashion therefore we foster model driven transformation approach from modelled security goals in the context of process models to concrete security implementations we argue that specific types of security goals may be expressed in graphical fashion at the business process modelling level which in turn can be transformed into corresponding access control and security policies in this paper we present security policy and policy constraint models we further discuss translation of security annotated business processes into platform specific target languages such as xacml or axis security configurations to demonstrate the suitability of this approach an example transformation is presented based on an annotated process
we consider the following problem which is called the half integral disjoint paths packing input graph pair of vertices … sk tk in which are sometimes called terminals output paths … pk in such that pi joins si and ti for … and in addition each vertex is on at most two of these paths we present an log time algorithm for this problem for fixed this improves result by kleinberg who gave an algorithm for this problem in fact we also have algorithms running in epsilon time for any epsilon for these problems if is up to log log for general graphs up to log log log for planar graphs and up to log log log for graphs on the surface where is the euler genus furthermore if is fixed then we have linear time algorithms for the planar case and for the bounded genus case we also obtain log algorithms for several optimization problems related to the bounded unsplittable flow problem when the number of terminal pairs is bounded these results can all carry over to problems involving edge capacities
distance estimation is fundamental for many functionalities of wireless sensor networks and has been studied intensively in recent years critical challenge in distance estimation is handling anisotropic problems in sensor networks compared with isotropic networks anisotropic networks are more intractable in that their properties vary according to the directions of measurement anisotropic properties result from various factors such as geographic shapes irregular radio patterns node densities and impacts from obstacles in this paper we study the problem of measuring irregularity of sensor networks and evaluating its impact on distance estimation in particular we establish new metric to measure irregularity along path in sensor networks and identify turning nodes where considered path is inflected furthermore we develop an approach to construct virtual ruler for distance estimation between any pair of sensor nodes the construction of virtual ruler is carried out according to distance measurements among beacon nodes however it does not require beacon nodes to be deployed uniformly throughout sensor networks compared with existing methods our approach neither assumes global knowledge of boundary recognition nor relies on uniform distribution of beacon nodes therefore this approach is robust and applicable in practical environments simulation results show that our approach outperforms some previous methods such as dvdistance and pdm
we present topology synthesis method for high performance system on chip soc design our method provides an optimal topology of on chip communication network for the given bandwidth latency frequency and or area constraints the optimal topology consists of multiple crossbar switches and some of them can be connected in cascaded fashion for higher clock frequency and or area efficiency compared to previous works the major contribution of our work is the exactness of the solution from two aspects first the solving method of our work is exact by employing the mixed integer linear programming milp method second we generalize the crossbar switch representation in milp in order that the optimal topology can include any arbitrary sizes of crossbar switches together the experimental results show that the topologies optimized for the clock frequency area give up to improvements compared to the conventional single large crossbar switch networks for two industrial strength soc designs
we present an overview of the saturn program analysis system including rationale for three major design decisions the use of function at a time or summary based analysis the use of constraints and the use of logic programming language to express program analysis algorithms we argue that the combination of summaries and constraints allows saturn to achieve both great scalability and great precision while the use of logic programming language with constraints allows for succinct high level expression of program analyses
the objective of control generation in logic programming is to derive computation rule for program that is efficient and yet does not compromise program correctness progress in solving this fundamental problem in logic programming has been slow and to date only partial solutions have been proposed previously proposed schemes are either inefficient incomplete incorrect or difficult to apply for programs consisting of many components the scheme is not modular this paper shows how the control generation problem can be tackled by program transformation the transformation relies on information about the depths of derivations to derive delay declarations which orchestrate the control to prove correctness of the transformation the notion of semi delay recurrency is introduced which generalises previous ideas in the termination literature for reasoning about logic programs with delay declarations in contrast to previous work semi delay recurrency does not require an atom to be completely resolved before another is selected for reduction this enhancement permits the transformation to introduce control which is flexible and relatively efficient
time varying volumetric data arise in variety of application domains and thus several techniques for dealing with such data have been proposed in the literature time varying dataset is typically modeled either as collection of discrete snapshots of volumetric data or as four dimensional dataset this choice influences the operations that can be efficiently performed on such data here we classify the various approaches to modeling time varying scalar fields and briefly describe them since most models of time varying data have been abstracted from well known approaches to volumetric data we review models of volumetric data as well as schemes to accelerate isosurface extraction and discuss how these approaches have been applied to time varying datasets finally we discuss multi resolution approaches which allow interactive processing and visualization of large time varying datasets
many claims have been made about the consequences of not documenting design rationale the general perception is that designers and architects usually do not fully understand the critical role of systematic use and capture of design rationale however there is to date little empirical evidence available on what design rationale mean to practitioners how valuable they consider it and how they use and document it during the design process this paper reports survey of practitioners to probe their perception of the value of design rationale and how they use and document the background knowledge related to their design decisions based on valid responses this study has discovered that practitioners recognize the importance of documenting design rationale and frequently use them to reason about their design choices however they have indicated barriers to the use and documentation of design rationale based on the findings we conclude that further research is needed to develop methodology and tool support for design rationale capture and usage furthermore we put forward some specific research questions about design rationale that could be further investigated to benefit industry practice
policy authors typically reconcile several different mental models and goals such as enabling collaboration securing information and conveying trust in colleagues the data underlying these models such as which roles are more trusted than others isn’t generally used to define policy rules as result policy management environments don’t gather this information in turn they fail to exploit it to help users check policy decisions against their multiple perspectives we present model of triangulating authoring environments that capture the data underlying these different perspectives and iteratively sanity check policy decisions against this information while editing we also present tool that consumes instances of the model and automatically generates prototype authoring tools for the described domain
news articles contain wealth of implicit geographic content that if exposed to readers improves understanding of today’s news however most articles are not explicitly geotagged with their geographic content and few news aggregation systems expose this content to users new system named newsstand is presented that collects analyzes and displays news stories in map interface thus leveraging on their implicit geographic content newsstand monitors rss feeds from thousands of online news sources and retrieves articles within minutes of publication it then extracts geographic content from articles using custom built geotagger and groups articles into story clusters using fast online clustering algorithm by panning and zooming in newsstand’s map interface users can retrieve stories based on both topical significance and geographic region and see substantially different stories depending on position and zoom level
we model the monadic second order logic mso evaluation problem on finite colored trees in purely database theoretic framework based on the well known mso automata connection we reduce the problem to an acyclic conjunctive query evaluation problem on the one hand and to monadic datalog evaluation problem on the other hand this approach offers the possibility to solve the mso problem using optimized evaluation methods for relational algebra expressions and for datalog programs such as yannakakis algorithm and the rewriting method using resolution based filtering referred to as magic sets method in we use these methods for evaluating our queries and giving estimates of their complexity this is the first time to our knowledge that solution to the mso evaluation problem related to relational algebra is given furthermore thanks to this reduction we prove that the automata based algorithm given in constitutes particular instance of yannakakis algorithm besides the optimized database methods that we propose for solving the mso evaluation problem our results prove that mso definable queries over colored trees are datalog definable this result subsumes the corresponding result in which states that unary mso queries are monadic datalog definable and it also subsumes the well known result that any mso definable class of trees is monadic datalog definable
the effect of caching is fully determined by the program locality or the data reuse and several cache management techniques try to base their decisions on the prediction of temporal locality in programs however prior work reports only rough techniques which either try to predict when cache block loses its temporal locality or try to categorize cache items as highly or poorly temporal in this work we quantify the temporal characteristics of the cache block at run time by predicting the cache block reuse distances measured in intervening cache accesses based on the access patterns of the instructions pcs that touch the cache blocks we show that an instruction based reuse distance predictor is very accurate and allows approximation of optimal replacement decisions since we can see the future we experimentally evaluate our prediction scheme in caches of various sizes using subset of the most memory intensive spec benchmarks our proposal obtains significant improvement in terms of ipc over traditional lru up to on average and it also outperforms the previous state of the art proposal namely dynamic insertion policy or dip by up to on average
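The sketch below illustrates a PC-indexed reuse-distance predictor under simple assumptions: the predictor keeps a decayed average of reuse distances observed per instruction, and replacement evicts the block whose last-touch PC predicts the most distant reuse. The table organization, decay factor, and replacement heuristic are assumptions, not the paper's exact design.

```python
class ReusePredictor:
    def __init__(self, alpha=0.5):
        self.table = {}                   # pc -> predicted reuse distance (in accesses)
        self.alpha = alpha

    def observe(self, pc, measured_distance):
        old = self.table.get(pc, measured_distance)
        self.table[pc] = (1 - self.alpha) * old + self.alpha * measured_distance

    def predict(self, pc, default=10**9):
        return self.table.get(pc, default)

pred = ReusePredictor()
pred.observe(pc=0x40a, measured_distance=12)
pred.observe(pc=0x40a, measured_distance=20)
pred.observe(pc=0x7f0, measured_distance=300)

# Replacement sketch: evict the resident block whose last-touch PC predicts the
# longest time until reuse (an approximation of Belady's optimal choice).
resident = {"blk1": 0x40a, "blk2": 0x7f0}
victim = max(resident, key=lambda b: pred.predict(resident[b]))
print(victim)   # blk2: its last-touch PC suggests a distant reuse
```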
web information gathering suffers from the problems of information mismatching and overloading in an attempt to solve these fundamental problems many works have proposed to use concept based techniques to perform personalized information gathering for web users these works have significantly improved the performance of web information gathering systems in this paper survey is conducted on these works the reviewed studies report that the concept based personalized techniques can gather more useful and meaningful information for web users the survey also suggests that improvement is needed for the representation and acquisition of user profiles in personalized web information gathering
this paper addresses the wide gap in space complexity of atomic multi writer multi reader register implementations while the space complexity of all previous implementations is linear the lower bounds are logarithmic we present three implementations which close this gap the first implementation is sequential and its role is to present the idea and data structures used in the second and third implementations the second and third implementations are both concurrent the second uses multi reader physical registers while the third uses single reader physical registers both the second and third implementations are optimal with respect to the two most important complexity criteria their space complexity is logarithmic and their time complexity is linear
although many high performance computer systems are now multiprocessor based little work has been done in real time concurrency control of transaction executions in multiprocessor environment real time concurrency control protocols designed for uniprocessor or distributed environments may not fit the needs of multiprocessor based real time database systems because of lower concurrency degree of transaction executions and larger number of priority inversions this paper proposes the concept of priority cap to bound the maximum number of priority inversions in multiprocessor based real time database systems to meet transaction deadlines we also explore the concept of two version data to increase the system concurrency level and to explore the abundant computing resources of multiprocessor computer systems the capability of the proposed methodology is evaluated in multiprocessor real time database system under different workloads database sizes and processor configurations it is shown that the benefits of priority cap in reducing the blocking time of urgent transactions far outweigh the loss in committing less urgent transactions the idea of two version data also greatly improves the system performance because of much higher concurrency degree in the system
we discuss four different core protocols for synchronizing access to and modifications of xml document collections these core protocols synchronize structure traversals and modifications they are meant to be integrated into native xml base management system xbms and are based on two phase locking we also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results furthermore we also discuss extensions of these core protocols to full fledged protocols further we show how to achieve higher degree of concurrency by exploiting the semantics expressed in document type definitions dtds
rendezvous is conference call solution that leverages voice over ip enterprise calendaring instant messaging and rich client functionality to enhance the user experience and effectiveness of distributed meetings we describe the service and two of its user experience innovations the conference call proxy and ihelp which function as digital backchannels we present results from preliminary user evaluation and discuss our notion of digital backchannels with respect to the social translucence framework
in this paper we study the overall link based spam structure and its evolution which would be helpful for the development of robust analysis tools and research for web spamming as social activity in the cyber space first we use strongly connected component scc decomposition to separate many link farms from the largest scc so called the core we show that denser link farms in the core can be extracted by node filtering and recursive application of scc decomposition to the core surprisingly we can find new large link farms during each iteration and this trend continues until at least iterations in addition we measure the spamicity of such link farms next the evolution of link farms is examined over two years results show that almost all large link farms do not grow anymore while some of them shrink and many large link farms are created in one year
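A sketch of the core/link-farm separation on a toy graph is shown below using networkx; the strongly connected component decomposition and the filter-then-decompose iteration follow the description above, while the degree threshold is arbitrary and the spamicity measurement is not modeled.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "a"), ("a", "c"),  # dense triangle
    ("a", "d"), ("d", "a"),                                                   # loosely attached
    ("c", "x"), ("x", "y"), ("y", "x"),                                       # a small link farm
])

def largest_scc(graph):
    return max(nx.strongly_connected_components(graph), key=len)

core = largest_scc(g)
print("core:", core)

# Recursive step: filter low-degree nodes inside the core and decompose again,
# exposing denser components that were hiding inside the previous core.
sub = g.subgraph(core).copy()
sub.remove_nodes_from([v for v in list(sub) if sub.degree(v) < 3])
if len(sub):
    print("next-level components:", list(nx.strongly_connected_components(sub)))
```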
there is growing realization that modern database management systems dbmss must be able to manage data that contains uncertainties that are represented in the form of probabilistic relations consequently the design of each core dbms component must be revisited in the presence of uncertain and probabilistic information in this paper we study how to build histogram synopses for probabilistic relations for the purposes of enabling both dbms internal decisions such as indexing and query planning and possibly user facing approximate query processing tools in contrast to initial work in this area our probabilistic histograms retain the key possible worlds semantics of probabilistic data allowing for more accurate yet concise representation of the uncertainty characteristics of data and query results we present variety of techniques for building optimal probabilistic histograms each one tuned to different choice of approximation error metric we show that these can be incorporated into general dynamic programming dp framework which generalizes that used for existing histogram constructions the end result is histogram where each bucket is approximately represented by compact probability distribution function pdf which can be used as the basis for query planning and approximate query answering we present novel polynomial time algorithms to find optimal probabilistic histograms for variety of pdf error metrics including variation distance sum squared error max error and emd our experimental study shows that our probabilistic histogram synopses can accurately capture the key statistical properties of uncertain data while being much more compact to store and work with than the original uncertain relations
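As a simplified illustration of optimal histogram construction by dynamic programming, the sketch below partitions ordinary point values into buckets minimizing sum squared error; the paper's probabilistic histograms would instead store a compact PDF per bucket and optimize probabilistic error metrics such as variation distance or EMD, which this sketch does not capture.

```python
def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def optimal_histogram(values, buckets):
    n = len(values)
    INF = float("inf")
    cost = [[INF] * (buckets + 1) for _ in range(n + 1)]
    cut = [[0] * (buckets + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for b in range(1, buckets + 1):
            for j in range(b - 1, i):                 # last bucket covers values[j:i]
                c = cost[j][b - 1] + sse(values[j:i])
                if c < cost[i][b]:
                    cost[i][b], cut[i][b] = c, j
    # recover bucket boundaries from the cut table
    bounds, i, b = [], n, buckets
    while b > 0:
        j = cut[i][b]
        bounds.append((j, i))
        i, b = j, b - 1
    return cost[n][buckets], list(reversed(bounds))

data = [1, 1, 2, 9, 10, 10, 30, 31]
print(optimal_histogram(data, 3))   # three buckets: [0:3], [3:6], [6:8]
```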
background finding relevant articles from pubmed is challenging because it is hard to express the user’s specific intention in the given query interface and keyword query typically retrieves large number of results researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function however the process of learning and ranking is usually done offline without being integrated with the keyword queries and the users have to provide large amount of training documents to get reasonable learning accuracy this paper proposes novel multi level relevance feedback system for pubmed called refmed which supports both ad hoc keyword queries and multi level relevance feedback in real time on pubmed results refmed supports multi level relevance feedback by using the ranksvm as the learning method and thus it achieves higher accuracy with less feedback refmed tightly integrates the ranksvm into rdbms to support both keyword queries and the multi level relevance feedback in real time the tight coupling of the ranksvm and dbms substantially improves the processing time an efficient parameter selection method for the ranksvm is also proposed which tunes the ranksvm parameter without performing validation thereby refmed achieves high learning accuracy in real time without performing validation process refmed is accessible at http://dm.postech.ac.kr/refmed conclusions refmed is the first multi level relevance feedback system for pubmed which achieves high accuracy with less feedback it effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time
as computing becomes more pervasive information sharing occurs in broad highly dynamic network based environments such pervasive computing environments pose difficult challenge in formally accessing the resources the digital information generally represents sensitive and confidential information that organizations must protect and allow only authorized personnel to access and manipulate them as organizations implement information strategies that call for sharing access to resources in the networked environment mechanisms must be provided to protect the resources from adversaries in this paper we seek to address the issue of how to advocate selective information sharing while minimizing the risks of unauthorized access we integrate role based delegation framework to propose system architecture we also demonstrate the feasibility of our framework through proof of concept implementation
the capability of the random access machine ram to execute any instruction in constant time is not realizable due to fundamental physical constraints on the minimum size of devices and on the maximum speed of signals this work explores how well the ideal ram performance can be approximated for significant classes of computations by machines whose building blocks have constant size and are connected at constant distance novel memory structure is proposed which is pipelined can accept new request at each cycle and hierarchical exhibiting optimal latency equals to address in dimensional realizations in spite of block transfer or other memory pipeline capabilities number of previous machine models do not achieve full overlap of memory accesses these are examples of machines with explicit data movement it is shown that there are direct flow computations without branches and indirect accesses that require time superlinear in the number of instructions on all such machines to circumvent the explicit data movement constraints the speculative prefetcher sp and the speculative prefetcher and evaluator spe processors are developed both processors can execute any direct flow program in linear time the spe also executes in linear time class of loop programs that includes many significant algorithms even quicksort somewhat irregular recursive algorithm admits linear time spe implementation relation between instructions called address dependence is introduced which limits memory access overlap and can lead to superlinear time as illustrated with the classical merging algorithm
software archives are one of the best sources available to researchers for understanding the software development process however much detective work is still necessary in order to unravel the software development story during this process researchers must isolate changes and follow their trails over time in support of this analysis several research tools have provided different representations for connecting the many changes extracted from software archives most of these tools are based on textual analysis of source code and use line based differencing between software versions this approach limits the ability to process changes structurally resulting in less concise and comparable items adoption of structure based approaches have been hampered by complex implementations and overly verbose change descriptions we present technique for expressing changes that is fine grained but preserves some structural aspects the structural information itself may not have changed but instead provides context for interpreting the change this in turn enables more relevant and concise descriptions in terms of software types and programming activities we apply our technique to common challenges that researchers face and then we discuss and compare our results with other techniques
can follow concurrency control permits transaction to read write an item write locked read locked by another transaction with almost no delays by combining the merits of pl and vpl this approach mitigates the lock contention not only between update and read only transactions but also between update and update transactions
while join processing in wireless sensor networks has received lot of attention recently current solutions do not work well for continuous queries in those networks however continuous queries are the rule to minimize the communication costs of join processing it is important to not ship non joining tuples in order to know which tuples do not join prior work has proposed precomputation step for continuous queries however repeating the precomputation for each execution is unnecessary and leaves aside that data tends to be temporally correlated in this paper we present filtering approach for the processing of continuous join queries we propose to keep the filters and to maintain them the problems are determining the sizes of the filters and deciding which filters to update simplistic approaches result in bad performance we show how to compute solutions that are optimal experiments on real world sensor data indicate that our method performs close to theoretical optimum and consistently outperforms state of the art join approaches
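The toy sketch below shows the filtering idea for a continuous equi-join: the node producing one relation keeps a range filter summarizing the other relation's current join values and ships a tuple only if it can possibly join. The naive widen-on-change maintenance here is an assumption; deciding which filters to update and when, which is the hard part addressed above, is not modeled.

```python
class RangeFilter:
    def __init__(self, lo=float("inf"), hi=float("-inf")):
        self.lo, self.hi = lo, hi

    def may_join(self, value):
        return self.lo <= value <= self.hi

    def update(self, values):             # costs one (batched) message in this model
        self.lo, self.hi = min(values), max(values)

filter_of_S = RangeFilter()               # held by the node producing R tuples
current_S_values = [18, 22, 25]
filter_of_S.update(current_S_values)

shipped, suppressed = [], 0
for r_value in [3, 19, 22, 40, 25]:
    if filter_of_S.may_join(r_value):
        shipped.append(r_value)           # ship: it may find a partner on the S side
    else:
        suppressed += 1                   # non-joining tuple never leaves the node

print(shipped, "suppressed:", suppressed)  # [19, 22, 25] suppressed: 2
```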
we introduce system fc which extends system with support for non syntactic type equality there are two main extensions explicit witnesses for type equalities and ii open non parametric type functions given meaning by top level equality axioms unlike system fc is expressive enough to serve as target for several different source language features including haskell’s newtype generalised algebraic data types associated types functional dependencies and perhaps more besides
common task in many text mining applications is to generate multi faceted overview of topic in text collection such an overview not only directly serves as an informative summary of the topic but also provides detailed view of navigation to different facets of the topic existing work has cast this problem as categorization problem and requires training examples for each facet this has three limitations all facets are predefined which may not fit the need of particular user training examples for each facet are often unavailable such an approach only works for predefined type of topics in this paper we break these limitations and study more realistic new setup of the problem in which we would allow user to flexibly describe each facet with keywords for an arbitrary topic and attempt to mine multi faceted overview in an unsupervised way we attempt probabilistic approach to solve this problem empirical experiments on different genres of text data show that our approach can effectively generate multi faceted overview for arbitrary topics the generated overviews are comparable with those generated by supervised methods with training examples they are also more informative than unstructured flat summaries the method is quite general thus can be applied to multiple text mining tasks in different application domains
unified modeling language uml has emerged as the software industry’s dominant modeling language it is the de facto modeling language standard for specifying visualizing constructing and documenting the components of software systems despite its prominence and status as the standard modeling language uml has its critics opponents argue that it is complex and difficult to learn some question the rationale of having nine diagramming techniques in uml and the raison d’être of those nine techniques in uml others point out that uml lacks comprehensive methodology to guide its users which makes the language even more convoluted few studies on uml can be found in the literature however no study exists to provide quantitative measure of uml complexity or to compare uml with other object oriented techniques in this research we evaluate the complexity of uml using complexity metrics the objective is to provide reliable and accurate quantitative measure of uml complexity comparison of the complexity metrical values of uml with other object oriented techniques was also carried out our findings suggest that each diagram in uml is not distinctly more complex than techniques in other modeling methods but as whole uml is very complex times more complex than other modeling methods
the peak heap consumption of program is the maximum size of the live data on the heap during the execution of the program ie the minimum amount of heap space needed to run the program without exhausting the memory it is well known that garbage collection gc makes the problem of predicting the memory required to run program difficult this paper presents to the best of our knowledge the first live heap space analysis for garbage collected languages which infers accurate upper bounds on the peak heap usage of program’s execution that are not restricted to any complexity class ie we can infer exponential logarithmic polynomial etc bounds our analysis is developed for a sequential object oriented bytecode language with scoped memory manager that reclaims unreachable memory when methods return we also show how our analysis can accommodate other gc schemes which are closer to the ideal gc which collects objects as soon as they become unreachable the practicality of our approach is experimentally evaluated on prototype implementation we demonstrate that it is fully automatic reasonably accurate and efficient by inferring live heap space bounds for standardized set of benchmarks the jolden suite
an increasing number of applications use xml data published from relational databases for speed and convenience such applications routinely cache this xml data locally and access it through standard navigational interfaces such as dom sacrificing the consistency and integrity guarantees provided by dbms for speed the rolex system is being built to extend the capabilities of relational database systems to deliver fast consistent and navigable xml views of relational data to an application via virtual dom interface this interface translates navigation operations on dom tree into execution plan actions allowing spectrum of possibilities for lazy materialization the rolex query optimizer uses characterization of the navigation behavior of an application and optimizes view queries to minimize the expected cost of that navigation this paper presents the architecture of rolex including its model of query execution and the query optimizer we demonstrate with performance study the advantages of the rolex approach and the importance of optimizing query execution for navigation
collaborative filtering cf has been studied extensively in the literature and is demonstrated successfully in many different types of personalized recommender systems in this paper we propose unified method combining the latent and external features of users and items for accurate recommendation mapping scheme for collaborative filtering problem to text analysis problem is introduced and the probabilistic latent semantic analysis was used to calculate the latent features based on the historical rating data the main advantages of this technique over standard memory based methods are the higher accuracy constant time prediction and an explicit and compact model representation the experimental evaluation shows that substantial improvements in accuracy over existing methods can be obtained
software engineers informally use block diagrams with boxes and lines to express system architectures diagrammatic representations of this type are also found in many specification techniques however rarely are architectural documents containing such representations systematically maintained as system evolves architectural documents become obsolete and the design history of the system is ultimately lost additionally box and line representations used in these documents do not possess precise semantics invariant across the different techniques that rely on them this paper addresses expression of system evolution at the architectural level based on formal model of box and line diagrams the formal model provides semantic uniformity and precision and allows evolutionary steps to be represented as structural transformations interesting classes of such transformations are characterized in terms of the underlying operators with these tools the architectural evolution of system is captured as directed acyclic graph of baselines where each baseline consists of system of box and line diagrams and is mapped to successor baseline by set of structural transformations it is also shown how familiar design concepts such as extension abstraction and structural refinement can be formalized in simple terms within the framework developed
presents parallel hash join algorithm that is based on the concept of hierarchical hashing to address the problem of data skew the proposed algorithm splits the usual hash phase into hash phase and an explicit transfer phase and adds an extra scheduling phase between these two during the scheduling phase heuristic optimization algorithm using the output of the hash phase attempts to balance the load across the multiple processors in the subsequent join phase the algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary assigning each of them to an optimal number of processors assuming for concreteness zipf like distribution of the values in the join column join phase which is cpu bound and shared nothing environment the algorithm is shown to achieve good join phase load balancing and to be robust relative to the degree of data skew and the total number of processors the overall speedup due to this algorithm is compared to some existing parallel hash join methods the proposed method does considerably better in high skew situations
this study investigated use of collaborative recommendations in web searching an experimental system was designed in the experimental system recommendations were generated in group report format including items judged relevant by previous users search queries and the urls of documents the study explored how users used these items the effects of their use and what factors contributed to this use the results demonstrate that users preferred using queries and document sources urls rather than relevance judgment document ratings the findings also show that using recommended items had significant effect on the number of documents viewed but not on precision or number of queries task difficulty and search skills had significant impact on the use possible reasons for the results are analyzed implications and future directions are discussed
interpreting the relevance of user contributed tag with respect to the visual content of an image is an emerging problem in social image retrieval in the literature this problem is tackled by analyzing the correlation between tags and images represented by specific visual features unfortunately no single feature represents the visual content completely eg global features are suitable for capturing the gist of scenes while local features are better for depicting objects to solve the problem of learning tag relevance given multiple features we introduce in this paper two simple and effective methods one is based on the classical borda count and the other is method we name uniformtagger both methods combine the output of many tag relevance learners driven by diverse features in an unsupervised rather than supervised manner experiments on million social tagged images and two test sets verify our proposal using learned tag relevance as updated tag frequency for social image retrieval both borda count and uniformtagger outperform retrieval without tag relevance learning and retrieval with single feature tag relevance learning moreover the two unsupervised methods are comparable to state of the art supervised alternative but without the need of any training data
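a small sketch of the borda count fusion step named in the abstract: each per feature tag relevance learner contributes a ranking and tags collect points by rank position; the example rankings are made up

```python
def borda_fuse(rankings):
    """Combine several rankings of the same tag set by Borda count:
    a tag ranked r-th in a list of length n earns n - r points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, tag in enumerate(ranking):
            scores[tag] = scores.get(tag, 0) + (n - position)
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical outputs of tag relevance learners driven by different features
global_feature_ranking = ["beach", "sunset", "dog", "car"]
local_feature_ranking = ["dog", "beach", "car", "sunset"]
print(borda_fuse([global_feature_ranking, local_feature_ranking]))
```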
recent research suggests that architectural knowledge such as design decisions is important and should be recorded alongside the architecture description different approaches have emerged to support such architectural knowledge ak management activities however there are different notions of and emphasis on what and how architectural activities should be supported this is reflected in the design and implementation of existing ak tools to understand the current status of software architecture knowledge engineering and future research trends this paper compares five architectural knowledge management tools and the support they provide in the architecture life cycle the comparison is based on an evaluation framework defined by set of criteria the results of the comparison provide insights into the current focus of architectural knowledge management support their advantages deficiencies and conformance to the current architectural description standard based on the outcome of this comparison research agenda is proposed for future work on ak tools
this paper studies the existence and the regularity of logarithmic harary graphs lhgs this study is motivated by the fact that these graphs are employed for modeling the communication topology to support efficient flooding in the presence of link and node failures when considering an initial arbitrary number of nodes therefore the capability to identify graph constraints that allow the construction of lhgs for the largest number of pairs where is the desired degree of connectivity to be tolerant to failures becomes of primary importance the paper presents several results in that direction we introduce graph constraint namely pasted tree that allows the construction of lhg for every pair such that secondly we present another graph constraint for lhg namely diamond which is equivalent to pasted tree in terms of capability to construct lhgs for any pair the interest of diamond lies in the fact that for given diamond allows us to construct more regular graphs than pasted tree does regular graph shows the minimal number of links required by connected graph leading to minimal flooding cost the paper formally shows in particular that there are an infinite number of pairs such that there exists regular lhg for the pair that satisfies diamond and does not satisfy pasted tree
in this position paper we argue for exploiting the synergy between gossip based algorithms and structured overlay networks son these two strands of research have both aimed at building fault tolerant dynamic self managing and large scale distributed systems despite the common goals the two areas have however been relatively isolated we focus on three problem domains where there is an untapped potential of using gossiping combined with sons we argue for applying gossip based membership for ring based sons such as chord and bamboo to make them handle partition mergers and loopy networks we argue that small world sons such as accordion and mercury are specifically well suited for gossip based membership management the benefits would be better graph theoretic properties finally we argue that gossip based algorithms could use the overlay constructed by sons for example many unreliable broadcast algorithms for sons could be augmented with anti entropy protocols similarly gossip based aggregation could be used in sons for network size estimation and load balancing purposes
with the proliferation of mobile computing the ability to index efficiently the movements of mobile objects becomes important objects are typically seen as moving in two dimensional space which means that their movements across time may be embedded in the three dimensional space further the movements are typically represented as trajectories sequences of connected line segments in certain cases movement is restricted specifically in this paper we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data briefly the idea is to reduce movements to occur in one spatial dimension as consequence the movement occurs in two dimensional space the advantages of considering such lower dimensional trajectories are that the overall size of the data is reduced and that lower dimensional data is to be indexed since off the shelf database management systems typically do not offer higher dimensional indexing this reduction in dimensionality allows us to use existing dbmses to store and index trajectories moreover we argue that given the right circumstances indexing these dimensionality reduced trajectories can be more efficient than using three dimensional index decisive factor here is the fractal dimension of the network the lower the more efficient is the proposed approach this hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks
skyline queries return set of interesting data points that are not dominated on all dimensions by any other point most of the existing algorithms focus on skyline computation in centralized databases and some of them can progressively return skyline points upon identification rather than all in batch processing skyline queries over the web is more challenging task because in many web applications the target attributes are stored at different sites and can only be accessed through restricted external interfaces in this paper we develop pds progressive distributed skylining progressive algorithm that evaluates skyline queries efficiently in this setting the algorithm is also able to estimate the percentage of skyline objects already retrieved which is useful for users to monitor the progress of long running skyline queries our performance study shows that pds is efficient and robust to different data distributions and achieves its progressive goal with minimal overhead
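a toy illustration of the dominance semantics behind skyline queries; this naive block nested loops version only shows what a skyline is, not the progressive distributed pds algorithm of the paper

```python
def dominates(p, q):
    """p dominates q if p is at least as good on every dimension and
    strictly better on at least one (here: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive skyline computation used only to illustrate dominance."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in points if q != p):
            continue
        result.append(p)
    return result

hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 3.0)]  # (price, distance)
print(skyline(hotels))  # (90, 3.0) is dominated and dropped
```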
evaluating and selecting software packages that meet an organization’s requirements is difficult software engineering process selection of wrong software package can turn out to be costly and adversely affect business processes the aim of this paper is to provide basis to improve the process of evaluation and selection of the software packages this paper reports systematic review of papers published in journals and conference proceedings the review investigates methodologies for selecting software packages software evaluation techniques software evaluation criteria and systems that support decision makers in evaluating software packages the key findings of the review are analytic hierarchy process has been widely used for evaluation of the software packages there is lack of common list of generic software evaluation criteria and its meaning and there is need to develop framework comprising of software selection methodology evaluation technique evaluation criteria and system to assist decision makers in software selection
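since the review singles out the analytic hierarchy process as the most widely used evaluation technique, here is a small sketch of how ahp turns a pairwise comparison matrix of criteria into weights via the principal eigenvector; the criteria and judgments are invented for illustration

```python
import numpy as np

def ahp_weights(pairwise):
    """Derive criteria weights from a pairwise comparison matrix via the
    principal eigenvector, as in the analytic hierarchy process."""
    values, vectors = np.linalg.eig(np.asarray(pairwise, dtype=float))
    principal = np.real(vectors[:, np.argmax(np.real(values))])
    return principal / principal.sum()

# hypothetical criteria: functionality, cost, vendor support
# entry [i][j] says how much more important criterion i is than criterion j
pairwise = [
    [1,   3,   5],
    [1/3, 1,   2],
    [1/5, 1/2, 1],
]
print(ahp_weights(pairwise))  # roughly [0.65, 0.23, 0.12]
```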
atomicity is desirable property that safeguards application consistency for service compositions service composition exhibiting this property could either complete or cancel itself without any side effects it is possible to achieve this property for service composition by selecting suitable web services to form an atomicity sphere however this property might still be breached at runtime due to the interference between various service compositions caused by implicit interactions existing approaches to addressing this problem by restricting concurrent execution of services to avoid all implicit interactions however compromise the performance of service compositions due to the long running nature of web services in this paper we propose novel static approach to analyzing the implicit interactions web service may incur and their impacts on the atomicity property in each of its service compositions by locating afflicted implicit interactions in service composition behavior constraints based on property propagation are formulated as local safety properties which can then be enforced by the affected web services at runtime to suppress the impacts of the afflicted implicit interactions we show that the satisfaction of these safety properties exempts the atomicity property of this service composition from being interfered by other services at runtime the approach is illustrated using two service applications
the widely used mark and sweep garbage collector has drawback in that it does not move objects during collection as result large long running realistic applications such as web application servers frequently face the fragmentation problem to eliminate fragmentation heap compaction is run periodically however compaction typically imposes very long undesirable pauses in the application while efficient concurrent collectors are ubiquitous in production runtime systems such as jvms an efficient non intrusive compactor is still missing in this paper we present the compressor novel compaction algorithm that is concurrent parallel and incremental the compressor compacts the entire heap to single condensed area while preserving the objects order but reduces pause times significantly thereby allowing acceptable runs on large heaps furthermore the compressor is the first compactor that requires only single heap pass as such it is the most efficient compactor known today even when run in parallel stop the world manner ie when the program threads are halted thus to the best of our knowledge the compressor is the most efficient compactor known today the compressor was implemented on jikes research rvm and we provide measurements demonstrating its qualities
an important problem in wireless ad hoc and sensor networks is to select few nodes to form virtual backbone that supports routing and other tasks such as area monitoring previous work in this area has focused on selecting small virtual backbone for high efficiency in this paper we propose the construction of connected dominating set cds as backbone to balance efficiency and fault tolerance four localized cds construction protocols are proposed the first protocol randomly selects virtual backbone nodes with given probability pk where pk depends on the value of and network condition such as network size and node density the second one maintains fixed backbone node degree of bk where bk also depends on the network condition the third protocol is deterministic approach it extends wu and dai’s coverage condition which is originally designed for cds construction to ensure the formation of cds the last protocol is hybrid of probabilistic and deterministic approaches it provides generic framework that can convert many existing cds algorithms into cds algorithms these protocols are evaluated via simulation study
we present an api for computing the semantic relatedness of words in wikipedia
this paper addresses the difficult problem of selecting representative samples of peer properties eg degree link bandwidth number of files shared in unstructured peer to peer systems due to the large size and dynamic nature of these systems measuring the quantities of interest on every peer is often prohibitively expensive while sampling provides natural means for estimating system wide behavior efficiently however commonly used sampling techniques for measuring peer to peer systems tend to introduce considerable bias for two reasons first the dynamic nature of peers can bias results towards short lived peers much as naively sampling flows in router can lead to bias towards short lived flows second the heterogeneous nature of the overlay topology can lead to bias towards high degree peers we present detailed examination of the ways that the behavior of peer to peer systems can introduce bias and suggest the metropolized random walk with backtracking mrwb as viable and promising technique for collecting nearly unbiased samples we conduct an extensive simulation study to demonstrate that the proposed technique works well for wide variety of common peer to peer network conditions using the gnutella network we empirically show that our implementation of the mrwb technique yields more accurate samples than relying on commonly used sampling techniques furthermore we provide insights into the causes of the observed differences the tool we have developed ion sampler selects peer addresses uniformly at random using the mrwb technique these addresses may then be used as input to another measurement tool to collect data on particular property
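a compact sketch of the metropolized random walk idea: accepting a move from peer u to neighbour v with probability min(1, deg(u)/deg(v)) makes the stationary distribution uniform over peers instead of degree biased; the backtracking part of mrwb that handles departed peers is omitted here, and the toy overlay is invented

```python
import random

def metropolized_random_walk(neighbors, start, steps):
    """Metropolis-Hastings random walk whose stationary distribution is
    uniform over peers: a proposed move from u to a random neighbour v is
    accepted with probability min(1, deg(u)/deg(v)), removing the bias of
    a plain random walk towards high-degree peers."""
    current = start
    for _ in range(steps):
        candidate = random.choice(neighbors[current])
        accept = min(1.0, len(neighbors[current]) / len(neighbors[candidate]))
        if random.random() < accept:
            current = candidate
    return current

# toy overlay: peer "a" has high degree, a plain walk would oversample it
overlay = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
samples = [metropolized_random_walk(overlay, "b", 50) for _ in range(1000)]
print({p: samples.count(p) for p in overlay})  # roughly uniform counts
```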
popular software testing tools such as junit allow frequent retesting of modified code yet the manually created test scripts are often seriously incomplete unit testing tool called jwalk has therefore been developed to address the need for systematic unit testing within the context of agile methods the tool operates directly on the compiled code for java classes and uses new lazy method for inducing the changing design of class on the fly this is achieved partly through introspection using java’s reflection capability and partly through interaction with the user constructing and saving test oracles on the fly predictive rules reduce the number of oracle values that must be confirmed by the tester without human intervention jwalk performs bounded exhaustive exploration of the class’s method protocols and may be directed to explore the space of algebraic constructions or the intended design state space of the tested class with some human interaction jwalk performs up to the equivalent of fully automated state based testing from specification that was acquired incrementally
deadlock detection scheduling is an important yet oft overlooked problem that can significantly affect the overall performance of deadlock handling an excessive initiation of deadlock detection increases overall message usage resulting in degraded system performance in the absence of deadlocks while deficient initiation of deadlock detection increases the deadlock persistence time resulting in an increased deadlock resolution cost in the presence of deadlocks such performance tradeoff however is generally missing in literature in this paper we study the impact of deadlock detection scheduling on the system performance and show that there exists an optimal deadlock detection frequency that yields the minimum long run mean average cost associated with the message complexity of deadlock detection and resolution algorithms and the rate of deadlock formation based on the up to date deadlock detection and resolution algorithms we show that the asymptotically optimal frequency of deadlock detection scheduling that minimizes the message overhead is cal when the total number of processes is sufficiently large furthermore we show that in general fully distributed uncoordinated deadlock detection scheduling can not be performed as efficiently as centralized coordinated deadlock detection scheduling
we study the design of truthful mechanisms that do not use payments for the generalized assignment problem gap and its variants an instance of the gap consists of bipartite graph with jobs on one side and machines on the other machines have capacities and edges have values and sizes the goal is to construct welfare maximizing feasible assignment in our model of private valuations motivated by impossibility results the value and sizes on all job machine pairs are public information however whether an edge exists or not in the bipartite graph is job’s private information that is the selfish agents in our model are the jobs and their private information is their edge set we want to design mechanisms that are truthful without money henceforth strategyproof and produce assignments whose welfare is good approximation to the optimal omniscient welfare we study several variants of the gap starting with matching for the unweighted version we give an optimal strategyproof mechanism for maximum weight bipartite matching we show that no strategyproof mechanism deterministic or randomized can be optimal and present approximate strategyproof mechanism along with matching lowerbound next we study knapsack like problems which unlike matching are np hard for these problems we develop general lp based technique that extends the ideas of lavi and swamy to reduce designing truthful approximate mechanism without money to designing such mechanism for the fractional version of the problem we design strategyproof approximate mechanisms for the fractional relaxations of multiple knapsack size invariant gap and value invariant gap and use this technique to obtain respectively and approximate strategyproof mechanisms for these problems we then design an log approximate strategyproof mechanism for the gap by reducing with logarithmic loss in the approximation to our solution for the value invariant gap our technique may be of independent interest for designing truthful mechanisms without money for other lp based problems
high performance computer simulations are an increasingly popular alternative or complement to physical experiments or prototypes however as these simulations grow more massive and complex it becomes challenging to monitor and control their execution cumulvs is middleware infrastructure for visualizing and steering scientific simulations while they are running front end viewers attach dynamically to simulation programs to extract and collect intermediate data values even if decomposed over many parallel tasks these data can be graphically viewed or animated in variety of commercial or custom visualization environments using provided viewer library in response to this visual feedback scientists can close the loop and apply interactive control using computational steering of any user defined algorithmic or model parameters the data identification interfaces and gathering protocols can also be applied for parallel data exchange in support of coupled simulations and for application directed collection of key program data in checkpoints for automated restart in response to software or hardware failures cumulvs was originally based on pvm but interoperates well with simulations that use mpi or other parallel environments several alternate messaging systems are being integrated with cumulvs to ease its applicability eg to mpi cumulvs has recently been integrated with the common component architecture cca for visualization and parallel data redistribution referred to as mxn and also with global arrays this paper serves as comprehensive overview of the cumulvs capabilities their usage and their development over several years
the study of intelligent user interfaces and user modeling and adaptation is well suited for augmenting educational visits to museums we have defined novel integrated framework for museum visits and claim that such framework is essential in such vast domain that inherently implies complex interactivity we found that it requires significant investment in software and hardware infrastructure design and implementation of intelligent interfaces and systematic and iterative evaluation of the design and functionality of user interfaces involving actual visitors at every stage we defined and built suite of interactive and user adaptive technologies for museum visitors which was then evaluated at the buonconsiglio castle in trento italy animated agents that help motivate visitors and focus their attention when necessary automatically generated adaptive video documentaries on mobile devices and automatically generated post visit summaries that reflect the individual interests of visitors as determined by their behavior and choices during their visit these components are supported by underlying user modeling and inference mechanisms that allow for adaptivity and personalization novel software infrastructure allows for agent connectivity and fusion of multiple positioning data streams in the museum space we conducted several experiments focusing on various aspects of peach in one conducted with visitors we found evidence that even older users are comfortable interacting with major component of the system
digital representations are widely used for audiovisual content enabling the creation of large online repositories of video allowing access such as video on demand however the ease of copying and distribution of digital video makes piracy growing concern for content owners we investigate methods for identifying coderivative video content that is video clips that are derived from the same original source by using dynamic programming to identify regions of similarity in video signatures it is possible to efficiently and accurately identify coderivatives even when these regions constitute only small section of the clip being searched we propose four new methods for producing compact video signatures based on the way in which the video changes over time the intuition is that such properties are likely to be preserved even when the video is badly degraded we demonstrate that these signatures are insensitive to dramatic changes in video bitrate and resolution two parameters that are often altered when reencoding in the presence of mild degradations our methods can accurately identify copies of clips that are as short as within dataset min long these methods are much faster than previously proposed techniques using more compact signature this query can be completed in few milliseconds
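a hedged sketch of the dynamic programming step, assuming the signatures are quantized into symbol strings: a smith waterman style local alignment scores the best matching region, so a short degraded excerpt still aligns strongly against its source; the signature alphabet here is made up

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style dynamic programming: the best-scoring region of
    similarity between two quantized video signatures, so a short clip can
    be located inside a longer, possibly degraded recording."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            score = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + score, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# hypothetical signatures: symbols describing how each frame changes over time
original = "uuddlrlrabab"
reencoded_clip = "ddlrlrab"   # a degraded excerpt of the original
unrelated = "xxyyzzxxyyzz"
print(local_alignment_score(original, reencoded_clip))  # high score
print(local_alignment_score(original, unrelated))       # low score
```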
prolonging network lifetime is one of the most important design objectives in energy constrained wireless sensor networks wsns using mobile instead of static base station bs to reduce or alleviate the non uniform energy consumption among sensor nodes is an efficient mechanism to prolong the network lifetime in this paper we deal with the problem of prolonging network lifetime in data gathering by employing mobile bs to achieve that we devise novel clustering based heuristic algorithm for finding trajectory of the mobile bs that strikes the trade off between the traffic load among sensor nodes and the tour time constraint of the mobile bs we also conduct experiments by simulations to evaluate the performance of the proposed algorithm the experimental results show that the use of clustering in conjunction with mobile bs for data gathering can prolong network lifetime significantly
optimizing compilers require accurate dependence testing to enable numerous performance enhancing transformations however data dependence testing is difficult problem particularly in the presence of pointers though existing approaches work well for pointers to named memory locations ie other variables they are overly conservative in the case of pointers to unnamed memory locations the latter occurs in the context of dynamic pointer based data structures used in variety of applications ranging from system software to computational geometry to body and circuit simulations in this paper we present new technique for performing more accurate data dependence testing in the presence of dynamic pointer based data structures we will demonstrate its effectiveness by breaking false dependences that existing approaches cannot and provide results which show that removing these dependences enables significant parallelization of real application
network lifetime has become the key characteristic for evaluating sensor networks in an application specific way especially the availability of nodes the sensor coverage and the connectivity have been included in discussions on network lifetime even quality of service measures can be reduced to lifetime considerations great number of algorithms and methods were proposed to increase the lifetime of sensor network while their evaluations were always based on particular definition of network lifetime motivated by the great differences in existing definitions of sensor network lifetime that are used in relevant publications we reviewed the state of the art in lifetime definitions their differences advantages and limitations this survey was the starting point for our work towards generic definition of sensor network lifetime for use in analytic evaluations as well as in simulation models focusing on formal and concise definition of accumulated network lifetime and total network lifetime our definition incorporates the components of existing lifetime definitions and introduces some additional measures one new concept is the ability to express the service disruption tolerance of network another new concept is the notion of time integration in many cases it is sufficient if requirement is fulfilled over certain period of time instead of at every point in time in addition we combine coverage and connectivity to form single requirement called connected coverage we show that connected coverage is different from requiring noncombined coverage and connectivity finally our definition also supports the concept of graceful degradation by providing means of estimating the degree of compliance with the application requirements we demonstrate the applicability of our definition based on the surveyed lifetime definitions as well as using some example scenarios to explain the various aspects influencing sensor network lifetime
we explore the performance of number of popular feature detectors and descriptors in matching object features across viewpoints and lighting conditions to this end we design method based on intersecting epipolar constraints for providing ground truth correspondence automatically these correspondences are based purely on geometric information and do not rely on the choice of specific feature appearance descriptor we test detector descriptor combinations on database of objects viewed from calibrated viewpoints under three different lighting conditions we find that the combination of hessian affine feature finder and sift features is most robust to viewpoint change harris affine combined with sift and hessian affine combined with shape context descriptors were best respectively for lighting change and change in camera focal length we also find that no detector descriptor combination performs well with viewpoint changes of more than
we investigate texture classification from single images obtained under unknown viewpoint and illumination statistical approach is developed where textures are modelled by the joint probability distribution of filter responses this distribution is represented by the frequency histogram of filter response cluster centres textons recognition proceeds from single uncalibrated images and the novelty here is that rotationally invariant filters are used and the filter response space is low dimensional classification performance is compared with the filter banks and methods of leung and malik ijcv schmid cvpr and cula and dana ijcv and it is demonstrated that superior performance is achieved here classification results are presented for all materials in the columbia utrecht texture database we also discuss the effects of various parameters on our classification algorithm such as the choice of filter bank and rotational invariance the size of the texton dictionary as well as the number of training images used finally we present method of reliably measuring relative orientation co occurrence statistics in rotationally invariant manner and discuss whether incorporating such information can enhance the classifier’s performance
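a minimal sketch of the texton representation described above: filter responses are assigned to their nearest cluster centre (texton) and the image is represented by the resulting frequency histogram, compared here with a chi squared distance; the data and dimensions are synthetic placeholders

```python
import numpy as np

def texton_histogram(filter_responses, textons):
    """Assign every pixel's filter-response vector to its nearest texton
    (a cluster centre learned beforehand, e.g. with k-means) and return the
    normalised frequency histogram that represents the image."""
    distances = np.linalg.norm(
        filter_responses[:, None, :] - textons[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi_squared(h1, h2, eps=1e-10):
    """Chi-squared distance, a common choice for comparing texton histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# toy data: 200 pixels with 8-dimensional (rotation-invariant) filter responses
rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 8))
textons = rng.normal(size=(10, 8))        # pretend these came from clustering
model_hist = texton_histogram(responses, textons)
novel_hist = texton_histogram(rng.normal(size=(200, 8)), textons)
print(chi_squared(model_hist, novel_hist))  # classify by nearest model histogram
```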
we have developed method for determining whether data found on the web are for the same or different objects that takes into account the possibility of changes in their attribute values over time specifically we estimate the probability that observed data were generated for the same object that has undergone changes in its attribute values over time and the probability that the data are for different objects and we define similarities between observed data using these probabilities by giving specific form to the distributions of time varying attributes we can calculate the similarity between given data and identify objects by using agglomerative clustering on the basis of the similarity experiments in which we compared identification accuracies between our proposed method and method that regards all attribute values as constant showed that the proposed method improves the precision and recall of object identification
mobile agent robot modeled as finite automaton has to visit all nodes of regular graph how does the memory size of the agent the number of states of the automaton influence its exploration capability in particular does every increase of the memory size enable an agent to explore more graphs we give partial answer to this problem by showing that strict gain of the exploration power can be obtained by polynomial increase of the number of states we also show that for automata with few states the increase of memory by even one state results in the capability of exploring more graphs
many secure systems rely on human in the loop to perform security critical functions however humans often fail in their security roles whenever possible secure system designers should find ways of keeping humans out of the loop however there are some tasks for which feasible or cost effective alternatives to humans are not available in these cases secure system designers should engineer their systems to support the humans in the loop and maximize their chances of performing their security critical functions successfully we propose framework for reasoning about the human in the loop that provides systematic approach to identifying potential causes for human failure this framework can be used by system designers to identify problem areas before system is built and proactively address deficiencies system operators can also use this framework to analyze the root cause of security failures that have been attributed to human error we provide examples to illustrate the applicability of this framework to variety of secure systems design problems including anti phishing warnings and password policies
with the increasing use of geometry scanners to create models there is rising need for fast and robust mesh smoothing to remove inevitable noise in the measurements while most previous work has favored diffusion based iterative techniques for feature preserving smoothing we propose radically different approach based on robust statistics and local first order predictors of the surface the robustness of our local estimates allows us to derive non iterative feature preserving filtering technique applicable to arbitrary triangle soups we demonstrate its simplicity of implementation and its efficiency which make it an excellent solution for smoothing large noisy and non manifold meshes
the cray td and te are non cache coherent ncc computers with numa structure they have been shown to exhibit very stable and scalable performance for variety of application programs considerable evidence suggests that they are more stable and scalable than many other shared memory multiprocessors however the principal drawback of these machines is lack of programmability caused by the absence of the global cache coherence that is necessary to provide convenient shared view of memory in hardware this forces the programmer to keep careful track of where each piece of data is stored complication that is unnecessary when pure shared memory view is presented to the user we believe that remedy for this problem is advanced compiler technology in this paper we present our experience with compiler framework for automatic parallelization and communication generation that has the potential to reduce the time consuming hand tuning that would otherwise be necessary to achieve good performance with this type of machine from our experiments we learned that our compiler performs well for variety of applications on the td and te and we found few sophisticated techniques that could improve performance even more once they are fully implemented in the compiler
the paper proposes general optimization model with separable strictly convex objective function to obtain the consistent owa ordered weighted averaging operator family the consistency means that the aggregation value of the operator monotonically changes with the given orness level some properties of the problem are discussed with its analytical solution the model includes the two most commonly used maximum entropy owa operator and minimum variance owa operator determination methods as its special cases the solution equivalence to the general minimax problem is proved then with the conclusion that the rim regular increasing monotone quantifier can be seen as the continuous case of owa operator with infinite dimension the paper further proposes general rim quantifier determination model and analytically solves it with the optimal control technique some properties of the optimal solution and the solution equivalence to the minimax problem for rim quantifier are also proved comparing with that of the owa operator problem the rim quantifier solutions are usually more simple intuitive dimension free and can be connected to the linguistic terms in natural language with the solutions of these general problems we not only can use the owa operator or rim quantifier to obtain aggregation value that monotonically changes with the orness level for any aggregated set but also can obtain the parameterized owa or rim quantifier families in some specific function forms which can incorporate the background knowledge or the required characteristic of the aggregation problems
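for concreteness, the two special cases mentioned can be written as constrained optimization problems over the weight vector W = (w_1, ..., w_n) at a required orness level alpha; this is the standard textbook formulation, not a restatement of the paper's more general separable objective

```latex
% orness of an OWA weight vector W = (w_1, ..., w_n)
\operatorname{orness}(W) = \frac{1}{n-1}\sum_{i=1}^{n}(n-i)\,w_i

% maximum entropy OWA:          \max_{W} \; -\sum_{i=1}^{n} w_i \ln w_i
% minimum variance OWA:         \min_{W} \; \sum_{i=1}^{n}\bigl(w_i - \tfrac{1}{n}\bigr)^{2}
% both subject to:
\operatorname{orness}(W) = \alpha, \qquad \sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0 .
```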
scott and scherer recently pointed out that existing locking algorithms do not meet need that arises in practical systems specifically database systems and real time systems need mutual exclusion locks that support the abort capability which makes it possible for process that waits too long to abort its attempt to acquire the lock further to ensure high performance in cache coherent and numa multiprocessors the locking algorithm should generate as few remote references as possible to help meet this need scott and scherer in and scott in proposed some local spin abortable mutual exclusion algorithms but these algorithms have shortcomings specifically the algorithm by scott and scherer allows an aborting process to be blocked by other processes which is unacceptable the subsequent algorithms by scott overcome this shortcoming but these have unbounded worst case time and space complexity in this paper we present an efficient local spin algorithm with the following complexity in each acquisition and release abort of the lock process makes O(min(k, log N)) remote memory references where k is the point contention and N is the total number of processes for which the lock is designed thus not only is the algorithm adaptive but also its worst case time complexity has small logarithmic bound the algorithm has space complexity to our knowledge this is the first abortable mutual exclusion algorithm that has bounded time complexity and requires only bounded number of memory words
the use of optimization techniques has been recently proposed to build models for software development effort estimation in particular some studies have been carried out using search based techniques such as genetic programming and the results reported seem to be promising to the best of our knowledge nobody has analyzed the effectiveness of tabu search for development effort estimation tabu search is meta heuristic approach successfully used to address several optimization problems in this paper we report on an empirical analysis carried out exploiting tabu search on publicly available dataset ie desharnais dataset the achieved results show that tabu search provides estimates comparable with those achieved with some widely used estimation techniques
computer networks have expanded significantly in use and numbers this expansion makes them more vulnerable to attack by unwanted agents many current intrusion detection systems ids are unable to identify unknown or mutated attack modes or are unable to operate in dynamic environment as is necessary with mobile networks as result it is necessary to find new ways to implement and operate intrusion detection systems genetic based systems offer the ability to adapt to changing environments robustness to noise and the ability to identify unknown attack methods this paper presents fuzzy genetic approach to intrusion detection that is shown to increase the performance of an ids
low duty cycle operation is critical to conserve energy in wireless sensor networks traditional wake up scheduling approaches either require periodic synchronization messages or incur high packet delivery latency due to the lack of any synchronization to simultaneously achieve the seemingly contradictory goals of energy efficiency and low latency the design of new low duty cycle mac layer protocol called convergent mac cmac is presented cmac avoids synchronization overhead while supporting low latency by using zero communication when there is no traffic cmac allows sensor nodes to operate at very low duty cycles when carrying traffic cmac first uses anycast to wake up forwarding nodes and then converges gradually from route suboptimal anycast with unsynchronized duty cycling to route optimal unicast with synchronized scheduling to validate our design and provide usable module for the research community cmac has been implemented in tinyos and evaluated on the kansei testbed consisting of xsm nodes the results show that cmac at percent duty cycle significantly outperforms bmac at percent in terms of latency throughput and energy efficiency the performance of cmac is also compared with other protocols using simulations in which the results show for percent and lower duty cycles cmac exhibits similar throughput and latency as csma ca using much less energy and outperforms smac dmac and geraf in almost all aspects
specification mining is dynamic analysis process aimed at automatically inferring suggested specifications of program from its execution traces we describe novel method framework and tool for mining inter object scenario based specifications in the form of uml compliant variant of damm and harel's live sequence charts lsc lsc extends the classical partial order semantics of sequence diagrams with temporal liveness and symbolic class level lifelines in order to generate compact and expressive specifications the output of our algorithm is sound and complete set of statistically significant lscs ie satisfying given thresholds of support and confidence mined from an input execution trace we locate statistically significant lscs by exploring the search space of possible lscs and checking for their statistical significance in addition we use an effective search space pruning strategy specifically adapted to lscs which enables efficient mining of scenarios of arbitrary size we demonstrate and evaluate the utility of our work in mining informative specifications using case study on jeti popular full featured messaging application
in this survey we discuss software infrastructures and frameworks which support the construction of distributed interactive systems they range from small projects with one implemented prototype to large scale research efforts and they come from the fields of augmented reality ar intelligent environments and distributed mobile systems in their own way they can all be used to implement various aspects of the ubiquitous computing vision as described by mark weiser this survey is meant as starting point for new projects in order to choose an existing infrastructure for reuse or to get an overview before designing new one it tries to provide systematic relatively broad and necessarily not very deep overview while pointing to relevant literature for in depth study of the systems discussed
the goal of this paper is to investigate new shape analysis method based on randomized cuts of surface meshes the general strategy is to generate random set of mesh segmentations and then to measure how often each edge of the mesh lies on segmentation boundary in the randomized set the resulting partition function defined on edges provides continuous measure of where natural part boundaries occur in mesh and the set of most consistent cuts provides stable list of global shape features the paper describes methods for generating random distributions of mesh segmentations studies sensitivity of the resulting partition functions to noise tessellation pose and intra class shape variations and investigates applications in mesh visualization segmentation deformation and registration
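a toy sketch of the partition function: generate many random segmentations and record, per edge, the fraction of segmentations in which the edge separates two different segments; the strip of four faces and the random cut generator are invented stand ins for real mesh segmentations

```python
import random
from collections import Counter

def partition_function(mesh_edges, random_segmentation, num_trials=100):
    """Estimate, for every edge, how often it lies on a segment boundary
    across a set of randomized segmentations; edges with high frequency are
    the stable, 'natural' part boundaries of the shape."""
    boundary_counts = Counter()
    for _ in range(num_trials):
        face_label = random_segmentation()          # maps face id -> segment id
        for edge, (face_a, face_b) in mesh_edges.items():
            if face_label[face_a] != face_label[face_b]:
                boundary_counts[edge] += 1
    return {e: boundary_counts[e] / num_trials for e in mesh_edges}

# toy example: 4 faces in a strip, each edge joins neighbouring faces;
# the 'segmentation' just cuts the strip at a random position
edges = {"e01": (0, 1), "e12": (1, 2), "e23": (2, 3)}
def random_cut():
    cut = random.randint(1, 3)
    return {f: (0 if f < cut else 1) for f in range(4)}

print(partition_function(edges, random_cut))
```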
an imbalanced training data set can pose serious problems for many real world data mining tasks that employ svms to conduct supervised learning in this paper we propose kernel boundary alignment algorithm which considers the training data imbalance as prior information to augment svms to improve class prediction accuracy using simple example we first show that svms can suffer from high incidences of false negatives when the training instances of the target class are heavily outnumbered by the training instances of nontarget class the remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution through theoretical analysis backed by empirical study we show that our kernel boundary alignment algorithm works effectively on several data sets
to provide users with only relevant data from the huge amount of available information personalization systems utilize preferences to allow users to express their interest on specific pieces of data most often user preferences vary depending on the circumstances for instance when with friends users may like to watch thrillers whereas when with their kids they may prefer to watch cartoons contextual preference systems address this challenge by supporting preferences that depend on the values of contextual attributes such as the surrounding environment time or location in this paper we address the problem of finding interesting data items based on contextual preferences that assign interest scores to pieces of data based on context to this end we propose number of pre processing steps instead of pre computing scores for all data items under all potential context states we exploit the hierarchical nature of context attributes to identify representative context states furthermore we introduce method for grouping preferences based on the similarity of the scores that they produce this method uses bitmap representation of preferences and scores with various levels of precision that lead to approximate rankings with different degrees of accuracy we evaluate our approach using both real and synthetic data sets and present experimental results showing the quality of the scores attained using our methods
load balancing is very important and complex problem in computational grids computational grid differs from traditional high performance computing systems in the heterogeneity of the computing nodes as well as the communication links that connect the different nodes together there is need to develop algorithms that could capture this complexity yet can be easily implemented and used to solve wide range of load balancing scenarios in this paper we propose game theoretic solution to the grid load balancing problem the algorithm developed combines the inherent efficiency of the centralized approach and the fault tolerant nature of the distributed decentralized approach we model the grid load balancing problem as non cooperative game whereby the objective is to reach the nash equilibrium experiments were conducted to show the applicability of the proposed approaches one advantage of our scheme is the relatively low overhead and robust performance against inaccuracies in performance prediction information
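a simplified stand in for the game theoretic view: jobs act as selfish players and repeatedly best respond by moving to the machine that minimizes their own completion time, stopping at a pure nash equilibrium; the grid specific details of the paper (communication links, prediction inaccuracies) are not modeled here

```python
def best_response_assignment(job_sizes, machine_speeds, rounds=100):
    """Each job (player) repeatedly moves to the machine that minimises its
    own completion time given everyone else's choice; when no job wants to
    move, the assignment is a pure Nash equilibrium of this toy
    load-balancing game on machines with heterogeneous speeds."""
    assignment = [0] * len(job_sizes)               # start: all jobs on machine 0
    for _ in range(rounds):
        moved = False
        for j, size in enumerate(job_sizes):
            def finish_time(m):
                load = sum(s for k, s in enumerate(job_sizes)
                           if assignment[k] == m and k != j) + size
                return load / machine_speeds[m]
            best = min(range(len(machine_speeds)), key=finish_time)
            if best != assignment[j]:
                assignment[j] = best
                moved = True
        if not moved:
            break                                    # equilibrium reached
    return assignment

print(best_response_assignment([4, 3, 2, 2, 1], [2.0, 1.0]))
```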
this paper presents pq fully decentralized gossip based protocol to personalize query processing in social tagging systems pq dynamically associates each user with social acquaintances sharing similar tagging behaviours queries are gossiped among such acquaintances computed on the fly in collaborative yet partitioned manner and results are iteratively refined and returned to the querier analytical and experimental evaluations convey the scalability of pq for top query processing more specifically we show that on user delicious trace with little storage at each user the queries are accurately computed within reasonable time and bandwidth consumption we also report on the inherent ability of pq to cope with users updating profiles and departing
this paper develops declarative language log that combines logical and probabilistic arguments in its reasoning answer set prolog is used as the logical foundation while causal bayes nets serve as probabilistic foundation we give several non trivial examples and illustrate the use of log for knowledge representation and updating of knowledge we argue that our approach to updates is more appealing than existing approaches we give sufficiency conditions for the coherency of log programs and show that bayes nets can be easily mapped to coherent log programs
gate level characterization glc is the process of characterizing each gate of an integrated circuit ic in terms of its physical and manifestation properties it is key step in the ic applications regarding cryptography security and digital rights management however glc is challenging due to the existence of manufacturing variability mv and the strong correlations among some gates in the circuit we propose new solution for glc by using thermal conditioning techniques in particular we apply thermal control on the process of glc which breaks the correlations by imposing extra variations concerning gate level leakage power the scaling factors of all the gates can be characterized by solving system of linear equations using linear programming lp based on the obtained gate level scaling factors we demonstrate an application of glc hardware trojan horse hth detection by using constraint manipulation we evaluate our approach of glc and hth detection on several iscas benchmarks the simulation results show that our thermally conditioned glc approach is capable of characterizing all the gates with an average error less than the measurement error and we can detect hths with accuracy on target circuit
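a hedged sketch of the characterization step: each leakage measurement is a linear combination of per gate nominal leakages weighted by unknown scaling factors, so with enough input vectors the factors fall out of a linear system; ordinary least squares is used here as a stand in for the paper's lp formulation, and all numbers are synthetic

```python
import numpy as np

# each measurement of total leakage power is a linear combination of per-gate
# nominal leakages weighted by unknown per-gate scaling factors; with enough
# linearly independent input vectors (and, as in the paper, thermal
# conditioning to decorrelate gates) the factors can be recovered
rng = np.random.default_rng(1)
num_gates, num_measurements = 5, 20
true_scaling = rng.uniform(0.8, 1.2, size=num_gates)      # manufacturing variability
nominal_leakage = rng.uniform(1.0, 5.0, size=(num_measurements, num_gates))
measured_total = nominal_leakage @ true_scaling + rng.normal(0, 0.01, num_measurements)

estimated, *_ = np.linalg.lstsq(nominal_leakage, measured_total, rcond=None)
print(np.round(true_scaling, 3))
print(np.round(estimated, 3))   # close to the true scaling factors
```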
various architectural components of orion and orion sx are described and review of the current implementation is provided the message handler receives all messages sent to the orion system the object subsystem provides high level data management functions including query optimization schema management long data management including text search and support for versionable objects composite objects and multimedia objects the transaction management subsystem coordinates concurrent object accesses and provides recovery capabilities the storage subsystem manages persistent storage of objects and controls the flow of objects between the secondary storage device and main memory buffers in orion all subsystems reside in one computer the orion sx architecture is significantly different from orion in the management of shared data structures and distribution of these subsystems and their components
to introduce the republication of "definitional interpreters for higher order programming languages" the author recounts the circumstances of its creation clarifies several obscurities corrects few mistakes and briefly summarizes some more recent developments
in prefetching the objects that are expected to be accessed in the future are fetched from the server to the client in advance prefetching reduces the number of round trips and increases the system performance to prefetch object effectively we need to correctly predict the future navigational patterns in this paper we propose the prefetchguide novel data structure that captures the navigational access patterns we also formally define the notion of the attribute access log set and analyze the navigational access patterns that can be captured by the prefetchguide we then present an prefetching algorithm using the prefetchguide to show effectiveness of our algorithm we have conducted extensive experiments in prototype object relational database management systems dbms the results show that our method significantly outperforms the state of the art prefetching method these results indicate that our approach provides practical method that can be implemented in commercial object oriented object relational dbmss we believe our method is practically usable for object oriented programmers and dbms implementors
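a hypothetical toy version of the idea, not the paper's exact prefetchguide structure: record which attribute tends to be navigated next from a given access path and prefetch the most frequent continuation

```python
from collections import defaultdict, Counter

class PrefetchGuide:
    """Hypothetical sketch: counts path -> next-attribute transitions so the
    server can prefetch the most likely next objects along with the
    requested one (the paper defines its structure over attribute access
    log sets; this toy version only keeps transition frequencies)."""
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def record(self, path, next_attribute):
        self.transitions[tuple(path)][next_attribute] += 1

    def predict(self, path, k=1):
        return [attr for attr, _ in self.transitions[tuple(path)].most_common(k)]

guide = PrefetchGuide()
# observed navigations: order -> customer -> address, order -> items
guide.record(["order"], "customer")
guide.record(["order", "customer"], "address")
guide.record(["order"], "items")
guide.record(["order"], "customer")
print(guide.predict(["order"]))  # ['customer'] is prefetched first
```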
recent advancements in rapid prototyping techniques such as printing and laser cutting are changing the perception of physical models in architecture and industrial design physical models are frequently created not only to finalize project but also to demonstrate an idea in early design stages for such tasks models can easily be annotated to capture comments edits and other forms of feedback unfortunately these annotations remain in the physical world and cannot easily be transferred back to the digital world our system modelcraft addresses this problem by augmenting the surface of model with traceable pattern any sketch drawn on the surface of the model using digital pen is recovered as part of digital representation sketches can also be interpreted as edit marks that trigger the corresponding operations on the cad model modelcraft supports wide range of operations on complex models from editing model to assembling multiple models and offers physical tools to capture free space input several interviews and formal study with the potential users of our system proved the modelcraft system useful our system is inexpensive requires no tracking infrastructure or per object calibration and we show how it could be extended seamlessly to use current printing technology
to produce quality software and evolve them in an economic and timely fashion enactable software process models are used for regulating development activities with the support of process centered software engineering environments pcsees however due to the dynamically changing development environment the developers do not always follow the process model in presence of unforeseen situations as human with creativity and variant nature each developer has his or her own way of doing development that may not be allowed by the process model as result various inconsistencies arise in software processes and then the authority of the process model will be undermined in this paper we propose an algebraic approach to promote the efficient management of inconsistencies with the approach potential inconsistencies can be precisely detected and valuable diagnostic information is available to help process designers efficiently locate the detected inconsistencies the effectiveness of the approach is demonstrated by experimenting with it on an example process
we incorporate innovations from the bigwig project into the java language to provide high level features for web service programming the resulting language jwig contains an advanced session model and flexible mechanism for dynamic construction of xml documents in particular xhtml to support program development we provide suite of program analyses that at compile time verify for given program that no runtime errors can occur while building documents or receiving form input and that all documents being shown are valid according to the document type definition for xhtml we compare jwig with servlets and jsp which are widely used web service development platforms our implementation and evaluation of jwig indicate that the language extensions can simplify the program structure and that the analyses are sufficiently fast and precise to be practically useful
conventional wisdom and anecdote suggest that testing takes between to of project’s effort however testing is not monolithic activity as it consists of number of different phases such as unit testing integration testing and finally system and acceptance test unit testing has received lot of criticism in terms of the amount of time that it is perceived to take and its perceived costs however it still remains an important verification activity being an effective means to test individual software components for boundary value behavior and ensure that all code has been exercised adequately we examine the available data from three safety related industrial software projects that have made use of unit testing using this information we argue that the perceived costs of unit testing may be exaggerated and that the likely benefits in terms of defect detection are quite high in relation to those costs we also discuss the different issues that have been found applying the technique at different phases of the development and using different methods to generate those tests we also compare results we have obtained with empirical results from the literature and highlight some possible weaknesses of research in this area
trace driven simulations have been widely used in computer architecture for quantitative evaluations of new ideas and design prototypes efficient trace compression and fast decompression are crucial for contemporary workloads as representative benchmarks grow in size and number this article presents stream based compression sbc novel technique for single pass compression of address traces the sbc technique compresses both instruction and data addresses by associating them with particular instruction stream that is block of consecutively executing instructions the compressed instruction trace is trace of instruction stream identifiers the compressed data address trace encompasses the data address stride and the number of repetitions for each memory referencing instruction in stream ordered by the corresponding stream appearances in the trace sbc reduces the size of spec cpu dinero instruction and data address traces from to times outperforming the best trace compression techniques presented in the open literature sbc can be successfully combined with general purpose compression techniques the combined sbc gzip compression ratio is from to and the sbc bzip compression ratio is from to moreover sbc outperforms other trace compression techniques when both decompression time and compression time are considered this article also shows how the sbc algorithm can be modified for hardware implementation with very modest resources and only minor loss in compression ratio
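the following python fragment is only a hedged illustration of the stride plus repetition idea for data addresses it is a simplified sketch with made up helper names and it leaves out the instruction stream identifiers and stream ordering that the sbc technique above also encodes

def compress_data_addresses(addresses):
    """Collapse a sequence of addresses into (start, stride, repetitions) runs."""
    runs = []
    i = 0
    while i < len(addresses):
        start = addresses[i]
        stride = addresses[i + 1] - addresses[i] if i + 1 < len(addresses) else 0
        count = 1
        # extend the run while the same stride keeps repeating
        while i + count < len(addresses) and \
                addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        runs.append((start, stride, count))
        i += count
    return runs

def decompress(runs):
    out = []
    for start, stride, count in runs:
        out.extend(start + stride * k for k in range(count))
    return out

trace = [0x1000, 0x1008, 0x1010, 0x1018, 0x2000, 0x2000, 0x2000]
runs = compress_data_addresses(trace)
assert decompress(runs) == trace      # lossless for this toy trace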
recent developments in sensor technology have made it feasible to use mobile robots in several fields but robots still lack the ability to accurately sense the environment major challenge to the widespread deployment of mobile robots is the ability to function autonomously learning useful models of environmental features recognizing environmental changes and adapting the learned models in response to such changes this article focuses on such learning and adaptation in the context of color segmentation on mobile robots in the presence of illumination changes the main contribution of this article is survey of vision algorithms that are potentially applicable to color based mobile robot vision we therefore look at algorithms for color segmentation color learning and illumination invariance on mobile robot platforms including approaches that tackle just the underlying vision problems furthermore we investigate how the inter dependencies between these modules and high level action planning can be exploited to achieve autonomous learning and adaptation the goal is to determine the suitability of the state of the art vision algorithms for mobile robot domains and to identify the challenges that still need to be addressed to enable mobile robots to learn and adapt models for color so as to operate autonomously in natural conditions
virtual routers are promising way to provide network services such as customer specific routing policy based routing multi topology routing and network virtualization however the need to support separate forwarding information base fib for each virtual router leads to memory scaling challenges in this paper we present small shared data structure and fast lookup algorithm that capitalize on the commonality of ip prefixes between each fib experiments with real packet traces and routing tables show that our approach achieves much lower memory requirements and considerably faster lookup times our prototype implementation in the click modular router running both in user space and in the linux kernel demonstrates that our data structure and algorithm are an interesting solution for building scalable routers that support virtualization
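as an illustration of how prefix commonality could be shared the sketch below assumes a binary trie whose nodes carry a per virtual router next hop map so a prefix present in several fibs is stored once the class and method names are hypothetical and the paper’s actual data structure and lookup algorithm may differ

class Node:
    __slots__ = ("children", "next_hops")
    def __init__(self):
        self.children = [None, None]     # child for bit 0 / bit 1
        self.next_hops = {}              # virtual router id -> next hop

class SharedFIB:
    def __init__(self):
        self.root = Node()

    def insert(self, prefix_bits, vr_id, next_hop):
        node = self.root
        for bit in prefix_bits:
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.next_hops[vr_id] = next_hop

    def lookup(self, addr_bits, vr_id):
        # longest prefix match for one virtual router in a single trie walk
        node, best = self.root, None
        for bit in addr_bits:
            if vr_id in node.next_hops:
                best = node.next_hops[vr_id]
            node = node.children[bit]
            if node is None:
                return best
        return node.next_hops.get(vr_id, best)

fib = SharedFIB()
fib.insert([1, 0], vr_id=0, next_hop="A")    # prefix bits 1 0 in virtual router 0
fib.insert([1, 0], vr_id=1, next_hop="B")    # same prefix shared with router 1
print(fib.lookup([1, 0, 1, 1], vr_id=1))     # -> B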
denotational semantics is given for java like language with pointers subclassing and dynamic dispatch class oriented visibility control recursive types and methods and privilege based access control representation independence relational parametricity is proved using semantic notion of confinement similar to ones for which static disciplines have been recently proposed
the data cache is one of the most frequently accessed structures in the processor because of this and its moderate size it is major consumer of power in order to reduce its power consumption in this paper small filter structure that exploits the special features of the references to the stack region is proposed this filter which acts as top non inclusive level of the data memory hierarchy consists of register set that keeps the data stored in the neighborhood of the top of the stack our simulation results show that using small stack filter sf of only few registers to data cache power savings can be achieved on average with negligible performance penalty
today’s internet users and applications are placing increased demands on internet service providers isps to deliver fine grained flexible route control to assist network operators in addressing this challenge we present the intelligent route service control point irscp route control architecture that allows network operator to flexibly control routing between the traffic ingresses and egresses within an isp’s network without modifying the isp’s existing routers in essence irscp subsumes the control plane of an isp’s network by replacing the distributed bgp decision process of each router in the network with more flexible logically centralized application controlled route computation irscp supplements the traditional bgp decision process with an explicitly ranked decision process that allows route control applications to provide per destination per router explicit ranking of traffic egresses we describe our implementation of irscp as well as straightforward set of correctness requirements that prevents routing anomalies to illustrate the potential of application controlled route selection we use our irscp prototype to implement simple form of dynamic customer traffic load balancing and demonstrate through emulation that our implementation is scalable
this paper presents an analytical model to study how working sets scale with database size and other application parameters in decision support systems dss the model uses application parameters that are measured on down scaled database executions to predict cache miss ratios for executions of large databases by applying the model to two database engines and typical dss queries we find that even for large databases the most performance critical working set is small and is caused by the instructions and private data that are required to access single tuple consequently its size is not affected by the database size surprisingly database data may also exhibit temporal locality but the size of its working set critically depends on the structure of the query the method of scanning and the size and the content of the database
in recent times data are generated as form of continuous data streams in many applications since handling data streams is necessary and discovering knowledge behind data streams can often yield substantial benefits mining over data streams has become one of the most important issues many approaches for mining frequent itemsets over data streams have been proposed these approaches often consist of two procedures including continuously maintaining synopses for data streams and finding frequent itemsets from the synopses however most of the approaches assume that the synopses of data streams can be saved in memory and ignore the fact that the information of the non frequent itemsets kept in the synopses may cause memory utilization to be significantly degraded in this paper we consider compressing the information of all the itemsets into structure with fixed size using hash based technique this hash based approach skillfully summarizes the information of the whole data stream by using hash table provides novel technique to estimate the support counts of the non frequent itemsets and keeps only the frequent itemsets for speeding up the mining process therefore the goal of optimizing memory space utilization can be achieved the correctness guarantee error analysis and parameter setting of this approach are presented and series of experiments is performed to show the effectiveness and the efficiency of this approach
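the fragment below gives only the flavour of a fixed size hash based synopsis itemsets are folded into shared buckets and the bucket count serves as an over estimate of support the constants and the absence of an exact store for the frequent itemsets are simplifications not the estimator and maintenance rules proposed above

from itertools import combinations

TABLE_SIZE = 1024

class HashSynopsis:
    def __init__(self):
        self.buckets = [0] * TABLE_SIZE

    def _slot(self, itemset):
        return hash(itemset) % TABLE_SIZE

    def add_transaction(self, items, max_len=2):
        # every itemset of the transaction updates one shared bucket
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(items), k):
                self.buckets[self._slot(itemset)] += 1

    def estimate(self, itemset):
        # an over estimate of the support count because buckets are shared
        return self.buckets[self._slot(tuple(sorted(itemset)))]

syn = HashSynopsis()
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]:
    syn.add_transaction(t)
print(syn.estimate({"a", "b"}))   # at least 2, possibly more due to collisions

a real miner in this spirit would additionally keep the itemsets whose estimated support passes the threshold in an exact structure so that only frequent itemsets are materialized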
mining informative patterns from very large dynamically changing databases poses numerous interesting challenges data summarizations eg data bubbles have been proposed to compress very large static databases into representative points suitable for subsequent effective hierarchical cluster analysis in many real world applications however the databases dynamically change due to frequent insertions and deletions possibly changing the data distribution and clustering structure over time completely reapplying both the data summarization and the clustering algorithm to detect the changes in the clustering structure and update the uncovered data patterns following such deletions and insertions is prohibitively expensive for large fast changing databases in this paper we propose new scheme to maintain data bubbles incrementally by using incremental data bubbles high quality hierarchical clustering is quickly available at any point in time in our scheme quality measure for incremental data bubbles is used to identify data bubbles that do not compress well their underlying data points after certain insertions and deletions only these data bubbles are re built using efficient split and merge operations an extensive experimental evaluation shows that the incremental data bubbles provide significantly faster data summarization than completely re building the data bubbles after certain number of insertions and deletions and are effective in preserving and in some cases even improving the quality of the data summarization
this paper compares data distribution methodologies for scaling the performance of openmp on numa architectures we investigate the performance of automatic page placement algorithms implemented in the operating system runtime algorithms based on dynamic page migration runtime algorithms based on loop scheduling transformations and manual data distribution these techniques present the programmer with trade offs between performance and programming effort automatic page placement algorithms are transparent to the programmer but may compromise memory access locality dynamic page migration algorithms are also transparent but require careful engineering and tuned implementations to be effective manual data distribution requires substantial programming effort and architecture specific extensions to the api but may localize memory accesses in nearly optimal manner loop scheduling transformations may or may not require intervention from the programmer but conform better to an architecture agnostic programming paradigm like openmp we identify the conditions under which runtime data distribution algorithms can optimize memory access locality in openmp we also present two novel runtime data distribution techniques one based on memory access traces and another based on affinity scheduling of parallel loops these techniques can be used to effectively replace manual data distribution in regular applications the results provide proof of concept that it is possible to scale portable shared memory programming model up to more than processors without modifying the api and without exposing architectural details to the programmer
components defined in software architecture have two features as basic elements of the architecture they must conform to the architectural constraints and in the meantime similar to the common components they should be designed flexibly enough to be able to be developed independently for the late third party integration however these two important issues have always been handled separately from different points of view which leads to the extra work confusions in the program structures as well as the difficulty in maintenance this paper presents basic model of the architecture based components implementation to bind these two issues together it firstly describes novel design pattern triple pattern which stands for components communicate through connector this pattern not only emphasizes that implementation must completely conform to the architectural definition but also attempts to change the fundamental way of components communication by suggesting that provided services should be transferred through the connector instead of directly between the client and server components second it describes novel adl jcmpl toolset jcmp and techniques to keep architectural conformance in the implementation as well as support the architectural integration from separate components finally this model is evaluated in case study
an important potential application of image based techniques is to create photo realistic image based environments for interactive walkthrough however existing image based studies are based on different assumptions with different focuses there is lack of general framework or architecture for evaluation and development of practical image based system in this paper we propose an architecture to unify different image based methods based on the architecture we propose an image based system to support interactive walkthrough of scalable environments in particular we introduce the concept of angular range which is useful for designing scalable configuration recovering geometric proxy as well as rendering we also propose new method to recover geometry information even from outdoor scenes and new rendering method to address the problem of abrupt visual changes in scalable environment
we present method for automatically acquiring corpus of disputed claims from the web we consider factual claim to be disputed if page on the web suggests both that the claim is false and also that other people say it is true our tool extracts disputed claims by searching the web for patterns such as falsely claimed that and then using statistical classifier to select text that appears to be making disputed claim we argue that such corpus of disputed claims is useful for wide range of applications related to information credibility on the web and we report what our current corpus reveals about what is being disputed on the web
this work evaluates task allocation strategies based on bin packing algorithms in the context of multiprocessor systems on chip mpsocs with task migration capabilities running soft real time applications the task migration model assumes that the whole code and data of the tasks are transferred from an origin node to the chosen destination node we combine two types of algorithms to obtain better allocation results experimental results show that there is trade off between deadline misses and system energy consumption when applying bin packing and linear clustering algorithms in order to save energy our system turns off idle processors and applies dynamic voltage scaling to processors with slack depending on the algorithm selection and on the application it is possible to obtain reduction on deadline misses from to and energy consumption savings from to
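the bin packing side of such an allocation can be pictured with a plain first fit decreasing heuristic as in the sketch below tasks are characterized only by a cpu utilization every processor is one bin of capacity 1.0 and the migration clustering and dvs aspects of the work above are deliberately not modelled

def first_fit_decreasing(task_utils, capacity=1.0):
    bins = []                                   # each bin is [load, [tasks]]
    for util in sorted(task_utils, reverse=True):
        for b in bins:
            if b[0] + util <= capacity:
                b[0] += util
                b[1].append(util)
                break
        else:
            bins.append([util, [util]])         # open a new processor
    return bins

allocation = first_fit_decreasing([0.6, 0.4, 0.3, 0.3, 0.2, 0.1])
print(len(allocation), "processors used")       # unused processors could be powered off
for load, tasks in allocation:
    print(round(load, 2), tasks)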
today’s system administrators burdened by rapidly increasing network activity must quickly perceive the security state of their networks but they often have only text based tools to work with these tools often provide no overview to help users grasp the big picture our interviews with administrators have revealed that they need visualization tools thus we present visual visual information security utility for administration live network security visualization tool that allows users to see communication patterns between their home or internal networks and external hosts visual is part of our network eye security visualization architecture also described in this paper we have designed and tested new computer security visualization that gives quick overview of current and recent communication patterns in the monitored network to the users many tools can detect and show fan out and fan in but visual shows network events graphically in context visualization helps users comprehend the intensity of network events more intuitively than text based tools can visual provides insight for networks with up to home hosts and external hosts shows the relative activity of hosts displays them in constant relative position and reveals the ports and protocols used
advances in technology have rendered the internet viable medium for employing multiple independent computers collaboratively in the solution of single computational problem variety of mechanisms eg web based computing peer to peer computing and grid computing have been developed for such internet based computing ic scheduling computation for ic presents challenges that were not encountered with earlier modalities of parallel or distributed computing especially when the computation’s constituent tasks have interdependencies that constrain their order of execution the process of scheduling such computations for ic is studied via pebble game that abstracts the process of orchestrating the allocation of computation’s interdependent tasks to participating computers quality measure for plays of this game is developed that addresses the danger of gridlock in ic when computation stalls because due to dependencies no tasks are eligible for execution this measure rewards schedules that maximize the number of tasks that are eligible for execution at every step of the computation one avenue for minimizing the likelihood of gridlock the resulting formal setting is illustrated via the problem of scheduling computations whose intertask dependencies have the structure of evolving meshes of finite dimensionalities within an idealized setting simple scheduling strategy is shown to be optimal when the dependencies have the structure of two dimensional mesh and within constant factor of optimal for meshes of higher dimensionalities the strategy remains optimal for generalization of two dimensional meshes whose structures are determined by abelian monoids monoid based version of cayley graphs the optimality results for the idealized setting provide scheduling guidelines for real settings
most of the software in regular use in businesses and organisations all over the world cannot be completely specified it cannot be implemented once and for all both the original implementation and the inevitable subsequent evolution maintenance are continual learning experience driven inter alia by feedback from the results of the behaviour under execution of the software as perceived by various stakeholders by advances and growth in the user organisations and by adaptation to changes in the external world both independent and as result of installation and use of the software real world termed type software is essentially evolutionary in nature the study of the processes of evolution of such software is of considerable interest as is that of the domains that co evolve with the software after briefly discussing the meaning of the term evolution in the context of software its technology the software process and related domains this paper describes some of the facets of the evolution phenomenon and implications to the evolution process as identified during many years of active interest in the topic
in this paper we explore new data mining capability that involves mining calling path patterns in global system for mobile communication gsm networks our proposed method consists of two phases first we devise data structure to convert the original calling paths in the log file into frequent calling path graph second we design an algorithm to mine the calling path patterns from the frequent calling path graph obtained by using the frequent calling path graph to mine the calling path patterns our proposed algorithm does not generate unnecessary candidate patterns and requires less database scans if the corresponding calling path graph of the gsm network can be fitted in the main memory our proposed algorithm scans the database only once otherwise the cellular structure of the gsm network is divided into several partitions so that the corresponding calling path sub graph of each partition can be fitted in the main memory the number of database scans for this case is equal to the number of partitioned sub graphs therefore our proposed algorithm is more efficient than the prefixspan and priori like approaches the experimental results show that our proposed algorithm outperforms the priori like and prefixspan approaches by several orders of magnitude
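a minimal reading of the first phase is sketched below a log of calling paths is reduced to a graph that keeps only edges whose support reaches a threshold the function names and threshold handling are assumptions and the partitioning and pattern enumeration phases described above are not shown

from collections import defaultdict

def frequent_calling_path_graph(paths, min_support):
    """paths: iterable of cell id sequences, e.g. [[1, 2, 5], [1, 2, 6], ...]"""
    edge_count = defaultdict(int)
    for path in paths:
        for a, b in zip(path, path[1:]):
            edge_count[(a, b)] += 1
    # keep only edges whose support reaches the threshold
    graph = defaultdict(list)
    for (a, b), cnt in edge_count.items():
        if cnt >= min_support:
            graph[a].append((b, cnt))
    return graph

log = [[1, 2, 3], [1, 2, 4], [1, 2, 3], [5, 2, 3]]
g = frequent_calling_path_graph(log, min_support=2)
print(dict(g))   # e.g. {1: [(2, 3)], 2: [(3, 3)]}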
embedded software is widely used in automotive applications often in critical situations where reliability of the system is extremely important such systems often use model based development approaches model transformation is an important step in such scenarios this includes generating code from models transforming design models into analysis models or transforming model between variants of formalism such as variants of statecharts it becomes important to verify that the transformation was correct and the transformed model or code preserved the semantics of the design model in this paper we will look at technique called goal directed certification that provides pragmatic solution to the verification problem we will see how we can use concepts of bisimulation to verify whether certain transformation instance preserved certain properties we will then extend this idea using weak bisimulation and semantic anchoring to more general class of transformations
this paper classifies embedded software into four models according to hardware platform and execution time we propose an algorithm cmpch cost minimization with probability for configurable hardware that efficiently solves the configurable hardware no fixed execution time model which is the most complicated of the four models cmpch can also solve the other three models our approach fully takes advantage of configurable hardware and the soft real time feature to improve the system performance experimental results show our approach achieves significant cost reduction compared with previous work
we concentrate on automatic revision of untimed and real time programs with respect to unity properties the main focus of this article is to identify instances where addition of unity properties can be achieved efficiently in polynomial time and where the problem of adding unity properties is difficult np complete regarding efficient revision we present sound and complete algorithm that adds single leads to property respectively bounded time leads to property and conjunction of unless stable and invariant properties respectively bounded time unless and stable to an existing untimed respectively real time unity program in polynomial time in the state space respectively region graph of the given program regarding hardness results we show that while one leads to respectively ensures property can be added in polynomial time the problem of adding two such properties or any combination of leads to and ensures is np complete if maximum non determinism is desired then the problem of adding even single leads to property is np complete and the problem of providing maximum non determinism while adding single bounded time leads to property to real time program is np complete in the size of the program’s region graph even if the original program satisfies the corresponding unbounded leads to property
current multi display environments mdes can be composed of displays with different characteristics eg resolution size located in any position and at different angles these heterogeneous arrangements present specific interface problems it is difficult to provide meaningful transitions of cursors between displays it is difficult for users to visualize information that is presented on oblique surfaces and it is difficult to spread visual information over multiple displays in this paper we present middleware architecture designed to support new kind of perspective aware gui that solves the aforementioned problems our interaction architecture combines distributed input and position tracking data to generate perspective corrected output in each of the displays allowing groups of users to manipulate existing applications from current operating systems across large number of displays to test our design we implemented complex mde prototype and measured different aspects of its performance
we present two variants of the krivine abstract machine that reduce lambda terms to full normal form we give proof of their correctness by interpreting their behaviour in the calculus
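for context the sketch below is the standard textbook krivine machine which stops at weak head normal form the variants discussed above extend this style of machine so that reduction continues to full normal form the de bruijn representation is an implementation choice of the sketch not of the paper

from collections import namedtuple

Var = namedtuple("Var", "index")
Lam = namedtuple("Lam", "body")
App = namedtuple("App", "fun arg")

def krivine(term):
    env, stack = [], []                        # env holds closures (term, env)
    while True:
        if isinstance(term, App):
            stack.append((term.arg, env))      # push argument closure
            term = term.fun
        elif isinstance(term, Lam):
            if not stack:
                return term, env               # weak head normal form reached
            closure = stack.pop()
            env = [closure] + env              # bind the argument
            term = term.body
        else:                                  # Var: look up its closure
            term, env = env[term.index]

# (\x. x) (\y. y)  reduces to  \y. y
identity = Lam(Var(0))
print(krivine(App(identity, Lam(Var(0)))))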
ccl checkpointing and communication library is software layer in support of optimistic parallel discrete event simulation pdes on myrinet based cots clusters beyond classical low latency message delivery functionalities this library implements cpu offloaded non blocking asynchronous checkpointing functionalities based on data transfer capabilities provided by programmable dma engine on board of myrinet network cards these functionalities are unique since optimistic simulation systems conventionally rely on checkpointing implemented as synchronous cpu based data copy releases of ccl up to only support monoprogrammed non blocking checkpoints this forces re synchronization between cpu and dma activities which is potential source of overhead each time new checkpoint request must be issued at the simulation application level while the last issued one is still being carried out by the dma engine in this paper we present redesigned release of ccl that exploiting hardware capabilities of more advanced myrinet clusters supports multiprogrammed non blocking checkpoints the multiprogrammed approach allows higher degree of concurrency between checkpointing and other simulation specific operations carried out by the cpu with benefits on performance we also report the results of the experimental evaluation of those benefits for the case of personal communication system pcs simulation application selected as real world test bed
dynamic voltage scaling dvs is one of the techniques used to obtain energy saving in real time dsp systems in many dsp systems some tasks contain conditional instructions that have different execution times for different inputs due to the uncertainties in execution time of these tasks this paper models each varied execution time as probabilistic random variable and solves the voltage assignment with probability vap problem vap problem involves finding voltage level to be used for each node of a data flow graph dfg in uniprocessor and multiprocessor dsp systems this paper proposes two optimal algorithms one for uniprocessor and one for multiprocessor dsp systems to minimize the expected total energy consumption while satisfying the timing constraint with guaranteed confidence probability the experimental results show that our approach achieves significantly more energy saving than previous work for example our algorithm for multiprocessor achieves an average improvement of on total energy saving with probability satisfying timing constraint
word alignment methods can gain valuable guidance by ensuring that their alignments maintain cohesion with respect to the phrases specified by monolingual dependency tree however this hard constraint can also rule out correct alignments and its utility decreases as alignment models become more complex we use publicly available structured output svm to create max margin syntactic aligner with soft cohesion constraint the resulting aligner is the first to our knowledge to use discriminative learning method to train an itg bitext parser
the locality of the data in parallel programs is known to have strong impact on the performance of distributed memory multiprocessor systems the worse the locality in access pattern the worse the performance of single threaded multiprocessor systems the main reason is that lower locality increases the latency for network messages so processor waiting for these messages idles for long periods good data partitioning strategy strives to improve the locality of accesses by reducing the data sharing and the network traffic certain amount of data sharing however is must for any non trivial parallel program so to tune the performance of multiprocessor systems compilers and programmers expend significant effort to improve the data partitioning the technique of multithreading has been promoted as an effective mechanism to hide inter processor communication and remote data access latencies by quickly switching among set of ready threads in this paper we show that multithreading also provides an immunity to the performance variations due to changes in data locality distributions in distributed memory multiprocessor first we propose two performance metrics to quantify the sensitivity of performance to the data locality second we perform quantitative comparison of data locality sensitivity with both single threaded and multithreaded computations underlying the designed experiments and benchmark programs we perform these experiments on the node earth manna system our experimental results show that not only does multithreaded computation yield higher performance than does the single threaded computation but the performance is more robust with respect to the same data partitioning that is lower data locality sensitivity can be achieved with multithreading
overload control is challenging problem for web based applications which are often prone to unexpected surges of traffic existing solutions are still far from guaranteeing the necessary responsiveness under rapidly changing operative conditions we contribute an original self overload control soc algorithm that self configures dynamic constraint on the rate of incoming new sessions in order to guarantee the fulfillment of the quality requirements specified in service level agreement sla our algorithm is based on measurement activity that makes the system capable of self learning and self configuring even in the case of rapidly changing traffic scenarios dynamic resource provisioning or server faults unlike other approaches our proposal does not require any prior information about the incoming traffic or any manual configuration of key parameters we ran extensive simulations under wide range of operating conditions the experiments show how the proposed system self protects from overload meeting sla requirements even under intense workload variations moreover it rapidly adapts to unexpected changes in available capacity as in the case of faults or voluntary architectural adjustments performance comparisons with other previously proposed approaches show that our algorithm has better performance and more stable behavior
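the toy controller below only conveys the general feedback idea measure a response time compare it with the sla target and scale a cap on admitted new sessions the class name constants and update rule are invented and the soc algorithm above is measurement based in a more refined self configuring way

class AdmissionController:
    def __init__(self, sla_resp_time, initial_rate=100.0):
        self.sla = sla_resp_time        # target response time in seconds
        self.rate = initial_rate        # max new sessions admitted per interval

    def update(self, measured_resp_time):
        if measured_resp_time > self.sla:
            # overload: shrink the admission cap proportionally to the violation
            self.rate *= max(0.5, self.sla / measured_resp_time)
        else:
            # spare capacity: probe upwards gently
            self.rate *= 1.05
        return self.rate

ctrl = AdmissionController(sla_resp_time=0.5)
for rt in [0.3, 0.4, 0.9, 1.2, 0.6, 0.4]:     # measured response time per interval
    print(round(ctrl.update(rt), 1))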
interpreters designed for efficiency execute huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions branch target buffers are the best widely available form of indirect branch prediction however their prediction accuracy for existing interpreters is only in this paper we investigate two methods for improving the prediction accuracy of btbs for interpreters replicating virtual machine vm instructions and combining sequences of vm instructions into superinstructions we investigate static interpreter build time and dynamic interpreter run time variants of these techniques and compare them and several combinations of these techniques these techniques can eliminate nearly all of the dispatch branch mispredictions and have other benefits resulting in speedups by factor of up to over efficient threaded code interpreters and speedups by factor of up to over techniques relying on superinstructions alone
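the notion of a superinstruction can be seen in the toy interpreter below where a frequent push add pair is dispatched through one fused handler the python code obviously cannot show indirect branch prediction effects it only illustrates why fused handlers cut the number of dispatches the opcode names are made up

def run(code, ops):
    stack, pc = [], 0
    while pc < len(code):
        pc = ops[code[pc]](stack, code, pc)    # the dispatch point
    return stack

def op_push(stack, code, pc):
    stack.append(code[pc + 1]); return pc + 2

def op_add(stack, code, pc):
    b, a = stack.pop(), stack.pop(); stack.append(a + b); return pc + 1

def op_push_add(stack, code, pc):              # superinstruction: PUSH then ADD
    stack.append(stack.pop() + code[pc + 1]); return pc + 2

ops = {"PUSH": op_push, "ADD": op_add, "PUSH_ADD": op_push_add}

plain = ["PUSH", 1, "PUSH", 2, "ADD", "PUSH", 3, "ADD"]
fused = ["PUSH", 1, "PUSH_ADD", 2, "PUSH_ADD", 3]       # same result, fewer dispatches
print(run(plain, ops), run(fused, ops))                 # [6] [6]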
multicluster architectures overcome the scaling problem of centralized resources by distributing the datapath register file and memory subsystem across multiple clusters connected by communication network traditional compiler partitioning algorithms focus solely on distributing operations across the clusters to maximize instruction level parallelism the distribution of data objects is generally ignored in this work we examine explicit partitioning of data objects and its effects on operation partitioning the partitioning of data objects must consider several factors object size access frequency pattern and dependence patterns between operations that manipulate the objects this work proposes compiler directed approach to synergistically partition both data objects and computation across multiple clusters first global view of the application determines the interaction between data memory objects and their associated computation next data objects are partitioned across multiple clusters with knowledge of the associated computation required by the application finally the resulting distribution of the data objects is relayed to region level computation partitioner which carefully places computation operations in performance centric manner
designing topologically aware overlays is recurrent subject in peer to peer research although there exists plethora of approaches internet coordinate systems such as gnp which attempt to predict the pair wise latencies between nodes using only measurements have become the most attractive approach to make the overlay connectivity structures congruent with the underlying ip level network topology with appropriate input coordinate systems allow complex distributed problems to be solved geometrically including multicast server selection etc for these applications and presumably others like that exact topological information is not required and it is sufficient to use informative hints about the relative positions of internet clients clustering operation which attempts to partition set of objects into several subsets that are distinguishable under some criterion of similarity could significantly ease these operations however when the main objective is clustering nodes internet coordinate systems present strong limitations to identify the right clusters problem known as false clustering in this work the authors answer fundamental question that has been obscured in proximity techniques so far how often false clustering happens in reality and how much this affects the overall performance of an overlay to that effect the authors present novel approach called tr clustering to cluster nodes in overlay networks based on their physical positions on the internet to be specific tr clustering uses the internet routers with high vertex betweenness centrality to cluster participating nodes informally the betweenness centrality of router is defined as the fraction of shortest paths between all pairs of nodes running through it simulation results illustrate that tr clustering is superior to existing techniques with less than of falsely clustered peers of course relative to the datasets utilized in their evaluation
wide range of database applications manage information that varies over time many of the underlying database schemas of these were designed using one of the several versions of the entity relationship er model in the research community as well as in industry it is common knowledge that the temporal aspects of the mini world are pervasive and important but are also difficult to capture using the er model not surprisingly several enhancements to the er model have been proposed in an attempt to more naturally and elegantly support the modeling of temporal aspects of information common to most of the existing temporally extended er models is that the semantics of the models are unclear this problem is addressed in this paper by developing formal semantics for the timeer model based on denotational semantics
broadcast scheduling is popular method for disseminating information in response to client requests there are pages of information and clients request pages at different times however multiple clients can have their requests satisfied by single broadcast of the requested page in this paper we consider several related broadcast scheduling problems one central problem we study simply asks to minimize the maximum response time over all requests another related problem we consider is the version in which every request has release time and deadline and the goal is to maximize the number of requests that meet their deadlines while approximation algorithms for both these problems were proposed several years back it was not known if they were np complete one of our main results is that both these problems are np complete in addition we use the same unified approach to give simple np completeness proof for minimizing the sum of response times very complicated proof was known for this version furthermore we give proof that fifo is competitive online algorithm for minimizing the maximum response time this result had been claimed earlier with no proof and that there is no better deterministic online algorithm this result was claimed earlier as well but with an incorrect proof
the conflict between web service personalization and privacy is challenge in the information society in this paper we address this challenge by introducing masks an architecture that provides data on the users interests to web services without violating their privacy the proposed approach hides the actual identity of users by classifying them into groups according to their interests exhibited during the interaction with web service by making requests on behalf of group instead of an individual user masks provides relevant information to the web services without disclosing the identity of the users we have implemented and tested grouping algorithm based on categories defined by the semantic tree of dmoz we used access logs from actual commerce sites to evaluate the grouping algorithm our tests show that of the requests made to the commerce service could be grouped into meaningful categories this indicates that the commerce sites could use the information provided by masks to do personalization of services without having access to the individual users in the groups
the internet’s interdomain routing protocol bgp supports complex network of autonomous systems which is vulnerable to number of potentially crippling attacks several promising cryptography based solutions have been proposed but their adoption has been hindered by the need for community consensus cooperation in public key infrastructure pki and common security protocol rather than force centralized control in distributed network this paper examines distributed security methods that are amenable to incremental deployment typically such methods are less comprehensive and not provably secure the paper describes distributed anomaly detection and response system that provides comparable security to cryptographic methods and has more plausible adoption path specifically the paper makes the following contributions it describes pretty good bgp pgbgp whose security is comparable but not identical to secure origin bgp it gives theoretical proofs on the effectiveness of pgbgp it reports simulation experiments on snapshot of the internet topology annotated with the business relationships between neighboring networks it quantifies the impact that known exploits could have on the internet and it determines the minimum number of ases that would have to adopt distributed security solution to provide global protection against these exploits taken together these results explore the boundary between what can be achieved with provably secure centralized security mechanisms for bgp and more distributed approaches that respect the autonomous nature of the internet
we propose secvisor tiny hypervisor that ensures code integrity for commodity os kernels in particular secvisor ensures that only user approved code can execute in kernel mode over the entire system lifetime this protects the kernel against code injection attacks such as kernel rootkits secvisor can achieve this property even against an attacker who controls everything but the cpu the memory controller and system memory chips further secvisor can even defend against attackers with knowledge of zero day kernel exploits our goal is to make secvisor amenable to formal verification and manual audit thereby making it possible to rule out known classes of vulnerabilities to this end secvisor offers small code size and small external interface we rely on memory virtualization to build secvisor and implement two versions one using software memory virtualization and the other using cpu supported memory virtualization the code sizes of the runtime portions of these versions are and lines respectively the size of the external interface for both versions of secvisor is hypercalls it is easy to port os kernels to secvisor we port the linux kernel version by adding lines and deleting lines out of total of approximately million lines of code in the kernel
mobile phone with camera enabled people to capture moments to remember in the right time and place due to limited user interface however mobile phone is not yet platform for enjoying the captured moments we explored possible application scenarios for promoting utilization of user created photos on mobile phone for the realization of the scenarios we designed context aware photo selection algorithms that take into consideration mobile phone contexts such as the current location and recent calls user study was conducted with mobile phone prototype for the evaluation of the photo selection algorithms and also for user feedback about the photo consumption scenarios
in this paper we describe service oriented middleware architecture for grid environments which enables efficient data management our design introduces concepts from peer to peer computing in order to provide scalable and reliable infrastructure for storage search and retrieval of annotated content to ensure fast file lookups in the distributed repositories our system incorporates multidimensional indexing scheme which serves the need for supporting both exact match and range queries over group of metadata attributes finally file transfers are conducted using gridtorrent grid enabled peer to peer mechanism that performs efficient data transfers by enabling cooperation among participating nodes and balances the cost of file transfer among them the proposed architecture is the middleware component used by the gredia project in which both media and banking partners plan to share large loads of annotated content
historically processor accesses to memory mapped device registers have been marked uncachable to ensure their visibility to the device the ubiquity of snooping cache coherence however makes it possible for processors and devices to interact with cachable coherent memory operations using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads eg for polling this paper begins an exploration of network interfaces nis that use coherence coherent network interfaces cnis to improve communication performance we restrict this study to ni cnis that reside on coherent memory or buses to ni cnis that are much simpler than processors and to the performance of fine grain messaging from user process to user process our first contribution is to develop and optimize two mechanisms that cnis use to communicate with processors cachable device register derived from cachable control registers is coherent cachable block of memory used to transfer status control or data between device and processor cachable queues generalize cachable device registers from one cachable coherent memory block to contiguous region of cachable coherent blocks managed as circular queue our second contribution is taxonomy and comparison of four cnis with more conventional ni microbenchmark results show that cnis can improve the round trip latency and achievable bandwidth of small byte message by and respectively on the memory bus and and respectively on coherent bus experiments with five macrobenchmarks show that cnis can improve the performance by on the memory bus and on the bus
this paper presents amelie service oriented framework that supports the implementation of awareness systems amelie adopts the tenets of recombinant computing to address an important non functional requirement for ambient intelligence software namely the heterogeneous combination of services and components amelie is founded upon fn aar an abstract model of awareness systems which enables the immediate expression and implementation of socially salient requirements such as symmetry and social translucence we discuss the framework and show how system behaviours can be specified using the awareness mark up language aml
the problem of selecting subset of relevant features is classic and found in many branches of science including examples in pattern recognition in this paper we propose new feature selection criterion based on low loss nearest neighbor classification and novel feature selection algorithm that optimizes the margin of nearest neighbor classification through minimizing its loss function at the same time theoretical analysis based on energy based model is presented and some experiments are also conducted on several benchmark real world data sets and facial data sets for gender classification to show that the proposed feature selection method outperforms other classic ones
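as a rough analogue of a nearest neighbour margin criterion the relief style scoring below rewards features that push the nearest example of another class further away than the nearest example of the same class the paper’s loss function and energy based analysis are different this is only an illustration with made up data

import numpy as np

def margin_feature_scores(X, y):
    n, d = X.shape
    scores = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i])                   # per feature distances to sample i
        dist = diff.sum(axis=1)
        dist[i] = np.inf
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same class example
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other class example
        # features that separate the miss more than the hit get rewarded
        scores += diff[miss] - diff[hit]
    return scores / n

X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [1.1, 1.2]])
y = np.array([0, 0, 1, 1])
print(margin_feature_scores(X, y))   # the first (class separating) feature scores higher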
this paper describes kilim framework that employs combination of techniques to help create robust massively concurrent systems in mainstream languages such as java ultra lightweight cooperatively scheduled threads actors ii message passing framework no shared memory no locks and iii isolation aware messaging isolation is achieved by controlling the shape and ownership of mutable messages they must not have internal aliases and can only be owned by single actor at time we demonstrate static analysis built around isolation type qualifiers to enforce these constraints kilim comfortably scales to handle hundreds of thousands of actors and messages on modest hardware it is fast as well task switching is faster than java threads and faster than other lightweight tasking frameworks and message passing is faster than erlang currently the gold standard for concurrency oriented programming
this paper describes the design and deployment of collaborative software tool designed for and presently in use on the mars exploration rovers mer mission two central questions are addressed does collaborative content like that created on easels and whiteboards have persistent value can groups of people jointly manage collaboratively created content based on substantial quantitative and qualitative data collected during mission operations it remains difficult to conclusively answer the first question while there is some positive support for the second question the mer mission provides uniquely rich data set on the use of collaborative tools
an extension to the well known mvc architectural pattern is proposed to include an explicit structure model the proposed conceptual model is further extended to address requirements from the research fields cscw and ubiquitous computing furthermore data structure and behavior descriptions have been identified as basic abstractions in summary the proposed model addresses reuse as well as design for change on different levels of abstraction
an approach to mesh denoising based on the concept of random walks is examined the proposed method consists of two stages face normal filtering followed by vertex position updating to integrate the denoised face normals in least squares manner face normal filtering is performed by weighted averaging of normals in neighbourhood novel approach to determining weights is to compute the probability of arriving at each neighbour following fixed length random walk of virtual particle starting at given face of the mesh the probability of the particle stepping from its current face to some neighbouring face is function of the angle between the two face normals based on gaussian distribution whose variance is adaptively adjusted to enhance the feature preserving property of the algorithm the vertex position updating procedure uses the conjugate gradient algorithm for speed of convergence analysis and experiments show that random walks of different step lengths yield similar denoising results our experiments show that in fact iterative application of one step random walk in progressive manner effectively preserves detailed features while denoising the mesh very well this approach is faster than many other feature preserving mesh denoising algorithms
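a one step version of the normal filtering stage might look like the numpy sketch below each neighbouring normal is weighted by a gaussian of the angle between normals and the weights are normalized into transition probabilities the adjacency representation the sigma value and the omission of the least squares vertex updating stage are simplifying assumptions

import numpy as np

def smooth_normals(normals, neighbours, sigma=0.35):
    """normals: (F, 3) unit vectors; neighbours: list of neighbour index lists per face."""
    out = np.empty_like(normals)
    for f, nbrs in enumerate(neighbours):
        idx = [f] + list(nbrs)                       # the walk may also stay on the face
        cos = np.clip(normals[idx] @ normals[f], -1.0, 1.0)
        angles = np.arccos(cos)
        w = np.exp(-(angles ** 2) / (2.0 * sigma ** 2))
        w /= w.sum()                                 # one step transition probabilities
        n = (w[:, None] * normals[idx]).sum(axis=0)
        out[f] = n / np.linalg.norm(n)               # renormalize the averaged normal
    return out

normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 0.1, 1.0],
                    [1.0, 0.0, 0.0]])                # third face is a sharp feature
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
neighbours = [[1, 2], [0, 2], [0, 1]]
print(smooth_normals(normals, neighbours).round(3))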
we discuss model of cooperation among autonomous agents based on the attribution of mental attitudes to groups these attitudes represent the shared beliefs and objectives and the wish to reduce the costs for the members when agents take decision they have to recursively model what their partners are expected to do under the assumption that they are cooperative and they have to adopt the goals and desires attributed to the group otherwise the other members consider them uncooperative and thus liable
learning with imbalanced data is one of the recent challenges in machine learning various solutions have been proposed in order to find treatment for this problem such as modifying methods or the application of preprocessing stage within the preprocessing focused on balancing data two tendencies exist reduce the set of examples undersampling or replicate minority class examples oversampling undersampling with imbalanced datasets could be considered as prototype selection procedure with the purpose of balancing datasets to achieve high classification rate avoiding the bias toward majority class examples evolutionary algorithms have been used for classical prototype selection showing good results where the fitness function is associated to the classification and reduction rates in this paper we propose set of methods called evolutionary undersampling that take into consideration the nature of the problem and use different fitness functions for getting good trade off between balance of distribution of classes and performance the study includes taxonomy of the approaches and an overall comparison among our models and state of the art undersampling methods the results have been contrasted by using nonparametric statistical procedures and show that evolutionary undersampling outperforms the nonevolutionary models when the degree of imbalance is increased
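the overall scheme can be caricatured with the toy genetic search below a binary chromosome marks which majority class examples are kept and fitness is the geometric mean of per class nearest neighbour accuracy evaluated leave one out style the operators fitness function and parameters used here are invented and differ from the evolutionary undersampling models evaluated in the paper

import numpy as np
rng = np.random.default_rng(0)

def loo_one_nn_gmean(X, y, keep):
    # 1nn geometric mean of per class accuracy using X[keep] as prototypes
    d = ((X[:, None, :] - X[keep][None, :, :]) ** 2).sum(axis=2)
    for col, idx in enumerate(keep):
        d[idx, col] = np.inf                 # do not match a point to itself
    pred = y[keep][np.argmin(d, axis=1)]
    accs = [np.mean(pred[y == c] == c) for c in np.unique(y)]
    return float(np.prod(accs) ** (1.0 / len(accs)))

def evolutionary_undersample(X, y, minority=1, pop=20, gens=30):
    maj = np.where(y != minority)[0]
    mino = np.where(y == minority)[0]
    def fitness(mask):
        keep = np.concatenate([mino, maj[mask.astype(bool)]])
        return loo_one_nn_gmean(X, y, keep)
    population = rng.integers(0, 2, size=(pop, len(maj)))
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]
        children = parents.copy()
        children[rng.random(children.shape) < 0.05] ^= 1     # bit flip mutation only
        population = np.vstack([parents, children])
    best = max(population, key=fitness)
    return np.concatenate([mino, maj[best.astype(bool)]])

X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (8, 2))])
y = np.array([0] * 40 + [1] * 8)
selected = evolutionary_undersample(X, y)
print(len(selected), "examples kept of", len(y))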
despite the fact that grid computing is the main theme of distributed computing research during the last few years programming on the grid is still huge difficulty to normal users the pop programming system has been built to provide grid programming facilities which greatly ease the development and the deployment of parallel applications on the grid the original parallel object model used in pop is combination of powerful features of object oriented programming and of high level distributed programming capabilities the model is based on the simple idea that objects are suitable structures to encapsulate and to distribute heterogeneous data and computing elements over the grid programmers can guide the resource allocation for each object through the high level resource descriptions the object creation process supported by the pop runtime system is transparent to programmers both inter object and intra object parallelism are supported through various method invocation semantics the pop programming language extends to support the parallel object model with just few new keywords in this paper we present the grid programming aspects of pop with pop writing grid enabled application becomes as simple as writing sequential application
several analytical models of fully adaptive routing have recently been proposed for wormhole routed ary cubes under the uniform traffic pattern however there has been hardly any model reported yet that deals with other important nonuniform traffic patterns such as hot spots as result most studies have resorted to simulation when evaluating the performance merits of adaptive routing in an effort to fill this gap this paper describes the first analytical model of fully adaptive routing in ary cubes in the presence of hot spot traffic results from simulation show close agreement with those predicted by the model
recent years have revealed growing importance of virtual reality vr visualization techniques which offer comfortable means to enable users to interactively explore data sets particularly in the field of computational fluid dynamics cfd the rapidly increasing size of data sets with complex geometric and supplementary scalar information requires new out of core solutions for fast isosurface extraction and other cfd post processing tasks whereas spatial access methods overcome the limitations of main memory size and support fast data selection their vr support needs to be improved firstly interactive users strongly depend on quick first views of the regions in their view direction and secondly they require quick relevant views even when they change their view point or view direction we develop novel view dependent extensions for access methods which support static and dynamic scenarios our new human vision oriented distance function defines an adjusted order of appearance for data objects in the visualization space and thus supports quick first views by novel incremental concept of view dependent result streaming which interactively follows dynamic changes of users viewpoints and view directions we provide high degree of interactivity and mobility in vr environments our integration into the new index based graphics data server indegs proves the efficiency of our techniques in the context of post processing cfd data with dynamically interacting users
in this paper we propose novel algorithm for directional temporal texture synthesis the generated temporal textures can move in any user specified direction at run time while it requires only static texture image as input we first synthesize texture sequences that approximate to true video clips by texture sequence synthesis algorithm then we use transition probabilities to generate infinite length sequences with semi regular characteristic the quality of synthesized temporal textures can be improved via cross fading technique we also extend our algorithm to interactive rendering it allows users to design the image masks and warp grids by using mouse drag and drop the temporal textures are pasted onto the scenery on the fly at the aid of image masks and warp grids therefore we are able to generate photo animation several examples such as ocean pond and ripple are included for demonstration
this article proposes process to retrieve the url of document for which metadata records exist in digital library catalog but pointer to the full text of the document is not available the process uses results from queries submitted to web search engines for finding the url of the corresponding full text or any related material we present comprehensive study of this process in different situations by investigating different query strategies applied to three general purpose search engines google yahoo msn and two specialized ones scholar and citeseer considering five user scenarios specifically we have conducted experiments with metadata records taken from the brazilian digital library of computing bdbcomp and the dblp computer science bibliography dblp we found that scholar was the most effective search engine for this task in all considered scenarios and that simple strategies for combining and re ranking results from scholar and google significantly improve the retrieval quality moreover we study the influence of the number of query results on the effectiveness of finding missing information as well as the coverage of the proposed scenarios
as databases increasingly integrate different types of information such as time series multimedia and scientific data it becomes necessary to support efficient retrieval of multi dimensional data both the dimensionality and the amount of data that needs to be processed are increasing rapidly as result of the scale and high dimensional nature the traditional techniques have proven inadequate in this paper we propose search techniques that are effective especially for large high dimensional data sets we first propose va file technique which is based on scalar quantization of the data va file is especially useful for searching exact nearest neighbors nn in non uniform high dimensional data sets we then discuss how to improve the search and make it progressive by allowing some approximations in the query result we develop general framework for approximate nn queries discuss various approaches for progressive processing of similarity queries and develop metric for evaluation of such techniques finally new technique based on clustering is proposed which merges the benefits of various approaches for progressive similarity searching extensive experimental evaluation is performed on several real life data sets the evaluation establishes the superiority of the proposed techniques over the existing techniques for high dimensional similarity searching the techniques proposed in this paper are effective for real life data sets which are typically non uniform and they are scalable with respect to both dimensionality and size of the data set
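A minimal sketch of the VA-file idea under its usual formulation (not the authors' implementation): each dimension is scalar-quantized into a few bits, and exact NN search scans the compact approximations to derive lower bounds before fetching full vectors.

```python
import numpy as np

def build_va(X, bits=4):
    """Quantize each dimension into 2**bits cells; return the approximations."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    cells = 2 ** bits
    width = (hi - lo) / cells + 1e-12
    approx = np.clip(((X - lo) / width).astype(int), 0, cells - 1)
    return approx, lo, width

def nn_search(q, X, approx, lo, width):
    """Exact NN: lower-bound filter on the approximations, then verify."""
    cell_lo = lo + approx * width
    cell_hi = cell_lo + width
    gap = np.maximum(np.maximum(cell_lo - q, q - cell_hi), 0.0)
    lower = np.linalg.norm(gap, axis=1)        # lower bound on the true distance
    best, best_d = -1, np.inf
    for i in np.argsort(lower):                # most promising candidates first
        if lower[i] >= best_d:
            break                              # no remaining vector can do better
        d = np.linalg.norm(X[i] - q)
        if d < best_d:
            best, best_d = i, d
    return best, best_d

X = np.random.rand(1000, 16)
approx, lo, width = build_va(X)
print(nn_search(np.random.rand(16), X, approx, lo, width))
```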
we consider asynchronous distributed systems with message losses and process crashes we study the impact of finite process memory on the solution to consensus repeated consensus and reliable broadcast with finite process memory we show that in some sense consensus is easier to solve than reliable broadcast and that reliable broadcast is as difficult to solve as repeated consensus more precisely with finite memory consensus can be solved with failure detector cal and cal variant of the perfect failure detector which is stronger than cal is necessary and sufficient to solve reliable broadcast and repeated consensus
collaborative systems such as grids provide efficient and scalable access to distributed computing capabilities and enable seamless resource sharing between users and platforms this heterogeneous distribution of resources and the various modes of collaborations that exist between users virtual organizations and resource providers require scalable flexible and fine grained access control to protect both individual and shared computing resources in this paper we propose usage control ucon based authorization framework for collaborative applications in our framework usage control policies are defined using subject and object attributes along with system attributes as conditions general attributes include not only persistent attributes such as role and group memberships but also mutable usage attributes of subjects and objects conditions in ucon can be used to support context based authorizations in ad hoc collaborations as proof of concept we implement prototype system based on our proposed architecture and conduct experimental studies to demonstrate the feasibility and performance of our approach
query containment and query answering are two important computational tasks in databases while query answering amounts to computing the result of query over database query containment is the problem of checking whether for every database the result of one query is subset of the result of another query in this article we deal with unions of conjunctive queries and we address query containment and query answering under description logic constraints every such constraint is essentially an inclusion dependency between concepts and relations and their expressive power is due to the possibility of using complex expressions in the specification of the dependencies for example intersection and difference of relations special forms of quantification regular expressions over binary relations these types of constraints capture great variety of data models including the relational the entity relationship and the object oriented model all extended with various forms of constraints they also capture the basic features of the ontology languages used in the context of the semantic web we present the following results on both query containment and query answering we provide method for query containment under description logic constraints thus showing that the problem is decidable and analyze its computational complexity we prove that query containment is undecidable in the case where we allow inequalities in the right hand side query even for very simple constraints and queries we show that query answering under description logic constraints can be reduced to query containment and illustrate how such reduction provides upper bound results with respect to both combined and data complexity
component based software construction relies on suitable models underlying components and in particular the coordinators which orchestrate component behaviour verifying correctness and safety of such systems amounts to model checking the underlying system model where model checking techniques not only need to be correct but since system sizes increase also scalable and efficient in this paper we present sat based approach for bounded model checking of timed constraint automata we present an embedding of bounded model checking into propositional logic with linear arithmetic which overcomes the state explosion problem to deal with large systems by defining product that is linear in the size of the system to further improve model checking performance we show how to embed our approach into an extension of counterexample guided abstraction refinement with craig interpolants
good parameterizations are of central importance in many digital geometry processing tasks typically the behavior of such processing algorithms is related to the smoothness of the parameterization and how much distortion it contains since parameterization maps bounded region of the plane to the surface parameterization for surface which is not homeomorphic to disc must be made up of multiple pieces we present novel parameterization algorithm for arbitrary topology surface meshes which computes globally smooth parameterization with low distortion we optimize the patch layout subject to criteria such as shape quality and metric distortion which are used to steer mesh simplification approach for base complex construction global smoothness is achieved through simultaneous relaxation over all patches with suitable transition functions between patches incorporated into the relaxation procedure we demonstrate the quality of our parameterizations through numerical evaluation of distortion measures and the excellent rate distortion performance of semi regular remeshes produced with these parameterizations the numerical algorithms required to compute the parameterizations are robust and run on the order of minutes even for large meshes
knowledge of the largest traffic flows in network is important for many network management applications the problem of finding these flows is known as the heavy hitter problem and has been the subject of many studies in the past years one of the most efficient and well known algorithms for finding heavy hitters is lossy counting in this work we introduce probabilistic lossy counting plc which enhances lossy counting in computing network traffic heavy hitters plc uses tighter error bound on the estimated sizes of traffic flows and provides probabilistic rather than deterministic guarantees on its accuracy the probabilistic based error bound substantially improves the memory consumption of the algorithm in addition plc reduces the rate of false positives of lossy counting and achieves low estimation error although slightly higher than that of lossy counting we compare plc with state of the art algorithms for finding heavy hitters our experiments using real traffic traces find that plc has between and lower memory consumption between and fewer false positives than lossy counting and small estimation error
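A sketch of the classic lossy counting loop that PLC builds on; the exact probabilistic error bound used by PLC is not reproduced here, so prob_bound below is a stand-in hook where the smaller, distribution-based bound would be plugged in.

```python
class LossyCounter:
    """Classic lossy counting skeleton; PLC swaps in a probabilistic bound."""
    def __init__(self, epsilon=0.001):
        self.w = int(1 / epsilon)              # bucket width
        self.n = 0                             # items seen so far
        self.table = {}                        # key -> (count, max_error)

    def prob_bound(self, bucket_id):
        # classic lossy counting uses bucket_id - 1; PLC would return a smaller
        # bound derived from the flow-size distribution, which shrinks the table.
        return bucket_id - 1

    def update(self, key, amount=1):
        self.n += 1
        bucket_id = (self.n - 1) // self.w + 1
        count, err = self.table.get(key, (0, self.prob_bound(bucket_id)))
        self.table[key] = (count + amount, err)
        if self.n % self.w == 0:               # bucket boundary: prune small entries
            self.table = {k: (c, e) for k, (c, e) in self.table.items()
                          if c + e > bucket_id}

    def heavy_hitters(self, s):
        """Keys whose estimated frequency exceeds a fraction s of the stream."""
        return [k for k, (c, e) in self.table.items()
                if c >= (s - 1.0 / self.w) * self.n]
```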
personal names are an important kind of web queries in web search and yet they are special in many ways strategies for retrieving information on personal names should therefore be different from the strategies for other types of queries to improve the search quality for personal names first step is to detect whether query is personal name despite the importance of this problem relatively little previous research has been done on this topic since web queries are usually short conventional supervised machine learning algorithms cannot be applied directly an alternative is to apply some heuristic rules coupled with name term dictionaries however when the dictionaries are small this method tends to make false negatives when the dictionaries are large it tends to generate false positives more serious problem is that this method cannot provide good trade off between precision and recall to solve these problems we propose an approach based on the construction of probabilistic name term dictionaries and personal name grammars and use this algorithm to predict the probability of query to be personal name in this paper we develop four different methods for building probabilistic name term dictionaries in which term is assigned with probability value of the term being name term we compared our approach with baseline algorithms such as dictionary based look up methods and supervised classification algorithms including logistic regression and svm on some manually labeled test sets the results validate the effectiveness of our approach whose value is more than which outperforms the best baseline by more than
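A hedged illustration of scoring a query with probabilistic name-term dictionaries and a trivial name grammar (given-name terms followed by a surname term); the dictionary entries and the grammar are invented for the example, not taken from the paper.

```python
# Illustrative probabilistic dictionaries: term -> P(term is a name term).
FIRST = {"john": 0.97, "maria": 0.95, "jordan": 0.60}
LAST  = {"smith": 0.92, "jordan": 0.55, "lee": 0.70}

def name_probability(query):
    """Probability-like score that the query is a personal name."""
    terms = query.lower().split()
    if not (1 <= len(terms) <= 3):
        return 0.0
    score = 1.0
    for i, t in enumerate(terms):
        # toy grammar: the final term is treated as a surname, the rest as given names
        dic = LAST if i == len(terms) - 1 else FIRST
        score *= dic.get(t, 0.0)
    return score

print(name_probability("john smith"))        # high score
print(name_probability("python tutorial"))   # 0.0
```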
this paper presents the concept of inquisitive use and discusses design considerations for creating experience oriented interactive systems that inspire inquisitive use inquisitive use is based on the pragmatism of john dewey and defined by the interrelated aspects of experience inquiry and conflict the significance of this perspective for design is explored and discussed through two case studies of experience oriented installations the paper contributes to the expanding discourse on experience design on theoretical level by exploring one particular facet of interaction inquisitive use and on practical level by discussing implications for design prompted by insights into inquisitive use these implications are presented as set of design sensitivities which provide contextual insights and considerations for ongoing and future design processes
in this paper we present context aware rbac carbac model for pervasive computing applications the design of this model has been guided by the context based access control requirements of such applications these requirements are related to users memberships in roles permission executions by role members and context based dynamic integration of services in the environment with an application context information is used in role admission policies in policies related to permission executions by role members and in policies related to accessing of dynamically interfaced services by role members the dynamic nature of context information requires model level support for revocations of role memberships and permission activations when certain context conditions fail to hold based on this model we present programming framework for building context aware applications providing mechanisms for specifying and enforcing context based access control requirements
the bee is an integrated development environment for the scheme programming language it provides the user with connection between scheme and the programming language symbolic debugger profiler an interpreter an optimizing compiler that delivers stand alone executables source file browser project manager user libraries and online documentation this article details the facilities of the bee its user interface and presents an overview of the implementation of its main components
this paper reviews research on automatic summarising in the last decade this work has grown stimulated by technology and by evaluation programmes the paper uses several frameworks to organise the review for summarising itself for the factors affecting summarising for systems and for evaluation the review examines the evaluation strategies applied to summarising the issues they raise and the major programmes it considers the input purpose and output factors investigated in recent summarising research and discusses the classes of strategy extractive and non extractive that have been explored illustrating the range of systems built the conclusions drawn are that automatic summarisation has made valuable progress with useful applications better evaluation and more task understanding but summarising systems are still poorly motivated in relation to the factors affecting them and evaluation needs taking much further to engage with the purposes summaries are intended to serve and the contexts in which they are used
in this work we propose novel framework for bookmark weighting which allows us to estimate the effectiveness of each of the bookmarks individually we show that by weighting bookmarks according to their estimated quality we can significantly improve search effectiveness using empirical evaluation on real data gathered from two large bookmarking systems we demonstrate the effectiveness of the new framework for search enhancement
we study an extension of hennessy milner logic for the calculus which gives sound and complete characterisation of representative behavioural preorders and equivalences over typed processes new connectives are introduced representing actual and hypothetical typed parallel composition and hiding we study three compositional proof systems characterising the may must testing preorders and bisimilarity the proof systems are uniformly applicable to different type disciplines logical axioms distill proof rules for parallel composition studied by amadio and dam we demonstrate the expressiveness of our logic through verification of state transfer in multiparty interactions and fully abstract embeddings of program logics for higher order functions
reputation based trust management is increasingly popular in providing quantitative measurement for peers choosing reliable resources and trusted cooperators in decentralized peer to peer pp environment however existing approaches do little regarding the validation of peer’s reputation that is it is challenging to guarantee the validation and accuracy of computing reputation value due to malicious denigration or overpraising in this work we first investigate the impact of this problem we then propose truthrep approach which encourages peers to provide honest feedback by involving the quality of their evaluations of others into computing reputations we outline the challenging issues of this design and present preliminary experimental results
ui model discovery is lightweight formal method in which model of an interactive system is automatically discovered by exploring the system’s state space simulating the actions of user such models are then amenable to automatic analysis targetting structural usability concerns this paper specifies ui model discovery in some detail providing formal generic and language neutral api and discovery algorithm the technique has been implemented in prototype systems on several programming platforms yielding valuable usability insights the api described here supports further development of these ideas in systematic manner
in this paper we present cmtjava domain specific language for composable memory transactions in java cmtjava provides the abstraction of transactional objects transactional objects have their fields accessed only by special get and set methods that are automatically generated by the compiler these methods return transactional actions as result transactional action is an action that when executed will produce the desired effect transactional actions can only be executed by the atomic method transactional actions are first class values in java and they are composable transactions can be combined to generate new transactions the java type system guarantees that the fields of transactional objects will never be accessed outside transaction cmtjava supports the retry and orelse constructs from stm haskell to validate our design we implemented simple transactional system following the description of the original haskell system cmtjava is implemented as state passing monad using bbga closures java extension that supports closures in java
in the design of embedded systems processor architecture is tradeoff between energy consumption area speed design time and flexibility to cope with future design changes new versions in product generation may require small design changes in any part of the design we propose novel processor architecture concept which provides the flexibility needed in practice at reduced power and performance cost compared to fully programmable processor the crucial element is novel protocol combining an efficient customized component with flexible processor into hybrid architecture
current works address self adaptability of software architectures to build more autonomous and flexible systems however most of these works only perform adaptations at configuration level component is adapted by being replaced with new one the state of the replaced component is lost and related components can undergo undesirable changes this paper presents generic solution to design components that are capable of supporting runtime adaptation taking into account that component type changes must be propagated to its instances the adaptation is performed in decentralized and autonomous way in order to cope with the increasing need for building heterogeneous and autonomous systems as result each component type manages its instances and each instance applies autonomously the changes moreover our proposal uses aspect oriented components to benefit from their reuse and maintenance and it is based on mof and reflection concepts to benefit from the high abstraction level they provide
most of sensor mac protocols have fixed duty cycle which performs poorly under the dynamic traffic condition observed in event driven sensor applications such as surveillance fire detection and object tracking system this paper proposes traffic aware mac protocol which dynamically adjusts the duty cycle adapting to the traffic load our adaptive scheme operates on tree topology and nodes wake up only for the time measured for the successful transmissions by adjusting the duty cycle it can prevent packet drops and save energy the simulation results show that our scheme outperforms fixed duty cycle scheme and mac hence it can achieve high packet fidelity and save energy
many sensor network applications only require probabilistic data delivery as they can tolerate some missing data samples for example in environmental monitoring missing temperature pressure and humidity level samples can often be inferred by spatial and or temporal interpolations in this paper we propose and study an adaptive persistent csma based media access control protocol that supports end to end probabilistic reliability for sensor networks on hop by hop basis in an effort to reduce the probability of packet collisions first we tune the carrier sensing range of the nodes then given an end to end reliability requirement we determine the optimal allocation of per hop reliability requirements on each route to minimize the expected total number of transmissions needed finally our adaptive persistent csma protocol tunes its link persistence probability to further reduce the expected total number of transmissions and thereby minimizes the energy consumption in the network we formulate this latter problem as constrained optimization problem and then derive an algorithm to adapt the link persistence probabilities using the lagrangian dual decomposition method
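A simplified sketch, not the paper's Lagrangian dual solution: the end-to-end reliability target is split evenly across hops in the log domain, and each hop's required number of transmission attempts follows from its per-attempt success probability (the link success rates below are assumptions).

```python
import math

def per_hop_targets(R, h):
    """Even split of the end-to-end reliability R over h hops."""
    return [R ** (1.0 / h)] * h

def attempts_needed(r_i, p_i):
    """Smallest k with 1 - (1 - p_i)**k >= r_i."""
    return math.ceil(math.log(1.0 - r_i) / math.log(1.0 - p_i))

link_success = [0.7, 0.9, 0.8]               # assumed per-attempt success rates
targets = per_hop_targets(R=0.95, h=len(link_success))
total = sum(attempts_needed(r, p) for r, p in zip(targets, link_success))
print(targets, total)                        # per-hop targets and expected attempts
```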
we present an efficient computational algorithm for functions represented by nonlinear piecewise constant approximation called cuts our main contribution is single traversal algorithm for merging cuts that allows for arbitrary pointwise computation such as addition multiplication linear interpolation and multi product integration theoretical error bound of this approach can be proved using statistical interpretation of cuts our algorithm extends naturally to computation with many cuts and maps easily to modern gpus leading to significant advantages over existing methods based on wavelet approximation we apply this technique to the problem of realistic lighting and material design under complex illumination with arbitrary brdfs our system smoothly integrates all frequency relighting of shadows and reflections with dynamic per pixel shading effects such as bump mapping and spatially varying brdfs this combination of capabilities is typically missing in current systems we represent illumination and precomputed visibility as nonlinear sparse vectors we then use our cut merging algorithm to simultaneously interpolate visibility cuts at each pixel and compute the triple product integral of the illumination interpolated visibility and dynamic brdf samples finally we present two pass data driven approach that exploits pilot visibility samples to optimize the construction of the light tree leading to more efficient cuts and reduced datasets
within integrated multiple object databases missing data occurs due to the missing attribute conflict as well as the existence of null values set of algorithms is provided in this paper to process the predicates of global queries with missing data for providing more informative answers to users the maybe results due to missing data are presented in addition to the certain results the local maybe results may become certain results via the concept of object isomerism one algorithm is designed based on the centralized approach in which data are forwarded to the same site for integration and processing furthermore for reducing response time the localized approaches evaluate the predicates within distinct component databases in parallel the object signature is also applied in the design to further reduce the data transfer these algorithms are compared and discussed according to the simulation results of both the total execution and response times alternately the global schema may contain multivalued attributes with values derived from attribute values in different component databases hence the proposed approaches are also extended to process the global queries involving this kind of multivalued attribute
this paper outlines several multimedia systems that utilize multimodal approach these systems include audiovisual based emotion recognition image and video retrieval and face and head tracking data collected from diverse sources sensors are employed to improve the accuracy of correctly detecting classifying identifying and tracking of desired object or target it is shown that the integration of multimodality data will be more efficient and potentially more accurate than if the data was acquired from single source number of cutting edge applications for multimodal systems will be discussed an advanced assistance robot using the multimodal systems will be presented
we present two broadcast authentication protocols based on delayed key disclosure our protocols rely on symmetric key cryptographic primitives and use cryptographic puzzles to provide efficient broadcast authentication in different application scenarios including those with resource constrained wireless devices such as sensor nodes the strong points of the protocols proposed are that one protocol allows instantaneous message origin authentication whereas the other has low communication overhead in addition to formalizing and analyzing these specific protocols we carry out general analysis of broadcast authentication protocols based on delayed key disclosure this analysis uncovers fundamental limitations of this class of protocols in terms of the required accuracy of message propagation time estimations if the protocols are to guarantee security and run efficiently
network coding has been prominent approach to series of problems that used to be considered intractable with traditional transmission paradigms recent work on network coding includes substantial number of optimization based protocols but mostly for wireline multicast networks in this paper we consider maximizing the benefits of network coding for unicast sessions in lossy wireless environments we propose optimized multipath network coding omnc rate control protocol that dramatically improves the throughput of lossy wireless networks omnc employs multiple paths to push coded packets to the destination and uses the broadcast mac to deliver packets between neighboring nodes the coding and broadcast rate is allocated to transmitters by distributed optimization algorithm that maximizes the advantage of network coding while avoiding congestion with extensive experiments on an emulation testbed we find that omnc achieves more than two fold throughput increase on average compared to traditional best path routing and significant improvement over existing multipath routing protocols with network coding the performance improvement is notable not only for one unicast session but also when multiple concurrent unicast sessions coexist in the network
set associative caches achieve low miss rates for typical applications but result in significant energy dissipation set associative caches minimize access time by probing all the data ways in parallel with the tag lookup although the output of only the matching way is used the energy spent accessing the other ways is wasted eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time and is unacceptable for high performance caches in this paper we apply two previously proposed techniques way prediction and selective direct mapping to reducing cache dynamic energy while maintaining high performance the techniques predict the matching way and probe only the predicted way and not all the ways achieving energy savings while these techniques were originally proposed to improve set associative cache access times this is the first paper to apply them to reducing cache energy we evaluate the effectiveness of these techniques in reducing cache cache and overall processor energy using these techniques our caches achieve the energy delay of sequential access while maintaining the performance of parallel access relative to parallel access and caches the techniques achieve overall processor energy delay reduction of while perfect way prediction with no performance degradation achieves reduction the performance degradation of the techniques is less than compared to an aggressive cycle way parallel access cache
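A toy energy model (an assumption, not the paper's evaluation methodology) showing why way prediction helps: a correctly predicted access reads one data way, a misprediction falls back to reading all ways, and the result is compared with an always-parallel access.

```python
def relative_energy(prediction_accuracy, ways, e_way=1.0):
    """Predicted-way data-array energy per access relative to parallel access."""
    parallel = ways * e_way
    predicted = (prediction_accuracy * 1.0 +
                 (1.0 - prediction_accuracy) * (1.0 + ways)) * e_way
    return predicted / parallel

for acc in (0.7, 0.85, 0.95):
    print(f"accuracy={acc:.2f}  energy vs parallel access: "
          f"{relative_energy(acc, ways=4):.2f}")
```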
wireless sensor networks have become important architectures for many application scenarios eg traffic monitoring or environmental monitoring in general as these sensors are battery powered query processing strategies aim at minimizing energy consumption because sending all sensor readings to central stream data management system consumes too much energy parts of the query can already be processed within the network in network query processing an important optimization criterion in this context is where to process which intermediate results and how to route them efficiently to overcome these problems we propose anduin system addressing these problems and offering an optimizer that decides which parts of the query should be processed within the sensor network it also considers optimization with respect to complex data analysis tasks such as burst detection furthermore anduin offers web based frontend for declarative query formulation and deployment in this paper we present our research prototype and focus on anduin’s components alleviating deployment and usability
in this paper we consider new class of logic programs called weight constraint programs with functions which are lparse programs incorporating functions over non herbrand domains we define answer sets for these programs and develop computational mechanism based on loop completion we present our results in two stages first we formulate loop formulas for lparse programs without functions our result improves the previous formulations in that our loop formulas do not introduce new propositional variables nor is there need of translating lparse programs to nested expressions building upon this result we extend the work to weight constraint programs with functions we show that the loop completion of such program can be transformed to constraint satisfaction problem csp whose solutions correspond to the answer sets of the program hence off the shelf csp solvers can be used for answer set computation we show some preliminary experimental results
the increasing complexity of interconnection designs has enhanced the importance of research into global routing when seeking high routability low overflow results or rapid search paths that report wire length estimations to placer this work presents two routing techniques namely adaptive pseudorandom net ordering routing and evolution based rip up and reroute using two stage cost function in high performance congestion driven global router we also propose two efficient via minimization methods namely congestion relaxation by layer shifting and rip up and re assignment for dynamic programming based layer assignment experimental results demonstrate that our router achieves performance similar to the first two winning routers in ispd routing contest in terms of both routability and wire length at and faster routing speed besides our layer assignment yields to fewer vias to shorter wirelength and to less runtime than cola
managing inconsistency in databases has long been recognized as an important problem one of the most promising approaches to coping with inconsistency in databases is the framework of database repairs which has been the topic of an extensive investigation over the past several years intuitively repair of an inconsistent database is consistent database that differs from the given inconsistent database in minimal way so far most of the work in this area has addressed the problem of obtaining the consistent answers to query posed on an inconsistent database repair checking is the following decision problem given two databases and is repair of although repair checking is fundamental algorithmic problem about inconsistent databases it has not received as much attention as consistent query answering in this paper we give polynomial time algorithm for subset repair checking under integrity constraints that are the union of weakly acyclic set of local as view lav tuple generating dependencies and set of equality generating dependencies this result significantly generalizes earlier work for subset repair checking when the integrity constraints are the union of an acyclic set of inclusion dependencies and set of functional dependencies we also give polynomial time algorithm for symmetric difference repair checking when the integrity constraints form weakly acyclic set of lav tgds after this we establish number of complexity theoretic results that delineate the boundary between tractability and intractability for the repair checking problem specifically we show that the aforementioned tractability results are optimal in particular subset repair checking for arbitrary weakly acyclic sets of tuple generating dependencies is conp complete problem we also study cardinality based repairs and show that cardinality repair checking is conp complete for various classes of integrity constraints encountered in database design and data exchange
this paper presents new algorithm for force directed graph layout on the gpu the algorithm whose goal is to compute layouts accurately and quickly has two contributions the first contribution is proposing general multi level scheme which is based on spectral partitioning the second contribution is computing the layout on the gpu since the gpu requires data parallel programming model the challenge is devising mapping of naturally unstructured graph into well partitioned structured one this is done by computing balanced partitioning of general graph this algorithm provides general multi level scheme which has the potential to be used not only for computation on the gpu but also on emerging multi core architectures the algorithm manages to compute high quality layouts of large graphs in fraction of the time required by existing algorithms of similar quality an application for visualization of the topologies of isp internet service provider networks is presented
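For reference, the basic force-directed kernel such a layout parallelizes; this is a plain O(n^2) CPU sketch of one Fruchterman-Reingold-style iteration, not the paper's multi-level, spectrally partitioned GPU algorithm.

```python
import numpy as np

def fr_step(pos, edges, k=1.0, step=0.05):
    """One force-directed iteration: all-pairs repulsion plus edge attraction."""
    n = len(pos)
    disp = np.zeros_like(pos)
    for i in range(n):                              # repulsive forces
        delta = pos[i] - pos
        dist = np.linalg.norm(delta, axis=1) + 1e-9
        disp[i] += (delta / dist[:, None] * (k * k / dist)[:, None]).sum(axis=0)
    for u, v in edges:                              # attractive forces along edges
        delta = pos[u] - pos[v]
        dist = np.linalg.norm(delta) + 1e-9
        f = (dist * dist / k) * (delta / dist)
        disp[u] -= f
        disp[v] += f
    return pos + step * disp

pos = np.random.rand(5, 2)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
for _ in range(100):
    pos = fr_step(pos, edges)
```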
recent research has demonstrated the success of tensor based subspace learning in both unsupervised and supervised configurations eg pca lda and dater in this correspondence we present new semi supervised subspace learning algorithm by integrating the tensor representation and the complementary information conveyed by unlabeled data conventional semi supervised algorithms mostly impose regularization term based on the data representation in the original feature space instead we utilize graph laplacian regularization based on the low dimensional feature space an iterative algorithm referred to as adaptive regularization based semi supervised discriminant analysis with tensor representation arsda is also developed to compute the solution in addition to handling tensor data vector based variant arsda is also presented in which the tensor data are converted into vectors before subspace learning comprehensive experiments on the cmu pie and yale databases demonstrate that arsda brings significant improvement in face recognition accuracy over both conventional supervised and semi supervised subspace learning algorithms
both type accurate and conservative garbage collectors have gained in importance since the original paper was written managing unnecessary retention by conservative collectors continues to be an important problem there appear to be few reimplementations of the techniques we described but significantly refined descendents of the original implementation are alive and well inside large number of applications there has been later work both on quantifying space retention by conservative collectors and on theoretical bounds for such retention we call garbage collector conservative if it has only partial information about the location of pointers and is thus forced to treat arbitrary bit patterns as though they might be pointers in at least some cases we show that some very inexpensive but previously unused techniques can have dramatic impact on the effectiveness of conservative garbage collectors in reclaiming memory our most significant observation is that static data that appears to point to the heap should not result in misidentified references to the heap the garbage collector has enough information to allocate around such references we also observe that programming style has significant impact on the amount of spuriously retained storage typically even if the collector is not terribly conservative some fairly common and programming styles significantly decrease the effectiveness of any garbage collector these observations suffice to explain some of the different assessments of conservative collection that have appeared in the literature
program slicing is general widely used and accepted technique applicable to different software engineering tasks including debugging whereas model based diagnosis is an ai technique originally developed for finding faults in physical systems during the last years it has been shown that model based diagnosis can be used for software debugging in this paper we discuss the relationship between debugging using dependency based model and program slicing as result we obtain that slices of program in fault situation are equivalent to conflicts in model based debugging
although software reuse can improve both the quality and productivity of software development it will not do so until software developers stop believing that it is not worth their effort to find component matching their current problem in addition if the developers do not anticipate the existence of given component they will not even make an effort to find it in the first place even the most sophisticated and powerful reuse repositories will not be effective if developers don’t anticipate certain component exists or don’t deem it worthwhile to seek for it we argue that this crucial barrier to reuse is overcome by integrating active information delivery which presents information without explicit queries from the user and reuse repository systems prototype system codebroker illustrates this integration and raises several issues related to software reuse
modern intrusion detection systems are comprised of three basically different approaches host based network based and third relatively recent addition called procedural based detection the first two have been extremely popular in the commercial market for number of years now because they are relatively simple to use understand and maintain however they fall prey to number of shortcomings such as scaling with increased traffic requirements use of complex and false positive prone signature databases and their inability to detect novel intrusive attempts these intrusion detection systems represent great leap forward over current security technologies by addressing these and other concerns this paper presents an overview of our work in creating true database intrusion detection system based on many years of database security research the proposed solution detects wide range of specific and general forms of misuse provides detailed reports and has low false alarm rate traditional database security mechanisms are very limited in defending successful data attacks authorized but malicious transactions can make database useless by impairing its integrity and availability the proposed solution offers the ability to detect misuse and subversion through the direct monitoring of database operations inside the database host providing an important complement to host based and network based surveillance suites of the proposed solution may be deployed throughout network and their alarms managed correlated and acted on by remote or local subscribing security services thus helping to address issues of decentralized management inside the host the proposed solution is intended to operate as true security daemon for database systems consuming few cpu cycles and very little memory and secondary storage the proposed intrusion prevention solution is managed by an access control system with intrusion detection profiles with item access rates and associating each user with profiles further the method determines whether result of query exceeds any one of the item access rates defined in the profile associated with the user and in that case notifies the access control system to alter the user authorization thereby making the received request an unauthorized request before the result is transmitted to the user the method allows for real time prevention of intrusion by letting the intrusion detection process interact directly with the access control system and change the user authority dynamically as result of the detected intrusion the method is also preventing an administrator impersonating user of relational database which database at least comprises table with at least user password wherein the password is stored as hash value the method comprises the steps of adding trigger to the table the trigger at least triggering an action when an administrator alters the table through the database management system dbms of the database calculating new password hash value differing from the stored password hash value when the trigger is triggered and replacing the stored password hash value with the new password hash value in this paper the design of the first mattssonhybrid prototype which is for oracle server is discussed mattssonhybrid uses triggers and transaction profiles to keep track of the items read and written by transactions isolates attacks by rewriting user sql statements and is transparent to end users the mattssonhybrid design is very general in addition to oracle it can be easily adapted to support many other database application platforms such as ibm db microsoft sql server sybase and informix
we describe method for controlling smoke simulations through user specified keyframes to achieve the desired behavior continuous quasi newton optimization solves for appropriate wind forces to be applied to the underlying velocity field throughout the simulation the cornerstone of our approach is method to efficiently compute exact derivatives through the steps of fluid simulation we formulate an objective function corresponding to how well simulation matches the user’s keyframes and use the derivatives to solve for force parameters that minimize this function for animations with several keyframes we present novel multiple shooting approach by splitting large problems into smaller overlapping subproblems we greatly speed up the optimization process while avoiding certain local minima
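A hedged sketch of the shape of such a keyframe-matching objective: squared differences between simulated and keyframed smoke densities plus a penalty on the control wind forces. The simulate argument is a placeholder for the differentiable fluid step, and the weight lam is an assumption.

```python
import numpy as np

def objective(forces, initial_density, keyframes, simulate, lam=1e-3):
    """forces: per-step wind force fields; keyframes: {step: target density}."""
    density = initial_density
    cost = 0.0
    for step, f in enumerate(forces):
        density = simulate(density, f)                 # one fluid step under force f
        if step in keyframes:
            cost += float(np.sum((density - keyframes[step]) ** 2))
    cost += lam * sum(float(np.sum(f ** 2)) for f in forces)   # force regularizer
    return cost

# toy usage with a dummy, advection-free "simulation" step
sim = lambda rho, f: rho + 0.1 * f
rho0 = np.zeros((8, 8))
frames = {4: np.ones((8, 8))}
forces = [np.ones((8, 8)) for _ in range(5)]
print(objective(forces, rho0, frames, sim))
```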
virtual enterprises ves provide novel solution to potentially enhance global competitiveness via collaboration on product design production assembly and marketing efficient and secure information resource sharing is one of the key factors to successful ve this study presents the architecture of virtual enterprise access control veac system with veac model to handle access control for information sharing across enterprises thus enabling ves to change dynamically to improve current access control models to be adapted to ves the requirements for the access control system and model are first specified by the results of analysis on the characteristics of dynamic ves according to the requirements this study designs the architecture of veac system and presents veac model based on the concept of role based access control rbac the veac model modelled in unified modeling language uml is composed of project based access control pbac model and rbac models to manage resources related to ve activities and to solve problems in distributed authorization management and security access control across organizations that may both compete and cooperate with each other moreover algorithms to generate user authorization are developed based on the model finally the prototype of the veac system is implemented the results of this study can provide ve workers with efficient and easy access to relevant resources with up to date information thus averting information delay and increase information transparency
the integration of virtual memory management and interprocess communication in the accent network operating system kernel is examined the design and implementation of the accent memory management system is discussed and its performance both on series of message oriented benchmarks and in normal operation is analyzed in detail
this paper introduces the checker framework which supports adding pluggable type systems to the java language in backward compatible way type system designer defines type qualifiers and their semantics and compiler plug in enforces the semantics programmers can write the type qualifiers in their programs and use the plug in to detect or prevent errors the checker framework is useful both to programmers who wish to write error free code and to type system designers who wish to evaluate and deploy their type systems the checker framework includes new java syntax for expressing type qualifiers declarative and procedural mechanisms for writing type checking rules and support for flow sensitive local type qualifier inference and for polymorphism over types and qualifiers the checker framework is well integrated with the java language and toolset we have evaluated the checker framework by writing checkers and running them on over lines of existing code the checkers found real errors then confirmed the absence of further errors in the fixed code the case studies also shed light on the type systems themselves
we develop the foundations for theory of group centric secure information sharing sis characterize specific family of models in this arena and identify several directions in which this theory can be extended traditional approach to information sharing characterized as dissemination centric focuses on attaching attributes and policies to an object as it is disseminated from producers to consumers in system in contrast group centric sharing envisions bringing the users and objects together in group to facilitate sharing the metaphors secure meeting room and subscription service characterize the group centric approach where participants and information come together to share for some common purpose our focus in this paper is on semantics of group operations join and leave for users and add and remove for objects each of which can have several variations called types we use linear temporal logic to first characterize the core properties of group in terms of these operations we then characterize additional properties for specific types of these operations finally we specify the authorization behavior for read access in single group for family of sis models and show that these models satisfy the above mentioned properties using the nusmv model checker
wireless mesh networks wmns have emerged recently as technology for next generation wireless networking several approaches that exploit directional and adaptive antennas have been proposed in the literature to increase the performance of wmns however while adaptive antennas can improve the wireless medium utilization by reducing radio interference and the impact of the exposed nodes problem they can also exacerbate the hidden nodes problem therefore efficient mac protocols are needed to fully exploit the features offered by adaptive antennas furthermore routing protocols that were designed for omnidirectional communications can be redesigned to exploit directional transmissions and the cross layer interaction between the mac and the network layer in this paper we first propose novel power controlled directional mac protocol pcd mac for adaptive antennas pcd mac uses the standard rts cts data ack exchange procedure the novel difference is the transmission of the rts and cts packets in all directions with tunable power while the data and ack are transmitted directionally at the minimal required power we then propose the directional deflection routing ddr routing algorithm that exploits multiple paths towards the destination based on the mac layer indication on channel availability in different directions we measure the performance of pcd mac and ddr by simulation of several realistic network scenarios and we compare them with other approaches proposed in the literature the results show that our schemes increase considerably both the total traffic accepted by the network and the fairness among competing connections
one of the most important design objectives in wireless sensor networks wsn is minimizing the energy consumption since these networks are expected to operate in harsh conditions where the recharging of batteries is impractical if not impossible the sleep scheduling mechanism allows sensors to sleep intermittently in order to reduce energy consumption and extend network lifetime in applications where coverage of the network field is not crucial allowing the coverage to drop below full coverage while keeping above predetermined threshold ie partial coverage can further increase the network lifetime in this paper we develop the distributed adaptive sleep scheduling algorithm dassa for wsns with partial coverage dassa does not require location information of sensors while maintaining connectivity and satisfying user defined coverage target in dassa nodes use the residual energy levels and feedback from the sink for scheduling the activity of their neighbors this feedback mechanism reduces the randomness in scheduling that would otherwise occur due to the absence of location information the performance of dassa is compared with an integer linear programming ilp based centralized sleep scheduling algorithm cssa which is devised to find the maximum number of rounds the network can survive assuming that the location information of all sensors is available dassa is also compared with the decentralized dgt algorithm dassa attains network lifetimes up to of the centralized solution and it achieves significantly longer lifetimes compared with the dgt algorithm
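A deliberately simplified sketch of the per-round decision a DASSA-style scheduler makes, using residual-energy comparisons with neighbours and coverage feedback from the sink; the thresholds and the feedback format are illustrative assumptions.

```python
def decide_state(residual, neighbour_residuals, coverage_ratio,
                 coverage_target=0.8):
    """Return 'sleep' or 'active' for one node in the current round."""
    if coverage_ratio < coverage_target:
        return "active"                    # sink reports coverage below target
    richer = sum(1 for e in neighbour_residuals if e > residual)
    # nodes with comparatively low residual energy are allowed to sleep first
    if richer >= len(neighbour_residuals) / 2:
        return "sleep"
    return "active"

print(decide_state(0.4, [0.7, 0.8, 0.3], coverage_ratio=0.9))   # sleep
print(decide_state(0.4, [0.7, 0.8, 0.3], coverage_ratio=0.7))   # active
```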
we report on initial research in tree locking tl schemes for compiled database applications such applications have repository style of architecture in which collection of software modules operate on common database in terms of set of predefined transaction types an architectural view that is also useful for embedded control programs since tl schemes are deadlock free it becomes possible to entirely decouple concurrency control from any functionality relating to recovery this property can help in the deployment of database technology to this new application area moreover with knowledge of transaction workload efficacious lock trees for runtime concurrency control can be determined at the time of system generation our experimental results show that tl produces better throughput than traditional two phase locking pl when transactions are write only and for main memory data tl performs comparably to pl even in workloads with many reads
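A minimal sketch of the tree-locking discipline such schemes rely on: the first lock may be taken anywhere in the lock tree, every later lock only on a node whose parent is currently held. This toy checker validates a proposed lock sequence; it is not the paper's runtime implementation.

```python
class TreeLockChecker:
    def __init__(self, parent):
        self.parent = parent                   # node -> parent (root -> None)

    def valid_sequence(self, lock_order):
        """Check the first-anywhere, then parent-held locking rule."""
        held = set()
        for pos, node in enumerate(lock_order):
            if pos > 0 and self.parent[node] not in held:
                return False                   # parent must be held to descend
            held.add(node)
        return True

parent = {"root": None, "a": "root", "b": "root", "a1": "a"}
checker = TreeLockChecker(parent)
print(checker.valid_sequence(["root", "a", "a1"]))   # True
print(checker.valid_sequence(["a", "b"]))            # False: b's parent not held
```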
we describe new architecture for byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests this separation yields two fundamental and practically significant advantages over previous architectures first it reduces replication costs because the new architecture can tolerate faults in up to half of the state machine replicas that execute requests previous systems can tolerate faults in at most third of the combined agreement state machine replicas second separating agreement from execution allows general privacy firewall architecture to protect confidentiality through replication in contrast replication in previous systems hurts confidentiality because exploiting the weakest replica can be sufficient to compromise the system we have constructed prototype and evaluated it running both microbenchmarks and an nfs server overall we find that the architecture adds modest latencies to unreplicated systems and that its performance is competitive with existing byzantine fault tolerant systems
in this paper we present new key revocation scheme for ad hoc network environments with the following characteristics distributed our scheme does not require permanently available central authority active our scheme incentivizes rational selfish but honest nodes to revoke malicious nodes robust our scheme is resilient against large numbers of colluding malicious nodes of the network for detection error rate of detection error tolerant revocation decisions fundamentally rely on intrusion detection systems ids our scheme is active for any meaningful ids ids error rate and robust for an ids error rate of up to several schemes in the literature have two of the above four characteristics characteristic four is typically not explored this work is the first to possess all four making our revocation scheme well suited for environments such as ad hoc networks which are very dynamic have significant bandwidth constraints and where many nodes must operate under the continual threat of compromise
mobile ad hoc networks can be leveraged to provide ubiquitous services capable of acquiring processing and sharing real time information from the physical world unlike internet services these services have to survive frequent and unpredictable faults such as disconnections crashes or users turning off their devices this paper describes context aware fault tolerance mechanism for our migratory services model in this model per client service instance transparently migrates to different nodes in the network to provide continuous and semantically correct interaction with its client the proposed fault tolerance mechanism extends the primary backup approach with context aware checkpointing process the backup node is dynamically selected based on its distance from the client and service the similarity of its mobility pattern with those of the client and service the frequency of the checkpointing process and the size of the checkpointing state we demonstrate the feasibility of our approach through prototype implementation tested in small scale ad hoc network of smart phones additionally we simulate our mechanism in realistic urban environment with pedestrians cyclists and cars compared to approaches where the backup node is neighbor of the service node or the client node itself our mechanism performs as much as better than the former for recovery ratio and three times better than the latter for network overhead while achieving better or similar recovery latency
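A hedged sketch of context-aware backup selection: candidates are scored from their distance to the client and the service node, mobility-pattern similarity, and the cost of the checkpointing traffic they would carry; the weights and fields are assumptions for illustration.

```python
def backup_score(cand, w_dist=1.0, w_sim=2.0, w_ckpt=0.5):
    """Higher is better; all weights are illustrative assumptions."""
    dist = cand["dist_to_client"] + cand["dist_to_service"]
    similarity = 0.5 * (cand["mobility_sim_client"] + cand["mobility_sim_service"])
    ckpt_cost = cand["ckpt_freq_hz"] * cand["ckpt_size_kb"]
    return w_sim * similarity - w_dist * dist - w_ckpt * ckpt_cost / 1000.0

candidates = [
    {"id": "n7", "dist_to_client": 2, "dist_to_service": 1,
     "mobility_sim_client": 0.9, "mobility_sim_service": 0.8,
     "ckpt_freq_hz": 0.2, "ckpt_size_kb": 64},
    {"id": "n3", "dist_to_client": 1, "dist_to_service": 4,
     "mobility_sim_client": 0.4, "mobility_sim_service": 0.3,
     "ckpt_freq_hz": 0.2, "ckpt_size_kb": 64},
]
print(max(candidates, key=backup_score)["id"])       # picks "n7"
```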
the flip chip package gives the highest chip density of any packaging method to support the pad limited asic design one of the most important characteristics of flip chip designs is that the input output buffers could be placed anywhere inside chip in this paper we first introduce the floorplanning problem for the flip chip design and formulate it as assigning the positions of input output buffers and first stage last stage blocks so that the path length between blocks and bump balls as well as the delay skew of the paths are simultaneously minimized we then present hierarchical method to solve the problem we first cluster block and its corresponding buffers to reduce the problem size then we go into iterations of the alternating and interacting global optimization step and the partitioning step the global optimization step places blocks based on simulated annealing using the tree representation to minimize given cost function the partitioning step dissects the chip into two subregions and the blocks are divided into two groups and are placed in respective subregions the two steps repeat until each subregion contains at most given number of blocks defined by the ratio of the total block area to the chip area at last we refine the floorplan by perturbing blocks inside subregion as well as in different subregions compared with the tree based floorplanner alone our method is more efficient and obtains significantly better results with an average cost of only of that obtained by using the tree alone based on set of real industrial flip chip designs provided by leading companies
users who are unfamiliar with database query languages can search xml data sets using keyword queries current approaches for supporting such queries are either for text centric xml where the structure is very simple and long text fields predominate or data centric where the structure is very rich however long text fields are becoming more common in data centric xml and existing approaches deliver relatively poor precision recall and ranking for such data sets in this paper we introduce an xml keyword search method that provides high precision recall and ranking quality for data centric xml even when long text fields are present our approach is based on new group of structural relationships called normalized term presence correlation ntpc in one time setup phase we compute the ntpcs for representative db instance then use this information to rank candidate answers for all subsequent queries based on each answer’s structure our experiments with user supplied queries over two real world xml data sets show that ntpc based ranking is always as effective as the best previously available xml keyword search method for data centric data sets and provides better precision recall and ranking than previous approaches when long text fields are present as the straightforward approach for computing ntpcs is too slow we also present algorithms to compute ntpcs efficiently
before undertaking new biomedical research identifying concepts that have already been patented is essential traditional keyword based search on patent databases may not be sufficient to retrieve all the relevant information especially for the biomedical domain more sophisticated retrieval techniques are required this paper presents biopatentminer system that facilitates information retrieval from biomedical patents it integrates information from the patents with knowledge from biomedical ontologies to create semantic web besides keyword search and queries linking the properties specified by one or more rdf triples the system can discover semantic associations between the resources the system also determines the importance of the resources to rank the results of search and prevent information overload while determining the semantic associations
we consider the problem of estimating the number of distinct values in column of table for large tables without an index on the column random sampling appears to be the only scalable approach for estimating the number of distinct values we establish powerful negative result stating that no estimator can guarantee small error across all input distributions unless it examines large fraction of the input data in fact any estimator must incur significant error on at least some of natural class of distributions we then provide new estimator which is provably optimal in that its error is guaranteed to essentially match our negative result drawback of this estimator is that while its worst case error is reasonable it does not necessarily give the best possible error bound on any given distribution therefore we develop heuristic estimators that are optimized for class of typical input distributions while these estimators lack strong guarantees on distribution independent worst case error our extensive empirical comparison indicate their effectiveness both on real data sets and on synthetic data sets
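as an illustration of the kind of sample-based estimator discussed above, the sketch below implements the well-known GEE-style formula sqrt(n/r)*f1 + sum over j>=2 of f_j, where f_j counts the values seen exactly j times in a sample of r rows out of n. this is a minimal sketch of one estimator from this line of work, not necessarily the paper's provably optimal estimator or its heuristic variants, and the function name gee_estimate is ours.

import random
from collections import Counter
from math import sqrt

def gee_estimate(sample, n):
    """estimate the number of distinct values in a column of n rows from a
    uniform random sample of its values, using a GEE-style formula:
    sqrt(n/r) * f1 + sum of f_j for j >= 2, where f_j is the number of
    values appearing exactly j times in the sample."""
    r = len(sample)
    freq = Counter(sample)                      # value -> count in the sample
    f = Counter(freq.values())                  # j -> number of values seen j times
    return sqrt(n / r) * f[1] + sum(c for j, c in f.items() if j >= 2)

# toy usage: a column of 100000 rows drawn over 1000 distinct values
column = [random.randint(1, 1000) for _ in range(100_000)]
sample = random.sample(column, 5_000)
print(gee_estimate(sample, len(column)))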
the temporal elements of users information requirements are continually confounding aspect of digital library design no sooner have users needs been identified and supported than they change this paper evaluates the changing information requirements of users through their information journey in two different domains health and academia in depth analysis of findings from interviews focus groups and observations of users have identified three stages to this journey information initiation facilitation or gathering and interpretation the study shows that although digital libraries are supporting aspects of users information facilitation there are still requirements for them to better support users overall information work in context users are poorly supported in the initiation phase as they recognize their information needs especially with regard to resource awareness in this context interactive press alerts are discussed some users especially clinicians and patients also require support in the interpretation of information both satisfying themselves that the information is trustworthy and understanding what it means for particular individual
we present two controlled experiments conducted with master students and practitioners and case study conducted with practitioners to evaluate the use of melis migration environment for legacy information systems for the migration of legacy cobol programs to the web melis has been developed as an eclipse plug in within technology transfer project conducted with small software company the partner company has developed and marketed in the last years several cobol systems that need to be migrated to the web due to the increasing requests of the customers the goal of the technology transfer project was to define systematic migration strategy and the supporting tools to migrate these cobol systems to the web and make the partner company an owner of the developed technology the goal of the controlled experiments and case study was to evaluate the effectiveness of introducing melis in the partner company and compare it with traditional software development environments the results of the overall experimentation show that the use of melis increases the productivity and reduces the gap between novice and expert software engineers
xml is the foundation of the soap protocol and in turn web service communication this self descriptive textual format for structured data is renowned to be verbose this verbosity can cause problems due to communication and processing overhead in resource constrained environments eg small wireless devices in this paper we compare different binary representations of xml documents to this end we propose multifaceted and reusable test suite based on real world scenarios our main result is that only simple xml compression methods are suitable for wide range of scenarios while these simple methods do not match the compression ratios of more specialized ones they are still competitive in most scenarios we also show that there are scenarios that none of the evaluated methods can deal with efficiently
polynomial bottom up and top down tree series transducers over partially ordered semirings are considered and the classes of tree to tree series for short ts and tree to tree series for short ts transformations computed by such transducers are compared the main result is the following let be weakly growing semiring and deterministic homomorphism the class of ts transformations computed by bottom up tree series transducers over is incomparable with respect to set inclusion with the class of ts transformations computed by bottom up tree series transducers over moreover the latter class is incomparable with the class of ts transformations computed by top down tree series transducers over if additionally is additively idempotent then the above statements even hold for every polynomial deterministic homomorphism
this paper proposes simple and fast operator the hidden point removal operator which determines the visible points in point cloud as viewed from given viewpoint visibility is determined without reconstructing surface or estimating normals it is shown that extracting the points that reside on the convex hull of transformed point cloud amounts to determining the visible points this operator is general it can be applied to point clouds at various dimensions on both sparse and dense point clouds and on viewpoints internal as well as external to the cloud it is demonstrated that the operator is useful in visualizing point clouds in view dependent reconstruction and in shadow casting
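a minimal sketch of the operator as described: assuming the transform is the usual spherical flipping about a large sphere centred at the viewpoint, visibility reduces to a single convex-hull computation. the radius_factor parameter and the function name are illustrative choices, not taken from the paper.

import numpy as np
from scipy.spatial import ConvexHull

def hidden_point_removal(points, viewpoint, radius_factor=100.0):
    """approximate the visible subset of a point cloud from a viewpoint:
    spherically flip the points about a large sphere centred at the
    viewpoint, then keep the points whose flipped images lie on the convex
    hull of (flipped points + viewpoint)."""
    p = np.asarray(points, dtype=float) - np.asarray(viewpoint, dtype=float)
    norms = np.linalg.norm(p, axis=1, keepdims=True)
    radius = radius_factor * norms.max()
    flipped = p + 2.0 * (radius - norms) * (p / norms)    # spherical flipping
    hull = ConvexHull(np.vstack([flipped, np.zeros(p.shape[1])]))
    visible = hull.vertices[hull.vertices < len(p)]       # drop the viewpoint vertex
    return visible                                        # indices into `points`

# usage: points on a unit sphere viewed from outside -> roughly the near hemisphere
pts = np.random.randn(2000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(len(hidden_point_removal(pts, viewpoint=[0.0, 0.0, 5.0])))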
nominal terms generalise first order terms by including abstraction and name swapping constructs equivalence can be easily axiomatised using name swappings and freshness relation which makes the nominal approach well adapted to the specification of systems that involve binders nominal matching is matching modulo equivalence and has applications in programming languages rewriting and theorem proving in this paper we describe efficient algorithms to check the validity of equations involving binders and to solve matching problems modulo equivalence using the nominal approach
emerging pervasive computing technologies such as sensor networks and rfid tags can be embedded in our everyday environment to digitally store and elaborate variety of information by having application agents access in dynamic and wireless way such distributed information it is possible to enforce notable degree of context awareness in applications and increase the capabilities of interacting with the physical world in particular biologically inspired field based data structures such as gradients and pheromones are suitable to represent information in variety of pervasive computing applications this paper discusses how both sensor networks and rfid tags can be used to that purpose outlining the respective advantages and drawbacks of these technologies
aggregated journal-journal citation networks based on the journal citation reports of the science citation index journals and the social science citation index journals are made accessible from the perspective of any of these journals vector space model is used for normalization and the results are brought online at as input files for the visualization program pajek the user is thus able to analyze the citation environment in terms of links and graphs furthermore the local impact of journal is defined as its share of the total citations in the specific journal’s citation environments the vertical size of the nodes is varied proportionally to this citation impact the horizontal size of each node can be used to provide the same information after correction for within journal self citations in the citing environment the equivalents of this measure can be considered as citation activity index which maps how the relevant journal environment is perceived by the collective of authors of given journal as policy application the mechanism of interdisciplinary developments among the sciences is elaborated for the case of nanotechnology journals
secure and effective access control is critical to sensitive organizations especially when multiple organizations are working together using diverse systems to alleviate the confusion and challenges of redundancy in such large complex organization in this paper we introduce composite role based access control rbac approach by separating the organizational and system role structures and by providing the mapping between them this allows for the explicit identification and separation of organizational and target system roles role hierarchies role assignments constraints and role activations with an attempt to bridge the gap between the organizational and system role structures the composite rbac approach supports scalable and reusable rbac mechanisms for large complex organizations our research explores the newly created department of homeland security dhs as large complex organization in which the composite rbac can be applied
with the advent of web tagging became popular feature people tag diverse kinds of content eg products at amazon music at lastfm images at flickr etc clicking on tag enables the users to explore related content in this paper we investigate how such tag based queries initialized by the clicking activity can be enhanced with automatically produced contextual information so that the search result better fits to the actual aims of the user we introduce the socialhits algorithm and present an experiment where we compare different algorithms for ranking users tags and resources in contextualized way
optimistic replication algorithms allow data presented to users to be stale non up to date but in controlled way they propagate updates in background and allow any replica to be accessed directly most of the time when the timely propagation of updates to remote distributed replicas is an important issue it is preferable that replica gets the same update twice than it does not receive it at all on the other hand few assumptions on the topology of the network can be made in nomadic environment where connections are likely to change unpredictably an extreme approach would be to blindly push every update to every replica however this would lead to huge waste of bandwidth and of resources in this paper we present novel approach based on timed buffers technique that tends to reduce the overall number of propagated updates while guaranteeing that every update is delivered to every replica and that the propagation is not delayed
this paper presents detailed design and implementation of power efficient garbage collector for java embedded systems the proposed scheme is hybrid between the standard mark sweep compact collector available in sun’s kvm and limited field reference counter there are three benefits resulting from the proposed scheme the proposed scheme reclaims memory more efficiently and this results in less mark sweep garbage collection invocations reduction in garbage collection invocations improves cache locality and reduces the number of main memory accesses and reduction in memory access ultimately results in lower energy consumption since memory access can consume large amount of energy when compared with an instruction execution the proposed scheme has been implemented into sun’s kvm and has been shown to reduce the number of mark sweep garbage collection invocations by up to in some cases and the number of level cache misses by as much as when compared to the default garbage collector we also find that in some applications the proposed scheme can reduce the power consumption by as much as when compared to the default sun’s kvm
we introduce numerical constraints into the context of xml which restrict the number of nodes within subtrees of an xml tree that contain specific value equal subnodes we demonstrate the applicability of numerical constraints by optimising xml queries and predicting the number of xml query answers updates and encryptions in order to effectively unlock the wide range of xml applications decision problems associated with numerical constraints are investigated the implication problem is conp hard for several restricted classes of numerical constraints these sources of intractability direct our attention towards numerical keys that permit the specification of upper bounds keys as introduced by buneman et al are numerical keys with upper bound numerical keys are finitely satisfiable finitely axiomatisable and their implication problem is decidable in quadratic time
in this paper we introduce formal approach for composing software components into distributed system we describe the system as hierarchical composition of some components which can be distributed on wide variety of hardware platforms and executed in parallel we represent each component by mathematical model and specify the abstract communication protocols of the components using interface automata ias to model hierarchical systems besides the basic components model we will present other components called nodes node consists of set of subnodes interacting under the supervision of controller each subnode in turn is node or discrete event component by considering subnode as node we can make hierarchical nodes components the entire system therefore forms the root of the hierarchy controller in turn is set of subcontrollers interface automata that specifies interaction protocol of the components inside node we have also presented an example demonstrating the model by illustrating nodes subnodes controllers and subcontrollers to address the state space explosion problem in system verification we utilize the controller as contract for independent analysis of the components and their interactions therefore node will not be analyzed directly instead we will analyze the controller
we propose several algorithms using the vector space model to classify the news articles posted on the netnews according to the newsgroup categories the baseline method combines the terms of all the articles of each newsgroup in the training set to represent the newsgroups as single vectors after training the incoming news articles are classified based on their similarity to the existing newsgroup categories we propose to use the following techniques to improve the classification performance of the baseline method use routing classification accuracy and the similarity values to refine the training set update the underlying term structures periodically during testing and apply means clustering to partition the newsgroup articles and represent each newsgroup by vectors our test collection consists of the real news articles and the subnewsgroups under the rec newsgroup of netnews in period of months our experimental results demonstrate that the technique of refining the training set reduces from one third to two thirds of the storage the technique of periodical updates improves the routing accuracy ranging from to but incurs runtime overhead finally representing each newsgroup by vectors with or using clustering yields the most significant improvement in routing accuracy ranging from to while causing only slightly higher storage requirements
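the baseline method above is easy to make concrete: combine the term vectors of each newsgroup's training articles into a single vector and route an incoming article to the most cosine-similar newsgroup. the sketch below assumes plain term counts and omits the refinements the abstract evaluates (training-set refinement, periodic updates, clustering).

import numpy as np
from collections import defaultdict

def term_vector(text, vocab):
    v = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            v[vocab[tok]] += 1.0
    return v

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a.dot(b) / denom if denom else 0.0

def train(training_set):
    """training_set: list of (newsgroup, article_text) pairs; the baseline
    combines all training articles of a newsgroup into one vector."""
    vocab = {t: i for i, t in enumerate(sorted({tok for _, txt in training_set
                                                for tok in txt.lower().split()}))}
    centroids = defaultdict(lambda: np.zeros(len(vocab)))
    for group, txt in training_set:
        centroids[group] += term_vector(txt, vocab)
    return vocab, dict(centroids)

def route(article, vocab, centroids):
    v = term_vector(article, vocab)
    return max(centroids, key=lambda g: cosine(v, centroids[g]))

vocab, centroids = train([("rec.sport", "game score team win"),
                          ("rec.autos", "engine car wheel oil")])
print(route("the team played a great game", vocab, centroids))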
this article presents method for real time line drawing of deforming objects object space line drawing algorithms for many types of curves including suggestive contours highlights ridges and valleys rely on surface curvature and curvature derivatives unfortunately these curvatures and their derivatives cannot be computed in real time for animated deforming objects in preprocessing step our method learns the mapping from low dimensional set of animation parameters eg joint angles to surface curvatures for deforming mesh the learned model can then accurately and efficiently predict curvatures and their derivatives enabling real time object space rendering of suggestive contours and other such curves this represents an order of magnitude speedup over the fastest existing algorithm capable of estimating curvatures and their derivatives accurately enough for many different types of line drawings the learned model can generalize to novel animation sequences and is also very compact typically requiring few megabytes of storage at runtime we demonstrate our method for various types of animated objects including skeleton based characters cloth simulation and blend shape facial animation using variety of nonphotorealistic rendering styles an important component of our system is the use of dimensionality reduction for differential mesh data we show that independent component analysis ica yields localized basis functions and gives superior generalization performance to that of principal component analysis pca
there is an increasing interest in techniques that support measurement and analysis of fielded software systems one of the main goals of these techniques is to better understand how software actually behaves in the field in particular many of these techniques require way to distinguish in the field failing from passing executions so far researchers and practitioners have only partially addressed this problem they have simply assumed that program failure status is either obvious ie the program crashes or provided by an external source eg the users in this paper we propose technique for automatically classifying execution data collected in the field as coming from either passing or failing program runs failing program runs are executions that terminate with failure such as wrong outcome we use statistical learning algorithms to build the classification models our approach builds the models by analyzing executions performed in controlled environment eg test cases run in house and then uses the models to predict whether execution data produced by fielded instance were generated by passing or failing program execution we also present results from an initial feasibility study based on multiple versions of software subject in which we investigate several issues vital to the applicability of the technique finally we present some lessons learned regarding the interplay between the reliability of classification models and the amount and type of data collected
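a rough sketch of the classification step, assuming each run's execution data is summarised as a fixed-length profile vector and using an off-the-shelf decision tree; the feature layout and numbers are purely illustrative, and the paper's actual learning algorithms and instrumentation may differ.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# each row summarises one execution (e.g. counts of profiled events);
# labels come from in-house runs where the pass/fail outcome is known
in_house_profiles = np.array([[120, 3, 0, 7],
                              [115, 2, 0, 6],
                              [ 40, 9, 5, 1],
                              [ 35, 8, 6, 0]])
in_house_labels = np.array(["pass", "pass", "fail", "fail"])

model = DecisionTreeClassifier().fit(in_house_profiles, in_house_labels)

# execution data collected in the field, outcome unknown
fielded_profiles = np.array([[118, 3, 0, 7],
                             [ 38, 9, 5, 1]])
print(model.predict(fielded_profiles))    # predicted pass/fail per fielded run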
the exponential growth of data demands scalable infrastructures capable of indexing and searching rich content such as text music and images promising direction is to combine information retrieval with peer to peer technology for scalability fault tolerance and low administration cost one pioneering work along this direction is psearch psearch places documents onto peer to peer overlay network according to semantic vectors produced using latent semantic indexing lsi the search cost for query is reduced since documents related to the query are likely to be co located on small number of nodes unfortunately because of its reliance on lsi psearch also inherits the limitations of lsi when the corpus is large and heterogeneous lsi’s retrieval quality is inferior to methods such as okapi the singular value decomposition svd used in lsi is unscalable in terms of both memory consumption and computation time this paper addresses the above limitations of lsi and makes the following contributions to reduce the cost of svd we reduce the size of its input matrix through document clustering and term selection our method retains the retrieval quality of lsi but is several orders of magnitude more efficient through extensive experimentation we found that proper normalization of semantic vectors for terms and documents improves recall by to further improve retrieval quality we use low dimensional subvectors of semantic vectors to cluster documents in the overlay and then use okapi to guide the search and document selection
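to make the semantic-vector step concrete, the sketch below computes k-dimensional lsi document vectors from a truncated svd of the term-document matrix and normalises them to unit length, the kind of normalisation the abstract reports as beneficial; it omits the clustering and term-selection steps that shrink the svd input.

import numpy as np

def lsi_semantic_vectors(term_doc, k):
    """term_doc: terms x documents matrix. returns k-dimensional semantic
    vectors for the documents (rows of Vt scaled by the singular values),
    normalised to unit length."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T          # one row per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return doc_vecs / np.where(norms == 0, 1.0, norms)

A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.]])
vecs = lsi_semantic_vectors(A, k=2)
print(vecs @ vecs.T)        # cosine similarities between the documents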
in this paper we propose novel object tracking algorithm for video sequences based on active contours the tracking is based on matching the object appearance model between successive frames of the sequence using active contours we formulate the tracking as minimization of an objective function incorporating region boundary and shape information further in order to handle variation in object appearance due to self shadowing changing illumination conditions and camera geometry we propose an adaptive mixture model for the object representation the implementation of the method is based on the level set method we validate our approach on tracking examples using real video sequences with comparison to two recent state of the art methods
the database query optimizer requires the estimation of the query selectivity to find the most efficient access plan for queries referencing multiple attributes from the same relation we need multi dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes additionally for multimedia databases there are intrinsic requirements for the multi dimensional selectivity estimation because feature vectors are stored in multi dimensional indexing trees in the dimensional case histogram is practically the most preferable in the multi dimensional case however histogram is not adequate because of high storage overhead and high error rates in this paper we propose novel approach for the multi dimensional selectivity estimation compressed information from large number of small sized histogram buckets is maintained using the discrete cosine transform this enables low error rates and low storage overheads even in high dimensions in addition this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information extensive experimental results show advantages of the proposed approach
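a minimal sketch of the compressed-histogram idea, assuming a dense multi-dimensional bucket histogram whose low-frequency dct coefficients are retained and later inverted to answer range selectivities; the bucket counts, the shape of the retained coefficient block and the function names are illustrative.

import numpy as np
from scipy.fft import dctn, idctn

def compress_histogram(hist, keep):
    """retain only a `keep`-shaped corner of low-frequency dct coefficients."""
    coeffs = dctn(hist, norm="ortho")
    small = coeffs[tuple(slice(0, k) for k in keep)].copy()
    return small, hist.shape

def estimate_selectivity(small, shape, ranges):
    """reconstruct an approximate histogram from the retained coefficients
    and sum the buckets covered by the query ranges (one (lo, hi) per dim)."""
    coeffs = np.zeros(shape)
    coeffs[tuple(slice(0, k) for k in small.shape)] = small
    approx = idctn(coeffs, norm="ortho")
    region = approx[tuple(slice(lo, hi) for lo, hi in ranges)]
    return max(region.sum(), 0.0) / max(approx.sum(), 1e-12)

# toy 2-d joint distribution over 16x16 buckets
rng = np.random.default_rng(0)
data = rng.multivariate_normal([8, 8], [[4, 3], [3, 4]], size=10_000)
hist, _, _ = np.histogram2d(data[:, 0], data[:, 1], bins=16, range=[[0, 16], [0, 16]])

small, shape = compress_histogram(hist, keep=(4, 4))     # 16 of 256 coefficients kept
print(estimate_selectivity(small, shape, [(4, 12), (4, 12)]))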
this paper presents new way of thinking for ir metric optimization it is argued that the optimal ranking problem should be factorized into two distinct yet interrelated stages the relevance prediction stage and ranking decision stage during retrieval the relevance of documents is not known priori and the joint probability of relevance is used to measure the uncertainty of documents relevance in the collection as whole the resulting optimization objective function in the latter stage is thus the expected value of the ir metric with respect to this probability measure of relevance through statistically analyzing the expected values of ir metrics under such uncertainty we discover and explain some interesting properties of ir metrics that have not been known before our analysis and optimization framework do not assume particular relevance retrieval model and metric making it applicable to many existing ir models and metrics the experiments on one of resulting applications have demonstrated its significance in adapting to various ir metrics
query translation for cross lingual information retrieval clir has gained increasing attention in the research area previous work mainly used machine translation systems bilingual dictionaries or web corpora to perform query translation however most of these approaches require either expensive language resources or complex language models and cannot achieve timely translation for new queries in this paper we propose novel solution to automatically acquire query translation pairs from the knowledge hidden in the click through data that are represented by the url user clicks after submitting query to search engine our proposed solution consists of two stages identifying bilingual url pair patterns in the click through data and matching query translation pairs based on user click behavior experimental results on real dataset show that our method not only generates existing query translation pairs with high precision but also generates many timely query translation pairs that could not be obtained by previous methods comparative study between our system and two commercial online translation systems shows the advantage of our proposed method
we present dreadlocks an efficient new shared memory spin lock that actively detects deadlocks instead of spinning on boolean value each thread spins on the lock owner’s per thread digest compact representation of portion of the lock’s waits for graph digests can be implemented either as bit vectors for small numbers of threads or as bloom filters for larger numbers of threads updates to digests are propagated dynamically as locks are acquired and released dreadlocks can be applied to any spin lock algorithm that allows threads to time out experimental results show that dreadlocks outperform timeouts under many circumstances and almost never do worse
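a simplified single-process model of the digest mechanism (bit-vector digests, no real spinning or atomics): a waiting thread repeatedly folds the lock owner's digest into its own, and a thread that finds its own bit in the owner's digest has detected a cycle in the waits-for graph. the class and function names are ours.

class Thread:
    def __init__(self, tid):
        self.tid = tid
        self.digest = 1 << tid        # bit-vector digest: starts as just itself

class Lock:
    def __init__(self):
        self.owner = None

def try_acquire(thread, lock):
    """one pass of the spin loop. returns True when the lock is taken,
    False when the thread is still waiting (a real thread would repeat this
    pass). raises when the owner's digest already contains this thread's
    bit, i.e. the waits-for graph has a cycle."""
    owner = lock.owner
    if owner is None or owner is thread:
        lock.owner = thread
        return True
    if owner.digest & (1 << thread.tid):
        raise RuntimeError(f"deadlock detected by thread {thread.tid}")
    thread.digest |= owner.digest     # fold the owner's waits-for digest into ours
    return False

# classic two-lock cycle: t1 holds a and waits for b, t2 holds b and waits for a
t1, t2, a, b = Thread(1), Thread(2), Lock(), Lock()
try_acquire(t1, a); try_acquire(t2, b)      # t1 holds a, t2 holds b
try_acquire(t1, b)                          # t1 now spins on b and absorbs t2's digest
try:
    try_acquire(t2, a)                      # t2 sees its own bit via t1's digest
except RuntimeError as e:
    print(e)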
information visualization is challenging field enabling better use of humans visual and cognitive system to make sense of very large datasets this paper aims at improving the current information visualizations design workflow by enabling better cooperation among programmers designers and users in one to one and community oriented fashion our contribution is web based interface to create visualization flows that can be edited and shared between actors within communities we detail real case study where programmers designers and users successfully worked together to quickly design and improve an interactive image visualization interface based on images similarities
flow analysis is ubiquitous and much studied component of compiler technology and its variations abound amongst the most well known is shivers cfa however the best known algorithm for cfa requires time cubic in the size of the analyzed program and is unlikely to be improved consequently several analyses have been designed to approximate cfa by trading precision for faster computation henglein’s simple closure analysis for example forfeits the notion of directionality in flows and enjoys an almost linear time algorithm but in making trade offs between precision and complexity what has been given up and what has been gained where do these analyses differ and where do they coincide we identify core language the linear calculus where cfa simple closure analysis and many other known approximations or restrictions to cfa are rendered identical moreover for this core language analysis corresponds with instrumented evaluation because analysis faithfully captures evaluation and because the linear calculus is complete for ptime we derive ptime completeness results for all of these analyses
flexible test architecture for embedded cores and all interconnects in system on chip soc is presented it targets core testing parallelism and reduced test application time by using as much as possible existing core interconnects to form tam paths it also provides for dynamic wrapper reconfiguration algorithms that minimize the use of extra interconnects for the tam path formation are presented and evaluated
web macros automate the interactions of end users with web sites and related information systems though web macro recorders and players have grown in sophistication over the past decade these tools cannot yet meet many tasks that people perform in daily life based on observations of browser users we have compiled ten scenarios describing tasks that users would benefit from automating our analysis of these scenarios yields specific requirements that web macro tools should support if those tools are to be applicable to these real life tasks our set of requirements constitutes benchmark for evaluating tools which we demonstrate by evaluating the robofox coscripter and imacros tools
evaluating the impact of applications with significant operating system interactions requires detailed microarchitectural simulation combined with system level simulation cost effective and practical approach is to combine two widely used simulators simwattch integrates simics system level tool with wattch user level tool to facilitate analysis of wider design space for computer architects and system developers
many scientific applications including environmental monitoring outpatient health care research and wild life tracking require real time stream processing while state of the art techniques for processing window constrained stream queries tend to employ the delta result strategy to react to each and every change of the stream sensor measurements some scientific applications only require to produce results periodically making the complete result strategy better choice in this work we analyze the trade offs between the delta and the complete result query evaluation strategies we then design solution for hopping window query processing based on the above analysis in particular we propose query operators equipped with the ability to accept either delta or complete results as input and to produce either as output unlike prior works these flexible operators can then be integrated within one mode aware query plan taking advantage of both processing methodologies third we design mode assignment algorithm to optimally assign the input and output modes for each operator in the mode aware query plan lastly mode assignment is integrated with cost based plan optimizer the proposed techniques have been implemented within the wpi stream query engine called cape our experimental results demonstrate that our solution routinely outperforms the state of the art single mode solutions for various arrival rate and query plan shapes
in this article we introduce pseudoconstraints novel data mining pattern aimed at identifying rare events in databases at first we formally define pseudoconstraints using probabilistic model and provide statistical test to identify pseudoconstraints in database then we focus on specific class of pseudoconstraints named cycle pseudoconstraints which often occur in databases we define cycle pseudoconstraints in the context of the er model and present an automatic method for detecting cycle pseudoconstraints from relational database finally we present an experiment to show cycle pseudoconstraints at work on real data
without well provisioned dedicated servers modern fast paced action games limit the number of players who can interact simultaneously to this is because interacting players must frequently exchange state updates and high player counts would exceed the bandwidth available to participating machines in this paper we describe donnybrook system that enables epic scale battles without dedicated server resources even in fast paced game with tight latency bounds it achieves this scalability through two novel components first it reduces bandwidth demand by estimating what players are paying attention to thereby enabling it to reduce the frequency of sending less important state updates second it overcomes resource and interest heterogeneity by disseminating updates via multicast system designed for the special requirements of games that they have multiple sources are latency sensitive and have frequent group membership changes we present user study results using prototype implementation based on quake iii that show our approach provides desirable user experience we also present simulation results that demonstrate donnybrook’s efficacy in enabling battles of up to players
the difficulty of user query can affect the performance of information retrieval ir systems this work presents formal model for quantifying and reasoning about query difficulty as follows query difficulty is considered to be subjective belief which is formulated on the basis of various types of evidence this allows us to define belief model and set of operators for combining evidence of query difficulty the belief model uses subjective logic type of probabilistic logic for modeling uncertainties an application of this model with semantic and pragmatic evidence about trec queries illustrates the potential flexibility of this framework in expressing and combining evidence to our knowledge this is the first application of subjective logic to ir
many inductive systems including ilp systems learn from knowledge base that is structured around examples in practical situations this example centered representation can cause lot of redundancy for instance when learning from episodes eg from games the knowledge base contains consecutive states of world each state is usually described completely even though consecutive states may differ only slightly similar redundancies occur when the knowledge base stores examples that share common structures eg when representing complex objects as machines or molecules these two types of redundancies can place heavy burden on memory resources in this paper we propose method for representing knowledge bases in more efficient way this is accomplished by building graph that implicitly defines examples in terms of other structures we evaluate our method in the context of learning go heuristic
image annotation is an important computer vision problem where the goal is to determine the relevance of annotation terms for images image annotation has two main applications proposing list of relevant terms to users that want to assign indexing terms to images and ii supporting keyword based search for images without indexing terms using the relevance estimates to rank images in this paper we present tagprop weighted nearest neighbour model that predicts the term relevance of images by taking weighted sum of the annotations of the visually most similar images in an annotated training set tagprop can use collection of distance measures capturing different aspects of image content such as local shape descriptors and global colour histograms it automatically finds the optimal combination of distances to define the visual neighbours of images that are most useful for annotation prediction tagprop compensates for the varying frequencies of annotation terms using term specific sigmoid to scale the weighted nearest neighbour tag predictions we evaluate different variants of tagprop with experiments on the mir flickr set and compare with an approach that learns separate svm classifier for each annotation term we also consider using flickr tags to train our models both as additional features and as training labels we find the svms to work better when learning from the manual annotations but tagprop to work better when learning from the flickr tags we also find that using the flickr tags as feature can significantly improve the performance of svms learned from manual annotations
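a stripped-down version of the weighted nearest-neighbour prediction step: tag scores are weighted sums of the annotations of the k visually closest training images. here the weights simply decay with neighbour rank, standing in for tagprop's learned combination of distances and its term-specific sigmoids; features and tags below are random placeholders.

import numpy as np

def predict_tags(test_feats, train_feats, train_tags, k=5):
    """for each test image, the score of a tag is a weighted sum of that
    tag's 0/1 annotations over the k nearest training images; weights decay
    with rank as a simplification of learned distance-based weights."""
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, :k]                 # k nearest per test image
    weights = 1.0 / np.arange(1, k + 1)
    weights /= weights.sum()
    return np.stack([weights @ train_tags[idx] for idx in order])

train_feats = np.random.rand(100, 16)                       # e.g. colour histograms
train_tags  = (np.random.rand(100, 8) > 0.8).astype(float)  # 8 possible tags
test_feats  = np.random.rand(3, 16)
print(predict_tags(test_feats, train_feats, train_tags).round(2))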
advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis existing graph analysis techniques are not appropriate for such data especially in cases where streaming or near real time results are required an example that has drawn significant research interest is the cyber security domain where internet communication traces are collected and real time discovery of events behaviors patterns and anomalies is desired we propose metricforensics scalable framework for analysis of volatile graphs metricforensics combines multi level drill down approach collection of user selected graph metrics and collection of analysis techniques at each successive level more sophisticated metrics are computed and the graph is viewed at finer temporal resolutions in this way metricforensics scales to highly volatile graphs by only allocating resources for computationally expensive analysis when an interesting event is discovered at coarser resolution first we test metricforensics on three real world graphs an enterprise ip trace trace of legitimate and malicious network traffic from research institution and the mit reality mining proximity sensor data our largest graph has vertices and edges spanning days the results demonstrate the scalability and capability of metricforensics in analyzing volatile graphs and highlight four novel phenomena in such graphs elbows broken correlations prolonged spikes and lightweight stars
application of hardware parameterized models to distributed systems can result in omission of key bottlenecks such as the full cost of inter and intra node communication in cluster of smps however inclusion of message and middleware characteristics may result in impractical models nonetheless the growing gap between memory and cpu performance combined with the trend toward large scale clustered shared memory platforms implies an increased need to consider the impact of middleware on distributed communication we present software parameterized model of point to point communication for use in performance prediction and evaluation we illustrate the utility of the model in two ways to derive simple useful more accurate model of point to point communication in clusters of smps to predict and analyze point to point and broadcast communication costs in clusters of smps we present our results on an ia based cluster
virtual private servers and application checkpoint and restart are two advanced operating system features which place different but related requirements on the way kernel provided resources are accessed by userspace in linux kernel resources such as process ids and sysv shared messages have traditionally been identified using global tables since these tables have gradually been transformed into per process namespaces in order to support both resource availability on application restart and virtual private server functionality due to inherent differences in the resources themselves the semantics of namespace cloning differ for many of the resources this paper describes the existing and proposed namespaces as well as their uses
static analyses provide the semantic foundation for tools ranging from optimizing compilers to refactoring browsers and advanced debuggers unfortunately developing new analysis specifications and implementations is often difficult and error prone since analysis specifications are generally written in declarative style logic programming presents an attractive model for producing executable specifications of analyses however prior work on using logic programming for program analysis has focused exclusively on solving constraints derived from program texts by an external preprocessor in this paper we present dimple an analysis framework for java bytecodes implemented in the yap prolog system dimple provides both representation of java bytecodes in database of relations and declarative domain specific language for specifying new analyses as queries over this database dimple thus enables researchers to use logic programming for every step of the analysis development process from specification to prototype to implementation we demonstrate that our approach facilitates rapid prototyping of new program analyses and produces executable analysis implementations that are speed competitive with specialized analysis toolkits
we present new approach to deformable image registration suitable for articulated images such as hand drawn cartoon characters and human postures for such type of data state of the art techniques typically yield undesirable results we propose novel geometrically motivated iterative scheme where point movements are decoupled from shape consistency by combining locally optimal block matching with as rigid as possible shape regularization our algorithm allows us to register images undergoing large free form deformations and appearance variations we demonstrate its practical usability in various challenging tasks performed in the cartoon animation production pipeline including unsupervised inbetweening example based shape deformation auto painting editing and motion retargeting
it is well known that fairness assumptions can be crucial for verifying progress reactivity or other liveness properties for interleaving models this also applies to markov decision processes as an operational model for concurrent probabilistic systems and the task to establish tight lower or upper probability bounds for events that are specified by liveness properties in this paper we study general notions of strong and weak fairness constraints for markov decision processes formalized in an action or state based setting we present polynomially time bounded algorithm for the quantitative analysis of an mdp against automata specifications under fair worst or best case scenarios furthermore we discuss the treatment of strong and weak fairness and process fairness constraints in the context of partial order reduction techniques for markov decision processes that have been realized in the model checker liquor and rely on variant of peled’s ample set method
pretenuring long lived and immortal objects into infrequently or never collected regions reduces garbage collection costs significantly however extant approaches either require computationally expensive application specific off line profiling or consider only allocation sites common to all programs ie invoked by the virtual machine rather than application programs in contrast we show how simple program analysis combined with an object lifetime knowledge bank can be exploited to match both runtime system and application program structure with object lifetimes the complexity of the analysis is linear in the size of the program so need not be run ahead of time we obtain performance gains between in gc time all against generational copying collector for several spec jvm programs
the system on chip era has arrived and it arrived quickly modular composition of components through shared interconnect is now becoming the standard rather than the exotic asynchronous interconnect fabrics and globally asynchronous locally synchronous gals design has been shown to be potentially advantageous however the arduous road to developing asynchronous on chip communication and interfaces to clocked cores is still nascent this road of converting to asynchronous networks and potentially the core intellectual property block as well will be rocky asynchronous circuit design has been employed since the however it is doubtful that its present form will be what we will see years hence this treatise is intended to provoke debate as it projects what technologies will look like in the future and discusses among other aspects the role of formal verification education the cad industry and the ever present tradeoff between greed and fear
collaborative filtering cf techniques are important in the business era as vital components of many recommender systems for they facilitate the generation of high quality recommendations by leveraging the similar preferences of community users however there is still major problem preventing cf algorithms from achieving better effectiveness the sparsity of training data lots of ratings in the training matrix are not collected few current cf methods try to do data smoothing before predicting the ratings of an active user in this work we have validated the effectiveness of data smoothing for memory based and hybrid collaborative filtering algorithms our experiments show that all these algorithms achieve higher accuracy after proper smoothing the average mean absolute error improvements of the three cf algorithms item based nearest neighbor and personality diagnosis are and respectively moreover we have compared different smoothing methods to show which works best for each of the algorithms
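one very simple instance of pre-prediction data smoothing, filling each uncollected rating with the item's mean observed rating before any similarity computation; the smoothing methods compared in the paper may be more elaborate, so treat this as a sketch of the general idea only.

import numpy as np

def smooth(ratings):
    """ratings: users x items matrix with np.nan for uncollected ratings.
    fill each missing entry with the item's mean observed rating (global
    mean as a fallback) so that downstream similarity computations in
    memory-based cf see a dense matrix."""
    filled = ratings.copy()
    global_mean = np.nanmean(ratings)
    item_means = np.nanmean(ratings, axis=0)
    item_means = np.where(np.isnan(item_means), global_mean, item_means)
    missing = np.isnan(ratings)
    filled[missing] = np.broadcast_to(item_means, ratings.shape)[missing]
    return filled

R = np.array([[5., np.nan, 3.],
              [4., 2., np.nan],
              [np.nan, 1., 4.]])
print(smooth(R))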
in logic program based updates contradictory information elimination conflict resolution and syntactic representation are three major issues that interfere with each other and significantly influence the update result we observe that existing approaches of logic program based updates in one way or another are problematic to deal with these issues in this article we address all these problems in systematic manner our approach to the logic program based update has the following features prioritized logic programming language is employed for providing formal basis of formalizing logic program based updates so that information conflict and its related problems in updates can be handled properly our approach presents both semantic characterization and syntactic representation for the underlying update procedure and hence is consistent with the nature of updates within the logic program extent declarative semantics and syntactic sensitivity and our approach also provides nontrivial solutions to simplify various update evaluation procedures under certain conditions
web applications are the most widely used class of software today increased diversity of web client platform configurations causes execution of web applications to vary unpredictably creating myriad of challenges for quality assurance during development this paper presents novel technique and an inductive model that leverages empirical data from fielded systems to evaluate web application correctness across multiple client configurations the inductive model is based on html tags and represents how web applications are expected to execute in each client configuration based on the fielded systems observed end users and developers update this model by providing empirical data in the form of positive correctly executing and negative incorrectly executing instances of fielded web applications the results of an empirical study show that the approach is useful and that popular web applications have serious client configuration specific flaws
we present new technique for user interface prototyping called mixed fidelity prototyping mixed fidelity prototyping combines and supports independent refinement of low medium and high fidelity interface elements within single prototype designers are able to investigate alternate more innovative designs and are able to elicit feedback from stakeholders without having to commit too early in the process the approach encourages collaboration among diverse group of stakeholders throughout the design process for example individuals who specialize in specific fidelities such as high fidelity components are able to become involved earlier on in the process we developed conceptual model called the region model and implemented proof of concept system called protomixer we then demonstrated the mixed fidelity approach by using protomixer to design an example application
literature on the topic of code cloning often asserts that duplicating code within software system is bad practice that it causes harm to the system’s design and should be avoided however in our studies we have found significant evidence that cloning is often used in variety of ways as principled engineering tool for example one way to evaluate possible new features for system is to clone the affected subsystems and introduce the new features there in kind of sandbox testbed as features mature and become stable within the experimental subsystems they can be migrated incrementally into the stable code base in this way the risk of introducing instabilities in the stable version is minimized this paper describes several patterns of cloning that we have observed in our case studies and discusses the advantages and disadvantages associated with using them we also examine through case study the frequencies of these clones in two medium sized open source software systems the apache web server and the gnumeric spreadsheet application in this study we found that as many as of the clones could be considered to have positive impact on the maintainability of the software system
we study class of scheduling problems which combines the structural aspects associated with task dependencies with the dynamic aspects associated with ongoing streams of requests that arrive during execution for this class of problems we develop scheduling policy which can guarantee bounded accumulation of backlog for all admissible request streams we show nevertheless that no such policy can guarantee bounded latency for all admissible request patterns unless they admit some laxity
non derivable frequent itemsets are one of several condensed representations of frequent itemsets which store all of the information contained in frequent itemsets using less space thus being more suitable for stream mining this paper considers problem that to the best of our knowledge has not been addressed namely how to mine non derivable frequent itemsets in an incremental fashion we design compact data structure named ndfit to efficiently maintain dynamically selected set of itemsets in ndfit the nodes are divided into four categories to reduce the redundant computational cost based on their properties consequently an optimized algorithm named ndfiods is proposed to generate non derivable frequent itemsets over stream sliding window our experimental results show that this method is effective and more efficient than previous approaches
recently many data types arising from data mining and web search applications can be modeled as bipartite graphs examples include queries and urls in query logs and authors and papers in scientific literature however one of the issues is that previous algorithms only consider the content and link information from one side of the bipartite graph there is lack of constraints to make sure the final relevance of the score propagation on the graph as there are many noisy edges within the bipartite graph in this paper we propose novel and general co hits algorithm to incorporate the bipartite graph with the content information from both sides as well as the constraints of relevance moreover we investigate the algorithm based on two frameworks including the iterative and the regularization frameworks and illustrate the generalized co hits algorithm from different views for the iterative framework it contains hits and personalized pagerank as special cases in the regularization framework we successfully build connection with hits and develop new cost function to consider the direct relationship between two entity sets which leads to significant improvement over the baseline method to illustrate our methodology we apply the co hits algorithm with many different settings to the application of query suggestion by mining the aol query log data experimental results demonstrate that coregu ie model of the regularization framework achieves the best performance with consistent and promising improvements
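a sketch of the iterative framework on a toy query-url graph: each side's scores blend a content-based prior with scores propagated from the other side over normalised edges, controlled by personalisation parameters. the exact update form used here is one plausible instance of such a scheme, not necessarily the paper's formulation.

import numpy as np

def co_hits(W, x0, y0, lam_x=0.8, lam_y=0.8, iters=50):
    """iterative score propagation on a bipartite graph.
    W: |X| x |Y| edge-weight matrix (e.g. queries x urls, click counts).
    x0, y0: content-based prior scores for the two sides."""
    Wx = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # X -> Y transitions
    Wy = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)   # Y -> X transitions
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x = (1 - lam_x) * x0 + lam_x * (Wy * y).sum(axis=1)
        y = (1 - lam_y) * y0 + lam_y * (Wx * x[:, None]).sum(axis=0)
    return x, y

# toy graph: 3 queries, 4 urls, entries are click counts
W = np.array([[3., 1., 0., 0.],
              [0., 2., 2., 0.],
              [0., 0., 1., 4.]])
x0 = np.full(3, 1 / 3); y0 = np.full(4, 1 / 4)
x, y = co_hits(W, x0, y0)
print(x.round(3), y.round(3))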
we present new sketch for summarizing network data the sketch has the following properties which make it useful in communication efficient aggregation in distributed streaming scenarios such as sensor networks the sketch is duplicate insensitive ie re insertions of the same data will not affect the sketch and hence the estimates of aggregates unlike previous duplicate insensitive sketches for sensor data aggregation it is also time decaying so that the weight of data item in the sketch can decrease with time according to user specified decay function the sketch can give provably approximate guarantees for various aggregates of data including the sum median quantiles and frequent elements the size of the sketch and the time taken to update it are both polylogarithmic in the size of the relevant data further multiple sketches computed over distributed data can be combined without losing the accuracy guarantees to our knowledge this is the first sketch that combines all the above properties
as cluster computers are used for wider range of applications we encounter the need to deliver resources at particular times to meet particular deadlines and or at the same time as other resources are provided elsewhere to address such requirements we describe scheduling approach in which users request resource leases where leases can request either as soon as possible best effort or reservation start times we present the design of lease management architecture haizea that implements leases as virtual machines vms leveraging their ability to suspend migrate and resume computations and to provide leased resources with customized application environments we discuss methods to minimize the overhead introduced by having to deploy vm images before the start of lease we also present the results of simulation studies that compare alternative approaches using workloads with various mixes of best effort and advance reservation requests we compare the performance of our vm based approach with that of non vm based schedulers we find that vm based approach can provide better performance measured in terms of both total execution time and average delay incurred by best effort requests than scheduler that does not support task pre emption and only slightly worse performance than scheduler that does support task pre emption we also compare the impact of different vm image popularity distributions and vm image caching strategies on performance these results emphasize the importance of vm image caching for the workloads studied and quantify the sensitivity of scheduling performance to vm image popularity distribution
this study is based on user scenario where augmented reality targets could be found by scanning the environment with mobile device and getting tactile feedback exactly in the direction of the target in order to understand how accurately and quickly the targets can be found we prepared an experiment setup where sensor actuator device consisting of orientation tracking hardware and tactile actuator were used the targets with widths and and various distances between each other were rendered in wide space successively and the task of the test participants was to find them as quickly as possible the experiment consisted of two conditions the first one provided tactile feedback only when pointing was on the target and the second one included also another cue indicating the proximity of the target the average target finding time was seconds the closest targets appeared to be not the easiest to find which was attributed to the adapted scanning velocity causing the closest targets to be missed we also found that our data did not correlate well with fitts model which may have been caused by the non normal data distribution after filtering out of the least representative data items the correlation reached up to overall the performance between conditions did not differ from each other significantly the only significant improvement in the performance offered by the close to target cue occurred in the tasks where the targets were the furthest from each other
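to make the reported correlation with fitts model concrete, the sketch below fits MT = a + b * log2(D/W + 1) to (distance, width, time) observations by least squares and reports the correlation between predicted and observed times; the data points are invented for illustration and do not come from the study.

import numpy as np

def fit_fitts(distances, widths, times):
    """least-squares fit of fitts' law  MT = a + b * log2(D/W + 1)  and the
    correlation between predicted and observed movement times."""
    ids = np.log2(np.asarray(distances) / np.asarray(widths) + 1.0)  # index of difficulty
    A = np.column_stack([np.ones_like(ids), ids])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(times), rcond=None)
    predicted = a + b * ids
    r = np.corrcoef(predicted, times)[0, 1]
    return a, b, r

# illustrative observations: (target distance deg, target width deg, time s)
D = [10, 20, 40, 80, 80]
W = [5, 5, 10, 10, 5]
T = [1.2, 1.6, 1.9, 2.4, 2.9]
print(fit_fitts(D, W, T))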
language libraries extend regular libraries with domain specific notation more precisely language library is combination of domain specific language embedded in the general purpose host language regular library implementing the underlying functionality and an assimilation transformation that maps embedded dsl fragments to host language code while the basic architecture for realizing language libraries is the same for all applications there are many design choices to be made in the design of particular combination of library guest language syntax host language and assimilation in this paper we give an overview of the design space for syntax embeddings and assimilations for the realization of language libraries
bitmap indexes must be compressed to reduce input output costs and minimize cpu usage to accelerate logical operations and or xor over bitmaps we use techniques based on run length encoding rle such as word aligned hybrid wah compression these techniques are sensitive to the order of the rows simple lexicographical sort can divide the index size by and make indexes several times faster we investigate reordering heuristics based on computed attribute value histograms simply permuting the columns of the table based on these histograms can increase the sorting efficiency by
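a minimal sketch of the underlying effect, not the paper's implementation: sorting the table rows lexicographically tends to lengthen the runs in each column bitmap, which run-length based schemes such as wah can then compress into fewer words; the toy table and helper names below are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): lexicographically sorting
# the rows of a table tends to lengthen the runs in each column's bitmaps,
# which run-length-style schemes such as WAH can then compress better.

from itertools import groupby

def run_length_encode(bits):
    """Encode a bit sequence as (bit, run_length) pairs."""
    return [(b, sum(1 for _ in grp)) for b, grp in groupby(bits)]

def column_bitmap(rows, col, value):
    """Bitmap of a single attribute value for one column."""
    return [1 if row[col] == value else 0 for row in rows]

rows = [("red", "ca"), ("blue", "ny"), ("red", "ny"), ("blue", "ca"), ("red", "ca")]

for label, table in (("unsorted", rows), ("lexicographically sorted", sorted(rows))):
    bitmap = column_bitmap(table, 0, "red")
    print(label, bitmap, "->", len(run_length_encode(bitmap)), "runs")
```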
in we have shown how to construct layered recurrent neural network that computes the fixed point of the meaning function tp of given propositional logic program cal which corresponds to the computation of the semantics of in this article we consider the first order case we define notion of approximation for interpretations and prove that there exists layered feed forward neural network that approximates the calculation of tp for given first order acyclic logic program with an injective level mapping arbitrarily well extending the feed forward network by recurrent connections we obtain recurrent neural network whose iteration approximates the fixed point of tp this result is proven by taking advantage of the fact that for acyclic logic programs the function tp is contraction mapping on complete metric space defined by the interpretations of the program mapping this space to the metric space with euclidean distance real valued function fp can be defined which corresponds to tp and is continuous as well as contraction consequently it can be approximated by an appropriately chosen class of feed forward neural networks
sketches are compact bit string representations of objects objects that have the same sketch are stored in the same database bucket by calculating the hamming distance of the sketches an estimation of the similarity of their respective objects can be obtained objects that are close to each other are expected to have sketches with small hamming distance values this estimation helps to schedule the order in which buckets are visited during search time recent research has shown that sketches can effectively approximate and distances in high dimensional settings remaining task is to provide general sketch for arbitrary metric spaces this paper presents novel sketch based on generalized hyperplane partitioning that can be employed on arbitrary metric spaces the core of the sketch is heuristic that tries to generate balanced partitions the indexing method aesa stores all the distances among database objects and this allows it to perform small number of distance computations experimental evaluations show that given good early termination strategy our algorithm performs up to one order of magnitude fewer distance operations than aesa in string spaces comparisons against other methods show greater gains furthermore we experimentally demonstrate that it is possible to reduce the physical size of the sketches by factor of ten with different run length encodings
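a minimal sketch of the general mechanism, with illustrative names rather than the paper's api: objects are mapped to compact bit strings, and the hamming distance between sketches gives a cheap estimate of how close the underlying objects are, which can be used to order the buckets visited during search.

```python
# Minimal sketch of the general idea (names are illustrative, not the paper's API):
# objects are mapped to compact bit strings and the Hamming distance between
# sketches is used as a cheap estimate of how close the underlying objects are,
# e.g. to decide the order in which database buckets are visited during search.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two equal-length sketches stored as ints."""
    return bin(a ^ b).count("1")

def bucket_visit_order(query_sketch, buckets):
    """Sort buckets (keyed by their sketch) by estimated closeness to the query."""
    return sorted(buckets, key=lambda sketch: hamming(query_sketch, sketch))

buckets = [0b10110010, 0b01111000, 0b10110011]
print(bucket_visit_order(0b10110000, buckets))
```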
the use of gateway proxies is one important approach to facilitating adaptation across wireless and mobile environments importantly augmented service entities deployed within the gateway proxy residing on the wired network can be composed and deployed to shield mobile clients from the effects of poor network characteristics the usual approach to the static composition of service entities on the gateway proxy is to have these service entities interact with each other by explicitly invoking procedures on the named interface but such tight coupling of interfaces inhibits the flexible composition and adaptation of the service entities to the dynamic operating characteristics of wireless networks in this paper we present mobile gateway for the active deployment of transport entities or for short mobigate pronounced mobi gate mobigate is mobile middleware framework that supports the robust and flexible composition of transport entities known as streamlets the flow of data traffic is subjected to processing by chain of streamlets each streamlet encapsulates service entity that adapts the flow of traffic across the wireless network to facilitate the dynamic reconfiguration of the streamlets we advocate applying the concept of coordination as the unifying approach to composing these transport service entities importantly mobigate delineates clear separation of interdependent parts from the service specific computational codes of those service entities it does this by using separate coordination language called mobigate coordination language mcl to describe the coordination among streamlet service entities the complete design implementation and evaluation of the mobigate system are presented in this paper initial experimental results validate the flexibility of the coordination approach in promoting separation of concern in the reconfiguration of services while achieving low computation and delay overheads
the problem of mechanically formalizing and proving metatheoretic properties of programming language calculi type systems operational semantics and related formal systems has received considerable attention recently however the dual problem of searching for errors in such formalizations has received comparatively little attention in this paper we consider the problem of bounded model checking for metatheoretic properties of formal systems specified using nominal logic in contrast to the current state of the art for metatheory verification our approach is fully automatic does not require expertise in theorem proving on the part of the user and produces counterexamples in the case that flaw is detected we present two implementations of this technique one based on negation as failure and one based on negation elimination along with experimental results showing that these techniques are fast enough to be used interactively to debug systems as they are developed
we describe method for generating queries for retrieving data from distributed heterogeneous semistructured documents and its implementation in the metadata interface ddxmi distributed document xml metadata interchange the proposed system generates local queries appropriate to local schemas from user query over the global schema the system constructs mappings between global schema and local schemas extracted from local documents if not given path substitution and node identification for resolving the heterogeneity among nodes with the same label that often exist in semistructured data the system uses quilt as its xml query language an experiment is reported over three local semistructured documents thesis reports and journal documents with article global schema the prototype was developed under windows system with java and javacc
processes are increasingly being used to make complex application logic explicit programming using processes has significant advantages but it poses difficult problem from the system point of view in that the interactions between processes cannot be controlled using conventional techniques in terms of recovery the steps of process are different from operations within transaction each one has its own termination semantics and there are dependencies among the different steps regarding concurrency control the flow of control of process is more complex than in flat transaction process may for example partially roll back its execution or may follow one of several alternatives in this article we deal with the problem of atomicity and isolation in the context of processes we propose unified model for concurrency control and recovery for processes and show how this model can be implemented in practice thereby providing complete framework for developing middleware applications using processes
recently there have been number of scheduling success stories in computer applications across wide array of applications the simple heuristic of prioritizing small jobs has been used to reduce user response times with enormous success for instance variants of shortest remaining processing time srpt and preemptive shortest job first psjf have been suggested for use in web servers wireless applications and databases as result of the attention given to size based policies by computer systems researchers there has been resurgence in analytical work studying these policies however the policies studied in theory eg srpt and psjf are idealized versions of the policies implemented by practitioners in particular the intricacies of computer systems force the use of complex hybrid policies in practice though these more complex policies are still built around the heuristic of prioritizing small jobs thus there exists gap between the results provided by theoretical research and the needs of practitioners this gap results from three primary disconnects between the model studied in theory and the needs of system designers first in designing systems the goal is not simply to provide small response times other performance measures are also important thus idealized policies such as srpt and psjf are often tweaked by practitioners to perform well on secondary performance measures eg fairness and slowdown second the overhead involved in distinguishing between an infinite number of different priority classes typically causes system designers to discretize policies such as srpt and psjf so that they use only small number of priority classes third in many cases information about the service demands sizes of jobs is inexact for instance when serving static content web servers have exact knowledge of the sizes of the files being served but have inexact knowledge of network conditions thus the web server only has an estimate of the true service demand
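to make the heuristic concrete, the sketch below shows the idealized srpt choice (pick the job with the smallest remaining size) next to a discretized variant that maps sizes into a small number of priority classes, in the spirit of the hybrid policies described above; the class boundaries and job sizes are made-up illustrations, not a policy from the text.

```python
# Illustrative sketch only: the "prioritize small jobs" heuristic in its idealized
# form (SRPT picks the job with the smallest remaining size) and a discretized
# variant of the kind practitioners often use instead -- jobs are mapped to a
# small number of priority classes rather than infinitely many.

import math

def srpt_pick(jobs):
    """jobs: dict job_id -> remaining size. Idealized SRPT choice."""
    return min(jobs, key=jobs.get)

def discretized_pick(jobs, classes=4, max_size=1 << 20):
    """Pick from the lowest occupied priority class (log-spaced size buckets),
    breaking ties within a class by job id for simplicity."""
    def prio(size):
        return min(classes - 1, int(math.log2(max(size, 1)) * classes / math.log2(max_size)))
    best_class = min(prio(s) for s in jobs.values())
    return min(j for j, s in jobs.items() if prio(s) == best_class)

jobs = {"a": 900_000, "b": 1_200, "c": 35_000}
print(srpt_pick(jobs), discretized_pick(jobs))
```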
we propose knowledge representation approach to the semantic retrieval by content of graphics described in scalable vector graphics svg the novel xml based wc approved standard language for describing two dimensional graphics the approach is based on description logic devised for the semantic indexing and retrieval of complex objects we provide syntax to describe basic shapes complex objects as compositions of basic ones and transformations an extensional semantics which is compositional is introduced for defining retrieval classification and subsumption services algorithms exploiting reasoning services which are sound with respect to the semantics are also described using our logical approach as formal specification we implemented prototype system set of experiments carried out on testbed of svg documents to assess the retrieval capabilities of the system is presented
in this paper we describe pse postmortem symbolic evaluation static analysis algorithm that can be used by programmers to diagnose software failures the algorithm requires minimal information about failure namely its kind eg null dereference and its location in the program’s source code it produces set of execution traces along which the program can be driven to the given failure pse tracks the flow of single value of interest from the point in the program where the failure occurred back to the points in the program where the value may have originated the algorithm combines novel dataflow analysis and memory alias analysis in manner that allows for precise exploration of the program’s behavior in polynomial time we have applied pse to the problem of diagnosing potential null dereference errors in suite of programs including several spec benchmarks and large commercial operating system in most cases the analysis is able to either validate pointer dereference or find precise error traces demonstrating null value for the pointer in less than second
we investigate the complexity of greedy routing in uniform ring based random graphs general model that captures many topologies that have been proposed for peer to peer and social networks in this model the nodes form ring for each node we independently draw the set of distances along the ring from to its long range contacts from fixed distribution the same for all and connect to the corresponding nodes as well as its ring successor we prove that for any distribution in graph with nodes and an expected number of long range contacts per node constructed in this fashion the expected number of steps for greedy routing is logn lalog for some constant this improves an earlier lower bound of logn llog log by aspnes et al and is very close to the upper bound of logn achieved by greedy routing in kleinberg’s one dimensional small world networks particular instance of uniform ring based random graphs
with the increasing demand for location based services and rfids efficient processing of continuous queries over moving object streams becomes important in this paper we propose an efficient in memory processing of continuous queries on the moving object streams we model moving objects using function of time and use it in the prediction of usefulness of objects with respect to the continuous queries to effectively utilize the limited memory we derive several replacement policies to discard objects that are of no potential interest to the queries and design efficient algorithms with light data structures experimental studies are conducted and the results show that our proposed method is both memory and query efficient
peer to peer systems promise inexpensive scalability adaptability and robustness thus they are an attractive platform for file sharing distributed wikis and search engines these applications often store weakly structured data requiring sophisticated search algorithms to simplify the search problem most scalable algorithms introduce structure to the network however churn or violent disruption may break this structure compromising search guarantees this paper proposes simple probabilistic search system bubblestorm built on random multigraphs our primary contribution is flexible and reliable strategy for performing exhaustive search bubblestorm also exploits the heterogeneous bandwidth of peers however we sacrifice some of this bandwidth for high parallelism and low latency the provided search guarantees are tunable with success probability adjustable well into the realm of reliable systems for validation we simulate network with one million low end peers and show bubblestorm handles up to simultaneous peer departure and simultaneous crash
database fragmentation allows reducing irrelevant data accesses by grouping data frequently accessed together in dedicated segments in this paper we address multimedia database fragmentation to take into account the rich characteristics of multimedia objects we particularly discuss multimedia primary horizontal fragmentation and focus on semantic based textual predicates implication required as pre process in current fragmentation algorithms in order to partition multimedia data efficiently identifying semantic implication between similar queries if user searches for the images containing car he would probably mean auto vehicle van or sport car as well will improve the fragmentation process making use of the neighborhood concept in knowledge bases to identify semantic implications constitutes the core of our proposal prototype has been implemented to evaluate the performance of our approach
broadcast mechanism is prevalent in many forms of electronic networks modeling broadcast protocols succinctly and reasoning about how secure these protocols are is gaining importance as society increasingly comes to depend on wide variety of electronic communications in this work we present modified ambient calculus where the nature of communication is broadcast within domains we allow reconfigurable configurations of communication domains access restrictions to domains and the capability of modeling cryptographic communication protocols in broadcast scenarios
fuzzy sequential patterns are discovered by finding intertransaction fuzzy patterns among data items at single level in this paper fuzzy data mining method for finding fuzzy sequential patterns at multiple levels of abstraction is developed actually new method is proposed to mine multiple level fuzzy sequential patterns using fuzzy partition by simple fuzzy grid among data items at concept hierarchy the proposed method is composed of two phases one to find frequent level crossing fuzzy sequences and the other to generate multiple level fuzzy sequential patterns by analyzing the temporal relation between those frequent fuzzy sequences numerical example along with mining process is used to illustrate the usefulness of the proposed method
in information visualization adding and removing data elements can strongly impact the underlying visual space we have developed an inherently incremental technique incboard that maintains coherent disposition of elements from dynamic multidimensional data set on grid as the set changes here we introduce novel layout that uses pairwise similarity from grid neighbors as defined in incboard to reposition elements on the visual space free from constraints imposed by the grid the board continues to be updated and can be displayed alongside the new space as similar items are placed together while dissimilar neighbors are moved apart it supports users in the identification of clusters and subsets of related elements densely populated areas identified in the incspace can be efficiently explored with the corresponding incboard visualization which is not susceptible to occlusion the solution remains inherently incremental and maintains coherent disposition of elements even for fully renewed sets the algorithm considers relative positions for the initial placement of elements and raw dissimilarity to fine tune the visualization it has low computational cost with complexity depending only on the size of the currently viewed subset thus data set of size can be sequentially displayed in time reaching only if the complete set is simultaneously displayed
traditional access control models such as role based access control rbac do not take into account contextual information such as location and time for making access decisions consequently they are inadequate for specifying the access control needs of many complex real world applications such as the dengue decision support dds that we discuss in this paper we need to ensure that such applications are adequately protected using emerging access control models this requires us to represent the application and its access control requirements in formal specification language we choose the unified modeling language uml for this purpose since uml is becoming the defacto specification language in the software industry we need to analyze this formal specification to get assurance that the application is adequately protected manual analysis is error prone and tedious thus we need automated tools for verification of uml models towards this end we propose that the uml models be converted to alloy alloy is based on first order logic has software infrastructure that supports automated analysis and has been used for the verification of real world applications we show how to convert the uml models to alloy and verify the resulting model using the alloy analyzer which has embedded sat solvers the results from the alloy analyzer will help uncover the flaws in the specification and help us refine the application and its access control requirements
text is pervasive information type and many applications require querying over text sources in addition to structured data this paper studies the problem of query processing in system that loosely integrates an extensible database system and text retrieval system we focus on class of conjunctive queries that include joins between text and structured data in addition to selections over these two types of data we adapt techniques from distributed query processing and introduce novel class of join methods based on probing that is especially useful for joins with text systems and we present cost model for the various alternative query processing methods experimental results confirm the utility of these methods the space of query plans is extended due to the additional techniques and we describe an optimization algorithm for searching this extended space the techniques we describe in this paper are applicable to other types of external data managers loosely integrated with database system
measuring the similarity between implicit semantic relations is an important task in information retrieval and natural language processing for example consider the situation where you know an entity pair eg google youtube between which particular relation holds eg acquisition and you are interested in retrieving other entity pairs for which the same relation holds eg yahoo inktomi existing keyword based search engines cannot be directly applied in this case because in keyword based search the goal is to retrieve documents that are relevant to the words used in the query not necessarily to the relations implied by pair of words accurate measurement of relational similarity is an important step in numerous natural language processing tasks such as identification of word analogies and classification of noun modifier pairs we propose method that uses web search engines to efficiently compute the relational similarity between two pairs of words our method consists of three components representing the various semantic relations that exist between pair of words using automatically extracted lexical patterns clustering the extracted lexical patterns to identify the different semantic relations implied by them and measuring the similarity between different semantic relations using an inter cluster correlation matrix we propose pattern extraction algorithm to extract large number of lexical patterns that express numerous semantic relations we then present an efficient clustering algorithm to cluster the extracted lexical patterns finally we measure the relational similarity between word pairs using inter cluster correlation we evaluate the proposed method in relation classification task experimental results on dataset covering multiple relation types show statistically significant improvement over the current state of the art relational similarity measures
design automation or computer aided design cad for field programmable gate arrays fpgas has played critical role in the rapid advancement and adoption of fpga technology over the past two decades the purpose of this paper is to meet the demand for an up to date comprehensive survey tutorial for fpga design automation with an emphasis on the recent developments within the past years the paper focuses on the theory and techniques that have been or most likely will be reduced to practice it covers all major steps in fpga design flow which includes routing and placement circuit clustering technology mapping and architecture specific optimization physical synthesis rt level and behavior level synthesis and power optimization we hope that this paper can be used both as guide for beginners who are embarking on research in this relatively young yet exciting area and useful reference for established researchers in this field
given dimensional data set point dominates another point if it is better than or equal to in all dimensions and better than in at least one dimension point is skyline point if there does not exist any point that can dominate it skyline queries which return skyline points are useful in many decision making applications unfortunately as the number of dimensions increases the chance of one point dominating another point is very low as such the number of skyline points becomes too numerous to offer any interesting insights to find more important and meaningful skyline points in high dimensional space we propose new concept called dominant skyline which relaxes the idea of dominance to dominance point is said to dominate another point if there are dimensions in which is better than or equal to and is better in at least one of these dimensions point that is not dominated by any other points is in the dominant skyline we prove various properties of dominant skyline in particular because dominant skyline points are not transitive existing skyline algorithms cannot be adapted for dominant skyline we then present several new algorithms for finding dominant skyline and its variants extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently
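a direct unoptimized sketch of the relaxed dominance test defined above, written with an explicit parameter k for the number of dimensions (the symbol is elided in the text) and assuming smaller values are better; it illustrates the definition only, not the paper's algorithms.

```python
# Minimal sketch of the k-dominance test described above (assuming smaller
# values are "better"); a point is in the k-dominant skyline if no other
# point k-dominates it. Illustration only, not the paper's algorithms.

from itertools import combinations

def k_dominates(p, q, k):
    """True if p is better-or-equal than q on some k dimensions and strictly
    better on at least one of those k dimensions."""
    for dims in combinations(range(len(p)), k):
        if all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims):
            return True
    return False

def k_dominant_skyline(points, k):
    return [p for p in points if not any(k_dominates(q, p, k) for q in points if q != p)]

pts = [(1, 2, 4), (2, 3, 8), (5, 4, 1), (6, 7, 7)]
print(k_dominant_skyline(pts, k=2))
```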
prior work has identified set based comparisons as useful primitive for supporting wide variety of similarity functions in record matching accordingly various techniques have been proposed to improve the performance of set similarity lookups however this body of work focuses almost exclusively on symmetric notions of set similarity in this paper we study the indexing problem for the asymmetric jaccard containment similarity function that is an error tolerant variation of set containment we enhance this similarity function to also account for string transformations that reflect synonyms such as bob and robert referring to the same first name we propose an index structure that builds inverted lists on carefully chosen token sets and lookup algorithm using our index that is sensitive to the output size of the query our experiments over real life data sets show the benefits of our techniques to our knowledge this is the first paper that studies the indexing problem for jaccard containment in the presence of string transformations
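a rough sketch of the similarity function being indexed, not of the index itself: jaccard containment of a query token set in a record token set, with tokens first normalized through a small synonym/transformation table such as bob to robert; the function and table names are illustrative assumptions.

```python
# Rough sketch (not the paper's index or lookup algorithm): Jaccard containment
# of a query token set q in a record token set r, after expanding tokens through
# a small synonym/transformation table such as {"bob": "robert"}.

def normalize(tokens, transformations):
    return {transformations.get(t, t) for t in tokens}

def jaccard_containment(query, record, transformations=None):
    """|q ∩ r| / |q| with optional token transformations applied to both sides."""
    transformations = transformations or {}
    q = normalize(query, transformations)
    r = normalize(record, transformations)
    return len(q & r) / len(q) if q else 0.0

syn = {"bob": "robert"}
print(jaccard_containment({"bob", "smith"}, {"robert", "smith", "jr"}, syn))  # 1.0
```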
in this paper we describe new method of instruction prefetching that reduces the cache miss penalty by anticipating the cache behavior based on previous execution our observations indicate that instruction cache misses often repeat in clusters under certain conditions prevalent in real time embedded networking systems by identifying the start of cluster miss sequence and preparing an instruction buffer for the upcoming cache misses the miss penalty can be reduced if miss does occur sample industrial networking example is used to illustrate the effectiveness of this technique compared with other prefetch methods
the ability to efficiently aggregate information for example compute the average temperature in large networks is crucial for the successful employment of sensor networks this article addresses the problem of designing truly scalable protocols for computing aggregates in the presence of faults protocols that can enable million node sensor networks to work efficiently more precisely we make four distinct contributions first we introduce simple fault model and analyze the behavior of two existing protocols under the fault model tree aggregation and gossip aggregation second since the behavior of the two protocols depends on the size of the network and probability of failure we introduce hybrid approach that can leverage the strengths of the two protocols and minimize the weaknesses the new protocol is analyzed under the same fault model third we propose methodology for determining the optimal mix between the two basic protocols the methodology consists in formulating an optimization problem using models of the protocol behavior and solving it fourth we perform extensive experiments to evaluate the performance of the hybrid protocol and show that it usually performs better sometimes orders of magnitude better than both the tree and gossip aggregation
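for concreteness, the sketch below shows one standard gossip style averaging scheme (push sum); it is given only to make the contrast with tree aggregation tangible and is not necessarily the exact gossip protocol analyzed in the article.

```python
# One standard gossip-style averaging scheme (push-sum), shown only to make the
# contrast with tree aggregation concrete; not necessarily the exact gossip
# protocol analyzed in the article.

import random

def push_sum_average(values, rounds=50, seed=0):
    rng = random.Random(seed)
    n = len(values)
    s = list(values)          # running sums
    w = [1.0] * n             # running weights
    for _ in range(rounds):
        new_s, new_w = [x / 2 for x in s], [x / 2 for x in w]
        for i in range(n):
            j = rng.randrange(n)          # send half of sum and weight to a random node
            new_s[j] += s[i] / 2
            new_w[j] += w[i] / 2
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]  # each node's estimate of the mean

temps = [20.0, 22.5, 19.0, 21.5]
print(push_sum_average(temps))  # all estimates converge towards 20.75
```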
formal verification tools have been extensively used in the past to assess the correctness of protocols processes and systems in general their most common use so far has been in identifying whether livelock or deadlock situations can occur during protocol execution process or system operation in this paper we aim to showcase that an additional equally important and useful application of formal verification tools can be in protocol design and optimization itself this can be achieved by using the tools in rather different context compared to their traditional use that is not only as means to assess the correctness of protocol in terms of lack of livelock and deadlock situations but rather as tools capable of building profiles of protocols associating performance related metrics and identifying operational patterns and possible bottleneck operations in terms of metrics of interest this process can provide protocol designers with an insight about the protocols behavior and guide them towards further protocol design optimizations we illustrate these principles using some existing protocol implementations as case studies
software risk management studies commonly focus on project level risks and strategies software architecture investigations are often concerned with the design implementation and maintenance of the architecture however there has been little effort to study risk management in the context of software architecture we have identified risks and corresponding management strategies specific to software architecture evolution as they occur in industry from interviews with norwegian it professionals the most influential and frequent risk was lack of stakeholder communication affected implementation of new and changed architectural requirements negatively the second most frequent risk was poor clustering of functionality affected performance negatively architects focus mainly on architecture creation however their awareness of needed improvements in architecture evaluation and documentation is increasing most have no formally defined documented architecture evaluation method nor mention it as mitigation strategy instead problems are fixed as they occur eg to obtain the missing artefacts
previous work on superimposed coding has been characterized by two aspects first it is generally assumed that signatures are generated from logical text blocks of the same size that is each block contains the same number of unique terms after stopword and duplicate removal we call this approach the fixed size block fsb method since each text block has the same size as measured by the number of unique terms contained in it second with only few exceptions most previous work has assumed that each term in the text contributes the same number of ones to the signature ie the weight of the term signatures is fixed the main objective of this paper is to derive an optimal weight assignment that assigns weights to document terms according to their occurrence and query frequencies in order to minimize the false drop probability the optimal scheme can account for both uniform and nonuniform occurrence and query frequencies and the signature generation method is still based on hashing rather than on table lookup furthermore new way of generating signatures the fixed weight block fwb method is introduced fwb controls the weight of every signature to constant whereas in fsb only the expected signature weight is constant we have shown that fwb has lower false drop probability than that of the fsb method but its storage overhead is slightly higher other advantages of fwb are that the optimal weight assignment can be obtained analytically without making unrealistic assumptions and that the formula for computing the term signature weights is simple and efficient
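a small illustrative sketch of fixed weight term signatures: each term sets exactly a fixed number of bits of a signature via hashing, a block signature is the bitwise or of its term signatures, and a query term can only match a block whose signature covers it (with some probability of a false drop); the parameter names and sizes are assumptions, not those of the paper.

```python
# Illustrative sketch of fixed-weight term signatures: each term sets exactly
# `weight` bits of a `sig_bits`-wide signature via hashing, and a block
# signature is the OR of its term signatures. Parameters are assumptions.

import hashlib

def term_signature(term: str, sig_bits: int = 64, weight: int = 3) -> int:
    """Set `weight` distinct bit positions chosen by repeatedly hashing the term."""
    sig, counter = 0, 0
    while bin(sig).count("1") < weight:
        digest = hashlib.sha1(f"{term}:{counter}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % sig_bits)
        counter += 1
    return sig

def block_signature(terms, sig_bits=64, weight=3):
    sig = 0
    for t in terms:
        sig |= term_signature(t, sig_bits, weight)
    return sig

block = block_signature(["database", "signature", "hashing"])
query = term_signature("signature")
print("possible match" if query & block == query else "definitely not present")
```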
this work explores the use of statistical techniques namely stratified sampling and cluster analysis as powerful tools for deriving traffic properties at the flow level our results show that the adequate selection of samples leads to significant improvements allowing further important statistical analysis although stratified sampling is well known technique the way we classify the data prior to sampling is innovative and deserves special attention we evaluate two partitioning clustering methods namely clustering large applications clara and means and validate their outcomes by using them as thresholds for stratified sampling we show that using flow sizes to divide the population we can obtain accurate estimates for both size and flow durations the presented sampling and clustering classification techniques achieve data reduction levels higher than that of existing methods on the order of while maintaining good accuracy for the estimates of the sum mean and variance for both flow duration and sizes
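an illustrative sketch of the overall recipe rather than the paper's code: a tiny one dimensional k means stands in for the clustering step (the text evaluates clara and means) to derive strata from flow sizes, and a stratified sample is then drawn with allocation proportional to stratum size; all names and parameters here are assumptions.

```python
# Sketch of the general approach (illustrative only): cluster flow sizes with a
# tiny 1-D k-means to obtain strata, then draw a stratified sample with
# allocation proportional to stratum size.

import random

def kmeans_1d(values, k=2, iters=20):
    centers = sorted(random.Random(1).sample(values, k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda c: abs(v - centers[c]))].append(v)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def stratified_sample(flows, centers, fraction=0.2, seed=2):
    rng = random.Random(seed)
    strata = {i: [] for i in range(len(centers))}
    for f in flows:
        strata[min(range(len(centers)), key=lambda c: abs(f["size"] - centers[c]))].append(f)
    sample = []
    for group in strata.values():
        sample += rng.sample(group, max(1, int(len(group) * fraction))) if group else []
    return sample

flows = [{"size": random.Random(i).randint(1, 10_000), "duration": i % 30 + 1} for i in range(200)]
centers = kmeans_1d([f["size"] for f in flows], k=2)
print(len(stratified_sample(flows, centers)))
```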
review assignment is common task that many people such as conference organizers journal editors and grant administrators would have to do routinely as computational problem it involves matching set of candidate reviewers with paper or proposal to be reviewed common deficiency of all existing work on solving this problem is that they do not consider the multiple aspects of topics or expertise and all match the entire document to be reviewed with the overall expertise of reviewer as result if document contains multiple subtopics which often happens existing methods would not attempt to assign reviewers to cover all the subtopics instead it is quite possible that all the assigned reviewers would cover the major subtopic quite well but not covering any other subtopic in this paper we study how to model multiple aspects of expertise and assign reviewers so that they together can cover all subtopics in the document well we propose three general strategies for solving this problem and propose new evaluation measures for this task we also create multi aspect review assignment test set using acm sigir publications experiment results on this data set show that the proposed methods are effective for assigning reviewers to cover all topical aspects of document
reactive system can be specified by labelled transition system which indicates static structure along with temporal logic formulas which assert dynamic behaviour but refining the former while preserving the latter can be difficult because labelled transition systems are total characterised up to bisimulation meaning that no new transition structure can appear in refinement ii alternatively refinement criterion not based on bisimulation might generate refined transition system that violates the temporal properties in response larsen and thomson proposed modal transition systems which are partial and defined refinement criterion that preserved formulas in hennessy milner logic we show that modal transition systems are up to saturation condition exactly the mixed transition systems of dams that meet mix condition and we extend such systems to non flat state sets we then solve domain equation over the mixed powerdomain whose solution is bifinite domain that is universal for all saturated modal transition systems and is itself fully abstract when considered as modal transition system we demonstrate that many frameworks of partial systems can be translated into the domain partial kripke structures partial bisimulation structures kripke modal transition systems and pointer shape analysis graphs
real time group editors allow distributed users to edit shared document at the same time over computer network operational transformation ot is well accepted consistency control method in state of the art group editors significant progress has been made in this field but there are still many open issues and research opportunities in particular established theoretic ot frameworks all require that ot algorithms be able to converge along arbitrary transformation paths this property is desirable because group editors that implement such algorithms will not rely on central component for achieving convergence however this has not been achieved in any published work to our knowledge we analyze the root of this problem and propose novel state difference based transformation sdt approach which ensures convergence in the presence of arbitrary transformation paths our approach is based on novel consistency model that is more explicitly formulated than previously established models for proving correctness sdt is the first and the only ot algorithm proved to converge in peer to peer group editors
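a toy sketch of the basic ot idea for two concurrent character insertions, using an inclusion transformation with a site id tie break; real group editors, including the sdt approach above, must also handle deletions, longer histories and arbitrary transformation paths, which is exactly where convergence becomes hard. the names below are illustrative.

```python
# Toy sketch of the basic OT idea for two concurrent character insertions
# (inclusion transformation with a site-id tie-break). Real OT systems also
# handle deletes, longer histories and arbitrary transformation paths.

from dataclasses import dataclass

@dataclass
class Insert:
    pos: int
    ch: str
    site: int

def transform_insert(op: Insert, other: Insert) -> Insert:
    """Adjust `op` so it can be applied after `other` has already executed."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + 1, op.ch, op.site)
    return op

def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.ch + doc[op.pos:]

doc = "abc"
o1, o2 = Insert(1, "X", site=1), Insert(1, "Y", site=2)
# Both replicas converge to the same text despite different execution orders.
print(apply(apply(doc, o1), transform_insert(o2, o1)))
print(apply(apply(doc, o2), transform_insert(o1, o2)))
```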
context bounded analysis is an attractive approach to verification of concurrent programs bounding the number of contexts executed per thread not only reduces the asymptotic complexity but also ensures that the complexity increases gradually from that of checking purely sequential program lal and reps provided method for reducing the context bounded verification of concurrent boolean program to the verification of sequential boolean program thereby allowing sequential reasoning to be employed for verifying concurrent programs in this work we adapt the encoding to work for systems programs written in with the heap and accompanying low level operations such as pointer arithmetic and casts our approach is completely automatic we use verification condition generator and smt solvers instead of boolean model checker in order to avoid manual extraction of boolean programs and false alarms introduced by the abstraction we demonstrate the use of field slicing for improving the scalability and in some cases coverage of our checking we evaluate our tool storm on set of real world windows device drivers and have discovered bug that could not be detected by extensive application of previous tools
crash failure detection is key topic in fault tolerance and it is important to be able to assess the qos of failure detection services most previous work on crash failure detectors has been based on the crash stop or fail free assumption in this paper we study and model crash recovery service which has the ability to recover from the crash state we analyse the qos bounds for such crash recovery failure detection service our results show that the dependability metrics of the monitored service will have an impact on the qos of the failure detection service our results are corroborated by simulation results showing bounds on the qos
the promise of web service computing is to utilize web services as fundamental elements for realizing distributed applications solutions in particular when no available service can satisfy client request parts of available services can be composed and orchestrated in order to satisfy such request in this paper we address the automatic composition when component services have access control authorization constraints and impose further reputation constraints on other component services in particular access authorization control is based on credentials component services may or may not trust credentials issued by other component services and the service behavior is modeled by the possible conversations the service can have with its clients we propose an automatic composition synthesis technique based on reduction to satisfiability in propositional dynamic logic that is sound complete and decidable moreover we will characterize the computational complexity of the problem
in canonical parallel processing the operating system os assigns processing core to single thread from multithreaded server application since different threads from the same application often carry out similar computation albeit at different times we observe extensive code reuse among different processors causing redundancy eg in our server workloads of all instruction blocks are accessed by all processors moreover largely independent fragments of computation compete for the same private resources causing destructive interference together this redundancy and interference lead to poor utilization of private microarchitecture resources such as caches and branch predictors we present computation spreading csp which employs hardware migration to distribute thread’s dissimilar fragments of computation across the multiple processing cores of chip multiprocessor cmp while grouping similar computation fragments from different threads together this paper focuses on specific example of csp for os intensive server applications separating application level user computation from the os calls it makes when performing csp each core becomes temporally specialized to execute certain computation fragments and the same core is repeatedly used for such fragments we examine two specific thread assignment policies for csp and show that these policies across four server workloads are able to reduce instruction misses in private caches by private load misses by and branch mispredictions by
we study the complexity of two person constraint satisfaction games an instance of such game is given by collection of constraints on overlapping sets of variables and the two players alternately make moves assigning values from finite domain to the variables in specified order the first player tries to satisfy all constraints while the other tries to break at least one constraint the goal is to decide whether the first player has winning strategy we show that such games can be conveniently represented by logical form of quantified constraint satisfaction where an instance is given by first order sentence in which quantifiers alternate and the quantifier free part is conjunction of positive atomic formulas the goal is to decide whether the sentence is true while the problem of deciding such game is pspace complete in general by restricting the set of allowed constraint predicates one can obtain infinite classes of constraint satisfaction games of lower complexity we use the quantified constraint satisfaction framework to study how the complexity of deciding such game depends on the parameter set of allowed predicates with every predicate one can associate certain predicate preserving operations called polymorphisms we show that the complexity of our games is determined by the surjective polymorphisms of the constraint predicates we illustrate how this result can be used by identifying the complexity of wide variety of constraint satisfaction games
this paper surveys methods for representing and reasoning with imperfect information it opens with an attempt to classify the different types of imperfection that may pervade data and discussion of the sources of such imperfections the classification is then used as framework for considering work that explicitly concerns the representation of imperfect information and related work on how imperfect information may be used as basis for reasoning the work that is surveyed is drawn from both the field of databases and the field of artificial intelligence both of these areas have long been concerned with the problems caused by imperfect information and this paper stresses the relationships between the approaches developed in each
this paper proposes service discovery protocol for sensor networks that is specifically tailored for human centered pervasive environments and scales well to large sensor networks such as those deployed for medical care in major incidents and hospitals it uses the high level concept of computational activities logical bundles of data and resources to give sensors in activity based sensor networks absns knowledge about their usage even at the network layer absn redesigns classical service discovery protocols to include logical structuring of the network for more applicable discovery scheme noting that in practical settings activity based sensor patches are localized absn designs fully distributed hybrid discovery protocol based on extended zone routing protocol ezrp proactive in neighbourhood zone and reactive outside so that any query among the sensors of one activity is routed through the network with minimum overhead guided by the bounds of that activity compared to ezrp absn lowers the network overhead of the discovery process while keeping discovery latency close to optimal
magpie has been one of the first truly effective approaches to bringing semantics into the web browsing experience the key innovation brought by magpie was the replacement of manual annotation process by an automatically associated ontology based semantic layer over web resources which ensured added value at no cost for the user magpie also differs from older open hypermedia systems its associations between entities in web page and semantic concepts from an ontology enable link typing and subsequent interpretation of the resource the semantic layer in magpie also facilitates locating semantic services and making them available to the user so that they can be manually activated by user or opportunistically triggered when appropriate patterns are encountered during browsing in this paper we track the evolution of magpie as technology for developing open and flexible semantic web applications magpie emerged from our research into user accessible semantic web and we use this viewpoint to assess the role of tools like magpie in making semantic content useful for ordinary users we see such tools as crucial in bootstrapping the semantic web through the automation of the knowledge generation process
today independent publishers are offering digital libraries with fulltext archives in an attempt to provide single user interface to large set of archives the studied article database service offers consolidated interface to geographically distributed set of archives while this approach offers tremendous functional advantage to user the fulltext download delays caused by the network and queuing in servers make the user perceived interactive performance poor this paper studies how effective caching of articles at the client level can be achieved as well as at intermediate points as manifested by gateways that implement the interfaces to the many fulltext archives central research question in this approach is what is the nature of locality in the user access stream to such digital library based on access logs that drive the simulations it is shown that client side caching can result in hit rate even at the gateway level temporal locality is observable but published replacement algorithms are unable to exploit this temporal locality additionally spatial locality can be exploited by considering loading into cache all articles in an issue volume or journal if single article is accessed but our experiments showed that improvement introduced lot of overhead finally it is shown that the reason for this cache behavior is the long time distance between re accesses which makes caching quite unfeasible
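a simple sketch of the kind of trace driven cache simulation described above: replay an access log of article identifiers through an lru cache and report the hit rate; the identifiers and cache size are made up for illustration.

```python
# Sketch of a trace-driven cache simulation: replay an access log of article
# identifiers through an LRU cache and report the hit rate. The identifiers
# and capacity are illustrative, not from the studied service.

from collections import OrderedDict

def lru_hit_rate(accesses, capacity):
    cache, hits = OrderedDict(), 0
    for article in accesses:
        if article in cache:
            hits += 1
            cache.move_to_end(article)        # mark as most recently used
        else:
            cache[article] = True
            if len(cache) > capacity:
                cache.popitem(last=False)     # evict least recently used
    return hits / len(accesses)

log = ["a1", "a2", "a1", "a3", "a4", "a2", "a1", "a5", "a1"]
print(f"hit rate: {lru_hit_rate(log, capacity=3):.2f}")
```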
in modern superscalar processors the complex instruction scheduler could form the critical path of the pipeline stages and limit the clock cycle time in addition complex scheduling logic results in the formation of hot spot on the processor chip consequently the latency and power consumption of the dynamic scheduler are two of the most crucial design issues when developing high performance microprocessor we propose an instruction wakeup scheme that remedies the speed and power issues faced with conventional designs this is achieved by new design that separates ram cells from the match circuits this separated design is such that the advantages of the cam and bit map ram schemes are retained while their respective disadvantages are eliminated specifically the proposed design retains the moderate area advantage of the cam scheme and the low power and low latency advantages of the bit map ram scheme the experimental results show that the proposed design saves power consumption by compared to the traditional cam based design and to the bit map ram design respectively in speed the proposed design reduces an average of in the wakeup latency compared to the conventional cam based design and an average of reduction of the latency of the bit map ram design for an issue superscalar processor the proposed design reduces the power consumption of the conventional wakeup logic by while simultaneously increasing the instruction count per nano second ipns by factor of approximately times with moderate area cost
the purpose of the starburst project is to improve the design of relational database management systems and enhance their performance while building an extensible system to better support nontraditional applications and to serve as testbed for future improvements in database technology the design and implementation of the starburst system to date are considered some key design decisions and how they affect the goal of improved structure and performance are examined how well the goal of extensibility has been met is examined what aspects of the system are extensible how extensions can be done and how easy it is to add extensions some actual extensions to the system including the experiences of the first real customizers are discussed
this paper reports on an evaluation of the usability designer role as applied in two swedish systems development organisations the role was initially defined by us but evolved in these two organisations we conducted interviews with usability designers project managers and user representative our main research question was whether or not the introduction of usability designer has been successful in terms of changes in the systems development process and the impact the role has had on products projects and organisations to some extent the role has met our expectations and intentions for instance in helping the usability designers shift their focus towards design and assume some kind of users advocate role but in other ways the role failed the usability designers in our study are still facing the kind of problems and obstacles that usability professionals have always had to deal with
the widespread diffusion of mobile computing calls for novel services capable of providing results that depend on both the current physical position of users location and the logical set of accessible resources subscribed services preferences and requirements context leaving the burden of location context management to applications complicates service design and development in addition traditional middleware solutions tend to hide location context visibility to the application level and are not suitable for supporting novel adaptive services for mobile computing scenarios the article proposes flexible middleware for the development and deployment of location context aware services for heterogeneous data access in the internet primary design choice is to exploit high level policy framework to simplify the specification of services that the middleware dynamically adapts to the client location context in addition the middleware adopts the mobile agent technology to effectively support autonomous asynchronous and local access to data resources and is particularly suitable for temporarily disconnected clients the article also presents the case study of museum guide assistant service that provides visitors with location context dependent artistic data the case study points out the flexibility and usability of the proposed middleware that permits automatic service reconfiguration with no impact on the implementation of the application logic
multi objective optimization is concerned with problems involving multiple measures of performance which should be optimized simultaneously in this paper we extend and or branch and bound aobb well known search algorithm from mono objective to multi objective optimization the new algorithm mo aobb exploits efficiently the problem structure by traversing an and or search tree and uses static and dynamic mini bucket heuristics to guide the search we show that mo aobb improves dramatically over the traditional or search approach on various benchmarks for multi objective optimization
we simulate fluid flow by locally refined lattice boltzmann method lbm on graphics hardware low resolution lbm simulation running on coarse grid models global flow behavior of the entire domain with low consumption of computational resources for regions of interest where small visual details are desired lbm simulations are performed on fine grids which are separate grids superposed on the coarse one the flow properties on boundaries of the fine grids are determined by the global simulation on the coarse grid thus the locally refined fine grid simulations follow the global fluid behavior and model the desired small scale and turbulent flow motion with their denser numerical discretization fine grid can be initiated and terminated at any time while the global simulation is running it can also move inside the domain with moving object to capture small scale vortices caused by the object besides the performance improvement due to the adaptive simulation the locally refined lbm is suitable for acceleration on contemporary graphics hardware gpu since it involves only local and linear computations therefore our approach achieves fast and adaptive flow simulation for computer games and other interactive applications
an important application of semantic web technology is recognizing human defined concepts in text query transformation is strategy often used in search engines to derive queries that are able to return more useful search results than the original query and most popular search engines provide facilities that let users complete specify or reformulate their queries we study the problem of semantic query suggestion special type of query transformation based on identifying semantic concepts contained in user queries we use feature based approach in conjunction with supervised machine learning augmenting term based features with search history based and concept specific features we apply our method to the task of linking queries from real world query logs the transaction logs of the netherlands institute for sound and vision to the dbpedia knowledge base we evaluate the utility of different machine learning algorithms features and feature types in identifying semantic concepts using manually developed test bed and show significant improvements over an already high baseline the resources developed for this paper ie queries human assessments and extracted features are available for download
the reliability of file systems depends in part on how well they propagate errors we develop static analysis technique edp that analyzes how file systems and storage device drivers propagate error codes running our edp analysis on all file systems and major storage device drivers in linux we find that errors are often incorrectly propagated calls drop an error code without handling it we perform set of analyses to rank the robustness of each subsystem based on the completeness of its error propagation we find that many popular file systems are less robust than other available choices we confirm that write errors are neglected more often than read errors we also find that many violations are not cornercase mistakes but perhaps intentional choices finally we show that inter module calls play part in incorrect error propagation but that chained propagations do not in conclusion error propagation appears complex and hard to perform correctly in modern systems
in the near future it will be possible to continuously record and store the entire audio visual lifetime of person together with all digital information that the person perceives or creates while the storage of this data will be possible soon retrieval and indexing into such large data sets are unsolved challenges since today’s retrieval cues seem insufficient we argue that additional cues obtained from body worn sensors make associative retrieval by humans possible we present three approaches to create such cues each along with an experimental evaluation the user’s physical activity from acceleration sensors his social environment from audio sensors and his interruptibility from multiple sensors
the magic lens concept is focus and context technique which facilitates the visualization of complex and dense data in this paper we propose new type of tangible magic lens in the form of flexible sheet we describe new interaction techniques associated with this tool and demonstrate how it can be applied in different ar applications
most of the prevalent anomaly detection systems use some training data to build models these models are then utilized to capture any deviations resulting from possible intrusions the efficacy of such systems is highly dependent upon training data set free of attacks clean or labeled training data is hard to obtain this paper addresses the very practical issue of refinement of unlabeled data to obtain clean data set which can then train an online anomaly detection system our system called morpheus represents system call sequence using the spatial positions of motifs subsequences within the sequence we also introduce novel representation called sequence space to denote all sequences with respect to reference sequence experiments on well known data sets indicate that our sequence space can be effectively used to purge anomalies from unlabeled sequences although an unsupervised anomaly detection system in itself our technique is used for data purification clean training set thus obtained improves the performance of existing online host based anomaly detection systems by increasing the number of attack detections
we analyze the capacity scaling laws of mobile ad hoc networks comprising heterogeneous nodes and spatial inhomogeneities most of previous work relies on the assumption that nodes are identical and uniformly visit the entire network space experimental data however show that the mobility pattern of individual nodes is usually restricted over the area while the overall node density is often largely inhomogeneous due to the presence of node concentration points in this paper we introduce general class of mobile networks which incorporates both restricted mobility and inhomogeneous node density and describe methodology to compute the asymptotic throughput achievable in these networks by the store carry forward communication paradigm we show how the analysis can be mapped under mild assumptions into maximum concurrent flow mcf problem over an associated generalized random geometric graph grgg moreover we propose an asymptotically optimal scheduling and routing scheme that achieves the maximum network capacity
this study examines how digital products can be designed towards increased levels of experienced engagement an experiment was conducted in which participants were asked to interact with videogame that varied in behavior and appearance aspects during experiential and goal directed tasks behavioral aspects were manipulated by varying the amount of possibilities in the game that also affected the complexity in human action appearance aspects were manipulated by varying the colorfulness detail and asymmetry within the visual design during experiential tasks participants were free to explore the game and during goal directed tasks participants were given goal that had to be completed as efficiently as possible results indicate that experienced engagement is based upon the extent the game provided rich experiences and by the extent the game provided sense of control based on these results recommendations for designing engaging interactions with digital products are discussed
model checking suffers from the state explosion problem due to the exponential increase in the size of finite state model as the number of system components grows directed model checking aims at reducing this problem through heuristic based search strategies the model of the system is built while checking the formula and this construction is guided by some heuristic function in this line we have defined structure based heuristic function operating on processes described in the calculus of communicating systems ccs which accounts for the structure of the formula to be verified expressed in the selective hennessy milner logic we have implemented tool to evaluate the method and verified sample of well known ccs processes with respect to some formulae the results of which are reported and commented
discrete event simulation is very popular technique for the performance evaluation of systems and in widespread use in network simulation tools it is well known however that discrete event simulation suffers from the problem of simultaneous events different execution orders of events with identical timestamps may lead to different simulation results current simulation tools apply tie breaking mechanisms which order simultaneous events for execution while this is an accepted solution legitimate question is why should only single simulation result be selected and other possible results be ignored in this paper we argue that confidence in simulation results may be increased by analyzing the impact of simultaneous events we present branching mechanism which examines different execution orders of simultaneous events and may be used in conjunction with or as an alternative to tie breaking rules we have developed new simulation tool moose which provides branching mechanisms for both sequential and distributed discrete event simulation while moose has originally been developed for network simulation it is fully usable as general simulation tool
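A minimal sketch of the branching idea (not the MOOSE implementation): when several pending events share the earliest timestamp, the simulator forks once per execution order instead of applying a tie-breaking rule. Event handlers here are plain functions over a scalar state and do not schedule further events.

```python
# Hedged sketch of branching on simultaneous events: fork the simulation
# once per execution order of the tied events and report all end states.
from itertools import permutations

def simulate(events, state):
    """events: list of (time, fn) where fn(state) -> new state; returns end states."""
    if not events:
        return [state]
    t_min = min(t for t, _ in events)
    ties = [e for e in events if e[0] == t_min]
    rest = [e for e in events if e[0] != t_min]
    results = []
    for order in permutations(ties):          # one branch per execution order
        s = state
        for _, fn in order:
            s = fn(s)
        results.extend(simulate(rest, s))
    return results

# Two simultaneous events whose order matters:
ends = simulate([(1, lambda s: s + 1), (1, lambda s: s * 2)], state=3)
print(sorted(set(ends)))   # [7, 8] -> both outcomes are reported
```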
sensor networks have been an attractive platform for pervasive computing and communication however they are vulnerable to attacks if deployed in hostile environments the past research of sensor network security has focused on securing information in communication but how to secure information in storage has been overlooked meanwhile distributed data storage and retrieval have become popular for efficient data management in sensor networks which renders the absence of schemes for securing stored information to be more severe problem therefore we propose three evolutionary schemes namely the simple hash based shb scheme the enhanced hash based ehb scheme and the adaptive polynomial based apb scheme to deal with the problem all the schemes have the properties that only authorized entities can access data stored in the sensor network and the schemes are resilient to large number of sensor node compromises the ehb and the apb schemes do not involve any centralized entity except for few initialization or renewal operations and thus support secure distributed data storage and retrieval the apb scheme further provides high scalability and flexibility and hence is the most suitable among the three schemes for real applications the schemes were evaluated through extensive analysis and tossim based simulations
document classification presents difficult challenges due to the sparsity and the high dimensionality of text data and to the complex semantics of the natural language the traditional document representation is word based vector bag of words or bow where each dimension is associated with term of the dictionary containing all the words that appear in the corpus although simple and commonly used this representation has several limitations it is essential to embed semantic information and conceptual patterns in order to enhance the prediction capabilities of classification algorithms in this paper we overcome the shortcomings of the bow approach by embedding background knowledge derived from wikipedia into semantic kernel which is then used to enrich the representation of documents our empirical evaluation with real data sets demonstrates that our approach successfully achieves improved classification accuracy with respect to the bow technique and to other recently developed methods
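One way to realize such a semantic kernel, sketched under the assumption that a term-to-concept matrix is available (random here; in the paper it would be derived from Wikipedia), is to compare documents in the enriched concept space and combine that with the plain BOW kernel:

```python
# Hedged sketch of a semantic kernel: documents are enriched by a term-to-concept
# matrix P (here random; in the paper it would be derived from Wikipedia).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
D = rng.integers(0, 3, size=(6, 10)).astype(float)   # doc-term counts (BOW)
P = rng.random((10, 4))                               # term-concept weights (placeholder)
y = np.array([0, 0, 0, 1, 1, 1])

K_bow = D @ D.T                    # plain bag-of-words kernel
K_sem = (D @ P) @ (D @ P).T        # kernel in the enriched concept space
K = K_bow + K_sem                  # combine lexical and conceptual similarity

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K[:2]))          # predict for the first two training docs
```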
in this paper we propose bio inspired architecture for the visual reconstruction of silhouettes of moving objects based on the behaviour of simple cells complex cells and the long range interactions of these neurons present in the primary visual cortex of the primates this architecture was tested with real sequences of images acquired in natural environments the results combined with our previous results show the flexibility of our proposal since it allows not only to reconstruct the silhouettes of objects in general but also allows to distinguish between different types of objects in motion this distinction is necessary since our future objective is the identification of people by their gait
we apply machine learning to the linear ordering problem in order to learn sentence specific reordering models for machine translation we demonstrate that even when these models are used as mere preprocessing step for german english translation they significantly outperform moses integrated lexicalized reordering model our models are trained on automatically aligned bitext their form is simple but novel they assess based on features of the input sentence how strongly each pair of input word tokens wi wj would like to reverse their relative order combining all these pairwise preferences to find the best global reordering is np hard however we present non trivial algorithm based on chart parsing that at least finds the best reordering within certain exponentially large neighborhood we show how to iterate this reordering process within local search algorithm which we use in training
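A hedged sketch of the neighborhood search: given pairwise scores pref[a][b] rewarding placing token b before token a (for a < b), a CKY-style dynamic program over spans considers, at each split point, keeping or inverting the two sub-spans. This mirrors the chart-based search over an exponentially large neighborhood described above, not the paper's exact algorithm or features.

```python
# Hedged sketch: search the reorderings reachable by recursively inverting
# adjacent spans, scored by pairwise "swap" preferences.
def best_reordering(pref, n):
    # best[(i, k)] = (score, word order) for span [i, k)
    best = {(i, i + 1): (0.0, [i]) for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            k = i + length
            cand = []
            for j in range(i + 1, k):
                ls, lo = best[(i, j)]
                rs, ro = best[(j, k)]
                swap_gain = sum(pref[a][b] for a in range(i, j) for b in range(j, k))
                cand.append((ls + rs, lo + ro))               # keep order
                cand.append((ls + rs + swap_gain, ro + lo))   # invert the two spans
            best[(i, k)] = max(cand, key=lambda c: c[0])
    return best[(0, n)]

# 3 words; strong preference to move word 2 before words 0 and 1:
pref = [[0, 0, 2.0], [0, 0, 2.0], [0, 0, 0]]
print(best_reordering(pref, 3))   # (4.0, [2, 1, 0]): word 2 moved to the front
```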
conceptual relational mappings between conceptual models and relational schemas have been used increasingly to achieve interoperability or overcome impedance mismatch in modern data centric applications however both schemas and conceptual models evolve over time to accommodate new information needs when the conceptual model cm or the schema associated with mapping evolved the mapping needs to be updated to reflect the new semantics in the cm schema in this paper we propose round trip engineering solution which essentially synchronizes models by keeping them consistent for maintaining concep tual relational mappings first we define the consistency of conceptual relational mapping through semantically compatible instances next we carefully analyze the knowledge encoded in the standard database design process and develop round trip algorithms for maintaining the consistency of conceptual relational mappings under evolution finally we conduct set of comprehensive experiments the results show that our solution is efficient and provides significant benefits in comparison to the mapping reconstructing approach
prosody is an important cue for identifying dialog acts in this paper we show that modeling the sequence of acoustic prosodic values as gram features with maximum entropy model for dialog act da tagging can perform better than conventional approaches that use coarse representation of the prosodic contour through summative statistics of the prosodic contour the proposed scheme for exploiting prosody results in an absolute improvement of over the use of most other widely used representations of acoustic correlates of prosody the proposed scheme is discriminative and exploits context in the form of lexical syntactic and prosodic cues from preceding discourse segments such decoding scheme facilitates online da tagging and offers robustness in the decoding process unlike greedy decoding schemes that can potentially propagate errors our approach is different from traditional da systems that use the entire conversation for offline dialog act decoding with the aid of discourse model in contrast we use only static features and approximate the previous dialog act tags in terms of lexical syntactic and prosodic information extracted from previous utterances experiments on the switchboard damsl corpus using only lexical syntactic and prosodic cues from three previous utterances yield da tagging accuracy of compared to the best case scenario with accurate knowledge of previous da tags oracle which results in accuracy
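The feature construction can be sketched as follows: quantize an f0-like contour into coarse symbols, treat n-grams of those symbols as sparse features, and fit a maximum entropy (logistic regression) classifier. The contours, bin edges, and labels below are synthetic placeholders, not the corpus data.

```python
# Hedged sketch: n-gram features over a quantized prosodic contour,
# fed to a maximum entropy (logistic regression) dialog act tagger.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def quantize(contour, bins=(100, 150, 200, 250)):
    # map each f0 value to a coarse symbol such as "q0" .. "q4"
    return " ".join(f"q{np.digitize(v, bins)}" for v in contour)

contours = [[120, 180, 240], [230, 180, 120], [110, 170, 230], [240, 190, 130]]
labels   = ["question", "statement", "question", "statement"]

docs = [quantize(c) for c in contours]
vec = CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+")
X = vec.fit_transform(docs)                      # n-gram features of the contour
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(vec.transform([quantize([115, 175, 235])])))  # ['question']
```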
transactional memory and speculative locking are optimistic concurrency control mechanisms whose goal is to enable highly concurrent execution while reducing the programming effort the same basic idea lies in the heart of both methods optimistically execute critical code segment determine whether there have been data conflicts and roll back in case validation fails transactional memory is widely considered to have advantages over lock based synchronization on shared memory multiprocessors several recent works suggest employment of transactional memory in distributed environment however being derived from traditional shared memory design space these schemes seem to be not optimistic enough for this setting each thread must validate the current transaction before proceeding to the next hence blocking remote requests whose purpose is to detect avoid data conflicts are placed on the critical path and thus delay execution in this paper we investigate whether in light of the above shortcomings speculative locking can be suitable alternative for transactional memory in distributed environment we present novel distributed speculative locking scheme and compare its properties to the existing distributed transactional memory protocols despite the conceptual similarity to transactional memory the distributed implementation of speculative locking manages to overlap communication with computation it allows thread to speculatively acquire multiple locks simultaneously which is analogous to executing one transaction before validating the previous
the arabic language has very rich complex morphology each arabic word is composed of zero or more prefixes one stem and zero or more suffixes consequently the arabic data is sparse compared to other languages such as english and it is necessary to conduct word segmentation before any natural language processing task therefore the word segmentation step is worth deeper study since it is preprocessing step which shall have significant impact on all the steps coming afterward in this article we present an arabic mention detection system that has very competitive results in the recent automatic content extraction ace evaluation campaign we investigate the impact of different segmentation schemes on arabic mention detection systems and we show how these systems may benefit from more than one segmentation scheme we report the performance of several mention detection models using different kinds of possible and known segmentation schemes for arabic text punctuation separation arabic treebank and morphological and character level segmentations we show that the combination of competitive segmentation styles leads to better performance results indicate statistically significant improvement when arabic treebank and morphological segmentations are combined
we explore the design space of two sided interactive touch table designed to receive touch input from both the top and bottom surfaces of the table by combining two registered touch surfaces we are able to offer new dimension of input for co located collaborative groupware this design accomplishes the goal of increasing the relative size of the input area of touch table while maintaining its direct touch input paradigm we describe the interaction properties of this two sided touch table report the results of controlled experiment examining the precision of user touches to the underside of the table and series of application scenarios we developed for use on inverted and two sided tables finally we present list of design recommendations based on our experiences and observations with inverted and two sided tables
data warehousing and on line analytical processing olap are technologies intended to support business intelligence spatial olap integrates spatial data into olap systems spatial olap models reformulate main olap concepts to define spatial dimensions and measures and spatio multidimensional navigation operators spatial olap reduces geographic information to its spatial component without taking into account map generalization relationships into the multidimensional decision process in this paper we present the concept of geographic dimension which extends the classical definition of spatial dimension by introducing map generalization hierarchies as they enhance analysis capabilities of solap models and systems geographic dimension is described by spatial descriptive and or map generalization hierarchies these hierarchies permit to define ad hoc aggregation functions but at the same time raise several modeling problems
this paper examines simultaneous multithreading technique permitting several independent threads to issue instructions to superscalar’s multiple functional units in single cycle we present several models of simultaneous multithreading and compare them with alternative organizations wide superscalar fine grain multithreaded processor and single chip multiple issue multiprocessing architectures our results show that both single threaded superscalar and fine grain multithreaded architectures are limited in their ability to utilize the resources of wide issue processor simultaneous multithreading has the potential to achieve times the throughput of superscalar and double that of fine grain multithreading we evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them we also show that simultaneous multithreading is an attractive alternative to single chip multiprocessors simultaneous multithreaded processors with variety of organizations outperform corresponding conventional multiprocessors with similar execution resources while simultaneous multithreading has excellent potential to increase processor utilization it can add substantial complexity to the design we examine many of these complexities and evaluate alternative organizations in the design space
in this article we present novel training method for localized phrase based prediction model for statistical machine translation smt the model predicts block neighbors to carry out phrase based translation that explicitly handles local phrase reordering we use maximum likelihood criterion to train log linear block bigram model which uses real valued features eg language model score as well as binary features based on the block identities themselves eg block bigram features the model training relies on an efficient enumeration of local block neighbors in parallel training data novel stochastic gradient descent sgd training algorithm is presented that can easily handle millions of features moreover when viewing smt as block generation process it becomes quite similar to sequential natural language annotation problems such as part of speech tagging phrase chunking or shallow parsing our novel approach is successfully tested on standard arabic english translation task using two different phrase reordering models block orientation model and phrase distortion model
calendars and periodicity play fundamental role in many applications recently some commercial databases started to support user defined periodicity in the queries in order to provide human friendly way of handling time see eg timeseries in oracle on the other hand only few relational data models support user defined periodicity in the data mostly using mathematical expressions to represent periodicity in this paper we propose high level symbolic language for representing user defined periodicity which seems to us more human oriented than mathematical ones and we use the domain of gadia’s temporal elements in order to define its properties and its extensional semantics we then propose temporal relational model which supports user defined symbolic periodicity eg to express on the second monday of each month in the validity time of tuples and also copes with frame times eg from to we define the temporal counterpart of the standard operators of the relational algebra and we introduce new temporal operators and functions we also prove that our temporal algebra is consistent extension of the classical atemporal one moreover we define both fully symbolic evaluation method for the operators on the periodicities in the validity times of tuples which is correct but not complete and semisymbolic one which is correct and complete and study their computational complexity
we present feature based technique for morphing objects represented by light fields our technique enables morphing of image based objects whose geometry and surface properties are too difficult to model with traditional vision and graphics techniques light field morphing is not based on reconstruction instead it relies on ray correspondence ie the correspondence between rays of the source and target light fields we address two main issues in light field morphing feature specification and visibility changes for feature specification we develop an intuitive and easy to use user interface ui the key to this ui is feature polygons which are intuitively specified as polygons and are used as control mechanism for ray correspondence in the abstract ray space for handling visibility changes due to object shape changes we introduce ray space warping ray space warping can fill arbitrarily large holes caused by object shape changes these holes are usually too large to be properly handled by traditional image warping our method can deal with non lambertian surfaces including specular surfaces with dense light fields we demonstrate that light field morphing is an effective and easy to use technique that can generate convincing morphing effects
practical knowledge discovery is an iterative process first the experiences gained from one mining run are used to inform the parameter setting and the dataset and attribute selection for subsequent runs second additional data either incremental additions to existing datasets or the inclusion of additional attributes means that the mining process is reinvoked perhaps numerous times reducing the number of iterations improving the accuracy of parameter setting and making the results of the mining run more clearly understandable can thus significantly speed up the discovery process in this paper we discuss our experiences in this area and present system that helps the user to navigate through association rule result sets in way that makes it easier to find useful results from large result set we present several techniques that experience has shown us to be useful the prototype system irsetnav is discussed which has capabilities in redundant rule reduction subjective interestingness evaluation item and itemset pruning related information searching text based itemset and rule visualisation hierarchy based searching and tracking changes between data sets using knowledge base techniques also discussed in the paper but not yet accommodated into irsetnav include input schema selection longitudinal ruleset analysis and graphical visualisation techniques
discovering frequent itemsets is key problem in important data mining applications such as the discovery of association rules strong rules episodes and minimal keys typical algorithms for solving this problem operate in bottom up breadth first search direction the computation starts from frequent itemsets the minimum length frequent itemsets and continues until all maximal length frequent itemsets are found during the execution every frequent itemset is explicitly considered such algorithms perform well when all maximal frequent itemsets are short however performance drastically deteriorates when some of the maximal frequent itemsets are long we present new algorithm which combines both the bottom up and the top down searches the primary search direction is still bottom up but restricted search is also conducted in the top down direction this search is used only for maintaining and updating new data structure the maximum frequent candidate set it is used to prune early candidates that would be normally encountered in the bottom up search very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset therefore the algorithm performs well even when some maximal frequent itemsets are long as its output the algorithm produces the maximum frequent set ie the set containing all maximal frequent itemsets thus specifying immediately all frequent itemsets we evaluate the performance of the algorithm using well known synthetic benchmark databases real life census and stock market databases the improvement in performance can be up to several orders of magnitude compared to the best previous algorithms
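For contrast, a plain level-wise (bottom-up only) baseline and the derivation of the maximum frequent set can be sketched as below; the paper's contribution, the top-down maximum frequent candidate set used to prune long patterns early, is deliberately omitted here.

```python
# Hedged baseline sketch: level-wise frequent itemset mining, then the
# "maximum frequent set" (all maximal frequent itemsets). The paper's combined
# bottom-up / top-down pruning is not reproduced.
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    freq = set()
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c for c, n in counts.items() if n >= minsup}
        freq |= current
        # join step: candidates of size k+1 whose size-k subsets are all frequent
        level = {a | b for a in current for b in current if len(a | b) == len(a) + 1}
        level = [c for c in level
                 if all(frozenset(s) in current for s in combinations(c, len(c) - 1))]
    return freq

def maximal(freq):
    return {f for f in freq if not any(f < g for g in freq)}

tx = [frozenset(t) for t in [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]]
print(sorted(map(sorted, maximal(frequent_itemsets(tx, minsup=2)))))
# [['a', 'b'], ['a', 'c'], ['b', 'c']]
```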
failure detector fd is the fundamental component of fault tolerant computer systems in recent years many research works have been done on the study of qos and implementation of fds for distributed computing environments almost all of these works are based on the heartbeat approach hbfd in this paper we propose general model for implementing fds which separates the processes to be monitored from the underlying running environment we identify the potential problems of hbfd approach and propose an alternative approach to implementing fds called notification based fd ntfd instead of letting the process periodically send heartbeat messages to show it is still alive in ntfd the underlying watchdog mechanism sends failure notification messages only when the failure of monitored process is detected locally compared with hbfd implementation under our model ntfd is more efficient and scalable and can guarantee the strong accuracy property trade off of achieving qos of fd is analyzed and the results show that ntfd has much higher probability to achieve better balance between completeness and accuracy yet provides much lower probability of false report and lower system cost based on the analysis we propose the design of hybrid fd which combines the advantages of hbfd and ntfd
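The heartbeat approach (HBFD) that the proposed notification-based detector is compared against can be sketched as a simple timeout-based monitor; the timeout value and process identifiers below are illustrative.

```python
# Hedged sketch of a heartbeat failure detector: a monitor suspects a process
# once no heartbeat has arrived within a timeout.
import time

class HeartbeatDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid):
        self.last_seen[pid] = time.monotonic()   # message from a monitored process

    def suspected(self, pid):
        last = self.last_seen.get(pid)
        return last is None or time.monotonic() - last > self.timeout

fd = HeartbeatDetector(timeout=0.2)
fd.heartbeat("p1")
print(fd.suspected("p1"))   # False: heartbeat just arrived
time.sleep(0.3)
print(fd.suspected("p1"))   # True: timeout expired, p1 is suspected
```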
accurate modeling of delay power and area of interconnections early in the design phase is crucial for effective system level optimization models presently used in system level optimizations such as network on chip noc synthesis are inaccurate in the presence of deep submicron effects in this paper we propose new highly accurate models for delay and power in buffered interconnects these models are usable by system level designers for existing and future technologies we present general and transferable methodology to construct our models from wide variety of reliable sources liberty lef itf itrs ptm etc the modeling infrastructure and number of characterized technologies are available as open source our models comprehend key interconnect circuit and layout design styles and power efficient buffering technique that overcomes unrealities of previous delay driven buffering techniques we show that our models are significantly more accurate than previous models for global and intermediate buffered interconnects in nm and nm foundry processes essentially matching signoff analyses we also integrate our models in the cosi occ synthesis tool and show that the more accurate modeling significantly affects optimal achievable architectures that are synthesized by the tool the increased accuracy provided by our models enables system level designers to obtain better assessments of the achievable performance power area tradeoffs for communication centric aspects of system design with negligible setup and overhead burdens
in this paper novel cache conscious indexing technique based on space partitioning trees is proposed many researchers investigated efficient cache conscious indexing techniques which improve retrieval performance of in memory database management system recently however most studies considered data partitioning and targeted fast information retrieval existing data partitioning based index structures significantly degrade performance due to the redundant accesses of overlapped spaces specially tree based index structures suffer from the propagation of mbr minimum bounding rectangle information by updating data frequently in this paper we propose an in memory space partitioning index structure for optimal cache utilization the proposed index structure is compared with the existing index structures in terms of update performance insertion performance and cache utilization rate in variety of environments the results demonstrate that the proposed index structure offers better performance than existing index structures
recommender system is useful for digital library to suggest the books that are likely preferred by user most recommender systems using collaborative filtering approaches leverage the explicit user ratings to make personalized recommendations however many users are reluctant to provide explicit ratings so ratings oriented recommender systems do not work well in this paper we present recommender system for cadal digital library namely cares which makes recommendations using ranking oriented collaborative filtering approach based on users access logs avoiding the problem of the lack of user ratings our approach employs mean ap correlation coefficients for computing similarities among users implicit preference models and random walk based algorithm for generating book ranking personalized for the individual experimental results on real access logs from the cadal web site show the effectiveness of our system and the impact of different values of parameters on the recommendation performance
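A sketch of a rank-correlation similarity in the spirit of the AP correlation coefficient mentioned above: for each position beyond the first in one user's ranking, count how many of the items placed above it are also placed above it in the other user's ranking. The exact weighting used by CARES may differ.

```python
# Hedged sketch of an AP-style rank correlation between two users'
# implicit preference rankings (most preferred item first).
def ap_correlation(candidate, reference):
    """candidate, reference: lists of the same items, most preferred first."""
    pos = {item: r for r, item in enumerate(reference)}
    n = len(candidate)
    total = 0.0
    for i in range(1, n):                      # positions 2..n (0-based 1..n-1)
        above = candidate[:i]
        c_i = sum(pos[x] < pos[candidate[i]] for x in above)
        total += c_i / i
    return 2.0 * total / (n - 1) - 1.0

print(ap_correlation(["b1", "b2", "b3"], ["b1", "b2", "b3"]))   # 1.0 (identical)
print(ap_correlation(["b3", "b2", "b1"], ["b1", "b2", "b3"]))   # -1.0 (reversed)
```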
while typical software component has clearly specified static interface in terms of the methods and the input output types they support information about the correct sequencing of method calls the client must invoke is usually undocumented in this paper we propose novel solution for automatically extracting such temporal specifications for java classes given java class and safety property such as the exception should not be raised the corresponding dynamic interface is the most general way of invoking the methods in the class so that the safety property is not violated our synthesis method first constructs symbolic representation of the finite state transition system obtained from the class using predicate abstraction constructing the interface then corresponds to solving partial information two player game on this symbolic graph we present sound approach to solve this computationally hard problem approximately using algorithms for learning finite automata and symbolic model checking for branching time logics we describe an implementation of the proposed techniques in the tool jist java interface synthesis tool and demonstrate that the tool can construct interfaces accurately and efficiently for sample javasdk library classes
we adapt the compact routing scheme by thorup and zwick to optimize it for power law graphs we analyze our adapted routing scheme based on the theory of unweighted random power law graphs with fixed expected degree sequence by aiello chung and lu our result is the first theoretical bound coupled to the parameter of the power law graph model for compact routing scheme in particular we prove that for stretch instead of routing tables with bits as in the general scheme by thorup and zwick expected sizes of nγ log bits are sufficient and that all the routing tables can be constructed at once in expected time log with where is the power law exponent and both bounds also hold with probability at least independent of the routing scheme is labeled scheme requiring stretch handshaking step and using addresses and message headers with log log log bits with probability at least we further demonstrate the effectiveness of our scheme by simulations on real world graphs as well as synthetic power law graphs with the same techniques as for the compact routing scheme we also adapt the approximate distance oracle by thorup and zwick for stretch and obtain new upper bound of expected for space and preprocessing
system call interposition is common approach to restrict the power of applications and to detect code injections it enforces model that describes what system calls and or what sequences thereof are permitted however there exist various issues like concurrency vulnerabilities and incomplete models that restrict the power of system call interposition approaches we present new system switchblade that uses randomized and personalized fine grained system call models to increase the probability of detecting code injections however using fine grain system call model we cannot exclude the possibility that the model is violated during normal program executions to cope with false positives switchblade uses on demand taint analysis to update system call model during runtime
user created media content is being increasingly shared with the communities people belong to the content has role of motivator in social interaction within the communities in fact the content creation and management can be often seen as collective effort where group members participate to create common memories and maintain relationships we studied how four communities interact with content that is collectively created and used ie collective content the aim was to explore communities collaborative interaction activities and the purposes of the content to be able to specify what collective content actually is we report users motivations for creating the collective content and its role in community interaction we determine the factors and characteristics by which collectivity ie the extent to which something is collective of the content can be described the community’s contribution the relevance of the content and the level of sharing based on the results we present new dimension of collectivity for categorizing media content and thus being able to better illustrate the community aspects in content interaction
in service oriented architectures soa composed services provide functionalities with certain non functional properties that depend on the properties of the basic services models that represent dependencies among these properties are necessary to analyze non functional properties of composed services in this paper we focus on the reliability of soa most reliability models for software that is assembled from basic elements eg objects components or services assume that the elements are independent namely they do not take into account the dependencies that may exist between basic elements we relax this assumption here and propose reliability model for soa that embeds the error propagation property we present path based model that generates the possible execution paths within soa from set of scenarios the reliability of the whole system is then obtained as combination of the reliability of all generated paths on the basis of our model we show on an example that the error propagation analysis may be key factor for trustworthy prediction of the reliability of soa such reliability model for soa may support during the system development the allocation of testing effort among services and at run time the selection of functionally equivalent services offered by different providers
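A much-simplified sketch of a path-based reliability computation with error propagation, under the assumption that each service has a reliability r (correct output given correct input) and an error propagation probability ep (an erroneous input is passed on rather than masked); the service names, values, and path probabilities are illustrative, not the paper's model.

```python
# Hedged, simplified sketch of path-based reliability with error propagation:
# track the probability that the data flowing along a scenario path is correct,
# then combine all paths weighted by their execution probability.
def path_reliability(path, services):
    p_correct = 1.0
    for s in path:
        r, ep = services[s]["r"], services[s]["ep"]
        p_correct = p_correct * r + (1.0 - p_correct) * (1.0 - ep)
    return p_correct

services = {
    "auth":  {"r": 0.99, "ep": 0.9},
    "quote": {"r": 0.95, "ep": 0.8},
    "pay":   {"r": 0.97, "ep": 1.0},   # pay never masks an incoming error
}
paths = [(0.7, ["auth", "quote"]), (0.3, ["auth", "quote", "pay"])]

system_r = sum(p * path_reliability(path, services) for p, path in paths)
print(round(system_r, 4))   # 0.934
```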
this paper is concerned with the scaling of the number of relay nodes ie hops individual messages have to transit through in large scale wireless ad hoc network wanet we refer to this hop count as the network latency nl large network latency affects all aspects of data communication in wanet including an increase in delay packet loss and the power needed to process and store messages in nodes lying on the relay path we consider network management and data routing challenges in wanets with scalable network latency eg when nl increases only polylogarithmically in the network size on the physical side reducing network latency imposes significantly higher power and bandwidth demand on nodes which are captured in set of new bounds derived in this paper on the protocol front designing distributed routing protocols that can guarantee the delivery of data packets within scalable number of hops is challenging task to solve this we introduce multiresolution randomized hierarchy mrrh novel power and bandwidth efficient wanet protocol with scalable network latency mrrh uses randomized algorithm for building and maintaining random hierarchical network topology which together with the proposed routing algorithm can guarantee efficient delivery of data packets in the wireless network for network of size mrrh can provide an average latency of only log the power consumption and bandwidth requirements of mrrh are shown to be nearly optimal for the latency it provides therefore mrrh is provably efficient candidate for truly large scale wireless ad hoc networking
there is much current interest in publishing and viewing databases as xml documents the general benefits of this approach follow from the popularity of xml and the tool set available for visualizing and processing information encoded in this universal standard in this paper we explore the additional and unique benefits achieved by this approach on temporal database applications we show that xml with xquery can provide surprisingly effective solutions to the problem of supporting historical queries on past content of database relations and their evolution indeed using xml the histories of database relations can be naturally represented by temporally grouped data models thus we identify mappings from relations to xml that are most conducive to modeling and querying database histories and show that temporal queries that would be difficult to express in sql can be easily expressed in standard xquery this approach is very general insofar as it can be used to store the version history of arbitrary documents and for relational databases it also supports queries on the evolution of their schema then we turn to the problem of supporting efficiently the storage and the querying of relational table histories we present an experimental study of the pros and cons of using native xml databases versus using traditional databases where the xml represented histories are supported as views on the historical tables
mosaicing is connecting two or more images and making new wide area image with no visible seam lines several algorithms have been proposed to construct mosaics from image sequence where the camera motion is more or less complex most of these methods are based either on the interest points matching or on theoretical corner models this paper describes fully automated image mosaicing method based on the regions and the harris points primitives indeed in order to limit the search window of potential homologous points for each point of interest regions segmentation and matching steps are being performed this enables us to improve the reliability and the robustness of the harris points matching process by estimating the camera motion the main originality of the proposed system resides in the preliminary manipulation of regions matching thus making it possible to estimate the rotation the translation and the scale factor between two successive images of the input sequence this estimation allows an initial alignment of the images along with the framing of the interest points search window and therefore reducing considerably the complexity of the interest points matching algorithm then the resolution of minimization problem altogether considering the pairs of matched points permits us to perform the homography in order to improve the mosaic continuity around junctions radiometric corrections are applied the validity of the herewith described method is illustrated by being tested on several sequences of complex and challenging images captured from real world indoor and outdoor scenes these simulations proved the validity of the proposed method against camera motions illumination variations acquisition conditions moving objects and image noise to determine the importance of the regions matching stage in motion estimation as well as for the framing of the search window associated to point of interest we compared the matching points results of this described method with those produced using the zero mean normalized cross correlation score without regions matching we made this comparison in the case of simple motion without the presence of rotation around optical axis and or scale factor in the case of rotation and in the general case of an homothety for justifying the effectiveness of this method we proposed an objective assessment by defining reconstruction error
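The overall pipeline can be sketched with off-the-shelf OpenCV primitives (ORB matching and RANSAC here stand in for the paper's Harris-plus-region matching); the input file names are hypothetical.

```python
# Hedged sketch of a two-frame mosaic: detect keypoints, match them,
# estimate the homography with RANSAC, and warp one image onto the other.
import cv2
import numpy as np

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input frames
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
matches = sorted(matches, key=lambda m: m.distance)[:200]

src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

# Warp frame1 into frame2's coordinate system and paste frame2 on top.
h, w = img2.shape
mosaic = cv2.warpPerspective(img1, H, (2 * w, h))
mosaic[0:h, 0:w] = img2
cv2.imwrite("mosaic.jpg", mosaic)
```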
writing detailed api application programming interface documentation is significant task for developing good class library or framework however existing documentation tools such as javadoc provide only limited support and thus the description written by programmers for api documentation often contains scattering text occasionally it also contains tangling text this paper shows that this problem is due to crosscutting concerns of api documentation then it proposes our new tool named commentweaver which provides several mechanisms for modularly describing api documentation of class libraries or frameworks written in java or aspectj it is an extended javadoc tool and it provides several new tags for controlling how the text manually written by the programmers is scattering and appended to other entries or how it is moved from the original entry to another entry to be tangling finally this paper evaluates commentweaver by using three class libraries and frameworks javassist the java standard library and eclipse it showed that commentweaver resolves the problems of scattering or tangling text and it adequately reduces the amount of description written by programmers for api documentation
we present region based image retrieval framework that integrates efficient region based representation in terms of storage and retrieval and effective on line learning capability the framework consists of methods for image segmentation and grouping indexing using modified inverted file relevance feedback and continuous learning by exploiting vector quantization method compact region based image representation is achieved based on this representation an indexing scheme similar to the inverted file technology is proposed in addition it supports relevance feedback based on the vector model with weighting scheme continuous learning strategy is also proposed to enable the system to self improve experimental results on database of general purposed images demonstrate the efficiency and effectiveness of the proposed framework
local data structure invariants are asserted over bounded fragment of data structure around distinguished node of the data structure an example of such an invariant for sorted doubly linked list is for all nodes m of the list if m != null and m.next != null then m.next.prev == m and m.value <= m.next.value it has been shown that such local invariants are both natural and sufficient for describing large class of data structures this paper explores novel technique called krystal to infer likely local data structure invariants using variant of symbolic execution called universal symbolic execution universal symbolic execution is like traditional symbolic execution except the fact that we create fresh symbolic variable for every read of lvalue that has no mapping in the symbolic state rather than creating symbolic variable only for inputs this helps universal symbolic execution to symbolically track data flow for all memory locations along an execution even if input values do not flow directly into those memory locations we have implemented our algorithm and applied it to several data structure implementations in java our experimental results show that we can infer many interesting local invariants for these data structures
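Checking the quoted local invariant on a concrete structure is straightforward; the sketch below only verifies it, whereas KRYSTAL's contribution is inferring such invariants automatically via universal symbolic execution.

```python
# Hedged sketch: verify the local sorted-doubly-linked-list invariant at every node.
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def holds_local_invariant(m):
    # If m and m.next exist, then m.next.prev == m and the values are sorted.
    if m is None or m.next is None:
        return True
    return m.next.prev is m and m.value <= m.next.value

def check_list(head):
    m = head
    while m is not None:
        if not holds_local_invariant(m):
            return False
        m = m.next
    return True

a, b = Node(1), Node(2)
a.next, b.prev = b, a
print(check_list(a))    # True: sorted and back-links consistent
b.value = 0
print(check_list(a))    # False: 1 <= 0 violates the ordering part
```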
this paper presents the results of study in which artists made line drawings intended to convey specific shapes the study was designed so that drawings could be registered with rendered images of models supporting an analysis of how well the locations of the artists lines correlate with other artists with current computer graphics line definitions and with the underlying differential properties of the surface lines drawn by artists in this study largely overlapped one another are within mm of another line particularly along the occluding contours of the object most lines that do not overlap contours overlap large gradients of the image intensity and correlate strongly with predictions made by recent line drawing algorithms in computer graphics were not well described by any of the local properties considered in this study the result of our work is publicly available data set of aligned drawings an analysis of where lines appear in that data set based on local properties of models and algorithms to predict where artists will draw lines for new scenes
information integration systems allow users to express queries over high level conceptual models however such queries must subsequently be evaluated over collections of sources some of which are likely to be expensive to use or subject to periods of unavailability as such it would be useful if information integration systems were able to provide users with estimates of the consequences of omitting certain sources from query execution plans such omissions can affect both the soundness the fraction of returned answers which are correct and the completeness the fraction of correct answers which are returned of the answer set returned by plan many recent information integration systems have used conceptual models expressed in description logics dls this paper presents an approach to estimating the soundness and completeness of queries expressed in the alcqi dl our estimation techniques are based on estimating the cardinalities of query answers we have conducted some statistical evaluation of our techniques the results of which are presented here we also offer some suggestions as to how estimates for cardinalities of subqueries can be used to aid users in improving the soundness and completeness of query plans
we consider the general problem of learning from both pairwise constraints and unlabeled data the pairwise constraints specify whether two objects belong to the same class or not known as the must link constraints and the cannot link constraints we propose to learn mapping that is smooth over the data graph and maps the data onto unit hypersphere where two must link objects are mapped to the same point while two cannot link objects are mapped to be orthogonal we show that such mapping can be achieved by formulating semidefinite programming problem which is convex and can be solved globally our approach can effectively propagate pairwise constraints to the whole data set it can be directly applied to multi class classification and can handle data labels pairwise constraints or mixture of them in unified framework promising experimental results are presented for classification tasks on variety of synthetic and real data sets
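The constrained-embedding idea can be written as a small semidefinite program, sketched here with cvxpy as an assumed solver interface: learn a PSD kernel with unit diagonal (points on a hypersphere), entries fixed to 1 for must-link pairs and 0 for cannot-link pairs, and smoothness encouraged by minimizing trace(LK) for a graph Laplacian L. The graph, constraints, and objective weights are toy placeholders.

```python
# Hedged SDP sketch of constrained embedding on the unit hypersphere.
import numpy as np
import cvxpy as cp

n = 5
# Toy graph Laplacian over 5 points (a chain graph).
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

must_link = [(0, 1)]
cannot_link = [(0, 4)]

K = cp.Variable((n, n), PSD=True)
constraints = [cp.diag(K) == 1]                          # unit hypersphere
constraints += [K[i, j] == 1 for i, j in must_link]      # mapped to the same point
constraints += [K[i, j] == 0 for i, j in cannot_link]    # mapped to orthogonal points
prob = cp.Problem(cp.Minimize(cp.trace(L @ K)), constraints)
prob.solve()
print(np.round(K.value, 2))
```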
recently we proposed an optimization strategy for spatial and non spatial mixed queries in the strategy the filter step and the refinement step of spatial operator are regarded as individual algebraic operators and are early separated at the algebraic level by the query optimizer by doing so the optimizer using the strategy could generate more diverse and efficient plans than the traditional optimizer we called this optimization strategy the early separated filter and refinement esfar in this paper we improved the cost model of the esfar optimizer considering the real life environment such as the lru buffer the clustering of the dataset and the selectivity of the real data distribution and we conducted new experiment for esfar by comparing the optimization result generated by the new cost model and the actual execution result using real data the experimental result showed that our cost model is accurate and our esfar optimizer estimates the costs of execution plans well since the esfar strategy has more operators and more rules than the traditional one it consumes more optimization time in this paper we apply two existing heuristic algorithms the iterative improvement ii and the simulated annealing sa to the esfar optimizer additionally we propose new heuristic algorithm to find good initial state of ii and sa through experiments we show that the ii and sa algorithms in the esfar strategy find good sub optimal plan in reasonable time mostly the heuristic algorithms find lower cost plan in less time than the optimal plan generated by the traditional optimizer especially the ii algorithm with the initial state heuristic rapidly finds plan of high quality
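A generic sketch of the simulated annealing search over plans; the cost function and the adjacent-swap neighborhood below are toy stand-ins for the ESFAR cost model and plan transformation rules.

```python
# Hedged sketch of simulated annealing over query plans; the plan encoding,
# cost model, and neighborhood are illustrative stand-ins.
import math
import random

def simulated_annealing(initial_plan, cost, neighbors, t0=10.0, cooling=0.95, steps=200):
    current, best = initial_plan, initial_plan
    t = t0
    for _ in range(steps):
        candidate = random.choice(neighbors(current))
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t *= cooling                          # cool down the temperature
    return best

# Toy stand-in: a "plan" is an operator order (a permutation), cost prefers sorted order.
def cost(plan):
    return sum(abs(p - i) for i, p in enumerate(plan))

def neighbors(plan):
    out = []
    for i in range(len(plan) - 1):            # swap two adjacent operators
        q = list(plan)
        q[i], q[i + 1] = q[i + 1], q[i]
        out.append(tuple(q))
    return out

random.seed(0)
print(simulated_annealing((3, 2, 1, 0), cost, neighbors))   # tends toward (0, 1, 2, 3)
```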
optimizing exception handling is critical for programs that frequently throw exceptions we observed that there are many such exception intensive programs written in java there are two commonly used exception handling techniques stack unwinding and stack cutting stack unwinding optimizes the normal path by leaving the exception handling path unoptimized while stack cutting optimizes the exception handling path by adding extra work to the normal path however there has been no single exception handling technique to optimize the exception handling path without incurring any overhead to the normal path we propose new technique called exception directed optimization edo that optimizes exception intensive programs without slowing down exception minimal programs it is feedback directed dynamic optimization consisting of three steps exception path profiling exception path inlining and throw elimination exception path profiling attempts to detect hot exception paths exception path inlining embeds every hot exception path into the corresponding catching method throw elimination replaces throw with branch to the corresponding handler we implemented edo in ibm’s production just in time compiler and made several experiments in summary it improved the performance of exception intensive programs by up to percent without decreasing the performance of exception minimal programs for specjvm we also found an opportunity for performance improvement using edo in the startup of java application server
in this paper we study the characteristics of search queries submitted from mobile devices using various yahoo one search applications during months period in the second half of and report the query patterns derived from million english sample queries submitted by users in us canada europe and asia we examine the query distribution and topical categories the queries belong to in order to find new trends we compare and contrast the search patterns between us vs international queries and between queries from various search interfaces xhtml wap java widgets and sms we also compare our results with previous studies wherever possible either to confirm previous findings or to find interesting differences in the query distribution and pattern
ziggurat is meta language system that permits programmers to develop scheme like macros for languages with nontrivial static semantics such as c or java suitably encoded in an s expression concrete syntax ziggurat permits language designers to construct towers of language levels with macros each level in the tower may have its own static semantics such as type systems or flow analyses crucially the static semantics of the languages at two adjacent levels in the tower can be connected allowing improved reasoning power at higher level to be reflected down to the static semantics of the language level below we demonstrate the utility of the ziggurat framework by implementing higher level language facilities as macros on top of an assembly language utilizing static semantics such as termination analysis polymorphic type system and higher order flow analysis
we present an experimental comparison of multi touch and tangible user interfaces for basic interface actions twelve participants completed manipulation and acquisition tasks on an interactive surface in each of three conditions tangible user interface multi touch and mouse and puck we found that interface control objects in the tangible condition were easiest to acquire and once acquired were easier more accurate to manipulate further qualitative analysis suggested that in the evaluated tasks tangibles offer greater adaptability of control and specifically highlighted problem of exit error that can undermine fine grained control in multi touch interactions we discuss the implications of these findings for interface design
for long time topological relationships between spatial objects have been focus of research in number of disciplines like artificial intelligence cognitive science linguistics robotics and spatial reasoning especially as predicates they support the design of suitable query languages for spatial data retrieval and analysis in spatial databases and geographical information systems gis unfortunately they have so far only been defined for and applicable to simplified abstractions of spatial objects like single points continuous lines and simple regions with the introduction of complex spatial data types an issue arises regarding the design definition and number of topological relationships operating on these complex types this article closes this gap and first introduces definitions of general and versatile spatial data types for complex points complex lines and complex regions based on the well known intersection model it then determines the complete sets of mutually exclusive topological relationships for all type combinations completeness and mutual exclusion are shown by proof technique called proof by constraint and drawing due to the resulting large numbers of predicates and the difficulty of handling them the user is provided with the concepts of topological cluster predicates and topological predicate groups which permit one to reduce the number of predicates to be dealt with in user defined and or application specific manner
memory accesses often account for about half of microprocessor system’s power consumption customizing microprocessor cache’s total size line size and associativity to particular program is well known to have tremendous benefits for performance and power customizing caches has until recently been restricted to core based flows in which new chip will be fabricated however several configurable cache architectures have been proposed recently for use in prefabricated microprocessor platforms tuning those caches to program is still however cumbersome task left for designers assisted in part by recent computer aided design cad tuning aids we propose to move that cad on chip which can greatly increase the acceptance of tunable caches we introduce on chip hardware implementing an efficient cache tuning heuristic that can automatically transparently and dynamically tune the cache to an executing program our heuristic seeks not only to reduce the number of configurations that must be examined but also traverses the search space in way that minimizes costly cache flushes by simulating numerous powerstone and mediabench benchmarks we show that such dynamic self tuning cache saves on average percent of total memory access energy over standard nontuned reference cache
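A simplified software sketch of the one-parameter-at-a-time tuning idea (size, then line size, then associativity), with energy(config) standing in for the measured or estimated memory access energy; the concrete on-chip heuristic additionally orders its search to avoid costly cache flushes, which is not modeled here.

```python
# Hedged, simplified sketch of the tuning heuristic: explore cache size first,
# then line size, then associativity, keeping each parameter's best value.
def tune_cache(energy,
               sizes=(2048, 4096, 8192),
               lines=(16, 32, 64),
               assocs=(1, 2, 4)):
    best = {"size": sizes[0], "line": lines[0], "assoc": assocs[0]}

    def explore(param, values):
        scored = []
        for v in values:
            cfg = dict(best, **{param: v})
            scored.append((energy(cfg), v))
        best[param] = min(scored)[1]

    explore("size", sizes)       # one parameter at a time, in this fixed order
    explore("line", lines)
    explore("assoc", assocs)
    return best

# Toy stand-in energy model: a sweet spot at 4 KB, 32-byte lines, 2-way.
def energy(cfg):
    return (abs(cfg["size"] - 4096) / 4096 + abs(cfg["line"] - 32) / 32
            + abs(cfg["assoc"] - 2) / 2)

print(tune_cache(energy))   # {'size': 4096, 'line': 32, 'assoc': 2}
```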
evaluation of recommender system algorithm is challenging task due to the many possible scenarios in which such systems may be deployed we have designed new performance plot called the croc curve with an associated statistic the area under the curve our croc curve supplements the widely used roc curve in recommender system evaluation by discovering performance characteristics that standard roc evaluation often ignores empirical studies on two domains and including several recommender system algorithms demonstrate that combining roc and croc curves in evaluation can lead to more informed characterization of performance than using either curve alone
producing reliable information is the ultimate goal of data processing the ocean of data created with the advances of science and technologies calls for integration of data coming from heterogeneous sources that are diverse in their purposes business rules underlying models and enabling technologies reference models semantic web standards ontology and other technologies enable fast and efficient merging of heterogeneous data while the reliability of produced information is largely defined by how well the data represent the reality in this paper we initiate framework for assessing the informational value of data that includes data dimensions aligning data quality with business practices identifying authoritative sources and integration keys merging models uniting updates of varying frequency and overlapping or gapped data sets
software development artifacts such as model descriptions diagrammatic languages abstract formal specifications and source code are highly interrelated where changes in some of them affect others trace dependencies characterize such relationships abstractly this paper presents an automated approach to generating and validating trace dependencies it addresses the severe problem that the absence of trace information or the uncertainty of its correctness limits the usefulness of software models during software development it also automates what is normally time consuming and costly activity due to the quadratic explosion of potential trace dependencies between development artifacts
information graphics or infographics are visual representations of information data or knowledge understanding of infographics in documents is relatively new research problem which becomes more challenging when infographics appear as raster images this paper describes technical details and practical applications of the system we built for recognizing and understanding imaged infographics located in document pages to recognize infographics in raster form both graphical symbol extraction and text recognition need to be performed the two kinds of information are then auto associated to capture and store the semantic information carried by the infographics two practical applications of the system are introduced in this paper including supplement to traditional optical character recognition ocr system and providing enriched information for question answering qa to test the performance of our system we conducted experiments using collection of downloaded and scanned infographic images another set of scanned document pages from the university of washington document image database were used to demonstrate how the system output can be used by other applications the results obtained confirm the practical value of the system
this paper presents system to automatically generate compact explosion diagrams inspired by handmade illustrations our approach reduces the complexity of an explosion diagram by rendering an exploded view only for subset of the assemblies of an object however the exploded views are chosen so that they allow inference of the remaining unexploded assemblies of the entire model in particular our approach demonstrates the assembly of set of identical groups of parts by presenting an exploded view only for single representative in order to identify the representatives our system automatically searches for recurring subassemblies it selects representatives depending on quality evaluation of their potential exploded view our system takes into account visibility information of both the exploded view of potential representative as well as visibility information of the remaining unexploded assemblies this allows rendering balanced compact explosion diagram consisting of clear presentation of the exploded representatives as well as the unexploded remaining assemblies since representatives may interfere with one another our system furthermore optimizes combinations of representatives throughout this paper we show number of examples which have all been rendered from unmodified cad models
we evaluate an admission control screening policy for proxy server caching that augments the lru least recently used algorithm our results are useful for operating proxy server deployed by an internet service provider or for an enterprise forward proxy server through which employees browse the internet the admission control policy classifies documents as cacheable and non cacheable based on loading times and then uses lru to operate the cache the mathematical analysis of the admission control approach is particularly challenging because it considers the dynamics of the caching policy lru operating at the proxy server our results show substantial reduction in user delay in our numerical simulations the improvement can be even larger at high levels of proxy server capacity or when the user demand patterns are more random an approximation technique provides near optimal results for large problem sizes demonstrating that our approach can be used in real world situations we also show that the traffic downloaded by the proxy server does not change much as compared to lru as result of screening detailed simulation study on lru and other caching algorithms validates the theoretical results and provides additional insights furthermore we have provided ways to estimate policy parameter values using real world trace data
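A minimal sketch of an admission-controlled LRU cache in the spirit of the policy described above, assuming a simple rule in which only documents whose observed loading time exceeds a threshold are admitted; the threshold rule, the interface names, and counting capacity in documents rather than bytes are all illustrative simplifications, not the paper's exact policy.

from collections import OrderedDict

class ScreenedLRUCache:
    # LRU cache with an admission-control screen: only documents whose
    # loading time exceeds min_load_time are admitted (illustrative rule).
    def __init__(self, capacity, min_load_time):
        self.capacity = capacity          # capacity in number of documents (simplification)
        self.min_load_time = min_load_time
        self.store = OrderedDict()        # url -> size

    def get(self, url):
        if url in self.store:
            self.store.move_to_end(url)   # refresh recency on a hit
            return True
        return False                      # miss

    def admit(self, url, size, load_time):
        if load_time < self.min_load_time:
            return                        # screened out: treated as non cacheable
        self.store[url] = size
        self.store.move_to_end(url)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used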
software architectures have evolved considerably over the last decade and partly also due to the significant progress made in component based development have become major subfield of software engineering the associated field of architecture description languages adls has also evolved considerably providing numerous approaches to the formal specification and representation of architectural designs in this field one of its most interesting and rather recent aspects has been the exploration of different ways to map architectural specifications down to executable representations in this paper we present methodology for mapping the generic features of any typical adl to executable code the mapping process involves the use of acme generic language for describing software architectures and the coordination paradigm more to the point we show how the core concepts of acme can be mapped to equivalent executable code written in the coordination language manifold the result is the generation of skeletal code which captures and implements the most important system implementation properties of the translated architectural design thus significantly assisting the programmer in filling in the rest of the needed code
advanced computer architectures rely mainly on compiler optimizations for parallelization vectorization and pipelining efficient code generation is based on control dependence analysis to find the basic blocks and to determine the regions of control however unstructured branch statements such as jumps and goto’s render the control flow analysis difficult time consuming and result in poor code generation branches are part of many programming languages and occur in legacy and maintenance code as well as in assembler intermediate languages and byte code simple and effective technique is presented to convert unstructured branches into hammock graph control structures using three basic transformations an equivalent program is obtained in which all control statements have well defined scope in the interest of predication and branch prediction the number of control variables has been minimized thereby allowing limited code replication the correctness of the transformations has been proven using an axiomatic proof rule system with respect to previous work the algorithm is simpler and the branch conditions are less complex making the program more readable and the code generation more efficient additionally hammock graphs define single entry single exit regions and therefore allow localized optimizations the restructuring method has been implemented into the parallelizing compiler fpt and allows to extract parallelism in unstructured programs the use of hammock graph transformations in other application areas such as vectorization decompilation and assembly program restructuring is also demonstrated
we present three new interaction techniques for aiding users in collecting and organizing web content first we demonstrate an interface for creating associations between websites which facilitate the automatic retrieval of related content second we present an authoring interface that allows users to quickly merge content from many different websites into uniform and personalized representation which we call card finally we introduce novel search paradigm that leverages the relationships in card to direct search queries to extract relevant content from multiple web sources and fill new series of cards instead of just returning list of webpage urls preliminary feedback from users is positive and validates our design
roles are powerful and policy neutral concept for facilitating distributed systems management and enforcing access control models which are now subject to becoming standard have been proposed and much work on extensions to these models has been done over the last years as documented in the recent rbac sacmat workshops when looking at these extensions we can often observe that they concentrate on particular stage in the life of role we investigate how these extensions fit into more general theoretical framework in order to give practitioners starting point from which to develop role based systems we believe that the life cycle of role could be seen as the basis for such framework and we provide an initial discussion on such role life cycle based on our experiences and observations in enterprise security management we propose life cycle model that is based on an iterative incremental process similar to those found in the area of software development
this paper describes safety analysis for multithreaded system based upon transactional memory the analysis guarantees that shared data is always read and written from within transaction while allowing for unsynchronized access to thread local and shared read only data as well as the migration of data between threads the analysis is based on type and effect system for object oriented programs called partitions programmers specify partitioning of the heap into disjoint regions at field level granularity and then use this partitioning to enforce safety properties in their programs our flow sensitive effect system requires methods to disclose which partitions of the heap they will read or write and also allows them to specify an effect agreement which can be used to limit the conditions in which method can be called
cloud computing adds more power to the existing internet technologies virtualization harnesses the power of the existing infrastructure and resources with virtualization we can simultaneously run multiple instances of different commodity operating systems since we have limited processors and jobs work in concurrent fashion overload situations can occur things become even more challenging in distributed environment we propose central load balancing policy for virtual machines clbvm to balance the load evenly in distributed virtual machine cloud computing environment this work tries to compare the performance of web servers based on our clbvm policy and independent virtual machine vm running on single physical server using xen virtualization the paper discusses the efficacy and feasibility of using this kind of policy for overall performance improvement
spatial co location pattern mining is an interesting and important issue in spatial data mining area which discovers the subsets of features whose events are frequently located together in geographic space however previous research literatures for mining co location patterns assume static neighborhood constraint that apparently introduces many drawbacks in this paper we conclude the preferences that algorithms rely on when making decisions for mining co location patterns with dynamic neighborhood constraint based on this we define the mining task as an optimization problem and propose greedy algorithm for mining co location patterns with dynamic neighborhood constraint the experimental evaluation on real world data set shows that our algorithm has better capability than the previous approach on finding co location patterns together with the consideration of the distribution of data set
the processes by which communities come together attract new members and develop over time is central research issue in the social sciences political movements professional organizations and religious denominations all provide fundamental examples of such communities in the digital domain on line groups are becoming increasingly prominent due to the growth of community and social networking sites such as myspace and livejournal however the challenge of collecting and analyzing large scale time resolved data on social groups and communities has left most basic questions about the evolution of such groups largely unresolved what are the structural features that influence whether individuals will join communities which communities will grow rapidly and how do the overlaps among pairs of communities change over time here we address these questions using two large sources of data friendship links and community membership on livejournal and co authorship and conference publications in dblp both of these datasets provide explicit user defined communities where conferences serve as proxies for communities in dblp we study how the evolution of these communities relates to properties such as the structure of the underlying social networks we find that the propensity of individuals to join communities and of communities to grow rapidly depends in subtle ways on the underlying network structure for example the tendency of an individual to join community is influenced not just by the number of friends he or she has within the community but also crucially by how those friends are connected to one another we use decision tree techniques to identify the most significant structural determinants of these properties we also develop novel methodology for measuring movement of individuals between communities and show how such movements are closely aligned with changes in the topics of interest within the communities
the existence and use of standard test collections in information retrieval experimentation allows results to be compared between research groups and over time such comparisons however are rarely made most researchers only report results from their own experiments practice that allows lack of overall improvement to go unnoticed in this paper we analyze results achieved on the trec ad hoc web terabyte and robust collections as reported in sigir and cikm dozens of individual published experiments report effectiveness improvements and often claim statistical significance however there is little evidence of improvement in ad hoc retrieval technology over the past decade baselines are generally weak often being below the median original trec system and in only handful of experiments is the score of the best trec automatic run exceeded given this finding we question the value of achieving even statistically significant result over weak baseline we propose that the community adopt practice of regular longitudinal comparison to ensure measurable progress or at least prevent the lack of it from going unnoticed we describe an online database of retrieval runs that facilitates such practice
online reviews in which users publish detailed commentary about their experiences and opinions with products services or events are extremely valuable to users who rely on them to make informed decisions however reviews vary greatly in quality and are constantly increasing in number therefore automatic assessment of review helpfulness is of growing importance previous work has addressed the problem by treating review as stand alone document extracting features from the review text and learning function based on these features for predicting the review quality in this work we exploit contextual information about authors identities and social networks for improving review quality prediction we propose generic framework for incorporating social context information by adding regularization constraints to the text based predictor our approach can effectively use the social context information available for large quantities of unlabeled reviews it also has the advantage that the resulting predictor is usable even when social context is unavailable we validate our framework within real commerce portal and experimentally demonstrate that using social context information can help improve the accuracy of review quality prediction especially when the available training data is sparse
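A hedged sketch of one way the "text predictor plus social regularization" idea above could be written down; the symbols below (w for the text-based model, x_i and y_i for the features and quality label of review i, E and a_ij for the author social graph and its edge weights, lambda_1 and lambda_2 for regularization weights) are our own notation, not necessarily the paper's exact constraints.

\[ \min_{w}\; \sum_{i\in\mathcal{L}} \bigl(y_i - w^{\top}x_i\bigr)^2 \;+\; \lambda_1\,\lVert w\rVert_2^2 \;+\; \lambda_2 \sum_{(i,j)\in E} a_{ij}\,\bigl(w^{\top}x_i - w^{\top}x_j\bigr)^2 \]

Here \mathcal{L} is the labeled set; the last term can be evaluated over unlabeled reviews as well, which is how unlabeled data with social context can enter training, and setting \lambda_2 = 0 recovers a purely text-based predictor that remains usable when social context is unavailable.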
the ambient calculus is concurrent calculus where the unifying notion of ambient is used to model many different constructs for distributed and mobile computation we study type system that describes several properties of ambient behavior the type system allows ambients to be partitioned in disjoint sets groups according to the intended design of system in order to specify both the communication and the mobility behavior of ambients
data exchange is the problem of transforming data structured under source schema into data structured under target schema in such way that all constraints of schema mapping are satisfied at the heart of data exchange lies basic decision problem called the existence of solutions problem given source instance is there target instance that satisfies the constraints of the schema mapping at hand earlier work showed that for schema mappings specified by embedded implicational dependencies this problem is solvable in polynomial time assuming that the schema mapping is kept fixed and the constraints of the schema mapping satisfy certain structural condition called weak acyclicity we investigate the effect of these assumptions on the complexity of the existence of solutions problem and show that each one is indispensable in deriving polynomial time algorithms for this problem specifically using machinery from universal algebra we show that if the weak acyclicity assumption is relaxed even in minimal way then the existence of solutions problem becomes undecidable we also show that if in addition to the source instance the schema mapping is part of the input then the existence of solutions problem becomes exptime complete thus there is provable exponential gap between the data complexity and the combined complexity of data exchange finally we study restricted classes of schema mappings and develop comprehensive picture for the combined complexity of the existence of solutions problem for these restrictions in particular depending on the restriction considered the combined complexity of this problem turns out to be either exptime complete or conp complete
simulations with web traffic usually generate input by sampling heavy tailed object size distribution as consequence these simulations remain in transient state over all periods of time ie all statistics that depend on moments of this distribution such as the average object size or the average user perceived latency of downloads do not converge within periods practically feasible for simulations we therefore investigate whether three latency percentiles which do not depend on the extreme tail of the latency distribution are more suitable statistics for the performance evaluation we exploit that corresponding object size percentiles in samples from heavy tailed distribution converge to normal distributions during periods feasible for simulations conducting simulation study with ns we find similar convergence for network latency percentiles we explain this finding with probability theory and propose method to reliably test for this convergence
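A small self-contained experiment illustrating the statistical point above, assuming Pareto-distributed object sizes as a stand-in for the heavy-tailed workload; it compares how much the sample mean and a high percentile fluctuate across independent replications (the parameters are arbitrary, and this is not the convergence test proposed in the paper).

import random, statistics

def pareto_sample(n, alpha=1.2, xm=1.0):
    # heavy-tailed object sizes; alpha < 2 means the variance is infinite
    return [xm / (random.random() ** (1.0 / alpha)) for _ in range(n)]

def percentile(data, q):
    data = sorted(data)
    k = max(0, min(len(data) - 1, int(q * len(data)) - 1))
    return data[k]

# spread of the sample mean vs. a high percentile over replications
means, p95s = [], []
for _ in range(50):
    s = pareto_sample(20000)
    means.append(statistics.fmean(s))
    p95s.append(percentile(s, 0.95))
print("relative std of mean           :", statistics.pstdev(means) / statistics.fmean(means))
print("relative std of 95th percentile:", statistics.pstdev(p95s) / statistics.fmean(p95s))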
many recent studies have suggested that the optimistic concurrency control occ protocols outperform the locking based protocols in real time database systems rtdbs however the occ protocols suffer from the problem of unnecessary transaction restarts that is detrimental to transactions meeting their deadlines the problem is more intensified in mixed transaction environments where both hard and firm real time transactions exist firm transactions are more vulnerable to restarts when they are in conflict with hard transactions on data access in this paper we have addressed the problem and devised an effective occ protocol with dynamic adjustment of serialization order daso called occ da for rtdbs with mixed transactions this protocol can avoid unnecessary transaction restarts by dynamically adjusting the serialization order of the conflicting transactions with respect to the validating transaction as result much resource can be saved and more firm transactions can meet their deadlines without affecting the execution of hard transactions the characteristics of the occ da protocol have been examined in detail by simulation the results show that the performance of the occ da protocol is consistently better than the other two popular protocols occ with forward validation and occ with wait over wide range of system settings in particular the occ da protocol provides more significant performance gain in mixed transaction environments
computational scientists often must choose between the greater programming productivity of high level abstractions such as matrices and mesh entities and the greater execution efficiency of low level constructs performance is degraded when abstraction indirection introduces overhead and hinders compiler analysis this can be overcome by targeting the semantics rather than the implementation of abstractions raising operators specified by domain expert project an application from an implementation space to an abstraction space where optimizations leverage domain semantics to complement conservative analyses raising operators define domain specific intermediate representation which optimizations target for improved portability following optimization transformed code is reified as concrete implementation via lowering operators we have developed framework to implement this optimization strategy which we use to introduce two domain specific unstructured mesh optimizations the first uses an inspector executor approach to avoid costly traversals over static mesh by memoizing the relatively few references required for mathematical computations the executor phase accesses stored entities without incurring the indirections the second optimization lowers object based mesh access and iteration to low level implementation which uses integer based access and iteration
this paper considers the following induction problem given the background knowledge B and an observation O find hypothesis H such that the consistent theory B ∪ H has minimal model satisfying O we call this type of induction brave induction brave induction is different from explanatory induction in ilp which requires that O is satisfied in every model of B ∪ H brave induction is useful for learning disjunctive rules from observations or learning from the background knowledge containing indefinite or incomplete information we develop an algorithm for computing brave induction and extend it to induction in answer set programming
the ability to deal with the incompatibilities of service requesters and providers is critical factor for achieving interoperability in dynamic open environments we focus on the problem of process mediation of the semantically annotated process models of the service requester and service provider we propose an abstract process mediation framework apmf identifying the key functional areas that need to be addressed by process mediation components next we present algorithms for solving the process mediation problem in two scenarios when the mediation process has complete visibility of the process model of the service provider and service requester complete visibility scenario and when the mediation process has visibility only of the process model of the service provider but not the service requester asymmetric scenario the algorithms combine planning and semantic reasoning with the discovery of appropriate external services such as data mediators finally the process mediation agent pma is introduced which realises an execution infrastructure for runtime mediation
the increasing use of microprocessor cores in embedded systems as well as mobile and portable devices creates an opportunity for customizing the cache subsystem for improved performance in traditional cache design the index portion of the memory address bus consists of the k least significant bits where k = log2(D) and D is the depth of the cache however in devices where the application set is known and characterized eg systems that execute fixed application set there is an opportunity to improve cache performance by choosing an optimal set of bits used as index into the cache this technique does not add any overhead in terms of area or delay we give an efficient heuristic algorithm for selecting index bits for improved cache performance we show the feasibility of our algorithm by applying it to large number of embedded system applications as well as the integer spec cpu benchmarks
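A rough software analogue of the index-bit selection idea, assuming an address trace gathered from the fixed application set; the greedy criterion used here (balancing accesses across sets) is a stand-in for the paper's heuristic, which is not reproduced.

from collections import Counter

def choose_index_bits(trace, k, candidate_bits=range(2, 32)):
    # Greedily pick k address bits to use as the cache index so that the
    # trace spreads as evenly as possible over the 2**k sets.
    chosen = []
    for _ in range(k):
        best_bit, best_cost = None, None
        for b in candidate_bits:
            if b in chosen:
                continue
            bits = chosen + [b]
            sets = Counter(sum(((addr >> bit) & 1) << i for i, bit in enumerate(bits))
                           for addr in trace)
            cost = sum(c * c for c in sets.values())   # low when accesses are balanced
            if best_cost is None or cost < best_cost:
                best_bit, best_cost = b, cost
        chosen.append(best_bit)
    return chosen

# toy trace of sequential 64-byte blocks; the heuristic avoids the constant low bits
trace = [i * 64 for i in range(256)]
print(choose_index_bits(trace, 4))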
run time type dispatch enables variety of advanced optimization techniques for polymorphic languages including tag free garbage collection unboxed function arguments and flattened data structures however modern type preserving compilers transform types between stages of compilation making type dispatch prohibitively complex at low levels of typed compilation it is crucial therefore for type analysis at these low levels to refer to the types of previous stages unfortunately no current intermediate language supports this facility to fill this gap we present the language lx which provides rich language of type constructors supporting type analysis possibly of previous stage types as programming idiom this language is quite flexible supporting variety of other applications such as analysis of quantified types analysis with incomplete type information and type classes we also show that lx is compatible with type erasure semantics
to perform key business functions organizations in critical infrastructure sectors such as healthcare or finance increasingly need to share identifying and authorization related information such information sharing requires negotiation about identity safeguarding policies and capabilities as provided by processes technologies tools and models that negotiation must address the concerns not only of the organizations sharing the information but also of the individuals whose identity related information is shared spici sharing policy identity and control information provides descriptive and analytic framework to structure and support such negotiations with an emphasis on assurance
user clicks on url in response to query are extremely useful predictors of the url’s relevance to that query exact match click features tend to suffer from severe data sparsity issues in web ranking such sparsity is particularly pronounced for new urls or long queries where each distinct query url pair will rarely occur to remedy this we present set of straightforward yet informative query url n gram features that allows for generalization of limited user click data to large amounts of unseen query url pairs the method is motivated by techniques leveraged in the nlp community for dealing with unseen words we find that there are interesting regularities across queries and their preferred destination urls for example queries containing form tend to lead to clicks on urls containing pdf we evaluate our set of new query url features on web search ranking task and obtain improvements that are statistically significant at p value level over strong baseline with exact match clickthrough features
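An illustrative sketch of query-url gram feature extraction in the spirit of the idea above, with hypothetical tokenization rules; aggregating click counts over such (query gram, url gram) pairs is what allows generalization to query-url pairs never seen in the click logs. The paper's exact feature definition may differ.

import re

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def query_url_gram_features(query, url, n=1):
    # Cross-product features between query grams and url grams, e.g. ('form', 'pdf');
    # illustrative tokenization, not the paper's feature definition.
    q_tokens = query.lower().split()
    u_tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {(qg, ug) for qg in ngrams(q_tokens, n) for ug in ngrams(u_tokens, n)}

# example with a made-up query and url
print(query_url_gram_features("annual report form", "http://example.com/docs/report2009.pdf"))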
we present an algorithm for constructing kd trees on gpus this algorithm achieves real time performance by exploiting the gpu’s streaming architecture at all stages of kd tree construction unlike previous parallel kd tree algorithms our method builds tree nodes completely in bfs breadth first search order we also develop special strategy for large nodes at upper tree levels so as to further exploit the fine grained parallelism of gpus for these nodes we parallelize the computation over all geometric primitives instead of nodes at each level finally in order to maintain kd tree quality we introduce novel schemes for fast evaluation of node split costs as far as we know ours is the first real time kd tree algorithm on the gpu the kd trees built by our algorithm are of comparable quality as those constructed by off line cpu algorithms in terms of speed our algorithm is significantly faster than well optimized single core cpu algorithms and competitive with multi core cpu algorithms our algorithm provides general way for handling dynamic scenes on the gpu we demonstrate the potential of our algorithm in applications involving dynamic scenes including gpu ray tracing interactive photon mapping and point cloud modeling
we propose lexicalized syntactic reordering framework for cross language word aligning and translating researches in this framework we first flatten hierarchical source language parse trees into syntactically motivated linear string representations which can easily be input to many feature like probabilistic models during model training these string representations accompanied with target language word alignment information are leveraged to learn systematic similarities and differences in languages grammars at runtime syntactic constituents of source language parse trees will be reordered according to automatically acquired lexicalized reordering rules in previous step to closer match word orientations of the target language empirical results show that as preprocessing component bilingual word aligning and translating tasks benefit from our reordering methodology
real world activity is complex and increasingly involves use of multiple computer applications and communication devices over extended periods of time to understand activity at the level of detail required to provide natural and comprehensive support for it necessitates appreciating both its richness and dynamically changing context in this article we summarize field work in which we recorded the desktop activities of workers in law office analyze interview data in detail to show the effects of context reinstatement when viewing video summaries of past desktop activity we conclude by discussing the implications of our results for the design of software tools to assist work in office settings
separation of concerns is basic engineering principle that is also at the core of object oriented analysis and design methods in the context of the unified modeling language uml the uml gives the designer rich but somehow disorganized set of views on her model as well as many features such as design pattern occurrences stereotypes or tag values allowing her to add non functional information to model aspect oriented concepts are applied to manage the multitude of design constraints however it can then be an overwhelming task to reconcile the various aspects of model into working implementation in this paper we present our umlaut framework as toolkit for easily building application specific weavers for generating detailed design models from high level aspect oriented uml models this is illustrated with toy example of distributed multimedia application with weaving generating an implementation model more ambitious applications are briefly outlined in the conclusion
we consider the reachability problem on semi algebraic hybrid automata in particular we deal with the effective cost that has to be afforded to solve reachability through first order satisfiability the analysis we perform with some existing tools shows that even simple examples cannot be efficiently solved we need approximations to reduce the number of variables in our formulae this is the main source of time computation growth we study standard approximation methods based on taylor polynomials and ad hoc strategies to solve the problem and we show their effectiveness on the repressilator case study
the continuous partial match query is partial match query whose result remains consistently in the client’s memory conventional cache invalidation methods for mobile clients are record id based however since the partial match query uses content based retrieval the conventional id based approaches cannot efficiently manage the cache consistency of mobile clients in this paper we propose predicate based cache invalidation scheme for continuous partial match queries in mobile computing environments we represent the cache state of mobile client as predicate and also construct cache invalidation report cir which the server broadcasts to clients for cache management with predicates in order to reduce the amount of information that is needed for cache management we propose set of methods for cir construction in the server and identification of invalidated data in the client through experiments we show that the predicate based approach is very effective for the cache management of mobile clients
the role generic relationship for conceptual modeling relates class of objects eg persons and classes of roles eg students employees for those objects the role relationship is meant to capture dynamic aspects of real world objects while the usual generalization relationship deals with their more static aspects therefore to take into account both static and dynamic aspects object languages and systems must somehow support both relationships this paper presents generic role model where the semantics of roles is defined at both the class and the instance levels it discusses the interaction between the role relationship and generalization and it attempts to clarify some of their similarities and differences the introduction of roles as an abstraction mechanism in the overall software development lifecycle is reviewed the paper then proposes comprehensive implementation for the role relationship with the help of metaclass mechanism our implementation is illustrated along the lines of the vodak modeling language thus the semantics of our role model is implemented in metaclass that is template to be instantiated in applications application classes are then created as instances of the metaclass and they are thereby endowed with structure and behavior consistent with the semantics of roles
variable hiding and predicate abstraction are two popular abstraction methods to obtain simplified models for model checking although both methods have been used successfully in practice no attempt has been made to combine them in counterexample guided abstraction refinement cegar in this paper we propose hybrid abstraction method that allows both visible variables and predicates to take advantages of their relative strengths we use refinement based on weakest preconditions to add new predicates and under certain conditions trade in the predicates for visible variables in the abstract model we also present heuristics for improving the overall performance based on static analysis to identify useful candidates for visible variables and use of lazy constraints to find more effective unsatisfiable cores for refinement we have implemented the proposed hybrid cegar procedure our experiments on public benchmarks show that the new abstraction method frequently outperforms the better of the two existing abstraction methods
operating systems provide services that are accessed by processes via mechanisms that involve ring transition to transfer control to the kernel where the required function is performed this has one significant drawback that every service call involves an overhead of context switch where processor state is saved and protection domain transfer is performed however as we discovered it is possible on processor architectures that support segmentation to achieve significant performance gain in accessing the services provided by the operating system by not performing ring transition further such gains can be achieved without compromising on the separation of the privileged components from the unprivileged klos is kernel less operating system built on the basis of such design the klos service call mechanism is an order of magnitude faster than the current widely implemented mechanisms for service or system calls with improvement over the traditional trap interrupt and improvement over the intel sysenter sysexit fast system call models
the objective of this study is to develop measurement of search result relevance for chinese queries through comparing four chinese search engines the relevance was measured by blind evaluation of the first search results together with statistical tests four indexes average precision within the first results hit rate within the first results mean dead link rate within the first results md and mean reciprocal rank of the first relevant document mrr were computed the results implied that except for md where one engine was better single engine was the best on the other three indexes however statistical analysis indicated that there was no significant difference in the precision and mrr indexes among the four engines except for the index md
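For reference, minimal implementations of two of the measures named above (precision within the first k results and mean reciprocal rank of the first relevant document), using hypothetical 0/1 relevance judgments; the cutoff values used in the study are not reproduced here.

def precision_at_k(relevance, k):
    # relevance: list of 0/1 judgments for the ranked results of one query
    top = relevance[:k]
    return sum(top) / k if k else 0.0

def mean_reciprocal_rank(runs):
    # runs: one 0/1 relevance list per query; reciprocal rank of the first relevant hit
    total = 0.0
    for rel in runs:
        rr = 0.0
        for rank, r in enumerate(rel, start=1):
            if r:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(runs)

runs = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]]   # hypothetical judgments
print(precision_at_k(runs[0], 3), mean_reciprocal_rank(runs))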
accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation this paper describes the use of compass direct execution driven parallel simulator for performance prediction of programs that include both communication and i/o intensive applications the simulator has been used to predict the performance of such applications on both distributed memory machines like the ibm sp and shared memory machines like the sgi origin the paper illustrates the usefulness of compass as versatile performance prediction tool we use both real world applications and synthetic benchmarks to study application scalability sensitivity to communication latency and the interplay between factors like communication pattern and parallel file system caching on application performance we also show that the simulator is accurate in its predictions and that it is also efficient in its ability to use parallel simulation to reduce its own execution time which in some cases has yielded near linear speedup
cubegrades are generalization of association rules which represent how set of measures aggregates is affected by modifying cube through specialization rolldown generalization rollup and mutation which is change in one of the cube’s dimensions cubegrades are significantly more expressive than association rules in capturing trends and patterns in data because they can use other standard aggregate measures in addition to count cubegrades are atoms which can support sophisticated what if analysis tasks dealing with behavior of arbitrary aggregates over different database segments as such cubegrades can be useful in marketing sales analysis and other typical data mining applications in business in this paper we introduce the concept of cubegrades we define them and give examples of their usage we then describe in detail an important task for computing cubegrades generation of significant cubes which is analogous to generating frequent sets novel grid based pruning gbp method is employed for this purpose we experimentally demonstrate the practicality of the method we conclude with number of open questions and possible extensions of the work
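A level-wise sketch that makes the analogy between significant-cube generation and frequent-set mining concrete; it uses plain apriori-style candidate generation with a COUNT threshold, whereas the paper's grid-based pruning (GBP) is a more refined strategy not reproduced here.

from collections import Counter
from itertools import combinations

def significant_cubes(rows, dims, min_count):
    # rows: list of dicts; returns cube cells (frozensets of (dim, value) pairs)
    # whose COUNT is at least min_count, grown level by level.
    results = {}
    current = {frozenset([(d, r[d])]) for r in rows for d in dims}
    while current:
        counts = Counter()
        for r in rows:
            cell = {(d, r[d]) for d in dims}
            for c in current:
                if c <= cell:
                    counts[c] += 1
        kept = {c for c in current if counts[c] >= min_count}
        results.update({c: counts[c] for c in kept})
        # candidate generation: join kept cells that together cover one extra dimension
        current = {a | b for a, b in combinations(kept, 2)
                   if len(a | b) == len(a) + 1
                   and len({d for d, _ in a | b}) == len(a | b)}
    return results

rows = [{"city": "nyc", "product": "tv"},
        {"city": "nyc", "product": "tv"},
        {"city": "la",  "product": "tv"}]
print(significant_cubes(rows, dims=["city", "product"], min_count=2))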
asymmetric multicore processors amp promise higher performance per watt than their symmetric counterparts and it is likely that future processors will integrate few fast out of order cores coupled with large number of simpler slow cores all exposing the same instruction set architecture isa it is well known that one of the most effective ways to leverage the effectiveness of these systems is to use fast cores to accelerate sequential phases of parallel applications and to use slow cores for running parallel phases at the same time we are not aware of any implementation of this parallelism aware pa scheduling policy in an operating system so the questions as to whether this policy can be delivered efficiently by the operating system to unmodified applications and what the associated overheads are remain open to answer these questions we created two different implementations of the pa policy in opensolaris and evaluated it on real hardware where asymmetry was emulated via cpu frequency scaling this paper reports our findings with regard to benefits and drawbacks of this scheduling policy
this paper presents practical optimization procedure for object detection and recognition algorithms it is suitable for object recognition using catadioptric omnidirectional vision system mounted on mobile robot we use the sift descriptor to obtain image features of the objects and the environment first sample object images are given for training and optimization procedures bayesian classification is used to train various test objects based on different sift vectors the system selects the features based on the k means group to predict the possible object from the candidate regions of the images it is thus able to detect the object with arbitrary shape without the information the feature optimization procedure makes the object features more stable for recognition and classification experimental results are presented for real scene images captured by catadioptric omnivision camera
we investigate the problem of evaluating fortran style array expressions on massively parallel distributed memory machines on such machine an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor if the arrays are not aligned in this manner the cost of aligning them is part of the cost of evaluating the expression tree the choice of where to perform the operation then affects this cost we describe the communication cost of the parallel machine theoretically as metric space we model the alignment problem as that of finding minimum cost embedding of the expression tree into this space we present algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics multidimensional grids and rings hypercubes fat trees and the discrete metric we also extend our approach to handle operations that change the shape of the arrays
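A minimal dynamic program in the spirit of the tree-embedding formulation above, restricted to a ring communication metric; the node representation, the cost function, and the fixed leaf placements are illustrative, and the paper's algorithms cover several other metrics and shape-changing operations not handled here.

def ring_distance(a, b, p):
    d = abs(a - b)
    return min(d, p - d)

def align_tree(node, p):
    # node = (children, fixed_pos or None). Returns cost[pos]: the minimum
    # communication cost of the subtree if this node is evaluated at pos on a
    # p-processor ring; leaves with fixed_pos model already-allocated arrays.
    children, fixed = node
    cost = [0.0] * p
    for child in children:
        child_cost = align_tree(child, p)
        for pos in range(p):
            cost[pos] += min(child_cost[q] + ring_distance(pos, q, p) for q in range(p))
    if fixed is not None:
        cost = [c if pos == fixed else float("inf") for pos, c in enumerate(cost)]
    return cost

# example: combine an array placed at processor 0 with one placed at processor 3 on a 4-ring
leaf_a, leaf_b = ([], 0), ([], 3)
root = ([leaf_a, leaf_b], None)
print(min(align_tree(root, 4)))   # best total shift cost over all root placements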
in this paper we discuss the challenges posed by the neuroweb project as case study of ontological modeling at knowledge interface between neurovascular medicine and genomics the aim of the project is the development of support system for association studies we identify the notion of clinical phenotypes that is the pathological condition of patient as the central construct of the knowledge model clinical phenotypes are assessed through the diagnostic activity performed by clinical experts operating within communities of practice the different communities operate according to specific procedures but they also conform to the minimal requirements of international guidelines displayed by the adoption of common standard for the patient classification we develop central model for the clinical phenotypes able to reconcile the different methodologies into common classificatory system to bridge neurovascular medicine and genomics we identify the general theory of biological function as the common ground between the two disciplines therefore we decompose the clinical phenotypes into elementary phenotypes with homogeneous physiological background and we connect them to the biological processes acting as the elementary units of the genomic world
this paper presents novel system framework for interactive three dimensional stylized abstract painterly rendering in this framework the input models are first represented using point sets and then this point based representation is used to build multiresolution bounding sphere hierarchy from the leaf to root nodes spheres of various sizes are rendered into multiple size strokes on the canvas the proposed sphere hierarchy is developed using multiscale region segmentation this segmentation task assembles spheres with similar attribute regularities into meaningful region hierarchy these attributes include colors positions and curvatures this hierarchy is very useful in the following respects it ensures the screen space stroke density controls different input model abstractions maintains region structures such as the edges boundaries at different scales and renders models interactively by choosing suitable abstractions brush stroke and lighting parameters we can interactively generate various painterly styles we also propose novel scheme that reduces the popping effect in animation sequences many different stylized images can be generated using the proposed framework
buyers in online auctions write feedback comments to the sellers from whom the buyers have bought the items other bidders read them to determine which item to bid for in this research we aim at helping bidders by summarizing the feedback comments firstly we examine feedback comments in online auctions from the results of the examination we propose method called social summarization method which uses social relationships in online auctions for summarizing feedback comments we implement system based on our method and evaluate its effectiveness finally we propose an interactive presentation method of the summaries based on the result of the evaluation
the increasing relevance of areas such as real time and embedded systems pervasive computing hybrid systems control and biological and social systems modeling is bringing growing attention to the temporal aspects of computing not only in the computer science domain but also in more traditional fields of engineering this article surveys various approaches to the formal modeling and analysis of the temporal features of computer based systems with level of detail that is also suitable for nonspecialists in doing so it provides unifying framework rather than just comprehensive list of formalisms the article first lays out some key dimensions along which the various formalisms can be evaluated and compared then significant sample of formalisms for time modeling in computing are presented and discussed according to these dimensions the adopted perspective is to some extent historical going from traditional models and formalisms to more modern ones
the focus of this work is on techniques that promise to reduce the message delivery latency in message passing interface mpi environments the main contributors to message delivery latency in message passing environments are the copying operations needed to transfer and bind received message to the consuming process thread to reduce this copying overhead and to reach toward finer granularity we introduce architectural extensions comprising of specialized network cache and instructions to manage the operations of this extension in this work we study the caching environment and evaluate new technique called lazy direct to cache transfer dtct our simulations show that messages can be bound and kept into network cache where they persist long enough to be consumed we also demonstrate that lazy dtct provides significant reduction in the access latency for intensive environments such as message passing configurations and smps without polluting the data cache
materialized view or materialized query table mqt is an auxiliary table with precomputed data that can be used to significantly improve the performance of database query materialized query table advisor mqta is often used to recommend and create mqts the state of the art mqta works in standalone database server where mqts are placed on the same server as that in which the base tables are located the mqta does not apply to federated or scaleout scenario in which mqts need to be placed on other servers close to applications ie frontend database server for offloading the workload on the backend database server in this paper we propose data placement advisor dpa and load balancing strategies for multi tiered database systems built on top of the mqta dpa recommends mqts and advises placement strategies for minimizing the response time for query workload to demonstrate the benefit of the data placement advising we implemented prototype of dpa that works with the mqta in the ibm db2 universal database db2 udb and the ibm websphere information integrator websphere ii the evaluation results showed substantial improvements of workload response times when mqts are intelligently recommended and placed on frontend database server subject to space and load characteristics for tpc and olap type workloads
the translation look aside buffer tlb content addressable memory consumes significant power due to the associative search mechanism it uses in the virtual to physical address translation based on our analysis of the tlb accesses we make two observations first the entropy or information content of the stack virtual page numbers is low due to high spatial locality of stack memory references second the entropy of the higher order bits of global memory references is low since the size of the global data is determined and fixed during compilation of program based on these two characteristics we propose two techniques an entropy based speculative stack address tlb and deterministic global address tlb to achieve energy reduction our results show an average of energy savings in the data tlb with less than overall performance impact
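A small sketch of the entropy measurement that motivates the design above, using hypothetical virtual page number streams; low entropy of stack page numbers and of the upper-order bits of global references is what makes a small speculative or deterministic lookup structure plausible. The streams and bit widths below are invented for illustration.

import math
from collections import Counter

def entropy(symbols):
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# hypothetical stream of stack virtual page numbers: high spatial locality,
# so only a handful of distinct values occur and the entropy is low
stack_vpns = [0x7FFFE, 0x7FFFE, 0x7FFFD, 0x7FFFE, 0x7FFFD, 0x7FFFE]
print("stack vpn entropy (bits):", entropy(stack_vpns))

# upper-order bits of global data references are nearly constant, so their entropy is ~0
global_vpns = [0x08048 + (i % 2) for i in range(100)]
print("entropy of upper bits   :", entropy([v >> 4 for v in global_vpns]))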
this article studies the expressive power of finite state automata recognizing sets of real numbers encoded positionally it is known that the sets that are definable in the first order additive theory of real and integer variables can all be recognized by weak deterministic buchi automata regardless of the encoding base in this article we prove the reciprocal property ie subset of the reals that is recognizable by weak deterministic automata in every base is necessarily definable in that theory this result generalizes to real numbers the well known cobham’s theorem on the finite state recognizability of sets of integers our proof gives interesting insight into the internal structure of automata recognizing sets of real numbers which may lead to efficient data structures for handling these sets
nested iteration is an important technique for query evaluation it is the default way of executing nested subqueries in sql although decorrelation often results in cheaper non nested plans decorrelation is not always applicable for nested subqueries nested iteration if implemented properly can also win over decorrelation for several classes of queries decorrelation is also hard to apply to nested iteration in user defined sql procedures and functions recent research has proposed evaluation techniques to speed up execution of nested iteration but does not address the optimization issue in this paper we address the issue of exploiting the ordering of nested iteration procedure calls to speed up nested iteration we propose state retention of operators as an important technique to exploit the sort order of parameters correlation variables we then show how to efficiently extend an optimizer to take parameter sort orders into consideration we implemented our evaluation techniques on postgresql and present performance results that demonstrate significant benefits
traditional public key infrastructures pki have not lived up to their promise because there are too many ways to define pkis too many cryptographic primitives to build them with and too many administrative domains with incompatible roots of trust alpaca is an authentication and authorization framework that embraces pki diversity by enabling one pki to plug in another pki’s credentials and cryptographic algorithms allowing users of the latter to authenticate themselves to services using the former using their existing unmodified certificates alpaca builds on proof carrying authorization pca expressing credential as an explicit proof of logical claim alpaca generalizes pca to express not only delegation policies but also the cryptographic primitives credential formats and namespace structures needed to use foreign credentials directly to achieve this goal alpaca introduces method of creating and naming new principals which behave according to arbitrary rules modular approach to logical axioms and domain specific language specialized for reasoning about authentication we have implemented alpaca as python module that assists applications in generating proofs eg in client requesting access to resource and in verifying those proofs via compact line tcb eg in server providing that resource we present examples demonstrating alpaca’s extensibility in scenarios involving inter organization pki interoperability and secure remote pki upgrade
register files in modern embedded processors contribute substantial budget in the energy consumption due to their large switching capacitance and long working time for some embedded processors on average percent of registers account for percent of register file accessing time this motivates us to partition the register file into hot and cold regions with the most frequently used registers placed in the hot region and the rarely accessed ones in the cold region we employ the bit line splitting and drowsy register cell techniques to reduce the overall register file accessing power we propose novel approach to partition the register in way that can achieve the largest power saving we formulate the register file partitioning process into graph partitioning problem and apply an effective algorithm to obtain the optimal result we evaluate our algorithm for mibench and spec applications on the simplescalar pisa system and an average saving of percent and percent over the nonpartitioned register file accessing power is achieved the area overhead is negligible and the execution time overhead is acceptable percent for mibench percent for spec further evaluation for mibench applications is performed on alpha and system
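A simplified stand-in for the partitioning step described above, assuming a profiled per-register access-count table; the paper formulates the problem as graph partitioning and solves it for the optimal split, whereas this sketch only ranks registers by raw access frequency.

def partition_registers(access_counts, hot_size):
    # access_counts: dict reg -> number of accesses profiled for the target application.
    # Returns (hot, cold) register sets; frequency ranking is a simplification of the
    # graph-partitioning formulation used in the paper.
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:hot_size]), set(ranked[hot_size:])

counts = {"r%d" % i: 1000 // (i + 1) for i in range(32)}   # skewed synthetic profile
hot, cold = partition_registers(counts, hot_size=8)
print(sorted(hot))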
many information flow type systems have been developed that allow to control the non interference of information between the levels of classification in the bell lapadula model we present here translation of typing information collected for bytecode programs to bytecode program logic this translation uses the syntax of bytecode specification language bml translation of this kind allows including the check of the non interference property in single unified verification framework based on program logic and thus can be exploited within foundational proof carrying code infrastructure it also provides flexible basis for various declassification strategies that may be useful in particular code body
it problem management calls for quick identification of resolvers to reported problems the efficiency of this process highly depends on ticket routing transferring problem ticket among various expert groups in search of the right resolver to the ticket to achieve efficient ticket routing wise decision needs to be made at each step of ticket transfer to determine which expert group is likely to be or to lead to the resolver in this paper we address the possibility of improving ticket routing efficiency by mining ticket resolution sequences alone without accessing ticket content to demonstrate this possibility markov model is developed to statistically capture the right decisions that have been made toward problem resolution where the order of the markov model is carefully chosen according to the conditional entropy obtained from ticket data we also design search algorithm called variable order multiple active state search vms that generates ticket transfer recommendations based on our model the proposed framework is evaluated on large set of real world problem tickets the results demonstrate that vms significantly improves human decisions problem resolvers can often be identified with fewer ticket transfers
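A first-order sketch of the idea of mining resolution sequences for routing recommendations, with hypothetical group names; the paper's VMS algorithm uses a variable order chosen via conditional entropy and keeps multiple active states, both of which this simplification omits.

from collections import defaultdict

class TransferModel:
    # First-order Markov model over ticket transfer sequences (content is never used).
    def __init__(self):
        self.trans = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # each sequence is the list of expert groups a resolved ticket visited
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.trans[a][b] += 1

    def recommend(self, current_group, visited, k=3):
        nxt = self.trans.get(current_group, {})
        candidates = [(g, c) for g, c in nxt.items() if g not in visited]
        candidates.sort(key=lambda gc: gc[1], reverse=True)
        return [g for g, _ in candidates[:k]]

m = TransferModel()
m.fit([["helpdesk", "db", "storage"], ["helpdesk", "db"], ["helpdesk", "network"]])
print(m.recommend("helpdesk", visited={"helpdesk"}))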
the ability to summarize procedures is fundamental to building scalable interprocedural analyses for sequential programs procedure summarization is well understood and used routinely in variety of compiler optimizations and software defect detection tools however the benefit of summarization is not available to multithreaded programs for which clear notion of summaries has so far remained unarticulated in the research literature in this paper we present an intuitive and novel notion of procedure summaries for multithreaded programs we also present model checking algorithm for these programs that uses procedure summarization as an essential component our algorithm can also be viewed as precise interprocedural dataflow analysis for multithreaded programs our method for procedure summarization is based on the insight that in well synchronized programs any computation of thread can be viewed as sequence of transactions each of which appears to execute atomically to other threads we summarize within each transaction the summary of procedure comprises the summaries of all transactions within the procedure we leverage the theory of reduction to infer boundaries of these transactions the procedure summaries computed by our algorithm allow reuse of analysis results across different call sites in multithreaded program benefit that has hitherto been available only to sequential programs although our algorithm is not guaranteed to terminate on multithreaded programs that use recursion reachability analysis for multithreaded programs with recursive procedures is undecidable there is large class of programs for which our algorithm does terminate we give formal characterization of this class which includes programs that use shared variables synchronization and recursion
in this paper we present approximation results for the class constrained bin packing problem that has applications to video on demand systems in this problem we are given bins of size B with C compartments and items of different classes each item having class and size the problem is to pack the items into bins where each bin contains items of at most C different classes and has total items size at most B we present several approximation algorithms for offline and online versions of the problem
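To make the problem concrete, a simple class-constrained first-fit heuristic under the formulation sketched above (bin capacity B, at most C distinct classes per bin); this is an illustrative baseline, not one of the approximation algorithms analyzed in the paper.

def class_constrained_first_fit(items, bin_size, compartments):
    # items: list of (size, cls). Each item goes into the first open bin that has room
    # and either already holds its class or still has a free compartment.
    bins = []   # each bin: {"free": remaining capacity, "classes": set of classes}
    for size, cls in items:
        for b in bins:
            if size <= b["free"] and (cls in b["classes"] or len(b["classes"]) < compartments):
                b["free"] -= size
                b["classes"].add(cls)
                break
        else:
            bins.append({"free": bin_size - size, "classes": {cls}})
    return bins

packed = class_constrained_first_fit([(3, "a"), (4, "b"), (2, "a"), (5, "c")],
                                     bin_size=10, compartments=2)
print(len(packed), "bins used")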
internet search results are typically displayed as list conforming to static style sheet the difficulty of perusing this list can be exacerbated when screen real estate is limited when space is limited either few results are seen or result descriptions are abbreviated making it difficult to know whether to follow particular web link in this paper we describe wavelens dynamic layout technique for displaying search results which addresses these issues by combining fisheye lens with progressive exposure of page content results from usability study showed that participants performed faster and more accurately on search task with one of two distinct parameter settings of wavelens as compared to the typical static list in post hoc questionnaire participants favored that setting over both the static list and another setting which involved animated zoom we discuss design implications for the retrieval and display of search results
since collaborative filtering cf based recommendation methods rely on neighbors as information sources their performance depends on the quality of neighbor selection process however conventional cf has few fundamental limitations that make them unsuitable for web content services recommender reliability problem and no consideration of customers heterogeneous susceptibility on information sources to overcome these problems we propose new cf method based on the source credibility model in consumer psychology the proposed method extracts each target customer’s part worth on source credibility attributes using conjoint analysis the results of the experiment using the real web usage data verified that the proposed method outperforms the conventional methods in the personalized web content recommendation
the symbolic method for verifying definite iterations over hierarchical data structures without loop invariants is extended to allow tuples of altered data structures and the termination statement which contains condition depending on variables modified by the iteration body transformations of these generalized iterations to the standard ones are proposed and justified technique for generating verification conditions is described the generalization of the symbolic verification method allows us to apply it to pointer programs as case study programs over doubly linked lists are considered program that merges in place ordered doubly linked lists is verified by the symbolic method without loop invariants
online discussion boards are popular form of web based computer mediated communication especially in the areas of distributed education and customer support automatic analysis for discussion understanding would enable better information assessment and assistance this paper describes an extensive study of the relationship between individual messages and full discussion threads we present new approach to classifying discussions using rocchio style classifier with little cost for data labeling in place of labeled data set we employ coarse domain ontology that is automatically induced from canonical text in novel way and use it to build discussion topic profiles we describe new classify by dominance strategy for classifying discussion threads and demonstrate that in the presence of noise it can perform better than the standard classify as whole approach with an error rate reduction of this analysis of human conversation via online discussions provides basis for the development of future information extraction and question answering techniques
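As a rough illustration of the Rocchio-style classification mentioned above, here is a minimal centroid classifier over term-count vectors; the ontology-induced topic profiles of the paper are only approximated here by plain centroids, and all names are illustrative.

# minimal rocchio-style classifier sketch: each topic is represented by a
# centroid of term-count vectors and a message or thread is assigned to the
# most similar centroid by cosine similarity
import math
from collections import Counter, defaultdict

def centroid(docs):
    c = defaultdict(float)
    for d in docs:
        for t, w in Counter(d).items():
            c[t] += w / len(docs)
    return dict(c)

def cosine(u, v):
    dot = sum(u.get(t, 0.0) * w for t, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(doc, profiles):
    vec = dict(Counter(doc))
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))

profiles = {"course": centroid([["exam", "lecture"], ["homework", "lecture"]]),
            "support": centroid([["error", "install"], ["crash", "driver"]])}
print(classify(["lecture", "exam", "question"], profiles))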
current mobile interaction is not well designed with considering mobility usability of mobile service is degraded while on the move since users can not pay enough attention to the service in such dynamic and complicated mobile context in this paper we propose the mobile service design framework which improves the mobility by decreasing the user’s cognitive load our approach provides two interaction modes ie simple interaction mode and normal interaction mode to mobile services so that the user can retrieve important information with less attention moreover the service’s events are simplified to support several modalities and thus the user can be notified in the most suitable way according to the situation in order to evaluate the feasibility of our approach through field experiments we have developed pedestrian navigation service as part of the framework the results showed that the simple interaction mode successfully decreased the user’s attention to the service also future directions for further improvements are discussed based on feedbacks from subjective comments
providing an integrated access to multiple heterogeneous sources is challenging issue in global information systems for cooperation and interoperability in the past companies have equipped themselves with data storing systems building up informative systems containing data that are related one another but which are often redundant not homogeneous and not always semantically consistent moreover to meet the requirements of global internet based information systems it is important that the tools developed for supporting these activities are semi automatic and scalable as much as possible to face the issues related to scalability in the large scale in this paper we propose the exploitation of mobile agents in the information integration area and in particular their integration in the momis infrastructure momis mediator environment for multiple information sources is system that has been conceived as pool of tools to provide an integrated access to heterogeneous information stored in traditional databases for example relational object oriented databases or in file systems as well as in semi structured data sources xml file this proposal has been implemented within the miks mediator agent for integration of knowledge sources system and it is completely described in this paper
this paper presents compiler which produces machine code from functions defined in the logic of theorem prover and at the same time proves that the generated code executes the source functions unlike previously published work on proof producing compilation from theorem prover our compiler provides broad support for user defined extensions targets multiple carefully modelled commercial machine languages and does not require termination proofs for input functions as case study the compiler is used to construct verified interpreters for small lisp like language the compiler has been implemented in the hol theorem prover
with recent advances in mobile technologies and infrastructures there are increasing demands for mobile users to connect to existing collaboration systems this requires extending supports from web browsers on personal computers to sms wap and pdas however in general the capabilities and bandwidth of these mobile devices are significantly inferior to desktop computers over wired connections which have been assumed by most collaboration systems instead of redesigning or adapting collaboration systems in an ad hoc manner for different platforms in connected society we propose methodology of such adaptation based on three tiers of views user interface views data views and process views these views provide customization and help balance security and trust user interface views provide alternative presentations of inputs and outputs data views summarize data over limited bandwidth and display them in different forms furthermore we introduce novel approach of applying process views to mobile collaboration process adaptation where mobile users may execute more concise version or modified procedures the process view also serves as the centric mechanism for integrating user interface views and data views this methodology also discusses ways to support external mobile users who have no agent support customizable degree of agent delegation and the employment of constraint technology for negotiation we demonstrate the feasibility of our methodology by extending web based meeting scheduler into distributed mobile one
we present novel method for procedurally modeling large complex shapes our approach is general purpose and takes as input any polyhedral model provided by user the algorithm exploits the connectivity between the adjacent boundary features of the input model and computes an output model that has similar connected features and resembles the input no additional user input is needed to guide the model generation and the algorithm proceeds automatically in practice our algorithm is simple to implement and can generate variety of complex shapes representing buildings landscapes and fractal shapes in few minutes
transaction processing has emerged as the killer application for commercial servers most servers are engaged in transactional workloads such as processing search requests serving middleware evaluating decisions managing databases and powering online commerce currently commercial servers are built from one or more high performance superscalar processors however commercial server applications exhibit high cache miss rates large memory footprints and low instruction level parallelism ilp which leads to poor utilization on traditional ilp focused superscalar processors in addition these ilp focused processors have been primarily optimized to deliver maximum performance by employing high clock rates and large amounts of speculation as result we are now at the point where the performance per watt of subsequent generations of traditional ilp focused processors on server workloads has been flat or even decreasing the lack of increase in processor performance per watt coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs is leading to situation where total cost of server ownership will soon be predominately determined by power in this paper we argue that attacking thread level parallelism tlp via large number of simple cores on chip multiprocessor cmp leads to much better performance per watt for server workloads as case study we compare sun’s tlp oriented niagara processor against the ilp oriented dual core pentium extreme edition from intel showing that the niagara processor has significant performance per watt advantage for throughput oriented server applications
the concept of new approach for debt portfolio pattern recognition is presented in the paper aggregated prediction of sequential repayment values over time for set of claims is performed by means of hybrid combination of various machine learning techniques including clustering of references model selection and enrichment of input variables with prediction outputs from preceding periods experimental studies on real data revealed usefulness of the proposed approach for claim appraisals the average accuracy was over much higher than for simpler methods
in this paper rough approximations of cayley graphs are studied and rough edge cayley graphs are introduced furthermore new algebraic definition for pseudo cayley graphs containing cayley graphs is proposed and rough approximation is expanded to pseudo cayley graphs in addition rough vertex pseudo cayley graphs and rough pseudo cayley graphs are introduced some theorems are provided from which properties such as connectivity and optimal connectivity are derived this approach opens new research fields such as data networks
in this paper we consider ways in which images collected in the field can be used to support sense making weick’s concept of sense making is applied to the capture of images study is reported in which visitors to an open air museum were asked to take photographs of aspects of the site that they found interesting photographs were taken using bespoke application in which webcam and global positioning system device attached to small tablet computer are used to capture tagged images tagging is supported by the use of simple menu that allows users to classify the images
program verification tools such as model checkers and static analyzers can find many errors in programs these tools need formal specifications of correct program behavior but writing correct specification is difficult just as writing correct program is difficult thus just as we need methods for debugging programs we need methods for debugging specifications this paper describes novel method for debugging formal temporal specifications our method exploits the short program execution traces that program verification tools generate from specification violations and that specification miners extract from programs manually examining these traces is straightforward way to debug specification but this method is tedious and error prone because there may be hundreds or thousands of traces to inspect our method uses concept analysis to automatically group the traces into highly similar clusters by examining clusters instead of individual traces person can debug specification with less work to test our method we implemented tool cable for debugging specifications we have used cable to debug specifications produced by strauss our specification miner we found that using cable to debug these specifications requires on average less than one third as many user decisions as debugging by examining all traces requires in one case using cable required only decisions while debugging by examining all traces required
novel framework based on action recognition feedback for pose reconstruction of articulated human body from monocular images is proposed in this paper the intrinsic ambiguity caused by perspective projection makes it difficult to accurately recover articulated poses from monocular images to alleviate such ambiguity we exploit the high level motion knowledge as action recognition feedback to discard those implausible estimates and generate more accurate pose candidates using large number of motion constraints during natural human movement the motion knowledge is represented by both local and global motion constraints the local spatial constraint captures motion correlation between body parts by multiple relevance vector machines while the global temporal constraint preserves temporal coherence between time ordered poses via manifold motion template experiments on the cmu mocap database demonstrate that our method performs better on estimation accuracy than other methods without action recognition feedback
probabilistic graphical models provide framework for compact representation and efficient reasoning about the joint probability distribution of several interdependent variables this is classical topic with roots in statistical physics in recent years spurred by several applications in unstructured data integration sensor networks image processing bio informatics and code design the topic has received renewed interest in the machine learning data mining and database communities techniques from graphical models have also been applied to many topics directly of interest to the database community including information extraction sensor data analysis imprecise data representation and querying selectivity estimation for query optimization and data privacy as database research continues to expand beyond the confines of traditional enterprise domains we expect both the need and applicability of probabilistic graphical models to increase dramatically over the next few years with this tutorial we are aiming to provide foundational overview of probabilistic graphical models to the database community accompanied by brief overview of some of the recent research literature on the role of graphical models in databases
in this paper we examine an emerging class of systems that link people to people to geographical places we call these systems through analyzing the literature we have identified four major system design techniques people centered systems that use either absolute user location eg active badge or user proximity eg hocman and place centered systems based on either representation of people’s use of physical spaces eg activemap or on matching virtual space that enables online interaction linked to physical location eg geonotes in addition each feature can be instantiated synchronously or asynchronously the system framework organizes existing systems into meaningful categories and structures the design space for an interesting new class of potentially context aware systems our discussion of the framework suggests new ways of understanding and addressing the privacy concerns associated with location aware community system and outlines additional socio technical challenges and opportunities
model driven development mdd is an emerging paradigm for software construction that uses models to specify programs and model transformations to synthesize executables feature oriented programming fop is paradigm for software product lines where programs are synthesized by composing features feature oriented model driven development fomdd is blend of fop and mdd that shows how products in software product line can be synthesized in an mdd way by composing features to create models and then transforming these models into executables we present case study of fomdd on product line of portlets which are components of web portals we reveal mathematical properties of portlet synthesis that helped us to validate the correctness of our abstractions tools and specifications as well as optimize portlet synthesis
relational databases provide the ability to store user defined functions and predicates which can be invoked in sql queries when evaluation of user defined predicate is relatively expensive the traditional method of evaluating predicates as early as possible is no longer sound heuristic there are two previous approaches for optimizing such queries however neither is able to guarantee the optimal plan over the desired execution space we present efficient techniques that are able to guarantee the choice of an optimal plan over the desired execution space the optimization algorithm with complete rank ordering improves upon the naive optimization algorithm by exploiting the nature of the cost formulas for join methods and is polynomial in the number of user defined predicates for given number of relations we also propose pruning rules that significantly reduce the cost of searching the execution space for both the naive algorithm as well as for the optimization algorithm with complete rank ordering without compromising optimality we also propose conservative local heuristic that is simpler and has low optimization overhead although it is not always guaranteed to find the optimal plans it produces close to optimal plans in most cases we discuss how depending on application requirements to determine the algorithm of choice it should be emphasized that our optimization algorithms handle user defined selections as well as user defined join predicates uniformly we present complexity analysis and experimental comparison of the algorithms
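The abstract's complete rank ordering suggests the classic rank metric for expensive predicates, rank = (selectivity - 1) / per-tuple cost, with predicates applied in ascending rank order; the sketch below shows only that ordering step under this assumption, not the paper's full join-aware optimization algorithm.

# minimal sketch: ordering expensive user-defined selections by the classic
# rank metric rank = (selectivity - 1) / cost_per_tuple, applying the
# predicate with the smallest rank first
def order_by_rank(predicates):
    """predicates: list of (name, selectivity, cost_per_tuple) tuples."""
    return sorted(predicates, key=lambda p: (p[1] - 1.0) / p[2])

preds = [("cheap_selective", 0.1, 1.0),
         ("expensive_selective", 0.1, 100.0),
         ("cheap_unselective", 0.9, 1.0)]
for name, sel, cost in order_by_rank(preds):
    print(name, (sel - 1.0) / cost)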
spreadsheets are the most popular programming systems in use today since spreadsheets are visual first order functional languages research into the foundations of spreadsheets is therefore highly relevant topic for the principles and in particular the practice of declarative programming since the error rate in spreadsheets is very high and since those errors have significant impact methods and tools that can help detect and remove errors from spreadsheets are very much needed type systems have traditionally played strong role in detecting errors in programming languages and it is therefore reasonable to ask whether type systems could not be helpful in improving the current situation of spreadsheet programming in this paper we introduce type system and type inference algorithm for spreadsheets and demonstrate how this algorithm and the underlying typing concept can identify programming errors in spreadsheets in addition we also demonstrate how the type inference algorithm can be employed to infer models or specifications for spreadsheets which can be used to prevent future errors in spreadsheets
support vector machines svm offer theoretically well founded approach to automated learning of pattern classifiers they have been proven to give highly accurate results in complex classification problems for example gene expression analysis the svm algorithm is also quite intuitive with few inputs to vary in the fitting process and several outputs that are interesting to study for many data mining tasks eg cancer prediction finding classifiers with good predictive accuracy is important but understanding the classifier is equally important by studying the classifier outputs we may be able to produce simpler classifier learn which variables are the important discriminators between classes and find the samples that are problematic to the classification visual methods for exploratory data analysis can help us to study the outputs and complement automated classification algorithms in data mining we present the use of tour based methods to plot aspects of the svm classifier this approach provides insights about the cluster structure in the data the nature of boundaries between clusters and problematic outliers furthermore tours can be used to assess the variable importance we show how visual methods can be used as complement to cross validation methods in order to find good svm input parameters for particular data set
nanoscale systems on chip will integrate billion gate designs the challenge is to find scalable hw sw design style for future cmos technologies tiled architectures suggest possible path small processing tiles connected by short wires typical shapes tile contains vliw floating point dsp risc dnp distributed network processor distributed on chip memory the pot set of peripherals on tile plus an interface for dxm distributed external memory the shapes routing fabric connects on chip and off chip tiles weaving distributed packet switching network next neighbours engineering methodologies is adopted for off chip networking and maximum system density the sw challenge is to provide simple and efficient programming environment for tiled architectures shapes will investigate layered system software which does not destroy algorithmic and distribution info provided by the programmer and is fully aware of the hw paradigm for efficiency and qos the system sw manages intra tile and inter tile latencies bandwidths computing resources using static and dynamic profiling the sw accesses the on chip and off chip networks through homogeneous interface
automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster cheaper and more user centered however the relationship between observable user behavior and retrieval quality is not yet fully understood we present sequence of studies investigating this relationship for an operational search engine on the arxiv.org e-print archive we find that none of the eight absolute usage metrics we explore eg number of clicks frequency of query reformulations abandonment reliably reflect retrieval quality for the sample sizes we consider however we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions in particular we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs and we find that both give accurate and consistent results we conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain
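One widely used paired comparison test on interleaved rankings is team-draft interleaving; the sketch below shows that scheme as a plausible stand-in, since the abstract does not specify which interleaving variants were used.

# minimal sketch of team-draft interleaving for comparing two rankings from
# click data; credit() counts clicks per contributing ranking
import random

def team_draft_interleave(ranking_a, ranking_b, length=10, rng=random):
    interleaved, team, seen = [], {}, set()
    while len(interleaved) < length:
        # a coin flip decides which ranking picks first in this round
        order = ["a", "b"] if rng.random() < 0.5 else ["b", "a"]
        for side in order:
            source = ranking_a if side == "a" else ranking_b
            pick = next((d for d in source if d not in seen), None)
            if pick is None:
                continue
            seen.add(pick)
            interleaved.append(pick)
            team[pick] = side
            if len(interleaved) >= length:
                break
        if all(d in seen for d in ranking_a) and all(d in seen for d in ranking_b):
            break
    return interleaved, team

def credit(clicked_docs, team):
    wins = {"a": 0, "b": 0}
    for d in clicked_docs:
        if d in team:
            wins[team[d]] += 1
    return wins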
topic detection and tracking tdt is research initiative that aims at techniques to organize news documents in terms of news events we propose method that incorporates simple semantics into tdt by splitting the term space into groups of terms that have the meaning of the same type such group can be associated with an external ontology this ontology is used to determine the similarity of two terms in the given group we extract proper names locations temporal expressions and normal terms into distinct sub vectors of the document representation measuring the similarity of two documents is conducted by comparing pair of their corresponding sub vectors at time we use simple perceptron to optimize the relative emphasis of each semantic class in the tracking and detection decisions the results suggest that the spatial and the temporal similarity measures need to be improved especially the vagueness of spatial and temporal terms needs to be addressed
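A minimal sketch of the split term space idea: each document is represented by one sub-vector per semantic class and the overall similarity is a weighted sum of per-class cosine similarities; the class names and the uniform weights below are placeholders (the paper learns the weights with a perceptron).

# per-class sub-vector similarity sketch for event tracking
import math
from collections import Counter

CLASSES = ["names", "locations", "times", "terms"]

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def doc_vector(tokens_by_class):
    return {c: Counter(tokens_by_class.get(c, [])) for c in CLASSES}

def similarity(d1, d2, weights):
    return sum(weights[c] * cosine(d1[c], d2[c]) for c in CLASSES)

d1 = doc_vector({"names": ["smith"], "locations": ["helsinki"], "terms": ["flood"]})
d2 = doc_vector({"names": ["smith"], "locations": ["espoo"], "terms": ["flood", "rain"]})
print(similarity(d1, d2, {c: 0.25 for c in CLASSES}))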
query expansion is well known method for improving average effectiveness in information retrieval the most effective query expansion methods rely on retrieving documents which are used as source of expansion terms retrieving those documents is costly we examine the bottlenecks of conventional approach and investigate alternative methods aimed at reducing query evaluation time we propose new method that draws candidate terms from brief document summaries that are held in memory for each document while approximately maintaining the effectiveness of the conventional approach this method significantly reduces the time required for query expansion by factor of
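A minimal sketch, assuming each document has a short pre-computed summary held in memory: expansion terms are drawn from the summaries of the top ranked documents and scored by a plain frequency count, which only approximates the paper's method.

# summary-based query expansion sketch
from collections import Counter

def expand_query(query_terms, ranked_doc_ids, summaries, top_docs=10, n_terms=5):
    """summaries: dict doc_id -> list of summary terms kept in memory."""
    counts = Counter()
    for doc_id in ranked_doc_ids[:top_docs]:
        counts.update(t for t in summaries.get(doc_id, []) if t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion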
recent micro architectural research has proposed various schemes to enhance processors with additional tags to track various properties of program such technique which is usually referred to as information flow tracking has been widely applied to secure software execution eg taint tracking protect software privacy and improve performance eg control speculation in this paper we propose novel use of information flow tracking to obfuscate the whole control flow of program with only modest performance degradation to defeat malicious code injection discourage software piracy and impede malware analysis specifically we exploit two common features in information flow tracking the architectural support for automatic propagation of tags and violation handling of tag misuses unlike other schemes that use tags as oracles to catch attacks eg taint tracking or speculation failures we use the tags as flow sensitive predicates to hide normal control flow transfers the tags are used as predicates for control flow transfers to the violation handler where the real control flow transfer happens we have implemented working prototype based on itanium processors by leveraging the hardware support for control speculation experimental results show that bosh can obfuscate the whole control flow with only mean of ranging from to overhead on specint the increase in code size and compilation time is also modest
we describe an algorithm for rendering animated smoke particle systems in cartoon style this style includes outlines and cel shading for efficient self shadowing effects we introduce nailboard shadow volumes that create complex shadows from few polygons including shadows the renderer draws only three polygons per particle and entirely avoids the latency of depth buffer readback we combine the renderer with fast simulator that generates the phenomenology of real smoke but has artistically controllable parameters together they produce real time interactive smoke animations at over fps thus our algorithm is well suited to applications where interactive performance and expressive power are preferred to realism for example video games and rapid development of animation
this work deals with automatic lexical acquisition and topic discovery from speech stream the proposed algorithm builds lexicon enriched with topic information in three steps transcription of an audio stream into phone sequences with speaker and task independent phone recogniser automatic lexical acquisition based on approximate string matching and hierarchical topic clustering of the lexical entries based on knowledge poor co occurrence approach the resulting semantic lexicon is then used to automatically cluster the incoming speech stream into topics the main advantages of this algorithm are its very low computational requirements and its independence to pre defined linguistic resources which makes it easy to port to new languages and to adapt to new tasks it is evaluated both qualitatively and quantitatively on two corpora and on two tasks related to topic clustering the results of these evaluations are encouraging and outline future directions of research for the proposed algorithm such as building automatic orthographic labels of the lexical items
many real world classification applications fall into the class of positive and unlabeled pu learning problems in many such applications not only could the negative training examples be missing the number of positive examples available for learning may also be fairly limited due to the impracticality of hand labeling large number of training examples current pu learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set in this paper we address the oft overlooked pu learning problem when the number of training examples in the positive set is small we propose novel technique lplp learning from probabilistically labeled positive examples and apply the approach to classify product pages from commercial websites the experimental results demonstrate that our approach outperforms existing methods significantly even in the challenging cases where the positive examples in and the hidden positive examples in were not drawn from the same distribution
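The sketch below is a generic PU-learning heuristic, not the paper's lplp technique: unlabeled examples least similar to the centroid of the small positive set are treated as likely negatives that could seed an ordinary classifier.

# generic positive/unlabeled heuristic: extract likely negatives by low
# cosine similarity to the positive centroid
import math

def dot(u, v):
    return sum(u[i] * v[i] for i in range(len(u)))

def norm(u):
    return math.sqrt(dot(u, u)) or 1.0

def likely_negatives(positives, unlabeled, fraction=0.2):
    dim = len(positives[0])
    centroid = [sum(x[i] for x in positives) / len(positives) for i in range(dim)]
    scored = sorted(unlabeled,
                    key=lambda x: dot(x, centroid) / (norm(x) * norm(centroid)))
    return scored[: max(1, int(fraction * len(scored)))]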
in this paper the scheduled dataflow sdf architecture decoupled memory execution multithreaded architecture using nonblocking threads is presented in detail and evaluated against superscalar architecture recent focus in the field of new processor architectures is mainly on vliw eg ia superscalar and superspeculative designs this trend allows for better performance but at the expense of increased hardware complexity and possibly higher power expenditures resulting from dynamic instruction scheduling our research deviates from this trend by exploring simpler yet powerful execution paradigm that is based on dataflow and multithreading program is partitioned into nonblocking execution threads in addition all memory accesses are decoupled from the thread’s execution data is preloaded into the thread’s context registers and all results are poststored after the completion of the thread’s execution while multithreading and decoupling are possible with control flow architectures sdf makes it easier to coordinate the memory accesses and execution of thread as well as eliminate unnecessary dependencies among instructions we have compared the execution cycles required for programs on sdf with the execution cycles required by programs on simplescalar superscalar simulator by considering the essential aspects of these architectures in order to have fair comparison the results show that sdf architecture can outperform the superscalar sdf performance scales better with the number of functional units and allows for good exploitation of thread level parallelism tlp and available chip area
we present method for interactive rendering of large outdoor scenes complex polygonal plant models and whole plant populations are represented by relatively small sets of point and line primitives this enables us to show landscapes faithfully using only limited percentage of primitives in addition hierarchical data structure allows us to smoothly reduce the geometrical representation to any desired number of primitives the scene is hierarchically divided into local portions of geometry to achieve large reduction factors for distant regions additionally the data reduction is adapted to the visual importance of geometric objects this allows us to maintain the visual fidelity of the representation while reducing most of the geometry drastically with our system we are able to interactively render very complex landscapes with good visual quality
navigational patterns have applications in several areas including web personalization recommendation user profiling and clustering etc most existing works on navigational pattern discovery give little consideration to the effects of time or temporal trends on navigational patterns some recent works have proposed frameworks for partial temporal representation of navigational patterns this paper proposes framework that models navigational patterns as full temporal objects that may be represented as time series such representation allows rich array of analysis techniques to be applied to the data the proposed framework also enhances the understanding and interpretation of discovered patterns and provides rich environment for integrating the analysis of navigational patterns with data from the underlying organizational environments and other external factors such integrated analysis is very helpful in understanding navigational patterns eg commerce sites may integrate the trend analysis of navigational patterns with other market data and economic indicators to achieve full temporal representation this paper proposes navigational pattern discovery technique that is not based on pre defined thresholds this is shift from existing techniques that are driven by pre defined thresholds that can only support partial temporal representation of navigational patterns
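A minimal sketch of treating a navigational pattern as a full temporal object: count, per time window, the sessions whose click sequence contains the pattern, yielding a time series that standard trend analysis can consume; window size and data layout are illustrative.

# build a time series for one navigational pattern
from collections import defaultdict

def pattern_time_series(sessions, pattern, window_size):
    """sessions: list of (timestamp, page_sequence); pattern: tuple of pages.
    returns dict window_index -> number of sessions containing the pattern."""
    def contains(seq, pat):
        return any(tuple(seq[i:i + len(pat)]) == pat
                   for i in range(len(seq) - len(pat) + 1))
    series = defaultdict(int)
    for ts, seq in sessions:
        if contains(seq, pattern):
            series[int(ts // window_size)] += 1
    return dict(series)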
in recent years researchers in graph mining have been exploring linear paths as well as subgraphs as pattern languages in this paper we are investigating the middle ground between these two extremes mining free that is unrooted trees in graph data the motivation for this is the need to upgrade linear path patterns while avoiding complexity issues with subgraph patterns starting from such complexity considerations we are defining free trees and their canonical form before we present freetreeminer an algorithm making efficient use of this canonical form during search experiments with two datasets from the national cancer institute’s developmental therapeutics program dtp anti hiv and anti cancer screening data are reported
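A minimal sketch of a canonical form for free (unrooted, unlabeled) trees: root the tree at its center, encode subtrees with sorted child encodings, and take the smaller string when there are two centers; this illustrates the canonical-form ingredient only, not the freetreeminer search.

# canonical string for free trees via centers and sorted subtree encodings
def tree_centers(adj):
    nodes = set(adj)
    degree = {v: len(adj[v]) for v in nodes}
    leaves = [v for v in nodes if degree[v] <= 1]
    remaining = len(nodes)
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    new_leaves.append(nb)
            degree[leaf] = 0
        leaves = new_leaves
    return leaves  # one or two centers

def encode(adj, root, parent=None):
    children = sorted(encode(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(children) + ")"

def canonical_form(adj):
    return min(encode(adj, c) for c in tree_centers(adj))

# a path 1-2-3 and the same path relabelled get identical canonical strings
adj1 = {1: [2], 2: [1, 3], 3: [2]}
adj2 = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
print(canonical_form(adj1) == canonical_form(adj2))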
parallax is distributed storage system that uses virtualization to provide storage facilities specifically for virtual environments the system employs novel architecture in which storage features that have traditionally been implemented directly on high end storage arrays and switches are relocated into federation of storage vms sharing the same physical hosts as the vms that they serve this architecture retains the single administrative domain and os agnosticism achieved by array and switch based approaches while lowering the bar on hardware requirements and facilitating the development of new features parallax offers comprehensive set of storage features including frequent low overhead snapshot of virtual disks the gold mastering of template images and the ability to use local disks as persistent cache to dampen burst demand on networked storage
variant of iterative learning in the limit cf lz is studied when learner gets negative examples refuting conjectures containing data in excess of the target language and uses additional information of the following four types memorizing up to input elements seen so far up to feedback memberships queries testing if an item is member of the input seen so far the number of input elements seen so far the maximal element of the input seen so far we explore how additional information available to such learners defined and studied in jk may help in particular we show that adding the maximal element or the number of elements seen so far helps such learners to infer any indexed class of languages class preservingly using descriptive numbering defining the class as it is proved in jk this is not possible without using additional information we also study how in the given context different types of additional information fare against each other and establish hierarchies of learners memorizing versus input elements seen and versus feedback membership queries
many of the recently proposed techniques to reduce power consumption in caches introduce an additional level of non determinism in cache access latency due to this additional latency instructions dependent on load speculatively issued must be squashed and re issued as they will not have the correct data in time our experiments show that there is large performance degradation and associated dynamic energy wastage due to these effects of instruction squashing to address this problem we propose an early cache set resolution scheme our experimental evaluation shows that this technique is quite effective in mitigating the problem
this study investigates variational multiphase image segmentation method which combines the advantages of graph cut discrete optimization and multiphase piecewise constant image representation the continuous region parameters serve both image representation and graph cut labeling the algorithm iterates two consecutive steps an original closed form update of the region parameters and partition update by graph cut labeling using the region parameters the number of regions labels can decrease from an initial value thereby relaxing the assumption that the number of regions is known beforehand the advantages of the method over others are shown in several comparative experiments using synthetic and real images of intensity and motion
analysis on dataset of scanned surfaces have presented problems because of incompleteness on the surfaces and because of variances in shape size and pose in this paper high resolution generic model is aligned to data in the civilian american and european surface anthropometry resources caesar database in order to obtain consistent parameterization radial basis function rbf network is built for rough deformation by using landmark information from the generic model anatomical landmarks provided by caesar dataset and virtual landmarks created automatically for geometric deformation fine mapping then successfully applies weighted sum of errors on both surface data and the smoothness of deformation compared with previous methods our approach makes robust alignment in higher efficiency this consistent parameterization also makes it possible for principal components analysis pca on the whole body as well as human body segments our analysis on segmented bodies displays richer variation than that of the whole body this analysis indicates that wider application of human body reconstruction with segments is possible in computer animation
this paper deals with the problem of safety verification of nonlinear hybrid systems we start from classical method that uses interval arithmetic to check whether trajectories can move over the boundaries in rectangular grid we put this method into an abstraction refinement framework and improve it by developing an additional refinement step that employs interval constraint propagation to add information to the abstraction without introducing new grid elements moreover the resulting method allows switching conditions initial states and unsafe states to be described by complex constraints instead of sets that correspond to grid elements nevertheless the method can be easily implemented since it is based on well defined set of constraints on which one can run any constraint propagation based solver tests of such an implementation are promising
in the realm of component based software systems pursuers of the holy grail of automated application composition face many significant challenges in this paper we argue that while the general problem of automated composition in response to high level goal statements is indeed very difficult to solve we can realize composition in restricted context supporting varying degrees of manual to automated assembly for specific types of applications we propose novel paradigm for composition in flow based information processing systems where application design and component development are facilitated by the pervasive use of faceted tag based descriptions of processing goals of component capabilities and of structural patterns of families of application the facets and tags represent different dimensions of both data and processing where each facet is modeled as finite set of tags that are defined in controlled folksonomy all data flowing through the system as well as the functional capabilities of components are described using tags customized ai planner is used to automatically build an application in the form of flow of components given high level goal specification in the form of set of tags end users use an automatically populated faceted search and navigation mechanism to construct these high level goals we also propose novel software engineering methodology to design and develop set of reusable well described components that can be assembled into variety of applications with examples from case study in the financial services domain we demonstrate that composition using faceted tag based application design is not only possible but also extremely useful in helping end users create situational applications from wide variety of available components
we consider an optimization technique for deductive and relational databases the optimization technique is an extension of the magic templates rewriting and it can improve the performance of query evaluation by not materializing the extension of intermediate views standard relational techniques such as unfolding embedded view definitions do not apply to recursively defined views and so alternative techniques are necessary we demonstrate the correctness of our rewriting we define class of nonrepeating view definitions and show that for certain queries our rewriting performs at least as well as magic templates on nonrepeating views and often much better syntactically recognizable property called weak right linearity is proposed weak right linearity is sufficient condition for nonrepetition and is more general than right linearity our technique gives the same benefits as right linear evaluation of right linear views while applying to significantly more general class of views
commodity graphics hardware has become increasingly programmable over the last few years but has been limited to fixed resource allocation these architectures handle some workloads well others poorly load balancing to maximize graphics hardware performance has become critical issue in this paper we explore one solution to this problem using compile time resource allocation for our experiments we implement graphics pipeline on raw tile based multicore processor we express both the full graphics pipeline and the shaders using streamit high level language based on the stream programming model the programmer specifies the number of tiles per pipeline stage and the streamit compiler maps the computation to the raw architecturewe evaluate our reconfigurable architecture using mix of common rendering tasks with different workloads and improve throughput by over static allocation although our early prototype cannot compete in performance against commercial state of the art graphics processors we believe that this paper describes an important first step in addressing the load balancing challenge
this paper describes the establishment of an xml metadata knowledge base xmkb to assist integration of distributed heterogeneous structured data residing in relational databases and semi structured data held in well formed xml documents xml documents that conform to the xml syntax rules but have no referenced dtd or xml schema produced by internet applications we propose an approach to combine and query the data sources through mediation layer such layer is intended to establish and evolve an xmkb incrementally to assist the query processor to mediate between user queries posed over the master view and the distributed heterogeneous data sources the xmkb is built in bottom up fashion by extracting and merging incrementally the metadata of the data sources the xmkb is introduced to maintain the data source information names types and locations meta information about relationships of paths among data sources and function names for handling semantic and structural discrepancies system to integrate structured and semi structured databases sissd has been built that generates tool for meta user who does the metadata integration to describe mappings between the master view and local data sources by assigning index numbers and specifying conversion function names this system is flexible users can get any master view from the same set of data sources depending on their interest it also preserves local autonomy of the local data sources the sissd uses the local as view approach to map between the master view and the local schema structures this approach is well suited to supporting dynamic environment where data sources can be added to or removed from the system without the need to restructure the master view and to regenerate the xmkb from scratch
the propagation speed of fast scanning worms and the stealthy nature of slow scanning worms present unique challenges to intrusion detection typically techniques optimized for detection of fast scanning worms fail to detect slow scanning worms and vice versa in practice there is interest in developing an integrated approach to detecting both classes of worms in this paper we propose and analyze unique integrated detection approach capable of detecting and identifying traffic flow responsible for simultaneous fast and slow scanning malicious worm attacks the approach uses combination of evidence from distributed host based anomaly detectors self adapting profiler and bayesian inference from network heuristics to detect intrusion activity due to both fast and slow scanning worms we assume that the extreme nature of fast scanning worm epidemics make them well suited for extreme value theory and use sample mean excess function to determine appropriate thresholds for detection of such worms random scanning worm behavior is considered in analyzing the stochastic time intervals that affect behavior of the detection technique based on the analysis probability model for worm detection interval using the detection scheme was developed simulations are used to validate our assumptions and analysis
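A minimal sketch of the sample mean excess function from extreme value theory that the abstract uses for threshold selection: e(u) is the average exceedance over a candidate threshold u; the scan-count data below are made up for illustration.

# sample mean excess function e(u) = mean of (x - u) over observations x > u
def mean_excess(samples, threshold):
    exceed = [x - threshold for x in samples if x > threshold]
    return sum(exceed) / len(exceed) if exceed else 0.0

def mean_excess_curve(samples, thresholds):
    return [(u, mean_excess(samples, u)) for u in thresholds]

scan_rates = [3, 4, 5, 5, 6, 7, 9, 12, 20, 55]  # hypothetical per-host scan counts
print(mean_excess_curve(scan_rates, [5, 10, 20]))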
theoretical framework is presented to study the consistency of robust estimators used in vision problems involving extraction of fine details strong correlation between asymptotic performance of robust estimator and the asymptotic bias of its scale estimate is mathematically demonstrated where the structures are assumed to be linear corrupted by gaussian noise new measure for the inconsistency of scale estimators is defined and formulated by deriving the functional forms of four recent high breakdown robust estimators for each estimator the inconsistency measures are numerically evaluated for range of mutual distances between structures and inlier ratios and the minimum mutual distance between the structures for which each estimator returns non bridging fit is calculated
we model flames and fire using the navier stokes equations combined with the level set method and jump conditions to model the reaction front previous works modeled the flame using combination of propagation in the normal direction and curvature term which leads to level set equation that is parabolic in nature and thus overly dissipative and smooth asymptotic theory shows that one can obtain more interesting velocities and fully hyperbolic as opposed to parabolic equations for the level set evolution in particular researchers in the field of detonation shock dynamics dsd have derived set of equations which exhibit characteristic cellular patterns we show how to make use of the dsd framework in the context of computer graphics simulations of flames and fire to obtain interesting features such as flame wrinkling and cellular patterns
bitmap indexes are known to be efficient for ad hoc range queries that are common in data warehousing and scientific applications however they suffer from the curse of cardinality that is their efficiency deteriorates as attribute cardinalities increase number of strategies have been proposed but none of them addresses the problem adequately in this paper we propose novel binned bitmap index that greatly reduces the cost to answer queries and therefore breaks the curse of cardinality the key idea is to augment the binned index with an order preserving bin based clustering orbic structure this data structure significantly reduces the operations needed to resolve records that can not be resolved with the bitmaps to further improve the proposed index structure we also present strategy to create single valued bins for frequent values this strategy reduces index sizes and improves query processing speed overall the binned indexes with orbic greatly improve the query processing speed and are times faster than the best available indexes for high cardinality data
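A minimal sketch of a binned bitmap index answering a range query: fully covered bins are taken straight from the bitmaps and only the two edge bins trigger a candidate check against the base data; the orbic clustering that the paper adds to accelerate that check is not reproduced here.

# binned bitmap index sketch with candidate checks on edge bins
class BinnedBitmapIndex:
    def __init__(self, values, bin_edges):
        # bin_edges: sorted list; bin i covers [bin_edges[i], bin_edges[i+1])
        self.values = values
        self.edges = bin_edges
        self.bitmaps = [set() for _ in range(len(bin_edges) - 1)]
        for rid, v in enumerate(values):
            self.bitmaps[self._bin_of(v)].add(rid)

    def _bin_of(self, v):
        for i in range(len(self.edges) - 1):
            if self.edges[i] <= v < self.edges[i + 1]:
                return i
        return len(self.edges) - 2

    def range_query(self, lo, hi):  # records with lo <= value < hi
        result = set()
        for i, bitmap in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[i], self.edges[i + 1]
            if lo <= b_lo and b_hi <= hi:          # bin fully inside: no check
                result |= bitmap
            elif b_hi > lo and b_lo < hi:          # edge bin: check candidates
                result |= {rid for rid in bitmap if lo <= self.values[rid] < hi}
        return result

idx = BinnedBitmapIndex([1, 7, 3, 9, 4, 6], [0, 5, 10])
print(sorted(idx.range_query(3, 8)))  # -> [1, 2, 4, 5]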
desktop grid computing is relatively new technology that can provide massive computing power for variety of applications at low cost these applications may use volunteered computing resources effectively if they have enough mass appeal to obtain large number of users alternatively they can be used as distributed computing application within corporation intranet or worldwide distributed research group the berkeley open infrastructure for network computing boinc provides proven open source infrastructure to set up such projects in relatively short time in this paper we survey scientific applications that have adopted this type of computing paradigm
introducing node mobility into the network also introduces new anonymity threats this important change of the concept of anonymity has recently attracted attention in mobile wireless security research this paper presents identity free routing and on demand routing as two design principles of anonymous routing in mobile ad hoc networks we devise anodr anonymous on demand routing as the needed anonymous routing scheme that is compliant with the design principles our security analysis and simulation study verify the effectiveness and efficiency of anodr
we use linproc ie typed process calculus based on the calculus of solos in order to express computational processes generated by slpcf namely simple programming language conceived in order to program only linear functions we define faithful translation of slpcf on linproc which enables us to process redexes of slpcf in parallel way afterward we prove that suitable observational equivalence between processes is correct wrt the operational semantics of slpcf via our interpretation
java virtual machine jvm crashes are often due to an invalid memory reference to the jvm heap before the bug that caused the invalid reference can be fixed its location must be identified it can be in either the jvm implementation or the native library written in invoked from java applications to help system engineers identify the location we implemented feature using page protection that prevents threads executing native methods from referring to the jvm heap this feature protects the jvm heap during native method execution if the heap is referred to invalidly it interrupts the execution by generating page fault exception it then reports the location where the exception was generated the runtime overhead for using this feature depends on the frequency of native method calls because the protection is switched on each time native method is called we evaluated the runtime overhead by running the specjvm specjbb volanomark and jfcmark benchmark suites on pc with two intel xeon ghz processors the performance loss was less than for the benchmark items that do not call native methods so frequently times per second and for the benchmark items that do times per second the worst performance loss was which was recorded for benchmark item that calls native methods times per second
to enable multimedia broadcasting services in mesh networks it is critical to optimize the broadcast traffic load traditionally users associate with access points aps with the strongest signal strength we explore the concept of dual association where the ap for unicast traffic and the ap for broadcast traffic are independently chosen by exploiting overlapping coverages that are typical in mesh networks the goal of our proposed solution is to optimize the overall network load by exploiting the flexibility provided by independent selection of unicast and broadcast aps we propose novel cost metric based on ett expected transmission time and the number of nodes in range of the aps that are advertised in the beacons from the aps users periodically scan and associate with the ap which has the lowest cost metric the proposed approach reduces the number of aps that handle the broadcast traffic resulting in heavy reduction in control and data packet overhead this leads to higher packet delivery rate and enhanced video quality measured in terms of psnr our approach allows the freed up resources at aps to increase the unicast throughput we compare the performance of our approach with traditional signal strength based association using extensive simulations and real experiments on an indoor testbed of ieee based devices
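A minimal sketch of dual association on the broadcast side, assuming each AP beacon advertises an ett value and the number of nodes already in its range; the exact cost formula below (ett divided by node count) is an assumption, not necessarily the metric used in the paper.

# choose the broadcast AP with the lowest assumed cost metric
def choose_broadcast_ap(candidates, alpha=1.0, beta=1.0):
    """candidates: list of dicts with 'ap', 'ett', 'nodes_in_range'
    (values assumed to be advertised in AP beacons)."""
    def cost(c):
        # lower ett is better; nodes already served by the AP amortize the
        # broadcast, so divide by the node count (illustrative formula)
        return alpha * c["ett"] / (beta * max(1, c["nodes_in_range"]))
    return min(candidates, key=cost)["ap"]

aps = [{"ap": "ap1", "ett": 2.0, "nodes_in_range": 3},
       {"ap": "ap2", "ett": 1.5, "nodes_in_range": 1}]
print(choose_broadcast_ap(aps))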
grids consist of the aggregation of numerous dispersed computational storage and network resources able to satisfy even the most demanding computing jobs due to the data intensive nature of grid jobs there is an increasing interest in grids using optical transport networks as this technology allows for the timely delivery of large amounts of data such grids are commonly referred to as lambda grids an important aspect of grid deployment is the allocation and activation of installed network capacity needed to transfer data and jobs to and from remote resources however the exact nature of grid’s network traffic depends on the way arriving workload is scheduled over the various grid sites as grids possibly feature high numbers of resources jobs and users solving the combined grid network dimensioning and workload scheduling problem requires the use of scalable mathematical methods such as divisible load theory dlt lambda grids feature additional complexity such as wavelength granularity and continuity or conversion constraints must be enforced additionally grid resources cannot be expected to be available at all times therefore the extra complexity of resilience against possible resource failures must be taken into account when modelling the combined grid network dimensioning and workload scheduling problem enforcing the need for scalable solution methods in this work we tackle the lambda grid combined dimensioning and workload scheduling problem and incorporate single resource failure or unavailability scenarios we use divisible load theory to tackle the scalability problem and compare non resilient lambda grid dimensioning to the dimensions needed to survive single resource failures we distinguish three failure scenarios relevant to lambda grid deployment computational element network link and optical cross connect failure using regular network topologies we derive analytical bounds on the dimensioning cost to validate these bounds we present comparisons for the resulting grid dimensions assuming tier grid operation as function of varying wavelength granularity fiber wavelength cost models traffic demand asymmetry and grid scheduling strategy for specific set of optical transport networks
this paper presents case study in the formal specification and verification of smart card application the application is an electronic purse implementation developed by the smart card producer gemplus as test case for formal methods for smart cards it has been annotated by the authors with specifications using the java modeling language jml language designed to specify the functional behavior of java classes the reason for using jml as specification language is that several tools are available to check parts of the specification wrt an implementation these tools vary in their level of automation and in the level of correctness they ensure several of these tools have been used for the gemplus case study we discuss how the usage of these different tools is complementary large parts of the specification can be checked automatically while more precise verification methods can be used for the more intricate parts of the specification and implementation we believe that having such range of tools available for single specification language is an important step towards the acceptance of formal methods in industry
spatial collocation patterns associate the co existence of non spatial features in spatial neighborhood an example of such pattern can associate contaminated water reservoirs with certain diseases in their spatial neighborhood previous work on discovering collocation patterns converts neighborhoods of feature instances to itemsets and applies mining techniques for transactional data to discover the patterns we propose method that combines the discovery of spatial neighborhoods with the mining process our technique is an extension of spatial join algorithm that operates on multiple inputs and counts long pattern instances as demonstrated by experimentation it yields significant performance improvements compared to previous approaches
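A minimal sketch of the counting step: two feature instances of different types contribute to a collocation pair when they lie within a neighborhood radius; a real implementation would use the paper's multi-way spatial join rather than this quadratic scan.

# count co-located feature pairs within a neighborhood radius
import math
from collections import Counter

def colocation_pairs(instances, radius):
    """instances: list of (feature_label, x, y)."""
    counts = Counter()
    for i in range(len(instances)):
        fi, xi, yi = instances[i]
        for j in range(i + 1, len(instances)):
            fj, xj, yj = instances[j]
            if fi != fj and math.hypot(xi - xj, yi - yj) <= radius:
                counts[tuple(sorted((fi, fj)))] += 1
    return counts

data = [("reservoir", 0, 0), ("disease", 1, 0), ("disease", 5, 5), ("school", 0.5, 0.5)]
print(colocation_pairs(data, 1.5))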
increasing software content in embedded systems and socs drives the demand to automatically synthesize software binaries from abstract models this is especially critical for hardware dependent software hds due to the tight coupling in this paper we present our approach to automatically synthesize hds from an abstract system model we synthesize driver code interrupt handlers and startup code we furthermore automatically adjust the application to use rtos services we target traditional rtos based multi tasking solutions as well as pure interrupt based implementation without any rtos our experimental results show the automatic generation of final binary images for six real life target applications and demonstrate significant productivity gains due to automation our hds synthesis is an enabler for efficient mpsoc development and rapid design space exploration
data reduction is key technique in the study of fixed parameter algorithms in the ai literature pruning techniques based on simple and efficient to implement reduction rules also play crucial role in the success of many industrial strength solvers understanding the effectiveness and the applicability of data reduction as technique for designing heuristics for intractable problems has been one of the main motivations in studying the phase transition of randomly generated instances of np complete problems in this paper we take the initiative to study the power of data reductions in the context of random instances of generic intractable parameterized problem the weighted cnf satisfiability problem we propose non trivial random model for the problem and study the probabilistic behavior of the random instances from the model we design an algorithm based on data reduction and other algorithmic techniques and prove that the algorithm solves the random instances with high probability and in fixed parameter polynomial time O(knm) where n is the number of variables m is the number of clauses and k is the fixed parameter we establish the exact threshold of the phase transition of the solution probability and show that in some region of the problem space unsatisfiable random instances of the problem have parametric resolution proof of fixed parameter polynomial size also discussed is more general random model and the generalization of the results to the model
research on spatial and temporal databases has been done independently in terms of spatial databases whenever new object instance is inserted into database the old one should be deleted this makes it difficult to manage efficiently the historical information about spatial object that has been changed with temporal evolution in view of temporal databases because it did not consider supporting spatial type and operation it is extremely hard to manage directly spatial objects without any modification nevertheless because these research domains are closely related an integration field ie spatiotemporal databases has been launched spatiotemporal databases support historical information as well as spatial management for the object at the same time and can deal with geometries changing over time they can be used in the various application areas such as geographic information system gis urban plan system ups and car navigation system cns and so on in this paper we not only design the spatiotemporal query processing stqp system but also implement it the system includes spatiotemporal data model stdm that supports bitemporal concept for spatial objects we also explain specification of the spatiotemporal database query language entitled as stql as well compared with the results of previous research we argue that it is the first pilot system in the spatiotemporal database area that supports temporal concept and spatial expression as well
as an easily implemented approach ripup and reroute has been employed by most of today’s global routers which iteratively applies maze routing to refine solution quality but traditional maze routing is susceptible to getting stuck at locally optimal results in this work we present fast and high quality global router fastroute with the new technique named virtual capacity virtual capacity is proposed to guide the global router at maze routing stage to achieve higher quality results in terms of overflow and runtime during maze routing stage virtual capacity works as substitute for the real edge capacity in calculating the maze routing cost there are two sub techniques included virtual capacity initialization and virtual capacity update before the maze routing stage fastroute initializes the virtual capacity by subtracting the predicted overflow generated by adaptive congestion estimation ace from the real edge capacity and in the following maze routing iterations we further reduce the virtual capacity by the amount of existing overflow edge usage minus real edge capacity for the edges that are still congested to avoid excessive pushing away of routing wires the virtual capacity is increased by fixed percentage of the existing overflow if edge usage is smaller than real edge capacity experimental results show that fastroute is highly proficient in dealing with ispd ispd and ispd benchmark suites the results outperform published ripup and reroute based academic global routers in both routability and runtime in particular fastroute completes routing all the ispd benchmarks for ispd and ispd global routing contest benchmarks it generates out of congestion free solutions the total runtime is greatly improved
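a minimal sketch of the virtual capacity bookkeeping described above; the recovery fraction and the exact quantity given back on uncongested edges are assumptions of this sketch, not fastroute internals

```python
def init_virtual_capacity(real_cap, predicted_overflow):
    # start from real capacity minus the overflow predicted by congestion estimation
    return {e: real_cap[e] - predicted_overflow.get(e, 0) for e in real_cap}

def update_virtual_capacity(vcap, usage, real_cap, recover_frac=0.1):
    """recover_frac: assumed fixed percentage used to grow vcap back on uncongested edges."""
    for e in vcap:
        overflow = usage[e] - real_cap[e]
        if overflow > 0:
            # edge still congested: shrink virtual capacity further to push wires away
            vcap[e] -= overflow
        elif vcap[e] < real_cap[e]:
            # uncongested edge: give back part of the earlier reduction
            # (the exact quantity recovered is an assumption in this sketch)
            vcap[e] += recover_frac * (real_cap[e] - vcap[e])
    return vcap
```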
users of mobile computers will soon have online access to large number of databases via wireless networks because of limited bandwidth wireless communication is more expensive than wire communication in this paper we present and analyze various static and dynamic data allocation methods the objective is to optimize the communication cost between mobile computer and the stationary computer that stores the online database analysis is performed in two cost models one is connection or time based as in cellular telephones where the user is charged per minute of connection the other is message based as in packet radio networks where the user is charged per message our analysis addresses both the average case and the worst case for determining the best allocation method
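a small hedged illustration of the two charging schemes analyzed above; the rates and the read write mix are made up parameters, the point is only that which allocation is cheaper depends on the cost model

```python
def connection_cost(reads, writes, t_read, t_write, rate_per_min):
    # connection (time) based model: charged per minute of air time
    return (reads * t_read + writes * t_write) * rate_per_min

def message_cost(reads, writes, msgs_per_read, msgs_per_write, rate_per_msg):
    # message based model: charged per message, as in packet radio networks
    return (reads * msgs_per_read + writes * msgs_per_write) * rate_per_msg

# toy comparison for one allocation choice; all parameters are made up
print(connection_cost(reads=100, writes=10, t_read=0.2, t_write=0.5, rate_per_min=0.3))
print(message_cost(reads=100, writes=10, msgs_per_read=2, msgs_per_write=3, rate_per_msg=0.01))
```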
with the recent design shift towards increasing the number of processing elements in chip high bandwidth support in on chip interconnect is essential for low latency communication much of the previous work has focused on router architectures and network topologies using wide long channels however such solutions may result in complicated router design and high interconnect cost in this paper we exploit table based data compression technique relying on value patterns in cache traffic compressing large packet into small one can increase the effective bandwidth of routers and links while saving power due to reduced operations the main challenges are providing scalable implementation of tables and minimizing overhead of the compression latency first we propose shared table scheme that needs one encoding table and one decoding table for each processing element and a management protocol that does not require in order delivery next we present streamlined encoding that combines flit injection and encoding in pipeline furthermore data compression can be selectively applied to communication on congested paths only if compression improves performance simulation results in core cmp show that our compression method improves the packet latency by up to with an average of and reduces the network power consumption by on average
we investigate how to organize large collection of geotagged photos working with dataset of about million images collected from flickr our approach combines content analysis based on text tags and image data with structural analysis based on geospatial data we use the spatial distribution of where people take photos to define relational structure between the photos that are taken at popular places we then study the interplay between this structure and the content using classification methods for predicting such locations from visual textual and temporal features of the photos we find that visual and temporal features improve the ability to estimate the location of photo compared to using just textual features we illustrate using these techniques to organize large photo collection while also revealing various interesting properties about popular cities and landmarks at global scale
most algorithms for frequent pattern mining use support constraint to prune the combinatorial search space but support based pruning is not enough after mining datasets to obtain frequent patterns the resulting patterns can have weak affinity although the minimum support can be increased it is not effective for finding correlated patterns with increased weight and or support affinity interesting measures have been proposed to detect correlated patterns but no existing approach considers both support and weight in this paper we present new strategy weighted interesting pattern mining wip in which new measure weight confidence is suggested to mine correlated patterns with the weight affinity weight range is used to decide weight boundaries and h confidence serves to identify support affinity patterns in wip without additional computation cost original h confidence is used instead of the upper bound of h confidence for performance improvement wip not only gives balance between the two measures of weight and support but also considers weight affinity and or support affinity between items within patterns so more correlated patterns can be detected to our knowledge ours is the first work specifically to consider weight affinity between items of patterns comprehensive performance study shows that wip is efficient and scalable for finding affinity patterns moreover it generates fewer but more valuable patterns with the correlation to decrease the number of thresholds weight confidence h confidence and weighted support can be used selectively according to requirement of applications
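a minimal sketch of affinity based pruning; the exact formula for weight confidence is an assumption here (minimum over maximum item weight within a pattern, by analogy with h confidence), so treat it as illustrative rather than the paper's definition

```python
def h_confidence(pattern, item_support, pattern_support):
    # support affinity: pattern support relative to its most frequent single item
    return pattern_support / max(item_support[i] for i in pattern)

def weight_confidence(pattern, item_weight):
    # weight affinity (assumed form): min item weight over max item weight
    return min(item_weight[i] for i in pattern) / max(item_weight[i] for i in pattern)

def keep_pattern(pattern, item_support, pattern_support, item_weight,
                 min_hconf=0.5, min_wconf=0.5):
    # a pattern survives only if it shows both support affinity and weight affinity
    return (h_confidence(pattern, item_support, pattern_support) >= min_hconf and
            weight_confidence(pattern, item_weight) >= min_wconf)
```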
despite that formal and informal quality aspects are of significant importance to business process modeling there is only little empirical work reported on process model quality and its impact factors in this paper we investigate understandability as proxy for quality of process models and focus on its relations with personal and model characteristics we used questionnaire in classes at three european universities and generated several novel hypotheses from an exploratory data analysis furthermore we interviewed practitioners to validate our findings the results reveal that participants tend to exaggerate the differences in model understandability that self assessment of modeling competence appears to be invalid and that the number of arcs in models has an important influence on understandability
the world wide web www is rapidly becoming important for society as medium for sharing data information and services and there is growing interest in tools for understanding collective behaviors and emerging phenomena in the www in this paper we focus on the problem of searching and classifying communities in the web loosely speaking community is group of pages related to common interest more formally communities have been associated in the computer science literature with the existence of locally dense sub graph of the web graph where web pages are nodes and hyper links are arcs of the web graph the core of our contribution is new scalable algorithm for finding relatively dense subgraphs in massive graphs we apply our algorithm on web graphs built on three publicly available large crawls of the web with raw sizes up to nodes and arcs the effectiveness of our algorithm in finding dense subgraphs is demonstrated experimentally by embedding artificial communities in the web graph and counting how many of these are blindly found effectiveness increases with the size and density of the communities it is close to for communities of thirty nodes or more even at low density it is still about even for communities of twenty nodes with density over of the arcs present at the lower extremes the algorithm catches of dense communities made of ten nodes we complete our community watch system by clustering the communities found in the web graph into homogeneous groups by topic and labelling each group by representative keywords
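a generic greedy peeling heuristic for dense subgraphs, shown only to make the notion of a relatively dense subgraph concrete; it is a standard baseline, not the authors' scalable community watch algorithm

```python
import heapq

def densest_subgraph_peel(adj):
    """adj: dict node -> set of neighbours (undirected). Greedily peel the
    minimum-degree node and return the densest intermediate node set
    (density = edges / nodes)."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    alive = set(adj)
    edges = sum(degree.values()) // 2
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)
    best_density = edges / len(alive) if alive else 0.0
    best_set = set(alive)
    while alive:
        d, u = heapq.heappop(heap)
        if u not in alive or d != degree[u]:
            continue  # stale heap entry
        alive.discard(u)
        edges -= degree[u]
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
        if alive:
            density = edges / len(alive)
            if density > best_density:
                best_density, best_set = density, set(alive)
    return best_set
```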
while most modern web browsers offer history functionality few people use it to revisit previously viewed web pages in this paper we present the design and evaluation of contextual web history cwh novel browser history implementation which improves the visibility of the history feature and helps people find previously visited web pages we present the results of formative user study to understand what factors helped people in finding past web pages from this we developed cwh to be more visible to users and supported search browsing thumbnails and metadata combined these relatively simple features outperformed mozilla firefox built in browser history function and greatly reduced the time and effort required to find and revisit web page
non interference is typically used as baseline security policy to formalize confidentiality of secret information manipulated by program in contrast to static checking of non interference this paper considers dynamic automaton based monitoring of information flow for single execution of sequential program the monitoring mechanism is based on combination of dynamic and static analyses during program execution abstractions of program events are sent to the automaton which uses the abstractions to track information flows and to control the execution by forbidding or editing dangerous actions the mechanism proposed is proved to be sound to preserve executions of well typed programs in the security type system of volpano smith and irvine and to preserve some safe executions of ill typed programs
there has recently been considerable research on physical design tuning algorithms at the same time there is only one published methodology to evaluate the quality of different competing approaches the tab benchmark in this paper we describe our experiences with tab we first report an experimental evaluation of tab on our latest prototype for physical design tuning we then identify certain weakness in the benchmark and briefly comment on alternatives to improve its usefulness
we aim to improve reliability of multithreaded programs by proposing dynamic detector that detects potentially erroneous program executions and their causes we design and evaluate serializability violation detector svd that has two unique goals i triggering automatic recovery from erroneous executions using backward error recovery ber or simply alerting users that software error may have occurred and ii helping debug programs by revealing causes of error symptoms two properties of svd help in achieving these goals first to detect only erroneous executions svd checks serializability of atomic regions which are code regions that need to be executed atomically second to improve usability svd does not require a priori annotations of atomic regions instead svd approximates them using heuristic experimental results on three widely used multithreaded server programs show that svd finds real bugs and reports modest false positives the goal of this paper is to develop detector suitable for i ber based avoidance of erroneous program executions and ii alerting users as software errors occur we argue that such detector should have the following two properties
channel subthreshold and gate leakage currents are predicted by many to become much more significant in advanced cmos technologies and are expected to have substantial impact on logic circuit design strategies to reduce static power techniques such as the use of monotonic logic and management of various evaluation and idle modes within logic stages may become important options in circuit optimization in this paper we present general multilevel model for logic blocks consisting of logic gates that include wide range of options for static power reduction in both the domains of topology and timing existing circuit techniques are classified within this framework and experiments are presented showing how aspects of performance might vary across this range in hypothetical technology the framework also allows exploration of optimal mixing of techniques
overlay networks create new networking services across nodes that communicate using pre existing networks mosaic is unified declarative platform for constructing new overlay networks from multiple existing overlays each possessing subset of the desired new network’s characteristics mosaic overlays are specified using mozlog new declarative language for expressing overlay properties independently from their particular implementation or underlying network this paper focuses on the runtime aspects of mosaic composition and deployment of control and or data plane functions of different overlay networks dynamic compositions of overlay networks to meet changing application needs and network conditions and seamless support for legacy applications mosaic is validated experimentally using compositions specified in mozlog we combine an indirection overlay that supports mobility resilient overlay ron and scalable lookups chord to provide new overlay networks with new functions mosaic uses runtime composition to simultaneously deliver application aware mobility nat traversal and reliability we further demonstrate mosaic’s dynamic composition capabilities by chord switching its underlay from ip to ron at runtime these benefits are obtained at low performance cost as demonstrated by measurements on both local cluster and planetlab
energy usage of grid has become critical due to environmental cost and heat factors significant amounts of energy can be saved by effective transition to lower power states we present an energy aware scheduler for desktop grid environment considering that task performance prediction is the key in such scenarios we present hardware prediction tool to model the user application memory wall problem being the biggest bottleneck in today’s applications we apply energy scheduling to memory intensive tasks we model the complete system considering all devices like processor hard drive and various controllers for static program analysis in this work we have chosen memory intensive applications commonly used in scientific applications our experiments show that significant amount of energy can be saved with little performance degradation normally the user highly overestimates the job execution time our prediction tool based on static program analysis estimates the task execution time to within error the overall energy savings falls in the range of based on the standard workloads and different grid scenarios for the given devices in the system
the network planning problem of placing replicated servers with qos constraints is considered each server site may consist of multiple server types with varying capacities and each site can be placed in any location among those belonging to given set each client can be served by more than one location as long as the round trip delay of data requests satisfies predetermined upper bounds our main focus is to minimize the cost of using the servers and utilizing the link bandwidth while serving requests according to their delay constraint this is an np hard problem a pseudopolynomial and a polynomial algorithm that provide guaranteed approximation factors with respect to the optimal for the problem at hand are presented
the tandem algorithm combines the marching cube algorithm for surface extraction and the edge contraction algorithm for surface simplification in lock step to avoid the costly intermediate step of storing the entire extracted surface triangulation beyond this basic strategy we introduce refinements to prevent artifacts in the resulting triangulation first by carefully monitoring the amount of simplification during the process and second by driving the simplification toward compromise between shape approximation and mesh quality we have implemented the algorithm and used extensive computational experiments to document the effects of various design options and to further fine tune the algorithm
afek awerbuch plotkin and saks identified an important fundamental problem inherent to distributed networks which they called the resource controller problem consider first the problem in which one node called the root is required to estimate the number of events that occurred all over the network this counting problem can be viewed as useful variant of the heavily studied and used task of topology update that deals with collecting all remote information the resource controller problem generalizes the counting problem such remote events are considered as requests and the counting node ie the root also issues permits for the requests that way the number of request granted can be controlled bounded an efficient resource controller was constructed in the paper by afek et al which can operate on dynamic network assuming that the network is spanned by tree that may only grow and only by allowing leaves to join the tree in contrast the resource controller presented here can operate under more general dynamic model allowing the spanning tree of the network to undergo both insertions and deletions of both leaves and internal nodes despite the more dynamic network model we allow the message complexity of our controller is always at most the message complexity of the more restricted controller all the applications for the controller of afek et al apply also for our controller moreover with the same message complexity our controller can handle these applications under the more general dynamic model mentioned above in particular under the more general dynamic model the new controller can be transformed into an efficient size estimation protocol ie protocol allowing the root to maintain constant estimation of the number of nodes in the dynamically changing network informally the resulted new size estimation protocol uses log amortized message complexity per topological change assuming that the number of changes in the network size is not too small where is the current number of nodes in the network in addition with the same message complexity as that of the size estimation protocol the new controller can be used to solve the name assignment problem by assigning and maintaining unique log bit identifiers for the nodes of the dynamic network the new size estimation protocol can be used for other applications not mentioned in the paper by afek et al specifically it can be used to extend many existing labeling schemes supporting different queries eg routing ancestry etc so that these schemes can now operate correctly also under more general models these extensions maintain the same asymptotic sizes of the corresponding labels or routing tables of the original schemes and incur only relatively low extra additive cost to the message complexity of the corresponding original schemes
denial of service dos distributed dos ddos attack is an imminent threat to an authentication server which is used to guard access to firewalls virtual private networks and wired wireless networks the major problem is that an authentication server needs to verify whether request is from legitimate user and if intensive computation and or memory resources are needed for verifying request then dos ddos attack is feasible in this paper new protocol called identity based privacy protected access control filter ipacf is proposed to counter dos ddos attack this protocol is an improvement of idf identity based dynamic access control filter the proposed protocol is stateless because it does not create state for an authentication request unless the request is from legitimate user moreover the ipacf is stateless for both user and authentication server since user and responder authenticate each other filter value which is generated by pre shared secrets is sent in frame and checked to see if the request is legitimate note that the process of checking filter value is not intensive computation the filter value is tabulated in table with user identity so that filter value represents user’s identity and only the legitimate user and authentication server can figure out the identity when filter value is from legitimate source new filter value will be generated for the next frame consequently the filter value is changed for every frame thus the privacy of both user and server are protected the ipacf is implemented for both user and authentication server the performance of the implementation is reported in this paper in order to counter more dos ddos attacks that issue fake requests parallel processing technique is used to implement the authentication server which is divided into two servers the first server only checks the validity of the request filter value against the filter value table if the request is legitimate the request will be passed to the second server for generating new filter value otherwise the fake request is rejected by the first server the performance comparison of dual server and single server is also reported
tag ranking has emerged as an important research topic recently due to its potential application on web image search conventional tag ranking approaches mainly rank the tags according to their relevance levels with respect to given image nonetheless such algorithms heavily rely on the large scale image dataset and the proper similarity measurement to retrieve semantic relevant images with multi labels in contrast to the existing tag relevance ranking algorithms in this paper we propose novel tag saliency ranking scheme which aims to automatically rank the tags associated with given image according to their saliency to the image content to this end this paper presents an integrated framework for tag saliency ranking which combines both visual attention model and multi instance learning algorithm to investigate the saliency ranking order information of tags with respect to the given image specifically tags annotated on the image level are propagated to the region level via an efficient multi instance learning algorithm firstly then visual attention model is employed to measure the importance of regions in the given image and finally tags are ranked according to the saliency values of the corresponding regions experiments conducted on the corel and msrc image datasets demonstrate the effectiveness and efficiency of the proposed framework
we present an approach to convert small portion of light field with extracted depth information into cinematic effect with simulated smooth camera motion that exhibits sense of parallax we develop taxonomy of the cinematic conventions of these effects distilled from observations of documentary film footage and organized by the number of subjects of interest in the scene we present an automatic content aware approach to apply these cinematic conventions to an input light field face detector identifies subjects of interest we then optimize for camera path that conforms to cinematic convention maximizes apparent parallax and avoids missing information in the input we describe gpu accelerated temporally coherent rendering algorithm that allows users to create more complex camera moves interactively while experimenting with effects such as focal length depth of field and selective depth based desaturation or brightening we evaluate and demonstrate our approach on wide variety of scenes and present user study that compares our cinematic effects to their counterparts
harvesting the power of modern graphics hardware to solve the complex problem of real time rendering of large unstructured meshes is major research goal in the volume visualization community while for regular grids texture based techniques are well suited for current gpus the steps necessary for rendering unstructured meshes are not so easily mapped to current hardware we propose novel volume rendering technique that simplifies the cpu based processing and shifts much of the sorting burden to the gpu where it can be performed more efficiently our hardware assisted visibility sorting algorithm is hybrid technique that operates in both object space and image space in object space the algorithm performs partial sort of the primitives in preparation for rasterization the goal of the partial sort is to create list of primitives that generate fragments in nearly sorted order in image space the fragment stream is incrementally sorted using fixed depth sorting network in our algorithm the object space work is performed by the cpu and the fragment level sorting is done completely on the gpu prototype implementation of the algorithm demonstrates that the fragment level sorting achieves rendering rates of between one and six million tetrahedral cells per second on an ati radeon
as the internet takes an increasingly central role in our communications infrastructure the slow convergence of routing protocols after network failure becomes growing problem to assure fast recovery from link and node failures in ip networks we present new recovery scheme called multiple routing configurations mrc our proposed scheme guarantees recovery in all single failure scenarios using single mechanism to handle both link and node failures and without knowing the root cause of the failure mrc is strictly connectionless and assumes only destination based hop by hop forwarding mrc is based on keeping additional routing information in the routers and allows packet forwarding to continue on an alternative output link immediately after the detection of failure it can be implemented with only minor changes to existing solutions in this paper we present mrc and analyze its performance with respect to scalability backup path lengths and load distribution after failure we also show how an estimate of the traffic demands in the network can be used to improve the distribution of the recovered traffic and thus reduce the chances of congestion when mrc is used
separation logic sl provides simple but powerful technique for reasoning about imperative programs that use shared data structures unfortunately sl supports only strong updates in which mutation to heap location is safe only if unique reference is owned this limits the applicability of sl when reasoning about the interaction between many high level languages eg ml java and low level ones since these high level languages do not support strong updates instead they adopt the discipline of weak updates in which there is global heap type to enforce the invariant of type preserving heap updates we present slw logic that extends sl with reference types and elegantly reasons about the interaction between strong and weak updates we also describe semantic framework for reference types this framework is used to prove the soundness of slw
implementing new programming language by the means of translator to an existing language is attractive as it provides portability over all platforms supported by the host language and reduces the development time as many low level tasks can be delegated to the host compiler the c and c++ programming languages are popular choices for many language implementations due to the availability of efficient compilers on many platforms and good portability for garbage collected languages however they are not perfect match as they provide no support for accurately discovering pointers to heap allocated data we evaluate the published techniques and propose new mechanism lazy pointer stacks for performing accurate garbage collection in such uncooperative environments we implemented the new technique in the ovm java virtual machine with our own java to c++ compiler and gcc as back end and found that our technique outperforms existing approaches
increasing interest towards property based design calls for effective satisfiability procedures for expressive temporal logics eg the ieee standard property specification language psl in this paper we propose new approach to the satisfiability of psl formulae we follow recent approaches to decision procedures for satisfiability modulo theory typically applied to fragments of first order logic the underlying intuition is to combine two interacting search mechanisms on one side we search for assignments that satisfy the boolean abstraction of the problem on the other we invoke solver for temporal satisfiability on the conjunction of temporal formulae corresponding to the assignment within this framework we explore two directions first given the fixed polarity of each constraint in the theory solver aggressive simplifications can be applied second we analyze the idea of conflict reconstruction whenever satisfying assignment at the level of the boolean abstraction results in temporally unsatisfiable problem we identify inconsistent subsets that can be used to rule out possibly many other assignments we propose two methods to extract conflict sets on conjunctions of temporal formulae one based on bdd based model checking and one based on sat based simple bounded model checking we analyze the limits and the merits of the approach with thorough experimental evaluation
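a minimal sketch of the two level search described above, with a generic sat solver and a stubbed temporal solver; the solver interface and the blocking clause construction are assumptions of this sketch

```python
def lazy_temporal_sat(abstraction_clauses, atom_of_var, sat_solve, temporal_check):
    """abstraction_clauses: CNF over boolean variables, one per temporal atom.
    sat_solve(clauses) -> dict var -> bool, or None if unsatisfiable.
    temporal_check(literals) -> (True, None) if the conjunction of temporal
    formulas is satisfiable, else (False, conflict_vars) where conflict_vars is
    a small subset of variables whose current polarities are jointly inconsistent."""
    clauses = list(abstraction_clauses)
    while True:
        assignment = sat_solve(clauses)
        if assignment is None:
            return "unsat"
        # build the conjunction of temporal atoms with the polarities chosen by SAT
        literals = [(atom_of_var[v], assignment[v]) for v in assignment]
        ok, conflict_vars = temporal_check(literals)
        if ok:
            return assignment
        # rule out every boolean assignment that contains the inconsistent subset
        blocking_clause = [(-v if assignment[v] else v) for v in conflict_vars]
        clauses.append(blocking_clause)
```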
many state of the art selectivity estimation methods use query feedback to maintain histogram buckets thereby using the limited memory efficiently however they are reactive in nature that is they update the histogram based on queries that have come to the system in the past for evaluation in some applications future occurrences of certain queries may be predicted and proactive approach can bring much needed performance gain especially when combined with the reactive approach for these applications this paper provides method that builds customized proactive histograms based on query prediction and merges them into reactive histograms when the predicted future arrives thus the method is called the proactive and reactive histogram prhist two factors affect the usefulness of the proactive histograms and are dealt with during the merge process the first is the predictability of queries and the second is the extent of data updates prhist adjusts itself to be more reactive or more proactive depending on these two factors through extensive experiments using both real and synthetic data and query sets this paper shows that in most cases prhist outperforms stholes the state of the art reactive method even when only small portion of the queries are predictable and significant portion of data is updated
recently multi level cell mlc nand flash memory is becoming widely used as storage media for mobile devices such as mobile phones mp players pdas and digital cameras mlc nand flash memory however has some restrictions that hard disk or single level cell slc nand flash memory do not have since most traditional database techniques assume hard disk they may not provide the best attainable performance on mlc nand flash memory in this paper we design and implement an mlc nand flash based dbms for mobile devices called acedb flashlight which fully exploits the unique characteristics of mlc nand flash memory our performance evaluations on an mlc nand flash based device show that the proposed dbms significantly outperforms the existing ones
we present feature based technique for morphing objects represented by light fields existing light field morphing methods require the user to specify corresponding feature elements to guide morph computation since slight errors in specification can lead to significant morphing artifacts we propose scheme based on feature elements that is less sensitive to imprecise marking of features first features are specified by the user in number of key views in the source and target light fields then the two light fields are warped view by view as guided by the corresponding features finally the two warped light fields are blended together to yield the desired light field morph two key issues in light field morphing are feature specification and warping of light field rays for feature specification we introduce user interface for delineating features in key views of light field which are automatically interpolated to other views for ray warping we describe technique that accounts for visibility changes and present comparison to the ideal morphing of light fields light field morphing based on features makes it simple to incorporate previous image morphing techniques such as nonuniform blending as well as to morph between an image and light field
this paper explores the definition applications and limitations of concepts and concept maps in with focus on library composition we also compare and contrast concepts to adaptation mechanisms in other languages efficient non intrusive adaptation mechanisms are essential when adapting data structures to library’s api development with reusable components is widely practiced method of building software components vary in form ranging from source code to non modifiable binary libraries the concepts language features slated to appear in the next version of have been designed with such compositions in mind promising an improved ability to create generic non intrusive efficient and identity preserving adapters we report on two cases of data structure adaptation between different libraries and illustrate best practices and idioms first we adapt gui widgets from several libraries with differing apis for use with generic layout engine we further develop this example to describe the run time concept idiom extending the applicability of concepts to domains where run time polymorphism is required second we compose an image processing library and graph algorithm library by making use of transparent adaptation layer enabling the efficient application of graph algorithms to the image processing domain we use the adaptation layer to realize few key algorithms and report little or no performance degradation
sustainable hci is now recognized area of human computer interaction drawing from variety of disciplinary approaches including the arts how might hci researchers working on sustainability productively understand the discourses and practices of ecologically engaged art as means of enriching their own activities we argue that an understanding of both the history of ecologically engaged art and the art historical and critical discourses surrounding it provide fruitful entry point into more critically aware sustainable hci we illustrate this through consideration of frameworks from the arts looking specifically at how these frameworks act more as generative devices than prescriptive recipes taking artistic influences seriously will require concomitant rethinking of sustainable hci standpoints potentially useful exercise for hci research in general
parameterized tiled loops where the tile sizes are not fixed at compile time but remain symbolic parameters until later are quite useful for iterative compilers and auto tuners that produce highly optimized libraries and codes tile size parameterization could also enable optimizations such as register tiling to become dynamic optimizations although it is easy to generate such loops for hyper rectangular iteration spaces tiled with hyper rectangular tiles many important computations do not fall into this restricted domain parameterized tile code generation for the general case of convex iteration spaces being tiled by hyper rectangular tiles has in the past been solved with bounding box approaches or symbolic fourier motzkin approaches however both approaches have less than ideal code generation efficiency and resulting code quality we present the theoretical foundations implementation and experimental validation of simple unified technique for generating parameterized tiled code our code generation efficiency is comparable to all existing code generation techniques including those for fixed tile sizes and the resulting code is as efficient as if not more than all previous techniques thus the technique provides parameterized tiled loops for free our one size fits all solution which is available as open source software can be adapted for use in production compilers
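a small illustration of a loop nest tiled with runtime tile size parameters over a rectangular iteration space; the general convex iteration space case handled above needs more careful bound generation than this

```python
def tiled_matmul(A, B, C, n, tile_i, tile_j, tile_k):
    """C += A*B for n x n matrices, with tile sizes left as runtime parameters."""
    for ii in range(0, n, tile_i):
        for jj in range(0, n, tile_j):
            for kk in range(0, n, tile_k):
                # intra-tile loops: min() keeps the bounds correct for partial tiles
                for i in range(ii, min(ii + tile_i, n)):
                    for j in range(jj, min(jj + tile_j, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile_k, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
```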
in order to perform meaningful experiments in optimizing compilation and run time system design researchers usually rely on suite of benchmark programs of interest to the optimization technique under consideration programs are described as numeric memory intensive concurrent or object oriented based on qualitative appraisal in some cases with little justification we believe it is beneficial to quantify the behaviour of programs with concise and precisely defined set of metrics in order to make these intuitive notions of program behaviour more concrete and subject to experimental validation we therefore define and measure set of unambiguous dynamic robust and architecture independent metrics that can be used to categorize programs according to their dynamic behaviour in five areas size data structure memory use concurrency and polymorphism framework computing some of these metrics for java programs is presented along with specific results demonstrating how to use metric data to understand program’s behaviour and both guide and evaluate compiler optimizations
we investigate the problem of using materialized views to answer sql queries we focus on modern decision support queries which involve joins arithmetic operations and other possibly user defined functions aggregation often along multiple dimensions and nested subqueries given the complexity of such queries the vast amounts of data upon which they operate and the requirement for interactive response times the use of materialized views mvs of similar complexity is often mandatory for acceptable performance we present novel algorithm that is able to rewrite user query so that it will access one or more of the available mvs instead of the base tables the algorithm extends prior work by addressing the new sources of complexity mentioned above that is complex expressions multidimensional aggregation and nested subqueries it does so by relying on graphical representation of queries and bottom up pair wise matching of nodes from the query and mv graphs this approach offers great modularity and extensibility allowing for the rewriting of large class of queries
the design of operating system for wireless sensor network wsn deviates from the traditional operating system design due to their specific characteristics like constrained resources high dynamics and inaccessible deployment environments we provide classification framework that surveys the state of the art in wsn operating systems os the purpose of this survey is two fold one is to classify the existing operating systems according to important os features and the other is to suggest appropriate oss for different categories of wsn applications mapping the application requirements and os features this classification helps in understanding the contrasting differences among existing operating systems and lays foundation to design an ideal wsn os we also classified existing wsn applications to help the application developer in choosing the appropriate os based on the application requirements summary and analysis and discussion of future research directions in this area have been presented
many problems in embedded compilation require one set of optimizations to be selected over another based on run time performance self tuned libraries iterative compilation and machine learning techniques all compare multiple compiled program versions in each program versions are timed to determine which has the best performance the program needs to be run multiple times for each version because there is noise inherent in most performance measurements the number of runs must be enough to compare different versions despite the noise but executing more than this will waste time and energy the compiler writer must either risk taking too few runs potentially getting incorrect results or taking too many runs increasing the time for their experiments or reducing the number of program versions evaluated prior works choose constant size sampling plans where each compiled version is executed fixed number of times without regard to the level of noise in this paper we develop sequential sampling plan which can automatically adapt to the experiment so that the compiler writer can have both confidence in the results and also be sure that no more runs were taken than were needed we show that our system is able to correctly determine the best optimization settings with between and fewer runs than needed by brute force constant sampling size approach we also compare our approach to javastats we needed to fewer runs than it needed
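a hedged sketch of a sequential stopping rule for comparing two compiled versions: keep taking paired runs until an approximate confidence interval on the mean difference excludes zero or a run budget is reached; the interval form and the thresholds are illustrative, not the paper's exact plan

```python
import math
import statistics

def compare_versions(run_a, run_b, min_runs=5, max_runs=100, z=1.96):
    """run_a()/run_b() each return one timing measurement; returns 'A', 'B' or 'tie'."""
    a, b = [], []
    while len(a) < max_runs:
        a.append(run_a())
        b.append(run_b())
        if len(a) < min_runs:
            continue
        diff = statistics.mean(a) - statistics.mean(b)
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        # stop as soon as the (approximate) confidence interval excludes zero
        if se == 0 or abs(diff) > z * se:
            return "A" if diff < 0 else "B"
    return "tie"
```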
in many application domains scenarios have been developed that benefit from the idea of ambience systems will not necessarily be activated by people anymore but will react on their own to situations they recognize it thereby must dynamically adapt itself to changes in the technical environment or user context in addition such dynamically reconfigurable products must be customized to the individual needs of particular users product line engineering can be applied to create these variants efficiently however means for handling adaptation capabilities at generic level are required this paper introduces the front end of such means by describing an approach for analysis and specification of features that vary as part of reconfigurations at runtime
technology advancements have enabled the integration of large on die embedded dram edram caches edram is significantly denser than traditional srams but must be periodically refreshed to retain data like sram edram is susceptible to device variations which play role in determining refresh time for edram cells refresh power potentially represents large fraction of overall system power particularly during low power states when the cpu is idle future designs need to reduce cache power without incurring the high cost of flushing cache data when entering low power states in this paper we show the significant impact of variations on refresh time and cache power consumption for large edram caches we propose hi ecc technique that incorporates multi bit error correcting codes to significantly reduce refresh rate multi bit error correcting codes usually have complex decoder design and high storage cost hi ecc avoids the decoder complexity by using strong ecc codes to identify and disable sections of the cache with multi bit failures while providing efficient single bit error correction for the common case hi ecc includes additional optimizations that allow us to amortize the storage cost of the code over large data words providing the benefit of multi bit correction at same storage cost as single bit error correcting secded code overhead our proposal achieves reduction in refresh power vs baseline edram cache without error correcting capability and reduction in refresh power vs system using secded codes
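a back of the envelope comparison of ecc storage overhead when the code is amortized over a larger data word, which is the effect hi ecc relies on; the word sizes, the correction strength and the bch check bit estimate below are illustrative assumptions, not the paper's design points

```python
import math

def secded_overhead(data_bits):
    # SECDED: r Hamming check bits with 2**r >= data_bits + r + 1, plus one parity bit
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return (r + 1) / data_bits

def bch_overhead_estimate(data_bits, t):
    # rough estimate: a t-error-correcting BCH code needs about m*t check bits,
    # with m close to log2 of the codeword length (an approximation, not a design)
    m = math.ceil(math.log2(data_bits)) + 1
    return (m * t) / data_bits

# per-64-bit-word SECDED versus one stronger code amortized over a 1024-bit line
print(secded_overhead(64))             # 0.125
print(bch_overhead_estimate(1024, 5))  # ~0.054
```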
static analysis of ada tasking programs has been hindered by the well known state explosion problem that arises in the verification of concurrent systems many different techniques have been proposed to combat this state explosion all proposed methods excel on certain kinds of systems but there is little empirical data comparing the performance of the methods in this paper we select one representative from each of three very different approaches to the state explosion problem partial orders representing state space reductions symbolic model checking representing obdd based approaches and inequality necessary conditions representing integer programming based approaches we apply the methods to several scalable concurrency examples from the literature and to one real ada tasking program the results of these experiments are presented and their significance is discussed
dynamic epoch time synchronisation method for distributed simulation federates is presented the proposed approach allows federates to advance their local times at full speed to the global safe point which is dynamically estimated using the look ahead function for each federate the simulation then slows for an interaction between federates this approach aims to reduce the number of time synchronisation occurrences and duration of the conservative phase the distributed simulation is implemented using the web services technology the experimental results reveal that the proposed approach reduces simulation execution time significantly while maintaining complete accuracy as compared with two existing methods
adapting keyword search to xml data has been attractive recently generalized as xml keyword search xks its fundamental task is to retrieve meaningful and concise result for the given keyword query the latest work returns the fragments rooted at the slca smallest lowest common ancestor nodes and to guarantee that the fragments only contain meaningful nodes it proposed contributor based filtering mechanism in its maxmatch algorithm however the filtering mechanism is not sufficient it will commit the false positive problem discarding interesting nodes and the redundancy problem keeping uninteresting nodes in this paper we propose new filtering mechanism to overcome those two problems the fundamental concept is valid contributor a child v is valid contributor to its parent u if v’s label is unique among all u’s children or for the siblings with the same label as v v’s content is not covered by any of them our new filtering mechanism is that all the nodes in each retrieved fragment should be valid contributors to their parents by doing so it not only satisfies the previously proposed axiomatic properties but also ensures that the filtered fragment is more meaningful and concise we implement our proposal in validmatch and compare validmatch with maxmatch on real and synthetic xml data the result verifies our claims and shows the effectiveness of our valid contributor based filtering mechanism
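a minimal sketch of the valid contributor test as stated above; node content and the covered relation are simplified here to a set of label text pairs and set containment, which is an assumption rather than the paper's exact definitions

```python
def content(node):
    """node: dict with 'label', optional 'text' and 'children'. Content is the set of
    (label, text) pairs in the subtree (a simplification of node content)."""
    items = {(node["label"], node.get("text", ""))}
    for c in node.get("children", []):
        items |= content(c)
    return items

def valid_contributors(parent):
    kept = []
    children = parent.get("children", [])
    for v in children:
        same_label = [u for u in children if u is not v and u["label"] == v["label"]]
        if not same_label:
            kept.append(v)  # v's label is unique among its siblings
        elif not any(content(v) <= content(u) for u in same_label):
            kept.append(v)  # no same-label sibling covers v's content
    return kept
```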
recently system architects have built low power high performance clusters such as green destiny the idea behind these clusters is to improve the energy efficiency of nodes however these clusters save power at the expense of performance our approach is instead to use high performance cluster nodes that are frequency and voltage scalable energy can then be saved by scaling down the cpu our prior work has examined the costs and benefits of executing an entire application at single reduced frequency this paper presents framework for executing single application in several frequency voltage settings the basic idea is to first divide programs into phases and then execute series of experiments with each phase assigned prescribed frequency during each experiment we measure energy consumption and time and then use heuristic to choose the assignment of frequency to phase for the next experiment our results show that significant energy can be saved without an undue performance penalty particularly our heuristic finds assignments of frequency to phase that are superior to any fixed frequency solution specifically this paper shows that more than half of the nas benchmarks exhibit better energy time tradeoff using multiple gears than using single gear for example the nas is benchmark using multiple gears uses less energy and executes in less time than the closest single gear solution compared to no frequency scaling the multiple gear version of is uses less energy while executing only slightly longer
the purpose of the architecture evaluation of software system is to analyze the architecture to identify potential risks and to verify that the quality requirements have been addressed in the design this survey shows the state of the research at this moment in this domain by presenting and discussing eight of the most representative architecture analysis methods the selection of the studied methods tries to cover as many particular views of objective reflections as possible to be derived from the general goal the role of the discussion is to offer guidelines related to the use of the most suitable method for an architecture assessment process we will concentrate on discovering similarities and differences between these eight available methods by making classifications comparison and appropriateness studies
some of the most common parallel programming idioms include locks barriers and reduction operations the interaction of these programming idioms with the multiprocessor’s coherence protocol has significant impact on performance in addition the advent of machines that support multiple coherence protocols prompts the question of how to best implement such parallel constructs ie what combination of implementation and coherence protocol yields the best performance in this paper we study the running time and communication behavior of centralized ticket and mcs spin locks centralized dissemination and tree based barriers and parallel and sequential reductions under pure and competitive update coherence protocols results for write invalidate protocol are presented mostly for comparison purposes our experiments indicate that parallel programming techniques that are well established for write invalidate protocols such as mcs locks and parallel reductions are often inappropriate for update based protocols in contrast techniques such as dissemination and tree barriers achieve superior performance under update based protocols our results also show that the implementation of parallel programming idioms must take the coherence protocol into account since update based protocols often lead to different design decisions than write invalidate protocols our main conclusion is that protocol conscious implementation of parallel programming structures can significantly improve application performance for multiprocessors that can support more than one coherence protocol both the protocol and implementation should be taken into account when exploiting parallel constructs
due to the high skewed nature of network flow size distributions uniform packet sampling concentrates too much on few large flows and ignores the majority of small ones to overcome this drawback recently proposed sketch guided sampling sgs selects each packet at probability that is decreasing with its current flow size which results in better flow wide fairness however the pitfall of sgs is that it needs large high speed memory to accommodate flow size sketch making it impractical to be implemented and inflexible to be deployed we refined the flow size sketch using multi resolution left hashing schema which is both space efficient and accurate new fair packet sampling algorithm which is named space efficient fair sampling sefs is proposed based on this novel flow size sketch we compared the performance of sefs with that of sgs in the context of flow traffic measurement and large flow identification using real world traffic traces the experimental results show that sefs outperforms sgs in both application contexts while reduction of percent in space complexity can be achieved
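a minimal sketch of flow size guided packet sampling: a small counter sketch estimates the current size of a packet's flow and the packet is kept with probability decreasing in that estimate; the plain count min layout and the 1/(size+1) probability are assumptions of this sketch, not the multi resolution left hashing structure proposed above

```python
import hashlib
import random

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        for d in range(self.depth):
            h = hashlib.blake2b(f"{d}:{key}".encode(), digest_size=8).digest()
            yield d, int.from_bytes(h, "big") % self.width

    def add(self, key):
        for d, i in self._cells(key):
            self.rows[d][i] += 1

    def estimate(self, key):
        return min(self.rows[d][i] for d, i in self._cells(key))

def sample_packet(flow_id, sketch):
    est = sketch.estimate(flow_id)             # approximate current flow size
    sketch.add(flow_id)                        # account for this packet
    return random.random() < 1.0 / (est + 1)   # small flows sampled at a higher rate
```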
this paper describes benchmark for evaluation of mesh segmentation algorithms the benchmark comprises data set with manually generated segmentations for surface meshes of different object categories and it includes software for analyzing geometric properties of segmentations and producing quantitative metrics for comparison of segmentations the paper investigates the design decisions made in building the benchmark analyzes properties of human generated and computer generated segmentations and provides quantitative comparisons of recently published mesh segmentation algorithms our results suggest that people are remarkably consistent in the way that they segment most surface meshes that no one automatic segmentation algorithm is better than the others for all types of objects and that algorithms based on non local shape features seem to produce segmentations that most closely resemble ones made by humans
description logics also called terminological logics are commonly used in knowledge based systems to describe objects and their relationships we investigate the learnability of typical description logic classic and show that classic sentences are learnable in polynomial time in the exact learning model using equivalence queries and membership queries which are in essence "subsumption queries" we show that membership queries alone are insufficient for polynomial time learning of classic sentences combined with earlier negative results of cohen and hirsh showing that given standard complexity theoretic assumptions equivalence queries alone are insufficient or random examples alone in the pac setting are insufficient this shows that both sources of information are necessary for efficient learning in that neither type alone is sufficient in addition we show that modification of the algorithm deals robustly with persistent malicious two sided classification noise in the membership queries with the probability of misclassification bounded below
more than twelve years have elapsed since the first public release of weka in that time the software has been rewritten entirely from scratch evolved substantially and now accompanies text on data mining these days weka enjoys widespread acceptance in both academia and business has an active community and has been downloaded more than million times since being placed on source forge in april this paper provides an introduction to the weka workbench reviews the history of the project and in light of the recent stable release briefly discusses what has been added since the last stable version weka released in
so far extending light field rendering to dynamic scenes has been trivially treated as the rendering of static light fields stacked in time this type of approach requires input video sequences in strict synchronization and allows only discrete exploration in the temporal domain determined by the capture rate in this paper we propose novel framework space time light field rendering which allows continuous exploration of dynamic scene in both spatial and temporal domain with unsynchronized input video sequences in order to synthesize novel views from any viewpoint at any time instant we develop two stage rendering algorithm we first interpolate in the temporal domain to generate globally synchronized images using robust spatial temporal image registration algorithm followed by edge preserving image morphing we then interpolate those software synchronized images in the spatial domain to synthesize the final view our experimental results show that our approach is robust and capable of maintaining photo realistic results
an edge hidden graph is graph whose edges are not explicitly given detecting the presence of an edge requires expensive edge probing queries we consider the most connected vertex problem on hidden bipartite graphs specifically given bipartite graph with independent vertex sets and the goal is to find the vertices in with the largest degrees using the minimum number of queries this problem can be regarded as top extension of semi join and is encountered in many applications in practice eg top spatial join with arbitrarily complex join predicates if and have and vertices respectively the number of queries needed to solve the problem is nm in the worst case this however is pessimistic estimate on how many queries are necessary on practical data in fact on some easy inputs the problem can be efficiently settled with only km edges which is significantly lower than nm for the huge difference between km and nm makes it interesting to design an adaptive algorithm that is guaranteed to achieve the best possible performance on every input we give such an algorithm and prove that it is instance optimal among broad class of solutions this means that for any our algorithm can perform more queries than the optimal solution which is currently unknown by only constant factor which can be shown to be at most extensive experiments demonstrate that in practice the number of queries required by our technique is far less than nm and agrees with our theoretical findings very well
consider the following network design problem given network source sink pairs $(s_i, t_i)$ arrive and desire to send unit of flow between themselves the cost of the routing is this if edge $e$ carries total of $f_e$ flow from all the terminal pairs the cost is given by $\sum_e \ell(f_e)$ where $\ell$ is some concave cost function the goal is to minimize the total cost incurred however we want the routing to be oblivious when terminal pair $(s_i, t_i)$ makes its routing decisions it does not know the current flow on the edges of the network nor the identity of the other pairs in the system moreover it does not even know the identity of the function $\ell$ merely knowing that $\ell$ is concave function of the total flow on the edge how should it obliviously route its one unit of flow can we get competitive algorithms for this problem in this paper we develop framework to model oblivious network design problems of which the above problem is special case and give algorithms with poly logarithmic competitive ratio for problems in this framework and hence for this problem abstractly given problem like the one above the solution is multicommodity flow producing load $\ell_e(f_e)$ on each edge $e$ and the total cost is given by an aggregation function $\mathrm{agg}(\ell_{e_1}, \ldots, \ell_{e_m})$ of the loads of all edges our goal is to develop oblivious algorithms that approximately minimize the total cost of the routing knowing the aggregation function agg but merely knowing that $\ell$ lies in some class and having no other information about the current state of the network hence we want algorithms that are simultaneously function oblivious as well as traffic oblivious the aggregation functions we consider are the max and sum objective functions which correspond to the well known measures of congestion and total cost of network in this paper we prove the following • if the aggregation function is sum we give an oblivious algorithm with log competitive ratio whenever the load function is in the class of monotone sub additive functions recall that our algorithm is also function oblivious it works whenever each edge has load function in the class • for the case when the aggregation function is max we give an oblivious algorithm with log log log competitive ratio when the load function is norm we also show that such competitive ratio is not possible for general sub additive functions these are the first such general results about oblivious algorithms for network design problems and we hope the ideas and techniques will lead to more and improved results in this area
cooperative work in learning environments has been shown to be successful extension to traditional learning systems due to the great impact of cooperation on students motivation and learning success in this paper we describe new approach to cooperative construction of cryptographic protocols using an appropriate visual language vl students describe protocol step by step modeling subsequent situations and alternating this with the creation of concept keyboard ck describing the operations in the protocol the system automatically generates colored petri subnet that is matched against an existing action logic specifying the protocol finally the learners implement role dependent cks in cooperative workflow and perform role play simulation
we study the performance benefits of speculation in release consistent software distributed shared memory system we propose new protocol speculative home based release consistency shrc that speculatively updates data at remote nodes to reduce the latency of remote memory accesses our protocol employs predictor that uses patterns in past accesses to shared memory to predict future accesses we have implemented our protocol in release consistent software distributed shared memory system that runs on commodity hardware we evaluate our protocol implementation using eight software distributed shared memory benchmarks and show that it can result in significant performance improvements
using only shadow trajectories of stationary objects in scene we demonstrate that set of six or more photographs is sufficient to accurately calibrate the camera moreover we present novel application where using only three points from the shadow trajectory of the objects one can accurately determine the geo location of the camera up to longitude ambiguity and also the date of image acquisition without using any gps or other special instruments we refer to this as geo temporal localization we consider possible cases where ambiguities can be removed if additional information is available our method does not require any knowledge of the date or the time when the pictures are taken and geo temporal information is recovered directly from the images we demonstrate the accuracy of our technique for both steps of calibration and geo temporal localization using synthetic and real data
in mediator system based on annotated logics it is suitable requirement to allow annotations from different lattices in one program on per predicate basis these lattices however may be related through common sublattices hence demanding predicates which are able to carry combinations of annotations or access to components of annotations we show both demands to be satisfiable by using various composition operations on the domain of complete bounded distributive lattices or bilattices most importantly the free distributive product an implementation of the presented concepts based on the komet implementation of slg al with constraints is briefly introduced
this paper presents compiler analysis for data communication for the purpose of transforming ordinary programs into ones that run on distributed systems such transformations have been used for process migration and computation offloading to improve the performance of mobile computing devices in client server distributed environment the efficiency of an application can be improved by careful partitioning of tasks between the server and the client optimal task partitioning depends on the tradeoff between the computation workload and the communication cost our compiler analysis assisted by minimum set of user assertions estimates the amount of data communication between procedures the paper also presents experimental results based on an implementation in the gcc compiler the static estimates for several multimedia programs are compared against dynamic measurement performed using shade sun microsystem’s instruction level simulator the results show high precision of the static analysis for most pairs of the procedures
one of the recent web developments has focused on the opportunities it presents for social tagging through user participation and collaboration as result social tagging has changed the traditional online communication process the interpretation of tagging between humans and machines may create new problems if essential questions about how social tagging corresponds to online communications what objects the tags refer to who the interpreters are and why they are engaged are not explored systematically since such reasoning is an interpretation of social tagging among humans tags and machines it is complex issue that calls for deep reflection in this paper we investigate the relevance of the potential problems raised by social tagging through the framework of cs peirce’s semiotics we find that general phenomena of social tagging can be well classified by peirce’s classes of signs for reasoning this suggests that regarding social tagging as sign and systematically analyzing the interpretation are positively associated with the classes of signs peircean semiotics can be used to examine the dynamics and determinants of tagging hence the various uses of this categorization schema may have implications for the design and development of information systems and web applications
partial evaluation is semantics based program optimization technique which has been investigated within different programming paradigms and applied to wide variety of languages recently partial evaluation framework for functional logic programs has been proposed in this framework narrowing the standard operational semantics of integrated languages is used to drive the partial evaluation process this paper surveys the essentials of narrowing driven partial evaluation
advances in semiconductor technologies have placed mpsocs center stage as standard architecture for embedded applications of ever increasing complexity because of real time constraints applications are usually statically parallelized and scheduled onto the target mpsoc so as to obtain predictable worst case performance however both technology scaling trends and resource competition among applications have led to variations in the availability of resources during execution thus questioning the dynamic viability of the initial static schedules to eliminate this problem in this paper we propose to statically generate compact schedule with predictable response to various resource availability constraints such schedules are generated by adhering to novel band structure capable of spawning dynamically regular reassignment upon resource variations through incorporating several soft constraints into the original scheduling heuristic the proposed technique can furthermore exploit the inherent timing slack between dependent tasks thus retaining the spatial and temporal locality of the original schedule the efficacy of the proposed technique is confirmed by incorporating it into widely adopted list scheduling heuristic and experimentally verifying it in the context of single processor deallocations
logic programming update languages were proposed as an extension of logic programming that allows modeling the dynamics of knowledge bases where both extensional facts and intensional knowledge rules may change over time due to updates despite their generality these languages do not provide means to directly access past states of the evolving knowledge they are limited to so called markovian change ie changes entirely determined by the current state we remedy this limitation by extending the logic programming update language evolp with ltl like temporal operators that allow referring to the history of the evolving knowledge base and show how this can be implemented in logic programming framework
due to the importance of skyline query in many applications it has attracted much attention recently given an dimensional dataset point is said to dominate another point if is better than in at least one dimension and equal to or better than in the remaining dimensions recently li et al proposed to analyze more general dominant relationship in business model where users are more interested in the details of the dominant relationship in dataset ie point dominates how many other points in this paper we further generalize this problem to the case where users are interested in who these dominated points are we show that the framework proposed in can not efficiently solve this problem we find the interrelated connection between the partial order and the dominant relationship based on this discovery we propose efficient algorithms to answer the general dominant relationship queries by querying the partial order representation of spatial datasets extensive experiments illustrate the effectiveness and efficiency of our methods
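as a concrete illustration of the dominance relation used above, here is a minimal brute force sketch, assuming larger values are better in every dimension, it answers "who are the dominated points" directly, whereas the paper's contribution is answering the same question far more efficiently via a partial order representation, which this sketch does not implement

```python
# a minimal sketch of dominance and a brute-force "who is dominated" query;
# the partial-order-based algorithms of the paper are not reproduced here

def dominates(p, q):
    """p dominates q if p is at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def dominated_by(point, dataset):
    """Return the points of `dataset` that `point` dominates (brute force)."""
    return [q for q in dataset if dominates(point, q)]

# usage
data = [(3, 4), (1, 2), (5, 1), (2, 2)]
print(dominated_by((3, 4), data))   # [(1, 2), (2, 2)]
```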
binary decision diagrams bdds have recently become widely accepted as space efficient method of representing relations in points to analyses when bdds are used to represent relations each element of domain is assigned bit pattern to represent it but not every bit pattern represents an element the circuit design model checking and verification communities have achieved significant reductions in bdd sizes using zero suppressed bdds zbdds to avoid the overhead of these don’t care bit patterns we adapt bdd based program analyses to use zbdds instead of bdds our experimental evaluation studies the space requirements of zbdds for both context insensitive and context sensitive program analyses and shows that zbdds can greatly reduce the space requirements for expensive context sensitive points to analysis using zbdds to reduce the size of the relations allows compiler or other software analysis tools to analyze larger programs with greater precision we also provide metric that can be used to estimate whether zbdds will be more compact than bdds for given analysis
very recently topic model based retrieval methods have produced good results using latent dirichlet allocation lda model or its variants in language modeling framework however for the task of retrieving annotated documents when using the lda based methods some post processing is required outside the model in order to make use of multiple word types that are specified by the annotations in this paper we explore new retrieval methods using multitype topic model that can directly handle multiple word types such as annotated entities category labels and other words that are typically used in wikipedia we investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection and show the effectiveness of our methods through experiments on entity ranking using wikipedia collection
cost estimation is vital task in most important software project decisions such as resource allocation and bidding analogy based cost estimation is particularly transparent as it relies on historical information from similar past projects whereby similarities are determined by comparing the projects key attributes and features however one crucial aspect of the analogy based method is not yet fully accounted for the different impact or weighting of project’s various features current approaches either try to find the dominant features or require experts to weight the features neither of these yields optimal estimation performance therefore we propose to allocate separate weights to each project feature and to find the optimal weights by extensive search we test this approach on several real world data sets and measure the improvements with commonly used quality metrics we find that this method increases estimation accuracy and reliability reduces the model’s volatility and thus is likely to increase its acceptance in practice and indicates upper limits for analogy based estimation quality as measured by standard metrics
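to make the idea of per feature weights concrete, here is a minimal sketch of feature weighted analogy based estimation, assuming numeric features, effort is the mean effort of the k most similar past projects under a weighted distance and the weights are tuned by simple random search over leave one out absolute error, this illustrates the weight search idea only, not the paper's exact procedure or quality metrics

```python
# a minimal sketch of weighted analogy-based effort estimation with a random
# search over per-feature weights; the search strategy and error measure are
# illustrative assumptions
import random

def predict(weights, projects, efforts, query, k=3, skip=None):
    dist = lambda p: sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, query))
    idx = [i for i in range(len(projects)) if i != skip]
    idx.sort(key=lambda i: dist(projects[i]))
    nearest = idx[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)

def loo_error(weights, projects, efforts, k=3):
    # leave-one-out mean absolute error of the analogy-based estimate
    return sum(abs(predict(weights, projects, efforts, projects[i], k, skip=i) - efforts[i])
               for i in range(len(projects))) / len(projects)

def search_weights(projects, efforts, n_features, iters=200, k=3):
    best_w, best_err = [1.0] * n_features, float("inf")
    for _ in range(iters):
        w = [random.random() for _ in range(n_features)]
        err = loo_error(w, projects, efforts, k)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```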
problem with many of today’s appliance interfaces is that they are inconsistent for example the procedure for setting the time on alarm clocks and vcrs differs even among different models made by the same manufacturer finding particular functions can also be challenge because appliances often organize their features differently this paper presents system called uniform which approaches this problem by automatically generating remote control interfaces that take into account previous interfaces that the user has seen during the generation process uniform is able to automatically identify similarities between different devices and users may specify additional similarities the similarity information allows the interface generator to use the same type of controls for similar functions place similar functions so that they can be found with the same navigation steps and create interfaces that have similar visual appearance
the use of domain specific languages and appropriate software architectures are currently seen as the way to enhance reusability and improve software productivity here we outline use of algebraic software methodologies and advanced program constructors to improve the abstraction level of software for scientific computing this leads us to the language of coordinate free numerics as an alternative to the traditional coordinate dependent array notation this provides the backdrop for the three accompanying papers "coordinate free programming of computational fluid dynamics problems" centered around an example of using coordinate free numerics "machine and collection abstractions for user implemented data parallel programming" exploiting the higher abstraction level when parallelising code and "an algebraic programming style for numerical software and its optimization" looking at high level transformations enabled by the domain specific programming style
temporal constraints are often set when complex science processes are modelled as scientific workflow specifications however many existing processes such as climate modelling often have only few coarse grained temporal constraints globally this is not sufficient to control overall temporal correctness as we can not find temporal violations locally in time for handling local handling affects fewer workflow activities and hence is more cost effective than global handling with coarse grained temporal constraints therefore in this paper we systematically investigate how to localise group of fine grained temporal constraints so that temporal violations can be identified locally for better handling cost effectiveness the corresponding algorithms are developed the quantitative evaluation demonstrates that with local fine grained temporal constraints we can improve handling cost effectiveness significantly compared with coarse grained ones only
wireless sensor networks are moving towards emerging standards such as ip zigbee and wirelesshart which makes interoperability testing important interoperability testing is performed today through black box testing with vendors physically meeting to test their equipment black box testing can test interoperability but gives no detailed information of the internals in the nodes during the testing black box testing is required because existing simulators cannot simultaneously simulate sensor nodes with different firmware for standards such as ip and wirelesshart white box interoperability testing approach is desired since it gives details on both performance and clues about why tests succeeded or failed to allow white box testing we propose simulation based approach to interoperability testing where the firmware from different vendors is run in the same simulator we extend our mspsim emulator and cooja wireless sensor network simulator to support interoperable simulation of sensor nodes with firmware from different vendors to demonstrate both cross vendor interoperability and the benefits of white box interoperability testing we run the state of the art contiki and tinyos operating systems in single simulation because of the white box testing we can do performance measurement and power profiling over both operating systems
we develop framework to study probabilistic sampling algorithms that approximate general functions of the form $f(x_1, \ldots, x_n)$ where domain and range are arbitrary sets our goal is to obtain lower bounds on the query complexity of functions namely the number of input variables $x_i$ that any sampling algorithm needs to query to approximate $f(x_1, \ldots, x_n)$ we define two quantitative properties of functions the block sensitivity and the minimum hellinger distance that give us techniques to prove lower bounds on the query complexity these techniques are quite general easy to use yet powerful enough to yield tight results our applications include the mean and higher statistical moments the median and other selection functions and the frequency moments where we obtain lower bounds that are close to the corresponding upper bounds we also point out some connections between sampling and streaming algorithms and lossy compression schemes
recently active behavior has received attention in the xml field to automatically react to occurred events aside from proprietary approaches for enriching xml with active behavior the w3c standardized the document object model dom event module for the detection of events in xml documents when using any of these approaches however it is often impossible to decide which event to react upon because not single event but combination of multiple events ie composite event determines situation to react upon the paper presents the first approach for detecting composite events in xml documents by addressing the peculiarities of xml events which are caused by their hierarchical order in addition to their temporal order it also provides for the detection of satisfied multiplicity constraints defined by xml schemas thereby the approach enables applications operating on xml documents to react to composite events which have richer semantics
hard to predict system behavior and/or reliability issues resulting from migrating to new technology nodes require considering runtime adaptivity in future on chip systems runtime observability is prerequisite for runtime adaptivity as it provides necessary system information gathered on the fly we are presenting the first comprehensive runtime observability infrastructure for an adaptive network on chip architecture which is flexible eg in choosing the routing path hardly intrusive and requires little additional overhead around of the total link bandwidth the hardware overhead is negligible too and is in fact less than the hardware savings due to resource multiplexing capabilities that are achieved through runtime observability adaptivity as an example our on demand buffer assignment scheme increases the buffer utilization and decreases the overall buffer requirements by an average of the buffer area amounts to about of the entire router area in our case study analysis compared to fixed buffer assignment scheme our runtime observability on an average also increases the connection success rate by compared to the case without runtime observability for the applications from the es benchmark suite we show the advantages obtained through runtime observability and compare with state of the art communication centric designs
wide area data delivery requires timely propagation of up to date information to thousands of clients over wide area network applications include web caching rss source monitoring and email access via mobile network data sources vary widely in their update patterns and may experience different update rates at different times or unexpected changes to update patterns traditional data delivery solutions are either push based which requires servers to push updates to clients or pull based which require clients to check for updates at servers while push based solutions ensure timely data delivery they are not always feasible to implement and may not scale to large number of clients in this article we present adaptive pull based policies that explicitly aim to reduce the overhead of contacting remote servers compared to existing pull based policies while meeting freshness requirements we model updates to data sources using update histories and present two novel history based policies to estimate when updates occur they are based on individual history and aggregate history these policies are presented within an architectural framework that supports their deployment either client side or server side we further develop two adaptive policies to handle objects that initially may have insufficient history or objects that experience changes in update patterns extensive experimental evaluation using three data traces from diverse applications shows that history based policies can reduce contact between clients and servers by up to percent compared to existing pull based policies while providing comparable level of data freshness our experiments further demonstrate that our adaptive policies can select the best policy to match the behavior of an object and perform better than any individual policy thus they dominate standalone policies
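to make the individual history idea concrete, here is a minimal sketch of a pull policy that estimates the mean inter update interval from an object's observed update timestamps and schedules the next poll accordingly, the freshness_factor knob, polling somewhat before the expected next update, is an illustrative assumption rather than a parameter from the article

```python
# a minimal sketch of an individual-history pull policy: estimate the mean
# inter-update gap and poll shortly before the expected next update

def next_poll_time(update_history, now, freshness_factor=0.5, default_interval=3600.0):
    """update_history: sorted timestamps (seconds) of past observed updates."""
    if len(update_history) < 2:
        return now + default_interval            # not enough history yet
    gaps = [b - a for a, b in zip(update_history, update_history[1:])]
    mean_gap = sum(gaps) / len(gaps)
    expected_next_update = update_history[-1] + mean_gap
    # poll partway through the remaining expected gap, but never in the past
    return max(now, expected_next_update - freshness_factor * mean_gap)
```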
in recent years active learning methods based on experimental design achieve state of the art performance in text classification applications although these methods can exploit the distribution of unlabeled data and support batch selection they cannot make use of labeled data which often carry useful information for active learning in this paper we propose novel active learning method for text classification called supervised experimental design sed which seamlessly incorporates label information into experimental design experimental results show that sed outperforms its counterparts which either discard the label information even when it is available or fail to exploit the distribution of unlabeled data
family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models the kernels are based on the heat equation on the riemannian manifold defined by the fisher information metric associated with statistical family and generalize the gaussian kernel of euclidean space as an important special case kernels based on the geometry of multinomial families are derived leading to kernel based learning algorithms that apply naturally to discrete data bounds on covering numbers and rademacher averages for the kernels are proved using bounds on the eigenvalues of the laplacian on riemannian manifolds experimental results are presented for document classification for which the use of multinomial geometry is natural and well motivated and improvements are obtained over the standard use of gaussian or linear kernels which have been the standard for text classification
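for concreteness, a sketch of the standard quantities for the multinomial case, stated here from general information geometry rather than quoted from the paper: under the fisher metric the multinomial simplex embeds isometrically in a sphere, giving a closed form geodesic distance, and the heat kernel is approximated to leading order by a gaussian in that distance, generalizing the euclidean gaussian kernel

```latex
% geodesic (Fisher) distance between two multinomial parameter vectors and the
% leading-order (parametrix) approximation of the heat kernel built from it;
% a sketch of the standard construction, not a formula quoted from the paper
d(\theta, \theta') \;=\; 2 \arccos\!\Bigl(\sum_{i} \sqrt{\theta_i\,\theta_i'}\Bigr),
\qquad
K_t(\theta, \theta') \;\approx\; (4\pi t)^{-n/2}\,
\exp\!\Bigl(-\tfrac{d(\theta, \theta')^{2}}{4t}\Bigr).
```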
an improved understanding of the relationship between search intent result quality and searcher behavior is crucial for improving the effectiveness of web search while recent progress in user behavior mining has been largely focused on aggregate server side click logs we present new class of search behavior models that also exploit fine grained user interactions with the search results we show that mining these interactions such as mouse movements and scrolling can enable more effective detection of the user’s search goals potential applications include automatic search evaluation improving search ranking result presentation and search advertising we describe extensive experimental evaluation over both controlled user studies and logs of interaction data collected from hundreds of real users the results show that our method is more effective than the current state of the art techniques both for detection of searcher goals and for an important practical application of predicting ad clicks for given search session
initially designers only had keyboard and lines of text to design then the mouse enabled richer design ecosystem with two dimensional planes of ui now the design and research communities have access to multi touch and gestural interfaces which have been released on mass market scale this allows them to design and develop new unique and richer design patterns and approaches these methods are no longer confined to research projects or innovation labs but are now offered on large scale to millions of consumers with these new interface behaviors in combination with multiple types of hardware devices that can affect the interface there are new problems and patterns that have increased the complexity of designing interfaces the aim of this sig is to provide forum for designers researchers and usability professionals to discuss these new and emerging technology trends for multi touch and gesture interfaces as well as discuss current design patterns within these interfaces our goal is to cross pollinate ideas and current solutions from practitioners and researchers across communities to help drive awareness of this new field for those interested in just starting in or currently involved in the design of these systems
impostors are image based primitives commonly used to replace complex geometry in order to reduce the rendering time needed for displaying complex scenes however big problem is the huge amount of memory required for impostors this paper presents an algorithm that automatically places impostors into scene so that desired frame rate and image quality is always met while at the same time not requiring enormous amounts of impostor memory the low memory requirements are provided by new placement method and through the simultaneous use of other acceleration techniques like visibility culling and geometric levels of detail
many computer science departments are debating the role of programming languages in the curriculum these discussions often question the relevance and appeal of programming languages content for today’s students in our experience domain specific little languages projects provide compelling illustration of the importance of programming language concepts this paper describes projects that prototype mainstream applications such as powerpoint turbotax and animation scripting we have used these exercises as modules in non programming languages courses including courses for first year students such modules both encourage students to study linguistic topics in more depth and provide linguistic perspective to students who might not otherwise be exposed to the area
modern day datacenters host hundreds of thousands of servers that coordinate tasks in order to deliver highly available cloud computing services these servers consist of multiple hard disks memory modules network cards processors etc each of which while carefully engineered are capable of failing while the probability of seeing any such failure in the lifetime typically years in industry of server can be somewhat small these numbers get magnified across all devices hosted in datacenter at such large scale hardware component failure is the norm rather than an exception hardware failure can lead to degradation in performance to end users and can result in losses to the business sound understanding of the numbers as well as the causes behind these failures helps improve operational experience by not only allowing us to be better equipped to tolerate failures but also to bring down the hardware cost through engineering directly leading to saving for the company to the best of our knowledge this paper is the first attempt to study server failures and hardware repairs for large datacenters we present detailed analysis of failure characteristics as well as preliminary analysis on failure predictors we hope that the results presented in this paper will serve as motivation to foster further research in this area
we describe our visualization process for particle based simulation of the formation of the first stars and their impact on cosmic history the dataset consists of several hundred time steps of point simulation data with each time step containing approximately two million point particles for each time step we interpolate the point data onto regular grid using method taken from the radiance estimate of photon mapping we import the resulting regular grid representation into paraview with which we extract isosurfaces across multiple variables our images provide insights into the evolution of the early universe tracing the cosmic transition from an initially homogeneous state to one of increasing complexity specifically our visualizations capture the build up of regions of ionized gas around the first stars their evolution and their complex interactions with the surrounding matter these observations will guide the upcoming james webb space telescope the key astronomy mission of the next decade
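as a concrete illustration of gathering scattered particle data onto a regular grid in the spirit of photon mapping's radiance estimate, here is a minimal sketch, the grid extents, the choice of k and the use of scipy's cKDTree are assumptions for illustration, not details taken from the paper

```python
# a minimal sketch of a k-nearest-neighbour gather: the value at each grid
# point is the sum of the k nearest particles divided by the gathering
# sphere's volume (a photon-mapping-style density estimate)
import numpy as np
from scipy.spatial import cKDTree

def particles_to_grid(positions, values, grid_shape, bounds, k=32):
    """positions: (N, 3) particle coordinates; values: (N,) particle quantity."""
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    lo, hi = bounds
    axes = [np.linspace(lo[d], hi[d], grid_shape[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    grid_points = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)

    tree = cKDTree(positions)
    dist, idx = tree.query(grid_points, k=k)      # k nearest particles per grid point
    radius = dist[:, -1]                          # radius of the gathering sphere
    volume = (4.0 / 3.0) * np.pi * radius ** 3
    density = values[idx].sum(axis=1) / np.maximum(volume, 1e-30)
    return density.reshape(grid_shape)
```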
first generation corba middleware was reasonably successful at meeting the demands of request response applications with best effort quality of service qos requirements supporting applications with more stringent qos requirements poses new challenges for next generation real time corba middleware however this paper provides three contributions to the design and optimization of real time corba middleware first we outline the challenges faced by real time orbs implementers focusing on optimization principle patterns that can be applied to corba’s object adapter and orb core second we describe how tao our real time corba implementation addresses these challenges and applies key orb optimization principle patterns third we present the results of empirical benchmarks that compare the impact of tao’s design strategies on orb efficiency predictability and scalability our findings indicate that orbs must be highly configurable and adaptable to meet the qos requirements for wide range of real time applications in addition we show how tao can be configured to perform predictably and scalably which is essential to support real time applications key result of our work is to demonstrate that the ability of corba orbs to support real time systems is mostly an implementation detail thus relatively few changes are required to the standard corba reference model and programming api to support real time applications
while it is becoming more common to see model checking applied to software requirements specifications it is seldom applied to software implementations the automated software engineering group at nasa ames is currently investigating the use of model checking for actual source code with the eventual goal of allowing software developers to augment traditional testing with model checking because model checking suffers from the state explosion problem one of the main hurdles for program model checking is reducing the size of the program in this paper we investigate the use of abstraction techniques to reduce the state space of real time operating system kernel written in we show how informal abstraction arguments could be formalized and improved upon within the framework of predicate abstraction technique based on abstract interpretation we introduce some extensions to predicate abstraction that allow it to be used within the class instance framework of object oriented languages we then demonstrate how these extensions were integrated into an abstraction tool that performs automated predicate abstraction of java programs
image annotation and classification are important areas where pattern recognition algorithms can be applied in this article we report the insights that we have gained during our participation in the imageclef medical annotation task during the years and grayscale radiograph images taken from clinical routine had to be classified into one of the base classes or labeled with attributes which described various properties of the radiograph we present an algorithm based on local relational features which is robust with respect to illumination changes it incorporates the geometric constellation of the feature points during the matching process and thus obtains superior performance furthermore hierarchical classification scheme is presented which reduces the computational complexity of the classifier
energy consumption in hosting internet services is becoming pressing issue as these services scale up dynamic server provisioning techniques are effective in turning off unnecessary servers to save energy such techniques mostly studied for request response services face challenges in the context of connection servers that host large number of long lived tcp connections in this paper we characterize unique properties performance and power models of connection servers based on real data trace collected from the deployed windows live messenger using the models we design server provisioning and load dispatching algorithms and study subtle interactions between them we show that our algorithms can save significant amount of energy without sacrificing user experiences
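to illustrate the provisioning side of the abstract above, here is a minimal sketch that keeps just enough servers on for a forecast number of long lived connections, with headroom and hysteresis so servers are not toggled on every small fluctuation, the capacity, headroom and hysteresis values are illustrative assumptions, not figures from the paper

```python
# a minimal sketch of connection-aware server provisioning with headroom and
# hysteresis; numeric parameters are illustrative only
import math

def servers_needed(forecast_connections, per_server_capacity=50_000, headroom=0.2):
    # provision for the forecast load plus a safety margin
    return max(1, math.ceil(forecast_connections * (1 + headroom) / per_server_capacity))

def provision(current_servers, forecast_connections, hysteresis=1):
    target = servers_needed(forecast_connections)
    if abs(target - current_servers) <= hysteresis:
        return current_servers          # avoid churning servers for small changes
    return target
```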
as data warehousing applications grow in size existing data organizations and access strategies such as relational tables and tree indexes are becoming increasingly ineffective the two primary reasons for this are that these datasets involve many attributes and the queries on the data usually involve conditions on small subsets of the attributes two strategies are known to address these difficulties well namely vertical partitioning and bitmap indexes in this paper we summarize our experience of implementing number of bitmap index schemes on vertically partitioned data tables one important observation is that simply scanning the vertically partitioned data tables is often more efficient than using tree based indexes to answer ad hoc range queries on static datasets for these range queries compressed bitmap indexes are in most cases more efficient than scanning vertically partitioned tables we evaluate the performance of two different compression schemes for bitmap indexes stored in various ways using the compression scheme called word aligned hybrid code wah to store the bitmaps in plain files shows the best overall performance for bitmap indexes tests indicate that our bitmap index strategy based on wah is not only efficient for attributes of low cardinality say but also for high cardinality attributes with or more distinct values
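as a concrete illustration of the bitmap index idea, here is a minimal equality encoded sketch where a range query is answered by OR-ing the bitmaps of the qualifying values, python integers stand in for the bit vectors and the wah compression evaluated in the paper is not reproduced

```python
# a minimal sketch of an equality-encoded bitmap index; uncompressed bitmaps
# only, no WAH compression

def build_bitmap_index(column):
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)   # one bitmap per distinct value
    return index

def range_query(index, low, high):
    """Return the row ids whose value v satisfies low <= v <= high."""
    result = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            result |= bitmap                              # OR the qualifying bitmaps
    return [row for row in range(result.bit_length()) if (result >> row) & 1]

# usage
col = [5, 2, 9, 5, 7, 2]
idx = build_bitmap_index(col)
print(range_query(idx, 4, 8))   # rows with values in [4, 8] -> [0, 3, 4]
```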
we consider the parallels between the preference elicitation problem in combinatorial auctions and the problem of learning an unknown function from learning theory we show that learning algorithms can be used as basis for preference elicitation algorithms the resulting elicitation algorithms perform polynomial number of queries we also give conditions under which the resulting algorithms have polynomial communication our conversion procedure allows us to generate combinatorial auction protocols from learning algorithms for polynomials monotone dnf and linear threshold functions in particular we obtain an algorithm that elicits xor bids with polynomial communication
this paper addresses the issue of social recommendation based on collaborative filtering cf algorithms social recommendation emphasizes utilizing various attributes information and relations in social networks to assist recommender systems although recommendation techniques have obtained distinct developments over the decades traditional cf algorithms still have these following two limitations relational dependency within predictions an important factor especially when the data is sparse is not being utilized effectively and straightforward methods for combining features like linear integration suffer from high computing complexity in learning the weights by enumerating the whole value space making it difficult to combine various information into an unified approach in this paper we propose novel model multi scale continuous conditional random fields mccrf as framework to solve above problems for social recommendations in mccrf relational dependency within predictions is modeled by the markov property thus predictions are generated simultaneously and can help each other this strategy has never been employed previously besides diverse information and relations in social network can be modeled by state and edge feature functions in mccrf whose weights can be optimized globally thus both problems can be solved under this framework in addition we propose to utilize markov chain monte carlo mcmc estimation methods to solve the difficulties in training and inference processes of mccrf experimental results conducted on two real world data have demonstrated that our approach outperforms traditional cf algorithms additional experiments also show the improvements from the two factors of relational dependency and feature combination respectively
we propose novel approach for defining and querying super peer within schema based super peer network organized into two level architecture the low level called the peer level which contains mediator node the second one called super peer level which integrates mediators peers with similar content we focus on single super peer and propose method to define and solve query fully implemented in the sewasie project prototype the problem we faced is relevant as super peer is two level data integrated system then we are going beyond traditional setting in data integration we have two different levels of global as view mappings the first mapping is at the super peer level and maps several global virtual views gvvs of peers into the gvv of the super peer the second mapping is within peer and maps the data sources into the gvv of the peer moreover we propose an approach where the integration designer supported by graphical interface can implicitly define mappings by using resolution functions to solve data conflicts and the full disjunction operator that has been recognized as providing natural semantics for data merging queries
the rapid growth of heterogeneous devices and diverse networks in our daily life makes it very difficult if not impossible to build one size fits all application or protocol which can run well in such dynamic environment adaptation has been considered as general approach to address the mismatch problem between clients and servers however we envision that the missing part which is also big challenge is how to inject and deploy adaptation functionality into the environment in this paper we propose novel application level protocol adaptation framework fractal which uses the mobile code technology for protocol adaptation and leverages existing content distribution networks cdn for protocol adaptors mobile codes deployment to the best of our knowledge fractal is the first application level protocol adaptation framework that considers the real deployment problem using mobile code and cdn to evaluate the proposed framework we have implemented two case studies an adaptive message encryption protocol and an adaptive communication optimization protocol in the adaptive message encryption protocol fractal always chooses proper encryption algorithm according to different application requirements and device characteristics and the adaptive communication optimization protocol is capable of dynamically selecting the best one from four communication protocols including direct sending gzip bitmap and vary sized blocking for different hardware and network configurations in comparison with other adaptation approaches evaluation results show the proposed adaptive approach performs very well on both the client side and server side for some clients the total communication overhead reduces compared with no protocol adaptation mechanism and compared with the static protocol adaptation approach
as processor technology continues to advance at rapid pace the principal performance bottleneck of shared memory systems has become the memory access latency in order to understand the effects of cache and memory hierarchy on system latencies performance analysts perform benchmark analysis on existing multiprocessors in this study we present detailed comparison of two architectures the hp class and the sgi origin our goal is to compare and contrast design techniques used in these multiprocessors we present the impact of processor design cache memory hierarchies and coherence protocol optimizations on the memory system performance of these multiprocessors we also study the effect of parallelism overheads such as process creation and synchronization on the user level performance of these multiprocessors our experimental methodology uses microbenchmarks as well as scientific applications to characterize the user level performance our microbenchmark results show the impact of cache size and tlb size on uniprocessor load store latencies the effect of coherence protocol design optimizations and data sharing patterns on multiprocessor memory access latencies and finally the overhead of parallelism our application based evaluation shows the impact of problem size dominant sharing patterns and number of processors used on speedup and raw execution time finally we use hardware counter measurements to study the correlation of system level performance metrics and the application’s execution time performance
refactoring as software engineering discipline has emerged over recent years to become an important aspect of maintaining software refactoring refers to the restructuring of software according to specific mechanics and principles in this paper we describe an analysis of the results from tool whose purpose was to identify and extract refactorings from seven open source java systems in particular we analyzed the mechanics of the most commonly and least commonly applied refactorings to try and account for their frequency results showed the most common refactorings of the fifteen coined gang of six to be generally those with high in degree and low out degree when mapped on dependency graph the same refactorings also featured strongly in the remedying of bad code smells remarkably and surprisingly inheritance and encapsulation based refactorings were found to have been applied relatively infrequently we offer explanations for why this may be the case the paper thus identifies core refactorings central to many of the changes made by developers on open source systems while we can not guarantee that developers consciously undertake refactoring in any sense the empirical results demonstrate that simple renaming and moving fields methods between classes are common components of open source system re engineering from wider software engineering perspective knowledge of what modification will incur in likely sub tasks is of value to developers whether working on open source or other forms of software
the high dimensionality of massive data results in the discovery of large number of association rules the huge number of rules makes it difficult to interpret and react to all of the rules especially because many rules are redundant and contained in other rules we discuss how the sparseness of the data affects the redundancy and containment between the rules and provide new methodology for organizing and grouping the association rules with the same consequent it consists of finding metarules rules that express the associations between the discovered rules themselves the information provided by the metarules is used to reorganize and group related rules it is based only on data determined relationships between the rules we demonstrate the suggested approach on actual manufacturing data and show its effectiveness on several benchmark data sets
massively multiplayer online games mmogs and virtual worlds are among the most popular applications on the internet as player numbers increase the limits of the currently dominant client server architecture are becoming obvious in this paper we propose new distributed event dissemination protocol for virtual worlds and mmogs this protocol is based on the idea of mutual notification all players send their game event messages directly to all neighbouring players inside their area of interest aoi the connectedness of the system is ensured by binding neighbours they are selected using quad trees we show by simulation that the proposed system achieves practical performance for virtual worlds and mmogs
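to illustrate the mutual notification idea from the abstract above, here is a minimal sketch in which every player delivers its events directly to neighbours inside its area of interest, the plain distance test is an illustration only, the paper additionally maintains quad tree selected binding neighbours to keep the overlay connected, which this sketch omits

```python
# a minimal sketch of mutual notification inside an area of interest (AoI);
# binding-neighbour selection via quad trees is not reproduced here

class Player:
    def __init__(self, name, x, y, aoi_radius):
        self.name, self.x, self.y = name, x, y
        self.aoi_radius = aoi_radius
        self.neighbours = []            # other Player objects currently known
        self.inbox = []

    def in_aoi(self, other):
        return (self.x - other.x) ** 2 + (self.y - other.y) ** 2 <= self.aoi_radius ** 2

    def send_event(self, event):
        # mutual notification: deliver the event directly to AoI neighbours
        for peer in self.neighbours:
            if self.in_aoi(peer):
                peer.inbox.append((self.name, event))

# usage
a, b = Player("a", 0, 0, 10), Player("b", 3, 4, 10)
a.neighbours.append(b)
a.send_event("moved north")
print(b.inbox)   # [('a', 'moved north')]
```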
this paper presents ds novel peer to peer data sharing system ds is an architecture set of protocols and an implementation enabling the exchange of data among peers that are not necessarily connected to the internet peers can be either mobile or stationary it anticipates the information needs of users and fulfills them by searching for information among peers we evaluate via extensive simulations the effectiveness of our system for data dissemination among mobile devices with large number of user mobility scenarios we model several general data dissemination approaches and investigate the effect of the wireless coverage range ds host density query interval and cooperation strategy among the mobile hosts using theory from random walks random environments and diffusion of controlled processes we model one of these data dissemination schemes and show that the analysis confirms the simulation results for scheme
tracing garbage collectors traverse references from live program variables transitively tracing out the closure of live objects memory accesses incurred during tracing are essentially random given object may contain references to any other object since application heaps are typically much larger than hardware caches tracing results in many cache misses technology trends will make cache misses more important so tracing is prime target for prefetching simulation of java benchmarks running with the boehm demers weiser mark sweep garbage collector for projected hardware platform reveals high tracing overhead up to of elapsed time and that cache misses are problem applying boehm's default prefetching strategy yields improvements in execution time on average with incremental generational collection for gc intensive benchmarks but analysis shows that his strategy suffers from significant timing problems prefetches that occur too early or too late relative to their matching loads this analysis drives development of new prefetching strategy that yields up to three times the performance improvement of boehm's strategy for gc intensive benchmark average speedup and achieves performance close to that of perfect timing ie few misses for tracing accesses on some benchmarks validating these simulation results with live runs on current hardware produces average speedup of for the new strategy on gc intensive benchmarks with gc configuration that tightly controls heap growth in contrast boehm's default prefetching strategy is ineffective on this platform
in this paper we propose new node allocation scheme that is based on probabilistic approach to protect mobile ad hoc network manet from partitioning this scheme adopts the normal distribution for node placement and we refer to it as the normally distributed node allocation scheme nas the proposed scheme is able to achieve high connectivity and good degree of coverage by controlling the positions of nodes in the manet in fully distributed way in this scheme theoretical upper bound of partition probability can be derived in addition unlike the existing schemes nas automatically initializes node positions when the manet is just constructed extensive simulations are carried out to validate the performance of our proposed scheme
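as a concrete illustration of normally distributed node placement, here is a minimal sketch in which each node draws its target position from a two dimensional normal distribution centred on the network centre, the centre and standard deviation are illustrative parameters, not values from the paper, and the connectivity and coverage analysis is not reproduced

```python
# a minimal sketch of normally distributed node allocation for a MANET;
# centre and sigma are illustrative assumptions
import random

def allocate_positions(num_nodes, centre=(0.0, 0.0), sigma=100.0):
    cx, cy = centre
    # each coordinate is drawn independently from a normal distribution
    return [(random.gauss(cx, sigma), random.gauss(cy, sigma))
            for _ in range(num_nodes)]

# usage: place 50 nodes around the origin with a 100 m spread
positions = allocate_positions(50)
```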
high speed networks and rapidly improving microprocessor performance make the network of workstations an extremely important tool for parallel computing in order to speed up the execution of scientific applications shared memory is an attractive programming model for designing parallel and distributed applications where the programmer can focus on algorithmic development rather than data partition and communication based on this important characteristic the design of systems to provide the shared memory abstraction on physically distributed memory machines has been developed known as distributed shared memory dsm dsm is built using specific software to combine number of computer hardware resources into one computing environment such an environment not only provides an easy way to execute parallel applications but also combines available computational resources with the purpose of speeding up execution of these applications dsm systems need to maintain data consistency in memory which usually leads to communication overhead therefore there exists number of strategies that can be used to overcome this overhead issue and improve overall performance strategies such as prefetching have been proven to show great performance in dsm systems since they can reduce data access communication latencies from remote nodes on the other hand these strategies also transfer unnecessary prefetched pages to remote nodes in this research paper we focus on the access pattern during execution of parallel application and then analyze the data type and behavior of parallel applications we propose an adaptive data classification scheme to improve prefetching strategy with the goal to improve overall performance adaptive data classification scheme classifies data according to the accessing sequence of pages so that the home node uses past history access patterns of remote nodes to decide whether it needs to transfer related pages to remote nodes from experimental results we can observe that our proposed method can increase the accuracy of data access in effective prefetch strategy by reducing the number of page faults and misprefetching experimental results using our proposed classification scheme show performance improvement of about over the same benchmark applications running on top of an original jiajia dsm system
this paper reflects upon existing composite based hypertext versioning systems and presents two high level design spaces that capture the range of potential choices in system data models for versioning links and versioning hypertext structure these two design spaces rest upon foundation consisting of containment model describing choices for containment in hypertext systems and the design space for persistently recording an object’s revision history with applicability to all versioning systems two example points in the structure versioning design space are presented corresponding to most existing composite based hypertext versioning systems using the presented design spaces allows the data models of existing hypertext versioning systems to be decomposed and compared in principled way and provides new system designers significant insight into the design tradeoffs between various link and structure versioning approaches
we are interested in the computing frontier around an essential question about compiler construction having program and set of non parametric compiler optimization modules also called phases is it possible to find sequence of these phases such that the performance execution time for instance of the final generated program is optimal we prove in this article that this problem is undecidable in two general schemes of optimizing compilation iterative compilation and library optimization generation fortunately we give some simplified cases when this problem becomes decidable and we provide some algorithms not necessarily efficient that can answer our main question another essential question that we are interested in is parameter space exploration in optimizing compilation tuning optimizing compilation parameters in this case we assume fixed sequence of optimizations but each optimization phase is allowed to have parameter we try to figure out how to compute the best parameter values for all program transformations when the compilation sequence is given we also prove that this general problem is undecidable and we provide some simplified decidable instances
on line analytical processing olap has become one of the most powerful and prominent technologies for knowledge discovery in vldb very large database environments central to the olap paradigm is the data cube multi dimensional hierarchy of aggregate values that provides rich analytical model for decision support various sequential algorithms for the efficient generation of the data cube have appeared in the literature however given the size of contemporary data warehousing repositories multi processor solutions are crucial for the massive computational demands of current and future olap systems in this paper we discuss the cgmcube project multi year effort to design and implement multi processor platform for data cube generation that targets the relational database model rolap more specifically we discuss new algorithmic and system optimizations relating to thorough optimization of the underlying sequential cube construction method and detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction these optimizations were key in allowing us to build prototype that is able to produce data cube output at rate of over one terabyte per hour
wavelet analysis is practical tool for signal analysis and image processing traditional fourier transform can also transform the signal into the frequency domain but wavelet analysis is more attractive for its features of multi resolution and localization of frequency recently there has been significant development in the use of wavelet methods in the data mining process however the objective of the study described in this paper is twofold designing wavelet transform algorithm on the multiprocessor architecture and using this algorithm in mining spatial outliers of meteorological data spatial outliers are the spatial objects with distinct features from their surrounding neighbors outlier detection reveals important and valuable information from large spatial data sets as region outliers are commonly multi scale objects wavelet analysis is an effective tool to study them in this paper we present wavelet based approach and its applicability in outlier detection we design suite of algorithms to effectively discover region outliers and also parallel algorithm is designed to bring efficiency and speedup for the wavelet analysis the applicability and effectiveness of the developed algorithms are evaluated on real world meteorological dataset
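as a concrete illustration of flagging region outliers with a wavelet decomposition, here is a minimal sketch, the db2 wavelet, the single decomposition level and the threshold rule are assumptions for illustration, the paper's algorithms and their parallelisation are richer than this

```python
# a minimal sketch of wavelet-based region-outlier flagging on a gridded field:
# cells whose detail coefficients deviate strongly from the mean are reported
import numpy as np
import pywt

def region_outlier_mask(field, wavelet="db2", n_sigma=3.0):
    """field: 2-D numpy array; returns a boolean mask at the coarse resolution."""
    _, (horizontal, vertical, diagonal) = pywt.wavedec2(field, wavelet, level=1)
    detail = np.abs(horizontal) + np.abs(vertical) + np.abs(diagonal)
    threshold = detail.mean() + n_sigma * detail.std()
    return detail > threshold
```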
with the emergence of research prototypes programming using atomic blocks and transactional memory tm is becoming more attractive this paper describes our experience building and using debugger for programs written with these abstractions we introduce three approaches i debugging at the level of atomic blocks where the programmer is shielded from implementation details such as exactly what kind of tm is used or indeed whether lock inference is used instead ii debugging at the level of transactions where conflict rates read sets write sets and other tm internals are visible and iii debug time transactions which let the programmer manipulate synchronization from within the debugger eg enlarging the scope of an atomic block to try to identify bug in this paper we explain the rationale behind the new debugging approaches that we propose we describe the design and implementation of an extension to the windbg debugger enabling support for programs using atomic blocks and tm we also demonstrate the design of conflict point discovery technique for identifying program statements that introduce contention between transactions we illustrate how these techniques can be used by optimizing version of the genome application from stamp tm benchmark suite
we survey some of the main results regarding the complexity and expressive power of live sequence charts lscs we first describe the two main semantics given to lscs trace based semantics and an operational semantics the expressive power of the language is then examined by describing translations into various temporal logics some limitations of the language are also discussed finally we survey complexity results mainly due to bontemps and schobbens regarding the use of lscs for model checking execution and synthesis
automated recovery of system features and their designs from program source codes is important in reverse engineering and system comprehension it also helps in the testing of software an error that is made by users in an input to an execution of transaction and discovered only after the completion of the execution is called posttransaction user input error ptuie of the transaction for transaction in any database application usually it is essential to provide transactions for correcting the effect that could result from any ptuie of the transaction we discover some probable properties that exist between the control flow graph of transaction and the control flow graphs of transactions for correcting ptuie of the former transaction through recognizing these properties this paper presents novel approach for the automated approximate recovery of provisions and designs for transactions to correct ptuie of transactions in database application the approach recognizes these properties through analyzing the source codes of transactions in the database application statically
we explore the potential of hardware transactional memory htm to improve concurrent algorithms we illustrate number of use cases in which htm enables significantly simpler code to achieve similar or better performance than existing algorithms for conventional architectures we use sun’s prototype multicore chip code named rock to experiment with these algorithms and discuss ways in which its limitations prevent better results or would prevent production use of algorithms even if they are successful our use cases include concurrent data structures such as double ended queues work stealing queues and scalable non zero indicators as well as scalable malloc implementation and simulated annealing application we believe that our paper makes compelling case that htm has substantial potential to make effective concurrent programming easier and that we have made valuable contributions in guiding designers of future htm features to exploit this potential
in this paper we present framework for robust people detection in low resolution image sequences of highly cluttered dynamic scenes with non stationary background our model utilizes appearance features together with short and long term motion information in particular we boost integral gradient orientation histograms of appearance and short term motion outputs from the detector are maintained by tracker to correct any misdetections bayesian model is then deployed to further fuse long term motion information based on correlation experiments show that our model is more robust with better detection rate compared to the model of viola et al michael jones paul viola daniel snow detecting pedestrians using patterns of motion and appearance international journal of computer vision
abduction is one of the most important forms of reasoning it has been successfully applied to several practical problems such as diagnosis in this article we investigate whether the computational complexity of abduction can be reduced by an appropriate use of preprocessing this is motivated by the fact that part of the data of the problem namely the set of all possible assumptions and the theory relating assumptions and manifestations is often known before the rest of the problem in this article we show some complexity results about abduction when compilation is allowed
the definition and management of access rules eg to control access to business documents and business functions is fundamental task in any enterprise information system eis while there exists considerable work on how to specify and represent access rules only little research has been spent on access rule changes examples include the evolution of organisational models with need for subsequent adaptation of related access rules as well as direct access rule modifications eg to state previously defined rule more precisely this paper presents comprehensive change framework for the controlled evolution of role based access rules in eis first we consider changes of organisational models and elaborate how they affect existing access rules second we define change operations which enable direct adaptations of access rules in the latter context we define the formal semantics of access rule changes based on operator trees particularly this enables their unambiguous application ie we can precisely determine which effects are caused by respective rule changes this is important for example to be able to efficiently and correctly adapt user worklists in process aware information systems altogether this paper contributes to comprehensive life cycle support for access rules in adaptive eis
conceptual clustering techniques based on current theories of categorization provide way to design database schemas that more accurately represent classes an approach is presented in which classes are treated as complex clusters of concepts rather than as simple predicates an important service provided by the database is determining whether particular instance is member of class conceptual clustering algorithm based on theories of categorization aids in building classes by grouping related instances and developing class descriptions the resulting database schema addresses number of properties of categories including default values and prototypes analogical reasoning exception handling and family resemblance class cohesion results from trying to resolve conflicts between building generalized class descriptions and accommodating members of the class that deviate from these descriptions this is achieved by combining techniques from machine learning specifically explanation based learning and case based reasoning subsumption function is used to compare two class descriptions realization function is used to determine whether an instance meets an existing class description new function intersect is introduced to compare the similarity of two instances intersect is used in defining an exception condition exception handling results in schema modification this approach is applied to the database problems of schema integration schema generation query processing and view creation
an efficient job scheduling must ensure high throughput and good performance moreover in highly parallel systems where processors are critical resource high machine utilization becomes an essential aspect backfilling consists of moving jobs ahead in the queue given that they do not delay certain previously submitted jobs when the execution time of backfilled job was underestimated some action has to be taken with it abort suspend resume checkpoint restart remain executing in this paper we propose an alternative choice for that situation which consists of applying virtual malleability to the backfilled job this means that its processor partition will be reduced and as mpi jobs aren’t really malleable we make the job contend with itself for the use of processors by applying co scheduling in this way resources are freed and the job at the head of the queue has a chance to start executing in addition to this as mpi parallel jobs can be moldable we add this possibility to the scheme we obtained better performance than traditional backfilling in about especially in high machine utilization we also argue for the portability of our technique which does not require special support from the operating system as checkpointing does
we discover communities from social network data and analyze the community evolution these communities are inherent characteristics of human interaction in online social networks as well as paper citation networks also communities may evolve over time due to changes to individuals roles and social status in the network as well as changes to individuals research interests we present an innovative algorithm that deviates from the traditional two step approach to analyze community evolutions in the traditional approach communities are first detected for each time slice and then compared to determine correspondences we argue that this approach is inappropriate in applications with noisy data in this paper we propose facetnet for analyzing communities and their evolutions through robust unified process this novel framework will discover communities and capture their evolution with temporal smoothness given by historic community structures our approach relies on formulating the problem in terms of maximum posteriori map estimation where the community structure is estimated both by the observed networked data and by the prior distribution given by historic community structures then we develop an iterative algorithm with proven low time complexity which is guaranteed to converge to an optimal solution we perform extensive experimental studies on both synthetic datasets and real datasets to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods
in this paper we present unified formalism based on past temporal logic for specifying conditions and events in the rules for active database system this language permits specification of many time varying properties of database systems it also permits specification of temporal aggregates we present an efficient incremental algorithm for detecting conditions specified in this language the given algorithm for subclass of the logic was implemented on top of sybase
data access costs contribute significantly to the execution time of applications with complex data structures as the latency of memory accesses becomes high relative to processor cycle times application performance is increasingly limited by memory performance in some situations it may be reasonable to trade increased computation costs for reduced memory costs the contributions of this paper are three fold we provide detailed analysis of the memory performance of set of seven memory intensive benchmarks we describe computation regrouping general source level approach to improving the overall performance of these applications by improving temporal locality to reduce cache and tlb miss ratios and thus memory stall times and we demonstrate significant performance improvements from applying computation regrouping to our suite of seven benchmarks with computation regrouping we observe an average speedup of with individual speedups ranging from to most of this improvement comes from eliminating memory stall time
work on evaluating and improving the relevance of web search engines typically uses human relevance judgments or clickthrough data both these methods look at the problem of learning the mapping from queries to web pages in this paper we identify some issues with this approach and suggest an alternative approach namely learning mapping from web pages to queries in particular we use human computation games to elicit data about web pages from players that can be used to improve search we describe three human computation games that we developed with focus on page hunt single player game we describe experiments we conducted with several hundred game players highlight some interesting aspects of the data obtained and define the findability metric we also show how we automatically extract query alterations for use in query refinement using techniques from bitext matching the data that we elicit from players has several other applications including providing metadata for pages and identifying ranking issues
the closed world assumption cwa on databases expresses the assumption that an atom not in the database is false this assumption is applicable only in cases where the database has complete knowledge about the domain of discourse in this article we investigate locally closed databases that is databases that are sound but partially incomplete about their domain such databases consist of standard database instance augmented with collection of local closed world assumptions lcwas lcwa is local form of the cwa expressing that database relation is complete in certain area called window of expertise in this work we study locally closed databases both from knowledge representation and from computational perspective at the representation level the approach taken in this article distinguishes between the data that is conveyed by database and the metaknowledge about the area in which the data is complete we study the semantics of the lcwa’s and relate it to several knowledge representation formalisms at the reasoning level we study the complexity of and algorithms for two basic reasoning tasks computing certain and possible answers to queries and determining whether database has complete knowledge on query as the complexity of these tasks is unacceptably high we develop efficient approximate methods for query answering we also prove that for useful classes of queries and locally closed databases these methods are optimal and thus they solve the original query in tractable way as result we obtain classes of queries and locally closed databases for which query answering is tractable
we present in this paper role based model for programming distributed cscw systems this model supports specification of dynamic security and coordination requirements in such systems we also present here model checking methodology for verifying the security properties of design expressed in this model the verification methodology presented here is used to ensure correctness and consistency of design specification it is also used to ensure that sensitive security requirements cannot be violated when policy enforcement functions are distributed among the participants several aspect specific verification models are developed to check security properties such as task flow constraints information flow confidentiality and assignment of administrative privileges
in this paper we analyze the shkq software watermarking algorithm originally due to stern hachez koeune and quisquater the algorithm has been implemented within the sandmark framework system designed to allow effective study of software protection algorithms such as code obfuscation software watermarking and code tamper proofing targeting java bytecode the shkq algorithm embeds watermark in program using spread spectrum technique the idea is to spread the watermark over the entire application by modifying instruction frequencies spreading the watermark over the code provides high level of stealth and some manner of resilience against attack in this paper we describe the implementation of the shkq algorithm in particular the issues that arise when targeting java bytecodes we then present an empirical examination of the robustness of the watermark against wide variety of attacks we conclude that shkq while stealthy is easily attacked by simple distortive transformations
we present simple randomized algorithmic framework for connected facility location problems the basic idea is as follows we run black box approximation algorithm for the unconnected facility location problem randomly sample the clients and open the facilities serving sampled clients in the approximate solution via novel analytical tool which we term core detouring we show that this approach significantly improves over the previously best known approximation ratios for several np hard network design problems for example we reduce the approximation ratio for the connected facility location problem from to and for the single sink rent or buy problem from to we show that our connected facility location algorithms can be derandomized at the expense of slightly worse approximation ratio the versatility of our framework is demonstrated by devising improved approximation algorithms also for other related problems
we extend the well known tree diffie hellman technique used for the design of group key exchange gke protocols with robustness ie with resistance to faults resulting from possible system crashes network failures and misbehavior of the protocol participants we propose fully robust gke protocol using the novel tree replication technique our basic protocol version ensures security against outsider adversaries whereas its extension addresses optional insider security both protocols are proven secure assuming stronger adversaries gaining access to the internal states of participants our security model for robust gke protocols can be seen as step towards unification of some earlier security models in this area
building on simple information theoretic concepts we study two quantitative models of information leakage in the pi calculus the first model presupposes an attacker with an essentially unlimited computational power the resulting notion of absolute leakage measured in bits is in agreement with secrecy as defined by abadi and gordon process has an absolute leakage of zero precisely when it satisfies secrecy the second model assumes restricted observation scenario inspired by the testing equivalence framework where the attacker can only conduct repeated success or failure experiments on processes moreover each experiment has cost in terms of communication effort the resulting notion of leakage rate measured in bits per action is in agreement with the first model the maximum amount of information that can be extracted by repeated experiments coincides with the absolute leakage of the process moreover the overall extraction cost is at least where is the rate of the process the compositionality properties of the two models are also investigated
this paper reports our first set of results on managing uncertainty in data integration we posit that data integration systems need to handle uncertainty at three levels and do so in principled fashion first the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains eg bioinformatics it is not clear what the mappings should be second queries to the system may be posed with keywords rather than in structured form third the data from the sources may be extracted using information extraction techniques and so may yield imprecise data as first step to building such system we introduce the concept of probabilistic schema mappings and analyze their formal foundations we show that there are two possible semantics for such mappings by table semantics assumes that there exists correct mapping but we don’t know what it is by tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data we present the query complexity and algorithms for answering queries in the presence of approximate schema mappings and we describe an algorithm for efficiently computing the top answers to queries in such setting
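a minimal python illustration of by-table semantics for probabilistic schema mappings follows; the source attributes, mediated attribute and probabilities are invented, and the sketch only shows how an answer tuple's probability is the sum of the probabilities of the candidate mappings that produce it, not the paper's full query-answering algorithms

```python
# toy illustration of by-table semantics: each candidate mapping from a source
# attribute to the mediated attribute "phone" carries a probability, and a
# tuple's answer probability is the total probability of mappings producing it
source_rows = [{"name": "alice", "office-phone": "111", "home-phone": "222"}]

mappings = [
    ({"phone": "office-phone"}, 0.7),   # mediated "phone" <- office-phone
    ({"phone": "home-phone"},   0.3),   # mediated "phone" <- home-phone
]

def by_table_answers(rows, mappings, query_attr="phone"):
    probs = {}
    for mapping, p in mappings:
        src_attr = mapping[query_attr]
        for r in rows:
            answer = (r["name"], r[src_attr])
            probs[answer] = probs.get(answer, 0.0) + p
    return probs

print(by_table_answers(source_rows, mappings))
# {('alice', '111'): 0.7, ('alice', '222'): 0.3}
```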
we study the problem of generating efficient equivalent rewritings using views to compute the answer to query we take the closed world assumption in which views are materialized from base relations rather than views describing sources in terms of abstract predicates as is common when the open world assumption is used in the closed world model there can be an infinite number of different rewritings that compute the same answer yet have quite different performance query optimizers take logical plan rewriting of the query as an input and generate efficient physical plans to compute the answer thus our goal is to generate small subset of the possible logical plans without missing an optimal physical plan we first consider cost model that counts the number of subgoals in physical plan and show search space that is guaranteed to include an optimal rewriting if the query has rewriting in terms of the views we also develop an efficient algorithm for finding rewritings with the minimum number of subgoals we then consider cost model that counts the sizes of intermediate relations of physical plan without dropping any attributes and give search space for finding optimal rewritings our final cost model allows attributes to be dropped in intermediate relations we show that by careful variable renaming it is possible to do better than the standard supplementary relation approach by dropping attributes that the latter approach would retain experiments show that our algorithm of generating optimal rewritings has good efficiency and scalability
this paper focuses on privacy risks in health databases that arise in assistive environments where humans interact with the environment and this information is captured assimilated and events of interest are extracted the stakeholders of such an environment can range from caregivers to doctors and supporting family the environment also includes objects the person interacts with such as wireless devices that generate data about these interactions the data streams generated by such an environment are massive such databases are usually considered hidden ie are only accessible online via restrictive front end web interfaces security issues specific to such hidden databases however have been largely overlooked by the research community possibly due to the false sense of security provided by the restrictive access to such databases we argue that an urgent challenge facing such databases is the disclosure of sensitive aggregates enabled by recent studies on the sampling of hidden databases through its public web interface to protect sensitive aggregates we enunciate the key design principles propose three component design and suggest number of possible techniques that may protect sensitive aggregates while maintaining the service quality for normal search users our hope is that this paper sheds light on fruitful direction of future research in security issues related to hidden web databases
the behaviour of systems on chip soc is complex because they contain multiple processors that interact through concurrent interconnects such as networks on chip noc debugging such socs is hard based on classification of debug scope and granularity we propose that debugging should be communication centric and based on transactions communication centric debug focusses on the communication and the synchronisation between the ip blocks which are implemented by the interconnect using transactions we define and implement modular debug architecture based on noc monitors and dedicated high speed event distribution broadcast interconnect the manufacturing test scan chains and ieee test access ports tap are re used for configuration and debug data read out our debug architecture requires only small changes to the functional architecture the additional area cost is limited to the monitors and the event distribution interconnect which are of the noc area or less than of the soc area the debug architecture runs at noc functional speed and reacts very quickly to debug events to stop the soc close in time to the condition that raised the event the speed at which data is retrieved from the soc after stopping using the tap is mhz we prove our concepts and architecture with gate level implementation that includes the noc event distribution interconnect and clock reset and tap controllers we include gate level signal traces illustrating debug at message and transaction levels
environmental monitoring is one of the driving applications in the domain of sensor networks the lifetime of such systems is envisioned to exceed several years to achieve this longevity in unattended operation it is crucial to minimize energy consumption of the battery powered sensor nodes this paper proposes dozer data gathering protocol meeting the requirements of periodic data collection and ultra low power consumption the protocol comprises mac layer topology control and routing all coordinated to reduce energy wastage of the communication subsystem using tree based network structure packets are reliably routed towards the data sink parents thereby schedule precise rendezvous times for all communication with their children in deployed network consisting of tinyos enabled sensor nodes dozer achieves radio duty cycles in the magnitude of
we present new fast algorithm for rendering physically based soft shadows in ray tracing based renderers our method replaces the hundreds of shadow rays commonly used in stochastic ray tracers with single shadow ray and local reconstruction of the visibility function compared to tracing the shadow rays our algorithm produces exactly the same image while executing one to two orders of magnitude faster in the test scenes used our first contribution is two stage method for quickly determining the silhouette edges that overlap an area light source as seen from the point to be shaded secondly we show that these partial silhouettes of occluders along with single shadow ray are sufficient for reconstructing the visibility function between the point and the light source
we address the rating inference problem wherein rather than simply decide whether review is thumbs up or thumbs down as in previous sentiment analysis work one must determine an author’s evaluation with respect to multi point scale eg one to five stars this task represents an interesting twist on standard multi class text categorization because there are several different degrees of similarity between class labels for example three stars is intuitively closer to four stars than to one star we first evaluate human performance at the task then we apply meta algorithm based on metric labeling formulation of the problem that alters given ary classifier’s output in an explicit attempt to ensure that similar items receive similar labels we show that the meta algorithm can provide significant improvements over both multi class and regression versions of svms when we employ novel similarity measure appropriate to the problem
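as a hedged stand-in for the metric-labeling idea (not the authors' exact formulation or their svm setup), the python below adjusts per-item classifier costs so that similar items tend to receive nearby star labels, using a simple coordinate-descent sweep; the preference costs, similarity values and weight alpha are all made up

```python
import numpy as np

def relabel(pref, sim, alpha=1.0, labels=(1, 2, 3, 4, 5)):
    """Pick, for each item, the label minimizing
       classifier_cost(item, l) + alpha * sum_j sim[i][j] * |l - label_j|,
    iterating a few sweeps so neighbor labels can settle (a simple
    stand-in for solving the metric-labeling objective exactly)."""
    n = len(pref)
    assign = [labels[int(np.argmin(pref[i]))] for i in range(n)]  # classifier output
    for _ in range(5):
        for i in range(n):
            costs = []
            for k, l in enumerate(labels):
                smooth = sum(sim[i][j] * abs(l - assign[j]) for j in range(n) if j != i)
                costs.append(pref[i][k] + alpha * smooth)
            assign[i] = labels[int(np.argmin(costs))]
    return assign

# pref[i][k]: cost of giving item i the k-th label (e.g. a negated classifier score)
pref = np.array([[3.0, 1.0, 2.0, 4.0, 5.0],   # item 0 prefers 2 stars
                 [5.0, 2.2, 2.0, 4.0, 5.0],   # item 1 is torn between 2 and 3
                 [5.0, 4.0, 1.0, 2.0, 5.0]])  # item 2 prefers 3 stars
sim = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.9], [0.1, 0.9, 0.0]]
print(relabel(pref, sim, alpha=0.5))
```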
like html many xml documents are resident on native file systems since xml data is irregular and verbose the disk space and the network bandwidth are wasted to overcome the verbosity problem research on compressors for xml data has been conducted some xml compressors do not support querying compressed data while other xml compressors which support querying compressed data blindly encode tags and data values using predefined encoding methods existing xml compressors do not provide the facility for updates on compressed xml data in this article we propose xpress an xml compressor which supports direct updates and efficient evaluations of queries on compressed xml data xpress adopts novel encoding method called reverse arithmetic encoding which encodes label paths of xml data and applies diverse encoding methods depending on the types of data values experimental results with real life data sets show that xpress achieves significant improvements on query performance for compressed xml data and reasonable compression ratios on average the query performance of xpress is times better than that of an existing xml compressor and the compression ratio of xpress is about percent additionally we demonstrate the efficiency of the updates performed directly on compressed xml data
voltage islands provide very good opportunity for minimizing the energy consumption of core based networks on chip noc design by utilizing unique supply voltage for the cores on each island this paper addresses various complex design issues for noc implementation with voltage islands novel design framework based on genetic algorithm is proposed to optimize both the computation and communication energy with the creation of voltage islands concurrently for the noc using multiple supply voltages the algorithm automatically performs tile mapping routing path allocation link speed assignment voltage island partitioning and voltage assignment simultaneously experiments using both real life and artificial benchmarks were performed and results show that by using the proposed scheme significant energy reduction is obtained
as more sensitive data is captured in electronic form security becomes more and more important data encryption is the main technique for achieving security while in the past enterprises were hesitant to implement database encryption because of the very high cost complexity and performance degradation they now have to face the ever growing risk of data theft as well as emerging legislative requirements data encryption can be done at multiple tiers within the enterprise different choices on where to encrypt the data offer different security features that protect against different attacks one class of attack that needs to be taken seriously is the compromise of the database server its software or administrator secure way to address this threat is for dbms to directly process queries on the ciphertext without decryption we conduct comprehensive study on answering sum and avg aggregation queries in such system model by using secure homomorphic encryption scheme in novel way we demonstrate that the performance of such solution is comparable to traditional symmetric encryption scheme eg des in which each value is decrypted and the computation is performed on the plaintext clearly this traditional encryption scheme is not viable solution to the problem because the server must have access to the secret key and the plaintext which violates our system model and security requirements we study the problem in the setting of read optimized dbms for data warehousing applications in which sum and avg are frequent and crucial
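the abstract does not name a particular homomorphic scheme, so as a hedged example the python below uses paillier encryption, a standard additively homomorphic choice, to let a server sum ciphertexts without decrypting; the key sizes and salary values are tiny and purely illustrative (python 3.9+ assumed for math.lcm and the modular inverse form of pow)

```python
import math, random

# toy Paillier keypair (tiny primes for illustration only -- not secure)
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)          # valid because g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    l = (pow(c, lam, n2) - 1) // n
    return (l * mu) % n

# the server adds ciphertexts (multiplication mod n^2) without ever decrypting
salaries = [300, 125, 75]
ciphertexts = [encrypt(m) for m in salaries]
encrypted_sum = 1
for c in ciphertexts:
    encrypted_sum = (encrypted_sum * c) % n2
print(decrypt(encrypted_sum))   # 500 == sum(salaries); AVG = SUM / COUNT at the client
```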
this paper proposes two topology control algorithms absolute distance based abd and predictive distance based prd which adjust the transmission range of individual nodes in manet to achieve good network throughput particularly under correlated node movements as in vehicular environment both algorithms attempt to maintain the number of logical neighbors between two predefined thresholds the abd algorithm uses the absolute distance as the neighbor selection criteria while the prd algorithm incorporates mobility information to extend the neighbor lifetime and hence less chance of broken links simple expression of saturated end to end throughput is presented as function of path availability which depends on the average transmission range the network connectivity and the probability of broken links based on the simulation results it was found that the transmission range can only be increased to certain value to prolong the next hop neighbor lifetime beyond such value the mac interference becomes more dominant factor over the end to end throughput than routing overheads or the effects of broken links consequently using higher transmission range will only decrease the throughput under street mobility which has correlated node movement prd algorithm can take advantage of such correlation and achieves higher path availability and end to end throughput than abd algorithm
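a minimal sketch of the shared idea behind both algorithms, keeping the logical neighbor count between two thresholds by adjusting the transmission range; the thresholds, step size and range limits are invented, and the distance and mobility criteria that distinguish abd from prd are omitted

```python
def adjust_range(current_range, neighbor_count, lo=4, hi=8,
                 step=10.0, min_range=20.0, max_range=250.0):
    """Keep the number of logical neighbors between lo and hi by
    growing or shrinking the transmission range in small steps."""
    if neighbor_count < lo:
        current_range = min(max_range, current_range + step)
    elif neighbor_count > hi:
        current_range = max(min_range, current_range - step)
    return current_range

# example: a node with too many neighbors backs off its range
r = 150.0
for observed_neighbors in [11, 10, 9, 7]:
    r = adjust_range(r, observed_neighbors)
print(r)   # 120.0 after three reductions, then unchanged
```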
this paper is concerned with automatic extraction of titles from the bodies of html documents titles of html documents should be correctly defined in the title fields however in reality html titles are often bogus it is desirable to conduct automatic extraction of titles from the bodies of html documents this is an issue which does not seem to have been investigated previously in this paper we take supervised machine learning approach to address the problem we propose specification on html titles we utilize format information such as font size position and font weight as features in title extraction our method significantly outperforms the baseline method of using the lines in largest font size as title improvement in score as application we consider web page retrieval we use the trec web track data for evaluation we propose new method for html documents retrieval using extracted titles experimental results indicate that the use of both extracted titles and title fields is almost always better than the use of title fields alone the use of extracted titles is particularly helpful in the task of named page finding improvements
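as a hedged sketch of using format information for title extraction, the python below turns font size, weight and position into features and scores candidate blocks with hand-set weights; the paper instead learns a supervised model, and the feature names, weights and example page are all invented

```python
def title_features(block, page):
    """Format features of one text block: relative font size, boldness,
    position on the page, and a length cue."""
    return {
        "font_ratio": block["font_size"] / page["max_font_size"],
        "is_bold": 1.0 if block["bold"] else 0.0,
        "near_top": 1.0 if block["y"] < 0.2 * page["height"] else 0.0,
        "short": 1.0 if len(block["text"].split()) <= 15 else 0.0,
    }

def score(block, page, weights):
    f = title_features(block, page)
    return sum(weights[k] * v for k, v in f.items())

page = {"max_font_size": 28, "height": 1000}
blocks = [
    {"text": "learning to extract titles", "font_size": 28, "bold": True,  "y": 60},
    {"text": "1 introduction",             "font_size": 18, "bold": True,  "y": 320},
    {"text": "body paragraph ...",          "font_size": 12, "bold": False, "y": 400},
]
# hand-set weights stand in for a trained model
weights = {"font_ratio": 2.0, "is_bold": 1.0, "near_top": 1.5, "short": 0.5}
best = max(blocks, key=lambda b: score(b, page, weights))
print(best["text"])   # "learning to extract titles"
```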
the ambient logic is modal logic that was proposed for the description of the structural and computational properties of distributed and mobile computation the structural part of the ambient logic is essentially logic of labelled trees hence it turns out to be good foundation for query languages for semistructured data much in the same way as first order logic is fitting foundation for relational query languages we define here query language for semistructured data that is based on the ambient logic and we outline an execution model for this language the language turns out to be quite expressive its strong foundations and the equivalences that hold in the ambient logic are helpful in the definition of the language semantics and execution model
efficiently exploring exponential size architectural design spaces with many interacting parameters remains an open problem the sheer number of experiments required renders detailed simulation intractable we attack this via an automated approach that builds accurate predictive models we simulate sampled points using results to teach our models the function describing relationships among design parameters the models can be queried and are very fast enabling efficient design tradeoff discovery we validate our approach via two uniprocessor sensitivity studies predicting ipc with only percent error in an experimental study using the approach training on percent of point cmp design space allows our models to predict performance with only percent error our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment achieving net time savings of three four orders of magnitude
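a hedged sketch of the sample-train-query workflow: a synthetic stand-in for the simulator generates ipc for a handful of sampled design points, a cheap least-squares model is fit, and new configurations are then predicted instead of simulated; the parameters, the synthetic ipc function and the linear model are all assumptions (the paper's models are presumably more sophisticated)

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(cache_kb, issue_width, rob_size):
    """Stand-in for a detailed simulator: returns a noisy synthetic 'IPC'."""
    return (0.4 * np.log2(cache_kb) + 0.3 * issue_width
            + 0.002 * rob_size + rng.normal(0.0, 0.05))

# 1. sample a small fraction of the design space and "simulate" it
samples = [(c, w, r) for _ in range(40)
           for c, w, r in [(rng.choice([16, 32, 64, 128]),
                            rng.choice([2, 4, 8]),
                            rng.choice([32, 64, 128, 256]))]]
X = np.array([[np.log2(c), w, r, 1.0] for c, w, r in samples])
y = np.array([simulate(c, w, r) for c, w, r in samples])

# 2. fit a cheap predictive model (least squares here)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. query the model instead of re-simulating a new configuration
query = np.array([np.log2(256), 4, 192, 1.0])
print(float(query @ coef))   # predicted IPC for an unsimulated design point
```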
stability is an important property of machine learning algorithms stability in clustering may be related to clustering quality or ensemble diversity and therefore used in several ways to achieve deeper understanding or better confidence in bioinformatic data analysis in the specific field of fuzzy biclustering stability can be analyzed by porting the definition of existing stability indexes to fuzzy setting and then adapting them to the biclustering problem this paper presents work done in this direction by selecting some representative stability indexes and experimentally verifying and comparing their properties experimental results are presented that indicate both general agreement and some differences among the selected methods
environment monitoring in coal mines is an important application of wireless sensor networks wsns that has commercial potential we discuss the design of structure aware self adaptive wsn system sasa by regulating the mesh sensor network deployment and formulating collaborative mechanism based on regular beacon strategy sasa is able to rapidly detect structure variations caused by underground collapses prototype is deployed with mica motes we present our implementation experiences as well as the experimental results to better evaluate the scalability and reliability of sasa we also conduct large scale trace driven simulation based on real data collected from the experiments
the dynamic power consumed by digital cmos circuit is directly proportional to capacitance in this paper we consider pre routing capacitance estimation for fpgas and develop an empirical estimation model suitable for use in power aware placement early power prediction and other applications we show that estimation accuracy is improved by considering aspects of the fpga interconnect architecture in addition to generic parameters such as net fanout and bounding box perimeter length we also show that there is an inherent variability noise in the capacitance of nets routed using commercial fpga layout tool this variability limits the accuracy attainable in capacitance estimation experimental results show that the proposed estimation model works well given the noise limitations
consider an vertex graph of maximum degree and suppose that each vertex hosts processor the processors are allowed to communicate only with their neighbors in the communication is synchronous ie it proceeds in discrete rounds in the distributed vertex coloring problem the objective is to color with or slightly more than colors using as few rounds of communication as possible the number of rounds of communication will be henceforth referred to as running time efficient randomized algorithms for this problem are known for more than twenty years specifically these algorithms produce coloring within log time with high probability on the other hand the best known deterministic algorithm that requires polylogarithmic time employs colors this algorithm was devised in seminal focs paper by linial its running time is log in the same paper linial asked whether one can color with significantly less than colors in deterministic polylogarithmic time by now this question of linial became one of the most central long standing open questions in this area in this paper we answer this question in the affirmative and devise deterministic algorithm that employs colors and runs in polylogarithmic time specifically the running time of our algorithm is log log for an arbitrarily slow growing function we can also produce coloring in log log time for an arbitrarily small constant and coloring in δε log time for an arbitrarily small constant our results are in fact far more general than this in particular for graph of arboricity our algorithm produces an coloring for an arbitrarily small constant in time log log
multithreaded applications with multi gigabyte heaps running on modern servers provide new challenges for garbage collection gc the challenges for server oriented gc include ensuring short pause times on multi gigabyte heap while minimizing throughput penalty good scaling on multiprocessor hardware and keeping the number of expensive multi cycle fence instructions required by weak ordering to minimum we designed and implemented fully parallel incremental mostly concurrent collector which employs several novel techniques to meet these challenges first it combines incremental gc to ensure short pause times with concurrent low priority background gc threads to take advantage of processor idle time second it employs low overhead work packet mechanism to enable full parallelism among the incremental and concurrent collecting threads and ensure load balancing third it reduces memory fence instructions by using batching techniques one fence for each block of small objects allocated one fence for each group of objects marked and no fence at all in the write barrier when compared to the mature well optimized parallel stop the world mark sweep collector already in the ibm jvm our collector prototype reduces the maximum pause time from ms to ms and the average pause time from ms to ms while only losing throughput when running the specjbb benchmark on mb heap on way mhz pentium multiprocessor
data prefetching has been widely used in the past as technique for hiding memory access latencies however data prefetching in multi threaded applications running on chip multiprocessors cmps can be problematic when multiple cores compete for shared on chip cache or in this paper we quantify the impact of conventional data prefetching on shared caches in cmps the experimental data collected using multi threaded applications indicates that while data prefetching improves performance in small number of cores its benefits reduce significantly as the number of cores is increased that is it is not scalable ii identify harmful prefetches as one of the main contributors for degraded performance with large number of cores and iii propose and evaluate compiler directed data prefetching scheme for shared on chip cache based cmps the proposed scheme first identifies program phases using static compiler analysis and then divides the threads into groups within each phase and assigns customized prefetcher thread helper thread to each group of threads this helps to reduce the total number of prefetches issued prefetch overheads and negative interactions on the shared cache space due to data prefetches and more importantly makes compiler directed prefetching scalable optimization for cmps our experiments with the applications from the spec omp benchmark suite indicate that the proposed scheme improves overall parallel execution latency by over the no prefetch case and over the conventional data prefetching scheme where each core prefetches its data independently on average when cores are used the corresponding average performance improvements with cores are over the no prefetch case and over the conventional prefetching case we also demonstrate that the proposed scheme is robust under wide range of values of our major simulation parameters and the improvements it achieves come very close to those that can be achieved using an optimal scheme
the problem of allocating and scheduling precedence constrained tasks on the processors of distributed real time system is np hard as such it has been traditionally tackled by means of heuristics which provide only approximate or near optimal solutions this paper proposes complete allocation and scheduling framework and deploys an mpsoc virtual platform to validate the accuracy of modelling assumptions the optimizer implements an efficient and exact approach to the mapping problem based on decomposition strategy the allocation subproblem is solved through integer programming ip while the scheduling one through constraint programming cp the two solvers interact by means of an iterative procedure which has been proven to converge to the optimal solution experimental results show significant speed ups wrt pure ip and cp exact solution strategies as well as high accuracy with respect to cycle accurate functional simulation two case studies further demonstrate the practical viability of our framework for real life applications
modular development of concurrent applications requires thread safe components that behave correctly when called concurrently by multiple client threads this paper focuses on linearizability specific formalization of thread safety where all operations of concurrent component appear to take effect instantaneously at some point between their call and return the key insight of this paper is that if component is intended to be deterministic then it is possible to build an automatic linearizability checker by systematically enumerating the sequential behaviors of the component and then checking if each of its concurrent behaviors is equivalent to some sequential behavior we develop this insight into tool called line up the first complete and automatic checker for deterministic linearizability it is complete because any reported violation proves that the implementation is not linearizable with respect to any sequential deterministic specification it is automatic requiring no manual abstraction no manual specification of semantics or commit points no manually written test suites no access to source code we evaluate line up by analyzing classes with total of methods in two versions of the net framework the violations of deterministic linearizability reported by line up exposed seven errors in the implementation that were fixed by the development team
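a hedged python sketch of the key insight only (line up itself targets .net and also respects real-time ordering constraints, which this simplification ignores): given the operations issued in a concurrent run and the results observed, check whether any sequential ordering of the same operations on the deterministic specification reproduces those results

```python
from itertools import permutations

class Counter:
    """A deterministic sequential specification."""
    def __init__(self):
        self.value = 0
    def increment(self):
        self.value += 1
        return self.value

def matches_some_sequential_order(ops, observed_results):
    """Return True if some sequential ordering of ops on a fresh Counter
    reproduces the results observed in the concurrent run."""
    for order in permutations(range(len(ops))):
        spec = Counter()
        results = [None] * len(ops)
        for i in order:
            results[i] = getattr(spec, ops[i])()
        if results == observed_results:
            return True
    return False

# two threads each called increment(); both observing 1 cannot be explained
# by any sequential order, so that (non-atomic) run is not linearizable
print(matches_some_sequential_order(["increment", "increment"], [1, 2]))  # True
print(matches_some_sequential_order(["increment", "increment"], [1, 1]))  # False
```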
more and more web users keep up with newest information through information streams such as the popular micro blogging website twitter in this paper we studied content recommendation on twitter to better direct user attention in modular approach we explored three separate dimensions in designing such recommender content sources topic interest models for users and social voting we implemented recommendation engines in the design space we formulated and deployed them to recommender service on the web to gather feedback from real twitter users the best performing algorithm improved the percentage of interesting content to from baseline of we conclude this work by discussing the implications of our recommender design and how our design can generalize to other information streams
consistency enforcement aims at systematically modifying database program such that the result is consistent with respect to specified set of integrity constraints this modification may be done at compile time or at run time the commonly known run time approach uses rule triggering systems rtss it has been shown that these systems cannot solve the problem in general as an alternative greatest consistent specializations gcss have been studied this approach requires the modified program specification to be maximal consistent diminution of the original one with respect to some partial order the chosen order is operational specialization on this basis it is possible to derive commutativity result and compositionality result the first one enables step by step enforcement for sets of constraints the second one reduces the problem to providing the gcss just for basic operations whereas for complex programs the gcs can be easily determined the approach turns out to be well founded since the gcs for such complex programs is effectively computable if we require loops to be bounded despite its theoretical merits the gcs approach is still too coarse this leads to the problem of modifying the chosen specialization order and to relax the requirement that the result should be unique one idea is to exploit the fact that operational specialization is equivalent to the preservation of set of transition invariants in this case reasonable order arises from slight modification of this set in which case we talk of maximal consistent effect preserver mce however strict theory of mces is still outstanding
good fit between the person and the organization is essential in better organizational performance this is even more crucial in case of institutionalization of software product line practice within an organization employees participation organizational behavior and management contemplation play vital role in successfully institutionalizing software product lines in company organizational dimension has been weighted as one of the critical dimensions in software product line theory and practice comprehensive empirical investigation to study the impact of some organizational factors on the performance of software product line practice is presented in this work this is the first study to empirically investigate and demonstrate the relationships between some of the key organizational factors and software product line performance of an organization the results of this investigation provide empirical evidence and further support the theoretical foundations that in order to institutionalize software product lines within an organization organizational factors play an important role
trojan attack maliciously modifies alters or embeds unplanned components inside the exploited chips given the original chip specifications and process and simulation models the goal of trojan detection is to identify the malicious components this paper introduces new trojan detection method based on nonintrusive external ic quiescent current measurements we define new metric called consistency based on the consistency metric and properties of the objective function we present robust estimation method that estimates the gate properties while simultaneously detecting the trojans experimental evaluations on standard benchmark designs show the validity of the metric and demonstrate the effectiveness of the new trojan detection
naive bayes nb is simple bayesian classifier that assumes the conditional independence and augmented nb anb models are extensions of nb by relaxing the independence assumption the averaged one dependence estimators aode is classifier that averages odes which are anb models however the expressiveness of aode is still limited by the restricted structure of ode in this paper we propose model averaging method for nb trees nbts with flexible structures and present experimental results in terms of classification accuracy results of comparative experiments show that our proposed method outperforms aode on classification accuracy
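for reference, a compact python sketch of plain aode over categorical attributes, the baseline the abstract compares against, not the proposed nbt model averaging; laplace smoothing is used and the usual minimum-frequency threshold for super-parents is omitted, and the toy data is invented

```python
from collections import Counter

def train_aode(X, y):
    """Collect the joint counts AODE needs: (class, attr, value) pair counts
    and (class, attr_i, value_i, attr_j, value_j) triple counts."""
    n_attrs = len(X[0])
    pair, triple = Counter(), Counter()
    classes = sorted(set(y))
    values = [sorted({row[i] for row in X}) for i in range(n_attrs)]
    for row, c in zip(X, y):
        for i in range(n_attrs):
            pair[(c, i, row[i])] += 1
            for j in range(n_attrs):
                triple[(c, i, row[i], j, row[j])] += 1
    return {"pair": pair, "triple": triple, "classes": classes,
            "values": values, "N": len(X), "n_attrs": n_attrs}

def predict_aode(model, x):
    """Average the one-dependence estimators, each with attribute i as super-parent."""
    best, best_score = None, -1.0
    for c in model["classes"]:
        score = 0.0
        for i in range(model["n_attrs"]):
            # smoothed estimate of P(c, x_i)
            p = (model["pair"][(c, i, x[i])] + 1.0) / (
                model["N"] + len(model["classes"]) * len(model["values"][i]))
            for j in range(model["n_attrs"]):
                if j == i:
                    continue
                # smoothed estimate of P(x_j | c, x_i)
                p *= (model["triple"][(c, i, x[i], j, x[j])] + 1.0) / (
                    model["pair"][(c, i, x[i])] + len(model["values"][j]))
            score += p
        if score > best_score:
            best, best_score = c, score
    return best

X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
y = ["no", "yes", "yes", "no"]
model = train_aode(X, y)
print(predict_aode(model, ("sunny", "mild")))   # "yes" on this toy data
```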
docxgo is new framework to support editing of shared documents on mobile devices three high level requirements influenced its design namely the need to adapt content especially textual content on the fly according to the quality of the network connection and the form factor of each device support for concurrent uncoordinated editing on different devices whose effects will later be merged on all devices in convergent and consistent manner without sacrificing the semantics of the edits and flexible replication architecture that accommodates both device to device and cloud mediated synchronization docxgo supports on the go editing for xml documents such as documents in microsoft word and other commonly used formats it combines the best practices from content adaptation systems weakly consistent replication systems and collaborative editing systems while extending the state of the art in each of these fields the implementation of docxgo has been evaluated based on workload drawn from wikipedia
data cache is commodity in modern microprocessor systems it is fact that the size of data caches keeps growing however the increase in application size goes faster as result it is usually not possible to store the complete working set in the cache memory this paper proposes an approach that allows the data access of some load store instructions to bypass the cache memory in this case the cache space can be reserved for storing more frequently reused data we implemented an analysis algorithm to identify the specific instructions and simulator to model the novel cache architecture the approach was verified using applications from mediabench mibench benchmark suite and for all except one application we achieved huge gains in performance
for the purpose of satisfying different users profiles and accelerating the subsequence olap online analytical processing queries in large data warehouse dynamic materialized olap view management is highly desirable previous work caches data as either chunks or multidimensional range fragments in this paper we focus on rolap relational olap in an existing relational database system we propose dynamic predicate based partitioning approach which can support wide range of olap queries we conducted extensive performance studies using tpch benchmark data on ibm db and encouraging results are obtained which indicate that our approach is highly feasible
we explore the possibility of using human generated time series as biometric signature adopting simple psychometric procedure in which button is pressed in entirely random manner successive elapsed times are registered and gathered in signal reflecting user’s internal cognitive processes by reconstructing and comparing the dynamics across repetitions from the same subject noticeable consistency was observed moreover the dynamics showed prominent idiosyncratic character when realizations from different subjects were contrasted we established an appropriate similarity measure to systematize such comparisons and experimentally verified that it is feasible to restore someone’s identity from rti random time interval signals by incorporating it in an svm based verification system which was trained and tested using medium sized dataset from persons considerably low equal error rate eer of was achieved rti signals can be collected effortlessly and this makes our approach appealing especially in transactions mediated by standard pc terminal keyboards or even telephone keypads
in this paper we study the problem of web forum crawling web forum has now become an important data source of many web applications while forum crawling is still challenging task due to complex in site link structures and login controls of most forum sites without carefully selecting the traversal path generic crawler usually downloads many duplicate and invalid pages from forums and thus wastes both the precious bandwidth and the limited storage space to crawl forum data more effectively and efficiently in this paper we propose an automatic approach to exploring an appropriate traversal strategy to direct the crawling of given target forum in detail the traversal strategy consists of the identification of the skeleton links and the detection of the page flipping links the skeleton links instruct the crawler to only crawl valuable pages and meanwhile avoid duplicate and uninformative ones and the page flipping links tell the crawler how to completely download long discussion thread which is usually shown in multiple pages in web forums the extensive experimental results on several forums show encouraging performance of our approach following the discovered traversal strategy our forum crawler can archive more informative pages in comparison with previous related work and commercial generic crawler
we discuss how emerging object relational database mediator technology can be used to integrate academic freeware and commercial off the shelf software components to create sequence of gradually more complex and powerful always semantically and syntactically homogeneous database centered image meta analysis environments we show how this may be done by definition and utilization of use case based evolutionary design and development process this process allows subsystems to be produced largely independently by several small specialist subprojects turning the system integration work into high level domain modelling task
we consider the problem of maximizing the lifetime of given multicast connection in wireless network of energy constrained eg battery operated nodes by choosing ideal transmission power levels for the nodes relaying the connection we distinguish between two basic operating modes in static assignment the power levels of the nodes are set at the beginning and remain unchanged until the nodes are depleted of energy in dynamic assignment the powers can be adjusted during operation we show that lifetime maximizing static power assignments can be found in polynomial time whereas for dynamic assignments quantized time version of the problem is np hard we then study the approximability of the quantized dynamic case and conclude that no polynomial time approximation scheme ptas exists for the problem unless ptime equals np finally by considering two approximation heuristics for the dynamic case we show experimentally that the lifetime of dynamically maintained multicast connection can be made several times longer than what can be achieved by the best possible static assignment
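a small sketch of the static-assignment objective only: the connection lives until the first relaying node drains its battery, so lifetime is a minimum of energy over power across relays; the node names, energies and power levels are invented, and finding the assignment that maximizes this value (the polynomial-time result above) is not shown

```python
def static_lifetime(battery, power):
    """Lifetime of a multicast connection under a static power assignment:
    the connection lasts until the first relaying node drains its battery."""
    return min(battery[v] / power[v] for v in power if power[v] > 0)

# node -> residual energy and node -> chosen transmit power for the nodes
# relaying the multicast; pure receivers would simply be omitted
battery = {"s": 100.0, "a": 60.0, "b": 80.0}
power   = {"s": 0.5,   "a": 0.4,  "b": 0.2}
print(static_lifetime(battery, power))   # 150.0, limited by node "a"
```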
progress was made in the understanding of object oriented architectures through the introduction of patterns of design and architecture few works however offer methods of precise specification for architectures this article provides well defined ontology and an underlying framework for the formal specification of architectures we observe key regularities and elementary design motifs in design and architectures define architectural model in logic and formulate relations between specifications we demonstrate how to declare and reason with the representations finally we use our conceptual toolkit to compare and evaluate proposed formalisms
negative acknowledgments nacks and subsequent retries used to resolve races and to enforce total order among shared memory accesses in distributed shared memory dsm multiprocessors not only introduce extra network traffic and contention but also increase node controller occupancy especially at the home in this paper we present possible protocol optimizations to minimize these retries and offer thorough study of the performance effects of these messages on six scalable scientific applications running on node systems and larger to eliminate nacks we present mechanism to queue pending requests at the main memory of the home node and augment it with novel technique of combining pending read requests thereby accelerating the parallel execution for nodes by as much as percent speedup of compared to modified version of the sgi origin protocol we further design and evaluate protocol by combining this mechanism with technique that we call write string forwarding used in the alphaserver gs and piranha systems we find that without careful design considerations especially regarding atomic read modify write operations this aggressive write forwarding can hurt performance we identify and evaluate the necessary micro architectural support to solve this problem we compare the performance of these novel nack free protocols with base bitvector protocol modified version of the sgi origin protocol and nack free protocol that uses dirty sharing and write string forwarding as in the piranha system to understand the effects of network speed and topology the evaluation is carried out on three network configurations
super restricted connectivity and super restricted edge connectivity are more refined network reliability indices than connectivity and edge connectivity in this paper we first introduce the concepts of super restricted connectivity and super restricted edge connectivity and then give property of graphs with equal restricted edge connectivity and restricted connectivity applying this property we show sufficient condition for line graph to be super restricted edge connected and relationship between super restricted connected graphs and super restricted edge connected graphs
with the advent of the internet era and the maturation of electronic commerce strategic avatar design has become an important way of keeping up with market changes and customer tastes in this study we propose new dss for an adaptive avatar design that uses cognitive map cm as what if simulation vehicle the main virtue of the proposed avatar design recommendation dss abbreviated as adr dss is its ability to change specific avatar design features with objective consideration of the subsequent effects upon other design features thereby enhancing user satisfaction an avatar represents user’s self identity and desire for self disclosure therefore the claim is made that there is relationship between the characteristics of avatar design features and the choice of avatar the considerations in this study are props garments facial expression and miscellaneous and subjective judgments self image and user satisfaction the results of both brainstorming and focus group interviews with group of avatar experts were used to objectively organize the cm all the experts who participated are currently working in developing and designing avatar features in portal websites incorporating the cm as model base the proposed adr dss was implemented and two scenarios were presented for illustration to prove the validity of the adr dss rigorous survey was performed obtaining statistically significant results
orion is commercially available federated object oriented database management system designed and implemented at mcc one major architectural innovation in orion is the coexistence of shared database and number of private databases the shared database is accessible to all authorized users of the system while each private database is accessible to only the user who owns it distributed database system with shared database and private databases for individual users is natural architecture for data intensive application environments on network of workstations notably computer aided design and engineering systems this paper discusses the benefits and limitations of such system and explores the impact of such an architecture on the semantics and implementation of some of the key functions of database system notably queries database schema and versions although the issues are discussed in the context of an object oriented data model the results at least significant portions thereof are applicable to database systems supporting other data models
this paper proposes an approach to fuzzy rough sets in the framework of lattice theory the new model for fuzzy rough sets is based on the concepts of both fuzzy covering and binary fuzzy logical operators fuzzy conjunction and fuzzy implication the conjunction and implication are connected by using the complete lattice based adjunction theory with this theory fuzzy rough approximation operators are generalized and fundamental properties of these operators are investigated particularly comparative studies of the generalized fuzzy rough sets to the classical fuzzy rough sets and pawlak rough set are carried out it is shown that the generalized fuzzy rough sets are an extension of the classical fuzzy rough sets as well as fuzzification of the pawlak rough set within the framework of complete lattices link between the generalized fuzzy rough approximation operators and fundamental morphological operators is presented in translation invariant additive group
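For orientation, the commonly used conjunction/implication based fuzzy rough approximation operators are sketched below for a fuzzy relation R; the paper's covering based and complete lattice based generalization differs in detail, so this is only the standard special case.

    % (T, I)-based fuzzy rough approximations over a universe U:
    \underline{R}(A)(x) = \inf_{y \in U} \mathcal{I}\bigl(R(x,y),\, A(y)\bigr), \qquad
    \overline{R}(A)(x) = \sup_{y \in U} \mathcal{T}\bigl(R(x,y),\, A(y)\bigr)
    % where \mathcal{T} is a fuzzy conjunction and \mathcal{I} its adjoint fuzzy implication,
    % i.e. \mathcal{T}(a,b) \le c \iff b \le \mathcal{I}(a,c) on the complete lattice.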
hardware software partitioning is an important phase in embedded systems decisions made during this phase impact the quality cost performance and the delivery date of the final product over the past decade or more various partitioning approaches have been proposed majority operate at relatively fine granularity and use low level executable specification as the starting point this presents problems if the context is families of industrial products with frequent release of upgraded or new members managing complexity using low level specification is extremely challenging and impacts developer productivity designing using high level specification and component based development although better option imposes component integration and replacement problems during system evolution and new product release new approach termed concept based partitioning is presented that focuses on system evolution product lines and large scale reuse when partitioning beginning with information from uml sequence diagrams and concept repository concepts are identified and used as the unit of partitioning within specification methodology for the refinement of interpart communication in the system specification using sequence diagrams is also presented change localization during system evolution composability during large scale reuse and provision for configurable feature variations for product line are facilitated by generic adaptive layer gal around selected concepts the methodology was applied on subsystem of an unmanned aerial vehicle uav using various concepts which improved the composability of concepts while keeping performance and size overhead within the percent range
tiling is widely used by compilers and programmer to optimize scientific and engineering code for better performance many parallel programming languages support tiling directly through first class language constructs or library routines however the current openmp programming language is tile oblivious although it is the de facto standard for writing parallel programs on shared memory systems in this paper we introduce tile aware parallelization into openmp we propose tile reduction an openmp tile aware parallelization technique that allows reduction to be performed on multi dimensional arrays the paper has three contributions it is the first paper that proposes and discusses tile aware parallelization in openmp we argue that it is not only necessary but also possible to have tile aware parallelization in openmp the paper introduces the methods used to implement tile reduction including the required openmp api extension and the associated code generation techniques we have applied tile reduction on set of benchmarks the experimental results show that tile reduction can make parallelization more natural and flexible it not only can expose more parallelism in program but also can improve its data locality
skin detection plays an important role in wide range of image processing applications ranging from face detection face tracking gesture analysis and content based image retrieval systems to various human computer interaction domains recently skin detection methodologies based on skin color information as cue have gained much attention as skin color provides computationally effective yet robust information against rotations scaling and partial occlusions skin detection using color information can be challenging task as the skin appearance in images is affected by various factors such as illumination background camera characteristics and ethnicity numerous techniques are presented in literature for skin detection using color in this paper we provide critical up to date review of the various skin modeling and classification strategies based on color information in the visual spectrum the review is divided into three different categories first we present the various color spaces used for skin modeling and detection second we present different skin modeling and classification approaches however many of these works are limited in performance due to real world conditions such as illumination and viewing conditions to cope with the rapidly changing illumination conditions illumination adaptation techniques are applied along with skin color detection third we present various approaches that use skin color constancy and dynamic adaptation techniques to improve the skin detection performance in dynamically changing illumination and environmental conditions wherever available we also indicate the various factors under which the skin detection techniques perform well
spreadsheet languages which include commercial spreadsheets and various research systems have proven to be flexible tools in many domain specific settings research shows however that spreadsheets often contain faults we would like to provide at least some of the benefits of formal testing and debugging methodologies to spreadsheet developers this paper presents an integrated testing and debugging methodology for spreadsheets to accommodate the modeless and incremental development testing and debugging activities that occur during spreadsheet creation our methodology is tightly integrated into the spreadsheet environment to accommodate the users of spreadsheet languages we provide an interface to our methodology that does not require an understanding of testing and debugging theory and that takes advantage of the immediate visual feedback that is characteristic of the spreadsheet paradigm
we describe parallel scheduler for guaranteed quality parallel mesh generation and refinement methods we prove sufficient condition for the new points to be independent which permits the concurrent insertion of more than two points without destroying the conformity and delaunay properties of the mesh the scheduling technique we present is much more efficient than existing coloring methods and thus it is suitable for practical use the condition for concurrent point insertion is based on the comparison of the distance between the candidate points against the upper bound on triangle circumradius in the mesh our experimental data show that the scheduler introduces small overhead in the order of of the total execution time it requires local and structured communication compared to irregular variable and unpredictable communication of the other existing practical parallel guaranteed quality mesh generation and refinement method finally on cluster of more than workstations using simple block decomposition our data show that we can generate about million elements in less than seconds
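A minimal sketch of the distance-based independence test is given below; the safety factor K and the inputs are placeholders, not the paper's precise condition or constant.

    # Sketch: two candidate Steiner points are treated as independent (safe to insert
    # concurrently) when they are far apart relative to an upper bound r_max on triangle
    # circumradii in the current mesh. K is a hypothetical safety factor.
    import math

    K = 4.0  # placeholder constant, not the paper's value

    def independent(p, q, r_max, k=K):
        """p, q: (x, y) candidate points; r_max: upper bound on circumradius."""
        return math.dist(p, q) >= k * r_max

    def schedule_concurrent(candidates, r_max, k=K):
        """Greedily pick a pairwise-independent subset to insert in parallel."""
        chosen = []
        for p in candidates:
            if all(independent(p, q, r_max, k) for q in chosen):
                chosen.append(p)
        return chosen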
in this work we describe just in time usage density based register allocator geared toward embedded systems with limited general purpose register set wherein speed code size and memory requirements are of equal concern the main attraction of the allocator is that it does not make use of the traditional live range and interval analysis nor does it perform advanced optimizations based on range splitting but results in very good code quality we circumvent the need for traditional analysis by using measure of usage density of variable the usage density of variable at program point represents both the frequency and the density of the uses we contend that by using this measure we can capture both range and frequency information which is essentially used by the good allocators based on splitting we describe framework based on this measure which has linear complexity in terms of the program size we perform comparisons with the static allocators based on graph coloring and the ones targeted toward just in time compilation systems like linear scan of live ranges through comparisons with graph coloring briggs style and live range based linear scan allocators we show that the memory footprint and the size of our allocator are smaller by percent to percent the speed of allocation is comparable and the speed of the generated code is better and its size smaller these attributes make the allocator an attractive candidate for performing fast memory efficient register allocation for embedded devices with small number of registers
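As a toy illustration only (the paper's concrete formula is not reproduced here), one plausible usage density measure weights uses by execution frequency and divides by the span the uses cover, after which registers can be assigned greedily; all names below are hypothetical.

    # Toy sketch: score each variable by frequency-weighted uses per unit span,
    # then keep the top-k variables in registers and spill the rest.
    def usage_density(use_points, freq):
        """use_points: instruction indices where the variable is used;
        freq[i]: estimated execution frequency of point i (e.g. loop-nest weight)."""
        if not use_points:
            return 0.0
        weighted_uses = sum(freq.get(i, 1.0) for i in use_points)
        span = max(use_points) - min(use_points) + 1  # distance covered by the uses
        return weighted_uses / span

    def allocate(variables, uses, freq, num_regs):
        """Assign the num_regs highest-density variables to registers; spill the rest."""
        ranked = sorted(variables, key=lambda v: usage_density(uses[v], freq), reverse=True)
        in_reg = set(ranked[:num_regs])
        return {v: ('reg' if v in in_reg else 'spill') for v in variables}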
technology scaling trends have forced designers to consider alternatives to deeply pipelining aggressive cores with large amounts of performance accelerating hardware one alternative is small simple core that can be augmented with latency tolerant helper engines as the demands placed on the processor core vary between applications and even between phases of an application the benefit seen from any set of helper engines will vary tremendously if there is single core these auxiliary structures can be turned on and off dynamically to tune the energy performance of the machine to the needs of the running application as more of the processor is broken down into helper engines and as we add more and more cores onto single chip which can potentially share helpers the decisions that are made about these structures become increasingly important in this paper we describe the need for methods that effectively manage these helper engines our counter based approach can dynamically turn off helpers on average while staying within of the performance when running with all helpers in multicore environment our intelligent and flexible sharing of helper engines provides an average speedup over static sharing in conjoined cores furthermore we show benefit from constructively sharing helper engines among multiple cores running the same application
we formalize and study business process systems that are centered around business artifacts or simply artifacts artifacts are used to represent real or conceptual key business entities including both their data schema and lifecycles the lifecycle of an artifact type specifies the possible sequencings of services that can be applied to an artifact of this type as it progresses through the business process the artifact centric approach was introduced by ibm and has been used to achieve substantial savings when performing business transformations in this paper artifacts carry attribute records and internal state relations holding sets of tuples that services can consult and update in addition services can access an underlying database and can introduce new values from an infinite domain thus modeling external inputs or partially specified processes described by pre and post conditions the lifecycles associate services to the artifacts using declarative condition action style rules we consider the problem of statically verifying whether all runs of an artifact system satisfy desirable correctness properties expressed in first order extension of linear time temporal logic we map the boundaries of decidability for the verification problem and provide its complexity the technical challenge to static verification stems from the presence of data from an infinite domain yielding an infinite state system while much work has been done lately in the verification community on model checking specialized classes of infinite state systems the available results do not transfer to our framework and this remains difficult problem we identify an expressive class of artifact systems for which verification is nonetheless decidable the complexity of verification is pspace complete which is no worse than classical finite state model checking this investigation builds upon previous work on verification of data driven web services and asm transducers while addressing significant new technical challenges raised by the artifact model
we present study of operating system errors found by automatic static compiler analysis applied to the linux and openbsd kernels our approach differs from previous studies that consider errors found by manual inspection of logs testing and surveys because static analysis is applied uniformly to the entire kernel source though our approach necessarily considers less comprehensive variety of errors than previous studies in addition automation allows us to track errors over multiple versions of the kernel source to estimate how long errors remain in the system before they are fixed we found that device drivers have error rates up to three to seven times higher than the rest of the kernel we found that the largest quartile of functions have error rates two to six times higher than the smallest quartile we found that the newest quartile of files have error rates up to twice that of the oldest quartile which provides evidence that code hardens over time finally we found that bugs remain in the linux kernel an average of years before being fixed
if the internet is the next great subject for theoretical computer science to model and illuminate mathematically then game theory and mathematical economics more generally are likely to prove useful tools in this talk we survey some opportunities and challenges in this important frontier
the instability of the tree like multicast overlay caused by nodes abrupt departures is considered as one of the major problems for peer to peer pp multicast systems in this paper we present protocol for improving the overlay’s stability by actively estimating the nodes lifetime model and combining the nodes lifetime information with the overlay’s structural properties we use the shifted pareto distribution to model the nodes lifetimes in designing our protocol to support this model we have measured the residual lifetimes of the nodes in popular iptv system named pplive http://www.pplive.com and have formally analyzed the relationships between the distribution of the nodes lifetimes ages and their residual lifetimes under the shifted pareto distribution model we evaluate the overlay construction strategies which are essential in improving the overlay’s stability in our protocol by comparing them with number of other strategies in simulation the experimental results indicate that our proposed protocol could improve the overlay’s stability considerably with informative but not necessarily accurate lifetime model estimation and with limited overhead imposed on the network as well as negligible sacrifice regarding the end to end service latencies for the nodes on the overlay
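For reference, the standard shifted Pareto (Lomax) lifetime model and its residual-lifetime property are sketched below; the paper's exact parameterization may differ, but this is the textbook form that makes node age informative about remaining lifetime.

    % Shifted Pareto (Lomax) lifetime model with shape \alpha > 0 and scale \beta > 0:
    \Pr[L > x] = \left(\frac{\beta}{x + \beta}\right)^{\alpha}, \quad x \ge 0.
    % Conditioning on a node having already survived to age a, the residual lifetime
    % R = L - a is again shifted Pareto with the scale inflated by the age:
    \Pr[R > x \mid L > a] = \left(\frac{\beta + a}{x + \beta + a}\right)^{\alpha}
    % so older nodes are stochastically longer lived, which is what makes lifetime-aware
    % overlay construction (e.g. attaching to older parents) attractive.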
this paper outlines the design of the social and spatial interactions platform the design of the platform was inspired by observing people’s pervasive use of mobile technologies the platform extends the current individual use of these devices to support shared co located interactions with mobile phones people are able to engage in playful social interactions on any flat surface by using devices fitted with wireless sensors that detect their current location with respect to each other
usability inspection techniques are widely used but few focus on users thinking and many are appropriate only for particular devices and use contexts we present new technique mot that guides inspection by metaphors of human thinking the metaphors concern habit the stream of thought awareness and associations the relation between utterances and thought and knowing the main novelty of mot is its psychological basis combined with its use of metaphors to stimulate inspection the first of three experiments shows that usability problems uncovered with mot are more serious and more complex to repair than problems found with heuristic evaluation problems found with mot are also judged more likely to persist for expert users the second experiment shows that mot finds more problems than cognitive walkthrough and has wider coverage of reference collection of usability problems participants prefer using mot over cognitive walkthrough an important reason being the wider scope of mot the third experiment compares mot cognitive walkthrough and think aloud testing in the context of nontraditional user interfaces participants prefer using think aloud testing but identify few problems with that technique that are not found also with mot or cognitive walkthrough mot identifies more problems than the other techniques across experiments and measures of usability problems utility in systems design mot performs better than existing inspection techniques and is comparable to think aloud testing
this paper presents new associative classification algorithm for data mining the algorithm uses elementary set concepts information entropy and database manipulation techniques to develop useful relationships between input and output attributes of large databases these relationships knowledge are represented using if then association rules where the if portion of the rule includes set of input attributes features and then portion of the rule includes set of output attributes that represent decision outcome application of the algorithm is presented with thermal spray process control case study thermal spray is process of forming desired shape of material by spraying melted metal on ceramic mould the goal of the study is to identify spray process input parameters that can be used to effectively control the process with the purpose of obtaining better characteristics for the sprayed material detailed discussion on the source and characteristics of the data sets is also presented
anycasting has recently become an important research topic especially for replicated servers with anycasting applications can request the nearest server for provision of desired multimedia service in this paper we study efficient distributed admission control dac for anycast flows we focus on algorithms that perform destination selection and efficient path establishment taking advantage of anycasting our distributed algorithms differ from each other in their dependence on system status information performance data obtained through mathematical analysis and simulations show that in terms of admission probabilities dac systems that are based on local status information have performance levels close to those that utilize global and dynamic status information this renders our dac algorithms useful not only for the network layer but also for the application layer admission control for anycast flows
web caching is an important technique to scale the internet one important performance factor of web caches is the replacement strategy due to specific characteristics of the world wide web there exist huge number of proposals for cache replacement this article proposes classification for these proposals that subsumes prior classifications using this classification different proposals and their advantages and disadvantages are described furthermore the article discusses the importance of cache replacement strategies in modern proxy caches and outlines potential future research topics
contextual text mining is concerned with extracting topical themes from text collection with context information eg time and location and comparing analyzing the variations of themes over different contexts since the topics covered in document are usually related to the context of the document analyzing topical themes within context can potentially reveal many interesting theme patterns in this paper we generalize some of these models proposed in the previous work and we propose new general probabilistic model for contextual text mining that can cover several existing models as special cases specifically we extend the probabilistic latent semantic analysis plsa model by introducing context variables to model the context of document the proposed mixture model called contextual probabilistic latent semantic analysis cplsa model can be applied to many interesting mining tasks such as temporal text mining spatiotemporal text mining author topic analysis and cross collection comparative analysis empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively
we have implemented concurrent copying garbage collector that uses replicating garbage collection in our design the client can continuously access the heap during garbage collection no low level synchronization between the client and the garbage collector is required on individual object operations the garbage collector replicates live heap objects and periodically synchronizes with the client to obtain the client’s current root set and mutation log an experimental implementation using the standard ml of new jersey system on shared memory multiprocessor demonstrates excellent pause time performance and moderate execution time speedups
while many application service providers have proposed using thin client computing to deliver computational services over the internet little work has been done to evaluate the effectiveness of thin client computing in wide area network to assess the potential of thin client computing in the context of future commodity high bandwidth internet access we have used novel noninvasive slow motion benchmarking technique to evaluate the performance of several popular thin client computing platforms in delivering computational services cross country over internet our results show that using thin client computing in wide area network environment can deliver acceptable performance over internet even when client and server are located thousands of miles apart on opposite ends of the country however performance varies widely among thin client platforms and not all platforms are suitable for this environment while many thin client systems are touted as being bandwidth efficient we show that network latency is often the key factor in limiting wide area thin client performance furthermore we show that the same techniques used to improve bandwidth efficiency often result in worse overall performance in wide area networks we characterize and analyze the different design choices in the various thin client platforms and explain which of these choices should be selected for supporting wide area computing services
an interactive and intuitive way of designing lighting around model is desirable in many applications in this paper we present tool for interactive inverse lighting in which model is rendered based on sketched lighting effects to specify target lighting the user freely sketches bright and dark regions on the model as if coloring it with crayons using these hints and the geometry of the model the system efficiently derives light positions directions intensities and spot angles assuming local point light based illumination model as the system also minimizes changes from the previous specifications lighting can be designed incrementally we formulate the inverse lighting problem as that of an optimization and solve it using judicious mix of greedy and minimization methods we also map expensive calculations of the optimization to graphics hardware to make the process fast and interactive our tool can be used to augment larger systems that use point light based illumination models but lack intuitive interfaces for lighting design and also in conjunction with applications like ray tracing where interactive lighting design is difficult to achieve
current proposals for web querying systems have assumed centralized processing architecture wherein data is shipped from the remote sites to the user’s site we present here the design and implementation of diaspora highly distributed query processing system for the web it is based on the premise that several web applications are more naturally processed in distributed manner opening up possibilities of significant reductions in network traffic and user response times diaspora is built over an expressive graph based data model that utilizes simple heuristics and lends itself to automatic generation the model captures both the content of web documents and the hyperlink structural framework of web site distributed queries on the model are expressed through declarative language that permits users to explicitly specify navigation diaspora implements query shipping model wherein queries are autonomously forwarded from one web site to another without requiring much coordination from the query originating site its design addresses variety of interesting issues that arise in the distributed web context including determining query completion handling query rewriting supporting query termination and preventing multiple computations of query at site due to the same query arriving through different paths in the hyperlink framework the diaspora system is currently operational and is undergoing testing on our campus network in this paper we describe the design of the system and report initial performance results that indicate significant performance improvements over comparable centralized approaches
in the field of classification problems we often encounter classes with very different percentage of patterns between them classes with high pattern percentage and classes with low pattern percentage these problems receive the name of classification problems with imbalanced data sets in this paper we study the behaviour of fuzzy rule based classification systems in the framework of imbalanced data sets focusing on the synergy with the preprocessing mechanisms of instances and the configuration of fuzzy rule based classification systems we will analyse the necessity of applying preprocessing step to deal with the problem of imbalanced data sets regarding the components of the fuzzy rule base classification system we are interested in the granularity of the fuzzy partitions the use of distinct conjunction operators the application of some approaches to compute the rule weights and the use of different fuzzy reasoning methods
we study the problem of estimating the position and orientation of calibrated camera from an image of known scene common problem in camera pose estimation is the existence of false correspondences between image features and modeled points existing techniques such as ransac to handle outliers have no guarantee of optimality in contrast we work with natural extension of the $L_\infty$ norm to the outlier case using simple result from classical geometry we derive necessary conditions for $L_\infty$ optimality and show how to use them in branch and bound setting to find the optimum and to detect outliers the algorithm has been evaluated on synthetic as well as real data showing good empirical performance in addition for cases with no outliers we demonstrate shorter execution times than existing optimal algorithms
there are many applications in olap and data analysis where we identify regions of interest for example in olap an analysis query involving aggregate sales performance of various products in different locations and seasons could help identify interesting cells such as cells of data cube having an aggregate sales higher than threshold while normal answer to such query merely returns all interesting cells it may be far more informative to the user if the system returns summaries or descriptions of regions formed from the identified cells the minimum description length mdl principle is well known strategy for finding such region descriptions in this paper we propose generalization of the mdl principle called gmdl and show that gmdl leads to fewer regions than mdl and hence more concise answers returned to the user the key idea is that region may contain don’t care cells up to global maximum if these don’t care cells help to form bigger summary regions leading to more concise overall summary we study the problem of generating minimal region descriptions under the gmdl principle for two different scenarios in the first all dimensions of the data space are spatial in the second scenario all dimensions are categorical and organized in hierarchies we propose region finding algorithms for both scenarios and evaluate their run time and compression performance using detailed experimentation our results show the effectiveness of the gmdl principle and the proposed algorithms
there is large and continually growing quantity of electronic text available which contain essential human and organization knowledge an important research endeavor is to study and develop better ways to access this knowledge text clustering is popular approach to automatically organize textual document collections by topics to help users find the information they need adaptive resonance theory art neural networks possess several interesting properties that make them appealing in the area of text clustering although art has been used in several research works as text clustering tool the level of quality of the resulting document clusters has not been clearly established yet in this paper we present experimental results with binary art that address this issue by determining how close clustering quality is to an upper bound on clustering quality
markerless vision based human motion analysis has the potential to provide an inexpensive non obtrusive solution for the estimation of body poses the significant research effort in this domain has been motivated by the fact that many application areas including surveillance human computer interaction and automatic annotation will benefit from robust solution in this paper we discuss the characteristics of human motion analysis we divide the analysis into modeling and an estimation phase modeling is the construction of the likelihood function estimation is concerned with finding the most likely pose given the likelihood surface we discuss model free approaches separately this taxonomy allows us to highlight trends in the domain and to point out limitations of the current state of the art
web applications typically interact with back end database to retrieve persistent data and then present the data to the user as dynamically generated output such as html web pages however this interaction is commonly done through low level api by dynamically constructing query strings within general purpose programming language such as java this low level interaction is ad hoc because it does not take into account the structure of the output language accordingly user inputs are treated as isolated lexical entities which if not properly sanitized can cause the web application to generate unintended output this is called command injection attack which poses serious threat to web application security this paper presents the first formal definition of command injection attacks in the context of web applications and gives sound and complete algorithm for preventing them based on context free grammars and compiler parsing techniques our key observation is that for an attack to succeed the input that gets propagated into the database query or the output document must change the intended syntactic structure of the query or document our definition and algorithm are general and apply to many forms of command injection attacks we validate our approach with sqlchecks an implementation for the setting of sql command injection attacks we evaluated sqlchecks on real world web applications with systematically compiled real world attack data as input sqlchecks produced no false positives or false negatives incurred low runtime overhead and applied straightforwardly to web applications written in different languages
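The check described (tainted input must not change the intended syntactic structure) can be approximated with a small toy example; the real system parses against the output language's grammar, whereas the sketch below merely tracks the character spans of untrusted input and requires each span to stay inside a single literal token of a crude tokenizer. All names here are illustrative, not the paper's implementation.

    # Toy approximation of the structure-preservation check behind the approach.
    import re

    TOKEN = re.compile(r"'(?:[^']|'')*'|\d+|\w+|[^\s\w]")

    def build(template_parts, user_inputs):
        """Interleave trusted template parts with untrusted inputs, recording taint spans."""
        query, spans = "", []
        for part, inp in zip(template_parts, user_inputs + [""]):
            query += part
            if inp:
                spans.append((len(query), len(query) + len(inp)))
                query += inp
        return query, spans

    def is_safe(query, taint_spans):
        literals = [(m.start(), m.end()) for m in TOKEN.finditer(query)
                    if m.group(0).startswith("'") or m.group(0).isdigit()]
        return all(any(ls <= s and e <= le for ls, le in literals)
                   for s, e in taint_spans)

    # Example: the second query is rejected because the tainted text spills past a literal.
    q1, s1 = build(["select * from users where name = '", "'"], ["alice"])
    q2, s2 = build(["select * from users where name = '", "'"], ["x' or '1'='1"])
    assert is_safe(q1, s1) and not is_safe(q2, s2)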
data deduplication has become popular technology for reducing the amount of storage space necessary for backup and archival data content defined chunking cdc techniques are well established methods of separating data stream into variable size chunks such that duplicate content has good chance of being discovered irrespective of its position in the data stream requirements for cdc include fast and scalable operation as well as achieving good duplicate elimination while the latter can be achieved by using chunks of small average size this also increases the amount of metadata necessary to store the relatively more numerous chunks and impacts negatively the system’s performance we propose new approach that achieves comparable duplicate elimination while using chunks of larger average size it involves using two chunk size targets and mechanisms that dynamically switch between the two based on querying data already stored we use small chunks in limited regions of transition from duplicate to nonduplicate data and elsewhere we use large chunks the algorithms rely on the block store’s ability to quickly deliver high quality reply to existence queries for already stored blocks chunking decision is made with limited lookahead and number of queries we present results of running these algorithms on actual backup data as well as four sets of source code archives our algorithms typically achieve similar duplicate elimination to standard algorithms while using chunks times as large such approaches may be particularly interesting to distributed storage systems that use redundancy techniques such as error correcting codes requiring multiple chunk fragments for which metadata overheads per stored chunk are high we find that algorithm variants with more flexibility in location and size of chunks yield better duplicate elimination at cost of higher number of existence queries
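A simplified sketch of two-target content-defined chunking follows; the rolling fingerprint, size targets, and switching policy are all stand-ins for the paper's more careful mechanisms (which use lookahead and a bounded number of existence queries), so treat this only as an illustration of the idea.

    # Sketch: cut points come from a fingerprint test; use a small chunk-size target in
    # regions where duplicate status just changed (transitions) and a large target elsewhere.
    # `store_has` stands in for the block store's existence query.
    import hashlib

    SMALL_MASK = (1 << 12) - 1   # ~4 KiB average chunks (assumed target)
    LARGE_MASK = (1 << 15) - 1   # ~32 KiB average chunks (assumed target)
    WINDOW = 48

    def chunk(data, store_has):
        chunks, start, prev_dup, mask, h = [], 0, None, LARGE_MASK, 0
        for i, b in enumerate(data):
            h = ((h << 1) + b) & 0xFFFFFFFF          # toy rolling hash, not a real Rabin hash
            if i - start >= WINDOW and (h & mask) == mask:
                piece = bytes(data[start:i + 1])
                dup = store_has(hashlib.sha1(piece).hexdigest())
                chunks.append(piece)
                # switch to small chunks around duplicate/non-duplicate transitions
                mask = SMALL_MASK if (prev_dup is not None and dup != prev_dup) else LARGE_MASK
                prev_dup, start, h = dup, i + 1, 0
        if start < len(data):
            chunks.append(bytes(data[start:]))
        return chunks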
code mobility enables dynamic customization and configuration of ubiquitous internet applications mobile applications can transfer the execution of software components from one device to another depending on resource availability they can also adapt functionality according to user needs and device characteristics thus the authors have developed policy based approach to mobility programming that expresses and controls reconfiguration strategies at high level of abstraction separate from the application’s functionality
large software projects contain significant code duplication mainly due to copying and pasting code many techniques have been developed to identify duplicated code to enable applications such as refactoring detecting bugs and protecting intellectual property because source code is often unavailable especially for third party software finding duplicated code in binaries becomes particularly important however existing techniques operate primarily on source code and no effective tool exists for binaries in this paper we describe the first practical clone detection algorithm for binary executables our algorithm extends an existing tree similarity framework based on clustering of characteristic vectors of labeled trees with novel techniques to normalize assembly instructions and to accurately and compactly model their structural information we have implemented our technique and evaluated it on windows xp system binaries totaling over million assembly instructions results show that it is both scalable and precise it analyzed windows xp system binaries in few hours and produced few false positives we believe our technique is practical enabling technology for many applications dealing with binary code
data warehouses are major component of data driven decision support systems dss they rely on multidimensional models the latter provide decision makers with business oriented view to data thereby easing data navigation and analysis via on line analytical processing olap tools they also determine how the data are stored in the data warehouse for subsequent use not only by olap tools but also by other decision support tools data warehouse design is complex task which requires systematic method few such methods have been proposed to date this paper presents uml based data warehouse design method that spans the three design phases conceptual logical and physical our method comprises set of metamodels used at each phase as well as set of transformations that can be semi automated following our object orientation we represent all the metamodels using uml and illustrate the formal specification of the transformations based on omg’s object constraint language ocl throughout the paper we illustrate the application of our method to case study
as various applications of ad hoc network have been proposed security issues have become central concern and are increasingly important in this paper we propose distributed key management approach by using the self certified public key system and threshold secret sharing schemes without any assumption of prefixed trust relationship between nodes the ad hoc network works in self organising way to provide the key generation and key management services using threshold secret sharing schemes which effectively solves the problem of single point of failure the use of self certified public key system has the following advantages the storage space and the communication overheads can be reduced in that the certificate is unnecessary the computational costs can be decreased since it requires no public key verification there is no key escrow problem since the certificate authority ca does not know the users private keys as compared with the previous works which were implemented with the certificate based public key system and identity based id based public key system the proposed approach is more secure and efficient
the steady growth in the multifaceted use of broadband asynchronous transfer mode atm networks for time critical applications has significantly increased the demands on the quality of service qos provided by the networks satisfying these demands requires the networks to be carefully engineered based on inferences drawn from detailed analysis of various scenarios analysis of networks is often performed through computer based simulations simulation based analysis of the networks including nonquiescent or rare conditions must be conducted using high fidelity high resolution models that reflect the size and complexity of the network to ensure that crucial scalability issues do not dominate however such simulations are time consuming because significant time is spent in driving the models to the desired scenarios in an endeavor to address the issues associated with the aforementioned bottleneck this article proposes novel multiresolution modeling based methodology called dynamic component substitution dcs dcs is used to dynamically ie during simulation change the resolution of the model which enables more optimal trade offs between different parameters such as observability fidelity and simulation overheads thereby reducing the total time for simulation the article presents the issues involved in applying dcs in parallel simulations of atm networks an empirical evaluation of the proposed approach is also presented the experiments indicate that dcs can significantly accelerate the simulation of atm networks without affecting the overall accuracy of the simulation results
in this paper we deal with mining sequential patterns in multiple time sequences building on state of the art sequential pattern mining algorithm prefixspan for mining transaction databases we propose mile mining in multiple sequences an efficient algorithm to facilitate the mining process mile recursively utilizes the knowledge of existing patterns to avoid redundant data scanning and therefore can effectively speed up the new patterns discovery process another unique feature of mile is that it can incorporate prior knowledge of the data distribution in time sequences into the mining process to further improve the performance extensive empirical results show that mile is significantly faster than prefixspan as mile consumes more memory than prefixspan we also present solution to trade time efficiency in memory constrained environments
we propose the first known solution to the problem of correlating in small space continuous streams of xml data through approximate structure and content matching as defined by general tree edit distance metric the key element of our solution is novel algorithm for obliviously embedding tree edit distance metrics into a vector space while guaranteeing worst case upper bound of $O(\log^2 n \log^\ast n)$ on the distance distortion between any data trees with at most $n$ nodes we demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to build compact synopsis of massive streaming xml data tree that can be used as concise surrogate for the full tree in approximate tree edit distance computations and approximate the result of tree edit distance similarity joins over continuous xml document streams experimental results from an empirical study with both synthetic and real life xml data trees validate our approach demonstrating that the average case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst case distortion bounds to the best of our knowledge these are the first algorithmic results on low distortion embeddings for tree edit distance metrics and on correlating eg through similarity joins xml data in the streaming model
we present simple module calculus where selection and execution of component is possible on open modules that is modules that still need to import some external definitions hence it provides kernel model for computational paradigm in which standard execution that is execution of single computation described by fragment of code can be interleaved with operations at the meta level which can manipulate in various ways the context in which this computation takes place formally this is achieved by introducing configurations as basic terms these are roughly speaking pairs consisting of an open mutually recursive collection of named components and term representing program running in the context of these components configurations can be manipulated by classical module fragment operators hence reduction steps can be either execution steps of the program or steps that perform module operations called reconfiguration steps since configurations combine the features of lambda abstractions first class functions records environments with mutually recursive definitions and modules the calculus extends and integrates both traditional module calculi and recursive lambda calculi we state confluence of the calculus and propose different ways to prevent errors arising from the lack of some required component either by purely static type system or by combination of static and run time checks moreover we define call by need strategy that performs module simplification only when needed and only once leading to generalisation of call by need lambda calculi that includes module features we prove the soundness and completeness of this strategy using an approach based on information content which also allows us to preserve confluence even when local substitution rules are added to the calculus
face recognition is rapidly growing research area due to increasing demands for security in commercial and law enforcement applications this paper provides an up to date review of research efforts in face recognition techniques based on two dimensional images in the visual and infrared ir spectra face recognition systems based on visual images have reached significant level of maturity with some practical success however the performance of visual face recognition may degrade under poor illumination conditions or for subjects of various skin colors ir imagery represents viable alternative to visible imaging in the search for robust and practical identification system while visual face recognition systems perform relatively reliably under controlled illumination conditions thermal ir face recognition systems are advantageous when there is no control over illumination or for detecting disguised faces face recognition using images is another active area of face recognition which provides robust face recognition with changes in pose recent research has also demonstrated that the fusion of different imaging modalities and spectral components can improve the overall performance of face recognition
recent advances in wireless networks and embedded systems have created new class of pervasive systems such as wireless sensor networks wsns and radio frequency identification rfid systems wsns and rfid systems provide promising solutions for wide variety of applications particularly in pervasive computing however security and privacy concerns have raised serious challenges on these systems these concerns have become more apparent when wsns and rfid systems co exist in this article we first briefly introduce wsns and rfid systems we then present their security concerns and related solutions finally we propose linear congruential generator lcg based lightweight block cipher that can meet security co existence requirements of wsns and rfid systems for pervasive computing
given real valued function defined over some metric space is it possible to recover some structural information about the function from the sole information of its values at finite subset of sample points whose pairwise distances in the metric space are given we provide positive answer to this question more precisely taking advantage of recent advances on the front of stability for persistence diagrams we introduce novel algebraic construction based on pair of nested families of simplicial complexes built on top of the point cloud from which the persistence diagram of the function can be faithfully approximated we derive from this construction series of algorithms for the analysis of scalar fields from point cloud data these algorithms are simple and easy to implement have reasonable complexities and come with theoretical guarantees to illustrate the generality of the approach we present some experimental results obtained in various applications ranging from clustering to sensor networks see the electronic version of the paper for color pictures
large scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks testing such services for complex performance problems and configuration errors remains difficult problem existing testing techniques such as simulation or running smaller instances of service have limitations in predicting overall service behavior although technically and economically infeasible at this time testing should ideally be performed at the same scale and with the same configuration as the deployed service we present diecast an approach to scaling network services in which we multiplex all of the nodes in given service configuration as virtual machines vm spread across much smaller number of physical machines in test harness cpu network and disk are then accurately scaled to provide the illusion that each vm matches machine from the original service in terms of both available computing resources and communication behavior to remote service nodes we present the architecture and evaluation of system to support such experimentation and discuss its limitations we show that for variety of services including commercial high performance cluster based file system and resource utilization levels diecast matches the behavior of the original service while using fraction of the physical resources
we present lambda cil typed lambda calculus which serves as the foundation for typed intermediate language for optimizing compilers for higher order polymorphic programming languages the key innovation of lambda cil is novel formulation of intersection and union types and flow labels on both terms and types these flow types can encode polyvariant control and data flow information within polymorphically typed program representation flow types can guide compiler in generating customized data representations in strongly typed setting since lambda cil enjoys confluence standardization and subject reduction properties it is valuable tool for reasoning about programs and program transformations
data fusion is the combination of number of independent search results relating to the same document collection into single result to be presented to the user number of probabilistic data fusion models have been shown to be effective in empirical studies these typically attempt to estimate the probability that particular documents will be relevant based on training data however little attempt has been made to gauge how the accuracy of these estimations affect fusion performance the focus of this paper is twofold firstly that accurate estimation of the probability of relevance results in effective data fusion and secondly that an effective approximation of this probability can be made based on less training data that has previously been employed this is based on the observation that the distribution of relevant documents follows similar pattern in most high quality result sets curve fitting suggests that this can be modelled by simple function that is less complex than other models that have been proposed the use of existing ir evaluation metrics is proposed as substitution for probability calculations mean average precision is used to demonstrate the effectiveness of this approach with evaluation results demonstrating competitive performance when compared with related algorithms with more onerous requirements for training data
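An illustrative (not the paper's) fusion sketch follows: approximate the probability of relevance with a simple one-parameter decaying function of rank, which could be fitted from a small amount of training data (e.g. so that the fitted curve is consistent with observed mean average precision), then sum the estimates per document across systems; the function form and parameter k are assumptions.

    # Sketch of probability-based data fusion over ranked result lists.
    from collections import defaultdict

    def prob_relevant(rank, k=30.0):
        """Assumed one-parameter model: relevance probability decays with rank."""
        return k / (k + rank)

    def fuse(result_lists, k=30.0):
        """result_lists: list of ranked lists of document ids (rank 1 first)."""
        scores = defaultdict(float)
        for results in result_lists:
            for rank, doc in enumerate(results, start=1):
                scores[doc] += prob_relevant(rank, k)
            # documents absent from a list simply contribute nothing for that system
        return sorted(scores, key=scores.get, reverse=True)

    # Example: two systems mostly agree on d1/d2; fusion promotes the consensus documents.
    print(fuse([["d1", "d2", "d3"], ["d2", "d1", "d4"]]))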
the parameterization of mesh into planar domain requires distortion metric and minimizing process most previous work has sought to minimize the average area distortion the average angle distortion or combination of these typical distortion metrics can reflect the overall performance of parameterizations but discount high local deformations this affects the performance of postprocessing operations such as uniform remeshing and texture mapping this paper introduces new metric that synthesizes the average distortions and the variances of both the area deformations and the angle deformations over an entire mesh experiments show that when compared with previous work the use of synthesized distortion metric performs satisfactorily in terms of both the average area deformation and the average angle deformation furthermore the area and angle deformations are distributed more uniformly this paper also develops new iterative process for minimizing the synthesized distortion the coefficient optimizing algorithm at each iteration rather than updating the positions immediately after the local optimization the coefficient optimizing algorithm first updates the coefficients for the linear convex combination and then globally updates the positions by solving the laplace system the high performance of the coefficient optimizing algorithm has been demonstrated in many experiments
probabilistic flooding has been frequently considered as suitable dissemination information approach for limiting the large message overhead associated with traditional full flooding approaches that are used to disseminate globally information in unstructured peer to peer and other networks key challenge in using probabilistic flooding is the determination of the forwarding probability so that global network outreach is achieved while keeping the message overhead as low as possible in this paper by showing that probabilistic flooding network generated by applying probabilistic flooding to connected random graph network can be asymptotically bounded by properly parameterized random graph networks and by invoking random graph theory results asymptotic values of the forwarding probability are derived guaranteeing probabilistically successful coverage while significantly reducing the message overhead with respect to traditional flooding asymptotic expressions with respect to the average number of messages and the average time required to complete network coverage are also derived illustrating the benefits of the properly parameterized probabilistic flooding scheme simulation results support the claims and expectations of the analytical results and reveal certain aspects of probabilistic flooding not covered by the analysis
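A back-of-the-envelope version of the random-graph argument is sketched below; the paper's asymptotic bounds are sharper and more carefully parameterized, so this is only the intuition.

    % Flooding a G(n, p_g) random graph while forwarding each message independently with
    % probability p behaves roughly like flooding a G(n, p\,p_g) graph, and a G(n, q)
    % random graph is connected with high probability once q exceeds \ln n / n, so
    p \;\gtrsim\; \frac{\ln n}{n\, p_g}
    % suffices asymptotically for probabilistically complete coverage while reducing the
    % expected number of messages by roughly a factor of p relative to full flooding.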
this paper addresses the design and implementation of an adaptive document version management scheme existing schemes typically assume i a priori expectations for how versions will be manipulated and ii fixed priorities between storage space usage and average access time they are not appropriate for all possible applications we introduce the concept of document version pertinence levels in order to select the best scheme for given requirements eg access patterns trade offs between access time and storage space pertinence levels can be considered as heuristics to dynamically select the appropriate scheme to improve the effectiveness of version management we present testbed for evaluating xml version management schemes
this paper describes an event dissemination algorithm that implements topic based publish subscribe interaction abstraction in mobile ad hoc networks manets our algorithm is frugal in two senses first it reduces the total number of duplicates and parasite events received by the subscribers second both the mobility of the publishers and the subscribers as well as the validity periods of the events are exploited to achieve high level of dissemination reliability with thrifty usage of the memory and bandwidth besides our algorithm is inherently portable and does not assume any underlying routing protocol we give simulation results of our algorithms in the two most popular mobility models city section and random waypoint we highlight interesting empirical lower bounds on the minimal validity period of any given event to ensure its reliable dissemination
various applications of spectral techniques for enhancing graph bisection in genetic algorithms are investigated several enhancements to genetic algorithm for graph bisection are introduced based on spectral decompositions of adjacency matrices of graphs and subpopulation matrices first the spectral decompositions give initial populations for the genetic algorithm to start with next spectral techniques are used to engineer new individuals and reorder the schema to strategically group certain sets of vertices together on the chromosome the operators and techniques are found to be beneficial when added to plain genetic algorithm and when used in conjunction with other local optimization techniques for graph bisection in addition several world record minimum bisections have been obtained from the methods described in this study
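One common form of spectral seeding, which this sketch illustrates under the assumption of a simple unweighted graph, is to take the Fiedler vector of the graph Laplacian and split the vertices at its median to obtain a balanced initial bisection for a genetic algorithm; the paper's additional decompositions (e.g., of subpopulation matrices) and chromosome-reordering operators are not modeled here.

```python
import numpy as np

def fiedler_bisection(adj_matrix):
    """Balanced 0/1 partition from the Fiedler vector of the graph Laplacian."""
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues returned in ascending order
    fiedler = eigvecs[:, 1]                  # eigenvector of the second-smallest eigenvalue
    return (fiedler > np.median(fiedler)).astype(int)

def cut_size(adj_matrix, part):
    A = np.asarray(adj_matrix)
    return int(sum(A[i, j] for i in range(len(part))
                   for j in range(i + 1, len(part)) if part[i] != part[j]))

# toy graph: two triangles joined by a single bridge edge
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
seed_partition = fiedler_bisection(A)
print(seed_partition, "cut size:", cut_size(A, seed_partition))   # the bridge is the only cut edge
```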
in previous work we have introduced nonmonotonic probabilistic logics under variable strength inheritance with overriding they are formalisms for probabilistic reasoning from sets of strict logical default logical and default probabilistic sentences which are parameterized through value that describes the strength of the inheritance of default probabilistic knowledge in this paper we continue this line of research and give precise picture of the complexity of deciding consistency of strength and of computing tight consequences of strength furthermore we present algorithms for these tasks which are based on reductions to the standard problems of deciding satisfiability and of computing tight logical consequences in model theoretic probabilistic logic finally we describe the system nmproblog which includes prototype implementation of these algorithms
wireless sensor networks wsns are increasingly being used to monitor various parameters in wide range of environmental monitoring applications in many instances environmental scientists are interested in collecting raw data using long running queries injected into wsn for analyzing at later stage rather than injecting snap shot queries containing data reducing operators eg min max avg that aggregate data collection of raw data poses challenge to wsns as very large amounts of data need to be transported through the network this not only leads to high levels of energy consumption and thus diminished network lifetime but also results in poor data quality as much of the data may be lost due to the limited bandwidth of present day sensor nodes we alleviate this problem by allowing certain nodes in the network to aggregate data by taking advantage of spatial and temporal correlations of various physical parameters and thus eliminating the transmission of redundant data in this article we present distributed scheduling algorithm that decides when particular node should perform this novel type of aggregation the scheduling algorithm autonomously reassigns schedules when changes in network topology due to failing or newly added nodes are detected such changes in topology are detected using cross layer information from the underlying mac layer we first present the theoretical performance bounds of our algorithm we then present simulation results which indicate reduction in message transmissions of up to percent and an increase in network lifetime of up to percent when compared to collecting raw data our algorithm is also capable of completely eliminating dropped messages caused by buffer overflow
in recent years the weakness of the canonical support confidence framework for associations mining has been widely studied one of the difficulties in applying association rules mining is the setting of support constraints high support constraint avoids the combinatorial explosion in discovering frequent itemsets but at the expense of missing interesting patterns of low support instead of seeking way to set the appropriate support constraints all current approaches leave the users in charge of the support setting which however puts the users in dilemma this paper is an effort to answer this long standing open question according to the notion of confidence and lift measures we propose an automatic support specification for efficiently mining high confidence and positive lift associations without consulting the users experimental results show that the proposed method is not only good at discovering high confidence and positive lift associations but also effective in reducing spurious frequent itemsets
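The measures the preceding abstract builds on can be made concrete with a small sketch: for a rule A -> B over a transaction set, confidence = sup(A ∪ B) / sup(A) and lift = confidence / sup(B), and only high-confidence rules with lift above 1 (positive correlation) are kept. The automatic derivation of per-item support thresholds is not reproduced; the data and thresholds below are illustrative.

```python
def support(itemset, transactions):
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Confidence and lift of the association rule antecedent -> consequent."""
    sup_a = support(antecedent, transactions)
    sup_b = support(consequent, transactions)
    sup_ab = support(set(antecedent) | set(consequent), transactions)
    confidence = sup_ab / sup_a if sup_a else 0.0
    lift = confidence / sup_b if sup_b else 0.0
    return confidence, lift

transactions = [frozenset(t) for t in
                [{"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "butter"},
                 {"butter"}, {"bread", "butter"}]]

confidence, lift = rule_measures({"bread"}, {"milk"}, transactions)
# keep the rule only if it is both high confidence and positively correlated
print(confidence, lift, confidence >= 0.7 and lift > 1.0)   # 0.75 1.25 True
```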
there is growing need for the use of active systems systems that act automatically based on events in many cases providing such active functionality requires materializing inferring the occurrence of relevant events widespread paradigm for enabling such materialization is complex event processing cep rule based paradigm which currently relies on domain experts to fully define the relevant rules these experts need to provide the set of basic events which serves as input to the rule their inter relationships and the parameters of the events for determining new event materialization while it is reasonable to expect that domain experts will be able to provide partial rules specification providing all the required details is hard task even for domain experts moreover in many active systems rules may change over time due to the dynamic nature of the domain such changes complicate even further the specification task as the expert must constantly update the rules as result we seek additional support to the definition of rules beyond expert opinion this work presents mechanism for automating both the initial definition of rules and the update of rules over time this mechanism combines partial information provided by the domain expert with machine learning techniques and is aimed at improving the accuracy of event specification and materialization the proposed mechanism consists of two main repetitive stages namely rule parameter prediction and rule parameter correction the former is performed by updating the parameters using an available expert knowledge regarding the future changes of parameters the latter stage utilizes expert feedback regarding the actual past occurrence of events and the events materialized by the cep framework to tune rule parameters we also include possible implementations for both stages based on statistical estimator and evaluate our outcome using case study from the intrusion detection domain
assembly instruction level reverse execution provides programmer with the ability to return program to previous state in its execution history via execution of reverse program the ability to execute program in reverse is advantageous for shortening software development time conventional techniques for recovering state rely on saving the state into record before the state is destroyed however state saving causes significant memory and time overheads during forward execution the proposed method introduces reverse execution methodology at the assembly instruction level with low memory and time overheads the methodology generates from program reverse program by which destroyed state is almost always regenerated rather than being restored from record this significantly reduces state saving the methodology has been implemented on powerpc processor with custom made debugger as compared to previous work all of which heavily use state saving techniques the experimental results show from to reduction in run time memory usage from to reduction in forward execution time overhead and from to reduction in forward execution time for the tested benchmarks furthermore due to the reduction in memory usage our method can provide reverse execution in many cases where other methods run out of available memory however for cases where there is enough memory available our method results in to slow down in reverse execution
while various optimization techniques have been used in existing thin client systems to reduce network traffic the screen updates triggered by many user operations will still result in long interactive latencies in many contemporary network environments long interactive latencies have an unfavorable effect on users perception of graphical interfaces and visual contents the long latencies arise when data spikes need to be transferred over network while the available bandwidth is limited these data spikes are composed of large amount of screen update data produced in very short time in this paper we propose model to analyze the packet level redundancy in screen update streams caused by repainting of graphical objects using this model we analyzed the data spikes in screen update streams based on the analysis result we designed hybrid cache compression scheme this scheme caches the screen updates in data spikes on both server and client sides and uses the cached data as history to better compress the recurrent screen updates in possible data spikes we empirically studied the effectiveness of our cache scheme on some screen updates generated by one of the most bandwidth efficient thin client system microsoft terminal service the experiment results showed that this cache scheme with cache of bytes can reduce data spike count and network traffic for the tested data and can reduce noticeable long latencies for different types of applications this scheme costs only little additional computation time and the cache size can be negotiated between the client and server
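A minimal sketch of the caching idea described above: both server and client keep a dictionary of previously seen update blocks keyed by a digest, and recurrent blocks are sent as short references instead of payloads. The block size, digest choice, and message framing are assumptions for illustration, not the scheme's actual protocol.

```python
import hashlib

class UpdateCache:
    """Shared logic for the server-side and client-side block caches."""
    def __init__(self):
        self.blocks = {}                      # digest -> raw block bytes

    def digest(self, block: bytes) -> bytes:
        return hashlib.sha1(block).digest()

def server_encode(cache: UpdateCache, update: bytes, block_size: int = 64):
    """Split a screen update into blocks; emit ('ref', digest) for cached blocks."""
    out = []
    for i in range(0, len(update), block_size):
        block = update[i:i + block_size]
        key = cache.digest(block)
        if key in cache.blocks:
            out.append(("ref", key))          # recurrent block: send a reference only
        else:
            cache.blocks[key] = block
            out.append(("raw", block))        # first occurrence: send the data
    return out

def client_decode(cache: UpdateCache, messages):
    data = bytearray()
    for kind, payload in messages:
        if kind == "raw":
            cache.blocks[cache.digest(payload)] = payload
            data.extend(payload)
        else:
            data.extend(cache.blocks[payload])
    return bytes(data)

server, client = UpdateCache(), UpdateCache()
spike = b"toolbar" * 40                       # repetitive repaint data in a spike
wire = server_encode(server, spike)
assert client_decode(client, wire) == spike
wire2 = server_encode(server, spike)          # the same repaint later: all references
print(sum(kind == "ref" for kind, _ in wire2), "of", len(wire2), "blocks sent as refs")
```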
sometimes it is desirable to alter or optimize the behaviour of an object according to the needs of specific portion of the source code ie context such as particular loop or phase one technique to support this form of optimization flexibility is novel approach called scoped behaviour scoped behaviour allows the programmer to incrementally tune applications on per object and per context basis within standard we explore the use of scoped behaviour in the implementation of the aurora distributed shared data dsd system in aurora the programmer uses scoped behaviour as the interface to various data sharing optimizations we detail how a class library implements the basic data sharing functionality and how scoped behaviour co ordinates the compile time and run time interaction between classes to implement the optimizations we also explore how the library can be expanded with new classes and new optimization behaviours the good performance of aurora suggests that using scoped behaviour and class library is viable approach for supporting this form of optimization flexibility
use of traditional mean type algorithm is limited to numeric data this paper presents clustering algorithm based on mean paradigm that works well for data with mixed numeric and categorical features we propose new cost function and distance measure based on co occurrence of values the measures also take into account the significance of an attribute towards the clustering process we present modified description of cluster center to overcome the numeric data only limitation of mean algorithm and provide better characterization of clusters the performance of this algorithm has been studied on real world data sets comparisons with other clustering algorithms illustrate the effectiveness of this approach
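The following is a compact sketch, in the spirit of the preceding abstract, of a k-means-style loop for mixed data: a cluster center stores the numeric means plus per-attribute categorical frequency tables, and the distance adds squared numeric differences to one minus the relative frequency of the point's categorical value in the cluster. It illustrates the frequency-based center idea only; the paper's co-occurrence-based measure and attribute-significance weights are not reproduced.

```python
import random

def distance(point, center, num_idx, cat_idx):
    d = sum((point[i] - center["mean"][i]) ** 2 for i in num_idx)
    for i in cat_idx:
        freq = center["freq"][i]
        total = sum(freq.values()) or 1
        d += 1.0 - freq.get(point[i], 0) / total   # rare category in this cluster -> larger distance
    return d

def update_center(members, num_idx, cat_idx):
    center = {"mean": {}, "freq": {}}
    for i in num_idx:
        center["mean"][i] = sum(p[i] for p in members) / len(members)
    for i in cat_idx:
        freq = {}
        for p in members:
            freq[p[i]] = freq.get(p[i], 0) + 1
        center["freq"][i] = freq
    return center

def cluster(points, k, num_idx, cat_idx, iters=10, seed=0):
    rng = random.Random(seed)
    centers = [update_center([p], num_idx, cat_idx) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: distance(p, centers[c], num_idx, cat_idx))
            groups[j].append(p)
        centers = [update_center(g, num_idx, cat_idx) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

data = [(1.0, "red"), (1.2, "red"), (0.9, "red"),
        (8.0, "blue"), (8.3, "blue"), (7.9, "green")]
print(cluster(data, 2, num_idx=[0], cat_idx=[1]))
```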
we consider the problem of estimating the frequency count of data stream elements under polynomial decay functions in these settings every element arriving in the stream is assigned time decreasing weight using non increasing polynomial function decay functions are used in applications where older data is less significant interesting reliable than recent data we propose poly logarithmic algorithms for the problem the first one deterministic uses frac epsilon log log log log bits the second one probabilistic uses frac epsilon log frac epsilon delta log bits and the third one deterministic in the stochastic model uses frac epsilon log bits in addition we show that using additional additive error can improve in some cases the space bounds this variant of the problem is important and has many applications to our knowledge it was never studied before
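To make the problem statement concrete, here is an exact but space-inefficient reference implementation in which each arrival is weighted by a non-increasing polynomial of its age, assumed here to be (age + 1)^(-alpha); the algorithms in the abstract approximate these decayed counts in poly-logarithmic space.

```python
class ExactDecayedCounter:
    """Stores every arrival: exact, but space grows linearly with the stream."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.arrivals = []               # list of (timestamp, element)

    def add(self, timestamp, element):
        self.arrivals.append((timestamp, element))

    def decayed_count(self, element, now):
        # polynomial decay: an item of age a contributes (a + 1) ** -alpha
        return sum((now - t + 1) ** (-self.alpha)
                   for t, e in self.arrivals if e == element)

c = ExactDecayedCounter(alpha=1.0)
for t, e in [(0, "x"), (1, "y"), (2, "x"), (3, "x")]:
    c.add(t, e)
print(round(c.decayed_count("x", now=3), 3))   # 1/4 + 1/2 + 1/1 = 1.75
```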
schema matching is crucial step in data integration many approaches to schema matching have been proposed these approaches make use of different types of information about schemas including structures linguistic features and data types etc to measure different types of similarity between the attributes of two schemas they then combine different types of similarity and use combined similarity to select collection of attribute correspondences for every source attribute thresholds are usually used for filtering out likely incorrect attribute correspondences which have to be set manually and are matcher and domain dependent selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes in this paper we propose new prioritized collective selection strategy that has two distinct characteristics first this strategy clusters set of attribute correspondences into number of clusters and collectively selects attribute correspondences from each of these clusters in prioritized order second it introduces use of null correspondence for each source attribute which represents the option that the source attribute has no attribute correspondence by considering this option our strategy does not need threshold to filter out likely incorrect attribute correspondences our experimental results show that our approach is highly effective
in an evolving software system components must be able to change independently while remaining compatible with their peers one obstacle to independent evolution is the brittle parameter problem the ability of two components to communicate can depend on number of inessential details of the types structure and or contents of the values communicated if these details change then the components can no longer communicate even if the essential parts of the message remain unaffected we present hydroj an extension of java that addresses this problem in hydroj components communicate using self describing semi structured messages and programmers use pattern matching to define the handling of messages this design stems from two central ideas first that self describing messages reduce dependence on inessential message format details and second that object oriented pattern matching naturally focuses on the essential information in message and is insensitive to inessential information we have developed these ideas in the context of rain distributed heterogeneous messaging system for ubiquitous computing to evaluate the design we have constructed prototype hydroj compiler implemented some rain services in hydroj studied the evolution of an existing rain service over time and formalized hydroj’s key features in core language
householders are increasingly adopting home networking as solution to the demands created by the presence of multiple computers devices and the desire to access the internet however current network solutions are derived from the world of work and initially the military and provide poor support for the needs of the home we present the key findings to emerge from empirical studies of home networks in the uk and us the studies reveal two key kinds of work that effective home networking relies upon one the technical work of setting up and maintaining the home network and the other the collaborative and socially organized work of the home which the network is embedded in and supports the two are thoroughly intertwined and rely upon one another for their realization yet neither is adequately supported by current networking technologies and applications explication of the “work to make the home network work” opens up the design space for the continued integration of the home network in domestic life and elaboration of future support key issues for development include the development of networking facilities that do not require advanced networking knowledge that are flexible and support the local social order of the home and the evolution of its routines and which ultimately make the home network visible and accountable to household members
the goal of the work described here is to integrate input device the wii controller and enjine didactic engine motivated by the growing use of interfaces this article discusses how this increases enjine’s didactic and technological potential and details the adopted solution as layered architecture two interaction styles were tested with the implemented solution test results show variety of data about the controller confirming that this solution works as desired and that using it to modify game so it can use the wiimote as its input device is simple task
software product line spl engineering is software development approach that takes advantage of the commonality and variability between products from family and supports the generation of specific products by reusing set of core family assets this paper proposes uml model transformation approach for software product lines to derive performance model for specific product the input to the proposed technique the source model is uml model of spl with performance annotations which uses two separate profiles product line profile from literature for specifying the commonality and variability between products and the marte profile recently standardized by omg for performance annotations the source model is generic and therefore its performance annotations must be parameterized the proposed derivation of performance model for concrete product requires two steps the transformation of spl model to uml model with performance annotations for given product and the transformation of the outcome of the first step into performance model this paper focuses on the first step whereas the second step will use the puma transformation approach of annotated uml models to performance models developed in previous work the output of the first step named target model is uml model with marte annotations where the variability expressed in the spl model has been analyzed and bound to specific product and the generic performance annotations have been bound to concrete values for the product the proposed technique is illustrated with an ecommerce case study
we investigate how money market news headlines can be used to forecast intraday currency exchange rate movements the innovation of the approach is that unlike analysis based on quantifiable information the forecasts are produced from text describing the current status of world financial markets as well as political and general economic news in contrast to numeric time series data textual data contains not only the effect eg the dollar rises against the deutschmark but also the possible causes of the event eg because of weak german bond market hence improved predictions are expected from this richer input the output is categorical forecast about currency exchange rates the dollar moves up remains steady or goes down within the next one two or three hours respectively on publicly available commercial data set the system produces results that are significantly better than random prediction the contribution of this research is the smart modeling of the prediction problem enabling the use of content rich text for forecasting purposes
in this article and two other articles which conceptualize future stage of the research program leide cole large beheshti submitted for publication cole leide large beheshti brooks in preparation we map out domain novice user’s encounter with an ir system from beginning to end so that appropriate classification based visualization schemes can be inserted into the encounter process this article describes the visualization of navigation classification scheme only the navigation classification scheme uses the metaphor of ship and ship’s navigator traveling through charted but unknown to the user waters guided by series of lighthouses the lighthouses contain mediation interfaces linking the user to the information store through agents created for each the user’s agent is the cognitive model the user has of the information space which the system encourages to evolve via interaction with the system’s agent the system’s agent is an evolving classification scheme created by professional indexers to represent the structure of the information store we propose more systematic multidimensional approach to creating evolving classification indexing schemes based on where the user is and what she is trying to do at that moment during the search session
although workstation clusters are common platform for high performance computing hpc they remain more difficult to manage than sequential systems or even symmetric multiprocessors furthermore as cluster sizes increase the quality of the resource management subsystem essentially all of the code that runs on cluster other than the applications increasingly impacts application efficiency in this paper we present storm resource management framework designed for scalability and performance the key innovation behind storm is software architecture that enables resource management to exploit low level network features as result of this hpc application like design storm is orders of magnitude faster than the best reported results in the literature on two sample resource management functions job launching and process scheduling
data cube is popular organization for summary data cube is simply multidimensional structure that contains in each cell an aggregate value ie the result of applying an aggregate function to an underlying relation in practical situations cubes can require large amount of storage so compressing them is of practical importance in this paper we propose an approximation technique that reduces the storage cost of the cube at the price of getting approximate answers for the queries posed against the cube the idea is to characterize regions of the cube by using statistical models whose description takes less space than the data itself then the model parameters can be used to estimate the cube cells with certain level of accuracy to increase the accuracy and to guarantee the level of error in the query answers some of the “outliers” ie cells that incur the largest errors when estimated are retained the storage taken by the model parameters and the retained cells of course should take fraction of the space of the full cube and the estimation procedure should be faster than computing the data from the underlying relations we use loglinear models to model the cube regions experiments show that the errors introduced in typical queries are small even when the description is substantially smaller than the full cube since cubes are used to support data analysis and analysts are rarely interested in the precise values of the aggregates but rather in trends providing approximate answers is in most cases satisfactory compromise although other techniques have been used for the purpose of compressing data cubes ours has the advantage of using parametric loglinear models and the retaining of outliers which enables the system to give error guarantees that are data independent for every query posed on the data cube the models also offer information about the underlying structure of the data modeled by them moreover these models are relatively easy to update dynamically as data is added to the warehouse
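A small numpy sketch of the general approach for one two-dimensional cube region: fit the simplest loglinear model (independence of the two dimensions), retain the few cells with the largest absolute residuals as exact outliers, and answer cell queries from the model plus the retained cells. The real technique partitions the cube into regions and can use richer loglinear models; the region and outlier budget below are illustrative.

```python
import numpy as np

def fit_independence_model(region):
    """Expected cell values under the simplest loglinear (independence) model."""
    region = np.asarray(region, dtype=float)
    return np.outer(region.sum(axis=1), region.sum(axis=0)) / region.sum()

def compress_region(region, n_outliers=2):
    region = np.asarray(region, dtype=float)
    model = fit_independence_model(region)
    residuals = np.abs(region - model)
    # retain exactly the cells the model estimates worst
    worst = np.argsort(residuals, axis=None)[::-1][:n_outliers]
    outliers = {}
    for flat_index in worst:
        cell = tuple(int(i) for i in np.unravel_index(flat_index, region.shape))
        outliers[cell] = float(region[cell])
    return model, outliers

def estimate_cell(model, outliers, cell):
    return outliers.get(cell, float(model[cell]))

region = np.array([[20, 30, 10],
                   [40, 60, 20],
                   [10, 15, 90]])          # the bottom-right cell breaks independence
model, outliers = compress_region(region, n_outliers=2)
for cell in [(0, 0), (2, 2)]:
    print(cell, "true:", int(region[cell]),
          "estimate:", round(estimate_cell(model, outliers, cell), 1))
```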
sybil node impersonates other nodes by broadcasting messages with multiple node identifiers id in contrast to existing solutions which are based on sharing encryption keys we present robust and lightweight solution for sybil attack problem based on received signal strength indicator rssi readings of messages our solution is robust since it detects all sybil attack cases with completeness and less than few percent false positives our solution is lightweight in the sense that alongside the receiver we need the collaboration of one other node ie only one message communication for our protocol we show through experiments that even though rssi is time varying and unreliable in general and radio transmission is non isotropic using ratio of rssis from multiple receivers it is feasible to overcome these problems
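The core observation above can be sketched with a simple log-distance path-loss model: the difference (in dB) between the RSSIs observed by two fixed receivers depends on the sender's position but not on its transmit power, so two identities whose RSSI-difference fingerprints agree within a tolerance are flagged as a suspected Sybil pair. The propagation model, tolerance, and coordinates below are illustrative assumptions rather than the paper's experimental setup.

```python
import math

def rssi_dbm(tx_power_dbm, sender, receiver, path_loss_exp=2.0):
    """Log-distance path-loss model (no fading, for illustration only)."""
    d = math.dist(sender, receiver)
    return tx_power_dbm - 10 * path_loss_exp * math.log10(d)

def fingerprint(sender, tx_power, receivers):
    """RSSI difference between two receivers; the transmit power cancels out."""
    r1 = rssi_dbm(tx_power, sender, receivers[0])
    r2 = rssi_dbm(tx_power, sender, receivers[1])
    return r1 - r2

def suspected_sybil(fp_a, fp_b, tol_db=1.0):
    return abs(fp_a - fp_b) < tol_db

receivers = [(0.0, 0.0), (50.0, 0.0)]
honest_a = fingerprint((10.0, 20.0), tx_power=0.0,  receivers=receivers)
honest_b = fingerprint((40.0, 15.0), tx_power=0.0,  receivers=receivers)
# a Sybil identity is emitted by the same physical node, perhaps at another power
sybil    = fingerprint((10.0, 20.0), tx_power=-5.0, receivers=receivers)

print(suspected_sybil(honest_a, honest_b))  # False: different positions
print(suspected_sybil(honest_a, sybil))     # True: same position, power change is hidden
```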
recently developed methods for learning sparse classifiers are among the state of the art in supervised learning these methods learn classifiers that incorporate weighted sums of basis functions with sparsity promoting priors encouraging the weight estimates to be either significantly large or exactly zero from learning theoretic perspective these methods control the capacity of the learned classifier by minimizing the number of basis functions used resulting in better generalization this paper presents three contributions related to learning sparse classifiers first we introduce true multiclass formulation based on multinomial logistic regression second by combining bound optimization approach with component wise update procedure we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality making them applicable even to large data sets in high dimensional feature spaces to the best of our knowledge these are the first algorithms to perform exact multinomial logistic regression with sparsity promoting prior third we show how nontrivial generalization bounds can be derived for our classifier in the binary case experimental results on standard benchmark data sets attest to the accuracy sparsity and efficiency of the proposed methods
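As a rough illustration of the model class, the sketch below trains an L1-penalized multinomial logistic regression with plain proximal-gradient steps in numpy; the bound-optimization and component-wise updates described above are a different and faster route to this kind of sparse multiclass classifier, and the penalty, step size, and data here are illustrative.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def soft_threshold(W, t):
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def sparse_multinomial_logreg(X, y, n_classes, lam=0.05, lr=0.1, iters=500):
    """L1-penalized softmax regression trained with proximal gradient steps."""
    n, d = X.shape
    Y = np.eye(n_classes)[y]                          # one-hot labels
    W = np.zeros((d, n_classes))
    for _ in range(iters):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / n                      # gradient of the average log-loss
        W = soft_threshold(W - lr * grad, lr * lam)   # proximal step for the L1 prior
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
true_W = np.zeros((10, 3))
true_W[:3] = 3.0 * rng.normal(size=(3, 3))            # only the first 3 features matter
y = np.argmax(X @ true_W + 0.1 * rng.normal(size=(300, 3)), axis=1)

W = sparse_multinomial_logreg(X, y, n_classes=3)
pred = np.argmax(X @ W, axis=1)
print("nonzero weights:", int(np.count_nonzero(W)), "of", W.size)
print("training accuracy:", float((pred == y).mean()))
```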
this paper presents novel interprocedural flow sensitive and context sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers for each pointer and each program point the algorithm computes conservative approximation of the memory locations to which that pointer may point the algorithm correctly handles full range of constructs in multithreaded programs including recursive functions function pointers structures arrays nested structures and arrays pointer arithmetic casts between pointer variables of different types heap and stack allocated memory shared global variables and thread private global variables we have implemented the algorithm in the suif compiler system and used the implementation to analyze sizable set of multithreaded programs written in the cilk multithreaded programming language our experimental results show that the analysis has good precision and converges quickly for our set of cilk programs
one of the key benefits of xml is its ability to represent mix of structured and unstructured text data although current xml query languages such as xpath and xquery can express rich queries over structured data they can only express very rudimentary queries over text data we thus propose texquery which is powerful full text search extension to xquery texquery provides rich set of fully composable full text search primitives such as boolean connectives phrase matching proximity distance stemming and thesauri texquery also enables users to seamlessly query over both structured and text data by embedding texquery primitives in xquery and vice versa finally texquery supports flexible scoring construct that can be used to score query results based on full text predicates texquery is the precursor of the full text language extensions to xpath and xquery currently being developed by the w3c
the rational unified process is comprehensive process model that is tailorable provides templates for the software engineering products and integrates the use of the unified modeling language uml it is rapidly becoming de facto standard for developing software the process supports the definition of requirements at multiple levels currently the early requirements or goals are captured in textual document called the vision document as the uml does not include goal modeling diagram the goals are subsequently refined into software requirements captured in uml use case diagrams given the well documented advantages of visual modeling techniques in requirements engineering including the efficient communication and understanding of complex information among numerous diverse stakeholders the need for an enhanced version of the vision document template which supports the visual modeling of goals is identified here an enhanced vision document is proposed which integrates two existing visual goal models an and or graph for functional goals and softgoal interdependency graph for non functional goals specific approach to establishing traceability relationships from the goals to the use cases is presented tool support has been developed for the enhanced vision document template the approach is illustrated using an example system called the quality assurance review assistant tool
the paper deals with the problem of computing schedules for multi threaded real time programs in we introduced scheduling method based on the geometrization of pv programs in this paper we pursue this direction further by showing property of the geometrization that permits finding good schedules by means of efficient geometric computation in addition this geometric property is also exploited to reduce the scheduling problem to simple path planning problem originating from robotics for which we developed scheduling algorithm using probabilistic path planning techniques these results enabled us to implement prototype tool that can handle models with up to concurrent threads
we present general method for explaining individual predictions of classification models the method is based on fundamental concepts from coalitional game theory and predictions are explained with contributions of individual feature values we overcome the method’s initial exponential time complexity with sampling based approximation in the experimental part of the paper we use the developed method on models generated by several well known machine learning algorithms on both synthetic and real world data sets the results demonstrate that the method is efficient and that the explanations are intuitive and useful
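A minimal sketch of the sampling-based approximation mentioned above, in the style of Shapley-value sampling: average, over random feature orderings and random background instances, the change in the model output when the feature of interest is switched from the background value to the instance's value with the preceding features already set. The toy model, data, and sample count are illustrative.

```python
import random

def sampled_contribution(model, instance, data, feature, n_samples=2000, seed=0):
    """Monte Carlo estimate of `feature`'s contribution to model(instance)."""
    rng = random.Random(seed)
    features = list(range(len(instance)))
    total = 0.0
    for _ in range(n_samples):
        reference = rng.choice(data)                # random background instance
        order = features[:]
        rng.shuffle(order)                          # random feature ordering
        pos = order.index(feature)
        x_with = list(reference)
        for f in order[:pos + 1]:                   # preceding features plus this one
            x_with[f] = instance[f]
        x_without = list(x_with)
        x_without[feature] = reference[feature]     # everything else kept identical
        total += model(x_with) - model(x_without)
    return total / n_samples

# toy model: features 0 and 1 interact, feature 2 is ignored
model = lambda x: 2.0 * x[0] + x[0] * x[1]
data = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
instance = [1, 1, 0]
for f in range(3):
    print("feature", f, "contribution ~",
          round(sampled_contribution(model, instance, data, f), 2))
```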
given class model built from dataset including labeled data classification assigns new data object to the appropriate class in associative classification the class model ie the classifier is set of association rules associative classification is promising technique for the generation of highly accurate classifiers in this article we present compact form which encodes without information loss the classification knowledge available in classification rule set this form includes the rules that are essential for classification purposes and thus it can replace the complete rule set the proposed form is particularly effective in dense datasets where traditional extraction techniques may generate huge rule sets the reduction in size of the rule set allows decreasing the complexity of both the rule generation step and the rule pruning step hence classification rule extraction can be performed also with low support in order to extract more possibly useful rules
we propose an approach for the selective enforcement of access control restrictions in possibly distributed large data collections based on two basic concepts i flexible authorizations identify in declarative way the data that can be released and ii queries are checked for execution not with respect to individual authorizations but rather evaluating whether the information release they directly or indirectly entail is allowed by the authorizations our solution is based on the definition of query profiles capturing the information content of query and builds on graph based modeling of database schema authorizations and queries access control is then effectively modeled and efficiently executed in terms of graph coloring and composition and on traversal of graph paths we then provide polynomial composition algorithm for determining if query is authorized
sensor networks are often subject to physical attacks once node’s cryptographic key is compromised an attacker may completely impersonate it and introduce arbitrary false information into the network basic cryptographic mechanisms are often not effective in this situation most techniques to address this problem focus on detecting and tolerating false information introduced by compromised nodes they cannot pinpoint exactly where the false information is introduced and who is responsible for it in this article we propose an application independent framework for accurately identifying compromised sensor nodes the framework provides an appropriate abstraction of application specific detection mechanisms and models the unique properties of sensor networks based on the framework we develop alert reasoning algorithms to identify compromised nodes the algorithm assumes that compromised nodes may collude at will we show that our algorithm is optimal in the sense that it identifies the largest number of compromised nodes without introducing false positives we evaluate the effectiveness of the designed algorithm through comprehensive experiments
in this paper we investigate the multi node broadcasting problem in torus where there are an unknown number of source nodes located at unknown positions each intending to broadcast message of size bytes to the rest of the network the torus is assumed to use the all port model and the popular dimension ordered routing existing congestion free results are derived based on finding multiple edge disjoint spanning trees in the network this paper shows how to efficiently perform multi node broadcasting in torus the main technique used in this paper is an aggregation then distribution strategy which is characterized by the following features i the broadcast messages are aggregated into some positions on the torus then number of independent subnetworks are constructed from the torus and ii these subnetworks which are responsible for distributing the messages fully exploit the communication parallelism and the characteristic of wormhole routing it is shown that such an approach is more appropriate than those using edge disjoint trees for fixed connection networks such as tori extensive simulations are conducted to evaluate this multi broadcasting algorithm
shrinking time to market and high demand for productivity has driven traditional hardware designers to use design methodologies that start from high level languages however meeting timing constraints of automatically generated ips is often challenging and time consuming task that must be repeated every time the specification is modified to address this issue new generation of ip design technologies that is capable of generating custom datapaths as well as programming an existing one is developed these technologies are often based on horizontal microcoded architectures large code size is well known problem in hmas and is referred to as code bloating problem in this paper we study the code size of one of the new hma based technologies called nisc we show that nisc code size can be several times larger than typical risc processor and we propose several low overhead dictionary based code compression techniques to reduce the code size our compression algorithm leverages the knowledge of don’t care values in the control words to better compress the content of dictionary memories our experiments show that by selecting proper memory architectures the code size of nisc can be reduced by ie times at cost of only performance degradation we also show that some code compression techniques may increase number of utilized block rams in fpga based implementations to address this issue we propose combining dictionaries and implementing them using embedded dual port memories
state of the art application specific instruction set processors asips allow the designer to define individual prefabrication customizations thus improving the degree of specialization towards the actual application requirements eg the computational hot spots however only subset of hot spots can be targeted to keep the asip within reasonable size we propose modular special instruction composition with multiple implementation possibilities per special instruction compile time embedded instructions to trigger run time adaptation of the instruction set and run time system that dynamically selects an appropriate variation of the instruction set ie situation dependent beneficial implementation for each special instruction we thereby achieve better efficiency of resource usage of up to average compared with current state of the art asips resulting in average improved application performance compared with general purpose processor up to and average
web services evolve over their life time and change their behavior in our work we analyze web service related changes and investigate interdependencies of web service related changes we classify changes of web services for an analysis regarding causes and effects of such changes and utilize dedicated web service information model to capture the changes of web services we show how to put changes of web services into an evolutionary context that allows us to develop holistic perspective on web services and their stakeholders in ecosystem of web services
this paper presents formal technique for system level power performance analysis that can help the designer to select the right platform starting from set of target applications by platform we mean family of heterogeneous architectures that satisfy set of architectural constraints imposed to allow re use of hardware and software components more precisely we introduce the stochastic automata networks sans as an effective formalism for average case analysis that can be used early in the design cycle to identify the best power performance figure among several application architecture combinations this information not only helps avoid lengthy profiling simulations but also enables efficient mappings of the applications onto the chosen platform we illustrate the features of our technique through the design of an mpeg video decoder application
with the widening performance gap between processors and main memory efficient memory accessing behavior is necessary for good program performance loop partition is an effective way to exploit the data locality traditional loop partition techniques however consider only singleton nested loop this paper presents multiple loop partition scheduling technique which combines the loop partition and data padding to generate the detailed partition schedule the computation and data prefetching are balanced in the partition schedule such that the long memory latency can be hidden efficiently multiple loop partition scheduling explores parallelism among computations and exploit the data locality between different loop nests as well in each loop nest data padding is applied in our technique to eliminate the cache interference which overcomes the problem of cache conflict misses arisen from loop partition therefore our technique can be applied in architectures with low associativity cache the experiments show that multiple loop partition scheduling can achieve the significant improvement over the existing methods
in this paper we address the issue of reasoning with two classes of commonly used semantic integrity constraints in database and knowledge base systems implication constraints and referential constraints we first consider central problem in this respect the irc refuting problem which is to decide whether conjunctive query always produces an empty relation on finite database instances satisfying given set of implication and referential constraints since the general problem is undecidable we only consider acyclic referential constraints under this assumption we prove that the irc refuting problem is decidable and give novel necessary and sufficient condition for it under the same assumption we also study several other problems encountered in semantic query optimization such as the semantics based query containment problem redundant join problem and redundant selection condition problem and show that they are polynomially equivalent or reducible to the irc refuting problem moreover we give results on reducing the complexity for some special cases of the irc refuting problem
monads have become popular tool for dealing with computational effects in haskell for two significant reasons equational reasoning is retained even in the presence of effects and program modularity is enhanced by hiding plumbing issues inside the monadic infrastructure unfortunately not all the facilities provided by the underlying language are readily available for monadic computations in particular while recursive monadic computations can be defined directly using haskell’s built in recursion capabilities there is no natural way to express recursion over the values of monadic actions using examples we illustrate why this is problem and we propose an extension to haskell’s do notation to remedy the situation it turns out that the structure of monadic value recursion depends on the structure of the underlying monad we propose an axiomatization of the recursion operation and provide catalogue of definitions that satisfy our criteria
logic programming with the stable model semantics is put forward as novel constraint programming paradigm this paradigm is interesting because it brings advantages of logic programming based knowledge representation techniques to constraint programming and because implementation methods for the stable model semantics for ground variable free programs have advanced significantly in recent years for program with variables these methods need grounding procedure for generating variable free program as practical approach to handling the grounding problem subclass of logic programs domain restricted programs is proposed this subclass enables efficient grounding procedures and serves as basis for integrating built in predicates and functions often needed in applications it is shown that the novel paradigm embeds classical logical satisfiability and standard finite domain constraint satisfaction problems but seems to provide more expressive framework from knowledge representation point of view the first steps towards programming methodology for the new paradigm are taken by presenting solutions to standard constraint satisfaction problems combinatorial graph problems and planning problems an efficient implementation of the paradigm based on domain restricted programs has been developed this is an extension of previous implementation of the stable model semantics the smodels system and is publicly available it contains eg built in integer arithmetic integrated to stable model computation the implementation is described briefly and some test results illustrating the current level of performance are reported
this paper describes language and framework that allow coordinated transformations driven by invariants to be specified declaratively as invariant rules and applied automatically the framework supports incremental maintenance of invariants for program design and optimization as well as general transformations for instrumentation refactoring and other purposes this paper also describes our implementations for transforming python and programs and experiments with successful applications of the systems in generating efficient implementations from clear and modular specifications in instrumenting programs for runtime verification profiling and debugging and in code refactoring
spatial query language enables the spatial analysis of building information models and the extraction of partial models that fulfill certain spatial constraints among other features the developed spatial query language includes directional operators ie operators that reflect the directional relationships between spatial objects such as northof southof eastof westof above and below the paper presents in depth definitions of the semantics of two new directional models for extended objects the projection based and the halfspace based model by using point set theory notation it further describes the possible implementation of directional operators using newly developed space partitioning data structure called slot tree which is derived from the objects octree representation the slot tree allows for the application of recursive algorithms that successively increase the discrete resolution of the spatial objects employed and thereby enables the user to trade off between computational effort and the required accuracy the article also introduces detailed investigations on the runtime performance of the developed algorithms
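One simple way a predicate such as northof can be evaluated on axis-aligned bounding boxes is shown below as a rough halfspace-style test (A's box lies entirely beyond B's box along the chosen axis); the operators in the paper work on octree/slot-tree discretizations of the full geometry and distinguish projection-based from halfspace-based semantics, neither of which is fully modeled here.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box: (min_x, min_y, min_z) .. (max_x, max_y, max_z)."""
    min_pt: tuple
    max_pt: tuple

def north_of(a: Box, b: Box) -> bool:
    # treating +y as north: every point of a's box lies beyond b's maximal y extent
    return a.min_pt[1] > b.max_pt[1]

def above(a: Box, b: Box) -> bool:
    return a.min_pt[2] > b.max_pt[2]

slab   = Box((0, 0, 3.0), (10, 10, 3.3))
column = Box((4, 4, 0.0), (4.4, 4.4, 3.0))
print(above(slab, column))                                                  # False: boxes touch at z = 3.0
print(north_of(Box((0, 12, 0), (2, 14, 3)), Box((0, 0, 0), (10, 10, 3))))   # True
```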
the group oriented services are one of the primary application classes that are addressed by mobile ad hoc networks manets in recent years to support such services multicast routing is used thus there is need to design stable and reliable multicast routing protocols for manets to ensure better packet delivery ratio lower delays and reduced overheads in this paper we propose mesh based multicast routing scheme that finds stable multicast path from source to receivers the multicast mesh is constructed by using route request and route reply packets with the help of multicast routing information cache and link stability database maintained at every node the stable paths are found based on selection of stable forwarding nodes that have high stability of link connectivity the link stability is computed by using the parameters such as received power distance between neighboring nodes and the link quality that is assessed using bit errors in packet the proposed scheme is simulated over large number of manet nodes with wide range of mobility and the performance is evaluated performance of the proposed scheme is compared with two well known mesh based multicast routing protocols ie on demand multicast routing protocol odmrp and enhanced on demand multicast routing protocol eodmrp it is observed that the proposed scheme produces better packet delivery ratio reduced packet delay and reduced overheads such as control memory computation and message overheads
innovative ecustoms solutions play an important role in the pan european egovernment strategy the underlying premise is interoperability postulating common understanding of processes services and the documents that are exchanged between business and government organizations as well as between governmental authorities of different eu member states this article provides stringent approach for deriving documents and services from current ecustoms procedures based on the un cefact standards framework and for embedding these in service oriented architecture for collaborative egovernment in doing so we put special focus on document engineering by applying the un cefact core component technical specification ccts conceptual framework for modeling document components in syntax neutral and technology independent manner by relying on ccts we want to tackle the challenge of handling different document configurations imposed by divergent national legislations different customs procedures export import transit and excise and different industries the resulting conceptual model is transferred to xml schema serving as basis for web services design and implementation these web services are designed for seamless interoperable exchange of electronic customs documents between heterogeneous is landscapes both on business and government side beyond the theoretical deduction practical insights are gained from european research project implementing the artifacts proposed in real world setting
the web contains an abundance of useful semistructured information about real world objects and our empirical study shows that strong sequence characteristics exist for web information about objects of the same type across different web sites conditional random fields crfs are the state of the art approaches taking the sequence characteristics to do better labeling however as the information on web page is two dimensionally laid out previous linear chain crfs have their limitations for web information extraction to better incorporate the two dimensional neighborhood interactions this paper presents two dimensional crf model to automatically extract object information from the web we empirically compare the proposed model with existing linear chain crf models for product information extraction and the results show the effectiveness of our model
most of the work done in the field of code compression pertains to processors with fixed length instruction encoding the design of code compression scheme for variable length instruction encodings poses newer design challenges in this work we first investigate the scope for code compression on variable length instruction set processors whose encodings are already optimized to certain extent with respect to their usage for such isas instruction boundaries are not known prior to decoding another challenging task of designing code compression scheme for such isas is designing the decompression hardware which must decompress code postcache so that we gain in performance we present two dictionary based code compression schemes the first algorithm uses bit vector the second one uses reserved instructions to identify code words we design additional logic for each of the schemes to decompress the code on the fly we test the two algorithms with variable length risc processor we provide detailed experimental analysis of the empirical results obtained by extensive simulation based design space exploration for this system the optimized decompressor can now execute compressed program faster than the native program the experiments demonstrate reduction in code size up to percent speed up up to percent and bus switching activity up to percent we also implement one decompressor in hardware description language and synthesize it to illustrate the small overheads associated with the proposed approach
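The general shape of the first (bit-vector) scheme can be sketched as follows: the most frequent instruction words form a dictionary, the compressed stream holds either a dictionary index or the original word, and a bit vector records which is which so the decompressor can tell them apart. Instruction widths and dictionary sizing are simplified for illustration.

```python
from collections import Counter

def compress(words, dict_size=4):
    """Dictionary-based compression with a bit vector marking compressed words."""
    dictionary = [w for w, _ in Counter(words).most_common(dict_size)]
    index = {w: i for i, w in enumerate(dictionary)}
    bit_vector, stream = [], []
    for w in words:
        if w in index:
            bit_vector.append(1)          # 1: stream entry is a short dictionary index
            stream.append(index[w])
        else:
            bit_vector.append(0)          # 0: stream entry is the uncompressed word
            stream.append(w)
    return dictionary, bit_vector, stream

def decompress(dictionary, bit_vector, stream):
    return [dictionary[s] if b else s for b, s in zip(bit_vector, stream)]

# toy "program": hex-encoded variable-length instruction words
program = ["0x4e71", "0x4e71", "0x2f00", "0x4e71", "0x61ff00001234", "0x2f00", "0x4e75"]
dictionary, bits, stream = compress(program, dict_size=2)
assert decompress(dictionary, bits, stream) == program
print("dictionary:", dictionary)
print("bit vector:", bits)
```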
despite the upsurge of interest in the aspect oriented programming aop paradigm there remain few results on test data generation techniques for aop furthermore there is no work on search based optimization for test data generation an approach that has been shown to be successful in other programming paradigms in this paper we introduce search based optimization approach to automated test data generation for structural coverage of aop systems we present the results of an empirical study that demonstrates the effectiveness of the approach we also introduce domain reduction approach for aop testing and show that this approach not only reduces test effort but also increases test effectiveness this finding is significant because similar studies for non aop programming paradigms show no such improvement in effectiveness merely reduction in effort we also present the results of an empirical study of the reduction in test effort achieved by focusing specifically on branches inside aspects
the process of populating an ontology based system with high quality and up to date instance information can be both time consuming and prone to error in many domains however one possible solution to this problem is to automate the instantiation process for given ontology by searching mining the web for the required instance information the primary challenges facing such system include efficiently locating web pages that most probably contain the desired instance information extracting the instance information from page and clustering documents that describe the same instance in order to exploit data redundancy on the web and thus improve the overall quality of the harvested data in addition these steps should require as little seed knowledge as possible in this paper the allright ontology instantiation system is presented which supports the full instantiation life cycle and addresses the above mentioned challenges through combination of new and existing techniques in particular the system was designed to deal with situations where the instance information is given in tabular form the main innovative pillars of the system are new high recall focused crawling technique xcrawl novel table recognition algorithm innovative methods for document clustering and instance name recognition as well as techniques for fact extraction instance generation and query based fact validation the successful evaluation of the system in different real world application scenarios shows that the ontology instantiation process can be successfully automated using only very limited amount of seed knowledge
we present new framework for processing point sampled objects using spectral methods by establishing concept of local frequencies on geometry we introduce versatile spectral representation that provides rich repository of signal processing algorithms based on an adaptive tesselation of the model surface into regularly resampled displacement fields our method computes set of windowed fourier transforms creating spectral decomposition of the model direct analysis and manipulation of the spectral coefficients supports effective filtering resampling power spectrum analysis and local error control our algorithms operate directly on points and normals requiring no vertex connectivity information they are computationally efficient robust and amenable to hardware acceleration we demonstrate the performance of our framework on selection of example applications including noise removal enhancement restoration and subsampling
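A numpy sketch of the spectral step applied to one regularly resampled displacement field: transform with a 2D FFT, attenuate high frequencies with a Gaussian transfer function, and invert, which is the basic low-pass (noise removal) operation; the full framework builds such fields adaptively from the point set and uses windowed transforms with local error control. The field, noise level, and cutoff below are synthetic.

```python
import numpy as np

def spectral_lowpass(displacements, cutoff=0.15):
    """Smooth a regularly sampled displacement field in the frequency domain."""
    spectrum = np.fft.fft2(displacements)
    fy = np.fft.fftfreq(displacements.shape[0])[:, None]
    fx = np.fft.fftfreq(displacements.shape[1])[None, :]
    # Gaussian low-pass transfer function over the normalized frequency radius
    transfer = np.exp(-(fx ** 2 + fy ** 2) / (2 * cutoff ** 2))
    return np.real(np.fft.ifft2(spectrum * transfer))

# synthetic patch: a smooth bump plus high-frequency noise
n = 64
y, x = np.mgrid[0:n, 0:n] / n
field = np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)
noisy = field + 0.2 * np.random.default_rng(0).normal(size=field.shape)

smoothed = spectral_lowpass(noisy, cutoff=0.1)
print("rms error noisy   :", round(float(np.sqrt(np.mean((noisy - field) ** 2))), 3))
print("rms error smoothed:", round(float(np.sqrt(np.mean((smoothed - field) ** 2))), 3))
```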
the scaling of serial algorithms cannot rely on the improvement of cpus anymore the performance of classical support vector machine svm implementations has reached its limit and the arrival of the multi core era requires these algorithms to adapt to new parallel scenario graphics processing units gpu have arisen as high performance platforms to implement data parallel algorithms in this paper it is described how naive implementation of multiclass classifier based on svms can map its inherent degrees of parallelism to the gpu programming model and efficiently use its computational throughput empirical results show that the training and classification time of the algorithm can be reduced an order of magnitude compared to classical multiclass solver libsvm while guaranteeing the same accuracy
content based publish subscribe cps is powerful paradigm providing loosely coupled event driven messaging services although the general cps model is well known many features remain implementation specific because of different application requirements many of these requirements can be captured in policies that separate service semantics from system mechanisms but no such policy framework currently exists in the cps context in this paper we propose novel policy model and framework for cps systems that benefits from the scalability and expressiveness of existing cps matching algorithms in particular we provide reference implementation and several evaluation scenarios that demonstrate how our approach easily and dynamically enables features such as notification semantics meta events security zoning and cps firewalls
automated analysis of human affective behavior has attracted increasing attention from researchers in psychology computer science linguistics neuroscience and related disciplines promising approaches have been reported including automatic methods for facial and vocal affect recognition however the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions despite the fact that deliberate behavior differs in visual and audio expressions from spontaneously occurring behavior recently efforts to develop algorithms that can process naturally occurring human affective behavior have emerged this paper surveys these efforts we first discuss human emotion perception from psychological perspective next we examine the available approaches to solving the problem of machine understanding of human affective behavior occurring in real world settings we finally outline some scientific and engineering challenges for advancing human affect sensing technology
increasing application complexity and improvements in process technology have today enabled chip multiprocessors cmps with tens to hundreds of cores on chip networks on chip nocs have emerged as scalable communication fabrics that can support high bandwidths for these massively parallel systems however traditional electrical noc implementations still need to overcome the challenges of high data transfer latencies and large power consumption on chip photonic interconnects have recently been proposed as an alternative to address these challenges with high performance per watt characteristics for intra chip communication in this paper we explore using photonic interconnects on chip to enhance traditional electrical nocs our proposed hybrid photonic noc utilizes photonic ring waveguide to enhance traditional electrical mesh noc experimental results indicate strong motivation for considering the proposed hybrid photonic noc for future cmps as much as reduction in power consumption and improved throughput and access latencies compared to traditional electrical mesh and torus noc architectures
despite attractive theoretical properties vickrey auctions are seldom used due to the risk of information revelation and fear of cheating cvas cryptographic vickrey auctions have been proposed to protect bidders privacy or to prevent cheating by the bid taker this paper focuses on incentive issues for certain cvas first it defines the cvas of interest and identifies ideal goals for this class of cvas one of the criteria identifies an incentive problem that is new to the literature on cvas the disincentive of bidders to complete the protocol once they have learned that they lost the auction any auction protocol that requires losing bidders to do additional work after learning they have lost the auction must provide the losers with proper incentives to follow the protocol second this paper shows that for class of cvas some losers must continue to participate even though they know they have lost finally it describes two new cva protocols that solve the protocol completion incentive problem both protocols use bidder bidder comparisons based on modified yao’s millionaires protocol the first protocol performs bidder bidder comparisons while the second protocol performs comparisons
we present rhodium new language for writing compiler optimizations that can be automatically proved sound unlike our previous work on cobalt rhodium expresses optimizations using explicit dataflow facts manipulated by local propagation and transformation rules this new style allows rhodium optimizations to be mutually recursively defined to be automatically composed to be interpreted in both flow sensitive and insensitive ways and to be applied interprocedurally given separate context sensitivity strategy all while retaining soundness rhodium also supports infinite analysis domains while guaranteeing termination of analysis we have implemented soundness checker for rhodium and have specified and automatically proven the soundness of all of cobalt’s optimizations plus variety of optimizations not expressible in cobalt including andersen’s points to analysis arithmetic invariant detection loop induction variable strength reduction and redundant array load elimination
expiration based consistency management is widely used to keep replicated contents up to date in the internet the effectiveness of replication can be characterized by the communication costs of client accesses and consistency management both costs depend on the locations of the replicas this paper investigates the problem of placing replicas in network where replica consistency is managed by the expiration based scheme our objective is to minimize the total cost of client accesses and consistency management by analyzing the communication cost of recursive validations for cascaded replicas we prove that in the optimal placement scheme the nodes not assigned replicas induce connected subgraph that includes the origin server our results are generic in that they apply to any request arrival patterns based on the analysis an algorithm is proposed to compute the optimal placement of the replicas whose running time is proportional to the sum of the number of descendants over all nodes in the routing tree
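The following small sketch only checks the structural property stated in the abstract, namely that the replica-free nodes plus the origin induce a connected subtree; the data structures (`children` as a parent-to-children map of the routing tree) are assumptions, and this is not the placement algorithm itself.

```python
def replica_free_part_is_connected(children, origin, replicas):
    # Walk down from the origin, descending only through replica-free nodes.
    # The proved property holds iff this walk reaches every replica-free node.
    # Assumes the origin carries the original copy and is not in 'replicas'.
    seen, stack = set(), [origin]
    while stack:
        node = stack.pop()
        if node in replicas:
            continue                              # stop descending at a replica
        seen.add(node)
        stack.extend(children.get(node, []))
    all_nodes = set(children) | {c for cs in children.values() for c in cs}
    return (all_nodes - set(replicas)) <= seen
```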
most specification languages express only qualitative constraints however among two implementations that satisfy given specification one may be preferred to another for example if specification asks that every request is followed by response one may prefer an implementation that generates responses quickly but does not generate unnecessary responses we use quantitative properties to measure the goodness of an implementation using games with corresponding quantitative objectives we can synthesize optimal implementations which are preferred among the set of possible implementations that satisfy given specification in particular we show how automata with lexicographic mean payoff conditions can be used to express many interesting quantitative properties for reactive systems in this framework the synthesis of optimal implementations requires the solution of lexicographic mean payoff games for safety requirements and the solution of games with both lexicographic mean payoff and parity objectives for liveness requirements we present algorithms for solving both kinds of novel graph games
it has been shown that storing documents having similar structures together can reduce the fragmentation problem and improve query efficiency unlike the flat text document the web document has no standard vectorial representation which is required in most existing classification algorithms in this paper we propose vectorization method for xml documents by using multidimensional scaling mds so that web documents can be fed into an existing classification algorithm the classical mds embeds data points into a euclidean space if the similarity matrix constructed by the data points is semidefinite the semidefiniteness condition however may not hold due to the inference technique used in practice we will find semi definite matrix which is the closest to the distance matrix in the euclidean space based on recent developments on strongly semismooth matrix valued functions we solve the nearest semi definite matrix problem with newton type method experimental studies show that the classification accuracy can be improved
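A minimal sketch of the overall idea, under simplifying assumptions: classical MDS via double-centering, with the indefinite Gram matrix projected to the nearest positive semidefinite matrix by eigenvalue clipping. The clipping step is a simple stand-in for the Newton-type solver the abstract describes, not that method itself.

```python
import numpy as np

def classical_mds(D, dim):
    # D: symmetric pairwise distance matrix (n x n); returns n x dim coordinates.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix, may be indefinite
    w, V = np.linalg.eigh(B)
    w = np.clip(w, 0.0, None)                  # nearest PSD matrix in Frobenius norm
    idx = np.argsort(w)[::-1][:dim]            # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(w[idx])
```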
an algebra consists of set of objects and set of operators that act on those objects we treat shader programs as first class objects and define two operators connection and combination connection is functional composition the outputs of one shader are fed into the inputs of another combination concatenates the input channels output channels and computations of two shaders similar operators can be used to manipulate streams and apply computational kernels expressed as shaders to streams connecting shader program to stream applies that program to all elements of the stream combining streams concatenates the record definitions of those streams in conjunction with an optimizing compiler these operators can manipulate shader programs in many useful ways including specialization without modifying the original source code we demonstrate these operators in sh metaprogramming shading language embedded in
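To make the two operators concrete, here is an illustrative sketch (not the Sh API) in which shaders are modelled as plain functions from a tuple of input channels to a tuple of output channels; `connect` is functional composition and `combine` concatenates inputs and outputs.

```python
# Illustrative shader-algebra sketch: shaders as functions over channel tuples.
def connect(f, g):
    # Feed the outputs of f into the inputs of g (functional composition).
    return lambda *inputs: g(*f(*inputs))

def combine(f, g, n_f_inputs):
    # Concatenate the input channels, output channels and computations of f and g.
    def combined(*inputs):
        return f(*inputs[:n_f_inputs]) + g(*inputs[n_f_inputs:])
    return combined

# toy usage: a 1-D "light" shader connected to a "surface" shader
light = lambda normal, ldir: (max(0.0, normal * ldir),)
surface = lambda intensity: (intensity * 0.8,)
shaded = connect(light, surface)
print(shaded(1.0, 0.7))   # -> (0.56,)
```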
group select operation has been defined for relational algebra this operation is found to be useful for efficiently reducing expressions of nonprocedural relational languages that permit natural quantifiers conceptually the operation first partitions relation into blocks of tuples that have the same value for an attribute or attribute concatenation it then extracts each block for which specified number of tuples meet specified condition the quantity of tuples for the operation is specified by means of natural quantifier performance of the group select operation will be poor with conventional file processing making the operation more suitable for use with database machine with an associative memory
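A minimal sketch of the operation as described: partition the relation by an attribute, then keep every block in which at least a quantifier-specified number of tuples satisfy the condition. The dict-of-rows representation is an assumption for illustration.

```python
from collections import defaultdict

def group_select(relation, group_attr, quantifier, condition):
    # relation: list of dicts (tuples); returns the tuples of every qualifying block.
    blocks = defaultdict(list)
    for t in relation:
        blocks[t[group_attr]].append(t)
    result = []
    for block in blocks.values():
        if sum(1 for t in block if condition(t)) >= quantifier:
            result.extend(block)
    return result

# e.g. departments with at least 2 employees earning over 50k (hypothetical schema):
# group_select(emps, 'dept', 2, lambda t: t['salary'] > 50000)
```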
current processor and multiprocessor architectures are almost all based on the von neumann paradigm based on this paradigm one can build general purpose computer using very few transistors eg transistors in the first intel microprocessor in other terms the notion that on chip space is scarce resource is at the root of this paradigm which trades on chip space for program execution time today technology considerably relaxed this space constraint still few research works question this paradigm as the most adequate basis for high performance computers even though the paradigm was not initially designed to scale with technology and space in this article we propose different computing model defining both an architecture and language that is intrinsically designed to exploit space we then investigate the implementation issues of computer based on this model and we provide simulation results for small programs and simplified architecture as first proof of concept through this model we also want to outline that revisiting some of the principles of today’s computing paradigm has the potential of overcoming major limitations of current architectures
proposed performance model for superscalar processors consists of component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions and methods for calculating transient performance penalties due to branch mispredictions instruction cache misses and data cache misses using trace derived data dependence information data and instruction cache miss rates and branch miss prediction rates as inputs the model can arrive at performance estimates for typical superscalar processor that are within of detailed simulation on average and within in the worst case the model also provides insights into the workings of superscalar processors and long term microarchitecture trends such as pipeline depths and issue widths
this work addresses the existing research gap regarding the security of service oriented architectures and their integration in the context of nomadic computing the state of the art of service oriented architectures soas is thoroughly investigated to understand what secure service provision means for different soas and whether an established notion of secure soa existed based on the analysis of existing soas we define set of requirements for securing services among different nomadic computing domains such requirements concern the security of service registration and that of the discovery and delivery phases the surveyed soas are then evaluated in the light of the defined requirements revealing interesting observations about how current soas address security issues the second part of this work addresses the research issue of achieving secure service provision in nomadic computing environment characterized by number of heterogeneous service oriented architectures solution is presented in the form of an architectural model named secure nomadic computing architecture the model relies on novel three phase discovery delivery protocol which allows the enforcement of number of security requirements identified as result of the first part of the work finally we present an exemplary implementation of the proposed architectural model developed within the context of distributed management information system for the discovery of digital educational content
packet classification is an enabling technology to support advanced internet services it is still challenge for software solution to achieve gbps line rate classification speed this paper presents classification algorithm that can be efficiently implemented on multi core architecture with or without cache the algorithm embraces the holistic notion of exploiting application characteristics considering the capabilities of the cpu and the memory hierarchy and performing appropriate data partitioning the classification algorithm adopts two stages searching on reduction tree and searching on list of ranges this decision is made based on classification heuristic the size of the range list is limited after the first stage search optimizations are then designed to speed up the two stage execution to exploit the speed gap between the cpu and external memory between internal memory cache and external memory an interpreter is used to trade the cpu idle cycles with demanding memory access requirements by applying the cisc style of instruction encoding to compress the range expressions it not only significantly reduces the total memory requirement but also makes effective use of the internal memory cache bandwidth we show that compressing data structures is an effective optimization across the multi core architectures we implement this algorithm on both intel ixp network processor and core duo architecture and experiment with the classification benchmark classbench by incorporating architecture awareness in algorithm design and taking into account the memory hierarchy data partitioning and latency hiding in algorithm implementation the resulting algorithm shows good scalability on intel ixp by effectively using the cache system the algorithm also runs faster than the previous fastest rfc on the core duo architecture
this paper presents novel data driven system for expressive facial animation synthesis and editing given novel phoneme aligned speech input and its emotion modifiers specifications this system automatically generates expressive facial animation by concatenating captured motion data while animators establish constraints and goals constrained dynamic programming algorithm is used to search for best matched captured motion nodes by minimizing cost function users optionally specify hard constraints motion node constraints for expressing phoneme utterances and soft constraints emotion modifiers to guide the search process users can also edit the processed facial motion node database by inserting and deleting motion nodes via novel phoneme isomap interface novel facial animation synthesis experiments and objective trajectory comparisons between synthesized facial motion and captured motion demonstrate that this system is effective for producing realistic expressive facial animations
this paper is concerned with particular family of regular connected graphs called chordal rings chordal rings are variation of ring networks by adding two extra links or chords at each vertex in ring network the reliability and fault tolerance of the network are enhanced two spanning trees on graph are said to be independent if they are rooted at the same vertex and for every other vertex the two paths from the root to that vertex one path in each tree are internally disjoint set of spanning trees on given graph is said to be independent if they are pairwise independent iwasaki et al proposed linear time algorithm for finding four independent spanning trees on chordal ring in this paper we give new linear time algorithm to generate four independent spanning trees with reduced height in each tree moreover complete analysis of our improvements on the heights of independent spanning trees is also provided
gestures are natural means of communication between humans and also natural modality for human computer interaction automatic recognition of gestures using computer vision is an important task in many real world applications such as sign language recognition computer games control virtual reality intelligent homes and assistive environments in order for gesture recognition system to be robust and deployable in non laboratory settings the system needs to be able to operate in complex scenes with complicated backgrounds and multiple moving and skin colored objects in this paper we propose an approach for improving gesture recognition performance in such complex environments the key idea is to integrate face detection module into the gesture recognition system and use the face location and size to make gesture recognition invariant to scale and translation our experiments demonstrate the significant advantages of the proposed method over alternative computer vision methods for gesture recognition
services play an increasingly important role in software applications today service oriented computing soc increases the speed of system development through loose coupling between the system components the discovery of services and consolidating multiple heterogeneous services however the present conventional approaches to engineering soc systems are not able to address the complexities of open and dynamic environments such as those in an extended virtual enterprise or in interorganisation workflows this is the first survey paper for agent oriented software engineering aose to be applied to soc in this paper we have also identified the critical challenges for service oriented software engineering sose and how agent based techniques can be applied to address these challenges the paper surveys and evaluates number of models and methodologies that attempt to tie in two domains of software engineering namely agent oriented analysis and design and service oriented analysis and design soad
the separation of concerns has been core idiom of software engineering for decades in general software can be decomposed properly only according to single concern other concerns crosscut the prevailing one this problem is well known as the tyranny of the dominant decomposition similarly at the programming level the choice of representation drives the implementation of the algorithms this article explores an alternative approach with no dominant representation instead each algorithm is developed in its natural representation and representation is converted into another one only when it is required to support this approach we designed laziness framework for java that performs partial conversions and dynamic optimizations while preserving the execution soundness performance evaluations over graph theory examples demonstrates this approach provides practicable alternative to naive one
objectives healthcare organizations must adopt measures to uphold their patients right to anonymity when sharing sensitive records such as dna sequences to publicly accessible databanks this is often achieved by suppressing patient identifiable information however such practice is insufficient because the same organizations may disclose identified patient information devoid of the sensitive information for other purposes and patients organization visit patterns or trails can re identify records to the identities from which they were derived there exist various algorithms that healthcare organizations can apply to ascertain when patient’s record is susceptible to trail re identification but they require organizations to exchange information regarding the identities of their patients prior to data protection certification in this paper we introduce an algorithmic approach to formally thwart trail re identification in secure setting methods and materials we present framework that allows data holders to securely collaborate through third party in doing so healthcare organizations keep all sensitive information in an encrypted state until the third party certifies that the data to be disclosed satisfies formal data protection model the model adopted for this work is an extended form of unlinkability protection model that until this work was applied in non secure setting only given the framework and protection model we develop an algorithm to generate data that satisfies the protection model in doing so we enable healthcare organizations to prevent trail re identification without revealing identified information results theoretically we prove that the proposed data protection model does not leak information even in the context of an organization’s prior knowledge empirically we use real world hospital discharge records to demonstrate that while the secure protocol induces additional suppression of patient information in comparison to an existing non secure approach the quantity of data disclosed by the secure protocol remains substantial for instance in population of over sickle cell anemia patients the non secure protocol discloses of dna records whereas the secure protocol permits the disclosure of conclusions our results demonstrate healthcare organizations can collaborate to disclose significant quantities of personal biomedical data without violating their anonymity in the process
spatio temporal databases store information about the positions of individual objects over time however in many applications such as traffic supervision or mobile communication systems only summarized data like the number of cars in an area for specific period or phone calls serviced by cell each day is required although this information can be obtained from operational databases its computation is expensive rendering online processing inapplicable in this paper we present specialized methods which integrate spatio temporal indexing with pre aggregation the methods support dynamic spatio temporal dimensions for the efficient processing of historical aggregate queries without priori knowledge of grouping hierarchies the superiority of the proposed techniques over existing methods is demonstrated through comprehensive probabilistic analysis and an extensive experimental evaluation
in this paper we design language and runtime support for isolation only multithreaded transactions called tasks tasks allow isolation to be declared instead of having to be encoded using the low level synchronization constructs the key concept of our design is the use of type system to support rollback free and safe runtime execution of tasks we present first order type system which can verify information for the concurrency controller we use an operational semantics to formalize and prove the type soundness result and an isolation property of tasks the semantics uses specialized concurrency control algorithm that is based on access versioning
one of the challenging tasks in the deployment of dense wireless networks like sensor networks is in devising routing scheme for node to node communication important consideration includes scalability routing complexity the length of the communication paths and the load sharing of the routes in this paper we show that compact and expressive abstraction of network connectivity by the medial axis enables efficient and localized routing we propose map medial axis based naming and routing protocol that does not require locations makes routing decisions locally and achieves good load balancing in its preprocessing phase map constructs the medial axis of the sensor field defined as the set of nodes with at least two closest boundary nodes the medial axis of the network captures both the complex geometry and non trivial topology of the sensor field it can be represented compactly by graph whose size is comparable with the complexity of the geometric features eg the number of holes each node is then given name related to its position with respect to the medial axis the routing scheme is derived through local decisions based on the names of the source and destination nodes and guarantees delivery with reasonable and natural routes we show by both theoretical analysis and simulations that our medial axis based geometric routing scheme is scalable produces short routes achieves excellent load balancing and is very robust to variations in the network model
in this work method for detecting distance based outliers in data streams is presented we deal with the sliding window model where outlier queries are performed in order to detect anomalies in the current window two algorithms are presented the first one exactly answers outlier queries but has larger space requirements the second algorithm is directly derived from the exact one has limited memory requirements and returns an approximate answer based on accurate estimations with statistical guarantee several experiments have been accomplished confirming the effectiveness of the proposed approach and the high quality of approximate solutions
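For orientation, the sketch below implements only the naive exact variant of sliding-window distance-based outlier detection for one-dimensional values: a point is flagged when fewer than k other points in the current window lie within distance R. The approximate, memory-bounded algorithm with statistical guarantees described in the abstract is not reproduced here.

```python
from collections import deque

def sliding_window_outliers(stream, window_size, R, k):
    # Yields, for each arrival, the current window and its distance-based outliers.
    window = deque(maxlen=window_size)
    for x in stream:
        window.append(x)
        pts = list(window)
        outliers = [p for i, p in enumerate(pts)
                    if sum(1 for j, q in enumerate(pts)
                           if j != i and abs(q - p) <= R) < k]
        yield pts, outliers
```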
the efficient handling of range queries in peer to peer systems is still an open issue several approaches exist but their lookup schemes are either too expensive space filling curves or their queries lack expressiveness topology driven data distribution we present two structured overlay networks that support arbitrary range queries the first one named chord# has been derived from chord by substituting chord’s hashing function by key order preserving function it has logarithmic routing performance and it supports range queries which is not possible with chord its pointer update algorithm can be applied to any peer to peer routing protocol with exponentially increasing pointers we present formal proof of the logarithmic routing performance and show empirical results that demonstrate the superiority of chord# over chord in systems with high churn rates we then extend our routing scheme to multiple dimensions resulting in sonar structured overlay network with arbitrary range queries sonar covers multi dimensional data spaces and in contrast to other approaches sonar’s range queries are not restricted to rectangular shapes but may have arbitrary shapes empirical results with data set of two million objects show the logarithmic routing performance in geospatial domain
context aware applications can better meet users needs when sensing agents installed in the environment automatically provide input relevant to the application however this non intrusive context usage may cause privacy concerns since sensitive user data could be leaked to unauthorized parties therefore data privacy protection becomes one of the major issues for context aware applications in this paper in order to provide services based on various levels of privacy concerns we extend the platform for privacy preferences of the w3c and define specification for representing user privacy preferences for context aware applications we also propose privacy infrastructure which could be installed as plug in service for middleware supporting context aware applications this infrastructure enables the middleware to automatically generate privacy policy and the user preference file according to the current context the middleware simply matches these two files to decide whether to proceed with the application we demonstrate the efficacy of this approach through prototype implementation
the growing amount of web based attacks poses severe threat to the security of web applications signature based detection techniques increasingly fail to cope with the variety and complexity of novel attack instances as remedy we introduce protocol aware reverse http proxy tokdoc the token doctor which intercepts requests and decides on per token basis whether token requires automatic healing in particular we propose an intelligent mangling technique which based on the decision of previously trained anomaly detectors replaces suspicious parts in requests by benign data the system has seen in the past evaluation of our system in terms of accuracy is performed on two real world data sets and large variety of recent attacks in comparison to state of the art anomaly detectors tokdoc is not only capable of detecting most attacks but also significantly outperforms the other methods in terms of false positives runtime measurements show that our implementation can be deployed as an inline intrusion prevention system
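The snippet below is a heavily simplified, hypothetical stand-in for the per-token healing idea: remember benign values observed per request parameter during training, and at detection time replace rare or unseen values with the most frequent benign value. The class name, thresholds and dictionary-based request representation are assumptions, not TokDoc's actual design.

```python
from collections import Counter, defaultdict

class TokenHealer:
    def __init__(self, min_count=3):
        self.history = defaultdict(Counter)   # parameter -> Counter of benign values
        self.min_count = min_count

    def train(self, request):                 # request: dict parameter -> value
        for param, value in request.items():
            self.history[param][value] += 1

    def heal(self, request):
        healed = {}
        for param, value in request.items():
            seen = self.history[param]
            if seen.get(value, 0) >= self.min_count:
                healed[param] = value                      # looks benign, keep it
            elif seen:
                healed[param] = seen.most_common(1)[0][0]  # replace with benign data seen before
            else:
                healed[param] = ""                         # unknown parameter: blank it
        return healed
```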
aspect oriented software development aosd is an approach to software development in which aspect oriented techniques are integrated with traditional mainly oo development techniques identifying the appropriate method components for supporting aspect oriented development is facilitated by the use of method engineering approach we demonstrate this approach by using the open process framework opf to identify previous deficiencies in the method fragments stored in the opf repository so that the enhanced opf repository is able to fully support aosd
we develop the first ever fully functional three dimensional guaranteed quality parallel graded delaunay mesh generator first we prove criterion and sufficient condition of delaunay independence of steiner points in three dimensions based on these results we decompose the iteration space of the sequential delaunay refinement algorithm by selecting independent subsets from the set of the candidate steiner points without resorting to rollbacks we use an octree which overlaps the mesh for coarse grained decomposition of the set of candidate steiner points based on their location we partition the worklist containing poor quality tetrahedra into independent lists associated with specific separated leaves of the octree finally we describe an example parallel implementation using publicly available state of the art sequential delaunay library tetgen this work provides case study for the design of abstractions and parallel frameworks for the use of complex labor intensive sequential codes on multicore architectures
this paper aims at identifying some of the key factors in adopting an organization wide software reuse program the factors are derived from practical experience reported by industry professionals through survey involving brazilian small medium and large software organizations some of them produce software with commonality between applications and have mature processes while others successfully achieved reuse through isolated ad hoc efforts the paper compiles the answers from the survey participants showing which factors were more associated with reuse success based on this relationship guide is presented pointing out which factors should be more strongly considered by small medium and large organizations attempting to establish reuse program
we present pressuremove pressure based interaction technique that enables simultaneous control of pressure input and mouse movement simultaneous control of pressure and mouse movement can support tasks that require control of multiple parameters like rotation and translation of an object or pan and zoom we implemented four variations of pressuremove techniques for position and orientation matching task where pressure manipulations mapped to object orientation and mouse movement to object translation the naive technique mapped raw pressure sensor values to the object rotation the rate based technique mapped discrete pressure values to speed of rotation and hierarchical and hybrid techniques that use two step approach to control orientation using pressure in user study that compared the four techniques with the default mouse only technique we found that rate based pressuremove was the fastest technique with the least number of crossings and as preferred as the default mouse in terms of user preference we discuss the implications of our user study and present several design guidelines
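As a rough illustration of the rate-based variant described here, the sketch below quantises pressure into a few discrete levels that select a rotation speed while mouse deltas drive translation; all constants and the function name are illustrative assumptions, not values from the study.

```python
def rate_based_update(pressure, mouse_dx, mouse_dy, angle, x, y, dt,
                      max_pressure=1023, levels=5, max_speed_dps=180.0):
    # Quantise raw pressure into 'levels' bins; higher bins rotate faster (deg/s).
    level = min(levels - 1, int(pressure / (max_pressure / levels)))
    angle += (level / (levels - 1)) * max_speed_dps * dt
    # Mouse movement maps directly to object translation.
    return angle, x + mouse_dx, y + mouse_dy
```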
many software maintenance and enhancement tasks require considerable developer knowledge and experience in order to be efficiently completed on today’s large and complex systems preserving explicit forms of documentation that are accessible by large development teams with regular developer turnover is difficult problem this problem can result in temporal and spatial miscommunication an easily lost cognitive work context and largely unmaintainable software the research described in this paper hypothesizes that the problem may be addressed by semi structured goal question evidence methodology for program comprehension that has three primary aspects first redocumentation system should function in parallel with the development process by integrating into the user’s usual tool environment and development workflow second knowledge should be dispersed throughout development team as soon as it is discovered so that comprehension is not merely confined to the mind of one individual finally the developer should be made peripherally aware of their work objectives and the surrounding collaborative environment reducing time spent on task reorientation context reconstruction and duplicative work we present an observational study conducted on pair program comprehension and use the analyzed results to drive the formation of tool requirements for collaborative comprehension tool prototype tool has been developed showing promise for the methodology
virtual channels are an appealing flow control technique for on chip interconnection networks nocs in that they can potentially avoid deadlock and improve link utilization and network throughput however their use in the resource constrained multi processor system on chip mpsoc domain is still controversial due to their significant overhead in terms of area power and cycle time degradation this paper proposes simple yet efficient approach to vc implementation which results in more area and power saving solutions than conventional design techniques while these latter replicate only buffering resources for each physical link we replicate the entire switch and prove that our solution is counter intuitively more area power efficient while potentially operating at higher speeds this result builds on well known principle of logic synthesis for combinational circuits the area performance trade off when inferring logic function into gate level netlist and proves that when designer is aware of this novel architecture design techniques can be conceived
the history of histograms is long and rich full of detailed information in every step it includes the course of histograms in different scientific fields the successes and failures of histograms in approximating and compressing information their adoption by industry and solutions that have been given on great variety of histogram related problems in this paper and in the same spirit of the histogram techniques themselves we compress their entire history including their future history as currently anticipated in the given fixed space budget mostly recording details for the periods events and results with the highest personally biased interest in limited set of experiments the semantic distance between the compressed and the full form of the history was found relatively small
we study how to efficiently diffuse updates to large distributed system of data replicas some of which may exhibit arbitrary byzantine failures we assume that strictly fewer than replicas fail and that each update is initially received by at least correct replicas the goal is to diffuse each update to all correct replicas while ensuring that correct replicas accept no updates generated spuriously by faulty replicas to achieve this each correct replica further propagates an update only after receiving it from at least others in this way no correct replica will ever propagate or accept an update that only faulty replicas introduce since it will receive that update from only the faulty replicas we provide the first analysis of diffusion protocols for such environments this analysis is fundamentally different from known analyses for the benign case due to our treatment of fully byzantine failures which among other things precludes the use of digital signatures for authenticating forwarded updates we propose two measures that characterize the efficiency of diffusion algorithms delay and fan in and prove general lower bounds with regards to these measures we then provide family of diffusion algorithms that have nearly optimal delay fan in product
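The abstract elides the concrete thresholds, so the sketch below assumes the standard choice of accepting and forwarding an update only after it has arrived from f + 1 distinct senders (with at most f faulty replicas), which guarantees at least one sender is correct; it is a toy illustration of the echo rule, not the paper's protocol family.

```python
from collections import defaultdict

class DiffusionReplica:
    def __init__(self, replica_id, f, send):
        # send(update, sender_id) is the transport callback supplied by the host.
        self.id, self.f, self.send = replica_id, f, send
        self.senders = defaultdict(set)     # update -> set of distinct senders
        self.accepted = set()

    def receive(self, update, sender_id):
        self.senders[update].add(sender_id)
        # Accept and propagate only once f + 1 distinct replicas have echoed it,
        # so faulty replicas alone can never make a correct replica accept an update.
        if update not in self.accepted and len(self.senders[update]) >= self.f + 1:
            self.accepted.add(update)
            self.send(update, self.id)
```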
regression test prioritization techniques re order the execution of test suite in an attempt to ensure that defects are revealed earlier in the test execution phase in prior work test suites were prioritized with respect to their ability to satisfy control flow based and mutation based test adequacy criteria in this paper we propose an approach to regression test prioritization that leverages the all dus test adequacy criterion that focuses on the definition and use of variables within the program under test our prioritization scheme is motivated by empirical studies that have shown that tests fulfilling the all dus test adequacy criteria are more likely to reveal defects than those that meet the control flow based criteria ii there is an unclear relationship between all dus and mutation based criteria and iii mutation based testing is significantly more expensive than testing that relies upon all dus in support of our prioritization technique we provide formal statement of the algorithms and equations that we use to instrument the program under test perform test suite coverage monitoring and calculate test adequacy furthermore we examine the architecture of tool that implements our novel prioritization scheme and facilitates experimentation the use of this tool in preliminary experimental evaluation indicates that for three case study applications our prioritization can be performed with acceptable time and space overheads finally these experiments also demonstrate that the prioritized test suite can have an improved potential to identify defects earlier during the process of test execution
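A small sketch of one common way to turn per-test def-use coverage into an ordering: greedily pick the test that covers the most not-yet-covered def-use associations. This "additional coverage" heuristic is an assumption for illustration; the paper's own equations and instrumentation are not reproduced here.

```python
def prioritize_by_dus(coverage):
    # coverage: dict test_name -> set of def-use associations the test exercises.
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

# e.g. prioritize_by_dus({'t1': {('x', 3, 7)}, 't2': {('x', 3, 7), ('y', 4, 9)}})
```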
in many research fields such as psychology linguistics cognitive science and artificial intelligence computing semantic similarity between words is an important issue in this paper new semantic similarity metric that exploits some notions of the feature based theory of similarity and translates it into the information theoretic domain which leverages the notion of information content ic is presented in particular the proposed metric exploits the notion of intrinsic ic which quantifies ic values by scrutinizing how concepts are arranged in an ontological structure in order to evaluate this metric an on line experiment asking the community of researchers to rank list of word pairs has been conducted the experiment’s web setup allowed to collect similarity ratings and to differentiate native and non native english speakers such large and diverse dataset enables to confidently evaluate similarity metrics by correlating them with human assessments experimental evaluations using wordnet indicate that the proposed metric coupled with the notion of intrinsic ic yields results above the state of the art moreover the intrinsic ic formulation also improves the accuracy of other ic based metrics in order to investigate the generality of both the intrinsic ic formulation and proposed similarity metric further evaluation using the mesh biomedical ontology has been performed even in this case significant results were obtained the proposed metric and several others have been implemented in the java wordnet similarity library
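Since the abstract does not spell out the formulas, the sketch below uses a widely cited intrinsic-IC formulation (IC derived from a concept's hyponym count and the ontology size) together with one standard IC-based similarity as an example of how such values are combined; treat both formulas as assumptions rather than the paper's exact metric.

```python
import math

def intrinsic_ic(n_hyponyms, max_nodes):
    # Intrinsic information content computed from the ontology structure alone:
    # concepts with many hyponyms are less informative. Requires max_nodes > 1.
    return 1.0 - math.log(n_hyponyms + 1) / math.log(max_nodes)

def lin_similarity(ic_c1, ic_c2, ic_lcs):
    # One standard IC-based similarity, using the IC of the least common subsumer.
    return 2.0 * ic_lcs / (ic_c1 + ic_c2) if (ic_c1 + ic_c2) > 0 else 0.0
```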
the multidim model is conceptual multidimensional model for data warehouse and olap applications these applications require the presence of time dimension to track changes in measure values however the time dimension cannot be used to represent changes in other dimensions in this paper we introduce temporal extension of the multidim model this extension is based on research realized in temporal databases we allow different temporality types valid time transaction time and lifespan which are obtained from source systems and loading time which is generated in the data warehouse our model provides temporal support for levels attributes hierarchies and measures for hierarchies we discuss different cases depending on whether the changes in levels or in the relationships between them must be kept for measures we give different scenarios that show the usefulness of the different temporality types further since measures can be aggregated before being inserted into data warehouses we discuss the issues related to different time granularities between source systems and data warehouses we finish the paper presenting transformation of the multidim model into the entity relationship and the object relational models
program slicing is fundamental operation for many software engineering tools currently the most efficient algorithm for interprocedural slicing is one that uses program representation called the system dependence graph this paper defines new algorithm for slicing with system dependence graphs that is asymptotically faster than the previous one preliminary experimental study indicates that the new algorithm is also significantly faster in practice providing roughly fold speedup on examples of to lines
for more than years significant research effort was concentrated on globally asynchronous locally synchronous gals design methodologies but despite several successful implementations gals has had little impact on industry products this article presents different gals techniques and architectures the authors also analyze the actual challenges and problems for wider adoption of the currently proposed gals methods their analysis shows that significant improvement can be achieved in terms of system integration and emi reduction on the other hand for power savings only marginal improvements to the existing techniques can be expected additionally introduction of gals approach leads to relatively small area increases and in some cases even causes certain performance losses the authors present major examples of gals implementations finally they outline some directions for future development of gals techniques and their design flow it is quite clear that the gals design and test flow must be improved and more automated furthermore the attendant performance degradations must be limited for example high data throughput must be ensured through very low hardware overhead for the gals circuitry
many systems have been introduced to detect software intrusions by comparing the outputs and behavior of diverse replicas when they are processing the same potentially malicious input when these replicas are constructed using off the shelf software products it is assumed that they are diverse and not compromised simultaneously under the same attack in this paper we analyze vulnerabilities published in to evaluate the extent to which this assumption is valid we focus on vulnerabilities in application software and show that the majority of these software products including those providing the same service and therefore multiple software substitutes can be used in replicated system to detect intrusions and those that run on multiple operating systems and therefore the same software can be used in replicated system with different operating systems to detect intrusions either do not have the same vulnerability or cannot be compromised with the same exploit we also find evidence that indicates the use of diversity in increasing attack tolerance for other software these results show that systems utilizing off the shelf software products to introduce diversity are effective in detecting intrusions
wireless sensor network wsn applications require redundant sensors to guarantee fault tolerance however the same degree of redundancy is not necessary for multi hop communication in this paper we present new scheduling method called virtual backbone scheduling vbs vbs employs heterogeneous scheduling where backbone nodes work with duty cycling to preserve network connectivity and nonbackbone nodes turn off radios to save energy we formulate maximum lifetime backbone scheduling mlbs problem to maximize the network lifetime using this scheduling model because the mlbs problem is np hard two approximation solutions based on the schedule transition graph stg and virtual scheduling graph vsg are proposed we also present an iterative local replacement ilr scheme as a distributed implementation of vbs the path stretch problem is analyzed in order to explore the impact of vbs on the network structure we show through simulations that vbs significantly prolongs the network lifetime under extensive conditions
the proliferation of newer agile integrative business information systems ibis environments that use the software agent and the multiagent systems paradigms has created the need for common and well accepted conceptual modeling grammar that can be used to efficiently precisely and unambiguously model agile ibis systems at the conceptual level in this paper we propose conceptual modeling grammar termed agile integration modeling language aiml based on established ontological foundation for the multiagent based integrative business information systems mibis universe the aiml grammar provides adequate and precise constructs and semantics for modeling agile integration among participating work systems in terms of quickly building and dismantling dynamic collaboration relationships among them to respond to fast changing market needs the aiml grammar is defined as formal model using extended bnf and first order logic and is elaborated using running example in the paper the grammar is also evaluated in terms of its syntactic semantic and pragmatic qualities and is found to exhibit high degree of quality on all these three dimensions in particular the pragmatic quality of aiml measured in terms of grammar complexity evaluated using complexity metrics indicates that aiml is much easier to learn and use as compared to the unified modeling language uml for modeling agile integration of work systems in organizations
bag of visual words bow has recently become popular representation to describe video and image content most existing approaches nevertheless neglect inter word relatedness and measure similarity by bin to bin comparison of visual words in histograms in this paper we explore the linguistic and ontological aspects of visual words for video analysis two approaches soft weighting and constraint based earth mover’s distance cemd are proposed to model different aspects of visual word linguistics and proximity in soft weighting visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin to bin histogram comparison in cemd cross bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words in particular bow ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning we demonstrate soft weighting and cemd on two tasks video semantic indexing and near duplicate keyframe retrieval experimental results indicate that soft weighting is superior to other popular weighting schemes such as term frequency tf weighting in large scale video database in addition cemd shows excellent performance compared to cosine similarity in near duplicate retrieval
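The sketch below illustrates the soft-weighting idea: each local descriptor votes for its k nearest visual words with geometrically decaying weights (1/2^i for the i-th nearest word). The decay constant and k are commonly cited choices for this technique and should be treated as assumptions here; the CEMD cross-bin matching is not shown.

```python
import numpy as np

def soft_weighted_histogram(descriptors, vocabulary, k=4):
    # vocabulary: (V, d) array of visual words; descriptors: iterable of (d,) arrays.
    hist = np.zeros(len(vocabulary))
    for d in descriptors:
        dists = np.linalg.norm(vocabulary - d, axis=1)
        for i, w in enumerate(np.argsort(dists)[:k]):
            hist[w] += 1.0 / (2 ** i)          # nearer words receive larger weight
    return hist / max(np.linalg.norm(hist), 1e-12)
```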
the sheer complexity of today’s embedded systems forces designers to start with modeling and simulating system components and their interactions in the very early design stages it is therefore imperative to have good tools for exploring wide range of design choices especially during the early design stages where the design space is at its largest this paper presents an overview of the sesame framework which provides high level modeling and simulation methods and tools for system level performance evaluation and exploration of heterogeneous embedded systems more specifically we describe sesame’s modeling methodology and trajectory it takes designer systematically along the path from selecting candidate architectures using analytical modeling and multiobjective optimization to simulating these candidate architectures with our system level simulation environment this simulation environment subsequently allows for architectural exploration at different levels of abstraction while maintaining high level and architecture independent application specifications we illustrate all these aspects using case study in which we traverse sesame’s exploration trajectory for motion jpeg encoder application
delta abstractions are introduced as mechanism for managing database states during the execution of active database rules delta abstractions build upon the use of object deltas capturing changes to individual objects through system supported collapsible type structure the object delta structure is implemented using object oriented concepts such as encapsulation and inheritance so that all database objects inherit the ability to transparently create and manage delta values delta abstractions provide an additional layer to the database programmer for organizing object deltas according to different language components that induce database changes such as methods and active rules as with object deltas delta abstractions are transparently created and maintained by the active database system we define different types of delta abstractions as views of object deltas and illustrate how the services of delta abstractions can be used to inspect the state of active rule execution an active rule analysis and debugging tool has been implemented to demonstrate the use of object deltas and delta abstractions for dynamic analysis of active rules at runtime
this paper presents novel algorithm that improves the localization of disparity discontinuities of disparity maps obtained by multi baseline stereo rather than associating disparity label to every pixel of disparity map it associates position to every disparity discontinuity this formulation allows us to find an approximate solution to labeling problem with robust smoothing term by minimizing multiple problems thus making possible the use of dynamic programming dynamic programming allows the efficient computation of the visibility of most of the cameras during the minimization the proposed algorithm is not stereo matcher on it own since it requires an initial disparity map nevertheless it is very effective way of improving the border localization of disparity maps obtained from large class of stereo matchers whilst the proposed minimization strategy is particularly suitable for stereo with occlusion it may be used for other labeling problems
the semiconductor and thin film transistor liquid crystal display tft lcd industries are currently two of the most important high tech industries in taiwan and occupy over of the global market share moreover these two industries need huge investments in manufacture production equipments pe and own large production scale of the global market therefore how to increase the processing quality of pe to raise the production capacity has become an important issue the statistical process control technique is usually adopted to monitor the important process parameters in the current fabs furthermore routine check for machine or predictive maintenance policy is generally applied to enhance the stability of process and improve yields however manufacturing system cannot be obtained online quality measurements during the manufacturing process when the abnormal conditions occur it will cause large number of scrapped substrates and the costs will be seriously raised in this research virtual metrology vm system is proposed to overcome those mentioned problems it not only fulfills real time quality measurement of each wafer but also detects the performance degradation of the corresponding machines from the information of manufacturing processes this paper makes four critical contributions the more principal component analysis pca we used the higher the accuracy we obtained by the kernel function approach the more support vector data description svdd we used the higher the accuracy we obtain in novelty detection module our empirical results show that genetic algorithm ga and incremental learning methods increase the training learning of support vector machine svm model and by developing wafer quality prediction model the svm approach obtains better prediction accuracy than the radial basis function neural network rbfn and back propagation neural network bpnn approaches therefore this paper proposes that the artificial intelligent ai approach could be more suitable methodology than traditional statistics for predicting the potential scrapped substrates risk of semiconductor manufacturing companies
there exist wide variety of network design problems that require traffic matrix as input in order to carry out performance evaluation the research community has not had at its disposal any information about how to construct realistic traffic matrices we introduce here the two basic problems that need to be addressed to construct such matrices the first is that of synthetically generating traffic volume levels that obey spatial and temporal patterns as observed in realistic traffic matrices the second is that of assigning set of numbers representing traffic levels to particular node pairs in given topology this paper provides an in depth discussion of the many issues that arise when addressing these problems our approach to the first problem is to extract statistical characteristics for such traffic from real data collected inside two large ip backbones we dispel the myth that uniform distributions can be used to randomly generate numbers for populating traffic matrix instead we show that the lognormal distribution is better for this purpose as it describes well the mean rates of origin destination flows we provide estimates for the mean and variance properties of the traffic matrix flows from our datasets we explain the second problem and discuss the notion of traffic matrix being well matched to topology we provide two initial solutions to this problem one using an ilp formulation that incorporates simple and well formed constraints our second solution is heuristic one that incorporates more challenging constraints coming from carrier practices used to design and evolve topologies
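To make the first step tangible, here is a minimal sketch that draws origin-destination mean rates from a lognormal distribution and arranges them into a matrix; the distribution family follows the abstract, but the parameter values are illustrative, not the estimates reported in the paper, and the assignment-to-topology step is not shown.

```python
import numpy as np

def synthesize_traffic_matrix(n_nodes, mu=2.0, sigma=1.0, seed=0):
    # Draw OD mean rates from a lognormal distribution and zero the diagonal
    # (a node sends no traffic to itself).
    rng = np.random.default_rng(seed)
    tm = rng.lognormal(mean=mu, sigma=sigma, size=(n_nodes, n_nodes))
    np.fill_diagonal(tm, 0.0)
    return tm
```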
the past decade has witnessed proliferation of repositories that store and retrieve continuous media data types eg audio and video objects these repositories are expected to play major role in several emerging applications eg library information systems educational applications entertainment industry etc to support the display of video object the system partitions each object into fixed size blocks all blocks of an object reside permanently on the disk drive when displaying an object the system stages the blocks of the object into memory one at time for immediate display in the presence of multiple displays referencing different objects the bandwidth of the disk drive is multiplexed among requests introducing disk seeks disk seeks reduce the useful utilization of the disk bandwidth and result in lower number of simultaneous displays throughput this paper characterizes the impact of disk seeks on the throughput of the system it describes rebeca as mechanism that maximizes the throughput of the system by minimizing the time attributed to each incurred seek limitation of rebeca is that it increases the latency observed by each request we quantify this throughput vs latency tradeoff of rebeca and develop an efficient technique that computes its configuration parameters to realize the performance requirements desired latency and throughput of an application
the internet and the world wide web are becoming increasingly important in our highly interconnected world this book addresses the topic of querying the data available with regard to its quality in systematic and comprehensive way from database point of view first information quality and information quality measures are systematically introduced before ranking algorithms are developed for selecting web sources for access the second part is devoted to quality driven query answering particularly to query planning methods and algorithms
we describe the work we are conducting on new middleware services for dependable and secure mobile systems this work is based on approaches à la peer to peer in order to circumvent the problems introduced by the lack of infrastructure in self organizing networks of mobile nodes such as manets the mechanisms we propose are based on collaboration between peer mobile devices to provide middleware services such as trust management and critical data storage this short paper gives brief description of the problems we are trying to solve and some hints and ideas towards solution
in most parallel supercomputers submitting job for execution involves specifying how many processors are to be allocated to the job when the job is moldable ie there is choice on how many processors the job uses an application scheduler called sa can significantly improve job performance by automatically selecting how many processors to use since most jobs are moldable this result has great impact to the current state of practice in supercomputer scheduling however the widespread use of sa can change the nature of workload processed by supercomputers when many sas are scheduling jobs on one supercomputer the decision made by one sa affects the state of the system therefore impacting other instances of sa in this case the global behavior of the system comes from the aggregate behavior caused by all sas in particular it is reasonable to expect the competition for resources to become tougher with multiple sas and this tough competition to decrease the performance improvement attained by each sa individually this paper investigates this very issue we found that the increased competition indeed makes it harder for each individual instance of sa to improve job performance nevertheless there are two other aggregate behaviors that override increased competition when the system load is moderate to heavy first as load goes up sa chooses smaller requests which increases efficiency which effectively decreases the offered load which mitigates long wait times second better job packing and fewer jobs in the system make it easier for incoming jobs to fit in the supercomputer schedule thus reducing wait times further as result in moderate to heavy load conditions single instance of sa benefits from the fact that other jobs are also using sa
in this article we showcase an agent mediated bc and bb marketplace this marketplace is part of the social and immersive tourism environment itchy feet we give an overview of the framework that forms the basis of the marketplace and show how it is used to create bc bb and virtual organizations that are visualized in virtual world this interface provides users with an intuitive and easy way to interact with humans and software agents by means of virtual world the business logic is realized by autonomous software agents offering services to customers the marketplace is regulated by electronic institutions to ensure that all participants adhere to the rules of the market the article is concluded with detailed discussion on bridging the gaps between multi agent systems and virtual worlds and the preliminary results of conducted usability study of itchy feet
consider an arbitrary distributed network in which large numbers of objects are continuously being created replicated and destroyed basic problem arising in such an environment is that of organizing data tracking scheme for locating object copies in this paper we present new tracking scheme for locating nearby copies of replicated objects in arbitrary distributed environments our tracking scheme supports efficient accesses to data objects while keeping the local memory overhead low in particular our tracking scheme achieves an expected polylog approximation in the cost of any access operation for an arbitrary network the memory overhead incurred by our scheme is polylog times the maximum number of objects stored at any node with high probability we also show that our tracking scheme adapts well to dynamic changes in the network
associative classification mining is promising approach in data mining that utilizes the association rule discovery techniques to construct classification systems also known as associative classifiers in the last few years number of associative classification algorithms have been proposed ie cpar cmar mcar mmac and others these algorithms employ several different rule discovery rule ranking rule pruning rule prediction and rule evaluation methods this paper focuses on surveying and comparing the state of the art associative classification techniques with regards to the above criteria finally future directions in associative classification such as incremental learning and mining low quality data sets are also highlighted in this paper
denial of service resilience is an important practical consideration for key agreement protocols in any hostile environment such as the internet there are well known models that consider the security of key agreement protocols but denial of service resilience is not considered as part of these models many protocols have been argued to be denial of service resilient only to be subsequently broken or shown ineffective in this work we propose formal definition of denial of service resilience model for secure authenticated key agreement and show how security and denial of service resilience can be considered in common framework with particular focus on client puzzles the model accommodates variety of techniques for achieving denial of service resilience and we describe one such technique by exhibiting denial of service resilient secure authenticated key agreement protocol our approach addresses the correct integration of denial of service countermeasures with the key agreement protocol to prevent hijacking attacks that would otherwise render the countermeasures irrelevant
this paper presents systematic approach to the problem of photorealistic model acquisition from the combination of range and image sensing the input is sequence of unregistered range scans of the scene and sequence of unregistered photographs of the same scene the output is true texture mapped geometric model of the scene we believe that the developed modules are of vital importance for flexible photorealistic model acquisition system segmentation algorithms simplify the dense datasets and provide stable features of interest which can be used for registration purposes solid modeling provides geometrically correct models finally the automated range to image registration algorithm can increase the flexibility of the system by decoupling the slow geometry recovery process from the image acquisition process the camera does not have to be precalibrated and rigidly attached to the range sensor the system is comprehensive in that it addresses all phases of the modeling problem with particular emphasis on automating the entire process and minimizing manual interaction
this paper proposes boosting eigenactions algorithm for human action recognition spatio temporal information saliency map ism is calculated from video sequence by estimating pixel density function continuous human action is segmented into set of primitive periodic motion cycles from information saliency curve each cycle of motion is represented by salient action unit sau which is used to determine the eigenaction using principal component analysis human action classifier is developed using multi class adaboost algorithm with bayesian hypothesis as the weak classifier given human action video sequence the proposed method effectively locates the saus in the video and recognizes the human actions by categorizing the saus two publicly available human action databases namely kth and weizmann are selected for evaluation the average recognition accuracy are and for kth and weizmann databases respectively comparative results with two recent methods and robustness test results are also reported
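as a rough illustration of the eigenaction step only, the sketch below runs plain pca over already extracted salient action units represented as fixed length vectors; the saliency map computation, the segmentation into saus and the multi class adaboost classifier with bayesian weak learners described above are not reproduced, and all helper names are hypothetical

```python
import numpy as np

def eigenactions(saus, k):
    """PCA over vectorized salient action units (SAUs).

    saus: (n_samples, n_features) array, one row per SAU feature vector.
    Returns the mean SAU and the top-k principal directions
    ("eigenactions"). Hypothetical helper for illustration only.
    """
    mean = saus.mean(axis=0)
    centered = saus - mean
    # economy-size SVD: rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(sau, mean, basis):
    """Coefficients of one SAU in the eigenaction subspace."""
    return basis @ (sau - mean)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 256))           # 40 toy SAUs, 256-dim features
    mean, basis = eigenactions(X, k=5)
    print(project(X[0], mean, basis).shape)  # (5,)
```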
in this work we propose method for computing mesh representations of objects reconstructed from set of silhouette images our method is based on the polygonization of volumetric reconstructions by using modified version of the dual contouring method in order to apply dual contouring on volumetric reconstruction from silhouettes we devised method that is able to determine the discrete topology of the surface in relation to the octree cells we also developed new scheme for computing hermite data representing the intersections of conic volumes with the octree cells and their corresponding normals with subpixel accuracy due to the discrete and extremely noisy nature of the data used in the reconstruction we had to devise different criterion for mesh simplification that applies topological consistency tests only when the geometric error measure is beyond given tolerance we present results of the application of the proposed method in the extraction of mesh corresponding to the surface of objects of real scene
we study integrated prefetching and caching problems following the work of cao et al and kimbrel and karlin cao et al and kimbrel and karlin gave approximation algorithms for minimizing the total elapsed time in single and parallel disk settings the total elapsed time is the sum of the processor stall times and the length of the request sequence to be served we show that an optimum prefetching caching schedule for single disk problem can be computed in polynomial time thereby settling an open question by kimbrel and karlin for the parallel disk problem we give an approximation algorithm for minimizing stall time the solution uses few extra memory blocks in cache stall time is an important and harder to approximate measure for this problem all of our algorithms are based on new approach which involves formulating the prefetching caching problems as linear programs
power dissipation is quickly becoming one of the most important limiters in nanometer ic design since leakage increases exponentially as technology scales down however power and timing are often conflicting objectives during optimization in this paper we propose novel total power optimization flow under performance constraint instead of using placement gate sizing and multiple vt assignment techniques independently we combine them together through the concept of slack distribution management to maximize the potential for power reduction we propose to use the linear programming lp based placement and the geometric programming gp based gate sizing formulations to improve the slack distribution which helps to maximize the total power reduction during the vt assignment stage our formulations include important practical design constraints such as slew noise and short circuit power which were often ignored previously we tested our algorithm on set of industrial strength manually optimized circuits from multi ghz nm microprocessor and obtained very promising results to our best knowledge this is the first work that combines placement gate sizing and vt swapping systematically for total power and in particular leakage management
association rules are traditionally designed to capture statistical relationship among itemsets in given database to additionally capture the quantitative association knowledge korn et al recently proposed paradigm named ratio rules for quantifiable data mining however their approach is mainly based on principal component analysis pca and as result it cannot guarantee that the ratio coefficients are non negative this may lead to serious problems in the rules application in this paper we propose new method called principal sparse non negative matrix factorization psnmf for learning the associations between itemsets in the form of ratio rules in addition we provide support measurement to weigh the importance of each rule for the entire dataset experiments on several datasets illustrate that the proposed method performs well for discovering latent associations between itemsets in large datasets
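a minimal sketch of the non negative factorization idea, using plain multiplicative update nmf as a stand in for the principal sparse nmf (psnmf) described above; it enforces non negativity of the coefficients but not the sparsity or the support measurement, and the toy matrix is hypothetical

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF (Lee & Seung style).

    V: non-negative (items x transactions) matrix.
    Returns W (items x k) and H (k x transactions); columns of W can be
    read as ratio-rule-like item weightings. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = np.abs(rng.normal(size=(6, 50)))   # toy item x basket amounts
    W, H = nmf(V, k=2)
    # normalize each column so the coefficients read as item ratios
    print(W / W.sum(axis=0, keepdims=True))
```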
protocol narrations are widely used informal means to describe in an idealistic manner the functioning of cryptographic protocols as single intended sequence of cryptographic message exchanges among the protocol’s participants protocol narrations have also been informally turned into number of formal protocol descriptions eg using the spi calculus in this paper we propose direct formal operational semantics for protocol narrations that fixes particular and as we argue well motivated interpretation on how the involved protocol participants are supposed to execute based on this semantics we explain and formally justify natural and precise translation of narrations into spi calculus an optimised translation has been implemented in ocaml and we report on case studies that we have carried out using the tool
sensor networks are wireless networks that can obtain and process physical world information from scattered sensor devices due to the limited power slow processors and limited memory in each device routing protocols of sensor networks must be designed carefully recently data centric scheme called directed diffusion has been proposed to provide efficient data transmission over sensor networks in this paper we enhance this scheme by hierarchical data aggregation technique hda experiments demonstrate that our enhanced scheme can save transmission energy up to over directed diffusion without any reliability or delivery efficiency being compromised at the same time our scheme can facilitate greater data level aggregation in data centric routing
scratch pad memories spms are important storage components in many embedded applications and used as an alternative or complementary storage to on chip cache memories one of the most critical issues in the context of spms is to select the data elements to place in them since the gap between spm access latencies and off chip memory access latencies keeps increasing dramatically previous research considered this problem and attacked it using both static and dynamic schemes most of the prior efforts on data spms have mainly focused on single application scenarios ie the spm space available is assumed to be managed by single application at any given time while this assumption makes sense in certain domains there also exist many cases where multiple applications need to share the same spm space this paper focuses on such multi application scenario and proposes nonuniform spm space partitioning and management across concurrently executing applications in our approach the amount of data to be allocated to each application is decided based on the data reuse each application exhibits
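the sketch below shows one simple way such reuse driven allocation could look: spm blocks are split among concurrent applications in proportion to a per application reuse score; this is a hypothetical policy for illustration only, not the partitioning scheme of the paper, and the integer scores are assumed inputs

```python
def partition_spm(spm_blocks, reuse):
    """Split spm_blocks SPM blocks among concurrent applications in
    proportion to an integer data-reuse score per application.

    Hypothetical policy: allocate proportionally, then hand out blocks
    lost to integer truncation by largest remainder first.
    """
    total = sum(reuse.values()) or 1
    alloc = {app: (spm_blocks * score) // total for app, score in reuse.items()}
    by_remainder = sorted(reuse, key=lambda a: (spm_blocks * reuse[a]) % total,
                          reverse=True)
    for app in by_remainder[: spm_blocks - sum(alloc.values())]:
        alloc[app] += 1
    return alloc

if __name__ == "__main__":
    # toy reuse scores, e.g. counted SPM hits per application
    print(partition_spm(64, {"decoder": 120, "filter": 40, "osd": 10}))
```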
as compared to large spectrum of performance optimizations relatively less effort has been dedicated to optimize other aspects of embedded applications such as memory space requirements power real time predictability and reliability in particular many modern embedded systems operate under tight memory space constraints one way of addressing this constraint is to compress executable code and data as much as possible while researchers on code compression have studied efficient hardware and software based code compression strategies many of these techniques do not take application behavior into account that is the same compression decompression strategy is used irrespective of the application being optimized this article presents an application sensitive code compression strategy based on control flow graph cfg representation of the embedded program the idea is to start with memory image wherein all basic blocks of the application are compressed and decompress only the blocks that are predicted to be needed in the near future when the current access to basic block is over our approach also decides the point at which the block could be compressed we propose and evaluate several compression and decompression strategies that try to reduce memory requirements without excessively increasing the original instruction cycle counts some of our strategies make use of profile data whereas others are fully automatic our experimental evaluation using seven applications from the mediabench suite and three large embedded applications reveals that the proposed code compression strategy is very successful in practice our results also indicate that working at basic block granularity as opposed to procedure granularity is important for maximizing memory space savings
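as a toy illustration of keeping a memory image compressed at basic block granularity and decompressing blocks only on demand, the python sketch below uses zlib over byte strings; the cfg based prediction of soon to be needed blocks and the recompression decisions described above are omitted, and all names are hypothetical

```python
import zlib

class BlockImage:
    """Toy memory image that keeps every basic block compressed and
    decompresses a block only when it is about to be used.
    """
    def __init__(self, blocks):
        # blocks: dict block_id -> bytes of code for that basic block
        self._store = {b: zlib.compress(code) for b, code in blocks.items()}
        self._live = {}                      # currently decompressed blocks

    def fetch(self, block_id):
        if block_id not in self._live:       # decompress on demand
            self._live[block_id] = zlib.decompress(self._store[block_id])
        return self._live[block_id]

    def release(self, block_id):
        # point at which the block can revert to its compressed form
        self._live.pop(block_id, None)

if __name__ == "__main__":
    img = BlockImage({0: b"\x90" * 64, 1: b"\xcc" * 32})
    print(len(img.fetch(0)), "bytes live for block 0")
    img.release(0)
```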
current parser generators are based on context free grammars because such grammars lack abstraction facilities the resulting specifications are often not easy to read fischer’s macro grammars extend context free grammars with macro like productions thus providing the equivalent of procedural abstraction however their use is hampered by the lack of an efficient off the shelf parsing technology for macro grammars we define specialization for macro grammars to enable reuse of parsing technology for context free grammars while facilitating the specification of language with macro grammar this specialization yields context free rules but it does not always terminate we present sound and complete static analysis that applies to any macro grammar and decides whether specialization terminates for it and thus yields finite context free grammar the analysis is based on an intuitive notion of self embedding nonterminals which is easy to check by hand we have implemented the analysis as part of preprocessing tool that transforms yacc grammar extended with macro productions to standard yacc grammar
many basic network engineering tasks eg traffic engineering capacity planning anomaly detection rely heavily on the availability and accuracy of traffic matrices however in practice it is challenging to reliably measure traffic matrices missing values are common this observation brings us into the realm of compressive sensing generic technique for dealing with missing values that exploits the presence of structure and redundancy in many real world systems despite much recent progress made in compressive sensing existing compressive sensing solutions often perform poorly for traffic matrix interpolation because real traffic matrices rarely satisfy the technical conditions required for these solutions to address this problem we develop novel spatio temporal compressive sensing framework with two key components i new technique called sparsity regularized matrix factorization srmf that leverages the sparse or low rank nature of real world traffic matrices and their spatio temporal properties and ii mechanism for combining low rank approximations with local interpolation procedures we illustrate our new framework and demonstrate its superior performance in problems involving interpolation with real traffic matrices where we can successfully replace up to of the values evaluation in applications such as network tomography traffic prediction and anomaly detection confirms the flexibility and effectiveness of our approach
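a minimal sketch of low rank interpolation of a traffic matrix with missing entries via alternating ridge regressions; it captures only the regularized matrix factorization flavor of srmf, not the spatio temporal constraint matrices or the combination with local interpolation described above, and the toy data is hypothetical

```python
import numpy as np

def low_rank_fill(M, mask, k=2, lam=0.1, iters=50, seed=0):
    """Rank-k completion of M by alternating ridge regressions.

    M: (n, m) matrix with unobserved entries set to 0.
    mask: boolean array, True where M is observed.
    Returns the rank-k reconstruction L @ R.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    L = rng.normal(scale=0.1, size=(n, k))
    R = rng.normal(scale=0.1, size=(k, m))
    I = lam * np.eye(k)
    for _ in range(iters):
        for i in range(n):                    # update each row of L
            c = mask[i]
            Rc = R[:, c]
            L[i] = np.linalg.solve(Rc @ Rc.T + I, Rc @ M[i, c])
        for j in range(m):                    # update each column of R
            c = mask[:, j]
            Lc = L[c]
            R[:, j] = np.linalg.solve(Lc.T @ Lc + I, Lc.T @ M[c, j])
    return L @ R

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = rng.random((8, 3)) @ rng.random((3, 12))   # toy low-rank matrix
    mask = rng.random(truth.shape) > 0.3               # ~30% entries missing
    est = low_rank_fill(np.where(mask, truth, 0.0), mask, k=3)
    print(np.abs(est - truth)[~mask].mean())           # error on missing cells
```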
double dispatch is the ability to dynamically select method not only according to the run time type of the receiver single dispatch but also according to the run time type of the argument this mechanism unleashes the power of dynamic binding in object oriented languages so enhancing re usability and separation of responsibilities however many mainstream languages such as eg c++ and java do not provide it resorting only to single dispatch in this paper we propose an extension of c++ also applicable to other object oriented languages that enables double dispatch as language feature this yields dynamic overloading and covariant specialization of methods we define translation from the new constructs to standard c++ and we present the preprocessor implementing this translation called doublecpp the translated code enjoys static type safety and implements the semantics of double dispatch by using only standard mechanisms of static overloading and dynamic binding with minimal impact on the performance of the program
the majority of the available classification systems focus on the minimization of the classification error rate this is not always suitable metric especially when dealing with two class problems with skewed classes and cost distributions in this case an effective criterion to measure the quality of decision rule is the area under the receiver operating characteristic curve auc that is also useful to measure the ranking quality of classifier as required in many real applications in this paper we propose nonparametric linear classifier based on the maximization of auc the approach relies on the analysis of the wilcoxon mann whitney statistic of each single feature and on an iterative pairwise coupling of the features for the optimization of the ranking of the combined feature by the pairwise feature evaluation the proposed procedure is essentially different from other classifiers using auc as criterion experiments performed on synthetic and real data sets and comparisons with previous approaches confirm the effectiveness of the proposed method
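the per feature quantity referred to above can be illustrated with a small function that computes the auc of a score vector as its wilcoxon mann whitney statistic; the iterative pairwise coupling of features is not shown, and the example scores are hypothetical

```python
import numpy as np

def wmw_auc(scores, labels):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = s[y], s[~y]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    print(wmw_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))   # perfect ranking -> 1.0
    print(wmw_auc([0.5, 0.9, 0.4, 1.0], [1, 1, 0, 0]))   # mixed ranking  -> 0.5
```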
rfid has already found its way into variety of large scale applications and arguably it is already one of the most successful technologies in the history of computing beyond doubt rfid is an effective automatic identification technology for variety of objects including natural manufactured and handmade artifacts humans and other species locations and increasingly media content and mobile services in this survey we consider developments towards establishing rfid as the cost effective technical solution for the development of open shared universal pervasive computing infrastructures and look ahead to its future in particular we discuss the ingredients of current large scale applications the role of network services to provide complete systems privacy and security implications and how rfid is helping prototype emerging pervasive computing applications we conclude by identifying common trends in the new applications of rfid and ask questions related to sustainable universal deployment of this technology
we present new parallel programming tool environment that is accessible and executable anytime anywhere through standard web browsers and integrated in that it provides tools which adhere to common underlying methodology for parallel programming and performance tuning the environment is based on new network computing infrastructure developed at purdue university we evaluate our environment qualitatively by comparing our tool access method with conventional schemes of software download and installation we also quantitatively evaluate the efficiency of interactive tool access in our environment we do this by measuring the response times of various functions of the ursa minor tool and compare them with those of java applet based anytime anywhere tool access method we found that our environment offers significant advantages in terms of tool accessibility integration and efficiency
an efficient name resolution scheme is the cornerstone of any peer to peer network the name resolution scheme proposed by plaxton rajaraman and richa which we hereafter refer to as the prr scheme is scalable name resolution scheme that also provides provable locality properties however since prr goes to extra lengths to provide these locality properties it is somewhat complicated in this paper we propose scalable locality aware and fault tolerant name resolution scheme which can be considered simplified version of prr although this new scheme does not provide as strong locality guarantees as prr it exploits locality heuristically yet effectively
the intelligent book project aims to improve online education by designing materials that can model the subject matter they teach in the manner of reactive learning environment in this paper we investigate using an automated proof assistant particularly isabelle hol as the model supporting first year undergraduate exercises in which students write proofs in number theory automated proof assistants are generally considered to be difficult for novices to learn we examine whether by providing very specialized interface it is possible to build something that is usable enough to be of educational value to ensure students cannot game the system the exercise avoids tactic choosing interaction styles but asks the student to write out the proof proofs are written using mathstiles composable tiles that resemble written mathematics unlike traditional syntax directed editors mathstiles allows students to keep many answer fragments on the canvas at the same time and does not constrain the order in which an answer is written also the tile syntax does not need to match the underlying isar syntax exactly and different tiles can be used for different questions the exercises take place within the context of an intelligent book we performed user study and qualitative analysis of the system some users were able to complete proofs with much less training than is usual for the automated proof assistant itself but there remain significant usability issues to overcome
as part of the evolution of software systems effort is often invested to discover in what parts of the source code feature or other concern is implemented unfortunately knowledge about concern’s implementation can become invalid as the system evolves we propose to mitigate this problem by automatically inferring structural patterns among the elements identified as relevant to concern’s implementation we then document the inferred patterns as rules that can be checked as the source code evolves checking whether structural patterns hold across different versions of system enables the automatic identification of new elements related to documented concern we implemented our technique for java in an eclipse plug in called isis and applied it to number of concerns with case study spanning versions of the development history of an open source system we show how our approach supports the tracking of concern’s implementation through modifications such as extensions and refactorings
in recent years there has been growing interest in urban screen applications while there have been several deployments of these technologies in our urban environments surprisingly little research effort has aimed to explore the detailed material practice of people’s engagement and interaction with these urban screen applications in this paper we present study of collaborative game play on large urban displays situated in three city locations in the uk the study highlights ways in which collaborative play is initiated and coordinated within the context of an urban environment these experiences are related to physical characteristics of the architectural spaces the people populating these spaces and the interactive properties of the game itself the study moves on to discuss issues relating to audience and spectatorship an inherent feature of interaction in urban environments the issues of audience and spectatorship are discussed in their own right but also in terms of their relationship to the playing experience finally the study considers these interactive experiences in the contexts of being hosted by professional compere and also with no host present through the study we highlight factors to consider in the design of collaborative urban screen applications
we provide an experimental study of the role of syntactic parsing in semantic role labeling our conclusions demonstrate that syntactic parse information is clearly most relevant in the very first stage the pruning stage in addition the quality of the pruning stage cannot be determined solely based on its recall and precision instead it depends on the characteristics of the output candidates that make downstream problems easier or harder motivated by this observation we suggest an effective and simple approach of combining different semantic role labeling systems through joint inference which significantly improves the performance
the increasing availability of huge amounts of thin data ie data pertaining to time and positions generated by different sources with wide variety of technologies eg rfid tags gps gsm networks leads to large spatio temporal data collections mining such amounts of data is challenging since the possibility of extracting useful information from this particular type of data is crucial in many application scenarios such as vehicle traffic management hand off in cellular networks and supply chain management in this paper we address the issue of clustering spatial trajectories in the context of trajectory data this problem is even more challenging than in classical transactional relationships as here we deal with data trajectories in which the order of items is relevant we propose novel approach based on suitable regioning strategy and an efficient clustering technique based on edit distance experiments performed on real world datasets have confirmed the efficiency and effectiveness of the proposed techniques
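assuming the regioning step has already mapped each trajectory to a sequence of region ids, a plain levenshtein distance between such sequences, shown below, is one way to realize the edit distance used for clustering; the actual regioning strategy and clustering algorithm of the paper are not reproduced, and the region ids are hypothetical

```python
def edit_distance(a, b):
    """Levenshtein distance between two trajectories given as sequences
    of region ids (regioning assumed to be done beforehand)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (x != y)))    # substitution
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    t1 = ["r3", "r3", "r7", "r9"]   # hypothetical region-id trajectories
    t2 = ["r3", "r7", "r7", "r9"]
    print(edit_distance(t1, t2))    # 1
```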
many real world surfaces exhibit translucent appearance due to subsurface scattering although various methods exist to measure edit and render subsurface scattering effects no solution exists for manufacturing physical objects with desired translucent appearance in this paper we present complete solution for fabricating material volume with desired surface bssrdf we stack layers from fixed set of manufacturing materials whose thickness is varied spatially to reproduce the heterogeneity of the input bssrdf given an input bssrdf and the optical properties of the manufacturing materials our system efficiently determines the optimal order and thickness of the layers we demonstrate our approach by printing variety of homogeneous and heterogeneous bssrdfs using two hardware setups milling machine and printer
runtime bloat degrades significantly the performance and scalability of software systems an important source of bloat is the inefficient use of containers it is expensive to create inefficiently used containers and to invoke their associated methods as this may ultimately execute large volumes of code with call stacks dozens deep and allocate many temporary objects this paper presents practical static and dynamic tools that can find inappropriate use of containers in java programs at the core of these tools is base static analysis that identifies for each container the objects that are added to this container and the key statements ie heap loads and stores that achieve the semantics of common container operations such as add and get the static tool finds problematic uses of containers by considering the nesting relationships among the loops where these semantics achieving statements are located while the dynamic tool can instrument these statements and find inefficiencies by profiling their execution frequencies the high precision of the base analysis is achieved by taking advantage of context free language cfl reachability formulation of points to analysis and by accounting for container specific properties it is demand driven and client driven facilitating refinement specific to each queried container object and increasing scalability the tools built with the help of this analysis can be used both to avoid the creation of container related performance problems early during development and to help with diagnosis when problems are observed during tuning our experimental results show that the static tool has low false positive rate and produces more relevant information than its dynamic counterpart further case studies suggest that significant optimization opportunities can be found by focusing on statically identified containers for which high allocation frequency is observed at run time
current fully automatic model based test case generation techniques for guis employ static model therefore they are unable to leverage certain state based relationships between gui events eg one enables the other one alters the other’s execution that are revealed at run time and non trivial to infer statically we present alt new technique to generate gui test cases in batches because of its alternating nature alt enhances the next batch by using gui run time information from the current batch an empirical study on four fielded gui based applications demonstrated that alt was able to detect new and way gui interaction faults in contrast previous techniques due to their requirement of too many test cases were unable to even test and way gui interactions
sensor networks have wide range of potential practical and useful applications however there are issues that need to be addressed for efficient operation of sensor network systems in real applications energy saving is one critical issue for sensor networks since most sensors are equipped with nonrechargeable batteries that have limited lifetime to extend the lifetime of sensor network one common approach is to dynamically schedule sensors work sleep cycles or duty cycles moreover in cluster based networks cluster heads are usually selected in way that minimizes the total energy consumption and they may rotate among the sensors to balance energy consumption in general these energy efficient scheduling mechanisms also called topology configuration mechanisms need to satisfy certain application requirements while saving energy in this paper we provide survey on energy efficient scheduling mechanisms in sensor networks that have different design requirements than those in traditional wireless networks we classify these mechanisms based on their design assumptions and design objectives different mechanisms may make different assumptions about their sensors including detection model sensing area transmission range failure model time synchronization and the ability to obtain location and distance information they may also have different assumptions about network structure and sensor deployment strategy furthermore while all the mechanisms have common design objective to maximize network lifetime they may also have different objectives determined by their target applications
we present methodology for generating optimized architectures for data bandwidth constrained extensible processors we describe scalable integer linear programming ilp formulation that extracts the most profitable set of instruction set extensions given the available data bandwidth and transfer latency unlike previous approaches we differentiate between number of inputs and outputs for instruction set extensions and the number of register file ports this differentiation makes our approach applicable to architectures that include architecturally visible state registers and dedicated data transfer channels we support comprehensive design space exploration to characterize the area performance trade offs for various applications we evaluate our approach using actual asic implementations to demonstrate that our automatically customized processors meet timing within the target silicon area for an embedded processor with only two register read ports and one register write port we obtain up to times speed up with extensions incurring only area overhead
this paper introduces new consistency metric network imprecision ni to address central challenge in large scale monitoring systems safeguarding accuracy despite node and network failures to implement ni an overlay that monitors set of attributes also monitors its own state so that queries return not only attribute values but also information about the stability of the overlay the number of nodes whose recent updates may be missing and the number of nodes whose inputs may be double counted due to overlay reconfigurations when ni indicates that the network is stable query results are guaranteed to reflect the true state of the system but when the network is unstable ni puts applications on notice that query results should not be trusted allowing them to take corrective action such as filtering out inconsistent results to scalably implement ni’s introspection our prototype introduces key optimization dual tree prefix aggregation which exploits overlay symmetry to reduce overheads by more than an order of magnitude evaluation of three monitoring applications demonstrates that ni flags inaccurate results while incurring low overheads and monitoring applications that use ni to select good information can improve their accuracy by up to an order of magnitude
internet based distributed systems enable globally scattered resources to be collectively pooled and used in cooperative manner to achieve unprecedented petascale supercomputing capabilities numerous resource discovery approaches have been proposed to help achieve this goal to report or discover multi attribute resource most approaches use multiple messages with one message for each attribute leading to high overhead of memory consumption node communication and subsequent merging operation another approach can report and discover multi attribute resource using one query by reducing multi attribute to single index but it is not practically effective in an environment with large number of different resource attributes furthermore few approaches are able to locate resources geographically close to the requesters which is critical to system performance this paper presents p2p based intelligent resource discovery pird mechanism that weaves all attributes into set of indices using locality sensitive hashing and then maps the indices to structured p2p overlay pird can discover resources geographically close to requesters by relying on hierarchical p2p structure it significantly reduces overhead and improves search efficiency and effectiveness in resource discovery it further incorporates the lempel ziv welch algorithm to compress attribute information for higher efficiency theoretical analysis and simulation results demonstrate the efficiency of pird in comparison with other approaches it dramatically reduces overhead and yields significant improvements on the efficiency of resource discovery
in multimedia applications run time memory management support has to allow real time memory de allocation retrieving and processing of data thus its implementation must be designed to combine high speed low power large data storage capacity and high memory bandwidth in this paper we assess the performance of our new system level exploration methodology to optimise the memory management of typical multimedia applications in an extensively used image reconstruction system this methodology is based on an analysis of the number of memory accesses normalised memory footprint and energy estimations for the system studied this results in an improvement of normalised memory footprint up to and the estimated energy dissipation up to over conventional static memory implementations in an optimised version of the driver application finally our final version is able to scale perfectly the memory consumed in the system for wide range of input parameters whereas the statically optimised version is unable to do this
multiprocessor deterministic replay has many potential uses in the era of multicore computing including enhanced debugging fault tolerance and intrusion detection while sources of nondeterminism in uniprocessor can be recorded efficiently in software it seems likely that hardware support will be needed in multiprocessor environment where the outcome of memory races must also be recorded we develop memory race recording mechanism called rerun that uses small hardware state bytes per core writes small race log bytes per kilo instruction and operates well as the number of cores per system scales eg to cores rerun exploits the dual of conventional wisdom in race recording rather than record information about individual memory accesses that conflict we record how long thread executes without conflicting with other threads in particular rerun passively creates atomic episodes each episode is dynamic instruction sequence that thread happens to execute without interacting with other threads rerun uses lamport clocks to order episodes and enable replay of an equivalent execution
this paper presents tree pattern based method of automatically and accurately finding code clones in program files duplicate tree patterns are first collected by anti unification algorithm and redundancy free exhaustive comparisons and then finally clustered the algorithm is designed in such way that the same comparison is not repeated for speed while thoroughly examining every possible pair of tree patterns for accuracy our method maintains the syntax structure of code in tree pattern clusters which gives the flexibility of finding different types of clones while keeping the precision
our physical bodies play central role in shaping human experience in the world understanding of the world and interactions in the world this paper draws on theories of embodiment from psychology sociology and philosophy synthesizing five themes we believe are particularly salient for interaction design thinking through doing performance visibility risk and thick practice we introduce aspects of human embodied engagement in the world with the goal of inspiring new interaction design approaches and evaluations that better integrate the physical and computational worlds
tracking is usually interpreted as finding an object in single consecutive frames regularization is done by enforcing temporal smoothness of appearance shape and motion we propose tracker by interpreting the task of tracking as segmentation of volume in inherently temporal and spatial regularization is unified in single regularization term segmentation is done by variational approach using anisotropic weighted total variation tv regularization the proposed convex energy is solved globally optimal by fast primal dual algorithm any image feature can be used in the segmentation cue of the proposed mumford shah like data term as proof of concept we show experiments using simple color based appearance model as demonstrated in the experiments our tracking approach is able to handle large variations in shape and size as well as partial and complete occlusions
we evaluate two architectural alternatives partitioned and integrated for designing next generation file systems whereas partitioned server employs separate file system for each application class an integrated file server multiplexes its resources among all application classes we evaluate the performance of the two architectures with respect to sharing of disk bandwidth among the application classes we show that although the problem of sharing disk bandwidth in integrated file systems is conceptually similar to that of sharing network link bandwidth in integrated services networks the arguments that demonstrate the superiority of integrated services networks over separate networks are not applicable to file systems furthermore we show that i an integrated server outperforms the partitioned server in large operating region and has slightly worse performance in the remaining region ii the capacity of an integrated server is larger than that of the partitioned server and iii an integrated server outperforms the partitioned server by up to factor of in the presence of bursty workloads
the business process execution language for web services bpel4ws or bpel for short is an xml based language for defining business processes that provides an interoperable portable language for both abstract and executable processes and that was designed from the beginning to operate in the heterogeneity and dynamism that is commonplace in information technology today bpel builds on the layers of flexibility provided by the web services stack and especially by xml in this paper we provide brief introduction to bpel with emphasis on architectural drivers and basic concepts then we survey ongoing bpel work in several application areas adding quality of service to bpel extending bpel to activities involving humans bpel for grid computing and bpel for autonomic computing
changes in recent business environments have created the necessity for more efficient and effective business process management the workflow management system is software that assists in defining business processes as well as automatically controlling the execution of the processes this paper proposes new approach to the automatic execution of business processes using event condition action eca rules that can be automatically triggered by an active database first of all we propose the concept of blocks that can classify process flows into several patterns block is minimal unit that can specify the behaviors represented in process model an algorithm is developed to detect blocks from process definition network and transform it into hierarchical tree model the behaviors in each block type are modeled using acta formalism this provides theoretical basis from which eca rules are identified the proposed eca rule based approach shows that it is possible to execute the workflow using the active capability of database without users intervention the operation of the proposed methods is illustrated through an example process
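a minimal sketch of the event condition action idea outside an active database: a rule fires its action when its event is raised and its condition holds; the block detection, the acta based modeling and the actual database triggers described above are not reproduced, and the example rule is hypothetical

```python
class EcaRule:
    """Event-condition-action rule: when `event` is raised, run `action`
    if `condition(ctx)` holds for the current workflow context."""
    def __init__(self, event, condition, action):
        self.event, self.condition, self.action = event, condition, action

class RuleEngine:
    def __init__(self, rules):
        self.rules = rules

    def raise_event(self, event, ctx):
        # evaluate every rule registered for this event
        for r in self.rules:
            if r.event == event and r.condition(ctx):
                r.action(ctx)

if __name__ == "__main__":
    rules = [EcaRule("task_done",
                     lambda ctx: ctx["task"] == "review",
                     lambda ctx: print("start next task: approve"))]
    RuleEngine(rules).raise_event("task_done", {"task": "review"})
```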
query log analysis has received substantial attention in recent years in which the click graph is an important technique for describing the relationship between queries and urls state of the art approaches based on the raw click frequencies for modeling the click graph however are not noise eliminated nor do they handle heterogeneous query url pairs well in this paper we investigate and develop novel entropy biased framework for modeling click graphs the intuition behind this model is that various query url pairs should be treated differently ie common clicks on less frequent but more specific urls are of greater value than common clicks on frequent and general urls based on this intuition we utilize the entropy information of the urls and introduce new concept namely the inverse query frequency iqf to weigh the importance discriminative ability of click on certain url the iqf weighting scheme is never explicitly explored or statistically examined for any bipartite graphs in the information retrieval literature we not only formally define and quantify this scheme but also incorporate it with the click frequency and user frequency information on the click graph for an effective query representation to illustrate our methodology we conduct experiments with the aol query log data for query similarity analysis and query suggestion tasks experimental results demonstrate that considerable improvements in performance are obtained with our entropy biased models moreover our method can also be applied to other bipartite graphs
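a small sketch of the inverse query frequency intuition, weighting each query url click by an idf style factor that is high for urls clicked from few distinct queries; the combination with click frequency and user frequency and the query similarity and suggestion experiments described above are omitted, and the toy click triples are hypothetical

```python
import math
from collections import defaultdict

def iqf_weights(clicks):
    """Weight click-graph edges by inverse query frequency (IQF):
    clicks on a URL reached from few distinct queries count for more.

    clicks: iterable of (query, url, click_count) triples.
    Returns dict (query, url) -> iqf-weighted click value.
    """
    triples = list(clicks)
    queries_per_url = defaultdict(set)
    all_queries = set()
    for q, u, _ in triples:
        queries_per_url[u].add(q)
        all_queries.add(q)
    iqf = {u: math.log(len(all_queries) / len(qs))
           for u, qs in queries_per_url.items()}
    return {(q, u): c * iqf[u] for q, u, c in triples}

if __name__ == "__main__":
    data = [("jaguar", "wikipedia.org", 10), ("car", "wikipedia.org", 10),
            ("jaguar", "jaguar.com", 10)]
    for edge, w in iqf_weights(data).items():
        print(edge, round(w, 2))   # the general url gets weight 0
```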
color histograms have been widely used successfully in many computer vision and image processing applications however they do not include any spatial information in this paper we propose statistical model to integrate both color and spatial information our model is based on finite multiple bernoulli mixtures for the estimation of the model’s parameters we use maximum a posteriori map approach through deterministic annealing expectation maximization daem smoothing priors on the components parameters are introduced to stabilize the estimation the selection of the number of clusters is based on stochastic complexity the results show that our model achieves good performance in some image classification problems
method inlining and data flow analysis are two major optimization components for effective program transformations however they often suffer from the existence of rarely or never executed code contained in the target method one major problem lies in the assumption that the compilation unit is partitioned at method boundaries this paper describes the design and implementation of region based compilation technique in our dynamic compilation system in which the compiled regions are selected as code portions without rarely executed code the key part of this technique is the region selection partial inlining and region exit handling for region selection we employ both static heuristics and dynamic profiles to identify rare sections of code the region selection process and method inlining decision are interwoven so that method inlining exposes other targets for region selection while the region selection in the inline target conserves the inlining budget leading to more method inlining thus the inlining process can be performed for parts of method not for the entire body of the method when the program attempts to exit from region boundary we trigger recompilation and then rely on on stack replacement to continue the execution from the corresponding entry point in the recompiled code we have implemented these techniques in our java jit compiler and conducted comprehensive evaluation the experimental results show that the approach of region based compilation achieves approximately performance improvement on average while reducing the compilation overhead by to in comparison to the traditional function based compilation techniques
we investigate the problem of computing in the presence of faults that may arbitrarily ie adversarially corrupt memory locations in the faulty memory model any memory cell can get corrupted at any time and corrupted cells cannot be distinguished from uncorrupted ones an upper bound on the number of corruptions and reliable memory cells are provided in this model we focus on the design of resilient dictionaries ie dictionaries which are able to operate correctly at least on the set of uncorrupted keys we first present simple resilient dynamic search tree based on random sampling with log expected amortized cost per operation and space complexity we then propose an optimal deterministic static dictionary supporting searches in log time in the worst case and we show how to use it in dynamic setting in order to support updates in log amortized time our dynamic dictionary also supports range queries in log worst case time where is the size of the output finally we show that every resilient search tree with some reasonable properties must take log worst case time per search
providing various services on the internet by enterprises is an important trend in business composite services which consist of various services provided by different service providers are complex processes that require the cooperation among cross organizational service providers the flexibility and success of business depend on effective knowledge support to access related information resources of composite services thus providing effective knowledge support for accessing composite services is challenging task this work proposes knowledge map platform to provide an effective knowledge support for utilizing composite services data mining approach is applied to extract knowledge patterns from the usage records of composite services based on the mining result topic maps are employed to construct the knowledge map meanwhile the proposed knowledge map is integrated with recommendation capability to generate recommendations for composite services via data mining and collaborative filtering techniques prototype system is implemented to demonstrate the proposed platform the proposed knowledge map enhanced with recommendation capability can provide users customized decision support to effectively utilize composite services
until recently numerous feature selection techniques have been proposed and found wide applications in genomics and proteomics for instance feature gene selection has proven to be useful for biomarker discovery from microarray and mass spectrometry data while supervised feature selection has been explored extensively there are only few unsupervised methods that can be applied to exploratory data analysis in this paper we address the problem of unsupervised feature selection first we extend laplacian linear discriminant analysis llda to unsupervised cases second we propose novel algorithm for computing llda which is efficient in the case of high dimensionality and small sample size as in microarray data finally an unsupervised feature selection method called llda based recursive feature elimination llda rfe is proposed we apply llda rfe to several public data sets of cancer microarrays and compare its performance with those of laplacian score and svd entropy two state of the art unsupervised methods and with that of fisher score supervised filter method our results demonstrate that llda rfe outperforms laplacian score and shows favorable performance against svd entropy it performs even better than fisher score for some of the data sets despite the fact that llda rfe is fully unsupervised
data warehouse and online analytical processing olap play key role in business intelligent systems with the increasing amount of spatial data stored in business database how to utilize this spatial information to get insight into business data from the geo spatial point of view is becoming an important issue of data warehouse and olap however traditional data warehouse and olap tools cannot fully exploit spatial data in coordinates because multi dimensional spatial data does not have implicit or explicit concept hierarchy to compute pre aggregation and materialization in data warehouse in this paper we extend the traditional set grouping hierarchy into multi dimensional data space and propose to use spatial index tree as the hierarchy on spatial dimension with spatial hierarchy spatial data warehouse can be built accordingly our approach preserves the star schema in data warehouse while building the hierarchy on spatial dimension and can be easily integrated into existing data warehouse and olap systems to process spatial olap query in spatial data warehouse we propose an olap favored search method which can utilize the pre aggregation result in spatial data warehouse to improve the performance of spatial olap queries for generality the algorithm is developed based on generalized index searching tree gist to improve the performance of olap favored search we further introduce heuristic search method which can provide an approximate answer to spatial olap query experimental results show the efficiency of our method
all existing fault tolerance job scheduling algorithms for computational grids were proposed under the assumption that all sites apply the same fault tolerance strategy they all ignored that each grid site may have its own fault tolerance strategy because each site is itself an autonomous domain in fact it is very common that there are multiple fault tolerance strategies adopted at the same time in large scale computational grid various fault tolerance strategies may have different hardware and software requirements for instance if grid site employs the job checkpointing mechanism each computation node must have the following ability periodically the computational node transmits the transient state of the job execution to the server if job fails it will migrate to another computational node and resume from the last stored checkpoint therefore in this paper we propose genetic algorithm for job scheduling to address the heterogeneity of fault tolerance mechanisms problem in computational grid we assume that the system supports four kinds of fault tolerance mechanisms including the job retry the job migration without checkpointing the job migration with checkpointing and the job replication mechanisms because each fault tolerance mechanism has different requirements for gene encoding we also propose new chromosome encoding approach to integrate the four kinds of mechanisms in chromosome the risk nature of the grid environment is also taken into account in the algorithm the risk relationship between jobs and nodes is defined by the security demand and the trust level simulation results show that our algorithm has shorter makespan and is more effective at reducing the job failure rate than the min min and sufferage algorithms
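to illustrate the idea of carrying the fault tolerance mechanism inside each gene, the sketch below encodes a candidate schedule as a list of (job, node, mechanism) genes with a single point crossover; the fitness function (makespan, failure rate, security demand versus trust level) and the rest of the genetic algorithm are not reproduced, and all names are hypothetical

```python
import random

MECHANISMS = ["retry", "migrate", "migrate_ckpt", "replicate"]

def random_chromosome(jobs, nodes, seed=None):
    """One candidate schedule: each gene maps a job to a grid node plus
    the fault-tolerance mechanism used for it on that node."""
    rng = random.Random(seed)
    return [(job, rng.choice(nodes), rng.choice(MECHANISMS)) for job in jobs]

def crossover(a, b, point):
    """Single-point crossover over the gene list."""
    return a[:point] + b[point:]

if __name__ == "__main__":
    jobs = ["j1", "j2", "j3", "j4"]
    nodes = ["siteA", "siteB"]
    p1 = random_chromosome(jobs, nodes, seed=1)
    p2 = random_chromosome(jobs, nodes, seed=2)
    print(crossover(p1, p2, point=2))
```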
the internet provides wealth of useful information in vast number of dynamic information sources but it is difficult to determine which sources are useful for given query most existing techniques either require explicit source cooperation for example by exporting data summaries or build relatively static source characterization for example by assigning topic to the source we present system called infobeacons that takes different approach data and sources are left as is and peer to peer network of beacons uses past query results to guide queries to sources who do the actual query processing this approach has several advantages including requiring minimal changes to sources tolerance of dynamism and heterogeneity and the ability to scale to large numbers of sources we present the architecture of the system and discuss the advantages of our design we then focus on how beacon can choose good sources for query despite the loose coupling of beacons to sources beacons cache responses to previous queries and adapt the cache to changes at the source the cache is then used to select good sources for future queries we discuss results from detailed experimental study using our beacon prototype which demonstrates that our loosely coupled approach is effective beacon only has to contact sixty percent or less of the sources contacted by existing tightly coupled approaches while providing results of equivalent or better relevance to queries
in model driven verification model checker executes program by embedding it within test harness thus admitting program verification without the need to translate the program which runs as native code model checking techniques in which code is actually executed have recently gained popularity due to their ability to handle the full semantics of actual implementation languages and to support verification of rich properties in this paper we show that combination with dynamic analysis can with relatively low overhead considerably extend the capabilities of this style of model checking in particular we show how to use the cil framework to instrument code in order to allow the spin model checker when verifying programs to check additional properties simulate system resets and use local coverage information to guide the model checking search an additional benefit of our approach is that instrumentations developed for model checking may be used without modification in testing or monitoring code we are motivated by experience in applying model driven verification to jpl developed flight software modules from which we take our example applications we believe this is the first investigation in which an independent instrumentation for dynamic analysis has been integrated with model checking
the term systematic review is used to refer to specific methodology of research developed in order to gather and evaluate the available evidence pertaining to focused topic it represents secondary study that depends on primary study results to be accomplished several primary studies have been conducted in the field of software engineering in the last years determining an increasing improvement in methodology however in most cases software is built with technologies and processes for which developers have insufficient evidence to confirm their suitability limits qualities costs and inherent risks conducting systematic reviews in software engineering consists in major methodological tool to scientifically improve the validity of assertions that can be made in the field and as consequence the reliability degree of the methods that are employed for developing software technologies and supporting software processes this paper aims at discussing the significance of experimental studies particularly systematic reviews and their use in supporting software processes template designed to support systematic reviews in software engineering is presented and the development of ontologies to describe knowledge regarding such experimental studies is also introduced
online shoppers are generally highly task driven they have certain goal in mind and they are looking for product with features that are consistent with that goal unfortunately finding product with specific features is extremely time consuming using the search functionality provided by existing web sites in this paper we present new search system called red opal that enables users to locate products rapidly based on features our fully automatic system examines prior customer reviews identifies product features and scores each product on each feature red opal uses these scores to determine which products to show when user specifies desired product feature we evaluate our system on four dimensions precision of feature extraction efficiency of feature extraction precision of product scores and estimated time savings to customers on each dimension red opal performs better than comparison system
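a deliberately crude sketch of scoring each product on each feature from its reviews, here just the average number of mentions of the feature word per review; red opal's actual feature extraction and statistical scoring are not reproduced, and the review snippets are hypothetical

```python
from collections import Counter, defaultdict

def score_products(reviews, features):
    """Score each product on each feature by how often the feature word
    appears in its reviews, normalized by the number of reviews.

    reviews: iterable of (product, review_text) pairs.
    """
    mentions = defaultdict(Counter)
    n_reviews = Counter()
    for product, text in reviews:
        n_reviews[product] += 1
        words = text.lower().split()
        for f in features:
            mentions[product][f] += words.count(f)
    return {p: {f: mentions[p][f] / n_reviews[p] for f in features}
            for p in n_reviews}

if __name__ == "__main__":
    data = [("cam-a", "great zoom and battery life"),
            ("cam-a", "zoom is sharp"),
            ("cam-b", "battery died fast")]
    print(score_products(data, ["zoom", "battery"]))
```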
we present fast but reliable way to detect routing criticalities in vlsi chips in addition we show how this congestion estimation can be incorporated into partitioning based placement algorithm different to previous approaches we do not rerun parts of the placement algorithm or apply post placement optimization but we use our congestion estimator for dynamic avoidance of routability problems in one single run of the placement algorithm computational experiments on chips with up to cells are presented the framework reduces the usage of the most critical routing edges by on average the running time increase for the placement is about however due to the smaller congestion the running time of routing tools can be decreased drastically so the total time for placement and global routing is decreased by on average
we propose methodology based on aspect oriented modeling aom for incorporating security mechanisms in an application the functionality of the application is described using the primary model and the attacks are specified using aspects the attack aspect is composed with the primary model to obtain the misuse model the misuse model describes how much the application can be compromised if the results are unacceptable then some security mechanism must be incorporated into the application the security mechanism modeled as security aspect is composed with the primary model to obtain the security treated model the security treated model is analyzed to give assurance that it is resilient to the attack
proxy object is surrogate or placeholder that controls access to another target object proxies can be used to support distributed programming lazy or parallel evaluation access control and other simple forms of behavioral reflection however wrapper proxies like futures or suspensions for yet to be computed results can require significant code changes to be used in statically typed languages while proxies more generally can inadvertently violate assumptions of transparency resulting in subtle bugs to solve these problems we have designed and implemented simple framework for proxy programming that employs static analysis based on qualifier inference but with additional novelties code for using wrapper proxies is automatically introduced via classfile to classfile transformation and potential violations of transparency are signaled to the programmer we have formalized our analysis and proven it sound our framework has variety of applications including support for asynchronous method calls returning futures experimental results demonstrate the benefits of our framework programmers are relieved of managing and or checking proxy usage analysis times are reasonably fast overheads introduced by added dynamic checks are negligible and performance improvements can be significant for example changing two lines in simple rmi based peer to peer application and then using our framework resulted in large performance gain
the language aml was designed to specify the semantics of architecture description languages adls especially adls describing architectures wherein the architecture itself evolves over time dynamic evolution concerns arise with considerable variation in time scale one may constrain how system may evolve by monitoring its development lifecycle another approach to such concerns involves limiting systems construction primitives to those from appropriate styles one may wish to constrain what implementations are appropriate semi concerns for interface compatibility are then germane and finally one may want to constrain the ability of the architecture to be modified as it is running aml attempts to circumscribe architectures in such way that one may express all of these constraints without committing to which time scale will be used to enforce them example aml specifications of the style and acme are presented
the ad hoc deployment of sensor network causes unpredictable patterns of connectivity and varied node density resulting in uneven bandwidth provisioning on the forwarding paths when congestion happens some sensors may have to reduce their data rates it is an interesting but difficult problem to determine which sensors must reduce rates and how much they should reduce this paper attempts to answer fundamental question about congestion resolution what are the maximum rates at which the individual sensors can produce data without causing congestion in the network and unfairness among the peers we define the maxmin optimal rate assignment problem in sensor network where all possible forwarding paths are considered we provide an iterative linear programming solution which finds the maxmin optimal rate assignment and forwarding schedule that implements the assignment in low rate sensor network we prove that there is one and only one such assignment for given configuration of the sensor network we also study the variants of the maxmin fairness problem in sensor networks
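the paper above finds the maxmin optimal rates with an iterative linear program over all possible forwarding paths; the sketch below is a much simpler progressive-filling illustration of maxmin fairness that assumes one fixed path per sensor and known link capacities, with all names and numbers made up

```python
# Simplified progressive-filling sketch of max-min fair rate allocation over
# fixed forwarding paths (not the paper's iterative LP over all paths).
def maxmin_rates(paths, capacity, step=0.01):
    """paths: sensor -> list of link ids; capacity: link id -> link capacity."""
    rates = {s: 0.0 for s in paths}
    frozen = set()                       # sensors whose rate can no longer grow
    while len(frozen) < len(paths):
        # raise all unfrozen sensors together until some link saturates
        for s in paths:
            if s not in frozen:
                rates[s] += step
        load = {l: 0.0 for l in capacity}
        for s, p in paths.items():
            for l in p:
                load[l] += rates[s]
        saturated = {l for l in capacity if load[l] >= capacity[l] - 1e-9}
        # freeze every sensor that crosses a saturated link
        for s, p in paths.items():
            if s not in frozen and any(l in saturated for l in p):
                frozen.add(s)
    return rates

paths = {"s1": ["a", "b"], "s2": ["b"], "s3": ["c"]}
capacity = {"a": 1.0, "b": 1.0, "c": 0.5}
print(maxmin_rates(paths, capacity))  # s1 and s2 share link b; s3 is limited by c
```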
light refraction is an important optical phenomenon whose simulation greatly contributes to the realism of synthesized images although ray tracing can correctly simulate light refraction doing it in real time still remains challenge this work presents an image space technique to simulate the refraction of distant environments in real time contrary to previous approaches for interactive refraction at multiple interfaces the proposed technique does not require any preprocessing as result it can be directly applied to objects undergoing shape deformations which is common and important feature for character animation in computer games and movies our approach is general in the sense that it can be used with any object representation that can be rasterized on programmable gpu it is based on an efficient ray intersection procedure performed against dynamic depth map and carried out in texture space we demonstrate the effectiveness of our approach by simulating refractions through animated characters composed of several hundred thousand polygons in real time
how should ubicomp technologies be evaluated while lab studies are good at sensing aspects of human behavior and revealing usability problems they are poor at capturing context of use in situ studies are good at demonstrating how people appropriate technologies in their intended setting but are expensive and difficult to conduct here we show how they can be used more productively in the design process mobile learning device was developed to support teams of students carrying out scientific inquiry in the field an initial in situ study showed it was not used in the way envisioned contextualized analysis led to comprehensive understanding of the user experience usability and context of use leading to substantial redesign second in situ study showed big improvement in device usability and collaborative learning we discuss the findings and conclude how in situ studies can play an important role in the design and evaluation of ubicomp applications and user experiences
privacy can be an issue during collaboration around personal display when previous browsing activities become visible within web browser features eg autocomplete users currently lack methods to present only appropriate traces of prior activity in these features in this paper we explore semi automatic approach to privacy management that allows users to classify traces of browsing activity and filter them appropriately when their screen is visible by others we developed privatebits prototype web browser that instantiates previously proposed general design guidelines for privacy management systems as well as those specific to web browser visual privacy preliminary evaluation found this approach to be flexible enough to meet participants varying privacy concerns privacy management strategies and viewing contexts however the results also emphasized the need for additional security features to increase trust in the system and raised questions about how to best manage the tradeoff between ease of use and system concealment
matchmaking arises when supply and demand meet in an electronic marketplace or when agents search for web service to perform some task or even when recruiting agencies match curricula and job profiles in such open environments the objective of matchmaking process is to discover best available offers to given request we address the problem of matchmaking from knowledge representation perspective with formalization based on description logics we devise concept abduction and concept contraction as non monotonic inferences in description logics suitable for modeling matchmaking in logical framework and prove some related complexity results we also present reasonable algorithms for semantic matchmaking based on the devised inferences and prove that they obey some commonsense properties finally we report on the implementation of the proposed matchmaking framework which has been used both as mediator in marketplaces and for semantic web services discovery
we generalize all the results obtained for maximum integer multiflow and minimum multicut problems in trees by garg vazirani and yannakakis (garg vv vazirani yannakakis primal dual approximation algorithms for integral flow and multicut in trees algorithmica) to graphs with fixed cyclomatic number while this cannot be achieved for other classical generalizations of trees we also introduce the k edge outerplanar graphs class of planar graphs with arbitrary but bounded tree width that generalizes the cacti and show that the integrality gap of the maximum edge disjoint paths problem is bounded in these graphs
as distributed applications increase in size and complexity traditional authorization mechanisms based on single policy decision point are increasingly fragile because this decision point represents single point of failure and performance bottleneck authorization recycling is one technique that has been used to address these challenges this paper introduces and evaluates the mechanisms for authorization recycling in rbac enterprise systems the algorithms that support these mechanisms allow precise and approximate authorization decisions to be made thereby masking possible failures of the policy decision point and reducing its load we evaluate these algorithms analytically and using prototype implementation our evaluation results demonstrate that authorization recycling can improve the performance of distributed access control mechanisms
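a hedged sketch of the recycling idea follows: cache precise decisions from the policy decision point and, when it is unreachable, infer approximate decisions from the role hierarchy; the class names, the inference rule and the fallback behaviour below are illustrative assumptions rather than the paper's exact algorithms

```python
# Illustrative authorization-recycling cache for an RBAC setting: senior roles
# are assumed to inherit the permissions of the roles below them.
class RecyclingPDP:
    def __init__(self, senior_of):
        self.senior_of = senior_of        # role -> set of junior roles it inherits
        self.cache = {}                   # (role, permission) -> bool

    def record(self, role, permission, allowed):
        """Cache a precise decision previously returned by the real PDP."""
        self.cache[(role, permission)] = allowed

    def decide(self, role, permission):
        # precise recycling: exact hit on a cached decision
        if (role, permission) in self.cache:
            return self.cache[(role, permission)]
        # approximate recycling: a permit cached for an inherited junior role
        for junior in self.senior_of.get(role, set()):
            if self.cache.get((junior, permission)) is True:
                return True
        return None                       # unknown: fall back to the real PDP

pdp = RecyclingPDP({"manager": {"clerk"}})
pdp.record("clerk", "read_ledger", True)
print(pdp.decide("manager", "read_ledger"))  # True, inferred approximately
```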
we develop abstract interpretation from topological principles by relaxing the definitions of open set and continuity key results still hold we study families of closed and open sets and show they generate post and pre condition analyses respectively giacobazzi’s forwards and backwards complete functions are characterized by the topologically closed and continuous maps respectively finally we show that smyth’s upper and lower topologies for powersets induce the overapproximating and underapproximating transition functions used for abstract model checking
fundamental task of reconstructing non rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation this problem is challenging in high dimensional configuration spaces in this paper we propose general model based dynamic point matching algorithm to reconstruct freeform non rigid articulated movements from data presented solely by sparse feature points the algorithm integrates key frame based self initialising hierarchical segmental matching with inter frame tracking to achieve computation effectiveness and robustness in the presence of data noise dynamic scheme of motion verification dynamic key frame shift identification and backward parent segment correction incorporating temporal coherency embedded in inter frames is employed to enhance the segment based spatial matching such spatial temporal approach ultimately reduces the ambiguity of identification inherent in single frame performance evaluation is provided by series of empirical analyses using synthetic data testing on motion capture data for common articulated motion namely human motion gave feature point identification and matching without the need for manual intervention in buffered real time these results demonstrate the proposed algorithm to be candidate for feature based real time reconstruction tasks involving self resuming tracking for articulated motion
mix networks are designed to provide anonymity for users in variety of applications including privacy preserving www browsing and numerous commerce systems such networks have been shown to be susceptible to number of statistical traffic analysis attacks among these are flow correlation attacks where an adversary may disclose the communication relationship between sender and receiver by measuring the similarity between the sender’s outbound flow and the receiver’s inbound flow the effectiveness of the attacks is measured in terms of the probability that an adversary correctly recognises the receiver this paper describes model for the flow correlation attack effectiveness our results illustrate the quantitative relationship among system parameters such as sample size noise level payload flow rate and attack effectiveness our analysis quantitatively reveals how under certain situations existing flow based anonymous systems would fail under flow correlation attacks thus providing useful guidelines for the design of future anonymous systems
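to make the flow correlation idea concrete, the sketch below bins packet timestamps of the sender's outbound flow and each candidate inbound flow into fixed windows and compares the resulting rate vectors; using pearson correlation as the similarity measure and the window sizes shown are assumptions for illustration, whereas the paper models attack effectiveness analytically

```python
# Illustrative flow-correlation measurement on time-binned packet counts.
import numpy as np

def rate_vector(timestamps, window=1.0, horizon=60.0):
    bins = np.arange(0.0, horizon + window, window)
    counts, _ = np.histogram(timestamps, bins=bins)
    return counts.astype(float)

def best_matching_receiver(outbound_ts, inbound_flows, window=1.0, horizon=60.0):
    out = rate_vector(outbound_ts, window, horizon)
    scores = {}
    for receiver, ts in inbound_flows.items():
        inb = rate_vector(ts, window, horizon)
        scores[receiver] = np.corrcoef(out, inb)[0, 1]   # similarity of rate patterns
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
sender = np.sort(rng.uniform(0, 60, 300))
flows = {"r1": sender + rng.normal(0.5, 0.2, 300),       # delayed copy of the sender's flow
         "r2": np.sort(rng.uniform(0, 60, 300))}         # unrelated flow
print(best_matching_receiver(sender, flows)[0])           # likely "r1"
```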
heightfield terrain and parallax occlusion mapping pom are popular rendering techniques in games they can be thought of as per vertex and per pixel relief methods which both create texture stretch artifacts at steep slopes to ameliorate stretching artifacts we describe how to precompute an indirection map that transforms traditional texture coordinates into quasi conformal parameterization on the relief surface the map arises from iteratively relaxing spring network because it is independent of the resolution of the base geometry indirection mapping can be used with pom heightfields and any other displacement effect noisy textures like grass and stucco can be used with indirection mapping which is convenient when texturing terrain we pre warp structured textures by the inverse of the indirection map to maintain their appearance our process gives approximately uniform texture resolution on all surfaces during rendering the time and space overhead are one texture fetch and one texture map
this paper presents an automatic turns detection and annotation technique which works from unlabeled captured locomotion motion annotation is required by several motion capture editing techniques detection of turns is made difficult because of the oscillatory nature of the human locomotion our contribution is to address this problem by analyzing the trajectory of the center of mass of the human body into velocity curvature space representation our approach is based on experimental observations of carefully captured human motions we demonstrate the efficiency and the accuracy of our approach
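a minimal sketch of the velocity curvature idea follows: estimate curvature of the centre of mass trajectory by finite differences, smooth it to suppress the gait oscillation, and flag high-curvature frames as turns; the smoothing window, threshold and synthetic trajectory are illustrative assumptions, not the paper's calibrated values

```python
# Turn detection from a centre-of-mass trajectory via a curvature signal.
import numpy as np

def curvature(xy, dt=1.0 / 120.0):
    x, y = xy[:, 0], xy[:, 1]
    dx, dy = np.gradient(x, dt), np.gradient(y, dt)
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
    speed = np.hypot(dx, dy)
    return (dx * ddy - dy * ddx) / np.maximum(speed ** 3, 1e-6)

def detect_turns(xy, dt=1.0 / 120.0, window=61, threshold=0.5):
    k = curvature(xy, dt)
    smooth = np.convolve(k, np.ones(window) / window, mode="same")  # filter out the step cycle
    return np.flatnonzero(np.abs(smooth) > threshold)               # frames flagged as turning

t = np.linspace(0, 4, 480)
path = np.column_stack([np.where(t < 2, t, 2 + np.sin(t - 2)),      # straight walk, then a curve
                        np.where(t < 2, 0.0, 1 - np.cos(t - 2))])
print(detect_turns(path).size > 0)                                   # True: the curved part is found
```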
in the past quantized local descriptors have been shown to be good base for the representation of images that can be applied to wide range of tasks however current approaches typically consider only one level of quantization to create the final image representation in this view they somehow restrict the image description to one level of visual detail we propose to build image representations from multi level quantization of local interest point descriptors automatically extracted from the images the use of this new multi level representation will allow for the description of fine and coarse local image detail in one framework to evaluate the performance of our approach we perform scene image classification using class data set we show that the use of information from multiple quantization levels increases the classification performance which suggests that the different granularity captured by the multi level quantization produces more discriminant image representation moreover by using multi level approach the time necessary to learn the quantization models can be reduced by learning the different models in parallel
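as a hedged illustration, the sketch below clusters local descriptors at several vocabulary sizes and concatenates the per-level histograms into one image representation; the vocabulary sizes, the use of plain k-means and the random data are assumptions made for illustration

```python
# Multi-level quantization of local descriptors into a concatenated histogram.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabularies(train_descriptors, levels=(16, 64, 256)):
    return [KMeans(n_clusters=k, n_init=5, random_state=0).fit(train_descriptors)
            for k in levels]

def multi_level_histogram(image_descriptors, vocabularies):
    parts = []
    for vocab in vocabularies:
        words = vocab.predict(image_descriptors)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        parts.append(hist / max(hist.sum(), 1.0))      # coarse and fine detail side by side
    return np.concatenate(parts)

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 128))                    # stand-in for SIFT-like descriptors
vocabs = build_vocabularies(train)
image = rng.normal(size=(300, 128))
print(multi_level_histogram(image, vocabs).shape)       # (16 + 64 + 256,) = (336,)
```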
one of the most important factors for success of native xml database systems is powerful query optimizer surprisingly little has been done to develop cost models to enable cost based optimization in such systems since the entire optimization process is so complex only stepwise approach will lead to satisfying future solution in this work we are paving the way for cost based xml query optimization by developing cost formulae for two important join operators which allow to perform join reordering and join fusion in cost aware way and therefore make joint application of structural joins and holistic twig joins possible
modelling and verification of systems such as communication network and security protocols which exhibit both probabilistic and non deterministic behaviour typically use markov decision processes mdps for large complex systems abstraction techniques are essential this paper builds on promising approach for abstraction of mdps based on stochastic two player games which provides distinct lower and upper bounds for minimum and maximum probabilistic reachability properties existing implementations work at the model level limiting their scalability in this paper we develop language level abstraction techniques that build game based abstractions of mdps directly from high level descriptions in the prism modelling language using predicate abstraction and smt solvers for efficiency we develop compositional framework for abstraction we have applied our techniques to range of case studies successfully verifying models larger than was possible with existing implementations we are also able to demonstrate the benefits of adopting compositional approach
polarity lexicons have been valuable resource for sentiment analysis and opinion mining there are number of such lexical resources available but it is often suboptimal to use them as is because general purpose lexical resources do not reflect domain specific lexical usage in this paper we propose novel method based on integer linear programming that can adapt an existing lexicon into new one to reflect the characteristics of the data more directly in particular our method collectively considers the relations among words and opinion expressions to derive the most likely polarity of each lexical item positive neutral negative or negator for the given domain experimental results show that our lexicon adaptation technique improves the performance of fine grained polarity classification
we look at model of two way nondeterministic finite automaton augmented with monotonic counters operating on inputs of the form $a_1^{i_1} \cdots a_n^{i_n}$ for some fixed $n$ and distinct symbols $a_1, \ldots, a_n$ where $i_1, \ldots, i_n$ are nonnegative integers our results concern the following presburger safety verification problem given machine state and presburger relation over counter values is there $i_1, \ldots, i_n$ such that the machine when started in its initial state on the left end of the input $a_1^{i_1} \cdots a_n^{i_n}$ with all counters initially zero reaches some configuration where the state is the given state and the counter values satisfy the given relation we give positive and negative results for different variations and generalizations of the model eg augmenting the model with reversal bounded counters discrete clocks etc in particular we settle an open problem in
context strategic release planning sometimes referred to as road mapping is an important phase of the requirements engineering process performed at product level it is concerned with selection and assignment of requirements in sequences of releases such that important technical and resource constraints are fulfilled objectives in this study we investigate which strategic release planning models have been proposed their degree of empirical validation their factors for requirements selection and whether they are intended for bespoke or market driven requirements engineering context methods in this systematic review number of article sources are used including compendex inspec ieee xplore acm digital library and springer link studies are selected after reading titles and abstracts to decide whether the articles are peer reviewed and relevant to the subject results twenty four strategic release planning models are found and mapped in relation to each other and taxonomy of requirements selection factors is constructed conclusions we conclude that many models are related to each other and use similar techniques to address the release planning problem we also conclude that several requirement selection factors are covered in the different models but that many methods fail to address factors such as stakeholder value or internal value moreover we conclude that there is need for further empirical validation of the models in full scale industry trials
web search engines help users find useful information on the world wide web www however when the same query is submitted by different users typical search engines return the same result regardless of who submitted the query generally each user has different information needs for his her query therefore the search result should be adapted to users with different information needs in this paper we first propose several approaches to adapting search results according to each user’s need for relevant information without any user effort and then verify the effectiveness of our proposed approaches experimental results show that search systems that adapt to each user’s preferences can be achieved by constructing user profiles based on modified collaborative filtering with detailed analysis of user’s browsing history in one day
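as a loose illustration of adapting results to a browsing-history profile, the sketch below builds a term-frequency profile from visited pages and re-ranks results by mixing the engine score with cosine similarity to the profile; the mixing weight and scoring are assumptions and not the paper's modified collaborative filtering

```python
# Re-ranking a result list against a user profile built from browsing history.
import math
from collections import Counter

def profile_from_history(visited_pages):
    return Counter(term for page in visited_pages for term in page.split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank(results, profile, alpha=0.5):
    """results: list of (doc_text, engine_score); higher score is better."""
    rescored = [(alpha * score + (1 - alpha) * cosine(Counter(doc.split()), profile), doc)
                for doc, score in results]
    return [doc for _, doc in sorted(rescored, reverse=True)]

profile = profile_from_history(["python pandas tutorial", "python plotting tips"])
results = [("snake care guide", 0.9), ("python data analysis guide", 0.8)]
print(rerank(results, profile))  # the data-analysis page moves up for this user
```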
an important trend in web information processing is the support of content based multimedia retrieval cbmr however the most prevailing paradigm of cbmr such as content based image retrieval content based audio retrieval etc is rather conservative it can only retrieve media objects of single modality with the rapid development of internet there is great deal of media objects of different modalities in the multimedia documents such as webpages which exhibit latent semantic correlation cross media retrieval as new multi media retrieval method is to retrieve all the related media objects with multi modalities via submitting query media object to the best of our knowledge this is the first study on how to speed up the cross media retrieval via indexes in this paper based on cross reference graph crg based similarity retrieval method we propose novel unified high dimensional indexing scheme called cindex which is specifically designed to effectively speedup the retrieval performance of the large cross media databases in addition we have conducted comprehensive experiments to testify the effectiveness and efficiency of our proposed method
we introduce an interactive tool which enables user to quickly assemble an architectural model directly over point cloud acquired from large scale scanning of an urban scene the user loosely defines and manipulates simple building blocks which we call smartboxes over the point samples these boxes quickly snap to their proper locations to conform to common architectural structures the key idea is that the building blocks are smart in the sense that their locations and sizes are automatically adjusted on the fly to fit well to the point data while at the same time respecting contextual relations with nearby similar blocks smartboxes are assembled through discrete optimization to balance between two snapping forces defined respectively by data fitting term and contextual term which together assist the user in reconstructing the architectural model from sparse and noisy point cloud we show that combination of the user’s interactive guidance and high level knowledge about the semantics of the underlying model together with the snapping forces allows the reconstruction of structures which are partially or even completely missing from the input
most of business intelligence applications use data warehousing solutions the star schema or its variants modelling these applications are usually composed of hundreds of dimension tables and multiple huge fact tables referential horizontal partitioning is one of physical design techniques adapted to optimize queries posed over these schemes in referential partitioning fact table can inherit the fragmentation characteristics from dimension table most of the existing works done on referential partitioning start from bag containing selection predicates defined on dimension tables partition each one based on its predicates and finally propagate their fragmentation schemes to the fact table this procedure gives all dimension tables the same probability to partition the fact table which is not always true in order to ensure high performance of the most costly queries the identification of relevant dimension table to referential partition fact table is crucial issue that should be addressed in this paper we first study the complexity of the problem of selecting dimension table used to partition fact table secondly we present strategies to perform their selection finally to validate our proposal we conduct intensive experimental studies using mathematical cost model and the obtained results are verified on oracleg dbms
contract based property checkers hold the potential for precise scalable and incremental reasoning however it is difficult to apply such checkers to large program modules because they require programmers to provide detailed contracts including an interface specification module invariants and internal specifications we argue that given suitably rich assertion language modest effort suffices to document the interface specification and the module invariants however the burden of providing internal specifications is still significant and remains deterrent to the use of contract based checkers therefore we consider the problem of intra module inference which aims to infer annotations for internal procedures and loops given the interface specification and the module invariants we provide simple and scalable techniques to search for broad class of desired internal annotations comprising quantifiers and boolean connectives guided by the module specification we have validated our ideas by building prototype verifier and using it to verify several properties on windows device drivers with zero false alarms and small annotation overhead these drivers are complex they contain thousands of lines and use dynamic data structures such as linked lists and arrays our technique significantly improves the soundness precision and coverage of verification of these programs compared to earlier techniques
shape optimization is problem which arises in numerous computer vision problems such as image segmentation and multiview reconstruction in this paper we focus on certain class of binary labeling problems which can be globally optimized both in spatially discrete setting and in spatially continuous setting the main contribution of this paper is to present quantitative comparison of the reconstruction accuracy and computation times which allows to assess some of the strengths and limitations of both approaches we also present novel method to approximate length regularity in graph cut based framework instead of using pairwise terms we introduce higher order terms these allow to represent more accurate discretization of the norm in the length term
in business applications such as direct marketing decision makers are required to choose the action which best maximizes utility function cost sensitive learning methods can help them achieve this goal in this paper we introduce pessimistic active learning pal pal employs novel pessimistic measure which relies on confidence intervals and is used to balance the exploration exploitation trade off in order to acquire an initial sample of labeled data pal applies orthogonal arrays of fractional factorial design pal was tested on ten datasets using decision tree inducer comparison of these results to those of other methods indicates pal’s superiority
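in the spirit of the pessimistic measure described above, the sketch below scores each unlabeled candidate by the lower bound of a confidence interval on its estimated utility so that poorly explored regions are not over-valued; the neighbourhood estimate, the 95 percent interval and the synthetic data are assumptions for illustration and not pal's actual rule

```python
# Confidence-interval-based, pessimistic acquisition scores for active learning.
import numpy as np

def pessimistic_scores(unlabeled, labeled_x, labeled_utility, k=5, z=1.96):
    scores = []
    for x in unlabeled:
        dists = np.linalg.norm(labeled_x - x, axis=1)
        neighbours = labeled_utility[np.argsort(dists)[:k]]   # utilities of nearest labeled points
        mean = neighbours.mean()
        half_width = z * neighbours.std(ddof=1) / np.sqrt(len(neighbours))
        scores.append(mean - half_width)                      # pessimistic lower bound
    return np.array(scores)

rng = np.random.default_rng(1)
labeled_x = rng.normal(size=(40, 3))
labeled_utility = labeled_x[:, 0] * 10.0                      # utility driven by the first feature
unlabeled = rng.normal(size=(10, 3))
pick = unlabeled[np.argmax(pessimistic_scores(unlabeled, labeled_x, labeled_utility))]
print(pick)                                                    # candidate chosen for acquisition
```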
peer to peer pp paradigm has recently gained tremendous attraction and is widely used for content distribution and sharing the future multimedia communication applications have to support the user’s needs the terminal capabilities the content specification and the underlying networking technologies they should be network aware topology aware and end user centric thus in this paper we use the characteristics of the object based encoding scheme and pp network topology to propose adaptive content delivery architecture for pp networks we propose an efficient mechanism for transmission of real time content over pp networks called poems pp object based adaptive multimedia streaming this object based audio visual quality adaptive mechanism over pp networks is media aware network aware and user centric that is carried out through selection of appropriate sending peers willing to participate in the streaming mechanism organization of sending peers by constructing an overlay network to facilitate content delivery and adaptation dynamicity management of peers when some peer enters or leaves the system to maintain an acceptable level of perceived video quality and ensuring the end to end qos quality of services by orchestrating the overall streaming mechanism the obtained results demonstrate that combining content adaptation using object based encoding and advance network aware peers selection based on peer monitoring leads to intelligent efficient and large scale support of multimedia services over complex network architectures
extended subwords and the matrix register file mrf are two micro architectural techniques that address some of the limitations of existing simd architectures extended subwords are wider than the data stored in memory specifically for every byte of data stored in memory there are four extra bits in the media register file this avoids the need for data type conversion instructions the mrf is register file organization that provides both conventional row wise as well as column wise access to the register file in other words it allows to view the register file as matrix in which corresponding subwords in different registers corresponds to column of the matrix it was introduced to accelerate matrix transposition which is very common operation in multimedia applications in this paper we show that the mrf is very versatile since it can also be used for other permutations than matrix transposition specifically it is shown how it can be used to provide efficient access to strided data as is needed in eg color space conversion furthermore it is shown that special purpose instructions spis such as the sum of absolute differences sad instruction have limited usefulness when extended subwords and few general simd instructions that we propose are supported for the following reasons first when extended subwords are supported the sad instruction provides only relatively small performance improvement second the sad instruction processes bit subwords only which is not sufficient for quarter pixel resolution nor for cost functions used in image and video retrieval results obtained by extending the simplescalar toolset show that the proposed techniques provide speedup of up to over the mmx architecture the results also show that using at most extra media registers yields an additional performance improvement ranging from to
haskell’s popularity has driven the need for ever more expressive type system features most of which threaten the decidability and practicality of damas milner type inference one such feature is the ability to write functions with higher rank types that is functions that take polymorphic functions as their arguments complete type inference is known to be undecidable for higher rank impredicative type systems but in practice programmers are more than willing to add type annotations to guide the type inference engine and to document their code however the choice of just what annotations are required and what changes are required in the type system and its inference algorithm has been an ongoing topic of research we take as our starting point lambda calculus proposed by odersky and läufer their system supports arbitrary rank polymorphism through the exploitation of type annotations on lambda bound arguments and arbitrary sub terms though elegant and more convenient than some other proposals odersky and läufer’s system requires many annotations we show how to use local type inference invented by pierce and turner to greatly reduce the annotation burden to the point where higher rank types become eminently usable higher rank types have very modest impact on type inference we substantiate this claim in very concrete way by presenting complete type inference engine written in haskell for traditional damas milner type system and then showing how to extend it for higher rank types we write the type inference engine using monadic framework it turns out to be particularly compelling example of monads in action the paper is long but is strongly tutorial in style although we use haskell as our example source language and our implementation language much of our work is directly applicable to any ml like functional language
storyboards grid layout of thumbnail images as surrogates representing video have received much attention in video retrieval interfaces and published studies through the years and work quite well as navigation aids and as facilitators for shot based information retrieval when the information need is tied less to shots and requires inspection of stories and across stories other interfaces into the video data have been demonstrated to be quite useful these interfaces include scatterplots for timelines choropleth maps dynamic query preview histograms and named entity relation diagrams representing sets of hundreds or thousands of video stories one challenge for interactive video search is to move beyond support for fact finding and also address broader longer term search activities of learning analysis synthesis and discovery examples are shown for broadcast news and life oral histories drawing from empirically collected data showing how such interfaces can promote improved exploratory search this paper surveys and reflects on body of informedia interface work dealing with news folding in for the first time an examination of exploratory transactions with an oral history corpus
recommender systems provide users with pertinent resources according to their context and their profiles by applying statistical and knowledge discovery techniques this paper describes new approach of generating suitable recommendations based on the active user’s navigation stream by considering long and short distance resources in the history with tractable model the skipping based recommender we propose uses markov models inspired from the ones used in language modeling while integrating skipping techniques to handle noise during navigation weighting schemes are also used to alleviate the importance of distant resources this recommender also has the characteristic of being anytime it has been tested on browsing dataset extracted from intranet logs provided by french bank results show that the use of exponential decay weighting schemes when taking into account non contiguous resources to compute recommendations enhances the accuracy moreover the skipping variant we propose provides high accuracy while being less complex than state of the art variants
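a minimal sketch of the skipping plus decay idea follows: every resource seen up to a few steps before resource r votes for r with an exponentially decaying weight, and a new navigation stream is scored the same way; the skip limit, decay value and toy sessions are illustrative assumptions, not the paper's tuned model

```python
# Skipping, distance-weighted next-resource recommender.
from collections import defaultdict

class SkippingRecommender:
    def __init__(self, max_skip=3, decay=0.5):
        self.max_skip, self.decay = max_skip, decay
        self.votes = defaultdict(lambda: defaultdict(float))  # antecedent -> next -> weight

    def train(self, sessions):
        for session in sessions:
            for i, nxt in enumerate(session):
                for d in range(1, min(i, self.max_skip) + 1):
                    # a resource d steps back still counts, just with less weight
                    self.votes[session[i - d]][nxt] += self.decay ** (d - 1)

    def recommend(self, recent, k=2):
        scores = defaultdict(float)
        for d, res in enumerate(reversed(recent[-self.max_skip:]), start=1):
            for nxt, w in self.votes[res].items():
                scores[nxt] += (self.decay ** (d - 1)) * w
        return sorted(scores, key=scores.get, reverse=True)[:k]

rec = SkippingRecommender()
rec.train([["home", "rates", "loans"], ["home", "news", "loans"], ["home", "rates", "contact"]])
print(rec.recommend(["home", "rates"]))  # loans and contact score highest here
```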
difficulties in reasoning about functional correctness and relational properties of object oriented programs are reviewed an approach using auxiliary state is briefly described with emphasis on the author’s work some near term challenges are sketched
this paper presents an approach to specifying the different types of global reduction operations without laboriously coding the source code using the traditional textual approach we use the visual environment provided by the system called active knowledge studio aks to specify modify view or run the specification as background we give an overview of the aks system and the main technique being employed that is using cyberfilms in building programs although the main focus of this paper is to show how to override at the lowest level the default parameters supplied by the system we also include sufficient background about the higher levels so that the reader can relate the lowest level with the other higher levels in the hierarchy of specification in general this paper discusses how to specify global reduction operations using language of micro icons and shows how the visual programming environment supports the manipulation with these micro icons
this paper presents an automatic registration system for aligning combined range intensity scan pairs the overall approach is designed to handle several challenges including extensive structural changes large viewpoint differences repetitive structure illumination differences and flat regions the technique is split into three stages initialization refinement and verification during initialization intensity keypoints are backprojected into the scans and matched to form candidate transformations each based on single match we explore methods of improving this image based matching using the range data for refinement we extend the dual bootstrap icp algorithm for alignment of range data and introduce novel geometric constraints formed by backprojected image based edgel features the verification stage determines if refined transformation is correct we treat verification as classification problem based on accuracy stability and novel boundary alignment measure experiments with scan pairs show both the overall effectiveness of the algorithm and the importance of its component techniques
temporal information is an important attribute of topic and topic usually exists in limited period therefore many researchers have explored the utilization of temporal information in topic detection and tracking tdt they use either story’s publication time or temporal expressions in text to derive temporal relatedness between two stories or story and topic however past research neglects the fact that people tend to express time with different granularities as time lapses based on careful investigation of temporal information in news streams we propose new strategy with time granularity reasoning for utilizing temporal information in topic tracking set of topic times which as whole represent the temporal attribute of topic are distinguished from others in the given on topic stories the temporal relatedness between story and topic is then determined by the highest coreference level between each time in the story and each topic time where the coreference level between test time and topic time is inferred from the two times themselves their granularities and the time distance between the topic time and the publication time of the story where the test time appears furthermore the similarity value between an incoming story and topic that is the likelihood that story is on topic can be adjusted only when the new story is both temporally and semantically related to the target topic experiments on two different tdt corpora show that our proposed method could make good use of temporal information in news stories and it consistently outperforms the baseline centroid algorithm and other algorithms which consider temporal relatedness
chip multi processors have emerged as one of the most effective uses of the huge number of transistors available today and in the future but questions remain as to the best way to leverage cmps to accelerate single threaded applications previous approaches rely on significant speculation to accomplish this goal our proposal nxa is less speculative than previous proposals relying heavily on software to guarantee thread correctness though still allowing parallelism in the presence of ambiguous dependences it divides single thread of execution into multiple using the master worker paradigm where some set of master threads execute code that spawns tasks for other worker threads the master threads generally consist of performance critical instructions that can prefetch data compute critical control decisions or compute performance critical dataflow slices this prevents non critical instructions from competing with critical instructions for processor resources allowing the critical thread and thus the workload to complete faster empirical results from performance simulation show improvement in performance on way cmp machine demonstrating that software controlled multithreading can indeed provide benefit in the presence of hardware support
there has been increased interest in the use of simulated queries for evaluation and estimation purposes in information retrieval however there are still many unaddressed issues regarding their usage and impact on evaluation because their quality in terms of retrieval performance is unlike real queries in this paper we focus on methods for building simulated known item topics and explore their quality against real known item topics using existing generation models as our starting point we explore factors which may influence the generation of the known item topic informed by this detailed analysis on six european languages we propose model with improved document and term selection properties showing that simulated known item topics can be generated that are comparable to real known item topics this is significant step towards validating the potential usefulness of simulated queries for evaluation purposes and because building models of querying behavior provides deeper insight into the querying process so that better retrieval mechanisms can be developed to support the user
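a hedged sketch of the general idea of generating a simulated known-item query follows: pick a target document, choose a query length, then draw each term either from the document or from a background collection model as noise; the mixture weight, length distribution and toy collection are illustrative choices, not the paper's refined document and term selection model

```python
# Generating a simulated known-item query from a target document.
import random

def simulate_known_item_query(collection, noise=0.2, mean_len=3, rng=random.Random(7)):
    target_id = rng.randrange(len(collection))
    doc_terms = collection[target_id].split()
    background = [t for doc in collection for t in doc.split()]
    length = max(1, int(rng.gauss(mean_len, 1)))
    query = []
    for _ in range(length):
        pool = background if rng.random() < noise else doc_terms
        # sampling uniformly from the pool weights terms by their frequency
        query.append(rng.choice(pool))
    return target_id, " ".join(query)

docs = ["budget meeting minutes march", "holiday photos from the lake trip",
        "quarterly sales report and forecast"]
print(simulate_known_item_query(docs))   # (target document id, simulated query)
```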
we give linear time algorithm for computing the edge search number of cographs thereby proving that this problem can be solved in polynomial time on this graph class with our result the knowledge on graph searching of cographs is now complete node mixed and edge search numbers of cographs can all be computed efficiently furthermore we are one step closer to computing the edge search number of permutation graphs
security is important for many sensor network applications particularly harmful attack against sensor and ad hoc networks is known as the sybil attack where node illegitimately claims multiple identities this paper systematically analyzes the threat posed by the sybil attack to wireless sensor networks we demonstrate that the attack can be exceedingly detrimental to many important functions of the sensor network such as routing resource allocation misbehavior detection etc we establish classification of different types of the sybil attack which enables us to better understand the threats posed by each type and better design countermeasures against each type we then propose several novel techniques to defend against the sybil attack and analyze their effectiveness quantitatively
application programmer’s interfaces give access to domain knowledge encapsulated in class libraries without providing the appropriate notation for expressing domain composition since object oriented languages are designed for extensibility and reuse the language constructs are often sufficient for expressing domain abstractions at the semantic level however they do not provide the right abstractions at the syntactic level in this paper we describe metaborg method for providing concrete syntax for domain abstractions to application programmers the method consists of embedding domain specific languages in general purpose host language and assimilating the embedded domain code into the surrounding host code instead of extending the implementation of the host language the assimilation phase implements domain abstractions in terms of existing apis leaving the host language undisturbed indeed metaborg can be considered method for promoting apis to the language level the method is supported by proven and available technology ie the syntax definition formalism sdf and the program transformation language and toolset stratego xt we illustrate the method with applications in three domains code generation xml generation and user interface construction
this paper presents algebraic identities and algebraic query optimization for parametric model for temporal databases the parametric model has several features not present in the classical model in this model key is explicitly designated with relation and an operator is available to change the key the algebra for the parametric model is three sorted it includes relational expressions that evaluate to relations domain expressions that evaluate to time domains and boolean expressions that evaluate to true or false the identities in the parametric model are classified as weak identities and strong identities weak identities in this model are largely counterparts of the identities in classical relational databases rather than establishing weak identities from scratch meta inference mechanism introduced in the paper allows weak identities to be induced from their respective classical counterpart on the other hand the strong identities will be established from scratch an algorithm is presented for algebraic optimization to transform query to an equivalent query that will execute more efficiently
this paper presents hierarchical two layered trust management framework for very large scale distributed computing utilities where public resources provide majority of the resource capacity the dynamic nature of these utility networks introduces challenging management and security issues due to behavior turnabout maliciousness and diverse policy enforcement the trust management approach offers interesting answers to such issues in our framework the lower layer computes local reputation for peers within their domain based on individual contribution while the upper layer combines the local reputation with that of its domain’s as perceived by other domains to compute the peer’s global trust simulation results show that the hierarchical scheme is more scalable highly robust in hostile conditions and capable of creating rapid trust estimates features of the framework include ability to carry forward local behavior trends autonomous domain based policing high cohesiveness with the resource management system and securely exposing the trust evaluation operations to peers ie the subjects of the evaluation process detailed analysis of the threats and attacks that the framework could be subjected to is presented along with countermeasures against the attacks
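as an illustrative sketch of the two layers, the code below keeps a local reputation per peer from transaction feedback and blends it with how other domains rate the peer's home domain to obtain a global trust value; the update rule, blending weight and default values are assumptions and not the framework's actual formulas

```python
# Two-layer trust sketch: local peer reputation plus domain-level standing.
class DomainTrust:
    def __init__(self, name, alpha=0.7):
        self.name, self.alpha = name, alpha
        self.local = {}              # peer -> local reputation in [0, 1]
        self.domain_opinion = {}     # other domain name -> opinion of this domain

    def feedback(self, peer, satisfied, lr=0.1):
        """Lower layer: update a peer's local reputation from one transaction."""
        old = self.local.get(peer, 0.5)
        self.local[peer] = (1 - lr) * old + lr * (1.0 if satisfied else 0.0)

    def domain_reputation(self):
        """Upper layer: how other domains perceive this domain, on average."""
        return (sum(self.domain_opinion.values()) / len(self.domain_opinion)
                if self.domain_opinion else 0.5)

    def global_trust(self, peer):
        # peer behaviour weighted against the standing of its home domain
        return self.alpha * self.local.get(peer, 0.5) + (1 - self.alpha) * self.domain_reputation()

d = DomainTrust("labA")
for ok in [True, True, False, True]:
    d.feedback("peer1", ok)
d.domain_opinion.update({"labB": 0.8, "labC": 0.6})
print(round(d.global_trust("peer1"), 3))
```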
in this paper we model the network throughput gains of two types of wireless network coding nc schemes including the conventional nc and the analog nc schemes over the traditional non nc transmission scheduling schemes in multi hop multi channel and multi radio wireless ad hoc networks in particular we first show that the network throughput gains of the conventional nc and analog nc are and respectively for the way relay networks where second we propose an analytical framework for deriving the network throughput gain of the wireless nc schemes over general wireless network topologies by solving the problem of maximizing the network throughput subject to the fairness requirements under our proposed framework we quantitatively analyze the network throughput gains of these two types of wireless nc schemes for variety of wireless ad hoc network topologies with different routing strategies finally we develop heuristic joint link scheduling channel assignment and routing algorithm that aims at approaching the optimal solution to the optimization problem under our proposed framework
we present web based diary study on location based search behavior using mobile search engine to capture users location based search behavior in ubiquitous setting we use web based diary tool that collects users detailed mobile search activity their location and diary entries this method enables us to capture users explicit behavior query made their implicit intention motivation behind search and the context spatial temporal and social in which the search was carried out the results of the study show that people tend to stick closely to regularly used routes and regularly visited places eg home and work we also found that most location based searches are conducted while in the presence of others we summarize our findings and offer suggestions to improve location based search by using features such as location based service mash ups
abstract we present two proactive resource allocation algorithms called dpr and lpr for satisfying the timeliness requirements of real time tasks in asynchronous real time distributed systems the algorithms are proactive in the sense that they allow application specified and user triggered resource allocation by allowing anticipated task workloads to be specified for future time intervals when proactively triggered the algorithms allocate resources to maximize the aggregate deadline satisfied ratio for the future time interval under the anticipated workload while dpr uses the earliest deadline first scheduling algorithm as the underlying algorithm for process scheduling and packet scheduling lpr uses modified least laxity first scheduling algorithm we show that lpr is computationally more expensive than dpr further our experimental studies reveal that lpr yields higher deadline satisfied ratio than dpr
as technology scales the delay uncertainty caused by process variations has become increasingly pronounced in deep submicron designs as result paradigm shift from deterministic to statistical design methodology at all levels of the design hierarchy is inevitable in this paper we propose variation aware task allocation and scheduling algorithm for multiprocessor system on chip mpsoc architectures to mitigate the impact of parameter variations new design metric called performance yield and defined as the probability of the assigned schedule meeting the predefined performance constraints is used to guide the task allocation and scheduling procedure an efficient yield computation method for task scheduling complements and significantly improves the effectiveness of the proposed variation aware scheduling algorithm experimental results show that our variation aware scheduler achieves significant yield improvements on average and yield improvements over worst case and nominal case deterministic schedulers respectively can be obtained across the benchmarks by using the proposed variation aware scheduler
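to make the performance yield metric concrete, the sketch below estimates by monte carlo sampling the probability that a candidate schedule's makespan meets the deadline given per-task delay distributions under variation; it assumes independent tasks with no inter-processor precedence and made-up distributions, and the paper itself uses a more efficient yield computation than plain sampling

```python
# Monte Carlo estimate of the performance yield of a task schedule.
import numpy as np

def yield_of_schedule(schedule, deadline, samples=20000, rng=np.random.default_rng(0)):
    """schedule: processor -> list of (mean_delay, std_delay) for its assigned tasks."""
    makespan = np.zeros(samples)
    for tasks in schedule.values():
        proc_time = np.zeros(samples)
        for mean, std in tasks:
            proc_time += rng.normal(mean, std, samples)   # variation-affected task delay
        makespan = np.maximum(makespan, proc_time)        # the slowest processor decides
    return float(np.mean(makespan <= deadline))           # probability of meeting the deadline

schedule_a = {"p0": [(4.0, 0.5), (3.0, 0.4)], "p1": [(6.0, 0.8)]}
schedule_b = {"p0": [(4.0, 0.5)], "p1": [(6.0, 0.8), (3.0, 0.4)]}
print(yield_of_schedule(schedule_a, deadline=8.0))
print(yield_of_schedule(schedule_b, deadline=8.0))        # worse balance, lower yield
```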
when an image is filtered with gaussian of width and is considered as an extra dimension the image is extended to gaussian scale space gss image in earlier work it was shown that the gss image contains an intensity based hierarchical structure that can be represented as binary ordered rooted tree key elements in the construction of the tree are iso intensity manifolds and scale space saddles scale space saddle is critical point in scale space when it connects two different parts of an iso intensity manifold it is called dividing otherwise it is called void each dividing scale space saddle is connected to an extremum in the original image via curve in scale space containing critical points using the nesting of the iso intensity manifolds in the gss image and the dividing scale space saddles each extremum is connected to another extremum in the tree structure the dividing scale space saddles form the connecting elements in the hierarchy they are the nodes of the tree the extrema of the image form the leaves while the critical curves are represented as the edges to identify the dividing scale space saddles global investigation of the scale space saddles and the iso intensity manifolds through them is needed in this paper an overview of the situations that can occur is given in each case it is shown how to distinguish between void and dividing scale space saddles furthermore examples are given and the difference between selecting the dividing and the void scale space saddles is shown also relevant geometric properties of gss images are discussed as well as their implications for algorithms used for the tree extraction as main result it is not necessary to search through the whole gss image to find regions related to each relevant scale space saddle this yields considerable reduction in complexity and computation time as shown in two examples
enabled by the continuous advancement in fabrication technology present day synchronous microprocessors include more than million transistors and have clock speeds well in excess of the ghz mark distributing low skew clock signal in this frequency range to all areas of large chip is task of growing complexity as solution to this problem designers have recently suggested the use of frequency islands that are locally clocked and externally communicate with each other using mixed clock communication schemes such design style fits nicely with the recently proposed concept of voltage islands that in addition can potentially enable fine grain dynamic power management by simultaneous voltage and frequency scaling this paper proposes design exploration framework for application adaptive multiple clock processors which provides the means for analyzing and identifying the right interdomain communication scheme and the proper granularity for the choice of voltage frequency islands in case of superscalar out of order processors in addition the presented design exploration framework allows for comparative analysis of newly proposed or already published application driven dynamic power management strategies such design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power performance tradeoffs in multiple clock high end processors
the assumptions of uniformity and independence of attribute values in file uniformity of queries constant number of records per block and random placement of qualifying records among the blocks of file are frequently used in database performance evaluation studies in this paper we show that these assumptions often result in predicting only an upper bound of the expected system cost we then discuss the implications of nonrandom placement nonuniformity and dependencies of attribute values on database design and database performance evaluation
one of the primary issues confronting xml message brokers is the difficulty associated with processing large set of continuous xpath queries over incoming xml streams this paper proposes novel system designed to present an effective solution to this problem the proposed system transforms multiple xpath queries before their run time into new data structure called an xp table by sharing their common constraints an xp table is matched with stream relation sr transformed from target xml stream by sax parser this arrangement is intended to minimize the run time workload of continuous query processing in addition an early query termination strategy is proposed as an improved alternative to the basic approach it optimizes query processing by arranging the evaluation sequence of the member lists of an xp table adaptively and offers increased efficiency especially in cases of low selectivity system performance is estimated and verified through variety of experiments including comparisons with previous approaches such as yfilter and lazydfa the proposed system is practically linear scalable and stable for evaluating set of xpath queries in continuous and timely fashion
in many applications wireless ad hoc networks are formed by devices belonging to independent users therefore challenging problem is how to provide incentives to stimulate cooperation in this paper we study ad hoc games the routing and packet forwarding games in wireless ad hoc networks unlike previous work which focuses either on routing or on forwarding this paper investigates both routing and forwarding we first uncover an impossibility result there does not exist protocol such that following the protocol to always forward others traffic is dominant action then we define novel solution concept called cooperation optimal protocols we present corsac cooperation optimal protocol consisting of routing protocol and forwarding protocol the routing protocol of corsac integrates vcg with novel cryptographic technique to address the challenge in wireless ad hoc networks that link’s cost ie its type is determined by two nodes together corsac also applies efficient cryptographic techniques to design forwarding protocol to enforce the routing decision such that fulfilling the routing decision is the optimal action of each node in the sense that it brings the maximum utility to the node additionally we extend our framework to practical radio propagation model where transmission is successful with probability we evaluate our protocols using simulations our evaluations demonstrate that our protocols provide incentives for nodes to forward packets
while millions of dollars have been invested in information technologies to improve intelligence information sharing among law enforcement agencies at the federal tribal state and local levels there remains hesitation to share information between agencies this lack of coordination hinders the ability to prevent and respond to crime and terrorism work to date has not produced solutions nor widely accepted paradigms for understanding the problem therefore to enhance the current intelligence information sharing services between government entities in this interdisciplinary research we have identified three major areas of influence technical social and legal furthermore we have developed preliminary model and theory of intelligence information sharing through literature review experience and interviews with practitioners in the field this model and theory should serve as basic conceptual framework for further academic work and lead to further investigation and clarification of the identified factors and the degree of impact they exert on the system so that actionable solutions can be identified and implemented
user generated content has been fueling an explosion in the amount of available textual data in this context it is also common for users to express either explicitly through numerical ratings or implicitly their views and opinions on products events etc this wealth of textual information necessitates the development of novel searching and data exploration paradigms in this paper we propose new searching model similar in spirit to faceted search that enables the progressive refinement of keyword query result however in contrast to faceted search which utilizes domain specific and hard to extract document attributes the refinement process is driven by suggesting interesting expansions of the original query with additional search terms our query driven and domain neutral approach employs surprising word co occurrence patterns and optionally numerical user ratings in order to identify meaningful top query expansions and allow one to focus on particularly interesting subset of the original result set the proposed functionality is supported by framework that is computationally efficient and nimble in terms of storage requirements our solution is grounded on convex optimization principles that allow us to exploit the pruning opportunities offered by the natural top formulation of our problem the performance benefits offered by our solution are verified using both synthetic data and large real data sets comprised of blog posts
in the paper we discuss the problem of data integration in pp environment in such setting each peer stores schema of its local data mappings between the schema and schemas of some other peers peer’s partners and schema constraints the goal of the integration is to answer queries formulated against arbitrarily chosen peers the answer consists of data stored in the queried peer as well as data of its direct and indirect partners we focus on defining and using mappings schema constraints query propagation across the pp system and query reformulation in such scenario special attention is paid to discovering missing values using schema constraints and to reconciling inconsistent data using reliability levels assigned to the sources of data the discussed approach has been implemented in sixpp system semantic integration of xml data in pp environment
high performance distributed computing systems require high performance communication systems channels and hierarchical channels address this need by permitting high level of concurrency like non fifo channels while retaining the simplicity of fifo channels critical to the design and proof of many distributed algorithms in this paper we present counter based implementations for channels and hierarchical channels using message augmentation appending control information to message these implementations guarantee that no messages are unnecessarily delayed at the receiving end
in modern programming language scoping rules determine the visibility of names in various regions of program in this work we examine the idea of allowing an application developer to customize the scoping rules of its underlying language we demonstrate that such an ability can serve as the cornerstone of security architecture for dynamically extensible systems run time module system isomod is proposed for the java platform to facilitate software isolation core application may create namespaces dynamically and impose arbitrary name visibility policies ie scoping rules to control whether name is visible to whom it is visible and in what way it can be accessed because isomod exercises name visibility control at load time loaded code runs at full speed furthermore because isomod access control policies are maintained separately they evolve independently from core application code in addition the isomod policy language provides declarative means for expressing very general form of visibility constraints not only can the isomod policy language simulate sizable subset of permissions in the java security architecture it does so with policies that are robust to changes in software configurations the isomod policy language is also expressive enough to completely encode capability type system known as discretionary capability confinement in spite of its expressiveness the isomod policy language admits an efficient implementation strategy name visibility control in the style of isomod is therefore lightweight access control mechanism for java style language environments
current usability evaluation methods are essentially holistic in nature however engineers that apply component based software engineering approach might also be interested in understanding the usability of individual parts of an interactive system this paper examines the efficiency dimension of usability by describing method which engineers can use to test empirically and objectively the physical interaction effort to operate components in single device the method looks at low level events such as button clicks and attributes the physical effort associated with these interaction events to individual components in the system this forms the basis for engineers to prioritise their improvement effort the paper discusses face validity content validity criterion validity and construct validity of the method the discussion is set within the context of four usability tests in which users participated to evaluate the efficiency of four different versions of mobile phone the results of the study show that the method can provide valid estimation of the physical interaction event effort users made when interacting with specific part of device
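As a minimal illustration of the event-attribution step (not the paper's tool), the sketch below assigns logged low-level interaction events to the component that received them and totals a per-component physical effort score; the event types, component names and effort weights are invented assumptions.

```python
# attributing logged interaction events to device components and totalling effort
from collections import defaultdict

# assumed effort weights per low-level event type (arbitrary example values)
EFFORT = {"button_press": 1.0, "scroll_step": 0.5, "long_press": 2.0}

def effort_per_component(event_log):
    """event_log: iterable of (component, event_type) pairs captured during a task."""
    totals = defaultdict(float)
    for component, event_type in event_log:
        totals[component] += EFFORT.get(event_type, 1.0)
    return dict(totals)

log = [("keypad", "button_press"), ("keypad", "button_press"),
       ("menu", "scroll_step"), ("menu", "button_press")]
print(effort_per_component(log))   # {'keypad': 2.0, 'menu': 1.5}
```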
we give the first systematic investigation of the design space of worm defense system strategies we accomplish this by providing taxonomy of defense strategies by abstracting away implementation dependent and approach specific details and concentrating on the fundamental properties of each defense category our taxonomy and analysis reveals the key parameters for each strategy that determine its effectiveness we provide theoretical foundation for understanding how these parameters interact as well as simulation based analysis of how these strategies compare as worm defense systems finally we offer recommendations based upon our taxonomy and analysis on which worm defense strategies are most likely to succeed in particular we show that hybrid approach combining proactive protection and reactive antibody defense is the most promising approach and can be effective even against the fastest worms such as hitlist worms thus we are the first to demonstrate with theoretic and empirical models which defense strategies will work against the fastest worms such as hitlist worms
this paper describes new method of executing software program on an fpga for embedded systems rather than combine reconfigurable logic with microprocessor core this method uses new technique to compile java programs directly to special purpose processors that are called flowpaths flowpaths allow software program to be executed in such way that one low capacity fpga is executing piece of the program while second low capacity fpga is being dynamically reconfigured to take over execution after the first one has completed its task in this fashion the program is executed by continuously alternating between the two chips this process allows large programs to be implemented on limited hardware resources and at higher clock frequencies on an fpga the sequencer and rules for partitioning flowpath are presented and the method is illustrated using genetic algorithm compiled from java directly to flowpaths using flowpath compiler designed in our lab the genetic algorithm flowpath requires of an expensive high density virtex xcv with maximum frequency of mhz this genetic algorithm would have required of the smaller more reasonably priced virtex however by splitting flowpaths as presented in this paper the algorithm can be implemented with an average maximum clock frequency of mhz using only two low capacity inexpensive virtex xcv chips the smallest capacity of the virtex family the split flowpaths require one quarter of the clock cycles used by the jstamp microcontroller therefore instead of using an expensive high density fpga the proposed design illustrates how genetic algorithm can be implemented using only pair of inexpensive chips that have only equivalent gates each low cost sequencer and mb of memory to store bit files the overall time performance depends on the speed of the fpga reconfiguration process
with this work we aim to make three fold contribution we first address the issue of supporting efficiently queries over string attributes involving prefix suffix containment and equality operators in large scale data networks our first design decision is to employ distributed hash tables dhts for the data network’s topology harnessing their desirable properties our next design decision is to derive dht independent solutions treating dht as black box second we exploit this infrastructure to develop efficient content based publish subscribe systems the main contribution here are algorithms for the efficient processing of queries subscriptions and events publications specifically we show that our subscription processing algorithms require logn messages for node network and our event processing algorithms require logn messages with being the average string length third we develop algorithms for optimizing the processing of multi dimensional events involving several string attributes further to our analysis we provide simulation based experiments showing promising performance results in terms of number of messages required bandwidth load balancing and response times
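A hedged illustration of the DHT-as-black-box idea only: strings are indexed under a bounded number of their prefixes and of the prefixes of their reversal, so prefix and suffix queries become plain DHT lookups. The toy in-memory DHT, the key scheme and the length cap are assumptions; the paper's algorithms and their message-cost analysis are not reproduced here.

```python
# DHT treated as a black-box key -> set store; prefix/suffix lookups via key design
from collections import defaultdict

class ToyDHT:                                   # stand-in for any DHT exposing put/get
    def __init__(self): self.store = defaultdict(set)
    def put(self, key, value): self.store[key].add(value)
    def get(self, key): return self.store.get(key, set())

def index_string(dht, s, max_len=8):
    for i in range(1, min(len(s), max_len) + 1):
        dht.put(("prefix", s[:i]), s)           # supports prefix and equality lookups
        dht.put(("suffix", s[::-1][:i]), s)     # reversed prefixes support suffix lookups

def prefix_query(dht, p): return dht.get(("prefix", p))
def suffix_query(dht, p): return dht.get(("suffix", p[::-1]))

dht = ToyDHT()
for s in ["alpha", "beta", "alphabet"]:
    index_string(dht, s)
print(prefix_query(dht, "alp"))   # contains 'alpha' and 'alphabet'
print(suffix_query(dht, "ta"))    # contains 'beta'
```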
modern embedded processors with dedicated address generation unit support memory accesses through auto increment decrement addressing mode the auto increment decrement mode if properly utilized can save address arithmetic instructions reduce static and dynamic memory footprint of the program and speed up the execution as well liao categorized this problem as simple offset assignment soa and general offset assignment goa which involves storage layout of variables and assignment of address registers respectively proposing several heuristic solutions this article proposes new direction for investigating the solution space of the problem the general idea zhuang is to perform simplification of the underlying access graph through coalescence of the memory locations of program variables comprehensive framework is proposed including coalescence based offset assignment and post pre optimization variables not interfering with others not simultaneously live at any program point can be coalesced into the same memory location coalescing allows simplifications of the access graph yielding better soa solutions it also reduces the address register pressure to such low values that some goa solutions become optimal moreover it can reduce the memory footprint both statically and at runtime for stack variables our second optimization post pre optimization considers both post and pre modification mode for optimizing code across basic blocks which makes it useful making use of both addressing modes further reduces soa goa cost and our post pre optimization phase is optimal in selecting post or pre mode after variable offsets have been determined we have shown the advantages of our framework over previous approaches to capture more opportunities to reduce both stack size and soa goa cost leading to more speedup
trademark image retrieval tir branch of content based image retrieval cbir is playing an important role in multimedia information retrieval this paper proposes an effective solution for tir by combining shape description and feature matching we first present an effective shape description method which includes two shape descriptors second we propose an effective feature matching strategy to compute the dissimilarity value between the feature vectors extracted from images finally we combine the shape description method and the feature matching strategy to realize our solution we conduct large number of experiments on standard image set to evaluate our solution and the existing solutions by comparison of their experimental results we can see that the proposed solution outperforms existing solutions for the widely used performance metrics
this paper describes package for parallel steady state stochastic simulation that was designed to overcome problems caused by long simulation times experienced in our ongoing research in performance evaluation of high speed and integrated services communication networks while maintaining basic statistical rigors of proper analysis of simulation output data the package named akaroa accepts ordinary nonparallel simulation programs and all further stages of stochastic simulation should be transparent for users the package employs new method of sequential estimation for the multiple replications in parallel scenario all basic functions including the transformation of originally nonparallel simulators into ones suitable for parallel execution control of the precision of estimates and stopping of parallel simulation processes when the required precision of the overall steady state estimates is achieved are automated the package can be used on multiprocessor systems and or heterogeneous computer networks involving an arbitrary number of processors the design issues architecture and implementation of akaroa as well as the results of its preliminary performance studies are presented
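A minimal sketch of the multiple-replications-in-parallel idea under simplifying assumptions: independent replications feed observations to a central analyser, which stops the simulation once the relative confidence-interval half-width meets the requested precision. The toy observation model, the normal z value and the stopping rule are illustrative, not Akaroa's actual sequential estimator.

```python
# multiple replications in parallel with a simple sequential stopping rule
import random, statistics

def replication(seed):
    rng = random.Random(seed)
    while True:
        yield rng.expovariate(1.0)               # stand-in for one steady-state observation

def run_mrip(n_replications=4, precision=0.05, z=1.96, batch=100):
    streams = [replication(s) for s in range(n_replications)]
    data = []
    while True:
        for s in streams:                        # "parallel" replications, interleaved here
            data.extend(next(s) for _ in range(batch))
        m = statistics.mean(data)
        half = z * statistics.stdev(data) / len(data) ** 0.5
        if half / m <= precision:                # required relative precision reached
            return m, half, len(data)

print(run_mrip())
```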
during the past decade cluster computing and mobile communication technologies have been extensively deployed and widely applied because of their giant commercial value the rapid technological advancement makes it feasible to integrate these two technologies and revolutionary application called mobile cluster computing is arising on the horizon mobile cluster computing technology can further enhance the power of our laptops and mobile devices by running parallel applications however scheduling parallel applications on mobile clusters is technically challenging due to the significant communication latency and limited battery life of mobile devices therefore shortening schedule length and conserving energy consumption have become two major concerns in designing efficient and energy aware scheduling algorithms for mobile clusters in this paper we propose two novel scheduling strategies aimed at leveraging performance and power consumption for parallel applications running on mobile clusters our research focuses on scheduling precedence constrained parallel tasks and thus duplication heuristics are applied to schedule parallel tasks to minimize communication overheads however existing duplication algorithms are developed with consideration of schedule lengths completely ignoring energy consumption of clusters in this regard we design two energy aware duplication scheduling algorithms called eadus and tebus to schedule precedence constrained parallel tasks with complexity of where is the number of tasks in parallel task set unlike the existing duplication based scheduling algorithms that replicate all the possible predecessors of each task the proposed algorithms judiciously replicate predecessors of task if the duplication can help in conserving energy our energy aware scheduling strategies are conducive to balancing scheduling lengths and energy savings of set of precedence constrained parallel tasks we conducted extensive experiments using both synthetic benchmarks and real world applications to compare our algorithms with two existing approaches experimental results based on simulated mobile clusters demonstrate the effectiveness and practicality of the proposed duplication based scheduling strategies for example eadus and tebus can reduce energy consumption for the gaussian elimination application by averages of and with merely and increase in schedule length respectively
tree pattern matching is one of the most fundamental tasks for xml query processing holistic twig query processing techniques have been developed to minimize the intermediate results namely those root to leaf path matches that are not in the final twig results however useless path matches cannot be completely avoided especially when there is parent child relationship in the twig query furthermore existing approaches do not consider the fact that in practice in order to process xpath or xquery statements more powerful form of twig queries namely generalized tree pattern gtp queries is required most existing works on processing gtp queries generally call for costly post processing for eliminating redundant data and or grouping of the matching results in this paper we first propose novel hierarchical stack encoding scheme to compactly represent the twig results we introduce twigstack bottom up algorithm for processing twig queries based on this encoding scheme then we show how to efficiently enumerate the query results from the encodings for given gtp query to our knowledge this is the first gtp matching solution that avoids any post path join sort duplicate elimination and grouping operations extensive performance studies on various data sets and queries show that the proposed twigstack algorithm not only has better twig query processing performance than state of the art algorithms but is also capable of efficiently processing the more complex gtp queries
this paper investigates the potential for projecting linguistic annotations including part of speech tags and base noun phrase bracketings from one language to another via automatically word aligned parallel corpora first experiments assess the accuracy of unmodified direct transfer of tags and brackets from the source language english to the target languages french and chinese both for noisy machine aligned sentences and for clean hand aligned sentences performance is then substantially boosted over both of these baselines by using training techniques optimized for very noisy data yielding core french part of speech tag accuracy and french bracketing measure for stand alone monolingual tools trained without the need for any human annotated data in the given language
the new type of patterns sequential patterns with the negative conclusions is proposed in the paper they denote that certain set of items does not occur after regular frequent sequence some experimental results and the spawn algorithm for mining sequential patterns with the negative conclusions are also presented
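As a toy illustration (not the SPAWN mining algorithm itself), the sketch below merely checks the support of one sequential pattern with a negative conclusion: the fraction of sequences in which a given prefix occurs in order and the forbidden items never occur after it. The sequence database and the pattern are invented.

```python
# support of "prefix occurs and the negative items do not occur after it"
def negative_support(database, prefix, negative_items):
    hits = 0
    for seq in database:
        it = iter(seq)
        if all(any(x == p for x in it) for p in prefix):   # prefix occurs as a subsequence
            rest = list(it)                                 # items after the prefix match
            if not set(negative_items) & set(rest):         # forbidden items never follow
                hits += 1
    return hits / len(database)

db = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"], ["b", "a", "c"]]
print(negative_support(db, ["a", "b"], ["c"]))   # 0.5: two of four sequences support the pattern
```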
chez scheme is now over years old the first version having been released in this paper takes brief look back on the history of chez scheme’s development to explore how and why it became the system it is today
in peer to peer inference system each peer can reason locally but can also solicit some of its acquaintances which are peers sharing part of its vocabulary in this paper we consider peer to peer inference systems in which the local theory of each peer is set of propositional clauses defined upon local vocabulary an important characteristic of peer to peer inference systems is that the global theory the union of all peer theories is not known as opposed to partition based reasoning systems the main contribution of this paper is to provide the first consequence finding algorithm in peer to peer setting deca it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant we exhibit sufficient condition on the acquaintance graph of the peer to peer inference system for guaranteeing the completeness of this algorithm another important contribution is to apply this general distributed reasoning setting to the setting of the semantic web through the somewhere semantic peer to peer data management system the last contribution of this paper is to provide an experimental analysis of the scalability of the peer to peer infrastructure that we propose on large networks of peers
this paper presents and compares wordnet based and distributional similarity approaches the strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed and combination is presented each of our methods independently provide the best results in their class on the rg and wordsim datasets and supervised combination of them yields the best published results on all datasets finally we pioneer cross lingual similarity showing that our methods are easily adapted for cross lingual task with minor losses
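A hedged sketch of one way such a supervised combination could look: linear weights over a WordNet-based score and a distributional score are fitted against gold similarity judgments by least squares. The feature values and gold ratings are made-up examples; the paper's actual combination may differ.

```python
# supervised linear combination of two similarity signals (illustrative values only)
import numpy as np

wordnet_scores      = np.array([0.9, 0.2, 0.7, 0.1])
distributional_sims = np.array([0.8, 0.3, 0.6, 0.2])
gold                = np.array([9.0, 2.5, 7.0, 1.0])   # e.g. rg-style human ratings

X = np.column_stack([wordnet_scores, distributional_sims, np.ones(len(gold))])
weights, *_ = np.linalg.lstsq(X, gold, rcond=None)      # fit weights and an intercept

def combined_similarity(wn, dist):
    return weights[0] * wn + weights[1] * dist + weights[2]

print(combined_similarity(0.5, 0.5))
```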
automatic image annotation is promising solution to enable semantic image retrieval via keywords in this paper we propose multi level approach to annotate the semantics of natural scenes by using both the dominant image components salient objects and the relevant semantic concepts to achieve automatic image annotation at the content level we use salient objects as the dominant image components for image content representation and feature extraction to support automatic image annotation at the concept level novel image classification technique is developed to map the images into the most relevant semantic image concepts in addition support vector machine svm classifiers are used to learn the detection functions for the pre defined salient objects and finite mixture models are used for semantic concept interpretation and modeling an adaptive em algorithm has been proposed to determine the optimal model structure and model parameters simultaneously we have also demonstrated that our algorithms are very effective to enable multi level annotation of natural scenes in large scale image dataset
object oriented systems that undergo repeated addition of functionality commonly suffer loss of quality in their underlying design this problem must often be remedied in costly refactoring phase before further maintenance programming can take place recently search based approaches to automating the task of software refactoring based on the concept of treating object oriented design as combinatorial optimization problem have been proposed however because search based refactoring is novel approach it is yet to be established as to which search techniques are most suitable for the task in this paper we report the results of an empirical comparison of simulated annealing sa genetic algorithms gas and multiple ascent hill climbing hcm in search based refactoring prototype automated refactoring tool is employed capable of making radical changes to the design of an existing program in order that it conforms more closely to contemporary quality model results show hcm to outperform both sa and ga over set of five input programs
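A minimal sketch of multiple-ascent hill climbing over refactoring moves, assuming a quality(design) function and a neighbours(design) generator supplied by the refactoring tool; both, and the numeric toy used to exercise the search, are placeholders rather than the prototype described above.

```python
# multiple-ascent hill climbing: repeated first-improvement ascents with random restarts
import random

def climb(start, quality, neighbours, rng):
    current, current_q = start, quality(start)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(current, rng):        # candidate refactorings, random order
            q = quality(cand)
            if q > current_q:                         # first-improvement ascent step
                current, current_q, improved = cand, q, True
                break
    return current, current_q

def multiple_ascent_hill_climb(design, quality, neighbours, restarts=5, kick=3, seed=0):
    rng = random.Random(seed)
    best, best_q = climb(design, quality, neighbours, rng)
    for _ in range(restarts):
        start = best
        for _ in range(kick):                         # random kick away from the local optimum
            start = next(iter(neighbours(start, rng)))
        cand, cand_q = climb(start, quality, neighbours, rng)
        if cand_q > best_q:
            best, best_q = cand, cand_q
    return best, best_q

# toy demonstration: a "design" is a vector of metric values, quality peaks at the origin
def toy_quality(d): return -sum(x * x for x in d)

def toy_neighbours(d, rng):
    idxs = list(range(len(d)))
    rng.shuffle(idxs)
    for i in idxs:
        for delta in rng.sample((-1, 1), 2):
            yield tuple(x + delta if j == i else x for j, x in enumerate(d))

print(multiple_ascent_hill_climb((4, -3, 2), toy_quality, toy_neighbours))
```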
the controller originally studied by afek awerbuch plotkin and saks is basic distributed tool that provides an abstraction for managing the consumption of global resource in distributed dynamic network the input to the controller arrives online in the form of requests presented at arbitrary nodes request presented at node corresponds to the desire of some entity to consume one unit of the global resource at and the controller should handle this request within finite time by either granting it with permit or denying it initially permits corresponding to units of the global resource are stored at designated root node throughout the execution permits can be transported from place to place along the network’s links so that they can be granted to requests presented at various nodes when permit is granted to some request it is eliminated from the network the fundamental rule of an controller is that request should not be denied unless it is certain that at least permits are eventually granted the most efficient controller known to date has message complexity log log where is the number of nodes that ever existed in the network the dynamic network may undergo node insertions and deletions in this paper we establish two new lower bounds on the message complexity of the controller problem we first prove simple lower bound stating that any controller must send log messages second for the important case when is proportional to this is the common case in most applications we use surprising reduction from the centralized monotonic labeling problem to show that any controller must send log messages in fact under long lasting conjecture regarding the complexity of the monotonic labeling problem this lower bound is improved to tight log the proof of this lower bound requires that which turns out to be somewhat inevitable due to new construction of an controller with message complexity log
feature weighting or selection is crucial process to identify an important subset of features from data set removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval due to fundamental differences between classification and ranking feature weighting methods developed for classification cannot be readily applied to feature weighting for ranking state of the art feature selection method for ranking called gas has been recently proposed which exploits importance of each feature and similarity between every pair of features however gas must compute the similarity scores of all pairs of features thus it is not scalable for high dimensional data and its performance degrades on nonlinear ranking functions this paper proposes novel algorithms rankwrapper and rankfilter which are scalable for high dimensional data and also perform reasonably well on nonlinear ranking functions rankwrapper and rankfilter are designed based on the key idea of relief algorithm relief is feature selection algorithm for classification which exploits the notions of hits data points within the same class and misses data points from different classes for classification however there is no such notion of hits or misses in ranking the proposed algorithms instead utilize the ranking distances of nearest data points in order to identify the key features for ranking our extensive experiments show that rankwrapper and rankfilter generate higher accuracy overall than the gas and traditional relief algorithms adapted for ranking and run substantially faster than the gas on high dimensional data
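A hedged, simplified sketch of a Relief-style weighting adapted to ranking: because ranking has no hits and misses, nearest neighbours with a small difference in relevance are treated like hits and those with a large difference like misses. The closeness threshold, the neighbourhood size and the tiny data set are assumptions, not the RankWrapper or RankFilter algorithms themselves.

```python
# relief-like feature weighting driven by relevance differences of nearest neighbours
import numpy as np

def relief_for_ranking(X, relevance, k=3, close=0.5):
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)            # l1 distance in feature space
        dist[i] = np.inf
        for j in np.argsort(dist)[:k]:                 # k nearest points
            fdiff = np.abs(X[i] - X[j])
            if abs(relevance[i] - relevance[j]) <= close:
                w -= fdiff                             # "hit": similar rank, penalise differing features
            else:
                w += fdiff                             # "miss": differing rank, reward differing features
    return w / n

X = np.array([[0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [0.8, 0.2]])
relevance = np.array([3.0, 1.0, 3.0, 1.0])             # the second feature tracks relevance here
print(relief_for_ranking(X, relevance))                # second weight comes out clearly larger
```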
conventional programming models were designed to be used by expert programmers for programming for large scale multiprocessors distributed computational clusters or specialized parallel machines these models therefore are deemed either too difficult for an average programmer who will be expected to do parallel programming in the many core world or too inefficient to use for many core architectures similarly conventional execution models were designed for performance not scalability to address the challenge of performance scalability for many core architectures we introduce servo service oriented programming model that decomposes every program into set of components each with its own local memory mutable state and program counter that either request services or deliver services servo is characterized by the decoupling of logical communication mapping between program modules or services from physical communication mapping between modules this allows the services to be migrated and replicated during execution the proposed model also allows the granularity of data parallel operations to be changed dynamically this allows runtime variation in the locking granularity which in turn enables higher write parallelism finally the models partition even data into services this can significantly enhance locality and makes it possible to have superlinear speedups with increasing number of cores our preliminary investigations demonstrate significant performance scalability advantages for servo
the aim of this study was to investigate the impact of individual differences such as gender and attitude towards mobile phone use in public places on the usability of speech activated mobile city guide service in various context of use eg cafe train wizard of oz methodology was used to provide the service functionality for the mobile city guide service participants in the study completed specific tasks over six week period in public and private locations the results highlight the importance of considering the effects of individual differences on the context of use in system design and evaluation
we present algorithms methods and software for grid resource manager that performs resource brokering and job scheduling in production grids this decentralized broker selects computational resources based on actual job requirements job characteristics and information provided by the resources with the aim to minimize the total time to delivery for the individual application the total time to delivery includes the time for program execution batch queue waiting and transfer of executable and input output data to and from the resource the main features of the resource broker include two alternative approaches to advance reservations resource selection algorithms based on computer benchmark results and network performance predictions and basic adaptation facility the broker is implemented as built in component of job submission client for the nordugrid arc middleware
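A minimal sketch (not the ARC broker) of ranking resources by an estimated total time to delivery composed of predicted queue wait, data transfer time and execution time from a benchmark-based speed estimate; the resource records and the job description below are assumed example values.

```python
# choose the resource with the smallest estimated total time to delivery
def time_to_delivery(job, res):
    transfer = (job["input_mb"] + job["output_mb"]) / res["bandwidth_mb_s"]
    execute  = job["work_units"] / res["benchmark_score"]
    return res["queue_wait_s"] + transfer + execute

def select_resource(job, resources):
    return min(resources, key=lambda r: time_to_delivery(job, r))

job = {"input_mb": 500, "output_mb": 100, "work_units": 3600}
resources = [
    {"name": "clusterA", "queue_wait_s": 600, "bandwidth_mb_s": 50, "benchmark_score": 2.0},
    {"name": "clusterB", "queue_wait_s": 60,  "bandwidth_mb_s": 10, "benchmark_score": 1.0},
]
best = select_resource(job, resources)
print(best["name"], time_to_delivery(job, best))   # clusterA wins despite its longer queue
```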
gxl graph exchange language is an xml based standard exchange format for sharing data between tools formally gxl represents typed attributed directed ordered graphs which are extended to represent hypergraphs and hierarchical graphs this flexible data model can be used for object relational data and wide variety of graphs an advantage of gxl is that it can be used to exchange instance graphs together with their corresponding schema information in uniform format ie using common document type specification this paper describes gxl and shows how gxl is used to provide interoperability of graph based tools gxl has been ratified by reengineering and graph transformation research communities and is being considered for adoption by other communities
we describe novel approach to verification of software systems centered around an underlying database instead of applying general purpose techniques with only partial guarantees of success it identifies restricted but reasonably expressive classes of applications and properties for which sound and complete verification can be performed in fully automatic way this leverages the emergence of high level specification tools for database centered applications that not only allow fast prototyping and improved programmer productivity but as side effect provide convenient targets for automatic verification we present theoretical and practical results on verification of database driven systems the results are quite encouraging and suggest that unlike arbitrary software systems significant classes of database driven systems may be amenable to automatic verification this relies on novel marriage of database and model checking techniques of relevance to both the database and the computer aided verification communities
dynamic web service composition can serve applications or users on an on demand basis with dynamic composition the application’s capabilities can be extended at runtime so that theoretically an unlimited number of new services can be created from limited set of service components thus making applications no longer restricted to the original set of operations specified and envisioned at design and or compile time moreover dynamic composition is the only means to adapt the behaviour of running components in highly available applications such as banking and telecommunication systems where services cannot be brought offline to upgrade or remove obsolete services in this paper we present novel classification of the current state of the art dynamic web services composition techniques with attention to the capabilities and limitations of the underlying approaches the proposed taxonomy of these techniques is derived based on comprehensive survey of what has been done so far in dynamic web service composition finally we summarise our findings and present vision for future research work in this area
infinite loops and redundant computations are long recognized open problems in prolog two methods have been explored to resolve these problems loop checking and tabling loop checking can cut infinite loops but it cannot be both sound and complete even for function free logic programs tabling seems to be an effective way to resolve infinite loops and redundant computations however existing tabulated resolutions such as oldt resolution slg resolution and tabulated sls resolution are non linear because they rely on the solution lookup mode in formulating tabling the principal disadvantage of non linear resolutions is that they cannot be implemented using simple stack based memory structure like that in prolog moreover some strictly sequential operators such as cuts may not be handled as easily as in prolog in this paper we propose hybrid method to resolve infinite loops and redundant computations we combine the ideas of loop checking and tabling to establish linear tabulated resolution called tp resolution tp resolution has two distinctive features it makes linear tabulated derivations in the same way as prolog except that infinite loops are broken and redundant computations are reduced it handles cuts as effectively as prolog and it is sound and complete for positive logic programs with the bounded term size property the underlying algorithm can be implemented by an extension to any existing prolog abstract machines such as wam or atoam
we describe technique for automatically proving compiler optimizations sound meaning that their transformations are always semantics preserving we first present domain specific language called cobalt for implementing optimizations as guarded rewrite rules cobalt optimizations operate over like intermediate representation including unstructured control flow pointers to local variables and dynamically allocated memory and recursive procedures then we describe technique for automatically proving the soundness of cobalt optimizations our technique requires an automatic theorem prover to discharge small set of simple optimization specific proof obligations for each optimization we have written variety of forward and backward intraprocedural dataflow optimizations in cobalt including constant propagation and folding branch folding full and partial redundancy elimination full and partial dead assignment elimination and simple forms of points to analysis we implemented our soundness checking strategy using the simplify automatic theorem prover and we have used this implementation to automatically prove our optimizations correct our checker found many subtle bugs during the course of developing our optimizations we also implemented an execution engine for cobalt optimizations as part of the whirlwind compiler infrastructure
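A hedged, language-shifted sketch of a guarded rewrite rule in the spirit of constant folding: the guard checks that both operands are known constants and the rewrite replaces the instruction with a constant move. The toy IR is an assumption; Cobalt expresses such rules declaratively and their soundness is discharged by a theorem prover rather than by this code.

```python
# guarded rewrite: fold additions whose operands are known constants
def fold_constants(instructions):
    """instructions: list of tuples like ('add', dst, src1, src2) or ('const', dst, value)."""
    env, out = {}, []
    for ins in instructions:
        if ins[0] == "const":
            env[ins[1]] = ins[2]
            out.append(ins)
        elif ins[0] == "add" and ins[2] in env and ins[3] in env:   # guard
            value = env[ins[2]] + env[ins[3]]
            env[ins[1]] = value
            out.append(("const", ins[1], value))                    # rewrite
        else:
            env.pop(ins[1], None)                                   # result no longer known
            out.append(ins)
    return out

prog = [("const", "a", 2), ("const", "b", 3), ("add", "c", "a", "b"), ("add", "d", "c", "x")]
print(fold_constants(prog))   # the first add becomes a const, the add involving x is untouched
```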
regulatory compliance of business operations is critical problem for enterprises as enterprises increasingly use business process management systems to automate their business processes technologies to automatically check the compliance of process models against compliance rules are becoming important in this paper we present method to improve the reliability and minimize the risk of failure of business process management systems from compliance perspective the proposed method allows separate modeling of both process models and compliance concerns business process models expressed in the business process execution language are transformed into pi calculus and then into finite state machines compliance rules captured in the graphical business property specification language are translated into linear temporal logic thus process models can be verified against these compliance rules by means of model checking technology the benefit of our method is threefold through the automated verification of large set of business process models our approach increases deployment efficiency and lowers the risk of installing noncompliant processes it reduces the cost associated with inspecting business process models for compliance and compliance checking may ensure compliance of new process models before their execution and thereby increase the reliability of business operations in general
in this article we present media adaptation framework for an immersive biofeedback system for stroke patient rehabilitation in our biofeedback system media adaptation refers to changes in audio visual feedback as well as changes in physical environment effective media adaptation frameworks help patients recover generative plans for arm movement with potential for significantly shortened therapeutic time the media adaptation problem has significant challenges high dimensionality of adaptation parameter space variability in the patient performance across and within sessions the actual rehabilitation plan is typically non first order markov process making the learning task hard our key insight is to understand media adaptation as real time feedback control problem we use mixture of experts based dynamic decision network ddn for online media adaptation we train ddn mixtures per patient per session the mixture models address two basic questions given specific adaptation suggested by the domain experts predict the patient performance and given the expected performance determine the optimal adaptation decision the questions are answered through an optimality criterion based search on ddn models trained in previous sessions we have also developed new validation metrics and have very good results for both questions on actual stroke rehabilitation data
with the increasing performance gap between the processor and the memory the importance of caches is increasing for high performance processors however with reducing feature sizes and increasing clock speeds cache access latencies are increasing designers pipeline the cache accesses to prevent the increasing latencies from affecting the cache throughput nevertheless increasing latencies can degrade the performance significantly by delaying the execution of dependent instructions in this paper we investigate predicting the data cache set and the tag of the memory address as means to reduce the effective cache access latency in this technique the predicted set is used to start the pipelined cache access in parallel to the memory address computation we also propose set address adaptive predictor to improve the prediction accuracy of the data cache sets our studies found that using set prediction to reduce load to use latency can improve the overall performance of the processor by as much as in this paper we also investigate techniques such as predicting the data cache line where the data will be present to limit the increase in cache energy consumption when using set prediction in fact with line prediction the techniques in this paper consume about less energy in the data cache than decoupled accessed cache with minimum energy consumption while still maintaining the performance improvement however the overall energy consumption is about more than decoupled accessed cache when the energy consumption in the predictor table is also considered
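A minimal sketch of a last-value-style data cache set predictor: a small table indexed by load PC bits remembers the set used by the previous execution of that load and supplies a prediction before the address is computed. The table size and the index and set extraction are illustrative assumptions, not the adaptive predictor proposed above.

```python
# last-value set prediction: predict from a PC-indexed table, then update with the real set
class SetPredictor:
    def __init__(self, entries=256, sets=64, block_bytes=64):
        self.entries, self.sets, self.block = entries, sets, block_bytes
        self.table = [0] * entries
    def predict(self, pc):
        return self.table[pc % self.entries]             # available before address computation
    def update(self, pc, address):
        actual_set = (address // self.block) % self.sets  # set actually accessed by the load
        self.table[pc % self.entries] = actual_set
        return actual_set

pred = SetPredictor()
pc, addr = 0x400123, 0x1F400
guess = pred.predict(pc)
actual = pred.update(pc, addr)
print(guess, actual, guess == actual)   # a misprediction here; repeated loads then hit
```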
process rewrite systems prs are widely accepted as formalism for the description of infinite state systems it is known that the reachability problem for prs is decidable the problem becomes undecidable when prs are extended with finite state control unit in this paper we show that the problem remains decidable when prs are extended with weak ie acyclic except for self loops finite state control unit we also present some applications of this decidability result
run time application environments are affected by the changes in mini world or technology changes large number of applications are process driven for robust applications that can evolve over time there is need for methodology that implicitly handles changes at various levels from mini world to run time environment through layers of models and systems in this paper we present er methodology for evolving applications in the context of this paper the role of two way active behaviour and template driven development of applications is presented this methodology facilitates capturing active behaviour from run time transactions and provides means of using this knowledge to guide subsequent application design and its evolution
the recent spectacular progress in modern microelectronics created big stimulus towards development of embedded systems unfortunately it also introduced unusual complexity which results in many serious issues that cannot be resolved without new more adequate development methods and electronic design automation tools for the system level design this paper discusses the problem of an efficient model based multi objective optimal architecture synthesis for complex hard real time embedded systems when using as an example system level architecture exploration and synthesis method that we developed
many existing retargetable compilers for asips and domain specific processors generate low quality code since the compiler is not able to fully utilize the intricacies of isa of these processors hence there is need to further optimize the code produced by these compilers in this paper we introduce new post compilation optimization technique which is based on finding repeating instruction patterns in generated code and replacing them with their optimized equivalents the instruction patterns to be found are represented by finite state machines which allow encapsulation of multiple patterns in just one representation and instructions in pattern to be not necessarily lexically adjacent we also present conflict resolution algorithm to select an optimization whenever set of instructions fall under two or more different patterns of which only one can be applied on the basis of code size cycle count or switching activity improvement we tested this technique on the compiled binaries of arm and intel processors for code size improvement we discuss the possible applications of this strategy in design space exploration dse of embedded processors
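A simplified sketch of FSM-driven post-compilation pattern replacement: a pattern is a small state machine over opcodes and a matched run of instructions is replaced by an optimized equivalent. Unlike the technique described above, this toy only matches lexically adjacent instructions, performs no liveness or conflict checking, and its opcodes and replacement rule are invented.

```python
# FSM-based peephole replacement over a flat instruction list
def make_fsm(opcode_sequence):
    return {i: op for i, op in enumerate(opcode_sequence)}          # state -> expected opcode

def rewrite(instructions, fsm, replacement):
    out, i = [], 0
    while i < len(instructions):
        state = 0
        while state in fsm and i + state < len(instructions) \
                and instructions[i + state][0] == fsm[state]:
            state += 1
        if state == len(fsm):                                       # accepting state reached
            out.extend(replacement(instructions[i:i + state]))
            i += state
        else:
            out.append(instructions[i])
            i += 1
    return out

# example: collapse a load immediately followed by a move into one load to the final register
fsm = make_fsm(["load", "mov"])
code = [("load", "r1", "[sp+4]"), ("mov", "r2", "r1"), ("add", "r2", "r2", "r3")]
print(rewrite(code, fsm, lambda seq: [("load", seq[1][1], seq[0][2])]))
```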
this article describes an original approach for the optimized execution of computational tasks in grid environments tasks are represented as workflows that define interactions between different services functional service descriptions written in owl are extended with non functional properties allowing to specify the resource requirements of services depending on given inputs based on such annotations mathematical model is derived to estimate the execution costs of workflow moreover an optimization algorithm is presented that distributes the execution of workflow in grid supporting the dynamic deployment of software components on demand in order to fulfill user requirements such as limit on the total workflow execution time workflows are executed in fully decentralized way avoiding inefficient triangular routing of messages
recently there has been significant interest in employing probabilistic techniques for fault localization using dynamic dependence information for multiple passing runs learning techniques are used to construct probabilistic graph model for given program then given failing run the probabilistic model is used to rank the executed statements according to the likelihood of them being faulty in this paper we present novel probabilistic approach in which universal probabilistic models are learned to characterize the behaviors of various instruction types used by all programs the universal probabilistic model for an instruction type is in form of probability distribution that represents how errors in the input operand values are propagated as errors in the output result of given instruction type once these models have been constructed they can be used in the analysis of any program as follows given set of runs for any program including at least one passing and one failing run bayesian network called the error flow graph efg is then constructed from the dynamic dependence graphs of the program runs and the universal probabilistic models standard inference algorithms are employed to compute the probability of each executed statement being faulty we also present optimizations to reduce the runtime cost of inference using the efg our experiments demonstrate that our approach is highly effective in fault localization even when very few passing runs are available it also performs well in the presence of multiple faults
embedded systems bring special purpose computing power to consumer electronics devices such as smartcards cd players and pagers java is being aggressively targeted at such systems with initiatives such as the java platform micro edition which introduces certain efficiency optimizations to the java virtual machine code size reduction has been identified as an important future goal for ensuring java’s success on embedded systems however limited processing power and timing constraints often make traditional compression techniques untenable an effective solution must meet the conflicting requirements of size reduction and execution performance we propose modifications to the file format for java binaries that achieve significant size reduction with little or no performance penalty experiments conducted on several large java class libraries show typical size reduction for class files and size reduction for jar files
clustering is very powerful data mining technique for topic discovery from text documents the partitional clustering algorithms such as the family of means are reported performing well on document clustering they treat the clustering problem as an optimization process of grouping documents into clusters so that particular criterion function is minimized or maximized usually the cosine function is used to measure the similarity between two documents in the criterion function but it may not work well when the clusters are not well separated to solve this problem we applied the concepts of neighbors and link introduced in guha rastogi shim rock robust clustering algorithm for categorical attributes information systems to document clustering if two documents are similar enough they are considered as neighbors of each other and the link between two documents represents the number of their common neighbors instead of just considering the pairwise similarity the neighbors and link involve the global information into the measurement of the closeness of two documents in this paper we propose to use the neighbors and link for the family of means algorithms in three aspects new method to select initial cluster centroids based on the ranks of candidate documents new similarity measure which uses combination of the cosine and link functions and new heuristic function for selecting cluster to split based on the neighbors of the cluster centroids our experimental results on real life data sets demonstrated that our proposed methods can significantly improve the performance of document clustering in terms of accuracy without increasing the execution time much
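A hedged sketch of the neighbours-and-links idea for document similarity: two documents are neighbours when their cosine similarity exceeds a threshold, the link count is the number of common neighbours, and a combined score mixes cosine with the normalised link count. The threshold, the mixing weight and the tiny term vectors are example assumptions, not the paper's tuned settings.

```python
# combined cosine + link similarity for documents represented as term vectors
import numpy as np

def combined_similarity(X, theta=0.3, alpha=0.5):
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    cos = (X / norms) @ (X / norms).T
    neighbours = cos >= theta                       # each document is also its own neighbour
    links = neighbours.astype(int) @ neighbours.astype(int).T   # counts of common neighbours
    links = links / links.max()
    return alpha * cos + (1 - alpha) * links        # mix local (cosine) and global (link) evidence

X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
print(np.round(combined_similarity(X), 2))
```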
this paper addresses the problem of human motion tracking from multiple image sequences the human body is described by five articulated mechanical chains and human body parts are described by volumetric primitives with curved surfaces if such surface is observed with camera an extremal contour appears in the image whenever the surface turns smoothly away from the viewer we describe method that recovers human motion through kinematic parameterization of these extremal contours the method exploits the fact that the observed image motion of these contours is function of both the rigid displacement of the surface and of the relative position and orientation between the viewer and the curved surface first we describe parameterization of an extremal contour point velocity for the case of developable surfaces second we use the zero reference kinematic representation and we derive an explicit formula that links extremal contour velocities to the angular velocities associated with the kinematic model third we show how the chamfer distance may be used to measure the discrepancy between predicted extremal contours and observed image contours moreover we show how the chamfer distance can be used as differentiable multi valued function and how the tracker based on this distance can be cast into continuous non linear optimization framework fourth we describe implementation issues associated with practical human body tracker that may use an arbitrary number of cameras one great methodological and practical advantage of our method is that it relies neither on model to image nor on image to image point matches in practice we model people with kinematic chains volumetric primitives and degrees of freedom we observe silhouettes in images gathered with several synchronized and calibrated cameras the tracker has been successfully applied to several complex motions gathered at frames second
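A minimal sketch of scoring predicted contour points against an observed edge map with a chamfer-style distance: a distance transform gives every pixel's distance to the nearest observed contour pixel and the score averages it over the predicted points. The toy edge map and points are assumptions; the tracker above embeds such a measure in a kinematic optimisation over the articulated model.

```python
# chamfer-style scoring of predicted contour samples against an observed edge map
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(edge_map, predicted_points):
    # distance_transform_edt measures distance to the nearest zero, so invert the edge mask
    dist = distance_transform_edt(~edge_map)
    rows, cols = zip(*predicted_points)
    return float(np.mean(dist[list(rows), list(cols)]))

edges = np.zeros((100, 100), dtype=bool)
edges[50, 10:90] = True                               # observed horizontal contour
predicted = [(48, 20), (49, 40), (52, 60)]            # predicted extremal contour samples
print(chamfer_score(edges, predicted))                # average distance of predictions to the contour
```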
in structural operational semantics sos was introduced as systematic way to define operational semantics of programming languages by set of rules of certain shape gd plotkin structural approach to operational semantics technical report daimi fn computer science department aarhus university aarhus denmark september subsequently the format of sos rules became the object of study using so called transition system specifications tss’s several authors syntactically restricted the format of rules and showed several useful properties about the semantics induced by any tss adhering to the format this has resulted in line of research proposing several syntactical rule formats and associated meta theorems properties that are guaranteed by such rule formats range from well definedness of the operational semantics and compositionality of behavioral equivalences to security time and probability related issues in this paper we provide an overview of sos rule formats and meta theorems formulated around them
number of program analysis problems can be tackled by transforming them into certain kinds of graph reachability problems in labeled directed graphs the edge labels can be used to filter out paths that are not of interest path from vertex to vertex only counts as valid connection between and if the word spelled out by is in certain language often the languages used for such filtering purposes are languages of matching parentheses in some cases the matched parenthesis condition is used to filter out paths with mismatched calls and returns this leads to so called context sensitive program analyses such as context sensitive interprocedural slicing and context sensitive interprocedural dataflow analysis in other cases the matched parenthesis condition is used to capture graph theoretic analog of mccarthy’s rules car cons and cdr cons that is in the code fragment cons car the fact that there is structure transmitted data dependence from to but not from to is captured in graph by using vertex for each variable an edge from vertex to vertex when is used on the right hand side of an assignment to parentheses that match as the labels on the edges that run from to and to and parentheses that do not match as the labels on the edges that run from to and to however structure transmitted data dependence analysis is context insensitive because there are no constraints that filter out paths with mismatched calls and returns thus natural question is whether these two kinds of uses of parentheses can be combined to create context sensitive analysis for structure transmitted data dependences this article answers the question in the negative in general the problem of context sensitive structure transmitted data dependence analysis is undecidable the results imply that in general both context sensitive set based analysis and infinity cfa when data constructors and selectors are taken into account are also undecidable
we study the rough set theory as method of feature selection based on tolerant classes that extends the existing equivalent classes the determination of initial tolerant classes is challenging and important task for accurate feature selection and classification in this paper the expectation maximization clustering algorithm is applied to determine similar objects this method generates fewer features with either higher or the same accuracy compared with two existing methods ie fuzzy rough feature selection and tolerance based feature selection on number of benchmarks from the uci repository
document filtering df and document classification dc are often integrated together to classify suitable documents into suitable categories popular way to achieve integrated df and dc is to associate each category with threshold document may be classified into category only if its degree of acceptance doa with respect to is higher than the threshold of therefore tuning proper threshold for each category is essential threshold that is too high low may mislead the classifier to reject accept too many documents unfortunately thresholding is often based on the classifier’s doa estimations which cannot always be reliable due to two common phenomena the doa estimations made by the classifier cannot always be correct and not all documents may be classified without any controversy unreliable estimations are actually noises that may mislead the thresholding process in this paper we present an adaptive and parameter free technique ast to sample reliable doa estimations for thresholding ast operates by adapting to the classifier’s status without needing to define any parameters experimental results show that by helping to derive more proper thresholds ast may guide various classifiers to achieve significantly better and more stable performances under different circumstances the contributions are of practical significance for real world integrated df and dc
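A hedged sketch of the basic threshold-tuning step only: given validation documents' degrees of acceptance for a category and their true membership, pick the threshold that maximises F1. The AST technique described above additionally samples reliable DOA estimations before tuning; the scores and labels here are invented examples.

```python
# per-category threshold tuning from degree-of-acceptance (DOA) scores
def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(doa_scores, labels):
    best_t, best_f1 = None, -1.0
    for t in sorted(set(doa_scores)):                      # candidate thresholds
        tp = sum(1 for s, y in zip(doa_scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(doa_scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(doa_scores, labels) if s < t and y)
        if f1(tp, fp, fn) > best_f1:
            best_t, best_f1 = t, f1(tp, fp, fn)
    return best_t, best_f1

scores = [0.9, 0.8, 0.55, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
print(tune_threshold(scores, labels))   # the threshold with the best F1 on the validation scores
```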
conformance testing with the guaranteed fault coverage is based on distinguishing faulty system implementations from the corresponding system specification we consider timed systems modeled by timed possibly non deterministic finite state machines tfsms and propose algorithms for distinguishing two tfsms in particular we present preset algorithm for separating two separable tfsms and an adaptive algorithm for distinguishing two possibly non separable tfsms the proposed techniques extend existing methods for untimed non deterministic fsms by dealing with the fact that unlike untimed fsms in general tfsm has an infinite number of timed inputs correspondingly we state that the upper bounds on the length of distinguishing sequences are the same as for untimed fsms
the ambient intelligence paradigm is built upon ubiquitous computing uc in which the computing devices are embedded in the environment with the purpose of enhancing the human experience at home workplace office learning health care etc the uc applications aim at providing services to the users anywhere anytime in an unobtrusive seemingly invisible way wireless sensor networks wsns have great potential for uc applications and are envisioned to revolutionize them this paper presents clustering routing protocol for event driven query based and periodic wsns the protocol aims at optimizing energy dissipation in the network as well as providing network’s fault tolerance and connectivity message propagation is accomplished by using short distance transmissions by employing nearest neighbor nodes between neighboring clusters moreover the algorithm proposes using an energy efficient approach by alternating the nodes responsible for inter cluster communication inside one cluster the algorithm also aims at even energy dissipation among the nodes in the network by alternating the possible routes to the sink this helps to balance the load on sensor nodes and increases the network lifetime while avoiding congested links at the same time we discuss the implementation of our protocol present its proof of correctness as well as the performance evaluation through an extensive set of simulation experiments
splitting volumetric object is useful operation in volume graphics and its applications but is not widely supported by existing systems for volume based modeling and rendering in this paper we present an investigation into two main algorithmic approaches namely explicit and implicit splitting for modeling and rendering splitting actions we consider generalized notion based on scalar fields which encompasses discrete specifications eg volume data sets as well as procedural specifications eg hypertextures of volumetric objects we examine the correctness effectiveness efficiency and deficiencies of each approach in specifying and controlling spatial and temporal specification of splitting we propose methods for implementing these approaches and for overcoming their deficiencies we present modeling tool for creating specifications of splitting functions and describe the use of volume scene graphs for facilitating direct rendering of volume splitting we demonstrate the use of these approaches with examples of volume visualization medical illustration volume animation and special effects
slicing is program analysis technique which can be used for reducing the size of the model and avoid state explosion in model checking in this work static slicing technique is proposed for reducing rebeca models with respect to property for applying the slicing techniques the rebeca dependence graph rdg is introduced as the static slicing usually produces large slices two other slicing based reduction techniques step wise slicing and bounded slicing are proposed as simple novel ideas step wise slicing first generates slices overapproximating the behavior of the original model and then refines it and bounded slicing is based on the semantics of non deterministic assignments in rebeca we also propose static slicing algorithm for deadlock detection in absence of any particular property the applicability of these techniques is checked by applying them to several case studies which are included in this paper similar techniques can be applied on the other actor based languages
we consider the problem of reconstructing surface from scattered points sampled on physical shape the sampled shape is approximated as the zero level set of function this function is defined as linear combination of compactly supported radial basis functions we depart from previous work by using as centers of basis functions set of points located on an estimate of the medial axis instead of the input data points those centers are selected among the vertices of the voronoi diagram of the sample data points being voronoi vertex each center is associated with maximal empty ball we use the radius of this ball to adapt the support of each radial basis function our method can fit user defined budget of centers the selected subset of voronoi vertices is filtered using the notion of lambda medial axis then clustered to fit the allocated budget
it is important to find the natural clusters in high dimensional data where visualization becomes difficult natural cluster is cluster of any shape and density and it should not be restricted to globular shape as wide number of algorithms assume or to specific user defined density as some density based algorithms require in this work it is proposed to solve the problem by maximizing the relatedness of distances between patterns in the same cluster it is then possible to distinguish clusters based on their distance based densities novel dynamic model is proposed based on new distance relatedness measures and clustering criteria the proposed algorithm mitosis is able to discover clusters of arbitrary shapes and arbitrary densities in high dimensional data it has good computational complexity compared to related algorithms it performs very well on high dimensional data discovering clusters that cannot be found by known algorithms it also identifies outliers in the data as by product of the cluster formation process validity measure that depends on the main clustering criterion is also proposed to tune the algorithm’s parameters the theoretical bases of the algorithm and its steps are presented its performance is illustrated by comparing it with related algorithms on several data sets
this paper describes the query rewrite facility of the starburst extensible database system novel phase of query optimization we present suite of rewrite rules used in starburst to transform queries into equivalent queries for faster execution and also describe the production rule engine which is used by starburst to choose and execute these rules examples are provided demonstrating that these query rewrite transformations lead to query execution time improvements of orders of magnitude suggesting that query rewrite in general and these rewrite rules in particular are an essential step in query optimization for modern database systems
due to the emergence of multimedia context rich applications and services over wireless networks networking protocols and services are becoming more and more integrated thus relying on context and application information to operate further wireless protocols and services have employed information from several network layers and the environment breaking the layering paradigm in order to cope with this increasing reliance on knowledge we propose mankop middleware for manets that instantiates new networking plane the knowledge plane kp is distributed entity that stores and disseminates information concerning the network its services and the environment orchestrating the collaboration among cross layer protocols autonomic management solutions and context aware services we use mankop to support the autonomic reconfiguration of pp network over manets simulation results show that the mankop enabled solution is applicable to more scenarios than the classic approaches as the network adapts its query dissemination strategy to match the current conditions of the peers
network on chip noc has been proposed to replace traditional bus based architectures to address the global communication challenges in nanoscale technologies in future soc architectures minimizing power consumption will continue to be an important design goal in this paper we present novel heuristic technique consisting of system level physical design and interconnection network generation that generates custom low power noc architectures for application specific soc we demonstrate the quality of the solutions produced by our technique by experimentation with many benchmarks our technique has low computational complexity and consumes only times the power consumption and times the number of router resources compared to an optimal milp based technique whose computational complexity is not bounded
collaborative object represents data type such as text document or spreadsheet designed to be shared by multiple geographically separated users in order to improve performance and availability of data in such distributed context each user has local copy of the shared objects upon which he may perform updates locally executed updates are then transmitted to the other users so the updates are applied in different orders at different copies of the collaborative object this replication potentially leads however to divergent ie different copies the operational transformation ot approach provides an interesting solution for copies divergence indeed every collaborative object has an algorithm which transforms the remote update according to local concurrent ones but this ot algorithm needs to fulfill two conditions in order to ensure the convergence proving the correctness of ot algorithms is very complex and error prone without the assistance of theorem prover in the present work we propose compositional method for specifying complex collaborative objects the most important feature of our method is that designing an ot algorithm for the composed collaborative object can be done by reusing the ot algorithms of component collaborative objects by using our method we can start from correct small collaborative objects which are relatively easy to handle and incrementally combine them to build more complex collaborative objects
we propose new similar sequence matching method that efficiently supports variable length and variable tolerance continuous query sequences on time series data stream earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory to support variable length query sequences we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences to support variable tolerance query sequences we present new notion of intervaled sequences whose individual entries are an interval of real numbers rather than real number itself we also propose new similar sequence matching method based on these notions and then formally prove correctness of the method in addition we show that our method has the prematching characteristic which finds future candidates of similar sequences in advance experimental results show that our method outperforms the naive one by times and the existing methods in the literature by times over the entire ranges of parameters tested when the query selectivities are low
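an illustrative python sketch of the window construction step described above; build_window_index and candidate_queries are names invented here, and a plain per window tolerance stands in for the intervaled sequences of the paper

def build_window_index(queries, w):
    # queries: dict mapping a query name to a list of numbers
    # long query sequences are cut into disjoint windows of size w so that
    # sequences of different lengths can share one index
    index = []
    for name, seq in queries.items():
        for off in range(0, len(seq) - w + 1, w):
            index.append((name, off, tuple(seq[off:off + w])))
    return index

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def candidate_queries(stream_window, index, tolerance):
    # any indexed window close enough to the current stream window yields a candidate
    return {name for name, off, win in index
            if euclidean(stream_window, win) <= tolerance}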
data analysts need to understand the quality of data in the warehouse this is often done by issuing many group by queries on the sets of columns of interest since the volume of data in these warehouses can be large and tables in data warehouse often contain many columns this analysis typically requires executing large number of group by queries which can be expensive we show that the performance of today’s database systems for such data analysis is inadequate we also show that the problem is computationally hard and develop efficient techniques for solving it we demonstrate significant speedup over existing approaches on today’s commercial database systems
software bugs can cause tremendous financial loss and are a serious threat to life or physical condition in safety critical areas formal software verification with theorem provers aims at ensuring that no errors are present but is too expensive to be employed for full scale systems we show that these costs can be reduced significantly by reusing proofs and by the checker approach we demonstrate the applicability of our approach by case study checking the correctness of the scheduler of the popular gcc compiler for vliw processor where we indeed found an error
coca is fault tolerant and secure online certification authority that has been built and deployed both in local area network and in the internet extremely weak assumptions characterize environments in which coca’s protocols execute correctly no assumption is made about execution speed and message delivery delays channels are expected to exhibit only intermittent reliability and with coca servers up to may be faulty or compromised coca is the first system to integrate byzantine quorum system used to achieve availability with proactive recovery used to defend against mobile adversaries which attack compromise and control one replica for limited period of time before moving on to another in addition to tackling problems associated with combining fault tolerance and security new proactive recovery protocols had to be developed experimental results give quantitative evaluation for the cost and effectiveness of the protocols
publish subscribe applications are an important class of content based dissemination systems where the message transmission is defined by the message content rather than its destination ip address with the increasing use of xml as the standard format on many internet based applications xml aware pub sub applications become necessary in such systems the messages generated by publishers are encoded as xml documents and the profiles defined by subscribers as xml query statements as the number of documents and query requests grow the performance and scalability of the matching phase ie matching of queries to incoming documents become vital current solutions have limited or no flexibility to prune out queries in advance in this paper we overcome such limitation by proposing novel early pruning approach called bounding based xml filtering or boxfilter the boxfilter is based on new tree like indexing structure that organizes the queries based on their similarity and provides lower and upper bound estimations needed to prune queries not related to the incoming documents our experimental evaluation shows that the early profile pruning approach offers drastic performance improvements over the current state of the art in xml filtering
the objective of this paper is to empirically evaluate oomfpweb functional size measurement procedure for web applications we analyzed four data sets from family of experiments conducted in spain argentina and austria results showed that oomfpweb is efficient when compared to current industry practices oomfpweb produced reproducible functional size measurements and was perceived as easy to use and useful by the study participants who also expressed their intention to use oomfpweb in the future the analysis further supports the validity and reliability of the technology acceptance model tam based evaluation instrument used in the study
this paper studies under what conditions congestion control schemes can be both efficient so that capacity is not wasted and incentive compatible so that each participant can maximize its utility by following the prescribed protocol we show that both conditions can be achieved if routers run strict priority queueing spq or weighted fair queueing wfq and end hosts run any of family of protocols which we call probing increase educated decrease pied natural question is whether incentive compatibility and efficiency are possible while avoiding the per flow processing of wfq we partially address that question in the negative by showing that any policy satisfying certain locality condition cannot guarantee both properties our results also have implication for convergence to some steady state throughput for the flows even when senders transmit at fixed rate as in udp flow which does not react to congestion feedback effects among the routers can result in complex dynamics which do not appear in the simple topologies studied in past work
this paper defines and discusses the implementation of two novel extensions to the siena content based network cbn to extend it to become knowledge based network kbn thereby increasing the expressiveness and flexibility of its publications and subscription one extension provides ontological concepts as an additional message attribute type onto which subsumption relationships equivalence type queries and arbitrary ontological subscription filters can be applied the second extension provides for bag type to be used that allows bag equivalence sub bag and super bag relationships to be used in subscription filters possibly composed with any of the siena subscription operators or the ontological operators previously mentioned the performance of this kbn implementation has also been explored however to maintain scalability and performance it is important that these extensions do not break siena’s subscription aggregation algorithm we also introduce the necessary covering relationships for the new types and operators and examine the subscription matching overhead resulting from these new types and operators
using contextual equivalence aka observational equivalence to specify security properties is an important idea in the field of formal verification of cryptographic protocols while contextual equivalence is difficult to prove directly one is usually able to deduce it using the so called logical relations in typed calculi we apply this technique to the cryptographic metalanguage an extension of moggi’s computational calculus where we use stark’s model for name creation to explore the difficult aspect of dynamic key generation the categorical construction of logical relations for monadic types by goubault larrecq et al then allows us to derive logical relations over the category set although set is perfectly adequate model of dynamic key generation it lacks in some aspects when we study relations between programs in the metalanguage this leads us to an interesting exploration of what should be the proper category to consider we show that to define logical relations in the cryptographic metalanguage better choice of category is set that we proposed in zhang nowak logical relations for dynamic name creation in proceedings of the th international workshop of computer science logic and the th kurt godel colloquium csl kgl in lecture notes in computer science vol springer verlag pp however this category is still lacking in some subtler aspects and we propose refined category set to fix the flaws but our final choice is set which is equivalent to set we define the contextual equivalence based on set and show that the cryptographic logical relation derived over set is sound and can be used to verify protocols in practice
in this age of globalization organizations need to publish their microdata owing to legal directives or share it with business associates in order to remain competitive this puts personal privacy at risk to surmount this risk attributes that clearly identify individuals such as name social security number and driving license number are generally removed or replaced by random values but this may not be enough because such de identified databases can sometimes be joined with other public databases on attributes such as gender date of birth and zipcode to re identify individuals who were supposed to remain anonymous in the literature such an identity leaking attribute combination is called quasi identifier it is always critical to be able to recognize quasi identifiers and to apply to them appropriate protective measures to mitigate the identity disclosure risk posed by join attacks in this paper we start out by providing the first formal characterization and practical technique to identify quasi identifiers we show an interesting connection between whether set of columns forms quasi identifier and the number of distinct values assumed by the combination of the columns we then use this characterization to come up with probabilistic notion of anonymity again we show an interesting connection between the number of distinct values taken by combination of columns and the anonymity it can offer this allows us to find an ideal amount of generalization or suppression to apply to different columns in order to achieve probabilistic anonymity we work through many examples and show that our analysis can be used to make published database conform to privacy rules like hipaa in order to achieve probabilistic anonymity we observe that one needs to solve multiple dimensional anonymity problems we propose many efficient and scalable algorithms for achieving dimensional anonymity our algorithms are optimal in sense that they minimally distort data and retain much of its utility
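the distinct value characterization lends itself to a direct check; the sketch below is the heuristic it formalizes rather than the paper's algorithm, with max_width and ratio as illustrative parameters

from itertools import combinations
import pandas as pd

def quasi_identifier_candidates(df: pd.DataFrame, max_width=3, ratio=0.9):
    # flag column combinations whose number of distinct values is close to the
    # number of rows, since such combinations identify most rows almost uniquely
    n = len(df)
    found = []
    for width in range(1, max_width + 1):
        for cols in combinations(df.columns, width):
            distinct = df.drop_duplicates(subset=list(cols)).shape[0]
            if distinct / n >= ratio:
                found.append((cols, distinct / n))
    return found

# example call on the attributes named in the abstract
# quasi_identifier_candidates(df[["gender", "date_of_birth", "zipcode"]])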
automatic text classification is the problem of automatically assigning predefined categories to free text documents thus allowing for less manual labors required by traditional classification methods when we apply binary classification to multi class classification for text classification we usually use the one against the rest method in this method if document belongs to particular category the document is regarded as positive example of that category otherwise the document is regarded as negative example finally each category has positive data set and negative data set but this one against the rest method has problem that is the documents of negative data set are not labeled manually while those of positive set are labeled by human therefore the negative data set probably includes lot of noisy data in this paper we propose that the sliding window technique and the revised em expectation maximization algorithm are applied to binary text classification for solving this problem as result we can improve binary text classification through extracting potentially noisy documents from the negative data set using the sliding window technique and removing actually noisy documents using the revised em algorithm the results of our experiments showed that our method achieved better performance than the original one against the rest method in all the data sets and all the classifiers used in the experiments
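as a rough illustration of the noisy negative problem, the python sketch below trains an initial one against the rest classifier and drops negatives that score highly for the positive class; this single probability cut is only a simplification of the sliding window and revised em procedure described above

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def clean_negatives(pos_docs, neg_docs, cut=0.8):
    # fit a first classifier on the raw one-against-the-rest split
    vec = TfidfVectorizer()
    X = vec.fit_transform(pos_docs + neg_docs)
    y = [1] * len(pos_docs) + [0] * len(neg_docs)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # negatives that look strongly positive are treated as potentially mislabeled
    probs = clf.predict_proba(vec.transform(neg_docs))[:, 1]
    return [d for d, p in zip(neg_docs, probs) if p < cut]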
outsourcing data to third party data providers is becoming common practice for data owners to avoid the cost of managing and maintaining databases meanwhile due to the popularity of location based services lbs the need for spatial data eg gazetteers vector data is increasing exponentially consequently we are witnessing new trend of outsourcing spatial datasets by data collectors two main challenges with outsourcing datasets is to keep the data private from the data provider and ensure the integrity of the query result for the clients unfortunately most of the techniques proposed for privacy and integrity do not extend to spatial data in straightforward manner hence recent studies proposed various techniques to support either privacy or integrity but not both on spatial datasets in this paper for the first time we propose technique that can ensure both privacy and integrity for outsourced spatial data in particular we first use one way spatial transformation method based on hilbert curves which encrypts the spatial data before outsourcing and hence ensures its privacy next by probabilistically replicating portion of the data and encrypting it with different encryption key we devise technique for the client to audit the trustworthiness of the query results we show the applicability of our approach for both nearest neighbor and spatial range queries the building blocks of any lbs application finally we evaluate the validity and performance of our algorithms with real world datasets
considering that when users watch video with someone else they are used to making comments regarding its contents such as comment with respect to someone appearing in the video in previous work we exploited ubiquitous computing concepts to propose the watching and commenting authoring paradigm in which user’s comments are automatically captured so as to automatically generate corresponding annotated interactive video in this paper we revisit and extend our previous work and detail our prototype that supports the watching and editing paradigm discussing how ubiquitous computing platform may explore digital ink and associated gestures to support the authoring of multimedia content while enhancing the social aspects of video watching
cluster based server consists of front end dispatcher and multiple back end servers the dispatcher receives incoming jobs and then decides how to assign them to back end servers which in turn serve the jobs according to some discipline cluster based servers have been widely deployed as they combine good performance with low costs several assignment policies have been proposed for cluster based servers most of which aim to balance the load among back end servers there are two main strategies for load balancing the first aims to balance the amount of workload at back end servers while the second aims to balance the number of jobs assigned to back end servers examples of policies using these strategies are dynamic and lc least connected respectively in this paper we propose policy called lc which combines the two aforementioned strategies the paper shows experimentally that when preemption is admitted ie when jobs execute concurrently on back end servers lc substantially outperforms both dynamic and lc in terms of response time metrics this improved performance is achieved by using only information readily available to the dispatcher rendering lc practical policy to implement finally we study refinement called alc adaptive lc which further improves on the response time performance of lc by adapting its actions to incoming traffic rates
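a toy python dispatcher showing how the two balancing strategies can be combined; the exact combination rule of the proposed policy is not given in the abstract, so the tie breaking used here is an assumption of the sketch

class Dispatcher:
    def __init__(self, n_servers):
        self.jobs = [0] * n_servers      # active jobs per back-end (least-connected view)
        self.work = [0.0] * n_servers    # outstanding work per back-end (workload view)

    def assign(self, job_size):
        # prefer the back-end with the fewest active jobs, break ties by least work
        target = min(range(len(self.jobs)), key=lambda s: (self.jobs[s], self.work[s]))
        self.jobs[target] += 1
        self.work[target] += job_size
        return target

    def complete(self, server, job_size):
        self.jobs[server] -= 1
        self.work[server] -= job_size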
the growing popularity of hardware virtualization vmware and xen being two prominent implementations leads us to examine the common ground between this yet again vibrant technology and partial evaluation virtual machine executes on host hardware and presents to its guest program replica of that host environment complete with cpu memory and devices virtual machine can be seen as self interpreter program specializer is considered jones optimal if it is capable of removing layer of interpretational overhead we propose formulation of jones optimality which coincides with well known virtualization efficiency criterion fully abstract programming language translation an idea put forward by abadi is one that preserves program equivalences we may translate program by specializing self interpreter with respect to it we argue that full abstraction for such translations captures the notion of transparency whether or not program can determine if it is running on virtual machine in virtual machine folklore we hope that this discussion will encourage wider exchange of ideas between the virtualization and partial evaluation communities
the ability to integrate diverse components such as processor cores memories custom hardware blocks and complex network on chip noc communication frameworks onto single chip has greatly increased the design space available for system on chip soc designers efficient and accurate performance estimation tools are needed to assist the designer in making design decisions in this paper we present mc sim heterogeneous multi core simulator framework which is capable of accurately simulating variety of processor memory noc configurations and application specific coprocessors we also describe methodology to automatically generate fast cycle true behavioral based simulators for coprocessors using high level synthesis tool and integrate them with mc sim thus augmenting it with the capacity to simulate coprocessors our based simulators provide on an average improvement in simulation speed over that of rtl descriptions we have used this framework to simulate number of real life applications such as the mpeg decoder and litho simulation and experimented with number of design choices our simulator framework is able to accurately model the performance of these applications only off the actual implementation and allows us to explore the design space rapidly and achieve interesting design implementations
cache timing attacks are serious threat to security critical software we show that the combination of vector quantization and hidden markov model cryptanalysis is powerful tool for automated analysis of cache timing data it can be used to recover critical algorithm state such as key material we demonstrate its effectiveness by running an attack on the elliptic curve portion of openssl and under this involves automated lattice attacks leading to key recovery within hours we carry out the attack on live cache timing data without simulating the side channel showing these attacks are practical and realistic
little is known about the content of the major search engines we present an automatic learning method which trains an ontology with world knowledge of hundreds of different subjects in three level taxonomy covering the documents offered in our university library we then mine this ontology to find important classification rules and then use these rules to perform an extensive analysis of the content of the largest general purpose internet search engines in use today instead of representing documents and collections as set of terms we represent them as set of subjects which is highly efficient representation leading to more robust representation of information and decrease of synonymy
in this paper we review analyze and compare representations for simplicial complexes we classify such representations based on the dimension of the complexes they can encode into dimension independent structures and data structures for three and for two dimensional simplicial complexes we further classify the data structures in each group according to the basic kinds of the topological entities they represent we present description of each data structure in terms of the entities and topological relations encoded and we evaluate it based on its expressive power on its storage cost and on the efficiency in supporting navigation inside the complex ie in retrieving topological relations not explicitly encoded we compare the various data structures inside each category based on the above features
we investigate the randomized and quantum communication complexity of the hamming distance problem which is to determine if the hamming distance between two bit strings is no less than threshold we prove quantum lower bound of qubits in the general interactive model with shared prior entanglement we also construct classical protocol of log bits in the restricted simultaneous message passing model with public random coins improving previous protocols of bits ac yao on the power of quantum fingerprinting in proceedings of the th annual acm symposium on theory of computing pp and log bits gavinsky kempe de wolf quantum communication cannot simulate public coin quant ph
method for approximating spherical topology digital shapes by rational gaussian rag surfaces is presented points in shape are parametrized by approximating the shape with triangular mesh determining parameter coordinates at mesh vertices and finding parameter coordinates at shape points from interpolation of parameter coordinates at mesh vertices knowing the locations and parameter coordinates of the shape points the control points of rag surface are determined to approximate the shape with required accuracy the process starts from small set of control points and gradually increases the control points until the error between the surface and the digital shape reduces to required tolerance both triangulation and surface approximation proceed from coarse to fine therefore the method is particularly suitable for multiresolution creation and transmission of digital shapes over the internet application of the proposed method in editing of shapes is demonstrated
using the achievements of my research group over the last years i provide evidence to support the following hypothesis by complementing each other cooperating reasoning processes can achieve much more than they could if they only acted individually most of the work of my group has been on processes for mathematical reasoning and its applications eg to formal methods the reasoning processes we have studied include proof search by meta level inference proof planning abstraction analogy symmetry and reasoning with diagrams representation discovery formation and evolution by analysing diagnosing and repairing failed proof and planning attempts forming and repairing new concepts and conjectures and forming logical representations of informally stated problems other learning of new proof methods from example proofs finding counter examples reasoning under uncertainty the presentation of and interaction with proofs the automation of informal argument in particular we have studied how these different kinds of process can complement each other and cooperate to achieve complex goals we have applied this work to the following areas proof by mathematical induction and co induction analysis equation solving mechanics problems the building of ecological models the synthesis verification transformation and editing of both hardware and software including logic functional and imperative programs security protocols and process algebras the configuration of hardware game playing and cognitive modelling
formalization and quantification of the intuitive notion of relatedness between terms has long been major challenge for computing science and an intriguing problem for other sciences in this study we meet the challenge by considering general notion of relatedness between terms and given topic we introduce formal definition of relatedness measure based on term discrimination measures measurement of discrimination information mdi of terms is fundamental issue for many areas of science in this study we focus on mdi and present an in depth investigation into the concept of discrimination information conveyed in term information radius is an information measure relevant to wide variety of applications and is the basis of this investigation in particular we formally interpret discrimination measures in terms of simple but important property identified by this study and argue the interpretation is essential for guiding their application the discrimination measures can then naturally and conveniently be utilized to formalize and quantify the relatedness between terms and given topic some key points about the information radius discrimination measures and relatedness measures are also made an example is given to demonstrate how the relatedness measures can deal with some basic concepts of applications in the context of text information retrieval ir we summarize important features of and differences between the information radius and two other information measures from practical perspective the aim of this study is part of an attempt to establish theoretical framework with mdi at its core towards effective estimation of semantic relatedness between terms due to its generality our method can be expected to be useful tool with wide range of application areas
detecting outliers in large set of data objects is major data mining task aiming at finding different mechanisms responsible for different groups of objects in data set all existing approaches however are based on an assessment of distances sometimes indirectly by assuming certain distributions in the full dimensional euclidean data space in high dimensional data these approaches are bound to deteriorate due to the notorious curse of dimensionality in this paper we propose novel approach named abod angle based outlier detection and some variants assessing the variance in the angles between the difference vectors of point to the other points this way the effects of the curse of dimensionality are alleviated compared to purely distance based approaches main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking in thorough experimental evaluation we compare abod to the well established distance based method lof for various artificial and real world data set and show abod to perform especially well on high dimensional data
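the angle based outlier factor itself is compact enough to sketch directly; the code below is an o(n^3) reference version rather than any of the faster variants, computing for each point the variance of the distance weighted angle terms over all pairs of other points, with the smallest values ranked as the strongest outlier candidates

import numpy as np
from itertools import combinations

def abof(X, i):
    # variance of the weighted angle spectrum of point i
    p = X[i]
    others = [X[j] for j in range(len(X)) if j != i]
    vals = []
    for a, b in combinations(others, 2):
        da, db = a - p, b - p
        na, nb = np.dot(da, da), np.dot(db, db)
        if na == 0 or nb == 0:                 # skip exact duplicates of p
            continue
        vals.append(np.dot(da, db) / (na * nb))
    return np.var(vals)

def rank_outliers(X):
    # smallest factor first: a narrow angle spectrum suggests an outlier
    scores = np.array([abof(X, i) for i in range(len(X))])
    return np.argsort(scores)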
many scalable data mining tasks rely on active learning to provide the most useful accurately labeled instances however what if there are multiple labeling sources oracles or experts with different but unknown reliabilities with the recent advent of inexpensive and scalable online annotation tools such as amazon’s mechanical turk the labeling process has become more vulnerable to noise and without prior knowledge of the accuracy of each individual labeler this paper addresses exactly such challenge how to jointly learn the accuracy of labeling sources and obtain the most informative labels for the active learning task at hand minimizing total labeling effort more specifically we present iethresh interval estimate threshold as strategy to intelligently select the expert with the highest estimated labeling accuracy iethresh estimates confidence interval for the reliability of each expert and filters out the one whose estimated upper bound confidence interval is below threshold which jointly optimizes expected accuracy mean and need to better estimate the expert’s accuracy variance our framework is flexible enough to work with wide range of different noise levels and outperforms baselines such as asking all available experts and random expert selection in particular iethresh achieves given level of accuracy with less than half the queries issued by all experts labeling and less than third the queries required by random expert selection on datasets such as the uci mushroom one the results show that our method naturally balances exploration and exploitation as it gains knowledge of which experts to rely upon and selects them with increasing frequency
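a hedged python sketch of the interval estimate idea; the t based bound and the relative threshold are assumptions of the sketch rather than the exact iethresh formulation

import numpy as np
from scipy import stats

def upper_interval(observations, alpha=0.05):
    # upper end of a t-based confidence interval on an expert's observed accuracy
    obs = np.asarray(observations, dtype=float)
    if len(obs) < 2:
        return 1.0                       # stay optimistic until evidence accumulates
    m, se = obs.mean(), stats.sem(obs)
    return m + stats.t.ppf(1 - alpha / 2, len(obs) - 1) * se

def select_experts(history, threshold=0.9):
    # history: dict expert -> list of 0/1 agreement outcomes with accepted labels
    ui = {e: upper_interval(h) for e, h in history.items()}
    best = max(ui.values())
    return [e for e, u in ui.items() if u >= threshold * best]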
continuous nearest neighbor queries knn are defined as finding the nearest points of interest along an entire path eg finding the three nearest gas stations to moving car on any point of pre specified path the result of this type of query is set of intervals or split points and their corresponding knns such that the knns of all points within each interval are the same the current studies on knn focus on vector spaces where the distance between two objects is function of their spatial attributes eg euclidean distance metric these studies are not applicable to spatial network databases sndb where the distance between two objects is function of the network connectivity eg shortest path between two objects in this paper we propose two techniques to address knn queries in sndb intersection examination ie and upper bound algorithm uba with ie we first find the knns of all nodes on path and then for those adjacent nodes whose nearest neighbors are different we find the intermediate split points finally we compute the knns of the split points using the knns of the surrounding nodes the intuition behind uba is that the performance of ie can be improved by determining the adjacent nodes that cannot have any split points in between and consequently eliminating the computation of knn queries for those nodes our empirical experiments show that the uba approach outperforms ie especially when the points of interest are sparsely distributed in the network
the real time specification for java rtsj extends java’s support for platform independence into the realms of real time systems it supports an environment that facilitates on line feasibility analysis however an rtsj program is normally run on an execution platform that might be shared with other possibly non java applications this paper explores how the notion of service contracts can be used to enhance the platform independence of rtsj applications it also considers the role of real time application components within an rtsj environment
currently the web is an important part of people’s personal professional and social life and millions of services are becoming available online at the same time many efforts are made to semantically describe web services and several frameworks have been proposed ie wsmo sawsdl etc the web follows decentralized architecture thus all the services are available at some location but finding this location remains an open issue many efforts have been proposed to solve the service discovery problem but none of them took up in this work lightweight approach for service discovery is proposed our approach comprises of three main phases firstly during the crawling phase the semantic service descriptions are retrieved and stored locally afterwards in the homogenization phase the semantics of every description are mapped to service meta model and the resulting triples are stored in rdf repository finally at the search phase users are enabled to query the underlying repository and find online services
software birthmark is the inherent characteristics of program extracted from the program itself by comparing birthmarks we can detect whether program is copy of another program or not we propose static api birthmark for windows executables that utilizes sets of api calls identified by disassembler statically by comparing windows executables we show that our birthmark can distinguish similar programs and detect copies by comparing binaries generated by various compilers we also demonstrate that our birthmark is resilient we compare our birthmark with previous windows dynamic birthmark to show that it is more appropriate for gui applications
in this paper we survey recent advances in mobility modeling for mobile ad hoc network research the advances include some new mobility models and analysis of older mobility models first we classify mobility models into three categories according to the degree of randomness we introduce newly proposed mobility models in each of these categories next we discuss analysis for existing mobility models we describe the analysis work in three parts the first part is the statistical properties of the most widely used random waypoint model the second part describes the mobility metrics that aim to capture the characteristics of different mobility patterns the last part is the impact of mobility models on the performance of protocols we also describe some possible future work
we study the problem of global predicate detection in presence of permanent and transient failures we term the transient failures as small faults we show that it is impossible to detect predicates in an asynchronous distributed system prone to small faults even if nodes are equipped with powerful device known as failure detector sequencer denoted by to redress this impossibility we introduce theoretical device known as small fault sequencer denoted by Σsf and show that Σsf is necessary and sufficient for predicate detection unfortunately we also show that Σsf cannot be implemented even in synchronous distributed system fortunately however we show that predicate detection can be achieved with high probability in synchronous systems
in weighted directed graph an bounded leg path is one whose constituent edges have length at most for any fixed computing bounded leg shortest paths is just as easy as the standard shortest path algorithm in this paper we study approximate distance oracles and reachability oracles for bounded leg path problems where the leg bound is not known in advance but forms part of the query bounded leg path problems are more complicated than standard shortest path problems because the number of distinct shortest paths between two vertices over all leg bounds could be as large as the number of edges in the graph the bounded leg constraint models situations where there is some limited resource that must be spent when traversing an edge for example the size of fuel tank or the life of battery places hard limit on how far vehicle can travel in one leg before refueling or recharging someone making long road trip may place hard limit on how many hours they are willing to drive in any one day our main result is nearly optimal algorithm for preprocessing directed graph in order to answer approximate bounded leg distance and bounded leg shortest path queries in particular we can preprocess any graph in Õ time producing data structure with size Õ that answers ε approximate bounded leg distance queries in log log time if the corresponding ε approximate shortest path has edges it can be returned in log log time these bounds are all within polylog factors of the best standard all pairs shortest path algorithm and improve substantially the previous best bounded leg shortest path algorithm whose preprocessing time and space are and Õ we also consider bounded leg oracles in other situations in the context of planar directed graphs we give time space tradeoff for answering bounded leg reachability queries for any ≥ we can build data structure with size kn that answers reachability queries in time Õ nk −
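for the fixed bound case mentioned at the start of the abstract the reduction to an ordinary shortest path computation is immediate; the sketch below simply ignores edges longer than the leg bound and runs dijkstra, while the oracle construction for unknown bounds is of course far more involved

import heapq

def bounded_leg_dijkstra(graph, source, leg_bound):
    # graph: dict u -> list of (v, length); returns shortest bounded-leg distances
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if w > leg_bound:                 # this edge exceeds the leg bound
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist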
three novel text vector representation approaches for neural network based document clustering are proposed the first is the extended significance vector model esvm the second is the hypernym significance vector model hsvm and the last is the hybrid vector space model hym esvm extracts the relationship between words and their preferred classified labels hsvm exploits semantic relationship from the wordnet ontology more general term the hypernym substitutes for terms with similar concepts this hypernym semantic relationship supplements the neural model in document clustering hym is combination of tfxidf vector and hypernym significance vector which combines the advantages and reduces the disadvantages from both unsupervised and supervised vector representation approaches according to our experiments the self organising map som model based on the hym text vector representation approach is able to improve classification accuracy and to reduce the average quantization error aqe on full text articles
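a minimal python sketch of the hypernym substitution step using wordnet through nltk; the significance weighting and the som based clustering of the paper are not reproduced, and hypernym_of and generalise are names chosen for this sketch

from nltk.corpus import wordnet as wn

def hypernym_of(term, depth=1):
    # walk up the wordnet hierarchy to a more general concept
    synsets = wn.synsets(term)
    if not synsets:
        return term
    syn = synsets[0]
    for _ in range(depth):
        hypernyms = syn.hypernyms()
        if not hypernyms:
            break
        syn = hypernyms[0]
    return syn.lemma_names()[0]

def generalise(tokens):
    # replace each token by a hypernym before building the document vector
    return [hypernym_of(t) for t in tokens]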
component interaction automata provide fitting model to capture and analyze the temporal facets of hierarchical structured component oriented software systems however the rules governing composition as is typical for all automata based approaches suffer from combinatorial state explosion an effect that can have significant ramifications on the successful application of the component interaction automata formalism to real world scenarios we must therefore find some appropriate means to counteract state explosion one of which is partition refinement through weak bisimulation but while this technique can yield the desired state space reduction it does not consider synchronization cliques ie groups of states that are interconnected solely by internal synchronization transitions synchronization cliques give rise to action prefixes that capture pre conditions for component’s ability to interact with the environment current practice does not pay attention to these cliques but ignoring them can result in loss of valuable information for this reason we show in this paper how synchronization cliques emerge and how we can capture their behavior in order to make state space reduction aware of the presence of synchronization cliques
aggressive technology scaling into the nanometer regime has led to host of reliability challenges in the last several years unlike on chip caches which can be efficiently protected using conventional schemes the general core area is less homogeneous and structured making tolerating defects much more challenging problem due to the lack of effective solutions disabling non functional cores is common practice in industry to enhance manufacturing yield which results in significant reduction in system throughput although faulty core cannot be trusted to correctly execute programs we observe in this work that for most defects when starting from valid architectural state execution traces on defective core actually coarsely resemble those of fault free executions in light of this insight we propose robust and heterogeneous core coupling execution scheme necromancer that exploits functionally dead core to improve system throughput by supplying hints regarding high level program behavior we partition the cores in conventional cmp system into multiple groups in which each group shares lightweight core that can be substantially accelerated using these execution hints from potentially dead core to prevent this undead core from wandering too far from the correct path of execution we dynamically resynchronize architectural state with the lightweight core for core cmp system on average our approach enables the coupled core to achieve of the performance of fully functioning core this defect tolerance and throughput enhancement comes at modest area and power overheads of and respectively
research on security techniques for java bytecode has paid little attention to the security of the implementations of the techniques themselves assuming that ordinary tools for programming verification and testing are sufficient for security however different categories of security policies and mechanisms usually require different implementations each implementation requires extensive effort to test it and or verify it we show that programming with well typed pattern structures in statically well typed language makes it possible to implement static byte code verification in fully type safe and highly adaptive way with security policies being fed in as first order parameters this reduces the effort required to verify security of an implementation itself and the programming need for new policies also bytecode instrumentation can be handled in exactly the same way the approach aims at reducing the workload of building and understanding distributed systems especially those of mobile code
sophisticated commonsense knowledgebase is essential for many intelligent system applications this paper presents methodology for automatically retrieving event based commonsense knowledge from the web the approach is based on matching the text in web search results to designed lexico syntactic patterns we apply semantic role labeling technique to parse the extracted sentences so as to identify the essential knowledge associated with the event described in each sentence particularly we propose semantic role substitution strategy to prune knowledge items that have high probability of erroneously parsed semantic roles the experimental results in case study on retrieving the capable of type of knowledge show that the accuracy of the retrieved commonsense knowledge is around
we address the difficult question of inferring plausible node mobility based only on information from wireless contact traces working with mobility information allows richer protocol simulations particularly in dense networks but requires complex set ups to measure whereas contact information is easier to measure but only allows for simplistic simulation models in contact trace lot of node movement information is irretrievably lost so the original positions and velocities are in general out of reach we propose fast heuristic algorithm inspired by dynamic force based graph drawing capable of inferring plausible movement from any contact trace and evaluate it on both synthetic and real life contact traces our results reveal that the quality of the inferred mobility is directly linked to the precision of the measured contact trace and ii the simple addition of appropriate anticipation forces between nodes leads to an accurate inferred mobility
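the force based intuition can be sketched in a few lines of python; this is a simplification written for this text rather than the authors' heuristic, with r and k as assumed constants: contacts pull nodes to within radio range and non contacts push them apart

import numpy as np

def step(pos, contacts, r=10.0, k=0.05):
    # pos: (n, 2) float array of positions; contacts: set of (i, j) pairs with i < j
    n = len(pos)
    force = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            delta = pos[j] - pos[i]
            d = np.linalg.norm(delta) + 1e-9
            if (i, j) in contacts and d > r:        # attract until within range
                f = k * (d - r) * delta / d
            elif (i, j) not in contacts and d < r:  # repel until out of range
                f = -k * (r - d) * delta / d
            else:
                continue
            force[i] += f
            force[j] -= f
    return pos + force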
multi touch input on interactive surfaces has matured as device for bimanual interaction and invoked widespread research interest we contribute empirical work on direct versus indirect use of multi touch input comparing direct input on tabletop display with an indirect condition where the table is used as input surface to separate vertically arranged display surface users perform significantly better in the direct condition however our experiments show that this is primarily the case for pointing with comparatively little difference for dragging tasks we observe that an indirect input arrangement impacts strongly on the users’ fluidity and comfort of hovering movement over the surface and suggest investigation of techniques that allow users to rest their hands on the surface as default position for interaction
we apply simplified image based lighting methods to reduce the equipment cost time and specialized skills required for high quality photographic lighting of desktop sized static objects such as museum artifacts we place the object and computer steered moving head spotlight inside simple foam core enclosure and use camera to record photos as the light scans the box interior optimization guided by interactive user sketching selects small set of these photos whose weighted sum best matches the user defined target sketch unlike previous image based relighting efforts our method requires only single area light source yet it can achieve high resolution light positioning to avoid multiple sharp shadows reduced version uses only handheld light and may be suitable for battery powered field photography equipment that fits into backpack
online detection of video clips that present previously unseen events in video stream is still an open challenge to date for this online new event detection oned task existing studies mainly focus on optimizing the detection accuracy instead of the detection efficiency as result it is difficult for existing systems to detect new events in real time especially for large scale video collections such as the video content available on the web in this paper we propose several scalable techniques to improve the video processing speed of baseline oned system by orders of magnitude without sacrificing much detection accuracy first we use text features alone to filter out most of the non new event clips and to skip those expensive but unnecessary steps including image feature extraction and image similarity computation second we use combination of indexing and compression methods to speed up text processing we implemented prototype of our optimized oned system on top of ibm’s system the effectiveness of our techniques is evaluated on the standard trecvid benchmark which demonstrates that our techniques can achieve fold speedup with detection accuracy degraded less than
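the two stage filtering idea can be illustrated with a small python sketch; this is an assumption about the pipeline rather than the system's actual code, and the expensive image stage is left as a placeholder callback

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class OnlineNewEventDetector:
    def __init__(self, text_threshold=0.3, image_is_new=None):
        self.seen_texts, self.seen_clips = [], []
        self.text_threshold = text_threshold
        # hypothetical expensive stage; plug in the real image-feature comparison here
        self.image_is_new = image_is_new or (lambda clip, seen: True)

    def is_new(self, clip_text, clip):
        verdict = True
        if self.seen_texts:
            vec = TfidfVectorizer().fit(self.seen_texts + [clip_text])
            sims = cosine_similarity(vec.transform([clip_text]),
                                     vec.transform(self.seen_texts))[0]
            if sims.max() >= self.text_threshold:
                verdict = False          # cheap text filter already marks it as old
            else:
                verdict = self.image_is_new(clip, self.seen_clips)
        self.seen_texts.append(clip_text)
        self.seen_clips.append(clip)
        return verdict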
in the last several decades it has become an important basis to retrieve images from image databases idbs by the semantic information held in the image objects and the spatial patterns formed by these objects in this paper we propose new method for similarity retrieval of symbolic images by both the attributes and the spatial relationships of the contained objects the proposed method cpm common pattern method retains the common spatial patterns of two images in new data structures cpdag common pattern directed acyclic graph and performs the similarity calculation efficiently in practice the conducted experiments use both synthetic dataset and an existing image database the experimental results show that cpm outperforms lcsclique sim sim and be string for average efficiency and effectiveness cpm also has steady efficiency while the number of image objects and the object symbol duplication rates increase
multi display groupware mdg systems which typically comprise both public and personal displays promise to enhance collaboration yet little is understood about how they differ in use from single display groupware sdg systems while research has established the technical feasibility of mdg systems evaluations have not addressed the question of how users behave in such environments how their interface design can impact group behavior or what advantages they offer for collaboration this paper presents user study that investigates the impact of display configuration and software interface design on taskwork and teamwork groups of three completed collaborative optimization task in single and multi display environments under different task interface constraints our results suggest that mdg configurations offer advantages for performing individual task duties whereas sdg conditions offer advantages for coordinating access to shared resources the results also reveal the importance of ergonomic design considerations when designing co located groupware systems
in semantic and object oriented data models each class has one or more typing properties that associate it to other classes and carry type information about all instances of the class we introduce new kind of property that we call instance typing property an instance typing property associates an instance of class to another class and carries type information about that particular instance and not about all instances of the class instance typing properties are important as they allow to represent summary information about an instance in addition to specific information in this paper we study inheritance of properties from class to an instance using type information about the class as well as type information about the instance this kind of inheritance that we call contextual instance inheritance provides us with the most specific type information about the instance in particular context intuitively context is metaclass of interest with respect to which this most specific information is determined we demonstrate that contextual instance inheritance is powerful conceptual modeling mechanism capable of expressing valuable information about instances we also provide framework in which derived instance inherited properties can be represented and retrieved in the same way as usual properties
new aggressive algorithm for the elimination of partially dead code is presented ie of code which is only dead on some program paths besides being more powerful than the usual approaches to dead code elimination this algorithm is optimal in the following sense partially dead code remaining in the resulting program cannot be eliminated without changing the branching structure or the semantics of the program or without impairing some program executions our approach is based on techniques for partial redundancy elimination besides some new technical problems there is significant difference here partial dead code elimination introduces second order effects which we overcome by means of exhaustive motion and elimination steps the optimality and the uniqueness of the program obtained is proved by means of new technique which is universally applicable and particularly useful in the case of mutually interdependent program optimizations
ridges are characteristic curves of surface that mark salient intrinsic features of its shape and are therefore valuable for shape matching surface quality control visualization and various other applications ridges are loci of points on surface where either of the principal curvatures attain critical value in its respective principal direction these curves have complex behavior near umbilics on surface and may also pass through certain turning points causing added complexity for ridge computation we present new algorithm for numerically tracing ridges on spline surfaces that also accurately captures ridge behavior at umbilics and ridge turning points the algorithm traverses ridge segments by detecting ridge points while advancing and sliding in principal directions on surface in novel manner thereby computing connected curves of ridge points the output of the algorithm is set of curve segments some or all of which may be selected for other applications such as those mentioned above the results of our technique are validated by comparison with results from previous research and with brute force domain sampling technique
this paper addresses the foundations of data model transformation catalog of data mappings is presented which includes abstraction and representation relations and associated constraints these are justified in an algebraic style via the pointfree transform technique whereby predicates are lifted to binary relation terms of the algebra of programming in two level style encompassing both data and operations this approach to data calculation which also includes transformation of recursive data models into flat database schemes is offered as alternative to standard database design from abstract models the calculus is also used to establish link between the proposed transformational style and bidirectional lenses developed in the context of the classical view update problem
existing data analysis techniques have difficulty in handling multidimensional data multidimensional data has been challenge for data analysis because of the inherent sparsity of the points in this paper we first present novel data preprocessing technique called shrinking which optimizes the inherent characteristic of distribution of data this data reorganization concept can be applied in many fields such as pattern recognition data clustering and signal processing then as an important application of the data shrinking preprocessing we propose shrinking based approach for multidimensional data analysis which consists of three steps data shrinking cluster detection and cluster evaluation and selection the process of data shrinking moves data points along the direction of the density gradient thus generating condensed widely separated clusters following data shrinking clusters are detected by finding the connected components of dense cells and evaluated by their compactness the data shrinking and cluster detection steps are conducted on sequence of grids with different cell sizes the clusters detected at these scales are compared by cluster wise evaluation measurement and the best clusters are selected as the final result the experimental results show that this approach can effectively and efficiently detect clusters in both low and high dimensional spaces
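a minimal python sketch of the grid based shrinking and dense cell clustering steps described above assuming a single fixed cell size rather than the sequence of grids used in the paper all function names and parameters are illustrative

# a minimal sketch of grid-based data shrinking and dense-cell clustering,
# assuming one fixed cell size; names and thresholds are illustrative
import numpy as np
from collections import defaultdict, deque

def shrink(points, cell_size, iterations=3, density_threshold=3):
    pts = points.astype(float)
    for _ in range(iterations):
        cells = defaultdict(list)
        for i, p in enumerate(pts):
            cells[tuple(np.floor(p / cell_size).astype(int))].append(i)
        moved = pts.copy()
        for key, idx in cells.items():
            # gather points of this cell and its neighbouring cells
            neigh = []
            for d in np.ndindex(*(3,) * pts.shape[1]):
                nkey = tuple(np.array(key) + np.array(d) - 1)
                neigh.extend(cells.get(nkey, []))
            if len(neigh) >= density_threshold:
                centroid = pts[neigh].mean(axis=0)
                # move the cell's points along the local density gradient
                moved[idx] = pts[idx] + 0.5 * (centroid - pts[idx])
        pts = moved
    return pts

def dense_cell_clusters(pts, cell_size, density_threshold=3):
    cells = defaultdict(list)
    for i, p in enumerate(pts):
        cells[tuple(np.floor(p / cell_size).astype(int))].append(i)
    dense = {k for k, v in cells.items() if len(v) >= density_threshold}
    clusters, seen = [], set()
    for start in dense:                      # connected components of dense cells
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            c = queue.popleft()
            comp.extend(cells[c])
            for d in np.ndindex(*(3,) * pts.shape[1]):
                n = tuple(np.array(c) + np.array(d) - 1)
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        clusters.append(comp)
    return clusters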
program specialization is program transformation methodology which improves program efficiency by exploiting the information about the input data which are available at compile time we show that current techniques for program specialization based on partial evaluation do not perform well on nondeterministic logic programs we then consider set of transformation rules which extend the ones used for partial evaluation and we propose strategy for guiding the application of these extended rules so to derive very efficient specialized programs the efficiency improvements which sometimes are exponential are due to the reduction of nondeterminism and to the fact that the computations which are performed by the initial programs in different branches of the computation trees are performed by the specialized programs within single branches in order to reduce nondeterminism we also make use of mode information for guiding the unfolding process to exemplify our technique we show that we can automatically derive very efficient matching programs and parsers for regular languages the derivations we have performed could not have been done by previously known partial evaluation techniques
pre post condition based specifications are common place in variety of software engineering activities that range from requirements through to design and implementation the fragmented nature of these specifications can hinder validation as it is difficult to understand if the specifications for the various operations fit together well in this paper we propose novel technique for automatically constructing abstractions in the form of behaviour models from pre post condition based specifications the level of abstraction at which such models are constructed preserves enabledness of sets of operations resulting in finite model that is intuitive to validate and which facilitates tracing back to the specification for debugging the paper also reports on the application of the approach to an industrial strength protocol specification in which concerns were identified
this paper addresses the problem of scheduling dag of unit length tasks on asynchronous processors that is processors having different and changing speeds the objective is to minimize the makespan that is the time to execute the entire dag asynchrony is modeled by an oblivious adversary which is assumed to determine the processor speeds at each point in time the oblivious adversary may change processor speeds arbitrarily and arbitrarily often but makes speed decisions independently of any random choices of the scheduling algorithm this paper gives bounds on the makespan of two randomized online firing squad scheduling algorithms all and level these two schedulers are shown to have good makespan even when asynchrony is arbitrarily extreme let n and l denote respectively the number of tasks and the longest path in the dag and let s_ave denote the average speed of the processors during the execution in all each processor repeatedly chooses random task to execute from among all ready tasks tasks whose predecessors have been executed scheduler all is shown to have makespan bounded in terms of n l the number of processors p and s_ave with the bound depending on how n compares to p both expected and with high probability family of dags is exhibited for which this analysis is tight in level each of the processors repeatedly chooses random task to execute from among all critical tasks ready tasks at the lowest level of the dag this second scheduler is shown to have makespan of
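a minimal simulation sketch of the all policy on dag of unit length tasks assuming discrete steps with adversary supplied per step speeds the step based speed model and the function names are assumptions of this sketch not the paper's analysis model

# a minimal simulation sketch of the "all" scheduler: every processor repeatedly
# picks a uniformly random ready task; discrete steps and adversary-chosen speeds
# are simplifying assumptions of this sketch
import random

def simulate_all(successors, num_preds, speeds, num_procs, max_steps=10**6):
    """successors: task -> list of successor tasks
    num_preds: task -> number of predecessors
    speeds(step): list of num_procs speeds in [0, 1] chosen by an oblivious adversary"""
    remaining = dict(num_preds)
    ready = {t for t, c in remaining.items() if c == 0}
    done = set()
    assignment = {p: None for p in range(num_procs)}   # task each processor works on
    work = {p: 0.0 for p in range(num_procs)}          # accumulated work on that task
    for step in range(max_steps):
        if len(done) == len(remaining):
            return step                                # makespan in steps
        rates = speeds(step)
        for p in range(num_procs):
            task = assignment[p]
            if task is None or task in done:           # (re)pick a random ready task
                if not ready:
                    continue
                task = random.choice(tuple(ready))
                assignment[p], work[p] = task, 0.0
            work[p] += rates[p]
            if work[p] >= 1.0 and task not in done:    # unit-length task completed
                done.add(task)
                ready.discard(task)
                for s in successors.get(task, []):
                    remaining[s] -= 1
                    if remaining[s] == 0:
                        ready.add(s)
    return max_steps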
systemc simulation kernel consists of deterministic implementation of the scheduler whose specification is non deterministic to leverage testing of systemc tlm design we focus on automatically exploring all possible behaviors of the design for given data input we combine static and dynamic partial order reduction techniques with systemc semantics to intelligently explore subset of the possible traces while still being provably sufficient for detecting deadlocks and safety property violations we have implemented our exploration algorithm in framework called satya and have applied it to variety of examples including the tac benchmark using satya we automatically found an assertion violation in benchmark distributed as part of the osci repository
because of their size service times and drain on server resources multimedia objects require specialized replication systems in order to meet demand and ensure content availability we present novel method for creating replication systems where the replicated objects sizes and or per object service times are large such replication systems are well suited to delivering multimedia objects on the internet assuming that user request patterns to the system are known we show how to create replication systems that distribute read load to servers in proportion to their contribution to system capacity and experimentally show the positive load distribution properties of such systems however when user request patterns differ from what the system was designed for system performance will be affected therefore we also report on results that reveal how server loads are affected and ii the impact two system design parameters indicators of system’s load distribution qualities have on server load when request patterns differ from that for which system was designed
in this paper new methodology for tolerating link as well as node defects in self adaptive reconfigurable networks will be presented currently networked embedded systems need certain level of redundancy for each node and link in order to tolerate defects and failures in network due to monetary constraints as well as space and power limitations the replication of each node and link is not an option in most embedded systems therefore we will present hardware software partitioning algorithm for reconfigurable networks that optimizes the task binding onto resources at runtime such that node link defects can be handled and data traffic on links between computational nodes will be minimized this paper presents new hardware software partitioning algorithm an experimental evaluation and for demonstrating the realizability an implementation on network of fpga based boards
enrichment of text documents with semantic metadata reflecting their meaning facilitates document organization indexing and retrieval however most web data remain unstructured because of the difficulty and the cost of manually annotating text in this work we present cerno framework for semi automatic semantic annotation of textual documents according to domain specific semantic model the proposed framework is founded on light weight techniques and tools intended for legacy code analysis and markup to illustrate the feasibility of our proposal we report experimental results of its application to two different domains these results suggest that light weight semi automatic techniques for semantic annotation are feasible require limited human effort for adaptation to new domain and demonstrate markup quality comparable with state of the art methods
in peer to peer pp systems peers often must interact with unknown or unfamiliar peers without the benefit of trusted third parties or authorities to mediate the interactions trust management through reputation mechanism to facilitate such interactions is recognized as an important element of pp systems it is however faced by the problems of how to stimulate reputation information sharing and honest recommendation elicitation this paper presents icrep an incentive compatible reputation mechanism for pp systems icrep has two unique features recommender’s credibility and level of confidence about the recommendation is considered in order to achieve more accurate calculation of reputations and fair evaluation of recommendations ii incentive for participation and honest recommendation is implemented through fair differential service mechanism it relies on peer’s level of participation and on the recommendation credibility theoretic analysis and simulation show that icrep can help peers effectively detect dishonest recommendations in variety of scenarios where more complex malicious strategies are introduced moreover it can also stimulate peers to send sufficiently honest recommendations
several authors have shown that when labeled data are scarce improved classifiers can be built by augmenting the training set with large set of unlabeled examples and then performing suitable learning these works assume each unlabeled sample originates from one of the known classes here we assume each unlabeled sample comes from either known or from heretofore undiscovered class we propose novel mixture model which treats as observed data not only the feature vector and the class label but also the fact of label presence absence for each sample two types of mixture components are posited predefined components generate data from known classes and assume class labels are missing at random nonpredefined components only generate unlabeled data ie they capture exclusively unlabeled subsets consistent with an outlier distribution or new classes the predefined nonpredefined natures are data driven learned along with the other parameters via an extension of the em algorithm our modeling framework addresses problems involving both the known and unknown classes robust classifier design classification with rejections and identification of the unlabeled samples and their components from unknown classes case is step toward new class discovery experiments are reported for each application including topic discovery for the reuters domain experiments also demonstrate the value of label presence absence data in learning accurate mixtures
the longstanding problem of automatic table interpretation still eludes us its solution would not only be an aid to table processing applications such as large volume table conversion but would also be an aid in solving related problems such as information extraction semantic annotation and semi structured data management in this paper we offer solution for the common special case in which so called sibling pages are available the sibling pages we consider are pages on the hidden web commonly generated from underlying databases our system compares them to identify and connect nonvarying components category labels and varying components data values we tested our solution using more than tables in source pages from three different domains car advertisements molecular biology and geopolitical information experimental results show that the system can successfully identify sibling tables generate structure patterns interpret tables using the generated patterns and automatically adjust the structure patterns as it processes sequence of hidden web pages for these activities the system was able to achieve an overall measure of further given that we can automatically interpret tables we next show that this leads immediately to conceptualization of the data in these interpreted tables and thus also to way to semantically annotate these interpreted tables with respect to the ontological conceptualization labels in nested table structures yield ontological concepts and interrelationships among these concepts and associated data values become annotated information we further show that semantically annotated data leads immediately to queriable data thus the entire process which is fully automatic transform facts embedded within tables into facts accessible by standard query engines
how things work visualizations use variety of visual techniques to depict the operation of complex mechanical assemblies we present an automated approach for generating such visualizations starting with cad model of an assembly we first infer the motions of individual parts and the interactions between parts based on their geometry and few user specified constraints we then use this information to generate visualizations that incorporate motion arrows frame sequences and animation to convey the causal chain of motions and mechanical interactions between parts we present results for wide variety of assemblies
new framework is presented for both understanding and developing graph cut based combinatorial algorithms suitable for the approximate optimization of very wide class of markov random fields mrfs that are frequently encountered in computer vision the proposed framework utilizes tools from the duality theory of linear programming in order to provide an alternative and more general view of state of the art techniques like the alpha expansion algorithm which is included merely as special case moreover contrary to alpha expansion the derived algorithms generate solutions with guaranteed optimality properties for much wider class of problems for example even for mrfs with nonmetric potentials in addition they are capable of providing per instance suboptimality bounds in all occasions including discrete mrfs with an arbitrary potential function these bounds prove to be very tight in practice that is very close to which means that the resulting solutions are almost optimal our algorithms effectiveness is demonstrated by presenting experimental results on variety of low level vision tasks such as stereo matching image restoration image completion and optical flow estimation as well as on synthetic problems
in traditional game theory players are typically endowed with exogenously given knowledge of the structure of the game either full omniscient knowledge or partial but fixed information in real life however people are often unaware of the utility of taking particular action until they perform research into its consequences in this paper we model this phenomenon we imagine player engaged in question and answer session asking questions both about his or her own preferences and about the state of reality thus we call this setting socratic game theory in socratic game players begin with an a priori probability distribution over many possible worlds with different utility function for each world players can make queries at some cost to learn partial information about which of the possible worlds is the actual world before choosing an action we consider two query models an unobservable query model in which players learn only the response to their own queries and an observable query model in which players also learn which queries their opponents made the results in this paper consider cases in which the underlying worlds of two player socratic game are either constant sum games or strategically zero sum games class that generalizes constant sum games to include all games in which the sum of payoffs depends linearly on the interaction between the players when the underlying worlds are constant sum we give polynomial time algorithms to find nash equilibria in both the observable and unobservable query models when the worlds are strategically zero sum we give efficient algorithms to find nash equilibria in unobservable query socratic games and correlated equilibria in observable query socratic games
this paper describes mimic an adaptive mixed initiative spoken dialogue system that provides movie showtime information mimic improves upon previous dialogue systems in two respects first it employs initiative oriented strategy adaptation to automatically adapt response generation strategies based on the cumulative effect of information dynamically extracted from user utterances during the dialogue second mimic’s dialogue management architecture decouples its initiative module from the goal and response strategy selection processes providing general framework for developing spoken dialogue systems with different adaptation behavior
although modern graphics hardware has strong capability to render millions of triangles within second huge scenes are still unable to be rendered in real time lots of parallel and distributed graphics systems are explored to solve this problem however none of them is built for large scale graphics applications we designed anygl large scale hybrid distributed graphics system which consists of four types of logical nodes geometry distributing node geometry rendering node image composition node and display node the first two types of logical nodes are combined to be sort first graphics architecture while the others compose images new state tracking method based on logical timestamp is also proposed for state tracking of large scale distributed graphics systems besides three classes of compression are employed to reduce the requirement of network bandwidth including command code compression geometry compression and image compression new extension global share of textures and display lists is also implemented in anygl to avoid memory explosion in large scale cluster rendering systems
in this paper we take closer look at the security of out sourced databases aka database as the service or das topic of emerging importance das allows users to store sensitive data on remote untrusted server and retrieve desired parts of it on request at first we focus on basic exact match query functionality and then extend our treatment to prefix matching and to more limited extent range queries as well we propose several searchable encryption schemes that are not only practical enough for use in das in terms of query processing efficiency but also provably provide privacy and authenticity of data under new definitions of security that we introduce the schemes are easy to implement and are based on standard cryptographic primitives such as block ciphers symmetric encryption schemes and message authentication codes as we are some of the first to apply the provable security framework of modern cryptography to this context we believe our work will help to properly analyze future schemes and facilitate further research on the subject in general
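a minimal sketch of the exact match query functionality using hmac as pseudorandom function for search tokens the encrypt callable and all names are illustrative and this is not the provably secure construction proposed in the paper

# a minimal sketch of exact-match searchable symmetric encryption in the das setting,
# using hmac as the prf for search tokens; the encrypt callable and all names here
# are illustrative assumptions, not the schemes proposed in the paper
import hmac, hashlib

def search_token(key: bytes, attribute: str, value: str) -> bytes:
    msg = attribute.encode() + b"\x00" + value.encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

class OutsourcedIndex:
    """server-side structure: maps deterministic tokens to encrypted records"""
    def __init__(self):
        self.buckets = {}
    def store(self, token: bytes, ciphertext: bytes):
        self.buckets.setdefault(token, []).append(ciphertext)
    def query(self, token: bytes):
        return self.buckets.get(token, [])

def outsource(records, key, encrypt, index, searchable_attrs):
    # client side: encrypt each record and index it under tokens for chosen attributes
    for rec in records:
        ct = encrypt(repr(rec).encode())
        for attr in searchable_attrs:
            index.store(search_token(key, attr, str(rec[attr])), ct)

# exact-match query: the client derives the token locally, the server matches blindly
# results = index.query(search_token(key, "name", "alice"))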
although many algorithms hardware designs and security protocols have been formally verified formal verification of the security of software is still rare this is due in large part to the large size of software which results in huge costs for verification this paper describes novel and practical approach to formally establishing the security of code the approach begins with well defined set of security properties and based on the properties constructs compact security model containing only information needed to reason about the properties our approach was formulated to provide evidence for common criteria evaluation of an embedded software system which uses separation kernel to enforce data separation the paper describes our approach to verifying the kernel code and the artifacts used in the evaluation top level specification tls of the kernel behavior formal definition of data separation mechanized proof that the tls enforces data separation code annotated with pre and postconditions and partitioned into three categories and formal demonstration that each category of code enforces data separation also presented is the formal argument that the code satisfies the tls
this paper introduces the language independent concept of thread usage policy many multi threaded software systems contain policies that regulate associations among threads executable code and potentially shared state system for example may constrain which threads are permitted to execute particular code segments usually as means to constrain those threads from accessing or writing particular elements of state these policies ensure properties such as state confinement or reader writer constraints often without recourse to locking or transaction discipline our approach allows developers to concisely document their thread usage policies in manner that enables the use of sound scalable analysis to assess consistency of policy and as written code this paper identifies the key semantic concepts of our thread coloring language and illustrates how to use its succinct source level annotations to express models of thread usage policies following established annotation conventions for java we have built prototype static analysis tool implemented as an integrated development environment plug in for the eclipse ide that notifies developers of discrepancies between policy annotations and as written code our analysis technique uses several underlying algorithms based on abstract interpretation call graphs and type inference the resulting overall analysis is both sound and composable we have used this prototype analysis tool in case studies to model and analyze more than million lines of code our validation process included field trials on wide variety of complex large scale production code selected by the host organizations our in field experience led us to focus on potential adoptability by real world developers we have developed techniques that can reduce annotation density to less than one line per thousand lines of code kloc in addition the prototype analysis tool supports an incremental and iterative approach to modeling and analysis this approach enabled field trial partners to directly target areas of greatest concern and to achieve useful results within few hours
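a minimal runtime sketch of declaring and checking thread usage policy in python the paper checks such policies statically for java via annotations so the decorator below is only an illustrative analogue

# a minimal runtime sketch of thread usage policies ("thread coloring"): code regions
# declare which thread roles may execute them; the paper does this statically for java
# with source annotations, so this decorator is only an illustrative analogue
import threading, functools

_thread_colors = threading.local()

def set_thread_color(color: str):
    _thread_colors.color = color

def requires_color(*allowed):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            color = getattr(_thread_colors, "color", None)
            if color not in allowed:
                raise RuntimeError(
                    f"{fn.__name__} requires thread color {allowed}, got {color!r}")
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@requires_color("ui")
def update_widget(state):
    # only the ui thread may touch widget state, so no lock is needed here
    return state

# worker threads would call set_thread_color("compute") before running their code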
general framework for network aware programming is presented that consists of language for programming mobile applications logic for specifying properties of the applications and an automatic tool for verifying such properties the framework is based on klaim extended klaim an experimental programming language specifically designed to program distributed systems composed of several components interacting through multiple tuple spaces and mobile code the proposed logic is modal logic inspired by hennessy milner logic and is interpreted over the same labelled structures used for the operational semantics of klaim the automatic verification tool is based on complete proof system that has been previously developed for the logic
distributed web server systems have been widely used to provide effective internet services the management of these systems requires dynamic controls of the web traffic with the development of multimedia web sites and increasingly diversified services the existing load balancing approaches can no longer satisfy the requirements of either the service providers or the users in this paper new reward based control mechanism is proposed that can satisfy the dynamic content based control requirement while avoiding congestion at the dispatcher the core of the control algorithm is based on an mdp model to minimize the system overhead centralized dispatching with decentralized admission cdda approach is used to distribute the control related computation to each server pool this cuts down the dimensions of the problem dramatically we also propose state block scheme to further reduce the state space so that the algorithm becomes computationally feasible for on line implementation simulation results demonstrate that the proposed state block approach can not only reduce the computation time dramatically but also provide good approximation of power tailed request interarrival times common for internet traffic finally an implementation plan with system design is also proposed
the advent of multicores presents promising opportunity for speeding up the execution of sequential programs through their parallelization in this paper we present novel solution for efficiently supporting software based speculative parallelization of sequential loops on multicore processors the execution model we employ is based upon state separation an approach for separately maintaining the speculative state of parallel threads and non speculative state of the computation if speculation is successful the results produced by parallel threads in speculative state are committed by copying them into the computation’s non speculative state if misspeculation is detected no costly state recovery mechanisms are needed as the speculative state can be simply discarded techniques are proposed to reduce the cost of data copying between non speculative and speculative state and efficiently carrying out misspeculation detection we apply the above approach to speculative parallelization of loops in several sequential programs which results in significant speedups on dell poweredge server with two intel xeon quad core processors
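a minimal python sketch of state separation for speculative loop parallelization where each iteration buffers its writes privately and records its reads commits copy private writes into the shared state and misspeculation falls back to sequential re execution this is an illustrative analogue not the paper's runtime for c programs

# a minimal sketch of state separation: each speculative iteration writes to its own
# private state and records what it read; a commit copies private writes into the
# non-speculative state, misspeculation simply discards them and re-executes
from concurrent.futures import ThreadPoolExecutor

def speculative_loop(iterations, body, shared):
    """body(i, read, write) performs iteration i through read/write callbacks"""
    def run(i):
        reads, writes = {}, {}                  # speculative (separated) state
        def read(k):
            if k in writes:
                return writes[k]
            reads[k] = shared.get(k)
            return reads[k]
        def write(k, v):
            writes[k] = v
        body(i, read, write)
        return reads, writes
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, iterations))
    for i, (reads, writes) in zip(iterations, results):
        if any(shared.get(k) != v for k, v in reads.items()):
            # misspeculation: discard speculative state, re-execute non-speculatively
            body(i, lambda k: shared.get(k), lambda k, v: shared.__setitem__(k, v))
        else:
            shared.update(writes)               # commit speculative writes in order
    return shared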
there is growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application specific services most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against large but finite set of strings however there is growing interest in the use of regular expression based pattern matching since regular expressions offer superior expressive power and flexibility deterministic finite automata dfa representations are typically used to implement regular expressions however dfa representations of regular expression sets arising in network applications require large amounts of memory limiting their practical application in this paper we introduce new representation for regular expressions called the delayed input dfa d2fa which substantially reduces space requirements as compared to dfa d2fa is constructed by transforming dfa via incrementally replacing several transitions of the automaton with single default transition our approach dramatically reduces the number of distinct transitions between states for collection of regular expressions drawn from current commercial and academic systems d2fa representation reduces transitions by more than given the substantially reduced space requirements we describe an efficient architecture that can perform deep packet inspection at multi gigabit rates our architecture uses multiple on chip memories in such way that each remains uniformly occupied and accessed over short duration thus effectively distributing the load and enabling high throughput our architecture can provide cost effective packet content scanning at oc rates with memory requirements that are consistent with current asic technology
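a minimal sketch of the default transition idea each state keeps only the transitions that differ from chosen default state plus default pointer and lookups follow default pointers until labelled transition is found restricting default targets to lower numbered states keeps chains acyclic which is simplification of the bounded diameter construction in the paper

# a minimal sketch of converting a complete dfa transition table into a
# default-transition representation; only differing transitions are stored explicitly
def build_default_dfa(delta, alphabet):
    """delta[state][symbol] -> next state, states numbered 0..n-1, table complete"""
    compressed, default = {}, {}
    for s in sorted(delta):
        best, best_shared = None, 0
        for t in range(s):                       # only earlier states as default targets
            shared = sum(delta[s][a] == delta[t][a] for a in alphabet)
            if shared > best_shared:
                best, best_shared = t, shared
        default[s] = best
        if best is None:
            compressed[s] = dict(delta[s])
        else:
            compressed[s] = {a: delta[s][a] for a in alphabet
                             if delta[s][a] != delta[best][a]}
    return compressed, default

def step(compressed, default, state, symbol):
    # follow default transitions until a labelled transition for symbol is found
    while symbol not in compressed[state]:
        state = default[state]
    return compressed[state][symbol]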
this paper introduces new paradigm for mutation testing which we call higher order mutation testing hom testing traditional mutation testing considers only first order mutants created by the injection of single fault often these first order mutants denote trivial faults that are easily killed higher order mutants are created by the insertion of two or more faults the paper introduces the concept of subsuming hom one that is harder to kill than the first order mutants from which it is constructed by definition subsuming homs denote subtle fault combinations the paper reports the results of an empirical study of hom testing using programs including several non trivial real world subjects for which test suites are available
an optimization query asks for one or more data objects that maximize or minimize some function over the data set we propose general class of queries model based optimization queries in which generic model is used to define wide variety of queries involving an optimization objective function and or set of constraints on the attributes this model can be used to define optimization of linear and nonlinear expressions over object attributes as well as many existing query types studied in database research literature significant and important subset of this general model relevant to real world applications include queries where the optimization function and constraints are convex we cast such queries as members of the convex optimization cp model and provide unified query processing framework for cp queries that optimally accesses data and space partitioning index structures without changing the underlying structures we perform experiments to show the generality of the technique and where possible compare to techniques developed for specialized optimization queries we find that we achieve nearly identical performance to the limited optimization query types with optimal solutions while providing generic modeling and processing for much broader class of queries and while effectively handling problem constraints
rather than detecting defects at an early stage to reduce their impact defect prevention means that defects are prevented from occurring in advance causal analysis is common approach to discover the causes of defects and take corrective actions however selecting defects to analyze among large amounts of reported defects is time consuming and requires significant effort to address this problem this study proposes defect prediction approach where the reported defects and performed actions are utilized to discover the patterns of actions which are likely to cause defects the approach proposed in this study is adapted from the action based defect prediction abdp an approach uses the classification with decision tree technique to build prediction model and performs association rule mining on the records of actions and defects an action is defined as basic operation used to perform software project while defect is defined as software flaws and can arise at any stage of the software process the association rule mining finds the maximum rule set with specific minimum support and confidence and thus the discovered knowledge can be utilized to interpret the prediction models and software process behaviors the discovered patterns then can be applied to predict the defects generated by the subsequent actions and take necessary corrective actions to avoid defects the proposed defect prediction approach applies association rule mining to discover defect patterns and multi interval discretization to handle the continuous attributes of actions the proposed approach is applied to business project giving excellent prediction results and revealing the efficiency of the proposed approach the main benefit of using this approach is that the discovered defect patterns can be used to evaluate subsequent actions for in process projects and reduce variance of the reported data resulting from different projects additionally the discovered patterns can be used in causal analysis to identify the causes of defects for software process improvement
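a minimal sketch of mining action patterns associated with defects via minimum support and confidence thresholds using brute force enumeration of small patterns the record format thresholds and names are illustrative

# a minimal sketch of mining action patterns that are likely to cause defects:
# brute-force association rules of the form {action attributes} -> defect, kept when
# they meet minimum support and confidence
from itertools import combinations

def defect_patterns(records, min_support=0.05, min_confidence=0.6, max_size=2):
    """records: list of (set_of_action_attributes, defect_flag)"""
    n = len(records)
    rules = []
    items = sorted({a for attrs, _ in records for a in attrs})
    for size in range(1, max_size + 1):
        for pattern in combinations(items, size):
            p = set(pattern)
            covered = [d for attrs, d in records if p <= attrs]
            support = len(covered) / n
            if support < min_support or not covered:
                continue
            confidence = sum(covered) / len(covered)   # fraction of covered actions with defects
            if confidence >= min_confidence:
                rules.append((pattern, support, confidence))
    return sorted(rules, key=lambda r: -r[2])

# example: actions described by discretized attributes such as "late_change" or "size=large"
# rules = defect_patterns([({"late_change", "size=large"}, 1), ({"size=small"}, 0)])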
in this paper we present software framework which supports the construction of mixed fidelity from sketch based to software prototypes for mobile devices the framework is available for desktop computers and mobile devices eg pdas smartphones it operates with low fidelity sketch based prototypes or mid to high fidelity prototypes with some range of functionality providing several dimensions of customization eg visual components audio video files navigation behavior and targeting specific usability concerns furthermore it allows designers and users to test the prototypes on actual devices gathering usage information both passively eg logging and actively eg questionnaires experience sampling overall it conveys common prototyping procedures with effective data gathering methods that can be used on ubiquitous scenarios supporting in situ prototyping and participatory design on the go we address the framework’s features and its contributions to the design and evaluation of applications for mobile devices and the field of mobile interaction design presenting real life case studies and results
document analysis is done to analyze entire forms eg intelligent form analysis table detection or to describe the layout structure of document in this paper document analysis is applied to snippets of torn documents to calculate features that can be used for reconstruction the main intention is to handle snippets of varying size and different contents eg handwritten or printed text documents can either be destroyed by the intention to make the printed content unavailable eg business crime or due to time induced degeneration of ancient documents eg bad storage conditions current reconstruction methods for manually torn documents deal with the shape or eg inpainting and texture synthesis techniques in this paper the potential of document analysis techniques of snippets to support reconstruction algorithm by considering additional features is shown this implies rotational analysis color analysis line detection paper type analysis checked lined blank and classification of the text printed or hand written preliminary results show that these features can be determined reliably on real dataset consisting of snippets
this paper presents methodology and tool to support test selection from regression test suites based on change analysis in object oriented designs we assume that designs are represented using the unified modeling language uml and we propose formal mapping between design changes and classification of regression test cases into three categories reusable retestable and obsolete we provide evidence of the feasibility of the methodology and its usefulness by using our prototype tool on an industrial case study and two student projects
image retrieval from an image database by the image objects and their spatial relationships has emerged as an important research subject in these decades to retrieve images similar to given query image retrieval methods must assess the similarity degree between database image and the query image by the extracted features with acceptable efficiency and effectiveness this paper proposes graph based model srg spatial relation graph to represent the semantic information of the contained objects and their spatial relationships in an image with no file annotation in an srg graph the image objects are symbolized by the predefined class names as vertices and the spatial relations between object pairs are represented as arcs the proposed model assesses the similarity degree between two images by calculating the maximum common subgraph of two corresponding srg’s through intersection which has quadratic time complexity owing to the characteristics of srg its efficiency remains quadratic regardless of the duplication rate of the object symbols the extended model srg is also proposed with the same time complexity for the applications that need to consider the topological relations among objects synthetic symbolic image database and an existing image dataset are used in the conducted experiments to verify the performance of the proposed models the experimental results show that the proposed models have compatible retrieval quality with remarkable efficiency improvements compared with three well known methods lcsclique sim and be string where lcsclique utilizes the number of objects in the maximum common subimage as its similarity function sim uses accumulation based similarity function of similar object pairs and be string calculates the similarity of patterns by the linear combination of two similarities
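a minimal counting sketch in the spirit of srg intersection where an image is reduced to bag of class relation class arcs over its object pairs and similarity comes from the size of the common bag the relation function and the normalization are illustrative simplifications of the paper's graph model

# a minimal counting analogue of srg intersection: similarity from the multiset of
# shared (class_a, spatial_relation, class_b) arcs; quadratic in the number of objects
from collections import Counter

def srg_arcs(objects, relation):
    """objects: list of (class_name, position); relation(p, q) -> symbolic relation"""
    arcs = Counter()
    for i, (ca, pa) in enumerate(objects):
        for j, (cb, pb) in enumerate(objects):
            if i != j:
                arcs[(ca, relation(pa, pb), cb)] += 1
    return arcs

def srg_similarity(image_a, image_b, relation):
    arcs_a, arcs_b = srg_arcs(image_a, relation), srg_arcs(image_b, relation)
    common = sum((arcs_a & arcs_b).values())          # multiset intersection
    total = max(sum(arcs_a.values()), sum(arcs_b.values()), 1)
    return common / total

def relation_2d(p, q):
    # coarse directional relation between object centroids, as an example
    dx, dy = q[0] - p[0], q[1] - p[1]
    return ("east" if dx > 0 else "west" if dx < 0 else "same_x",
            "north" if dy > 0 else "south" if dy < 0 else "same_y")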
we give new algorithms for learning halfspaces in the challenging malicious noise model where an adversary may corrupt both the labels and the underlying distribution of examples our algorithms can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension and succeed for the fairly broad class of all isotropic log concave distributions we give poly epsilon time algorithms for solving the following problems to accuracy epsilon learning origin centered halfspaces in rn with respect to the uniform distribution on the unit ball with malicious noise rate eta omega epsilon log epsilon the best previous result was omega epsilon log epsilon learning origin centered halfspaces with respect to any isotropic log concave distribution on rn with malicious noise rate eta omega epsilon log epsilon this is the first efficient algorithm for learning under isotropic log concave distributions in the presence of malicious noise we also give poly epsilon time algorithm for learning origin centered halfspaces under any isotropic log concave distribution on rn in the presence of adversarial label noise at rate eta omega epsilon log epsilon in the adversarial label noise setting or agnostic model labels can be noisy but not example points themselves previous results could handle eta omega epsilon but had running time exponential in an unspecified function of epsilon our analysis crucially exploits both concentration and anti concentration properties of isotropic log concave distributions our algorithms combine an iterative outlier removal procedure using principal component analysis together with smooth boosting
the effectiveness of service oriented computing relies on the trustworthiness of sharing of data between services we advocate semi automated approach for information distribution and sharing assisted by reputation system unlike current recommendation systems which provide user with general trust value for service we propose reputation model which calculates trust neighbourhoods through fine grained multi attribute analysis such model allows recommendation relevance to improve whilst maintaining large user group propagating and evolving trust perceptions between users the approach is demonstrated on small example
this paper proposes semi completely connected bus called skb to alleviate the long wire and pin neck problems against on chip systems through small diameter and dynamic clustering dynamic clustering allows to reduce the traffic to the per cluster units such as the global interconnect interface as compared with the static clustering fixed in hardware we derive node semi complete sk graph from simple node partitioning an skb is produced from the sk graph when we replace the links incident to node by single bus for the node the diameter of skb equals bus step though the bus length is rather long simulation results show that relative to the hypercube with the link delay of clock the skb’s bandwidth is about and assuming the bus delay of and clocks respectively that increases to about and with the dynamic clustering
real time performance is critical for many time sensitive applications of wireless sensor networks we present constrained flooding protocol called cflood which enhances the deadline satisfaction ratio per unit energy consumption of time sensitive packets in sensor networks cflood improves real time performance by flooding but effectively constrains energy consumption by controlling the scale of flooding ie flooding only when necessary if unicasting meets the distributed sub deadline of hop cflood aborts further flooding even after flooding has occurred in the current hop our simulation based experimental studies show that cflood achieves higher deadline satisfaction ratio per unit energy consumption than previous multipath forwarding protocols especially in sparsely deployed or unreliable sensor network environments
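a minimal sketch of the per hop forwarding decision in constrained flooding unicast when the best next hop is expected to meet the hop sub deadline and flood otherwise the delay estimates the even split of remaining slack and all names are assumptions of this sketch

# a minimal sketch of a constrained-flooding forwarding decision: flood only when the
# cheapest next hop looks too slow to meet this hop's share of the deadline
def forward(packet, neighbors, expected_delay, hops_to_sink, send, now):
    """packet: dict with 'deadline' and 'sink'; expected_delay(n) and hops_to_sink(sink)
    are estimates maintained by the node; send(packet, n) transmits to neighbour n"""
    slack = packet["deadline"] - now
    sub_deadline = slack / max(hops_to_sink(packet["sink"]), 1)   # even split of slack
    best = min(neighbors, key=expected_delay)
    if expected_delay(best) <= sub_deadline:
        send(packet, best)                      # unicast meets the hop sub-deadline
    else:
        for n in neighbors:                     # flood only when unicast looks too slow
            send(packet, n)
        # downstream nodes that meet their own sub-deadline by unicast abort further
        # flooding, which keeps the flooded region small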
with the advent of the multiple ip core based design using network on chip noc it is possible to run multiple applications concurrently for applications with hard deadline guaranteed services gs are required to satisfy the deadline requirement gs typically under utilizes the network resources to increase the resources utilization efficiency gs applications are always complement with the best effort services be to allow more resource available for be the resource reservation for gs applications which depends heavily on the scheduling of the computation and communication needs to be optimized in this paper we propose new approach based on optimal link scheduling to judiciously schedule the packets on each of the links such that the maximum latency of the gs application is minimized with minimum network resources utilization to further increase the performance we propose novel router architecture using shared buffer implementation scheme the approach is formulated using integer linear programming ilp we applied our algorithm on real applications and experimental results show that significant improvement on the overall execution time and link utilization can be achieved
adaptability has become one of the major research topics in the area of workflow management today’s workflow management systems have problems dealing with both ad hoc changes and evolutionary changes as result the workflow management system is not used to support dynamically changing workflow processes or the workflow process is supported in rigid manner ie changes are not allowed or handled outside of the workflow management system in this paper we focus on notorious problem caused by workflow change the dynamic change bug ellis et al proceedings of the conference on organizational computing systems milpitas california acm sigois acm press new york the dynamic change bug refers to errors introduced by migrating case ie process instance from the old process definition to the new one transfer from the old process to the new process can lead to duplication of work skipping of tasks deadlocks and livelocks this paper describes an approach for calculating safe change region if case is in such change region the transfer is postponed
processor simulators are important parts of processor design toolsets in which they are used to verify and evaluate the properties of the designed processors while simulating architectures with independent function unit pipelines using simulation techniques that avoid the overhead of instruction bitstring interpretation such as compiled simulation the simulation of function unit pipelines can become one of the new bottlenecks for simulation speed this paper evaluates commonly used models for function unit pipeline resource conflict detection in processor simulation resource vector based model and an finite state automata fsa based model in addition an improvement to the simulation initialization time by means of lazy initialization of states in the fsa based approach is proposed the resulting model is faster to initialize and provides equal simulation speed when compared to the actively initialized fsa our benchmarks show at best percent improvement to the initialization time
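a minimal sketch of fsa based structural hazard detection with lazily built transitions where state is the pending resource reservations for future cycles and an edge is constructed and memoized only when first needed the reservation table format is illustrative

# a minimal sketch of fsa-based resource conflict detection with lazy state creation;
# issuing also advances one cycle, and an empty reservation table models a stall cycle
def make_conflict_fsa(reservation_tables):
    """reservation_tables: opcode -> list of frozensets, one set of resources per cycle"""
    transitions = {}                                   # (state, opcode) -> next state or None

    def issue(state, opcode):
        key = (state, opcode)
        if key not in transitions:                     # lazy construction of this edge
            table = reservation_tables[opcode]
            width = max(len(state), len(table))
            merged, conflict = [], False
            for cycle in range(width):
                busy = state[cycle] if cycle < len(state) else frozenset()
                need = table[cycle] if cycle < len(table) else frozenset()
                if busy & need:                        # structural hazard on some resource
                    conflict = True
                    break
                merged.append(busy | need)
            transitions[key] = None if conflict else tuple(merged[1:])  # advance one cycle
        return transitions[key]

    return issue, tuple()                              # issue function and initial state

# issue, state = make_conflict_fsa({"mul": [frozenset({"fu1"}), frozenset({"fu1"})]})
# nxt = issue(state, "mul")   # None would mean a conflict if issued this cycle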
in data streaming applications data arrives at rapid rates and in high volume thus making it essential to process each stream update very efficiently in terms of both time and space data stream is sequence of data records that must be processed continuously in an online fashion using sub linear space and sub linear processing time we consider the problem of tracking the number of distinct items over data streams that allow insertion and deletion operations we present two algorithms that improve on the space and time complexity of existing algorithms
the shrimp cluster computing system has progressed to point of relative maturity variety of applications are running on node system we have enough experience to understand what we did right and wrong in designing and building the system in this paper we discuss some of the lessons we learned about computer architecture and about the challenges involved in building significant working system in an academic research environment we evaluate significant design choices by modifying the network interface firmware and the system software in order to empirically compare our design to other approaches
simple graph oriented database model supporting object identity is presented for this model transformation language based on elementary graph operations is defined this transformation language is suitable for both querying and updates it is shown that the transformation language supports both set operations except for the powerset operator and recursive functions
image manipulation takes many forms powerful approach involves image adjustment by example to make color edits more intuitive the intelligent transfer of user specified target image’s color palette can achieve multitude of creative effects provided the user is supplied with small set of straightforward parameters we present novel histogram reshaping technique which allows significantly more control than previous methods given that the user is free to chose any image as the target the process of steering the algorithm becomes artistic moreover we show for the first time that creative tone reproduction can be achieved by matching high dynamic range image against low dynamic range target
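a minimal numpy sketch of per channel histogram matching against target image as baseline for palette transfer the paper's reshaping technique adds user controls that this plain matching does not model

# a minimal sketch of per-channel histogram matching as a baseline for transferring a
# target image's palette; the paper's reshaping technique exposes extra user controls
import numpy as np

def match_channel(source, target):
    s_values, s_counts = np.unique(source.ravel(), return_counts=True)
    t_values, t_counts = np.unique(target.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts).astype(float) / source.size
    t_cdf = np.cumsum(t_counts).astype(float) / target.size
    # map each source intensity to the target intensity with the closest cdf value
    mapped = np.interp(s_cdf, t_cdf, t_values)
    return np.interp(source.ravel(), s_values, mapped).reshape(source.shape)

def match_palette(source_rgb, target_rgb):
    return np.stack([match_channel(source_rgb[..., c], target_rgb[..., c])
                     for c in range(source_rgb.shape[-1])], axis=-1)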
off chip substrate routing for high density packages is challenging and the existing substrate routing algorithms often result in large number of unrouted nets that have to be routed manually this paper develops an effective yet efficient diffusion driven method router to improve routability by simulated diffusion process based on the duality between congestion and concentration compared with recently published based algorithm used in state of the art commercial tool and with similar routability and runtime as the negotiation based routing router reduces the number of unrouted nets by with up to runtime reduction
continuous access control after an object is released into distributed environment has been regarded as the usage control problem and has been investigated by different researchers in various papers however the enabling technology for usage control is challenging problem and the space has not been fully explored yet in this paper we identify the general requirements of trusted usage control enforcement in heterogeneous computing environments and also propose general platform architecture to meet these requirements
peer to peer pp search requires intelligent decisions for query routing selecting the best peers to which given query initiated at some peer should be forwarded for retrieving additional search results these decisions are based on statistical summaries for each peer which are usually organized on per keyword basis and managed in distributed directory of routing indices such architectures disregard the possible correlations among keywords together with the coarse granularity of per peer summaries which are mandated for scalability this limitation may lead to poor search result qualitythis paper develops and evaluates two solutions to this problem sk stat based on single key statistics only and mk stat based on additional multi key statistics for both cases hash sketch synopses are used to compactly represent peer’s data items and are efficiently disseminated in the pp network to form decentralized directory experimental studies with gnutella and web data demonstrate the viability and the trade offs of the approaches
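a minimal sketch of hash sketch synopses in the flajolet martin style one small bitmap per peer and keyword combinable by bitwise or and duplicate insensitive together with simple routing rule that forwards to the peers whose sketches promise the most matches sizes constants and the routing rule are illustrative

# a minimal sketch of flajolet-martin style hash sketches used as per-peer synopses
import hashlib

SKETCH_BITS = 32
PHI = 0.77351                       # standard fm correction factor

def fm_sketch(items, salt=""):
    bitmap = 0
    for item in items:
        h = int(hashlib.sha1((salt + str(item)).encode()).hexdigest(), 16)
        r = (h & -h).bit_length() - 1 if h else SKETCH_BITS - 1   # trailing zeros
        bitmap |= 1 << min(r, SKETCH_BITS - 1)
    return bitmap

def estimate(bitmap):
    r = 0
    while bitmap & (1 << r):        # position of the lowest unset bit
        r += 1
    return (2 ** r) / PHI

def route(query_keyword, peer_sketches, fanout=3):
    # forward to the peers whose sketches promise the most distinct matching items
    ranked = sorted(peer_sketches.items(),
                    key=lambda kv: estimate(kv[1].get(query_keyword, 0)), reverse=True)
    return [peer for peer, _ in ranked[:fanout]]

# sketches of two peers combine by bitwise or, giving a duplicate-insensitive union:
# combined = fm_sketch(peer_a_items) | fm_sketch(peer_b_items)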
identifying change prone modules can enable software developers to take focused preventive actions that can reduce maintenance costs and improve quality some researchers observed correlation between change proneness and structural measures such as size coupling cohesion and inheritance measures however the modules with the highest measurement values were not found to be the most troublesome modules by some of our colleagues in industry which was confirmed by our previous study of six large scale industrial products to obtain additional evidence we identified and compared high change modules and modules with the highest measurement values in two large scale open source products mozilla and openoffice and we characterized the relationship between them contrary to common intuition we found through formal hypothesis testing that the top modules in change count rankings and the modules with the highest measurement values were different in addition we observed that high change modules had fairly high places in measurement rankings but not the highest places the accumulated findings from these two open source products together with our previous similar findings for six closed source products should provide practitioners with additional guidance in identifying the change prone modules
in this paper we look at what is required to produce programs that are dependable dependability requires more than just high availability rather program needs to be right as well solving the problem for which it was designed this requires program development infrastructure that can by means of appropriate abstractions permit the programmer to focus on his problem and not be distracted by systems issues that arise when high availability is required we discuss the attributes of good abstractions we then illustrate this in the programming of dependable systems our abstraction is transparently persistent stateful programming model for use in the web enterprise setting where exactly once execution is required work on this abstraction is reviewed the new technical meat of the paper is in describing how to reduce the performance cost of using the abstraction extending the flexibility of using this abstraction and showing how to exploit it to achieve dependability
it is very difficult for beginners to define and find the most relevant literature in research field they can search on the web or look at the most important journals and conference proceedings but it would be much better to receive suggestions directly from experts of the field unfortunately this is not always possible and systems like citeseer and googlescholar become extremely useful for beginners and not only in this paper we present an agent based system that facilitates scientific publications search users interacting with their personal agents produce transfer of knowledge about relevant publications from experts to beginners each personal agent observes how publications are used and induces behavioral patterns that are used to create more effective recommendations feedback exchange allows agents to share their knowledge and virtual communities of cloned experts can be created to support novice users we present set of experimental results obtained using citeseer as source of information that show the effectiveness of our approach
the natural optimization strategy for xml to relational mapping methods is exploitation of similarity of xml data however none of the current similarity evaluation approaches is suitable for this purpose while the key emphasis is currently put on semantic similarity of xml data the main aspect of xml to relational mapping methods is analysis of their structure in this paper we propose an approach that utilizes verified strategy for structural similarity evaluation tree edit distance to dtd constructs this approach is able to cope with the fact that dtds involve several types of nodes and can form general graphs in addition it is optimized for the specific features of xml data and if required it enables one to exploit the semantics of element attribute names using set of experiments we show the impact of these extensions on similarity evaluation and finally we discuss how this approach can be extended for xsds which involve plenty of syntactic sugar ie constructs that are structurally or semantically equivalent
in this article we describe and evaluate novel low interaction cost approach to supporting the spontaneous discovery of geo tagged information while on the move our mobile haptic prototype helps users to explore their environment by providing directional vibrotactile feedback based on the presence of location data we conducted study to investigate whether users can find these targets while walking comparing their performance when using only haptic feedback to that when using an equivalent visual system the results are encouraging and here we present our findings discussing their significance and issues relevant to the design of future systems that combine haptics with location awareness
we present new approach to integrating constraint processing cp techniques into answer set programming asp based on an alternative semantic approach we develop an algorithmic framework for conflict driven asp solving that exploits cp solving capacities significant technical issue concerns the combination of conflict information from different solver types we have implemented our approach combining asp solver clingo with the generic cp solver gecode and we empirically investigate its computational impact
remote error analysis aims at timely detection and remedy of software vulnerabilities through analyzing run time errors that occur on the client this objective can only be achieved by offering users effective protection of their private information and minimizing the performance impact of the analysis on their systems without undermining the amount of information the server can access for understanding errors to this end we propose in the paper new technique for privacy aware remote analysis called panalyst panalyst includes client component and server component once runtime exception happens to an application panalyst client sends the server an initial error report that includes only public information regarding the error such as the length of the packet that triggers the exception using an input built from the report panalyst server performs taint analysis and symbolic execution on the application and adjusts the input by querying the client about the information upon which the execution of the application depends the client agrees to answer only when the reply does not give away too much user information in this way an input that reproduces the error can be gradually built on the server under the client’s consent our experimental study of this technique demonstrates that it exposes very small amount of user information introduces negligible overheads to the client and enables the server to effectively analyze an error
as models are always abstractions of reality we often need multiple modeling perspectives for analysis the interplay of such modeling perspectives can take many forms and plays role both at the design level and during the operation of information systems examples include viewpoint resolution in requirements management mapping between conceptual and implementation design in databases and the integration or interoperation of multiple data and media sources starting from early experiences with our now year old conceptbase implementation of the telos language we describe logic based conceptual modeling and model management approach to these issues focusing on recent work which employs generic meta model to facilitate mappings and transformations between heterogeneous model representations both at the schema and the data level
given two datasets and their exclusive closest pairs ecp join is one to one assignment of objects from the two datasets such that the closest pair in is in the result and ii the remaining pairs are determined by removing objects from respectively and recursively searching for the next closest pair an application of exclusive closest pairs is the computation of car parking slot assignments in this paper we propose algorithms for the computation and continuous monitoring of ecp joins in memory given stream of events that indicate dynamic assignment requests and releases of pairs experimental results on system prototype demonstrate the efficiency of our solutions in practice
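the definition above translates directly into a brute-force reference procedure: repeatedly extract the globally closest pair and remove both objects. the python sketch below is only this naive in-memory baseline under euclidean distance, not the optimized or continuous-monitoring algorithms proposed here, and the parking-style example data are hypothetical.

    import math

    def ecp_join(A, B):
        """exclusive closest pairs: repeatedly take the globally closest pair
        and remove both objects, so every object is assigned at most once."""
        A, B = list(A), list(B)
        result = []
        while A and B:
            _, a, b = min(((math.dist(a, b), a, b) for a in A for b in B),
                          key=lambda t: t[0])
            result.append((a, b))
            A.remove(a)
            B.remove(b)
        return result

    if __name__ == "__main__":
        requests = [(0.0, 0.0), (5.0, 5.0)]              # cars asking for a slot
        slots = [(1.0, 0.0), (4.0, 5.0), (9.0, 9.0)]     # free parking slots
        print(ecp_join(requests, slots))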
this paper describes language based approach for automatic and accurate cost bound analysis the approach consists of transformations for building cost bound functions in the presence of partially known input structures symbolic evaluation of the cost bound function based on input size parameters and optimizations to make the overall analysis efficient as well as accurate all at the source language level the calculated cost bounds are expressed in terms of primitive cost parameters these parameters can be obtained based on the language implementation or can be measured conservatively or approximately yielding accurate conservative or approximate time or space bounds we have implemented this approach and performed number of experiments for analyzing scheme programs the results helped confirm the accuracy of the analysis
this paper presents an approach to design an interface for document retrieval based on techniques from the semantic web combined with interactive graphical features the purpose of this study is to enhance the user’s knowledge while he she browses the information through graphical interface in this paper two aspects are considered first interactive features such as object movability animation etc are discussed second method for visually integrating the search queries and the query outputs is addressed in order to retrieve documents the visual features and the querying method are combined taking into account the semantic relations among extracted information from the documents this combination is evaluated as means to determine the most suitable location for the results inside the interface
concurrent real time software is increasingly used in safety critical embedded systems assuring the quality of such software requires the rigor of formal methods in order to analyze program formally we must first construct mathematical model of its behavior in this paper we consider the problem of constructing such models for concurrent real time software in particular we provide method for building mathematical models of real time ada tasking programs that are accurate enough to verify interesting timing properties and yet abstract enough to yield tractable analysis on nontrivial programs our approach differs from schedulability analysis in that we do not assume that the software has highly restricted structure eg set of periodic tasks also unlike most abstract models of real time systems we account for essential properties of real implementations such as resource constraints and run time overhead
today’s web applications are pushing the limits of modern web browsers the emergence of the browser as the platform of choice for rich client side applications has shifted the use of in browser javascript from small scripting programs to large computationally intensive application logic for many web applications javascript performance has become one of the bottlenecks preventing the development of even more interactive client side applications while traditional just in time compilation is successful for statically typed virtual machine based languages like java compiling javascript turns out to be challenging task many javascript programs and scripts are short lived and users expect responsive browser during page loading this leaves little time for compilation of javascript to generate machine code we present trace based just in time compiler for javascript that uses run time profiling to identify frequently executed code paths which are compiled to executable machine code our approach increases execution performance by up to by decomposing complex javascript instructions into simple forth based representation and then recording the actually executed code path through this low level ir giving developers more computational horsepower enables new generation of innovative web applications
human computer interaction hci often focuses on how designers can develop systems that convey single specific clear interpretation of what they are for and how they should be used and experienced new domains such as domestic and public environments new influences from the arts and humanities and new techniques in hci itself are converging to suggest that multiple potentially competing interpretations can fruitfully co exist in this paper we lay out the contours of the new space opened by focus on multiple interpretations which may more fully address the complexity dynamics and interplay of user system and designer interpretation we document how design and evaluation strategies shift when we abandon the presumption that specific authoritative interpretation of the systems we build is necessary possible or desirable
the paradigm of service oriented computing soc has emerged as an approach to provide flexibility and agility not just in systems development but also in business process management this modular approach to defining business flows as technology independent services has gained unanimous popularity among end users and technology vendors alike although there is significant amount of ongoing research on the potential of service oriented architectures soas there is paucity of research literature on the factors affecting the adoption of service oriented computing in practice this paper reviews the current state of the technology identifies the factors influencing the decision to adopt service oriented computing as an enterprise strategy and discusses the associated research literature and concludes with suggested research agenda and conceptual framework for investigating the use of service oriented computing in practice
recent years have seen dramatic increase in research and development of scientific workflow systems these systems promise to make scientists more productive by automating data driven and compute intensive analyses despite many early achievements the long term success of scientific workflow technology critically depends on making these systems useable by mere mortals ie scientists who have very good idea of the analysis methods they wish to assemble but who are neither software developers nor scripting language experts with these users in mind we identify set of desiderata for scientific workflow systems crucial for enabling scientists to model and design the workflows they wish to automate themselves as first step towards meeting these requirements we also show how the collection oriented modeling and design comad approach for scientific workflows implemented within the kepler system can help provide these critical design oriented capabilities to scientists
the most direct way toward understanding whether planetlab and other such systems serve their purpose is to build deploy and use them
although text categorization is burgeoning area of ir research readily available test collections in this field are surprisingly scarce we describe methodology and system named accio for automatically acquiring labeled datasets for text categorization from the world wide web by capitalizing on the body of knowledge encoded in the structure of existing hierarchical directories such as the open directory we define parameters of categories that make it possible to acquire numerous datasets with desired properties which in turn allow better control over categorization experiments in particular we develop metrics that estimate the difficulty of dataset by examining the host directory structure these metrics are shown to be good predictors of categorization accuracy that can be achieved on dataset and serve as efficient heuristics for generating datasets subject to user’s requirements large collection of automatically generated datasets are made available for other researchers to use
applying computer technology such as computer vision in driver assistance implies that processes and data are modeled as being discretized rather than being continuous the area of stereo vision provides various examples how concepts known in discrete mathematics eg pixel adjacency graphs belief propagation dynamic programming max flow min cut or digital straight lines are applied when aiming for efficient and accurate pixel correspondence solutions the paper reviews such developments for reader in discrete mathematics who is interested in applied research in particular in vision based driver assistance as second subject the paper also discusses lane detection and tracking which is particular task in driver assistance recently the euclidean distance transform proved to be very appropriate tool for obtaining fairly robust solution
in wireless sensor network applications the potential to use cooperation to resolve user queries remains largely untapped efficiently answering user’s questions requires identifying the correct set of nodes that can answer the question and enabling coordination between them in this article we propose query domain abstraction that allows an application to dynamically specify the nodes best suited to answering particular query selecting the ideal set of heterogeneous sensors entails answering two fundamental questions how are the selected sensors related to one another and where should the resulting sensor coalition be located we introduce two abstractions the proximity function and the reference function to precisely specify each of these concerns within query all nodes in the query domain must satisfy any provided proximity function user defined function that constrains the relative relationship among the group of nodes eg based on property of the network or physical environment or on logical properties of the nodes the selected set of nodes must also satisfy any provided reference function mechanism to scope the location of the query domain to specified area of interest eg within certain distance from specified reference point in this article we model these abstractions and present set of protocols that accomplish this task with varying degrees of correctness we evaluate their performance through simulation and highlight the tradeoffs between protocol overhead and correctness
the concept of unique object arises in many emerging programming languages such as clean cqual cyclone tal and vault in each of these systems unique objects make it possible to perform operations that would otherwise be prohibited eg deallocating an object or to ensure that some obligation will be met eg an opened file will be closed however different languages provide different interpretations of uniqueness and have different rules regarding how unique objects interact with the rest of the language our goal is to establish common model that supports each of these languages by allowing us to encode and study the interactions of the different forms of uniqueness the model we provide is based on substructural variant of the polymorphic calculus augmented with four kinds of mutable references unrestricted relevant affine and linear the language has natural operational semantics that supports deallocation of references strong type varying updates and storage of unique objects in shared references we establish the strong soundness of the type system by constructing novel semantic interpretation of the types
vagueness is often present in spatial phenomena representing and analysing vague spatial phenomena requires vague objects and operators whereas current gis and spatial databases can only handle crisp objects this paper provides mathematical definitions for vague object types and operators the object types that we propose are set of simple types set of general types and vague partitions the simple types represent identifiable objects of simple structure ie not divisible into components they are vague points vague lines and vague regions the general types represent classes of simple type objects they are vague multipoint vague multiline and vague multiregion general types assure closure under set operators simple and general types are defined as fuzzy sets in satisfying specific properties that are expressed in terms of topological notions these properties assure that set membership values change mostly gradually allowing stepwise jumps the type vague partition is collection of vague multiregions that might intersect each other only at their transition boundaries it allows for soft classification of space all types allow for both finite and an infinite number of transition levels they include crisp objects as special cases we consider standard set of operators on crisp objects and define them for vague objects we provide definitions for operators returning spatial types they are regularized fuzzy set operators union intersection and difference two operators from topology boundary and frontier and two operators on vague partitions overlay and fusion other spatial operators topological predicates and metric operators are introduced giving their intuition and example definitions all these operators include crisp operators as special cases types and operators provided in this paper form model for spatial data system that can handle vague information the paper is illustrated with an application of vague objects in coastal erosion
this paper introduces data driven representation and modeling technique for simulating non linear heterogeneous soft tissue it simplifies the construction of convincing deformable models by avoiding complex selection and tuning of physical material parameters yet retaining the richness of non linear heterogeneous behavior we acquire set of example deformations of real object and represent each of them as spatially varying stress strain relationship in finite element model we then model the material by non linear interpolation of these stress strain relationships in strain space our method relies on simple to build capture system and an efficient run time simulation algorithm based on incremental loading making it suitable for interactive computer graphics applications we present the results of our approach for several non linear materials and biological soft tissue with accurate agreement of our model to the measured data
we present the results of qualitative study of the sharing and consumption of entertainment media on low cost mobile phones in urban india practice which has evolved into vibrant informal socio technical ecosystem this wide ranging phenomenon includes end users mobile phone shops and content distributors and exhibits remarkable ingenuity even more impressive is the number of obstacles which have been surmounted in its establishment from the technical interface complexity limited internet access viruses to the broader socioeconomic cost language legality institutional rules lack of privacy all seemingly due to strong desire to be entertained our findings carry two implications for projects in hci seeking to employ technology in service of social and economic development first although great attention is paid to the details of ui in many such projects we find that sufficient user motivation towards goal turns ui barriers into mere speed bumps second we suggest that needs assessments carry an inherent bias towards what outsiders consider needs and that identified needs may not be as strongly felt as perceived
dialogue systems for health communication hold out the promise of providing intelligent assistance to patients through natural interfaces that require no training to use but in order to make the development of such systems cost effective we must be able to use generic techniques and components which are then specialized as needed to the specific health problem and patient population in this paper we describe chester prototype intelligent assistant that interacts with its user via conversational natural spoken language to provide them with information and advice regarding their prescribed medications chester builds on our prior experience constructing conversational assistants in other domains the emphasis of this paper is on the portability of our generic spoken dialogue technology and presents case study of the application of these techniques to the development of dialogue system for health communication
novel top down compression technique for data cubes is introduced and experimentally assessed in this paper this technique considers the previously unrecognized case in which multiple hierarchical range queries hrq very useful class of olap queries must be evaluated against the target data cube simultaneously this scenario makes traditional data cube compression techniques ineffective as contrarily to the aim of our work these techniques take into consideration one constraint only eg given space bound the result of our study consists in introducing an innovative multiple objective olap computational paradigm and hierarchical multidimensional histogram whose main benefit is meaningfully implementing an intermediate compression of the input data cube able to simultaneously accommodate an even large family of different in nature hrq complementary contribution of our work is represented by wide experimental evaluation of the query performance of our technique against both benchmark and real life data cubes also in comparison with state of the art histogram based compression techniques
recognizing hand sketched symbols is definitely complex problem the input drawings are often intrinsically ambiguous and require context to be interpreted in correct way many existing sketch recognition systems avoid this problem by recognizing single segments or simple geometric shapes in stroke however for recognition system to be effective and precise context must be exploited and both the simplifications on the sketch features and the constraints under which recognition may take place must be reduced to the minimum in this paper we present an agent based framework for sketched symbol interpretation that heavily exploits contextual information for ambiguity resolution agents manage the activity of low level hand drawn symbol recognizers that may be heterogeneous for better adapting to the characteristics of each symbol to be recognized and coordinate themselves in order to exchange contextual information thus leading to an efficient and precise interpretation of sketches we also present agentsketch multi domain sketch recognition system implemented according to the proposed framework first experimental evaluation has been performed on the domain of uml use case diagrams to verify the effectiveness of the proposed approach
the emphasis on participation in social technologies challenges some of our traditional assumptions about the role of users and designers in design it also exposes some of the limitations and assumptions about design embedded in our traditional models and methods based on review of emerging practice we present four perspectives on design in the context of social technologies by presenting this lay of the land we seek to contribute to ongoing work on the nature of participation and design in the context of social technologies we draw particular attention to the ways in which roles and responsibilities in design are being reassigned and redistributed as traditional boundaries between design and use and designer and user dissolve design is becoming more public in the context of social technologies design is moving out into the wild
we present study of the effects of instant messaging im on individuals’ management of work across multiple collaborative projects groups of four participants completed four web design tasks each participant worked on two tasks each task with different partner who was either co located or remote connected via im in one condition each participant had one co located and one remote partner in second condition both partners were remote we examined communication division of labor and task performance as function of condition the results indicated that nearly all participants divided their time unequally between projects but less unequally in the remote remote condition in the co located remote condition participants favored the task with the co located partner the results show that the effects of im differ depending on whether people’s multiple tasks are distributed across space we propose new im interface that promotes awareness of multiple collaborators on multiple tasks
the powerful abstraction mechanisms of functional programming languages provide the means to develop domain specific programming languages within the language itself typically this is realised by designing set of combinators higher order reusable programs for an application area and by constructing individual applications by combining and coordinating individual combinators this paper is concerned with successful example of such an embedded programming language namely fudgets library of combinators for building graphical user interfaces in the lazy functional language haskell the fudget library has been used to build number of substantial applications including web browser and proof editor interface to proof checker for constructive type theory this paper develops semantic theory for the non deterministic stream processors that are at the heart of the fudget concept the interaction of two features of stream processors makes the development of such semantic theory problematic the sharing of computation provided by the lazy evaluation mechanism of the underlying host language and ii the addition of non deterministic choice needed to handle the natural concurrency that reactive applications entail we demonstrate that this combination of features in higher order functional language can be tamed to provide tractable semantic theory and induction principles suitable for reasoning about contextual equivalence of fudgets
code transformation and analysis tools provide support for software engineering tasks such as style checking testing calculating software metrics as well as reverse and re engineering in this paper we describe the architecture and the applications of jtransform general java source code processing and transformation framework it consists of java parser generating configurable parse tree and various visitors transformers tree evaluators which produce different kinds of outputs while our framework is written in java the paper further opens an opportunity for new generation of xml based source code tools
recommender systems are tools to help users find items that they deem of interest to them they can be seen as an application of data mining process in this paper new recommender system based on multi features is introduced demographic and psychographic features are used to assess similarities between users the model is built on collaborative filtering method and addresses three problems sparsity scalability and cold start the sparsity problem is tackled by integrating users documents relevant information within meta clusters the scalability and the cold start problems are considered by using suitable probability model calculated on meta cluster information moreover weight similarity measure is introduced in order to take into account dynamic human being preferences behaviour prediction score for generating recommendations is proposed based on the target user previous behaviour and his her neighbourhood preferences on the target document
tracing the lineage of data is an important requirement for establishing the quality and validity of data recently the problem of data provenance has been increasingly addressed in database research earlier work has been limited to the lineage of data as it is manipulated using relational operations within an rdbms while this captures very important aspect of scientific data processing the existing work is incapable of handling the equally important and prevalent cases where the data is processed by non relational operations this is particularly common in scientific data where sophisticated processing is achieved by programs that are not part of dbms the problem of tracking lineage when non relational operators are used to process the data is particularly challenging since there is potentially no constraint on the nature of the processing in this paper we propose novel technique that overcomes this significant barrier and enables the tracing of lineage of data generated by an arbitrary function our technique works directly with the executable code of the function and does not require any high level description of the function or even the source code we establish the feasibility of our approach on typical application and demonstrate that the technique is able to discern the correct lineage furthermore it is shown that the method can help identify limitations in the function itself
side channel attack based upon the analysis of power traces is an effective way of obtaining the encryption key from secure processors power traces can be used to detect bitflips which betray the secure key balancing the bitflips with opposite bitflips have been proposed by the use of opposite logic this is an expensive solution where the balancing processor continues to balance even when encryption is not carried out in the processor we propose for the first time multiprocessor algorithmic balancing technique to prevent power analysis of processor executing an aes cryptographic program popular encryption standard for embedded systems our technique uses dual processor architecture where two processors execute the same program in parallel but with complementary intermediate data thus balancing the bitflips the second processor works in conjunction with the first processor for balancing only when the aes encryption is performed and both processors carry out independent tasks when no encryption is being performed accessing the encryption key or the input data by the first processor begins the obfuscation by the second processor to stop the encryption by the second processor we use novel signature detection technique which detects the end of the encryption automatically the multiprocessor balancing approach muteaes proposed here reduces performance by and increases the size of the hardware by though reduces to when no encryption is being performed we show that differential power analysis dpa fails when our technique is applied to aes we further illustrate that by the use of this balancing strategy the adversary is left with noise from the power profile with little useful information
in this paper we study the problem of packet scheduling in wireless environment with the objective of minimizing the average transmission energy expenditure under individual packet delay constraint most past studies assumed that the input arrivals follow poisson process or be statistically independent however traffic from real source typically has strong time correlation we model packet scheduling and queuing system for general input process in linear time invariant systems we propose an energy efficient packet scheduling policy that takes the correlation into account meanwhile slower transmission rate implies that packets stay in the transmitter for longer time which may result in unexpected transmitter overload and buffer overflow we derive upper bounds of the maximum transmission rate under an overload probability and upper bounds of required buffer size under packet drop rate simulation results show that the proposed scheduler improves up to in energy savings compared with the policies that assume statistically independent input evaluation of the bounds in providing qos control shows that both deadline misses and packet drops can be effectively bounded by predefined constraint
augmented graphs were introduced for the purpose of analyzing the six degrees of separation between individuals observed experimentally by the sociologist stanley milgram in the we define an augmented graph as pair where is an node graph with nodes labeled in and is an nxn stochastic matrix every node is given an extra link called long range link pointing to some node called the long range contact of the head of this link is chosen at random by pr in augmented graphs greedy routing is the oblivious routing process in which every intermediate node chooses from among all its neighbors including its long range contact the one that is closest to the target according to the distance measured in the underlying graph and forwards to it the best augmentation scheme known so far ensures that for any node graph greedy routing performs in expected number of steps our main result is the design of an augmentation scheme that overcomes the barrier precisely we prove that for any node graph whose nodes are arbitrarily labeled in there exists stochastic matrix such that greedy routing in performs in where the notation ignores the polylogarithmic factors we prove additional results when the stochastic matrix is universal to all graphs in particular we prove that the barrier can still be overcome for large graph classes even if the matrix is universal this however requires an appropriate labeling of the nodes if the node labeling is arbitrary then we prove that the barrier cannot be overcome with universal matrices
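to make the greedy routing process concrete, here is a small python sketch on a ring augmented with one long-range link per node; the harmonic (kleinberg-style) choice of long-range contacts and the ring topology are illustrative assumptions, not the augmentation scheme whose existence is proved here.

    import random

    def ring_dist(u, v, n):
        return min((u - v) % n, (v - u) % n)

    def augment_ring(n, rng):
        """give every node one long-range contact, drawn with probability
        roughly proportional to 1/distance (an assumption for illustration)."""
        long_range = {}
        for u in range(n):
            others = [v for v in range(n) if v != u]
            weights = [1.0 / ring_dist(u, v, n) for v in others]
            long_range[u] = rng.choices(others, weights=weights, k=1)[0]
        return long_range

    def greedy_route(src, dst, n, long_range):
        """oblivious greedy routing: always forward to the neighbour (ring
        neighbours or the long-range contact) closest to the target in the
        underlying graph's distance."""
        hops, u = 0, src
        while u != dst:
            neighbours = [(u - 1) % n, (u + 1) % n, long_range[u]]
            u = min(neighbours, key=lambda v: ring_dist(v, dst, n))
            hops += 1
        return hops

    if __name__ == "__main__":
        n, rng = 1024, random.Random(0)
        contacts = augment_ring(n, rng)
        print(greedy_route(0, n // 2, n, contacts))   # far fewer than n/2 hops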
modern embedded compute platforms increasingly contain both microprocessors and field programmable gate arrays fpgas the fpgas may implement accelerators or other circuits to speedup performance many such circuits have been previously designed for acceleration via application specific integrated circuits asics redesigning an asic circuit for fpga implementation involves several challenges we describe case study that highlights common challenge related to memories the study involves converting pattern counting circuit architecture based on pipelined binary tree and originally designed for asic implementation into circuit suitable for fpgas the original asic oriented circuit when mapped to spartan fpga could process million patterns per second and handle up to patterns the redesigned circuit could instead process million patterns per second and handle up to patterns representing performance improvement and utilization improvement the redesign involved partitioning large memories into smaller ones at the expense of redundant control logic through this and other case studies design patterns may emerge that aid designers in redesigning asic circuits for fpgas as well as in building new high performance and efficient circuits for fpgas
adaptive query processing generally involves feedback loop comprising monitoring assessment and response so far individual proposals have tended to group together an approach to monitoring means of assessment and form of response however there are many benefits in decoupling these three phases and in constructing generic frameworks for each of them to this end this paper discusses monitoring of query plan execution as topic in its own right and advocates an approach based on self monitoring algebraic operators this approach is shown to be generic and independent of any specific adaptation mechanism easily implementable and portable sufficiently comprehensive appropriate for heterogeneous distributed environments and more importantly capable of driving on the fly adaptations of query plan execution an experimental evaluation of the overheads and of the quality of the results obtained by monitoring is also presented
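a minimal way to picture self-monitoring operators is to wrap each iterator-style operator so that it records its own tuple count and the time spent inside its calls, leaving the collected statistics for a separate assessment phase; the python sketch below is only such a generic wrapper, not the framework evaluated here.

    import time

    def monitored(name, operator, stats):
        """yield the wrapped operator's tuples unchanged while recording how
        many tuples it produced and how long its next() calls took (for a
        non-leaf operator this time includes its children)."""
        it = iter(operator)
        count, elapsed = 0, 0.0
        while True:
            t0 = time.perf_counter()
            try:
                tup = next(it)
            except StopIteration:
                break
            elapsed += time.perf_counter() - t0
            count += 1
            yield tup
        stats[name] = {"tuples": count, "seconds": round(elapsed, 6)}

    def scan(rows):                          # trivial table-scan operator
        for r in rows:
            yield r

    def select(pred, child):                 # trivial selection operator
        for r in child:
            if pred(r):
                yield r

    if __name__ == "__main__":
        stats = {}
        rows = [{"id": i, "v": i % 7} for i in range(10000)]
        plan = monitored("select",
                         select(lambda r: r["v"] == 0,
                                monitored("scan", scan(rows), stats)),
                         stats)
        print(len(list(plan)), stats)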
relational datasets ie datasets in which individuals are described both by their own features and by their relations to other individuals arise from various sources such as databases both relational and object oriented knowledge bases or software models eg uml class diagrams when processing such complex datasets it is of prime importance for an analysis tool to hold as much as possible to the initial format so that the semantics is preserved and the interpretation of the final results eased therefore several attempts have been made to introduce relations into the formal concept analysis field which otherwise generated large number of knowledge discovery methods and tools however the proposed approaches invariably look at relations as an intra concept construct typically relating two parts of the concept description and therefore can only lead to the discovery of coarse grained patterns as an approach towards the discovery of finer grain relational concepts we propose to enhance the classical object attribute data representations with new dimension that is made out of inter object links eg spouse friend manager of etc consequently the discovered concepts are linked by relations which like associations in conceptual data models such as the entity relation diagrams abstract from existing links between concept instances the borders for the application of the relational mining task are provided by what we call relational context family set of binary data tables representing individuals of various sorts eg human beings companies vehicles etc related by additional binary relations as we impose no restrictions on the relations in the dataset major challenge is the processing of relational loops among data items we present method for constructing concepts on top of circular descriptions which is based on an iterative approximation of the final solution the underlying construction methods are illustrated through their application to the restructuring of class hierarchies in object oriented software engineering which are described in uml
closed sets have been proven successful in the context of compacted data representation for association rule learning however their use is mainly descriptive dealing only with unlabeled data this paper shows that when considering labeled data closed sets can be adapted for classification and discrimination purposes by conveniently contrasting covering properties on positive and negative examples we formally prove that these sets characterize the space of relevant combinations of features for discriminating the target class in practice identifying relevant irrelevant combinations of features through closed sets is useful in many applications to compact emerging patterns of typical descriptive mining applications to reduce the number of essential rules in classification and to efficiently learn subgroup descriptions as demonstrated in real life subgroup discovery experiments on high dimensional microarray data set
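the underlying closure operation is simple to state: the closure of a feature set is the intersection of all examples containing it, and contrasting its coverage on positive and negative examples indicates how well it discriminates the target class. the python sketch below only illustrates this basic idea on toy labeled data; it is not the formal characterization or the subgroup discovery method studied here.

    def closure(itemset, examples):
        """closure of a feature set = intersection of all examples that
        contain it (assumes at least one example does)."""
        covering = [e for e in examples if itemset <= e]
        closed = set(covering[0])
        for e in covering[1:]:
            closed &= e
        return frozenset(closed)

    def support(itemset, examples):
        return sum(1 for e in examples if itemset <= e)

    if __name__ == "__main__":
        # toy labeled data: each example is the set of boolean features it has
        positives = [frozenset("abc"), frozenset("abd"), frozenset("abce")]
        negatives = [frozenset("ab"), frozenset("bd"), frozenset("cde")]
        closed_on_pos = {closure(frozenset({f}), positives)
                         for e in positives for f in e}
        for c in sorted(closed_on_pos, key=lambda s: (len(s), sorted(s))):
            # e.g. {a,b} still covers a negative, while {a,b,c} does not
            print(sorted(c), "pos:", support(c, positives),
                  "neg:", support(c, negatives))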
it is known that group based system provides better performance and more scalability to the whole system while it decreases the communication traffic group based architectures in content delivery networks cdns could be good solution to the need of scalability or when the bandwidth is limitation there is no pure group based cdn in existence although we proposed group based system to interconnect cdns of different providers in previous work this article shows new content delivery network based on grouping surrogates we will show the benefits of our proposal and its application environment we will describe the protocol developed to connect surrogates from the same group and from different groups the neighbor selection algorithm is based on their distance and round trip delay in order to provide lower content distribution times the system improves the quality of service qos by connecting surrogates with higher available capacity real measurements of the network control traffic and of the performance of the surrogates in controlled environment are shown we will also demonstrate its scalability by comparing the control traffic for different numbers of surrogates in the cdn finally we will show the differences with the system proposed in our previous work
the past decades have witnessed rapid growth of distributed interactive multimedia environments dimes despite their intensity of user involved interaction the existing evaluation frameworks remain very much system centric as step toward the human centric paradigm we present conceptual framework of quality of experience qoe in dimes to model measure and understand user experience and its relationship with the traditional quality of service qos metrics multi disciplinary approach is taken to build up the framework based on the theoretical results from various fields including psychology cognitive sciences sociology and information technology we introduce mapping methodology to quantify the correlations between qos and qoe and describe our controlled and uncontrolled studies as illustrating examples the results present the first deep study to model the multi facet qoe construct map the qos qoe relationship and capture the human centric quality modalities in the context of dimes
we study how functional dependencies affect the cyclicity of database scheme in particular when does set of functional dependencies make cyclic database scheme behave like an acyclic one a database scheme is fd acyclic if every pairwise consistent database state that satisfies the fd’s is join consistent we give simple characterization of fd acyclicity over restricted class of database schemes we then give tableau based characterization for the general case that leads to an algorithm for testing fd acyclicity this algorithm actually solves the more general problem of query equivalence under functional dependencies and typed inclusion dependencies
this paper presents framework for assessing the significance of inconsistencies which arise in object oriented design models that describe software systems from multiple perspectives and the findings of series of experiments conducted to evaluate it the framework allows the definition of significance criteria and measures the significance of inconsistencies as beliefs for the satisfiability of these criteria the experiments conducted to evaluate it indicate that criteria definable in the framework have the power to create elaborate rankings of inconsistencies in models
this article focuses on the effect of both process topology and load balancing on various programming models for smp clusters and iterative algorithms more specifically we consider nested loop algorithms with constant flow dependencies that can be parallelized on smp clusters with the aid of the tiling transformation we investigate three parallel programming models namely popular message passing monolithic parallel implementation as well as two hybrid ones that employ both message passing and multi threading we conclude that the selection of an appropriate mapping topology for the mesh of processes has significant effect on the overall performance and provide an algorithm for the specification of such an efficient topology according to the iteration space and data dependencies of the algorithm we also propose static load balancing techniques for the computation distribution between threads that diminish the disadvantage of the master thread assuming all inter process communication due to limitations often imposed by the message passing library both improvements are implemented as compile time optimizations and are further experimentally evaluated an overall comparison of the above parallel programming styles on smp clusters based on micro kernel experimental evaluation is further provided as well
active learning is well suited to many problems in natural language processing where unlabeled data may be abundant but annotation is slow and expensive this paper aims to shed light on the best active learning approaches for sequence labeling tasks such as information extraction and document segmentation we survey previously used query selection strategies for sequence models and propose several novel algorithms to address their shortcomings we also conduct large scale empirical comparison using multiple corpora which demonstrates that our proposed methods advance the state of the art
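one classic query selection strategy in this setting scores an unlabeled sequence by how uncertain the current model is about its per-token labels, for instance by total token entropy, and asks the annotator for the highest-scoring sequences; the python sketch below shows only that generic scoring step, assuming per-token marginal distributions are supplied by whatever sequence model is in use, and is not one of the novel algorithms proposed here.

    import math

    def token_entropy(dist):
        """entropy of one token's predicted label distribution."""
        return -sum(p * math.log(p) for p in dist if p > 0.0)

    def sequence_uncertainty(marginals):
        """total token entropy: one simple uncertainty score for a sequence."""
        return sum(token_entropy(dist) for dist in marginals)

    def select_queries(pool, k):
        """pick the k unlabeled sequences the model is least confident about."""
        ranked = sorted(pool, key=lambda sid: sequence_uncertainty(pool[sid]),
                        reverse=True)
        return ranked[:k]

    if __name__ == "__main__":
        # hypothetical marginals for three unlabeled sentences over 3 labels
        pool = {
            "sent-1": [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],
            "sent-2": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],
            "sent-3": [[0.90, 0.05, 0.05], [0.34, 0.33, 0.33]],
        }
        print(select_queries(pool, k=1))      # the most ambiguous sentence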
large multimedia document archives may hold major fraction of their data in tertiary storage libraries for cost reasons this paper develops an integrated approach to the vertical data migration between the tertiary secondary and primary storage in that it reconciles speculative prefetching to mask the high latency of the tertiary storage with the replacement policy of the document caches at the secondary and primary storage level and also considers the interaction of these policies with the tertiary and secondary storage request scheduling the integrated migration policy is based on continuous time markov chain model for predicting the expected number of accesses to document within specified time horizon prefetching is initiated only if that expectation is higher than those of the documents that need to be dropped from secondary storage to free up the necessary space in addition the possible resource contention at the tertiary and secondary storage is taken into account by dynamically assessing the response time benefit of prefetching document versus the penalty that it would incur on the response time of the pending document requests the parameters of the continuous time markov chain model the probabilities of co accessing certain documents and the interaction times between successive accesses are dynamically estimated and adjusted to evolving workload patterns by keeping online statistics the integrated policy for vertical data migration has been implemented in prototype system the system makes profitable use of the markov chain model also for the scheduling of volume exchanges in the tertiary storage library detailed simulation experiments with web server like synthetic workloads indicate significant gains in terms of client response time the experiments also show that the overhead of the statistical bookkeeping and the computations for the access predictions is affordable
in this paper we present novel feature based texture design scheme using deformation techniques firstly we apply compass operator to extract the feature map from the input small sample texture secondly we use the feature guided patch searching algorithm to find satisfactory candidate patches taking both color errors and feature errors into account when the new feature map is created the designed texture is obtained simultaneously thirdly completion based texture design method is employed to design variety of large deformed textures designer can repeat the above steps to design satisfactory textures the proposed algorithm has the ability to design variety of versatile textures from single small sample texture by measuring the structural similarity our experimental results demonstrate that our proposed technique can be used for other texture synthesis applications such as wang tiles based cyclic texture design
this article introduces the sieve novel building block that allows to adapt to the number of simultaneously active processes the point contention during the execution of an operation we present an implementation of the sieve in which each sieve operation requires log steps where is the point contention during the operation the sieve is the cornerstone of the first wait free algorithms that adapt to point contention using only read and write operations specifically we present efficient algorithms for long lived renaming timestamping and collecting information
approximate queries on string data are important due to the prevalence of such data in databases and various conventions and errors in string data we present the vsol estimator novel technique for estimating the selectivity of approximate string queries the vsol estimator is based on inverse strings and makes the performance of the selectivity estimator independent of the number of strings to get inverse strings we decompose all database strings into overlapping substrings of length grams and then associate each gram with its inverse string the ids of all strings that contain the gram we use signatures to compress inverse strings and clustering to group similar signatures we study our technique analytically and experimentally the space complexity of our estimator only depends on the number of neighborhoods in the database and the desired estimation error the time to estimate the selectivity is independent of the number of database strings and linear with respect to the length of query string we give detailed empirical performance evaluation of our solution for synthetic and real world datasets we show that vsol is effective for large skewed databases of short strings
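a minimal sketch of the decomposition step follows, assuming a simple padding convention: every database string is split into overlapping grams of a fixed length and each gram is mapped to its inverse string, the set of ids of strings containing it; the signatures and clustering described above are omitted.

    from collections import defaultdict

    def qgrams(s, q):
        """overlapping substrings of length q, padded so that every character
        appears in q grams (the exact preprocessing is an assumption here)."""
        pad = "#" * (q - 1)
        s = pad + s + pad
        return [s[i:i + q] for i in range(len(s) - q + 1)]

    def build_inverse_strings(strings, q):
        """inverse string of a gram = ids of all database strings containing it."""
        inverse = defaultdict(set)
        for sid, s in enumerate(strings):
            for g in set(qgrams(s, q)):
                inverse[g].add(sid)
        return inverse

    if __name__ == "__main__":
        db = ["john", "joan", "jon", "dawn"]       # toy database of short strings
        inv = build_inverse_strings(db, q=2)
        counts = defaultdict(int)                  # strings sharing grams with the query
        for g in set(qgrams("jhon", 2)):
            for sid in inv.get(g, ()):
                counts[sid] += 1
        print(sorted(counts.items(), key=lambda kv: -kv[1]))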
we analyze online gaming as site of collaboration in digital physical hybrid we ground our analysis in findings from an ethnographic study of the online game world of warcraft in china we examine the interplay of collaborative practices across the physical environment of china’s internet cafes and the virtual game space of world of warcraft our findings suggest that it may be fruitful to broaden existing notions of physical digital hybridity by considering the nuanced interplay between the digital and physical as multi dimensional environment or ecology we illustrate how socio economics government regulations and cultural value systems shaped hybrid cultural ecology of online gaming in china
the existing solutions to privacy preserving publication can be classified into the theoretical and heuristic categories the former guarantees provably low information loss whereas the latter incurs gigantic loss in the worst case but is shown empirically to perform well on many real inputs while numerous heuristic algorithms have been developed to satisfy advanced privacy principles such as diversity closeness etc the theoretical category is currently limited to anonymity which is the earliest principle known to have severe vulnerability to privacy attacks motivated by this we present the first theoretical study on diversity popular principle that is widely adopted in the literature first we show that optimal diverse generalization is np hard even when there are only distinct sensitive values in the microdata then an approximation algorithm is developed where is the dimensionality of the underlying dataset this is the first known algorithm with non trivial bound on information loss extensive experiments with real datasets validate the effectiveness and efficiency of proposed solution
we present methods for recovering surface height fields such as geometric details of textures by incorporating shadow constraints we introduce shadow graphs which give new graph based representation for shadow constraints it can be shown that the shadow graph alone is sufficient to solve the shape from shadow problem from dense set of images shadow graphs provide simpler and more systematic approach to represent and integrate shadow constraints from multiple images to recover height fields from sparse set of images we propose method for integrated shadow and shading constraints previous shape from shadow algorithms do not consider shading constraints while shape from shading usually assumes there is no shadow our method is based on collecting set of images from fixed viewpoint as known light source changes its position it first builds shadow graph from shadow constraints from which an upper bound for each pixel can be derived if the height values of small number of pixels are initialized correctly finally constrained optimization procedure is designed to make the results from shape from shading consistent with the height bounds derived from the shadow constraints our technique is demonstrated on both synthetic and real imagery
abstract mechanism for reducing the power requirements in processors that use separate architectural register file arf for holding committed values is proposed in this paper we exploit the notion of short lived operands values that target architectural registers that are renamed by the time the instruction producing the value reaches the writeback stage our simulations of the spec benchmarks show that as much as percent to percent of the results are short lived our technique avoids unnecessary writebacks into the result repository slot within the reorder buffer or physical register as well as writes into the arf from unnecessary commitments by caching and isolating short lived operands within small dedicated register file operands are cached in this manner till they can be safely discarded without jeopardizing the recovery from possible branch mispredictions or reconstruction of the precise state in case of interrupts or exceptions additional energy savings are achieved by limiting the number of ports used for instruction commitment the power energy savings are validated using spice measurements of actual layouts in micron cmos process the energy reduction in the rob and the arf is about percent translating into the overall chip energy reduction of about percent and this is achieved with no increase in cycle time little additional complexity and no degradation in the number of instructions committed per cycle
emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies increasing the importance of loop transformations in optimizing compilers because compiler heuristics rely on simplistic performance models and because they are bound to limited set of transformations sequences they only uncover fraction of the peak performance on typical benchmarks iterative optimization is maturing framework to address these limitations but so far it was not successfully applied complex loop transformation sequences because of the combinatorics of the optimization search space we focus on the class of loop transformation which can be expressed as one dimensional affine schedules we define systematic exploration method to enumerate the space of all legal distinct transformations in this class this method is based on an upstream characterization as opposed to state of the art downstream filtering approaches our results demonstrate orders of magnitude improvements in the size of the search space and in the convergence speed of dedicated iterative optimization heuristic
range estimation is essential in many sensor network localisation algorithms although wireless sensor systems usually have available received signal strength indication rssi readings this information has not been effectively used in the existing localisation algorithms in this paper we present novel approach to localisation of sensors in an ad hoc sensor network based on sorted rssi quantisation algorithm this algorithm can improve the range estimation accuracy when distance information is not available or too erroneous the range level used in the quantisation process can be determined by each node using an adaptive quantisation scheme the new algorithm can be implemented in distributed way and achieves significant improvement over existing range free algorithms the performance advantage for various sensor networks is shown with experimental results from our extensive simulation with realistic radio model
open source softwares provide rich resource of empirical research in software engineering static code metrics are good indicator of software quality and maintainability in this work we have tried to answer the question whether bug predictors obtained from one project can be applied to different project with reasonable accuracy two open source projects firefox and apache http server ahs are used for this study static code metrics are calculated for both projects using in house software and the bug information is obtained from bug databases of these projects the source code files are classified as clean or buggy using the decision tree classifier the classifier is trained on metrics and bug data of firefox and tested on apache http server and vice versa the results obtained vary with different releases of these projects and can be as good as of the files correctly classified and as poor as of the files correctly classified by the trained classifier
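the cross-project setup reduces to training a classifier on one project's metrics and labels and applying it to another project's files; the short python sketch below shows that step with scikit-learn's decision tree (assuming scikit-learn is available) on entirely hypothetical metric values, since the actual metric suite and bug data of the study are not reproduced here.

    from sklearn.tree import DecisionTreeClassifier

    # hypothetical static code metrics per file: [loc, cyclomatic, fan_out]
    train_X = [[120, 4, 3], [2400, 48, 17], [300, 9, 5],
               [1800, 35, 12], [90, 2, 1], [950, 21, 9]]
    train_y = [0, 1, 0, 1, 0, 1]          # 1 = buggy, 0 = clean (project A labels)

    test_X = [[150, 5, 2], [2100, 40, 15]]   # files from a different project B

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(train_X, train_y)                # train on project A ...
    print(clf.predict(test_X))               # ... classify project B's files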
when we encounter an english word that we do not understand we can look it up in dictionary however when an american sign language asl user encounters an unknown sign looking up the meaning of that sign is not straightforward process it has been recently proposed that this problem can be addressed using computer vision system that helps users look up the meaning of sign in that approach sign lookup can be treated as video database retrieval problem when the user encounters an unknown sign the user provides video example of that sign as query so as to retrieve the most similar signs in the database necessary component of such sign lookup system is similarity measure for comparing sign videos given query video of specific sign the similarity measure should assign high similarity values to videos from the same sign and low similarity values to videos from other signs this paper evaluates state of the art video based similarity measure called dynamic space time warping dstw for the purposes of sign retrieval the paper also discusses how to specifically adapt dstw so as to tolerate differences in translation and scale
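as background for the similarity measure being evaluated, the sketch below implements plain dynamic time warping between two sequences of per-frame feature vectors; dstw additionally searches over candidate hand locations within each frame, which is not reproduced here, and the toy trajectories are made up for the example.

    import math

    def dtw(query, candidate, dist=math.dist):
        """classic dynamic time warping; lower scores mean more similar videos."""
        n, m = len(query), len(candidate)
        INF = float("inf")
        D = [[INF] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = dist(query[i - 1], candidate[j - 1])
                D[i][j] = cost + min(D[i - 1][j],        # skip a query frame
                                     D[i][j - 1],        # skip a candidate frame
                                     D[i - 1][j - 1])    # match the two frames
        return D[n][m]

    if __name__ == "__main__":
        # hypothetical 2-d hand-position trajectories
        query = [(0, 0), (1, 1), (2, 2), (3, 2)]
        sign_a = [(0, 0), (1, 1), (2, 2), (3, 2), (3, 2)]
        sign_b = [(3, 0), (2, 1), (1, 2), (0, 2)]
        ranked = sorted([("sign_a", dtw(query, sign_a)),
                         ("sign_b", dtw(query, sign_b))], key=lambda t: t[1])
        print(ranked)                        # sign_a should rank first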
constructing code analyzers may be costly and error prone if inadequate technologies and tools are used if they are written in conventional programming language for instance several thousand lines of code may be required even for relatively simple analyses one way of facilitating the development of code analyzers is to define very high level domain oriented language and implement an application generator that creates the analyzers from the specification of the analyses they are intended to perform this paper presents system for developing code analyzers that uses database to store both no loss fine grained intermediate representation and the results of the analyses the system uses an algebraic representation called as the user visible intermediate representation analyzers are specified in declarative language called ell which enables an analysis to be specified in the form of traversal of an algebraic expression with access to and storage of the database information the algebraic expression indices foreign language interface allows the analyzers to be embedded in programs this is useful for implementing the user interface of an analyzer for example or to facilitate interoperation of the generated analyzers with pre existing tools the paper evaluates the strengths and limitations of the proposed system and compares it to other related approaches
an open vision problem is to automatically track the articulations of people from video sequence this problem is difficult because one needs to determine both the number of people in each frame and estimate their configurations but finding people and localizing their limbs is hard because people can move fast and unpredictably can appear in variety of poses and clothes and are often surrounded by limb like clutter we develop completely automatic system that works in two stages it first builds model of appearance of each person in video and then it tracks by detecting those models in each frame tracking by model building and detection we develop two algorithms that build models one bottom up approach groups together candidate body parts found throughout sequence we also describe top down approach that automatically builds people models by detecting convenient key poses within sequence we finally show that building discriminative model of appearance is quite helpful since it exploits structure in background without background subtraction we demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity feature length film run lola run and legacy sports footage from the world series and winter olympics experiments suggest that our system can count distinct individuals can identify and track them can recover when it loses track for example if individuals are occluded or briefly leave the view can identify body configuration accurately and is not dependent on particular models of human motion
in this paper we propose lightweight middleware system that supports wireless sensor networks wsns to handle real time network management using hierarchical framework the primary objective of this middleware is to provide standard management services for sensor applications to maintain network service quality with minimal human intervention middleware of sensor node also reconfigures its functionality autonomously to reflect changes of node resources expenditure or network environment in addition we propose an alternate power management solution to achieve energy efficiency of sensor networks via controlling management performance of sensor nodes this approach reduces node energy consumption without frequent reconfiguration of network management structure
high quality virtualization that is complete device semantics full feature set close to native performance and real time response is critical to both server and client virtualizations existing solutions for virtualization eg full device emulation paravirtualization and direct cannot meet the requirements of high quality virtualization due to high overheads lack of complete semantic or full feature set support we have developed new techniques for high quality virtualization including device semantic preservation essential principles for avoiding device virtualization holes and real time vmm scheduler extensions using direct with hardware iommu it not only meets the requirements of high quality virtualization but also is the basis for pci sig virtualization iov experimental results show that our implementation can achieve up to of the native performance and up to of the paravirtualization performance in addition it can improve the real time ness of the latency sensitive application by up to with the scheduler extensions
texture atlas parameterization provides an effective way to map variety of color and data attributes from texture domains onto polygonal surface meshes however the individual charts of such atlases are typically plagued by noticeable seams we describe new type of atlas which is seamless by construction our seamless atlas comprises all quadrilateral charts and permits seamless texturing as well as per fragment down sampling on rendering hardware and polygon simplification we demonstrate the use of this atlas for capturing appearance attributes and producing seamless renderings
we introduce the iceberg cube problem as reformulation of the datacube cube problem the iceberg cube problem is to compute only those group by partitions with an aggregate value eg count above some minimum support threshold the result of iceberg cube can be used to answer group by queries with clause such as having count where is greater than the threshold for mining multidimensional association rules and to complement existing strategies for identifying interesting subsets of the cube for precomputation we present new algorithm buc for iceberg cube computation buc builds the cube bottom up ie it builds the cube by starting from group by on single attribute then group by on pair of attributes then group by on three attributes and so on this is the opposite of all techniques proposed earlier for computing the cube and has an important practical advantage buc avoids computing the larger group bys that do not meet minimum support the pruning in buc is similar to the pruning in the apriori algorithm for association rules except that buc trades some pruning for locality of reference and reduced memory requirements buc uses the same pruning strategy when computing sparse complete cubes we present thorough performance evaluation over broad range of workloads our evaluation demonstrates that in contrast to earlier assumptions minimizing the aggregations or the number of sorts is not the most important aspect of the sparse cube problem the pruning in buc combined with an efficient sort method enables buc to outperform all previous algorithms for sparse cubes even for computing entire cubes and to dramatically improve iceberg cube computation
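to make the bottom up pruning concrete, here is a minimal python sketch of a buc style iceberg cube on count; the recursion expands one dimension at a time and stops as soon as a partition falls below the minimum support, so larger group bys that cannot qualify are never computed. the row layout, dimension names and minsup value are illustrative, not taken from the paper

```python
from collections import defaultdict

def buc(rows, dims, minsup, prefix=(), result=None):
    """Bottom-up iceberg cube on COUNT: recurse only into partitions
    whose count meets the minimum support threshold (the pruning step)."""
    if result is None:
        result = {}
    if len(rows) < minsup:          # prune: no finer group-by can qualify
        return result
    result[prefix] = len(rows)      # emit the aggregate for this group-by
    for i, d in enumerate(dims):
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            # expand on dimension d, then only on dimensions after d,
            # i.e. the bottom-up (single attribute, then pairs, ...) order
            buc(part, dims[i + 1:], minsup, prefix + ((d, value),), result)
    return result

# toy usage: rows are dicts keyed by dimension name
rows = [{"store": s, "item": i} for s in "AAB" for i in "xxy"]
print(buc(rows, dims=("store", "item"), minsup=2))
```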
group editors allow distributed group of users to collaboratively edit shared documents such as source code and web pages as well accepted consistency control method in group editors operational transformation ot is intuitively believed to be able to preserve operation intentions however intention preservation as consistency constraint has not been rigorously formalized making it difficult to design ot algorithms and prove their correctness this work proposes formalization of intention preservation and analyzes several ot based approaches to achieving it
most of nowadays web content is stored in relational data bases it is important to develop ways for representation of this information in the semantic web to allow software agents to process it intelligently the paper presents an approach to translation of data schema and the most important constraints from relational databases into the semantic web without any extensions to the semantic web languages
existing solutions to the automated physical design problem in database systems attempt to minimize execution costs of input workloads for given storage constraint in this work we argue that this model is not flexible enough to address several real world situations to overcome this limitation we introduce constraint language that is simple yet powerful enough to express many important scenarios we build upon previously proposed transformation based framework to incorporate constraints into the search space we then show experimentally that we are able to handle rich class of constraints and that our proposed technique scales gracefully our approach generalizes previous work that assumes simpler optimization models where configuration size is the only fixed constraint as consequence the process of tuning workload not only becomes more flexible but also more complex and getting the best design in the first attempt becomes difficult we propose paradigm shift for physical design tuning in which sessions are highly interactive allowing dbas to quickly try different options identify problems and obtain physical designs in an agile manner
our research project aims to build an infrastructure that can efficiently and seamlessly meet the demanding requirements of business event stream based analytical applications we focus on the complex event analysis on event streams qos aware event processing services and novel event stream processing model that seamlessly and efficiently combines event processing services that model is built upon the sense respond loops that support complete business intelligence process to sense interpret predict automate and respond to business processes and aim to decrease the time it takes to make the business decisions we have developed zelessa an event stream management system as proof of concept
this paper presents new language for identifying the changing roles that objects play over the course of the computation each object’s points to relationships with other objects determine the role that it currently plays roles therefore reflect the object’s membership in specific data structures with the object’s role changing as it moves between data structures we provide programming model which allows the developer to specify the roles of objects at different points in the computation the model also allows the developer to specify the effect of each operation at the granularity of role changes that occur in identified regions of the heap
when applying optimizations number of decisions are made using fixed strategies such as always applying an optimization if it is applicable applying optimizations in fixed order and assuming fixed configuration for optimizations such as tile size and loop unrolling factor while it is widely recognized that these fixed strategies may not be the most appropriate for producing high quality code especially for embedded systems there are no general and automatic strategies that do otherwise in this paper we present framework that enables these decisions to be made based on predicting the impact of an optimization taking into account resources and code context the framework consists of optimization models code models and resource models which are integrated for predicting the impact of applying optimizations because data cache performance is important to embedded codes we focus on cache performance and present an instance of the framework for cache performance in this paper since most opportunities for cache improvement come from loop optimizations we describe code optimization and cache models tailored to predict the impact of applying loop optimizations for data locality experimentally we demonstrate the need to selectively apply optimizations and show the performance benefit of our framework in predicting when to apply an optimization we also show that our framework can be used to choose the most beneficial optimization when number of optimizations can be applied to loop nest and lastly we show that we can use the framework to combine optimizations on loop nest
software code caches help amortize the overhead of dynamic binary transformation by enabling reuse of transformed code since code caches contain potentially altered copy of every instruction that executes run time access to code cache can be very powerful opportunity unfortunately current research infrastructures lack the ability to model and direct code caching and as result past code cache investigations have required access to the source code of the binary transformation system this paper presents code cache aware interface to the pin dynamic instrumentation system while program executes our interface allows user to inspect the code cache receive callbacks when key events occur and manipulate the code cache contents at will we demonstrate the utility of this interface on four architectures ia emt ipf xscale and present several tools written using our api these tools include self modifying code handler two phase instrumentation analyzer code cache visualizer and custom code cache replacement policies we also show that tools written using our interface have comparable performance to direct source level implementations both our interface and sample open source tools that utilize the interface have been incorporated into the standard distribution of the pin dynamic instrumentation engine which has been downloaded over times in months
active memory systems help processors overcome the memory wall when applications exhibit poor cache behavior they consist of either active memory elements that perform data parallel computations in the memory system itself or an active memory controller that supports address re mapping techniques that improve data locality both active memory approaches create coherence problems even on uniprocessor systems since there are either additional processors operating on the data directly or the processor is allowed to refer to the same data via more than one address while most active memory implementations require cache flushes we propose new technique to solve the coherence problem by extending the coherence protocol our active memory controller leverages and extends the coherence mechanism so that re mapping techniques work transparently on both uniprocessor and multiprocessor systems we present microarchitecture for an active memory controller with programmable core and specialized hardware that accelerates cache line assembly and disassembly we present detailed simulation results that show uniprocessor speedup from to on range of applications and microbenchmarks in addition to uniprocessor speedup we show single node multiprocessor speedup for parallel active memory applications and discuss how the same controller architecture supports coherent multi node systems called active memory clusters
in this paper we focus on the problem of belief aggregation ie the task of forming group consensus probability distribution by combining the beliefs of the individual members of the group we propose the use of bayesian networks to model the interactions between the individuals of the group and introduce average and majority canonical models and their application to information aggregation due to efficiency restrictions imposed by the group recommending problem where our research is framed we have had to develop specific inference algorithms to compute group recommendations
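the paper realizes average and majority aggregation as canonical models inside a bayesian network; as a hedged illustration only, the sketch below shows the two aggregation rules in their plain form, a linear opinion pool over member distributions and a majority vote over each member's most probable value, without any network structure or inference

```python
from collections import Counter

def average_pool(beliefs):
    """Linear opinion pool: average the members' probability distributions."""
    keys = set().union(*beliefs)
    n = len(beliefs)
    return {k: sum(b.get(k, 0.0) for b in beliefs) / n for k in keys}

def majority_vote(beliefs):
    """Majority aggregation: each member votes for its most probable value."""
    votes = Counter(max(b, key=b.get) for b in beliefs)
    return votes.most_common(1)[0][0]

members = [
    {"like": 0.7, "dislike": 0.3},
    {"like": 0.4, "dislike": 0.6},
    {"like": 0.9, "dislike": 0.1},
]
print(average_pool(members))   # group consensus distribution
print(majority_vote(members))  # 'like'
```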
in this paper we describe semantic web application that detects conflict of interest coi relationships among potential reviewers and authors of scientific papers this application discovers various semantic associations between the reviewers and authors in populated ontology to determine degree of conflict of interest this ontology was created by integrating entities and relationships from two social networks namely knows from foaf friend of friend social network and co author from the underlying co authorship network of the dblp bibliography we describe our experiences developing this application in the context of class of semantic web applications which have important research and engineering challenges in common in addition we present an evaluation of our approach for real life coi detection
due to its high popularity weblogs or blogs in short present wealth of information that can be very helpful in assessing the general public’s sentiments and opinions in this paper we study the problem of mining sentiment information from blogs and investigate ways to use such information for predicting product sales performance based on an analysis of the complex nature of sentiments we propose sentiment plsa plsa in which blog entry is viewed as document generated by number of hidden sentiment factors training an plsa model on the blog data enables us to obtain succinct summary of the sentiment information embedded in the blogs we then present arsa an autoregressive sentiment aware model to utilize the sentiment information captured by plsa for predicting product sales performance extensive experiments were conducted on movie data set we compare arsa with alternative models that do not take into account the sentiment information as well as model with different feature selection method experiments confirm the effectiveness and superiority of the proposed approach
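as a hedged sketch of the prediction step, the code below fits a least squares autoregression on past sales augmented with a lagged sentiment feature, which is the general shape of an autoregressive sentiment aware model; the actual arsa formulation and the s plsa sentiment factors are richer, and all data and names here are made up for illustration

```python
import numpy as np

def fit_arsa_like(sales, sentiment, p=2):
    """Least-squares fit of x_t ~ sum_i a_i * x_{t-i} + b * s_{t-1} + c,
    i.e. an autoregressive model augmented with a lagged sentiment feature."""
    X, y = [], []
    for t in range(p, len(sales)):
        X.append(list(sales[t - p:t]) + [sentiment[t - 1], 1.0])
        y.append(sales[t])
    coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return coef

def predict_next(coef, sales, sentiment, p=2):
    x = list(sales[-p:]) + [sentiment[-1], 1.0]
    return float(np.dot(coef, x))

sales = [10, 12, 13, 15, 18, 21, 25]             # e.g. daily box office
sentiment = [0.2, 0.3, 0.3, 0.5, 0.6, 0.7, 0.8]  # e.g. per-day sentiment factor
coef = fit_arsa_like(sales, sentiment)
print(predict_next(coef, sales, sentiment))
```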
the tuple space coordination model is one of the most interesting coordination models for open distributed systems due to its space and time decoupling and its synchronization power several works have tried to improve the dependability of tuple spaces through the use of replication for fault tolerance and access control for security however many practical applications in the internet require both fault tolerance and security this paper describes the design and implementation of depspace byzantine fault tolerant coordination service that provides tuple space abstraction the service offered by depspace is secure reliable and available as long as less than third of service replicas are faulty moreover the content addressable confidentiality scheme developed for depspace bridges the gap between byzantine fault tolerant replication and confidentiality of replicated data and can be used in other systems that store critical data
general purpose operating systems such as unix which evolved on single processor machines have made the transition in one form or another to parallel architectures eg rothnie holman however it is not clear that all users of parallel architectures require the virtual machine presented by general purpose operating system eg bryant et al it is unfortunate if such users are given the alternatives of either compromising with whatever operating system interface is available or writing all the low level routines for themselves one solution to this problem is to provide customisable systems so that high performance parallel applications can acquire tailored resource management environment eg mukherjee and schwan this paper first gives some background to applications and operating systems and then describes flexible and extensible system currently being developed
battery lifetime has become one of the top usability concerns of mobile systems while many endeavors have been devoted to improving battery lifetime they have fallen short in understanding how users interact with batteries in response we have conducted systematic user study on battery use and recharge behavior an important aspect of user battery interaction on both laptop computers and mobile phones based on this study we present three important findings most recharges happen when the battery has substantial energy left considerable portion of the recharges are driven by context location and time and those driven by battery levels usually occur when the battery level is high and there is great variation among users and systems these findings indicate that there is substantial opportunity to enhance existing energy management policies which solely focus on extending battery lifetime and often lead to excess battery energy upon recharge by adapting the aggressiveness of the policy to match the usage and recharge patterns of the device we have designed deployed and evaluated user and statistics driven energy management system llama to exploit the battery energy in user adaptive and user friendly fashion to better serve the user we also conducted user study after the deployment that shows llama effectively harvests excess battery energy for better user experience brighter display or higher quality of service more application data without noticeable change in battery lifetime
predictive models benefit from compact non redundant subset of features that improves interpretability and generalization modern data sets are wide dirty mixed with both numerical and categorical predictors and may contain interactive effects that require complex models this is challenge for filters wrappers and embedded feature selection methods we describe details of an algorithm using tree based ensembles to generate compact subset of non redundant features parallel and serial ensembles of trees are combined into mixed method that can uncover masking and detect features of secondary effect simulated and actual examples illustrate the effectiveness of the approach
this study addresses the problem of choosing the most suitable probabilistic model selection criterion for unsupervised learning of visual context of dynamic scene using mixture models rectified bayesian information criterion bicr and completed likelihood akaike’s information criterion cl aic are formulated to estimate the optimal model order complexity for given visual scene both criteria are designed to overcome poor model selection by existing popular criteria when the data sample size varies from small to large and the true mixture distribution kernel functions differ from the assumed ones extensive experiments on learning visual context for dynamic scene modelling are carried out to demonstrate the effectiveness of bicr and cl aic compared to that of existing popular model selection criteria including bic aic and integrated completed likelihood icl our study suggests that for learning visual context using mixture model bicr is the most appropriate criterion given sparse data while cl aic should be chosen given moderate or large data sample sizes
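the paper proposes rectified and completed variants of the standard criteria; as a baseline sketch only, the snippet below selects a mixture model order with the ordinary bic and aic scores provided by scikit learn (assumed to be available), which is not the paper's bicr or cl aic but shows where such criteria plug into model order selection

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy "visual context" features: two well-separated clusters
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

scores = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    scores[k] = (gm.bic(X), gm.aic(X))   # lower is better for both

best_k_bic = min(scores, key=lambda k: scores[k][0])
print(scores)
print("model order selected by BIC:", best_k_bic)
```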
the last several years have seen proliferation of static and runtime analysis tools for finding security violations that are caused by explicit information flow in programs much of this interest has been caused by the increase in the number of vulnerabilities such as cross site scripting and sql injection in fact these explicit information flow vulnerabilities commonly found in web applications now outnumber vulnerabilities such as buffer overruns common in type unsafe languages such as and tools checking for these vulnerabilities require specification to operate in most cases the task of providing such specification is delegated to the user moreover the efficacy of these tools is only as good as the specification unfortunately writing comprehensive specification presents major challenge parts of the specification are easy to miss leading to missed vulnerabilities similarly incorrect specifications may lead to false positives this paper proposes merlin new approach for automatically inferring explicit information flow specifications from program code such specifications greatly reduce manual labor and enhance the quality of results while using tools that check for security violations caused by explicit information flow beginning with data propagation graph which represents interprocedural flow of information in the program merlin aims to automatically infer an information flow specification merlin models information flow paths in the propagation graph using probabilistic constraints naive modeling requires an exponential number of constraints one per path in the propagation graph for scalability we approximate these path constraints using constraints on chosen triples of nodes resulting in cubic number of constraints we characterize this approximation as probabilistic abstraction using the theory of probabilistic refinement developed by mciver and morgan we solve the resulting system of probabilistic constraints using factor graphs which are well known structure for performing probabilistic inference we experimentally validate the merlin approach by applying it to large business critical web applications that have been analyzed with catnet state of the art static analysis tool for net we find total of new confirmed specifications which result in total of additional vulnerabilities across the benchmarks more accurate specifications also reduce the false positive rate in our experiments merlin inferred specifications result in false positives being removed this constitutes reduction in the catnet false positive rate on these programs the final false positive rate for catnet after applying merlin in our experiments drops to under
in this paper we present long term study of user centric web traffic data collected in and from two large representative panels of french internet users our work focuses on the dynamics of personal territories on the web and their evolution between and at the session level we distinguish four profiles of browsing dynamics in and point out the growing dichotomy between straight routine sessions and exploratory browsing at global level we observe that although each individual’s corpus of visited sites is permanently growing his browsing practices are structured around routine well known sites which operate as links providers to new sites we argue that this tension between the known and the unknown is constitutive of web practices and is fundamental property of personal web territories
we study the complexity and the efficient computation of flow on triangulated terrains we present an acyclic graph the descent graph that enables us to trace flow paths in triangulations efficiently we use the descent graph to obtain efficient algorithms for computing river networks and watershed area maps in sort o’s where is the complexity of the river network and of the descent graph furthermore we describe data structure based on the subdivision of the terrain induced by the edges of the triangulation and paths of steepest ascent and descent from its vertices this data structure can be used to report the boundary of the watershed of query point or the flow path from in scan o’s where is the complexity of the subdivision underlying the data structure is the number of o’s used for planar point location in this subdivision and is the size of the reported output on fat terrains that is triangulated terrains where the minimum angle of any triangle is bounded from below by we show that the worst case complexity of the descent graph and of any path of steepest descent is where is the number of triangles in the terrain the worst case complexity of the river network and the above mentioned data structure on such terrains is when is positive constant this improves the corresponding bounds for arbitrary terrains by linear factor we prove that similar bounds cannot be proven for delaunay triangulations these can have river networks of complexity
we present zone and polygon menus two new variants of multi stroke marking menus that consider both the relative position and orientation of strokes our menus are designed to increase menu breadth over the item limit of status quo orientation based marking menus an experiment shows that zone and polygon menus can successfully increase breadth by factor of or more over orientation based marking menus while maintaining high selection speed and accuracy we also discuss hybrid techniques that may further increase menu breadth and performance our techniques offer ui designers new options for balancing menu breadth and depth against selection speed and accuracy
software routers can lead us from network of special purpose hardware routers to one of general purpose extensible infrastructure if that is they can scale to high speeds we identify the challenges in achieving this scalability and propose solution cluster based router architecture that uses an interconnect of commodity server platforms to build software routers that are both incrementally scalable and fully programmable
current trends in microprocessor designs indicate increasing pipeline depth in order to keep up with higher clock frequencies and increased architectural complexity speculatively issued instructions are particularly sensitive to increases in pipeline depth in this brief we use load hit speculation as an example and evaluate its cost effectiveness as pipeline depth increases our results indicate that as pipeline depth increases speculation is more essential for performance but can drastically alter the utilization of pipeline resources particularly the issue queue we propose an alternative more cost effective design that takes into consideration the different issue queue utilization demands without degrading overall processor performance
the predicate control problem involves synchronizing distributed computation to maintain given global predicate in contrast with many popular distributed synchronization problems such as mutual exclusion readers writers and dining philosophers predicate control assumes look ahead so that the computation is an off line rather than an on line input predicate control is targeted towards applications such as rollback recovery debugging and optimistic computing in which such computation look ahead is natural we define predicate control formally and show that in its full generality the problem is np complete we find efficient solutions for some important classes of predicates including disjunctive predicates mutual exclusion predicates and readers writers predicates for each class of predicates we determine the necessary and sufficient conditions for solving predicate control and describe an efficient algorithm for determining synchronization strategy in the case of independent mutual exclusion predicates we determine that predicate control is np complete and describe an efficient algorithm that finds solution under certain constraints
in this paper we demonstrate an approach for the discovery and validation of nm schema match in the hierarchical structures like the xml schemata basic idea is to propose an nm node match between children leaf nodes of two matching non leaf nodes of the two schemata the similarity computation of the two non leaf nodes is based upon the syntactic and linguistic similarity of the node labels supported by the similarity among the ancestral paths from nodes to the root the nm matching proposition is then validated with the help of the mini taxonomies hierarchical structures extracted from large set of schema trees belonging to the same domain the technique intuitively supports the collective intelligence of the domain users indirectly collaborating for the validation of the complex match propositions
we propose an efficient real time solution for tracking rigid objects in using single camera that can handle large camera displacements drastic aspect changes and partial occlusions while commercial products are already available for offline camera registration robust online tracking remains an open issue because many real time algorithms described in the literature still lack robustness and are prone to drift and jitter to address these problems we have formulated the tracking problem in terms of local bundle adjustment and have developed method for establishing image correspondences that can equally well handle short and wide baseline matching we then can merge the information from preceding frames with that provided by very limited number of keyframes created during training stage which results in real time tracker that does not jitter or drift and can deal with significant aspect changes
this paper deals with the problem of admission control for geographically distributed web servers in presence of several access routers the main contribution of this paper is the proposal of scalable admission control scheme with the purpose to accept as many new sessions as possible within the constraints on response time imposed by service level agreements the proposed policy autonomously configures and periodically adapts its component level parameters to the time varying traffic situations extensive simulations of our policy show that our algorithm always guarantees the adherence to slas under different traffic scenarios the proposed method shows stable behavior during overload by smoothing flash crowd effects it also improves the successful session termination probability and the utilization of system resources when compared to other traditional admission control schemes
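a hedged sketch of the general idea of feedback driven admission control follows: sessions are admitted with a probability that is periodically adapted from measured response times against an sla target; the controller, its parameters and the adaptation rule here are illustrative stand ins and not the component level scheme of the paper

```python
import random

class AdmissionController:
    """Accept new sessions with probability p; periodically adapt p so the
    measured response time tracks the SLA target (simple AIMD-style rule)."""
    def __init__(self, sla_ms, step=0.05):
        self.sla_ms, self.step, self.p = sla_ms, step, 1.0

    def admit_new_session(self):
        return random.random() < self.p

    def adapt(self, measured_ms):
        if measured_ms > self.sla_ms:          # overloaded: shed more load
            self.p = max(0.05, self.p * 0.7)
        else:                                  # headroom: admit more sessions
            self.p = min(1.0, self.p + self.step)

ctrl = AdmissionController(sla_ms=200)
for rt in [120, 150, 260, 320, 240, 180, 150]:   # measured response times
    ctrl.adapt(rt)
    print(round(ctrl.p, 3), ctrl.admit_new_session())
```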
instruction packing is combination compiler architectural approach that allows for decreased code size reduced power consumption and improved performance the packing is obtained by placing frequently occurring instructions into an instruction register file irf multiple irf entries can then be accessed using special packed instructions previous irf efforts focused on using single entry register file for the duration of an application this paper presents software and hardware extensions to the irf supporting multiple instruction register windows to allow greater number of relevant instructions to be available for packing in each function windows are shared among similar functions to reduce the overall costs involved in such an approach the results indicate that significant improvements in instruction fetch cost can be obtained by using this simple architectural enhancement we also show that using an irf with loop cache which is also used to reduce energy consumption results in much less energy consumption than using either feature in isolation
in this article we present new approach to page ranking the page rank of collection of web pages can be represented in parameterized model and the user requirements can be represented by set of constraints for particular parameterization namely linear combination of the page ranks produced by different forcing functions and user requirements represented by set of linear constraints the problem can be solved using quadratic programming method the solution to this problem produces set of parameters which can be used for ranking all pages in the web we show that the method is suitable for building customized versions of pagerank which can be readily adapted to the needs of vertical search engine or that of single user
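a small sketch of the formulation, under assumptions: given a few precomputed ranking vectors, we look for non negative combination weights that sum to one, minimize a quadratic objective and satisfy a user supplied linear constraint on the combined ranks, solved here with scipy's slsqp; the objective and the constraint are invented for illustration and are not the paper's exact program

```python
import numpy as np
from scipy.optimize import minimize

# three precomputed ranking vectors (e.g. PageRank under different forcing
# functions) over four pages; the values are made up for illustration
R = np.array([[0.40, 0.30, 0.20, 0.10],
              [0.10, 0.20, 0.30, 0.40],
              [0.25, 0.25, 0.25, 0.25]])

uniform = np.full(R.shape[0], 1.0 / R.shape[0])
objective = lambda w: float(np.sum((w - uniform) ** 2))  # stay near uniform

constraints = [
    {"type": "eq",   "fun": lambda w: np.sum(w) - 1.0},  # weights sum to 1
    # user requirement: page 2 must rank at least as high as page 0
    {"type": "ineq", "fun": lambda w: (R.T @ w)[2] - (R.T @ w)[0]},
]
res = minimize(objective, uniform, method="SLSQP",
               bounds=[(0, 1)] * R.shape[0], constraints=constraints)
print("weights:", np.round(res.x, 3))
print("combined page ranks:", np.round(R.T @ res.x, 3))
```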
considerable body of work on model based software debugging mbsd has been published in the past decade we summarise the underlying ideas and present the different approaches as abstractions of the concrete semantics of the programming language we compare the model based framework with other well known automated debugging approaches and present open issues challenges and potential future directions of mbsd
with the increased abilities for automated data collection made possible by modern technology the typical sizes of data collections have continued to grow in recent years in such cases it may be desirable to store the data in reduced format in order to improve the storage transfer time and processing requirements on the data one of the challenges of designing effective data compression techniques is to be able to preserve the ability to use the reduced format directly for wide range of database and data mining applications in this paper we propose the novel idea of hierarchical subspace sampling in order to create reduced representation of the data the method is naturally able to estimate the local implicit dimensionalities of each point very effectively and thereby create variable dimensionality reduced representation of the data such technique has the advantage that it is very adaptive about adjusting its representation depending upon the behavior of the immediate locality of data point an interesting property of the subspace sampling technique is that unlike all other data reduction techniques the overall efficiency of compression improves with increasing database size this is highly desirable property for any data reduction system since the problem itself is motivated by the large size of data sets because of its sampling approach the procedure is extremely fast and scales linearly both with data set size and dimensionality furthermore the subspace sampling technique is able to reveal important local subspace characteristics of high dimensional data which can be harnessed for effective solutions to problems such as selectivity estimation and approximate nearest neighbor search
the majority of people in rural developing regions do not have access to the world wide web traditional network connectivity technologies have proven to be prohibitively expensive in these areas the emergence of new long range wireless technologies provide hope for connecting these rural regions to the internet however the network connectivity provided by these new solutions are by nature intermittent due to high network usage rates frequent power cuts and the use of delay tolerant links typical applications especially interactive applications like web search do not tolerate intermittent connectivity in this paper we present the design and implementation of ruralcafe system intended to support efficient web search over intermittent networks ruralcafe enables users to perform web search asynchronously and find what they are looking for in one round of intermittency as opposed to multiple rounds of search downloads ruralcafe does this by providing an expanded search query interface which allows user to specify additional query terms to maximize the utility of the results returned by search query given knowledge of the limited available network resources ruralcafe performs optimizations to prefetch pages to best satisfy search query based on user’s search preferences in addition ruralcafe does not require modifications to the web browser and can provide single round search results tailored to various types of networks and economic constraints we have implemented and evaluated the effectiveness of ruralcafe using queries from logs made to large search engine queries made by users in an intermittent setting and live queries from small testbed deployment we have also deployed prototype of ruralcafe in kerala india
program profiles identify frequently executed portions of program which are the places at which optimizations offer programmers and compilers the greatest benefit compilers however infrequently exploit program profiles because profiling program requires programmer to instrument and run the program an attractive alternative is for the compiler to statically estimate program profiles this paper presents several new techniques for static branch prediction and profiling the first technique combines multiple predictions of branch’s outcome into prediction of the probability that the branch is taken another technique uses these predictions to estimate the relative execution frequency ie profile of basic blocks and control flow edges within procedure third algorithm uses local frequency estimates to predict the global frequency of calls procedure invocations and basic block and control flow edge executions experiments on the spec integer benchmarks and unix applications show that the frequently executed blocks edges and functions identified by our techniques closely match those in dynamic profile
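a hedged sketch of the two steps described above: several heuristic estimates of a branch's taken probability are merged with a product style evidence combination rule (0.5 acts as the no evidence identity), and the resulting branch probabilities are pushed through a tiny acyclic control flow graph to obtain relative block frequencies; loop handling, the call graph propagation and the paper's specific heuristics are omitted

```python
from functools import reduce

def combine(p1, p2):
    """Combine two independent heuristic estimates that a branch is taken
    (Dempster-Shafer style product rule for a two-outcome hypothesis)."""
    t = p1 * p2
    return t / (t + (1 - p1) * (1 - p2))

def branch_probability(heuristic_hits):
    """heuristic_hits: taken-probabilities of the heuristics that apply."""
    return reduce(combine, heuristic_hits, 0.5)   # 0.5 = no evidence

# e.g. a loop heuristic says 0.88 taken, a pointer heuristic says 0.60
p = branch_probability([0.88, 0.60])

# propagate relative block frequencies through a small acyclic CFG:
# entry -> a (taken, prob p) / b (not taken); a -> exit; b -> exit
freq = {"entry": 1.0}
freq["a"] = freq["entry"] * p
freq["b"] = freq["entry"] * (1 - p)
freq["exit"] = freq["a"] + freq["b"]
print(round(p, 3), {k: round(v, 3) for k, v in freq.items()})
```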
this paper presents direct method for finding corresponding pairs of parts between two shapes statistical knowledge about large number of parts from many different objects is used to find part correspondence between two previously unseen input shapes no class membership information is required the knowledge based approach is shown to produce significantly better results than classical metric distance approach the potential role of part correspondence as complement to geometric and structural comparisons is discussed
here we propose weighted checkpointing approach for mobile distributed computing systems mdcss that significantly reduces checkpointing overheads on mobile nodes checkpointing protocols can be coordinated log based or quasi synchronous coordinated checkpointing requires extra synchronisation messages and may block the underlying computation in quasi synchronous approach processes have limited autonomy in checkpointing but all nodes need not checkpoint concurrently such protocols guarantee consistent global state but results in dynamic checkpointing overheads to minimise these overheads we propose weighted checkpointing protocol that requires no synchronisation messages reduces the checkpointing overheads at mobile nodes but requires logging for mobile nodes simulation results show that the new approach is better than quasi synchronous approach
the advent of large scale distributed systems poses unique engineering challenges in open systems such as the internet it is not possible to prescribe the behaviour of all of the components of the system in advance rather we attempt to design infrastructure such as network protocols in such way that the overall system is robust despite the fact that numerous arbitrary non certified third party components can connect to our system economists have long understood this issue since it is analogous to the design of the rules governing auctions and other marketplaces in which we attempt to achieve socially desirable outcomes despite the impossibility of prescribing the exact behaviour of the market participants who may attempt to subvert the market for their own personal gain this field is known as mechanism design the science of designing rules of game to achieve specific outcome even though each participant may be self interested although it originated in economics mechanism design has become an important foundation of multi agent systems mas research in traditional mechanism design problem analytical methods are used to prove that agents game theoretically optimal strategies lead to socially desirable outcomes in many scenarios traditional mechanism design and auction theory yield clear cut results however there are many situations in which the underlying assumptions of the theory are violated due to the messiness of the real world in this paper we review alternative approaches to mechanism design which treat it as an engineering problem and bring to bear engineering design principles viz iterative step wise refinement of solutions and satisficing instead of optimization in the face of intractable complexity we categorize these approaches under the banner of evolutionary mechanism design
this paper presents theory of dynamic slicing which reveals that the relationship between static and dynamic slicing is more subtle than previously thought the definitions of dynamic slicing are formulated in terms of the projection theory of slicing this shows that existing forms of dynamic slicing contain three orthogonal dimensions in their slicing criteria and allows for lattice theoretic study of the subsumption relationship between these dimensions and their relationship to static slicing formulations
conditional functional dependencies cfds have recently been proposed as extensions of classical functional dependencies that apply to certain subset of the relation as specified by pattern tableau calculating the support and confidence of cfd ie the size of the applicable subset and the extent to which it satisfies the cfd gives valuable information about data semantics and data quality while computing the support is easier computing the confidence exactly is expensive if the relation is large and estimating it from random sample of the relation is unreliable unless the sample is large we study how to efficiently estimate the confidence of cfd with small number of passes one or two over the input using small space our solutions are based on variety of sampling and sketching techniques and apply when the pattern tableau is known in advance and also the harder case when this is given after the data have been seen we analyze our algorithms and show that they can guarantee small additive error we also show that relative errors guarantees are not possible we demonstrate the power of these methods empirically with detailed study using both real and synthetic data these experiments show that it is possible to estimate the cfd confidence very accurately with summaries which are much smaller than the size of the data they represent
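for concreteness, the sketch below computes cfd confidence in the common sense of the largest fraction of pattern matching tuples that can be kept so the dependency holds, and then estimates it by simply recomputing it on a uniform sample; the paper's one and two pass sketch based estimators are more sophisticated, and the relation, attributes and pattern here are toy examples

```python
import random
from collections import Counter, defaultdict

def cfd_confidence(tuples, lhs, rhs, pattern):
    """Confidence of the CFD (lhs -> rhs, pattern): over tuples matching the
    pattern, the largest fraction that can be kept so the FD holds, i.e. the
    sum over lhs-groups of the most frequent rhs value."""
    groups = defaultdict(Counter)
    matched = 0
    for t in tuples:
        if all(t[a] == v for a, v in pattern.items()):   # pattern tableau row
            matched += 1
            groups[tuple(t[a] for a in lhs)][t[rhs]] += 1
    if matched == 0:
        return 1.0
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / matched

def estimate_confidence(tuples, lhs, rhs, pattern, sample_size, seed=0):
    """Sampling estimate: compute the confidence on a uniform random sample."""
    random.seed(seed)
    sample = random.sample(tuples, min(sample_size, len(tuples)))
    return cfd_confidence(sample, lhs, rhs, pattern)

rows = [{"country": "UK", "zip": "W1", "city": c}
        for c in ["London"] * 95 + ["Leeds"] * 5]
print(cfd_confidence(rows, lhs=("zip",), rhs="city", pattern={"country": "UK"}))
print(estimate_confidence(rows, ("zip",), "city", {"country": "UK"}, sample_size=30))
```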
we propose methodology for building robust query classification system that can identify thousands of query classes while dealing in real time with the query volume of commercial web search engine we use pseudo relevance feedback technique given query we determine its topic by classifying the web search results retrieved by the query motivated by the needs of search advertising we primarily focus on rare queries which are the hardest from the point of view of machine learning yet in aggregate account for considerable fraction of search engine traffic empirical evaluation confirms that our methodology yields considerably higher classification accuracy than previously reported we believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to better user experience
in this paper we study the media workload collected from large number of commercial web sites hosted by major isp and that collected from large group of home users connected to the internet via well known cable company some of our key findings are surprisingly the majority of media contents are still delivered via downloading from web servers substantial percentage of media downloading connections are aborted before completion due to the long waiting time hybrid approach pseudo streaming is used by clients to imitate real streaming the mismatch between the downloading rate and the client playback speed in pseudo streaming is common which either causes frequent playback delays to the clients or unnecessary traffic to the internet compared with streaming downloading and pseudo streaming are neither bandwidth efficient nor performance effective to address this problem we propose the design of autostream an innovative system that can provide additional previewing and streaming services automatically for media objects hosted on standard web sites in server farms at the client’s will
in aspect oriented programming languages advice evaluation is usually considered as part of the base program evaluation this is also the case for certain pointcuts such as if pointcuts in aspectj or simply all pointcuts in higher order aspect languages like aspectscheme while viewing aspects as part of base level computation clearly distinguishes aop from reflection it also comes at price because aspects observe base level computation evaluating pointcuts and advice at the base level can trigger infinite regression to avoid these pitfalls aspect languages propose ad hoc mechanisms which increase the complexity for programmers while being insufficient in many cases after shedding light on the many facets of the issue this paper proposes to clarify the situation by introducing levels of execution in the programming language thereby allowing aspects to observe and run at specific possibly different levels we adopt defensive default that avoids infinite regression in all cases and give advanced programmers the means to override this default using level shifting operators we formalize the semantics of our proposal and provide an implementation this work recognizes that different aspects differ in their intended nature and shows that structuring execution contexts helps tame the power of aspects and metaprogramming
query intent classification is crucial for web search and advertising it is known to be challenging because web queries contain less than three words on average and so provide little signal to base classification decisions on at the same time the vocabulary used in search queries is vast thus classifiers based on word occurrence have to deal with very sparse feature space and often require large amounts of training data prior efforts to address the issue of feature sparseness augmented the feature space using features computed from the results obtained by issuing the query to be classified against web search engine however these approaches induce high latency making them unacceptable in practice in this paper we propose new class of features that realizes the benefit of search based features without high latency these leverage co occurrence between the query keywords and tags applied to documents in search results resulting in significant boost to web query classification accuracy by pre computing the tag incidence for suitably chosen set of keyword combinations we are able to generate the features online with low latency and memory requirements we evaluate the accuracy of our approach using large corpus of real web queries in the context of commercial search
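a minimal sketch of the precompute then look up idea, with invented data: offline, tag counts are accumulated for single keywords and keyword pairs seen in past search results; online, a query's feature vector is assembled from those precomputed counts alone, avoiding a live search engine round trip; the downstream classifier is not shown

```python
from collections import Counter
from itertools import combinations

def build_tag_incidence(logged_results):
    """Offline: for each keyword and keyword pair seen in past queries,
    accumulate the tags of the documents those queries retrieved."""
    incidence = {}
    for keywords, doc_tags in logged_results:
        keys = list(keywords) + list(combinations(sorted(keywords), 2))
        for key in keys:
            incidence.setdefault(key, Counter()).update(doc_tags)
    return incidence

def query_features(query, incidence):
    """Online: aggregate precomputed tag counts for the query's keywords and
    keyword pairs, giving tag-based features with low latency."""
    words = query.lower().split()
    feats = Counter()
    for key in words + list(combinations(sorted(words), 2)):
        feats.update(incidence.get(key, {}))
    return feats

logged = [(("cheap", "flights"), ["travel", "airfare"]),
          (("flights", "paris"), ["travel", "destinations"])]
incidence = build_tag_incidence(logged)
print(query_features("cheap flights", incidence))  # feeds a downstream classifier
```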
bugs in concurrent programs are extremely difficult to find and fix during testing in this paper we propose kivati which can efficiently detect and prevent atomicity violation bugs kivati imposes an average run time overhead of which makes it practical to deploy on software in production environments the key attribute that allows kivati to impose this low overhead is its use of hardware watchpoints which can be found on most commodity processors kivati combines watchpoints with simple static analysis that annotates regions of codes that likely need to be executed atomically the watchpoints are then used to monitor these regions for interleaving accesses that may lead to an atomicity violation when an atomicity violation is detected kivati dynamically reorders the access to prevent the violation from occurring kivati can be run in prevention mode which optimizes for performance or in bug finding mode which trades some performance for an enhanced ability to find bugs we implement and evaluate prototype of kivati that protects applications written in on linux platforms we find that kivati is able to detect and prevent atomicity violation bugs in real applications and imposes very reasonable overheads when doing so
if an off the shelf software product exhibits poor dependability due to design faults software fault tolerance is often the only way available to users and system integrators to alleviate the problem thanks to low acquisition costs even using multiple versions of software in parallel architecture scheme formerly reserved for few and highly critical applications may become viable for many applications we have studied the potential dependability gains from these solutions for off the shelf database servers we based the study on the bug reports available for four off the shelf sql servers plus later releases of two of them we found that many of these faults cause systematic non crash failures category ignored by most studies and standard implementations of fault tolerance for databases our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products only in very few cases would demands that triggered bug in one server cause failures in another one and there were no coincident failures in more than two of the servers use of different releases of the same product would also tolerate significant fraction of the faults we report our results and discuss their implications the architectural options available for exploiting them and the difficulties that they may present
we present new algorithm to compute topologically and geometrically accurate triangulation of an implicit surface our approach uses spatial subdivision techniques to decompose manifold implicit surface into star shaped patches and computes visibility map for each patch based on these maps we compute homeomorphic and watertight triangulation as well as parameterization of the implicit surface our algorithm is general and makes no assumption about the smoothness of the implicit surface it can be easily implemented using linear programming interval arithmetic and ray shooting techniques we highlight its application to many complex implicit models and boundary evaluation of csg primitives
as the adoption of grid computing in organizations expands the need for wise utilization of different types of resource also increases volatile resource such as desktop computer is common type of resource found in grids however using efficiently other types of resource such as space shared resources represented by parallel supercomputers and clusters of workstations is extremely important since they can provide great amount of computation power using space shared resources in grids is not straightforward since they require jobs priori to specify some parameters such as allocation time and amount of processors current solutions eg grid resource and allocation management gram are based on the explicit definition of these parameters by the user on the other hand good progress has been made in supporting bag of tasks bot applications on grids this is restricted model of parallelism on which tasks do not communicate among themselves making recovering from failures simple matter of reexecuting tasks as such there is no need to specify maximum number of resources or period of time that resources must be executing the application such as required by space shared resources besides this state of affairs makes leverage from space shared resources hard for bot applications running on grid this paper presents an explicit allocation strategy in which an adaptor automatically fits grid requests to the resource in order to decrease the turn around time of the application we compare it with another strategy described in our previous work called transparent allocation strategy in which idle nodes of the space shared resource are donated to the grid as we shall see both strategies provide good results moreover they are complementary in the sense that they fulfill different usage roles the transparent allocation strategy enables resource owner to raise its utilization by offering cycles that would otherwise go wasted while protecting the local workload from increased contention the explicit allocation strategy conversely allows user to benefit from the accesses she has to space shared resources in the grid enabling her natively to submit tasks without having to craft time processors requests
recently there has been growing interest in streaming xml data much of the work on streaming xml data has been focused on efficient filtering filtering systems deliver xml documents to interested users the burden of extracting the xml fragments of interest from xml documents is placed on users in this paper we propose xtream which evaluates multiple queries in conjunction with the read once nature of streaming data in contrast to the previous work xtream supports wide class of xpath queries including tree shaped expressions order based predicates and nested predicates in addition to improve the efficiency and scalability of xtream we devise an optimization technique called query compaction experimental results with real life and synthetic xml data demonstrate the efficiency and scalability of xtream
transactional memory tm is on its way to becoming the programming api of choice for writing correct concurrent and scalable programs hardware tm htm implementations are expected to be significantly faster than pure software tm stm however full hardware support for true closed and open nested transactions is unlikely to be practical this paper presents novel mechanism the split hardware transaction spht that uses minimal software support to combine multiple segments of an atomic block each executed using separate hardware transaction into one atomic operation the idea of segmenting transactions can be used for many purposes including nesting local retry orelse and user level thread scheduling in this paper we focus on how it allows linear closed and open nesting of transactions spht overcomes the limited expressive power of best effort htm while imposing overheads dramatically lower than stm and preserving useful guarantees such as strong atomicity provided by the underlying htm
good random number generator is essential for many graphics applications as more such applications move onto parallel processing it is vital that good parallel random number generator be used unfortunately most random number generators today are still sequential exposing performance bottlenecks and denying random accessibility for parallel computations furthermore popular parallel random number generators are still based off sequential methods and can exhibit statistical bias in this paper we propose random number generator that maps well onto parallel processor while possessing white noise distribution our generator is based on cryptographic hash functions whose statistical robustness has been examined under heavy scrutiny by cryptologists we implement our generator as gpu pixel program allowing us to compute random numbers in parallel just like ordinary texture fetches given texture coordinate per pixel instead of returning texel as in ordinary texture fetches our pixel program computes random noise value based on this given texture coordinate we demonstrate that our approach features the best quality speed and random accessibility for graphics applications
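the core idea can be sketched on the cpu in a few lines: hash the pixel's coordinate (plus a seed) with a cryptographic hash and map the digest to a uniform value, so every random number is independently addressable and trivially parallel; the paper's generator runs as a gpu pixel program and its particular hash and output mapping may differ

```python
import hashlib
import struct

def noise(x, y, seed=0):
    """Map a texture coordinate (x, y) to a uniform float in [0, 1) by hashing
    it; any pixel can be evaluated independently, in any order, in parallel."""
    digest = hashlib.sha256(struct.pack("<iii", x, y, seed)).digest()
    value = int.from_bytes(digest[:8], "little")
    return value / 2**64

# random access: pixel (123, 456) always yields the same value, with no
# sequential state shared between pixels
print(noise(123, 456), noise(124, 456), noise(123, 456, seed=1))
```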
what happens when you can access all the world’s media but the access is constrained by screen size bandwidth attention and battery life we present novel mobile context aware software prototype that enables access to images on the go our prototype utilizes the channel metaphor to give users contextual access to media of interest according to key dimensions spatial social and topical our experimental prototype attempts to be playful and simple to use yet provide powerful and comprehensive media access temporally driven sorting scheme for media items allows quick and easy access to items of interest in any dimension for ad hoc tasks we extend the application with keyword search to deliver the long tail of media and images elements of social interaction and communication around the photographs are built into the mobile application to increase user engagement the application utilizes flickrcom as an image and social network data source but could easily be extended to support other websites and media formats
timed interval calculus tic is highly expressive set based notation for specifying and reasoning about embedded real time systems however it lacks mechanical proving support as its verification usually involves infinite time intervals and continuous dynamics in this paper we develop system based on generic theorem prover prototype verification system pvs to assist formal verification of tic at high grade of automation tic semantics has been constructed by the pvs typed higher order logic based on the encoding we have checked all tic reasoning rules and discovered subtle flaws translator has been implemented in java to automatically transform tic models into pvs specifications collection of supplementary rules and pvs strategies has been defined to facilitate the rigorous reasoning of tic models with functional and non functional for example real time requirements at the interval level our approach is generic and can be applied further to support other real time notations
facilitating engaging user experiences is essential in the design of interactive systems to accomplish this it is necessary to understand the composition of this construct and how to evaluate it building on previous work that posited theory of engagement and identified core set of attributes that operationalized this construct we constructed and evaluated multidimensional scale to measure user engagement in this paper we describe the development of the scale as well as two large scale studies and that were undertaken to assess its reliability and validity in online shopping environments in the first we used reliability analysis and exploratory factor analysis to identify six attributes of engagement perceived usability aesthetics focused attention felt involvement novelty and endurability in the second we tested the validity of and relationships among those attributes using structural equation modeling the result of this research is multidimensional scale that may be used to test the engagement of software applications in addition findings indicate that attributes of engagement are highly intertwined complex interplay of user system interaction variables notably perceived usability played mediating role in the relationship between endurability and novelty aesthetics felt involvement and focused attention
this paper presents tunable content based music retrieval cbmr system suitable for retrieval of music audio clips audio clips are represented as extracted feature vectors the cbmr system is expert tunable by altering the feature space the feature space is tuned according to the expert specified similarity criteria expressed in terms of clusters of similar audio clips the tuning process utilizes our genetic algorithm that optimizes cluster compactness the tree index for efficient retrieval of audio clips is based on the clustering of feature vectors for each cluster minimal bounding rectangle mbr is formed thus providing objects for indexing inserting new nodes into the tree is efficiently conducted because of the chosen quadratic split algorithm our cbmr system implements the point query and the nearest neighbors query with the log time complexity the paper includes experimental results in measuring retrieval performance in terms of precision and recall significant improvement in retrieval performance over the untuned feature space is reported
this paper addresses the problem of data integration in p2p environment where each peer stores schema of its local data mappings between the schemas and some schema constraints the goal of the integration is to answer queries formulated against chosen peer the answer consists of data stored in the queried peer as well as data of its direct and indirect partners we focus on defining and using mappings schema constraints query propagation across the p2p system and query reformulation in such scenario the main focus is the exploitation of constraints for merging results from different peers to derive more complex information and utilizing constraint knowledge to query propagation and the merging strategy we show how the discussed method has been implemented in sixp2p system
we present comparative performance study of wide selection of optimization techniques to enhance application performance in the context of wide area wireless networks wwans unlike in traditional wired and wireless ip based networks applications running over wwan cellular environments are significantly affected by the vagaries of the cellular wireless medium prior research has proposed and analyzed optimizations at individual layers of the protocol stack in contrast we introduce the first detailed experiment based evaluation and comparison of all such optimization techniques in commercial wwan testbed this paper therefore summarizes our experience in implementing and deploying an infrastructure to improve wwan performance the goals of this paper are to perform an accurate benchmark of application performance over such commercially deployed wwan environments to implement and characterize the impact of various optimization techniques across different layers of the protocol stack and to quantify their interdependencies in realistic scenarios additionally we also discuss measurement pitfalls that we experienced and provide guidelines that may be useful for future experimentation in wwan environments
this contribution suggests novel approach for systematic and automatic generation of process models from example runs the language used for process models is place transition petri nets the language used for example runs is labelled partial orders the approach adopts techniques from petri net synthesis and from process mining in addition to formal treatment of the approach case study is presented and implementation issues are discussed
modeling the evolution of the state of program memory during program execution is critical to many parallelization techniques current memory analysis techniques either provide very accurate information but run prohibitively slowly or produce very conservative results an approach based on abstract interpretation is presented for analyzing programs at compile time which can accurately determine many important program properties such as aliasing logical data structures and shape these properties are known to be critical for transforming single threaded program into version that can be run on multiple execution units in parallel the analysis is shown to be of polynomial complexity in the size of the memory heap experimental results for benchmarks in the jolden suite are given these results show that in practice the analysis method is efficient and is capable of accurately determining shape information in programs that create and manipulate complex data structures
the manticore project is an effort to design and implement new functional language for parallel programming unlike many earlier parallel languages manticore is heterogeneous language that supports parallelism at multiple levels specifically we combine cml style explicit concurrency with nesl nepal style data parallelism in this paper we describe and motivate the design of the manticore language we also describe flexible runtime model that supports multiple scheduling disciplines eg for both fine grain and coarse grain parallelism in uniform framework work on prototype implementation is ongoing and we give status report
with the migration to deep sub micron process technologies the power consumption of circuit has come to the forefront of concerns and as result the power has become critical design parameter this paper presents novel high level synthesis methodology called power islands synthesis that eliminates the spurious switching activity and the leakage in great portion of the resulting circuit by partitioning it into islands each island is cluster of logic whose power can be controlled independently from the rest of the circuit and hence can be completely powered down when all of the logic it contains is idling the partitioning is done in such way that the components with maximally overlapping lifetimes are placed on the same island by powering down an island during its idle cycles the following occur the spurious switching that results from the broadcast to idle components is silenced and the power consumption due to leakage in inactive components is eliminated experiments conducted on several synthesis benchmarks implemented at the layout level with nm process technology and simulated using transistor level simulator showed power savings ranging from to due to our methodology the reported savings were entirely from the power down of combinational elements functional resources of the data path
we describe unified representation of occluders in light transport and photography using shield fields the attenuation function which acts on any light field incident on an occluder our key theoretical result is that shield fields can be used to decouple the effects of occluders and incident illumination we first describe the properties of shield fields in the frequency domain and briefly analyze the forward problem of efficiently computing cast shadows afterwards we apply the shield field signal processing framework to make several new observations regarding the inverse problem of reconstructing occluders from cast shadows extending previous work on shape from silhouette and visual hull methods from this analysis we develop the first single camera single shot approach to capture visual hulls without requiring moving or programmable illumination we analyze several competing camera designs ultimately leading to the development of new large format mask based light field camera that exploits optimal tiled broadband codes for light efficient shield field capture we conclude by presenting detailed experimental analysis of shield field capture and occluder reconstruction
the mining of frequent patterns is basic problem in data mining applications frequent maximal and closed itemsets mining has become an important alternative to association rule mining in this paper we present an effective algorithm which is based on the blanket approach for mining all frequent maximal closed itemsets the performance of the proposed algorithm has been compared with recently developed algorithms the results show how the proposed algorithm gives better performance this is achieved by examining the performance and functionality of the proposed technique
to achieve scalable and efficient broadcast authentication in large scale networks two asymmetry mechanisms have often been employed the cryptographic asymmetry and the time asymmetry authentication schemes using digital signatures are based on the cryptographic asymmetry while tesla and related protocols using hash chains and delayed key release methods are based on the time asymmetry however the former is computationally expensive while the latter provides delayed authentication only therefore they are vulnerable to denial of service attacks that repeatedly request signature verifications with false messages in this paper we propose novel broadcast authentication mechanism based on our information asymmetry model it leverages an asymmetric distribution of keys between sink and sensor nodes and uses the bloom filter as an authenticator which efficiently compresses multiple authentication information in addition with novel false negative tuning knob introduced in construction of the bloom filter we show that scalability of our authentication method can greatly be improved through an intensive analysis we demonstrate optimized trade off relationships between resiliency against compromised nodes and scalability of system size optimization results indicate that the proposed authentication scheme achieves low false positive rates with small signature verification costs
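a small python sketch of the bloom filter authenticator idea the sink compresses several per message authentication values into one filter and the false positive rate is tuned through the filter size and the number of hash functions which is roughly the knob described above the sha-256 based positions and the mac strings are made up for illustration:

import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit for simplicity

    def _positions(self, item):
        # derive k bit positions from k salted hashes of the item
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# the sink compresses several per-message authentication values into one filter
auth = BloomFilter()
for mac in ["mac-node1", "mac-node2", "mac-node3"]:
    auth.add(mac)
assert "mac-node2" in auth        # genuine value verifies
print("forged" in auth)           # false positive only with small probability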
method is discussed for synchronizing operations on objects when the operations are invoked by transactions the technique which is motivated by desire to make use of possible concurrency in accessing objects takes into consideration the granularity at which operations affect an object dynamic method is presented for determining the compatibility of an invoked operation with respect to operations in progress in making decisions it utilizes the state of the object the semantics of the uncommitted operations the actual parameters of the invoked operation and the effect of the operations on the objects one of the attractive features of this technique is that single framework can be used to deal with the problem of synchronizing access to simple objects as well as compound objects ie objects in which some components are themselves objects
packet forwarding prioritization pfp in routers is one of the mechanisms commonly available to network operators pfp can have significant impact on the accuracy of network measurements the performance of applications and the effectiveness of network troubleshooting procedures despite its potential impacts no information on pfp settings is readily available to end users in this paper we present an end to end approach for pfp inference and its associated tool popi this is the first attempt to infer router packet forwarding priority through end to end measurement popi enables users to discover such network policies through measurements of packet losses of different packet types we evaluated our approach via statistical analysis simulation and wide area experimentation in planetlab we employed popi to analyze paths among planetlab sites popi flagged paths with multiple priorities of which were further validated through hop by hop loss rates measurements in addition we surveyed all related network operators and received responses for about half of them all confirming our inferences besides we compared popi with the inference mechanisms through other metrics such as packet reordering called out of order ooo ooo is unable to find many priority paths such as those implemented via traffic policing on the other hand interestingly we found it can detect existence of the mechanisms which induce delay differences among packet types such as slow processing path in the router and port based load sharing
most game interfaces today are largely symbolic translating simplified input such as keystrokes into the choreography of full body character movement in this paper we describe system that directly uses human motion performance to provide radically different and much more expressive interface for controlling virtual characters our system takes data feed from motion capture system as input and in real time translates the performance into corresponding actions in virtual world the difficulty with such an approach arises from the need to manage the discrepancy between the real and virtual world leading to two important subproblems recognizing the user’s intention and simulating the appropriate action based on the intention and virtual context we solve this issue by first enabling the virtual world’s designer to specify possible activities in terms of prominent features of the world along with associated motion clips depicting interactions we then integrate the prerecorded motions with online performance and dynamic simulation to synthesize seamless interaction of the virtual character in simulated virtual world the result is flexible interface through which user can make freeform control choices while the resulting character motion maintains both physical realism and the user’s personal style
parallel computing performance on scalable shared memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and on the efficiency of the memory cache management systems cache coherence nonuniform memory access cc numa and cache only memory access coma are two effective memory systems and the hierarchical ring structure is an efficient interconnection network in hardware this paper focuses on comparative performance modeling and evaluation of cc numa and coma on hierarchical ring shared memory architecture analytical models for the two memory systems for comparative evaluation are presented intensive performance measurements on data migrations have been conducted on the ksr coma hierarchical ring shared memory machine experimental results support the analytical models and we present practical observations and comparisons of the two cache coherence memory systems our analytical and experimental results show that coma system balances the work load well however the overhead of frequent data movement may match the gains obtained from improving load balance we believe our performance results could be further generalized to the two memory systems on hierarchical network architecture although cc numa system may not automatically balance the load at the system level it provides an option for user to explicitly handle data locality for possible performance improvement
context and history visualization plays an important role in visual data mining especially in the visual exploration of large and complex data sets the preservation of context and history information in the visualization can improve user comprehension of the exploration process as well as enhance the reusability of mining techniques and parameters to achieve the desired results this chapter presents methodology and various interactive visualization techniques supporting visual data mining in general as well as for visual preservation of context and history information algorithms are also described in supporting such methodology for visual data mining in real time
in wireless sensor networks the data of neighbouring sensor nodes are often highly correlated because they have overlapping sensing areas to exploit the data correlation for reducing the amount of data while transmitting data from sensor nodes to their base station this paper introduces multiple input turbo mit code to implement jointly source coding channel coding and data aggregation for wireless systems in particular wireless sensor networks mit code has multiple input sequences and employs partial interleaving to reduce the memory size and access requirements this paper also addresses how mit code can be used to implement security together with data aggregation and source channel coding the simulation results show that the bit error rate ber performance of mit code is slightly better than turbo codes even if mit code implements partial interleaving and has multiple inputs
user based document management system has been developed for small communities on the web the system is based on the free annotation of documents by users number of annotation support tools are used to suggest possible annotations including suggesting terms from external ontologies this paper outlines some evaluation data on how users actually interact with the system in annotating their document especially on the use of standard ontologies results indicate that although an established external taxonomy can be useful in proposing annotation terms users appear to be very selective in their use of the terms proposed and to have little interest in adhering to the particular hierarchical structure provided
there is increasing use of combinations of modal logics in both foundational and applied research areas this article provides an introduction to both the principles of such combinations and to the variety of techniques that have been developed for them in addition the article outlines many key research problems yet to be tackled within this challenging area of work
this paper describes transformational method applied to the core component of role based access control rbac to derive efficient implementations from specification based on the ansi standard for rbac the method is based on the idea of incrementally maintaining the result of expensive set operations where new method is described and used for systematically deriving incrementalization rules we calculate precise complexities for three variants of efficient implementations as well as for straightforward implementation based on the specification we describe successful prototypes and experiments for the efficient implementations and for automatically generating efficient implementations from straightforward implementations
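a rough python sketch of the incremental maintenance idea behind the derived implementations the operation names are illustrative not the ansi rbac api per user permission counts are updated in place on every assignment change so that an access check becomes a constant time lookup instead of recomputing the user role and role permission join:

from collections import defaultdict

class CoreRBAC:
    # incrementally maintains the result of the user-role / role-permission join
    # so that check_access never recomputes it from scratch (names are illustrative)
    def __init__(self):
        self.user_roles = defaultdict(set)
        self.role_perms = defaultdict(set)
        self.user_perms = defaultdict(lambda: defaultdict(int))  # perm -> count of granting roles

    def assign_user(self, user, role):
        if role in self.user_roles[user]:
            return
        self.user_roles[user].add(role)
        for p in self.role_perms[role]:
            self.user_perms[user][p] += 1

    def grant_permission(self, role, perm):
        if perm in self.role_perms[role]:
            return
        self.role_perms[role].add(perm)
        for user, roles in self.user_roles.items():
            if role in roles:
                self.user_perms[user][perm] += 1

    def revoke_user(self, user, role):
        if role not in self.user_roles[user]:
            return
        self.user_roles[user].discard(role)
        for p in self.role_perms[role]:
            self.user_perms[user][p] -= 1
            if self.user_perms[user][p] == 0:
                del self.user_perms[user][p]

    def check_access(self, user, perm):
        # constant-time lookup thanks to the incrementally maintained map
        return perm in self.user_perms[user]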
we address the problem of estimating the shape and appearance of scene made of smooth lambertian surfaces with piecewise smooth albedo we allow the scene to have self occlusions and multiple connected components this class of surfaces is often used as an approximation of scenes populated by man made objects we assume we are given number of images taken from different vantage points mathematically this problem can be posed as an extension of mumford and shah’s approach to static image segmentation to the segmentation of function defined on deforming surface we propose an iterative procedure to minimize global cost functional that combines geometric priors on both the shape of the scene and the boundary between smooth albedo regions we carry out the numerical implementation in the level set framework
interactive steering can be valuable tool for understanding and controlling distributed computation in real time with interactive steering the user may change the state of computation by adjusting application parameters on the fly in our system we model both the program’s execution and steering actions in terms of transactions we define steering transaction as consistent if its vector time is not concurrent with the vector time of any program transaction that is consistent steering transactions occur between program transactions at point that represents consistent cut in this paper we present an algorithm for verifying the consistency of steering transactions the algorithm analyzes record of the program transactions and compares it against the steering transaction if the time at which the steering transaction was applied is inconsistent the algorithm generates vector representing the earliest consistent time at which the steering transaction could have been applied
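a minimal python sketch of the consistency test described above assuming vector times are plain integer tuples a steering transaction is accepted only if its vector time is not concurrent with the vector time of any recorded program transaction:

def concurrent(vt_a, vt_b):
    # two vector times are concurrent when neither dominates the other
    a_le_b = all(x <= y for x, y in zip(vt_a, vt_b))
    b_le_a = all(y <= x for x, y in zip(vt_a, vt_b))
    return not a_le_b and not b_le_a

def steering_consistent(steer_vt, program_vts):
    # a steering transaction is consistent if its vector time is not
    # concurrent with the vector time of any program transaction
    return all(not concurrent(steer_vt, p) for p in program_vts)

program_vts = [(1, 0, 0), (1, 1, 0), (2, 1, 1)]
print(steering_consistent((2, 1, 0), program_vts))   # lies on a consistent cut
print(steering_consistent((0, 1, 0), program_vts))   # concurrent with (1, 0, 0)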
in recent years we have seen an enormous growth in the size and prevalence of data processing workloads fayyad gray the picture that is becoming increasingly common is depicted in figure in it organizations or resourceful individuals provide services via set of loosely coupled workstation nodes the service is usually some form of data mining like searching filtering or image recognition clients which could be machines running web browsers not only initiate requests but also partake in the processing with the goal of reducing the request turnaround that is when the servers are overloaded clients with spare cycles take some of the computational burden naturally many aspects of such system cannot be determined at design time eg exactly how much work client should do depends on the computational resources available at the client and server cluster the network bandwidth unused between them and the workload demand this position paper is interested in this and other aspects that must be divined at run time to provide high performance and availability in data parallel systems what makes system tuning especially hard is that it’s not possible to find the right knob settings once and for all system upgrade or component failure may change the appropriate degree of data parallelism changes in usable bandwidth may ask for different partitioning of code among the client and server cluster moreover an application may go through distinct phases during its execution we should checkpoint the application for fault tolerance less often during those phases in which checkpointing takes longer finally the system needs to effectively allocate resources to concurrent applications which can start at any time and which benefit differently from having these resources in summary we argue that in the future significant fraction of computing will happen on architectures like figure and that due to the architectures inherent complexity high availability and fast turnaround can only be realized by dynamically tuning number of system parameters our position is that this tuning should be provided automatically by the system the contrasting application specific view contends that to the extent possible policies should be made by applications since they can make more informed optimizations however this requires great deal of sophistication from the programmer further it requires programmer time one of the most scarce resources in systems building today toward our goal we contribute framework that is sufficiently rich to express variety of interesting data parallel applications but which is also restricted enough so that the system can tune itself these applications are built atop the abacus migration system whose object placement algorithms are extended to reason about how many nodes should participate in data parallel computation how to split up application objects among client and server cluster how often program state should be checkpointed and the interaction sometimes conflicting between these questions by automatically determining number of critical parameters at runtime we are minimizing the management costs which have in recent years given system administrators the howling fantods satyanarayanan
we develop techniques for discovering patterns with periodicity in this work patterns with periodicity are those that occur at regular time intervals and therefore there are two aspects to the problem finding the pattern and determining the periodicity the difficulty of the task lies in the problem of discovering these regular time intervals ie the periodicity periodicities in the database are usually not very precise and have disturbances and might occur at time intervals in multiple time granularities to overcome these difficulties and to be able to discover the patterns with fuzzy periodicity we propose the fuzzy periodic calendar which defines fuzzy periodicities furthermore we develop algorithms for mining fuzzy periodicities and the fuzzy periodic association rules within them experimental results have shown that our method is effective in discovering fuzzy periodic association rules
in multiprocessor system on chips mpsocs that use snoop based cache coherency protocols miss in the data cache triggers the broadcast of coherency request to all the remote caches to keep all data coherent however the majority of these requests are unnecessary because remote caches do not have the matching blocks and so their tag lookups fail both the coherency requests and the tag lookups corresponding to remote miss consume unnecessary energy we propose an architecture level technique for snoop energy reduction called broadcast filtering which prevents unnecessary coherency requests from being broadcast to remote caches and thus reduces snoop energy consumption by both the cache and bus broadcast filtering is implemented using snooping cache and split bus the snooping cache checks if block that cannot be obtained locally exists in remote caches before broadcasting coherency request if no remote cache has the matching block there is no broadcast and if broadcasting is necessary the split bus allows coherency requests to be broadcast selectively to the remote caches which have matching blocks experimental results show reduction by of cache lookups by of bus usage and by of snoop energy consumption at small cost in reduced performance an analysis result based on the energy model shows the broadcast filtering technique can reduce by up to of energy consumption per cache coherency operation
modern micro architectures employ superscalar techniques to enhance system performance the superscalar microprocessors must fetch at least one instruction cache line at time to support high issue rate and large amount speculative executions in this paper we propose the grouped branch prediction gbp that can recognize and predict multiple branches in the same instruction cache line for wide issue micro architecture several configurations of the gbp with different group sizes are simulated the simulation results show that the branch penalty of the group size with entry is under clock cycle in our design we choose the two group scheme with group size this feature achieves an average of ipcf the number of instructions fetched per cycle for machine front end furthermore we extend the gbp to achieve two cache lines predictions with two fetch units the scheme of the entry group with groupsize can produce an average of ipcf the performance is approximately better than the original group gbp's the added hardware cost bits is less than
wireless ad hoc networks rely on multi hop routes to transport data from source to destination the routing function is implemented in collaborative manner with each node responsible for relaying traffic to the destination however an increasingly sophisticated pool of users with easy access to commercial wireless devices combined with the poor physical and software security of the devices can lead to node misconfiguration or misbehavior misbehaving node may refuse to forward packets in order to conserve its energy selfishness or simply degrade network performance maliciousness in this paper we investigate the problem of uniquely identifying the set of misbehaving nodes who refuse to forward packets we propose novel misbehavior identification scheme called react that provides resource efficient accountability for node misbehavior react identifies misbehaving nodes based on series of random audits triggered upon performance drop we show that source destination pair using react can identify any number of independently misbehaving nodes based on behavioral proofs provided by nodes proofs are constructed using bloom filters which are storage efficient membership structures thus significantly reducing the communication overhead for misbehavior detection
this paper proposes novel facial expression recognizer and describes its application to group meeting analysis our goal is to automatically discover the interpersonal emotions that evolve over time in meetings eg how each person feels about the others or who affectively influences the others the most as the emotion cue we focus on facial expression more specifically smile and aim to recognize who is smiling at whom when and how often since frequently smiling carries affective messages that are strongly directed to the person being looked at this point of view is our novelty to detect such communicative smiles we propose new algorithm that jointly estimates facial pose and expression in the framework of the particle filter the main feature is its automatic selection of interest points that can robustly capture small changes in expression even in the presence of large head rotations based on the recognized facial expressions and their directions to others which are indicated by the estimated head poses we visualize interpersonal smile events as graph structure we call it the interpersonal emotional network it is intended to indicate the emotional relationships among meeting participants four person meeting captured by an omnidirectional video system is used to confirm the effectiveness of the proposed method and the potential of our approach for deep understanding of human relationships developed through communications
the performance of dynamic optimization system depends heavily on the code it selects to optimize many current systems follow the design of hp dynamo and select single interprocedural path or trace as the unit of code optimization and code caching though this approach to region selection has worked well in practice we show that it is possible to adapt this basic approach to produce regions with greater locality less needless code duplication and fewer profiling counters in particular we propose two new region selection algorithms and evaluate them against dynamo selection mechanism next executing tail net our first algorithm last executed iteration lei identifies cyclic paths of execution better than net improving locality of execution while reducing the size of the code cache our second algorithm allows overlapping traces of similar execution frequency to be combined into single large region this second technique can be applied to both net and lei and we find that it significantly improves metrics of locality and memory overhead for each
in this paper we examine temporal based program interaction in order to improve layout by reducing the probability that program units will conflict in an instruction cache in that context we present two profile guided procedure reordering algorithms both techniques use cache line coloring to arrive at final program layout and target the elimination of first generation cache conflicts ie conflicts between caller callee pairs the first algorithm builds call graph that records local temporal interaction between procedures we will describe how the call graph is used to guide the placement step and present methods that accelerate cache line coloring by exploring aggressive graph pruning techniques in the second approach we capture global temporal program interaction by constructing conflict miss graph cmg the cmg estimates the worst case number of misses two competing procedures can inflict upon one another to reduce higher generation cache conflicts we use pruned cmg graph to guide cache line coloring using several and benchmarks we show the benefits of letting both types of graphs guide procedure reordering to improve instruction cache hit rates to contrast the differences between these two forms of temporal interaction we also develop new characterization streams based on the inter reference gap irg model
sensemaking tasks require that users gather and comprehend information from many sources to answer complex questions such tasks are common and include for example researching vacation destinations or performing market analysis in this paper we present an algorithm and interface which provides context based page unit recommendation to assist in connection discovery during sensemaking tasks we exploit the natural note taking activity common to sensemaking behavior as the basis for task specific context model our algorithm then dynamically analyzes each web page visited by user to determine which page units are most relevant to the user’s task we present the details of our recommendation algorithm describe the user interface and present the results of user study which show the effectiveness of our approach
peer to peer approaches to anonymous communication promise to eliminate the scalability concerns and central vulnerability points of current networks such as tor however the p2p setting introduces many new opportunities for attack and previous designs do not provide an adequate level of anonymity we propose shadowwalker new low latency p2p anonymous communication system based on random walk over redundant structured topology we base our design on shadows that redundantly check and certify neighbor information these certifications enable nodes to perform random walks over the structured topology while avoiding route capture and other attacks we analytically calculate the anonymity provided by shadowwalker and show that it performs well for moderate levels of attackers and is much better than the state of the art we also design an extension that improves forwarding performance at slight anonymity cost while at the same time protecting against selective dos attacks we show that our system has manageable overhead and can handle moderate churn making it an attractive new design for p2p anonymous communication
spatial data is distinguished from conventional data by having extent therefore spatial queries involve both the objects and the space that they occupy the handling of queries that involve spatial data is facilitated by building an index on the data the traditional role of the index is to sort the data which means that it orders the data however since generally no ordering exists in dimensions greater than without transformation of the data to one dimension the role of the sort process is one of differentiating between the data and what is usually done is to sort the spatial objects with respect to the space that they occupy the resulting ordering is usually implicit rather than explicit so that the data need not be resorted ie the index need not be rebuilt when the queries change eg the query reference objects the index is said to order the space and the characteristics of such indexes are explored further
we consider the problem of evaluating multiple overlapping queries defined on data streams where each query is conjunction of multiple filters and each filter may be shared across multiple queries efficient support for overlapping queries is critical issue in the emerging data stream systems and this is particularly the case when filters are expensive in terms of their computational complexity and processing time this problem generalizes other well known problems such as pipelined filter ordering and set cover and is not only np hard but also hard to approximate within factor of log from the optimum where is the number of queries in this paper we present two near optimal approximation algorithms with provably good performance guarantees for the evaluation of overlapping queries we present an edge coverage based greedy algorithm which achieves an approximation ratio of log log where is the number of queries and is the average number of filters in query we also present randomized fast and easily parallelizable harmonic algorithm which achieves an approximation ratio of where is the maximum number of filters in query we have implemented these algorithms in prototype system and evaluated their performance using extensive experiments in the context of multimedia stream analysis the results show that our greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase
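a much simplified greedy sketch in python loosely in the spirit of the edge coverage idea and not the paper's exact algorithm each conjunctive query is a set of filter ids the next filter to run is the unevaluated one that touches the most undecided queries per unit cost and every filter result is shared across all queries that contain it:

def evaluate_overlapping(queries, filter_cost, eval_filter):
    # queries: dict query_id -> set of filter ids (conjunction of filters)
    # greedy: repeatedly run the unevaluated filter that covers the most
    # still-undecided query/filter edges per unit cost, sharing each result
    results = {}                      # filter id -> bool (shared across queries)
    alive = {q: set(fs) for q, fs in queries.items()}
    while any(alive.values()):
        pending = {f for fs in alive.values() for f in fs if f not in results}
        if not pending:
            break
        best = max(pending, key=lambda f: sum(f in fs for fs in alive.values()) / filter_cost[f])
        results[best] = eval_filter(best)
        for q, fs in alive.items():
            if best in fs:
                if not results[best]:
                    fs.clear()        # one false filter decides the conjunction
                else:
                    fs.discard(best)
    return {q: all(results.get(f, False) for f in queries[q]) for q in queries}

queries = {"q1": {"f1", "f2"}, "q2": {"f2", "f3"}}
costs = {"f1": 1.0, "f2": 2.0, "f3": 1.0}
print(evaluate_overlapping(queries, costs, eval_filter=lambda f: f != "f3"))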
raid storage arrays often possess gigabytes of ram for caching disk blocks currently most raid systems use lru or lru like policies to manage these caches since these array caches do not recognize the presence of file system buffer caches they redundantly retain many of the same blocks as those cached by the file system thereby wasting precious cache space in this paper we introduce ray an exclusive raid array caching mechanism ray achieves high degree of but not perfect exclusivity through gray box methods by observing which files have been accessed through updates to file system meta data ray constructs an approximate image of the contents of the file system cache and uses that information to determine the exclusive set of blocks that should be cached by the array we use microbenchmarks to demonstrate that ray’s prediction of the file system buffer cache contents is highly accurate and trace based simulation to show that ray considerably outperforms lru and performs as well as other more invasive approaches the main strength of the ray approach is that it is easy to deploy all performance gains are achieved without changes to the scsi protocol or the file system above
supporting ranking queries in database systems has been popular research topic recently however there is lack of study on supporting ranking queries in data warehouses where ranking is on multidimensional aggregates instead of on measures of base facts to address this problem we propose query execution model to answer different types of ranking aggregate queries based on unified partial cube structure arcube the query execution model follows candidate generation and verification framework where the most promising candidate cells are generated using set of high level guiding cells we also identify bounding principle for effective pruning once guiding cell is pruned all of its children candidate cells can be pruned we further address the problem of efficient online candidate aggregation and verification by developing chunk based execution model to verify bulk of candidates within bounded memory buffer our extensive performance study shows that the new framework not only leads to an order of magnitude performance improvements over the state of the art method but also is much more flexible in terms of the types of ranking aggregate queries supported
conventional approaches to image retrieval are based on the assumption that relevant images are physically near the query image in some feature space this is the basis of the cluster hypothesis however semantically related images are often scattered across several visual clusters although traditional content based image retrieval cbir technologies may utilize the information contained in multiple queries gotten in one step or through feedback process this is only reformulation of the original query as result these strategies only get the images in some neighborhood of the original query as the retrieval result this severely restricts the system performance relevance feedback techniques are generally used to mitigate this problem in this paper we present novel approach to relevance feedback which can return semantically related images in different visual clusters by merging the result sets of multiple queries further research topics such as achieving candidate queries visual diversity are also discussed we also provide experimental results to demonstrate the effectiveness of our approach
the publish subscribe paradigm represents large class of applications in sensor networks as sensors are designed mainly to detect and notify upon events of interests thus it is important to design publish subscribe mechanism to enable such applications many existing solutions require that the sensor node locations be known which is not possible for many sensor networks among those that do not require so the common approach currently is to use an expensive gossip procedure to disseminate subscription queries and published events we propose solution that does not require location information yet aimed at better efficiency than the gossip based approach our theoretical findings are complemented by simulation based evaluation
this paper introduces new multipass algorithm for efficiently computing direct illumination in scenes with many lights and complex occlusion images are first divided into times pixel blocks and for each point to be shaded within block probability density function pdf is constructed over the lights and sampled to estimate illumination using small number of shadow rays information from these samples is then aggregated at both the pixel and block level and used to optimize the pdfs for the next pass over multiple passes the pdfs and pixel estimates are updated until convergence using aggregation and feedback progressively improves the sampling and automatically exploits both visibility and spatial coherence we also use novel extensions for efficient antialiasing our adaptive multipass approach computes accurate direct illumination eight times faster than prior approaches in tests on several complex scenes
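a heavily simplified python sketch of the sample then feedback loop described above it assumes a caller supplied light_contribution function that returns zero when the shadow ray is blocked and it omits the paper's block level aggregation antialiasing and convergence tests:

import random

def sample_light(weights):
    # draw a light index proportionally to the current per-block pdf
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def shade_point(weights, light_contribution, n_samples=8):
    # estimate direct illumination with a handful of shadow rays and
    # return per-light feedback used to sharpen the pdf for the next pass
    estimate, feedback = 0.0, [0.0] * len(weights)
    total = sum(weights)
    for _ in range(n_samples):
        i = sample_light(weights)
        pdf = weights[i] / total
        contrib = light_contribution(i)       # 0 if the shadow ray is blocked
        estimate += contrib / pdf / n_samples
        feedback[i] += contrib
    return estimate, feedback

def update_pdf(weights, feedback, alpha=0.5, floor=1e-3):
    # after a pass, blend the old pdf with the aggregated feedback,
    # keeping a small floor so no light is ever starved of samples
    return [max(floor, (1 - alpha) * w + alpha * f) for w, f in zip(weights, feedback)]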
in distributed system high level actions can be modeled by nonatomic events this paper proposes causality relations between distributed nonatomic events and provides efficient testing conditions for the relations the relations provide fine grained granularity to specify causality relations between distributed nonatomic events the set of relations between nonatomic events is complete in first order predicate logic using only the causality relation between atomic events for pair of distributed nonatomic events x and y the evaluation of any of the causality relations requires nx times ny integer comparisons where nx and ny respectively are the number of nodes on which the two nonatomic events x and y occur in this paper we show that this polynomial complexity of evaluation can be simplified to linear complexity using properties of partial orders specifically we show that most relations can be evaluated in min(nx, ny) integer comparisons some in nx integer comparisons and the others in ny integer comparisons during the derivation of the efficient testing conditions we also define special system execution prefixes associated with distributed nonatomic events and examine their knowledge theoretic significance
social network based systems usually suffer from two major limitations they tend to rely on single data source eg email traffic and the form of network patterns is often privileged over their content to go beyond these limitations we describe system we developed to visualize and navigate hybrid networks constructed from multiple data sources with direct link between formal representations and the raw content we illustrate the benefits of our approach by analyzing patterns of collaboration in large open source project using hybrid networks to uncover important roles that would otherwise have been missed
although xlink provides metadata association it lacks computer interpretability to support knowledge representation for intelligent applications this study proposes an owl based language called owl to make web resources links computer interpretable two aspects of owl link profile and link model are described the link profile provides the information required for an agent to discover link while the link model provides information that enables an agent to exploit link finally this study describes the feasibility of using role arcrole properties of links to represent owl based ontologies which can thus seamlessly interoperate and integrate with owl to enhance knowledge representation
building complex component based software architectures can lead to subtle assemblage errors in this paper we introduce type system based approach to avoid message handling errors when assembling component based communication systems such errors are not captured by classical type systems of host programming languages such as java or ml our approach relies on the definition of small process calculus that captures the operational essence of our target component based framework for communication systems and on the definition of novel type system that combines row types with process types
ideally one would like to perform image search using an intuitive and friendly approach many existing image search engines however present users with sets of images arranged in some default order on the screen typically the relevance to query only while this certainly has its advantages arguably more flexible and intuitive way would be to sort images into arbitrary structures such as grids hierarchies or spheres so that images that are visually or semantically alike are placed together this paper focuses on designing such navigation system for image browsers this is challenging task because arbitrary layout structure makes it difficult if not impossible to compute cross similarities between images and structure coordinates the main ingredient of traditional layout approaches for this reason we resort to recently developed machine learning technique kernelized sorting it is general technique for matching pairs of objects from different domains without requiring cross domain similarity measures and hence elegantly allows sorting images into arbitrary structures moreover we extend it so that some images can be preselected for instance forming the tip of the hierarchy allowing to subsequently navigate through the search results in the lower levels in an intuitive way
this paper introduces an approach to apply data flow testing techniques to abstract state machines specifications since traditional data flow coverage criteria are strictly based on the mapping between program and its flow graph they cannot be directly applied to asms in this context we are interested in tracing the flow of data between states in asm runs as opposed to between nodes in program’s flow graph therefore we revise the classical concepts in data flow analysis and define them on two levels the syntactic rule level and the computational run level we also specify family of ad hoc data flow coverage criteria and introduce model checking based approach to generate automatically test cases satisfying given set of coverage criteria from asm models
this paper describes domain specific debugger for one way constraint solvers the debugger makes use of several new techniques first the debugger displays only portion of the dataflow graph called constraint slice that is directly related to an incorrect variable this technique helps the debugger scale to system containing thousands of constraints second the debugger presents visual representation of the solver’s data structures and uses color encodings to highlight changes to the data structures finally the debugger allows the user to point to variable that has an unexpected value and ask the debugger to suggest reasons for the unexpected value the debugger makes use of information gathered during the constraint satisfaction process to generate plausible suggestions informal testing has shown that the explanatory capability and the color coding of the constraint solver’s data structures are particularly useful in locating bugs in constraint code
ethnographic studies of the home revealed the fundamental roles that physical locations and context play in how household members understand and manage conventional information yet we also know that digital information is becoming increasingly important to households the problem is that this digital information is almost always tied to traditional computer displays which inhibits its incorporation into household routines our solution location dependent information appliances exploit both home location and context as articulated in ethnographic studies to enhance the role of ambient displays in the home setting these displays provide home occupants with both background awareness of an information source and foreground methods to gain further details if desired the novel aspect is that home occupants assign particular information to locations within home in way that makes sense to them as device is moved to particular home location information is automatically mapped to that device along with hints on how it should be displayed
inter operability in heterogeneous distributed systems is often provided with the help of corba compliant middleware many distributed object computing systems however are characterised by limited heterogeneity such systems often contain subset of components that are written in the same programming language and run on top of the same platform techniques that exploit such limited heterogeneity in systems for achieving high system performance are presented here while components implemented using diverse programming languages and or platform use corba compliant middleware the similar components can use flyover that employs separate path between the client and its server and avoid number of corba overheads prototype of tool that is used for installing such flyovers in corba based applications is implemented and is described the performance of flyover based systems is compared with those of pure corba based systems that use commercial middleware products under various workload and system parameters significantly large performance gain is achieved with the flyover for range of workload parameters insights into system behaviour and performance developed from results of experiments with synthetic workload running on network of pcs are presented
the design of distributed databases involves making decisions on the fragmentation and placement of data and programs across the sites of computer network the first phase of the distribution design in top down approach is the fragmentation phase which clusters in fragments the information accessed simultaneously by applications most distribution design algorithms propose horizontal or vertical class fragmentation however the user has no assistance in the choice between these techniques in this work we present detailed methodology for the design of distributed object databases that includes an analysis phase to indicate the most adequate fragmentation technique to be applied in each class of the database schema semi ii horizontal class fragmentation algorithm and iii vertical class fragmentation algorithm basically the analysis phase is responsible for driving the choice between the horizontal and the vertical partitioning techniques or even the combination of both in order to assist distribution designers in the fragmentation phase of object databases experiments using our methodology have resulted in fragmentation schemas offering high degree of parallelism together with an important reduction of irrelevant data
in this paper we mainly focus on solving scheduling problems with model checking where finite number of entities needs to be processed as efficiently as possible for instance by machine to solve these problems we model them in untimed process algebra where time is modelled using special tick action we propose set of distributed state space explorations to find schedules for the modelled problems building on the traditional notion of beam search the basic approach is called distributed detailed beam search which prunes parts of the state space while searching using an evaluation function in order to find near optimal schedules in very large state spaces variations on this approach are presented such as distributed flexible distributed synchronised and distributed priority beam search which can also practically be used in combinations
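a generic python sketch of a plain non distributed beam search skeleton applied to a toy single machine schedule beam_width controls how aggressively the state space is pruned and the evaluate argument plays the role of the evaluation function the distributed detailed flexible and synchronised variants refine this basic loop:

import heapq

def beam_search(initial_state, expand, evaluate, is_goal, beam_width):
    # keep only the beam_width most promising states at every depth and
    # prune the rest of the state space (detailed beam search keeps the
    # best successors overall rather than per parent)
    frontier = [initial_state]
    while frontier:
        goals = [s for s in frontier if is_goal(s)]
        if goals:
            return min(goals, key=evaluate)
        successors = [t for s in frontier for t in expand(s)]
        frontier = heapq.nsmallest(beam_width, successors, key=evaluate)
    return None

jobs = {"a": 3, "b": 2, "c": 4}          # job -> duration on a single machine
start = (0, frozenset(jobs))
expand = lambda s: [(s[0] + jobs[j], s[1] - {j}) for j in s[1]]
evaluate = lambda s: s[0] + sum(jobs[j] for j in s[1])   # makespan lower bound
best = beam_search(start, expand, evaluate, lambda s: not s[1], beam_width=2)
print(best)   # (9, frozenset()) once every job has been processed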
ip lookup is in the critical data path in high speed router in this paper we propose new on chip ip cache architecture for high performance ip lookup we design the ip cache along two important axes cache indexing and cache replacement policies first we study various hash performance and employ universal hashing for our ip cache second coupled with our cache indexing scheme we present progressive cache replacement policy by considering internet traffic characteristics our experiments with ip traces show that our ip cache reduces the miss ratio by and small kb ip cache can achieve as high as tbps routing throughput
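an illustrative python sketch of a set associative ip cache indexed with a randomly drawn universal hash of the destination prefix the replacement policy here is plain lru used only as a stand in for the progressive policy the abstract mentions:

import random

class IPCache:
    # set-associative lookup table indexed with a universal hash of the key;
    # on a miss to a full set the least recently used way is evicted
    def __init__(self, n_sets=256, ways=4):
        self.n_sets = n_sets
        self.ways = ways
        self.sets = [dict() for _ in range(n_sets)]      # key -> next hop
        self.last_use = [dict() for _ in range(n_sets)]  # key -> timestamp
        self.stamp = 0
        # universal hashing over integer keys: h(x) = ((a*x + b) mod p) mod n_sets
        self.p = (1 << 61) - 1
        self.a = random.randrange(1, self.p)
        self.b = random.randrange(self.p)

    def _index(self, key):
        return ((self.a * key + self.b) % self.p) % self.n_sets

    def lookup(self, key):
        s = self._index(key)
        if key in self.sets[s]:
            self.stamp += 1
            self.last_use[s][key] = self.stamp
            return self.sets[s][key]     # hit: next hop served from the cache
        return None                      # miss: fall back to the full ip lookup

    def insert(self, key, next_hop):
        s = self._index(key)
        if key not in self.sets[s] and len(self.sets[s]) >= self.ways:
            victim = min(self.last_use[s], key=self.last_use[s].get)
            del self.sets[s][victim]
            del self.last_use[s][victim]
        self.stamp += 1
        self.sets[s][key] = next_hop
        self.last_use[s][key] = self.stamp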
in this paper we investigate the use of limited infrastructure in the form of wires for improving the energy efficiency of wireless sensor network we call such sensor network wireless sensor network with limited infrastructural support hybrid sensor network the wires act as short cuts to bring down the average hop count of the network resulting in reduced energy dissipation per node our results indicate that adding few wires to wireless sensor network can not only reduce the average energy expenditure per sensor node but also the nonuniformity in the energy expenditure across the sensor nodes
online mining of path traversal patterns from web click streams is one of the most important problems of web usage mining in this paper we propose sliding window based web data mining algorithm called top sw top k path traversal patterns of stream sliding window to discover the set of top path traversal patterns from streaming maximal forward references where is the desired number of path traversal patterns to be mined new summary data structure called top list list of top k path traversal patterns is developed to maintain the essential information about the top path traversal patterns from the current maximal forward references stream experimental studies show that the proposed top sw algorithm is an efficient single pass algorithm for mining the set of top path traversal patterns from continuous stream of maximal forward references
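a much simplified python sketch of sliding window top k mining counts of maximal forward references seen in the last window transactions are kept exactly here whereas the paper maintains a compact top list summary structure:

from collections import Counter, deque

class TopKSlidingWindow:
    # maintains counts of maximal forward references seen in the last
    # `window` transactions and reports the k most frequent ones
    def __init__(self, k, window):
        self.k = k
        self.window = window
        self.buffer = deque()
        self.counts = Counter()

    def add(self, forward_reference):
        # a maximal forward reference is a path such as ('A', 'B', 'C')
        self.buffer.append(forward_reference)
        self.counts[forward_reference] += 1
        if len(self.buffer) > self.window:
            old = self.buffer.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top_k(self):
        return self.counts.most_common(self.k)

stream = [("A", "B"), ("A", "B", "C"), ("A", "B"), ("D",), ("A", "B", "C")]
miner = TopKSlidingWindow(k=2, window=4)
for ref in stream:
    miner.add(ref)
print(miner.top_k())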
energy efficient query dissemination plays an important role for the lifetime of sensor networks in this work we consider probabilistic flooding for query dissemination and develop an analytical framework which enables the base station to predict the energy consumed and the nodes reached according to the rebroadcast probability furthermore we devise topology discovery protocol that collects the structural information required for the framework our analysis shows that the energy savings exceed the energy spent to obtain the required information after small number of query disseminations in realistic settings we verified our results both with simulations and experiments using the sun spot nodes
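the paper derives an analytical prediction a small monte carlo sketch in python can illustrate the same trade off between rebroadcast probability nodes reached and number of broadcasts as a rough proxy for energy on a toy topology:

import random

def probabilistic_flood(adjacency, source, p_rebroadcast):
    # simulate one query dissemination: the source always broadcasts,
    # every other node that first hears the query rebroadcasts with probability p
    reached = {source}
    broadcasts = 0
    frontier = [source]
    while frontier:
        nxt = []
        for node in frontier:
            if node == source or random.random() < p_rebroadcast:
                broadcasts += 1
                for nb in adjacency[node]:
                    if nb not in reached:
                        reached.add(nb)
                        nxt.append(nb)
        frontier = nxt
    return len(reached), broadcasts

# tiny example topology: 0 - 1 - 2 - 3 with a side link 1 - 3
adjacency = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
trials = [probabilistic_flood(adjacency, 0, 0.7) for _ in range(1000)]
print(sum(r for r, _ in trials) / len(trials), sum(b for _, b in trials) / len(trials))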
in measuring the overall security of network crucial issue is to correctly compose the measure of individual components incorrect compositions may lead to misleading results for example network with less vulnerabilities or more diversified configuration is not necessarily more secure to obtain correct compositions of individual measures we need to first understand the interplay between network components for example how vulnerabilities can be combined by attackers in advancing an intrusion such an understanding becomes possible with recent advances in modeling network security using attack graphs based on our experiences with attack graph analysis we propose an integrated framework for measuring various aspects of network security we first outline our principles and methodologies we then describe concrete examples to build intuitions finally we present our formal framework it is our belief that metrics developed based on the proposed framework will lead to novel quantitative approaches to vulnerability analysis network hardening and attack response
currently the internet has only one level of name resolution dns which converts user level domain names into ip addresses in this paper we borrow liberally from the literature to argue that there should be three levels of name resolution from user level descriptors to service identifiers from service identifiers to endpoint identifiers and from endpoint identifiers to ip addresses these additional levels of naming and resolution allow services and data to be first class internet objects in that they can be directly and persistently named seamlessly accommodate mobility and multi homing and integrate middleboxes such as nats and firewalls into the internet architecture we further argue that flat names are natural choice for the service and endpoint identifiers hence this architecture requires scalable resolution of flat names capability that distributed hash tables dhts can provide
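a toy python sketch of the three resolution levels argued for above every identifier and table entry is made up and each dictionary stands in for a lookup service such as a dht:

# three hypothetical resolution tables, each of which could be backed by a dht
user_to_sid = {"alice's photo archive": "sid:9f2a"}
sid_to_eid = {"sid:9f2a": ["eid:host-77", "eid:host-12"]}   # a service may be replicated
eid_to_ip = {"eid:host-77": "192.0.2.7", "eid:host-12": "192.0.2.12"}

def resolve(user_descriptor):
    sid = user_to_sid[user_descriptor]          # user-level descriptor -> service id
    eids = sid_to_eid[sid]                      # service id -> endpoint ids (mobility, multihoming)
    return [eid_to_ip[e] for e in eids]         # endpoint id -> current ip address

print(resolve("alice's photo archive"))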
we introduce simple technique that enables robust approximation of volumetric large deformation dynamics for real time or large scale offline simulations we propose lattice shape matching an extension of deformable shape matching to regular lattices with embedded geometry lattice vertices are smoothed by convolution of rigid shape matching operators on local lattice regions with the effective mechanical stiffness specified by the amount of smoothing via region width since the naive method can be very slow for stiff models per vertex costs scale cubically with region width we provide fast summation algorithm fast lattice shape matching fastlsm that exploits the inherent summation redundancy of shape matching and can provide large region matching at constant per vertex cost with this approach large lattices can be simulated in linear time we present several examples and benchmarks of an efficient cpu implementation including many dozens of soft bodies simulated at real time rates on typical desktop machine
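a one dimensional python sketch of the summation redundancy that fastlsm exploits window sums of any width are obtained from a single prefix sum at constant cost per element the real method applies this idea per axis of a lattice and to full shape matching operators rather than scalars:

import numpy as np

def window_sums(values, half_width):
    # sum of values in [i - w, i + w] for every i, in constant time per element,
    # using a prefix sum -- the same redundancy fastlsm exploits per lattice axis
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    n = len(values)
    lo = np.clip(np.arange(n) - half_width, 0, n)
    hi = np.clip(np.arange(n) + half_width + 1, 0, n)
    return prefix[hi] - prefix[lo]

positions = np.random.rand(10)
print(window_sums(positions, half_width=2))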
reactive systems control many useful and complex real world devices tool supported specification modeling helps software engineers design such systems correctly one such tool scenario generator constructs an input event sequence for the spec model that reaches state satisfying given criteria it can uncover counterexamples to desired safety properties explain feature interactions in concrete terms to requirements analysts and even provide online help to end users learning how to use system however while exhaustive search algorithms such as model checkers work in limited cases the problem is highly intractable for the functionally rich models that correspond naturally to complex systems engineers wish to design this paper describes novel heuristic approach to the problem that is applicable to large class of infinite state reactive systems the key idea is to piece together scenarios that achieve subgoals into single scenario achieving the conjunction of the subgoals the scenarios are mined from library captured independently during requirements acquisition explanation based generalization then abstracts them so they may be coinstantiated and interleaved the approach is implemented and we present the results of applying the tool to scenario generation problems arising from case study of telephony feature validation
the cade atp system competition casc is an annual evaluation of fully automatic first order automated theorem proving systems the world championship for such systems this paper captures the state of casc after casc the tenth casc held in it provides summarized history of casc details of the current design of the competition observations and discussion of the effects of casc on atp lessons learnt during casc and remarks regarding the past present and future of casc
power consumption has become major cause of concern spanning from data centers to handheld devices traditionally improvement in power performance efficiency of modern superscalar processor came from technology scaling however that is no longer the case many of the current systems deploy coarse grain voltage and or frequency scaling for power management these techniques are attractive but limited due to their granularity of control and effectiveness in nano cmos technologies this paper proposes novel architecture level mechanism to exploit intra thread variations for power performance efficiency in modern superscalar processors this class of processors implements several buffering queuing structures to support speculative out of order execution for performance enhancement applications may not need full capabilities of such structures at all times mechanism that collaboratively adapts finite set of key hardware structures to the changing program behavior can allow the processor to operate with heterogeneous power performance capabilities we present novel offline regression based empirical model to estimate structure resizing for selected set of structures it is shown that using few processor runtime events the system can dynamically estimate structure resizing to exploit power performance efficiency results show that using the proposed empirical model selective set of key structures can be resized at runtime to deliver power performance efficiency over baseline design with only loss of performance
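A hedged sketch of what an offline regression-based resizing model could look like: ordinary least squares maps a few runtime event counts to a recommended structure size, which is then snapped to discrete steps at runtime. The event types, training numbers, size bounds, and step granularity are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# hypothetical offline profile: rows are program intervals, columns are runtime
# event counts (e.g. branch mispredictions, cache misses, issue rate)
events = np.array([[120,  40, 2.1],
                   [300,  15, 3.5],
                   [ 80, 220, 1.2],
                   [250,  60, 2.9]], dtype=float)
best_size = np.array([64, 128, 48, 96], dtype=float)   # structure sizes found best offline

# fit size ~ events @ w + b with ordinary least squares
X = np.hstack([events, np.ones((len(events), 1))])
w, *_ = np.linalg.lstsq(X, best_size, rcond=None)

def estimate_size(sample):
    """Runtime estimate: evaluate the regression and snap to a discrete step."""
    raw = float(np.append(sample, 1.0) @ w)
    return int(max(32, min(128, round(raw / 16) * 16)))

print(estimate_size([200, 50, 2.5]))
```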
large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models these linguistic annotations are often syntactic or prosodic in nature and have hierarchical structure query languages are used to select particular structures of interest or to project out large slices of corpus for external analysis existing languages suffer from variety of problems in the areas of expressiveness efficiency and naturalness for linguistic query we describe the domain of linguistic trees and discuss the expressive requirements for query language then we present language that can express wide range of queries over these trees and show that the language is first order complete over trees
the optimizations in modern compilers are constructed for predetermined set of primitive types as result programmers are unable to exploit optimizations for user defined types where these optimizations would be correct and beneficial moreover because the set of optimizations is also fixed programmers are unable to incorporate new optimizations into the compiler to address these limitations we apply the reuse methodologies from generic programming to compiler analyses and optimizations to enable compilers to apply optimizations to classes of types rather than particular types we define optimizations in terms of generic interface descriptions similar to concepts or haskell type classes by extending these interface descriptions to include associated program analysis and transformation fragments we enable compilers to incorporate user defined transformations and analyses since these transformations are explicitly associated with interface descriptions they can be applied in generic fashion by the compiler we demonstrate that classical compiler optimizations when generalized using this framework can apply to broad range of types both built in and user defined finally we present an initial implementation the principles of which are generalizable to other compilers
this paper describes methodology of olap cube navigation to identify interesting surprises by using skewness based approach three different measures of interestingness of navigation rules are proposed the navigation rules are examined for their interestingness in terms of their expectedness of skewness from neighborhood rules novel axis shift theory ast to determine interesting navigation paths is presented along with an attribute influence approach for generalization of rules which measures the interestingness of dimensional attributes and their relative influence on navigation paths detailed examples and extensive experiments demonstrate the effectiveness of interestingness of navigation rules
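A small sketch of the skewness-based notion of surprise, under the simplifying assumption that a navigation rule is flagged as interesting when the skewness of its measure values deviates from the mean skewness of neighbourhood rules; the threshold and data are invented.

```python
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: E[(x - mu)^3] / sigma^3."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return sum((v - mu) ** 3 for v in values) / (len(values) * sigma ** 3)

def is_surprising(cell_values, neighbour_skews, threshold=1.0):
    """Flag a navigation step as interesting when its skewness deviates from
    the expectation formed by neighbouring rules (here: their mean skewness)."""
    return abs(skewness(cell_values) - mean(neighbour_skews)) > threshold

sales_by_month  = [10, 11, 9, 10, 55, 12]     # one drill-down path in the cube
neighbour_skews = [0.1, -0.2, 0.05]           # skewness of sibling paths
print(skewness(sales_by_month), is_surprising(sales_by_month, neighbour_skews))
```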
leakage energy consumption is becoming an important design consideration with the scaling of technology besides caches branch predictors are among the largest on chip array structures and consume non trivial leakage energy this paper proposes two loop based strategies to reduce the branch predictor leakage without impacting prediction accuracy which is crucial for achieving high performance the loop based approaches exploit the fact that loops usually only contain small number of instructions and hence fewer branch instructions consequently all the non active entries of branch predictors can be placed into the low leakage mode during the loop execution for leakage energy savings compilers can annotate this information and pass it to the processor for reducing leakage at runtime compared to the recently proposed decay based approach our experimental results show that the loop based approach can extract more branch predictor idleness on average leading to more leakage energy savings without impacting the branch prediction accuracy and performance
partite networks are natural representations of complex multi entity databases however processing these networks can be highly memory and computation intensive task especially when positive correlation exists between the degrees of vertices from different partitions in order to improve the scalability of this process this paper proposes two algorithms that make use of sampling for obtaining less expensive approximate results the first algorithm is optimal for obtaining homogeneous discovery rates with low memory requirement but can be very slow in cases where the combined branching factor of these networks is too large second algorithm that incorporates concepts from evolutionary computation aims toward dealing with this slow convergence in the case when it is more interesting to increase approximation convergence speed of elements with high feature values this algorithm makes use of the positive correlation between local branching factors and the feature values two application examples are demonstrated in searching for most influential authors in collections of journal articles and in analyzing most active earthquake regions from collection of earthquake events
the use of virtualization is progressively accommodating diverse and unpredictable workloads as being adopted in virtual desktop and cloud computing environments since virtual machine monitor lacks knowledge of each virtual machine the unpredictability of workloads makes resource allocation difficult particularly virtual machine scheduling has critical impact on performance in cases where the virtual machine monitor is agnostic about the internal workloads of virtual machines this paper presents task aware virtual machine scheduling mechanism based on inference techniques using gray box knowledge the proposed mechanism infers the i/o boundness of guest level tasks and correlates incoming i/o events with i/o bound tasks with this information we introduce partial boosting which is priority boosting mechanism with task level granularity so that an i/o bound task is selectively scheduled to handle its incoming i/o events promptly our technique focuses on improving the performance of i/o bound tasks within heterogeneous workloads by lightweight mechanisms with complete cpu fairness among virtual machines all implementation is confined to the virtualization layer based on the xen virtual machine monitor and the credit scheduler we evaluate our prototype in terms of performance and cpu fairness over synthetic mixed workloads and realistic applications
in this paper we propose new co clustering algorithm called possibilistic fuzzy co clustering pfcc for automatic categorization of large document collections pfcc integrates possibilistic document clustering technique and combined formulation of fuzzy word ranking and partitioning into fast iterative co clustering procedure this novel framework brings about simultaneously some benefits including robustness in the presence of document and word outliers rich representations of co clusters highly descriptive document clusters good performance in high dimensional space and reduced sensitivity to the initialization in the possibilistic clustering we present the detailed formulation of pfcc together with the explanations of the motivations behind it the advantages over other existing works and the algorithm’s proof of convergence are also provided experiments on several large document data sets demonstrate the effectiveness of pfcc
the capacity and fidelity of image based diagnosis were extended due to the evolution of medical image acquisition techniques such data is usually visualised through volume rendering which denotes set of techniques used to present three dimensional images with the main goal of showing the interior of volume and enabling the identification of its inner regions and structures several tools described in the literature are dedicated to explore different ways of incorporating seeing through capabilities into volume rendering techniques however these tools are based on visualisation algorithms and are usually computationally intensive especially when working with large datasets an alternative to optimise rendering time is to use high performance programming to implement such tools thus providing faster response to user interaction this paper presents new approach to visualise inner structures in medical volume data using parallel ray casting algorithm to allow user interaction with the volume
the natural distribution of textual data used in text classification is often imbalanced categories with fewer examples are under represented and their classifiers often perform far below satisfactory we tackle this problem using simple probability based term weighting scheme to better distinguish documents in minor categories this new scheme directly utilizes two critical information ratios ie relevance indicators such relevance indicators are nicely supported by probability estimates which embody the category membership our experimental study using both support vector machines and naive bayes classifiers and extensive comparison with other classic weighting schemes over two benchmarking data sets including reuters shows significant improvement for minor categories while the performance for major categories are not jeopardized our approach has suggested simple and effective solution to boost the performance of text classification over skewed data sets
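A minimal sketch of a probability-based term weight built from a relevance-indicator-style ratio with Laplace-type smoothing; the exact ratios used in the paper may differ, and the toy corpus and category names are invented.

```python
from collections import Counter

def term_weights(docs, labels, target):
    """Weight per term: the smoothed ratio of its document frequency inside the
    minor (target) category to its document frequency outside it."""
    in_cat, out_cat = Counter(), Counter()
    n_in = n_out = 0
    for doc, label in zip(docs, labels):
        terms = set(doc.split())
        if label == target:
            n_in += 1
            in_cat.update(terms)
        else:
            n_out += 1
            out_cat.update(terms)
    vocab = set(in_cat) | set(out_cat)
    return {t: ((in_cat[t] + 1) / (n_in + 2)) / ((out_cat[t] + 1) / (n_out + 2))
            for t in vocab}

docs   = ["corn price rises", "corn harvest weak", "stocks rally", "bond yields fall"]
labels = ["corn", "corn", "finance", "finance"]
weights = term_weights(docs, labels, target="corn")
print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur mostly inside the minor category get weights well above 1, which is what lets its documents stand out despite the class imbalance.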
state machine based simulation of boolean functions is substantially faster if the function being simulated is symmetric unfortunately function symmetries are comparatively rare conjugate symmetries can be used to reduce the state space for functions that have no detectable symmetries allowing the benefits of symmetry to be applied to much wider class of functions substantial improvements in simulation speed from have been realized using these techniques
in smt processors the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential in this paper we show that blindly increasing the size of the per thread reorder buffers to provide larger number of in flight instructions does not result in the expected performance gains but quite in contrast degrades the instruction throughput for virtually all multithreaded workloads the reason for this performance loss is the excessive pressure on the shared datapath resources especially the instruction scheduling logic we propose intelligent mechanisms for dynamically adapting the number of reorder buffer entries allocated to each thread in an effort to avoid such allocations if they detrimentally impact the scheduler we achieve this goal through categorizing the program execution into issue bound and commit bound phases and only performing the buffer allocations to the threads operating in commit bound phases our adaptive technique achieves improvements of in instruction throughput and in the fairness metric compared to the best performing baseline configuration with static robs
the advances in semiconductor technology have set the shared memory server trend towards processors with multiple cores per die and multiple threads per core we believe that this technology shift forces reevaluation of how to interconnect multiple such chips to form larger systems this paper argues that by adding support for coherence traps in future chip multiprocessors large scale server systems can be formed at much lower cost this is due to shorter design time verification and time to market when compared to its traditional all hardware counterpart in the proposed trap based memory architecture tma software trap handlers are responsible for obtaining read write permission whereas the coherence trap hardware is responsible for the actual permission check in this paper we evaluate tma implementation called tma lite with minimal amount of hardware extensions all contained within the processor the proposed mechanisms for coherence trap processing should not affect the critical path and have negligible cost in terms of area and power for most processor designs our evaluation is based on detailed full system simulation using out of order processors with one or two dual threaded cores per die as processing nodes the results show that tma based distributed shared memory system can perform on par with highly optimized hardware based design
relational xquery systems try to re use mature relational data management infrastructures to create fast and scalable xml database technology this paper describes the main features key contributions and lessons learned while implementing such system its architecture consists of range based encoding of xml documents into relational tables ii compilation technique that translates xquery into basic relational algebra iii restricted order property aware peephole relational query optimization strategy and iv mapping from xml update statements into relational updates thus this system implements all essential xml database functionalities rather than single feature such that we can learn from the full consequences of our architectural decisions while implementing this system we had to extend the state of the art with number of new technical contributions such as loop lifted staircase join and efficient relational query evaluation strategies for xquery theta joins with existential semantics these contributions as well as the architectural lessons learned are also deemed valuable for other relational back end engines the performance and scalability of the resulting system is evaluated on the xmark benchmark up to data sizes of gb the performance section also provides an extensive benchmark comparison of all major xmark results published previously which confirm that the goal of purely relational xquery processing namely speed and scalability was met
we propose hybrid camera pose estimation method using an inclination sensor value and correspondence free line segments in this method possible azimuths of the camera pose are hypothesized by voting method under an inclination constraint then some camera positions for each possible azimuth are calculated based on the detected line segments that affirmatively voted for the azimuth finally the most consistent one is selected as the camera pose out of the multiple sets of the camera positions and azimuths unlike many other tracking methods our method does not use past information but rather estimates the camera pose using only present information this feature is useful for an initialization measure of registration in augmented reality ar systems this paper describes the details of the method and shows its effectiveness with experiments in which the method is actually used in an ar application
lighting design is complex but fundamental task in computer cinematography involving the adjustment of light parameters to define final scene appearance many user interfaces have been proposed to simplify lighting design they can be generally categorized in three paradigms direct light parameter manipulation indirect light feature manipulation eg shadow dragging and goal based optimization of lighting through painting to this date no formal evaluation of the relative effectiveness of these paradigms has been performed in this paper we present first step toward evaluating the benefits of these three paradigms in the form of user study with focus on novice users subjects participated in the experiment by performing various trials on simple scenes with up to point lights designed to test two lighting tasks precise adjustment of lighting and the artistic exploration of lighting configurations we collected objective and subjective data and found that subjects can light well with direct and indirect interfaces preferring the latter paint based goal specification was found to be significantly worse than the other paradigms especially since users tend to sketch rather than accurately paint goal images an input that painting algorithms were not designed for we also found that given enough time novices can perform relatively complex lighting tasks unhindered by geometry or lighting complexity finally we believe that our study will impact the design of future lighting interfaces and it will serve as the basis for designing additional experiments to reach comprehensive evaluation of lighting interfaces
metric databases are databases where metric distance function is defined for pairs of database objects in such databases similarity queries in the form of range queries or nearest neighbor queries are the most important query types in traditional query processing single queries are issued independently by different users in many data mining applications however the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries in this paper we introduce generic scheme for such data mining algorithms and we investigate two orthogonal approaches reducing cost as well as cpu cost to speed up the processing of multiple similarity queries the proposed techniques apply to any type of similarity query and to an implementation based on an index or using sequential scan parallelization yields an additional impressive speed up an extensive performance evaluation confirms the efficiency of our approach
developing software for dynamic pervasive computing networks can be an intimidating prospect while much research has focused on developing and describing algorithms and protocols for these environments the process of deploying these technologies is far from mature or streamlined furthermore the heterogeneity of pervasive computing platforms can make the deployment task unapproachable in this paper we describe the evolving tuples model and demonstrate how simple protocol can be quickly and easily developed since the evolving tuples infrastructure serves as unifying base across heterogeneous platforms the resulting implementation inherently supports cross platform deployment common scenario for pervasive computing
labeling schemes lie at the core of query processing for many xml database management systems designing labeling schemes for dynamic xml documents is an important problem that has received lot of research attention existing dynamic labeling schemes however often sacrifice query performance and introduce additional labeling cost to facilitate arbitrary updates even when the documents actually seldom get updated since the line between static and dynamic xml documents is often blurred in practice we believe it is important to design labeling scheme that is compact and efficient regardless of whether the documents are frequently updated or not in this paper we propose novel labeling scheme called dde for dynamic dewey which is tailored for both static and dynamic xml documents for static documents the labels of dde are the same as those of dewey which yield compact size and high query performance when updates take place dde can completely avoid re labeling and its label quality is most resilient to the number and order of insertions compared to the existing approaches in addition we introduce compact dde cdde which is designed to optimize the performance of dde for insertions both dde and cdde can be incorporated into existing systems and applications that are based on dewey labeling scheme with minimum efforts experiment results demonstrate the benefits of our proposed labeling schemes over the previous approaches
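The static half of the scheme is plain Dewey labeling, which can be sketched directly; DDE's update-resilient extension is not shown here. Labels are modeled as tuples and the ancestor test is a prefix check.

```python
def child_label(parent, ordinal):
    """Dewey-style label: the parent's label extended with the child's ordinal."""
    return parent + (ordinal,)

def is_ancestor(a, b):
    """a is an ancestor of b iff a's label is a proper prefix of b's label."""
    return len(a) < len(b) and b[:len(a)] == a

root    = (1,)
chapter = child_label(root, 2)       # (1, 2)
section = child_label(chapter, 3)    # (1, 2, 3)
print(is_ancestor(root, section), is_ancestor(section, chapter))   # True False
```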
with the rapidly increasing popularity of the www websites are playing crucial role to convey knowledge and information to the end users discovering hidden and meaningful information about web users usage patterns is critical to determine effective marketing strategies to optimize the web server usage for accommodating future growth most of the currently available web server analysis tools provide only explicit and statistical information without real useful knowledge for web managers the task of mining useful information becomes more challenging when the web traffic volume is enormous and keeps on growing in this paper we propose concurrent neurofuzzy model to discover and analyze useful knowledge from the available web log data we made use of the cluster information generated by self organizing map for pattern analysis and fuzzy inference system to capture the chaotic trend to provide short term hourly and long term daily web traffic trend predictions empirical results clearly demonstrate that the proposed hybrid approach is efficient for mining and predicting web server traffic and could be extended to other web environments as well
in this paper we generalise the sentence compression task rather than simply shorten sentence by deleting words or constituents as in previous work we rewrite it using additional operations such as substitution reordering and insertion we present new corpus that is suited to our task and discriminative tree to tree transduction model that can naturally account for structural and lexical mismatches the model incorporates novel grammar extraction method uses language model for coherent output and can be easily tuned to wide range of compression specific loss functions
presents the results of an implementation of several algorithms for checkpointing and restarting parallel programs on shared memory multiprocessors the algorithms are compared according to the metrics of overall checkpointing time overhead imposed by the checkpointer on the target program and amount of time during which the checkpointer interrupts the target program the best algorithm measured achieves its efficiency through variation of copy on write which allows the most time consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed
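One classic way to overlap checkpoint writing with execution is a fork-based copy-on-write snapshot, sketched below for POSIX systems; this illustrates the general technique only, not the specific algorithms measured in the paper, and the file names are arbitrary.

```python
import os
import pickle

def checkpoint(state, path):
    """Fork-based snapshot: the child inherits a copy-on-write image of the
    parent's memory and serialises it, while the parent resumes immediately."""
    pid = os.fork()
    if pid == 0:                          # child: persist the frozen state, then exit
        with open(path, "wb") as fh:
            pickle.dump(state, fh)
        os._exit(0)
    return pid                            # parent: carry on with the computation

state = {"iteration": 0, "vector": list(range(1000))}
for step in range(3):
    state["iteration"] = step
    child = checkpoint(state, f"ckpt_{step}.pkl")
    state["vector"][step] += 1            # parent may mutate freely after the fork
    os.waitpid(child, 0)                  # reap the checkpointer (could be deferred)
```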
minimum spanning tree mst clustering sequentially inserts the nearest points in the space into list which is then divided into clusters by using desired criteria this insertion order however can be relaxed provided approximately nearby points in condensed area are adjacently inserted into list before distant points in other areas based on this observation we propose an approximate clustering method in which new approximate mst amst is repeatedly built in the maximum iterations from two sources new hilbert curve created from carefully shifted data points and previous amst which holds cumulative vicinity information derived from earlier iterations although the final amst may not completely match to true mst built from an algorithm most mismatches occur locally within individual data groups which are unimportant for clustering our experiments on synthetic datasets and animal motion vectors extracted from surveillance videos show that high quality clusters can be efficiently obtained from this approximation method
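A hedged sketch of the overall idea: order points along a space-filling curve, chain consecutive points, and cut the longest links to form clusters. A Morton (Z-order) key is used here as a simple stand-in for the paper's Hilbert curve and iterative AMST refinement, and coordinates are assumed to lie in [0, 1).

```python
def z_order_key(x, y, bits=10):
    """Interleave coordinate bits; a Morton key stands in for the Hilbert index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def approx_clusters(points, n_clusters):
    """Order points along the curve, chain consecutive points, then cut the
    longest chain links to obtain n_clusters groups."""
    scaled = [(int(px * 1000), int(py * 1000)) for px, py in points]
    order = sorted(range(len(points)), key=lambda i: z_order_key(*scaled[i]))
    gaps = []
    for pos, (a, b) in enumerate(zip(order, order[1:]), start=1):
        d2 = (points[a][0] - points[b][0]) ** 2 + (points[a][1] - points[b][1]) ** 2
        gaps.append((d2, pos))
    cuts = {pos for _, pos in sorted(gaps, reverse=True)[:n_clusters - 1]}
    clusters, current = [], [order[0]]
    for pos, idx in enumerate(order[1:], start=1):
        if pos in cuts:
            clusters.append(current)
            current = []
        current.append(idx)
    clusters.append(current)
    return clusters

pts = [(0.1, 0.1), (0.12, 0.11), (0.9, 0.9), (0.88, 0.92), (0.5, 0.1)]
print(approx_clusters(pts, n_clusters=3))
```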
in this paper we formalize the digital library dl integration problem and propose an overall approach based on the streams structures spaces scenarios and societies framework we then apply that framework to integrate domain specific archeological dls illustrating our solutions for key problems in dl integration an integrated archeological dl etana dl is used as case study to justify and evaluate our dl integration approach more specifically we develop minimal metamodel for archeological dls within the theory we implement the ssuite tool set to cover the process of union dl generation including requirements gathering conceptual modeling rapid prototyping and code generation ssuite consists of sgraph sgen and schemamapper each of which plays an important role in dl integration we also propose an approach to integrated dls based on the formalism which provides systematic method to design and implement dl exploring services
in this paper we investigate security problems which occur when exploiting linda like data driven coordination model in an open environment in this scenario there is no guarantee that all the agents accessing the shared tuple space are trusted starting from formalization of some typical security properties in the standard linda coordination model we present novel data driven coordination model which provides mechanisms to support the considered security properties the first of these mechanisms supports logical partitions of the shared repository in this way we can restrict the access to tuples stored inside partition simply by limiting the access to the partition itself the second mechanism consists of adding to the tuples some extra information which permits to authenticate the producer of tuple or to identify its reader consumer finally we support the possibility to define access control policies based on the kind of operations an agent performs on tuple thus discriminating between destructive input and non destructive read permissions on each single tuple
the extcow file system built on the popular ext file system provides an open source file versioning and snapshot platform for compliance with the versioning and auditability requirements of recent electronic record retention legislation extcow provides time shifting interface that permits real time and continuous view of data in the past time shifting does not pollute the file system namespace nor require snapshots to be mounted as separate file system further extcow is implemented entirely in the file system space and therefore does not modify kernel interfaces or change the operation of other file systems extcow takes advantage of the fine grained control of on disk and in memory data available only to file system resulting in minimal degradation of performance and functionality experimental results confirm this hypothesis extcow performs comparably to ext on many benchmarks and on trace driven experiments
we present the tic transactions with isolation and cooperation model for concurrent programming tic adds to standard transactional memory the ability for transaction to observe the effects of other threads at selected points this allows transactions to cooperate as well as to invoke nonrepeatable or irreversible operations such as i/o cooperating transactions run the danger of exposing intermediate state and of having other threads change the transaction’s state the tic model protects against unanticipated interference by having the type system keep track of all operations that may transitively violate the atomicity of transaction and require the programmer to establish consistency at appropriate points the result is programming model that is both general and simple we have used the tic model to re engineer existing lock based applications including substantial multi threaded web mail server and memory allocator with coarse grained locking our experience confirms the features of the tic model it is convenient for the programmer while maintaining the benefits of transactional memory
tree induction is one of the most effective and widely used methods for building classification models however many applications require cases to be ranked by the probability of class membership probability estimation trees pets have the same attractive features as classification trees eg comprehensibility accuracy and efficiency in high dimensions and on large data sets unfortunately decision trees have been found to provide poor probability estimates several techniques have been proposed to build more accurate pets but to our knowledge there has not been systematic experimental analysis of which techniques actually improve the probability based rankings and by how much in this paper we first discuss why the decision tree representation is not intrinsically inadequate for probability estimation inaccurate probabilities are partially the result of decision tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size for example via reduced error pruning larger trees can be better for probability estimation even if the extra size is superfluous for accuracy maximization we then present the results of comprehensive set of experiments testing some straightforward methods for improving probability based rankings we show that using simple common smoothing method the laplace correction uniformly improves probability based rankings in addition bagging substantially improves the rankings and is even more effective for this purpose than for improving accuracy we conclude that pets with these simple modifications should be considered when rankings based on class membership probability are required
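The Laplace correction itself is a one-liner: replace the raw leaf frequency k/n with (k + 1)/(n + C) for C classes, which pulls extreme estimates away from 0 and 1.

```python
def leaf_probability(k, n, n_classes=2):
    """Laplace-corrected class probability at a tree leaf: (k + 1) / (n + C)."""
    return (k + 1) / (n + n_classes)

# a small pure leaf: the raw estimate says 1.0, the corrected one is more cautious
print(3 / 3, leaf_probability(3, 3))    # 1.0 vs 0.8
print(0 / 2, leaf_probability(0, 2))    # 0.0 vs 0.25
```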
concurrent collection classes are widely used in multi threaded programming but they provide atomicity only for fixed set of operations software transactional memory stm provides convenient and powerful programming model for composing atomic operations but concurrent collection algorithms that allow their operations to be composed using stm are significantly slower than their non composable alternatives we introduce transactional predication method for building transactional maps and sets on top of an underlying non composable concurrent map we factor the work of most collection operations into two parts portion that does not need atomicity or isolation and single transactional memory access the result approximates semantic conflict detection using the stm’s structural conflict detection mechanism the separation also allows extra optimizations when the collection is used outside transaction we perform an experimental evaluation that shows that predication has better performance than existing transactional collection algorithms across range of workloads
we revisit the problem of incentive compatible interdomain routing examining the quite realistic special case in which the autonomous systems ases utilities are linear functions of the traffic in the incident links and the traffic leaving each as we show that incentive compatibility towards maximizing total welfare is achievable efficiently and in the uncapacitated case by an algorithm that can be implemented by bgp the standard protocol for interdomain routing
various applications impose different transactional loads on databases for example for telecommunication systems online games sensor networks and trading systems most of the database load consists of single tuple reads and single tuple writes in this paper approaches to handle these single tuple transactions in main memory systems are presented the protocols are evaluated by simulation and verified by statistical analysis the results show that more transactions can be executed while keeping response times low using the new approach compared to state of the art protocol
we propose novel mobility model named semi markov smooth sms model to characterize the smooth movement of mobile users in accordance with the physical law of motion in order to eliminate sharp turns abrupt speed change and sudden stops exhibited by existing models we formulate the smooth mobility model by semi markov process to analyze the steady state properties of this model because the transition time between consecutive phases states has discrete uniform distribution instead of an exponential distribution through stochastic analysis we prove that this model unifies many good features for analysis and simulations of mobile networks first it is smooth and steady because there is no speed decay problem for arbitrary starting speed while maintaining uniform spatial node distribution regardless of node placement second it can be easily and flexibly applied for simulating node mobility in wireless networks it can also adapt to different network environments such as group mobility and geographic constraints to demonstrate the impact of this model we evaluate the effect of this model on distribution of relative speed link lifetime between neighboring nodes and average node degree by ns simulations
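A rough sketch of the three-phase (speed-up, middle, slow-down) movement pattern the model describes, with phase lengths drawn from discrete uniform distributions; all numeric ranges are invented and the stochastic details of the actual SMS model are richer than this.

```python
import random

def sms_trip(rng, a_max=1.0, v_max=20.0, dt=1.0):
    """One movement period: accelerate from a stop, hold the reached speed,
    then decelerate smoothly back to zero; phase lengths are discrete uniform."""
    speeds = [0.0]
    accel = rng.uniform(0.2, a_max)
    for _ in range(rng.randint(3, 10)):          # alpha phase: speed up
        speeds.append(min(v_max, speeds[-1] + accel * dt))
    cruise = speeds[-1]
    speeds += [cruise] * rng.randint(5, 20)      # beta phase: keep (roughly) that speed
    steps = rng.randint(3, 10)                   # gamma phase: slow down to a stop
    for i in range(1, steps + 1):
        speeds.append(cruise * (1 - i / steps))
    return speeds

print([round(v, 2) for v in sms_trip(random.Random(1))[:8]])
```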
real time rendering of large scale and complex scenes is one of the important subjects in virtual reality technology in this paper we present an efficient view dependent out of core rendering algorithm for large scale and complex scenes in the preprocessing phase we partition the scene in hierarchy and compute continuous hierarchical level of detail hlod for each hierarchical node using dynamic topology simplification method then at run time the multi threaded technique is used the rendering thread uses the hierarchy for coarse global refinement and uses the continuous hlods for fine local refinement the prefetching thread predicts the motion of the viewer and prefetches the data that the viewer may see next we have implemented our method on common pc with standard graphics hardware experimental results show that the system can perform interactive rendering of large scale and complex scenes with fine image quality and minimal popping artifacts
in this paper we propose an adaptive bfgs which uses self adaptive scaling factor for the hessian matrix and is equipped with nonmonotone strategy our experimental evaluation using different recurrent network architectures provides evidence that the proposed approach successfully trains recurrent networks of various architectures inheriting the benefits of the bfgs and at the same time alleviating some of its limitations
modern embedded multimedia and telecommunications systems need to store and access huge amounts of data this becomes critical factor for the overall energy consumption area and performance of the systems loop transformations are essential to improve the data access locality and regularity in order to optimally design or utilize memory hierarchy however due to abstract high level cost functions current loop transformation steering techniques do not take the memory platform sufficiently into account they usually also result in only one final transformation solution on the other hand the loop transformation search space for real life applications is huge especially if the memory platform is still not fully fixed use of existing loop transformation techniques will therefore typically lead to suboptimal end products it is critical to find all interesting loop transformation instances this can only be achieved by performing an evaluation of the effect of later design stages at the early loop transformation stage this article presents fast incremental hierarchical memory size requirement estimation technique it estimates the influence of any given sequence of loop transformation instances on the mapping of application data onto hierarchical memory platform as the exact memory platform instantiation is often not yet defined at this high level design stage platform independent estimation is introduced with pareto curve output for each loop transformation instance comparison among the pareto curves helps the designer or steering tool to find all interesting loop transformation instances that might later lead to low power data mapping for any of the many possible memory hierarchy instances initially the source code is used as input for estimation however performing the estimation repeatedly from the source code is too slow for large search space exploration an incremental approach based on local updating of the previous result is therefore used to handle sequences of different loop transformations experiments show that the initial approach takes few seconds which is two orders of magnitude faster than state of the art solutions but still too costly to be performed interactively many times the incremental approach typically takes just few milliseconds which is another two orders of magnitude faster than the initial approach this huge speedup allows us for the first time to handle real life industrial size applications and get realistic feedback during loop transformation exploration
the web has become an excellent source for gathering consumer opinions there are now numerous web sites containing such opinions eg customer reviews of products forums discussion groups and blogs this paper focuses on online customer reviews of products it makes two contributions first it proposes novel framework for analyzing and comparing consumer opinions of competing products prototype system called opinion observer is also implemented the system is such that with single glance of its visualization the user is able to clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features this comparison is useful to both potential customers and product manufacturers for potential customer he she can see visual side by side and feature by feature comparison of consumer opinions on these products which helps him her to decide which product to buy for product manufacturer the comparison enables it to easily gather marketing intelligence and product benchmarking information second new technique based on language pattern mining is proposed to extract product features from pros and cons in particular type of reviews such features form the basis for the above comparison experimental results show that the technique is highly effective and outperforms existing methods significantly
this paper presents our experience with implementing atomicity in two systems the quicksilver distributed file system and the starburst relational database manager each of these systems guarantees that certain collections of operations done on behalf of their clients execute atomically despite process machine or network failures in this paper we describe the atomic properties implemented by each system present the algorithms and mechanisms used examine the similarities and differences between the two systems and give the rationale for different design decisions we demonstrate that the support of atomicity with high performance requires variety of techniques carefully chosen to balance the amount of data logged the level of concurrency allowed and the mutual consistency requirements of sets of objects the main goal is to help others implement efficient systems that support atomicity
this paper proposes workflow based recommender system model on supplying proper knowledge to proper members in collaborative team contexts rather than daily life scenarios eg recommending commodities films news etc within collaborative team contexts more information could be utilized by recommender systems than ordinary daily life contexts the workflow in collaborative team contains information about relationships among members roles and tasks which could be combined with collaborative filtering to obtain members demands for knowledge in addition the work schedule information contained in the workflow could also be employed to determine the proper volume of knowledge that should be recommended to each member in this paper we investigate the mechanism of the workflow based recommender system and conduct series of experiments referring to several real world collaborative teams to validate the effectiveness and efficiency of the proposed methods
in recent years we have witnessed large interest in surface deformation techniques this has been reaction that can be attributed to the ability to develop techniques which are detail preserving space deformation techniques on the other hand received less attention but nevertheless they have many advantages over surface based techniques this paper explores the potential of these two approaches to deformation and discusses the opportunities that the fusion of the two may lead to
opinion retrieval is novel information retrieval task and has attracted great deal of attention with the rapid increase of online opinionated information most previous work adopts the classical two stage framework ie first retrieving topic relevant documents and then re ranking them according to opinion relevance however none has considered the problem of domain coherence between queries and topic relevant documents in this work we propose to address this problem based on the similarity measure of the usage of opinion words which users employ to express opinions our work is based on the observation that the opinion words are domain dependent we reformulate this problem as measuring the opinion similarity between domain opinion models of queries and document opinion models opinion model is constructed to capture the distribution of opinion words the basic idea is that if document has high opinion similarity with domain opinion model it indicates that it is not only opinionated but also in the same domain as the query ie domain coherence experimental results show that our approach performs comparably with the state of the art work
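A toy sketch of comparing a document opinion model against a domain opinion model: both are unigram distributions restricted to an opinion lexicon, compared here with cosine similarity as one possible similarity measure (the paper's actual measure may differ); the lexicon and texts are invented.

```python
from collections import Counter
from math import sqrt

OPINION_WORDS = {"great", "terrible", "crisp", "noisy", "blurry", "fast", "slow"}  # toy lexicon

def opinion_model(texts):
    """Unigram distribution restricted to opinion words."""
    counts = Counter(w for t in texts for w in t.split() if w in OPINION_WORDS)
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}

def cosine(p, q):
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in set(p) | set(q))
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

domain_model = opinion_model(["great crisp photos", "blurry in low light", "fast autofocus"])
doc_model    = opinion_model(["crisp screen but noisy fan", "fast boot"])
print(cosine(domain_model, doc_model))
```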
this paper focuses on young children and emerging new technologies it examines children’s drawings as an evaluation tool for capturing their experiences of different novel interfaces recent evaluation study with children and two follow up expert coding sessions were used to demonstrate how drawings could be used and coded and how the intercoder reliability could be improved usability and user experience ux factors fun goal fit gf and tangible magic tm were included in the coding scheme and they were the factors that have been looked at in the coding sessions our studies show the thoroughness and ease of use of the drawing method the method was effective and reliable in conveying the user experience from the drawings it also shows some of the limitations of the method eg being resource intensive and open to evaluator’s interpretation from the result of the study number of the drawings conveyed information pertaining to user experiences gf and tm and the method was particularly reliable at capturing fun the results also revealed correlation between gf and tm
we propose reflective model to express and to automatically manage dependencies between objects this model describes reflective facilities which enable the changing of language semantics although the importance of inter object dependencies is well accepted there is only limited object oriented language support for their specification and implementation in response to this lack of expressiveness of object models the flo language integrates dependency management into the object oriented paradigm dependencies are described as first class objects and flo automatically maintains the consistency of the dependency graph in this paper we first show how user can declare dependencies and how the system maintains the consistency of the graph of expressed dependencies in second part we focus on the implementation of this management by controlling the messages sent to linked objects in order to make dependency management orthogonal to other application concerns we propose an abstraction of message handling implemented with meta objects we illustrate the extensibility of our language with different control behavior implementations in particular we study different implementations of the global control of message propagation flow
web caching is an important technology for improving the scalability of web services one of the key problems in coordinated enroute web caching is to compute the locations for storing copies of an object among the enroute caches so that some specified objectives are achieved in this article we address this problem for tree networks and formulate it as maximization problem we consider this problem for both unconstrained and constrained cases the constrained case includes constraints on the cost gain per node and on the number of object copies to be placed we present dynamic programming based solutions to this problem for different cases and theoretically show that the solutions are either optimal or convergent to optimal solutions we derive efficient algorithms that produce these solutions based on our mathematical model we also present solution to coordinated enroute web caching for autonomous systems as natural extension of the solution for tree networks we implement our algorithms and evaluate our model on different performance metrics through extensive simulation experiments the implementation results show that our methods outperform the existing algorithms of either coordinated enroute web caching for linear topology or object placement replacement at individual nodes only
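A minimal dynamic-programming sketch for the unconstrained tree case under simplifying assumptions: the origin at the root always holds the object, and each en-route node either stores a copy at a fixed cost or serves its demand from the nearest ancestor copy at a cost linear in distance. The tree, demands, and costs below are hypothetical, and the paper's constrained variants and autonomous-system extension are not covered.

```python
from functools import lru_cache

# hypothetical tree: node -> list of (child, edge cost); node 0 is the origin server
TREE      = {0: [(1, 2), (2, 5)], 1: [(3, 1), (4, 4)], 2: [], 3: [], 4: []}
DEMAND    = {1: 3, 2: 6, 3: 10, 4: 1}       # requests per period at each en-route node
COPY_COST = {1: 20, 2: 20, 3: 20, 4: 20}    # cost of storing a copy at the node

@lru_cache(maxsize=None)
def best(node, dist_to_copy):
    """Minimum cost of the subtree rooted at node, given the distance to the
    nearest ancestor that already holds a copy of the object."""
    children = TREE[node]
    # option 1: place a copy here, so this node's requests are served locally
    place = COPY_COST[node] + sum(best(c, w) for c, w in children)
    # option 2: no copy here, serve this node's demand from the ancestor copy
    skip = DEMAND[node] * dist_to_copy + sum(best(c, dist_to_copy + w) for c, w in children)
    return min(place, skip)

total = sum(best(child, w) for child, w in TREE[0])   # the origin always holds the object
print(total)
```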
this paper presents new randomized algorithm for achieving consensus among asynchronous processes that communicate by reading and writing shared registers in the presence of strong adversary the fastest previously known algorithm requires process to perform an expected log read and write operations in the worst case in our algorithm each process executes at most an expected log read and write operations it is shown that shared coin algorithms can be combined together to yield an algorithm with log individual work and total work
data diffusion architectures also known as cache only memory architectures provide shared address space on top of distributed memory their distinctive feature is that data diffuses or migrates and replicates in main memory according to whichever processors are using the data this requires an associative organisation of main memory which decouples each address and its data item from any physical location data item can thus be placed and replicated where it is needed also the physical address space does not have to be fixed and contiguous it can be any set of addresses within the address range of the processors possibly varying over time provided it is smaller than the size of main memory this flexibility is similar to that of virtual address space and offers new possibilities to organise virtual memory system we present an analysis of possible organisations of virtual memory on such architectures and propose two main alternatives traditional virtual memory tvm is organised around fixed and contiguous physical address space using traditional mapping associative memory virtual memory amvm is organised around variable and non contiguous physical address space using simpler mapping to evaluate tvm and amvm we extended multiprocessor emulation of data diffusion architecture to include part of the mach operating system virtual memory this extension implements tvm slightly modified version implements amvm on applications tested amvm shows marginal performance gain over tvm we argue that amvm will offer greater advantages with higher degrees of parallelism or larger data sets
the fuzzy lattice reasoning flr neural network was introduced lately based on an inclusion measure function this work presents novel flr extension namely agglomerative similarity measure flr or asmflr for short for clustering based on similarity measure function the latter function may also be based on metric we demonstrate application in metric space emerging from weighted graph towards partitioning it the asmflr compares favorably with four alternative graph clustering algorithms from the literature in series of computational experiments on artificial data in addition our work introduces novel index for the quality of clustering which index compares favorably with two popular indices from the literature
the application of automatic transformation processes during the formal development and optimization of programs can introduce encumbrances in the generated code that programmers usually or presumably do not write an example is the introduction of redundant arguments in the functions defined in the program redundancy of parameter means that replacing it by any expression does not change the result in this work we provide methods for the analysis and elimination of redundant arguments in term rewriting systems as model for the programs that can be written in more sophisticated languages on the basis of the uselessness of redundant arguments we also propose an erasure procedure which may avoid wasteful computations while still preserving the semantics under ascertained conditions prototype implementation of these methods has been undertaken which demonstrates the practicality of our approach
the erosion of trust put in traditional database servers and in database service providers the growing interest for different forms of data dissemination and the concern for protecting children from suspicious internet content are different factors that lead to moving the access control from servers to clients several encryption schemes can be used to serve this purpose but all suffer from static way of sharing data with the emergence of hardware and software security elements on client devices more dynamic client based access control schemes can be devised this paper proposes an efficient client based evaluator of access control rules for regulating access to xml documents this evaluator benefits from dedicated index to quickly converge towards the authorized parts of potentially streaming document additional security mechanisms guarantee that prohibited data can never be disclosed during the processing and that the input document is protected from any form of tampering experiments on synthetic and real datasets demonstrate the effectiveness of the approach
we present surface modeling technique that supports adaptive resolution and hierarchical editing for surfaces of spherical topology the resulting surface is analytic ck and has continuous local parameterization defined at every point to manipulate these surfaces we describe user interface based on multiple overlapping subdivision style meshes
we show how importance driven refinement and wavelet basis can be combined to provide an efficient solution to the global illumination problem with glossy and diffuse reflections importance is used to focus the computation on the interactions having the greatest impact on the visible solution wavelets are used to provide an efficient representation of radiance importance and the transport operator we discuss number of choices that must be made when constructing finite element algorithm for glossy global illumination our algorithm is based on the standard wavelet decomposition of the transport operator and makes use of four dimensional wavelet representation for spatially and angularly varying radiance distributions we use final gathering step to improve the visual quality of the solution features of our implementation include support for curved surfaces as well as texture mapped anisotropic emission and reflection functions
family of tableau methods called ordered semantic hyper osh tableau methods for first order theories with function symbols is presented these methods permit semantic information to guide the search for proof they also may make use of orderings on literals clauses and interpretations to guide the search in typical tableau the branches represent conjunctions of literals and the tableau represents the disjunction of the branches an osh tableau is as usual except that each branch has an interpretation associated with it where is an interpretation supplied at the beginning and is the interpretation most like that satisfies only clauses that falsifies may be used to expand the branch thus restricting the kinds of tableau that can be constructed this restriction guarantees the goal sensitivity of these methods if is properly chosen certain choices of may produce purely bottom up tableau construction while others may result in goal oriented evaluation for given query the choices of which branch is selected for expansion and which clause is used to expand this branch are examined and their effects on the osh tableau methods considered branch reordering method is also studied as well as branch pruning technique called complement modification that adds additional literals to branches in soundness preserving manner all members of the family of osh tableaux are shown to be sound complete and proof convergent for refutations proof convergence means that any allowable sequence of operations will eventually find proof if one exists osh tableaux are powerful enough to be treated as generalization of several classes of tableau discussed in the literature including forward chaining and backward chaining procedures therefore they can be used for efficient query processing
the author addresses the problem of managing changes to items of various types in multitype software environment prism model of changes has been designed with the following features separation of concern between changes to the described items and changes to the environmental facilities housing these items facility called the dependency structure for describing various items and their interdependencies and for identifying the items affected by given change facility called the change structure for classifying recording and analyzing change related data and for making qualitative judgments of the consequences of change identification of the many distinct properties of change and built in mechanism for providing feedback the rationale for the design of the model of changes as well as that of the dependency structure and the change structure is given
this paper addresses the problem of variable ranking for support vector regression the ranking criteria that we proposed are based on leave one out bounds and some variants and for these criteria we have compared different search space algorithms recursive feature elimination and scaling factor optimization based on gradient descent all these algorithms have been compared on toy problems and real world qsar data sets results show that the radius margin criterion is the most efficient criterion for ranking variables using this criterion can then lead to support vector regressor with improved error rate while using fewer variables our results also support the evidence that gradient descent algorithm achieves better variable ranking compared to backward algorithm
we present language for querying list based complex objects the language is shown to express precisely the polynomial time generic list object functions the iteration mechanism of the language is based on new approach wherein in addition to the list over which the iteration is performed second list is used to control the number of iteration steps during the iteration the intermediate results can be moved to the output list as well as reinserted into the list being iterated over simple syntactic constraint allows the growth rate of the intermediate results to be tightly controlled which in turn restricts the expressiveness of the language to ptime
evolving and refactoring concurrent java software can be error prone resulting in race conditions and other concurrency difficulties we suggest that there are two principal causes concurrency design intent is often not explicit in code and additionally consistency of intent and code cannot easily be established through either testing or inspection we explore several aspects of this issue in this paper first we describe tool assisted approach to modeling and assurance for concurrent programs second we give an account of recent case study experience on larger scale production java systems third we suggest an approach to scalable co evolution of code and models that is designed to support working programmers without special training or incentives fourth we propose some concurrency related refactorings that with suitable analysis and tool support can potentially offer assurances of soundness
scheduling of most of the parallel scientific applications demands simultaneous exploitation of task and data parallelism for efficient and effective utilization of system and other resources traditional optimization techniques like optimal control theoretic approaches convex programming and bin packing have been suggested in the literature for dealing with the most critical processor allocation phase however their application to real world problems is not straightforward which drives the solutions away from optimality heuristic based approaches in contrast work in the integer domain for the number of processors all through and perform appreciably well two step modified critical path and area based mcpa scheduling heuristic is developed which aims at improving the processor allocation phase of an existing critical path and area based cpa scheduling algorithm strength of the suggested algorithm lies in bridging the gap between the processor allocation and task assignment phases of scheduling it helps in making better processor allocations for data parallel tasks without sacrificing the essential task parallelism available in the application program performance of mcpa algorithm in terms of normalized schedule length and speedup is evaluated for random and real application task graph suites it turns out to be much better than the parent cpa algorithm and comparable to the high complexity critical path reduction cpr algorithm
atomicity is fundamental correctness property in multithreaded programs method is atomic if for every execution there is an equivalent serial execution in which the actions of the method are not interleaved with actions of other threads atomic methods are amenable to sequential reasoning which significantly facilitates subsequent analysis and verification this article presents type system for specifying and verifying the atomicity of methods in multithreaded java programs using synthesis of lipton’s theory of reduction and type systems for race detection the type system supports guarded write guarded and unguarded fields as well as thread local data parameterized classes and methods and protected locks we also present an algorithm for verifying atomicity via type inference we have applied our type checker and type inference tools to number of commonly used java library classes and programs these tools were able to verify the vast majority of methods in these benchmarks as atomic indicating that atomicity is widespread methodology for multithreaded programming in addition reported atomicity violations revealed some subtle errors in the synchronization disciplines of these programs
lifecycle validation of the performance of software products ie the prediction of the product ability to satisfy the user performance requirements encompasses the production of performance models from case documents the model production activity is critical time consuming and error prone activity so that lifecycle validation is still not widely accepted and applied the reason is twofold the lack of methods for the automatic derivation of software performance models from case documents and the lack of environments that implement and integrate such methods a number of methods for the automatic derivation of software performance models from case documents have already been proposed in literature without however solving the automation problem this paper instead faces up to such problem by introducing an integrated and standards based environment for the automatic derivation and evaluation of queueing based performance models the environment is based on the use of standards for metadata exchange mof xmi to ease the integration of the most common uml based case tools thus enabling software designers to smoothly introduce performance validation activities into their best development practices
in this paper we study the quality of service qos aware replica placement problem in grid environments although there has been much work on the replica placement problem in parallel and distributed systems most of them concern average system performance and have not addressed the important issue of quality of service requirement in the very few existing works that take qos into consideration simplified replication model is assumed therefore their solution may not be applicable to real systems in this paper we propose more realistic model for replica placement which considers storage cost update cost and access cost of data replication and also assumes that the capacity of each replica server is bounded the qos aware replica placement is np complete even in the simple model we propose two heuristic algorithms called greedy remove and greedy add to approximate the optimal solution our extensive experiment results demonstrate that both greedy remove and greedy add find near optimal solution effectively and efficiently our algorithms can also adapt to various parallel and distributed environments
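as an illustration of the general greedy-add pattern referred to above, here is a hedged python skeleton; the cost, violations and capacity inputs are hypothetical placeholders, and this is not claimed to be the paper's exact greedy add or greedy remove algorithm

```python
# Illustrative skeleton of a greedy-add style heuristic for replica placement.
# Pattern only: repeatedly place a replica at the candidate server that most
# improves (QoS violations, total cost), subject to a crude capacity model.
# cost(), violations() and capacity are hypothetical problem inputs.

def greedy_add(candidates, cost, violations, capacity):
    """candidates: set of candidate servers.
    cost(placement): total storage + update + access cost of a placement.
    violations(placement): number of clients whose QoS bound is not met.
    capacity[s]: remaining capacity of server s (one unit per replica here).
    """
    placement = set()
    current = (violations(placement), cost(placement))
    while True:
        best, best_score = None, current
        for s in candidates - placement:
            if capacity[s] <= 0:
                continue
            trial = placement | {s}
            score = (violations(trial), cost(trial))   # lexicographic: QoS first
            if score < best_score:
                best, best_score = s, score
        if best is None:          # no addition helps any more
            return placement
        placement.add(best)
        capacity[best] -= 1
        current = best_score
```

a greedy-remove variant would start from all candidate servers and drop replicas while the qos and capacity constraints remain satisfied and the cost keeps decreasing; the abstract's naming suggests this duality but the exact rules are the paper's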
pegasus is planning framework for mapping abstract workflows for execution on the grid this paper presents the implementation of web based portal for submitting workflows to the grid using pegasus the portal also includes components for generating abstract workflows based on metadata description of the desired data products and application specific services we describe our experiences in using this portal for two grid applications major contribution of our work is in introducing several components that can be useful for grid portals and hence should be included in grid portal development toolkits
sensitivity analysis sa is novel compiler technique that complements and integrates with static automatic parallelization analysis for the cases when relevant program behavior is input sensitive in this paper we show how sa can extract all the input dependent statically unavailable conditions for which loops can be dynamically parallelized sa generates sequence of sufficient conditions which when evaluated dynamically in order of their complexity can each validate the dynamic parallel execution of the corresponding loop for example sa can first attempt to validate parallelization by checking simple conditions related to loop bounds if such simple conditions cannot be met then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of loop thus decreasing the benefits of parallel execution we have implemented sensitivity analysis in the polaris compiler and evaluated its performance using industry standard benchmark codes running on two multicore systems in most cases we have obtained speedups superior to the intel ifort compiler because with sa we could complement static analysis with minimum cost dynamic analysis and extract most of the available coarse grained parallelism
we present generalization of standard typestate systems in which the typestate of each object is determined by its membership in collection of abstract typestate sets this generalization supports typestates that model participation in abstract data types composite typestates that correspond to membership in multiple sets and hierarchical typestates because membership in typestate sets corresponds directly to participation in data structures our typestate system characterizes global sharing patterns in our approach each module encapsulates data structure and uses membership in abstract sets to characterize how objects participate in its data structure each analysis verifies that the implementation of the module preserves important internal data structure representation invariants and conforms to specification that uses formulas in set algebra to characterize the effects of operations on the data structure the analyses use the common set abstraction to characterize how objects participate in multiple data structures and to enable the inter analysis communication required to verify properties that depend on multiple modules analyzed by different analyses
software robustness has significant impact on system availability unfortunately finding software bugs is very challenging task because many bugs are hard to reproduce while debugging program it would be very useful to rollback crashed program to previous execution point and deterministically re execute the buggy code region however most previous work on rollback and replay support was designed to survive hardware or operating system failures and is therefore too heavyweight for the fine grained rollback and replay needed for software debugging this paper presents flashback lightweight os extension that provides fine grained rollback and replay to help debug software flashback uses shadow processes to efficiently roll back in memory state of process and logs process interactions with the system to support deterministic replay both shadow processes and logging of system calls are implemented in lightweight fashion specifically designed for the purpose of software debugging we have implemented prototype of flashback in the linux operating system our experimental results with micro benchmarks and real applications show that flashback adds little overhead and can quickly roll back debugged program to previous execution point and deterministically replay from that point
we explore the use of multi finger input to emulate full mouse functionality such as the tracking state three buttons and chording we first present the design space for such techniques which serves as guide for the systematic investigation of possible solutions we then perform series of pilot studies to come up with recommendations for the various aspects of the design space these pilot studies allow us to arrive at recommended technique the sdmouse in formal study the sdmouse was shown to significantly improve performance in comparison to previously developed mouse emulation techniques
in most distributed systems naming of nodes for low level communication leverages topological location such as node addresses and is independent of any application in this paper we investigate an emerging class of distributed systems where low level communication does not rely on network topological location rather low level communication is based on attributes that are external to the network topology and relevant to the application when combined with dense deployment of nodes this kind of named data enables in network processing for data aggregation collaborative signal processing and similar problems these approaches are essential for emerging applications such as sensor networks where resources such as bandwidth and energy are limited this paper is the first description of the software architecture that supports named data and in network processing in an operational multi application sensor network we show that approaches such as in network aggregation and nested queries can significantly affect network traffic in one experiment aggregation reduces traffic by up to and nested queries reduce loss rates by although aggregation has been previously studied in simulation this paper demonstrates nested queries as another form of in network processing and it presents the first evaluation of these approaches over an operational testbed
in this paper we present method for binary image comparison for binary images intensity information is poor and shape extraction is often difficult therefore binary images have to be compared without using feature extraction due to the fact that different scene patterns can be present in the images we propose modified hausdorff distance hd locally measured in an adaptive way the resulting set of measures is richer than single global measure the local hd measures result in local dissimilarity map ldmap including the dissimilarity spatial layout classification of the images as a function of their similarity is carried out on the ldmaps using support vector machine the proposed method is tested on medieval illustration database and compared with other methods to show its efficiency
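a rough numpy/scipy sketch of a locally measured modified hausdorff distance is shown below; the window size, step and handling of empty patches are illustrative choices that only approximate the adaptive scheme described above, and the resulting map could then be classified with an svm as the abstract describes

```python
# Sketch: compute the modified Hausdorff distance (MHD) between two binary
# images inside a sliding window, producing a map of local dissimilarities
# (one value per window position). Window size and step are illustrative.
import numpy as np
from scipy.spatial.distance import cdist

def mhd(pts_a, pts_b):
    """Modified Hausdorff distance between two 2-D point sets."""
    if len(pts_a) == 0 or len(pts_b) == 0:
        # both patches empty -> identical; one empty -> maximally dissimilar
        return 0.0 if len(pts_a) == len(pts_b) else np.inf
    d = cdist(pts_a, pts_b)                      # pairwise distances
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

def local_dissimilarity_map(img_a, img_b, win=16, step=8):
    """Slide a window over both binary images and compute the MHD locally."""
    rows = list(range(0, img_a.shape[0] - win + 1, step))
    cols = list(range(0, img_a.shape[1] - win + 1, step))
    ldmap = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            pa = np.argwhere(img_a[r:r + win, c:c + win])
            pb = np.argwhere(img_b[r:r + win, c:c + win])
            ldmap[i, j] = mhd(pa, pb)
    return ldmap

a = (np.random.rand(64, 64) > 0.9).astype(np.uint8)
b = (np.random.rand(64, 64) > 0.9).astype(np.uint8)
print(local_dissimilarity_map(a, b).shape)
```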
research over the past five years has shown significant performance improvements using technique called adaptive compilation an adaptive compiler uses compile execute analyze feedback loop to find the combination of optimizations and parameters that minimizes some performance goal such as code size or execution time despite its ability to improve performance adaptive compilation has not seen widespread use because of two obstacles the large amounts of time that such systems have used to perform the many compilations and executions prohibit most users from adopting these systems and the complexity inherent in feedback driven adaptive system has made it difficult to build and hard to use a significant portion of the adaptive compilation process is devoted to multiple executions of the code being compiled we have developed technique called virtual execution to address this problem virtual execution runs the program single time and preserves information that allows us to accurately predict the performance of different optimization sequences without running the code again our prototype implementation of this technique significantly reduces the time required by our adaptive compiler in conjunction with this performance boost we have developed graphical user interface gui that provides controlled view of the compilation process by providing appropriate defaults the interface limits the amount of information that the user must provide to get started at the same time it lets the experienced user exert fine grained control over the parameters that control the system
we report on field study of the multitasking behavior of computer users focused on the suspension and resumption of tasks data was collected with tool that logged users interactions with software applications and their associated windows as well as incoming instant messaging and email alerts we describe methods summarize results and discuss design guidelines suggested by the findings
we consider the problem of delivering an effective fine grained clustering tool to implementors and users of object oriented database systems this work emphasizes on line clustering mechanisms as contrasted with earlier work that concentrates on clustering policies deciding which objects should be near each other existing on line clustering methods can be ineffective and or difficult to use and may lead to poor space utilization on disk and in the disk block cache particularly for small to medium size groups of objects we introduce variable size clusters vclusters fine grained object clustering architecture that can be used directly or as the target of an automatic clustering algorithm we describe an implementation of vclusters in the shore oodbms and present experimental results that show that vclusters significantly outperform other mechanisms commonly found in object database systems fixed size clusters and near hints vclusters deliver excellent clustering and space utilization with only modest cost for maintaining clustering during updates
this paper presents aip accountable internet protocol network architecture that provides accountability as first order property aip uses hierarchy of self certifying addresses in which each component is derived from the public key of the corresponding entity we discuss how aip enables simple solutions to source spoofing denial of service route hijacking and route forgery we also discuss how aip’s design meets the challenges of scaling key management and traffic engineering
prior research indicates that there is much spatial variation in applications’ memory access patterns modern memory systems however use small fixed size cache blocks and as such cannot exploit the variation increasing the block size would not only prohibitively increase pin and interconnect bandwidth demands but also increase the likelihood of false sharing in shared memory multiprocessors in this paper we show that memory accesses in commercial workloads often exhibit repetitive layouts that span large memory regions eg several kb and these accesses recur in patterns that are predictable through code based correlation we propose spatial memory streaming practical on chip hardware technique that identifies code correlated spatial access patterns and streams predicted blocks to the primary cache ahead of demand misses using cycle accurate full system multiprocessor simulation of commercial and scientific applications we demonstrate that spatial memory streaming can on average predict of and of off chip misses for mean performance improvement of and at best
we consider network of autonomous peers forming logically global but physically distributed search engine where every peer has its own local collection generated by independently crawling the web challenging task in such systems is to efficiently route user queries to peers that can deliver high quality results and be able to rank these returned results thus satisfying the users’ information need however the problem inherent with this scenario is selecting few promising peers out of an a priori unlimited number of peers in recent research rather strict notion of semantic overlay networks has been established in most approaches peers are connected to other peers based on rigid semantic profile by clustering them based on their contents in contrast our strategy follows the spirit of peer autonomy and creates semantic overlay networks based on the notion of peer to peer dating peers are free to decide which connections they create and which they want to avoid based on various usefulness estimators the proposed techniques can be easily integrated into existing systems as they require only small additional bandwidth consumption as most messages can be piggybacked onto established communication we show how we can greatly benefit from these additional semantic relations during query routing in search engines such as minerva and in the jxp algorithm which computes the pagerank authority measure in completely decentralized manner
this article addresses the challenge of sound typestate verification with acceptable precision for real world java programs we present novel framework for verification of typestate properties including several new techniques to precisely treat aliases without undue performance costs in particular we present flow sensitive context sensitive integrated verifier that utilizes parametric abstract domain combining typestate and aliasing information to scale to real programs without compromising precision we present staged verification system in which faster verifiers run as early stages which reduce the workload for later more precise stages we have evaluated our framework on number of real java programs checking correct api usage for various java standard libraries the results show that our approach scales to hundreds of thousands of lines of code and verifies correctness for percent of the potential points of failure
understanding experience is critical issue for variety of professions especially design to understand experience and the user experience that results from interacting with products designers conduct situated research activities focused on the interactions between people and products and the experience that results this paper attempts to clarify experience in interactive systems we characterize current approaches to experience from number of disciplines and present framework for designing experience for interactive system we show how the framework can be applied by members of multidisciplinary team to understand and generate the kinds of interactions and experiences new product and system designs might offer
current visualization tools lack the ability to perform full range spatial and temporal analysis on terascale scientific datasets two key reasons exist for this shortcoming and postprocessing on these datasets are being performed in suboptimal manners and the subsequent data extraction and analysis routines have not been studied in depth at large scales we resolved these issues through advanced techniques and improvements to current query driven visualization methods we show the efficiency of our approach by analyzing over terabyte of multivariate satellite data and addressing two key issues in climate science time lag analysis and drought assessment our methods allowed us to reduce the end to end execution times on these problems to one minute on cray xt machine
it is now feasible to view video at home as easily as text based pages were viewed when the web first appeared this development has led to the emergence of video search engines providing hosting indexing and access to large online video repositories key question in this new context is whether users search for media in the same way that they search for text this paper presents first step towards answering this question by providing novel analyses of people’s linking and search behavior using leading video search engine initial results show that page views in the video context deviate from the typical power law relationships seen on the web however more positively there are clear indications that tagging and textual descriptions play key role in making some video pages more popular than others this shows that many techniques based on text analysis could apply in the video context
the storage manager of general purpose database system can retain consistent disk page level snapshots and run application programs back in time against long lived past states virtualized to look like the current state this opens the possibility that functions such as on line trend analysis and audit formerly available in specialized temporal databases can become available to general applications in general purpose databases up to now in place updating database systems had no satisfactory way to run programs on line over long lived disk page level copy on write snapshots because there was no efficient indexing method for such snapshots we describe skippy new indexing approach that solves this problem using skippy database application code can run against an arbitrarily old snapshot and iterate over snapshot ranges as efficiently it can access recent snapshots for all update workloads performance evaluation of skippy based on theoretical analysis and experimental measurements indicates that the new approach provides efficient access to snapshots at low cost
as applications of description logics proliferate efficient reasoning with knowledge bases containing many assertions becomes ever more important for such cases we developed novel reasoning algorithm that reduces shiq knowledge base to disjunctive datalog program while preserving the set of ground consequences queries can then be answered in the resulting program while reusing existing and practically proven optimization techniques of deductive databases such as join order optimizations or magic sets moreover we use our algorithm to derive precise data complexity bounds we show that shiq is data complete for np and we identify an expressive fragment of shiq with polynomial data complexity
the layers architectural pattern has been widely adopted by the developer community in order to build large software systems in reality as the system evolves over time rarely does the system remain conformed to the intended layers pattern causing significant degradation of the system maintainability as part of re factoring such system practitioners often undertake mostly manual exercise to discover the intended layers and organize the modules into these layers in this paper we present method for semi automatically detecting layers in the system and propose quantitative measurement to compute the amount of non conformance of the system from the set of layered design principles we have applied the layer detection method and the non conformance measurement on set of open source and proprietary enterprise applications
weblogs and message boards provide online forums for discussion that record the voice of the public woven into this mass of discussion is wide range of opinion and commentary about consumer products this presents an opportunity for companies to understand and respond to the consumer by analyzing this unsolicited feedback given the volume format and content of the data the appropriate approach to understand this data is to use large scale web and text data mining technologies this paper argues that applications for mining large volumes of textual data for marketing intelligence should provide two key elements suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses this paper presents such system that gathers and annotates online discussion relating to consumer products using wide variety of state of the art techniques including crawling wrapping search text classification and computational linguistics marketing intelligence is derived through an interactive analysis framework uniquely configured to leverage the connectivity and content of annotated online discussion
oodbmss need more than declarative query languages and programming languages as their interfaces since they are designed and implemented for complex applications requiring more advanced and easy to use visual interfaces we have developed complete programming environment for this purpose called moodview moodview translates all the user actions performed through its graphical interface to sql statements and therefore it can be ported onto any object oriented database systems using sql moodview provides the database programmer with tools and functionalities for every phase of object oriented database application development current version of moodview allows database user to design browse and modify database schema interactively and to display class inheritance hierarchy as directed acyclic graph moodview can automatically generate graphical displays for complex and multimedia database objects which can be updated through the object browser furthermore database administration tool full screen text editor sql based query manager and graphical indexing tool for the spatial data ie trees are also implemented
we demonstrate that collaborative relationship between the operating system and applications can be used to meet user specified goals for battery duration we first describe novel profiling based approach for accurately measuring application and system energy consumption we then show how applications can dynamically modify their behavior to conserve energy we extend the linux operating system to yield battery lifetimes of user specified duration by monitoring energy supply and demand and by maintaining history of application energy use the approach can dynamically balance energy conservation and application quality our evaluation shows that this approach can meet goals that extend battery life by as much as percent
the physiology of human visual perception helps explain different uses for color and luminance in visual arts when visual fields are isoluminant they look the same to our luminance processing pathway while potentially looking quite different to the color processing path this creates perceptual tension exploited by skilled artists in this paper we show how reproducing target color using set of isoluminant yet distinct colors can both improve existing npr image filters and help create new ones straight forward geometric technique for isoluminant color picking is presented and then applied in an improved pointillist filter new chuck close inspired filter and novel type of image mosaic filter
lately there exist increasing demands for online abnormality monitoring over trajectory streams which are obtained from moving object tracking devices this problem is challenging due to the requirement of high speed data processing within limited space cost in this paper we present novel framework for monitoring anomalies over continuous trajectory streams first we illustrate the importance of distance based anomaly monitoring over moving object trajectories then we utilize the local continuity characteristics of trajectories to build local clusters upon trajectory streams and monitor anomalies via efficient pruning strategies finally we propose piecewise metric index structure to reschedule the joining order of local clusters to further reduce the time cost our extensive experiments demonstrate the effectiveness and efficiency of our methods
recently introduced information theoretic approach to analyzing redundancies in database design was used to justify normal forms like bcnf that completely eliminate redundancies the main notion is that of an information content of each datum in an instance which is number in the closer to the less redundancy it carries in practice however one usually settles for nf which unlike bcnf may not eliminate all redundancies but always guarantees dependency preservation in this paper we use the information theoretic approach to prove that nf is the best normal form if one needs to achieve dependency preservation for each dependency preserving normal form we define the price of dependency preservation as an information theoretic measure of redundancy that gets introduced to compensate for dependency preservation this is number in the range the smaller it is the less redundancy normal form guarantees we prove that for every dependency preserving normal form the price of dependency preservation is at least and it is precisely for nf hence nf has the least amount of redundancy among all dependency preserving normal forms we also show that information theoretically unnormalized schemas have at least twice the amount of redundancy as schemas in nf
this paper focuses on the integration of the also integrated declarative paradigms of functional logic and fuzzy logic programming in order to obtain richer and much more expressive programming scheme where mathematical functions cohabit with fuzzy logic features our final goal is to achieve fully integrated language amalgamating powerful programming resources from both worlds including an efficient lazy evaluation mechanism non determinism similarity relations and so on starting with two representative languages from both settings namely curry and likelog we propose hybrid dialect where set of rewriting rules associated to the functional logic dimension of the language are accompanied with set of similarity equations between symbols of the same nature and arity which represents the fuzzy counterpart of the novel framework then we directly act inside the kernel of the operational mechanism of the language thus obtaining fuzzy variant of needed narrowing which safely deals with similarity relations key point in the design of this last operational method is that in contrast with the crisp case the fuzzified version of the function which determines the set of tuples that enable new narrowing steps for given goal say must explicitly take care of binding variables belonging to both goals and program rules this action is crucial to model the notion of needed narrowing with similarity relations step by means of transition system whose final states are triples of the form which collect the three relevant components of the new notion of fuzzy computed answer finally we prove that the resulting strategy verifies that apart from computing at least the same elements of the crisp case crispness all similar terms of given goal are completely treated too by fully exploiting the similarities collected in given program fuzziness while avoiding the risk of infinite loops associated to the intrinsic reflexive symmetric and transitive properties of similarity relations termination
we have witnessed great interest and wealth of promise in content based image retrieval as an emerging technology while the last decade laid foundation to such promise it also paved the way for large number of new techniques and systems got many new people involved and triggered stronger association of weakly related fields in this article we survey almost key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation and in the process discuss the spawning of related subfields we also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world in retrospect of what has been achieved so far we also conjecture what the future may hold for image retrieval research
in current organizations valuable enterprise knowledge is often buried under rapidly expanding huge amount of unstructured information in the form of web pages blogs and other forms of human text communications we present novel unsupervised machine learning method called corder community relation discovery by named entity recognition to turn these unstructured data into structured information for knowledge management in these organizations corder exploits named entity recognition and co occurrence data to associate individuals in an organization with their expertise and associates we discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments in an expert evaluation quantitative benchmarking and an application of corder in social networking tool called buddyfinder
in the last years several machine learning approaches have been developed for classification and regression in an intuitive manner we introduce the main ideas of classification and regression trees support vector machines bagging boosting and random forests we discuss differences in the use of machine learning in the biomedical community and the computer sciences we propose methods for comparing machines on sound statistical basis data from the german stroke study collaboration is used for illustration we compare the results from learning machines to those obtained by published logistic regression and discuss similarities and differences
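for readers who want to reproduce this kind of comparison on public data, a minimal scikit-learn sketch follows; the bundled breast cancer dataset is only a stand-in for the (non public) german stroke study data, and the default hyperparameters and 10-fold scheme are illustrative choices rather than the authors' protocol

```python
# A small scikit-learn sketch of a cross-validated comparison of the methods
# named above: classification trees, SVM, bagging, boosting, random forests
# and a logistic regression baseline, all scored on the same folds.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset
models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(gamma="scale"),
    "bagging": BaggingClassifier(random_state=0),
    "boosting": AdaBoostClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=5000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)   # 10-fold CV accuracy
    print(f"{name:20s} {scores.mean():.3f} +/- {scores.std():.3f}")
```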
peer to peer p2p file sharing systems generate major portion of the internet traffic and this portion is expected to increase in the future we explore the potential of deploying proxy caches in different autonomous systems ases with the goal of reducing the cost incurred by internet service providers and alleviating the load on the internet backbone we conduct an eight month measurement study to analyze the p2p traffic characteristics that are relevant to caching such as object popularity popularity dynamics and object size our study shows that the popularity of p2p objects can be modeled by mandelbrot zipf distribution and that several workloads exist in p2p traffic guided by our findings we develop novel caching algorithm for p2p traffic that is based on object segmentation and proportional partial admission and eviction of objects our trace based simulations show that with relatively small cache size byte hit rate of up to can be achieved by our algorithm which is close to the byte hit rate achieved by an off line optimal algorithm with complete knowledge of future requests our results also show that our algorithm achieves byte hit rate that is at least more and at most triple the byte hit rate of the common web caching algorithms furthermore our algorithm is robust in face of aborted downloads which is common case in p2p systems
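the mandelbrot zipf popularity model mentioned above has the standard form p(i) proportional to 1/(i+q)^alpha for rank i; the short python sketch below uses illustrative parameter values, not the values measured in the study, to show how the flattened head of this distribution reduces the payoff of caching only the most popular objects

```python
# Sketch of the Mandelbrot-Zipf popularity model: p(i) = K / (i + q)**alpha,
# where q >= 0 flattens the head of the classic Zipf curve. The alpha, q and
# catalog size below are illustrative, not the paper's measured parameters.
import numpy as np

def mandelbrot_zipf(n_objects, alpha, q):
    ranks = np.arange(1, n_objects + 1)
    weights = 1.0 / (ranks + q) ** alpha
    return weights / weights.sum()          # normalized popularity p(i)

p = mandelbrot_zipf(n_objects=100_000, alpha=0.6, q=25)
top = 1_000
# With a flattened head, the top-ranked objects attract a smaller share of
# requests than plain Zipf would suggest, which motivates partial caching.
print(f"share of requests to the top {top} objects: {p[:top].sum():.2%}")
```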
we study the effects of feature selection and human feedback on features in active learning settings our experiments on variety of text categorization tasks indicate that there is significant potential in improving classifier performance by feature reweighting beyond that achieved via selective sampling alone standard active learning if we have access to an oracle that can point to the important most predictive features consistent with previous findings we find that feature selection based on the labeled training set has little effect but our experiments on human subjects indicate that human feedback on feature relevance can identify sufficient proportion of the most relevant features furthermore these experiments show that feature labeling takes much less about th time than document labeling we propose an algorithm that interleaves labeling features and documents which significantly accelerates active learning
this paper proposes generic framework for monitoring continuous spatial queries over moving objects the framework distinguishes itself from existing work by being the first to address the location update issue and to provide common interface for monitoring mixed types of queries based on the notion of safe region the client location update strategy is developed based on the queries being monitored thus it significantly reduces the wireless communication and query reevaluation costs required to maintain the up to date query results we propose algorithms for query evaluation reevaluation and for safe region computation in this framework enhancements are also proposed to take advantage of two practical mobility assumptions maximum speed and steady movement the experimental results show that our framework substantially outperforms the traditional periodic monitoring scheme in terms of monitoring accuracy and cpu time while achieving close to optimal wireless communication cost the framework also can scale up to large monitoring system and is robust under various object mobility patterns
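a minimal python sketch of the safe-region update idea follows; the rectangular region, the stub server and the update call are illustrative assumptions rather than the framework's actual protocol or safe-region computation

```python
# Sketch of the safe-region update strategy: the server assigns each client a
# region inside which its movement cannot change any monitored query result,
# and the client reports its location only when it leaves that region.
from dataclasses import dataclass

@dataclass
class SafeRegion:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

class Client:
    def __init__(self, region: SafeRegion):
        self.region = region
    def on_move(self, x, y, send_update):
        """Called on every positioning fix; contacts the server only on exit."""
        if not self.region.contains(x, y):
            self.region = send_update(x, y)   # server returns a new safe region

# Stub server that simply recenters a fixed-size region on the reported client.
def server_update(x, y, half=5.0):
    return SafeRegion(x - half, y - half, x + half, y + half)

c = Client(SafeRegion(0, 0, 10, 10))
for pos in [(2, 3), (9, 9), (12, 4)]:       # only the last fix triggers an update
    c.on_move(*pos, server_update)
```

the saving comes from suppressing all fixes that stay inside the region, which is why the wireless communication and reevaluation costs drop compared to periodic reporting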
software repository place where reusable components are stored and searched for is key ingredient for instituting and popularizing software reuse it is vital that software repository should be well organized and provide efficient tools for developers to locate reusable components that meet their requirements the growing hierarchical self organizing map ghsom an unsupervised learning neural network is powerful data mining technique for the clustering and visualization of large and complex data sets the resulting maps serving as retrieval interfaces can be beneficial to developers in obtaining better insight into the structure of software repository and increasing their understanding of the relationships among software components the ghsom which is an improvement over the basic self organizing map som can adapt its architecture during its learning process and expose the hierarchical structure that exists in the original data in this paper we demonstrate the potential of the ghsom for the organization and visualization of collection of reusable components stored in software repository and compare the results with the ones obtained by using the traditional som
in this paper we study model for ad hoc networks close enough to reality as to represent existing networks being at the same time concise enough to promote strong theoretical results the quasi unit disk graph model contains all edges shorter than parameter between and and no edges longer than we show that in comparison to the cost known on unit disk graphs the complexity results in this model contain the additional factor we prove that in quasi unit disk graphs flooding is an asymptotically message optimal routing technique provide geometric routing algorithm being more efficient above all in dense networks and show that classic geometric routing is possible with the same performance guarantees as for unit disk graphs if
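the model can be made concrete with a short python sketch; the parameter name d, the probabilistic treatment of the intermediate distance band and the sample values below are assumptions used for illustration only

```python
# Sketch of the quasi unit disk graph model: for nodes in the plane and a
# parameter d in (0, 1], pairs closer than d are always connected, pairs
# farther than 1 are never connected, and pairs in between may or may not
# be -- that adversarial freedom is modeled here by a caller-supplied predicate.
import math
import random

def quasi_udg(points, d, maybe_edge=lambda u, v: random.random() < 0.5):
    edges = set()
    for i, p in enumerate(points):
        for j, q in enumerate(points[:i]):
            dist = math.hypot(p[0] - q[0], p[1] - q[1])
            if dist <= d or (dist <= 1.0 and maybe_edge(i, j)):
                edges.add((j, i))
    return edges

random.seed(0)
pts = [(random.random() * 3, random.random() * 3) for _ in range(50)]
print(len(quasi_udg(pts, d=0.7)), "edges")
```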
performance and energy consumption behavior of embedded applications are increasingly being dependent on their memory usage access patterns focusing on software managed application specific multi level memory hierarchy this paper studies three different memory hierarchy management schemes from both energy and performance angles the first scheme is pure performance oriented and tuned for extracting the maximum performance possible from the software managed multi level memory hierarchy the second scheme is built upon the first one but it also reduces leakage by turning on and off memory modules ie different memory levels at appropriate program points during execution based on the data access pattern information extracted by the compiler the last scheme evaluated is oriented towards further reducing leakage energy as well as dynamic energy by modifying the data transfer policy data access pattern of the performance oriented scheme our empirical analysis indicates that it is possible to reduce leakage consumption of the application specific multi level memory hierarchy without seriously impacting its performance and that one can achieve further savings by modifying data transfer pattern across the different levels of the memory hierarchy
we describe dstm java software library that provides flexible framework for implementing object based software transactional memory stm the library uses transactional factories to transform sequential unsynchronized classes into atomic transactionally synchronized ones providing substantial improvement over the awkward programming interface of our previous dstm library furthermore researchers can experiment with alternative stm mechanisms by providing their own factories we demonstrate this flexibility by presenting two factories one that uses essentially the same mechanisms as the original dstm with some enhancements and another that uses completely different approach because dstm is packaged as java library wide range of programmers can easily try it out and the community can begin to gain experience with transactional programming furthermore researchers will be able to use the body of transactional programs that arises from this community experience to test and evaluate different stm mechanisms simply by supplying new transactional factories we believe that this flexible approach will help to build consensus about the best ways to implement transactions and will avoid the premature lock in that may arise if stm mechanisms are baked into compilers before such experimentation is done
languages supporting polymorphism typically have ad hoc restrictions on where polymorphic types may occur supporting first class polymorphism by lifting those restrictions is obviously desirable but it is hard to achieve this without sacrificing type inference we present new type system for higher rank and impredicative polymorphism that improves on earlier proposals it is an extension of damas milner it relies only on system types it has simple declarative specification it is robust to program transformations and it enjoys complete and decidable type inference algorithm
materialized views defined over distributed data sources are critical for many applications to ensure efficient access reliable performance and high availability materialized views need to be maintained upon source updates since stale view extents may not serve well or may even mislead user applications thus view maintenance performance is one of the keys to the success of these applications in this work we investigate two maintenance strategies extended batching and view graph transformation for maintaining general join views where join conditions may exist between any pairs of data sources possibly with cycles many choices are available for maintaining cyclic join views we thus propose cost driven view maintenance framework which generates optimized maintenance plans tuned to the environmental settings the proposed framework has been implemented in the txnwrap system experimental studies illustrate that our proposed optimization techniques significantly improve the view maintenance performance in distributed environment
in this paper we consider novel scheme referred to as cartesian contour to concisely represent the collection of frequent itemsets different from the existing works this scheme provides complete view of these itemsets by covering the entire collection of them more interestingly it takes first step in deriving generative view of the frequent pattern formulation ie how small number of patterns interact with each other and produce the complexity of frequent itemsets we perform theoretical investigation of the concise representation problem and link it to the biclique set cover problem and prove its np hardness we develop novel approach utilizing the technique developed in frequent itemset mining set cover and max cover to approximate the minimal biclique set cover problem in addition we consider several heuristic techniques to speedup the construction of cartesian contour the detailed experimental study demonstrates the effectiveness and efficiency of our approach
recent research on the human visual system shows that our perception of object shape relies in part on compression and stretching of the reflected lighting environment onto its surface we use this property to enhance the shape depiction of objects by locally warping the environment lighting around main surface features contrary to previous work which require specific illumination material characteristics and or stylization choices our approach enhances surface shape without impairing the desired appearance thanks to our novel local shape descriptor salient surface features are explicitly extracted in view dependent fashion at various scales without the need of any pre process we demonstrate our system on variety of rendering settings using object materials ranging from diffuse to glossy to mirror or refractive with direct or global illumination and providing styles that range from photorealistic to non photorealistic the warping itself is very fast to compute on modern graphics hardware enabling real time performance in direct illumination scenarios
fast and simultaneous retrieval of aggregate sums or averages from multiple regions in wireless sensor network can be achieved by constructing distributed data cube ddc however the prior work focused on maintaining ddc by globally synchronous protocol which is not flexible for large scale sensor network in this paper we propose more general ddc gddc which supports asynchronous ddc updates by using the proposed gddc only nodes need to be visited to compute an aggregate sum or average query over rectangular region with nodes first we develop the fundamental semantics for aggregate queries in system model without synchronized clock second we define the concept of consistency and derive set of theorems to guarantee correct query results third we design new distributed algorithms to implement gddc finally we evaluate the proposed techniques by extensive experiments many interesting impact factors of query accuracy have also been analyzed
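to show why a rectangular aggregate needs only a few cube values, here is a centralized prefix-sum analogue of the data-cube idea in python; the distributed and asynchronous maintenance that gddc actually contributes is not modeled, and the grid size is illustrative

```python
# Centralized analogue of the data-cube idea behind (g)ddc: if each grid cell
# stores the prefix sum of the rectangle between the origin and itself, then
# any rectangular aggregate needs only a handful of "corner" prefix values,
# which is why only a few nodes have to be visited per query.
import numpy as np

readings = np.random.rand(8, 8)              # one reading per sensor node
prefix = readings.cumsum(axis=0).cumsum(axis=1)

def rect_sum(r1, c1, r2, c2):
    """Sum of readings over rows r1..r2 and columns c1..c2 (inclusive)."""
    total = prefix[r2, c2]
    if r1 > 0:
        total -= prefix[r1 - 1, c2]
    if c1 > 0:
        total -= prefix[r2, c1 - 1]
    if r1 > 0 and c1 > 0:
        total += prefix[r1 - 1, c1 - 1]
    return total

assert np.isclose(rect_sum(2, 3, 5, 6), readings[2:6, 3:7].sum())
```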
nowadays most of the energy aware real time scheduling algorithms belong to the dvfs dynamic voltage and frequency scaling framework these dvfs algorithms are usually efficient but in addition to often considering unrealistic assumptions they do not take into account the current evolution of the processor energy consumption profiles in this paper we propose an alternative to the dvfs framework which preserves energy while considering the emerging technologies we introduce dual cpu type multiprocessor platform model compatible with any general purpose processor and non dvfs associated methodology which considerably simplifies the energy aware real time scheduling problem while providing significant energy savings
spatial relationships between objects are important features for designing content based image retrieval system in this paper we propose new scheme called spa representation for encoding the spatial relations in an image with this representation important functions of intelligent image database systems such as visualization browsing spatial reasoning iconic indexing and similarity retrieval can be easily achieved the capability of discriminating images based on spa representation is much more powerful than any spatial representation method based on minimum bounding rectangles or centroids of objects the similarity measures using spa representation provide wide range of fuzzy matching capability in similarity retrieval to meet different user’s requirements experimental results showed that our system is very effective in terms of recall and precision in addition the spa representation can be incorporated into two level index structure to help reduce the search space of each query processing the experimental results also demonstrated that on average only percent to percent of symbolic pictures depending on various degrees of similarity were accessed per query in an image database containing symbolic pictures
refactorings are behaviour preserving program transformations typically for improving the structure of existing code few of these transformations have been mechanised in interactive development environments many more refactorings have been proposed and it would be desirable for programmers to script their own refactorings implementing such source to source transformations however is quite complex even the most sophisticated development environments contain significant bugs in their refactoring tools we present domain specific language for refactoring named jungl it manipulates graph representation of the program all information about the program including asts for its compilation units variable binding control flow and so on is represented in uniform graph format the language is hybrid of functional language in the style of ml and logic query language akin to datalog jungl furthermore has notion of demand driven evaluation for constructing computed information in the graph such as control flow edges borrowing from earlier work on the specification of compiler optimisations jungl uses so called path queries to express dataflow properties we motivate the design of jungl via number of non trivial refactorings and describe its implementation on the .net platform
we present sampling based method for approximating the boundary of geometry defined by various geometric operations based on novel adaptive sampling condition we first construct volumetric grids such that an error minimizing point can be found in each cell to capture all the geometric objects inside the cell we then construct polygonal model from the grid we guarantee the boundary approximation has the same topology as the exact surfaces and the maximum approximation error from the exact surfaces is bounded by user specified tolerance our method is robust and easy to implement we have applied it in various applications such as remeshing of polygonal models boolean operations and offsetting operations we report experimental results on variety of cad models
spectral clustering algorithm has been shown to be more effective in finding clusters than most traditional algorithms however spectral clustering suffers from scalability problem in both memory use and computational time when dataset size is large to perform clustering on large datasets we propose to parallelize both memory use and computation on distributed computers through an empirical study on large document dataset of data instances and large photo dataset of we demonstrate that our parallel algorithm can effectively alleviate the scalability problem
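a serial numpy sketch of the standard spectral clustering pipeline is given below to make the scalability bottlenecks concrete; it follows the usual normalized-laplacian formulation and is not the paper's parallel implementation, whose point is precisely to distribute the affinity matrix and the eigensolve

```python
# Serial sketch of the spectral clustering pipeline whose memory and compute
# costs motivate parallelization: the dense n x n affinity matrix and the
# eigendecomposition of the normalized Laplacian stop scaling as n grows.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # dense affinity matrix: O(n^2) memory -- the scalability bottleneck
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L = np.eye(len(X)) - A / np.sqrt(np.outer(d, d))     # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                          # O(n^3) eigensolve
    U = vecs[:, :k]                                      # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

labels = spectral_clustering(np.random.rand(300, 5), k=3)
```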
local search is increasingly becoming major focus point of research interest it is widely recognized speciality search with large application area its data is usually aggregated from variety of sources one as yet largely untapped source of location data is the www today the web does not explicitly reveal its location relation rather this information is hidden somewhere within pages contents to exploit such location information we need to find extract and geo spatially index relevant web pages for an effective retrieval of such content this paper examines the application of focused web crawling to the geospatial domain we describe our approach for geo aware focused crawling of urban areas and other regions with high building density we present our experimental results that give us insight into spatial web information such as location density and link distance between topical pages our crawls and evaluations back our hypothesis that geospatially focused crawling is suitable for the urban geospatial topic
we study routing and scheduling in packet switched networks we assume an adversary that controls the injection time source and destination for each packet injected set of paths for these packets is admissible if no link in the network is overloaded we present the first on line routing algorithm that finds set of admissible paths whenever this is feasible our algorithm calculates path for each packet as soon as it is injected at its source using simple shortest path computation the length of link reflects its current congestion we also show how our algorithm can be implemented under today’s internet routing paradigms when the paths are known either given by the adversary or computed as above our goal is to schedule the packets along the given paths so that the packets experience small end to end delays the best previous delay bounds for deterministic and distributed scheduling protocols were exponential in the path length in this article we present the first deterministic and distributed scheduling protocol that guarantees polynomial end to end delay for every packet finally we discuss the effects of combining routing with scheduling we first show that some unstable scheduling protocols remain unstable no matter how the paths are chosen however the freedom to choose paths can make difference for example we show that ring with parallel links is stable for all greedy scheduling protocols if paths are chosen intelligently whereas this is not the case if the adversary specifies the paths
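the congestion-aware shortest-path idea can be sketched in a few lines of python; the exponential length function, the networkx grid and the capacity value below are illustrative assumptions, not the paper's exact algorithm or its admissibility guarantee

```python
# Illustrative skeleton of the routing idea: when a packet is injected, give
# every link a length that grows with its current congestion and route the
# packet along a shortest path under those lengths. The exponential length
# function is a common choice in this line of work, not necessarily the
# paper's exact one.
import networkx as nx

def route_packet(G, src, dst, capacity, beta=2.0):
    for u, v, data in G.edges(data=True):
        load = data.get("load", 0)
        data["length"] = beta ** (load / capacity)   # congestion-aware length
    path = nx.shortest_path(G, src, dst, weight="length")
    for u, v in zip(path, path[1:]):                 # commit the packet's load
        G[u][v]["load"] = G[u][v].get("load", 0) + 1
    return path

G = nx.grid_2d_graph(4, 4)
paths = [tuple(route_packet(G, (0, 0), (3, 3), capacity=5)) for _ in range(20)]
print(len(set(paths)), "distinct routes used for 20 packets")   # load spreads out
```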
as feature sizes shrink closer to single digit nanometer dimensions defect tolerance will become increasingly important this is true whether the chips are manufactured using top down methods such as photolithography or bottom up assembly processes such as chemically assembled electronic nanotechnology caen in this chapter we examine the consequences of this increased rate of defects and describe defect tolerance methodology centered around reconfigurable devices scalable testing method and dynamic place and route we summarize some of our own results in this area as well as those of others and enumerate some future research directions required to make nanometer scale computing reality
one of the major concerns in requirements engineering is to establish that the whys of the system to be developed fit the whats of the delivered system the aim is to ensure best fit between organisation needs whys and system functionality whats however systems once developed undergo changes and it is of prime importance that the changed need and the changed system functionality continue to preserve the best fit we explore the fitness relationship to reveal its nature and its engineering process we identify major issues that must be addressed in this process to arrive at the best fit we also consider the preservation of this relationship in the face of change and discuss some issues specific to this scenario the results presented are founded in our experience in about dozen industrial and research european projects
this paper aims to provide quantitative understanding of the performance of image and video processing applications on general purpose processors without and with media isa extensions we use detailed simulation of benchmarks to study the effectiveness of current architectural features and identify future challenges for these workloads our results show that conventional techniques in current processors to enhance instruction level parallelism ilp provide factor of to performance improvement the sun vis media isa extensions provide an additional to performance improvement the ilp features and media isa extensions significantly reduce the cpu component of execution time making of the image processing benchmarks memory bound the memory behavior of our benchmarks is characterized by large working sets and streaming data accesses increasing the cache size has no impact on of the benchmarks the remaining benchmarks require relatively large cache sizes dependent on the display sizes to exploit data reuse but derive less than performance benefits with the larger caches software prefetching provides to performance improvement in the image processing benchmarks where memory is significant problem with the addition of software prefetching all our benchmarks revert to being compute bound
in graphical user interface physical layout and abstract structure are two important aspects of graph this article proposes new graph grammar formalism which integrates both the spatial and structural specification mechanisms in single framework this formalism is equipped with parser that performs in polynomial time with an improved parsing complexity over its nonspatial predecessor that is the reserved graph grammar with the extended expressive power the formalism is suitable for many user interface applications the article presents its application in adaptive web design and presentation
tracers provide users with useful information about program executions in this article we propose tracer driver from single tracer it provides powerful front end enabling multiple dynamic analysis tools to be easily implemented while limiting the overhead of the trace generation the relevant execution events are specified by flexible event patterns and large variety of trace data can be given either systematically or on demand the proposed tracer driver has been designed in the context of constraint logic programming clp experiments have been made within gnu prolog execution views provided by existing tools have been easily emulated with negligible overhead experimental measures show that the flexibility and power of the described architecture lead to good performance the tracer driver overhead is inversely proportional to the average time between two traced events whereas the principles of the tracer driver are independent of the traced programming language it is best suited for high level languages such as clp where each traced execution event encompasses numerous low level execution steps furthermore clp is especially hard to debug the current environments do not provide all the useful dynamic analysis tools they can significantly benefit from our tracer driver which enables dynamic analyses to be integrated at very low cost
graphical user interfaces guis are one of the most commonly used parts of today’s software despite their ubiquity testing guis for functional correctness remains an understudied area typical gui gives many degrees of freedom to an end user leading to an enormous input event interaction space that needs to be tested gui test designers generate and execute test cases modeled as sequences of user events to traverse its parts targeting subspace in order to maximize fault detection is nontrivial task in this vein in previous work we used informal gui code examination and personal intuition to develop an event interaction graph eig in this article we empirically derive the eig model via pilot study and the resulting eig validates our intuition used in previous work the empirical derivation process also allows for model evolution as our understanding of gui faults improves results of the pilot study show that events interact in complex ways gui’s response to an event may vary depending on the context established by preceding events and their execution order the eig model helps testers to understand the nature of interactions between gui events when executed in test cases and why certain events detect faults so that they can better traverse the event space new test adequacy criteria are defined for the eig new algorithms use these criteria and eig to systematically generate test cases that are shown to be effective on four fielded open source applications
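A hedged sketch of how an event interaction graph can drive test-case generation: one short test per EIG edge, optionally extended along outgoing edges. The toy graph, the greedy extension, and the event names are illustrative stand-ins for the authors' criteria and tooling.

```python
# Event-interaction-graph test generation sketch.
def edge_covering_tests(eig):
    """One two-event test case per EIG edge (event-interaction coverage)."""
    return [(e1, e2) for e1, followers in sorted(eig.items()) for e2 in sorted(followers)]

def extend(eig, seq, length):
    """Greedily extend a seed sequence along EIG edges up to the target length."""
    seq = list(seq)
    while len(seq) < length:
        followers = sorted(eig.get(seq[-1], ()))
        if not followers:
            break
        seq.append(followers[0])
    return tuple(seq)

if __name__ == "__main__":
    eig = {"open": {"edit", "close"}, "edit": {"save", "close"},
           "save": {"close"}, "close": set()}
    for seed in edge_covering_tests(eig):
        print(extend(eig, seed, 3))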
in this paper we propose an authenticated encryption mode for blockciphers our authenticated encryption mode cip has provable security bounds which are better than the usual birthday bound security besides the proven security bound for authenticity of cip is better than any of the previously known schemes the design is based on the encrypt then prf approach where the encryption part uses key stream generation of cenc and the prf part combines hash function based on the inner product and blockcipher
whanau is novel routing protocol for distributed hash tables dhts that is efficient and strongly resistant to the sybil attack whanau uses the social connections between users to build routing tables that enable sybil resistant lookups the number of sybils in the social network does not affect the protocol’s performance but links between honest users and sybils do when there are well connected honest nodes whanau can tolerate up to log such attack edges this means that an adversary must convince large fraction of the honest users to make social connection with the adversary’s sybils before any lookups will fail whanau uses ideas from structured dhts to build routing tables that contain log entries per node it introduces the idea of layered identifiers to counter clustering attacks class of sybil attacks challenging for previous dhts to handle using the constructed tables lookups provably take constant time simulation results using social network graphs from livejournal flickr youtube and dblp confirm the analytic results experimental results on planetlab confirm that the protocol can handle modest churn
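A rough sketch of the core sampling idea: routing-table entries are endpoints of short random walks on the social graph, so with few attack edges most samples land in the honest region. Walk length, table size, and the graph layout are illustrative; layered identifiers and the full lookup protocol are omitted.

```python
# Social-network random-walk sampling sketch for sybil-resistant routing tables.
import random

def random_walk(social_graph, start, length):
    node = start
    for _ in range(length):
        node = random.choice(social_graph[node])  # uniform neighbor step
    return node

def build_table(social_graph, me, entries, walk_length):
    # Each entry is the endpoint of an independent short random walk.
    return [random_walk(social_graph, me, walk_length) for _ in range(entries)]
```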
the increasing complexity of heterogeneous systems on chip soc and distributed embedded systems makes system optimization and exploration challenging task ideally designer would try all possible system configurations and choose the best one regarding specific system requirements unfortunately such an approach is not possible because of the tremendous number of design parameters with sophisticated effects on system properties consequently good search techniques are needed to find design alternatives that best meet constraints and cost criteria in this paper we present compositional design space exploration framework for system optimization and exploration using symta software tool for formal performance analysis in contrast to many previous approaches pursuing closed automated exploration strategies over large sets of system parameters our approach allows the designer to effectively control the exploration process to quickly find good design alternatives an important aspect and key novelty of our approach is system optimization with traffic shaping
we present streaming algorithm for reconstructing closed surfaces from large non uniform point sets based on geometric convection technique assuming that the sample points are organized into slices stacked along one coordinate axis triangle mesh can be efficiently reconstructed in streamable layout with controlled memory footprint our algorithm associates streaming delaunay triangulation data structure with multilayer version of the geometric convection algorithm our method can process millions of sample points at the rate of points per minute with mb of main memory
support vector regression svr solves regression problems based on the concept of support vector machine svm introduced by vapnik the main drawback of these newer techniques is their lack of interpretability in other words it is difficult for the human analyst to understand the knowledge learnt by these models during training the most popular way to overcome this difficulty is to extract if then rules from svm and svr rules provide explanation capability to these models and improve the comprehensibility of the system over the last decade different algorithms for extracting rules from svm have been developed however rule extraction from svr is not widely available yet in this paper novel hybrid approach for extracting rules from svr is presented the proposed hybrid rule extraction procedure has two phases obtain the reduced training set in the form of support vectors using svr train the machine learning techniques with explanation capability using the reduced training set machine learning techniques viz classification and regression tree cart adaptive network based fuzzy inference system anfis and dynamic evolving fuzzy inference system denfis are used in the phase the proposed hybrid rule extraction procedure is compared to stand alone cart anfis and denfis extensive experiments are conducted on five benchmark data sets viz auto mpg body fat boston housing forest fires and pollution to demonstrate the effectiveness of the proposed approach in generating accurate regression rules the efficiency of these techniques is measured using root mean squared error rmse from the results obtained it is concluded that when the support vectors with the corresponding predicted target values are used the svr based hybrids outperform the stand alone intelligent techniques and also the case when the support vectors with the corresponding actual target values are used
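A hedged two-phase sketch of the hybrid described above, using scikit-learn stand-ins: phase one trains an SVR and keeps its support vectors with the SVR's predicted targets, phase two fits an interpretable tree on that reduced set. CART stands in for the CART/ANFIS/DENFIS choices; the dataset is synthetic.

```python
# SVR-to-tree hybrid rule extraction sketch (synthetic data, illustrative params).
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)    # phase 1: reduce to SVs
X_sv = svr.support_vectors_
y_sv = svr.predict(X_sv)                                  # predicted target values

tree = DecisionTreeRegressor(max_depth=3).fit(X_sv, y_sv) # phase 2: explainable model
print(export_text(tree, feature_names=["x0", "x1"]))      # extracted if-then rules
```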
networks of workstations nows which are generally composed of autonomous compute elements networked together are an attractive parallel computing platform since they offer high performance at low cost the autonomous nature of the environment however often results in inefficient utilization due to load imbalances caused by three primary factors unequal load compute or communication assignment to equally powerful compute nodes unequal resources at compute nodes and multiprogramming these load imbalances result in idle waiting time on cooperating processes that need to synchronize or communicate data additional waiting time may result due to local scheduling decisions in multiprogrammed environment in this paper we present combined approach of compile time analysis run time load distribution and operating system scheduler cooperation for improved utilization of available resources in an autonomous now the techniques we propose allow efficient resource utilization by taking into consideration all three causes of load imbalance in addition to locality of access in the process of load distribution the resulting adaptive load distribution and cooperative scheduling system allows applications to take advantage of parallel resources when available by providing better performance than when the loaded resources are not used at all
in eclipse and in most other development environments refactorings are activated by selecting code then using menu or hotkey and finally engaging in dialog with wizard however selection is error prone menus are slow hotkeys are hard to remember and wizards are time consuming the problem is that as consequence refactoring tools disrupt the programmer’s workflow and are perceived to be slower than refactoring by hand in this paper we present two new user interfaces to eclipse’s existing refactoring engine marking menus and refactoring cues both are designed to increase programming velocity by keeping the tool out of the programmer’s way
it is predicted that most computing and data storage will be done by cloud computing in the future tendency to use cloud services and changes in it world to become service based will be inevitable in future this leading change toward cloud computing will be great movement in it dependent industries one of the main parts of this variation is the usage of cloud platforms this method will affect software engineering events in software production process cloud platform let developers write programs which can both run in cloud space and use available services provided in cloud space in this paper survey on cloud platforms their arrangements foundation and infrastructure services and their main capabilities used in some leading software companies is presented
we develop distributed algorithms for adaptive sensor networks that respond to directing target through region of space we model this problem as an online distributed motion planning problem each sensor node senses values in its perception space and has the ability to trigger exceptions events we call "danger" and model as "obstacles" the danger obstacle landscape changes over time we present algorithms for computing distributed maps in perception space and for using these maps to compute adaptive paths for mobile node that can interact with the sensor network we give an analysis of the protocol and report on hardware experiments using physical sensor network consisting of mote sensors we also show how to reduce searching space and communication cost using voronoi diagram
server driven consistency protocols can reduce read latency and improve data freshness for given network and server overhead compared to the traditional consistency protocols that rely on client polling server driven consistency protocols appear particularly attractive for large scale dynamic web workloads because dynamically generated data can change rapidly and unpredictably however there have been few reports on engineering server driven consistency for such workloads this article reports our experience in engineering server driven consistency for sporting and event web site hosted by ibm one of the most popular sites on the internet for the duration of the event we also examine an e-commerce site for national retail store our study focuses on scalability and cachability of dynamic content to assess scalability we measure both the amount of state that server needs to maintain to ensure consistency and the bursts of load in sending out invalidation messages when popular object is modified we find that server driven protocols can cap the size of the server’s state to given amount without significant performance costs and can smooth the bursts of load with minimal impact on the consistency guarantees to improve performance we systematically investigate several design issues for which prior research has suggested widely different solutions including whether servers should send invalidations to idle clients finally we quantify the performance impact of caching dynamic data with server driven consistency protocols and the benefits of server driven consistency protocols for large scale dynamic web services we find that caching dynamically generated data can increase cache hit rates by up to percent compared to the systems that do not cache dynamically generated data and ii server driven consistency protocols can increase cache hit rates by factor of for large scale dynamic web services compared to client polling protocols we have implemented prototype of server driven consistency protocol based on our findings by augmenting the popular squid cache
in this paper we investigate the problem of training support vector machines svms on count data multinomial dirichlet mixture models allow us to model efficiently count data on the other hand svms permit good discrimination we propose then hybrid model that appropriately combines their advantages finite mixture models are introduced as an svm kernel to incorporate prior knowledge about the nature of data involved in the problem at hand for the learning of our mixture model we propose deterministic annealing component wise em algorithm mixed with minimum description length type criterion in the context of this model we compare different kernels through some applications involving spam and image database categorization we find that our data driven kernel performs better
text message stream is newly emerging type of web data which is produced in enormous quantities with the popularity of instant messaging and internet relay chat it is beneficial for detecting the threads contained in the text stream for various applications including information retrieval expert recognition and even crime prevention despite its importance not much research has been conducted so far on this problem due to the characteristics of the data in which the messages are usually very short and incomplete in this paper we present stringent definition of the thread detection task and our preliminary solution to it we propose three variations of single pass clustering algorithm for exploiting the temporal information in the streams an algorithm based on linguistic features is also put forward to exploit the discourse structure information we conducted several experiments to compare our approaches with some existing algorithms on real dataset the results show that all three variations of the single pass algorithm outperform the basic single pass algorithm our proposed algorithm based on linguistic features improves the performance relatively by and when compared with the basic single pass algorithm and the best variation algorithm in terms of respectively
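A minimal single-pass thread-detection sketch: each arriving message joins the most similar existing thread if its similarity, discounted by a time-decay factor in the spirit of the temporal variations mentioned above, clears a threshold, and otherwise starts a new thread. The similarity measure, decay schedule, and thresholds are illustrative assumptions.

```python
# Single-pass clustering of a message stream with temporal decay.
import math

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def single_pass(messages, threshold=0.3, half_life=300.0):
    threads = []  # each: {"centroid": bag of words, "last_time": t, "msgs": [...]}
    for t, bag in messages:                       # messages arrive in time order
        best, best_sim = None, 0.0
        for th in threads:
            decay = 0.5 ** ((t - th["last_time"]) / half_life)
            sim = decay * cosine(bag, th["centroid"])
            if sim > best_sim:
                best, best_sim = th, sim
        if best is not None and best_sim >= threshold:
            for w, c in bag.items():
                best["centroid"][w] = best["centroid"].get(w, 0) + c
            best["last_time"] = t
            best["msgs"].append((t, bag))
        else:
            threads.append({"centroid": dict(bag), "last_time": t, "msgs": [(t, bag)]})
    return threads
```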
although the diversity of platforms for network experimentation is boon to the development of protocols and distributed systems it is challenging to exploit its benefits implementing or adapting the systems under test for such heterogeneous environments as network simulators network emulators testbeds and end systems is immensely time and work intensive in this paper we present vipe unified virtual platform for network experimentation that slashes the porting effort it allows to smoothly evolve single implementation of distributed system or protocol from its design up into its deployment by leveraging any form of network experimentation tool available
this work is focused on presenting split precondition approach for the modeling and proving the correctness of distributed algorithms formal specification and precise analysis of peterson’s distributed mutual exclusion algorithm for two processes has been considered proofs of properties like mutual exclusion liveness and lockout freedom have also been presented
images of objects as queries is new approach to search for information on the web image based information retrieval goes beyond only matching images as information in other modalities also can be extracted from data collections using image search we have developed new system that uses images to search for web based information this paper has particular focus towards exploring user’s experience of general mobile image based web searches to find what issues and phenomena it contains this was achieved in multi part study by creating and letting respondents test prototypes of mobile image based search systems and collecting data using interviews observations video observations and questionnaires we observed that searching for information only based on visual similarity and without any assistance is sometimes difficult especially on mobile devices with limited interaction bandwidth most of our subjects preferred search tool that guides the users through the search result based on contextual information compared to presenting the search result as plain ranked list
keyword search in relational databases rdbs has been extensively studied recently keyword search or keyword query in rdbs is specified by set of keywords to explore the interconnected tuple structures in an rdb that cannot be easily identified using sql on rdbms in brief it finds how the tuples containing the given keywords are connected via sequences of connections foreign key references among tuples in an rdb such interconnected tuple structures can be found as connected trees up to certain size sets of tuples that are reachable from root tuple within radius or even multi center subgraphs within radius in the literature there are two main approaches one is to generate set of relational algebra expressions and evaluate every such expression using sql on an rdbms directly or in middleware on top of an rdbms indirectly due to large number of relational algebra expressions needed to process most of the existing works take middleware approach without fully utilizing rdbmss the other is to materialize an rdb as graph and find the interconnected tuple structures using graph based algorithms in memory in this paper we focus on using sql to compute all the interconnected tuple structures for given keyword query we use three types of interconnected tuple structures to achieve that and we control the size of the structures we show that the current commercial rdbmss are powerful enough to support such keyword queries in rdbs efficiently without any additional new indexing to be built and maintained the main idea behind our approach is tuple reduction in our approach in the first reduction step we prune tuples that do not participate in any results using sql and in the second join step we process the relational algebra expressions using sql over the reduced relations we conducted extensive experimental studies using two commercial rdbmss and two large real datasets and we report the efficiency of our approaches in this paper
future software systems will operate in highly dynamic world systems will need to operate correctly despite unexpected changes in factors such as environmental conditions user requirements technology legal regulations and market opportunities they will have to operate in constantly evolving environment that includes people content electronic devices and legacy systems they will thus need the ability to continuously adapt themselves in an automated manner to react to those changes to realize dynamic self adaptive systems the service concept has emerged as suitable abstraction mechanism together with the concept of the service oriented architecture soa this led to the development of technologies standards and methods to build service based applications by flexibly aggregating individual services this article discusses how those concepts came to be by taking two complementary viewpoints on the one hand it evaluates the progress in software technologies and methodologies that led to the service concept and soa on the other hand it discusses how the evolution of the requirements and in particular business goals influenced the progress towards highly dynamic self adaptive systems finally based on discussion of the current state of the art this article points out the possible future evolution of the field
manifold bootstrapping is new method for data driven modeling of real world spatially varying reflectance based on the idea that reflectance over given material sample forms low dimensional manifold it provides high resolution result in both the spatial and angular domains by decomposing reflectance measurement into two lower dimensional phases the first acquires representatives of high angular dimension but sampled sparsely over the surface while the second acquires keys of low angular dimension but sampled densely over the surface we develop hand held high speed brdf capturing device for phase one measurements condenser based optical setup collects dense hemisphere of rays emanating from single point on the target sample as it is manually scanned over it yielding brdf point measurements per second lighting directions from leds are applied at each measurement these are amplified to full brdf using the general ndf tabulated microfacet model the second phase captures images of the entire sample from fixed view and lit by varying area source we show that the resulting dimensional keys capture much of the distance information in the original brdf space so that they effectively discriminate among representatives though they lack sufficient angular detail to reconstruct the svbrdf by themselves at each surface position local linear combination of small number of neighboring representatives is computed to match each key yielding high resolution svbrdf quick capture session minutes on simple devices yields results showing sharp and anisotropic specularity and rich spatial detail
in this paper we devise data allocation algorithms that can utilize the knowledge of user moving patterns for proper allocation of shared data in mobile computing system by employing the data allocation algorithms devised the occurrences of costly remote accesses can be minimized and the performance of mobile computing system is thus improved the data allocation algorithms for shared data which are able to achieve local optimization and global optimization are developed local optimization refers to the optimization that the likelihood of local data access by an individual mobile user is maximized whereas global optimization refers to the optimization that the likelihood of local data access by all mobile users is maximized specifically by exploring the features of local optimization and global optimization we devise algorithm sd local and algorithm sd global to achieve local optimization and global optimization respectively in general the mobile users are divided into two types namely frequently moving users and infrequently moving users measurement called closeness measure which corresponds to the amount of the intersection between the set of frequently moving user patterns and that of infrequently moving user patterns is derived to assess the quality of solutions provided by sd local and sd global performance of these data allocation algorithms is comparatively analyzed from the analysis of sd local and sd global it is shown that sd local favors infrequently moving users whereas sd global is good for frequently moving users the simulation results show that the knowledge obtained from the user moving patterns is very important in devising effective data allocation algorithms which can lead to prominent performance improvement in mobile computing system
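An illustrative sketch of the local-optimization idea: allocate each shared data item to the cell whose users account for the largest share of its accesses, so most accesses stay local. The access-log format and names are invented for illustration; this is not the paper's SD-local or SD-global pseudocode.

```python
# Local-optimization data allocation sketch from per-item access frequencies.
from collections import Counter

def allocate_local(access_log):
    """access_log: iterable of (data_item, cell) access records."""
    per_item = {}
    for item, cell in access_log:
        per_item.setdefault(item, Counter())[cell] += 1
    # Place each item in the cell that serves the largest share of its accesses.
    return {item: counts.most_common(1)[0][0] for item, counts in per_item.items()}

if __name__ == "__main__":
    log = [("profile", "cellA"), ("profile", "cellA"), ("profile", "cellB"),
           ("inbox", "cellB"), ("inbox", "cellB")]
    print(allocate_local(log))  # {'profile': 'cellA', 'inbox': 'cellB'}
```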
we consider new problem of detecting members of rare class of data the needles which have been hidden in set of records the haystack the only information regarding the characterization of the rare class is single instance of needle it is assumed that members of the needle class are similar to each other according to an unknown needle characterization the goal is to find the needle records hidden in the haystack this paper describes an algorithm for that task and applies it to several example cases
this paper develops the smart object paradigm and its instantiation which provide new conceptualization for the modeling design and development of an important but little researched class of information systems operations support systems oss oss is our term for systems which provide interactive support for the management of large complex operations environments such as manufacturing plants military operations and large power generation facilities the most salient feature of oss is their dynamic nature the number and kind of elements composing the system as well as the mode of control of those elements change frequently in response to the environment the abstraction of control and the ease with which complex dynamic control behavior can be modeled and simulated is one of the important aspects of the paradigm the framework for the smart object paradigm is the fusion of object oriented design models with declarative knowledge representation and active inferencing from ai models additional defining concepts from data knowledge models semantic data models active databases and frame based systems are added to the synthesis as justified by their contribution to the ability to naturally model oss at high level of abstraction the model assists in declaratively representing domain data knowledge and its structure and task or process knowledge in addition to modeling multilevel control and interobject coordination
dictionary based approaches to query translation have been widely used in cross language information retrieval clir experiments however translation has been not only limited by the coverage of the dictionary but also affected by translation ambiguities in this paper we propose novel method of query translation that combines other types of term relation to complement the dictionary based translation this allows extending the literal query translation to related words which produce beneficial effect of query expansion in clir in this paper we model query translation by markov chains mc where query translation is viewed as process of expanding query terms to their semantically similar terms in different language in mc terms and their relationships are modeled as directed graph and query translation is performed as random walk in the graph which propagates probabilities to related terms this framework allows us to incorporate different types of term relation either between two languages or within the source or target languages in addition the iterative training process of mc allows us to attribute higher probabilities to the target terms more related to the original query thus offers solution to the translation ambiguity problem we evaluated our method on three clir benchmark collections and obtained significant improvements over traditional dictionary based approaches
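A sketch of query translation as a random walk on a term graph: probability mass starts on the source-language query terms and is repeatedly propagated along weighted relations (dictionary entries, co-occurrence, and so on). The restart factor, iteration count, and graph encoding are illustrative assumptions, and the graph is assumed to have no dangling terms.

```python
# Random walk with restart over a term graph for query translation/expansion.
import numpy as np

def translate_by_walk(adjacency, terms, query_terms, steps=10, alpha=0.8):
    n = len(terms)
    index = {t: i for i, t in enumerate(terms)}
    # Column-stochastic transition matrix over the term graph.
    M = np.asarray(adjacency, dtype=float)
    M = M / M.sum(axis=0, keepdims=True)
    p0 = np.zeros(n)
    for q in query_terms:
        p0[index[q]] = 1.0 / len(query_terms)
    p = p0.copy()
    for _ in range(steps):
        p = alpha * (M @ p) + (1 - alpha) * p0  # propagate, then restart at the query
    return sorted(zip(terms, p), key=lambda x: -x[1])
```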
the next generations of supercomputers are projected to have hundreds of thousands of processors however as the numbers of processors grow the scalability of applications will be the dominant challenge this forces us to reexamine some of our fundamental ways that we approach the design and use of parallel languages and runtime systems in this paper we show how the globally shared arrays in popular partitioned global address space pgas language unified parallel upc can be combined with new collective interface to improve both performance and scalability this interface allows subsets or teams of threads to perform collective together as opposed to mpi’s communicators our interface allows set of threads to be placed in teams instantly rather than explicitly constructing communicators thus allowing for more dynamic team construction and manipulation we motivate our ideas with three application kernels dense matrix multiplication dense cholesky factorization and multidimensional fourier transforms we describe how the three aforementioned applications can be succinctly written in upc thereby aiding productivity we also show how such an interface allows for scalability by running on up to processors on the blue gene in few lines of upc code we wrote dense matrix multiply routine achieves tflop and fft that achieves tflop we analyze our performance results through models and show that the machine resources rather than the interfaces themselves limit the performance
given the increasing importance of communication teamwork and critical thinking skills in the computing profession we believe there is good reason to provide students with increased opportunities to learn and practice those skills in undergraduate computing courses toward that end we have been exploring studio based instructional methods which have been successfully employed in architecture and fine arts education for over century we have developed an adaptation of studio based instruction for computing education called the pedagogical code review which is modeled after the code inspection process used in the software industry to evaluate its effectiveness we carried out quasi experimental comparison of studio based cs course with pedagogical code reviews and an identical traditional cs course without pedagogical code reviews we found no learning outcome differences between the two courses however we did observe two interesting attitudinal trends self efficacy decreased more in the traditional course than in the studio based course and peer learning decreased in the traditional course but increased in the studio based course additional questionnaire and interview data provide further evidence of the positive impact of studio based instruction
real scale semantic web applications such as knowledge portals and marketplaces require the management of voluminous repositories of resource metadata the resource description framework rdf enables the creation and exchange of metadata as any other web data although large volumes of rdf descriptions are already appearing sufficiently expressive declarative query languages for rdf are still missing we propose rql new query language adapting the functionality of semistructured or xml query languages to the peculiarities of rdf but also extending this functionality in order to uniformly query both rdf descriptions and schemas rql is typed language following functional approach a la oql and relies on formal graph model that permits the interpretation of superimposed resource descriptions created using one or more rdf schemas we illustrate the syntax semantics and type system of rql and report on the performance of rssdb our persistent rdf store for storing and querying voluminous rdf metadata
the past few years have witnessed different scheduling algorithms for processor that can manage its energy usage by scaling dynamically its speed in this paper we attempt to extend such work to the two processor setting specifically we focus on deadline scheduling and study online algorithms for two processors with an objective of maximizing the throughput while using the smallest possible energy the motivation comes from the fact that dual core processors are getting common nowadays our first result is new analysis of the energy usage of the speed function oa with respect to the optimal two processor schedule this immediately implies trivial two processor algorithm that is competitive for throughput and competitive for energy more interesting result is new online strategy for selecting jobs for the two processors together with oa it improves the competitive ratio for throughput from to while increasing that for energy by factor of note that even if the energy usage is not concern no algorithm can be better than competitive with respect to throughput
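A small sketch of the OA ("optimal available") speed function that the analysis above builds on: at any time the processor runs at the highest density of remaining work over any future deadline window. The job encoding is illustrative, and the two-processor job-selection strategy itself is not shown.

```python
# OA speed function sketch: max remaining-work density over future deadlines.
def oa_speed(jobs, t):
    """jobs: list of dicts with 'deadline' and 'remaining' work; t: current time."""
    best = 0.0
    deadlines = sorted({j["deadline"] for j in jobs if j["deadline"] > t})
    for d in deadlines:
        work = sum(j["remaining"] for j in jobs if t < j["deadline"] <= d)
        best = max(best, work / (d - t))
    return best

if __name__ == "__main__":
    jobs = [{"deadline": 4.0, "remaining": 2.0}, {"deadline": 10.0, "remaining": 3.0}]
    print(oa_speed(jobs, 0.0))  # max(2/4, 5/10) = 0.5
```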
in this paper we propose method for learning reordering model for btg based statistical machine translation smt the model focuses on linguistic features from bilingual phrases our method involves extracting reordering examples as well as features such as part of speech and word class from aligned parallel sentences the features are classified with special considerations of phrase lengths we then use these features to train the maximum entropy me reordering model with the model we performed chinese to english translation tasks experimental results show that our bilingual linguistic model outperforms the state of the art phrase based and btg based smt systems by improvements of and bleu points respectively
in this paper we present theory for combining the effects of motion illumination structure albedo and camera parameters in sequence of images obtained by perspective camera we show that the set of all lambertian reflectance functions of moving object at any position illuminated by arbitrarily distant light sources lies close to bilinear subspace consisting of nine illumination variables and six motion variables this result implies that given an arbitrary video sequence it is possible to recover the structure motion and illumination conditions simultaneously using the bilinear subspace formulation the derivation builds upon existing work on linear subspace representations of reflectance by generalizing it to moving objects lighting can change slowly or suddenly locally or globally and can originate from combination of point and extended sources we experimentally compare the results of our theory with ground truth data and also provide results on real data by using video sequences of face and the entire human body with various combinations of motion and illumination directions we also show results of our theory in estimating motion and illumination model parameters from video sequence
structural statistical software testing ssst exploits the control flow graph of the program being tested to construct test cases specifically ssst exploits the feasible paths in the control flow graph that is paths which are actually exerted for some values of the program input the limitation is that feasible paths are massively outnumbered by infeasible ones addressing this limitation this paper presents an active learning algorithm aimed at sampling the feasible paths in the control flow graph the difficulty comes from both the few feasible paths initially available and the nature of the feasible path concept reflecting the long range dependencies among the nodes of the control flow graph the proposed approach is based on frugal representation inspired from parikh maps and on the identification of the conjunctive subconcepts in the feasible path concept within disjunctive version space framework experimental validation on real world and artificial problems demonstrates significant improvements compared to the state of the art
computers offer valuable assistance to people with physical disabilities however designing human computer interfaces for these users is complicated the range of abilities is more diverse than for able bodied users which makes analytical modelling harder practical user trials are also difficult and time consuming we are developing simulator to help with the evaluation of assistive interfaces it can predict the likely interaction patterns when undertaking task using variety of input devices and estimate the time to complete the task in the presence of different disabilities and for different levels of skill in this paper we describe the different components of the simulator in detail and present prototype of its implementation
modern compilers often implement function calls or returns in two steps first "closure" environment is properly installed to provide access for free variables in the target program fragment second the control is transferred to the target by "jump with arguments for results" closure conversion which decides where and how to represent closures at runtime is crucial step in the compilation of functional languages this paper presents new algorithm that exploits the use of compile time control and data flow information to optimize function calls by extensive closure sharing and allocation by and memory fetches for local and global variables by and improves the already efficient code generated by an earlier version of the standard ml of new jersey compiler by about on decstation moreover unlike most other approaches our new closure allocation scheme satisfies the strong safe for space complexity rule thus achieving good asymptotic space usage
multimedia ranking algorithms are usually user neutral and measure the importance and relevance of documents by only using the visual contents and meta data however users interests and preferences are often diverse and may demand different results even with the same queries how can we integrate user interests in ranking algorithms to improve search results here we introduce social network document rank sndocrank new ranking framework that considers searcher’s social network and apply it to video search sndocrank integrates traditional tf idf ranking with our multi level actor similarity mas algorithm which measures the similarity between social networks of searcher and document owners results from our evaluation study with social network and video data from youtube show that sndocrank offers search results more relevant to user’s interests than other traditional ranking methods
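A hedged sketch of combining a content score with a social-network score in the spirit of the scheme above: the final rank mixes tf-idf relevance with a similarity between the searcher's social network and the document owner's. A simple Jaccard over friend sets stands in for the multi-level actor similarity, and the mixing weight is invented.

```python
# Content score + social-network similarity ranking sketch.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def social_rank(tfidf_scores, owner_friends, searcher_friends, beta=0.5):
    """tfidf_scores: {doc: score}; owner_friends: {doc: friend set of its owner}."""
    ranked = []
    for doc, rel in tfidf_scores.items():
        social = jaccard(searcher_friends, owner_friends.get(doc, ()))
        ranked.append((doc, (1 - beta) * rel + beta * social))
    return sorted(ranked, key=lambda x: -x[1])
```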
clocks are mechanism for providing synchronization barriers in concurrent programming languages they are usually implemented using primitive communication mechanisms and thus spare the programmer from reasoning about low level implementation details such as remote procedure calls and error conditions clocks provide flexibility but programs often use them in specific ways that do not require their full implementation in this paper we describe tool that mitigates the overhead of general purpose clocks by statically analyzing how programs use them and choosing optimized implementations when available we tackle the clock implementation in the standard library of the programming language parallel distributed object oriented language we report our findings for small set of analyses and benchmarks our tool only adds few seconds to analysis time making it practical to use as part of compilation chain
networks on chip noc have been widely proposed as the future communication paradigm for use in next generation system on chip in this paper we present nocout methodology for generating an energy optimized application specific noc topology which supports both point to point and packet switched networks the algorithm uses prohibitive greedy iterative improvement strategy to explore the design space efficiently system level floorplanner is used to evaluate the iterative design improvements and provide feedback on the effects of the topology on wire length the algorithm is integrated within noc synthesis framework with characterized noc power and area models to allow accurate exploration for noc router library we apply the topology generation algorithm to several test cases including real world and synthetic communication graphs with both regular and irregular traffic patterns and varying core sizes since the method is iterative it is possible to start with known design to search for improvements experimental results show that many different applications benefit from mix of on chip networks and point to point networks with such hybrid network we achieve approximately lower energy consumption with maximum of than state of the art min cut partition based topology generator for variety of benchmarks in addition the average hop count is reduced by hops which would significantly reduce the network latency
surface editing operations commonly require geometric details of the surface to be preserved as much as possible we argue that geometric detail is an intrinsic property of surface and that consequently surface editing is best performed by operating over an intrinsic surface representation we provide such representation of surface based on the laplacian of the mesh by encoding each vertex relative to its neighborhood the laplacian of the mesh is enhanced to be invariant to locally linearized rigid transformations and scaling based on this laplacian representation we develop useful editing operations interactive free form deformation in region of interest based on the transformation of handle transfer and mixing of geometric details between two surfaces and transplanting of partial surface mesh onto another surface the main computation involved in all operations is the solution of sparse linear system which can be done at interactive rates we demonstrate the effectiveness of our approach in several examples showing that the editing operations change the shape while respecting the structural geometric detail
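A minimal sketch of the Laplacian-editing pipeline described above: encode vertices as differential coordinates with a uniform Laplacian, pin a few handle or anchor vertices at new positions as soft constraints, and recover the deformed mesh from a sparse least-squares solve. A full system adds the rotation- and scale-invariant terms; the uniform weights and soft-constraint weight here are assumptions.

```python
# Laplacian surface editing sketch: preserve delta coordinates subject to anchors.
import numpy as np
from scipy.sparse import lil_matrix, vstack
from scipy.sparse.linalg import lsqr

def laplacian_edit(verts, neighbors, anchors, weight=10.0):
    """verts: (n, 3) array; neighbors: {i: [j, ...]}; anchors: {i: target xyz}."""
    n = len(verts)
    L = lil_matrix((n, n))
    for i, nbrs in neighbors.items():
        L[i, i] = 1.0
        for j in nbrs:
            L[i, j] = -1.0 / len(nbrs)
    L = L.tocsr()
    delta = L @ np.asarray(verts, dtype=float)   # differential coordinates to preserve
    C = lil_matrix((len(anchors), n))            # soft positional constraints
    targets = np.zeros((len(anchors), 3))
    for row, (i, target) in enumerate(anchors.items()):
        C[row, i] = weight
        targets[row] = weight * np.asarray(target, dtype=float)
    A = vstack([L, C.tocsr()])
    b = np.vstack([delta, targets])
    # One sparse least-squares solve per coordinate recovers the deformed mesh.
    return np.column_stack([lsqr(A, b[:, k])[0] for k in range(3)])
```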
we introduce method for generating facial blendshape rigs from set of example poses of cg character our system transfers controller semantics and expression dynamics from generic template to the target blendshape model while solving for an optimal reproduction of the training poses this enables scalable design process where the user can iteratively add more training poses to refine the blendshape expression space however plausible animations can be obtained even with single training pose we show how formulating the optimization in gradient space yields superior results as compared to direct optimization on blendshape vertices we provide examples for both hand crafted characters and scans of real actor and demonstrate the performance of our system in the context of markerless art directable facial tracking
architecture based metrics can provide valuable information on whether or not one can localize the effects of modification such as adjusting data flows or control flows in software and can therefore be used to prevent the changes from adversely affecting other software components this paper proposes an architecture centric metric using entropy for assessing structural dependencies among software components the proposed metric is based on mathematical model representing the maintainability snapshot of system the introduced architectural level metric includes measures for coupling and cohesion from this model the relative maintainability of component referred to as maintainability profile can be developed to identify architectural decisions that are detrimental to the maintainability of system
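An illustrative computation of an entropy-style coupling measure: the more evenly a component's dependencies are spread across other components, the higher its entropy and, in this reading, the harder a change is to localize. The exact formula in the paper may differ; this is a generic Shannon entropy over the outgoing-dependency distribution.

```python
# Entropy over a component's outgoing-dependency distribution.
import math

def dependency_entropy(dep_counts):
    """dep_counts: {target_component: number of dependencies on it}."""
    total = sum(dep_counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in dep_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

if __name__ == "__main__":
    print(dependency_entropy({"ui": 4, "db": 4}))  # 1.0 bit, evenly coupled
    print(dependency_entropy({"ui": 7, "db": 1}))  # lower, coupling more localized
```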
creating an ontology and populating it with data are both labor intensive tasks requiring high degree of expertise thus scaling ontology creation and population to the size of the web in an effort to create web of data which some see as web is prohibitive can we find ways to streamline these tasks and lower the barrier enough to enable web toward this end we offer form based approach to ontology creation that provides way to create web ontologies without the need for specialized training and we offer way to semi automatically harvest data from the current web of pages for web ontology in addition to harvesting information with respect to an ontology the approach also annotates web pages and links facts in web pages to ontological concepts resulting in web of data superimposed over the web of pages experience with our prototype system shows that mappings between conceptual model based ontologies and forms are sufficient for creating the kind of ontologies needed for web and experiments with our prototype system show that automatic harvesting automatic annotation and automatic superimposition of web of data over web of pages work well
web query classification qc aims to classify web users queries which are often short and ambiguous into set of target categories qc has many applications including page ranking in web search targeted advertisement in response to queries and personalization in this paper we present novel approach for qc that outperforms the winning solution of the acm kddcup competition whose objective is to classify real user queries in our approach we first build bridging classifier on an intermediate taxonomy in an offline mode this classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy major innovation is that by leveraging the similarity distribution over the intermediate taxonomy we do not need to retrain new classifier for each new set of target categories and therefore the bridging classifier needs to be trained only once in addition we introduce category selection as new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries category selection can improve both efficiency and effectiveness of the online classification by combining our algorithm with the winning solution of kddcup we made an improvement by and in terms of precision and respectively compared with the best results of kddcup
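A sketch of the bridging idea: score a query against target categories by going through an intermediate taxonomy, roughly p(target | q) = sum over intermediate categories c_i of p(target | c_i) * p(c_i | q), so no classifier has to be retrained when the target categories change. The two conditional tables below are toy numbers, not trained models.

```python
# Bridging-classifier scoring sketch over an intermediate taxonomy.
def bridge_scores(p_inter_given_query, p_target_given_inter):
    scores = {}
    for c_i, p_ci in p_inter_given_query.items():
        for target, p_t in p_target_given_inter.get(c_i, {}).items():
            scores[target] = scores.get(target, 0.0) + p_t * p_ci
    return sorted(scores.items(), key=lambda x: -x[1])

if __name__ == "__main__":
    p_ci_q = {"Computers/Software": 0.7, "Shopping/Electronics": 0.3}
    p_t_ci = {"Computers/Software": {"Software": 0.8, "Hardware": 0.2},
              "Shopping/Electronics": {"Hardware": 0.6, "Shopping": 0.4}}
    print(bridge_scores(p_ci_q, p_t_ci))  # Software ranked first for this toy query
```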
chip multiprocessor cmp architectures present challenge for efficient simulation combining the requirements of detailed microprocessor simulator with that of tightly coupled parallel system in this paper distributed simulator for target cmps is presented based on the message passing interface mpi designed to run on host cluster of workstations microbenchmark based evaluation is used to narrow the parallelization design space concerning the performance impact of distributed vs centralized target simulation blocking vs non blocking remote cache accesses null message vs barrier techniques for clock synchronization and network interconnect selection the best combination is shown to yield speedups of up to on node cluster of dual cpu workstations partially due to cache effects
branch misprediction limits processor performance significantly as the pipeline deepens and the instruction issued per cycle increases since the introduction of the two level adaptive branch predictor branch history has been major input vector in branch prediction together with the address of branch instruction until now the length of branch history has been statically fixed for all branch instructions and the history length is usually selected in accordance with the size of branch prediction table however different branch instructions require different length histories to achieve high prediction accuracies therefore to dynamically adjust to the optimal history length for each branch instruction this paper presents dynamic per branch history length adjustment policy by tracking data dependencies of branches and identifying strongly correlated branches in branch history our method provides optimal history length for each branch instruction resulting in substantial improvement in prediction accuracy the proposed solution does not require any forms of prior profilings and it provides up to improvement in prediction accuracy further it even outperforms in some applications the prediction accuracy of optimally selected history length by prior profilings
one way that artists create compelling character animations is by manipulating details of character’s motion this process is expensive and repetitive we show that we can make such motion editing more efficient by generalizing the edits an animator makes on short sequences of motion to other sequences our method predicts frames for the motion using gaussian process models of kinematics and dynamics these estimates are combined with probabilistic inference our method can be used to propagate edits from examples to an entire sequence for an existing character and it can also be used to map motion from control character to very different target character the technique shows good generalization for example we show that an estimator learned from few seconds of edited example animation using our methods generalizes well enough to edit minutes of character animation in high quality fashion learning is interactive an animator who wants to improve the output can provide small correcting examples and the system will produce improved estimates of motion we make this interactive learning process efficient and natural with fast full body ik system with novel features finally we present data from interviews with professional character animators that indicate that generalizing and propagating animator edits can save artists significant time and work
this paper presents new formal method for the efficient verification of concurrent systems that are modeled using safe petri net representation our method generalizes upon partial order methods to explore concurrently enabled conflicting paths simultaneously we show that our method can achieve an exponential reduction in algorithmic complexity without resorting to an implicit enumeration approach
web documents are typically associated with many text streams including the body the title and the url that are determined by the authors and the anchor text or search queries used by others to refer to the documents through systematic large scale analysis on their cross entropy we show that these text streams appear to be composed in different language styles and hence warrant respective language models to properly describe their properties we propose language modeling approach to web document retrieval in which each document is characterized by mixture model with components corresponding to the various text streams associated with the document immediate issues for such mixture model arise as all the text streams are not always present for the documents and they do not share the same lexicon making it challenging to properly combine the statistics from the mixture components to address these issues we introduce an open vocabulary smoothing technique so that all the component language models have the same cardinality and their scores can simply be linearly combined to ensure that the approach can cope with web scale applications the model training algorithm is designed to require no labeled data and can be fully automated with few heuristics and no empirical parameter tunings the evaluation on web document ranking tasks shows that the component language models indeed have varying degrees of capabilities as predicted by the cross entropy analysis and the combined mixture model outperforms the state of the art bmf based system
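A minimal sketch of the stream-mixture idea: each document field (body, title, url, anchor, query clicks) gets its own smoothed unigram model, and the final query score linearly combines the per-stream log scores. Simple add-one smoothing over a shared vocabulary stands in for the open-vocabulary smoothing described above, and the stream weights are illustrative.

```python
# Mixture of per-stream unigram language models for document ranking.
import math

def stream_logprob(query_terms, stream_counts, vocab_size):
    total = sum(stream_counts.values())
    return sum(math.log((stream_counts.get(t, 0) + 1) / (total + vocab_size))
               for t in query_terms)

def mixture_score(query_terms, doc_streams, weights, vocab_size):
    """doc_streams: {stream_name: {term: count}}; weights: {stream_name: w}."""
    return sum(w * stream_logprob(query_terms, doc_streams.get(s, {}), vocab_size)
               for s, w in weights.items())
```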
most research work on optimization of nested queries focuses on aggregate subqueries in this article we show that existing approaches are not adequate for nonaggregate subqueries especially for those having multiple subqueries and certain comparison operators we then propose new efficient approach the nested relational approach based on the nested relational algebra the nested relational approach treats all subqueries in uniform manner being able to deal with nested queries of any type and any level we report on experimental work that confirms that existing approaches have difficulties dealing with nonaggregate subqueries and that the nested relational approach offers better performance we also discuss algebraic optimization rules for further optimizing the nested relational approach and the issue of integrating it into relational database systems
in this article pioneer study is conducted to evaluate the possibility of identifying people through their personality traits the study is conducted using the answers of population of individuals to collection of items these items aim at measuring five common different personality traits usually called the big five these five levels are neuroticism extraversion agreeableness conscientiousness and openness the traits are estimated using the widely used samejima’s model and then used to discriminate the individuals results point to biometrics using personality traits as new promising biometric modality
the reflective capabilities of rewriting logic and their efficient implementation in the maude language can be exploited to endow reflective language like maude with module algebra in which structured theories can be combined and transformed by means of rich collection of module operations we have followed this approach and have used the specification of such module algebra as its implementation including user interface and an execution environment for it the high level at which the specification of the module algebra has been given makes this approach particularly attractive when compared to conventional implementations because of its shorter development time and the greater flexibility maintainability and extensibility that it affords we explain the general principles of the reflective design of the module algebra and its categorical foundations based on the institution theoretic notion of structured theory and morphisms and colimits for such theories based on such foundations we then explain the categorical semantics of maude’s parameterized theories modules and views and their instantiation and the reflective algebraic specification of the different module and view operations
there are about of men and of women suffering from colorblindness we show that the existing image search techniques cannot provide satisfactory results for these users since many images will not be well perceived by them due to the loss of color information in this paper we introduce scheme named accessible image search ais to accommodate these users different from the general image search scheme that aims at returning more relevant results ais further takes into account the colorblind accessibilities of the returned results ie the image qualities in the eyes of colorblind users the scheme includes two components accessibility assessment and accessibility improvement for accessibility assessment we introduce an analysis based method and learning based method based on the measured accessibility scores different reranking methods can be performed to prioritize the images with high accessibilities in accessibility improvement component we propose an efficient recoloring algorithm to modify the colors of the images such that they can be better perceived by colorblind users we also propose the accessibility average precision aap for ais as complementary performance evaluation measure to the conventional relevance based evaluation methods experimental results with more than images and anonymous colorblind users demonstrate the effectiveness and usefulness of the proposed scheme
since the last decade images have been integrated into several application domains such as gis medicine etc this integration necessitates new managing methods particularly in image retrieval queries should be formulated using different types of features such as low level features of images histograms color distribution etc spatial and temporal relations between salient objects semantic features etc in this chapter we propose novel method for identifying and indexing several types of relations between salient objects spatial relations are used here to show how our method can provide high expressive power to relations in comparison to the traditional methods
integration of web search with geographic information has recently attracted much attention there are number of local web search systems enabling users to find location specific web content in this paper however we point out that this integration is still at superficial level most local web search systems today only link local web content to map interface they are extensions of conventional stand alone geographic information system gis applied to web based client server architecture in this paper we discuss the directions available for tighter integration of web search with gis in terms of extraction knowledge discovery and presentation we also describe implementations to support our argument that the integration must go beyond the simple map and hyperlink architecture
the challenge of saturating all phases of pervasive service provision with context aware functionality lies in coping with the complexity of maintaining retrieving and distributing context information to efficiently represent and query context information sophisticated modelling scheme should exist to distribute and synchronise context knowledge in various context repositories across multitude of administrative domains streamlined mechanisms are needed this paper elaborates on an innovative context management framework that has been designed to cope with free text and location based context retrieval and efficient context consistency control the proposed framework has been incorporated in multi functional pervasive services platform while most of the mechanisms it employs have been empirically evaluated
most work on pattern mining focuses on simple data structures such as itemsets and sequences of itemsets however lot of recent applications dealing with complex data like chemical compounds protein structures xml and web log databases and social networks require much more sophisticated data structures such as trees and graphs in these contexts interesting patterns involve not only frequent object values labels appearing in the graphs or trees but also frequent specific topologies found in these structures recently several techniques for tree and graph mining have been proposed in the literature in this paper we focus on constraint based tree pattern mining we propose to use tree automata as mechanism to specify user constraints over tree patterns we present the algorithm cobminer which allows user constraints specified by tree automata to be incorporated in the mining process an extensive set of experiments executed over synthetic and real data xml documents and web usage logs allows us to conclude that incorporating constraints during the mining process is far more effective than filtering the interesting patterns after the mining process
while shared nothing parallel infrastructures provide fast processing of explosively growing digital content managing data efficiently across multiple nodes is important the value range partitioning method with parallel tree structures in shared nothing environment is an efficient approach for handling large amounts of data to handle large amounts of data it is also important to provide an efficient concurrency control protocol for the parallel tree many studies have proposed concurrency control protocols for trees which use latch coupling none of these studies has considered that latch coupling contains performance bottleneck of sending of messages between processing elements pes in distributed environments because latch coupling is efficient for tree on single machine the only protocol without latch coupling is the link algorithm but it is difficult to use the link algorithm directly on an entire parallel tree structure because it is necessary to guarantee the consistency of the side pointers we propose new concurrency control protocol named lcfb that requires no latch coupling in optimistic processes lcfb reduces the amount of communication between pes during tree traversal to detect access path errors in the lcfb protocol caused by removal of latch coupling we assign boundary values to each index page because page split may cause page deletion in fat btree we also propose an effective method for handling page deletions without latch coupling we then combine lcfb with the link algorithm within each pe to reduce the cost of structure modification operations smos in pe as solution to the difficulty of consistency management for the side pointers in parallel tree structure to compare the performance of the proposed protocol with conventional protocols mark opt inc opt and aries im we implemented them on an autonomous disk system with fat btree structure experimental results in various environments indicate that the system throughput of the proposed protocols is always superior to those of the other protocols especially in large scale configurations and lcfb with the link algorithm is effective at higher update ratios
in this paper we study approximate landmark based methods for point to point distance estimation in very large networks these methods involve selecting subset of nodes as landmarks and computing offline the distances from each node in the graph to those landmarks at runtime when the distance between pair of nodes is needed it can be estimated quickly by combining the precomputed distances we prove that selecting the optimal set of landmarks is an np hard problem and thus heuristic solutions need to be employed we therefore explore theoretical insights to devise variety of simple methods that scale well in very large networks the efficiency of the suggested techniques is tested experimentally using five real world graphs having millions of edges while theoretical bounds support the claim that random landmarks work well in practice our extensive experimentation shows that smart landmark selection can yield dramatically more accurate results for given target accuracy our methods require as much as times less space than selecting landmarks at random in addition we demonstrate that at very small accuracy loss our techniques are several orders of magnitude faster than the state of the art exact methods finally we study an application of our methods to the task of social search in large graphs
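A minimal sketch of the landmark idea described above, assuming an unweighted graph, uniformly random landmark selection, and the standard triangle-inequality upper bound; the function names and the toy graph are illustrative only, not the paper's code:

```python
# Minimal sketch of landmark-based distance estimation (illustrative only).
import random
from collections import deque

def bfs_distances(adj, source):
    """Unweighted single-source shortest-path distances via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def precompute(adj, num_landmarks, seed=0):
    """Offline phase: pick landmarks (here: uniformly at random) and store
    the distances from every node to each landmark."""
    random.seed(seed)
    landmarks = random.sample(list(adj), num_landmarks)
    return [bfs_distances(adj, l) for l in landmarks]

def estimate_distance(landmark_dists, s, t):
    """Online phase: upper-bound estimate d(s,t) <= d(s,l) + d(l,t),
    minimized over all landmarks."""
    best = float('inf')
    for dist in landmark_dists:
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best

# Example usage on a small undirected path graph given as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
tables = precompute(adj, num_landmarks=2)
print(estimate_distance(tables, 0, 4))  # prints 4; the estimate upper-bounds the true distance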
one of the critical problems in distributed memory multi core architectures is scalable parallelization that minimizes inter processor communication using the concept of iteration space slicing this paper presents new code parallelization scheme for data intensive applications this scheme targets distributed memory multi core architectures and formulates the problem of data computation distribution partitioning across parallel processors using slicing such that starting with the partitioning of the output arrays it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code the goal is to minimize inter processor data communications based on this iteration space slicing based formulation of the problem we also propose solution scheme the proposed data computation scheme is evaluated using six data intensive benchmark programs in our experimental evaluation we also compare this scheme against three alternate data computation distribution schemes the results obtained are very encouraging indicating around better speedup with processors over the next best scheme when averaged over all benchmark codes we tested
we propose an efficient dynamic slicing algorithm for component based software architectures we first transform software architecture into an intermediate representation which we have named architecture component dependence graph acdg our slicing algorithm is based on marking and unmarking the in service and out of service edges on an acdg as and when dependencies arise and cease on occurrence of events we use the computed dynamic architectural slices to select test cases for regression testing of component based systems one important advantage of our approach is that slice is available for use even before request for slice is made this appreciably reduces the response time of slicing commands and help regression testing we show that our architectural slicing algorithm is more time and space efficient than the existing algorithms we also briefly discuss prototype tool srtwa slicer based regression testing of wright architectures which we have developed to implement our algorithm
managing the hierarchical organization of data is starting to play key role in the knowledge management community due to the great amount of human resources needed to create and maintain these organized repositories of information machine learning community has in part addressed this problem by developing hierarchical supervised classifiers that help maintainers to categorize new resources within given hierarchies although such learning models succeed in exploiting relational knowledge they are highly demanding in terms of labeled examples because the number of categories is related to the dimension of the corresponding hierarchy hence the creation of new directories or the modification of existing ones require strong investments this paper proposes semi automatic process interleaved with human suggestions whose aim is to minimize and simplify the work required of the administrators when creating modifying and maintaining directories within this process bootstrapping taxonomy with examples represents critical factor for the effective exploitation of any supervised learning model for this reason we propose method for the bootstrapping process that makes first hypothesis of categorization for set of unlabeled documents with respect to given empty hierarchy of concepts based on revision of self organizing maps namely taxsom the proposed model performs an unsupervised classification exploiting the priori knowledge encoded in taxonomy structure both at the terminological and topological level the ultimate goal of taxsom is to create the premise for successfully training supervised classifier
one of the demands of database system transaction management is to achieve high degree of concurrency by taking into consideration the semantics of high level operations on the other hand the implementation of such operations must pay attention to conflicts on the storage representation levels below to meet these requirements in layered architecture we propose multilevel transaction management utilizing layer specific semantics based on the theoretical notion of multilevel serializability family of concurrency control strategies is developed suitable recovery protocols are investigated for aborting single transactions and for restarting the system after crash the choice of levels involved in multilevel transaction strategy reveals an inherent trade off between increased concurrency and growing recovery costs series of measurements has been performed in order to compare several strategies preliminary results indicate considerable performance gains of the multilevel transaction approach
virtually all applications which provide or require security service need secret key in an ambient world where potentially sensitive information is continually being gathered about us it is critical that those keys be both securely deployed and safeguarded from compromise in this paper we provide solutions for secure key deployment and storage of keys in sensor networks and radio frequency identification systems based on the use of physical unclonable functions pufs in addition to providing an overview of different existing puf realizations we introduce puf realization aimed at ultra low cost applications we then show how the properties of fuzzy extractors or helper data algorithms can be used to securely deploy secret keys to low cost wireless node our protocols are more efficient round complexity and allow for lower costs compared to previously proposed ones we also provide an overview of puf applications aimed at solving the counterfeiting of goods and devices
stereo matching is one of the most active research areas in computer vision while large number of algorithms for stereo correspondence have been developed relatively little work has been done on characterizing their performance in this paper we present taxonomy of dense two frame stereo methods our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms using this taxonomy we compare existing stereo methods and present experiments evaluating the performance of many different variants in order to establish common software platform and collection of data sets for easy evaluation we have designed stand alone flexible implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms we have also produced several new multi frame stereo data sets with ground truth and are making both the code and data sets available on the web finally we include comparative evaluation of large set of today’s best performing stereo algorithms
program specific or function specific optimization phase sequences are universally accepted to achieve better overall performance than any fixed optimization phase ordering number of heuristic phase order space search algorithms have been devised to find customized phase orderings achieving high performance for each function however to make this approach of iterative compilation more widely accepted and deployed in mainstream compilers it is essential to modify existing algorithms or develop new ones that find near optimal solutions quickly as step in this direction in this paper we attempt to identify and understand the important properties of some commonly employed heuristic search methods by using information collected during an exhaustive exploration of the phase order search space we compare the performance obtained by each algorithm with all others as well as with the optimal phase ordering performance finally we show how we can use the features of the phase order space to improve existing algorithms as well as devise new and better performing search algorithms
selected technologies that contribute to knowledge management solutions are reviewed using nonaka’s model of organizational knowledge creation as framework the extent to which knowledge transformation within and between tacit and explicit forms can be supported by the technologies is discussed and some likely future trends are identified it is found that the strongest contribution to current solutions is made by technologies that deal largely with explicit knowledge such as search and classification contributions to the formation and communication of tacit knowledge and support for making it explicit are currently weaker although some encouraging developments are highlighted such as the use of text based chat expertise location and unrestricted bulletin boards through surveying some of the technologies used for knowledge management this paper serves as an introduction to the subject for those papers in this issue that discuss technology
stream architecture is novel microprocessor architecture with wide application potential but as for whether it can be used efficiently in scientific computing many issues await further study this paper first gives the design and implementation of bit stream processor ft fei teng for scientific computing the carrying out of bit extension design and scientific computing oriented optimization are described in such aspects as instruction set architecture stream controller micro controller alu cluster memory hierarchy and interconnection interface here second two kinds of communications as message passing and stream communications are put forward an interconnection based on the communications is designed for ft based high performance computers third novel stream programming language sf stream fortran and its compiler sfcompiler stream fortran compiler are developed to facilitate the development of scientific applications finally nine typical scientific application kernels are tested and the results show the efficiency of stream architecture for scientific computing
dimensionality reduction via canonical variate analysis cva is important for pattern recognition and has been extended variously to permit more flexibility eg by kernelizing the formulation this can lead to over fitting usually ameliorated by regularization here method for sparse multinomial kernel discriminant analysis smkda is proposed using sparse basis to control complexity it is based on the connection between cva and least squares and uses forward selection via orthogonal least squares to approximate basis generalizing similar approach for binomial problems classification can be performed directly via minimum mahalanobis distance in the canonical variates smkda achieves state of the art performance in terms of accuracy and sparseness on benchmark datasets
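The classification step mentioned above (minimum Mahalanobis distance in the canonical variates) can be written in its standard textbook form; the symbols below are mine, not the paper's:

\[ \hat{y}(x) \;=\; \arg\min_{k}\; \bigl(z(x) - \bar{z}_k\bigr)^{\top} \Sigma^{-1} \bigl(z(x) - \bar{z}_k\bigr) \]

here z(x) is the projection of x onto the canonical variates, \(\bar{z}_k\) is the class-k mean in that space, and \(\Sigma\) is the pooled within-class covariance there (if the variates are whitened, this reduces to nearest class mean under Euclidean distance).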
peer knowledge management systems pkms offer flexible architecture for decentralized knowledge sharing in pkmss the knowledge sharing and evolution processes are based on peer ontologies finding an effective and efficient query rewriting algorithm for regular expression queries is vital for knowledge sharing between peers in pkmss and for this our solution is characterized by graph based query rewriting based on the graphs for both axioms and mappings we design novel algorithm regular expression rewriting algorithm to rewrite regular expression queries along semantic paths the simulation results show that the performance of our algorithm is better than mork’s reformulation algorithms mork peer architectures for knowledge sharing phd thesis university of washington and our algorithm is more effective than the naive rewriting algorithm
multiprocessor system on chip mpsoc architectures have received lot of attention in the past years but few advances in compilation techniques target these architectures this is particularly true for the exploitation of data locality most of the compilation techniques for parallel architectures discussed in the literature are based on single loop nest this article presents new techniques that consist in applying loop fusion and tiling to several loop nests and to parallelize the resulting code across different processors these two techniques reduce the number of memory accesses however they increase dependencies and thereby reduce the exploitable parallelism in the code this article tries to address this contradiction to optimize the memory space used by temporary arrays smaller buffers are used as replacement different strategies are studied to optimize the processing time spent accessing these buffers the experiments show that these techniques yield significant reduction in the number of data cache misses and in processing time
currently there are relatively few instances of hash and sign signatures in the standard model moreover most current instances rely on strong and less studied assumptions such as the strong rsa and strong diffie hellman assumptions in this paper we present new approach for realizing hash and sign signatures in the standard model in our approach signer associates each signature with an index that represents how many signatures that signer has issued up to that point then to make use of this association we create simple and efficient techniques that restrict an adversary which makes signature requests to forge on an index no greater than lceil lg rceil finally we develop methods for dealing with this restricted adversary our approach requires that signer maintains small amount of state counter of the number of signatures issued we achieve two new realizations for hash and sign signatures respectively based on the rsa assumption and the computational diffie hellman assumption in bilinear groups
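The stateful aspect of the scheme above (the signer keeps a counter and publishes the index with each signature) can be sketched as follows; HMAC is only a stand-in so the example runs, it is not the RSA- or CDH-based constructions the paper realizes:

```python
# Toy illustration of the stateful, counter-indexed signer idea; HMAC replaces
# the actual hash-and-sign schemes purely so the sketch is self-contained.
import hmac, hashlib

class StatefulSigner:
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0                 # the small amount of state kept by the signer

    def sign(self, message: bytes):
        self.counter += 1                # each signature gets the next index
        index = self.counter
        tag = hmac.new(self.key, index.to_bytes(8, "big") + message,
                       hashlib.sha256).hexdigest()
        return index, tag                # the index is published with the signature

signer = StatefulSigner(b"secret-key")
print(signer.sign(b"hello"))             # (1, '...')
print(signer.sign(b"world"))             # (2, '...')
```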
this paper presents six novel approaches to biographic fact extraction that model structural transitive and latent properties of biographical data the ensemble of these proposed models substantially outperforms standard pattern based biographic fact extraction methods and performance is further improved by modeling inter attribute correlations and distributions over functions of attributes achieving an average extraction accuracy of over seven types of biographic attributes
in this paper we integrate history encoding based methodology for checking dynamic database integrity constraints into situation calculus based specification of relational database updates by doing this we are able to answer queries about whole hypothetical evolution of database without having to update the entire database and keep all the information associated to the generated states state and prove dynamic integrity constraints as static integrity constraints transform history dependent preconditions for updates into local preconditions the methodology presented here is based on the introduction of operators of predicate past temporal logic as macros into the specifications written in the situation calculus of the dynamics of database temporal subformulas of query are treated as auxiliary views with the corresponding specification of their dynamics an implementation of hypothetical temporal query answering is presented
adaptable software architectures sas have been suggested as viable solution for the design of distributed applications that operate in mobile computing environment to cope with the high heterogeneity and variability of this environment mobile code techniques can be used to implement this kind of sas since they allow us to dynamically modify the load of the hosting nodes and the internode traffic to adapt to the resources available in the nodes and to the condition of the often wireless network link however moving code among nodes has cost eg in terms of network traffic and consumed energy for mobile nodes so designing an adaptable sa based on mobile code techniques requires careful analysis to determine its effectiveness since early design stages in this respect our main contribution consists of methodology called asap adaptable software architectures performance to automatically derive starting from design model of mobility based sa markov model whose solution provides insights about the most effective adaptation strategy based on code mobility in given execution environment we assume that the sa model is expressed using the unified modeling language uml because of its widespread use in software design also suggesting some extension to this formalism to better express the mobility structure of the application ie which are the mobile components and the possible targets of their movement
we have designed and implemented maya version of java that allows programmers to extend and reinterpret its syntax maya generalizes macro systems by treating grammar productions as generic functions and semantic actions on productions as multimethods on the corresponding generic functions programmers can write new generic functions ie grammar productions and new multimethods ie semantic actions through which they can extend the grammar of the language and change the semantics of its syntactic constructs respectively maya’s multimethods are compile time metaprograms that transform abstract syntax they execute at program compile time because they are semantic actions executed by the parser maya’s multimethods can be dispatched on the syntactic structure of the input as well as the static source level types of expressions in the input in this paper we describe what maya can do and how it works we describe how its novel parsing techniques work and how maya can statically detect certain kinds of errors such as code that generates references to free variables finally to demonstrate maya’s expressiveness we describe how maya can be used to implement the multijava language which was described by clifton et al at oopsla
automata have proved to be useful tool in infinite state model checking since they can represent infinite sets of integers and reals however analogous to the use of binary decision diagrams bdds to represent finite sets the sizes of the automata are an obstacle in the automata based set representation in this article we generalize the notion of don’t cares for bdds to word languages as means to reduce the automata sizes we show that the minimal weak deterministic büchi automaton wdba with respect to given don’t care set under certain restrictions is uniquely determined and can be efficiently constructed we apply don’t cares to improve the efficiency of decision procedure for the first order logic over the mixed linear arithmetic over the integers and the reals based on wdbas
recently model based diagnosis of discrete event systems has attracted more and more attention incremental diagnosis is essential for on line diagnosis and the diagnosis is usually performed on line however the observations are often uncertain to address this problem new concept of two restricted successive temporal windows is proposed thereafter an approach is given to support on line incremental model based diagnosis of discrete event systems with uncertain observations all the observation sequences emitted by the previous window can be produced if the second temporal window is long enough bigger than the maximal delay of transmission in this way the global emitted observation sequences can be inferred as well the proposed approach is sound complete timely and universal in particular it is well suited for on line diagnosis of discrete event systems when the received observations are too dense to find in time so called "sound windows" by traditional approaches
future large scale sensor networks may comprise thousands of wirelessly connected sensor nodes that could provide an unimaginable opportunity to interact with physical phenomena in real time however the nodes are typically highly resource constrained since the communication task is significant power consumer various attempts have been made to introduce energy awareness at different levels within the communication stack clustering is one such attempt to control energy dissipation for sensor data dissemination in multihop fashion the time controlled clustering algorithm tcca is proposed to realize network wide energy reduction realistic energy dissipation model is derived probabilistically to quantify the sensor network’s energy consumption using the proposed clustering algorithm discrete event simulator is developed to verify the mathematical model and to further investigate tcca in other scenarios the simulator is also extended to include the rest of the communication stack to allow comprehensive evaluation of the proposed algorithm
fast congestion prediction is essential for congestion reduction techniques at different stages of the flow probabilistic congestion estimation methods model congestion after placement by considering the probability wire will be routed over different areas of the routing region they are popular because they are much faster than traditional global routing in this paper two congestion estimation tools are presented the first one is an implementation of probabilistic method called pce and appears to be very fast in comparison with comparable methods the second one called fadglor is new and based on global routing techniques surprisingly fadglor is about as fast as pce the reason is that it is tuned towards congestion estimation and speed contrary to global routers that are tuned towards wire length reduction and routing as many wires as possible special focus is on congested areas of the chip the areas that may prevent design from being routable and fadglor more accurately predicts these areas than pce both tools are tested on designs varying from impossible to route to easily routable previous papers focussed on accurately modeling congestion of the final routable design in practice also unroutable designs need to be evaluated the results presented in this paper indicate that global routing based methods are probably more worthwhile than probabilistic methods
we propose technique for maintaining coherency of transactional distributed shared memory used by applications accessing shared persistent store our goal is to improve support for fine grained distributed data sharing in collaborative design applications such as cad systems and software development environments in contrast traditional research in distributed shared memory has focused on supporting parallel programs in this paper we show how distributed programs can benefit from this shared memory abstraction as well our approach called log based coherency integrates coherency support with standard mechanism for ensuring recoverability of persistent data in our system transaction logs are the basis of both recoverability and coherency we have prototyped log based coherency as set of extensions to rvm satyanarayanan et al runtime package supporting recoverable virtual memory our prototype adds coherency support to rvm in simple way that does not require changes to existing rvm applications we report on our prototype and its performance and discuss its relationship to other dsm systems
most multi objective evolutionary algorithms moeas use the concept of dominance in the search process to select the top solutions as parents in an elitist manner however as moeas are probabilistic search methods some useful information may be wasted if the dominated solutions are completely disregarded in addition the diversity may be lost during the early stages of the search process leading to locally optimal or partial pareto front beside this the non domination sorting process is complex and time consuming to overcome these problems this paper proposes multi objective evolutionary algorithms based on summation of normalized objective values and diversified selection snov ds the performance of this algorithm is tested on set of benchmark problems using both multi objective evolutionary programming moep and multi objective differential evolution mode with the proposed method the performance metric has improved significantly and the speed of the parent selection process has also increased when compared with the non domination sorting in addition the proposed algorithm also outperforms ten other algorithms
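A minimal sketch of ranking by summation of normalized objective values, assuming all objectives are minimized and using min-max normalization over the current population; the diversified-selection step of SNOV-DS is not shown:

```python
# Rank individuals by the sum of their min-max normalized objective values
# (minimization assumed); a sketch, not the full SNOV-DS selection scheme.
def rank_by_normalized_sum(objectives):
    """objectives: list of tuples, one tuple of objective values per individual.
    Returns indices sorted by the sum of per-objective normalized values."""
    num_obj = len(objectives[0])
    lows  = [min(ind[m] for ind in objectives) for m in range(num_obj)]
    highs = [max(ind[m] for ind in objectives) for m in range(num_obj)]
    def score(ind):
        total = 0.0
        for m in range(num_obj):
            span = highs[m] - lows[m]
            total += (ind[m] - lows[m]) / span if span > 0 else 0.0
        return total
    return sorted(range(len(objectives)), key=lambda i: score(objectives[i]))

# Example: three individuals, two objectives to minimize.
pop = [(1.0, 9.0), (4.0, 4.0), (9.0, 1.0)]
print(rank_by_normalized_sum(pop))  # [1, 0, 2]: (4.0, 4.0) has the lowest normalized sum
```

Such a scalar ranking avoids the non-domination sorting step entirely, which is the source of the speedup claimed in the abstract.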
because of its global system nature energy consumption is major challenge of computer systems design ecosystem incorporates the currentcy model which lets the operating system manage energy as first class resource ecosystem accurately accounts for the energy that asynchronous device operation consumes and can express complex energy related goals and behaviors leading to more effective unified management policies
given set of classifiers and probability distribution over their domain one can define metric by taking the distance between pair of classifiers to be the probability that they classify random item differently we prove bounds on the sample complexity of pac learning in terms of the doubling dimension of this metric these bounds imply known bounds on the sample complexity of learning halfspaces with respect to the uniform distribution that are optimal up to constant factor we then prove bound that holds for any algorithm that outputs classifier with zero error whenever this is possible this bound is in terms of the maximum of the doubling dimension and the vc dimension of and strengthens the best known bound in terms of the vc dimension alone finally we show that there is no bound on the doubling dimension of halfspaces in in terms of that holds independently of the domain distribution this implies that there is no such bound in terms of the vc dimension of in contrast with the metric dimension
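The metric used above can be stated explicitly; in the (non-paper) notation below, f and g are classifiers and \(\mathcal{D}\) is the domain distribution:

\[ d_{\mathcal{D}}(f, g) \;=\; \Pr_{x \sim \mathcal{D}}\bigl[f(x) \neq g(x)\bigr] \]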
this paper evaluates the design and implementation of omniware safe efficient and language independent system for executing mobile program modules previous approaches to implementing mobile code rely on either language semantics or abstract machine interpretation to enforce safety in the former case the mobile code system sacrifices universality to gain safety by dictating particular source language or type system in the latter case the mobile code system sacrifices performance to gain safety through abstract machine interpretation omniware uses software fault isolation technology developed to provide safe extension code for databases and operating systems to achieve unique combination of language independence and excellent performance software fault isolation uses only the semantics of the underlying processor to determine whether mobile code module can corrupt its execution environment this separation of programming language implementation from program module safety enables our mobile code system to use radically simplified virtual machine as its basis for portability we measured the performance of omniware using suite of four spec programs on the pentium powerpc mips and sparc processor architectures including the overhead for enforcing safety on all four processors omnivm executed the benchmark programs within as fast as the optimized unsafe code produced by the vendor supplied compiler
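A toy illustration of the address-sandboxing idea behind software fault isolation, written in Python purely for readability (real SFI inserts masking instructions into native code); the segment base and mask are hypothetical:

```python
# Toy illustration (not Omniware's actual code): before every store, the
# target address is masked so it can only fall inside the module's segment.
SEGMENT_BASE = 0x4000_0000        # hypothetical segment assigned to the module
OFFSET_MASK  = 0x0000_FFFF        # hypothetical 64 KiB segment size

memory = {}                       # stand-in for the address space

def sandboxed_store(addr, value):
    """Force addr into [SEGMENT_BASE, SEGMENT_BASE + OFFSET_MASK]."""
    safe_addr = SEGMENT_BASE | (addr & OFFSET_MASK)
    memory[safe_addr] = value

sandboxed_store(0xDEAD_BEEF, 42)          # a "wild" address...
print(hex(next(iter(memory))))            # ...lands at 0x4000beef, inside the segment
```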
web applications are fast becoming more widespread larger more interactive and more essential to the international use of computers it is well understood that web applications must be highly dependable and as field we are just now beginning to understand how to model and test web applications one straightforward technique is to model web applications as finite state machines however large numbers of input fields input choices and the ability to enter values in any order combine to create state space explosion problem this paper evaluates solution that uses constraints on the inputs to reduce the number of transitions thus compressing the fsm the paper presents an analysis of the potential savings of the compression technique and reports actual savings from two case studies
the automated generation of systems eg within model driven development is considerable improvement of the software development however besides the automated generation the verification of these generated systems needs to be supported too by applying generators it is not necessarily guaranteed that the generation outcome is correct typical problems may be firstly the use of wrong operator resulting in an erroneous generation static aspects of the generation secondly the interactions between the different generated system assets snippets of the generated outcome might be incorrect since the snippets might be connected in wrong sequence dynamic aspect of the generation therefore the hierarchical dependencies of the snippets which are the input of the generator as well as the dynamic behavior resulting from the generation have to be checked we describe the hierarchy in version model based on boolean logic the temporal behavior may be checked by model checkers for the generation we apply our xopt concept which provides domain specific transformation operators on the xml representation besides the principles of the static and dynamic elements of our checking approach the paper presents the way to map program assets to the version model and to finite state automata which are the prerequisite for the checking though the proposed checking is presented at the code level the approach may be applied to different kinds of assets eg also on the model level
this paper presents an approach to the problem of factual question generation factual questions are questions whose answers are specific facts who what where when we enhanced simple attribute value xml language and its interpretation engine with context sensitive primitives and added linguistic layer deep enough for the overall system to score well on user satisfiability and the linguistically well founded criteria used to measure up language generation systems experiments with open domain question generation on trec like data validate our claims and approach
the skyline operator was first proposed in for retrieving interesting tuples from dataset since then skyline related papers have been published however we discovered that one of the most intuitive and practical type of skyline queries namely group by skyline queries remains unaddressed group by skyline queries find the skyline for each group of tuples in this paper we present comprehensive study on processing group by skyline queries in the context of relational engines specifically we examine the composition of query plan for group by skyline query and develop the missing cost model for the bbs algorithm experimental results show that our techniques are able to devise the best query plans for variety of group by skyline queries our focus is on algorithms that can be directly implemented in today’s commercial database systems without the addition of new access methods which would require addressing the associated challenges of maintenance with updates concurrency control etc
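A minimal group-by skyline sketch using a block-nested-loop skyline per group, assuming smaller values are preferred on every dimension; this is a reference implementation, not the BBS-based plan the paper costs out:

```python
# Group-by skyline: compute the set of non-dominated tuples within each group.
from collections import defaultdict

def dominates(a, b):
    """a dominates b if a is <= b on every dimension and < on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def group_by_skyline(rows, group_key, dims):
    """rows: list of dicts; group_key: column name; dims: columns to minimize."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r)
    result = {}
    for g, members in groups.items():
        skyline = []
        for r in members:
            p = tuple(r[d] for d in dims)
            if not any(dominates(tuple(s[d] for d in dims), p) for s in skyline):
                skyline = [s for s in skyline
                           if not dominates(p, tuple(s[d] for d in dims))]
                skyline.append(r)
        result[g] = skyline
    return result

# Example: hotels grouped by city, skyline over (price, distance).
hotels = [
    {"city": "A", "price": 100, "distance": 5},
    {"city": "A", "price": 80,  "distance": 9},
    {"city": "B", "price": 60,  "distance": 7},
]
print(group_by_skyline(hotels, "city", ["price", "distance"]))
```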
the reactive programming model is largely different to what we’re used to as we don’t have full control over the application’s control flow if we mix the declarative and imperative programming style which is usual in the ml family of languages the situation is even more complex it becomes easy to introduce patterns where the usual garbage collector for objects cannot automatically dispose all components that we intuitively consider garbage in this paper we discuss duality between the definitions of garbage for objects and events we combine them into single one to specify the notion of garbage for reactive programming model in mixed functional imperative language and we present formal algorithm for collecting garbage in this environment building on top of the theoretical model we implement library for reactive programming that does not cause leaks when used in the mixed declarative imperative model the library allows us to safely combine both of the reactive programming patterns as result we can take advantage of the clarity and simplicity of the declarative approach as well as the expressivity of the imperative model
system virtualization is now available for mobile devices allowing for many advantages two of the major benefits from virtualization are system fault isolation and security the isolated driver domain idd model widely adopted architecture enables strong system fault isolation by limiting the impact of driver faults to the driver domain itself however excessive requests from malicious domain to an idd can cause cpu overuse of the idd and performance degradation of applications in the idd and other domains that share the same device with the malicious domain if the idd model is applied to mobile devices this failure of performance isolation could also lead to battery drain and thus it introduces new severe threat to mobile devices in order to solve this problem we propose fine grained access control mechanism in an idd requests from guest domains are managed by an accounting module in terms of cpu usage with the calculation of estimated cpu consumption using regression equations the requests are scheduled by an access control enforcer according to security policies as result our mechanism provides precise control on the cpu usage of guest domain due to device access and prevents malicious guest domains from cpu overuse performance degradation and battery drain we have implemented prototype of our approach considering both network and storage devices with real smart phone sgh that runs two para virtualized linux kernels on top of secure xen on arm the evaluation shows our approach effectively protects smart phone against excessive attacks and guarantees availability
today it is possible to deploy sensor networks in the real world and collect large amounts of raw sensory data however it remains major challenge to make sense of sensor data ie to extract high level knowledge from the raw data in this paper we present novel in network knowledge discovery technique where high level information is inferred from raw sensor data directly on the sensor nodes in particular our approach supports the discovery of frequent distributed event patterns which characterize the spatial and temporal correlations between events observed by sensor nodes in confined network neighborhood one of the key challenges in realizing such system are the constrained resources of sensor nodes to this end our solution offers declarative query language that allows to trade off detail and scope of the sought patterns for resource consumption we implement our proposal on real hardware and evaluate the trade off between scope of the query and resource consumption
there is plethora of approaches to retrieval at one extreme is web search engine which provides the user with complete freedom to search collection of perhaps over billion documents at the opposite extreme is web page where the author has supplied small number of links to outside documents chosen in advance by the author many practical retrieval needs lie between these two extremes this paper aims to look at the multi dimensional spectrum of retrieval methods at the end it presents some hypothetical tools hopefully blueprints to the future that aim to cover wider parts of the spectrum than current tools do
in this paper we study the following problem we are given certain one or two dimensional region to monitor and requirement on the degree of coverage doc of to meet by network of deployed sensors the latter will be dropped by moving vehicle which can release sensors at arbitrary points within the node spatial distribution when sensors are dropped at certain point is modeled by certain probability density function the network designer is allowed to choose an arbitrary set of drop points and to release an arbitrary number of sensors at each point given this setting we consider the problem of determining the best performing strategy among certain set of grid like strategies that reflect the one or two dimensional symmetry of the region to be monitored the best performing deployment strategy is such that the doc requirement is fulfilled and the total number of deployed nodes is minimum we study this problem both analytically and through simulation under the assumption that is the two dimensional normal distribution centered at the drop point the main contribution of this paper is an in depth study of the inter relationships between environmental conditions doc requirement and cost of the deployment
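Assuming an isotropic spread \(\sigma\), the node spatial distribution for a drop at \((x_0, y_0)\) would be the bivariate normal density below; the abstract states only that the distribution is a two-dimensional normal centered at the drop point, so the isotropy is a simplification:

\[ f(x, y) \;=\; \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{(x - x_{0})^{2} + (y - y_{0})^{2}}{2\sigma^{2}}\right) \]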
tracking across cameras with non overlapping views is challenging problem firstly the observations of an object are often widely separated in time and space when viewed from non overlapping cameras secondly the appearance of an object in one camera view might be very different from its appearance in another camera view due to the differences in illumination pose and camera properties to deal with the first problem we observe that people or vehicles tend to follow the same paths in most cases ie roads walkways corridors etc the proposed algorithm uses this conformity in the traversed paths to establish correspondence the algorithm learns this conformity and hence the inter camera relationships in the form of multivariate probability density of space time variables entry and exit locations velocities and transition times using kernel density estimation to handle the appearance change of an object as it moves from one camera to another we show that all brightness transfer functions from given camera to another camera lie in low dimensional subspace this subspace is learned by using probabilistic principal component analysis and used for appearance matching the proposed approach does not require explicit inter camera calibration rather the system learns the camera topology and subspace of inter camera brightness transfer functions during training phase once the training is complete correspondences are assigned using the maximum likelihood ml estimation framework using both location and appearance cues experiments with real world videos are reported which validate the proposed approach
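A hand-rolled kernel density estimate over space-time transition features gives the flavor of the correspondence model described above; the feature layout, bandwidths, and data below are invented for illustration, and the brightness-transfer-function subspace is not shown:

```python
# Product-Gaussian kernel density estimate over inter-camera transition features.
import numpy as np

def kde_density(train, query, bandwidth):
    """train: (n_samples, n_features); query, bandwidth: (n_features,) arrays."""
    diffs = (train - query) / bandwidth
    kernels = np.exp(-0.5 * np.sum(diffs ** 2, axis=1))
    norm = (np.sqrt(2 * np.pi) * bandwidth).prod() * len(train)
    return kernels.sum() / norm

# Hypothetical training correspondences between camera 1 and camera 2:
# (exit position in cam 1, entry position in cam 2, transition time in seconds).
train = np.array([
    [10.0, 50.0, 4.1],
    [11.0, 48.5, 4.3],
    [ 9.5, 51.0, 3.9],
    [10.5, 49.2, 4.0],
    [12.0, 50.5, 4.6],
])
bandwidth = np.array([1.0, 1.0, 0.3])

plausible   = np.array([10.2, 50.0, 4.1])   # resembles the training transitions
implausible = np.array([10.2, 50.0, 9.0])   # far too slow a transition
print(kde_density(train, plausible, bandwidth) >
      kde_density(train, implausible, bandwidth))   # True
```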
service discovery and service aggregation are two crucial issues in the emerging area of service oriented computing soc we propose new technique for the discovery of web services that accounts for the need of composing several services to satisfy client query the proposed algorithm makes use of owl ontologies and explicitly returns the sequence of atomic process invocations that the client must perform in order to achieve the desired result when no full match is possible the algorithm features flexible matching by returning partial matches and by suggesting additional inputs that would produce full match
in this article we present novel approach to ip traceback deterministic packet marking dpm dpm is based on marking all packets at ingress interfaces dpm is scalable simple to implement and introduces no bandwidth and practically no processing overhead on the network equipment it is capable of tracing thousands of simultaneous attackers during ddos attack given sufficient deployment on the internet dpm is capable of tracing back to the slaves responsible for ddos attacks that involve reflectors in dpm most of the processing required for traceback is done at the victim the traceback process can be performed post mortem allowing for tracing the attacks that may not have been noticed initially or the attacks which would deny service to the victim so that traceback is impossible in real time the involvement of the internet service providers isps is very limited and changes to the infrastructure and operation required to deploy dpm are minimal dpm is capable of performing the traceback without revealing topology of the providers network which is desirable quality of traceback method
ultimately display device should be capable of reproducing the visual effects observed in reality in this paper we introduce an autostereoscopic display that uses scalable array of digital light projectors and projection screen augmented with microlenses to simulate light field for given three dimensional scene physical objects emit or reflect light in all directions to create light field that can be approximated by the light field display the display can simultaneously provide many viewers from different viewpoints stereoscopic effect without headtracking or special viewing glasses this work focuses on two important technical problems related to the light field display calibration and rendering we present solution to automatically calibrate the light field display using camera and introduce two efficient algorithms to render the special multi view images by exploiting their spatial coherence the effectiveness of our approach is demonstrated with four projector prototype that can display dynamic imagery with full parallax
several alternatives to manage large xml document collections exist ranging from file systems over relational or other database systems to specifically tailored xml base management systems in this paper we give tour of natix database management system designed from scratch for storing and processing xml data contrary to the common belief that management of xml data is just another application for traditional databases like relational systems we illustrate how almost every component in database system is affected in terms of adequacy and performance we show how to design and optimize areas such as storage transaction management comprising recovery and multi user synchronization as well as query processing for xml
we present an efficient and scalable coarse grained multicomputer cgm coloring algorithm that colors graph with at most colors where is the maximum degree in this algorithm is given in two variants randomized and deterministic we show that on processor cgm model the proposed algorithms require parallel time of and total work and overall communication cost of these bounds correspond to the average case for the randomized version and to the worst case for the deterministic variant
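For reference, the well-known sequential greedy baseline below colors any graph with at most \(\Delta + 1\) colors, where \(\Delta\) is the maximum degree; it is not the CGM algorithm of the paper, only the classic bound the parallel algorithm relates to:

```python
# Sequential greedy coloring: uses at most Delta + 1 colors on any graph.
def greedy_coloring(adj):
    color = {}
    for u in adj:                                   # any vertex order achieves the bound
        used = {color[v] for v in adj[u] if v in color}
        c = 0
        while c in used:                            # at most deg(u) colors are blocked
            c += 1
        color[u] = c
    return color

# Example: a 5-cycle (Delta = 2) gets at most 3 colors.
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(greedy_coloring(cycle))
```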
although the advent of xml schema has rendered dtds obsolete research on practical xml optimization is mostly biased towards dtds and tends to largely ignore xsds some notable exceptions notwithstanding one of the underlying reasons is most probably the perceived simplicity of dtds versus the alleged impenetrability of xml schema indeed optimization wrt dtds has local flavor and usually reduces to reasoning about the accustomed formalism of regular expressions xsds on the other hand even when sufficiently stripped down are related to the less pervious class of unranked regular tree automata recent results on the structural expressiveness of xsds however show that xsds are in fact much closer to dtds than to tree automata leveraging the possibility to directly extend techniques for dtd based xml optimization to the realm of xml schema the goal of the present paper is to present the results in in an easy and accessible way at the same time we discuss possible applications related research and future research directions throughout the paper we try to restrict notation to minimum we refer to for further details
applications in the domain of embedded systems are diverse and store an increasing amount of data in order to satisfy the varying requirements of these applications data management functionality is needed that can be tailored to the applications needs furthermore the resource restrictions of embedded systems imply need for data management that is customized to the hardware platform in this paper we present an approach for decomposing data management software for embedded systems using feature oriented programming the result of such decomposition is software product line that allows us to generate tailor made data management systems while existing approaches for tailoring software have significant drawbacks regarding customizability and performance feature oriented approach overcomes these limitations as we will demonstrate in non trivial case study on berkeley db we evaluate our approach and compare it to other approaches for tailoring dbms
virtual software execution environment known as virtual machine vm has been gaining popularity through java virtual machine jvm and common language infrastructure cli given their advantages in portability productivity and safety etc applying vm to real time embedded systems can leverage production cost fast time to market and software integrity however this approach can only become practical once the vm operations and application tasks are made schedulable jointly in this paper we present schedulable garbage collection algorithm applicable on real time applications in cli virtual machine environment to facilitate the scheduling of real time applications and garbage collection operations we make the pause time due to garbage collection controllable and the invocation of garbage collection predictable to demonstrate the approach prototype for schedulable garbage collection has been implemented in cli execution environment the garbage collection is carried out by concurrent thread while meeting targeted pause time and satisfying the memory requests of applications cost model of garbage collection is established based on measured wcet such that the execution time and overhead of garbage collection operations can be predicted finally we illustrate joint scheduling algorithm to meet the time and memory constraints of real time systems
utility programs which perform similar and largely independent operations on sequence of inputs include such common applications as compilers interpreters and document parsers databases and compression and encoding tools the repetitive behavior of these programs while often clear to users has been difficult to capture automatically we present an active profiling technique in which controlled inputs to utility programs are used to expose execution phases which are then marked automatically through binary instrumentation enabling us to exploit phase transitions in production runs with arbitrary inputs we demonstrate the effectiveness and programmability of active profiling via experiments with six utility programs from the spec benchmark suite compare to code and interval phases and describe applications of active profiling to memory management and memory leak detection
sharing healthcare data has become vital requirement in healthcare system management however inappropriate sharing and usage of healthcare data could threaten patients privacy in this paper we study the privacy concerns of the blood transfusion information sharing system between the hong kong red cross blood transfusion service bts and public hospitals and identify the major challenges that make traditional data anonymization methods not applicable furthermore we propose new privacy model called lkc privacy together with an anonymization algorithm to meet the privacy and information requirements in this bts case experiments on the real life data demonstrate that our anonymization algorithm can effectively retain the essential information in anonymous data for data analysis and is scalable for anonymizing large datasets
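An illustrative checker for the LKC-privacy condition as I read it (every value combination on at most L quasi-identifier attributes is shared by at least K records, and within each such group the confidence of any sensitive value is at most C); the column names and data are hypothetical:

```python
# Check whether a table satisfies LKC-privacy under the reading above.
from itertools import combinations
from collections import Counter, defaultdict

def satisfies_lkc(records, qid_attrs, sensitive_attr, L, K, C):
    for size in range(1, min(L, len(qid_attrs)) + 1):
        for attrs in combinations(qid_attrs, size):
            groups = defaultdict(list)
            for r in records:
                groups[tuple(r[a] for a in attrs)].append(r[sensitive_attr])
            for values in groups.values():
                if len(values) < K:                       # group too small
                    return False
                top = Counter(values).most_common(1)[0][1]
                if top / len(values) > C:                 # sensitive value too confident
                    return False
    return True

# Example with hypothetical donor-like records.
rows = [
    {"age": "30-39", "job": "nurse", "disease": "flu"},
    {"age": "30-39", "job": "nurse", "disease": "none"},
    {"age": "30-39", "job": "clerk", "disease": "none"},
    {"age": "30-39", "job": "clerk", "disease": "flu"},
]
print(satisfies_lkc(rows, ["age", "job"], "disease", L=2, K=2, C=0.6))  # True
```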
distil is software generator that implements declarative domain specific language dsl for container data structures distil is representative of new approach to domain specific language implementation instead of being the usual one of kind stand alone compiler distil is an extension library for the intentional programming ip transformation system currently under development by microsoft research distil relies on several reusable general purpose infrastructure tools offered by ip that substantially simplify dsl implementation
the paper presents the proof theoretical approach to probabilistic logic which allows expressions about approximate conditional probabilities the logic enriches propositional calculus with probabilistic operators which are applied to propositional formulas cp cp
early aspects are stakeholder concerns that crosscut the problem domain with the potential for broad impact on questions of scoping prioritization and architectural design analyzing early aspects improves early stage decision making and helps trace stakeholder interests throughout the software development life cycle however analysis of early aspects is hard because stakeholders are often vague about the concepts involved and may use different vocabularies to express their concerns in this paper we present rigorous approach to conceptual analysis of stakeholder concerns we make use of the repertory grid technique to identify terminological interference between stakeholders descriptions of their goals and formal concept analysis to uncover conflicts and trade offs between these goals we demonstrate how this approach can be applied to the goal models commonly used in requirements analysis resulting in the clarification and elaboration of early aspects preliminary qualitative evaluation indicates that the approach can be readily adopted in existing requirements analysis processes and can yield significant insights into crosscutting concerns in the problem domain
several studies have concentrated on the generation of wrappers for web data sources as wrappers can be easily described as grammars the grammatical inference heritage could play significant role in this research field recent results have identified new subclass of regular languages called prefix mark up languages that nicely abstract the structures usually found in html pages of large web sites this class has been proven to be identifiable in the limit and ptime unsupervised learning algorithm has been previously developed unfortunately many real life web pages do not fall in this class of languages in this article we analyze the roots of the problem and we propose technique to transform pages in order to bring them into the class of prefix mark up languages in this way we have practical solution without renouncing to the formal background defined within the grammatical inference framework we report on some experiments that we have conducted on real life web pages to evaluate the approach the results of this activity demonstrate the effectiveness of the presented techniques
new retrieval applications support flexible comparison for all pairs best match operations based on notion of similarity or distance the distance between items is determined by some arbitrary distance function users that pose queries may change their definition of the distance metric as they progress the distance metric change may be explicit or implicit in an application eg using relevance feedback recomputing from scratch the results with the new distance metric is wasteful in this paper we present an efficient approach to recomputing the all pairs best match join operation using the new distance metric by re using the work already carried out for the old distance metric our approach reduces significantly the work required to compute the new result as compared to naive re evaluation
the problem of mining all frequent queries on relational table is problem known to be intractable even for conjunctive queries in this article we restrict our attention to conjunctive projection selection queries and we assume that the table to be mined satisfies set of functional dependencies under these assumptions we define and characterize two pre orderings with respect to which the support measure is shown to be anti monotonic each of these pre orderings induces an equivalence relation for which all queries of the same equivalence class have the same support the goal of this article is not to provide algorithms for the computation of frequent queries but rather to provide basic properties of pre orderings and their associated equivalence relations showing that functional dependencies can be used for an optimized computation of supports of conjunctive queries in particular we show that one of the two pre orderings characterizes anti monotonicity of the support while the other one refines the former but allows to characterize anti monotonicity with respect to given table only basic computational implications of these properties are discussed in the article
this paper proposes framework for training conditional random fields crfs to optimize multivariate evaluation measures including non linear measures such as score our proposed framework is derived from an error minimization approach that provides simple solution for directly optimizing any evaluation measure specifically focusing on sequential segmentation tasks ie text chunking and named entity recognition we introduce loss function that closely reflects the target evaluation measure for these tasks namely segmentation score our experiments show that our method performs better than standard crf training
by providing direct data transfer between storage and client network attached storage devices have the potential to improve scalability for existing distributed file systems by removing the server as bottleneck and bandwidth for new parallel and distributed file systems through network striping and more efficient data paths together these advantages influence large enough fraction of the storage market to make commodity network attached storage feasible realizing the technology’s full potential requires careful consideration across wide range of file system networking and security issues this paper contrasts two network attached storage architectures networked scsi disks netscsi are network attached storage devices with minimal changes from the familiar scsi interface while network attached secure disks nasd are drives that support independent client access to drive object services to estimate the potential performance benefits of these architectures we develop an analytic model and perform trace driven replay experiments based on afs and nfs traces our results suggest that netscsi can reduce file server load during burst of nfs or afs activity by about with the nasd architecture server load during burst activity can be reduced by factor of up to five for afs and up to ten for nfs
as technology advances streams of data can be rapidly generated in many real life applications this calls for stream mining which searches for implicit previously unknown and potentially useful information such as frequent patterns that might be embedded in continuous data streams however most of the existing algorithms do not allow users to express the patterns to be mined according to their intentions via the use of constraints as result these unconstrained mining algorithms can yield numerous patterns that are not interesting to the users moreover many existing tree based algorithms assume that all the trees constructed during the mining process can fit into memory while this assumption holds for many situations there are many other situations in which it does not hold hence in this paper we develop efficient algorithms for stream mining of constrained frequent patterns in limited memory environment our algorithms allow users to impose certain focus on the mining process discover from data streams all those frequent patterns that satisfy the user constraints and handle situations where the available memory space is limited
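As a rough illustration of constrained frequent-pattern mining under a memory budget, the following Python sketch keeps only a small sliding window of transactions and counts only itemsets that satisfy a user constraint; the constraint, window size, and itemset-length cap are illustrative stand-ins for the paper's actual algorithms.

```python
from collections import Counter, deque
from itertools import combinations

def satisfies(itemset, focus={"a", "b"}):
    # example succinct constraint: the itemset must touch the user's focus items
    return bool(set(itemset) & focus)

def mine_stream(transactions, window=4, max_len=2, minsup=2):
    win = deque(maxlen=window)          # only the last `window` transactions are kept
    for t in transactions:
        win.append(frozenset(t))
        counts = Counter()
        for trans in win:               # recount over the small window
            for k in range(1, max_len + 1):
                for iset in combinations(sorted(trans), k):
                    if satisfies(iset):
                        counts[iset] += 1
        yield {i: c for i, c in counts.items() if c >= minsup}

stream = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"a", "b"}, {"c", "d"}]
for snapshot in mine_stream(stream):
    print(snapshot)                     # constrained frequent patterns per window
```

Recounting the whole window per arrival is deliberately naive; the point is only that the constraint check prunes uninteresting patterns before they ever consume memory.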
some recent topic model based methods have been proposed to discover and summarize the evolutionary patterns of themes in temporal text collections however the theme patterns extracted by these methods are hard to interpret and evaluate to produce more descriptive representation of the theme pattern we not only give new representations of sentences and themes with named entities but we also propose sentence level probabilistic model based on the new representation pattern compared with other topic model methods our approach not only gets each topic’s distribution per term but also generates candidate summary sentences of the themes as well consequently the results are easier to understand and can be evaluated using the top sentences produced by our probabilistic model experimentation with the proposed methods on the tsunami dataset shows that the proposed methods are useful in the discovery of evolutionary theme patterns
in recent years high dimensional database applications deal with multidimensional ad hoc queries that refer to an arbitrary number of arbitrarily unpredictably chosen dimensions of high dimensional data this paper thoroughly and systematically investigates possible secondary storage based solutions to the problems of processing multidimensional ad hoc query in transactional or semi transactional environments then complementary solution called the indexed and transposed access method itam is proposed this method is based on two complementary measures multidimensional access method and proposed access method called the opus path the performance of multidimensional access methods deteriorates rapidly as the ratio of query dimensionality to data dimensionality decreases on the other hand the opus path shows retrieval performance that is actually better when the ratio is low
applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture at the isa and microarchitecture levels generic processors such as arm and power pc are inexpensive but with respect to given application they often overprovision in areas that are unimportant for the application’s performance moreover while application specific customized logic could dramatically improve the performance of an application that approach is typically too expensive to justify its cost for most applications in this paper we describe our experience using reconfigurable architectures to develop an understanding of an application’s performance and to enhance its performance with respect to customized constrained logic we begin with standard isa currently in use for embedded systems we modify its core to measure performance characteristics obtaining system that provides cycle accurate timings and presents results in the style of gprof but with absolutely no software overhead we then provide cache behavior statistics that are typically unavailable in generic processor in contrast with simulation our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem finally in response to the performance profile developed on our platform we evaluate various uses of the fpga realized instruction and data caches in terms of the application’s performance
in this paper we present the aria media processing workflow architecture that processes filters and fuses sensory inputs and actuates responses in real time the components of the architecture are programmable and adaptable ie the delay size and quality precision characteristics of the individual operators can be controlled via number of parameters each data object processed by qstream components is subject to transformations based on the parameter values for instance the quality of an output data object and the corresponding processing delay and resource usage depend on the values assigned to parameters of the operators in the object flow path in candan peng ryu chatha mayer efficient stream routing in quality and resource adaptive flow architectures in workshop on multimedia information systems we introduced class of flow optimization problems that promote creation and delivery of small delay or small resource usage objects to the actuators in single sensor single actuator workflows in this paper we extend our attention to multi sensor media processing workflow scenarios the algorithms we present take into account the implicit dependencies between various system parameters such as resource consumption and object sizes we experimentally show the effectiveness and efficiency of the algorithms
in an ordinary syntactic parser the input is string and the grammar ranges over strings this paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and or the grammar to range over string tuples such algorithms can infer the synchronous structures hidden in parallel texts it turns out that these generalized parsers can do most of the work required to train and apply syntax aware statistical machine translation system
gossiping is an important problem in radio networks that has been well studied leading to many important results due to strong resource limitations of sensor nodes previous solutions are frequently not feasible in sensor networks in this paper we study the gossiping problem in the restrictive context of sensor networks by exploiting the geometry of sensor node distributions we present reduced optimal running time of for an algorithm that completes gossiping with high probability in sensor network of unknown topology and adversarial wake up where is the diameter and the maximum degree of the network given that an algorithm for gossiping also solves the broadcast problem our result proves that the classic lower bound of can be broken if nodes are allowed to do preprocessing
mobile ad hoc networks are collections of mobile nodes with dynamism forming momentary network with no pre existing network infrastructure or centralised administration disaster relief education and armed forces are widespread situations where mobile nodes need to communicate in areas without any pre existing infrastructure multicast routing protocols outperform the basic broadcast routing by giving out the resources along general links while sending information to set of multiple destinations as the network topology is dynamic optimised routing protocols are needed for communication proficiency this paper explores the intelligent genetic algorithm ga based on demand multicast routing protocol and ga based multicast ad hoc on demand distance vector protocol which improve the control overheads and packet delivery ratio in the routing messages
we propose complete characterization of large class of distributed tasks with respect to weakened solvability notion called weak termination task is weak termination solvable if there is an algorithm by which at least one process outputs the proposed categorization of tasks is based on the weakest failure detectors needed to solve them we show that every task in the considered class is equivalent in the failure detector sense to some form of set agreement and thus its solvability with weak termination is completely characterized by its set consensus number the maximal integer such that can be weak termination solved using read write registers and set agreement objects the characterization goes through showing that the failure detector recently shown to be the weakest one for the task of set agreement is necessary to solve any task that is resilient impossible
this paper describes an evaluation of automatic video summarization systems run on rushes from several bbc dramatic series it was carried out under the auspices of the trec video retrieval evaluation trecvid as followup to the video summarization workshop held at acm multimedia research teams submitted video summaries of individual rushes video files aiming to compress out redundant and insignificant material each summary had duration of at most of the original the output of baseline system which simply presented each full video at times normal speed was contributed by carnegie mellon university cmu as control the procedures for developing ground truth lists of important segments from each video were applied at the national institute of standards and technology nist to the bbc videos at dublin city university dcu each summary was judged by humans with respect to how much of the ground truth was included and how well formed the summary was additional objective measures included how long it took the system to create the summary how long it took the assessor to judge it against the ground truth and what the summary’s duration was assessor agreement on finding desired segments averaged results indicated that while it was still difficult to exceed the performance of the baseline on including ground truth the baseline was outperformed by most other systems with respect to avoiding redundancy junk and presenting the summary with pleasant tempo rhythm
in recent years considerable advances have been made in the study of properties of metric spaces in terms of their doubling dimension this line of research has not only enhanced our understanding of finite metrics but has also resulted in many algorithmic applications however we still do not understand the interaction between various graph theoretic topological properties of graphs and the doubling geometric properties of the shortest path metrics induced by them for instance the following natural question suggests itself given finite doubling metric is there always an unweighted graph with such that the shortest path metric on is still doubling and which agrees with on this is often useful given that unweighted graphs are often easier to reason about first hurdle to answering this question is that subdividing edges can increase the doubling dimension unboundedly and it is not difficult to show that the answer to the above question is negative however surprisingly allowing distortion between and enables us to bypass this impossibility we show that for any metric space there is an unweighted graph with shortest path metric such that for all the distances and the doubling dimension for is not much more than that of where this change depends only on and not on the size of the graph we show similar result when both and are restricted to be trees this gives simple proof that doubling trees embed into constant dimensional euclidean space with constant distortion we also show that our results are tight in terms of the tradeoff between distortion and dimension blowup
we present the jars joint channel assignment routing and scheduling scheme for ad hoc wireless networks in which nodes are endowed with multiple radios jars is one example of the benefits gained by the integration of routing scheduling and channel assignment by using the multiple radios at each node to transmit and receive simultaneously on different orthogonal channels instead of choosing the optimal route based on the predetermined transmission scheduling and channel assignment results jars incorporates the efficiency of underlying channel assignment and scheduling information into the routing metric calculation so that the route with the maximal joint spatial and frequency reuse is selected once path is established the channel assignment and link scheduling are also determined at the same time jars also adapts different channel assignment and scheduling strategies according to the different communication patterns of broadcast and unicast transmissions simulation results show that jars efficiently exploits the channel diversity and spatial reuse features of multi channel multi radio system
as internet applications become larger and more complex the task of managing them becomes overwhelming abnormal events such as software updates failures attacks and hotspots become frequent the selfman project is tackling this problem by combining two technologies namely structured overlay networks and advanced component models to make the system self managing structured overlay networks sons developed out of peer to peer systems and provide robustness scalability communication guarantees and efficiency component models provide the framework to extend the self managing properties of sons over the whole system selfman is building self managing transactional storage and using it for two application demonstrators distributed wiki and an on demand media streaming service this paper provides an introduction and motivation for the ideas underlying selfman and snapshot of its contributions midway through the project we explain our methodology for building self managing systems as networks of interacting feedback loops we then summarize the work we have done to make sons practical basis for our architecture using an advanced component model handling network partitions handling failure suspicions and doing range queries with load balancing finally we show the design of self managing transactional storage on son
as processor performance continues to improve at rate much higher than dram and network performance we are approaching time when large scale distributed shared memory systems will have remote memory latencies measured in tens of thousands of processor cycles the impulse memory system architecture adds an optional level of address indirection at the memory controller applications can use this level of indirection to control how data is accessed and cached and thereby improve cache and bus utilization and reduce the number of memory accesses required previous impulse work focuses on uniprocessor systems and relies on software to flush processor caches when necessary to ensure data coherence in this paper we investigate an extension of impulse to multiprocessor systems that extends the coherence protocol to maintain data coherence without requiring software directed cache flushing specifically the multiprocessor impulse controller can gather scatter data across the network while its coherence protocol guarantees that each gather request gets coherent data and each scatter request updates every coherent replica in the system our simulation results demonstrate that the proposed system can significantly outperform conventional systems achieving an average speedup of on four memory bound benchmarks on processor system
we are developing companion cognitive systems new kind of software that can be effectively treated as collaborator aside from their potential utility we believe this effort is important because it focuses on three key problems that must be solved to achieve human level ai robust reasoning and learning interactivity and longevity we describe the ideas we are using to develop the first architecture for companions analogical processing grounded in cognitive science for reasoning and learning sketching and concept maps to improve interactivity and distributed agent architecture hosted on cluster to achieve performance and longevity we outline some results on learning by accumulating examples derived from our first experimental version
in this paper we describe system whose purpose is to help establish valid set of roles and role hierarchies with assigned users and associated permissions we have designed and implemented the system called ra system which enables role administrators to build and configure various components of role based access control rbac model thereby making it possible to lay foundation for role based authorization infrastructures three methodological constituents for our purpose are introduced together with the design and implementation issues the system has role centric view for easily managing constrained roles as well as assigned users and permissions an ldap accessible directory service was used for role database we show that the system can be seamlessly integrated with an existing privilege based authorization infrastructure we finally discuss our plans for future development of the system
variety of new modularization techniques is emerging to cope with the challenges of contemporary software engineering such as aspect oriented software development aosd feature oriented programming fop and the like the effective assessment of such technologies plays pivotal role in understanding their costs and benefits when compared to conventional development techniques and ii their effective transfer to mainstream software development the goal of the nd acom workshop is to put together researchers and practitioners with different backgrounds to understand the impact of contemporary modularization techniques in practice explore new and potentially more effective modularity modeling and assessment methods to account for and guide the application of modularization techniques and discuss the potential of using modularity assessment results to improve software development outcomes to improve existing modularization techniques and to foster the development of new techniques
constraint satisfaction problems csp are frequently solved over data residing in relational database systems in such scenarios the database is typically just used as data storage back end however there exist important advantages such as the wide availability of database practices and tools for modeling to having database systems that are capable of natively modeling and solving csps this paper introduces general concepts and techniques to extend database system with constraint processing capabilities input csps are modeled via sql augmented with non deterministic guess operator as introduced by cadoli and mancini tplp problems are represented with combination of internal relations and parse trees and are translated to flexible intermediate problem representation that is subsequently translated into several common representations for sat benchmarks with prototype system show the feasibility of the approach and demonstrate the promise of strong integration of csp solvers and database systems
new branch of biometrics palmprint authentication has attracted increasing amount of attention because palmprints are abundant in line features so that low resolution images can be used in this paper we propose new texture based approach for palmprint feature extraction template representation and matching an extension of the sax symbolic aggregate approximation time series technology to data is the key to make this new approach effective simple flexible and reliable experiments show that by adopting the simple feature of grayscale information only this approach can achieve an equal error rate of and rank one identification accuracy of on palmprint public database this new approach has very low computational complexity so that it can be efficiently implemented on slow mobile embedded platforms the proposed approach does not rely on any parameter training process and therefore is fully reproducible what is more besides the palmprint authentication the proposed extension of sax may also be applied to other problems of pattern recognition and data mining for images
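For readers unfamiliar with SAX, the sketch below shows the classic one-dimensional version (z-normalization, piecewise aggregate approximation, breakpoint symbolization); the paper's contribution is an extension of this idea to two-dimensional palmprint data, which is not reproduced here.

```python
import math

# breakpoints are the standard normal quantiles for an alphabet of size 4
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"

def sax(series, n_segments):
    # z-normalize the series
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    z = [(x - mean) / std for x in series]
    # piecewise aggregate approximation: average over equal-width segments
    seg = len(z) / n_segments
    paa = [sum(z[int(i * seg):int((i + 1) * seg)]) /
           (int((i + 1) * seg) - int(i * seg)) for i in range(n_segments)]
    # map each segment mean to a symbol via the breakpoints
    return "".join(ALPHABET[sum(v > b for b in BREAKPOINTS)] for v in paa)

print(sax([2.1, 2.3, 0.1, -0.4, -1.8, -2.0, 0.5, 1.2], 4))  # e.g. 'dabc'
```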
this work considers the common problem of completing partially visible artifacts within scene human vision abilities to complete such artifacts are well studied within the realms of perceptual psychology however the psychological explanations for completion have received only limited application in the domain of computer vision here we examine prior work in this area of computer vision with reference to psychological accounts of completion and identify remaining challenges for future work
in this paper we identify the key software engineering challenges introduced by the need of accessing and exploiting huge amount of heterogeneous contextual information following we survey the relevant proposals in the area of context aware pervasive computing data mining and granular computing discussing their potentials and limitations with regard to their adoption in the development of context aware pervasive services on these bases we propose the model for contextual data and show how it can represent simple yet effective model to enable flexible general purpose management of contextual knowledge by pervasive services summarizing discussion and the identification of current limitations and open research directions conclude the paper
videos play an ever increasing role in our everyday lives with applications ranging from news entertainment scientific research security and surveillance coupled with the fact that cameras and storage media are becoming less expensive it has resulted in people producing more video content than ever before this necessitates the development of efficient indexing and retrieval algorithms for video data most state of the art techniques index videos according to the global content in the scene such as color texture brightness etc in this paper we discuss the problem of activity based indexing of videos to address the problem first we describe activities as cascade of dynamical systems which significantly enhances the expressive power of the model while retaining many of the computational advantages of using dynamical models second we also derive methods to incorporate view and rate invariance into these models so that similar actions are clustered together irrespective of the viewpoint or the rate of execution of the activity we also derive algorithms to learn the model parameters from video stream and demonstrate how single video sequence may be clustered into different clusters where each cluster represents an activity experimental results for five different databases show that the clusters found by the algorithm correspond to semantically meaningful activities
in this paper we describe our experience with using an abstract integer set framework to develop the rice dhpf compiler compiler for high performance fortran we present simple yet general formulations of the major computation partitioning and communication analysis tasks as well as number of important optimizations in terms of abstract operations on sets of integer tuples this approach has made it possible to implement comprehensive collection of advanced optimizations in dhpf and to do so in the context of more general computation partitioning model than previous compilers one potential limitation of the approach is that the underlying class of integer set problems is fundamentally unable to represent hpf data distributions on symbolic number of processors we describe how we extend the approach to compile codes for symbolic number of processors without requiring any changes to the set formulations for the above optimizations we show experimentally that the set representation is not dominant factor in compile times on both small and large codes finally we present preliminary performance measurements to show that the generated code achieves good speedups for few benchmarks overall we believe we are the first to demonstrate by implementation experience that it is practical to build compiler for hpf using general and powerful integer set framework
two physical objects cannot occupy the same space at the same time simulated physical objects do not naturally obey this constraint instead we must detect when two objects have collided we must perform collision detection this work presents simple voxel based collision detection algorithm an efficient parallel implementation of the algorithm and performance results
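A minimal version of the broad-phase idea can be written in a few lines: hash each object into the voxels its bounding box overlaps and test only pairs that share a voxel. The sphere data and cell size below are illustrative, and the paper's parallel implementation is not shown.

```python
from collections import defaultdict
from itertools import combinations

def voxel_collisions(spheres, cell=1.0):
    """spheres: iterable of (id, (x, y, z), radius)."""
    grid = defaultdict(list)
    for oid, (x, y, z), r in spheres:
        lo = [int((c - r) // cell) for c in (x, y, z)]
        hi = [int((c + r) // cell) for c in (x, y, z)]
        # register the sphere in every voxel its bounding box touches
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    grid[(i, j, k)].append((oid, (x, y, z), r))
    hits = set()
    for bucket in grid.values():
        for (a, ca, ra), (b, cb, rb) in combinations(bucket, 2):
            d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
            if d2 <= (ra + rb) ** 2:      # exact narrow-phase test
                hits.add(tuple(sorted((a, b))))
    return hits

spheres = [("s1", (0.0, 0.0, 0.0), 0.6),
           ("s2", (0.9, 0.0, 0.0), 0.6),
           ("s3", (5.0, 5.0, 5.0), 0.5)]
print(voxel_collisions(spheres))          # {('s1', 's2')}
```

The voxel buckets are independent of each other, which is what makes a parallel implementation of the narrow-phase loop natural.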
path logic programming is modest extension of prolog for the specification of program transformations we give an informal introduction to this extension and we show how it can be used in coding standard compiler optimisations and also number of obfuscating transformations the object language is the microsoft net intermediate language il
in situations of computer mediated communication and computer supported cooperation central challenge lies in increasing the willingness of those involved to share their information with the other group members in the experimental work presented here shared database setting is selected as prototypical situation of net based information exchange and examined from social dilemma perspective the individual who contributes information to shared database must reckon with costs and no benefits the most efficient strategy from the perspective of the individual is thus to withhold information previous research has shown that group awareness tool which provides information about the contribution behavior of group members influences people’s information exchange behavior in order to examine the psychological processes underlying these effects of group awareness in more detail the present study adopts an interactional approach according to which person situation interaction is investigated certain personality traits interpersonal trust sensation seeking and self monitoring were measured and several hypotheses tested regarding the reactions of individuals with high and low trait values to different types of awareness information results demonstrate that awareness tools providing information about highly cooperative group members encourage participants to trust one another and minimize the risk of being exploited when an awareness tool additionally provides feedback about the contribution behavior of single individuals it becomes an opportunity for self presentation in conclusion an interactional approach which considers personality traits and situational factors in net based information exchange situation provides new insights into both the influence processes of group awareness and the connection of these processes to specific personality traits with respect to contribution behavior
personal webservers have proven to be popular means of sharing files and peer collaboration unfortunately the transient availability and rapidly evolving content on such hosts render centralized crawl based search indices stale and incomplete to address this problem we propose yousearch distributed search application for personal webservers operating within shared context eg corporate intranet with yousearch search results are always fast fresh and complete properties we show arise from an architecture that exploits both the extensive distributed resources available at the peer webservers in addition to centralized repository of summarized network state yousearch extends the concept of shared context within web communities by enabling peers to aggregate into groups and users to search over specific groups in this paper we describe the challenges design implementation and experiences with successful intranet deployment of yousearch
the study of deterministic public key encryption was initiated by bellare et al crypto who provided the strongest possible notion of security for this primitive called priv and constructions in the random oracle ro model we focus on constructing efficient deterministic encryption schemes without random oracles to do so we propose slightly weaker notion of security saying that no partial information about encrypted messages should be leaked as long as each message is priori hard to guess given the others while priv did not have the latter restriction nevertheless we argue that this version seems adequate for many practical applications we show equivalence of this definition to single message and indistinguishability based ones which are easier to work with then we give general constructions of both chosen plaintext cpa and chosen ciphertext attack cca secure deterministic encryption schemes as well as efficient instantiations of them under standard number theoretic assumptions our constructions build on the recently introduced framework of peikert and waters stoc for constructing cca secure probabilistic encryption schemes extending it to the deterministic encryption setting as well
continuous wireless wide area network wwan access for mobile devices in future pervasive systems may be limited by battery power and may generate extensive data telecommunication costs in this paper we develop new message notification protocol mnp to enable mobile users to maintain continuous presence at their instant messaging im server while avoiding long idle connections in mnp mobile users cooperatively share single message notification channel to reduce users telecommunication charges and extend device’s battery life device may turn off its wwan interface for most of the time to save power and only needs to contact the im server when needed precise group information does not need to be maintained by mobile device message notification exchanged between the im server and the peer group is represented by compressed bloom filter to further reduce the protocol overhead and provide additional privacy and security the results of performance evaluation show that the mnp protocol could be able to save significant energy consumed in mobile device
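The notification filter can be illustrated with an ordinary Bloom filter: the server inserts the identifiers of users with pending messages and broadcasts the bit array, and each device tests its own identifier before powering up its WWAN interface. The class below is a plain (uncompressed) Bloom filter sketch with invented parameters, not the MNP implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # derive k bit positions from salted hashes of the item
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

notify = BloomFilter()
notify.add("alice@example.org")            # server marks a pending message
print("alice@example.org" in notify)       # True: alice wakes her wwan link
print("bob@example.org" in notify)         # False (up to a small false-positive rate)
```

False positives only cost an unnecessary wake-up, never a missed message, which is why a compact filter is an acceptable substitute for exact group state.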
media and scientific simulation applications have large amount of parallelism that can be exploited in contemporary multi core microprocessors however traditional pointer and array analysis techniques often fall short in automatically identifying this parallelism this is due to the allocation and referencing patterns of time slicing algorithms where information flows from one time slice to the next in these an object is allocated within loop and written to with source data obtained from objects created in previous iterations of the loop the objects are typically allocated at the same static call site through the same call chain in the call graph making them indistinguishable by traditional heap sensitive analysis techniques that use call chains to distinguish heap objects as result the compiler cannot separate the source and destination objects within each time slice of the algorithm in this work we discuss an analysis that quickly identifies these objects through partially flow sensitive technique called iteration disambiguation this is done through relatively simple aging mechanism we show that this analysis can distinguish objects allocated in different time slices across wide range of benchmark applications within tens of seconds even for complete media applications we will also discuss the obstacles to automatically identifying the remaining parallelism in studied applications and propose methods to address them
collaborative filtering is one of the most successful and widely used methods of automated product recommendation in online stores the most critical component of the method is the mechanism of finding similarities among users using product ratings data so that products can be recommended based on the similarities the calculation of similarities has relied on traditional distance and vector similarity measures such as pearson’s correlation and cosine which however have been seldom questioned in terms of their effectiveness in the recommendation problem domain this paper presents new heuristic similarity measure that focuses on improving recommendation performance under cold start conditions where only small number of ratings are available for similarity calculation for each user experiments using three different datasets show the superiority of the measure in new user cold start conditions
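For context, the baseline the paper improves on looks like the following Pearson similarity computed over co-rated items; with a cold-start user the overlap is tiny, which is exactly the weakness the proposed heuristic measure targets (the new measure itself is not reproduced here).

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0                      # not enough co-ratings to say anything
    mu = sum(ratings_u[i] for i in common) / len(common)
    mv = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu) * (ratings_v[i] - mv) for i in common)
    den = (math.sqrt(sum((ratings_u[i] - mu) ** 2 for i in common)) *
           math.sqrt(sum((ratings_v[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

alice = {"item1": 5, "item2": 3, "item3": 4}
bob   = {"item1": 4, "item2": 2}        # a cold-start user with only two ratings
print(pearson(alice, bob))              # 1.0 from just two co-rated items
```

The example shows the problem: two co-rated items already yield a "perfect" correlation, so neighbourhoods built this way are unreliable for new users.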
in this paper we describe teapot domain specific language for writing cache coherence protocols cache coherence is of concern when parallel and distributed systems make local replicas of shared data to improve scalability and performance in both distributed shared memory systems and distributed file systems coherence protocol maintains agreement among the replicated copies as the underlying data are modified by programs running on the system cache coherence protocols are notoriously difficult to implement debug and maintain moreover protocols are not off the shelf reusable components because their details depend on the requirements of the system under consideration the complexity of engineering coherence protocols can discourage users from experimenting with new potentially more efficient protocols we have designed and implemented teapot domain specific language that attempts to address this complexity teapot’s language constructs such as state centric control structure and continuations are better suited to expressing protocol code than those of typical systems programming language teapot also facilitates automatic verification of protocols so hard to find protocol bugs such as deadlocks can be detected and fixed before encountering them on an actual execution we describe the design rationale of teapot present an empirical evaluation of the language using two case studies and relate the lessons that we learned in building domain specific language for systems programming
recommender systems use historical data on user preferences and other available data on users for example demographics and items for example taxonomy to predict items new user might like applications of these methods include recommending items for purchase and personalizing the browsing experience on web site collaborative filtering methods have focused on using just the history of user preferences to make the recommendations these methods have been categorized as memory based if they operate over the entire data to make predictions and as model based if they use the data to build model which is then used for predictions in this paper we propose the use of linear classifiers in model based recommender system we compare our method with another model based method using decision trees and with memory based methods using data from various domains our experimental results indicate that these linear models are well suited for this application they outperform commonly proposed memory based method in accuracy and also have better tradeoff between off line and on line computational requirements
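A minimal sketch of the model-based idea: train one linear classifier per target item, using the user's ratings of other items as features. The perceptron below is only a stand-in for whatever linear model and training procedure the paper actually uses, and the ratings are made up.

```python
def train_perceptron(rows, labels, epochs=20, lr=0.1):
    """Simple perceptron; labels are +1 (liked target item) or -1."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# features: each user's ratings of items A, B, C; label: did the user like item D
X = [[5, 1, 4], [4, 2, 5], [1, 5, 1], [2, 4, 2]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(predict(w, b, [5, 2, 4]))   # predict D for a new user with similar tastes -> 1
```

Once trained, the per-item models are cheap to evaluate online, which is the off-line/on-line trade-off the abstract refers to.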
conventional database query languages are considered in the context of untyped sets the algebra without while has the expressive power of the typed complex object algebra the algebra plus while and col with untyped sets under stratified semantics or inflationary semantics have the power of the computable queries the calculus has power beyond the computable queries and is characterized using the typed complex object calculus with invention the bancilhon khoshafian calculus is also discussed technical tool called generic turing machine is introduced and used in several of the proofs
this paper studies fault tolerant routing for injured hypercubes using local safety information it is shown that minimum feasible path is always available if the spanning subcube that contains both source and destination is safe the safety information outside the spanning subcube is applied only when derouting is needed routing scheme based on local safety information is proposed and the extra cost to obtain local safety information is comparable to the one based on global safety information the proposed algorithm guarantees to find minimum feasible path if the spanning subcube is contained in maximal safe subcube and the source is locally safe in the maximal safe subcube new technique to set up partial path is proposed based on local safety information when the above conditions are not met sufficient simulation results are provided to demonstrate the effectiveness of the method by comparing with the previous methods
the worker wrapper transformation is technique for changing the type of computation usually with the aim of improving its performance it has been used by compiler writers for many years but the technique is little known in the wider functional programming community and has never been described precisely in this article we explain formalise and explore the generality of the worker wrapper transformation we also provide systematic recipe for its use as an equational reasoning technique for improving the performance of programs and illustrate the power of this recipe using range of examples
in this paper we consider the problem of web page usage prediction in web site by modeling users navigation history and web page content with weighted suffix trees this user’s navigation prediction can be exploited either in an on line recommendation system in web site or in web page cache system the method proposed has the advantage that it demands constant amount of computational effort per one user’s action and consumes relatively small amount of extra memory space these features make the method ideal for an on line working environment finally we have performed an evaluation of the proposed scheme with experiments on various web site log files and web pages and we have found that its quality performance is fairly well and in many cases an outperforming one
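A simplified stand-in for the predictor is a bounded-context model over navigation histories: count which page follows each recent-click context and predict with the longest matching context. The sketch below captures that flavour without the weighted-suffix-tree machinery described in the paper, and the page names are invented.

```python
from collections import defaultdict, Counter

class NextPagePredictor:
    def __init__(self, max_context=3):
        self.max_context = max_context
        self.counts = defaultdict(Counter)   # context tuple -> next-page counts

    def train(self, session):
        for i in range(1, len(session)):
            for k in range(1, self.max_context + 1):
                if i - k < 0:
                    break
                ctx = tuple(session[i - k:i])
                self.counts[ctx][session[i]] += 1

    def predict(self, recent):
        # back off from the longest matching context to shorter ones
        for k in range(min(self.max_context, len(recent)), 0, -1):
            ctx = tuple(recent[-k:])
            if ctx in self.counts:
                return self.counts[ctx].most_common(1)[0][0]
        return None

p = NextPagePredictor()
p.train(["home", "products", "cart", "checkout"])
p.train(["home", "products", "specs", "cart"])
print(p.predict(["home", "products"]))   # most frequent follower of that context
```

Each user action updates or queries a bounded number of contexts, which mirrors the constant per-action effort the abstract emphasizes.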
mpeg is very promising standard for the description of multimedia content certainly means for the adequate management of large amounts of mpeg media descriptions are needed in the near future essentially mpeg media descriptions are xml documents following media description schemes and descriptors defined with an extension of xml schema named mpeg ddl however xml database solutions available today are not suitable for the management of mpeg media descriptions they typically neglect type information available with the definitions of description schemes and descriptors and represent the basic contents of media descriptions as text but storing non textual multimedia data typically contained in media descriptions such as melody contours and object shapes textually and forcing applications to access and process such data as text is neither adequate nor efficient in this paper we therefore propose the typed document object model tdom data model for xml documents that can benefit from available schema definitions and represent the basic contents of document in typed fashion through these typed representations applications can access and work with multimedia data contained in mpeg media descriptions in way that is appropriate to the particular type of the data thereby tdom constitutes solid foundation for an xml database solution enabling the adequate management of mpeg media descriptions
many fully adaptive algorithms have been proposed for ary cubes over the past decade the performance characteristics of most of these algorithms have been analysed by means of software simulation only this paper proposes simple yet reasonably accurate analytical model to predict message latency in wormhole routed ary cubes with fully adaptive routing this model requires running time of which is the fastest model yet reported in the literature while maintaining reasonable accuracy
as prolific research area in data mining subspace clustering and related problems induced vast quantity of proposed solutions however many publications compare new proposition if at all with one or two competitors or even with so called naïve ad hoc solution but fail to clarify the exact problem definition as consequence even if two solutions are thoroughly compared experimentally it will often remain unclear whether both solutions tackle the same problem or if they do whether they agree in certain tacit assumptions and how such assumptions may influence the outcome of an algorithm in this survey we try to clarify the different problem definitions related to subspace clustering in general ii the specific difficulties encountered in this field of research iii the varying assumptions heuristics and intuitions forming the basis of different approaches and iv how several prominent solutions tackle different problems
sign language consists of two types of action signs and fingerspellings signs are dynamic gestures discriminated by continuous hand motions and hand configurations while fingerspellings are combination of continuous hand configurations sign language spotting is the task of detection and recognition of signs and fingerspellings in signed utterance the internal structures of signs and fingerspellings differ significantly therefore it is difficult to spot signs and fingerspellings simultaneously in this paper novel method for spotting signs and fingerspellings is proposed it can distinguish signs fingerspellings and non sign patterns and is robust to the various sizes scales and rotations of the signer’s hand this is achieved through hierarchical framework consisting of three steps candidate segments of signs and fingerspellings are discriminated using two layer conditional random field crf hand shapes of segmented signs and fingerspellings are verified using boostmap embeddings the motions of fingerspellings are verified in order to distinguish those which have similar hand shapes and different hand motions experiments demonstrate that the proposed method can spot signs and fingerspellings from utterance data at rates of and respectively
finding proper distribution of translation probabilities is one of the most important factors impacting the effectiveness of cross language information retrieval system in this paper we present new approach that computes translation probabilities for given query by using only bilingual dictionary and monolingual corpus in the target language the algorithm combines term association measures with an iterative machine learning approach based on expectation maximization our approach considers only pairs of translation candidates and is therefore less sensitive to data sparseness issues than approaches using higher grams the learned translation probabilities are used as query term weights and integrated into vector space retrieval system results for english german cross lingual retrieval show substantial improvements over baseline using dictionary lookup without term weighting
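The flavour of the iterative estimation can be sketched as follows: each query term's translation candidates are repeatedly re-weighted by how strongly they associate, in the target-language corpus, with the probability-weighted candidates of the other query terms, then renormalized. The dictionary entries and association scores below are invented, and the loop is a toy approximation of an EM-style procedure rather than a faithful reimplementation of the paper's algorithm.

```python
# bilingual dictionary and target-language association scores (illustrative only)
dictionary = {"bank": ["ufer", "bank"], "water": ["wasser"]}
assoc = {("ufer", "wasser"): 0.9, ("bank", "wasser"): 0.1,
         ("wasser", "ufer"): 0.9, ("wasser", "bank"): 0.1}

def estimate(query, iterations=10):
    # start from uniform translation probabilities per query term
    prob = {q: {c: 1.0 / len(dictionary[q]) for c in dictionary[q]} for q in query}
    for _ in range(iterations):
        new = {}
        for q in query:
            scores = {}
            for cand in dictionary[q]:
                s = 1e-9
                for other in query:
                    if other == q:
                        continue
                    # expected association with the other term's current candidates
                    s += sum(prob[other][oc] * assoc.get((cand, oc), 0.0)
                             for oc in dictionary[other])
                scores[cand] = prob[q][cand] * s
            total = sum(scores.values())
            new[q] = {c: v / total for c, v in scores.items()}
        prob = new
    return prob

print(estimate(["bank", "water"]))   # 'ufer' quickly dominates for 'bank'
```

Because only pairs of candidates are scored, the data requirements stay modest, which is the robustness-to-sparseness point the abstract makes.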
with the increasing use of web interfaces across organisations corporate and supporting applications has come dramatic increase in the number of users in the resulting systems along with this trend to connect more and more of an organization’s staff and clients together via web interfaces has been the rise of user centric design models which place user requirements higher on the priorities list in system design and also places user satisfaction as major performance and quality indicator in addition the use of the web in capturing and making use of knowledge management ideas especially with the help of collaborative tools is being experimented with and implemented in many organisations this paper presents and discusses both modelling of these changes and the effects on web application and systems development
traditional fault tolerance techniques typically utilize resources ineffectively because they cannot adapt to the changing reliability and performance demands of system this paper proposes software controlled fault tolerance concept allowing designers and users to tailor their performance and reliability for each situation several software controllable fault detection techniques are then presented swift software only technique and craft suite of hybrid hardware software techniques finally the paper introduces profit technique which adjusts the level of protection and performance at fine granularities through software control when coupled with software controllable techniques like swift and craft profit offers attractive and novel reliability options
this paper studies the aggregation of messages in networks that consist of chain of nodes and each message is time constrained such that it needs to be aggregated during given time interval called its due interval the objective is to minimize the maximum cost incurred at any node which is for example concern in wireless sensor networks where it is crucial to distribute the energy consumption as equally as possible first we settle the complexity of this problem by proving its np hardness even for the case of unit length due intervals second we give qptas which we extend to ptas for the special case that the lengths of the due intervals are constants this is in particular interesting since we prove that this problem becomes apx hard if we consider tree networks instead of chain networks even for the case of unit length due intervals specifically we show that it cannot be approximated within for any unless np
performance evaluation of peer to peer search techniques has been based on simple performance metrics such as message hop counts and total network traffic mostly disregarding their inherent concurrent nature where contention may arise this paper is concerned with the effect of contention in complex peer to peer network search focusing on techniques for multidimensional range search we evaluate peer to peer networks derived from recently proposed works introducing two novel metrics related to concurrency and contention namely responsiveness and throughput our results highlight the impact of contention on these networks and demonstrate that some studied networks do not scale in the presence of contention also our results indicate that certain network properties believed to be desirable eg uniform data distribution or peer accesses may not be as critical as previously believed
we study here the effect of concurrent greedy moves of players in atomic congestion games where selfish agents players wish to select resource each out of resources so that her selfish delay there is not much the problem of maintaining global progress while allowing concurrent play is exactly what is examined and answered here we examine two orthogonal settings game where the players decide their moves without global information each acting freely by sampling resources randomly and locally deciding to migrate if the new resource is better via random experiment here the resources can have quite arbitrary latency that is load dependent ii an organised setting where the players are pre partitioned into selfish groups coalitions and where each coalition does an improving coalitional move our work considers concurrent selfish play for arbitrary latencies for the first time also this is the first time where fast coalitional convergence to an approximate equilibrium is shown
message ordered multicast service delivers messages from multiple senders to multiple receivers preserving some ordering properties among the messages such as their sending sequence or possible causality relationship different applications require different ordering properties existing message ordered multicast protocols support only specific ordering property we describe method to realize different message order multicast services using network composed of switches and communication links with simple properties given these elements and their properties we derived common building blocks which have different ordering capabilities strong ordering or strong causal ordering these building blocks are very powerful and flexible tool to build more complex structures which are able to order the messages of multiple overlapping multicast groups
performance is nonfunctional software attribute that plays crucial role in wide application domains spreading from safety critical systems to commerce applications software risk can be quantified as combination of the probability that software system may fail and the severity of the damages caused by the failure in this paper we devise methodology for estimation of performance based risk factor which originates from violations of performance requirements namely performance failures the methodology elaborates annotated uml diagrams to estimate the performance failure probability and combines it with the failure severity estimate which is obtained using the functional failure analysis we are thus able to determine risky scenarios as well as risky software components and the analysis feedback can be used to improve the software design we illustrate the methodology on an commerce case study using step by step approach and then provide brief description of case study based on large real system
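One plausible way to write down the combination the methodology describes is shown below; the notation is illustrative rather than taken from the paper.

```latex
% p_i : estimated probability that scenario i violates its performance requirement
%       (derived from the annotated uml performance model)
% s_i : severity assigned to the failure of scenario i by the functional failure analysis
\[
  \mathrm{risk}_i \;=\; p_i \cdot s_i ,
  \qquad
  \mathrm{risk}_{\text{system}} \;=\; \max_i \, \mathrm{risk}_i
  \quad \text{(or a sum over scenarios, depending on the aggregation chosen)}
\]
```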
we describe here in detail our work toward creating dynamic lexicon from the texts in large digital library by leveraging small structured knowledge source word treebank we are able to extract selectional preferences for words from million word latin corpus this is promising news for low resource languages and digital collections seeking to leverage small human investment into much larger gain the library architecture in which this work is developed allows us to query customized subcorpora to report on lexical usage by author genre or era and allows us to continually update the lexicon as new texts are added to the collection
recent improvements in text entry error rate measurement have enabled the running of text entry experiments in which subjects are free to correct errors or not as they transcribe presented string in these unconstrained experiments it is no longer necessary to force subjects to unnaturally maintain synchronicity with presented text for the sake of performing overall error rate calculations however the calculation of character level error rates which can be trivial in artificially constrained evaluations is far more complicated in unconstrained text entry evaluations because it is difficult to infer subject’s intention at every character for this reason prior character level error analyses for unconstrained experiments have only compared presented and transcribed strings not input streams but input streams are rich sources of character level error information since they contain all of the text entered and erased by subject the current work presents an algorithm for the automated analysis of character level errors in input streams for unconstrained text entry evaluations it also presents new character level metrics that can aid method designers in refining text entry methods to exercise these metrics we perform two analyses on data from an actual text entry experiment one analysis available from the prior work uses only presented and transcribed strings the other analysis uses input streams as described in the current work the results confirm that input stream error analysis yields richer information for the same empirical data to facilitate the use of these new analyses we offer pseudocode and downloadable software for performing unconstrained text entry experiments and analyzing data
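To give a feel for what input-stream analysis adds, the sketch below applies standard stream-based bookkeeping (correct, incorrect-fixed, and incorrect-not-fixed characters) to a presented string, a transcribed string, and an input stream containing backspaces; the paper's own character-level algorithm and metrics are richer than this simplification, and equating fixed errors with backspace count is a deliberate shortcut.

```python
def edit_distance(a, b):
    """Levenshtein distance via a single rolling row."""
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[len(b)]

def error_rates(presented, transcribed, input_stream):
    INF = edit_distance(presented, transcribed)       # incorrect and not fixed
    F = input_stream.count("\b")                      # fix keystrokes (backspaces)
    IF = F                                            # entered wrong, then fixed (approximation)
    C = max(len(presented), len(transcribed)) - INF   # correct characters
    total = C + INF + IF
    return {"uncorrected": INF / total,
            "corrected": IF / total,
            "total": (INF + IF) / total}

# the user typed "qick", noticed the error, erased back and retyped correctly
print(error_rates("quick", "quick", "qick\b\b\buick"))
```

Comparing only presented and transcribed strings would report zero error here, while the input stream reveals the corrected mistakes.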
in this paper we study the critical transmitting range ctr for connectivity in mobile ad hoc networks we prove that $r_M = c\sqrt{\ln n/(\pi n)}$ for some constant $c$ where $r_M$ is the ctr in the presence of $M$-like node mobility and $n$ is the number of network nodes our result holds for an arbitrary mobility model $M$ such that $M$ is obstacle free and nodes are allowed to move only within certain bounded area we also investigate in detail the case of random waypoint mobility which is the most common mobility model used in the simulation of ad hoc networks denoting with $r_p$ the ctr with random waypoint mobility when the pause time is set to $p$ and the node velocity is set to $v$ we prove that $r_p$ is proportional to $\sqrt{\ln n/(\pi n)}$ in one regime of the pause time and grows asymptotically faster than $\sqrt{\ln n/n}$ in the other the results of our simulations also suggest that if $n$ is large enough $r_p$ is well approximated by a simple closed form expression involving $r$ where $r$ is the critical range in case of uniformly distributed nodes the results presented in this paper provide better understanding of the behavior of fundamental network parameter in the presence of mobility and can be used to improve the accuracy of mobile ad hoc network simulations
this paper describes method for the automatic inference of structural transfer rules to be used in shallow transfer machine translation mt system from small parallel corpora the structural transfer rules are based on alignment templates like those used in statistical mt alignment templates are extracted from sentence aligned parallel corpora and extended with set of restrictions which are derived from the bilingual dictionary of the mt system and control their application as transfer rules the experiments conducted using three different language pairs in the free open source mt platform apertium show that translation quality is improved as compared to word for word translation when no transfer rules are used and that the resulting translation quality is close to that obtained using hand coded transfer rules the method we present is entirely unsupervised and benefits from information in the rest of the modules of the mt system in which the inferred rules are applied
an increasing number of applications are being developed using distributed object computing doc middleware such as corba many of these applications require the underlying middleware operating systems and networks to provide dependable end to end quality of service qos support to enhance their efficiency predictability scalability and reliability the object management group omg which standardizes corba has addressed many of these application requirements individually in the real time corba rt corba and fault tolerant corba ft corba specifications though the implementations of rt corba are suitable for mission critical commercial or military distributed real time and embedded dre systems the usage of ft corba with rt corba implementations is not yet suitable for systems that have stringent simultaneous dependability and predictability requirements this paper provides three contributions to the study and evaluation of dependable corba middleware for performance sensitive dre systems first we provide an overview of ft corba and illustrate the sources of unpredictability associated with conventional ft corba implementations second we discuss the qos requirements of an important class of mission critical dre systems to show how these requirements are not well served by ft corba today finally we empirically evaluate new dependability strategies for ft corba that can help make the use of doc middleware for mission critical dre systems reality
the group key management is one of the most crucial problems in group communication in dynamic and large scale groups the overhead of key generating and key updating is usually relevant to the group size which becomes performance bottleneck in achieving scalability therefore scalable group key management protocol which is independent of group size is the basis for wide applications of group communication the paper proposes novel group key management protocol which designates untrusted routers over internet as transmitting nodes to organize key material transmitting tree for transmitting key material members in group that are partitioned into subgroups attach to different transmitting nodes and compute sek using received key material and own secret parameter the overhead of key management can be shared by the transmitting nodes which cannot reveal the data of group communications and the overhead for key management of each transmitting node is independent of the group size in addition the new protocol conduces to constant computation and communication overhead during key updating
number of studies have been written on sensor networks in the past few years due to their wide range of potential applications object tracking is an important topic in sensor networks and the limited power of sensor nodes presents numerous challenges to researchers previous studies of energy conservation in sensor networks have considered object movement behavior to be random however in some applications the movement behavior of an object is often based on certain underlying events instead of randomness completely moreover few studies have considered the real time issue in addition to the energy saving problem for object tracking in sensor networks in this paper we propose novel strategy named multi level object tracking strategy mlot for energy efficient and real time tracking of the moving objects in sensor networks by mining the movement log in mlot we first conduct hierarchical clustering to form hierarchical model of the sensor nodes second the movement logs of the moving objects are analyzed by data mining algorithm to obtain the movement patterns which are then used to predict the next position of moving object we use the multi level structure to represent the hierarchical relations among sensor nodes so as to achieve the goal of keeping track of moving objects in real time manner through experimental evaluation of various simulated conditions the proposed method is shown to deliver excellent performance in terms of both energy efficiency and timeliness
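As an illustration of the kind of movement-pattern prediction described above (not the paper's MLOT algorithm, which also builds a hierarchical clustering of nodes), the sketch below learns a first-order transition model from movement logs and predicts the most likely next sensor node; the node names and training logs are invented for the example.

```python
# Hedged sketch: a first-order Markov model over sensor-node IDs, learned from
# movement logs, used to predict the next node an object is likely to visit so
# that only that node's neighborhood needs to stay awake.
from collections import defaultdict

class NextNodePredictor:
    def __init__(self):
        # transition_counts[a][b] = how often a tracked object moved from node a to node b
        self.transition_counts = defaultdict(lambda: defaultdict(int))

    def train(self, movement_log):
        """movement_log: list of sensor-node IDs visited by one object, in order."""
        for prev, nxt in zip(movement_log, movement_log[1:]):
            self.transition_counts[prev][nxt] += 1

    def predict(self, current_node):
        """Most frequent successor of current_node in the logs, or None if unseen."""
        successors = self.transition_counts.get(current_node)
        if not successors:
            return None
        return max(successors, key=successors.get)

predictor = NextNodePredictor()
predictor.train(["n1", "n2", "n5", "n6"])
predictor.train(["n1", "n2", "n5", "n7"])
print(predictor.predict("n2"))  # -> "n5", so only n5's neighborhood is woken up
```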
this paper presents case study in modeling and verifying posix like file store for flash memory this work fits in the context of hoare’s verification challenge and in particular joshi and holzmann’s mini challenge to build verifiable file store we have designed simple robust file store and implemented it in the form of promela model test harness is used to exercise the file store in number of ways model checking technology has been extensively used to verify the correctness of our implementation distinguishing feature of our approach is the bounded exhaustive verification of power loss recovery
database management system dbms performs query optimization based on statistical information about data in the underlying data base out of date statistics may lead to inefficient query processing in the system existing solutions to this problem have some drawbacks such as heavy administrative burden high system load and tardy updates to overcome these drawbacks our new approach called the piggyback method is proposed in this paper the key idea is to piggyback some additional retrievals during the processing of user query in order to collect more up to date statistics the collected statistics are used to optimize the processing of subsequent queries to specify the piggybacked queries basic piggybacking operators are defined in this paper using the operators several types of piggybacking such as vertical horizontal mixed vertical and horizontal and multi query piggybacking are introduced statistics that can be obtained from different access methods by applying piggyback analysis during query processing are also studied in order to meet users different requirements for the associated overhead several piggybacking levels are suggested other related issues including initial statistics piggybacking time and parallelism are also discussed our analysis shows that the piggyback method is promising in improving the quality of query optimization in dbms as well as in reducing the user’s administrative burden for maintaining an efficient dbms
in this study the moving average autoregressive exogenous arx prediction model is combined with grey systems theory and rough set rs theory to create an automatic stock market forecasting and portfolio selection mechanism in the proposed approach financial data are collected automatically every quarter and are input to an arx prediction model to forecast the future trends of the collected data over the next quarter or half year period the forecast data is then reduced using gm model clustered using means clustering algorithm and then supplied to rs classification module which selects appropriate investment stocks by applying set of decision making rules finally grey relational analysis technique is employed to specify an appropriate weighting of the selected stocks such that the portfolio’s rate of return is maximized the validity of the proposed approach is demonstrated using electronic stock data extracted from the financial database maintained by the taiwan economic journal tej the predictive ability and portfolio results obtained using the proposed hybrid model are compared with those of gm prediction method it is found that the hybrid method not only has greater forecasting accuracy than the gm method but also yields greater rate of return on the selected stocks
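The ARX component can be illustrated with a small least-squares fit; this is a generic ARX(na, nb) sketch under assumed toy data and model orders, not the paper's calibrated model or its grey-theory and rough-set stages.

```python
# Hedged sketch: fitting a simple ARX(na, nb) model by ordinary least squares.
# y is the series to forecast (e.g., a quarterly financial indicator) and u is an
# exogenous input series; the orders and toy data below are illustrative only.
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Return (a, b) such that y[t] ~= sum_i a[i]*y[t-1-i] + sum_j b[j]*u[t-1-j]."""
    start = max(na, nb)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - 1 - i] for i in range(na)]
        past_u = [u[t - 1 - j] for j in range(nb)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:na], theta[na:]

def predict_next(y, u, a, b):
    """One-step-ahead forecast from the most recent observations."""
    return sum(a[i] * y[-1 - i] for i in range(len(a))) + \
           sum(b[j] * u[-1 - j] for j in range(len(b)))

y = [1.0, 1.2, 1.1, 1.4, 1.5, 1.7, 1.6, 1.9]   # toy target series
u = [0.5, 0.4, 0.6, 0.5, 0.7, 0.6, 0.8, 0.7]   # toy exogenous series
a, b = fit_arx(y, u)
print(predict_next(y, u, a, b))
```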
we have developed an unsupervised framework for simultaneously extracting and normalizing attributes of products from multiple web pages originated from different sites our framework is designed based on probabilistic graphical model that can model the page independent content information and the page dependent layout information of the text fragments in web pages one characteristic of our framework is that previously unseen attributes can be discovered from the clue contained in the layout format of the text fragments our framework tackles both extraction and normalization tasks by jointly considering the relationship between the content and layout information dirichlet process prior is employed leading to another advantage that the number of discovered product attributes is unlimited an unsupervised inference algorithm based on variational method is presented the semantics of the normalized attributes can be visualized by examining the term weights in the model our framework can be applied to wide range of web mining applications such as product matching and retrieval we have conducted extensive experiments from four different domains consisting of over web pages from over different web sites demonstrating the robustness and effectiveness of our framework
information based organizations depend upon computer databases and information systems for their ongoing operation and management information resource management irm is program of activities directed at making effective use of information technology within an organization these activities range from global corporate information planning to application system development operation and maintenance and support of end user computing numerous approaches to specific irm activities have been proposed they remain disjoint however and hence globally ineffective a significant reason for inability to integrate irm activities is the failure to adequately define the information resource what is it that must be effectively managed this paper addresses this issue it applies data modeling concepts to the problem of managing organizational information resources data model is developed to support and integrate the various irm activities this model formally defines the information resource and the data needed to manage it it provides basic ingredient for effective information resource management
long term search history contains rich information about user’s search preferences which can be used as search context to improve retrieval performance in this paper we study statistical language modeling based methods to mine contextual information from long term search history and exploit it for more accurate estimate of the query language model experiments on real web search data show that the algorithms are effective in improving search accuracy for both fresh and recurring queries the best performance is achieved when using clickthrough data of past searches that are related to the current query
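A minimal sketch of the general idea, assuming a simple unigram model and a fixed interpolation weight (the paper's actual estimators are more elaborate): smooth the current query's language model with one estimated from the user's long-term search history.

```python
# Hedged sketch: interpolate the query language model with a history-based model.
from collections import Counter

def unigram_lm(texts):
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def contextualized_query_lm(query, history_texts, lam=0.3):
    """p(w) = (1 - lam) * p(w | query) + lam * p(w | long-term history)."""
    p_query = unigram_lm([query])
    p_hist = unigram_lm(history_texts)
    vocab = set(p_query) | set(p_hist)
    return {w: (1 - lam) * p_query.get(w, 0.0) + lam * p_hist.get(w, 0.0)
            for w in vocab}

history = ["python pandas dataframe tutorial", "pandas groupby examples"]
print(contextualized_query_lm("pandas merge", history))
```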
finding all the occurrences of twig pattern specified by selection predicate on multiple elements in an xml document is core operation for efficient evaluation of xml queries holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestor descendant relationships in this paper we address the problem of efficient processing of holistic twig joins on all partly indexed xml documents in particular we propose an algorithm that utilizes available indices on element sets while it can be shown analytically that the proposed algorithm is as efficient as the existing state of the art algorithms in terms of worst case and cpu cost experimental results on various datasets indicate that the proposed index based algorithm performs significantly better than the existing ones especially when binary structural joins in the twig pattern have varying join selectivities
graph augmentation theory is general framework for analyzing navigability in social networks it is known that for large classes of graphs there exist augmentations of these graphs such that greedy routing according to the shortest path metric performs in polylogarithmic expected number of steps however it is also known that there are classes of graphs for which no augmentations can enable greedy routing according to the shortest path metric to perform better than log expected number of steps in fact the best known universal bound on the greedy diameter of arbitrary graph is essentially that is for any graph there is an augmentation such that greedy routing according to the shortest path metric performs in expected number of steps hence greedy routing according to the shortest path metric has at least two drawbacks first it is in general space consuming to encode locally the shortest path distances to all the other nodes and second greedy routing according to the shortest path metric performs poorly in some graphs we prove that using semimetrics of small stretch results in huge positive impact in both encoding space and efficiency of greedy routing more precisely we show that for any connected node graph and any integer there exist an augmentation of and semimetric on with stretch such that greedy routing according to performs in klogn expected number of steps as corollary we get that for any connected node graph there exist an augmentation of and semimetric on with stretch log such that greedy routing according to performs in polylogarithmic expected number of steps this latter semimetric can be encoded locally at every node using only polylogarithmic number of bits
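The greedy routing step itself is easy to state in code; the sketch below is a generic illustration in which dist may be the shortest-path metric or any small-stretch semimetric, and the tiny path graph and long-range link are invented for demonstration.

```python
# Hedged sketch of greedy routing on an augmented graph: at each step forward the
# message to the neighbor (including any long-range augmented link) that is
# closest to the target according to the distance function dist.
def greedy_route(graph, augmented_links, source, target, dist, max_steps=10_000):
    """graph / augmented_links: dict node -> iterable of neighbors; returns the path."""
    path, current = [source], source
    for _ in range(max_steps):
        if current == target:
            return path
        neighbors = set(graph[current]) | set(augmented_links.get(current, ()))
        nxt = min(neighbors, key=lambda v: dist(v, target))
        if dist(nxt, target) >= dist(current, target):
            return None  # stuck in a local minimum (possible when dist is a semimetric)
        path.append(nxt)
        current = nxt
    return None

# toy usage: path graph 0-1-2-3-4 with one long-range link 0 -> 3
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
augmented = {0: [3]}
print(greedy_route(graph, augmented, 0, 4, dist=lambda a, b: abs(a - b)))  # [0, 3, 4]
```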
we describe user study of large multi user interactive surface deployed for an initial period within real world setting the surface was designed to enable the sharing and exchange of wide variety of digital media the setting for the study was the common room of high school where students come together to mix socialize and collaborate throughout the day we report on how the students use the new technology within their own established communal space findings show that the system was used extensively by the students in variety of ways including sharing of photos video clips and websites and for facilitating social interaction we discuss how the interactive shared surface was appropriated by the students and introduced into their everyday lives in ways that both mirrored and extended their existing practices within the communal space
one approach to network emulation involves simulating virtual network with real time network simulator and providing an interface that enables interaction between real hosts and the virtual network this allows real protocols and applications to be tested in controlled and repeatable environment to reflect conditions of large networks such as the internet it is important that the emulation environment be scalable this paper examines improvements in scalability of the virtual network achieved through the use of parallel discrete event simulation and simulation abstraction using just parallel simulation techniques real time emulation performance of nearly million packet transmissions per second is achieved on processors for network model consisting of about nodes using both parallel simulation and abstraction techniques real time emulation performance of nearly million packet transmissions per second is achieved on processors for network model consisting of about nodes
distributed smart cameras dsc are an emerging technology for broad range of important applications including smart rooms surveillance entertainment tracking and motion analysis by having access to many views and through cooperation among the individual cameras these dscs have the potential to realize many more complex and challenging applications than single camera systems this article focuses on the system level software required for efficient streaming applications on single smart cameras as well as on networks of dscs embedded platforms with limited resources do not provide middleware services well known on general purpose platforms our software framework supports transparent intra and interprocessor communication while keeping the memory and computation overhead very low the software framework is based on publisher subscriber architecture and provides mechanisms for dynamically loading and unloading software components as well as for graceful degradation in case of software and hardware related faults the software framework has been completely implemented and tested on our embedded smart cameras consisting of an arm based network processor and several digital signal processors two case studies demonstrate the feasibility of our approach
experimental research in dependability has evolved over the past years accompanied by dramatic changes in the computing industry to understand the magnitude and nature of this evolution this paper analyzes industrial trends namely shifting error sources explosive complexity and global volume under each of these trends the paper explores research technologies that are applicable either to the finished product or artifact and the processes that are used to produce products the study gives framework to not only reflect on the research of the past but also project the needs of the future
web transaction data usually convey user task oriented behaviour pattern web usage mining technique is able to capture such informative knowledge about user task pattern from usage data with the discovered usage pattern information it is possible to recommend web user more preferred content or customized presentation according to the derived task preference in this paper we propose web recommendation framework based on discovering task oriented usage pattern with probabilistic latent semantic analysis plsa model the user intended tasks are characterized by the latent factors through probabilistic inference to represent the user navigational interests moreover the active user’s intuitive task oriented preference is quantized by the probabilities by which pages visited in current user session are associated with various tasks as well combining the identified task preference of current user with the discovered usage based web page categories we can present user more potentially interested or preferred web content the preliminary experiments performed on real world data sets demonstrate the usability and effectiveness of the proposed approach
when designing software programmers usually think in terms of modules that are represented as functions and classes but using existing configuration management systems programmers have to deal with versions and configurations that are organized by files and directories this is inconvenient and error prone since there is gap between handling source code and managing configurations we present framework for programming environments that handles versions and configurations directly in terms of the functions and classes in source code we show that with this framework configuration management issues in software reuse and cooperative programming become easier we also present prototype environment that has been developed to verify our ideas
one of the major challenges that visual tracking algorithms face nowadays is being able to cope with changes in the appearance of the target during tracking linear subspace models have been extensively studied and are possibly the most popular way of modelling target appearance we introduce linear subspace representation in which the appearance of face is represented by the addition of two approximately independent linear subspaces modelling facial expressions and illumination respectively this model is more compact than previous bilinear or multilinear approaches the independence assumption notably simplifies system training we only require two image sequences one facial expression is subject to all possible illuminations in one sequence and the face adopts all facial expressions under one particular illumination in the other this simple model enables us to train the system with no manual intervention we also revisit the problem of efficiently fitting linear subspace based model to target image and introduce an additive procedure for solving this problem we prove that matthews and baker’s inverse compositional approach makes smoothness assumption on the subspace basis that is equivalent to hager and belhumeur’s which worsens convergence our approach differs from hager and belhumeur’s additive and matthews and baker’s compositional approaches in that we make no smoothness assumptions on the subspace basis in the experiments conducted we show that the model introduced accurately represents the appearance variations caused by illumination changes and facial expressions we also verify experimentally that our fitting procedure is more accurate and has better convergence rate than the other related approaches albeit at the expense of slight increase in computational cost our approach can be used for tracking human face at standard video frame rates on an average personal computer
we present novel approach for interactive navigation in complex synthetic environments using path planning our algorithm precomputes global roadmap of the environment by using variant of randomized motion planning algorithm along with reachability based analysis at runtime our algorithm performs graph searching and automatically computes collision free and constrained path between two user specified locations it also enables local user steered exploration subject to motion constraints and integrates these capabilities in the control loop of interaction our algorithm only requires the scene geometry avatar orientation and parameters relating the avatar size to the model size the performance of the preprocessing algorithm grows as linear function of the model size we demonstrate its performance on two large environments power plant and factory room
problem solving methods psms describe the reasoning components of knowledge based systems as patterns of behavior that can be reused across applications while the availability of extensive problem solving method libraries and the emerging consensus on problem solving method specification languages indicate the maturity of the field number of important research issues are still open in particular very little progress has been achieved on foundational and methodological issues hence despite the number of libraries which have been developed it is still not clear what organization principles should be adopted to construct truly comprehensive libraries covering large numbers of applications and encompassing both task specific and task independent problem solving methods in this paper we address these fundamental issues and present comprehensive and detailed framework for characterizing problem solving methods and their development process in particular we suggest that psm development consists of introducing assumptions and commitments along three dimensional space defined in terms of problem solving strategy task commitments and domain knowledge assumptions individual moves through this space can be formally described by means of adapters in the paper we illustrate our approach and argue that our architecture provides answers to three fundamental problems related to research in problem solving methods what is the epistemological structure and what are the modeling primitives of psms how can we model the psm development process and how can we develop and organize truly comprehensive and manageable libraries of problem solving methods
realism in games is constantly improving with increased computing power available to games and as game players demand more visual realism therefore facial animation and particularly animated speech is becoming more prevalent in games we present survey of facial animation techniques suitable for computer games we break our discussion into two areas modeling and animation to model face method of representing the geometry is combined with parameterization that is used to specify new shape for that geometry changing the shape over time will create animation and we discuss methods for animating expressions as well as achieving lip synchronized speech
the role of structure in specifying designing analysing constructing and evolving software has been the central theme of our research in distributed software engineering this structural discipline dictates formalisms and techniques that are compositional components that are context independent and systems that can be constructed and evolved incrementally this extended abstract overviews our development of structural approach to engineering distributed software and gives indications of our future work which moves from explicit to implicit structural specification with the benefit of hindsight we attempt to give rational history to our research
instance based learning ibl so called memory based reasoning mbr is commonly used non parametric learning algorithm nearest neighbor nn learning is the most popular realization of ibl due to its usability and adaptability nn has been successfully applied to wide range of applications however in practice one has to set important model parameters only empirically the number of neighbors and weights to those neighbors in this paper we propose structured ways to set these parameters based on locally linear reconstruction llr we then employed sequential minimal optimization smo for solving quadratic programming step involved in llr for classification to reduce the computational complexity experimental results from classification and eight regression tasks were promising enough to merit further investigation not only did llr outperform the conventional weight allocation methods without much additional computational cost but also llr was found to be robust to the change of
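A rough sketch of an LLR-style weighting for k-NN, written as a simplified LLE-like least-squares problem rather than the paper's SMO-solved quadratic program; the regularization constant and toy data are assumptions.

```python
# Hedged sketch: reconstruct the query point as an affine combination of its k
# nearest neighbors and use the reconstruction weights in place of uniform or
# distance-based k-NN weights.
import numpy as np

def llr_weights(query, neighbors, reg=1e-3):
    """neighbors: (k, d) array. Solve min ||query - w @ neighbors||^2 s.t. sum(w) = 1."""
    diffs = neighbors - query                     # (k, d)
    gram = diffs @ diffs.T                        # (k, k) local Gram matrix
    gram += reg * np.trace(gram) * np.eye(len(neighbors))  # regularize for stability
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()                            # enforce the sum-to-one constraint

def llr_knn_predict(query, X, y, k=3):
    """Weighted k-NN regression with LLR weights."""
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    w = llr_weights(query, X[idx])
    return float(w @ y[idx])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0.0, 1.0, 1.0, 2.0, 4.0])
print(llr_knn_predict(np.array([0.5, 0.5]), X, y, k=3))
```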
this paper outlines method for solving the stereovision matching problem using edge segments as the primitives in stereovision matching the following constraints are commonly used epipolar similarity smoothness ordering and uniqueness we propose new matching strategy under fuzzy context in which such constraints are mapped the fuzzy context integrates both fuzzy clustering and fuzzy cognitive maps with such purpose network of concepts nodes is designed each concept represents pair of primitives to be matched each concept has associated fuzzy value which determines the degree of the correspondence the goal is to achieve high performance in terms of correct matches the main findings of this paper are reflected in the use of the fuzzy context that allows building the network of concepts where the matching constraints are mapped initially each concept value is loaded via the fuzzy clustering and then updated by the fuzzy cognitive maps framework this updating is achieved through the influence of the remainder neighboring concepts until good global matching solution is achieved under this fuzzy approach we gain quantitative and qualitative matching correspondences this method works as relaxation matching approach and its performance is illustrated by comparative analysis against some existing global matching methods
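The fuzzy cognitive map update underlying the relaxation can be sketched as follows; the weight matrix encoding the constraints and the initial concept values are invented for illustration, and the squashing function is assumed to be a sigmoid.

```python
# Hedged sketch of the fuzzy cognitive map (FCM) update step: each concept value
# (the confidence of one candidate pair of edge segments) is repeatedly updated
# from its neighboring concepts through a weight matrix W that encodes the
# matching constraints, then squashed back into [0, 1].
import math

def fcm_update(values, W, steps=50, tol=1e-4):
    """values: initial concept activations in [0, 1]; W[i][j]: influence of concept j on i."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    for _ in range(steps):
        new = [sigmoid(values[i] + sum(W[i][j] * values[j]
                                       for j in range(len(values)) if j != i))
               for i in range(len(values))]
        if max(abs(a - b) for a, b in zip(new, values)) < tol:
            return new
        values = new
    return values

# three candidate matches: the first two support each other, the third conflicts
W = [[0.0, 0.6, -0.4],
     [0.6, 0.0, -0.4],
     [-0.4, -0.4, 0.0]]
print(fcm_update([0.7, 0.6, 0.5], W))
```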
most proposed web prefetching techniques make predictions based on the historical references to requested objects in contrast this paper examines the accuracy of predicting user’s next action based on analysis of the content of the pages requested recently by the user predictions are made using the similarity of model of the user’s interest to the text in and around the hypertext anchors of recently requested web pages this approach can make predictions of actions that have never been taken by the user and potentially make predictions that reflect current user interests we evaluate this technique using data from full content log of web activity and find that textual similarity based predictions outperform simpler approaches
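A minimal sketch of content-based link prediction by textual similarity, assuming a bag-of-words interest model and cosine similarity (the paper's similarity model may differ); the example pages and anchors are made up.

```python
# Hedged sketch: rank the links on the current page by the cosine similarity
# between a bag-of-words model of the user's recent interests and the text in
# and around each hypertext anchor, then prefetch the top-ranked link(s).
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_links(interest_texts, anchors):
    """anchors: dict url -> anchor text plus surrounding words."""
    interest = Counter(w for t in interest_texts for w in t.lower().split())
    scored = {url: cosine(interest, Counter(text.lower().split()))
              for url, text in anchors.items()}
    return sorted(scored, key=scored.get, reverse=True)

recent_pages = ["gpu rendering pipeline tutorial", "shader optimization tips"]
anchors = {"/blog/gpu-profiling": "profiling the gpu rendering pipeline",
           "/blog/cooking": "weeknight pasta recipes"}
print(rank_links(recent_pages, anchors))  # the gpu link is ranked first
```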
assume you are given data population characterized by certain number of attributes assume moreover you are provided with the information that one of the individuals in this data population is abnormal but no reason whatsoever is given to you as to why this particular individual is to be considered abnormal in several cases you will be indeed interested in discovering such reasons this article is precisely concerned with this problem of discovering sets of attributes that account for the a priori stated abnormality of an individual within given dataset criterion is presented to measure the abnormality of combinations of attribute values featured by the given abnormal individual with respect to the reference population in this respect each subset of attributes is intended to somehow represent a property of individuals we distinguish between global and local properties global properties are subsets of attributes explaining the given abnormality with respect to the entire data population with local ones instead two subsets of attributes are singled out where the former one justifies the abnormality within the data subpopulation selected using the values taken by the exceptional individual on those attributes included in the latter one the problem of individuating abnormal properties with associated explanations is formally stated and analyzed such formal characterization is then exploited in order to devise efficient algorithms for detecting both global and local forms of most abnormal properties the experimental evidence which is accounted for in the article shows that the algorithms are both able to mine meaningful information and to accomplish the computational task by examining negligible fraction of the search space
producing small dnf expression consistent with given data is classical problem in computer science that occurs in number of forms and has numerous applications we consider two standard variants of this problem the first one is two level logic minimization or finding minimal dnf formula consistent with given complete truth table tt mindnf this problem was formulated by quine in and has been since one of the key problems in logic design it was proved np complete by masek in the best known polynomial approximation algorithm is based on reduction to the set cover problem and produces dnf formula of size O(d · opt) where d is the number of variables we prove that tt mindnf is np hard to approximate within d^γ for some constant γ > 0 establishing the first inapproximability result for the problem the other dnf minimization problem we consider is pac learning of dnf expressions when the learning algorithm must output dnf expression as its hypothesis referred to as proper learning we prove that dnf expressions are np hard to pac learn properly even when the learner has access to membership queries thereby answering long standing open question due to valiant finally we show that inapproximability of tt mindnf implies hardness results for restricted proper learning of dnf expressions with membership queries even when learning with respect to the uniform distribution only
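The set-cover reduction mentioned above can be illustrated with the standard greedy heuristic; the implicant names and the 3-variable truth table below are toy inputs, and a real minimizer would first enumerate prime implicants.

```python
# Hedged sketch of the set-cover view of two-level minimization: each candidate
# implicant covers a set of ON-set minterms, and the greedy set-cover heuristic
# repeatedly picks the implicant covering the most still-uncovered minterms,
# giving the classical logarithmic-factor approximation.
def greedy_dnf_cover(on_set, implicants):
    """on_set: set of minterms to cover; implicants: dict name -> set of covered minterms."""
    uncovered, chosen = set(on_set), []
    while uncovered:
        best = max(implicants, key=lambda t: len(implicants[t] & uncovered))
        if not implicants[best] & uncovered:
            raise ValueError("some ON-set minterm is not covered by any implicant")
        chosen.append(best)
        uncovered -= implicants[best]
    return chosen

# toy 3-variable example: minterms are integers 0..7, implicants are cubes
on_set = {1, 3, 5, 6, 7}
implicants = {"x3":      {1, 3, 5, 7},   # covers all minterms with x3 = 1
              "x1x2":    {6, 7},         # x1 = x2 = 1
              "x1x2'x3": {5}}
print(greedy_dnf_cover(on_set, implicants))  # -> ['x3', 'x1x2']
```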
rapid early but rough system prototypes are becoming standard and valued part of the user interface design process pen paper and tools like flash and director are well suited to creating such prototypes however in the case of physical forms with embedded technology there is lack of tools for developing rapid early prototypes instead the process tends to be fragmented into prototypes exploring forms that look like the intended product or explorations of functioning interactions that work like the intended product bringing these aspects together into full design concepts only later in the design process to help alleviate this problem we present simple tool for very rapidly creating functioning rough physical prototypes early in the design process supporting what amounts to interactive physical sketching our tool allows designer to combine exploration of form and interactive function using objects constructed from materials such as thumbtacks foil cardboard and masking tape enhanced with small electronic sensor board by means of simple and fluid tool for delivering events to screen clippings these physical sketches can then be easily connected to any existing or new program running on pc to provide real or wizard of oz supported functionality
antisocial networks are distributed systems based on social networking web sites that can be exploited by attackers and directed to carry out network attacks malicious users are able to take control of the visitors of social sites by remotely manipulating their browsers through legitimate web control functionality such as image loading html tags javascript instructions etc in this paper we experimentally show that social network web sites have the ideal properties to become attack platforms we start by identifying all the properties of facebook real world social network and then study how we can utilize these properties and transform it into an attack platform against any host connected to the internet towards this end we developed real world facebook application that can perform malicious actions covertly we experimentally measured its impact by studying how innocent facebook users can be manipulated into carrying out denial of service attack finally we explored other possible misuses of facebook and how they can be applied to other online social network web sites
what does it mean for profession to be considered mature how valid is the claim that software faults may be excused due to the immaturity of the field in giving that claim serious consideration one might assume that there are stages to maturity that maturity doesn’t arrive in the world fully formed if so an understanding of maturity may be found from the viewing of the differences across various professions in terms of stages of maturity perhaps signaled by how profession detects and handles faults the question thus becomes more refined are software professionals more or less mature than their counterparts in respective fields in regards to the detection and handling of faults which raises the previously begged but now follow up question to whom should software professionals be compared the down select for professions to choose for this comparison was straightforward first to disregard comparison with the physical sciences as one could make strong case that programming is nothing more than data and rules ones and zeros may represent any object on off true not true apples oranges aeolipiles and zeppelins and that rules on objects are infinitely mutable literally valid now and invalid one half tenth of millisecond later software is distinctively arbitrary where the physical sciences are not well except perhaps for the quantum and the astro in joining software with the soft sciences the likeliest candidates for comparison were identified as the fields of economics and law economics at first glance appears to be combination of mathematics and logic applied to finance and law appears to be combination of philosophy and logic applied to rules of conduct there also appears commonality with these particular soft sciences and software in the attributes of design professionals in the field of economics design models of the world in terms of money professionals in the field of law design models of the world in terms of behavioral control and software professionals design models for any purpose in any terms that one may choose to take software may be used to model both economics and law so why not compare software professionals to their counterparts in economics and law on further investigation in development of this text the rationale for this investigation hurt the premise for if one considered that software is applied logic then software has no reason to be considered an immature field logic and philosophy go back at least to the ancient greeks to aristotle if software is immature in the light of history then what would that say about the maturity of logic and philosophy hush you cynics this author began to have severe doubts that perhaps this whole line of investigation was naively misguided further investigation yielded additional insights that although maturity may be an interesting topic in its own right perhaps it wasn’t key to understanding software faults that perhaps instead it was the art of design design being common feature across software economics and law with this new direction in mind and then taking one step back for perspective perhaps the common feature across the professions could be the design of design and so this author meandered on down paths less traveled and more shadowed note the subtitle observing and describing all of interest and taking off yet again in directions oblique the instinct of authorial self restraint placed in competition with curiosity all tugged and pulled and fretted at this author the conflict of design choice reflected in an investigation of design choice oh how self similar deja vu all over again the themes of this paper that continued beyond the initial investigation of maturity are as follows study of games versus competition in design the limits of competition and the implications of these limits revisit of standing philosophical problems in computer science in particular chess searle’s chinese room and the turing test studied as competitions an exploration of the meta in design conclusions which were in the first draft imagined to be most unlikely given the initial premise but in revision became necessary and unavoidable
the cell be processor is heterogeneous multicore that contains one powerpc processor element ppe and eight synergistic processor elements spes each spe has small software managed local store applications must explicitly control all dma transfers of code and data between the spe local stores and the main memory and they must perform any coherence actions required for data transferred the need for explicit memory management together with the limited size of the spe local stores makes it challenging to program the cell be and achieve high performance in this paper we present the design and implementation of our comic runtime system and its programming model it provides the program with an illusion of globally shared memory in which the ppe and each of the spes can access any shared data item without the programmer having to worry about where the data is or how to obtain it comic is implemented entirely in software with the aid of user level libraries provided by the cell sdk for each read or write operation in spe code comic runtime function is inserted to check whether the data is available in its local store and to automatically fetch it if it is not we propose memory consistency model and programming model for comic in which the management of synchronization and coherence is centralized in the ppe to characterize the effectiveness of the comic runtime system we evaluate it with twelve openmp benchmark applications on cell be system and an smp like homogeneous multicore xeon
computers require formally represented information to perform computations that support users yet users who have needed such support have often proved to be unable or unwilling to formalize it to address this problem this article introduces an approach called incremental formalization in which first users express information informally and then the system aids them in formalizing it incremental formalization requires system architecture that integrates formal and informal representations and supports progressive formalization of information the system should have both tools to capture naturally available informal information and techniques to suggest possible formalizations of this information the hyper object substrate hos was developed to satisfy these requirements hos has been applied to number of problem domains including network design archeological site analysis and neuroscience education users have been successful in adding informal information and then later formalizing it incrementally with the aid of the system our experience with hos has reaffirmed the need for information spaces to evolve during use and has identified additional considerations in the design and instantiation of systems enabling and supporting incremental formalization
deductive database system prototype logicbase has been developed with an emphasis on efficient compilation and query evaluation of application oriented recursions in deductive databases the system identifies different classes of recursions and compiles recursions into chain or pseudo chain forms when appropriate queries posed to the compiled recursions are analyzed systematically with efficient evaluation plans generated and executed mainly based on chain based query evaluation method the system has been tested using sophisticated recursions and queries with satisfactory performance this paper introduces the general design principles and implementation techniques of the system and discusses its strength and limitations
this article presents two probabilistic models for answering ranking in the multilingual question answering qa task which finds exact answers to natural language question written in different languages although some probabilistic methods have been utilized in traditional monolingual answer ranking limited prior research has been conducted for answer ranking in multilingual question answering with formal methods this article first describes probabilistic model that predicts the probabilities of correctness for individual answers in an independent way it then proposes novel probabilistic method to jointly predict the correctness of answers by considering both the correctness of individual answers as well as their correlations as far as we know this is the first probabilistic framework that proposes to model the correctness and correlation of answer candidates in multilingual question answering and provide novel approach to design flexible and extensible system architecture for answer selection in multilingual qa an extensive set of experiments were conducted to show the effectiveness of the proposed probabilistic methods in english to chinese and english to japanese cross lingual qa as well as english chinese and japanese monolingual qa using trec and ntcir questions
the incidence of hard errors in cpus is challenge for future multicore designs due to increasing total core area even if the location and nature of hard errors are known priori either at manufacture time or in the field cores with such errors must be disabled in the absence of hard error tolerance while caches with their regular and repetitive structures are easily covered against hard errors by providing spare arrays or spare lines structures within core are neither as regular nor as repetitive previous work has proposed microarchitectural core salvaging to exploit structural redundancy within core and maintain functionality in the presence of hard errors unfortunately microarchitectural salvaging introduces complexity and may provide only limited coverage of core area against hard errors due to lack of natural redundancy in the core this paper makes case for architectural core salvaging we observe that even if some individual cores cannot execute certain operations cpu die can be instruction set architecture isa compliant that is execute all of the instructions required by its isa by exploiting natural cross core redundancy we propose using hardware to migrate offending threads to another core that can execute the operation architectural core salvaging can cover large core area against faults and be implemented by leveraging known techniques that minimize changes to the microarchitecture we show it is possible to optimize architectural core salvaging such that the performance on faulty die approaches that of fault free die assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder
model driven architecture mda promotes the development of software systems through successive building and generation of models improving the reusability of models applying the same principles to the area of agent oriented software engineering aose advances the ideas behind mda even more significantly due to the inherent adaptivity of software agents we describe an appropriate set of models originating from requirements specification and transformable to models understandable and executable by agents thus demonstrating an agent oriented model driven architecture amda approach in amda agents use hierarchical business knowledge models with business process rules at the top business rules to control policy and logic in the middle and base layer defining business concepts being externalised knowledge is easily configurable by human beings and applied by software agents real case study is used to illustrate the process the main advances over the object oriented mda are the addition of component dynamics ii the use of agent executable rule based business models and iii proposed higher level of abstraction with the direct representation of business requirements
analysis of range queries on spatial multidimensional data is both important and challenging most previous analysis attempts have made certain simplifying assumptions about the data sets and or queries to keep the analysis tractable as result they may not be universally applicable this paper proposes set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving range query the underlying philosophy behind these techniques is to maintain an auxiliary data structure called density file whose creation is one time cost which can be quickly consulted when the query is given the schemes differ in what information is kept in the density file how it is maintained and how this information is looked up it is shown that one of the proposed schemes called cumulative density cd gives very accurate results usually less than percent error using diverse suite of point and rectangular data sets that are uniform or skewed and wide range of query window parameters the estimation takes constant amount of time which is typically lower than percent of the time that it would take to execute the query regardless of data set or query window parameters
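A cumulative-density style estimator can be sketched with a 2-D prefix sum over a grid of bucket counts; this is an illustrative approximation (the bucket granularity, data bounds, and uniform toy data are assumptions), not the paper's exact CD scheme or its density-file maintenance strategy.

```python
# Hedged sketch: bucket the 2-D points into a grid, precompute a 2-D prefix sum
# (the "density file"), and answer a range query's selectivity with four lookups
# via inclusion-exclusion; query bounds are rounded down to bucket edges.
import numpy as np

class CumulativeDensity2D:
    def __init__(self, points, bins=32, bounds=((0.0, 1.0), (0.0, 1.0))):
        hist, self.xedges, self.yedges = np.histogram2d(
            points[:, 0], points[:, 1], bins=bins,
            range=[list(bounds[0]), list(bounds[1])])
        # cum[i, j] = number of points with x below edge i and y below edge j
        self.cum = np.zeros((bins + 1, bins + 1))
        self.cum[1:, 1:] = hist.cumsum(axis=0).cumsum(axis=1)
        self.total = len(points)

    def _count_below(self, x, y):
        i = np.clip(np.searchsorted(self.xedges, x, side="right") - 1, 0, len(self.xedges) - 1)
        j = np.clip(np.searchsorted(self.yedges, y, side="right") - 1, 0, len(self.yedges) - 1)
        return self.cum[i, j]

    def selectivity(self, x_lo, x_hi, y_lo, y_hi):
        count = (self._count_below(x_hi, y_hi) - self._count_below(x_lo, y_hi)
                 - self._count_below(x_hi, y_lo) + self._count_below(x_lo, y_lo))
        return count / self.total

rng = np.random.default_rng(0)
cd = CumulativeDensity2D(rng.random((10_000, 2)))
print(cd.selectivity(0.2, 0.5, 0.1, 0.4))  # ~0.09 for uniform data
```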
in this paper we show that generative models are competitive with and sometimes superior to discriminative models when both kinds of models are allowed to learn structures that are optimal for discrimination in particular we compare bayesian networks and conditional loglinear models on two nlp tasks we observe that when the structure of the generative model encodes very strong independence assumptions a la naive bayes a discriminative model is superior but when the generative model is allowed to weaken these independence assumptions via learning more complex structure it can achieve very similar or better performance than corresponding discriminative model in addition as structure learning for generative models is far more efficient they may be preferable for some tasks
this paper evaluates the impact of the parallel scheduling strategy on the performance of the file access in parallel file system for clusters of commodity computers clusterfile we argue that the parallel scheduling strategy should be seen as complement to other file access optimizations like striping over several servers non contiguous and collective our study is based on three simple decentralized parallel heuristics implemented inside clusterfile the measurements in real environment show that the performance of parallel file access may vary with as much as for writing and for reading with the employed heuristic and with the schedule block granularity
in this paper we consider the problem of analysing the shape of an object defined by polynomial equations in domain we describe regularity criteria which allow us to determine the topology of the implicit object in box from information on the boundary of this box such criteria are given for planar and space algebraic curves and for algebraic surfaces these tests are used in subdivision methods in order to produce polygonal approximation of the algebraic curves or surfaces even if it contains singular points we exploit the representation of polynomials in bernstein basis to check these criteria and to compute the intersection of edges or facets of the box with these curves or surfaces our treatment of singularities exploits results from singularity theory such as an explicit whitney stratification or the local conic structure around singularities few examples illustrate the behavior of the algorithms
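The Bernstein-basis machinery these subdivision methods rely on can be illustrated in the univariate case: convert from the power basis on [0, 1] and use the sign of the coefficients as an exclusion test for a cell; the conversion formula is the standard one and the example polynomials are toy inputs.

```python
# Hedged sketch: power-basis to Bernstein-basis conversion on [0, 1] and the
# usual sign test; if all Bernstein coefficients have the same strict sign, the
# polynomial has no root in the cell, so a subdivision method can discard it.
from math import comb

def power_to_bernstein(a):
    """a[i] is the coefficient of x**i; returns Bernstein coefficients on [0, 1]."""
    n = len(a) - 1
    return [sum(comb(k, i) / comb(n, i) * a[i] for i in range(k + 1))
            for k in range(n + 1)]

def certainly_no_root(a):
    b = power_to_bernstein(a)
    return all(c > 0 for c in b) or all(c < 0 for c in b)

# p(x) = x^2 - 3x + 2 = (x - 1)(x - 2): a root at x = 1 lies on the boundary of [0, 1]
print(power_to_bernstein([2.0, -3.0, 1.0]))   # [2.0, 0.5, 0.0]
print(certainly_no_root([2.0, -3.0, 1.0]))    # False: a root cannot be excluded
print(certainly_no_root([1.0, 1.0, 1.0]))     # True: x^2 + x + 1 > 0 on [0, 1]
```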
object invariants define the consistency of objects they have subtle semantics because of call backs multi object invariants and subclassing several visible state verification techniques for object invariants have been proposed it is difficult to compare these techniques and ascertain their soundness because of differences in restrictions on programs and invariants in the use of advanced type systems eg ownership types in the meaning of invariants and in proof obligations we develop unified framework for such techniques we distil seven parameters that characterise verification technique and identify sufficient conditions on these parameters which guarantee soundness we instantiate our framework with three verification techniques from the literature and use it to assess soundness and compare expressiveness
in component based software engineering the response time of an entire application is often predicted from the execution durations of individual component services however these execution durations are specific for an execution platform ie its resources such as cpu and for usage profile reusing an existing component on different execution platforms up to now required repeated measurements of the concerned components for each relevant combination of execution platform and usage profile leading to high effort this paper presents novel integrated approach that overcomes these limitations by reconstructing behaviour models with platform independent resource demands of bytecode components the reconstructed models are parameterised over input parameter values using platform specific results of bytecode benchmarking our approach is able to translate the platform independent resource demands into predictions for execution durations on certain platform we validate our approach by predicting the performance of file sharing application
we propose new dynamic method for multidimensional selectivity estimation for range queries that works accurately independent of data distribution good estimation of selectivity is important for query optimization and physical database design our method employs the multilevel grid file mlgf for accurate estimation of multidimensional data distribution the mlgf is dynamic hierarchical balanced multidimensional file structure that gracefully adapts to nonuniform and correlated distributions we show that the mlgf directory naturally represents multidimensional data distribution we then extend it for further refinement and present the selectivity estimation method based on the mlgf extensive experiments have been performed to test the accuracy of selectivity estimation the results show that estimation errors are very small independent of distributions even with correlated and or highly skewed ones finally we analyze the cause of errors in estimation and investigate the effects of various parameters on the accuracy of estimation
automatically generated lexers and parsers for programming languages have long history although they are well suited for many languages many widely used generators among them flex and bison fail to handle input stream ambiguities that arise in embedded languages in legacy languages and in programming by voice we have developed blender combined lexer and parser generator that enables designers to describe many classes of embedded languages and to handle ambiguities in spoken input and in legacy languages we have enhanced the incremental lexing and parsing algorithms in our harmonia framework to analyse lexical syntactic and semantic ambiguities the combination of better language description and enhanced analysis provides powerful platform on which to build the next generation of language analysis tools
support for optimistic parallelism such as thread level speculation tls and transactional memory tm has been proposed to ease the task of parallelizing software to exploit the new abundance of multicores key requirement for such support is the mechanism for tracking memory accesses so that conflicts between speculative threads or transactions can be detected existing schemes mainly track accesses at single fixed granularity ie at the word level cache line level or page level in this paper we demonstrate for hardware implementation of tls and corresponding speculatively parallelized specint benchmarks that the coarsest access tracking granularity that does not incur false violations varies significantly across applications within applications and across ranges of memory from word size to page size these results motivate variable granularity approach to access tracking and we show that with such an approach the number of memory ranges that must be tracked and compared to detect conflicts can be reduced by an order of magnitude relative to word level tracking without increasing false violations we are currently developing variable granularity implementations of both hardware based tls system and an stm system
we integrate propbank semantic role labels to an existing statistical parsing model producing richer output we show conclusive results on joint learning and inference of syntactic and semantic representations
wireless and mobile networks are an excellent playground for researchers with an algorithm background many research problems turn out to be variants of classic graph theory problems in particular the rapidly growing areas of ad hoc and sensor networks demand new solutions for timeless graph theory problems because i wireless devices have lower bandwidth and ii wireless devices are mobile and therefore the topology of the network changes rather frequently as a consequence algorithms for wireless and mobile networks should i have as little communication as possible and should ii run as fast as possible both goals can only be achieved by developing algorithms requiring a small number of communication rounds only so called local algorithms in this work we present a few algorithmic applications in wireless networking such as clustering topology control and geo routing each section is supplemented with an open problem
the influential pure embedding methodology of embedding domain specific languages dsls as libraries into general purpose host language forces the dsl designer to commit to single semantics this precludes the subsequent addition of compilation optimization or domain specific analyses we propose polymorphic embedding of dsls where many different interpretations of dsl can be provided as reusable components and show how polymorphic embedding can be realized in the programming language scala with polymorphic embedding the static type safety modularity composability and rapid prototyping of pure embedding are reconciled with the flexibility attainable by external toolchains
this paper addresses new cache organization in chip multiprocessors cmp environment we introduce nahalal an architecture whose novel floorplan topology partitions cached data according to its usage shared versus private data and thus enables fast access to shared data for all processors while preserving the vicinity of private data to each processor the nahalal architecture combines the best of both shared caches and private caches enabling fast accesses to data as in private caches while eliminating the need for inter cache coherence transactions detailed simulations in simics demonstrate that nahalal decreases cache access latency by up to compared to traditional cmp designs yielding performance gains of up to in run time
selecting random peer with uniform probability across peer to peer pp network is fundamental function for unstructured search data replication and monitoring algorithms such uniform sampling is supported by several techniques however current techniques suffer from sample bias and limited applicability in this paper we present sampling algorithm that achieves desired uniformity while making essentially no assumptions about the underlying pp network this algorithm called doubly stochastic converge dsc iteratively adjusts the probabilities of crossing each link in the network during random walk such that the resulting transition matrix is doubly stochastic dsc is fully decentralized and is designed to work on both directed and undirected topologies making it suitable for virtually any pp network our simulations show that dsc converges quickly on wide variety of topologies and that the random walks needed for sampling are short for most topologies in simulation studies with freepastry we show that dsc is resilient to high levels of churn while incurring minimal sample bias
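A sketch of why doubly stochastic transition probabilities yield uniform samples, using the classical Metropolis / max-degree construction on an undirected topology for illustration; DSC itself adjusts the crossing probabilities iteratively and also handles directed graphs, which this toy walk does not.

```python
# Hedged sketch: a symmetric (hence doubly stochastic) random walk whose
# stationary distribution is uniform, so the node reached after a long enough
# walk is an approximately uniform sample of the overlay.
import random

def metropolis_transition(graph, u):
    """graph: dict node -> list of neighbors. One step of a max-degree-style walk."""
    neighbors = graph[u]
    probs = {v: 1.0 / max(len(neighbors), len(graph[v])) for v in neighbors}
    probs[u] = 1.0 - sum(probs.values())   # self-loop absorbs the remaining probability mass
    r, acc = random.random(), 0.0
    for node, p in probs.items():
        acc += p
        if r < acc:
            return node
    return u

def uniform_sample(graph, start, walk_length=200):
    node = start
    for _ in range(walk_length):
        node = metropolis_transition(graph, node)
    return node

# star topology: a plain random walk would oversample the hub, this walk does not
graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
counts = {n: 0 for n in graph}
for _ in range(4000):
    counts[uniform_sample(graph, "a")] += 1
print(counts)  # roughly equal counts across hub, a, b, c
```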
with increasing amount of data being stored in xml format olap queries over these data become important olap queries have been well studied in the relational database systems however the evaluation of olap queries over xml data is not trivial extension of the relational solutions especially when schema is not available in this paper we introduce the ix cube iceberg xml cube over xml data to tackle the problem we extend olap operations to xml data we also develop efficient approaches to ix cube computation and olap query evaluation using ix cubes
this paper develops technique that uniquely combines the advantages of static scheduling and dynamic scheduling to reduce the energy consumed in modern superscalar processors with out of order issue logic in this hybrid scheduling paradigm regions of the application containing large amounts of parallelism visible at compile time completely bypass the dynamic scheduling logic and execute in low power static mode simulation studies using the wattch framework on several media and scientific benchmarks demonstrate large improvements in overall energy consumption of in kernels and in full applications with only performance degradation on average
dynamic kernel data have become an attractive target for kernel mode malware however previous solutions for checking kernel integrity either limit themselves to code and static data or can only inspect fraction of dynamic data resulting in limited protection our study shows that previous solutions may reach only of the dynamic kernel data and thus may fail to identify function pointers manipulated by many kernel mode malware to enable systematic kernel integrity checking in this paper we present kop system that can map dynamic kernel data with nearly complete coverage and nearly perfect accuracy unlike previous approaches which ignore generic pointers unions and dynamic arrays when locating dynamic kernel objects kop applies inter procedural points to analysis to compute all possible types for generic pointers eg void uses pattern matching algorithm to resolve type ambiguities eg unions and recognizes dynamic arrays by leveraging knowledge of kernel memory pool boundaries we implemented prototype of kop and evaluated it on windows vista sp system loaded with kernel drivers kop was able to accurately map of all the dynamic kernel data to demonstrate kop’s power we developed two tools based on it to systematically identify malicious function pointers and uncover hidden kernel objects our tools correctly identified all malicious function pointers and all hidden objects from nine real world kernel mode malware samples as well as one created by ourselves with no false alarms
literally hundreds of thousands of users of computer aided design cad tools are in the difficult process of transitioning to cad tools common problem for these users is disorientation in the abstract virtual environments that occur while developing new scenes to help address this problem we present novel in scene widget called the viewcube as orientation indicator and controller the viewcube is cube shaped widget placed in corner of the window when acting as an orientation indicator the viewcube turns to reflect the current view direction as the user re orients the scene using other tools when used as an orientation controller the viewcube can be dragged or the faces edges or corners can be clicked on to easily orient the scene to the corresponding view we conducted formal experiment to measure the performance of the viewcube comparing arcball style dragging using the viewcube for manual view switching clicking on face edge corner elements of the viewcube for automated view switching and clicking on dedicated row of buttons for automated view switching the results indicate that users prefer and are almost twice as fast at using the viewcube with dragging compared to clicking techniques independent of number of viewcube representations that we examined
successfully building model requires combination of expertise in the problem domain and in the practice of modeling and simulation model verification validation and testing vv are essential to the consistent production of models that are useful and correct there are significant communities of domain experts that build and use models without employing dedicated modeling specialists current modeling tools relatively underserve these communities particularly in the area of model testing and evaluation this paper describes several techniques that modeling tools can use to support the domain expert in performing vv and discusses the advantages and disadvantages of this approach to modeling
the primary implementations of aspectj to date are based on compile or load time weaving process that produces java byte code although this implementation strategy has been crucial to the adoption of aspectj it faces inherent performance constraints that stem from mismatch between java byte code and aspectj semantics we discuss these mismatches and show their performance impact on advice dispatch and we present machine code model that can be targeted by virtual machine jit compilers to alleviate this inefficiency we also present an implementation based on the jikes rvm which targets this machine code model performance evaluation with set of micro benchmarks shows that our machine code model provides improved performance over translation of advice dispatch to java byte code
the growing use of rfid in supply chains brings along an indisputable added value from the business perspective but raises number of new interesting security challenges one of them is the authentication of two participants of the supply chain that have possessed the same tagged item but that have otherwise never communicated before the situation is even more complex if we imagine that participants to the supply chain may be business competitors we present novel cryptographic scheme that solves this problem in our solution users exchange tags over the cycle of supply chain and if two entities have possessed the same tag they agree on secret common key they can use to protect their exchange of business sensitive information no rogue user can be successful in malicious authentication because it would either be traceable or it would imply the loss of secret key which provides strong incentive to keep the tag authentication information secret and protects the integrity of the supply chain we provide game based security proofs of our claims without relying on the random oracle model
we review the logic based modeling language prism and report recent developments including belief propagation by the generalized inside outside algorithm and generative modeling with constraints the former implies that prism subsumes belief propagation at the algorithmic level we also compare the performance of prism with state of the art systems in statistical natural language processing and probabilistic inference in bayesian networks respectively and show that prism is reasonably competitive
we present flexible architecture for trusted computing called terra that allows applications with wide range of security requirements to run simultaneously on commodity hardware applications on terra enjoy the semantics of running on separate dedicated tamper resistant hardware platform while retaining the ability to run side by side with normal applications on general purpose computing platform terra achieves this synthesis by use of trusted virtual machine monitor tvmm that partitions tamper resistant hardware platform into multiple isolated virtual machines vm providing the appearance of multiple boxes on single general purpose platform to each vm the tvmm provides the semantics of either an open box ie general purpose hardware platform like today’s pcs and workstations or closed box an opaque special purpose platform that protects the privacy and integrity of its contents like today’s game consoles and cellular phones the software stack in each vm can be tailored from the hardware interface up to meet the security requirements of its application the hardware and tvmm can act as trusted party to allow closed box vms to cryptographically identify the software they run ie what is in the box to remote parties we explore the strengths and limitations of this architecture by describing our prototype implementation and several applications that we developed for it
model driven and component based software development seems to be promising approach to handling the complexity and at the same time increasing the quality of software systems although the idea of assembling systems from pre fabricated components is appealing quality becomes major issue especially for embedded systems quality defects in one component might not affect the quality of the component but that of others this paper presents an integrated formal verification approach to ensure the correct behavior of embedded software components as well as case study that demonstrates its practical applicability the approach is based on the formalism of abstract components and their refinements with its focus being on interaction behavior among components the approach enables the identification of unanticipated design errors that are difficult to find and costly to correct using traditional verification methods such as testing and simulation
the traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections examples of named entities include organisations people locations or dates there are many research activities involving named entities we are interested in entity ranking in the field of information retrieval in this paper we describe our approach to identifying and ranking entities from the inex wikipedia document collection wikipedia offers number of interesting features for entity identification and ranking that we first introduce we then describe the principles and the architecture of our entity ranking system and introduce our methodology for evaluation our preliminary results show that the use of categories and the link structure of wikipedia together with entity examples can significantly improve retrieval effectiveness
in order to find all occurrences of a tree twig pattern in an xml database a number of holistic twig join algorithms have been proposed however most of these algorithms focus on identifying a larger query class or using a novel label scheme to reduce operations and ignore the deficiency of the root to leaf strategy in this paper we propose a novel twig join algorithm called track which adopts the opposite leaf to root strategy to process queries it brings us two benefits (i) avoiding spending too much time checking the element index to make sure all branches are satisfied before a new element comes and (ii) using the tree structure to encode final tree matches so as to avoid the merging process further experiments on diverse data sets show that our algorithm is indeed superior to current algorithms in terms of query processing performance
as the user’s document and application workspace grows more diverse supporting personal information management becomes increasingly important this trend toward diversity renders it difficult to implement systems which are tailored to specific applications file types or other information sources we developed seetrieve personal document retrieval and classification system which abstracts applications by considering only the text they present to the user through the user interface associating the visible text which surrounds document in time seetrieve is able to identify important information about the task within which document is used this context enables novel useful ways for users to retrieve their personal documents when compared to content based systems this context based retrieval achieved substantial improvements in document recall
software vendors collect bug reports from customers to improve the quality of their software these reports should include the inputs that make the software fail to enable vendors to reproduce the bug however vendors rarely include these inputs in reports because they may contain private user data we describe solution to this problem that provides software vendors with new input values that satisfy the conditions required to make the software follow the same execution path until it fails but are otherwise unrelated with the original inputs these new inputs allow vendors to reproduce the bug while revealing less private information than existing approaches additionally we provide mechanism to measure the amount of information revealed in an error report this mechanism allows users to perform informed decisions on whether or not to submit reports we implemented prototype of our solution and evaluated it with real errors in real programs the results show that we can produce error reports that allow software vendors to reproduce bugs while revealing almost no private information
an active object oriented knowledge base server can provide many desirable features for supporting wide spectrum of advanced and complex database applications knowledge rules which are used to define variety of database tasks to be performed automatically on the occurrence of some events often need much more sophisticated rule specification and control mechanisms than the traditional priority based mechanism to specify the control structural relationships and parallel execution properties among rules the underlying object oriented oo knowledge representation model must provide means to model the structural relationships among data entities and the control structures among rules in uniform fashion the transaction execution model must provide means to incorporate the execution of structured rules in transaction framework also parallel implementation of an active knowledge base server is essential to achieve the needed efficiency in processing nested transactions and rules in this paper we present the architecture implementation and performance of parallel active oo knowledge base server which has the following features first the server is developed based on an extended oo knowledge representation model that models rules as objects and their control structural relationships as association types this is analogous to the modeling of entities as objects and their structural relationships as association types thus entities and rules and their structures can be uniformly modeled second the server uses graph based transaction model that can naturally incorporate the control semantics of structured rules and guarantee the serializable execution of rules as subtransactions thus the rule execution model is uniformly integrated with that of transactions third it uses an asynchronous parallel execution model to process the graph based transactions and structured rules this server named osam kbms has been implemented on shared nothing multiprocessor system ncube to verify and evaluate the proposed knowledge representation model graph based transaction model and asynchronous parallel execution model
the increasing popularity of cloud storage is leading organizations to consider moving data out of their own data centers and into the cloud however success for cloud storage providers can present a significant risk to customers namely it becomes very expensive to switch storage providers in this paper we make a case for applying raid like techniques used by disks and file systems but at the cloud storage level we argue that striping user data across multiple providers can allow customers to avoid vendor lock in reduce the cost of switching providers and better tolerate provider outages or failures we introduce racs a proxy that transparently spreads the storage load over many providers we evaluate a prototype of our system and estimate the costs incurred and benefits reaped finally we use trace driven simulations to demonstrate how racs can reduce the cost of switching storage vendors for a large organization such as the internet archive by seven fold or more by varying erasure coding parameters
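A minimal sketch of the striping idea, assuming the simplest possible erasure code (k data shares plus one XOR parity, RAID-4/5 style); RACS itself supports general erasure-coding parameters and real provider back ends, which are omitted here. Each share would be written to a different storage provider, and any single unavailable provider can be tolerated.

    def stripe_with_parity(data: bytes, k: int):
        # Split data into k equal-size shares plus one XOR parity share;
        # any single missing share can later be reconstructed.
        pad = (-len(data)) % k
        data += bytes(pad)
        size = len(data) // k
        shares = [bytearray(data[i * size:(i + 1) * size]) for i in range(k)]
        parity = bytearray(size)
        for s in shares:
            for i, b in enumerate(s):
                parity[i] ^= b
        return [bytes(s) for s in shares] + [bytes(parity)], pad

    def recover(shares, missing_index):
        # Rebuild one lost share by XOR-ing the k surviving shares.
        size = len(next(s for s in shares if s is not None))
        rebuilt = bytearray(size)
        for idx, s in enumerate(shares):
            if idx == missing_index:
                continue
            for i, b in enumerate(s):
                rebuilt[i] ^= b
        return bytes(rebuilt)

    shares, pad = stripe_with_parity(b"archive object contents", k=4)
    shares[2] = None                      # pretend provider 2 is unavailable
    restored = recover(shares, 2)         # the lost share comes back intact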
recent technological advances have produced network interfaces that provide users with very low latency access to the memory of remote machines we examine the impact of such networks on the implementation and performance of software dsm specifically we compare two dsm systems cashmere and treadmarks on a processor dec alpha cluster connected by a memory channel network both cashmere and treadmarks use virtual memory to maintain coherence on pages and both use lazy multi writer release consistency the systems differ dramatically however in the mechanisms used to track sharing information and to collect and merge concurrent updates to a page with the result that cashmere communicates much more frequently and at much finer grain our principal conclusion is that low latency networks make dsm based on fine grain communication competitive with more coarse grain approaches but that further hardware improvements will be needed before such systems can provide consistently superior performance in our experiments cashmere scales slightly better than treadmarks for applications with false sharing at the same time it is severely constrained by limitations of the current memory channel hardware in general performance is better for treadmarks
learning to rank for information retrieval ir is task to automatically construct ranking model using training data such that the model can sort new objects according to their degrees of relevance preference or importance many ir problems are by nature ranking problems and many ir technologies can be potentially enhanced by using learning to rank techniques the objective of this tutorial is to give an introduction to this research direction specifically the existing learning to rank algorithms are reviewed and categorized into three approaches the pointwise pairwise and listwise approaches the advantages and disadvantages with each approach are analyzed and the relationships between the loss functions used in these approaches and ir evaluation measures are discussed then the empirical evaluations on typical learning to rank methods are shown with the letor collection as benchmark dataset which seems to suggest that the listwise approach be the most effective one among all the approaches after that statistical ranking theory is introduced which can describe different learning to rank algorithms and be used to analyze their query level generalization abilities at the end of the tutorial we provide summary and discuss potential future work on learning to rank
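To make the pairwise approach concrete, here is a toy RankSVM-style update rule; it is only an illustration of one of the many algorithms the tutorial surveys, and regularization, query grouping and the other loss functions are omitted. For every pair of documents of the same query whose labels differ, the weight vector is nudged until the more relevant document scores higher by a margin.

    import numpy as np

    def pairwise_hinge_updates(X, y, w, lr=0.1, margin=1.0):
        # One pass of pairwise training: for each pair (i, j) with y[i] > y[j],
        # penalize w if it does not score document i above document j by `margin`.
        for i in range(len(y)):
            for j in range(len(y)):
                if y[i] > y[j]:
                    s = w @ (X[i] - X[j])
                    if s < margin:                  # violated pair
                        w += lr * (X[i] - X[j])     # push the scores apart
        return w

    X = np.array([[1.0, 0.2], [0.4, 0.9], [0.1, 0.1]])   # toy feature vectors
    y = np.array([2, 1, 0])                              # graded relevance labels
    w = pairwise_hinge_updates(X, y, np.zeros(2))
    ranking = np.argsort(-(X @ w))                       # sort documents by learned score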
we present a framework for verifying that programs correctly preserve important data structure consistency properties results from our implemented system indicate that our system can effectively enable the scalable verification of very precise data structure consistency properties within complete programs our system treats both internal properties which deal with a single data structure implementation and external properties which deal with properties that involve multiple data structures a key aspect of our system is that it enables multiple analysis and verification packages to productively interoperate to analyze a single program in particular it supports the targeted use of very precise unscalable analyses in the context of a larger analysis and verification system the integration of different analyses in our system is based on a common set based specification language precise analyses verify that data structures conform to set specifications whereas scalable analyses verify relationships between data structures and preconditions of data structure operations there are several reasons why our system may be of interest in a broader program analysis and verification effort first it can ensure that the program satisfies important data structure consistency properties which is an important goal in and of itself second it can provide information that insulates other analysis and verification tools from having to deal directly with pointers and data structure implementations thereby enabling these tools to focus on the key properties that they are designed to analyze finally we expect other developers to be able to leverage its basic structuring concepts to enable the scalable verification of other program safety and correctness properties
enterprise resource planning erp is the technology that provides the unified business function to the organization by integrating the core processes erp now is experiencing the transformation that will make it highly integrated more intelligent more collaborative web enabled and even wireless the erp system is becoming the system with high vulnerability and high confidentiality in which the security is critical for it to operate many erp vendors have already integrated their security solution which may work well internally while in an open environment we need new technical approaches to secure an erp system this paper introduces erp technology from its evolution through architecture to its products the security solution in erp as well as directions for secure erp systems is presented
there has been an increasing interest in employing a decision theoretic framework for learner modeling and provision of pedagogical support in intelligent tutoring systems itss much of the existing learner modeling research work focuses on identifying appropriate learner properties little attention however has been given to leveraging a dynamic decision network ddn as a dynamic learner model to reason and intervene across time employing a ddn based learner model in a scientific inquiry learning environment however remains at an infant stage because there are factors that contribute to the performance of the learner model three factors have been identified to influence the matching accuracy of inqpro’s learner model these factors are the structure of the ddn model the variable instantiation approach and the dns in this research work a two phase empirical study involving learners and six domain experts was conducted to determine the optimal conditions for inqpro’s dynamic learner model the empirical results suggested each time slice of inqpro’s ddn should consist of a dn and that dn should correspond to the graphical user interface gui accessed in light of evidence observable variables should be instantiated to their observed states leaving the remaining observable nodes uninstantiated the empirical results also indicated that varying weights between two consecutive dns could optimize the matching accuracy of inqpro’s dynamic learner model
developing pervasive computing applications is difficult task because it requires to deal with wide range of issues heterogeneous devices entity distribution entity coordination low level hardware knowledge besides requiring various areas of expertise programming such applications involves writing lot of administrative code to glue technologies together and to interface with both hardware and software components this paper proposes generative programming approach to providing programming execution and simulation support dedicated to the pervasive computing domain this approach relies on domain specific language named diaspec dedicated to the description of pervasive computing systems our generative approach factors out features of distributed systems technologies making diaspec specified software systems portable the diaspec compiler is implemented and has been used to generate dedicated programming frameworks for variety of pervasive computing applications including detailed ones to manage the building of an engineering school
frequent itemset mining is often regarded as advanced querying where user specifies the source dataset and pattern constraints using given constraint model recently new problem of optimizing processing of sets of frequent itemset queries has been considered and two multiple query optimization techniques for frequent itemset queries mine merge and common counting have been proposed and tested on the apriori algorithm in this paper we discuss and experimentally evaluate three strategies for concurrent processing of frequent itemset queries using fp growth as basic frequent itemset mining algorithm the first strategy is mine merge which does not depend on particular mining algorithm and can be applied to fp growth without modifications the second is an implementation of the general idea of common counting for fp growth the last is completely new strategy motivated by identified shortcomings of the previous two strategies in the context of fp growth
pipeline flushes due to branch mispredictions are one of the most serious problems facing the designer of a deeply pipelined superscalar processor many branch predictors have been proposed to help alleviate this problem including two level adaptive branch predictors and hybrid branch predictors numerous studies have shown which predictors and configurations best predict the branches in a given set of benchmarks some studies have also investigated effects such as pattern history table interference that can be detrimental to the performance of these predictors however little research has been done on which characteristics of branch behavior make predictors perform well in this paper we investigate and quantify reasons why branches are predictable we show that some of this predictability is not captured by the two level adaptive branch predictors an understanding of the predictability of branches may lead to insights ultimately resulting in better or less complex predictors we also investigate and quantify what fraction of the branches in each benchmark is predictable using each of the methods described in this paper
in this paper we propose new graph based data structure and indexing to organize and retrieve video data several researches have shown that graph can be better candidate for modeling semantically rich and complicated multimedia data however there are few methods that consider the temporal feature of video data which is distinguishable and representative characteristic when compared with other multimedia ie images in order to consider the temporal feature effectively and efficiently we propose new graph based data structure called spatio temporal region graph strg unlike existing graph based data structures which provide only spatial features the proposed strg further provides temporal features which represent temporal relationships among spatial objects the strg is decomposed into its subgraphs in which redundant subgraphs are eliminated to reduce the index size and search time because the computational complexity of graph matching subgraph isomorphism is np complete in addition new distance measure called extended graph edit distance eged is introduced in both non metric and metric spaces for matching and indexing respectively based on strg and eged we propose new indexing method strg index which is faster and more accurate since it uses tree structure and clustering algorithm we compare the strg index with the tree which is popular tree based indexing method for multimedia data the strg index outperforms the tree for various query loads in terms of cost and speed
we describe framework for adding type qualifiers to language type qualifiers encode simple but highly useful form of subtyping our framework extends standard type rules to model the flow of qualifiers through program where each qualifier or set of qualifiers comes with additional rules that capture its semantics our framework allows types to be polymorphic in the type qualifiers we present const inference system for as an example application of the framework we show that for set of real programs many more consts can be used than are actually present in the original code
sentiment analysis of weblogs is challenging problem most previous work utilized semantic orientations of words or phrases to classify sentiments of weblogs the problem with this approach is that semantic orientations of words or phrases are investigated without considering the domain of weblogs weblogs contain the author’s various opinions about multifaceted topics therefore we have to treat semantic orientation domain dependently in this paper we present an unsupervised learning model based on aspect model to classify sentiments of weblogs our model utilizes domain dependent semantic orientations of latent variables instead of words or phrases and uses them to classify sentiments of weblogs experiments on several domains confirm that our model assigns domain dependent semantic orientations to latent variables correctly and classifies sentiments of weblogs effectively
the architecture herein advanced finds its rationale in the visual interpretation of data obtained from monitoring computers and computer networks with the objective of detecting security violations this new outlook on the problem may offer new and unprecedented techniques for intrusion detection which take advantage of algorithmic tools drawn from the realm of image processing and computer vision in the system we propose the normal interaction between users and network configuration is represented in the form of snapshots that refer to limited number of attack free instances of different applications based on the representations generated in this way library is built which is managed according to case based approach the comparison between the query snapshot and those recorded in the system database is performed by computing the earth mover’s distance between the corresponding feature distributions obtained through cluster analysis
soft errors induced by terrestrial radiation are becoming significant concern in architectures designed in newer technologies if left undetected these errors can result in catastrophic consequences or costly maintenance problems in different embedded applications in this article we focus on utilizing the compiler’s help in duplicating instructions for error detection in vliw datapaths the instruction duplication mechanism is further supported by hardware enhancement for efficient result verification which avoids the need of additional comparison instructions in the proposed approach the compiler determines the instruction schedule by balancing the permissible performance degradation and the energy constraint with the required degree of duplication our experimental results show that our algorithms allow the designer to perform trade off analysis between performance reliability and energy consumption
transaction processing is of growing importance for mobile computing booking tickets flight reservation banking epayment and booking holiday arrangements are just few examples for mobile transactions due to temporarily disconnected situations the synchronisation and consistent transaction processing are key issues serializability is too strong criteria for correctness when the semantics of transaction is known we introduce transaction model that allows higher concurrency for certain class of transactions defined by its semantic the transaction results are escrow serializable and the synchronisation mechanism is non blocking experimental implementation showed higher concurrency transaction throughput and less resources used than common locking or optimistic protocols
performance tools based on hardware counters can efficiently profile the cache behavior of an application and help software developers improve its cache utilization simulator based tools can potentially provide more insights and flexibility and model many different cache configurations but have the drawback of a large run time overhead we present statcache a performance tool based on a statistical cache model it has a small run time overhead while providing much of the flexibility of simulator based tools a monitor process running in the background collects sparse memory access statistics about the analyzed application running natively on a host computer generic locality information is derived and presented in a code centric and or data centric view we evaluate the accuracy and performance of the tool using ten spec cpu benchmarks we also exemplify how the flexibility of the tool can be used to better understand the characteristics of cache related performance problems
motivated by growing need for intelligent housing to accommodate ageing populations we propose novel application of intertransaction association rule iar mining to detect anomalous behaviour in smart home occupants an efficient mining algorithm that avoids the candidate generation bottleneck limiting the application of current iar mining algorithms on smart home data sets is detailed an original visual interface for the exploration of new and changing behaviours distilled from discovered patterns using new process for finding emergent rules is presented finally we discuss our observations on the emergent behaviours detected in the homes of two real world subjects
the problem of results merging in distributed information retrieval environments has gained significant attention the last years two generic approaches have been introduced in research the first approach aims at estimating the relevance of the documents returned from the remote collections through ad hoc methodologies such as weighted score merging regression etc while the other is based on downloading all the documents locally completely or partially in order to calculate their relevance both approaches have advantages and disadvantages download methodologies are more effective but they pose significant overhead on the process in terms of time and bandwidth approaches that rely solely on estimation on the other hand usually depend on document relevance scores being reported by the remote collections in order to achieve maximum performance in addition to that regression algorithms which have proved to be more effective than weighted scores merging algorithms need significant number of overlap documents in order to function effectively practically requiring multiple interactions with the remote collections the new algorithm that is introduced is based on adaptively downloading limited selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies thus it reconciles the above two approaches combining their strengths while minimizing their drawbacks achieving the limited time and bandwidth overhead of the estimation approaches and the increased effectiveness of the download the proposed algorithm is tested in variety of settings and its performance is found to be significantly better than the former while approximating that of the latter
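A hedged sketch of the estimation step, assuming hypothetical inputs remote_scores (per-collection scores for all returned documents), sampled_ids (the few documents actually downloaded from each collection) and centralized_scores (scores computed locally for the downloaded documents): a per-collection least-squares line maps remote scores to centralized scores, and the undownloaded documents are merged using the estimated values.

    import numpy as np

    def calibrate_and_merge(remote_scores, sampled_ids, centralized_scores):
        # For each remote collection, fit a linear map from its reported score
        # to the locally computed score using only the downloaded sample, then
        # estimate scores for the rest and merge everything into one ranking.
        merged = []
        for coll, scores in remote_scores.items():
            ids = sampled_ids[coll]                      # needs at least 2 samples
            x = np.array([scores[d] for d in ids])
            t = np.array([centralized_scores[(coll, d)] for d in ids])
            a, b = np.polyfit(x, t, 1)                   # least squares: t ~ a*x + b
            for doc, s in scores.items():
                est = centralized_scores.get((coll, doc), a * s + b)
                merged.append((est, coll, doc))
        merged.sort(reverse=True)                        # highest estimated relevance first
        return merged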
wireless sensor nodes are increasingly being tasked with computation and communication intensive functions while still subject to constraints related to energy availability on these embedded platforms once all low power design techniques have been explored duty cycling the various subsystems remains the primary option to meet the energy and power constraints this requires the ability to provide spurts of high mips and high bandwidth connections however due to the large overheads associated with duty cycling the computation and communication subsystems existing high performance sensor platforms are not efficient in supporting such an option in this paper we present the design and optimizations taken in wireless gateway node wgn that bridges data from wireless sensor networks to wi fi networks in an on demand basis we discuss our strategies to reduce duty cycling related costs by partitioning the system and by reducing the amount of time required to activate or deactivate the high powered components we compare the design choices and performance parameters with those made in the intel stargate platform to show the effectiveness of duty cycling on our platform we have built working prototype and the experimental results with two different power management schemes show significant reductions in latency and average power consumption compared to the stargate
the set of tcp congestion control algorithms associated with tcp reno eg slow start and congestion avoidance have been crucial to ensuring the stability of the internet algorithms such as tcp newreno which has been deployed and tcp vegas which has not been deployed represent incrementally deployable enhancements to tcp as they have been shown to improve tcp connection’s throughput without degrading performance to competing flows our research focuses on delay based congestion avoidance algorithms dca like tcp vegas which attempt to utilize the congestion information contained in packet round trip time rtt samples through measurement and simulation we show evidence suggesting that single deployment of dca ie tcp connection enhanced with dca algorithm is not viable enhancement to tcp over high speed paths we define several performance metrics that quantify the level of correlation between packet loss and rtt based on our measurement analysis we find that although there is useful congestion information contained within rtt samples the level of correlation between an increase in rtt and packet loss is not strong enough to allow tcp sender to reliably improve throughput while dca is able to reduce the packet loss rate experienced by connection in its attempts to avoid packet loss the algorithm will react unnecessarily to rtt variation that is not associated with packet loss the result is degraded throughput as compared to similar flow that does not support dca
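The kind of correlation metric such a study relies on can be pictured roughly as follows; this is a toy metric over assumed RTT and loss traces, not the paper's exact definitions. For each loss event, it checks whether any recent RTT sample exceeded a high percentile of all RTTs; the fraction of losses preceded by such an RTT rise quantifies how predictive RTT inflation is of loss.

    def loss_rtt_correlation(rtts, losses, window=5, pct=0.9):
        # rtts[i] is the RTT sample for packet i, losses[i] is True if packet i
        # was lost.  Returns the fraction of loss events whose preceding
        # `window` RTT samples contain a value above the pct-th percentile.
        threshold = sorted(rtts)[int(pct * (len(rtts) - 1))]
        loss_events = [i for i, lost in enumerate(losses) if lost]
        hits = 0
        for i in loss_events:
            recent = rtts[max(0, i - window):i]
            if any(r > threshold for r in recent):
                hits += 1
        return hits / len(loss_events) if loss_events else 0.0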
in this paper we describe an interactive artwork that uses large body gestures as its primary interactive mode the artist intends the work to provoke active reflection in the audience by way of gesture and content the technology is not the focus rather the aim is to provoke memory to elicit feelings of connective human experiences in an audience that is required to participate we find the work provokes a diverse and contradictory set of responses the methods used to understand this include qualitative methods common to evaluating interactive art works as well as in depth discussions with the artist herself this paper is relevant to the human centered computing track because in all stages of the design of the work as well as the evaluation the focus is on the human aspect the computing is designed to enable all too human responses
to prevent information leakage in multilevel secure data models the concept of polyinstantiation was inevitably introduced unfortunately when it comes to references through foreign key in multilevel relational data models the polyinstantiation causes referential ambiguities to resolve this problem this paper proposes an extended referential integrity semantics for multilevel relational data model multilevel secure referential integrity semantics mls ris the mls ris distinguishes foreign key into two types of references ie value based and entity based reference for each type it defines the referential integrity to be held between two multilevel relations and provides resolution rules for the referential ambiguities in addition the mls ris specifies the semantics of referential actions of the sql update operations so as to preserve the referential integrity
as the importance of recommender systems increases in combination with the explosion in data available over the internet and in our own digital libraries we suggest an alternative method of providing explicit user feedback we create tangible interface which will not only facilitate multitasking but provide an enjoyable way of completing an otherwise frustrating and perhaps tiresome task
since manual black box testing of gui based applications gaps is tedious and laborious test engineers create test scripts to automate the testing process these test scripts interact with gaps by performing actions on their gui objects an extra effort that test engineers put in writing test scripts is paid off when these scripts are run repeatedly unfortunately releasing new versions of gaps with modified guis breaks their corresponding test scripts thereby obliterating benefits of test automation we offer novel approach for maintaining and evolving test scripts so that they can test new versions of their respective gaps we built tool to implement our approach and we conducted case study with forty five professional programmers and test engineers to evaluate this tool the results show with strong statistical significance that users find more failures and report fewer false positives in test scripts with our tool than with flagship industry product and baseline manual approach our tool is lightweight and it takes less than eight seconds to analyze approximately kloc of test scripts
the interference map of an network is collection of data structures that can help heuristics for routing channel assignment and call admission in dense wireless networks the map can be obtained from detailed measurements which are time consuming and require network down time we explore methods and models to produce the interference map with reduced number of measurements by identifying interference properties that help to extrapolate complex measurements from simple measurements actual interference in an testbed is shown to follow certain regularities it is linear with respect to packet rate of the source packet rate of the interferer and shows independence among interferers when multiple cards are available they behave differently and even different channels of the same card have different performance we find that while current methods of gathering the interference map may be appropriate for characterizing interference in one card networks they are unscalable for multiple card networks when considering characteristics card and channel asymmetries time variation required downtime and complexity of the measurement procedure
more and more software projects use commercial off the shelf cots components although previous studies have proposed specific cots based development processes there are few empirical studies that investigate how to use and customize cots based development processes for different project contexts this paper describes an exploratory study of state of the practice of cots based development processes sixteen software projects in the norwegian it companies have been studied by structured interviews the results are that cots specific activities can be successfully incorporated in most traditional development processes such as waterfall or prototyping given proper guidelines to reduce risks and provide specific assistance we have identified four cots specific activities the build vs buy decision cots component selection learning and understanding cots components and cots component integration and one new role that of knowledge keeper we have also found special cots component selection activity for unfamiliar components combining internet searches with hands on trials the process guidelines are expressed as scenarios problems encountered and examples of good practice they can be used to customize the actual development processes such as in which lifecycle phase to put the new activities into such customization crucially depends on the project context such as previous familiarity with possible cots components and flexibility of requirements
a transient hardware fault occurs when an energetic particle strikes a transistor causing it to change state these faults do not cause permanent damage but may result in incorrect program execution by altering signal transfers or stored values while the likelihood that such transient faults will cause any significant damage may seem remote over the last several years transient faults have caused costly failures in high end machines at america online ebay and the los alamos neutron science center among others because susceptibility to transient faults is proportional to the size and density of transistors the problem of transient faults will become increasingly important in the coming decades this paper defines the first formal type theoretic framework for studying reliable computation in the presence of transient faults more specifically it defines λzap a lambda calculus that exhibits intermittent data faults in order to detect and recover from these faults λzap programs replicate intermediate computations and use majority voting thereby modeling software based fault tolerance techniques studied extensively but informally to ensure that programs maintain the proper invariants and use λzap primitives correctly the paper defines a type system for the language this type system guarantees that well typed programs can tolerate any single data fault to demonstrate that λzap can serve as an idealized typed intermediate language we define a type preserving translation from the standard simply typed lambda calculus into λzap
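The replicate-and-vote discipline that λzap formalizes can be sketched informally in ordinary code; this shows only the software fault-masking pattern, not the typed calculus itself. Each computation is executed three times and a majority vote masks any single corrupted result.

    def vote(a, b, c):
        # Majority vote over three replicas; masks one faulty copy and
        # detects the (unrecoverable) case where no majority exists.
        if a == b or a == c:
            return a
        if b == c:
            return b
        raise RuntimeError("no majority: more than one replica corrupted")

    def replicated(f):
        # Run a pure computation three times and vote on the result, in the
        # spirit of lambda-zap's software-only fault masking (a sketch only).
        def wrapper(*args):
            return vote(f(*args), f(*args), f(*args))
        return wrapper

    @replicated
    def checksum(xs):
        return sum(xs) % 251

    print(checksum([3, 5, 8, 13]))   # a single transient flip in one run is outvoted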
we present software based solution to the multi dimensional packet classification problem which can operate at high line speeds eg in excess of gbps using high end multi core desktop platforms available today our solution called storm leverages common notion that subset of rules are likely to be popular over short durations of time by identifying suitable set of popular rules one can significantly speed up existing software based classification algorithms key aspect of our design is in partitioning processor resources into various relevant tasks such as continuously computing the popular rules based on sampled subset of traffic fast classification for traffic that matches popular rules dealing with packets that do not match the most popular rules and traffic sampling our results show that by using single core xeon processor desktop platform it is possible to sustain classification rates of more than gbps for representative rule sets of size in excess of dimensional rules with no packet losses this performance is significantly superior to way implementation of state of the art packet classification software system running on the same core machine therefore we believe that our design of packet classification functions can be useful classification building block for routebricks style designs where core router might be constructed as mesh of regular desktop machines
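A rough sketch of the two-stage idea, under the simplifying assumption that rules are non-overlapping so checking a small popular-rule cache first cannot change which rule matches; the real system also dedicates cores to traffic sampling, popularity computation and the slow path, which is not modeled here. All names below are hypothetical.

    from collections import Counter

    class PopularRuleClassifier:
        # Check a short list of currently popular rules first, fall back to the
        # full (slower) rule set, and periodically recompute popularity.
        def __init__(self, rules, hot_size=8):
            self.rules = rules            # list of (match_fn, action), assumed non-overlapping
            self.hot = []                 # cached (index, rule) pairs for popular rules
            self.hits = Counter()
            self.hot_size = hot_size

        def classify(self, pkt):
            for idx, (match, action) in self.hot:         # fast path
                if match(pkt):
                    self.hits[idx] += 1
                    return action
            for idx, (match, action) in enumerate(self.rules):   # slow path
                if match(pkt):
                    self.hits[idx] += 1
                    return action
            return "default"

        def refresh_hot(self):
            # Promote the most frequently matched rules into the fast path.
            top = [i for i, _ in self.hits.most_common(self.hot_size)]
            self.hot = [(i, self.rules[i]) for i in top]
            self.hits.clear()

    rules = [(lambda p: p.get("dport") == 80, "web"),
             (lambda p: p.get("proto") == "udp", "udp")]
    clf = PopularRuleClassifier(rules)
    action = clf.classify({"proto": "tcp", "dport": 80})
    clf.refresh_hot()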
gradient domain processing is widely used to edit and combine images in this article we extend the framework in two directions first we adapt the gradient domain approach to operate on spherical domain to enable operations such as seamless stitching dynamic range compression and gradient based sharpening over spherical imagery an efficient streaming computation is obtained using new spherical parameterization with bounded distortion and localized boundary constraints second we design distributed solver to efficiently process large planar or spherical images the solver partitions images into bands streams through these bands in parallel within networked cluster and schedules computation to hide the necessary synchronization latency we demonstrate our contributions on several datasets including the digitized sky survey terapixel spherical scan of the night sky
the world wide web has been considered one of the important sources for information using search engines to retrieve web pages can gather lots of information including foreign information however to be better understood by local readers proper names in foreign language such as english are often transliterated to local language such as chinese due to different translators and the lack of translation standard translating foreign proper nouns may result in different transliterations and pose notorious headache in particular it may cause incomplete search results using one transliteration as query keyword will fail to retrieve the web pages which use different word as the transliteration consequently important information may be missed we present framework for mining synonymous transliterations as many as possible from the web for given transliteration the results can be used to construct database of synonymous transliterations which can be utilized for query expansion so as to alleviate the incomplete search problem experimental results show that the proposed framework can effectively retrieve the set of snippets which may contain synonymous transliterations and then extract the target terms most of the extracted synonymous transliterations have higher rank of similarity to the input transliteration compared to other noise terms
abstraction is commonly recognized as ubiquitous mechanism in human action conceptions about principles concepts and constructs of abstraction are however quite vague and divergent in the literature this paper proposes an ontology for abstraction composed of two inter related parts the first order abstraction defines concept things called primary things and their abstraction based relationships the second order abstraction also known as predicate abstraction involves predicates that characterize primary things the ontology covers four basic abstraction principles classification generalization composition and grouping for each of them key concepts and structural rules are defined and predicate derivation is discussed the ontology is also described in meta models in uml based ontology representation language we believe that the abstraction ontology can promote the achievement of shared understanding of abstraction principles and constructs predicate abstraction can also be used as foundation on which more sound systems of perspectives and viewpoints for database design and information systems development can be built
xml data sources are gaining popularity in the context of business intelligence and on line analytical processing olap applications due to the amenities of xml in representing and managing complex and heterogeneous data however xml native database systems currently suffer from limited performance both in terms of volumes of manageable data and query response time therefore recent research efforts are focusing on horizontal fragmentation techniques which are able to overcome the above limitations however classical fragmentation algorithms are not suitable for controlling the number of originated fragments which instead plays a critical role in data warehouses in this paper we propose the use of the k means clustering algorithm for effectively and efficiently supporting the fragmentation of very large xml data warehouses we complement our analytical contribution with a comprehensive experimental assessment where we compare the efficiency of our proposal against existing fragmentation algorithms
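A minimal k-means sketch of the fragmentation step, assuming each warehouse item is summarized by a numeric feature vector (for example a query-usage profile) and that the number of clusters k is exactly the number of fragments the designer wants; the paper's cost model and XML-specific details are omitted.

    import numpy as np

    def kmeans_fragments(features, k, iters=50, seed=0):
        # Assign each item to one of exactly k fragments by clustering its
        # feature vector; k directly controls the number of fragments produced.
        rng = np.random.default_rng(seed)
        centers = features[rng.choice(len(features), k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                members = features[assign == c]
                if len(members):
                    centers[c] = members.mean(axis=0)    # move center to cluster mean
        return assign

    profiles = np.random.default_rng(1).random((100, 6))   # toy usage profiles
    fragment_of = kmeans_fragments(profiles, k=4)          # fragment id per item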
increasingly applications need to be able to self reconfigure in response to changing requirements and environmental conditions autonomic computing has been proposed as a means for automating software maintenance tasks as the complexity of adaptive and autonomic systems grows designing and managing the set of reconfiguration rules becomes increasingly challenging and may produce inconsistencies this paper proposes an approach to leverage genetic algorithms in the decision making process of an autonomic system this approach enables a system to dynamically evolve reconfiguration plans at run time in response to changing requirements and environmental conditions a key feature of this approach is incorporating system and environmental monitoring information into the genetic algorithm such that specific changes in the environment automatically drive the evolutionary process towards new viable solutions we have applied this genetic algorithm based approach to the dynamic reconfiguration of a collection of remote data mirrors with the goal of minimizing costs while maximizing data reliability and network performance even in the presence of link failures
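A compact illustration of the evolutionary loop, with made-up link costs and a toy fitness function standing in for the monitoring-driven objectives (cost, data reliability, network performance) described in the paper; every name and weight here is hypothetical.

    import random

    def evolve_plan(num_links, fitness, generations=100, pop_size=20, pmut=0.05):
        # Tiny genetic algorithm over reconfiguration plans encoded as bit
        # vectors (link i active or not).  The fitness function can be rebuilt
        # from fresh monitoring data, so environment changes steer the search.
        pop = [[random.randint(0, 1) for _ in range(num_links)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[:pop_size // 2]
            children = []
            while len(children) < pop_size - len(survivors):
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, num_links)
                child = a[:cut] + b[cut:]                            # one-point crossover
                child = [g ^ (random.random() < pmut) for g in child]  # bit-flip mutation
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    link_cost = [3, 1, 2, 5, 2, 4]                  # hypothetical per-link costs
    def fitness(plan):
        reliability = sum(plan)                     # toy proxy: more links, more redundancy
        cost = sum(c for c, g in zip(link_cost, plan) if g)
        return 2.0 * reliability - 0.5 * cost       # weights would come from monitoring
    best_plan = evolve_plan(len(link_cost), fitness)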
the continual decrease in transistor size through either scaled cmos or emerging nano technologies promises to usher in an era of tera to peta scale integration however this decrease in size is also likely to increase defect densities contributing to the exponentially increasing cost of top down lithography bottom up manufacturing techniques like self assembly may provide viable lower cost alternative to top down lithography but may also be prone to higher defects therefore regardless of fabrication methodology defect tolerant architectures are necessary to exploit the full potential of future increased device densitiesthis paper explores defect tolerant simd architecture key feature of our design is the ability of large number of limited capability nodes with high defect rates up to to self organize into set of simd processing elements despite node simplicity and high defect rates we show that by supporting the familiar data parallel programming model the architecture can execute variety of programs the architecture efficiently exploits large number of nodes and higher device densities to keep device switching speeds and power density low on medium sized system cm area the performance of the proposed architecture on our data parallel programs matches or exceeds the performance of an aggressively scaled out of order processor wide reorder buffer perfect memory system for larger systems cm the proposed architecture can match the performance of chip multiprocessor with aggressively scaled out of order cores
the need of methodologies and software tools that ease the development of applications where distributed human or software agents search trade and negotiate resources is great on the other hand electronic institutions of multiple agents can play main role in the development of systems where normative specifications play vital role electronic institutions define the rules of the game in agent societies by fixing what agents are permitted and forbidden to do and under what circumstances in this paper we present case study on the use of specific tools supporting the specification analysis and execution of institutions for maritime chartering proposing an infrastructure for internet based virtual chartering markets mavcm
regression testing is verifying that previously functioning software remains correct after a change with the goal of finding a basis for further research in a joint industry academia research project we conducted a systematic review of empirical evaluations of regression test selection techniques we identified papers reporting empirical studies experiments and case studies in total techniques for regression test selection are evaluated we present a qualitative analysis of the findings an overview of techniques for regression test selection and related empirical evidence no technique was found clearly superior since the results depend on many varying factors we identified a need for empirical studies where concepts are evaluated rather than small variations in technical implementations
non rigid shape correspondence is fundamental and difficult problem most applications which require correspondence rely on manually selected markers without user assistance the performances of existing automatic correspondence methods depend strongly on good initial shape alignment or shape prior and they generally do not tolerate large shape variations we present an automatic feature correspondence algorithm capable of handling large non rigid shape variations as well as partial matching this is made possible by leveraging the power of state of the art mesh deformation techniques and relying on combinatorial tree traversal for correspondence search the search is deformation driven prioritized by self distortion energy measured on meshes deformed according to given correspondence we demonstrate the ability of our approach to naturally match shapes which differ in pose local scale part decomposition and geometric detail through numerous examples
assuring and evolving concurrent programs requires understanding the concurrency related design decisions used in their implementation in java style shared memory programs these decisions include which state is shared how access to it is regulated the roles of threads and the policy that distinguishes desired concurrency from race conditions these decisions rarely have purely local manifestations in code in this paper we use case studies from production java code to explore the costs and benefits of a new annotation based approach for expressing design intent our intent is both to assist in establishing thread safety attributes in code and to support tools that safely restructure code for example shifting critical section boundaries or splitting locks the annotations we use express mechanical properties such as lock state associations uniqueness of references and encapsulation of state into named aggregations our analyses revealed race conditions in our case study samples drawn from open source projects and library code the novel technical features of this approach include flexible encapsulation via aggregations of state that can cross object boundaries the association of locks with state aggregations policy descriptions for allowable method interleavings and the incremental process for inserting validating and exploiting annotations
current semantic web services lack reusability and conceptual separation between services and goals we propose unified architecture based on the principles of wsmf and upml we introduce goal and domain independent web services reuse is achieved through the use of bridges and refiners for goal web service and domain descriptions
we survey the current techniques to cope with the problem of string matching that allows errors this is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology we focus on online searching and mostly on the edit distance explaining the problem and its relevance its statistical behavior its history and current developments and the central ideas of the algorithms and their complexities we present a number of experiments to compare the performance of the different algorithms and show which are the best choices we conclude with some directions for future work and open problems
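Since the survey centers on the edit distance, the classic dynamic program is worth recalling; this is the standard textbook formulation, not any particular surveyed algorithm.

    def edit_distance(a: str, b: str) -> int:
        # Minimum number of insertions, deletions and substitutions turning a into b,
        # computed row by row to keep memory linear in len(b).
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # delete ca
                               cur[j - 1] + 1,               # insert cb
                               prev[j - 1] + (ca != cb)))    # substitute or match
            prev = cur
        return prev[-1]

    assert edit_distance("survey", "surgery") == 2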
millions of people access the plentiful web content to locate information that is of interest to them searching is the primary web access method for many users during search the users visit web search engine and use an interface to specify query typically comprising few keywords that best describes their information need upon query issuing the engine’s retrieval modules identify set of potentially relevant pages in the engine’s index and return them to the users ordered in way that reflects the pages relevance to the query keywords currently all major search engines display search results as ranked list of urls pointing to the relevant pages physical location on the web accompanied by the returned pages titles and small text fragments that summarize the context of search keywords such text fragments are widely known as snippets and they serve towards offering glimpse to the returned pages contents in general text snippets extracted from the retrieved pages are an indicator of the pages usefulness to the query intention and they help the users browse search results and decide on the pages to visit thus far the extraction of text snippets from the returned pages contents relies on statistical methods in order to determine which text fragments contain most of the query keywords typically the first two text nuggets in the page’s contents that contain the query keywords are merged together to produce the final snippet that accompanies the page’s title and url in the search results unfortunately statistically generated snippets are not always representative of the pages contents and they are not always closely related to the query intention such text snippets might mislead web users in visiting pages of little interest or usefulness to them in this article we propose snippet selection technique which identifies within the contents of the query relevant pages those text fragments that are both highly relevant to the query intention and expressive of the pages entire contents the motive for our work is to assist web users make informed decisions before clicking on page in the list of search results towards this goal we firstly show how to analyze search results in order to decipher the query intention then we process the content of the query matching pages in order to identify text fragments that highly correlate to the query semantics finally we evaluate the query related text fragments in terms of coherence and expressiveness and pick from every retrieved page the text nugget that highly correlates to the query intention and is also very representative of the page’s content thorough evaluation over large number of web pages and queries suggests that the proposed snippet selection technique extracts good quality text snippets with high precision and recall that are superior to existing snippet selection methods our study also reveals that the snippets delivered by our method can help web users decide on which results to click overall our study suggests that semantically driven snippet selection can be used to augment traditional snippet extraction approaches that are mainly dependent upon the statistical properties of words within text
we present method for measuring the semantic similarity of texts using corpus based measure of semantic word similarity and normalized and modified version of the longest common subsequence lcs string matching algorithm existing methods for computing text similarity have focused mainly on either large documents or individual words we focus on computing the similarity between two sentences or two short paragraphs the proposed method can be exploited in variety of applications involving textual knowledge representation and knowledge discovery evaluation results on two different data sets show that our method outperforms several competing methods
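As a sketch of the string-matching half of such a measure, the code below computes a length-normalized longest common subsequence score over word tokens (one common normalization squares the LCS length); the corpus-based word-similarity component described in the abstract is omitted, and the exact normalization used by the authors may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

def normalized_lcs(s1, s2):
    """Squared LCS length divided by the product of the sentence lengths."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    l = lcs_length(t1, t2)
    return (l * l) / (len(t1) * len(t2)) if t1 and t2 else 0.0

print(normalized_lcs("the cat sat on the mat", "a cat sat on a mat"))
```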
new heterogeneous multiprocessor platforms are emerging that are typically composed of loosely coupled components that exchange data using programmable interconnections the components can be cpus or dsps specialized ip cores reconfigurable units or memories to program such platform we use the process network pn model of computation the localized control and distributed memory are the two key ingredients of pn allowing us to program the platforms the localized control matches the loosely coupled components and the distributed memory matches the style of interaction between the components to obtain applications in pn format we have built the compaan compiler that translates affine nested loop programs into functionally equivalent pns in this paper we describe novel analytical translation procedure we use in our compiler that is based on integer linear programming the translation procedure consists of four main steps and we will present each step by describing the main idea involved followed by representative example
we are developing tools to support conversational metaphor for requirements definition and analysis our conversational model consists of three components hypertextual representation of requirements and their interrelations an issue based speech act model and typology of changes these components act together in model we call the inquiry cycle we discuss requirements analysis activities supported by the conversational model including information retrieval and navigation rationale management and agenda management we have implemented prototype active hypertext system and we have applied our model and implementation to the requirements for an atm banking system an example we use in the paper for illustration
the battery in contrast to other hardware is not governed by moore’s law in location aware computing power is very limited resource as consequence recently number of promising techniques in various layers have been proposed to reduce the energy consumption the paper considers the problem of minimizing the energy used to track the location of mobile user over wireless link in mobile computing energy efficient location update protocol works by sending as few location update messages as possible and by switching the radio off for as long as possible this can be achieved by the concept of mobility awareness we propose for this purpose this paper proposes novel mobility model called state based mobility model smm to provide more generalized framework for both describing the mobility and updating location information of complexly moving objects we also introduce the state based location update protocol slup based on this mobility model an extensive experiment on various synthetic datasets shows that the proposed method improves the energy efficiency by ~ times with an additional imprecision cost
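The sketch below is not SLUP itself; it only illustrates the underlying energy-saving idea of suppressing location-update messages while the terminal's motion still matches the last reported state, here with a simple position-plus-velocity (dead-reckoning style) state and a distance threshold. The class and parameter names are ours.

```python
import math

class DeadReckoningUpdater:
    """Toy location-update policy: the mobile reports a position and velocity
    state, both sides extrapolate it, and a new update message is sent only
    when the true position drifts from the shared prediction by more than
    `threshold` metres. Illustrative only -- not the SLUP protocol."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.state = None           # (t, x, y, vx, vy) last reported state
        self.updates_sent = 0

    def _predicted(self, t):
        t0, x, y, vx, vy = self.state
        return x + vx * (t - t0), y + vy * (t - t0)

    def observe(self, t, x, y, vx, vy):
        if self.state is None or math.dist((x, y), self._predicted(t)) > self.threshold:
            self.state = (t, x, y, vx, vy)
            self.updates_sent += 1   # radio is used only for this message
            return True              # update transmitted
        return False                 # stay silent, save energy
```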
access control features are often spread across and tangled with other functionality in design this makes modifying and replacing these features in design difficult aspect oriented modeling aom techniques can be used to support separation of access control concerns from other application design concerns using an aom approach access control features are described by aspect models and other application features are described by primary model composition of aspect and primary models yields design model in which access control features are integrated with other application features in this paper we present through an example an aom approach that supports verifiable composition of behaviors described in access control aspect models and primary models given an aspect model primary model and specified property the composition technique produces proof obligations as the behavioral descriptions in the aspect and primary models are composed one has to discharge the proof obligations to establish that the composed model has the specified property
crowd simulation techniques have frequently been used to animate large group of virtual humans in computer graphics applications we present data driven method of simulating crowd of virtual humans that exhibit behaviors imitating real human crowds to do so we record the motion of human crowd from an aerial view using camcorder extract the two dimensional moving trajectories of each individual in the crowd and then learn an agent model from observed trajectories the agent model decides each agent’s actions based on features of the environment and the motion of nearby agents in the crowd once the agent model is learned we can simulate virtual crowd that behaves similarly to the real crowd in the video the versatility and flexibility of our approach is demonstrated through examples in which various characteristics of group behaviors are captured and reproduced in simulated crowds
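As a rough illustration of a data-driven agent model (not the model learned in the paper), the sketch below stores feature/action pairs extracted from tracked trajectories and, during simulation, acts like the most similar recorded situation via a nearest-neighbour lookup; the feature encoding here is deliberately minimal and all names are ours.

```python
import numpy as np

class TrajectoryAgentModel:
    """Toy data-driven agent: remember (feature, action) examples taken from
    observed trajectories and, at simulation time, reproduce the action of
    the most similar observed situation (1-nearest-neighbour lookup)."""
    def __init__(self):
        self.features, self.actions = [], []

    def add_example(self, neighbour_offsets, velocity):
        # neighbour_offsets: relative positions of nearby agents; velocity: observed action
        self.features.append(np.asarray(neighbour_offsets, dtype=float).ravel())
        self.actions.append(np.asarray(velocity, dtype=float))

    def predict(self, neighbour_offsets):
        query = np.asarray(neighbour_offsets, dtype=float).ravel()
        dists = [np.linalg.norm(f - query) for f in self.features]
        return self.actions[int(np.argmin(dists))]
```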
we partition the time line in different ways for example into minutes hours days etc when reasoning about relations between events and processes we often reason about their location within such partitions for example happened yesterday and happened today consequently and are disjoint reasoning about these temporal granularities so far has focussed on temporal units relations between minute hour slots we shall argue in this paper that in our representations and reasoning procedures we need to take into account that events and processes often lie skew to the cells of our partitions for example 'happened yesterday' does not mean that started at am and ended pm this has the consequence that our descriptions of temporal location of events and processes are often approximate and rough in nature rather than exact and crisp in this paper we describe representation and reasoning methods that take the approximate character of our descriptions and the resulting limits on the granularity of our knowledge explicitly into account
we present the design implementation and evaluation of fully distributed directory service for farsite logically centralized file system that is physically implemented on loosely coupled network of desktop computers prior to this work the farsite system included distributed mechanisms for file content but centralized mechanisms for file metadata our distributed directory service introduces tree structured file identifiers that support dynamically partitioning metadata at arbitrary granularity recursive path leases for scalably maintaining name space consistency and protocol for consistently performing operations on files managed by separate machines it also mitigates metadata hotspots via file field leases and the new mechanism of disjunctive leases we experimentally show that farsite can dynamically partition file system metadata while maintaining full file system semantics
formulating appropriate and effective queries has been regarded as challenging issue since large number of candidate words or phrases could be chosen as query terms to convey users information needs in this paper we propose an approach to rank set of given query terms according to their effectiveness wherein top ranked terms will be selected as an effective query our ranking approach exploits and benefits from the underlying relationship between the query terms and thereby the effective terms can be properly combined into the query two regression models which capture rich set of linguistic and statistical properties are used in our approach experiments on ntcir ad hoc retrieval tasks demonstrate that the proposed approach can significantly improve retrieval performance and can be well applied to other problems such as query expansion and querying by text segments
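A minimal sketch of the overall pipeline, assuming a regression model trained on per-term features: the feature set, the toy training numbers, and the use of scikit-learn's LinearRegression are our own illustrative choices, not the two regression models used in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-term features, e.g. [idf, number_of_words, is_noun_phrase],
# and made-up effectiveness scores used purely to make the sketch runnable.
X_train = np.array([[3.2, 1, 1], [0.4, 1, 0], [2.8, 2, 1], [1.1, 1, 0]])
y_train = np.array([0.62, 0.05, 0.55, 0.18])

model = LinearRegression().fit(X_train, y_train)

def select_effective_terms(candidates, features, k=2):
    """Rank candidate query terms by predicted effectiveness, keep the top k."""
    scores = model.predict(np.asarray(features, dtype=float))
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [term for term, _ in ranked[:k]]

print(select_effective_terms(["apple", "the", "fruit juice", "buy"],
                             [[3.0, 1, 1], [0.3, 1, 0], [2.5, 2, 1], [1.0, 1, 0]]))
```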
this paper describes webflow an environment that supports distributed coordination services on the world wide web webflow leverages the http web transport protocol and consists of number of tools for the development of applications that require the coordination of multiple distributed servers typical applications of webflow include distributed document workspaces inter intra enterprise workflow and electronic commerce in this paper we describe the general webflow architecture for distributed coordination and then focus on the environment for distributed workflow
we propose new functional framework for modeling querying and reasoning about olap databases the framework represents data data cubes and dimensional hierarchies and querying constructs as first order and second order functional symbols respectively polymorphic attribute based type system is used to annotate the functional symbols with proper type information furthermore semantic knowledge about the functional symbols such as the properties of dimensional hierarchical structures and algebraic identities among query constructs can be specified by equations which permits equational reasoning on equivalence of olap queries and generalized summarizability of aggregate views
solar sol for advanced reasoning is first order clausal consequence finding system based on the sol skip ordered linear tableau calculus the ability to find non trivial consequences of an axiom set is useful in many applications of artificial intelligence such as theorem proving query answering and nonmonotonic reasoning sol is connection tableau calculus which is complete for finding the non subsumed consequences of clausal theory solar is an efficient implementation of sol that employs several methods to prune away redundant branches of the search space this paper introduces some of the key pruning and control strategies implemented in solar and demonstrates their effectiveness on collection of benchmark problems
response to large scale emergencies is cooperative process that requires the active and coordinated participation of variety of functionally independent agencies operating in adjacent regions in practice this essential cooperation is sometimes not attained or is reduced due to poor information sharing non fluent communication flows and lack of coordination we report an empirical study of it mediated cooperation among spanish response agencies and we describe the challenges of adoption information sharing communication flows and coordination among agencies that do not share unity of command we analyze three strategies aimed at supporting acceptance and surmounting political organizational and personal distrust or skepticism participatory design advanced collaborative tools inducing cognitive absorption and end user communities of practice
in this paper we propose new approach for designing distributed systems to survive internet catastrophes called informed replication and demonstrate this approach with the design and evaluation of cooperative backup system called the phoenix recovery service informed replication uses model of correlated failures to exploit software diversity the key observation that makes our approach both feasible and practical is that internet catastrophes result from shared vulnerabilities by replicating system service on hosts that do not have the same vulnerabilities an internet pathogen that exploits vulnerability is unlikely to cause all replicas to fail to characterize software diversity in an internet setting we measure the software diversity of host operating systems and network services in large organization we then use insights from our measurement study to develop and evaluate heuristics for computing replica sets that have number of attractive features our heuristics provide excellent reliability guarantees result in low degree of replication limit the storage burden on each host in the system and lend themselves to fully distributed implementation we then present the design and prototype implementation of phoenix and evaluate it on the planetlab testbed
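A greedy sketch of the informed-replication idea (not the paper's actual heuristics): given each candidate host's software configuration, repeatedly pick the host that shares the fewest software attributes with the replicas chosen so far, so that a single exploited vulnerability is unlikely to hit every replica. Host names and attributes are invented for illustration.

```python
def diverse_replica_set(hosts, replicas_needed):
    """Greedy selection of hosts with minimally overlapping software.
    `hosts` maps host id -> set of software attributes (OS, services)."""
    chosen, used_software = [], set()
    candidates = dict(hosts)
    while candidates and len(chosen) < replicas_needed:
        # pick the host sharing the fewest attributes with replicas so far
        host = min(candidates, key=lambda h: len(candidates[h] & used_software))
        used_software |= candidates.pop(host)
        chosen.append(host)
    return chosen

hosts = {
    "a": {"windows", "iis"},
    "b": {"linux", "apache"},
    "c": {"windows", "apache"},
    "d": {"bsd", "nginx"},
}
print(diverse_replica_set(hosts, replicas_needed=3))   # -> ['a', 'b', 'd']
```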
this paper presents an iterative technique to accurately reverse engineer models of the behaviour of software systems key novelty of the approach is the fact that it uses model based testing to refine the hypothesised model the process can in principle be entirely automated and only requires very small amount of manually generated information to begin with we have implemented the technique for use in the development of erlang systems and describe both the methodology as well as our implementation
the notion of tree projection provides natural generalization for various structural decomposition methods which have been proposed in the literature in order to single out classes of nearly acyclic hypergraphs in this paper the mathematical properties of the notion of tree projection are surveyed and the complexity of the basic tree projection problem of deciding the existence of tree projection is pinpointed in more detail game theoretic characterization in terms of the robber and captain game for tree projections is described which yields simple argument for the membership in np of the tree projection problem eventually the main ideas proposed in and underlying the proof of np hardness of the tree projection problem are discussed
when visiting cities as tourists most users intend to explore the area looking for interesting things to see or for information about places events and so on to inform user choice an adaptive information system should provide contextual information information clustering and comparative presentation of objects of potential interest in the area where the user is located to this aim we developed system called mymap able to generate personalized presentation of objects of interest starting from an annotated city map mymap combines context and user modeling with natural language generation for suggesting to the user what could be interesting to see and do using as interaction metaphor an annotated tourist map an evaluation study has shown that the quality of the generated description is adequate compared with human written descriptions
real time applications in wireless networks are emerging in multimedia product and design however conventional real time message scheduling algorithms generally do not take energy efficiency into account when making scheduling decisions in this paper we address the issue of scheduling real time messages in wireless networks subject to timing and power constraints novel message scheduling scheme or parm power aware real time message is developed to generate optimal schedules that minimize both power consumption and the probability of missing deadlines for real time messages with power aware scheduling policy in place the proposed parm scheme is very energy efficient in addition we extended power consumption model to calculate power consumption rates in accordance with message transmission rates experimental results show that parm significantly improves the performance in terms of missed rate energy efficiency and overall performance over four baseline message scheduling schemes
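The toy ranking below is not the PARM scheme; it only illustrates the trade-off the abstract describes, ordering pending messages by a weighted combination of normalized deadline urgency and normalized transmission energy. The weights and message tuples are hypothetical.

```python
def schedule_messages(messages, w_deadline=0.7, w_energy=0.3):
    """Toy power-aware real-time ordering. Each message is a tuple
    (id, deadline, energy_cost); lower combined score is sent earlier, so
    urgent messages go first but, among comparable deadlines, cheaper
    transmissions win. Illustrative only -- not the PARM algorithm."""
    max_d = max(m[1] for m in messages) or 1
    max_e = max(m[2] for m in messages) or 1
    scored = [(w_deadline * (d / max_d) + w_energy * (e / max_e), mid)
              for mid, d, e in messages]
    return [mid for _, mid in sorted(scored)]

print(schedule_messages([("m1", 10, 5.0), ("m2", 3, 8.0), ("m3", 12, 1.0)]))
```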
thin client diagramming tools provide number of advantages over traditional thick client design tools but are challenging to build we describe an extension to thick client meta tool that allows any specified diagram editor to be realised as thin client tool set of server side components interact with the thick client tool to generate gif or svg diagrams for both display and editing in conventional web browser we describe the motivation for our work our novel architecture illustrate and discuss interaction issues with the generated diagrams and describe evaluations of the effectiveness of our approach
behaviors play an important role in relationship semantics in this paper we present how behavioral aspects of structures are conceived in callimachus structural computing environment callimachus supports the definition of behavioral designs called propagation templates that assist in addressing behavioral concerns of structures within structure servers propagation templates provide higher level of abstraction and signify an attempt to move from an atom based view of behaviors to system and pattern based view
the lion's share of the software faults can be traced to requirements and specification errors so improvements in requirements engineering can have large impact on the effectiveness of the overall system development process weak link in the chain is the transition from the vague and informal needs of system stakeholders to the formal models that support theoretical analysis and software tools this paper explains the context for the monterey workshop that was dedicated to this problem it provides the case study that participants were asked to use to illustrate their new methods and summarizes the discussion and conclusions of the workshop
to date most association rule mining algorithms have assumed that the domains of items are either discrete or in limited number of cases hierarchical categorical or linear this constrains the search for interesting rules to those that satisfy the specified quality metrics as independent values or as higher level concepts of those values however in many cases the determination of single hierarchy is not practicable and for many datasets an item’s value may be taken from domain that is more conveniently structured as graph with weights indicating semantic or conceptual distance research in the development of algorithms that generate disjunctive association rules has allowed the production of rules such as radios or tvs → cables in many cases there is little semantic relationship between the disjunctive terms and arguably less readable rules such as radios or tuesday → cables can result this paper describes two association rule mining algorithms semgramg and semgramp that accommodate conceptual distance information contained in semantic graph the semgram algorithms permit the discovery of rules that include an association between sets of cognate groups of item values the paper discusses the algorithms the design decisions made during their development and some experimental results
based on human psychological cognitive behavior comprehensive and adaptive trust cat model for large scale pp networks is proposed firstly an adaptive trusted decision making method based on hew historical evidences window is proposed which can not only reduce the risk and improve system efficiency but also solve the trust forecasting problem when the direct evidences are insufficient then direct trust computing method based on iowa induced ordered weighted averaging operator and feedback trust converging mechanism based on dtt direct trust tree are set up which makes the model have better scalability than previous studies at the same time two new parameters confidence factor and feedback factor are introduced to assign the weights to direct trust and feedback trust adaptively which overcomes the shortage of traditional method in which the weights are assigned by subjective ways simulation results show that compared to the existing approaches the proposed model has remarkable enhancements in the accuracy of trust decision making and has better dynamic adaptation capability in handling various dynamic behaviors of peers
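A minimal sketch of the adaptive weighting idea only (not the CAT model's IOWA or DTT machinery): direct and feedback trust are plain averages here, and a confidence factor that grows with the amount of direct evidence decides how much weight each one gets. The saturation parameter is hypothetical.

```python
def combined_trust(direct_ratings, feedback_ratings, saturation=10):
    """Weight direct vs. feedback trust adaptively: the more direct evidence
    we hold about a peer, the larger the confidence factor and the less we
    rely on third-party feedback. Ratings are values in [0, 1].
    Illustrative sketch only -- not the CAT model's actual operators."""
    direct = sum(direct_ratings) / len(direct_ratings) if direct_ratings else 0.0
    feedback = sum(feedback_ratings) / len(feedback_ratings) if feedback_ratings else 0.0
    confidence = min(1.0, len(direct_ratings) / saturation)   # confidence factor
    return confidence * direct + (1.0 - confidence) * feedback

# few direct ratings -> feedback still carries most of the weight
print(combined_trust([0.9, 0.8, 1.0], [0.4, 0.5]))
```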
aspect oriented software development aosd has primarily focused on linguistic and meta linguistic mechanisms for separating concerns in program source however the kinds of concern separation and complexity management that aosd endeavors to achieve are not the exclusive province of programming language design in this paper we propose new model of concern separation called visual separation of concerns vsc which is based on new model of program storage by altering the mechanisms used to store and manipulate program artifacts much of the capability of concern separation can be captured without performing any linguistic transformations we also describe our implementation of vsc which is based on stellation an experimental software configuration management system the vsc approach combined with software configuration management can have advantages over conventional approaches by avoiding program transformations by providing persistent storage of features such as concern maps and by enabling new techniques for concern identification and manipulation
interface designers normally strive for design that minimises the user’s effort however when the design’s objective is to train users to interact with interfaces that are highly dependent on spatial properties eg keypad layout or gesture shapes we contend that designers should consider explicitly increasing the mental effort of interaction to test the hypothesis that effort aids spatial memory we designed frost brushing interface that forces the user to mentally retrieve spatial information or to physically brush away the frost to obtain visual guidance we report results from two experiments using virtual keypad interfaces the first concerns spatial location learning of buttons on the keypad and the second concerns both location and trajectory learning of gesture shape the results support our hypothesis showing that the frost brushing design improved spatial learning the participants subjective responses emphasised the connections between effort engagement boredom frustration and enjoyment suggesting that effort requires careful parameterisation to maximise its effectiveness
the provenance or lineage of workflow data product can be reconstructed by keeping complete trace of workflow execution this lineage information however is likely to be both imprecise because of the black box nature of the services that compose the workflow and noisy because of the many trivial data transformations that obscure the intended purpose of the workflow in this paper we argue that these shortcomings can be alleviated by introducing small set of optional lightweight annotations to the workflow in principled way we begin by presenting baseline annotation free lineage model for the taverna workflow system and then show how the proposed annotations improve the results of fundamental lineage queries
in wireless sensor networks many routing algorithms are designed to implement energy efficient mechanisms among those some focus on maximising an important performance index called network lifetime which is the number of messages successfully delivered in the network before failure in this paper we propose new online algorithm taking the goal of prolonging network lifetime into account when making routing decisions our algorithm named traffic aware energy efficient taee routing protocol utilises prospective traffic load information for further load balance in addition to power related metrics used in an enhanced cost function in calculating least cost paths an algorithm for automatic parameter adaption is also described to better accommodate large scale sensor networks we further introduce random grouping scheme which enables hierarchical taee routing to run within and across the dynamically formed groups to reduce computation and routing overhead while maintaining global energy efficiency our simulation shows that compared with the leading power aware max min zpmin protocol the taee protocol generates better performance in terms of network lifetime without jeopardising network capacity
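A sketch of the general recipe, under our own assumptions rather than the exact TAEE cost function: each link cost grows when the forwarding node has little residual energy or expects heavy traffic, and routes are then least-cost paths over those costs (Dijkstra below). The cost form and the exponents alpha and beta are illustrative.

```python
import heapq

def link_cost(residual_energy, tx_energy, traffic_load, alpha=1.0, beta=1.0):
    """Toy traffic- and energy-aware link cost: penalise forwarding through
    nodes with little residual energy or heavy expected traffic."""
    return tx_energy * (1.0 / residual_energy) ** alpha * (1.0 + traffic_load) ** beta

def least_cost_path(graph, src, dst):
    """Dijkstra over per-link costs; `graph[u]` is a list of (v, cost) pairs.
    Assumes `dst` is reachable from `src`."""
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

g = {"s": [("a", link_cost(0.9, 1.0, 0.2)), ("b", link_cost(0.3, 1.0, 0.0))],
     "a": [("t", link_cost(0.8, 1.0, 0.1))],
     "b": [("t", link_cost(0.7, 1.0, 0.5))]}
print(least_cost_path(g, "s", "t"))   # routes around the energy-poor node b
```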
in recent years we have witnessed great increase in the interest in trust management tm techniques both from the industrial and the academic sectors the booming research has also determined duality in the very definition of tm system which can lead to confusion in one of the two categories of tm systems great deal of work has yet to be done in advancing the inherently adaptive nature of trust this position paper examines reasons for the success of tm the two broad tm categories and for reputation based tm issues of regret management and accountability that are necessary enhancements on the road leading to much more sophisticated tm architectures